Central Domain Database Layout (Brainstorm)
Closed, Resolved · Public

Description

Thanks to @jawz101's invitation to take over the maintenance (the hosts-file part), we need to get the central domain database up and running.

The question now is how it should be built. Should we keep it separate (T1250) from the #pyfunceble project, or make it an integrated part (T1901), to avoid storing and processing the same data twice? Or should we integrate it directly as a new table?

The next question is which fields (data) are required so that each user's project can extract the data it needs.

We will probably want to be able to extract the data in the following formats:

  • RPZ
    • Keep everything as wildcard when possible.
    • Extract based on category + level (thinking 0-5).
  • Hosts/flat files
  • General
    • Extract the ref to the issue that puts the record into a given categor(y|ies).
    • Have a classification level from 0 (not leveled yet) to 5 (or 10), where the highest is the worst.
    • We must be able to tell which domains are sub-domains and which are to be wildcarded (a bool value, maybe).
    • Each record should be individually assigned one to many categories.
    • Be able to set a mobile-specific bool, as mentioned in https://github.com/AdAway/adaway.github.io/issues/184#issuecomment-725633293
  • Exceptions
    • To start with, we should rely solely on each individual's whitelist, applied after extraction with e.g. UHBW.
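
The requirements above can be sketched as a record type plus two exporters. This is only an illustration: the field names (`level`, `wildcard`, `mobile_only`, `issue_refs`) and the helper functions are hypothetical, and the RPZ output assumes the plain NXDOMAIN action (`CNAME .`).

```python
from dataclasses import dataclass, field


@dataclass
class DomainRecord:
    """One entry in the central database (all field names are assumptions)."""
    domain: str
    categories: list[str]                 # one to many categories
    level: int = 0                        # 0 = not leveled yet, 5 = worst
    wildcard: bool = False                # also cover every sub-domain
    mobile_only: bool = False             # mobile-specific flag
    issue_refs: list[str] = field(default_factory=list)  # issues behind the categorization


def to_hosts(record: DomainRecord) -> str:
    """Hosts files cannot express wildcards, so only the exact name is written."""
    return f"0.0.0.0 {record.domain}"


def to_rpz(record: DomainRecord) -> list[str]:
    """RPZ lines with the NXDOMAIN action; wildcard records add a '*.' owner."""
    lines = [f"{record.domain} CNAME ."]
    if record.wildcard:
        lines.append(f"*.{record.domain} CNAME .")
    return lines


def extract(records, category: str, min_level: int):
    """Select records that carry the category and reach the minimum level."""
    return [r for r in records if category in r.categories and r.level >= min_level]
```

For example, `to_rpz(DomainRecord("ads.example", ["advertising"], level=3, wildcard=True))` yields one line for the name itself and one for `*.ads.example`, while `to_hosts` only emits the exact name.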

I think this looks like a good brainstorm for the first step toward the master plan.

At the end of this (the endgame, @funilrys 😈), I believe this should end up as a separate DB (T1901) with which the #pyfunceble_-_cluster_-_master will share the domain/URI table.

What do you think, @funilrys? Is it possible to start this in a sunrise phase much earlier than we first thought, so that we can accept the honor provided by @jawz101?

Event Timeline

Spirillen triaged this task as Unbreak Now! priority. Nov 12 2020, 5:24 PM
Spirillen created this task.
Spirillen created this object with edit policy "Matrix (Project)".
Spirillen renamed this task from Central Domain Database to Central Domain Database (Layout Brainstorm). Nov 12 2020, 6:26 PM
Spirillen edited projects, added cdb.matrix.rocks; removed Matrix.
Spirillen added a subscriber: cdb.matrix.rocks.

I have the following relational approach with a versioning system:

table registered_sources

source (Primary key)            | registered                | registered_by (references users) | apiKey
Matrix                          | 2021-02-19:00:00:00+00:00 | admin                            | null
external_project_with_cool_name | 2021-02-21:00:00:00+00:00 | Somebodyisnobody                 | secret_nice_password

table domain_versions

domain (Primary key) | source (references registered_sources, Primary key) | created (Primary key)     | [domain fields from T3776 (in separate columns, of course)]
googleadservices.com | external_project_with_cool_name                     | 2021-02-21:00:00:00+00:00 | [...] removed by xy, reason "false positive"
googleadservices.com | Matrix                                              | 2021-02-20:00:00:00+00:00 | [...] added with categories "advertising" by xy
foobar.foo           | external_project_with_cool_name                     | 2021-02-10:00:00:00+00:00 | [...] added with categories "spam" by xy
googleadservices.com | external_project_with_cool_name                     | 2021-02-10:00:00:00+00:00 | [...] added with categories "advertising" by xy
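
The two tables above could be declared roughly as follows. This is a minimal sketch using SQLite from Python's standard library, not a decision on the actual database engine. The `state` and `note` columns stand in for the domain fields from T3776 and are assumptions, as are the `api_key` spelling and the use of ISO 8601 strings for timestamps.

```python
import sqlite3

SCHEMA = """
CREATE TABLE registered_sources (
    source        TEXT PRIMARY KEY,
    registered    TEXT NOT NULL,          -- ISO 8601 timestamp (assumed format)
    registered_by TEXT NOT NULL,          -- would reference a users table
    api_key       TEXT                    -- NULL for the internal source
);
CREATE TABLE domain_versions (
    domain  TEXT NOT NULL,
    source  TEXT NOT NULL REFERENCES registered_sources(source),
    created TEXT NOT NULL,
    state   TEXT NOT NULL CHECK (state IN ('added', 'removed')),  -- assumed column
    note    TEXT,                                                  -- assumed column
    PRIMARY KEY (domain, source, created)  -- one row per version
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key means a change never overwrites an earlier row: every add or remove by a source becomes a new version with its own `created` value.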

Now we can generate the active list with:

versions = getAllDomainSourceTuplesWithNewestCreateDate()
foreach (version in versions) {
  if (notInWhitelist(version) && inAnyVersionAddedState(version)) {
    add version to output list in format xy
  }
}

==

googleadservices.com #active in Matrix, not active in external source
foobar.foo #active in external_project_with_cool_name
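
A runnable sketch of the same selection, under stated assumptions: versions are plain `(domain, source, created, state)` tuples, `state` is either `"added"` or `"removed"`, and dates are ISO strings so that string comparison orders them correctly. The function name `active_domains` is made up for illustration.

```python
# Hypothetical rows mirroring the example: (domain, source, created, state)
VERSIONS = [
    ("googleadservices.com", "external_project_with_cool_name", "2021-02-21", "removed"),
    ("googleadservices.com", "Matrix", "2021-02-20", "added"),
    ("foobar.foo", "external_project_with_cool_name", "2021-02-10", "added"),
    ("googleadservices.com", "external_project_with_cool_name", "2021-02-10", "added"),
]


def active_domains(versions, whitelist=frozenset()):
    """Map each active domain to the sources in which it is currently added."""
    # Step 1: keep only the newest version per (domain, source) tuple.
    newest = {}
    for domain, source, created, state in versions:
        key = (domain, source)
        if key not in newest or created > newest[key][0]:
            newest[key] = (created, state)

    # Step 2: a domain is active if any source's newest version is "added"
    # and the domain is not whitelisted.
    active = {}
    for (domain, source), (_, state) in newest.items():
        if state == "added" and domain not in whitelist:
            active.setdefault(domain, []).append(source)
    return {domain: sorted(sources) for domain, sources in sorted(active.items())}
```

With the sample rows, `googleadservices.com` stays active through `Matrix` only, because the external project's newest version is a removal, and `foobar.foo` is active through the external project.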

And we can travel through the history:

versions = getAllVersionsOf(domain)
group by project, sort by date desc and push it to the web interface

==

googleadservices.com:
  external_project_with_cool_name:
    10.02.2021 - blocked [...]
    21.02.2021 - unblocked [...]
  Matrix:
    20.02.2021 - blocked [...]

foobar.foo:
  external_project_with_cool_name:
    10.02.2021 - blocked [...]
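
The history view can be sketched in the same way. The tuple layout `(domain, source, created, state)` and the function name `history` are assumptions. The sample output above lists the oldest entry first, so that is the default here; `newest_first=True` gives the descending order mentioned in the pseudocode.

```python
from collections import defaultdict

# Hypothetical rows mirroring the example: (domain, source, created, state)
VERSIONS = [
    ("googleadservices.com", "external_project_with_cool_name", "2021-02-21", "removed"),
    ("googleadservices.com", "Matrix", "2021-02-20", "added"),
    ("foobar.foo", "external_project_with_cool_name", "2021-02-10", "added"),
    ("googleadservices.com", "external_project_with_cool_name", "2021-02-10", "added"),
]


def history(versions, domain, newest_first=False):
    """Group all versions of one domain by source, sorted by creation date."""
    grouped = defaultdict(list)
    for d, source, created, state in versions:
        if d == domain:
            grouped[source].append((created, state))
    return {
        source: sorted(entries, reverse=newest_first)
        for source, entries in grouped.items()
    }
```

A web interface would then only need to render each source with its list of dated state changes.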
Spirillen renamed this task from Central Domain Database (Layout Brainstorm) to Central Domain Database Layout (Brainstorm). Mar 11 2021, 2:26 PM