My privacy DNS

Domain list exchange format
Closed, Resolved (Public)

Description

In this task, we want to select a format in which whitelists and blacklists can be exchanged between organizations/projects and Matrix. Matrix is meant to make the blocking and unblocking processes transparent and traceable.

The goal is to select an existing format, or to develop a new, universally applicable one, so that domain lists can be exchanged between projects while preserving traceability. The exchange itself should be possible via different channels (REST API, EDIFACT, SMTP, Git, DAV, FTP), but we do not commit to any of these in this task. External data in this format will then be converted by Matrix into the database format (still to be specified) before the (possibly versioned) data can be distributed to DNS services, e.g. as RPZ.
Authorized users should be able to block and unblock domains and to record reasons and "evidence" in the process. However, there is a transparency problem when it comes to integrating external data sources: in most cases, the domain is the only information transferred so far. In the community, it is standard practice to exchange text lists with one domain per line. With such lists, it is difficult or impossible for outsiders, and sometimes even for the respective project itself, to understand why a certain domain was blocked or unblocked. As a service like Matrix becomes more popular, it also becomes a target for attacks. This concerns not only technical attacks but also, to name just two more examples, whitelisting in exchange for money under the table, or blacklisting out of self-serving opportunism.

Who are the stakeholders? Why is this task important?

As a domain owner, I want to be able to easily determine if, why, and by whom my domain has been blocked, and how to initiate an unblocking process.

As a user, I would like to be able to understand which websites are blocked and why, so that I can gain trust in the project and actively support it or passively benefit from it.

As an operator of DNS services that use Matrix lists, or as the organization running Matrix, I want legally sound evidence in competition-law disputes, so that I cannot be held personally liable.

As a contributor to the project, I would like to be able to trace which changes have taken place and when, so that erroneous external sources can be weeded out.

As an external partner of the project, I need a standardized format that is time-stable and codified. At the same time, only data that I can deliver should be required.

Central questions for this task:

  • What data do we need to ensure transparency, even as Matrix develops further?
  • What text format do we use?

Event Timeline

Somebodyisnobody created this task.

I thought about the following:

  • author data:
    • author
    • project "domain" (DNS-Domain or unique name)
    • author's contact url (optional)
  • entry black/whitelist:
    • type(s) of record (enum)
    • domain
    • type [blacklist/whitelist]
    • danger level
    • categories (enum)
    • reason
    • created date
    • foreign commit URL or other reference to the project's process documentation for this entry

This could be written in JSON. Let's discuss the idea.

Thank you so much @Somebodyisnobody for these suggestions and thoughts; they are good.

A few first thoughts and comments (read: first thoughts that still need some processing time).

For the author data section I would go a bit further:

  • author data:
    • author: required; at minimum the source of the data, the project, or the author's name
    • project: required, identified by a unique ID [ DNS-Domain | private <id> | (open source (FOSS) | paid service) ]
    • author's contact URL: required
      • It should always be possible to trace an entry back to its origin and challenge the reason for which the record was added.
      • This does not need to be an email address; I would actually prefer an openly available public source, so that everyone can make up their own mind.

For the record entries I have some thoughts, and I have also tried to add some comments about what data each entry should represent (remember: first thoughts). The following should let the data exchange handle a wider range of blocking systems, and hence reach more users and blocking systems.

  • entry: ID black/whitelist
    • type(s) of record (enum) [ domain | url | regex/RE2 (dnsdist/adblock/Squid) ]
    • record:
      • if [domain|url]
        • Can be wildcarded [True|False] (non-hosts-file-based systems)
        • If True: from domain level = numeric or name. A numeric value means less data needs to be stored and transferred (bit-optimized).
    • type [blacklist/whitelist] (Note: whitelists should always be a personal choice)
    • danger level (wording: e.g. adult/news/banking etc. are not dangerous)
    • categories (enum):
    • reason: required, short note (e.g. phishing email) + can contain a link to a source for detailed info.
      • Language of note (ISO code)
    • created date: required for all new records
    • altered (changed) date: required for all altered data.

This could be written in JSON. Let's discuss the idea.

I'm open to JSON, but I would suggest additional formats like CSV and TXT. Not that JSON itself is bad, but it is not as easily imported by, e.g., bash scripts or by non-programmers.
The alternative would require substantial copy/paste documentation with real-world examples to get everyone started.

End note: I was thinking of a record type for the entry, but I forgot it while writing... I'll add it when it comes back. My mind also sees a way we can minimize the size of such a file; I just can't put my finger on it...

  • author data:
    • author: required; at minimum the source of the data, the project, or the author's name
    • project: required, identified by a unique ID [ DNS-Domain | private <id> | (open source (FOSS) | paid service) ]

My thoughts were that this information could be displayed and processed in the Matrix frontend as "revision v5 for baddomain.com by author@project.org", for example.

  • author's contact URL: required
    • It should always be possible to trace an entry back to its origin and challenge the reason for which the record was added.
    • This does not need to be an email address; I would actually prefer an openly available public source, so that everyone can make up their own mind.

👍 This would also be possible with the author@domain.org combination, but then it depends on the target project (how it links the data), so you're right.

For the record entries I have some thoughts, and I have also tried to add some comments about what data each entry should represent (remember: first thoughts). The following should let the data exchange handle a wider range of blocking systems, and hence reach more users and blocking systems.

  • entry: ID black/whitelist
    • type(s) of record (enum) [ domain | url | regex/RE2 (dnsdist/adblock/Squid) ]
    • record:
      • if [domain|url]
        • Can be wildcarded [True|False] (non-hosts-file-based systems)
        • If True: from domain level = numeric or name. A numeric value means less data needs to be stored and transferred (bit-optimized).

Oh, I need to clarify my thoughts here. I was thinking of a domain that I only want to block/allow for A or AAAA records, not for MX records, for example. As I understand it, you mean the format of the record/domain field?

  • type [blacklist/whitelist] (Note: whitelists should always be a personal choice)

I thought maybe someone will apply one of our whitelists. We could discard it later if it's useless?

  • danger level (wording: e.g. adult/news/banking etc. are not dangerous)

Let's call it "risk level" where 0 is no risk and 5 is high risk?

  • Language of note (ISO code)

Perfect!

  • altered (changed) date: required for all altered data.

If we assume a versioning principle where only the latest version is ever imported and versioning is done in the projects themselves, then the field would be unnecessary. But to keep the format generalized, we leave it in (Matrix would then use the altered field for the "created on" field in its internal data model).

This could be written in JSON. Let's discuss the idea.

I'm open to JSON, but I would suggest additional formats like CSV and TXT. Not that JSON itself is bad, but it is not as easily imported by, e.g., bash scripts or by non-programmers.
The alternative would require substantial copy/paste documentation with real-world examples to get everyone started.

Yeah, I understand that pretty well, but TXT is not a specific format. Converting JSON/XML into CSV is not a problem as long as there are no arrays inside a single domain entry:

{
  "key1": {
    "foo": "foo",
    "bar": "bar"
  },
  "key2": "foobar"
}

==

key1.foo, key1.bar, key2
foo, bar, foobar
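The mapping above can be sketched in Python as a small flattening step: nested objects become dotted column names, and arrays are rejected because they have no single-cell CSV representation. This is a minimal sketch using only the standard library; the function names `flatten` and `to_csv` are illustrative, not part of any agreed format.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys: {"key1": {"foo": 1}} -> {"key1.foo": 1}.

    Arrays are rejected on purpose: the CSV mapping only works while a
    single domain entry contains no arrays.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            raise ValueError(f"array at {name!r} cannot be mapped to one CSV cell")
        else:
            flat[name] = value
    return flat


def to_csv(entries):
    """Write a list of entries as CSV, using the first entry's keys as the header."""
    rows = [flatten(e) for e in entries]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


entry = json.loads('{"key1": {"foo": "foo", "bar": "bar"}, "key2": "foobar"}')
print(to_csv([entry]))
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are escaped correctly.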

The big advantage I see is that with JSON/XML we can specify a schema (for XML, an XML Schema Definition, XSD) which can be used to validate the file easily on the client's side. From this XSD you can also generate an object structure with all attributes in your preferred language.

My thoughts were that this information could be displayed and processed in the Matrix frontend as "revision v5 for baddomain.com by author@project.org", for example.

Well, I was thinking a bit more globally 😃: this format should be usable by any kind of blacklisting project for various blocking systems, not only by the Matrix project.

For your example, "revision v5 for baddomain.com by author@project.org":
this can be achieved anyway, as the information is inside the data blocks.

I was thinking of a domain that I only want to block/allow for A or AAAA records, not for MX records, for example

That can be difficult, as it would rely solely on the record itself, and with the exception of RPZ, this is actually not possible.

Example

In this case, any blocking of example.org will also block its mail server, because the MX record points to example.org itself:

example.org     IN A    192.0.2.2
example.org     IN MX   10  example.org

Unless, as is more common practice, the MX record points to a separate host name:

example.org     IN A    192.0.2.2
example.org     IN MX   10  mail.example.org

In RPZ, the lookup for example.org stops here unless you actually whitelist it 😃 (or switch the rule from a CNAME to an A/AAAA record pointing to an IP).

If you now take, let's say, a hosts file/Pi-hole setup, you can't whitelist example.org unless you remove it entirely from your blacklist(s).
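To make the difference concrete, here is a much simplified Python model of the two behaviours: a plain hosts file only knows exact names, while an RPZ policy can combine a wildcard trigger with a more specific passthru exception. The rule format (a dict mapping trigger names to actions) and the precedence shown (exact match before wildcard, closest wildcard first) are simplifying assumptions for illustration, not a full implementation of RPZ matching.

```python
def hosts_blocked(name, hosts_entries):
    """Plain hosts file: only exact names are blocked.

    The only way to 'whitelist' a name is to remove it from the set.
    """
    return name in hosts_entries


def rpz_action(name, rules):
    """Very simplified QNAME matching within a single RPZ policy zone.

    rules maps trigger names to actions, e.g.
    {"*.example.org": "NXDOMAIN", "mail.example.org": "PASSTHRU"}.
    An exact match wins over a wildcard, and the closest enclosing
    wildcard wins over broader ones. Note that "*.example.org" does not
    match "example.org" itself.
    """
    name = name.rstrip(".").lower()
    if name in rules:
        return rules[name]
    labels = name.split(".")
    # Try *.b.c.d, then *.c.d, then *.d for a name like a.b.c.d
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in rules:
            return rules[wildcard]
    return None  # no policy applies: normal resolution


rules = {
    "example.org": "NXDOMAIN",
    "*.example.org": "NXDOMAIN",
    "mail.example.org": "PASSTHRU",  # the exception a hosts file cannot express
}
```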

Indeed something we NEED to be aware of and take into consideration. 👍 (That just ruined my work from last night 😭)

I thought maybe someone will apply one of our whitelists. We could discard it later if it's useless?

This would allow integrating any whitelists a user wants to use, although order matters, e.g. https://github.com/Ultimate-Hosts-Blacklist/whitelist
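Why order matters can be shown with a minimal Python sketch, assuming each list is applied in sequence and a later list overrides an earlier one. The function name and the `(kind, domains)` pairs are illustrative; real blocking systems may define precedence differently.

```python
def apply_lists(lists):
    """Apply (kind, domains) pairs in order; a later list overrides an earlier one."""
    blocked = set()
    for kind, domains in lists:
        if kind == "blacklist":
            blocked |= set(domains)
        elif kind == "whitelist":
            blocked -= set(domains)
        else:
            raise ValueError(f"unknown list type: {kind}")
    return blocked


# The same two lists, applied in different orders, give different results.
print(apply_lists([("blacklist", {"a.com", "b.com"}), ("whitelist", {"b.com"})]))
print(apply_lists([("whitelist", {"b.com"}), ("blacklist", {"a.com", "b.com"})]))
```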

I'm working on a similar idea, but it will stick to the record itself, to make it more flexible. The idea is taken from how the PowerDNS API is used.

Ruined work from last night (read: draft!!)
{
    "rrsets": [
        {
            "author": "matrix.rocks",
            "project": "Tracking domains",
            "contact": "https://www.mypdns.org/project/profile/18/",
            "comment": "Records ID are synonyms for https://www.mypdns.org/T<ID>, which makes 3776 equal https://www.mypdns.org/T3776",
            "records": [
                {
                    "ID": "2891",
                    "type": "CNAME",
                    "comment": "",
                    "content": [
                        {
                            "key1": "ctologger01.analytics.go.com.",
                            "wildcard": "1",
                            "level": "4",
                            "changetype": "REPLACE"
                        },
                        {
                            "key2": "analytics.go.com.",
                            "wildcard": "1",
                            "level": "4",
                            "changetype": "DELETE"
                        }
                    ]
                }
            ]
        }
    ]
}

(REPLACE is used rather than an add operation, to avoid duplicates.)

If we assume a versioning principle where only the latest version is ever imported and versioning is done in the projects themselves, then the field would be unnecessary. But to keep the format generalized, we leave it in (Matrix would then use the altered field for the "created on" field in its internal data model).

I'm not sure we need this. As you mentioned earlier, any version record (SOA-like) should of course be at the beginning (header), so that any system can check whether it actually needs to transfer the zone (file)... hmm, how to do that???

Should we make this a last checked date?

Yeah, I understand that pretty well, but TXT is not a specific format

That's purely to help Windows users, as the default program for opening the other formats is (as I recall) Excel 😆, and the TXT file can still keep the CSV format.

Thanks for your nice feedback, @Somebodyisnobody. I'll keep working on a show-and-tell paste of the JSON-formatted version.

{
  "rrsets": [
    {
      "author": "matrix.rocks",
      "project": "Domain list exchange format",
      "contact": "https://www.mypdns.org/project/profile/18/",
      "comment": "Records ID are synonyms for https://www.mypdns.org/T<ID>, which makes 3776 equal https://www.mypdns.org/T3776",
      "c-lang": "en",
      "version": "show and tell 0.1a4",
      "records": [
        {
          "ID": "2891",
          "type": "CNAME",
          "comment": "",
          "content": [
            {
              "key1": "ctologger01.analytics.go.com.",
              "wildcard": "1",
              "wlevel": "4",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Tracking scripts",
              "c-lang": "en"
            },
            {
              "key2": "analytics.go.com.",
              "wildcard": "1",
              "wlevel": "4",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Tracking scripts",
              "c-lang": "en"
            },
            {
              "key3": "unid.go.com.",
              "wildcard": "1",
              "wlevel": "4",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Tracking scripts",
              "c-lang": "en"
            },
            {
              "key4": "log.go.com.",
              "wildcard": "0",
              "wlevel": "0",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Logging Tracking data",
              "c-lang": "en"
            },
            {
              "key5": "ad.go.com",
              "wildcard": "0",
              "wlevel": "0",
              "sevirety": "2",
              "category": "AdWare",
              "changetype": "REPLACE",
              "comment": "AdWare, some sites are broken by implanting this adware",
              "c-lang": "en-US"
            },
            {
              "key6": "ngads.go.com",
              "wildcard": "0",
              "wlevel": "0",
              "sevirety": "5",
              "category": "AdWare",
              "changetype": "REPLACE",
              "comment": "AdWare",
              "c-lang": "en-GB"
            }
          ]
        },
        {
          "ID": "1473",
          "type": "IP",
          "changetype": "REPLACE",
          "comment": "",
          "content": [
            {
              "key1": "8.8.8.8",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Tracking",
              "c-lang": "en"
            },
            {
              "key2": "8.8.4.4",
              "sevirety": "5",
              "category": "Tracking",
              "changetype": "REPLACE",
              "comment": "Tracking",
              "c-lang": "en"
            }
          ]
        },
        {
          "ID": "902",
          "type": "REGEX",
          "changetype": "DELETE",
          "comment": "",
          "content": [
            {
              "key1": "^mkt([0-9]{1,5})\\.com$",
              "sevirety": "5",
              "category": "Scam / Spam",
              "changetype": "REPLACE",
              "comment": "Spam / Scam on third-party sites (fake thread wares)",
              "c-lang": "en"
            }
          ]
        }
      ]
    }
  ]
}
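A consumer of this case-based structure has to walk rrsets, then records, then content, and find the numbered value key in each element. The sketch below assumes exactly one `keyN` field per content element, matched case-insensitively in case producers vary the capitalization; the function name and the output fields are illustrative only.

```python
import re

# Numbered value keys as used in the draft above: "key1", "key2", ...
KEY_PATTERN = re.compile(r"^key\d+$", re.IGNORECASE)


def iter_entries(document):
    """Yield one flat dict per content element of a P33-style document."""
    for rrset in document["rrsets"]:
        for record in rrset["records"]:
            for item in record["content"]:
                value_keys = [k for k in item if KEY_PATTERN.match(k)]
                if len(value_keys) != 1:
                    raise ValueError(f"expected exactly one keyN field in {item}")
                yield {
                    "author": rrset["author"],
                    "ID": record["ID"],
                    "type": record["type"],
                    "value": item[value_keys[0]],
                    "sevirety": int(item["sevirety"]),  # stored as a string in the draft
                    "category": item["category"],
                    "changetype": item["changetype"],
                }
```

With a fixed field name for the value (for example `"name"`), the regular expression would not be needed at all.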

I got a suggestion for the danger level => risk level

What do you think of "exposure level"?
How likely are you to be exposed to ads/tracking?

An alternative to "exposure" could be "sevirety":
How high is the threat; how many ads; how many banks can you block/access; how many nipples/breasts etc. do you see on this adult site?

In my mind this is a bit more all-round/neutral.

My thoughts were that this information could be displayed and processed in the Matrix frontend as "revision v5 for baddomain.com by author@project.org", for example.

Well, I was thinking a bit more globally 😃: this format should be usable by any kind of blacklisting project for various blocking systems, not only by the Matrix project.

Yeah, of course, we can use it that way.

For your example: revision v5 for baddomain.com by author@project.org.

In RPZ, the lookup for example.org stops here unless you actually whitelist it 😃 (or switch the rule from a CNAME to an A/AAAA record pointing to an IP).

Hmm, you're right... So is it worth keeping this field?

I'm working on a similar idea, but it will stick to the record itself, to make it more flexible. The idea is taken from how the PowerDNS API is used.

See my suggestion; I tried to merge our ideas.

If we assume a versioning principle where only the latest version is ever imported and versioning is done in the projects themselves, then the field would be unnecessary. But to keep the format generalized, we leave it in (Matrix would then use the altered field for the "created on" field in its internal data model).

I'm not sure we need this. As you mentioned earlier, any version record (SOA-like) should of course be at the beginning (header), so that any system can check whether it actually needs to transfer the zone (file)... hmm, how to do that???

Oof, I don't know where to consolidate my ideas across this many P and T threads. You see the records as individual cases with IDs, and each ID has one or more domains; the last state is overwritten with the new state from the external source. I think in versions, not in cases. I'll try to show you my idea:
Let's assume we have a relational table "versions" with the composite primary key {domain, organisation, created_date}. Of course, this table also contains the other fields (contact, author, categories, comment, sevirety (risk level), etc.). Now a DLEF file is imported via the API (POST /import) or as a file (see below). The organization's backend then reads every entry in the data array and imports each element unless an element with the same primary key already exists in the table. Here the created date becomes important, because all other values (IDs, comments, authors) can appear more than once. Versions are important to trace why a domain has been blocked in the past. Without versions (i.e. overwriting the last state), it's not traceable who blocked google.com for 5 minutes just for fun.

TXT can still keep the CSV format.

Perfect, so it's just another file extension.

My idea is: file format in P35

A few words about my format:

| Field | Somebodyisnobody | Sprillen (in P33) |
| --- | --- | --- |
| author | author's name for every entry | organisation's name |
| ID | I would not use IDs from external sources to group elements in a record entry (see the versioning idea). I see no advantage in grouping domains, as this can be done in the project based on the names. | follows the case-based idea |
| changetype | For processing performance it's better to have different endpoints for adding and removing entries. I planned this field in the metadata, for all entries. | mixed changetypes in one file/API call |
| changetype: replace | As I prefer the versioning system, I would never replace a version with another; I would only add or remove entries and update the pointer (in SQL: sort desc by "created_date") to the current version. | only the current version is valid |
| wlevel | ??? | ??? |
| signature | Signing seems a bit overpowered, but look at the third element in my example: it's imported by "mypdns.org" but contains entries from "otherproject.org". | - |

For the record: we both think "sevirety" is the best field identifier for the risk level.

Oof, I don't know where to consolidate my ideas across this many P and T threads. You see the records as individual cases with IDs, and each ID has one or more domains.

No: sub-domains (key) related to a domain (ID), zone-file style...

example.org. 86400 IN SOA ns1.example.org. mail.example.org. 2021021801 3600 600 1209600 3600
example.org. IN NS ns1.example.org.
example.org. IN NS ns2.example.org.
example.org. IN A 192.0.2.2
bad.example.org. IN A 192.0.2.3
god.example.org. IN A 192.0.2.4
example.org. 86400 IN MX 10 mail.example.org.

example.zone
example.org. SOA (ID)
example.org. A  (Key)
bad.example.org. A  (Key)
god.example.org.  A (Key)

wlevel

It can be wildcarded from domain level n. This is useful for RPZ/Unbound/dnsmasq and other blocking mechanisms that understand wildcarding (not to be confused with RegEx).

who blocked google.com for 5 minutes just for fun

😕 😕 google.com is deliberately blocked in my home network as anti-privacy, adware, spyware, etc. 😄

Is this how you would handle sub-domains that are not to be put into the same category??

{
  "DLEF-version": "1.0",
  "metadata of the organization who created this list": "mypdns.org, ecetera pp.",
  "action": "'add' new evil domains or 'remove' e.g. false positives from target if exist",
  "data": [
    {
      "author": "Disney",
      "organization": "disney.com",
      "contact": "https://www.mypdns.org/p/GeorgeLucas/",
      "record": {
        "types": [
          "A"
        ],
        "domain": {
          "name": "starwars.com",
          "format": "plain"
        },
        "categories": [
          "VIRUS",
          "Anti Virus",
          "Female Chauvinisms",
          "God Guys",
          "Jedies"
        ],
        "sevirety": 3,
        "comment": {
          "content": "Abusive misuse of the best film Star Wars!",
          "language": "en-US"
        },
        "created": "--date--",
        "documentation": "commit-url, issue whatever"
      },
      "signature": "ssh, pgp, x.509 signing over data from organization (in this case mypdns.org) for later purpose?"
    },
    {
      "author": "Disney",
      "organization": "disney.com",
      "contact": "https://www.mypdns.org/p/Somebodyisnobody/",
      "record": {
        "types": [
          "CNAME"
        ],
        "domain": {
          "name": "rey.starwars.com",
          "format": "wildcard, regex, plain"
        },
        "categories": [
          "Virus",
          "Female Chauvinisms",
          "SPYWARE",
          "TRACKWARE"
        ],
        "sevirety": 5,
        "comment": {
          "content": "Not a cute chick... a evil bitch...",
          "language": "en-US"
        },
        "created": "--date--",
        "documentation": "commit-url, issue whatever"
      },
      "signature": "ssh, pgp, x.509 signing over data from organization (in this case mypdns.org) for later purpose?"
    },
    {
      "author": "Disney",
      "organization": "disney.com",
      "contact": "https://www.mypdns.org/p/Somebodyisnobody/",
      "record": {
        "types": [
          "CNAME"
        ],
        "domain": {
          "name": "joda.starwars.com",
          "format": "wildcard, regex, plain"
        },
        "categories": [
          "Anti Virus",
          "God Guys",
          "Jedies"
        ],
        "sevirety": 1,
        "comment": {
          "content": "The oldest Jedi master in Star Wars Saga",
          "language": "en-US"
        },
        "created": "--date--",
        "documentation": "commit-url, issue whatever"
      },
      "signature": "ssh, pgp, x.509 signing over data from organization (in this case mypdns.org) for later purpose?"
    }
  ]
}
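Until a formal schema exists, a consumer could check the required fields of each `data` element by hand. The sketch below assumes the required set discussed earlier in this thread (author, organization, contact, at least one record type, a domain name, a comment with a language code, a created date) and the 0 to 5 range proposed for `sevirety`; both are assumptions, not a finalized specification, and the function names are illustrative.

```python
RISK_MIN, RISK_MAX = 0, 5  # "0 is no risk and 5 is high risk", as proposed earlier


def validate_entry(entry):
    """Return a list of problems found in one element of the "data" array."""
    problems = []
    for field in ("author", "organization", "contact"):
        if not entry.get(field):
            problems.append(f"missing {field}")
    record = entry.get("record")
    if not isinstance(record, dict):
        return problems + ["missing record"]
    if not record.get("types"):
        problems.append("record.types must list at least one record type")
    domain = record.get("domain")
    if not isinstance(domain, dict) or not domain.get("name"):
        problems.append("missing record.domain.name")
    sevirety = record.get("sevirety")
    # bool is a subclass of int in Python, so exclude it explicitly
    if (isinstance(sevirety, bool) or not isinstance(sevirety, int)
            or not RISK_MIN <= sevirety <= RISK_MAX):
        problems.append(f"sevirety must be an integer {RISK_MIN}..{RISK_MAX}")
    comment = record.get("comment")
    if not isinstance(comment, dict) or not comment.get("content") or not comment.get("language"):
        problems.append("record.comment needs content and language")
    if not record.get("created"):
        problems.append("missing record.created")
    return problems


def validate_document(doc):
    """Map the index of every invalid data element to its list of problems."""
    return {
        i: problems
        for i, entry in enumerate(doc.get("data", []))
        if (problems := validate_entry(entry))
    }
```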

I'm following you now; what you meant makes sense to me.

"author": "Disney",
"organization": "disney.com",
"contact": "some info"
| Lines | Records covered | Paste | Bytes |
| --- | --- | --- | --- |
| 70 | 3 | P35 | 3044 |
| 91 (70 with VS Code auto line breaking) | 3 | P36 | 3283 |
| 120 | 9 | P33 | 4489 |

How would you ensure that a record found dead by Matrix is deleted at the recipient's end without the CHANGETYPE directive?
How would you add a version (file last changed), aka SOA?

wlevel

It can be wildcarded from domain level n. This is useful for RPZ/Unbound/dnsmasq and other blocking mechanisms that understand wildcarding (not to be confused with RegEx).

Okay, I didn't know that. Let's keep it in.

Is this how you would handle sub-domains that are not to be put into the same category??

Yeah, right, perfect!

How would you ensure that a record found dead by Matrix is deleted at the recipient's end without the CHANGETYPE directive?

I would send a file with "action: remove" or call an API endpoint (e.g. POST /remove)

{
    "DLEF-version": "1.0",
    "metadata of the organization who created this list": "mypdns.org, ecetera pp.",
    "action": "remove",
    "data": [
        {
            "author": "Matrix",
            "organization": "mypdns.org",
            "contact": "Administration url of the matrix crawler or something similar",
            "record": {
                "types": [
                    "A"
                ],
                "domain": {
                    "name": "starwars.com",
                    "format": "plain"
                },
                "categories": [],
                "sevirety": 3,
                "comment": {
                    "content": "Domain isn't active anymore. Automatically removed by Matrix spider/crawler.",
                    "language": "en-US"
                },
...........

It would look as if it had been removed by an author. In the comment, Matrix could write an automatic message such as "Domain isn't active anymore, automatically removed".

How would you add a version (file last changed), aka SOA?

Maybe I misunderstood the question: for the whole file? I intended the metadata section (pseudo-field "metadata of the organization who created this list") to hold a "created" value with the date/time at which the file was generated by the respective organization's backend. If the file isn't pushed to an application, and the application instead pulls it from somewhere every few minutes, it can compare the created time with the last known created time.

Idea for CSV: semicolon-separated fields, with array elements such as "categories" comma-separated. That's Excel-safe.
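A minimal sketch of that CSV variant with Python's `csv` module: fields are separated by semicolons, and the `categories` list is joined with commas inside one cell. The column order and the function names are assumptions based on the fields discussed so far. A comment containing a semicolon is quoted automatically by the `csv` module; a category name containing a comma would still break the round trip.

```python
import csv
import io

# Hypothetical column order, based on the fields discussed in this thread.
FIELDS = ["author", "organization", "contact", "domain", "format",
          "categories", "sevirety", "comment", "language", "documentation"]


def write_dlef_csv(rows):
    """Semicolon-separated fields; the categories list is joined with commas."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter=";", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "categories": ",".join(row["categories"])})
    return out.getvalue()


def read_dlef_csv(text):
    """Inverse of write_dlef_csv: split the categories cell back into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        row["categories"] = row["categories"].split(",") if row["categories"] else []
        rows.append(row)
    return rows
```

Note that every value comes back as a string (for example `sevirety`), so a consumer still has to convert types after reading.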

Did you write Disney assuming that Disney requested a blacklisting but was not the author?

No, because Disney bought the rights to Star Wars from George Lucas and then ruined it with all their "female heroine" spin-offs 😒

It was darkVader.de who activated DeathStar.com 😄

I would send a file with "action: remove" or call an API endpoint (e.g. POST /remove)

That would, at least in my case with an API backend for zone records, mean double the system load. https://docs.powerdns.com/authoritative/http-api/zone.html

Whereas if we keep the changetype or "action", we can do everything in one cycle/process:

Load one JSON file from the server

Convert that one JSON file into usable data for pushing to the API

The same applies to any other system: first load the new/fresh data, then delete the old data. This can be done in one process via the changetype or "action", which should be bound to the individual records.
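The single-pass processing argued for here could look like the sketch below for a PowerDNS backend: each DLEF entry carries its own action, and all entries are translated into one `rrsets` list for a single `PATCH` request to the zone endpoint of the PowerDNS HTTP API, where `changetype` is `REPLACE` or `DELETE`. The per-entry `action` field, the RPZ zone name (borrowed from the `tracking.mypdns.cloud` example in this thread), and the use of `CNAME .` (the RPZ action for NXDOMAIN) for every blocked name are assumptions. Sending the request is left out; the body would go to `/api/v1/servers/localhost/zones/<zone>` with an `X-API-Key` header.

```python
import json

# Hypothetical RPZ zone name, modeled on the tracking.mypdns.cloud example in this thread.
RPZ_ZONE = "tracking.mypdns.cloud."


def to_rrset(entry, zone=RPZ_ZONE, ttl=86400):
    """Translate one DLEF entry into one PowerDNS API rrset.

    entry["action"] is assumed to be "add" or "remove" per record.
    Added names get the RPZ 'CNAME .' action, which answers NXDOMAIN.
    """
    owner = f'{entry["domain"].rstrip(".")}.{zone}'
    if entry["action"] == "remove":
        # DELETE removes all records with this name and type; no records list needed.
        return {"name": owner, "type": "CNAME", "changetype": "DELETE"}
    return {
        "name": owner,
        "type": "CNAME",
        "ttl": ttl,
        "changetype": "REPLACE",
        "records": [{"content": ".", "disabled": False}],
    }


def build_patch_body(entries):
    """One JSON body covering additions and removals for a single PATCH call."""
    return json.dumps({"rrsets": [to_rrset(e) for e in entries]})
```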

I still think this is optimal...

... compare the created-time with the last known created-time

Yep, SOA style 😆 😀 https://docs.powerdns.com/authoritative/dnssec/operational.html#possible-soa-edit-values 😉

It is still a lot of data if we follow this path, compared with the grouping in P33:

domain = id
    key = sub-domain

We can still maintain the "author" data in it, and to me it makes more sense to keep data together so that related data are grouped around each other. Otherwise you risk that evil.deathstar is in the first record and good.deathstar in the last group, which means that if you, as a human, want to review the JSON/CSV, things are floating around unorganized.

Yes, I'm leaning against the grouping as in P33.

Idea for CSV: semicolon-separated fields, with array elements such as "categories" comma-separated

Absolutely

We are getting closer 👍

That would, at least in my case with an API backend for zone records, mean double the system load. https://docs.powerdns.com/authoritative/http-api/zone.html

Whereas if we keep the changetype or "action", we can do everything in one cycle/process:

Load one JSON file from the server

Convert that one JSON file into usable data for pushing to the API

The same applies to any other system: first load the new/fresh data, then delete the old data. This can be done in one process via the changetype or "action", which should be bound to the individual records.

I don't think it would be that much load, but okay, then let's do it like this: {P35}

It is still a lot of data if we follow this path, compared with the grouping in P33:

domain = id
    key = sub-domain

We can still maintain the "author" data in it, and to me it makes more sense to keep data together so that related data are grouped around each other. Otherwise you risk that evil.deathstar is in the first record and good.deathstar in the last group, which means that if you, as a human, want to review the JSON/CSV, things are floating around unorganized.

If we use a group array, we get the following CSV:

group name and metadata; group content
example.com; author, org, contact, domain, domain format, [...], comment, documentation, author, org, contact, domain, domain format, [...], comment, documentation, author, org, contact, domain, domain format, [...], comment, documentation

In Excel:

[Image: grafik.png, the grouped CSV row opened in Excel]

That's not really human readable.
And what if domain.com isn't to be blocked, but only bad.domain.com? With SOA records you always have all subdomains, including those that are not affected by the process. For me it's hard to compare these two things.
So I don't feel good about the grouping. That should be done during file generation (by sorting the entries by name). Then it would again be one entry per line, like:

author; org; contact; domain; domain format; [...]; comment; documentation
author; org; contact; domain; domain format; [...]; comment; documentation
author; org; contact; domain; domain format; [...]; comment; documentation
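Sorting by the reversed labels of each name is one way to get that grouping at generation time while keeping one entry per line: `deathstar.com`, `evil.deathstar.com` and `good.deathstar.com` end up next to each other. The entry shape (a dict with a `domain` key) and the function names are assumptions for illustration.

```python
def reversed_labels(name):
    """Sort key: 'bad.example.org.' -> ('org', 'example', 'bad')."""
    return tuple(reversed(name.rstrip(".").lower().split(".")))


def group_by_sorting(entries):
    """Keep one entry per line, but place related names next to each other."""
    return sorted(entries, key=lambda e: reversed_labels(e["domain"]))


names = ["evil.deathstar.com", "starwars.com", "good.deathstar.com",
         "deathstar.com", "rey.starwars.com"]
for entry in group_by_sorting([{"domain": n} for n in names]):
    print(entry["domain"])
```

A parent domain sorts directly before its subdomains because a shorter tuple that is a prefix of a longer one compares as smaller.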

That would, at least in my case with an API backend for zone records, mean double the system load.

Please keep in mind that in the normal daily process the domain list would contain a diff covering the last x days, not the whole list. And the whole list (for the initial setup) would contain no removal entries, because it is applied to a green field. So the resources needed to process the list would not be that large in production; only the first setup needs the whole file and those resources.
That's a different approach from the current one: "I read the whole file from another project, throw it into the pyfunceble mixer together with many others and remove duplicates".
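A sketch of the diff-based workflow, assuming each diff entry carries a hypothetical "action" field ("add" or "remove") next to its "domain"; neither the field name nor the functions are part of the agreed format. The green-field setup is just the same operation applied to an empty list.

```python
from typing import Dict, Iterable


# Hypothetical DLEF diff entries: "action" ("add" or "remove") is an assumed
# field name, and the local list is keyed by the "domain" field.
def apply_diff(current: Dict[str, dict], diff: Iterable[dict]) -> Dict[str, dict]:
    """Return a new list with one daily diff applied; `current` is untouched."""
    updated = dict(current)
    for entry in diff:
        domain = entry["domain"]
        action = entry["action"]
        if action == "add":
            updated[domain] = entry
        elif action == "remove":
            updated.pop(domain, None)  # removing an unknown domain is harmless
        else:
            raise ValueError(f"unknown action {action!r} for {domain}")
    return updated


def initial_setup(full_list: Iterable[dict]) -> Dict[str, dict]:
    """Green-field setup: the full list has no removals, so start empty."""
    return apply_diff({}, full_list)
```

Only the entries in the diff are touched, so the daily cost scales with the size of the diff rather than with the whole list.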

I just created a style sheet with the current state

In all this we have completely forgotten one very, very important field: value.

Living example from T590 trafficmanager.net

"record": {
	        "types": ["CNAME"],
	        "domain": {
	          "name": "vscode-sync.trafficmanager.net",
	          "value": "waws-prod-am2-325.cloudapp.net",
	          "format": "wildcard, regex, plain",
	          "wlevel": null
	        }
}
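| id | domain_id | name | type | content | ttl | prio | disabled | ordername | auth |
|----|-----------|------|------|---------|-----|------|----------|-----------|------|
| 6406064 | <ID> | vscode-sync.trafficmanager.net.tracking.mypdns.cloud | CNAME | waws-prod-am2-325.cloudapp.net | 86400 | 0 | 0 | | 1 |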
{
"records": [
	{
		"id" : 6406064,
		"domain_id" : <ID>,
		"name" : "vscode-sync.trafficmanager.net.tracking.mypdns.cloud",
		"type" : "CNAME",
		"content" : "waws-prod-am2-325.cloudapp.net",
		"ttl" : 86400,
		"prio" : 0,
		"disabled" : 0,
		"ordername" : null,
		"auth" : 1
	}
]}
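A sketch of how the "record" above could be turned into rows like the one in the table. The columns look like the PowerDNS generic SQL `records` table; that, the default ttl of 86400 and the fixed prio/disabled/auth values are assumptions copied from this single example, and `record_to_rows` is a hypothetical name. The `zone` and `domain_id` come from the receiving side, and "id" is left for the database to assign.

```python
from typing import Dict, List


def record_to_rows(record: dict, zone: str, domain_id: int,
                   ttl: int = 86400) -> List[Dict[str, object]]:
    """Turn one DLEF "record" into rows for a PowerDNS-style records table."""
    name = record["domain"]["name"].rstrip(".")
    target = record["domain"]["value"].rstrip(".")
    return [
        {
            "domain_id": domain_id,
            # the zone suffix (e.g. tracking.mypdns.cloud) is appended locally
            "name": f"{name}.{zone.rstrip('.')}",
            "type": rtype,
            "content": target,
            "ttl": ttl,
            "prio": 0,
            "disabled": 0,
            "ordername": None,
            "auth": 1,
        }
        for rtype in record["types"]
    ]
```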

I am completely lost. What do you mean?

After a lot of discussion in another communication channel, it has been clarified that the suggestion xyz means a mapping for a CNAME manipulation rule. It is meant for use in RPZ rules and maps vscode-sync.trafficmanager.net.tracking.mypdns.cloud CNAME waws-prod-am2-325.cloudapp.net., which bypasses some wildcard-blacklisted entries. So it is a whitelisting / exemption. See https://www.mypdns.org/w/dns/dns_query_flow_with_rpz_simplified/ for more information.
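For comparison, the same rule written as a line in an RPZ zone file, where the owner name is the queried name with the policy zone appended. A sketch only: treating tracking.mypdns.cloud as the policy zone name is an assumption taken from the example above, and `rpz_cname_rule` is a hypothetical helper.

```python
def rpz_cname_rule(trigger: str, target: str, rpz_zone: str,
                   ttl: int = 86400) -> str:
    """Build one RPZ zone-file line answering `trigger` with a CNAME to `target`."""
    # Owner name = queried name + policy zone, written fully qualified
    owner = f"{trigger.rstrip('.')}.{rpz_zone.rstrip('.')}."
    return f"{owner} {ttl} IN CNAME {target.rstrip('.')}."


print(rpz_cname_rule("vscode-sync.trafficmanager.net",
                     "waws-prod-am2-325.cloudapp.net",
                     "tracking.mypdns.cloud"))
```

If only an exemption is wanted without a redirect, RPZ also defines the special target `rpz-passthru.`, which lets the original answer through.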

CNAME manipulation data are generated by pyfunceble. They are a snapshot of live data in the DNS system, which changes dynamically. As CNAME manipulation rules are a static redirect to another target, invalid, expired or dead CNAME targets will cause errors. Conversely, blacklisting a domain that no longer exists doesn't hurt. So I have serious reservations about including redirect rules in the DLEF, and would like to propose that each project/organization generates its own CNAME manipulation rules (or fetches them from Matrix via another format specified specifically for Matrix-generated data).
So the DLEF files are used to exchange between projects who (un)blocked or (un)whitelisted a domain, when and why (before pyfunceble takes action = input data from pyfunceble's view), while another format like RPZ is used specifically for lists that have been processed and aggregated with live DNS data (after pyfunceble takes action = output data from pyfunceble's view).

@Spirillen can you tell me what exactly is meant by the wildcard level? I cannot find anything about levels for wildcard expressions on the web.

For the record, a wlevel example:
bad.ads.adult.domain.com shall be blocked by wildcard with wlevel 3. This means the wildcard sits at the 3rd domain level -> *.domain.com

The advantage here is that we can use the labels in front of domain.com for explicit blocking in formats that don't support wildcards, like the hosts file under Windows.
Example:

0.0.0.0 bad.ads.adult.domain.com
0.0.0.0 ads.adult.domain.com
0.0.0.0 adult.domain.com

If wlevel is 0 or not set, the given domain will be wildcarded as a whole (*.bad.ads.adult.domain.com).
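A sketch of the wlevel rules as described above, assuming levels are counted from the right (com = 1, domain = 2, so the "*" takes level 3). It produces both the wildcard pattern and the explicit hosts lines for formats without wildcard support. The function names and the range check (level 1 would be "*" alone) are assumptions, not part of the XSD.

```python
from typing import List, Optional


def _labels(name: str, wlevel: Optional[int]) -> List[str]:
    labels = name.rstrip(".").split(".")
    # Level 1 would be "*" alone; levels deeper than the name make no sense
    if wlevel and not 2 <= wlevel <= len(labels):
        raise ValueError(f"wlevel {wlevel} out of range for {name}")
    return labels


def wildcard_for(name: str, wlevel: Optional[int]) -> str:
    """Wildcard pattern; wlevel counts domain levels from the right."""
    labels = _labels(name, wlevel)
    if not wlevel:  # 0 or not set: wildcard the whole given name
        return "*." + ".".join(labels)
    # "*" occupies level `wlevel`, so the wlevel - 1 rightmost labels stay
    return "*." + ".".join(labels[len(labels) - (wlevel - 1):])


def hosts_entries(name: str, wlevel: Optional[int],
                  sink: str = "0.0.0.0") -> List[str]:
    """Explicit hosts-file lines, from the full name down to level wlevel."""
    labels = _labels(name, wlevel)
    lowest = wlevel if wlevel else len(labels)
    return [f"{sink} {'.'.join(labels[-k:])}"
            for k in range(len(labels), lowest - 1, -1)]
```

With wlevel 3, `hosts_entries("bad.ads.adult.domain.com", 3)` yields the three hosts lines from the example, and `wildcard_for` yields *.domain.com.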

I have updated the XSD and renamed "wlevel" to "wildcard-domain-level".

> If wlevel is 0 or not set then the given domain will be wildcarded as whole (*.bad.ads.adult.domain.com)

If the "wildcard-domain-level" is 0 or not present, I feel uneasy about wildcarding it, as that means this record IS, and shall be matched as:

bad.ads.adult.domain.com == bad.ads.adult.domain.com no more, no less

Example

If we instead block with the wildcard *.bad.ads.adult.domain.com, we would also be blocking good-subdomain.bad.ads.adult.domain.com.


Therefore we don't set the field <format> to the enum "WILDCARD_DOMAIN_LEVEL" but to "PLAIN".
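The difference between the two formats can be shown with a tiny matcher. This sketch assumes DNS-style wildcard semantics, where "*.x" covers every name below x but not x itself; `matches` is a hypothetical helper, not something defined by the format.

```python
def matches(query: str, rule: str) -> bool:
    """Plain rules match exactly; "*.x" rules match names below x only."""
    q = query.lower().rstrip(".")
    r = rule.lower().rstrip(".")
    if r.startswith("*."):
        # r[1:] keeps the leading dot, so "xbad.ads..." cannot match
        return q.endswith(r[1:])
    return q == r


plain = "bad.ads.adult.domain.com"
wild = "*.bad.ads.adult.domain.com"
print(matches("good-subdomain." + plain, plain))  # plain: no collateral block
print(matches("good-subdomain." + plain, wild))   # wildcard: blocked as well
```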

But if the format is WILDCARD and the <wildcard-domain-level> integer isn't set or is 0, we can assume that the author wants to say "everything that matches the string above".

If by "above" you are thinking that, e.g., ads.adult.domain.com can be blocked as well, then I'd say no.

I would rather say: IF <wildcard-domain-level> is not > 0, then <format>wildcard</format> is the wrong selection and we treat it as if it were <format>plain</format> => it was mis-formatted, so we fail safe.

I would do this to leave room for human mistakes.

> If by "above" you are thinking that, e.g., ads.adult.domain.com can be blocked as well, then I'd say no.

Of course not, I meant the following:

	        <domain>
	          <name>bad.ads.adult.domain.com</name>
	          <format>WILDCARD</format>
	          <wildcard-domain-level>3</wildcard-domain-level>
	        </domain>

results in:
*.domain.com

	        <domain>
	          <name>bad.ads.adult.domain.com</name>
	          <format>WILDCARD</format>
	        </domain>

results in:
*.bad.ads.adult.domain.com

	        <domain>
	          <name>bad.ads.adult.domain.com</name>
	          <format>PLAIN</format>
	        </domain>

results in:
bad.ads.adult.domain.com
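The three cases above can be sketched as a small resolver for one <domain> element. The `failsafe` switch is a hypothetical parameter that models the alternative proposed earlier (WILDCARD without a level > 0 is treated as PLAIN); defaulting a missing <format> to PLAIN and the level range check are assumptions as well.

```python
import xml.etree.ElementTree as ET


def resolve(domain_xml: str, failsafe: bool = False) -> str:
    """Turn one <domain> element into the name or wildcard to block.

    failsafe=False: WILDCARD without a level wildcards the whole name.
    failsafe=True:  WILDCARD without a level > 0 is treated as PLAIN.
    """
    el = ET.fromstring(domain_xml)
    name = (el.findtext("name") or "").strip().rstrip(".")
    if not name:
        raise ValueError("<domain> without <name>")
    fmt = (el.findtext("format") or "PLAIN").strip().upper()
    level_text = (el.findtext("wildcard-domain-level") or "").strip()
    level = int(level_text) if level_text else 0

    if fmt == "PLAIN":
        return name
    if fmt != "WILDCARD":
        raise ValueError(f"unsupported format {fmt!r}")
    labels = name.split(".")
    if level > 0:
        if not 2 <= level <= len(labels):
            raise ValueError(f"level {level} out of range for {name}")
        # "*" takes position `level`, counted from the right
        return "*." + ".".join(labels[len(labels) - (level - 1):])
    return name if failsafe else "*." + name
```

In this sketch <wildcard-domain-level> is only read when <format> is WILDCARD, so the child never overrides its parent.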

As these files are processed by applications, I don't see many places where a human mistake could happen, so we don't need a double check. A mistake would be noticed on the author's side as soon as their system suddenly blocks by wildcard. It also makes implementation easier, as most projects want to block *.foo.bar when they write foo.bar, so <wildcard-domain-level> would be at its maximum in most cases. That's why, in my opinion, it's better to set <wildcard-domain-level> only when it is not a "normal" wildcard.
If we turned it around, an existing <wildcard-domain-level> would cause <format> to be ignored: the sub-property <wildcard-domain-level> would have a higher priority than its parent.

Hey Folks

While reading up on a topic in @jawz101's https://github.com/AdAway/adaway.github.io, and in relation to {T2632}, I started wondering: does the current layout take into consideration that in some cases an exchange would only be desirable if the record to exchange matches mobile devices?

What do you mean by "mobile devices"? I would solve this via a category "Mobile/Desktop/IoT".

> I would solve this via category "Mobile/Desktop/IoT"

And then do it as a sub-categorization/classification in the backend(s) 🤔 I'm indeed open to that suggestion.

> What do you mean with "mobile devices"?

Well, the mentioned project is a hosts file which only aims to keep records relevant to mobile devices, as opposed to, e.g., UHB, which is a list targeting all operating systems and hence also a much bigger list.
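A sketch of how a category such as "Mobile/Desktop/IoT" could drive a mobile-only export. The "categories" field and the choice to skip untagged records are assumptions; a project might just as well treat untagged records as applying to every category.

```python
from typing import Iterable, List, Set


def export_hosts(records: Iterable[dict], wanted: Set[str],
                 sink: str = "0.0.0.0") -> List[str]:
    """Keep only records tagged with one of the wanted categories."""
    lines = []
    for rec in records:
        # "categories" is a hypothetical field; untagged records are skipped
        if wanted & set(rec.get("categories", [])):
            lines.append(f"{sink} {rec['domain']}")
    return sorted(lines)
```

A mobile-focused hosts file would call this with {"Mobile"}, while an all-OS list like UHB would simply export everything.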

I can proudly announce that the suggestion for how to distribute information across projects has been fully implemented in the ChongLuaDao project.

This was done on 8 March 2021. We at the Matrix project would like to thank everyone who contributed to this, not least @7onez and his team, for helping spread the good initiative by @Somebodyisnobody.