lazydns

Domain Matching Rules Guide

Overview

The Domain Set plugin in lazydns supports sophisticated domain name matching with multiple rule types and priority-based evaluation. This document describes the complete domain matching rule system.

Quick Start

Basic Examples

# Load a domain list file with default (domain) matching
- name: domain_set
  tag: direct
  args:
    files:
      - direct-list.txt

# With specific match type
- name: domain_set
  tag: gfw
  args:
    files:
      - gfw-list.txt
    default_match_type: domain
    auto_reload: true

Rule Formats in Files

# Comments start with #
# This is a comment

# Exact match only
full:google.com

# Domain match (default)
domain:example.com
# No prefix = uses default_match_type
example.com

# Keyword substring match
keyword:facebook

# Regular expression match
regexp:.*\.google\.com$

# Empty lines are ignored
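The line format above can be sketched as a small parser. This is a minimal illustration, not lazydns's own parser. The local `MatchType` enum and the `parse_line` function are hypothetical. The sketch also assumes two things: comments occupy whole lines, and a missing or unrecognized prefix falls back to the configured default type.

```rust
/// Local stand-in for the rule types described above (hypothetical, not the lazydns enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchType {
    Full,
    Domain,
    Keyword,
    Regexp,
}

/// Parse one line of a rule file into (match type, value).
/// Returns None for empty lines and whole-line comments.
fn parse_line(line: &str, default: MatchType) -> Option<(MatchType, String)> {
    // Leading and trailing whitespace is ignored.
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Recognize a known prefix; anything else uses the default match type.
    // Only the first ':' is split on, so regexp values may contain colons.
    let (kind, value) = match line.split_once(':') {
        Some(("full", v)) => (MatchType::Full, v),
        Some(("domain", v)) => (MatchType::Domain, v),
        Some(("keyword", v)) => (MatchType::Keyword, v),
        Some(("regexp", v)) => (MatchType::Regexp, v),
        _ => (default, line),
    };
    Some((kind, value.to_string()))
}
```

For example, `parse_line("example.com", MatchType::Domain)` yields a domain rule. The same line parsed with `MatchType::Keyword` as the default becomes a keyword rule, mirroring `default_match_type`.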

Match Types

1. Full Match (`full:`)

Exact domain matching only, no subdomains.

Syntax: full:example.com
Matches: example.com, EXAMPLE.COM (case-insensitive)
Does NOT match: www.example.com, sub.example.com, example.com.hk
Performance: O(1) - constant time lookup
Use case: Block specific exact domains, whitelist specific services

Examples

full:google.com         → matches only "google.com"
full:api.github.com     → matches only "api.github.com"
                        → does NOT match "github.com" or "www.api.github.com"
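Full matching amounts to a hash-set lookup after normalization. The sketch below assumes normalization means three steps: trimming, dropping a trailing dot, and ASCII-lowercasing, as described under Troubleshooting. `FullRules` and `normalize` are illustrative names, not the lazydns API.

```rust
use std::collections::HashSet;

/// Normalize a rule or query: trim whitespace, drop one trailing dot, lowercase.
/// (Assumed normalization, for illustration.)
fn normalize(name: &str) -> String {
    let name = name.trim();
    name.strip_suffix('.').unwrap_or(name).to_ascii_lowercase()
}

/// Exact-match rule set: one hash lookup per query, no subdomain handling.
struct FullRules {
    names: HashSet<String>,
}

impl FullRules {
    fn new<'a>(rules: impl IntoIterator<Item = &'a str>) -> Self {
        FullRules {
            names: rules.into_iter().map(normalize).collect(),
        }
    }

    fn matches(&self, query: &str) -> bool {
        self.names.contains(&normalize(query))
    }
}
```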

2. Domain Match (`domain:`)

Match domain and all its subdomains.

Syntax: domain:example.com or just example.com (uses default)
Matches: example.com, www.example.com, api.example.com, a.b.c.example.com
Does NOT match: notexample.com, example.com.hk, examplecom
Performance: O(d) - at most one lookup per label, so linear in the domain depth d (number of labels)
Use case: Block entire domain hierarchies (most common use)

Subdomain Priority

When multiple domain rules could match, the most specific (longest) rule wins:

Rules: com, example.com, api.example.com

Query www.example.com:
  Matches api.example.com? No
  Matches example.com? Yes ✓ (return true)
  com would also match, but the more specific example.com was found first

Query api.example.com:
  Matches api.example.com? Yes ✓ (return true)

Query other.com:
  Matches api.example.com? No
  Matches example.com? No
  Matches com? Yes ✓ (return true)

Examples

domain:google.com       → matches google.com, www.google.com, maps.google.com, etc.
domain:co.uk            → matches all .co.uk domains
example.com             → equivalent to domain:example.com (if default is domain)
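One way to get the "longest rule wins" behaviour is to walk the query's label-aligned suffixes from longest to shortest and look each one up in a hash set. That walk is also where the O(d) cost comes from. This is a sketch under that assumption. `SuffixSet` is a hypothetical name, and lazydns may use a different structure, such as a label trie.

```rust
use std::collections::HashSet;

/// Domain rules stored as normalized names (hypothetical structure for illustration).
struct SuffixSet {
    rules: HashSet<String>,
}

impl SuffixSet {
    fn new<'a>(rules: impl IntoIterator<Item = &'a str>) -> Self {
        let rules = rules
            .into_iter()
            .map(|r| r.trim_end_matches('.').to_ascii_lowercase())
            .collect();
        SuffixSet { rules }
    }

    /// Return the most specific rule matching `query`, if any.
    /// Suffixes are tried longest first, so the first hit is the longest rule;
    /// a name with d labels costs at most d hash lookups.
    fn best_match(&self, query: &str) -> Option<&str> {
        let name = query.trim_end_matches('.').to_ascii_lowercase();
        let mut suffix = name.as_str();
        loop {
            if let Some(rule) = self.rules.get(suffix) {
                return Some(rule.as_str());
            }
            // Drop the leftmost label: "www.example.com" -> "example.com".
            match suffix.split_once('.') {
                Some((_, parent)) => suffix = parent,
                None => return None,
            }
        }
    }

    fn matches(&self, query: &str) -> bool {
        self.best_match(query).is_some()
    }
}
```

Because lookups only happen at label boundaries, `notexample.com` and `example.com.hk` never match a `example.com` rule.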

3. Keyword Match (`keyword:`)

Substring/keyword matching anywhere in the domain.

Syntax: keyword:google
Matches: google.com, www.google.com, google.com.hk, mygoogle.net, my-google-service.org
Does NOT match: gogle.com (typo), goo.gl (keyword not present as a substring)
Performance: O(n) - linear scan over the n keyword rules
Evaluation order: Import order (first match wins)
Use case: Catching variants and any domain that contains a keyword; broad but less precise
Warning: Can produce false positives (e.g., keyword:ad matches add.com, advertisement.com, badword.com)

Examples

keyword:facebook        → matches facebook.com, www.facebook.com, facebook.com.cn, myfacebook.net, etc.
keyword:google          → matches google.com, mygoogle.com, google.com.hk, googlechrome.com, etc.
keyword:cdn             → matches cdn.com, mycdn.net, ocdn.org, etc. (be careful!)
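Keyword matching is a linear scan of substring checks in import order. The hypothetical `first_keyword_match` below returns the index of the first matching keyword. That makes both the first-match-wins order and the false positives of short keywords easy to see.

```rust
/// Return the index of the first keyword (in import order) that occurs
/// anywhere in `query`, or None. Cost is one substring check per rule.
fn first_keyword_match(keywords: &[&str], query: &str) -> Option<usize> {
    let name = query.to_ascii_lowercase();
    keywords
        .iter()
        .position(|kw| name.contains(kw.to_ascii_lowercase().as_str()))
}
```

Lowercasing every keyword on every query is wasteful. A real implementation would normalize rules once, at load time.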

4. Regexp Match (`regexp:`)

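Regular expression pattern matching uses Rust regex syntax. This closely follows the RE2-style syntax of Go's standard regexp package; neither supports lookaround or backreferences.

Syntax: regexp:^[a-z]+\.google\.com$
Pattern: Standard Rust regex syntax. Patterns are unanchored, so they match anywhere in the domain unless you use ^ or $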
Performance: O(n·regex_complexity) - can be CPU-intensive with complex patterns
Evaluation order: Import order (first match wins)
Use case: Complex pattern matching, flexible rules

Regex Basics

Pattern	Matches	Does NOT match
`.+\.google\.com$`	`www.google.com`, `maps.google.com`	`google.com` (no prefix)
`^google\.`	`google.com`, `google.co.uk`	`www.google.com`
`(baidu|google)`	`baidu.com`, `google.com`, `notbaidu.com` (unanchored)	`bing.com`
`test-[0-9]+`	`test-123.com`, `test-1.org`	`test-abc.com`

Examples

regexp:.+\.github\.io$          → matches *.github.io (personal GitHub Pages)
regexp:^api\.                   → matches api.example.com, api.service.com, etc.
regexp:(qq|wechat)              → matches qq.com, wechat.com (and any domain containing "qq" or "wechat")
regexp:.*cdn.*                  → matches any domain containing "cdn" (same result as the cheaper keyword:cdn)

Performance Warning

Regexp matching is CPU-intensive, especially with:

Large or complex patterns (long alternations, large repetition counts)
Overlapping quantifiers (.*.*, .+.+)
Large number of rules

Best practices:

Use simpler alternatives when possible (full/domain match)
Avoid complex patterns with many regexp rules
Order rules by likelihood of match (common patterns first)
Use anchors ^ and $ to improve performance

Matching Priority

Rules are evaluated in strict priority order. The first matching rule determines the result.

Priority Order (Highest to Lowest)

Full > Domain > Regexp > Keyword

Example

Rules:
  - full:example.com
  - domain:example.com
  - keyword:example
  - regexp:.*example.*

Query example.com:

1. Check Full rules → matches full:example.com ✓ RETURN TRUE
   (Never reaches Domain, Regexp, or Keyword checks)

Query sub.example.com:

1. Check Full rules → no match
2. Check Domain rules → matches domain:example.com ✓ RETURN TRUE
   (Never reaches Regexp or Keyword checks)

Query myexample.org:

1. Check Full rules → no match
2. Check Domain rules → no match
3. Check Regexp rules → matches .*example.* ✓ RETURN TRUE
   (Never reaches Keyword check)
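The priority order can be sketched as one function that checks each rule type in turn and returns on the first hit. Everything here is hypothetical: `Matcher`, `Hit` and the `with_*` builders. Rust's standard library has no regex engine, so regexp rules are modelled as predicate closures that stand in for compiled patterns.

```rust
use std::collections::HashSet;

/// Which rule type matched; indices are import order within that type.
#[derive(Debug, PartialEq)]
enum Hit {
    Full,
    Domain,
    Regexp(usize),
    Keyword(usize),
}

/// Hypothetical combined matcher. Regexp rules are stand-in predicates.
#[derive(Default)]
struct Matcher {
    full: HashSet<String>,
    domain: HashSet<String>,
    regexp: Vec<Box<dyn Fn(&str) -> bool>>,
    keyword: Vec<String>,
}

impl Matcher {
    fn with_full(mut self, rule: &str) -> Self {
        self.full.insert(rule.to_string());
        self
    }
    fn with_domain(mut self, rule: &str) -> Self {
        self.domain.insert(rule.to_string());
        self
    }
    fn with_regexp(mut self, pred: impl Fn(&str) -> bool + 'static) -> Self {
        self.regexp.push(Box::new(pred));
        self
    }
    fn with_keyword(mut self, rule: &str) -> Self {
        self.keyword.push(rule.to_string());
        self
    }

    /// Check rule types in priority order Full > Domain > Regexp > Keyword
    /// and stop at the first hit. `query` is assumed to be normalized already.
    fn first_hit(&self, query: &str) -> Option<Hit> {
        if self.full.contains(query) {
            return Some(Hit::Full);
        }
        // Domain: the name itself, then each parent domain.
        let mut suffix = query;
        loop {
            if self.domain.contains(suffix) {
                return Some(Hit::Domain);
            }
            match suffix.split_once('.') {
                Some((_, parent)) => suffix = parent,
                None => break,
            }
        }
        // Regexp rules in import order, then keyword rules in import order.
        if let Some(i) = self.regexp.iter().position(|re| re(query)) {
            return Some(Hit::Regexp(i));
        }
        self.keyword
            .iter()
            .position(|kw| query.contains(kw.as_str()))
            .map(Hit::Keyword)
    }
}
```

With the four rules from the example above, `example.com` stops at Full, `sub.example.com` stops at Domain, and `myexample.org` stops at Regexp.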

Performance Characteristics

Time Complexity

Match Type	Complexity	Notes
Full	O(1)	HashMap lookup
Domain	O(d)	d = domain depth, typically 3-4
Regexp	O(n·r)	n = rules, r = regex complexity
Keyword	O(n·s)	n = rules, s = string length

Space Complexity (Approximate)

Match Type	Memory per 10,000 rules
Full	~1 MB
Domain	~1 MB
Regexp	~2-5 MB (includes compiled regex)
Keyword	~0.5-1 MB

Benchmarks

Rule set size: 100,000 domains

Match type    | Avg Query Time | Remarks
-----------------------------------
Full          | < 1 µs         | Instant
Domain        | < 5 µs         | Very fast
Regexp        | 100-1000 µs    | Slow with complex patterns
Keyword       | 50-500 µs      | Linear scan

Rule Evaluation Order

Full and Domain Rules

At most one full rule can match a given name. When several domain rules could match, the most specific (longest) one wins:

Rules:
  - domain:com
  - domain:example.com
  - domain:api.example.com

Query api.example.com:
  Evaluation: Longest match wins
  → Matches api.example.com (most specific) ✓

Regexp and Keyword Rules

Within each of these two types, rules are evaluated in import order (file order) and the first match wins. All regexp rules are checked before any keyword rule:

Rules (in order):
  - regexp:google
  - regexp:.*oogle
  - keyword:abc

Query "google.com":
  → Matches first regexp:google ✓ (returns true immediately)
  → Never evaluates remaining rules

Configuration Examples

Basic Configuration

- name: domain_set
  tag: direct
  args:
    files:
      - direct-list.txt

This uses the default match type, domain: rules without a prefix match the domain and all its subdomains.

With Custom Default Match Type

- name: domain_set
  tag: gfw
  args:
    files:
      - gfw.txt
    default_match_type: keyword
    auto_reload: true

All rules without a prefix will use keyword matching.

Multiple Files with Auto-Reload

- name: domain_set
  tag: combined
  args:
    files:
      - blocklist.txt
      - custom-domains.txt
      - regex-patterns.txt
    default_match_type: domain
    auto_reload: true
    # Auto-reload checks files every ~200ms

Inline Domain Expressions (exps Parameter)

You can also specify domain rules inline with the exps parameter, instead of or in addition to external files:

Single rule (string format):

- name: domain_set
  tag: direct
  args:
    exps: "example.com"

Multiple rules (array format):

- name: domain_set
  tag: combined
  args:
    exps:
      - "example.com"
      - "full:github.com"
      - 'regexp:.+\.google\.com$'
      - "keyword:facebook"

Mixed files and inline expressions:

- name: domain_set
  tag: comprehensive
  args:
    files:
      - blocklist.txt
    exps:
      - "example.com"
      - "full:special.service.com"
      - 'regexp:^internal-.*\.local$'
    default_match_type: domain
    auto_reload: true

The exps parameter supports the same rule format as files:

Prefix with full:, domain:, keyword:, or regexp: for specific match types
Rules without prefix use the default_match_type
All inline rules are processed after file rules
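The merge of the two rule sources can be sketched as follows. The `Exps` enum models the two YAML shapes shown above: a single string or a list. `Exps` and `collect_rules` are hypothetical. The only behaviour taken from this guide is that inline rules come after file rules.

```rust
/// Hypothetical model of the two YAML shapes `exps` accepts.
enum Exps {
    One(String),
    Many(Vec<String>),
}

/// Combine rule lines: file rules first, inline `exps` rules after them.
fn collect_rules(file_lines: &[&str], exps: Option<Exps>) -> Vec<String> {
    let mut rules: Vec<String> = file_lines.iter().map(|s| s.to_string()).collect();
    match exps {
        Some(Exps::One(rule)) => rules.push(rule),
        Some(Exps::Many(list)) => rules.extend(list),
        None => {}
    }
    rules
}
```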

Recommended File Format

# Direct access (fast, no censorship)
# Domain format (matches subdomains)
domain:example.com
github.com
www.wikipedia.org

# Exact matches for specific services
full:dns.google

# Keywords for broad categories
keyword:cdn

# Complex patterns
regexp:.+\.local$
regexp:^192-168-.*\.nip\.io$

File Format

Supported Formats

Domain (default)
```
example.com
sub.example.com
```

With Prefix

full:exact.com
domain:parent.com
keyword:google
regexp:.+\.example\.com$

Comments

# This is a comment
# Comments must be on their own line
example.com

Whitespace

Leading and trailing whitespace is trimmed
  example.com     →  matches "example.com"

Empty Lines

Empty lines are silently ignored.

Example File

# Direct access domains (no blocking)
# Updated: 2024-12-26

# GitHub and services
github.com
www.github.io
api.github.com

# Exact services
full:dns.google.com
full:8.8.8.8

# CDN and infrastructure
keyword:cdn
keyword:cloudflare

# Bot and crawler hostnames
regexp:.*bot.*
regexp:.*crawler.*

# Personal domains (domain: already covers all subdomains; no * wildcard needed)
domain:example.com

Best Practices

1. Rule Organization

Group rules by type, in the order they are evaluated:

Full matches (most specific, fastest)
Domain matches (common case)
Regexp patterns (complex logic)
Keyword matches (broad patterns)

2. Performance Optimization

Use Full/Domain matches for known domains (O(1) and O(d) lookups respectively)
Place frequently matched rules early in Regexp/Keyword sections
Avoid large numbers of complex Regexp rules
Don’t use Keyword matches for everything (use Domain instead)

3. Accuracy vs Coverage

High accuracy: Use full: and specific domain: rules
Coverage: Use keyword: and regexp: patterns
Balance: Mix all types appropriately for your use case

4. Maintenance

Organize rules by category with comments
Use auto_reload: true for frequently updated lists
Test rule changes (benchmark and functional tests)
Document complex regexp patterns

Troubleshooting

Rule Not Matching

Ignore case: all rules are matched case-insensitively
Verify prefix is correct (full:, domain:, etc.)
Check trailing dots (automatically normalized)
Verify rule priority (check previous rules)

Example:

# These all match "example.com":
domain:example.com
DOMAIN:EXAMPLE.COM
example.com.              # trailing dot normalized
Example.Com              # case normalized

# These do NOT match "example.com":
full:www.example.com     # full requires exact match
domain:www.example.com   # covers www.example.com and its subdomains, not the parent

Performance Issues

Slow query matching: Check Regexp/Keyword rule count
Large memory usage: Consider splitting into multiple file sets
File reload delays: Auto-reload debounce is 200ms (configurable)

Debugging

Enable tracing to see matching details:

RUST_LOG=debug lazydns

API Reference

Rust Code

```rust
use lazydns::plugins::dataset::{DomainRules, MatchType};

fn main() {
    let mut rules = DomainRules::new();

    // Add rules by type
    rules.add_rule(MatchType::Full, "exact.com");
    rules.add_rule(MatchType::Domain, "example.com");
    rules.add_rule(MatchType::Keyword, "google");
    rules.add_rule(MatchType::Regexp, r".+\.github\.io$");

    // Parse a rule line in file format
    rules.add_line("domain:test.com", MatchType::Domain);

    // Check matches
    assert!(rules.matches("exact.com")); // full
    assert!(rules.matches("sub.example.com")); // domain (subdomain)
    assert!(rules.matches("test-google.com")); // keyword
    assert!(rules.matches("mysite.github.io")); // regexp
}
```

Statistics

```rust
// `rules` is the DomainRules instance built in the previous example
let stats = rules.stats();
println!("Full: {}", stats.full_count);
println!("Domain: {}", stats.domain_count);
println!("Regexp: {}", stats.regexp_count);
println!("Keyword: {}", stats.keyword_count);
```
