$ cd /engineering && cat ./every-branch-condition-compiles-to-a-dfa.md
Formal MethodsEngineering
> Every Branch Condition Compiles to a DFA
Robert Chiniquy
||4 min read
share:
width:
Every branch condition in every connector – CIDR ranges, wildcard ARN patterns, timestamp comparisons, VPC checks – compiles to a deterministic finite automaton. This lets the verification framework predict which code paths an input will exercise without running the binary.
This is part of a series on formally verifying identity connectors. The coverage-guided verification needs to know which branches a given input configuration will exercise. Running the connector for every candidate configuration is too slow. Instead, the framework compiles branch conditions to DFAs and evaluates them directly.
Terms
Branch conditions start as structured terms. The system represents every predicate as a node in a hash-consed term store:
type TermID uint32
type Term struct {
Op Operator
Children []TermID
}
type TermStore struct {
terms map[TermID]Term
hashcons map[string]TermID // Structural sharing
parent map[TermID]TermID // Union-find for equivalence
}
Hash-consing means structurally identical terms share the same ID. If two different branches in two different connectors both check AWSMFARequired() AND AWSInVPC("vpc-12345"), they get the same TermID. The union-find tracks equivalences discovered during e-graph saturation.
A branch condition like “MFA is required AND the request is from this VPC” becomes:
Plus domain-specific rules that go beyond Boolean algebra:
StringEquals(k, v1) AND StringEquals(k, v2) → False when v1 ≠ v2
IpAddress(k, cidr1) AND IpAddress(k, cidr2) → IpAddress(k, intersect(cidr1, cidr2))
NumericLessThan(k, v1) AND NumericGreaterThan(k, v2) → False when v1 ≤ v2
The CIDR intersection rule is where this gets specific. The system computes the relationship between two CIDR ranges:
type CIDRRelation int
const (
CIDRDisjoint // Non-overlapping: 10.0.0.0/8 vs 172.16.0.0/12
CIDREqual // Identical
CIDRSubset // First ⊂ Second: 10.0.1.0/24 ⊂ 10.0.0.0/16
CIDRSuperset // First ⊃ Second
CIDROverlap // Partial overlap
)
If two conditions require the request to come from disjoint CIDR ranges, the conjunction is False – the branch is unreachable. If one range is a subset of the other, the conjunction simplifies to the more specific range. These simplifications happen before DFA compilation, reducing the automaton’s state count.
Compilation
After rewriting, each simplified term compiles to a DFA through standard constructions:
Each leaf predicate becomes a two-state automaton (accept/reject)
And becomes product construction (both must accept)
Or becomes union (either accepts)
Not becomes complementation (swap accept/reject states)
This works even for conditions that look like they’d need an infinite alphabet – CIDR ranges, wildcard ARN patterns, timestamp comparisons. In practice, the domain of values that appears in any given connector’s configuration is always finite. A CIDR range is a set of prefixes. An ARN wildcard is a regular expression over a known set of resource identifiers. Timestamps reduce to ordered comparisons over discrete boundaries. All compile through the same product construction.
Symbolic automata are the theoretical framework – transitions labeled with guards over potentially infinite alphabets, where product construction computes guard intersections instead of iterating over symbols. D’Antoni and Veanes’s work on symbolic automata theory is the formal underpinning. The implementation leans into this: guards can be Go callbacks, so a transition predicate can call into arbitrary code to decide whether a symbol matches. CIDR containment, ARN pattern matching, time window checks – domain-specific guards without encoding everything into the automaton’s alphabet.
Coverage Prediction
The compiled DFAs feed directly into the coverage-guided explorer. For a given input configuration, the system evaluates every branch DFA and predicts what will happen:
When the explorer evaluates a point in the input hypercube, the DFAs tell it which branches that input will exercise – without running the connector. The explorer uses this to prioritize inputs that reach new branches and skip inputs that exercise only already-verified paths.
The system tracks exact coverage boundaries:
type CoverageBoundary struct {
LowerPoint []int
LowerCoverage int
HigherPoint []int
HigherCoverage int
Dimension int
Value int
NewBranches []string
}
When the explorer crosses a threshold on a dimension and a new branch becomes reachable, the boundary records which dimension, which value, and which specific branches appeared. “Adding a sixth user activates the pagination branch.” “Setting groups to three triggers the nested membership path.” The DFA predicted it. The verification confirmed it.
Series
This is part of a series on formally verifying identity connectors: