> Every Branch Condition Compiles to a DFA

Robert Chiniquy

2026-02-20 4 min read

width:

Every branch condition in every connector – CIDR ranges, wildcard ARN patterns, timestamp comparisons, VPC checks – compiles to a deterministic finite automaton. This lets the verification framework predict which code paths an input will exercise without running the binary.

This is part of a series on formally verifying identity connectors. The coverage-guided verification needs to know which branches a given input configuration will exercise. Running the connector for every candidate configuration is too slow. Instead, the framework compiles branch conditions to DFAs and evaluates them directly.

Terms

Branch conditions start as structured terms. The system represents every predicate as a node in a hash-consed term store:

type TermID uint32

type Term struct {
    Op       Operator
    Children []TermID
}

type TermStore struct {
    terms    map[TermID]Term
    hashcons map[string]TermID  // Structural sharing
    parent   map[TermID]TermID  // Union-find for equivalence
}

Hash-consing means structurally identical terms share the same ID. If two different branches in two different connectors both check AWSMFARequired() AND AWSInVPC("vpc-12345"), they get the same TermID. The union-find tracks equivalences discovered during e-graph saturation.

A branch condition like “MFA is required AND the request is from this VPC” becomes:

b := NewBuilder()
mfa      := b.AWSMFARequired()
vpc      := b.AWSInVPC("vpc-12345")
combined := b.And(mfa, vpc)

That builds a tree: And(AWSMFAPresent, AWSSourceVPC("vpc-12345")).

Rewriting

A rewrite engine applies Boolean algebra identities before compilation:

And(True, x)    → x            identity
And(False, x)   → False        annihilator
Not(Not(x))     → x            double negation
And(x, Not(x))  → False        contradiction
Or(x, Not(x))   → True         tautology
And(x, Or(x,y)) → x            absorption
And(x, x)       → x            idempotence

Plus De Morgan’s laws: Not(And(x,y)) → Or(Not(x), Not(y)).

Plus domain-specific rules that go beyond Boolean algebra:

StringEquals(k, v1) AND StringEquals(k, v2) → False        when v1 ≠ v2
IpAddress(k, cidr1) AND IpAddress(k, cidr2) → IpAddress(k, intersect(cidr1, cidr2))
NumericLessThan(k, v1) AND NumericGreaterThan(k, v2) → False   when v1 ≤ v2

The CIDR intersection rule is where this gets specific. The system computes the relationship between two CIDR ranges:

type CIDRRelation int
const (
    CIDRDisjoint   // Non-overlapping: 10.0.0.0/8 vs 172.16.0.0/12
    CIDREqual      // Identical
    CIDRSubset     // First ⊂ Second: 10.0.1.0/24 ⊂ 10.0.0.0/16
    CIDRSuperset   // First ⊃ Second
    CIDROverlap    // Partial overlap
)

If two conditions require the request to come from disjoint CIDR ranges, the conjunction is False – the branch is unreachable. If one range is a subset of the other, the conjunction simplifies to the more specific range. These simplifications happen before DFA compilation, reducing the automaton’s state count.

Compilation

After rewriting, each simplified term compiles to a DFA through standard constructions:

Each leaf predicate becomes a two-state automaton (accept/reject)
And becomes product construction (both must accept)
Or becomes union (either accepts)
Not becomes complementation (swap accept/reject states)

This works even for conditions that look like they’d need an infinite alphabet – CIDR ranges, wildcard ARN patterns, timestamp comparisons. In practice, the domain of values that appears in any given connector’s configuration is always finite. A CIDR range is a set of prefixes. An ARN wildcard is a regular expression over a known set of resource identifiers. Timestamps reduce to ordered comparisons over discrete boundaries. All compile through the same product construction.

Symbolic automata are the theoretical framework – transitions labeled with guards over potentially infinite alphabets, where product construction computes guard intersections instead of iterating over symbols. D’Antoni and Veanes’s work on symbolic automata theory is the formal underpinning. The implementation leans into this: guards can be Go callbacks, so a transition predicate can call into arbitrary code to decide whether a symbol matches. CIDR containment, ARN pattern matching, time window checks – domain-specific guards without encoding everything into the automaton’s alphabet.

Coverage Prediction

The compiled DFAs feed directly into the coverage-guided explorer. For a given input configuration, the system evaluates every branch DFA and predicts what will happen:

type PathPrediction struct {
    ReachableBranches   []string
    UnreachableBranches []string
    Confidence          float64
    Assumptions         []string
}

When the explorer evaluates a point in the input hypercube, the DFAs tell it which branches that input will exercise – without running the connector. The explorer uses this to prioritize inputs that reach new branches and skip inputs that exercise only already-verified paths.

The system tracks exact coverage boundaries:

type CoverageBoundary struct {
    LowerPoint     []int
    LowerCoverage  int
    HigherPoint    []int
    HigherCoverage int
    Dimension      int
    Value          int
    NewBranches    []string
}

When the explorer crosses a threshold on a dimension and a new branch becomes reachable, the boundary records which dimension, which value, and which specific branches appeared. “Adding a sixth user activates the pagination branch.” “Setting groups to three triggers the nested membership path.” The DFA predicted it. The verification confirmed it.

Series

This is part of a series on formally verifying identity connectors:

References

D’Antoni, Veanes. “Symbolic Automata Theory.” CAV, 2017.
Kroening, Strichman. Decision Procedures: An Algorithmic Point of View. 2nd ed. Springer, 2016.

$ ls ./related/_

> More from Engineering

← cd /engineering