$ cd /engineering && cat ./formally-verifying-two-hundred-identity-connectors.md
Formal Methods
> Formally Verifying Two Hundred Identity Connectors
Robert Chiniquy
Connector verification reduces to three set operations.
Missing = Expected - Actual
Extra = Actual - Expected
Match = Expected AND Actual
If Missing and Extra are both empty, the connector is correct for that input. A connector runs against a controlled input and produces a set of facts: “user alice has role admin,” “group engineering contains user bob.” Separately, a Datalog specification derives a set of facts from the same input. Subtract one from the other. What remains is what’s wrong.
I’m Robert. I build identity infrastructure at ConductorOne. We maintain connectors for more than two hundred identity systems—Okta, AWS, GitHub, PostgreSQL, Salesforce, Active Directory, and a couple hundred more. Each connector reaches into a vendor API, pulls back permissions data, and translates it into a common schema.
AWS built Zelkova to verify IAM policies—the rules themselves. By 2022, a billion SMT queries a day. This framework verifies the other half: the connectors that read those policies—and the policies of two hundred other systems—and translate them into a common schema. Not tested against a handful of examples. Proved correct across a combinatorial space of inputs.
The Specification
Every connector’s expected behavior is specified as Datalog rules. For each identity system, the axioms encode what the permission model should produce for a given set of inputs. Every permission model we’ve analyzed reduces to one of six shapes. What matters for verification isn’t the classification—it’s that every system, regardless of shape, has a formal specification that a machine can check.
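As an illustration, a role-inheritance axiom might look like the following Datalog fragment. The predicate names here are a sketch, not the framework's actual schema:

```prolog
% A user inherits a role directly, or one step through role inheritance.
inherits_role(U, R)  :- has_role(U, R).
inherits_role(U, R2) :- has_role(U, R1), role_inherits(R1, R2).

% A permission is effective if any inherited role grants it.
effective_permission(U, P) :- inherits_role(U, R), role_grants(R, P).
```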
A Datalog rule is declarative, terse, and mechanically checkable. No hidden state, no concurrency bugs, no control flow to trace. No testing in production. The properties that make formal specifications intimidating for humans—rigid syntax, zero ambiguity, no hand-waving—make them a clean target for LLMs. And the property that makes LLM output dangerous elsewhere—plausible-sounding wrongness—is neutralized by the fact that axioms are checkable. A hallucinated Datalog rule gets caught on the next verification run. I built the framework with Claude, and that tight loop—write a rule, verify it, fix it—is what made it practical to specify two hundred systems.
Exploring the Input Space
The set comparison is the core operation. The hard part is choosing which inputs to compare against.
A connector’s input space is a hypercube. Each dimension is a kind of entity—users, groups, roles, memberships—with a range from zero to some maximum. In the framework, each dimension is defined with a generator function that maps abstract positions to concrete input data:
```go
type Space struct {
	Dimensions []Dimension
}

type Dimension struct {
	Name     string
	Min, Max int
	Generate func(value int) any
}
```
The Generate function is the bridge between the abstract hypercube and real test data. Position 3 on the “users” dimension might generate three user records with specific names, emails, and roles. Position 7 on “memberships” might generate seven group assignments with varying nesting depths. The generator encodes domain knowledge about what each position means.
Three dimensions with ten positions each give a thousand possible configurations. A real connector might have ten or more dimensions, and the space grows into the trillions of points.
Gray code changes exactly one dimension per step. The iterator uses a reflected binary construction—a direction array tracks the traversal direction for each dimension and reverses when a dimension hits its boundary:
```go
type GrayCodeIterator struct {
	dims      []int
	current   []int
	direction []int // Reflection: +1 or -1 per dimension
	done      bool
}
```
If the test passed at step t and fails at step t+1, the single dimension that changed is responsible. This is the discrete analog of computing partial derivatives—the output delta is attributable to that one dimension.
When a failure appears, bisection narrows it to the exact boundary. The bisector isolates failures to a single dimension in O(log n) probes, and can sweep an entire dimension to find all boundaries when multiple failure points exist.
Some input dimensions produce smooth output changes: more users means proportionally more synced records. These are differentiable in the discrete sense. Other dimensions are discontinuous: toggling a feature flag enables an entirely different code path, and the output changes categorically. The ratio of smooth to discontinuous dimensions predicts verification difficulty more accurately than lines of code or cyclomatic complexity.
This connects to differentiable programming—normally framed in terms of continuous functions and gradient descent, but the discrete version is the same idea. Change one input, observe the output delta, attribute the change. Baydin, Pearlmutter, Radul, and Siskind’s “Automatic Differentiation in Machine Learning: a Survey” (JMLR, 2018) is the best overview of the continuous foundations. The discrete side, where the “gradient” is which dimension of a hypercube caused a verification failure, is territory I’m still mapping.
Coverage-Guided Verification
The approach borrows from AFL (American Fuzzy Lop), Michal Zalewski’s coverage-guided fuzzer. AFL instruments a binary, feeds it random inputs, and tracks which code branches each input exercises. When an input discovers a new branch, AFL mutates it to explore further. The coverage map is the feedback signal.
This framework works the same way, except instead of fuzzing to find crashes, it verifies correctness. The coverage signal isn’t “did this crash?” but “did this configuration exercise an axiom rule that hasn’t been verified yet?”
The explorer generates facts for each configuration, predicts branch coverage via compiled DFAs, runs verification, tracks coverage gains, and uses bisection to find exact boundaries. Two layers of coverage guide the exploration:
Branch coverage predicts which code paths in the connector source a given input will exercise. Static analysis walks the Go AST, extracts every branch point, and compiles the conditions into predicates over input facts. A finite state transducer compresses the configuration-to-branch mapping from trillions of cells to the number of distinct branch sets actually observed. The system prioritizes dimensions by predicted coverage gain and bisects to find the exact threshold where new branches become reachable.
Axiom coverage sits on top. After each probe, the framework checks which derived predicates from the Datalog specification were actually exercised. If effective_permission(U, P) is a derived rule but no facts satisfied it, that rule is unverified. The system traces back through the axiom’s dependency chain to identify which base predicates are missing.
This gets precise. The system tracks which branches correlate with which input dimensions, and which combinations of predicates on those dimensions activate which axiom rules. If a specific axiom—say, inherits_role(U, R2) :- has_role(U, R1), role_inherits(R1, R2)—has never fired, the system identifies that it needs both has_role and role_inherits to be simultaneously non-empty, traces those back to the users and role_inheritance dimensions, and generates an input that targets exactly that product. Individual axioms become addressable verification targets.
What It Catches
Bugs the framework has found:
Pagination boundary errors (off-by-one on the final page)
Missing grant types (user-to-role synced, group-to-role dropped)
Status mapping errors (suspended mapped to active in edge cases)
Incomplete transitive closure on nested groups
Resource misclassification
These share a property: the connector produces slightly wrong output across a range of inputs. The discrete gradient is nonzero and points toward the bug.

None of the computer science here is new. Gray code is from 1953. Congruence closure is from 1980. Equality saturation is from 2009. I read Kroening and Strichman’s Decision Procedures last year and it ties the algorithmic foundations together—DFA construction, congruence closure, satisfiability, all in one place. What’s new is the feedback loop that makes it practical to apply these techniques at scale.
The domains where software quality matters most—authorization, medical systems, infrastructure safety—are the domains most likely to benefit from this pattern. Not from AI products, but from AI making rigorous engineering practical where it previously wasn’t.
The Rest of the Series
This post covers the core verification loop: specification, set algebra, input space exploration, coverage guidance. The remaining components each get their own treatment, coming soon:
One Mock Server, Twelve Protocols – YAML-driven mocks speaking twelve protocols through one interface. Zero Go code per connector.
Every Branch Condition Compiles to a DFA – hash-consed term stores, Boolean algebra rewriting, and symbolic automata with Go callback guards.
Proving Equivalence with E-Graphs – equality saturation discovers when two permission derivations compute the same access.
Fault Injection for Complete Branch Coverage – TCP RSTs, partial responses, and field-level JSON mutations reach the branches valid inputs can’t.
The system is running, and it will become part of our CI/CD pipeline. Yay!