AI-nHancement  /  Developer Tooling

Structure for vibe coding.

The criticism leveled at vibe coding ("AI slop," fragile codebases, code that looks good on the surface but is fractured underneath) is not anti-AI. It is a criticism of fragility, and fragility is a problem older than AI. Anvil targets that older problem.

"The workflow that prevents failures, not the AI tool that hides them better."
Rust CLI · Go Sidecar · gRPC / protobuf · Adversarial Review · Audit-Linked Provenance
Anvil
In Development · v1 CLI
The Wedge

Not a better AI coding agent.

Anvil is not competing with Codex, Claude Code, Cursor, or any other AI coding agent on the axis of "better AI coding agent." That is a losing race against the model providers.

Anvil competes on a different axis: AI coding done correctly. The competitive question reframes from "which AI tool is best?" to "which workflow produces shippable code?" On that axis, Anvil's competitors are not other AI tools — they are the absence of process.

The defensible claim: codebases built with Anvil can end up more structurally disciplined than the median professional codebase, not just more disciplined than vibe-coded ones.

$ anvil charter review
→ Rotating to Reviewer-2 (OpenAI / gpt-5)
→ Sending briefing packet…
 
Findings received: 4
Grounded: 3  
Refuted: 1  
 
$ anvil charter findings
[F-003] Accept → queued for correction
[F-004] Drop → refuted by verifier
 
$ anvil status
Artifact: charter.md (R2)
Reviewer: Reviewer-2 of 2
Round: 2 / 5 (clean threshold)
Open hinges: 4
Gate: Curation required before next rotation
Architecture

Three-layer stack.
Clean trust boundaries.

The CLI is the v1 deliverable. The Vault is designed as a clean Rust library so the v1.1 desktop App can consume it directly — no rework, no translation layer. File-system locking is in place from day one so the App can coexist with the CLI immediately.

Layer 1
CLI anvil
Subcommand dispatch, an interactive setup wizard, and workflow gates surfaced as prompts. Six human-approval gates per phase. v1.1 adds the Tauri + React desktop App, which consumes the same Vault API.
anvil charter review · anvil plan · anvil phase ship · anvil arbiter · anvil metrics show · anvil hinge list
links (Rust library calls)
Layer 2
Vault anvil-core
State machine, audit store (append-only, 16 record types), provenance graph, policy enforcement. Designed for v1.1 App consumption. Trust-boundary invariants enforced here — not bypassable by CLI or App surface.
anvil-audit · anvil-graph · anvil-sidecar-client · anvil-eval · anvil-hinge · anvil-ship
gRPC / versioned protobuf
Layer 3
Sidecar anvil-sidecar (Go)
Vendor adapters via raw HTTP. Workspace-scoped daemon. Stateless across invocations — all session context lives in the Vault. API keys passed per-call, never cached, never logged.
Invoke · InvokeStreaming · Handshake · ReloadConfig · Health
HTTPS — raw vendor APIs
External
Model Providers
v1 ships adapters for Anthropic direct, OpenAI direct, and Google AI Studio direct. Architecture supports any provider without modifying Vault, contract, or existing adapters — each new adapter is purely additive.
Anthropic · OpenAI · Google AI Studio · + cloud gateways (v1.1+)
Workflow

Four stages. Six gates each.
Human in every loop.

Every phase moves through Charter → Plan → Build → Ship. Multi-reviewer rotation with adversarial cross-family diversity. Full-pool clean required before anything ships. The Coordinator is the load-bearing actor at every gate — not the models.

C
Charter Stage
Interlocutor discussion → Charter render → review → convergence
Discuss · Review · Curate
P
Plan Stage
Planner invocation → 9-field phase spec → dependency graph
Plan · Validate · Consolidate
B
Build Stage
Per-phase Coder loop with verifier-tagged findings and briefing packets
Brief · Review · Ship
S
Ship + Rollback
Transport, blast-radius rollback, cascading invalidation via dependency graph
Blast Radius · Confirm · Rollback
Trust-Boundary Invariants
1. No commit on partial or invalid sidecar output
The Vault never commits a phase artifact or advances a gate based on partial output. Only the FinalResult event from a streaming invocation is authoritative. Mid-stream errors discard all accumulated tokens from the commit path.
2. Sidecar must remain stateless across invocations
The sidecar holds no persistent state between RPC calls. API keys are passed per-call via the Credentials field, consumed within the request handler, and discarded: never cached, never logged on the sidecar side.
3. App frontend is not on the trust boundary
When the v1.1 App is added, it is a UI surface consuming the Vault API, not a trust-bearing layer. The Vault validates all inputs regardless of whether they originate from the CLI, the App, or any future surface.
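The first invariant can be illustrated with a small sketch. It is not Anvil's actual Vault code: apart from FinalResult, every name here (StreamEvent, CommitDecision, decide_commit) is hypothetical. The point is that streamed tokens never reach the commit path, and only a FinalResult event yields something committable.

```rust
/// Events a streaming invocation may yield. Only `FinalResult` is named in
/// the text above; `Token` and `Error` are hypothetical variant names.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    Token(String),
    FinalResult(String),
    Error(String),
}

/// What the caller is allowed to do once the stream is over.
#[derive(Debug, PartialEq)]
pub enum CommitDecision {
    Commit(String),
    Discard { reason: String },
}

/// Walks the stream and returns a committable body only for `FinalResult`.
pub fn decide_commit<I>(events: I) -> CommitDecision
where
    I: IntoIterator<Item = StreamEvent>,
{
    // Tokens are accumulated for display only; they never feed the commit path.
    let mut preview = String::new();
    for event in events {
        match event {
            StreamEvent::Token(t) => preview.push_str(&t),
            StreamEvent::FinalResult(body) => return CommitDecision::Commit(body),
            StreamEvent::Error(e) => {
                // The accumulated preview is dropped along with the stream.
                return CommitDecision::Discard {
                    reason: format!("mid-stream error: {e}"),
                };
            }
        }
    }
    // A stream that simply ends is treated as partial output.
    CommitDecision::Discard {
        reason: "stream ended without FinalResult".to_string(),
    }
}
```

A stream that ends in an error, or ends without a FinalResult, produces Discard regardless of how many tokens arrived first.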
Core Features

Every mechanism exists to prevent a real failure mode.

No speculative infrastructure. Each feature is traceable to a specific class of fragility in AI-assisted development.

⚔️
Adversarial Cross-Vendor Review
Multi-reviewer pool with a family-floor invariant: Claude cannot review its own Coder output. v1 minimum pool: Codex-class + Gemini-class. Termination requires a clean pass from the full pool, not just the first clean pass. Convergence safeguards activate at round 5.
Family-floor invariant · Rotation arithmetic · Severity tiering
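A minimal sketch of the two rules above, under one stated assumption: that the family-floor invariant means a reviewer must come from a different model family than the Coder. The Reviewer type, the family strings, and both function names are hypothetical, not Anvil's API.

```rust
/// A reviewer in the pool. `family` is the model family, e.g. "anthropic".
#[derive(Debug, Clone, PartialEq)]
pub struct Reviewer {
    pub name: String,
    pub family: String,
}

/// Assumed reading of the family-floor invariant: drop every reviewer that
/// shares a model family with the Coder.
pub fn eligible_reviewers<'a>(pool: &'a [Reviewer], coder_family: &str) -> Vec<&'a Reviewer> {
    pool.iter().filter(|r| r.family != coder_family).collect()
}

/// Full-pool clean: every eligible reviewer has returned a clean pass.
/// A single clean pass is not enough, and an empty pool is never clean.
pub fn full_pool_clean(eligible: &[&Reviewer], clean_passes: &[&str]) -> bool {
    !eligible.is_empty()
        && eligible
            .iter()
            .all(|r| clean_passes.contains(&r.name.as_str()))
}
```

With a Claude-family Coder and a pool of three reviewers (one per family), two reviewers remain eligible, and the artifact is clean only once both have passed it.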
📋
Tamper-Evident Audit Store
16 record types, append-only at both API and filesystem level (O_EXCL). Atomic index updates. Completeness check detects out-of-band deletion. Cross-reference integrity blocks ship on unresolved gaps.
16 record types · Append-only · BlockShip enforcement
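Filesystem-level append-only behavior via O_EXCL can be sketched with the Rust standard library alone. In std, OpenOptions::create_new(true) requests exclusive creation (O_CREAT | O_EXCL on Unix), so writing to an existing record fails instead of overwriting it. The one-file-per-record layout, the .json extension, and the function name are assumptions for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Writes one audit record to a brand-new file.
/// `create_new(true)` makes the OS refuse to open a path that already exists,
/// so a second write for the same record id returns `ErrorKind::AlreadyExists`.
pub fn write_record_once(dir: &Path, record_id: &str, body: &[u8]) -> io::Result<()> {
    // Hypothetical layout: one file per record, named after its id.
    let path = dir.join(format!("{record_id}.json"));
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(path)?;
    file.write_all(body)?;
    // Flush data and metadata to disk before reporting success.
    file.sync_all()
}
```

This covers only the "never overwrite" half of the claim. Detecting out-of-band deletion needs a separate completeness check against an index, which this sketch does not attempt.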
🕸️
Provenance Graph
Every decision is traceable. anvil audit list and anvil audit show are first-class commands — not debug tools. Cross-reference keys are stable across re-renderings and hinge-tested for stability.
Queryable graph · Stable cross-refs · UTF-8 lint
🎯
Convergence Safeguards
After round 5, findings downgrade to advisory — each requiring explicit Coordinator disposition. Per-finding arbiter resolution breaks reviewer contradictions without abandoning the full-pool-clean termination condition. No silent passes.
Arbiter authority · Advisory disposition · ArbiterFindingResolution
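The downgrade rule can be sketched as follows. This reads "after round 5" as rounds 6 and later, which is an assumption, and the Severity type and function names are hypothetical. Accept and Drop mirror the dispositions shown in the CLI transcript.

```rust
/// Round after which findings become advisory (taken from the text above).
pub const ADVISORY_AFTER_ROUND: u32 = 5;

/// Coordinator dispositions, as shown by `anvil charter findings`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Disposition {
    Accept,
    Drop,
}

/// Hypothetical severity label attached to a finding.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Severity {
    Blocking,
    Advisory,
}

/// Findings raised in rounds 1 through 5 block; later ones are advisory.
pub fn severity_for_round(round: u32) -> Severity {
    if round > ADVISORY_AFTER_ROUND {
        Severity::Advisory
    } else {
        Severity::Blocking
    }
}

/// "No silent passes": every advisory finding needs an explicit disposition.
pub fn no_silent_passes(dispositions: &[Option<Disposition>]) -> bool {
    dispositions.iter().all(|d| d.is_some())
}
```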
💥
Blast-Radius Rollback
Re-opening a phase computes the full transitive closure of dependent phases via the dependency graph. User sees and confirms the blast radius before any commit. Rotation resets on all invalidated phases — diversity is re-enforced after every rollback.
Cascading invalidation · Rotation reset · Immutable history
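Computing the blast radius is a transitive closure over reversed dependency edges. A minimal sketch, assuming the dependency graph is available as a map from each phase to the phases it requires; the function name and the map shape are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// `depends_on` maps a phase to the phases it requires.
/// Returns every phase that directly or transitively depends on `reopened`.
pub fn blast_radius(depends_on: &HashMap<&str, Vec<&str>>, reopened: &str) -> BTreeSet<String> {
    // Invert the edges: phase -> phases that depend on it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (phase, deps) in depends_on {
        for dep in deps {
            dependents.entry(*dep).or_default().push(*phase);
        }
    }

    // Breadth-first walk over the inverted edges.
    let mut affected = BTreeSet::new();
    let mut queue = VecDeque::from([reopened]);
    while let Some(current) = queue.pop_front() {
        for &next in dependents.get(current).into_iter().flatten() {
            // `insert` returns false for phases already seen, so each phase
            // is expanded once and cycles cannot loop forever.
            if affected.insert(next.to_string()) {
                queue.push_back(next);
            }
        }
    }
    affected
}
```

Using the roadmap on this page as input (P9, P10a, and P10b each require P8; P11 requires all three), re-opening P8 yields P9, P10a, P10b, and P11 as the set to show the user for confirmation.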
📊
Evaluation Metrics + Alerts
Six Layer-1 metrics computed automatically from audit-store data. Layer-2 per-project targets. Layer-3 rule-based alert engine on four alert kinds. Cost controls with hard-stop opt-in. All surfaced via anvil metrics show.
6 metrics · 4 alert kinds · Cost controls
Layer-1 Product Metrics

Six metrics. All automatic.
No manual entry.

Every metric is computed from audit-store records. Targets are provisional — P11 dogfooding produces the first observational baselines.

🐛
Defect Escape Rate
Target: 0 P1 defects post-Ship
Any P1 escape triggers re-open. P2/P3 escape budget: ≤2 across all v1 phases.
🎯
Review Finding Precision
Target: ≥70% grounded findings
Alert at <50% sustained over 2 consecutive phases.
⏱️
Human Minutes / Phase
Target: ≤90 min average
Alert at >150 min for 2 consecutive phases. Setup (P4) tracked separately.
🔄
Review Rounds / Phase
Target: ≤3 rounds average
Alert at ≥5 rounds for any single phase — triggers Phase Review Briefing quality review.
🤝
Cross-Reviewer Agreement
Target: 30–60% agreement
Bimodal diagnostic: <15% or >80% triggers pool-configuration review.
📌
Deferred-Decision Resolution
Target: ≥90% resolved within 2 phases
Any deferral open >5 phases at Ship is a BlockShip condition.
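As a worked example of one metric, Review Finding Precision can be read as grounded findings divided by total findings, with the alert firing when two consecutive phases fall below 50%. The sketch below assumes that reading; the PhaseFindings type and function names are hypothetical, and real inputs would come from audit-store records.

```rust
/// Target from the text above: at least 70% of findings are grounded.
pub const PRECISION_TARGET: f64 = 0.70;
/// Alert threshold from the text above.
pub const PRECISION_ALERT_BELOW: f64 = 0.50;

/// Grounded and total finding counts for one phase.
#[derive(Debug, Clone, Copy)]
pub struct PhaseFindings {
    pub grounded: u32,
    pub total: u32,
}

/// Share of findings that were grounded; `None` when no findings exist.
pub fn precision(p: PhaseFindings) -> Option<f64> {
    if p.total == 0 {
        None
    } else {
        Some(f64::from(p.grounded) / f64::from(p.total))
    }
}

/// True when any two consecutive phases are both below the alert threshold.
/// Phases without findings never count toward the alert.
pub fn precision_alert(history: &[PhaseFindings]) -> bool {
    history.windows(2).any(|pair| {
        pair.iter()
            .all(|p| matches!(precision(*p), Some(v) if v < PRECISION_ALERT_BELOW))
    })
}
```

The review shown in the CLI transcript (4 findings, 3 grounded) gives a precision of 0.75, which meets the provisional target.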
Build Roadmap

15 phases. Critical path: P0 → P1 → P2 → … → P11.

Foundation phases build the Vault, audit store, and contract. P4–P8 deliver the workflow stages. P9/P10a/P10b run in parallel after P8. P11 is dogfooding and docs — Anvil v1 manages the Anvil v1.1 design using its own CLI.

P0
Bootstrap
Rust workspace + Go module + protobuf codegen + build orchestration
P1
Config + Charter Loader
Required-Choices schema, provider-connection model, anvil init
P2
Audit Store + Provenance Graph
16 record types, append-only enforcement, cross-reference integrity
P3a
Contract Definition
Wire contract: Handshake, Invoke, InvokeStreaming, ReloadConfig via protobuf
P3b · parallel
Rust Sidecar Client
Vault-side gRPC client, contract enforcement, retry/backoff, idempotency
P3c · parallel
Go Sidecar
gRPC server, Anthropic + OpenAI + Google AI Studio vendor adapters
P4
CLI Setup Wizard
7-step interactive setup: provider connections, model bindings, diversity validation
P5
Charter Stage Pipeline
First end-to-end workflow: Interlocutor → Charter → single reviewer → disposition
P6
Multi-Reviewer Rotation
Pool rotation, convergence safeguards, arbiter authority, severity tiering
P7
Plan Stage Pipeline
Planner Contract validation, Plan Review, dependency graph CLI surface
P8
Build Stage Pipeline
Per-phase Build loop, Phase Review Briefing, 6 gate-approval records
P9
Ship + Rollback
Project ship, cascading invalidation, blast-radius confirmation, rotation reset
P10a · parallel
Evaluation Infrastructure
6 Layer-1 metric collectors, Layer-2 target eval, Layer-3 alert engine
P10b · parallel
Hinge-Test Framework
Bi-language hinge registry (Rust + Go), unified registry, anvil hinge CLI
P11
Dogfooding + Docs
Anvil manages Anvil v1.1 design. External pilot. All Provisional Locks resolved.
P0 → P1 → P2 → P3a → (P3b ∥ P3c) → P4 → P5 → P6 → P7 → P8
├── P9 (Ship + Rollback) ← ∥
├── P10a (Eval Infrastructure) ← ∥
└── P10b (Hinge-Test Framework) ← ∥
         └── P11 (Dogfooding + Docs) ← requires all three
Technology

Rust core. Go sidecar.
gRPC contract boundary.

The architecture that survives v1 → v1.1. The Vault library is designed for App consumption from day one. The sidecar is additive — new vendor adapters require no changes to the Vault or contract.

🦀
Core · CLI
Rust ≥ 1.80
Vault library (anvil-core), CLI binary (anvil-cli), audit store (anvil-audit), provenance graph (anvil-graph), sidecar gRPC client (anvil-sidecar-client). The type system is used to make the append-only audit store hard to violate.
Cargo workspace · tonic (gRPC) · prost · serde · UUIDv7 idempotency
🐹
Sidecar · Vendor Adapters
Go ≥ 1.22
gRPC server, workspace-scoped daemon with global stale-daemon registry, vendor adapters via raw HTTP (no SDKs). Structured JSON logging with idempotency-key correlation. v1: Anthropic, OpenAI, Google AI Studio.
protoc-gen-go-grpc · golangci-lint · JSON logging · Raw HTTP
Contract · Wire Protocol
protobuf v1 / gRPC
Versioned contract: anvil.v1. Mandatory version handshake on every connect. Configuration epoch (SHA-256 hash) on every handshake prevents split-brain between Vault and daemon. 6 ErrorClass enum values.
anvil.v1 · Config epoch · 6 error classes · Streaming events
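The handshake check can be sketched as a pure comparison. This assumes the configuration epoch arrives as a precomputed hex string (the SHA-256 hashing itself is out of scope here). The Handshake and HandshakeError types are hypothetical and are not the contract's six ErrorClass values.

```rust
/// What each side presents on connect. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Handshake {
    /// Contract version, e.g. "anvil.v1".
    pub contract_version: String,
    /// Hex digest of the active configuration, computed elsewhere.
    pub config_epoch: String,
}

/// Reasons to refuse a connection in this sketch.
#[derive(Debug, PartialEq)]
pub enum HandshakeError {
    VersionMismatch { vault: String, daemon: String },
    EpochMismatch,
}

/// Refuses the connection unless both version and config epoch agree.
pub fn check_handshake(vault: &Handshake, daemon: &Handshake) -> Result<(), HandshakeError> {
    if vault.contract_version != daemon.contract_version {
        return Err(HandshakeError::VersionMismatch {
            vault: vault.contract_version.clone(),
            daemon: daemon.contract_version.clone(),
        });
    }
    // Differing epochs mean the daemon is running with a config the Vault
    // no longer holds, which is the split-brain case.
    if vault.config_epoch != daemon.config_epoch {
        return Err(HandshakeError::EpochMismatch);
    }
    Ok(())
}
```

Checking the version before the epoch is a design choice of this sketch: an epoch from an incompatible contract version may not be comparable at all.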
🔐
Secrets · Credentials
OS Keychain + Env Vars
Interactive setup uses OS keychain (Windows Credential Manager / macOS Keychain / Linux Secret Service). Headless and CI workflows use ANVIL_API_KEY_* environment variables. No file-based encryption in v1 — env-var floor is strictly safer.
keyring crate · Per-call credentials · No at-rest caching
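For the headless path, a minimal sketch of resolving a key from an ANVIL_API_KEY_* variable. The text does not spell out the suffix, so uppercasing the provider name is an assumption, as are both function names. The lookup is passed in as a closure so the logic can be exercised without touching the real environment. The keychain path is not covered.

```rust
/// Builds the variable name for a provider. The suffix rule (uppercase,
/// non-alphanumerics become '_') is an assumption, not documented behavior.
pub fn api_key_var(provider: &str) -> String {
    let suffix: String = provider
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_uppercase()
            } else {
                '_'
            }
        })
        .collect();
    format!("ANVIL_API_KEY_{suffix}")
}

/// Resolves a key through `lookup`, treating blank values as missing.
/// The value is returned to the caller and not stored anywhere.
pub fn resolve_api_key<F>(provider: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup(&api_key_var(provider)).filter(|v| !v.trim().is_empty())
}
```

In a real process the lookup would be the environment itself, for example `resolve_api_key("openai", |name| std::env::var(name).ok())`.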
In Development — Plan Approved 2026-05-19

Building in the open.
Dogfooding first.

The Anvil v1 Plan converged on 2026-05-19. P0 (Bootstrap) is unblocked. v1 proves the discipline — CLI-first, experienced developer audience. v1.1 adds the Tauri + React desktop App and broadens the audience. The acceptance test: Anvil v1 manages its own v1.1 design using its own CLI.