From Prompting to Systems

Spec-Driven AI Engineering

Demonstrated with Claude Code.

Software development is shifting from writing instructions for machines to defining structured intent for autonomous systems operating inside persistent repositories.

AI coding produces fast prototypes. It fails in real systems, where prompts alone carry no structure, memory, or long-term state.

Symptom 1:
Inconsistency

The same prompt produces different outputs depending on context, history, and hidden system state.

Symptom 2:
Fragility

Small requirement changes break assumptions across multiple generated components and files.

Prompts do not encode architecture, constraints, system state, or evolution history.

Without structure

Model generates code
Tests pass
Production breaks
No audit trail

With structure

Spec defines constraints
Model operates within bounds
Failure is traceable
Audit trail exists

The bottleneck is context architecture, not model capability.

Three Layers. One Bottleneck.

The field keeps confusing these. They are not the same problem.

LAYER                 ARTIFACT               SCOPE
prompt engineering    the string             a single turn
context engineering   what's in the window   a conversation
harness engineering   the runtime            the whole agent

Enter

Claude Code

Not autocomplete. An agent that does the work.

Steerable

Plain instructions in CLAUDE.md. Custom skills. Sub-agents you compose.

Honest

Stops to ask. Surfaces tradeoffs. Refuses destructive moves without consent.

Composable

Hooks, MCP servers, the Agent SDK — everything is a building block.

Development shifts from stateless prompting to stateful system construction inside repositories.

Repositories are memory systems,
not code storage.

vision.md

Defines product intent, goals, and constraints.

# Vision
Build a regulatory reporting system for UCITS funds.

## Goals
- ingest fund data from multiple sources
- validate against regulatory schema
- generate board-ready reports
- maintain full audit trail

## Principles
- every output must be traceable
- no generation without spec approval
- human review before submission

architecture.md

Defines system components and data flow.

Fund Data → Ingestion → Validation
          → Report Generation → Output

Components:
- ingestion layer (CSV/API)
- regulatory schema validator
- report generator
- audit trail (decisions.md)

Storage:
- Postgres (fund data + audit)
- decisions.md (reasoning trace)
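The data flow in architecture.md maps onto a chain of small functions. The sketch below is illustrative only: the `Position` fields, the function names, the CSV column names, and the stand-in validation rule are all assumptions, not part of the spec.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    row: int    # 1-based source row, kept so every result stays traceable
    isin: str
    nav: float


def ingest(csv_text: str) -> list[Position]:
    """Ingestion: parse CSV text with 'isin' and 'nav' columns (assumed names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Position(row=i, isin=rec["isin"].strip(), nav=float(rec["nav"]))
        for i, rec in enumerate(reader, start=1)
    ]


def validate(positions: list[Position]) -> tuple[list[Position], list[str]]:
    """Validation: split positions into valid ones and audit messages."""
    valid, audit = [], []
    for p in positions:
        if len(p.isin) == 12 and p.nav >= 0:  # stand-in rule, not the real schema
            valid.append(p)
        else:
            audit.append(f"row {p.row}: rejected {p.isin!r}")
    return valid, audit


def generate_report(valid: list[Position], audit: list[str]) -> str:
    """Report generation: summarise how many positions passed."""
    total = len(valid) + len(audit)
    return f"{len(valid)} / {total} positions validated"


def run_pipeline(csv_text: str) -> tuple[str, list[str]]:
    """Fund Data -> Ingestion -> Validation -> Report Generation -> Output."""
    valid, audit = validate(ingest(csv_text))
    return generate_report(valid, audit), audit
```

Each stage takes the previous stage's output and nothing else, so the audit messages can be written to decisions.md or Postgres without touching the other stages.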

CLAUDE.md

The execution layer.

Defines how the agent operates inside the repository.

# CLAUDE.md

## Rules
- follow architecture.md strictly
- never bypass spec files
- prefer incremental changes
- maintain explicit reasoning traces

## Output style
- structured reasoning
- no hidden assumptions

Real World: Boris Cherny

Creator of Claude Code · Anthropic · Execution, subagents, and self-improvement loops

### 1. Plan Mode Default
- Enter plan mode for ANY non-trivial task (3+ steps or architectural decisions)
- If something goes sideways, STOP and re-plan immediately
- Write detailed specs upfront to reduce ambiguity

### 2. Subagent Strategy
- Use subagents liberally to keep main context window clean
- Offload research, exploration, and parallel analysis to subagents
- One task per subagent for focused execution

### 3. Self-Improvement Loop
- After ANY correction from the user: update tasks/lessons.md with the pattern
- Write rules for yourself that prevent the same mistake
- Ruthlessly iterate on these lessons until mistake rate drops

Real World: Andrej Karpathy

Former Director of AI, Tesla · OpenAI founding team · Safety, simplicity, and surgical changes

## 1. Think Before Coding
**Don't assume. Don't hide confusion. Surface tradeoffs.**
- State your assumptions explicitly. If uncertain, ask.
- If a simpler approach exists, say so. Push back when warranted.

## 2. Simplicity First
**Minimum code that solves the problem. Nothing speculative.**
- No abstractions for single-use code.
- If you write 200 lines and it could be 50, rewrite it.

## 3. Surgical Changes
**Touch only what you must. Clean up only your own mess.**
- Don't "improve" adjacent code, comments, or formatting.
- The test: Every changed line should trace directly to the user's request.

Why CLAUDE.md Matters

Without enforced constraints:

AI drifts architecturally
Systems become inconsistent
Evolution breaks compatibility

CLAUDE.md turns generation into
governed execution.

Karpathy Validation Layer

System-first thinking.

Clarity over complexity
Iterative refinement
Minimal abstractions

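def build_system(spec):
    # Gate: refuse to build from an unclear or non-minimal spec.
    assert spec.is_clear()
    assert spec.is_minimal()

    system = init(spec)

    # Refine until the system reports that it is stable.
    while not system.is_stable():
        system = iterate(system)

    return system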

Cherny Validation Layer

Enforcement-style system thinking.

Tooling enforces correctness
Systems shape behavior
Developer experience is a constraint system

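class SystemGuard {
  // Reject any action that does not declare why it is being taken.
  validate(action) {
    if (!action.intent) {
      throw new Error("Missing intent");
    }

    return this.normalize(action);
  }

  // Assumed helper: trim the intent so later checks compare like with like.
  normalize(action) {
    return { ...action, intent: String(action.intent).trim() };
  }
}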

Karpathy demands clarity before complexity.

Cherny demands enforcement before trust.

CLAUDE.md is where both principles become executable.

Five Rules to Ship By

The harness is governed by principles, not vibes.

> 01

a map, not a manual

short CLAUDE.md with pointers — not a thousand pages of prose.

> 02

progressive disclosure

load context on demand; keep the working window small.

> 03

ratchet failures into rules

every recurring miss becomes a hook, a lint, or an eval.

> 04

success silent, failures verbose

make errors actionable; let wins speak for themselves.

> 05

the harness is a product

version it. eval it. regress against it. do not tweak by feel.
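Rules 01 and 04 can be ratcheted into a small check, in the spirit of rule 03. This is a minimal sketch under stated assumptions: the 200-line budget, the function name, and the idea that any token ending in `.md` is a pointer are all hypothetical choices, not Claude Code behaviour.

```python
import re
from pathlib import Path

MAX_LINES = 200  # assumed budget; pick whatever "short" means for your repo


def check_claude_md(text: str, repo_root: Path, max_lines: int = MAX_LINES) -> list[str]:
    """Return actionable failure messages; an empty list means success."""
    failures = []
    line_count = len(text.splitlines())
    if line_count > max_lines:
        failures.append(
            f"CLAUDE.md has {line_count} lines (budget {max_lines}): "
            "move detail into linked files"
        )
    # Treat any token ending in .md as a pointer to another spec file.
    for name in sorted(set(re.findall(r"[\w./-]+\.md", text))):
        if name != "CLAUDE.md" and not (repo_root / name).exists():
            failures.append(f"CLAUDE.md points to {name}, which does not exist")
    return failures
```

A caller prints the messages and exits non-zero only when the list is non-empty, which keeps success silent and failures verbose.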

Spec-Driven Development

Code is a derived artifact of structured specifications, not prompts.

Flow

Spec → Validate → Execute → Refine

## Constraint
Never generate a migration without a rollback in decisions.md.
Never add a dependency without updating architecture.md.

## Gate
Before any implementation step: confirm spec is unambiguous.
If not — stop and ask.
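The two constraints above can also be checked mechanically against a change set. The sketch below is an approximation under assumptions: the dependency file names and the `migrations/` prefix are hypothetical, and "decisions.md was touched" is only a proxy for "a rollback was recorded".

```python
DEPENDENCY_FILES = {"requirements.txt", "pyproject.toml", "package.json"}  # assumed set


def spec_gate(changed_files: set[str]) -> list[str]:
    """Check a change set against the two constraints; return violations."""
    violations = []
    touches_deps = bool(changed_files & DEPENDENCY_FILES)
    if touches_deps and "architecture.md" not in changed_files:
        violations.append("dependency changed without updating architecture.md")
    has_migration = any(f.startswith("migrations/") for f in changed_files)
    if has_migration and "decisions.md" not in changed_files:
        violations.append("migration added without a rollback note in decisions.md")
    return violations
```

Run as a pre-commit hook or CI step, a non-empty result blocks the change and tells the author which spec file is missing.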

ASCII-First Design

All systems must first be expressed in ASCII before implementation.

User → Upload → Process → Store → Retrieve → UI

The ASCII Validation Rule:

If ASCII structure is unclear, implementation is not allowed.
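One way to make "unclear" testable is to parse the diagram and reject anything that does not form a clean linear flow. This is a sketch, assuming single-line flows joined by `→`; the function name and the three rejection rules are illustrative choices.

```python
def parse_flow(diagram: str) -> list[tuple[str, str]]:
    """Turn 'A → B → C' into edges [(A, B), (B, C)]; raise if the flow is unclear."""
    stages = [s.strip() for s in diagram.split("→")]
    if len(stages) < 2:
        raise ValueError("flow needs at least two stages")
    if any(not s for s in stages):
        raise ValueError("flow has an empty stage (dangling arrow)")
    if len(set(stages)) != len(stages):
        raise ValueError("flow repeats a stage; a linear flow must not loop")
    return list(zip(stages, stages[1:]))
```

For the flow above, `parse_flow` returns five edges, starting with `("User", "Upload")`. A diagram such as `User → → Store` raises before any implementation starts.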

Multi-Persona Analysis

Same system design. Five different experts. Five different prompts.

› Staff backend engineer · $1B+ AUM pipelines
› UK FCA compliance officer · UCITS IV & V
› Distinguished architect · AWS financial services
› Principal data engineer · tier-1 investment bank
› Senior PM · fund administration, 8 yrs regulatory

Each runs independently. Each finds different failures.

You are a staff backend engineer who has built
high-availability financial data pipelines at
$1B+ AUM scale.

Review this system design. Identify every flaw,
scaling risk, and architectural contradiction.
Be brutal. Do not soften findings.

A system is invalid if it is understood from only one perspective.
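The five reviews can be generated from one template so that only the persona changes. In this sketch the persona wording is paraphrased from the list above, and `ask` is a hypothetical callable standing in for whatever sends a prompt to the model; no real client API is assumed.

```python
from typing import Callable

PERSONAS = {
    "backend": "a staff backend engineer who has built high-availability "
               "financial data pipelines at $1B+ AUM scale",
    "compliance": "a UK FCA compliance officer with UCITS IV and V expertise",
    "architect": "a distinguished architect in AWS financial services",
    "data": "a principal data engineer at a tier-1 investment bank",
    "product": "a senior PM in fund administration with 8 years of regulatory work",
}

TEMPLATE = (
    "You are {persona}.\n\n"
    "Review this system design. Identify every flaw, scaling risk, "
    "and architectural contradiction. Be brutal. Do not soften findings.\n\n"
    "{design}"
)


def build_prompts(design: str) -> dict[str, str]:
    """One prompt per persona, identical except for the opening line."""
    return {key: TEMPLATE.format(persona=p, design=design) for key, p in PERSONAS.items()}


def run_reviews(design: str, ask: Callable[[str], str]) -> dict[str, str]:
    """Each persona gets its own call, so no review sees another's output."""
    return {key: ask(prompt) for key, prompt in build_prompts(design).items()}
```

Keeping the calls separate is the point: a shared conversation would let the first critique anchor the other four.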

The Critique Loop

Every design must be actively attacked for failure modes before implementation.

"Destroy this system design. Identify all flaws, edge cases, contradictions, and scaling risks."

Security:
- No audit log for failed validations — UCITS requires
  record of all exceptions, not just passes
- Report proceeds even if 0 positions validate — must fail-safe

Compliance:
- ISIN lookup uses local cache only — stale data risk
  for newly listed instruments
- Board report timestamp not in UTC — ambiguous for
  cross-border fund reporting

Infrastructure:
- No retry logic for CSV ingestion — single network
  failure breaks the pipeline silently
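Two of these findings translate directly into code: the fail-safe for zero validated positions and the UTC timestamp. The sketch below is one possible fix; the function names and status labels are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def report_status(validated: int, total: int) -> str:
    """Fail safe: an empty or fully rejected data set can never pass."""
    if total == 0 or validated == 0:
        return "FAIL"
    if validated < total:
        return "PASS (warnings)"
    return "PASS"


def report_timestamp() -> str:
    """Timezone-aware UTC timestamp in ISO 8601, with an explicit +00:00 offset."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```

The status function checks the failure case first, so a pipeline bug that validates nothing cannot fall through to a passing report.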

No system executes until the builder approves the spec —
regardless of who drafted the first version.

AI writes. You approve. Then it runs.

The Demo

Regulatory Reporting Workflow

Everything you just saw — the memory architecture, the spec contracts, the persona gates — applied to a problem this room recognises. Built spec-first. No guessing.

The Input

847 fund positions. UCITS Article 84 requires every one to be validated before a board report can be issued.

ISIN          Qty     NAV      CCY
GB0002634946  10,000  £15.23   GBP  ✓
IE00B4L5Y983   5,000  €112.45  EUR  ✓
US0378331005   2,500  $174.82  USD  ✓
GB00B0CTWC01     800  £42.10   GBP  ✗ ISIN mismatch
...
844 valid · 3 flagged

The Output

A board-ready compliance report with a full audit trail — every decision traceable to source.

UCITS Compliance Report — Q1 2026
Fund: Carne UCITS Strategy Fund
NAV Date: 31 March 2026
─────────────────────────────────
Article 84 Status:  PASS (warnings)
  ✓ 844 / 847 positions validated
  ⚠ 3 ISIN exceptions — see Annex A
  ✓ Single-issuer limits: compliant
  ✓ Eligible assets:    100% compliant
─────────────────────────────────
Audit trail:  decisions.md rev 14
Approved by:  [awaiting sign-off]

What We're Building

A fintech team needs to ingest fund data, validate against UCITS Article 84, generate a board-ready report, and maintain a full audit trail.

One command. The harness reads the spec. It follows the rules.

› analyse fund data Q1 2026 and generate
  UCITS regulatory board report

◆ Reading CLAUDE.md, architecture.md...
◆ Loading fund_data_q1_2026.csv (847 positions)
◆ Running UCITS Article 84 validation...
  ✓ 844 positions validated
  ✗ 3 flagged: ISIN mismatch rows 124, 287, 401
◆ Generating board report...
  · Writing validator.py        (+89 lines)
  · Writing report_generator.py (+134 lines)
  · Writing output/q1_board_report.pdf
◆ Audit trail updated in decisions.md
✓ Done. Review flagged ISINs before submission.

Step 1: Define Intent

The builder writes the intent first. Claude expands it into full spec files. The builder approves before execution.

↳ Memory Architecture in action.

# Builder writes intent first — 5 lines, not 50
echo "# Vision: UCITS regulatory reporting, automated" > vision.md

# Claude drafts the full spec from your intent
claude "expand vision.md into architecture.md and CLAUDE.md"

# Review the output — you own the gate.
# Only after approval does execution proceed.

Step 2: What Claude Generated

From 5 lines of intent — a full constraint layer the whole system runs under.

↳ The repo becomes the system's memory.

# CLAUDE.md — UCITS Regulatory Reporting

## Compliance Rules
- Never generate a report without validating all positions
  against UCITS Article 84 schema first
- Flag ISIN mismatches — do not suppress or round
- Every calculation must trace to a source data row
- Board report must include compliance sign-off section

## Gate
Before generating any output: confirm schema validation passed.
If validation fails: surface all errors. Stop. Do not proceed.
Human approval required before any external submission.
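The gate can be mirrored in code so that it is enforced as well as stated. This is a minimal sketch: `GateError`, `require_validation`, and `require_approval` are hypothetical names, and how approval is actually recorded is left open.

```python
from typing import Optional


class GateError(Exception):
    """Raised when a gate in CLAUDE.md blocks the next step."""


def require_validation(errors: list[str]) -> None:
    """Surface every error at once, then stop."""
    if errors:
        raise GateError("validation failed:\n" + "\n".join(f"  - {e}" for e in errors))


def require_approval(approved_by: Optional[str]) -> str:
    """Block external submission until a named human has signed off."""
    if not approved_by or not approved_by.strip():
        raise GateError("human approval required before submission")
    return approved_by.strip()
```

Raising with the full error list, rather than stopping at the first failure, matches the rule to surface all errors before stopping.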

Step 3: ASCII Design

Before coding, we force structural alignment.

↳ ASCII-First Design. No code until structure is clear.

Fund Data → Ingestion → Validation → Report Generation → Audit Trail → Output

Step 4: The Critique

Five prompts. Five experts. Each runs independently.

You are a UK FCA-registered compliance officer
with 10 years of UCITS IV and V expertise.

Review this system design for our regulatory
reporting pipeline. Identify every compliance
gap, audit failure, and regulatory risk.
Be specific. Cite the relevant UCITS articles.

↳ Multi-Persona Analysis. The gate before implementation.

Security:
- No audit log for failed validations — UCITS requires
  record of all exceptions, not just passes
- Report proceeds even if 0 positions validate — must fail-safe

Compliance:
- ISIN lookup uses local cache only — stale data risk
  for newly listed instruments
- Board report timestamp not in UTC — ambiguous for
  cross-border fund reporting

Infrastructure:
- No retry logic for CSV ingestion — single network
  failure breaks the pipeline silently
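The infrastructure finding has a conventional remedy: retry with backoff, then fail loudly. A sketch, assuming ingestion is wrapped in a zero-argument `fetch` callable and that network failures surface as `OSError`; the name `with_retry` and the default delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on OSError retry with exponential backoff, then re-raise."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:
            if attempt == attempts:
                raise  # out of retries: fail loudly, never silently
            delay = base_delay * 2 ** (attempt - 1)
            print(f"ingestion attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so that tests can inject a fake and run without waiting.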

What Came Out

The spec said: every calculation must trace to a source row.

The code does exactly that. The compliance rule became a traceable error.

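from dataclasses import dataclass

import pandas as pd


@dataclass
class ValidationResult:
    passed: bool
    errors: list[str]
    source_rows: list  # index label of the row behind each error


def is_valid_isin(isin: str) -> bool:
    # Placeholder shape check (assumption); the real helper is not shown.
    return isinstance(isin, str) and len(isin) == 12 and isin.isalnum()


def validate_ucits_positions(
    df: pd.DataFrame,
) -> ValidationResult:
    """
    Validates fund positions against UCITS Article 84.
    Generated per architecture.md specification.
    Every error traces to its source row (CLAUDE.md rule 3).
    """
    errors = []
    source_rows = []  # kept in step with errors, one entry per error
    for idx, row in df.iterrows():
        if not is_valid_isin(row["isin"]):
            errors.append(
                f"Row {idx}: Invalid ISIN '{row['isin']}'"
            )
            source_rows.append(idx)
        if row["nav"] < 0:
            errors.append(
                f"Row {idx}: Negative NAV not permitted"
            )
            source_rows.append(idx)
    return ValidationResult(
        passed=not errors,
        errors=errors,
        source_rows=source_rows,
    )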
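The validator relies on an `is_valid_isin` helper. One way to implement it is the standard ISIN check: twelve characters in the ISO 6166 shape, with a Luhn check digit computed after expanding letters to numbers. This is a sketch of that check only. It catches typos, but it cannot detect an ISIN that is well-formed yet wrong for the instrument; that needs a reference-data lookup.

```python
import re

_ISIN_SHAPE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def is_valid_isin(isin: str) -> bool:
    """Check ISIN shape and its Luhn check digit."""
    if not isinstance(isin, str) or not _ISIN_SHAPE.fullmatch(isin):
        return False
    # Expand letters to numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: walking from the right, double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, `is_valid_isin("US0378331005")` returns `True`, while changing the final digit to `6` makes it return `False`.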

The Iteration Loop

Spec → Implementation → Critique → Refinement → Updated Memory

Every iteration leaves the repository better informed: critique findings are written back to decisions.md, so the next run starts from them instead of from scratch.

# After critique — Claude updates memory
claude "update decisions.md with findings from persona review"

# decisions.md now holds the audit trail
# Next iteration starts from an informed state

Memory Update

All decisions are recorded back into markdown files to preserve system state.

Repositories accumulate structured intelligence over time through continuous system refinement.

## 2026-05-20 — Post-Critique Update

### Auth Layer
Decision: Switched from stateless JWT to session tokens + Redis.
Reason: Distributed queue compatibility (Security persona).

### Fund Transfer Step 3
Decision: Added explicit rollback to T-1 state.
Reason: Audit requirement — every mutation must be reversible.
Approved by: Builder review 2026-05-20.
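Entries like the one above are easy to generate consistently. The sketch below approximates that layout, with a plain hyphen in the heading and without the approval line. The `Decision` class and both function names are hypothetical, and the file is only ever appended to.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Decision:
    area: str
    decision: str
    reason: str


def format_entry(day: date, title: str, decisions: list[Decision]) -> str:
    """Render one dated section with a subsection per decision."""
    lines = [f"## {day.isoformat()} - {title}", ""]
    for d in decisions:
        lines += [f"### {d.area}", f"Decision: {d.decision}", f"Reason: {d.reason}", ""]
    return "\n".join(lines)


def append_entry(path: Path, entry: str) -> None:
    """Append only: earlier decisions are never rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```

Appending instead of rewriting is what makes the file usable as an audit trail: the history of decisions stays intact even when a later entry reverses an earlier one.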

Final Synthesis

Software is not written.

It is specified, constrained, validated, and executed through agent systems.

This week:

Write a CLAUDE.md for one existing project
Run one persona critique before the next PR
Make your repo a memory system, not a file cabinet