Architecture
Stato consists of seven subsystems organized in a layered architecture. The module system provides the data model. The compiler validates it. The state manager persists it. The composer enables portability. The bridge generator connects to platforms. The privacy scanner ensures safe export. The bundle parser enables web AI transfer.
Design Analogies
Stato draws from several established software engineering concepts:
Package Manager (npm/pip): Expertise modules are versioned, have dependencies, and can be installed from a registry. stato registry install works like pip install but for agent knowledge.
Compiler (GCC/TypeScript): The 7-pass graduated compiler validates every module before it touches disk. Hard errors block writes, warnings trigger auto-corrections, and informational messages suggest improvements. Error codes (E001-E010, W001-W006, I001-I006) follow compiler conventions.
Container Runtime (Docker): Stato packages expertise so it works on any platform. A snapshot created from Claude Code expertise works identically when imported into a Cursor or Codex project, just as a Docker image runs the same way on any host.
Version Control (Git): Stato supports snapshots (commits), diffs (comparing module versions), slicing (cherry-pick), grafting (merge from external), and merging (combine two archives with conflict resolution).
What is novel: The crystallization step (the agent extracts its own knowledge rather than a human authoring documentation), the web AI to coding agent bridge, privacy scanning before export, and the composition algebra for expertise modules.
Module System
Stato defines five module types, each a Python class with structured fields:
| Type | Purpose | Required Fields | Required Methods |
|---|---|---|---|
| Skill | Reusable expertise with parameters and lessons | name | run() |
| Plan | Step-by-step execution tracking with DAG dependencies | name, objective, steps | |
| Memory | Working state: current phase, known issues, reflection | phase | |
| Context | Project metadata: datasets, environment, conventions | project, description | |
| Protocol | Multi-agent handoff schemas | name, handoff_schema | |
The key design decision is using Python classes rather than JSON, YAML, or TOML. This has three advantages: (1) the agent's tooling already parses Python, so modules are themselves executable code; (2) narrative fields like lessons_learned coexist naturally with structured fields like default_params; (3) type inference from the class structure eliminates the need for explicit type declarations.
A typical skill module:
```python
class QualityControl:
    """QC filtering for scRNA-seq data."""

    name = "qc_filtering"
    version = "1.2.0"
    depends_on = ["scanpy"]
    default_params = {
        "min_genes": 200,
        "max_genes": 5000,
        "max_pct_mito": 20,
        "min_cells": 3,
    }
    lessons_learned = """
    - Cortex tissue: max_pct_mito=20 retains ~85% of cells
    - FFPE samples: increase to max_pct_mito=40
    - Mouse data: use mt- prefix (lowercase). Human: MT-
    """

    @staticmethod
    def run(adata_path, **kwargs):
        params = {**QualityControl.default_params, **kwargs}
        return params
```
Each module type has a schema defined in core/module.py that maps field names to expected Python types. The schemas are intentionally minimal: they enforce structural correctness without constraining content.
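As a rough illustration of what such a field-to-type mapping might look like, the sketch below assumes a simple dictionary-based schema; the names (SKILL_SCHEMA, check_required_fields) are illustrative, not the actual core/module.py definitions.

```python
# Hypothetical sketch of a minimal module schema: required/optional fields
# map to expected Python types, plus a list of required method names.
SKILL_SCHEMA = {
    "required_fields": {"name": str},
    "optional_fields": {
        "version": str,
        "depends_on": list,
        "default_params": dict,
        "lessons_learned": str,
    },
    "required_methods": ["run"],
}

def check_required_fields(module_class, schema):
    """Return the names of required fields missing from a module class."""
    return [
        field for field in schema["required_fields"]
        if not hasattr(module_class, field)
    ]
```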
See Module Format for complete schema tables.
Graduated Compiler
The compiler (core/compiler.py) implements a 7-pass validation pipeline with three severity tiers:
- Hard errors (E-codes): block the write entirely
- Auto-corrections (W-codes): fixable issues applied automatically
- Advice (I-codes): suggestions that do not block
The seven passes execute in order with early termination:
| Pass | Name | Purpose | Terminates on Failure |
|---|---|---|---|
| 1 | Syntax | ast.parse() catches malformed Python | Yes |
| 2 | Structure | Finds primary class, checks for docstring | Yes |
| 3 | Type Inference | Determines module type from class name and fields | No |
| 4 | Schema Check | Verifies required fields and methods exist | Yes |
| 5 | Type Check | Validates field types, applies auto-corrections | Yes |
| 6 | Execute | Runs source in sandbox, verifies methods are callable | Yes |
| 7 | Semantic | Module-specific validation (DAG acyclicity for plans) | No |
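A minimal sketch of how a pipeline with early termination could be structured is shown below; the pass names follow the table above, but the function and variable names are assumptions, not stato's actual API.

```python
# Hypothetical sketch of a 7-pass pipeline with early termination.
# Each pass returns (ok, findings); a failing pass flagged as terminating
# stops the pipeline so later passes never see broken input.
PASSES = [
    ("syntax", True),
    ("structure", True),
    ("type_inference", False),
    ("schema_check", True),
    ("type_check", True),
    ("execute", True),
    ("semantic", False),
]

def run_pipeline(source, pass_impls):
    """pass_impls: dict mapping pass name -> callable(source) -> (ok, findings)."""
    findings = []
    for name, terminates_on_failure in PASSES:
        ok, pass_findings = pass_impls[name](source)
        findings.extend(pass_findings)
        if not ok and terminates_on_failure:
            break
    return findings
```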
Auto-Correction Mechanics
Pass 5 implements three auto-corrections that fix common issues without user intervention:
- W001: `depends_on = "scanpy"` (string) becomes `depends_on = ["scanpy"]` (list)
- W002: `depends_on = 42` (int) becomes `depends_on = [42]` (list)
- W003: `version = "1.0"` becomes `version = "1.0.0"` (adds patch number)
Corrections are applied in reverse line order to avoid offset drift when modifying source text. The corrected source is stored in ValidationResult.corrected_source and used for subsequent passes.
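The ordering trick matters because every edit changes the offsets of everything after it. A minimal sketch, assuming span-based corrections (the representation is illustrative, not the actual ValidationResult internals):

```python
# Hypothetical sketch: apply text corrections from the end of the file
# toward the start, so earlier offsets stay valid after each replacement.
def apply_corrections(source, corrections):
    """corrections: list of (start, end, replacement) character spans."""
    for start, end, replacement in sorted(corrections, reverse=True):
        source = source[:start] + replacement + source[end:]
    return source
```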
Plan Semantic Validation
Pass 7 performs semantic validation specific to each module type. For plans, this includes:
- Step ID uniqueness (E008): every step must have a unique `id`
- Dependency reference validity (E008): `depends_on` values must reference existing step IDs
- Status value validation (E010): status must be one of `pending`, `running`, `complete`, `failed`, `blocked`
- DAG acyclicity (E009): step dependencies are checked for cycles via DFS with three-color marking (white/gray/black)
See Error Codes for the full code catalog.
State Manager
The state manager (core/state_manager.py) enforces the validate-then-write invariant: no module reaches disk without passing the compiler. The write path is:
- Validate source through the 7-pass pipeline
- If an existing file is present, create a timestamped backup in `.stato/.history/`
- Write the (possibly auto-corrected) source to the target path
- Return the `ValidationResult` for caller inspection
Backups use a simple naming scheme: {module_stem}.{timestamp}.py. Rollback reads the most recent backup and rewrites the current file. This zero-dependency approach (no git required, no database) ensures stato works in any environment.
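Under that naming scheme, backup and rollback reduce to a few path operations. A minimal sketch, assuming the {module_stem}.{timestamp}.py convention above (function names are illustrative):

```python
# Hypothetical sketch of timestamped backups and rollback with no git or
# database dependency: just copy files under .stato/.history/.
import time
from pathlib import Path

def backup(module_path: Path, history_dir: Path) -> Path:
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    dest = history_dir / f"{module_path.stem}.{timestamp}.py"
    dest.write_text(module_path.read_text())
    return dest

def rollback(module_path: Path, history_dir: Path) -> None:
    # The most recent backup sorts last because timestamps are lexicographic.
    backups = sorted(history_dir.glob(f"{module_path.stem}.*.py"))
    if backups:
        module_path.write_text(backups[-1].read_text())
```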
The init_project() function creates the directory structure:
```
project/
  .stato/
    skills/              # Skill modules
    .history/            # Automatic backups
    prompts/             # Crystallize prompt templates
      crystallize.md
      crystallize_web.md
    .statoignore         # Privacy scan exclusion patterns
```
Composition Engine
The composer (core/composer.py) implements four operations that form an algebra over module collections:
Snapshot
Creates a .stato archive (ZIP with manifest.toml):
- Discovers all modules via `_discover_modules()`
- Applies optional filtering by module name, type, or exclusion list
- Optionally applies template reset (clears runtime state, preserves expertise)
- Optionally sanitizes via the privacy scanner
- Writes `manifest.toml` + module files into a ZIP archive
Import
Extracts modules from a .stato archive into a project, with optional filtering by module name or type.
Slice
Extracts specific modules with dependency awareness. When --with-deps is set, it performs BFS through the dependency graph, automatically including transitive dependencies and emitting warnings about what was auto-included.
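A sketch of the transitive-dependency walk that --with-deps implies is given below; the graph representation and names are assumptions, not the actual composer internals.

```python
# Hypothetical sketch: BFS over a module dependency graph, collecting the
# requested modules plus everything they transitively depend on.
from collections import deque

def slice_with_deps(requested, dependency_graph):
    """dependency_graph: dict mapping module name -> list of dependencies."""
    included = set(requested)
    queue = deque(requested)
    auto_included = []
    while queue:
        module = queue.popleft()
        for dep in dependency_graph.get(module, []):
            if dep not in included:
                included.add(dep)
                auto_included.append(dep)   # surfaced to the user as a warning
                queue.append(dep)
    return included, auto_included
```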
Graft
Adds external modules with conflict detection. When a name collision occurs, the caller chooses from four strategies: ask (report conflict), replace (overwrite), rename (append _imported suffix), or skip (ignore).
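The four strategies amount to a small dispatch on name collisions; a hedged sketch (the return convention is an assumption):

```python
# Hypothetical sketch of graft conflict handling. Strategy names match the
# description above; what the caller does with the result is up to it.
def resolve_conflict(name, strategy):
    if strategy == "ask":
        return ("conflict", name)              # report, let the caller decide
    if strategy == "replace":
        return ("write", name)                 # overwrite the existing module
    if strategy == "rename":
        return ("write", f"{name}_imported")   # keep both under a new name
    if strategy == "skip":
        return ("ignore", name)                # leave the existing module alone
    raise ValueError(f"unknown strategy: {strategy}")
```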
The archive format uses POSIX paths (PurePosixPath) internally for cross-platform compatibility and TOML for the manifest to keep it human-readable.
Bridge Generator
The bridge generator produces platform-specific files that serve as a lightweight index (~500 tokens) pointing agents to detailed module files. Each bridge follows the same pattern:
- Scan `.stato/` for all valid modules
- Build a skill summary table (name, version, key parameters, lesson count)
- Summarize plan progress (objective, completed/total steps, next step)
- Append working rules that guide agent behavior
Four bridge implementations share a common base class (BridgeBase):
| Platform | Output File | Section Header |
|---|---|---|
| Claude Code | CLAUDE.md | Working Rules |
| Cursor | .cursorrules | Rules |
| Codex | AGENTS.md | Working Rules |
| Generic | README.stato.md | Guidelines |
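A sketch of how the shared base class plus per-platform outputs could be arranged follows; the class and method names are assumptions, not the real BridgeBase API.

```python
# Hypothetical sketch of the shared-base-class pattern: subclasses only
# supply the output file name and section header from the table above.
class BridgeBase:
    output_file = "README.stato.md"
    section_header = "Guidelines"

    def generate(self, skills, plan):
        lines = [f"## {self.section_header}", ""]
        lines += [f"- {s['name']} v{s['version']} ({s['lessons']} lessons)"
                  for s in skills]
        if plan:
            lines.append(f"Plan: {plan['objective']} "
                         f"({plan['done']}/{plan['total']} steps)")
        return "\n".join(lines)

class ClaudeBridge(BridgeBase):
    output_file = "CLAUDE.md"
    section_header = "Working Rules"

class CursorBridge(BridgeBase):
    output_file = ".cursorrules"
    section_header = "Rules"
```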
Working Rules
Bridge files instruct agents to:
- Read the plan first
- Read relevant skills before acting
- Update plan status after completing steps
- Add new lessons to skill files
- Validate after changes
- Fix validation errors before proceeding
- Update memory before stopping
- Run `stato resume` when context feels stale
On-Demand Loading
This design keeps the bridge under 500 tokens. Agents read full skill files only when performing that specific task, avoiding the context window bloat that occurs when all expertise is embedded in a single file.
Privacy Scanner
The privacy scanner (core/privacy.py) detects sensitive content before export. It searches for 19 patterns across six categories:
| Category | Examples | Replacement |
|---|---|---|
| api_key | sk-..., sk-ant-... | {API_KEY} |
| credential | AWS keys, database URLs, private keys, passwords | {AWS_ACCESS_KEY}, {DATABASE_URL} |
| token | GitHub PATs, Slack tokens, Bearer tokens | {GITHUB_TOKEN}, {TOKEN} |
| path | /home/user/..., /Users/user/... | /home/{user}/... |
| network | Internal IPs (10.x.x.x, 192.168.x.x) | {INTERNAL_IP} |
| pii | Email addresses, patient IDs, SSNs | {EMAIL}, {SUBJECT_ID} |
The scanner includes bioinformatics-specific patterns (patient IDs, medical record numbers) reflecting stato’s origins in scientific computing workflows.
Design Principle
Sanitize-on-export, never modify originals. The sanitize() method returns a new string with replacements applied. Original files on disk are never modified by the privacy scanner.
The .statoignore file supports pattern-based suppression for known false positives, following a format similar to .gitignore.
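A minimal sketch of the pattern-and-placeholder approach, assuming regex-based detection; the specific patterns shown are illustrative, not stato's 19 actual patterns:

```python
# Hypothetical sketch of sanitize-on-export: scan with regexes and return
# a NEW string with placeholders substituted; the original is untouched.
import re

PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9-]{16,}"), "{API_KEY}"),
    (re.compile(r"/home/[^/\s]+"), "/home/{user}"),
    (re.compile(r"\b(?:10|192\.168)\.\d{1,3}\.\d{1,3}(?:\.\d{1,3})?\b"), "{INTERNAL_IP}"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "{EMAIL}"),
]

def sanitize(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```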
Interactive Review Gate
The snapshot command integrates the scanner through an interactive review gate. When findings are detected and no --sanitize or --force flag is passed, the user sees a grouped summary and four choices: sanitize (auto-replace), review (see full detail then decide), force (export as-is with warning), or cancel.
See Privacy & Security for usage details.
Bundle Import (Web AI Bridge)
The bundle system solves a specific problem: web AIs (Claude.ai, Gemini, ChatGPT) can generate structured code but cannot run CLI tools. The solution is a two-step workflow:
- Crystallize (in web AI): paste the `stato crystallize --web` prompt, which asks the AI to output a single Python file (`stato_bundle.py`) containing all expertise as string variables
- Import (in coding agent): run `stato import-bundle stato_bundle.py` to parse, validate, and write the modules
Bundle Format
```python
SKILLS = {
    "skill_name": '''class SkillName:
    name = "skill_name"
    version = "1.0.0"
    ...''',
}

PLAN = '''class AnalysisPlan:
    name = "plan_name"
    ...'''

MEMORY = '''...'''
CONTEXT = '''...'''
```
Security Model
The bundle parser (core/bundle.py) uses ast.parse() to safely extract variable values without executing the untrusted file. It walks the AST, finds assignments to SKILLS, PLAN, MEMORY, and CONTEXT, and extracts their string literal values. No code from the bundle file is ever executed during parsing.
After parsing, each extracted module is validated through the full 7-pass compiler before being written to disk via the state manager, maintaining the validate-then-write invariant.
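A sketch of the AST-walk extraction is shown below; the variable names follow the bundle format above, while the helper name and return shape are assumptions.

```python
# Hypothetical sketch: extract string-literal assignments from an
# untrusted bundle file without executing any of its code.
import ast

BUNDLE_VARS = {"SKILLS", "PLAN", "MEMORY", "CONTEXT"}

def extract_bundle(source: str) -> dict:
    tree = ast.parse(source)          # parses, never executes
    extracted = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in BUNDLE_VARS:
                    # literal_eval only accepts literals (strings, dicts, ...);
                    # arbitrary expressions raise instead of running.
                    extracted[target.id] = ast.literal_eval(node.value)
    return extracted
```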
See Web AI Bridge for the complete workflow.
Cross-Platform Strategy
Stato uses the file system as a universal interface. Every supported platform has a convention for project-level instruction files:
- Claude Code reads `CLAUDE.md`
- Cursor reads `.cursorrules`
- Codex reads `AGENTS.md`
These are ordinary files that the agent discovers automatically. Stato generates them as lightweight indexes that point to .stato/ for detailed content.
Token Cost Model
| Component | Approximate Tokens |
|---|---|
| Bridge file | ~500 |
| Each skill (on demand) | ~300-500 |
| Plan module | ~200-400 |
| Memory + Context | ~200-300 |
An agent performing a specific task reads the bridge (~500 tokens) plus the relevant skill (~400 tokens), totaling ~900 tokens of expertise context. This is far less than embedding all expertise inline, which would scale linearly with the number of skills.
The bridge file is regenerated on demand via stato bridge, ensuring it always reflects the current state of .stato/. Agents are instructed via working rules to run stato validate .stato/ after making changes, which keeps the feedback loop tight.
Stato as Persistent Memory
Context compaction (/compact) is a lossy operation. When an agent’s context window fills, the system summarizes the conversation history to free space. This summarization discards specific details: parameter values that worked, error messages that were diagnosed, architectural decisions and their rationale.
Stato modules survive compaction because they are files on disk, not conversation content.
The Crystallize-Compact-Resume Cycle
- Before compacting: the agent (or user) runs `stato crystallize` and captures key expertise into `.stato/` modules
- During compaction: conversation history is summarized, but `.stato/` files remain intact
- After compaction: the agent runs `stato resume` to get a structured recap of project state, or the bridge file (`CLAUDE.md`) already points to the surviving modules
The stato resume command reads all modules and produces a structured recap covering project context, plan progress with completed step outputs, available skills with parameters and lesson counts, and memory state including known issues and reflections. A --brief mode compresses this to a single paragraph for quick orientation.
Comparison of Memory Approaches
| Approach | Validated | Portable | Survives Compact | Structured |
|---|---|---|---|---|
| Manual CLAUDE.md | No | No | Yes (file) | No |
| /compact summary | No | No | No (lossy) | No |
| Platform memory | No | No | Varies | Varies |
| Stato modules | Yes (compiler) | Yes (archives) | Yes (files) | Yes (schema) |
Limitations
The agent must cooperate by actually writing to .stato/ modules. Crystallization quality depends on prompt quality and the agent’s willingness to follow instructions. Stato validates format, not semantic correctness: a lessons_learned field with “todo: add lessons” will pass validation.
Related Work
| System | Focus | How Stato Differs |
|---|---|---|
| SkillKit | Prompt templates | Stato validates skills, not just templates. Modules have structure, versioning, and dependencies. |
| MemGPT/Letta | Long-term memory via memory management | Stato is file-based and platform-agnostic. No server required. |
| Voyager (Minecraft) | Skill library for game agents | Stato is general-purpose and cross-platform, not game-specific. |
| LangGraph | Agent orchestration graphs | Stato manages expertise, not execution flow. Complementary. |
| CrewAI | Multi-agent coordination | Stato focuses on expertise persistence, not agent communication. Protocol modules are planned for future handoff support. |
| Custom GPTs | Platform-specific agent configuration | Stato is open, portable, and composable. No vendor lock-in. |
| CLAUDE.md (manual) | Project-level agent instructions | Stato adds validation, versioning, composition, privacy scanning, and cross-platform bridges. |