How Fabula Works

Stories Are Knowledge Graphs. We Treat Them That Way.

Most AI tools read scripts the way autocomplete reads sentences—left to right, no memory, no structure. Fabula decomposes narrative into a typed knowledge graph with five core node types, dual temporal indices, and hallucination-guarded extraction. This is what it takes to get it right.

Engineered over 18 months. Tested on 60+ episodes across Star Trek TNG, The West Wing, Doctor Who, and more.

The Foundation

Five Node Types. Every Story Ever Told.

Every narrative, from a sitcom pilot to a seven-season epic, reduces to the same five primitives. The insight isn't that these entities exist—it's how they connect. We separate canonical identity from transient participation state. A Character is who someone is. An EventParticipation is who they are in this scene—their goals, their emotional state, what they did. Most tools flatten this distinction. That's why they lose the plot.

Character · Canonical identity
Event · Narrative action
Scene · Temporal container
Location · Spatial anchor
Object · Narrative prop

When other tools extract “Picard was angry in this scene,” they overwrite his previous state. When Fabula extracts it, the anger is scoped to an EventParticipation—his canonical node retains the full arc. This is the difference between a search index and a knowledge graph.
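The canonical/participation split described above can be sketched in a few lines. This is an illustrative model, not Fabula's actual schema; the class and field names are assumptions chosen to mirror the JSON examples on this page.

```python
from dataclasses import dataclass, field

@dataclass
class EventParticipation:
    character_uuid: str
    event_uuid: str
    emotional_state_at_event: str
    goals_at_event: list

@dataclass
class Character:
    uuid: str
    canonical_name: str
    aliases: list = field(default_factory=list)
    participations: list = field(default_factory=list)  # full arc, never overwritten

picard = Character("agent_picard", "Jean-Luc Picard", ["The Captain", "Picard"])

# "Picard was angry" is scoped to one event, not written onto the canonical node:
picard.participations.append(
    EventParticipation("agent_picard", "event_042", "angry", ["confront Q"]))
picard.participations.append(
    EventParticipation("agent_picard", "event_043", "composed", ["negotiate"]))

# The canonical node retains the full emotional arc across events:
arc = [p.emotional_state_at_event for p in picard.participations]
print(arc)  # ['angry', 'composed']
```

Overwriting a single `emotional_state` field on `Character` would lose the arc; appending scoped participations preserves it.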

EventParticipation · BDI Model
Mary Tudor · Wolf Hall S1 · Schema-validated JSON
{
  "incarnation_identifier":
    "as the King's Demoted Daughter
     and Cromwell's Reluctant Debtor",
  "emotional_state_at_event":
    "Defiant and calculating, with
     undercurrents of vulnerability",
  "goals_at_event": [
    "Force Cromwell to acknowledge
     the political debt he owes her",
    "Test the limits of his protection"
  ],
  "beliefs_at_event": [
    "Cromwell's 'protection' is as much
     a cage as a shield",
    "Her value as a political pawn is both
     her weakness and her leverage"
  ],
  "importance_to_event": "primary"
}

Not Just “Character Was Present”

Every EventParticipation captures the character's Beliefs, Desires, and Intentions in that specific moment. The incarnation identifier tracks how the character presents in this scene versus their canonical identity. This is what makes queries like “scenes where Mary's goals conflict with Cromwell's beliefs” answerable.

The BDI model is extracted per-character, per-event, producing a temporal stack of psychological states that can be traversed, compared, and queried across the full series.
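A query like the one above can be sketched as a traversal over that temporal stack. The data shapes below are assumptions, and the keyword-overlap test is a deliberately crude stand-in for the reasoning the real system would apply; it only illustrates how per-character, per-event BDI records make the question answerable at all.

```python
# Hypothetical BDI stack entries, shaped like the EventParticipation example above.
bdi_stack = [
    {"event": "e1", "character": "Mary",
     "goals_at_event": ["Force Cromwell to acknowledge the debt"],
     "beliefs_at_event": ["Cromwell's protection is a cage"]},
    {"event": "e1", "character": "Cromwell",
     "goals_at_event": ["Keep Mary compliant"],
     "beliefs_at_event": ["Mary will not force the debt"]},
]

def conflicting_events(stack, a, b):
    """Events where character a's goals touch what character b believes
    (approximated here by shared keywords; a crude illustrative heuristic)."""
    by_event = {}
    for p in stack:
        by_event.setdefault(p["event"], {})[p["character"]] = p
    hits = []
    for event, parts in by_event.items():
        if a in parts and b in parts:
            goals = " ".join(parts[a]["goals_at_event"]).lower()
            beliefs = " ".join(parts[b]["beliefs_at_event"]).lower()
            if any(w in beliefs for w in goals.split() if len(w) > 4):
                hits.append(event)
    return hits

print(conflicting_events(bdi_stack, "Mary", "Cromwell"))  # ['e1']
```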

Read: Why Stories Are Webs →
The Architecture

A Compositional Cognitive Architecture, Not a Prompt.

Fabula doesn't send your script to an LLM with “extract the characters.” It runs a four-phase pipeline where each stage has its own schema constraints, validation gates, and error-recovery paths. Base personas combine with specialist modifiers and style templates—each independently testable and versionable.

1

Structural Decomposition

Parse the screenplay into acts, scenes, and beats. Identify dialogue vs. action. Map the temporal skeleton before touching content.

2

Entity Synthesis

Extract characters, locations, objects. Resolve aliases against the existing graph. “The Captain” = “Picard” = “Jean-Luc.” One canonical node.

3

Event Enrichment with BDI Model

For every event, extract Beliefs, Desires, and Intentions of each participant. Not just what happened—but what each character wanted, believed, and did about it.

4

Graph Construction with Dual Timelines

Build the knowledge graph with two temporal indices: fabula (story-world chronology) and syuzhet (narrative presentation order). Flashbacks get both timestamps.
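The dual temporal index in phase 4 amounts to giving every event two sortable positions. A minimal sketch, with illustrative field names (the real graph properties may differ):

```python
from dataclasses import dataclass

@dataclass
class Event:
    title: str
    fabula_index: int    # story-world chronology
    syuzhet_index: int   # order of presentation on screen
    is_flashback: bool = False

events = [
    Event("Opening battle", fabula_index=10, syuzhet_index=1),
    Event("Childhood memory", fabula_index=1, syuzhet_index=2, is_flashback=True),
    Event("Aftermath", fabula_index=11, syuzhet_index=3),
]

# The same events traverse in either order; the flashback sits early in the
# fabula timeline but second in the syuzhet timeline.
as_told = [e.title for e in sorted(events, key=lambda e: e.syuzhet_index)]
in_world = [e.title for e in sorted(events, key=lambda e: e.fabula_index)]
print(in_world)  # ['Childhood memory', 'Opening battle', 'Aftermath']
```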

The key architectural insight: base personas + specialist modifiers + style templates. Each component is independently testable and versionable. When the BDI extractor improves, we deploy it without touching structural decomposition. When a new show has unusual formatting, we swap one style template.

Why This Matters

Monolithic prompts are brittle. Change one thing and three others break. A compositional architecture means each stage has its own contract, its own tests, its own failure modes. It's how you build systems that improve reliably over time.

Pipeline Output
EventInteractionOutput · Wolf Hall S1 · Schema-validated JSON
{
  "title": "The Ring and the Reckoning:
            A Chess Match of Power and Debt",
  "sequence_in_scene": 13,
  "key_dialogue": [
    { "speaker": "MARY",
      "dialogue": "You see, I am wearing your
        verses, in praise of obedience." },
    { "speaker": "CROMWELL",
      "dialogue": "Cardinal Wolsey used to say,
        'Show your power by your absence.'" },
    { "speaker": "MARY",
      "dialogue": "Your care of me has been
        so tender. Like that of a father." }
  ],
  "is_flashback": null,
  "arc_uuids": ["arc_tudor_succession_crisis"]
}
Context Engineering

Every Function Gets Exactly the Context It Needs. Nothing More.

The AI industry is waking up to what Andrej Karpathy calls “the delicate art and science of filling the context window with just the right information for the next step.” We've been doing it since day one. Every function in our pipeline receives a dynamically constructed payload—optimised for that specific task, stripped of everything irrelevant, scoped to prevent drift. This is the difference between prompting a model and engineering a system.

Payload Scoping

The entity synthesiser sees only the current scene excerpt and the relevant slice of the existing graph—not the entire screenplay, not the full database. Each function receives a context window assembled for its specific task.

Adaptive Token Budgeting

Content duration and complexity determine sampling density. Short scenes get dense context; long sequences get intelligently compressed. Frame counts, token limits, and output caps are all set dynamically—not by a fixed prompt template.

Context Rot Prevention

Chroma's research shows LLM performance degrades well before stated token limits. We never let it get there. Scope constraints, negative instructions, and temporal boundaries keep each call focused. The model can't hallucinate about scenes it never sees.

Prior State Injection

Each extraction call is anchored to previous work. The draft screenplay, the existing entity graph, the resolved aliases—injected as constraints, not conversation history. The model builds on verified state, not its own prior outputs.

Most AI applications dump everything into a giant context window and hope for the best. That's prompt engineering. Context engineering is the opposite: you build a system that dynamically constructs exactly the right payload for each function call. The model sees only what it needs, formatted how it needs it, with explicit constraints on what it should ignore.
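The payload-construction idea can be sketched as a pure function: each call receives the scene, a capped slice of the graph, a typed contract, and explicit negative instructions, and never receives conversation history. Everything below (field names, the cap, the constraint wording) is an illustrative assumption, not Fabula's actual payload format.

```python
def build_payload(scene_excerpt, graph_slice, schema_name, max_entities=20):
    """Assemble only what one extraction call needs; nothing more."""
    return {
        "task": "entity_synthesis",
        "schema": schema_name,                         # typed contract for the output
        "scene": scene_excerpt,                        # current scene only
        "known_entities": graph_slice[:max_entities],  # relevant slice, capped
        "constraints": [
            "Extract entities from this scene only.",
            "Do not invent entities absent from the excerpt.",
        ],
    }

payload = build_payload(
    "INT. READY ROOM - DAY. PICARD studies a padd.",
    [{"name": "Jean-Luc Picard", "aliases": ["The Captain"]}],
    "EntitySynthesisOutput",
)
assert "conversation_history" not in payload  # prior turns are never sent
print(sorted(payload))
```

The model builds on verified state injected as constraints, not on its own prior outputs.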

Why This Is the Economics Layer

Cost control at the token level

Every token costs money. Scoped payloads mean we send thousands of tokens per call, not tens of thousands. The savings compound across sixty episodes.

Smaller models, same quality

A focused 8K-token payload to a smaller model outperforms a bloated 128K-token dump to a frontier model. We route tasks to the cheapest model that can handle the scoped context.

Hallucination prevention at source

Context poisoning, context distraction, context confusion—the failure modes Drew Breunig catalogued—all stem from sending the model too much, too irrelevant, or too contradictory information. We eliminate them structurally.

Model-agnostic by design

Because context is engineered per-function, we can swap models without rewriting prompts. OpenAI for vision, Anthropic for reasoning, open-source for commodity tasks. The context layer is the stable interface.

In Practice: What One Extraction Call Actually Sees

INCLUDED

  • Current scene excerpt
  • Relevant entity slice from graph
  • Typed schema contract
  • Scope constraints and negative instructions

EXCLUDED

  • Other scenes in the episode
  • Unrelated graph entities
  • Prior conversation history
  • Previous extraction outputs

THE RESULT

  • Focused, deterministic output
  • No scene-hopping or entity confusion
  • Runs on smaller, cheaper models
  • Reproducible and auditable
The Guardrails

Every Claim Requires Evidence. Every Entity Earns Its Place.

LLMs hallucinate. This is not a philosophical problem—it's an engineering one. We treat hallucination the way databases treat corruption: with schema constraints, validation layers, and evidence requirements at every extraction boundary.

BAML Schema Enforcement

Every LLM output is validated against a typed schema before it enters the graph. Wrong types, missing fields, malformed relationships—rejected at the boundary.

Evidence Grounding

Every extracted entity and relationship must cite the scene, dialogue line, or action description that supports it. No citation, no node.

Confidence Gating at 0.7

Extractions below our confidence threshold are flagged for human review, not silently committed. The graph stays clean; the human stays in the loop.

Contrastive Entity Sharpening

When the model is unsure whether two mentions refer to the same entity, it generates arguments for and against—then resolves with evidence, not probability.
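The evidence and confidence gates above compose into a simple boundary check. This is a minimal sketch with assumed field names; the real system validates full BAML schemas, not a three-field record.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # the gating threshold stated above

@dataclass
class Extraction:
    entity: str
    evidence: str      # citation: scene, dialogue line, or action description
    confidence: float

def gate(extraction):
    """Reject ungrounded output at the boundary; route low confidence to review."""
    if not extraction.evidence:
        return "REJECTED"          # no citation, no node
    if extraction.confidence < CONFIDENCE_THRESHOLD:
        return "HUMAN_REVIEW"      # flagged, not silently committed
    return "COMMITTED"

print(gate(Extraction("Norfolk", "S1E2, sc. 14: 'Norfolk storms in'", 0.92)))  # COMMITTED
print(gate(Extraction("Norfolk", "S1E2, sc. 14", 0.55)))                       # HUMAN_REVIEW
print(gate(Extraction("Ghost duke", "", 0.99)))                                # REJECTED
```

Note the ordering: an ungrounded claim is rejected outright even at high confidence, because evidence is a harder requirement than confidence.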

Results: What the Guardrails Deliver

  • Duplicate entity flags: 66% fewer
  • Harmonization time: 62% faster
  • API calls per episode: 43% fewer
Entity Resolution

Zero Entity Drift. From Pilot to Series Finale.

The hardest problem in narrative extraction isn't finding entities—it's keeping them consistent across sixty episodes and five years of production. We solve this with a hybrid architecture: Neo4j for structural queries and relationship traversal, ChromaDB for semantic similarity and fuzzy matching.

Before Entity Resolution

“The Captain” · Separate node
“Picard” · Separate node
“Jean-Luc” · Separate node

After Entity Resolution

Jean-Luc Picard
Aliases: “The Captain” · “Picard” · “Jean-Luc” · “Number One” (by Lwaxana)

726 Canonical Entities · 0 Entity Drift · <100ms Query Latency

Neo4j handles the structural questions: “Which characters appear in both the pilot and the finale?” ChromaDB handles the semantic ones: “Is ‘the Starfleet captain on the Enterprise’ the same person as ‘Picard’?” The hybrid architecture means we get both relational precision and semantic intelligence—and the confidence to merge only when the evidence warrants it.
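The division of labour can be illustrated with a toy version of each lookup: exact alias matching against the canonical graph (the Neo4j side) and nearest-neighbour search over embeddings (the ChromaDB side). The three-dimensional vectors below are hand-made stand-ins for real embeddings, and all names are illustrative assumptions.

```python
import math

graph = {"agent_picard": {"name": "Jean-Luc Picard",
                          "aliases": {"picard", "the captain", "jean-luc"}}}
embeddings = {"agent_picard": [0.9, 0.1, 0.2], "agent_q": [0.1, 0.9, 0.3]}

def resolve_exact(mention):
    """Neo4j-style structural lookup: alias match against the canonical graph."""
    for uuid, node in graph.items():
        if mention.lower() in node["aliases"]:
            return uuid
    return None

def resolve_fuzzy(query_vec, floor=0.8):
    """ChromaDB-style semantic lookup: nearest embedding above a floor.
    Matches become merge *candidates* for adjudication, never auto-merges."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    best = max(embeddings, key=lambda u: cos(query_vec, embeddings[u]))
    return best if cos(query_vec, embeddings[best]) >= floor else None

print(resolve_exact("The Captain"))       # agent_picard
print(resolve_fuzzy([0.88, 0.15, 0.18]))  # agent_picard
```

Relational precision answers "which node is The Captain?"; semantic similarity answers "which node *might* this paraphrase describe?" The adjudicator decides what to do with the second kind of answer.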

Typed Narrative Edges
connections.yaml · Schema-validated
- connection_type: CAUSAL
  strength: strong
  description: "Q's abrupt appearance and assertion of authority
    on the bridge leads directly to his ultimatum commanding
    humanity's retreat, provoking Picard's demand for Q's
    identity and Conn's readiness to fight."

- connection_type: CHARACTER_CONTINUITY
  strength: strong
  description: "Picard's immediate reaction to Conn being frozen
    — administering orders for medical aid and confronting Q —
    reflects his steadfast leadership and moral resolve."

- connection_type: THEMATIC_PARALLEL
  strength: medium
  description: "Both events stage a confrontation between
    institutional authority and individual moral conviction,
    with the bridge serving as contested ground."

Every edge carries a type, a strength, and a narrative claim explaining why these events connect—not just that they do. This is what makes the graph queryable.

Adjudication

Multi-Level LLM Reasoning. Not a Similarity Threshold.

When semantic search surfaces potential duplicates, most systems apply a cosine similarity threshold and hope for the best. Fabula uses multi-level LLM adjudication: a compositional prompt system that reasons through evidence, weighs competing interpretations, and documents every decision. The system doesn't just resolve entities—it gets smarter with each pass.

1

Candidate Discovery

ChromaDB semantic search surfaces plausible matches. Mathematical filtering reduces O(n²) comparisons to a tractable candidate set—only entities that could plausibly be the same.

2

LLM Adjudication

Each candidate pair goes to an LLM with full narrative context, existing descriptions, aliases, and a decision framework. The model reasons through evidence for and against merging—then commits with a confidence score.

3

Graph Update + Sharpening

Merges create a refined canonical entity with synthesised descriptions. KEEP_SEPARATE decisions trigger contrastive entity sharpening—enhancing both entities' definitions so they're never flagged as duplicates again.
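Step 1's filtering can be sketched with a blocking heuristic: a cheap key per entity prunes the O(n²) pair space to pairs that share something, and only those survivors go to the LLM. The real system blocks on semantic similarity; the token-overlap key below is a deliberately simple stand-in.

```python
from itertools import combinations

entities = ["Cromwell's Study at Austin Friars", "Cromwell's First Study",
            "Thomas Howard", "Duke of Norfolk", "Whitehall Gallery"]

def blocking_key(name):
    # Significant tokens, lowercased, trailing possessives/plurals trimmed.
    return frozenset(w.lower().rstrip("'s") for w in name.split() if len(w) > 3)

all_pairs = list(combinations(entities, 2))
candidates = [(a, b) for a, b in all_pairs
              if blocking_key(a) & blocking_key(b)]  # share at least one token

print(len(all_pairs), "->", len(candidates))  # 10 -> 1
```

Only the two study locations survive the filter; the other eight pairs never cost an adjudication call.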

KEEP_SEPARATE · Contrastive Sharpening
Wolf Hall · Austin Friars Locations · Confidence: 0.95
{
  "decision": "KEEP_SEPARATE",
  "reasoning": "Two distinct studies within the
    same townhouse, serving different narrative
    purposes: one a political meeting space
    (Wolsey-Cromwell, daylight, power plays),
    the other a private emotional retreat
    (Gregory's confession, candlelit solitude).",
  "distinction_clarifications": [
    {
      "entity": "Cromwell's Study at Austin Friars",
      "enhancement": "Primary private study for
        introspection and family interactions",
      "distinguishing_features": [
        "Private emotional retreat (nighttime)",
        "13 events across multiple scenes",
        "Site of Gregory Cromwell's confession"
      ]
    },
    {
      "entity": "Cromwell's First Study",
      "name_refinement": "Cromwell's First
        Political Study (Austin Friars)",
      "distinguishing_features": [
        "Wolsey-Cromwell first meeting only",
        "Daylit, tight chamber (political)",
        "Tapestry of Solomon and Sheba"
      ]
    }
  ]
}

The system identifies that these are narratively distinct spaces—one political, one intimate—and sharpens both definitions to prevent future false matches. The “First Study” even gets a refined name.

MERGE · Canonical Refinement
Wolf Hall · Character Entities · Confidence: 0.95
{
  "decision": "MERGE",
  "surviving_entity_uuid": "agent_4de4ba187e64",
  "uuids_to_deprecate": ["agent_d7bba7626ce0"],
  "refined_canonical_name":
    "Thomas Howard, Duke of Norfolk",
  "refined_foundational_description":
    "Thomas Howard, Duke of Norfolk, heads the
     Howard family among Henry VIII's leading
     nobles. A staunch defender of aristocratic
     privilege who opposes Cromwell's rise through
     aggressive outbursts, pointed interrogations,
     and calculated defiance...",
  "refined_aliases": [
    "Norfolk",
    "Duke of Norfolk",
    "Thomas Howard"
  ],
  "reasoning": "Both entities refer to the same
    historical figure. Aliases overlap. Descriptions
    are complementary, not contradictory."
}

The merge creates a single canonical entity with a synthesised description drawn from both sources, and a consolidated alias list. The deprecated UUID is preserved in the graph—never deleted.

Full Audit Chain

Every adjudication decision—the reasoning, the confidence score, the evidence weighed—is stored in the graph alongside the entities it affected. Deprecated entities aren't deleted; they're linked to the surviving canonical with the full decision chain. When someone asks “why did you merge these?” or “why are these separate?”—there's a complete, LLM-generated analytical answer.

For IP owners, this is the foundation of provenance. A knowledge graph of narrative isn't just a creative tool—it's an auditable record of every analytical claim made about the property. In a world where IP valuation drives trillion-dollar transactions, that audit trail has real value.

A System That Gets Smarter

Contrastive entity sharpening isn't just about the current decision. Every KEEP_SEPARATE adds distinguishing features and description enhancements that update the entities' embeddings in ChromaDB. Next time semantic search runs, those two entities are further apart in vector space. Fewer false positives. Fewer adjudication calls. Lower cost.

Episode 1: Two vague “Cromwell's Study” entries flagged as duplicates, adjudicated, sharpened.

Episode 6: Same locations extracted again—but now their descriptions are specific enough that ChromaDB doesn't flag them. Zero cost. Zero latency.

By series end: The graph has self-optimised its own entity boundaries. Each round is faster, cheaper, and more precise than the last.
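Why sharpening moves entities apart in vector space can be shown with a toy embedding: a bag-of-words set standing in for the real ChromaDB vectors, and Jaccard similarity standing in for cosine. The descriptions below paraphrase the Wolf Hall example; the numbers are only meant to show the direction of the change.

```python
def embed(text):
    return frozenset(text.lower().split())

def similarity(a, b):  # Jaccard similarity as a stand-in for cosine
    return len(a & b) / len(a | b)

before_a = embed("Cromwell's study at Austin Friars")
before_b = embed("Cromwell's study")
after_a = embed("Cromwell's study at Austin Friars private emotional retreat nighttime")
after_b = embed("Cromwell's first political study daylit chamber Wolsey meeting")

print(round(similarity(before_a, before_b), 2))  # high -> flagged as duplicates
print(round(similarity(after_a, after_b), 2))    # lower -> not flagged again
```

Each KEEP_SEPARATE enriches the descriptions, the enriched descriptions re-embed further apart, and the next semantic search never surfaces the pair.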

The distinction between “Cromwell's private study for family confessions” and “the chamber where Wolsey first summoned him” is exactly the kind of nuance that matters for production: which set to redress, which emotional register to maintain, which thematic thread to follow. A simple string match would merge them. The adjudicator understands they're different rooms in the same house.

Performance

83 Minutes Per Episode. Down from Five Hours.

Performance isn't a feature—it's the difference between a research prototype and production software. We've obsessed over the pipeline, finding 3.6x improvements through architectural discipline rather than hardware scaling.

5h → 83min

Prep / Async / Save

The three-step pattern: prepare all data before LLM calls, run extraction phases concurrently where possible, batch-write results. Simple discipline, dramatic results.

9,993 → 254

Smart Duplicate Filtering

Mathematical filtering reduces pairwise comparisons from O(n²) to a tractable set. We don't compare every entity against every other—we compare candidates that could plausibly match.

18x faster

Batch Graph Operations

Neo4j UNWIND operations replace individual node-by-node writes. One batch operation where we used to make thousands of individual calls.
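The batch pattern looks roughly like this: one parameterised UNWIND statement carries the whole batch in a single round trip. The Cypher labels and properties are illustrative, and the commented driver call shows where a real `neo4j` Python driver session would execute it.

```python
# One statement merges every row; the alternative is one MERGE call per node.
BATCH_MERGE = """
UNWIND $rows AS row
MERGE (c:Character {uuid: row.uuid})
SET c.canonical_name = row.name
"""

rows = [{"uuid": f"agent_{i}", "name": f"Character {i}"} for i in range(1000)]

# With the official neo4j driver (connection details omitted):
#     with GraphDatabase.driver(uri, auth=auth) as driver:
#         driver.execute_query(BATCH_MERGE, rows=rows)  # one round trip, not 1000
```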

Read: From 5 Hours to 83 Minutes →
Security

Your Scripts Never Leave Your Control

We Never

  • Share your scripts with other customers
  • Use your scripts to train AI models
  • Store scripts on shared infrastructure
  • Allow cross-customer data access

You Control

  • Where your data is stored (our servers or yours)
  • Which AI providers process your scripts
  • Who has access to your knowledge graph
  • When data gets deleted (deletion is permanent)

Cloud

We host everything. You upload scripts, we handle servers, backups, updates.

Self-Hosted

You run it on your infrastructure. Your scripts never leave your network.

Hybrid

Process scripts on your servers. Use our cloud for search and visualization.

See the Architecture in Action

Explore our live knowledge graph of The West Wing, or request a demo with your own scripts.