Stories Are Knowledge Graphs. We Treat Them That Way.
Most AI tools read scripts the way autocomplete reads sentences—left to right, no memory, no structure. Fabula decomposes narrative into a typed knowledge graph with five core node types, dual temporal indices, and hallucination-guarded extraction. This is what it takes to get it right.
Engineered over 18 months. Tested on 60+ episodes across Star Trek TNG, The West Wing, Doctor Who, and more.
Five Node Types. Every Story Ever Told.
Every narrative, from a sitcom pilot to a seven-season epic, reduces to the same five primitives. The insight isn't that these entities exist—it's how they connect. We separate canonical identity from transient participation state. A Character is who someone is. An EventParticipation is who they are in this scene—their goals, their emotional state, what they did. Most tools flatten this distinction. That's why they lose the plot.
Character
Canonical identity
Event
Narrative action
Scene
Temporal container
Location
Spatial anchor
Object
Narrative prop
When other tools extract “Picard was angry in this scene,” they overwrite his previous state. When Fabula extracts it, the anger is scoped to an EventParticipation—his canonical node retains the full arc. This is the difference between a search index and a knowledge graph.
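The split can be sketched as two types, where transient participation state points back at the canonical node. A minimal illustration; these names are invented for the example, not Fabula's actual schema:

```python
# Sketch of the canonical-identity vs. participation-state split.
# Type and field names are illustrative, not Fabula's real schema.
from dataclasses import dataclass, field

@dataclass
class Character:
    uuid: str
    canonical_name: str
    aliases: list[str] = field(default_factory=list)

@dataclass
class EventParticipation:
    character_uuid: str              # points back to the canonical node
    event_uuid: str
    emotional_state_at_event: str
    goals_at_event: list[str] = field(default_factory=list)

picard = Character("char_picard", "Jean-Luc Picard", ["Picard", "The Captain"])

# "Picard was angry" is scoped to one event; the canonical node is untouched.
p1 = EventParticipation("char_picard", "evt_q_ultimatum", "angry")
```

Because the anger lives on `p1`, not on `picard`, extracting a later scene where he is calm adds a second participation rather than overwriting the first.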
{
  "incarnation_identifier": "as the King's Demoted Daughter and Cromwell's Reluctant Debtor",
  "emotional_state_at_event": "Defiant and calculating, with undercurrents of vulnerability",
  "goals_at_event": [
    "Force Cromwell to acknowledge the political debt he owes her",
    "Test the limits of his protection"
  ],
  "beliefs_at_event": [
    "Cromwell's 'protection' is as much a cage as a shield",
    "Her value as a political pawn is both her weakness and her leverage"
  ],
  "importance_to_event": "primary"
}
Not Just “Character Was Present”
Every EventParticipation captures the character's Beliefs, Desires, and Intentions in that specific moment. The incarnation identifier tracks how the character presents in this scene versus their canonical identity. This is what makes queries like “scenes where Mary's goals conflict with Cromwell's beliefs” answerable.
The BDI model is extracted per-character, per-event, producing a temporal stack of psychological states that can be traversed, compared, and queried across the full series.
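As a toy illustration of that temporal stack, a per-character, story-ordered query over participation records might look like the following; the records and field names here are invented for the example:

```python
# Toy traversal of a per-character BDI stack (illustrative data).
participations = [
    {"character": "mary", "event_seq": 3, "emotional_state": "defiant",
     "goals": ["force Cromwell to acknowledge the debt"]},
    {"character": "mary", "event_seq": 9, "emotional_state": "vulnerable",
     "goals": ["test the limits of his protection"]},
    {"character": "cromwell", "event_seq": 3, "emotional_state": "guarded",
     "goals": ["deny any debt"]},
]

def bdi_stack(character: str) -> list[dict]:
    """Return a character's psychological states in story order."""
    rows = [p for p in participations if p["character"] == character]
    return sorted(rows, key=lambda p: p["event_seq"])

trajectory = [p["emotional_state"] for p in bdi_stack("mary")]
```

The same stack, compared against another character's beliefs at matching event positions, is what makes goal-conflict queries answerable.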
A Compositional Cognitive Architecture, Not a Prompt.
Fabula doesn't send your script to an LLM with “extract the characters.” It runs a four-phase pipeline where each stage has its own schema constraints, validation gates, and error-recovery paths. Base personas combine with specialist modifiers and style templates—each independently testable and versionable.
Structural Decomposition
Parse the screenplay into acts, scenes, and beats. Identify dialogue vs. action. Map the temporal skeleton before touching content.
Entity Synthesis
Extract characters, locations, objects. Resolve aliases against the existing graph. “The Captain” = “Picard” = “Jean-Luc.” One canonical node.
Event Enrichment with BDI Model
For every event, extract Beliefs, Desires, and Intentions of each participant. Not just what happened—but what each character wanted, believed, and did about it.
Graph Construction with Dual Timelines
Build the knowledge graph with two temporal indices: fabula (story-world chronology) and syuzhet (narrative presentation order). Flashbacks get both timestamps.
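The four phases above can be sketched as separate functions, each with its own contract. The parsing and entity heuristics here are deliberately naive stand-ins for the real stages:

```python
# Minimal four-phase pipeline sketch; every heuristic is a stand-in.
def structural_decomposition(script: str) -> list[str]:
    # Toy scene split on "INT." sluglines (a real parser handles EXT., beats, etc.).
    return [s.strip() for s in script.split("INT.") if s.strip()]

def entity_synthesis(scenes: list[str]) -> set[str]:
    # Uppercase tokens as candidate entities; later phases type and alias them.
    return {w for s in scenes for w in s.split() if w.isupper()}

def event_enrichment(scenes: list[str], entities: set[str]) -> list[dict]:
    # One event per scene, tagged with which entities participate.
    return [{"scene": i, "participants": sorted(e for e in entities if e in s)}
            for i, s in enumerate(scenes)]

def graph_construction(events: list[dict]) -> list[dict]:
    # Syuzhet index = presentation order; fabula timestamps would come
    # from flashback detection, omitted in this sketch.
    return [{**e, "syuzhet_index": i} for i, e in enumerate(events)]

script = "INT. BRIDGE - PICARD confronts Q INT. READY ROOM - PICARD and Q argue"
scenes = structural_decomposition(script)
events = event_enrichment(scenes, entity_synthesis(scenes))
graph = graph_construction(events)
```

The point of the shape, not the heuristics: each function can be tested, versioned, and replaced without touching its neighbours.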
The key architectural insight: base personas + specialist modifiers + style templates. Each component is independently testable and versionable. When the BDI extractor improves, we deploy it without touching structural decomposition. When a new show has unusual formatting, we swap one style template.
Why This Matters
Monolithic prompts are brittle. Change one thing and three others break. A compositional architecture means each stage has its own contract, its own tests, its own failure modes. It's how you build systems that improve reliably over time.
{
  "title": "The Ring and the Reckoning: A Chess Match of Power and Debt",
  "sequence_in_scene": 13,
  "key_dialogue": [
    { "speaker": "MARY", "dialogue": "You see, I am wearing your verses, in praise of obedience." },
    { "speaker": "CROMWELL", "dialogue": "Cardinal Wolsey used to say, 'Show your power by your absence.'" },
    { "speaker": "MARY", "dialogue": "Your care of me has been so tender. Like that of a father." }
  ],
  "is_flashback": null,
  "arc_uuids": ["arc_tudor_succession_crisis"]
}
Every Function Gets Exactly the Context It Needs. Nothing More.
The AI industry is waking up to what Andrej Karpathy calls “the delicate art and science of filling the context window with just the right information for the next step.” We've been doing it since day one. Every function in our pipeline receives a dynamically constructed payload—optimised for that specific task, stripped of everything irrelevant, scoped to prevent drift. This is the difference between prompting a model and engineering a system.
Payload Scoping
The entity synthesiser sees only the current scene excerpt and the relevant slice of the existing graph—not the entire screenplay, not the full database. Each function receives a context window assembled for its specific task.
Adaptive Token Budgeting
Content duration and complexity determine sampling density. Short scenes get dense context; long sequences get intelligently compressed. Frame counts, token limits, and output caps are all set dynamically—not by a fixed prompt template.
Context Rot Prevention
Chroma's research shows LLM performance degrades well before stated token limits. We never let it get there. Scope constraints, negative instructions, and temporal boundaries keep each call focused. The model can't hallucinate about scenes it never sees.
Prior State Injection
Each extraction call is anchored to previous work. The draft screenplay, the existing entity graph, the resolved aliases—injected as constraints, not conversation history. The model builds on verified state, not its own prior outputs.
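A hedged sketch of per-call payload assembly, assuming a simple dict-shaped graph. The field names and the crude characters-per-token budget are illustrative only:

```python
# Illustrative payload scoping: the call sees the scene, a graph slice,
# and explicit constraints -- nothing else.
def build_payload(scene_text: str, graph: dict, scene_entities: set[str],
                  max_tokens: int = 8000) -> dict:
    # Slice the graph to entities already known to appear in this scene.
    graph_slice = {k: v for k, v in graph.items() if k in scene_entities}
    return {
        "scene": scene_text[: max_tokens * 4],   # crude chars-per-token budget
        "known_entities": graph_slice,           # prior verified state
        "constraints": [
            "Only extract entities present in the scene text.",
            "Do not reference other scenes or episodes.",
        ],
    }

graph = {"picard": {"aliases": ["The Captain"]}, "worf": {"aliases": []}}
payload = build_payload("PICARD confronts Q on the bridge.", graph, {"picard"})
```

Entities outside the scene's slice never reach the model, so it cannot confuse or hallucinate about them.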
Most AI applications dump everything into a giant context window and hope for the best. That's prompt engineering. Context engineering is the opposite: you build a system that dynamically constructs exactly the right payload for each function call. The model sees only what it needs, formatted how it needs it, with explicit constraints on what it should ignore.
Why This Is the Economics Layer
Cost control at the token level
Every token costs money. Scoped payloads mean we send thousands of tokens per call, not tens of thousands. The savings compound across sixty episodes.
Smaller models, same quality
A focused 8K-token payload to a smaller model outperforms a bloated 128K-token dump to a frontier model. We route tasks to the cheapest model that can handle the scoped context.
Hallucination prevention at source
Context poisoning, context distraction, context confusion—the failure modes Drew Breunig catalogued—all stem from sending the model too much, too irrelevant, or too contradictory information. We eliminate them structurally.
Model-agnostic by design
Because context is engineered per-function, we can swap models without rewriting prompts. OpenAI for vision, Anthropic for reasoning, open-source for commodity tasks. The context layer is the stable interface.
In Practice: What One Extraction Call Actually Sees
INCLUDED
- Current scene excerpt
- Relevant entity slice from graph
- Typed schema contract
- Scope constraints and negative instructions
EXCLUDED
- Other scenes in the episode
- Unrelated graph entities
- Prior conversation history
- Previous extraction outputs
THE RESULT
- Focused, deterministic output
- No scene-hopping or entity confusion
- Runs on smaller, cheaper models
- Reproducible and auditable
Every Claim Requires Evidence. Every Entity Earns Its Place.
LLMs hallucinate. This is not a philosophical problem—it's an engineering one. We treat hallucination the way databases treat corruption: with schema constraints, validation layers, and evidence requirements at every extraction boundary.
BAML Schema Enforcement
Every LLM output is validated against a typed schema before it enters the graph. Wrong types, missing fields, malformed relationships—rejected at the boundary.
Evidence Grounding
Every extracted entity and relationship must cite the scene, dialogue line, or action description that supports it. No citation, no node.
Confidence Gating at 0.7
Extractions below our confidence threshold are flagged for human review, not silently committed. The graph stays clean; the human stays in the loop.
Contrastive Entity Sharpening
When the model is unsure whether two mentions refer to the same entity, it generates arguments for and against—then resolves with evidence, not probability.
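BAML generates typed clients from schema definitions; this stdlib stand-in shows the same boundary contract in miniature: type checks, evidence grounding, and the 0.7 confidence gate. Field names are assumed for illustration:

```python
# Stand-in for schema-enforced extraction boundaries (field names assumed).
REQUIRED = {"name": str, "evidence": str, "confidence": float}

def validate(extraction: dict) -> str:
    # Schema enforcement: wrong types or missing fields are rejected.
    for field_name, typ in REQUIRED.items():
        if not isinstance(extraction.get(field_name), typ):
            raise ValueError(f"schema violation: {field_name}")
    # Evidence grounding: no citation, no node.
    if not extraction["evidence"]:
        raise ValueError("no citation, no node")
    # Confidence gating: below threshold goes to human review, not the graph.
    if extraction["confidence"] < 0.7:
        return "flag_for_review"
    return "commit"
```

Everything that survives the gate enters the graph; everything else is either rejected outright or queued for a human.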
Results: What the Guardrails Deliver
Zero Entity Drift. From Pilot to Series Finale.
The hardest problem in narrative extraction isn't finding entities—it's keeping them consistent across sixty episodes and five years of production. We solve this with a hybrid architecture: Neo4j for structural queries and relationship traversal, ChromaDB for semantic similarity and fuzzy matching.
Before Entity Resolution
After Entity Resolution
Neo4j handles the structural questions: “Which characters appear in both the pilot and the finale?” ChromaDB handles the semantic ones: “Is ‘the Starfleet captain on the Enterprise’ the same person as ‘Picard’?” The hybrid architecture means we get both relational precision and semantic intelligence—and the confidence to merge only when the evidence warrants it.
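The division of labour can be illustrated like this: a Cypher string for the structural question, and a cosine check standing in for a ChromaDB similarity lookup. The labels, relationship types, and toy embeddings are assumptions, not Fabula's schema:

```python
# Structural vs. semantic: Cypher for Neo4j, cosine as a ChromaDB stand-in.
import math

# Structural: which characters appear in both the pilot and the finale?
structural_query = """
MATCH (c:Character)-[:PARTICIPATES_IN]->(:Scene {episode: 'pilot'}),
      (c)-[:PARTICIPATES_IN]->(:Scene {episode: 'finale'})
RETURN c.canonical_name
"""

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Semantic: are these two mentions plausibly the same entity? (toy vectors)
captain_vec = [0.90, 0.10, 0.20]
picard_vec = [0.88, 0.15, 0.18]
is_candidate = cosine(captain_vec, picard_vec) > 0.95
```

High similarity only nominates a candidate pair; the merge decision itself goes to LLM adjudication, described below.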
- connection_type: CAUSAL
  strength: strong
  description: "Q's abrupt appearance and assertion of authority on the bridge leads directly to his ultimatum commanding humanity's retreat, provoking Picard's demand for Q's identity and Conn's readiness to fight."
- connection_type: CHARACTER_CONTINUITY
  strength: strong
  description: "Picard's immediate reaction to Conn being frozen — administering orders for medical aid and confronting Q — reflects his steadfast leadership and moral resolve."
- connection_type: THEMATIC_PARALLEL
  strength: medium
  description: "Both events stage a confrontation between institutional authority and individual moral conviction, with the bridge serving as contested ground."
Every edge carries a type, a strength, and a narrative claim explaining why these events connect—not just that they do. This is what makes the graph queryable.
Multi-Level LLM Reasoning. Not a Similarity Threshold.
When semantic search surfaces potential duplicates, most systems apply a cosine similarity threshold and hope for the best. Fabula uses multi-level LLM adjudication: a compositional prompt system that reasons through evidence, weighs competing interpretations, and documents every decision. The system doesn't just resolve entities—it gets smarter with each pass.
Candidate Discovery
ChromaDB semantic search surfaces plausible matches. Mathematical filtering reduces O(n²) comparisons to a tractable candidate set—only entities that could plausibly be the same.
LLM Adjudication
Each candidate pair goes to an LLM with full narrative context, existing descriptions, aliases, and a decision framework. The model reasons through evidence for and against merging—then commits with a confidence score.
Graph Update + Sharpening
Merges create a refined canonical entity with synthesised descriptions. KEEP_SEPARATE decisions trigger contrastive entity sharpening—enhancing both entities' definitions so they're never flagged as duplicates again.
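The three stages can be sketched as a discover, adjudicate, sharpen loop. The merge rule here is a deliberately crude stand-in for the LLM adjudicator, which weighs narrative context, aliases, and evidence:

```python
# Discover -> adjudicate -> sharpen loop (all logic here is a stand-in).
def discover_candidates(names: list[str], similarity, threshold: float = 0.9):
    # Prune O(n^2) comparisons to pairs that could plausibly match.
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs

def adjudicate(a: str, b: str) -> str:
    # Crude stand-in: merge when name tokens overlap. The real adjudicator
    # is an LLM reasoning over full descriptions and evidence.
    return "MERGE" if set(a.split()) & set(b.split()) else "KEEP_SEPARATE"

def sharpen(a: str, b: str, descriptions: dict) -> dict:
    # On KEEP_SEPARATE: append distinguishing features so the two entities
    # drift apart in embedding space on the next pass.
    descriptions[a] += " (political meeting space)"
    descriptions[b] += " (private emotional retreat)"
    return descriptions
```

A quick check of the stand-in rule: overlapping names merge, disjoint names stay separate and get sharpened.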
{
  "decision": "KEEP_SEPARATE",
  "reasoning": "Two distinct studies within the same townhouse, serving different narrative purposes: one a political meeting space (Wolsey-Cromwell, daylight, power plays), the other a private emotional retreat (Gregory's confession, candlelit solitude).",
  "distinction_clarifications": [
    {
      "entity": "Cromwell's Study at Austin Friars",
      "enhancement": "Primary private study for introspection and family interactions",
      "distinguishing_features": [
        "Private emotional retreat (nighttime)",
        "13 events across multiple scenes",
        "Site of Gregory Cromwell's confession"
      ]
    },
    {
      "entity": "Cromwell's First Study",
      "name_refinement": "Cromwell's First Political Study (Austin Friars)",
      "distinguishing_features": [
        "Wolsey-Cromwell first meeting only",
        "Daylit, tight chamber (political)",
        "Tapestry of Solomon and Sheba"
      ]
    }
  ]
}
The system identifies that these are narratively distinct spaces—one political, one intimate—and sharpens both definitions to prevent future false matches. The “First Study” even gets a refined name.
{
  "decision": "MERGE",
  "surviving_entity_uuid": "agent_4de4ba187e64",
  "uuids_to_deprecate": ["agent_d7bba7626ce0"],
  "refined_canonical_name": "Thomas Howard, Duke of Norfolk",
  "refined_foundational_description": "Thomas Howard, Duke of Norfolk, heads the Howard family among Henry VIII's leading nobles. A staunch defender of aristocratic privilege who opposes Cromwell's rise through aggressive outbursts, pointed interrogations, and calculated defiance...",
  "refined_aliases": ["Norfolk", "Duke of Norfolk", "Thomas Howard"],
  "reasoning": "Both entities refer to the same historical figure. Aliases overlap. Descriptions are complementary, not contradictory."
}
The merge creates a single canonical entity with a synthesised description drawn from both sources, and a consolidated alias list. The deprecated UUID is preserved in the graph—never deleted.
Full Audit Chain
Every adjudication decision—the reasoning, the confidence score, the evidence weighed—is stored in the graph alongside the entities it affected. Deprecated entities aren't deleted; they're linked to the surviving canonical with the full decision chain. When someone asks “why did you merge these?” or “why are these separate?”—there's a complete, LLM-generated analytical answer.
For IP owners, this is the foundation of provenance. A knowledge graph of narrative isn't just a creative tool—it's an auditable record of every analytical claim made about the property. In a world where IP valuation drives trillion-dollar transactions, that audit trail has real value.
A System That Gets Smarter
Contrastive entity sharpening isn't just about the current decision. Every KEEP_SEPARATE adds distinguishing features and description enhancements that update the entities' embeddings in ChromaDB. Next time semantic search runs, those two entities are further apart in vector space. Fewer false positives. Fewer adjudication calls. Lower cost.
Episode 1: Two vague “Cromwell's Study” entries flagged as duplicates, adjudicated, sharpened.
Episode 6: Same locations extracted again—but now their descriptions are specific enough that ChromaDB doesn't flag them. Zero cost. Zero latency.
By series end: The graph has self-optimised its own entity boundaries. Each round is faster, cheaper, and more precise than the last.
The distinction between “Cromwell's private study for family confessions” and “the chamber where Wolsey first summoned him” is exactly the kind of nuance that matters for production: which set to redress, which emotional register to maintain, which thematic thread to follow. A simple string match would merge them. The adjudicator understands they're different rooms in the same house.
83 Minutes Per Episode. Down from Five Hours.
Performance isn't a feature—it's the difference between a research prototype and production software. We've obsessed over the pipeline, finding 3.6x improvements through architectural discipline rather than hardware scaling.
Prep / Async / Save
The three-step pattern: prepare all data before LLM calls, run extraction phases concurrently where possible, batch-write results. Simple discipline, dramatic results.
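The pattern can be sketched with asyncio. The sleep stands in for a concurrent LLM call, and all names are illustrative:

```python
# Prep -> async -> save, sketched with asyncio (all names illustrative).
import asyncio

def prepare(scene_id: int) -> int:
    # 1. Prep: assemble every payload up front, no awaits mixed in.
    return scene_id

async def extract(payload: int) -> dict:
    await asyncio.sleep(0)          # stands in for a concurrent LLM call
    return {"scene": payload}

def batch_save(results: list[dict]) -> int:
    # 3. Save: one batched write instead of a write per result.
    return len(results)

async def run_pipeline(scene_ids) -> int:
    payloads = [prepare(i) for i in scene_ids]
    # 2. Async: run all extraction calls concurrently.
    results = await asyncio.gather(*(extract(p) for p in payloads))
    return batch_save(results)

saved = asyncio.run(run_pipeline(range(10)))
```

Keeping the pure-Python prep and save phases outside the await boundary is what lets the extraction phase saturate the event loop with LLM calls.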
Smart Duplicate Filtering
Mathematical filtering reduces pairwise comparisons from O(n²) to a tractable set. We don't compare every entity against every other—we compare candidates that could plausibly match.
Batch Graph Operations
Neo4j UNWIND operations replace individual node-by-node writes. One batch operation where we used to make thousands of individual calls.
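A sketch of what such a batched write looks like: one parameterised UNWIND statement carrying a thousand rows. The labels and properties are assumptions, and the driver call itself is omitted so the snippet stays self-contained (with the official neo4j driver it would be roughly `session.run(cypher, rows=rows)`):

```python
# Batched Neo4j write via UNWIND: one round trip, many rows.
# Labels/properties are illustrative; the driver call is omitted.
cypher = """
UNWIND $rows AS row
MERGE (c:Character {uuid: row.uuid})
SET c.canonical_name = row.name
"""

# One parameter list with 1,000 rows replaces 1,000 single-node writes.
rows = [{"uuid": f"char_{i}", "name": f"Character {i}"} for i in range(1000)]
```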
Your Scripts Never Leave Your Control
We Never
- Share your scripts with other customers
- Use your scripts to train AI models
- Store scripts on shared infrastructure
- Allow cross-customer data access
You Control
- Where your data is stored (our servers or yours)
- Which AI providers process your scripts
- Who has access to your knowledge graph
- When data gets deleted (deletion is permanent)
Cloud
We host everything. You upload scripts, we handle servers, backups, updates.
Self-Hosted
You run it on your infrastructure. Your scripts never leave your network.
Hybrid
Process scripts on your servers. Use our cloud for search and visualization.
See the Architecture in Action
Explore our live knowledge graph of The West Wing, or request a demo with your own scripts.