Hollywood’s memory problem

A reference graph for the largest industry that doesn't have one yet.

Finance has Bloomberg. Law has LexisNexis. Healthcare has Epic. Scripted production? It has Post-it notes on a whiteboard, a frazzled script coordinator's memory, and a shared drive folder named "OLD-BIBLE-DONOTDELETE". The multi-billion-dollar industry that runs on continuity has no canonical record. Until now.

Fabula brings the missing ledger. Every character, location, organisation, and event extracted from the script, connected, and cited to the line that produced it. The same record serves the writers’ room, the script supervisor, and an AI team building agents over the IP.

// Canonical character synthesis · Wolf Hall

{
  "canonical_name": "Thomas Cromwell",
  "title_role": "Secretary to the King",
  "aliases": ["Cromwell", "Master Cromwell", "Crumb", "Cremuel",
    "Young Thomas Cromwell", "Thomas Cromwell (Patriarch)"],
  "foundational_description": "Thomas Cromwell, born the son of a
    blacksmith, rises to become Henry VIII's chief minister and Earl
    of Essex. A master strategist of the English Reformation, he
    dissolves monasteries, engineers royal marriages (Anne of Cleves),
    and advances evangelical reforms while navigating the volatile
    Tudor court. His pragmatic intellect and political acumen allow
    him to mentor loyal aides like Rafe Sadler and Thomas Wriothesley,
    but his ambition and temper expose him to the perils of court
    survival. Key rivals include the Duke of Norfolk, Bishop Gardiner,
    and Princess Mary. His past loyalty to Cardinal Wolsey and
    strategic silences further complicate his position amid shifting
    royal whims and factional dangers.",
  "foundational_traits": ["resilient", "astute", "pragmatic",
    "calculating", "paternal", "protective"],
  "sphere_of_influence": "Tudor court politics and diplomacy",
  "appearance_count": 480,
  "importance_tier": "anchor",
  "source_candidate_uuids": [
    "cand_agent_scene_03a1_1",
    "cand_agent_scene_03a1_4",
    "cand_agent_scene_0412_2"
    // ...477 more
  ],
  "resolution_confidence": 1.0
}

the pitch

Four investable categories. One product.

Fabula sits inside four of the fastest-rising trends in enterprise AI.

Thesis 02

Vertical Knowledge Graph

Knowledge graphs underpin most large software companies you can name. The semantic knowledge graph market is growing at a 14.2% CAGR through 2030. Television production is the largest vertical without a canonical graph; we're the first credible attempt at one.

Thesis 03

World Model

World models are the prediction targets behind the next generation of AI agents. Fabula extracts a narrative-state world model from existing material — belief, goal, and intention transitions per character per scene. That’s the input most agents over story material need and don’t have: the narrative world in ultra-HD.
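A single transition in that world model might be recorded like this. Illustrative sketch only: the field names, scene reference, and line IDs are hypothetical, not the shipped schema.

```json
// Illustrative narrative-state transition · one character, one scene
{
  "character": "Thomas Cromwell",
  "scene": "2.07",
  "belief_before": "Wolsey's fall can still be reversed",
  "belief_after": "Wolsey is finished; survival means distance",
  "goal": "protect his own position at court",
  "intention": "quietly shift visible loyalty toward the King",
  "evidence_lines": ["ep2_sc07_l041", "ep2_sc07_l058"]
}
```

Chained across scenes, these transitions give an agent the state it needs: who believes what, who wants what, and when each of those changed.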

Thesis 04

Synthetic Training Data

The synthetic data market sat near $600M in 2024 and is forecast past $10B by the mid-2030s on a 30%+ CAGR. The drivers aren't going anywhere: privacy law, the cost of acquiring real data, and the data wall every foundation-model team is now staring at. Fabula makes a particular flavour of synthetic data — structured, sourced, adjudicated against evidence — from material that's hard to scrape and harder still to interpret correctly. We already publish it: the catalogue sits on Hugging Face today.

Everything Everywhere All At Once

Catalogue graphs know the show exists. Fabula knows what happens inside it.

Streaming platforms have invested heavily in catalogue knowledge graphs: titles, talent, genres, ratings, rights, availability, viewer behaviour. Those graphs power recommendation, discovery, scheduling, and licensing. They are excellent at the questions around a show.

Fabula answers the questions inside one. Who was in the room? What did they know? What did they want? What changed? Which object mattered? Which line set up the later reversal? Which relationship shifted without anyone saying it out loud?

Catalogue graphs serve the viewer finding something to watch. Story graphs serve the teams trying to understand, extend, protect, adapt, or automate the world of the story.

That is why our graph starts below the metadata layer: events, participations, relationships, beliefs, goals, causal links, callbacks, emotional echoes, and provenance. Not the wrapper around the show. The dramatic machinery inside it.
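One edge from that story-internal layer, sketched in the same style as the record above. The labels and IDs are illustrative assumptions, not the production schema.

```json
// Illustrative story-internal edge · a cited causal link between events
{
  "edge_type": "causal_link",
  "from_event": "evt_wolsey_arrest",
  "to_event": "evt_cromwell_transfers_loyalty",
  "relation": "motivates",
  "cited_scenes": ["1.04", "1.05"]
}
```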

The same architecture transfers because the hard problem is not television-specific. It is extracting a coherent, cited, queryable record from material too long and too interdependent for one model call to hold in its head. Entertainment is where we stress-tested it. It is not where it stops.

fig. 02 — catalogue graphs above (Title, Talent, Genre, Studio), story graphs below (fabula: Character, Event, Belief, Causal link, Scene, Provenance)

supply chain

The data well is running dry.

High-quality text — the kind you can train an AI on — is nearly exhausted (Epoch AI). Scraping the web again just gives you remixed garbage, like a studio endlessly rebooting the same franchise until it’s a zombie. Models trained on their own outputs degrade, losing signal and originality (Nature, 2024).

Fabula doesn’t scrape. It reads. It produces new, structured, sourced data from material that’s historically been a black box: long-form narrative.

Not reheating leftovers; opening a new kitchen. Every fact, character, and event is linked back to its source line in the script, published as datasets on Hugging Face.
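A provenance-linked fact might look like this. Sketch only: field names, the script filename, and the line reference are hypothetical, shown to make the "linked back to its source line" claim concrete.

```json
// Illustrative sourced fact · every claim carries its receipt
{
  "fact_type": "event",
  "statement": "Cromwell learns of Wolsey's arrest",
  "subjects": ["char_thomas_cromwell"],
  "provenance": {
    "script": "wolf_hall_ep1.fdx",
    "scene": 24,
    "line": 312
  }
}
```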

Existing material in. New, structured, queryable data out.

first run, syndication, streaming

A boutique market we can serve now, opening into a much larger one.

The TAM numbers below are sized to specific buyers, not the entire entertainment industry. We are selling into the first today; the second and third are how the same architecture grows. We've avoided summing them.

Today · canonical record $500M

Sized to studios and production companies that need a canonical record of their IP — the writers’ room reference, the bible that doesn’t drift, the asset that gets handed across seasons and adaptations. Today this work happens manually, by people whose memory is the asset.

Next · ai retrieval $5B

Sized to AI tooling spend on retrieval over IP libraries: game adaptation pipelines, video grounding, virtual production prep. The graph is the input layer most of these tools need.

Long-run · synthetic media $50B+

Sized to AI-assisted content creation if it scales as forecast. The bottleneck for any system generating long-form story is having a queryable canonical record. We make that record.

the cold open

Fabula uses AI to build computable story canon for scripted entertainment.

Where LLMs hallucinate, Fabula brings receipts.