The Buried Footnote
Forensic Linguistics Analysis on Corporate Disclosure Documents
About
What this is & how it works.

Mechanical analysis, grounded judgment.

Every signal in The Buried Footnote is the output of deterministic linguistic analysis. Run the same filing through the same analysis twice, six months apart, and the number is the same. There is no language model deciding what to flag. There is no “the AI thinks Tesla's MD&A is hedgier this quarter”. That is the contract.

Reading materially complex documents has been delegated to language models for a few years now. The price of that delegation is that you cannot audit a vibe. When a generative model summarises a filing, it picks language that “sounds right” given its training, which is not the same thing as the language that is in the filing. Model versions roll. Outputs drift. Two months from now the same prompt against the same filing gets a different answer, for reasons nobody can fully account for.

We measure with code, not with prompts. The analysis is reproducible, auditable, and unaffected by what's happening at any model vendor on a given Tuesday.

The Brief — our paid judgment layer — reasons on top of these measurements, never instead of them. It's grounded: every claim it makes cites a mechanical signal or a specific span of the filing, so it can't wander into a vibe the numbers don't support.
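
The grounding rule can be pictured as a simple check. The following is a minimal sketch, not the Brief's actual mechanism: the `Claim` shape, the `ungrounded` helper, and the signal name used in the example are all hypothetical, chosen only to show what "every claim cites a signal or a span" could mean in code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Claim:
    """One sentence of judgment plus whatever it cites (hypothetical shape)."""
    text: str
    signal_id: Optional[str] = None          # name of a computed signal, if cited
    span: Optional[Tuple[int, int]] = None   # (start, end) character offsets, if cited


def ungrounded(claims, known_signals, filing_length):
    """Return the claims that cite neither a known signal nor a valid span."""
    rejected = []
    for claim in claims:
        cites_signal = claim.signal_id in known_signals
        cites_span = (
            claim.span is not None
            and 0 <= claim.span[0] < claim.span[1] <= filing_length
        )
        if not (cites_signal or cites_span):
            rejected.append(claim)
    return rejected
```

Under this sketch, a claim whose span runs past the end of the filing is treated the same as a claim with no citation at all.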

What we do

Each public corporate filing (10-K, 10-Q, 40-F) is parsed into its constituent SEC-defined sections. A fixed set of deterministic linguistic signals is computed on each section. Every signal produces a numeric score that traces back to a specific span of text in a specific filing.
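
To make "a numeric score that traces back to a specific span" concrete, here is a minimal sketch of one deterministic signal. It is an illustration only: the actual signal set is proprietary, and the word list, the `hedge_density` function, and the per-100-words scaling are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical lexicon for illustration; not the site's signal definition.
HEDGE_TERMS = ("may", "might", "could", "possibly", "approximately", "believe")
_HEDGE_RE = re.compile(r"\b(" + "|".join(HEDGE_TERMS) + r")\b", re.IGNORECASE)
_WORD_RE = re.compile(r"\b\w+\b")


@dataclass(frozen=True)
class Span:
    start: int   # character offset into the section text
    end: int
    text: str


@dataclass(frozen=True)
class SignalResult:
    score: float              # hedge terms per 100 words
    spans: Tuple[Span, ...]   # every match that contributed to the score


def hedge_density(section_text: str) -> SignalResult:
    """Count hedge terms per 100 words and keep the offsets of each match."""
    spans = tuple(
        Span(m.start(), m.end(), m.group(0))
        for m in _HEDGE_RE.finditer(section_text)
    )
    n_words = len(_WORD_RE.findall(section_text))
    score = 100.0 * len(spans) / n_words if n_words else 0.0
    return SignalResult(score=round(score, 4), spans=spans)


sample = "We believe demand may soften. Results could differ materially."
result = hedge_density(sample)
```

The function uses no randomness, clock, or network, so the same text always yields the same score, and each `Span` can be sliced back out of the section to show exactly which words produced it.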

Industry cohorts (peer groups of comparable issuers) supply percentile baselines, recomputed weekly. A filing's score against its cohort tells you whether the issuer's language for that section is ordinary, atypical, or extreme relative to peers. The same score against the issuer's own prior filings tells you whether the issuer has drifted. Both views are useful; neither is sufficient on its own: the cohort view cannot show that an issuer has changed, and the drift view cannot show whether a change is specific to the issuer or shared across its peers.
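
The two views can be sketched in a few lines. This is an assumption-laden illustration, not the published method: the mid-rank percentile convention, the mean-based drift, and the 80/95 cut-offs in `label` are hypothetical choices made for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, cohort_scores: list) -> float:
    """Mid-rank percentile of `score` within the cohort (0 to 100)."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)
    equal = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def drift(score: float, prior_scores: list) -> float:
    """Current score minus the mean of the issuer's own prior scores."""
    if not prior_scores:
        raise ValueError("no prior filings")
    return score - sum(prior_scores) / len(prior_scores)


def label(pct: float, atypical: float = 80.0, extreme: float = 95.0) -> str:
    """Two-sided label: distance from the middle counts in either direction."""
    tail = max(pct, 100.0 - pct)
    if tail >= extreme:
        return "extreme"
    if tail >= atypical:
        return "atypical"
    return "ordinary"


cohort = [1.0, 2.0, 3.0, 4.0, 5.0]
peer_view = label(percentile_rank(6.0, cohort))   # score above every peer
own_view = drift(4.0, [2.0, 3.0])                 # change against own history
```

Keeping the two functions separate mirrors the point above: a filing can be ordinary against peers while still showing a large drift against its own history, and the reverse.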

The signal set itself is proprietary: we don't publish the individual signals or their weights. What we publish is the output of running them, citation-anchored to the underlying filings.

What we don't do

  • We do not analyze filings with language models. No model reads a 10-K and decides what's hedgy or what to flag; every signal is computed by deterministic code and is traceable to a specific span of text. Language models do write the narrative framing of the weekly dossiers, and the paid Brief is a separate, grounded judgment layer, but neither one performs the analysis, and every numeric claim cites a real measurement on a real filing.
  • We do not predict stock returns. Linguistic signals correlate with disclosure quality, not with price direction. We are an attention tool — we surface filings worth reading more carefully — not a buy/sell signal.
  • We do not detect fraud. The signals capture rhetorical patterns; they cannot, on their own, tell whether cautious wording reflects careful drafting or an accounting irregularity. What they can do is flag when a filing's language posture diverges in a way that human attention would otherwise miss.

Where the data comes from

  • SEC EDGAR (US 10-K and 10-Q filings, plus 40-F filings from Canadian cross-listed issuers). Fetched directly from data.sec.gov per SEC fair-access policy.
  • Industry cohorts are defined manually from a curated roster of comparable issuers. Definitions are versioned.
  • Cohort baseline statistics (percentile distributions) refresh weekly.
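
As a rough sketch of what polite fetching from data.sec.gov can look like: the SEC asks automated clients to identify themselves with a User-Agent header and to stay under a published request rate. Everything below is an assumption for illustration, including the contact string, the choice of the submissions endpoint, the 10-per-second ceiling (check the SEC's current guidance), and the `Throttle` helper. No request is actually sent.

```python
import time
import urllib.request

# Placeholder identity; a real client would use its own name and contact address.
USER_AGENT = "Example Research admin@example.com"
# Assumed ceiling; confirm against the SEC's current fair-access guidance.
MAX_REQUESTS_PER_SECOND = 10


def submissions_url(cik: int) -> str:
    """Filing-index URL for an issuer; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def build_request(cik: int) -> urllib.request.Request:
    """Prepare (but do not send) a request that declares who is asking."""
    return urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": USER_AGENT}
    )


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A fetch loop would call `Throttle(MAX_REQUESTS_PER_SECOND).wait()` before each `urllib.request.urlopen(build_request(cik))`; the injectable `clock` and `sleep` exist only so the pacing can be tested without real delays.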

Who's behind it

Solo operator out of Vancouver, BC. Published from Canadian infrastructure. The signal engine descends from earlier work on forensic linguistic analysis of long-form public writing.

Reach the editor at editor@theburiedfootnote.com.

Not investment advice

Nothing on The Buried Footnote is investment advice. The numbers and narratives published here describe linguistic patterns in public disclosures; they do not constitute a recommendation to buy, sell, or hold any security, or to refrain from doing so. Readers are responsible for their own due diligence. Past linguistic patterns are not predictive of future financial outcomes.