How the scoring works
Transparent weights, published math, named limitations. The scores on this site are the product of specific judgment calls — they look precise because they're numbers, but the numbers are only as good as the weights and the corpus feeding them.
The Reality Index
A single 0–100 reading of how grounded the prevailing Mythos narrative is in substantive, well-sourced reality right now. It is a weighted composite of three of the four axis scores — Evidence, Substance, and Confidence. Skepticism is deliberately not a separate input because it is already folded into Evidence (credible pushback subtracts from weighted support at ingest). Counting it twice would double-penalize.
The weighting reflects a specific judgment: evidence matters most (it is what the corpus is actually voting on), source substance matters second (a corpus of commentary does not earn the same weight as a corpus of government evaluations), and calibration third (it pulls the index toward zero when the corpus is too thin to trust a read). If the site ships a daily-snapshot view later, the composite is the number plotted over time.
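As a rough illustration of a composite with that ordering, here is a minimal sketch. The page states only the ranking of the three inputs, not their values, so the weights 0.5 / 0.3 / 0.2, the name `reality_index`, and the assumption that each axis score is already on a 0–100 scale are all hypothetical placeholders, not the site's actual numbers.

```python
# HYPOTHETICAL weights: the text gives only the ordering
# (Evidence > Substance > Confidence), not the actual values.
AXIS_WEIGHTS = {"evidence": 0.5, "substance": 0.3, "confidence": 0.2}


def reality_index(evidence: float, substance: float, confidence: float) -> float:
    """Weighted mean of three axis scores, each assumed to be on a 0-100 scale."""
    scores = {"evidence": evidence, "substance": substance, "confidence": confidence}
    total_weight = sum(AXIS_WEIGHTS.values())
    composite = sum(AXIS_WEIGHTS[k] * scores[k] for k in AXIS_WEIGHTS) / total_weight
    # Clamp so the result stays inside the 0-100 range of the index.
    return max(0.0, min(100.0, composite))


# Illustrative inputs only; these are not real axis scores.
print(round(reality_index(70, 55, 40), 1))
```

With these placeholder weights, a low Confidence score drags the composite down even when Evidence is strong, which is the behaviour the paragraph above describes for a thin corpus.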
The three axis scores
Each story in the corpus contributes a weighted vote across three axes. The weight of a story is tier × type_multiplier × stance_coefficient.
Tier
- T1: × 3.0
- T2: × 2.0
- T3: × 1.0
- T4: × 0.4
Source type (type_multiplier)
- Primary: × 1.5
- Government: × 1.4
- Research: × 1.3
- Industry: × 0.9
- News: × 1.0
- Commentary: × 0.6
Stance (stance_coefficient)
- supports: +1.0
- contextualizes: +0.3
- questions: −0.7
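The per-story weight described above can be sketched as a lookup and a product. The multipliers are copied from the lists; the dictionary names, the function name `story_weight`, and the two example stories are illustrative assumptions, not identifiers from the site's code.

```python
# Multipliers copied from the lists above.
TIER = {"T1": 3.0, "T2": 2.0, "T3": 1.0, "T4": 0.4}
TYPE_MULTIPLIER = {
    "Primary": 1.5,
    "Government": 1.4,
    "Research": 1.3,
    "Industry": 0.9,
    "News": 1.0,
    "Commentary": 0.6,
}
STANCE_COEFFICIENT = {"supports": 1.0, "contextualizes": 0.3, "questions": -0.7}


def story_weight(tier: str, source_type: str, stance: str) -> float:
    """Return tier x type_multiplier x stance_coefficient for one story."""
    return TIER[tier] * TYPE_MULTIPLIER[source_type] * STANCE_COEFFICIENT[stance]


# Hypothetical stories, for illustration only.
examples = [
    ("T1", "Government", "supports"),   # strong, supportive source
    ("T3", "Commentary", "questions"),  # weak, skeptical source
]
for tier, source_type, stance in examples:
    print(tier, source_type, stance, round(story_weight(tier, source_type, stance), 2))
```

Because the stance coefficient for "questions" is negative, a skeptical story produces a negative weight. That is how credible pushback subtracts from weighted support without needing a separate Skepticism input.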
Formulas
Mechanism read
The mechanism read asks a different question from the Reality Index: not whether Mythos capability is real, but what the corpus says is driving it. Stories can carry zero or more neutral mechanism tags. Each tagged story contributes its normal source weight, split evenly across its tags.
- Base-model capability
Evidence points to the underlying frontier model being materially better at coding, reasoning, exploit chaining, or proof generation.
- Guardrails and access
Evidence points to relaxed safeguards, access gating, refusal policy, or deployment constraints as a major part of the gap.
- Harness and workflow
Evidence points to scaffolding, repo-scale context, tools, validation loops, or target-selection workflow around the model.
- Commodity model diffusion
Evidence points to smaller, cheaper, open-weight, or competitor models recovering similar analysis or quickly closing the gap.
This is deliberately evidence-derived, not hand-authored editorial copy. New ingested stories can carry mechanism tags from the classifier; older or untagged stories get a conservative keyword inference pass until they are curated.
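The even split across tags can be sketched as follows. This is a minimal illustration: the tag slugs (`base_model`, `guardrails`, `harness`, `diffusion`), the function name, and the example weights are hypothetical. The text does not say whether a story's "normal source weight" includes the stance coefficient, so the weight is simply passed in as a number here.

```python
from collections import defaultdict


def mechanism_totals(stories):
    """Sum per-tag weight, splitting each story's weight evenly across its tags.

    `stories` is an iterable of (weight, tags) pairs, where `tags` is a list
    of mechanism tag names (possibly empty).
    """
    totals = defaultdict(float)
    for weight, tags in stories:
        if not tags:
            continue  # a story with zero tags contributes to no mechanism
        share = weight / len(tags)
        for tag in tags:
            totals[tag] += share
    return dict(totals)


# Hypothetical corpus: weights and tags are made up for illustration.
corpus = [
    (4.2, ["base_model", "harness"]),  # split 2.1 / 2.1
    (1.0, ["harness"]),                # all 1.0 goes to one tag
    (0.6, []),                         # untagged, ignored
]
print(mechanism_totals(corpus))
```

Splitting evenly means a story tagged with several mechanisms does not count more than a story tagged with one; its total contribution stays equal to its weight.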
Scenario priors
Each scenario starts with a prior probability before evidence is applied. These priors reflect a judgment about base rates for this class of question — they're defensible but not neutral.
- Contained advantage: Glasswing holds, capability stays asymmetric for 12+ months (prior 22)
- Capability commoditizes: Comparable capability reaches attackers within 6-12 months (prior 20)
- Narrative over-corrects: Independent reproduction narrows the capability gap (prior 10)
- Material incident: Mythos-class capability used in a disclosed attack within 12 months (prior 5)
- Defender advantage holds: Glasswing delivers measurable patching before capability diffuses (prior 8)
- Regulatory intervention: Meaningful policy action within 6-9 months (prior 14)
- Industry parity: Multiple labs reach comparable gated-model state (prior 21)
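A quick way to sanity-check the list is to confirm that the seven priors form a complete distribution. The sketch below uses the values from the list; the dictionary name and the conversion to fractions are illustrative, and it deliberately stops before any evidence update, since the update rule is not described here.

```python
# Prior values copied from the scenario list above.
PRIORS = {
    "Contained advantage": 22,
    "Capability commoditizes": 20,
    "Narrative over-corrects": 10,
    "Material incident": 5,
    "Defender advantage holds": 8,
    "Regulatory intervention": 14,
    "Industry parity": 21,
}

total = sum(PRIORS.values())
print(total)  # 100

# Express each prior as a fraction of the total, highest first.
probabilities = {name: value / total for name, value in PRIORS.items()}
for name, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.2f}")
```

The priors sum to 100, which is consistent with treating the scenarios as mutually exclusive and exhaustive over the forecast window.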
Honest limitations
Corpus selection bias. The current corpus is English-language and US/UK-weighted. Serious coverage in Chinese, German, French, and specialized security press is not yet indexed. Probabilities will drift as that coverage is added.
Weight choice is subjective. Tier 1 = 3.0 and questions coefficient = −0.7 are judgment calls. Different reasonable analysts would pick different weights. This changes the numbers but typically not the rank order of scenarios.
Priors are not neutral. The seven scenario priors reflect one reading of base rates. A materially different read of the base rates would change the weighting between scenarios.
Scenarios are mutually exclusive by construction, not by reality. In practice a future could involve multiple scenarios (regulatory intervention + industry parity, for example). The model forces a single-frame characterization for the 12-month period.
No automated refresh yet. The corpus is manually curated. As news continues to land, these numbers will lag until new stories are indexed. Automated ingestion is Phase 2.