Methodology

How the scoring works

Transparent weights, published math, named limitations. The scores on this site are the product of specific judgment calls — they look precise because they're numbers, but the numbers are only as good as the weights and the corpus feeding them.

The Reality Index

A single 0–100 reading of how grounded the prevailing Mythos narrative is in substantive, well-sourced reality right now. It is a weighted composite of three of the four axis scores — Evidence, Substance, and Confidence. Skepticism is deliberately not a separate input because it is already folded into Evidence (credible pushback subtracts from weighted support at ingest). Counting it twice would double-penalize.

Formula

0.5 × Evidence + 0.3 × Substance + 0.2 × Confidence
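
As a minimal sketch, the composite can be computed as below. The function name `reality_index` and the range check are illustrative assumptions, not the site's actual code; only the three weights come from the formula above.

```python
def reality_index(evidence: float, substance: float, confidence: float) -> float:
    """Weighted composite of three axis scores, each assumed to be on a 0-100 scale."""
    for name, value in (
        ("evidence", evidence),
        ("substance", substance),
        ("confidence", confidence),
    ):
        # Assumption: inputs outside 0-100 are treated as errors.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be within 0-100, got {value}")
    return 0.5 * evidence + 0.3 * substance + 0.2 * confidence


# Worked example: 0.5*60 + 0.3*40 + 0.2*50 = 30 + 12 + 10, approximately 52
example_index = reality_index(60, 40, 50)
```

Because the weights sum to 1.0, the composite stays within 0–100 whenever the inputs do.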

Bands

Hype-dominant (0–25)

Claims are circulating faster than they are being validated. Most of the weighted discourse is commentary rather than primary reporting or independent evaluation.

Contested (26–50)

Claims have some footing but are not broadly corroborated. Credible voices are actively questioning the framing, and the substantive source base is still thin.

Developing (51–75)

Substantive sources are aligning and the core narrative is solidifying, though open questions remain and additional Tier-1 evidence would sharpen the read.

Well-evidenced (76–100)

The narrative has strong evidentiary footing across source tiers with minimal credible pushback. Treat the core framing as established and focus attention on unresolved edges.
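
The band lookup can be sketched as follows. The band edges and labels come from the list above; the name `band_for` and the choice to round to the nearest integer before banding are assumptions, since the page does not say how fractional values between bands (25.4, say) are handled.

```python
# Upper edge of each band (inclusive), in ascending order.
BANDS = [
    (25, "Hype-dominant"),
    (50, "Contested"),
    (75, "Developing"),
    (100, "Well-evidenced"),
]


def band_for(index: float) -> str:
    """Map a 0-100 Reality Index value to its band label."""
    if not 0 <= index <= 100:
        raise ValueError(f"index must be within 0-100, got {index}")
    rounded = round(index)  # assumption: band on the rounded integer
    for upper, label in BANDS:
        if rounded <= upper:
            return label
    raise AssertionError("unreachable: 100 is the final upper edge")
```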

The weighting reflects a specific judgment: evidence matters most (it is what the corpus is actually voting on), source substance matters second (a corpus of commentary does not earn the same weight as a corpus of government evaluations), and calibration, captured by the Confidence score, matters third (it pulls the index toward zero when the corpus is too thin to trust a read). If the site later ships a daily-snapshot view, the composite is the number that will be plotted over time.

The three axis scores

Each story in the corpus contributes a weighted vote across three axes. The weight of a story is tier × type_multiplier × stance_coefficient.

Tier weight

T1 × 3.0
T2 × 2.0
T3 × 1.0
T4 × 0.4

Source type multiplier

Primary × 1.5
Government × 1.4
Research × 1.3
Industry × 0.9
News × 1.0
Commentary × 0.6

Stance coefficient

supports +1.0
contextualizes +0.3
questions −0.7
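
The three tables above combine into a per-story weight. A minimal sketch follows, with the table values copied from this page; the function names `source_weight` and `story_weight` are hypothetical.

```python
TIER_WEIGHT = {"T1": 3.0, "T2": 2.0, "T3": 1.0, "T4": 0.4}

TYPE_MULTIPLIER = {
    "Primary": 1.5,
    "Government": 1.4,
    "Research": 1.3,
    "Industry": 0.9,
    "News": 1.0,
    "Commentary": 0.6,
}

STANCE_COEFFICIENT = {"supports": 1.0, "contextualizes": 0.3, "questions": -0.7}


def source_weight(tier: str, source_type: str) -> float:
    """Unsigned weight of a story: tier x type_multiplier."""
    return TIER_WEIGHT[tier] * TYPE_MULTIPLIER[source_type]


def story_weight(tier: str, source_type: str, stance: str) -> float:
    """Signed weight of a story: tier x type_multiplier x stance_coefficient."""
    return source_weight(tier, source_type) * STANCE_COEFFICIENT[stance]


# A Tier-1 government evaluation that supports the narrative:
# 3.0 * 1.4 * 1.0, approximately 4.2
strong_support = story_weight("T1", "Government", "supports")

# A Tier-4 commentary piece that questions it:
# 0.4 * 0.6 * -0.7, approximately -0.168
weak_pushback = story_weight("T4", "Commentary", "questions")
```

The spread is wide by design: under these tables, a single supporting Tier-1 government source outweighs roughly 25 questioning Tier-4 commentary pieces.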

Formulas

Validation

(supports_weight + 0.3 × context_weight − 0.7 × questions_weight) ÷ total_max_weight × 100
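
A sketch of the Validation formula follows. It assumes each `*_weight` argument is the summed tier × type weight of the stories with that stance, and that `total_max_weight` is the summed tier × type weight of all stories (the score the corpus would reach if every story supported). The page does not define these terms precisely, so treat both as assumptions.

```python
def validation_score(
    supports_weight: float,
    context_weight: float,
    questions_weight: float,
    total_max_weight: float,
) -> float:
    """Net weighted support as a percentage of the maximum possible support."""
    if total_max_weight <= 0:
        # Assumption: an empty corpus has no defined score.
        raise ValueError("total_max_weight must be positive")
    net = supports_weight + 0.3 * context_weight - 0.7 * questions_weight
    return net / total_max_weight * 100


# Worked example: (10 + 0.3*4 - 0.7*2) / 16 * 100 = 9.8 / 16 * 100, approximately 61.25
example_validation = validation_score(10, 4, 2, 16)
```

Note that this expression goes negative when questioning weight dominates; the sketch does not clamp it, because the page does not say whether the site does.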

Signal / noise

substantive_weight ÷ (substantive + ephemeral) × 100

Substantive = Primary + Government + Research. Ephemeral = everything else.
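
The signal / noise split can be sketched as below. The substantive set comes from the definition above; the input shape (a list of `(source_type, weight)` pairs, with weights assumed to be unsigned tier × type values) and the function name are illustrative assumptions.

```python
SUBSTANTIVE_TYPES = {"Primary", "Government", "Research"}


def signal_noise(stories: list[tuple[str, float]]) -> float:
    """Share of total weight carried by substantive source types, as a percentage."""
    substantive = sum(w for source_type, w in stories if source_type in SUBSTANTIVE_TYPES)
    ephemeral = sum(w for source_type, w in stories if source_type not in SUBSTANTIVE_TYPES)
    total = substantive + ephemeral
    if total <= 0:
        raise ValueError("corpus has no weight")
    return substantive / total * 100


# Hypothetical corpus: substantive = 4.2 + 1.3 = 5.5, ephemeral = 2.0 + 0.6 = 2.6
corpus = [("Government", 4.2), ("News", 2.0), ("Commentary", 0.6), ("Research", 1.3)]
example_ratio = signal_noise(corpus)  # 5.5 / 8.1 * 100, roughly 68
```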

Skepticism ratio

questions_weight ÷ (supports + questions) × 100
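
A corresponding sketch for the skepticism ratio, assuming both inputs are unsigned weight totals (magnitudes, without the stance coefficient applied). The function name is hypothetical.

```python
def skepticism_ratio(supports_weight: float, questions_weight: float) -> float:
    """Questioning weight as a percentage of all weight that takes a side."""
    total = supports_weight + questions_weight
    if total <= 0:
        raise ValueError("no supporting or questioning weight in the corpus")
    return questions_weight / total * 100


# Worked example: 2 / (10 + 2) * 100, roughly 16.7
example_skepticism = skepticism_ratio(10, 2)
```

Contextualizing stories do not appear in this ratio, since the formula counts only weight that supports or questions.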

Scenario probability

raw = prior + 2.5 × (for_weight − 0.6 × against_weight)
probability = raw ÷ Σ raw (all scenarios) × 100

Normalized so all seven scenarios sum to 100. The 0.6 asymmetry reflects that absence of a contradicting signal is softer evidence than presence of a supporting one.
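
The sketch below shows one reading of the scenario formula: compute a raw score per scenario from its prior and net evidence, then normalize across all seven. The priors are the ones listed on this page. The function name, the dictionary inputs, and the clamp of negative raw scores to zero are assumptions made so the output stays a valid distribution.

```python
PRIORS = {
    "Contained advantage": 22,
    "Capability commoditizes": 20,
    "Narrative over-corrects": 10,
    "Material incident": 5,
    "Defender advantage holds": 8,
    "Regulatory intervention": 14,
    "Industry parity": 21,
}


def scenario_probabilities(
    priors: dict[str, float],
    for_weight: dict[str, float],
    against_weight: dict[str, float],
) -> dict[str, float]:
    """Return per-scenario probabilities that sum to 100."""
    raw = {}
    for name, prior in priors.items():
        net = for_weight.get(name, 0.0) - 0.6 * against_weight.get(name, 0.0)
        # Assumption: a raw score cannot fall below zero.
        raw[name] = max(0.0, prior + 2.5 * net)
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("all scenarios have zero raw score")
    return {name: score / total * 100 for name, score in raw.items()}


# Hypothetical evidence: 4.0 units of weight for "Industry parity", nothing else.
# Its raw score becomes 21 + 2.5 * 4.0 = 31 out of a total of 110.
probs = scenario_probabilities(PRIORS, {"Industry parity": 4.0}, {})
```

With no evidence at all, this reading returns the priors unchanged, because they already sum to 100.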

Mechanism read

The mechanism read asks a different question from the Reality Index: not whether Mythos capability is real, but what the corpus says is driving it. Stories can carry zero or more neutral mechanism tags. Each tagged story contributes its normal source weight, split evenly across its tags.

Base-model capability
Evidence points to the underlying frontier model being materially better at coding, reasoning, exploit chaining, or proof generation.
Guardrails and access
Evidence points to relaxed safeguards, access gating, refusal policy, or deployment constraints as a major part of the gap.
Harness and workflow
Evidence points to scaffolding, repo-scale context, tools, validation loops, or target-selection workflow around the model.
Commodity model diffusion
Evidence points to smaller, cheaper, open-weight, or competitor models recovering similar analysis or quickly closing the gap.

This is deliberately evidence-derived, not hand-authored editorial copy. New ingested stories can carry mechanism tags from the classifier; older or untagged stories get a conservative keyword inference pass until they are curated.
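
The even split across mechanism tags can be sketched as below. It assumes a story's "normal source weight" is its tier × type weight and that a story's tags are unique; the function name and input shape are hypothetical.

```python
from collections import defaultdict


def mechanism_totals(stories: list[tuple[float, list[str]]]) -> dict[str, float]:
    """Sum each story's weight into its mechanism tags, split evenly per story."""
    totals: dict[str, float] = defaultdict(float)
    for weight, tags in stories:
        if not tags:
            continue  # untagged stories contribute to no mechanism
        share = weight / len(tags)
        for tag in tags:
            totals[tag] += share
    return dict(totals)


# Hypothetical corpus of (source_weight, tags) pairs.
tagged = [
    (4.2, ["Base-model capability", "Harness and workflow"]),  # 2.1 to each tag
    (2.0, ["Harness and workflow"]),                           # 2.0 to one tag
    (0.6, []),                                                 # no tags, no vote
]
totals = mechanism_totals(tagged)
# "Base-model capability" gets about 2.1, "Harness and workflow" about 4.1
```

Splitting evenly keeps a story's total influence fixed no matter how many tags it carries, so a heavily tagged story does not count more than a narrowly tagged one.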

Scenario priors

Each scenario starts with a prior probability before evidence is applied. These priors reflect a judgment about base rates for this class of question — they're defensible but not neutral.

Contained advantage (prior 22): Glasswing holds, capability stays asymmetric for 12+ months
Capability commoditizes (prior 20): Comparable capability reaches attackers within 6–12 months
Narrative over-corrects (prior 10): Independent reproduction narrows the capability gap
Material incident (prior 5): Mythos-class capability used in a disclosed attack within 12 months
Defender advantage holds (prior 8): Glasswing delivers measurable patching before capability diffuses
Regulatory intervention (prior 14): Meaningful policy action within 6–9 months
Industry parity (prior 21): Multiple labs reach comparable gated-model state

Honest limitations

Corpus selection bias. The current corpus is English-language and US/UK-weighted. Serious coverage in Chinese, German, French, and specialized security press is not yet indexed. Probabilities will drift as that coverage is added.

Weight choice is subjective. The Tier 1 weight of 3.0 and the questions coefficient of −0.7 are judgment calls, and reasonable analysts would pick different values. Different weights change the numbers but typically not the rank order of scenarios.

Priors are not neutral. The seven scenario priors reflect one reading of base rates. A materially different read of the base rates would change the weighting between scenarios.

Scenarios are mutually exclusive by construction, not by reality. In practice a future could involve multiple scenarios (regulatory intervention + industry parity, for example). The model forces a single-frame characterization for the 12-month period.

No automated refresh yet. The corpus is manually curated. As news continues to land, these numbers will lag until new stories are indexed. Automated ingestion is Phase 2.