Methodology

How the scoring works

Transparent weights, published math, named limitations. The scores on this site are the product of specific judgment calls — they look precise because they're numbers, but the numbers are only as good as the weights and the corpus feeding them.

The Reality Index

A single 0–100 reading of how grounded the prevailing Mythos narrative is in substantive, well-sourced reality right now. It is a weighted composite of three of the four axis scores — Evidence, Substance, and Confidence. Skepticism is deliberately not a separate input because it is already folded into Evidence (credible pushback subtracts from weighted support at ingest). Counting it twice would double-penalize.

Formula
0.5 × Evidence + 0.3 × Substance + 0.2 × Confidence
Bands
Hype-dominant (0–25)
Claims are circulating faster than they are being validated. Most of the weighted discourse is commentary rather than primary reporting or independent evaluation.
Contested (26–50)
Claims have some footing but are not broadly corroborated. Credible voices are actively questioning the framing, and the substantive source base is still thin.
Developing (51–75)
Substantive sources are aligning and the core narrative is solidifying, though open questions remain and additional Tier-1 evidence would sharpen the read.
Well-evidenced (76–100)
The narrative has strong evidentiary footing across source tiers with minimal credible pushback. Treat the core framing as established and focus attention on unresolved edges.
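A minimal Python sketch of the composite and its band lookup, assuming the three axis scores are already on a 0–100 scale. The function names are illustrative, and the rounding step is an assumption: the bands are published as integer ranges, so the sketch rounds a fractional index before looking it up.

```python
# Weights from the Formula line above.
EVIDENCE_WEIGHT = 0.5
SUBSTANCE_WEIGHT = 0.3
CONFIDENCE_WEIGHT = 0.2

# (inclusive upper bound, label) for each band, checked in ascending order.
BANDS = [
    (25, "Hype-dominant"),
    (50, "Contested"),
    (75, "Developing"),
    (100, "Well-evidenced"),
]


def reality_index(evidence: float, substance: float, confidence: float) -> float:
    """Weighted composite of three 0-100 axis scores."""
    return (EVIDENCE_WEIGHT * evidence
            + SUBSTANCE_WEIGHT * substance
            + CONFIDENCE_WEIGHT * confidence)


def band(index: float) -> str:
    """Map an index to its band label.

    Assumption: fractional values are rounded to the nearest integer
    before lookup, because the bands are published as integer ranges.
    """
    value = round(index)
    if not 0 <= value <= 100:
        raise ValueError(f"index out of range: {index}")
    # First band whose upper bound covers the value.
    return next(label for upper, label in BANDS if value <= upper)
```

For example, Evidence 60, Substance 40, and Confidence 50 give 30 + 12 + 10 = 52, which falls in Developing.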

The weighting reflects a specific judgment: evidence matters most (it is what the corpus is actually voting on), source substance matters second (a corpus of commentary does not earn the same weight as a corpus of government evaluations), and calibration, the Confidence score, third (it pulls the index toward zero when the corpus is too thin to trust a read). If the site ships a daily-snapshot view later, the composite would be the number plotted over time.

The three axis scores

Each story in the corpus contributes a weighted vote across three axes. The weight of a story is tier × type_multiplier × stance_coefficient.

Tier weight
  • T1: × 3.0
  • T2: × 2.0
  • T3: × 1.0
  • T4: × 0.4
Source type multiplier
  • Primary: × 1.5
  • Government: × 1.4
  • Research: × 1.3
  • Industry: × 0.9
  • News: × 1.0
  • Commentary: × 0.6
Stance coefficient
  • supports: +1.0
  • contextualizes: +0.3
  • questions: −0.7
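The per-story weight can be sketched in Python as a straight product of the three lookups. The dictionary keys are hypothetical labels, not the site's internal identifiers.

```python
TIER_WEIGHT = {"T1": 3.0, "T2": 2.0, "T3": 1.0, "T4": 0.4}

TYPE_MULTIPLIER = {
    "primary": 1.5,
    "government": 1.4,
    "research": 1.3,
    "industry": 0.9,
    "news": 1.0,
    "commentary": 0.6,
}

STANCE_COEFFICIENT = {"supports": 1.0, "contextualizes": 0.3, "questions": -0.7}


def story_weight(tier: str, source_type: str, stance: str) -> float:
    """Signed vote of one story: tier x type_multiplier x stance_coefficient.

    Unknown labels raise KeyError rather than silently weighting zero.
    """
    return TIER_WEIGHT[tier] * TYPE_MULTIPLIER[source_type] * STANCE_COEFFICIENT[stance]
```

A T1 government source that supports the narrative weighs 3.0 × 1.4 × 1.0 = 4.2, while a T4 commentary piece that questions it weighs 0.4 × 0.6 × −0.7 = −0.168, so it takes 25 such pieces to cancel one T1 government source.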

Formulas

Validation
(supports_weight + 0.3 × context_weight − 0.7 × questions_weight) ÷ total_max_weight × 100
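A sketch of the Validation score in Python. It assumes supports_weight, context_weight, and questions_weight are sums of tier × type_multiplier within each stance bucket, so the stance coefficients are applied exactly once, in the numerator. The methodology does not define total_max_weight; the sketch assumes it is the sum of all base weights, i.e. the numerator's value if every story supported the narrative. Under that assumption the score runs from −70 (every story questions) to 100 (every story supports). The Story record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical minimal record for one indexed story."""
    base_weight: float  # tier weight x source type multiplier (unsigned)
    stance: str         # "supports", "contextualizes", or "questions"


def validation(stories: list[Story]) -> float:
    """Stance-weighted support as a percentage of the maximum possible."""
    buckets = {"supports": 0.0, "contextualizes": 0.0, "questions": 0.0}
    for story in stories:
        buckets[story.stance] += story.base_weight  # unknown stance -> KeyError

    # Assumption: total_max_weight is the numerator if every story had
    # stance "supports", i.e. the sum of all base weights.
    total_max_weight = sum(buckets.values())
    if total_max_weight == 0:
        return 0.0  # assumption: an empty corpus scores zero

    numerator = (buckets["supports"]
                 + 0.3 * buckets["contextualizes"]
                 - 0.7 * buckets["questions"])
    return numerator / total_max_weight * 100
```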
Signal / noise
substantive_weight ÷ (substantive_weight + ephemeral_weight) × 100
Substantive = Primary + Government + Research. Ephemeral = everything else.
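A sketch of Signal / noise in Python, assuming each story's weight here is the unsigned tier × type_multiplier, since a questioning government report is still a substantive source. The (source_type, weight) input format is hypothetical.

```python
SUBSTANTIVE_TYPES = {"primary", "government", "research"}


def signal_to_noise(weighted_sources: list[tuple[str, float]]) -> float:
    """Share of unsigned weight that comes from substantive source types.

    Each item is a hypothetical (source_type, weight) pair, where weight is
    tier x type_multiplier without the stance coefficient (assumption).
    """
    substantive = 0.0
    ephemeral = 0.0
    for source_type, weight in weighted_sources:
        if source_type in SUBSTANTIVE_TYPES:
            substantive += weight
        else:
            ephemeral += weight  # industry, news, commentary
    total = substantive + ephemeral
    return substantive / total * 100 if total else 0.0
```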
Skepticism ratio
questions_weight ÷ (supports_weight + questions_weight) × 100
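The skepticism ratio reduces to a short Python function. Contextualizing weight appears in neither term, so it cannot dilute the ratio. The zero-denominator fallback is an assumption.

```python
def skepticism_ratio(supports_weight: float, questions_weight: float) -> float:
    """Share of stance-bearing weight that questions the narrative.

    Contextualizing weight is deliberately absent from both terms.
    Assumption: returns 0 when there is no supporting or questioning weight.
    """
    denominator = supports_weight + questions_weight
    return questions_weight / denominator * 100 if denominator else 0.0
```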
Scenario probability
raw = prior + 2.5 × (for_weight − 0.6 × against_weight)
probability = raw ÷ Σ raw (over all seven scenarios) × 100
Normalized so all seven scenarios sum to 100. The 0.6 asymmetry reflects that absence of a contradicting signal is softer evidence than presence of a supporting one.
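A sketch of the scenario update and normalization in Python, using the seven priors listed under Scenario priors. The scenario keys are hypothetical identifiers. The zero floor on raw scores is an assumption: the formula does not say how a scenario whose against-weight outweighs its prior is handled, and negative raw scores would make the normalization meaningless.

```python
PRIORS = {
    "contained_advantage": 22,
    "capability_commoditizes": 20,
    "narrative_over_corrects": 10,
    "material_incident": 5,
    "defender_advantage_holds": 8,
    "regulatory_intervention": 14,
    "industry_parity": 21,
}


def scenario_probabilities(
    evidence: dict[str, tuple[float, float]],
) -> dict[str, float]:
    """Apply evidence to each prior, then normalize so all scenarios sum to 100.

    evidence maps a scenario key to (for_weight, against_weight); scenarios
    missing from the map get no evidence adjustment.
    """
    raw = {}
    for name, prior in PRIORS.items():
        for_weight, against_weight = evidence.get(name, (0.0, 0.0))
        score = prior + 2.5 * (for_weight - 0.6 * against_weight)
        raw[name] = max(score, 0.0)  # assumption: floor at zero

    total = sum(raw.values())
    if total == 0:
        raise ValueError("all scenario scores floored to zero")
    return {name: score / total * 100 for name, score in raw.items()}
```

Because the priors already sum to 100, a corpus with no scenario evidence returns the priors unchanged. Adding a for_weight of 4.0 to Material incident lifts its raw score from 5 to 15 and the total to 110, so its probability rises to about 13.6 while the others shrink proportionally.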

Mechanism read

The mechanism read asks a different question from the Reality Index: not whether Mythos capability is real, but what the corpus says is driving it. Stories can carry zero or more neutral mechanism tags. Each tagged story contributes its normal source weight, split evenly across its tags.

  • Base-model capability

    Evidence points to the underlying frontier model being materially better at coding, reasoning, exploit chaining, or proof generation.

  • Guardrails and access

    Evidence points to relaxed safeguards, access gating, refusal policy, or deployment constraints as a major part of the gap.

  • Harness and workflow

    Evidence points to scaffolding, repo-scale context, tools, validation loops, or target-selection workflow around the model.

  • Commodity model diffusion

    Evidence points to smaller, cheaper, open-weight, or competitor models recovering similar analysis or quickly closing the gap.
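A sketch of the even split in Python. It assumes a story's "normal source weight" is its unsigned tier × type_multiplier, and it reports each mechanism as a percentage of all tagged weight; the percentage view and the tag keys are illustrative assumptions. Untagged stories contribute nothing, and a tag repeated on one story is counted once.

```python
from collections import defaultdict

MECHANISMS = (
    "base_model_capability",
    "guardrails_and_access",
    "harness_and_workflow",
    "commodity_model_diffusion",
)


def mechanism_shares(stories: list[tuple[float, list[str]]]) -> dict[str, float]:
    """Percentage of tagged weight attributed to each mechanism.

    Each story is a hypothetical (weight, tags) pair. A story's weight is
    split evenly across its distinct tags; untagged stories are skipped.
    """
    totals = defaultdict(float)
    for weight, tags in stories:
        distinct = set(tags)
        if not distinct:
            continue
        unknown = distinct.difference(MECHANISMS)
        if unknown:
            raise ValueError(f"unknown mechanism tags: {sorted(unknown)}")
        share = weight / len(distinct)
        for tag in distinct:
            totals[tag] += share

    grand_total = sum(totals.values())
    return {
        m: (totals[m] / grand_total * 100 if grand_total else 0.0)
        for m in MECHANISMS
    }
```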

This is deliberately evidence-derived, not hand-authored editorial copy. New ingested stories can carry mechanism tags from the classifier; older or untagged stories get a conservative keyword inference pass until they are curated.

Scenario priors

Each scenario starts with a prior probability before evidence is applied. These priors reflect a judgment about base rates for this class of question — they're defensible but not neutral.

  • Contained advantage: Glasswing holds, capability stays asymmetric for 12+ months
    prior 22
  • Capability commoditizes: Comparable capability reaches attackers within 6–12 months
    prior 20
  • Narrative over-corrects: Independent reproduction narrows the capability gap
    prior 10
  • Material incident: Mythos-class capability used in a disclosed attack within 12 months
    prior 5
  • Defender advantage holds: Glasswing delivers measurable patching before capability diffuses
    prior 8
  • Regulatory intervention: Meaningful policy action within 6–9 months
    prior 14
  • Industry parity: Multiple labs reach comparable gated-model state
    prior 21

Honest limitations

Corpus selection bias. The current corpus is English-language and US/UK-weighted. Serious coverage in Chinese, German, French, and specialized security press is not yet indexed. Probabilities will drift as that coverage is added.

Weight choice is subjective. The Tier 1 weight of 3.0 and the questions coefficient of −0.7 are judgment calls, and different reasonable analysts would pick different weights. Changing them shifts the numbers but typically not the rank order of scenarios.

Priors are not neutral. The seven scenario priors reflect one reading of base rates. A materially different read of the base rates would change the weighting between scenarios.

Scenarios are mutually exclusive by construction, not by reality. In practice a future could involve multiple scenarios (regulatory intervention + industry parity, for example). The model forces a single-frame characterization for the 12-month period.

No automated refresh yet. The corpus is manually curated. As news continues to land, these numbers will lag until new stories are indexed. Automated ingestion is Phase 2.