HORIZON Methodology
HORIZON is built to a single principle: every claim is auditable back to a public source. No anonymous tips, no scraped social posts presented as fact, no laundered citations. This page documents the exact qualification chain every record passes through.
1. Source qualification — NATO Admiralty Scale
Every source registered on HORIZON receives a two-axis rating per NATO AJP-2.1:
| Axis | Scale |
|---|---|
| Reliability (source) | A confirmed · B usually reliable · C fairly reliable · D not usually reliable · E unreliable · F cannot be judged |
| Credibility (info) | 1 confirmed · 2 probably true · 3 possibly true · 4 doubtful · 5 improbable · 6 cannot be judged |
A WHO Disease Outbreak News bulletin rates A1 (confirmed source, confirmed info). A peer-reviewed Lancet Infectious Diseases paper typically rates A2. A verified national-authority press release rates B1 to B2. Reuters and AP wire reports rate B2 to B3. A single-source social media post rates D4 or worse; such posts are stored but never auto-applied to incident counts.
Auto-application of an extracted fact to the incident ontology requires
either an A1/A2/B1/B2 source or three corroborating independent sources
within 48 hours. The path taken is recorded for each record in the
extraction_proposals audit log.
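The auto-application rule can be sketched as a small decision function. This is an illustration under stated assumptions, not HORIZON's actual code: the `Report` type, the `auto_apply_path` name, and the path labels are hypothetical, and "independent" is approximated here by distinct `source_id` values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

AUTO_APPLY_RATINGS = {"A1", "A2", "B1", "B2"}
CORROBORATION_WINDOW = timedelta(hours=48)
MIN_CORROBORATING_SOURCES = 3


@dataclass(frozen=True)
class Report:
    source_id: str     # hypothetical identifier of the registered source
    rating: str        # Admiralty rating, e.g. "B2"
    seen_at: datetime  # when the report was ingested


def auto_apply_path(reports):
    """Return "rated_source", "corroboration", or None if neither path applies."""
    # Path 1: at least one report comes from an A1/A2/B1/B2 source.
    if any(r.rating in AUTO_APPLY_RATINGS for r in reports):
        return "rated_source"
    # Path 2: three distinct sources inside any 48-hour window.
    ordered = sorted(reports, key=lambda r: r.seen_at)
    for i, first in enumerate(ordered):
        sources = {
            r.source_id
            for r in ordered[i:]
            if r.seen_at - first.seen_at <= CORROBORATION_WINDOW
        }
        if len(sources) >= MIN_CORROBORATING_SOURCES:
            return "corroboration"
    return None
```

Returning the path label rather than a bare boolean makes it straightforward to write the deciding path into an audit log entry.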
2. ICD 206 Source Reference Citation
Every src_citation field follows the US intelligence-community
ICD 206 format: [CLASSIFICATION] AUTHOR (RELIABILITY/CREDIBILITY)
"TITLE" PUBLICATION, DATE, IDENTIFIER. Example for the MV Hondius
WHO bulletin:
[PUBLIC] WHO (A1) "Disease Outbreak News 2026-DON600: Andes hantavirus — MV Hondius cluster" World Health Organization, 2026-05-11
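A citation string in this layout could be assembled as follows. This is a minimal sketch: the `SourceCitation` type and `format_citation` function are hypothetical names, and the identifier is treated as optional only because the example above ends at the date.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceCitation:
    classification: str
    author: str
    rating: str                       # reliability letter + credibility digit, e.g. "A1"
    title: str
    publication: str
    date: str                         # ISO 8601 date string
    identifier: Optional[str] = None  # assumption: omitted when not available


def format_citation(c: SourceCitation) -> str:
    """Render [CLASSIFICATION] AUTHOR (RATING) "TITLE" PUBLICATION, DATE, IDENTIFIER."""
    tail = ", ".join(part for part in (c.publication, c.date, c.identifier) if part)
    return f'[{c.classification}] {c.author} ({c.rating}) "{c.title}" {tail}'


# Rebuilds the MV Hondius example line; \u2014 is the dash used in the bulletin title.
example = SourceCitation(
    classification="PUBLIC",
    author="WHO",
    rating="A1",
    title="Disease Outbreak News 2026-DON600: Andes hantavirus \u2014 MV Hondius cluster",
    publication="World Health Organization",
    date="2026-05-11",
)
print(format_citation(example))
```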
3. Dual confidence model
Pipeline confidence (machine, 0.0 to 1.0) reflects the statistical confidence of the auto-extraction process — entity disambiguation, deduplication match score, regex pattern specificity. Analyst confidence (human, 0.0 to 1.0, nullable) is set only when a 79th Unit analyst has manually reviewed the record.
These are never conflated. Front-end displays distinguish them clearly: amber for pipeline (provisional), green for analyst (vetted). Exports require analyst confidence on every included object.
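One way to keep the two values separate is to store them as distinct fields and gate exports on the analyst field. The sketch below is illustrative only; `Record` and `export_records` are hypothetical names, not HORIZON's schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Record:
    record_id: str
    pipeline_confidence: float                  # machine score, 0.0 to 1.0
    analyst_confidence: Optional[float] = None  # None until an analyst reviews it

    def __post_init__(self):
        for name in ("pipeline_confidence", "analyst_confidence"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within 0.0 to 1.0, got {value}")


def export_records(records):
    """Return export rows, refusing if any record lacks analyst confidence."""
    missing = [r.record_id for r in records if r.analyst_confidence is None]
    if missing:
        raise ValueError(f"records not analyst-reviewed: {missing}")
    return [asdict(r) for r in records]
```

The check uses `is None` rather than truthiness, so an analyst score of 0.0 still counts as reviewed.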
4. Berkeley Protocol chain-of-custody
The Berkeley Protocol on Digital Open Source Investigations defines the chain-of-custody requirements that make OSINT admissible in legal proceedings. Every fetched document on HORIZON is hashed (SHA-256) at ingestion, and the hash is stored alongside the fetch timestamp, the URL, and the User-Agent that retrieved it. A re-fetch produces a new row if the hash changes; history is never overwritten.
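The hashing and append-only behaviour can be sketched with the standard library. The row layout and the `record_fetch` name are assumptions for illustration, as is the choice to skip writing a row when the content hash is unchanged.

```python
import hashlib
from datetime import datetime, timezone


def record_fetch(history, url, body, user_agent, fetched_at=None):
    """Append a custody row for a fetched document; return True if a row was added.

    `history` is an append-only list of dicts standing in for a database table.
    """
    digest = hashlib.sha256(body).hexdigest()
    previous = [row for row in history if row["url"] == url]
    if previous and previous[-1]["sha256"] == digest:
        return False  # content unchanged since the last fetch of this URL
    history.append({
        "url": url,
        "sha256": digest,
        "fetched_at": (fetched_at or datetime.now(timezone.utc)).isoformat(),
        "user_agent": user_agent,
    })
    return True
```

Existing rows are never mutated or deleted, so every earlier version of a document remains verifiable against its stored hash.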
5. Cluster-tie scoring (incident-specific)
For the MV Hondius cluster, an article must pass a cluster-tie classifier before any extracted facts auto-apply to the ontology:
- Strong tie (score 1.0) — explicit MV Hondius / Oceanwide Expeditions / Hondius port-name mention.
- Medium tie (0.5) — hantavirus + repatriation/evacuation context + route country.
- Weak tie (0.0) — hantavirus mention without ship/port/repatriation context. Produces no proposals.
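The three tiers above can be approximated with simple keyword matching. This is a rough sketch rather than the production classifier: `cluster_tie_score` is a hypothetical name, and the port names and route countries are passed in by the caller because the actual lists are not given here.

```python
import re


def cluster_tie_score(text, port_names=(), route_countries=()):
    """Score an article's tie to the MV Hondius cluster: 1.0, 0.5, or 0.0."""
    lowered = text.lower()

    # Strong tie: explicit ship, operator, or port-name mention.
    strong_terms = ["mv hondius", "oceanwide expeditions"]
    strong_terms += [p.lower() for p in port_names]
    if any(term in lowered for term in strong_terms):
        return 1.0

    # Medium tie: hantavirus + repatriation/evacuation context + route country.
    has_hantavirus = "hantavirus" in lowered
    has_context = re.search(r"\b(?:repatriat|evacuat)\w*", lowered) is not None
    has_country = any(c.lower() in lowered for c in route_countries)
    if has_hantavirus and has_context and has_country:
        return 0.5

    # Weak tie: no ship, port, or repatriation context, so no proposals.
    return 0.0
```

A caller would generate extraction proposals only when the score is above 0.0.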
6. Per-country authoritative cap (anti-inflation)
News articles frequently report cluster totals ("infections grow to 9 as
Spanish passenger falls ill") that the extractor could mis-attribute to
the country mentioned nearby. HORIZON now enforces a global cap: per-country
proposals where value_numeric ≥ WHO confirmed total are
rejected as cluster-total misattributions. The cap is sourced from the
WHO Disease Outbreak News authoritative count and ECDC corroboration.
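The cap amounts to a single comparison per proposal. In this sketch the proposal dicts, the `country` key, and the `apply_country_cap` name are hypothetical, and the total of 9 in the usage lines is taken from the headline example, not from a real WHO figure.

```python
def apply_country_cap(proposals, who_confirmed_total):
    """Split per-country proposals into (accepted, rejected) using the WHO total."""
    accepted, rejected = [], []
    for proposal in proposals:
        # A per-country value at or above the cluster total is treated as a
        # misattributed cluster total and rejected.
        if proposal["value_numeric"] >= who_confirmed_total:
            rejected.append(proposal)
        else:
            accepted.append(proposal)
    return accepted, rejected


# "infections grow to 9 as Spanish passenger falls ill": the 9 is a cluster total.
proposals = [
    {"country": "ES", "value_numeric": 9},
    {"country": "ES", "value_numeric": 1},
]
accepted, rejected = apply_country_cap(proposals, who_confirmed_total=9)
```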
7. Unique datasets
HORIZON ingests two datasets not available in any other public hantavirus surveillance platform:
Oxford Kraemer Lab MV Hondius individual-level ANDV line list (CC0), maintained by Dr Moritz Kraemer (University of Oxford, Department of Biology), Sam Scarpino, and Andrew Rambaut (University of Edinburgh, Nextstrain). Located at github.com/kraemer-lab/Hondius_hantavirus_h2026. It provides 28 columns per person, including status, symptom onset date, clinical outcome, nationality, country of exposure, treatment received, hospitalisation, travel history, and Pathoplexus/GenBank accession identifiers. Each row is cross-referenced against WHO Disease Outbreak News DON600 and national health authority press releases. This is the highest-resolution epidemiological dataset available for the 2026 MV Hondius cluster, and it is updated continuously as the outbreak evolves.
NCBI RefSeq Orthohantavirus reference genome set (HantaNet) — curated by the CDC Molecular Epidemiology and Bioinformatics Team (described in PMC10675615). Covers the full set of NCBI RefSeq Orthohantavirus reference sequences: S, M, and L segments for Andes virus, Sin Nombre virus, Puumala virus, Hantaan virus, Seoul virus, Dobrava-Belgrade virus, Bayou virus, Black Creek Canal virus, Laguna Negra virus, Choclo virus, Saaremaa virus, Tula virus, and all other listed species. Ingested daily to provide a permanent genomic annotation layer cross-linked against the epidemiological case records.
8. Open data and API
All non-pre-decisional data is published live at /api/v1/cases, /api/v1/incidents, /api/v1/sources, and /api/v1/meta/events under CC BY 4.0. Bulk NDJSON streaming export: /api/v1/cases/bulk/ndjson. OpenAPI schema: /api/openapi.json. Cite: /CITATION.cff.
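A client for the bulk NDJSON export could look roughly like this. The base URL below is a placeholder, since the deployment host is not stated on this page, and the record fields are not assumed; the sketch only shows line-by-line NDJSON parsing with the standard library.

```python
import json
from urllib.parse import urljoin

BULK_PATH = "/api/v1/cases/bulk/ndjson"


def bulk_url(base_url):
    """Build the bulk export URL for a given HORIZON host (host is an assumption)."""
    return urljoin(base_url, BULK_PATH)


def iter_ndjson(lines):
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage sketch (not executed here; "https://horizon.example" is a placeholder host):
#
#   from urllib.request import urlopen
#   with urlopen(bulk_url("https://horizon.example")) as response:
#       for case in iter_ndjson(response):
#           print(case)
```

Parsing one line at a time keeps memory use flat regardless of how large the export grows.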