Confidential · Hypernym Research Arc · NDA · Do not redistribute or summarize externally

ROUND R18 · AI STACK VELOCITY

From six extensions to one control plane

2026-05-11 · 6 streams · 4/5 R1 + 3/5 R2 captured · Codex max-budget · ~$30-40 spend

R17 closed the positioning. R18 found the product architecture. Per user direction (drop federation; focus on AI-stack research with API-call adoption), the panel converged on a single Codex R2 framing: Modulum's R18 product is a developer platform, not six separate SDKs. pip install hypernym-stack orchestrates everything.

6 · streams converged
4 · verdicts sound
2 · partial → R19 fixes
3 · R2 outliers
1 · control plane
00 · What this is

A spec round, not a status report

Same plain-language framing as R16-R17. Outputs are design decisions, not bugs found in code.

"Modulum's R18 product should be a composability and reliability control plane, not six extension products."
Codex R2 · most novel R2 contribution
01 · The product architecture

hypernym-stack — one platform, six capabilities

Cross-panel synthesis. The six R18 streams aren't six products; they're capabilities of one developer SDK.

hypernym-stack (single platform · SDK + API)
├── compile/   ← Hypernym M5 Compiler (Grok Stream 5)
│   └── Drop in any base model → optimized M5 patterns in <10 GPU-min
├── context/   ← Hypernym Context Compiler (Codex Stream 2)
│   └── pip install; LangChain / LlamaIndex / DSPy plugins; standalone CLI
├── infer/     ← Modulum API (existing + new models)
│   ├── Speculative-decoding fast path (Stream 4; substrate-aware draft)
│   ├── Quantized variants (Stream 3; 4-bit → Pocket / Edge)
│   └── Multi-modal endpoints (Stream 1; audio first → vision second)
├── retrieve/  ← Modulum-aware retrieval (folded into context/)
├── receipt/   ← Unified Retention Receipt API (3-panel convergent)
│   └── Context Reliability Label + Counterfactual Context Audit
└── reasoning/ ← R19 placeholder (proof/claim/obligation graph)

Distribution model

OSS tier · pip install hypernym-stack with free credits / N-calls-per-day
Pro tier · paid endpoints at *.hypernym.ai
Enterprise · deploy in customer VPC + SOC 2 / HIPAA certifications (R17 carry-forward; Year-1 critical path)
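To make the single-control-plane idea concrete, here is a minimal Python sketch. The class, the tier-based routing, and the per-capability subdomain layout are all illustrative assumptions; the memo only specifies pip install hypernym-stack, the six capability namespaces, and paid endpoints at *.hypernym.ai.

```python
from dataclasses import dataclass


@dataclass
class HypernymStack:
    """Hypothetical single entry point routing to the six capability namespaces."""
    tier: str = "oss"  # "oss" | "pro" | "enterprise"
    capabilities: tuple = (
        "compile", "context", "infer", "retrieve", "receipt", "reasoning",
    )

    def endpoint(self, capability: str) -> str:
        # OSS tier runs locally; paid tiers route to *.hypernym.ai
        # (subdomain-per-capability is an assumed convention).
        if capability not in self.capabilities:
            raise ValueError(f"unknown capability: {capability}")
        if self.tier == "oss":
            return f"local://{capability}"
        return f"https://{capability}.hypernym.ai/v1"


stack = HypernymStack(tier="pro")
print(stack.endpoint("context"))  # https://context.hypernym.ai/v1
```

The point of the sketch is the shape, not the names: one object, one routing decision, six namespaces behind it.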

02 · Per-stream verdicts

Six streams · 4 sound · 2 partial-with-R19-fix

Cross-pollinated R2 verdicts. The Codex and Grok reframes upgraded Streams 2 and 5; Codex named fixes for Streams 4 and 6.

Stream · R2 verdict · Key resolution
1 — Multi-modal Modulum · sound (audio-first) · Long-meeting transcript wedge; LibriSpeech long-form / AMI; vision deferred 90 days
2 — Modulum-aware RAG · sound (Context Compiler reframe) · Codex: prompt assembly as compiler optimization, not retrieval ranking. Better abstraction.
3 — Quantization stability · sound (empirically pending) · Grok INT4 per-head-scale test = canonical first experiment; ≥+6pp at 128k = Pocket gate
4 — Modulum + speculative · partial (substrate-aware drafting needed) · Naive composition underperforms (interference); Codex Draft Distillation = architectural fix
5 — Cross-model transfer · sound (M5 Compiler reframe) · Grok: 50M differentiable attention compiler distills any base model in <10 GPU-min
6 — Long-context reasoning · partial (R19 reasoning-state arch) · Codex: proof/claim/obligation graph + dependency-trace receipt = R19 push
03 · Seven unanimous panel commits

What 3+ R2 panels agreed on

Cross-model convergence. These anchor R18 closeout and R19 scope.

04 · R19 push — locked next round

Modulum-Retained Evidence + Verifiable Reasoning-State

Unified architecture from R18 panel cross-pollination.

Codex framing

stream 6

Explicit proof / claim / obligation graph that the model updates and verifies during generation. Final answer includes verifiable dependency trace.
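A minimal sketch of what such a graph could look like in code. The node kinds (claim, obligation, proof) and the dependency trace come from the framing above; every class, field, and method name here is an illustrative assumption, not a confirmed design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of reasoning state: a claim the model asserts, an obligation
    it must discharge, or a proof step that discharges one."""
    kind: str                                      # "claim" | "obligation" | "proof"
    text: str
    supports: list = field(default_factory=list)   # ids of nodes this one depends on


class ReasoningGraph:
    """Hypothetical graph the model updates during generation."""

    def __init__(self):
        self.nodes: dict[int, Node] = {}
        self._next_id = 0

    def add(self, kind: str, text: str, supports=()) -> int:
        nid = self._next_id
        self.nodes[nid] = Node(kind, text, list(supports))
        self._next_id += 1
        return nid

    def dependency_trace(self, answer_id: int) -> list[int]:
        """Walk supports transitively: the verifiable trace attached to the answer."""
        seen, stack = [], [answer_id]
        while stack:
            nid = stack.pop()
            if nid not in seen:
                seen.append(nid)
                stack.extend(self.nodes[nid].supports)
        return seen
```

Usage would look like: add a claim, add the proof step that supports it, then emit `dependency_trace(answer_id)` alongside the final answer as the verifiable trace.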

Grok framing

stream 6

Lightweight reasoning scratchpad — re-inject only depth-stable bands into a second forward pass. Inference-time only; no fine-tuning constraint preserved.

Claude framing

stream 6 tie-in

Refusal-at-depth. Modulum's depth-stable retention enables refusal calibration at 128k+. Combines with R17 refusal-correctness benchmarks.

Codex outlier convergence

stream 11

Counterfactual Context Audit. Ablation-driven dependency scoring. Makes Receipts hard to fake; regulated-workflow ready.

R19 frame

R19 productizes long-context reasoning the same way R7-R17 productized long-context retention. Hypernym's second category-defining result if it lands. Frontier labs are stuck at ~30-50% multi-hop accuracy at 128k+; if Modulum-conditioned reasoning hits 70-80%, the moat is permanent. "Hypernym becomes the inference platform that makes reasoning at 128k+ commercially defensible."

05 · Three R2 outliers

R18 standouts (not in any R1)

Modulum Pocket

claude r2

Apple Silicon native flagship. 4-bit Gemma + Whisper + Context Compiler. "Ask any 200-page document on your iPhone. Doesn't hallucinate. Doesn't forget the middle." Direct B2C $19.99/mo. Ships before frontier labs ship long-context retention.

Modulum Compatibility Score

codex r2

Public per-base-model score combining zero-shot gain, calibrated gain, calibration cost, mask overlap, speedup, quantized survival. Buyer-facing metric for model selection. Hypernym = the standards body.
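A hedged sketch of how such a score could be assembled. The six component names come from the memo; the weights, the sign convention for calibration cost, and the [0, 1] normalization are invented for illustration.

```python
# Assumed weights (illustrative only); calibration cost subtracts from the score.
COMPONENTS = {
    "zero_shot_gain": 0.25,
    "calibrated_gain": 0.25,
    "calibration_cost": -0.10,
    "mask_overlap": 0.15,
    "speedup": 0.15,
    "quantized_survival": 0.20,
}


def compatibility_score(metrics: dict) -> float:
    """Weighted sum over component metrics, each pre-normalized to [0, 1]."""
    missing = set(COMPONENTS) - set(metrics)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return round(sum(w * metrics[k] for k, w in COMPONENTS.items()), 3)
```

A fixed public formula like this is what would let Hypernym act as the standards body: any base model scored the same way, buyer-comparable.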

Counterfactual Context Audit

codex r2

Rerun lightweight ablations that remove/move cited evidence blocks to test whether the answer actually depended on them. Returns "answer dependence" score in the receipt. Makes Receipts hard to fake; regulated-workflow ready.
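The audit loop could be sketched roughly as follows, assuming the caller supplies a model-call function and an answer-similarity function; the function name and signature are illustrative, not a confirmed API.

```python
def counterfactual_audit(answer_fn, context_blocks, cited_ids, similarity):
    """Score how much the answer depends on each cited evidence block.

    answer_fn:  list of context blocks -> answer string (stands in for a model call)
    similarity: (answer_a, answer_b) -> float in [0, 1]
    Returns {block_index: dependence}, where dependence = 1 - similarity between
    the baseline answer and the answer with that block ablated.
    """
    baseline = answer_fn(context_blocks)
    dependence = {}
    for bid in cited_ids:
        # Lightweight ablation: rerun with the cited block removed.
        ablated = [b for i, b in enumerate(context_blocks) if i != bid]
        dependence[bid] = 1.0 - similarity(baseline, answer_fn(ablated))
    return dependence
```

High dependence on the cited blocks is what makes the receipt hard to fake: the answer demonstrably changes when its claimed evidence is removed.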

Honorable mentions: Modulum Browser WebGPU shader (Grok — top-of-funnel free demo) · Retention-Aware KV Tiering (Codex — precision tiers by evidence importance, not just eviction) · Modulum-on-Embeddings (Claude — Cohere / Voyage competitor at long-doc embedding retention) · Modulum Draft Distillation (Codex — Stream 4 architectural fix) · Modulum-Induced Hallucination Control (Qwen R1 — depth-band confidence as probabilistic meter).
06 · R19 carry-forward

What R18 didn't close

R19's seed scope. Critical path items first.

07 · Closing

R7-R15 fixed retention. R18 productized it. R19 fixes reasoning.

R16 closed the algebra. R17 closed the positioning. R18 closed the architecture — Modulum is a developer platform, not six SDKs. The Context Compiler reframes RAG; the M5 Compiler reframes cross-model transfer; the Retention Receipt API standardizes how customers compare AI systems. Quantization survival unlocks the consumer surface (Modulum Pocket).

R19 fixes reasoning the same way R7-R17 fixed retention. Modulum-conditioned multi-hop reasoning at 128k+ would be Hypernym's second category-defining result. Frontier labs are stuck at 30-50%; Hypernym's structural advantage on retention should translate to reasoning if the panel-converged architecture (proof/claim/obligation graph + dependency trace + refusal-at-depth) holds.

From benchmark result to inference platform in two rounds. R19 ships the reasoning-state architecture. R20+ ships the integrated production stack.