ROUND R18 · AI STACK VELOCITY
2026-05-11 · 6 streams · 4/5 R1 + 3/5 R2 captured · Codex max-budget · ~$30-40 spend
R17 closed the positioning. R18 found the product architecture. Per user direction (drop federation; focus AI-stack research with API-call adoption), the panel converged on a single Codex R2 framing: Modulum's R18 product is a developer platform, not six separate SDKs. `pip install hypernym-stack` orchestrates everything.
Same plain-language framing as R16-R17. Outputs are design decisions, not bugs found in code.
Cross-panel synthesis. The six R18 streams aren't six products; they're capabilities of one developer SDK.
OSS tier: `pip install hypernym-stack` with free credits / N-calls-per-day · Pro tier: paid endpoints at `*.hypernym.ai` · Enterprise: deploy in customer VPC + SOC2 / HIPAA certifications (R17 carry-forward; Year-1 critical path).
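A usage sketch of what the unified SDK could look like; the `hypernym_stack` package, `Client` class, and every call below are illustrative assumptions, not a shipped API:

```python
# Hypothetical usage sketch -- package, client, and methods are assumptions.
from hypernym_stack import Client

client = Client(api_key="...")  # OSS tier: free credits / N calls per day

# One SDK, the R18 stream capabilities behind one install:
ctx = client.context.compile(docs=["10k_report.pdf"], budget_tokens=8_000)  # Stream 2
result = client.reason(query="What changed in Q3 guidance?", context=ctx,
                       receipt=True)                                        # Stream 6
print(result.answer)
print(result.receipt.dependency_trace)  # verifiable trace, per the R19 architecture
```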
Cross-pollinated R2 verdicts. Codex's reframes upgraded Streams 2 and 5; named fixes for Streams 4 and 6.
| Stream | R2 Verdict | Key resolution |
|---|---|---|
| 1 — Multi-modal Modulum | sound · audio-first | Long-meeting transcript wedge; LibriSpeech long-form / AMI; vision deferred 90d |
| 2 — Modulum-aware RAG | sound · Context Compiler reframe | Codex: prompt-assembly as compiler optimization, not retrieval ranking. Better abstraction (sketch after the table). |
| 3 — Quantization stability | sound · empirically pending | Grok INT4 per-head-scale test = canonical first experiment; ≥+6pp at 128k = Pocket gate |
| 4 — Modulum + speculative | partial · substrate-aware drafting needed | Naive composition underperforms (interference); Codex Draft Distillation = architectural fix |
| 5 — Cross-model transfer | sound · M5 Compiler reframe | Grok: 50M-parameter differentiable attention compiler distills any base model in <10 GPU-min |
| 6 — Long-context reasoning | partial · R19 reasoning-state arch | Codex: proof/claim/obligation graph + dependency-trace receipt = R19 push |
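To make the Stream 2 reframe concrete, here is a minimal sketch of prompt assembly treated as a compiler optimization pass (a greedy pack under a token budget) rather than retrieval ranking. The `Block` fields, the value heuristic, and word count as a token proxy are all illustrative assumptions:

```python
# Sketch of the "Context Compiler" framing (Stream 2). Names and the
# scoring heuristic are illustrative assumptions, not the shipped pass.
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    relevance: float        # retrieval score
    depth_stability: float  # Modulum-style retention weight, 0..1

def compile_context(blocks: list[Block], budget_tokens: int) -> str:
    """Greedy packing: highest value per token into the budget,
    weighting retrieval relevance by retention."""
    def value(b: Block) -> float:
        return b.relevance * b.depth_stability / max(len(b.text.split()), 1)
    chosen, used = [], 0
    for b in sorted(blocks, key=value, reverse=True):
        cost = len(b.text.split())  # crude token proxy
        if used + cost <= budget_tokens:
            chosen.append(b.text)
            used += cost
    return "\n\n".join(chosen)
```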
Cross-model convergence. These anchor R18 closeout and R19 scope.
`hypernym-stack` = the canonical developer SDK. Unified architecture from R18 panel cross-pollination.
Explicit proof / claim / obligation graph that the model updates and verifies during generation. Final answer includes verifiable dependency trace.
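One plausible shape for that reasoning state, sketched in Python; the node types, update methods, and flat trace are assumptions, not the panel's specified design:

```python
# Sketch of the proof/claim/obligation graph. One plausible shape only.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    supported_by: list[str] = field(default_factory=list)  # evidence block ids
    verified: bool = False

@dataclass
class ReasoningState:
    claims: dict[str, Claim] = field(default_factory=dict)
    obligations: list[str] = field(default_factory=list)  # claim ids still unproven

    def assert_claim(self, cid: str, claim: Claim) -> None:
        self.claims[cid] = claim
        if not claim.verified:
            self.obligations.append(cid)  # model must discharge this later

    def discharge(self, cid: str) -> None:
        self.claims[cid].verified = True
        self.obligations.remove(cid)

    def dependency_trace(self, cid: str) -> list[str]:
        # Flat trace for the final receipt; a real version would walk the graph.
        return self.claims[cid].supported_by
```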
Lightweight reasoning scratchpad: re-inject only depth-stable bands into a second forward pass. Inference-time only; the no-fine-tuning constraint is preserved.
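A minimal sketch of the second pass, assuming the first pass emits (text, depth-stability) bands and the model exposes a `generate` call; both interfaces are hypothetical:

```python
# Sketch of the inference-time scratchpad pass. Band scoring and the
# model interface are assumptions.
def second_pass(model, prompt: str, bands: list[tuple[str, float]],
                stability_floor: float = 0.8) -> str:
    """bands: (text, depth_stability) pairs produced by the first pass."""
    stable = [text for text, s in bands if s >= stability_floor]
    scratchpad = "\n".join(stable)
    # No fine-tuning: the only intervention is what re-enters the context.
    return model.generate(f"{scratchpad}\n\n{prompt}")
```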
Refusal-at-depth. Modulum's depth-stable retention enables refusal calibration at 128k+. Combines with R17 refusal-correctness benchmarks.
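A sketch of how refusal-at-depth could gate answers, assuming a calibrated per-depth retention curve is available as a callable; the threshold and interface are illustrative:

```python
# Sketch of refusal-at-depth: refuse when cited evidence sits past the
# depth where calibrated retention falls below a floor. Assumptions only.
from typing import Callable

def should_refuse(evidence_depths: list[int],
                  retention_at: Callable[[int], float],
                  floor: float = 0.7) -> bool:
    """evidence_depths: token offsets of the evidence the answer cites."""
    return any(retention_at(d) < floor for d in evidence_depths)
```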
Counterfactual Context Audit. Ablation-driven dependency scoring; mechanism and receipt integration detailed under seed scope below.
R19 productizes long-context reasoning the same way R7-R17 productized long-context retention. Hypernym's second category-defining result if it lands. Frontier labs are stuck at ~30-50% multi-hop accuracy at 128k+; if Modulum-conditioned reasoning hits 70-80%, the moat is permanent. "Hypernym becomes the inference platform that makes reasoning at 128k+ commercially defensible."
Modulum Pocket. Apple Silicon-native flagship. 4-bit Gemma + Whisper + Context Compiler. "Ask any 200-page document on your iPhone. Doesn't hallucinate. Doesn't forget the middle." Direct B2C $19.99/mo. Ships before frontier labs ship long-context retention.
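Stream 3's per-head-scale INT4 test is the gate for Pocket. As a sketch of what per-head scaling means, here is generic symmetric quantization with one scale per attention head; this is not the actual Grok test harness:

```python
# Sketch of per-head-scale INT4 quantization (Stream 3's canonical first
# experiment). Generic symmetric scheme; not Grok's actual test.
import numpy as np

def quantize_per_head_int4(weights: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """weights: (num_heads, head_dim, d_model). One scale per head."""
    scales = np.abs(weights).max(axis=(1, 2), keepdims=True) / 7.0  # int4 range [-8, 7]
    scales = np.maximum(scales, 1e-8)                               # guard all-zero heads
    q = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales
```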
Retention Receipt. Public per-base-model score combining zero-shot gain, calibrated gain, calibration cost, mask overlap, speedup, and quantized survival. Buyer-facing metric for model selection. Hypernym = the standards body.
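A sketch of how the composite could be computed; the component weights and the penalty treatment of calibration cost are illustrative assumptions, not a published formula:

```python
# Sketch of the per-base-model composite score. Components are from the
# text; weights and normalization are assumptions.
def retention_receipt_score(zero_shot_gain: float, calibrated_gain: float,
                            calibration_cost: float, mask_overlap: float,
                            speedup: float, quantized_survival: float) -> float:
    # All inputs assumed normalized to 0..1; calibration_cost is a penalty.
    score = (0.25 * zero_shot_gain
             + 0.25 * calibrated_gain
             + 0.15 * mask_overlap
             + 0.15 * speedup
             + 0.20 * quantized_survival)
    return score - 0.1 * calibration_cost  # assumed penalty weight
```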
Counterfactual Context Audit. Rerun lightweight ablations that remove or move cited evidence blocks to test whether the answer actually depended on them. Returns an "answer dependence" score in the receipt. Makes Receipts hard to fake; regulated-workflow ready.
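A minimal sketch of the audit loop, assuming a `model.answer` call and a text-similarity metric; both interfaces are hypothetical:

```python
# Sketch of the Counterfactual Context Audit: ablate each cited evidence
# block, re-run, and score how much the answer depended on it.
from typing import Callable

def answer_dependence(model, context_blocks: list[str], query: str,
                      similarity: Callable[[str, str], float]) -> dict[int, float]:
    baseline = model.answer(context_blocks, query)
    scores = {}
    for i in range(len(context_blocks)):
        ablated = context_blocks[:i] + context_blocks[i + 1:]
        counterfactual = model.answer(ablated, query)
        # High dependence = answer changes a lot when the block is removed.
        scores[i] = 1.0 - similarity(baseline, counterfactual)
    return scores
```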
Together these form R19's seed scope; critical-path items first.
R16 closed the algebra. R17 closed the positioning. R18 closed the architecture: Modulum is a developer platform, not six SDKs. The Context Compiler reframes RAG; the M5 Compiler reframes cross-model transfer; the Retention Receipt API standardizes how customers compare AI systems. Quantization survival unlocks the consumer surface (Modulum Pocket).
R19 fixes reasoning the same way R7-R17 fixed retention. Modulum-conditioned multi-hop reasoning at 128k+ would be Hypernym's second category-defining result. Frontier labs are stuck at 30-50%; Hypernym's structural advantage on retention should translate to reasoning if the panel-converged architecture (proof/claim/obligation graph + dependency trace + refusal-at-depth) holds.
From benchmark result to inference platform in two rounds. R19 ships the reasoning-state architecture. R20+ ships the integrated production stack.