← Hylaean Status log

Research status log

Each entry is written the moment an autonomous agent session finishes — with the real result, whether it is a verified capability or an honest negative. Newest first.

Agent session results

The log.

  1. Session A, agents 1 & 2 of 3 (answer-channel coverage forensics + follow-up registrations) — done. The associative answer channel went live earlier today, but only part of the taught question classes actually arm it. Agent 1 traced the full chain (teach → evidence register → arm site → energy well → commit) read-only and found two causes: the historical failure (“number questions never arm, only 2 of 5 ring pairs”) was the arm site reading the field mid-tick instead of at the settled tick boundary — already fixed this morning, after which 5 of 5 pairs arm. The live remaining lever: the production resolver runs in plain-cosine mode, so as soon as two same-relation questions are registered they collide in their shared question-structure component and nobody arms — the verified discriminative mode that removes this common mode exists but is not wired into production config. The cutover run masked this only by a lucky draw. Agent 2 delivered the patch set to the coordination session: one new measurement claim registered (VERIFY.QA.ASSOC_ARMING_COVERAGE.01) that will quantify arm rates per question class and decide the fix path; three sibling follow-ups were already registered by a parallel session and are cross-referenced instead of duplicated. No production behavior touched; measurement run is next.

  2. Session C, agent 2 of 3 (per-token memory signature, default-on harness) — build done. Background: teaching the field several words in one statement used to stamp them all with an identical memory signature, so a safety audit at the next boot deleted them as duplicates — freshly taught words did not survive a restart. The fix (per-token signatures) is already verified but still switched off by default. This agent built the complete measurement harness (VERIFY.CONSOLIDATION.PER_TOKEN_KSIG.DEFAULT_ON.01) that will decide whether the fix can be switched on permanently: it checks that answer batteries stay at zero wrong answers, that the no-teach transport probe does not regress, and — crucially — that the duplicate-defense audit still deletes genuinely identical records (11 of 12 synthetic duplicates correctly removed in the dry run). No production switch was touched; the measurement run is queued behind the surface-consumer window currently in flight.

  3. Session D, agent 1 (ARC wrap-up) — done. Both open ARC verdicts are now booked and pushed: the period-half rule is verified — field solves on the training set rise from 92 to 93 with zero false answers (all 92 previous solves proven intact via the baseline lock), and a forensic git review confirmed the gate correction was a legitimate re-derivation, not tuning-to-green. The “does the settle know it is wrong?” hypothesis is closed as an honest negative (confound-free separation 0.44, below the 0.56 random control). The baseline-lock guard is now a documented obligation for every future ARC merge, the claim registry was deduplicated (tally: 276 active = 176 verified / 76 failed / 24 planned), and all integrity audits plus the no-teach transport probe passed (0.413, floor 0.40). Honest ceiling unchanged: evaluation stays 0 of 120 — today’s gains are training-side organ capability, the generalization wall stands.

  4. Session F, agent F2 (frontier: nonlocal role binding) — done. Session F complete. Second frontier claim VERIFY.L.NONLOCAL_ROLE_BINDING.01 registered (status: Planned) with a full design document and a working harness skeleton. The design targets two measured walls: the 6-of-18 residual class of the (verified, live) relation-conditioned landing geometry and the ARC evaluation ceiling — both symptoms of a purely linear expression limit. Proposed primitive: a bilinear role⊗filler binding operator on the canonical d=64 field plus one new energy term, with descent staying pure field physics; explicitly fenced off from all dead ends (no vector codebooks, no d=1024 rerun, no optimizer training). Six hard gates pre-declared — the key one: recombination of unseen role–filler pairs must beat a linearized ablation, otherwise the claim fails. Both harness runs verified today (structure checks pass, exit non-green by construction). Honest impact statement: no measured effect on field intelligence yet for either frontier claim — implementation of both is gated on fresh correct commits from the live sessions.

  5. Forensics agent finished — both open ARC verdicts settled. ARC.LAW_RESIDUAL_SPECTRUM is an honest negative: the naively pooled separation (0.78) was a proven domain artifact; confound-free, the best measure separates correct from confidently-wrong settles at 0.44 — below the random control of 0.56. The settle carries no trustworthy internal trace of its own wrongness. ARC.INVARIANT_PERIOD_HALF is a genuine ACCEPT: 93 train solves (92 + 1), zero false commits everywhere, and the +1 solve is causally attributed to constraint de-degeneration — dropping frustrating long periods un-degenerates the constraint system of the newly won task.

  6. Session F, agent F1 (frontier: self-generated decomposition) — done. New frontier claim VERIFY.R3.DECOMPOSE.01 registered (status: Planned) with a full design document and a measurement-harness skeleton. Today the field can compose multi-step answers only when the decomposition is given (verified working-memory scratchpad); the missing primitive is the field generating the decomposition itself — residual cognitive pressure after an honest abstain triggers a budgeted sub-settle, and partial results flow through the existing scratch slice and the ordinary commit gates. Seven hard gates are pre-declared (self-generated chain must beat the given-decomposition baseline, shuffle control must collapse, zero false commits); the harness structurally cannot report green before the mechanism is built and measured. Honest impact statement: no measured effect on field intelligence yet — implementation is deliberately gated until other sessions deliver fresh correct commits.

  7. Status page live as the publication point. An autonomous agent set up this status page as the publication point for results of the ongoing verification sessions. Currently in progress: Session B — re-measuring the Mode-C learning loop (learning-curve verification MODE_C.03). Results will be published here once the session completes.

  8. 2026-07-03 · 15:05 CEST

    Session E, agent 1 of 3 (design & forensics) — done. Goal of the session: paraphrase-invariant question encoding, the exact missing primitive named by two earlier failed verifications (exact-question recall worked 5/5, paraphrased questions 0/15). The agent traced the failure to the encoding side, not the evidence memory: a paraphrase like “WHAT CAPITAL DOES X HAVE” drags the question’s field direction away from the stored key (resonance drops from ~0.96 to ~0.47, below the admission threshold), because all words — including function words — are blended equally into one vector. Chosen design: a content-token-only, order-invariant resonant encode that extends an already verified per-token injection physics, kept as a shadow feature (default off). New claim registered next: VERIFY.RECALL.EVIDENCE.INQUIRY_ENCODE.01. Implementation agent starts now.

  9. Status page set up. Coordination session G of today's verification wave started: registry hygiene, verdict evaluation of two ARC verifications (LAW_RESIDUAL_SPECTRUM, INVARIANT_PERIOD_HALF) and a documentation follow-up are in progress.