A single writing session produces more than 100 signals across six measurement families. Here is how.
Three systems work together: a native signal engine measures the temporal and structural shape of your writing, a reconstruction adversary tests which dimensions of that shape are irreducible, and a semantic pipeline captures the linguistic content. This page explains the architecture and the corrections that shaped it.
The session flow
Every writing session follows the same path, from keystroke to stored measurement. The synchronous transaction guarantees the session is saved before any derived computation begins. The fire-and-forget pipeline runs six independent signal families; if one fails, the others still complete.
Two channels, recorded simultaneously
Every key-down and key-up event with millisecond timestamps. Character identity, cursor position, deletion events. The full temporal microstructure of how you type.
{c, d, u} per keystroke · [offset, cursor, deleted, inserted] per edit event
The final submitted text, word count, session duration, and the question that prompted the response. What you wrote, not just how you typed it.
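To make the behavioral channel concrete, here is a minimal Python sketch of the per-keystroke record and the three timing series derived from it. The field meanings (c as character, d as key-down time, u as key-up time, both in milliseconds) are an assumption read off the {c, d, u} shape above, and the helper names are hypothetical; the production engine is Rust and may define these quantities differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keystroke:
    c: str    # character identity (assumed meaning of "c")
    d: float  # key-down timestamp in ms (assumed meaning of "d")
    u: float  # key-up timestamp in ms (assumed meaning of "u")


def hold_times(keys):
    """Hold time: how long each key stayed pressed (key-up minus key-down)."""
    return [k.u - k.d for k in keys]


def inter_key_intervals(keys):
    """IKI: time between consecutive key-down events."""
    return [b.d - a.d for a, b in zip(keys, keys[1:])]


def flight_times(keys):
    """Flight time: previous key-up to next key-down (negative on rollover)."""
    return [b.d - a.u for a, b in zip(keys, keys[1:])]


stream = [
    Keystroke("h", 0.0, 80.0),
    Keystroke("i", 150.0, 210.0),
    Keystroke("!", 200.0, 290.0),  # pressed before "i" was released: rollover
]
```

In this toy stream the last flight time is negative, because the third key went down before the second came up. That is the rollover case that matters later for hold-flight alignment.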
Atomic write: all or nothing
The session is persisted in a single database transaction. If any write fails, the entire session rolls back. The pipeline that follows runs after the transaction commits, so pipeline failures never affect session persistence.
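The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module. The table names come from this page, but the columns, the in-memory database, and the function names are illustrative assumptions, not Alice's actual schema or database engine.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tb_responses (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
        CREATE TABLE tb_session_summaries (response_id INTEGER NOT NULL, summary TEXT NOT NULL);
        CREATE TABLE tb_session_events (response_id INTEGER NOT NULL, events TEXT NOT NULL);
        """
    )
    return conn


def save_session(conn, response_text, summary_json, events_json):
    """Write every session row in one transaction; any failure rolls all of it back."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        cur = conn.execute(
            "INSERT INTO tb_responses (text) VALUES (?)", (response_text,)
        )
        response_id = cur.lastrowid
        conn.execute(
            "INSERT INTO tb_session_summaries (response_id, summary) VALUES (?, ?)",
            (response_id, summary_json),
        )
        conn.execute(
            "INSERT INTO tb_session_events (response_id, events) VALUES (?, ?)",
            (response_id, events_json),
        )
    return response_id
```

Any derived computation would be triggered only after save_session returns, so a pipeline error cannot undo a committed session.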
tb_responses: Raw text, question reference
tb_session_summaries: 100+ computed fields from keystroke data
tb_session_events: Keystroke stream + event log (JSON)
tb_burst_sequences: P-bursts (text produced between pauses)
Six families, computed independently
Each signal family runs in isolation. A failure in one family does not prevent the others from completing. If the Rust engine is unavailable, dynamical, motor, and process signals return null and the session saves without them. The health endpoint surfaces this state.
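A minimal Python sketch of that isolation, assuming each family is a plain function of the session: a failure in one family is recorded as a missing result and the remaining families still run. The function name and the use of None for a failed family are illustrative choices, not the pipeline's actual interface.

```python
def run_families(session, families):
    """Run each signal family on its own; a failure yields None for that family only."""
    results = {}
    for name, compute in families.items():
        try:
            results[name] = compute(session)
        except Exception:
            # The failed family is stored as missing; the loop continues.
            results[name] = None
    return results


def unavailable_engine(session):
    """Stand-in for a family whose native engine cannot be reached."""
    raise RuntimeError("engine unavailable")


example = run_families(
    {"text": "hello"},
    {
        "dynamical": unavailable_engine,
        "semantic": lambda s: {"word_count": len(s["text"].split())},
    },
)
```

A health check could then report which families came back as None without blocking the session itself.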
Dynamical Rust 51 signals
Nonlinear dynamics of the IKI series. Treats keystroke timing as output of a complex adaptive system. Organized into nine theoretical sub-families spanning complexity, structure, causality, and mode decomposition.
tb_dynamical_signals
Motor Rust 17 signals
Statistical shape of keystroke timing distributions. Motor control, rhythmic consistency, and neuromuscular execution.
tb_motor_signals
Process Rust 9 signals
Writing process mechanics. Text reconstruction from the event log to classify pause locations, burst types, revision behavior, and strategy shifts.
tb_process_signals
Semantic Rust 14 signals
Linguistic content analysis via deterministic word-list methods. What you wrote, measured as density metrics and discourse structure.
tb_semantic_signals
Cross-session Rust 11 signals
Longitudinal consistency metrics. How your current session relates to your own history.
tb_cross_session_signals
From signals to understanding
After all signal families complete, downstream systems synthesize the measurements into longitudinal context.
Response text embedded via self-hosted Qwen3-Embedding-0.6B (512-dim, L2-normalized). Stored with SHA-256 identified weights for reproducibility.
Rolling z-scores per semantic dimension against your own history. Detects meaningful deviations from personal baseline.
Rolling aggregate of all signal dimensions. Your accumulated behavioral fingerprint. Updated after every session.
Profile-predicted signals vs. actual signals. The gap between what your history predicts and what you actually produced.
Operator-run, off-band corpus refresh against a user-agnostic prompt set. The LLM is invoked only to populate a shared question corpus and never receives or analyzes any subject's response data, signals, or profile. Subjects pull from the shared corpus once their personal seeds run out.
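The rolling z-score step among the downstream systems above can be sketched in a few lines of Python. The window length, the minimum history size, and the function name are assumptions made for illustration; this page does not state the values the instrument uses.

```python
from statistics import mean, stdev


def rolling_z(history, value, window=20, min_n=5):
    """Z-score of `value` against the most recent `window` entries of your own history.

    Returns None when there is too little history or no variance to scale by.
    """
    recent = history[-window:]
    if len(recent) < min_n:
        return None
    sd = stdev(recent)  # sample standard deviation
    if sd == 0:
        return None
    return (value - mean(recent)) / sd
```

Returning None rather than a number for short histories keeps a thin baseline from producing a deviation that looks more certain than it is.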
The reconstruction adversary
A measurement is only meaningful if you can say what it would look like without the thing you are measuring. Alice's ghost is a reconstruction adversary: it generates a synthetic writing session from your statistical profile alone, then runs the same signal engine on both streams. The residual between real and ghost measurements is what your profile cannot explain.
Your actual keystroke stream from today's writing session
Rust signal engine computes more than 100 measurements
Real signal values
What your profile can reconstruct vs. what requires the actual person
Your accumulated statistical profile (ex-Gaussian params, digraph latencies, burst structure, revision rates)
Avatar engine generates synthetic keystrokes, then the same Rust signal engine computes more than 100 measurements
Ghost signal values
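The comparison at the center of this flow can be sketched as a per-signal difference between the two sets of measurements. This Python sketch assumes the residual is a plain real-minus-ghost difference and that a signal missing on either side is skipped; the page does not specify the exact residual definition, and the signal names are hypothetical.

```python
def residuals(real, ghost):
    """Per-signal residual (real minus ghost) for signals measured on both streams."""
    out = {}
    for name in sorted(real.keys() & ghost.keys()):
        r, g = real[name], ghost[name]
        if r is None or g is None:
            continue  # a signal missing on either side has no residual
        out[name] = r - g
    return out


real_signals = {"permutation_entropy": 2.5, "iki_mean": 180.0, "hold_mean": None}
ghost_signals = {"permutation_entropy": 2.25, "iki_mean": 176.0, "hold_mean": 90.0}
example = residuals(real_signals, ghost_signals)
```

Because both streams pass through the same engine, a residual reflects a difference between the real and synthetic keystrokes rather than a difference in how they were measured.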
Five adversary variants
Each variant adds one modeling improvement, isolating which dimension of behavior carries the most signal. Comparing reconstruction residuals across variants reveals what a statistical profile can reproduce and what it cannot.
Order-2 Markov text generation with independent ex-Gaussian IKI sampling and fixed hold times. The simplest adversary: text statistics plus independent timing.
Adds serial dependence to inter-keystroke intervals. Tests whether IKI autocorrelation structure carries information beyond the marginal distribution.
Adds hold-flight time coupling via rank correlation. Tests whether motor execution coordination (the relationship between how long you press and how long you travel) is informative.
Replaces order-2 Markov with prediction by partial matching. Tests whether better text modeling reduces the residual, or whether the text channel is already well-captured.
PPM text, AR(1) correlated IKI, Gaussian copula motor coupling. The strongest reconstruction within the measurement space. What remains in the residual after this variant is what the profile genuinely cannot explain.
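To illustrate the difference between independent and serially dependent timing, here is a hedged Python sketch. It draws independent ex-Gaussian intervals, then reorders the same values so their ranks follow a Gaussian AR(1) series, which preserves the marginal distribution exactly while adding lag-1 dependence. The rank-reordering approach, the parameter values, and all function names are assumptions chosen for illustration; the avatar engine's actual method is not described on this page.

```python
import random


def sample_exgaussian(rng, mu, sigma, tau, n):
    """Independent ex-Gaussian draws: Normal(mu, sigma) plus Exponential(mean tau)."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def ar1_latent(rng, phi, n):
    """Stationary unit-variance Gaussian AR(1) series with lag-1 correlation phi."""
    z = [rng.gauss(0.0, 1.0)]
    scale = (1.0 - phi * phi) ** 0.5
    for _ in range(n - 1):
        z.append(phi * z[-1] + scale * rng.gauss(0.0, 1.0))
    return z


def impose_serial_order(values, latent):
    """Reorder `values` so that their ranks match the ranks of `latent`."""
    ranked = sorted(values)
    order = sorted(range(len(latent)), key=latent.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = ranked[rank]
    return out


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


rng = random.Random(7)
independent = sample_exgaussian(rng, mu=120.0, sigma=20.0, tau=60.0, n=2000)
correlated = impose_serial_order(independent, ar1_latent(rng, phi=0.6, n=2000))
```

The two series contain exactly the same interval values, so any signal that differs between them is responding to ordering alone. That is the kind of isolation the variant ladder is built to provide.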
Every ghost session stores the PRNG seed, the exact profile snapshot, the corpus hash, and the topic string. Given these inputs, the avatar engine produces bit-identical output across rebuilds. The seed is a 64-bit integer; SplitMix64 expands it into the generator state, and xoshiro128+ advances that state. Reproducibility is verified on production data and enforced in CI.
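For readers unfamiliar with these generators, the following Python sketch shows SplitMix64 and xoshiro128+ using their published constants. How the 64-bit seed is expanded into the four 32-bit state words is an assumption here (two SplitMix64 outputs, each split into low and high halves); the engine's actual expansion may differ, so this sketch is not expected to reproduce its output.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def splitmix64(state):
    """One SplitMix64 step. Returns (new_state, output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def rotl32(x, k):
    """Rotate a 32-bit value left by k bits."""
    return ((x << k) | (x >> (32 - k))) & MASK32


class Xoshiro128Plus:
    def __init__(self, seed):
        # Assumption: two SplitMix64 outputs fill the four 32-bit state words.
        sm = seed & MASK64
        words = []
        for _ in range(2):
            sm, out = splitmix64(sm)
            words += [out & MASK32, out >> 32]
        self.s = words

    def next_u32(self):
        """Return the next 32-bit output and advance the state."""
        s = self.s
        result = (s[0] + s[3]) & MASK32
        t = (s[1] << 9) & MASK32
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl32(s[3], 11)
        return result
```

Everything is integer arithmetic with explicit masking, so the sequence depends only on the seed. That property is what makes storing the seed sufficient to regenerate a ghost session.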
How the system got here
A measurement instrument earns its rigor through correction, not assertion. This timeline shows the methodological incidents that shaped Alice's signal engine. Each entry documents what was wrong, how it was discovered, and what changed. The full provenance log is maintained in the codebase.
Signal pipeline boundary failure + extended ghost residuals
Wrapper functions silently stripped 38 new signal columns; ghost residuals expanded from 13 to 41 dimensions
Two TypeScript wrapper functions in libSignalsNative.ts explicitly constructed return objects with only the original 14 fields, silently discarding everything the Rust engine produced after the Phase 1-5 expansion. Three edits to one file fixed it. All new signal columns now populate through the live pipeline. Backfill completed: 31 dynamical + 31 motor rows re-inserted with all new columns populated. Additionally, reconstruction residual comparison expanded from 13 to 41 behavioral dimensions stored in extended_residuals_json (JSONB), organized into seven theoretical families. Calibration guard added to the last unguarded aggregate pipeline function.
Expanding residuals to 41 dimensions revealed that ghost reproduction fidelity varies by over 100x across theoretical families. MF-DFA spectrum width is the most ghost-resistant dimension by 4x, because the reconstruction adversary generates from a single stochastic process and cannot reproduce multifractal structure. Ordinal statistics are nearly perfectly reproducible. This decomposition was invisible at 13 dimensions.
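The boundary failure in this entry follows a general pattern that is easy to reproduce. The Python sketch below is an illustration only: the real wrappers are TypeScript, and the field names and helper functions here are hypothetical. It contrasts a wrapper that rebuilds its result from a fixed field list with one that forwards everything, plus a check that names what was lost.

```python
def wrap_allowlist(engine_output, fields):
    """Bug pattern: rebuild the result from a fixed field list, dropping new keys."""
    return {name: engine_output.get(name) for name in fields}


def wrap_passthrough(engine_output):
    """Fix pattern: forward every key the engine produced."""
    return dict(engine_output)


def dropped_fields(engine_output, wrapped):
    """Regression check: fields the engine produced that the wrapper lost."""
    return sorted(set(engine_output) - set(wrapped))


# Hypothetical field names, for illustration only.
original_fields = ["permutation_entropy", "transfer_entropy"]
engine_output = {
    "permutation_entropy": 2.5,
    "transfer_entropy": 0.125,
    "mfdfa_spectrum_width": 0.75,  # added after the wrapper was written
}
lost = dropped_fields(engine_output, wrap_allowlist(engine_output, original_fields))
```

A check like dropped_fields, run against a fixture session, would turn a silent omission into a failing test.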
Embedding sovereignty
Replaced vendor API with self-hosted archivable weights
VoyageAI's voyage-3-lite failed three of four constitutional requirements: not archivable (no SHA-256 identifier), not deterministic across vendor changes, no lifecycle control. Migrated to Qwen3-Embedding-0.6B via self-hosted TEI, using a CPU-only build for bit-reproducibility, because Metal FP32 on Apple Silicon is not bit-reproducible due to batch-dependent reduction patterns. All 10 non-calibration sessions re-embedded; semantic baselines regenerated from scratch.
The semantic channel now has the same reproducibility guarantee as the behavioral channel. Embeddings computed today can be regenerated from archived weights indefinitely, independent of any vendor. Baselines remain valid against future sessions without methodological drift.
Construct validity
Stripped unvalidated interpretive labels from signal surfaces
Audit identified two classes of failure: interpretive labels presented as instrument readings (e.g., "rigid"/"malleable" for attractor force without validation), and statistical notation without adequate sample-size context (sigma notation without baseline entry counts). Wave 1 removed four sets of unvalidated labels. Wave 2 added honest framing for low-n signals: dots-only for n<5, explicit sample sizes on deviation callouts, minimum run length 4 for trend detection.
Every readout the instrument now surfaces is either a raw measurement or an honestly-gated statistical claim. The observatory distinguishes between what it knows and what it is still learning, visible in the interface rather than hidden behind premature interpretation.
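The Wave 2 gating rules can be sketched as two small checks in Python. Whether "run length 4" counts points or steps is not stated in this entry, so this sketch assumes four consecutive points that strictly rise or strictly fall; the function names and return labels are hypothetical.

```python
def display_mode(n_baseline):
    """Dots-only below five baseline entries; otherwise a deviation shown with its n."""
    return "dots" if n_baseline < 5 else "deviation"


def longest_monotone_run(values):
    """Length in points of the longest strictly rising or strictly falling run."""
    if not values:
        return 0
    best = up = down = 1
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        best = max(best, up, down)
    return best


def has_trend(values, min_run=4):
    """A trend is reported only when a monotone run reaches min_run points."""
    return longest_monotone_run(values) >= min_run
```

For example, has_trend([1, 2, 3, 2]) is False because the longest run covers three points, while has_trend([5, 4, 3, 2]) is True.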
Statistical rigor for discovery badges
Dynamic critical-r gate replaced hardcoded thresholds
Coupling correlations were displayed as "strong" or "moderate" based on hardcoded thresholds, without significance testing. A correlation of r=0.55 from n=10 was displayed as "strong" despite p>0.05 (the two-tailed critical r at n=10 is about 0.63). Replaced with a dynamic max(criticalR(n), 0.3) gate using a Cornish-Fisher approximation. Two-state badge system: "established" requires both significance and stability; "provisional" marks significant but unvalidated couplings.
Coupling discoveries now scale honestly with data depth. At n=10, almost nothing qualifies. At n=50, genuine structure emerges. The instrument's confidence grows with the dataset rather than with arbitrary thresholds, which means early data accumulation cannot produce false discoveries that later need to be retracted.
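A hedged Python sketch of such a gate follows. It assumes the Cornish-Fisher approximation is applied to the Student t quantile and that the critical r is derived from it through the standard relation r = t / sqrt(t^2 + df) with df = n - 2; the two-tailed alpha of 0.05, the three-term expansion, and the function names are illustrative assumptions, not the instrument's confirmed implementation.

```python
from statistics import NormalDist


def t_quantile(p, df):
    """Cornish-Fisher expansion of the Student t quantile around the normal quantile."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant at two-tailed alpha for n paired observations."""
    if n <= 2:
        raise ValueError("need more than two observations")
    df = n - 2
    t = t_quantile(1 - alpha / 2, df)
    return t / (t * t + df) ** 0.5


def gate(n, floor=0.3):
    """Dynamic threshold: the critical r, but never below a fixed effect-size floor."""
    return max(critical_r(n), floor)


def passes(r, n):
    """True when |r| clears the gate for a sample of size n."""
    return abs(r) >= gate(n)
```

Under these assumptions, r=0.55 at n=10 does not pass, while at n=50 the critical r falls below the 0.3 floor and the floor becomes the binding constraint.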
Reconstruction residual reproducibility
Residuals now store exact inputs for regeneration
Every ghost session stores the PRNG seed, profile snapshot (3.1KB measured), corpus SHA-256 hash, and topic string. Given these inputs, regeneration produces bit-identical dynamical and motor signals. Verified 10/10 on production data. Semantic residuals remain excluded (depend on external embedding model).
Every residual is now an auditable claim. Any future version of the instrument can verify whether a past ghost comparison would produce the same result, making reconstruction validity a permanent property of the dataset rather than a snapshot assertion.
CI reproducibility enforcement
Two-clean-build reproducibility check on every PR touching Rust
Automated CI workflow builds the signal engine twice from clean state and diffs the output JSON on a fixture session. Any bit-level divergence fails the PR. Golden signal values documented for the 100-keystroke fixture.
The signal engine cannot silently drift. Any code change that alters a measurement, whether intentionally or through compiler optimization differences, is caught before it enters the codebase. This is the enforcement mechanism that makes the reproducibility guarantee operational rather than aspirational.
Floating-point summation order
Naive summation sensitive to compiler auto-vectorization; replaced with Neumaier compensated summation
Discovered that naive floating-point sums across 17 accumulation sites were sensitive to LLVM auto-vectorization and loop unrolling order. Replaced all with Neumaier compensated summation (error bound O(epsilon) to first order, independent of n). Also fixed HashMap iteration nondeterminism in permutation entropy (converted to BTreeMap) and pinned the Rust toolchain to 1.95.0 + LLVM 22.1.2 on aarch64-apple-darwin.
The same keystroke stream now produces the same signal values on any build, on any machine, indefinitely. This is the numerical foundation that every downstream guarantee depends on: reproducible residuals, stable baselines, and meaningful longitudinal comparisons all require that the measurement itself does not move.
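Neumaier summation is short enough to show in full. This Python sketch illustrates the algorithm itself rather than the engine's Rust code: it carries a running compensation term that captures the low-order bits lost at each addition. The naive loop is written out explicitly for comparison, because Python's built-in sum is itself compensated for floats in recent versions.

```python
def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Neumaier compensated summation: track the rounding error of each addition."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low-order bits of x were lost
        else:
            comp += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + comp


values = [1.0, 1e100, 1.0, -1e100]
```

On this input naive_sum(values) returns 0.0 and neumaier_sum(values) returns 2.0, which is the exact answer. The naive loop loses both small terms when they are added to 1e100.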
Hold-flight vector misalignment
27/27 sessions affected, transfer entropy values shifted 130%+
Hold times and flight times were filtered independently, causing misalignment when rollover typing produced valid holds with invalid flights. 6,589 total misaligned events across all sessions. Transfer entropy values shifted by over 130% on average, with 5 sign flips. Fixed by paired filtering: both hold and flight are kept or dropped together for the same keystroke event. Original data preserved as a snapshot table.
Transfer entropy now measures what it claims: directional information flow between motor execution (hold) and cognitive planning (flight). Before the fix, the coupling was computed between misaligned series, meaning the causal direction estimates were unreliable. The entire hold-flight analysis framework depends on this alignment being correct.
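The misalignment is easiest to see on a toy example. In this Python sketch the validity bounds and function names are hypothetical, and each keystroke event is assumed to carry one hold and one flight value at the same index.

```python
def valid_hold(h, lo=0.0, hi=2000.0):
    """Hypothetical plausibility bounds for a hold time in ms."""
    return lo < h <= hi


def valid_flight(f, lo=0.0, hi=5000.0):
    """Hypothetical plausibility bounds for a flight time in ms."""
    return lo < f <= hi


def filter_independently(holds, flights):
    """Bug pattern: each series is filtered alone, so indices stop lining up."""
    return (
        [h for h in holds if valid_hold(h)],
        [f for f in flights if valid_flight(f)],
    )


def filter_paired(holds, flights):
    """Fix pattern: keep or drop hold and flight together for the same event."""
    kept = [
        (h, f)
        for h, f in zip(holds, flights)
        if valid_hold(h) and valid_flight(f)
    ]
    return [h for h, _ in kept], [f for _, f in kept]


holds = [80.0, 95.0, 70.0, 88.0]
flights = [60.0, -15.0, 45.0, 52.0]  # second event: rollover, negative flight
```

With independent filtering the hold series keeps four values and the flight series three, so the hold at index 1 ends up next to a flight from a different keystroke. Paired filtering drops the whole second event and both series stay aligned.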
Eight incidents across four days. Each one made the instrument more honest. The full provenance log, including four additional incidents (INC-003, INC-004, INC-007, INC-011) and four deferred design decisions, is maintained in the codebase and available for review.
Three systems, one instrument. The signal engine measures. The ghost tests whether those measurements are irreducible. The semantic pipeline captures what was said, not just how it was typed. Together, they produce a longitudinal record of cognitive process that no single system could provide alone.