Walking through your actual live data from this morning. Every number, every step.
This is your Mac Mini right now: Monday 10:39 AM. CPU 51%, Memory 72%, Disk 54%, Swap 74%, Load 4.3. You're working. The system has been up for 35 hours and has logged 19,925 observations in total.
Here's exactly what happens every 30 seconds.
The system reads your Mac's vital signs using stdlib calls only — subprocess to top, vm_stat, sysctl, df. Zero external dependencies. On an ESP32 it would be sensor reads.
| Metric | Value | Source |
|---|---|---|
| CPU | 50.51% | top -l 1 (parsed) |
| Memory | 72.27% | vm_stat + sysctl hw.memsize |
| Disk | 54.0% | df / (parsed) |
| Swap | 73.59% | sysctl vm.swapusage |
| Load | 4.25 | sysctl vm.loadavg |
| Network | 1 (up) | socket connect test |
| Idle Time | 2.5s | IOKit HIDIdleTime |
| Activity | 0.24/min | key/mouse events per minute |
| Hour | 10 | datetime.now().hour |
| Weekday | 0 (Mon) | datetime.now().weekday() |
This is the raw truth. Just numbers. No interpretation yet.
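To make the collection step concrete, here is a minimal sketch of the parsing side, run against a captured `vm_stat`-style sample (the page counts and the active+wired accounting are illustrative assumptions, not KIRI's exact code):

```python
import re

# Captured `vm_stat`-style output (illustrative numbers, 16 KiB pages).
VM_STAT_SAMPLE = """\
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                               12345.
Pages active:                            400000.
Pages inactive:                          300000.
Pages speculative:                        10000.
Pages wired down:                        150000.
"""

def parse_vm_stat(text):
    """Parse lines like 'Pages active: 400000.' into a dict of page counts."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"(Pages [a-z ]+):\s+(\d+)\.", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def memory_percent(stats, page_size, total_bytes):
    """Used % = (active + wired) pages over hw.memsize (an assumed formula)."""
    used = (stats["Pages active"] + stats["Pages wired down"]) * page_size
    return round(100.0 * used / total_bytes, 2)
```

In a live run the text would come from `subprocess.run(["vm_stat"], capture_output=True, text=True)` and `total_bytes` from `sysctl -n hw.memsize`.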
Each number gets placed into a bucket. CPU 50.51% falls in the 50-60% range = bucket 5. Memory 72.27% = bucket 7. This is language.py — the same idea as a tokenizer in GPT, but for numbers instead of words.
Why buckets? The transformer predicts discrete tokens, not continuous numbers. "CPU went from bucket 3 to bucket 5" is a learnable pattern. "CPU went from 31.247% to 50.512%" is noise.
| Metric (raw value) | Rule | Bucket | Token |
|---|---|---|---|
| CPU 50.51% | ÷10, floor | 50-59% → 5 | C5 |
| Memory 72.27% | ÷10, floor | 70-79% → 7 | M7 |
| Disk 54.0% | ÷10, floor | 50-59% → 5 | D5 |
| Swap 73.59% | ÷20, floor | 60-79% → 3 | S3 |
| Load 4.25 | ÷4, floor | 4.0-7.99 → 1 | L1 |
| Network up | boolean | up → 1 | N1 |
Swap has 5 buckets (0-100%, each 20%). Load has 5 buckets (0-20, each 4 units).
Pulse sequence: C5 M7 D5 S3 L1 N1
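The bucket rules above fit in a few lines. A simplified stand-in for language.py (the clamping of out-of-range values into the top bucket is an assumption):

```python
def tokenize_pulse(cpu, mem, disk, swap, load, net_up):
    """Map raw metrics to discrete tokens per the bucket rules above.
    Clamping at the top bucket is an assumed edge-case policy."""
    return [
        f"C{min(int(cpu // 10), 9)}",   # CPU/Memory/Disk: 10 buckets of 10%
        f"M{min(int(mem // 10), 9)}",
        f"D{min(int(disk // 10), 9)}",
        f"S{min(int(swap // 20), 4)}",  # Swap: 5 buckets of 20%
        f"L{min(int(load // 4), 4)}",   # Load: 5 buckets of 4 units
        f"N{1 if net_up else 0}",       # Network: boolean
    ]

print(" ".join(tokenize_pulse(50.51, 72.27, 54.0, 73.59, 4.25, True)))
# → C5 M7 D5 S3 L1 N1
```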
| Metric (raw value) | Rule | Token |
|---|---|---|
| Idle 2.5s | bucket 0-7 by duration | I0 (very short idle = active) |
| Activity 0.24/min | bucket 0-5 by rate | A0 (low activity) |
| Hour 10 | ÷3, floor | H3 (morning) |
| Weekday 0 | direct | W0 (Monday) |
Rhythm sequence: I0 A0 H3 W0
Your entire Mac state is now 10 tokens. This is the vocabulary the model speaks.
The Pulse atom is a tiny GPT. It was trained on 15,000+ observations of your Mac. It learned patterns like: "after C2 M7, D3 is the most likely next token."
Now it processes your current sequence token by token:
| Step | Input | Model Predicts | Actually Got | Surprise |
|---|---|---|---|---|
| 1 | BOS | C2 (38%), C3 (25%) | C5 | 4.386 — has rarely seen C5 |
| 2 | C5 | M7 (74%), M6 (10%) | M7 | 0.296 — expected this |
| 3 | M7 | D3 (45%), D4 (30%) | D5 | 2.936 — disk higher than usual |
| 4 | D5 | S1 (40%), S2 (30%) | S3 | 4.878 — swap much higher than expected |
| 5 | S3 | L0 (50%), L1 (26%) | L1 | 1.366 — slightly surprising |
| 6 | L1 | N1 (99%) | N1 | 0.017 — network always up |
How "surprise" works: The model outputs a probability for every possible next token. If it predicted M7 with 85% probability and M7 actually appeared, surprise = -log(0.85) = 0.16 (low). If it predicted C2 but got C5 (maybe 1% probability), surprise = -log(0.01) = 4.6 (high).
That's the entire mechanism. Surprise = -log(probability the model assigned to what actually happened). High surprise = the model didn't expect this = anomaly.
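That mechanism fits in one function. A sketch, where the small probability floor is an assumption to avoid log(0) for tokens the model has never seen:

```python
import math

def surprise(probs, actual, floor=1e-6):
    """Surprise = -log of the probability the model assigned to the token
    that actually appeared. The floor (an assumption) avoids log(0)."""
    return -math.log(max(probs.get(actual, 0.0), floor))

print(round(surprise({"M7": 0.85, "M6": 0.10}, "M7"), 2))  # → 0.16
print(round(surprise({"C2": 0.38, "C3": 0.25}, "C5"), 2))  # C5 unseen → floor → 13.82
```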
Same process, different vocabulary:
| Step | Input | Predicts | Got | Surprise |
|---|---|---|---|---|
| 1 | BOS | I0 (17%) | I0 | 1.747 |
| 2 | I0 | A0 (99%) | A0 | 0.011 |
| 3 | A0 | H5 (25%), H6 (25%) | H3 | 1.884 — usually sees evening |
| 4 | H3 | W5 (60%), W6 (20%) | W0 | 4.229 — never seen Monday! |
Notice: W0 (Monday) scores 4.229 — the highest surprise. The model was trained mostly on Saturday and Sunday data. It's never seen a Monday before. That's why W0 is surprising. After a week of weekday data, this score will drop.
Pulse score = average of all per-token surprises:
(4.386 + 0.296 + 2.936 + 4.878 + 1.366 + 0.017) ÷ 6 = 2.313
Rhythm score = (1.747 + 0.011 + 1.884 + 4.229) ÷ 4 = 1.968
The threshold is 2.0. Pulse (2.31) is above it → "elevated." Rhythm (1.97) is just below → "normal."
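The scoring arithmetic, spelled out (the threshold and rounding are taken from the text; the labels are a sketch):

```python
def domain_score(surprises):
    """Domain score = mean of the per-token surprises (rounded for display)."""
    return round(sum(surprises) / len(surprises), 3)

THRESHOLD = 2.0  # fixed threshold from the text

pulse = domain_score([4.386, 0.296, 2.936, 4.878, 1.366, 0.017])
rhythm = domain_score([1.747, 0.011, 1.884, 4.229])

for name, score in (("pulse", pulse), ("rhythm", rhythm)):
    print(name, score, "elevated" if score > THRESHOLD else "normal")
# pulse 2.313 elevated
# rhythm 1.968 normal
```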
The molecule sees EVERYTHING the atoms see, plus their scores, plus time context. Its input is a unified sequence of all domains:
BOS p.C5 p.M7 p.D5 p.S3 p.L1 p.N1 r.I0 r.A0 r.H3 r.W0 PS2 RS1 DS0 H3 W0
Notice the prefixes: p.C5 means "Pulse CPU bucket 5." r.I0 means "Rhythm Idle bucket 0." PS2 means "Pulse Score bucket 2." This prevents collisions — Pulse's C and Drift's C are different tokens.
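Assembling that unified sequence can be sketched as follows. Floor-bucketing the scores is an assumption that happens to reproduce PS2 / RS1 / DS0, and the drift score of 0.4 is a hypothetical value:

```python
def molecule_input(pulse_tokens, rhythm_tokens, scores, hour, weekday):
    """Build the molecule's unified sequence: prefixed atom tokens, bucketed
    domain scores, and time context. Score bucketing by integer floor is an
    assumed detail, not confirmed from KIRI's source."""
    return (["BOS"]
            + [f"p.{t}" for t in pulse_tokens]
            + [f"r.{t}" for t in rhythm_tokens]
            + [f"PS{int(scores['pulse'])}",
               f"RS{int(scores['rhythm'])}",
               f"DS{int(scores['drift'])}"]
            + [f"H{hour // 3}", f"W{weekday}"])

seq = molecule_input(["C5", "M7", "D5", "S3", "L1", "N1"],
                     ["I0", "A0", "H3", "W0"],
                     {"pulse": 2.313, "rhythm": 1.968, "drift": 0.4},
                     hour=10, weekday=0)
print(" ".join(seq))
# → BOS p.C5 p.M7 p.D5 p.S3 p.L1 p.N1 r.I0 r.A0 r.H3 r.W0 PS2 RS1 DS0 H3 W0
```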
The molecule has 4 expert FFN networks. A gating network looks at each token and decides which 2 experts (out of 4) should process it. This is Mixture of Experts — different experts specialize in different patterns.
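The routing decision itself is small. Here is a sketch of top-2 gating for a single token: in the real model the gate logits come from a learned linear layer over the token's embedding; the values below are purely illustrative:

```python
import numpy as np

def top2_route(gate_logits):
    """Top-2 MoE routing: keep the 2 highest-scoring experts and softmax
    their logits so the pair's mixing weights sum to 1."""
    idx = np.argsort(gate_logits)[::-1][:2]        # indices of the top-2 experts
    w = np.exp(gate_logits[idx] - gate_logits[idx].max())
    return idx, w / w.sum()

# Illustrative gate logits for one token over 4 experts:
idx, w = top2_route(np.array([0.1, 2.0, -0.5, 1.2]))
print(idx)  # → [1 3]
```

The token's output is then the weighted sum of those two experts' FFN outputs; the other two experts do no work for this token, which is why MoE adds capacity without adding much compute.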
The forward pass:
| Layer | What Happens |
|---|---|
| Embed | Each input token → 48-dimensional vector. Position is also encoded. |
| Layer 1 | Attention: tokens attend to each other. p.S3 (high swap) attends to p.M7 (high mem) — they're related. Gate routes to experts 1,3. |
| Layer 2 | Deeper patterns: PS2 (elevated pulse score) attends to p.C5 + p.S3. Gate routes to experts 2,4. |
| Layer 3 | Decision layer: combines all evidence. W0 (Monday) + H3 (morning) + elevated scores → routes to experts 1,2. |
| Output | Predict next token: A:alert (100%), A:ok (0%), A:suppress (0%) |
After predicting the action, the model continues generating: EXP → explanation tokens:
A:alert EXP elevated above at high anomaly END
The explanation vocabulary is ~50 diagnostic words. The model picks words autoregressively — same as GPT generating text, but constrained to domain-specific vocabulary.
"elevated above at high anomaly"
This is the molecule's best attempt at explaining why it chose "alert." It's saying: scores are elevated, something is above normal, high anomaly detected.
The explanation quality depends on training data. Right now the molecule trained mostly on synthetic templates. As it retrains on more real observations (it's at 4,300+ sequences and climbing), the explanations will become more specific — eventually something like "cpu high swap elevated during morning" instead of generic "elevated above at high."
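The constrained decoding described above is ordinary greedy generation over a tiny vocabulary. A sketch, where `next_token_probs` is a hypothetical stand-in for the molecule's forward pass:

```python
def generate_explanation(next_token_probs, prefix, max_len=8):
    """Greedy constrained decoding: keep taking the most likely token from
    the ~50-word diagnostic vocabulary until END or a length cap."""
    out = []
    while len(out) < max_len:
        probs = next_token_probs(prefix + out)  # dict: token -> probability
        tok = max(probs, key=probs.get)
        if tok == "END":
            break
        out.append(tok)
    return out
```

The real model may sample rather than take the argmax, but the constraint is the same: every step's distribution only covers the explanation vocabulary.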
Based on the molecule's decision:
| Action | What Happens |
|---|---|
| ok | Log the observation. Do nothing. Everything is normal. |
| alert ← this time | Log + flag as anomaly. In daemon mode, could send Telegram/email. |
| suppress | Log but don't alert. The model recognizes this pattern as "known unusual" — like high CPU during builds. |
| retrain | The model thinks it's seeing patterns outside its training. Triggers a retrain cycle. |
This observation was logged and flagged as an anomaly; in daemon mode the alert would go out. 30 seconds from now, the whole cycle repeats.
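The action table above amounts to a small dispatcher. A sketch; the handler names and observation shape are hypothetical, not KIRI's actual API:

```python
def dispatch(action, observation, log, notify, retrain):
    """Route the molecule's action token to a handler (names are hypothetical)."""
    log(observation)                   # every cycle logs the observation
    if action == "alert":
        observation["anomaly"] = True
        notify(observation)            # daemon mode: Telegram/email hook
    elif action == "suppress":
        observation["anomaly"] = True  # known-unusual: flagged but not sent
    elif action == "retrain":
        retrain()                      # pattern outside training distribution
```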
| Component | Params | Sees | Learns | Outputs |
|---|---|---|---|---|
| Pulse Atom | 27,840 | CPU, MEM, Disk, Swap, Load, Net | "After M7, D3 is normal" | Per-token surprise scores |
| Rhythm Atom | 27,008 | Idle time, Activity, Hour, Weekday | "At H5 W5, I3 is normal" | Per-token surprise scores |
| Drift Atom | ~27K | Tasks added/completed/switched | "3 added 2 completed is normal" | Per-token surprise scores |
| Molecule | 157,824 | All atom tokens + scores + time | "Elevated pulse + night = alert, elevated pulse + weekday = suppress" | Action + explanation |
KIRI — an Eryx Labs project