The real trained model runs right here in your browser: the RSSM dynamics, count/mass heads, and controller actor, ported to JavaScript and verified to match PyTorch. Scope: up to 4 objects. From a single frame the model counts reliably only up to about 4; beyond that it needs temporal context.
You act, the model believes. Take actions and watch the model's belief track reality. It runs open-loop (it sees only your action, not the new image), so its belief slowly drifts, especially for mass. Hit Peek to let it see the current image and snap its belief back to reality.
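To make the open-loop drift concrete, here is a minimal toy sketch. It is not the demo's RSSM: the function name `simulate_open_loop`, the size range, and the "believe the mean size" rule are all assumptions made for illustration. The toy model updates its mass belief from the action alone, so the error accumulates until a Peek step overwrites the belief with the true value.

```python
import random

def simulate_open_loop(actions, peek_steps=(), seed=0):
    """Return per-step |belief - truth| for total mass in a toy world."""
    rng = random.Random(seed)
    sizes = []          # true sizes of the shapes currently in the world
    belief_mass = 0.0   # what the toy "model" thinks the total mass is
    errors = []
    for t, action in enumerate(actions):
        if action == "add":
            sizes.append(rng.uniform(0.5, 1.5))  # assumed size range (hypothetical)
            belief_mass += 1.0                   # open-loop: only the mean size is known
        elif action == "remove" and sizes:
            sizes.pop()
            belief_mass = max(belief_mass - 1.0, 0.0)
        true_mass = sum(sizes)
        if t in peek_steps:
            belief_mass = true_mass  # Peek: observe the image, belief snaps back
        errors.append(abs(belief_mass - true_mass))
    return errors
```

Calling `simulate_open_loop(["add"] * 5, peek_steps={4})` returns five error values, the last of which is zero because the Peek at step 4 resets the belief. In this toy the count never drifts, since each action changes it by exactly one; only the mass belief does.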
Choose a target count; the controller drives the world to it. The actor, trained only inside the model's imagination, picks add/remove/no-op, reaches your target, and holds it. It controls count, not mass (the size of each added shape is random).
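The behavior described for the count controller can be sketched with a hand-written rule. This is only a stand-in: the demo's actor is a learned network, and the names `choose_action` and `rollout`, as well as the clamp to 4 objects, are assumptions for illustration.

```python
def choose_action(count, target):
    """Pick the action that moves the count toward the target."""
    if count < target:
        return "add"
    if count > target:
        return "remove"
    return "noop"  # at the target: hold

def rollout(start, target, steps=8, max_objects=4):
    """Apply the rule for a fixed number of steps; return (action, count) pairs."""
    count = start
    trace = []
    for _ in range(steps):
        action = choose_action(count, target)
        if action == "add":
            count = min(count + 1, max_objects)  # assumed cap, per the 4-object scope
        elif action == "remove":
            count = max(count - 1, 0)
        trace.append((action, count))
    return trace
```

For example, `rollout(0, 3, steps=5)` yields three `add` steps followed by two `noop` steps, ending with a count of 3: reach the target, then hold it.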
Choose a target TOTAL MASS; the mass controller gets close. Unlike count, mass is only approximately controllable — the size of each added shape is random — so expect it to land within ~1–2 of the target, not exactly on it.
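A toy sketch shows why mass can only be hit approximately when added sizes are random. Everything here is hypothetical: the size range of 1 to 3, the constant `MEAN_SIZE`, and the threshold rule are illustrative assumptions, not the demo's trained mass controller.

```python
import random

MEAN_SIZE = 2.0  # assumed mean of the random shape size (hypothetical)

def mass_action(mass, target):
    """Act only when an average-sized shape would reduce the error."""
    if target - mass > MEAN_SIZE / 2:
        return "add"
    if mass - target > MEAN_SIZE / 2:
        return "remove"
    return "noop"

def run_mass_controller(target, steps=20, seed=0):
    """Run the rule from an empty world and return the final total mass."""
    rng = random.Random(seed)
    sizes = []  # sizes of the shapes currently in the world
    for _ in range(steps):
        action = mass_action(sum(sizes), target)
        if action == "add":
            sizes.append(rng.uniform(1.0, 3.0))  # size is random, not chosen
        elif action == "remove" and sizes:
            sizes.pop()  # remove the most recently added shape
    return sum(sizes)
```

Because the controller chooses when to add but not how big the new shape is, the final mass in this toy ends up less than 2 away from a target such as `6.0`, but rarely exactly on it.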