AutomataBench v1
SRSTC: Sparse Reversible Space-Time Completion
A benchmark for reconstructing reversible cellular automaton initial states from sparse space-time observations.
Grid: 16x16 / 32x32 / 48x48
Task: Initial state recovery
Verifier: Trace simulation
Benchmark: AutomataBench
SRSTC Leaderboard
Placeholder leaderboard for layout preview. Scores are illustrative and not official AutomataBench results.
Domain: Reversible cellular automata
Rank  Model                  Score
1     GPT 5.3 Codex (High)   67.2%
2     GPT 5.4 (High)         66.8%
3     Gemini 3.1 Pro (High)  65.9%
4     GPT 5.2 (High)         64.1%
5     GPT 5.2 Codex (High)   63.8%
6     Gemini 3 Flash (High)  62.4%
7     GPT 5 Codex (High)     61.9%
8     Gemini 3 Pro (High)    61.3%
9     GPT 5 (High)           60.6%
10    GPT 5.1 Codex (High)   59.8%
11    o3 (High)              57.4%
12    GLM 4.7                55.2%
13    Opus 4.6 (High)        54.7%
14    GLM 5 (Thinking)       54.3%
15    Opus 4.6 (Max)         53.1%
16    Grok 4.1 (Fast)        50.4%
17    Grok 4                 49.8%
18    Opus 4.5 (High)        48.6%
19    Sonnet 4.6 (High)      40.2%
Sample task
Recover a hidden initial state
Given a reversible 2x2 Margolus block rule, a toroidal grid, a time horizon, and sparse observed cells, return the initial binary grid whose simulated trace matches every clue.
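The trace check described here can be sketched in a few lines of Python. This is a minimal illustration, not the official AutomataBench verifier. It rests on several assumptions the page does not confirm: the rule is given as a permutation of the 16 possible 2x2 block states, blocks start at offset (t mod 2, t mod 2), block cells are encoded top-left first as the most significant bit, x is the column and y the row, and t = 0 is the initial state. The names margolus_step, invert, matches_clues, and ROT180 are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = List[List[int]]
Clue = Tuple[int, int, int, int]  # (t, x, y, value); x = column, y = row (assumed)


def margolus_step(grid: Grid, perm: Sequence[int], t: int) -> Grid:
    """Apply one block-rule step at time t on a toroidal grid (assumed conventions)."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid sides must be even to tile with 2x2 blocks")
    off = t % 2  # assumed: partition alternates between offsets (0,0) and (1,1)
    out = [row[:] for row in grid]
    for by in range(off, h + off, 2):
        for bx in range(off, w + off, 2):
            # Block cells in order TL, TR, BL, BR, wrapping around the torus.
            cells = [((by + dy) % h, (bx + dx) % w) for dy in (0, 1) for dx in (0, 1)]
            code = 0
            for y, x in cells:
                code = (code << 1) | grid[y][x]
            new = perm[code]
            for i, (y, x) in enumerate(cells):
                out[y][x] = (new >> (3 - i)) & 1
    return out


def invert(perm: Sequence[int]) -> List[int]:
    """Inverse permutation; stepping with it at the same t undoes a step."""
    inv = [0] * 16
    for src, dst in enumerate(perm):
        inv[dst] = src
    return inv


def matches_clues(initial: Grid, perm: Sequence[int], horizon: int,
                  clues: Sequence[Clue]) -> bool:
    """Simulate t = 0..horizon and check every clue against the trace."""
    if sorted(perm) != list(range(16)):
        raise ValueError("rule must be a permutation of the 16 block states")
    if any(not 0 <= t <= horizon for t, _, _, _ in clues):
        raise ValueError("clue time outside the horizon")
    grid = initial
    for t in range(horizon + 1):
        for ct, x, y, value in clues:
            if ct == t and grid[y][x] != value:
                return False
        if t < horizon:
            grid = margolus_step(grid, perm, t)
    return True


# Illustrative rule only: rotate every block by 180 degrees (bit reversal).
ROT180 = [int(format(c, "04b")[::-1], 2) for c in range(16)]
```

Because the rule is a permutation, each step can be undone with invert(perm) at the same t, which is what makes the initial state well defined once the trace is pinned down. A candidate answer would be checked with a call such as matches_clues(candidate, perm, horizon, clues).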
Rule
binary_2x2_margolus_permutation
(t=0, x=4, y=0) = 1
(t=2, x=2, y=2) = 1
(t=5, x=3, y=4) = 0
(t=8, x=0, y=4) = 1
Public data
Sample, public_dev, public_eval
Public splits include answers for inspection and reproducibility. Official scores are reserved for a separate non-public evaluation set.
sample target: 100
public_dev target: 1,500
public_eval target: 1,000