AutomataBench v1

SRSTC: Sparse Reversible Space-Time Completion

A benchmark for reconstructing reversible cellular automaton initial states from sparse space-time observations.

Grid: 16x16 / 32x32 / 48x48
Task: Initial state recovery
Verifier: Trace simulation

Benchmark: AutomataBench

SRSTC Leaderboard

Placeholder leaderboard for layout preview. Scores are illustrative and not official AutomataBench results.

Domain: Reversible cellular automata
Rank  Model                  Provider   Status       Score
1     GPT 5.3 Codex (High)   OpenAI     Placeholder  67.2%
2     GPT 5.4 (High)         OpenAI     Placeholder  66.8%
3     Gemini 3.1 Pro (High)  Google     Placeholder  65.9%
4     GPT 5.2 (High)         OpenAI     Placeholder  64.1%
5     GPT 5.2 Codex (High)   OpenAI     Placeholder  63.8%
6     Gemini 3 Flash (High)  Google     Placeholder  62.4%
7     GPT 5 Codex (High)     OpenAI     Placeholder  61.9%
8     Gemini 3 Pro (High)    Google     Placeholder  61.3%
9     GPT 5 (High)           OpenAI     Placeholder  60.6%
10    GPT 5.1 Codex (High)   OpenAI     Placeholder  59.8%
11    o3 (High)              OpenAI     Placeholder  57.4%
12    GLM 4.7                Z.ai       Placeholder  55.2%
13    Opus 4.6 (High)        Anthropic  Placeholder  54.7%
14    GLM 5 (Thinking)       Z.ai       Placeholder  54.3%
15    Opus 4.6 (Max)         Anthropic  Placeholder  53.1%
16    Grok 4.1 (Fast)        xAI        Placeholder  50.4%
17    Grok 4                 xAI        Placeholder  49.8%
18    Opus 4.5 (High)        Anthropic  Placeholder  48.6%
19    Sonnet 4.6 (High)      Anthropic  Placeholder  40.2%

Sample task

Recover a hidden initial state

Given a reversible 2x2 Margolus block rule, a toroidal grid, a time horizon, and sparse observed cells, return the initial binary grid whose simulated trace matches every clue.

Rule: binary_2x2_margolus_permutation
(t=0, x=4, y=0) = 1
(t=2, x=2, y=2) = 1
(t=5, x=3, y=4) = 0
(t=8, x=0, y=4) = 1
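
To make the task concrete, here is a minimal Python sketch of how a trace-simulation check of this kind could work. It is not the official AutomataBench verifier. Several details are assumptions made for illustration: the rule is modelled as a permutation of the 16 possible 2x2 block states, the block partition alternates between offset 0 on even steps and offset 1 on odd steps, bits are packed in top-left, top-right, bottom-left, bottom-right order, and cells are indexed as grid[y][x]. The names DEMO_PERM, margolus_step, simulate, and matches_clues are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Grid = List[List[int]]
Clue = Tuple[int, int, int, int]  # (t, x, y, value)

# Hypothetical rule: any bijection on 0..15 is reversible; this one is only an example.
DEMO_PERM = [(7 * i + 3) % 16 for i in range(16)]


def margolus_step(grid: Grid, perm: Sequence[int], offset: int) -> Grid:
    """Apply one block-permutation step on a toroidal grid with even side lengths."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for by in range(offset, h + offset, 2):
        for bx in range(offset, w + offset, 2):
            # Coordinates of the 2x2 block, wrapped around the torus.
            cells = [(by % h, bx % w), (by % h, (bx + 1) % w),
                     ((by + 1) % h, bx % w), ((by + 1) % h, (bx + 1) % w)]
            # Pack the four bits into an index 0..15 (assumed order: TL, TR, BL, BR).
            idx = 0
            for y, x in cells:
                idx = (idx << 1) | grid[y][x]
            new = perm[idx]
            # Unpack the permuted block state back into the same four cells.
            for k, (y, x) in enumerate(cells):
                out[y][x] = (new >> (3 - k)) & 1
    return out


def simulate(initial: Grid, perm: Sequence[int], steps: int) -> List[Grid]:
    """Return the trace [state at t=0, ..., state at t=steps]."""
    trace = [initial]
    for t in range(steps):
        # Assumed schedule: even steps use offset 0, odd steps use offset 1.
        trace.append(margolus_step(trace[-1], perm, offset=t % 2))
    return trace


def matches_clues(initial: Grid, perm: Sequence[int], clues: Sequence[Clue]) -> bool:
    """Check that the simulated trace agrees with every (t, x, y, value) clue."""
    horizon = max((t for t, _, _, _ in clues), default=0)
    trace = simulate(initial, perm, horizon)
    return all(trace[t][y][x] == v for t, x, y, v in clues)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [[rng.randint(0, 1) for _ in range(16)] for _ in range(16)]
    trace = simulate(hidden, DEMO_PERM, 8)
    # Reuse the sample's (t, x, y) positions, with values taken from the demo trace.
    positions = [(0, 4, 0), (2, 2, 2), (5, 3, 4), (8, 0, 4)]
    clues = [(t, x, y, trace[t][y][x]) for t, x, y in positions]
    print(matches_clues(hidden, DEMO_PERM, clues))  # True by construction
```

Because each step applies a bijection to disjoint blocks, applying the inverse permutation with the same offset undoes it. This is what makes the initial state recoverable in principle, although sparse clues alone may be consistent with more than one initial grid.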

Public data

Sample, public_dev, public_eval

Public splits include answers for inspection and reproducibility. Official scores are reserved for a separate non-public evaluation set.

sample target: 100
public_dev target: 1,500
public_eval target: 1,000