95% CI 882 to 1118 / bootstrap 1000 to 1000
Rotten Tomatoes for AI storytelling
Benchmark board
Audience Score is the headline metric: ordinary listeners rate individual stories after listening, and the model behind each story stays hidden until the listener has submitted a rating.
Audience Score - Listeners
AI Critics - Offline LLM-judge snapshots
Expert Panel - Human critic track
Audience Score - Bayesian-adjusted stars
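The board does not say how the stars are Bayesian-adjusted. One common approach, sketched here purely as an assumption, is a Bayesian average: each model's mean star rating is shrunk toward a prior mean, so a model with only a handful of ratings cannot top the board on a lucky start. `prior_mean` and `prior_weight` are hypothetical parameters, not fields from this page.

```python
def bayesian_adjusted_stars(ratings: list[float], prior_mean: float, prior_weight: float) -> float:
    """Shrink a model's mean star rating toward prior_mean (assumed scheme).

    prior_weight behaves like a count of "virtual" ratings at prior_mean,
    so models with few real ratings stay close to the prior.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# Two 5-star ratings against a prior of 3.5 stars weighted as 10 ratings:
# (10 * 3.5 + 10) / 12 = 3.75
print(bayesian_adjusted_stars([5, 5], prior_mean=3.5, prior_weight=10))
```

With no ratings the score equals the prior mean, which matches the idea that unrated models start at a neutral value.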
Leaderboard
95% CI 875 to 1125 / bootstrap 250 to 0
95% CI 875 to 1125 / bootstrap 250 to 0
95% CI 875 to 1125 / bootstrap 250 to 0
Criteria Radar
Score History
Per-genre Bar Charts (History, Mystery, Sci-Fi)
Model x Genre Heatmap
| Model | History | Mystery | Sci-Fi |
|---|---|---|---|
| Grok Story | 1000 | pending | pending |
| Claude Fable 5 | pending | pending | pending |
| GPT-6 | pending | pending | pending |
| Gemini Pro Latest | pending | pending | pending |
Retention
Stream rate is derived from play starts and the number of plays that reach the five-minute mark.
Length-normalized completion is computed per model from the maximum listen position divided by the audio duration.
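A minimal sketch of both retention metrics, under assumptions the page does not spell out: stream rate is taken as plays reaching 300 seconds divided by play starts, and per-model completion is the mean of per-session `max_position_s / duration_s`, clipped to 1. The `Session` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

FIVE_MINUTES_S = 300.0  # assumed threshold for "five-minute reach"


@dataclass
class Session:
    model: str
    max_position_s: float  # furthest point the listener reached, in seconds
    duration_s: float      # total audio length in seconds (assumed > 0)


def retention_by_model(sessions):
    """Return {model: (stream_rate, completion)} under the assumptions above."""
    starts = defaultdict(int)
    reached_5min = defaultdict(int)
    completion_sum = defaultdict(float)
    for s in sessions:
        starts[s.model] += 1
        if s.max_position_s >= FIVE_MINUTES_S:
            reached_5min[s.model] += 1
        # Length-normalized completion for one session, clipped to at most 1.0.
        completion_sum[s.model] += min(s.max_position_s / s.duration_s, 1.0)
    return {
        m: (reached_5min[m] / starts[m], completion_sum[m] / starts[m])
        for m in starts
    }
```

Normalizing by duration keeps long stories from looking worse simply because they are long; the fixed five-minute threshold, by contrast, does not adjust for length.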
Consistency Check
0 derived comparisons computed from same-listener, same-premise ratings.
Premise preferences remain supplementary and frequency-capped.
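One way such derived comparisons could be produced, sketched as an assumption rather than the board's actual method: group star ratings by (listener, premise), and for every pair of stories from different models in a group, record the higher-rated model as the winner. Ties and same-model pairs are skipped. The tuple layout `(listener_id, premise_id, model, stars)` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def derive_comparisons(ratings):
    """ratings: iterable of (listener_id, premise_id, model, stars) tuples.

    Returns a list of (winner_model, loser_model) pairs.
    """
    groups = defaultdict(list)
    for listener, premise, model, stars in ratings:
        groups[(listener, premise)].append((model, stars))

    comparisons = []
    for rated in groups.values():
        for (m1, s1), (m2, s2) in combinations(rated, 2):
            # Same-model pairs and ties carry no preference between models.
            if m1 == m2 or s1 == s2:
                continue
            comparisons.append((m1, m2) if s1 > s2 else (m2, m1))
    return comparisons
```

Restricting pairs to the same listener and premise controls for both rater leniency and premise difficulty, which is why the count stays at zero until one listener has rated several models on a shared premise.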
Integrity
Bootstrap CIs are shown beside Bayesian intervals.
Anchor, rater-quality, style-control, and season fields are visible on model profiles.
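For reference, a percentile bootstrap interval for a mean rating can be sketched as below. This is a generic illustration, not the board's implementation; `n_resamples`, `alpha`, and `seed` are hypothetical parameters, and the index choice is a simple approximation of the 2.5th and 97.5th percentiles.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


print(bootstrap_ci([1, 2, 3, 4, 5] * 4))
```

Unlike a Bayesian interval, this uses no prior, so with very few ratings it can be much narrower or wider than the Bayesian interval shown beside it.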
Model profiles
xAI - story
6 starts
30% complete
4 QC ratings
Anthropic - 5
3 starts
0% complete
0 QC ratings
OpenAI - 6
1 start
0% complete
0 QC ratings
Google - latest
0 starts
0% complete
0 QC ratings
Best-rated sample works
Grok Story
A Port Learns to Speak ★ 0.0 / 0 ratings
Claude Fable 5
Harbor of Turning Tides ★ 0.0 / 0 ratings
GPT-6
Ink Under Glass ★ 0.0 / 0 ratings
Gemini Pro Latest
A Door Marked Thursday ★ 0.0 / 0 ratings