Storybench

Rotten Tomatoes for AI storytelling

Benchmark board

Audience Score is the headline: ordinary listeners rate individual stories after listening, while model identity stays hidden until after rating.

Primary

Listeners

Audience Score

Ingested when available

AI Critics

Offline LLM-judge snapshots

Coming soon

Expert Panel

Human critic track

Audience Score - Bayesian-adjusted stars

Leaderboard

#1 Grok Story - xAI - early data
1000, n=4

95% CI 882 to 1118 / bootstrap 1000 to 1000

#2 Claude Fable 5 - Anthropic - early data
1000, n=0

95% CI 875 to 1125 / bootstrap 250 to 0

#3 GPT-6 - OpenAI - early data
1000, n=0

95% CI 875 to 1125 / bootstrap 250 to 0
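
The "Bayesian-adjusted stars" label suggests that raw star averages are shrunk toward a prior when a model has few ratings. The page does not state the formula, so the following is only a minimal sketch of one common approach (a weighted Bayesian average). The names RatingSummary, prior_mean, and prior_weight are hypothetical, and how star values map onto the 1000-point scores shown on the board is not specified here, so the sketch stays on the star scale.

```python
from dataclasses import dataclass


@dataclass
class RatingSummary:
    """Star ratings collected for one model (hypothetical structure)."""
    total_stars: float  # sum of all star ratings received
    count: int          # number of ratings (the n shown on the board)


def bayesian_adjusted_stars(summary, prior_mean=3.0, prior_weight=10.0):
    """Shrink a model's mean star rating toward a prior mean.

    prior_mean and prior_weight are illustrative assumptions,
    not Storybench's actual settings.
    """
    # The prior acts like prior_weight pseudo-ratings at prior_mean.
    return (prior_weight * prior_mean + summary.total_stars) / (
        prior_weight + summary.count
    )
```

With no ratings the adjusted value equals the prior mean, which would be consistent with models at n=0 all showing the same default score. As ratings accumulate, the observed average dominates.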

Criteria Radar

Engagement 80
Clarity 80
Originality 80
Emotional Impact 80
Ending Payoff 80

Score History

Series: Grok Story, Claude Fable 5, GPT-6, Gemini Pro Latest

Per-genre Bar Charts

History

Grok Story 1000
Claude Fable 5 0
GPT-6 0
Gemini Pro Latest 0

Mystery

Grok Story 0
Claude Fable 5 0
GPT-6 0
Gemini Pro Latest 0

Sci-Fi

Grok Story 0
Claude Fable 5 0
GPT-6 0
Gemini Pro Latest 0

Model x Genre Heatmap

Model              History  Mystery  Sci-Fi
Grok Story         1000     pending  pending
Claude Fable 5     pending  pending  pending
GPT-6              pending  pending  pending
Gemini Pro Latest  pending  pending  pending

Retention

Stream rate uses play starts and five-minute reach.

Length-normalized completion is computed per model from max listen position over audio duration.
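
The two retention metrics above can be sketched as follows. This is an illustration under stated assumptions, not the board's actual pipeline: the session field names (model, max_position_s, duration_s) are hypothetical, stream rate is assumed to mean the share of play starts that reach the five-minute mark, and completion is assumed to be averaged over play starts and capped at 1.0.

```python
from collections import defaultdict

FIVE_MINUTES_S = 300.0  # five-minute reach threshold, in seconds


def retention_by_model(sessions):
    """Aggregate retention per model.

    sessions: iterable of dicts with hypothetical keys 'model',
    'max_position_s', and 'duration_s', one dict per play start.
    """
    starts = defaultdict(int)
    reached = defaultdict(int)
    completion_sum = defaultdict(float)
    for s in sessions:
        model = s["model"]
        starts[model] += 1
        if s["max_position_s"] >= FIVE_MINUTES_S:
            reached[model] += 1
        if s["duration_s"] > 0:
            # Max listen position over audio duration, capped at 100%.
            ratio = s["max_position_s"] / s["duration_s"]
            completion_sum[model] += min(ratio, 1.0)
        # Sessions with unknown (zero) duration contribute 0 completion.
    return {
        model: {
            "starts": starts[model],
            "stream_rate": reached[model] / starts[model],
            "completion": completion_sum[model] / starts[model],
        }
        for model in starts
    }
```

Dividing by audio duration is what makes completion comparable across stories of different lengths: a listener who stops at minute 5 of a 10-minute story and one who stops at minute 15 of a 30-minute story both count as 50%.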

Consistency Check

0 same-listener, same-premise derived comparisons computed.

Premise preferences remain supplementary and frequency-capped.

Integrity

Bootstrap CIs are shown beside Bayesian intervals.
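
As a rough illustration of the bootstrap side of that comparison, here is a minimal percentile-bootstrap sketch for the mean of a model's ratings. The function name, the resample count, and the choice of the percentile method are assumptions; the page does not say which bootstrap variant is used.

```python
import random


def bootstrap_ci(ratings, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of ratings.

    Returns None when there are no ratings to resample.
    """
    if not ratings:
        return None
    rng = random.Random(seed)
    n = len(ratings)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(ratings, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lo, hi
```

One property worth keeping in mind when reading the board: if every available rating is identical, every resample has the same mean and the interval collapses to a single point, whereas a Bayesian interval stays wide at small n because of the prior. That may explain a bootstrap range like "1000 to 1000" at n=4, though the page does not confirm it.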

Anchor, rater-quality, style-control, and season fields are visible on model profiles.

Model profiles

Grok Story

xAI - story

#1
Engagement 80
Clarity 80
Originality 80
Emotional Impact 80
Ending Payoff 80

6 starts

30% complete

4 QC ratings

Claude Fable 5

Anthropic - 5

#2
Engagement 0
Clarity 0
Originality 0
Emotional Impact 2
Ending Payoff 3

3 starts

0% complete

0 QC ratings

GPT-6

OpenAI - 6

#3
Engagement 0
Clarity 0
Originality 0
Emotional Impact 2
Ending Payoff 3

1 start

0% complete

0 QC ratings

Gemini Pro Latest

Google - latest

#4
Engagement 0
Clarity 0
Originality 0
Emotional Impact 2
Ending Payoff 3

0 starts

0% complete

0 QC ratings

Best-rated sample works