Ranking

Top-ranked models

Models are ranked by their best full-benchmark run when one is available. If a model has only scoped runs, its best scoped run is shown instead. Unique category #1s are counted across all of a model's tracked runs; a category with tied winners is not credited to any model.
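The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the leaderboard's actual implementation: the `Run` record, its field names, and the three helper functions are hypothetical, and the sketch assumes that models with only scoped runs are sorted by score alongside the others, which the page does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One benchmark run. All field names here are hypothetical."""
    model: str
    score: float   # overall score in percent
    full: bool     # True if the run covered all categories
    category_scores: dict[str, float] = field(default_factory=dict)


def best_run_per_model(runs: list[Run]) -> dict[str, Run]:
    """Pick each model's best full run, or its best scoped run if it has no full run."""
    best: dict[str, Run] = {}
    for run in runs:
        current = best.get(run.model)
        # Tuples compare left to right: a full run beats any scoped run,
        # and among runs of the same kind the higher score wins.
        if current is None or (run.full, run.score) > (current.full, current.score):
            best[run.model] = run
    return best


def rank_models(runs: list[Run]) -> list[Run]:
    """Order models by the score of their selected run, highest first (assumed)."""
    selected = best_run_per_model(runs).values()
    return sorted(selected, key=lambda r: r.score, reverse=True)


def unique_category_firsts(runs: list[Run]) -> dict[str, list[str]]:
    """Credit a category to a model only if no other model matches its high."""
    # Highest score each model reached in each category, across all its runs.
    highs: dict[str, dict[str, float]] = {}
    for run in runs:
        for category, score in run.category_scores.items():
            per_model = highs.setdefault(category, {})
            per_model[run.model] = max(score, per_model.get(run.model, score))

    credited: dict[str, list[str]] = {}
    for category, per_model in highs.items():
        top = max(per_model.values())
        leaders = [m for m, s in per_model.items() if s == top]
        if len(leaders) == 1:  # tied category winners are ignored
            credited.setdefault(leaders[0], []).append(category)
    return credited
```

Calling `rank_models(runs)` on a list of such records would yield one entry per model in leaderboard order, and `unique_category_firsts(runs)` would map each model to the categories where it alone holds the high score. Exact float equality is used to detect ties here; a real scorer might compare rounded values instead.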

  1. Rank #1

    xAI

    Grok 4.20 Multi-Agent

    x-ai/grok-4.20-multi-agent

    Score: 91.0% (A)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $66.6636
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    1 category record

    1

    Across this model's tracked runs, no other model matches these category highs.

    Ambiguous Interpretation
  2. Rank #2

    Google

    Gemini 3 Flash Preview

    google/gemini-3-flash-preview

    Score: 85.3% (B+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.7638
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  3. Rank #3

    xAI

    Grok 4.1 Fast

    x-ai/grok-4.1-fast

    Score: 84.8% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.7993
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  4. Rank #4

    Z.ai

    GLM 5.1

    z-ai/glm-5.1

    Score: 84.1% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-07
    Cost: $4.2011
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  5. Rank #5

    Anthropic

    Claude Opus 4.7

    anthropic/claude-opus-4.7

    Score: 84.0% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-16
    Cost: $14.4565
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  6. Rank #6

    Google

    Gemini 3.1 Flash Lite Preview

    google/gemini-3.1-flash-lite-preview

    Score: 83.8% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-03
    Cost: $1.0205
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    1 category record

    1

    Across this model's tracked runs, no other model matches these category highs.

    Overfit
  7. Rank #7

    Google

    Gemini 3.1 Pro Preview

    google/gemini-3.1-pro-preview

    Score: 83.3% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-19
    Cost: $16.2001
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  8. Rank #8

    Google

    Gemma 4 31B

    google/gemma-4-31b-it

    Score: 82.2% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-02
    Cost: $0.3346
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  9. Rank #9

    MoonshotAI

    Kimi K2.5

    moonshotai/kimi-k2.5

    Score: 82.0% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $2.9077
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  10. Rank #10

    Xiaomi

    MiMo-V2-Pro

    xiaomi/mimo-v2-pro

    Score: 81.4% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $0.4332
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  11. Rank #11

    Anthropic

    Claude Opus 4.5

    anthropic/claude-opus-4.5

    Score: 81.4% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $6.8509
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  12. Rank #12

    DeepSeek

    DeepSeek V4 Pro

    deepseek/deepseek-v4-pro

    Score: 81.4% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-24
    Cost: $3.7081
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  13. Rank #13

    Anthropic

    Claude Opus 4.6

    anthropic/claude-opus-4.6

    Score: 80.7% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-05
    Cost: $13.6604
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  14. Rank #14

    Anthropic

    Claude Sonnet 4.5

    anthropic/claude-sonnet-4.5

    Score: 79.8% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $4.5501
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  15. Rank #15

    Google

    Gemini 3 Pro Preview

    google/gemini-3-pro-preview

    Score: 79.3% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-02
    Cost: $15.5395
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  16. Rank #16

    Xiaomi

    MiMo-V2-Omni

    xiaomi/mimo-v2-omni

    Score: 79.2% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $0.4379
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  17. Rank #17

    MoonshotAI

    Kimi K2.6

    moonshotai/kimi-k2.6

    Score: 78.3% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-21
    Cost: $3.6900
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  18. Rank #18

    OpenAI

    GPT-4.1

    openai/gpt-4.1

    Score: 77.3% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-01
    Cost: $2.4096
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  19. Rank #19

    Z.ai

    GLM 5

    z-ai/glm-5

    Score: 76.8% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-11
    Cost: $4.4341
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  20. Rank #20

    Qwen

    Qwen3.6 Plus Preview (free)

    qwen/qwen3.6-plus-preview:free

    Score: 74.4% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-02
    Cost: $0.4492
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  21. Rank #21

    Z.ai

    GLM 4.7

    z-ai/glm-4.7

    Score: 74.3% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $3.0510
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  22. Rank #22

    DeepSeek

    DeepSeek V4 Flash

    deepseek/deepseek-v4-flash

    Score: 73.2% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-24
    Cost: $0.6274
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  23. Rank #23

    DeepSeek

    DeepSeek V3.2

    deepseek/deepseek-v3.2

    Score: 73.0% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.6649
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  24. Rank #24

    Anthropic

    Claude Sonnet 4.6

    anthropic/claude-sonnet-4.6

    Score: 72.7% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-17
    Cost: $5.3352
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  25. Rank #25

    Xiaomi

    MiMo-V2.5

    xiaomi/mimo-v2.5

    Score: 72.6% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-27
    Cost: $1.4285
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  26. Rank #26

    DeepSeek

    R1 0528

    deepseek/deepseek-r1-0528

    Score: 72.0% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-04
    Cost: $2.7262
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  27. Rank #27

    Mistral

    Mistral Small 4

    mistralai/mistral-small-2603

    Score: 72.0% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-28
    Cost: $0.7188
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  28. Rank #28

    Arcee AI

    Trinity Large Preview (free)

    arcee-ai/trinity-large-preview:free

    Score: 70.6% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.3629
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  29. Rank #29

    OpenAI

    GPT Chat Latest

    openai/gpt-chat-latest

    Score: 69.9% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-05-06
    Cost: $9.7443
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  30. Rank #30

    Xiaomi

    MiMo-V2.5-Pro

    xiaomi/mimo-v2.5-pro

    Score: 68.7% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-27
    Cost: $2.0532
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  31. Rank #31

    xAI

    Grok 4.20

    x-ai/grok-4.20

    Score: 67.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $2.4162
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    1 category record

    1

    Across this model's tracked runs, no other model matches these category highs.

    EQ Boundaries
  32. Rank #32

    OpenAI

    GPT-4o (extended)

    openai/gpt-4o:extended

    Score: 66.6% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $4.9335
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  33. Rank #33

    Arcee AI

    Trinity Large Thinking

    arcee-ai/trinity-large-thinking

    Score: 66.6% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-02
    Cost: $1.1208
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  34. Rank #34

    MiniMax

    MiniMax M2.1

    minimax/minimax-m2.1

    Score: 66.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.1797
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  35. Rank #35

    OpenAI

    GPT-5.5

    openai/gpt-5.5

    Score: 65.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-24
    Cost: $14.9956
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  36. Rank #36

    Anthropic

    Claude Haiku 4.5

    anthropic/claude-haiku-4.5

    Score: 64.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.4211
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  37. Rank #37

    OpenAI

    GPT-5.3 Chat

    openai/gpt-5.3-chat

    Score: 63.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-03
    Cost: $4.1752
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  38. Rank #38

    OpenAI

    GPT-4o-mini

    openai/gpt-4o-mini

    Score: 61.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-01
    Cost: $0.5171
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  39. Rank #39

    Qwen

    Qwen-Max

    qwen/qwen-max

    Score: 60.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-02
    Cost: $2.0263
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  40. Rank #40

    NVIDIA

    Nemotron 3 Nano 30B A3B

    nvidia/nemotron-3-nano-30b-a3b

    Score: 60.6% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-07
    Cost: $0.7716
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  41. Rank #41

    Qwen

    Qwen3.5 397B A17B

    qwen/qwen3.5-397b-a17b

    Score: 60.5% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-17
    Cost: $6.4897
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  42. Rank #42

    MiniMax

    MiniMax M2.7

    minimax/minimax-m2.7

    Score: 60.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-13
    Cost: $1.3814
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  43. Rank #43

    Elephant

    openrouter/elephant-alpha

    Score: 60.3% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-13
    Cost: $0.3837
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  44. Rank #44

    OpenAI

    GPT-5.1

    openai/gpt-5.1

    Score: 58.9% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $6.9468
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  45. Rank #45

    MiniMax

    MiniMax M2.5

    minimax/minimax-m2.5

    Score: 57.0% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-12
    Cost: $1.1264
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  46. Rank #46

    Qwen

    Qwen3.5 Plus 2026-02-15

    qwen/qwen3.5-plus-02-15

    Score: 56.5% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-16
    Cost: $0.6773
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  47. Rank #47

    NVIDIA

    Nemotron 3 Super

    nvidia/nemotron-3-super-120b-a12b

    Score: 56.1% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-07
    Cost: $0.8746
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  48. Rank #48

    OpenAI

    GPT-5 Mini

    openai/gpt-5-mini

    Score: 54.8% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-01
    Cost: $2.4435
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  49. Rank #49

    OpenAI

    GPT-5.3-Codex

    openai/gpt-5.3-codex

    Score: 53.8% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-26
    Cost: $4.2936
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  50. Rank #50

    Z.ai

    GLM 5 Turbo

    z-ai/glm-5-turbo

    Score: 51.9% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-29
    Cost: $2.8015
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  51. Rank #51

    OpenAI

    GPT-5.4

    openai/gpt-5.4

    Score: 51.4% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-05
    Cost: $6.1319
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  52. Rank #52

    OpenRouter Stealth

    Aurora Alpha

    openrouter/aurora-alpha

    Score: 49.3% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-10
    Cost: $0.0445
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  53. Rank #53

    OpenAI

    GPT-5.2

    openai/gpt-5.2

    Score: 47.8% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $7.6194
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  54. Rank #54

    Xiaomi

    MiMo-V2-Flash

    xiaomi/mimo-v2-flash

    Score: 47.3% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.4337
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  55. Rank #55

    OpenAI

    GPT Mini Latest

    ~openai/gpt-mini-latest

    Score: 44.3% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-04-27
    Cost: $1.4206
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  56. Rank #56

    OpenAI

    GPT-5.4 Mini

    openai/gpt-5.4-mini

    Score: 43.9% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-29
    Cost: $1.4024
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  57. Rank #57

    OpenAI

    GPT-5.4 Nano

    openai/gpt-5.4-nano

    Score: 38.0% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-29
    Cost: $0.8900
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.