Ranking

Top-ranked models

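Each model is ranked by its best full-benchmark run when one is available; if a model has only scoped runs, its best scoped run is shown instead. A model is credited with a unique category #1 when its best score in that category, taken across all of its tracked runs, is higher than every other model's. Categories where the top score is tied are not credited to anyone.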
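The two rules above (choosing each model's representative run, and crediting only untied category highs) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not this leaderboard's implementation: it assumes each tracked run is stored with the model name, whether it covered the full benchmark, an overall score, and per-category scores. The `Run` record and the function names are hypothetical. The leaderboard's tie-break for equal overall scores is not stated, so the sketch simply keeps input order (Python's sort is stable).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Run:
    # Hypothetical record for one tracked run.
    model: str
    full: bool  # True for a full-benchmark run, False for a scoped run
    score: float  # overall score as a percentage
    category_scores: dict[str, float] = field(default_factory=dict)


def representative_run(runs: list[Run]) -> Run:
    """Best full run if the model has one, otherwise its best scoped run."""
    full_runs = [r for r in runs if r.full]
    pool = full_runs if full_runs else runs
    return max(pool, key=lambda r: r.score)


def rank_models(runs: list[Run]) -> list[Run]:
    """One representative run per model, sorted by overall score (descending)."""
    by_model: dict[str, list[Run]] = {}
    for r in runs:
        by_model.setdefault(r.model, []).append(r)
    reps = [representative_run(rs) for rs in by_model.values()]
    return sorted(reps, key=lambda r: r.score, reverse=True)


def unique_category_wins(runs: list[Run]) -> dict[str, list[str]]:
    """Map model -> categories where it alone holds the top score."""
    # Each model's best score per category, across all of its tracked runs.
    best: dict[tuple[str, str], float] = {}
    for r in runs:
        for cat, s in r.category_scores.items():
            key = (r.model, cat)
            best[key] = max(best.get(key, s), s)

    # Group the per-model bests by category.
    per_cat: dict[str, list[tuple[float, str]]] = {}
    for (model, cat), s in best.items():
        per_cat.setdefault(cat, []).append((s, model))

    # Credit a category only when exactly one model holds the top score.
    wins: dict[str, list[str]] = {}
    for cat, entries in per_cat.items():
        top = max(s for s, _ in entries)
        leaders = [m for s, m in entries if s == top]
        if len(leaders) == 1:
            wins.setdefault(leaders[0], []).append(cat)
    return wins
```

Under these assumptions, a model with a full run is ranked on that run even if one of its scoped runs scored higher, and a category whose top score is shared by two models yields no credit for either.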

  1. Rank #1

    xAI

    Grok 4.20 Multi-Agent Beta

    x-ai/grok-4.20-multi-agent-beta

    Score: 91.0% (A)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $66.6636
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    1 category record

    1

    Across this model's tracked runs, no other model matches these category highs.

    Ambiguous Interpretation
  2. Rank #2

    Google

    Gemini 3 Flash Preview

    google/gemini-3-flash-preview

    Score: 85.3% (B+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.7638
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  3. Rank #3

    xAI

    Grok 4.1 Fast

    x-ai/grok-4.1-fast

    Score: 84.8% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.7993
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  4. Rank #4

    Google

    Gemini 3.1 Flash Lite Preview

    google/gemini-3.1-flash-lite-preview

    Score: 83.8% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-03
    Cost: $1.0205
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    2 category records

    2

    Across this model's tracked runs, no other model matches these category highs.

    Overfit, Adversarial (Hostile Logic)
  5. Rank #5

    Google

    Gemini 3.1 Pro Preview

    google/gemini-3.1-pro-preview

    Score: 83.3% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-19
    Cost: $16.2001
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  6. Rank #6

    MoonshotAI

    Kimi K2.5

    moonshotai/kimi-k2.5

    Score: 82.0% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $2.9077
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  7. Rank #7

    OpenRouter Stealth

    Hunter Alpha

    openrouter/hunter-alpha

    Score: 81.4% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $0.4332
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  8. Rank #8

    Anthropic

    Claude Opus 4.5

    anthropic/claude-opus-4.5

    Score: 81.4% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $6.8509
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  9. Rank #9

    Anthropic

    Claude Opus 4.6

    anthropic/claude-opus-4.6

    Score: 80.7% (B)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-05
    Cost: $13.6604
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  10. Rank #10

    Anthropic

    Claude Sonnet 4.5

    anthropic/claude-sonnet-4.5

    Score: 79.8% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $4.5501
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  11. Rank #11

    Google

    Gemini 3 Pro Preview

    google/gemini-3-pro-preview

    Score: 79.3% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-02
    Cost: $15.5395
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  12. Rank #12

    OpenRouter Stealth

    Healer Alpha

    openrouter/healer-alpha

    Score: 79.2% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $0.4379
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  13. Rank #13

    Z.ai

    GLM 5

    z-ai/glm-5

    Score: 76.8% (C+)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-11
    Cost: $4.4341
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  14. Rank #14

    Z.ai

    GLM 4.7

    z-ai/glm-4.7

    Score: 74.3% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $3.0510
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  15. Rank #15

    DeepSeek

    DeepSeek V3.2

    deepseek/deepseek-v3.2

    Score: 73.0% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.6649
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  16. Rank #16

    Anthropic

    Claude Sonnet 4.6

    anthropic/claude-sonnet-4.6

    Score: 72.7% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-17
    Cost: $5.3352
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  17. Rank #17

    DeepSeek

    R1 0528

    deepseek/deepseek-r1-0528

    Score: 72.0% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-04
    Cost: $2.7262
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  18. Rank #18

    Arcee AI

    Trinity Large Preview (free)

    arcee-ai/trinity-large-preview:free

    Score: 70.6% (C)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.3629
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  19. Rank #19

    xAI

    Grok 4.20 Beta

    x-ai/grok-4.20-beta

    Score: 67.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-15
    Cost: $2.4162
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    1 category record

    1

    Across this model's tracked runs, no other model matches these category highs.

    EQ Boundaries
  20. Rank #20

    OpenAI

    GPT-4o (extended)

    openai/gpt-4o:extended

    Score: 66.6% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $4.9335
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  21. Rank #21

    MiniMax

    MiniMax M2.1

    minimax/minimax-m2.1

    Score: 66.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.1797
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  22. Rank #22

    Anthropic

    Claude Haiku 4.5

    anthropic/claude-haiku-4.5

    Score: 64.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $1.4211
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  23. Rank #23

    OpenAI

    GPT-5.3 Chat

    openai/gpt-5.3-chat

    Score: 63.4% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-03
    Cost: $4.1752
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  24. Rank #24

    Qwen

    Qwen-Max

    qwen/qwen-max

    Score: 60.8% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-02
    Cost: $2.0263
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  25. Rank #25

    Qwen

    Qwen3.5 397B A17B

    qwen/qwen3.5-397b-a17b

    Score: 60.5% (D)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-17
    Cost: $6.4897
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  26. Rank #26

    OpenAI

    GPT-5.1

    openai/gpt-5.1

    Score: 58.9% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $6.9468
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  27. Rank #27

    MiniMax

    MiniMax M2.5

    minimax/minimax-m2.5

    Score: 57.0% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-12
    Cost: $1.1264
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  28. Rank #28

    Qwen

    Qwen3.5 Plus 2026-02-15

    qwen/qwen3.5-plus-02-15

    Score: 56.5% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-16
    Cost: $0.6773
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  29. Rank #29

    OpenAI

    GPT-5.3-Codex

    openai/gpt-5.3-codex

    Score: 53.8% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-26
    Cost: $4.2936
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  30. Rank #30

    OpenAI

    GPT-5.4

    openai/gpt-5.4

    Score: 51.4% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-03-05
    Cost: $6.1319
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  31. Rank #31

    OpenRouter Stealth

    Aurora Alpha

    openrouter/aurora-alpha

    Score: 49.3% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-10
    Cost: $0.0445
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  32. Rank #32

    OpenAI

    GPT-5.2

    openai/gpt-5.2

    Score: 47.8% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $7.6194
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.
  33. Rank #33

    Xiaomi

    MiMo-V2-Flash

    xiaomi/mimo-v2-flash

    Score: 47.3% (F)
    Basis: Best full benchmark run
    Scope: All categories
    Benchmark: v1.0.0
    Completed: 2026-02-01
    Cost: $0.4337
    Runs tracked: 1 tracked · 1 full

    Unique category #1s

    No untied category records yet

    0

    This model does not currently hold a solo high score in any category.

    Tied highs are excluded from this callout.