The 5 Armies of LLMs

It feels like a new LLM is dropping every week, so here’s a short breakdown of the current state of AI, using The Hobbit’s five armies as a fitting analogy:

ChatGPT – The Dwarves

GPT-4o (and GPT-4 Turbo for free users) leads the pack in versatility. Great for writing, image generation, and general tasks. OpenAI is trying to become a for-profit company, and its pricing is on the higher side, especially with the $200/month Pro plan for its o1 deep-thinking model.

Claude Sonnet – The Elves

Sophisticated, refined, and my go-to for coding. I appreciate how Claude acknowledges uncertainty rather than pretending to know everything. Today, Anthropic launched Claude 3.7 Sonnet, claiming they didn’t overly optimize for the usual math benchmarks. I’m looking forward to testing it.

DeepSeek – The Eagles

A beacon of hope for open-source AI, proving that smaller players can compete. However, their official chat is unstable, and it sometimes hallucinates with confidence. Still a promising challenger, but definitely not recommended for production yet, unless you self-host it, and the full-size model needs 25 Nvidia A100 GPUs to run.

Gemini – The Men

No magical tricks, but solid and reliable. Gemini 2.0 Pro is fast, cheap, and efficient—possibly the most energy-efficient model in its category, even compared to DeepSeek. A strong all-rounder.

Grok – The Goblins

Unexpectedly dominant in math benchmarks, Grok is the wild card. xAI seems to prioritize free speech, making it less filtered than competitors. That can be a plus or a minus, depending on your views on (total) free speech. It has a bold (borderline arrogant) personality, which works for some use cases but not all. I might consider it for coding, but for writing articles only if you want a cocky tone.

Capabilities Comparison

Based on my experience, I would rate the five LLMs as follows:

LLM Capabilities Comparison

LLM	Coding	Writing Articles	Image Generation	Miscellaneous
ChatGPT	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Claude	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	❌	⭐⭐⭐
DeepSeek	⭐⭐⭐	⭐⭐	❌	⭐⭐
Gemini	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐
Grok	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐

Personal Take

I use Claude for coding but am considering Grok. I’m excited to try Gemini for building agents because of its cheap token cost. I like ChatGPT for writing articles or generating images, but Grok generates good images too. I’m not considering DeepSeek for the moment: I strongly believe in open source, but their platform is just too unstable.

Bonus: LLM Self-Perception

I asked each LLM to sort all five models, itself included, into the five armies:

Which Army Does Each LLM Assign?

LLM asked	Men	Elves	Dwarves	Eagles	Goblins
Claude	ChatGPT	Claude	DeepSeek	Gemini	Grok
Gemini	ChatGPT	Gemini	Claude	DeepSeek	Grok
ChatGPT	Gemini	Claude	ChatGPT	DeepSeek	Grok
Grok	ChatGPT	Claude	Gemini	Grok	DeepSeek
DeepSeek	ChatGPT	Claude	Gemini	DeepSeek	Grok

Each row shows how one LLM distributed all five models, itself included. Notice how four of the five agree that Grok belongs with the Goblins; only Grok itself disagrees and nominates DeepSeek instead.

In short:

Claude sees itself as the Elves and Grok as the Goblins
Gemini sees itself as the Elves and Grok as the Goblins
ChatGPT sees itself as the Dwarves and Grok as the Goblins
Grok sees itself as the Eagles and DeepSeek as the Goblins
DeepSeek sees itself as the Eagles and Grok as the Goblins
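
If you want to check the tally yourself, here is a minimal Python sketch. It only transcribes the self-perception table into a dictionary and counts the answers; the names `ASSIGNMENTS`, `self_assignment`, and `votes_for` are my own illustrative choices, not part of any tool or API.

```python
from collections import Counter

# Transcribed from the table: LLM asked -> {army: model it placed there}.
ASSIGNMENTS = {
    "Claude": {"Men": "ChatGPT", "Elves": "Claude", "Dwarves": "DeepSeek",
               "Eagles": "Gemini", "Goblins": "Grok"},
    "Gemini": {"Men": "ChatGPT", "Elves": "Gemini", "Dwarves": "Claude",
               "Eagles": "DeepSeek", "Goblins": "Grok"},
    "ChatGPT": {"Men": "Gemini", "Elves": "Claude", "Dwarves": "ChatGPT",
                "Eagles": "DeepSeek", "Goblins": "Grok"},
    "Grok": {"Men": "ChatGPT", "Elves": "Claude", "Dwarves": "Gemini",
             "Eagles": "Grok", "Goblins": "DeepSeek"},
    "DeepSeek": {"Men": "ChatGPT", "Elves": "Claude", "Dwarves": "Gemini",
                 "Eagles": "DeepSeek", "Goblins": "Grok"},
}


def self_assignment(judge: str) -> str:
    """Return the army that `judge` picked for itself."""
    for army, model in ASSIGNMENTS[judge].items():
        if model == judge:
            return army
    raise ValueError(f"{judge} did not place itself in any army")


def votes_for(army: str) -> Counter:
    """Count which model each judge placed in `army`."""
    return Counter(row[army] for row in ASSIGNMENTS.values())


# How each LLM sees itself, and who gets sent to the Goblins.
self_view = {judge: self_assignment(judge) for judge in ASSIGNMENTS}
goblin_votes = votes_for("Goblins")

if __name__ == "__main__":
    for judge, army in self_view.items():
        print(f"{judge} sees itself as the {army}")
    print("Goblin votes:", dict(goblin_votes))
```

Calling `votes_for` with another army name (for example `"Men"`) shows how much the models agree on the other columns too.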

I also keep an eye on Perplexity and on Groq (with a q), a platform that aims to offer the fastest inference and serves open-source models such as Mistral, DeepSeek, or Llama 3.

The AI race is evolving fast—what’s your go-to model these days? 🚀