Model Selection Guide


Which AI model to use for different use cases — pricing, capabilities, and practical recommendations.

Choosing the right AI model is one of the most impactful decisions you'll make. The wrong choice can mean significantly higher costs, slower responses, or worse output quality.


Quick Decision Matrix

| If you need... | Consider | Why |
| --- | --- | --- |
| Best overall quality | Claude 3.5 Sonnet | Strong reasoning, coding, instruction following |
| Fastest responses | Groq (Llama models) | Optimized for speed |
| Lowest cost | Gemini Flash | Aggressive pricing |
| Longest context | Gemini 1.5 Pro | 2M token context window |
| Best coding | Claude 3.5 Sonnet | Consistently strong on code benchmarks |
| Best reasoning | o1 or DeepSeek R1 | Chain-of-thought reasoning models |
| Vision/Images | GPT-4o or Claude 3.5 Sonnet | Both capable at image understanding |
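When routing requests programmatically, the matrix above can be encoded as a simple lookup. This is a sketch, not part of any SDK; the model ids are assumptions and should be checked against each provider's current documentation:

```typescript
// Map a capability need to a model id (ids are assumptions; verify with providers).
const MODEL_FOR: Record<string, string> = {
  quality: "claude-3-5-sonnet-20241022",
  speed: "llama-3.3-70b-versatile", // served via Groq
  cost: "gemini-1.5-flash",
  context: "gemini-1.5-pro",
  coding: "claude-3-5-sonnet-20241022",
  reasoning: "o1",
}

function pickModel(need: keyof typeof MODEL_FOR): string {
  return MODEL_FOR[need]
}
```

A router like this keeps model choices in one place, so swapping a model later is a one-line change.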

Model Categories

Flagship Models

Best models from each provider — use when quality matters most.

| Model | Context | Best For |
| --- | --- | --- |
| GPT-4o | 128K | General purpose, vision, structured output |
| Claude 3.5 Sonnet | 200K | Coding, long-form, complex instructions |
| Gemini 1.5 Pro | 2M | Massive context, video, audio |
| Mistral Large | 128K | Multilingual, European compliance |

Fast & Cheap Models

For high-volume applications where cost and speed matter more than peak quality.

| Model | Context | Best For |
| --- | --- | --- |
| GPT-4o Mini | 128K | High volume, simple tasks |
| Claude 3.5 Haiku | 200K | Fast responses, large context |
| Gemini Flash | 1M | Cheapest option, massive context |
| Llama 3.3 70B (Groq) | 131K | Fastest inference, open source |
| DeepSeek V3 | 64K | Strong value, good for code |

Reasoning Models

For complex multi-step problems, math, and scientific analysis.

| Model | Context | Best For |
| --- | --- | --- |
| o1 | 200K | Hardest problems, research |
| o3-mini | 200K | Reasoning on a budget |
| DeepSeek R1 | 64K | Open-source reasoning, math |

Use Case Recommendations

Chat Applications

Primary: GPT-4o

  • Good balance of quality, speed, and instruction following
  • Handles conversation well

Budget: GPT-4o Mini

  • Much cheaper, still reasonable for most conversations

Speed: Groq Llama 3.3 70B

  • Fastest inference available
  • Open source, no vendor lock-in
```ts
import { openai } from "@ai-sdk/openai"
import { streamText } from "ai"

const result = await streamText({
  model: openai("gpt-4o"), // or "gpt-4o-mini" for volume
  messages,
  maxTokens: 1000,
})
```

Code Generation

Primary: Claude 3.5 Sonnet

  • Consistently performs well on coding benchmarks
  • 200K context fits large codebases

Budget: DeepSeek V3

  • Strong coding capability at lower cost
```ts
import { anthropic } from "@ai-sdk/anthropic"
import { generateText } from "ai"

const result = await generateText({
  model: anthropic("claude-3-5-sonnet-20241022"),
  prompt: "Implement a rate limiter in TypeScript...",
  maxTokens: 4000,
})
```

Long Document Processing

Primary: Gemini 1.5 Pro

  • 2M token context — entire codebases fit
  • Good at long-range retrieval
```ts
import { google } from "@ai-sdk/google"
import { generateText } from "ai"

const result = await generateText({
  model: google("gemini-1.5-pro"),
  prompt: `Analyze this codebase:\n\n${entireCodebase}`,
})
```

Structured Data Extraction

Primary: GPT-4o

  • Reliable JSON schema following
  • Structured output mode guarantees valid JSON
```ts
import { openai } from "@ai-sdk/openai"
import { generateObject } from "ai"
import { z } from "zod"

const result = await generateObject({
  model: openai("gpt-4o"),
  schema: z.object({
    name: z.string(),
    email: z.string().email(),
    company: z.string().optional(),
  }),
  prompt: "Extract contact info from: ...",
})
```

Cost Optimization

1. Match Model to Task

Don't use flagship models for simple tasks.

```ts
// Simple classification doesn't need GPT-4o
const sentiment = await generateText({
  model: openai("gpt-4o-mini"), // much cheaper
  prompt: "Is this positive or negative: 'Great product!'",
})
```

2. Set Max Tokens

Always cap output to avoid runaway responses.

```ts
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: "Summarize this article...",
  maxTokens: 500, // cap output
})
```

3. Cache Responses

For repeated prompts, cache results.

```ts
// hash() and db.cache are app-specific helpers, not part of the AI SDK
const cacheKey = hash(prompt)
const cached = await db.cache.get(cacheKey)
if (cached) return cached

const result = await generateText({ model, prompt })
await db.cache.set(cacheKey, result.text, { ttl: 3600 })
```

4. Batch When Possible

One large request is often cheaper than many small ones.

```ts
// Better: single API call
const result = await generateText({
  model,
  prompt: `Classify each item:\n${items.join("\n")}`,
})
```

Provider Notes

OpenAI

  • Most reliable, widest adoption
  • Best for structured output

Anthropic

  • Strong reasoning and coding
  • 200K context

Google

  • Cheapest at scale
  • Massive context windows

Groq

  • Fastest inference
  • Open-source models only

DeepSeek

  • Strong value
  • Good for code and reasoning

External Resources

For real-time benchmarks and pricing: