Fine-Tuning
Fine-tuning is the process of taking a pre-trained large language model and further training it on a smaller, domain-specific dataset to specialize its behavior, knowledge, or output style. The model weights are updated to reflect the new training data, creating a customized version of the base model.
How it works
Pre-trained LLMs are generalists — they know a lot about many topics but are not experts in any specific domain. Fine-tuning narrows this generality:
1. **Start with a pre-trained model** (e.g., GPT-5, Llama) that already understands language
2. **Provide domain-specific training examples** — typically hundreds to thousands of input-output pairs showing desired behavior
3. **Train for a few epochs** — the model adjusts its weights to perform better on your specific task
4. **Result**: A specialized model that retains general language ability but excels at your domain
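Step 2 above is where most of the practical work happens. The sketch below shows one common way to represent input-output pairs: the JSONL chat format used by OpenAI-style fine-tuning endpoints. The support-ticket examples are illustrative placeholders, not real training data.

```python
import json

# Illustrative input-output pairs in the chat format expected by
# OpenAI-style fine-tuning APIs: one system/user/assistant exchange
# per training example.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise support agent."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a concise support agent."},
            {"role": "user", "content": "Where can I download my invoice?"},
            {"role": "assistant", "content": "Open Billing > Invoices and choose 'Download PDF'."},
        ]
    },
]

def to_jsonl(rows):
    """Serialize to JSONL: one JSON object per line, as fine-tuning APIs expect."""
    return "\n".join(json.dumps(row) for row in rows)

jsonl = to_jsonl(examples)
print(jsonl.count("\n") + 1)  # → 2 training examples
```

In practice you would write this string to a `.jsonl` file, upload it to your provider, and launch a training job against it (step 3).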
Fine-tuning is commonly used for: adapting tone and style (matching brand voice), teaching specific output formats (JSON, structured responses), improving performance on niche domains (medical, legal, financial), and reducing latency by using smaller fine-tuned models instead of larger general ones.
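When the goal is teaching a specific output format, it helps to validate the training set itself: every assistant turn should already be an instance of the format you want the model to learn. A minimal sketch, assuming a hypothetical two-key schema (`intent`, `reply`) chosen for illustration:

```python
import json

# Keys the fine-tuned model should always emit; this schema is an
# illustrative assumption, not a standard.
REQUIRED_KEYS = {"intent", "reply"}

def valid_structured_reply(text):
    """Return True if `text` is a JSON object containing the required keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys()

# A conforming assistant turn passes; free-form prose does not.
assert valid_structured_reply('{"intent": "refund", "reply": "Refund issued."}')
assert not valid_structured_reply("Sure, I can help with that!")
```

Running a check like this over every assistant turn before training catches format drift early, when it is still cheap to fix.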
Frequently asked questions
When should I use fine-tuning instead of RAG?
Use fine-tuning when you need consistent behavior changes (tone, style, format) rather than factual knowledge updates. Fine-tuning is better for teaching the model how to respond, while RAG is better for providing what to respond with. Most customer support use cases are better served by RAG or a combination of both.
How much training data does fine-tuning require?
Effective fine-tuning typically requires 500-5,000 high-quality input-output examples. More data generally improves results, but quality matters more than quantity. Poorly curated training data produces a model that is confidently wrong, which is worse than the base model.
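Since quality matters more than quantity, a pre-training audit of the dataset pays for itself. The sketch below checks a JSONL training set for blank fields, duplicates, and a minimum example count; the threshold is illustrative, not a provider limit.

```python
import json

MIN_EXAMPLES = 500  # illustrative floor, per the guidance above

def audit(jsonl_lines):
    """Return a list of (line_index, issue) problems found in the dataset."""
    problems = []
    seen = set()
    for i, line in enumerate(jsonl_lines):
        msgs = json.loads(line).get("messages", [])
        # Blank or whitespace-only content teaches the model nothing useful.
        if any(not m.get("content", "").strip() for m in msgs):
            problems.append((i, "empty content"))
        # Exact duplicates skew the loss toward repeated examples.
        key = json.dumps(msgs, sort_keys=True)
        if key in seen:
            problems.append((i, "duplicate"))
        seen.add(key)
    if len(jsonl_lines) < MIN_EXAMPLES:
        problems.append((-1, f"only {len(jsonl_lines)} examples"))
    return problems

sample = [
    json.dumps({"messages": [{"role": "user", "content": "Hi"},
                             {"role": "assistant", "content": "Hello!"}]}),
    json.dumps({"messages": [{"role": "user", "content": "Hi"},
                             {"role": "assistant", "content": ""}]}),
]
for idx, issue in audit(sample):
    print(idx, issue)
```

A curation pass like this is usually far cheaper than a wasted training run on a flawed dataset.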
How much does fine-tuning cost?
Fine-tuning costs include training compute ($50-$500+ per training run depending on model size and data volume), hosting the fine-tuned model ($100-$1,000+/month for inference), and data curation time (often the largest hidden cost). RAG on a base model is typically 5-10x cheaper for customer support use cases.
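A back-of-envelope comparison makes the gap concrete. All figures below are rough assumptions drawn from the ranges above (and exclude data curation time, often the largest hidden cost):

```python
# Rough annual cost comparison; every number here is an assumption
# sampled from the ranges cited in the text, not a quoted price.
def annual_finetune_cost(training_runs=4, cost_per_run=300, hosting_per_month=500):
    """Periodic retraining plus dedicated inference hosting."""
    return training_runs * cost_per_run + 12 * hosting_per_month

def annual_rag_cost(monthly_inference=80, monthly_vector_db=20):
    """Pay-per-token inference on a base model plus a small vector store."""
    return 12 * (monthly_inference + monthly_vector_db)

ft = annual_finetune_cost()
rag = annual_rag_cost()
print(ft, rag, ft / rag)  # → 7200 1200 6.0
```

Under these assumptions fine-tuning comes out roughly 6x more expensive, consistent with the 5-10x range cited above; your own ratio will depend heavily on traffic and model size.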
Can I fine-tune any LLM?
Not all LLMs support fine-tuning. OpenAI offers fine-tuning for GPT-4o and GPT-4o-mini. Open-source models like Llama and Mistral can be fine-tuned freely. Anthropic Claude and Google Gemini have more limited fine-tuning access. Check each provider for current availability and pricing.