Or: How I learned to stop worrying and love the monster prompt
There's a classic programming joke that goes: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
In the AI world, we have a new version: "Some people, when confronted with a problem, think 'I know, I'll fine-tune a model.' Now they have two problems."
And just like with regex, the solution that feels more sophisticated often creates more complexity than it solves. Let me explain why RAG + prompting should be your default approach, and why most teams are making AI harder than it needs to be.
Fine-tuning feels like the "real AI" approach. You're customizing a model specifically for your use case, training it on your data, making it truly yours. What could go wrong?
Everything.
Here's what happens when you choose fine-tuning:
You inherit the entire ML pipeline. Suddenly you're dealing with data quality issues, overfitting, catastrophic forgetting, and evaluation metrics that don't capture what you actually care about. You need to worry about train/test splits, learning rates, and the eternal question of whether your model is actually better or just memorizing your training set differently.
Every change requires retraining. Model acting weird? Another training run. Need to handle edge cases? More data prep, more waiting. Want to update behavior based on user feedback? Hope you saved those training resources.
Debugging becomes a nightmare. When your fine-tuned model gives a bad answer, what do you do? With a prompt, you can see exactly what instructions the model followed. With fine-tuning, you're debugging a black box within a black box.
You need evals you probably don't have. Here's the hard truth: if you can't afford comprehensive evaluations, you can't afford fine-tuning. Fine-tuning without proper evaluation is like deploying code without tests. You might get lucky, but you're probably going to break something in ways you won't notice until it's too late.
The reality is that 90% of the time, fine-tuning is objectively the wrong choice. Maybe 9% of the time it performs as well as alternatives while costing less. And only about 1% of the time is it actually the right approach.
Instead of fine-tuning, start with what I call "monster prompts" - detailed, explicit instructions that tell the model exactly how to behave.
Immediate iteration. Change behavior in real-time by tweaking instructions. No training runs, no waiting, no wondering if your changes actually worked.
Full transparency. You can see exactly what instructions the model is following. When something goes wrong, you know where to look.
Easy debugging. Model making mistakes? Add examples to your prompt. Need to handle edge cases? Write specific instructions for them.
Rich context. You can include examples, reasoning patterns, output formats, edge case handling, and domain-specific knowledge all in one place.
The beauty of a well-crafted prompt is that it's immediately understandable by humans and machines alike. You're not training a neural network to hopefully learn some implicit behavior - you're giving explicit instructions that you can read, debug, and modify.
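To make this concrete, here is a minimal sketch of what a monster prompt's skeleton might look like. The section names, company, and example content are all invented for illustration; the point is the structure: explicit behavior rules, output format, edge cases, and examples, all in one readable place.

```python
# A minimal sketch of a "monster prompt" -- the sections and wording
# are illustrative, not a prescribed format.
MONSTER_PROMPT = """\
You are a support assistant for Acme Corp.

## Behavior
- Answer only from the provided context; say "I don't know" otherwise.
- Keep answers under three sentences unless asked for more detail.

## Output format
Give a short answer followed by a "Sources:" line.

## Edge cases
- If the user asks about pricing, point them to the sales team.
- If the question is ambiguous, ask one clarifying question first.

## Examples
Q: How do I reset my password?
A: Use the "Forgot password" link on the login page. Sources: auth-guide.md
"""

def build_messages(user_question: str) -> list[dict]:
    """Pair the static instructions with the user's question, in the
    system/user message shape most chat APIs expect."""
    return [
        {"role": "system", "content": MONSTER_PROMPT},
        {"role": "user", "content": user_question},
    ]
```

Every behavior change is a text edit to `MONSTER_PROMPT`, visible in a diff, with no training run in sight.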
Retrieval-Augmented Generation (RAG) solves the fundamental problem with both base models and fine-tuned models: knowledge that changes or wasn't in the training data.
With RAG, you can:
Inject current information without retraining. Your model can access up-to-date data, proprietary documents, or any information that wasn't in its original training.
Maintain source tracking. Unlike fine-tuning, where knowledge gets baked into the weights, RAG keeps clear attribution to source documents.
Update knowledge instantly. New document? Add it to your knowledge base. Document changed? Update it. No model retraining required.
Scale knowledge efficiently. Want to add a million new documents? Your model stays the same size - only your retrieval system grows.
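The core RAG move is simpler than it sounds: fetch relevant documents, then splice them into the prompt with their sources attached. A sketch, assuming documents arrive as dicts with hypothetical `source` and `text` keys:

```python
def answer_with_context(question: str, documents: list[dict]) -> str:
    """Build a prompt that injects retrieved documents, keeping source
    attribution so every claim can be traced back to a document."""
    context = "\n\n".join(
        f"[{d['source']}]\n{d['text']}" for d in documents
    )
    return (
        "Answer using ONLY the context below. Cite sources by name.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Updating knowledge is now just changing what goes into `documents`; the model and the prompt template never change.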
But here's where most teams go wrong: they immediately reach for vector databases and embeddings. This is usually a mistake.
Vector databases have become the default "best practice" for RAG, but they're often overkill and add unnecessary complexity.
Opaque failures. When vector search fails, you're stuck wondering if it's the embedding model, the chunking strategy, the similarity threshold, or just the inherent fuzziness of semantic search.
Hard to debug. Why did it return these documents? The black box nature of embeddings makes it nearly impossible to understand search behavior.
Complex infrastructure. Now you need to manage embeddings, vector indexes, similarity thresholds, and chunking strategies.
Unnecessary for most use cases. The majority of search problems are actually pretty literal. People usually know roughly what terms they're looking for.
Instead of vector search, try plain keyword search, steered by the model itself.
Here's the magic: put instructions in your monster prompt about how to search your specific database, with examples of different query strategies, and encourage the AI to try different keywords if the first attempt doesn't work.
Transparent search. When search fails, you can immediately see why - the keywords just aren't in the docs.
Explainable strategy. The AI can say: "I searched for 'quarterly earnings' but didn't find much, so I'm trying 'Q4 results' instead."
Easy domain adaptation. Tell the AI: "This is a military database - if searching fails, try formal terminology like 'personnel' instead of 'people'."
Combinable with filters. Easily add date ranges, document types, authors, etc.
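The search-and-retry loop described above can be sketched in a few lines. This is a toy version: the scoring is a bare term count, the corpus is invented, and in a real system the fallback queries would come from the model itself, guided by the prompt, rather than from a hardcoded list.

```python
def keyword_search(query, docs, top_k=3):
    """Literal keyword search: score each document by how many query
    terms appear in it. When it misses, the reason is visible: the
    words simply aren't in the documents."""
    terms = query.lower().split()
    scored = [(sum(t in doc.lower() for t in terms), doc) for doc in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0][:top_k]

CORPUS = [
    "Q4 results: revenue beat expectations.",
    "Personnel records were updated in March.",
]

def search_with_fallbacks(queries, docs):
    """Try each query in turn; report which one finally worked."""
    for q in queries:
        hits = keyword_search(q, docs)
        if hits:
            return q, hits
    return None, []
```

Because every step is literal string matching, a failed search has exactly one explanation, and the winning fallback query tells you which vocabulary your prompt should mention.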
The AI doesn't need you to enumerate every possible synonym. Give it a few examples and it will extrapolate:
Examples you give: personnel/people, assets/equipment, kinetic action/combat

Pairs the AI extrapolates on its own: engagement/fight, sortie/mission, ordnance/weapons

You can even provide domain-specific guidance. The AI picks up on patterns and applies them creatively. It's like having a research librarian who understands your domain and can think laterally about terminology.
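That guidance can live as a plain block of text inside the monster prompt. A sketch, reusing the military-database example; the seed pairs and wording are illustrative:

```python
# Hypothetical guidance block appended to the monster prompt. The seed
# pairs are examples; the final instruction tells the model to
# generalize the pattern rather than rely on an exhaustive list.
DOMAIN_SEARCH_GUIDANCE = """\
This is a military document database. If a search returns nothing,
retry with formal terminology. Seed pairs (casual -> formal):
  people -> personnel
  equipment -> assets
  combat -> kinetic action
Apply the same casual-to-formal pattern to terms not listed here.
"""
```

Three seed pairs and one generalization rule replace what would otherwise be a hand-maintained synonym dictionary.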
Here's where the explicit approach really shines: you can automatically log the AI's search strategies.
When the AI says "I tried X but found nothing, so I'm trying Y," that's pure gold for system improvement: you can spot recurring failures and fold the fallback terms that actually worked back into your prompt's examples.
For the ambitious, you can even automate the improvement process: have an AI read the logs, test prompt improvements with A/B tests, and promote successful changes.
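The logging itself can be as simple as one structured record per search attempt, appended to a JSON Lines file. A sketch; the field names and file path are arbitrary choices:

```python
import json
import time

def log_search_attempt(query, hit_count, fallback_of=None,
                       path="search_log.jsonl"):
    """Append one structured record per search attempt. Aggregated over
    time, these records show which queries fail and which fallback
    rewrites rescue them."""
    record = {
        "ts": time.time(),
        "query": query,
        "hits": hit_count,
        "fallback_of": fallback_of,  # the failed query this one retried
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Records where `fallback_of` is set and `hits` is nonzero are exactly the vocabulary gaps worth promoting into the prompt.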
That 1% where fine-tuning is truly the right choice does exist.
But even then, start with RAG + prompting to prove the use case and build your evaluation framework.
Most AI problems are solved better with explicit instructions and smart retrieval than with custom model training. The RAG + monster prompt approach gives you fast iteration, full transparency, easy debugging, and knowledge you can update without retraining.
Stop making AI harder than it needs to be. Start with the simple approach that works, and only add complexity when you have clear evidence it's needed.
Your future self will thank you when you're iterating on prompts instead of waiting for training runs to complete.
The prompt engineering skills and RAG techniques that work today will still work tomorrow. The same can't be said for your fine-tuned model when the next generation of base models arrives.