Ai Stack

Articles-First RAG: How to Skip Classification and Cut Chatbot Latency in Half
Most RAG guides classify the query first, then retrieve. That adds a full LLM call before a single vector is fetched. Here is how I flipped the order in a live deployment and kept classification only on the miss path, and why that design held up in production.
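To make the flipped order concrete, here is a minimal sketch of the control flow, not the production code. The names `articles_first`, `search`, `classify`, and the `min_score` threshold of 0.75 are all assumptions made for illustration; the toy search and classifier stand in for a real vector store and a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (article_text, similarity_score) pairs, as a vector search might return them.
Hits = List[Tuple[str, float]]


@dataclass
class Answer:
    source: str   # "articles" on a retrieval hit, "classifier" on the miss path
    payload: str  # article text on a hit, classifier label on a miss


def articles_first(
    query: str,
    search: Callable[[str], Hits],
    classify: Callable[[str], str],
    min_score: float = 0.75,  # hypothetical threshold; tune on your own data
) -> Answer:
    """Retrieve first; call the (slow) classifier only when retrieval misses."""
    hits = search(query)
    best = max(hits, key=lambda hit: hit[1], default=None)
    if best is not None and best[1] >= min_score:
        # Hit path: no classification call is made at all.
        return Answer("articles", best[0])
    # Miss path: only now pay for the extra LLM call.
    return Answer("classifier", classify(query))


# Toy stand-ins so the sketch runs without a vector store or an LLM.
calls = {"classify": 0}


def toy_search(query: str) -> Hits:
    corpus = {"reset password": ("How to reset your password", 0.91)}
    if query in corpus:
        return [corpus[query]]
    return [("Unrelated article", 0.20)]


def toy_classify(query: str) -> str:
    calls["classify"] += 1  # count how often the expensive path is taken
    return "smalltalk"


if __name__ == "__main__":
    print(articles_first("reset password", toy_search, toy_classify))
    print(articles_first("hello there", toy_search, toy_classify))
    print("classifier calls:", calls["classify"])
```

In this sketch the classifier counter only increments for the second query, which is the point of the ordering: queries that match an article never pay for the extra call.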