Search-Augmented Generation
Search-augmented generation is a specific form of RAG where the retrieval step queries a live search engine rather than a static document store. The model searches the web for current information, retrieves and processes the results, and uses them as context to generate a grounded response. This is the architecture behind products like Perplexity, ChatGPT's web browsing mode, and Google's AI Overviews.

The distinction from standard RAG is important. Traditional RAG retrieves from a pre-built corpus — your company's documentation, a knowledge base, a collection of PDFs you have already processed and embedded. Search-augmented generation retrieves from the entire web, in real time, for every query. This gives it access to breaking news, recently published content, and information that no pre-built corpus would contain, but it also introduces dependencies on search API availability, result quality, and web page accessibility.

A typical search-augmented generation pipeline works as follows. The user's query is analyzed and possibly reformulated into one or more search queries. Those queries are sent to a search API (Exa, Tavily, Brave Search API, or a SERP API). The returned results — snippets, full-page content, or both — are processed, ranked for relevance, and inserted into the model's context window. The model then generates a response that synthesizes the retrieved information, ideally with citations pointing back to the source URLs.

For product builders, search-augmented generation is the fastest path to building an AI product that can answer questions about current events, recent data, or topics outside the model's training distribution.
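The pipeline above can be sketched in a few dozen lines. This is a minimal illustration, not a production implementation: `search_web` is a stub standing in for a real search API call, `answer` stops at prompt assembly rather than calling a model, and the names, scores, and URLs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    snippet: str
    score: float  # relevance score, as returned by the search API

def search_web(query: str) -> list[SearchResult]:
    # Stub: a real implementation would call Exa, Tavily, Brave
    # Search, or a SERP API here. Results below are hypothetical.
    return [
        SearchResult("https://example.com/a", "Fact A about the topic.", 0.9),
        SearchResult("https://example.com/b", "Off-topic filler.", 0.2),
        SearchResult("https://example.com/c", "Fact C about the topic.", 0.7),
    ]

def build_context(results: list[SearchResult],
                  min_score: float = 0.5,
                  max_chars: int = 500) -> tuple[str, list[str]]:
    """Filter low-relevance results, then greedily pack the best ones
    into a bounded context block with numbered citation markers."""
    kept = sorted((r for r in results if r.score >= min_score),
                  key=lambda r: r.score, reverse=True)
    lines, sources, used = [], [], 0
    for i, r in enumerate(kept, start=1):
        entry = f"[{i}] {r.snippet}"
        if used + len(entry) > max_chars:
            break  # context budget exhausted; drop remaining results
        lines.append(entry)
        sources.append(r.url)
        used += len(entry)
    return "\n".join(lines), sources

def answer(query: str) -> tuple[str, list[str]]:
    context, sources = build_context(search_web(query))
    # A real system would now send this prompt to the model and
    # stream back a response with citations like [1], [2].
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt, sources
```

Note the two filters in `build_context`: a relevance threshold that drops untrustworthy or off-topic results, and a character budget that approximates the token-limit constraint discussed below. Production systems typically use token counts and a reranking model rather than raw API scores.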
The main engineering challenges are query reformulation (turning a conversational question into effective search queries), result quality filtering (not all search results are relevant or trustworthy), context window management (fitting the most useful content within token limits), and citation accuracy (ensuring the model correctly attributes claims to sources). The pattern is widely adopted because it provides grounding without requiring you to build and maintain a document corpus. Your knowledge base is the entire web, kept current by the search engine's crawlers rather than your own infrastructure.
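Of the challenges above, query reformulation is the easiest to illustrate. The sketch below uses a crude regex heuristic to strip conversational framing and add a recency variant; the filler phrases are illustrative, and production systems typically delegate this step to an LLM call rather than pattern matching.

```python
import re

# Conversational prefixes to strip — an illustrative, non-exhaustive list.
FILLER = re.compile(
    r"^(can you|could you|please|tell me about|tell me|what do you know about)\s+",
    re.IGNORECASE,
)

def reformulate(question: str) -> list[str]:
    """Turn a conversational question into candidate search queries:
    one with chat filler stripped, and one with a recency qualifier."""
    core = question.strip().rstrip("?")
    prev = None
    while prev != core:  # strip stacked prefixes like "Can you tell me..."
        prev = core
        core = FILLER.sub("", core).strip()
    return [core, f"{core} latest"]
```

For example, `reformulate("Can you tell me about quantum computing?")` yields `["quantum computing", "quantum computing latest"]` — both variants would then be sent to the search API in parallel.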