In an era of information overload, the gap between a user’s query and their actual intent is the most expensive friction point in digital business. Whether it’s a doctor looking for a clinical precedent, a developer searching for a specific code snippet, or a shopper finding the perfect 80s-style jacket, the challenge remains the same: Intent Recognition. Modern search has evolved from a mechanical 'keyword matcher' into a cognitive layer that must understand context, nuance, and human reasoning.
Search is no longer just a bar at the top of a website; it is the primary interface through which we interact with the world’s data. The faster a system can understand intent, the faster it transforms a 'user' into a 'satisfied customer' or a 'productive employee'.
A simple way to understand search is to view it as a function of three dimensions - the Search Intensity Framework:
| Dimension | Description | Proxy Metrics (Neutral) |
|---|---|---|
| SKU / Entity Variety | Scale and diversity of the corpus. | # of documents, unique entities, attribute depth. |
| Exploration Need | Cognitive effort required for the "right" result. | Session length, query complexity, consequence of "wrong" results. |
| Search Volume | System load and frequency of use. | Monthly Active Users (MAU), Queries Per Second (QPS). |
Together, these dimensions help explain how mission-critical a search or recommendation system must be. Using them, most use cases fall into four broad archetypes.
| Archetype | Description | Example Industries |
|---|---|---|
| 1. The Exploration Giants | Massive catalogs where users "discover" solutions. | E-commerce, Global Marketplaces, Stock Footage. |
| 2. The Knowledge Explorers | High-complexity research with expert users. | MedTech, LegalTech, Academic Journals, R&D. |
| 3. The Utility Hubs | High volume, low-friction "find and do" tasks. | SaaS Dashboards, Logistics, HR Portals, Banking. |
| 4. The Casual Browsers | Small scale, focused content. | D2C Brands, Personal Blogs, Local News. |
The challenge, particularly for Archetypes 1 (Exploration Giants) and 2 (Knowledge Explorers), is that traditional keyword search fails to capture the subtle, semantic intent expressed in natural language.
Consider a researcher looking for “sustainable alternatives to single-use plastic in high-heat manufacturing” or a lawyer seeking “precedents for liability in autonomous vehicle accidents.” In these cases, the user is looking for a concept, not just a keyword string. Traditional systems struggle here because they look for "what the query says" rather than "what the user means."
While these conceptual needs are clear in specialized fields, the Retail and E-commerce space serves as the ultimate stress test for this problem. In retail, "meaning" is often subjective and trend-driven. A shopper searching for “outfit for a summer wedding in the desert” isn’t looking for those specific words in a product title; they are looking for breathable fabrics, specific color palettes, and formal-yet-durable footwear.
This necessity for deep understanding is why Large Language Models (LLMs) are being integrated into search pipelines. So, what does using an LLM mean in the context of search?
The first answer is Vector Search.
Vector (or semantic) search represents queries and items as dense embeddings. Instead of exact token matches, it measures semantic proximity - finding things that “mean the same” rather than “say the same.”
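To make "semantic proximity" concrete, here is a minimal sketch of similarity-based ranking. The four-dimensional vectors are toy stand-ins for real model embeddings (which typically have hundreds of dimensions); the cosine-similarity math is the same either way.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query: np.ndarray, docs: list) -> list:
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine_similarity(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

# Toy 4-dimensional "embeddings" standing in for real model output.
query = np.array([0.9, 0.1, 0.0, 0.3])
doc_a = np.array([0.8, 0.2, 0.1, 0.4])   # semantically close to the query
doc_b = np.array([0.0, 0.9, 0.8, 0.0])   # semantically distant

print(rank_by_similarity(query, [doc_a, doc_b]))  # doc_a ranks first
```

The key property: two items can share zero tokens yet still land near each other in the embedding space, which is exactly what keyword matching cannot do.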
This leads to the question: where does one need vector search? The table below outlines the typical contexts and why it works there.
| Context | Description | Example Use Cases | Why It Works |
|---|---|---|---|
| High Exploration Need | Users express intent in varied, natural language ways | e.g., "Italian 80s disco nostalgia" | Embeddings capture meaning beyond keywords |
| Sparse / Noisy Text | Product titles, short reviews, or non-standard spellings | e.g., user-generated content, social posts, classifieds | Handles typos, abbreviations, paraphrases |
| Content-Rich Domains | Texts with many ways to describe the same thing | e.g., research papers, legal docs, news, Q&A forums | Finds conceptual similarity between phrases |
| Cold Start or Sparse Data | Few or no user interactions | e.g., new products with limited metadata | Semantic similarity helps fill data gaps |
| Cross-Lingual or Multimodal Search | Query and content in different languages or formats | e.g., "red shoes" -> image of red sneakers | Embedding models unify representations across modalities |
These strengths make semantic retrieval a powerful ingredient in modern search pipelines — especially when users describe things in messy, vague, or creative language. But like every tool, it has its blind spots.
Where does vector search not work?
| Context | Description | Example Use Cases | Why It Struggles |
|---|---|---|---|
| High SKU Variety + Structured Attributes | Queries depend on exact specs | e.g., "brake pad for Toyota Corolla 2019 front left" | Needs structured filters and deterministic matches |
| Frequent Faceted Filtering | Users narrow down by attributes | e.g., "price < $100", "brand = Bosch" | Vector space doesn't handle numeric constraints well |
| Synonym Control Needed | Exact keyword or category mappings are critical | e.g., "clutch plate" ≠ "brake disc" | Semantic embeddings can overgeneralize |
| Regulated or Legal Contexts | Must retrieve exact text or precedent | e.g., legal, medical, compliance docs | "Fuzzy" matches can cause factual errors |
| Very Large Catalogs (billions) | ANN index maintenance and cost issues | e.g., enterprise-scale product databases | High cost for embedding + indexing pipelines |
| Users Expect Deterministic Results | Precision > recall | e.g., enterprise search ("find invoice #87231") | Keyword + metadata search is faster and clearer |
This mix of strengths and limitations explains an important emerging trend: semantic retrieval helps a lot for exploratory and long-tail queries, but isn’t always the right hammer for structured or deterministic needs. In practice, a basic lexical algorithm such as BM25 can even deliver better overall recall than sophisticated ANN pipelines. This is because head terms (e.g., 'shoes,' 'iPhone,' or popular categories) often rely on high-precision, exact lexical matches and established global popularity signals, which simple, fast lexical models like BM25 are inherently better at capturing than purely semantic similarity.
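For readers unfamiliar with BM25's mechanics, here is a minimal sketch of its scoring function over a toy pre-tokenized corpus (the `k1` and `b` parameters use common default values; real engines add many refinements on top of this core formula).

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal BM25: score each tokenized doc against a tokenized query.
    Rewards exact term matches, weighted by term rarity (IDF) and
    normalized by document length."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "red running shoes for men".split(),
    "wireless phone charger".split(),
    "red leather shoes".split(),
]
print(bm25_scores("red shoes".split(), docs))
```

Note how the document with no matching terms scores exactly zero, and the shorter matching document outranks the longer one: this determinism and length sensitivity is what makes BM25 so dependable on head queries.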
Because modern search spans structured data, semantic discovery, user personalization, and multimodal content, no single retrieval method can carry the whole load. This is why the industry is shifting to hybrid stacks - and why LLMs are showing up everywhere across the search pipeline. Where else are LLMs being used within search?

LLMs are currently applied in three main areas of the search stack:
- Query rewrites
- Query routing
- Ranking algorithms
Where rewrites help the query express itself better, routing helps the system decide where that query should go. Once the system finds candidates, the reranker decides what should rise to the top.
Search quality often fails not because the results are bad - but because the query doesn’t express user intent clearly.
Query rewrites already existed in the pre-LLM world through lexical rewrites (synonym swaps, case normalization, stemming, lemmatization). There are also case studies where businesses used graph databases to store relationships between keywords and applied those relationships in rewrites.
Today, LLMs can interpret and reformulate queries so that:
- retrieval models find more relevant items
- structured filters match correctly, and
- personalization or recommendations can kick in.
Examples:
- “brake pad for Toyota Corolla 2019” -> `category:brake_pad make:Toyota model:Corolla year:2019`
- “iphone charger fast” -> “fast charging cable for iPhone 13”
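The value of a rewrite is the structured output it produces. The sketch below shows one plausible target schema for the automotive example; in production the JSON would come from an LLM prompted with the raw query, but a deterministic regex stand-in keeps the example runnable and illustrates the shape of the output. The schema fields and `KNOWN_MAKES` list are assumptions for illustration.

```python
import re

# Hypothetical lookup table; a real system would use a catalog taxonomy.
KNOWN_MAKES = {"toyota", "honda", "ford"}

def rewrite_query(raw: str) -> dict:
    """Deterministic stand-in for an LLM rewrite: map a free-text
    automotive query onto structured filter fields."""
    tokens = raw.lower().split()
    result = {"category": None, "make": None, "model": None, "year": None}
    for i, tok in enumerate(tokens):
        if tok in KNOWN_MAKES:
            result["make"] = tok.capitalize()
            if i + 1 < len(tokens):          # assume model follows the make
                result["model"] = tokens[i + 1].capitalize()
        elif re.fullmatch(r"(19|20)\d{2}", tok):
            result["year"] = int(tok)
    if "brake" in tokens and "pad" in tokens:
        result["category"] = "brake_pad"
    return result

print(rewrite_query("brake pad for Toyota Corolla 2019"))
```

An LLM replaces the brittle rules here with generalization: it can fill the same schema for makes, models, and categories it was never explicitly coded for.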
Before LLMs, query routing was rule-based or model-based:
- Keyword heuristics: if the query contains “buy”, route to commerce.
- Classifier models: trained to predict which vertical (FAQ, docs, product) a query belongs to.
- Simple intent detection using bag-of-words or TF-IDF features.
These approaches were fast but brittle - they struggled with ambiguity and context, e.g., “apple repair center near me” (brand vs fruit, commerce vs location).
LLMs changed this completely because they:
- Understand semantics (not just keywords)
- Can reason about intent and context
- Can generate structured routing instructions
So instead of hand-coded rules, you can prompt an LLM to decide intelligently.
The router is the brain:
- It classifies intent
- It chooses which retriever(s) to call
- It formulates subqueries if needed
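Structurally, an LLM router is a prompt that asks for a machine-readable routing decision. The sketch below shows that shape; `call_llm` is a stub returning a canned response so the example runs offline, and the intent labels and retriever names are illustrative assumptions, not a fixed taxonomy.

```python
import json

ROUTER_PROMPT = """You are a search router. Given a user query, reply with JSON:
{{"intent": "...", "retrievers": ["..."], "subqueries": ["..."]}}
Allowed retrievers: lexical, vector, product_catalog, faq.
Query: {query}"""

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call so the sketch runs offline.
    Returns a canned routing decision for the demo query."""
    return json.dumps({
        "intent": "local_service_lookup",
        "retrievers": ["product_catalog", "faq"],
        "subqueries": ["apple authorized repair", "apple store locations"],
    })

def route(query: str) -> dict:
    """Classify intent, pick retrievers, and emit subqueries via the LLM."""
    return json.loads(call_llm(ROUTER_PROMPT.format(query=query)))

plan = route("apple repair center near me")
print(plan["retrievers"])
```

Because the output is structured JSON rather than free text, the rest of the pipeline can fan the query out to the chosen retrievers programmatically.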
A reranker is a model used in a two-stage retrieval system. Its job: take an initial list of candidate results (from a fast, simple retriever like BM25 or vector search) and reorder them more intelligently using deeper understanding (semantic, contextual, or personalized).
Think of it as:
“Stage 1 finds potentially relevant items fast, Stage 2 reranks them accurately.”
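The two-stage shape can be sketched end to end. In this toy version, Stage 1 ranks by raw token overlap (a crude proxy for BM25 or vector retrieval) and Stage 2 re-scores candidates with character-bigram overlap, standing in for the deeper query-document interaction a cross-encoder would compute; both scorers are deliberately simplistic assumptions so the example stays self-contained.

```python
def stage1_retrieve(query: str, docs: list, k: int = 3) -> list:
    """Fast candidate generation: rank docs by raw token overlap."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), i) for i, d in enumerate(docs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def bigrams(text: str) -> set:
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def stage2_rerank(query: str, docs: list, candidates: list) -> list:
    """Stand-in for a cross-encoder: reorder candidates by character-bigram
    Jaccard overlap, a cheap proxy for a learned interaction score."""
    qb = bigrams(query)
    def score(i):
        db = bigrams(docs[i])
        return len(qb & db) / max(len(qb | db), 1)
    return sorted(candidates, key=score, reverse=True)

docs = [
    "fast charging cable for iPhone",
    "iphone screen protector",
    "fast delivery options",
    "wireless charging pad",
]
cands = stage1_retrieve("iphone charger fast", docs)
print(stage2_rerank("iphone charger fast", docs, cands))
```

The division of labor is the point: Stage 1 must be cheap enough to scan the whole corpus, while Stage 2 can afford an expensive scorer because it only sees a handful of candidates.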
There are multiple types of rerankers:

| Type | Description | Example Models | Reranking Role |
|---|---|---|---|
| Bi-Encoder (Retriever) | Encodes the query and document separately. Primarily used for fast initial retrieval (Stage 1) to generate candidate lists, not for final reranking. | Sentence-BERT, Contriever | Initial Retrieval |
| Cross-Encoder | Reads the query and document together to compute a deeper, more accurate interaction score for final ranking. | BERT, MiniLM, DeBERTa rerankers | Final Reranking |
| Listwise / Pairwise Reranker | Optimizes ranking by evaluating relationships between documents in lists or pairs instead of scoring each document independently. | RankNet, LambdaMART, RankGPT | Learning-to-Rank Optimization |
| LLM-based Reranker | Uses large language models to judge relevance and provide explainability based on semantic understanding and reasoning. | GPT-4, Mistral, LLaMA rerankers | Semantic & Reasoning-based Reranking |
LLM-based rerankers add the most value for:
- Complex intent (ambiguous or multi-faceted queries)
- Long-form content (summaries, docs, product listings)
- Domain adaptation (legal, medical, e-commerce search)
- Explainable search (“Why is this result relevant?”)
The future of search isn’t a battle between BM25, embeddings, or LLMs — it’s the art of combining them. Modern systems are moving toward adaptive pipelines that shift their strategy based on the query. For some queries, a simple lexical match is the clearest answer. For others, semantic retrieval does the heavy lifting. And for the most complex ones, LLMs combine retrieval, reasoning and summarization to deliver an experience that feels intelligent rather than mechanical.
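One widely used way to combine lexical and semantic result lists is reciprocal rank fusion (RRF): each document earns a score from its rank in every list it appears in, so items favored by multiple retrievers rise to the top. A minimal sketch, with toy document IDs:

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of doc IDs. Each doc scores
    sum(1 / (k + rank)) across the lists it appears in; higher is better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d3", "d1", "d7"]      # lexical (keyword) results
vector_top = ["d1", "d9", "d3"]    # semantic (embedding) results
print(reciprocal_rank_fusion([bm25_top, vector_top]))
```

Here "d1" wins because both retrievers rank it highly, even though neither put it first, which is exactly the consensus behavior a hybrid stack wants.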
New ideas are accelerating this shift. Semantic IDs are helping teams stabilize entity representations by using embedding-driven clusters as canonical identifiers. This reduces drift and gives retrieval systems consistent anchors across product catalogs, content libraries and dynamic datasets. Attribute-enriched embeddings are also emerging, where structured attributes are mixed into the embedding space to improve both recall and relevance.
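The semantic-ID idea can be illustrated with a toy version of the underlying step: cluster item embeddings and use the cluster index as a shared identifier, so near-duplicate items resolve to the same anchor. This is a deliberately simplified 2-means sketch with deterministic initialization; production systems use far larger embedding spaces and more robust clustering or quantization schemes.

```python
import numpy as np

def assign_semantic_ids(embeddings: np.ndarray, n_iters: int = 10) -> np.ndarray:
    """Toy 2-means clustering: the cluster index serves as a 'semantic ID',
    so near-duplicate items share a stable identifier."""
    # Deterministic init: first item, plus the item farthest from it.
    c0 = embeddings[0]
    c1 = embeddings[np.linalg.norm(embeddings - c0, axis=1).argmax()]
    centroids = np.stack([c0, c1])
    for _ in range(n_iters):
        dists = np.linalg.norm(
            embeddings[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(2):
            members = embeddings[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

# Two tight groups of toy "embeddings": variants of the same two products.
items = np.array([
    [1.0, 0.1], [0.9, 0.2],   # product family A
    [0.0, 1.0], [0.1, 0.9],   # product family B
])
print(assign_semantic_ids(items))  # items 0,1 share one ID; 2,3 the other
```

Once variants collapse to a shared ID, downstream retrieval and logging can key off that identifier instead of raw embeddings, which is what gives the representation its stability.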
In that hybrid world, the question shifts from “Which model performs best?” to “Which combination understands best?” — and that’s where competitive advantage will live.