Introduction

Traditional lead classification systems often rely on keyword matching, predefined categories, or manually curated rules. While these approaches can work for structured inputs, they quickly become unreliable when dealing with real-world user requests written in natural language.

In the LeadHub platform, the goal was to intelligently match leads with relevant business branches and services based on user intent. The challenge was that users rarely describe their needs consistently. Different people can describe the same service using completely different wording, while others combine multiple intentions in a single request.

A simple keyword-based approach was not enough. To improve classification quality and better understand user intent, the system evolved into a semantic classification pipeline using embeddings, Elasticsearch, and LLM-assisted validation.


The Problem With Traditional Matching

Initially, the classification logic relied heavily on keyword matching, predefined categories, direct text comparisons, and static branch associations. This worked reasonably well for simple and explicit requests.

However, real-world leads introduced several problems:

  • Synonyms and alternative phrasing for the same service
  • Vague or incomplete descriptions
  • Multilingual input across different markets
  • Mixed intentions — multiple services in a single lead
  • Contextual ambiguity

Real examples
"I want to replace my old windows and repaint the living room."
"My house windows are very old and energy inefficient."

The first request could involve window installation, painting services, and renovation-related branches simultaneously. The second expresses a similar intent to the first but uses completely different wording. Traditional matching systems struggle with these variations because they focus on literal words instead of semantic meaning.

As the number of branches and lead combinations increased, maintaining manual matching rules became increasingly difficult and unreliable.
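
To make the brittleness concrete, here is a minimal sketch of the kind of literal keyword matching described above. The branch names and keyword lists are illustrative, not the platform's real taxonomy:

```python
# Illustrative branch → keyword table; a real system would have many more
# entries, which is exactly what made it hard to maintain.
BRANCH_KEYWORDS = {
    "window_installation": ["window installation", "install windows", "new windows"],
    "painting": ["paint", "repaint", "painting"],
}

def match_branches(lead_text: str) -> list[str]:
    """Return every branch whose keyword list appears literally in the lead."""
    text = lead_text.lower()
    return [
        branch
        for branch, keywords in BRANCH_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]

# "replace my old windows" contains none of the window keywords above,
# so the window branch is silently missed while painting still matches:
print(match_branches("I want to replace my old windows and repaint the living room"))
# → ['painting']
```

The only way to fix the miss in a system like this is to keep adding synonyms by hand, one phrasing at a time.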


Why Semantic Search Became Necessary

To improve classification quality, the platform moved towards semantic search using embeddings. Instead of comparing text directly, embeddings transform text into high-dimensional vector representations that capture semantic meaning and contextual similarity.

This allows the system to identify relationships between phrases that may not share identical keywords but express similar intent.

Semantically similar despite different wording
"replace windows"
"install new windows"
"old house windows need renovation"

The goal was not to find exact text matches, but to retrieve business branches that were contextually relevant to the user request. This significantly improved flexibility compared to rigid keyword systems.

Key shift

Moving from literal string comparison to semantic vector proximity changed what the system could understand. The same intent, expressed differently, now maps to the same region in vector space.
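
The proximity idea can be shown with cosine similarity over toy vectors. The three-dimensional vectors below are hand-made stand-ins for real embedding model output, which would have hundreds of dimensions:

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made vectors standing in for embedding model output. The point is
# the geometry, not the numbers: similar intents point the same way.
replace_windows = [0.90, 0.40, 0.10]
install_windows = [0.85, 0.45, 0.15]
book_a_cleaner  = [0.05, 0.20, 0.95]

print(cosine_similarity(replace_windows, install_windows))  # close to 1.0
print(cosine_similarity(replace_windows, book_a_cleaner))   # much lower
```

"Replace windows" and "install new windows" share almost no keywords, but their vectors sit in the same region, so vector proximity treats them as the same intent.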


Architecture Overview

The classification pipeline evolved into a multi-stage architecture designed to combine semantic retrieval with contextual validation. The high-level flow became:

  • A lead is submitted by the user
  • Relevant answers and contextual information are extracted
  • Embeddings are generated using local LLM infrastructure
  • Elasticsearch performs semantic retrieval using vector similarity
  • Candidate branches are retrieved and ranked
  • Additional validation layers evaluate contextual relevance before final association

This architecture allowed the system to scale semantic matching without relying entirely on manually maintained rules. It also created a more flexible foundation for future AI-assisted workflows.
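
The flow above can be sketched as a thin orchestration layer. Every stage function here is an illustrative stand-in for the real services (embedding infrastructure, Elasticsearch client, validation layer), not the platform's actual code:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    branch_id: str
    score: float  # similarity score from the retrieval step

def extract_context(lead: dict) -> str:
    # Stage 2: pull the relevant answers out of the submitted lead.
    return " ".join(str(v) for v in lead.get("answers", {}).values())

def embed(text: str) -> list[float]:
    # Stage 3 placeholder: a real pipeline calls the local embedding model.
    return [float(len(text) % 7), 1.0, 0.5]

def retrieve_candidates(vector: list[float], k: int = 10) -> list[Candidate]:
    # Stages 4–5 placeholder: a real pipeline runs a vector search
    # in Elasticsearch and returns ranked branches.
    return [Candidate("painting", 0.91), Candidate("chemical_products", 0.58)]

def validate(lead_context: str, candidate: Candidate) -> bool:
    # Stage 6 placeholder for the contextual validation layer;
    # a bare score cutoff stands in for the real relevance checks.
    return candidate.score >= 0.75

def classify(lead: dict) -> list[str]:
    context = extract_context(lead)
    vector = embed(context)
    candidates = retrieve_candidates(vector)
    return [c.branch_id for c in candidates if validate(context, c)]

print(classify({"answers": {"q1": "repaint the living room"}}))  # → ['painting']
```

Keeping each stage behind its own function boundary is what later makes it possible to swap the validation layer or run stages asynchronously without touching retrieval.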


Using Elasticsearch for Semantic Retrieval

Elasticsearch was selected as the semantic retrieval engine due to its flexibility, performance, and support for vector search capabilities. Each branch and category could be represented using embeddings stored as dense vectors. Semantic similarity searches were then performed using cosine similarity scoring.

This enabled the system to:

  • Retrieve semantically related branches
  • Rank candidates by contextual similarity
  • Handle broader user phrasing without manual synonym lists
  • Support scalable retrieval pipelines

Compared to traditional relational filtering alone, semantic retrieval produced significantly better results for ambiguous or loosely phrased requests. Another important advantage was the ability to combine semantic similarity, metadata filtering, branch-specific logic, and contextual constraints inside a single retrieval pipeline.

Semantic retrieval through dense_vector fields and cosine similarity in Elasticsearch allowed the system to replace brittle rule sets with a scalable, queryable vector index that improved naturally as embedding quality improved.


Challenges Encountered

Although semantic search improved retrieval quality considerably, several production challenges quickly emerged.

False Positives

One of the biggest problems was semantic proximity without actual relevance. Branches related to accessories or adjacent services, broad construction categories, and loosely related industries could sometimes appear semantically close even when they were not truly appropriate for the lead. Semantic similarity alone was not enough.

Ambiguous User Intent

Users often describe symptoms instead of services, desired outcomes instead of technical tasks, and multiple intentions in a single request. This created noisy and difficult classification scenarios that vector distance could not resolve on its own.

Broad Categories

Some categories naturally overlap semantically — construction, renovation, painting, interior work, cleaning. Without additional contextual validation, retrieval quality could degrade significantly for leads that touched multiple adjacent domains.

Multilingual Variations

The platform also needed to handle leads written using different phrasing styles and language variations, especially across German-language service descriptions. This increased the complexity of maintaining reliable semantic associations across markets.


Lessons Learned

Building semantic classification systems in production revealed several important lessons that shaped the next stages of the pipeline.

Semantic Search Alone Is Not Enough

Embeddings are extremely powerful for retrieval, but semantic similarity does not automatically guarantee contextual correctness. A lead about cleaning services might sit close in vector space to a branch about chemical products. Retrieval pipelines require additional validation layers — and this realisation drove the next phase of the system's evolution.

Retrieval Quality Matters More Than Quantity

Returning more candidates does not necessarily improve results. Carefully filtered and contextually relevant retrieval pipelines perform significantly better than broad semantic matching with a wide similarity threshold.
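
A minimal version of that filtering step, with an illustrative 0.80 cutoff (in practice the threshold is tuned against real lead behaviour, not fixed up front):

```python
def filter_candidates(
    hits: list[dict], min_score: float = 0.80, max_results: int = 3
) -> list[dict]:
    """Keep only candidates above a similarity threshold, capped at a
    small top-k, instead of passing along every broad semantic match."""
    strong = [h for h in hits if h["score"] >= min_score]
    strong.sort(key=lambda h: h["score"], reverse=True)
    return strong[:max_results]

# Illustrative retrieval output for a painting-related lead:
hits = [
    {"branch": "painting", "score": 0.92},
    {"branch": "renovation", "score": 0.84},
    {"branch": "chemical_products", "score": 0.61},
    {"branch": "cleaning", "score": 0.55},
]
print([h["branch"] for h in filter_candidates(hits)])
# → ['painting', 'renovation']
```

The weakly related branches never reach the validation stage at all, which keeps downstream checks cheaper and less noisy.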

AI Systems Require Iterative Refinement

Production AI systems evolve continuously. Prompt adjustments, embedding strategies, validation logic, and retrieval thresholds required constant refinement based on real-world lead behaviour — not theoretical benchmarks.

Architecture Matters

As AI pipelines become more complex, clean separation between retrieval, validation, orchestration, and asynchronous processing becomes increasingly important for maintainability and scalability. Mixing these concerns early creates compounding problems later.


Conclusion

Semantic classification using embeddings and Elasticsearch significantly improved the platform's ability to understand user intent and retrieve contextually relevant business branches. The transition from traditional matching to AI-assisted retrieval reduced the reliance on brittle manual rules and improved flexibility across markets and languages.

However, the transition also introduced new challenges around ambiguity, contextual validation, and reliability. While semantic retrieval greatly improved the system's flexibility and scalability, it became clear that semantic similarity alone was not sufficient for production-grade lead matching.

This eventually led to the introduction of additional LLM-based validation layers and hybrid AI pipelines — which became a critical part of improving overall classification accuracy, and the subject of the next article in this series.