RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by retrieving relevant information from a knowledge base before generating answers. This grounds LLM outputs in specific, verifiable sources rather than relying solely on training data.
Embabel Agent provides RAG support through the LlmReference interface, which allows you to attach references (including RAG stores) to LLM calls.
The key classes are ToolishRag for exposing search operations as LLM tools, and SearchOperations for the underlying search functionality.
Agentic RAG Architecture
Unlike traditional RAG implementations that perform a single retrieval step, Embabel Agent’s RAG is entirely agentic and tool-based. The LLM has full control over the retrieval process:
- Autonomous Search: The LLM decides when to search, what queries to use, and how many results to retrieve
- Iterative Refinement: The LLM can perform multiple searches with different queries until it finds relevant information
- Cross-Reference Discovery: The LLM can follow references, expand chunks to see surrounding context, and zoom out to parent sections
- HyDE Support: The LLM can generate hypothetical documents (HyDE queries) to improve semantic search results
This agentic approach produces better results than single-shot RAG because the LLM can:
- Start with a broad search and narrow down
- Try different phrasings if initial queries return poor results
- Expand promising results to get more context
- Combine information from multiple chunks
Facade Pattern for Safe Tool Exposure
Embabel Agent uses a facade pattern to expose RAG capabilities safely and consistently across different store implementations.
The ToolishRag class acts as a facade that:
- Inspects Store Capabilities: Examines which SearchOperations subinterfaces the store implements
- Exposes Appropriate Tools: Only creates tool wrappers for supported operations
- Provides Consistent Interface: All tools use the same parameter patterns regardless of underlying store
@Override
public List<Tool> tools() {
List<Object> toolObjects = new ArrayList<>();
if (searchOperations instanceof VectorSearch) {
toolObjects.add(new VectorSearchTools((VectorSearch) searchOperations));
}
if (searchOperations instanceof TextSearch) {
toolObjects.add(new TextSearchTools((TextSearch) searchOperations));
}
if (searchOperations instanceof ResultExpander) {
toolObjects.add(new ResultExpanderTools((ResultExpander) searchOperations));
}
if (searchOperations instanceof RegexSearchOperations) {
toolObjects.add(new RegexSearchTools((RegexSearchOperations) searchOperations));
}
return toolObjects.stream()
.flatMap(obj -> Tool.fromInstance(obj).stream())
.toList();
}
This means:
- A Lucene store exposes vector search, text search, regex search, AND result expansion tools
- A Spring AI VectorStore adapter exposes only vector search tools
- A basic text-only store exposes only text search tools
- A directory-based text search exposes text search and regex search
The LLM sees only the tools that actually work with the configured store, preventing runtime errors from unsupported operations.
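The capability inspection can be illustrated with a self-contained sketch. The marker interfaces and store classes below are hypothetical stand-ins named after the types in this section, not the library's own classes; only the `instanceof` dispatch mirrors the `tools()` method shown above.

```java
import java.util.ArrayList;
import java.util.List;

class CapabilitySketch {
    // Hypothetical stand-ins for the SearchOperations subinterfaces.
    interface SearchOperations {}
    interface VectorSearch extends SearchOperations {}
    interface TextSearch extends SearchOperations {}
    interface ResultExpander extends SearchOperations {}
    interface RegexSearchOperations extends SearchOperations {}

    // Hypothetical stores with different capability sets.
    static class VectorOnlyStore implements VectorSearch {}
    static class DirectoryTextStore implements TextSearch, RegexSearchOperations {}
    static class FullStore
            implements VectorSearch, TextSearch, ResultExpander, RegexSearchOperations {}

    // Returns the names of the tool groups a facade would expose for this store.
    static List<String> exposedToolGroups(SearchOperations store) {
        List<String> groups = new ArrayList<>();
        if (store instanceof VectorSearch) groups.add("VectorSearchTools");
        if (store instanceof TextSearch) groups.add("TextSearchTools");
        if (store instanceof ResultExpander) groups.add("ResultExpanderTools");
        if (store instanceof RegexSearchOperations) groups.add("RegexSearchTools");
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(exposedToolGroups(new VectorOnlyStore()));    // [VectorSearchTools]
        System.out.println(exposedToolGroups(new DirectoryTextStore())); // [TextSearchTools, RegexSearchTools]
        System.out.println(exposedToolGroups(new FullStore()).size());   // 4
    }
}
```

Because unsupported capabilities never produce a tool group, there is nothing for the LLM to call that could fail with "not supported".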
Getting Started
To use RAG in your Embabel Agent application, add the rag-core module and a store implementation to your pom.xml:
<dependency>
    <groupId>com.embabel.agent</groupId>
    <artifactId>embabel-agent-rag-lucene</artifactId>
    <version>${embabel-agent.version}</version>
</dependency>
<dependency>
    <groupId>com.embabel.agent</groupId>
    <artifactId>embabel-agent-rag-tika</artifactId>
    <version>${embabel-agent.version}</version>
</dependency>
The embabel-agent-rag-lucene module provides Lucene-based vector and text search.
The embabel-agent-rag-tika module provides Apache Tika integration for parsing various document formats.
Our Model
Embabel Agent uses a hierarchical content model that goes beyond traditional flat chunk storage:
Datum (sealed interface)
│ Core: id, uri, metadata, labels()
│
├── ContentElement ─────────────────────────────────────┐
│ Structural content (not embedded) │
│ ┌───────────────────────────────────────────────┐ │
│ │ ContentRoot / NavigableDocument │ │
│ │ Documents with URI and title │ │
│ └───────────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────────┐ │
│ │ ContainerSection / LeafSection │ │
│ │ Hierarchical document sections │ │
│ └───────────────────────────────────────────────┘ │
│ │
└── Retrievable ────────────────────────────────────────┤
Embeddable/searchable content │
┌───────────────────────────────────────────────┐ │
│ Chunk │ │
│ text, parentId, embedding │ │
│ Primary unit for vector search │ │
└───────────────────────────────────────────────┘ │
┌───────────────────────────────────────────────┐ │
│ NamedEntity │ │
│ Domain entity contract (Person, Product) │ │
│ name, description + domain properties │ │
│ │ │
│ └── NamedEntityData │ │
│ Storage format with properties map │ │
│ Hydration via toTypedInstance() │ │
└───────────────────────────────────────────────┘ │
│
────────────────────────────────────────────────────────┘
Key Design Points:
- Datum is the root sealed interface for all data objects
- ContentElement branch contains structural content (documents, sections) that is NOT embedded
- Retrievable branch contains searchable content with embeddings (chunks, entities)
- NamedEntity is the domain contract for typed entities
- NamedEntityData is the storage format with a generic properties map and hydration support
Content Elements
The ContentElement interface is the supertype for all content in the RAG system.
Key subtypes include:
- ContentRoot / NavigableDocument: The root of a document hierarchy, with a required URI and title
- Section: A hierarchical division of content with a title
- ContainerSection: A section containing other sections
- LeafSection: A section containing actual text content
- Chunk: Traditional RAG text chunks, created by splitting LeafSection content
Chunks
Chunk is the primary unit for vector search.
Each chunk:
- Contains a text field with the content
- Has a parentId linking to its source section
- Includes metadata with information about its origin (root document, container section, leaf section)
- Can compute its pathFromRoot through the document hierarchy
This hierarchical model enables advanced RAG capabilities like "zoom out" to parent sections or expansion to adjacent chunks.
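As a rough illustration of how a path from the root can be derived from parentId links, here is a minimal sketch. The `Element` record and the map-based lookup are assumptions made for the example (the library's own types carry more fields), and the sketch assumes the parent links form a tree with no cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class HierarchySketch {
    // Hypothetical minimal element: an id plus the id of its parent (null for the root).
    record Element(String id, String parentId) {}

    // Walks parentId links upward, then reverses so the path starts at the root.
    static List<String> pathFromRoot(String id, Map<String, Element> byId) {
        List<String> path = new ArrayList<>();
        Element current = byId.get(id);
        while (current != null) {
            path.add(current.id());
            current = current.parentId() == null ? null : byId.get(current.parentId());
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        Map<String, Element> byId = Map.of(
            "doc", new Element("doc", null),
            "section-1", new Element("section-1", "doc"),
            "leaf-1.2", new Element("leaf-1.2", "section-1"),
            "chunk-7", new Element("chunk-7", "leaf-1.2"));
        System.out.println(pathFromRoot("chunk-7", byId)); // [doc, section-1, leaf-1.2, chunk-7]
    }
}
```

"Zooming out" from a chunk corresponds to taking one step up this path: the element just before the chunk is its enclosing section.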
SearchOperations
SearchOperations is the tag interface for search functionality.
Concrete implementations implement one or more subinterfaces based on their capabilities.
This design allows stores to implement only what’s natural and efficient for them—a vector database need not pretend to support full-text search, and a text search engine need not fake vector similarity.
VectorSearch
Classic semantic vector search:
public interface VectorSearch extends SearchOperations {
<T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
TextSimilaritySearchRequest request,
Class<T> clazz
);
}
TextSearch
Full-text search using Lucene query syntax:
public interface TextSearch extends SearchOperations {
<T extends Retrievable> List<SimilarityResult<T>> textSearch(
TextSimilaritySearchRequest request,
Class<T> clazz
);
}
Supported query syntax includes:
- +term - term must appear
- -term - term must not appear
- "phrase" - exact phrase match
- term* - prefix wildcard
- term~ - fuzzy match
ResultExpander
Expand search results to surrounding context:
public interface ResultExpander extends SearchOperations {
List<ContentElement> expandResult(
String id,
Method method,
int elementsToAdd
);
}
Expansion methods:
- SEQUENCE - expand to previous and next chunks
- ZOOM_OUT - expand to enclosing section
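A minimal sketch of SEQUENCE-style expansion follows. It assumes chunk ids are available in document order and that elementsToAdd counts neighbours on each side of the hit; the source does not state either detail, so treat both as assumptions. The class and method names are hypothetical.

```java
import java.util.List;

class ExpansionSketch {
    // Returns the chunk with the given id plus up to elementsToAdd neighbours
    // on each side, clamped to the bounds of the ordered chunk list.
    static List<String> expandSequence(List<String> orderedChunkIds, String id, int elementsToAdd) {
        int index = orderedChunkIds.indexOf(id);
        if (index < 0) {
            return List.of(); // unknown id: nothing to expand
        }
        int from = Math.max(0, index - elementsToAdd);
        int to = Math.min(orderedChunkIds.size(), index + elementsToAdd + 1);
        return List.copyOf(orderedChunkIds.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("c0", "c1", "c2", "c3", "c4");
        System.out.println(expandSequence(chunks, "c2", 1)); // [c1, c2, c3]
        System.out.println(expandSequence(chunks, "c0", 2)); // [c0, c1, c2]
    }
}
```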
RegexSearchOperations
Pattern-based search across content:
public interface RegexSearchOperations extends SearchOperations {
<T extends Retrievable> List<SimilarityResult<T>> regexSearch(
Pattern regex,
int topK,
Class<T> clazz
);
}
Useful for finding specific patterns like error codes, identifiers, or structured content that doesn’t match well with semantic or keyword search.
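To make the error-code use case concrete, the sketch below runs a regular expression over a small in-memory list of chunks using only `java.util.regex`. The `Chunk` record and the `ERR-1234` code format are invented for the example; a real store would search its index rather than a list.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexSearchSketch {
    // Hypothetical minimal chunk: an id and its text.
    record Chunk(String id, String text) {}

    // Returns up to topK chunks whose text contains a match for the pattern, in input order.
    static List<Chunk> regexSearch(Pattern regex, int topK, List<Chunk> chunks) {
        return chunks.stream()
            .filter(chunk -> regex.matcher(chunk.text()).find())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("c1", "Retry after ERR-1042 is raised"),
            new Chunk("c2", "General overview of the indexing pipeline"),
            new Chunk("c3", "ERR-2001 means the index is locked"));
        Pattern errorCode = Pattern.compile("ERR-\\d{4}");
        regexSearch(errorCode, 5, chunks)
            .forEach(chunk -> System.out.println(chunk.id())); // prints c1 then c3
    }
}
```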
CoreSearchOperations
A convenience interface combining the most common search capabilities:
public interface CoreSearchOperations extends VectorSearch, TextSearch { }
Stores that support both vector and text search can implement this single interface for convenience.
ToolishRag
ToolishRag is an LlmReference that exposes SearchOperations as LLM tools.
This gives the LLM fine-grained control over RAG searches.
Configuration
Create a ToolishRag by wrapping your SearchOperations:
public ChatActions(SearchOperations searchOperations) {
this.toolishRag = new ToolishRag(
"sources",
"Sources for answering user questions",
searchOperations
);
}
Using with LLM Calls
Attach ToolishRag to an LLM call using .withReference():
@Action(canRerun = true, trigger = UserMessage.class)
void respond(Conversation conversation, ActionContext context) {
var assistantMessage = context.ai()
.withLlm(properties.chatLlm())
.withReference(toolishRag)
.rendering("ragbot")
.respondWithSystemPrompt(conversation, Map.of(
"properties", properties
));
context.sendMessage(conversation.addMessage(assistantMessage));
}
Based on the capabilities of the underlying SearchOperations, ToolishRag exposes:
- VectorSearchTools: vectorSearch(query, topK, threshold) - semantic similarity search
- TextSearchTools: textSearch(query, topK, threshold) - BM25 full-text search with Lucene syntax
- RegexSearchTools: regexSearch(regex, topK) - pattern-based search using regular expressions
- ResultExpanderTools: broadenChunk(chunkId, chunksToAdd) - expand to adjacent chunks; zoomOut(id) - expand to parent section
The LLM autonomously decides when to use these tools based on user queries.
Eager Search
By default, ToolishRag is entirely agentic—the LLM decides when to search and what queries to use.
However, when the topic of the conversation is already known, you can preload relevant results before the LLM starts, giving it a head start and reducing the number of tool calls needed.
ToolishRag implements the EagerSearch interface, which provides withEagerSearchAbout():
// Preload results about the user's topic before the LLM starts
ToolishRag eagerRag = toolishRag
.withEagerSearchAbout("Kotlin coroutines", 10);
context.ai()
.withReference(eagerRag)
.respondWithSystemPrompt(conversation, Map.of());
The preloaded results are included in the prompt as hints. The LLM still has access to all the usual search tools and can perform additional searches as needed.
For more control over the search parameters, pass a TextSimilaritySearchRequest directly:
var request = new TextSimilaritySearchRequest("Kotlin coroutines", 0.7, 10);
ToolishRag eagerRag = toolishRag.withEagerSearchAbout(request);
Combining eager search with agentic tools is the sweet spot: preloaded results give the LLM an immediate head start (no round-trip needed), while the tools remain available for follow-up searches if the preloaded results aren’t sufficient. You get the latency benefit of traditional RAG with the quality benefit of agentic RAG.
Eager search requires VectorSearch support in the underlying SearchOperations. If the store does not support vector search, withEagerSearchAbout() throws UnsupportedOperationException eagerly at configuration time.
EagerSearch<T> is a general-purpose interface in the com.embabel.agent.api.reference package for any LlmReference that can preload context via similarity search.
ToolishRag is one implementation, but other reference types can implement EagerSearch to provide the same consistent pattern for preloading relevant context before an LLM call.
ToolishRag lifecycle
It is safe to create a ToolishRag instance and reuse it across many LLM calls.
Instances are also cheap to create, so you can create a new one per LLM call.
You might choose to do this if you provide a ResultListener that collects queries and results for logging or analysis: for example, to track which queries were most useful for answering user questions, and how many searches each answer required.
This can support a learning feedback loop, for example by revealing which queries performed badly and therefore which content, such as documentation, needs to be enhanced.
Result Filtering
In multi-tenant applications or scenarios where searches should be scoped to specific data subsets, ToolishRag supports result filtering.
Filters are applied transparently to all searches—the LLM does not see or control them, ensuring security and data isolation.
Embabel Agent provides two types of filters:
- Metadata Filters: Filter on the metadata map of Datum objects (chunks, sections, etc.)
- Property Filters: Filter on object properties of typed entities (e.g., fields of NamedEntityData or custom entity classes)
Both use the same PropertyFilter type but are applied at different levels.
Motivation
Consider a document management system where:
- Each document belongs to an owner (user or organization)
- Some documents are shared reference data accessible to all users
- The LLM should only search documents the current user is authorized to access
Without filtering, you would need separate RAG stores per user or risk data leakage.
With filtering, a single ToolishRag instance can be scoped per-request to the current user’s data.
Filter API
Embabel Agent provides two filter interfaces for RAG searches:
- PropertyFilter: Filters on map-based properties (metadata, entity properties)
- EntityFilter: Extends PropertyFilter to add entity-specific filtering, particularly label-based filtering
PropertyFilter
The PropertyFilter sealed class hierarchy provides type-safe filter expressions for map-based properties:
| Filter Type | Description | Example |
|---|---|---|
| Eq | Equals | PropertyFilter.eq("owner", "alice") |
| Ne | Not equals | PropertyFilter.ne("status", "deleted") |
| Gt, Gte | Greater than (or equal) | PropertyFilter.gte("score", 0.8) |
| Lt, Lte | Less than (or equal) | PropertyFilter.lt("priority", 5) |
| In | Value in list | PropertyFilter.in("category", "tech", "science") |
| Nin | Value not in list | PropertyFilter.nin("status", "deleted", "archived") |
| Contains | String contains substring | PropertyFilter.contains("tags", "important") |
| And | Logical AND | PropertyFilter.and(filter1, filter2) |
| Or | Logical OR | PropertyFilter.or(filter1, filter2) |
| Not | Logical NOT | PropertyFilter.not(filter) |
EntityFilter
EntityFilter extends PropertyFilter to add entity-specific filtering. Currently, it adds label-based filtering via HasAnyLabel:
| Filter Type | Description | Example |
|---|---|---|
| HasAnyLabel | Matches entities with any of the specified labels | EntityFilter.hasAnyLabel("Person", "Organization") |
HasAnyLabel is particularly useful for:
- Type-safe entity searches: Filter results to only include specific entity types
- Multi-type queries: Search across multiple entity types in one query
import com.embabel.agent.rag.filter.EntityFilter;
import com.embabel.agent.rag.filter.PropertyFilter;
// Filter by single label
EntityFilter personFilter = EntityFilter.hasAnyLabel("Person");
// Filter by multiple labels (OR semantics - entity must have ANY of these labels)
EntityFilter entityFilter = EntityFilter.hasAnyLabel("Person", "Organization");
// Combine HasAnyLabel with property filters using fluent API
PropertyFilter simpleCombo = EntityFilter.hasAnyLabel("Person")
.and(PropertyFilter.eq("status", "active"));
// Multiple conditions
PropertyFilter complexFilter = EntityFilter.hasAnyLabel("Person")
.and(PropertyFilter.eq("status", "active"))
.and(PropertyFilter.gte("score", 0.8));
// OR combinations
PropertyFilter orFilter = EntityFilter.hasAnyLabel("Person")
.or(PropertyFilter.eq("fallback", true));
// With negation
PropertyFilter notDeleted = EntityFilter.hasAnyLabel("Person")
.and(PropertyFilter.not(PropertyFilter.eq("status", "deleted")));
// Complex grouping
PropertyFilter accessFilter = PropertyFilter.or(
PropertyFilter.and(
EntityFilter.hasAnyLabel("Person", "Employee"),
PropertyFilter.eq("active", true)
),
PropertyFilter.eq("role", "admin")
);
Since EntityFilter extends PropertyFilter, all filter types share the same and, or, not operators and can be freely combined.
EntityFilter.HasAnyLabel is typically handled via in-memory filtering as most vector stores don’t have native label support. When using Neo4j backends, labels can be translated to native Cypher label predicates for optimal performance.
Limitation: Nested Properties Not Supported
Filters currently operate on top-level properties only. Nested property paths like "address.city" or "metadata.source" are not supported.
The filter key must match a direct key in the metadata map or a top-level property on the entity object.
For example:
- PropertyFilter.eq("owner", "alice") - Supported: filters on top-level owner property
- PropertyFilter.eq("address.city", "London") - Not supported: nested path will not match
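The top-level-only behaviour can be seen in a small in-memory evaluator. This is a sketch with a hypothetical miniature filter hierarchy (only Eq, And and Not are modelled), not the library's PropertyFilter; it simply shows why a dotted key fails when matching is a direct map lookup.

```java
import java.util.Map;

class FilterSketch {
    // Hypothetical miniature of a sealed filter hierarchy.
    sealed interface Filter permits Eq, And, Not {}
    record Eq(String key, Object value) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record Not(Filter inner) implements Filter {}

    static boolean matches(Filter filter, Map<String, Object> properties) {
        if (filter instanceof Eq eq) {
            // Direct key lookup: "address.city" is one literal key, not a path.
            return eq.value().equals(properties.get(eq.key()));
        }
        if (filter instanceof And and) {
            return matches(and.left(), properties) && matches(and.right(), properties);
        }
        if (filter instanceof Not not) {
            return !matches(not.inner(), properties);
        }
        throw new IllegalArgumentException("Unknown filter: " + filter);
    }

    public static void main(String[] args) {
        Map<String, Object> properties = Map.of(
            "owner", "alice",
            "address", Map.of("city", "London"));
        System.out.println(matches(new Eq("owner", "alice"), properties));         // true
        System.out.println(matches(new Eq("address.city", "London"), properties)); // false
    }
}
```

A common workaround under this constraint is to flatten the value you need to filter on into its own top-level key at ingestion time.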
Kotlin Operator Syntax
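Kotlin users can use operator and infix functions for a more natural DSL syntax. The listing below builds the filters with the static factory methods (Java call style); the Kotlin operators express the same combinations more compactly: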
import com.embabel.agent.rag.filter.PropertyFilter;
// Simple filter with not operator
PropertyFilter notDeleted = PropertyFilter.not(PropertyFilter.eq("status", "deleted"));
// Combine with 'and' and 'or'
PropertyFilter userAccess = PropertyFilter.and(
PropertyFilter.eq("owner", userId),
PropertyFilter.gte("confidenceScore", 0.7)
);
// Complex expressions with grouping
PropertyFilter accessFilter = PropertyFilter.or(
PropertyFilter.and(
PropertyFilter.eq("owner", userId),
PropertyFilter.ne("status", "deleted")
),
PropertyFilter.eq("role", "admin")
);
Metadata vs Entity Filters
ToolishRag accepts two separate filter parameters:
- metadataFilter: A PropertyFilter that filters on the metadata map of Datum objects. Metadata is typically ingestion-time information like source URI, ingestion date, owner ID, etc.
- entityFilter: An EntityFilter that filters on entity properties and labels. For NamedEntityData, this filters on the properties map and labels(). For typed entities, reflection is used to access top-level fields.
// Filter on metadata (e.g., which user owns the document)
PropertyFilter metadataFilter = PropertyFilter.eq("ownerId", currentUserId);
// Filter on entity labels and properties
EntityFilter entityFilter = EntityFilter.hasAnyLabel("Person");
// Apply both filters
ToolishRag scopedRag = toolishRag
.withMetadataFilter(metadataFilter)
.withEntityFilter(entityFilter);
In most cases, you’ll use metadata filters for access control and entity filters for type-based and business logic filtering.
Neo4j Cypher Filtering
When using Neo4j via the Drivine module, metadata filters are automatically converted to Cypher WHERE clauses using CypherFilterConverter:
// The filter is converted to Cypher WHERE clause automatically
PropertyFilter filter = PropertyFilter.and(
PropertyFilter.eq("owner", "alice"),
PropertyFilter.gte("confidenceScore", 0.7)
);
// In DrivineNamedEntityDataRepository:
List<SimilarityResult<T>> results = repository.vectorSearch(request, filter);
// Generates: WHERE (e.owner = $_filter_0) AND (e.confidenceScore >= $_filter_1) AND ...
The converter produces parameterized queries for safety and handles all filter types including nested logical expressions.
For both DrivineStore (chunks) and DrivineNamedEntityDataRepository (named entities), both metadata and property filters are translated to native Cypher WHERE clauses. This is because Neo4j stores all data as node properties - metadata is simply the set of properties that aren’t core fields like id, text, parentId, etc. This provides optimal performance by filtering at the database level rather than in-memory.
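To show what "parameterized" means here, the sketch below renders a tiny filter tree into a WHERE fragment while binding each value to a generated parameter name. It is not the actual CypherFilterConverter: the filter records are hypothetical, only Eq, Gte and And are modelled, and the node alias `e` and the `_filter_N` naming are copied from the generated clause shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CypherSketch {
    // Hypothetical miniature filter hierarchy.
    sealed interface Filter permits Eq, Gte, And {}
    record Eq(String key, Object value) implements Filter {}
    record Gte(String key, Object value) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}

    // Parameter values in binding order, so names line up with the rendered clause.
    private final Map<String, Object> params = new LinkedHashMap<>();

    Map<String, Object> params() {
        return params;
    }

    // Renders the filter against node alias e. Values never appear in the query text;
    // property keys do, so a real converter would need to validate or escape them.
    String toWhere(Filter filter) {
        if (filter instanceof Eq eq) {
            return "(e." + eq.key() + " = $" + bind(eq.value()) + ")";
        }
        if (filter instanceof Gte gte) {
            return "(e." + gte.key() + " >= $" + bind(gte.value()) + ")";
        }
        if (filter instanceof And and) {
            return toWhere(and.left()) + " AND " + toWhere(and.right());
        }
        throw new IllegalArgumentException("Unknown filter: " + filter);
    }

    private String bind(Object value) {
        String name = "_filter_" + params.size();
        params.put(name, value);
        return name;
    }

    public static void main(String[] args) {
        CypherSketch converter = new CypherSketch();
        String where = converter.toWhere(
            new And(new Eq("owner", "alice"), new Gte("confidenceScore", 0.7)));
        System.out.println("WHERE " + where);
        // WHERE (e.owner = $_filter_0) AND (e.confidenceScore >= $_filter_1)
        System.out.println(converter.params()); // {_filter_0=alice, _filter_1=0.7}
    }
}
```

Keeping values out of the query string is what protects against injection; the database receives the values separately through the parameter map.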
Basic Usage
Apply a metadata filter to scope all searches to a specific owner:
// Create a filter for the current user
PropertyFilter ownerFilter = PropertyFilter.eq("ownerId", currentUserId);
// Apply to ToolishRag - all searches will be filtered
ToolishRag scopedRag = toolishRag.withMetadataFilter(ownerFilter);
// Use in LLM call - LLM cannot see or bypass the filter
context.ai()
.withReference(scopedRag)
.respondWithSystemPrompt(conversation, Map.of());
Complex Filters
Combine filters for more sophisticated access control:
// User can access their own documents OR documents in their departments
PropertyFilter accessFilter = PropertyFilter.or(
PropertyFilter.eq("ownerId", currentUserId),
PropertyFilter.in("departmentId", userDepartmentIds)
);
ToolishRag scopedRag = toolishRag.withMetadataFilter(accessFilter);
// Organization-scoped with status restriction
PropertyFilter orgFilter = PropertyFilter.and(
PropertyFilter.eq("orgId", currentOrgId),
PropertyFilter.ne("status", "deleted"),
PropertyFilter.gte("confidenceScore", 0.7)
);
ToolishRag scopedRag2 = toolishRag.withMetadataFilter(orgFilter);
Per-Request Scoping Pattern
A common pattern is to create a scoped ToolishRag per request in a web application:
@Action(trigger = UserMessage.class)
void respond(Conversation conversation, ActionContext context) {
// Get current user from security context
String userId = SecurityContextHolder.getContext()
.getAuthentication().getName();
// Create user-scoped RAG for this request
ToolishRag userScopedRag = toolishRag.withMetadataFilter(
PropertyFilter.eq("ownerId", userId)
);
context.ai()
.withReference(userScopedRag)
.rendering("assistant")
.respondWithSystemPrompt(conversation, Map.of());
}
Backend Implementation
Filters are applied at different levels depending on the backend:
- Spring AI VectorStore: Metadata filters are translated to Filter.Expression for native filtering; entity filters (including HasAnyLabel) are applied in-memory
- Neo4j (Drivine): Both metadata and entity filters (including HasAnyLabel) are translated to native Cypher WHERE clauses and label predicates (optimal performance)
- Lucene: Both filter types are applied as post-filters with inflated topK to compensate for filtered-out results
- Custom stores: Can implement FilteringVectorSearch / FilteringTextSearch for native translation, or fall back to in-memory filtering
The InMemoryPropertyFilter utility class provides fallback filtering for any store implementation:
// In your SearchOperations implementation
List<SimilarityResult<T>> results = performSearch(request);
return InMemoryPropertyFilter.filterResults(results, metadataFilter, entityFilter);
For EntityFilter.HasAnyLabel, the in-memory filter checks if the entity has any of the specified labels via NamedEntityData.labels().
This ensures filtering works across all backends, with native optimization for metadata filters where available.
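The "post-filter with inflated topK" strategy mentioned for Lucene can be sketched as follows. The `Hit` record, the fixed inflation factor and the method name are assumptions for illustration; the source does not say how much the real implementation over-fetches.

```java
import java.util.List;
import java.util.function.Predicate;

class PostFilterSketch {
    // Hypothetical search hit carrying the metadata we filter on.
    record Hit(String id, String ownerId, double score) {}

    // rankedHits must already be sorted by descending score.
    // Over-fetches by a fixed factor, drops rejected hits, then trims back to topK.
    static List<Hit> search(List<Hit> rankedHits, int topK, int inflation, Predicate<Hit> filter) {
        return rankedHits.stream()
            .limit((long) topK * inflation) // inflated candidate window
            .filter(filter)                 // in-memory post-filter
            .limit(topK)                    // restore the requested size
            .toList();
    }

    public static void main(String[] args) {
        List<Hit> ranked = List.of(
            new Hit("a", "alice", 0.9), new Hit("b", "bob", 0.8),
            new Hit("c", "alice", 0.7), new Hit("d", "bob", 0.6));
        Predicate<Hit> ownedByAlice = hit -> hit.ownerId().equals("alice");
        System.out.println(search(ranked, 2, 1, ownedByAlice).size()); // 1: no inflation loses a result
        System.out.println(search(ranked, 2, 2, ownedByAlice).size()); // 2: inflation recovers it
    }
}
```

The trade-off is that over-fetching costs extra work on every query and still cannot guarantee topK results when the filter is very selective, which is why native filtering is preferable where the backend supports it.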
Ingestion
Document Parsing with Tika
Embabel Agent uses Apache Tika for document parsing. TikaHierarchicalContentReader reads various formats (Markdown, HTML, PDF, Word, etc.) and extracts a hierarchical structure:
@ShellMethod("Ingest URL or file path")
String ingest(@ShellOption(defaultValue = "./data/document.md") String location) {
var uri = location.startsWith("http://") || location.startsWith("https://")
? location
: Path.of(location).toAbsolutePath().toUri().toString();
var ingested = NeverRefreshExistingDocumentContentPolicy.INSTANCE
.ingestUriIfNeeded(
luceneSearchOperations,
new TikaHierarchicalContentReader(),
uri
);
return ingested != null ?
"Ingested document with ID: " + ingested :
"Document already exists, no ingestion performed.";
}
Chunking Configuration
Content is split into chunks with configurable parameters:
ragbot:
  chunker-config:
    max-chunk-size: 800
    overlap-size: 100
Configuration options:
- maxChunkSize - Maximum characters per chunk (default: 1500)
- overlapSize - Character overlap between consecutive chunks (default: 200)
- includeSectionTitleInChunk - Include section title in chunk text (default: true)
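To see how the two size parameters interact, here is a minimal character-window chunker. It is a sketch under simplifying assumptions, not the library's chunker: it splits purely by character count and ignores sentence and section boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkerSketch {
    // Splits text into windows of at most maxChunkSize characters. Each window starts
    // (maxChunkSize - overlapSize) characters after the previous one, so consecutive
    // chunks share overlapSize characters.
    static List<String> chunk(String text, int maxChunkSize, int overlapSize) {
        if (overlapSize < 0 || overlapSize >= maxChunkSize) {
            throw new IllegalArgumentException("overlapSize must be in [0, maxChunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlapSize;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + maxChunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4, 1));               // [abcd, defg, ghij]
        System.out.println(chunk("x".repeat(2000), 800, 100).size()); // 3
    }
}
```

With the values from the YAML above (800 and 100), each chunk advances 700 characters, so a 2000-character section yields three chunks.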
Chunk Transformation
When chunks are created from documents, they often lack the context needed for effective retrieval.
A chunk containing "This approach improves performance by 40%" is not useful unless the reader knows what "this approach" refers to.
The ChunkTransformer interface allows you to enrich chunks with additional context before they are indexed.
The urtext Field
Every Chunk has two text fields:
- text - The indexed content, which may be transformed with additional context
- urtext - The original, unmodified chunk text
The urtext field preserves the original content for accurate citations.
When displaying search results to users, use urtext to show exactly what appeared in the source document, while using the enriched text for vector embeddings and search.
AddTitlesChunkTransformer
The recommended default transformer is AddTitlesChunkTransformer, which prepends document and section titles to each chunk:
@Bean
ChunkTransformer chunkTransformer() {
return AddTitlesChunkTransformer.INSTANCE;
}
This transforms a chunk like:
This approach improves performance by 40% compared to the baseline.
Into:
# Title: Performance Optimization Guide
# URI: https://docs.example.com/performance
# Section: Caching Strategies
This approach improves performance by 40% compared to the baseline.
Now the chunk carries its context, improving both retrieval accuracy and LLM understanding.
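The relationship between the enriched text and urtext can be sketched in a few lines. The `Chunk` record and `addTitles` method below are hypothetical stand-ins that reproduce the header format shown above; they are not the library's AddTitlesChunkTransformer.

```java
class TitleTransformSketch {
    // Hypothetical chunk holding both the indexed text and the untouched original.
    record Chunk(String text, String urtext) {}

    // Prepends title, URI and section headers to the indexed text;
    // urtext is carried over unchanged so citations stay exact.
    static Chunk addTitles(Chunk chunk, String title, String uri, String section) {
        String header = "# Title: " + title + "\n"
            + "# URI: " + uri + "\n"
            + "# Section: " + section + "\n";
        return new Chunk(header + chunk.text(), chunk.urtext());
    }

    public static void main(String[] args) {
        String original = "This approach improves performance by 40% compared to the baseline.";
        Chunk enriched = addTitles(
            new Chunk(original, original),
            "Performance Optimization Guide",
            "https://docs.example.com/performance",
            "Caching Strategies");
        System.out.println(enriched.text());   // headers followed by the original sentence
        System.out.println(enriched.urtext()); // the original sentence only
    }
}
```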
Custom Transformers
You can create custom transformers by implementing ChunkTransformer or extending AbstractChunkTransformer:
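public class MetadataEnrichingTransformer extends AbstractChunkTransformer {
    @Override
    public Map<String, Object> additionalMetadata(
            Chunk chunk,
            ChunkTransformationContext context) {
        // Map.of rejects null values, and the document may be null for orphan
        // sections, so build the map defensively.
        Map<String, Object> metadata = new HashMap<>();
        var document = context.getDocument();
        if (document != null && document.getMetadata().get("type") != null) {
            metadata.put("documentType", document.getMetadata().get("type"));
        }
        metadata.put("lastModified", Instant.now().toString());
        return metadata;
    }

    @Override
    public String newText(Chunk chunk, ChunkTransformationContext context) {
        // Optionally modify the text
        return chunk.getText();
    }
}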
The ChunkTransformationContext provides access to:
- section - The Section containing this chunk
- document - The ContentRoot (may be null for orphan sections)
Chaining Transformers
Use ChainedChunkTransformer to apply multiple transformations in sequence:
@Bean
ChunkTransformer chunkTransformer() {
return new ChainedChunkTransformer(List.of(
AddTitlesChunkTransformer.INSTANCE,
new MetadataEnrichingTransformer(),
new CustomCleanupTransformer()
));
}
Transformers are applied in order, with each receiving the output of the previous transformer.
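Because each transformer receives the previous one's output, ordering matters. The sketch below models transformers as plain `UnaryOperator<String>` functions over chunk text, which is a simplification: the real ChunkTransformer works on chunks and context, not bare strings.

```java
import java.util.List;
import java.util.function.UnaryOperator;

class ChainSketch {
    // Applies each transformer to the output of the previous one, in list order.
    static String applyAll(List<UnaryOperator<String>> transformers, String text) {
        String current = text;
        for (UnaryOperator<String> transformer : transformers) {
            current = transformer.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> cleanup = String::strip;
        UnaryOperator<String> addTitle = text -> "# Title: Guide\n" + text;
        // Cleanup first: the body is stripped before the title is added.
        System.out.println(applyAll(List.of(cleanup, addTitle), "  body  "));
        // Title first: the leading spaces of the body are now mid-string and survive.
        System.out.println(applyAll(List.of(addTitle, cleanup), "  body  "));
    }
}
```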
Configuring the Store
Pass your ChunkTransformer to the store implementation:
@DependsOn("onnxEmbeddingInitializer") // ①
@Bean
DrivineStore drivineStore(
        PersistenceManager persistenceManager,
        PlatformTransactionManager platformTransactionManager, // Spring transaction manager passed to the store
        EmbeddingService embeddingService,
        ChunkTransformer chunkTransformer, // ②
        MyProperties properties) {
    return new DrivineStore(
        persistenceManager,
        properties.neoRag(),
        properties.chunkerConfig(),
        chunkTransformer, // ③
        embeddingService,
        platformTransactionManager,
        new DrivineCypherSearch(persistenceManager)
    );
}
- ① Ensure the EmbeddingService bean is registered before this configuration is wired (see note below)
- ② Inject the ChunkTransformer bean
- ③ Pass it to the store constructor
EmbeddingService beans are registered dynamically by model provider auto-configurations via registerSingleton.
If your @Configuration class injects EmbeddingService directly (as above), you should add @DependsOn on the provider’s initializer bean — e.g. @DependsOn("onnxEmbeddingInitializer") for the ONNX provider.
Without it, Spring may resolve the dependency before the initializer has run, resulting in a NoSuchBeanDefinitionException.
This is only necessary when consuming model beans directly; framework beans like ModelProvider handle this internally.
For most use cases, AddTitlesChunkTransformer is all you need.
It adds essential context that significantly improves retrieval quality without adding complexity.
Using Docling for Markdown Conversion
While we believe that you should write your Gen AI applications in Java or Kotlin, ingestion is more in the realm of data science, and Python is indisputably strong in this area.
For complex documents like PDFs, consider using Docling to convert to Markdown first:
docling https://example.com/document.pdf --from pdf --to md --output ./data
Markdown is easier to parse hierarchically and produces better chunks than raw PDF extraction.
Supported Stores
Embabel Agent provides several RAG store implementations:
Lucene (embabel-agent-rag-lucene)
Full-featured store with vector search, text search, and result expansion. Supports both in-memory and file-based persistence:
@Bean
LuceneSearchOperations luceneSearchOperations(
ModelProvider modelProvider,
RagbotProperties properties) {
var embeddingService = modelProvider.getEmbeddingService(
DefaultModelSelectionCriteria.INSTANCE);
return LuceneSearchOperations
.withName("docs")
.withEmbeddingService(embeddingService)
.withChunkerConfig(properties.chunkerConfig())
.withIndexPath(Paths.get("./.lucene-index")) // File persistence
.buildAndLoadChunks();
}
Omit .withIndexPath() for in-memory only storage.
Neo4j
Graph database store for RAG (available in separate modules embabel-agent-rag-neo-drivine and embabel-agent-rag-neo-ogm).
Ideal when you need graph relationships between content elements.
PostgreSQL pgvector (embabel-rag-pgvector)
PostgreSQL-based RAG store using the pgvector extension (available in the separate embabel/embabel-rag-pgvector repository).
Supports hybrid search combining vector similarity, full-text search via tsvector/tsquery, and fuzzy matching via pg_trgm.
Ideal when you already use PostgreSQL and want a familiar, battle-tested database for RAG.
Spring AI VectorStore (SpringVectorStoreVectorSearch)
Adapter that wraps any Spring AI VectorStore, enabling use of any vector database Spring AI supports:
public class SpringVectorStoreVectorSearch implements VectorSearch {
private final VectorStore vectorStore;
public SpringVectorStoreVectorSearch(VectorStore vectorStore) {
this.vectorStore = vectorStore;
}
@Override
public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
TextSimilaritySearchRequest request,
Class<T> clazz) {
SearchRequest searchRequest = SearchRequest
.builder()
.query(request.getQuery())
.similarityThreshold(request.getSimilarityThreshold())
.topK(request.getTopK())
.build();
List<Document> results = vectorStore.similaritySearch(searchRequest);
// ... convert results
}
}
This allows integration with Pinecone, Weaviate, Milvus, Chroma, and other stores via Spring AI.
Implementing Your Own RAG Store
To implement a custom RAG store, implement only the SearchOperations subinterfaces that are natural and efficient for your store.
This is a key design principle: stores should only implement what they can do well.
For example:
- A vector database like Pinecone might implement only VectorSearch since that’s its strength
- A full-text search engine might implement TextSearch and RegexSearchOperations
- A hierarchical document store might add ResultExpander for context expansion
- A full-featured store like Lucene can implement all interfaces
The ToolishRag facade automatically exposes only the tools that your store supports.
This means you don’t need to provide stub implementations or throw "not supported" exceptions—simply don’t implement interfaces that don’t fit your store’s capabilities.
// A store that only supports vector search
public class MyVectorOnlyStore implements VectorSearch {
@Override
public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
TextSimilaritySearchRequest request,
Class<T> clazz) {
// Implement vector similarity search
}
}
// A store that supports both vector and text search
public class MyFullTextStore implements VectorSearch, TextSearch {
@Override
public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
TextSimilaritySearchRequest request,
Class<T> clazz) {
// Implement vector similarity search
}
@Override
public <T extends Retrievable> List<SimilarityResult<T>> textSearch(
TextSimilaritySearchRequest request,
Class<T> clazz) {
// Implement full-text search
}
@Override
public String getLuceneSyntaxNotes() {
return "Full Lucene syntax supported";
}
}
For ingestion support, extend ChunkingContentElementRepository to handle document storage and chunking.
Complete Example
See the rag-demo project for a complete working example including:
- Lucene-based RAG store configuration
- Document ingestion via Tika
- Chatbot with RAG-powered responses
- Jinja prompt templates for system prompts
- Spring Shell commands for interactive testing




