Parent-Child Retriever
Alternative Names
- Parent-Document-Retriever
Required Graph Shape
A lexical graph in which each parent chunk is connected to the smaller child chunks split from it, with vector embeddings stored on the child chunks.
Context
Text embeddings represent a text’s semantic meaning. A narrower piece of text yields a more meaningful vector representation because there is less noise from multiple topics. However, if the LLM only receives a small piece of information for answer generation, that information may lack context. Retrieving the broader surrounding text in which the found information resides solves this problem.
Description
The user question is embedded with the same embedder that was previously used to create the child-chunk embeddings. A vector similarity search is executed on the child-chunk embeddings to find the k most similar chunks (k is configured beforehand by the developer or user). The parents of the found child chunks are then retrieved, together with additional metadata from each parent. Optionally, child chunks belonging to the same parent are aggregated and their scores combined by averaging or by taking the maximum.
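To make this flow concrete, here is a minimal in-memory sketch in Python, independent of any particular vector store (in practice the top-k search runs inside a vector index). The shapes of child_chunks and parents, the field names, and the cosine-similarity scoring are assumptions for illustration; the question embedding must come from the same embedding model used for the child chunks.

```python
from collections import defaultdict

import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve_parents(question_embedding, child_chunks, parents, k=4, aggregate="max"):
    """Return the parents of the k child chunks most similar to the question.

    child_chunks: list of {"parent_id", "text", "embedding"} dicts (assumed shape).
    parents:      dict mapping parent_id to {"text": ..., plus any metadata}.
    """
    # 1. Vector similarity search over the child-chunk embeddings.
    scored = [(cosine_similarity(question_embedding, c["embedding"]), c)
              for c in child_chunks]
    top_k = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    # 2. Group hits by parent and combine their scores (maximum or average).
    scores_by_parent = defaultdict(list)
    for score, chunk in top_k:
        scores_by_parent[chunk["parent_id"]].append(score)

    # 3. Return the broader parent text and metadata for answer generation.
    results = []
    for parent_id, scores in scores_by_parent.items():
        combined = max(scores) if aggregate == "max" else sum(scores) / len(scores)
        results.append({"parent_id": parent_id, "score": combined, **parents[parent_id]})
    return sorted(results, key=lambda r: r["score"], reverse=True)
```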
Usage
This pattern is a useful evolution of the Basic Retriever. It is especially valuable when a single chunk covers several topics, which degrades its embedding, whereas smaller chunks yield more meaningful vector representations and therefore better similarity-search results. With limited additional effort, better results can be obtained.
Required pre-processing
Split documents into larger chunks (parent chunks) and further split these into smaller chunks (child chunks). Use an embedding model to embed the text content of the child chunks. Embedding the parent chunks is not necessary, since they are only used for answer generation and not for the similarity search.
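A minimal sketch of this pre-processing, assuming a naive fixed-size character split and an embed_texts callable (any function that maps a list of strings to a list of vectors); both are placeholders to replace with your own splitter and embedding model. It produces the parents and children structures consumed by the retrieval sketch above.

```python
def split_text(text, chunk_size, overlap=0):
    """Naive fixed-size character splitter; stands in for a real text splitter."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def build_parent_child_chunks(documents, embed_texts, parent_size=2000, child_size=400):
    """Split documents into parent chunks, split those into child chunks,
    and embed only the child chunks."""
    parents, children = {}, []
    for doc_id, text in enumerate(documents):
        for p_idx, parent_text in enumerate(split_text(text, parent_size)):
            parent_id = f"{doc_id}-{p_idx}"
            parents[parent_id] = {"text": parent_text, "doc_id": doc_id}
            for child_text in split_text(parent_text, child_size):
                children.append({"parent_id": parent_id, "text": child_text})

    # Only the child chunks are embedded; parent chunks are kept for answer generation.
    for child, vector in zip(children, embed_texts([c["text"] for c in children])):
        child["embedding"] = vector
    return parents, children
```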
Retrieval Query
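As an illustration, if the lexical graph is stored in Neo4j with a vector index on the child chunks, the retrieval query could look roughly like the sketch below. The index name child_chunks, the Parent label, the HAS_CHILD relationship type, and the returned properties are assumptions; adapt them to the actual graph. The vector search itself uses the db.index.vector.queryNodes procedure.

```python
from neo4j import GraphDatabase

# Index name, labels, relationship type and property names are assumptions;
# adapt them to the shape of your lexical graph.
RETRIEVAL_QUERY = """
CALL db.index.vector.queryNodes('child_chunks', $k, $question_embedding)
YIELD node AS child, score
MATCH (parent:Parent)-[:HAS_CHILD]->(child)
WITH parent, max(score) AS score   // or avg(score) to average per parent
RETURN parent.text AS text, parent.source AS source, score
ORDER BY score DESC
"""


def retrieve(driver, question_embedding, k=4):
    records, _, _ = driver.execute_query(
        RETRIEVAL_QUERY, k=k, question_embedding=question_embedding
    )
    return [record.data() for record in records]


# Connection details are placeholders.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
```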
Further reading
- Advanced Retriever Techniques to Improve Your RAGs (Damian Gil, April 2024)
- Implementing advanced RAG strategies with Neo4j (November 2023)
Existing Implementations
Example Implementations
Similar Patterns
Similar patterns can be implemented on Lexical Graphs With a Sibling Structure or Lexical Graphs With a Hierarchical Structure, where the additional context comes not only from the parent document but also from sibling documents or from a preconfigured depth of the structure. The Lexical Graph With Sibling Structure is, for example, currently implemented in Neo4j’s LLM Knowledge Graph Builder.
Note that there are two kinds of retrievers possible on a Lexical Graph With a Hierarchical Structure:
- Bottom-up: Execute retrieval on the leaf nodes and retrieve other chunks higher up in the tree (see Going Meta — Ep 24: KG+LLMs: Ontology driven RAG patterns); a sketch of this variant follows the list.
- Top-down: Use the top-level nodes to determine which subtree(s) to consider for retrieval. Iterate this methodology until the set of nodes for the similarity search is reasonably narrowed down (see RAG Strategies — Hierarchical Index Retrieval).
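As a rough sketch of the bottom-up variant, the query below (with an assumed leaf_chunks index, HAS_CHILD relationship, and a traversal depth of up to two levels) runs the vector search on the leaf chunks and then collects ancestor chunks higher up in the tree as additional context.

```python
# Bottom-up sketch: vector search on the leaf chunks, then climb the hierarchy.
# The index name, relationship type and traversal depth (*1..2) are assumptions.
BOTTOM_UP_QUERY = """
CALL db.index.vector.queryNodes('leaf_chunks', $k, $question_embedding)
YIELD node AS leaf, score
MATCH (ancestor)-[:HAS_CHILD*1..2]->(leaf)
RETURN leaf.text AS leaf_text,
       collect(DISTINCT ancestor.text) AS context,
       score
ORDER BY score DESC
"""
```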