
What is RAG?

A RAG (Retrieval-Augmented Generation) application is built on a multi-stage architecture that retrieves information from external knowledge sources at query time and passes it to a generative language model.
The architecture can be divided into two main phases: data import and response generation.

This architecture combines the strengths of search and retrieval techniques with the generation capabilities of modern language models. The system does not rely solely on the knowledge the model acquired during pre-training; it can also draw on relevant information retrieved for each query, which helps it give more accurate and better-grounded answers.

RAG architecture (Phase 1: document import / Phase 2: user query)

Phase 1: Importing the documents, chunking and embedding (data import)

First, the relevant documents or data records are imported into the system. This step involves collecting and preparing data from various sources, e.g. files (PDF, HTML, DOC, XLS) or databases.

Chunking:
The imported documents are then divided into smaller, manageable units (chunks). These smaller sections make it possible to search for specific information and increase the efficiency of the retrieval process.
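
The chunking step can be sketched in Python as follows. This is a minimal illustration, assuming fixed-size character windows with a small overlap so that text cut at a boundary still appears whole in one chunk. The function name `chunk_text` and the parameters `chunk_size` and `overlap` are assumptions made for this example; real systems often split on sentences, paragraphs or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative strategy only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap reduces the risk of splitting a relevant passage across two chunks, at the cost of storing more redundant text.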

Embedding:
After chunking, the text sections are converted into numerical vectors. This process, called embedding, gives each chunk a mathematical representation that captures the semantic meaning of its text. Because chunks with similar meaning end up with similar vectors, the system can measure how closely a piece of information matches a request and pick out the most suitable chunks.
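
To make the idea of text as vectors concrete, here is a rough Python sketch. It uses a hashed bag-of-words as a stand-in for a trained embedding model, so it only captures word overlap and not real semantic meaning. The names `embed` and `cosine_similarity` and the dimension of 64 are assumptions made for this example; a production system would call an actual embedding model at this point.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into one of `dim` buckets, then normalise."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib gives the same bucket in every run (unlike the built-in hash()).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Scale to unit length so that the dot product equals the cosine similarity.
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of two vectors; equals the cosine for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    v1 = embed("The invoice is due in 30 days")
    v2 = embed("When is the invoice due?")
    print(round(cosine_similarity(v1, v2), 3))
```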

Phase 2: User query (response generation)

Retrieval phase:
Once the documents have been divided into chunks and embedded as vectors, they can be searched. When a user submits a query, the query is embedded in the same way, and the system looks for the chunks whose vectors are most similar to the query vector. These best-matching chunks are extracted as the basis for the response.
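
The retrieval step can be sketched as a top-k similarity search. The Python example below assumes the chunks and the query have already been embedded; the three-dimensional vectors are made up by hand purely for illustration, and the names `retrieve` and `top_k` are assumptions. It scans every chunk, which is fine for small collections, whereas larger systems typically use a vector database or an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k (score, chunk) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hand-made example vectors; a real index would hold model-generated embeddings.
    index = [
        ("Chunk about invoices", [0.9, 0.1, 0.0]),
        ("Chunk about holidays", [0.0, 0.2, 0.9]),
        ("Chunk about payments", [0.7, 0.6, 0.1]),
    ]
    query_vec = [1.0, 0.2, 0.0]
    for score, chunk in retrieve(query_vec, index, top_k=2):
        print(f"{score:.3f}  {chunk}")
```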

Generation phase:

Finally, the generation model uses the retrieved, relevant chunks to generate a precise, coherent answer. This combines the knowledge from the retrieved information with the language generation capability of the model.
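
One common way to hand the retrieved chunks to the model is to place them in the prompt ahead of the user's question. The Python sketch below shows that assembly step only. The prompt wording, the names `build_prompt` and `generate_answer`, and the `llm` callable are all assumptions made for this example: `llm` stands for whatever function sends a prompt to your language model and returns its text, and no specific provider API is implied.

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def generate_answer(question: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """Send the assembled prompt to a caller-supplied model function."""
    return llm(build_prompt(question, chunks))


if __name__ == "__main__":
    retrieved = ["Invoices are due within 30 days.", "Late payments incur a 2% fee."]
    # Placeholder model that just reports the prompt length; swap in a real client.
    fake_llm = lambda prompt: f"(model output for a {len(prompt)}-character prompt)"
    print(generate_answer("When is an invoice due?", retrieved, fake_llm))
```

Numbering the chunks in the prompt makes it easier to ask the model to cite which chunk supports each part of its answer.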