RAG-with-GCP

Grounding in GenAI

Grounding in Generative AI is the practice of basing AI-generated outputs on real, verifiable, and specific data or context. It ensures that responses are not just plausible-sounding, but reliable and factually accurate.

Grounding is especially critical in high-stakes domains like healthcare, law, education, and finance, where incorrect or hallucinated content can cause harm.

Types of Grounding

Data Grounding

The model queries a structured database or real-time API and answers from the returned data.

Example: Pulling stock prices from a finance API before generating a summary.
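A minimal sketch of the stock-price example, assuming a finance API client is available as a function. The names Quote, build_grounded_prompt, and fake_fetch_quote are hypothetical, and the quote values are made up for illustration; a real application would swap in its own API client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quote:
    symbol: str
    price: float
    currency: str


def build_grounded_prompt(symbol: str, fetch_quote: Callable[[str], Quote]) -> str:
    """Fetch a quote first, then place it in the prompt as trusted context."""
    quote = fetch_quote(symbol)
    return (
        "Use only the trusted data below to write a one-sentence summary.\n"
        f"Data: {quote.symbol} last traded at {quote.price:.2f} {quote.currency}.\n"
        "Summary:"
    )


def fake_fetch_quote(symbol: str) -> Quote:
    """Hypothetical stand-in for a real finance API call (made-up values)."""
    sample = {"ACME": Quote("ACME", 123.5, "USD")}
    return sample[symbol]


prompt = build_grounded_prompt("ACME", fake_fetch_quote)
print(prompt)
```

The resulting prompt string is what would be sent to the LLM, so the summary is written from the fetched number rather than from the model's memory.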

Document Grounding (RAG)

Retrieval-Augmented Generation (RAG) pulls text from a set of documents (e.g., PDFs, manuals, articles) to answer user queries.

Example: Answering customer questions based on a product manual.
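The retrieval half of RAG can be sketched with the standard library alone. This is an illustration, not how a production system would do it: it ranks passages by bag-of-words cosine similarity, whereas real pipelines typically use learned embeddings and a vector store. The manual passages are invented for the example.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages most similar to the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


# Made-up product manual passages, for illustration only.
manual = [
    "To reset the router, hold the reset button for ten seconds.",
    "The warranty covers hardware defects for two years.",
    "Connect the power adapter before switching on the device.",
]

top = retrieve("How do I reset the router?", manual, k=1)
print(top[0])
```

The retrieved passage would then be added to the prompt as context before the model generates its answer.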

User Context Grounding

The model uses user-specific inputs or history to tailor responses.

Example: Personalizing a workout plan based on previous sessions.

Tools and Libraries for Grounding

LangChain – For chaining LLMs and retrievers.

LlamaIndex – Optimized for document indexing and RAG.

FAISS, Weaviate, Pinecone – Vector stores.

OpenAI, HuggingFace, Cohere – For embeddings and LLMs.

Example


1) Data retrieval

Retrieve data that is relevant to the problem.

This data can include data about the logged-in user and other information that is not available in the foundation model.

2) Augmented prompt

Add the data to the prompt.

Include the retrieved information in the prompt as context, and tell the model that the data is trusted and may be used in the response.

3) Generated response

The generated response is more relevant and reliable.

The model generates a response that draws on both its own knowledge and the retrieved data that the app provided in the prompt.
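The three steps above can be sketched as one function. This is a minimal outline under stated assumptions: retrieve and generate are passed in as callables, and the stand-ins fake_retrieve and echo_model (both hypothetical, with made-up data) take the place of a real data source and a real LLM call.

```python
from typing import Callable, List


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    # 1) Data retrieval: fetch facts the foundation model does not have.
    context = retrieve(question)

    # 2) Augmented prompt: add the data and mark it as trusted.
    context_block = "\n".join(f"- {item}" for item in context)
    prompt = (
        "The context below is trusted. Use it to answer the question.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )

    # 3) Generated response: the model answers from the augmented prompt.
    return generate(prompt)


def fake_retrieve(question: str) -> List[str]:
    """Hypothetical lookup of data for the logged-in user (made-up values)."""
    return ["Plan: Premium", "Renewal date: 2025-01-01"]


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it returns the prompt it received."""
    return "ECHO:\n" + prompt


reply = answer_with_rag("What plan am I on?", fake_retrieve, echo_model)
print(reply)
```

Passing the retriever and the model as parameters keeps the flow independent of any particular vector store or LLM provider.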

Detailed view of data ingestion in a RAG architecture on Google Cloud:

1) Data is uploaded to a Cloud Storage bucket. The data source might be an application user performing an upload, database ingestion, or streaming data.

2) When data is uploaded, a notification is sent to a Pub/Sub topic.

3) Pub/Sub triggers a Cloud Run job to process the uploaded data.

4) Cloud Run starts the job by using configuration data that's stored in an AlloyDB for PostgreSQL database.

5) The Cloud Run job uses Document AI to prepare the data for further processing.

For example, the preparation can include parsing the data, converting the data to the required format, and dividing the data into chunks.

6) The Cloud Run job uses the Vertex AI Embeddings for Text model to create vector embeddings of the ingested data.

7) Cloud Run stores the embeddings in an AlloyDB for PostgreSQL database that has the pgvector extension enabled.
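The chunk, embed, and store steps of this flow can be sketched locally. Everything here is a simplified stand-in rather than the Google Cloud implementation: chunk_text is a naive character-window splitter (Document AI does the real preparation), toy_embed is a hypothetical placeholder for the Vertex AI embedding model, and a Python list stands in for the AlloyDB table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    doc_id: str
    chunk: str
    embedding: List[float]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows of at most `size`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: normalized vowel counts (illustration only)."""
    counts = [text.lower().count(v) for v in "aeiou"]
    total = sum(counts) or 1
    return [c / total for c in counts]


def ingest(
    doc_id: str,
    text: str,
    embed: Callable[[str], List[float]],
    store: List[Record],
    size: int = 200,
    overlap: int = 50,
) -> int:
    """Chunk a document, embed each chunk, and append the records to the store."""
    chunks = chunk_text(text, size, overlap)
    for chunk in chunks:
        store.append(Record(doc_id, chunk, embed(chunk)))
    return len(chunks)


store: List[Record] = []
count = ingest("manual", "abcdefghij", toy_embed, store, size=4, overlap=2)
print(count, [r.chunk for r in store])
```

Overlapping windows are one common way to avoid cutting a sentence's context in half at a chunk boundary; the sizes used here are arbitrary.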

When the serving subsystem processes user requests, it uses the embeddings in the vector database to retrieve relevant domain-specific data.
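On the serving side, retrieval from a pgvector-enabled database usually comes down to a nearest-neighbor query. The sketch below only builds the SQL and its parameters; it does not connect to a database. The table name doc_chunks and the columns chunk and embedding are assumptions made for illustration, and the %s placeholders assume a psycopg-style driver. The <=> operator is pgvector's cosine distance.

```python
from typing import Sequence, Tuple


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_similarity_query(
    query_embedding: Sequence[float], k: int = 5
) -> Tuple[str, Tuple[str, int]]:
    """Return SQL and parameters that fetch the k chunks closest to the query."""
    # doc_chunks, chunk, and embedding are hypothetical names.
    sql = (
        "SELECT chunk FROM doc_chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (to_pgvector_literal(query_embedding), k)


sql, params = build_similarity_query([0.1, 0.2, 0.3], k=3)
print(sql)
print(params)
```

The query embedding must come from the same embedding model that was used during ingestion, otherwise the distances are not meaningful.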
