Contextual Advertising

This repository provides a code base for building a Retrieval-Augmented Generation (RAG) agent on Databricks that can recommend optimal ad placement locations in movies based on user inquiries. The solution can be applied more broadly to contextual advertising use cases for any large corpus of media content (e.g., TV scripts, news articles, blogs, social media posts, etc.)

🎯 Project Overview

The contextual advertising agent is an AI-powered system that leverages Mosaic AI on Databricks to understand movie scripts and identify the best moments to insert advertising content.

Architecture

1. Data Sources: Movie scripts or media content stored in cloud storage or external systems

This code leverages a public dataset of movie scripts for ad placement
The solution can be generalized to work for other content, including TV scripts, blogs, news articles, audio/podcast transcripts, or other media content

2. Data Preprocessing: Unstructured text is ingested, parsed, cleansed, and chunked. We then create embeddings from the processed text chunks and index them in a Databricks Vector Store.

3. Agent Development: Ad Placement agent leverages vector search retriever tool, LangGraph, MLflow, and LLM of choice (in this example we use a Claude model)

4. Agent Evaluation: Agent quality continuously improves through LLM judges, custom judges, human feedback, and iterative development loop

5. Agent Deployment: Agent Framework deploys agent to a Databricks model serving endpoint, governed, secured, and monitored through AI Gateway

6. App Usage: Exposes Agent to end users through Databricks Apps or custom app; log all user feedback and logs back to Databricks for continuous quality improvement

📁 Project Structure

ad-placement-agent/
├── 00_Movie_Dataset_Creation.ipynb    # Script scraping and dataset creation
├── 01a_Data_Preparation.ipynb         # Data preprocessing and cleaning
├── 01b_Data_Preparation_Images.ipynb  # Image processing for movie posters
├── 02_Agent_Definition.ipynb          # RAG agent definition, configuration and deployment
├── 03_Agent_Evaluation.ipynb          # Agent evaluation 
├── requirements.txt                   # Centralized dependency management
├── .gitignore                         # Git ignore patterns
├── LICENSE                            # Project license
├── images/
│   └── ad-placement-architecture.png  # System architecture diagram
├── resources/
│   ├── 00-init.ipynb                  # Initialization notebook
│   └── config                         # Configuration file
└── mcp/                               # Model Context Protocol server implementation

📖 Usage

1. Dataset Creation

Run 00_Movie_Dataset_Creation.ipynb to:

Fetch all movie scripts from IMSDb
Extract metadata (title, genre, rating, etc.)
Store data in Unity Catalog tables

2. Data Preparation

Execute the data preparation notebooks:

01a_Data_Preparation.ipynb: Process, clean, and chunk script data to build Vector Search Index
01b_Data_Preparation_Images.ipynb: Generate and index embeddings from movie posters - not used, but still available for extensions!

3. Agent Definition

Use 02_Agent_Definition.ipynb to:

Define the RAG agent with vector search retrieval tool
Test agent with example prompt
Register and deploy the agent with MLflow and Unity Catalog Agent Framework

4. Evaluation & Deployment

Run 03_Agent_Evaluation.ipynb to:

Perform automated and human evaluations using MLflow 3.0 and Agent Evaluation

🎬 Example

Once the agent is deployed, query the endpoint with a description of the advertisement you would like to place. Example below:

# Example query to the deployed agent
query = "When could I insert a commercial for a light-hearted basketball-themed comedy movie we want to promote for next summer?"

# The agent will analyze movie scripts and return recommendations
# based on scene context, genre, and timing

🤝 Contributing

We're always open to improving this and any contributions! Please follow the below instructions or file an issue.

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

For more information, visit Contributing

📄 License

This project is licensed under the terms of the Databricks License.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.github		.github
images		images
mcp		mcp
resources		resources
.gitignore		.gitignore
00_Movie_Dataset_Creation.ipynb		00_Movie_Dataset_Creation.ipynb
01a_Data_Preparation.ipynb		01a_Data_Preparation.ipynb
02_Agent_Definition.ipynb		02_Agent_Definition.ipynb
03_Agent_Evaluation.ipynb		03_Agent_Evaluation.ipynb
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
NOTICE.md		NOTICE.md
README.md		README.md
SECURITY.md		SECURITY.md
agent.py		agent.py
config.yml		config.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Contextual Advertising

🎯 Project Overview

Architecture

📁 Project Structure

📖 Usage

1. Dataset Creation

2. Data Preparation

3. Agent Definition

4. Evaluation & Deployment

🎬 Example

🤝 Contributing

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Uh oh!

License

Uh oh!

databricks-industry-solutions/contextual-content-placement

Folders and files

Latest commit

History

Repository files navigation

Contextual Advertising

🎯 Project Overview

Architecture

📁 Project Structure

📖 Usage

1. Dataset Creation

2. Data Preparation

3. Agent Definition

4. Evaluation & Deployment

🎬 Example

🤝 Contributing

📄 License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages