Skip to content

databricks-industry-solutions/contextual-content-placement

Repository files navigation

Contextual Advertising

This repository provides a code base for building a Retrieval-Augmented Generation (RAG) agent on Databricks that can recommend optimal ad placement locations in movies based on user inquiries. The solution can be applied more broadly to contextual advertising use cases for any large corpus of media content (e.g., TV scripts, news articles, blogs, social media posts, etc.)

🎯 Project Overview

The contextual advertising agent is an AI-powered system that leverages Mosaic AI on Databricks to understand movie scripts and identify the best moments to insert advertising content.

Architecture

Ad Placement Architecture

1. Data Sources: Movie scripts or media content stored in cloud storage or external systems

  • This code leverages a public dataset of movie scripts for ad placement
  • The solution can be generalized to work for other content, including TV scripts, blogs, news articles, audio/podcast transcripts, or other media content

2. Data Preprocessing: Unstructured text is ingested, parsed, cleansed, and chunked. We then create embeddings from the processed text chunks and index them in a Databricks Vector Store.

3. Agent Development: Ad Placement agent leverages vector search retriever tool, LangGraph, MLflow, and LLM of choice (in this example we use a Claude model)

4. Agent Evaluation: Agent quality continuously improves through LLM judges, custom judges, human feedback, and iterative development loop

5. Agent Deployment: Agent Framework deploys agent to a Databricks model serving endpoint, governed, secured, and monitored through AI Gateway

6. App Usage: Exposes Agent to end users through Databricks Apps or custom app; log all user feedback and logs back to Databricks for continuous quality improvement

📁 Project Structure

ad-placement-agent/
├── 00_Movie_Dataset_Creation.ipynb    # Script scraping and dataset creation
├── 01a_Data_Preparation.ipynb         # Data preprocessing and cleaning
├── 01b_Data_Preparation_Images.ipynb  # Image processing for movie posters
├── 02_Agent_Definition.ipynb          # RAG agent definition, configuration and deployment
├── 03_Agent_Evaluation.ipynb          # Agent evaluation 
├── requirements.txt                   # Centralized dependency management
├── .gitignore                         # Git ignore patterns
├── LICENSE                            # Project license
├── images/
│   └── ad-placement-architecture.png  # System architecture diagram
├── resources/
│   ├── 00-init.ipynb                  # Initialization notebook
│   └── config                         # Configuration file
└── mcp/                               # Model Context Protocol server implementation

📖 Usage

1. Dataset Creation

Run 00_Movie_Dataset_Creation.ipynb to:

  • Fetch all movie scripts from IMSDb
  • Extract metadata (title, genre, rating, etc.)
  • Store data in Unity Catalog tables

2. Data Preparation

Execute the data preparation notebooks:

  • 01a_Data_Preparation.ipynb: Process, clean, and chunk script data to build Vector Search Index
  • 01b_Data_Preparation_Images.ipynb: Generate and index embeddings from movie posters - not used, but still available for extensions!

3. Agent Definition

Use 02_Agent_Definition.ipynb to:

  • Define the RAG agent with vector search retrieval tool
  • Test agent with example prompt
  • Register and deploy the agent with MLflow and Unity Catalog Agent Framework

4. Evaluation & Deployment

Run 03_Agent_Evaluation.ipynb to:

🎬 Example

Once the agent is deployed, query the endpoint with a description of the advertisement you would like to place. Example below:

# Example query to the deployed agent
query = "When could I insert a commercial for a light-hearted basketball-themed comedy movie we want to promote for next summer?"

# The agent will analyze movie scripts and return recommendations
# based on scene context, genre, and timing

Example response

🤝 Contributing

We're always open to improving this and any contributions! Please follow the below instructions or file an issue.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

For more information, visit Contributing

📄 License

This project is licensed under the terms of the Databricks License.

About

No description, website, or topics provided.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •