
Yash-Vekaria/genai-assistants


Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants

Crawling Infrastructure Setup and Usage

Overview

This project provides an auditing framework for assessing tracking, profiling, and personalization in Generative AI (GenAI) browser assistants. These assistants, implemented as browser extensions, offer features such as search integration, querying, and page summarization. Because they are built on large language models (LLMs) and run inside the browser, they can access and process a wide range of user data.

Research Objectives

In the context of GenAI browser assistants, the core objectives of our research are to:

  • Audit user tracking to understand how these assistants collect and share users' data.
  • Audit profiling and personalization to examine whether these assistants profile user attributes (e.g., location, age, gender, income, and interests) and personalize their responses accordingly.

Expected Outcome

Our framework and analysis pipeline aim to provide a reproducible methodology for capturing and analyzing network traffic from GenAI browser extensions.


Project Files and Structure

Main Scripts

  • parse_flows_to_csvs.py
    This Python script parses .flow files into summarized CSV files, extracting key details about each relevant flow: timestamp, request domain, payload, response type, cookies, the domain's parent organization, and the domain's tracker category from the Disconnect list.

  • generate_entity_domain_mapping.py
    This Python script parses the individual entity-domain mapping files from DuckDuckGo's tracker-radar repository and consolidates them into a single ddg.json. This file maps domains to their parent organizations and is used to detect third-party relationships.
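The consolidation step can be pictured with the minimal sketch below. It is not the repository's generate_entity_domain_mapping.py. It assumes a local clone of DuckDuckGo's tracker-radar whose `entities/` directory holds one JSON file per entity, each with a `name` string and a `properties` list of domains, and it assumes ddg.json is a flat {domain: entity name} object. The function names and the clone path are hypothetical.

```python
import json
from pathlib import Path


def build_domain_entity_map(entities_dir):
    """Map every domain listed in a tracker-radar entity file to that entity's name.

    Assumption: each *.json file has a "name" string and a "properties" list of domains.
    """
    mapping = {}
    for path in sorted(Path(entities_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            entity = json.load(fh)
        name = entity.get("name")
        if not name:
            continue  # skip files that do not name an owner organization
        for domain in entity.get("properties", []):
            mapping[domain.lower()] = name
    return mapping


def write_mapping(mapping, out_path="ddg.json"):
    """Write the consolidated mapping as a single JSON object."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical path to a local clone of the tracker-radar repository.
    write_mapping(build_domain_entity_map("tracker-radar/entities"))
```

A flat dictionary keeps lookups to a single hash access per candidate domain, which matters when classifying thousands of requests per flow file.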

Data Files

  • disconnect.json
    Contains a classification of tracker domains from the Disconnect Tracking Protection List. It assigns categories (e.g., Analytics, Advertising, Social) to domains observed in network flows.

  • ddg.json
    Maps each domain to its parent organization (entity) based on DuckDuckGo’s Tracker Radar. The file is generated by running generate_entity_domain_mapping.py. The copy we generated is too large for GitHub, but it is available at https://doi.org/10.5281/zenodo.15530229.
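The two data files can be combined to label a request, as in the minimal sketch below. It is an illustration, not the repository's code. It assumes disconnect.json keeps the services.json layout `{"categories": {Category: [{Org: {org_url: [domains], ...}}, ...]}}` (non-list values stored next to the URL key are skipped) and that ddg.json is a flat {domain: entity} object. The helpers `load_disconnect_categories`, `lookup`, and `is_third_party` are hypothetical names, and the two-label fallback is a crude stand-in for a proper public-suffix lookup.

```python
import json


def load_disconnect_categories(path="disconnect.json"):
    """Flatten disconnect.json into {domain: set of categories}."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    categories = {}
    for category, entries in data.get("categories", {}).items():
        for entry in entries:  # each entry looks like {Org: {org_url: [domains], ...}}
            for org_info in entry.values():
                for value in org_info.values():
                    if isinstance(value, list):  # ignore string flags stored beside the URL
                        for domain in value:
                            categories.setdefault(domain.lower(), set()).add(category)
    return categories


def lookup(host, table):
    """Return table's value for host or its closest listed parent domain, else None."""
    labels = host.lower().strip(".").split(".")
    for i in range(len(labels) - 1):  # try a.b.example.com, b.example.com, example.com
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None


def _last_two_labels(host):
    return host.lower().strip(".").split(".")[-2:]


def is_third_party(request_host, site_host, entity_map):
    """Third-party unless both hosts resolve to the same owner entity."""
    req_owner = lookup(request_host, entity_map)
    site_owner = lookup(site_host, entity_map)
    if req_owner is not None and site_owner is not None:
        return req_owner != site_owner
    # Fallback when an owner is unknown: compare the last two labels only.
    return _last_two_labels(request_host) != _last_two_labels(site_host)


if __name__ == "__main__":
    with open("ddg.json", encoding="utf-8") as fh:
        entity_map = json.load(fh)  # assumed flat {domain: entity} layout
    trackers = load_disconnect_categories("disconnect.json")
    host, site = "www.google-analytics.com", "www.example.com"
    print(lookup(host, trackers), is_third_party(host, site, entity_map))
```

Walking up the subdomain chain lets a request to `pixel.tracker.com` inherit the label of `tracker.com` when only the registrable domain is listed.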

Directories

  • Flows/
    Contains the network traffic captured with Mitmproxy, stored as .flow files, while interacting with GenAI browser extensions. This directory sits in the current working directory and has one sub-directory per extension under test (e.g., Merlin). The repository includes the following sample .flow files:

    • Merlin-Lin-Control.flow
    • Merlin-Lin-Search.flow
    • Merlin-Lin-Browse.flow
    • Merlin-Lin-Summarize.flow
  • Output/
    Contains the CSV files produced by running parse_flows_to_csvs.py on the captured .flow files. A sample CSV for the Merlin extension is included as Merlin.csv.

Dependencies

  • requirements.txt
    Lists Python package dependencies required to run the analysis pipeline. Install dependencies by running:
    pip install -r requirements.txt  

Installation

As a framework for auditing how browser assistants collect, store, and share information, we propose a man-in-the-middle approach. The goal is to route Chrome's traffic through Mitmproxy and install Mitmproxy's certificate, so that all network traffic leaving the browser (with the extension installed) is captured by the proxy in decrypted form.

  1. Download Mitmproxy

    • Visit the official Mitmproxy website and follow installation guidelines for your operating system to install the proxy on your device.
  2. Extract Chrome Executable and Profile Paths to Automatically Launch Chrome Session

    • Start Mitmproxy's web interface by running the mitmweb command in a terminal.
    • Create a new Chrome profile:
      • Open Chrome.
      • At the top-right, click on the Profile icon (a circle with your account picture) and select Add.
      • In the pop-up window, choose Continue without an account.
      • Enter a name.
      • Click Done.
      • For more detailed instructions, refer to the official Chrome help page.
      • Open Chrome and navigate to chrome://version/.
      • Take note of the Executable Path and Profile Path; you will need them to launch Chrome through the proxy.
  3. Start a Chrome Session

    • Open a terminal and navigate to the directory containing the Chrome executable. For Windows 11, the typical path is:

      C:\Program Files\Google\Chrome\Application\chrome.exe
      
    • For macOS, the typical path is:

      /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
      
    • For Linux, the typical path is:

      /usr/bin/google-chrome
      
    • Run the following command to start Chrome with the Mitmproxy proxy server.

      For Windows 11, the required command is:

      .\chrome.exe --proxy-server="localhost:8080" --user-data-dir="C:\Users\<YourUsername>\AppData\Local\Google\Chrome\User Data\Profile<ProfileNumber>"
      

      For macOS, the required command is (spaces in the executable path must be escaped with \, as shown):

      /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="localhost:8080" --user-data-dir="/Users/<YourUsername>/Library/Application Support/Google/Chrome/Profile<ProfileNumber>"
      

      For Linux, the required command is:

      /usr/bin/google-chrome --proxy-server="localhost:8080" --user-data-dir="/home/<YourUsername>/.config/google-chrome/Profile<ProfileNumber>"
      

      Replace <YourUsername> and <ProfileNumber> with appropriate values.

  4. Download and Install the Certificate in the Opened Session

    • Visit mitm.it in the Chrome instance configured above.
    • Download the Mitmproxy certificate.
    • Install the certificate by:
      • Navigating to Chrome's settings: Settings → Privacy and Security → Security.
      • Selecting Manage Certificates and importing the downloaded certificate into the Trusted Root Certification Authorities.
    • Follow the prompts to complete the installation. The certificate persists across sessions, so you only need to install it once.
  5. Verify Certificate Installation

    • Open Chrome and navigate to:
      Settings → Privacy and Security → Security → Manage Certificates.
      
    • Under Trusted Root Certification Authorities, confirm the Mitmproxy certificate is listed.
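The launch commands from steps 2 and 3 can also be assembled programmatically. The minimal sketch below reuses the default executable paths listed above and the proxy address localhost:8080; the helper name and the profile directory in the usage block are hypothetical. Passing an argument list instead of a shell string sidesteps the space-escaping needed for the macOS path.

```python
import platform
import subprocess
from pathlib import Path

# Default Chrome locations listed in this README; adjust if Chrome lives elsewhere.
CHROME_PATHS = {
    "Windows": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Linux": "/usr/bin/google-chrome",
}


def build_chrome_command(user_data_dir, system=None, proxy="localhost:8080"):
    """Return the argv list that starts Chrome with its traffic routed through Mitmproxy."""
    system = system or platform.system()  # "Windows", "Darwin" (macOS), or "Linux"
    if system not in CHROME_PATHS:
        raise ValueError(f"unsupported OS: {system}")
    return [
        CHROME_PATHS[system],
        f"--proxy-server={proxy}",
        f"--user-data-dir={user_data_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical dedicated audit profile directory under the home folder.
    profile_dir = Path.home() / "chrome-audit" / "Profile1"
    subprocess.Popen(build_chrome_command(str(profile_dir)))
```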

Usage

Now that Mitmproxy is installed and configured for your Chrome browser, you can audit any browser assistant. This section explains how to capture and save detailed network traffic with Mitmproxy while running experiments with a browser extension.

  1. Start Mitmproxy Web Interface

    • Run the following command to start Mitmproxy's web interface and write all intercepted traffic to the given flow file. This terminal acts as the Mitmproxy server for the rest of the session.

      For Windows:

      mitmweb.exe -w <flowFileName>.flow


      For macOS:

      mitmweb -w <flowFileName>.flow


      For Linux:

      ./mitmweb -w <flowFileName>.flow
      
  2. Launch Chrome with Proxy Configuration

    • In another terminal, run the command for your operating system to launch a Chrome session that routes its traffic through Mitmproxy:

    For Windows:

    .\chrome.exe --proxy-server="localhost:8080" --user-data-dir="C:\Users\<YourUsername>\AppData\Local\Google\Chrome\User Data\Profile<ProfileNumber>"
    

    For macOS:

    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="localhost:8080" --user-data-dir="/Users/<YourUsername>/Library/Application Support/Google/Chrome/Profile<ProfileNumber>"
    

    For Linux:

    /usr/bin/google-chrome --proxy-server="localhost:8080" --user-data-dir="/home/<YourUsername>/.config/google-chrome/Profile<ProfileNumber>"
    

    Replace <YourUsername> and <ProfileNumber> with the appropriate values for your system.

    • This command creates Profile<ProfileNumber> if it does not already exist. You can replace Profile<ProfileNumber> with any profile name or number (e.g., Profile1).
    • Traffic from this Chrome instance will now be routed through Mitmproxy.
  3. Install GenAI Extension

    • From the official Chrome Web Store, install the Generative AI browser extension you want to audit. You can then evaluate the extension by performing different interactions with it. To reproduce the exact experiments from the paper, refer to Section 5 of the paper.
  4. Inspect Traffic (Optional)

    • Use Mitmproxy’s web interface to monitor and analyze intercepted HTTP/S traffic.
  5. Inspect Flow File (Optional)

    • To inspect previously saved flows, use the -r flag with the flow file:
      mitmweb.exe -r <flowFileName>.flow
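Capturing one scenario per flow file can also be scripted. The minimal sketch below assumes mitmweb is on your PATH and mirrors the naming of the sample files in Flows/ (<Extension>-<Tag>-<Scenario>.flow, where the middle tag, Lin in the samples, is treated as an opaque label). The helper names and the press-Enter-to-stop loop are hypothetical conveniences, not part of the repository.

```python
import subprocess
from pathlib import Path

# Scenario names taken from the sample flow files (e.g. Merlin-Lin-Search.flow).
SCENARIOS = ("Control", "Search", "Browse", "Summarize")


def flow_file_path(extension, tag, scenario, root="Flows"):
    """Return Flows/<extension>/<extension>-<tag>-<scenario>.flow."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    return Path(root) / extension / f"{extension}-{tag}-{scenario}.flow"


def capture_command(flow_path, binary="mitmweb"):
    """Argv that starts mitmweb and writes intercepted flows to flow_path via -w."""
    return [binary, "-w", str(flow_path)]


if __name__ == "__main__":
    path = flow_file_path("Merlin", "Lin", "Search")
    path.parent.mkdir(parents=True, exist_ok=True)  # make sure Flows/Merlin exists
    proc = subprocess.Popen(capture_command(path))
    try:
        input("Capturing; run the scenario in Chrome, then press Enter to stop...")
    finally:
        proc.terminate()  # stop the proxy once the scenario is finished
        proc.wait()
```

Keeping one file per scenario matches the layout of the sample files and lets each scenario be parsed and compared separately later.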
      

Flow File Analyzer

This tool analyzes .flow files and extracts their data into CSV format.

Features

  • Parse .flow files and save network data to CSV.

Usage

  1. Install Python Dependencies: Set up a Python virtual environment and install the required dependencies:

    Create a virtual environment:

    python3 -m venv <envName>

    Activate the created virtual environment:

    source <envName>/bin/activate

    (On Windows, run <envName>\Scripts\activate instead.)

    Install requirements:

    pip install -r requirements.txt
  2. Download Benchmarks: For the Disconnect list, download services.json from https://github.com/disconnectme/disconnect-tracking-protection/blob/master/services.json, place it in the current working directory, and rename it to disconnect.json. This JSON file contains an up-to-date classification of trackers (tracking domains) into one of the following categories: Email, EmailAggressive, Advertising, Content, Analytics, FingerprintingInvasive, FingerprintingGeneral, Anti-fraud, Social, ConsentManagers, and Cryptomining. We use this mapping to identify and label network request endpoints that are ATS (advertising and tracking services).

    For DuckDuckGo’s mapping, run the following command and place the generated ddg.json in the current working directory. This JSON file maps domains to their parent (owner) organizations, which lets us group related domains under one entity and accurately classify requests as first-party or third-party. Use the latest versions of both JSON files, since outdated mappings can lead to incomplete or inaccurate classification.

    python3 generate_entity_domain_mapping.py
  3. Analyze Flow Files:

To analyze .flow files and categorize requests:

  • Place your .flow files in the directory specified within parse_flows_to_csvs.py.

  • Run the script:

    python3 parse_flows_to_csvs.py

    Alternatively, pass the name of the extension to analyze:

    python3 parse_flows_to_csvs.py <extensionName>
  • Replace <extensionName> with the name of the extension you want to analyze (e.g., Merlin).

  • The resulting CSV file will be saved in the Output folder.
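The parsing step can be approximated with mitmproxy's own flow reader, as in the minimal sketch below. It is not the repository's parse_flows_to_csvs.py: it writes only a subset of the columns (the owner-organization and Disconnect-category columns are omitted), and the column names, helper names, and Flows/<extension>/ glob are assumptions. mitmproxy is imported lazily so that `flow_to_row` can be tested without it.

```python
import csv
from pathlib import Path

FIELDS = ["timestamp", "method", "host", "url", "payload", "response_type", "cookies"]


def flow_to_row(flow):
    """Summarize one mitmproxy HTTPFlow as a CSV row (a subset of the real script's fields)."""
    req, resp = flow.request, flow.response
    return {
        "timestamp": req.timestamp_start,
        "method": req.method,
        "host": req.pretty_host,
        "url": req.pretty_url,
        # strict=False decodes the body without raising on unusual encodings.
        "payload": req.get_text(strict=False) or "",
        "response_type": resp.headers.get("content-type", "") if resp is not None else "",
        "cookies": req.headers.get("cookie", ""),
    }


def iter_http_flows(flow_path):
    """Yield the HTTP flows stored in a .flow file written by `mitmweb -w`."""
    from mitmproxy import http, io  # lazy import: only needed when reading real files

    with open(flow_path, "rb") as fh:
        for flow in io.FlowReader(fh).stream():
            if isinstance(flow, http.HTTPFlow):
                yield flow


def write_csv(rows, out_path):
    """Write rows to out_path, creating the parent folder (e.g. Output/) if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    extension = "Merlin"  # example extension sub-directory under Flows/
    rows = [
        flow_to_row(flow)
        for flow_file in sorted(Path("Flows", extension).glob("*.flow"))
        for flow in iter_http_flows(flow_file)
    ]
    write_csv(rows, Path("Output", f"{extension}.csv"))
```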


Time

Evaluating each extension across all four scenarios (control, search, browse, and summarize) takes around 30 minutes and requires approximately 700 MB of disk space on a typical laptop or desktop.

Notes

  • The installation and certificate import process may vary slightly depending on your operating system. For detailed instructions, refer to this blog post.
  • Close all applications, including Mitmproxy and Chrome, when you finish your experiments to release resources.

Troubleshooting

If you encounter issues:

  • Verify the Mitmproxy certificate is correctly installed under Trusted Root Certification Authorities.
  • Double-check the Chrome executable and profile paths.
  • Consult the Mitmproxy documentation for further assistance.

Security Tip: Use a dedicated Chrome profile for auditing. Avoid logging into personal accounts. Remove the Mitmproxy certificate from your system after completing experiments.


About

A Repository to Audit Generative AI Browser Assistants
