yjcmsft/J-browser-agents

Browser Agent

A browser automation agent for web scraping, text extraction, and AI-powered content summarization built with Microsoft Agent Framework patterns and Azure AI Foundry SDK integration.

Based on:

📁 Project Structure

J-browser-agents/
│
├── Core/                          # Core framework modules
│   ├── agent_framework.py         # Microsoft Agent Framework base classes
│   ├── azure_ai_client.py         # Azure AI Foundry SDK integration
│   ├── browser_automation.py      # Playwright browser control
│   ├── text_extractor.py          # HTML parsing & extraction
│   ├── text_summarizer.py         # AI-powered summarization
│   └── browser_agent.py           # Main orchestrator (extends OpenAIAgent)
│
├── Demos/                         # Example scripts
│   ├── demo.py                    # Interactive demo
│   └── demo_mslearn.py            # Microsoft Learn scraper
│
├── Tests/                         # Testing & verification
│   ├── test_quick.py              # Quick functionality test
│   └── verify_setup.py            # Comprehensive verification
│
├── Scripts/                       # Utility scripts
│   ├── install_dependencies.bat   # One-click installation
│   ├── test_framework.bat         # Test launcher
│   ├── run_demo.bat               # Demo launcher
│   └── run_test.bat               # Quick test runner
│
├── Config/                        # Configuration
│   └── requirements.txt           # Python dependencies
│
├── .env.example                   # Environment configuration template
├── README.md                      # This file
└── LICENSE                        # License information

✨ Features

  • 🌐 Browser Automation - Automated web browsing using Playwright
  • 📄 Text Extraction - Clean HTML parsing and structured content extraction
  • 🤖 AI Summarization - Intelligent summarization using OpenAI or Azure OpenAI
  • 🎯 Microsoft Agent Framework - Tool-based agent architecture for extensibility
  • ☁️ Azure AI Foundry SDK - Unified access to Azure AI services
  • 💾 JSON Export - Save extracted content for later analysis
  • 🔍 Q&A Support - Ask questions about extracted content
  • 🔧 Multi-Agent Orchestration - Coordinate multiple agents for complex workflows

📦 Quick Start

Installation

Automated (Windows):

cd Scripts
install_dependencies.bat

Manual installation:

pip install -r Config/requirements.txt
python -m playwright install chromium

Dependencies

The framework uses these main packages (see Config/requirements.txt):

  • playwright - Browser automation
  • beautifulsoup4 + lxml - HTML parsing
  • openai - AI summarization (OpenAI or Azure OpenAI)
  • azure-ai-projects - Azure AI Foundry SDK
  • azure-identity - Azure authentication
  • langchain + langchain-openai - LangChain integration
  • python-dotenv - Environment configuration

Test the Framework

cd Scripts
test_framework.bat
# Choose option 1 for quick test

Run a Demo

cd Scripts
run_demo.bat
# Choose demo option

🚀 Usage

Basic Text Extraction (No API Key Required)

from Core.browser_agent import BrowserAgent

with BrowserAgent(headless=True) as agent:
    # Scrape and extract content
    content = agent.scrape_and_extract("https://example.com")
    
    # Access extracted data
    print(f"Title: {content['title']}")
    print(f"Word count: {content['word_count']}")
    print(f"Headings: {len(content['headings'])}")
    
    # Save to JSON
    agent.save_extracted_content("output.json")

With OpenAI (Direct API Key)

from Core.browser_agent import BrowserAgent

# Set environment: set OPENAI_API_KEY=sk-...
with BrowserAgent(headless=True) as agent:
    # Scrape a page
    url = "https://learn.microsoft.com/en-us/azure/ai-foundry/agents/overview"
    agent.scrape_and_extract(url)
    
    # Generate summary
    summary = agent.summarize_current_page(
        style="concise", 
        max_length=200
    )
    
    # Extract key points
    points = agent.get_key_points(num_points=5)
    
    # Ask questions about the content
    answer = agent.ask_question("What is this page about?")

With Azure AI Foundry (Recommended for Enterprise)

from Core.browser_agent import BrowserAgent

# Use Azure AI Foundry project endpoint
azure_endpoint = "https://<resource>.services.ai.azure.com/api/projects/<project>"

with BrowserAgent(
    headless=True,
    azure_endpoint=azure_endpoint,
    model="gpt-4o"
) as agent:
    agent.scrape_and_extract("https://example.com")
    summary = agent.summarize_current_page()

Using the Agent Framework Directly

from Core.agent_framework import OpenAIAgent
from Core.azure_ai_client import AzureAIClient

# Create an Azure AI client
azure_client = AzureAIClient(
    endpoint="https://<resource>.services.ai.azure.com/api/projects/<project>"
)

# Create a custom agent
agent = OpenAIAgent(
    name="MyAgent",
    system_prompt="You are a helpful assistant.",
    model="gpt-4o",
    azure_client=azure_client
)

# Register custom tools
agent.register_function(
    name="my_tool",
    description="Does something useful",
    function=lambda x: f"Processed: {x}",
    parameters={"type": "object", "properties": {"x": {"type": "string"}}}
)

# Invoke the agent
response = agent.invoke("Hello, how can you help me?")
print(response.content)

Multi-Agent Orchestration

from Core.agent_framework import AgentOrchestrator, OpenAIAgent

# Create orchestrator
orchestrator = AgentOrchestrator(name="MainOrchestrator")

# Register multiple agents
research_agent = OpenAIAgent(name="ResearchAgent", system_prompt="You research topics.")
summary_agent = OpenAIAgent(name="SummaryAgent", system_prompt="You summarize content.")

orchestrator.register_agent(research_agent)
orchestrator.register_agent(summary_agent)

# Invoke specific agents
result = orchestrator.invoke_agent("ResearchAgent", "Find info about AI agents")

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Agent Orchestrator                        │
│                 (Core/agent_framework.py)                    │
└──────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  Browser Agent  │ │  Custom Agent   │ │  Other Agents   │
│ (browser_agent) │ │   (OpenAI)      │ │      ...        │
└─────────────────┘ └─────────────────┘ └─────────────────┘
          │
          ├── Tools (scrape_url, summarize, etc.)
          │
    ┌─────┴──────┬─────────────┐
    ▼            ▼             ▼
┌──────────┐ ┌─────────┐ ┌──────────────┐
│ Browser  │ │  Text   │ │     Text     │
│   Auto   │ │ Extract │ │  Summarize   │
│          │ │         │ │              │
│Playwright│ │Beautiful│ │ OpenAI/Azure │
│          │ │  Soup   │ │  AI Foundry  │
└──────────┘ └─────────┘ └──────────────┘
                               │
                     ┌─────────┴─────────┐
                     ▼                   ▼
             ┌─────────────┐     ┌─────────────┐
             │   OpenAI    │     │  Azure AI   │
             │   Direct    │     │   Foundry   │
             └─────────────┘     └─────────────┘

Component Details

Core/agent_framework.py (NEW)

  • BaseAgent - Abstract base class for all agents
  • OpenAIAgent - Agent implementation using OpenAI/Azure OpenAI
  • AgentTool - Tool definition with OpenAI function calling format
  • AgentOrchestrator - Multi-agent coordination
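
The AgentTool entry above refers to the OpenAI function calling format. As a rough sketch of what such a definition looks like, the helper below wraps a name, a description, and a JSON Schema into the entry shape accepted in the `tools` list of the OpenAI Chat Completions API. `to_openai_tool` is a hypothetical helper written for illustration; the actual AgentTool class in Core/agent_framework.py is not shown in this README and may be structured differently.

```python
import json
from typing import Any, Dict


def to_openai_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool description in the Chat Completions `tools` entry shape.

    Hypothetical helper for illustration; the project's AgentTool may differ.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # JSON Schema describing the arguments the model may pass.
            "parameters": parameters,
        },
    }


tool = to_openai_tool(
    name="my_tool",
    description="Does something useful",
    parameters={
        "type": "object",
        "properties": {"x": {"type": "string"}},
        "required": ["x"],
    },
)
print(json.dumps(tool, indent=2))
```

Listing `x` under `required` is an addition here; it tells the model that the argument must always be supplied.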

Core/azure_ai_client.py (NEW)

  • AzureAIClient - Azure AI Projects SDK wrapper
  • AzureOpenAIDirectClient - Direct Azure OpenAI endpoint access
  • Supports DefaultAzureCredential authentication
  • Unified project endpoint for Foundry services

Core/browser_automation.py

  • Launch browsers (headless/headed mode)
  • Navigate with smart wait strategies
  • Capture screenshots
  • Extract HTML and text content

Core/text_extractor.py

  • Parse HTML with BeautifulSoup
  • Extract titles, headings, paragraphs
  • Parse links and code blocks
  • Clean and normalize text
  • Calculate statistics
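
To make the extraction steps above concrete, here is a minimal sketch that pulls the title and headings out of an HTML string and normalizes whitespace. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies. `OutlineParser` and `extract_outline` are hypothetical names and are not taken from Core/text_extractor.py.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class OutlineParser(HTMLParser):
    """Collect the page title and h1-h6 headings (illustrative stand-in)."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: List[Dict[str, object]] = []
        self._current: Optional[str] = None  # tag currently being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Start capturing text when a title or heading opens.
        if tag == "title" or tag in self.HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # nested inline tags such as <em> do not end the capture
        # Collapse runs of whitespace as a simple normalization step.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.headings.append({"level": int(tag[1]), "text": text})
        self._current = None


def extract_outline(html: str) -> Dict[str, object]:
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings}


sample = (
    "<html><head><title> Demo  Page </title></head>"
    "<body><h1>Main</h1><h2>Sub <em>part</em></h2></body></html>"
)
outline = extract_outline(sample)
print(outline)
```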

Core/text_summarizer.py

  • Generate AI-powered summaries using OpenAI or Azure OpenAI
  • Supports Azure AI Foundry project endpoint
  • Extract key points from content
  • Answer questions using context
  • Multiple output styles (concise, detailed, bullet points)

Core/browser_agent.py

  • Extends OpenAIAgent from agent framework
  • Registers browser tools (scrape, summarize, key points, Q&A)
  • Supports Azure AI Foundry and direct OpenAI
  • Context manager support

📊 Extracted Content Structure

{
  "title": "Page Title",
  "headings": [
    {"level": 1, "text": "Main Heading"},
    {"level": 2, "text": "Subheading"}
  ],
  "paragraphs": [
    "First paragraph text...",
    "Second paragraph text..."
  ],
  "main_content": "Full cleaned text content of the page...",
  "links": [
    {"text": "Link text", "url": "https://example.com"}
  ],
  "code_blocks": [
    "code snippet 1",
    "code snippet 2"
  ],
  "word_count": 1234
}
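
As a small illustration of consuming this structure, the sketch below turns a content dictionary into a short text report. It relies only on the keys shown above. `describe_content` is a hypothetical helper and not part of the project, and the sample JSON is a trimmed copy of the example rather than real output.

```python
import json
from typing import Any, Dict


def describe_content(content: Dict[str, Any]) -> str:
    """Build a short text report from an extracted-content dictionary."""
    lines = [f"Title: {content.get('title', '')}"]
    for heading in content.get("headings", []):
        # Indent each heading by its level to show the outline.
        indent = "  " * (heading["level"] - 1)
        lines.append(f"{indent}- {heading['text']}")
    lines.append(f"Links: {len(content.get('links', []))}")
    lines.append(f"Code blocks: {len(content.get('code_blocks', []))}")
    lines.append(f"Word count: {content.get('word_count', 0)}")
    return "\n".join(lines)


# Sample data in the documented shape. In practice this could come from
# json.load() on a file written by save_extracted_content().
raw = """
{
  "title": "Page Title",
  "headings": [
    {"level": 1, "text": "Main Heading"},
    {"level": 2, "text": "Subheading"}
  ],
  "links": [{"text": "Link text", "url": "https://example.com"}],
  "code_blocks": ["code snippet 1", "code snippet 2"],
  "word_count": 1234
}
"""
report = describe_content(json.loads(raw))
print(report)
```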

🎨 Summarization Styles

| Style | Description | Use Case |
|-------|-------------|----------|
| concise | Brief overview | Quick understanding |
| detailed | Comprehensive with key points | In-depth analysis |
| bullet_points | Key takeaways as bullets | Executive summary |
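
One plausible way to implement the styles is to map each style name to an instruction that is prepended to the prompt. The sketch below shows that idea only. The instruction strings, the `build_summary_prompt` name, and the treatment of `max_length` as a word limit are all assumptions, since the real prompts in Core/text_summarizer.py are not shown here.

```python
# Hypothetical instruction text per style; not taken from the project.
STYLE_INSTRUCTIONS = {
    "concise": "Write a brief overview.",
    "detailed": "Write a comprehensive summary that covers the key points.",
    "bullet_points": "List the key takeaways as bullet points.",
}


def build_summary_prompt(text: str, style: str = "concise", max_length: int = 200) -> str:
    """Compose a summarization prompt for the given style."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(STYLE_INSTRUCTIONS)}"
        )
    return (
        f"{STYLE_INSTRUCTIONS[style]} "
        f"Use at most {max_length} words.\n\n"  # assumption: limit counted in words
        f"Content:\n{text}"
    )


prompt = build_summary_prompt("Agents call tools.", style="bullet_points", max_length=50)
print(prompt)
```

Rejecting unknown style names early keeps a typo such as "bullets" from silently producing a default summary.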

🛠️ Configuration

Environment Variables

Copy .env.example to .env and configure:

# Option 1: Azure AI Foundry (Recommended for enterprise)
AZURE_AI_PROJECT_ENDPOINT=https://<resource>.services.ai.azure.com/api/projects/<project>

# Option 2: Direct Azure OpenAI
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/openai/v1
AZURE_OPENAI_API_KEY=your-key-here  # Optional if using Entra ID

# Option 3: Direct OpenAI
OPENAI_API_KEY=sk-your-api-key-here

# Model configuration
DEFAULT_MODEL=gpt-4o-mini

Authentication Options

| Method | Use Case | Configuration |
|--------|----------|---------------|
| Azure AI Foundry + Entra ID | Enterprise (Recommended) | Set AZURE_AI_PROJECT_ENDPOINT, use DefaultAzureCredential |
| Azure OpenAI + API Key | Azure with key auth | Set AZURE_OPENAI_ENDPOINT + AZURE_OPENAI_API_KEY |
| Direct OpenAI | Personal/Development | Set OPENAI_API_KEY |
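
The options above can be read as a lookup over environment variables. The sketch below picks a method by checking the variables in the order they are listed. That precedence order, the returned labels, and the `select_auth_method` name are assumptions for illustration; the framework's actual resolution logic is not documented here.

```python
import os
from typing import Mapping, Optional


def select_auth_method(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a label for the first configured authentication option.

    Assumed precedence: Azure AI Foundry, then Azure OpenAI, then OpenAI.
    """
    env = os.environ if env is None else env
    if env.get("AZURE_AI_PROJECT_ENDPOINT"):
        return "azure_ai_foundry"
    if env.get("AZURE_OPENAI_ENDPOINT"):
        # The API key is optional here when Entra ID is used instead.
        return "azure_openai"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # Text extraction works without any key; only AI features need one.
    return "extraction_only"


print(select_auth_method({"OPENAI_API_KEY": "sk-your-api-key-here"}))
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the function easy to exercise without touching the real environment.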

Programmatic Configuration

from Core.browser_agent import BrowserAgent

# Option 1: Azure AI Foundry
agent = BrowserAgent(
    headless=True,
    azure_endpoint="https://<resource>.services.ai.azure.com/api/projects/<project>",
    model="gpt-4o"
)

# Option 2: Direct OpenAI
agent = BrowserAgent(
    headless=True,
    api_key="sk-your-api-key-here",
    model="gpt-4o-mini"
)

# Option 3: Environment variables
import os
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"
agent = BrowserAgent(headless=True)

📝 Use Cases

  1. Documentation Scraping - Extract and summarize technical docs
  2. Content Analysis - Analyze web pages for specific information
  3. Research Automation - Gather info from multiple sources
  4. Knowledge Extraction - Build structured data from web content
  5. Competitive Analysis - Monitor and analyze competitor content
  6. Tutorial Aggregation - Collect and summarize learning materials

🧪 Testing

Quick Test (20 seconds)

cd Tests
python test_quick.py

Full Verification (60 seconds)

cd Tests
python verify_setup.py

What gets tested:

  • ✅ All dependencies installed
  • ✅ Playwright browsers available
  • ✅ Core modules functional
  • ✅ Browser automation working
  • ✅ Text extraction accurate
  • ✅ AI features (if API key set)

🔧 Troubleshooting

Common Issues

Import Errors

# Use module-style imports from project root
from Core.browser_agent import BrowserAgent

"Playwright not found"

python -m playwright install chromium

"No module named..."

pip install -r Config/requirements.txt

"API key not configured"

  • Summarization is optional
  • Framework works without API key for text extraction
  • Only needed for AI-powered features

Path Issues

  • Run scripts from project root directory
  • Use module-style imports: from Core.browser_agent import BrowserAgent

Import errors that persist after installation

# Force-reinstall all dependencies
pip install --force-reinstall -r Config/requirements.txt

⚠️ Security Considerations

Browser automation can pose security risks. Best practices:

  • ✅ Run in isolated/sandboxed environments
  • ✅ Use headless mode for production
  • ✅ Set appropriate timeouts
  • ✅ Implement rate limiting
  • ✅ Review extracted content
  • ❌ Don't access sensitive sites
  • ❌ Don't store credentials in code
  • ❌ Don't bypass authentication
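
Rate limiting from the list above can be as simple as enforcing a minimum gap between requests. The sketch below is one way to do that with the standard library. `MinIntervalLimiter` is a hypothetical class and is not provided by this framework, and the 0.05 second interval is only a demo value.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Keep consecutive calls to wait() at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


limiter = MinIntervalLimiter(min_interval=0.05)
for url in ["https://example.com/a", "https://example.com/b"]:
    slept = limiter.wait()
    # A real script would call agent.scrape_and_extract(url) here.
    print(f"{url}: waited {slept:.3f}s before the request")
```

The clock and sleep functions are injectable so the limiter can be tested with a fake clock instead of real delays.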

📚 API Reference

BrowserAgent

class BrowserAgent(headless=True, api_key=None, azure_endpoint=None, model="gpt-4o")

Parameters:

  • headless (bool): Run browser in headless mode (default: True)
  • api_key (str, optional): OpenAI API key for summarization
  • azure_endpoint (str, optional): Azure AI Foundry project endpoint
  • model (str): Model to use (default: "gpt-4o")

Methods:

  • scrape_and_extract(url) → Dict - Scrape URL and return structured content
  • summarize_current_page(style, max_length) → str - Summarize current page
  • get_key_points(num_points) → List[str] - Extract key points
  • ask_question(question) → str - Answer question about content
  • save_extracted_content(filepath) - Save to JSON file
  • invoke(input_text) → AgentResponse - Invoke agent with natural language
  • register_function(name, description, function, parameters) - Register custom tool
  • start() - Start the browser
  • close() - Close the browser

OpenAIAgent (Base Class)

class OpenAIAgent(name, system_prompt, model, api_key=None, azure_client=None)

Methods:

  • invoke(input_text) → AgentResponse - Process input and return response
  • register_tool(tool) - Register an AgentTool
  • register_function(name, description, function, parameters) - Register function as tool
  • execute_tool(tool_name, arguments) - Execute a registered tool
  • add_message(role, content) - Add message to conversation history
  • clear_history() - Clear conversation history

AzureAIClient

class AzureAIClient(endpoint=None, credential=None, use_azure=True)

Methods:

  • get_openai_client(api_version) - Get OpenAI-compatible client
  • get_chat_completion(messages, model, temperature, max_tokens) → str
  • is_available() → bool - Check if Azure client is configured

TextSummarizer

class TextSummarizer(api_key=None, model="gpt-4o-mini", azure_client=None, azure_endpoint=None)

Parameters:

  • api_key (str, optional): OpenAI API key (or set OPENAI_API_KEY env var)
  • model (str): Model to use for summarization (default: gpt-4o-mini)
  • azure_client (AzureAIClient, optional): Azure AI client for Foundry access
  • azure_endpoint (str, optional): Azure AI Foundry project endpoint

Context Manager Support

from Core.browser_agent import BrowserAgent

with BrowserAgent(headless=True) as agent:
    content = agent.scrape_and_extract("https://example.com")
# The browser is closed automatically when the block exits
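
For readers unfamiliar with the mechanism, the sketch below shows how the context manager protocol can tie start() and close() to a `with` block, including when an error occurs inside it. `ManagedSession` is a stand-in class written for this example. It does not launch a browser and is not the project's BrowserAgent, whose internals are not shown in this README.

```python
class ManagedSession:
    """Stand-in for an agent that owns a browser (not the project's class)."""

    def __init__(self) -> None:
        self.started = False
        self.closed = False

    def start(self) -> None:
        self.started = True

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "ManagedSession":
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and on exceptions alike.
        self.close()
        return False  # do not swallow the exception


session = ManagedSession()
try:
    with session:
        raise RuntimeError("navigation failed")
except RuntimeError as err:
    print(f"error propagated: {err}; closed={session.closed}")
```

Returning False from `__exit__` lets the original error reach the caller while still guaranteeing cleanup.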

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Support for dynamic content (JavaScript-heavy sites)
  • Caching and rate limiting
  • Additional extraction patterns
  • MCP (Model Context Protocol) tool integration
  • Microsoft 365 Agents SDK channel support (Teams, Copilot)
  • Multi-page navigation
  • PDF export

📜 License

See LICENSE file.

🙏 Acknowledgments


Made with ❤️ for intelligent web automation
