This project features an autonomous web agent designed to perform user-specified tasks on the web, integrating the power of Large Multimodal Models (LMMs) like GPT-4o with browser automation using Selenium. The application includes:
- A frontend for user interaction (branch: master).
- A backend for processing user inputs and managing automation (branch: backend).
The agent can handle complex tasks, such as navigating websites, filling out forms, and extracting information, while providing real-time feedback to the user.
- Frontend:
  - User-friendly interface for task submission and monitoring.
  - Live feedback showing the agent’s progress through screenshots.
  - Dropdown for selecting AI models (currently GPT-4o).
- Backend:
  - Communication with OpenAI APIs for reasoning and task execution.
  - Integration with Selenium for browser automation.
  - Flexible architecture allowing future enhancements and scalability.
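The backend's combination of model reasoning and browser automation can be pictured as an observe, decide, act loop. The following is only a minimal sketch under stated assumptions: the action names, the JSON reply shape, and the `run_agent`, `observe`, `decide`, and `act` names are hypothetical and are not taken from this project's code. The Selenium and OpenAI calls are passed in as plain callables so the loop itself has no external dependencies.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical action vocabulary; the project's real action set may differ.
ALLOWED_ACTIONS = {"click", "type", "navigate", "done"}


@dataclass
class Action:
    kind: str
    target: Optional[str] = None  # e.g. a CSS selector or a URL
    text: Optional[str] = None    # text to type, if any


def parse_action(reply: str) -> Action:
    """Parse the model's JSON reply into a validated Action."""
    data = json.loads(reply)
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    return Action(kind=kind, target=data.get("target"), text=data.get("text"))


def run_agent(
    task: str,
    observe: Callable[[], bytes],        # e.g. returns a browser screenshot
    decide: Callable[[str, bytes], str],  # e.g. asks the model for the next step
    act: Callable[[Action], None],        # e.g. performs the step in the browser
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> decide -> act until the model says "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = parse_action(decide(task, observe()))
        history.append(action)
        if action.kind == "done":
            break
        act(action)
    return history
```

Because the three callables are injected, the loop can be exercised with stubs (for example, a `decide` function that replays canned JSON replies) before wiring in a real browser or model.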
- master (Frontend):
  - Built with React and Next.js.
  - Components styled using NextUI.
- backend:
  - Developed with Python and Flask.
  - Automates the browser using Selenium.
  - Workflow management via n8n.
Ensure you have the following installed:
- Node.js (for the frontend)
- Python 3.9+ (for the backend)
- Chrome WebDriver (compatible with your Chrome browser version)
- Git
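As a convenience, the prerequisites above could be checked with a short script before installing. This is a sketch, not part of the project: the executable names in `REQUIRED_TOOLS` are assumptions and may differ on your platform, and `missing_tools` and `python_ok` are illustrative helper names.

```python
import shutil
import sys

# Assumed executable names; adjust for your platform if they differ.
REQUIRED_TOOLS = ["node", "npm", "git", "chromedriver"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


def python_ok(version=sys.version_info, minimum=(3, 9)):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    if not python_ok():
        print("Python 3.9+ is required; found", sys.version.split()[0])
    if not missing and python_ok():
        print("All prerequisites found.")
```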
- Clone the Repository

      git clone https://github.com/yourusername/autonomous-web-agent.git
      cd autonomous-web-agent

- Set Up Frontend (master branch)

      git checkout master
      cd frontend
      npm install

- Set Up Backend (backend branch)

      git checkout backend
      cd backend
      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Chrome WebDriver
  - Download the version that matches your Chrome browser.
  - Place it in a directory included in your system’s PATH.
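A driver that does not match the installed browser is a common cause of startup failures, so it can help to compare the two versions. The sketch below compares major versions, which is a rule of thumb rather than an official compatibility check. The helper names are illustrative, and the browser command you pass to `read_version` (for example `google-chrome` on Linux) is an assumption that varies by platform.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_compatible(chrome_output: str, driver_output: str) -> bool:
    """Return True when browser and driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def read_version(command):
    """Run a command such as ['chromedriver', '--version'] and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `versions_compatible(read_version(["google-chrome", "--version"]), read_version(["chromedriver", "--version"]))` would compare the two on a system where both commands are available.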
    cd backend
    python app.py

    cd frontend
    npm run dev

The application will be accessible at http://localhost:3000.
- Enter the task you want the web agent to complete in the input field.
- Optionally, provide a starting website URL.
- Click "Submit" to see the agent navigate the web and perform actions.
- Monitor the progress through real-time screenshots and logs.
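The inputs described above (a required task and an optional starting URL) could be validated before they are sent to the backend. This is a hypothetical sketch: the `TaskRequest` class and the `task` and `start_url` field names are assumptions for illustration, and the project's actual request format is not documented here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class TaskRequest:
    task: str
    start_url: Optional[str] = None  # the starting URL is optional

    def validate(self) -> None:
        """Raise ValueError if the task is empty or the URL is not http(s)."""
        if not self.task.strip():
            raise ValueError("task must not be empty")
        if self.start_url is not None:
            parts = urlparse(self.start_url)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                raise ValueError(f"not an http(s) URL: {self.start_url!r}")

    def to_payload(self) -> dict:
        """Return a JSON-serializable dict (hypothetical field names)."""
        self.validate()
        payload = {"task": self.task.strip()}
        if self.start_url is not None:
            payload["start_url"] = self.start_url
        return payload
```

Rejecting malformed input early keeps the agent from starting a browser session it cannot use.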
- Frontend: React, Next.js, NextUI
- Backend: Python, Flask, Selenium, OpenAI API
- Workflow Automation: n8n
- Browser Automation: Chrome WebDriver
- Support for additional LMMs like Gemini.
- Enhanced UI with more customization options.
- Improved scalability for high-volume tasks.
