AutoSurfer

Overview

This project features an autonomous web agent designed to perform user-specified tasks on the web, integrating the power of Large Multimodal Models (LMMs) like GPT-4o with browser automation using Selenium. The application includes:

A frontend for user interaction (branch: master).
A backend for processing user inputs and managing automation (branch: backend).

The agent can handle complex tasks, such as navigating websites, filling out forms, and extracting information, while providing real-time feedback to the user.

Media1.mp4

Features

Frontend:
- User-friendly interface for task submission and monitoring.
- Live feedback showing the agent’s progress through screenshots.
- Dropdown for selecting AI models (currently GPT-4o).
Backend:
- Communication with OpenAI APIs for reasoning and task execution.
- Integration with Selenium for browser automation.
- Flexible architecture allowing future enhancements and scalability.

Project Structure

Branches

master (Frontend)
- Built with React and Next.js.
- Components styled using NextUI.
backend
- Developed with Python and Flask.
- Automates the browser using Selenium.
- Workflow management via n8n.

Getting Started

Prerequisites

Ensure you have the following installed:

Node.js (for the frontend)
Python 3.9+ (for the backend)
Chrome WebDriver (compatible with your Chrome browser version)
Git

Installation

Clone the Repository

git clone https://github.com/yourusername/autonomous-web-agent.git
cd autonomous-web-agent

Setup Frontend (master branch)

git checkout master
cd frontend
npm install

Setup Backend (backend branch)

git checkout backend
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Chrome WebDriver
- Download the appropriate version for your Chrome browser from here.
- Place it in a directory included in your system’s PATH.

Running the Application

Start the Backend

cd backend
python app.py

Start the Frontend

cd frontend
npm run dev

The application will be accessible at http://localhost:3000.

Usage

Enter the task you want the web agent to complete in the input field.
Optionally, provide a starting website URL.
Click "Submit" to see the agent navigate the web and perform actions.
Monitor the progress through real-time screenshots and logs.

Technologies Used

Frontend: React, Next.js, NextUI
Backend: Python, Flask, Selenium, OpenAI API
Workflow Automation: n8n
Browser Automation: Chrome WebDriver

Future Enhancements

Support for additional LMMs like Gemini.
Enhanced UI with more customization options.
Improved scalability for high-volume tasks.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.vscode		.vscode
app		app
components		components
config		config
public		public
styles		styles
types		types
.eslintignore		.eslintignore
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.npmrc		.npmrc
LICENSE		LICENSE
README.md		README.md
next.config.js		next.config.js
package.json		package.json
postcss.config.js		postcss.config.js
tailwind.config.js		tailwind.config.js
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AutoSurfer

Overview

Features

Project Structure

Branches

Getting Started

Prerequisites

Installation

Running the Application

Start the Backend

Start the Frontend

Usage

Technologies Used

Future Enhancements

About

Uh oh!

Releases

Packages

Languages

License

BilalElHammouchi/autosurfer

Folders and files

Latest commit

History

Repository files navigation

AutoSurfer

Overview

Features

Project Structure

Branches

Getting Started

Prerequisites

Installation

Running the Application

Start the Backend

Start the Frontend

Usage

Technologies Used

Future Enhancements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages