Jasmine-ardhi/Multi-threaded-HTTP

Project Overview

This project implements a multi-threaded HTTP/1.1 server from scratch using Python's low-level socket programming. A thread pool handles multiple concurrent clients; the server serves static HTML and binary files (GET requests) and accepts JSON uploads (POST requests). It supports persistent connections (Connection: keep-alive) and applies Host header validation and path traversal protection.

1. Build and Run Instructions

1.1 Prerequisites

Python 3.x

A Linux or macOS command-line environment with curl or netcat (nc) for testing.

1.2 Directory Structure

Before running the server, ensure the following directory structure is set up:

    project/
    ├── server.py          # The main server code
    └── resources/         # Root directory for serving content
        ├── index.html     # Default HTML file (Required)
        ├── about.html     # HTML file (Required)
        ├── contact.html   # HTML file (Required)
        ├── sample.txt     # Text file for binary transfer (Required)
        ├── logo.png       # Image file for binary transfer (Required)
        ├── large.png      # Large image file (>1MB) (Required)
        ├── photo.jpg      # JPEG image file (Required)
        └── uploads/       # Directory for POST-ed JSON files

1.3 Running the Server

The server accepts up to three optional command-line arguments: port, host, and max_threads.

Default Execution:

(Runs on 127.0.0.1:8080 with 10 threads)

python3 server.py

Custom Execution Example:

(Runs on 0.0.0.0:8000 with a thread pool size of 20)

python3 server.py 8000 0.0.0.0 20
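As a rough illustration of how the three optional positional arguments could be read, the sketch below assumes the order shown above (port, host, max_threads) and the documented defaults. The helper name `parse_args` and the `DEFAULT_*` constants are hypothetical and not taken from server.py.

```python
DEFAULT_PORT = 8080
DEFAULT_HOST = "127.0.0.1"
DEFAULT_MAX_THREADS = 10


def parse_args(argv):
    """Return (port, host, max_threads) from positional arguments.

    Hypothetical helper: any argument that is missing falls back to the
    default documented in this README.
    """
    port = int(argv[0]) if len(argv) > 0 else DEFAULT_PORT
    host = argv[1] if len(argv) > 1 else DEFAULT_HOST
    max_threads = int(argv[2]) if len(argv) > 2 else DEFAULT_MAX_THREADS
    return port, host, max_threads
```

In a script this would typically be called as `parse_args(sys.argv[1:])`.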

1.4 Testing Scenarios

Use curl or nc in a separate terminal to test the functionality.

Testing and Verification Scenarios

| Test | Method/Path | Command Example | Expected Status |
|---|---|---|---|
| Basic GET | / | curl -i http://127.0.0.1:8080/ | 200 OK (text/html) |
| Binary Download | /logo.png | curl -O http://127.0.0.1:8080/logo.png | 200 OK (application/octet-stream) |
| JSON POST | /upload | curl -i -X POST -H "Content-Type: application/json" -d '{"data": "test"}' http://127.0.0.1:8080/upload | 201 Created |
| Path Traversal | /../etc/passwd | curl -i --path-as-is http://127.0.0.1:8080/../etc/passwd | 403 Forbidden |
| Host Mismatch | Host: evil.com | curl -i -H "Host: evil.com" http://127.0.0.1:8080/index.html | 403 Forbidden |

By default curl collapses /../ segments before sending the request, so the --path-as-is flag is needed for the traversal attempt to reach the server unchanged.

2. Thread Pool Architecture

The server uses a Producer-Consumer model implemented with Python's built-in threading and queue modules for concurrency (Requirement 3).

Producer (Main Thread): The main thread is responsible for socket listening (server.accept()). When a new connection arrives, the main thread acts as the producer, placing the (socket, address) tuple onto the shared Connection Queue.

Consumer (Worker Threads): A configurable pool of worker threads (MAX_THREADS, default 10) waits on the Connection Queue. When a connection is available, a worker takes it off the queue (CONNECTION_QUEUE.get()) and calls handle_client(conn, addr).

Synchronization: The queue.Queue automatically handles synchronization (locks/mutexes) for safe multi-threaded access, preventing race conditions.

Saturation: If the thread pool is busy and the queue capacity (LISTEN_QUEUE_SIZE, default 50) is exceeded, the server immediately returns a 503 Service Unavailable response with a Retry-After header to the client, preventing resource exhaustion.
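The producer-consumer arrangement described above can be sketched as follows. This is a minimal illustration, not the code in server.py: it assumes the queue is bounded by LISTEN_QUEUE_SIZE, and the names `start_pool` and `submit` are hypothetical.

```python
import logging
import queue
import threading

LISTEN_QUEUE_SIZE = 50   # queue capacity named in the text
MAX_THREADS = 10         # pool size named in the text

# Assumption: the shared queue is bounded so saturation can be detected.
CONNECTION_QUEUE = queue.Queue(maxsize=LISTEN_QUEUE_SIZE)


def worker(handle):
    """Consumer: take queued connections forever and pass them to `handle`."""
    while True:
        conn, addr = CONNECTION_QUEUE.get()   # blocks until work is available
        try:
            handle(conn, addr)
        except Exception:
            logging.exception("Unhandled error while serving %s", addr)
        finally:
            CONNECTION_QUEUE.task_done()      # lets join() account for the item


def start_pool(handle, size=MAX_THREADS):
    """Start `size` daemon worker threads (hypothetical helper)."""
    for i in range(size):
        threading.Thread(
            target=worker, args=(handle,), name=f"Worker-{i + 1}", daemon=True
        ).start()


def submit(conn, addr):
    """Producer: enqueue a connection; return False when the queue is full."""
    try:
        CONNECTION_QUEUE.put_nowait((conn, addr))
        return True
    except queue.Full:
        return False   # the caller would answer 503 Service Unavailable
```

Using `put_nowait` rather than a blocking `put` keeps the accepting thread free to reject new clients straight away when the queue is saturated.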

3. Binary Transfer Implementation

Binary file transfer is designed for efficiency and data integrity (Requirement 5B).

Data Integrity

Files (.txt, .png, .jpg, .jpeg) are opened and read using the binary mode ('rb') to ensure raw byte data is handled without any encoding or corruption.

Header Setup

The Content-Type is set to application/octet-stream.

The Content-Disposition: attachment; filename="..." header is included to instruct the client (browser) to download the content as a file rather than attempting to display it inline.

The exact file size is calculated using os.path.getsize() and set in the Content-Length header.

Buffer Management

Instead of reading the entire file into memory (which is inefficient for large files), the file content is read and sent to the socket in 4KB chunks (f.read(4096)), ensuring efficient buffer management and supporting the seamless transfer of large files (>1MB).
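The header setup and chunked send loop described in this section might look like the sketch below. The function name `send_binary_file` is hypothetical, `conn` is assumed to be a connected socket (anything with a `sendall` method works), and the Date, Server, and Connection headers a full response would carry are omitted for brevity.

```python
import os

CHUNK_SIZE = 4096  # 4 KB, as described above


def send_binary_file(conn, file_path):
    """Send response headers, then stream the file in fixed-size chunks."""
    size = os.path.getsize(file_path)
    filename = os.path.basename(file_path)
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {size}\r\n"
        f'Content-Disposition: attachment; filename="{filename}"\r\n'
        "\r\n"
    )
    conn.sendall(headers.encode("utf-8"))
    # Binary mode keeps the bytes untouched; only one chunk is in memory
    # at any time, regardless of file size.
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            conn.sendall(chunk)
```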

4. Security Measures Implemented

The server adheres to strict security protocols to prevent common web vulnerabilities (Requirement 7).

4.1. Path Traversal Protection

Mechanism: The server uses os.path.realpath() to convert the requested path (e.g., /../etc/passwd) and the server's document root (resources/) into their canonical, absolute forms.

Validation: It then strictly checks that the normalized requested path starts with the absolute path of the resources directory. If the request attempts to access any file outside this root (e.g., /../), the check fails.

Response: All attempts are logged, and the server returns 403 Forbidden.
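A minimal sketch of this check is shown below, using os.path.realpath together with os.path.commonpath (the walkthrough later names both). The helper name `resolve_safe_path` is hypothetical. Comparing with commonpath rather than a plain string prefix avoids accepting sibling directories such as resources_backup/.

```python
import os

ROOT_DIR = "resources"


def resolve_safe_path(request_path, root_dir=ROOT_DIR):
    """Return the canonical file path, or None if it escapes root_dir."""
    root = os.path.realpath(root_dir)
    # Strip the leading "/" so os.path.join() does not discard the root.
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None   # the caller would log the attempt and answer 403
    return candidate
```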

4.2. Host Header Validation

Mechanism: The server extracts the Host header immediately after parsing the initial request.

Validation: It compares the received Host value against a list of explicitly permitted hosts (localhost, 127.0.0.1, localhost:PORT, 127.0.0.1:PORT).

Missing Host: If the header is missing (mandatory for HTTP/1.1), the server responds with 400 Bad Request and closes the connection.

Mismatched Host: If the header is present but invalid, the server responds with 403 Forbidden and logs the violation.
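The two outcomes above could be implemented along these lines. This is a sketch under stated assumptions: `validate_host` is a hypothetical name, and `headers` is assumed to be a dict whose keys were lower-cased during parsing.

```python
def validate_host(headers, port):
    """Return None if the Host header is acceptable, else the status to send."""
    allowed = {
        "localhost",
        "127.0.0.1",
        f"localhost:{port}",
        f"127.0.0.1:{port}",
    }
    host = headers.get("host")
    if host is None:
        return "400 Bad Request"    # Host is mandatory in HTTP/1.1
    if host.strip().lower() not in allowed:
        return "403 Forbidden"      # present but not on the allow-list
    return None
```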

5. Known Limitations

HTTP Version: Only supports the core features of HTTP/1.1 (Keep-Alive, Host header). It does not support features like compression (gzip), chunked transfer encoding, or pipelining.

Method Support: Only GET and POST methods are implemented. All others result in a 405 Method Not Allowed response.

Supported MIME Types: GET requests only support a limited set of file extensions (.html, .txt, .png, .jpg, .jpeg). Any other file type results in a 415 Unsupported Media Type error.

Error Handling: While robust for I/O and protocol errors, it does not include advanced signaling or resource management for high-load production environments.

6. Code Walkthrough

1. Imports and Configuration

The server imports Python modules for networking (socket), concurrency (threading, queue), file operations (os), JSON handling (json), timestamps (datetime), and logging (logging). Logging is configured to include timestamps and thread names for better traceability.

ROOT_DIR → Folder from which files are served (resources/)

MAX_THREADS → Default size of thread pool (2, configurable later)

CONNECTION_QUEUE → Queue to manage incoming client connections
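A logging setup that includes timestamps and thread names, as described above, might look like this. The exact format string is an assumption; `LOG_FORMAT` and `DATE_FORMAT` are hypothetical names.

```python
import logging

# Assumed format: timestamp, worker thread name, level, then the message.
LOG_FORMAT = "[%(asctime)s] [%(threadName)s] %(levelname)s: %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

logging.basicConfig(level=logging.INFO, format=LOG_FORMAT, datefmt=DATE_FORMAT)
logging.info("Logging configured")
```

Because `%(threadName)s` is filled in per record, messages emitted from different worker threads can be told apart without extra bookkeeping.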

2. http_date()

Generates the current date in the correct HTTP format (RFC 7231) for response headers.
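One way to produce this format with the standard library is email.utils.formatdate, sketched below. The source may build the string differently (for example with datetime and strftime), and the optional `timestamp` parameter is an addition made here so the output can be checked against a known value.

```python
from email.utils import formatdate


def http_date(timestamp=None):
    """Return an HTTP-date such as 'Thu, 01 Jan 1970 00:00:00 GMT'.

    With timestamp=None the current time is used.
    """
    return formatdate(timeval=timestamp, localtime=False, usegmt=True)
```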

3. handle_client(conn, addr)

This is the core request handler that runs inside each worker thread. It performs the following major tasks:

a. Request Parsing

  • Reads raw HTTP data from the client.
  • Extracts method, path, and headers.
  • Validates the Host header to prevent unauthorized access:
    • Returns 400 if missing.
    • Returns 403 if invalid.
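The parsing step above can be sketched as a small function that splits the raw bytes into a request line and headers. `parse_request` is a hypothetical name, and lower-casing the header names is an assumption made so that later lookups are case-insensitive.

```python
def parse_request(raw):
    """Split raw request bytes into (method, path, version, headers).

    Raises ValueError on a malformed request, which a caller would turn
    into a 400 Bad Request response.
    """
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line")
    method, path, version = parts
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers
```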

b. GET Request Handling

Handles requests for static and binary files:

  • Maps request paths to files inside the resources/ directory.

  • Protects against path traversal attacks using os.path.realpath and os.path.commonpath.

  • Serves:

    • .html → text/html; charset=utf-8
    • .txt, .png, .jpg, .jpeg → application/octet-stream
  • Sends files with Content-Disposition headers for downloads.

  • Returns:

    • 404 if file not found
    • 415 for unsupported file types
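The extension-to-Content-Type mapping listed above could be expressed as a lookup table. The names `CONTENT_TYPES` and `content_type_for` are hypothetical; a `None` result stands for the 415 case.

```python
import os

# Mapping taken from the list above.
CONTENT_TYPES = {
    ".html": "text/html; charset=utf-8",
    ".txt": "application/octet-stream",
    ".png": "application/octet-stream",
    ".jpg": "application/octet-stream",
    ".jpeg": "application/octet-stream",
}


def content_type_for(path):
    """Return the Content-Type for `path`, or None if the type is unsupported."""
    ext = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(ext)
```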

c. POST Request Handling

  • Only accepts application/json requests.
  • Reads and validates the JSON body.
  • Saves data in resources/uploads/ as: upload_[timestamp]_[randomid].json
  • Returns a JSON response with status 201 Created.
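A sketch of the upload path follows. The filename pattern comes from the list above, but the timestamp format and the way the random id is generated are assumptions, as is the helper name `save_upload`.

```python
import json
import os
import uuid
from datetime import datetime


def save_upload(body, content_type, upload_dir):
    """Validate and store a JSON body; return (status_code, filename or None)."""
    if content_type.split(";")[0].strip().lower() != "application/json":
        return 415, None
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, None
    os.makedirs(upload_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")   # assumed format
    random_id = uuid.uuid4().hex[:8]                       # assumed id scheme
    filename = f"upload_{timestamp}_{random_id}.json"
    with open(os.path.join(upload_dir, filename), "w", encoding="utf-8") as f:
        json.dump(data, f)
    return 201, filename
```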

d. Error Handling

  • Handles invalid methods (405), malformed requests (400), and unsupported types (415), logging every event.

4. worker()

Each worker thread continuously listens for connections from the shared queue (CONNECTION_QUEUE):

  • When a connection arrives, it calls handle_client().
  • Marks the task as complete once handled.

5. run(host, port)

The main server startup function:

  • Creates and binds a TCP socket.
  • Starts listening for incoming connections (queue size 50).
  • Spawns worker threads (MAX_THREADS) to form the thread pool.
  • Gracefully shuts down on Ctrl + C.

When new connections arrive:

  • They are added to the queue if space is available.

  • If the queue is full, responds with:

    HTTP/1.1 503 Service Unavailable
    Retry-After: 5
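The saturation response could be assembled as raw bytes like this. Only the status line and Retry-After header come from the text; the body and the remaining headers are assumptions, and `build_503` is a hypothetical name.

```python
def build_503(retry_after=5):
    """Return the bytes of a 503 response carrying a Retry-After header."""
    body = b"Service Unavailable"
    head = (
        "HTTP/1.1 503 Service Unavailable\r\n"
        f"Retry-After: {retry_after}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

The accepting thread would send these bytes with `conn.sendall(build_503())` and then close the socket, so a rejected client never occupies a worker.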

6. Logging

Comprehensive logging throughout:

  • Server startup and configuration
  • Connection assignments to threads
  • File transfers and responses
  • Queue saturation warnings
  • Security violations (invalid Host, path traversal, etc.)

7. Security Features

  • Host Header Validation → Prevents forged requests
  • Path Traversal Protection → Ensures files are only served from resources/
  • Error Codes → Prevents information leakage by returning generic HTTP errors

8. Supported Responses

| Status Code | Description |
|---|---|
| 200 OK | File served successfully |
| 201 Created | JSON file created on POST |
| 400 Bad Request | Missing Host header or malformed request |
| 403 Forbidden | Path or Host violation |
| 404 Not Found | Missing file |
| 405 Method Not Allowed | Unsupported method |
| 415 Unsupported Media Type | Wrong Content-Type or file type |
| 503 Service Unavailable | Thread pool busy and connection queue full |

Summary

This implementation:

  • Uses TCP sockets for communication
  • Employs a multi-threaded architecture with a fixed-size thread pool
  • Safely serves both HTML and binary content
  • Handles POST JSON uploads
  • Implements key HTTP protocol features, including Host validation, connection handling, and status responses
  • Includes comprehensive logging and security protections
