DistLink is a high-performance, distributed URL shortener designed to handle millions of requests per second. Built with modern technologies like NestJS, ScyllaDB, Kafka, Redis, and Kubernetes, it offers fast URL redirects, real-time analytics, and robust anti-spam protection. This project is designed for scalability, fault tolerance, and low latency, making it ideal for high-traffic applications.
Frontend: DistLink Frontend
- High Performance: Optimized for millions of requests per second.
- Scalability: Designed for horizontal scaling with Kubernetes.
- Real-time Analytics: Track URL clicks and user behavior with Kafka and ClickHouse.
- Anti-Spam Protection: Cloudflare WAF integration for security and rate limiting.
- Fault Tolerance: Distributed architecture with data replication in ScyllaDB and Kafka.
- Automatic Link Expiration: TTL (Time-To-Live) implemented in ScyllaDB for automatic removal of old links.
The architecture prioritizes scalability, fault tolerance, and low latency.
- Cloudflare WAF:
- Protects against DDoS attacks, bot traffic, and other web vulnerabilities.
- Implements rate limiting to prevent abuse and ensure fair usage.
- NestJS Application:
- Handles URL shortening and redirection requests.
- Exposes REST API endpoints for easy integration.
- Implements caching using Redis to reduce database load and improve response times.
- Publishes click events to Kafka for real-time analytics.
- Includes API documentation via Swagger/OpenAPI (optional, but recommended).
- ScyllaDB (NoSQL Database):
- Provides high-throughput storage for URL mappings, ensuring fast lookups.
- Uses a partition key (short_code) for efficient data distribution and retrieval.
- Implements TTL (Time-To-Live) to automatically expire links after a configured period, preventing database bloat.
- Uses CQL for data modeling (a write sketch with TTL follows this list).
- Redis:
- Stores frequently accessed URL mappings in-memory for ultra-fast retrieval.
- Significantly reduces database load for high-traffic URLs, improving overall system performance.
- Utilizes a configurable TTL to automatically expire cached entries.
- Kafka Cluster:
- Acts as a distributed event streaming platform.
- Kafka Producers send click tracking events containing metadata like IP address, user-agent, and timestamp.
- Kafka Consumers process these events and store them in ClickHouse.
- ClickHouse (Analytics Database):
- Stores user click events for real-time reporting and analysis.
- Supports high-speed aggregations and queries, enabling the creation of insightful dashboards.
- A typical click-event schema includes short_code, timestamp, ip_address, user_agent, and referrer (sketched as a type after this list).
- Docker:
- Containerizes all services, ensuring consistent and reproducible deployments across different environments.
- Kubernetes (K8s):
- Orchestrates containerized services, automating deployment, scaling, and management.
- Ensures auto-scaling of services based on load, maintaining optimal performance.
- Provides self-healing capabilities, automatically restarting failed containers.
- Supports rolling updates, allowing for seamless deployments without downtime.
- Ingress Controller (Traefik/Nginx):
- Manages external HTTP traffic into the Kubernetes cluster, routing requests to the appropriate services.
- Provides load balancing and SSL termination.
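
To make the ScyllaDB points above concrete, here is a minimal sketch of persisting a URL mapping with a TTL using the Node.js cassandra-driver. The keyspace, table, and column names (distlink, urls, short_code, original_url) and the service address are assumptions for illustration, not the project's confirmed schema.

```typescript
// Minimal sketch of storing a URL mapping in ScyllaDB with a TTL so that
// expired links are removed automatically. Keyspace, table, and column names
// are assumptions for illustration.
import { Client } from 'cassandra-driver';

const scylla = new Client({
  contactPoints: ['scylla:9042'],   // assumption: cluster address
  localDataCenter: 'datacenter1',
  keyspace: 'distlink',             // assumption: keyspace name
});

// Assumed table, partitioned by short_code for single-partition lookups:
//   CREATE TABLE urls (short_code text PRIMARY KEY, original_url text, created_at timestamp);

export async function saveMapping(
  shortCode: string,
  originalUrl: string,
  ttlSeconds = 60 * 60 * 24 * 30, // 30 days; the actual expiry period is configurable
): Promise<void> {
  // USING TTL lets ScyllaDB expire the row on its own, with no cleanup job needed.
  await scylla.execute(
    'INSERT INTO urls (short_code, original_url, created_at) VALUES (?, ?, toTimestamp(now())) USING TTL ?',
    [shortCode, originalUrl, ttlSeconds],
    { prepare: true },
  );
}
```

Because short_code is the partition key, each redirect lookup touches a single partition.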
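The ClickHouse table itself is not spelled out here; as a rough guide, the click event described above can be sketched as a TypeScript type (field names are assumptions mirroring the metadata listed in the bullets).

```typescript
// Assumed shape of a single click-event row in ClickHouse; column names are
// illustrative, chosen to match the metadata described above.
export interface ClickEvent {
  short_code: string; // which shortened link was clicked
  clicked_at: string; // event timestamp (a DateTime column in ClickHouse)
  ip_address: string; // client IP, useful for geo and abuse analysis
  user_agent: string; // browser / device information
  referrer: string;   // where the click came from
}
```

A MergeTree table ordered by (short_code, clicked_at) would support the per-link aggregations the dashboards need.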
- The API receives a request to shorten a long URL.
- Cloudflare WAF filters the request to prevent malicious activity.
- The NestJS application generates a unique short_code (e.g., via base62 encoding; a generation sketch follows this list).
- The application checks the Redis cache for the short_code.
- If the short_code is not found in Redis (cache miss), the URL mapping is stored in ScyllaDB and the Redis cache is updated.
- The shortened URL is returned to the user.
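
The exact generation algorithm is an implementation detail; a common base62 approach, consistent with the hint above, looks like the sketch below (the 7-character length is an assumption).

```typescript
import { randomBytes } from 'crypto';

// Base62 alphabet: digits, lowercase, and uppercase letters.
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Generate a random short code; 7 base62 characters give ~3.5 trillion
// combinations. The slight modulo bias is irrelevant for this sketch.
export function generateShortCode(length = 7): string {
  const bytes = randomBytes(length);
  let code = '';
  for (let i = 0; i < length; i++) {
    code += ALPHABET[bytes[i] % ALPHABET.length];
  }
  return code;
}
```

Pairing this with an insert-if-not-exists check in ScyllaDB guards against the rare collision.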
- A user accesses a shortened URL.
- Cloudflare WAF filters the traffic.
- The NestJS application checks the Redis cache for the original URL associated with the short_code.
- On a cache miss, the application fetches the URL mapping from ScyllaDB and updates the Redis cache (a handler sketch follows this list).
- A click event is logged to Kafka, including metadata such as IP address, user-agent, and timestamp.
- The user is redirected to the original URL.
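
Putting these steps together, a redirect handler could look like the sketch below. It is illustrative only: it assumes ioredis, cassandra-driver, and kafkajs as client libraries, and names such as the clicks topic, the urls table, the url: cache-key prefix, and the host names are assumptions rather than DistLink's confirmed identifiers.

```typescript
import { Controller, Get, NotFoundException, Param, Redirect, Req } from '@nestjs/common';
import { Request } from 'express';
import Redis from 'ioredis';
import { Client } from 'cassandra-driver';
import { Kafka } from 'kafkajs';

// Shared clients; in the real service these would come from Nest's dependency
// injection. Host names and the 'clicks' topic are assumptions.
const redis = new Redis({ host: 'redis', port: 6379 });
const scylla = new Client({ contactPoints: ['scylla:9042'], localDataCenter: 'datacenter1', keyspace: 'distlink' });
const producer = new Kafka({ brokers: ['kafka:9092'] }).producer();
// producer.connect() is assumed to be awaited during application bootstrap.

@Controller()
export class RedirectController {
  @Get(':shortCode')
  @Redirect() // redirect URL is supplied by the return value (302 by default)
  async redirect(@Param('shortCode') shortCode: string, @Req() req: Request) {
    // 1. Cache-aside read: try Redis first.
    let originalUrl = await redis.get(`url:${shortCode}`);

    // 2. On a miss, fall back to ScyllaDB and repopulate the cache with a TTL.
    if (!originalUrl) {
      const result = await scylla.execute(
        'SELECT original_url FROM urls WHERE short_code = ?',
        [shortCode],
        { prepare: true },
      );
      originalUrl = result.first()?.original_url;
      if (!originalUrl) throw new NotFoundException('Unknown short code');
      await redis.set(`url:${shortCode}`, originalUrl, 'EX', 3600); // 1h cache TTL (assumed)
    }

    // 3. Publish the click event to Kafka for the analytics pipeline.
    await producer.send({
      topic: 'clicks',
      messages: [{
        key: shortCode,
        value: JSON.stringify({
          short_code: shortCode,
          clicked_at: new Date().toISOString(),
          ip_address: req.ip,
          user_agent: req.headers['user-agent'] ?? '',
          referrer: req.headers.referer ?? '',
        }),
      }],
    });

    // 4. Redirect the user to the original URL.
    return { url: originalUrl };
  }
}
```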
- A Kafka consumer subscribes to the click event topic.
- The consumer processes the click events and stores the click metadata (IP, user-agent, timestamp, referrer) in ClickHouse (a consumer sketch follows this list).
- Real-time dashboards query analytics data in ClickHouse to provide insights into URL usage.
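
A consumer along these lines can be sketched with kafkajs and the @clickhouse/client package; the clicks topic, click_events table, consumer group name, and service addresses are assumptions mirroring the event shape sketched earlier.

```typescript
import { Kafka } from 'kafkajs';
import { createClient } from '@clickhouse/client';

const kafka = new Kafka({ clientId: 'distlink-analytics', brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'click-consumers' });
const clickhouse = createClient({ url: 'http://clickhouse:8123' }); // assumption: service address

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'clicks', fromBeginning: false }); // assumed topic name

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const event = JSON.parse(message.value.toString());

      // Insert one row into the (assumed) click_events table; buffering events
      // and flushing in batches would reduce per-row insert overhead.
      await clickhouse.insert({
        table: 'click_events',
        values: [event],
        format: 'JSONEachRow',
      });
    },
  });
}

run().catch((err) => {
  console.error('click consumer failed', err);
  process.exit(1);
});
```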
- Docker
- Docker Compose (for local development)
- Kubernetes cluster (for production deployment)
- kubectl (Kubernetes command-line tool)
- Clone the repository:
  git clone https://github.com/Azzurriii/DistLink.git
  cd DistLink
- Configure Environment Variables:
  Create a .env file based on the .env.example template. Fill in the necessary configuration parameters for your environment (database credentials, Kafka brokers, Redis connection details, etc.). A sketch of how the application reads these values follows these steps.
- Local Development (Docker Compose):
  docker-compose up -d
  This will start all the necessary services in Docker containers.
- Access the API:
  The API will be accessible at http://localhost:8000; refer to the NestJS application's configuration for the specific port.
- Check the logs:
  docker-compose logs -f
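
Inside the NestJS application these values are typically read through @nestjs/config; the sketch below is illustrative, and variable names such as REDIS_HOST and REDIS_PORT are assumptions (use the keys from .env.example).

```typescript
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';

@Module({
  imports: [
    // Loads .env and makes ConfigService available application-wide.
    ConfigModule.forRoot({ isGlobal: true }),
  ],
})
export class AppModule {}

// Example usage in a provider; variable names are assumptions, not the
// project's confirmed keys.
export function redisOptions(config: ConfigService) {
  return {
    host: config.get<string>('REDIS_HOST', 'localhost'),
    port: config.get<number>('REDIS_PORT', 6379),
  };
}
```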
- Build Docker Images:
  Build Docker images for each service (NestJS API, Kafka Consumer, etc.) and push them to a container registry (e.g., Docker Hub, Google Container Registry, AWS ECR). Each service needs a Dockerfile containing its build instructions.
- Configure Kubernetes Manifests:
  - The k8s/ directory should contain Kubernetes manifests (YAML files) for deploying the services.
  - Customize the manifests with your container image names, resource requirements, and other configuration settings.
  - Use ConfigMaps and Secrets to manage environment variables and sensitive information.
  - Include manifests for:
    - Deployments (NestJS API, Kafka Consumer)
    - Services (LoadBalancer or NodePort for external access)
    - StatefulSet (ScyllaDB)
    - PersistentVolumeClaims (for ScyllaDB data persistence)
    - Ingress (for routing traffic to the API)
    - HorizontalPodAutoscaler (for auto-scaling the API)
- Deploy to Kubernetes:
  kubectl apply -f k8s/
  This will deploy all the services to your Kubernetes cluster.
- Verify Deployment:
  kubectl get pods
  kubectl get services
  kubectl get deployments
  Check that all pods are running and services are accessible.
- NestJS API: Horizontal Pod Autoscaler (HPA) automatically scales the number of API pods based on CPU utilization or other metrics. Configure the HPA with appropriate minimum and maximum replica counts.
- ScyllaDB: Use a StatefulSet to manage ScyllaDB for high availability and data persistence. Ensure proper replication and data distribution across nodes.
- Kafka: Deploy Kafka in a multi-broker setup for fault tolerance. Configure topic replication to ensure data durability.
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
