An AI-powered GDPR compliance verification system for legal contracts
The Contract Compliance Checker is an intelligent web application that automates the analysis of legal contracts for GDPR compliance. It uses AI models to detect document types, extract clauses, and compare them against regulatory standards.
- Automated Document Classification: identifies five types of GDPR-related agreements
- Intelligent Clause Extraction: extracts and summarizes contract clauses using AI
- ⚖️ Compliance Analysis: compares uploaded contracts against standard templates
- 🎯 Risk Assessment: assigns risk scores (0-100) to compliance gaps
- Detailed Reporting: lists missing clauses, risks, and recommendations
- Auto-Update System: scheduled scraping keeps templates up to date
- 🎨 User-Friendly Interface: built with Streamlit for easy interaction
The system can analyze the following contract types:
- Data Processing Agreement (DPA)
- Joint Controller Agreement (JCA)
- Controller-to-Controller Agreement (C2C)
- Processor-to-Subprocessor Agreement (PSA)
- Standard Contractual Clauses (SCC)
```
project/
├── main.py                     # Streamlit application entry point
├── agreement_comparision.py    # Document classification & comparison
├── data_extraction.py          # Clause extraction with AI
├── scrapping.py                # 🕷️ Template scraping & updates
├── pipeline.py                 # ⚙️ Automated processing pipeline
├── notification.py             # 📧 Email notification module (SMTP)
├── slack_notification.py       # 💬 Slack notification module (Webhooks)
├── requirements.txt            # 📦 Python dependencies
├── .env                        # Environment variables (not in git)
├── .env.example                # Template for environment variables
├── json/                       # 💾 Template standards
│   ├── DPA.json
│   ├── JCA.json
│   ├── CCA.json
│   ├── PSA.json
│   └── SCC.json
└── templates/                  # Reference documents
    ├── (DPA) appendix-gdpr-eea-uk-4-27-21.pdf
    ├── (JCA) model-joint-controllership-agreement.pdf
    ├── (C2C) 2-Controller-to-controller-data-privacy-addendum.pdf
    ├── (SCCs) Standard Contractual Clauses.pdf
    └── (PSA) Personal-Data-Sub-Processor-Agreement-2024-01-24.pdf
```
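The `json/` folder holds one template standard per agreement type. Below is a minimal sketch of how a detected type could be mapped to its template file. It assumes each file holds a JSON object. The helper name `load_template_standard` and the mapping are hypothetical. Because the tree lists `CCA.json` while the agreement type is abbreviated C2C, the sketch also assumes that C2C maps to `CCA.json`.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical mapping from detected agreement type to template file.
# Assumption: the Controller-to-Controller (C2C) standard is stored as CCA.json.
TEMPLATE_FILES = {
    "DPA": "DPA.json",
    "JCA": "JCA.json",
    "C2C": "CCA.json",
    "PSA": "PSA.json",
    "SCC": "SCC.json",
}


def load_template_standard(doc_type: str, json_dir: str | Path = "json") -> dict:
    """Hypothetical helper: load the template standard for a detected type."""
    try:
        filename = TEMPLATE_FILES[doc_type]
    except KeyError:
        raise ValueError(f"Unsupported agreement type: {doc_type!r}") from None
    path = Path(json_dir) / filename
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A lookup table keeps the type-to-file naming in one place. A mismatch like C2C/CCA then only needs fixing once.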
Prerequisites:
- Python 3.8 or higher
- A Google Gemini API key

Installation:

1. Clone the repository:

```bash
git clone <repository-url>
cd project
```

2. Create and activate a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

3. Install the dependencies:

```bash
pip install -r requirements.txt
```

4. Set up the environment variables. Create a `.env` file in the project root (copy it from `.env.example`):

```bash
# API Keys
GEMINI_API_KEY=your_gemini_api_key_here
GROQ_API_KEY=your_groq_api_key_here

# Email Configuration (for notifications)
SMTP_SENDER_EMAIL=your_email@gmail.com
SMTP_PASSWORD=your_app_password_here
SMTP_RECEIVER_EMAIL=receiver_email@gmail.com
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587

# Slack Configuration (optional)
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```

Note for Gmail users: use an App Password, not your regular password. Enable 2-Step Verification first, then generate an App Password from your Google Account security settings.

5. Prepare the template standards (if not already present) by running the pipeline to generate the JSON templates:

```bash
python pipeline.py
```
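Before running the pipeline, it can help to confirm that the variables from your `.env` file are actually visible to Python. Below is a minimal stdlib-only sketch using the variable names from the `.env` example above. The helper `missing_settings` is hypothetical. Which variables count as required is also an assumption: `SLACK_WEBHOOK_URL` is documented as optional, and `GROQ_API_KEY` is left out here. In the project itself, python-dotenv's `load_dotenv()` would populate `os.environ` first.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed set of required variables, taken from the .env example.
REQUIRED = (
    "GEMINI_API_KEY",
    "SMTP_SENDER_EMAIL",
    "SMTP_PASSWORD",
    "SMTP_RECEIVER_EMAIL",
    "SMTP_SERVER",
    "SMTP_PORT",
)


def missing_settings(env: Mapping[str, str] = os.environ,
                     required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Hypothetical helper: return required names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

For example, calling `missing_settings()` after `load_dotenv()` and printing any names it returns surfaces configuration gaps before the first SMTP or Gemini call fails.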
Launch the Streamlit web application:

```bash
streamlit run main.py
```

The application will open in your default browser at http://localhost:8501.
1. Upload Contract 📤
   - Click "Browse files" or drag and drop a PDF contract
   - Supported format: PDF only
2. Automatic Analysis 🤖
   - The system detects the document type
   - Extracts all clauses automatically
   - Compares them against the GDPR standard templates
3. Review Results
   - View missing or altered clauses
   - Check compliance risks
   - Review the risk score and recommendations
   - Get actionable amendments
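The "missing or altered clauses" view boils down to comparing the clauses extracted from the upload with those in the matching template. The project does this with an AI model. As a rough, deterministic stand-in, here is a sketch that uses `difflib` string similarity instead. The function name, the clause-title keys, and the 0.85 threshold are all assumptions for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def compare_clauses(template: dict[str, str], uploaded: dict[str, str],
                    threshold: float = 0.85) -> dict[str, list[str]]:
    """Hypothetical helper: flag template clauses as missing or altered.

    Both arguments map a clause title to its text. A clause is "missing" if
    its title is absent from the upload, and "altered" if the similarity
    ratio of the two texts falls below `threshold`.
    """
    missing, altered = [], []
    for title, expected in template.items():
        actual = uploaded.get(title)
        if actual is None:
            missing.append(title)
            continue
        ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
        if ratio < threshold:
            altered.append(title)
    return {"missing": missing, "altered": altered}
```

Usage:

```python
template = {
    "Confidentiality": "The processor shall keep personal data confidential.",
    "Sub-processing": "The processor shall not engage another processor without authorisation.",
}
uploaded = {"Confidentiality": "The processor shall keep personal data confidential."}
print(compare_clauses(template, uploaded))
# {'missing': ['Sub-processing'], 'altered': []}
```

Plain string similarity cannot recognize a clause that was reworded but keeps the same meaning. That semantic gap is why the project relies on an AI model for this step.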
Module overview:

`main.py`
- 🎨 Streamlit web interface
- 📤 File upload handling
- Background scheduler for auto-updates
- Results visualization

`agreement_comparision.py`
- Document type detection using AI
- ⚖️ Clause-by-clause comparison
- 🎯 Risk scoring and analysis
- 💡 Compliance recommendations

`data_extraction.py`
- PDF text extraction
- 🤖 AI-powered clause extraction
- Summarization (for large documents)
- 💾 JSON output generation

`scrapping.py`
- 🕷️ Automated template scraping from web sources
- Scheduled updates (daily at midnight, 00:00)
- 📥 Downloads the latest standard agreements
- Detects changes using file hash comparison
- 📧 Sends email notifications when templates are updated
- 🛡️ Error handling and reporting

`pipeline.py`
- ⚙️ Orchestrates the entire workflow
- 🏗️ Builds the template library
- Runs the end-to-end comparison pipeline

`notification.py`
- 📧 Email notification system using SMTP
- Secure credential management via environment variables
- ✅ Configurable sender, receiver, and message content
- 🛡️ Error handling and validation

`slack_notification.py`
- 💬 Slack notification system using webhooks
- Rich formatted messages with blocks
- 🎯 Compliance report formatting
- Template update notifications
- Secure webhook URL management
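The Slack module posts "rich formatted messages with blocks" to an incoming webhook. Here is a minimal sketch of what that can look like using only the standard library. Slack incoming webhooks accept a JSON body with a `text` fallback and a `blocks` array (Block Kit). The function names and the message layout are hypothetical, not the module's actual API.

```python
from __future__ import annotations

import json
import os
import urllib.request


def build_update_payload(changes: list[str]) -> dict:
    """Hypothetical helper: Block Kit payload announcing template changes."""
    bullet_list = "\n".join(f"• {change}" for change in changes) or "No changes."
    return {
        "text": "GDPR Template Update Notification",  # fallback text
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": "GDPR Template Update Notification"}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": bullet_list}},
        ],
    }


def post_to_slack(payload: dict, webhook_url: str | None = None) -> bool:
    """Hypothetical helper: POST the payload; skip if no webhook is configured."""
    url = webhook_url or os.getenv("SLACK_WEBHOOK_URL")
    if not url:
        return False  # Slack is optional, so a missing URL is not an error
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200
```

Reading the webhook URL from the environment, and returning early when it is unset, matches the README's description of Slack as optional. It also keeps the secret URL out of the source.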
Run the pipeline on a new document:

```python
from pipeline import run_pipeline

result = run_pipeline("your-contract.pdf")
print(result)
```

Send an email notification:

```python
from notification import send_email

# Use the defaults from .env
send_email()

# Send a custom notification
send_email(
    subject="Compliance Report Ready",
    body="Your GDPR compliance analysis is complete. Risk Score: 45/100",
    receiver="team@company.com",
)
```

Run the notification modules and the scraping test directly:

```bash
python notification.py
python slack_notification.py
python test_scraping_notification.py
```

The last command manually triggers the scraping process and sends notifications if changes are detected.
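For reference, a `send_email` that accepts the keyword arguments used above can be built on the standard library's `smtplib` and `email.message`. This is a sketch under assumptions, not the module's actual code. In particular, the default subject and body, the STARTTLS setup on port 587, and the `build_message` helper are illustrative.

```python
from __future__ import annotations

import os
import smtplib
from email.message import EmailMessage


def build_message(subject: str, body: str, sender: str, receiver: str) -> EmailMessage:
    """Hypothetical helper: assemble a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = receiver
    msg.set_content(body)
    return msg


def send_email(subject: str = "GDPR Compliance Notification",
               body: str = "The compliance check has finished.",
               receiver: str | None = None) -> None:
    """Sketch: send an email using SMTP settings read from the environment."""
    sender = os.environ["SMTP_SENDER_EMAIL"]
    password = os.environ["SMTP_PASSWORD"]
    receiver = receiver or os.environ["SMTP_RECEIVER_EMAIL"]
    server = os.getenv("SMTP_SERVER", "smtp.gmail.com")
    port = int(os.getenv("SMTP_PORT", "587"))

    msg = build_message(subject, body, sender, receiver)
    with smtplib.SMTP(server, port, timeout=30) as smtp:
        smtp.starttls()  # encrypt the connection before sending credentials
        smtp.login(sender, password)
        smtp.send_message(msg)
```

Splitting message construction from delivery means the content can be checked without a live SMTP server.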
Security:
- Temporary files are automatically cleaned up
- 🗑️ Uploaded files are deleted after processing
- API keys and credentials are kept in a `.env` file rather than in the source code
- 🚫 No data is stored permanently on the server
- The `.env` file is excluded from version control via `.gitignore`
- 🛡️ Gmail App Passwords are used instead of regular passwords
- ✅ All sensitive configuration is read from environment variables
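The first two points (temporary files cleaned up, uploads deleted after processing) follow a common pattern. The uploaded bytes are written to a temporary file and its path is passed to the pipeline. The file is then deleted in a `finally` block, so cleanup happens even if the analysis fails. In this sketch, `process_upload` and the `analyze` callback are hypothetical names.

```python
from __future__ import annotations

import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload(data: bytes, analyze: Callable[[str], T]) -> T:
    """Hypothetical helper: write upload bytes to a temp PDF, analyze, delete."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        # Close the file before analyze() opens it (required on Windows).
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return analyze(path)
    finally:
        # Runs whether analyze() returns normally or raises.
        if os.path.exists(path):
            os.remove(path)
```

In the Streamlit app this could be called as `process_upload(uploaded_file.getvalue(), run_pipeline)`, where `uploaded_file` is the object returned by `st.file_uploader`.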
Tech stack:
- Frontend: Streamlit
- AI Model: Google Gemini 2.5 Flash
- PDF Processing: PyPDF2, pypdf
- Data Validation: Pydantic
- Scheduling: schedule
- Environment: python-dotenv
Risk score interpretation:
- 0-25: ✅ Low Risk (minor issues)
- 26-50: ⚠️ Medium Risk (attention needed)
- 51-75: 🔶 High Risk (significant gaps)
- 76-100: 🔴 Critical Risk (major compliance issues)
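The bands above translate directly into a lookup. Here is a minimal sketch. The function name `risk_level` is hypothetical, and the sketch assumes integer scores.

```python
def risk_level(score: int) -> str:
    """Hypothetical helper: map a 0-100 risk score to its band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score <= 25:
        return "Low"
    if score <= 50:
        return "Medium"
    if score <= 75:
        return "High"
    return "Critical"


print(risk_level(45))  # Medium
```

The `Risk Score: 45/100` in the email example above therefore falls in the Medium band.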
The application includes a background scheduler that:
- Runs daily at midnight (00:00)
- 📥 Scrapes the latest templates from official sources
- Compares new templates with existing ones using hash verification
- 📧 Sends email notifications when changes are detected
- 💬 Sends Slack notifications (if configured)
- ✅ Ensures compliance checks use current standards
- ⚠️ Reports any errors encountered during updates
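Change detection compares a fingerprint of each freshly downloaded template with the one stored from the previous run. The project uses MD5 for this comparison, which is fine for spotting changed files but offers no tamper resistance. The sketch below covers the comparison step only. The function names and the on-disk hash store (a JSON file mapping filename to digest) are assumptions.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def detect_changes(template_dir: Path, hash_store: Path) -> dict[str, str]:
    """Hypothetical helper: compare current PDF hashes with stored ones.

    Returns {filename: "new" | "updated"} and rewrites the hash store.
    """
    previous = json.loads(hash_store.read_text()) if hash_store.exists() else {}
    current = {p.name: file_md5(p) for p in sorted(template_dir.glob("*.pdf"))}
    changes = {}
    for name, digest in current.items():
        if name not in previous:
            changes[name] = "new"
        elif previous[name] != digest:
            changes[name] = "updated"
    hash_store.write_text(json.dumps(current, indent=2))
    return changes
```

To run this nightly, the `schedule` package from the tech stack can register a job with `schedule.every().day.at("00:00").do(job)` and then call `schedule.run_pending()` in a loop.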
Notifications are sent for:
- ✅ New templates created
- ✅ Existing templates updated with new clauses
- ⚠️ Download or processing errors

Notifications are delivered via:
- 📧 Email (SMTP)
- 💬 Slack (webhooks, if configured)
When templates are updated, you'll receive an email like this:

```text
Subject: GDPR Template Update Notification

CHANGES DETECTED:
• Data Processing Agreement: Template updated with new clauses
• Standard Contractual Clauses: New template created
```
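The sample email above can be produced from a list of detected changes. Here is a minimal sketch. It assumes changes arrive as (agreement name, description) pairs, and `format_update_email` is a hypothetical helper.

```python
from __future__ import annotations


def format_update_email(changes: list[tuple[str, str]]) -> tuple[str, str]:
    """Hypothetical helper: return (subject, body) for an update notification."""
    subject = "GDPR Template Update Notification"
    lines = ["CHANGES DETECTED:"]
    lines += [f"• {name}: {description}" for name, description in changes]
    return subject, "\n".join(lines)


subject, body = format_update_email([
    ("Data Processing Agreement", "Template updated with new clauses"),
    ("Standard Contractual Clauses", "New template created"),
])
print(body)
# CHANGES DETECTED:
# • Data Processing Agreement: Template updated with new clauses
# • Standard Contractual Clauses: New template created
```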
See SCRAPING_NOTIFICATION_GUIDE.md for detailed information.
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is licensed under the MIT License.
For questions or support, please contact the development team.
Planned enhancements:
- Multi-language support
- 📧 Email notifications for compliance reports ✅
- Historical comparison tracking
- Integration with document management systems
- 📱 Mobile-responsive interface
- 🎨 Custom template creation
- Advanced analytics dashboard
- Webhook support for real-time notifications
- 📱 SMS notifications
Recent updates:
- Automatic change detection for template updates
- Sends email notifications when GDPR templates are updated
- Uses MD5 hash comparison for accurate change detection
- Reports both successful updates and errors
- Fixed JSON file paths (`json_files/` → `json/`)
- Added comprehensive logging and error handling
- `notification.py` completely rewritten with security best practices
- Moved all hardcoded credentials to the `.env` file
- Added a reusable `send_email()` function with flexible parameters
- Implemented proper error handling and input validation
- Added comprehensive documentation
- All sensitive credentials are now in the `.env` file:
  - Email credentials (SMTP sender, password, receiver)
  - API keys (Gemini, Groq)
  - Server configuration
- Created `.env.example` as a safe template for team members
- Verified that `.gitignore` excludes `.env` from version control
- Eliminated security risks of hardcoded credentials
- Improved code maintainability and reusability
- Added detailed inline documentation
- Fixed typos and improved code structure
- Follows Python PEP 8 standards
- Updated README.md with notification module usage
- Added Gmail App Password setup instructions
- Included environment variable configuration guide
- Provided example workflows for email notifications
Made with ❤️ for GDPR Compliance | Powered by 🤖 Google Gemini AI