- Overview
- Quick Start
- Core Principles
- Docker Setup
- Usage
- CI/CD Integration
- Configuration
- Troubleshooting
- Performance
Automated watermark detection system for CI/CD pipelines that prevents sensitive watermarks (e.g., Feishu/DingTalk personal watermarks) from being committed to repositories.
- ✅ Geometric Angle Filtering: Detects tilted watermarks (>10°) using OCR
- ✅ Docker-based: Pre-built image with all dependencies (~10x faster)
- ✅ Automatic: Runs on every PR with image changes
- ✅ Accurate: Pattern matching for "Name + 4-digit number" format
- ✅ Efficient: Only scans changed images in PRs
PR with Images → Detect Changes → Pull Docker Image → OCR Analysis
                                                           ↓
                                                   Angle Calculation
                                                           ↓
                                                    Pattern Matching
                                                           ↓
                                                    Watermark Found? → ❌ Block PR
                                                           ↓
                                                           No → ✅ Pass
Step 1: Add Workflow
Create .github/workflows/checkpatch.yml:
name: checkpatch
on:
  pull_request:
    types: [opened, reopened, synchronize]
jobs:
  checkpatch:
    uses: YOUR_USERNAME/public-actions/.github/workflows/checkpatch.yml@dev
    secrets: inherit
Step 2: Make Docker Image Public
- Go to https://github.com/YOUR_USERNAME?tab=packages
- Click on watermark-detector package
- Package settings → Change visibility → Public
Done! Watermark detection now runs automatically on all PRs.
Local Testing:
# Using Docker (recommended)
docker run --rm -v "$(pwd):/workspace" \
ghcr.io/YOUR_USERNAME/watermark-detector:latest image.jpg
# Using Python script
python detect_watermark.py image.jpg
Office software (Feishu, DingTalk) adds personal watermarks to screenshots:
- Contains employee name + phone number
- Tilted at 30-45° for anti-forgery
- Can leak sensitive information when shared publicly
Geometric Angle Filtering Method:
- Architecture Content: Horizontal (0°) or vertical (90°) text
  - Diagrams must be aligned for readability
- Watermark Text: Tilted at 30-45°
  - Designed to cover entire image
  - Cannot be easily removed
- Detection Logic: Extract all text → Calculate angles → Filter tilted text (>10°) → Pattern match "Name + 4 digits" → Report if found
- Stable: Based on geometry (text orientation), not colors/brightness
- Accurate: Architecture diagram text is almost always horizontal or vertical
- Robust: Watermarks must be tilted to be effective
- Unavoidable: A watermark cannot be both effective and axis-aligned, so the tilt is a dependable signal
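The angle test in the detection logic above can be sketched as follows. This is a minimal illustration, not the code in detect_watermark.py: the function names are hypothetical, the box is assumed to be four (x, y) corner points starting at the top-left and going clockwise, and folding the angle to its distance from the nearest axis (so that vertical text at 90° counts as aligned) is an assumption about how "tilted" is meant.

```python
import math

MIN_ANGLE_THRESHOLD = 10.0  # degrees, same default as the documented setting


def box_angle(box):
    """Angle in degrees of the box's top edge, computed with atan2(dy, dx)."""
    (x1, y1), (x2, y2) = box[0], box[1]  # top-left and top-right corners
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def deviation_from_axis(angle):
    """Fold any angle to its distance from the nearest axis (0 to 45 degrees).

    Assumption: horizontal (0) and vertical (90) text both count as aligned.
    """
    a = abs(angle) % 90.0
    return min(a, 90.0 - a)


def is_tilted(box, threshold=MIN_ANGLE_THRESHOLD):
    """True when the text box deviates from both axes by more than threshold."""
    return deviation_from_axis(box_angle(box)) > threshold


horizontal = [(0, 0), (100, 0), (100, 20), (0, 20)]
watermark = [(0, 0), (100, 70), (90, 85), (-10, 15)]  # top edge at roughly 35 degrees
print(is_tilted(horizontal), is_tilted(watermark))
```

Using the absolute value means the sign convention of the image coordinate system (y pointing down) does not affect the result.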
Docker images are automatically built when:
- detect_watermark.py is modified
- Dockerfile.watermark is modified
- Pushed to dev or trunk branch
Base Image: python:3.10-slim
Pre-installed:
- opencv-python-headless 4.8.1.78
- numpy 1.24.3
- paddlepaddle 2.6.2
- paddleocr 2.7.3
- PaddleOCR models (pre-downloaded):
  - en_PP-OCRv3_det_infer (detection)
  - en_PP-OCRv4_rec_infer (recognition - latest)
  - ch_ppocr_mobile_v2.0_cls_infer (angle classification)
Image Tags:
- latest - Latest stable (dev branch, default)
- trunk - Trunk branch version
- dev-<sha> - Specific commit from dev branch
Registry: ghcr.io/YOUR_USERNAME/watermark-detector
# Build locally
docker build -f Dockerfile.watermark -t watermark-detector .
# Test
docker run --rm -v "$(pwd):/workspace" watermark-detector test.jpg
# Push to registry
docker tag watermark-detector ghcr.io/YOUR_USERNAME/watermark-detector:latest
docker push ghcr.io/YOUR_USERNAME/watermark-detector:latest
Single Image:
python detect_watermark.py image.jpg
Directory:
python detect_watermark.py ./images/
Docker:
docker run --rm -v "$(pwd):/workspace" \
  ghcr.io/YOUR_USERNAME/watermark-detector:latest image.jpg
Method 1: Reusable Workflow (Recommended)
jobs:
  checkpatch:
    uses: YOUR_USERNAME/public-actions/.github/workflows/checkpatch.yml@dev
    secrets: inherit
Method 2: Direct Docker
- name: Check Watermark
  run: |
    docker run --rm -v "$(pwd):/workspace" \
      ghcr.io/YOUR_USERNAME/watermark-detector:latest image.jpg
Method 3: GitHub Action
- uses: YOUR_USERNAME/public-actions/.github/actions/watermark-check@dev
  with:
    image-path: 'image.jpg'
✅ Safe Image:
[*] Starting watermark detection
[*] Filtering logic: Only tilted text > 10°
[*] Detecting image: diagram.png
[*] Recognizing text...
✅ diagram.png: No tilted text found, image is safe.
❌ Watermark Detected:
[*] Starting watermark detection
[*] Detecting image: screenshot.jpg
[*] Recognizing text...
[Suspicious watermark] Angle: 35.2° | Content: Jianjun Li
[Suspicious watermark] Angle: 35.1° | Content: 6719
🚨 screenshot.jpg: DETECTED SENSITIVE WATERMARK 🚨
Name: Jianjun Li | Number: 6719
The watermark check is integrated into checkpatch.yml:
- name: Check Watermark in Images
  run: |
    cd ${{ env.REPO_NAME }}
    commits="${{ github.event.pull_request.base.sha }}..HEAD"
    # Get modified images (case-insensitive)
    # grep -c already prints 0 (and exits non-zero) when nothing matches,
    # so use "|| true"; "|| echo 0" would append a second 0 to the count
    image_count=$(git diff -z --name-only --diff-filter=ACM "$commits" | \
      tr '\0' '\n' | grep -icE '\.(png|jpg|jpeg|bmp|gif)$' || true)
    if [ "$image_count" -gt 0 ]; then
      echo "Found $image_count image(s), starting detection..."
      # Check each image with Docker
      git diff -z --name-only --diff-filter=ACM "$commits" | \
        tr '\0' '\n' | grep -iE '\.(png|jpg|jpeg|bmp|gif)$' > /tmp/images.txt
      has_error=0
      while IFS= read -r img; do
        if [ -f "$img" ]; then
          docker run --rm -v "$(pwd):/workspace" \
            ghcr.io/${{ github.repository_owner }}/watermark-detector:latest "$img" \
            || has_error=1
        fi
      done < /tmp/images.txt
      [ "$has_error" -eq 1 ] && exit 1
      echo "✅ All images passed"
    fi
- Trigger: PR opened, reopened, or synchronized (new commits pushed)
- Scope: Only images changed in the PR
- Action: Blocks PR merge if watermark detected
- Performance: ~15-25 seconds (after image cache)
Edit detect_watermark.py:
# Angle threshold (default: 10 degrees)
MIN_ANGLE_THRESHOLD = 10.0
# Pattern for "Name + 4-digit number"
pattern = re.compile(r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(\d{4})\b')
Lower threshold (5°): More sensitive, may have false positives
Higher threshold (15°): Less sensitive, may miss some watermarks
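To see what the "Name + 4-digit number" pattern accepts, the following sketch applies the same regex to a few candidate strings. The helper name `find_watermark` and joining the candidates with a single space are assumptions made for illustration; the script's own concatenation step may differ.

```python
import re

# Same pattern as in the configuration above
pattern = re.compile(r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(\d{4})\b')


def find_watermark(candidates):
    """Join tilted text fragments and look for "Name + 4 digits".

    Returns (name, number) for the first match, or None.
    """
    text = " ".join(candidates)  # assumption: fragments are joined with spaces
    match = pattern.search(text)
    return (match.group(1), match.group(2)) if match else None


# Name and number often arrive as separate OCR fragments
print(find_watermark(["Jianjun Li", "6719"]))
# Five digits do not satisfy \d{4}\b
print(find_watermark(["Build 12345"]))
# Any word followed by four digits matches, which is why the angle filter runs first
print(find_watermark(["Port 8080"]))
```

In this sketch the fragment order matters: the name has to come before the number in the joined text, so `["6719", "Jianjun Li"]` would not match.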
- PNG (.png)
- JPEG (.jpg, .jpeg)
- BMP (.bmp)
- GIF (.gif)
Case-insensitive matching.
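The same case-insensitive extension check can be mirrored in Python, for example when testing the file selection outside CI. This is a small sketch; `IMAGE_RE` and `changed_images` are illustrative names and are not taken from detect_watermark.py.

```python
import re

# Mirrors the extension list used by the CI step, matched case-insensitively
IMAGE_RE = re.compile(r'\.(png|jpg|jpeg|bmp|gif)$', re.IGNORECASE)


def changed_images(paths):
    """Keep only paths whose extension is one of the supported image formats."""
    return [p for p in paths if IMAGE_RE.search(p)]


print(changed_images(["docs/a.PNG", "b.jpeg", "c.txt", "d.gif.bak", "e.JpG"]))
```

Anchoring the pattern at the end of the string keeps names such as `d.gif.bak` out of the scan.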
In detect_watermark.py:
ocr = PaddleOCR(
    use_angle_cls=True,       # Enable angle classification
    lang="en",                # Language (en/ch)
    det_db_thresh=0.05,       # Detection threshold
    det_db_unclip_ratio=2.5,  # Text box expansion
    show_log=False            # Suppress logs
)
Error:
Error: failed to pull image: unauthorized
Solution:
- Ensure Docker image is public
- Go to GitHub Packages → watermark-detector → Settings
- Change visibility to Public
Error:
Error: manifest unknown
Solution:
- Check if image exists: https://github.com/YOUR_USERNAME?tab=packages
- Wait for initial build to complete (~5-10 minutes)
- Verify image name in workflow matches registry
Symptom: Architecture diagram text flagged as watermark
Solution:
- Check if diagram has tilted text (e.g., diamond shapes)
- Increase MIN_ANGLE_THRESHOLD to 15°
- Adjust regex pattern to be more specific
Symptom: Watermark not detected
Solution:
- Check image resolution (may be too low)
- Decrease MIN_ANGLE_THRESHOLD to 5°
- Verify watermark angle is >10° (use test mode)
- Check if watermark matches pattern (Name + 4 digits)
Symptom: CI takes too long
Solution:
- Ensure Docker image is cached (first run is slower)
- Reduce image resolution before committing
- Check whether the PR contains multiple large images
- Consider parallel processing for multiple images
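One possible shape for the parallel processing suggested above is a small thread pool that launches one container per image. This is a sketch under assumptions: the function names are hypothetical, the image name uses the same YOUR_USERNAME placeholder as the rest of this guide, and the container is assumed to exit non-zero when a watermark is found, as the CI step relies on.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

IMAGE = "ghcr.io/YOUR_USERNAME/watermark-detector:latest"  # placeholder registry path


def docker_command(img, workdir):
    """Build the same docker run command used elsewhere in this guide."""
    return ["docker", "run", "--rm", "-v", f"{workdir}:/workspace", IMAGE, img]


def scan_one(img, workdir, run=subprocess.run):
    """Scan a single image and return (path, exit code)."""
    return img, run(docker_command(img, workdir)).returncode


def scan_all(images, workdir, max_workers=4, run=subprocess.run):
    """Scan images concurrently and return the paths that failed the check."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda i: scan_one(i, workdir, run), images))
    return [img for img, code in results if code != 0]
```

A call such as `scan_all(["a.png", "b.jpg"], "/path/to/repo")` would return the list of flagged images. Since each detection uses roughly 500 MB of memory, `max_workers` should stay small on a standard CI runner.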
Without Docker (Installing dependencies each time):
- Install dependencies: ~2-3 minutes
- Detection: ~10-20 seconds per image
- Total: ~2.5-3.5 minutes
With Docker (Pre-built image with models):
- Pull image (first time): ~30-60 seconds
- Pull image (cached): ~5 seconds
- Detection: ~5-10 seconds per image (no model download!)
- Total: ~10-15 seconds (after cache)
Speed Improvement: ~10-15x faster once the image is cached ⚡
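The speedup can be checked against the totals listed above. The sketch below only does the arithmetic on those two ranges; the function name is illustrative.

```python
def speedup_range(baseline_s, docker_s):
    """Return (min, max) speedup for two (low, high) timing ranges in seconds."""
    lowest = baseline_s[0] / docker_s[1]   # fastest baseline vs. slowest Docker run
    highest = baseline_s[1] / docker_s[0]  # slowest baseline vs. fastest Docker run
    return lowest, highest


without_docker = (2.5 * 60, 3.5 * 60)  # ~2.5-3.5 minutes total
with_docker = (10, 15)                 # ~10-15 seconds total, image cached

low, high = speedup_range(without_docker, with_docker)
print(f"Speedup: {low:.0f}x to {high:.0f}x")  # Speedup: 10x to 21x
```

With these figures the range for a single image comes out at 10x to 21x, which is consistent with the headline estimate.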
- Image Size: Keep images under 2MB
- Resolution: 1920x1080 is usually sufficient
- Format: PNG is faster than JPEG for diagrams
- Batch: Multiple small images are faster than one large image
Docker Image:
- Size: ~1.5 GB (compressed: ~500 MB)
- Memory: ~500 MB during detection
- CPU: 1 core sufficient
CI Runner:
- Disk: ~2 GB for image + workspace
- Memory: ~1 GB total
- Network: ~500 MB first pull, ~0 MB cached
- detect_watermark.py - Detection script (5.8K)
- Dockerfile.watermark - Docker image definition (765B)
- .github/workflows/checkpatch.yml - Main CI workflow
- .github/workflows/build-watermark-docker.yml - Image builder
- .github/actions/watermark-check/action.yml - Reusable action
- WATERMARK-DETECTION.md - This file (complete guide)
- test-watermark.sh - Test Python environment
- test-docker.sh - Test Docker image
- .dockerignore - Docker build optimization
1. Preprocess image (enhance contrast, binarize)
2. Run OCR to extract all text boxes
3. For each text box:
   a. Calculate angle: atan2(dy, dx)
   b. If angle > threshold: mark as watermark candidate
4. Concatenate all watermark candidates
5. Apply regex: r'([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(\d{4})\b'
6. If match found: report watermark
Alternatives considered:
- Color filtering: Unreliable (watermarks can be any color)
- Line removal: Damages architecture content
- Template matching: Requires known watermark templates
Angle-based advantages:
- Fundamental geometric property
- Independent of color/brightness
- Doesn't damage content
- Works with unknown watermark formats
- Check this documentation
- Review GitHub Actions logs
- Test locally with Docker
- Open an issue with:
  - Error message
  - Sample image (without sensitive data)
  - Workflow logs
Improvements welcome:
- Better OCR accuracy
- Faster detection
- Support for more watermark types
- Documentation improvements
Part of the OpenVela public-actions repository.
# Local test with Python
python detect_watermark.py image.jpg
# Local test with Docker
docker run --rm -v "$(pwd):/workspace" \
ghcr.io/YOUR_USERNAME/watermark-detector:latest image.jpg
# Build Docker image
docker build -f Dockerfile.watermark -t watermark-detector .
# Run test suite
./test-docker.sh
- Angle: 10° (tilted text threshold)
- Pattern: Name + 4 digits
- Formats: png, jpg, jpeg, bmp, gif
- Packages: https://github.com/YOUR_USERNAME?tab=packages
- Actions: https://github.com/YOUR_USERNAME/REPO/actions
- Registry: ghcr.io/YOUR_USERNAME/watermark-detector
Last Updated: 2025-01-16
Version: 1.0.0