A Python script that monitors specified Telegram chats and forwards messages matching your custom filters to another chat. It also supports OCR on attached images and avoids forwarding duplicate messages.
- Keyword-based filtering: Matches at least one keyword from each of three keyword categories.
- Image text recognition: Downloads attached images and extracts text using OCR (pytesseract).
- Duplicate prevention: Checks your last 10 forwarded messages before sending a new one.
- Customizable filters: Store your keywords and Telegram API keys in a JSON file.
- Multi-chat monitoring: Watch multiple Telegram chats at once.
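Here is a minimal sketch of how the OCR step might combine a message's text with text read from an attached image. It assumes the image has already been downloaded to a local file. `extract_message_text` is a hypothetical helper, not the script's actual function. It calls `pytesseract.image_to_string` with the `eng+ara` language models that the setup steps install.

```python
def extract_message_text(message_text, image_path=None, languages=("eng", "ara")):
    """Combine a message's own text with OCR text from an attached image.

    Hypothetical helper: assumes the image was already downloaded to image_path.
    """
    parts = [message_text or ""]
    if image_path is not None:
        # Imported lazily so text-only messages don't need Pillow/pytesseract loaded
        from PIL import Image
        import pytesseract

        with Image.open(image_path) as image:
            # "eng+ara" tells Tesseract to use both language models at once
            parts.append(pytesseract.image_to_string(image, lang="+".join(languages)))
    # Drop empty fragments and join the rest with newlines
    return "\n".join(part.strip() for part in parts if part and part.strip())
```

The combined string is what the keyword filters would then search, so a job ad posted only as an image can still match.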
On October 7th, 2025, the project reached a divergence point where I had to choose between two paths: specificity at the expense of easy repurposing, or generality at the expense of reliability for my current use case.
In the end, I chose both. I wanted the strengths of each approach, so now this project provides two distinct versions, each tailored to different needs:
v2 (Specialized): `main.py`

Best for: Filtering job postings with intelligent level detection
- Two-stage filtering with inference logic
- Entry-level vs. mid-level classification
- Experience, certification, and responsibility pattern matching
- Optimized for English/Arabic job market terminology
- Trade-off: Highly tailored to my specific use case; adapting it to other job-filtering scenarios needs only some adjustments, but repurposing it for other domains requires significant modification.
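The two-stage idea can be sketched as follows. Stage 1 looks for an explicit level keyword. Stage 2 infers the level from the *absence* of requirement patterns. The keyword list, the regular expressions, and `two_stage_match` are illustrative assumptions. The real patterns in `main.py` are more extensive, and this sketch does not model how v2 decides that a post is ambiguous.

```python
import re

# Hypothetical stage-1 keywords for an entry-level filter
ENTRY_LEVEL_KEYWORDS = ["junior", "entry level", "fresh graduate"]

# Hypothetical stage-2 requirement patterns (experience and certifications)
ENTRY_LEVEL_REQUIREMENTS = [
    r"\b\d+\+?\s*years?\b",       # e.g. "3 years", "5+ years"
    r"\bcertifi(?:ed|cation)\b",  # e.g. "certified", "certification"
]


def two_stage_match(text, level_keywords, requirement_patterns):
    """Return 1 or 2 for the stage that accepted the post, or None if rejected."""
    lowered = text.lower()
    # Stage 1: an explicit level keyword is present
    if any(keyword.lower() in lowered for keyword in level_keywords):
        return 1
    # Stage 2: no explicit keyword, so accept only if no requirement is mentioned
    if not any(re.search(p, text, re.IGNORECASE) for p in requirement_patterns):
        return 2
    return None
```

For the mid-level filter, the same inference would run with responsibility patterns added to the requirement list.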
v1 (Simple): `main_simple.py`

Best for: Simple keyword-based filtering for any content type
- Straightforward AND logic (level + role + location)
- Easy to repurpose for different domains (e.g., real estate, events, products)
- Minimal configuration required
- Trade-off: Less intelligent; may miss nuanced matches
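Here is a sketch of v1's AND logic. It assumes the keywords are stored as one list per category; the `KEYWORDS` dictionary and the `matches_all_categories` name are hypothetical. A message passes only if every category contributes at least one hit.

```python
# Hypothetical example configuration (v1 uses level + role + location)
KEYWORDS = {
    "level": ["junior", "entry level", "graduate"],
    "role": ["electrical engineer", "network engineer"],
    "location": ["basrah", "baghdad"],
}


def matches_all_categories(text, categories):
    """True if text contains at least one keyword from every category."""
    lowered = text.lower()
    # OR within a category, AND across categories
    return all(
        any(keyword.lower() in lowered for keyword in keywords)
        for keywords in categories.values()
    )
```

Plain substring matching is simple, but it can match inside longer words (for example, `graduate` inside `undergraduate`). A word-boundary regular expression would be stricter.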
📁 Files:
- `main.py`: current specialized version (v2)
- `main_simple.py`: original general-purpose version (v1)
💡 Which should you use?
- Filtering job postings specifically? → Use v2
- Need a simple keyword filter for other content? → Use v1
- Want to build something custom? → Start with v1 as a template
| Customization task | v1 (Simple) | v2 (Specialized) |
|---|---|---|
| Add new keywords | Easy | Easy |
| Change filter logic | Moderate | Complex |
| Repurpose for different domain | Moderate | Very Complex |
| Add new languages | Moderate | Challenging |
- Python 3.8+
- Telegram API credentials (API ID and API Hash; see the Telethon documentation for detailed instructions)
- `tesseract-ocr` for OCR functionality
Python libraries used:
- telethon
- pillow
- pytesseract
- scikit-learn
- cryptg
- tenacity
- Beautiful Soup
- lxml
- Clone the repository:
  ```bash
  git clone https://github.com/5wHN28Dg/tele-notify.git
  cd tele-notify
  ```
- Create a virtual environment:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Install Tesseract OCR:
  - Ubuntu/Debian:
    ```bash
    sudo apt install tesseract-ocr tesseract-ocr-eng tesseract-ocr-ara
    ```
  - Windows: Download the installer.
  - macOS:
    ```bash
    brew install tesseract tesseract-lang
    ```

  Note: If you run into problems installing Tesseract, refer to its official documentation or community forums.
- 📱 Android users: currently lost in dependency hell (coming soon!).
- Run this code (after filling in your API ID and API Hash) to list your chats with their names and IDs:
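```python
from telethon import TelegramClient

# Placeholders: replace with your own API ID (an integer) and API hash
api_id = YOUR_API_ID
api_hash = 'YOUR_API_HASH'

# 'session_name' is the file Telethon uses to store the login session
client = TelegramClient('session_name', api_id, api_hash)

async def main():
    # Print the ID and title of every dialog (chat, group, channel) you belong to
    async for dialog in client.iter_dialogs():
        print('{:>14}: {}'.format(dialog.id, dialog.title))

# Entering the client context starts it (prompting for login on first run)
with client:
    client.loop.run_until_complete(main())
```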
- Open the `config.json` file in the project directory and fill in:
  - Your API ID and API Hash.
  - The IDs of the chats you want to watch.
  - The ID of the chat you want to forward messages to.
  - The keywords you want to filter messages by.

  Note: do not edit `recent_messages`; the script maintains it.
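A small loader that fails early on a missing field can save you from a confusing crash later. The key names below are assumptions chosen for illustration; check the `config.json` in the repository for the actual ones.

```python
import json

# Hypothetical key names; the real config.json may use different ones
REQUIRED_KEYS = ("api_id", "api_hash", "source_chat_ids", "target_chat_id", "keywords")


def load_config(raw_json):
    """Parse config.json contents and raise ValueError if a required field is missing."""
    config = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    if not isinstance(config["api_id"], int):
        raise ValueError("api_id must be an integer")
    # recent_messages is maintained by the script itself; start empty if absent
    config.setdefault("recent_messages", [])
    return config
```

To use it, read the file with `open("config.json", encoding="utf-8")` and pass its contents to `load_config`.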
- Run the script:
  ```bash
  python main.py
  ```
  The script will:
  - Ask you to log in as a user by entering your phone number and the login code.
  - Start watching the specified Telegram chats.
  - Process any unread messages, then watch for new ones. For each message, it will:
    - Extract text from the message body and the attached image (if present).
    - Check for the required keywords.
    - Skip the message if it duplicates one of your last 10 forwarded messages.
    - Forward it to your target chat.
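You can sketch the duplicate check with a bounded `collections.deque`, which discards the oldest entry once it holds 10 items. `RecentMessages` is a hypothetical class, not the script's implementation. This version compares texts exactly after normalizing case and whitespace, so near-duplicates with different wording would still get through.

```python
from collections import deque


class RecentMessages:
    """Remember the last `size` forwarded texts and flag repeats (sketch)."""

    def __init__(self, size=10, history=()):
        # A deque with maxlen drops the oldest entry automatically when full
        self._recent = deque((self._normalize(t) for t in history), maxlen=size)

    @staticmethod
    def _normalize(text):
        # Collapse whitespace and case so trivial edits don't defeat the check
        return " ".join(text.split()).casefold()

    def is_duplicate(self, text):
        return self._normalize(text) in self._recent

    def remember(self, text):
        self._recent.append(self._normalize(text))

    def to_list(self):
        # Plain list, e.g. for saving back to recent_messages in config.json
        return list(self._recent)
```

In the message loop, the forward step would run only when `is_duplicate(text)` is false and would then be followed by `remember(text)`.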
- Fix race conditions when updating `recent_messages` and writing to `config.json`.
- Add a FAQ section in the wiki with a table of contents.
- Improve regex matching to detect messages formatted like: `#Basrah www.example.com/electrical-engineering-intern/`.
- Determine whether the account bans reported by `telethon.client.updates` are caused by the script (highly unlikely, as none of the reported chat IDs appear in the dialogs list obtained beforehand).
- Rethink & test job level identification logic for posts without clear level markers.
- ✅ Implemented two-stage filtering: explicit keywords (stage 1) + inference-based detection (stage 2)
- ✅ Entry-level: Matches if no experience/certification requirements found
- ✅ Mid-level: Matches if no experience/certification/responsibility requirements found
- ✅ Ambiguous messages forwarded to personal chat for manual review
- Fall back to sharing the message link for channels that have message forwarding disabled.
  - Share the message link with a brief summary (job title, location).
- Review and improve the retry mechanism.
- Set up crash notifications (email, webhook, or similar) and autostart on system boot. 🔄
- Add logging for pattern-match stages (which stage matched, which patterns triggered) for debugging? Maybe; we will see.
- Switch from `requests` to `aiohttp` for truly asynchronous operation.
- Reduce false positives. 🔄
- Add web scraping for ambiguous job posts.
- Create a modern CLI with real-time statistics instead of plain logs:
  - Show a progress bar for unread message processing.
  - Display processed message counts per chat and overall (over a time period).
  - Display forwarded message counts per chat and overall (over a time period).
  - Show a breakdown of matches by stage (stage 1 vs. stage 2 inference)?
  - Display the count of ambiguous messages forwarded for manual review.
  - Highlight important events (account bans, connection issues, etc.).
- Analyze the codebase for a possible second refactoring.
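For the regex item in the list above, here is one possible pattern for posts shaped like `#Basrah www.example.com/electrical-engineering-intern/`. It treats the hashtag as the location and the last URL path segment as the job title. The pattern and `parse_hashtag_post` are assumptions, not the script's current logic.

```python
import re

# Hypothetical pattern: a hashtag (location) followed by a URL whose last
# path segment is a hyphenated job title; scheme and "www." are optional.
POST_PATTERN = re.compile(
    r"#(?P<location>\w+)\s+"
    r"(?:https?://)?(?:www\.)?[\w.-]+\.\w+/"
    r"(?:[\w-]+/)*(?P<slug>[\w-]+)/?"
)


def parse_hashtag_post(text):
    """Return (location, job_title) for a matching post, else None."""
    match = POST_PATTERN.search(text)
    if match is None:
        return None
    # "electrical-engineering-intern" -> "electrical engineering intern"
    return match.group("location"), match.group("slug").replace("-", " ")
```

Python's `\w` matches Unicode letters, so Arabic hashtags such as `#البصرة` are captured too.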
This project is licensed under the AGPL License.