feat(telemetry): add centralized threat intelligence data pipeline (S3 + Lambda + Athena) #35
Open
TrishaG189 wants to merge 1 commit into c2siorg:main
Conversation
…th S3 sink, Lambda enrichment, Athena, and Fluent Bit log forwarder
Thanks for this, really nice work. It helped make the telemetry direction much clearer. I’ve built on top of it in PR #37, where I connected the multi-region Cowrie deployment with this pipeline and tested the full flow with real honeypot-generated events. Looking forward to building more on this together.
Author
@hariram4862 Glad it helped — I’ll check out PR #37 and go through the integration. Nice to see the full pipeline working with real events.
Summary
Implements the complete threat intelligence data pipeline for the Honeynet project —
the layer that transforms raw attacker logs into actionable, queryable intelligence.
Problem
Every existing PR deploys honeypots that store logs locally on the instance.
These logs are lost on termination, siloed by region, and provide zero threat context.
A honeypot without centralized, enriched data is just a trap with no analysis.
What This PR Adds
1. `terraform/modules/telemetry/` — Serverless Pipeline Infrastructure
2. `lambda/enrichment/handler.py` — Python Enrichment Function: queries AbuseIPDB for abuse score/country/ISP/Tor status and writes the enriched record
3. `ansible/playbooks/install_log_forwarder.yml` — Filebeat Agent: installs the log forwarder on a honeypot node on any cloud provider (AWS, GCP, Azure)

Example Athena Queries (work immediately after deployment)
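The enrichment step described above can be sketched roughly as follows. This is an illustrative sketch, not the actual `handler.py` from this PR: the function name, cache, and output field names are assumptions, and the AbuseIPDB lookup is injected as a callable so the sketch runs without network access or an API key.

```python
import json

# In-memory cache so repeated source IPs are looked up only once per
# warm Lambda container (an assumption about the caching strategy).
_ip_cache = {}


def enrich_event(event, lookup_ip):
    """Append threat-intel fields to one Cowrie event dict.

    `lookup_ip` is any callable returning a dict with AbuseIPDB-style
    fields (illustrative names); in the real handler this would be an
    HTTPS call to the AbuseIPDB check endpoint.
    """
    ip = event.get("src_ip")
    if ip is None:
        return event
    if ip not in _ip_cache:
        _ip_cache[ip] = lookup_ip(ip)
    intel = _ip_cache[ip]
    enriched = dict(event)
    enriched["abuse_score"] = intel.get("abuseConfidenceScore")
    enriched["country_code"] = intel.get("countryCode")
    enriched["isp"] = intel.get("isp")
    enriched["is_tor"] = intel.get("isTor")
    return enriched


if __name__ == "__main__":
    calls = []

    def fake_lookup(ip):
        calls.append(ip)
        return {"abuseConfidenceScore": 100, "countryCode": "NL",
                "isp": "ExampleNet", "isTor": False}

    raw = {"src_ip": "203.0.113.7", "eventid": "cowrie.login.failed"}
    out = enrich_event(raw, fake_lookup)
    enrich_event(raw, fake_lookup)  # second call is served from the cache
    print(out["abuse_score"], out["country_code"], len(calls))  # → 100 NL 1
```

The injected-lookup shape also makes the enrichment logic unit-testable without mocking HTTP.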
Architecture
Alignment with GSoC Objectives
Relation to Existing PRs
Proof of Execution (Live Environment Test)
To ensure this pipeline functions flawlessly in a real AWS environment, I deployed the complete module to a sandbox account, simulated Fluent Bit log ingestion, and queried the resulting enriched data via Athena.
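For reference, the simulated Fluent Bit ingestion mentioned above could use a configuration roughly like the following. This is a sketch only: the log path, bucket name, and region are placeholders, not values from this PR.

```ini
[INPUT]
    Name    tail
    Path    /home/cowrie/cowrie/var/log/cowrie/cowrie.json
    Parser  json
    Tag     cowrie

[OUTPUT]
    Name             s3
    Match            cowrie
    bucket           example-honeynet-raw-sink
    region           us-east-1
    total_file_size  1M
    upload_timeout   60s
```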
1. Successful Terraform Deployment
Infrastructure provisioned cleanly. IAM roles, Lambda triggers, and S3 lifecycle rules successfully attached.
2. Lambda IP Enrichment & Caching (CloudWatch Logs)
Uploaded a mock `cowrie.json` payload directly to the raw S3 sink. The Lambda trigger fired immediately, processed the file, and successfully stored the enriched JSON.

3. Athena SQL Query Results (The Enriched Data)
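The trigger path can be sketched as follows: an S3 `ObjectCreated` notification hands the Lambda a bucket/key pair, and each newline-delimited Cowrie record is enriched and re-serialized. In this sketch the object reader and the enricher are injected (in Lambda the reader would wrap boto3's `get_object`), so it runs offline; all names are illustrative assumptions.

```python
import json


def handle_s3_event(s3_event, read_object, enrich):
    """Process every object referenced by an S3 notification event.

    `read_object(bucket, key)` returns the raw NDJSON body as a string;
    `enrich` maps one parsed record to its enriched form. Both are
    injected so the sketch is testable without AWS credentials.
    """
    enriched_lines = []
    for rec in s3_event.get("Records", []):
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = read_object(bucket, key)
        for line in body.splitlines():
            if line.strip():  # skip blank lines in the NDJSON body
                enriched_lines.append(json.dumps(enrich(json.loads(line))))
    return "\n".join(enriched_lines)
```

Keeping the handler a pure function over the notification payload is what makes a mock `cowrie.json` upload, as described above, a sufficient end-to-end test of the trigger.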
Triggered the Glue Crawler to infer the schema. Successfully queried the enriched data using standard SQL in Athena. Note the `abuse_score` and `country_code` fields successfully appended to the raw Cowrie data!

Closes #30
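The kind of question the enriched table answers can be expressed in Athena SQL. The table and column names below are assumptions based on the fields this PR describes, and the small pure-Python function demonstrates the same aggregation offline over enriched records.

```python
# Illustrative Athena query (table and column names assumed, not from this PR):
ATHENA_SQL = """
SELECT src_ip, MAX(abuse_score) AS worst_score, COUNT(*) AS hits
FROM honeynet_enriched
WHERE country_code = 'NL' OR is_tor = true
GROUP BY src_ip
ORDER BY worst_score DESC
LIMIT 10;
"""


def top_attackers(records, limit=10):
    """Offline equivalent of the aggregation above over enriched dicts."""
    by_ip = {}
    for r in records:
        ip = r["src_ip"]
        worst, hits = by_ip.get(ip, (0, 0))
        by_ip[ip] = (max(worst, r.get("abuse_score", 0)), hits + 1)
    ranked = sorted(by_ip.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(ip, worst, hits) for ip, (worst, hits) in ranked[:limit]]


if __name__ == "__main__":
    sample = [
        {"src_ip": "203.0.113.7", "abuse_score": 100},
        {"src_ip": "203.0.113.7", "abuse_score": 80},
        {"src_ip": "198.51.100.9", "abuse_score": 40},
    ]
    print(top_attackers(sample))  # → [('203.0.113.7', 100, 2), ('198.51.100.9', 40, 1)]
```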