This project implements a data collection system for NASA's Near Earth Object (NEO) Web Service API. The system efficiently gathers, processes, and stores information about Near Earth Objects while providing tools for aggregation, analysis, and maintenance.
✅ API Integration: Successfully connected to NASA's NEO Web Service API with proper authentication and error handling
✅ Data Collection: Collected required NEO data, including:
- Identifier data (id, neo_reference_id, name, designation)
- Physical characteristics (magnitude, diameter, hazard status)
- Orbital parameters (approach distance, velocity, observation data)
✅ Efficient Data Processing: Implemented chunk-based processing to manage memory usage and scale to larger datasets
✅ Storage Format: Saved data in Parquet format with organized directory structure
✅ Required Aggregations:
- Total approaches closer than 0.2 AU
- Number of close approaches per year
- Modular code design with separate modules for API handling, data processing, and storage
- Comprehensive error handling and logging for production readiness
- Configurable parameters for flexibility in data collection
- Automated data collection using macOS Automator and Calendar integration
- Timestamped filenames for tracking historical data
- Tools for cleaning old data and log files
- Optional archiving functionality for data retention policy implementation
- Sample data analysis notebook with visualizations
- Exploration of machine learning potential using the collected data
- Language: Python 3.7+
- Key Libraries: pandas, requests, python-dotenv
- Architecture: Modular design with separate components for API access, processing, and storage
- Documentation: Comprehensive README with usage examples and troubleshooting guidance
- Real-time monitoring dashboards
- Expanded machine learning applications
- Serverless deployment options for cloud-based execution
This implementation successfully delivers all required functionality while providing a scalable, maintainable foundation for future development. The system is designed to handle increased data volumes through configurable chunk sizes and could be easily extended to support cloud storage or distributed processing.