Skip to content

Latest commit

 

History

History
54 lines (39 loc) · 2.5 KB

File metadata and controls

54 lines (39 loc) · 2.5 KB

NASA NEO Data Collection Project Summary

Project Overview

This project implements a data collection system for NASA's Near Earth Object (NEO) Web Service API. The system efficiently gathers, processes, and stores information about Near Earth Objects while providing tools for aggregation, analysis, and maintenance.

Key Requirements Implemented

API Integration: Successfully connected to NASA's NEO Web Service API with proper authentication and error handling

Data Collection: Collected required NEO data, including:

  • Identifier data (id, neo_reference_id, name, designation)
  • Physical characteristics (magnitude, diameter, hazard status)
  • Orbital parameters (approach distance, velocity, observation data)

Efficient Data Processing: Implemented chunk-based processing to manage memory usage and scale to larger datasets

Storage Format: Saved data in Parquet format with organized directory structure

Required Aggregations:

  • Total approaches closer than 0.2 AU
  • Number of close approaches per year

Extended Features

Robust Architecture

  • Modular code design with separate modules for API handling, data processing, and storage
  • Comprehensive error handling and logging for production readiness
  • Configurable parameters for flexibility in data collection

Scheduled Data Collection

  • Automated data collection using macOS Automator and Calendar integration
  • Timestamped filenames for tracking historical data

Maintenance Utilities

  • Tools for cleaning old data and log files
  • Optional archiving functionality for data retention policy implementation

Analysis Support

  • Sample data analysis notebook with visualizations
  • Exploration of machine learning potential using the collected data

Technical Implementation

  • Language: Python 3.7+
  • Key Libraries: pandas, requests, python-dotenv
  • Architecture: Modular design with separate components for API access, processing, and storage
  • Documentation: Comprehensive README with usage examples and troubleshooting guidance

Future Enhancements

  • Real-time monitoring dashboards
  • Expanded machine learning applications
  • Serverless deployment options for cloud-based execution

Conclusion

This implementation successfully delivers all required functionality while providing a scalable, maintainable foundation for future development. The system is designed to handle increased data volumes through configurable chunk sizes and could be easily extended to support cloud storage or distributed processing.