A realtime AI-powered exoplanet classifier trained on datasets from the NASA Exoplanet Archive and arXiv. It uses Filecoin's ample storage capacity to upload the dataset and trained ML model securely, then retrieves them so the model can run inference and estimate the probability that candidate exoplanets are genuine.


Nkovaturient/Astro_FIL

 
 


🌌 AstroFIL: Realtime AI-Powered Exoplanet Classifier, Powered by Lighthouse Perpetual Storage 🌌

  • Have you ever been awed by the vastness of the universe: the formation of stars, the Pillars of Creation, black holes, and exoplanets beyond Earth, yet so much like Earth?
  • Envisioned as a contribution to open science, this project aims to bring greater clarity and data accessibility to the scientific ecosystem, pairing exoplanet datasets with efficiently trained AI/ML models for faster results.
  • AstroFIL demonstrates an end-to-end workflow for building an ML model to classify exoplanets using data from the NASA Exoplanet Archive. It integrates decentralized storage via Lighthouse (a Filecoin-based storage solution) to store datasets, trained models, and metadata. The project showcases data retrieval, preprocessing, model training, decentralized storage, and inference in a streamlined pipeline.
  • It fosters decentralized security, real-time adaptability, efficiency, and scientific collaboration.

Preview: Youtube Link


🌌 Project Overview

  1. Fetch Realtime Scientific Papers: Queries latest arXiv astro-ph abstracts and extracts scientific keywords using NER (BERT).
  2. Generate Dynamic Dataset: Retrieves exoplanet data from NASA, synthesizes negative samples for classification, and labels accordingly.
  3. Train ML Model: Uses a Random Forest classifier to learn exoplanet classification based on four physical features.
  4. Store on Lighthouse/Filecoin: Uploads dataset, model (.joblib), and metadata (.json) to Lighthouse Storage and returns IPFS CIDs.
  5. Inference + Decentralized Retrieval: Model is reloaded from CID, and predictions are made on test data.
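Step 2 can be sketched as below; the column names and value ranges are illustrative assumptions, not the exact NASA Exoplanet Archive schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical confirmed-planet rows (stand-ins for NASA Exoplanet Archive data)
confirmed = pd.DataFrame({
    "orbital_period": rng.uniform(1, 400, 100),       # days
    "planet_radius": rng.uniform(0.5, 15, 100),       # Earth radii
    "equilibrium_temp": rng.uniform(150, 2000, 100),  # Kelvin
    "stellar_flux": rng.uniform(0.1, 500, 100),
    "label": 1,  # confirmed exoplanet
})

# Synthetic negative samples: draw from implausible feature ranges
negatives = pd.DataFrame({
    "orbital_period": rng.uniform(0.01, 1000, 100),
    "planet_radius": rng.uniform(20, 100, 100),       # too large for a planet
    "equilibrium_temp": rng.uniform(3000, 10000, 100),
    "stellar_flux": rng.uniform(1000, 10000, 100),
    "label": 0,
})

dataset = pd.concat([confirmed, negatives], ignore_index=True)
print(dataset.shape)  # (200, 5)
```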

🎯 Features

  • 🔭 Data Source: NASA Exoplanet Archive API.
  • 🧠 ML Model: Random Forest Classifier (scikit-learn).
  • 📦 Decentralized Storage: Lighthouse + Filecoin/IPFS.
  • 🧪 Inference Ready: Demonstrates real-time sample classification.
  • 🔐 Robust Handling: Upload, download, and failure-safe CID operations.
  • ♻️ Temp Management: Efficient tempfile cleanup.
  • 📰 NER on ArXiv Abstracts: Keyword extraction from latest papers.

📋 Prerequisites

Before running the project, ensure you have:

  • Python 3.8+ installed.
  • A Lighthouse API key (sign up at Lighthouse Storage).


🚀 Setup Instructions

  1. Fork and Clone the Repository:

    git clone <repository-url>
    cd astro_fil
    
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install the required dependencies listed in requirements.txt:

    pandas
    numpy
    scikit-learn
    joblib
    requests
    lighthouseweb3
    python-dotenv
    feedparser
    transformers
    torch
    torchvision
    torchaudio

  then run:

    pip install -r requirements.txt
  4. Set Up Lighthouse API Key:
  • Obtain an API key from Lighthouse Storage.
  • Update the API_KEY variable in the script: API_KEY = "your-lighthouse-api-key"
  5. Run the Script: python -m astrofil

🛠️ Code Structure

  • The project consists of the following key functions:

    • create_sample_dataset() - Downloads a subset of exoplanet data from NASA and labels it as "confirmed."
    • upload_to_lighthouse() - Uploads a file to Lighthouse Storage and returns its CID.
    • download_from_lighthouse() - Downloads a file from Lighthouse using its CID.
    • fetch_arxiv_astro_papers() - Fetches abstracts from arXiv (astro-ph).
    • extract_keywords() - Uses BERT NER to extract topic keywords.
    • train_model() - Trains a Random Forest Classifier on the dataset and evaluates accuracy.
    • main() - Orchestrates the workflow: dataset creation, training, storage, and inference.
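A minimal sketch of what train_model() might look like; the 80/20 split, n_estimators, and the synthetic stand-in data are assumptions, not the project's exact settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_model(X, y, n_estimators=100, seed=42):
    """Train a Random Forest on the 4 physical features and report accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    return clf, acc

# Synthetic stand-in for the labeled exoplanet dataset (4 features, 2 classes)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.array([1] * 100 + [0] * 100)

model, accuracy = train_model(X, y)
print(f"accuracy: {accuracy:.2f}")
```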

Key Libraries

  • pandas, numpy, scikit-learn, joblib – ML pipeline
  • requests, feedparser – Data fetching
  • lighthouseweb3 – IPFS/Filecoin Storage
  • transformers (pipeline API) – Keyword extraction (NER)
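fetch_arxiv_astro_papers() presumably queries the arXiv Atom API via feedparser; here is a sketch of building such a query URL (the exact category and parameters the project uses are assumptions):

```python
from urllib.parse import urlencode

def arxiv_query_url(category="astro-ph.EP", max_results=10):
    """Build an arXiv API query URL for the latest papers in a category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

url = arxiv_query_url()
# feedparser.parse(url) would then yield entries with .title and .summary
print(url)
```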

🧠 Realtime Pipeline Explained

  • ArXiv paper titles + abstracts ⟶ Keywords
  • Keywords drive context, tracked with dataset & metadata
  • NASA exoplanet dataset ⟶ Classifier ⟶ Decentralized upload
  • Run inference on downloaded model + test data
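The last two steps (serialize the model, reload it, run inference) can be sketched with joblib alone; in the real pipeline the .joblib file is first downloaded from Lighthouse by CID, and the toy data here is a stand-in:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train a toy classifier standing in for the exoplanet model
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)),   # "confirmed" cluster, label 1
               rng.normal(4, 1, (50, 4))])  # "negative" cluster, label 0
y = np.array([1] * 50 + [0] * 50)
clf = RandomForestClassifier(random_state=1).fit(X, y)

# Serialize, then reload as if the file had been fetched by CID
path = os.path.join(tempfile.mkdtemp(), "exoplanet_model.joblib")
joblib.dump(clf, path)
restored = joblib.load(path)

# Predict on a sample near the "confirmed" cluster
sample = np.array([[0.1, -0.2, 0.3, 0.0]])
print(restored.predict(sample))
print(restored.predict_proba(sample))
```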

🌟 Future Improvements

  • 🌐 Multi-label: Classify gas giants, terrestrials, and neutron stars.
  • 🌌 Expand features: Add orbital eccentricity, stellar distance, and magnitude.
  • 🔄 Label diversity: Add real-world unconfirmed objects.
  • 🧪 AutoML: Try XGBoost or GridSearch tuning.
  • 🕸️ IPFS-based UI: Build browser-based querying via CID.

Endnote

Built with curiosity and cosmos in mind. Explore decentralized space research with AstroFIL 🌠😊😍
