GitHub - Nkovaturient/Astro_FIL: A real-time AI-powered exoplanet classifier trained on data from the NASA Exoplanet Archive and arXiv. It uses Filecoin storage to upload the dataset and ML model securely, then retrieves them so the model can run inference and estimate the probability that an object is an exoplanet.

🌌 AstroFIL : Realtime AI-Powered Exoplanet Classifier, Powered by Lighthouse Perpetual Storage🌌

Have you ever been awed by the vastness of the universe: the formation of stars, the Pillars of Creation, black holes, and Earth-like exoplanets beyond our solar system?
AstroFIL aims to contribute to open science by making exoplanet data more accessible and easier to interpret, and by getting quick, useful outputs from AI and ML models trained on exoplanet datasets. This project is a step toward that vision.
AstroFIL demonstrates an end-to-end workflow for building an ML model that classifies exoplanets using data from the NASA Exoplanet Archive. It integrates decentralized storage via Lighthouse (a Filecoin-based storage solution) to store datasets, trained models, and metadata. The project showcases data retrieval, preprocessing, model training, decentralized storage, and inference in one streamlined pipeline.
It fosters decentralized security, real-time adaptability, efficiency, and scientific collaboration.

Preview: Youtube Link

🌌 Project Overview

Fetch Realtime Scientific Papers: Queries the latest arXiv astro-ph abstracts and extracts scientific keywords using NER (BERT).
Generate Dynamic Dataset: Retrieves exoplanet data from NASA, synthesizes negative samples for classification, and labels both classes accordingly.
Train ML Model: Trains a Random Forest classifier to classify exoplanets based on four physical features.
Store on Lighthouse/Filecoin: Uploads the dataset, model (.joblib), and metadata (.json) to Lighthouse Storage and returns their IPFS CIDs.
Inference + Decentralized Retrieval: Reloads the model from its CID and runs predictions on test data.
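
The paper-fetching step can be sketched against the public arXiv API, which returns an Atom feed. This is a minimal illustration rather than the project's own code: it swaps feedparser for the standard library's XML parser so it runs without extra dependencies, and the category astro-ph.EP, the function names, and the sample feed are assumptions made for the example.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_query(category="astro-ph.EP", max_results=5):
    """Build a URL asking arXiv for the newest papers in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(atom_xml):
    """Return a list of {title, abstract} dicts from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        # arXiv wraps long fields across lines; collapse the whitespace.
        papers.append({
            "title": " ".join(title.split()),
            "abstract": " ".join(summary.split()),
        })
    return papers


# Hypothetical feed, used only to exercise the parser offline.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A made-up title about
      a transiting planet</title>
    <summary>  Placeholder abstract text.  </summary>
  </entry>
</feed>"""

papers = parse_arxiv_feed(SAMPLE_FEED)
```

In a live run, the text fetched from build_arxiv_query() would replace SAMPLE_FEED, and each abstract would then be passed to the NER step.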

🎯 Features

🔭 Data Source: NASA Exoplanet Archive API.
🧠 ML Model: Random Forest Classifier (scikit-learn).
📦 Decentralized Storage: Lighthouse + Filecoin/IPFS.
🧪 Inference Ready: Demonstrates real-time sample classification.
🔐 Robust Handling: Upload, download, and failure-safe CID operations.
♻️ Temp Management: Efficient tempfile cleanup.
📰 NER on ArXiv Abstracts: Keyword extraction from latest papers.
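
The "Robust Handling" and "Temp Management" features can be illustrated with a small sketch of a retrying download by CID and a temporary directory that is cleaned up automatically. The gateway URL pattern, the helper names, and the retry settings below are assumptions for illustration; the project itself goes through the lighthouseweb3 client.

```python
import tempfile
import time
import urllib.request
from pathlib import Path

# Assumed public gateway pattern; check the Lighthouse docs for your setup.
GATEWAY = "https://gateway.lighthouse.storage/ipfs/{cid}"


def http_fetch(url, timeout=30):
    """Default fetcher: return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retry(cid, fetch=http_fetch, attempts=3, delay=1.0):
    """Try the download a few times; return None instead of raising."""
    if not cid:
        return None
    url = GATEWAY.format(cid=cid)
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # URLError and socket timeouts are OSError subclasses.
            if attempt < attempts - 1:
                time.sleep(delay)
    return None


def with_temp_copy(data, filename, use):
    """Write data to a temp dir, call use(path), then delete everything."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / filename
        path.write_bytes(data)
        return use(path)
```

Passing the fetcher in as a parameter keeps the retry logic testable without network access, and the context manager removes the temporary file even if use() raises.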

📋 Prerequisites

Before running the project, ensure you have:
Python 3.8+ installed.
A Lighthouse API key (sign up at Lighthouse Storage).

🚀 Setup Instructions

Fork and Clone the Repository:

git clone <repository-url>
cd Astro_FIL

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required dependencies listed in requirements.txt:

pandas
numpy
scikit-learn
joblib
requests
lighthouseweb3
python-dotenv
feedparser
transformers
torch
torchvision
torchaudio

pip install -r requirements.txt

Set Up Lighthouse API Key:

Obtain an API key from Lighthouse Storage.
Update the API_KEY variable in the script: API_KEY = "your-lighthouse-api-key"
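
A hard-coded key is easy to commit by accident. Since python-dotenv is already in requirements.txt, one alternative is to read the key from the environment. This is a sketch only: the variable name LIGHTHOUSE_API_KEY and the helper load_api_key() are assumptions, not names used by the script.

```python
import os


def load_api_key(env_var="LIGHTHOUSE_API_KEY"):
    """Return the Lighthouse API key from the environment or raise."""
    try:
        # python-dotenv copies KEY=value pairs from a local .env file
        # into os.environ without overriding variables already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # Fall back to variables already set in the shell.

    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} in your shell or in a .env file.")
    return key
```

With this in place, the script could use API_KEY = load_api_key(), and the .env file can be listed in .gitignore.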

Run the script: python astroFil.py

🛠️ Code Structure

The project consists of the following key functions:
- create_sample_dataset() - Downloads a subset of exoplanet data from NASA and labels it as "confirmed."
- upload_to_lighthouse() - Uploads a file to Lighthouse Storage and returns its CID.
- download_from_lighthouse() - Downloads a file from Lighthouse using its CID.
- fetch_arxiv_astro_papers() - Gets abstracts from arXiv (astro-ph).
- extract_keywords() - Uses BERT NER to extract topic keywords.
- train_model() - Trains a Random Forest Classifier on the dataset and evaluates accuracy.
- main() - Orchestrates the workflow: dataset creation, training, storage, and inference.
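
To make the training step concrete, here is a minimal sketch of how a function like train_model() might combine confirmed planets with synthetic negatives and fit a Random Forest. The README does not name the four features or describe how negatives are sampled, so the columns (orbital period, radius, mass, equilibrium temperature), the uniform sampling, the stand-in data, and the hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Stand-in for confirmed planets. Assumed columns: orbital period (days),
# radius (Earth radii), mass (Earth masses), equilibrium temperature (K).
positives = np.column_stack([
    rng.lognormal(2.5, 1.0, 300),
    rng.lognormal(0.8, 0.5, 300),
    rng.lognormal(2.0, 1.0, 300),
    rng.normal(900, 300, 300).clip(100, None),
])


def synthesize_negatives(real, n, rng):
    """Draw each feature uniformly over a widened range of the real data.

    One simple way to build a negative class; the project's own sampling
    scheme may differ.
    """
    low, high = real.min(axis=0), real.max(axis=0)
    return rng.uniform(low * 0.5, high * 2.0, size=(n, real.shape[1]))


negatives = synthesize_negatives(positives, len(positives), rng)
X = np.vstack([positives, negatives])
y = np.concatenate([
    np.ones(len(positives), dtype=int),   # 1 = confirmed
    np.zeros(len(negatives), dtype=int),  # 0 = synthetic negative
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
# Probability of the "confirmed" class for the first test sample.
confirmed_col = list(model.classes_).index(1)
proba = model.predict_proba(X_test[:1])[0, confirmed_col]
print(f"accuracy={accuracy:.2f}, p(confirmed)={proba:.2f}")
```

A fitted model like this can be written with joblib.dump(model, "model.joblib") before upload and restored with joblib.load() after download.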

Key Libraries

pandas, numpy, scikit-learn, joblib – ML pipeline
requests, feedparser – Data fetching
lighthouseweb3 – IPFS/Filecoin Storage
transformers (pipeline), torch – Keyword extraction (NER)
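
As an illustration of the data-fetching side, the sketch below builds a query for the NASA Exoplanet Archive TAP service and parses the CSV it returns. The table name pscomppars and the four feature columns are assumptions (the README only says "four physical features"), and the helper names are hypothetical.

```python
import csv
import io
import urllib.parse

TAP_SYNC = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"
# Assumed columns: planet name plus four candidate physical features.
COLUMNS = ["pl_name", "pl_orbper", "pl_rade", "pl_bmasse", "pl_eqt"]


def build_tap_url(limit=500, table="pscomppars"):
    """Return a TAP URL selecting the first `limit` rows as CSV."""
    adql = f"select top {int(limit)} {','.join(COLUMNS)} from {table}"
    query = urllib.parse.urlencode({"query": adql, "format": "csv"})
    return f"{TAP_SYNC}?{query}"


def parse_rows(csv_text):
    """Parse CSV text, dropping rows with any missing feature value."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            name = record["pl_name"]
            features = [float(record[c]) for c in COLUMNS[1:]]
        except (TypeError, ValueError, KeyError):
            continue  # Blank or malformed cell: skip the row.
        # Label 1 marks a confirmed planet from the archive.
        rows.append({"name": name, "features": features, "label": 1})
    return rows
```

With requests installed, requests.get(build_tap_url(), timeout=60).text would supply the CSV text for parse_rows().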

🧠 Realtime Pipeline Explained

ArXiv paper titles + abstracts ⟶ Keywords
Keywords provide context and are recorded alongside the dataset in the metadata
NASA exoplanet dataset ⟶ Classifier ⟶ Decentralized upload
Run inference on downloaded model + test data
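
One way to picture how keywords, dataset, and model stay linked is a small metadata record written as JSON and uploaded next to the model. The field names and helper functions below are assumptions for illustration, not the project's actual schema.

```python
import json
from datetime import datetime, timezone


def build_metadata(keywords, dataset_cid, model_cid, accuracy, features):
    """Bundle what is needed to find and interpret a stored model."""
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "keywords": sorted(set(keywords)),  # de-duplicated NER output
        "dataset_cid": dataset_cid,
        "model_cid": model_cid,
        "accuracy": round(float(accuracy), 4),
        "features": list(features),
    }


def save_metadata(metadata, path):
    """Write the record as JSON so it can be uploaded with the model."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)
```

At inference time, reading this file back gives the model CID to download and the feature order the classifier expects.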

🌟 Future Improvements

🌐 Multi-class: Classify gas giants, terrestrials, and neutron stars.
🌌 Expand features: Add orbital eccentricity, distance, and stellar magnitude.
🔄 Label diversity: Add real-world unconfirmed objects.
🧪 AutoML: Try XGBoost or hyperparameter tuning with GridSearchCV.
🕸️ IPFS-based UI: Build browser-based querying via CID.

Endnote

Built with curiosity and cosmos in mind. Explore decentralized space research with AstroFIL 🌠😊😍

