LunaticGhoulPiano/Flight-Spotter

Flight-Spotter

Overview

We split this project into three (or four) stages:

  1. Data preprocessing: fetch ADS-B Exchange historical data and filter it with custom conditions.
  2. Analysis and model building:
    1. Data mining: analyze the ADS-B signals above Taiwan's main island.
    2. GAI: train a model that generates simulated future flight routes.
  3. Interactive system: respond to the user's commands.

Current progress

  • Preprocessed the readsb-hist data into my own dataset, "readsb-hist_2025-04-01-000000-120000", and uploaded it to Hugging Face.
  • Used ./data/preprocessed/readsb-hist_filtered_by_Taiwan_manual_edges.csv to experiment with clustering algorithms, using Weka and self-written scripts.
  • Visualized the clustered data.
  • Analyzed the original dataset and the clustered data.

TODOs

  • Implement a Discord bot that queries the clusters with the adsb.lol API.
  • Implement main.py.

Flow

  • See ./flow.drawio.

Warnings

Install the required packages with pip first:

pip install -r requirements.txt
  • winsdk may sometimes fail to build while the Python packages are being installed.

Structure

Flight-Spotter
├──.env (currently unused, to store discord bot token)
├──.gitignore
├──LICENSE
├──README.md
├──Experiments.md (details of the experiments)
├──requirements.txt
├──flow.drawio
├──preprocessor.py (runs data gathering and preprocessing)
├──experiments.py (runs clustering and analyzing algorithms and draw results)
├──main.py (currently unused)
├──bot.py (currently unused)
├──get_data.py (sub-process script)
├──filter_and_encode.py (sub-process script)
├──eval_and_draw_weka_results.py (sub-process script)
├──clusterer.py (sub-process script)
├──analyzer.py (sub-process script)
├──gps.py (tool script)
├──ecef.py (tool script)
├──visualizer.py (tool script)
├──adsb_lol.py (tool script)
├──pics
│  ├──flow.png
│  ├──JADIZ_and_CADIZ_and_KADIZ_in_East_China_Sea.jpg
│  ├──Taiwan_ADIZ.jpg
│  ├──Taiwan_manual_edges.jpg
│  ├──degree2radian.jpg
│  ├──feet2meter.jpg
│  ├──ecef_conversion.jpg
│  ├──demo_p1.png
│  ├──demo_p2.png
│  └──demo_p3.png
├──data
│  ├──aircraft (auto-generated)
│  │  └──basic-ac-db.json
│  ├──filter_regions
│  │  ├──Taiwan_ADIZ.json
│  │  └──Taiwan_manual_edges.json
│  ├──historical_adsbex_sample (auto-generated, based on user's ENABLES settings)
│  │  └──readsb-hist
│  │     ├──2025_04_01_000000.json
│  │     ├──...
│  │     └──2025_04_01_120000.json
│  ├──preprocessed (auto-generated, based on the user's choice)
│  │  ├──readsb-hist_merged.csv
│  │  ├──readsb-hist_filtered_by_Taiwan_ADIZ.csv
│  │  └──readsb-hist_filtered_by_Taiwan_manual_edges.csv
│  └──直轄市、縣(市)界線1140318
│     ├──修正清單_11403.xlsx
│     ├──TW-01-301000100G-000017.xml
│     ├──COUNTY_MOI_1140318.shp.xml
│     ├──COUNTY_MOI_1140318.shx
│     ├──COUNTY_MOI_1140318.CPG
│     ├──COUNTY_MOI_1140318.dbf
│     ├──COUNTY_MOI_1140318.prj
│     ├──COUNTY_MOI_1140318.sbn
│     ├──COUNTY_MOI_1140318.sbx
│     └──COUNTY_MOI_1140318.shp
├──filter_region_maps (auto-generated, based on the user's choice)
│  ├──Taiwan_ADIZ.html
│  └──Taiwan_manual_edges.html
├──logs (auto-generated)
│  ├──basic_ac_db.txt
│  └──readsb-hist.txt
├──python_results
│     ├──kmeans
│     │  └──readsb-hist_filtered_by_Taiwan_manual_edges
│     │     ├──min_2_max_40
│     │     │  ├──SSE
│     │     │  │  ├──3D.png
│     │     │  │  ├──clustered.csv
│     │     │  │  ├──clustered.html
│     │     │  │  ├──distribution.csv
│     │     │  │  └──distribution.png
│     │     │  ├──Silhouette_Score
│     │     │  │  ├──3D.png
│     │     │  │  ├──clustered.csv
│     │     │  │  ├──clustered.html
│     │     │  │  ├──distribution.csv
│     │     │  │  └──distribution.png
│     │     │  ├──evaluation.csv
│     │     │  └──evaluation.png
│     │     ├──min_2_max_125
│     │     │  ├──SSE
│     │     │  │  ├──3D.png
│     │     │  │  ├──clustered.csv
│     │     │  │  ├──clustered.html
│     │     │  │  ├──distribution.csv
│     │     │  │  └──distribution.png
│     │     │  ├──Silhouette_Score
│     │     │  │  ├──3D.png
│     │     │  │  ├──clustered.csv
│     │     │  │  ├──clustered.html
│     │     │  │  ├──distribution.csv
│     │     │  │  └──distribution.png
│     │     │  ├──evaluation.csv
│     │     │  └──evaluation.png
│     │     └──min_7_max_20
│     │        ├──SSE
│     │        │  ├──3D.png
│     │        │  ├──clustered.csv
│     │        │  ├──clustered.html
│     │        │  ├──distribution.csv
│     │        │  └──distribution.png
│     │        ├──Silhouette_Score
│     │        │  ├──3D.png
│     │        │  ├──clustered.csv
│     │        │  ├──clustered.html
│     │        │  ├──distribution.csv
│     │        │  └──distribution.png
│     │        ├──evaluation.csv
│     │        └──evaluation.png
│     ├──hdbscan
│     │  └──readsb-hist_filtered_by_Taiwan_manual_edges
│     │     ├──finetune_records
│     │     │  ├──heatmap_clustering.png
│     │     │  ├──heatmap_noise.png
│     │     │  └──records.csv
│     │     ├──min_7_epsilon_0_07
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_7_epsilon_0_1
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_07
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_1
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_6
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_70_epsilon_1_1
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     └──min_90_epsilon_1_0
│     │           ├──3D.png
│     │           ├──clustered.csv
│     │           ├──clustered.html
│     │           ├──distribution.csv
│     │           └──distribution.png
│     └──optics
│        └──readsb-hist_filtered_by_Taiwan_manual_edges
│           ├──finetune_records
│           │  ├──heatmap_clustering.png
│           │  ├──heatmap_noise.png
│           │  └──records.csv
│           ├──min_7_epsilon_0_07
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           ├──min_7_epsilon_0_1
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           ├──min_20_epsilon_0_07
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           ├──min_20_epsilon_0_1
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           ├──min_20_epsilon_0_6
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           ├──min_70_epsilon_1_1
│           │     ├──3D.png
│           │     ├──clustered.csv
│           │     ├──clustered.html
│           │     ├──distribution.csv
│           │     └──distribution.png
│           └──min_90_epsilon_1_0
│                 ├──3D.png
│                 ├──clustered.csv
│                 ├──clustered.html
│                 ├──distribution.csv
│                 └──distribution.png
├──weka_results
│     ├──DBSCAN
│     │  └──readsb-hist_filtered_by_Taiwan_manual_edges
│     │     ├──min_7_epsilon_0_07
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_7_epsilon_0_1
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_07
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_1
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_20_epsilon_0_6
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     ├──min_70_epsilon_1_1
│     │     │     ├──clustered.arff
│     │     │     ├──3D.png
│     │     │     ├──clustered.csv
│     │     │     ├──clustered.html
│     │     │     ├──distribution.csv
│     │     │     └──distribution.png
│     │     └──min_90_epsilon_1_0
│     │           ├──clustered.arff
│     │           ├──3D.png
│     │           ├──clustered.csv
│     │           ├──clustered.html
│     │           ├──distribution.csv
│     │           └──distribution.png
│     ├──EM
│     │  └──readsb-hist_filtered_by_Taiwan_manual_edges
│     │     ├──clusters_7
│     │     │  ├──clustered.arff
│     │     │  ├──3D.png
│     │     │  ├──clustered.csv
│     │     │  ├──clustered.html
│     │     │  ├──distribution.csv
│     │     │  └──distribution.png
│     │     ├──clusters_20
│     │     │  ├──clustered.arff
│     │     │  ├──3D.png
│     │     │  ├──clustered.csv
│     │     │  ├──clustered.html
│     │     │  ├──distribution.csv
│     │     │  └──distribution.png
│     │     ├──clusters_70
│     │     │  ├──clustered.arff
│     │     │  ├──3D.png
│     │     │  ├──clustered.csv
│     │     │  ├──clustered.html
│     │     │  ├──distribution.csv
│     │     │  └──distribution.png
│     │     └──clusters_90
│     │        ├──clustered.arff
│     │        ├──3D.png
│     │        ├──clustered.csv
│     │        ├──clustered.html
│     │        ├──distribution.csv
│     │        └──distribution.png
│     └──XMeans
│        └──readsb-hist_filtered_by_Taiwan_manual_edges
│           ├──min_7_max_20
│           │  ├──clustered.arff
│           │  ├──3D.png
│           │  ├──clustered.csv
│           │  ├──clustered.html
│           │  ├──distribution.csv
│           │  └──distribution.png
│           ├──min_20_max_70
│           │  ├──clustered.arff
│           │  ├──3D.png
│           │  ├──clustered.csv
│           │  ├──clustered.html
│           │  ├──distribution.csv
│           │  └──distribution.png
│           └──min_70_max_90
│              ├──clustered.arff
│              ├──3D.png
│              ├──clustered.csv
│              ├──clustered.html
│              ├──distribution.csv
│              └──distribution.png
└──analyzed_results
      └──readsb-hist_filtered_by_Taiwan_manual_edges
         ├──grid_by_count.png
         ├──grid_by_flight_count.png
         ├──grid_by_flight_legend_off.png
         ├──grid_by_flight_legend_on.png
         ├──grid_by_hex_count.png
         ├──grid_by_hex_legend_off.png
         ├──grid_by_hex_legend_on.png
         ├──KDE.png
         ├──GETIS_ORD_Gstar.png
         ├──in_degree.png
         ├──out_degree.png
         ├──betweenness.png
         ├──closeness.png
         ├──pagerank.png
         ├──community_detection.png
         └──cluster_distribution_analysis.png

Datasets

ADS-B Exchange data

  • In this project, we use the readsb-hist data provided by ADSBEX instead of traces or hires-traces, for two reasons:
    1. The sampling rate of traces and hires-traces is hard to determine, whereas readsb-hist has a steady sampling rate (60 or 5 seconds, depending on the date of the data).
    2. It is hard to reduce the dimensionality of all traces of an aircraft, especially when the sampling rate is not steady.
  • ADS-B Exchange free historical data
  • readsb official documentation
  • Data used in this project
    • Our preprocessed data has already been uploaded to Hugging Face.
    • Time range:
      • 2025/04/01 00:00:00 ~ 12:00:00
      • Sampling interval: one snapshot every 5 seconds
    • Numbers and Size:
      • readsb-hist_merged.csv: 6089010 rows, 857 MB
      • readsb-hist_filtered_by_Taiwan_manual_edges.csv: 9148 rows, 1.27 MB
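
As a rough sanity check on the time range and sampling interval, the snapshot file names expected for this range can be enumerated. This is only a sketch: `sampling_interval_seconds` and `expected_snapshots` are illustrative names that do not exist in the repository, the file name pattern `yyyy_mm_dd_hhmmss.json` is the one described under Stage 1, and the real archive may be missing individual snapshots.

```python
from datetime import datetime, timedelta


def sampling_interval_seconds(moment: datetime) -> int:
    """Return the readsb-hist snapshot interval (in seconds) for a given time.

    Assumption taken from the ADS-B Exchange description quoted in this README:
    60 s from July 2016 through March 2020, 5 s from April 2020 onward.
    """
    if moment < datetime(2016, 7, 1):
        raise ValueError("readsb-hist data starts in July 2016")
    return 5 if moment >= datetime(2020, 4, 1) else 60


def expected_snapshots(start: datetime, end: datetime) -> list[str]:
    """List snapshot file names from start to end (inclusive).

    The interval is looked up once, for the start time, so the range should
    not cross the April 2020 boundary.
    """
    step = timedelta(seconds=sampling_interval_seconds(start))
    names = []
    current = start
    while current <= end:
        names.append(current.strftime("%Y_%m_%d_%H%M%S") + ".json")
        current += step
    return names


names = expected_snapshots(datetime(2025, 4, 1, 0, 0, 0),
                           datetime(2025, 4, 1, 12, 0, 0))
print(len(names), names[0], names[-1])
```

For the 12-hour range above this gives 8641 names, from 2025_04_01_000000.json to 2025_04_01_120000.json.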

Region Filter files

  • The "region filter files" are manually written JSON files in the following format:
[
    {
        "latitude": first GPS coordinate latitude,
        "longitude": first GPS coordinate longitude
    },
    {second GPS coordinate},
    {third GPS coordinate},
    ...,
    {last GPS coordinate},
    {first GPS coordinate},
]

Each GPS coordinate is in DMS (Degrees, Minutes, Seconds) format. filter_and_encode.py keeps the ADS-B signals whose GPS locations fall inside the polygon described by the selected region filter file. We made two region filter files:

  1. ./data/filter_regions/Taiwan_ADIZ.json:
    • The exact GPS coordinates of the Taiwan ADIZ (Air Defense Identification Zone) are given in Part 2 - ENR 5.2.3 ADIZ of the Taiwan eAIP.
    • Plotted polygon: [image]
    • ADIZs in East Asia (picture from Wikipedia): [image]
  2. ./data/filter_regions/Taiwan_manual_edges.json
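
The region filter described above can be sketched as a DMS-to-decimal conversion followed by a ray-casting point-in-polygon test. This is a minimal illustration, not the repository's implementation: the function names, the way the DMS components are passed, and the rectangular test polygon are all assumptions made for the example.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float) -> float:
    """Convert degrees, minutes, seconds to decimal degrees.

    Assumption: the sign is carried by the degrees component only.
    """
    sign = -1.0 if degrees < 0 else 1.0
    return sign * (abs(degrees) + minutes / 60.0 + seconds / 3600.0)


def point_in_polygon(lat: float, lon: float,
                     polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test; polygon is a list of (lat, lon) in decimal degrees.

    Works whether or not the first vertex is repeated at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


# Hypothetical rectangle around Taiwan, closed like the region filter files.
box = [(21.0, 119.0), (21.0, 123.0), (26.0, 123.0), (26.0, 119.0), (21.0, 119.0)]
print(point_in_polygon(25.109067, 121.26227, box))  # a point near Taoyuan
print(point_in_polygon(30.0, 121.0, box))           # a point north of the box
```

The test treats latitude and longitude as planar coordinates, which is a reasonable approximation for a region of this size.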

Map files

  • Because of the filtered region, I use the official Taiwan map in TWD97, released on 2025/03/18, which is stored in the folder ./data/直轄市、縣(市)界線1140318 (boundaries of special municipalities and counties/cities).
  • The map must be in SHP format.
  • In your case, create a folder under ./data that contains a .shp map file. In my case, it is ./data/直轄市、縣(市)界線1140318/COUNTY_MOI_1140318.shp.
  • Also set the .shp map file path and EPSG code in experiments.py.

Stage 1: Data preprocessing

Run

python preprocessor.py

Processes details

  1. Check the aircraft database: the ADSBEX aircraft database is updated daily (see "Supplementary Information" in the ADSBEX historical data).
  2. Download all data in the enabled time range and unzip it:
    • In get_data.py you can manually set these ENABLES parameters:
      # default
      ENABLES_YEAR = ["2025"] # ["2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023", "2024", "2025"]
      ENABLES_MONTH = ["04"] # ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
    • In this project, we only use the readsb-hist data from 2025/04/01. Note that in the sample historical data only the first day is available, so please don't modify ENABLES_DATE.
    • According to the official ADS-B Exchange description of the readsb-hist historical data:

      "Snapshots of all global airborne traffic are archived every 5 seconds starting April 2020, (prior data is available every 60 secs from starting in July 2016)."

    • More precisely, the sampling rate by period is:
      • Sampling rate = 60 seconds: 2016 July ~ 2020 March
      • Sampling rate = 5 seconds: 2020 April ~ Now
    • The downloaded data is stored in ./data/historical_adsbex_sample/readsb-hist with file names in the format yyyy_mm_dd_hhmmss.json, e.g. 2025_04_01_003655.json.
    • The downloaded .json.zip files are automatically unzipped into .json files.
  3. Filtering:
    • The preprocessor lets the user choose a region filter file.
    • If "No filter" is chosen, the preprocessor keeps only the aircraft records that have all of the following features:
      "hex"
      "flight"
      "t"
      "alt_baro"
      "alt_geom"
      "gs"
      "track"
      "geom_rate"
      "squawk"
      "nav_qnh"
      "nav_altitude_mcp"
      "nav_altitude_fms"
      "nav_heading"
      "lat"
      "lon"
      "nic"
      "rc"
      "nic_baro"
      "nac_p"
      "nac_v"
      "sil"
      "sil_type"
      
    • For the meaning of these features, see the ADS-B Exchange Version 2 API Fields documentation and wiedehopf's readsb README-json.md.
    • If a region filter file is chosen, two files are generated: the first contains the global aircraft records with the features above, and the second contains only those records whose GPS coordinates also fall inside the polygon described by the chosen region filter file.
  4. Encoding:
    • The encoded features are:
      "geohash"
      "ecef_x"
      "ecef_y"
      "ecef_z"
      
    1. Geohash
    2. ECEF (Earth-Centered, Earth-Fixed)
      • Encode and Decode implementation: ecef.py
      • Crucial coefficients:
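        • Radius of Earth $ER$ = 6371000 meters
        • 1 foot (ft) = 0.3048 meters (m)
      • Encode inputs:
        1. Latitude (lat) $\phi$, unit = degree
        2. Longitude (lon) $\lambda$, unit = degree
        3. Geometric altitude (alt_geom) $h_{ft}$, unit = feet
      • Encode:
        1. Degree to Radian:
          • $\phi_{rad} = \phi \cdot \frac{\pi}{180}$, $\lambda_{rad} = \lambda \cdot \frac{\pi}{180}$
        2. Feet to Meter:
          • $h_{m} = h_{ft} \times 0.3048$
        3. ECEF conversion (spherical Earth of radius $ER$):
          • $x = (ER + h_{m}) \cos\phi_{rad} \cos\lambda_{rad}$
          • $y = (ER + h_{m}) \cos\phi_{rad} \sin\lambda_{rad}$
          • $z = (ER + h_{m}) \sin\phi_{rad}$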
  5. Store:
    • The unfiltered (and filtered) data is stored in ./data/preprocessed as readsb-hist_merged.csv (no region filter file chosen) and readsb-hist_filtered_by_{name of the region filter file without ".json"}.csv (with a region filter file).
    • For instance, the header and first 5 rows of ./data/preprocessed/readsb-hist_filtered_by_Taiwan_manual_edges.csv:
      year,month,day,hour,minute,second,hex,flight,t,alt_baro,alt_geom,gs,track,geom_rate,squawk,nav_qnh,nav_altitude_mcp,nav_altitude_fms,nav_heading,lat,lon,nic,rc,nic_baro,nac_p,nac_v,sil,sil_type,geohash,ecef_x,ecef_y,ecef_z
      2025,4,1,0,2,45,899046,CAL601,B738,1725,1950,169.9,47.39,1184,6264,1019.2,3008,3008,51.33,25.109067,121.26227,8,186,1,10,2,3,perhour,wsqnz6uqztq4,-2994112.9568203683,4931763.774640314,2703739.699255408
      2025,4,1,0,2,50,899046,CAL601,B738,1825,2075,177.0,47.75,1184,6264,1019.2,3008,3008,51.33,25.111767,121.265535,8,186,1,10,2,3,perhour,wsqnz7qyeu8t,-2994345.7634592457,4931513.723232558,2704027.7461128747
      2025,4,1,0,2,55,899046,CAL601,B738,1925,2150,184.1,48.08,1120,6264,1019.2,18592,18000,51.33,25.11442,121.268748,8,186,1,10,2,3,perhour,wsqnzecnnehq,-2994568.0573173896,4931256.46817952,2704304.58935681
      2025,4,1,0,3,0,899046,CAL601,B738,2050,2300,188.5,48.87,1120,6264,1019.2,20000,18000,51.33,25.117865,121.273049,8,186,1,10,2,3,perhour,wsqnzss64v7m,-2994875.2970717354,4930928.05980613,2704670.879634547
      2025,4,1,0,3,5,899046,CAL601,B738,2100,2325,193.1,50.04,1472,6264,1019.2,20000,20000,51.33,25.11882,121.274261,8,186,1,10,2,3,perhour,wsqnzstpxp7j,-2994959.7803176898,4930832.072570452,2704770.2738511804
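
The encoding step (step 4) can be sketched as follows. This is an illustration under stated assumptions, not the code in ecef.py: the function names are made up for the example, the geohash is the standard base-32 bisection scheme with an assumed precision of 12 characters (the length seen in the sample rows), and the ECEF conversion assumes a spherical Earth of radius 6371000 m with the international foot (0.3048 m).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical Earth radius used for this sketch
FEET_TO_METERS = 0.3048       # international foot
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # the first bit always refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def ecef_encode(lat_deg: float, lon_deg: float,
                alt_geom_ft: float) -> tuple[float, float, float]:
    """Convert latitude/longitude (degrees) and altitude (feet) to ECEF meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    r = EARTH_RADIUS_M + alt_geom_ft * FEET_TO_METERS
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


# First sample row above: lat, lon, alt_geom
print(geohash_encode(25.109067, 121.26227))
print(ecef_encode(25.109067, 121.26227, 1950))
```

With these assumptions, the ECEF values for the first sample row agree with the `ecef_x`, `ecef_y`, `ecef_z` columns to within a meter.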

Stage 2: Data mining

Goal

  1. Cluster the aircraft by position (GPS coordinates and geometric altitude) using different algorithms.
  2. Analyze the density and network structure of the original data, and summarize the clustered data of each method.

Course requirements

  • I use this project as the midterm and final project for the courses "高等資料探勘" (Advanced Data Mining) and "生成式AI基礎與應用" (The Fundamentals and Applications of Generative-AI), which I took in the second semester of my senior year (2025/02/01 ~ 2025/07/31, aka 113-2) at 中原大學 (Chung Yuan Christian University, CYCU), taught by Professor Hsiu-Min Chuang (莊秀敏 教授).
  • The course required choosing one data mining tool among Weka, Orange and KNIME, and I chose Weka. That is why there is a ./weka_results folder.

Experiments

  • Run:
    python experiments.py
    
  • Warnings:
    • The ./weka_results folder must exist; otherwise, comment out the following call to avoid errors:
      eval_and_draw_weka_results.run(data_name)
  • See Experiments.md.
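
Instead of commenting the call out by hand, the Weka step could be skipped automatically when the folder is missing. The helper below is a hypothetical sketch (`run_weka_step` is not part of experiments.py); it takes the function to call as a parameter, so it makes no assumption about how `eval_and_draw_weka_results.run` is implemented.

```python
from pathlib import Path
from typing import Callable


def run_weka_step(data_name: str,
                  runner: Callable[[str], object],
                  results_dir: str = "./weka_results") -> bool:
    """Call runner(data_name) only if results_dir exists.

    Returns True when the step ran and False when it was skipped.
    """
    if not Path(results_dir).is_dir():
        print(f"Skipping Weka evaluation: {results_dir} not found")
        return False
    runner(data_name)
    return True
```

In experiments.py this would be used as `run_weka_step(data_name, eval_and_draw_weka_results.run)`.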

Demo of the clustered html webpage

  • Interact with the clustered data through the clustered.html file.
  • Because of size restrictions, the webpage is generated only when the data size is at most 20000 (by default). See threshold in visualizer.draw_map.
  • Demo file: HDBSCAN with min_points = 7, epsilon = 0.07
    • Image 1: the 3D visualization of Taiwan's main island.
    • Image 2: the current data point belongs to noise (cluster number = -1).
    • Image 3: the highest density is observed in the northern airspace of Taiwan.

TODOs

Stage 3: Discord bot - Interactive system

  • To be continued
  • Used scripts and folders: ./bot.py
  • If you want to build your own bot:
    1. Create a Discord bot and get a token on the Bot page.
    2. Create a .env file in the root folder and set DISCORD_BOT_TOKEN='<YOUR_DISCORD_BOT_TOKEN>' in it.
  • Still learning from this tutorial
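
Since bot.py is not implemented yet, the following is only a sketch of how the token could be read from the .env file with the standard library. The function names are hypothetical, and the parser handles only simple `KEY=value` lines with optional quotes; a dedicated package could be used instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are ignored."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single/double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_bot_token(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("DISCORD_BOT_TOKEN") or load_env_file(path).get("DISCORD_BOT_TOKEN")
    if not token:
        raise RuntimeError("DISCORD_BOT_TOKEN is not set")
    return token
```

Keeping the token in .env (which should be listed in .gitignore) avoids committing it to the repository.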

adsb.lol API testing

Before running

About main.py

  • Currently main.py can only run on Windows (I only know how to get the user's GPS location on Windows, and I have no other platform to test on).
  • Please enable Windows GPS in Settings (official tutorial here).
