We split this project into three (or four) stages:
- Data preprocessing: fetch ADS-B Exchange historical data and filter it with custom conditions.
- Analysis and model building:
  - Data mining: analyze the ADS-B signals above Taiwan's main island.
  - GAI: train a model that generates simulated future flight routes.
- Interactive system: respond to the user's commands.
- Preprocessed the readsb-hist data, built my own dataset "readsb-hist_2025-04-01-000000-120000", and uploaded it to Hugging Face.
- Use `./data./preprocessed./readsb-hist_filtered_by_Taiwan_manual_edges.csv` to experiment with clustering algorithms, using Weka and self-written scripts.
- Visualized the results of the clustered data.
- Analyze the original dataset and the clustered data.
- Implement a Discord bot that queries the clusters with the adsb.lol API.
- Implement `main.py`.
pip install -r requirements.txt
- Sometimes `winsdk` may run into build errors while the Python packages are being installed.
Flight-Spotter
├──.env (currently unused, to store discord bot token)
├──.gitignore
├──LICENSE
├──README.md
├──Experiments.md (details of the experiments)
├──requirements.txt
├──flow.drawio
├──preprocessor.py (runs data gathering and preprocessing)
├──experiments.py (runs clustering and analyzing algorithms and draw results)
├──main.py (currently unused)
├──bot.py (currently unused)
├──get_data.py (sub-process script)
├──filter_and_encode.py (sub-process script)
├──eval_and_draw_weka_results.py (sub-process script)
├──clusterer.py (sub-process script)
├──analyzer.py (sub-process script)
├──gps.py (tool script)
├──ecef.py (tool script)
├──visualizer.py (tool script)
├──adsb_lol.py (tool script)
├──pics
│ ├──flow.png
│ ├──JADIZ_and_CADIZ_and_KADIZ_in_East_China_Sea.jpg
│ ├──Taiwan_ADIZ.jpg
│ ├──Taiwan_manual_edges.jpg
│ ├──degree2radian.jpg
│ ├──feet2meter.jpg
│ ├──ecef_conversion.jpg
│ ├──demo_p1.png
│ ├──demo_p2.png
│ └──demo_p3.png
├──data
│ ├──aircraft (auto-generated)
│ │ └──basic-ac-db.json
│ ├──filter_regions
│ │ ├──Taiwan_ADIZ.json
│ │ └──Taiwan_manual_edges.json
│ ├──historical_adsbex_sample (auto-generated, based on user's ENABLES settings)
│ │ └──readsb-hist
│ │ ├──2025_04_01_000000.json
│ │ ├──...
│ │ └──2025_04_01_120000.json
│ ├──preprocessed (auto-generated, based on the user's choice)
│ │ ├──readsb-hist_merged.csv
│ │ ├──readsb-hist_filtered_by_Taiwan_ADIZ.csv
│ │ └──readsb-hist_filtered_by_Taiwan_manual_edges.csv
│ └──直轄市、縣(市)界線1140318
│ ├──修正清單_11403.xlsx
│ ├──TW-01-301000100G-000017.xml
│ ├──COUNTY_MOI_1140318.shp.xml
│ ├──COUNTY_MOI_1140318.shx
│ ├──COUNTY_MOI_1140318.CPG
│ ├──COUNTY_MOI_1140318.dbf
│ ├──COUNTY_MOI_1140318.prj
│ ├──COUNTY_MOI_1140318.sbn
│ ├──COUNTY_MOI_1140318.sbx
│ └──COUNTY_MOI_1140318.shp
├──filter_region_maps (auto-generated, based on the user's choice)
│ ├──Taiwan_ADIZ.html
│ └──Taiwan_manual_edges.html
├──logs (auto-generated)
│ ├──basic_ac_db.txt
│ └──readsb-hist.txt
├──python_results
│ ├──kmeans
│ │ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ │ ├──min_2_max_40
│ │ │ ├──SSE
│ │ │ │ ├──3D.png
│ │ │ │ ├──clustered.csv
│ │ │ │ ├──clustered.html
│ │ │ │ ├──distribution.csv
│ │ │ │ └──distribution.png
│ │ │ ├──Silhouette_Score
│ │ │ │ ├──3D.png
│ │ │ │ ├──clustered.csv
│ │ │ │ ├──clustered.html
│ │ │ │ ├──distribution.csv
│ │ │ │ └──distribution.png
│ │ │ ├──evaluation.csv
│ │ │ └──evaluation.png
│ │ ├──min_2_max_125
│ │ │ ├──SSE
│ │ │ │ ├──3D.png
│ │ │ │ ├──clustered.csv
│ │ │ │ ├──clustered.html
│ │ │ │ ├──distribution.csv
│ │ │ │ └──distribution.png
│ │ │ ├──Silhouette_Score
│ │ │ │ ├──3D.png
│ │ │ │ ├──clustered.csv
│ │ │ │ ├──clustered.html
│ │ │ │ ├──distribution.csv
│ │ │ │ └──distribution.png
│ │ │ ├──evaluation.csv
│ │ │ └──evaluation.png
│ │ └──min_7_max_20
│ │ ├──SSE
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──Silhouette_Score
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──evaluation.csv
│ │ └──evaluation.png
│ ├──hdbscan
│ │ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ │ ├──finetune_records
│ │ │ ├──heatmap_clustering.png
│ │ │ ├──heatmap_noise.png
│ │ │ └──records.csv
│ │ ├──min_7_epsilon_0_07
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_7_epsilon_0_1
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_07
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_1
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_6
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_70_epsilon_1_1
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ └──min_90_epsilon_1_0
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ └──optics
│ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ ├──finetune_records
│ │ ├──heatmap_clustering.png
│ │ ├──heatmap_noise.png
│ │ └──records.csv
│ ├──min_7_epsilon_0_07
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_7_epsilon_0_1
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_20_epsilon_0_07
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_20_epsilon_0_1
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_20_epsilon_0_6
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_70_epsilon_1_1
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ └──min_90_epsilon_1_0
│ ├──3D.png
│ ├──clustered.csv
│ ├──clustered.html
│ ├──distribution.csv
│ └──distribution.png
├──weka_results
│ ├──DBSCAN
│ │ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ │ ├──min_7_epsilon_0_07
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_7_epsilon_0_1
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_07
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_1
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_20_epsilon_0_6
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──min_70_epsilon_1_1
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ └──min_90_epsilon_1_0
│ │ ├──clustered.arff
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──EM
│ │ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ │ ├──clusters_7
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──clusters_20
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ ├──clusters_70
│ │ │ ├──clustered.arff
│ │ │ ├──3D.png
│ │ │ ├──clustered.csv
│ │ │ ├──clustered.html
│ │ │ ├──distribution.csv
│ │ │ └──distribution.png
│ │ └──clusters_90
│ │ ├──clustered.arff
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ └──XMeans
│ └──readsb-hist_filtered_by_Taiwan_manual_edges
│ ├──min_7_max_20
│ │ ├──clustered.arff
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ ├──min_20_max_70
│ │ ├──clustered.arff
│ │ ├──3D.png
│ │ ├──clustered.csv
│ │ ├──clustered.html
│ │ ├──distribution.csv
│ │ └──distribution.png
│ └──min_70_max_90
│ ├──clustered.arff
│ ├──3D.png
│ ├──clustered.csv
│ ├──clustered.html
│ ├──distribution.csv
│ └──distribution.png
└──analyzed_results
└──readsb-hist_filtered_by_Taiwan_manual_edges
├──grid_by_count.png
├──grid_by_flight_count.png
├──grid_by_flight_legend_off.png
├──grid_by_flight_legend_on.png
├──grid_by_hex_count.png
├──grid_by_hex_legend_off.png
├──grid_by_hex_legend_on.png
├──KDE.png
├──GETIS_ORD_Gstar.png
├──in_degree.png
├──out_degree.png
├──betweenness.png
├──closeness.png
├──pagerank.png
├──community_detection.png
└──cluster_distribution_analysis.png
- In this project, we decided to use the ADSBEX-provided `readsb-hist` data instead of `traces` or `hires-traces`, for two reasons:
  - It is hard to determine the sampling rate of `traces` and `hires-traces`, whereas `readsb-hist` has a steady sampling rate (60 or 5 seconds, depending on the date of the data).
  - It is hard to reduce the dimensionality of all traces of an aircraft, especially when the sampling rate is not steady.
- ADS-B Exchange free historical data
- readsb official documentation
- Data used in this project
- Our preprocessed data has already been uploaded to Hugging Face.
- Time range:
- 2025/04/01 00:00:00 ~ 12:00:00
- Sampling rate: one snapshot every 5 seconds
- Numbers and Size:
  - `readsb-hist_merged.csv`: 6089010 rows, 857 MB
  - `readsb-hist_filtered_by_Taiwan_manual_edges.csv`: 9148 rows, 1.27 MB
- The "region filter files" are all manually written JSON files in the following format:
[
{
"latitude": first GPS coordinate latitude,
"longitude": first GPS coordinate longitude
},
{second GPS coordinate},
{third GPS coordinate},
...,
{last GPS coordinate},
{first GPS coordinate},
]
Each GPS coordinate is in DMS (Degrees, Minutes, Seconds) format.
filter.py keeps only the ADS-B signals whose GPS location lies inside the polygon described by the chosen region filter file.
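The region filtering described above can be sketched as follows. This is a minimal illustration rather than the repository's actual filter code: the DMS string format (for example `25°17'58.7"N`), the helper names `dms_to_decimal` and `point_in_polygon`, and the toy rectangle are all assumptions made for the example.

```python
import re

# Assumed DMS text format, e.g. 25°17'58.7"N; the real region files may differ.
_DMS = re.compile(r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""")


def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string such as 25°17'58.7"N to signed decimal degrees."""
    match = _DMS.fullmatch(dms.strip())
    if match is None:
        raise ValueError(f"unrecognized DMS value: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


def point_in_polygon(lat: float, lon: float, polygon: list) -> bool:
    """Ray casting on a closed polygon (first vertex repeated as the last)."""
    inside = False
    for (lat1, lon1), (lat2, lon2) in zip(polygon, polygon[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            crossing_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing_lon:
                inside = not inside
    return inside


# Hypothetical toy region in the layout shown above, not a real filter file.
region = [
    {"latitude": "22°00'00.0\"N", "longitude": "120°00'00.0\"E"},
    {"latitude": "22°00'00.0\"N", "longitude": "122°00'00.0\"E"},
    {"latitude": "26°00'00.0\"N", "longitude": "122°00'00.0\"E"},
    {"latitude": "26°00'00.0\"N", "longitude": "120°00'00.0\"E"},
    {"latitude": "22°00'00.0\"N", "longitude": "120°00'00.0\"E"},
]
polygon = [
    (dms_to_decimal(p["latitude"]), dms_to_decimal(p["longitude"])) for p in region
]
```

Treating latitude and longitude as planar coordinates is a simplification that is usually acceptable for a region the size of Taiwan; points lying exactly on an edge may fall on either side.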
We made two region filter files:
`./data./filter_regions./Taiwan_ADIZ.json`:
- The exact GPS coordinates of the Taiwan ADIZ (Air Defense Identification Zone) are described in Part 2 - ENR 5.2.3 ADIZ, provided by Taiwan eAIP.
- Here is the plotted polygon:

- This picture, from Wikipedia, shows the ADIZs in East Asia:

`./data./filter_regions./Taiwan_manual_edges.json`:
- We used Google Maps to roughly outline Taiwan's main island with the following GPS coordinates:
- 台灣最北點 (25°17'58.7"N 121°32'13.1"E)
- 中華民國 領海基點 (25°17'26.6"N 121°30'37.7"E)
- 沙崙湖 (25°14'56.8"N 121°27'16.9"E)
- 台北港貨櫃碼頭股份有限公司 (25°10'04.6"N 121°23'25.5"E)
- 竹圍漁港北堤 (25°07'17.0"N 121°14'29.4"E)
- 觀音大堀溪北岸 (25°03'49.6"N 121°05'57.9"E)
- 外傘頂洲 (23°28'27.5"N 120°04'06.7"E)
- 布袋北堤 (23°23'03.7"N 120°07'54.1"E)
- 台灣最西點-國聖港燈塔 (23°06'02.6"N 120°02'09.6"E)
- 曾文溪口月牙彎道 (23°03'11.3"N 120°03'15.1"E)
- 黃金海岸 (22°55'53.1"N 120°10'34.9"E)
- 海蝕洞 (22°39'03.0"N 120°15'01.7"E)
- 紅毛港南星燈杆 (22°32'37.8"N 120°17'11.4"E)
- 加祿防波堤彩繪牆 (22°19'43.8"N 120°37'16.7"E)
- 北勢鼻 (21°56'05.0"N 120°42'46.6"E)
- 龍蝦堀 (21°55'13.1"N 120°43'30.1"E)
- 雷公石 (21°55'14.5"N 120°44'21.2"E)
- 南灣遊憩區 (21°57'34.8"N 120°45'45.9"E)
- 臺灣最南點 (21°53'51.9"N 120°51'30.1"E)
- 興海灣沙灘 (21°58'41.4"N 120°50'40.1"E)
- 佳樂水漁村公園 (21°59'20.5"N 120°50'48.5"E)
- 佳樂水風景區 售票處 (21°59'40.2"N 120°51'50.4"E)
- 蟾蜍石 (22°00'04.7"N 120°52'33.2"E)
- 出風鼻 (22°02'03.6"N 120°54'00.5"E)
- 鼻頭礁 (22°06'19.5"N 120°54'05.4"E)
- 台東最東點 (23°07'36.4"N 121°25'26.5"E)
- 烏石鼻 (24°28'53.8"N 121°51'31.1"E)
- 黑礁坪 (24°36'13.7"N 121°53'13.1"E)
- 第二機動巡邏站(合興) (24°55'02.7"N 121°52'59.1"E)
- 台灣最東點 (25°00'40.7"N 122°00'25.8"E)
- 福連里 (25°00'51.8"N 122°00'17.0"E)
- 海廢bar (25°01'23.4"N 121°58'56.8"E)
- 吃飯看海 (25°01'30.2"N 121°58'20.6"E)
- Air Space 福隆海水浴場 (25°01'20.0"N 121°56'37.1"E)
- 美艷山海角奇岩 (25°03'47.4"N 121°55'52.7"E)
- 阿義海鮮商店(柑仔店內海鮮) (25°04'59.2"N 121°54'49.6"E)
- 龍洞灣岬 (25°06'14.4"N 121°55'23.9"E)
- 鼻頭角燈塔 (25°07'43.5"N 121°55'24.5"E)
- 犀牛望月 (25°09'48.2"N 121°45'55.2"E)
- 大義公 (25°10'29.5"N 121°42'28.4"E)
- 步道終點觀景台 (25°12'56.0"N 121°41'60.0"E)
- 頂寮沙灘 (25°12'37.2"N 121°39'31.7"E)
- 神秘海岸 (25°13'45.2"N 121°39'11.3"E)
- 金山區跳石海岸停車場 (25°15'29.7"N 121°38'00.2"E)
- 跳石海岸 芋頭產銷 (25°16'04.9"N 121°37'37.5"E)
- 草里一號店 (25°16'38.5"N 121°37'01.7"E)
- 草里漁港垂釣區 (25°16'56.8"N 121°36'25.0"E)
- 核一廠專用港 (25°17'26.3"N 121°35'45.5"E)
- 鹿邊咖啡Deer cafe (25°17'51.7"N 121°34'24.2"E)
- And here is the plotted polygon:

- Since the data is filtered to the Taiwan region, I use the official map of Taiwan in TWD97, released on 2025/03/18, which is stored in the folder `./data./直轄市、縣(市)界線1140318`.
- The map must be in SHP format.

- For your own use, create a folder under `./data` that contains a `.shp` map file. In my case, it is `./data./直轄市、縣(市)界線1140318./COUNTY_MOI_1140318.shp`.
- Also set the `.shp` map file path and the EPSG code in `experiments.py`.
python preprocessor.py
- Checking the aircraft database: the ADSBEX aircraft database is updated daily (see "Supplementary Information" in the ADSBEX historical data).
- Download all data in the enabled time range and unzip it:
  - In `get_data.py` you can manually set these ENABLE parameters:

        # default
        ENABLES_YEAR = ["2025"]  # ["2016", "2017", "2018", "2019", "2020", "2021", "2022", "2023", "2024", "2025"]
        ENABLES_MONTH = ["04"]  # ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

  - In this project, we only use the `readsb-hist` data from 2025/04/01. Note that in this sample historical data, only the first day is available, so please do not modify `ENABLES_DATE`.
  - According to the official ADS-B Exchange description of the readsb-hist historical data:
    "Snapshots of all global airborne traffic are archived every 5 seconds starting April 2020, (prior data is available every 60 secs from starting in July 2016)."
  - More precisely, the sampling rate by period is:
    - Sampling rate = 60 seconds: 2016 July ~ 2020 March
    - Sampling rate = 5 seconds: 2020 April ~ Now
  - The downloaded data is stored in `./data./historical_adsbex_sample./readsb-hist` with names of the form `yyyy_mm_dd_hhmmss.json`, e.g. `2025_04_01_003655`.
  - The `.json.zip` files are automatically unzipped into `.json` files.
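The sampling periods and the file naming scheme above can be sketched like this. The function names `sampling_interval_seconds` and `snapshot_names` are hypothetical and are not taken from `get_data.py`; the sketch only mirrors the dates and the `yyyy_mm_dd_hhmmss.json` pattern stated in this section.

```python
from datetime import datetime, timedelta


def sampling_interval_seconds(moment: datetime) -> int:
    """Return the readsb-hist snapshot interval that applies at `moment`."""
    if moment < datetime(2016, 7, 1):
        raise ValueError("readsb-hist snapshots start in July 2016")
    # 60 seconds until the end of March 2020, 5 seconds from April 2020 on.
    return 60 if moment < datetime(2020, 4, 1) else 5


def snapshot_names(start: datetime, end: datetime):
    """Yield yyyy_mm_dd_hhmmss.json names from `start` to `end` (inclusive)."""
    current = start
    while current <= end:
        yield current.strftime("%Y_%m_%d_%H%M%S") + ".json"
        current += timedelta(seconds=sampling_interval_seconds(current))


# Names expected for the first ten seconds of the time range used here.
first_names = list(
    snapshot_names(datetime(2025, 4, 1, 0, 0, 0), datetime(2025, 4, 1, 0, 0, 10))
)
```

A list like this could be used to check a download for gaps, assuming every snapshot in the range was actually published.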
- Filtering:
  - The preprocessor lets the user choose a region filter file.
  - If `No filter` is chosen, the preprocessor keeps only the aircraft in each file that have all of the following features:
    "hex" "flight" "t" "alt_baro" "alt_geom" "gs" "track" "geom_rate" "squawk" "nav_qnh" "nav_altitude_mcp" "nav_altitude_fms" "nav_heading" "lat" "lon" "nic" "rc" "nic_baro" "nac_p" "nac_v" "sil" "sil_type"
  - For the meanings of these features, see ADS-B Exchange Version 2 API Fields Documentations and wiedehopf's readsb README-json.md.
  - If a region filter file is chosen, two files are generated: the first contains the global aircraft with the features above, and the second contains only the aircraft that have those features and whose GPS coordinates also lie inside the polygon described by the chosen region filter file.
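The feature check above can be sketched as follows. This is an illustration under assumptions, not the repository's code: it assumes each snapshot is a JSON object with an `"aircraft"` list (as in the readsb JSON output), and the helper names `has_required_features` and `filter_snapshot` are made up for the example.

```python
# The features every kept aircraft record must carry, as listed above.
REQUIRED_FEATURES = (
    "hex", "flight", "t", "alt_baro", "alt_geom", "gs", "track", "geom_rate",
    "squawk", "nav_qnh", "nav_altitude_mcp", "nav_altitude_fms", "nav_heading",
    "lat", "lon", "nic", "rc", "nic_baro", "nac_p", "nac_v", "sil", "sil_type",
)


def has_required_features(aircraft: dict) -> bool:
    """True when every required key is present and not None."""
    return all(aircraft.get(key) is not None for key in REQUIRED_FEATURES)


def filter_snapshot(snapshot: dict) -> list:
    """Keep only the aircraft records that carry all required features."""
    return [a for a in snapshot.get("aircraft", []) if has_required_features(a)]


# Toy records: one complete, one without a squawk code.
complete = {key: 1 for key in REQUIRED_FEATURES}
incomplete = {key: 1 for key in REQUIRED_FEATURES if key != "squawk"}
kept = filter_snapshot({"aircraft": [complete, incomplete]})
```

Records that pass this check could then go through the region test, so the region-filtered file is a subset of the merged one.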
- Encoding:
  - The encoded features are:
    "geohash" "ecef_x" "ecef_y" "ecef_z"
  - Geohash
    - Using geohash2 (precision = 12)
  - ECEF (Earth-Centered, Earth-Fixed)
    - Encode and decode implementation: ecef.py
    - Crucial coefficients:
      - Radius of Earth $ER$ = 6371000 meters
      - 1 foot (ft) = 0.3048 meters (m)
    - Encode inputs:
      - Latitude (lat) $\phi$, unit = degree
      - Longitude (lon) $\lambda$, unit = degree
      - Geometric altitude (alt_geom) $h_{ft}$, unit = feet
    - Encode:
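Both encodings can be sketched in plain Python. This is not the repository's `ecef.py` or the geohash2 package: the spherical-Earth formula is an assumption inferred from the single radius $ER$ listed above, and `geohash_encode` is a hand-written version of the standard geohash bit interleaving. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # ER, spherical Earth radius from the text
FEET_TO_METERS = 0.3048       # 1 international foot in meters


def ecef_encode(lat_deg: float, lon_deg: float, alt_geom_ft: float):
    """Spherical ECEF: degrees and feet in, (x, y, z) in meters out."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    r = EARTH_RADIUS_M + alt_geom_ft * FEET_TO_METERS
    return (
        r * math.cos(phi) * math.cos(lam),
        r * math.cos(phi) * math.sin(lam),
        r * math.sin(phi),
    )


def ecef_decode(x: float, y: float, z: float):
    """Inverse of ecef_encode: returns (lat_deg, lon_deg, alt_geom_ft)."""
    r = math.sqrt(x * x + y * y + z * z)
    lat = math.degrees(math.asin(z / r))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon, (r - EARTH_RADIUS_M) / FEET_TO_METERS


_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Standard geohash: alternate longitude and latitude bisection bits."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# First data row of the filtered CSV in this README: lat, lon, alt_geom.
x, y, z = ecef_encode(25.109067, 121.26227, 1950)
```

With these constants, the result agrees with the `ecef_x`, `ecef_y`, and `ecef_z` values stored for that row to well under a meter, which suggests the stored values follow the same spherical model.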
- Store:
  - The unfiltered (and filtered) data is stored in `./data./preprocessed` as `readsb-hist_merged.csv` (when no region filter file is chosen) and `readsb-hist_filtered_by_{name of region filter file without ".json"}.csv` (when a region filter file is chosen).
  - For instance, the first 5 rows of `./data./preprocessed./readsb-hist_filtered_by_Taiwan_manual_edges.csv`:

        year,month,day,hour,minute,second,hex,flight,t,alt_baro,alt_geom,gs,track,geom_rate,squawk,nav_qnh,nav_altitude_mcp,nav_altitude_fms,nav_heading,lat,lon,nic,rc,nic_baro,nac_p,nac_v,sil,sil_type,geohash,ecef_x,ecef_y,ecef_z
        2025,4,1,0,2,45,899046,CAL601,B738,1725,1950,169.9,47.39,1184,6264,1019.2,3008,3008,51.33,25.109067,121.26227,8,186,1,10,2,3,perhour,wsqnz6uqztq4,-2994112.9568203683,4931763.774640314,2703739.699255408
        2025,4,1,0,2,50,899046,CAL601,B738,1825,2075,177.0,47.75,1184,6264,1019.2,3008,3008,51.33,25.111767,121.265535,8,186,1,10,2,3,perhour,wsqnz7qyeu8t,-2994345.7634592457,4931513.723232558,2704027.7461128747
        2025,4,1,0,2,55,899046,CAL601,B738,1925,2150,184.1,48.08,1120,6264,1019.2,18592,18000,51.33,25.11442,121.268748,8,186,1,10,2,3,perhour,wsqnzecnnehq,-2994568.0573173896,4931256.46817952,2704304.58935681
        2025,4,1,0,3,0,899046,CAL601,B738,2050,2300,188.5,48.87,1120,6264,1019.2,20000,18000,51.33,25.117865,121.273049,8,186,1,10,2,3,perhour,wsqnzss64v7m,-2994875.2970717354,4930928.05980613,2704670.879634547
        2025,4,1,0,3,5,899046,CAL601,B738,2100,2325,193.1,50.04,1472,6264,1019.2,20000,20000,51.33,25.11882,121.274261,8,186,1,10,2,3,perhour,wsqnzstpxp7j,-2994959.7803176898,4930832.072570452,2704770.2738511804
- Cluster the aircraft by position (GPS coordinates and geometric altitude) using different algorithms.
- Analyze the density and network of the original data, and summarize the clustered data of each method.
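As a rough illustration of position-based clustering, here is a tiny k-means (Lloyd's algorithm) with an SSE score, written in plain Python. It is only a sketch: it is not the code in `clusterer.py`, the points are made-up 3D positions, and the real experiments use Weka and other algorithms as well.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    sse = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, sse


# Made-up 3D positions forming two well-separated groups.
points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (100.0, 100.0, 100.0), (101.0, 100.0, 100.0), (100.0, 101.0, 100.0),
]
labels_1, _, sse_1 = kmeans(points, k=1)
labels_2, _, sse_2 = kmeans(points, k=2)
```

Comparing the SSE across a range of cluster counts is one way to choose k; the silhouette score is the other criterion that appears in the result folders above.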
- I use this project as the midterm and final project for the courses "高等資料探勘" (Advanced Data Mining) and "生成式AI基礎與應用" (The Fundamentals and Applications of Generative AI), which I took in the second semester of my senior year (2025/02/01 ~ 2025/07/31, aka 113-2) at 中原大學 (Chung Yuan Christian University, CYCU), taught by Professor Hsiu-Min Chuang (莊秀敏 教授).
- The course required me to choose one data mining tool (Weka, Orange, or KNIME), and I decided to use Weka. That is why there is a `./weka_results` folder.
- Run: `python experiments.py`
- Warnings:
  - The `./weka_results` folder must exist; otherwise, comment out the following call to avoid errors: `eval_and_draw_weka_results.run(data_name)`
- See Experiments.md.
- Interact with the `clustered.html` clustered data file.
- Due to the size restriction, the webpage is only generated when the data size is <= 20000 (by default). See `threshold` in visualizer.draw_map.
thresholdin visualizer.draw_map. - Demo file: HDBSCAN with min_points = 7, epsilon = 0.07
Image 1: the 3D visualization of Taiwan main island.
Image 2: the current data point belongs to noise (cluster number = -1).
Image 3: the highest density is observed in the northern airspace of Taiwan.

- To be continued
- Used scripts and folders: `./bot.py`
- If you want to build your own bot:
  - Create a Discord bot and get a token on the `BOT` page.
  - Create an `.env` file in the root folder and put `DISCORD_BOT_TOKEN='<YOUR_DISCORD_BOT_TOKEN>'` into it.
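Reading the token from such an `.env` file can be sketched with the standard library alone. This is an assumption about how the token could be loaded, not a description of `bot.py`, which may rely on a helper package instead; `read_env_file` and `get_bot_token` are hypothetical names.

```python
import os


def read_env_file(text: str) -> dict:
    """Parse KEY='value' lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_bot_token(env_text: str = "") -> str:
    """Prefer the process environment, then fall back to the .env content."""
    token = os.environ.get("DISCORD_BOT_TOKEN") or read_env_file(env_text).get(
        "DISCORD_BOT_TOKEN"
    )
    if not token:
        raise RuntimeError("DISCORD_BOT_TOKEN is not set")
    return token


# Example .env content with a placeholder value, never a real token.
example = "# Discord settings\nDISCORD_BOT_TOKEN='<YOUR_DISCORD_BOT_TOKEN>'\n"
parsed = read_env_file(example)
```

Keeping the token out of the source code and listing `.env` in `.gitignore` prevents it from being committed by accident.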
- Still learning from this tutorial
- Used scripts: `./adsb_lol_api.py`, `./gps.py`, `./main.py`
- For documentation, please read this official webpage.
- Currently `main.py` can only run on Windows (I only know how to get the user's GPS location on Windows, and I have no other platform to test on).
- Please enable Windows GPS in settings (official tutorial here).



