Organizations

@ubicompteam @Code-SCH @SCHU-CoinGame @PhotoMuze

yih5025/README.md

Ilhan Yu - Data Engineer

Recent Projects

1. Cryptocurrency Mock Investment Game

Description: A web-based mock investment game that uses real-time cryptocurrency data for fun and educational investing.

Tech Stack:

Key Features:

  • Real-time data ingestion of up to 30,000 records per minute
  • Automated pipeline for data collection and analysis
  • Serverless backend to reduce costs and simplify management
  • To address cloud service cost issues, I replaced AWS EMR with an on-premises distributed computing environment, reducing server costs by 94% ($139.10 → $7 per month)
  • Won first prize in a game development contest
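The ingestion and cost figures in the list above can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the numbers are the ones quoted in the list. The computed reduction comes out just under 95%, which the list reports as 94%.

```python
# Figures quoted in the feature list; all names here are illustrative only.
RECORDS_PER_MINUTE = 30_000   # peak ingestion rate
EMR_COST_USD = 139.10         # monthly cost with AWS EMR
ON_PREM_COST_USD = 7.00       # monthly cost with the on-premises cluster


def records_per_second(per_minute: int) -> float:
    """Convert a per-minute record rate to a per-second rate."""
    return per_minute / 60


def cost_reduction_percent(before: float, after: float) -> float:
    """Relative saving, as a percentage of the original cost."""
    return (before - after) / before * 100


peak_rate = records_per_second(RECORDS_PER_MINUTE)
reduction = cost_reduction_percent(EMR_COST_USD, ON_PREM_COST_USD)

print(f"peak ingestion: {peak_rate:.0f} records/s")
print(f"cost reduction: {reduction:.2f}%")
```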

🎮 Game Play: CoinKing

🔗 GitHub Repository: Organization DE Repo, Organization BE Repo

More Details: Tech Blog Post

Data Pipeline Architecture

CoinGame DataPipeline Architecture

In-Game Screenshot

CoinGame In-Game Screenshot

On-Premises Distributed Cluster Setup / Game Showcase at the University


2. Large-Scale IoT Data Analysis Paper

Paper Link: DBpia

Tech Stack:

Title: "사물인터넷 환경 저비용 대용량 데이터 분석 시나리오 설계 및 성능 비교" (English: "Design and Performance Comparison of Low-Cost Large-Scale Data Analysis Scenarios in an IoT Environment")
Conference: KCC2024 (한국컴퓨터종합학술대회, Korea Computer Congress 2024)

Key Highlights:

  • Built a low-cost Raspberry Pi cluster using Hadoop (HDFS, YARN) and Spark (DataFrame)
  • Collected and analyzed 22 years of particulate matter data from AirKorea
  • Compared the performance of Spark and Pandas on the Raspberry Pi cluster vs. a standard PC
  • Received the Best Student Paper award at the Korea Computer Congress 2024

Low-Cost IoT Device Based Cluster Setup

Results:

I measured the time taken to analyze a dataset of a given size in four different settings:

  • Spark on single RPi
  • Spark on RPi cluster
  • Spark on a normal desktop PC
  • Pandas on a normal desktop PC

A single RPi was able to complete the big data analysis, whereas Pandas on a normal desktop PC failed with an out-of-memory (OOM) error. Spark on a normal desktop PC showed the best performance, followed by Spark on the RPi cluster, then Spark on a single RPi.

I then repeated the experiment with a smaller dataset. This time Pandas on a normal desktop PC showed the best performance, followed by Spark on a normal desktop PC, then Spark on the RPi cluster, and then Spark on a single RPi.

I also tried linear regression to predict PM (particulate matter) using scikit-learn and Spark ML. As in the experiments above, Pandas and scikit-learn failed with OOM on the large dataset, while Spark on the RPi cluster successfully completed both the machine learning and the data analysis.
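One plausible reason partitioned processing survives where an in-memory approach runs out of memory is that each partition can be reduced to a few running sums before the partitions are combined. The sketch below illustrates that idea for simple linear regression in plain Python. It is not the paper's code and not how Spark ML is implemented; the `Stats` class, the helper functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Stats:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other: "Stats") -> "Stats":
        # Combining two partitions only needs their sums, not their rows.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def partition_stats(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Reduce one partition to its sums, reading one row at a time."""
    stats = Stats()
    for x, y in rows:
        stats.add(x, y)
    return stats


def fit_line(stats: Stats) -> Tuple[float, float]:
    """Ordinary least squares for one feature, from the merged sums."""
    denom = stats.n * stats.sxx - stats.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / denom
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept


# Made-up data following y = 2x + 1, split into three "partitions".
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

total = Stats()
for part in partitions:
    total = total.merge(partition_stats(part))

slope, intercept = fit_line(total)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Memory use here depends on the size of one partition rather than the whole dataset, which is the property that matters on memory-constrained nodes such as Raspberry Pis.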

Conclusion

  • Spark on an RPi cluster delivered adequate performance at a relatively low cost
  • Showed that big data analysis and machine learning in IoT environments do not necessarily require costly servers

KCC2024 Poster


💬 About Me

  • I'm a Data Engineer with experience building real-time data pipelines using Kafka, Spark, and Hadoop.
  • I also developed backend services, including Spring-based systems and AWS Lambda/API Gateway, to ensure scalability and reliability.
  • On the frontend side, I'm comfortable with JavaScript, HTML, and CSS.
  • For database solutions, I've worked with DynamoDB, InfluxDB, and MySQL, focusing on efficient data modeling and queries.

Tech Focus:

  • Data Engineering: Kafka, Spark, Hadoop, ELT/ETL data pipeline design
  • Backend: Spring, AWS Lambda, API Gateway
  • Frontend: JavaScript, HTML, CSS
  • Databases: DynamoDB, InfluxDB, MySQL

Korean Resume: Korean Resume


🌱 Mission

"I aim to use diverse data sources to build practical services that truly solve real-world problems and help people."

Currently, I want to develop a platform that makes it easy for anyone to collect and handle complex data, bridging the gap between raw information and real-life solutions. I hope to focus on accessible data engineering that serves people's daily needs without being overly complicated.


🏆 Achievements

  • Game Development Contest Winner: Developed a real-time cryptocurrency mock investment game, earning first prize.
  • App Development Contest Winner: Developed a user monitoring app for the "1 Dollar Breakfast" program, earning first prize.
  • SW Idea Contest Winner: Proposed an AI service that suggests special agreement clauses for "Jeonse" contracts, earning second prize.
  • Best Student Paper Award at KCC2024: Authored a paper on low-cost IoT-based big data analysis with Raspberry Pi clusters, Spark & Hadoop.

🤝 Contact

Pinned

  1. APM_controller Public

    APM controller for a lab project

    HTML

  2. InfluxDB_to_Mobius_Data Public

    Fetches data from a Mobius server with REST API GET requests and uploads it to InfluxDB

    Java

  3. js_game Public

    A small game made with JavaScript

    CSS

  4. Code-SCH/codeSCH_server Public

    codeSCH Server

    Java

  5. SCHU-CoinGame/BackEnd Public

    BackEnd Repository

    Python

  6. SCHU-CoinGame/DataEngineering Public

    DataEngineering repository

    Python