K-Pop Agenda Dynamics: A Comparative Analysis of Topic Diffusion and Sentiment on News and Social Media
This research resumed on Nov 27, 2025, after a pause of more than six months.
One-line summary: Sort K-pop related articles into distinct agenda topics, then examine the characteristics of the community response to each topic. We also compare how the community response differs from one online community / SNS platform to another.
Target of Analysis: Top K-pop related articles on Naver Entertainment, from 20240101 to 20241231
Data Source: News data and Korean SNS platforms such as the X (Twitter) API, DCinside, and Instagram.
From Jan 1st, 2024 to Dec 31st, 2024, the 10 articles with the highest number of views were collected for each day. The 5th-ranked article on July 29th and the 4th-ranked article on Dec 23rd could not be opened, so their data is missing. However, this is an extremely small portion of the total number of collected articles, so I moved on. In total, 3658 articles were collected.
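The article total can be sanity-checked with a few lines of Python. This is only a quick sketch of the arithmetic, assuming every calendar day in the range contributes exactly 10 ranked articles and that the two unopenable articles are the only gaps.

```python
from datetime import date

# Assumption: one top-10 list per calendar day, inclusive of both end dates.
start, end = date(2024, 1, 1), date(2024, 12, 31)
days = (end - start).days + 1      # 2024 is a leap year
expected = days * 10               # articles if nothing were missing
missing = 2                        # Jul 29 (rank 5) and Dec 23 (rank 4)
collected = expected - missing

print(days, expected, collected)   # 366 3660 3658
```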
For this task, I first obtained the list of top 10 articles for each day. Source code for Ranking Crawler; list of top 10 articles.
A TSV file is used instead of the more common CSV format because the article titles contain many commas, which would make the file confusing to read if it were stored as CSV.
Then, I obtained the article texts for the 3658 articles. Source code for Content Crawler. The article texts are not in this repository, but you can run the provided code to collect them yourself.
Note: Ignore the Rank column in the tsv file for now (written 20250220).
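As a small illustration of the TSV choice described above, the sketch below writes and re-reads tab-separated rows with Python's standard `csv` module. The column names (`date`, `rank`, `title`) and the sample titles are hypothetical and are not the repository's actual schema.

```python
import csv
import io

# Hypothetical rows; titles contain commas on purpose.
rows = [
    {"date": "20240101", "rank": "1", "title": "Group A, Group B, and Group C top the charts"},
    {"date": "20240101", "rank": "2", "title": "Singer D thanks fans, staff, and family"},
]

# Write with a tab delimiter, so commas in titles need no quoting.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["date", "rank", "title"], delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Read the same text back; each title comes back as a single field.
loaded = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print(loaded[0]["title"])
```

With a tab delimiter the titles are stored without quote characters, so the file also stays readable in a plain text editor.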
Group the news articles that have the same content. It is very likely that different media companies posted articles with the same core information. Example: Article 1 and Article 2 are about the same event. A single company may also have re-posted an article with small tweaks over time. We group the articles with the same core information so that we can focus on the upload time of the first article, since that is when the information starts to diffuse among the public.
Apply topic modeling to the gathered news articles, using KoBERT.
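One simple way to approximate this grouping is sketched below. It is not the method used in this research, only an illustration: it compares articles by the Jaccard similarity of their character 3-grams and keeps the earliest article of each group. The field names (`id`, `uploaded`, `text`), the sample articles, and the `0.5` threshold are all assumptions.

```python
def shingles(text, n=3):
    """Set of character n-grams, ignoring whitespace."""
    compact = "".join(text.split())
    return {compact[i:i + n] for i in range(max(len(compact) - n + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def first_reports(articles, threshold=0.5):
    """Walk articles in upload order; keep one only if it is not
    similar to an earlier kept article (greedy grouping)."""
    kept = []  # list of (article, shingle set)
    for art in sorted(articles, key=lambda a: a["uploaded"]):
        sig = shingles(art["text"])
        if not any(jaccard(sig, s) >= threshold for _, s in kept):
            kept.append((art, sig))
    return [art for art, _ in kept]

# Hypothetical sample: article 2 is a lightly edited repeat of article 1.
articles = [
    {"id": 2, "uploaded": "2024-01-01T09:30",
     "text": "Singer X announced a world tour starting in Seoul this May, the agency said."},
    {"id": 1, "uploaded": "2024-01-01T09:00",
     "text": "Singer X announced a world tour starting in Seoul this May."},
    {"id": 3, "uploaded": "2024-01-01T10:00",
     "text": "Group Y released a new album today."},
]
first_ids = [a["id"] for a in first_reports(articles)]
print(first_ids)
```

Because the articles are processed in upload order, the article kept for each group is always the earliest one, which is the timestamp the diffusion analysis needs.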
The research paper below uses LDA(u_mass).
K-POP 아이돌 그룹의 세대별 이슈 변화 분석 고찰 - 뉴스 빅데이터를 중심으로 (A Study on Analyzing Issue Changes Across Generations of K-POP Idol Groups: Focusing on News Big Data)
Topic Granularity & Hallucination LLM Topic Modelling
multi-granularity learning towards open topic classification
Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics
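For context on the u_mass label, the sketch below assumes it refers to the UMass topic coherence score, which is often used to compare LDA topic counts. It computes the score for one topic in its common form with +1 smoothing, using document co-occurrence counts. The tokenized documents and topic word lists are made up for illustration, and this is not the cited paper's code.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """Sum over ordered word pairs of log((D(w_hi, w_lo) + 1) / D(w_hi)),
    where w_hi is ranked above w_lo in the topic and D counts documents."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for hi, lo in combinations(range(len(top_words)), 2):  # hi < lo
        w_hi, w_lo = top_words[hi], top_words[lo]
        denom = doc_freq(w_hi)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_hi, w_lo) + 1) / denom)
    return score

# Hypothetical tokenized corpus.
docs = [
    ["tour", "concert", "seoul"],
    ["tour", "concert", "ticket"],
    ["album", "release", "chart"],
    ["album", "chart", "tour"],
]
coherent = umass_coherence(["tour", "concert"], docs)    # words that co-occur
incoherent = umass_coherence(["tour", "release"], docs)  # words that never co-occur
print(coherent, incoherent)
```

Scores closer to zero indicate that the top words of a topic tend to appear in the same documents.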
Select the 8 to 10 most influential news articles per topic type (numbers subject to change).
Analyze user reactions on various websites / SNS platforms (number of likes, comments, and shares).
As most articles about K-pop have clear keywords (usually the name of a person, group, or company), we can analyze the articles that include these keywords in the title or text.
Count how often the keywords appear before and after the news article is posted. Check how long it takes for the news to reach its peak attention. On top of that, we can also track changes in the use of positive / negative words around the keywords. (Useful for platforms like DCinside)
From this, we can draw insights into the distinct agenda topics and how they differ in information diffusion and reactions. We can also compare information diffusion and reactions across platforms.
For example, after the news that 카리나 (Karina) and 이재욱 (Lee Jae-wook) were dating was first reported by 디스패치 (Dispatch, company code: 311), we can count the articles that contain '카리나' or '재욱' and analyze how that count changes over time. We can also track changes in positive / negative word usage in the articles containing '카리나' or '재욱'. This data can be part of the "romance" category (category name subject to change).
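The keyword analysis described above can be sketched roughly as follows. This is a minimal illustration rather than the project's pipeline: the posts, their timestamps, the publication time, and the tiny positive / negative word lists are all made up, and a real run would need a proper Korean sentiment lexicon.

```python
from collections import Counter
from datetime import datetime

KEYWORDS = ("카리나", "재욱")   # keywords from the example above
POSITIVE = {"축하", "응원"}     # hypothetical lexicon: "congratulations", "support"
NEGATIVE = {"실망", "배신"}     # hypothetical lexicon: "disappointed", "betrayal"

def mentions(post):
    return any(k in post["text"] for k in KEYWORDS)

def before_after(posts, published):
    """Number of keyword posts before and after the article went up."""
    hits = [p for p in posts if mentions(p)]
    before = sum(p["time"] < published for p in hits)
    return before, len(hits) - before

def hours_to_peak(posts, published):
    """Hour offset (0 = first hour) with the most keyword posts."""
    counts = Counter(
        int((p["time"] - published).total_seconds() // 3600)
        for p in posts if mentions(p) and p["time"] >= published
    )
    # Iterating sorted keys makes ties resolve to the earliest hour.
    return max(sorted(counts), key=counts.__getitem__) if counts else None

def sentiment_counts(posts, published):
    """Positive / negative word occurrences in keyword posts after publication."""
    texts = [p["text"] for p in posts if mentions(p) and p["time"] >= published]
    pos = sum(t.count(w) for t in texts for w in POSITIVE)
    neg = sum(t.count(w) for t in texts for w in NEGATIVE)
    return pos, neg

# Hypothetical publication time and posts (not real data).
published = datetime(2024, 1, 1, 9, 0)
posts = [
    {"time": datetime(2024, 1, 1, 8, 0), "text": "카리나 컴백 기대"},
    {"time": datetime(2024, 1, 1, 9, 10), "text": "카리나 열애설 실화?"},
    {"time": datetime(2024, 1, 1, 10, 5), "text": "카리나 재욱 축하"},
    {"time": datetime(2024, 1, 1, 10, 30), "text": "카리나 실망"},
    {"time": datetime(2024, 1, 1, 10, 45), "text": "재욱 응원"},
    {"time": datetime(2024, 1, 1, 12, 0), "text": "오늘 날씨 좋다"},
]
counts = before_after(posts, published)
peak = hours_to_peak(posts, published)
sentiment = sentiment_counts(posts, published)
print(counts, peak, sentiment)
```

Running the same functions per platform and per topic category would give the numbers needed for the comparisons in the next step.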
Step 3 - Comparison of Diffusion and Sentiment Across Topics and Platforms
To be continued when this research resumes.
Comparison based on the results. TBD after checking the amount of data.
Visualization: Charts and graphs to show the results. Python or R may be used instead of Julia, since Steps 1 and 2 are likely to be written in Python and R provides great visualization packages. Julia options: Plots.jl, Plotly for Julia, wordcloud for Julia, TimeSeries plotting for Julia.