LoveLow-Global/kpop-agenda-sentiment

K-Pop Agenda Dynamics: A Comparative Analysis of Topic Diffusion and Sentiment on News and Social Media

This research resumed on Nov 27, 2025, after a pause of more than 6 months.

One-line summary: Sort K-pop related articles into distinct agenda topics, then examine the characteristics of the community response to each topic. We also compare how the community response differs from one online community / SNS platform to another.

Target of Analysis: Top K-pop related articles on Naver Entertainment, from 20240101 to 20241231

Data Source: News data and Korean SNS platforms such as X (Twitter) API, DCinside, and Instagram.

Step 1 - Cluster News Articles to Distinct Agenda Topics

1.1 - News Data Collection

From Jan 1st, 2024 to Dec 31st, 2024, the 10 articles with the highest number of daily views were collected for each day. The 5th-ranked article on July 29th and the 4th-ranked article on Dec 23rd could not be opened, so they are missing from the data. However, this is an extremely small portion of the total number of collected articles, and therefore I moved on. In total, 3658 articles were collected (366 days × 10 articles, minus the 2 missing ones).

For this task, I first obtained the list of the top 10 articles for each day. Source code for Ranking Crawler; list of top 10 articles.

The reason for using a tsv file instead of the more widely used csv file is that the article titles contain many commas, which would make the data very confusing to read if it were stored as csv.
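As a minimal sketch of the tab-separated approach, the snippet below writes and reads rows with Python's standard `csv` module using a tab delimiter. The column names (`date`, `rank`, `title`) and the sample rows are assumptions for illustration, not the repository's actual schema.

```python
import csv
import io

# Hypothetical rows; the column names are assumptions, not the real file layout.
FIELDS = ["date", "rank", "title"]
rows = [
    {"date": "20240101", "rank": "1", "title": "Group A, Group B, and Group C top the charts"},
    {"date": "20240101", "rank": "2", "title": "Agency responds to rumor, denies report"},
]

def write_tsv(rows, handle):
    """Write rows as tab-separated values with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

def read_tsv(handle):
    """Read tab-separated rows back into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_tsv(rows, buffer)
buffer.seek(0)
loaded = read_tsv(buffer)
print(loaded[0]["title"])
```

Because the delimiter is a tab, the commas inside the titles need no quoting, so the raw file stays readable line by line.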

Then, I obtained the article texts for the 3658 articles. Source code for Content Crawler. The article texts are not in this repository, but you can run the provided code to obtain them yourself.

Note: Ignore the Rank column in the tsv file for now (written 20250220).

1.2 - Find news articles with same content

Group the news articles that have the same content. It is very likely that different media companies posted articles with the same core information. Example: Article 1 and Article 2 are about the same event. A single company could also have re-posted an article with small tweaks over time. We will group the articles with the same core information. This is to focus on the upload time of the first news article, as information diffusion among the public starts then.
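One possible way to sketch this step is to compare articles by character n-gram overlap and keep only the earliest article of each near-duplicate group. This is an illustration, not necessarily the method this project uses: the Jaccard measure, the `threshold` value, and the field names (`id`, `published`, `text`) are all assumptions, and the sample articles are made up.

```python
from datetime import datetime

def shingles(text, n=3):
    """Set of character n-grams, ignoring whitespace (no tokenizer needed)."""
    compact = "".join(text.split())
    return {compact[i:i + n] for i in range(max(len(compact) - n + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def first_reports(articles, threshold=0.5):
    """Keep the earliest article of each group of near-duplicates."""
    kept = []
    for article in sorted(articles, key=lambda a: a["published"]):
        grams = shingles(article["text"])
        # Keep the article only if it is not similar to any earlier kept one.
        if all(jaccard(grams, shingles(k["text"])) < threshold for k in kept):
            kept.append(article)
    return kept

# Made-up sample articles; ids, times, and texts are hypothetical.
articles = [
    {"id": 2, "published": datetime(2024, 3, 10, 10, 30),
     "text": "Singer A confirms dating actor B, agency states"},
    {"id": 1, "published": datetime(2024, 3, 10, 9, 0),
     "text": "Singer A confirms dating actor B, agency says"},
    {"id": 3, "published": datetime(2024, 3, 10, 11, 0),
     "text": "Group C announces world tour dates for 2024"},
]
kept = first_reports(articles)
print([a["id"] for a in kept])
```

Character n-grams are used here because they work on Korean text without a morphological analyzer; the threshold would need tuning on real articles.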

1.3 - Topic Modeling

Apply topic modeling to the gathered news articles, using KoBERT.

The research paper below uses LDA(u_mass).

K-POP 아이돌 그룹의 세대별 이슈 변화 분석 고찰 - 뉴스 빅데이터를 중심으로 (A Study on the Analysis of Issue Changes by Generation of K-POP Idol Groups: Focusing on News Big Data); Topic Granularity & Hallucination LLM Topic Modelling; Multi-granularity learning towards open topic classification; Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics
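For reference, the `u_mass` measure mentioned above is a topic coherence score based on document co-occurrence counts. The sketch below implements the commonly cited form, summing log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs of a topic's top words, where D counts the documents containing the given words. It is a plain-Python illustration under that assumption; library implementations may aggregate differently (for example, by averaging over pairs), and the toy documents are made up.

```python
import math

def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic's top words over tokenized documents."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            base = doc_freq(top_words[l])
            if base == 0:
                raise ValueError(f"word not found in any document: {top_words[l]}")
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / base)
    return score

# Toy tokenized corpus (hypothetical).
documents = [
    ["comeback", "album", "chart"],
    ["comeback", "album"],
    ["dating", "rumor"],
    ["album", "chart"],
]
coherent = umass_coherence(["album", "comeback"], documents)
incoherent = umass_coherence(["album", "rumor"], documents)
print(coherent > incoherent)
```

Scores closer to zero indicate top words that tend to appear in the same documents, which is why the measure is often used to compare candidate numbers of topics.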

Step 2 - Analysis on the Diffusion of each Article and Sentiment

2.1 - Select some news over different topics.

Select the 8 to 10 most influential news articles per topic type. (numbers subject to change)

2.2 - Analysis

Analyze user reactions on various websites / SNS platforms (number of likes, comments, shares).

As most articles regarding K-pop have clear keywords (usually the person / group / company name), we can analyze the articles with these keywords included in the title or text.

$\to$ Azua, MatchDG

2.2.1 - Time period Analysis

Count how often the keywords appear before / after the news article is posted. Check how long it takes for the news to reach its peak attention. On top of that, we can also check how the positive / negative word usage around the keywords changes. (Useful for platforms like DCinside)
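A minimal sketch of the counting part of this analysis: count keyword mentions per day, compare totals before and after publication, and measure the number of days until the busiest day. The post structure (`date`, `text`), the publication date, and the sample posts are hypothetical; real platform data would need its own loader.

```python
from collections import Counter
from datetime import date

def daily_mentions(posts, keywords):
    """Count posts per day whose text contains at least one keyword."""
    counts = Counter()
    for post in posts:
        if any(k in post["text"] for k in keywords):
            counts[post["date"]] += 1
    return counts

def before_after(counts, published):
    """Total mentions strictly before vs. on or after the publication date."""
    before = sum(n for d, n in counts.items() if d < published)
    after = sum(n for d, n in counts.items() if d >= published)
    return before, after

def days_to_peak(counts, published):
    """Days from publication to the busiest day on or after publication."""
    after = {d: n for d, n in counts.items() if d >= published}
    if not after:
        return None
    peak_day = max(sorted(after), key=after.get)  # earliest day wins ties
    return (peak_day - published).days

# Made-up posts and a hypothetical publication date.
published = date(2024, 3, 10)
posts = [
    {"date": date(2024, 3, 8), "text": "카리나 stage clip"},
    {"date": date(2024, 3, 10), "text": "카리나 news just dropped"},
    {"date": date(2024, 3, 10), "text": "재욱 trending now"},
    {"date": date(2024, 3, 11), "text": "everyone is talking about 카리나"},
    {"date": date(2024, 3, 11), "text": "카리나 and 재욱 again"},
    {"date": date(2024, 3, 11), "text": "more 재욱 posts"},
    {"date": date(2024, 3, 11), "text": "unrelated weather post"},
    {"date": date(2024, 3, 12), "text": "카리나 follow-up"},
]
counts = daily_mentions(posts, ["카리나", "재욱"])
print(before_after(counts, published), days_to_peak(counts, published))
```

The same functions apply to tag counts if each post's tags are joined into its `text` field.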

$\to$ We can also do this by counting the number of related tags on an SNS platform before / after the news article is posted. (Useful for platforms like Instagram)

From this, we can draw insights on the distinct agenda topics and their differences when it comes to information diffusion and reactions. We can also compare the different information diffusion and reactions across different platforms.

2.2.2 - Example

For example, after the news of Karina (카리나) and Lee Jae-wook (이재욱) dating was first reported by Dispatch (디스패치, company code: 311), we can count the articles that include '카리나' or '재욱' and analyze how that count changes over time. We can also check how the positive / negative word usage changes in the articles containing '카리나' or '재욱'. This data can be part of the "romance" category. (category name subject to change)
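The positive / negative word comparison in this example could be sketched as follows. The two tiny lexicons, the substring matching, and the sample texts are placeholders for illustration only; a real analysis would need a proper Korean sentiment lexicon or model.

```python
# Toy lexicons (hypothetical): a real study would use a curated resource.
POSITIVE = {"축하", "응원"}  # "congratulations", "support"
NEGATIVE = {"실망", "배신"}  # "disappointment", "betrayal"

def polarity_counts(texts, keywords):
    """Count lexicon hits in texts that mention at least one keyword."""
    pos = neg = 0
    for text in texts:
        if not any(k in text for k in keywords):
            continue  # skip texts unrelated to the keywords
        pos += sum(text.count(w) for w in POSITIVE)
        neg += sum(text.count(w) for w in NEGATIVE)
    return {"positive": pos, "negative": neg}

def polarity_shift(before, after, keywords):
    """Change in positive / negative word counts from before to after."""
    b = polarity_counts(before, keywords)
    a = polarity_counts(after, keywords)
    return {label: a[label] - b[label] for label in b}

keywords = ["카리나", "재욱"]
before_texts = ["카리나 무대 응원해요"]
after_texts = [
    "카리나 열애 축하",
    "재욱 팬인데 실망",
    "카리나 실망이다",
    "날씨가 좋다 축하",  # no keyword, so it is ignored
]
shift = polarity_shift(before_texts, after_texts, keywords)
print(shift)
```

Raw counts depend on how many texts fall in each window, so dividing by the number of matching texts may be a fairer comparison.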

Step 3 - Comparison Between the Different Diffusion and Sentiment Across Different Topics and Platforms

To be continued when this research resumes.

Comparison based on the results. TBD after checking the amount of data.

Visualization: Charts and graphs to show the results. Candidate Julia tools: Plots.jl, Plotly for Julia, wordcloud for Julia, TimeSeries Plotting for Julia. Python or R may be used instead, as Steps 1 and 2 are likely to be in Python, and R provides great visualization packages.
