2기_1주차_[구종빈] by Beanssssssss · Pull Request #1 · HateSlop/2-crawling-practice

Beanssssssss · 2025-03-23T10:39:45Z

No description provided.

jhgarry

Great work!

jhgarry · 2025-03-26T12:13:17Z

    "# Extract review text\n",
    "################################\n",
-    "# reviews_class = \n",
+    "reviews_class = soup.find_all(\"p\", class_=[\"context-text\", \"css-c92dc4\"])  # use the `class_` attribute\n",


Passing the class_ argument to find_all is much more intuitive, so I like this approach. I learned something new.

jhgarry · 2025-03-26T12:14:22Z

+    "# Print the reviewer list\n",
    "# Clean up each review text, then append it\n",
-    "for review in reviews_class:\n",
+    "for review, reviewer in zip(reviews_class, reviewers_class) :\n",


Attaching the reviewer information to each review in the output is definitely a good idea too!
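One thing worth keeping in mind with this pattern: `zip` stops at the shorter input, so a review without a matching reviewer element is dropped silently. A minimal sketch with made-up sample data (the lists below are hypothetical stand-ins for `reviews_class` and `reviewers_class`, not values from the PR):

```python
# Hypothetical sample data standing in for the scraped elements.
reviews = ["Great movie", "Too long", "Loved it"]
reviewers = ["alice", "bob"]  # one reviewer element is missing

# zip pairs items by position and stops at the shorter list.
pairs = list(zip(reviews, reviewers))
print(pairs)

# A simple length check makes the mismatch visible instead of silent.
lengths_match = len(reviews) == len(reviewers)
print("lengths match:", lengths_match)
```

If the two lists can differ in length on the real page, comparing their lengths before the loop is a cheap safeguard.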

jhgarry · 2025-03-26T12:24:21Z

+    "    count += 1\n",
+    "\n",
+    "# Handle the last reviews (if any remain)\n",
+    "if count > 0:\n",


Is the if statement necessary? It seems count would always be 5 by the time the last review is reached.
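A rough sketch of when the guard matters, assuming the loop groups reviews in batches of five and resets `count` to zero after each full batch. The PR excerpt does not show that part, so the reset logic and the helper name `batch_reviews` are assumptions:

```python
def batch_reviews(reviews, batch_size=5):
    """Group reviews into lists of batch_size (hypothetical helper)."""
    batches = []
    current = []
    count = 0
    for review in reviews:
        current.append(review)
        count += 1
        if count == batch_size:
            # A full batch is flushed and the counter starts over.
            batches.append(current)
            current = []
            count = 0
    # Leftover handling: only flush when something actually remains.
    if count > 0:
        batches.append(current)
    return batches


print(batch_reviews(list(range(12))))  # last batch has 2 items
print(batch_reviews(list(range(10))))  # no leftover batch
```

Under these assumptions, `count` is 5 at the end only if it is never reset. With a reset, it is 0 when the total is a multiple of five and 1 to 4 otherwise, so the `if` avoids emitting an empty trailing batch.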

jhgarry · 2025-03-26T12:27:50Z

+    "    path_tag = container.find(\"path\")\n",
+    "    if path_tag:\n",
+    "        d_attr = path_tag['d']\n",
+    "        star_check = d_attr[1:3] \n",


Instead of using two lines for d_attr and star_check, you could probably shorten this to one line with ['d'][1:3].
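The one-line form can be sketched as follows. A plain dict stands in for the BeautifulSoup tag here, since both support `['d']` lookup, and the path string is made up for illustration:

```python
# Hypothetical stand-in for the <path> tag; the 'd' value is invented.
path_tag = {"d": "M12 17.27L18.18 21"}

# Two-step version, as in the PR.
d_attr = path_tag["d"]
star_check_two_step = d_attr[1:3]

# Chained version: index the attribute, then slice the resulting string.
star_check = path_tag["d"][1:3] if path_tag else None

print(star_check_two_step, star_check)
```

Both versions produce the same slice. The two-step form is arguably easier to debug, so this is mostly a matter of taste.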

jhgarry · 2025-03-26T12:35:18Z

-    "################################\n",
    "\n",
-    "# Compute the star rating for each review\n",
+    "rating_containers = soup.find_all(\"svg\", attrs={\"xmlns\": \"http://www.w3.org/2000/svg\", \"class\": \"css-1mj121y\"})\n",


Searching by tag name together with attrs is a nice approach. I learned something here.

jhgarry · 2025-03-26T12:57:51Z

    "\n",
    "# Extract words (remove special characters)\n",
-    "# words = \n",
+    "words = re.sub(r'[^가-힣a-zA-Z0-9\\s]', '', all_reviews_text).split()\n",


I hadn't thought of using sub, or of using ^ to remove everything except the selected characters. I learned something here.
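For reference, a small sketch of how the negated character class behaves. The function name `extract_words` and the sample sentence are illustrative, not from the PR:

```python
import re


def extract_words(text):
    """Drop every character that is not Hangul, a Latin letter,
    a digit, or whitespace, then split on whitespace."""
    # Inside [...], a leading ^ negates the class, so re.sub removes
    # everything that is NOT in the listed ranges.
    cleaned = re.sub(r"[^가-힣a-zA-Z0-9\s]", "", text)
    return cleaned.split()


words = extract_words("정말 재밌어요!! 10/10, would watch again :)")
print(words)
```

Note that deleting punctuation outright merges neighbours such as `10/10` into `1010`. Substituting a space instead of an empty string would keep them as separate tokens, if that matters for the word counts.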

jhgarry · 2025-03-26T13:00:06Z

    "\n",
    "# Remove stopwords\n",
-    "# filtered_words = \n",
+    "filtered_words = [c for c in words if c not in korean_stopwords]\n",


So this was the right approach. I couldn't make proper use of a set, so my results weren't good. I should make a point of remembering this part of the code.
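On the set point, a minimal sketch of the same list comprehension with the stopwords held in a set. The stopword entries and sample words are invented, and the PR does not show how `korean_stopwords` is actually defined:

```python
# Hypothetical stopword set; the real korean_stopwords is not shown in the PR.
korean_stopwords = {"이", "그", "정말", "너무"}

words = ["정말", "재밌어요", "이", "영화", "너무", "좋아요"]

# Membership tests against a set are O(1) on average, and the list
# comprehension keeps the original word order and any duplicates.
filtered_words = [w for w in words if w not in korean_stopwords]
print(filtered_words)
```

Using set difference (`set(words) - korean_stopwords`) would also drop the stopwords, but it loses order and duplicate counts, which word-frequency analysis usually needs.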

Beanssssssss added 2 commits March 23, 2025 19:20

[feat] first commit

f9adf61

[feat] second commit

8ecbad4

Beanssssssss changed the title ~~2기_1주차_크롤링_[구종빈]~~ 2기_1주차_[구종빈] Mar 25, 2025

jhgarry reviewed Mar 26, 2025

Beanssssssss wants to merge 2 commits into HateSlop:main from Beanssssssss:kjb
