Conversation
|
Warning Rate limit exceeded@yunjin-Kim4809 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 12 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughAdds a new assignment/huggingface.py with a linear dataset → summarization → translation → classification workflow. Updates huggingface_basics.ipynb primarily in metadata and minor cell changes. Expands transformer.ipynb to implement a full KLUE-MRC pipeline with summarization, translation, and emotion analysis, including device handling and installation cells. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User
participant Script as assignment/huggingface.py
participant Datasets as HF Datasets
participant Summ as Summarization Pipeline (BART)
participant Trans as Translation Pipeline (NLLB-200)
participant ZS as Zero-shot Classifier (BART-MNLI)
User->>Script: Run
Script->>Datasets: load(amazon_polarity)
Datasets-->>Script: dataset subset
Script->>Summ: summarize(review_text)
Summ-->>Script: summary
Script->>Trans: translate(summary -> Korean)
Trans-->>Script: ko_summary
Script->>ZS: classify(summary, labels)
ZS-->>Script: predicted label
Script-->>User: dataset with summary, ko_summary, label
sequenceDiagram
autonumber
participant User
participant NB as transformer.ipynb
participant Datasets as HF Datasets (KLUE-MRC)
participant S as Summarizer (KoBART)
participant T as Translator (NLLB-200)
participant E as Emotion (go_emotions)
User->>NB: Execute cells
NB->>Datasets: load(klue_mrc subset)
Datasets-->>NB: dataset
loop per example
NB->>S: summarize(context)
S-->>NB: summary
NB->>T: translate(summary: ko -> en)
T-->>NB: english_summary
NB->>E: analyze(english_summary or summary)
E-->>NB: emotion label
end
NB-->>User: final_dataset with summary, english_summary, emotion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
huggingface_basics.ipynb (2)
327-332: Replace bare expressions with explicit print/display.Cells use values as standalone expressions (e.g.,
config_dict,tokenizer([...]),klue_mrc_dataset) that don’t render reliably and trigger B018. Preferprint(...)ordisplay(...)so the intent is explicit.Example fixes:
- Lines 327-332:
- config_dict = base_model_config.__dict__ - config_dict + config_dict = base_model_config.__dict__ + print(config_dict)
- Lines 681-686:
- tokenizer(['첫 번째 문장', '두 번째 문장']) # 두 문장은 각각 토큰화 된다. + print(tokenizer(['첫 번째 문장', '두 번째 문장'])) # 두 문장은 각각 토큰화 된다.
- Lines 1095-1096, 1120-1121:
- klue_mrc_dataset + print(klue_mrc_dataset) - klue_mrc_dataset_only_train + print(klue_mrc_dataset_only_train)As per static analysis hints.
Also applies to: 417-422, 507-511, 681-686, 719-724, 956-958, 1095-1096, 1120-1121
1220-1233: Remove duplicate imports to avoid F811.
pandas as pdis re-imported. Keep a single import at the top of the notebook.-from datasets import load_dataset -import pandas as pd +# (imports above already present; remove duplicates here)As per static analysis hints.
transformer.ipynb (1)
38-47: Importdisplayexplicitly to satisfy linters.Add IPython display import so
display(...)is defined outside Colab.from transformers import pipeline import torch import pandas as pd # 데이터 확인용 +from IPython.display import displayAs per static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
assignment/huggingface.py(1 hunks)huggingface_basics.ipynb(33 hunks)transformer.ipynb(11 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
transformer.ipynb
107-107: Undefined name display
(F821)
174-174: Undefined name display
(F821)
huggingface_basics.ipynb
19-19: Found useless expression. Either assign it to a variable or remove it.
(B018)
123-123: Found useless expression. Either assign it to a variable or remove it.
(B018)
126-126: Found useless expression. Either assign it to a variable or remove it.
(B018)
163-163: Redefinition of unused pd from line 137
Remove definition: pd
(F811)
assignment/huggingface.py
1-1: Found useless expression. Either assign it to a variable or remove it.
(B018)
| @@ -0,0 +1 @@ | |||
| {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyM1ckJyPP7ZbdElai2aJUbK"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"ouAe_geFwo5j","executionInfo":{"status":"error","timestamp":1759859127341,"user_tz":-540,"elapsed":113095,"user":{"displayName":"김윤진","userId":"16451008522635580194"}},"outputId":"00997dd3-2303-403c-d648-605dbc38bba1"},"outputs":[{"output_type":"stream","name":"stdout","text":["Requirement already satisfied: transformers in /usr/local/lib/python3.12/dist-packages (4.56.2)\n","Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)\n","Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.12/dist-packages (0.35.3)\n","Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers) (3.19.1)\n","Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2.0.2)\n","Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (25.0)\n","Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers) (6.0.3)\n","Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2024.11.6)\n","Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from transformers) (2.32.4)\n","Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.22.1)\n","Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.6.2)\n","Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/dist-packages (from transformers) (4.67.1)\n","Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (18.1.0)\n","Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)\n","Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)\n","Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.5.0)\n","Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)\n","Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2025.3.0)\n","Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface_hub) (4.15.0)\n","Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface_hub) (1.1.10)\n","Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.12.15)\n","Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.4.3)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.10)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2.5.0)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2025.8.3)\n","Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)\n","Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)\n","Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.4.0)\n","Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.3.0)\n","Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.7.0)\n","Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.6.4)\n","Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.3.2)\n","Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.20.1)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n","Requirement already satisfied: transformers in /usr/local/lib/python3.12/dist-packages (4.56.2)\n","Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)\n","Requirement already satisfied: sentencepiece in /usr/local/lib/python3.12/dist-packages (0.2.1)\n","Requirement already satisfied: torch in /usr/local/lib/python3.12/dist-packages (2.8.0+cu126)\n","Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (1.10.1)\n","Requirement already satisfied: scikit-learn in /usr/local/lib/python3.12/dist-packages (1.6.1)\n","Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from transformers) (3.19.1)\n","Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.35.3)\n","Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2.0.2)\n","Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (25.0)\n","Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers) (6.0.3)\n","Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers) (2024.11.6)\n","Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from transformers) (2.32.4)\n","Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.22.1)\n","Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers) (0.6.2)\n","Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/dist-packages (from transformers) (4.67.1)\n","Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (18.1.0)\n","Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)\n","Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)\n","Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.5.0)\n","Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)\n","Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2025.3.0)\n","Requirement already satisfied: typing-extensions>=4.10.0 in /usr/local/lib/python3.12/dist-packages (from torch) (4.15.0)\n","Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch) (75.2.0)\n","Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch) (1.13.3)\n","Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (from torch) (3.5)\n","Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch) (3.1.6)\n","Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n","Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n","Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.80)\n","Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch) (9.10.2.21)\n","Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.4.1)\n","Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch) (11.3.0.4)\n","Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch) (10.3.7.77)\n","Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch) (11.7.1.2)\n","Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch) (12.5.4.2)\n","Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch) (0.7.1)\n","Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /usr/local/lib/python3.12/dist-packages (from torch) (2.27.3)\n","Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.77)\n","Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from torch) (12.6.85)\n","Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch) (1.11.1.6)\n","Requirement already satisfied: triton==3.4.0 in /usr/local/lib/python3.12/dist-packages (from torch) (3.4.0)\n","Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate) (5.9.5)\n","Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn) (1.16.2)\n","Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn) (1.5.2)\n","Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn) (3.6.0)\n","Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.12.15)\n","Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<1.0,>=0.34.0->transformers) (1.1.10)\n","Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.4.3)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (3.10)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2.5.0)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->transformers) (2025.8.3)\n","Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch) (1.3.0)\n","Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch) (3.0.3)\n","Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)\n","Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)\n","Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.4.0)\n","Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.3.0)\n","Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.7.0)\n","Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.6.4)\n","Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.3.2)\n","Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.20.1)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n","사용 가능한 디바이스: CPU\n","Requirement already satisfied: datasets in /usr/local/lib/python3.12/dist-packages (4.0.0)\n","Collecting datasets\n"," Downloading datasets-4.1.1-py3-none-any.whl.metadata (18 kB)\n","Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from datasets) (3.19.1)\n","Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (2.0.2)\n","Collecting pyarrow>=21.0.0 (from datasets)\n"," Downloading pyarrow-21.0.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)\n","Requirement already satisfied: dill<0.4.1,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.3.8)\n","Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from datasets) (2.2.2)\n","Requirement already satisfied: requests>=2.32.2 in /usr/local/lib/python3.12/dist-packages (from datasets) (2.32.4)\n","Requirement already satisfied: tqdm>=4.66.3 in /usr/local/lib/python3.12/dist-packages (from datasets) (4.67.1)\n","Requirement already satisfied: xxhash in /usr/local/lib/python3.12/dist-packages (from datasets) (3.5.0)\n","Requirement already satisfied: multiprocess<0.70.17 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.70.16)\n","Requirement already satisfied: fsspec<=2025.9.0,>=2023.1.0 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (2025.3.0)\n","Requirement already satisfied: huggingface-hub>=0.24.0 in /usr/local/lib/python3.12/dist-packages (from datasets) (0.35.3)\n","Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from datasets) (25.0)\n","Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from datasets) (6.0.3)\n","Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.12/dist-packages (from fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (3.12.15)\n","Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.24.0->datasets) (4.15.0)\n","Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.24.0->datasets) (1.1.10)\n","Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (3.4.3)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (3.10)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (2.5.0)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests>=2.32.2->datasets) (2025.8.3)\n","Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2.9.0.post0)\n","Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->datasets) (2025.2)\n","Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (2.6.1)\n","Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (1.4.0)\n","Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (25.3.0)\n","Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (1.7.0)\n","Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (6.6.4)\n","Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (0.3.2)\n","Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.9.0,>=2023.1.0->datasets) (1.20.1)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)\n","Downloading datasets-4.1.1-py3-none-any.whl (503 kB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m503.6/503.6 kB\u001b[0m \u001b[31m8.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hDownloading pyarrow-21.0.0-cp312-cp312-manylinux_2_28_x86_64.whl (42.8 MB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m42.8/42.8 MB\u001b[0m \u001b[31m22.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hInstalling collected packages: pyarrow, datasets\n"," Attempting uninstall: pyarrow\n"," Found existing installation: pyarrow 18.1.0\n"," Uninstalling pyarrow-18.1.0:\n"," Successfully uninstalled pyarrow-18.1.0\n"," Attempting uninstall: datasets\n"," Found existing installation: datasets 4.0.0\n"," Uninstalling datasets-4.0.0:\n"," Successfully uninstalled datasets-4.0.0\n","\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n","cudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == \"x86_64\", but you have pyarrow 21.0.0 which is incompatible.\n","pylibcudf-cu12 25.6.0 requires pyarrow<20.0.0a0,>=14.0.0; platform_machine == \"x86_64\", but you have pyarrow 21.0.0 which is incompatible.\u001b[0m\u001b[31m\n","\u001b[0mSuccessfully installed datasets-4.1.1 pyarrow-21.0.0\n"]},{"output_type":"error","ename":"KeyboardInterrupt","evalue":"","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)","\u001b[0;32m/tmp/ipython-input-641515054.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 17\u001b[0m \u001b[0;31m#데이터셋 로드 준비 - mteb/amazon_polarity\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 19\u001b[0;31m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msystem\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'pip install -U datasets'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 20\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m#데이터셋 로드 후 불필요한 데이터 지우기\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/google/colab/_shell.py\u001b[0m in \u001b[0;36msystem\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 151\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpip_warn\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 152\u001b[0;31m \u001b[0m_pip\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mprint_previous_import_warning\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 153\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 154\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_send_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexc_content\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/google/colab/_pip.py\u001b[0m in \u001b[0;36mprint_previous_import_warning\u001b[0;34m(output)\u001b[0m\n\u001b[1;32m 54\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mprint_previous_import_warning\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 55\u001b[0m \u001b[0;34m\"\"\"Prints a warning about previously imported packages.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 56\u001b[0;31m \u001b[0mpackages\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_previously_imported_packages\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moutput\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 57\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mpackages\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 58\u001b[0m \u001b[0;31m# display a list of packages using the colab-display-data mimetype, which\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/google/colab/_pip.py\u001b[0m in \u001b[0;36m_previously_imported_packages\u001b[0;34m(pip_output)\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_previously_imported_packages\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpip_output\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 49\u001b[0m \u001b[0;34m\"\"\"List all previously imported packages from a pip install.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 50\u001b[0;31m \u001b[0minstalled\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_extract_toplevel_packages\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpip_output\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 51\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0msorted\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minstalled\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintersection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodules\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 52\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/google/colab/_pip.py\u001b[0m in \u001b[0;36m_extract_toplevel_packages\u001b[0;34m(pip_output)\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0;34m\"\"\"Extract the list of toplevel packages associated with a pip install.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[0mtoplevel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcollections\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdefaultdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 39\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mm\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mps\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetadata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpackages_distributions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 40\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mp\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mps\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0mtoplevel\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mp\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mm\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/lib/python3.12/importlib/metadata/__init__.py\u001b[0m in \u001b[0;36mpackages_distributions\u001b[0;34m()\u001b[0m\n\u001b[1;32m 945\u001b[0m \u001b[0mpkg_to_dist\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcollections\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdefaultdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 946\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mdist\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mdistributions\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 947\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mpkg\u001b[0m \u001b[0;32min\u001b[0m \u001b[0m_top_level_declared\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdist\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0m_top_level_inferred\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdist\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 948\u001b[0m \u001b[0mpkg_to_dist\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mpkg\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdist\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetadata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Name'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 949\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpkg_to_dist\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/lib/python3.12/importlib/metadata/__init__.py\u001b[0m in \u001b[0;36m_top_level_inferred\u001b[0;34m(dist)\u001b[0m\n\u001b[1;32m 957\u001b[0m opt_names = {\n\u001b[1;32m 958\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mparts\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mparts\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0minspect\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetmodulename\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 959\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mf\u001b[0m \u001b[0;32min\u001b[0m \u001b[0malways_iterable\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdist\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfiles\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 960\u001b[0m }\n\u001b[1;32m 961\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/importlib_metadata/__init__.py\u001b[0m in \u001b[0;36mfiles\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 602\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlocate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage_paths\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 603\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 604\u001b[0;31m return skip_missing_files(\n\u001b[0m\u001b[1;32m 605\u001b[0m make_files(\n\u001b[1;32m 606\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_read_files_distinfo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/importlib_metadata/_functools.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(param, *args, **kwargs)\u001b[0m\n\u001b[1;32m 100\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparam\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 101\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mparam\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 102\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparam\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 103\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 104\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/importlib_metadata/__init__.py\u001b[0m in \u001b[0;36mskip_missing_files\u001b[0;34m(package_paths)\u001b[0m\n\u001b[1;32m 600\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mpass_none\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 601\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mskip_missing_files\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpackage_paths\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 602\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlocate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage_paths\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 603\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 604\u001b[0m return skip_missing_files(\n","\u001b[0;32m/usr/local/lib/python3.12/dist-packages/importlib_metadata/__init__.py\u001b[0m in \u001b[0;36m<lambda>\u001b[0;34m(path)\u001b[0m\n\u001b[1;32m 600\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mpass_none\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 601\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mskip_missing_files\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpackage_paths\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 602\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlocate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexists\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpackage_paths\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 603\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 604\u001b[0m return skip_missing_files(\n","\u001b[0;32m/usr/lib/python3.12/pathlib.py\u001b[0m in \u001b[0;36mexists\u001b[0;34m(self, follow_symlinks)\u001b[0m\n\u001b[1;32m 858\u001b[0m \"\"\"\n\u001b[1;32m 859\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 860\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfollow_symlinks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfollow_symlinks\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 861\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mOSError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 862\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0m_ignore_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/lib/python3.12/pathlib.py\u001b[0m in \u001b[0;36mstat\u001b[0;34m(self, follow_symlinks)\u001b[0m\n\u001b[1;32m 838\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0mdoes\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 839\u001b[0m \"\"\"\n\u001b[0;32m--> 840\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfollow_symlinks\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfollow_symlinks\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 841\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 842\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mlstat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mKeyboardInterrupt\u001b[0m: "]}],"source":["#환경 설정 및 라이브러리 설치\n","\n","!pip install transformers datasets huggingface_hub\n","\n","!pip install transformers datasets sentencepiece torch accelerate scikit-learn\n","\n","import datasets\n","from datasets import load_dataset, DatasetDict\n","from transformers import pipeline\n","import torch\n","import pandas as pd # 데이터 확인용\n","\n","device = 0 if torch.cuda.is_available() else -1\n","print(f\"사용 가능한 디바이스: {'GPU' if device == 0 else 'CPU'}\")\n","\n","\n","#데이터셋 로드 준비 - mteb/amazon_polarity\n","\n","!pip install -U datasets\n","\n","#데이터셋 로드 후 불필요한 데이터 지우기\n","\n","from datasets import load_dataset\n","import pandas as pd\n","\n","amazon_review_dataset = load_dataset(\"amazon_polarity\",split=\"train\")\n","print(\"로드완료\\n\",amazon_review_dataset)\n","\n","#일부 데이터만 선택\n","num_samples_to_use = 30\n","sample_subset_raw = amazon_review_dataset.select(range(num_samples_to_use))\n","\n","#불필요한 데이터 지우기\n","sample_subset = sample_subset_raw.remove_columns(['label'])\n","\n","print(\"로드된 데이터셋 정보:\")\n","print(sample_subset)\n","\n","# 데이터셋 확인을 위해 Pandas DataFrame으로 변환\n","df_check = pd.DataFrame(sample_subset)\n","print(\"\\n데이터셋 일부 미리보기 (DataFrame):\")\n","display(df_check.head(10))\n","\n","#요약 모델 파이브라인 로딩\n","summarizer = pipeline(\n"," task=\"summarization\",\n"," model=\"facebook/bart-large-cnn\",\n"," device=device\n",")\n","\n","#요약하기\n","import re\n","\n","def clean_text(text):\n"," #2개 이상 반복되는 특수 문자를 공백으로 대체\n"," text = re.sub(r'[\\^\\_]{2,}', ' ', text)\n","\n"," #3번 이상 반복되는 알파벳을 2번으로 줄이기.\n"," text = re.sub(r'([a-z])\\1{2,}', r'\\1\\1', text, flags=re.IGNORECASE)\n","\n"," #여러 개의 공백을 하나로 줄이고 앞뒤 공백 제거\n"," text = re.sub(r'\\s+', ' ', text).strip()\n"," return text\n","\n","\n","def summarize_content(example):\n"," cleaned_content = clean_text(example['content'])\n"," summary_result = summarizer(\n"," cleaned_content,\n"," max_length=150,\n"," min_length=30,\n"," do_sample=False\n"," )\n"," example['summary'] = summary_result[0]['summary_text']\n"," return example\n","\n","summarized_dataset = sample_subset.map(summarize_content)\n","\n","print(\"\\n요약이 추가된 데이터셋 정보:\")\n","print(summarized_dataset)\n","\n","# 데이터셋 확인 (Pandas)\n","df_check_summary = pd.DataFrame(summarized_dataset)\n","print(\"\\n요약 추가 후 데이터셋 미리보기 (DataFrame):\")\n","display(df_check_summary.head(10))\n","\n","#번역 모델 파이브라인 로딩\n","\n","translator = pipeline(\n"," task=\"translation\",\n"," model=\"facebook/nllb-200-distilled-600M\",\n"," device=device\n",")\n","\n","#영어를 한국어로 번역하기\n","\n","def translate(example):\n"," translation_result = translator(\n"," example['summary'],\n"," src_lang=\"eng_Latn\",\n"," tgt_lang=\"kor_Hang\",\n"," max_length=150,\n"," min_length=30,\n"," do_sample=False\n"," )\n"," example['translation'] = translation_result[0]['translation_text']\n"," return example\n","\n","translated_dataset = summarized_dataset.map(translate)\n","\n","print(\"\\n번역이 추가된 데이터셋 정보:\")\n","print(translated_dataset)\n","\n","# 데이터셋 확인 (Pandas)\n","df_check_translation = pd.DataFrame(translated_dataset)\n","print(\"\\n번역 추가 후 데이터셋 미리보기 (DataFrame):\")\n","display(df_check_translation.head(10))\n","\n","#분류 모델 파이프라인 로딩\n","classifier = pipeline(\n"," \"zero-shot-classification\",\n"," model=\"facebook/bart-large-mnli\"\n",")\n","\n","#카테고리 분류하기\n","def classification(example):\n"," candidate_labels = [\n"," \"Beauty\",\n"," \"Fashion\",\n"," \"Food\",\n"," \"Beverages\",\n"," \"Home Appliances\",\n"," \"Electronics\",\n"," \"Books\",\n"," \"Health\",\n"," \"Toys\",\n"," \"Sports\",\n"," \"Kitchenware\",\n"," \"Music\",\n"," \"Gaming\"\n"," ]\n","\n"," classification_result = classifier(\n"," example['summary'],\n"," candidate_labels=candidate_labels\n"," )\n","\n"," # 가장 높은 점수의 카테고리를 저장\n"," example['category'] = classification_result['labels'][0]\n","\n"," return example\n","\n","classified_dataset = translated_dataset.map(classification)\n","print(classified_dataset)\n","\n","df_check_classify = pd.DataFrame(classified_dataset)\n","print(\"\\n번역 추가 후 데이터셋 미리보기 (DataFrame):\")\n","display(df_check_classify.head(10))"]}]} No newline at end of file | |||
There was a problem hiding this comment.
This .py file is a dumped notebook JSON; not executable and effectively a no-op.
- It’s a single top-level dict literal (B018 “useless expression”); nothing runs.
- Contains Colab
!pipanddisplay(...)in strings; not executed in Python. - Bundles verbose pip logs and a
KeyboardInterrupt. This doesn’t belong in VCS code.
Please either:
- Commit it as a real notebook (
assignment/huggingface.ipynb) or - Replace with a runnable Python script that implements the summarization → translation → classification flow.
Suggested minimal script replacement:
-{"nbformat":4,"nbformat_minor":0,"metadata":{...}}"cells":[...]}
+import re
+import torch
+import pandas as pd
+from datasets import load_dataset
+from transformers import pipeline
+
+device = 0 if torch.cuda.is_available() else -1
+
+# 1) Load small subset
+ds = load_dataset("amazon_polarity", split="train[:30]").remove_columns(["label"])
+
+# 2) Pipelines
+summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
+translator = pipeline("translation", model="facebook/nllb-200-distilled-600M", device=device)
+classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
+
+def clean_text(text: str) -> str:
+ text = re.sub(r'[\\^\\_]{2,}', ' ', text)
+ text = re.sub(r'([a-z])\\1{2,}', r'\\1\\1', text, flags=re.IGNORECASE)
+ return re.sub(r'\\s+', ' ', text).strip()
+
+def do_summarize(example):
+ cleaned = clean_text(example["content"])
+ out = summarizer(cleaned, max_length=150, min_length=30, do_sample=False)
+ example["summary"] = out[0]["summary_text"]
+ return example
+
+def do_translate(example):
+ out = translator(
+ example["summary"],
+ src_lang="eng_Latn",
+ tgt_lang="kor_Hang",
+ max_length=150,
+ min_length=30,
+ do_sample=False,
+ )
+ example["translation"] = out[0]["translation_text"]
+ return example
+
+CANDIDATE_LABELS = [
+ "Beauty","Fashion","Food","Beverages","Home Appliances","Electronics",
+ "Books","Health","Toys","Sports","Kitchenware","Music","Gaming",
+]
+
+def do_classify(example):
+ out = classifier(example["summary"], candidate_labels=CANDIDATE_LABELS)
+ example["category"] = out["labels"][0]
+ return example
+
+processed = ds.map(do_summarize).map(do_translate).map(do_classify)
+print(processed)
+print(pd.DataFrame(processed).head(10))Also avoid !pip in code; pin dependencies in requirements/environment files instead.
As per static analysis hints.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.13.3)
1-1: Found useless expression. Either assign it to a variable or remove it.
(B018)
| "test_emotion = emotion_classifier(\"I am very happy today!\")\n", | ||
| "print(f\"감정 분석 테스트: {test_emotion[0][0]}\")" | ||
| ] |
There was a problem hiding this comment.
Fix emotion test output indexing.
For a single string with top_k=1, pipeline returns [{'label': ..., 'score': ...}] (not nested).
-test_emotion = emotion_classifier("I am very happy today!")
-print(f"감정 분석 테스트: {test_emotion[0][0]}")
+test_emotion = emotion_classifier("I am very happy today!")
+print(f"감정 분석 테스트: {test_emotion[0]}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "test_emotion = emotion_classifier(\"I am very happy today!\")\n", | |
| "print(f\"감정 분석 테스트: {test_emotion[0][0]}\")" | |
| ] | |
| test_emotion = emotion_classifier("I am very happy today!") | |
| print(f"감정 분석 테스트: {test_emotion[0]}") |
🤖 Prompt for AI Agents
In transformer.ipynb around lines 330–332, the test output uses incorrect nested
indexing (test_emotion[0][0]) even though emotion_classifier with top_k=1
returns a flat list of dicts like [{'label':..., 'score':...}]; update the print
to access the first result's label (or the dict itself) e.g., use
test_emotion[0]['label'] when printing the predicted label (or test_emotion[0]
to print the full result) so indexing matches the actual return shape.
| "def analyze_emotion(example):\n", | ||
| " \"\"\"데이터셋의 'english_summary'를 받아 감정 분석 결과를 반환하는 함수\"\"\"\n", | ||
| " # TODO: emotion_classifier 파이프라인을 사용하여 example['english_summary']의 감정을 분석하세요.\n", | ||
| " # emotion_result = emotion_classifier(...)\n", | ||
| " # top_k=1 이므로 결과는 [[{'label': '...', 'score': ...}]] 형태입니다.\n", | ||
| " # TODO: 결과에서 가장 확률 높은 감정의 'label' 값만 추출하여 example 딕셔너리의 'emotion' 키 값으로 저장하세요.\n", | ||
| " # example['emotion'] = ...\n", | ||
| " emotion_result = emotion_classifier(example['english_summary'])\n", | ||
| " # 결과는 [[{'label': '...', 'score': ...}]] 형태 (top_k=1이므로 리스트 길이 1)\n", | ||
| " example['emotion'] = emotion_result[0][0]['label']\n", | ||
| " return example\n", |
There was a problem hiding this comment.
Fix analyze_emotion result indexing (avoids KeyError/TypeError).
For single-text input with top_k=1 the result is a list[dict]. Extract the first dict.
-def analyze_emotion(example):
- """데이터셋의 'english_summary'를 받아 감정 분석 결과를 반환하는 함수"""
- emotion_result = emotion_classifier(example['english_summary'])
- # 결과는 [[{'label': '...', 'score': ...}]] 형태 (top_k=1이므로 리스트 길이 1)
- example['emotion'] = emotion_result[0][0]['label']
- return example
+def analyze_emotion(example):
+ """데이터셋의 'english_summary'를 받아 감정 분석 결과를 반환하는 함수"""
+ result = emotion_classifier(example['english_summary'])[0] # [{'label','score'}]
+ example['emotion'] = result['label']
+ return example📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "def analyze_emotion(example):\n", | |
| " \"\"\"데이터셋의 'english_summary'를 받아 감정 분석 결과를 반환하는 함수\"\"\"\n", | |
| " # TODO: emotion_classifier 파이프라인을 사용하여 example['english_summary']의 감정을 분석하세요.\n", | |
| " # emotion_result = emotion_classifier(...)\n", | |
| " # top_k=1 이므로 결과는 [[{'label': '...', 'score': ...}]] 형태입니다.\n", | |
| " # TODO: 결과에서 가장 확률 높은 감정의 'label' 값만 추출하여 example 딕셔너리의 'emotion' 키 값으로 저장하세요.\n", | |
| " # example['emotion'] = ...\n", | |
| " emotion_result = emotion_classifier(example['english_summary'])\n", | |
| " # 결과는 [[{'label': '...', 'score': ...}]] 형태 (top_k=1이므로 리스트 길이 1)\n", | |
| " example['emotion'] = emotion_result[0][0]['label']\n", | |
| " return example\n", | |
| def analyze_emotion(example): | |
| """데이터셋의 'english_summary'를 받아 감정 분석 결과를 반환하는 함수""" | |
| result = emotion_classifier(example['english_summary'])[0] # [{'label','score'}] | |
| example['emotion'] = result['label'] | |
| return example |
🤖 Prompt for AI Agents
In transformer.ipynb around lines 349 to 354, the analyze_emotion function
assumes emotion_classifier returns a nested list (emotion_result[0][0]['label'])
but for single-text input with top_k=1 the classifier returns a list of dicts
(list[dict]) causing KeyError/TypeError; change the extraction to handle the
actual shape by pulling the first dict (e.g., emotion_result[0]['label']), or
defensively check the structure and use the first element if it's a dict or the
first element of the first list if nested, then assign that label to
example['emotion'] and return the example.
아마존 리뷰를 가지고 요약, 번역, 카테고리 분류를 진행하였습니다.
Summary by CodeRabbit
New Features
Documentation