`ValueError: Number of classes, 1, does not match size of target_names, 2. Try specifying the labels parameter` raised by classification_report() called from kochat/utils/metrics.py

### Intro

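Hello @hyunwoongko! I was looking for a Korean chatbot framework, and this one is really well made!
I was impressed reading the code and the detailed docs. Thanks to it, I think I'll be able to build a chatbot with exactly the features I need.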

### Problem

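After `[DistanceClassifier]` finishes training, the following error is raised. (It appears to happen while building the classification metrics report for OOD detection, i.e. in the `self.metrics.report(['in_dist', 'out_dist'], mode='ood')` call.)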

```terminal
...
[DistanceClassifier] Epoch : 10, ETA : 4.3569 sec 
Traceback (most recent call last):
  File "application.py", line 26, in <module>
    kochat = KochatApi(
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/app/kochat_api.py", line 56, in __init__
    self.__fit_intent()
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/app/kochat_api.py", line 153, in __fit_intent
    self.intent_classifier.fit(self.dataset.load_intent(self.embed_processor))
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/proc/intent_classifier.py", line 44, in fit
    report, _ = self.metrics.report(['in_dist', 'out_dist'], mode='ood')
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/utils/_testing.py", line 317, in wrapper
    return fn(*args, **kwargs)
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/kochat/utils/metrics.py", line 86, in report
    classification_report(
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/workspace/.pyenv_mirror/user/3.8.19/lib/python3.8/site-packages/sklearn/metrics/_classification.py", line 1950, in classification_report
    raise ValueError(
ValueError: Number of classes, 1, does not match size of target_names, 2. Try specifying the labels parameter
```
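The error can be reproduced without kochat. A minimal sketch, assuming only scikit-learn is installed and using a made-up five-sample input: when every true label and every prediction is `1`, calling `classification_report()` with two `target_names` and no `labels` raises the same `ValueError`.

```python
from sklearn.metrics import classification_report

# Degenerate case from this issue: every true label and every
# prediction is 1 (out_dist), so only one class is ever observed.
y_true = [1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1]

error_message = None
try:
    classification_report(
        y_true=y_true,
        y_pred=y_pred,
        target_names=['in_dist', 'out_dist'],  # same names kochat passes
        output_dict=True,
    )
except ValueError as err:
    error_message = str(err)

print(error_message)
```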

### My analysis

`Metrics.report()` in `kochat/utils/metrics.py` calls `classification_report()`:

```python
class Metrics:

    ...

    def report(self, label_dict: dict, mode: str) -> tuple:
        """
        Prints the classification report and the confusion matrix.
        These include Precision, Recall, F1 Score, Accuracy, etc.

        :return: model performance measured with various metrics
        """

        ...

        report = DataFrame(
            classification_report(
                y_true=label,
                y_pred=predict,
                target_names=list(label_dict),
                output_dict=True
            )
        )

        ...
```
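One possible fix is to pass `labels` explicitly so the label set no longer depends on which classes happen to appear in a batch. The sketch below is a hypothetical stand-in for the `classification_report()` call inside `Metrics.report()` (the name `ood_report` is made up), and it assumes class `i` corresponds to the `i`-th entry of `label_dict` (`in_dist` → 0, `out_dist` → 1), which matches the observation below that label `1` means `out_dist`. `zero_division=0` requires scikit-learn 0.22 or later.

```python
import numpy as np
from sklearn.metrics import classification_report


def ood_report(label, predict, label_dict):
    """Hypothetical stand-in for the call in Metrics.report().

    Assumes class index i maps to the i-th name in label_dict.
    """
    target_names = list(label_dict)
    return classification_report(
        y_true=label,
        y_pred=predict,
        # Pin the full label set so a batch containing only one class
        # still yields one row per entry in target_names.
        labels=np.arange(len(target_names)),
        target_names=target_names,
        output_dict=True,
        # Classes with no samples get 0 instead of an UndefinedMetricWarning.
        zero_division=0,
    )


report = ood_report([1, 1, 1, 1, 1], [1, 1, 1, 1, 1], ['in_dist', 'out_dist'])
print(report['in_dist'])
print(report['out_dist'])
```

The returned dict can still be wrapped in `DataFrame(...)` as the original code does. Because `labels` is given and its length matches `target_names`, `classification_report()` takes neither the warning branch nor the `ValueError` branch.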

The definition of `classification_report()` is shown below. The error is raised by the last statement of this excerpt.

```python
def classification_report(y_true, y_pred, *, labels=None, target_names=None,
                          sample_weight=None, digits=2, output_dict=False,
                          zero_division="warn"):
    """Build a text report showing the main classification metrics.

    Read more in the :ref:`User Guide <classification_report>`.

    Parameters
    ----------
    y_true : 1d array-like, or label indicator array / sparse matrix
        Ground truth (correct) target values.

    y_pred : 1d array-like, or label indicator array / sparse matrix
        Estimated targets as returned by a classifier.

    labels : array, shape = [n_labels]
        Optional list of label indices to include in the report.

    target_names : list of strings
        Optional display names matching the labels (same order).

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    digits : int
        Number of digits for formatting output floating point values.
        When ``output_dict`` is ``True``, this will be ignored and the
        returned values will not be rounded.

    output_dict : bool (default = False)
        If True, return output as dict

        .. versionadded:: 0.20

    zero_division : "warn", 0 or 1, default="warn"
        Sets the value to return when there is a zero division. If set to
        "warn", this acts as 0, but warnings are also raised.

    Returns
    -------
    report : string / dict
        Text summary of the precision, recall, F1 score for each class.
        Dictionary returned if output_dict is True. Dictionary has the
        following structure::

            {'label 1': {'precision':0.5,
                         'recall':1.0,
                         'f1-score':0.67,
                         'support':1},
             'label 2': { ... },
              ...
            }

        The reported averages include macro average (averaging the unweighted
        mean per label), weighted average (averaging the support-weighted mean
        per label), and sample average (only for multilabel classification).
        Micro average (averaging the total true positives, false negatives and
        false positives) is only shown for multi-label or multi-class
        with a subset of classes, because it corresponds to accuracy otherwise.
        See also :func:`precision_recall_fscore_support` for more details
        on averages.

        Note that in binary classification, recall of the positive class
        is also known as "sensitivity"; recall of the negative class is
        "specificity".

    See also
    --------
    precision_recall_fscore_support, confusion_matrix,
    multilabel_confusion_matrix

    Examples
    --------
    >>> from sklearn.metrics import classification_report
    >>> y_true = [0, 1, 2, 2, 2]
    >>> y_pred = [0, 0, 2, 2, 1]
    >>> target_names = ['class 0', 'class 1', 'class 2']
    >>> print(classification_report(y_true, y_pred, target_names=target_names))
                  precision    recall  f1-score   support
    <BLANKLINE>
         class 0       0.50      1.00      0.67         1
         class 1       0.00      0.00      0.00         1
         class 2       1.00      0.67      0.80         3
    <BLANKLINE>
        accuracy                           0.60         5
       macro avg       0.50      0.56      0.49         5
    weighted avg       0.70      0.60      0.61         5
    <BLANKLINE>
    >>> y_pred = [1, 1, 0]
    >>> y_true = [1, 1, 1]
    >>> print(classification_report(y_true, y_pred, labels=[1, 2, 3]))
                  precision    recall  f1-score   support
    <BLANKLINE>
               1       1.00      0.67      0.80         3
               2       0.00      0.00      0.00         0
               3       0.00      0.00      0.00         0
    <BLANKLINE>
       micro avg       1.00      0.67      0.80         3
       macro avg       0.33      0.22      0.27         3
    weighted avg       1.00      0.67      0.80         3
    <BLANKLINE>
    """

    y_type, y_true, y_pred = _check_targets(y_true, y_pred)

    labels_given = True
    if labels is None:
        labels = unique_labels(y_true, y_pred)  # where labels is defined
        labels_given = False
    else:
        labels = np.asarray(labels)

    # labelled micro average
    micro_is_accuracy = ((y_type == 'multiclass' or y_type == 'binary') and
                         (not labels_given or
                          (set(labels) == set(unique_labels(y_true, y_pred)))))

    if target_names is not None and len(labels) != len(target_names):
        if labels_given:
            warnings.warn(
                "labels size, {0}, does not match size of target_names, {1}"
                .format(len(labels), len(target_names))
            )
        else:
            raise ValueError(
                "Number of classes, {0}, does not match size of "
                "target_names, {1}. Try specifying the labels "
                "parameter".format(len(labels), len(target_names))
            )  # The error is raised here!
    ...
```

In other words, the error occurs because `labels` and `target_names` have different lengths. `Metrics.report()` calls `classification_report()` without passing `labels`, so `labels` stays `None` and is computed as `unique_labels(y_true, y_pred)`.
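The check that fails can be replayed by hand. A small sketch with made-up all-`1` inputs: it computes the labels that `classification_report()` infers when `labels=None` and compares their count with `target_names`, the same comparison as in the `if target_names is not None and len(labels) != len(target_names)` line above.

```python
from sklearn.utils.multiclass import unique_labels

target_names = ['in_dist', 'out_dist']
y_true = [1, 1, 1, 1]
y_pred = [1, 1, 1, 1]

# With labels=None, classification_report falls back to the labels
# actually observed in y_true and y_pred.
inferred_labels = unique_labels(y_true, y_pred)
print(inferred_labels)                          # [1]
# This length comparison is what triggers the ValueError.
print(len(inferred_labels), len(target_names))  # 1 2
```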

The examples in the `unique_labels()` docstring are:

```
    Examples
    --------
    >>> from sklearn.utils.multiclass import unique_labels
    >>> unique_labels([3, 5, 5, 5, 7, 7])
    array([3, 5, 7])
    >>> unique_labels([1, 2, 3, 4], [2, 2, 3, 4])
    array([1, 2, 3, 4])
    >>> unique_labels([1, 2, 10], [5, 11])
    array([ 1,  2,  5, 10, 11])
```

So `unique_labels(y_true, y_pred)` appears to return the sorted union of the labels found in `y_true` and `y_pred`.
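For plain integer label lists, this "sorted union" reading can be checked against NumPy's `np.union1d`. A sketch using the docstring examples plus the all-`1` case from this issue; note that `unique_labels()` also does some validation that `np.union1d` does not (for example, it rejects a mix of string and numeric labels).

```python
import numpy as np
from sklearn.utils.multiclass import unique_labels

# For plain integer labels, unique_labels behaves like a sorted set union.
examples = [
    ([1, 2, 10], [5, 11]),
    ([1, 2, 3, 4], [2, 2, 3, 4]),
    ([1, 1, 1], [1, 1, 1]),  # degenerate case from this issue
]
for y_true, y_pred in examples:
    assert np.array_equal(unique_labels(y_true, y_pred),
                          np.union1d(y_true, y_pred))

# The degenerate case collapses to a single class.
print(unique_labels([1, 1, 1], [1, 1, 1]))  # [1]
```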

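The problem occurs when `y_true` and `y_pred` both contain only the same single label, `1` (i.e. `out_dist`). (Part of the cause is that the model was not trained enough, but shouldn't training still complete even if every sample is classified as OOD?)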

Printing `y_true` and `y_pred` gives `[1 1 1 ... 1 1 1]` and `[1 1 1 ... 1 1 1]` respectively, and both have the same length.
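The docstring of `Metrics.report()` says it also outputs a confusion matrix. That part of the method is elided above, so whether it uses scikit-learn's `confusion_matrix()` is an assumption. If it does and also omits `labels`, the same all-`1` input would silently produce a 1x1 matrix instead of 2x2, as this sketch with made-up data shows.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1]

# Without labels, only the observed class appears: a 1x1 matrix.
cm_inferred = confusion_matrix(y_true, y_pred)
# With labels pinned to (in_dist=0, out_dist=1), the shape stays 2x2.
cm_pinned = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm_inferred)  # [[6]]
print(cm_pinned)    # [[0 0]
                    #  [0 6]]
```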

How can this error be fixed? I tried to organize my trial and error as best I could, so I apologize if it's a bit disorganized. Thank you again for sharing such a great framework.
