Auto Tokenizer vs. model-specific Tokenizer: what is the difference? #11

zgotter · 2021-10-03T04:26:19Z

zgotter
Oct 3, 2021
Maintainer

To load configs, tokenizers, and models from Hugging Face, we use classes like the following.

AutoConfig
AutoTokenizer
AutoModelForSequenceClassification

Model-specific versions of these classes also exist (e.g. for ELECTRA).

ElectraConfig
ElectraTokenizer
ElectraForSequenceClassification

Is there any difference between using the Auto classes and using the model-specific classes?

Answered by nlee-208

nlee-208 · 2021-10-04T00:46:13Z

nlee-208
Oct 4, 2021
Maintainer

I was curious about this too, so I did a quick search, but I couldn't find a convincing answer.

The Hugging Face documentation does say this, though:

AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary

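Judging only from that passage, I'd guess the biggest advantage is this: when you work with many different models, you only have to pass the Hugging Face directory (the name/path) to an Auto class, which saves you the trouble of importing the model-specific ~Tokenizer, ~Config, and ~ModelFor~ classes for each model.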
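To illustrate the idea, here is a minimal sketch of that kind of lookup. It is not the actual transformers implementation. It assumes the checkpoint directory holds a `config.json` with a `model_type` field, and the `ToyBertTokenizer`, `ToyElectraTokenizer`, `TOKENIZER_REGISTRY`, `auto_tokenizer_class`, and `demo` names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical stand-ins for model-specific tokenizer classes.
class ToyBertTokenizer:
    pass


class ToyElectraTokenizer:
    pass


# Hypothetical registry mapping a model_type string to its class.
TOKENIZER_REGISTRY = {
    "bert": ToyBertTokenizer,
    "electra": ToyElectraTokenizer,
}


def auto_tokenizer_class(checkpoint_dir):
    """Pick the tokenizer class from the model_type recorded in config.json."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    model_type = config["model_type"]
    if model_type not in TOKENIZER_REGISTRY:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return TOKENIZER_REGISTRY[model_type]


def demo():
    # Build a throwaway checkpoint directory so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "config.json"
        config_path.write_text(json.dumps({"model_type": "electra"}))
        return auto_tokenizer_class(tmp)


if __name__ == "__main__":
    print(demo().__name__)
```

The caller only supplies a directory; the choice of class comes from the metadata stored with the checkpoint, which is the convenience described above.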

A related issue says that using Auto* helps prevent the mistake of using the wrong checkpoint or tokenizer, but that doesn't seem like a mistake people make often, so I think it is mostly used for convenience.
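For what it's worth, both routes seem to end up at the same class. Here is a small sketch, assuming the `transformers` package is installed. It uses `AutoConfig.for_model`, which builds a default config from a model type string, so nothing is downloaded. The `"electra"` model type is chosen to match the ELECTRA example in the question.

```python
from transformers import AutoConfig, ElectraConfig

# Resolve the config class from the model type string alone (no download).
auto_config = AutoConfig.for_model("electra")

# Build the same kind of config through the model-specific class.
electra_config = ElectraConfig()

# Check that the Auto route produced an instance of the model-specific class.
print(isinstance(auto_config, ElectraConfig))
print(auto_config.model_type == electra_config.model_type)
```

With a real checkpoint, `AutoConfig.from_pretrained` and `ElectraConfig.from_pretrained` both take the same name or path argument, so the practical difference is whether you name the architecture yourself.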
