I propose that pythainlp.corpus.get_corpus_path() should not automatically download a file when it cannot find it. That decision should be left to the user.
Currently, get_corpus_path() (pythainlp/pythainlp/corpus/core.py, line 81 in 831a9fc) tries to download the corpus file if it does not yet exist locally:
def get_corpus_path(name: str) -> Union[str, None]:
    if db.search(query.name == name):
        path = get_full_data_path(db.search(query.name == name)[0]["file"])
        if not os.path.exists(path):
            download(name)
I propose that it shouldn't do that.
If the file does not exist, the user/developer should be notified and should decide whether to download it (using the API or the command line).
Currently, inside the pythainlp module, every caller of get_corpus_path() already does exactly that. Each one checks whether the returned path is truthy and, if it is not, calls pythainlp.corpus.download() itself:
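pythainlp/pythainlp/tag/named_entity.py (line 79 in 831a9fc):
    self.__data_path = get_corpus_path("thainer-1-3")
pythainlp/pythainlp/transliterate/thai2rom.py (line 24 in 831a9fc):
    self.__filemodel = get_corpus_path("thai2rom-pytorch-attn")
pythainlp/pythainlp/transliterate/thaig2p.py (line 25 in 831a9fc):
    self.__filemodel = get_corpus_path("thai-g2p")
pythainlp/pythainlp/ulmfit/__init__.py (line 134 in 831a9fc):
    path = get_corpus_path(fname)
pythainlp/pythainlp/word_vector/__init__.py (line 23 in 831a9fc):
    path = get_corpus_path("thai2fit_wv")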
So removing the auto-download inside pythainlp.corpus.get_corpus_path() will not change the behavior of those functions in the submodules. (Whether we also want to remove the auto-downloads from those submodules can be discussed further.)
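To illustrate why the callers keep working, here is a minimal sketch of the check-then-download pattern described above. The helper name `ensure_corpus` and the injected `get_path` and `download` callables are hypothetical stand-ins for illustration, not part of pythainlp; in the real submodules the two callables would correspond to get_corpus_path() and pythainlp.corpus.download().

```python
from typing import Callable, Optional


def ensure_corpus(
    name: str,
    get_path: Callable[[str], Optional[str]],
    download: Callable[[str], object],
) -> Optional[str]:
    """Return the corpus path, downloading first if the lookup is falsy.

    Hypothetical helper: `get_path` and `download` are injected so the
    sketch does not depend on any real corpus database or network access.
    """
    path = get_path(name)
    if not path:
        # The caller, not the lookup function, decides to download.
        download(name)
        path = get_path(name)
    return path
```

Because the caller performs the download itself, this pattern behaves the same whether or not the lookup function also attempts a download.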
Proposed return values
I propose these for discussion:
- full path - if the corpus name is valid and the file exists locally
- "" (empty string) - if the corpus name is valid but the file does not exist locally
- None - if the corpus name is not valid (not in the corpus database)
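The three proposed return values could look roughly like the sketch below. This is only an illustration under stated assumptions: the corpus database is replaced by a plain dict mapping corpus names to file names, and the `corpus_db` and `data_dir` parameters are hypothetical, whereas the real function looks names up through `db.search(...)` and `get_full_data_path(...)`.

```python
import os
from typing import Dict, Optional


def get_corpus_path(
    name: str, corpus_db: Dict[str, str], data_dir: str
) -> Optional[str]:
    """Look up a corpus file without ever downloading it.

    `corpus_db` (name -> file name) and `data_dir` are assumptions made
    for this sketch; they stand in for the real corpus database.
    """
    filename = corpus_db.get(name)
    if filename is None:
        # Corpus name is not in the database.
        return None

    path = os.path.join(data_dir, filename)
    if not os.path.exists(path):
        # Valid name, but the file has not been downloaded yet.
        return ""

    # Valid name and the file exists locally.
    return path
```

Both "" and None are falsy, so existing callers that test `if not path:` would keep working, while callers that need to tell the two cases apart can compare against None explicitly.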