Kss: A Toolkit for Korean sentence segmentation

License: BSD 3-Clause

This repository contains the source code of Kss, a representative Korean sentence segmentation toolkit. I also conduct ongoing research on Korean sentence segmentation algorithms and report the results in this repository. If you have a good idea about Korean sentence segmentation, please feel free to open an issue to discuss it.


1. Installation

1.1. Install from pip

Kss can be easily installed using the pip package manager.

pip install kss

1.2. Install from source

You can also install Kss from source. This can be useful for adding words to the user dictionary.

git clone https://github.com/hyunwoongko/kss
cd kss
pip install -e .
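
After either installation route, it can help to confirm which version ended up in the environment. The snippet below is a minimal sketch that relies only on the standard library; the helper name `installed_version` is made up for this illustration and is not part of Kss.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "kss") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if Kss is installed, otherwise None.
print(installed_version())
```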

2. Usage

2.1. split_sentences

Kss is a sentence segmentation toolkit based on morpheme-aware heuristic algorithms, and split_sentences is its key function. Use it to segment input text into sentences. Click the triangle button (►) for more detailed information and example code snippets for each parameter.

>>> from kss import split_sentences

>>> split_sentences(
...     text: Union[str, tuple, List[str]],
...     use_heuristic: bool = True,
...     use_quotes_brackets_processing: bool = False,
...     max_recover_step: int = 5,
...     max_recover_length: int = 20000,
...     backend: str = "pynori",
...     num_workers: int = -1,
...     disable_gc: bool = True,
... )

text (Union[str, tuple, List[str]])

This parameter is the input text. Besides a single string, you can pass a list or tuple of strings for batch processing.
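
The difference in return shape between a single string and a batch can be sketched as follows. This is only an illustration of the input handling described above, not Kss's code: `split_any` and the stand-in splitter `naive` are hypothetical names invented for the example.

```python
from typing import Callable, List, Sequence, Union

Splitter = Callable[[str], List[str]]


def split_any(text: Union[str, Sequence[str]], splitter: Splitter):
    """Apply `splitter` to one string, or to every string of a list/tuple."""
    if isinstance(text, str):
        return splitter(text)            # single text -> flat list of sentences
    return [splitter(t) for t in text]   # batch -> one list of sentences per text


def naive(t: str) -> List[str]:
    # Stand-in splitter for the demo only: cuts on ". " boundaries.
    return [s for s in t.split(". ") if s]


print(split_any("A. B", naive))
print(split_any(["A. B", "C"], naive))
```

A single string yields a flat list, while a list or tuple yields a list of lists, which matches the shapes shown in the examples of this section.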

  • An example of single text segmentation

    >>> from kss import split_sentences
    
    >>> text = "강남역 맛집으로 소문난 강남 토끼정에 다녀왔습니다 회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요 다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다"
    >>> split_sentences(text)
    ['강남역 맛집으로 소문난 강남 토끼정에 다녀왔습니다', '회사 동료 분들과 다녀왔는데 분위기도 좋고 음식도 맛있었어요', '다만, 강남 토끼정이 강남 쉑쉑버거 골목길로 쭉 올라가야 하는데 다들 쉑쉑버거의 유혹에 넘어갈 뻔 했답니다']

  • An example of multiple texts batch segmentation

    >>> from kss import split_sentences
    
    >>> text1 = "์˜ค๋Š˜ ์—ฌ๋Ÿฌ๋ถ„๊ณผ ํ•จ๊ป˜ ๋ฆฌ๋ทฐํ•ด ๋ณผ ์˜ํ™”๋Š” ๋ฐ”๋กœ ๋””์ฆˆ๋‹ˆ ํ”ฝ์‚ฌ์˜ ์˜ํ™” '์—…'์ž…๋‹ˆ๋‹ค ์ €๋Š” ์ด ์˜ํ™”๋ฅผ ๊ณ ๋“ฑํ•™๊ต ์˜์–ด์‹œ๊ฐ„์— ์ฒ˜์Œ ๋ณด๊ฒŒ๋˜์—ˆ๋Š”๋ฐ์š”, ์ˆ˜๋Šฅ๋‚ ์„ ๋งž์ดํ•ด์„œ ๊ณ ๋“ฑํ•™๊ต ์ถ”์–ต์ด ๋‹ด๊ธด ์˜ํ™”๋ฅผ ์˜ค๋Š˜ ์—ฌ๋Ÿฌ๋ถ„๊ป˜ ์†Œ๊ฐœํ•ด๋“œ๋ฆฌ๋ ค๊ณ  ํ•ด์š”~ใ…Žใ…Žใ…Ž ํ•œ๋ฐฉ์šธ ๋ˆˆ๋ฌผ๊ณผ ํ•œ๋ฐ”ํƒ• ์›ƒ์Œ ๋งˆ์Œ ์†์— ๋‹ด๊ณ  ์‹ถ์€ ๋‹จ ํ•˜๋‚˜์˜ ๊ฑธ์ž‘ ํ‰์ƒ ๋ชจํ—˜์„ ๊ฟˆ๊ฟ” ์™”๋˜ โ€˜์นผโ€™ ํ• ์•„๋ฒ„์ง€๋Š” ์ˆ˜์ฒœ ๊ฐœ์˜ ํ’์„ ์„ ๋งค๋‹ฌ์•„ ์ง‘์„ ํ†ต์งธ๋กœ ๋‚จ์•„๋ฉ”๋ฆฌ์นด๋กœ ๋‚ ๋ ค ๋ฒ„๋ฆฌ๋Š”๋ฐ, โ€˜์นผโ€™ ํ• ์•„๋ฒ„์ง€์˜ ์ด ์œ„๋Œ€ํ•œ ๋ชจํ—˜์— ์ดˆ๋Œ€ ๋ฐ›์ง€ ์•Š์€ ๋ถˆ์ฒญ๊ฐ์ด ์žˆ์—ˆ์œผ๋‹ˆ, ๋ฐ”๋กœ ํ™ฉ์•ผ์˜ ํƒํ—˜๊ฐ€ โ€˜๋Ÿฌ์…€โ€™ ์ง€๊ตฌ์ƒ์— ๋‘˜๋„ ์—†์„ ์ด ์–ด์ƒ‰ํ•œ ์ปคํ”Œ์ด ํ•จ๊ป˜ ํ•˜๋Š” ๋Œ€๋ชจํ—˜ ๊ทธ๋“ค์€ ๊ณผ์—ฐ ๋‚จ๋ฏธ์˜ ์žƒ์–ด๋ฒ„๋ฆฐ ์„ธ๊ณ„์—์„œ ์‚ฌ๋ผ์ ธ ๋ฒ„๋ฆฐ ๊ฟˆ๊ณผ ํฌ๋ง, ํ–‰๋ณต์„ ๋‹ค์‹œ ์ฐพ์„ ์ˆ˜ ์žˆ์„๊นŒ? ์—ฌ๋Ÿฌ๋ถ„์€ ๋””์ฆˆ๋‹ˆ ์˜ํ™”๋ฅผ ์ข‹์•„ํ•˜์‹œ ๋‚˜์š”? ์ €๋Š” ๋””์ฆˆ๋‹ˆ๋ณด๋‹ค๋Š” ํ”ฝ์‚ฌ๋ฅผ ํ›จ์”ฌ ๋” ์ข‹์•„ํ•˜๋Š” ํŽธ์ธ๋ฐ์š” ๋””์ฆˆ๋‹ˆ์™€ ํ”ฝ์‚ฌ๊ฐ€ ํ•ฉ๋ณ‘ํ•œ ๋’ค, ์ €๋Š” ๋””์ฆˆ๋‹ˆ ํ”ฝ์‚ฌ ์˜ํ™”๊ฐ€ ์ธ์ƒ์˜ํ™” ์ค‘ ๋Œ€๋ถ€๋ถ„์„ ์ฐจ์ง€ํ•  ์ •๋„๋กœ ์ •๋ง ์ฆ๊ฒจ๋ณด๊ณ  ์žˆ์–ด์š”"
    >>> text2 = "๋™์˜์ƒ ์ดฌ์˜์ด ๊ธˆ์ง€๋˜์–ด์žˆ์–ด ๋…ธํ™์ฒ  ์”จ์˜ ์—ด์ • ๋„˜์น˜๋Š” ๊ฐ•์—ฐ์„ ๊ทธ๋Œ€๋กœ ๋ณด์—ฌ ๋“œ๋ฆฌ์ง€ ๋ชปํ•˜๋Š” ์  ๋„ˆ๋ฌด ์•„์‰ฝ๋„ค์š” ใ… ใ…  ๊ฐ„๋‹จํ•œ ํ–‰์‚ฌ์Šค์ผ€์น˜๋กœ๋‚˜๋งˆ ์ฐธ๊ณ ํ•ด์ฃผ์„ธ์š”~ ๋…ธํ™์ฒ ์˜ ์—ด์ • Talk ํ–‰์‚ฌ๋Š” ๊ฐœ๊ทธ๋งจ ๊น€๋ฒ”์šฉ ์”จ๊ฐ€ ๋งก์•„์ฃผ์…จ๊ณ  ์˜คํ”„๋‹ ๋ฌด๋Œ€๋Š” ์œ„๋Œ€ํ•œ ํƒ„์ƒ3 ํƒ‘3๋กœ ์ด๋ฆ„์„ ๋‚ ๋ฆฐ ์˜ค๋ณ‘๊ธธ ์”จ์˜ ๋…ธ๋ž˜๋กœ ๋œจ๊ฒ๊ฒŒ ๋‹ฌ๊ถˆ์กŒ์Šต๋‹ˆ๋‹ค^^ ์ด๋‚  ์ดˆ๋Œ€๋œ ๋กœ์—ด๋ธ”๋ฃจ์™€ ๋ธ”๋ฃจ ๋ฉค๋ฒ„์‹ญ ๊ณ ๊ฐ๋ถ„๋“ค์˜ ํ™˜ํ˜ธ๋กœ ์‚ผ์„ฑํ™๋ณด๊ด€ ๋”œ๋ผ์ดํŠธ ์•ˆ์ด ๊ฐ€๋“ ์ฐจ๋”๊ตฐ์š”! (์˜ค๋ณ‘๊ธธ ์”จ์˜ ๋…ธ๋ž˜ ์ž˜ํ•˜๋Š” ๋น„๋ฒ•์€ ๋ฌด์—‡์ผ๊นŒ์š”? ๊พธ์ค€ํ•œ ๋ชจ์ฐฝ์—ฐ์Šต์ด๋ผ๊ณ โ€ฆ ใ…‹) ๊ณง์ด์–ด ์ด ๋‚  ํ–‰์‚ฌ์˜ ๋ฉ”์ธ์ด์—ˆ๋˜ ๋…ธํ™์ฒ ์”จ์˜ ์—ด์ • Talk๊ฐ€ ๋ณธ๊ฒฉ์ ์œผ๋กœ ์‹œ์ž‘๋˜์—ˆ์Šต๋‹ˆ๋‹ค"
    >>> split_sentences([text1, text2])
    [["์˜ค๋Š˜ ์—ฌ๋Ÿฌ๋ถ„๊ณผ ํ•จ๊ป˜ ๋ฆฌ๋ทฐํ•ด ๋ณผ ์˜ํ™”๋Š” ๋ฐ”๋กœ ๋””์ฆˆ๋‹ˆ ํ”ฝ์‚ฌ์˜ ์˜ํ™” '์—…'์ž…๋‹ˆ๋‹ค", '์ €๋Š” ์ด ์˜ํ™”๋ฅผ ๊ณ ๋“ฑํ•™๊ต ์˜์–ด์‹œ๊ฐ„์— ์ฒ˜์Œ ๋ณด๊ฒŒ๋˜์—ˆ๋Š”๋ฐ์š”,', '์ˆ˜๋Šฅ๋‚ ์„ ๋งž์ดํ•ด์„œ ๊ณ ๋“ฑํ•™๊ต ์ถ”์–ต์ด ๋‹ด๊ธด ์˜ํ™”๋ฅผ ์˜ค๋Š˜ ์—ฌ๋Ÿฌ๋ถ„๊ป˜ ์†Œ๊ฐœํ•ด๋“œ๋ฆฌ๋ ค๊ณ  ํ•ด์š”~ใ…Žใ…Žใ…Ž', 'ํ•œ๋ฐฉ์šธ ๋ˆˆ๋ฌผ๊ณผ ํ•œ๋ฐ”ํƒ• ์›ƒ์Œ ๋งˆ์Œ ์†์— ๋‹ด๊ณ  ์‹ถ์€ ๋‹จ ํ•˜๋‚˜์˜ ๊ฑธ์ž‘ ํ‰์ƒ ๋ชจํ—˜์„ ๊ฟˆ๊ฟ” ์™”๋˜ โ€˜์นผโ€™ ํ• ์•„๋ฒ„์ง€๋Š” ์ˆ˜์ฒœ ๊ฐœ์˜ ํ’์„ ์„ ๋งค๋‹ฌ์•„ ์ง‘์„ ํ†ต์งธ๋กœ ๋‚จ์•„๋ฉ”๋ฆฌ์นด๋กœ ๋‚ ๋ ค ๋ฒ„๋ฆฌ๋Š”๋ฐ, โ€˜์นผโ€™ ํ• ์•„๋ฒ„์ง€์˜ ์ด ์œ„๋Œ€ํ•œ ๋ชจํ—˜์— ์ดˆ๋Œ€ ๋ฐ›์ง€ ์•Š์€ ๋ถˆ์ฒญ๊ฐ์ด ์žˆ์—ˆ์œผ๋‹ˆ, ๋ฐ”๋กœ ํ™ฉ์•ผ์˜ ํƒํ—˜๊ฐ€ โ€˜๋Ÿฌ์…€โ€™ ์ง€๊ตฌ์ƒ์— ๋‘˜๋„ ์—†์„ ์ด ์–ด์ƒ‰ํ•œ ์ปคํ”Œ์ด ํ•จ๊ป˜ ํ•˜๋Š” ๋Œ€๋ชจํ—˜ ๊ทธ๋“ค์€ ๊ณผ์—ฐ ๋‚จ๋ฏธ์˜ ์žƒ์–ด๋ฒ„๋ฆฐ ์„ธ๊ณ„์—์„œ ์‚ฌ๋ผ์ ธ ๋ฒ„๋ฆฐ ๊ฟˆ๊ณผ ํฌ๋ง, ํ–‰๋ณต์„ ๋‹ค์‹œ ์ฐพ์„ ์ˆ˜ ์žˆ์„๊นŒ?', '์—ฌ๋Ÿฌ๋ถ„์€ ๋””์ฆˆ๋‹ˆ ์˜ํ™”๋ฅผ ์ข‹์•„ํ•˜์‹œ ๋‚˜์š”?', '์ €๋Š” ๋””์ฆˆ๋‹ˆ๋ณด๋‹ค๋Š” ํ”ฝ์‚ฌ๋ฅผ ํ›จ์”ฌ ๋” ์ข‹์•„ํ•˜๋Š” ํŽธ์ธ๋ฐ์š”', '๋””์ฆˆ๋‹ˆ์™€ ํ”ฝ์‚ฌ๊ฐ€ ํ•ฉ๋ณ‘ํ•œ ๋’ค, ์ €๋Š” ๋””์ฆˆ๋‹ˆ ํ”ฝ์‚ฌ ์˜ํ™”๊ฐ€ ์ธ์ƒ์˜ํ™” ์ค‘ ๋Œ€๋ถ€๋ถ„์„ ์ฐจ์ง€ํ•  ์ •๋„๋กœ ์ •๋ง ์ฆ๊ฒจ๋ณด๊ณ  ์žˆ์–ด์š”'],
    ['๋™์˜์ƒ ์ดฌ์˜์ด ๊ธˆ์ง€๋˜์–ด์žˆ์–ด ๋…ธํ™์ฒ  ์”จ์˜ ์—ด์ • ๋„˜์น˜๋Š” ๊ฐ•์—ฐ์„ ๊ทธ๋Œ€๋กœ ๋ณด์—ฌ ๋“œ๋ฆฌ์ง€ ๋ชปํ•˜๋Š” ์  ๋„ˆ๋ฌด ์•„์‰ฝ๋„ค์š” ใ… ใ… ', '๊ฐ„๋‹จํ•œ ํ–‰์‚ฌ์Šค์ผ€์น˜๋กœ๋‚˜๋งˆ ์ฐธ๊ณ ํ•ด์ฃผ์„ธ์š”~', '๋…ธํ™์ฒ ์˜ ์—ด์ • Talk ํ–‰์‚ฌ๋Š” ๊ฐœ๊ทธ๋งจ ๊น€๋ฒ”์šฉ ์”จ๊ฐ€ ๋งก์•„์ฃผ์…จ๊ณ  ์˜คํ”„๋‹ ๋ฌด๋Œ€๋Š” ์œ„๋Œ€ํ•œ ํƒ„์ƒ3 ํƒ‘3๋กœ ์ด๋ฆ„์„ ๋‚ ๋ฆฐ ์˜ค๋ณ‘๊ธธ ์”จ์˜ ๋…ธ๋ž˜๋กœ ๋œจ๊ฒ๊ฒŒ ๋‹ฌ๊ถˆ์กŒ์Šต๋‹ˆ๋‹ค^^', '์ด๋‚  ์ดˆ๋Œ€๋œ ๋กœ์—ด๋ธ”๋ฃจ์™€ ๋ธ”๋ฃจ ๋ฉค๋ฒ„์‹ญ ๊ณ ๊ฐ๋ถ„๋“ค์˜ ํ™˜ํ˜ธ๋กœ ์‚ผ์„ฑํ™๋ณด๊ด€ ๋”œ๋ผ์ดํŠธ ์•ˆ์ด ๊ฐ€๋“ ์ฐจ๋”๊ตฐ์š”!', '(์˜ค๋ณ‘๊ธธ ์”จ์˜ ๋…ธ๋ž˜ ์ž˜ํ•˜๋Š” ๋น„๋ฒ•์€ ๋ฌด์—‡์ผ๊นŒ์š”? ๊พธ์ค€ํ•œ ๋ชจ์ฐฝ์—ฐ์Šต์ด๋ผ๊ณ โ€ฆ ใ…‹) ๊ณง์ด์–ด ์ด ๋‚  ํ–‰์‚ฌ์˜ ๋ฉ”์ธ์ด์—ˆ๋˜ ๋…ธํ™์ฒ ์”จ์˜ ์—ด์ • Talk๊ฐ€ ๋ณธ๊ฒฉ์ ์œผ๋กœ ์‹œ์ž‘๋˜์—ˆ์Šต๋‹ˆ๋‹ค']]

use_heuristic (bool)

Kss is an open-ended sentence segmentation toolkit: it can segment anywhere in the input text, even where there are no punctuation marks. If you instead want punctuation-only segmentation, which splits only at punctuation marks, you can change the behavior with this parameter.

This parameter indicates whether to use the heuristic algorithm for open-ended sentence segmentation. If you set it to True, Kss conducts open-ended segmentation. If you set it to False, Kss conducts punctuation-only segmentation. I recommend setting it to False if the input texts follow punctuation rules relatively well, because Kss can sometimes make mistakes in parts without punctuation marks.

  • Formal articles (wiki, news, essays): False is recommended
  • Informal articles (SNS, blogs, messages): True is recommended

As shown in the performance analysis, setting this option to False lowers the segmentation error rate. However, it also makes Kss less sensitive: if your input texts have relatively few punctuation marks, as in messages or blog articles, Kss cannot split most of the sentences. Therefore, adjust it according to the type of input text.
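
To see why punctuation-only segmentation struggles with informal text, consider a rough regex baseline. This is a minimal sketch under the assumption that "punctuation-only" means splitting after `.`, `!` or `?`; it is not how Kss implements use_heuristic=False, and `split_on_punctuation` is a name invented here.

```python
import re
from typing import List


def split_on_punctuation(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Well-punctuated text splits cleanly.
print(split_on_punctuation("It rained. Did you see? Yes!"))
# Text without punctuation comes back as one piece.
print(split_on_punctuation("no marks here so one piece"))
```

The second call returns a single element, which is the loss of sensitivity described above for messages and blog posts.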

  • An example of use_heuristic

    >>> from kss import split_sentences
      
    >>> text = "์›์–ด๋ฏผ๋„ ํ”ํ•˜๊ฒŒ ํ‹€๋ฆฌ๋Š” ๋ฌธ๋ฒ•์˜ค๋ฅ˜๋Š” ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์ž˜๋ชป๋œ ์‚ฌ์šฉํ•˜๋Š”๊ฑฐ์˜ˆ์š” ์งˆ๋ฌธ: ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์™œ ์“ฐ๋‚˜์š”? ๋Œ€๋‹ต: ๋‘ ๊ฐ€์ง€ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•ด์š” ์˜ˆ๋ฅผ ๋“ค์–ด์„œ do not = don't not์˜ o๋ฅผ ์ƒ๋žตํ•œ๊ฑธ apostrophe๊ฐ€ ๋ณด์—ฌ์ฃผ๋Š”๊ฑฐ์˜ˆ์š” ๋˜ ๋‹ค๋ฅธ ์˜ˆ๋ฅผ ๋“ค๋ฉด we are = we're are์˜ a๋ฅผ ์ƒ๋žตํ–ˆ์ฃ  ์ƒ๋žต๋œ ํ‘œํ˜„์— ์•„ํฌ์ŠคํŠธ๋กœํ”ผ๋ฅผ ์ž์ฃผ ์‚ฌ์šฉํ•ด์š”. ์ด์ œ ์•„์‹œ๊ฒ ์ฃ ?"
    >>> split_sentences(text, use_heuristic=True)  # can segment without punctuations
    ['์›์–ด๋ฏผ๋„ ํ”ํ•˜๊ฒŒ ํ‹€๋ฆฌ๋Š” ๋ฌธ๋ฒ•์˜ค๋ฅ˜๋Š” ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์ž˜๋ชป๋œ ์‚ฌ์šฉํ•˜๋Š”๊ฑฐ์˜ˆ์š”', '์งˆ๋ฌธ: ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์™œ ์“ฐ๋‚˜์š”?', '๋Œ€๋‹ต: ๋‘ ๊ฐ€์ง€ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•ด์š”', "์˜ˆ๋ฅผ ๋“ค์–ด์„œ do not = don't not์˜ o๋ฅผ ์ƒ๋žตํ•œ๊ฑธ apostrophe๊ฐ€ ๋ณด์—ฌ์ฃผ๋Š”๊ฑฐ์˜ˆ์š”", "๋˜ ๋‹ค๋ฅธ ์˜ˆ๋ฅผ ๋“ค๋ฉด we are = we're are์˜ a๋ฅผ ์ƒ๋žตํ–ˆ์ฃ ", '์ƒ๋žต๋œ ํ‘œํ˜„์— ์•„ํฌ์ŠคํŠธ๋กœํ”ผ๋ฅผ ์ž์ฃผ ์‚ฌ์šฉํ•ด์š”.', '์ด์ œ ์•„์‹œ๊ฒ ์ฃ ?']
    
    >>> split_sentences(text, use_heuristic=False)  # can't segment without punctuation
    ['์›์–ด๋ฏผ๋„ ํ”ํ•˜๊ฒŒ ํ‹€๋ฆฌ๋Š” ๋ฌธ๋ฒ•์˜ค๋ฅ˜๋Š” ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์ž˜๋ชป๋œ ์‚ฌ์šฉํ•˜๋Š”๊ฑฐ์˜ˆ์š” ์งˆ๋ฌธ: ์•„ํฌ์ŠคํŠธ๋กœํ”ผ(apostrophe)๋ฅผ ์™œ ์“ฐ๋‚˜์š”?', "๋Œ€๋‹ต: ๋‘ ๊ฐ€์ง€ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•ด์š” ์˜ˆ๋ฅผ ๋“ค์–ด์„œ do not = don't not์˜ o๋ฅผ ์ƒ๋žตํ•œ๊ฑธ apostrophe๊ฐ€ ๋ณด์—ฌ์ฃผ๋Š”๊ฑฐ์˜ˆ์š” ๋˜ ๋‹ค๋ฅธ ์˜ˆ๋ฅผ ๋“ค๋ฉด we are = we're are์˜ a๋ฅผ ์ƒ๋žตํ–ˆ์ฃ  ์ƒ๋žต๋œ ํ‘œํ˜„์— ์•„ํฌ์ŠคํŠธ๋กœํ”ผ๋ฅผ ์ž์ฃผ ์‚ฌ์šฉํ•ด์š”.", '์ด์ œ ์•„์‹œ๊ฒ ์ฃ ?']

use_quotes_brackets_processing (bool)

Kss can avoid segmenting inside parts enclosed in brackets (괄호) and quotation marks (따옴표). This parameter controls that protection. If you set it to True, Kss does not segment these parts. If you set it to False, Kss segments even inside brackets and quotation marks. The default is False. (I set it to False because the feature is too slow. Set it to True if you need it.)
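
The idea of protecting enclosed parts can be sketched with a small stack that tracks open quotes and brackets. This is an illustrative toy that splits only at `.`, `!` and `?`; it is not Kss's algorithm, and `split_outside_enclosures` and `CLOSER` are hypothetical names.

```python
from typing import List

# Opening character -> the character that closes it (illustrative subset).
CLOSER = {"(": ")", "[": "]", '"': '"'}


def split_outside_enclosures(text: str) -> List[str]:
    """Split at sentence-final punctuation, but never inside quotes or brackets."""
    sentences, current, stack = [], [], []
    for ch in text:
        current.append(ch)
        if stack and ch == stack[-1]:
            stack.pop()                  # closes the innermost open quote/bracket
        elif ch in CLOSER:
            stack.append(CLOSER[ch])     # opens a new quote/bracket
        elif ch in ".!?" and not stack:
            sentences.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_outside_enclosures('"I am full. Really." Then he left. Maybe.'))
```

Note that in this toy an unmatched opening quote keeps the stack non-empty, so the rest of the text is never split.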

  • An example of use_quotes_brackets_processing

    >>> from kss import split_sentences
      
    >>> text = '"나는 이제 더는 못 먹겠다. 너무 배불러." 그리고 곧장 자리를 떴다. 아마도 화장실에 간 모양이다.'
    >>> split_sentences(text, use_quotes_brackets_processing=True)
    ['"나는 이제 더는 못 먹겠다. 너무 배불러." 그리고 곧장 자리를 떴다.', '아마도 화장실에 간 모양이다.']

    >>> split_sentences(text, use_quotes_brackets_processing=False)
    ['"나는 이제 더는 못 먹겠다.', '너무 배불러.', '" 그리고 곧장 자리를 떴다.', '아마도 화장실에 간 모양이다.']

max_recover_step & max_recover_length (int)

Kss 2.0 or later can segment sentences even if brackets or quotation marks are not properly paired. This was a chronic problem in the previous C++ version, Kss 1.0 (#4, #8), and it was fixed in 2.0 by a calibration feature for mismatched quotation marks and brackets. However, this feature uses a recursive algorithm with a poor time complexity of O(2^n), so it can be very slow in some cases. Therefore, Kss provides parameters to limit the recursive algorithm.

  • max_recover_step determines the depth of recursion. Kss never goes deeper than this when resolving quote and bracket mismatches.
  • max_recover_length determines the maximum length of a sentence to which calibration is applied. Kss does not calibrate sentences longer than this value, because calibrating long sentences takes a very long time.

P.S. Since Kss 3.0.2, memoization with an LRU cache has been used. This can improve performance by reusing the results of repeated segmentations.
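
How a depth cap, a length cap and an LRU cache work together can be shown with a toy bracket repair. This sketch is an assumption-laden stand-in, not the Kss calibration algorithm: it tries to balance brackets by deleting as few of them as possible, and `is_balanced`, `_search` and `calibrate` are names invented for the example. Only the two parameter names are borrowed from the text above.

```python
from functools import lru_cache
from typing import Optional

PAIRS = {"(": ")", "[": "]"}
BRACKETS = set(PAIRS) | set(PAIRS.values())


def is_balanced(text: str) -> bool:
    """Check that every bracket in `text` is properly paired and nested."""
    stack = []
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in BRACKETS:  # a closing bracket
            if not stack or stack.pop() != ch:
                return False
    return not stack


@lru_cache(maxsize=4096)  # memoize repeated (text, steps_left) subproblems
def _search(text: str, steps_left: int) -> Optional[str]:
    """Try deleting up to `steps_left` brackets; branches on every bracket."""
    if is_balanced(text):
        return text
    if steps_left == 0:
        return None
    for i, ch in enumerate(text):
        if ch in BRACKETS:
            fixed = _search(text[:i] + text[i + 1:], steps_left - 1)
            if fixed is not None:
                return fixed
    return None


def calibrate(text: str, max_recover_step: int = 5,
              max_recover_length: int = 20000) -> str:
    """Return a balanced variant of `text`, or `text` itself if limits are hit."""
    if len(text) > max_recover_length:
        return text  # too long: skip the expensive search entirely
    for depth in range(max_recover_step + 1):  # fewest deletions first
        fixed = _search(text, depth)
        if fixed is not None:
            return fixed
    return text


print(calibrate("(a [b) c"))
```

Because every level of recursion branches over all remaining brackets, the search grows exponentially with depth, which is why both caps matter in this toy.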

  • An example of max_recover_step

    >>> from kss import split_sentences
      
    >>> text = 'YOUR_VERY_LONG_TEXT'
    >>> split_sentences(text, max_recover_step=5)

  • An example of max_recover_length

    >>> from kss import split_sentences
      
    >>> text = 'YOUR_VERY_LONG_TEXT'
    >>> split_sentences(text, max_recover_length=20000)

backend (str)

Kss 3.0 or later supports morpheme analysis. This parameter indicates which morpheme analyzer is used during segmentation. If you set it to pynori or mecab, sentence segmentation is possible even at unspecified eomi (어미). In this case, Kss can segment sentences that use honorifics (경어), dialects (방언), neologisms (신조어), and noun-forming eomi (명사형 전성어미), and it handles parts that are difficult to resolve without morpheme information.

The following is a summary of the possible options.

  • pynori: Uses the Pynori analyzer. It works fine even without C++ installed, but it is very slow.
  • mecab: Uses the Mecab analyzer. It only works in environments where C++ is installed, but it is much faster than Pynori.

Kss uses Pynori, a pure Python morpheme analyzer, by default. However, you can change it to Mecab-Ko, a very fast morpheme analyzer based on C++. The segmentation results of the two analyzers are almost the same because they were developed from the same dictionary, mecab-ko-dic. However, since there is a large difference in speed, we strongly recommend the mecab backend if you can install mecab-ko in your environment. (I didn't set Mecab-Ko as the default because I value compatibility over speed. If installing mecab is difficult, check this guide.)
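
If your code runs in several environments, you may want to pick the backend string at runtime. The helper below is a hedged sketch: `pick_backend` is a hypothetical name, and the module name `mecab` is an assumption about how the Mecab binding is importable in your environment, so adjust it to whatever binding you installed.

```python
import importlib.util


def pick_backend(mecab_module: str = "mecab") -> str:
    """Return "mecab" if `mecab_module` is importable, otherwise "pynori"."""
    found = importlib.util.find_spec(mecab_module) is not None
    return "mecab" if found else "pynori"


# The returned string could then be passed as split_sentences(..., backend=...).
print(pick_backend())
```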

  • An example of backend

    >>> from kss import split_sentences
      
    >>> text = "부디 만수무강 하옵소서 천천히 가세용~ 너 밥을 먹는구나 응 맞아 난 근데 어제 이사했음 그랬구나 이제 마지막임 응응"

    >>> split_sentences(text, backend="pynori")
    ['부디 만수무강 하옵소서', '천천히 가세용~', '너 밥을 먹는구나', '응 맞아 난 근데 어제 이사했음', '그랬구나 이제 마지막임', '응응']

    >>> split_sentences(text, backend="mecab")
    ['부디 만수무강 하옵소서', '천천히 가세용~', '너 밥을 먹는구나', '응 맞아 난 근데 어제 이사했음', '그랬구나 이제 마지막임', '응응']

num_workers (int)

Kss 3.0 or later supports multiprocessing, so multiple sentences can be segmented at the same time. This parameter indicates the number of workers to use for multiprocessing. If you set it to 1 or 0, multiprocessing is disabled. If you set it to -1, Kss uses as many workers as possible. For any other value, that number of workers is allocated.

As shown in the performance evaluation, multiprocessing can have a very large effect on speed. It makes segmentation much faster, especially when using the Pynori backend.
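
The rules for num_workers can be written down as a small function. This is a sketch of the mapping described above, with two assumptions that the README does not state: "as many workers as possible" is taken to mean the CPU count, and values below -1 are rejected. `resolve_workers` is a hypothetical name.

```python
import os


def resolve_workers(num_workers: int = -1) -> int:
    """Map the num_workers convention to an actual worker count."""
    if num_workers in (0, 1):
        return 1                      # multiprocessing disabled
    if num_workers == -1:
        return os.cpu_count() or 1    # assumption: "maximum" means CPU count
    if num_workers < -1:
        # Not specified by the README; this sketch simply rejects such values.
        raise ValueError("num_workers must be -1 or a non-negative integer")
    return num_workers


print(resolve_workers(4))
```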

  • An example of num_workers

    >>> from kss import split_sentences
    
    >>> split_sentences(some_text, num_workers=1)  # disable multiprocessing
    >>> split_sentences(some_text, num_workers=-1)  # use maximum workers as many as possible
    >>> split_sentences(some_text, num_workers=4)  # use 4 workers

disable_gc (bool)

This parameter indicates whether to disable garbage collection during sentence segmentation. The Pynori analyzer is built on a data structure called a trie. Since it uses a recursive algorithm, it often wastes a lot of memory, which leads to frequent garbage collection. If you set this parameter to True, segmentation speed can improve because garbage collection is disabled. Of course, when the segmentation process ends, garbage collection is re-enabled.
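
The disable-then-restore pattern described here can be sketched with the standard `gc` module. This is a minimal illustration of the idea, not Kss's internal code; `gc_paused` is a name invented for the example.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused(disable: bool = True):
    """Turn the garbage collector off inside the block, then restore it."""
    was_enabled = gc.isenabled()
    if disable:
        gc.disable()
    try:
        yield
    finally:
        # Re-enable only if we switched it off and it was on beforehand.
        if disable and was_enabled:
            gc.enable()


with gc_paused():
    # Allocation-heavy work would go here.
    data = [str(i) for i in range(1000)]
print(gc.isenabled())
```

Restoring in a `finally` clause means the collector comes back even if the work inside the block raises an exception.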

  • An example of disable_gc

    >>> from kss import split_sentences
    
    >>> split_sentences(some_text, disable_gc=True)  # disable garbage collection
    >>> split_sentences(some_text, disable_gc=False)  # enable garbage collection

2.2. split_chunks

split_chunks is used when you want to segment input text into paragraphs rather than sentences. This function performs the following two steps:

  1. Split sentences using split_sentences.
  2. Construct a paragraph by concatenating the segmented sentences up to the maximum length entered by the user.

Note that this function segments input text into paragraphs based only on length, not on content. It also supports window-level chunking through the overlap option. Click the triangle button (►) for more detailed information and example code snippets for each parameter.
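
The second step above, packing already-split sentences into length-bounded paragraphs, can be sketched as a greedy loop. This is an illustration under stated assumptions, not the actual split_chunks implementation: sentences are joined with a single space, a sentence longer than max_length becomes its own chunk, and the overlap option is not modeled. `pack_chunks` is a hypothetical name.

```python
from typing import List


def pack_chunks(sentences: List[str], max_length: int) -> List[str]:
    """Greedily join sentences into chunks of at most `max_length` characters."""
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_length:
            chunks.append(current)   # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(pack_chunks(["aa", "bb", "cc"], max_length=5))
```

Because the decision uses only character counts, chunk boundaries fall wherever the length budget runs out, regardless of topic.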

>>> from kss import split_chunks

>>> split_chunks(
...     text: Union[str, List[str], tuple],
...     max_length: int,
...     overlap: bool = False,
...     **kwargs,
... )

text (Union[str, tuple, List[str]])

This parameter is the input text. Besides a single string, you can pass a list or tuple of strings for batch processing.

  • An example of single text segmentation
>>> from kss import split_chunks

>>> text = """๊ฐ•๋‚จ์—ญ ๋ง›์ง‘์œผ๋กœ ์†Œ๋ฌธ๋‚œ ๊ฐ•๋‚จ ํ† ๋ผ์ •์— ๋‹ค๋…€์™”์Šต๋‹ˆ๋‹ค. ํšŒ์‚ฌ ๋™๋ฃŒ ๋ถ„๋“ค๊ณผ ๋‹ค๋…€์™”๋Š”๋ฐ ๋ถ„์œ„๊ธฐ๋„ ์ข‹๊ณ  ์Œ์‹๋„ ๋ง›์žˆ์—ˆ์–ด์š” ๋‹ค๋งŒ, ๊ฐ•๋‚จ ํ† ๋ผ์ •์ด ๊ฐ•๋‚จ ์‰‘์‰‘๋ฒ„๊ฑฐ ๊ณจ๋ชฉ๊ธธ๋กœ ์ญ‰ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋Š”๋ฐ ๋‹ค๋“ค ์‰‘์‰‘๋ฒ„๊ฑฐ์˜ ์œ ํ˜น์— ๋„˜์–ด๊ฐˆ ๋ป” ํ–ˆ๋‹ต๋‹ˆ๋‹ค ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ํ† ๋ผ์ •์˜ ์™ธ๋ถ€ ๋ชจ์Šต. ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ 4์ธต ๊ฑด๋ฌผ ๋…์ฑ„๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.', '์—ญ์‹œ ํ† ๋ผ์ • ๋ณธ ์  ๋‹ต์ฃ ?ใ…Žใ……ใ…Ž ๊ฑด๋ฌผ์€ ํฌ์ง€๋งŒ ๊ฐ„ํŒ์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ์ง€๋‚˜์น  ์ˆ˜ ์žˆ์œผ๋‹ˆ ์กฐ์‹ฌํ•˜์„ธ์š” ๊ฐ•๋‚จ ํ† ๋ผ์ •์˜ ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด. ํ‰์ผ ์ €๋…์ด์—ˆ์ง€๋งŒ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ๋‹ต๊ฒŒ ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์•˜์–ด์š”. ์ „์ฒด์ ์œผ๋กœ ํŽธ์•ˆํ•˜๊ณ  ์•„๋Š‘ํ•œ ๊ณต๊ฐ„์œผ๋กœ ๊พธ๋ฉฐ์ ธ ์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ํ•œ ๊ฐ€์ง€ ์•„์‰ฌ์› ๋˜ ๊ฑด ์กฐ๋ช…์ด ๋„ˆ๋ฌด ์–ด๋‘์›Œ ๋ˆˆ์ด ์นจ์นจํ–ˆ๋˜โ€ฆ ์ €ํฌ๋Š” 3์ธต์— ์ž๋ฆฌ๋ฅผ ์žก๊ณ  ์Œ์‹์„ ์ฃผ๋ฌธํ–ˆ์Šต๋‹ˆ๋‹ค.', '์ด 5๋ช…์ด์„œ ๋จน๊ณ  ์‹ถ์€ ์Œ์‹ ํ•˜๋‚˜์”ฉ ๊ณจ๋ผ ๋‹ค์–‘ํ•˜๊ฒŒ ์ฃผ๋ฌธํ–ˆ์–ด์š” ์ฒซ ๋ฒˆ์งธ ์ค€๋น„๋œ ๋ฉ”๋‰ด๋Š” ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€์™€ ๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋ฅผ ๋“ฌ๋ฟ ์˜ฌ๋ ค ๋จน๋Š” ๋ง›์žˆ๋Š” ๋ฐฅ์ž…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฉ”๋‰ด๋ฅผ ํ•œ ๋ฒˆ์— ์‹œํ‚ค๋ฉด ์ค€๋น„๋˜๋Š” ๋ฉ”๋‰ด๋ถ€ํ„ฐ ๊ฐ€์ ธ๋‹ค ์ฃผ๋”๋ผ๊ตฌ์š”. ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€ ๊ธˆ๋ฐฉ ํŠ€๊ฒจ์ ธ ๋‚˜์™€ ๊ฒ‰์€ ๋ฐ”์‚ญํ•˜๊ณ  ์†์€ ์ด‰์ด‰ํ•ด ๋ง›์žˆ์—ˆ์–ด์š”!', '๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋Š” ๋ถˆ๊ณ ๊ธฐ, ์–‘๋ฐฐ์ถ”, ๋ฒ„์„ฏ์„ ๋ณถ์•„ ๊นป์žŽ์„ ๋“ฌ๋ฟ ์˜ฌ๋ฆฌ๊ณ  ์šฐ์—‰ ํŠ€๊น€์„ ๊ณ๋“ค์—ฌ ๋ฐฅ์ด๋ž‘ ํ•จ๊ป˜ ๋จน๋Š” ๋ฉ”๋‰ด์ž…๋‹ˆ๋‹ค. ์‚ฌ์‹ค ์ „ ๊ณ ๊ธฐ๋ฅผ ์•ˆ ๋จน์–ด์„œ ๋ฌด์Šจ ๋ง›์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ.. ๋‹ค๋“ค ์—„์ฒญ ์ž˜ ๋“œ์…จ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ ์ด๊ฑด ์ œ๊ฐ€ ์‹œํ‚จ ์ด‰์ด‰ํ•œ ๊ณ ๋กœ์ผ€์™€ ํฌ๋ฆผ์ŠคํŠœ์šฐ๋™. ๊ฐ•๋‚จ ํ† ๋ผ์ •์—์„œ ๋จน์€ ์Œ์‹ ์ค‘์— ์ด๊ฒŒ ์ œ์ผ ๋ง›์žˆ์—ˆ์–ด์š”!!! 
ํฌ๋ฆผ์†Œ์Šค๋ฅผ ์›๋ž˜ ์ข‹์•„ํ•˜๊ธฐ๋„ ํ•˜์ง€๋งŒ, ๋Š๋ผํ•˜์ง€ ์•Š๊ฒŒ ๋ถ€๋“œ๋Ÿฝ๊ณ  ๋‹ฌ๋‹ฌํ•œ ์ŠคํŠœ์™€ ์ซ„๊นƒํ•œ ์šฐ๋™๋ฉด์ด ๋„ˆ๋ฌด ์ž˜ ์–ด์šธ๋ ค ๊ณ„์† ์†์ด ๊ฐ€๋”๋ผ๊ตฌ์š”.', '์‚ฌ์ง„์„ ๋ณด๋‹ˆ ๋˜ ๋จน๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค ๊ฐ„์‚ฌ์ด ํ’ ์—ฐ์–ด ์ง€๋ผ์‹œ์ž…๋‹ˆ๋‹ค. ์ผ๋ณธ ๊ฐ„์‚ฌ์ด ์ง€๋ฐฉ์—์„œ ๋งŽ์ด ๋จน๋Š” ๋– ๋จน๋Š” ์ดˆ๋ฐฅ(์ง€๋ผ์‹œ์Šค์‹œ)์ด๋ผ๊ณ  ํ•˜๋„ค์š”. ๋ฐ‘์— ์™€์‚ฌ๋น„ ๋งˆ์š”๋ฐฅ ์œ„์— ์—ฐ์–ด๋“ค์ด ๋‹ด๊ฒจ์ ธ ์žˆ์–ด ์ฝ”๋์ด ์ฐกํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ ํ˜€ ์žˆ๋Š”๋ฐ, ๋‚œ ์™€์‚ฌ๋น„ ๋ง› 1๋„ ๋ชจ๋ฅด๊ฒ ๋˜๋ฐโ€ฆ? ์™€์‚ฌ๋น„๋ฅผ ์•ˆ ์ข‹์•„ํ•˜๋Š” ์ €๋Š” ๋ถˆํ–‰์ธ์ง€ ๋‹คํ–‰์ธ์ง€ ์—ฐ์–ด ์ง€๋ผ์‹œ๋ฅผ ๋งค์šฐ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹', '๋‹ค์Œ ๋ฉ”๋‰ด๋Š” ๋‹ฌ์ง์ง€๊ทผํ•œ ์ˆฏ๋ถˆ ๊ฐˆ๋น„ ๋ฎ๋ฐฅ์ž…๋‹ˆ๋‹ค! ๊ฐ„์žฅ ์–‘๋…์— ๊ตฌ์šด ์ˆฏ๋ถˆ ๊ฐˆ๋น„์— ์–‘ํŒŒ, ๊นป์žŽ, ๋‹ฌ๊ฑ€ ๋ฐ˜์ˆ™์„ ํ„ฐํŠธ๋ ค ๋น„๋ฒผ ๋จน์œผ๋ฉด ๊ทธ ๋ง›์ด ํฌ.. (๋ฌผ๋ก  ์ „ ์•ˆ ๋จน์—ˆ์ง€๋งŒโ€ฆ๋‹ค๋ฅธ ๋ถ„๋“ค์ด ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋”๋ผ๊ตฌ์š”ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๋งˆ์ง€๋ง‰ ๋ฉ”์ธ ๋ฉ”๋‰ด ์–‘์†ก์ด ํฌ๋ฆผ์ˆ˜ํ”„์™€ ์ˆฏ๋ถˆ๋–ก๊ฐˆ๋น„ ๋ฐฅ์ž…๋‹ˆ๋‹ค. ํฌ๋ฆผ๋ฆฌ์กฐ๋˜๋ฅผ ๋ฒ ์ด์Šค๋กœ ์œ„์— ๊ทธ๋ฃจํ†ต๊ณผ ์ˆฏ๋ถˆ๋กœ ๊ตฌ์šด ๋–ก๊ฐˆ๋น„๊ฐ€ ์˜ฌ๋ผ๊ฐ€ ์žˆ์–ด์š”!', 'ํฌ๋ฆผ์ŠคํŠœ ์šฐ๋™ ๋งŒํผ์ด๋‚˜ ๋Œ€๋ฐ• ๋ง›์žˆ์Šต๋‹ˆ๋‹คโ€ฆใ… ใ… ใ… ใ… ใ… ใ…  (ํฌ๋ฆผ ์†Œ์Šค๋ฉด ๋‹ค ์ข‹์•„ํ•˜๋Š” ๊ฑฐ ์ ˆ๋Œ€ ์•„๋‹™๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๊ฐ•๋‚จ ํ† ๋ผ์ • ์š”๋ฆฌ๋Š” ๋‹ค ๋ง›์žˆ์ง€๋งŒ ํฌ๋ฆผ์†Œ์Šค ์š”๋ฆฌ๋ฅผ ์ฐธ ์ž˜ํ•˜๋Š” ๊ฑฐ ๊ฐ™๋„ค์š” ์š”๊ฑด ๋ฌผ๋งŒ ๋งˆ์‹œ๊ธฐ ์•„์‰ฌ์›Œ ์‹œํ‚จ ๋‰ด์ž๋ชฝ๊ณผ ๋ฐ€ํ‚ค์†Œ๋‹ค ๋”ธ๊ธฐํ†ตํ†ต! ์œ ์ž์™€ ์ž๋ชฝ์˜ ๋ง›์„ ํ•จ๊ป˜ ๋Š๋‚„ ์ˆ˜ ์žˆ๋Š” ๋‰ด์ž๋ชฝ์€ ์ƒํผํ•จ ๊ทธ ์ž์ฒด์˜€์–ด์š”.', 'ํ•˜์น˜๋งŒ ์ €๋Š” ๋”ธ๊ธฐํ†ตํ†ต ๋ฐ€ํ‚ค์†Œ๋‹ค๊ฐ€ ๋” ๋ง›์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ๋ฐ€ํ‚ค์†Œ๋‹ค๋Š” ํ† ๋ผ์ •์—์„œ๋งŒ ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ฉ”๋‰ด๋ผ๊ณ  ํ•˜๋‹ˆ ํ•œ ๋ฒˆ ๋“œ์…”๋ณด์‹œ๊ธธ ์ถ”์ฒœํ• ๊ฒŒ์š”!! ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘๋‹ต๊ฒŒ ๋ชจ๋“  ์Œ์‹๋“ค์ด ๋Œ€์ฒด์ ์œผ๋กœ ๋ง›์žˆ์—ˆ์–ด์š”! 
๊ฑด๋ฌผ ์œ„์น˜๋„ ๊ฐ•๋‚จ ๋Œ€๋กœ๋ณ€์—์„œ ์กฐ๊ธˆ ๋–จ์–ด์ ธ ์žˆ์–ด ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด์ฒ˜๋Ÿผ ์•„๋Š‘ํ•œ ๋Š๋‚Œ๋„ ์žˆ์—ˆ๊ตฌ์š”ใ…Žใ…Ž', '๊ธฐํšŒ๊ฐ€ ๋˜๋ฉด ๋‹ค๋“ค ๊ผญ ๋“ค๋Ÿฌ๋ณด์„ธ์š”~ ๐Ÿ™‚"""
>>> split_chunks(text, max_length=128)
['๊ฐ•๋‚จ์—ญ ๋ง›์ง‘์œผ๋กœ ์†Œ๋ฌธ๋‚œ ๊ฐ•๋‚จ ํ† ๋ผ์ •์— ๋‹ค๋…€์™”์Šต๋‹ˆ๋‹ค. ํšŒ์‚ฌ ๋™๋ฃŒ ๋ถ„๋“ค๊ณผ ๋‹ค๋…€์™”๋Š”๋ฐ ๋ถ„์œ„๊ธฐ๋„ ์ข‹๊ณ  ์Œ์‹๋„ ๋ง›์žˆ์—ˆ์–ด์š” ๋‹ค๋งŒ, ๊ฐ•๋‚จ ํ† ๋ผ์ •์ด ๊ฐ•๋‚จ ์‰‘์‰‘๋ฒ„๊ฑฐ ๊ณจ๋ชฉ๊ธธ๋กœ ์ญ‰ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋Š”๋ฐ ๋‹ค๋“ค ์‰‘์‰‘๋ฒ„๊ฑฐ์˜ ์œ ํ˜น์— ๋„˜์–ด๊ฐˆ ๋ป” ํ–ˆ๋‹ต๋‹ˆ๋‹ค ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ํ† ๋ผ์ •์˜ ์™ธ๋ถ€ ๋ชจ์Šต. ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ 4์ธต ๊ฑด๋ฌผ ๋…์ฑ„๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.', '์—ญ์‹œ ํ† ๋ผ์ • ๋ณธ ์  ๋‹ต์ฃ ?ใ…Žใ……ใ…Ž ๊ฑด๋ฌผ์€ ํฌ์ง€๋งŒ ๊ฐ„ํŒ์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ์ง€๋‚˜์น  ์ˆ˜ ์žˆ์œผ๋‹ˆ ์กฐ์‹ฌํ•˜์„ธ์š” ๊ฐ•๋‚จ ํ† ๋ผ์ •์˜ ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด. ํ‰์ผ ์ €๋…์ด์—ˆ์ง€๋งŒ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ๋‹ต๊ฒŒ ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์•˜์–ด์š”. ์ „์ฒด์ ์œผ๋กœ ํŽธ์•ˆํ•˜๊ณ  ์•„๋Š‘ํ•œ ๊ณต๊ฐ„์œผ๋กœ ๊พธ๋ฉฐ์ ธ ์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ํ•œ ๊ฐ€์ง€ ์•„์‰ฌ์› ๋˜ ๊ฑด ์กฐ๋ช…์ด ๋„ˆ๋ฌด ์–ด๋‘์›Œ ๋ˆˆ์ด ์นจ์นจํ–ˆ๋˜โ€ฆ ์ €ํฌ๋Š” 3์ธต์— ์ž๋ฆฌ๋ฅผ ์žก๊ณ  ์Œ์‹์„ ์ฃผ๋ฌธํ–ˆ์Šต๋‹ˆ๋‹ค.', '์ด 5๋ช…์ด์„œ ๋จน๊ณ  ์‹ถ์€ ์Œ์‹ ํ•˜๋‚˜์”ฉ ๊ณจ๋ผ ๋‹ค์–‘ํ•˜๊ฒŒ ์ฃผ๋ฌธํ–ˆ์–ด์š” ์ฒซ ๋ฒˆ์งธ ์ค€๋น„๋œ ๋ฉ”๋‰ด๋Š” ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€์™€ ๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋ฅผ ๋“ฌ๋ฟ ์˜ฌ๋ ค ๋จน๋Š” ๋ง›์žˆ๋Š” ๋ฐฅ์ž…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฉ”๋‰ด๋ฅผ ํ•œ ๋ฒˆ์— ์‹œํ‚ค๋ฉด ์ค€๋น„๋˜๋Š” ๋ฉ”๋‰ด๋ถ€ํ„ฐ ๊ฐ€์ ธ๋‹ค ์ฃผ๋”๋ผ๊ตฌ์š”. ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€ ๊ธˆ๋ฐฉ ํŠ€๊ฒจ์ ธ ๋‚˜์™€ ๊ฒ‰์€ ๋ฐ”์‚ญํ•˜๊ณ  ์†์€ ์ด‰์ด‰ํ•ด ๋ง›์žˆ์—ˆ์–ด์š”!', '๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋Š” ๋ถˆ๊ณ ๊ธฐ, ์–‘๋ฐฐ์ถ”, ๋ฒ„์„ฏ์„ ๋ณถ์•„ ๊นป์žŽ์„ ๋“ฌ๋ฟ ์˜ฌ๋ฆฌ๊ณ  ์šฐ์—‰ ํŠ€๊น€์„ ๊ณ๋“ค์—ฌ ๋ฐฅ์ด๋ž‘ ํ•จ๊ป˜ ๋จน๋Š” ๋ฉ”๋‰ด์ž…๋‹ˆ๋‹ค. ์‚ฌ์‹ค ์ „ ๊ณ ๊ธฐ๋ฅผ ์•ˆ ๋จน์–ด์„œ ๋ฌด์Šจ ๋ง›์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ.. ๋‹ค๋“ค ์—„์ฒญ ์ž˜ ๋“œ์…จ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ ์ด๊ฑด ์ œ๊ฐ€ ์‹œํ‚จ ์ด‰์ด‰ํ•œ ๊ณ ๋กœ์ผ€์™€ ํฌ๋ฆผ์ŠคํŠœ์šฐ๋™. ๊ฐ•๋‚จ ํ† ๋ผ์ •์—์„œ ๋จน์€ ์Œ์‹ ์ค‘์— ์ด๊ฒŒ ์ œ์ผ ๋ง›์žˆ์—ˆ์–ด์š”!!! ํฌ๋ฆผ์†Œ์Šค๋ฅผ ์›๋ž˜ ์ข‹์•„ํ•˜๊ธฐ๋„ ํ•˜์ง€๋งŒ, ๋Š๋ผํ•˜์ง€ ์•Š๊ฒŒ ๋ถ€๋“œ๋Ÿฝ๊ณ  ๋‹ฌ๋‹ฌํ•œ ์ŠคํŠœ์™€ ์ซ„๊นƒํ•œ ์šฐ๋™๋ฉด์ด ๋„ˆ๋ฌด ์ž˜ ์–ด์šธ๋ ค ๊ณ„์† ์†์ด ๊ฐ€๋”๋ผ๊ตฌ์š”.', '์‚ฌ์ง„์„ ๋ณด๋‹ˆ ๋˜ ๋จน๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค ๊ฐ„์‚ฌ์ด ํ’ ์—ฐ์–ด ์ง€๋ผ์‹œ์ž…๋‹ˆ๋‹ค. 
์ผ๋ณธ ๊ฐ„์‚ฌ์ด ์ง€๋ฐฉ์—์„œ ๋งŽ์ด ๋จน๋Š” ๋– ๋จน๋Š” ์ดˆ๋ฐฅ(์ง€๋ผ์‹œ์Šค์‹œ)์ด๋ผ๊ณ  ํ•˜๋„ค์š”. ๋ฐ‘์— ์™€์‚ฌ๋น„ ๋งˆ์š”๋ฐฅ ์œ„์— ์—ฐ์–ด๋“ค์ด ๋‹ด๊ฒจ์ ธ ์žˆ์–ด ์ฝ”๋์ด ์ฐกํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ ํ˜€ ์žˆ๋Š”๋ฐ, ๋‚œ ์™€์‚ฌ๋น„ ๋ง› 1๋„ ๋ชจ๋ฅด๊ฒ ๋˜๋ฐโ€ฆ? ์™€์‚ฌ๋น„๋ฅผ ์•ˆ ์ข‹์•„ํ•˜๋Š” ์ €๋Š” ๋ถˆํ–‰์ธ์ง€ ๋‹คํ–‰์ธ์ง€ ์—ฐ์–ด ์ง€๋ผ์‹œ๋ฅผ ๋งค์šฐ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹', '๋‹ค์Œ ๋ฉ”๋‰ด๋Š” ๋‹ฌ์ง์ง€๊ทผํ•œ ์ˆฏ๋ถˆ ๊ฐˆ๋น„ ๋ฎ๋ฐฅ์ž…๋‹ˆ๋‹ค! ๊ฐ„์žฅ ์–‘๋…์— ๊ตฌ์šด ์ˆฏ๋ถˆ ๊ฐˆ๋น„์— ์–‘ํŒŒ, ๊นป์žŽ, ๋‹ฌ๊ฑ€ ๋ฐ˜์ˆ™์„ ํ„ฐํŠธ๋ ค ๋น„๋ฒผ ๋จน์œผ๋ฉด ๊ทธ ๋ง›์ด ํฌ.. (๋ฌผ๋ก  ์ „ ์•ˆ ๋จน์—ˆ์ง€๋งŒโ€ฆ๋‹ค๋ฅธ ๋ถ„๋“ค์ด ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋”๋ผ๊ตฌ์š”ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๋งˆ์ง€๋ง‰ ๋ฉ”์ธ ๋ฉ”๋‰ด ์–‘์†ก์ด ํฌ๋ฆผ์ˆ˜ํ”„์™€ ์ˆฏ๋ถˆ๋–ก๊ฐˆ๋น„ ๋ฐฅ์ž…๋‹ˆ๋‹ค. ํฌ๋ฆผ๋ฆฌ์กฐ๋˜๋ฅผ ๋ฒ ์ด์Šค๋กœ ์œ„์— ๊ทธ๋ฃจํ†ต๊ณผ ์ˆฏ๋ถˆ๋กœ ๊ตฌ์šด ๋–ก๊ฐˆ๋น„๊ฐ€ ์˜ฌ๋ผ๊ฐ€ ์žˆ์–ด์š”!', 'ํฌ๋ฆผ์ŠคํŠœ ์šฐ๋™ ๋งŒํผ์ด๋‚˜ ๋Œ€๋ฐ• ๋ง›์žˆ์Šต๋‹ˆ๋‹คโ€ฆใ… ใ… ใ… ใ… ใ… ใ…  (ํฌ๋ฆผ ์†Œ์Šค๋ฉด ๋‹ค ์ข‹์•„ํ•˜๋Š” ๊ฑฐ ์ ˆ๋Œ€ ์•„๋‹™๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๊ฐ•๋‚จ ํ† ๋ผ์ • ์š”๋ฆฌ๋Š” ๋‹ค ๋ง›์žˆ์ง€๋งŒ ํฌ๋ฆผ์†Œ์Šค ์š”๋ฆฌ๋ฅผ ์ฐธ ์ž˜ํ•˜๋Š” ๊ฑฐ ๊ฐ™๋„ค์š” ์š”๊ฑด ๋ฌผ๋งŒ ๋งˆ์‹œ๊ธฐ ์•„์‰ฌ์›Œ ์‹œํ‚จ ๋‰ด์ž๋ชฝ๊ณผ ๋ฐ€ํ‚ค์†Œ๋‹ค ๋”ธ๊ธฐํ†ตํ†ต! ์œ ์ž์™€ ์ž๋ชฝ์˜ ๋ง›์„ ํ•จ๊ป˜ ๋Š๋‚„ ์ˆ˜ ์žˆ๋Š” ๋‰ด์ž๋ชฝ์€ ์ƒํผํ•จ ๊ทธ ์ž์ฒด์˜€์–ด์š”.', 'ํ•˜์น˜๋งŒ ์ €๋Š” ๋”ธ๊ธฐํ†ตํ†ต ๋ฐ€ํ‚ค์†Œ๋‹ค๊ฐ€ ๋” ๋ง›์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ๋ฐ€ํ‚ค์†Œ๋‹ค๋Š” ํ† ๋ผ์ •์—์„œ๋งŒ ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ฉ”๋‰ด๋ผ๊ณ  ํ•˜๋‹ˆ ํ•œ ๋ฒˆ ๋“œ์…”๋ณด์‹œ๊ธธ ์ถ”์ฒœํ• ๊ฒŒ์š”!! ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘๋‹ต๊ฒŒ ๋ชจ๋“  ์Œ์‹๋“ค์ด ๋Œ€์ฒด์ ์œผ๋กœ ๋ง›์žˆ์—ˆ์–ด์š”! ๊ฑด๋ฌผ ์œ„์น˜๋„ ๊ฐ•๋‚จ ๋Œ€๋กœ๋ณ€์—์„œ ์กฐ๊ธˆ ๋–จ์–ด์ ธ ์žˆ์–ด ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด์ฒ˜๋Ÿผ ์•„๋Š‘ํ•œ ๋Š๋‚Œ๋„ ์žˆ์—ˆ๊ตฌ์š”ใ…Žใ…Ž', '๊ธฐํšŒ๊ฐ€ ๋˜๋ฉด ๋‹ค๋“ค ๊ผญ ๋“ค๋Ÿฌ๋ณด์„ธ์š”~ ๐Ÿ™‚']
  • An example of multiple texts batch segmentation
>>> from kss import split_chunks

>>> text1 = """๊ฐ•๋‚จ์—ญ ๋ง›์ง‘์œผ๋กœ ์†Œ๋ฌธ๋‚œ ๊ฐ•๋‚จ ํ† ๋ผ์ •์— ๋‹ค๋…€์™”์Šต๋‹ˆ๋‹ค. ํšŒ์‚ฌ ๋™๋ฃŒ ๋ถ„๋“ค๊ณผ ๋‹ค๋…€์™”๋Š”๋ฐ ๋ถ„์œ„๊ธฐ๋„ ์ข‹๊ณ  ์Œ์‹๋„ ๋ง›์žˆ์—ˆ์–ด์š” ๋‹ค๋งŒ, ๊ฐ•๋‚จ ํ† ๋ผ์ •์ด ๊ฐ•๋‚จ ์‰‘์‰‘๋ฒ„๊ฑฐ ๊ณจ๋ชฉ๊ธธ๋กœ ์ญ‰ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋Š”๋ฐ ๋‹ค๋“ค ์‰‘์‰‘๋ฒ„๊ฑฐ์˜ ์œ ํ˜น์— ๋„˜์–ด๊ฐˆ ๋ป” ํ–ˆ๋‹ต๋‹ˆ๋‹ค ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ํ† ๋ผ์ •์˜ ์™ธ๋ถ€ ๋ชจ์Šต. ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ 4์ธต ๊ฑด๋ฌผ ๋…์ฑ„๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.', '์—ญ์‹œ ํ† ๋ผ์ • ๋ณธ ์  ๋‹ต์ฃ ?ใ…Žใ……ใ…Ž ๊ฑด๋ฌผ์€ ํฌ์ง€๋งŒ ๊ฐ„ํŒ์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ์ง€๋‚˜์น  ์ˆ˜ ์žˆ์œผ๋‹ˆ ์กฐ์‹ฌํ•˜์„ธ์š” ๊ฐ•๋‚จ ํ† ๋ผ์ •์˜ ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด. ํ‰์ผ ์ €๋…์ด์—ˆ์ง€๋งŒ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ๋‹ต๊ฒŒ ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์•˜์–ด์š”. ์ „์ฒด์ ์œผ๋กœ ํŽธ์•ˆํ•˜๊ณ  ์•„๋Š‘ํ•œ ๊ณต๊ฐ„์œผ๋กœ ๊พธ๋ฉฐ์ ธ ์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ํ•œ ๊ฐ€์ง€ ์•„์‰ฌ์› ๋˜ ๊ฑด ์กฐ๋ช…์ด ๋„ˆ๋ฌด ์–ด๋‘์›Œ ๋ˆˆ์ด ์นจ์นจํ–ˆ๋˜โ€ฆ ์ €ํฌ๋Š” 3์ธต์— ์ž๋ฆฌ๋ฅผ ์žก๊ณ  ์Œ์‹์„ ์ฃผ๋ฌธํ–ˆ์Šต๋‹ˆ๋‹ค.', '์ด 5๋ช…์ด์„œ ๋จน๊ณ  ์‹ถ์€ ์Œ์‹ ํ•˜๋‚˜์”ฉ ๊ณจ๋ผ ๋‹ค์–‘ํ•˜๊ฒŒ ์ฃผ๋ฌธํ–ˆ์–ด์š” ์ฒซ ๋ฒˆ์งธ ์ค€๋น„๋œ ๋ฉ”๋‰ด๋Š” ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€์™€ ๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋ฅผ ๋“ฌ๋ฟ ์˜ฌ๋ ค ๋จน๋Š” ๋ง›์žˆ๋Š” ๋ฐฅ์ž…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฉ”๋‰ด๋ฅผ ํ•œ ๋ฒˆ์— ์‹œํ‚ค๋ฉด ์ค€๋น„๋˜๋Š” ๋ฉ”๋‰ด๋ถ€ํ„ฐ ๊ฐ€์ ธ๋‹ค ์ฃผ๋”๋ผ๊ตฌ์š”. ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€ ๊ธˆ๋ฐฉ ํŠ€๊ฒจ์ ธ ๋‚˜์™€ ๊ฒ‰์€ ๋ฐ”์‚ญํ•˜๊ณ  ์†์€ ์ด‰์ด‰ํ•ด ๋ง›์žˆ์—ˆ์–ด์š”!', '๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋Š” ๋ถˆ๊ณ ๊ธฐ, ์–‘๋ฐฐ์ถ”, ๋ฒ„์„ฏ์„ ๋ณถ์•„ ๊นป์žŽ์„ ๋“ฌ๋ฟ ์˜ฌ๋ฆฌ๊ณ  ์šฐ์—‰ ํŠ€๊น€์„ ๊ณ๋“ค์—ฌ ๋ฐฅ์ด๋ž‘ ํ•จ๊ป˜ ๋จน๋Š” ๋ฉ”๋‰ด์ž…๋‹ˆ๋‹ค. ์‚ฌ์‹ค ์ „ ๊ณ ๊ธฐ๋ฅผ ์•ˆ ๋จน์–ด์„œ ๋ฌด์Šจ ๋ง›์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ.. ๋‹ค๋“ค ์—„์ฒญ ์ž˜ ๋“œ์…จ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ ์ด๊ฑด ์ œ๊ฐ€ ์‹œํ‚จ ์ด‰์ด‰ํ•œ ๊ณ ๋กœ์ผ€์™€ ํฌ๋ฆผ์ŠคํŠœ์šฐ๋™. ๊ฐ•๋‚จ ํ† ๋ผ์ •์—์„œ ๋จน์€ ์Œ์‹ ์ค‘์— ์ด๊ฒŒ ์ œ์ผ ๋ง›์žˆ์—ˆ์–ด์š”!!! 
ํฌ๋ฆผ์†Œ์Šค๋ฅผ ์›๋ž˜ ์ข‹์•„ํ•˜๊ธฐ๋„ ํ•˜์ง€๋งŒ, ๋Š๋ผํ•˜์ง€ ์•Š๊ฒŒ ๋ถ€๋“œ๋Ÿฝ๊ณ  ๋‹ฌ๋‹ฌํ•œ ์ŠคํŠœ์™€ ์ซ„๊นƒํ•œ ์šฐ๋™๋ฉด์ด ๋„ˆ๋ฌด ์ž˜ ์–ด์šธ๋ ค ๊ณ„์† ์†์ด ๊ฐ€๋”๋ผ๊ตฌ์š”.', '์‚ฌ์ง„์„ ๋ณด๋‹ˆ ๋˜ ๋จน๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค ๊ฐ„์‚ฌ์ด ํ’ ์—ฐ์–ด ์ง€๋ผ์‹œ์ž…๋‹ˆ๋‹ค. ์ผ๋ณธ ๊ฐ„์‚ฌ์ด ์ง€๋ฐฉ์—์„œ ๋งŽ์ด ๋จน๋Š” ๋– ๋จน๋Š” ์ดˆ๋ฐฅ(์ง€๋ผ์‹œ์Šค์‹œ)์ด๋ผ๊ณ  ํ•˜๋„ค์š”. ๋ฐ‘์— ์™€์‚ฌ๋น„ ๋งˆ์š”๋ฐฅ ์œ„์— ์—ฐ์–ด๋“ค์ด ๋‹ด๊ฒจ์ ธ ์žˆ์–ด ์ฝ”๋์ด ์ฐกํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ ํ˜€ ์žˆ๋Š”๋ฐ, ๋‚œ ์™€์‚ฌ๋น„ ๋ง› 1๋„ ๋ชจ๋ฅด๊ฒ ๋˜๋ฐโ€ฆ? ์™€์‚ฌ๋น„๋ฅผ ์•ˆ ์ข‹์•„ํ•˜๋Š” ์ €๋Š” ๋ถˆํ–‰์ธ์ง€ ๋‹คํ–‰์ธ์ง€ ์—ฐ์–ด ์ง€๋ผ์‹œ๋ฅผ ๋งค์šฐ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹', '๋‹ค์Œ ๋ฉ”๋‰ด๋Š” ๋‹ฌ์ง์ง€๊ทผํ•œ ์ˆฏ๋ถˆ ๊ฐˆ๋น„ ๋ฎ๋ฐฅ์ž…๋‹ˆ๋‹ค! ๊ฐ„์žฅ ์–‘๋…์— ๊ตฌ์šด ์ˆฏ๋ถˆ ๊ฐˆ๋น„์— ์–‘ํŒŒ, ๊นป์žŽ, ๋‹ฌ๊ฑ€ ๋ฐ˜์ˆ™์„ ํ„ฐํŠธ๋ ค ๋น„๋ฒผ ๋จน์œผ๋ฉด ๊ทธ ๋ง›์ด ํฌ.. (๋ฌผ๋ก  ์ „ ์•ˆ ๋จน์—ˆ์ง€๋งŒโ€ฆ๋‹ค๋ฅธ ๋ถ„๋“ค์ด ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋”๋ผ๊ตฌ์š”ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๋งˆ์ง€๋ง‰ ๋ฉ”์ธ ๋ฉ”๋‰ด ์–‘์†ก์ด ํฌ๋ฆผ์ˆ˜ํ”„์™€ ์ˆฏ๋ถˆ๋–ก๊ฐˆ๋น„ ๋ฐฅ์ž…๋‹ˆ๋‹ค. ํฌ๋ฆผ๋ฆฌ์กฐ๋˜๋ฅผ ๋ฒ ์ด์Šค๋กœ ์œ„์— ๊ทธ๋ฃจํ†ต๊ณผ ์ˆฏ๋ถˆ๋กœ ๊ตฌ์šด ๋–ก๊ฐˆ๋น„๊ฐ€ ์˜ฌ๋ผ๊ฐ€ ์žˆ์–ด์š”!', 'ํฌ๋ฆผ์ŠคํŠœ ์šฐ๋™ ๋งŒํผ์ด๋‚˜ ๋Œ€๋ฐ• ๋ง›์žˆ์Šต๋‹ˆ๋‹คโ€ฆใ… ใ… ใ… ใ… ใ… ใ…  (ํฌ๋ฆผ ์†Œ์Šค๋ฉด ๋‹ค ์ข‹์•„ํ•˜๋Š” ๊ฑฐ ์ ˆ๋Œ€ ์•„๋‹™๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๊ฐ•๋‚จ ํ† ๋ผ์ • ์š”๋ฆฌ๋Š” ๋‹ค ๋ง›์žˆ์ง€๋งŒ ํฌ๋ฆผ์†Œ์Šค ์š”๋ฆฌ๋ฅผ ์ฐธ ์ž˜ํ•˜๋Š” ๊ฑฐ ๊ฐ™๋„ค์š” ์š”๊ฑด ๋ฌผ๋งŒ ๋งˆ์‹œ๊ธฐ ์•„์‰ฌ์›Œ ์‹œํ‚จ ๋‰ด์ž๋ชฝ๊ณผ ๋ฐ€ํ‚ค์†Œ๋‹ค ๋”ธ๊ธฐํ†ตํ†ต! ์œ ์ž์™€ ์ž๋ชฝ์˜ ๋ง›์„ ํ•จ๊ป˜ ๋Š๋‚„ ์ˆ˜ ์žˆ๋Š” ๋‰ด์ž๋ชฝ์€ ์ƒํผํ•จ ๊ทธ ์ž์ฒด์˜€์–ด์š”.', 'ํ•˜์น˜๋งŒ ์ €๋Š” ๋”ธ๊ธฐํ†ตํ†ต ๋ฐ€ํ‚ค์†Œ๋‹ค๊ฐ€ ๋” ๋ง›์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ๋ฐ€ํ‚ค์†Œ๋‹ค๋Š” ํ† ๋ผ์ •์—์„œ๋งŒ ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ฉ”๋‰ด๋ผ๊ณ  ํ•˜๋‹ˆ ํ•œ ๋ฒˆ ๋“œ์…”๋ณด์‹œ๊ธธ ์ถ”์ฒœํ• ๊ฒŒ์š”!! ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘๋‹ต๊ฒŒ ๋ชจ๋“  ์Œ์‹๋“ค์ด ๋Œ€์ฒด์ ์œผ๋กœ ๋ง›์žˆ์—ˆ์–ด์š”! 
๊ฑด๋ฌผ ์œ„์น˜๋„ ๊ฐ•๋‚จ ๋Œ€๋กœ๋ณ€์—์„œ ์กฐ๊ธˆ ๋–จ์–ด์ ธ ์žˆ์–ด ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด์ฒ˜๋Ÿผ ์•„๋Š‘ํ•œ ๋Š๋‚Œ๋„ ์žˆ์—ˆ๊ตฌ์š”ใ…Žใ…Ž', '๊ธฐํšŒ๊ฐ€ ๋˜๋ฉด ๋‹ค๋“ค ๊ผญ ๋“ค๋Ÿฌ๋ณด์„ธ์š”~ ๐Ÿ™‚"""
>>> text2 = """์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–กย ๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!"""
>>> split_chunks([text1, text2], max_length=128)
[['๊ฐ•๋‚จ์—ญ ๋ง›์ง‘์œผ๋กœ ์†Œ๋ฌธ๋‚œ ๊ฐ•๋‚จ ํ† ๋ผ์ •์— ๋‹ค๋…€์™”์Šต๋‹ˆ๋‹ค. ํšŒ์‚ฌ ๋™๋ฃŒ ๋ถ„๋“ค๊ณผ ๋‹ค๋…€์™”๋Š”๋ฐ ๋ถ„์œ„๊ธฐ๋„ ์ข‹๊ณ  ์Œ์‹๋„ ๋ง›์žˆ์—ˆ์–ด์š” ๋‹ค๋งŒ, ๊ฐ•๋‚จ ํ† ๋ผ์ •์ด ๊ฐ•๋‚จ ์‰‘์‰‘๋ฒ„๊ฑฐ ๊ณจ๋ชฉ๊ธธ๋กœ ์ญ‰ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋Š”๋ฐ ๋‹ค๋“ค ์‰‘์‰‘๋ฒ„๊ฑฐ์˜ ์œ ํ˜น์— ๋„˜์–ด๊ฐˆ ๋ป” ํ–ˆ๋‹ต๋‹ˆ๋‹ค ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ํ† ๋ผ์ •์˜ ์™ธ๋ถ€ ๋ชจ์Šต. ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ 4์ธต ๊ฑด๋ฌผ ๋…์ฑ„๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.', '์—ญ์‹œ ํ† ๋ผ์ • ๋ณธ ์  ๋‹ต์ฃ ?ใ…Žใ……ใ…Ž ๊ฑด๋ฌผ์€ ํฌ์ง€๋งŒ ๊ฐ„ํŒ์ด ์—†๊ธฐ ๋•Œ๋ฌธ์— ์ง€๋‚˜์น  ์ˆ˜ ์žˆ์œผ๋‹ˆ ์กฐ์‹ฌํ•˜์„ธ์š” ๊ฐ•๋‚จ ํ† ๋ผ์ •์˜ ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด. ํ‰์ผ ์ €๋…์ด์—ˆ์ง€๋งŒ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘ ๋‹ต๊ฒŒ ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์•˜์–ด์š”. ์ „์ฒด์ ์œผ๋กœ ํŽธ์•ˆํ•˜๊ณ  ์•„๋Š‘ํ•œ ๊ณต๊ฐ„์œผ๋กœ ๊พธ๋ฉฐ์ ธ ์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ํ•œ ๊ฐ€์ง€ ์•„์‰ฌ์› ๋˜ ๊ฑด ์กฐ๋ช…์ด ๋„ˆ๋ฌด ์–ด๋‘์›Œ ๋ˆˆ์ด ์นจ์นจํ–ˆ๋˜โ€ฆ ์ €ํฌ๋Š” 3์ธต์— ์ž๋ฆฌ๋ฅผ ์žก๊ณ  ์Œ์‹์„ ์ฃผ๋ฌธํ–ˆ์Šต๋‹ˆ๋‹ค.', '์ด 5๋ช…์ด์„œ ๋จน๊ณ  ์‹ถ์€ ์Œ์‹ ํ•˜๋‚˜์”ฉ ๊ณจ๋ผ ๋‹ค์–‘ํ•˜๊ฒŒ ์ฃผ๋ฌธํ–ˆ์–ด์š” ์ฒซ ๋ฒˆ์งธ ์ค€๋น„๋œ ๋ฉ”๋‰ด๋Š” ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€์™€ ๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋ฅผ ๋“ฌ๋ฟ ์˜ฌ๋ ค ๋จน๋Š” ๋ง›์žˆ๋Š” ๋ฐฅ์ž…๋‹ˆ๋‹ค. ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฉ”๋‰ด๋ฅผ ํ•œ ๋ฒˆ์— ์‹œํ‚ค๋ฉด ์ค€๋น„๋˜๋Š” ๋ฉ”๋‰ด๋ถ€ํ„ฐ ๊ฐ€์ ธ๋‹ค ์ฃผ๋”๋ผ๊ตฌ์š”. ํ† ๋ผ์ • ๊ณ ๋กœ์ผ€ ๊ธˆ๋ฐฉ ํŠ€๊ฒจ์ ธ ๋‚˜์™€ ๊ฒ‰์€ ๋ฐ”์‚ญํ•˜๊ณ  ์†์€ ์ด‰์ด‰ํ•ด ๋ง›์žˆ์—ˆ์–ด์š”!', '๊นป์žŽ ๋ถˆ๊ณ ๊ธฐ ์‚ฌ๋ผ๋‹ค๋Š” ๋ถˆ๊ณ ๊ธฐ, ์–‘๋ฐฐ์ถ”, ๋ฒ„์„ฏ์„ ๋ณถ์•„ ๊นป์žŽ์„ ๋“ฌ๋ฟ ์˜ฌ๋ฆฌ๊ณ  ์šฐ์—‰ ํŠ€๊น€์„ ๊ณ๋“ค์—ฌ ๋ฐฅ์ด๋ž‘ ํ•จ๊ป˜ ๋จน๋Š” ๋ฉ”๋‰ด์ž…๋‹ˆ๋‹ค. ์‚ฌ์‹ค ์ „ ๊ณ ๊ธฐ๋ฅผ ์•ˆ ๋จน์–ด์„œ ๋ฌด์Šจ ๋ง›์ธ์ง€ ๋ชจ๋ฅด๊ฒ ์ง€๋งŒ.. ๋‹ค๋“ค ์—„์ฒญ ์ž˜ ๋“œ์…จ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ ์ด๊ฑด ์ œ๊ฐ€ ์‹œํ‚จ ์ด‰์ด‰ํ•œ ๊ณ ๋กœ์ผ€์™€ ํฌ๋ฆผ์ŠคํŠœ์šฐ๋™. ๊ฐ•๋‚จ ํ† ๋ผ์ •์—์„œ ๋จน์€ ์Œ์‹ ์ค‘์— ์ด๊ฒŒ ์ œ์ผ ๋ง›์žˆ์—ˆ์–ด์š”!!! ํฌ๋ฆผ์†Œ์Šค๋ฅผ ์›๋ž˜ ์ข‹์•„ํ•˜๊ธฐ๋„ ํ•˜์ง€๋งŒ, ๋Š๋ผํ•˜์ง€ ์•Š๊ฒŒ ๋ถ€๋“œ๋Ÿฝ๊ณ  ๋‹ฌ๋‹ฌํ•œ ์ŠคํŠœ์™€ ์ซ„๊นƒํ•œ ์šฐ๋™๋ฉด์ด ๋„ˆ๋ฌด ์ž˜ ์–ด์šธ๋ ค ๊ณ„์† ์†์ด ๊ฐ€๋”๋ผ๊ตฌ์š”.', '์‚ฌ์ง„์„ ๋ณด๋‹ˆ ๋˜ ๋จน๊ณ  ์‹ถ์Šต๋‹ˆ๋‹ค ๊ฐ„์‚ฌ์ด ํ’ ์—ฐ์–ด ์ง€๋ผ์‹œ์ž…๋‹ˆ๋‹ค. 
์ผ๋ณธ ๊ฐ„์‚ฌ์ด ์ง€๋ฐฉ์—์„œ ๋งŽ์ด ๋จน๋Š” ๋– ๋จน๋Š” ์ดˆ๋ฐฅ(์ง€๋ผ์‹œ์Šค์‹œ)์ด๋ผ๊ณ  ํ•˜๋„ค์š”. ๋ฐ‘์— ์™€์‚ฌ๋น„ ๋งˆ์š”๋ฐฅ ์œ„์— ์—ฐ์–ด๋“ค์ด ๋‹ด๊ฒจ์ ธ ์žˆ์–ด ์ฝ”๋์ด ์ฐกํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์ ํ˜€ ์žˆ๋Š”๋ฐ, ๋‚œ ์™€์‚ฌ๋น„ ๋ง› 1๋„ ๋ชจ๋ฅด๊ฒ ๋˜๋ฐโ€ฆ? ์™€์‚ฌ๋น„๋ฅผ ์•ˆ ์ข‹์•„ํ•˜๋Š” ์ €๋Š” ๋ถˆํ–‰์ธ์ง€ ๋‹คํ–‰์ธ์ง€ ์—ฐ์–ด ์ง€๋ผ์‹œ๋ฅผ ๋งค์šฐ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹', '๋‹ค์Œ ๋ฉ”๋‰ด๋Š” ๋‹ฌ์ง์ง€๊ทผํ•œ ์ˆฏ๋ถˆ ๊ฐˆ๋น„ ๋ฎ๋ฐฅ์ž…๋‹ˆ๋‹ค! ๊ฐ„์žฅ ์–‘๋…์— ๊ตฌ์šด ์ˆฏ๋ถˆ ๊ฐˆ๋น„์— ์–‘ํŒŒ, ๊นป์žŽ, ๋‹ฌ๊ฑ€ ๋ฐ˜์ˆ™์„ ํ„ฐํŠธ๋ ค ๋น„๋ฒผ ๋จน์œผ๋ฉด ๊ทธ ๋ง›์ด ํฌ.. (๋ฌผ๋ก  ์ „ ์•ˆ ๋จน์—ˆ์ง€๋งŒโ€ฆ๋‹ค๋ฅธ ๋ถ„๋“ค์ด ๊ทธ๋ ‡๋‹ค๊ณ  ํ•˜๋”๋ผ๊ตฌ์š”ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๋งˆ์ง€๋ง‰ ๋ฉ”์ธ ๋ฉ”๋‰ด ์–‘์†ก์ด ํฌ๋ฆผ์ˆ˜ํ”„์™€ ์ˆฏ๋ถˆ๋–ก๊ฐˆ๋น„ ๋ฐฅ์ž…๋‹ˆ๋‹ค. ํฌ๋ฆผ๋ฆฌ์กฐ๋˜๋ฅผ ๋ฒ ์ด์Šค๋กœ ์œ„์— ๊ทธ๋ฃจํ†ต๊ณผ ์ˆฏ๋ถˆ๋กœ ๊ตฌ์šด ๋–ก๊ฐˆ๋น„๊ฐ€ ์˜ฌ๋ผ๊ฐ€ ์žˆ์–ด์š”!', 'ํฌ๋ฆผ์ŠคํŠœ ์šฐ๋™ ๋งŒํผ์ด๋‚˜ ๋Œ€๋ฐ• ๋ง›์žˆ์Šต๋‹ˆ๋‹คโ€ฆใ… ใ… ใ… ใ… ใ… ใ…  (ํฌ๋ฆผ ์†Œ์Šค๋ฉด ๋‹ค ์ข‹์•„ํ•˜๋Š” ๊ฑฐ ์ ˆ๋Œ€ ์•„๋‹™๋‹ˆ๋‹คใ…‹ใ…‹ใ…‹ใ…‹ใ…‹ใ…‹) ๊ฐ•๋‚จ ํ† ๋ผ์ • ์š”๋ฆฌ๋Š” ๋‹ค ๋ง›์žˆ์ง€๋งŒ ํฌ๋ฆผ์†Œ์Šค ์š”๋ฆฌ๋ฅผ ์ฐธ ์ž˜ํ•˜๋Š” ๊ฑฐ ๊ฐ™๋„ค์š” ์š”๊ฑด ๋ฌผ๋งŒ ๋งˆ์‹œ๊ธฐ ์•„์‰ฌ์›Œ ์‹œํ‚จ ๋‰ด์ž๋ชฝ๊ณผ ๋ฐ€ํ‚ค์†Œ๋‹ค ๋”ธ๊ธฐํ†ตํ†ต! ์œ ์ž์™€ ์ž๋ชฝ์˜ ๋ง›์„ ํ•จ๊ป˜ ๋Š๋‚„ ์ˆ˜ ์žˆ๋Š” ๋‰ด์ž๋ชฝ์€ ์ƒํผํ•จ ๊ทธ ์ž์ฒด์˜€์–ด์š”.', 'ํ•˜์น˜๋งŒ ์ €๋Š” ๋”ธ๊ธฐํ†ตํ†ต ๋ฐ€ํ‚ค์†Œ๋‹ค๊ฐ€ ๋” ๋ง›์žˆ์—ˆ์Šต๋‹ˆ๋‹คใ…Žใ…Ž ๋ฐ€ํ‚ค์†Œ๋‹ค๋Š” ํ† ๋ผ์ •์—์„œ๋งŒ ๋งŒ๋‚˜๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ฉ”๋‰ด๋ผ๊ณ  ํ•˜๋‹ˆ ํ•œ ๋ฒˆ ๋“œ์…”๋ณด์‹œ๊ธธ ์ถ”์ฒœํ• ๊ฒŒ์š”!! ๊ฐ•๋‚จ ํ† ๋ผ์ •์€ ๊ฐ•๋‚จ์—ญ ๋ง›์ง‘๋‹ต๊ฒŒ ๋ชจ๋“  ์Œ์‹๋“ค์ด ๋Œ€์ฒด์ ์œผ๋กœ ๋ง›์žˆ์—ˆ์–ด์š”! ๊ฑด๋ฌผ ์œ„์น˜๋„ ๊ฐ•๋‚จ ๋Œ€๋กœ๋ณ€์—์„œ ์กฐ๊ธˆ ๋–จ์–ด์ ธ ์žˆ์–ด ๋‚ด๋ถ€ ์ธํ…Œ๋ฆฌ์–ด์ฒ˜๋Ÿผ ์•„๋Š‘ํ•œ ๋Š๋‚Œ๋„ ์žˆ์—ˆ๊ตฌ์š”ใ…Žใ…Ž', '๊ธฐํšŒ๊ฐ€ ๋˜๋ฉด ๋‹ค๋“ค ๊ผญ ๋“ค๋Ÿฌ๋ณด์„ธ์š”~ ๐Ÿ™‚'],
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ..', 'ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”!', '๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค', '๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']]

max_length (int)

This parameter sets the maximum length of each chunk. The split_chunks function builds chunks by concatenating sentences while traversing the list of segmented sentences. If the concatenated string is longer than the maximum length, Kss makes it into a chunk (paragraph) including the previous sentences.
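The accumulation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code Kss ships: `greedy_chunks` is a hypothetical helper, the input is assumed to be an already segmented list of sentences, and sentences are assumed to be joined with a single space. A sentence that is longer than `max_length` on its own is kept whole, so a chunk can exceed the limit.

```python
from typing import List


def greedy_chunks(sentences: List[str], max_length: int) -> List[str]:
    """Hypothetical helper (not part of Kss): greedy sentence concatenation."""
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Tentatively append the next sentence to the running chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_length:
            # The running chunk would overflow: close it and start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        # Flush whatever is left after the last sentence.
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(greedy_chunks(["aa", "bb", "cc"], max_length=5))
```

A smaller `max_length` yields more, shorter chunks; a larger one packs more sentences into each chunk, which is the same trend the two calls in the example below show.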

  • An example of max_length
>>> from kss import split_chunks
>>> text = """์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–กย ๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!"""
>>> split_chunks(text, max_length=24)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค', '(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š”', '๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?', 'ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹', '๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹', '์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ… ', '๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š”', '๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š”', '๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”!', '๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

>>> split_chunks(text, max_length=128)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ..', 'ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”!', '๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค', '๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

overlap (bool)

This parameter indicates whether sentences can be duplicated across chunks. If you set it to True, consecutive chunks can share sentences, like a sliding window. If you set it to False, each sentence appears in only one chunk.
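The sliding-window idea can be pictured with a small stand-alone sketch. It is an assumption about the general technique, not a copy of Kss internals: `sliding_chunks` is a hypothetical name, the input is an already segmented list of sentences, and each window starts one sentence later than the previous one.

```python
from typing import List


def sliding_chunks(sentences: List[str], max_length: int) -> List[str]:
    """Hypothetical helper (not part of Kss): overlapping sentence windows."""
    chunks: List[str] = []
    for start in range(len(sentences)):
        # Every window begins with one sentence, even if it is too long alone.
        current = sentences[start]
        end = start + 1
        # Extend the window while the next sentence still fits.
        while (
            end < len(sentences)
            and len(current) + 1 + len(sentences[end]) <= max_length
        ):
            current = f"{current} {sentences[end]}"
            end += 1
        chunks.append(current)
        if end == len(sentences):
            # The window already reached the last sentence; stop sliding.
            break
    return chunks


if __name__ == "__main__":
    print(sliding_chunks(["aa", "bb", "cc", "dd"], max_length=5))
```

In this sketch the middle sentences show up in two neighbouring chunks, which is the kind of duplication `overlap=True` allows and `overlap=False` rules out.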

  • An example of overlap
>>> from kss import split_chunks
>>> text = """์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–กย ๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!"""
>>> split_chunks(text, max_length=24, overlap=False)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค', '(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š”', '๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?', 'ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹', '๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹', '์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ… ', '๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š”', '๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š”', '๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”!', '๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

>>> split_chunks(text, max_length=24, overlap=True)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค', '์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š”', '(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', 'ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š”', 'ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ… ', '_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?', '๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ..', '๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค', 'ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. 
ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹', 'ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค', '์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹', '๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”!', '์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ… ', '์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ… ', '๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ… ', '์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. 
๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š”', '๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค', '์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค!', 'ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š”', '๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.', '์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”!', '๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค', '์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

kwargs (**dict)

split_chunks is built on split_sentences, so it accepts every argument of split_sentences. The following examples pass backend and use_heuristic.
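The forwarding pattern itself can be shown with toy stand-ins. Everything here is hypothetical and only meant to illustrate how extra keyword arguments travel from a chunking function to the splitter underneath it: `toy_split_sentences`, `toy_split_chunks` and the `keep_punctuation` option are invented for this sketch and do not exist in Kss.

```python
import re
from typing import List


def toy_split_sentences(text: str, keep_punctuation: bool = True) -> List[str]:
    """Stand-in splitter (not Kss): split after '.', '!' or '?'."""
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]
    if not keep_punctuation:
        parts = [p.rstrip(".!?") for p in parts]
    return parts


def toy_split_chunks(text: str, max_length: int, **kwargs) -> List[str]:
    """Stand-in chunker: forwards every extra keyword to the splitter."""
    sentences = toy_split_sentences(text, **kwargs)
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_length:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(toy_split_chunks("Hi there. Bye now!", max_length=5))
    print(toy_split_chunks("Hi there. Bye now!", max_length=5, keep_punctuation=False))
```

Because the chunker never inspects `kwargs`, any option the splitter understands is available to the caller without changing the chunker's signature.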

  • An example of kwargs
>>> from kss import split_chunks
>>> text = """์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–กย ๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!"""
>>> split_chunks(text, backend="mecab", max_length=24)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค', '(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹', '๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”! ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š”', '๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ…  ๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?', 'ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹', '๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹', '์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”! ๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ… ', '๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š”', '๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”..', '์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค! ๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. ์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š”', '๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 
์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”!', '๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

>>> split_chunks(text, use_heuristic=False, max_length=24)
['์ฃผ๋ง์— ๊ฐ€์กฑ์—ฌํ–‰์œผ๋กœ ์˜ค์…˜์›”๋“œ ๋‹ค๋…€์™”์–ด์š”!!! ์˜ค์…˜์›”๋“œ๋Š” ์ฒ˜์Œ๊ฐ€๋ณด๋Š”๊ฑฐ์—ฌ์„œ ์„ค๋ ˜์„ค๋ ˜~~!! ๋‚ ์”จ๋„ ๋๋‚ด์ฃผ๊ณ ~! ํ•˜๋Š˜,๊ตฌ๋ฆ„ ๋„ˆ๋ฌด ์ด๋ปค์Šต๋‹ˆ๋‹ค~! ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ๊นŒ์ง€ ๊ฐ€๋Š”๋ฐ ์ฐจ๊ฐ€ ์—„~~~์ฒญ ๋ง‰ํ˜”์Šต๋‹ˆ๋‹ค(3์‹œ๊ฐ„๋„˜๊ฒŒ๊ฑธ๋ฆผ) ์™€ ์ •๋ง ํ† ๋‚˜์˜ค๋Š”์ค„ ์•Œ์•˜๋„ค์š” ํ•˜ํ•„ ๋˜ ์ €ํฌ๊ฐ€์กฑ ๋Šฆ๊ฒŒ ์ผ์–ด๋‚˜์„œ ๋Šฆ๊ฒŒ ์ถœ๋ฐœํ–ˆ๊ฑฐ๋“ ์š” ใ…‹ใ…‹ใ…‹ ๊ฐ€ํ‰ํœด๊ฒŒ์†Œ ์‚ฌ๋žŒ๋“ค์ด ์—„์ฒญ ๋งŽ์•˜์–ด์š”!', 'ํ˜ธ๋‘๊ณผ์ž๋ž‘ ๊ตฐ๊ฒƒ์งˆ์ข€ ํ•ด์ฃผ๊ตฌ์š” ใ…‹_ใ…‹ ์˜ค์…˜์›”๋“œ ๋„์ฐฉ!! ์ฃผ์ฐจ์žฅ์ด ๋‹ค ๊ฝ‰์ฐจ์„œ.. ์ฃผ์ฐจํ• ๊ณณ์ด ์—†๋”๋ผ๊ตฌ์š” ๊ณ„์† ์ฃผ์ฐจ์žฅ ๋Œ๋‹ค๊ฐ€ ๊ฒจ์šฐ ํ•œ์ž๋ฆฌ ์žˆ์–ด์„œ ์ฃผ์ฐจํ–ˆ์Šต๋‹ˆ๋‹ค..ใ… ใ… ใ… ', '๊ทธ๋Ÿฐ๋ฐ ๋˜ ์ฃผ์ฐจ์žฅ์— ์ฃผ์ฐจํ•˜๊ณ  ์–ธ๋•๊ธธ์„ ์˜ฌ๋ผ๊ฐ€์•ผ ํ•˜๋”๋ผ๊ตฌ์š”!?ํ—~ ์˜ค์…˜์›”๋“œ ..์ด๊ฒŒ๋ญ๋žŒ.. ํ์•Œ์ฝ”๋“œ๋กœ ์ฐ๊ณ  ๊ฐ„ํŽธํ•˜๊ฒŒ ์ž…์žฅํ–ˆ์Šต๋‹ˆ๋‹ค ์˜ค์…˜์›”๋“œ ์ฝ”์ธ๋„ ๋„‰๋„‰ํ•˜๊ฒŒ 10๋งŒ์› ์ถฉ์ „ํ–ˆ์–ด์š” ใ…‹ใ…‹ใ…‹ ๋‹ค๋“ค ๋„ˆ๋ฌด ์ž˜๋จน๊ธฐ๋•Œ๋ฌธ์—... ๋„‰๋„‰ํ•˜๊ฒŒ..ใ…‹ใ…‹ใ…‹ ์—ฌ์ž ๋ฝ์ปค์‹ค์— ์—์–ด์ปจ์ด ์–ผ๋งˆ๋‚˜ ๋นต๋นตํ•œ์ง€ ์˜ค๋“ค์˜ค๋“ค ์ถ”์› ์Šต๋‹ˆ๋‹ค ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์Šตํ•˜๊ณ  ์ถ•์ถ•ํ•œ๋ฐ ์˜ค์…˜์›”๋“œ๋Š” ์™„์ „ ์ •๋ฐ˜๋Œ€ ใ…‹ใ…‹ใ…‹ ์ œ๊ฐ€ ๋ฐฉ์ˆ˜ํŒฉ์„ ์ค€๋น„๋ชปํ•ด์„œ ๊ฐ์ž 3๊ฐœ ์‚ด๋ ค๊ณ  ํ–ˆ๋Š”๋ฐ ํ—! ํ•œ๊ฐœ์— 19000์›์ด์—์š”!', '๊ทธ๋ž˜์„œ ํ•œ๊ฐœ๋งŒ ์ƒ€์–ด์š” ใ… ใ…  ์ œ ํ•ธ๋“œํฐ์€ ๋ฝ์ปค์—.. ๋ฐฉ์ˆ˜ํŒฉ ๊ผญ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•˜์„ธ์š” ใ…  ๋„˜๋น„์‹ธ์š” ใ…  ์˜ค์…˜์›”๋“œ ์ •๋ง ์—‰๋ง์ง„์ฐฝ์ด์—ˆ์–ด์š” ใ… ใ…  ์‚ฌ๋žŒ์ด ๋„ˆ~~~๋ฌด๋งŽ์•„์„œ ์œ ์ˆ˜ํ’€๋„ ์ค„์„œ์„œ๋“ค์–ด๊ฐ€๊ตฌ์š” ๋‹ค๋ฅธ ๋†€์ด๊ธฐ๊ตฌ๋Š” ์—„๋‘๋„ ๋ชป๋‚ฌ์Šต๋‹ˆ๋‹ค ํŒŒ๋„ํ’€๋„ ์‚ฌ๋žŒ์ด ๋„ˆ๋ฌด ๋งŽ์€์ง€ ์•ˆ์ „์ƒ ๊ด€๋ฆฌ๋ฅผ ๋นก์„ธ๊ฒŒ ํ•ด์„œ ์žฌ๋ฏธ๊ฐ€ ์—†์—ˆ์–ด์š”.. ์ฒ˜์Œ์œผ๋กœ ๋จน์–ด๋ณธ ์†Œ๋–ก์†Œ๋–ก๋ฌผ๋†€์ดํ•˜๋‹ค๊ฐ€ ๋จน์€ ๊ฐ„์‹์ด์–ด์„œ ๊ทธ๋Ÿฐ์ง€ ์ฐธ ๋ง›์žˆ๊ฒŒ ๋จน์—ˆ์Šต๋‹ˆ๋‹ค!', '๊ทธ๋ ‡์ง€๋งŒ ์œ„์ƒ์€ ์ •๋ง ์•ˆ์ข‹์•˜์–ด์š”.. 
์˜ค์…˜์›”๋“œ ์ฒ˜์Œ์ด๋ผ ๊ธฐ๋Œ€ ๋งŽ์ด ํ–ˆ๋Š”๋ฐ ์ฒจ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋‹ค ๋ง˜์— ์•ˆ๋“ค์—ˆ์–ด์š” ๋ฌผ๋ก  ์‚ฌ๋žŒ์ด ๋„ˆ~๋ฌด ๋งŽ์•„์„œ ์ผ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.', '์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋Š” ์œ„์ƒ๋„ ๊ดœ์ฐฎ์•„ ๋ณด์ด๊ณ  ์Œ์‹์ด ๋น„์‹ธ์ง€๋งŒ ๋‹ค ๋ง›์žˆ์—ˆ๊ฑฐ๋“ ์š”! ๊ทผ๋ฐ ์˜ค์…˜์›”๋“œ ์œ„์ƒ๋„ ๋ณ„๋กœ๊ณ  ๋น„์‹ธ๊ณ  ๋ง›์—†๊ณ !!! ์ฃผ์ฐจ์žฅ๋„ ์ข๊ณ  ์ฃผ์ฐจ์žฅ์—์„œ ์ž…๊ตฌ๊นŒ์ง€ ๊ฑธ์–ด์„œ ์˜ฌ๋ผ๊ฐ€๊ณ .. ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๋ณด๋‹ค ๋‚˜์•˜๋˜๊ฑด ๋ฝ์ปค์‹œ์„ค๊ณผ ์œ ์ˆ˜ํ’€ ๋‘๊ฐœ ์ •๋„! ์˜ค์…˜์›”๋“œ ์ •๋ง ์•„์‰ฌ์› ์Šต๋‹ˆ๋‹ค ๊ฐœ์ธ์ ์œผ๋ฃจ ์บ๋ฆฌ๋น„์•ˆ๋ฒ ์ด๊ฐ€ ํ›จ์”ฌ ๋‚˜์€๋“ฏ!']

3. Additional Documents

4. References

Kss is available in various programming languages.

5. Citation

If you find this toolkit useful, please consider citing:

@misc{kss,
  author       = {Park, Sang-kil and Ko, Hyunwoong},
  title        = {Kss: A Toolkit for Korean sentence segmentation},
  howpublished = {\url{https://github.com/hyunwoongko/kss}},
  year         = {2020},
}
