Subclassing markovify.Text to allow for different types of 'sentences' #145

Hi, and thanks for your great library.

I made a CLI program to run it on my own texts.

I'm trying to add a subclass that lets me feed it sentences that don't begin with a capital letter and might begin with stars, bullets, etc. I made a subclass (modeled on your NewlineText) to modify the regexes in ```split_into_sentences()```, changing the lookahead that mandates an initial capital letter after a sentence end (splitters.py, line 45) to read ```r"\s+(?=[-•\w‘’“”'*\|/~\",])",```, and added a few more punctuation marks to the preceding regexes (hyphen, ellipsis/triple periods).
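For reference, here is a minimal sketch of the kind of subclass described above. It is not my exact code: `split_loose_sentences` and `LooseText` are hypothetical names, and the terminator set is a simplified assumption (markovify's own splitter also handles abbreviations, which this does not). The only library hook it relies on is `sentence_split()`, the same method NewlineText overrides.

```python
import re

# Assumption: a simplified set of sentence terminators. markovify's own
# splitter also guards against abbreviations; this sketch does not.
SENTENCE_END = r"(?<=[.!?…])"

# Lookahead quoted from the post: the next "sentence" may start with a
# bullet, star, quote mark, etc., not only a capital letter.
LOOSE_START = r"\s+(?=[-•\w‘’“”'*\|/~\",])"

LOOSE_BOUNDARY = re.compile(SENTENCE_END + LOOSE_START)


def split_loose_sentences(text):
    """Split text at a terminator followed by whitespace and a loose start."""
    parts = LOOSE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


try:
    import markovify
except ImportError:  # keep the splitter usable without the library
    markovify = None

if markovify is not None:

    class LooseText(markovify.Text):
        """Hypothetical subclass that swaps in the looser splitter."""

        def sentence_split(self, text):
            return split_loose_sentences(text)
```

With this shape the override only takes effect when the model is built from the subclass itself (e.g. `LooseText(corpus)`); a model built with plain `markovify.Text(corpus)` would still use the default splitter.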

It works if I manually generate a corpus and Markov model from one of my texts, but not if I run my program using the subclass: one "sentence" will have a period in the middle of it and will keep printing text after it.

So I wanted to ask: is there anything in the way sentences are made from the Markov model that would affect these modified regexes or disregard them? And is there a better way to go about modifying sentence endings than messing with ```split_into_sentences()```?

[Sorry if it's obvious in the code. I'm very much a novice with programming.]

 

