subclassing markovify.Text to allow for different types of 'sentences' #145

Open
@mooseyboots

Hi, and thanks for your great library.

I made a CLI program to run it on my own texts.

I'm trying to add a subclass that lets me feed it sentences that don't begin with a capital letter and might begin with stars, bullets, etc. I made a subclass (modeled on your NewlineText) to modify the regexes in split_into_sentences(), changing the lookahead that requires an initial capital letter after a sentence end (splitters.py, line 45) to read r"\s+(?=[-•\w‘’“”'*\|/~\",])", and added a few more punctuation marks to the preceding regexes (hyphen, ellipses/triple periods).
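For reference, here is a minimal standalone sketch of the kind of splitter described above. It is not markovify's own code: the lookahead is the one quoted in this post, while the lookbehind for sentence-ending punctuation and the name split_loose_sentences are assumptions made up for illustration. Unlike the library's splitter, it does no abbreviation handling, so something like "Mr. Smith" would be split.

```python
import re

# Lookahead taken from the post: whitespace followed by a word character,
# bullet, star, quote, or similar, instead of requiring a capital letter.
# The lookbehind for ., ?, ! or an ellipsis character is an assumption
# added for this sketch.
BOUNDARY = re.compile(r"(?<=[.?!…])\s+(?=[-•\w‘’“”'*\|/~\",])", re.U)


def split_loose_sentences(text):
    """Split text at sentence-ending punctuation followed by whitespace.

    Hypothetical helper: the next "sentence" may start with a lowercase
    letter, a bullet, a star, and so on.
    """
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    sample = "* first item. second one has no capital! • third?"
    for sentence in split_loose_sentences(sample):
        print(repr(sentence))
```

A function like this could then be returned from an overridden sentence_split() method on a markovify.Text subclass, which is how NewlineText swaps in its own splitting. A subclass only takes effect if the program builds the model through that subclass.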

It works if I manually generate a corpus and Markov model from one of my texts, but not if I run my program using the subclass: one "sentence" will have a period in the middle of it, and text continues printing after it.

So I wanted to ask: is there anything in the way sentences are made from the Markov model that would affect these modified regexes or disregard them? And is there a better way to go about modifying sentence endings than messing with split_into_sentences()?

[Sorry if it's obvious in the code. I'm very much a novice with programming.]
