from keras.preprocessing.text import text_to_word_sequence

# define the doc
text = 'The quick brown fox jumps over the lazy dog'

# tokenize the document
result = text_to_word_sequence(text)
print(result)
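The behaviour of text_to_word_sequence (lowercase, strip filtered characters, split on whitespace) can be sketched in plain Python without Keras installed; the function name and filter set below are illustrative, not the Keras implementation itself:

```python
import string

def word_sequence(text, filters=string.punctuation):
    # lowercase the text, replace filtered characters with spaces, split on whitespace
    table = str.maketrans(filters, ' ' * len(filters))
    return text.lower().translate(table).split()

print(word_sequence('The quick brown fox jumps over the lazy dog'))
# → ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

Note that repeated words ('the') appear twice in the sequence; deduplication happens later when a set is built to estimate vocabulary size.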
from keras.preprocessing.text import text_to_word_sequence
from keras.preprocessing.text import one_hot

# define the doc
text = 'The quick brown fox jumps over the lazy dog.'

# estimate the size of the vocab
words = set(text_to_word_sequence(text))
vocab_size = len(words)
print(vocab_size)

# integer encode the document
result = one_hot(text, round(vocab_size * 1.3))
print(result)
8
[3, 9, 3, 8, 1, 1, 3, 4, 9]
Hashing Encoding with hashing_trick
from keras.preprocessing.text import text_to_word_sequence
from keras.preprocessing.text import hashing_trick

# define the doc
text = 'The quick brown fox jumps over the lazy dog.'

# estimate the size of the vocab
words = set(text_to_word_sequence(text))
vocab_size = len(words)
print(vocab_size)

# integer encode the document
result = hashing_trick(text, round(vocab_size * 1.3), hash_function='md5')
print(result)
8
[6, 4, 1, 2, 7, 5, 6, 2, 6]
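The hashing trick needs no fitted vocabulary: each word is hashed into a fixed range of bucket indices. A minimal pure-Python sketch of the idea, assuming md5 as the hash and reserving index 0 (the helper name is illustrative):

```python
import hashlib

def hash_word(word, n):
    # bucket a word into the range 1..n-1 via md5; index 0 stays reserved
    return int(hashlib.md5(word.encode()).hexdigest(), 16) % (n - 1) + 1

doc = 'the quick brown fox jumps over the lazy dog'
print([hash_word(w, 10) for w in doc.split()])
```

The same word always hashes to the same index, but distinct words may collide, which is the trade-off for not storing a vocabulary.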
Tokenizer API
from keras.preprocessing.text import Tokenizer

# define 5 docs
docs = ['Well done!',
        'Good work',
        'Great effort',
        'nice work',
        'Excellent!']

# create the tokenizer
t = Tokenizer()

# fit the tokenizer on the docs
t.fit_on_texts(docs)

# summarize what was learned
print(t.word_counts)
print(t.document_count)
print(t.word_index)
print(t.word_docs)

# integer encode docs
encoded_docs = t.texts_to_matrix(docs, mode='count')
print(encoded_docs)
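The fit-then-transform pattern above can be sketched in plain Python: rank words by frequency to build a word index (lowest index for the most frequent word, with 0 reserved), then emit one count row per document. This mirrors the idea of the Tokenizer's count mode, not its exact implementation:

```python
from collections import Counter

docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!']

# "fit": count word frequencies and assign indices by descending frequency
tokens = [w.strip('!').lower() for d in docs for w in d.split()]
word_index = {w: i + 1 for i, (w, _) in enumerate(Counter(tokens).most_common())}

# "transform": one row per doc; column i holds the count of the word indexed i
matrix = []
for d in docs:
    row = [0.0] * (len(word_index) + 1)
    for tok in (w.strip('!').lower() for w in d.split()):
        row[word_index[tok]] += 1.0
    matrix.append(row)

print(matrix)
```

With 8 unique words, each row has 9 columns (column 0 is never used), and 'work', the only word appearing in two documents, gets index 1.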