This repository was archived by the owner on Oct 22, 2018. It is now read-only.

Refactoring language processing #22

Open
sepulchered wants to merge 5 commits into PeARSearch:development from sepulchered:refactoring_language_processing

Conversation

@sepulchered

Moved the POS tagger, lemmatizer, and TextBlob tagger scripts into a single file, lang_proc, and refactored those modules.

Comment thread app/lang_proc.py
for line in text_lines:
    tagged = []
    line = line.strip()
    for surface, pos in pt.tag(line):
Author


Maybe use the tag_query method here?
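A rough sketch of what that could look like, to make the suggestion concrete. The diff excerpt does not show tag_query's signature, so the one below is an assumption, as are the names tag_lines and toy_tag (a stand-in that replaces the real tagger so the snippet runs on its own). The only thing taken from the excerpt is that the tagger yields (surface, pos) pairs for a line of text.

```python
from typing import Callable, Iterable, List, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[str], Iterable[Tuple[str, str]]]


def tag_query(query: str, tag: Tagger) -> Tagged:
    """Hypothetical helper: strip one piece of text and return (surface, pos) pairs.

    The signature is assumed; the real method in lang_proc may differ.
    """
    return list(tag(query.strip()))


def tag_lines(text_lines: Iterable[str], tag: Tagger) -> List[Tagged]:
    """Tag each line by delegating to tag_query rather than repeating its logic inline."""
    return [tag_query(line, tag) for line in text_lines]


def toy_tag(text: str) -> Tagged:
    """Stand-in tagger for illustration only: labels every whitespace token 'X'."""
    return [(token, "X") for token in text.split()]


if __name__ == "__main__":
    print(tag_lines(["  red apple \n", "pears"], toy_tag))
```

In the real module, something like pt.tag would be passed in place of toy_tag, so the per-line loop and the query path share one code path for stripping and tagging.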

@minimalparts
Member

A tag_query method sounds good. But please note that the status of this code is unclear at the moment (sorry for not clarifying this in the code itself). We have moved to a system that doesn't need any prior linguistic processing. This is faster and, in some sense, more robust: a wrong POS tag can wreak havoc on the results. The semantic space can also be smaller if less linguistic information is included. We are waiting to do more testing before deciding whether this code should be removed completely. So perhaps don't do too much work on this now :)
