
subject_verb_object_triples cannot find SVO for a prepositional object (0.11.0) #343

Open
@ChiefOfGxBxL


Hi there, I've been using textacy for at least a few months and it has helped me make significant progress on a few projects I'm working on! The subject_verb_object_triples() method is what I'm most interested in for knowledge extraction.

My current use case is looking at subjects and verbs, along with coreference resolution provided by coreferee, to accumulate knowledge of what people are doing prior to certain events taking place. I'm encountering the following issue in the latest version, 0.11.0:

steps to reproduce

import spacy
import textacy

nlp = spacy.load('en_core_web_trf')  # or en_core_web_sm, en_core_web_lg
doc = nlp("A woman walked to the store.")
svos = textacy.extract.triples.subject_verb_object_triples(doc)

for svo in svos:
    print(svo)

expected vs. actual behavior

Expected: Output contains an SVOTriple (woman, walked, store).
Actual: Output is empty; no SVO triples are detected ([]).

possible solution?

I debugged https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py on my end to determine what information was being captured and what was not. Here's what I found.

Here's the document and its dependencies:

A     woman   walked   to      the    store.
det   nsubj   ROOT     prep    det    pobj

It's a simple sentence with a nominal subject, a verb, and a prepositional object: "store" is the object of the preposition "to", which attaches to the verb "walked".

The verb and nsubj are found, but lines 79-82 (https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py#L79-L82) prevent "store" from being added as the object:

# prepositional object acting as agent of passive verb
elif tok.dep == pobj:
    if head.dep == agent and head.head.pos == VERB:
        verb_sos[head.head]["objects"].update(expand_noun(tok))

When the loop reaches tok = store, head.head.pos == VERB is True, but head.dep == agent is False, so "store" is never added to verb_sos. Here head is the preposition "to", whose dep is prep, not agent.
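To make the failing check concrete, here's a rough stdlib-only sketch that hand-encodes the parse above and evaluates the same two conditions as the pobj branch. The Tok class and the hard-coded heads are hypothetical stand-ins for spaCy tokens: the real code compares spaCy's integer dep/pos symbols, while this sketch uses plain strings. It is not textacy's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token: text, dep label, POS, and head."""
    text: str
    dep: str
    pos: str
    # repr/compare disabled because the root token points at itself
    head: Optional["Tok"] = field(default=None, repr=False, compare=False)


def parse_example() -> list[Tok]:
    # Hand-encoded parse of "A woman walked to the store." (punctuation omitted)
    walked = Tok("walked", "ROOT", "VERB")
    walked.head = walked  # spaCy convention: the root is its own head
    woman = Tok("woman", "nsubj", "NOUN", walked)
    a = Tok("A", "det", "DET", woman)
    to = Tok("to", "prep", "ADP", walked)
    store = Tok("store", "pobj", "NOUN", to)
    the = Tok("the", "det", "DET", store)
    return [a, woman, walked, to, the, store]


def pobj_check(tok: Tok) -> tuple[bool, bool]:
    """Return both halves of the 0.11.0 condition for a pobj token."""
    head = tok.head
    return head.dep == "agent", head.head.pos == "VERB"


store = next(t for t in parse_example() if t.dep == "pobj")
is_agent, head_is_verb = pobj_check(store)
print(store.text, store.head.dep, is_agent, head_is_verb)
# prints: store prep False True
```

Since the two halves are joined with `and`, the False from the agent test is enough to skip "store".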

On my end I can work around this by removing the head.dep == agent check or by expanding it to allow [agent, prep]. However, I'm wondering whether this workaround should in fact be incorporated into textacy. Was the prep case missed, or did including it produce false positives?
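Here's a rough sketch of that workaround on hand-built stand-in tokens (not real spaCy objects): the agent-only test is widened to a set of allowed head dependencies. ALLOWED_PREP_HEAD_DEPS, attach_pobj and pp_sentence are hypothetical names, and objects are stored as plain text instead of textacy's expand_noun(tok) result. The last part shows one likely source of false positives: assuming an adjunct such as "in the morning" is parsed the same way (prep under the verb, pobj under the prep), the widened rule also turns it into an object.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical: head dep labels under which a pobj is attached to the verb.
# The 0.11.0 check effectively allows only {"agent"}.
ALLOWED_PREP_HEAD_DEPS = frozenset({"agent", "prep"})


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (string labels, not spaCy symbols)."""
    text: str
    dep: str
    pos: str
    head: Optional["Tok"] = field(default=None, repr=False, compare=False)


def new_verb_sos():
    # Mirrors the shape of textacy's verb_sos: verb -> {"subjects", "objects"}
    return defaultdict(lambda: {"subjects": set(), "objects": set()})


def attach_pobj(tok, verb_sos, allowed=ALLOWED_PREP_HEAD_DEPS):
    """Add a pobj token as an object of the verb governing its preposition."""
    head = tok.head
    if tok.dep == "pobj" and head.dep in allowed and head.head.pos == "VERB":
        verb_sos[head.head.text]["objects"].add(tok.text)


def pp_sentence(verb, prep, noun):
    """Build the chain verb <- prep <- pobj and return the pobj token."""
    v = Tok(verb, "ROOT", "VERB")
    v.head = v
    p = Tok(prep, "prep", "ADP", v)
    return Tok(noun, "pobj", "NOUN", p)


store = pp_sentence("walked", "to", "store")

old = new_verb_sos()
attach_pobj(store, old, allowed={"agent"})  # 0.11.0-style check
print(dict(old))  # prints: {}

new = new_verb_sos()
attach_pobj(store, new)  # widened check
print(new["walked"]["objects"])  # prints: {'store'}

# The same rule also promotes adjuncts: "He walked in the morning."
morning = pp_sentence("walked", "in", "morning")
adjunct = new_verb_sos()
attach_pobj(morning, adjunct)
print(adjunct["walked"]["objects"])  # prints: {'morning'}
```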

Since the object is not detected, https://github.com/chartbeat-labs/textacy/blob/main/src/textacy/extract/triples.py#L108 prevents the SVOTriple from being returned:

for verb, so_dict in verb_sos.items():
    if so_dict["subjects"] and so_dict["objects"]:
        yield SVOTriple(...)

I may be misremembering, but I thought previous versions of textacy allowed SVOTriples even when the object was missing. Would you consider adding an optional parameter to the function, e.g. def subject_verb_object_triples(doclike, allow_empty_objects=False), so that subject-verb pairs can still be extracted?

For example:
He laughed at me. -> SVOTriple (he, laughed, me)

But with no object, an SVOTriple with an empty object may still be useful:
He laughed. -> SVOTriple (he, laughed, None)
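Here's a rough sketch of how the proposed flag could change the final loop. It works on a plain verb -> {"subjects", "objects"} dict shaped like verb_sos. The SVO tuple and svo_from_verb_sos are hypothetical simplifications: textacy's SVOTriple holds lists of tokens, not strings, and allow_empty_objects does not exist in 0.11.0. With the default False it behaves like the current filter; with True, a subject-verb pair is kept with object=None.

```python
from typing import Iterator, NamedTuple, Optional


class SVO(NamedTuple):
    """Hypothetical simplified triple (textacy's SVOTriple holds token lists)."""
    subject: tuple[str, ...]
    verb: str
    object: Optional[tuple[str, ...]]


def svo_from_verb_sos(verb_sos: dict, allow_empty_objects: bool = False) -> Iterator[SVO]:
    """Yield triples from a verb -> {"subjects", "objects"} mapping.

    allow_empty_objects=False mirrors the 0.11.0 filter (subjects AND objects
    required); True also yields subject-verb pairs with object=None.
    """
    for verb, so_dict in verb_sos.items():
        if not so_dict["subjects"]:
            continue  # a subject is always required
        subjects = tuple(sorted(so_dict["subjects"]))
        if so_dict["objects"]:
            yield SVO(subjects, verb, tuple(sorted(so_dict["objects"])))
        elif allow_empty_objects:
            yield SVO(subjects, verb, None)


# "He laughed." -> subject found, no object
verb_sos = {"laughed": {"subjects": {"He"}, "objects": set()}}
print(list(svo_from_verb_sos(verb_sos)))
# prints: []
print(list(svo_from_verb_sos(verb_sos, allow_empty_objects=True)))
# prints: [SVO(subject=('He',), verb='laughed', object=None)]
```

Keeping the default at False would leave existing callers' output unchanged.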

context

Extracting this simple SVO triple provides important information about what the subject was doing before a certain incident took place. In this case, before arriving at the store, the woman was walking. It also helps identify the means of transportation to the store, e.g. walked, took a bus, or drove to the store.

environment

  • operating system: Windows 10
  • python version: 3.9.7
  • spacy version: 3.1.2
  • installed spacy models: en_core_web_trf, en_core_web_sm, en_core_web_lg
  • textacy version: 0.11.0
