Sort concepts by (ES hits * ES score per concept) / PITs per concept #10

@bertspaan

Currently, Histograph does the following:

  • The API queries Elasticsearch (e.g. q=utrecht), and ES returns a list of PITs.
  • This list probably contains many forms and spellings of Utrecht, and maybe some results like Abcoude bij Utrecht.
  • Those PITs are sent to the Neo4j Plugin, which runs a breadth-first search (BFS) from each PIT and returns the resulting subgraphs/concepts ("klonten", i.e. clumps), ordered by number of PITs per concept.
  • If the concept of Abcoude happens to contain more PITs than the concept of Utrecht, Abcoude shows up first in the list of results.
  • This is wrong!

Possible solution:

  • The API queries Elasticsearch (e.g. q=utrecht), and ES returns a list of PITs.
  • This list probably contains many forms and spellings of Utrecht, and maybe some results like Abcoude bij Utrecht.
  • Those PITs are sent to the Neo4j Plugin, together with their respective Elasticsearch scores.
  • BFSs are computed for each PIT, just like before, but now the Neo4j Plugin orders the resulting concepts by (ES hits * ES score per concept) / PITs per concept.
  • This way, the concept of Utrecht will contain many ES hits (with high ES scores, too), while the concept of Abcoude will contain at least one ES hit (Abcoude bij Utrecht), but probably not many more. The new ordering will make sure the Abcoude concept is not returned first.
  • This is better!
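
To make the proposed ordering concrete, here is a minimal Python sketch. It is not the Neo4j Plugin's code: the `Concept` class, the function names, and all numbers are made up for illustration. It also assumes that "ES score per concept" means the mean ES score of the PITs in that concept that matched the query; the formula above could be read in other ways.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Concept:
    """Hypothetical stand-in for one subgraph returned by the BFS step."""
    name: str
    pit_count: int            # total number of PITs in the concept
    hit_scores: List[float]   # ES scores of the PITs in this concept that matched


def rank_score(concept: Concept) -> float:
    """(ES hits * ES score per concept) / PITs per concept.

    Assumption: 'ES score per concept' is the mean score of the matching PITs.
    """
    hits = len(concept.hit_scores)
    if hits == 0 or concept.pit_count == 0:
        return 0.0
    mean_score = sum(concept.hit_scores) / hits
    return hits * mean_score / concept.pit_count


def sort_by_pit_count(concepts: List[Concept]) -> List[Concept]:
    """Current behaviour: largest concept first."""
    return sorted(concepts, key=lambda c: c.pit_count, reverse=True)


def sort_by_rank_score(concepts: List[Concept]) -> List[Concept]:
    """Proposed behaviour: highest rank score first."""
    return sorted(concepts, key=rank_score, reverse=True)


# Invented example data for q=utrecht.
utrecht = Concept("Utrecht", pit_count=10, hit_scores=[2.0] * 8)
abcoude = Concept("Abcoude", pit_count=12, hit_scores=[0.5])

current_order = [c.name for c in sort_by_pit_count([utrecht, abcoude])]
proposed_order = [c.name for c in sort_by_rank_score([utrecht, abcoude])]
```

With these invented numbers, ordering by PIT count puts Abcoude first (12 PITs against 10), while the proposed score puts Utrecht first (8 * 2.0 / 10 against 1 * 0.5 / 12).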
