
On curation page for taxa, add panel for cases where word boundaries are missing around taxon names in work titles #1898

Open
@Daniel-Mietchen


What kind of panel would you like to add to which Scholia aspect?

The representation of article titles in Wikidata can be distorted in multiple ways relative to the original publication. A frequent distortion is that taxon names were originally marked up with <i>italic</i> or similar, and that markup was then stripped by some processing step along the way, often such that the surrounding whitespace or punctuation is affected too.

What kind of information should the panel provide, and which of the visualization options (e.g. table, bubble chart, map) should it use?

Draft query is here:

PREFIX target: <http://www.wikidata.org/entity/Q752130>

SELECT DISTINCT
  ?item ?title ?taxonname
  # The remaining columns are laid out as a QuickStatements row.
  (REPLACE(STR(?item), ".*Q", "Q") AS ?work)
  ("P921" AS ?main_subject)
  (REPLACE(STR(target: ), ".*Q", "Q") AS ?taxon)
  ("S887" AS ?heuristic)
  ("Q69652283" AS ?deduced)
WHERE
{
  target: wdt:P225 ?taxonname .
  # Full-text search on Wikidata for items that mention the taxon name.
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Generator";
                    mwapi:generator "search";
                    mwapi:gsrsearch ?taxonname;
                    mwapi:gsrlimit "max".
    ?item wikibase:apiOutputItem mwapi:title.
  }
  ?item wdt:P1476 ?title .
  # Uncomment to skip works that already have the taxon as main subject.
#   MINUS {?item wdt:P921 target: }

  # Keep titles that contain the taxon name (case-insensitive) ...
  FILTER REGEX(LCASE(?title), LCASE(?taxonname))
  # ... but never with a word boundary on both sides of it.
  FILTER (!REGEX(LCASE(?title), LCASE(CONCAT( "\\", "b", ?taxonname ,"\\", "b"))))
}
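
The two FILTER clauses can also be tried offline on individual titles. Below is a minimal Python sketch of the same heuristic, not part of Scholia: the function name `lacks_word_boundary` and the sample titles are made up for illustration, and Python's `re` module may treat `\b` differently from the query service's regex engine in edge cases. Unlike the draft query, the sketch escapes the taxon name before using it as a pattern.

```python
import re

def lacks_word_boundary(title: str, taxon_name: str) -> bool:
    """Return True if the title mentions the taxon name, but never as a standalone word."""
    title_lc = title.lower()
    name_lc = taxon_name.lower()
    # re.escape is an addition here; the draft query passes the name as a raw pattern.
    pattern = re.escape(name_lc)
    # First FILTER: the name occurs somewhere in the title.
    if not re.search(pattern, title_lc):
        return False
    # Second FILTER: no occurrence has a word boundary on both sides.
    return re.search(r"\b" + pattern + r"\b", title_lc) is None

# Hypothetical titles, not taken from Wikidata.
titles = [
    "Genome of theDrosophila melanogasterY chromosome",    # fused on both sides
    "Genome of the Drosophila melanogaster Y chromosome",  # clean
    "Genome of the zebrafish",                             # no mention at all
]
flagged = [t for t in titles if lacks_word_boundary(t, "Drosophila melanogaster")]
```

Only the first title ends up in `flagged`. A title that contains both a clean and a fused occurrence of the name is not flagged, which matches the behaviour of the negated REGEX in the query.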

[Screenshot: Wikidata Query Service, 2022-03-08 18:48:16]

Which Wikidata entries would be good candidates to explore such visualizations?

Any taxon item with many publications about it.

Anything else?

  • Similar issues exist for other things frequently written in italics or otherwise marked up in publication titles, like formulas or mineral names.
  • Perhaps link to TABernacle rather than preparing QuickStatements.
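
For reference, the last five columns of the draft query (`?work`, `?main_subject`, `?taxon`, `?heuristic`, `?deduced`) are already in the order of a QuickStatements V1 command. A rough Python sketch of that formatting follows, assuming tab-separated V1 syntax; the function name `quickstatements_line` and the work ID `Q12345` are placeholders for illustration only.

```python
def quickstatements_line(work: str, taxon: str,
                         main_subject: str = "P921",
                         heuristic: str = "S887",
                         deduced: str = "Q69652283") -> str:
    """Join one result row into a tab-separated QuickStatements V1 command."""
    return "\t".join([work, main_subject, taxon, heuristic, deduced])

# "Q12345" is a placeholder work ID; Q752130 is the target taxon from the draft query.
line = quickstatements_line("Q12345", "Q752130")
```

Each result row of the query would yield one such line, adding the taxon as main subject of the work with the heuristic recorded as the source.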

Metadata

Labels

  • P1476-title: Wikidata property
  • P225-taxon-name: Wikidata property
  • data-quality: issues related to the quality of the data that Scholia shows
  • missing-data: data relevant to Scholia that is not accessible to it
  • panels: screen space for displaying the result of a query
