Limit SPARQL queries in SparqlFilter to user-edited articles only #52
Thank you, Zache! I agree it would be good to get this sooner rather than later, seeing how often SPARQL-based contests fail.
wikidata_items = self.get_competition_wikidata_ids(self.tpl, site)

# If possible wikidata_items is none then there is no need for SPARQL query
if not wikidata_items:
But what if it returned no Wikidata items because of an error, or because fiwiki-tools is down for some unrelated reason? In that case, I think it should fall back to the old VALUES-less SPARQL query, instead of not running the query at all, like now.
Essentially: I'd like the script to distinguish between "get_competition_wikidata_ids() worked as it should and returned no items" and "get_competition_wikidata_ids() had an exception and therefore returned no items". Does that make sense?
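To make the distinction concrete, here is a minimal sketch of the three outcomes. It is not the patch's code: `plan_sparql_query` and the mode names are hypothetical, and it assumes the ID lookup raises an exception on failure instead of silently returning an empty list.

```python
from typing import Callable, List, Optional, Tuple


def plan_sparql_query(
    fetch_ids: Callable[[], List[str]],
) -> Tuple[str, Optional[List[str]]]:
    """Decide how to run the SPARQL step based on how the ID lookup went.

    `fetch_ids` stands in for a call such as
    self.get_competition_wikidata_ids(self.tpl, site); it is assumed
    (hypothetically) to raise on errors rather than return nothing.
    """
    try:
        ids = fetch_ids()
    except Exception:
        # Lookup failed (for example, the helper service is down):
        # fall back to the old query without a VALUES restriction.
        return ("unrestricted", None)
    if not ids:
        # Lookup worked and found nothing: no SPARQL query is needed.
        return ("skip", [])
    # Lookup worked and found items: restrict the query to them.
    return ("restricted", list(ids))
```

If the lookup currently catches its own exceptions and returns an empty list, it would need to re-raise (or return a separate error flag) for this distinction to be possible at all.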
Have you considered using https://query-main.wikidata.org/ which excludes tens of millions of Wikidata scholarly article items (which are very unlikely to be part of any contest)? It is less likely to time out than https://query.wikidata.org/.
I am not sure excluding scholarly articles would solve the problems we've had lately. Our problems were because we were querying for all women (and then checking which of those women had articles on the supported Wikipedias), and even a simple query takes a long time due to the sheer amount of items involved. Granted, the query-main service was about 3× quicker when I tried the same query on both just now, so it is worth considering, but probably in combination with this new approach instead of as an alternative to it.
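For illustration, restricting a query to a known set of items with a SPARQL `VALUES` clause could look roughly like the sketch below. The helper names are hypothetical and this is not the code in the patch; the "women" pattern (`wdt:P21 wd:Q6581072`) is only an example of the kind of filter discussed here.

```python
import re
from typing import Iterable

# Wikidata item IDs look like "Q42"; anything else is dropped.
_QID = re.compile(r"Q[1-9]\d*")


def values_clause(qids: Iterable[str], var: str = "item") -> str:
    """Build a VALUES clause binding ?var to the given Wikidata items."""
    valid = sorted(
        {q for q in qids if _QID.fullmatch(q)},
        key=lambda q: int(q[1:]),
    )
    if not valid:
        raise ValueError("no valid Wikidata item IDs given")
    return "VALUES ?%s { %s }" % (var, " ".join("wd:" + q for q in valid))


def restricted_query(qids: Iterable[str], pattern: str) -> str:
    """Wrap a graph pattern so it is only evaluated for the given items."""
    return "SELECT ?item WHERE {\n  %s\n  %s\n}" % (values_clause(qids), pattern)


# Example: check only the listed items instead of scanning all women.
query = restricted_query(["Q42", "Q7186"], "?item wdt:P21 wd:Q6581072 .")
```

With the items bound up front, the endpoint only has to test the pattern against the listed items, which is why this should help regardless of which query service is used.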
@zache-fi I tried the patch locally, but it doesn't work like it should. I tried it with this command:
Resulting in this: resultspage.txt
No contributions were counted, even though it spent more than an hour going through the participants' contributions (the long time was because I was starting fresh with a big, already-finished contest, I think, so nothing was saved to the local database).
While testing this, I also came across a small bug, which might have to be fixed on the fiwiki-tools side for now. This command:
I just noticed that on 15 April they will split scholarly articles out from query.wikidata.org, so no need to change the query address :). See Tech News: 2025-16.
It would be useful to have the faster SPARQL query up and running. I will refactor the PHP backend code into Python when I have a bit more time than I do today.