Limit SPARQL queries in SparqlFilter to user-edited articles only #52

Open
wants to merge 1 commit into
base: main

zache-fi
Collaborator

@zache-fi zache-fi commented Apr 5, 2025

It would be useful to have the faster SPARQL query up and running. I will refactor the PHP backend code to Python when I have a bit more time than I do today.

@zache-fi zache-fi linked an issue Apr 5, 2025 that may be closed by this pull request
@jhsoby
Member

jhsoby commented Apr 5, 2025

Thank you, Zache! I agree it would be good to get this sooner rather than later, seeing how often SPARQL-based contests fail.

wikidata_items = self.get_competition_wikidata_ids(self.tpl, site)

# If possible wikidata_items is none then there is no need for SPARQL query
if not wikidata_items:
Member

But what if it returned no Wikidata items because of an error, or because fiwiki-tools is down for some unrelated reason? In that case, I think it should fall back to the old VALUES-less SPARQL query, instead of not running the query at all, like now.

Essentially: I'd like the script to distinguish between "get_competition_wikidata_ids() worked as it should and returned no items" and "get_competition_wikidata_ids() had an exception and therefore returned no items". Does that make sense?
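
A minimal sketch of the distinction asked for above, not the PR's actual code. It assumes the lookup signals failure by raising; `WikidataLookupError`, `safe_lookup`, `build_query`, and the placeholder graph pattern are all hypothetical names introduced for illustration. `None` stands for "the lookup failed", while an empty list stands for "the lookup worked and found nothing".

```python
from typing import Callable, Optional, Sequence


class WikidataLookupError(Exception):
    """Hypothetical error: the lookup of contest items itself failed."""


def safe_lookup(fetch: Callable[[], Sequence[str]]) -> Optional[list]:
    """Call fetch(); return its items as a list, or None if it raised."""
    try:
        return list(fetch())
    except WikidataLookupError:
        return None


def build_query(items: Optional[Sequence[str]]) -> Optional[str]:
    """Return a SPARQL query string, or None when no query is needed.

    items is None  -> lookup failed, fall back to the VALUES-less query
    items is empty -> lookup worked and found nothing, so skip the query
    otherwise      -> restrict the query to the given items via VALUES
    """
    pattern = "?item wdt:P31 wd:Q5 ."  # placeholder pattern, an assumption
    if items is None:
        return f"SELECT ?item WHERE {{ {pattern} }}"
    if not items:
        return None
    values = " ".join(f"wd:{qid}" for qid in items)
    return f"SELECT ?item WHERE {{ VALUES ?item {{ {values} }} {pattern} }}"
```

With this shape, a plain `if not wikidata_items:` check is no longer enough on its own, since it treats `None` and `[]` the same way; the caller has to test `is None` first.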

@Samoasambia

Have you considered using https://query-main.wikidata.org/, which excludes tens of millions of Wikidata scholarly article items (which are very unlikely to be part of any contest)? It is less likely to time out than https://query.wikidata.org/.

@jhsoby
Member

jhsoby commented Apr 5, 2025

I am not sure excluding scholarly articles would solve the problems we've had lately. Our problems arose because we were querying for all women (and then checking which of those women had articles on the supported Wikipedias), and even a simple query takes a long time due to the sheer number of items involved.

Granted, the query-main service was about 3× quicker when I tried the same query on both just now, so it is worth considering, but probably in combination with this new approach instead of as an alternative to it.

@jhsoby
Member

jhsoby commented Apr 10, 2025

@zache-fi I tried the patch locally, but it doesn't work as it should. I tried it with this command:

ukbot --page "Wikipedia:Konkurranser/Månedens_konkurranse_2024-03" --simulate config/config.no-mk.yml --output resultspage.txt

Resulting in this: resultspage.txt

No contributions counted, even though it spent more than an hour going through the participants' contributions (the long time was because I was starting fresh with a big, already-finished contest, I think, so nothing was saved to the local database).

While testing this, I also came across a small bug – it might have to be fixed on the fiwiki-tools side for now. This command: ukbot --page "Wikipedia:Konkurranser/Månedens konkurranse 2024-03" --simulate config/config.no-mk.yml --output resultspage.txt, with spaces instead of underscores in the page name, led to the bot looking up this URL instead of this one.
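
One possible way to make both spellings of the page name resolve to the same lookup is to normalize the title before building the URL. This is only a sketch: `normalize_title`, `page_url`, and the `example.org` base are hypothetical, and the real fiwiki-tools URL format is not shown in this thread.

```python
from urllib.parse import quote


def normalize_title(title: str) -> str:
    """Return a MediaWiki-style title with spaces replaced by underscores."""
    return title.strip().replace(" ", "_")


def page_url(base: str, title: str) -> str:
    """Join a (hypothetical) base URL and the normalized, percent-encoded title."""
    # Keep ":" and "/" readable, since they are common in namespaced subpages.
    return base.rstrip("/") + "/" + quote(normalize_title(title), safe=":/")


with_spaces = "Wikipedia:Konkurranser/Månedens konkurranse 2024-03"
with_underscores = "Wikipedia:Konkurranser/Månedens_konkurranse_2024-03"

# Both spellings of the page name yield the same URL.
same = page_url("https://example.org/wiki", with_spaces) == page_url(
    "https://example.org/wiki", with_underscores
)
```

Normalizing on the bot side would make the command-line input forgiving regardless of whether the remote service is fixed.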

@Samoasambia

Have you considered using https://query-main.wikidata.org/, which excludes tens of millions of Wikidata scholarly article items (which are very unlikely to be part of any contest)? It is less likely to time out than https://query.wikidata.org/.

I just noticed that on 15 April they will split scholarly articles out from query.wikidata.org, so no need to change the query address :). See Tech News: 2025-16.

Successfully merging this pull request may close these issues.

Problematic SPARQL queries