Added Turkish sites: 1000Kitap, Eksi Sozluk, Uludag Sozluk, KizlarSor… #2786
Automatic validation of changes
Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).
Update: I've updated the logic for Turkish sites in Progress:
The current
Thanks for the PR.
The CI errors are typically caused by diacritics or other non-standard characters in error messages, as often found with non-English sites. A quick example from a Russian site: sherlock/sherlock_project/resources/data.json, lines 878 to 884 in 8f1308b.
You can use CyberChef to handle the actual conversion.
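For anyone who prefers to do the conversion locally, here is a minimal sketch using Python's standard `json` module, which writes non-ASCII characters as `\uXXXX` escapes when `ensure_ascii` is on. The helper name `escape_non_ascii` and the sample message are made up for illustration; they are not taken from `data.json` or from any site in this PR.

```python
import json


def escape_non_ascii(message: str) -> str:
    # json.dumps escapes every non-ASCII character as \uXXXX when
    # ensure_ascii=True (the default). It also wraps the result in
    # double quotes, which are stripped here.
    return json.dumps(message, ensure_ascii=True)[1:-1]


# Hypothetical Turkish error message, used only as sample input.
sample = "Kullanıcı bulunamadı"
escaped = escape_non_ascii(sample)
print(escaped)
```

The escaped string is plain ASCII and decodes back to the original text when parsed as a JSON string, so the match against the site's response is unchanged.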
To clarify: the CI error I'm referring to is its outright failure to run. This issue will be replicated on some users' systems as well (often OS-dependent). We can ignore False Negatives due to regional issues; that's fine. False Positives are a hard sell because they introduce a lot of noise, so a catch should be implemented to prevent them. If you're unable to do so (because of the same regional issues), I can probably add a catch for them whenever I have time.
…ate 1000Kitap headers
Hi Paul,
I’ve been working hard to get these Turkish sites integrated correctly, but I had to make a call to keep the results reliable. I decided to move forward with just 1000Kitap and Bionluk and removed the others because they were too "noisy", with consistent False Positives. Here is a quick heads-up on the technical side:
Solving the False Positives (F+): This was the main goal. Both sites were originally failing this check. Since Bionluk uses a "Soft 404" (returning 200 OK for pages that don't exist), I switched its detection to a specific <title> match. For 1000Kitap, moving to the status_code method did the trick. The F+ checks now pass.
The F- Trade-off: Once I cleared the F+ issues, these sites started failing the F- checks (finding existing users). After testing with curl from my end here in Australia and looking at regional IP behavior, my conclusion is that GitHub’s CI servers are being flagged by Turkish WAFs or IP blocks when they try to pull actual profile data.
Prioritizing Accuracy: I remembered your point about false positives being a "hard sell," so I prioritized making sure Sherlock doesn't give misleading "found" results, even if regional blocks make the F- tests look a bit messy on GitHub's end.
I’ve also made sure all the error messages are properly Unicode escaped and the file is fully compliant with the schema.
Thanks for being patient while I figured this out. I think it’s in a much more solid place now.
Best,
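To make the two detection approaches concrete, here is a rough sketch of how a status-code check and a <title> check differ on a soft-404 page. This is not Sherlock's implementation: the function names, the `not_found_title` parameter, and the sample HTML are all hypothetical, and it assumes the <title> text identifies the not-found page.

```python
import re
from typing import Optional


def page_title(html: str) -> Optional[str]:
    # Pull the text of the first <title> element, if there is one.
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def profile_exists(status_code: int, html: str,
                   not_found_title: Optional[str] = None) -> bool:
    # Status-code check: any non-2xx response means no profile.
    if not 200 <= status_code < 300:
        return False
    # Soft-404 check: the server answered 200, so look at the title.
    if not_found_title is not None:
        title = page_title(html)
        if title is not None and not_found_title in title:
            return False
    return True


# Hypothetical soft-404 page: 200 OK, but the title says the page is missing.
soft_404 = "<html><head><title>Page not found</title></head></html>"
print(profile_exists(200, soft_404))                     # status code alone
print(profile_exists(200, soft_404, "Page not found"))   # with title check
```

With the status code alone, the soft-404 page is reported as an existing profile, which is the False Positive described above; adding the title check rejects it.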
Hi team,
I have added support for popular Turkish social media and community platforms to expand Sherlock's coverage in this region.
Added Sites:
Testing:
I have locally tested all added sites using data.json with known valid usernames (e.g., official accounts or founders like zall, ssg, ksbence). All sites returned positive results ([+] Found) and valid profile URLs.
Thank you!