
Conversation

@sedat4ras

Hi team,

I have added support for popular Turkish social media and community platforms to expand Sherlock's coverage in this region.

Added Sites:

  • 1000Kitap: A popular book tracking and review community.
  • Ekşi Sözlük: One of the largest collaborative dictionaries/communities in Turkey.
  • Uludağ Sözlük: Another major community dictionary platform.
  • KizlarSoruyor: A popular Q&A community platform.
  • Bionluk: A freelance services marketplace.

Testing:
I have locally tested all added sites using data.json with known valid usernames (e.g., official accounts or founders like zall, ssg, ksbence). All sites returned positive results ([+] Found) and valid profile URLs.

Thank you!

TerminalSS

@github-actions
Contributor

github-actions bot commented Jan 9, 2026

Automatic validation of changes

| Target | F+ Check | F- Check |
| --- | --- | --- |
| 1000Kitap | ❌ Fail | ✔️ Pass |
| KizlarSoruyor | ❌ Fail | ❌ Fail |
| UludagSozluk | ✔️ Pass | ❌ Fail |
| EksiSozluk | ❌ Fail | ❌ Fail |
| Bionluk | ❌ Fail | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@sedat4ras
Author

Update: I've updated the logic for Turkish sites in data.json.

Progress:

  • UludagSozluk: Fixed! Switched to errorType: message and updated the URL structure to /yazar/. Locally verified and working.
  • 1000Kitap & Bionluk: Locally verified and working correctly.
  • EksiSozluk & KizlarSoruyor: These sites use aggressive WAF/Cloudflare protection that blocks cloud data center IPs (like GitHub Actions). However, I have verified the logic manually from a residential IP.

The current data.json structure is correct for these platforms. Please consider the CI failures as false negatives due to regional/bot protection.
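For reference, the kind of entry described above would take roughly this shape in data.json (a sketch only: the errorMsg text, claimed username, and exact URLs here are placeholders, not the actual values committed):

```json
"UludagSozluk": {
  "errorMsg": "b\u00f6yle bir yazar yok",
  "errorType": "message",
  "url": "https://www.uludagsozluk.com/yazar/{}/",
  "urlMain": "https://www.uludagsozluk.com/",
  "username_claimed": "placeholder"
}
```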

@ppfeister
Member

Thanks for the PR.

The CI errors are typically caused by the use of diacritics or other non-standard characters in error messages (as often found with non-English sites). They should be resolved by converting those messages into escaped Unicode strings.

Quick example here from a Russian site:

"Football": {
"errorMsg": "\u041f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044c \u0441 \u0442\u0430\u043a\u0438\u043c \u0438\u043c\u0435\u043d\u0435\u043c \u043d\u0435 \u043d\u0430\u0439\u0434\u0435\u043d",
"errorType": "message",
"url": "https://www.rusfootball.info/user/{}/",
"urlMain": "https://www.rusfootball.info/",
"username_claimed": "solo87"
},

And you can use CyberChef to handle the actual conversion (the recipe below is pre-loaded with the text from the example above to demonstrate):

https://gchq.github.io/CyberChef/#recipe=Escape_Unicode_Characters('%5C%5Cu',false,4,false)&input=0J/QvtC70YzQt9C%2B0LLQsNGC0LXQu9GMINGBINGC0LDQutC40Lwg0LjQvNC10L3QtdC8INC90LUg0L3QsNC50LTQtdC9&oenc=65001
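If you'd rather not use a web tool, the same conversion can be done in a few lines of Python. This is a minimal sketch, not part of Sherlock itself; it assumes the error message contains only characters in the Basic Multilingual Plane (which covers Turkish and Cyrillic text):

```python
def escape_unicode(s: str) -> str:
    # Escape every non-ASCII character as \uXXXX, matching the
    # CyberChef recipe above. Assumes BMP characters only.
    return "".join(
        ch if ord(ch) < 0x80 else "\\u{:04x}".format(ord(ch))
        for ch in s
    )

# The Russian error message from the example entry:
print(escape_unicode("Пользователь с таким именем не найден"))
```

The output can be pasted directly into the `errorMsg` field of the data.json entry.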

@ppfeister
Member

To clarify --

The CI error I'm referring to is its outright failure to run. This issue will be replicated on some users' systems as well (often OS-dependent).

We can ignore False Negatives due to regional issues, however; that's fine. False Positives are a hard sell because they introduce a lot of noise --- a catch should be implemented to prevent them. If you're unable to do so (because of the same regional issues), I can probably add a catch for them whenever I have time.

@github-actions
Contributor

Automatic validation of changes

| Target | F+ Check | F- Check |
| --- | --- | --- |
| Bionluk | ❌ Fail | ✔️ Pass |
| UludagSozluk | ❌ Fail | ❌ Fail |
| 1000Kitap | ❌ Fail | ✔️ Pass |
| KizlarSoruyor | ❌ Fail | ❌ Fail |
| EksiSozluk | ❌ Fail | ❌ Fail |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@github-actions
Contributor

Automatic validation of changes

| Target | F+ Check | F- Check |
| --- | --- | --- |
| 1000Kitap | ✔️ Pass | ❌ Fail |
| Bionluk | ❌ Fail | ✔️ Pass |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@github-actions
Contributor

Automatic validation of changes

| Target | F+ Check | F- Check |
| --- | --- | --- |
| 1000Kitap | ✔️ Pass | ❌ Fail |
| Bionluk | ✔️ Pass | ❌ Fail |

Failures were detected on at least one updated target. Commits containing accuracy failures will often not be merged (unless a rationale is provided, such as false negatives due to regional differences).

@sedat4ras
Author


Hi Paul,

I’ve been working hard to get these Turkish sites integrated correctly, but I had to make a judgment call to keep the results reliable. I decided to move forward with just 1000Kitap and Bionluk and removed the others, because they were simply too noisy, with consistent False Positives.

I wanted to give you a quick heads-up on the technical side of things:

Solving the False Positives (F+): This was the main goal, and both sites were originally failing it. Since Bionluk returns a "soft 404" (HTTP 200 OK for pages that don't exist), I switched its detection to a specific <title> match. For 1000Kitap, moving to the status_code method did the trick. The F+ checks now pass consistently.
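To make the two detection styles concrete, the entries follow this general shape (a sketch only: the errorMsg string, URLs, and usernames below are illustrative placeholders, not the exact values in the commit):

```json
"Bionluk": {
  "errorMsg": "<title>Sayfa Bulunamad\u0131</title>",
  "errorType": "message",
  "url": "https://bionluk.com/{}",
  "urlMain": "https://bionluk.com/",
  "username_claimed": "placeholder"
},
"1000Kitap": {
  "errorType": "status_code",
  "url": "https://1000kitap.com/{}",
  "urlMain": "https://1000kitap.com/",
  "username_claimed": "placeholder"
}
```

With errorType "message", Sherlock flags a username as absent when the errorMsg string appears in the response body; with "status_code", it relies on the HTTP status alone.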

The F- Trade-off: Interestingly, once I cleared the F+ issues, these sites started failing the F- checks (finding existing users). After testing with curl from my end here in Australia and looking at regional IP behavior, it’s clear that GitHub’s CI servers are being flagged by Turkish WAFs or IP blocks when they try to pull actual profile data.

Prioritizing Accuracy: I remembered your point about false positives being a "hard sell," so I prioritized making sure Sherlock doesn't give misleading "found" results, even if regional blocks make the F- tests look a bit messy on GitHub's end.

I’ve also made sure all the error messages are properly Unicode escaped and the file is fully compliant with the schema. Thanks for being patient while I figured this out—I think it’s in a much more solid place now.

Best,
