Commit ec82fe7

Change robots.txt #1709
We had an old robots.txt for the wmflabs domain, where Scholia was served from a subdirectory. Since the Toolforge Scholia domain changed to scholia.toolforge.org, the robots.txt no longer had a correct path and should thus have been ineffective. Search engines index dynamic content on Scholia pages differently: Bing and Qwant seem to index the content, but DuckDuckGo and Google apparently do not, see #1709. With this change, not only is the path changed, but bots are now allowed. If this results in too much load on the Toolforge infrastructure then it should be changed to a 'Disallow: /'. Note that the 'robots' HTML meta tag on each Scholia page has a nofollow to avoid crawling.
1 parent 1c66181 commit ec82fe7

1 file changed: +22 -2 lines changed

scholia/app/views.py

Lines changed: 22 additions & 2 deletions
@@ -1984,14 +1984,34 @@ def show_publisher_empty():
 def show_robots_txt():
     """Return robots.txt file.
 
+    A robots.txt file is returned that allows bots to index Scholia.
+
     Returns
     -------
     response : flask.Response
-        Rendered HTML for publisher index page.
+        Rendered plain text with robots.txt content.
+
+    Notes
+    -----
+    The default robots.txt for Toolforge hosted tools is
+
+    User-agent: *
+    Disallow: /
+
+    Scholia's function returns a robots.txt with 'Allow' for all. We would like
+    bots to index, but not crawl Scholia. Crawling is also controlled by the
+    HTML meta tag 'robots' thatis set to the content: noindex, nofollow on all
+    pages. So Scholia's robots.txt is:
+
+    User-agent: *
+    Allow: /
+
+    If this results in too much crawling or load on the Toolforge
+    infrastructure then it should be changed.
 
     """
     ROBOTS_TXT = ('User-agent: *\n'
-                  'Disallow: /scholia/\n')
+                  'Allow: /\n')
     return Response(ROBOTS_TXT, mimetype="text/plain")
 
 
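The effect of the three rule sets mentioned in this commit can be checked with Python's standard-library robots.txt parser. The following is a minimal sketch, not part of the commit: the helper `allowed` and the example paths `/author/Q1` and `/scholia/author/Q1` are hypothetical and chosen only to illustrate the old subdirectory layout versus the new root layout.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Rule sets taken from the diff and its docstring.
OLD = "User-agent: *\nDisallow: /scholia/\n"        # removed by this commit
NEW = "User-agent: *\nAllow: /\n"                   # added by this commit
TOOLFORGE_DEFAULT = "User-agent: *\nDisallow: /\n"  # default quoted in the docstring

# Hypothetical paths: old subdirectory layout and new root layout.
OLD_LAYOUT_PATH = "/scholia/author/Q1"
NEW_LAYOUT_PATH = "/author/Q1"

for name, rules in [("old", OLD), ("new", NEW), ("default", TOOLFORGE_DEFAULT)]:
    print(name,
          "old-layout:", allowed(rules, OLD_LAYOUT_PATH),
          "new-layout:", allowed(rules, NEW_LAYOUT_PATH))
```

Under these assumptions, the old 'Disallow: /scholia/' rule blocks only paths below /scholia/, so it does not match pages served from the root of the new domain, which is consistent with the commit message calling it ineffective. Note that robots.txt only governs fetching; the 'robots' meta tag on each page is a separate mechanism that this sketch does not model.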