Commit 38e2b10

Add external_urls filter
This filter traverses all <a> tags and replaces each external URL with a URL pointing to the corresponding path in existing documentation.
1 parent e9d7849 commit 38e2b10

6 files changed, +59 -1 lines changed

docs/filter-reference.md

+1
@@ -84,6 +84,7 @@ The `call` method must return either `doc` or `html`, depending on the type of f
 * [`AttributionFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/attribution.rb) — appends the license info and link to the original document
 * [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) — prepends the document with a title (disabled by default)
 * [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) — abstract filter for extracting the page's metadata
+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) — replaces external URLs with relative URLs of existing DevDocs documentation.

 ## Custom filters

docs/scraper-reference.md

+5
@@ -115,6 +115,7 @@ Additionally:

 * [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) is a core HTML filter, disabled by default, which prepends the document with a title (`<h1>`).
 * [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) is an abstract HTML filter that each scraper must implement and that is responsible for extracting the page's metadata.
+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) is an HTML filter that replaces external URLs found in `<a>` tags with URLs pointing to existing DevDocs documentation.

 ### Filter options

@@ -185,6 +186,10 @@ More information about how filters work is available on the [Filter Reference](.

   _Note: this filter is disabled by default._

+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb)
+
+  - `:external_urls` [Hash or Proc] If it is a Hash, each key is matched against the host of every URL found in an `<a>` tag, and matching URLs are replaced with a relative URL into the DevDocs documentation named by the corresponding value. If it is a Proc, it is called with a URL (String) as its argument and should return a relative URL pointing to existing DevDocs documentation. See [`backbone.rb`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/scrapers/backbone.rb) for an example.
+
 ## Keeping scrapers up-to-date

 In order to keep scrapers up-to-date the `get_latest_version(opts)` method should be overridden. If `self.release` is defined, this should return the latest version of the documentation. If `self.release` is not defined, it should return the Epoch time when the documentation was last modified. If the documentation will never change, simply return `1.0.0`. The result of this method is periodically reported in a "Documentation versions report" issue which helps maintainers keep track of outdated documentation.

lib/docs/core/filter.rb

+10
@@ -96,5 +96,15 @@ def clean_path(path)
       path = path.gsub %r{\+}, '_plus_'
       path
     end
+
+    def path_to_root
+      if subpath == ''
+        return '../'
+      else
+        previous_dirs = subpath.scan(/\//)
+        return '../' * previous_dirs.length
+      end
+    end
+
   end
 end
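
To make the behavior of `path_to_root` easier to follow, here is a minimal standalone sketch. It assumes `subpath` is a plain String passed in as an argument (in the filter it is a helper method), and the sample inputs are hypothetical rather than taken from a real scrape.

```ruby
# Standalone sketch of the path_to_root logic from the hunk above.
# Assumption: `subpath` is passed in explicitly instead of being read
# from the filter instance.
def path_to_root(subpath)
  # An empty subpath still needs one step up to leave the current documentation.
  return '../' if subpath == ''

  # One '../' per '/' found in the subpath.
  '../' * subpath.scan('/').length
end

puts path_to_root('')            # prints "../"
puts path_to_root('/api/model')  # prints "../../"
```

Note that a non-empty subpath without any `/` yields an empty string, because the method only counts slashes.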

lib/docs/core/scraper.rb

+1-1
@@ -41,7 +41,7 @@ def stub(path, &block)
     self.html_filters = FilterStack.new
     self.text_filters = FilterStack.new

-    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email'
+    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email', 'external_urls'
     text_filters.push 'images' # ensure the images filter runs after all html filters
     text_filters.push 'inner_html', 'clean_text', 'attribution'

lib/docs/filters/core/external_urls.rb

+38
@@ -0,0 +1,38 @@
+# frozen_string_literal: true
+
+module Docs
+  class ExternalUrlsFilter < Filter
+
+    def call
+      if context[:external_urls]
+
+        root = path_to_root
+
+        css('a').each do |node|
+
+          next unless anchorUrl = node['href']
+
+          # avoid links already converted to internal links
+          next if anchorUrl.match?(/\.\./)
+
+          if context[:external_urls].is_a?(Proc)
+            node['href'] = context[:external_urls].call(anchorUrl)
+            next
+          end
+
+          url = URI(anchorUrl)
+
+          context[:external_urls].each do |host, name|
+            if url.host.to_s.match?(host)
+              node['href'] = root + name + url.path.to_s + '#' + url.fragment.to_s
+            end
+          end
+
+        end
+      end
+
+      doc
+    end
+
+  end
+end
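
The Hash branch of the filter can be sketched on its own for a single `href`, as below. The helper `rewrite_external_href` is hypothetical and not part of the commit, and the sketch makes three assumptions that differ from the committed code: it rescues `URI::InvalidURIError` so an unparsable link is left untouched, it appends `#` only when a fragment is present, and it stops at the first matching host rather than letting the last match win.

```ruby
require 'uri'

# Hypothetical helper (not part of the commit): rewrites one href using a
# host => documentation-name mapping, mirroring the Hash branch of the filter.
def rewrite_external_href(href, mapping, root)
  # Leave links that were already converted to relative paths untouched.
  return href if href.include?('..')

  begin
    url = URI(href)
  rescue URI::InvalidURIError
    return href # assumption: unparsable hrefs are kept as-is
  end

  mapping.each do |host, name|
    next unless url.host.to_s.match?(host)

    rewritten = root + name + url.path.to_s
    # Assumption: only append '#' when the link actually has a fragment.
    rewritten += "##{url.fragment}" if url.fragment
    return rewritten
  end

  href # no host matched, so the link stays external
end

mapping = { 'underscorejs.org' => 'underscore' }
puts rewrite_external_href('https://underscorejs.org/#each', mapping, '../')
# prints "../underscore/#each"
```

As in the filter, a String key is treated as a pattern by `String#match?`, so a `.` in the host matches any character.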

lib/docs/scrapers/backbone.rb

+4
@@ -21,6 +21,10 @@ class Backbone < UrlScraper
       Licensed under the MIT License.
     HTML

+    options[:external_urls] = {
+      'underscorejs.org' => 'underscore'
+    }
+
     def get_latest_version(opts)
       doc = fetch_doc('https://backbonejs.org/', opts)
       doc.at_css('.version').content[1...-1]
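
The hunk above uses the Hash form of `:external_urls`. For comparison, a Proc form might look like the sketch below. The lambda and its target path are hypothetical. Because the filter passes only the `href` String to the Proc, the sketch assumes the page sits at the documentation root, so that `../underscore` is the right relative prefix, and it assumes links to other hosts should be returned unchanged.

```ruby
require 'uri'

# Hypothetical Proc for options[:external_urls]. Per the filter, it receives
# each href as a String, and its return value becomes the new href.
external_urls = lambda do |href|
  url = begin
    URI(href)
  rescue URI::InvalidURIError
    nil # assumption: unparsable links are passed through unchanged
  end

  if url && url.host == 'underscorejs.org'
    # Assumption: '../underscore' is only correct for pages at the doc root.
    target = "../underscore#{url.path}"
    url.fragment ? "#{target}##{url.fragment}" : target
  else
    href # every other link is returned as-is
  end
end

puts external_urls.call('https://underscorejs.org/#map')
# prints "../underscore/#map"
```

A lambda satisfies the filter's `is_a?(Proc)` check. Since the Proc branch assigns the return value to every link that is not already relative, returning the original `href` for unmatched links is what keeps them intact.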
