Scrape github home for repo description, stars, and tags #511

Open
wants to merge 1 commit into base: ros2
4 changes: 4 additions & 0 deletions _includes/repo_summary.html
@@ -4,6 +4,10 @@
{% endcomment %}
<link rel="stylesheet" href="{{ "/css/ci-status.css" | prepend: site.baseurl }}">
<table class="table table-condensed">
<tr>
<td class="text-right"><b>Description</b></td>
<td><span class="label label-default">{{snapshot.data.description}}</span></td>
</tr>
<tr>
<td style="width:100px;" class="text-right"><b>Checkout URI</b></td>
<td><a class="label label-default" href="{{repo.uri}}">{{repo.uri}}</a></td>
10 changes: 8 additions & 2 deletions _layouts/search_packages.html
@@ -246,6 +246,7 @@ <h2>More resources</h2>
const LEFT_ARROW = "\u2B05";
const RIGHT_ARROW = "\u27A1";
const CIRCLED_DOT = "\u2299";
const BOLT = "\u26A1";
const isRepos = getIsRepos();

const tableSort = searchArray ? SCORE_SORT : (isRepos ? REPO_SORT : PACKAGE_SORT);
@@ -273,14 +274,18 @@ <h2>More resources</h2>
{ title:"Package", field:"url", width:200,
formatter:"link", formatterParams:{labelField:"package", urlPrefix:"{{site.baseurl}}"}},
{ title:"Description", field:"description", minWidth:isRepos ? 160: 300},
{ title:STAR, field:"released", width:40, formatter:"tickCross", headerTooltip:"released: release status", hozAlign:"center",
{ title:BOLT, field:"released", width:40, formatter:"tickCross", headerTooltip:"released: release status", hozAlign:"center",
formatterParams: {allowTruthy: true, allowEmpty: true}},
{ title:LEFT_ARROW, field:"pkg_deps", width:40, headerTooltip:"pkg_deps: package dependency count"},
{ title:RIGHT_ARROW, field:"dependants", width:40, headerTooltip:"dependants: package used by count"},
{ title:CIRCLED_DOT, field:"core", width:40, formatter:"tickCross", headerTooltip:"core: Is dependency of a core package?",
hozAlign:"center", formatterParams: {allowTruthy: true}, sorter:function(a, b, aRow, bRow, column, dir, sorterParams) {
return sorterParams.indexOf(a) - sorterParams.indexOf(b);
}, sorterParams:['ros_core', 'ros_base', 'desktop', 'desktop_full']},
{ title:STAR, field:"stars", width:40, sorter:"number", headerTooltip:"Repo stars", formatter:function(cell, formatterParams, onRendered) {
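const value = cell.getValue();
// stars may be null or empty when the repo page could not be scraped
if (value === null || value === undefined || value === "") { return ""; }
return value >= 10000 ? Math.floor(value / 1000).toString() + "K" : value.toString();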
}},
{ title:"Authors", field:"authors", width:120},
{ title:"Maintainers", field:"maintainers", width:120},
{ title:"Org", field:"org", width:100},
@@ -313,7 +318,8 @@ <h2>More resources</h2>
tableOptions['groupHeader'] = (value, count, data, group) => {
header =
`<span style='color:#d00;'>${data[0]['last_commit_time']}</span>` +
`<span style='margin-left:11px;'><a href="/r/${value}/#${distro}">${value}</span>`;
`<span style='margin-left:11px;'><a href="/r/${value}/#${distro}">${value}</a></span>` +
`<span style='margin-left:11px;' title='${data[0]['repo_description']}'>${data[0]['repo_description']}</span>`
return header;
}
}
81 changes: 78 additions & 3 deletions _plugins/rosindex_generator.rb
@@ -7,6 +7,7 @@
require 'colorator'
require 'fileutils'
require 'find'
require 'nokogiri'
require 'rexml/document'
require 'rexml/xpath'
require 'pathname'
@@ -30,6 +31,7 @@

$fetched_uris = {}
$debug = false
$repo_scrape = {}
DEFAULT_LANGUAGE_PREFIX = 'en'
HEAVY_CHECKMARK = "\u2714"
HEAVY_MINUS = "\u2796"
@@ -647,6 +649,69 @@ def find_packages(site, distro, repo, snapshot, local_path)
return packages
end

def fetch(uri_str, limit = 10)
# get an http response, accounting for redirects.
if limit <= 0 then
raise StandardError.new "Redirect limit exceeded for #{uri_str}"
end
response = Net::HTTP.get_response(URI(uri_str))
case response
when Net::HTTPSuccess
response
when Net::HTTPRedirection
fetch(response['location'], limit - 1)
else
response.error!
end
end
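
A note on `fetch`: the `Location` header of a redirect may be a relative reference, in which case `URI(response['location'])` would have no host to connect to. A minimal sketch, assuming a hypothetical helper `resolve_redirect` (not part of this PR), of resolving the header against the request URI before recursing:

```ruby
require 'uri'

# Hypothetical helper (not in the PR): resolve a redirect Location header
# against the URI that produced it, so relative redirects keep their host.
def resolve_redirect(request_uri, location)
  URI.join(request_uri, location).to_s
end

# A relative Location is resolved against the original host.
puts resolve_redirect('https://github.com/old-org/repo', '/new-org/repo')
# An absolute Location is returned unchanged.
puts resolve_redirect('https://github.com/old-org/repo', 'https://example.com/repo')
```

Inside `fetch`, the recursive call could then pass `resolve_redirect(uri_str, response['location'])` instead of the raw header value.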

def scrape_repo_page(uri_s)
# Scrapes a github repository home page to get various items
begin
# cache the results since this only depends on the repo uri
if $repo_scrape.key?(uri_s) then
return $repo_scrape[uri_s]
end

repo_uri = URI(uri_s)
if repo_uri.host == 'github.com' then
response = fetch(uri_s)
document = Nokogiri::HTML(response.body)
element = document.at(".Layout-sidebar .octicon-star + strong")
if element then
star_count_f = element.text.to_f
if element.text.include? 'k' then
star_count_f = 1000 * star_count_f
end
star_count = star_count_f.to_i
else
star_count = nil
end
element = document.at('.Layout-sidebar p')
description = if element then element.text.strip else '' end
tag_elements = document.css('h3:contains("Topics") + div a')
tags = []
tag_elements.each do |element|
tags.push(element.text.strip)
end
repo_parms = {
stars: star_count,
description: description,
tags: tags,
}
else
repo_parms = {}
end

rescue => e
puts "Error in scrape_repo_page: #{ e.message } for #{ uri_s }"
repo_parms = {}
end

$repo_scrape[uri_s] = repo_parms
return repo_parms
end
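
The star-count conversion in `scrape_repo_page` (a decimal label with an optional `k` suffix) can be isolated and checked without network access. A minimal sketch, assuming a hypothetical helper `parse_star_count` and assuming GitHub renders large counts as labels such as `1.2k`; it rounds rather than truncates so that floating-point error cannot drop a star:

```ruby
# Hypothetical helper (not in the PR): convert a star label such as
# "847" or "1.2k" into an Integer, or nil when the label is unusable.
def parse_star_count(text)
  label = text.to_s.strip.downcase
  return nil if label.empty?

  # A number (digits, optional '.' or ','), then an optional 'k' suffix.
  match = label.match(/\A(\d[\d.,]*)\s*(k)?\z/)
  return nil unless match

  number = match[1].delete(',').to_f
  multiplier = match[2] ? 1_000 : 1
  # round instead of to_i so a product just below a whole number is not truncated
  (number * multiplier).round
end

p parse_star_count('847')
p parse_star_count('1.2k')
p parse_star_count('n/a')
```

Returning `nil` for an unparseable label keeps the "no star element" and "unexpected label" cases distinguishable from a genuine count of zero.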

# scrape a version of a repository for packages and their contents
def scrape_version(site, repo, distro, snapshot, vcs)

@@ -656,6 +721,7 @@ def scrape_version(site, repo, distro, snapshot, vcs)
end

# initialize this snapshot data
repo_page = scrape_repo_page(repo.uri)
data = snapshot.data = {
# get the uri for resolving raw links (for images, etc)
'raw_uri' => get_raw_uri(repo.uri, repo.type, snapshot.version),
@@ -665,7 +731,11 @@ def scrape_version(site, repo, distro, snapshot, vcs)
'readme' => nil,
'readme_rendered' => nil,
'contributing' => nil,
'contributing_rendered' => nil}
'contributing_rendered' => nil,
'stars' => repo_page.fetch(:stars, ''),
'description' => repo_page.fetch(:description, ''),
'tags' => repo_page.fetch(:tags, []),
}
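
One detail worth noting about the defaults above: `Hash#fetch` returns its default only when the key is absent, not when the key is present with a `nil` value. Since `scrape_repo_page` stores `stars: nil` when no star element is found, `'stars'` can still end up `nil`. A small sketch (the `stars_or_default` helper is hypothetical, not part of the PR):

```ruby
# Hash#fetch only falls back to the default when the key is missing.
missing = {}
present_nil = { stars: nil }

p missing.fetch(:stars, '')      # key absent: the default is used
p present_nil.fetch(:stars, '')  # key present with nil: nil is returned

# Hypothetical helper: treat a nil value the same as a missing key.
def stars_or_default(repo_page, default = '')
  value = repo_page[:stars]
  value.nil? ? default : value
end

p stars_or_default(present_nil)
p stars_or_default({ stars: 42 })
```

Whether an empty string or `nil` is the better placeholder depends on how the search table's formatter treats each; the sketch only shows how to make the choice explicit.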

# load the repo readme for this branch if it exists
data['readme_rendered'], data['readme'] = get_readme(
@@ -698,7 +768,10 @@ def scrape_version(site, repo, distro, snapshot, vcs)
snapshot.packages[package_name] = package

# collect tags from discovered packages
repo.tags = Set.new(repo.tags).merge(package_data['tags']).to_a
repo.tags = Set.new(repo.tags).merge(package_data['tags'])

# add any tags placed on a repo
repo.tags = repo.tags.merge(data['tags']).to_a

# collect wiki data
package.data['wiki'] = @wiki_data[package_name]
@@ -1580,7 +1653,9 @@ def generate(site)
'pkg_deps' => p['pkg_deps'].length,
'dependants' => p['dependants'].length,
'readme' => readme_filtered,
'org' => URI(repo.uri).path.split('/')[1]
'org' => URI(repo.uri).path.split('/')[1],
'stars' => repo_snapshot.data['stars'],
'repo_description' => repo_snapshot.data['description'],
}

dputs 'indexed: ' << "#{package_name} #{instance_id} #{distro}"
3 changes: 2 additions & 1 deletion help/package_list.md
@@ -13,11 +13,12 @@ breadcrumbs: ['help']

- Package: Package name.
- Description: Package description.
- Release status (): Release status of the package. A checkmark signifies the package is released.
- Release status (⚡): Release status of the package. A checkmark signifies the package is released.
- Core (⊙): Is this a dependency of a core package? (a dependent of ros_core, ros_base, desktop, or desktop_full).
- Last commit date (📅): The last date a commit was made to the package's repository.
- Package dependency count (⬅): How many packages this package depends on.
- Package used by count(➡): How many packages use this package as a dependency.
- Stars (★): GitHub star count of the package's repository.
- Authors: Names listed in package.xml as authors.
- Maintainers: Names listed in package.xml as maintainers.
- Repo: Repository containing this package.