The rosdistro cache is actively maintained by the OSRF buildfarm (https://github.com/ros-infrastructure/rosdistro), and it contains effectively all of the content we need for the index: all of the distro information as well as the package.xml content. We can likely eliminate most or even all of the crawling by iterating over the rosdistro API and using the cache, which should let us build this site much faster.
@rkent this is what I was mentioning about avoiding the full crawl in Jekyll.
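
To make the idea concrete, here is a rough sketch of iterating over the cache instead of crawling. It assumes the `rosdistro` Python package is installed and that `get_index_url`, `get_index`, `get_cached_distribution`, `release_packages`, and `get_release_package_xml` behave as I understand them, so please check against the installed version. The helper names `summarize_package_xml` and `iter_cached_packages` and the choice of fields are hypothetical, not anything the site currently has.

```python
import xml.etree.ElementTree as ET


def summarize_package_xml(xml_text):
    """Pull a few index-relevant fields out of one package.xml string."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "version": root.findtext("version"),
        "description": (root.findtext("description") or "").strip(),
    }


def iter_cached_packages(distro_name):
    """Yield (package name, summary) for each released package in a distro.

    Reads manifests from the rosdistro cache rather than cloning repositories.
    """
    # Imported lazily so the parsing helper above works without rosdistro.
    from rosdistro import get_cached_distribution, get_index, get_index_url

    index = get_index(get_index_url())
    distro = get_cached_distribution(index, distro_name)
    for pkg_name in sorted(distro.release_packages):
        xml_text = distro.get_release_package_xml(pkg_name)
        if xml_text:  # assumption: some packages may have no cached manifest
            yield pkg_name, summarize_package_xml(xml_text)
```

Something like `for name, info in iter_cached_packages("noetic"): ...` (the distro name is just an example) could then feed the site data directly, with crawling kept only for anything the cache turns out not to cover.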