TheMooseman/wiki-xml-parser

About The Project

This is a simple tool I built to parse the English Wikipedia XML dump. It uses goroutines to parse the 105 GB file (its size at the time of writing) and builds a map that pairs each page's name with the links it contains to other pages. The output is essentially an adjacency list in NDJSON format.

Built With

Go

License

Distributed under the GPLv3 license.

About

Wikipedia XML parser to make a graph of pages.
