Description
This is a doozy, so watch out! Long-term project, no immediate action needed.
We have (and can generate anytime) a full export of Google Groups content in .mbox format using Google Takeout. It's all public information except people's email addresses.
Someday, we may want to import all of these as nodes, back-dated using the timestamp data, to make them searchable in PublicLab.org. This might involve several challenges:
- matching email addresses to usernames where possible
- displaying an alert that these were auto-imported from Google Groups, with a link to the original URL
- ability to display "users" for each email address that does NOT have a matching user account
- whether to forward comment responses to these legacy nodes to everyone in that discussion using the old emails
- how to display a thread -- initial post as a node, then all responses as comments?
- how to ensure "reply back quoted text" is not displayed since it'll be disruptive (similar to reply by email filtering)
- how to actually run the import script using mbox data
  - maybe via https://github.com/darthbatman/mbox-json plus a Ruby script?
  - do a test run of just one to see how it looks
- what tags to use automatically per-list?
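
To make the threading and back-dating questions above more concrete, here is a minimal sketch of reading an mbox export and grouping it into "first post plus replies". It is written in Python only because the standard library ships an mbox parser; the real import would more likely be a Ruby script inside the Rails app. The function names and the `node` / `comments` dictionary shape are assumptions for illustration, not anything that exists in PublicLab.org today.

```python
import mailbox
from collections import defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def plain_text_body(msg):
    """Return the first text/plain part of a message, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def load_threads(mbox_path):
    """Group an mbox export into threads: earliest message as node, rest as comments."""
    grouped = defaultdict(list)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            msg_id = (msg.get("Message-ID") or "").strip()
            references = (msg.get("References") or "").split()
            in_reply_to = (msg.get("In-Reply-To") or "").strip()
            # Assumption: the first References entry is the thread's first post.
            # Replies carrying only In-Reply-To are grouped under their direct
            # parent, which may not be the true root of a deep thread.
            root_id = references[0] if references else (in_reply_to or msg_id)
            grouped[root_id].append({
                "message_id": msg_id,
                "email": parseaddr(msg.get("From", ""))[1].lower(),
                "subject": msg.get("Subject", ""),
                # Original send time, to back-date the imported node or comment.
                # Raises if the Date header is missing or malformed.
                "sent_at": parsedate_to_datetime(msg["Date"]),
                "body": plain_text_body(msg),
            })
    finally:
        box.close()

    threads = []
    for messages in grouped.values():
        messages.sort(key=lambda m: m["sent_at"].timestamp())
        threads.append({"node": messages[0], "comments": messages[1:]})
    return threads
```

Running this on a single small list export first would be a cheap way to do the "test run of just one" before deciding how nodes and comments should look.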
I'm sure there's more. This is a starting list.
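
Two of the items in the list above (hiding quoted reply text, and showing a "user" for addresses with no account) can be sketched the same way. This is a rough Python illustration under stated assumptions: the `On ... wrote:` pattern only covers one common English reply header, `users_by_email` is a hypothetical lookup table built from existing accounts, and the `google-groups-member-` placeholder name is made up for this example. The placeholder is derived from a hash so the private email address never appears on the page.

```python
import hashlib
import re

# Matches a typical English reply header such as "On Mon, Jan 1, 2018, Alice wrote:".
# Other mail clients and languages would need additional patterns.
QUOTE_HEADER = re.compile(r"^On .+ wrote:\s*$")


def strip_quoted_reply(body):
    """Drop '>' quoted lines and everything after an 'On ... wrote:' header."""
    kept = []
    for line in body.splitlines():
        if QUOTE_HEADER.match(line):
            break  # treat the rest of the message as quoted history
        if line.lstrip().startswith(">"):
            continue  # skip inline quoted lines
        kept.append(line)
    return "\n".join(kept).strip()


def resolve_author(email, users_by_email):
    """Map an email to an existing username, or to a stable anonymous placeholder."""
    normalized = email.strip().lower()
    username = users_by_email.get(normalized)
    if username:
        return {"display_name": username, "placeholder": False}
    # Hypothetical naming scheme: the same address always gets the same
    # label, but the label itself does not contain the address.
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:8]
    return {"display_name": f"google-groups-member-{digest}", "placeholder": True}
```

Whatever filtering already exists for reply-by-email would probably be the better starting point for the quoted-text rules; this only shows the shape of the problem.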