Description
There's wide room to scope this up or down for MVP (including possibly nothing for MVP). Many of us have had good past experiences with a built-in facility to collect information from deployed systems and make it directly available to our support and engineering teams. Much needs to be considered around customer consent and security (see RFDs 94 and 354).
Examples of stuff that's pretty easy and valuable to collect:
- `zpool get all` for all zpools
- `zfs get all` for all zfs datasets
- `svcs -Zap` for all sleds (all SMF service states in all zones, plus running processes)
- `ptree` for all sleds (all processes)
- for processes we care about, maybe: `pargs`, `pargs -e`, `pstack`, `pfiles` (but see below)
- log files (e.g., SMF log files, syslog, FMA ereport and fault logs, CockroachDB logs)
- existing core files (assumes we've established some place to put these)
Slightly more invasive but probably safe enough would also be:
- `gcore` for any processes we care particularly about (e.g., Nexus, Sled Agent)
- `cockroach debug zip` (their own support bundles)
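To give a sense of how small the collection step itself could be, here is a minimal sketch in Python. It is not a proposed implementation: the `collect` function, the output file names, and the flat one-file-per-command layout are all assumptions made for illustration. It runs each command, saves whatever it printed, and records a failure instead of aborting the whole bundle.

```python
import subprocess
from pathlib import Path

# Hypothetical command list built from the examples above. The file names
# are illustrative assumptions, not an established bundle layout.
DEFAULT_COMMANDS = {
    "zpool-get-all.txt": ["zpool", "get", "all"],
    "zfs-get-all.txt": ["zfs", "get", "all"],
    "svcs-Zap.txt": ["svcs", "-Zap"],
    "ptree.txt": ["ptree"],
}


def collect(commands, outdir, timeout=60):
    """Run each command, save its output under outdir, return a status map.

    The map holds the exit code per file name, or None when the command
    could not be run at all (missing binary, timeout).
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    status = {}
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            (outdir / name).write_text(result.stdout + result.stderr)
            status[name] = result.returncode
        except (OSError, subprocess.TimeoutExpired) as err:
            # Keep going: a partial bundle is still useful to support.
            (outdir / name).write_text(f"collection failed: {err}\n")
            status[name] = None
    return status
```

Something like `collect(DEFAULT_COMMANDS, "/tmp/bundle")` would be the per-sled entry point (the path is a placeholder). The sketch ignores the harder parts called out in this issue: consent, security, and where the output ends up.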
This sounds like a lot, but I think it's fairly straightforward to collect most of this. More of the work seems to be in figuring out the security and privacy issues, finding temporary storage while we're assembling the bundles, and then putting the bundles somewhere that we can access.
We can also start with very little and augment the collection facility with software updates. In past systems I've used, we tagged different kinds of data. A standard service bundle would collect a default set of tags. More specific bundles could be requested that would collect additional data that was either too invasive or too expensive to collect by default.
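The tagging idea can be sketched in a few lines of Python. Everything here is hypothetical: the `Collector` type, the tag names, and which item gets which tag are assumptions for illustration, not a description of any past system or a proposal for this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collector:
    name: str
    tags: frozenset


# Hypothetical tag assignments; deciding these for real is part of the work.
COLLECTORS = [
    Collector("zpool get all", frozenset({"default", "storage"})),
    Collector("svcs -Zap", frozenset({"default", "services"})),
    Collector("log files", frozenset({"default", "logs"})),
    Collector("gcore", frozenset({"invasive", "cores"})),
    Collector("cockroach debug zip", frozenset({"expensive", "database"})),
]

DEFAULT_TAGS = frozenset({"default"})


def select(collectors, requested_tags=DEFAULT_TAGS):
    """Return the names of collectors carrying at least one requested tag."""
    wanted = frozenset(requested_tags)
    return [c.name for c in collectors if c.tags & wanted]
```

With these assignments, `select(COLLECTORS)` returns the three default items, and `select(COLLECTORS, {"default", "invasive"})` adds `gcore`. New collectors shipped in a software update would only need a new entry in the list.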
It could be we do none of this for MVP and move this to MVP+1. I think that's basically what RFD 354 proposes.