Thanks @sean_obrien. That sounds promising, and I hope it's something that would be useful to other customers too.
You asked:
It would be great to understand the task this formatted output helps with, and any subsequent use of the csv file so we can tackle this optimally.
Fair question. I know I said this in my original post:
On a fairly regular basis, we work with Process Start Events data extracted from InsightIDR as a CSV. It would be very useful to extract only those columns which are relevant.
I should have explained this more clearly: when I say we work with extracts from InsightIDR, I don't mean one specific task that we repeat over and over. I can give some examples, though. These tasks are usually kicked off either by a request from management for arbitrary information at scale, or by an urgent need to understand our exposure.
The first example is Log4j. Since it was such a severe and easily exploitable vulnerability, we needed to figure out which endpoints were vulnerable ASAP, so that we could give senior management the information they needed to make emergency decisions: do we switch things off? It might be better to have some services offline and safe than online and compromised. And because Log4j dropped at the end of the week, we couldn't wait for InsightVM detections to come through. By looking at Process Start Event data, we could at least say "here's a list of endpoints which are almost certainly vulnerable".
Thinking about it, these requests often (but not always) boil down to something like: "which exact version of {software} has been run on {subset of endpoints} during {time period}?" We can pull Process Start Events and use the process.exe_file.hashes_* fields to determine exactly which version of the software is running, even when InsightIDR or other solutions don't record the exact version.
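For anyone wanting to do the same, here is a minimal sketch of the hash-to-version lookup, assuming you already have the Process Start Events export as a CSV. The column names endpoint and exe_hash are hypothetical stand-ins for whichever endpoint field and process.exe_file.hashes_* field your export actually contains, and the hash table holds placeholder values, not real hashes; you would build it yourself by hashing the executables of each release you care about.

```python
import csv
import io
from collections import defaultdict

# Placeholder hash -> version table. These are NOT real hashes: build your own
# by hashing the executables of each release you want to identify.
KNOWN_HASHES = {
    "aaaa1111": "1.0.0",
    "bbbb2222": "1.1.0",
}


def versions_by_endpoint(rows, hash_column, host_column, known_hashes):
    """Map each endpoint to the (string-sorted) list of known versions it ran."""
    found = defaultdict(set)
    for row in rows:
        # Normalise case so "AAAA1111" and "aaaa1111" hit the same entry;
        # treat a missing/empty hash cell as no match.
        digest = (row.get(hash_column) or "").strip().lower()
        version = known_hashes.get(digest)
        if version is not None:
            found[row[host_column]].add(version)
    return {host: sorted(versions) for host, versions in found.items()}


# Tiny inline sample; "endpoint" and "exe_hash" are hypothetical column names.
sample = io.StringIO(
    "endpoint,exe_hash\n"
    "host-a,AAAA1111\n"
    "host-b,cccc3333\n"
    "host-a,bbbb2222\n"
)
result = versions_by_endpoint(csv.DictReader(sample), "exe_hash", "endpoint", KNOWN_HASHES)
print(result)  # {'host-a': ['1.0.0', '1.1.0']}
```

Hashes that aren't in the table (host-b above) are simply skipped, so anything unmatched still needs a manual look.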
What I describe above might not be the originally envisaged use of the export functionality. Having said that, I think it’s a valid thing to want to do, and the platform is capable (with a bit of nudging).
I know adding features takes a lot of work (testing, documentation, communication), so I'm keeping my expectations realistic here. I do see advantages to allowing exports that contain only user-selected columns, though, and not just for customers: if excluding the columns I don't need cuts a generated export from several GB to ~50 MB, that might also reduce S3 data transfer and storage costs on the R7 side?
One more thought: if this functionality does get added, it would be a bonus if users could save lists of desired columns for reuse, since otherwise the columns would have to be selected every time.
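Until something like that exists in the platform, a client-side workaround could look roughly like the sketch below: it streams a full export row by row (so a multi-GB file is never loaded into memory), keeps only a chosen set of columns, and stores that selection as a JSON file for reuse. This is an assumption-laden illustration, not how InsightIDR would implement the feature; the file names and column names are hypothetical.

```python
import csv
import json
import os
import tempfile


def save_column_list(path, columns):
    """Persist a reusable column selection as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(columns), f, indent=2)


def load_column_list(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def project_columns(src, dst, columns):
    """Copy src to dst keeping only `columns`, one row at a time."""
    # "utf-8-sig" also copes with a leading byte-order mark, if present.
    with open(src, newline="", encoding="utf-8-sig") as fin, open(
        dst, "w", newline="", encoding="utf-8"
    ) as fout:
        reader = csv.DictReader(fin)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in export: {missing}")
        # extrasaction="ignore" drops every column not listed in `columns`.
        writer = csv.DictWriter(fout, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(reader)


# Demo in a throwaway directory with hypothetical column names.
with tempfile.TemporaryDirectory() as tmp:
    export = os.path.join(tmp, "export.csv")
    with open(export, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([
            ["endpoint", "exe_hash", "cmdline"],
            ["host-a", "aaaa1111", "java -jar app.jar"],
        ])
    saved = os.path.join(tmp, "columns.json")
    save_column_list(saved, ["endpoint", "exe_hash"])
    trimmed = os.path.join(tmp, "trimmed.csv")
    project_columns(export, trimmed, load_column_list(saved))
    with open(trimmed, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)  # [['endpoint', 'exe_hash'], ['host-a', 'aaaa1111']]
```

Failing loudly on missing columns is a deliberate choice: if the export schema changes, a saved column list silently producing empty columns would be worse than an error.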