We have a need from our org to produce top25 remediation reports in CSV rather than the existing formats. Our colleagues work mostly in excel, and as such - the existing formats are all but useful for them.
Duplicates could be removed with a SELECT DISTINCT
This is the query I have so far:
SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system, favi.date AS scan_finished, favi.port as port, dv.cvss_score as cvss_score, dv.severity as severity, proofAsText(ds.fix) AS remediation, proofAsText(favi.proof)
FROM fact_asset_vulnerability_instance favi
JOIN dim_vulnerability dv USING (vulnerability_id)
JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
JOIN dim_asset da USING (asset_id)
JOIN dim_operating_system dos USING (operating_system_id)
JOIN dim_solution ds USING (solution_id)
JOIN dim_site_asset dsa USING (asset_id)
JOIN dim_site dsi USING (site_id)
WHERE solution_id IN (
SELECT solution_id
FROM dim_solution_highest_supercedence
WHERE superceding_solution_id IN (
SELECT solution_id
FROM dim_solution
WHERE lower(summary) LIKE '%'
)
)
Some more tinkering (selecting vulnerabilities according to internal classification) and I now have the query below - a problem I’m trying to solve is that multiple patches are included as separate items - even though they may be superseded by another patch (for example one windows server in the report has 20 KB patches, but only a few when looking at the top25 report).
SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
favi.date AS scan_finished, favi.port as port,
CASE
WHEN dv.cvss_score > 9 THEN
'Critical'
WHEN dv.cvss_score BETWEEN 7
AND 8.9 THEN
'High'
END AS "Severity",
(regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+','','g')) AS remediation,
ds.solution_type as solution_type,
(regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+','','g')) as proof
FROM fact_asset_vulnerability_instance favi
JOIN dim_vulnerability dv USING (vulnerability_id)
JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
JOIN dim_asset da USING (asset_id)
JOIN dim_operating_system dos USING (operating_system_id)
JOIN dim_solution ds USING (solution_id)
JOIN dim_site_asset dsa USING (asset_id)
JOIN dim_site dsi USING (site_id)
WHERE vulnerability_id IN (
SELECT vulnerability_id
FROM dim_vulnerability
WHERE cvss_score >= 7.0
)
AND solution_id IN (
SELECT solution_id
FROM dim_solution_highest_supercedence
WHERE superceding_solution_id IN (
SELECT solution_id
FROM dim_solution
)
)
By god, I think I’ve done it. The query below only selects the “highest” solutions.
SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
favi.date AS scan_finished, favi.port as port,
CASE
WHEN dv.cvss_score > 9 THEN
'Critical'
WHEN dv.cvss_score BETWEEN 7
AND 8.9 THEN
'High'
END AS "Severity",
(regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+','','g')) AS remediation,
ds.solution_type as solution_type,
(regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+','','g')) as proof
FROM fact_asset_vulnerability_instance favi
JOIN dim_vulnerability dv USING (vulnerability_id)
JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
JOIN dim_asset da USING (asset_id)
JOIN dim_operating_system dos USING (operating_system_id)
JOIN dim_solution ds USING (solution_id)
JOIN dim_site_asset dsa USING (asset_id)
WHERE vulnerability_id IN (
SELECT vulnerability_id
FROM dim_vulnerability
WHERE cvss_score >= 7.0
)
AND solution_id IN (
SELECT DISTINCT fr.solution_id
FROM fact_remediation(25,'riskscore DESC') fr
JOIN dim_solution ds USING (solution_id)
JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
)
The problem with this query is that it’s really, really slow. I tested it out on a relatively small scale at first (around 50 assets) and now generating the same report but for 250 assets - and the report has been generating for atleast 12 hours now.
I have to commend you for this query and how much you’ve built it out so far. It is accessing a lot of different tables, so not super surprising that it’s slower.
My first recommendation would be to optimize the JOINs being used to try to minimize the number of fields being selected. There’s a Postgres walkthrough here that does a pretty good job of visually illustrating which joins to use, and when. I think LEFT JOINs may be good here.
You could also try to further restrict its scope with WHERE conditions. In particular, there could be a better way to do the WHERE vulnerability_id IN clause towards the end. That is likely generating a list of tens of thousands of vulns that it has to check through each time. Would something like this work better?
Adjusted the query a bit, removed the JOIN with dim_site_asset (since it’s not used in the select). This query seems faster, running a test report on my 250+ asset site right now (should take less than the original 36+ hours )
One significant issue i noticed is that while I do FROM fact_remediation(25,'riskscore DESC') fr (which should result in a maximum of 25 remediations), the resulting report only cover 8 distinct remediations out of a possible 16 (listed in the query builder for the same site and CVSS-score limitation)
SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
favi.date AS scan_finished, favi.port as port,
CASE
WHEN dv.cvss_score > 9 THEN
'Critical'
WHEN dv.cvss_score BETWEEN 7
AND 8.9 THEN
'High'
END AS "Severity",
(regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+',' ','g')) AS remediation,
ds.solution_type as solution_type,
(regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+',' ','g')) as proof
FROM fact_asset_vulnerability_instance favi
LEFT JOIN dim_vulnerability dv USING (vulnerability_id)
LEFT JOIN dim_asset da USING (asset_id)
LEFT JOIN dim_operating_system dos USING (operating_system_id)
LEFT JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
LEFT JOIN dim_solution ds USING (solution_id)
WHERE dv.cvss_score > 7.0
AND solution_id IN (
SELECT DISTINCT fr.solution_id
FROM fact_remediation(25,'riskscore DESC') fr
JOIN dim_solution ds USING (solution_id)
JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
)
That’s interesting. Since this is part of its own contained SELECT statement at the end of your full query, what you could do is extract it and test it on its own to help figure out why it’s only covering 8 remediations. When I test that piece of the query on my side, I get a full 25 as expected. One of the other conditions specified in the full query could be limiting it and resulting in the 8 you see.
That small portion of the query also takes a long time to run on my side, so I suspect it’s another culprit in slowing things down. I don’t think you need to join with dim_solution there, so you could remove that for a little bit of improvement.
This is tricky all-around because it’s a complex query, and we generally recommend running things like these against a much smaller subset of assets to ensure reports don’t take hours to generate.
One thing I can recommend (and I’m not sure if you’ve already set this up, forgive me if you have) is our data warehouse option. What this does is allows you to export your InsightVM data to an external warehouse to get a richer set of data and better integrate it into any other reporting tools your org might have. This means the queries you run aren’t hitting the InsightVM console.
We’ve got some info about configuring the data warehouse as well as the schema for it. The schema between the DW and the console isn’t 100% the same, so a query like this one might need some minor adjusting. But it is better for running more demanding queries or pulling larger data sets.
Thanks, I will look in to the data warehouse thing.
I modified the query once again, my idea was to move the (potentially resource-intensive) call to fact_remediation aside and only do it once, modified the way I do DISTINCT to avoid duplicates (a lot of vulnerability proofs for RHEL lists specific CVE numbers in the proof, resulting a large number of rows), moved around a few columns and so on. Still getting weird amount of remediations here, even though I’m picking out 50 of them.
with remediations as (
SELECT solution_id, summary, (regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+',' ','g')) as remediation
FROM fact_remediation(50,'riskscore DESC')
LEFT JOIN dim_solution ds USING (solution_id)
)
SELECT DISTINCT ON (
da.ip_address,
r.summary
)
da.ip_address,
da.host_name,
dos.description AS operating_system,
CASE
WHEN dv.cvss_score > 9 THEN
'Critical'
WHEN dv.cvss_score BETWEEN 7 AND 8.9 THEN
'High'
END AS "Severity",
r.remediation,
favi.date AS scan_finished,
favi.port AS port,
(regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+',' ','g')) as proof
FROM fact_asset_vulnerability_instance favi
LEFT JOIN dim_vulnerability dv USING (vulnerability_id)
LEFT JOIN dim_asset da USING (asset_id)
LEFT JOIN dim_operating_system dos USING (operating_system_id)
LEFT JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
LEFT JOIN remediations r USING (solution_id)
WHERE dv.cvss_score > 7.0
AND dvs.solution_id in (
SELECT solution_id
FROM remediations
)
I feel like I’m really close, yet far away at the same time with this. I will investigate if I create this report via the API & python instead.
After reaching out to support about this, they gave us the query which does produce a similar report to the top remediations report:
SELECT
ds.summary AS "Solution Summary",
proofAsText(ds.fix) AS "Solution Steps",
array_to_string(array_agg(da.ip_address), ', ') AS "IP Addresses",
array_to_string(array_agg(da.host_name),', ') AS "Host Names"
FROM fact_remediation(25, 'riskscore DESC') AS fr
JOIN dim_solution AS ds ON fr.solution_id = ds.solution_id
JOIN dim_asset_vulnerability_solution davs ON fr.solution_id = davs.solution_id
JOIN dim_asset AS da ON davs.asset_id = da.asset_id
GROUP BY ds.summary, ds.fix
This report does however not include all remediations for some reason.
@anon26205435 , You can try below query , not perfect to what you need since this is more focused on remediations which are for both critical and exploitable vulnerabilities.
WITH remediations AS (
SELECT DISTINCT fr.solution_id AS ultimate_soln_id, summary, fix, estimate, riskscore, dshs.solution_id AS solution_id
FROM fact_remediation(50,'riskscore DESC') fr
JOIN dim_solution ds USING (solution_id)
JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
),
assets AS (
SELECT DISTINCT asset_id, host_name, ip_address
FROM dim_asset
GROUP BY asset_id, host_name, ip_address
)
SELECT DISTINCT
csv(DISTINCT dv.nexpose_id) AS "CVE",
host_name AS "Asset Hostname", ip_address AS "Asset IP",
round(sum(dv.riskscore)) AS "Asset Risk",
summary AS "Remediation",
proofAsText(fix) AS "Remediation Steps",
cvss_score AS "CVSS Score",
exploits AS "Number of Exploits"
FROM remediations r
JOIN dim_asset_vulnerability_solution dvs USING (solution_id)
JOIN dim_vulnerability dv USING (vulnerability_id)
JOIN assets USING (asset_id)
WHERE dv.cvss_score >= 7.0 AND dv.exploits>0
GROUP BY r.riskscore, host_name, ip_address, asset_id, summary, fix, cvss_score, exploits
ORDER BY "Asset Risk" DESC
Via the support, I learned that the fact_remediation (and all other SQL export reports) do not have access to the same data as the rest of the console. This explains why fact_remediation returns a different set of remediations from the top remediations document report. Apparently, this is not considered a product defect either (allthough it’s not clearly stated in the InsightVM documentation)
So a while back I just ended up writing a python scripts which convert a HTML version of the top remediations report into a CSV list. See the link below, note that I’ve chopped away our organization-specific stuff from the scripts, but you should be able to simply convert the report_data variable into a CSV file quite easily!