Top25 remediations report, but as an SQL query

Hello!

We have a need from our org to produce top25 remediation reports in CSV rather than the existing formats. Our colleagues work mostly in excel, and as such - the existing formats are all but useful for them.

I’ve searched through the discussions so far, and I have (albeit with my caveman-like knowledge of SQL) attempted to create something akin to SQL query to export Policy scan results with Remediation, Rationale and Proof - #19 by manny_singh but no luck so far.

The top25 remediation report is great, but the formats do not fit our orgs workflow - at all. Does anyone else have any ideas on how to go about this?

I think it might be possible to adjust the “SQL example - proof of a specific remediation” (SQL example - proof of a specific remediation | InsightVM Documentation), because I can get all of the remediations by just searching for “%” :slight_smile:

One issue I noticed with this approach was that there are a lot of duplicates in the resulting report.

Duplicates could be removed with a SELECT DISTINCT :slight_smile:

This is the query I have so far:

SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system, favi.date AS scan_finished, favi.port as port, dv.cvss_score as cvss_score, dv.severity as severity, proofAsText(ds.fix) AS remediation, proofAsText(favi.proof)
FROM fact_asset_vulnerability_instance favi
JOIN dim_vulnerability dv USING (vulnerability_id)
JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
JOIN dim_asset da USING (asset_id)
JOIN dim_operating_system dos USING (operating_system_id)
JOIN dim_solution ds USING (solution_id)
JOIN dim_site_asset dsa USING (asset_id)
JOIN dim_site dsi USING (site_id)
WHERE solution_id IN (
SELECT solution_id
FROM dim_solution_highest_supercedence
WHERE superceding_solution_id IN (
SELECT solution_id
FROM dim_solution
WHERE lower(summary) LIKE '%'
)
)

Some more tinkering (selecting vulnerabilities according to internal classification) and I now have the query below - a problem I’m trying to solve is that multiple patches are included as separate items - even though they may be superseded by another patch (for example one windows server in the report has 20 KB patches, but only a few when looking at the top25 report).

SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
                favi.date AS scan_finished, favi.port as port, 
                CASE
                WHEN dv.cvss_score > 9 THEN
                    'Critical'
                WHEN dv.cvss_score BETWEEN 7
                AND 8.9 THEN
                'High'
                END AS "Severity",
                (regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+','','g')) AS remediation, 
                ds.solution_type as solution_type,
                (regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+','','g')) as proof
FROM fact_asset_vulnerability_instance favi
    JOIN dim_vulnerability dv USING (vulnerability_id)
    JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
    JOIN dim_asset da USING (asset_id)
    JOIN dim_operating_system dos USING (operating_system_id)
    JOIN dim_solution ds USING (solution_id)
    JOIN dim_site_asset dsa USING (asset_id)
    JOIN dim_site dsi USING (site_id)
WHERE vulnerability_id IN (
    SELECT vulnerability_id
    FROM dim_vulnerability
    WHERE cvss_score >= 7.0
)
AND solution_id IN (
    SELECT solution_id
    FROM dim_solution_highest_supercedence
    WHERE superceding_solution_id IN (
        SELECT solution_id
        FROM dim_solution
    )
)

By god, I think I’ve done it. The query below only selects the “highest” solutions.

SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
                favi.date AS scan_finished, favi.port as port, 
                CASE
                WHEN dv.cvss_score > 9 THEN
                    'Critical'
                WHEN dv.cvss_score BETWEEN 7
                AND 8.9 THEN
                    'High'
                END AS "Severity",
                (regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+','','g')) AS remediation, 
                ds.solution_type as solution_type,
                (regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+','','g')) as proof
FROM fact_asset_vulnerability_instance favi
    JOIN dim_vulnerability dv USING (vulnerability_id)
    JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
    JOIN dim_asset da USING (asset_id)
    JOIN dim_operating_system dos USING (operating_system_id)
    JOIN dim_solution ds USING (solution_id)
    JOIN dim_site_asset dsa USING (asset_id)
WHERE vulnerability_id IN (
    SELECT vulnerability_id
    FROM dim_vulnerability
    WHERE cvss_score >= 7.0
)
AND solution_id IN (
    SELECT DISTINCT fr.solution_id
    FROM fact_remediation(25,'riskscore DESC') fr
        JOIN dim_solution ds USING (solution_id)
        JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
)

2 Likes

The problem with this query is that it’s really, really slow. I tested it out on a relatively small scale at first (around 50 assets) and now generating the same report but for 250 assets - and the report has been generating for atleast 12 hours now.

I have to commend you for this query and how much you’ve built it out so far. :slightly_smiling_face: It is accessing a lot of different tables, so not super surprising that it’s slower.

My first recommendation would be to optimize the JOINs being used to try to minimize the number of fields being selected. There’s a Postgres walkthrough here that does a pretty good job of visually illustrating which joins to use, and when. I think LEFT JOINs may be good here.

You could also try to further restrict its scope with WHERE conditions. In particular, there could be a better way to do the WHERE vulnerability_id IN clause towards the end. That is likely generating a list of tens of thousands of vulns that it has to check through each time. Would something like this work better?

WHERE dv.cvss_score > 7.0
1 Like

Many thanks! That makes a lot of sense.

Adjusted the query a bit, removed the JOIN with dim_site_asset (since it’s not used in the select). This query seems faster, running a test report on my 250+ asset site right now (should take less than the original 36+ hours :smiley:)

One significant issue i noticed is that while I do FROM fact_remediation(25,'riskscore DESC') fr (which should result in a maximum of 25 remediations), the resulting report only cover 8 distinct remediations out of a possible 16 (listed in the query builder for the same site and CVSS-score limitation)

SELECT DISTINCT da.ip_address, da.host_name, dos.description AS operating_system,
                favi.date AS scan_finished, favi.port as port, 
                CASE
                WHEN dv.cvss_score > 9 THEN
                    'Critical'
                WHEN dv.cvss_score BETWEEN 7
                AND 8.9 THEN
                    'High'
                END AS "Severity",
                (regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+',' ','g')) AS remediation, 
                ds.solution_type as solution_type,
                (regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+',' ','g')) as proof
FROM fact_asset_vulnerability_instance favi
    LEFT JOIN dim_vulnerability dv USING (vulnerability_id)
    LEFT JOIN dim_asset da USING (asset_id)
    LEFT JOIN dim_operating_system dos USING (operating_system_id)
    LEFT JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
    LEFT JOIN dim_solution ds USING (solution_id)
WHERE dv.cvss_score > 7.0
AND solution_id IN (
    SELECT DISTINCT fr.solution_id
    FROM fact_remediation(25,'riskscore DESC') fr
        JOIN dim_solution ds USING (solution_id)
        JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
)
1 Like

That’s interesting. Since this is part of its own contained SELECT statement at the end of your full query, what you could do is extract it and test it on its own to help figure out why it’s only covering 8 remediations. When I test that piece of the query on my side, I get a full 25 as expected. One of the other conditions specified in the full query could be limiting it and resulting in the 8 you see.

That small portion of the query also takes a long time to run on my side, so I suspect it’s another culprit in slowing things down. I don’t think you need to join with dim_solution there, so you could remove that for a little bit of improvement.

This is tricky all-around because it’s a complex query, and we generally recommend running things like these against a much smaller subset of assets to ensure reports don’t take hours to generate.

One thing I can recommend (and I’m not sure if you’ve already set this up, forgive me if you have) is our data warehouse option. What this does is allows you to export your InsightVM data to an external warehouse to get a richer set of data and better integrate it into any other reporting tools your org might have. This means the queries you run aren’t hitting the InsightVM console.

We’ve got some info about configuring the data warehouse as well as the schema for it. The schema between the DW and the console isn’t 100% the same, so a query like this one might need some minor adjusting. But it is better for running more demanding queries or pulling larger data sets.

Thanks, I will look in to the data warehouse thing.

I modified the query once again, my idea was to move the (potentially resource-intensive) call to fact_remediation aside and only do it once, modified the way I do DISTINCT to avoid duplicates (a lot of vulnerability proofs for RHEL lists specific CVE numbers in the proof, resulting a large number of rows), moved around a few columns and so on. Still getting weird amount of remediations here, even though I’m picking out 50 of them.

with remediations as (
    SELECT solution_id, summary, (regexp_replace(ds.fix,'<[^>]*>|^\s+|\s\s+',' ','g')) as remediation 
    FROM fact_remediation(50,'riskscore DESC') 
        LEFT JOIN dim_solution ds USING (solution_id)
)

SELECT DISTINCT ON (
        da.ip_address,
        r.summary
    )
    da.ip_address,
    da.host_name,
    dos.description AS operating_system,
    CASE
    WHEN dv.cvss_score > 9 THEN
        'Critical'
    WHEN dv.cvss_score BETWEEN 7 AND 8.9 THEN
        'High'
    END AS "Severity",
    r.remediation,
    favi.date AS scan_finished,
    favi.port AS port,
    (regexp_replace(favi.proof,'<[^>]*>|^\s+|\s\s+',' ','g')) as proof
FROM fact_asset_vulnerability_instance favi
    LEFT JOIN dim_vulnerability dv USING (vulnerability_id)
    LEFT JOIN dim_asset da USING (asset_id)
    LEFT JOIN dim_operating_system dos USING (operating_system_id)
    LEFT JOIN dim_vulnerability_solution dvs USING (vulnerability_id)
    LEFT JOIN remediations r USING (solution_id)
WHERE dv.cvss_score > 7.0 
AND dvs.solution_id in (
    SELECT solution_id
    FROM remediations
)

I feel like I’m really close, yet far away at the same time with this. I will investigate if I create this report via the API & python instead.

1 Like

After reaching out to support about this, they gave us the query which does produce a similar report to the top remediations report:

SELECT 
    ds.summary AS "Solution Summary",
    proofAsText(ds.fix) AS "Solution Steps",
    array_to_string(array_agg(da.ip_address), ', ') AS "IP Addresses",
    array_to_string(array_agg(da.host_name),', ') AS "Host Names"
FROM fact_remediation(25, 'riskscore DESC') AS fr
    JOIN dim_solution AS ds ON fr.solution_id = ds.solution_id
    JOIN dim_asset_vulnerability_solution davs ON fr.solution_id = davs.solution_id
    JOIN dim_asset AS da ON davs.asset_id = da.asset_id
GROUP BY ds.summary, ds.fix

This report does however not include all remediations for some reason.

@anon26205435 , You can try below query , not perfect to what you need since this is more focused on remediations which are for both critical and exploitable vulnerabilities.

WITH remediations AS (
    SELECT DISTINCT fr.solution_id AS ultimate_soln_id, summary, fix, estimate, riskscore, dshs.solution_id AS solution_id
    FROM fact_remediation(50,'riskscore DESC') fr
    JOIN dim_solution ds USING (solution_id)
    JOIN dim_solution_highest_supercedence dshs ON (fr.solution_id = dshs.superceding_solution_id AND ds.solution_id = dshs.superceding_solution_id)
),
 
assets AS (
    SELECT DISTINCT asset_id, host_name, ip_address
    FROM dim_asset
    GROUP BY asset_id, host_name, ip_address
)
 
SELECT DISTINCT
   csv(DISTINCT dv.nexpose_id) AS "CVE",
   host_name AS "Asset Hostname", ip_address AS "Asset IP",
   round(sum(dv.riskscore)) AS "Asset Risk",
   summary AS "Remediation",
   proofAsText(fix) AS "Remediation Steps",
cvss_score AS "CVSS Score",
exploits AS "Number of Exploits"

FROM remediations r
   JOIN dim_asset_vulnerability_solution dvs USING (solution_id)
   JOIN dim_vulnerability dv USING (vulnerability_id)
   JOIN assets USING (asset_id)
 
 WHERE dv.cvss_score >= 7.0 AND dv.exploits>0
 
GROUP BY r.riskscore, host_name, ip_address, asset_id, summary, fix, cvss_score, exploits
ORDER BY "Asset Risk" DESC

Manny

Hello!

Via the support, I learned that the fact_remediation (and all other SQL export reports) do not have access to the same data as the rest of the console. This explains why fact_remediation returns a different set of remediations from the top remediations document report. Apparently, this is not considered a product defect either (allthough it’s not clearly stated in the InsightVM documentation)

So a while back I just ended up writing a python scripts which convert a HTML version of the top remediations report into a CSV list. See the link below, note that I’ve chopped away our organization-specific stuff from the scripts, but you should be able to simply convert the report_data variable into a CSV file quite easily!

2 Likes