SQL Query Run Time

Hi all,

I built a query to search for critical vulnerabilities where the number of affected assets is either 1 or 2. Even after verifying and testing the query, it seems that it’s taking a very long time to run, it several hours and still didn’t complete.
As well as stating the Site ID in the query, I also set the Site and Vulnerability filters in the Scope section of the report. The site has just under 3.5K of assets.

Can I make this anymore efficient in the hopes it runs faster?

SELECT DISTINCT
dv.title AS "Title", dv.malware_kits AS "Malware", dv.exploits AS "Exploits", dv.cvss_score AS "CVSS", dv.riskscore AS "Risk", dv.severity_score AS "Severity", fv.affected_assets AS "Instances", da.ip_address AS "IP Address", da.host_name AS "Hostname"

FROM
dim_vulnerability dv
JOIN fact_vulnerability AS fv ON fv.vulnerability_id=dv.vulnerability_id
JOIN fact_asset_scan_vulnerability_finding AS fasvf ON fasvf.vulnerability_id=dv.vulnerability_id
JOIN dim_asset AS da ON da.asset_id=fasvf.asset_id
JOIN dim_site_asset AS dsa ON  dsa.asset_id=da.asset_id

WHERE
dsa.site_id=90 AND dv.severity='Critical' AND fv.affected_assets BETWEEN 1 and 2

ORDER BY 
dv.title ASC

LIMIT
250;

I havent got to test it yet but im curious if it has something to do with the where statement. Try putting this in parenthesis. Other than that, im not sure about the SELECT DISTINCT instead of just SELECT. Also the order youre joining the tables affect the time as well. I always start my FROM with a fact table and then join the dimensions.

Hiwever this query is just not completing so theres something else going on here (possible the BETWEEN statement messing up on the AND portion

@rapid_wave as @john_hartman called out, it’s best practice to start with a fact table and join your dimensions from there as shown in the schemas.

In your example, you would start with fact_vulnerability, however, this is tricky since this fact table aggregates assets and you want to return unique asset values such as ip_address and host_name. I’ve found the most universal fact table to start with when looking for both dim_vulnerability and dim_asset data is fact_asset_vulnerability_instance

Here is the description of said table:

Accumulating snapshot fact for all current vulnerability instances on an asset. This fact provides a record for each vulnerability instance on every asset. If an asset is not vulnerable to any vulnerabilities (or all vulnerabilities have been excluded) it will have no records in this fact table.

Here is a query I put together that uses fact_asset_vulnerability_instance. Hopefully this is what you are looking for and can be used as an example for future queries. I tested this in my environment without a site_id and it took ~3 minutes to generate the report.

WITH cte AS (

SELECT 

	 favi.vulnerability_id
	,dv.title AS title
	,SUM(dv.malware_kits) AS malware
	,SUM(dv.exploits) AS exploits
	,dv.cvss_score AS cvss_score
	,dv.riskscore AS risk_score
	,dv.severity_score AS severity_score
	,STRING_AGG(DISTINCT CONCAT(da.host_name,' (',da.ip_address,')'),', ') AS host
	,COUNT(favi.vulnerability_id) AS vulnerability_instances
	,COUNT(DISTINCT favi.asset_id) AS affected_assets

FROM
	fact_asset_vulnerability_instance favi 
JOIN 
	dim_vulnerability dv ON favi.vulnerability_id = dv.vulnerability_id
JOIN
	dim_asset da ON favi.asset_id = da.asset_id
JOIN
	dim_site_asset dsa ON da.asset_id = dsa.asset_id
WHERE
/*dsa.site_id=90 AND */ dv.severity='Critical'
GROUP BY
	 favi.vulnerability_id
	,dv.title
	,dv.cvss_score
	,dv.riskscore
	,dv.severity_score
	
)
SELECT
	*
FROM 
	cte 
WHERE 
	affected_assets BETWEEN 1 AND 2 
ORDER BY
	title 

Callout: I only checked a few examples and they were correct, but be aware that the host_name and ip_address, might not be in the correct order. ← This no longer is an issue per below and the query above has been updated

Thank you @bobsledtedd, this is great. I have noticed that the hostname and it’s respective IP are not in the correct order as per your Callout. Rather than them being in a single line, what would have to change in order to get each asset in it’s own row? For example:

Vulnerability 1 | Asset1 | 10.10.10.1
Vulnerability 2 | Asset1 | 10.10.10.1
Vulnerability 3 | Asset1 | 10.10.10.1
Vulnerability 3 | Asset2 | 10.10.10.2
Vulnerability 4 | Asset1 | 10.10.10.1
Vulnerability 5 | Asset3 | 10.10.10.3

Again, thank for the help so far!

@rapid_wave Glad to hear. There are many options we can take, but here are two that came to mind first.

  • Option 1: I went back and updated the original script to concat the host_name and ip_address to ensure their correct order and make things a little cleaner.

  • Option 2: Going back to your original logic, we can add a CTE that utilized [fact_vulnerability] to filter on the desired range of affected_assets. This is a bit hacky, but I found it to be the most straightforward and can be easily adjusted. Note: This will return records at a vulnerability_instance level, meaning there will be duplicate records for each instance of a vulnerability on a given asset. You can pivot this data in excel to roll it back up to get the rolled up results of the OG query. Uncommenting the DISTINCT will reduce the dataset down to a vulnerability level.

WITH cte AS (

SELECT 

	 favi.vulnerability_id
	,dv.title AS title
    ,dv.malware_kits AS malware
	,dv.exploits AS exploits
	,dv.cvss_score AS cvss_score
	,dv.riskscore AS risk_score
	,dv.severity_score AS severity_score
	,da.host_name AS host_name
	,da.ip_address AS ip_address

FROM
	fact_asset_vulnerability_instance favi 
JOIN 
	dim_vulnerability dv ON favi.vulnerability_id = dv.vulnerability_id
JOIN
	dim_asset da ON favi.asset_id = da.asset_id
JOIN
	dim_site_asset dsa ON da.asset_id = dsa.asset_id
WHERE
/*dsa.site_id=90 AND */ dv.severity='Critical'
	
), affected_assets AS (

	SELECT
		fv.vulnerability_id
	FROM
		fact_vulnerability fv
	WHERE 
		affected_assets BETWEEN 1 AND 2	
)
SELECT /* DISTINCT */ 
	*
FROM 
	cte
JOIN
	affected_assets aa ON cte.vulnerability_id = aa.vulnerability_id
ORDER BY
	title