Vulnerability Age

jacob_horning · June 29, 2020, 5:53pm

Hello All,

So I am trying to produce how many days the a single vulnerability has been on a host. There is an option to produce this number in “Days” with the timestamp from dim_asset_vulnerability_finding. I am unsure how to implement this and get Vulnerability Age back as a column. This is the current query I am working on.

"SELECT DISTINCT asset_id, dv.title, favf.date From dim_asset da JOIN dim_asset_tag USING (asset_id) JOIN dim_tag dt USING (tag_id) JOIN dim_asset_service_configuration USING (asset_id) JOIN fact_asset_vulnerability_finding favf USING (asset_id) JOIN dim_vulnerability dv USING (vulnerability_id)

Thanks in advance,
Jacob

tyler_schmidtke · June 29, 2020, 9:14pm

Does this table contain the information that you’re looking for? https://insightvm.help.rapid7.com/docs/understanding-the-reporting-data-model-facts#section-fact-asset-vulnerability-age

jacob_horning · June 30, 2020, 11:04am

I have tried using the table fact_asset_vulnerability_age but it does not seem exist within the datawarehouse schema. In the link below it says I can choose the function age(date, unit) which I can choose “days” but it will not let me choose “days” but the function choosing age works.
https://help.rapid7.com/nexpose/en-us/warehouse/warehouse-schema.html#fact_asset_vulnerability_finding_remediation.date

tyler_schmidtke · June 30, 2020, 5:32pm

@jacob_horning

My apologies - for the data warehouse, you can use some Postgres functions to get the format that you’re looking for.

If you have the date that vulnerability was found, favf.date, you can use the age(timestamp) function to get the following example format:

1 years 7 mons 29 days 23 hours 29 mins 3.564 secs

To get the days specifically from this, you could use the following. Note, this syntax is seemingly undocumented, but it will accept the following intervals years, months, days, hours, minutes, seconds.

age(favf.date, 'days')

This will return a floating point value that could be converted to an integer if you want with the following:

ROUND(age(favf.date, 'days'))::integer

There are some of other ways to accomplish this that are documented, but are a bit more verbose.

jacob_horning · July 2, 2020, 4:47pm

@tyler_schmidtke Thank you for the help. One more question, what if I wanted to print out years, months, and days but not hours, minutes and seconds. Thank you.

tyler_schmidtke · July 2, 2020, 4:51pm

You can use the extract function for that. The previously linked documentation has details on the syntax

jacob_horning · July 2, 2020, 6:13pm

@tyler_schmidtke

Thank you so much for the reference. It seems with everything I found I can only select one field from the extract function. I am trying to extract multiple fields not sure if this is possible with this function or my syntax is off with this.

EXTRACT(‘months’ , ‘days’ , ‘years’, from favf.date)

tyler_schmidtke · July 2, 2020, 6:42pm

You should be able to use CONCAT along with multiple calls of the age function, something like this might get the results that you’re looking for:

concat(extract(YEARS from age(favf.date)), ' years ', extract(MONTHS from age(favf.date)), ' months ',  extract(DAYS from age(favf.date)), ' days') AS age

matt_wyen · July 10, 2020, 2:03pm

be careful with SELECT DISTINCT

r7_aging AS (
SELECT asset_id
,vulnerability_id
,first_discovered
,most_recently_discovered
FROM fact_asset_vulnerability_age
)

jacob_horning · July 10, 2020, 2:08pm

@matt_wyen is there a reason with DISTINCT not working because I use it in almost everyone one of my queries and it removes duplicates I noticed. We had a lot of issues with this and had to remove the duplicates via a python script. Thanks

matt_wyen · July 10, 2020, 2:23pm

Stack exchange is really better at explaining this than I am - its a “WHAT IS MY USE CASE” situation

Whoever told you using DISTINCT is a bad sign in itself is wrong. In reality, it all depends on what problem you are trying to solve by using DISTINCT in the first place.

If you’re querying a table that is expected to have repeated values of some field or combination of fields, and you’re reporting a list of the values or combinations of values (and not performing any aggregations on them), then DISTINCT is the most sensible thing to use. It doesn’t really make sense in my mind to use GROUP BY instead just because somebody thinks DISTINCT shouldn’t be used. Indeed, I think this is the kind of thing DISTINCT is designed for.

If OTOH you’ve found that your query has a bug meaning that repeated values are being returned, you shouldn’t use either DISTINCT or GROUP BY to cancel out this bug. Rather, you should figure out the cause of the bug and fix it.

Using DISTINCT as a safety net is also a poor practice, as it potentially hides problems, and furthermore it can be computationally expensive (typically O( n log n ) or O( n 2)). In this scenario, I can’t see that using GROUP BY instead would help you.

https://stackoverflow.com/questions/33651429/should-i-use-distinct-in-my-queries

jacob_horning · July 10, 2020, 2:36pm

@matt_wyen that makes sense. What rapid7 does is thrown in how many instances a single vulnerability is found and include the port they found it on but the remediation would be the same for all of these instances. So to decrease the size of our reports we removed these instances and just kept one of them. The numbers then would be more closely related the numbers found in our dashboard cards.

akhade · March 30, 2022, 8:11am

@jacob_horning @matt_wyen

Are you removing the instances manually? Because it will take a long time for huge data.
Is there any alternate way to do it?