We have established a DW and I have been familiarizing myself with the schema by replicating various dashboard cards from within the portal.
The wall I seem to constantly hit is the use of dates with the various trending cards and how those translate to the dates within the data warehouse schema. The use of date labels within the portal dashboards/cards is nonexistent and I have not uncovered any helpful queries or documentation that shed light on this…
My hope is I am missing something obvious and can be on my way with a little direction. I have attempted to replicate about every card within the portal that uses a date and can never seem to produce the same values. My assumption is the date fields I am using are incorrect and translates to inaccurate values being bucketed.
Can someone help me out here? Is there something obvious I am missing for determining what date fields are being used within the portal cards vs the DW?
For example, say I want to replicate the “New vs. Remediated Vulnerability Comparison Over Time” card using the DW schema.
- What date is being utilized within that card and where does it live within the data warehouse?
I’m pretty sure for that query the dates would be scan dates. Depending on how you build it I assume it could either come from exact scan dates or from a fact table.
You could start by looking here which is a SQL query against the console schema for new vs. remediated vulnerabilities. Some of the tables/columns would need swapped out but the logic would essentially remain the same. Take note of the temporary tables for baseline_scan_date and current_scan_date
Otherwise you may be able to play with fact_asset_date which reports on the amount of vulnerabilities an asset had at a given date.
Otherwise I don’t think we have a pre-built DW schema query to display what you would see on those queries.
@john_hartman - Appreciate the reply. The scan dates in dim_scan (started, finished) was my first attempt, however, I am having a hard time getting a scan down to an asset level. Do you have any insight here? For instance, If I grab a random scan_id from dim_scan, which tables provide the assets under that particular scan? I’m specifically talking about assets that fall under the default “Rapid7 Insight Agents” site.
The two that made the most sense to me were fact_scan and/or fact_asset_event, however, neither include values for scans that are performed by the “Rapid7 Insight Agents”. For example, within fact_scan, I have the scan_id, but the values are all 0. This doesn’t seem correct…
I’ve also disected that query and seemed to old produce the value of “old”. I can always take another stab at it, but the translation to the DW schema makes it difficult to understand if the issue is with my undertsanding or the difference in the data availability between the two environments.
I tried using fact_asset_date, but the limitation their is the day is predicated on the export date and my worry is if you don’t export everyday without issue, you wouldn’t be able to do accurate trending.
Is it possible our ETL process is doing something out of the ordinary by not populating fact_scan and fact_asset_event with all the dim_scan values?
So the reason why you’re not finding anything based off of the Insight Agents is because they are not scans. The insight Agent assesses for vulnerabilities but it is not listed as an actual scan.
I’ll see what I can find about scoping dates to Insight Agent assessments instead of scan dates.
"I’ll see what I can find about scoping dates to Insight Agent assessments instead of scan dates."
That sounds like exactly what I’m looking for as our environment currently relies heavily on the the results from the Insight Agents and not actual scans.
Let me know if I can assist with anything on my end.