Access the equivalent of dim_asset_unique_id in SQL Report

matt_domko_deprecated · November 11, 2021, 3:24pm

With data warehouse, I can map an insightvm asset id to an aws instance id using the dim_asset_unique_id table. Is there an equivalent thing I can do with SQL Query reporting?

https://help.rapid7.com/nexpose/en-us/warehouse/warehouse-schema.html#dim_asset_unique_id

holly_wilsey · November 17, 2021, 4:38pm

I believe this is what you’re looking for:

https://docs.rapid7.com/insightvm/understanding-the-reporting-data-model-dimensions/#dim_asset_unique_id

matt_domko_deprecated · December 31, 2021, 9:46pm

Thanks folks That worked perfectly! Dropping my query here for future use.

Goal: Know if a vulnerability exists within a base AMI or was caused by something an engineer deployed to the instance.

InsightVM doesn’t collect the AMI an AWS Instance was created from, so I have 2x other workflows running that help with this:

A workflow that inventories all my running instances and stores the metadata in a PSQL table
A workflow that creates “reference instances” from each AMI that I’m using. It tags these instances with “ami-id” so that I can extract them later, and adds them to my “baseline farm” site.

Baseline Query
This query finds all the instances that are part of my “farm”, and aggregates the vulnerabilities that exist in them.

/* This query takes approximately 600 seconds to run on InsightVM
It returns a table containing:
  ami-id
  array of distinct vulnerability_ids
  count of vulnerabilities
We'll join this table against a table containing service, ami, service_vulns
to provide a high level compliance report
*/

WITH
  farm_assets as (
	-- Get all assets that are part of the AMI Farm
  	select
  		asset_id
  	from dim_asset
  	where
  	    last_assessed_for_vulnerabilities >= current_date - 1 and
              string_to_array(sites,',') && array['Baseline Farm']
),
  ami_ids as (
	-- Get All AMI IDs
  	select
  		tag_id,
  		split_part(lower(tag_name),'ami-id:',2) as ami_id
  	from dim_tag
  	where
  		lower(tag_name) like '%ami-id%'
),
  asset_ami as (
  -- join ami to assets
    select
      asset_id,
      ami_id
    from
      dim_tag_asset t1
    left join ami_ids t2 on t1.tag_id = t2.tag_id
    where ami_id is not null
  ),
  ami_vulns as (
    -- Get all vulns from each ami
    select
      t1.ami_id,
      array_agg(t3.nexpose_id) as baseline_vulns,
      count(*) as ami_vuln_count
    from
      asset_ami t1
    left join fact_asset_vulnerability_finding t2 using(asset_id)
    left join dim_vulnerability t3 using (vulnerability_id)
    group by ami_id
  )
  select
    *
  from ami_vulns

Vulnerability Query
Next I dump all the nexpose_id’s for all my recently scanned AWS assets.

with
  recently_scanned as (
    select
      t1.asset_id,
      t2.unique_id as instance_id
    from
      dim_asset t1
    inner join dim_asset_unique_id t2
      on t1.asset_id = t2.asset_id
      and t2.source = 'amazon-web-services'
    where
      last_assessed_for_vulnerabilities >= current_date -1
  ),
  vulns as (
    select
      t1.instance_id,
      avi.vulnerability_id,
      vuln_data.cvss_score as cvss,
      vuln_data.nexpose_id,
      vuln_data.severity,
      vuln_data.riskscore
    from
      recently_scanned t1
    left join
      fact_asset_vulnerability_finding avi on t1.asset_id = avi.asset_id
    left join
      dim_vulnerability vuln_data on avi.vulnerability_id = vuln_data.vulnerability_id

  )

  select
    *
  from vulns

Fancy Joins
An ICON workflow runs those two queries and uploads them to a PSQL server.
Once that’s done, I can do some fancy joins (ami vulnerability data, host vulnerability data, ec2 metadata) to report on:
Service Name, AMI Vulns, Service Vulns

If I could either: get our engineering teams to add an AMI label to their instances, or get R7 to collect the AMI id, I wouldn’t need the external PSQL database anymore…

aaiadmin · March 9, 2022, 1:03pm

Matt, they actually collect the AMI data (with the R7 agent) and much more, but unfortunately they are not importing that data to the Console.
If you can get access to the GraphQL (not GA yet), you can query the Cloud “backend” for a lot of data using the agent id.
Rapid7-GraphQL