SQL query for finding assets with multiple software installed

I am using one of the SQL queries that are provided on github called Assets-By-Specified-Software. When I run it. Not output rows are shown.

SELECT da.ip_address, da.host_name, ds.vendor, ds.name as software_name,  ds.family, ds.version
FROM dim_asset_software das
JOIN dim_software ds using (software_id)
JOIN dim_asset da on da.asset_id = das.asset_id
where ds.name ilike '%Rapid7 Insight Agent%' AND ds.name ilike '%CrowdStrike%'

Change:

where ds.name ilike ‘%Rapid7 Insight Agent%’ AND ds.name ilike ‘%CrowdStrike%’

To:

where ds.name ilike '%Rapid7 Insight Agent%' OR ds.name ilike '%CrowdStrike%'
1 Like

Thanks for the help. I didn’t word the question correctly. I am looking for host that contain both of the software.

Here is a shamefully dirty way of what that logic might look like using the data warehouse schema.

WITH r7 AS ( /* Distinct List of Assets with software ilike '%Rapid7 Insight Agent%' */

SELECT 
	 da.host_name
	,'r7' AS note
FROM
	dim_asset da
JOIN
	dim_asset_software das USING (asset_id)
WHERE
	das.name ilike '%Rapid7 Insight Agent%'
GROUP BY 
	da.host_name

), cs AS ( /* Distinct List of Assets with software ilike '%CrowdStrike%' */

SELECT 
	 da.host_name
	,'cs' AS note
FROM
	dim_asset da
JOIN
	dim_asset_software das USING (asset_id)
WHERE
	das.name ilike '%CrowdStrike%'
GROUP BY 
	da.host_name

), asset_list AS ( /* UNION the two CTEs two get a complete list of assets */

SELECT
	 host_name
	,note
FROM
	r7
	
UNION

SELECT
	 host_name 
	,note
FROM
	cs

), final_asset_list AS ( 
SELECT
	host_name
FROM
	asset_list
GROUP BY
	host_name
HAVING COUNT(*) > 1 /* If the host_name is counted more than once, then it shows up in both CTEs */

)
SELECT 
	 da.ip_address
	,da.host_name
	,das.vendor
	,das.name as software_name
	,das.family
	,das.version
FROM 
	dim_asset da
JOIN
	dim_asset_software das USING (asset_id)
JOIN
	final_asset_list fal ON da.host_name = fal.host_name /* Ultimately all the work above is just used to filter the original query */
WHERE 
	(das.name ilike '%Rapid7 Insight Agent%' 
	OR das.name ilike '%CrowdStrike%')
    AND da.sites = 'Rapid7 Insight Agents'

So technically speaking, @bobsledtedd was sorta right the first time to the extent that it would give you the assets that have either of those two installed and then within the csv you could filter to see which assets duplicate.

However, the columns you’re selecting in the way that you are, kind of make it impossible to display that data. I’m assuming you want one line per asset, so keeping those columns for vendor, software_name, family, and version are going to break that query.

@bobsledtedd was even closer on the second query although it was for the data warehouse. I would just create two temporary tables each with a where statement of the one software you’re checking for and then simply add the host info. Then you can just join the two tables together. The assets that overlap will be displayed so you will know which assets have both software (one line per asset).

Otherwise if you want to keep the software info then instead of joining the two tables you could just do UNION and then sort by ip_address or host_name. This would give you the same result but two lines for each asset displaying the software details.

I actually played with this a little more and it wasn’t as simple as I initially thought. The having clause that @bobsledtedd put in his query can run into issues if you have multiple lines for the software. For example depending on how the agent was found you may have two lines.

This is the closest I could get on a whim to what I think you’re looking to see. You SHOULD only see one line per asset with both the cs and r7 info in that line. However if you have multiple lines for any given software you may have multiple lines per software. If for example you had two versions of each software in the database you could have up to 4 lines per asset. That’s going to come back on the data in your specific database though.

with r7 as (
SELECT das.asset_id, ds.vendor, ds.name,  ds.family, ds.version
FROM dim_asset_software das
JOIN dim_software ds using (software_id)
where ds.name ilike '%Rapid7 Insight Agent%'
),
cs as (
SELECT das.asset_id, ds.vendor, ds.name,  ds.family, ds.version
FROM dim_asset_software das
JOIN dim_software ds using (software_id)
where ds.name ilike '%CrowdStrike%'
)

select da.ip_address, da.host_name, r7.vendor, r7.name as r7_software_name,  r7.family, r7.version, cs.vendor, cs.name as cs_software_name,  cs.family, cs.version
from dim_asset da
join r7 using(asset_id)
join cs using(asset_id)

Same thing happened to me when I started working through it, ha.

Can you provide more context around this as I don’t follow? When I check your query against mine, we are getting the same results with yours having a few less lines as you are putting the r7 and CS details on the same line.

However, I’m curious what might cause it to break.

Maybe this is it what you are referencing?

  • You can have multiple host_names in dim_asset, each with a different asset_id. The reason for this is the asset belongs to different “sites”. Each asset will be tagged with the “Rapid7 Insight Agents” site and potentially another…For example depending on how the agent was found

  • In my envrionment I am seeing the results of this from both queries, which presents itself as seeing two entries for the R7 agent on the same host_name. Again because there are multiple asset_ids for the same host.

Solution: My initial thought was to add [WHERE da.sites = ‘Rapid7 Insight Agents’] to both queries, which does technically fix this issue, however I noticed in some examples that assets assigned to sites = ‘Rapid7 Insight Agents’ are reporting an older version of the agent than the entry where the sites = ‘ExampleSite,Rapid7 Insight Agents’.

…might need to dig into that more

If this was a big concern, you can get fancy with a ROW_NUMBER() and order by agent version to grab the record you are interested in (the correct one).

So what I was referring to wasn’t that there were separate asset_ids but that the Rapid7 Insight Agent was found in multiple ways. Typically speaking I’ve seen this more with Windows than Linux but here are the results I’m referring to

Screen Shot 2022-12-09 at 11.42.06 AM

Interesting. I’ll look to see if there are cases of this in my envrionment. I was seeing duplicated agents with your query and mine, but it was being caused by two entries in dim_asset for the same host_name (different asset_ids).

Thanks for the information John and helping me with solving this problem.

So that would definitely be another scenario that could duplicate lines given that my query is linking on asset_id. So like I said, the results are going to be as accurate as the data in your database.