Apache2 / nginx log collection and analysis

Hello,

we plan to ingest the access and error logs of Apache2 / nginx into IDR to alert on unusual behavior.
Unfortunately, there are no standard event sources for this.

Does anyone have experience with these logs, custom parsing of the logs and creating rules to monitor the most important aspects?

I would be very pleased if you could share your experiences with me.
Elisabeth

You could set up a Custom Log event source and set it to tail the Apache logs remotely (you’d need to set up an SMB share and provide credentials).

Alternatively, the Insight Agent offers the ability to tail any log file locally and send it to Log Search; see Configure the Insight Agent to Send Additional Logs | InsightIDR Documentation

David

Hello David,

Thank you for your feedback.

Feeding logs into IDR for search alone is not practical for us. In addition to searching the logs, we also want to create detection rules to alert on unusual behavior. Additional logs collected by the Insight Agent will bypass the custom parsing module and the detection engine.

For these reasons, we plan to use the custom log event source and set up a custom parsing rule. We are also discussing which web server activities should be monitored.

Do you have experience with parsing these logs? Do you have any recommendations for web server log activities that should be monitored?

Regards,
Elisabeth

There’s an alternative configuration you can use with the logging.json method to send the data in through the agent via a collector listening on a network port. This would enable you to build Custom Detection Rules (not just Basic Detection Rules) as well as Custom Parsing Rules.

The logging.json configuration would look like this:

Step 1 - Configure the logging.json

{
    "config":
    {
        "datahub": "<COLLECTOR_FQDN>:20001",
        "state-file": "/opt/rapid7/ir_agent/components/insight_agent/common/config/logs.state",
        "logs":
        [
            {
                "token": "",
                "enabled": true,
                "name": "Apache Logs",
                "path": "/var/log/nginx*.log"
            }
        ]
    }
}
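
The sample above follows nginx log files; if your web server is Apache, only the path (and, if you like, the name) needs to change. As a rough sketch, assuming a Debian/Ubuntu-style install where Apache writes to /var/log/apache2/ (RHEL-based systems typically use /var/log/httpd/ instead), the logs entry might look like this:

            {
                "token": "",
                "enabled": true,
                "name": "Apache Logs",
                "path": "/var/log/apache2/*.log"
            }

The *.log wildcard would pick up both access.log and error.log, which matches the access and error logs mentioned at the top of the thread - do verify the paths against your own install before relying on them.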

Step 2 - Set Up the Custom Log Event Source in IDR

You would need to set up a Custom Log Event Source in IDR, set it to Listen on Network Port, and enable the TCP encrypted option. Navigate to Data Collection, add an event source, then choose Add Raw Data → Custom Logs.


Note some important keys in the above configuration:

  • datahub - this key is the target machine (the collector) listening on the specified port. Add the collector FQDN or hostname ONLY, followed by the port number, separated by a colon.
  • formatter - this can be set to “formatter”: “plain”, or omitted to keep the default behavior of prepending a syslog header onto each line (which would break parsing if the logs are native JSON) - see the sketch after this list
  • logs - this array is required
  • token - this must be included but set to an empty string “”
  • name - not used when the formatter is set to plain; when the syslog header is enabled, this appears in each log line’s header
  • path - this should point to the directory that contains your Apache logs, with a wildcarded file name if needed, including the file extension. Don’t use unanchored wildcards.
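
For the formatter note above, here is a sketch of the same configuration with the key set explicitly. Treat the placement as an assumption on my part - the note only gives the key and value, so I’ve put "formatter" at the top level of the config object alongside datahub; please confirm the exact placement against the InsightIDR documentation:

{
    "config":
    {
        "datahub": "<COLLECTOR_FQDN>:20001",
        "state-file": "/opt/rapid7/ir_agent/components/insight_agent/common/config/logs.state",
        "formatter": "plain",
        "logs":
        [
            {
                "token": "",
                "enabled": true,
                "name": "Apache Logs",
                "path": "/var/log/nginx*.log"
            }
        ]
    }
}

With plain set, each line should be forwarded as-is, which is what you want if your custom parsing rules (or native JSON access logs) need to see the raw line without a prepended syslog header.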

Step 3 - Restart the Agent Service

After saving the logging.json in /opt/rapid7/ir_agent/components/insight_agent/common/config, restart the agent service.

Once the le_realtime job starts, you should see a log line like this:

[agent.jobs.le_realtime]: Following Apache Log - /var/log/nginx*.log

It should begin tailing the Apache logs and streaming them to the event source.

Within a minute of the above log event (provided the log file you are tailing is active), you should see the events when hitting the View Raw Log button.

One limitation of this method is that if you tail multiple files, all of them will be sent to the same log within Log Search, since the collector’s Custom Logs event source is limited in this manner.

However, you can work around this by leveraging the name key within the configuration, which will appear in each log line as part of the syslog header that is prepended by default.
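
As a rough sketch of that workaround, the logs array would contain one entry per file, each with a distinct name, leaving formatter at its default so the name lands in each prepended syslog header (the paths below are just the usual nginx defaults - adjust to wherever your logs actually live):

        "logs":
        [
            {
                "token": "",
                "enabled": true,
                "name": "nginx-access",
                "path": "/var/log/nginx/access.log"
            },
            {
                "token": "",
                "enabled": true,
                "name": "nginx-error",
                "path": "/var/log/nginx/error.log"
            }
        ]

You should then be able to filter on those name values in Log Search to tell the streams apart, even though they all land in the same log.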

David