"Security engineering is about building systems to remain dependable in the face of malice, error, or mischance."
- Ross Anderson, Security Engineering Third Edition
What is the difference between building something and engineering it? It is the difference between single-entry and double-entry accounting, between a simple Caesar cipher and a modern cryptosystem, and between knowing something and being able to prove it. In one word: rigor.
To truly earn the moniker "detection engineering", security operations centers ("the SOC") and managed security service providers need to apply a certain level of rigor to the process of drafting, testing, tuning, and updating alerts. This must be a continuous cycle: vendors are constantly updating their technology, adding new logging capabilities or modifying existing ones, and adversaries routinely shift tactics and techniques to avoid detection.
In this series, I will outline key considerations and steps in the alert management lifecycle in order to demonstrate an engineering mindset as applied to detection strategy. Concrete examples will be specific to Splunk, the Security Information and Event Management (SIEM) platform with which I am most familiar. However, the process, and more importantly, the mindset, can be applied universally.
In contrast to, say, ten years ago, there is today a wealth of freely-available detection content on the web. The best sources are actively maintained, with frequent commits adding new detections and tuning existing ones. Though it is not without its flaws, Sigma has for years offered a means of translating existing alerts from one query language to another in addition to collecting a repository of SIEM-agnostic detections.
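For readers unfamiliar with Sigma, a minimal rule looks something like the following. This example is illustrative only: the certutil-based selection is a well-known public pattern, not a tuned production detection.

```yaml
title: Certutil Download Attempt
status: experimental
description: Detects certutil.exe being used to fetch a remote file
logsource:
  product: windows
  category: process_creation
detection:
  selection:
    Image|endswith: '\certutil.exe'
    CommandLine|contains: 'urlcache'
  condition: selection
level: medium
```

Tooling in the Sigma ecosystem can compile a rule like this into the query language of a particular SIEM, Splunk included.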
There is a parallel here with current musings about computer programming in the age of artificial intelligence. Detection content is now "free" in the way that generating code has become. Or perhaps not "free", but "deeply discounted". Even if you cannot use the publicly-available content directly for your own commercial purposes, it serves as fairly comprehensive prior art for your in-house effort. As with computer programming, the task then shifts from writing content to evaluating its correctness and effectiveness.
To understand how to build a reliable detection system, we first need to understand how things can go awry. Broadly speaking, there are a few ways alerts can lose effectiveness:
- the format or content of the underlying logs changes;
- telemetry goes missing, through error or tampering;
- overlapping log sources shift as new data becomes available; and
- attackers evolve their techniques to evade existing detections.
Since logs are nothing more than the (ideally) human-readable outputs of various pieces of technology, it stands to reason that fundamental changes to the design or operation of the underlying system will produce changes of a similar degree in the logs. Polite vendors provide advance public notice, sunsetting periods, and even extended support. Others lock such announcements behind customer login portals or provide notice only by quietly updating documentation.
A common reason for SaaS vendors to introduce breaking changes to logging schemas is a concomitant modification to the service's API. CrowdStrike did this in 2024/2025 with the move from the DetectionSummaryEvent to the EppDetectionSummaryEvent stream. These two streams represent perhaps the most critical information CrowdStrike sends to the SIEM: alerts from the endpoint module of the agent. Failure to accommodate the new schema resulted in a significantly degraded ability to detect threats on endpoints.
Nor should such changes be considered one-and-done. Backwards compatibility is often needed where some systems emit logs in the legacy schema and some in the new. One would think that well over a decade is a sufficiently long period of time to drop consideration for an old schema. Yet the legacy key/value Windows log format shows up from time to time, despite the fact that its XML-based replacement was introduced with Windows Vista in 2007. This means that the number of formatting checks that must be performed tends to increase over time.
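The accumulating format checks are easier to manage when funneled through a single normalization layer. A minimal Python sketch follows; the field names are invented for illustration and do not correspond to any vendor's actual schema.

```python
# Hypothetical normalization layer mapping two generations of a vendor's
# event schema onto one internal shape, so downstream searches see a
# single format. All field names here are placeholders.

COMMON_FIELDS = ("hostname", "severity", "technique")

def normalize(event: dict) -> dict:
    """Accept either the legacy flat schema or the newer nested schema."""
    if "ComputerName" in event:            # legacy, flat schema
        return {
            "hostname": event["ComputerName"],
            "severity": event.get("Severity", 0),
            "technique": event.get("TechniqueName", "unknown"),
        }
    if "device" in event:                  # newer schema nests host details
        return {
            "hostname": event["device"]["hostname"],
            "severity": event.get("severity", 0),
            "technique": event.get("technique", "unknown"),
        }
    raise ValueError("unrecognized event schema")

legacy = {"ComputerName": "WS01", "Severity": 3, "TechniqueName": "T1059"}
current = {"device": {"hostname": "WS02"}, "severity": 4, "technique": "T1059"}

# Both generations collapse to the same internal shape.
for raw in (legacy, current):
    normalized = normalize(raw)
    assert all(field in normalized for field in COMMON_FIELDS)
```

Centralizing the checks this way means a third schema change touches one function, not every search that consumes the data.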
While the task of monitoring the availability of telemetry and collectors typically falls outside the scope of detection engineering, it is nonetheless important to build a system with a certain resilience to error and tampering.
Adversaries, aware that defenders are increasingly dependent on a handful of tools for detection and response, have introduced various measures to disable or defang AV and EDR tools. Disabling log collectors does not yet seem to be a widespread technique; instead, attackers have long favored clearing logs, a standard post-exploitation step for many years, and bypassing the system utilities that generate logs in the first place.
Detecting the Windows event log being cleared is fairly pedestrian, and monitoring handles to NTDLL.dll is possible with good telemetry, but it is entirely reasonable to rely on EDR vendors to play the cat-and-mouse game of defense evasion against more advanced techniques.
Yet the question remains: what to do when adversaries are capable of disabling or bypassing EDR or other monitoring tools? It is patently not the role of the SOC analyst to respond to individual endpoints going dark. There must be a threshold that, when breached, rises to the level of a security incident: either a defined number of systems failing to report for a specified period of time, or a risk score covering actions (or sequences of actions) observed before the telemetry stopped. Put another way: one endpoint failing to report during normal business hours is normal, whereas one hundred endpoints going missing is likely a security incident. Similarly, an endpoint interacting with a newly-registered domain and retrieving an MSI file shortly before its telemetry ceases is probably also worth investigating. It is the task of the detection engineer to work with those responsible for monitoring availability to define what is normal, so that it is clear when an outage has breached the threshold and become a larger security concern.
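The numeric half of that threshold can be sketched simply. The window and threshold values below are placeholders to be tuned per environment, not recommendations.

```python
from datetime import datetime, timedelta, timezone

# Sketch of a "dark endpoint" threshold check. SILENCE_WINDOW and
# INCIDENT_THRESHOLD are illustrative values only.

SILENCE_WINDOW = timedelta(hours=4)   # how long before an endpoint counts as dark
INCIDENT_THRESHOLD = 100              # dark endpoints that constitute an incident

def dark_endpoints(last_seen: dict, now: datetime) -> list:
    """Return endpoints that have not reported within the silence window."""
    return [host for host, ts in last_seen.items() if now - ts > SILENCE_WINDOW]

def is_incident(last_seen: dict, now: datetime) -> bool:
    return len(dark_endpoints(last_seen, now)) >= INCIDENT_THRESHOLD

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
fleet = {f"host-{i:03d}": now - timedelta(minutes=30) for i in range(500)}
fleet["host-000"] = now - timedelta(hours=6)     # one laptop offline: normal

assert dark_endpoints(fleet, now) == ["host-000"]
assert not is_incident(fleet, now)

# A hundred systems going dark at once crosses the threshold.
for i in range(100):
    fleet[f"host-{i:03d}"] = now - timedelta(hours=6)
assert is_incident(fleet, now)
```

The risk-score variant would extend `is_incident` to weigh prior observed actions rather than a raw count, but the shape of the check is the same.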
Unfortunately, detecting malice is only half the battle here. Setting aside benign reasons for logs going missing (e.g. network connectivity issues for an entire site), there are also the SIEM and its data processing to consider. Logs can be flowing from an endpoint to the SIEM, but changes to indexing, parsing, or normalizing that data can result in searches failing to monitor those logs. For this reason, it is important to design a testing strategy that validates each step in the pipeline, all the way from the asset emitting the log to that data appearing in the results of a particular search query.
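Such a pipeline test might look like the following Python sketch, where each stage (parse, normalize, match) stands in for the corresponding SIEM step. The field names and the detection logic are invented for illustration.

```python
import json
import re

# Hypothetical end-to-end check of a detection pipeline. Each function is
# a stand-in for a real SIEM stage; field names and the toy detection are
# invented for illustration.

RAW = '{"proc": "C:\\\\Windows\\\\Temp\\\\payload.exe", "user": "alice"}'

def parse(raw: str) -> dict:
    """Ingest stage: turn the raw log line into structured data."""
    return json.loads(raw)

def normalize(event: dict) -> dict:
    """Normalization stage: map vendor fields onto the internal data model."""
    return {"process_path": event["proc"], "user_name": event["user"]}

def matches(event: dict) -> bool:
    """Toy detection: executables running out of a Temp directory."""
    return re.search(r"\\Temp\\", event["process_path"]) is not None

# Validate every stage, not just the final result, so a failure
# pinpoints exactly where the pipeline broke.
parsed = parse(RAW)
assert "proc" in parsed, "parsing failed"
normalized = normalize(parsed)
assert "process_path" in normalized, "normalization failed"
assert matches(normalized), "detection failed to fire on known-bad sample"
```

A real suite would replay recorded samples against the production parsing configuration, but the principle holds: one assertion per pipeline stage.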
This is a subtler failure mode, but important nonetheless. Sometimes two log sources overlap, showing the same activity with differing levels of verbosity. Consider Windows event logging. To make this even more concrete, let's assume we want to write an alert targeting some attribute of process creation logs. We have (at least) two choices for log sources:
- the native Windows Security log, which records process creation as event ID 4688 (command-line capture requires an additional audit policy setting); or
- Sysmon, which records process creation as event ID 1.
Setting aside any other merits of the Sysmon approach (event ID 6 for Driver Loaded can be very useful for detecting BYOVD attacks and has no direct, Windows-native equivalent), there is a clear difference between the two sources: Sysmon provides file hashes for this and other event codes, whereas native logging typically does not. Having the file hash for context, we can:
- enrich alerts with threat intelligence, such as known-bad hash lists;
- catch renamed or masqueraded binaries whose hash matches a known tool; and
- pivot across the environment to find every other host on which the same binary ran.
The two log sources may also prove complementary, with each playing to its strengths: some alerts will run against the old source, some against the new, and perhaps some will monitor both.
The utility of the log source is unlikely to be the only consideration. To extend the previous example, Sysmon does not ship with Windows and must be deployed separately. It is not hard to envision some business imperative or technology quirk that limits our ability to deploy an arbitrary number of agents to a Windows machine: the cost of licensing, lack of effective control or oversight over a particular business subunit, the use of ephemeral machines or containers, etc.
When new data becomes available, a detection engineer must weigh the visibility and detection coverage benefits against the infrastructure and organizational costs of implementation. The task here is usually to decide whether to jettison the less verbose logs to save resources or keep them as a part of a “defense-in-depth” approach.
The back-and-forth, high-stakes game between attackers and defenders never stops (ask any third-shift SOC analyst). Defenders deploy their defenses, and attackers respond by executing attacks that slip through the cracks.
The ongoing saga of PowerShell defense and evasion illustrates the dynamic quite well. Prior to version 5.0, PowerShell had relatively few protections, detective or preventive. Transcription logging existed, but had easily-exploitable gaps: obfuscated commands, execution via script, lack of support for remote sessions, etc. Furthermore, transcription logging can be incredibly verbose and writes to plaintext files, complicating any effort to make effective use of the logs. Attackers were not deterred by what defenses then existed around PowerShell: PowerSploit's initial release occurred in 2012, and PowerShell Empire was first introduced to the public towards the end of this era, at a BSides in 2015.
With the 5.0 release, Microsoft deployed several features that should be familiar to many today: AMSI, script block logging, constrained language mode, etc. For a brief time, it appeared as though defenders had the upper hand on this terrain. Even if script block logging was not enabled or used, AMSI gave attackers enough of a headache that they simply started looking for ways to turn it off. As this technique proliferated, tamper protection mechanisms became more common in EDR and AV products. While this did not wholly prevent AMSI bypasses, it raised the level of sophistication needed to go undetected if PowerShell was a part of your tradecraft. At present, the ClickFix family of attacks has become common, which represents less an evolution in the conversation and more an attempt to reframe it entirely, as it exploits a human vulnerability, not a technological one.
Needless to say, effective detection strategy has not remained constant in the face of all these changes. Prior to 5.0, one might argue there was no effective PowerShell-specific monitoring available. As defense evasion techniques evolved, focus shifted from monitoring attributes of the commands executed towards the behaviors exhibited, as is generally the case when defenders seek to situate their defenses higher up the Pyramid of Pain.
This post has demonstrated why detection engineering must be an ongoing process with a structured approach: threats to the strategy obviously stem from evolution in attacker techniques, but also from conflicting business imperatives and even from the monitoring tools themselves. An undisciplined process is likely to lose sight of one or more of these threats, or else balance them poorly, creating exploitable gaps.
In the next posts, we will examine how to respond to these challenges. We will survey what data is available to us to evaluate our strategy and how to construct a test suite that enables us to rapidly iterate when confronted with change, while preserving our ability to assert coverage for targeted techniques.
The nature of the task suggests we will never achieve complete detection coverage or total security, but our goal should be to outpace the attacker and never fall behind. Only with an engineering mindset can we move quickly enough without risking the wheels falling off.