Should We Target Zero False Positives?

In a perfect world, secrets detection tools would detect every leaked secret and never report a false positive.

Unfortunately – or perhaps fortunately… – we don’t live in a perfect world: Secret detection tools aren’t perfect, and sometimes they report false positives. But would it really be better if they didn’t?

What are false positives and false negatives?

A false positive occurs when the system reports a condition as present when it is not. The opposite, a false negative, occurs when a condition is reported as absent when it should have been reported as present.

In the case of secrets detection, the condition to determine is whether a character sequence, or a set of character sequences, is a secret.

We can summarize this in a table:

                           Marked as not a secret (negative)   Marked as a secret (positive)
Sequence is not a secret   true negative                       false positive
Sequence is a secret       false negative                      true positive

(You can learn more about these and other terms in our article explaining the concepts of accuracy, precision, and recall)
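
To make the table concrete, here is a minimal sketch, with made-up counts, of how precision and recall follow from these four outcomes:

```python
# Made-up counts for illustration only.
true_positives = 90      # real secrets correctly reported
false_positives = 10     # harmless sequences reported as secrets
false_negatives = 5      # real secrets the tool missed
true_negatives = 10_000  # harmless sequences correctly ignored

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.2f}")  # share of reported secrets that are real (0.90)
print(f"recall    = {recall:.2f}")     # share of real secrets that were reported (0.95)
```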

With these definitions in mind, it makes sense for secrets detection tools to try to increase the number of true negatives and true positives, and to reduce false positives. But secrets detection relies on a lot of inference, so the answer is often not black or white. What should a secrets detection tool do when the answer is not clear-cut?

Which is worse, too many false positives or too many false negatives?

Let’s take this to the extreme.

If the tool reports too many false positives, it leads to alert fatigue: users are flooded with alerts, they start ignoring them, and some real secrets end up not being taken care of.

On the other hand, if the tool is too aggressive against false positives, users are not inundated with alerts and can take care of every reported incident. But since the tool’s inference cannot be perfect, it will misclassify some real secrets as negatives and silently sweep them under the rug.

Which is better?

In both cases, real secrets are at risk of being missed, but if the tool prefers to report a sequence as a secret when it is unsure, the user at least gets a chance to notice it: the secret is not silently discarded.

This may feel counterintuitive when you start evaluating secrets detection tools: when comparing tools A and B, if A reports only valid secrets while B reports both valid and invalid ones, you might be tempted to conclude that A is the better product. That is, until you dig deeper and notice that some of the valid secrets reported by B were silently ignored by A.

This is similar to spam filtering: if the filter is too aggressive, your inbox stays clean and tidy, but you have to regularly check the spam folder for legitimate emails.

False positive, from whose point of view?

Sometimes tools cannot access the entire context. A sequence can be flagged as a secret by a detector because it matches all the properties of the secrets used by a particular service. Yet you know it is not really a secret in the context of your infrastructure, for example because it is only used in example code. For the detector, this is a valid secret; for you, it is a false positive.
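
As a hypothetical illustration (the service, key format, and file below are made up), a dummy credential kept in test code can have exactly the shape a detector looks for, even though it unlocks nothing:

```python
# tests/test_billing_client.py
# Hypothetical example: this key exists only in test code and is rejected by
# the (fictional) billing service, yet its shape matches what a detector
# expects of a real API key, so it may be reported as a leaked secret.
DUMMY_API_KEY = "sk_test_51AbCdEfGhIjKlMnOpQrStUv"  # placeholder, not a real credential


def build_auth_header(key: str = DUMMY_API_KEY) -> dict:
    """Build the Authorization header the billing client would send."""
    return {"Authorization": f"Bearer {key}"}


def test_auth_header_uses_bearer_scheme():
    assert build_auth_header()["Authorization"].startswith("Bearer ")
```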

Continuing the spam filter analogy, you are the ultimate arbiter of whether an email is spam or not.

Beware of overfitting

According to Wikipedia, overfitting is “the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably”.

When developing a secrets detection tool, trying to eliminate every false positive reported while testing the tool against known datasets can lead to overfitting. The tool will get perfect results in theory but will perform poorly on real-world data.

To avoid falling into this trap, we use both static datasets and real-time data. Static datasets are useful when determinism is needed, for example to detect regressions. Real-time data prevents overfitting by exercising our tools against diverse content. We also update our static datasets regularly to ensure they remain relevant.
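
Here is a minimal sketch of what this kind of overfitting can look like for a simple pattern-based detector. The token format, both datasets, and both regular expressions are invented for illustration; the point is only that a rule tuned until a small benchmark shows zero false positives can miss secrets that real-world data still contains:

```python
import re

# Invented token format and datasets, for illustration only.
STATIC_DATASET = [
    ("api_key = 'XKEY0000111122223333'", True),       # real secret in the benchmark
    ("sample_key = 'XKEYEXAMPLEKEY123456'", False),   # known benchmark false positive
]
LIVE_DATA = [
    ("token = 'XKEY9876543210ZYXWVU'", True),         # real-world secret, unseen in the benchmark
]

# Overfitted rule: tightened until the static dataset shows zero false positives
# (here, by only accepting digits after the prefix).
overfitted = re.compile(r"XKEY\d{16}")

# More general rule: accepts the full character set,
# at the price of flagging the benchmark's sample key.
general = re.compile(r"XKEY[0-9A-Z]{16}")


def evaluate(pattern, dataset):
    """Count true positives, false positives and false negatives for a pattern."""
    tp = fp = fn = 0
    for line, is_secret in dataset:
        flagged = bool(pattern.search(line))
        tp += flagged and is_secret
        fp += flagged and not is_secret
        fn += (not flagged) and is_secret
    return {"tp": tp, "fp": fp, "fn": fn}


for name, pattern in [("overfitted", overfitted), ("general", general)]:
    print(name, "static:", evaluate(pattern, STATIC_DATASET), "live:", evaluate(pattern, LIVE_DATA))
# The overfitted rule looks perfect on the benchmark but misses the live secret;
# the general rule reports one benchmark false positive but catches everything.
```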

How to avoid alert fatigue?

It’s safer to have some false positives than none at all, but if we don’t deal with alert fatigue, users won’t be much better off. What features do GitGuardian tools provide to help with this problem?

First, both our internal and public monitoring products allow you to define repositories to exclude. This lets you quickly get rid of a whole batch of irrelevant alerts.

Second, following our shift-left strategy, we introduced GitGuardian Shield (ggshield), a client-side tool that prevents secrets from being committed. ggshield helps reduce the number of false positives reported to your security team by making your developers aware of them: if they try to commit a secret, ggshield will prevent them from doing so, unless they mark the secret as ignored using a ggignore comment (in effect, declaring it a false positive). When a secret is marked as ignored, it will not trigger an alert when it is pushed to your repository.
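
Conceptually, a client-side check like this boils down to the following sketch. This is not ggshield’s actual implementation; the detection regex and the ignore marker are placeholders, and the only point is to show how a commit can be blocked unless a finding has been explicitly marked as ignored:

```python
import re
import sys

# Placeholder detection rule and ignore marker, for illustration only.
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|token)\s*=\s*['\"][^'\"]{16,}['\"]", re.I)
IGNORE_MARKER = "ggignore"


def check_lines(lines):
    """Return (line number, line) pairs that look like secrets and are not marked as ignored."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if SECRET_PATTERN.search(line) and IGNORE_MARKER not in line
    ]


if __name__ == "__main__":
    findings = check_lines(sys.stdin.read().splitlines())
    for number, line in findings:
        print(f"Possible secret on line {number}: {line}")
    # A non-zero exit code is what makes a pre-commit hook reject the commit.
    sys.exit(1 if findings else 0)
```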

Finally, once a leaked secret has been reported, we provide tools to help your security team dispatch it quickly. For example, the auto-healing playbook automatically sends an email to the developer who committed the secret. Depending on how the playbook is configured, developers can be allowed to resolve or ignore the incident themselves, easing the workload left to the security team.

Conclusion

Don’t be lured by false promises of zero false positives. If a secrets detection tool does not report any false positives, it is most likely silently skipping real secrets.

We’re constantly improving our tools and refining our algorithms to report as few false positives as possible, but like an asymptote, zero false positives can never quite be reached. Erring on the side of caution and allowing some false positives is much safer: it leaves the user in control. But beware of alert fatigue: secrets detection tools should provide easy ways for users to flag false positives as such.
