EPSS is a daily estimate of the probability that exploitation activity will be observed over the next 30 days. It is designed from the ground up to make the best possible use of all available information, in five steps:

  1. Collect as much vulnerability information as possible from various sources.

  2. Collect evidence of daily exploitation activity.

  3. Train a model: learn the relationship between vulnerability information and exploitation activity.

  4. Measure the model’s performance, tune it, and repeat Step 3.

  5. Daily: Update vulnerability information (Step 1) and apply the model (Step 3) to produce daily estimates of the likelihood of exploitation over the next 30 days for each published CVE.

Contents:

  1. Vulnerability information
  2. Exploitation activity
  3. EPSS modeling
    1. Predict the future
  4. Model performance
    1. Effort, efficiency and coverage
    2. Measure performance across all possible scores.
  5. Where do I get EPSS results?
  6. Where do I start as a practitioner with EPSS? How do I use it?
  7. How is EPSS different from CVSS? Should it be used with CVSS or replace CVSS?

Vulnerability information

Vulnerability intelligence gathering is about collecting data that we hope will help answer the question: “What increases (or decreases) the likelihood that a vulnerability will be exploited?” Fortunately, we don’t need to know the answer or estimate weights for the data we collect. The modelling in Step 3 (in combination with the exploitation activity from Step 2) explores how different sources of information help explain the exploitation activity we observe. The more information we can gather, the better: more detail and diversity in the data help the model discover increasingly subtle patterns about which vulnerabilities are likely to be exploited.

The vulnerability information we collect:

  • Vendor (CPE, via NVD)
  • Vulnerability age (days since the CVE was published in the MITRE CVE List)
  • References with categorical labels that define their content (MITRE CVE List, NVD)
  • Normalized multi-word expressions extracted from the vulnerability description (MITRE CVE List)
  • Weaknesses in the vulnerability (CWE, via NVD)
  • CVSS metrics (base vector from CVSS 3.x, via NVD)
  • Whether the CVE is listed or discussed on a list or website (CISA KEV, Google Project Zero, Trend Micro’s Zero Day Initiative (ZDI), more to be added)
  • Publicly available exploit code (Exploit-DB, GitHub, Metasploit)
  • Offensive security tools and scanners: Intrigue, sn1per, jaeles, nuclei

As EPSS develops, this list will undoubtedly be updated and expanded.

Another critical aspect of EPSS is its meticulous approach to capturing timing information.

Knowing when information was published, entered into lists, or otherwise influenced the threat landscape is crucial.

Of particular importance, for example, is whether and when a module for a specific CVE was added to Metasploit.

Metasploit is an open-source penetration testing framework used by cybersecurity experts, ethical hackers and penetration testers. It provides a comprehensive set of tools and resources for detecting, exploiting, and validating security vulnerabilities in computer systems. Key features include exploit development, a variety of payloads, post-exploitation modules, and auxiliary modules for scanning and information gathering. Metasploit integrates with databases, including exploit databases, and its community-driven approach helps keep the framework up to date. The Metasploit Framework is freely available, while commercial versions such as Metasploit Pro and Metasploit Express offer additional features and support. Ethical use of Metasploit means authorized, controlled testing to identify and remediate security vulnerabilities proactively; unethical or unauthorized use is strictly prohibited.
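The emphasis on timing can be illustrated with a minimal sketch of “as-of” feature computation. It assumes each event (such as a Metasploit module release or a KEV addition) is stored with a date; the function names are hypothetical and not part of EPSS. The key idea is that a feature computed for a past date must ignore anything that happened after that date, otherwise the model would learn from information it could not have had at the time.

```python
from datetime import date
from typing import Optional


def has_metasploit_module(module_added: Optional[date], as_of: date) -> int:
    """Return 1 if a Metasploit module was public on `as_of`, else 0.

    `module_added` is the date the module was published (None if no module
    is known). Events after `as_of` are ignored so that features computed
    for past dates never leak information from the future.
    """
    return int(module_added is not None and module_added <= as_of)


def days_since(event: Optional[date], as_of: date) -> Optional[int]:
    """Days from an event (CVE publication, KEV addition, ...) to `as_of`.

    Returns None if the event had not happened yet on `as_of`.
    """
    if event is None or event > as_of:
        return None
    return (as_of - event).days


# Hypothetical example: a module published on 2023-06-15.
module_date = date(2023, 6, 15)
print(has_metasploit_module(module_date, date(2023, 6, 1)))  # 0: not yet public
print(has_metasploit_module(module_date, date(2023, 7, 1)))  # 1
print(days_since(module_date, date(2023, 7, 1)))              # 16
```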

Exploitation activity

Feedback enables learning, which is why EPSS focuses on collecting and organizing feedback in the form of exploitation activity observed in the wild. The list of contributors is constantly expanding. Exploitation activity is evidence that an attempt was made to exploit a vulnerability; it does not necessarily mean the attempt succeeded against a vulnerable target. Data are collected from honeypots, IDS/IPS sensors and host-based detection methods, and these data sources continue to be extended.

It was learned early on that exploitation activity is not a permanent stream that, once begun, continues indefinitely. Exploitation is intermittent, often sporadic, and sometimes isolated, local and short-lived. A simple report that a vulnerability has been “exploited” does not reveal precisely when or how often the exploitation activity occurred.

The timing must be known precisely to measure whether it happened before or after certain events and whether it is still being exploited.

With specific timing information, the impact of the various events known about each vulnerability can be accurately measured.

To highlight this aspect of timing: EPSS treats sources such as Google Project Zero and CISA KEV as vulnerability information rather than exploitation activity, because a vulnerability’s presence on such a list represents a single point in time (the day it was added) that lies in the past. Monitoring and collecting this information makes it possible to understand what happens after a vulnerability is added to the website or list. Once a CVE is on CISA’s KEV list, attackers may be less likely to exploit it in the future, because defenders may prioritize fixing it. Similarly, Google Project Zero coverage may mean that zero-days become less of a target once they reach N-day status as a published CVE. On the other hand, both lists may raise attacker awareness, and increased exploitation activity may follow days or months after a CVE is added.

Detailed exploitation activities and the daily timing of these activities form the feedback loop to train the EPSS model.
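A minimal sketch of how intermittent, dated sightings could be turned into the 30-day target that a model like EPSS learns from. The function name and the exact window boundaries (start inclusive, end exclusive) are assumptions for illustration, not the EPSS team’s definition.

```python
from datetime import date, timedelta
from typing import Iterable

WINDOW_DAYS = 30  # prediction horizon used by EPSS


def exploited_in_window(activity_dates: Iterable[date], as_of: date,
                        window_days: int = WINDOW_DAYS) -> int:
    """Label a CVE 1 if any exploitation activity falls in
    [as_of, as_of + window_days), else 0 (window bounds are an assumption)."""
    end = as_of + timedelta(days=window_days)
    return int(any(as_of <= d < end for d in activity_dates))


# Hypothetical sightings for one CVE: activity is sporadic, not continuous.
sightings = [date(2023, 1, 3), date(2023, 3, 20)]
for ref in (date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)):
    print(ref, exploited_in_window(sightings, ref))
```

With sightings on January 3 and March 20, the CVE is labeled exploited for the January 1 and March 1 reference dates but not for February 1, which is how sporadic exploitation shows up in the training labels.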

EPSS modeling

EPSS uses machine learning to identify patterns and relationships between vulnerability information and exploitation activity collected over time. Model generation is closely linked to measuring the performance of the model. Instead of focusing solely on modelling, the EPSS team spends much more time on performance measurement, which is critical to using EPSS.

Predict the future

Measuring performance may seem impossible at first glance.

Here’s how it works: EPSS is trained on 12 months of historical data. To test performance on “future” data, the team goes back 14 months and trains the model on the 12 months running from 14 months ago to 2 months ago. That leaves the most recent two months as “future” data the model has never seen. The team can then test different variations of models and data sources and see how well each would have performed in the “future”. It can also test how the model’s performance degrades over time.
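The backtesting scheme can be sketched as a time-based split. This is an illustration under simplifying assumptions, not the EPSS team’s code: records are hypothetical `(date, CVE, label)` tuples, and the month arithmetic clamps the day to 28 to avoid invalid dates.

```python
from datetime import date
from typing import List, Tuple

Record = Tuple[date, str, int]  # (observation date, CVE ID, exploited label)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is clamped to 28 for simplicity."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def backtest_windows(today: date, train_months: int = 12,
                     holdout_months: int = 2) -> Tuple[date, date, date]:
    """Return (train_start, train_end, test_end).

    Training covers the 12 months ending 2 months ago (i.e. starting 14 months
    ago); the last 2 months are held out as unseen "future" data.
    """
    train_end = add_months(today, -holdout_months)
    train_start = add_months(train_end, -train_months)
    return train_start, train_end, today


def split_records(records: List[Record], today: date) -> Tuple[List[Record], List[Record]]:
    """Split records by date into a training set and a held-out 'future' set."""
    train_start, train_end, test_end = backtest_windows(today)
    train = [r for r in records if train_start <= r[0] < train_end]
    test = [r for r in records if train_end <= r[0] < test_end]
    return train, test
```

For a reference date of October 1, 2023, this trains on August 1, 2022 up to August 1, 2023 and holds out August and September 2023. Splitting by time rather than at random is what keeps the held-out data genuinely “future” from the model’s point of view.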

A note on prediction

When most people hear “prediction,” they may picture a mystic with a crystal ball making vaguely specific statements about the future. EPSS cannot tell you exactly what will be exploited, what won’t, and when. A more fitting analogy is gambling (probability theory itself originated in the study of games of chance).

All that can be said about vulnerability exploitation is that some vulnerabilities are more likely to be exploited than others, and that is all EPSS says. EPSS gives practitioners information that provides an advantage: it lets them act proactively against attackers before the attackers act.

Model performance

To measure performance, we need two (perhaps obvious) things: our assessment of future events and feedback about whether those future events occurred or not.

Let’s walk through an example using a simple prioritization strategy that people are familiar with: prioritize all CVEs with a CVSS base score of 7 or above. We’ll discuss how to express performance as a set of numbers later; for now, we remediate everything with a CVSS score of 7 or higher and delay remediation for everything else. We then compare this decision with the outcome (observed exploitation activity). This gives each vulnerability two attributes: whether we prioritized it and whether any exploitation activity was observed. Each vulnerability can therefore be tagged with one of four categories, as shown in the image below. To put this in numbers: as of October 1, 2023, NVD had published CVSS 3.x scores for 139,473 CVEs. Over the following 30 days (October 1-30), we observed exploitation activity against 3,852 unique CVEs that have CVSS 3.x scores (about 2.8%). Exploitation activity was observed against many more CVEs in October, but since we are measuring CVSS here, we limit ourselves to those with CVSS 3.x scores. For these roughly 139,000 CVEs, we can create a 2x2 table of the two variables (left) and visualize the proportions with a Venn diagram (right).

The large grey circle represents all published CVEs with a CVSS 3.x score assigned by NVD, and all CVEs with a CVSS score of 7 or higher are shown in blue (our “strategy” says we should prioritize these). Finally, all CVEs we observed being exploited in the following 30 days are shown in red (these are the ones we should have prioritized).

  • True Positives (TP) are the standout choices: prioritized vulnerabilities against which we also observed exploitation activity in the wild. This is where the blue circle of prioritized vulnerabilities overlaps the red circle of exploited vulnerabilities.
  • False Positives (FP) are prioritized vulnerabilities that were not exploited. These decisions represent potentially wasted resources; they lie in the part of the blue circle that does not overlap the red circle.
  • False Negatives (FN) are vulnerabilities that were not prioritized but were observed being exploited in the wild. They lie in the part of the red circle that does not overlap the blue circle of prioritized vulnerabilities.
  • True Negatives (TN) are vulnerabilities that were neither prioritized nor exploited. They lie in the outer grey circle, outside both the blue and the red circles.
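The four categories can be computed mechanically for any “prioritize if score ≥ threshold” strategy. The sketch below uses made-up toy scores and placeholder CVE identifiers (not real data); with the CVSS 7+ strategy it yields one CVE in each category.

```python
from typing import Dict, Set


def confusion_counts(scores: Dict[str, float], threshold: float,
                     exploited: Set[str]) -> Dict[str, int]:
    """Tag each scored CVE as TP, FP, FN or TN for the strategy
    'prioritize if score >= threshold' and count each category."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for cve, score in scores.items():
        prioritized = score >= threshold
        was_exploited = cve in exploited
        if prioritized and was_exploited:
            counts["TP"] += 1
        elif prioritized:
            counts["FP"] += 1
        elif was_exploited:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy CVSS base scores and placeholder CVE IDs (not real data).
cvss = {"CVE-A": 9.8, "CVE-B": 7.5, "CVE-C": 5.3, "CVE-D": 4.0}
exploited_next_30_days = {"CVE-A", "CVE-C"}
print(confusion_counts(cvss, 7.0, exploited_next_30_days))
# {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```

The same function applies unchanged to EPSS scores, for example with a threshold of 0.1.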

As the figure above shows, the CVSS 7+ remediation strategy prioritizes a large share of the published CVEs (the blue circle covers much of the grey one), produces many false positives (the blue area that does not overlap the red), and still leaves many exploited vulnerabilities open and waiting to be fixed (the red area not covered by the blue).

Let’s compare this to EPSS with a completely arbitrary threshold of 10% (this value is not a recommendation; it was chosen just for this example).

The first noticeable difference between the two is the size of the blue circle, that is, the effort each strategy would require. With an EPSS threshold of 10%, the effort is significantly reduced. False positives also drop considerably, which is good, but this comes at the expense of some true positives.

These four intertwined variables (TP, FP, FN, TN) are sufficient to determine a particular strategy’s value and performance. We can simplify the table to two numbers that most organizations want to track over time: efficiency and coverage. Of course, we also want to monitor effort, as it is closely tied to staff, resources and budgets.

Effort, efficiency and coverage

Using these four categories (TP, FP, FN, TN) we can derive three additional meaningful metrics: effort, efficiency and coverage.

Effort measures the proportion of vulnerabilities that are prioritized. Research has shown that most organizations can remediate an average of 10-15% of their open vulnerabilities per month. Keep this in mind when comparing strategies and the time and resources they would require from your stakeholders.

Efficiency considers how efficiently resources were deployed by measuring the percentage of prioritized vulnerabilities that were exploited. In the diagram above, efficiency is the amount of the blue circle covered by the red circle. Prioritizing primarily exploited vulnerabilities would result in a high efficiency score (resources were allocated efficiently), while prioritizing potentially random or largely unexploited vulnerabilities would result in a low efficiency score. Efficiency is calculated as the number of prioritized exploited vulnerabilities (TP) divided by the total number of prioritized vulnerabilities (TP+FP).

Coverage measures the percentage of exploited vulnerabilities that were prioritized. It is calculated by dividing the number of prioritized exploited vulnerabilities (TP) by the total number of exploited vulnerabilities (TP + FN). In the diagram above, coverage is the amount of the red circle covered by the blue circle. Low coverage indicates that only a few exploited vulnerabilities were fixed using the given strategy.
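The three metrics translate directly into code. In this sketch, effort is taken as the prioritized share of all vulnerabilities, (TP + FP) / (TP + FP + FN + TN), matching the definition above. The counts in the usage line are invented for illustration and are not the October 2023 figures discussed earlier.

```python
def strategy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Effort, efficiency and coverage from the four confusion-table counts."""
    total = tp + fp + fn + tn
    prioritized = tp + fp      # blue circle
    exploited = tp + fn        # red circle
    return {
        "effort": prioritized / total if total else 0.0,
        "efficiency": tp / prioritized if prioritized else 0.0,  # TP / (TP + FP)
        "coverage": tp / exploited if exploited else 0.0,        # TP / (TP + FN)
    }


# Invented counts for illustration only.
print(strategy_metrics(tp=30, fp=70, fn=10, tn=890))
```

Here 100 of 1,000 vulnerabilities are prioritized (effort 0.1), 30 of those 100 were exploited (efficiency 0.3), and 30 of the 40 exploited vulnerabilities were caught (coverage 0.75).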

These three metrics are interrelated. Within a single strategy, achieving better coverage requires more effort and lowers efficiency, while improving efficiency often reduces effort and coverage. The way past this trade-off is a better prioritization strategy, which is why tracking these metrics and comparing them across different approaches is important.

Measure performance across all possible scores.

EPSS has never published guidance on thresholds that everyone should use. There is no universal “critical” or “high” to agree on. Breaking down and understanding metrics like effort, efficiency, and coverage will help explain why. These thresholds represent risk tolerance statements, and different organizations approach vulnerability prioritization differently depending on their internal risk tolerance and resource constraints. Organizations with few resources (staff/budget) may want to emphasize efficiency (at the expense of coverage) to limit their effort and achieve maximum impact with the limited resources available. But for organizations where resources are less constrained, and security is more mission-critical, the emphasis may be on increased coverage at the expense of effort and efficiency.
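Because there is no universal threshold, one practical approach is to compute effort, efficiency and coverage at many candidate thresholds and then pick the trade-off that matches your risk tolerance. A minimal sketch on toy data; the scores and placeholder CVE labels are made up.

```python
from typing import Dict, List, Set, Tuple


def sweep_thresholds(scores: Dict[str, float], exploited: Set[str],
                     thresholds: List[float]) -> List[Tuple[float, float, float, float]]:
    """For each threshold, return (threshold, effort, efficiency, coverage)."""
    total = len(scores)
    n_exploited = sum(1 for cve in scores if cve in exploited)
    rows = []
    for t in thresholds:
        prioritized = {cve for cve, s in scores.items() if s >= t}
        tp = len(prioritized & exploited)
        effort = len(prioritized) / total if total else 0.0
        efficiency = tp / len(prioritized) if prioritized else 0.0
        coverage = tp / n_exploited if n_exploited else 0.0
        rows.append((t, effort, efficiency, coverage))
    return rows


scores = {"CVE-A": 0.9, "CVE-B": 0.6, "CVE-C": 0.3, "CVE-D": 0.05}
for t, effort, eff, cov in sweep_thresholds(scores, {"CVE-A", "CVE-C"}, [0.1, 0.5, 0.8]):
    print(f"threshold={t:.2f} effort={effort:.2f} efficiency={eff:.2f} coverage={cov:.2f}")
```

On this toy data, raising the threshold from 0.1 to 0.8 cuts effort from 0.75 to 0.25 and halves coverage, while efficiency rises from about 0.67 to 1.0. An organization with tight resources might accept the lower coverage; one that needs broad coverage would choose a lower threshold.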

Where do I get EPSS results?

There are two ways to obtain EPSS results. First, a daily CSV file containing all CVEs can be downloaded via a direct HTTP request. Second, a REST API is available and documented at https://api.first.org/epss/.
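A minimal Python sketch of consuming both formats. Assumptions to verify against FIRST’s documentation: the query endpoint `https://api.first.org/data/v1/epss` with a `cve` parameter, a JSON response whose `data` array holds `cve`, `epss` and `percentile` fields encoded as strings, and a CSV whose first line may be a `#`-prefixed metadata comment ahead of a `cve,epss,percentile` header. If the CSV download is compressed, decompress it first (for example with Python’s `gzip` module). The sample values are made up.

```python
import csv
import json
import urllib.request
from typing import Dict, Tuple

# Assumed query endpoint; check FIRST's EPSS API documentation before relying on it.
API_URL = "https://api.first.org/data/v1/epss"

Scores = Dict[str, Tuple[float, float]]  # CVE -> (epss, percentile)


def fetch_epss_json(cve: str) -> dict:
    """Fetch the EPSS record for one CVE from the REST API (requires network)."""
    with urllib.request.urlopen(f"{API_URL}?cve={cve}", timeout=30) as resp:
        return json.load(resp)


def parse_api_response(payload: dict) -> Scores:
    """Extract (epss, percentile) per CVE; values arrive as strings."""
    return {row["cve"]: (float(row["epss"]), float(row["percentile"]))
            for row in payload.get("data", [])}


def parse_epss_csv(text: str) -> Scores:
    """Parse the daily CSV, skipping '#' metadata lines before the header."""
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    return {row["cve"]: (float(row["epss"]), float(row["percentile"]))
            for row in csv.DictReader(lines)}


# Offline usage with made-up sample data.
sample_csv = "#model_version:v0,score_date:2023-10-01\ncve,epss,percentile\nCVE-X,0.5,0.99\n"
print(parse_epss_csv(sample_csv))  # {'CVE-X': (0.5, 0.99)}
```

Online, `parse_api_response(fetch_epss_json(cve_id))` would combine the two API helpers.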

Where do I start as a practitioner with EPSS? How do I use it?

EPSS can be used in many ways, so specific advice is hard to give. But first of all, get the data that EPSS produces. It is updated every day, for every published CVE. The primary EPSS score is a probability (a value between 0 and 1) that exploitation activity will be observed in the next 30 days. EPSS also puts this score into context with a percentile: the proportion of all scored vulnerabilities with the same or a lower score. EPSS scores can be used as an initial prioritization tool for vulnerabilities observed in your environment, since they help estimate the likelihood that a particular vulnerability will be exploited in the wild.
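A sketch of both ideas on toy data: computing a percentile as the share of scored CVEs at or below a given score, and ranking the CVEs found in your own environment by EPSS score. The helper names are hypothetical; FIRST publishes percentiles directly, and its exact handling of ties and rounding may differ from this illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def percentile_of(score: float, all_scores: List[float]) -> float:
    """Fraction of all scored CVEs whose EPSS score is at or below `score`."""
    ranked = sorted(all_scores)
    return bisect_right(ranked, score) / len(ranked)


def prioritize(inventory: List[str],
               epss: Dict[str, float]) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Rank the CVEs found in your environment by EPSS score, highest first.

    CVEs with no EPSS score are returned separately rather than treated as 0.
    """
    scored = sorted(((cve, epss[cve]) for cve in inventory if cve in epss),
                    key=lambda item: item[1], reverse=True)
    unscored = [cve for cve in inventory if cve not in epss]
    return scored, unscored


# Toy scores and placeholder CVE IDs.
print(percentile_of(0.5, [0.01, 0.02, 0.5, 0.9]))  # 0.75
print(prioritize(["CVE-A", "CVE-B", "CVE-C"], {"CVE-A": 0.02, "CVE-B": 0.9}))
```

Keeping unscored CVEs in a separate list avoids silently treating a missing score as a low probability.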

How is EPSS different from CVSS? Should it be used with CVSS or replace CVSS?

Both EPSS and CVSS aim to help network defenders better prioritize vulnerability management. Both efforts are carried out by volunteer groups of researchers, practitioners, academics and government employees. The results of both efforts are made available to the public free of charge.

EPSS is a measure of threat: it estimates the likelihood that a vulnerability will be exploited in the wild. This is achieved entirely through data-driven empirical analysis. Because EPSS generates a probability, it can be scaled to estimate the likelihood that at least one of a broader set of vulnerabilities will be exploited. This broader set could represent all vulnerabilities on a laptop, network appliance, subnet, or office location.
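One common way to scale probabilities to a group is the complement rule: the chance that at least one vulnerability is exploited is one minus the chance that none is, 1 − ∏(1 − pᵢ). The sketch assumes the vulnerabilities are exploited independently of one another, which is a simplification, since exploitation of related vulnerabilities may well be correlated; the scores in the example are hypothetical.

```python
import math
from typing import Iterable


def prob_at_least_one(probs: Iterable[float]) -> float:
    """P(at least one exploited) = 1 - prod(1 - p_i).

    Assumes the vulnerabilities are exploited independently of each other.
    """
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical EPSS scores for the CVEs found on one laptop.
laptop = [0.5, 0.5]
print(prob_at_least_one(laptop))  # 0.75
```

Note how quickly the group probability grows: many individually unlikely vulnerabilities on one asset can add up to a substantial chance that at least one is exploited.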

CVSS, on the other hand, is a measure of the overall severity of a vulnerability. The CVSS Base metrics are assessed from the immutable characteristics of a vulnerability. These values are then converted into numerical form, and an equation is used to approximate the vulnerability’s severity ranking. CVSS cannot be used to combine scores from multiple vulnerabilities.

Cheers, Sven