API observability overview
What is API observability?
API observability is the extent to which an API’s internal state can be understood through the signals it emits. For an API to be observable, it must be instrumented with event listeners, agents, or libraries that enable teams to passively collect its metrics, events, logs, and traces. This telemetry data can then be used to create alerts that notify teams about concerning issues, and it can also be visualized and forwarded to an APM tool for further analysis. API observability therefore plays a crucial role in helping teams monitor their API’s performance, troubleshoot issues, understand usage patterns, and identify opportunities for optimization.
Here, we’ll discuss how API observability supports the API-first development model—and clarify the relationship between API observability and API monitoring. Next, we’ll review the four pillars of API observability and explore some common use cases for this telemetry data. Finally, we’ll highlight some features of the Postman API Platform that can help teams improve the observability of every API in their portfolio.
What role does API observability play in an API-first world?
Today, many teams are designing and building applications as a collection of internal and external services that are delivered through APIs. This approach, which is known as API-first, has resulted in the widespread proliferation of independently managed microservices that communicate with one another through APIs. Microservice-based architectures are highly scalable, but their distributed nature makes them difficult to observe. For instance, a seemingly small change in one microservice may have significant consequences for another microservice it interacts with, but these kinds of problems can be difficult to troubleshoot without full visibility into the entire system.
In addition to helping teams implement microservice-based architectures, the API-first approach has also led to an increase in the number of APIs that are offered as billable products to third-party consumers. In this model, API producers are responsible for upholding Service-Level Agreements (SLAs) for availability, performance, and security, and an issue can erode customer confidence and lead to churn.
API observability gives teams the information they need to ensure that every API—whether it’s private, partner, or public—is delivering its maximum value. It enables teams to not only bridge production-level visibility gaps between microservices, but also correlate trends in API performance and usage with key business metrics, such as revenue and adoption, in order to maintain alignment between business and technical strategies.
What is the relationship between API observability and API monitoring?
API observability and API monitoring are closely related concepts, but they are not the same thing. API monitoring is the process of gathering, visualizing, and alerting on pre-defined metrics in order to ensure that an API is meeting specific expectations. API observability supports API monitoring, but it is much more open-ended. It not only gives teams access to context-rich data that enables them to debug unanticipated issues, but also facilitates ad-hoc exploration and complex analysis that can guide high-level business decisions.
What are the four pillars of API observability?
As discussed above, an observable API emits telemetry data that enables developers, site reliability engineers, and DevOps engineers to better understand its internal state. There are four primary types of telemetry data—metrics, events, logs, and traces—which are collectively known as the “four pillars” of API observability. While no single piece of telemetry data can tell the entire story, these different types of data can be analyzed together in order to paint a more complete picture of an API’s health, performance, and usage.
A metric is a measurement of a value that is taken at a specific interval, such as once per minute or once per hour. There are many types of metrics that can shed crucial light on different dimensions of an API’s health. For instance, work metrics—such as throughput and latency—can reveal how efficiently an API is able to process requests, while resource metrics—such as CPU and memory usage—can be used to gauge an API’s saturation. These metrics not only help surface issues that require immediate attention, but can also be analyzed over the long term in order to identify opportunities for optimization.
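To make the distinction between work and resource metrics concrete, here is a minimal sketch of an in-memory metric recorder. It is purely illustrative—the class, metric names, and sample values are all hypothetical, and a real system would use a metrics library or agent rather than hand-rolled storage:

```python
import time
from collections import defaultdict

class MetricRecorder:
    """Minimal in-memory metric store: name -> list of (timestamp, value)."""
    def __init__(self):
        self.series = defaultdict(list)

    def record(self, name, value, ts=None):
        self.series[name].append((ts if ts is not None else time.time(), value))

    def average(self, name):
        points = self.series[name]
        return sum(v for _, v in points) / len(points) if points else None

recorder = MetricRecorder()

# Work metric: per-request latency in milliseconds (hypothetical values).
for latency_ms in (120, 95, 210, 88):
    recorder.record("api.request.latency_ms", latency_ms)

# Resource metric: CPU usage percentage, sampled once per interval.
recorder.record("host.cpu.percent", 42.5)

print(recorder.average("api.request.latency_ms"))  # 128.25
```

In production, these values would be aggregated over fixed intervals and shipped to a monitoring backend rather than averaged in memory.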
Events capture significant state changes within a system, such as a new host spinning up, a code deployment, or a configuration change. They include contextual information about what happened, when it happened, and which users, services, or assets were involved. For instance, a code deployment event will likely include a timestamp, the name of the user who initiated it, the deployment environment, and the name of the branch that was merged. Events can be useful when teams need to troubleshoot sudden spikes in an API’s latency or error rate, as they may contain clues about the issue’s root cause.
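The deployment event described above might be represented as a structured record like the following. The field names and values here are illustrative, not part of any standard event schema:

```python
import json
from datetime import datetime, timezone

def deployment_event(user, environment, branch):
    """Build a structured deployment event carrying the contextual fields
    described above: who, when, where, and what (field names are illustrative)."""
    return {
        "type": "code_deployment",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "initiated_by": user,
        "environment": environment,
        "branch": branch,
    }

event = deployment_event("jane.doe", "production", "release/2.4")
print(json.dumps(event))
```

During an incident, filtering events of this shape by timestamp and environment lets a team quickly check whether a latency spike lines up with a recent deployment.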
Whereas events capture significant—though relatively infrequent—activity, logs record the details of every action that takes place in a system. Logs are much more granular than events; in fact, a single event can often be correlated with numerous logs. For instance, every step of a code deployment event—such as the moment the working branch was merged, the build initiation, and every CI test execution—would likely be captured in separate logs.
A typical API log contains the request method and URL, its timestamp, the HTTP status code, the response time, and the IP address of the caller. This information helps teams troubleshoot issues with specific endpoints and methods, and it can also be used to investigate suspicious activity or security attacks.
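As a sketch, the access-log line described above could be rendered like this. The key=value layout, logger name, and sample values are all assumptions for illustration; real services typically rely on their web framework's logging middleware:

```python
import logging
import time

logging.basicConfig(format="%(message)s", level=logging.INFO)
logger = logging.getLogger("api.access")

def format_access_log(method, url, status, response_ms, client_ip):
    """Render one access-log line with the fields a typical API log contains:
    request method and URL, HTTP status code, response time, and caller IP."""
    return (f"method={method} url={url} status={status} "
            f"response_ms={response_ms:.1f} client_ip={client_ip}")

started = time.time()
# ... the API handles the request here ...
elapsed_ms = (time.time() - started) * 1000
line = format_access_log("GET", "/v1/orders/42", 200, elapsed_ms, "203.0.113.7")
logger.info(line)
```

Because every field is machine-parseable, lines like this can be filtered by endpoint or status code when troubleshooting, or by client IP when investigating suspicious activity.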
A trace is a record of a request’s entire path through a distributed system. Every trace contains at least one span, which represents a single step in the request’s journey. Every span includes data about what occurred at that step, such as the amount of time it took and whether any errors occurred. Traces and their constituent spans are often visualized on a flame graph or service map, which enables teams to better understand traffic patterns and dependency relationships. Traces can also help teams isolate the component responsible for a spike in overall latency, and they can be correlated with logs and events during the troubleshooting process.
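The trace-and-span relationship can be sketched with a few lines of code. This is a hand-rolled stand-in for a real tracing library (such as OpenTelemetry), with hypothetical service names and durations:

```python
import uuid
from dataclasses import dataclass

@dataclass
class Span:
    """One step in a request's journey through a distributed system."""
    name: str
    trace_id: str
    duration_ms: float
    error: bool = False

def trace_request():
    """Simulate one request that crosses three services; all spans share
    the same trace_id, tying them into a single trace."""
    trace_id = uuid.uuid4().hex
    steps = (("gateway", 2.1), ("auth-service", 5.4), ("orders-service", 48.0))
    return [Span(name, trace_id, duration_ms) for name, duration_ms in steps]

spans = trace_request()

# The slowest span points at the component driving overall latency.
slowest = max(spans, key=lambda s: s.duration_ms)
print(slowest.name)  # orders-service
```

A flame graph renders exactly this data—spans grouped by trace and sized by duration—which is why the component responsible for a latency spike tends to stand out visually.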
What use cases does API observability support?
Many teams leverage API telemetry data to ensure that an API’s latency and error rate meet its Service-Level Objectives (SLOs). While this type of performance monitoring is extremely important, API observability also facilitates more complex forms of monitoring and analysis. For instance, API telemetry data can help teams:
Plan for API deprecation
Deprecation is a normal part of an API’s lifecycle, but it can be difficult for teams to plan for it without adequate visibility into an API’s usage. API observability enables these teams to track an API’s requests per minute, as well as its number of unique consumers, so that they can make an informed decision about whether or not an API can be safely deprecated. This data can also be useful in the aftermath of a deprecation announcement, as it helps teams confirm that usage is declining as expected.
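The two signals mentioned above—requests per minute and unique consumers—can be derived from access-log records with a simple aggregation. The record shape and values here are hypothetical:

```python
from collections import Counter

# Hypothetical access-log records for the API slated for deprecation.
records = [
    {"minute": "2024-05-01T12:00", "consumer": "acme"},
    {"minute": "2024-05-01T12:00", "consumer": "globex"},
    {"minute": "2024-05-01T12:01", "consumer": "acme"},
]

# Requests per minute: how much traffic still hits the API.
requests_per_minute = Counter(r["minute"] for r in records)

# Unique consumers: how many distinct parties would be affected.
unique_consumers = {r["consumer"] for r in records}

print(dict(requests_per_minute), len(unique_consumers))
```

Re-running the same aggregation after a deprecation announcement shows whether both numbers are trending toward zero as expected.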
Discover gaps in API test coverage
API tests help teams confirm that an API’s methods, endpoints, and integrations are working as expected. However, it can sometimes be difficult for these teams to anticipate exactly how users will interact with an API, which can cause them to omit key workflows from their test suite. API observability enables teams to maximize their test coverage by surfacing insight into which endpoints and methods are most commonly used—and with which parameters. Teams can then create tests that capture and validate these important—though unexpected—user journeys.
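One way to surface these gaps is to compare the endpoint/method pairs observed in production telemetry against the pairs the test suite covers. The endpoints and call counts below are hypothetical:

```python
from collections import Counter

# Endpoint/method pairs observed in production telemetry (hypothetical counts).
observed = Counter({
    ("GET", "/v1/orders"): 5400,
    ("POST", "/v1/orders"): 1200,
    ("GET", "/v1/orders/{id}/refunds"): 300,
})

# Endpoint/method pairs the current test suite exercises.
tested = {("GET", "/v1/orders"), ("POST", "/v1/orders")}

# Untested calls, ordered by real-world usage, so the most-used gap is fixed first.
untested = [(call, count) for call, count in observed.most_common() if call not in tested]
print(untested)  # [(('GET', '/v1/orders/{id}/refunds'), 300)]
```

Sorting by usage ensures the team closes the highest-traffic coverage gaps first.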
Detect production-level deviations from an API’s baseline
It’s common for teams to deploy their API to a dedicated staging environment before they deploy it to production. Staging environments are designed to mirror production environments as closely as possible, which allows teams to establish a baseline for how they expect their API to perform in the real world. Once the API reaches production, teams can continuously compare its production-level data to this baseline in order to catch any unexpected deviations as soon as they occur.
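A minimal version of this comparison is a tolerance check against the staging baseline. The 20% tolerance below is an arbitrary choice for illustration; real deviation detection usually relies on statistical methods rather than a fixed threshold:

```python
def deviates(baseline_ms, production_ms, tolerance=0.20):
    """Flag production latency that exceeds the staging baseline by more
    than the given fractional tolerance (20% here, an arbitrary choice)."""
    return production_ms > baseline_ms * (1 + tolerance)

print(deviates(100.0, 115.0))  # False: within 20% of the 100 ms baseline
print(deviates(100.0, 130.0))  # True: 30% above the baseline
```

Running a check like this continuously against production telemetry turns the staging baseline into an early-warning signal rather than a one-time reference point.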
How can Postman help you improve your API observability?
The Postman API Platform includes several features that help users boost the observability of every API in their portfolio. With Postman, you can:
- Create collection-based monitors: Postman enables you to monitor the health and performance of individual requests and entire workflows with collection-based monitors. These monitors can be run manually or on a schedule, in various regions, and they also support custom retry logic.
- Forward API performance data to other observability tools: Postman integrates with several third-party observability tools, such as Datadog, New Relic, and Splunk, which allows you to correlate data from your Postman Monitors with metrics, events, logs, and traces from across your environment.
- Visualize performance data on a filterable dashboard: Postman displays the results of every monitor run on a built-in dashboard, so you can spot performance trends at a glance. The dashboard can be scoped to individual requests, run types, results, and regions, so you can troubleshoot more efficiently.
- Get notified about run failures and errors: Postman Monitors can be configured to automatically email you if a request fails, so you don’t have to worry about missing an issue that is surfaced by a scheduled run.