The case for full-stack observability in a modern distributed application world

The application-first digital economy and future of work slowly taking shape over the past few years got a jolt of adrenaline in March of 2020. Before the pandemic, 50 percent of companies polled by the World Economic Forum expected that software, automation and AI would lead to some significant reskilling of their workforce as well as some reductions. COVID-19 significantly accelerated and exacerbated this, profoundly impacting software developers.

More and more business transactions, autonomous supply chain control loops, health care delivery, agricultural efficiencies, education, and entertainment are taking place through modern distributed cloud-native applications.

The Application is the New Brand

The business agility and quality of digital experience provided by modern applications have led to the latest industry mantra: the application experience is the new brand. This application experience demands a faster cadence of features and functions, consistent availability, enhanced application performance, and paramount trust and security around the data being handled by the application. AppDynamics’ App Attention Index shows brands have one shot to deliver the ‘total application experience.’

At the heart of providing this application experience is the developer, who is now tasked with delivering these apps and features faster, with higher availability and better security than ever before. Developers now live in the land of plenty and in the age of choices. They have a smorgasbord of software APIs and services available to construct applications, ranging from mobile APIs to public cloud APIs, SaaS APIs, edge computing APIs, and on-premises APIs that their internal development teams might provide. They must select software services that streamline application development while keeping customers’ data secure. Building a modern application powered by external cloud and internet-centric environments is much different from building on the monolithic closed platforms of a bare metal server or a virtual machine.

In this distributed modern application development environment, which runs on complex underlying network and internet infrastructures, being able to observe your applications end to end and top to bottom, across all APIs, software services, back-end sub-components, and all software and hardware infrastructure, is critical to providing better customer experience, application availability, and performance. This visibility is also key to driving down mean time to resolution (MTTR) on failures and to monitoring KPIs that show how the business is doing and how software and infrastructure changes affect it, positively or negatively. This is known as full-stack observability.

Full-stack observability allows any persona – developer, SRE, product, customer success, or business lead – to answer the questions of “What Happened?” “Where did it happen?” “Why did it happen?” and “Can it happen in the future?”

It’s helpful to illustrate this with a real-world example, where end-to-end full-stack observability was instrumental in driving down the MTTR and reducing the business impact of a modern banking application.

Alice, and Her Rendezvous with Full-Stack Observability

Alice is a developer in the mobile banking app team at New Bank, Inc. Two months into the pandemic her product manager asked her to develop a new feature for the New Bank mobile app: Contactless Cash Withdrawal. A customer would use the feature to first locate the nearest ATM, and get driving directions to the ATM. The mobile app would then authenticate and verify the proximity to the ATM, the credentials of the customer, and the amount to be withdrawn from their account. The customer is then simply asked to pick up the cash (yes, touch involved at this stage) from the ATM, without having to touch any high-traffic screens or buttons on the ATM.

The customer experience was quite simple, but the development experience was anything but. Alice had to start with mobile (say iOS) APIs, as that’s where her customers interacted with the app. Her entire back end was in AWS, so she had to select her AWS services carefully, while customer data was accessible via Salesforce SaaS APIs. Her bank’s transactional back ends existed on-premises on bare metal servers over a monolithic database whose APIs provided a global and account-level consistency picture, while her branch ATM’s edge compute nodes had a different set of APIs to manage geo-local cash consistency. There were other SaaS APIs to manage location, identity, compliance, etc.

A month after production deployment, the customer success team started getting an increased number of calls about the contactless cash withdrawal feature taking too long to dispense cash at various ATMs. At the same time, using a full-stack observability solution, the business metrics team saw increased transaction delays in the Digital Experience Monitoring (DEM) dashboard for the mobile banking app.

Alice and her fellow developers and SREs started invoking code that used the full-stack observability APIs, which uniformly query and correlate relevant events across the Data Platform. That platform includes metrics, logs, and traces from every API, app, service, and infrastructure (hardware or software) component outlined in the distributed development environment above. The full-stack observability UX allows every persona – e.g., developer, SRE, product, business leader, customer success – to narrow the information down to only those events that are pertinent to that persona.
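To make the idea of a uniform query concrete, here is a minimal Python sketch. It is not any vendor's API: the Event shape, the query function, the component names, and the sample events are all hypothetical, invented only to show metrics, logs, and traces sharing one shape so that a single call can filter them by time window and by the components a given persona cares about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One telemetry record in a common shape (hypothetical schema)."""
    timestamp: float  # seconds since an arbitrary start
    kind: str         # "metric", "log", or "trace"
    component: str    # illustrative component name
    message: str


def query(events, start, end, components=None, kinds=None):
    """Return events within [start, end], optionally filtered, oldest first."""
    selected = [
        e for e in events
        if start <= e.timestamp <= end
        and (components is None or e.component in components)
        and (kinds is None or e.kind in kinds)
    ]
    return sorted(selected, key=lambda e: e.timestamp)


# Made-up sample data, loosely modeled on the New Bank story.
EVENTS = [
    Event(100.0, "metric", "aws-service", "latency_ms=120"),
    Event(160.0, "log", "onprem-db", "memory error reported"),
    Event(170.0, "metric", "onprem-db", "queue_depth=45"),
    Event(220.0, "trace", "aws-service", "call to onprem-db took 900 ms"),
    Event(400.0, "log", "mobile-app", "withdrawal request timed out"),
]

# An SRE narrows the view to the back-end components during the incident window.
sre_view = query(EVENTS, 150.0, 250.0, components={"aws-service", "onprem-db"})
for event in sre_view:
    print(event.timestamp, event.kind, event.component, event.message)
```

A customer success lead could call the same function with a different component set, which is the point of a persona-focused view over one shared data platform.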

After a few quick debugging cycles, they noticed that the latency between a service in AWS US-East and their on-premises software stack had been steadily increasing over the past hour. With an ordinary monitoring tool, it would have been easy to jump to the conclusion that this was a network problem. With full-stack observability, however, they found that a few memory (RAM) banks on their on-premises database server had failed. This was causing the database server to queue up incoming requests, which in turn was driving up the service-layer latency between the AWS service and their on-premises software stack.
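Why would a hardware fault on one server look like a slow network? A rough Python sketch of the queueing effect helps. Everything here is an assumption for illustration: the simulate_backlog function, the one-second time steps, and the request rates are invented, and the model is deliberately crude. It only shows that once a server can process fewer requests than arrive, the backlog and the wait seen by callers keep climbing.

```python
def simulate_backlog(arrival_rate, service_rates):
    """Approximate wait (in seconds) after each one-second step.

    arrival_rate:  requests arriving per second (assumed constant)
    service_rates: requests the server can process in each successive second
    """
    backlog = 0.0
    waits = []
    for rate in service_rates:
        # Requests that could not be served this second stay in the queue.
        backlog = max(0.0, backlog + arrival_rate - rate)
        # Rough wait for a newly queued request: backlog divided by capacity.
        waits.append(backlog / rate)
    return waits


# Hypothetical numbers: capacity drops from 120 to 80 requests per second
# after the memory failure, while 100 requests per second keep arriving.
healthy = [120.0] * 3
degraded = [80.0] * 3
waits = simulate_backlog(100.0, healthy + degraded)
print(waits)
```

From the caller's side in AWS, that steadily growing wait is indistinguishable from network delay unless the latency data can be lined up against what is happening on the database server itself.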

If Software will Eat the World…

Then full-stack observability will ensure that software is feature rich, evolves rapidly, is performant, trustworthy and secure, and will ensure that consumers of that software have the best possible digital experience. This becomes especially true with modern distributed software built across a variety of APIs and infrastructure stacks, spread across third-party providers, and running over the Internet.
