How to reduce OPEX, TCO and MTTR with Automated Assurance & Operations: The example of 360° Assurance.

Contributed by Andrew Baldock, Product Marketing Director, Infovista.

2022 has been the year that service assurance took center stage as an enabler of value creation. To some, this was surprising, but to others it was a long time coming: real-time, actionable and holistic network, service and customer intelligence is simply the other side of the coin to delivering revenue growth from highly configurable, programmable networks like 5G.

But the recent growth of investment in service assurance capabilities is not simply the product of renewed focus on an existing operational function. It’s more than that: communication service providers (CSPs) want to transform NOC/SOC processes in order to eliminate repetitive manual tasks, deconstruct assurance silos, expose assurance capabilities to customers and partners, and expand the remit of assurance beyond just troubleshooting and into network optimization.

Enter Automated Assurance & Operations: a modern framework that emphasizes cross-domain intelligence, from the customer experience, through service quality (as powered by customer- and resource-facing apps), to network performance and faults. Combine this intelligence with the automation of workflows, analytics and orchestration, and the building blocks are in place for much-reduced OPEX through process automation, shorter mean time to repair (MTTR) through the removal of process inefficiencies and time-consuming manual investigations, and ultimately improved TCO through the elimination of unnecessary system silos.

Let’s illustrate this with an example: 360° Assurance for 5G slicing. 360° Assurance is the category of assurance use cases that combine cross-domain correlation and troubleshooting, as described above. Essentially, this use case involves three core processes: service level agreement (SLA) monitoring, troubleshooting and resolution.

SLA monitoring

The 5G slice SLA can often be determined purely through traffic monitoring, but SLA performance is in fact influenced by the core, transport and RAN domains. So, having ‘actionable intelligence’ about the SLA requires metrics from all domains to be correlated by a shared modeling engine. Risks of SLA breaches can then be identified by recognizing the precursors to breaches observed in the past.

This horizontal cross-domain SLA visibility is a critical building block for MTTR reduction through proactive NOC/SOC alerting and triggering, and requires the correlation of metrics about traffic, service quality, infrastructure and resource performance, and faults, with AI/ML for discovery of correlations and automated alerting.
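As a rough illustration of horizontal correlation, the sketch below combines one latency sample per domain into an end-to-end figure and compares it with the SLA target. Everything in it is an assumption made for illustration: the `DomainSample` type, the `slice_sla_status` function, the additive latency model and the 80% warning ratio are hypothetical, and a real modeling engine would rely on learned precursors rather than a fixed threshold.

```python
from dataclasses import dataclass


@dataclass
class DomainSample:
    domain: str        # hypothetical domain label: "ran", "transport" or "core"
    latency_ms: float  # latency contributed by this domain


def slice_sla_status(samples, sla_latency_ms, warn_ratio=0.8):
    """Correlate per-domain latency into one end-to-end view of the slice SLA.

    Assumes latency is additive across domains, which is a simplification.
    """
    total = sum(s.latency_ms for s in samples)
    worst = max(samples, key=lambda s: s.latency_ms)
    if total > sla_latency_ms:
        status = "breach"
    elif total >= warn_ratio * sla_latency_ms:
        status = "at_risk"  # close to the target: alert before the breach
    else:
        status = "ok"
    return {
        "status": status,
        "end_to_end_ms": total,
        "largest_contributor": worst.domain,
    }


# Illustrative values only, not measurements.
samples = [
    DomainSample("ran", 6.0),
    DomainSample("transport", 2.5),
    DomainSample("core", 1.0),
]
result = slice_sla_status(samples, sla_latency_ms=10.0)
print(result)
```

In this example the slice is still within its 10 ms target, but the combined view flags it as at risk and points to the RAN as the largest contributor, which is the kind of signal that proactive NOC/SOC alerting depends on.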

Troubleshooting

Different stakeholders in the CSP are involved in troubleshooting: customer care, engineering teams, the NOC/SOC; but everyone wants to reduce time and effort – and OPEX – through greater automation of workflows. So, when an issue is identified in SLA monitoring, multiple tasks will likely need to be triggered simultaneously.

This includes raising tickets for customer care and account teams to proactively notify customers of problems, and then enriching those tickets with contextual insights, possible root causes, likely impacted services and regions, and estimated resolution times. The automated root-cause and impact assessment NOC/SOC workflows are triggered, with priorities and severities assigned based on customer impact analysis (in this case, risk and cost of SLA breach).

All this analysis involves vertical correlation: top-down root-cause analysis from the SLA down to the underlying network resource, and bottom-up impact analysis from the infrastructure and service quality up to the impacted customer.
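Vertical correlation can be pictured as two traversals of the same dependency model. The sketch below is a minimal illustration under stated assumptions: the `DEPENDS_ON` topology, the entity names and both helper functions are hypothetical, and a production system would discover this model from inventory and telemetry rather than hard-code it.

```python
from collections import deque

# Hypothetical service model: each entity maps to the entities it depends on.
DEPENDS_ON = {
    "customer:acme": ["sla:acme-urllc"],
    "sla:acme-urllc": ["slice:urllc-1"],
    "customer:bolt": ["sla:bolt-embb"],
    "sla:bolt-embb": ["slice:embb-1"],
    "slice:urllc-1": ["gnb:site-12", "router:agg-3", "upf:edge-1"],
    "slice:embb-1": ["gnb:site-40", "router:agg-3", "upf:central-1"],
}


def reachable(start, edges):
    """Breadth-first search: every node reachable from start via edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def root_cause_candidates(sla, faulty_resources):
    """Top-down: faulty resources that sit underneath the given SLA."""
    return sorted(reachable(sla, DEPENDS_ON) & set(faulty_resources))


def impacted_customers(resource):
    """Bottom-up: customers whose services depend on the given resource."""
    reverse = {}
    for parent, children in DEPENDS_ON.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    hits = reachable(resource, reverse)
    return sorted(n for n in hits if n.startswith("customer:"))


causes = root_cause_candidates("sla:acme-urllc", {"router:agg-3", "gnb:site-40"})
impact = impacted_customers("router:agg-3")
print(causes)
print(impact)
```

Here the fault on `gnb:site-40` is ruled out for the first SLA because that slice does not use it, while the shared router shows up both as the root-cause candidate and as a resource whose failure affects two customers, which is what would drive ticket priority.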

Resolution

Resolving the root-cause of a potential or actual SLA breach is a case study in machine learning. While rule-based resolution still applies in some situations, in the context of the highly configurable and programmable networks that drive these SLAs, it’s increasingly the discovery of causality (and rules) that is the biggest challenge in closing the loop for Automated Assurance & Operations.

Resolving a problem might involve reverting a software version, modifying a configuration in a router, rebooting a base station, adjusting antenna power or tilt, or indeed addressing any number of similarly difficult-to-predict root-causes.

This means discovering new rules as time goes on, automating the identification of the associated root-causes, and interoperating with the associated orchestrators or controllers to ensure the problem is resolved. Active testing also plays an important validation role here, confirming both the problem itself and its resolution by simulating real calls and sessions at the start and end of the resolution process.
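The closed loop described here (validate the problem, apply a candidate fix, validate again) can be sketched as follows. This is a simplified illustration, not a description of any product's behavior: `closed_loop_resolve`, the probe and the remediation callables are hypothetical stand-ins for an active test agent and for calls to an orchestrator or controller.

```python
def closed_loop_resolve(active_test, candidate_actions):
    """Validate a suspected problem, try remediations, re-test after each one.

    active_test: callable returning True when the simulated session succeeds.
    candidate_actions: list of (name, callable) pairs, ordered by likelihood.
    """
    if active_test():
        # The test passes before any action: the problem is not confirmed.
        return {"confirmed": False, "resolved_by": None}
    for name, action in candidate_actions:
        action()
        if active_test():
            return {"confirmed": True, "resolved_by": name}
    # Nothing worked: hand over to manual investigation.
    return {"confirmed": True, "resolved_by": None}


# Simulated network state standing in for real orchestrator calls.
state = {"config_ok": False}


def probe():
    return state["config_ok"]


def reboot_base_station():
    pass  # in this simulation the reboot does not fix the fault


def restore_router_config():
    state["config_ok"] = True


outcome = closed_loop_resolve(
    probe,
    [
        ("reboot_base_station", reboot_base_station),
        ("restore_router_config", restore_router_config),
    ],
)
print(outcome)
```

Recording which action actually cleared the test is also what allows new rules to be learned over time: each confirmed pairing of symptom and fix is a candidate rule for the next occurrence.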

Automated Assurance & Operations

So, the source of OPEX and MTTR reductions from the adoption of Automated Assurance & Operations is clear. But TCO savings come from the adoption of true cloud-native technology, not just from the collapsing of silos and decommissioning of expensive legacy assurance systems.

With a fully containerized architecture and efficient Kubernetes-based workload orchestration, CSPs can ensure that the system resource consumption of their assurance capability is always optimal and based on the actual resource requirements of the system, not simply the predicted peak workload. This is critical when deploying AI/ML, which involves unpredictable spikes of resource utilization.

Infovista Ativa™ is our answer to this emerging requirement for Automated Assurance & Operations. It is based on a unified cloud platform of shared analytics, automation, modeling, user interface and other resources, to enable use cases involving horizontal and vertical correlation and automation, like the 360° Assurance for 5G Slicing use case described here. It is deployable on public, private and hybrid cloud environments through Kubernetes, to deliver the TCO advantages of cloud-native systems. It introduces automation of analytics, correlation, workflows and orchestrator interoperability, to deliver the OPEX benefits described above. Finally, it comes with a catalogue of pre-integrated outcomes-based solutions, spanning IoT, voice, 5G core, business services, OTT video, and 5G slicing, to add time-to-value advantages.

Automated Assurance & Operations is rapidly evolving into a source of competitive differentiation for CSPs, as they look to leverage their networks and deliver returns on investments in 5G and cloudified networks.

You can find more insights into our vision, blueprint, products and solutions in Automated Assurance & Operations at www.infovista.com.