Evolve network monitoring by combining fault management with performance management
Contributed by Amir Kupervas, Managing Director, Anodot.
Today’s telecom environments and networks are complex, presenting greater challenges than ever for those charged with monitoring the network, assuring continuous service, and driving customer experience.
To prevent customer complaints, the high cost of customer care, and, ultimately, churn, operators need to focus on how services are performing and what their customers are experiencing. To achieve this, rather than focusing on technical alarms alone, they must understand not only which technical issues have occurred but also what impact those issues have, and they must resolve them before they turn into customer-impacting problems.
This is why the traditional network monitoring framework, which relies on fault management for issue awareness, requires a more innovative approach.
Let’s take a closer look at how service providers can evolve network monitoring by combining network fault management with autonomous network performance management to profoundly improve service experience, reduce costs, and protect revenues.
The value of fault management
Network fault management systems detect and alert users to technical faults, i.e., events that result from malfunctions and interfere with the correct functioning of the network.
The key benefits of network fault management include:
- Detecting technical issues in real-time
- Alerting incident handlers to the issue detected
- Driving incident resolution for the restoration of service
Yet, alongside these benefits, there is also a limit to how much fault management alone can do for optimal service delivery and application availability:
- Fault alarms provide no insight into the impact on service and customer experience.
- Fault alarms are inherently reactive, particularly with regard to service degradation: they arrive after the fact and lack the data and insights needed to handle issues before they turn into customer-impacting problems.
- Prioritization is not possible, since these alarms say nothing about the actual impact on customers and their service experience, or about how many customers are affected.
- Root cause analysis suffers, since it requires correlating fault alarms, a task that only experts with hard-to-find and often costly skills can perform.
- The sheer volume of alarms, often in the millions, leads to alarm fatigue and to critical alarms being overlooked.
- Alerts are generated from predefined static thresholds, whose efficacy depends primarily on the skills and technical know-how of whoever defined them.
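To make the static-threshold problem concrete, here is a minimal sketch comparing a fixed threshold with a simple adaptive baseline. The metric, its values, the 12-sample window, and the 3-standard-deviation cutoff are all hypothetical, and the rolling z-score used here is a generic textbook technique, not the machine-learning approach any particular vendor uses.

```python
from statistics import mean, stdev


def static_alerts(values, threshold):
    """Indices where the metric exceeds a fixed, hand-picked threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def adaptive_alerts(values, window=12, k=3.0):
    """Indices whose value lies more than k standard deviations from the
    mean of the preceding `window` points (a rolling, learned baseline)."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0; skip it rather than
        # flag every tiny change as an anomaly.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            alerts.append(i)
    return alerts


# Hypothetical connection-failure rate (%), one sample every 5 minutes.
failure_rate = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 3.5, 1.0]

print(static_alerts(failure_rate, 5.0))  # [] : set too high, it misses the spike
print(static_alerts(failure_rate, 1.0))  # [1, 4, 7, 10, 12] : set too low, it fires on normal jitter
print(adaptive_alerts(failure_rate))     # [12] : only the real deviation is flagged
```

The baseline here is deliberately naive: it ignores daily seasonality and lets the spike itself inflate the baseline for the following windows. It only illustrates why a baseline learned from each metric's own behavior can succeed where a single hand-set number either misses real degradations or floods operators with noise.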
What is network performance management
Where fault management is focused on detecting and alerting to technical failures, performance management is focused on detecting service experience degradations and on correlating issues and alerts.
Performance management enables operators to go beyond knowing that a technical fault has occurred. It monitors all the elements that make up a service, shows what impact an issue has and how many customers are affected, and enables the operator to resolve issues before they turn into customer-impacting problems.
Why performance management needs to be autonomous
Not every performance management approach can fully complement fault management. This is because performance management offerings typically do not:
- Monitor each network domain, layer, and type
- Collect data from the entire network in one place
- Correlate all data and prioritize by significance
As a result, operators too often still have to cope with alert noise, long resolution times, poor root cause understanding, and rising customer complaints.
Autonomous network performance management eliminates these issues by:
- Continually monitoring and correlating network and service anomalies across the entire telco stack.
- Monitoring cross-layer network performance and service experience.
- Preemptively identifying trends and sending predictive alerts that point to the root cause of faults before they become problems.
- Providing real-time, actionable alerts that recommend the next best action in context.
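The correlation and prioritization steps in the list above can be sketched roughly as follows. This is a hypothetical illustration, not Anodot's method: it assumes each anomaly has already been detected and attributed to a network element (the `element` key and names such as `dns-01` are made up) with an estimated count of affected subscribers. It groups anomalies on the same element that occur within five minutes of each other into one incident, then ranks incidents by impact.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    ts: int           # detection time in seconds
    layer: str        # e.g. "transport", "core", "service"
    element: str      # hypothetical: network element the anomaly is attributed to
    metric: str
    subscribers: int  # hypothetical: estimated subscribers affected


@dataclass
class Incident:
    element: str
    anomalies: list = field(default_factory=list)

    @property
    def impact(self) -> int:
        # Assumed scoring rule: take the largest subscriber estimate rather than
        # the sum, since cross-layer anomalies on one element hit the same users.
        return max(a.subscribers for a in self.anomalies)


def correlate(anomalies, window=300):
    """Group anomalies on the same element that arrive within `window` seconds
    of the group's latest anomaly, then rank incidents by impact (highest first)."""
    incidents = []
    open_by_element = {}
    for a in sorted(anomalies, key=lambda x: x.ts):
        inc = open_by_element.get(a.element)
        if inc is not None and a.ts - inc.anomalies[-1].ts <= window:
            inc.anomalies.append(a)  # same element, close in time: same incident
        else:
            inc = Incident(a.element, [a])  # otherwise start a new incident
            open_by_element[a.element] = inc
            incidents.append(inc)
    return sorted(incidents, key=lambda i: i.impact, reverse=True)


anomalies = [
    Anomaly(100, "transport", "dns-01", "latency_ms", 12000),
    Anomaly(160, "service", "dns-01", "resolution_failures", 45000),
    Anomaly(220, "core", "pgw-07", "session_setup_failures", 3000),
    Anomaly(5000, "service", "dns-01", "resolution_failures", 800),
]

for inc in correlate(anomalies):
    print(inc.element, len(inc.anomalies), inc.impact)
# dns-01 2 45000
# pgw-07 1 3000
# dns-01 1 800
```

Four raw anomalies collapse into three incidents, and the transport and service anomalies on `dns-01` surface together at the top because they affect the most subscribers. Real cross-layer correlation would rely on network topology and learned relationships between metrics rather than a shared element key.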
A powerful combination
The combination of network fault management with autonomous network performance management constitutes a powerful framework for end-to-end service experience monitoring.
Fault management provides input on whether or not a malfunction occurred, while autonomous performance management delivers strategic insight into its impact.
Together, they let the operator see the service degradation, understand its root cause, and dramatically reduce time to resolve through contextual, actionable alerts.
Moreover, autonomous performance management enables operators to be preemptive. For example, if a fault in a particular DNS server leaves subscribers unable to connect to Facebook, an alert is raised immediately, along with insights into why it happened and what the next best action is for accelerating resolution, before customers start calling the contact center to complain.
How Anodot can help
Anodot is an autonomous network monitoring platform that completes the network fault management value proposition with real-time detection of service-impacting incidents.
Anodot collects and analyzes data across the entire telco stack. Patented big data machine-learning algorithms detect outliers in time series data and make correlations among related anomalies. As a result, there is a 90% reduction in alert noise, 80% faster time to detect, 90% improved root cause analysis, and 30% faster time to resolve.
The telecoms business, operations, and network are in flux. And while it can sometimes be a great challenge to keep up, monitoring the network in a modernized and ever-evolving ecosystem doesn’t have to be.
By combining the strengths of network fault management with autonomous network performance management, operators can evolve network monitoring, improve service experience, reduce customer complaints, prevent churn, and protect revenues.
Now, that’s one powerful combination!