Contributed by Yuval Stein, AVP Technologies at TEOCO
Advanced, zero-touch network operations, which are what it takes to create an ‘autonomous network’, require telecom network engineers to take the results from AI/ML algorithms and use them efficiently within a network’s operational processes. Until now, much of the industry focus has been on the challenges of developing the right AI. What is sometimes forgotten is that there are also challenges in putting the AI results to use in a way that improves the quality of the services and the network. That’s what this blog post explores. But before I get ahead of myself, let’s discuss the purpose of autonomous networks.
Autonomous Networks – what they are, and why we need them
Building an autonomous telecom network is somewhat akin to building a living organism. Just like our bodies can automatically self-regulate many of the functions that keep us alive – our stomachs digest, our lungs breathe, and our hearts beat – autonomous networks function in a similar way. The goal for Communication Service Providers (CSPs) is to create a fully automated environment that is self-configuring, self-healing, self-optimizing and self-evolving. A network that will hum along with minimal to no human intervention.
Why is this so important? Yes, there are cost savings that can be achieved through automation, but that has become almost secondary. Let’s stick with the human body analogy for a bit longer. If we suffer a small cut or catch a cold, our bodies automatically begin the healing process. We don’t have to tell them what to do; it just happens. If everyone had to rush to the hospital to seek the advice of a specialist for every sniffle, bump, and bruise, we would find it hard to get through the day. Job productivity would plummet, and there wouldn’t be enough medical personnel to deal with the demand. Telecom networks face the same issue. Systems have become so complex and fast-moving that human intervention can’t be relied upon to fix every minor network error: there are simply too many to manage, and response times need to be immediate. Automation needs to be the default, and humans should only have to step in for the bigger issues, when absolutely needed.
It Takes a Village: Solving the Autonomous Service Assurance Stack
Communication Service Providers, companies like TEOCO, and groups like TM Forum, are working to create the software, systems and tools required to enable fully autonomous networks – and we are getting closer every day. As you can see in figure 1 below, there is a stack of service assurance steps that need to be achieved – each one building upon the one prior.
The three bottom steps in the diagram are well established, with their own operational methodologies. CSPs know how to work with network messages, events, alarms and KPIs. However, the upper layers, including Analytics and Automation, are very different. They require incorporating and acting upon things like probabilities and forecasts, which are relatively new types of information that until now have only existed in separate silos across various departments. Now, it all needs to work together. This requires new methodologies (and APIs) for incorporating these new stages before networks can become truly ‘Autonomous’.
Six Human and Technical Operational Challenges for Managing AI/ML Data
Understanding how to best leverage the information and data being generated by machine learning and artificial intelligence, and how to ‘operationalize’ the ongoing use of this information, is the task at hand.
As the saying goes, the devil is in the details. In integrating these new layers, which as mentioned above work very differently than the previous layers, we’ve identified a ‘language gap,’ for lack of a better term – with both human and technical hurdles that need to be overcome.
Before we can close this gap, we need to understand it and identify exactly which challenges we are facing. My belief is that they are both human and technical; after all, even with automation there will always be people involved at some level. I’ve outlined some of these challenges below:
- Lack of Trust: This is the main human challenge by far, as algorithms using deep mathematics are often not easily explainable. The use of visual cues and explanations within the user interface, along with achieving proven results over time, helps build trust.
- Defining What is Actionable – and How to Act Upon It: AI and ML results are rarely black and white. Like forecasting the weather, they often involve probabilities. But instead of predicting the chance of rain, the AI/ML results may show that there is an 80% probability of a network function failure in the next 12 hours. Network engineers need to decide: is this a high enough probability to require the system to automatically change a network configuration? And is there enough information to know what that change should be?
- Cost-Benefit Analyses: Once an issue is predicted, are we able to compare the cost and impact of the expected failure to the cost and impact of the fix? To run a network in a cost-efficient manner, these types of decisions will be required on a regular basis. And what about future impacts? If a minor network error occurs, you may decide to ignore it. But what if it could lead to a larger, more costly issue down the road? How do we predict and account for these?
- Optimizing Data Size: When it comes to machine learning, there is always a delicate balance between having enough data to generate good AI/ML results and having so much that it takes too long to process. Sometimes it is better to reduce the amount of data ingested so the algorithms can provide their findings closer to real time. This needs to be done carefully, though, to maintain quality results. Similarly, if the data output is too large, it becomes too complex and unwieldy for other systems to work with. Therefore, we need to reduce the results to those that are ‘operationally effective’. But how do we know which data to use and which to ignore? Sometimes these efforts are complex enough that they require their own algorithms.
- Lack of Standardization of APIs: Further standardization of APIs will eventually create a true ecosystem of best-of-breed systems that can work together seamlessly to create a truly autonomous network. The industry is still evolving in these efforts, with more work ahead. Currently, Automated Root Cause Analysis is the only widely adopted AI/ML API. There need to be more.
- AIOps Challenges: Creating, analyzing, and working with AI and ML data is very different from working with traditional software. A typical software solution is ready to go upon implementation and no changes are required until the next upgrade, but that isn’t typically the case with AI and ML. These systems have shorter lifecycles and are best described as a hybrid of a product and a service. They require regular re-training and updates because they are constantly learning from new data. Having the right support structure in place for the ‘care and feeding’ of these new systems will be critical to their success.
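To make the actionability and cost-benefit questions from the list above more concrete, here is a minimal Python sketch of how a predicted failure might be turned into an operational decision. Everything in it is an assumption made for illustration: the `Prediction` class, the `decide` function, the 0.8 automation threshold, and the cost figures are hypothetical, and a real deployment would need far richer inputs (service impact, confidence in the model, downstream effects).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical AI/ML output: chance that an element fails within a time window."""
    element: str
    failure_probability: float  # between 0.0 and 1.0
    horizon_hours: int


def decide(pred: Prediction, failure_cost: float, fix_cost: float,
           auto_threshold: float = 0.8) -> str:
    """Return 'automate', 'escalate', or 'monitor' for a predicted failure.

    Illustrative policy only (not an industry standard):
      - if the expected loss does not exceed the cost of the fix, keep monitoring;
      - otherwise act automatically when the probability reaches the threshold;
      - otherwise hand the decision to a human engineer.
    """
    if not 0.0 <= pred.failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")

    # Expected loss = probability of failure x cost if the failure happens.
    expected_loss = pred.failure_probability * failure_cost

    if expected_loss <= fix_cost:
        return "monitor"
    if pred.failure_probability >= auto_threshold:
        return "automate"
    return "escalate"


if __name__ == "__main__":
    # Assumed example: 80% chance of failure in the next 12 hours.
    pred = Prediction(element="network-function-A", failure_probability=0.8,
                      horizon_hours=12)
    print(decide(pred, failure_cost=10_000, fix_cost=500))
```

The three-way outcome reflects the point that humans should step in only when needed: low-stakes predictions are simply watched, clear-cut ones are handled automatically, and the ambiguous middle is escalated to an engineer.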
Aside from the operational challenges associated with creating an autonomous network, automation in general requires upfront investments in technology, skills, and services. These investments can be significant and are better shared across the whole enterprise. A hybrid approach, which involves selecting the most cost-effective tool for each scenario, may enable more rapid deployments in the short term. However, this approach can lead to expensive maintenance and vendor management issues in the long term.
Automation also has implications for staff and organizational change. Automation projects can be delayed or difficult to justify where redundancies or reassignment create cost implications. Automation is best achieved where there is a clear understanding of each end-to-end process, and each process is closely managed to prevent poor practices from being replicated through the automation.
Is it worth it?
Some may wonder if these challenges are worth the effort and expense. The truth is that the telecom industry is at an inflection point: for progress to continue, we must approach automation in a way that delivers both immediate results and efficiencies in the short term and a positive return on investment in the long term. What worked yesterday (the systems, processes, and skillsets) won’t work tomorrow. Telecom networks, and the future services they will enable, demand a new operational playbook.
TEOCO is at the forefront of this effort. We are working to address each one of these challenges by participating in research catalysts with groups like the TM Forum and by investing in our own research and design, constantly exploring ways to help our customers manage the challenges ahead.
To learn more and hear how we are addressing some of these challenges, sign up for our FutureNet-hosted webinar, ‘Leveraging AIOps towards advanced zero-touch operations’, on the 15th of September at 4pm BST. Or contact us today.