How to make AI (really) work for your networks

Contributed by Puneet Sethi, CTO, Tata Communications Transformation Services.


Regardless of industry or application, organizations around the world are jumping on the AI bandwagon, and the telecommunications industry is no exception. As telecom networks become more complex, AI techniques such as predictive analytics, image analytics, generative AI and correlation analytics can increase efficiency. They do so by automating and simplifying network management tasks and by enhancing observability, manageability and reliability.

But to truly unlock the potential of AI, telcos need to figure out their secret sauce: moving from generic algorithms to specific ones by applying telecom domain knowledge to the plethora of data they are sitting on.

This shift relies on collaboration between IT experts, data scientists and telecom subject matter experts within the organization, who can understand the data and the insights it can yield, and then mature the generic algorithms into more advanced telecom-specific ones.

Data quality hinders network operations. Can AI help?

One of the challenges with telecom data is accessing correlated datasets such as faults, alarms, incidents and inventory data. Telecom networks span multiple domains and evolve through mergers and acquisitions, resulting in discrete datasets that need to be correlated to understand their impact on each other.

Moreover, analyzing large volumes of data at high speed becomes more taxing when noisy data flows through real-world networks. Every month, hundreds of thousands of alarms are reported, which creates a lot of operational noise. A dedicated team of network technicians is needed to prioritize, triage, and acknowledge the alarms, which can be an agonizing multi-step process.

Consequently, it is not uncommon for alarms to be assigned the wrong priority because they are handled manually. Efficient alarm analysis, correlation and resolution requires a significant amount of data pre-processing and domain expertise to clean up the noisy data before meaningful analysis is possible.
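As one illustration of the kind of pre-processing described above, the following is a minimal sketch of repeat suppression: alarms that recur from the same source with the same type within a short window are collapsed into a single entry with a repeat count. The `Alarm` fields, the `suppress_repeats` function and the 300-second window are assumptions made for this example; they are not taken from any specific assurance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    timestamp: float  # seconds since epoch (assumed format)
    source: str       # hypothetical device identifier
    alarm_type: str   # hypothetical alarm code, e.g. "LINK_DOWN"


def suppress_repeats(alarms, window_seconds=300.0):
    """Collapse bursts of identical alarms into (first_alarm, repeat_count).

    A repeat is an alarm with the same (source, alarm_type) arriving within
    window_seconds of the previous one, so a burst keeps extending for as
    long as repeats keep arriving.
    """
    last_seen = {}   # key -> timestamp of the most recent alarm with that key
    open_burst = {}  # key -> index of the burst currently open for that key
    bursts = []      # each entry is [first_alarm, repeat_count]

    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.source, alarm.alarm_type)
        previous = last_seen.get(key)
        last_seen[key] = alarm.timestamp
        if previous is not None and alarm.timestamp - previous <= window_seconds:
            bursts[open_burst[key]][1] += 1      # noise: count it, do not keep it
        else:
            open_burst[key] = len(bursts)        # start a new burst
            bursts.append([alarm, 0])

    return [(first, repeats) for first, repeats in bursts]


sample = [
    Alarm(0, "r1", "LINK_DOWN"),
    Alarm(60, "r1", "LINK_DOWN"),
    Alarm(90, "r2", "HIGH_TEMP"),
    Alarm(120, "r1", "LINK_DOWN"),
    Alarm(1000, "r1", "LINK_DOWN"),
]
reduced = suppress_repeats(sample)
```

In this sample, five raw alarms reduce to three entries for a technician to triage. A real deployment would tune the window per alarm type using domain knowledge, which is where the telecom expertise discussed in this article comes in.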

Another major obstacle is data integration. Telco networks generate large volumes of data from multiple sources, often scattered across the organization. When the data is created and maintained by multiple vendor solutions, integrating it can be challenging, creating a “data silo” problem.

For instance, most tier-1 telcos we have served run an average of 30 service assurance tools from various vendors to manage their communication networks. In many cases, these industry-leading tools, which cost businesses millions of dollars per year, operate in silos. As a result, approximately 40% of alarms were uncorrelated and 50% were managed manually, which makes multi-domain correlation nearly impractical.

Aggravating the matter is the lack of a data dictionary to allow seamless data exchange. AI/ML can be leveraged to overcome the low data quality in such environments. For example, applying AI to raw network events and alarm data feeds to compensate for fragmented inventory data is crucial to building a strong foundation for service assurance.

Blanket AI solutions aren’t always effective

Every telco network is distinct. Each network is an amalgamation of different technologies, OEM equipment and topologies. In recent years of helping communication service providers (CSPs) improve their network performance, we have seen that their assurance processes still rely on manual processing and the expertise of network engineers.

This approach is not sustainable for managing alarm floods and network noise every day. Additionally, such a conventional approach cannot efficiently address:

  • Cross-domain correlation
  • Alarm blueprinting and topology in the network backdrop
  • Adaptability to network changes and upgrades
  • Predicting network impact or performance degradation
  • Real-time linkages between network noise and unwanted incidents

As mentioned earlier, every organization has a custom network built for its specific needs. But building your own AI model that is accurate, flexible, and can be developed cost-effectively isn’t for everyone.

Furthermore, the AI solution must be configurable and customizable for the several use cases applied to a network. Often, an off-the-shelf solution is a black box that lacks flexibility and does not allow teams to fine-tune it for optimal coverage, accuracy, and benefits.

The secret sauce to AI success: Bringing in the practitioner’s perspective and collaboration

Data scientists, IT architects and telecom subject matter experts must come together to build the AI strategy, define the use cases and drive adoption. Training and maturing AI algorithms for a given network environment and set of use cases requires a practitioner’s in-depth knowledge of the telecom industry, which is crucial to realizing tangible benefits from AI-driven use cases. This collaboration also minimizes the lead time to adoption.

  • Use insights from the data to identify solutions for improving network efficiency, observability, and manageability.
  • Identify the pilot use case; demonstrate success; move on to larger implementations.
  • Start with a non-invasive approach with evolution towards AIOps.
  • Define the end goal. Determine the use of the AI solution and business benefits before beginning the design and development process.
  • Start with historical data and mature it to real-time analytics.
  • Validate the AI outputs with the right dataset.
  • Self-service is crucial. Enable UI-based mechanisms for training, testing and improving the accuracy of AI outputs.
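To make the validation step in the list above concrete, here is a minimal sketch that scores fault predictions against a held-out set of historical incidents using precision and recall. The function name `precision_recall` and the device identifiers are hypothetical and chosen only for illustration; the article does not prescribe a particular metric.

```python
def precision_recall(predicted, actual):
    """Compare predicted faulty devices with devices that actually failed.

    precision: share of predictions that were real faults.
    recall:    share of real faults that were predicted.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: model output versus incidents in historical data.
predicted_faults = ["r1", "r2", "r3", "r4"]
actual_faults = ["r2", "r3", "r5"]
precision, recall = precision_recall(predicted_faults, actual_faults)
```

Here two of the four predictions are correct and two of the three real faults are caught. Tracking both numbers on historical data before moving to real-time analytics gives teams a simple, transparent way to judge whether a model is ready for a larger rollout.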

There is a crying need to “democratize” AI by providing levers to the end users to control parameters, operate the critical stages of AI, such as model training and testing, and remove the black-box approach.

Use cases

AI enhances observability and reliability to deliver predictable network outcomes. Use cases should bring in clear business benefits.

Predictive Assurance: Predict network faults and performance degradation with a non-invasive approach, targeting AI accuracy and coverage that exceed industry benchmarks.

AI Correlation: Use multi-domain correlation and alarm patterns to determine the root cause (RCA) in real time and improve manageability, with minimal lead time to train and mature AI algorithms for the given environment.
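The idea behind correlation can be sketched with a simple rule-based baseline, which is not a learned model and not the author's method: alarms are grouped under the most upstream device that also raised an alarm within a short time window. The `upstream` topology map, the device names and the 120-second window are all assumptions made for this example.

```python
def probable_root_causes(alarms, upstream, window_seconds=120.0):
    """Group alarmed devices under their most upstream alarmed ancestor.

    alarms:   list of (timestamp, device) tuples.
    upstream: dict mapping each device to its parent device, or None.
    Returns a dict: probable root device -> sorted list of devices explained.
    """
    first_alarm = {}
    for timestamp, device in sorted(alarms):
        first_alarm.setdefault(device, timestamp)  # keep earliest alarm per device

    groups = {}
    for device, timestamp in first_alarm.items():
        root = device
        visited = {device}                 # guards against cycles in the map
        node = upstream.get(device)
        while node is not None and node not in visited:
            visited.add(node)
            close_in_time = (
                node in first_alarm
                and abs(first_alarm[node] - timestamp) <= window_seconds
            )
            if close_in_time:
                root = node                # a higher alarmed ancestor wins
            node = upstream.get(node)
        groups.setdefault(root, []).append(device)

    return {root: sorted(devices) for root, devices in groups.items()}


# Hypothetical topology: cell sites hang off aggregation routers.
topology = {
    "core1": None,
    "agg1": "core1",
    "agg9": "core1",
    "cell1": "agg1",
    "cell2": "agg1",
    "cell9": "agg9",
}
events = [(100, "agg1"), (105, "cell1"), (110, "cell2"), (5000, "cell9")]
roots = probable_root_causes(events, topology)
```

In this sample, the alarms on cell1 and cell2 are attributed to agg1, while the later alarm on cell9 stands alone. Fixed rules like these break down when inventory data is fragmented or the topology changes, which is the gap that correlation models trained on a specific network are meant to close.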

AI-enabled service assurance and network blueprinting: Create network blueprints by feeding massive volumes of raw data into AI. This mitigates incorrect inventory data points and enables a foundation for service assurance.

Generative AI: Use generative AI (Gen-AI) to integrate systems and seamlessly bring all information under a single pane of glass.

NOC Bots: Use Gen-AI NOC Bots for cognitive support, incident resolution and on-demand visibility.

Self-healing networks: Leverage code generation capabilities of Gen-AI to create self-healing networks or autonomous network changes.

Computer Vision: Supports fiber fault prediction, telecom tower and ancillary maintenance, risk assessment and site inventory reconciliation.

Lead-to-Quote process: Use Gen-AI for a faster lead-to-quote (L2Q) cycle, feasibility checks and automated proposal generation.

Wrapping up

While investment in AI is heating up, its full effect is still lagging. Like any other technology, buying off-the-shelf AI solutions has benefits and shortcomings. AI/ML algorithms and Gen-AI may help improve the stability of networks but could fall short of being a panacea for all problems affecting your network performance.

Data science has therefore become table stakes for making AI systems deliver network efficiency and performance. Collaboration is the key for AI to shift from generic algorithms to domain-specific ones, trained and matured on telecom data, so that operations can be automated and scaled with high observability, manageability and predictability.