Cloudifying the network infrastructure

Contributed by Teresa Monteiro, Director of Marketing, Infinera

In recent years, network automation has evolved past SDN and NFV, with the cloud emerging as a major player. Networks that once extended from physical to virtual are now moving to the cloud. Cloud-native network functions are helping service providers expand beyond connectivity, and multi-cloud and hybrid cloud architectures are core to meeting the distributed computing requirements for scale and agility imposed by 5G and IoT.

In this blog I will also discuss the role of cloud in network automation – but I will do it from a different perspective: that of cloudifying the embedded network infrastructure.

Cloud-native down to the infrastructure

When we think of optical network automation, we typically think of the network management and control layer, and of centralized automation applications enabled by an optical domain controller – automation applications such as network discovery and inventory, path computation, or service restoration.

But let’s not forget that there are also important automation applications implemented in the network operating system (NOS), the operating system running on the individual network elements. The adoption of a cloud-native architecture at the NOS facilitates agile delivery and deployment of such applications.

By cloud-native architectures and technologies, we specifically mean a NOS that is microservices-based and can be deployed in containers with the support of a container management system. This choice of software architecture has many well-known benefits; today, I will focus on the fact that it allows software modules, developed and compiled elsewhere, to run autonomously in a network element environment, deployed in what is called a guest container.

In simple words, a guest container is an isolated component within the NOS that can host and execute a software agent. This software agent has access to open, exposed interfaces, but not to any other internal NOS parameters.

The deployment of software agents within guest containers enables the extension of NOS features, accelerates the introduction of innovative automation applications, and supports the development of customized functionality.

A NOS-agnostic software agent can be implemented and compiled independently, in a foreign development environment, by an operator or a third-party supplier, and, once downloaded, will run smoothly in a cloud-native NOS.

Furthermore, since a containerized architecture offers a variety of deployment options, the same agent can be deployed on the fly and run locally on a network element, on a server, or in the cloud. It can be ported easily across platforms: to the cloud when an application needs to scale, or to the network element processor when there are latency constraints.

An example: adaptive streaming telemetry

Let me describe a concrete example: an automation application named adaptive streaming telemetry, which extends and improves the standard streaming telemetry mechanism and has been successfully implemented as a NOS-agnostic software agent.

Streaming telemetry is a network monitoring methodology where an external system subscribes to a specific network element data stream, among all the monitoring data that the equipment is able to expose. From then on, the network element pushes all corresponding data, in an almost continuous manner, to the server that subscribed to it. Streaming telemetry ensures low-latency, high-performance data collection, enabling near real-time access to large volumes of network data.
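To make the subscribe-and-push model more tangible, here is a minimal Go sketch. It is only an illustration: real streaming telemetry runs over a network transport such as gRPC, whereas this sketch uses in-process Go channels as a stand-in. The names Element, Sample, Subscribe, and Push, as well as the parameter paths, are hypothetical and not taken from any NOS.

```go
package main

import "fmt"

// Sample is one pushed measurement (hypothetical shape).
type Sample struct {
	Path  string
	Value float64
}

// Element is a stand-in for a network element that can expose many data streams.
type Element struct {
	subs map[string][]chan Sample
}

// NewElement returns an element with no subscribers.
func NewElement() *Element {
	return &Element{subs: make(map[string][]chan Sample)}
}

// Subscribe registers interest in a single path and returns a channel
// on which the element will push matching samples.
func (e *Element) Subscribe(path string, buffer int) <-chan Sample {
	ch := make(chan Sample, buffer)
	e.subs[path] = append(e.subs[path], ch)
	return ch
}

// Push delivers a sample to every subscriber of its path.
// Samples for paths nobody subscribed to are simply not sent.
// Note: this simplified version blocks if a subscriber's buffer is full.
func (e *Element) Push(s Sample) {
	for _, ch := range e.subs[s.Path] {
		ch <- s
	}
}

func main() {
	ne := NewElement()
	stream := ne.Subscribe("rx-power", 4)

	ne.Push(Sample{Path: "rx-power", Value: -3.2})
	ne.Push(Sample{Path: "laser-temperature", Value: 41.0}) // no subscriber
	ne.Push(Sample{Path: "rx-power", Value: -3.4})

	// Drain what has been pushed so far.
	for len(stream) > 0 {
		s := <-stream
		fmt.Printf("%s = %.1f\n", s.Path, s.Value)
	}
}
```

The point of the sketch is the direction of the data flow: after a single subscription, the collector does not poll; the element pushes, and only for the stream that was requested.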

However, standard streaming telemetry can still be improved. In a modern network, there are plenty of network parameters available to be streamed; under normal operation, many are redundant and monitoring them does not add meaningful information, while also imposing an unnecessary load on the system. This is where adaptive streaming telemetry, a solution that adjusts dynamically to the network status and evolves with the network’s needs, is useful:

  • Under normal network operations, a fixed, limited set of parameters is included in each data stream and pushed to the data collectors at a moderate frequency. These are the parameters that best summarize the network status.
  • Upon network status change, the streaming frequency and the content of the data stream are adjusted: the push frequency may be increased, or more parameters can be added to the telemetry stream, for further insight.
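The two modes above can be sketched in a few lines of Go. This is an assumption-laden illustration, not the agent described in this blog: the parameter names, the 30-second and 1-second push intervals, and the pre-FEC BER threshold used to detect a status change are all hypothetical values chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Status is a simplified, hypothetical view of network health.
type Status int

const (
	Normal Status = iota
	Degraded
)

// StreamConfig describes what is pushed and how often.
type StreamConfig struct {
	Interval time.Duration
	Params   []string
}

// Illustrative parameter names; not taken from any real NOS.
var (
	baseParams  = []string{"rx-power", "pre-fec-ber"}
	extraParams = []string{"osnr", "chromatic-dispersion", "laser-temperature"}
)

// berThreshold is an assumed trigger for a status change.
const berThreshold = 1e-3

// Classify maps a pre-FEC bit error rate reading to a status.
func Classify(preFECBER float64) Status {
	if preFECBER > berThreshold {
		return Degraded
	}
	return Normal
}

// Adapt returns the stream configuration for a given status:
// a small summary set at a moderate rate when the network is healthy,
// more parameters at a higher rate when it is not.
func Adapt(s Status) StreamConfig {
	params := append([]string(nil), baseParams...) // copy, do not alias
	if s == Normal {
		return StreamConfig{Interval: 30 * time.Second, Params: params}
	}
	return StreamConfig{Interval: time.Second, Params: append(params, extraParams...)}
}

func main() {
	for _, ber := range []float64{1e-5, 1e-2} {
		cfg := Adapt(Classify(ber))
		fmt.Printf("ber=%g interval=%s params=%v\n", ber, cfg.Interval, cfg.Params)
	}
}
```

A real agent would likely use richer triggers than a single threshold and would return to the baseline configuration once the network recovers, but the core idea is the same: the stream's content and frequency are a function of the observed network status.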

The power of adaptive streaming telemetry

This approach decreases the load on the data communication network compared to standard telemetry, while continuing to support fine-grained visibility when and where needed. It also contributes to better overall data quality, which, in turn, allows for better compliance with SLAs, improves characterization of a network element’s health, and unleashes the predictive power of analytics and machine learning.

Infinera has worked jointly with Oracle and Microsoft on adaptive solutions that extend the standard gRPC-based streaming telemetry. We have successfully demonstrated adaptive streaming using a NOS-agnostic software agent implemented in the Go open-source programming language, running in a NOS guest container. The same agent was seamlessly deployed and tested in Infinera’s optical network operating systems as well as in SONiC, an open-source network operating system that includes strong support for routing protocols. The use of the same software across various technologies and equipment vendors ensures that the behavior of adaptive streaming telemetry is uniform and consistent.

Leveraging cloud towards autonomous networking

Automation applications like adaptive streaming telemetry, which intelligently observe the network, are key ingredients for implementing intent-based cognitive networking. Adaptive streaming telemetry is one example from a fast-growing ecosystem of embedded network automation applications that leverage cloud technologies to bring operators closer to the vision of a network that is self-adapting, self-healing, and self-optimizing.

This content has been adapted from Infinera blogs, published in 2021/2022.