Thoughts From the Field: Lessons Learned from Early Network Automation & Orchestration Initiatives
Contributed by Morgan Stern, VP of Automation Strategy, Itential
In the past three years, we’ve seen a number of Communications Service Providers (CSPs) embark on ambitious network automation and orchestration initiatives. The results have been mixed, with some providers realizing significant improvements in productivity and accelerated time-to-market for new services, while others have struggled to meet their timelines and business targets for automation.
To better understand why some programs were seeing great results while others were stuck, I started doing some informal research. I spoke with a number of project team members and engineers who have been involved in a wide range of automation and orchestration projects, from simple task automation initiatives to more comprehensive service orchestration programs. The tools these teams used varied widely and included scripting languages, commercial automation platforms, and open-source architectures. While my research was not rigorous enough to be considered scientific, the anecdotal feedback I received from the people I spoke with was interesting and worth sharing.
In no particular order or priority, here are some of my key findings:
Monolithic Approaches Have Yet to Bear Fruit
A number of CSPs have been working on far-reaching, comprehensive orchestration and management frameworks intended to support a wide range of use cases across multiple technology domains. Due to the ambitious nature and broad scope of these initiatives, these programs are taking years to come to fruition. One specific challenge these teams face is that technology is changing more rapidly than their development cycles for the orchestration platform can accommodate. For example, initial designs for orchestration were focused on virtualization models, while the current thinking favors containers as the preferred method for delivering network functions. Monolithic models take a long time to develop, and they take a long time to evolve – which is acceptable when the underlying technology is mature and relatively stable. That is not the case with modern networking, so these teams are forced to pivot. The end result is that most programs of this type have yet to bear fruit beyond one or two simple use cases. The next 18 months will be critical for some of these frameworks, as capabilities such as 5G network slicing will require a robust, scalable, end-to-end orchestration capability.
Flexibility is Critical – But No Plan/Framework Results in Snowballing Technical Debt
On the other end of the spectrum from the monolithic framework approach are the CSPs that are taking an unstructured approach to automation. These teams are using a variety of tools, methods, and models for their automation and orchestration use cases. They may use a platform like Ansible for some use cases, a workflow engine like Camunda for others, and Python scripts for still others. This type of ad hoc approach is appealing to teams because it enables them to achieve some quick wins – they can quickly automate simple use cases and can show relatively good results compared to manual methods. With that said, CSPs that have used this approach quickly become victims of their initial successes. As their internal customers ask for more and more use cases to be automated, the complexity of the use cases increases. The tools are then stretched beyond their effective limits, and the code base of automations becomes increasingly complex. This technical debt inevitably reaches a point where the time and effort required to maintain the existing code base becomes so significant that the automation team no longer has the cycles to develop new automations.
One of my key takeaways from thinking about these two models is that there is a productive middle ground between a monolithic framework and the ad hoc approach, but it requires careful planning, a clear understanding of the automation tool options, and clarity around best practices for modular orchestration/automation design. I anticipate that this middle-ground approach will deliver the best results in the mid-term, and potentially in the long term, as it has the highest potential to be flexible enough to accommodate changes in the underlying technology domains while still providing enough structure for re-usability and management of technical debt.
Choices Make a Difference
As I continued to talk with different teams, I came across another interesting point – some programs, despite using almost identical automation/orchestration tools, were having vastly different experiences and business impact. In an attempt to better understand what was going on, I started to ask a lot of questions — what caused these differences? Was it the use cases? The technology domains? Or was it the skills and experience of the teams that were developing and implementing the orchestration systems?
I discovered it was not any one specific factor, but the choices of the architects, business leaders, and developers that accounted for these varied experiences. For example, one program implemented a process for evaluating use cases across multiple dimensions – business impact, complexity, and fit between the technology domain and the automation toolset. Another program developed orchestration use cases on the same software platform, but with little to no rigor for prioritizing and evaluating them. This second team took whatever priorities were handed to them by the network planners and did their best to automate. As a result, the outcomes were starkly different. The first team had shorter development cycles, and the automations they developed had clear business impact. The second team continues to struggle, with very mixed results and a growing backlog that further frustrates their internal customers.
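One simple way to picture the first team's evaluation process is a weighted score across the three dimensions mentioned above. The weights, the 1-to-5 scale, the sample use-case names, and the function below are illustrative assumptions on my part, not a published methodology:

```python
# Hypothetical weighted scoring of automation use cases across the three
# dimensions described above (business impact, complexity, toolset fit).
# All weights, names, and the 1-5 rating scale are illustrative assumptions.

def score_use_case(business_impact, complexity, tool_fit,
                   weights=(0.5, 0.3, 0.2)):
    """Rate each dimension 1-5; complexity is inverted so lower is better."""
    w_impact, w_complexity, w_fit = weights
    return round(w_impact * business_impact
                 + w_complexity * (6 - complexity)  # invert: 5 -> 1, 1 -> 5
                 + w_fit * tool_fit, 2)

# A high-impact, low-complexity use case that fits the toolset outranks an
# equally high-impact but complex, poorly fitting one.
backlog = {
    "vlan_provisioning": score_use_case(business_impact=5, complexity=2, tool_fit=4),
    "end_to_end_service_orchestration": score_use_case(business_impact=5, complexity=5, tool_fit=2),
}
ranked = sorted(backlog, key=backlog.get, reverse=True)
```

Even a crude rubric like this forces the prioritization conversation that the second team skipped.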
In another situation, the orchestration development teams took different approaches in designing the orchestration flows for service activation. One team effectively replicated the existing manual processes as automations, while the second team re-designed the activation process based on the capabilities of the orchestration system. In this instance, the second team had far greater impact – not only did they eliminate the manual activities, they also radically accelerated the activation process, resulting in a 90% reduction in time-to-billing for new customers.
Consumption & Exposure Unlock Greater Business Value
One additional topic came up in my conversations with these automation teams. The implementations that emphasized exposing automation use cases to the business via an Application Programming Interface (API) or automation catalogs saw higher utilization and, subsequently, were able to demonstrate more value to the business than implementations used predominantly by a small set of engineers. To illustrate this point, I’ll use a simple, specific example – automation of access control lists (ACLs) on IP routers.
In one program, the engineers developed a series of scripts that they used to create and edit ACLs. Their team would receive requests for ACL changes from their internal customers via spreadsheets that were sent either through email or attached to tickets. The engineers would receive the spreadsheet, manually review the requested ACLs to confirm compliance with security policies, and paste the ACLs into their script.
The second group took a different approach. They exposed the ACL process as a network service, where internal customers could submit their ACL requests to the automation system via an API. The automation engine would compare the requests to a predefined list of approved and blocked addresses/ports, generate the appropriate ACLs, and automatically apply those that were consistent with policy. Any ACL requests that deviated from policy were sent to a queue for manual review.
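A minimal sketch of that policy gate might look like the following. The network and port lists, function name, and Cisco-style ACL syntax are all hypothetical illustrations of the pattern, not any specific vendor's API:

```python
# Hypothetical policy gate for self-service ACL requests, as described above:
# compliant requests are auto-applied, everything else is queued for review.
# All names and policy values here are illustrative assumptions.
import ipaddress

BLOCKED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]   # never permit
APPROVED_PORTS = {22, 80, 443}                            # pre-approved services

def evaluate_acl_request(src_cidr, dst_port, action="permit"):
    """Return ('auto', acl_line) if compliant, else ('manual', reason)."""
    net = ipaddress.ip_network(src_cidr, strict=False)
    if any(net.overlaps(blocked) for blocked in BLOCKED_NETWORKS):
        return ("manual", f"{src_cidr} overlaps a blocked network")
    if dst_port not in APPROVED_PORTS:
        return ("manual", f"port {dst_port} is not pre-approved")
    # Cisco-style wildcard-mask ACL entry, for illustration only.
    acl_line = f"{action} tcp {net.network_address} {net.hostmask} any eq {dst_port}"
    return ("auto", acl_line)
```

With a gate like this behind the API, a request such as `evaluate_acl_request("192.0.2.0/24", 443)` is applied automatically, while anything touching the blocked range or an unapproved port lands in the manual-review queue.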
In both cases, the changes to the routers to apply ACLs were made through automation, but the results were very different. While the first team reduced some of the manual effort, there was minimal impact on their Service Level Agreement (SLA) – they required the same amount of elapsed time to push ACLs for their internal customers in the automated model as they did in the manual model. The second team saw improvements in multiple areas. Not only were they able to deliver ACLs within a much shorter SLA timeframe, but their customers could also use the self-service API to evaluate and request a much larger volume of ACLs, which enabled the ACL team to show greater value to the business.
The automation and orchestration market is at a very interesting point. When we first started Itential in 2014, only a very small percentage of CSPs even understood what network automation was all about. Now, everyone is doing some level of automation. Some programs are in their second or third iteration, and the maturity of thinking has come a long way. I look forward to talking again with some of these teams to see how their automation infrastructures are evolving based on the lessons learned from these early experiences.