The Rumsfeld Matrix in Service Assurance

Contributed by Matthew Twomey, Head of Marketing, Anritsu Service Assurance. 

In 2002, at a press conference, the then U.S. Secretary of Defense Donald Rumsfeld, while answering a question about weapons of mass destruction (WMDs), began a line of thought which eventually culminated in what we know today as the Rumsfeld Matrix.

…there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know.

For this now-infamous piece of wisdom, Rumsfeld won the Plain English Campaign’s 2003 Foot in Mouth trophy. With the benefit of hindsight, though, there does seem to be some sense in the sentiment, and the phrase ‘unknown unknowns’ has since entered common usage across many business domains. If we layer the Principle of Causation (Cause & Effect) on top of the Rumsfeld Matrix, we can begin to see how this thinking applies within the telco domain and, more specifically, to incidents within network and subscriber service assurance. Allow me to explain.

Known Knowns: Cause Known, Effect Known

This scenario is similar to being on a journey and knowing exactly where you are going and how to get there, or at the very least, having a map you can read to get you there. Known knowns are the bread and butter of Network Operation Centres (NOCs) and Service Operation Centres (SOCs). If you walk into a NOC or SOC, the walls are covered with screens showing various charts and diagrams, and probably 80% of these are known knowns. They reflect what most people recognise as operation-critical metrics or KPIs. In many cases, you might recognise these as Fixed Threshold KPIs. For example, as a Mobile Network Operator (MNO), you know that if your Call Success Rate KPI falls below a particular value (e.g., 98.5%), you have a real customer-affecting problem. Other standard KPIs might reflect Network Attach Time, Retainability, Mobility, etc. These particular KPIs are standard for two reasons: the effect is well-known, and often the cause or causes are quite well-known and have been experienced many times before. These KPIs and metrics are usually available out of the box in most service assurance solutions. Other known knowns might be “Roaming Not Allowed” errors for roaming subscribers. It is clear what the issue is, how to resolve it, and how it might affect the subscriber.

Solutions suitable to known knowns are those used for standard threshold monitoring, reporting and alarming.
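
To make the idea of a Fixed Threshold KPI concrete, here is a minimal sketch in Python. It uses the 98.5% Call Success Rate example from above; the `KpiSample` structure, the interval labels and the sample figures are invented purely for illustration and do not come from any particular service assurance product.

```python
from dataclasses import dataclass

# Hypothetical record for one reporting interval (names are illustrative).
@dataclass
class KpiSample:
    interval: str    # label for the reporting interval, e.g. "10:15"
    attempts: int    # call attempts in the interval
    successes: int   # successful calls in the interval

CSR_THRESHOLD_PCT = 98.5  # example threshold taken from the text

def call_success_rate(sample: KpiSample) -> float:
    """Return Call Success Rate as a percentage (100.0 when there were no attempts)."""
    if sample.attempts == 0:
        return 100.0
    return 100.0 * sample.successes / sample.attempts

def breaches(samples, threshold_pct=CSR_THRESHOLD_PCT):
    """Return the intervals whose CSR falls below the fixed threshold."""
    return [s.interval for s in samples if call_success_rate(s) < threshold_pct]

# Invented sample data: only the 10:15 interval is below 98.5%.
samples = [
    KpiSample("10:00", attempts=10_000, successes=9_920),
    KpiSample("10:15", attempts=10_000, successes=9_790),
    KpiSample("10:30", attempts=10_000, successes=9_850),
]
print(breaches(samples))
```

The appeal of this approach is its simplicity: the threshold never moves, so anyone looking at the wall screen knows exactly what a breach means.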

 

Known Unknowns: Cause Known, Effect Unknown

This scenario is similar to travelling somewhere new when you have yet to figure out how you will get there. Known unknowns are a little trickier to handle. A perfect example is analysing volumes of error codes to detect network issues. As an MNO, you understand that an error code is cause for concern. It is an error code for a reason, after all. However, a single instance of an error code will garner less attention than a large cluster of the same error code within a short timeframe. The error codes will sometimes point to the root cause of the issue, but often, further analysis is required to determine the actual cause and find a suitable resolution. The effect is unknown because the scope of an incident, in terms of the number of subscribers or services affected, is generally only understood on a per-incident basis. For this reason, it is hard to pin these incidents down using fixed thresholds that remain meaningful for alarming purposes.

Solutions suitable to known unknowns need to be more flexible than those applied to known knowns and will definitely need to consider shifting baselines and seasonality.
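
One possible way to spot a cluster of error codes without a fixed threshold is to compare each interval's count with a baseline that moves with recent history. The sketch below is only an illustration under stated assumptions: the function name, the window size, the sensitivity factor `k` and the sample counts are all hypothetical, and it uses a rolling median with median absolute deviation (MAD) simply because that is easy to show with the standard library. It handles a shifting baseline only; seasonality is deliberately left out.

```python
from statistics import median

def burst_intervals(counts, window=6, k=3.0, min_count=5):
    """Flag interval indices where an error-code count stands out from recent history.

    The baseline is the median of the previous `window` counts, and the spread
    is the median absolute deviation (scaled by 1.4826 so it is comparable to a
    standard deviation). `min_count` stops very quiet periods from alarming on
    one or two stray errors. All parameter values here are illustrative.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = median(history)
        mad = median(abs(c - baseline) for c in history)
        limit = max(baseline + k * 1.4826 * mad, min_count)
        if counts[i] > limit:
            flagged.append(i)
    return flagged

# Invented per-interval counts of one error code; intervals 7 and 8 form a burst.
counts = [2, 3, 2, 4, 3, 2, 3, 41, 38, 4, 3]
print(burst_intervals(counts))
```

A median-based baseline is used here rather than a mean so that the first interval of a burst does not inflate the baseline and hide the second.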

 

Unknown Knowns: Cause Unknown, Effect Known

This scenario is similar to being on a journey in your car and instinctively avoiding a particular road or route because you know it is prone to traffic jams at this time of day, even though you are not actively thinking about it. Unknown knowns once more increase the complexity of both detection and resolution. They are similar to known unknowns but significantly different in how they are detected. Here, we are not looking for an error code to tell us something is wrong. Instead, we may be delving into a volumetric analysis of any specified occurrence across any of the dimensions available. Suppose you are a Roaming Manager responsible for worldwide subscriber data usage. You may choose to monitor a metric of Total Bytes and break this down per country. Imagine seeing a sudden drop in data usage. This phenomenon is impossible to monitor with Fixed Threshold KPIs (known knowns), because normal usage differs from country to country and rises and falls with the time of day. If no error codes are associated with the incident, you cannot surface this phenomenon in the manner of known unknowns above. What now? This is where volumetric analysis of metrics combined with seasonally-adjusted anomaly detection algorithms plays its part.

Solutions suitable to unknown knowns need to build upon those used for known unknowns with the ability to use different metrics and calculation methods for detection.
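
As a rough illustration of seasonal adjustment, the sketch below compares the latest Total Bytes value for a country with the median of the values seen at the same hour on previous days. Everything in it is an assumption made for the example: the function names, the 24-hour season, the 50% drop ratio and the invented usage profile. Production anomaly detection would normally be considerably more sophisticated.

```python
from statistics import median

def expected_for_next(history, season=24):
    """Median of past values at the same position in the season as the next sample.

    `history` is a list of regularly spaced values, oldest first. Returns None
    when fewer than three comparable past values exist.
    """
    phase = len(history) % season
    same_phase = history[phase::season]
    return median(same_phase) if len(same_phase) >= 3 else None

def is_sudden_drop(history, current, season=24, min_ratio=0.5):
    """True when `current` is below `min_ratio` of the seasonal expectation."""
    expected = expected_for_next(history, season)
    if expected is None or expected <= 0:
        return False
    return current < min_ratio * expected

# Invented hourly Total Bytes profile (arbitrary units): quiet nights, busy days.
profile = [10] * 6 + [100] * 12 + [40] * 6
week = profile * 7

afternoon_history = week + profile[:14]  # next sample is the 14:00 hour
night_history = week + profile[:2]       # next sample is the 02:00 hour

# 30 units at 14:00 is a drop against an expected 100; 9 units at 02:00 is normal.
print(is_sudden_drop(afternoon_history, current=30))
print(is_sudden_drop(night_history, current=9))
```

Note that a single fixed threshold could not separate these two cases: any value low enough to ignore the quiet night hours would also miss the afternoon drop. Running the same check per country gives each country its own expectation.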

 

Unknown Unknowns: Cause Unknown, Effect Unknown

This scenario is like encountering a road traffic accident on a journey that neither you nor anyone else could know would happen; it’s a complete surprise that requires new thinking and solutions. By far, the most challenging incidents to detect are the unknown unknowns. We find these things so hard to imagine that we can’t prepare for them in advance. However, when they occur, the MNO must be aware of them before they adversely affect large numbers of customers. This is where artificial intelligence (AI) and machine learning (ML) can play a part. MNOs can use unsupervised machine learning to highlight anomalies worthy of investigation that a human couldn’t detect. The real value to be had in these use cases centres around domain knowledge and the intelligent use of data to feed into the algorithms with the ultimate goal of raising the signal above the noise. Other unknown unknowns might be where a network element has a complete loss of function, and the effect of that is unknown.

Solutions suitable to unknown unknowns primarily use AI and ML as their foundation and intelligent, trusted data as their input.
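
The article does not name a specific algorithm, so the following sketch swaps in one of the simplest unsupervised techniques, k-nearest-neighbour distance scoring, purely to show the principle: no labels and no predefined thresholds, just a ranking of which entities look least like their peers. The cell names, the three features and all values are hypothetical.

```python
import math

def standardise(rows):
    """Rescale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [
        (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]

def knn_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    A higher score means the point sits further from its peers.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Invented per-cell features: (attach time in ms, drop rate in %, throughput in Mbps).
cells = {
    "cell-A": (120, 0.4, 35.0),
    "cell-B": (125, 0.5, 34.0),
    "cell-C": (118, 0.3, 36.5),
    "cell-D": (122, 0.4, 33.5),
    "cell-E": (121, 0.6, 35.5),
    "cell-F": (190, 0.5, 12.0),  # unusual combination, though its drop rate looks normal
}

scores = dict(zip(cells, knn_scores(standardise(list(cells.values())))))
most_unusual = max(scores, key=scores.get)
print(most_unusual)
```

The example also hints at why domain knowledge matters: the ranking is only as good as the features chosen to feed the algorithm.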

 

Not all MNOs are equal, however. The never-ending requirement to manage Operational Expenditure (OpEx) can take its toll on the ability of some to move beyond the known knowns for their service assurance capabilities. Others will dip their toes a little further into known unknowns and reap the additional benefits of faster incident response time and better customer experience. Each step up the ladder has the potential to make things easier for the MNO and better for their subscribers, but each step is also an investment of time and money.

 

As MNOs grapple with the complexities of known knowns, known unknowns, unknown knowns, and the elusive unknown unknowns, their reliance on AI and ML technologies becomes increasingly pivotal. This shift marks a technological and strategic leap, highlighting the importance of adaptability, foresight, and innovation in managing operational complexities and change. The challenge for MNOs is not just in detecting and resolving incidents but in foreseeing and preparing for the unforeseeable, ensuring they remain ahead in a perpetually evolving digital landscape. The journey from managing the known to confidently navigating the unknown reflects a broader narrative of growth and resilience, pushing the boundaries of what is possible in service assurance and operational excellence. And when the unknown unknowns present themselves, the speed, volume and quality of data available to MNOs for troubleshooting purposes are more important than ever.

 

According to financier and philanthropist Sanford I. Weill, “Details create the big picture”.

Trusted data and trusted partners are now more important than ever.
