End-to-End Service orchestration: the Telecom industry's exoplanet

Luc-Yves Pagal Vinette
March 31, 2022
26 min read

Last updated: June 24, 2025

Introduction

End-to-End Service Orchestration (E2ESO) or Multi-Domain Service Orchestration (MDSO) is a primary activity that fundamentally changes and can accelerate both 5G and Service Infrastructure monetization, as well as the accompanying revenues for more advanced service use cases. End-to-end orchestration is the method of dynamically allocating and rapidly connecting a set of domain resources, defined in an inventory, to a service. It notably provides de-facto consolidation of all domain inventories under the umbrella of a functional/logical End-to-End service inventory, which includes all infrastructure capabilities such as Network, Mobile, Satellite, xNF Virtualized functions (uCPE, vFW, etc.) as well as related hardware resources.

More importantly, End-to-End Orchestration paves the way for a new generation of telecom services that will finally bring together two realms that have previously existed independently, namely the wireline and wireless domains. Indeed, once wireline and wireless converge under the same service orchestration umbrella, they open the door to new service use cases, including: E2E Network Slicing, 5G and NTN (Non-Terrestrial Networks), 5G & Wi-Fi offloading (Wi-Fi 6 & 7), Open RAN SMO & RIC, Cloud Computing, Private 5G while aiming for 5G Advanced (Rel-18 and beyond), as well as upselling NFV services (Cloud-based NFVs, uCPE, SD-WAN, etc.).

Fig.1: End-to-End Orchestration perspectives

As shown in the picture above, E2E Orchestration is a vital function designed to align essential standards from 3GPP, IETF, O-RAN Alliance, TM Forum, and others with Open-Source innovation activities such as ETSI Open-Source MANO or LFN's ONAP. Such a viewpoint presents its own set of issues, as each service island and domain evolves at its own pace and with its own backward-compatibility constraints.

As previously stated, the primary task and complexity inherent in End-to-End orchestration is the allocation and stitching together of resources to define an end-to-end service. Easier said than done, a service like Network Slicing or another end-to-end service brings a plethora of challenges and complications such as seamless alignment of domain controller/orchestrator with E2E orchestration for automated service instantiation, service assurance and observability, Quality of Service (QoS) & Quality of Experience reporting, and finally resource release when an end-to-end service is terminated.
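The allocate, stitch, and release cycle described above can be sketched in a few lines. This is a minimal illustration only: `DomainInventory`, `instantiate`, and `terminate` are hypothetical names, and each domain is reduced to a pool of free resource identifiers rather than a real domain controller.

```python
from dataclasses import dataclass, field


@dataclass
class DomainInventory:
    """Hypothetical per-domain inventory: a pool of free resource IDs."""
    domain: str
    free: set[str]
    allocated: dict[str, str] = field(default_factory=dict)  # resource -> service

    def allocate(self, service_id: str) -> str:
        if not self.free:
            raise RuntimeError(f"no free resource in domain {self.domain}")
        resource = sorted(self.free)[0]  # deterministic pick for the sketch
        self.free.remove(resource)
        self.allocated[resource] = service_id
        return resource

    def release(self, service_id: str) -> None:
        # Return every resource held by this service to the free pool.
        for resource, owner in list(self.allocated.items()):
            if owner == service_id:
                del self.allocated[resource]
                self.free.add(resource)


def terminate(service_id: str, domains: list[DomainInventory]) -> None:
    """Release the service's resources in every domain."""
    for inventory in domains:
        inventory.release(service_id)


def instantiate(service_id: str, domains: list[DomainInventory]) -> dict[str, str]:
    """Allocate one resource per domain; roll back if any domain fails."""
    chain: dict[str, str] = {}
    try:
        for inventory in domains:
            chain[inventory.domain] = inventory.allocate(service_id)
    except RuntimeError:
        terminate(service_id, domains)
        raise
    return chain


ran = DomainInventory("ran", {"gnb-1", "gnb-2"})
transport = DomainInventory("transport", {"vpn-7"})
core = DomainInventory("core", {"upf-3"})
domains = [ran, transport, core]

slice_chain = instantiate("slice-A", domains)
print(slice_chain)
terminate("slice-A", domains)  # resources go back to their pools
```

The rollback in `instantiate` reflects the point made above: a partially stitched service must not leave resources stranded in the domains that did succeed.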

What drives End-to-End Orchestration, aka MDSO (Multi-Domain Service Orchestration)? Let's get started.

What are the components of E2E Orchestration or Multi-Domain Service Orchestration (MDSO)?

End-to-End Orchestration is made possible by several factors, including the ability to connect many service domains in order to synthesize consumer E2E services. Several technologies and solutions aid in its realization.

APIs

End-to-End Service Orchestration relies significantly on a reference design and framework of communication-based APIs to enable end-to-end service stitching across layers, domains, and service provider networks. To build Open APIs, several standard groups (MEF, TM Forum, ONF, etc.) have standardized reference points, defined information models, and created use cases alongside business needs.

Fig.2: End-to-End Service Orchestration components

A comprehensive API framework serves as a foundation for facilitating or hastening the transition of Communication Service Providers to Digital Service Providers (DSPs). Such a framework would provide the necessary “in-between” elements to simplify commercial interactions, service operational allocation, and resource allocation across layers, domains, and inter-operators, as well as to alleviate the exchange between the Master orchestration and the sub-domain controllers and their respective HW/SW infrastructure.

FEDERATION OF INVENTORIES

A unified and federated inventory is a modern approach to inventory management that unifies inventory data from disparate sources such as legacy OSS silos, network management systems (NMS), and service assurance systems. This establishes a single centralized "source of truth" that accurately reflects the current condition of the complete service infrastructure, including hardware and software.

Service providers require an inventory system that categorizes and visualizes all available resources, functions, and services in order to maximize the value of their entire service infrastructure (physical, virtual, and end-to-end). The automation capabilities of Telcos and other Digital Service Providers rely heavily on inventory foundations. The progress toward dynamic inventory systems, both within and across domains, necessitates a federated convergence.

Operations workers rely on IT systems based on inventory data from Dynamic Inventory systems. Inventory data, on the other hand, is dispersed among multiple inventory systems. Network inventory data, for example, is available for numerous service islands (Access, Core, Edge, Cloud, 5G, SD-WAN, NFV) and network levels (Optical, Ethernet or IP-VPN). As a result, it is not uncommon for major service providers to maintain multiple inventory systems, either commercial or developed internally. Service providers have continuously added overlay network infrastructure and Operation Support Systems (OSS) to fulfil new service demands; yet these old inventory systems are static. With the rapid adoption of Cloud-Native principles, both virtualization and containerization are being introduced alongside legacy hardware-based infrastructure. As a result, OSSs and Orchestration platforms require a dynamic way to maintain a live state of inventory systems, even when they are not synchronized with Closed-Loop automation.

As Communication Service Providers (CSPs) evolve into Digital Service Providers, they differentiate themselves with their own product/service catalogue of offerings that spans any combination of network and service domains. As a result, operations workers must access numerous inventory and monitoring systems per domain to execute service planning, fulfilment, and assurance tasks. To provide a single correct view of network resources, the holistic notion of End-to-End orchestration/automation necessitates synchronization of per-domain information and real-time correlation of network data.
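As a rough illustration of inventory federation, the sketch below merges records from several per-domain sources into one view and keeps the most recently updated record for each resource. The record fields and the "freshest record wins" rule are assumptions made for the example; a real federation layer would also need identity reconciliation and conflict policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRecord:
    """Hypothetical normalized inventory record."""
    resource_id: str
    domain: str
    state: str
    updated_at: int  # epoch seconds reported by the source system


def federate(*sources: list[InventoryRecord]) -> dict[str, InventoryRecord]:
    """Merge per-domain records, keeping the freshest record per resource."""
    view: dict[str, InventoryRecord] = {}
    for source in sources:
        for record in source:
            current = view.get(record.resource_id)
            if current is None or record.updated_at > current.updated_at:
                view[record.resource_id] = record
    return view


# A static legacy OSS and a live NMS disagree about the same router.
legacy_oss = [InventoryRecord("router-7", "transport", "planned", 100)]
nms = [
    InventoryRecord("router-7", "transport", "in-service", 200),
    InventoryRecord("upf-3", "core", "in-service", 150),
]

view = federate(legacy_oss, nms)
print(view["router-7"].state)  # in-service
```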

PER-DOMAIN & CROSS-DOMAIN OBSERVABILITY

Observability on a per-domain and cross-domain basis is an integral aspect of service assurance (described in the next chapter). It is also a critical component of the Service Orchestration process. Notably, before service instantiation, proactive monitoring provides a true picture of the state of inventory components, ensuring a flawless active service chain. Throughout the service's lifecycle, continuous monitoring is essential to ensure the health of all the components that comprise the service (PNF, VNF or CNF) and of the service itself. Once the service is decommissioned, all connected resources are released and made available for other purposes.

As orchestration/automation evolves and service components become more granular and stitched in complex service configurations, the need for granular observability across levels and across the board has never been more pressing. Indeed, service infrastructure is naturally evolving to become more Cloud-Native driven, which means that xNF (Any Network Functions), whether physical, virtual, or container-based, could not only be instantiated or provisioned with the operational service platform, but also monitored regardless of their state of evolution or operation.

However, the rate at which each service domain evolves differs. Some domains, such as RAN (Radio Access Networks) or Core/Access/Optical Networks, are still strongly reliant on hardware-related functions, necessitating the use of legacy monitoring technologies such as SNMP and vendor-centric monitoring solutions. Other domains, such as 5G Core or Kubernetes-based on-premises/hybrid cloud and container clusters, are already mostly based on Cloud-Native principles such as Kafka-based streaming, leveraging Open-Source or Hyperscaler tools such as Prometheus for monitoring and data exposure, ELK (Elasticsearch, Logstash & Kibana) for analytics, and Grafana for data visualization.
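One practical consequence of this uneven evolution is that cross-domain observability has to normalize samples coming from very different sources. The sketch below maps a Prometheus-style text line and an SNMP-style (OID, value) pair onto one common record. It is deliberately simplified: the regular expression covers only plain `name{label="value"} number` lines, not the full exposition format, and the `Sample` type, the metric names, and the OID-to-name table are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical common record for metrics from any domain."""
    domain: str
    source: str
    name: str
    labels: tuple[tuple[str, str], ...]
    value: float


# Simplified pattern: name{k="v",...} value (not the full exposition grammar).
_PROM_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')

# Illustrative mapping from an OID prefix to a readable metric name.
OID_NAMES = {"1.3.6.1.2.1.2.2.1.10": "if_in_octets"}  # IF-MIB ifInOctets


def from_prometheus(line: str, domain: str) -> Sample:
    match = _PROM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported line: {line!r}")
    labels = tuple(_LABEL.findall(match.group("labels") or ""))
    return Sample(domain, "prometheus", match.group("name"), labels,
                  float(match.group("value")))


def from_snmp(oid: str, value: int, domain: str) -> Sample:
    # The last OID component is treated as the interface index.
    prefix, _, index = oid.rpartition(".")
    name = OID_NAMES.get(prefix, prefix)
    return Sample(domain, "snmp", name, (("if_index", index),), float(value))


cloud_sample = from_prometheus('upf_sessions{slice="A"} 42', "5gc")
legacy_sample = from_snmp("1.3.6.1.2.1.2.2.1.10.3", 1500, "transport")
print(cloud_sample)
print(legacy_sample)
```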

Why is Multi-Layer & Domain Service Assurance (MDSA) so relevant?

In recent years, the rising complexity of Service Provider networks has prompted CSPs/MNOs to spend heavily on vendor-based or Open-Source focused orchestration/automation capabilities. The primary goals for orchestration/automation were to better monetize overlay, wireline, and especially wireless (more specifically 5G) service opportunities based on a shared service architecture, and to substantially decrease both the TCO (Total Cost of Ownership) and the OpEx associated with static or monolithic service implementation.

However, as service composition became more sophisticated and complex, spanning multiple domains within an operator architecture and even multiple operators under wholesale commercial agreements, it became necessary to ensure that the services offered were competitive and met the requirements of the consumers. This necessitated the use of a Multi-Domain Service Assurance (MDSA) platform to support an end-to-end orchestration platform, aka MDSO (Multi-Domain Service Orchestration).

A SET OF FUNCTIONS ON A PER-DOMAIN PERSPECTIVE

Multi-Domain Service Assurance necessitates the creation of an umbrella comprised of a set of capabilities that support all services not only horizontally but also vertically, as defined by the ETSI MANO (Management & Orchestration) principles. MDSA notably enables monitoring across all layers, beginning with HW (gNBs, routers, etc.) and progressing through Virtualization (OpenStack, Kubernetes-based implementations), Networking layers (IP-VPN, EVPN, SD-WAN, Segment Routing), Mission-critical applications, Virtual stitching of resources (Network Slice subnets), and End-to-End service (Network Slice/Cross-Operator service).

Fig.3: ETSI MANO

Data collection is an essential component of service assurance because it allows metrics and counters from hardware or service functions to be accessed and exploited. Data retrieval is a critical element in ensuring that data may be used to develop KPIs (Key Performance Indicators) on a per-domain and cross-domain level. KPIs are either vendor-driven or standard-driven; in recent times, due to the global interest in 5G, 3GPP has produced two noteworthy specifications: 3GPP TS 28.554 for 5G E2E Key Performance Indicators and 3GPP TS 28.552 for 5G Performance measurements. After being generated, the KPIs can be visualized using a dashboard that is customizable per profile, per set of roles and responsibilities, and even per customer.
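To make the step from raw counters to a KPI concrete, here is a minimal sketch of a ratio-type KPI computed per cell and then aggregated across cells. The counter values and names are invented for the example and are not taken from the 3GPP specifications; the point is only that an aggregated ratio should be derived from summed counters rather than by averaging per-cell percentages.

```python
from typing import Optional


def success_rate(successes: int, attempts: int) -> Optional[float]:
    """Ratio KPI in percent; None when there were no attempts to measure."""
    if attempts == 0:
        return None
    return 100.0 * successes / attempts


# Hypothetical per-cell counters: (successful registrations, attempts).
counters = {
    "cell-1": (90, 100),
    "cell-2": (5, 10),
}

per_cell = {cell: success_rate(ok, tries) for cell, (ok, tries) in counters.items()}

# Aggregate the counters first, then compute the KPI once.
total_ok = sum(ok for ok, _ in counters.values())
total_tries = sum(tries for _, tries in counters.values())
aggregated = success_rate(total_ok, total_tries)

print(per_cell)    # {'cell-1': 90.0, 'cell-2': 50.0}
print(aggregated)  # about 86.4, not the 70.0 a naive average would give
```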

CORRELATION ACROSS SERVICE DOMAINS

Conventional assurance methods fail to manage complex multi-technology and multi-domain networks, which are often fragmented, work in silos, are slow, and require manual intervention. With multiple domains and technologies now intertwined, the problems that arise are frequently the result of interdependencies among multiple layers within the service architecture. Such concerns are difficult to track and control using standard or legacy assurance methods.

Fig.4: E2E Orchestration and Service Assurance: the perfect combo

To address this issue, a Multi-Domain Service Assurance (MDSA) platform is proposed. It works alongside and connects with the Multi-Domain Service orchestration/automation platform. It is further powered by an AI/ML framework and supported by intent-based policies. As a result of such an umbrella of assurance capabilities, the following benefits can be realized:

Acquire visibility and end-to-end insights into the service infrastructure's layers and domains
Rapidly identify and resolve issues before they become service interruptions
Visualize multi-layer network/function relationships that allow for correlation of network/service issues
Ensure data probing across domains and levels from several sources of fault and performance measurements or counters, as discussed in the observability section
Correlate alerts and metrics across multiple domains for Root-Cause Analysis via advanced analytics (AI/ML-based) tools
Provide customizable dashboards for analytics, assurance, anomalies, and ticket creation, suited to the roles and responsibilities of specific users
Resolve problems in closed loop using policy-driven (AI) and machine learning-assisted techniques
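The cross-domain correlation listed above can be illustrated with a simple, rule-based sketch (a stand-in for the AI/ML-based tools mentioned, not a replacement for them). Given a dependency map between service components and a set of components currently raising alarms, it keeps only the alarmed components that have no alarmed component underneath them. All component names and the dependency map are hypothetical.

```python
def upstream(node: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All components that `node` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(depends_on.get(node, ()))
    while stack:
        dependency = stack.pop()
        if dependency not in seen:
            seen.add(dependency)
            stack.extend(depends_on.get(dependency, ()))
    return seen


def root_causes(alarmed: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Alarmed components whose own dependencies raise no alarm."""
    return {node for node in alarmed if not (upstream(node, depends_on) & alarmed)}


# Hypothetical multi-layer dependencies: service -> subnet -> device -> fiber.
depends_on = {
    "slice-A": {"ran-subnet", "transport-vpn"},
    "transport-vpn": {"router-7"},
    "router-7": {"fiber-12"},
}

alarmed = {"slice-A", "transport-vpn", "router-7"}
print(root_causes(alarmed, depends_on))  # {'router-7'}
```

Three alarms from three layers collapse into a single probable cause, which is the kind of reduction that makes ticket creation and closed-loop resolution tractable.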

ENTERING THE REALM OF NEAR-REAL-TIME (Near-RT) AND REAL-TIME (RT) ANALYTICS AND OBSERVABILITY

The introduction of 5G as a multi-domain service technology has raised the need for an Analytics Framework that can provide data analytics capabilities at various tiers and on a per-domain basis. 3GPP refers to these capabilities as Data Analytics Functions (DAFs). The Management Data Analytics Function (MDAF) may collect and analyze data from across the 5G service architecture, including 5GC, Cloud, Edge networks, and Radio Access Networks (RANs). NWDAF (Network Data Analytics Function) leverages data from 5G Core Network Functions (NFs) to offer near real-time analysis back to the network and the 5G NFs, but also potentially to other domains such as the virtualized Open RAN and its RAN Intelligent Controller (RIC). NWDAF also offers predictive analytics, which aids in anticipating and proactively managing capacity for various functions with minimum human intervention. Both functions are specified by 3GPP: MDAF in 3GPP TS 28.533 and NWDAF in 3GPP TS 29.520.

Fig.5: RAN Domain orchestration and Open RAN integration

The RAN Intelligent Controller (RIC) is an excellent illustration of how the industry is embracing Open-Source enhancements of standards. The RIC is a critical component of the O-RAN (Open RAN) architecture described by the O-RAN Alliance. The O-RAN design divides gNB functions into three primary components: the O-RU (Open Radio Unit), the O-DU (Open Distributed Unit), and the O-CU (Open Centralized Unit). The RIC, on the other hand, comes in two varieties: Non-Real Time (Non-RT RIC) and Near-Real Time (Near-RT RIC). Many people consider the RIC to be an improved version of the C-SON (Centralized Self-Organizing Networks).

As a D-SON (Distributed SON) alternative, the RIC is being implemented to allow AI/ML intelligent decision making for real-time monitoring, management, and re-configuration of O-RAN containerized functions. The Non-RT RIC, as shown in the picture above, integrates as part of the RAN sub-domain controller, commonly known as RAN Service Management & Orchestration (SMO) or Domain Orchestration. The Near-RT RIC is more closely integrated with the O-DU and O-CU.

LOCAL OPTIMIZATION OF SUB-DOMAIN

One of the main benefits of having both Open-RAN and SMO capabilities is the ability to fine-tune and manage the radio network at the RAN level. As a result, the objective would be to optimize not just radio frequency and Cloud-Native function resource allocation, but also the remainder of the service architecture.

The two O-RAN RIC instances are critical components of the RAN SMO (Service Management & Orchestration) and enable intelligent management of next-generation Radio Access Networks built on Cloud-Native concepts in preparation for future generations of wireless services such as 5G Advanced and 6G. These are the Non-Real-Time and Near-Real-Time RIC platforms, respectively.

Fig.6: RAN Domain orchestration and Open RAN integration

The primary objective of the Non-RT RIC is to support non-real-time radio resource management, higher-layer procedure optimization, and policy optimization in the RAN, and to provide the parameters, policies, and AI/ML models that support the operation of the Near-RT RIC functions. More precisely, the Non-RT RIC manages policies, performs RAN analytics, and trains models for Near-RT RIC instances (yes, there will be more than one). Additionally, the Non-RT RIC platform hosts rApps (Non-RT RIC applications) that perform Non-RT RIC tasks.

As the name implies, the Near-RT RIC runs in near-real time (with control loops in the range of 10 ms to 1 s) and is responsible for RAN control and optimization. It also hosts xApps to perform Radio Resource Management and relies on UE, O-DU, O-CU, and cell-specific metrics. Numerous use cases are intended for local optimization of the RAN domain, including traffic steering, QoE optimization and measurement, massive MIMO optimization, and QoS resource optimization.

CLOSED-LOOP SERVICE AUTOMATION IN O-RAN DOMAIN

As the nature of services evolves, most notably for network slicing, it enables customizable and agile network deployments of varying types to target distinct business verticals, while remaining within the same service domain. As addressed below, an evolution to Cloud-Native principles and implementations is required to enable the RAN domain to instantiate and allocate resources in a scalable and flexible fashion.

The O-RAN architecture's two embedded intelligent components are the RAN Intelligent Controllers (RICs), which come in two flavors: Non-RT RIC and Near-RT RIC. The Non-RT RIC collects data for monitoring and slicing purposes in order to provide Key Performance Indicators (KPIs) for the resources allocated to RAN slice subnets, as well as the parameters required to design them.

1) To assure visibility of the RAN service components, these parameters and KPI data are gathered and delivered upward to the Service Assurance central nerve centre through Kafka streaming methods in near real-time.

2) The same parameters and KPIs are also provided downstream to the near-real-time RIC in order to perform any necessary dynamic slice subnet optimization or reconfiguration to ensure SLA/QoE service assurance.

As illustrated in Figure 4, the Service Assurance architecture includes two critical components that are required to influence the data-driven service orchestration process. Indeed, a Data Architecture that incorporates a Data repository (DataLake) will enable the collection of vital data across domains.

Further, an AI/ML framework is needed to ensure that data is processed to provide KPIs, performance metrics, and alarm analysis. It can further perform event/anomaly detection such as degraded performance that could affect the quality of service and lead to degradation in the service experience delivered to end customers. The presence of a closed-loop connection between the Assurance and Orchestration platforms allows for service level threshold violations and alerts to be communicated with the End-to-End Service Orchestration platform, which can take action to introduce the needed adjustments.
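The closed loop between Assurance and Orchestration can be sketched as two small steps: detect service-level threshold violations, then hand each violation to a remediation policy on the orchestration side. The KPI names, thresholds, and the `scale_out_policy` function are hypothetical; a real platform would exchange these events over its own APIs or message bus.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Violation:
    service_id: str
    kpi: str
    value: float
    threshold: float


def check_sla(service_id: str, kpis: dict[str, float],
              max_allowed: dict[str, float]) -> list[Violation]:
    """Assurance side: report every KPI above its allowed maximum."""
    return [
        Violation(service_id, name, value, max_allowed[name])
        for name, value in sorted(kpis.items())
        if name in max_allowed and value > max_allowed[name]
    ]


def closed_loop(violations: list[Violation],
                remediate: Callable[[Violation], str]) -> list[str]:
    """Orchestration side: apply a remediation policy to each violation."""
    return [remediate(violation) for violation in violations]


def scale_out_policy(violation: Violation) -> str:
    # Hypothetical policy: always answer with a scale-out request.
    return (f"scale-out {violation.service_id} "
            f"({violation.kpi}={violation.value} > {violation.threshold})")


measured = {"latency_ms": 25.0, "packet_loss_pct": 0.1}
sla = {"latency_ms": 20.0, "packet_loss_pct": 0.5}

violations = check_sla("slice-A", measured, sla)
actions = closed_loop(violations, scale_out_policy)
print(actions)  # ['scale-out slice-A (latency_ms=25.0 > 20.0)']
```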

What role can we expect from AI/ML capabilities?

Artificial Intelligence (AI) and Machine Learning (ML) have been explored and considered as a major capability of next-generation 5G wireless networks and beyond 5G (B5G). Analysts suggest that AI/ML capabilities might be used in three key areas: network planning, network diagnostics/insights generation for prevention & optimization, and network control. Telco service infrastructure and 5G are nowadays becoming increasingly complex due to their multi-domain nature, based on ETSI MANO hierarchical tiers, utilizing numerous layers of connectivity, and heading towards disaggregation and Cloud-Native evolution. This rapidly growing complexity, along with the need to manage legacy services (statically provisioned), renders traditional network planning and observability systems obsolete, necessitating their replacement with automated solutions that leverage AI/ML.

End-to-end orchestration has become a requirement as the industry shifts to a more distributed service infrastructure. Indeed, mobile and fixed networks have moved from being historically centralized and dedicated to becoming more distributed, dynamically provisioned, and orchestrated. In this context, Artificial Intelligence (AI) is rapidly becoming an inescapable characteristic of network management as well as of the operational aspects of distributed service domains (Wireline, 5G, NFV, Cloud, Edge, etc.). Leveraging AI also means exploiting the large amount of monitoring and operational data generated by disparate network/service domains, making vital insights based on real-time networking operations accessible.

PER-DOMAIN ANOMALY & EVENT DETECTION / CROSS-DOMAIN ANOMALY & EVENT CORRELATION

As the move to the digital world has never been more vital for Service Providers and Enterprises, data architecture plays a critical role in increasing the efficiency and data reliability of Digital Service Orchestration platforms. According to Forbes and other analysts, data creation is predicted to be about 60 times greater in 2025 than it was in 2010. Per-domain observability therefore becomes more critical; however, rule-based approaches to detecting security threats, forecasting device or software failures, or uncovering faults will not guarantee sufficiently effective results. More innovative approaches to anomaly detection are based on AI/ML algorithms that examine data abnormalities and improve over time. Anomaly detection is the process of recognizing non-compliant patterns using statistical, supervised, and artificially intelligent algorithms that automate the task of rapidly and efficiently identifying and locating them.
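At the statistical end of that spectrum, a very small example is a z-score check: a new measurement is flagged when it lies too many standard deviations away from recent history. The sketch below assumes a single metric with a roughly stable baseline; the threshold of 3 and the sample values are arbitrary choices for illustration, and ML-based detectors would replace this rule in practice.

```python
from statistics import mean, stdev


def zscore_anomalies(history: list[float], new_values: list[float],
                     threshold: float = 3.0) -> list[float]:
    """Return the new values lying more than `threshold` sigmas from the history mean."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two history points
    if sigma == 0:
        # Flat baseline: any deviation at all is unusual.
        return [value for value in new_values if value != mu]
    return [value for value in new_values if abs(value - mu) / sigma > threshold]


# Hypothetical latency history (ms) for one domain, then two fresh samples.
latency_history = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0]
fresh = [10.5, 30.0]

print(zscore_anomalies(latency_history, fresh))  # [30.0]
```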

In a similar vein, cross-domain event correlation automates the process of analyzing events across all service domains and establishing correlations between them to discover problems and elucidate their root cause. As organizations, Telcos included, produce massive amounts of data in a variety of formats, generated by servers, databases, virtual machines & containers, mobile devices, operating systems, and a plethora of other network components, they increasingly rely on AI/ML frameworks and technologies to analyze this data, not only to improve the uptime and performance of their systems and applications but also to enhance the customer service experience and strengthen their differentiation from the competition.

PREDICTIVE ANALYTICS

Predictive analytics uses historical data in conjunction with statistical modelling, data collection techniques, and machine learning to forecast future results. Corporations and Telcos are starting to use predictive analytics to identify trends in their data that indicate hazards, risks, and recurring risk patterns.

Predictive analytics is frequently associated with big data and data science: digital organizations today are flooded with data from a variety of sources, including transactional records, databases, equipment or network log files, photos, videos, and sensor data. To glean insights from this plethora of data, data scientists employ machine learning and often deep learning algorithms to forecast future occurrences such as traffic congestion and equipment breakdowns, as well as to detect odd data patterns.
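As a toy illustration of forecasting congestion from historical data, the sketch below fits a straight line to past utilization samples with ordinary least squares and estimates how many sampling intervals remain before a limit is crossed. The linear trend, the sample values, and the 90% limit are assumptions made for the example; production systems would use richer models.

```python
import math
from typing import Optional


def fit_line(ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(ys)  # needs at least two samples
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def intervals_until(ys: list[float], limit: float) -> Optional[int]:
    """Intervals after the last sample before the trend reaches `limit`."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None  # no upward trend, no crossing predicted
    crossing = (limit - intercept) / slope
    return max(0, math.ceil(crossing - (len(ys) - 1)))


# Hypothetical link utilization (%) sampled once per day.
utilization = [50.0, 55.0, 60.0, 65.0, 70.0]
print(intervals_until(utilization, limit=90.0))  # 4
```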

TRANSFER LEARNING

Transfer learning is a relatively new machine learning method that repurposes a model trained for one task so that it can be used for another, related task. Transfer learning can be used in many settings, including Open-RAN. Training a model from scratch on a multi-cell RAN scenario can be a lengthy and complex process due to the large amount of training data and other factors such as latency and complexity of the intended use cases.

Therefore, when the model is employed in a different data environment, transfer learning can be used to quickly carry existing knowledge over from one environment to another, for example when extending an Open RAN RIC deployment from one area to another.
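The benefit can be shown with a deliberately tiny model. In the sketch below, a one-variable linear model stands in for a RAN model: its parameters are assumed to have been learned in a first area, and they are then fine-tuned for a few gradient steps on data from a second, similar area. The same number of steps from a zero initialization serves as the "from scratch" baseline. All numbers are invented for the illustration.

```python
Data = list[tuple[float, float]]


def mse(w: float, b: float, data: Data) -> float:
    """Mean squared error of the model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w: float, b: float, data: Data, steps: int,
              lr: float = 0.1) -> tuple[float, float]:
    """A few steps of plain gradient descent starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Parameters assumed to come from a model already trained in area A.
source_w, source_b = 2.0, 1.0

# Area B behaves similarly but not identically (hypothetical data).
target_data = [(x, 2.2 * x + 0.8) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

warm = fine_tune(source_w, source_b, target_data, steps=3)  # transfer
cold = fine_tune(0.0, 0.0, target_data, steps=3)            # from scratch

warm_loss = mse(*warm, target_data)
cold_loss = mse(*cold, target_data)
print(f"transfer: {warm_loss:.4f}  scratch: {cold_loss:.4f}")
```

With the same small training budget, the model that starts from the source parameters ends much closer to the target behaviour, which is the effect transfer learning relies on.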

What are the requirements to reach a seamless E2E Orchestration?

End-to-End orchestration, as demonstrated throughout this article, is undoubtedly a sought-after feature for any multi-services Service Provider, leveraging a unique network core infrastructure and serving numerous service domains: wireline, wireless, access, and last mile technologies. As discussed in previous sections, Telcos must cope with hybridization of their Service Infrastructure, where service domains are not necessarily in lockstep on their paths of evolution, despite a strong desire to evolve towards Cloud-Native solutions and toolsets.

This is how network hybridization can be summarized. Some domains, such as RAN (Radio Access Networks), Core & Access Networks, are still primarily driven by hardware-attached functions (PNFs). Some other domains are more in the middle, with most of their functions being either VNF (Virtual Network Functions) attached to an NFVi (Network Function Virtualization Infrastructure) such as OpenStack or any derived infrastructure, or bare-metal implemented Network functions such as On-Premise Cloud and Managed services.

On the other end of this industrial spectrum, there is an intriguingly small number of service domains that have almost totally embraced Cloud-native principles and technologies and are rapidly progressing towards a unified management approach for xNF (Any Network Functions). This is the case for the 5GC (5G Core Network) and for Hyperscalers such as AWS, GCP, and Azure, but also Oracle Cloud, IBM, Equinix, etc., which have significantly influenced Telcos to embrace Cloud-Native specific provisioning and management capabilities.

EVOLUTION FROM TELCO-SPECIFIC TO CLOUD-NATIVE AND OPEN SOURCE INNOVATION

As Telcos/MNOs/CSPs/ILECs/CLECs (or whatever names we use to describe them) continue to evolve toward these new concepts, one of them has come to dominate the telecom industry entirely: Kubernetes, as a COE (Container Orchestration Engine), has simply forced all other market competitors out of the ring, much like a Sumo in a judoka-filled arena. The main advantages of Kubernetes were that it could orchestrate, manage, and scale Docker containers vertically and horizontally, but also that it could provide a credible and inherent alternative for Edge orchestration, avoiding a rather complex distributed approach to the end-to-end orchestration platform. As a result, orchestration platforms delegate some of their functions to Kubernetes-based functions.

Open-Source orchestration systems such as ONAP and ETSI Open-Source MANO, as well as commercial vendor orchestration platforms, have already proved that Kubernetes can be readily incorporated into their orchestration service umbrella. The Multi-VIM component from ONAP and other API-based interfaces have recently extended this notion to make Kubernetes the key engine for automated service and function deployment.

This document wouldn't be complete if I did not highlight the gradual transition to disaggregated networking solutions, or NoS (Network Operating System). A NoS is a disaggregated container-based software package that can be orchestrated by a Network Automation platform but can run on any whitebox platform rather than dedicated vendor hardware. Most operators are beginning to deploy overlay network services such as MPLS and Segment Routing (SR), EVPN, and SD-WAN, which are enabled by network automation southbound communication protocols such as NETCONF/RESTCONF and can run over a NoS, a space where numerous suppliers are already positioned (ADVA, Metaswitch now Microsoft, IP Infusion, DriveNets, etc.) alongside Open-Source variants such as DANOS.

Fig.7: From Telco-specifics to Cloud-Native evolution

In the context of monetizing 5G service infrastructure, MNOs are looking for ways to align not only their service infrastructure (which, as previously said, is made up of many service domains) but also their partners (Hyperscalers or Partner operators) with the use of industry standards and open-source solutions. Hence, the CNCF (Cloud Native Computing Foundation) de-facto standards have flooded the market not by surprise, but by necessity. The next generation of services and applications inherently prioritize flexibility and agility over robustness, making design for failure an essential practice. Such Cloud-Native apps might be disaggregated into various components and brought together utilizing Kubernetes and Service orchestration. The ability for an entire end-to-end service to be created, monitored, and scaled horizontally or vertically on the fly, as well as changed or terminated based on client demand, was also a fundamental component of Cloud-Native concepts.

So, why does it appear that now the Cloud-Native transition is unavoidable?

WHY DO OPERATORS EMBRACE CLOUD-NATIVE PRINCIPLES?

Numerous factors can account for this phenomenon, but three key ones stand out: independence from vendors, decreased operating expenses, and advanced service control, as well as flexibility and agility.

Independence from vendors

All long-standing operators who have been in the market for a given amount of time bring with them what we call legacy services or "baggage." This "baggage" is often services that are completely dependent on hardware and hence stuck in the previous reality of legacy ideas and toolsets where provisioning is frequently static and single-purpose appliances deliver the expected functionality. The irony is that while legacy-based services may be an attractive source of recurring revenue and thus difficult to abandon, the longer they function, the more they sustain custom hardware-based static solutions, thereby slowing critical change. As a solution, cloud-native evolution offers a natural and logical substitute for legacy/hardware-based services.

OpEx decrease

Any technology/hardware component is defined by its intrinsic costs, which are typically expressed as Capital and Operational expenses (CapEx/OpEx) and total cost of ownership (TCO). As the complexity of the technology (HW or SW) used to create services grows, so do the human resources, time, and money required to tame and monetize it. Maintaining different technology levels from one service domain to the next increases operational costs (OpEx) artificially; however, renouncing telco-based principles and adopting Cloud-native principles and toolsets would tightly align both Telcos and hyperscaler providers (AWS, GCP, or Azure) and would undoubtedly simplify Telcos' consumption of hyperscaler services and technologies.

Adopting cloud-based orchestration tools such as CloudFormation or Terraform would, in turn, naturally expand the options for using hyperscaler services or blending on-premises and hyperscaler technologies into hybrid services. This would not only broaden Telco services' footprint and exposure but would also accelerate the development of value-added services while keeping the cost per bit as low as feasible and reducing provisioning and operational costs. The secret would be in aligning both staff expertise and the technologies used throughout the Telco's domain and its service partners on cloud-centric offerings.

Advanced service control, flexibility and agility

As the complexity for service introduction and automated provisioning (orchestration) has never been higher, the requirement for a higher level of control for a speedier decision process, as well as more flexibility and agility on provisioned services, has never been greater.

Telcos will now be able to distribute the decision-making process for changing services not only from the top layer (E2E Orchestrator), but also to lower-layer functions such as a Kubernetes COE (Container Orchestration Engine) or any commercially adapted version, in order to control the fine-grained details of a set of Cloud-Native xNF functions packaged as an end-to-end service.

As a result, service control is inextricably linked to flexibility and agility, as end-to-end services and the xNF functions that constitute them may need to be scaled vertically or horizontally at will. Vertical scaling simply means that a function's initial hardware resource allocation (CPU, memory, or storage) is increased, but the functions remain on the same physical device. Scaling horizontally, on the other hand, allows for the replication of the initial resource allocation to other hardware resources within the same or a different geographic area.
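The difference between the two scaling modes can be captured in a short sketch. The `Instance` type and both functions are hypothetical and only model the bookkeeping: vertical scaling changes the resources of one instance on the same host, while horizontal scaling replicates the original allocation onto other hosts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """Hypothetical xNF instance with its resource allocation."""
    host: str
    cpu: int
    memory_gb: int


def scale_vertically(instance: Instance, factor: int = 2) -> Instance:
    """Same host, larger CPU and memory allocation."""
    return replace(instance, cpu=instance.cpu * factor,
                   memory_gb=instance.memory_gb * factor)


def scale_horizontally(instance: Instance, extra_hosts: list[str]) -> list[Instance]:
    """Same allocation, replicated onto additional hosts."""
    return [instance] + [replace(instance, host=host) for host in extra_hosts]


upf = Instance(host="edge-paris-1", cpu=4, memory_gb=8)

bigger = scale_vertically(upf)
replicas = scale_horizontally(upf, ["edge-paris-2", "edge-lyon-1"])

print(bigger)         # Instance(host='edge-paris-1', cpu=8, memory_gb=16)
print(len(replicas))  # 3
```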

The scaling of xNF functions may be applied differently depending on whether the function is a PNF, VNF, or CNF. In the case of a PNF or an Edge MEC device, a cloud infrastructure software stack is naturally required to expose hardware resources to a container-oriented architecture for edge or customer-based services. Good examples can be found in the Open-Source realm, such as StarlingX or MobiledgeX, as well as in the commercial realm, where multiple companies are already competing, e.g., AWS Outposts. On both sides of the fence, the same Cloud-Native concepts apply for service creation (Kubernetes orchestration), function provisioning (Kubernetes + Ansible/Terraform or CloudFormation), and network function monitoring (Kubernetes + Prometheus).

Can it all be achieved without a greenfield scenario?

The path to becoming fully Cloud-Native across all domains is a journey that is approached quite differently depending on whether a telco is a long-established market actor or a new market player. Recent entrants, such as Rakuten and Dish Networks, have shown how a smart strategy combined with Cloud-Native principles can provide market visibility and competitiveness. Both have built their strategies around minimizing hardware dependence, thereby maintaining their independence from vendors. This is accomplished by selecting vendors with a Cloud-Native strategy at their core for key strategic domains, such as RAN with Open RAN, xNF or disaggregated NOS in the Transport/Access and Core networking domains, and a 5G Core based solely on Cloud-Native, together with a close partnership with hyperscalers to ensure meaningful market presence and coverage. However, for networking services, the reliance on silicon-based hardware solutions remains significant for performance and scalability reasons, particularly for core/optical networking.

Long-established industry players can also undertake this Cloud-Native journey, with proper planning and an evaluation of which strategic domains demand transformation soonest. First, it makes sense to start with a domain that lends itself readily to Cloud-Native principles, which is most likely the 5G Core. Second, 5G spectrum represents a significant investment that mobile-centric carriers need to monetize as early and as precisely as feasible. The transition from NG-RAN to Open RAN opens up a range of Cloud-Native opportunities, most notably for orchestration, monitoring, and service assurance, but also for spectrum monetization. The third and most challenging part, which for some operators is also the primary source of monetization, is the network infrastructure, on which most operators rely heavily.

A few market examples demonstrate how this is being explored and deployed. Some new MNOs have undoubtedly benefited from a greenfield setting and a strong market and technology strategy. Indeed, market actors such as Dish Networks and Rakuten Mobile have largely bet on multi-vendor sourcing, open standards, hardware independence (notably using Open RAN), and a Cloud-Native and E2E service orchestration strategy, while focusing their monetization strategy primarily on their ability to rapidly capture mobile user market share.

However, they are not alone in the market; established operators have gradually shifted their focus to a multi-vendor strategy and Cloud-Native alignment, investing in areas such as the transition to Open RAN and the expansion of their service infrastructure with cloud providers, relying on Network Automation and end-to-end service orchestration. These operators are active in different regions: Telus, Bell Canada, AT&T, and Verizon in North America; Orange, Telefonica, and Deutsche Telekom in Europe.

How would E2E Orchestration improve Telcos’ service infrastructure monetization?

Interest in orchestration and automation has grown significantly, driven by the promise of new and diversified revenue streams associated with 5G. However, because the 5G service infrastructure is composed of several service domains (RAN, Transport/Access networks, and 5G Core), all domain resources and controllers must be coordinated and aligned under an End-to-End Service Orchestration umbrella in order to fully monetize it.

Finally, the alignment of domain resources such as RAN gNBs or Open RAN O-DU/O-CU, Core/Edge/Access Network optical channels and tunnels, 5G Core, and any virtualized or containerized resources enables service orchestration and cloud native monetization, which are critical for quickly and efficiently generating new revenue streams. Let's take a look at some of the future use cases where both Cloud-Native and E2E Service Orchestration are notably taken advantage of.

WHAT ARE THE SERVICE USE CASES OF INTEREST?

1st Future use case: Network Slicing

5G is frequently associated with Network Slicing and its natural applications, which include IoT (Internet of Things), enhanced mobile broadband (eMBB), and massive machine-type communication (mMTC), all of which enable novel use cases and revenue streams. These use cases have varying requirements for network performance (throughput, latency, jitter, packet loss, and application responsiveness), but also for the resources necessary to deliver them through efficient deployment and utilization of the 5G Service Infrastructure. Because the 5G mobile service infrastructure is as reliant on its RAN (Radio Access Network) as it is on its transport networks, resources must be partitioned and isolated for the exclusive benefit of a single client.

2nd Future use case: Network as a Service (NaaS) / Network Slicing as a Service (NSaaS)

Not all operators are MNOs (Mobile Network Operators); Network/Comms Service Providers (NSPs or CSPs) also plan to use their Network Automation transformation to capitalize on NaaS and NSaaS use cases. Through partnerships and ICI (interconnect interfaces) between service providers, Network as a Service (NaaS) enables MNOs or CSPs with a restricted footprint to expand their immediate coverage. As a result, a given MNO or CSP might use TM Forum APIs to request services and functions: Operator 1's Service Orchestrator would then request the instantiation of the network service from Operator 2's Network Automation function, based on specific performance criteria (bandwidth, jitter, packet loss, etc.). These criteria and any network issues might then be monitored and communicated back to Operator 1 via streaming technologies such as Kafka.
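
To make the role of the performance criteria concrete, here is a minimal Python sketch of how Operator 1 could compare a measurement reported by Operator 2 against what it requested. The field names, units, and thresholds are assumptions chosen for illustration; they are not taken from any TM Forum API schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PerformanceCriteria:
    """Targets Operator 1 attaches to its service request (hypothetical fields)."""
    min_bandwidth_mbps: float
    max_jitter_ms: float
    max_packet_loss_pct: float


def violations(criteria: PerformanceCriteria, measured: Dict[str, float]) -> List[str]:
    """Return the names of the criteria that one measurement sample breaks."""
    broken = []
    if measured["bandwidth_mbps"] < criteria.min_bandwidth_mbps:
        broken.append("bandwidth")
    if measured["jitter_ms"] > criteria.max_jitter_ms:
        broken.append("jitter")
    if measured["packet_loss_pct"] > criteria.max_packet_loss_pct:
        broken.append("packet_loss")
    return broken


# Illustrative values only.
requested = PerformanceCriteria(min_bandwidth_mbps=500,
                                max_jitter_ms=5,
                                max_packet_loss_pct=0.1)
sample = {"bandwidth_mbps": 620.0, "jitter_ms": 7.5, "packet_loss_pct": 0.02}

print(violations(requested, sample))  # prints ['jitter']
```

In this example only the jitter target is missed, which is the kind of event that would be reported back to Operator 1.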

MNOs with a restricted network footprint, particularly in the last mile, that seek to provide network slicing use cases are constrained by their reliance on network infrastructure, which may limit their options and monetization opportunities. Extending networking capabilities may be prohibitively expensive. As a result, they may have two options:

Option 1: MNOs leverage MOCN (Multi-Operator Core Networks) to enable active collaboration between two MNOs on 4G/5G services via Network Slicing Management functionalities.

Option 2: MNOs exploit the core and last mile fibre infrastructure of Network/Comms Service Providers by utilizing Network Slicing Management Functions between the MNO and the CSP.

For Network Slicing Option 1, MNO 1's end-to-end orchestration instantiates the end-to-end Network Slice via the NSMF (Network Slice Management Function), which relies on per-domain NSSMFs (Network Slice Subnet Management Functions). On the MNO 1 side, the NSMF, C-NSSMF (Core NSSMF), and T-NSSMF (Transport NSSMF) are used to instantiate the Network Slice's main component. On the MNO 2 side, the NSMF receives both the order API call and the instantiation request from MNO 1. MNO 2's NSMF then connects with both its R-NSSMF (RAN NSSMF) and T-NSSMF to complete the Network Slice by stitching together MNO 1's Core and Transport domain slice subnets with MNO 2's RAN and Transport domain slice subnets.

For Network Slicing Option 2, MNO 1's end-to-end orchestration instantiates the end-to-end Network Slice via the NSMF (Network Slice Management Function), which relies on per-domain NSSMFs (Network Slice Subnet Management Functions). On the MNO 1 side, the NSMF, C-NSSMF (Core NSSMF), and R-NSSMF (RAN NSSMF) are used to instantiate the Network Slice's main component. On the CSP's side, the NSMF receives both the order API call and the instantiation request from MNO 1. The CSP's Network Automation NSMF then talks to its T-NSSMF to complete the Network Slice by stitching together MNO 1's Core and RAN domain slice subnets with the CSP's Transport domain slice subnet, which provides network connectivity to MNO 1's RAN domain subnet.
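
The split of responsibilities in the two options can be summarized as data. The following Python sketch simply encodes which party's NSSMF handles which domain subnet, as described above, and checks that the stitched slice covers every domain. The dictionary layout and function names are illustrative assumptions, not a 3GPP data model.

```python
from typing import Dict, List, Tuple

# Which party instantiates which slice subnet, per the two options in the text.
SLICE_PLANS: Dict[str, Dict[str, List[str]]] = {
    "option_1": {"MNO 1": ["core", "transport"], "MNO 2": ["ran", "transport"]},
    "option_2": {"MNO 1": ["core", "ran"], "CSP": ["transport"]},
}

# Per-domain subnet management function used for each domain.
NSSMF_BY_DOMAIN = {"core": "C-NSSMF", "ran": "R-NSSMF", "transport": "T-NSSMF"}


def instantiation_steps(option: str) -> List[Tuple[str, str, str]]:
    """List the (party, NSSMF, domain) requests each party's NSMF would issue."""
    plan = SLICE_PLANS[option]
    return [(party, NSSMF_BY_DOMAIN[domain], domain)
            for party, domains in plan.items()
            for domain in domains]


def is_end_to_end(option: str) -> bool:
    """A stitched slice must cover the RAN, transport, and core domains."""
    covered = {domain for _, _, domain in instantiation_steps(option)}
    return covered == set(NSSMF_BY_DOMAIN)


for step in instantiation_steps("option_2"):
    print(step)
# ('MNO 1', 'C-NSSMF', 'core')
# ('MNO 1', 'R-NSSMF', 'ran')
# ('CSP', 'T-NSSMF', 'transport')
```

Option 1 yields four subnet requests because both MNOs contribute a transport subnet, whereas Option 2 yields three, with the CSP contributing transport only.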

A frequent stumbling block when offering network slicing as a service between two operators is the requirement for both parties to share telemetry data. Typically, two mobile operators that share their RANs will rely on both their RAN and network domains. It is therefore critical for each organization to provide pertinent information about the health of its infrastructure, particularly key performance indicators and performance measures (latency, throughput, packet loss, jitter, and degraded-performance alerts). Once the data to be shared has been agreed upon, the manner in which it will be transmitted is also an important consideration. The traditional method of transmitting a "full bucket" of data via SFTP (Secure FTP) is clearly obsolete. Streaming approaches such as Kafka are favored nowadays, either over the secure communication provided by the NNI (Network-to-Network Interface) between the two operators or by allowing Kafka streaming data to be transmitted alongside it. Another option receiving some interest is to use a shared hyperscaler environment in which the two operators coexist and securely exchange telemetry data, protected by the hyperscaler's security policies and solutions.
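
The contrast between a "full bucket" transfer and streaming can be illustrated with a short Python sketch in which each KPI sample is published as soon as it is available, and a degraded-performance alert is raised immediately when a threshold is crossed. The `publish` callback stands in for a real streaming client and is purely hypothetical, as are the topic names and the latency threshold.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def stream_kpis(samples: Iterable[Dict[str, float]],
                publish: Callable[[str, bytes], None],
                latency_alert_ms: float = 20.0) -> int:
    """Publish each KPI sample individually; return the number of alerts raised."""
    alerts = 0
    for sample in samples:
        payload = json.dumps(sample, sort_keys=True).encode("utf-8")
        publish("slice.kpi", payload)
        if sample["latency_ms"] > latency_alert_ms:
            alert = {"type": "degraded_performance",
                     "latency_ms": sample["latency_ms"]}
            publish("slice.alerts",
                    json.dumps(alert, sort_keys=True).encode("utf-8"))
            alerts += 1
    return alerts


# Stand-in for a streaming client: it only records what would be sent.
sent: List[Tuple[str, bytes]] = []

samples = [
    {"latency_ms": 12.0, "throughput_mbps": 480.0, "packet_loss_pct": 0.01, "jitter_ms": 1.2},
    {"latency_ms": 35.0, "throughput_mbps": 210.0, "packet_loss_pct": 0.40, "jitter_ms": 6.8},
]
alert_count = stream_kpis(samples, lambda topic, data: sent.append((topic, data)))
```

With these two samples, three messages are produced: two KPI records and one alert for the second sample. In a real deployment the callback would be replaced by the producer of whichever streaming platform the two operators agree on.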

Future Service Models: Gaming/CDN as a Service

Gaming and CDN (Content-Delivery Networks) are global businesses that have seen a dramatic increase in opportunities and new revenue streams since the introduction of 4G/LTE and increased fibre penetration (FTTx), a trend that will be accelerated further by 5G SA (Stand-Alone) implementations combined with the latest WiFi innovations (802.11ax for WiFi6 & 802.11be for WiFi7). Cloud/mobile gaming and content delivery networks share a set of core characteristics, including the following:

1) High-speed downloads. Highly recommended for game file updates or the addition of new CDN or gaming content.

2) Scalability. Capacity is constantly being challenged and adjusted to meet the needs of an expanding player base, ensuring player loyalty and a high-quality experience in order to attract even more new players or viewers. Often, the easiest way to manage high-speed scalability is to distribute CDN or gaming content throughout the service infrastructure's edge locations to accommodate fluctuating user demand.

3) Network availability. A popular film/TV series or a newly released game might generate a sudden and massive increase in players/viewers. A service infrastructure of this type should be dimensioned for and capable of supporting such rising demand. The most effective strategy to ensure high-speed scalability and availability is to provide a fault-tolerant infrastructure and to coordinate the use of Cloud-Native technologies with an edge-based distributed architecture. A distributed architecture combined with Cloud-Native technologies and monitoring principles ensures optimal resource management: gaming and CDN apps and functions distributed across the Central Office and Edge (up to the far edge alongside the O-DU) fundamentally reduce hardware impacts, thereby reducing both OPex and CAPex, and also alleviate the pressure of users converging erratically on content/games.

4) Proactive monitoring & Network support. It is critical for modern real-time games, competitive gaming, and popular content services delivered over CDNs (Netflix, Prime Video, or Disney+) to avoid any lags or delays that might interrupt the comfort and pleasure expected by players or viewers. Failure to provide an excellent experience results in customer/user loss. To avoid this, the most powerful hardware must be combined with the most dependable network infrastructure, with both proactively monitored for glitches and issues by a Multi-Layer Service Assurance solution with cross-domain capabilities.

Conclusion

If we had to define Service Orchestration, we could say that it is the centralized coordination of IT services, functions, and resources across domains and layers in order to monetize end-to-end IT and business capabilities and processes.

The End-to-End Service Orchestration platform, also known as an Orchestrator or Orchestrators, is located at the top of the IT stack; its abstraction layers hide the complexity of individual systems and domain contexts. This not only simplifies process administration and monitoring of all xNF resources, but also streamlines end-to-end services and processes from a centralized point of management and control based on a model-driven approach rather than custom scripting or workflows.

End-to-End Service Orchestration is critical because legacy IT procedures manage platforms, systems, and environments that are built on disparate technologies from disparate suppliers. As a result, these technologies and systems are administered in silos by employees with specialized but narrow skill sets. These specialists are thus forced to research, build, and test customized scripts to manage cross-silo processes. Such scripts (workflows) frequently result in lengthy and error-prone processes: they rely on several manual triggers and handoffs that are difficult to manage, and they make it impossible to access real-time data or meet high-level user expectations.

Service orchestration platforms are envisioned and developed to integrate the administration and organization of diverse processes and systems across on-premises, cloud, and hybrid environments, all while adhering to Cloud-Native concepts and technologies. By providing a single platform (end-to-end orchestration platform) for establishing end-to-end processes referred to as services, IT teams can now centralize management and monitoring of those processes and resources while distributing them closer to end users. As new use cases and monetization require, real-time monitoring and self-remediation technologies built on an expanded AI/ML framework can be used to anticipate and minimize service-impacting delays and failures, thereby protecting monetization and upselling opportunities.

The benefits of end-to-end service orchestration for service infrastructure monetization are manifold: faster time to market, accelerated service development to create new revenue streams, and the use of service assurance to improve customer loyalty and market differentiation. There are also other benefits that will significantly boost operators' margins on a per-orchestrated-service basis:

Improved SLAs: By immediately identifying the root cause of delays, overruns, underruns, or failures, a unified platform for orchestration and management (orchestration + service assurance) can be used to better forecast issues and take preventive actions.
Rapid DevOps: A robust and adaptable continuous integration/continuous delivery strategy enables service designers to create more dependable services and end-to-end testing, delivery, and deployment processes.
CI/CD and LCM (Life Cycle Management): CI/CD (Continuous Integration / Continuous Delivery & Deployment) is a methodology that automates the process of frequently delivering software to clients. It is part of the overall DevOps framework and provides automated pipelines. When implemented end-to-end, including vendors and clients, CI/CD alleviates integration challenges, improves quality, and speeds up delivery by introducing tools that automate and monitor the overall lifecycle of applications and other software components, from the integration and testing phases to delivery and deployment. On the client side, it typically supports instantiation, configuration, different levels and types of testing, deployment, and rollback as needed. CI/CD uses services such as AWS CodePipeline, configuration management tools such as Ansible, Chef, or Puppet, pipeline management tools such as Jenkins or GitLab, and repositories such as Harbor.
Service instantiation & simplicity: Service Orchestration simplifies IT infrastructures. This minimizes the time required to develop and deploy services and improves recovery capabilities, monitoring, and capacity management.
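
The client-side CI/CD behavior described in the list above (ordered stages with rollback as needed) can be sketched generically in Python. This is a toy model under stated assumptions: the stage names, the simulated failing test, and the version strings are hypothetical, and the sketch does not use the API of any of the tools named above.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order; on the first failure, roll back and stop."""
    log: List[str] = []
    for name, action in stages:
        if action():
            log.append(f"{name}: ok")
        else:
            log.append(f"{name}: failed")
            rollback()
            log.append("rollback: done")
            break
    return log


# Hypothetical deployment state tracked by the pipeline.
state: Dict[str, str] = {"deployed_version": "1.0"}


def deploy() -> bool:
    state["deployed_version"] = "1.1"
    return True


def smoke_test() -> bool:
    return False  # simulate a failing post-deployment test


def restore_previous() -> None:
    state["deployed_version"] = "1.0"


log = run_pipeline(
    [("instantiate", lambda: True),
     ("configure", lambda: True),
     ("deploy", deploy),
     ("test", smoke_test)],
    rollback=restore_previous,
)
```

Because the simulated test fails after deployment, the pipeline stops, the rollback restores version 1.0, and `log` records each step in order.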

Finally, Service Orchestration improves efficiency and significantly reduces costs by leveraging universal standards and API accessibility. Additionally, by abstracting service tiers, an End-to-End Service Orchestration platform automates more and requires far less manual management. Simply put, this means that operators' employees may devote more attention to higher-value service-related activities and initiatives, as well as innovative new business services and models that leverage the best of digital service infrastructure transformation.

Written by Luc-Yves Pagal Vinette @ Pagal Vinette Consulting with the collaboration of Vassilka Kirova @ Nokia Bell Labs Consulting