
True Telco AI Architecture: Beyond the GPU Bubble

  • Author: Luc-Yves Pagal Vinette
  • 4 days ago
  • 15 min read

Introduction


The AI landscape is currently awash in hype, but the evidence tells a different story. In the race to monetize a pseudo "Artificial Intelligence," we see staggering valuations and impressive commitments, such as the reported $1.15 trillion in foreseen commitments involving OpenAI, one of the most notable protagonists of this market. This has created a financial rumble that increasingly displays the typical traits of an AI bubble. These financial movements defy logical explanation, especially when measured against the fundamental lack of genuine, broad-based enterprise innovation and applicability.


Although Large Language Models (LLMs) and Generative AI (GenAI) have demonstrated undeniable value as advanced analytical and search engines, these functions have not translated into the fundamental, self-sustaining intelligence required by mission-critical verticals and industries. For Telcos, a sector defined by massive capital expenditure, extreme operational and siloed complexity, and a North Star of real-time service quality, the current AI paradigm carries a major flaw: its architecture was built for content, not for core network decision-making.


The current AI model is trapped in an infinite cycle of training and inference loops, tethered to power-hungry, GPU-centered hardware. GPUs do not lend themselves to economies of scale and are certainly not suited to the dynamic edge-computing demands of the modern telecom era. Worse, this approach promotes a business "battlefield" where domain-specific model dominance prevails, threatening the end-to-end service notion that customers rely on and largely demand. Building Agentic AI on such shaky foundations merely adds to the pile of complexity rather than eliminating it.


[Image: A dramatic close-up of a liquid-cooled GPU rack with visible heat plumes, contrasted sharply with a very small, simple, brain-like processor chip]

This article challenges the current AI narrative. The industry is pursuing a technologically negative path that will ultimately fail to rise above today's inefficiencies and lack of new revenue streams. AI should be a particle accelerator for the transition from Telco to Techco, and should give enterprises AI-driven operational tools that boost productivity, not to negate human capabilities but to enhance them. The industry must shift its axis away from the current Generative hype and embrace a true Telco AI vision: one defined by architectural resilience, cross-domain collaboration, and the capacity for true autonomy and neurologically inspired learning.


Therefore, we will explore this paradox in three parts:


Part 1 - The Architectural GenAI Dead Ends: Exploring the technical limitations of LLMs/GenAI, the economic barriers of the GPU-based model, and the inherent flaws of Agentic AI.


Part 2 - The Operational Nonsense: Questioning the major governance gaps, the data sovereignty risks, and the inability to reach true end-to-end service alignment in a multi-vendor context.


Part 3 - The True Telco AI Vision: Proposing a path forward that leverages a Neuromorphic AI approach to deliver the sub-millisecond, cross-domain, true decision-making capability essential for the future of networks.


Part 1 - The Architectural Dead Ends: The Limitations of LLMs, GenAI and the False Promises of Agentic AI


The current AI landscape is defined by its reliance on Large Language Models (LLMs) and Generative AI (GenAI), which, despite their complexity and sophistication, exhibit profound architectural limitations that make them ineffective for the core operational requirements of modern Telco and enterprise networks.


1.1 The Training/Inference Trap: Pattern Matching, Not True Intelligence

The acronym AI stands for Artificial Intelligence, which implies a system able to learn by itself, mimicking the functional plasticity of the human brain. Current LLMs and GenAI models are static learning machines, trapped in infinite training/inference loops and optimizing for pattern and text prediction over massive but fixed datasets, resulting in:


  • Fixed Knowledge Base: An LLM's knowledge is statically limited to its training cut-off date. It cannot learn on its own from the continuous flow of real-time data that defines any network domain.



  • Pervasive Human Bias: These massive training sets inevitably imbue the systems with human biases, a major flaw in critical operational domains where impartiality is of utmost importance for machine-driven network governance.


As you can probably see where I am going with this, the current approach to AI cannot be qualified as genuine intelligence with an inherent ability to learn by itself. Rather, I would qualify it as an advanced analytical search engine, incapable of adapting to the rare or highly unusual network events that make true operational training a matter of experience.


1.2 A Scalability Crisis: The GPU-Centric, Power-Vampiric Model

The main hardware component supporting the current AI boom, the Graphics Processing Unit (GPU), is to me the main impediment behind the AI scalability crisis, particularly in the Telco context, though the same applies to enterprises.


  • Economic Struggle at the Edge: Despite their power, GPUs are inherently power-vampiric and cost-prohibitive for distributed, large-scale deployment. Telcos around the world have invested heavily in Telco cloud infrastructure and partnered with hyperscalers to turn the Edge, whether hybrid or public, into valuable real estate for new service creation.


    • However, deploying massive GPU clusters, today branded as AI factories, across hundreds of geographically dispersed metro or cell-site locations purposefully built for AI-RAN would result in a massive economic drain that would objectively dampen any innovative service creation.



  • Energy Consumption & Thermal Management: GPU-based datacenters generate an impressive amount of heat at scale. High-density AI racks can draw between 60 kW and 120 kW of power, necessitating intensive and expensive cooling infrastructure (liquid or advanced airflow). Such massive power requirements result in a poor Power Usage Effectiveness (PUE), directly conflicting with Telco and enterprise sustainability goals and the balance between revenue and operational costs.
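As a back-of-the-envelope sketch, the PUE arithmetic can be written out directly. The rack load and PUE values below are illustrative assumptions, not measured figures.

```python
# Rough sketch of the PUE arithmetic for a high-density AI rack.
# All figures are illustrative assumptions, not measured values.

def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility draw implied by an IT load and a PUE ratio.

    PUE = total facility power / IT equipment power, so anything
    above 1.0 is overhead (mostly cooling and power distribution).
    """
    return it_load_kw * pue

rack_it_load_kw = 100.0  # within the 60-120 kW range cited above
poor_pue = 1.8           # conventional air-cooled facility (assumed)

overhead = facility_power_kw(rack_it_load_kw, poor_pue) - rack_it_load_kw
print(f"Facility draw at PUE {poor_pue}: "
      f"{facility_power_kw(rack_it_load_kw, poor_pue):.0f} kW")
print(f"Cooling/distribution overhead per rack: {overhead:.0f} kW")
```

Under these assumptions, nearly an extra rack's worth of power is burned per rack just to keep the GPUs cool, which is the hidden multiplier behind the sustainability argument.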


1.3 Token Management and Latency Overhead

LLM architectures rely on token management to define and process context. This mechanism brings two important operational deficiencies for Telco/Techco tasks:


  1. Resource Over-Consumption: LLMs struggle to free up the compute resources consumed once a token has been used to serve a request or prompt. This drives over-consumption of pricey GPU resources and exaggerates the architectural inefficiencies of these platforms.


  2. Latency: Processing extensive context windows and managing tokens introduce network and processing delays that would prove unacceptable for the sub-millisecond, real-time decision-making required by advanced capabilities such as intent-based orchestration, dynamic network slicing, multi-domain anomaly detection and multi-layer security observability.
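To see why token-by-token generation conflicts with a sub-millisecond budget, here is a toy latency model; the per-token decode time and round-trip figures are assumptions for illustration only, not benchmarks of any specific model.

```python
# Toy model of why autoregressive generation cannot meet a sub-millisecond
# control-loop budget. Per-token latency and RTT are assumed values.

def response_latency_ms(output_tokens: int, per_token_ms: float,
                        network_rtt_ms: float) -> float:
    """Autoregressive decoding emits tokens sequentially, so response
    latency grows linearly with the length of the generated answer."""
    return network_rtt_ms + output_tokens * per_token_ms

budget_ms = 1.0  # sub-millisecond target for real-time network decisions
latency = response_latency_ms(output_tokens=50, per_token_ms=20.0,
                              network_rtt_ms=30.0)
print(f"Estimated LLM response: {latency:.0f} ms vs. a {budget_ms} ms budget")
```

Even with generous assumptions, the answer arrives three orders of magnitude too late for the control loops listed above.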


1.4 The Illusion of Agentic AI

Agentic AI should be seen as a natural extension of LLMs: autonomous systems meant to orchestrate actions by calling LLMs via external APIs. It is being positioned as THE solution for operational automation. However, this approach is fundamentally constrained by the same limitations highlighted above for LLMs and, more importantly, it adds a significant layer of complexity on top of the LLMs it is built upon.


  • Fragile Orchestration: Agentic AI requires the generalist and cumbersome LLM to be implemented in highly specific operational domains. This demands extensive and complex engineering for every domain, especially if a cross- or multi-domain context is intended.


  • Lack of Domain Visibility: By design, LLMs lack the inherent structure for cross-domain knowledge. Issues affecting one network domain, such as the RAN, generate symptoms in others (5G Core, Cloud or IT infrastructure). Because current LLM or Agentic systems do not factor in cross-AI or cross-vendor communications, agents are inevitably limited to localized perspectives that can never scale up to end-to-end services.


Requiring an LLM to learn and manage operational logic on the fly, a function it was never designed for, turns Agentic AI into an inefficient and expensive automation layer built on a shaky house of cards rather than a truly malleable learning engine.


Part 2 - The Operational Misalignment: Questioning AI Governance and End-to-End Alignment


As established in the previous chapter, the current generation of AI is architecturally flawed for telecom and enterprise applications. Its implementation also raises equal concerns regarding data governance, security and holistic operational integrity. Moreover, our industry's pursuit of market dominance rather than cross-platform collaboration creates a fundamental misalignment with the Telco/enterprise mandate for secure, seamless, end-to-end service delivery in multi-vendor environments.


2.1 The Data Governance and Security Conundrum

LLM and GenAI systems revolve around the continuous update of AI applications. This cycle is driven by the infinite training/inference loop, which poses a direct threat to robust data governance. To deliver operational relevance, especially for tasks touching the OSS/BSS or networking service layers, these models require access to sensitive Telco/enterprise data and detailed operational logs.


  • Geographical and Regulatory Constraints: Training such systems (LLMs/Agentic AI) for specific customer domain environments requires that sensitive data be used within strict regulatory boundaries (under the GDPR, data must remain in the EU or be properly protected; under the CCPA, consumers have the right to opt out or request deletion). This demands the utmost security and guarantees around data localization and safeguarding. Yet the current centralized training paradigm often requires data aggregation from various sources, raising serious questions: Where will the customer data used for training be stored? How can security and sovereignty be guaranteed during training and inference transfers? A decentralized or third-party training or inference model would raise even more significant legal risks for any Telco/Techco or enterprise.


  • The Additional Risks of Federated/Decentralized Learning: There is also a possibility that customer-specific training data could flow through a third-party partner, either for federated training (notably in cross-operator partnerships such as MOCN or MORAN) or in decentralized approaches involving hyperscaler partners. In such cases, training should occur locally, behind the operator's firewall, rather than risking transport or aggregation into a large public or hybrid cloud environment.


2.2 The Siloed-Optimization Trap: The Business "Game of Thrones"

Modern Telcos are currently evolving towards a Techco model. Such a shift naturally relies on a multi-vendor construct for both hardware and software, across the RAN, the Transport domain, the 5G Core and the Cloud infrastructure, with a dual-sided view of Telco Cloud and hyperscaler partnership(s). In such an environment, vendors supply domain-specific AI solutions that are totally incompatible with one another, each vendor focusing on achieving domain dominance and optimizing localized Key Performance Indicators (KPIs). This has serious implications:


  • The Absence of Cross-AI Communications: Current AI systems are mostly designed to operate in silos, preventing any cross-domain or cross-vendor AI communication. The resulting lack of interoperability is a critical structural flaw.


  • Impact on the End-to-End (E2E) Service Notion: AI decisions taken in one domain propagate to others. For instance, AI-RAN may take aggressive actions to optimize power usage that are locally beneficial but degrade the end-to-end service by worsening latency, jitter, availability or bandwidth as the service propagates through the transport network and the 5G Core.


[Image: A city skyline of tall, separate towers (the vendor silos), connected by fragile, crumbling bridges (the missing cross-AI communications)]

  • The Requirement for Alignment: To ensure optimal customer experience, all individual domain decisions must align into a set of cross-domain decisions that benefit the holistic E2E customer service. Without an active, structured way for different AI domains or AI vendors to communicate and exchange knowledge, Telcos and Techcos are left with siloed efficiencies that may benefit the local domain but degrade the overall quality experienced by the end customer, negating any possibility of creating a new generation of services. The AI systems and their vendors work hard for individual domain dominance, but NOT for collaborative E2E alignment.
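A toy calculation makes the trap concrete: a locally beneficial RAN action can still break the end-to-end budget. All domains, contributions and thresholds below are invented for illustration.

```python
# Toy illustration of the siloed-optimization trap: a locally "good"
# AI-RAN power-saving action still breaks the end-to-end latency budget.
# All numbers are invented for illustration.

e2e_latency_budget_ms = 10.0

# Per-domain latency contributions before the local optimization (ms).
latency_ms = {"RAN": 4.0, "Transport": 3.0, "5G Core": 2.0}

def e2e_latency(contributions: dict[str, float]) -> float:
    """E2E latency is additive across the domains on the service path."""
    return sum(contributions.values())

print(f"Before: {e2e_latency(latency_ms):.1f} ms "
      f"(budget {e2e_latency_budget_ms} ms)")

# AI-RAN sleeps cells to save power; the local KPI improves,
# but RAN latency rises as a side effect.
latency_ms["RAN"] = 6.5
after = e2e_latency(latency_ms)
print(f"After local RAN power optimization: {after:.1f} ms -> budget "
      f"{'OK' if after <= e2e_latency_budget_ms else 'VIOLATED'}")
```

No single domain sees a problem in its own dashboard; only an E2E view reveals the violated budget, which is exactly the alignment argument above.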


2.3 The Energy and Thermal Management Misfit

Resource consumption highlights a fundamental architectural mistake that directly impacts the infrastructure supporting the AI foundations.


  • PUE Degradation: Basing current AI systems on heavy-duty GPUs creates serious challenges. As highlighted earlier in this document, their high thermal output requires intense cooling capabilities (liquid or airflow), leading to significant indirect power consumption.


  • The Sustainability or Green Barrier: For years, global corporations have had a clear mandate for sustainability, yet such an obscene, GPU-based approach stands in direct contradiction with green objectives, placing a significant operational-expenditure barrier on Telcos/Techcos/enterprises looking to scale their AI capabilities. Energy consumption and thermal management should not be underestimated: they are fundamental flaws in the current AI construct that jeopardize both the economic viability and the green responsibility of future network services.


As a point of comparison, the human brain consumes about 20 W of power, roughly 20% of the energy generated by the human body, although the brain is estimated to be only 2% of the body's mass.


AI therefore sits at the confluence of regulatory risk, architectural inflexibility and siloed operational fragmentation, making current LLM/GenAI and derivative models such as Agentic AI merely suboptimal for Telco/Techco and enterprise operations. They represent a fundamental operational misalignment that must be corrected before true, economically viable and scalable network intelligence can be achieved and sustained.


Part 3 - A True Telco/Techco Vision: Architecting a Neuromorphic AI Future


While preparing this paper, I realized that its most important part would be the transition from criticism to a constructive vision: one that defines a practical path forward, detailing the precise areas where a true "Telco AI" can deliver value and, most importantly, the radical architectural change needed to reach it.


This "Telco AI" is a model defined by a holistic architectural vision in which true intelligence is natively integrated across the entire Telco/Techco service infrastructure. It is not a mere addition but a truly transformative approach across four key domains: the Network, Operations, Assurance and the underlying AI communication fabric. I have deliberately left Security aside, as it is already largely covered in the industry through Cybersecurity, SD-WAN and SASE.


3.1 Network with Telco AI: Accelerating Wireline-Wireless Network Convergence


Network AI is an attempt to apply intelligence directly within the service infrastructure to achieve unprecedented convergence, efficiency and operational gains.


  • AI-RAN and Wireline-Wireless Convergence


  • AI-RAN (I wrote a more detailed paper about it, so I won't go in-depth here) fundamentally accelerates the transformation of the wireless service infrastructure, enabling a truly disaggregated Radio Access Network while leveraging AI natively for predictive capacity planning and energy optimization at the edge.


  • This acceleration is now naturally extending to the wireline infrastructure via the 5G Access Gateway Function (AGF) and the standards provided by the Broadband Forum (BBF), which define how wireline access networks integrate with the 5G Core.


  • The need for wireless-wireline convergence is dictated by the goal of a seamless, consistent user experience under a common umbrella of IT capabilities, such as authentication, orchestration and fulfillment, regardless of the access technology used (fiber, copper, radio or satellite).


  • Market Alignment: Nvidia's strategic investment in Nokia, while mostly focused on GPU-based AI-RAN (another insanity), underpins the industry's belief in a truly AI-native 5G/6G wireless era that will not restrict itself to the RAN. The challenge still resides in achieving truly adaptive and efficient intelligence, not just a new form of GPU centralization.


  • API Exposure and Monetization: Monetizing network capabilities via API exposure has been a decades-long challenge. Recent advances have expanded these capabilities with the exposure of network features (GSMA/CAMARA) or service capabilities (MEF, now Mplify). In this context, a true Telco/Techco AI could accelerate adoption by managing the API translation layer:

    • Telco AI can manage the translation between external (northbound) and internal (southbound) APIs, and handle the complex authentication, encryption and crucial token management required for seamless API consumption, security and monitoring.
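As a hypothetical sketch of that translation role, the snippet below maps an invented northbound QoS request onto an invented southbound call, validating the access token first. None of the endpoint names, fields or priority values are real CAMARA or 3GPP definitions; they are placeholders for illustration.

```python
# Hypothetical sketch of an API translation layer: authenticate the
# caller, then map an external (northbound) request onto an internal
# (southbound) call. All names and values here are invented.

import time

VALID_TOKENS = {"demo-token": time.time() + 3600}  # token -> expiry (epoch s)

def authenticate(token: str) -> bool:
    """Reject missing, unknown, or expired tokens before translating."""
    expiry = VALID_TOKENS.get(token)
    return expiry is not None and expiry > time.time()

def translate_northbound(request: dict) -> dict:
    """Map an external QoS request onto an internal southbound call."""
    if not authenticate(request.get("token", "")):
        return {"status": 401, "error": "invalid or expired token"}
    priority = {"low-latency": 5, "best-effort": 9}[request["qos_profile"]]
    return {"status": 200,
            "southbound_call": {"api": "internal.qos.v1.SetFlowPriority",
                                "subscriber": request["device_id"],
                                "priority": priority}}

resp = translate_northbound({"token": "demo-token",
                             "device_id": "subscriber-001",
                             "qos_profile": "low-latency"})
print(resp["status"], resp["southbound_call"]["api"])
```

The point of the sketch is the shape of the layer, not the mappings: one place where authentication, translation and (eventually) token lifecycle management converge.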


3.2 OSS & BSS with Telco AI: Orchestrating the End-to-End Service

The Operational and Business Support Systems (OSS/BSS) are the layers where the complexity of the multi-domain challenge is most apparent. AI is essential to achieve true multi-domain automation and assurance.


  • Multi-Domain Service Orchestration (MDSO): Orchestrating services across multiple entangled domains, such as RAN, Transport, 5G Core, IT and Cloud, was the main failure of traditional, siloed orchestration. To achieve full automation, an AI-driven MDSO requires:

    • AI-based access to an advanced, multi-layer, real-time inventory

    • A robust Open- and Closed-Loop capability between Service Orchestration (action) and Service Assurance (observability). Telco AI observes and validates the crucial KPIs that demand orchestrated actions, either validated by human decision-making (Open Loop) or partially or fully automated by AI (Closed Loop). The Closed Loop is an essential component of intent-driven orchestration, with continuous monitoring of deployed functions or services such as Network Slicing associated with 5G network functions.


  • Multi-Domain Service Assurance and Single Pane of Glass (SPOG):

    • Multi-Domain Service Assurance (MDSA) is crucial for extracting data from all relevant domains (RAN, Transport, 5G Core, Cloud).

    • The extracted data is used to generate key performance indicators (KPIs), which serve two purposes. First, a unified dashboard where all domain KPIs are visualized through a Single Pane of Glass. Second, a multi-tenant platform offering customized, AI-powered per-customer dashboards, allowing Service Assurance capabilities to be fully monetized.

    • It is relatively clear that without Telco AI, generating a multi-layered, customized KPI stream in real time would be practically impossible.


  • BSS (Billing & Charging) with Telco AI: Billing remains a critical aspect of monetization, especially as monetization rapidly shifts towards API consumption and complex service slices or end-to-end services. Telco AI could help guarantee accuracy by managing the entire transaction lifecycle to ensure:

    • That the right APIs are invoked towards the BSS layer and charged accordingly

    • That the duration, volume and number of API transactions consumed are well captured and correctly reflected in the billing record.
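A minimal sketch of that capture-and-charge lifecycle, with invented API names and rating values, could look like the following; the rating scheme is an assumption for illustration, not a proposed charging model.

```python
# Hedged sketch of the billing-accuracy role described above: capture each
# API transaction (which API, how long, how much data) and aggregate it
# into a billing record. API names and rates are invented.

from dataclasses import dataclass, field

@dataclass
class Transaction:
    api: str
    duration_s: float
    volume_mb: float

@dataclass
class BillingRecord:
    customer: str
    transactions: list[Transaction] = field(default_factory=list)

    def capture(self, txn: Transaction) -> None:
        """Record one API transaction against this customer."""
        self.transactions.append(txn)

    def charge(self, rate_per_call: float = 0.01,
               rate_per_mb: float = 0.002) -> float:
        """Charge = per-call fee + volume fee, summed over all calls."""
        return sum(rate_per_call + t.volume_mb * rate_per_mb
                   for t in self.transactions)

record = BillingRecord("enterprise-A")
record.capture(Transaction("qod.v1.createSession", 1.2, 0.5))
record.capture(Transaction("location.v1.verify", 0.3, 0.1))
print(len(record.transactions), round(record.charge(), 4))
```

The design point is that every invoked API flows through a single capture path, so the count, volume and duration the BSS bills against can never diverge from what was actually consumed.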


3.3 The Neuromorphic AI Leap: Cross-Domain AI Communications

Perhaps the greatest challenge in a Telco AI E2E vision is the underlying communication architecture, which must naturally be cross-domain and cross-vendor. Traditional packet-based networks use TCP/IP, which I see as a real limiting factor for true, real-time intelligence exchange.


  • The Problem with TCP/IP: Its latency and overhead prevent the sub-millisecond knowledge exchange required for synchronized, holistic decisions across diverse AI domains such as RAN, security, transport, 5G Core and Cloud.


[Image: A sleek, stylized visual representation of a Spiking Neural Network (SNN): a web of nodes and lines with energy spikes or pulses flowing between them]

  • The Proposed Solution - Neuromorphic AI: Communications must be addressed in a flatter, event-driven way, one that replicates the brain's efficiency. This involves leveraging Neuromorphic AI principles and architectures built on Spiking Neural Networks (SNNs) and synapses.

    • Event-Driven Communication: SNNs exchange information using discrete spikes, producing data only when needed, in true real time. This allows cross-pollination of shared knowledge across all domains, informing decisions across the service infrastructure, with communications exchanged at or below the millisecond level.

    • The Goal: To provide a fundamental layer of intelligence and communication where AI does not just analyze data but also instantaneously exchanges contextual decisions taken in other domains, ultimately giving the network a self-healing capability that adapts to repetitive or unknown situations faster than any human-in-the-loop system could manage alone. Humans are certainly not put aside: they remain an important part of the reflection, bringing creativity, out-of-the-box thinking and innovation at various levels, while leveraging the key information surfaced by the Neuromorphic AI system for final decision-making.
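The spiking behaviour described above can be illustrated with a minimal leaky integrate-and-fire (LIF) neuron, the basic building block of an SNN. The leak and threshold values are illustrative; the point is that the neuron stays silent until an event is actually worth signalling.

```python
# Minimal leaky integrate-and-fire (LIF) neuron: the membrane potential
# leaks over time, integrates incoming current, and emits a spike (an
# event) only when a threshold is crossed. Parameters are illustrative.

def lif_run(inputs: list[float], leak: float = 0.9,
            threshold: float = 1.0) -> list[int]:
    """Simulate one LIF neuron over discrete time steps.

    Each step: the potential decays by `leak`, the input current is
    integrated, and a spike (1) is emitted with a reset to zero when
    the potential reaches `threshold`; otherwise 0 is emitted.
    """
    potential, spikes = 0.0, []
    for current in inputs:
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes

# Silence produces no traffic; sustained activity produces sparse spikes.
print(lif_run([0.0, 0.0, 0.6, 0.6, 0.0, 0.9, 0.4]))
```

Contrast this with a polling or request/response model: here, no data at all crosses the fabric while the input is quiet, which is where the energy and latency savings of event-driven communication come from.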


3.4 Defining True Learning: Adaptive Plasticity

A true Telco and Neuromorphic AI must not only mimic the biological brain's learning capabilities but may even need to surpass them to reach full maturity.


  • Learning Contextually: The Telco/Neuromorphic AI system would naturally benefit from initial training, but would then learn continuously within the live data context of its respective domain (security, network, OSS, 5G Core, RAN, etc.). Data flows become the fuel of this ongoing evolutionary process.


  • Adaptive Plasticity: Crucially, this learning process must include the capacity to flush old or irrelevant knowledge, a feature of biological intelligence that keeps a system continuously relevant and efficient. To me, this is the ultimate differentiator from current LLMs, which remain perpetually attached to their static training/inference sets.
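One way to picture adaptive plasticity is a relevance store whose entries decay unless reinforced and are flushed once they fall below a floor. The decay constant and floor below are illustrative assumptions, not a proposed mechanism from any specific neuromorphic system.

```python
# Sketch of "adaptive plasticity": knowledge entries decay over time and
# are flushed below a relevance floor, while entries that keep being
# reinforced survive. Decay constants are illustrative assumptions.

class PlasticMemory:
    def __init__(self, decay: float = 0.5, floor: float = 0.1):
        self.decay, self.floor = decay, floor
        self.relevance: dict[str, float] = {}

    def reinforce(self, fact: str) -> None:
        """Observing a pattern again restores its relevance to full."""
        self.relevance[fact] = 1.0

    def tick(self) -> None:
        """One time step: decay everything, flush what fell below floor."""
        self.relevance = {f: r * self.decay
                          for f, r in self.relevance.items()
                          if r * self.decay >= self.floor}

mem = PlasticMemory()
mem.reinforce("cell-42 congests at 18:00")
mem.reinforce("legacy alarm pattern")
for _ in range(4):
    mem.tick()
    mem.reinforce("cell-42 congests at 18:00")  # still observed daily
print(sorted(mem.relevance))  # the stale legacy pattern has been flushed
```

The contrast with an LLM is direct: here, forgetting is a first-class operation, whereas a frozen model carries every stale pattern of its training set forever.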


Part 4 - Conclusion: Bridging the Gap from Hype to Holistic Intelligence



I sadly believe the telecommunications industry stands at a perilous crossroads. Corporations and key individuals have been lured by the siren song of market hype and the promise of quick money and valuation capture, regardless of the human and social costs imposed on others.


While laying off hundreds of thousands of people, many AI market actors have committed hundreds of billions to an AI paradigm that is fundamentally misaligned with the operational and architectural requirements of Telcos, Techcos and enterprises.


Throughout this document, I have tried to the best of my ability to demonstrate how wrong the current GPU-centric LLM/GenAI model is and how flawed a foundation it represents. It is power-hungry, economically unviable for the edge, strategically fragmented and siloed, and legally dangerous due to the data governance and security risks it introduces. In my view, reliance on such an architecture will trap organizations in a vertical isolation that will take years to escape, producing evasive gains in a given domain while eroding the overall end-to-end service notion that is actually sold to customers.


This path reflects a profound lack of strategy, vision and wisdom. It also shows a collective myopia that prioritizes the stock-market valuations benefiting a few over the genuine architectural innovation that would ensure the betterment of all.


The Universal Requirement of True Intelligence


Humbly, I believe the solution is not more horsepower or brute force. It should be a smarter, more humble beginning: decentralized, holistic intelligence. The term Telco AI should not be read in purely technological terms but as a universal architectural vision, from small to large scale, able to handle high volumes and mission-critical environments. This vision applies equally to:


  • Telcos: Requiring autonomous, cross-domain management of complex wireline/wireless service infrastructures.

  • Techcos: A natural evolution from Telcos, seeking to eliminate latency and operational challenges from their digital and cloud platforms.

  • Enterprises: Demanding secure, localized and contextually aware intelligence that ensures data sovereignty.


The Final Warning


I truly believe that the future of sustained profitability (not just money, but value for mankind) and service differentiation depends on the development and deployment of a true Telco AI: an event-driven, learning AI system built on the principles of Adaptive Plasticity and Neuromorphic Communication between AI domain elements. Such an architecture, defined by sub-millisecond, event-driven, cross-AI and cross-vendor communication, is the only way to truly achieve multi-domain capabilities (orchestration and assurance), and it is necessary for networks to become self-healing, self-optimizing engines.


It is truly sad to see the current AI bubble become a technology circus fueled by financial speculation and architectural compromise. By continuing to invest in dead-end systems, the industry is not just wasting money; it is wasting lives and talents, trading the betterment of an entire industry, long-term operational resilience and innovation for short-term speculative gains.


AI market actors, as well as Telcos/Techcos/enterprises, should not abandon the pursuit of GenAI but take it for what it is: an advanced analytical search engine. The whole industry should focus on purpose-built, architecturally sound, practical intelligence, not centered on GPUs but on power-moderate alternatives such as CPUs, NPUs and TPUs, with GPUs used only when critically needed.


The strategic imperative is clear: embrace a true Telco AI vision, or risk AI becoming another casualty of a failed technological revolution. A quick look in the mirror: what happened to trends such as the Metaverse, 5G Private Networks and Network Slicing? All of them could actually be achieved with the principles of Telco AI.


Thank you for taking the time to read this. Luc-Yves Pagal Vinette
