

AMD Advocates Integrated AI Compute with EPYC CPUs for 50% Cost Savings and Enhanced Performance

AMD’s EPYC processors enable firms like Kakao Enterprise to cut AI infrastructure costs by 50% while boosting performance by 30%, redefining compute strategies for AI.

Graphics processing units (GPUs) have emerged as the primary upgrade for companies enhancing their AI systems, particularly in the inferencing stage, where trained models produce outputs from new data. However, semiconductor firm AMD warns that solely depending on GPUs can hinder performance and escalate costs.

In a recent interview with Newsbytes.PH, AMD’s Asia Pacific general manager, Alexey Navolokin, emphasized the growing need for effective coordination among CPUs, GPUs, memory, and networking as AI workloads expand and agentic AI systems shift towards real-world applications.

“Today’s large models operate across clusters of GPUs that must work in parallel and exchange data constantly,” Navolokin explained. He noted that overall performance hinges not only on GPU speed but also on the efficiency with which data is transferred and computation is coordinated across the entire system architecture.

Navolokin pointed out a prevalent misconception that GPUs serve as the singular powerhouse for AI inferencing. He highlighted that modern AI models typically exceed the capacity of a single device, necessitating substantial support from host CPUs to facilitate data movement, synchronization, and latency-sensitive tasks. “A fast CPU keeps the GPU fully utilized, reduces overhead in the inference pipeline, and cuts end-to-end latency,” he stated, adding that even minor reductions in CPU delays can significantly enhance application responsiveness.

Tokenization, the process of converting inputs into numerical units, is heavily reliant on the interaction between CPU and GPU. “Inference runs token by token, and tasks such as tokenization, batching, and synchronization sit directly on the critical path,” Navolokin said. “Delays on the host CPU can slow the entire response.”
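The point about the critical path can be made concrete with a minimal sketch. The `tokenize` and `gpu_forward` functions below are hypothetical stand-ins, not any real framework's API: the sketch only illustrates how token-by-token generation alternates CPU-side work (tokenization, bookkeeping) with device compute, so any host-CPU delay stalls every step of the loop.

```python
def tokenize(text: str) -> list[int]:
    # CPU-side: convert raw input into token IDs (toy whitespace scheme)
    return [len(word) for word in text.split()]

def gpu_forward(tokens: list[int]) -> int:
    # Stand-in for the GPU forward pass that predicts the next token
    return sum(tokens) % 1000

def generate(prompt: str, max_new_tokens: int = 4) -> list[int]:
    tokens = tokenize(prompt)           # CPU work before the GPU can start
    for _ in range(max_new_tokens):     # inference runs token by token
        next_tok = gpu_forward(tokens)  # device compute
        tokens.append(next_tok)         # CPU-side bookkeeping between steps
    return tokens

print(generate("a fast CPU keeps the GPU busy"))
```

Because every iteration passes through the CPU-side append and re-dispatch, shaving even microseconds off those host steps compounds across the hundreds of tokens in a typical response.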

Beyond performance, Navolokin argued that optimizing CPU-GPU balance can lead to lower infrastructure costs by increasing GPU utilization and decreasing hardware requirements. “Higher efficiency enables teams to meet demand with fewer CPU cores or GPU instances,” he noted.
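That efficiency claim reduces to back-of-the-envelope capacity math. The throughput figures below are invented for illustration, not AMD benchmarks: the number of GPU instances needed to serve a fixed demand falls as utilization rises.

```python
import math

def instances_needed(demand_tok_per_s: float, peak_tok_per_s: float,
                     utilization: float) -> int:
    # Effective throughput per GPU is its peak rate scaled by how
    # consistently the host CPU can keep it fed with work.
    effective = peak_tok_per_s * utilization
    return math.ceil(demand_tok_per_s / effective)

# Hypothetical fleet: 100,000 tokens/s of demand, 10,000 tokens/s peak per GPU
print(instances_needed(100_000, 10_000, 0.50))  # 20 GPUs at 50% utilization
print(instances_needed(100_000, 10_000, 0.80))  # 13 GPUs at 80% utilization
```

In this toy example, lifting utilization from 50% to 80% serves the same demand with roughly a third fewer accelerators.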

He cited a case study involving South Korean IT firm Kakao Enterprise, which reportedly reduced its total cost of ownership by 50% and its server count by 60%, while improving AI and cloud performance by 30% after deploying AMD’s EPYC processors.

The fifth-generation EPYC processors, according to Navolokin, can deliver comparable integer performance to earlier systems while using up to 86% fewer racks, effectively lowering both power consumption and software licensing requirements. He added that demand for capable CPUs is amplified by the rise of agentic AI systems, which are designed to plan, reason, and act autonomously.
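To see how a rack-count figure like that translates into a power footprint, here is a toy consolidation calculation; all input numbers (fleet size, per-rack power) are hypothetical, not the article's measurements.

```python
def consolidated(old_racks: int, reduction: float,
                 kw_per_rack: float) -> tuple[int, float]:
    # An 86% rack reduction means the new footprint is 14% of the old one.
    new_racks = round(old_racks * (1 - reduction))
    power_saved_kw = (old_racks - new_racks) * kw_per_rack
    return new_racks, power_saved_kw

racks, saved = consolidated(old_racks=100, reduction=0.86, kw_per_rack=12.0)
print(racks, saved)  # 14 racks remain; 1032.0 kW of rack power freed
```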

“These systems generate significantly more CPU-side work than traditional inference,” Navolokin explained. “Tasks such as retrieval, prompt preparation, multi-model routing, and synchronization are CPU-driven.” In these scenarios, the CPU functions as a control node across distributed resources that span data centers, cloud platforms, and edge systems.

AMD is positioning its EPYC processors as host CPUs for these demanding workloads. The latest EPYC 9005 Series boasts up to 192 cores, expanded AVX-512 execution, DDR5-6400 memory support, and PCIe Gen 5 I/O—features designed to support large-scale inferencing and GPU-accelerated systems. Navolokin mentioned that this latest generation shows a 37% improvement in instructions per cycle for machine learning and high-performance computing workloads compared to previous EPYC processors.

He also referenced Malaysian reinsurance firm Labuan Re, which anticipates reducing its insurance assessment turnaround time from weeks to less than a day after migrating to an EPYC-powered AI platform.

As AI deployments extend beyond centralized data centers, Navolokin urged organizations to rethink their infrastructure design. “The priority should not be the performance of a single compute resource, but the ability to deploy AI consistently across heterogeneous environments,” he advised. He underscored the importance of open platforms and distributed compute strategies, noting that real-time inference often runs more efficiently on edge devices or AI PCs closer to data sources.

“Success in inferencing is no longer defined solely by raw compute power,” Navolokin concluded. “It depends on latency, efficiency, and the ability to operate across data center, cloud, and edge environments.”

Written by the AiPressa Staff



© 2025 AIPressa · Part of Buzzora Media · All rights reserved.