In a groundbreaking achievement, Samsung Research has successfully overcome a significant challenge in on-device artificial intelligence (AI). The company has demonstrated the capability to operate a 30-billion-parameter generative model—which typically requires over 16GB of memory—using less than 3GB through innovative compression algorithms. Dr. MyungJoo Ham, a lead expert at Samsung’s AI Center, detailed this advancement in an exclusive interview with Samsung Newsroom.
This breakthrough signifies a dramatic shift in what is possible for AI applications on mobile devices. Just six months ago, fitting enterprise-grade AI capabilities into a smartphone’s limited memory seemed implausible. According to Dr. Ham, the company has effectively reduced the size of massive language models by over 80% while sustaining performance levels comparable to those found in cloud environments.
The figures are striking. Samsung has shown that a 30-billion-parameter generative model can run on devices with under 3GB of memory through sophisticated quantization techniques. “We’re developing optimization techniques that intelligently balance memory and computation,” Dr. Ham noted, emphasizing that loading only the data required at any given moment vastly improves efficiency.
Importantly, this achievement goes beyond mere academic interest. Samsung is actively commercializing these algorithms across a range of devices, including smartphones and home appliances. Each device is tailored with custom compression profiles to maximize performance. “Because every device model has its own memory architecture and computing profile, a general approach can’t deliver cloud-level AI performance,” Dr. Ham explained. The focus of Samsung’s research is on creating AI experiences that users can directly engage with in their daily lives.
The core of this transformation lies in advanced quantization methods, which convert 32-bit floating-point calculations into far more efficient 8-bit or 4-bit integer operations. Dr. Ham likened the process to photo compression: “The file size shrinks but visual quality remains nearly the same.” Samsung’s algorithms assess the significance of each model weight, so that essential components retain higher precision while less critical elements undergo greater compression.
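The general idea of importance-aware mixed-precision quantization can be sketched in a few lines. This is a minimal illustration of the technique the article describes, not Samsung’s proprietary algorithm: the 10% magnitude threshold and the per-tensor symmetric scheme are assumptions chosen for clarity.

```python
import numpy as np

def quantize(weights: np.ndarray, bits: int):
    """Symmetric quantization: map floats onto a signed integer grid of `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)  # stand-in for one weight tensor

# Importance-aware mixed precision (a common heuristic; the article does not
# disclose Samsung's actual criterion): keep the largest-magnitude 10% of
# weights at 8 bits and compress the remaining 90% to 4 bits.
threshold = np.quantile(np.abs(w), 0.9)
important = np.abs(w) >= threshold

q_hi, s_hi = quantize(w[important], bits=8)
q_lo, s_lo = quantize(w[~important], bits=4)

restored = np.empty_like(w)
restored[important] = dequantize(q_hi, s_hi)
restored[~important] = dequantize(q_lo, s_lo)

# 32 bits per weight shrinks to ~4.4 bits on average (0.1 * 8 + 0.9 * 4),
# while reconstruction error stays small -- the "photo compression" analogy.
err = float(np.abs(w - restored).max())
print(f"max reconstruction error: {err:.3f}")
```

In practice the importance score would come from calibration data rather than raw magnitude, but the memory arithmetic is the same: roughly an 86% reduction versus float32 storage.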
However, compression alone isn’t sufficient. Samsung has also developed a custom AI runtime engine that serves as the “model’s engine control unit,” intelligently distributing computations across the CPU, GPU, and NPU so that larger, more sophisticated models run at equivalent speeds on the same hardware. Dr. Ham pointed out that the primary bottlenecks for on-device AI are memory bandwidth and storage access speed. Samsung’s runtime anticipates when computations will occur, pre-loading only the necessary data and minimizing redundant memory accesses. The result is a significant reduction in response latency, delivering smoother conversations and faster image processing.
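The pre-loading idea described above amounts to overlapping storage reads with computation. Here is a hedged sketch of that scheduling pattern under simplifying assumptions: while layer i is being computed, layer i+1’s weights are fetched in a background thread, so only about two layers’ worth of weights need to reside in memory at once. The `load_weights` and `run_layer` functions are hypothetical stand-ins, not Samsung APIs.

```python
import threading
import time

def load_weights(layer_id: int) -> str:
    """Simulate reading one layer's weights from slow storage."""
    time.sleep(0.01)  # stand-in for storage latency
    return f"weights-{layer_id}"

def run_layer(layer_id: int, weights: str, x: int) -> int:
    """Stand-in for the real per-layer computation."""
    return x + 1

def forward_with_prefetch(num_layers: int, x: int) -> int:
    """Overlap I/O and compute: fetch layer i+1 while layer i runs."""
    prefetched = {}

    def prefetch(i: int) -> None:
        prefetched[i] = load_weights(i)

    # Load the first layer up front; everything else is fetched in the background.
    first = threading.Thread(target=prefetch, args=(0,))
    first.start()
    first.join()

    for i in range(num_layers):
        nxt = None
        if i + 1 < num_layers:
            nxt = threading.Thread(target=prefetch, args=(i + 1,))
            nxt.start()
        # Compute with the current layer, then drop its weights from memory.
        x = run_layer(i, prefetched.pop(i), x)
        if nxt is not None:
            nxt.join()
    return x

print(forward_with_prefetch(4, 0))  # 4
```

A production runtime would use double-buffered DMA transfers and hardware queues rather than Python threads, but the latency-hiding principle is the same.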
This development could pave the way for more powerful AI applications directly on consumer devices, expanding the possibilities for personalized user experiences. As the AI landscape continues to evolve, Samsung’s advancements in compression and runtime efficiency may set a new standard for what consumers can expect from their devices.