Researchers at the University of California, Irvine, have unveiled HELIOS, a framework that improves the decompilation of binary code with large language models (LLMs). It addresses a limitation of existing methods, which often treat code simply as text and ignore the control flow graph that describes how a program actually executes. The research team, comprising Yonatan Gizachew Achamyeleh, Harsh Thomare, and Mohammad Abdullah Al Faruque, reframes binary decompilation as a structured reasoning task, which helps the model make sense of complex, optimized binaries.
The core advantage of HELIOS lies in its ability to summarize a binary’s control flow as a hierarchical text representation. This representation describes the basic blocks, the edges between them, and high-level constructs such as loops and conditionals, giving the model structural context that text-only LLM approaches lack. On the HumanEval-Decompile benchmark, the share of outputs that compile to object files rose from 45.0% to 85.2% with Gemini 2.0 and from 71.4% to 89.6% with GPT-4.1 Mini.
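To make the idea of a hierarchical text representation concrete, here is a minimal sketch. It is not the format HELIOS uses: the function names (`find_back_edges`, `summarize_cfg`), the block labels, and the output layout are all hypothetical. It assumes the control flow graph is already available as a mapping from each basic block to its successors, and it detects loops with the standard back-edge test from a depth-first search.

```python
def find_back_edges(cfg, entry):
    """Return edges (tail, head) whose head is still on the DFS stack.

    Such back edges are the usual marker of a loop in a control flow graph.
    """
    back_edges = []
    state = {}  # block -> "active" (on the DFS stack) or "done"

    def visit(block):
        state[block] = "active"
        for succ in cfg.get(block, []):
            if state.get(succ) == "active":
                back_edges.append((block, succ))
            elif succ not in state:
                visit(succ)
        state[block] = "done"

    visit(entry)
    return back_edges


def summarize_cfg(name, cfg, entry):
    """Render a CFG as indented text: blocks, their edges, then loops."""
    lines = [f"Function {name}: {len(cfg)} basic blocks, entry {entry}", "Blocks:"]
    for block, succs in cfg.items():
        if not succs:
            kind = "exit"
        elif len(succs) == 1:
            kind = "jump"
        else:
            kind = "conditional branch"
        targets = ", ".join(succs) if succs else "none"
        lines.append(f"  {block} ({kind}) -> {targets}")
    lines.append("Loops:")
    back_edges = find_back_edges(cfg, entry)
    if back_edges:
        for tail, head in back_edges:
            lines.append(f"  loop with header {head}, back edge from {tail}")
    else:
        lines.append("  none")
    return "\n".join(lines)


# Hypothetical CFG of a simple counting loop: B1 tests the condition,
# B2 is the loop body, B3 is the exit block.
example_cfg = {
    "B0": ["B1"],
    "B1": ["B2", "B3"],
    "B2": ["B1"],
    "B3": [],
}
summary = summarize_cfg("sum_array", example_cfg, "B0")
print(summary)
```

A summary like this is short enough to place in a prompt next to the raw decompiler output, which is the kind of structural context the article describes.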
Incorporating compiler feedback pushed compilability above 94% and improved functional correctness by up to 5.6 percentage points over text-only methods. This is a substantial gain in the reliability and usability of LLM-driven decompilation, particularly for security analysts who routinely work with complex binaries. The framework was evaluated on six architectures, including x86, ARM, and MIPS, where it reduced the variation in functional correctness between architectures while maintaining high syntactic accuracy.
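A compiler feedback loop of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the HELIOS implementation: `refine_with_feedback`, `generate`, and `check` are hypothetical names, the model call is left as a caller-supplied function, and `gcc_syntax_check` assumes a `gcc` binary is on the path and only checks that the source parses and type-checks instead of producing an object file.

```python
import subprocess
from typing import Callable, Optional, Tuple


def gcc_syntax_check(source: str) -> Tuple[bool, str]:
    """Check C source read from stdin with gcc; return (ok, diagnostics).

    Assumes gcc is installed. -fsyntax-only stops before code generation.
    """
    result = subprocess.run(
        ["gcc", "-x", "c", "-fsyntax-only", "-"],
        input=source,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


def refine_with_feedback(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Regenerate code until it passes the check or the rounds run out.

    generate(feedback) stands in for the LLM call; feedback is None on the
    first round and the previous compiler diagnostics afterwards.
    """
    feedback = None
    source = ""
    for _ in range(max_rounds):
        source = generate(feedback)
        ok, diagnostics = check(source)
        if ok:
            return source, True
        feedback = diagnostics
    return source, False
```

In use, `generate` would wrap a model request whose prompt includes the diagnostics, and `check` could be `gcc_syntax_check` or a stricter function that compiles to an object file.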
The method uses a static analysis backend to recover both the control flow graph and the call graph, which form the basis of the hierarchical text representation. That representation is then combined with the raw decompiler output and, optionally, compiler feedback to guide the LLM’s interpretation. Because the prompts summarize each function’s role and describe its control flow, the system mirrors the way human experts approach a binary, and the resulting pipeline requires no fine-tuning and is architecture-agnostic.
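The way these inputs come together in a prompt might look like the sketch below. The section titles, the task wording, and the `build_prompt` function are hypothetical and are not taken from the HELIOS prompts; the example only shows how a function summary, a control flow description, decompiler output, and optional compiler feedback can be combined into one request.

```python
from typing import Optional


def build_prompt(
    function_summary: str,
    cfg_text: str,
    decompiler_output: str,
    compiler_feedback: Optional[str] = None,
) -> str:
    """Assemble one prompt from the structural and textual inputs."""
    sections = [
        ("Task", "Rewrite the code below as clean, recompilable C that "
                 "preserves the described control flow."),
        ("Function summary", function_summary),
        ("Control flow", cfg_text),
        ("Decompiler output", decompiler_output),
    ]
    # Compiler feedback is optional: it only exists after a failed attempt.
    if compiler_feedback:
        sections.append(("Compiler feedback", compiler_feedback))
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


prompt = build_prompt(
    function_summary="Sums the elements of an integer array.",
    cfg_text="B1 (conditional branch) -> B2, B3\nloop with header B1",
    decompiler_output="int FUN_001(int *p, int n) { /* ... */ }",
)
print(prompt)
```

Keeping the structural description in its own section, ahead of the raw decompiler output, is one plausible way to make sure the model reads the control flow before the noisy code.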
In their experiments, the research team used the HumanEval-Decompile benchmark on the x86_64 architecture. Translating control flow data into a form that LLMs can process proved essential for improving the accuracy and consistency of the decompiled code. The results show that HELIOS raises not only the rate at which the output compiles but also its logical consistency, which makes it a promising tool for reverse engineering and security analysis.
As software security evolves and the demand for skilled reverse engineers grows, HELIOS addresses a pressing need by automating part of the work involved in reverse engineering, malware analysis, and vulnerability assessment. With this framework, analysts can obtain recompilable, semantically faithful code across various hardware platforms, which makes it practical in security settings. The researchers highlight the potential of HELIOS to make binary analysis and security research more efficient.