AI Finance

NVIDIA Blackwell Achieves STAC-AI Record with 3.2x Performance Boost for LLM Inference

NVIDIA’s Blackwell architecture achieves a record-setting 3.2x performance boost for LLM inference in the STAC-AI benchmark, revolutionizing financial AI applications.

Large language models (LLMs) are transforming the financial trading sector by facilitating advanced analysis of unstructured data, which provides actionable insights for traders. These sophisticated AI systems evaluate a broad spectrum of information, including financial news, social media sentiment, and market data, to forecast stock price fluctuations and automate investment strategies with exceptional precision.

The Strategic Technology Analysis Center (STAC) has spent over 15 years developing benchmarks tailored for the financial industry. They recently introduced the STAC-AI benchmark, aimed at helping firms evaluate the end-to-end retrieval-augmented generation (RAG) and LLM inference pipeline, a critical component in deploying these technologies successfully.

Within this suite, the STAC-AI LANG6 benchmark focuses on LLM inference performance, examining the hardware and software stack using the Llama 3.1 8B Instruct and Llama 3.1 70B Instruct models. It uses custom datasets derived from EDGAR filings to model summarization tasks relevant to financial trading and investment advice.

The benchmark evaluates two inference scenarios: batch mode and interactive mode. In batch mode, all requests are submitted at once and throughput is measured; in interactive mode, requests arrive at random intervals, and metrics such as reaction time and words per second per user are assessed. Notably, interactive mode omits the pairing of the Llama 3.1 70B Instruct model with the EDGAR5 dataset, slightly narrowing the interactive test matrix.
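The two modes measure different things from the same raw data. As a minimal sketch (the field names and toy numbers here are illustrative, not STAC's actual schema), batch throughput is total words over the wall-clock time of the whole batch, while interactive metrics average per-request reaction time and words per second per user:

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: float      # time the request was submitted (s)
    first_word: float   # time the first output word appeared (s)
    done: float         # time the full response finished (s)
    words: int          # words in the generated response

def batch_wps(requests):
    """Batch mode: all requests run together; throughput is total
    words divided by the wall-clock time of the whole batch."""
    start = min(r.arrival for r in requests)
    end = max(r.done for r in requests)
    return sum(r.words for r in requests) / (end - start)

def interactive_metrics(requests):
    """Interactive mode: requests arrive at random intervals; report
    mean reaction time and mean words per second per user."""
    n = len(requests)
    reaction = sum(r.first_word - r.arrival for r in requests) / n
    wps_per_user = sum(r.words / (r.done - r.arrival) for r in requests) / n
    return reaction, wps_per_user

reqs = [Request(0.0, 0.4, 2.0, 120), Request(0.5, 0.9, 2.5, 100)]
print(batch_wps(reqs))  # 220 words over 2.5 s -> 88.0
```

The same request log yields both views, which is why a system can lead in one mode and trail in the other.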

Output quality matters as well: each response is scored against baseline LLM-generated answers. The benchmark also mandates specific preprocessing steps that, in real-world deployments, would likely run server-side, placing additional demands on CPU resources.

Technical Details

The recent analysis compares two on-premises NVIDIA Hopper-based servers supplied by HPE with a cloud-based NVIDIA Blackwell node. The benchmarking procedure required post-training quantization: the models were quantized with the NVIDIA TensorRT Model Optimizer, using different quantization formats for Hopper and Blackwell to optimize performance on each architecture. The TensorRT LLM inference framework handled model execution efficiently while keeping a familiar PyTorch development environment.
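Post-training quantization, at its core, maps floating-point weights onto a small integer grid and back. The sketch below shows the basic symmetric per-tensor scheme only; the TensorRT Model Optimizer's actual formats and calibration are considerably more sophisticated, and the numbers here are purely illustrative:

```python
def quantize_dequantize(weights, num_bits=8):
    """Symmetric per-tensor PTQ sketch: scale by the max magnitude,
    round to the integer grid, clamp, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q], scale

w = [0.5, -1.27, 0.031]
deq, scale = quantize_dequantize(w)           # scale = 1.27 / 127 = 0.01
```

Lower-precision formats shrink `num_bits`, which is where the memory and throughput gains come from, at the cost of a coarser grid.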

Benchmarking results from both batch and interactive modes reveal significant advantages for the NVIDIA Blackwell architecture, which outperformed the Hopper-based servers across all batch-mode scenarios. For example, the Llama 3.1 8B model achieved a words per second (WPS) rate of 37,480 with the EDGAR4 dataset on the Blackwell node, demonstrating robust capability in processing financial data efficiently.

Furthermore, single-GPU performance assessments indicated a throughput advantage for the Blackwell architecture, with reported performance improvements of up to 3.2 times when compared to the previous generation of GPUs. This performance uplift positions NVIDIA’s newer models as pivotal tools in handling high-volume financial data processing.

In interactive mode, the balance between token economics and user experience becomes paramount. The analysis illustrated how the Blackwell NVL72 managed to maintain a favorable trade-off between throughput and both reaction time and inter-word latency across various model and dataset configurations. This highlights not only the model’s raw computational power but also its ability to deliver a responsive user experience, a critical factor in financial applications where timing can significantly influence decision-making.

Even when throughput levels were matched, the Blackwell architecture consistently outperformed the Hopper servers in terms of both reaction time and inter-word latency, showcasing its superiority in maintaining performance under load.
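Inter-word latency is straightforward to derive from a response's word timestamps; the gap before the first word is the reaction time, and the mean of the remaining gaps is the inter-word latency. A minimal illustration (toy timestamps, not benchmark data):

```python
def inter_word_latency(word_times):
    """Mean gap between consecutive output words for one response.
    The gap before word_times[0] would be the reaction time."""
    gaps = [b - a for a, b in zip(word_times, word_times[1:])]
    return sum(gaps) / len(gaps)

# A stream where words land every 25 ms after a 300 ms reaction time
times = [0.300, 0.325, 0.350, 0.375]
print(inter_word_latency(times))  # 0.025 s per word
```

Holding this number low while total throughput climbs is exactly the trade-off the interactive results describe.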

As firms seek to implement these advanced technologies, understanding how to benchmark models against specific dataset characteristics remains crucial. A guide for benchmarking TensorRT LLM with customized data has been made available, outlining steps for quantizing models and preparing datasets to suit particular use cases. This includes launching containers with the necessary dependencies, quantizing models using the NVIDIA Model Optimizer, and generating synthetic datasets to simulate real-world conditions.
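One step in that guide, generating a synthetic dataset that mirrors the length profile of real inputs, can be sketched without any NVIDIA tooling. The function below is a hypothetical stand-in (names, vocabulary, and the normal length distribution are assumptions for illustration), not the guide's actual generator:

```python
import random

def synthetic_dataset(n, mean_len=512, stdev=128,
                      vocab=("alpha", "beta", "gamma", "delta")):
    """Generate n synthetic prompts whose word counts follow a normal
    distribution, mimicking the size profile of real documents."""
    random.seed(42)  # fixed seed for reproducible benchmark runs
    prompts = []
    for _ in range(n):
        length = max(1, int(random.gauss(mean_len, stdev)))
        prompts.append(" ".join(random.choice(vocab) for _ in range(length)))
    return prompts

data = synthetic_dataset(8, mean_len=64, stdev=16)
```

Matching input-length statistics matters because prompt length drives prefill cost, and a dataset with the wrong profile will misstate achievable throughput.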

Ultimately, the advancements represented by the NVIDIA GB200 NVL72 in the STAC-AI LANG6 benchmark signify a new horizon for LLM inference within the financial sector. Delivering up to 3.2 times the performance of older architectures, these new models not only achieve higher throughput but also ensure superior interactivity, making them invaluable assets for financial institutions seeking to leverage AI-driven insights.

While NVIDIA’s Hopper architecture continues to perform well, even three years post-launch, the enhancements of the Blackwell generation affirm the ongoing evolution of LLM technologies. As firms increasingly adopt AI solutions for trading strategies, the insights gained from this benchmarking will guide future investments and innovations in LLM applications.

Written by Marcus Chen

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.