Microsoft has officially open-sourced its latest multi-modal reasoning model, Phi-4-reasoning-vision-15B. At 15 billion parameters, the model balances performance against cost while keeping a lightweight footprint, making it a viable option for complex visual tasks in resource-constrained environments.
In contrast to prevailing industry models that are typically trained on trillions of tokens, Phi-4-reasoning-vision was trained on only 200 billion multi-modal tokens. The development team prioritized data quality, employing deep cleaning of open-source data, targeted synthetic data generation, and careful tuning of the domain data mix, including a larger share of math data to strengthen scientific reasoning and screen-grounding tasks.
A standout feature of this model is its hybrid reasoning path design. For simpler tasks such as image description and optical character recognition (OCR), the model defaults to a direct-answer mode, minimizing latency. For more complex reasoning tasks involving mathematical formulas and scientific charts, it automatically engages a structured chain-of-thought (CoT) path to ensure answer accuracy. Users can also manually switch between the two modes with specific guiding words, allowing adaptation to different scenarios.
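The article does not document the actual guiding words, so the sketch below uses hypothetical `<direct>` / `<think>` prompt prefixes purely to illustrate how an application might expose the manual mode switch; the real tokens would come from the model card.

```python
# Sketch: routing a question to the direct-answer or chain-of-thought path.
# The "<direct>" and "<think>" prefixes are HYPOTHETICAL placeholders; the
# model's actual guiding words are not specified in this article.

def build_prompt(question: str, mode: str = "auto") -> str:
    """Prefix the user question with a (hypothetical) mode tag."""
    if mode == "direct":   # fast path: captioning, OCR
        return f"<direct>\n{question}"
    if mode == "cot":      # structured chain-of-thought path
        return f"<think>\n{question}"
    return question        # "auto": let the model choose the path itself

# Usage: a latency-sensitive OCR call forces the direct path.
print(build_prompt("Read the text in this receipt image.", mode="direct"))
```

In practice, a wrapper like this keeps the mode decision in application code while the default `auto` setting defers to the model's own routing.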
Another notable aspect is the integration of the SigLIP-2 dynamic resolution encoder, which enhances the model’s perception capabilities when dealing with small elements in high-resolution screenshots. This makes Phi-4-reasoning-vision an excellent choice for developing computer operation assistants (CUA), capable of accurately identifying and interacting with buttons and input fields on both web and mobile interfaces.
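For a computer-use assistant, perceiving a small button is only half the job; the predicted location must be converted into a pixel coordinate to click. The article does not specify the model's output format, so the sketch below assumes a common convention of normalized bounding boxes in [0, 1]; the coordinate math itself is standard.

```python
# Sketch: mapping a (hypothetical) normalized bounding box predicted for a
# UI element to the pixel coordinate a computer-use agent would click.

def click_point(bbox, width, height):
    """bbox = (x0, y0, x1, y1) in [0, 1]; return the center pixel (x, y)."""
    x0, y0, x1, y1 = bbox
    cx = round((x0 + x1) / 2 * width)
    cy = round((y0 + y1) / 2 * height)
    return (cx, cy)

# Usage: a "Submit" button detected near the lower middle of a
# 2560x1600 screenshot.
print(click_point((0.25, 0.5, 0.75, 0.75), 2560, 1600))  # → (1280, 1000)
```

Keeping coordinates normalized until the final step lets the same prediction work across the different resolutions the dynamic-resolution encoder supports.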
Currently, the Phi-4-reasoning-vision-15B model is available on multiple open-source platforms. Microsoft aims to demonstrate that in multi-modal AI, "smaller and faster" can coexist with "stronger," promoting the growth of spatial intelligence and real-time interaction technologies. If that holds, such models could shape the next generation of user-friendly interfaces and smart assistants.


















































