Aniket Roy Reveals Resource-Constrained Image Generation Techniques in PhD Research

Aniket Roy, a recent PhD graduate of Johns Hopkins, discusses FeLMi and DiffNat, two methods aimed at making image generation more efficient in low-resource settings.

In a recent interview, Aniket Roy, a newly minted PhD from Johns Hopkins University, shared insights into his research on generative models for computer vision tasks. Working under Bloomberg Distinguished Professor Rama Chellappa, Roy focused on making image generation more efficient and adaptable, especially in resource-constrained environments.

Roy’s PhD research spans generative AI, multimodal learning, and few-shot learning. He has developed methods that let models learn new concepts or carry out intricate visual tasks with minimal data and computational resources. His work addresses longstanding challenges such as data scarcity and personalized image synthesis, aiming to make advanced vision systems more practical for real-world applications.

One significant contribution from Roy is FeLMi, a few-shot learning framework that utilizes uncertainty-guided hard mixup strategies. This innovation improves robustness when working with a limited number of labeled samples. Another noteworthy project is Cap2Aug, which employs textual descriptions to guide synthetic image generation, effectively enhancing visual diversity and bridging the gap between real and generated data.
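To make the idea of uncertainty-guided hard mixup concrete, here is a minimal sketch. It is not FeLMi's actual implementation: the function names, the toy classifier `toy_predict`, and the choice of prediction entropy as the uncertainty score are all assumptions made for illustration. Mixup blends two labeled samples, and the "hard" variant keeps the blend the model is least sure about.

```python
import math


def entropy(probs):
    """Shannon entropy of a class distribution (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixup(x_a, x_b, y_a, y_b, lam):
    """Standard mixup: convex combination of two samples and their labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def hardest_mixup(candidates, predict):
    """Keep the mixed sample whose prediction has the highest entropy."""
    return max(candidates, key=lambda c: entropy(predict(c[0])))


def toy_predict(x):
    """Hypothetical stand-in classifier: softmax over the raw features."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Two labeled samples from different classes (one-hot labels).
x_a, y_a = [1.0, 0.0], [1.0, 0.0]
x_b, y_b = [0.0, 1.0], [0.0, 1.0]

# Build several mixes and keep the one the toy model finds hardest.
candidates = [mixup(x_a, x_b, y_a, y_b, lam) for lam in (0.9, 0.7, 0.5)]
hard_x, hard_y = hardest_mixup(candidates, toy_predict)
```

In this toy setup the even blend is selected, since the stand-in classifier splits its prediction equally between the two classes for that sample.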

In addition to these frameworks, Roy developed DiffNat, a regularization method that improves the perceptual quality of images generated by diffusion models. By applying a kurtosis-concentration loss, DiffNat encourages generated images to exhibit more natural texture statistics, a crucial element in enhancing visual realism for downstream vision tasks.
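A rough sketch of what a kurtosis-concentration penalty could look like follows. It assumes the image has already been split into band-pass responses (for example by a wavelet transform, which is not shown), and it measures concentration as the gap between the largest and smallest kurtosis across bands. That formulation and the function names are assumptions for illustration, not DiffNat's published code.

```python
def kurtosis(values):
    """Kurtosis: fourth central moment divided by the squared variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2)


def kurtosis_concentration_loss(bands):
    """Spread of kurtosis across band-pass responses (0 when all bands agree)."""
    ks = [kurtosis(band) for band in bands]
    return max(ks) - min(ks)


# Hypothetical band-pass responses of two images, flattened to lists.
even_bands = [[-1.0, 1.0, -1.0, 1.0], [-3.0, 3.0, -3.0, 3.0]]
uneven_bands = [[-1.0, 1.0, -1.0, 1.0], [-2.0, 0.0, 0.0, 2.0]]

loss_even = kurtosis_concentration_loss(even_bands)      # bands share one kurtosis
loss_uneven = kurtosis_concentration_loss(uneven_bands)  # bands disagree
```

Because kurtosis is unaffected by rescaling, the first example incurs no penalty even though its two bands differ in amplitude. Only a change in the shape of the distribution raises the loss.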

Roy has also worked on personalizing generative models. He introduced DuoLoRA, a framework for efficient control over content and style that allows fine-tuning without retraining the full model. The approach extends to zero-shot settings, letting users customize objects during generation through textual input alone. His MultiLFG framework builds on this by using wavelet-domain representations to fuse multiple concepts within diffusion models accurately and without additional training.
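For readers unfamiliar with low-rank adaptation (LoRA), the general mechanism can be sketched as below. This shows only the generic idea of adding small low-rank updates to frozen weights, with one adapter for content and one for style combined by a plain weighted sum. It does not reproduce DuoLoRA's actual merging scheme, and all names and values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_delta(B, A, scale=1.0):
    """Low-rank weight update scale * (B @ A), with B of shape d x r and A of shape r x k."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def apply_adapters(W, deltas):
    """Return the frozen base weights plus each adapter update (W is not modified)."""
    out = [row[:] for row in W]
    for delta in deltas:
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                out[i][j] += v
    return out


# Frozen 2 x 2 base weights and two rank-1 adapters (illustrative values).
W = [[1.0, 0.0], [0.0, 1.0]]
content = lora_delta(B=[[1.0], [0.0]], A=[[0.0, 1.0]], scale=0.5)
style = lora_delta(B=[[0.0], [1.0]], A=[[1.0, 0.0]], scale=0.5)

adapted = apply_adapters(W, [content, style])
```

Only the small `A` and `B` factors would be trained in such a setup, which is why adapting a model this way is far cheaper than retraining every weight.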

Among the projects that Roy found particularly engaging is DiffNat, which was published in Transactions on Machine Learning Research (TMLR) in 2025. The project targets the perceptual quality of images generated by diffusion models, a challenge that has persisted despite significant advances in generative AI. Roy’s method not only improves the statistical consistency of generated images but also integrates a condition-agnostic perceptual guidance strategy that boosts image fidelity without additional training.

The transition from academic research to practical applications is a key focus for Roy as he embarks on a new chapter at NEC Laboratories America as a Research Scientist. He aims to develop new generative model methodologies while exploring their interactions with multimodal systems. His interests lie at the intersection of generative models, vision-language-action models, and embodied AI, with the broader goal of enhancing intelligent systems that can proficiently understand and generate visual information.

Reflecting on his journey, Roy’s fascination with computer vision and machine learning was ignited during his undergraduate studies. The immediate visual impact of signal and image processing algorithms captivated him, fostering a deep curiosity about how machines can emulate human visual perception. His intellectual curiosity was further nurtured by mentorship from Dr. Kuntal Ghosh, who inspired him to approach complex problems with scientific rigor.

Visa issues prevented Roy from attending the recent AAAI Doctoral Consortium in person, but his participation was still fruitful. A colleague presented his research poster on his behalf, sparking discussions with fellow researchers that yielded constructive feedback and potential collaborations. Roy expressed appreciation for the platform, calling it a valuable avenue for sharing early-stage ideas and engaging with the academic community.

Beyond his research endeavors, Roy finds joy in music, stand-up comedy, and travel. He considers exploring diverse cultures a refreshing escape and is also a budding poet who combines humor and storytelling through his performances. This creative outlet contrasts with his rigorous analytical research, allowing him to maintain a well-rounded perspective on life and work.

As Roy moves forward, he remains committed to advancing the capabilities of generative models and their applications, striving to contribute to the scientific understanding of intelligent systems that can interact effectively with the visual world.
