In a recent interview, Aniket Roy, a newly minted PhD from Johns Hopkins University, shared insights into his groundbreaking research in generative models for computer vision tasks. Under the guidance of Bloomberg Distinguished Professor Rama Chellappa, Roy’s work focuses on enhancing efficiency and adaptability in image generation, especially in resource-constrained environments.
Roy’s PhD research spans generative AI, multimodal learning, and few-shot learning. He has developed methods that let models learn new concepts or carry out intricate visual tasks with minimal data and computational resources. His work addresses longstanding challenges such as data scarcity and personalized image synthesis, aiming to make advanced vision systems more practical for real-world applications.
One significant contribution from Roy is FeLMi, a few-shot learning framework that utilizes uncertainty-guided hard mixup strategies. This innovation improves robustness when working with a limited number of labeled samples. Another noteworthy project is Cap2Aug, which employs textual descriptions to guide synthetic image generation, effectively enhancing visual diversity and bridging the gap between real and generated data.
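To make the idea of uncertainty-guided mixup more concrete, the following is a minimal sketch and not FeLMi's actual implementation. It assumes that "uncertainty" means the entropy of a classifier's predicted probabilities and that "hard" samples are the most uncertain ones; the function names (`predictive_entropy`, `hard_mixup`) and the parameters `k` and `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def hard_mixup(x, y_onehot, probs, k, alpha=0.4, rng=None):
    """Mix the k most uncertain samples with randomly chosen partner samples.

    Assumption: uncertainty is measured by predictive entropy. Each mixed
    sample is a convex combination lam * hard + (1 - lam) * partner, with
    lam drawn from Beta(alpha, alpha) as in standard mixup.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y_onehot = np.asarray(y_onehot, dtype=float)

    # Indices of the k samples the model is least sure about.
    hard = np.argsort(-predictive_entropy(probs))[:k]
    # Random partners drawn from the whole labeled set.
    partners = rng.integers(0, len(x), size=k)

    lam = rng.beta(alpha, alpha, size=k)
    lam_x = lam.reshape(-1, *([1] * (x.ndim - 1)))  # broadcast over feature dims

    x_mix = lam_x * x[hard] + (1.0 - lam_x) * x[partners]
    y_mix = lam[:, None] * y_onehot[hard] + (1.0 - lam[:, None]) * y_onehot[partners]
    return x_mix, y_mix, hard
```

The mixed pairs `(x_mix, y_mix)` would then be added to the small labeled set during training, so that the additional examples concentrate around the samples the model finds hardest.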
In addition to these frameworks, Roy developed DiffNat, a regularization method that improves the perceptual quality of images generated by diffusion models. By applying a kurtosis-concentration loss, DiffNat encourages generated images to exhibit more natural texture statistics, a crucial element in enhancing visual realism for downstream vision tasks.
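As a rough illustration of what a kurtosis-concentration penalty can look like, the sketch below computes the spread of kurtosis values across band-pass versions of an image. It is not DiffNat's implementation: it assumes a single-channel image, uses a one-level Haar decomposition as the band-pass filter bank, and defines the loss as the gap between the largest and smallest sub-band kurtosis. The function names are hypothetical, and a real training loss would be written in a differentiable framework rather than NumPy.

```python
import numpy as np

def haar_detail_subbands(img):
    """One-level 2D Haar transform; returns the three detail (band-pass) sub-bands."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so 2x2 blocks tile the image
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    row_detail = (a + b - c - d) / 2.0   # difference between the two rows of each block
    col_detail = (a - b + c - d) / 2.0   # difference between the two columns of each block
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return [row_detail, col_detail, diag_detail]

def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    v = values.ravel() - values.mean()
    var = (v ** 2).mean()
    return (v ** 4).mean() / (var ** 2 + 1e-12)

def kurtosis_concentration_loss(img):
    """Spread of kurtosis across sub-bands; small when the values are concentrated."""
    ks = [kurtosis(s) for s in haar_detail_subbands(img)]
    return max(ks) - min(ks)
```

Under these assumptions, an image whose band-pass responses all have similar kurtosis scores near zero, and adding this term to a diffusion training objective would push generated images toward that behavior.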
Roy has also worked on personalizing generative models. He introduced DuoLoRA, a framework for efficient control over content and style that allows fine-tuning without retraining the entire model. This work extends to zero-shot settings, enabling users to customize objects during generation through textual input alone. His MultiLFG framework further refines this process by incorporating wavelet-domain representations to enable accurate, training-free fusion of multiple concepts within diffusion models.
Among the projects Roy found particularly engaging is DiffNat, which was published in Transactions on Machine Learning Research (TMLR) and presented at the International Conference on Learning Representations (ICLR) in 2025. The project targets the perceptual quality of images generated by diffusion models, a challenge that has persisted despite significant advances in generative AI. Roy’s method improves the statistical consistency of generated images and also integrates a condition-agnostic perceptual guidance strategy that boosts image fidelity without additional training.
The transition from academic research to practical applications is a key focus for Roy as he embarks on a new chapter at NEC Laboratories America as a Research Scientist. He aims to develop new generative model methodologies while exploring their interactions with multimodal systems. His interests lie at the intersection of generative models, vision-language-action models, and embodied AI, with the broader goal of enhancing intelligent systems that can proficiently understand and generate visual information.
Reflecting on his journey, Roy’s fascination with computer vision and machine learning was ignited during his undergraduate studies. The immediate visual impact of signal and image processing algorithms captivated him, fostering a deep curiosity about how machines can emulate human visual perception. His intellectual curiosity was further nurtured by mentorship from Dr. Kuntal Ghosh, who inspired him to approach complex problems with scientific rigor.
Visa issues prevented Roy from attending the recent AAAI Doctoral Consortium, but the event was nonetheless fruitful. A colleague presented his research poster, which sparked insightful discussions with fellow researchers and yielded constructive feedback and potential collaborative opportunities. Roy expressed appreciation for the platform, recognizing it as a valuable avenue for sharing early-stage ideas and engaging with the academic community.
Beyond his research endeavors, Roy finds joy in music, stand-up comedy, and travel. He considers exploring diverse cultures a refreshing escape and is also a budding poet who combines humor and storytelling through his performances. This creative outlet contrasts with his rigorous analytical research, allowing him to maintain a well-rounded perspective on life and work.
As Roy moves forward, he remains committed to advancing the capabilities of generative models and their applications, striving to contribute to the scientific understanding of intelligent systems that can interact effectively with the visual world.