Researchers from University College London and the University of Urbino Carlo Bo have explored whether artificial intelligence can streamline hit generation, one of the earliest steps in drug discovery. Identifying promising drug candidates has traditionally required extensive laboratory work and substantial resources. A new study led by Nagham Osman and Laura Toni sets out to show that machine learning can help design novel molecules with potential biological activity. The work validates the ability of machine learning models to generate ‘hit-like’ compounds that can serve as a virtual starting point for pharmaceutical research.
The collaborative study applies generative models, including DiGress, MolRNN, and GraphINVENT, to de novo drug design. The focus was on generating molecules that bind seven protein targets: ADORA2A, D3R, GSK-3β, HSP90α, PPARα, SRC, and Thrombin. The models were trained under several configurations, including reinforcement learning and training focused on drug-like properties. Generated molecules were evaluated on docking scores (lower scores indicate stronger predicted binding), on similarity to known ligands, and on their physicochemical properties. The models generally performed well, although generating suitable compounds for PPARα, SRC, and Thrombin proved more difficult.
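To make the two evaluation signals concrete, the sketch below shows one way docking scores and similarity to known ligands might be combined to shortlist generated molecules. It is a minimal illustration under stated assumptions, not the study's procedure: fingerprints are modeled as sets of "on" bit indices, the 0.8 novelty cutoff and the `Candidate`, `rank_candidates`, and `gen_*` names are hypothetical, and the docking values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    docking_score: float         # lower = stronger predicted binding
    fingerprint: frozenset[int]  # indices of "on" bits in a structural fingerprint


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_similarity(fp: frozenset[int], known: list[frozenset[int]]) -> float:
    """Highest similarity of one fingerprint to any known ligand."""
    return max((tanimoto(fp, k) for k in known), default=0.0)


def rank_candidates(
    cands: list[Candidate],
    known_ligands: list[frozenset[int]],
    max_sim: float = 0.8,  # assumed novelty cutoff, not taken from the study
) -> list[Candidate]:
    """Drop near-copies of known ligands, then sort best (lowest) docking score first."""
    novel = [c for c in cands if max_similarity(c.fingerprint, known_ligands) < max_sim]
    return sorted(novel, key=lambda c: c.docking_score)


if __name__ == "__main__":
    known = [frozenset({1, 2, 3, 4})]
    cands = [
        Candidate("gen_a", -9.1, frozenset({1, 2, 3, 4})),  # identical to a known ligand
        Candidate("gen_b", -8.4, frozenset({1, 2, 7, 8})),
        Candidate("gen_c", -10.2, frozenset({5, 6, 7})),
    ]
    print([c.name for c in rank_candidates(cands, known)])  # ['gen_c', 'gen_b']
```

Filtering on similarity before sorting keeps the shortlist from being dominated by copies of ligands that are already known, which would dock well but add nothing new.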
Notably, models trained to produce hit-like compounds outperformed those trained with reinforcement learning. One compound designed for GSK-3β surpassed the existing ligand sets in activity, underscoring the potential of machine learning to create novel molecules with enhanced biological activity. The research also identified key binding interactions within the GSK-3β binding site, further supporting the promise of generative models in drug design.
The study frames hit-like molecule generation as a task in its own right, aimed at efficiently producing compounds that can feed into conventional screening workflows. The researchers benchmarked several autoregressive and diffusion-based generative models, training them on a range of datasets. Generated molecules then passed through a stringent multi-stage filtering pipeline designed to define hit-like chemical space based on physicochemical properties, structural features, and predicted bioactivity. The synthesis and in vitro confirmation of several GSK-3β hits demonstrated the practical applicability of the generative approach, while also exposing limitations in current evaluation metrics and existing training data.
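A multi-stage filter of the kind described above can be sketched as an ordered list of pass/fail checks, one per criterion (physicochemical, structural, predicted bioactivity). This is a hypothetical illustration, not the study's pipeline: properties are assumed to be precomputed by a cheminformatics toolkit, the physicochemical window loosely follows Lipinski's rule of five, and the activity cutoff of 0.5 and all molecule IDs and values are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Molecule:
    mol_id: str
    mol_weight: float           # Da
    logp: float                 # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int
    has_structural_alert: bool  # e.g. flagged by an undesirable-substructure screen
    predicted_activity: float   # e.g. a bioactivity model's score in [0, 1]


Stage = Callable[[Molecule], bool]


def physchem_ok(m: Molecule) -> bool:
    # Rule-of-five style window; thresholds assumed for illustration.
    return m.mol_weight <= 500 and m.logp <= 5 and m.h_donors <= 5 and m.h_acceptors <= 10


def structure_ok(m: Molecule) -> bool:
    # Reject molecules carrying an unwanted substructure.
    return not m.has_structural_alert


def activity_stage(threshold: float = 0.5) -> Stage:
    # Keep molecules whose predicted activity clears an assumed cutoff.
    return lambda m: m.predicted_activity >= threshold


def run_pipeline(
    mols: list[Molecule], stages: list[tuple[str, Stage]]
) -> tuple[list[Molecule], list[tuple[str, int]]]:
    """Apply stages in order, recording how many molecules survive each one."""
    counts = []
    for name, keep in stages:
        mols = [m for m in mols if keep(m)]
        counts.append((name, len(mols)))
    return mols, counts


if __name__ == "__main__":
    batch = [
        Molecule("mol_a", 320.0, 2.1, 1, 4, False, 0.2),  # fails activity
        Molecule("mol_b", 620.0, 4.0, 2, 6, False, 0.9),  # fails physchem (weight)
        Molecule("mol_c", 350.0, 2.5, 1, 4, True, 0.8),   # fails structural alert
        Molecule("mol_d", 410.0, 3.1, 2, 5, False, 0.7),  # passes every stage
    ]
    stages = [("physchem", physchem_ok), ("structure", structure_ok),
              ("activity", activity_stage(0.5))]
    hits, counts = run_pipeline(batch, stages)
    print([m.mol_id for m in hits], counts)
    # ['mol_d'] [('physchem', 3), ('structure', 2), ('activity', 1)]
```

Recording survivor counts per stage makes it easy to see which criterion removes most of a generated batch, which is useful when a target's outputs rarely reach the hit-like region.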
The scientists emphasize that deep learning models can produce compounds suitable for the early stages of drug discovery. The authors describe this as the first explicit attempt to evaluate whether generative models can support the critical phase of hit identification. Using a new evaluation framework, they assessed the outputs of autoregressive and diffusion-based models across multiple datasets and training configurations. The models generated valid, diverse, and biologically relevant compounds whose properties aligned with those of established drug candidates.
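"Valid" and "diverse" are usually measured with simple distribution metrics over a generated batch. The sketch below uses definitions common in the generative-chemistry literature (validity, uniqueness, novelty against the training set, and internal diversity as one minus the mean pairwise Tanimoto similarity); they are not necessarily the study's own definitions. The `toy_is_valid` and `toy_fingerprint` functions are hypothetical stand-ins: a real pipeline would parse SMILES and compute structural fingerprints with a cheminformatics toolkit.

```python
from __future__ import annotations

from itertools import combinations
from typing import Callable


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def generation_metrics(
    smiles: list[str],
    training_set: list[str],
    is_valid: Callable[[str], bool],
    fingerprint: Callable[[str], frozenset],
) -> dict[str, float]:
    """Validity, uniqueness, novelty and internal diversity of one generated batch."""
    valid = [s for s in smiles if is_valid(s)]
    unique = sorted(set(valid))
    novel = set(unique) - set(training_set)
    fps = [fingerprint(s) for s in unique]
    pairs = list(combinations(fps, 2))
    mean_sim = sum(tanimoto(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "validity": len(valid) / len(smiles) if smiles else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
        # 1 - mean pairwise similarity; higher means a more varied batch
        "internal_diversity": 1.0 - mean_sim if pairs else 0.0,
    }


def toy_is_valid(s: str) -> bool:
    # Hypothetical placeholder for a real SMILES parser.
    return bool(s) and "?" not in s


def toy_fingerprint(s: str) -> frozenset:
    # Hypothetical placeholder: the set of characters, not a real fingerprint.
    return frozenset(s)


if __name__ == "__main__":
    batch = ["CCO", "CCO", "CCN", "c1ccccc1", "C?C"]
    print(generation_metrics(batch, ["CCO"], toy_is_valid, toy_fingerprint))
```

None of these metrics looks at binding or activity, so a batch can score well on them regardless of how active its molecules are predicted to be.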
Despite these advances, the study revealed constraints when high-quality, hit-like training data was scarce, particularly for certain protein targets. Additionally, standard metrics for assessing molecular diversity and similarity did not always correspond with predicted biological activity, suggesting a need for more biologically relevant benchmarks. Moving forward, the research team plans to focus on enhancing datasets and developing improved model architectures capable of learning effectively from limited target-specific data.
As the pharmaceutical industry increasingly turns to artificial intelligence to innovate in drug discovery, this research could help shape future methods. By showing that machine learning models can generate active drug candidates, the study opens new avenues for accelerating the development of therapeutics while potentially reducing reliance on traditional, resource-intensive screening.