Tonic Textual Reveals LLM-Based Annotation Workflow Achieving 0.70 F1 with Minimal Human Input

Tonic Textual unveils an LLM-based annotation workflow that reaches a 0.71 F1 score on the NCBI Disease Corpus using just ten human-labeled validation examples, sharply cutting the cost of building NER training data.

Staff | Published 2 hours ago

In a development that could reshape how natural language processing (NLP) teams build training data, Tonic Textual has introduced a custom entity workflow that uses large language models to sharply reduce the cost of human annotation for named entity recognition (NER). It arrives as demand for high-quality training data keeps growing while traditional annotation remains slow and expensive.

The NCBI Disease Corpus serves as a prime example of the challenges involved in building quality training sets. Developed over two summers by 14 annotators with biomedical informatics backgrounds, the corpus required painstaking efforts to label 793 abstracts from PubMed, with each document independently reviewed to ensure accuracy. The labor-intensive process illustrates the broader issue facing NER: recruiting domain experts and managing multi-annotator projects can be prohibitively costly, especially in specialized fields like healthcare and finance.

Tonic Textual’s approach uses large language models (LLMs) to automate the annotation phase. Practitioners write clear annotation guidelines, upload a small validation set of ground-truth labels, refine their instructions against that set, and then let the LLM annotate the bulk of the data. Because the LLM can process thousands of documents quickly, the method removes the traditional bottleneck in building NER training sets.
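Refining guidelines against a validation set comes down to scoring the LLM's annotations against the human ground truth after each revision. The sketch below assumes strict entity-level matching (exact character offsets and label), a common NER convention; the article does not say which matching rule Tonic Textual uses, and the `Span` type, `entity_f1` helper, and sample spans are hypothetical.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """One labeled entity: document id, character offsets, and entity type."""
    doc_id: str
    start: int
    end: int
    label: str


def entity_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> dict[str, float]:
    """Strict entity-level scoring: a prediction counts only if offsets and label match exactly."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical validation data: human ground truth vs. LLM annotations.
gold = [
    Span("doc1", 10, 27, "Disease"),
    Span("doc1", 45, 58, "Disease"),
    Span("doc2", 0, 12, "Disease"),
]
llm = [
    Span("doc1", 10, 27, "Disease"),  # exact match
    Span("doc1", 45, 52, "Disease"),  # boundary error: counts as both a miss and a false positive
    Span("doc2", 0, 12, "Disease"),   # exact match
    Span("doc2", 30, 35, "Disease"),  # spurious span
]
print(entity_f1(gold, llm))
```

Here two of four predicted spans are correct and two of three gold spans are found, giving precision 0.5, recall about 0.667, and F1 of 4/7 (about 0.571). A lenient scheme that credits partial overlaps would score the boundary error differently, so the choice of matching rule should be fixed before comparing guideline revisions.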

Tonic Textual tested the method on the NCBI Disease Corpus. In a controlled experiment, a model trained only on LLM annotations, with no human-labeled training data, scored 0.71 F1 against the ground-truth labels. The score rose incrementally as human labels were mixed into the training set, reaching 0.81 F1 with fully human annotations. Each additional human-labeled document added less than the one before, suggesting that most of the value comes from well-written guidelines rather than from extensive human labeling.
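The sweep from purely LLM-labeled to fully human-labeled training data can be pictured as swapping a growing fraction of documents from LLM labels to human labels. This is a minimal sketch under that assumption: the article does not describe how Tonic Textual chose which documents received human labels, so the uniform random sampling and the `mix_training_labels` helper are hypothetical.

```python
import random


def mix_training_labels(doc_ids, llm_labels, human_labels, human_fraction, seed=0):
    """Return a training set in which a random fraction of documents use human
    labels and the remaining documents keep their LLM labels."""
    rng = random.Random(seed)  # fixed seed so each mixing level is reproducible
    n_human = round(human_fraction * len(doc_ids))
    human_docs = set(rng.sample(doc_ids, n_human))
    return {
        doc_id: human_labels[doc_id] if doc_id in human_docs else llm_labels[doc_id]
        for doc_id in doc_ids
    }


# Toy usage: placeholder strings stand in for each document's entity annotations.
doc_ids = [f"doc{i}" for i in range(10)]
llm_labels = {d: "llm" for d in doc_ids}
human_labels = {d: "human" for d in doc_ids}
for fraction in (0.0, 0.5, 1.0):
    mixed = mix_training_labels(doc_ids, llm_labels, human_labels, fraction)
    print(fraction, sum(v == "human" for v in mixed.values()))
```

Training one model per mixing level and scoring each against the same held-out ground truth would trace the kind of diminishing-returns curve the experiment describes.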

Tonic Textual’s custom entity workflow also compared favorably with a zero-shot alternative. GLiNER2, a zero-shot NER model, achieved good precision but a recall of only 0.26. Tonic’s workflow, by contrast, needed just ten human-labeled validation examples, and no human-labeled training data, to produce a model with 0.71 F1.
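Because F1 is the harmonic mean of precision and recall, low recall drags it down no matter how precise a model is. The article reports only GLiNER2's recall (0.26) and describes its precision as "good", so the precision values swept below are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recall = 0.26  # GLiNER2's reported recall
# Precision is not reported numerically, so try a few values up to the best case of 1.0.
for precision in (0.8, 0.9, 1.0):
    print(f"precision={precision:.2f}  F1={f1_score(precision, recall):.3f}")
```

Even with perfect precision, a recall of 0.26 caps F1 at about 0.413, well below the 0.71 reached by the LLM-annotated model.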

Open questions remain about the choice of model. The experiments relied on general-purpose models such as RoBERTa, and the authors acknowledge that specialized biomedical models could narrow the remaining gap between roughly 0.70 F1 with LLM-only annotations and 0.81 F1 with full human annotations. The polysemy of gene and disease names adds a further challenge: the same abbreviation often refers to both a gene and a disease, which complicates annotation.

A second case study, on healthcare identifiers in electronic health records (EHR), reinforced these results. The workflow started from a validation set of 123 documents and scaled to a training set of 1,119 documents, all annotated by the LLM. After iterative refinement of the guidelines, the final model reached 0.947 F1, above the production release threshold of 0.914, without any human-labeled training data. That result points to fast, low-cost deployment of NER models.
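The iterate-until-release pattern from the EHR case study can be sketched as a loop that revises guidelines until an evaluation score clears the production bar. Only the 0.914 threshold comes from the article; the `refine_until_release` function, the `evaluate` and `revise` hooks, and the toy stand-ins in the usage example are hypothetical and do not reflect Tonic Textual's actual product or results.

```python
from typing import Callable

RELEASE_THRESHOLD = 0.914  # production release bar reported in the EHR case study


def refine_until_release(
    guidelines: str,
    evaluate: Callable[[str], float],
    revise: Callable[[str, float], str],
    threshold: float = RELEASE_THRESHOLD,
    max_rounds: int = 10,
) -> tuple[str, float, bool]:
    """Revise annotation guidelines until the evaluation F1 clears the threshold.

    `evaluate` (hypothetical) scores LLM annotations made under `guidelines`
    against the human-labeled validation set; `revise` (hypothetical) returns
    updated guidelines given the latest score.
    """
    score = evaluate(guidelines)
    for _ in range(max_rounds):
        if score >= threshold:
            return guidelines, score, True
        guidelines = revise(guidelines, score)
        score = evaluate(guidelines)
    return guidelines, score, score >= threshold


# Toy stand-ins, not real measurements: each added rule nudges the score upward.
def toy_evaluate(g: str) -> float:
    return min(1.0, 0.80 + 0.05 * g.count("rule"))


def toy_revise(g: str, score: float) -> str:
    return g + " rule"


final_guidelines, final_score, released = refine_until_release(
    "base guidelines", toy_evaluate, toy_revise
)
print(final_guidelines, round(final_score, 3), released)
```

Capping the number of rounds keeps the loop from running indefinitely when guideline edits stop helping; in that case the function reports that the threshold was not met.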

The workflow compresses weeks of labor-intensive annotation into hours, letting organizations shift their effort from collecting labels to refining their definition of what they want to extract. For sectors that depend on NER and want better data extraction without prohibitive annotation costs, Tonic Textual’s custom entity workflow offers a promising answer to a longstanding challenge in NLP.

AIPRESSA.COM