Large language models (LLMs) pose a growing threat to online anonymity, according to a recent study conducted by researchers from ETH Zurich, Anthropic, and collaborating institutions. The research reveals that these AI systems can re-identify anonymous users at scale using only their writing, a finding that calls into question the effectiveness of current online privacy measures.
The study demonstrates that LLMs can automate the process of deanonymization—matching anonymous online profiles to real identities or other accounts—by analyzing unstructured text such as forum posts, comments, and interview transcripts. What once required extensive manual investigation can now be executed in a matter of minutes and at a fraction of the cost.
Historically, deanonymization relied on structured data, such as demographic attributes or ratings, and required skilled human analysts. The researchers, however, have shown that LLMs can efficiently perform both steps of the process. First, by scrutinizing text, the models extract identity signals such as education, profession, location, and even subtle linguistic cues; second, those signals are used to search for and verify potential matches across large datasets or the open web.
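To make that two-step process concrete, the sketch below shows what the extraction step might look like in practice. It is an illustration only: the model name, prompt wording, and JSON schema are assumptions rather than details from the study, and the OpenAI Python client is used simply as a familiar LLM interface.

```python
# A sketch of the extraction step only. The model name, prompt, and
# output schema below are assumptions for illustration, not the
# study's actual setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = """Read the following posts written by one anonymous user.
List the identity signals you can infer and respond as JSON with the keys
"education", "profession", "location", and "other_cues".

Posts:
{posts}"""

def extract_signals(posts: str) -> dict:
    """Ask the model to pull identity signals out of free-form text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(posts=posts)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```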
In one notable experiment, AI agents were tasked with identifying users based on anonymized profiles derived from platforms like Hacker News and Reddit. The systems achieved recall rates between 25% and 67%, with precision between 70% and 90%: in other words, they recovered a substantial share of users, and most of the matches they proposed were correct. In another test involving anonymized interview transcripts with scientists, the system correctly identified 9 of the 33 individuals despite the absence of explicit identifiers.
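For readers unfamiliar with these metrics: precision is the share of proposed matches that are correct, while recall is the share of true identities the system actually recovers. A tiny worked example, with invented counts rather than figures from the study:

```python
# Illustrative arithmetic only; these counts are invented, not taken
# from the paper's experiments.
true_positives = 50   # anonymous profiles correctly matched to an identity
false_positives = 10  # proposed matches that were wrong
false_negatives = 40  # real identities the system failed to find

precision = true_positives / (true_positives + false_positives)  # 0.83
recall = true_positives / (true_positives + false_negatives)     # 0.56
print(f"precision={precision:.2f} recall={recall:.2f}")
```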
To examine performance on a larger scale, the researchers engineered a modular pipeline that breaks down deanonymization into four steps: feature extraction from text, candidate matching, reasoning over those candidates, and confidence scoring. This framework enables the system to work across large datasets while measuring performance using standard metrics like precision and recall. In benchmarks that linked Hacker News users to LinkedIn profiles, the AI system achieved up to 45% recall at 99% precision, a stark contrast to near-zero performance from traditional methods.
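The paper's implementation is not reproduced here, but a pipeline of that shape could be organized roughly as follows. Every function name and type in this skeleton is a placeholder for illustration; only the four-stage structure comes from the study's description.

```python
# A hypothetical skeleton of the four-stage pipeline. Every name here
# is a placeholder; only the overall structure follows the study's
# description. Stage bodies are elided.
from dataclasses import dataclass

@dataclass
class Match:
    candidate_id: str
    rationale: str
    confidence: float

def extract_features(text: str) -> dict:
    """Stage 1: pull identity signals (location, profession, ...) from text."""
    ...

def find_candidates(features: dict, corpus: list[dict]) -> list[str]:
    """Stage 2: retrieve plausible candidate profiles, e.g. via search."""
    ...

def reason_over(features: dict, candidate_id: str) -> Match:
    """Stage 3: have the LLM argue for or against a specific candidate."""
    ...

def deanonymize(text: str, corpus: list[dict],
                threshold: float = 0.9) -> list[Match]:
    """Stage 4: score confidence and keep only high-confidence matches."""
    features = extract_features(text)
    candidates = find_candidates(features, corpus)
    matches = [reason_over(features, c) for c in candidates]
    return [m for m in matches if m.confidence >= threshold]
```

The threshold parameter captures the precision-recall trade-off reported above: raising it returns fewer, but more trustworthy, matches.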
The reasoning phase, where the model determines whether two profiles belong to the same individual, was particularly crucial. It significantly enhanced accuracy beyond simple similarity matching. The researchers found that the chances of successful identification increased as more user data became accessible. In tests linking Reddit users across various communities, recall rates surged as the number of shared references grew, demonstrating that users who disclosed more about themselves were easier to identify.
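One plausible shape for that pairwise reasoning step is sketched below; the prompt wording and verdict format are assumptions, not quotes from the paper.

```python
# A hypothetical prompt for the pairwise reasoning step; the wording and
# verdict format are illustrative, not quoted from the paper.
VERIFY_PROMPT = """Profile A (anonymous) signals: {signals}
Profile B (candidate profile): {candidate}

Do A and B describe the same person? Weigh agreements and contradictions
in location, profession, timeline, and writing style, then answer on one
line: MATCH <confidence 0-100> or NO_MATCH <confidence 0-100>."""

def parse_verdict(reply: str) -> tuple[bool, float]:
    """Turn the model's one-line verdict into (is_match, confidence in [0, 1])."""
    label, score = reply.strip().split()
    return label == "MATCH", int(score) / 100
```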
Moreover, increased computational resources improved outcomes. Models that spent more time reasoning about potential matches exhibited better performance, especially when aiming for high-confidence results. Even as the number of potential matches expanded, the system remained effective, with performance declining gradually as datasets scaled from thousands to tens of thousands of users. The researchers estimate that meaningful success rates could persist even with candidate pools in the millions.
The findings indicate that the signals used to identify individuals are nothing new; what has changed is the speed, cost, and scale at which those clues can now be analyzed. This shift raises urgent questions about the reliability of current online privacy models. Simple measures, such as removing names and usernames, may no longer suffice if the remaining text contains enough contextual information to reconstruct an identity.
The study also challenges the assumption that unstructured text is safer than structured datasets. Previous research has shown that anonymized datasets could often be re-identified, and this study extends that concern to everyday online communications. While the researchers acknowledged limitations in their experiments, such as reliance on users who voluntarily linked accounts across different platforms, they underscored the need for future work to explore more robust anonymization methods without sacrificing usability.
As artificial intelligence lowers the barriers to large-scale identification, the distinction between anonymous and identifiable online behavior becomes increasingly blurred. The findings suggest that platform policies and user expectations about anonymity in the digital age will need to evolve accordingly.