
Large Language Models Can Deanonymize Users with 67% Recall, ETH Zurich Study Reveals

An ETH Zurich study shows that large language models can deanonymize users with up to 67% recall, raising alarms about the effectiveness of current online privacy protections.

Large language models (LLMs) pose a growing threat to online anonymity, according to a recent study conducted by researchers from ETH Zurich, Anthropic, and collaborating institutions. The research reveals that these AI systems can re-identify anonymous users at scale using only their writing, a finding that calls into question the effectiveness of current online privacy measures.

The study demonstrates that LLMs can automate the process of deanonymization—matching anonymous online profiles to real identities or other accounts—by analyzing unstructured text such as forum posts, comments, and interview transcripts. What once required extensive manual investigation can now be executed in a matter of minutes and at a fraction of the cost.

Historically, deanonymization relied on structured data, such as demographic attributes or ratings, and required skilled human analysts. The researchers, however, have shown that LLMs can efficiently perform both necessary steps of this process. By scrutinizing text, these models extract relevant identity signals, such as education, profession, location, and even subtle linguistic cues. These signals are subsequently employed to search and verify potential matches across extensive datasets or the open web.
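The article does not reproduce the researchers' prompts or code; as an illustrative sketch of the first step, signal extraction might look something like the following, where the prompt wording and the `call_llm` helper are hypothetical stand-ins rather than the study's actual implementation:

```python
# Illustrative sketch only: the prompt template and the call_llm() helper
# are hypothetical, not the researchers' actual implementation.
import json

EXTRACTION_PROMPT = """Read the following anonymous posts and list any
identity signals they reveal (education, profession, location, dialect,
hobbies). Answer as a JSON object with those keys.

Posts:
{posts}"""

def extract_signals(posts: list[str], call_llm) -> dict:
    """Ask an LLM to pull identity signals out of unstructured text."""
    prompt = EXTRACTION_PROMPT.format(posts="\n---\n".join(posts))
    # The model's JSON answer becomes a structured signal profile that the
    # second step can search against other accounts or the open web.
    return json.loads(call_llm(prompt))
```

The point of the sketch is the shape of the step: free text goes in, a structured profile of identity signals comes out, and that profile is what gets matched against candidate accounts.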

In one notable experiment, AI agents were tasked with identifying users based on anonymized profiles derived from platforms like Hacker News and Reddit. The systems achieved recall rates between 25% and 67%, with precision levels ranging from 70% to 90%. This means they could accurately identify a significant portion of users while maintaining relatively low error rates. In another test involving anonymized interview transcripts with scientists, the system successfully identified 9 out of 33 individuals, even in the absence of explicit identifiers.
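These figures are easier to interpret with the standard definitions: recall is the fraction of all users the system managed to identify, while precision is the fraction of its proposed matches that were correct. A quick worked example (the counts here are invented purely for illustration):

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Standard metrics used to score identification attempts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical example: out of 100 anonymous users, the system proposes
# 75 matches, 67 of which are correct (8 wrong guesses, 33 users missed).
p, r = precision_recall(true_positives=67, false_positives=8,
                        false_negatives=33)
# p ≈ 0.89 (precision), r = 0.67 (recall)
```

So a system at 67% recall and ~90% precision finds two-thirds of the users while rarely naming the wrong person.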

To examine performance on a larger scale, the researchers engineered a modular pipeline that breaks down deanonymization into four steps: feature extraction from text, candidate matching, reasoning over those candidates, and confidence scoring. This framework enables the system to work across large datasets while measuring performance using standard metrics like precision and recall. In benchmarks that linked Hacker News users to LinkedIn profiles, the AI system achieved up to 45% recall at 99% precision, a stark contrast to near-zero performance from traditional methods.
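The four stages above can be sketched as a toy pipeline. Every function body below is a deliberately simple placeholder (keyword overlap instead of LLM reasoning), chosen only to make the structure concrete; the study's actual components are not published in this article:

```python
# Toy sketch of the four-stage pipeline; the matching and scoring logic
# is a keyword-overlap placeholder, not the study's LLM-based method.
from dataclasses import dataclass

@dataclass
class Match:
    candidate_id: str
    confidence: float

def extract_features(posts: list[str]) -> set[str]:
    """Stage 1: pull signal keywords out of the anonymous text."""
    return {w.lower().strip(".,") for p in posts for w in p.split()}

def shortlist(signals: set[str], pool: list[dict]) -> list[dict]:
    """Stage 2: keep candidates sharing at least one signal."""
    return [c for c in pool if signals & set(c["keywords"])]

def reason_and_score(signals: set[str], candidate: dict) -> float:
    """Stages 3-4: the study has an LLM reason over each pair; here we
    just score by fraction of the candidate's keywords that match."""
    overlap = signals & set(candidate["keywords"])
    return len(overlap) / len(candidate["keywords"])

def deanonymize(posts: list[str], pool: list[dict],
                threshold: float = 0.5) -> list[Match]:
    signals = extract_features(posts)
    # Reporting only high-confidence matches is what lets a system trade
    # recall for the very high precision figures quoted above.
    return [Match(c["id"], s)
            for c in shortlist(signals, pool)
            if (s := reason_and_score(signals, c)) >= threshold]
```

Raising `threshold` mirrors the precision/recall trade-off in the benchmarks: fewer users identified, but fewer false accusations among them.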

The reasoning phase, where the model determines whether two profiles belong to the same individual, was particularly crucial. It significantly enhanced accuracy beyond simple similarity matching. The researchers found that the chances of successful identification increased as more user data became accessible. In tests linking Reddit users across various communities, recall rates surged as the number of shared references grew, demonstrating that users who disclosed more about themselves were easier to identify.

Moreover, increased computational resources improved outcomes. Models that spent more time reasoning about potential matches exhibited better performance, especially when aiming for high-confidence results. Even as the number of potential matches expanded, the system remained effective, with performance declining gradually as datasets scaled from thousands to tens of thousands of users. The researchers estimate that meaningful success rates could persist even with candidate pools in the millions.

The signals used to identify individuals are not new; what has changed, the findings indicate, is the speed, cost, and scale at which those clues can now be analyzed. This shift raises urgent questions about the reliability of online privacy models. Simple measures, such as removing names and usernames, may no longer suffice if the remaining text contains enough contextual information to reconstruct identities.

The study also challenges the assumption that unstructured text is safer than structured datasets. Previous research has shown that anonymized datasets could often be re-identified, and this study extends that concern to everyday online communications. While the researchers acknowledged limitations in their experiments, such as reliance on users who voluntarily linked accounts across different platforms, they underscored the need for future work to explore more robust anonymization methods without sacrificing usability.

As artificial intelligence systems lower the barriers to large-scale identification, the line between anonymous and identifiable online behavior is blurring. These findings suggest that platform policies and user expectations about anonymity in the digital age will need to evolve accordingly.

Written By AiPressa Staff


© 2025 AIPressa · Part of Buzzora Media · All rights reserved.