Key Takeaways
- A 16TB unsecured MongoDB database was discovered, exposing approximately 4.3 billion professional records.
- The database contained LinkedIn-style data, including names, emails, phone numbers, job roles, employers, work history, education, locations, skills, languages, and social accounts.
- The exposure of this data enables large-scale AI-driven social-engineering attacks, including phishing and CEO fraud, corporate reconnaissance, and automated personalized scams.
- The ownership of the leaked dataset remains unconfirmed, but researchers found clues suggesting a lead-generation company may be responsible.
- The leak is considered dangerous because the data is both massive and structured, which lets criminals use large language models (LLMs) to automate personalized, targeted scams.
Introduction to the Leak
Researchers recently discovered an unsecured 16TB MongoDB database containing approximately 4.3 billion professional records. The database, which was closed only after researchers alerted the owner, exposed a vast amount of LinkedIn-style data: names, emails, phone numbers, job roles, employers, work history, education, locations, skills, languages, and social accounts. The exposure has significant implications, as it enables large-scale AI-driven social-engineering attacks, including phishing and CEO fraud, corporate reconnaissance, and automated personalized scams.
The Database Structure
The unsecured database was found to contain nine collections, each with a name that likely indicates the type of information contained within. These collections include intent, profiles, unique_profiles, people, sitemap, companies, company_sitemap, address_cache, and intent_archive. The largest collections, including profiles, unique_profiles, and people, exposed nearly two billion personal records. The "unique_profiles" dataset alone listed over 732 million records with image URLs, while the "people" collection added enrichment metrics and Apollo IDs linked to the Apollo.io ecosystem. Notably, there were no signs of an Apollo breach, suggesting that the data may have been collected through other means.
Implications of the Leak
The leak is considered particularly dangerous because the data is both massive and structured, which makes targeted attacks practical. With billions of records, criminals can automate personalized scams, reduce preparation time, and focus on high-value targets, including Fortune 500 employees. Large language models (LLMs) can generate personalized messages from profile information, allowing attackers to send tens of millions of malicious emails to victims. The leak also simplifies social engineering and credential stuffing attacks, as malicious actors can assemble a large, searchable database of personal data that, after enrichment, could also include passwords, device identifiers, links to other social media, and more.
Attribution and Ownership
The ownership of the leaked dataset remains unconfirmed, but researchers found clues suggesting a lead-generation company may be responsible. The company claims access to over 700 million professionals, a figure that closely matches the exposed "unique_profiles" count, and the database went offline a day after notification. However, researchers stopped short of attribution, noting that the company itself may have been scraped. The age of the LinkedIn data is also difficult to determine: timestamps show records were collected or updated in 2025, but some data may date back years, including possible scrapes from large LinkedIn leaks claimed by threat actors in 2021.
Conclusion
The discovery of the unsecured 16TB database highlights the risks of large-scale data exposure: a single misconfigured server was enough to enable AI-driven phishing, CEO fraud, corporate reconnaissance, and automated personalized scams at scale. Because the ownership of the dataset remains unconfirmed, affected individuals cannot rely on the responsible party to notify them, so it is essential to take steps to protect personal data and prevent similar leaks in the future. For organizations, this means implementing robust security measures, such as encryption and access controls; for individuals, it means being cautious when sharing personal data online. Ultimately, the leak is a reminder that both individuals and organizations need to take proactive steps to protect sensitive information.
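As an illustration of what "access controls" and "encryption" can mean for a MongoDB deployment, the following is a minimal sketch that checks an already-parsed mongod configuration for three common hardening settings. The function name audit_mongod_config and the example configurations are hypothetical and are not taken from the researchers' report; the keys it inspects (security.authorization, net.bindIp, net.bindIpAll, net.tls.mode) are standard mongod configuration options. The sketch assumes the configuration has already been loaded into a Python dictionary and that a missing bindIp means the server listens only on localhost.

```python
from typing import Any


def audit_mongod_config(config: dict[str, Any]) -> list[str]:
    """Return warnings for a parsed mongod configuration (hypothetical helper)."""
    warnings: list[str] = []

    security = config.get("security") or {}
    net = config.get("net") or {}

    # Access control: without it, any client that can reach the port can read data.
    if security.get("authorization") != "enabled":
        warnings.append("security.authorization is not 'enabled'")

    # Network exposure: bindIp may hold a comma-separated list of addresses.
    # Assumption: an absent bindIp means localhost only.
    bind_ip = str(net.get("bindIp", "localhost"))
    addresses = {addr.strip() for addr in bind_ip.split(",")}
    if net.get("bindIpAll") or addresses & {"0.0.0.0", "::"}:
        warnings.append("mongod listens on all network interfaces")

    # Encryption in transit: only 'requireTLS' rejects plaintext connections.
    tls = net.get("tls") or {}
    if tls.get("mode") != "requireTLS":
        warnings.append("net.tls.mode is not 'requireTLS'")

    return warnings


# Illustrative configurations; neither describes the actual leaked server.
exposed = {"net": {"port": 27017, "bindIp": "0.0.0.0"}}
hardened = {
    "security": {"authorization": "enabled"},
    "net": {"port": 27017, "bindIp": "127.0.0.1,10.0.0.5", "tls": {"mode": "requireTLS"}},
}

for warning in audit_mongod_config(exposed):
    print(warning)
```

A check like this covers only configuration-level mistakes. It does not replace firewall rules, credential management, or encryption at rest, and it says nothing about how the exposed database in this incident was actually configured.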


