A major AI training data set contains millions of examples of personal data

Source: MIT Technology Review

TL;DR

AI Generated

A major AI training data set, DataComp CommonPool, contains millions of examples of personal data, including images of passports, credit cards, and birth certificates, according to new research. The researchers found thousands of images showing identifiable faces and identity documents in the portion of CommonPool they audited, and they estimate that the full data set contains hundreds of millions of such images. Released in 2023, CommonPool consists of 12.8 billion image-text pairs and is used to train generative text-to-image models. The findings highlight the privacy risks of personally identifiable information in training data and the difficulty of filtering it out effectively. The researchers urge the machine-learning community to address these privacy issues and to reconsider the practice of indiscriminate web scraping.

Similar Articles

SemiEngineering

The Smart Advantage: How Artificial Intelligence Is Transforming Inspection And Metrology In Semiconductor Manufacturing

Artificial intelligence (AI) is changing semiconductor inspection and metrology by making defect detection more automated, faster, and more adaptable. AI-driven systems draw on large data sets to uncover patterns and anomalies that traditional methods may miss, improving accuracy and efficiency. AI-integrated platforms such as Nordson's SQ3000 Multi-Function System are said to detect microscopic flaws faster than traditional methods. Real-time, in-line inspection lets these systems process data rapidly without slowing production, while machine learning (ML) models adjust quickly to new production requirements. Advances in ML are also producing self-teaching inspection systems that become more capable and adaptable with each interaction.

SemiEngineering

Microsoft is automatically updating Windows 11 24H2 to 25H2 using machine learning

Microsoft is automatically updating Windows 11 from version 24H2 to 25H2 on Home and Pro devices that are not managed by IT departments or organizations. Users have limited control over the timing of the update but can postpone it temporarily. The forced update is meant to streamline future updates and focus resources on the newer version, since version 24H2 will reach end-of-life in 2026. Microsoft describes this as a "machine learning-based intelligent rollout" but provides no details on how the process works, which raises concerns about user autonomy and privacy. Separately, Microsoft plans enhancements to Windows search, with more improvements to Windows 11 expected in the future.

TweakTown

LinkedIn is spying on you, according to a new 'BrowserGate' security report — scripts stealthily scan visitors' browsers for over 6,000 Chrome extensions and harvest hardware data

LinkedIn has been accused of spying on users through a JavaScript file that scans visitors' browsers for over 6,000 Chrome extensions and collects hardware data such as CPU core count and screen resolution. The script also gathers device telemetry such as time zone and battery status. Many of the targeted extensions are LinkedIn-related tools, including those from competitors like Apollo and ZoomInfo. LinkedIn claims the scanning is meant to detect extensions that violate its terms of service, but the data collected could potentially be used to identify individuals. This aggressive client-side fingerprinting technique is not unique to LinkedIn: other platforms, such as eBay, have been found to engage in similar practices.

Tom's Hardware

MIT Technology Review

OpenAI is throwing everything into building a fully automated researcher

OpenAI is shifting its focus to building an AI researcher, aiming to create a fully automated system capable of tackling complex problems independently. The company plans to develop an autonomous AI research intern by September, leading to a multi-agent research system by 2028. OpenAI's chief scientist, Jakub Pachocki, believes in the potential of AI models to work autonomously for extended periods, with the goal of applying AI tools to real-world problem-solving. However, concerns about the risks and ethical implications of autonomous AI systems remain, prompting discussions on oversight and control mechanisms.

MIT Technology Review
