Apple and Other Tech Giants Accused of Using 170,000 Unlicensed YouTube Video Captions to Train AI
According to a report by the technology news website Proof News, several tech companies, including Apple, have trained artificial intelligence models on third-party data that was acquired without authorization. The data consists primarily of subtitles from 173,536 YouTube videos across 48,000 channels.
The major tech companies implicated include Apple, Anthropic, NVIDIA, and Salesforce, among others. These companies did not scrape YouTube themselves, however; the data was collected and distributed by a third party, EleutherAI.
EleutherAI is a non-profit organization that has released a dataset called the Pile, which is largely available for public use. Anyone with sufficient storage and computing capacity can download it over the internet and use it to train AI models.
Published research papers indicate that several large tech companies, including Apple, have used the Pile dataset to train AI models. Apple, for example, used it to train OpenELM, a model the company unveiled in April.
The situation raises a difficult question for companies like Apple that rely on third-party data of questionable provenance: should a company that trains models on such data bear responsibility for how it was obtained?
Under YouTube's terms of service, using any content from YouTube videos, including subtitles, without authorization is a violation of the agreement. By that standard, EleutherAI's actions violate YouTube's terms of service and infringe upon the copyright of YouTube video creators.
However, it has become increasingly common for AI companies to scrape content from the internet without authorization for training purposes. This scraping is often done quietly, making it difficult for content creators to discover that their work has been used without permission.
As of now, Apple, NVIDIA, Anthropic, EleutherAI, and YouTube have not issued any statements regarding this matter. However, given the clear violation, YouTube may take legal action.