NVIDIA Again Caught Scraping Data from YouTube and Netflix to Train AI Models
NVIDIA has previously been caught using third-party datasets to train AI models without the consent of the copyright holders, making it one of the companies that use content for training without authorization.
A new report today reveals that NVIDIA collects various kinds of data every day for model training. A former NVIDIA employee disclosed that the company required staff to scrape video content from Netflix, YouTube, and other online sources as training data for a range of NVIDIA AI products.
These products include NVIDIA's Omniverse 3D world generator, its autonomous driving systems, digital humans, and a project named Cosmos, which aims to build a foundation AI model comparable to Gemini 1.5, GPT-4, or Llama 3.1.
Notably, when employees raised questions about the legality of the project, NVIDIA management assured them that approval had been obtained from the company's highest level of management to use this data for AI model training.
Leaked internal Slack conversations, emails, and documents from NVIDIA also serve as evidence that the company has indeed been continuously scraping data without authorization for model training.
To scrape video from across the web, the Cosmos project reportedly used an open-source video downloader and rotated IP addresses across virtual machines to evade YouTube's blocks. Evidence shows project managers discussed running 30 such virtual machines on Amazon AWS for the scraping.
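For context on what such a pipeline typically looks like, here is a minimal sketch, not NVIDIA's actual code: it uses the open-source yt-dlp tool purely as a stand-in for whichever downloader was actually involved, the URL list is invented, and the IP rotation described in the report would live in the infrastructure layer (separate cloud VMs, each with its own public address) rather than in the script itself.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list of video URLs; the report does not name specific videos.
VIDEO_URLS = [
    "https://www.youtube.com/watch?v=EXAMPLE_ID_1",
    "https://www.youtube.com/watch?v=EXAMPLE_ID_2",
]

def download(url: str) -> int:
    # yt-dlp stands in here for "an open-source video downloader"; the report
    # does not confirm which tool was used. "-o" sets the output filename
    # template and "--limit-rate" throttles download bandwidth.
    result = subprocess.run(
        ["yt-dlp", "-o", "%(id)s.%(ext)s", "--limit-rate", "5M", url],
        check=False,
    )
    return result.returncode

if __name__ == "__main__":
    # Fan downloads out over a small worker pool. In the setup the report
    # describes, this parallelism would instead come from ~30 separate AWS
    # virtual machines, each with its own IP address, so a block on one
    # address does not stop the rest.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for url, code in zip(VIDEO_URLS, pool.map(download, VIDEO_URLS)):
            print(url, "exit code:", code)
```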
NVIDIA responded to the media reports, maintaining that it had done nothing wrong:
"We respect the rights of all content creators and believe our models and research work fully comply with both the letter and spirit of copyright laws. Copyright law protects specific expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from other sources and use these to express their own views. Fair use also protects the ability to use works for transformative purposes, such as model training."
Technology companies, NVIDIA included, are currently finding ways to scrape data from the internet for model training, and that inevitably sweeps in copyrighted content without authorization; as long as the practice goes undetected, the scraping continues unabated.
At the same time, AI models trained on protected content and put to commercial use can easily give rise to copyright disputes. For instance, NVIDIA gave ambiguous answers at CES 2024 when asked how its game-generating AI engine had been trained, which raised many concerns; NVIDIA later stated the engine was commercially safe in order to allay developers' worries.