OpenAI transcribed over a million hours of YouTube videos to train its LLMs, Google engaged in same practice

A hot potato: One of the many controversial aspects surrounding generative AIs and the training data for their large language models (LLMs) is potential copyright infringement. The issue is under the spotlight once again following a report that OpenAI transcribed over a million hours of YouTube videos to train GPT-4. Why didn’t YouTube owner Google object? Because it did the same thing.

To gain access to more reputable English-language text on the internet, OpenAI researchers created a speech recognition tool called Whisper in 2021, writes The New York Times. It was designed to transcribe audio from YouTube videos, giving the company a trove of data to train its LLMs.

OpenAI reportedly knew that scraping YouTube data was legally questionable but did it anyway, assuming such action would be fair use. The Times writes that OpenAI president Greg Brockman was personally involved in collecting the videos that were transcribed.

One would imagine Google being less than happy about OpenAI’s actions, but objecting would have been hypocritical given that Google also transcribed YouTube videos for its AI models, potentially infringing creators’ copyrights.

YouTube CEO Neal Mohan said during an interview with Bloomberg last week that the platform’s terms of service do not allow unauthorized transcripts or downloading of video content. When asked about OpenAI’s transcribing, he said, “I have seen reports that it may or may not have been used. I have no information myself.”

Google spokesperson Matt Bryant repeated the ToS rules, adding that the company takes “technical and legal measures” to prevent this kind of unauthorized practice “when we have a clear legal or technical basis to do so.” Google said that its AI models “are trained on some YouTube content” that is allowed under agreements with creators.

The NY Times states that Google has expanded its terms of service, giving it more rights to use consumer data such as publicly available Google Docs and restaurant reviews on Google Maps for the company’s AI models. The revised policy was released on July 1 in the hope that the Independence Day weekend would act as a distraction.

Meta was also said to be considering shady methods of obtaining more data for its LLM training. The NY Times writes that the Facebook parent considered gathering copyrighted data from the internet, even if that meant facing lawsuits, as negotiations with license holders would take too long.

Hundreds of organizations and individuals are complaining and filing lawsuits against large AI companies over the use of their content without payment or acknowledgment. The New York Times is suing OpenAI and Microsoft for using its copyrighted news articles. In February, OpenAI accused the publication of paying someone to “hack” its famous chatbot and other products to generate misleading evidence supporting those claims.

Masthead: Souvik Banerjee
