Former OpenAI Researcher Claims OpenAI Violates Copyright Law and Undermines the Internet's Commercial Viability
OpenAI's GPT models are trained on vast amounts of data, much of which appears to have been used without the consent of copyright holders. As a result, some copyright owners have sued OpenAI.
Suchir Balaji, a former OpenAI researcher who worked on GPT model research for four years before leaving this summer, has publicly accused OpenAI of violating copyright law in a personal blog post. "If you believe what I believe, then you must leave OpenAI," Balaji said.
Balaji joined OpenAI in 2020 and contributed to its GPT models, including the latest, GPT-4. He was initially motivated by the belief that AI could solve seemingly insurmountable problems, such as curing diseases and stopping aging. Over time, however, he became disillusioned, feeling that the technology was being used in ways he could not support. Balaji also argued that OpenAI is undermining the commercial viability of the individuals, businesses, and internet services that produce the data used to train AI.
Balaji's Argument: ChatGPT's Output Does Not Meet Fair Use Standards:
In his blog post, Balaji argued that OpenAI violates copyright law. He examined how much of the copyrighted information in AI training datasets ends up in the models' outputs and concluded that ChatGPT's output does not meet the standard for fair use, a legal doctrine that permits limited use of copyrighted material without the copyright owner's permission. Balaji believes that regulation is the only solution to this problem.
OpenAI's Response to Former Employee's Claims:
In response, OpenAI contested these claims, stating, "We build our artificial intelligence models using publicly available data, protected by fair use and related principles, supported by long-standing and widely accepted legal precedents. We believe these principles are fair to creators, necessary for innovators, and crucial for America’s competitiveness."
However, the dispute between OpenAI and copyright holders is growing, with some copyright owners presenting evidence that OpenAI's notion of fairness is flawed. The company has collected vast amounts of content, both copyrighted and not, to train its models. Those models generate revenue for OpenAI, while the copyright holders whose work they were trained on receive no compensation.
Furthermore, as OpenAI shifts from a non-profit to a for-profit company, it may increasingly prioritize its own interests over those of the industry, society, and copyright owners.