What OpenAI Whistleblower Suchir Balaji Revealed About The Dark Side Of AI


San Francisco, US:

Suchir Balaji, the 26-year-old OpenAI researcher-turned-whistleblower, was found dead in an apartment in San Francisco, US last month. His death on November 26 was ruled a suicide by the San Francisco medical examiner's office, and police found no evidence of wrongdoing.

Balaji, who left OpenAI in August, had spoken out openly in recent months against the artificial intelligence company's practice of training its chatbot on copyrighted material scraped from the internet. The artificial intelligence (AI) giant has been fighting several lawsuits relating to its data-gathering practices.

About Suchir Balaji

Indian American Suchir Balaji grew up in Cupertino, California. A remarkably sharp kid, he excelled in programming contests, placing 31st in the ACM ICPC 2018 World Finals and winning first place in the 2017 Pacific Northwest Regional and Berkeley Programming Contests.

Balaji also secured 7th place in Kaggle's TSA-sponsored "Passenger Screening Algorithm Challenge," earning a $100,000 prize. Per his LinkedIn profile, he was the US Open 2016 National Champion and a USACO Finalist.

Like most others in his field, Balaji had been captivated by the promise of artificial intelligence from an early age. In an interview with the New York Times in October, he said his interest in AI began in his teens, when he stumbled across a news story about the technology and imagined that neural networks could solve humanity's greatest problems.

"I thought that AI was a thing that could be used to solve unsolvable problems, like curing diseases and stopping ageing...I thought we could invent some kind of scientist that could help solve them," he said, according to the NYT report.

Even before graduating, he worked at Scale AI and Helia and was a software engineer at Quora. In 2020, Balaji joined a stream of Berkeley grads who went to work for OpenAI.

Suchir Balaji's Time At OpenAI

He worked at OpenAI for four years. For a year and a half of that time, he helped gather and organize the enormous amounts of internet data the company used to build its online chatbot, ChatGPT.

Balaji told the NYT that during his initial days at OpenAI, he did not carefully consider whether the company had a legal right to build its products using both copyrighted and open internet data. It was only after the release of ChatGPT in late 2022 that he started to contemplate the issue and realised that technologies like ChatGPT were damaging the internet by using copyrighted data, violating the law in the process.

By 2024, Balaji said, he had realised "he no longer wanted to contribute to technologies that he believed would bring society more harm than benefit." He left the company in August this year without a new job lined up and started working on what he called "personal projects."

He died a day after he was named in a court filing as someone whose files OpenAI would search as part of a lawsuit brought against the AI giant.

Suchir Balaji's Allegation Against OpenAI

After leaving OpenAI, Suchir Balaji spoke out publicly against the way AI companies are using copyrighted data to create their technologies. He alleged that AI models are too dependent on the labour of others because they are trained on copyrighted material scraped from the internet without authorisation.

"This is not a sustainable model for the internet ecosystem as a whole," he told the NYT.

He also explained his concerns on his personal website, where he noted that while generative models rarely produce outputs identical to their training data, the act of replicating copyrighted material during training could violate laws if not protected under "fair use."

"Because fair use is determined on a case-by-case basis, no broad statement can be made about when generative AI qualifies for fair use," he noted.

Balaji argued that in several cases chatbots directly compete with the copyrighted works they learned from. "Generative models are designed to imitate online data, so they can substitute for 'basically anything' on the internet, from news stories to online forums," he said.

According to him, the biggest problem is that as AI technologies gradually replace existing internet services, they produce "false and sometimes completely made-up information - what researchers call 'hallucinations.'"

The internet, he said, is changing for the worse.

Allegations Against AI Companies

Balaji was not alone in his concerns about AI companies misusing copyrighted data to train their chatbots. Several US and Canadian news publishers, including the New York Times, have filed lawsuits against OpenAI and its primary partner, Microsoft, claiming the companies used millions of their articles to build chatbots that now compete with the news outlets as a source of reliable information.

Many best-selling writers, including John Grisham, have also filed lawsuits against the company.

OpenAI Disputes Claims

OpenAI has disputed Balaji's claims, insisting that its data use adheres to fair use principles and legal precedents.

"We build our AI models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness," OpenAI said in a statement.

The company told the BBC in November that its software is "grounded in fair use and related international copyright principles that are fair for creators and support innovation."

Reacting to Balaji's death, a spokesperson for OpenAI said, "We are devastated to learn of this incredibly sad news today and our hearts go out to Suchir's loved ones during this difficult time."
