Stack Overflow Will Charge for Accessing Its API

Stack Overflow, an online community used by developers from all across the world, may soon start charging AI companies for using the data from its forums to train their large language models (LLMs).

The company’s intentions were confirmed by none other than Stack Overflow’s Chief Executive Officer, Prashanth Chandrasekar, according to an exclusive report from the tech-focused online magazine Wired.

“Community platforms that fuel LLMs absolutely should be compensated for their contributions so that companies like us can reinvest back into our communities to continue to make them thrive,” the head of the online community asserted.

Public Forums Have Been a Relevant Source of Data for AI Companies

Most popular AI models today, including ChatGPT and GPT-4, were trained on publicly available data from sources like Wikipedia, Reddit, and Stack Overflow so that they can answer almost any query from a user.

Some communities like Quora and Stack Overflow have Q&A forums that can be tapped to get information about a wide range of topics. In the case of the company headed by Chandrasekar, these questions and answers are typically related to programming topics.

One of the most profitable and productive use cases for AI models is their ability to create rough drafts of the code needed to build applications and software. In addition, the models may be trained to review user-written code and fix any errors.

Stack Overflow’s library reportedly contains over 50 million questions and answers from its more than 20 million users. This makes the public forum a gold mine of cheap, virtually free, training data for AI models.

Last week, Reddit, a site that hosts public forums on many topics, announced that it will begin charging clients who want access to large amounts of data through the platform’s application programming interface (API). The updates to the company’s Data API terms will take effect on June 19 this year.

Meanwhile, Elon Musk recently accused Microsoft (MSFT) of illegally using Twitter’s data to train the models created by OpenAI and implied that he may be preparing to sue the Redmond-based tech firm for doing so.

It is not clear how much firms like Stack Overflow and Reddit plan to charge for accessing the bulk of the data available on their platforms. However, in the case of Twitter, the company shut down access to its free API and started to charge $42,000 per month for giving businesses access to 50 million tweets and a whopping $210,000 per month in exchange for access to 200 million tweets.

AI Companies Are Earning Millions and Their Data Sources Want Their Cut

Companies like OpenAI and Microsoft have been benefitting from the powerful models they have built using the data obtained from these platforms. For example, Microsoft is charging $19 per month for its code-generation solution GitHub Copilot, while OpenAI earns millions for giving developers access to its API.

Since AI models will likely continue to grow, they will need more and more data to keep building their knowledge base. Platforms like Reddit and Stack Overflow are acting on the assumption that companies like OpenAI, Anthropic, and Stability AI will keep turning to their communities to keep training their software.

One characteristic that makes these platforms a relevant source for AI models is that they are ever-evolving libraries of knowledge: users discuss new topics as they appear.

For example, in 2021, topics like the Russia-Ukraine war and the US’s attempt to ban TikTok over national security concerns were not being actively discussed, and no information about them was available on the internet because these events had not yet occurred.

By using the data from these platforms and online libraries like Wikipedia, AI models can be progressively updated by keeping tabs on every new discussion and topic that comes up on them.
