Although data is widely regarded as the oil of the 21st century, it has been largely under-monetized until now. As an illustration, OpenAI developed ChatGPT with massive amounts of data, free of charge…
But data-rich companies, such as social media platforms, have taken notice, and the situation is evolving rapidly: both Twitter and Reddit have announced in the last couple of months the end of free data access and the introduction of pricing plans for their APIs.
Let’s start with the concept of an API (Application Programming Interface). An API is a set of rules and protocols that allows different software applications to communicate and interact with each other. In the context of social media platforms like Twitter and Reddit, APIs enable third-party developers to build applications that read data from and write data to these platforms, making a wide range of digital tools and services possible.
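To make the idea of reading and writing through an API more concrete, here is a minimal sketch in Python. It only builds the two requests and does not send them. The base URL, the `/posts` path, the parameter names and the bearer-token scheme are all hypothetical and do not describe the actual Twitter or Reddit APIs.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values for illustration only; not a real platform endpoint.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"


def build_read_request(query: str, limit: int = 10) -> Request:
    """Build a GET request that would read posts matching a search query."""
    params = urlencode({"q": query, "limit": limit})
    return Request(
        f"{BASE_URL}/posts?{params}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )


def build_write_request(text: str) -> Request:
    """Build a POST request that would publish a new post."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{BASE_URL}/posts",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


read = build_read_request("data pricing")
write = build_write_request("hello")
print(read.get_method(), read.full_url)
print(write.get_method(), write.full_url)
```

The key in the `Authorization` header is what lets a platform identify each caller, which is also what makes metering and billing per request possible.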
Both Twitter and Reddit have become increasingly cautious about the extensive use of their APIs for training AI systems, as their content has served as a free teaching aid for companies such as Google, OpenAI, and Microsoft. This sudden awareness and willingness to monetize data can also be traced to both companies’ financial challenges. Twitter has had to deal with declining advertising revenue since Elon Musk’s takeover and has been actively seeking new revenue streams, including enterprise access to its API starting at $500K a year and rising to $2.5 million a year.
Similarly, Reddit, which is still seeking elusive profitability and gearing up for a potential IPO, introduced usage-based pricing for its API, which can reportedly cost third-party developers several million or even tens of millions of dollars a year.
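The arithmetic behind usage-based pricing can be sketched in a few lines of Python. The rate and the request volumes below are hypothetical placeholders chosen for illustration, not figures published by Reddit, and the function name `annual_api_cost` is likewise an assumption. The sketch assumes the simplest possible plan: a flat price per 1,000 requests, with no free tier and no volume discount.

```python
def annual_api_cost(requests_per_day: int, price_per_1000_requests: float) -> float:
    """Yearly bill under a flat usage-based plan (no free tier, no discounts)."""
    requests_per_year = requests_per_day * 365
    return requests_per_year / 1000 * price_per_1000_requests


# Hypothetical inputs, for illustration only.
HYPOTHETICAL_RATE = 0.25  # assumed dollars per 1,000 requests

for label, daily_requests in [("small app", 100_000), ("large app", 200_000_000)]:
    cost = annual_api_cost(daily_requests, HYPOTHETICAL_RATE)
    print(f"{label}: {daily_requests:,} requests/day -> ${cost:,.0f} per year")
```

Under these assumed inputs, the bill scales linearly with traffic, so a popular third-party app serving many users can reach an annual cost in the millions even when the per-request price looks negligible.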
Interestingly, these pricing adjustments also address concerns about the use of APIs for malicious activity on social media. Because Twitter and Reddit are so widely used, bots and automated systems have increasingly leveraged their APIs to manipulate engagement, spread misinformation, or send spam. By raising the cost of API access, Twitter and Reddit create a deterrent for those looking to exploit their platforms.
Accordingly, we believe that the monetization of APIs and data is likely to expand very fast, a major positive for the profitability of social media companies that have troves of data (Meta, Snap, Pinterest, Tencent…) but also for companies owning more specific data (professional/academic publishing, scientific/medical research…) that could license their content. Indeed, it’s likely that, at some point, developers of language models will seek to refine their AI with “high-end”, more accurate data in specific fields such as medicine or economics/finance and will need to look beyond the public Internet.
But this monetization will also come with some negatives. First, only a handful of players will be able to bear the massive data costs of training AI foundation models, suggesting that most start-ups will have to exit the space and that the current lead of Big Tech in AI is unlikely to be challenged soon. The only start-ups that we see emerging in the space are, or will probably be, backed by the tech giants, such as OpenAI (Microsoft), Anthropic (Google, Salesforce, Zoom…) or Cohere (Nvidia, among others).
And, from an end-user perspective, the cost of AI services is likely to rise in the near future as most AI companies will try to preserve their profitability by passing higher data costs on to customers.
Second, data pricing changes at Twitter and Reddit have disrupted the developer landscape on both platforms and raised questions about the financial viability of third-party applications. Some third-party Reddit apps, such as Apollo, RIF (Reddit is Fun) and ReddPlanet, have already announced that they will have to shut down because of the data costs, while others are seeking to pass the cost on to their customers. Infinity for Reddit, for instance, has introduced a paid version in the hope of securing the app's future.
In conclusion, the impact of data pricing plans clearly extends far beyond Twitter and Reddit as other social media platforms will also need to navigate the delicate balance between data protection/monetization and a thriving developer ecosystem. These initiatives are likely to redefine the very structure of the AI and broader digital ecosystem, with a clear skew towards larger and profitable players in each segment.