Although data is widely considered the oil of the 21st century, it has been largely under-monetized until now. As an illustration, OpenAI developed ChatGPT with massive amounts of data, free of charge…
But data-rich companies, such as social media platforms, have taken notice and the situation is evolving rapidly: since last year, companies including X and Reddit have announced the end of free data access and the introduction of pricing plans for their APIs, which enable third-party developers to read and collect data from these platforms.
The economics of this data monetization trend are now becoming clearer, as Reddit is going through its IPO process and has disclosed its revenue streams. In addition to its advertising revenue, the company has signed licensing agreements that allow some AI companies to train their models on its platform’s content, totaling $203 million in aggregate contract value over two to three years. Reuters reported that Reddit has notably struck a $60 million/year deal with Google, a rather large amount considering that Reddit ranks “only” 16th among the most popular social media platforms, with 60 million daily active users.
Assuming Reddit signs similar deals with all companies developing large AI models (OpenAI, Anthropic, Mistral, xAI, Meta…), the revenue and earnings opportunity would then be massive (probably around $300 million a year), bearing in mind that licensing revenue usually comes with 80-90% margins.
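The arithmetic behind such an estimate can be sketched in a few lines. This is only a back-of-the-envelope illustration: the text does not say how the $300 million figure is derived, so the deal count of five (one Google-sized deal per AI developer named above) is our assumption, and the function and variable names are hypothetical.

```python
# Back-of-the-envelope licensing estimate. All inputs are assumptions.
GOOGLE_DEAL_USD_M = 60        # $ million per year, as reported by Reuters
ASSUMED_DEALS = 5             # hypothetical: one Google-sized deal per listed AI developer
MARGIN_RANGE = (0.80, 0.90)   # licensing margins cited in the text


def licensing_estimate(deal_usd_m, n_deals, margins):
    """Return annual revenue and the (low, high) profit range, in $ million."""
    revenue = deal_usd_m * n_deals
    low, high = (revenue * m for m in margins)
    return revenue, (low, high)


revenue, (profit_low, profit_high) = licensing_estimate(
    GOOGLE_DEAL_USD_M, ASSUMED_DEALS, MARGIN_RANGE
)
print(
    f"Revenue: ${revenue}M/year; "
    f"profit: ${profit_low:.0f}M to ${profit_high:.0f}M/year"
)
```

Under these assumptions, revenue comes to $300 million a year and the 80-90% margin range implies roughly $240 million to $270 million of profit. Changing the deal count or the per-deal value shows how sensitive the estimate is to each input.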
Accordingly, we believe that the monetization of AI data is likely to expand very fast, a major positive for the revenue growth and profitability of social media companies that have troves of data (Meta, Snap, Pinterest, Tencent…) but also for companies owning more specific data (professional/academic publishing, scientific/medical research…) that are starting to license their content as well. Indeed, developers of language models are already seeking to refine their AI with “high-end”, more accurate data in specific fields such as medicine or economics/finance and are looking beyond the public Internet. For instance, OpenAI struck deals with media groups Prisa, Axel Springer and Le Monde.
Another consequence of this monetization trend is that the barriers to entry in the large language and multimodal models space, which were already high, are going to get even higher, as would-be start-ups will now have to fund both their computing needs AND their data (vs. only computing previously). This suggests that the current lead of Big Tech and a couple of start-ups backed by Big Tech, such as OpenAI (Microsoft), Anthropic (Google, Salesforce, Zoom…) or Cohere (Nvidia and Oracle), is unlikely to be challenged soon.