Publishers vs Big Tech 🟢
Why HarperCollins is paying authors $2,500 and how the NYT-OpenAI battle exposes the ever-growing tension between publishers and AI companies.
NEW LAUNCHES
The latest features, products & partnerships in AI
Mistral introduces search and a suite of creative tools to its AI assistant
Microsoft’s new AI tool helps medical professionals detect cancer
Google’s AI voice assistant is now available on iPhone — here are 3 practical use cases
Perplexity Pro now includes an AI shopping assistant and free shipping on e-com purchases
GOVERNMENT
Law, regulation, defense & politics.
AI MODELS
Deployment, research, training & infrastructure
From mind to machine: How human biology is informing new breakthroughs in AI
Italian researchers investigate LLMs’ ability to handle ethical dilemmas in finance
Mistral unveils Pixtral Large, an open-weights multimodal model
Why some engineers believe LLMs present a ‘dead end’ for software development
IMPLEMENTATION
Announcements, strategies & case studies
Experts say large language models for healthcare must overcome these barriers
Forrester publishes guide for retrieval-augmented generation
Artificial Integrity: How to maintain trust when implementing AI
IBM says successful enterprise AI adoption requires navigating these obstacles
Thoughtworks’ latest Radar report says these are the AI solutions you should adopt right now
OTHER INTERESTING STORIES
Compelling stories beyond the usual categories
What’s happening in AI right now
The scramble for training data
As training data becomes the new oil powering AI development (Meta even floated acquiring Simon & Schuster for its catalog), traditional publishers are moving quickly to monetize theirs.
The publisher's gambit
HarperCollins made waves this week by offering authors $2,500 for three-year licenses to use their nonfiction books in AI training, a move that could set industry standards but has raised eyebrows over valuation. For context, this represents one of the first structured licensing programs from a major publisher, though the modest compensation (averaging less than $1,000 per year) has sparked debate about fair rates for AI training content.
Meanwhile, the Copyright Clearance Center has launched the industry's first collective licensing solution for AI content use through a modified annual blanket license. This framework operates globally and aims to standardize how businesses can legally access training data while ensuring creator compensation.
Hollywood's hidden data mine
Behind the scenes, tech giants including Apple, Anthropic, Meta and Nvidia have been training on an enormous dataset of subtitles from over 53,000 movies and 85,000 TV episodes, largely without explicit authorization. This widespread use of entertainment industry content helps explain the growing pushback from creative professionals concerned about their intellectual property being used without compensation. Runway struck a pioneering deal with Lionsgate to train AI video models on the studio's catalog, though this comes as California enacts new restrictions on AI replicas in film and while Runway faces litigation over unauthorized training data use.
The transparency battle
OpenAI's legal clash with The New York Times has exposed the challenges of auditing AI training data. The Times estimates needing $800,000 worth of API credits to properly inspect OpenAI's models - far above OpenAI's proposed $15,000 cap. This dispute highlights the broader tension between AI companies' desire to protect proprietary technology and growing demands for transparency.
The data wall ahead
The scramble for training data comes amid signs that AI progress may be hitting limitations. Researchers project that high-quality public text data suitable for AI training could be fully utilized between 2026 and 2032. OpenAI's latest Orion model showed only marginal gains over GPT-4, suggesting diminishing returns from traditional training approaches.
New defensive tools emerge
Content creators are developing innovative protection methods. A new "copyright trap" tool lets creators embed unique gibberish sentences into their text that serve as digital watermarks. These traps are designed to be easily detectable in AI outputs, providing evidence of unauthorized training use. While the tool works best with smaller AI models that tend to memorize training data, it represents a creative approach to a complex challenge. Researchers have made the implementation code publicly available on GitHub, though they acknowledge limitations including the potential impact on text readability.
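The core idea behind a copyright trap can be sketched in a few lines. The example below is a simplified illustration, not the published tool: the function names are hypothetical, and the actual research method detects memorization statistically (e.g. by comparing model perplexity on trap sentences) rather than by the simple verbatim string match shown here.

```python
import secrets
import string

def make_trap(word_count=8, word_len=6):
    """Generate a unique gibberish sentence to serve as a copyright trap.

    High-entropy nonsense words are vanishingly unlikely to occur in
    natural text, so seeing the trap reproduced by a model later is
    evidence the trapped document was in its training data.
    """
    words = [
        "".join(secrets.choice(string.ascii_lowercase) for _ in range(word_len))
        for _ in range(word_count)
    ]
    return " ".join(words).capitalize() + "."

def embed_trap(document, trap):
    """Insert the trap sentence between the document's paragraphs."""
    paragraphs = document.split("\n\n")
    paragraphs.insert(len(paragraphs) // 2, trap)
    return "\n\n".join(paragraphs)

def trap_detected(model_output, trap):
    """Naive check: does the model's output contain the trap verbatim?

    Real detection is subtler, since larger models rarely regurgitate
    training text word for word; this matches the article's note that
    traps work best on smaller models that memorize more readily.
    """
    return trap.lower() in model_output.lower()
```

As the researchers note, embedding visible gibberish can hurt readability, which is one of the acknowledged trade-offs of the approach.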
The core tension reflects an industry racing to accumulate training data while grappling with thorny questions of ownership, compensation and transparency. Publishers and creators are pushing back against unauthorized use while exploring new licensing models. This battle over content rights and training data may define the next phase of AI development.
We publish daily research, playbooks, and deep industry data breakdowns. Learn More Here
One of the holy grails of unlocking the potential of AI is building systems that can reason like humans, by improving the ability of AI, and large language models in particular, to break down complex problems and apply logical steps.
Bagel's research team has been exploring this problem, analyzing LLM-building techniques, especially fine-tuning, that could allow large language models to evolve from pattern-recognizing prediction agents into true cognitive agents.
Read More Here 🥯
AI generated art
A look at the art and projects being created with AI.
Midjourney Prompt: A photograph of an orange car with red flowers in front, parked outside white houses on the island of Lanzarote, against a blue sky on a sunny day. The image exhibits minimalism, natural colors, and natural lighting, with high-resolution photography that is insanely detailed and intricate, resulting in a hyper-realistic style in the manner of David LaChapelle.
How'd you like today's issue? Have any feedback to help us improve? We'd love to hear it!