EXCLUSIVE: BBC 'not approached' on proposed British media AI dataset
AI action plan calls for copyright-cleared dataset to be 'licensed at scale'; leading critic of the government's AI policy says it 'mirrors sell-out of the UK's creative industries'
WHAT’S HAPPENED?
PLANS TO INCLUDE BBC programming in a British media dataset that could be licensed to AI developers were floated in a government-commissioned report without consulting the corporation, Charting Gen AI can reveal.
The AI Opportunities Action Plan — compiled by venture capitalist Matt Clifford and published this week — called on the UK government to establish a “copyright-cleared British media asset training dataset, which can be licensed internationally at scale”. The report continued: “This could be done through partnering with bodies that hold valuable cultural data like the National Archives, Natural History Museum, British Library and the BBC to develop a commercial proposition for sharing their data to advance AI.”
Partially agreeing with Clifford’s recommendation, the government said culture and technology ministers would “engage with partner organisations and industry to consider the potential role of government in taking forward this recommendation” and gave a completion target date of “Spring 2025”.
The full BBC Archive — one of the world’s largest multimedia archives — contains broadcast material dating back to the earliest pioneering days of radio and television, plus around 7 million still images and an extensive recorded music collection. Commercial access to 200,000 licence-ready clips and over 1 million hours of footage in the BBC Motion Gallery is managed by Getty Images on behalf of BBC Studios, the corporation’s main for-profit division.
The creation of an AI training dataset including BBC clips and programming — part of a proposed National Data Library — would inevitably cut across those commercial arrangements and require the close cooperation of the UK’s oldest and biggest public service broadcaster. We asked the corporation whether it was aware of the idea before it appeared in Clifford’s recommendation.
A BBC spokesperson told Charting:
“The BBC has not been approached by the government regarding these plans. As always, we would carefully assess any opportunity to ensure it aligns with our values and responsibilities, and delivers value for licence fee payers.”
Baroness Kidron, the multi-award-winning film director and producer who sits as an independent peer in the upper house, told Charting: “Tragically the government in partially agreeing with this recommendation has chosen the part that gives no control, agency or right to license to these institutions, many of which are paid for by the public purse.
“By adding AI industry and government into the list of ‘partners’ it is mirroring its sell-out of the UK’s creative industries. Our cultural institutions play a central role in our identity, education and soft power, and are desperately underfunded. They should be allowed to exploit their assets on their own terms.”
Lord Clement-Jones, Liberal Democrat spokesperson on science, innovation and technology in the Lords, told us: “Provided the BBC has control of what data and content is shared, and it gets full value out of it, then this is better than the current situation with its content being scraped from the internet for training of large language models (LLMs) without a licence. The problem is that there are no details at all about the National Data Library. Nothing in the Data Use and Access Bill and the government doesn’t know yet whether it needs primary or secondary legislation to bring it about.”
WHY SHOULD WE CARE?
✨ There’s much here for broadcasters all over the world to unpack, as well as the BBC — which will have read about its potential involvement in the national AI training dataset at the same time as the rest of us. Broadcast programming is an amalgam of multiple content creators; separating out rights for AI licensing will be a complex process. Then there’s the ‘substitutive products’ issue that’s frequently raised in copyright infringement lawsuits brought by publishers and creators against AI developers: once clips and programming have been ingested, they can be used to create content that directly competes with the original material, reducing its value. There are brand integrity issues too, as a broadcaster’s premium programming would inevitably be chopped up and blended with other material, creating deepfakes and other brand-tarnishing (if not brand-destroying) outputs. The key to avoiding that, as Lord Clement-Jones says, is maintaining control and insisting that proper guardrails are in place across all the AIs. But who will do that?
Further reaction to the AI action plan and other developments that threaten to reshape the global human-made media landscape in tomorrow’s Weekly Newsletter
Great exclusive, Graham. It doesn’t take much effort to pick up the phone and pop the questions: “Hey BBC, what do you think of this idea? What are the implications? Could we make this work?” Underlines how wafer-thin and speculative this Action Plan actually is. Written in glorious, uninformed, unquestioning isolation.
I think the idea has some merit. There is probably a great deal of value to be created out of combining content from public institutions. I think Tim Clement-Jones is right - the institutions have to have some degree of control over the content.
However, we need to realise that this content was built using UK taxpayers' money, and I'm afraid that if the institutions are left to exploit the content themselves, they will not consider the wider benefits of combining data.