Why we're taking legal action against Open AI and other scrapers

6 replies

JustineMumsnet · 19/07/2024 09:46

Hi all - you may have noticed this piece (https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s) in the Times today and I wanted to explain why we're doing this.

Earlier this year, we became aware that OpenAI was scraping Mumsnet - presumably to train their large language model (LLM). Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval. So we approached OpenAI and suggested they might like to license our content. In truth there are some very good reasons why the LLMs should ingest our conversational data to train their models. The six billion plus words on Mumsnet are a unique record of twenty-four years of female conversation about everything from global politics to fashion to relationships with in-laws. By contrast, the majority of the content on the web was written by and for men. AI models have misogyny baked in and we’d love to help counter the gender bias likely to be present in many of them and raise women’s voices. OpenAI's response was that they were more interested in datasets that are not easily accessible online.

Much of the content on the open web is likewise being lifted. Mustafa Suleyman, CEO of Microsoft AI, pronounced only two weeks ago that machine-learning companies are perfectly within their rights to scrape content published online because the moment it’s published it becomes ‘freeware’.

You might ask why the theft of online content for model-training poses a problem - hasn’t Google been crawling all over websites and ingesting their data for search purposes since the dawn of the internet? True, but there is a clear value exchange in allowing Google to access that data, namely the resulting search traffic that comes from being indexed by Google. The LLMs are building models like ChatGPT to answer any and all prospective questions, which will mean we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the websites they are poised to replace.

At Mumsnet we’re in a stronger position than most because much of our traffic comes to us direct. Though it’s a piece of cake for an LLM to spit out a Mumsnet-style answer to a parenting question, I doubt they’ll ever be as funny about parking wars or as honest about relationships, and they’ll certainly never provide the emotional support that sees around a thousand women a year helped to leave abusive partners by other Mumsnet users. But if these trillion-dollar giants are simply allowed to pillage content from online publishers - and get away with it - they will destroy many of them.

Not surprisingly, a number of large, global publishers are currently suing OpenAI and Microsoft for copyright infringement. Here at Mumsnet, though we’re neither large (in revenue terms) nor global, we’ve decided we have no choice but to initiate a legal complaint too.

That’s not to say that A.I. is all bad of course. It plainly has the potential to advance human progress and improve our lives in multiple ways. But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet they risk destroying them. Everything that’s unique and brilliant about sites like ours will be lost, and a handful of Silicon Valley giants will be left with even more control over the world’s content and commerce.

We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.

Mumsnet launches first British legal action against OpenAI

Parenting website accuses the California tech giant of scraping six billion words from it to help build the chatbot ChatGPT


JustineMumsnet · 19/07/2024 11:29

Thanks all for the support - if you're happy to, please do share a link to this thread - would love to get the message out as widely as possible.

RhiannonEMumsnet · 19/07/2024 11:50

shellyleppard · 19/07/2024 11:36

@JustineMumsnet how do I share?? Sorry not very tech savvy x

Hi @shellyleppard, if you click the share button at the bottom of the OP it will give you some options for sharing the link. Thank you!

JustineMumsnet · 19/07/2024 12:20

dieselKiller · 19/07/2024 11:35

Good not to let them just take everything, but what’s the end game?

Are you asking them to pay for access?

Or are you trying to prevent them using the content because you believe that the site’s users should have a say in whether their content is ingested into Open AI’s LLM?

Mumsnet users provide and own the content. Mumsnet users should decide whether their content can be used for this purpose on an individual and opt-in basis.

Can you clarify whether you agree with the principle that each of us must give our own, individual, opt in consent before our text can be used for any purpose beyond simple display on the mumsnet website?

Hi dieselKiller, we are requiring that OpenAI deletes any Mumsnet data they hold and ceases to use it for their models. That said, as I've outlined above we think there are some good reasons to use MN conversational data to counter gender bias in LLMs but it seems OpenAI doesn't agree right now.

JustineMumsnet · 19/07/2024 12:41

Sparklfairy · 19/07/2024 12:18

I'm a bit confused too. Threads often get picked up by journos and it's an easy way to produce a cheap article - compiled mostly of screenshots and a (possibly AI generated!) summary of the thread.

So presumably MN has some sort of kickback arrangement with the papers, and/or benefit from the increased traffic. And MN users can't complain because as PP said it's in their terms. Them's the breaks, if you post and it ends up elsewhere online, that's the potential pitfall of posting on a public forum.

My confusion comes from the way Justine has phrased it - it seems a bit disingenuous. The reality seems to be MN is simply annoyed they're not getting paid for their content being lifted.

So just say that? Don't make out that somehow this is some virtuous pursuit of equality/fight against misogyny. It isn't - because if you win, Open AI just won't use your content, so you'd only be furthering the male-weighted conversation bias.

You're worried that people will use Open AI instead of MN? Anyone that uses Chat GPT at all knows it's no substitute for real advice from real women. The only reason people would post their questions to GPT instead of MN is because of your own terms allowing journos to pick it up and publish it...

We don't get any kickback from the papers when they do articles based on Mumsnet content, no - in fact they often don't even include a link to Mumsnet, which is frustrating (The Mirror, owned by Reach, being the worst offender here). But there are rules around fair use for papers and they are only lifting small amounts of content - they don't send bots in to copy our code wholesale with the aim of reproducing the database. And yes, we believe we should be compensated for that reproduction - we put a lot of resource into running and maintaining the Mumsnet platform and these AI companies are worth billions on the back of their LLMs and the revenues they will make from them. Plus we think that the wholesale disregard of websites' terms of use is wrong (and possibly existential for some). Plus we think women's voices and opinions should be included proportionately in LLMs. None of these things are mutually exclusive.

JustineMumsnet · 19/07/2024 15:02

MrsTerryPratchett · 19/07/2024 14:37

I'd be interested to know too.

We're awaiting a response from Aston to our legal letter of complaint.

JustineMumsnet · 19/07/2024 17:22

peachgreen · 19/07/2024 17:09

Very glad you’re doing this and completely agree with your stance but I wanted to ask for clarification on something… You said this:

…no part of the site may be distributed, scraped or copied for any purpose without our express approval

Does that mean that you’ve given the Daily Mail et al permission to lift and reproduce posts and responses verbatim? Even when – as you often acknowledge in deletion messages – it causes posters real-life distress?

Daily Mail et al would claim this exemption as I understand it
https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing
