
Why we're taking legal action against OpenAI and other scrapers

134 replies

JustineMumsnet · 19/07/2024 09:46

Hi all - you may have noticed this piece (https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s) in the Times today and I wanted to explain why we're doing this.

Earlier this year, we became aware that OpenAI was scraping Mumsnet, presumably to train its large language model (LLM). Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval. So we approached OpenAI and suggested they might like to license our content.

In truth, there are some very good reasons why LLMs should ingest our conversational data to train their models. The six billion-plus words on Mumsnet are a unique record of twenty-four years of female conversation about everything from global politics to fashion to relationships with in-laws. By contrast, the majority of the content on the web was written by and for men. AI models have misogyny baked in, and we’d love to help counter the gender bias likely to be present in many of them and raise women’s voices. OpenAI's response was that they were more interested in datasets that are not easily accessible online.

Much of the content on the open web is likewise being lifted. Mustafa Suleyman, CEO of Microsoft AI, declared only two weeks ago that machine-learning companies are perfectly within their rights to scrape content published online because, the moment it’s published, it becomes ‘freeware’.

You might ask why the theft of online content for model training poses a problem. Hasn’t Google been crawling websites and ingesting their data for search purposes since the dawn of the internet? True, but there is a clear value exchange in allowing Google to access that data, namely the search traffic that comes from being indexed by Google. The LLM companies are building models like ChatGPT to answer any and all prospective questions, so that we’ll no longer need to go elsewhere for solutions. And they’re building those models with content scraped from the very websites they are poised to replace.

At Mumsnet we’re in a stronger position than most, because much of our traffic comes to us direct. And though it’s a piece of cake for an LLM to spit out a Mumsnet-style answer to a parenting question, I doubt they’ll ever be as funny about parking wars or as honest about relationships, and they’ll certainly never provide the emotional support that sees around a thousand women a year helped by other Mumsnet users to leave abusive partners. But if these trillion-dollar giants are simply allowed to pillage content from online publishers, and get away with it, they will destroy many of them.

Not surprisingly, a number of large, global publishers are currently suing OpenAI and Microsoft for copyright infringement. Here at Mumsnet, though we’re neither large (in revenue terms) nor global, we’ve decided we have no choice but to initiate a legal complaint too.

That’s not to say that AI is all bad, of course. It plainly has the potential to advance human progress and improve our lives in multiple ways. But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet, they risk destroying them. Everything that’s unique and brilliant about sites like ours will be lost, and a handful of Silicon Valley giants will be left with even more control over the world’s content and commerce.

We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task given the huge resources they’ll throw at us, but this is too important an issue to simply roll over. Not just for Mumsnet, but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.

Mumsnet launches first British legal action against OpenAI

Parenting website accuses the California tech giant of scraping six billion words from it to help build the chatbot ChatGPT

dieselKiller · 19/07/2024 12:54

This might also be a good time to state that you don’t welcome bots and AI-generated content on the site and that any deal with Open AI won’t change that.

shockeditellyou · 19/07/2024 12:57

Good luck and thank you!

DawnAttwood · 19/07/2024 14:14

As someone who uses Mumsnet regularly but never really thinks about the stuff that goes on behind the scenes this is fascinating - and clearly really important. Good luck Justine and Mumsnet!

Blueyatemyhomework · 19/07/2024 14:30

Am I right in understanding that the reason you're taking this action is that you've missed a trick on being able to monetise this dataset due to data-scraping technologies? What is happening about the Aston scrape?

MrsTerryPratchett · 19/07/2024 14:37

Blueyatemyhomework · 19/07/2024 14:30

Am I right in understanding that the reason you're taking this action is that you've missed a trick on being able to monetise this dataset due to data-scraping technologies? What is happening about the Aston scrape?

I'd be interested to know too.

Thesquarerootofnotgivingafuck · 19/07/2024 15:01

DawnAttwood · 19/07/2024 14:14

As someone who uses Mumsnet regularly but never really thinks about the stuff that goes on behind the scenes this is fascinating - and clearly really important. Good luck Justine and Mumsnet!

Hi Dawn, if you find this interesting I suggest you take a look at this thread.
www.mumsnet.com/talk/site_stuff/5057903-mumsnet-corpus

JustineMumsnet · 19/07/2024 15:02

MrsTerryPratchett · 19/07/2024 14:37

I'd be interested to know too.

We're awaiting a response from Aston to our legal letter of complaint.

MrsTerryPratchett · 19/07/2024 15:06

Thanks Justine!

Igmum · 19/07/2024 15:14

Thank you @JustineMumsnet Flowers

BoreOfWhabylon · 19/07/2024 15:22

Thank you Justine and MNHQ!
Chinese proverb says
When sleeping women wake, mountains move

Missscentsation · 19/07/2024 15:40

Blatant bookmarking

ifIwerenotanandroid · 19/07/2024 15:46

Thank you & good luck!

Gazelda · 19/07/2024 15:50

Thank you Mumsnet. I agree that it's vital that women have a voice in this debate and input into the conversation around how AI infiltrates our lives.

BellyPork · 19/07/2024 16:13

I call bullshit. You wanted paying for the content. That's fine but don't pretend it's a moral crusade.

AstonUniversityScrapedMyCorpus · 19/07/2024 16:46

BellyPork · 19/07/2024 16:13

I call bullshit. You wanted paying for the content. That's fine but don't pretend it's a moral crusade.

It’s not immoral to want to be compensated when someone takes something from you without asking.

moderate · 19/07/2024 16:59

“they don't send bots in to copy our code wholesale with the aim of a reproduction of the database”

But that’s not how LLMs work. They digest. Like someone who reads a lot of books and becomes an expert on a subject, but without being able to reproduce anything verbatim from any of them.

peachgreen · 19/07/2024 17:09

Very glad you’re doing this and completely agree with your stance but I wanted to ask for clarification on something… You said this:

…no part of the site may be distributed, scraped or copied for any purpose without our express approval

Does that mean that you’ve given the Daily Mail et al permission to lift and reproduce posts and responses verbatim? Even when – as you often acknowledge in deletion messages – it causes posters real-life distress?

PurpleSparkledPixie · 19/07/2024 17:12

BoreOfWhabylon · 19/07/2024 15:22

Thank you Justine and MNHQ!
Chinese proverb says
When sleeping women wake, mountains move

When sleeping women wake, mountains move
😯

pokes the sleepy mountain troll to see what happens
😈

ArabellaScott · 19/07/2024 17:14

MrsTerryPratchett · 19/07/2024 14:37

I'd be interested to know too.

The article does seem to suggest Mumsnet is asking AI companies to pay for scraped content, rather than just taking it for free.

If this is going to happen, I think it surely couldn't be sold retrospectively, and users would have to be fully informed their posts/words/data may be sold? But I suppose it depends on what the T&Cs currently say.

JustineMumsnet · 19/07/2024 17:22

peachgreen · 19/07/2024 17:09

Very glad you’re doing this and completely agree with your stance but I wanted to ask for clarification on something… You said this:

…no part of the site may be distributed, scraped or copied for any purpose without our express approval

Does that mean that you’ve given the Daily Mail et al permission to lift and reproduce posts and responses verbatim? Even when – as you often acknowledge in deletion messages – it causes posters real-life distress?

The Daily Mail et al would claim this exemption, as I understand it:
https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing

Exceptions to copyright

Details of the exceptions to copyright that allow limited use of copyright works without the permission of the copyright owner.

ErrolTheDragon · 19/07/2024 17:31

moderate · 19/07/2024 16:59

“they don't send bots in to copy our code wholesale with the aim of a reproduction of the database”

But that’s not how LLMs work. They digest. Like someone who reads a lot of books and becomes an expert on a subject, but without being able to reproduce anything verbatim from any of them.

Well yes... good analogy, as currently the results of this digestion seem to be some useful output plus a lot of shit.

Suzieandthemonkeyfeet · 19/07/2024 17:39

Great news!

I’d also put money on there being threads created by AI to gauge the responses and conversational tone.

Are you able to track those?

dieselKiller · 19/07/2024 17:50

To the people asking whether the terms and conditions allow mumsnet to do what they want with our posts, I would offer two thoughts:

  1. A business reputation is hard won and easily lost. Mumsnet shouldn’t be in the business of maximally exploiting our posts if people feel strongly about the way they do that, regardless of whether they could make a legal argument that they have the right.
  2. There remains scope in the English legal system to challenge terms & conditions that people don’t read or don’t understand, especially when we’re talking about ordinary consumers and not business users, as we are with mumsnet posters. In particular, terms that are written to try and take into account unknowable future events are a weak point in my view.

The first thought is the most practical and collegial one of course. Users of the site seem mostly happy. To ensure they remain happy by allowing people to opt out of all the AI nonsense is an incredibly low cost and sensible thing for mumsnet to do.

A business that relies on user-generated content has to balance extracting value against making people feel comfortable and cared for. Personally, I feel very strongly that I do not want my posts to be used to train AI and that I do not want any AI features on the site.

Sethera · 19/07/2024 18:01

HelpMeGetThrough · 19/07/2024 12:32

The global outage today shows just what over-reliance on Microsoft can lead to. They need challenging.

Microsoft didn't cause the problem.

I know it didn't originate with Microsoft, but their systems clearly lacked the resilience to stop it doing damage. If we put all our eggs in one tech basket, we leave ourselves vulnerable, especially if that basket has holes in it.

shellyleppard · 19/07/2024 18:46

@JustineMumsnet @RhiannonEMumsnet there isn't a share button.....i have double checked
