
Why we're taking legal action against Open AI and other scrapers

134 replies

JustineMumsnet · 19/07/2024 09:46

Hi all - you may have noticed this piece (https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s) in the Times today and I wanted to explain why we're doing this.

Earlier this year, we became aware that OpenAI was scraping Mumsnet - presumably to train their large language model (LLM). Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval. So we approached OpenAI and suggested they might like to license our content. In truth there are some very good reasons why the LLMs should ingest our conversational data to train their models. The six billion plus words on Mumsnet are a unique record of twenty-four years of female conversation about everything from global politics to fashion to relationships with in-laws. By contrast the majority of the content on the web was written by and for men. AI models have misogyny baked in and we’d love to help counter the gender bias likely to be present in many of them and raise women’s voices. Their response was that they were more interested in datasets that are not easily accessible online.

Much of the content on the open web is likewise being lifted. Mustafa Suleyman, CEO of Microsoft AI, pronounced only two weeks ago that machine-learning companies are perfectly within their rights to scrape content published online because the moment it’s published it becomes ‘freeware’.

You might ask why the theft of online content for model-training poses a problem - hasn’t Google been crawling all over websites and ingesting their data for search purposes since the dawn of the internet? True, but there is a clear value exchange in allowing Google to access that data, namely the resulting search traffic that comes from being indexed by Google. The LLMs are building models like ChatGPT to provide the answers to any and all prospective questions that will mean we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the websites they are poised to replace.

At Mumsnet we’re in a stronger position than most because much of our traffic comes to us direct. And though it’s a piece of cake for an LLM to spit out a Mumsnet-style answer to a parenting question, I doubt they’ll ever be as funny about parking wars or as honest about relationships, and they’ll certainly never provide the emotional support that sees around a thousand women a year helped to leave abusive partners by other Mumsnet users. But if these trillion-dollar giants are simply allowed to pillage content from online publishers - and get away with it - they will destroy many of them.

Not surprisingly, a number of large, global publishers are currently suing OpenAI and Microsoft for copyright infringement, and here at Mumsnet, though we’re neither large (in revenue terms) nor global, we’ve decided we have no choice but to initiate a legal complaint too.

That’s not to say that A.I. is all bad of course. It plainly has the potential to advance human progress and improve our lives in multiple ways. But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet they risk destroying them. Everything that’s unique and brilliant about sites like ours will be lost, and a handful of Silicon Valley giants will be left with even more control over the world’s content and commerce.

We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.

Mumsnet launches first British legal action against OpenAI

Parenting website accuses the California tech giant of scraping six billion words from it to help build the chatbot ChatGPT

https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s

Cellotapedispenser · 19/07/2024 11:25

Good Luck Mumsnet, will also follow this with interest. So true that most content is male leaning. Also agree that AI is never going to be as funny as mumsnetters. See current thread on banning the use of the word 'tummy' Grin

JustineMumsnet · 19/07/2024 11:29

thanks all for the support - if you're happy to, please do share a link to this thread - would love to get the message out as widely as possible Flowers

JennyBeanR · 19/07/2024 11:30

Wow, good luck and thanks for the update 👍

ssd · 19/07/2024 11:33

Well done Star

dieselKiller · 19/07/2024 11:35

Good not to let them just take everything, but what’s the end game?

Are you asking them to pay for access?

Or are you trying to prevent them using the content because you believe that the site’s users should have a say in whether their content is ingested into Open AI’s LLM?

Mumsnet users provide and own the content. Mumsnet users should decide whether their content can be used for this purpose on an individual and opt-in basis.

Can you clarify whether you agree with the principle that each of us must give our own, individual, opt in consent before our text can be used for any purpose beyond simple display on the mumsnet website?

shellyleppard · 19/07/2024 11:36

@JustineMumsnet how do I share?? Sorry not very tech savvy x

Amazinggrace842 · 19/07/2024 11:43

So the future of humanity is that we don't even talk to each other online (now that everyone wants to WFH and cancelling social plans is the new hobby), we're supposed to talk to robots instead? Robot men, at that. Who of course have the answers to all questions 🙄, fed to us by harvesting someone else's knowledge. How fucking lazy is that? The ultimate in crumb communications. I wonder if off grid communities, where you get to interact with others as nature intended, will become more popular?

mrshoho · 19/07/2024 11:46

Amazinggrace842 · 19/07/2024 11:43

So the future of humanity is that we don't even talk to each other online (now that everyone wants to WFH and cancelling social plans is the new hobby), we're supposed to talk to robots instead? Robot men, at that. Who of course have the answers to all questions 🙄, fed to us by harvesting someone else's knowledge. How fucking lazy is that? The ultimate in crumb communications. I wonder if off grid communities, where you get to interact with others as nature intended, will become more popular?

I know! I can't get my head around who is going to benefit from this? The future is not looking bright.

RhiannonEMumsnet · 19/07/2024 11:50

shellyleppard · 19/07/2024 11:36

@JustineMumsnet how do I share?? Sorry not very tech savvy x

Hi @shellyleppard, if you click the share button at the bottom of the OP it will give you some options for sharing the link. Thank you!

shellyleppard · 19/07/2024 11:53

@RhiannonEMumsnet thank you 💐 I will try my best

TheShellBeach · 19/07/2024 11:57

This is very good news, Justine.

Well done and good luck!

LaeralSilverhand · 19/07/2024 12:00

@dieselKiller when you post on mumsnet you transfer all rights to Mumsnet and they can do with your post what they want, including selling it. From the T&Cs:

"By submitting User Content to us, simultaneously with such posting you automatically grant to us a worldwide, fully-paid, royalty-free, perpetual, irrevocable, non-exclusive, fully sublicensable, and transferable right and license to use, record, sell, lease, reproduce, distribute, create derivative works based upon (including, without limitation, translations), publicly display, publicly perform, transmit, publish and otherwise exploit the User Content (in whole or in part) as Mumsnet, in its sole discretion, deems appropriate. We may exercise this grant in any format, media or technology now known or later developed for the full term of any copyright that may exist in such User Content."

This is in line with most social media platforms.

PerkingFaintly · 19/07/2024 12:14

YANBU!

Thank you so much for doing this, @JustineMumsnet .

Sparklfairy · 19/07/2024 12:18

dieselKiller · 19/07/2024 11:35

Good not to let them just take everything, but what’s the end game?

Are you asking them to pay for access?

Or are you trying to prevent them using the content because you believe that the site’s users should have a say in whether their content is ingested into Open AI’s LLM?

Mumsnet users provide and own the content. Mumsnet users should decide whether their content can be used for this purpose on an individual and opt-in basis.

Can you clarify whether you agree with the principle that each of us must give our own, individual, opt in consent before our text can be used for any purpose beyond simple display on the mumsnet website?

I'm a bit confused too. Threads often get picked up by journos and it's an easy way to produce a cheap article - compiled mostly of screenshots and a (possibly AI generated!) summary of the thread.

So presumably MN has some sort of kickback arrangement with the papers, and/or benefit from the increased traffic. And MN users can't complain because as PP said it's in their terms. Them's the breaks, if you post and it ends up elsewhere online, that's the potential pitfall of posting on a public forum.

My confusion comes from the way Justine has phrased it - it seems a bit disingenuous. The reality seems to be MN is simply annoyed they're not getting paid for their content being lifted.

So just say that? Don't make out that somehow this is some virtuous pursuit of equality/fight against misogyny. It isn't - because if you win, Open AI just won't use your content, so you'd only be furthering the male-weighted conversation bias.

You're worried that people will use Open AI instead of MN? Anyone that uses Chat GPT at all knows it's no substitute for real advice from real women. The only reason people would post their questions to GPT instead of MN is because of your own terms allowing journos to pick it up and publish it...

JustineMumsnet · 19/07/2024 12:20

dieselKiller · 19/07/2024 11:35

Good not to let them just take everything, but what’s the end game?

Are you asking them to pay for access?

Or are you trying to prevent them using the content because you believe that the site’s users should have a say in whether their content is ingested into Open AI’s LLM?

Mumsnet users provide and own the content. Mumsnet users should decide whether their content can be used for this purpose on an individual and opt-in basis.

Can you clarify whether you agree with the principle that each of us must give our own, individual, opt in consent before our text can be used for any purpose beyond simple display on the mumsnet website?

Hi dieselKiller, we are requiring that OpenAI deletes any Mumsnet data they hold and ceases to use it for their models. That said, as I've outlined above we think there are some good reasons to use MN conversational data to counter gender bias in LLMs but it seems OpenAI doesn't agree right now.

ResisterOfTwaddleRex · 19/07/2024 12:20

Thank you MNHQ and Justine Flowers

PurpleSparkledPixie · 19/07/2024 12:26

Another one adding my voice of thanks. I don't want a male robot in my future discussions.

EmeraldRoulette · 19/07/2024 12:28

Amazinggrace842 · 19/07/2024 11:43

So the future of humanity is that we don't even talk to each other online (now that everyone wants to WFH and cancelling social plans is the new hobby), we're supposed to talk to robots instead? Robot men, at that. Who of course have the answers to all questions 🙄, fed to us by harvesting someone else's knowledge. How fucking lazy is that? The ultimate in crumb communications. I wonder if off grid communities, where you get to interact with others as nature intended, will become more popular?

That is the endgame. I have been aware of this for a while and I think any steps that we take are delaying it.

But I think delaying it has a lot of value.

At this point, I think anything that preserves some humanity is a great idea. So thank you @JustineMumsnet for taking this action.

Even if it is ultimately about profit, that in itself is really important! otherwise we will be down to 2 corporations battling for everything and no one else will have a commercial outlet.

Sethera · 19/07/2024 12:31

I look forward to Co-Pilot telling me to cancel the cheque or LTB 😅

On a serious note, well done. The global outage today shows just what over-reliance on Microsoft can lead to. They need challenging.

HelpMeGetThrough · 19/07/2024 12:32

The global outage today shows just what over-reliance on Microsoft can lead to. They need challenging.

Microsoft didn't cause the problem.

thesandwich · 19/07/2024 12:35

Thank you

ErrolTheDragon · 19/07/2024 12:40

That said, as I've outlined above we think there are some good reasons to use MN conversational data to counter gender bias in LLMs but it seems OpenAI doesn't agree right now.

I wonder whether any of these companies (or universities) are investigating the effects of using different datasets for their training?

JustineMumsnet · 19/07/2024 12:41

Sparklfairy · 19/07/2024 12:18

I'm a bit confused too. Threads often get picked up by journos and it's an easy way to produce a cheap article - compiled mostly of screenshots and a (possibly AI generated!) summary of the thread.

So presumably MN has some sort of kickback arrangement with the papers, and/or benefit from the increased traffic. And MN users can't complain because as PP said it's in their terms. Them's the breaks, if you post and it ends up elsewhere online, that's the potential pitfall of posting on a public forum.

My confusion comes from the way Justine has phrased it - it seems a bit disingenuous. The reality seems to be MN is simply annoyed they're not getting paid for their content being lifted.

So just say that? Don't make out that somehow this is some virtuous pursuit of equality/fight against misogyny. It isn't - because if you win, Open AI just won't use your content, so you'd only be furthering the male-weighted conversation bias.

You're worried that people will use Open AI instead of MN? Anyone that uses Chat GPT at all knows it's no substitute for real advice from real women. The only reason people would post their questions to GPT instead of MN is because of your own terms allowing journos to pick it up and publish it...

We don't get any kickback from the papers when they do articles based on Mumsnet content, no - in fact they often don't even include a link to Mumsnet, which is frustrating (The Mirror, owned by Reach, being the worst offender here). But there are rules around fair use for papers and they are only lifting small amounts of content - they don't send bots in to copy our code wholesale with the aim of a reproduction of the database. And yes, we believe we should be compensated for that reproduction - we put a lot of resource into running and maintaining the Mumsnet platform and these AI companies are worth billions on the back of their LLMs and the revenues they will make from them. Plus we think that the wholesale disregard of websites' terms of use is wrong (and possibly existential for some). Plus we think women's voices and opinions should be included proportionately in LLMs. None of these things are mutually exclusive.

ErrolTheDragon · 19/07/2024 12:45

HelpMeGetThrough · 19/07/2024 12:32

The global outage today shows just what over-reliance on Microsoft can lead to. They need challenging.

Microsoft didn't cause the problem.

No, but it seems the scale of the problem caused by the CrowdStrike error was due to over-reliance on affected Microsoft systems. It's hard/impossible for airlines, banks etc to build redundant systems based on alternatives so not sure what the answer is!

dieselKiller · 19/07/2024 12:48

Deletion of existing data & desisting scraping is a good first step.

But if, as it sounds, you hope to charge for access, I really hope that users will be able to prevent this use. If you can make a compelling case for including this data, most people will agree and those that don’t (like me) will be happy that you’ve respected their wishes.

I would expect that if you can get Open AI to pay, they will want a dedicated feed. You can easily remove content from opted out users during generation of that feed.

Why not show people what consent actually looks like?
