

Why we're taking legal action against OpenAI and other scrapers

134 replies

JustineMumsnet · 19/07/2024 09:46

Hi all - you may have noticed this piece (https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s) in the Times today and I wanted to explain why we're doing this.

Earlier this year, we became aware that OpenAI was scraping Mumsnet, presumably to train their large language model (LLM). Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval. So we approached OpenAI and suggested they might like to license our content. In truth, there are some very good reasons why the LLMs should ingest our conversational data to train their models. The six billion plus words on Mumsnet are a unique record of twenty-four years of female conversation about everything from global politics to fashion to relationships with in-laws. By contrast, the majority of the content on the web was written by and for men. AI models have misogyny baked in, and we’d love to help counter the gender bias likely to be present in many of them and raise women’s voices. Their response was that they were more interested in datasets that are not easily accessible online.

Much of the content on the open web is likewise being lifted. Mustafa Suleyman, CEO of Microsoft AI, pronounced only two weeks ago that machine-learning companies are perfectly within their rights to scrape content published online because the moment it’s published it becomes ‘freeware’.

You might ask why the theft of online content for model training poses a problem. Hasn’t Google been crawling all over websites and ingesting their data for search purposes since the dawn of the internet? True, but there is a clear value exchange in allowing Google to access that data, namely the search traffic that comes from being indexed by Google. The LLM makers are building models like ChatGPT to answer any and all prospective questions, so that we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the very websites they are poised to replace.

At Mumsnet we’re in a stronger position than most because much of our traffic comes to us direct. And though it’s a piece of cake for an LLM to spit out a Mumsnet-style answer to a parenting question, I doubt they’ll ever be as funny about parking wars or as honest about relationships, and they’ll certainly never provide the emotional support that sees around a thousand women a year helped by other Mumsnet users to leave abusive partners. But if these trillion-dollar giants are simply allowed to pillage content from online publishers, and get away with it, they will destroy many of them.

Not surprisingly, a number of large, global publishers are currently suing OpenAI and Microsoft for copyright infringement. Here at Mumsnet, though we’re neither large (in revenue terms) nor global, we’ve decided we have no choice but to initiate a legal complaint too.

That’s not to say that AI is all bad, of course. It plainly has the potential to advance human progress and improve our lives in multiple ways. But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet, they risk destroying them. Everything that’s unique and brilliant about sites like ours will be lost, and a handful of Silicon Valley giants will be left with even more control over the world’s content and commerce.

We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us, but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.

Mumsnet launches first British legal action against OpenAI

Parenting website accuses the California tech giant of scraping six billion words from it to help build the chatbot ChatGPT

https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s

BIWI · 19/07/2024 10:03

Brilliant. But must also be very daunting. Flowers

WalkInAStraightLine · 19/07/2024 10:08

This is really interesting and well done for taking a stance! I will watch with interest.

In my experience, ChatGPT and other AIs that attempt to 'summarise' text very often leave out key context or details, or are just plain incorrect. Now clearly you can't stop that even if it wasn't scraping MN, but it's worth pointing out how flawed these results often are. In fact, Google is being sued for libel after publishing a 'summary' that was very misleading:
https://www.theatlantic.com/technology/archive/2024/06/google-ai-overview-libel/678751/

https://www.addleshawgoddard.com/en/insights/insights-briefings/2023/technology/generative-ai-licence-libel/

As a very tangential tangent, if you are interested in questions around who gets to shape what we talk about and how, I enjoyed the (fiction) book "The Dictionary of Lost Words" by Pip Williams - it's based around the creation of the Oxford Dictionary and although I'm not hugely into stories about relationships this also raised some thought-provoking questions.

Ereshkigalangcleg · 19/07/2024 10:09

Well done @JustineMumsnet Flowers

RivkaTheBold · 19/07/2024 10:09

Good luck to you @JustineMumsnet

OuterSpaceCadet · 19/07/2024 10:19

Excellent news.

Thesquarerootofnotgivingafuck · 19/07/2024 10:20

Go Justine!

EatMoreFibre · 19/07/2024 10:24

Yes! Flowers

RumNotRun · 19/07/2024 10:24

Good luck 🤞

I also second the recommendation by @WalkInAStraightLine It is a fantastic book and came to mind when I read the part of Justine's post about the majority of web content being written for and by men.

TwattyMcFuckFace · 19/07/2024 10:27

Well done and good luck! 🍾💐

DrSpartacular · 19/07/2024 10:30

Good luck!

EasterlyDirections · 19/07/2024 10:32

I saw the article this morning, good luck!

SirChenjins · 19/07/2024 10:35

Good luck! You’re doing the right thing 💪

SiobhanSharpe · 19/07/2024 10:39

Such a cogent argument, it really made me think.
Well done, it's a necessary action, and best of luck with it.

AstonToTheNaughtyStep · 19/07/2024 10:55

Well done Justine, and best of luck.

I'm not a mother, but I ended up here because every time I searched the internet for answers to problems ranging from work issues to personal relationships to attempting DIY in my house, there would always be a link to Mumsnet in the list. I love it here. Thank you for giving women this fabulous, diverse and safe space to speak to each other.

AstonUniversityScrapedMyCorpus · 19/07/2024 10:56

Fascinating news, makes me proud to be a Mumsnetter!

Boiledbeetle · 19/07/2024 10:56

You tried it the nice way and they weren't interested.

Now you crush them!

shellyleppard · 19/07/2024 10:59

Thank you Mumsnet, for standing up to the big guys. I'm following with interest as I would be lost without this site, it makes me smile daily. Thanks again x

FedUpWithBriiiiick · 19/07/2024 10:59

👏👏👏

Dragonfly97 · 19/07/2024 11:01

Good luck Mumsnet! I'm right behind you. I'm not a mum, but I appreciate the wealth of knowledge you provide on so many issues. We need you!

ooooohnoooooo · 19/07/2024 11:02

Good luck 🤞

morningpaper · 19/07/2024 11:03

Good luck! It's a fascinating area TBH and I can't make up my mind about most of it... but am following with interest! Well done for tackling it.

WhereDoBrokenHeartsGo · 19/07/2024 11:05

This is amazing, I love that you are taking a stand and raising awareness.

I don’t understand how it can be considered fair use to scrape and summarise websites. I’ve rarely seen output that adds value; most of it is summarising one or two websites and rehashing their information. It really bothers me that someone has worked hard to create valuable content and it ends up being taken with absolutely no reward for the original creator.

Morgi · 19/07/2024 11:21

Good luck MNHQ! Surely there should be better regulation to prevent this? And such an important point about most of the internet being written by and for men 🙄

libertybonds · 19/07/2024 11:23

I heard you on Radio 4 this morning. Really good initiative on your part! You have my support.

mrshoho · 19/07/2024 11:25

Good luck Justine and thank you 💐.