

Why we're taking legal action against Open AI and other scrapers

134 replies

JustineMumsnet · 19/07/2024 09:46

Hi all - you may have noticed this piece (https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s) in the Times today and I wanted to explain why we're doing this.

Earlier this year, we became aware that OpenAI was scraping Mumsnet - presumably to train their large language model (LLM). Such scraping without permission is an explicit breach of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval. So we approached OpenAI and suggested they might like to license our content. In truth there are some very good reasons why the LLMs should ingest our conversational data to train their models. The six billion plus words on Mumsnet are a unique record of twenty-four years of female conversation about everything from global politics to fashion to relationships with in-laws. By contrast, the majority of the content on the web was written by and for men. AI models have misogyny baked in and we’d love to help counter the gender bias likely to be present in many of them and raise women’s voices. Their response was that they were more interested in datasets that are not easily accessible online.

Much of the content on the open web is likewise being lifted. Mustafa Suleyman, CEO of Microsoft AI, pronounced only two weeks ago that machine-learning companies are perfectly within their rights to scrape content published online because the moment it’s published it becomes ‘freeware’.

You might ask why the theft of online content for model-training poses a problem - hasn’t Google been crawling all over websites and ingesting their data for search purposes since the dawn of the internet? True, but there is a clear value exchange in allowing Google to access that data, namely the resulting search traffic that comes from being indexed by Google. The LLM companies are building models like ChatGPT to provide the answers to any and all prospective questions, which will mean we’ll no longer need to go elsewhere for solutions. And they’re building those models with scraped content from the websites they are poised to replace.

At Mumsnet we’re in a stronger position than most because much of our traffic comes to us direct. Though it’s a piece of cake for an LLM to spit out a Mumsnet-style answer to a parenting question, I doubt they’ll ever be as funny about parking wars or as honest about relationships, and they’ll certainly never provide the emotional support that sees around a thousand women a year helped to leave abusive partners by other Mumsnet users. But if these trillion-dollar giants are simply allowed to pillage content from online publishers - and get away with it - they will destroy many of them.

Not surprisingly, a number of large, global publishers are currently suing OpenAI and Microsoft for copyright infringement, and here at Mumsnet, though we’re neither large (in revenue terms) nor global, we’ve decided we have no choice but to initiate a legal complaint too.

That’s not to say that AI is all bad, of course. It plainly has the potential to advance human progress and improve our lives in multiple ways. But if the LLMs are allowed to simply steal content from publishers and communities like Mumsnet they risk destroying them. Everything that’s unique and brilliant about sites like ours will be lost, and a handful of Silicon Valley giants will be left with even more control over the world’s content and commerce.

We know that taking on a multinational giant like OpenAI, with its $3bn of revenues, is not an easy task in the face of the huge resources they’ll throw at us but this is too important an issue to simply roll over. Not just for Mumsnet but for every website you’ve ever landed on for news, advice or simply to ask if you’re being unreasonable.

Mumsnet launches first British legal action against OpenAI

Parenting website accuses the California tech giant of scraping six billion words from it to help build the chatbot ChatGPT

https://www.thetimes.com/uk/technology-uk/article/mumsnet-openai-sues-copyright-infringement-cz5hzvf8s

dieselKiller · 19/07/2024 18:51

shellyleppard · 19/07/2024 18:46

@JustineMumsnet @RhiannonEMumsnet there isn't a share button... I have double-checked

Tap the three dots … at the top right of a post and you’ll see a menu that contains Share.

shellyleppard · 19/07/2024 18:53

@JustineMumsnet @RhiannonEMumsnet doh I didn't think of that. I will try again lol

shellyleppard · 19/07/2024 18:54

@JustineMumsnet @RhiannonEMumsnet okay I think it's done... thanks for all your help and patience with me today 🙏 ❤️

IgoogledYOLO · 19/07/2024 18:57

👏

Sibilantseamstress · 19/07/2024 19:15

Well done! Go get ‘em!

TeamPolin · 19/07/2024 19:25

Bravo @JustineMumsnet!! I applaud you. 👏

moderate · 19/07/2024 20:20

I don’t get why so many people are so averse to their writing contributing to the training of large language models.

If those models are hoovering up men’s voices and gender ideology on other sites, I sure as hell want them to ingest what we’ve written here to help provide balance.

veritusvarity · 19/07/2024 20:34

Nice one Justine💪🏻

dieselKiller · 19/07/2024 20:37

LLMs are good for SEO and disinformation and bad for anything that requires correctness.

They are really bad search engines (because they have no concept of correctness), and they are incredibly energy inefficient. Polluting and wrong - what a combo!

This is the CEO of OpenAI.
en.m.wikipedia.org/wiki/Sam_Altman

dudsville · 19/07/2024 20:45

Very thought provoking, well done @JustineMumsnet!

ShakespearesSisters · 19/07/2024 20:53

Good luck. It's a challenge but worth it

TooBusyGazingAtStarss · 19/07/2024 20:54

Well done!

HappiestSleeping · 19/07/2024 21:12

@JustineMumsnet I think you are doing exactly the right thing. AI can be a powerful tool, but the data the models are trained on should be credited and compensated and only used with permission.

I have 35 odd years in technology and would be happy to help you should you need it. PM me if you need an extra pair of hands.

ArabellaScott · 19/07/2024 21:18

moderate · 19/07/2024 20:20

I don’t get why so many people are so averse to their writing contributing to the training of large language models.

If those models are hoovering up men’s voices and gender ideology on other sites, I sure as hell want them to ingest what we’ve written here to help provide balance.

Because it depends. Data Protection is all about sharing data for a specific purpose - perhaps to help another woman who's going through something you've gone through.

If it ends up being used for another purpose then it's potentially impinging on data rights, morally as well as perhaps legally.

moderate · 19/07/2024 21:45

ArabellaScott · 19/07/2024 21:18

Because it depends. Data Protection is all about sharing data for a specific purpose - perhaps to help another woman who's going through something you've gone through.

If it ends up being used for another purpose then it's potentially impinging on data rights, morally as well as perhaps legally.

Legally, we’ve given up all our rights to Mumsnet HQ. They can do whatever they like with our posts, subject to statutory rights.

I just don’t understand why people want to withhold what they’ve publicly posted. It’s excluding yourself from helping to shape attitudes.

SquirrelSoShiny · 19/07/2024 21:51

Well done @JustineMumsnet 👏

dieselKiller · 19/07/2024 21:55

moderate · 19/07/2024 21:45

Legally, we’ve given up all our rights to Mumsnet HQ. They can do whatever they like with our posts, subject to statutory rights.

I just don’t understand why people want to withhold what they’ve publicly posted. It’s excluding yourself from helping to shape attitudes.

Why give defective tools a veneer of quality so that more people are fooled by them?

The attitude that I want to shape is that LLMs are dangerously defective by design.

Apparently ChatGPT already contains mumsnet content? It hasn’t fixed its flaws.

Glitterbiscuits · 19/07/2024 22:10

That's brilliant @JustineMumsnet !

Thanks

WindsurfingDreams · 19/07/2024 22:35

While I support this, I do think it is time to take stock and consider whether you can better protect women from Daily Mail-type "stories".

You could easily pull a thread if the Daily Mail links to it.

It must be quite devastating sometimes for women to share a concern on here and then find it splashed on the Daily Mail website.

I would like to see Mumsnet think a bit more creatively about how to deter the Daily Mail, or at least reach a decent arrangement with them where they won't cover stories that are clearly intrusive.

Ereshkigalangcleg · 19/07/2024 22:48

People who are debating the merits of LLMs should read the Aston thread, and see just how our posts could potentially be misused.

Ereshkigalangcleg · 19/07/2024 22:49

www.mumsnet.com/Talk/site_stuff/5072884-corpus-2

Ereshkigalangcleg · 19/07/2024 22:53

First thread www.mumsnet.com/Talk/site_stuff/5057903-mumsnet-corpus

MavisPennies · 19/07/2024 22:54

Good!

peachgreen · 19/07/2024 22:57

JustineMumsnet · 19/07/2024 17:22

Daily Mail et al would claim this exemption as I understand it
https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing

Thanks for clarifying, Justine.

Oblomov24 · 19/07/2024 23:00

Good for you.
I found the Steven Bartlett podcast with Mohammad Gawdat frightening.
