
Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

SoupDragonsFriend · 24/04/2024 12:11

RedToothBrush · 24/04/2024 11:38

No they don't have to take legal advice from anyone on here.

But a failure to take users' concerns seriously does put MN at risk because users have rights they can legally follow up on if they don't feel MN is doing the things they should be.

It would be frankly stupid to fail to address users' concerns in line with these kinds of points.

I would imagine, when identifying issues that have only come to light in the space of six days, it would also be extremely useful to have a rapidly generated bank of replies from people who have a very wide range of potentially relevant expertise, including law. It's an MN focus group.

If it were me having to get to grips with this quickly, and I know nothing about this stuff apart from a Jane Public level of interest and concern, I'd have been sorting this 6+ day collection of forum posts to make initial prioritised lists of the key issues and questions, and picking through information from posters who clearly have professional knowledge in relevant areas just to short-cut my thinking as to whom I needed to be consulting IRL.

Datun · 24/04/2024 12:17

Cazpar · 24/04/2024 11:29

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

Um, they didn't even know they'd been scraped until they were informed by their posters. On here.

KellieJaysLapdog · 24/04/2024 12:17

NotTHATCorpusLinguist · 24/04/2024 11:33

I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what gets corpus linguistics more known and people then think we're all unethical twits. Ehem. So:

The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.

Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.

The fact that the Mumsnet corpus held by Aston has/could be used by both forensic and corpus linguists makes this confusing.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here". But any researcher worth their salt should CHECK that the rules haven't changed. Should check the rules of the website in question. So that's a failure on the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly and it is vital to stay informed.

The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.

To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest in how this all ends...

Thanks, good to hear a view from someone with specific subject knowledge.

What do you think about FoLD being developed by someone (sadly recently passed away) who worked on both corpus and forensic linguistics and Aston advertising a travel grant for students whose work ‘sits at the intersection’ between the two?

I suspect a lot of us would’ve been significantly less troubled if we were just in a general corpus linguistics database, as opposed to a forensic linguistics database!

From what I can see there has been a bit of a push over the last 5 years or so towards combining linguistics disciplines (corpus, forensics, computerised/AI) via shared conferences? Tim Grant seems especially keen on making forensic linguistics ‘more science-y’ but not so machine driven that the results cannot easily be explained by an expert witness to a judge/jury.

AlisonDonut · 24/04/2024 12:19

I want to know why they are allowed to use this 'unparalleled access to genuine data and expertise' as a means to recruit new students, when we have not and MN has explicitly determined that our data is not to be scraped.

And lumping us into their research on criminality - if you think a hate crime has been committed guys, then report it. Otherwise fuck off.

https://www.aston.ac.uk/bss/social-sciences-and-humanities/staff-blogs/forensic-linguistics-cracking-cases-with-words

NotTHATCorpusLinguist · 24/04/2024 12:27

@KellieJaysLapdog I think the mumsnet dataset being in a forensic linguistic database is more about funding/IT bandwidth than it is about how the dataset is used. It's likely that a forensic linguistic database has more protection/safeguarding on it from an access point of view (!) than the general database in the university and so it just needs to be somewhere. I feel a bit twitchy about it, but knowing how universities run, I think it's more likely to be something like "you don't need a corpus linguistics database, just put it in the forensic linguists' database that already exists. You're all linguists, who cares".

Dumbledoreslemonsherbets · 24/04/2024 12:32

Cazpar · 24/04/2024 11:29

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

Condescension and lack of comprehension. What a lovely combination.

Posters on here are posting for OTHER MN USERS, mainly, not MNHQ.

There is already an excellent FOI likely from a MN user. And we can all report to ICO if we want.

We're allowed to talk to each other and share our opinions and expertise with each other, that's kind of the point.

And many women on here are lawyers, some of whom post under their name, such as Sarah Phillimore.

KellieJaysLapdog · 24/04/2024 12:38

NotTHATCorpusLinguist · 24/04/2024 12:27

@KellieJaysLapdog I think the mumsnet dataset being in a forensic linguistic database is more about funding/IT bandwidth than it is about how the dataset is used. It's likely that a forensic linguistic database has more protection/safeguarding on it from an access point of view (!) than the general database in the university and so it just needs to be somewhere. I feel a bit twitchy about it, but knowing how universities run, I think it's more likely to be something like "you don't need a corpus linguistics database, just put it in the forensic linguists' database that already exists. You're all linguists, who cares".

Edited

The chaps that ‘Donated’ the MN data to the FoLD are:

https://research.aston.ac.uk/en/persons/krzysztof-kredens/publications/

and
https://research.aston.ac.uk/en/persons/piotr-pezik

We found a talk on YouTube where one of them can be heard blithely joking about a MN user discussing her infertility.

Tim Grant was another speaker at the same event (in Manchester 2019).

It’s very difficult to trust the integrity of two men who uploaded a giant data scrape described as “license:unknown” without even notifying the owners of the website.

Surely most organisations expect to pay the OG host for data use? Social Media makes money by selling user data to advertisers; why should Aston be able to help themselves to such a valuable resource? Why do they think they can ‘donate’ something they never owned?

Edited to add that ‘Data Donator’ Dr Krzysztof Kredens is listed by Aston as Director, Centre for Forensic Text Analysis so we didn’t end up in the FoLD just because of uni budgets, we were specifically scraped by academics in the forensics department.

Screenshots are of the FoLD MN index page before and after they were rumbled on Friday.

DoNotScrapeMyDataBishes · 24/04/2024 12:41

The breathtaking arrogance of the Vice Chancellor with their "we have this data and we're going to use it anyway so fuck off" response.

In case Aston are scraping this - my username indicates that I do NOT give consent to be used in your thoughtcrime hate studies by the way - or are you now doing "spechul" ethics forms where people get used whether they consent or not? (In which case can I have the hours of my life back from when I had to do all my own ethics paperwork)

Dumbledoreslemonsherbets · 24/04/2024 12:43

KellieJaysLapdog · 24/04/2024 12:00

I’m a massive name changer, btw. Not because I’m committing thought crimes or avoiding a violent ex but because I realised several years ago that a Social Media presence with a name and an image attached is absolutely fucking over the mental health of those who participate in it, especially those who start using it when their brains are still developing. I quit posting anything on social media for both my own sake and to be a better role model for my children.

I like Mumsnet and I like not having a permanent username because here our words, our (robust!) discussions, our expressions of support are our focus.
We value our interactions and the lack of a block button or a predatory algorithm means we aren’t silo’d from each other the way we are elsewhere.
We aren’t elevating the pretty, or those with the highest follower count, here the quality of the discussion is Queen (and also: the inventive sweariness).

The ‘group’ is created by the board topic.

Jonathan Haidt’s new book re: the damage social media is causing to teens is an excellent, if scary, read.
He observes that increasingly the Silicon Valley elite types are turning away from the inventions they created, they aren’t allowing their own children to access social media and they themselves are switching to ‘dumb’ phones outside of work.
It’s the less well-off kids who are spending the most time being warped by algorithms (because single parents and working class parents have less quality time to spend with their kids and fewer resources, making it harder to replace screens with activities).

Part of my giving up on social media meant moving back to the old school parts of the internet, the ones that don’t make us so anxious and depressed, including text based chat forums, like Mumsnet.

Mumsnet is one of the few OG discussion forums that has weathered the big changes that happened to the internet after the invention of the smart phone.

It hasn’t been easy, I’m sure, especially when anti woman activists started targeting advertisers.

Haidt says that the mental health of women and girls has been more affected by social media than that of men (boys have a different set of challenges mostly related to gaming).
I believe that Mumsnet’s great longevity has been precisely because it does not have the features that trigger anxiety and depression in women. You don’t have to worry about being socially ostracised for saying something daft/misguided on Mumsnet, you can just name change and start over, still access the same support, the same jokes, the same resources.

You can’t be ‘cancelled’ for something you say on Mumsnet, because no one knows who you are.

Even the mean gossipy-doxy websites aren’t particularly interested in us, the lack of selfies, the jumble of usernames and the inability to become a celeb tier user (no follower counts, no total posts tally) makes us rather dull to casual observers.
Plus a lot of people just ignore us because they are sexist and assume all mums talk about is mumsy things.

Justine’s commitment to free speech (within UK law and with the need for the business to remain financially viable!) and MN’s general ‘if it ain’t broke, don’t fix it’ attitude to stuff like site functions and layout design have created a British institution far more valuable than perhaps any other website in British history.

Obviously Aston have cottoned onto this value and won’t give up their giant, secretly appropriated data stash (“license: Unsure”) without a fight.

I wonder how various UK politicians feel about their disastrous webchats (and biscuit preferences) being held on a Secret Special Server in Birmingham, funded by the US government? Grin

Great post, agree 100% and also why MNHQ would be very, very silly to not take Aston to the cleaners and I'm sure will be asking hard questions.

They have built a trusted brand. The advertising potential for all these Mums doing the weekly shop, and all the child-related purchases week in week out is huge. Why should Aston be able to harvest that without asking first - the entitlement is breathtaking even before you get to the legality.

Dumbledoreslemonsherbets · 24/04/2024 12:48

Do we think these Aston "academics" realise that Mums have quite a lot of influence on where their kids go to Uni? Or not, in this case.

My understanding is that a lot of Universities have a very precarious financial future so a big ethical faux pas like this, not to mention potentially illegal activity, probably isn't going to help that.

RethinkingLife · 24/04/2024 12:49

Kellie - We found a talk on YouTube where one of them can be heard blithely joking about a MN user discussing her infertility.

Tim Grant was another speaker at the same event (in Manchester 2019).

Is that the roundtable? Is it still up, please? I'd like to consult it for consideration about wider ethical considerations in data projects in healthcare.

KellieJaysLapdog · 24/04/2024 12:50

Aston Institute for Forensic Linguistics, AIFL

https://research.aston.ac.uk/en/organisations/aston-institute-for-forensic-linguistics

Aston Centre for Applied Linguistics, ACAL

https://research.aston.ac.uk/en/organisations/aston-centre-for-applied-linguistics-acal

All our beef is with staff in the Forensics department.

fatshamedbyfamily · 24/04/2024 12:51

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

Dumbledoreslemonsherbets · 24/04/2024 12:51

VitoCorleoneOfMNMafia · 24/04/2024 10:33

Aston believe they have legitimate rights to use the data

I'd love to know what makes him think that.

I'm guessing arrogance and never having been held to account legally before?

I really can't see how it can be legal given the ICO position statement of 2023.

SqueakyDinosaur · 24/04/2024 12:52

I have to say that although the data theft - I mean, let's call it what it actually is - is by far the bigger issue, I hope that the academic quality control issue is also dealt with.

Someone who had been approved for a PhD wrote a biased, assumption-laden outline of a project that can't be described as research in any meaningful way, supported it with a 5-item bibliography including a single vice.com article, and her supervisor apparently saw no issue with that. It is utterly risible.

Dumbledoreslemonsherbets · 24/04/2024 12:53

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

Yeah... you haven't read the thread, or the ICO position on data scraping.

Seen, yes; used, not necessarily, and only within the law (which, IMO, what Aston are doing is not).

fatshamedbyfamily · 24/04/2024 12:54

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

Dumbledoreslemonsherbets · 24/04/2024 12:55

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

You really don't understand the law.

Here you go.

https://ico.org.uk/about-the-ico/media-centre/news-and-blogs/2023/08/joint-statement-on-data-scraping-and-data-protection/#:~:text=%E2%80%9CSocial%20media%20companies%20have%20obligations,as%20a%20personal%20data%20breach.%E2%80%9D

KellieJaysLapdog · 24/04/2024 12:55

RethinkingLife · 24/04/2024 12:49

Kellie - We found a talk on YouTube where one of them can be heard blithely joking about a MN user discussing her infertility.

Tim Grant was another speaker at the same event (in Manchester 2019).

Is that the roundtable? Is it still up, please? I'd like to consult it for consideration about wider ethical considerations in data projects in healthcare.

Edited

Yes, here’s the link again.

We could really do with generating a transcript!
https://m.youtube.com/live/ZUfxdLstIOc

IncompleteSenten · 24/04/2024 12:56

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

I'm shocked that a university would do it and use it for the purposes it stated.

I'm well aware some random dickhead could decide to stalk me and read every one of my probably thousands of posts and dedicate their life to finding out I'm really Amanda from Penistone (there you go, saved you some time) but that's very different to an organisation doing a mass scrape of data over several years with the express intention of finding ways to develop programs to identify people from anonymous posts then seemingly allow people to use that data willy nilly without appropriate scrutiny.

fatshamedbyfamily · 24/04/2024 12:56

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

SqueakyDinosaur · 24/04/2024 12:57

This reply has been deleted

This has been deleted by MNHQ for breaking our Talk Guidelines.

Read Mumsnet's terms and conditions for an easy way to find out just how wrong you are!

Dumbledoreslemonsherbets · 24/04/2024 12:58

Statement on ICO page linked above

“This joint statement helps provide certainty, and consistency across borders, in how data protection applies to information people post online. Organisations must have a lawful reason for collecting and using people’s data, even when it is publicly available.
“Social media companies have obligations under UK data protection law to protect the information people post on their platforms.
“We are seeing increased reports of mass data scraping from social media and remind organisations that such incidents may require reporting to the ICO as a personal data breach.”
- Stephen Bonner, ICO Deputy Commissioner for Regulatory Supervision

Dumbledoreslemonsherbets · 24/04/2024 12:59

And defamation law applies if a University employee calls someone or multiple people 'transphobic' and they're not.

AgathaAllAlong · 24/04/2024 13:01

@KellieJaysLapdog oh god it's over seven hours long! Do you happen to know which talk (or perhaps roughly when in the video) mentions the database?


This thread is not accepting new messages.