Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says: "Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access."

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

IncompleteSenten · 24/04/2024 11:15

Going by a legal issue on a thread a while back, the legal advice is not to show your hand, because doing so gives the other side a heads-up.

Datun · 24/04/2024 11:18

"no intention to identify individual posters from their posts."

a) so what's the point of it then? They wanted to identify authorship. How is that not identifying separate content from individual posters? They may not have our names, but they will have collected 'personally identifying information'. Which leads to names...

b) when MN say no scraping is allowed, and women sign up on that basis, how is that not misleading if it can then be scraped?

c) Have Aston offered any money/reward to allow this after the event?

Cazpar · 24/04/2024 11:18

SqueakyDinosaur · 24/04/2024 11:10

This came to light a couple of days ago. The CEO of Mumsnet has today spoken to the VC of Aston. No doubt there are people at Aston scrambling to get a defensible position together, but at this stage I'm not sure what else we could expect. MN may be getting their legal team to apply for remedy.

I would hope that MN is demanding that Aston stop all data-scraping from here and delete existing data, but I think saying they aren't taking it seriously is jumping the gun here.

There is an awful lot of jumping the shark on this thread.

If you genuinely think posters on here are the only ones taking it seriously then you're deluded.

Aston will be scrambling to try and ascertain whether they did have a right to use this data and whether they've acted legally. They will have an expensive legal team.

MNHQ will be doing the same but approaching it from the opposite direction.

For at least one party this is going to be a very difficult and costly situation. No-one is going to be sat on their arse giving a Gallic shrug.

It is complex and will need to be examined carefully. We will not get answers overnight. If you are uncomfortable posting in the meantime then that's a shame, but proper care must be taken and it can't be rushed just because posters on here are unhappy. That way lies mistakes and more legal wrangling.

The idea that MNHQ should be taking random posters along to the call or that they're not taking it seriously is laughable.

RethinkingLife · 24/04/2024 11:19

On Site Stuff, it might be helpful to have a list of organisations and projects to whom MN have given permission with an overview of what that encompasses. It would make it easier to check and it would be a contribution to transparency.

RedToothBrush · 24/04/2024 11:25

Cazpar · 24/04/2024 11:18

There is an awful lot of jumping the shark on this thread.

If you genuinely think posters on here are the only ones taking it seriously then you're deluded.

Aston will be scrambling to try and ascertain whether they did have a right to use this data and whether they've acted legally. They will have an expensive legal team.

MNHQ will be doing the same but approaching it from the opposite direction.

For at least one party this is going to be a very difficult and costly situation. No-one is going to be sat on their arse giving a Gallic shrug.

It is complex and will need to be examined carefully. We will not get answers overnight. If you are uncomfortable posting in the meantime then that's a shame, but proper care must be taken and it can't be rushed just because posters on here are unhappy. That way lies mistakes and more legal wrangling.

The idea that MNHQ should be taking random posters along to the call or that they're not taking it seriously is laughable.

If a data breach has happened or is suspected, there is an obligation to report yourself to the ICO within a few days of the incident or it coming to light.

I would hope this obligation isn't just ignored because it happens to be inconvenient or might somehow not be in the interests of the organisation.

Because it's not about the organisation it's about the rights and protections of users.

everythingthelighttouches · 24/04/2024 11:26

JustineMumsnet · 24/04/2024 10:30

Further update: I spoke to the Vice Chancellor this am. He's promised that he and his team will take time to thoroughly answer our questions which I'm sending over now. A couple of things he wanted to stress - Aston believe they have legitimate rights to use the data and there is/has been no intention to identify individual posters from their posts. He also accepted that the recent research by a first-year PhD student into "transphobia" may not be of the quality they expect and that he will investigate and commit to enhancements in quality if appropriate. Obviously there's lots more detail we need from them - will update here as and when we hear back.

They need to immediately tell you the lawful basis and conditions for processing under Articles 6 and 9 of the UK GDPR, respectively.

He should have had that ready at the meeting today.

It is irrelevant whether they intend(ed) to identify individuals. It is now a distinct possibility.

This needs to be referred to the ICO.

Cazpar · 24/04/2024 11:28

RedToothBrush · 24/04/2024 11:25

If a data breach has happened or is suspected, there is an obligation to report yourself to the ICO within a few days of the incident or it coming to light.

I would hope this obligation isn't just ignored because it happens to be inconvenient or might somehow not be in the interests of the organisation.

Because it's not about the organisation it's about the rights and protections of users.

And I'm sure MNHQ are aware of this obligation given they have hundreds of thousands, if not millions, of people's data, they have no doubt very competent lawyers who are skilled in this area, and they will certainly be aware of the deleterious consequences of not complying.

Cazpar · 24/04/2024 11:29

everythingthelighttouches · 24/04/2024 11:26

They need to immediately tell you the lawful basis and conditions for processing under Articles 6 and 9 of the UK GDPR, respectively.

He should have had that ready at the meeting today.

It is irrelevant whether they intend(ed) to identify individuals. It is now a distinct possibility.

This needs to be referred to the ICO.

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

everythingthelighttouches · 24/04/2024 11:30

RethinkingLife · 24/04/2024 11:19

On Site Stuff, it might be helpful to have a list of organisations and projects to whom MN have given permission with an overview of what that encompasses. It would make it easier to check and it would be a contribution to transparency.

This is literally in the terms and privacy policy at the bottom of the page.

Please, everyone should take ten minutes to read what you have signed up to on a public forum.

DeanElderberry · 24/04/2024 11:30

I have always been cautious about what I put online - don't use Facebook or Whatsapp - use a pseudonym, make minor tweaks to things I describe. But over nearly 25 years I have several times had the experience of the penny dropping as I realise exactly who a fellow cautious anonymised poster is in 'real life' (all life is real, online and off). Not on Mumsnet actually, so far anyway, but on other sites.

I have no doubt at all that a determined human could ID me if they wanted to. I'm lucky that I don't have to protect other people from intrusion, I am not linked to an employer, and I am not at present doing anything that brings me into conflict with the law in the country I live in.

Many people here, who joined the site trusting in the good faith of Mumsnet, are in at least one of those situations, and they have a right not to have their data taken and presented for artificial analysis by remote searchers.

NotTHATCorpusLinguist · 24/04/2024 11:33

I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what gets corpus linguistics more known and people then think we're all unethical twits. Ehem. So:

The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.

Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.

The fact that the Mumsnet corpus held by Aston has/could be used by both forensic and corpus linguists makes this confusing.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here". But any researcher worth their salt should CHECK that the rules haven't changed. Should check the rules of the website in question. So that's a failure on the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly and it is vital to stay informed.

The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.

To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest in how this all ends...

VitoCorleoneOfMNMafia · 24/04/2024 11:33

ArabellaScott · 24/04/2024 10:38

They don't get to scapegoat the student, btw. She's just been the canary.

This.

In seeking to slander FWR, she's alerted us to a far bigger problem that affects every past and current poster on Mumsnet.

She's unwittingly done us all a favour.

everythingthelighttouches · 24/04/2024 11:34

Cazpar · 24/04/2024 11:29

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

It’s not legal advice.

It is my opinion and any individual can request this information as part of the data subject access process.

TokyoBouncyBall · 24/04/2024 11:37

So, other people think data protection is a feminist issue too:

https://forkast.news/demanding-data-privacy-is-feminist-fight/

ArabellaScott · 24/04/2024 11:38

NotTHATCorpusLinguist · 24/04/2024 11:33

I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what gets corpus linguistics more known and people then think we're all unethical twits. Ehem. So:

The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.

Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.

The fact that the Mumsnet corpus held by Aston has/could be used by both forensic and corpus linguists makes this confusing.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here". But any researcher worth their salt should CHECK that the rules haven't changed. Should check the rules of the website in question. So that's a failure on the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly and it is vital to stay informed.

The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.

To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest in how this all ends...

Thanks, that's helpful.

RedToothBrush · 24/04/2024 11:38

Cazpar · 24/04/2024 11:29

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

No they don't have to take legal advice from anyone on here.

But a failure to take users' concerns seriously does put MN at risk because users have rights they can legally follow up on if they don't feel MN is doing the things they should be.

It would be frankly stupid to fail to address users' concerns in line with these kinds of points.

Astontacious · 24/04/2024 11:41

Cazpar · 24/04/2024 11:29

They will have their own dedicated lawyers. They don't need to take legal advice from posters on here.

But it’s good for everyone to be reminded that mums wear lots of hats.

Boiledbeetle · 24/04/2024 11:44

Let's see how songs mess with their data collection.

Not my best lyrics I know but I'm nursing a stinking migraine! Plus it wouldn't let me do what I actually wanted to say as I used naughty words originally!

https://suno.com/song/f62726b6-64af-46a5-a99b-67ef748f02ff

No!
No!
No debate!
We women say our NO!
We aren't your little playthings
Wrapped up in a pretty bow.
We aren't your toy to play with in your adult sandbox
We don't want to be a part of your attempts to dox
So take your data set and stick it in the bin
our words aren't yours to use, so just let that sink in.

No!
No!
No debate
We women say our No!

VitoCorleoneOfMNMafia · 24/04/2024 11:44

NotTHATCorpusLinguist · 24/04/2024 11:33

I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what gets corpus linguistics more known and people then think we're all unethical twits. Ehem. So:

The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.

Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.

The fact that the Mumsnet corpus held by Aston has/could be used by both forensic and corpus linguists makes this confusing.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here". But any researcher worth their salt should CHECK that the rules haven't changed. Should check the rules of the website in question. So that's a failure on the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly and it is vital to stay informed.

The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.

To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest in how this all ends...

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here".

I called this the "someone else's department" effect. The assumption that someone else got the website owner's consent to collect the dataset, ran doing so past ethics, etc etc so no one else has to do any checks. I really used to piss people off in my last job because I'd ask for their ethics approval certificate number when they were starting a new project that involved personal data and I got criticised by some PIs for overstepping what technical staff are meant to do. I'm now wondering whether, far from me overstepping, everyone involved with research data should do this.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all.

I would not be so sure, given her department and supervisor's specialisation.

C8H10N4O2 · 24/04/2024 11:45

NotTHATCorpusLinguist · 24/04/2024 11:33

I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what gets corpus linguistics more known and people then think we're all unethical twits. Ehem. So:

The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.

Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.

The fact that the Mumsnet corpus held by Aston has/could be used by both forensic and corpus linguists makes this confusing.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here". But any researcher worth their salt should CHECK that the rules haven't changed. Should check the rules of the website in question. So that's a failure on the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly and it is vital to stay informed.

The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.

To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest in how this all ends...

Isn't the ethical issue with this piece of corpus research/modeling also the bias in the subjective assumptions of the researcher? (in this case that MN is transphobic but it could equally be anything else).

I'm coming at it from the model building/AI and data side rather than the linguistics side but we have to be very cautious about data sets created for CDA type research due to problems with bias.

NotTHATCorpusLinguist · 24/04/2024 11:48

@VitoCorleoneOfMNMafia "The assumption that someone else got the website owner's consent to collect the dataset, ran doing so past ethics, etc etc so no one else has to do any checks. [...] I'm now wondering whether, far from me overstepping, everyone involved with research data should do this."

I would agree with this. It's not enough to say it's not your problem because you didn't collect it yourself. Should be: You use it, you check it.

RedToothBrush · 24/04/2024 11:52

VitoCorleoneOfMNMafia · 24/04/2024 11:44

It is likely that ethical approval has been given for EP's PhD already precisely because the dataset already exists at Aston. In a kind of "I'm using an approved internal dataset, nothing to see here".

I called this the "someone else's department" effect. The assumption that someone else got the website owner's consent to collect the dataset, ran doing so past ethics, etc etc so no one else has to do any checks. I really used to piss people off in my last job because I'd ask for their ethics approval certificate number when they were starting a new project that involved personal data and I got criticised by some PIs for overstepping what technical staff are meant to do. I'm now wondering whether, far from me overstepping, everyone involved with research data should do this.

EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all.

I would not be so sure, given her department and supervisor's specialisation.

Data protection law in the UK is that you can only use that data for a very narrow, explicit reason. You can't say 'research purposes', for example, because that's too vague. You have to state research of x, on every single occasion. This is precisely to prevent situations like this and people feeling misled about what they have given consent to.

People don't seem to understand this well, but talk to the ICO and I bet they will confirm my point.

KellieJaysLapdog · 24/04/2024 12:00

I’m a massive name changer, btw. Not because I’m committing thought crimes or avoiding a violent ex but because I realised several years ago that a Social Media presence with a name and an image attached is absolutely fucking over the mental health of those who participate in it, especially those who start using it when their brains are still developing. I quit posting anything on social media for both my own sake and to be a better role model for my children.

I like Mumsnet and I like not having a permanent username because here our words, our (robust!) discussions, our expressions of support are our focus.
We value our interactions and the lack of a block button or a predatory algorithm means we aren’t silo’d from each other the way we are elsewhere.
We aren’t elevating the pretty, or those with the highest follower count, here the quality of the discussion is Queen (and also: the inventive sweariness).

The ‘group’ is created by the board topic.

Jonathan Haidt’s new book re: the damage social media is causing to teens is an excellent, if scary, read.
He observes that increasingly the Silicon Valley elite types are turning away from the inventions they created, they aren’t allowing their own children to access social media and they themselves are switching to ‘dumb’ phones outside of work.
It’s the less well-off kids who are spending the most time being warped by algorithms (because single parents and working class parents have less quality time to spend with their kids and fewer resources, making it harder to replace screens with activities).

Part of my giving up on social media meant moving back to the old school parts of the internet, the ones that don’t make us so anxious and depressed, including text based chat forums, like Mumsnet.

Mumsnet is one of the few OG discussion forums that has weathered the big changes that happened to the internet after the invention of the smart phone.

It hasn’t been easy, I’m sure, especially when anti-woman activists started targeting advertisers.

Haidt says that the mental health of women and girls has been more affected by social media than that of men (boys have a different set of challenges mostly related to gaming).
I believe that Mumsnet’s great longevity has been precisely because it does not have the features that trigger anxiety and depression in women. You don’t have to worry about being socially ostracised for saying something daft/misguided on Mumsnet, you can just name change and start over, still access the same support, the same jokes, the same resources.

You can’t be ‘cancelled’ for something you say on Mumsnet, because no one knows who you are.

Even the mean gossipy-doxy websites aren’t particularly interested in us, the lack of selfies, the jumble of usernames and the inability to become a celeb tier user (no follower counts, no total posts tally) makes us rather dull to casual observers.
Plus a lot of people just ignore us because they are sexist and assume all mums talk about is mumsy things.

Justine’s commitment to free speech (within UK law and with the need for the business to remain financially viable!) and MN’s general ‘if it ain’t broke, don’t fix it’ attitude to stuff like site functions and layout design have created a British institution far more valuable than perhaps any other website in British history.

Obviously Aston have cottoned onto this value and won’t give up their giant, secretly appropriated data stash (“license: Unsure”) without a fight.

I wonder how various UK politicians feel about their disastrous webchats (and biscuit preferences) being held on a Secret Special Server in Birmingham, funded by the US government? [grin]

NotTHATCorpusLinguist · 24/04/2024 12:04

C8H10N4O2 · 24/04/2024 11:45

Isn't the ethical issue with this piece of corpus research/modeling also the bias in the subjective assumptions of the researcher? (in this case that MN is transphobic but it could equally be anything else).

I'm coming at it from the model building/AI and data side rather than the linguistics side but we have to be very cautious about data sets created for CDA type research due to problems with bias.

I wouldn't say it's the main issue. It's definitely a problem but more in the area of research quality. It's definitely a bit suspect! But it is entirely separate from the problem of lack of consent.

NonLinguisticRhetoricIsMyKryptonite · 24/04/2024 12:09

I name change to reflect contemporary events at times
