Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says: "Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access."

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

AmaryllisNightAndDay · 22/04/2024 17:13

Thank you @LilyMumsnet I can't think of anything else to say that would be helpful, but here's hoping Aston come up with a sensible response.

Ereshkigalangcleg · 22/04/2024 20:19

Repeating my post from the other thread here:

Krzysztof Kredens and Piotr Pezik spoke at this Forensic Linguistics round table in 2019 about the Mumsnet dataset as part of their corpus.

I've ploughed my way through the whole event. It's actually very interesting to understand what we're dealing with, but the talk which refers to Mumsnet starts at 3 hours 17 minutes. It looks like they set it up as an easily scraped "sandbox" model to play around with. They refer to the distinctive language of "women having fertility treatment" and "dieting".

It's also worth watching Tim Grant's section (he's on second) and the Q&A for that.

Talulahalula · 22/04/2024 20:55

I think that is the paper I was wondering about and refer to on page 2 of the thread. Excellent work to find it.

MrsTerryPratchett · 22/04/2024 21:09

Placemarking to listen to cleverer women than me.

Encyclopediaofnonsense · 22/04/2024 21:28

Mumsnet are not going to like my next question. If mumsnetters - past and present - are doxxed as part of this data scrape, what then? Equally, can past and present mumsnetters request deletion of all their historical posts to prevent future scrapes that may occur after this?

@LilyMumsnet

DrBlackbird · 22/04/2024 22:58

Talulahalula · 22/04/2024 15:22

I think, but I don’t know, that it is the other way around. MN as a large dataset of language use has been used by the Automatic Doxy Guys to develop and prove the concepts which can be applied in other contexts of interest to the US government and intelligence agencies.
Because the dataset of MN exists at Aston, the PhD to look at specific language in a specific political context (or indeed anything else of interest) becomes possible.

But I think the questions above about when the dataset was created, who funded it, who has access to it and so on, are pertinent to understand what has gone on (and is going on).

I’ve not looked up that research grant, but do I understand correctly that the research aim (funded by the US govt) is to be able to identify authorship from online posts by linguistic similarity with other identifiable sources? Presumably using AI….

And is FWR being characterised by Aston’s Forensic Linguistic department as "forum posts from far-right online groups" because it’s women concerned about the impact of gender ideology on women’s safety, privacy, dignity and safeguarding? Struth.

Encyclopediaofnonsense · 22/04/2024 23:27

DrBlackbird · 22/04/2024 22:58

I’ve not looked up that research grant, but do I understand correctly that the research aim (funded by the US govt) is to be able to identify authorship from online posts by linguistic similarity with other identifiable sources? Presumably using AI….

And is FWR being characterised by Aston’s Forensic Linguistic department as "forum posts from far-right online groups" because it’s women concerned about the impact of gender ideology on women’s safety, privacy, dignity and safeguarding? Struth.

Yes. The entire reason for the study is to prove they can take large amounts of anonymous data and link it back to individuals. They are further challenging themselves to use the data to cross-reference styles across different languages and to see how an individual's communication has changed with time. Why Mumsnet and not somewhere like Tattle or MSE or one of the other large longstanding sites is anyone's guess.

VitoCorleoneOfMNMafia · 22/04/2024 23:53

I think @MNHQ needs to have a word with Aston more generally about appropriate engagement: https://www.mumsnet.com/talk/conception/5057450-calling-women-ttc-or-diagnosed-with-infertility

NonLinguisticRhetoricIsMyKryptonite · 23/04/2024 00:00

VitoCorleoneOfMNMafia · 22/04/2024 23:53

I think @MNHQ needs to have a word with Aston more generally about appropriate engagement: https://www.mumsnet.com/talk/conception/5057450-calling-women-ttc-or-diagnosed-with-infertility

Yes.

ADoggyDogWorld · 23/04/2024 00:03

VitoCorleoneOfMNMafia · 22/04/2024 23:53

I think @MNHQ needs to have a word with Aston more generally about appropriate engagement: https://www.mumsnet.com/talk/conception/5057450-calling-women-ttc-or-diagnosed-with-infertility

For now, the link takes you to this page:

'This thread has been hidden until the MNHQ team can have a look at it.'

VitoCorleoneOfMNMafia · 23/04/2024 00:41

ADoggyDogWorld · 23/04/2024 00:03

For now, the link takes you to this page:

'This thread has been hidden until the MNHQ team can have a look at it.'

That's cos I ratted the poster out to Night Watch innit. MNHQ will be able to see the content.

VitoCorleoneOfMNMafia · 23/04/2024 00:57

Encyclopediaofnonsense · 22/04/2024 23:27

Yes. The entire reason for the study is to prove they can take large amounts of anonymous data and link it back to individuals. They are further challenging themselves to use the data to cross-reference styles across different languages and to see how an individual's communication has changed with time. Why Mumsnet and not somewhere like Tattle or MSE or one of the other large longstanding sites is anyone's guess.

The ICO have a page on deanonymisable data. tl;dr: It's personal data, and because Mumsnet is full of women posting about their political opinions, philosophical and religious opinions, sex lives, sexual orientations, and health conditions, it's special category personal data too.

AlisonDonut · 23/04/2024 06:57

I'm just posting here to keep track of this.

I can't speak for anyone else, but I'm going to need to know, from Aston, how they are going to go about rectifying this and responding to potential FoIs on what they did with our data and Subject Access Requests on what they hold, such that they don't force us to doxx ourselves when asking.

DrBlackbird · 23/04/2024 07:20

Encyclopediaofnonsense · 22/04/2024 23:27

Yes. The entire reason for the study is to prove they can take large amounts of anonymous data and link it back to individuals. They are further challenging themselves to use the data to cross-reference styles across different languages and to see how an individual's communication has changed with time. Why Mumsnet and not somewhere like Tattle or MSE or one of the other large longstanding sites is anyone's guess.

Well. If so, surely that’s a death knell for MN? No one posts on an anonymous forum to be identifiable in real life. And why the bloody hell choose a women’s site? Such a misogynistic thing to do. Why not Reddit? If that’s good enough for OpenAI, that should be good enough for Aston.

ArabellaScott · 23/04/2024 07:42

DrBlackbird · 23/04/2024 07:20

Well. If so, surely that’s a death knell for MN? No one posts on an anonymous forum to be identifiable in real life. And why the bloody hell choose a women’s site? Such a misogynistic thing to do. Why not Reddit? If that’s good enough for OpenAI, that should be good enough for Aston.

The institute head said in a speech that Reddit really hates doxxing.

NonLinguisticRhetoricIsMyKryptonite · 23/04/2024 07:43

DrBlackbird · 23/04/2024 07:20

Well. If so, surely that’s a death knell for MN? No one posts on an anonymous forum to be identifiable in real life. And why the bloody hell choose a women’s site? Such a misogynistic thing to do. Why not Reddit? If that’s good enough for OpenAI, that should be good enough for Aston.

Reddit signed a deal with Google to sell its users' posts for training, if anyone's interested in what happened there. A valuable resource for those who want to train chatbots to be breezy (as mentioned), but I also wonder about training them to be more empathetic (for more sensitive issues?).

https://www.theregister.com/2024/02/22/reddit_google_license_ipo_altman/

hamstersarse · 23/04/2024 07:45

What makes me furious about this is the hypocrisy. Universities are, as we all know, cesspits of virtue signalling about identity politics and ‘how to behave’, yet here they are, mining information from a women’s forum to basically enable doxxing.

I am at the point where I’m wishing for the failure of the university system. Rip off bullshit where free thought isn’t even the objective.

Very much looking forward to their weaselly response, it’s going to be a cracker! Although it’ll also probably take some time, because they’ll need it to go through various bullshit committees and forums before they can manage to respond, which is telling in itself.

GreenSmithing · 23/04/2024 08:02

In the long term, generative AI seems likely to be the death knell for forensic linguistics attempts to link anonymous text to individuals. When chatbots are trained using text from Reddit, it will soon be impossible to tell if text is from a chatbot or a human redditor. Similarly, as people start using AI to write forum posts more frequently - and there can be good accessibility reasons for doing this - how will you be able to tell if similarity is due to a person-to-person link, or person-to-AI, or AI-to-AI?

That suggests that the rationale for retaining this corpus has been lost, if the purpose was to train forensic linguistics models.

RethinkingLife · 23/04/2024 08:04

I've just taken a look at the debt thread analysed in the first paper I cited above:

Stanley, L.M. orcid.org/0000-0003-3882-8682, Deville, J. and Montgomerie, J. (2016) Digital Debt Management The Everyday Life of Austerity. New Formations: A Journal of Culture, Theory, Politics, 87. pp. 64-82. ISSN 0950-2378

There's no MNHQ indication that there's permission for academics to look at it there.

And the OP explicitly encourages posters that the thread is effectively a safe space to share their parlous financial situation and woes because it's anonymous.

ElaineMBenes · 23/04/2024 08:07

Just commenting so I get the updates when Aston responds.
This really concerns me as MN is really the only place I can say what I really think (under various usernames), and I'm disturbed by the implications of this study.

everythingthelighttouches · 23/04/2024 08:12

@JustineMumsnet @LilyMumsnet

Please report this directly to the Information Commissioner’s Office

I believe this is a breach of GDPR as Aston have made a copy and stored this data on their own servers.

We need to know if the way they are compiling and storing this data makes it much easier for someone to re-identify a data subject, through the use of Personal Data.

I suspect you have your lawyers on this already.

For ANYONE ELSE who is concerned about their own data, whether or not it is pseudonymised: you are entitled to contact Aston University’s Data Protection Officer to make a Data Subject Access Request to find out if they have your personal data and to withdraw consent for its use. [email protected].

https://www.aston.ac.uk/sites/default/files/Data%20Subject%20Access%20Request%20Procedure.pdf

You can also make a Freedom of Information request to find out about what Aston have been doing in general with Mumsnet data. [email protected]

Encyclopediaofnonsense · 23/04/2024 08:16

I've had a think over a bowl of Coco Pops this morning. Given the way American laws are going in relation to women's health, and that this is an American-backed scrape and study, are we being analysed with a view to policing women's health in the States?

RethinkingLife · 23/04/2024 08:21

Encyclopediaofnonsense · 23/04/2024 08:16

I've had a think over a bowl of Coco Pops this morning. Given the way American laws are going in relation to women's health, and that this is an American-backed scrape and study, are we being analysed with a view to policing women's health in the States?

This is why Aston FoLD and other comparable research groups need advisory and oversight groups…

MarkMenziesFakeMugger · 23/04/2024 08:24

so what’s the point of an anonymous forum … that’s NOT anonymous?

The implications of such dubious ‘research’ are chilling, and it is of course another way to censor and keep women in their place. Pretty appalling. A relative is interested in studying in this field. I’ll be dissuading them from looking at this institution.

Cauliflowery · 23/04/2024 08:55

everythingthelighttouches · 23/04/2024 08:12

@JustineMumsnet @LilyMumsnet

Please report this directly to the Information Commissioner’s Office

I believe this is a breach of GDPR as Aston have made a copy and stored this data on their own servers.

We need to know if the way they are compiling and storing this data makes it much easier for someone to re-identify a data subject, through the use of Personal Data.

I suspect you have your lawyers on this already.

For ANYONE ELSE who is concerned about their own data, whether or not it is pseudonymised: you are entitled to contact Aston University’s Data Protection Officer to make a Data Subject Access Request to find out if they have your personal data and to withdraw consent for its use. [email protected].

https://www.aston.ac.uk/sites/default/files/Data%20Subject%20Access%20Request%20Procedure.pdf

You can also make a Freedom of Information request to find out about what Aston have been doing in general with Mumsnet data. [email protected]

Apologies for the probably stupid question, but in doing a SAR on one's anonymised data you're surely handing them your name and linking it to all the pseudonyms? What happens to that particular piece of data, what stops them stealing that? How can one trust Aston at this point?

This thread is not accepting new messages.