Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

RedToothBrush · 23/04/2024 20:45

Ereshkigalangcleg · 23/04/2024 20:36

I don't think I have anything to add about my opinion on the student above what I said on the other thread, so I'll take the point but respectfully disagree.

I'm not sure why I'm supposed to feel sorry for her either. She's no different to any blinkered, empathy free transactivist shouting TWAW at women at events. They're often "young students" too. Women are not here to coddle everyone's feelings. So I'll respectfully disagree back.

Young students have enough responsibility and agency to change sex if they want it seems but don't have enough responsibility and agency to be held to account for anything they may post online or write. Is that what we are getting at here?

If they are a PhD student they will be over 21 (I'm good at maths here). A 21-year-old medical student could be in a public-facing role working in a hospital as part of their studies. Do they cease to be responsible for their manner, their beliefs or their conduct too? Or do they have to work within certain confines of ethical practice as well?

Just I'm starting to find it really confusing as to what point the age of legal responsibility kicks in here. 10, 12, 14, 16, 18, 21, 25, under 30?

Sorry. If you deliberately are writing a hit piece on MN and its users as a PhD student, I think perhaps you have reached an age where you are starting to run out of excuses. You know what you are doing and what your objectives are.

Welliwould · 23/04/2024 20:52

Commenting to get updates. Thank you all so much for this.

Talulahalula · 23/04/2024 20:55

From The Routledge Handbook of Forensic Linguistics (Ed Malcolm Coulthard, Alison May and Rui Sousa-Silva, 2021 second edition), in a chapter called ‘Experts and opinions in my opinion’ by Malcolm Coulthard, p. 535, he says:

‘Kris Kredens at the Aston University Institute for Forensic Linguistics has a currently restricted, but in the longer term accessible, corpus of posts to internet discussion fora, including Mumsnet, totalling over three billion words’.

There is no reference given to the source of this information, so it must have been sufficiently well-known in the field not to need one (or the author here has heard Kredens speak). Question: what is meant by ‘currently restricted, but in the longer term accessible’? Obviously, the author here is a third party but one sufficiently confident to write this.

mrshoho · 23/04/2024 20:57

I'd like to ask mumsnet what security systems are in place to prevent data scraping on your site? Also, what systems are in place to identify and monitor attempts to scrape data?

Do you feel any obligation to your users to notify us when you do grant authorisation for data to be scraped?

Ereshkigalangcleg · 23/04/2024 21:20

‘Kris Kredens at the Aston University Institute for Forensic Linguistics has a currently restricted, but in the longer term accessible, corpus of posts to internet discussion fora, including Mumsnet, totalling over three billion words’.

There is no reference given to the source of this information, so it must have been sufficiently well-known in the field not to need one (or the author here has heard Kredens speak). Question: what is meant by ‘currently restricted, but in the longer term accessible’? Obviously, the author here is a third party but one sufficiently confident to write this.

Yes I'd like to know what this means too.

KellieJaysLapdog · 23/04/2024 21:24

Talulahalula · 23/04/2024 20:55

From The Routledge Handbook of Forensic Linguistics (Ed Malcolm Coulthard, Alison May and Rui Sousa-Silva, 2021 second edition), in a chapter called ‘Experts and opinions in my opinion’ by Malcolm Coulthard, p. 535, he says:

‘Kris Kredens at the Aston University Institute for Forensic Linguistics has a currently restricted, but in the longer term accessible, corpus of posts to internet discussion fora, including Mumsnet, totalling over three billion words’.

There is no reference given to the source of this information, so it must have been sufficiently well-known in the field not to need one (or the author here has heard Kredens speak). Question: what is meant by ‘currently restricted, but in the longer term accessible’? Obviously, the author here is a third party but one sufficiently confident to write this.

Adding these screenshots to make MNHQ’s lives easier.

Possible questions for Aston

Is the book quote using ‘Restricted’ as it is described in the screenshot (from FoLD’s main website)?

(nb: the dataset was actually classified as ‘Controlled’ when we found it, since deleted from the website index)

More specifically, what does ‘accessible’ mean in this context, and under what circumstances does a repository change a dataset’s classification?

Also, just wanted to call MNHQ’s attention to the bit where it says ‘License: Unsure’

Winnading · 23/04/2024 21:29

everythingthelighttouches · 23/04/2024 13:36

This is a very good question actually.

Yes, you would be essentially re-identifying yourself if you provide your real name, or an email address and your mumsnet username to Aston. (Mumsnet already has all this data of course, but you consented to that).

Whether you trust Aston is a separate question.

If an institution gained more personal information about a data subject during the course of the access request and then made that public, or breached GDPR in some way, they would be subject to the very highest fines from the ICO, from memory it is in the region of €20million per breach. For a university this would also come with significant reputational damage with wider implications still.

What if they have already identified us?
Say my posts here are forensically connected to my Facebook and it's bloody obvious that winna is also xyz on Facebook.
Ok they may not have made this public (yet) but they might have the full details in a document somewhere. Visible to the right students, visible to lots of staff.

Always the possibility of a leak, something I really really don't want, but neither am I willing to dox myself to find out. Sigh. I hope the code thingy is going to work.

What a fucking mess. Stupid university.

KellieJaysLapdog · 23/04/2024 21:30

And to put that ‘License: Unsure’ in context, here is the category info from other uploads in the FoLD repository.

ArabellaScott · 23/04/2024 21:33

If an institution gained more personal information about a data subject during the course of the access request and then made that public, or breached GDPR in some way, they would be subject to the very highest fines from the ICO, from memory it is in the region of €20million per breach. For a university this would also come with significant reputational damage with wider implications still.

Crikey.

Astontacious · 23/04/2024 21:44

That’s a lot of responsibility on the researcher not to actually reveal anything.

I have thought about this a bit more. Surely by being the judge and jury and already finding mumsnet/mumsnetters guilty in the PhD title, the researcher is setting themselves up for being sued for libel/malicious communications? She’s got to try and find some now because her title has told her she has.

VitoCorleoneOfMNMafia · 23/04/2024 21:45

everythingthelighttouches · 23/04/2024 20:19

I think I’ve answered my own question

https://www.aston.ac.uk/research/integrity-ethics/ethics

“Ethics review and approval process

If you are a staff member or postgraduate research student, and your research involves any of the following, you must apply for research ethics approval (this approval must be received before your research commences):

  • Human participants: (including all types of interviews, questionnaires, focus groups, records relating to humans, use of online datasets or other secondary data, observations, etc.)
  • Human tissue or cells gathered prospectively from participants. Please see our Human tissue in Research page if you are purchasing or transferring samples to Aston University.
  • Risk to members of the research team such as:
      • lone working during data collection
      • travel to areas where researchers may be at risk: (any request for research requiring international travel should be accompanied by a University travel risk assessment form)
      • risk of emotional distress
      • other: please outline
  • Any risk to the environment
  • Any conflict of interest
  • Research that could be considered controversial or be of reputational risk to Aston University
  • Social media data and/or data from internet sources that could be regarded as private
  • Any other ethical considerations: (any substantial ethical considerations you are aware of)
If you have answered YES to any of the above, you need to take the following steps in applying to your College Research Ethics Committee (CREC) to seek approval to commence your research.”

Human participants: (including all types of interviews, questionnaires, focus groups, records relating to humans, use of online datasets or other secondary data, observations, etc.)

Women are human and our posts are records relating to humans.

AgathaAllAlong · 23/04/2024 21:45

Not sure if this has been mentioned already, but I've found another published Aston paper that uses data from the MN database [edit: see edit] They include (short and non-identifying, to not alarm anyone) direct quotes in MN language, which seems like a massive breach to me (relevant sections of the paper in screenshots - 1. where they mention the data; 2. is the notes on that sentence; 3. the quotes).

Paper: Htait, A., Busso, L., Grant, T. (2024). Hierarchies of Power: Identifying Expertise in Anonymous Online Interactions. In: Naik, N., Jenkins, P., Grace, P., Yang, L., Prajapat, S. (eds) Advances in Computational Intelligence Systems. UKCI 2023. Advances in Intelligent Systems and Computing, vol 1453. Springer, Cham. https://doi-org.eux.idm.oclc.org/10.1007/978-3-031-47508-511

Assuming MN headquarters didn't know about this either?

Edit: does 'adoption topics' mean something else in this context...? I need a linguist to tell me, as when I first read it I thought it meant 'taken from the talkpages on adoption'. If so - very bad.

AstonCanKissMyArse · 23/04/2024 21:47

RedToothBrush · 23/04/2024 20:45

Young students have enough responsibility and agency to change sex if they want it seems but don't have enough responsibility and agency to be held to account for anything they may post online or write. Is that what we are getting at here?

If they are a PhD student they will be over 21 (I'm good at maths here). A 21-year-old medical student could be in a public-facing role working in a hospital as part of their studies. Do they cease to be responsible for their manner, their beliefs or their conduct too? Or do they have to work within certain confines of ethical practice as well?

Just I'm starting to find it really confusing as to what point the age of legal responsibility kicks in here. 10, 12, 14, 16, 18, 21, 25, under 30?

Sorry. If you deliberately are writing a hit piece on MN and its users as a PhD student, I think perhaps you have reached an age where you are starting to run out of excuses. You know what you are doing and what your objectives are.

Hear hear.

What I've seen on this forum is users wanting accountability / saying the student is old enough to know better / that they don't feel sorry for them.

Not that the student should be vilified, punished or permanently cancelled.

There is a massive difference.

Consequences (as John Wick would say).

Also: estas rano en mia bideo 🍻👀🐸 (to fuck with the data collection).

Talulahalula · 23/04/2024 21:51

As an aside, there is a detailed consideration of the ethics of using MN posts for research in a chapter called ‘Digital interaction’ by Jal MacKenzie in the Routledge Handbook of English Language and Digital Humanities (2020). It’s chapter 4. It would be good reading for the doxy dudes.
p50 - ‘research has suggested that parenting forums around the world … can offer safe spaces in which women can explore motherhood on their own terms’
p.52 ‘In addition, it became clear to me during the course of my observations [of MN] that Mumsnet users often valued their sense of privacy and anonymity very highly, with many exercising their autonomy and agency in imaginative ways to control and shape the accessibility of their posts […] One of the most important decisions I made as a result of these considerations was to contact all the Mumsnet users whose words I wished to quote and/or analyse in detail and ask for their informed consent’.

The author here goes on to say that they contacted users in batches of ten over 24-hour periods to gauge their response and, among other things, this researcher also gave participants the chance to have their usernames anonymised. There’s more but this extract gives a good sense of how using posts on MN for research can be approached sensitively and carefully. From the references, it looks like the author here has published more on internet research ethics specifically in 2017, so it’s not like ethical awareness was not out there when the doxy dudes scraped the site.

I just add the above for reference in case anyone at Aston is reading this thread and needs a primer on how to do things. Also worth noting that MN gave permission for the study I have extracted from in this post, for the use of their logo in the book, and for the quoted extracts in the actual study.

everythingthelighttouches · 23/04/2024 21:53

“What if they have already identified us?”

They haven’t.

If you have identified yourself by putting your own real name in your posts then yes they will now have that information. But then they didn’t identify you, you did.

The concern is the theoretical identification of real people (referred to as natural persons in the GDPR). I.e. Aston Uni have pooled together so many pieces of information from different threads/posts attached to one username that someone could identify the real life person without much effort, if so motivated.

So it’s not as serious as you are imagining but it is considered extremely seriously by the Information Commissioner’s Office (ICO).

Sparklybutold · 23/04/2024 21:53

Has anyone actually reported it to the ICO?

MarkMenziesFakeMugger · 23/04/2024 21:54

JustineMumsnet · 23/04/2024 15:16

Update - Aston Uni have responded and offered a call with their Vice Chancellor to explain the reasons for the research, how they manage ethical approval and protect privacy and data. I'll be taking them up on that and putting some of our own (and your) questions. Will report back!

I’m uncertain as to how a phone call will sufficiently hold the ethics of this university to account. I think this is a weak response from Aston.

Encyclopediaofnonsense · 23/04/2024 21:54

Sparklybutold · 23/04/2024 21:53

Has anyone actually reported it to the ICO?

I came on to ask this. If not, can someone contact them?

VitoCorleoneOfMNMafia · 23/04/2024 21:56

Sparklybutold · 23/04/2024 21:53

Has anyone actually reported it to the ICO?

@MNHQ would be the obvious people to do so, as the lawful data controller.

Aston are supposed to report their own GDPR breaches within 72 hours but I don't think they think they've breached GDPR.

MarkMenziesFakeMugger · 23/04/2024 22:02

VitoCorleoneOfMNMafia · 23/04/2024 21:56

@MNHQ would be the obvious people to do so, as the lawful data controller.

Aston are supposed to report their own GDPR breaches within 72 hours but I don't think they think they've breached GDPR.

If they’ve caused distress - and they have - then surely they’ll have breached their own code of conduct/ ethics. GDPR breach is an extra dollop of mess on top of that I would have thought?

So far the student didn’t think this through, nor the supervisor, nor whoever was in charge of agreeing the data theft. (I’m assuming that’s what this is. Technically another word may apply?) Another university making big F* ups.

Has this ‘story’ been outed to the press yet?

everythingthelighttouches · 23/04/2024 22:07

VitoCorleoneOfMNMafia · 23/04/2024 21:56

@MNHQ would be the obvious people to do so, as the lawful data controller.

Aston are supposed to report their own GDPR breaches within 72 hours but I don't think they think they've breached GDPR.

I asked MNHQ to do this this morning.

I think it will be debatable as to whether Aston have breached GDPR. Only expert lawyers will be able to decide.

As I said earlier there may be a case around de-identification of special category data. This would be incredibly serious for Aston. It is complicated by the fact that they have used data scraping which is not very well covered by the ICO (it is a relatively new technology and practice). The ICO and other international regulatory agencies only released a statement on it at the end of last year.

I think this might end up being a test case in court for GDPR.

Mumsnet are on much steadier ground with the breach of their Ts&Cs and potentially their IP.

Since I’ve been thinking about it this evening, I think Aston may have to stop this going any further, destroy the data and report themselves to their funders for breach of their own ethics and research integrity policies.

All this is for the lawyers to sort out after the initial “conversation” tomorrow.

Talulahalula · 23/04/2024 22:11

AgathaAllAlong · 23/04/2024 21:45

Not sure if this has been mentioned already, but I've found another published Aston paper that uses data from the MN database [edit: see edit] They include (short and non-identifying, to not alarm anyone) direct quotes in MN language, which seems like a massive breach to me (relevant sections of the paper in screenshots - 1. where they mention the data; 2. is the notes on that sentence; 3. the quotes).

Paper: Htait, A., Busso, L., Grant, T. (2024). Hierarchies of Power: Identifying Expertise in Anonymous Online Interactions. In: Naik, N., Jenkins, P., Grace, P., Yang, L., Prajapat, S. (eds) Advances in Computational Intelligence Systems. UKCI 2023. Advances in Intelligent Systems and Computing, vol 1453. Springer, Cham. https://doi-org.eux.idm.oclc.org/10.1007/978-3-031-47508-511

Assuming MN headquarters didn't know about this either?

Edit: does 'adoption topics' mean something else in this context...? I need a linguist to tell me, as when I first read it I thought it meant 'taken from the talkpages on adoption'. If so - very bad.

Yes, it’s MN because you can find the thread if you put one of the quotes in Google. FFS.

Talulahalula · 23/04/2024 22:13

It is the Talk pages in the Adoption board. Not going to link.

everythingthelighttouches · 23/04/2024 22:14

I also really think it would be a good idea for anyone worried to scroll down to the bottom of this page and click on the links to the mumsnet terms that we all signed up to.

Also clicking on privacy will give you information and contact details of mumsnet’s data protection officer.

I’ve had another read of these today. It’s always good to be as informed as possible.

IDoNotConsentToAstonResearch · 23/04/2024 22:15

DeanElderberry · 23/04/2024 19:33

If that 'poor student' wanted my sympathy, that 'poor student' shouldn't have accused me of a crime.

The self-righteous young can be very dangerous.

The 17th century Witchfinder General Matthew Hopkins’ career took him from the ages of 24 to 27. Just an interesting factoid I think more people should know.
