Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Ormally · 24/04/2024 22:52

SqueakyDinosaur · 24/04/2024 22:44

This is totally irrelevant, but why would the iapp website illustrate a piece about the EU GDPR regulations with a photo of Grand Central Station New York? I mean, come ON!

Subconscious non-linguistic influence on secondary uses of the word 'railroading' (as a verb)?

IncompleteSenten · 24/04/2024 23:00

everythingthelighttouches · 24/04/2024 22:07

From the iapp (my bold)

“There are also rules that apply to special categories of personal data and seem to limit the requirements when it comes to publicly available data. That is, in line with Article 9, if the processing relates to personal data that are manifestly made public by the data subject, no explicit consent or other legal basis as enlisted in the Article 9 (mainly specific laws and regulations or establishment, exercise or defense of legal claims) is required.
On the other hand, such data would have to be made public by the data subject, and more than that, manifestly made public, so as to indicate that they wish and expect such data to be further processed. No need to mention that all other provisions, including the principles and the Article 6, still apply, and also the personal data may be processed only if the purpose of the processing could not reasonably be fulfilled by other means.”

This link is well worth a read
https://iapp.org/news/a/publicly-available-data-under-gdpr-main-considerations/

Would we not be able to argue that we posted under the assumption that Mumsnet terms and conditions prohibiting third parties from taking the contents would be adhered to and therefore did not reasonably expect the data to be (unlawfully? Illegally? In breach of Mumsnet rules?) used elsewhere?

Ormally · 24/04/2024 23:16

As MyLadyDisdain wrote further up:

The safety mechanism we thought we had to keep us anonymous is regular name changes. The whole point of this Aston database is to develop software that will identify people based on how they write and use language, regardless of name change. The alleged “legitimate” point of such software is to track criminals and terrorists.

There are spokes, at the very least, that suggest clearly debatable issues in many areas including: GDPR; privacy and the protection of identifiable data; data retention policies/right to be forgotten; protection and safeguarding of children and vulnerable subjects either directly or by association with a poster (who may also be identifiable enough through the information posted); interpretation of T&Cs as the law has changed since 2008.

Could there be consequences for individuals IRL if the study either identified, or wrongly identified, people from the analysis of the corpus and the terms in which it has been permitted to operate? Highly likely.

Astontacious · 25/04/2024 08:51

Because every page is now disappearing, it would be useful to remember the real wording used to show the intent. ‘Hate crimes’ and mumsnet was used in the LinkedIn. It still shows up in Google in the blurb.

From Aston’s website:
What is a hate crime?
When hate incidents become criminal offences they are known as hate crimes. A criminal offence is something that breaks the law.

Minority Report was about pre-crime. I don’t know if the researchers are trying to create a database of mums trying to link them to pre-crimes or post-crimes but it is very dystopian.

If the researcher made a mistake with their wording then that’s unfortunate in their field of study but it suggests their opinion and intent.

Ormally · 25/04/2024 09:01

Astontacious, there is a chance that the research theme had to be pushed into the 'criminal' kind of area to justify being able to dip into whatever was intended over that whole corpus of language in the forum, compared to the weak points it offers up at the same time, of identifiability, safeguarding, subjects such as abortion, surgery, infertility, mental health etc. that also happen to be what is believed usable (and a long period of the scrape as well) in the interest of tracking and pinpointing a poster. Not saying that this was the intent, but where the corpus is held and under what security, does make me wonder.

If a suspected criminal was posting about something else on a different forum - child abuse, for example - then similar techniques could be used to compare witness statements, maybe records of interviews with therapists etc (usually totally off limits re sensitive data), but these powers are limited to specialist branches of the police and the law - not linguistics depts.

AstonsDataThief · 25/04/2024 09:05

interpretation of T&Cs as the law has changed since 2008.

Though it is the law and the T&Cs that applied when they did the scrape that are relevant, not those of 2008.

AstonsDataThief · 25/04/2024 09:18

On the other hand, such data would have to be made public by the data subject, and more than that, manifestly made public, so as to indicate that they wish and expect such data to be further processed.

Has it been made public by the data subject when the data subject uses anonymous usernames that regularly change, from more than one account set up under pseudonyms using throwaway email addresses? And certainly how can you possibly conclude from this that they wish and expect such data to be further processed including by linking up those various user names and accounts?

MarkMenziesFakeMugger · 25/04/2024 09:22

AgathaAllAlong · 24/04/2024 19:05

Exactly, the bit where they said they looked at the adoption talk subsection of this forum to preserve researcher interest really stood out to me.

In the video, the Scrapers demonstrate such little care or understanding of our forum. At 3.28ish they have a slide up showing the sites they've scraped. There are 4, and only MN is named. They say that their data assumes that each "nickname" (i.e. username) corresponds to one individual. They note that on social media you have more mixed identities and "troll accounts" but they don't consider it so much of a problem on traditional discussion boards (they don't say explicitly but from context they mean MN). This detail demonstrates that they have taken no time at all to get to know this forum. They don't know (and don't care enough to find out) that we all frequently name change, and that we do it for privacy and safety reasons (often with serious real life consequences). Not only is it an unethical attitude, it's a flaw in their research outputs.

They go on to demonstrate their authorship identifying tool by picking out a user (not named) and demonstrating how they tracked them across threads by picking up on characteristics in their typing such as particular expressions used, characteristic spelling mistakes, characteristic typos. They have the real examples up on their slide. Then, they say that the topic that someone posts about also helps identify them, and they give as an example someone who mainly posts about infertility on the infertility board. Again, this shows a complete disregard for MN users and the purpose of our posts.

It’s pretty Machiavellian behaviour. I’d not put this past a for-profit company - but a university!

Astontacious · 25/04/2024 09:25

Ormally · 25/04/2024 09:01

Astontacious, there is a chance that the research theme had to be pushed into the 'criminal' kind of area to justify being able to dip into whatever was intended over that whole corpus of language in the forum, compared to the weak points it offers up at the same time, of identifiability, safeguarding, subjects such as abortion, surgery, infertility, mental health etc. that also happen to be what is believed usable (and a long period of the scrape as well) in the interest of tracking and pinpointing a poster. Not saying that this was the intent, but where the corpus is held and under what security, does make me wonder.

If a suspected criminal was posting about something else on a different forum - child abuse, for example - then similar techniques could be used to compare witness statements, maybe records of interviews with therapists etc (usually totally off limits re sensitive data), but these powers are limited to specialist branches of the police and the law - not linguistics depts.

Yes the funding stream is always god. But it seems clumsy at best to phrase like that in LinkedIn. And remember this linguistic department’s staff is used as experts by the police.

This should be a reminder to mums everywhere: name your child something common rather than youique so if they do things they or their department regrets, it’s less likely to be accurately googled. If the researcher is viewing, I think it will be ok on those fronts. But oh the irony of mums discussing children and young people not realising what they are doing and how to prevent regret, then those words being linguistically analysed to post-judge-and-jury if they are crimes.

edit: even I can't get my words right and need an edit - irony of ironies

ArsetonUniversity · 25/04/2024 09:56

MarkMenziesFakeMugger · 25/04/2024 09:22

It’s pretty Machiavellian behaviour. I’d not put this past a for-profit company - but a university!

Sadly Universities follow the funding. They (correctly) identified that a database like this would bring in funding and graduate students.

Dumbledoreslemonsherbets · 25/04/2024 10:11

It's funny how we have apparently no way to delete all our posts from their 'sandbox' but they're deleting pages about this (including the 'hate crimes' page) left right and centre.

They really don't see the users of mumsnet - even when posting about things as heartrending as fertility or adoption - as human, do they? It's real people, real children being discussed here.

Immoral bastards.

I do not consent to my data on here under any username being used by Aston University in any context because I think they are immoral and do not see the people on here as fully human - based on their words and deeds. I never consented to the multiple uses they've put the data to, and they did not ask me for a change of use of my data, which GDPR requires and it requires a specific request - you cannot be covered by something so general as to be meaningless. People cannot consent to all and sundry use of their data in perpetuity.

My company also has to inform all users when they change their T&Cs, and I do not recall being informed of a change which allowed this use of my data (although MN has said their use breached the T&Cs publicly already).

Dumbledoreslemonsherbets · 25/04/2024 10:13

I'm afraid I'm very close to deciding I need to delete my MN account and ask MN to remove all of my posts under any username.

I did have two accounts previously that I deleted. I haven't gone back to check whether all posts under usernames associated with those accounts were deleted - I don't think they do this, but if there's a specific risk (which there is from Aston activities) I wonder if this is something posters could request?

KellieJaysLapdog · 25/04/2024 10:14

I think I might’ve fed Aston’s Hate Crimes page into Internet archive over the weekend without mentioning it anywhere here.

I’ll look it up when I have time later

Dumbledoreslemonsherbets · 25/04/2024 10:24

Have MNHQ explicitly said they've informed the ICO, that their Data Protection Officer is involved, or that they're getting legal advice?

Talulahalula · 25/04/2024 10:26

AstonsDataThief · 25/04/2024 09:18

On the other hand, such data would have to be made public by the data subject, and more than that, manifestly made public, so as to indicate that they wish and expect such data to be further processed.

Has it been made public by the data subject when the data subject uses anonymous usernames that regularly change, from more than one account set up under pseudonyms using throwaway email addresses? And certainly how can you possibly conclude from this that they wish and expect such data to be further processed including by linking up those various user names and accounts?

I don’t think you can.
It behoves Aston to explain how and why they think their use of the user content is legal, both in terms of copyright (where it goes beyond fair use, I would think - are there exemptions for research and education?), and in terms of how it meets GDPR requirements legally. That is before we get to the ethics.
As I understand it, MN would be the data controller in GDPR terms and Aston the third party using the data. We are the data subjects. Hence, we can ask MN to delete data and posts. The problem is that we cannot ask Aston and we didn’t know the material was going to that use.
I am not (yet) at the stage of deleting my account but I am not posting anywhere else on the site anymore.

Talulahalula · 25/04/2024 10:26

Dumbledoreslemonsherbets · 25/04/2024 10:24

Have MNHQ explicitly said they've informed the ICO, that their Data Protection Officer is involved, or that they're getting legal advice?

No, I don’t think so.

Whinge · 25/04/2024 10:34

Dumbledoreslemonsherbets · 25/04/2024 10:24

Have MNHQ explicitly said they've informed the ICO, that their Data Protection Officer is involved, or that they're getting legal advice?

I don't believe so.

BIWI · 25/04/2024 10:37

Ereshkigalangcleg · 22/04/2024 20:19

Repeating my post from the other thread here:

Krzysztof Kredens and Piotr Pezik spoke at this Forensic Linguistics round table in 2019 about the Mumsnet dataset as part of their corpus.

I've ploughed my way through the whole event. It's actually very interesting to understand what we're dealing with, but the talk which refers to Mumsnet starts at 3 hours 17 minutes. It looks like they set it up as an easily scraped "sandbox" model to play around with. They refer to the distinctive language of "women having fertility treatment" and "dieting".

It's also worth watching Tim Grant's section, he's on second, and the Q&A for that.

I watched the first of these (the one starting around 3.17) where they explained the data sets that they had used for their 'sandboxes', and I was interested to see that they have taken data from 4 different sources, but only Mumsnet was named specifically. The other fora were just named X, Y and Z. Any idea why this might be? Do they somehow seem to think that Mumsnet is more public/available than the others? Do they somehow seem to think that they had permission to use Mumsnet?

IncompleteSenten · 25/04/2024 10:39

That's right. The unlimited name change facility on here is unusual for the internet and clearly demonstrates that Mumsnet knows and supports our desire to be anonymous and prevent people being able to read through all our posts over months and years to try to identify us.

If they have all this data to use to match up posts under different usernames and say all of these posts under these ten usernames are actually by the same poster and here is the full picture of this poster - then that is unacceptable and unethical and surely illegal?

Ormally · 25/04/2024 10:44

My company also has to inform all users when they change their T&Cs, and I do not recall being informed of a change which allowed this use of my data

My thoughts too. And changes of T&Cs are typically prompted by, or hot on the heels of, changes in tech, and ramping up of the abilities of tools that can be used for online analysis (among other legal developments). The live forum and the sandbox version can surely only be seen as 2 entities that have been diverging with speed and certainty, and with control being very much at the mercy of (at least) 2, UK-based, companies, with very different motivations. But only one of them put an initial agreement with its users on the table.

Encyclopediaofnonsense · 25/04/2024 11:12

Dumbledoreslemonsherbets · 25/04/2024 10:13

I'm afraid I'm very close to deciding I need to delete my MN account and ask MN to remove all of my posts under any username.

I did have two accounts previously that I deleted. I haven't gone back to check whether all posts under usernames associated with those accounts were deleted - I don't think they do this, but if there's a specific risk (which there is from Aston activities) I wonder if this is something posters could request?

Good luck. They've so far ignored my request.

hamstersarse · 25/04/2024 11:14

BIWI · 25/04/2024 10:37

I watched the first of these (the one starting around 3.17) where they explained the data sets that they had used for their 'sandboxes', and I was interested to see that they have taken data from 4 different sources, but only Mumsnet was named specifically. The other fora were just named X, Y and Z. Any idea why this might be? Do they somehow seem to think that Mumsnet is more public/available than the others? Do they somehow seem to think that they had permission to use Mumsnet?

I just watched that part too and thought the same - who are X Y and Z? Why were they so confident to name Mumsnet? Which obviously was the main data source by a country mile.

Having watched the video, I just cannot comprehend how these people think this is ethical. What is the ultimate purpose of their 'research' - to create a Big Brother authoritarian society where no-one has any right to privacy or anonymity?

I feel like adding a line to the bottom of every one of my posts which they can put in their sandbox and FOTTFSOF:

<To the criminals at Aston University, hope you are having a nice day and mine away on this - trans women are not women>

DewinDwl · 25/04/2024 11:24

Dumbledoreslemonsherbets · 25/04/2024 10:24

Have MNHQ explicitly said they've informed the ICO, that their Data Protection Officer is involved, or that they're getting legal advice?

I don't think so.

I remember the MN-intern-who-breached-GDPR-for-ideological-reasons debacle and I remember MNHQ's initial response ("she has been spoken to", "she has apologised"). I hope things are different this time.

Yes, I agree with your pp @Dumbledoreslemonsherbets - misogyny is at the heart of Aston's behaviour

BIWI · 25/04/2024 11:30

The other thing that I object to is the use of 'sandbox'. Unless it's a specific, technical term. But they actually talked about using the sandbox to 'play'. Just shows that there really isn't any thought about or concern for the women/posters that they're using for their jollies.

This thread is not accepting new messages.