Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Ormally · 24/04/2024 17:11

I would expect a forensic linguistics institute to be right on top of data protection law, conventions, ethics, and regulations. I mean, handling data is their actual raison d'être, isn't it?

Universities, and the DPO named as responsible for such a place (if the institute is not independent but one of their departments), should also be on top of their own data protection obligations. Most give all employees basic training suited to Higher Education, highlighting a much larger stick (examples of the size of institutional fines that have been levied) than carrot.

This is what makes it look, to me, as if there should be a Data Controller and a Data Processor here (pointing to one being Mumsnet and one Aston), but it's very fuzzy as to which is which, or whether both could be sitting uncomfortably in one of the roles, because of the circumstances and the 'sandbox claim' in which the university came to hold the 'donated' research source. There is also the question of which controller is, or was, waving through the use of the subjects' data (the posters), and for what purpose.

Dumbledoreslemonsherbets · 24/04/2024 17:13

Exchanging indecent images of children is illegal. Sometimes, certain groups (the police, not Aston) are allowed to break some laws in order to uphold others. So forensic authorship analysis might be ok as part of a criminal investigation by a law enforcement agency.

Women talking online is not illegal here (it more or less is in Afghanistan now - think about who you're 'aligning' with here Aston!) so it seems fairly clear to me that data scraping MN is illegal under GDPR.

Dumbledoreslemonsherbets · 24/04/2024 17:15

Maybe this is a case of 'Stonewall law' and organisations acting as if the law is the way they want it to be (like in Afghanistan) rather than how it actually is. The 'transphobia' PhD suggests this might be the case.

RedToothBrush · 24/04/2024 17:18

Dumbledoreslemonsherbets · 24/04/2024 17:15

Maybe this is a case of 'Stonewall law' and organisations acting as if the law is the way they want it to be (like in Afghanistan) rather than how it actually is. The 'transphobia' PhD suggests this might be the case.

I think it may well be.

If it is and it's being used primarily against women, I think that raises all sorts of questions for me.

Ormally · 24/04/2024 17:25

(As a kind of edit to my last post):
A quick web search has indeed easily found the Aston mandatory training page (data protection law and safeguarding being on the list for internally provided training for all employees) and data protection being one heading among 11 top level 'Research Governance' headings.

DrBlackbird · 24/04/2024 17:27

SqueakyDinosaur · 24/04/2024 16:46

I'm wondering now if this research and this event also used MN data?

https://twitter.com/lucia__busso/status/1765017857543352799

It wouldn’t be a huge shock to find out that MN data is a gold mine for Aston…

First, for ethical reasons (researchers well-being), we used a dataset collected from a benign open web parenting discussion forum.

And nice to see that their rationale and ethical focus is the researcher’s wellbeing. Whether it’s MN or another parenting website, there’s no mention of the people whose highly personal data is enabling their bread and butter research papers and conference presentations.

https://link.springer.com/chapter/10.1007/978-3-031-47508-5_11

Hierarchies of Power: Identifying Expertise in Anonymous Online Interactions

This paper sets the stage for our primary objective, which is to identify and examine various forms of claimed expertise in anonymous online interactions. By building upon the findings and incorporating the proposed enhancements, we aim to gain a deepe...

DrBlackbird · 24/04/2024 17:40

So is it the FoLD that @JustineMumsnet is going to ask about ie not just use of MN scraped data but whether MN data is actually being held in FoLD?

Given that one example of data held in FoLD (as per the attached paper) is ‘abusive language targeting the transgender community in a Facebook post’ and the PhD student and her supervisor clearly believe MN is a forum where posters use transphobic comments, it’s not a big leap in logic to come to the conclusion that MN data is being held there…

https://publications.aston.ac.uk/id/eprint/43719/3/Aston_Forensic_Linguistic_Databank_FOLD_001_petykoetal.pdf

Ereshkigalangcleg · 24/04/2024 17:40

Ereshkigalangcleg · 22/04/2024 20:19

Repeating my post from the other thread here:

Krzysztof Kredens and Piotr Pezik spoke at this Forensic Linguistics round table in 2019 about the Mumsnet dataset as part of their corpus.

I've ploughed my way through the whole event. It's actually very interesting to understand what we're dealing with, but the talk which refers to Mumsnet starts at 3 hours 17 minutes. It looks like they set it up as an easily scraped "sandbox" model to play around with. They refer to the distinctive language of "women having fertility treatment" and "dieting".

It's also worth watching Tim Grant's section, he's on second, and the Q&A for that.

For anyone asking for the timings.

Ereshkigalangcleg · 24/04/2024 17:44

I found this video on Saturday, finished watching it on Monday. As people have correctly identified it's over 7 hours long and I've watched all of it.

Ereshkigalangcleg · 24/04/2024 17:52

What I will say is that the forensic speech authorship practitioners mentioned that in the last few years they've had to change practices a lot due to GDPR and data protection law changes. That doesn't seem to have filtered through to these guys.

TokyoBouncyBall · 24/04/2024 18:15

Ereshkigalangcleg · 24/04/2024 17:44

I found this video on Saturday, finished watching it on Monday. As people have correctly identified it's over 7 hours long and I've watched all of it.

You win this thread. Taking one very much for the team.

TokyoBouncyBall · 24/04/2024 18:16

DrBlackbird · 24/04/2024 17:40

So is it the FoLD that @JustineMumsnet is going to ask about ie not just use of MN scraped data but whether MN data is actually being held in FoLD?

Given that one example of data held in FoLD (as per the attached paper) is ‘abusive language targeting the transgender community in a Facebook post’ and the PhD student and her supervisor clearly believe MN is a forum where posters use transphobic comments, it’s not a big leap in logic to come to the conclusion that MN data is being held there…

https://publications.aston.ac.uk/id/eprint/43719/3/Aston_Forensic_Linguistic_Databank_FOLD_001_petykoetal.pdf

If you look at the link on the very first post (now dead but archived a bit further down) the URL very much suggests that it was being held in FOLD. I had a bit of a rifle through that on Friday and yes, it's very much the Unabomber, court records, offenders and, er, us.

TokyoBouncyBall · 24/04/2024 18:21

@DrBlackbird I'm now reading that document that you linked to, which is a description of FoLD. MN was under controlled access - want to know what that is about?

Datasets with controlled access contain highly sensitive material that may come from a third party and have even heavier constraints on access and use. Controlled datasets are therefore stored not on the FoLD web server but on an air-gapped, offline computer in our secure data lab at the Aston Institute for Forensic Linguistics. Users who wish to access these datasets must make a detailed application to FoLD and the data owner, as well as potentially gain additional agreement from an external organisation before they can be approved for access. Although information on controlled datasets is detailed in the FoLD repository for users to search, the data itself is not available for download and users may need to visit Aston or agree a secure means of access. Datasets in this category include, for example, scraped data from white supremacist and dark-web child abuse discussion fora (Kredens and Pezik 2021b,a).

Ereshkigalangcleg · 24/04/2024 18:22

You win this thread. Taking one very much for the team.

I think other people have also watched it now, like @KellieJaysLapdog and I'd recommend watching it to everyone as an introduction to what they do.

SoupDragonsFriend · 24/04/2024 18:23

Would it be possible through making a FoI request to find out how much money has been made by FoLD, The Aston Institute for Forensic Linguistics and the Aston student software company, Beautiful Canoe, from marketing the MN stolen data or is that kind of thing outside the remit of FoI requests? (Assuming that any money-making has happened.)

TokyoBouncyBall · 24/04/2024 18:28

@SoupDragonsFriend I think they are getting grants (possibly from the US and I wonder which bit of govt) to use it so the FOI would have to be worded very carefully.

And this thanks to the Internet Archive

Mumsnet Corpus
Talulahalula · 24/04/2024 18:29

JustineMumsnet · 24/04/2024 10:30

Further update: I spoke to the Vice Chancellor this am. He's promised that he and his team will take time to thoroughly answer our questions which I'm sending over now. A couple of things he wanted to stress - Aston believe they have legitimate rights to use the data and there is/has been no intention to identify individual posters from their posts. He also accepted that the recent research by a first year PhD into "transphobia" may not be of the quality they expect and that he will investigate and commit to enhancements in quality if appropriate. Obviously there's lots more detail we need from them - will update here as and when we hear back.

Does legitimate right to use the data mean legitimate interest in the terms of the GDPR? So for example because there is a public interest in the work they are doing with the dataset which informs work about criminality?

https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/lawful-basis/legitimate-interests/what-is-the-legitimate-interests-basis/

Because surely this is overridden by the amount of material on the site concerning children, apart from anything else? The quotes in the paper which drew from the adoption board directly concerned children.
I am sorry I have not had a chance to catch up on the thread yet, but what is the legitimate right based on?

What is the ‘legitimate interests’ basis?

These pages sit alongside our Guide to the GDPR and provide more detailed guidance for UK organisations on legitimate interests under the GDPR.

ArabellaScott · 24/04/2024 18:35

Their 'legitimate interest' is that they are interested, they want the data, they identify as good guys, and so they're bloody well having it.

RedToothBrush · 24/04/2024 18:35

Talulahalula · 24/04/2024 18:29

Does legitimate right to use the data mean legitimate interest in the terms of the GDPR? So for example because there is a public interest in the work they are doing with the dataset which informs work about criminality?

https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/lawful-basis/legitimate-interests/what-is-the-legitimate-interests-basis/

Because surely this is overridden by the amount of material on the site concerning children, apart from anything else? The quotes in the paper which drew from the adoption board directly concerned children.
I am sorry I have not had a chance to catch up on the thread yet, but what is the legitimate right based on?

Legitimate interests have to be very narrow

I'd argue that the use of language has to be legally deemed illegal to be studied. It can't be non-crimes.

Given what has happened with the Scottish hate crime act this is somewhat problematic (and doesn't extend to England and Wales anyway).

The law also does allow for whistleblowing.

So my argument would be if the data was obtained to be used for examining criminal activity then there has to be evidence of criminal activity and it can't be used for studying something which is non criminal.

Because otherwise it's outside the legitimate usage of the data and isn't being used for its intended purposes.

Talulahalula · 24/04/2024 18:38

ArabellaScott · 24/04/2024 18:35

Their 'legitimate interest' is that they are interested, they want the data, they identify as good guys, and so they're bloody well having it.

Yes but I would like to know what their public facing lawyer speak argument is. What is meant by legitimate right to use by them and why?

(I agree with the counter-arguments, I just want to make sure I understand their argument but I am not sure there is enough information yet).

RedToothBrush · 24/04/2024 18:47

Talulahalula · 24/04/2024 18:38

Yes but I would like to know what their public facing lawyer speak argument is. What is meant by legitimate right to use by them and why?

(I agree with the counter-arguments, I just want to make sure I understand their argument but I am not sure there is enough information yet).

This is the type of question that needs to be asked though. It's really specific.

Some of this might well come down to those pesky issues of definition and who is making those definitions.

Given that the definition of transphobia was totally wacko to begin with and Aston didn't seem to have an issue with it, I wouldn't like to trust Aston's definition of what falls under legitimate purposes either without proper scrutiny.

ArabellaScott · 24/04/2024 18:48

Well, they harvested the data back before the PhD on 'hate crimes'.

Apparently using a 'harmless' site to preserve researchers' wellbeing?

So I guess their 'legitimate interest' is going to be 'research and furthering their ability to research'.

PerkingFaintly · 24/04/2024 18:57

ArabellaScott · 24/04/2024 10:38

They don't get to scapegoat the student, btw. She's just been the canary.

This.

Riva5784 · 24/04/2024 18:57

Justine said Aston believe they have legitimate rights to use the data. I note they used the phrase legitimate rights. They did not say legitimate interest, which has a specific legal meaning. It's just more obfuscating waffle.

This thread is not accepting new messages.