
Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Cazpar · 24/04/2024 18:59

Riva5784 · 24/04/2024 18:57

Justine said Aston believe they have legitimate rights to use the data. I note they used the phrase legitimate rights. They did not say legitimate interest, which has a specific legal meaning. It's just more obfuscating waffle.

We don't know that. All we know is that Justine has used the phrase when describing the conversation in what is a quite brief post.

AgathaAllAlong · 24/04/2024 19:05

ArabellaScott · 24/04/2024 18:48

Well, they harvested the data back before the PhD on 'hate crimes'.

Apparently using a 'harmless' site to preserve researchers' wellbeing?

So I guess their 'legitimate interest' is going to be 'research and furthering their ability to research'.

Exactly, the bit where they said they looked at the adoption talk subsection of this forum to preserve researcher interest really stood out to me.

In the video, the Scrapers demonstrate so little care or understanding of our forum. At 3.28ish they have a slide up showing the sites they've scraped. There are 4, and only MN is named. They say that their data assumes that each "nickname" (i.e. username) corresponds to one individual. They note that on social media you have more mixed identities and "troll accounts" but they don't consider it so much of a problem on traditional discussion boards (they don't say explicitly but from context they mean MN). This detail demonstrates that they have taken no time at all to get to know this forum. They don't know (and don't care enough to find out) that we all frequently name change, and that we do it for privacy and safety reasons (often with serious real life consequences). Not only is it an unethical attitude, it's a flaw in their research outputs.

They go on to demonstrate their authorship identifying tool by picking out a user (not named) and demonstrating how they tracked them across threads by picking up on characteristics in their typing such as particular expressions used, characteristic spelling mistakes, characteristic typos. They have the real examples up on their slide. Then, they say that the topic that someone posts about also helps identify them, and they give as an example someone who mainly posts about infertility on the infertility board. Again, this shows a complete disregard for MN users and the purpose of our posts.

Riva5784 · 24/04/2024 19:10

Cazpar · 24/04/2024 18:59

We don't know that. All we know is that Justine has used the phrase when describing the conversation in what is a quite brief post.

OK fair enough. There is a lot we don't know.

I understand Justine giving brief updates and MNHQ wanting to play their cards close to their chests. Aston, journalists and others will be looking at these threads.

Encyclopediaofnonsense · 24/04/2024 19:13

Point of interest. Are we not at risk of causing a Streisand effect with this?

Ereshkigalangcleg · 24/04/2024 19:15

I'm not sure that isn't desirable personally. This issue has wider implications for social media than just Mumsnet.

Beingboredisgoodforyou · 24/04/2024 19:21
  • This is an outline of their approval process for accepting datasets into the FOLD databank. It looks rigorous until you realise that it's an internal ethics approval system with one half of the department overseeing the ethics of the other half who run the databank. There appears to be very little scrutiny by the wider university.
  • Aston Forensic Linguistic Databank FOLD_001-petykoetal
  • Copyright (c) 2022 Marton Petyko, et al. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Final published version, 1.42 MB. Licence: CC BY-NC 4.0

https://research.aston.ac.uk/files/76517882/Aston_Forensic_Linguistic_Databank_FOLD_001_petykoetal.pdf

MyLadyDisdainlsYetLiving · 24/04/2024 19:31

Ereshkigalangcleg · 24/04/2024 19:15

I'm not sure that isn't desirable personally. This issue has wider implications for social media than just Mumsnet.

To be honest, given the rate of development of social media, our online lives, the increasing tendency of governments of any party to be authoritarian and the use of AI, I think it’s going to be very difficult to be anonymous in the future, in “mainstream” online life. So I see it as incredibly important to have these wider debates now to establish where we hold the line on what is and isn’t acceptable for organisations to do with our data.

I do think many people will think that if you put your life online then you have no comeback, but I think that is a very naive view not fully appreciating the increasing capabilities of technology. Plus putting trust in governments and organisations to behave ethically and in line with the law. The tainted blood and the post office enquiries show that trust can be misplaced.

Ereshkigalangcleg · 24/04/2024 19:33

So I see it as incredibly important to have these wider debates now to establish where we hold the line on what is and isn’t acceptable for organisations to do with our data.

Yes that's the point I'm at with this.

ArabellaScott · 24/04/2024 19:44

AgathaAllAlong · 24/04/2024 19:05

Exactly, the bit where they said they looked at the adoption talk subsection of this forum to preserve researcher interest really stood out to me.

In the video, the Scrapers demonstrate so little care or understanding of our forum. At 3.28ish they have a slide up showing the sites they've scraped. There are 4, and only MN is named. They say that their data assumes that each "nickname" (i.e. username) corresponds to one individual. They note that on social media you have more mixed identities and "troll accounts" but they don't consider it so much of a problem on traditional discussion boards (they don't say explicitly but from context they mean MN). This detail demonstrates that they have taken no time at all to get to know this forum. They don't know (and don't care enough to find out) that we all frequently name change, and that we do it for privacy and safety reasons (often with serious real life consequences). Not only is it an unethical attitude, it's a flaw in their research outputs.

They go on to demonstrate their authorship identifying tool by picking out a user (not named) and demonstrating how they tracked them across threads by picking up on characteristics in their typing such as particular expressions used, characteristic spelling mistakes, characteristic typos. They have the real examples up on their slide. Then, they say that the topic that someone posts about also helps identify them, and they give as an example someone who mainly posts about infertility on the infertility board. Again, this shows a complete disregard for MN users and the purpose of our posts.

I can't respond to that without swearing, tbh.

Lumpysaurus · 24/04/2024 19:54

I think back to learning to use log tables in maths in the days without electronic calculators and then having a go at writing my first computer programme as a teenager during a spell doing a summer placement at a major science research centre in the early 70s. I filled in a paper form using something like Fortran to calculate the simplest of addition sums and handed it in to be processed. It was returned to me the next day together with the punch tape and cards that my text had been converted into and a printout of the calculation showing that 2 plus 3 did in fact equal 5.
I recently came across this from another Aston department when I was digging around
https://www.aston.ac.uk/latest-news/aston-university-researchers-send-data-45-million-times-faster-average-broadband
The rate of change on all fronts is too big to imagine.

Talulahalula · 24/04/2024 20:02

Riva5784 · 24/04/2024 18:57

Justine said Aston believe they have legitimate rights to use the data. I note they used the phrase legitimate rights. They did not say legitimate interest, which has a specific legal meaning. It's just more obfuscating waffle.

hi, yes I understand that - I was just pondering what was meant by legitimate right - it’s an odd phrasing and I wondered if it was a conflation of legal right and legitimate interest. The word legitimate brought to mind its use in the GDPR context.
Another way of phrasing my question would be on what grounds do Aston consider their creation, use and storage of the MN corpus to be legitimate? Why do they think what they are doing is okay?

hamstersarse · 24/04/2024 20:06

Dumbledoreslemonsherbets · 24/04/2024 17:15

Maybe this is a case of 'Stonewall law' and organisations acting as if the law is the way they want it to be (like in Afghanistan) rather than how it actually is. The 'transphobia' PhD suggests this might be the case.

I agree with this. Their ideology has taken them to a place where they believe they are ‘just right’ about it all and can do what they want to expose these awful transphobes.

The whole course is advertised as being about ‘getting justice’.

Hopefully the ‘are we the bad guys’ meme will hit home very soon

Morred · 24/04/2024 20:21

If data is held in a forensic linguistics repository and is used to determine if (hate) crimes are being committed, is there any expectation (either moral or legal) that any crimes so identified would be reported to the police? Who could then commission use of the same forensic dataset to try to identify who had committed these crimes?

Winnading · 24/04/2024 20:24

ArabellaScott · 24/04/2024 18:48

Well, they harvested the data back before the PhD on 'hate crimes'.

Apparently using a 'harmless' site to preserve researchers' wellbeing?

So I guess their 'legitimate interest' is going to be 'research and furthering their ability to research'.

So the site is both harmless AND a haven for transphobic bigots?
Does anyone think critically anymore?

Encyclopediaofnonsense · 24/04/2024 20:26

There is an alternate argument to this, that it is being held as evidence of a neutral source of inoffensive dialogue and is being used as comparative data to the criminal dialogue they're researching. The ethical considerations of whether anon/pseudonymised postings should be analysed where there is no criminal intent are what needs to be focused on from this.

AlisonDonut · 24/04/2024 20:33

Morred · 24/04/2024 20:21

If data is held in a forensic linguistics repository and is used to determine if (hate) crimes are being committed, is there any expectation (either moral or legal) that any crimes so identified would be reported to the police? Who could then commission use of the same forensic dataset to try to identify who had committed these crimes?

That's the general gist of the research yes.

SqueakyDinosaur · 24/04/2024 20:35

Encyclopediaofnonsense · 24/04/2024 20:26

There is an alternate argument to this, that it is being held as evidence of a neutral source of inoffensive dialogue and is being used as comparative data to the criminal dialogue they're researching. The ethical considerations of whether anon/pseudonymised postings should be analysed where there is no criminal intent are what needs to be focused on from this.

There really, really isn't. Because a PhD candidate presumably applied for permission to use the data in this way. It may originally have been scraped with something like that in mind, but it's clear both from the original talk outline that kicked these threads off, and from the video of the Dodgy Scraper Guys, that they have gone well beyond that with apparently zero qualms.

ItsAllGoingToBeFine · 24/04/2024 21:07

I wonder when MN added data scraping to their Ts and C's? Perhaps the data scraping was carried out before this?

Talulahalula · 24/04/2024 21:12

ItsAllGoingToBeFine · 24/04/2024 21:07

I wonder when MN added data scraping to their Ts and C's? Perhaps the data scraping was carried out before this?

I think it has been updated though, as the PhD went up to 2023.

everythingthelighttouches · 24/04/2024 22:07

From the app (my bold)

“There are also rules that apply to special categories of personal data and seem to limit the requirements when it comes to publicly available data. That is, in line with Article 9, if the processing relates to personal data that are manifestly made public by the data subject, no explicit consent or other legal basis as enlisted in the Article 9 (mainly specific laws and regulations or establishment, exercise or defense of legal claims) is required.
On the other hand, such data would have to be made public by the data subject, and more than that, manifestly made public, so as to indicate that they wish and expect such data to be further processed. No need to mention that all other provisions, including the principles and the Article 6, still apply, and also the personal data may be processed only if the purpose of the processing could not reasonably be fulfilled by other means.”

This link is well worth a read
https://iapp.org/news/a/publicly-available-data-under-gdpr-main-considerations/

everythingthelighttouches · 24/04/2024 22:07

Sorry that should say from the iapp

VitoCorleoneOfMNMafia · 24/04/2024 22:16

Winnading · 24/04/2024 20:24

So the site is both harmless AND a haven for transphobic bigots?
Does anyone think critically anymore?

You win the thread.

SqueakyDinosaur · 24/04/2024 22:44

This is totally irrelevant, but why would the iapp website illustrate a piece about the EU GDPR regulations with a photo of Grand Central Station New York? I mean, come ON!

RedToothBrush · 24/04/2024 22:48

SqueakyDinosaur · 24/04/2024 20:35

There really, really isn't. Because a PhD candidate presumably applied for permission to use the data in this way. It may originally have been scraped with something like that in mind, but it's clear both from the original talk outline that kicked these threads off, and from the video of the Dodgy Scraper Guys, that they have gone well beyond that with apparently zero qualms.

Yep, if the data was scraped under permissions to research criminality (assuming this is a legal exemption or there was a permission granted), that is a different purpose to using data to demonstrate a control.

That's a different use.

You are only allowed to use the data for a very narrow set purpose or for criminal research.

If there isn't a crime, you can't just use the criminality exemption to research it. Nor, if you do have permission to use the data, can you just decide to use it for another purpose without revisiting all the permissions.

So either route really shouldn't be possible in this scenario.

Talulahalula · 24/04/2024 22:51

One thing the iapp link brought to mind was that the data subject can change their mind about the processing and ask for their data to be deleted. Hence, MN will delete posts if requested to do so and you can delete your account. As posted already on this thread, I think, this is not possible in the data scrape.
And looking at the ICO link again, which I posted above, I come back to the point that the data from MN which makes the corpus is very heavily about children.
There’s also a point about reasonable expectations of how data will be used: surely nobody reasonably expects that what they post on MN will end up in a forensic linguistic ‘sandbox’ used for author attribution research.

This thread is not accepting new messages.