
Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

VitoCorleoneOfMNMafia · 08/05/2024 22:13

DrSpartacular · 08/05/2024 20:01

I thought posters were referring to asking Aston to delete data?

There seem to be crossed wires here.

Mumsnet will delete posts if there's a risk of being identified. You don't need to give all your usernames as they're already recorded against your account.

There is no way for Aston, or any researchers who've published using Aston's MN corpus, to link RL identities to posts/names on their (already or soon to be deleted) MN corpus, and no way for posters to prove who they are to request deletions of data. I could email and say 'hey, I'm langcleg, delete everything I've written' but there's no way to prove that either way.

Ultimately, the use of anonymous at source data like this gives us as data subjects fewer rights to our data as we are unable to prove it even is our data.

We can easily prove that it is our data.

  1. I make a thread in OTBT and copy its URL.
  2. I email Aston saying "please delete data of Vito. I will post the phrase 'pombears are not naice' in the thread at <insert URL here> as Vito to prove that I own that username after sending this email".
  3. I wait an hour and post 'pombears are not naice' in that thread.

Because I emailed them with the phrase before I posted it on here, provable by comparing timestamp of the email and the post, it's highly unlikely that the email can be from anyone other than that username.
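For anyone who wants the check spelled out, here is a minimal sketch in Python of what the person receiving the email would compare. The function name, its parameters and the one-day window are all illustrative assumptions; nothing here reflects how Aston or Mumsnet actually handle requests.

```python
from datetime import datetime, timedelta


def proves_ownership(
    email_sent: datetime,
    email_phrase: str,
    claimed_username: str,
    post_made: datetime,
    post_author: str,
    post_text: str,
    max_gap: timedelta = timedelta(days=1),  # assumed limit, not from the thread
) -> bool:
    """Return True if the post plausibly shows the emailer controls the username."""
    # The post must come from the username the emailer claims to own.
    if post_author != claimed_username:
        return False
    # The phrase announced in the email must appear in the post.
    if email_phrase.lower() not in post_text.lower():
        return False
    # The email must predate the post; otherwise anyone could copy the phrase.
    if email_sent >= post_made:
        return False
    # Keep the delay short so an old email cannot be matched to a later post.
    return post_made - email_sent <= max_gap


# Hypothetical timestamps: email first, post one hour later.
email_time = datetime(2024, 5, 8, 22, 13)
post_time = email_time + timedelta(hours=1)
print(proves_ownership(email_time, "pombears are not naice", "Vito",
                       post_time, "Vito", "pombears are not naice"))
```

The order of the two timestamps is what does the work: if the post came first, anyone reading the thread could send the email afterwards.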

Talulahalula · 08/05/2024 22:31

DrSpartacular · 08/05/2024 20:53

Of course there is. We see it all the time on MN that posters ask for their threads to be deleted because they are too identifying, or other parties recognise themselves in the description and join in.

But that risk already exists by posting on here. Most people know already that tabloids swipe threads from here, which presents a far greater risk of being identified.

I don't dispute that Aston's data scraping, dodgy use of data they don't have consent to use, and their playing in the sandbox of women's lives is all abhorrent, but they've agreed to delete it. What I do think needs to happen is that this whole issue needs to be drawn to the attention of all MNers highlighting how to request post/history deleting. Because it's the MN posting histories where risks of identification are strongest.

My point was not that posters could not be identified in other ways, it was that Aston are being disingenuous if they say their intention was not to identify posters IRL. Firstly because it is possible to identify someone from a post and secondly because their research is about author attribution. So while they may not have intended to identify MN posters, it is theoretically possible to identify someone from the quotes they have used and secondly, the direction of travel of their research is to identify people from online posts.

Also, I might be wrong, but I think even the Daily Mail does not lift threads about adoption and infertility. And finally, Aston University, like all universities, is supposed to pay attention to research ethics, not ride roughshod over it.

And then, we have the point that their 11.3m grant is part of a project funded by the US govt intelligence service. And the dataset was offered for use by other researchers (now taken down).

Not that I am condoning lazy journalism, but I think there are quite big differences in what has gone on here and what the tabloids do.

Fallingirl · 08/05/2024 23:43

The legalities of it all aside, I am gobsmacked that Aston University is being obstinate about the smaller data scrape, the one intended for the PhD research.

How on earth is it even thinkable to give ethical permission to use data, however pseudonymous and publicly posted, when you have been told explicitly that the authors of that data do not consent to their posts being used?

At the very least, the thesis and any publication based on her work, should have a footnote explaining these data were used non-consensually.

Winnading · 09/05/2024 05:59

Fallingirl · 08/05/2024 23:43

The legalities of it all aside, I am gobsmacked that Aston University is being obstinate about the smaller data scrape, the one intended for the PhD research.

How on earth is it even thinkable to give ethical permission to use data, however pseudonymous and publicly posted, when you have been told explicitly that the authors of that data do not consent to their posts being used?

At the very least, the thesis and any publication based on her work, should have a footnote explaining these data were used non-consensually.

It's currently in vogue to find transphobia anywhere. Even better if it's women committing such.
And I guess by us all complaining, it 'proves' the theory. The old 'if you're not doing anything wrong, you have nothing to fear'.

ArabellaScott · 09/05/2024 06:25

their 11.3m grant is part of a project funded by the US govt intelligence service.

Should we be contacting the Pentagon and asking for their DPO?

EdenPalmersVenomViper · 09/05/2024 08:43

I would like to know how Eden was defining 'transphobia'.

The way that the word is used to shut down debate, by being thrown at anyone who does not completely agree with Trans Ideology, is untethered from the word 'phobia'.

DrP spoke about this on the recent Queen's Speech with Clive Simpson.

If a woman prisoner didn't want to be locked up with Karen White (as they had a completely reasonable fear of Karen hurting them based on Karen's previous record of hurting women) is that transphobic?

Is understanding that human beings cannot change sex even with cross sex hormones and surgery (so biological reality) transphobic?

I'm sure that TRAs would say yes.

This is madness! Understanding reality and having reasonable fears is not any kind of phobia.

Think about religion. Some people are Christians, some people are not Christians. Are the non-Christians all automatically Christianphobic purely because they do not believe in the religion? Some people believe in Trans Ideology, some people do not believe in Trans Ideology. If someone does not believe, the believers do throw 'Transphobe' at the non-believers all the friggin time!

As I said, it's madness!

AstonsDataThief · 09/05/2024 08:46

ArabellaScott · 09/05/2024 06:25

their 11.3m grant is part of a project funded by the US govt intelligence service.

Should we be contacting the Pentagon and asking for their DPO?

How is this not commercial research then? That is not exempted from copyright - even if copyright was the only issue here.

SamuelDJackson · 09/05/2024 08:55

Are Aston being obstinate about the second data scrape (PhD project) or is it being treated/argued differently because it raises different issues?

Thinking in terms of both timeline and purposes

First dataset collection taken before there was clear guidance and official advice on scraping - unethical but with no obvious malign intent, broad collection of data across many boards, used as a 'sandbox' for testing ideas and programs initially - clearly not appropriate but probably done unthinkingly, without an agenda against Mumsnet posters (beyond the usual chauvinism and disregard of boundaries)

Update of Mumsnet T&Cs with specific rules on scraping in Nov 23? Researcher takes a smaller scraped dataset Jan 24 - so very clearly in breach of the site's rules, and also their terms of engagement with researchers. Second dataset very much collected with an agenda, selected boards, focused on political ideas and language, with a clearly ideologically/politically biased agenda in its scope and definition, and the implicit idea that posters and the site are harboring unacceptable views - seems to be much more to discuss in terms of researcher ethics, commercial harm and legal consequences.

I am waiting with interest for Mumsnet's updates on the second dataset issues

Ormally · 09/05/2024 08:57

Talulahalula · 08/05/2024 22:31

My point was not that posters could not be identified in other ways, it was that Aston are being disingenuous if they say their intention was not to identify posters IRL. Firstly because it is possible to identify someone from a post and secondly because their research is about author attribution. So while they may not have intended to identify MN posters, it is theoretically possible to identify someone from the quotes they have used and secondly, the direction of travel of their research is to identify people from online posts.

Also, I might be wrong, but I think even the Daily Mail does not lift threads about adoption and infertility. And finally, Aston University, like all universities, is supposed to pay attention to research ethics, not ride roughshod over it.

And then, we have the point that their 11.3m grant is part of a project funded by the US govt intelligence service. And the dataset was offered for use by other researchers (now taken down).

Not that I am condoning lazy journalism, but I think there are quite big differences in what has gone on here and what the tabloids do.


Very much agree.
And the tabloids aren't using a data scrape; and most are open to comment, a complaints mechanism, and a regulator if their articles step over too many lines.

Astontacious · 09/05/2024 08:58

Winnading · 09/05/2024 05:59

It's currently in vogue to find transphobia anywhere. Even better if it's women committing such.
And I guess by us all complaining, it 'proves' the theory. The old 'if you're not doing anything wrong, you have nothing to fear'.

But this has gone beyond the ducking stool to whether the witch drowns or floats. The women don’t even need to drown to prove their innocence. The ‘witches’ have already been condemned by the researcher as committing hate crimes. The PhD is examining how these have altered over time.

edit: it may not have been a ducking stool, maybe they were just thrown in?

Ormally · 09/05/2024 09:14

...and no way for posters to prove who they are to request deletions of data. I could email and say 'hey, I'm langcleg, delete everything I've written' but there's no way to prove that either way.

In the circumstances, that's ironic. The use of language (and whatever pseudonyms are used would be very much a part of this analysis, as well as other factors, I think) is apparently such that it will give PhD research-worthy indications of posts being written by the same source over time. But contacting the university on this, you could claim to be whoever you like, so YOUR request could probably be discredited due to not having enough information to merit THEIR corpus posts being deleted.

However, they are confident they can 'prove' similarities in style, grammar, times of your activity using MN on different threads, local information, specific themes, age, hell - probably socio economics if you like, very easy - if you are writing on debt or renting or childcare credit or IVF or private schooling. There's grammar and systems knowledge as a small part, but seeing the actual meaning and content as fair game is the magic ingredient that makes the research tick, and no point pretending otherwise.

AlisonDonut · 09/05/2024 09:27

I suspect it is because the level of sign off is lower and they cannot be seen to sack a TRA for this. But I am a suspicious old witch.

RethinkingLife · 09/05/2024 10:03

I would like to be a WhatsApp fly on the wall for the academic groups where the removal of the corpus is discussed.

I should think it will be a major topic of discussion at the upcoming regional conference event.

I'd especially like to know what the US funders make of it if access to the corpus is a contingent part of the award.

RethinkingLife · 09/05/2024 10:14

AstonsDataThief · 08/05/2024 21:16

But that risk already exists by posting on here. Most people know already that tabloids swipe threads from here, which presents a far greater risk of being identified.

Tabloids might but there is another important distinction here. Tabloids are the public. Aston University are part of The State. They are a public authority. We might not think of them as being part of the apparatus of the state but that is what they are. They are part of the same organisation that runs the police, tax and benefits system, education etc.

I recommend this book for a perspective on Aston and its work amongst other matters: Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed

Scott shows how central governments attempt to force legibility on their subjects, and fail to see complex, valuable forms of local social order and knowledge. A main theme of this book, illustrated by his historic examples, is that states operate systems of power toward 'legibility' in order to see their subjects correctly in a top-down, modernist, model that is flawed, problematic, and often ends poorly for subjects. The goal of local legibility by the state is transparency from the top down, from the top of the tower or the center/seat of the government, so the state can effectively operate upon their subjects.

https://www.amazon.co.uk/dp/0300246757/?


AstonVillains · 09/05/2024 10:14

AlisonDonut · 09/05/2024 09:27

I suspect it is because the level of sign off is lower and they cannot be seen to sack a TRA for this. But I am a suspicious old witch.

But they don't need to sack the researcher, just tell her to change the subject of her PhD, surely?

Dumbledoreslemonsherbets · 09/05/2024 10:21

everythingthelighttouches · 08/05/2024 19:59

I think Aston has breached GDPR.

I want to email their DPO, Samantha Burns, to ask the following:

“I would like to make a data subject access request to know what is the lawful basis Aston is using for processing my data, scraped from Mumsnet, under article 6 of the UKGDPR.

I would like to know the conditions for processing my special category data that Aston is using, including my philosophical beliefs, scraped from Mumsnet, under article 9 of the UKGDPR.

I would like to know if a DPIA was carried out by Aston University in their data scraping activities of Mumsnet.”

I don’t believe that Aston has the appropriate lawful basis, conditions for processing or that they have conducted a DPIA. With this information, I would go to the ICO.

I cannot write to the DPO at Aston myself, as I may interact with them in a professional capacity.

If anyone is contacting the DPO at Aston, please can they ask these questions as written here?

many thanks

@everythingthelighttouches I'll use your questions when I write, but I will write from a disposable email address.

Can you not do the same? Write to them from a disposable / temporary email?

I agree it's absolutely clear that they've breached GDPR. The problem is the GDPR breach affects users rather than MN as a business directly. MNHQ are more interested in copyright. So we do need to seek redress on this, which carries a risk of outing (I accept even using a disposable email carries some risk for all but the most internet / computer savvy among us).

It's tricky because they've already shown they can't be trusted and have the tools to doxx / identify users from anonymous posts. They've also shown a total disregard for MN users - despite a good number of women here expressing lack of consent quite explicitly and upset and concern about the facts and use of Aston's illegal data scrape, they're still identifying as self-righteous and seem to very much have the attitude of stuff our feelings and rights under the law. Really the legislation needs to evolve to deal with this issue as it's discriminating against access to justice for the most vulnerable. Catch 22.

And that's before we even get to the PhD which summarily slurs MN users at the starting point as 'transphobic' (before any analysis of philosophical beliefs, which I note are protected in law).

I would be in favour of a law which enables users who are fearful of the response of the DPO they're supposed to contact, based on past actions (which would apply here with Aston, who have behaved reprehensibly) to be able to contact ICO as a first port of call, as a more independent body.

Dumbledoreslemonsherbets · 09/05/2024 10:25

Talulahalula · 08/05/2024 22:31

My point was not that posters could not be identified in other ways, it was that Aston are being disingenuous if they say their intention was not to identify posters IRL. Firstly because it is possible to identify someone from a post and secondly because their research is about author attribution. So while they may not have intended to identify MN posters, it is theoretically possible to identify someone from the quotes they have used and secondly, the direction of travel of their research is to identify people from online posts.

Also, I might be wrong, but I think even the Daily Mail does not lift threads about adoption and infertility. And finally, Aston University, like all universities, is supposed to pay attention to research ethics, not ride roughshod over it.

And then, we have the point that their 11.3m grant is part of a project funded by the US govt intelligence service. And the dataset was offered for use by other researchers (now taken down).

Not that I am condoning lazy journalism, but I think there are quite big differences in what has gone on here and what the tabloids do.


Great post - agree 100%.

Also, it's well known on MN by most users that the DM and others lift threads (usually of the 'is my sister being a bridezilla?' variety) and if something is lifted they often then ask MN to delete it but none of the users know about this use, which is much, much more sinister.

MinorDisaster · 09/05/2024 11:38

I'm not sure that this has been mentioned before, apologies if I missed it, but the 'P Murray' who sent the FoI request to Aston about the 'understanding the black box' conference talk, later submitted two others on 3/5/24.

To Manchester University about 'Ethics and data protection for the acquisition of the source material for the "Tracking the structure and sentiment of vaccination discussions on Mumsnet" journal article',
and
to Newcastle University about 'Ethics and data protection for the acquisition of the source material for the "Scoping the Priorities and Concerns of Parents: Infodemiology Study of Posts on Mumsnet and Reddit" journal article'.

Thanks again to P Murray.

https://www.whatdotheyknow.com/user/p_murray

Dumbledoreslemonsherbets · 09/05/2024 19:57

P Murray is awesome. That is all.

Talulahalula · 09/05/2024 22:20

Dumbledoreslemonsherbets · 09/05/2024 19:57

P Murray is awesome. That is all.

Agreed.

GrannyAchingsShepherdsHut · 10/05/2024 09:28

P Murray is definitely awesome.

The issue I have is that I have no problem with my posts remaining on Mumsnet. I posted because I wanted to, I consented to my data being used in that way, I've no desire to request deletions.

I did not consent to Aston or any other university using my data for their research, regardless of where it sits on the scale of earth-shattering importance to dubious twatwankery.

I could send a screenshot of my username history from my account, that would presumably be enough evidence that I am me. But as said previously by many, that completely defeats the object. (and I'm not convinced they wouldn't use it to hone their stalker programming)

I don't only want my data removed from the corpus, I want any reference to any of my posts, or data extrapolated or aggregated from my posts removed from any research papers, thesis, presentations, slide decks, internal emails, the whole fucking lot.

I also want to know who else they've given access to the corpus so I can request the same from them.

Deleting my posts from MN would achieve none of that.

KellieJaysLapdog · 10/05/2024 09:49

Completely agree Granny. I consented to MN using my user-generated content and I understood the risk with tabloids lifting a single thread as a slow-news-day item.

I did not consent to being caught up in a massive data scrape and for my content to be held indefinitely outside of MN's own servers. Aston (and other universities involved) can COCK OFF.

KellieJaysLapdog · 10/05/2024 09:50

Also, yes to P Murray (who crafts the finest FOIs I have ever laid eyes on).

Encyclopediaofnonsense · 10/05/2024 14:34

They should be working on the premise of "no response is no consent" and given they've had no response from any user here they have no consent from any user here.

similarminimer · 10/05/2024 17:37

Received this today - so they do not think they need to check under my user names

The datasets held by Aston Institute of Forensic Linguistics do not link usernames to identifiable individuals. We are therefore only able to search for personal information which uses your official name, as this can be linked to you as an individual (“personally identifiable information”).

In order to determine whether we hold such information, we need to confirm your identity by requesting a copy of one piece of official photo ID. This helps us to protect your personal data and process it appropriately. Once we receive your ID, we will use it for the purposes of processing your request and will then securely delete it.
