
Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Ereshkigalangcleg · 24/04/2024 10:05

If they can identify your unique linguistic features as a user (it's not certain that they can), it's not going to be difficult to find other instances within the same dataset.

Astontacious · 24/04/2024 10:06

AgathaAllAlong · 24/04/2024 09:23

But we did all write our comments on a public forum

We didn't consent to have them scraped for data though, and we posted on a forum that specifically has T&Cs against this. Also, their data goes as far back as 2008, years before anyone knew that this technology would be possible.

Many of us take a risk posting. We've posted details of our children's SEN, of our DV situations, of personal family problems, health conditions, adoption and fertility issues, all sensitive topics. Risks include being identified IRL, alerting abusive partners, or having newspapers repost our stories. But for many, MN was the only outlet for these issues. The choice was drown alone or post here. In weighing up the risk many people change details like children's ages and length of relationship to try and not be recognised. And now there is technology being applied to our posts that deliberately aims to circumvent these precautions by identifying posts across usernames. Whether or not they publish these particular findings isn't relevant; the point is that we didn't - and indeed couldn't have - consented to this in posting anonymously here.

I would also highlight it is very disconcerting that the supervisor for this PhD changed her ‘Votes for Women’ to ‘Trans Rights are Human Rights’ on her Twitter and presumably has Ok’ed the title for this PhD of a researcher she appears to know well.

Taken from her website, the supervisor is:

‘registered on the National Crime Agency’s Specialist Operations Expert Advisers Database. Dr MacLeod has provided expert forensic linguistic reports for defence teams and to several police forces, including the Serious Organised Crime Agency, as well as to the Independent Police Complaints Commission. She has been instructed by solicitors and private clients in criminal and civil cases and has appeared as an expert witness in the Crown Court of England & Wales and both the Sheriff Court and High Court of the Judiciary in Scotland.’

‘Versus’

Me - I am a mum with a disabled child, fighting for single sex spaces to keep her safe and care dignified in hospital and other spaces. I don’t want my writing style to identify me. Will I be put on a database as a ‘bingo’? Are single sex spaces transphobic? Is wanting same sex personal care transphobic? I don’t think so but maybe they are buzz words/ phrases? I don’t feel confident that I won’t be accused of something I haven’t done. I don’t have the right of reply. And this involves my words about my children.

I don’t need the extra stress Aston are putting on me.

AgathaAllAlong · 24/04/2024 10:14

@everythingthelighttouches

From what I understand, the main purpose for compiling the dataset is to use it as a training ground for their AI tool (a "sandbox"). The tool is for the purposes of forensic identification online. Reading between the lines, I think that the ultimate aim is to create a tool with which they can make progress on identifying criminals by matching their writing style to their online anonymous posts. But if they are training and testing this tool on our data, then part of what they are doing is training the tool to identify and match posts from the same user. If successful this would work across name changes.

Additionally, they are allowing people to apply to use the data for their own purposes (like the paper I linked to upthread). There is no telling what researchers might apply to use it for (this PhD thesis is just one example), and we obviously have no control over what uses get approval.

AmaryllisNightAndDay · 24/04/2024 10:18

I've been thinking about this and wondering... not just about this project but all the issues it raises for the long term. I don't know what the outcome will be about this specific PhD project (likely to be re-planned), and about this specific dataset (access could be restricted, and/or parts removed, or all deleted... but I don't know)

Data hangs around for a long, long time. If a dataset was scraped in 2008 (say) then the children being discussed are becoming adults now. What are their rights?

What I am thinking is... in my university department one of the questions on the student project ethics form is whether you need permission from an outside organisation to use a specific dataset for your research. If this dataset survives, or if in future research data is gathered from MumsNet (in a more ethical way) and stored offsite - then one condition for any research project that wants to use that data could be that MumsNet have to approve each use, to see the proposal and agree the ethics.

And maybe MumsNet needs some ethical representation from its own contributors. Not only relying on MNHQ itself but also an ethics group who view any proposals relating to MumsNet data, maybe recruited partly from women who are contributors, and who are answerable to everyone who posts to MumsNet (so that the ethics group would not itself be susceptible to ideological capture)

Thoughts?

Ereshkigalangcleg · 24/04/2024 10:18

What Agatha said.

RedToothBrush · 24/04/2024 10:19

AgathaAllAlong · 24/04/2024 09:57

This is excellent research, thanks. Very interesting re: the US - so I wonder if THEY can use this data in ways that are not approved in the UK, or whether the laws of the country in which it was collected apply in any case.

If the US government are using reputable institutions to make a point using data stolen under UK law, those institutions cease to be reputable. Key point: The US government wants that level of credibility. They could find out information through more nefarious channels if they wanted to. The thing they want is to demonstrate a point using the reputations of established reputable institutions.

Thus a) these institutions need to be named and shamed, b) it needs to be made explicitly clear that this is unethical and goes against the ICO's statement on data scraping, and c) it needs to be made clear that acting like this damages the reputation of these institutions.

This is not ok.

I will keep saying this. This is against human rights. The US government has plenty of form for not respecting human rights. We have the ECHR to try and protect us from similar.

AmaryllisNightAndDay · 24/04/2024 10:20

Another condition that an organisation can impose is that data gathered for a specific project is deleted afterwards. Or that it is kept for a specific time period, and then deleted.

Ereshkigalangcleg · 24/04/2024 10:22

Research exemption is an exception to that, @AmaryllisNightAndDay

Ereshkigalangcleg · 24/04/2024 10:22

But there are conditions to being able to use it.

everythingthelighttouches · 24/04/2024 10:26

Thanks Agatha that is a really helpful explanation of what the research might be able to do.

I suspect that’s probably not how far it will get under a PhD studentship, however I am not an expert in AI so who knows?

The mere possibility seems to me to be enough for Mumsnet and Aston to report to the ICO.

JustineMumsnet · 24/04/2024 10:30

Further update: I spoke to the Vice Chancellor this morning. He's promised that he and his team will take time to thoroughly answer our questions, which I'm sending over now. A couple of things he wanted to stress - Aston believe they have legitimate rights to use the data and there is/has been no intention to identify individual posters from their posts. He also accepted that the recent research by a first-year PhD student into "transphobia" may not be of the quality they expect and that he will investigate and commit to enhancements in quality if appropriate. Obviously there's lots more detail we need from them - will update here as and when we hear back.

VitoCorleoneOfMNMafia · 24/04/2024 10:32

Many of us change names to segment aspects of our lives or to make doxxing harder over time.

The software and techniques developed by these researchers, Piotr and Krzysztof, are intended to demonstrate whether two different usernames on different forums are the same person by examining how words are used. It's a tiny step of logic to decide to use them to see if two different usernames on the same forum are the same person.

VitoCorleoneOfMNMafia · 24/04/2024 10:33

Aston believe they have legitimate rights to use the data

I'd love to know what makes him think that.

AnotherAngryAcademic · 24/04/2024 10:33

VitoCorleoneOfMNMafia · 24/04/2024 10:33

Aston believe they have legitimate rights to use the data

I'd love to know what makes him think that.

Quite.

AgathaAllAlong · 24/04/2024 10:33

@everythingthelighttouches the database and AI tool are a project by researchers at the university. They have talked about it at a conference and another poster found a published article referencing that Aston have this database. RedToothBrush found that the research is US funded. So this is a big ongoing project, wider than the PhD project.

The PhD was using it for a different reason (to study "transphobic language" on MN across time). They applied to use it for this purpose and the university granted access.

Purpel · 24/04/2024 10:34

Aston believe they have legitimate rights to use the data
What did he/your lawyers say to him about the first paragraph in the terms and conditions, which says no scraping and that content must not be reproduced?

IDoNotConsentToAstonResearch · 24/04/2024 10:35

Thank you Justine.

How can they possibly believe they have a right to use the data when it was scraped against Mumsnet’s T&Cs?

Ereshkigalangcleg · 24/04/2024 10:35

Thanks @JustineMumsnet

KellieJaysLapdog · 24/04/2024 10:35

Absolute arseholes! There must be lots of legal challenges to Aston’s claims (an academic in their own legal department is arguing for the right to retain data privacy even AFTER DEATH)

Edited to add a reference: https://research.aston.ac.uk/en/publications/protecting-post-mortem-privacy-reconsidering-the-privacy-interest

VitoCorleoneOfMNMafia · 24/04/2024 10:36

AgathaAllAlong · 24/04/2024 09:55

I actually think that academic institutions telling staff what views they can express in public sets an extremely dangerous precedent.

With apologies for obnoxious bullet points but I think that these areas would be negatively affected by a neutral stance policy for staff:

  • Academic freedom: academics should be able to freely express their views
  • My employer doesn't own me. My own time and own profiles are my own. When I'm in class or representing the university then I abide by their official stance but not off the clock
  • Relatedly, academics aren't spokespeople for their institution. The deal is more that they are employed to teach and conduct research projects. But the findings and dissemination of these aren't controlled
  • Many academics use twitter to criticise academia or their own universities. We wouldn't know about half the bullshit that goes on without their freedom to express their own views.
  • This would hurt GC academics more than anyone. It's powerful that there's protection from being fired over GC views expressed privately
  • Actually knowing whether a potential supervisor has views like this is really really useful. Imagine getting a funded PhD on some feminist topic only to then find out that everyone in the institution you've chosen is anti GC feminism

You've just changed my view on this with your last three paragraphs. Thank you for making me rethink.

ArabellaScott · 24/04/2024 10:36

Eden's PhD has highlighted important considerations. Scraping data is in itself unethical and possibly illegal according to the ICO and MN's T&Cs.

Allowing access to that data for a completely different purpose than the ostensible initial reason for scraping it is another breach.

Using it to criminalise women discussing their protected beliefs.

Using it to identify women using sensitive personal data.

Selling it? Sharing it?

Etc

RedToothBrush · 24/04/2024 10:37

JustineMumsnet · 24/04/2024 10:30

Further update: I spoke to the Vice Chancellor this morning. He's promised that he and his team will take time to thoroughly answer our questions, which I'm sending over now. A couple of things he wanted to stress - Aston believe they have legitimate rights to use the data and there is/has been no intention to identify individual posters from their posts. He also accepted that the recent research by a first-year PhD student into "transphobia" may not be of the quality they expect and that he will investigate and commit to enhancements in quality if appropriate. Obviously there's lots more detail we need from them - will update here as and when we hear back.

Given the concerns and the lack of clarity, why are the ICO not being consulted as to whether this type of data scraping is OK without explicit consent?

MN users ultimately want to know this. This is in the public interest to find out what the ICO's position is on this and whether Aston have indeed got legitimate rights here.

Either MN or Aston need to do this to set this straight.

It is not satisfactory just to take Aston's position on this. We need independent verification on this, from the experts. This is the ICO's job. That's what it is there for.

Why isn't this being done?

Boiledbeetle · 24/04/2024 10:37

That is a fairly bollocks non response from him isn't it!

Whinge · 24/04/2024 10:37

VitoCorleoneOfMNMafia · 24/04/2024 10:33

Aston believe they have legitimate rights to use the data

I'd love to know what makes him think that.

Another who would like to know the answer to this.

I also want to know if MNHQ have reported the data scraping?

ArabellaScott · 24/04/2024 10:38

They don't get to scapegoat the student, btw. She's just been the canary.

This thread is not accepting new messages.