Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Ferretaria · 02/05/2024 08:26

I hope lawyers are on the case.

MyLadyDisdainlsYetLiving · 02/05/2024 13:03

Ferretaria · 02/05/2024 08:26

I hope lawyers are on the case.

Justine has given an update in the latest thread in FWR.

https://www.mumsnet.com/talk/womens_rights/5066453-thread-2-a-corpus-assisted-discourse-analysis-of-linguistic-transphobia-on-mumsnet?reply=134966555

JustineMumsnet · 02/05/2024 14:30

Apols I should have posted this here too. Here's my update on where we're at with Aston...

Thanks all, for your patience. As many of you have surmised we’ve been in some back and forth with Aston Uni to try to establish exactly what’s gone on: what MN data they hold, how they obtained it, for what purpose, how it’s stored etc.

Whilst we have a number of concerns, it’s worth saying upfront that Aston is adamant that the dataset has never been used to attempt to identify individual posters and that is not the purpose of their research; and we have no reason to disbelieve them on that front.

That said, we do believe that Aston has behaved unethically and unlawfully by scraping our website without seeking prior permission and in breach of our copyright, to obtain two datasets for forensic linguistic research purposes. The first set was obtained in 2019 and entailed the scraping of a large volume of posts. The second (for the PhD student’s research) was obtained in January 2024 and involves a much smaller number of posts.

Aston University’s Intellectual Property Policy, as listed on the university website says:

“It is an infringement of copyright in a work to copy the whole or a substantial part of the work, whether manually, electronically or otherwise, without authorisation or licence from the copyright owner. Substantial is measured by reference to quality and therefore copying of even a small part could be an infringement. This is the case even if the work is freely accessible and available in the public domain, including online, unless a licence is clearly provided and the proposed activities fall with [sic] the scope of the licence.”

Mumsnet’s terms of use (though updated in Nov 23 to specifically prohibit scraping) have always stated clearly that “the web site and its contents are copyright Mumsnet, all rights reserved”

Mumsnet is a rich source of information and almost unique in being a large female-dominated online discussion forum - so we do appreciate why it’s very attractive to researchers. However, we are extremely careful about what research we will allow and who we permit to do it. We are more likely to consent to projects that elevate women’s voices, or research that could be used to design products, services or policies to make parents’ lives easier. Examples of research projects we’ve consented to in the past are:

Developing models to assist the diagnosis of mental health conditions, such as postnatal depression, and the identification of individuals at risk. [Turing Institute]

Quantifying the real impact of the reality of lived experiences of women in the UK with regards to violence against women and girls. [Bolton University]

I very much doubt we would have agreed to the creation of a sandbox for forensic linguistics without assurances about how the data was being used and for what purpose. We certainly wouldn’t have agreed to the holding of the dataset for further research projects to be decided upon at the discretion of the University.

We also have concerns about Aston’s ethical approval process (which we have raised with the Vice Chancellor). Specifically, we do not believe that appropriate consideration has been given to the potential risk of harm caused by the PhD student’s research to the Mumsnet website, its reputation and its community.

Consequently we have asked Aston University to immediately cease and desist all scraping activities, to immediately destroy both datasets and to provide a written assurance that they will refrain from any further unauthorised access or use of our content.

We will of course update you as and when we can.

AstonsDataThief · 02/05/2024 14:55

Thank you for your efforts and the update.

Aston University must also retract any research papers or posters following on from the unlawful extraction of these datasets.

SqueakyDinosaur · 02/05/2024 14:58

I would imagine that the legal and data protection teams of Aston are feeling a bit brown-trousered right now.

AstonCanKissMyArse · 02/05/2024 15:02

Thanks for the update and the work behind the scenes.

Can you clarify whether the data breach has been reported to the ICO? It's something a lot of us are concerned about.

The ICO exists to deal with these kinds of issues and should be properly involved.

PerkingFaintly · 02/05/2024 15:08

Whilst we have a number of concerns, it’s worth saying upfront that Aston is adamant that the dataset has never been used to attempt to identify individual posters and that is not the purpose of their research; and we have no reason to disbelieve them on that front.

I'm sure you're ahead of me on this, but it might be good to be a little wary of this statement.

The obvious failing is that an Aston user might have attempted to directly identify individual MN posters and Aston not know about it.

However there is another, more subtle, possible failing.

If we look at what happened with the Facebook data-scrape, Cambridge Analytica didn't really care about identifying individuals. What they used the data for was to identify generic posting profiles which could then be used to sort Facebook (and other) users into "universes" of people who would be likely to behave in the same ways. Then they could herd all the people in that universe in the same way – without any need to get down to the individual level.

So with the Aston language analysis, in addition to the risk of being identified as an individual across platforms (or, horror, being wrongly identified because of similar language-use), there is also a risk of this database being used to profile people as "belonging to a group which is prone to behaviour X". Such classifying is not a benign thing to do!
https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election

Two more points. I remember Alexander Nix, in a TV interview, saying that Cambridge Analytica no longer held the original data – and that might well be true. But they still held the OUTPUTS from processing the data, which is the bit they cared about and the bit which was dangerous.

The other point, again from memory so I could be wrong, but I seem to remember Kogan who did the original research on the data not having intended the harmful use to which it was put, and being horrified at what he'd created.

The parable of Frankenstein follows us down the years...

AmaryllisNightAndDay · 02/05/2024 15:09

Thank you @JustineMumsnet for dealing with this and for the update, which is as I'd hoped (well, in an ideal world Aston would have deleted the datasets by now, but I understand there has to be a process and it won't be instant).

DeanElderberry · 02/05/2024 16:01

I still wonder to what extent these student analysts and critics of Mumsnet are creating their own threads and posts that they can then use to prove the site is guilty of something heinous. I'd like to know whether there was a flurry of problematic posts at the time the PhD candidate collected their data, and also whether the recent flurry, at the end of the Easter holidays, was connected to a specific university assignment.

I think Mumsnet need to be more alert and proactive on that score. The perfectly proper discouragement of troll hunting by users is good, but it seems to me that MN needs to match that with a higher level of vigilance against trolling by linguistics students and others seeking to provide themselves with essay, dissertation, and thesis fodder.

RethinkingLife · 02/05/2024 16:06

I still wonder to what extent these student analysts and critics of Mumsnet are creating their own threads and posts that they can then use to prove the site is guilty of something heinous.

Bunbury highlighted that as a risk (not specific to students).

DeanElderberry · 02/05/2024 16:07

Particularly at weekends, over bank holidays, and during University vacations. I used to just think it was 'the devil finding work for idle hands' but looking at all those eager linguistics students who need big collections of words about specific subjects to work on makes me wonder what short cuts they might take.

IDoNotConsentToAstonResearch · 02/05/2024 16:11

DeanElderberry · 02/05/2024 16:07

Particularly at weekends, over bank holidays, and during University vacations. I used to just think it was 'the devil finding work for idle hands' but looking at all those eager linguistics students who need big collections of words about specific subjects to work on makes me wonder what short cuts they might take.

Given the sloppiness over ethics Aston has shown, I would certainly not rule that out.

DeanElderberry · 02/05/2024 16:46

It would be nice if it was confined to Aston, but we can't be sure of that.

AstonUniDataScraperWankers · 02/05/2024 17:00

Can we av one more remedy pls? I'd like entire aston forensic linguini dept put in the stocks on village green. And then we can moov on.

BIWI · 02/05/2024 17:04

@DeanElderberry

"I think Mumsnet need to be more alert and proactive on that score. The perfectly proper discouragement of troll hunting by users is good, but it seems to me that MN needs to match that with a higher level of vigilance against trolling by linguistics students and others seeking to provide themselves with essay, dissertation, and thesis fodder"

But how would they do this? I think that's the whole difficulty with this issue, that no-one really knows how scraping is done, and how said students might be trolling. I suspect that linguistics students are going to be far better at trolling than our usual piss/poo/period (etc) trolls are! It would mean a level of moderation that I doubt Mumsnet could possibly countenance.

AmaryllisNightAndDay · 02/05/2024 17:24

BIWI · 02/05/2024 17:04

@DeanElderberry

"I think Mumsnet need to be more alert and proactive on that score. The perfectly proper discouragement of troll hunting by users is good, but it seems to me that MN needs to match that with a higher level of vigilance against trolling by linguistics students and others seeking to provide themselves with essay, dissertation, and thesis fodder"

But how would they do this? I think that's the whole difficulty with this issue, that no-one really knows how scraping is done, and how said students might be trolling. I suspect that linguistics students are going to be far better at trolling than our usual piss/poo/period (etc) trolls are! It would mean a level of moderation that I doubt Mumsnet could possibly countenance.

@AstonsDataThief has pointed out that publications based on data that was not gathered ethically may have to be retracted and that hits where it hurts. Academic careers are based on publications so anyone relying on an illicit data scrape is vulnerable. And if they were found to have acted as an agent provocateur after their papers were published that could be their career right down the tubes.

PowerTulle · 02/05/2024 17:40

@JustineMumsnet Surely both the data and any and all of the work based on that data produced by the university (or students) must be destroyed too?

As Perking says upthread, it’s the subsequent work using the data that is potentially harmful to us and MN. And very valuable to Aston.

PerkingFaintly · 02/05/2024 17:42

AstonUniDataScraperWankers · 02/05/2024 17:00

Can we av one more remedy pls? I'd like entire aston forensic linguini dept put in the stocks on village green. And then we can moov on.

😁 😁 😁

ArsetonUniversity · 02/05/2024 20:19

Thanks for the update.

I'm shocked that, contrary to what I had believed and to my defence of the student, the student actually did do more scraping specifically for their research. How the fuck did that get approved?? Madness.

AstonToTheNaughtyStep · 02/05/2024 20:30

ArsetonUniversity · 02/05/2024 20:19

Thanks for the update.

I'm shocked that, contrary to what I had believed and to my defence of the student, the student actually did do more scraping specifically for their research. How the fuck did that get approved?? Madness.

I'm currently feeling and thinking exactly the same.

ArsetonUniversity · 02/05/2024 20:36

And I'd also missed how "A corpus-assisted discourse analysis of linguistic transphobia on Mumsnet, 2008-2023" is actually their PhD thesis title, I'd for some reason mistakenly believed that it was just a talk or a chapter, rather than the whole damn project. Again, how did a project that relies ENTIRELY on illicitly acquired data get approved??

SpicyMoth · 02/05/2024 20:39

AstonUniDataScraperWankers · 02/05/2024 17:00

Can we av one more remedy pls? I'd like entire aston forensic linguini dept put in the stocks on village green. And then we can moov on.

"I'd like entire aston forensic linguini dept put in the stocks on village green."

Ik this is 100% not what you want, but all this made me think about is a street party-esque type get together at the local village green featuring a lovely true-crime documentary projected up on the back of some beach huts and stock-rich creamy linguine pasta for all to tuck in on!

Can you tell I'm hungry? 😩😂

RethinkingLife · 02/05/2024 20:43

ArsetonUniversity · 02/05/2024 20:36

And I'd also missed how "A corpus-assisted discourse analysis of linguistic transphobia on Mumsnet, 2008-2023" is actually their PhD thesis title, I'd for some reason mistakenly believed that it was just a talk or a chapter, rather than the whole damn project. Again, how did a project that relies ENTIRELY on illicitly acquired data get approved??

And the (now removed) LinkedIn account for Eden Palmer referred to hate crimes and transphobic rhetoric on MN.

Couple of versions but I think the account is now unavailable.

Astontacious · 02/05/2024 20:57

Have to say it was the blatant ‘hate crimes’ wording that I saw, then the supervisor’s banner change from Votes for Women to Trans rights are Human rights. The 2 together were breathtaking.

This thread is not accepting new messages.