Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

Dumbledoreslemonsherbets · 25/04/2024 23:03

JustineMumsnet · 24/04/2024 10:30

Further update: I spoke to the Vice Chancellor this am. He's promised that he and his team will take time to thoroughly answer our questions which I'm sending over now. A couple of things he wanted to stress - Aston believe they have legitimate rights to use the data and there is/has been no intention to identify individual posters from their posts. He also accepted that the recent research by a first-year PhD student into "transphobia" may not be of the quality they expect and that he will investigate and commit to enhancements in quality if appropriate. Obviously there's lots more detail we need from them - will update here as and when we hear back.

So Aston did not get permission from MN to use the data (breaching copyright) and also seem to have completely ignored the privacy policy MN has in its terms of use - they have clearly not abided by this policy.

I don't really understand why MN is waiting for Aston to answer questions. Surely they should be compelled to answer questions via a lawyer? Surely this data breach should be reported to ICO by Mumsnet - has it been?

It would also be good to have clear reassurance that the data Aston have illegally scraped is now inaccessible to all users / not currently being examined or used in any way. Several users have clearly stated they do not consent to the new use of their data in this way.

The PhD which first alerted us to it all is the least of the issues.

DrSoupDragonsFriend · 25/04/2024 23:07

ArsetonUniversity · 25/04/2024 22:45

And let's not forget the 30 days only chat...

@SoupDragonsFriend what happened, did you lose your new title for using illegally scraped data in your Bagpuss references by any chance?

I don't know what the 30 days chat is/was. I'm a relative newbie and am trying and failing to limit my MN time so haven't gone exploring very far.

(Whoops about the name. Nothing to do with Bagpuss. It was my mistake!)

Encyclopediaofnonsense · 25/04/2024 23:10

DrSoupDragonsFriend · 25/04/2024 23:07

I don't know what the 30 days chat is/was. I'm a relative newbie and am trying and failing to limit my MN time so haven't gone exploring very far.

(Whoops about the name. Nothing to do with Bagpuss. It was my mistake!)

There are a few parts of Mumsnet that aren't accessible by Google (OTBT) or disappear within a set time frame (30 and 90 days). If Aston have scraped the whole site they have presumably done these parts too.

@JustineMumsnet @LilyMumsnet can either of you confirm if Mumsnet local has been scraped too? If it has what would the implications be for posters over there?

DrBlackbird · 25/04/2024 23:17

They think MN[HQ] are going to move out of their way & let them carry on with their dodgy antics, because they’re used to just barrelling along as they wish. Of course female humans should move out of their Very Important way, even if they’re pushing a pram/in a wheelchair/on crutches/with their guide dog/a nonagenarian pootling along with a zimmer frame/a toddler weebling unsteadily along on reins/a teenager with a school bag that looks as if they’re off to do a D of E exploration, their PE kit, & a violin; while the Very Important Male strides along encumbered only by their sense of entitlement

@NitroNine they very much do think they’re going to carry on. With the wimmin scootling along out of their way. They’ve got a massive corpus that sounds integral to their playing and their sandbox and their £11m grant. Methinks they’ll just ‘wait it out’ expecting this to blow over as the wimmin go back to looking after the toddlers/tweens/teens/pets/PiLs etc.

RedToothBrush · 25/04/2024 23:20

AstonVillains · 25/04/2024 22:20

I'm not sure if it's been mentioned already (too many posts for my brain) but the Mishcon de Reya fella is musing on Article 5(1)(a) of GDPR.
https://twitter.com/jonbainesdata/status/1783554331813216419

Finally asked DH about this. His professional opinion is that if you are scraping data against terms and conditions, this is not only against GDPR, it's also against the Computer Misuse Act. That is a criminal offence, not just one which might earn you a large fine.

He says that if MN doesn't put in a complaint to the ICO then we should make complaints to the ICO.

But yes, he's very much of the opinion this is a criminal act under the Computer Misuse Act and the Data Protection Act, which existed before GDPR.

He thinks that MN should be having some very interesting conversations with Aston and if enough users are unhappy they should sue. Otherwise they demonstrate they don't care about protecting their users.

I haven't mentioned this to him until now, and he's less than impressed saying that he doesn't think Aston have a reasonable defence at all.

Ooo goodie.

Notmycircusnotmydonkeys · 25/04/2024 23:27

RethinkingLife · 23/04/2024 08:04

I've just taken a look at the debt thread analysed in the first paper I cited above:

Stanley, L.M. orcid.org/0000-0003-3882-8682, Deville, J. and Montgomerie, J. (2016) Digital Debt Management: The Everyday Life of Austerity. New Formations: A Journal of Culture, Theory, Politics, 87. pp. 64-82. ISSN 0950-2378

There's no MNHQ indication that there's permission for academics to look at it there.

And the OP explicitly reassures posters that the thread is effectively a safe space to share their parlous financial situation and woes because it's anonymous.

I think that's a completely different bit of research from the Aston one. As far as I recall they had set it up as a legit research question thread, or at least a legit thread with permission to generate data from it.

RedToothBrush · 25/04/2024 23:30

Computer Misuse Act 1990, section 1

(1) A person is guilty of an offence if—
(a) he causes a computer to perform any function with intent to secure access to any program or data held in any computer [, or to enable any such access to be secured];
(b) the access he intends to secure [, or to enable to be secured,] is unauthorised; and
(c) he knows at the time when he causes the computer to perform the function that that is the case.
(2) The intent a person has to have to commit an offence under this section need not be directed at—
(a) any particular program or data;
(b) a program or data of any particular kind; or
(c) a program or data held in any particular computer.
[(3) A person guilty of an offence under this section shall be liable—
(a) on summary conviction in England and Wales, to imprisonment for a term not exceeding [the general limit in a magistrates’ court] or to a fine not exceeding the statutory maximum or to both;
(b) on summary conviction in Scotland, to imprisonment for a term not exceeding [12] months or to a fine not exceeding the statutory maximum or to both;
(c) on conviction on indictment, to imprisonment for a term not exceeding two years or to a fine or to both.]

Let's reflect on that in the context of unauthorised mass collection of data using software, in breach of a website's terms of use.

Hmm.

SqueakyDinosaur · 25/04/2024 23:41

Thank you to your DH, @RedToothBrush .

RedToothBrush · 25/04/2024 23:52

His point here is that because MN have explicitly said Aston have acted in a way that is against their T&Cs, there is a problem here.

Data protection aside.

It IS interesting that MN have gone that far already.

Whether MN realise this or not, I have no idea, but it does suggest there is perhaps more of an issue than Aston are willing to admit to.

Certainly he seems to be of the opinion that MN users would be well within their rights and the scope of the issues raised to make formal complaints to the ICO if MN fail to do so.

He does think that MN would be failing users if they didn't take some sort of action.

He did say that MN should be having some very duty conversations with Aston as part of all this though.

Aston would be very unwise to simply shrug their shoulders and try to say 'it's all fine' though. It's really not fine.

That's BEFORE you start talking about ethics which, of course, by definition aren't legally enforceable.

Ultimately, you don't tend to have these kinds of conversations about issues which have no ethical considerations... So yes, I do have 'questions' about how the hell no one raised issues about this before WITHIN Aston. The title of the paper alone is seriously questionable.

RedToothBrush · 25/04/2024 23:56

AnotherAngryAcademic · 25/04/2024 18:12

Audrey Ludwig (a solicitor who contributes to the Legal Feminist blog with Naomi Cunningham et al) has now tweeted about this thread. Her tweet has been picked up by Sarah Phillimore, and someone has tagged Jo Phoenix. Plenty of others also chipping in to comment. It will be interesting to see what they all make of it!

Audrey Ludwig's Twitter thread is here

Oh lookie here.

One of the replies to this thread says the following.

Under the Computer Misuse Act 1990, it is a criminal offence to access a computer program or data without authorisation. This may apply to data scrapers and miners if the website owner prohibits the type of access made by them (eg in the website's terms of use).
@AstonUniversity

It's not just my DH making this exact point.

Hmm.

RethinkingLife · 25/04/2024 23:57

Notmycircusnotmydonkeys · 25/04/2024 23:27

I think that's a completely different bit of research from the Aston one. As far as I recall they had set it up as a legit research question thread, or at least a legit thread with permission to generate data from it.

Your recollection is based on the Stanley paper? I posted the pdf link, is the printed version different?

I looked at the MN thread in the Stanley paper, it has no indication that permission has been granted to any set of academics.

Am I missing your point, or are you missing mine? I was raising questions about any academic use of MN and permission, plus papers that involved MSE.

RedToothBrush · 25/04/2024 23:59

Incidentally, looking this up, it appears the Computer Misuse Act is currently under review, as it is felt to need updating and no longer reflects the present day.

Wanna bet one of the concerns is data scraping?

Mmmnotsure · 26/04/2024 01:18

Audrey Ludwig's tweet now has over 100k views. Lots of engagement.

A frequent comment is that it really is NOT a good idea to piss off the Mumsnet demographic and @MumsnetTowers. Let's hope for all our sakes that that proves to be the case.

DogsAkimbo · 26/04/2024 04:49

A frequent comment is that it really is NOT a good idea to piss off the Mumsnet demographic and @MumsnetTowers. Let's hope for all our sakes that that proves to be the case.

Indeed. Corpus Interruptus.

Whinge · 26/04/2024 08:19

Audrey Ludwig's tweet now has over 100k views. Lots of engagement.

That's great news.

I wonder if MNHQ will update users again this week. A pop-up notification or pinned thread would be helpful, as I suspect there are a lot of people who use MN who have no idea about the situation.

MsGrumpytrousers · 26/04/2024 08:49

Mmmnotsure · 26/04/2024 01:18

Audrey Ludwig's tweet now has over 100k views. Lots of engagement.

A frequent comment is that it really is NOT a good idea to piss off the Mumsnet demographic and @MumsnetTowers. Let's hope for all our sakes that that proves to be the case.

There’s also a suggestion for a technical fix, though I can’t judge how feasible it is:

“I would suggest that mumsnet looks into anti-scraping tech, it's fairly easy, you just have a thread that humans can't see but a scraper can, and you fill that thread full of every DROP TABLE combo you can think of that when scraped will nuke the uni database ;)”

so maybe @JustineMumsnet could get the technical team as well as the lawyers onto it?

Huge apologies if this angle has already been explored.

Ormally · 26/04/2024 09:16

I'm quite interested in the preamble to the debt thread - it was copied into one of the threads on this issue but I can't find it now. It stressed anonymity and the intentions behind the thread (and added a few behavioural expectations for posts).

Who created that? Would it, in this case, have been someone internal to Mumsnet or a general user like most threads?

I'm not quite wishing to rub my hands ghoulishly about it, because it's yet another area where real people meet sensitive data meets trust in a brand, meets the question of whether their writing has any power within the forum framework. However, I'm now also wondering if, as well as the express Ts & Cs for site use, that thread example could add some implied terms too - implied, often further down the line, by conduct and on 'what the parties say and do at different times'. And yet, it's a thread that has been cherry-picked for a research paper.

Whinge · 26/04/2024 09:28

RethinkingLife · 23/04/2024 08:04

I've just taken a look at the debt thread analysed in the first paper I cited above:

Stanley, L.M. orcid.org/0000-0003-3882-8682, Deville, J. and Montgomerie, J. (2016) Digital Debt Management: The Everyday Life of Austerity. New Formations: A Journal of Culture, Theory, Politics, 87. pp. 64-82. ISSN 0950-2378

There's no MNHQ indication that there's permission for academics to look at it there.

And the OP explicitly reassures posters that the thread is effectively a safe space to share their parlous financial situation and woes because it's anonymous.

@Ormally

Do you mean this one?

I've reposted the image as I don't think it shows up in the quoted post.

RethinkingLife · 26/04/2024 09:29

Ormally · 26/04/2024 09:16

I'm quite interested in the preamble to the debt thread - it was copied into one of the threads on this issue but I can't find it now. It stressed anonymity and the intentions behind the thread (and added a few behavioural expectations for posts).

Who created that? Would it, in this case, have been someone internal to Mumsnet or a general user like most threads?

I'm not quite wishing to rub my hands ghoulishly about it, because it's yet another area where real people meet sensitive data meets trust in a brand, meets the question of whether their writing has any power within the forum framework. However, I'm now also wondering if, as well as the express Ts & Cs for site use, that thread example could add some implied terms too - implied, often further down the line, by conduct and on 'what the parties say and do at different times'. And yet, it's a thread that has been cherry-picked for a research paper.

It was written by the OP of the debt thread, a general user. She'd been involved in the previous ones (this was thread 3, a continuation of posters gathering together to discuss this). This might have been back in the days before MN had specific Talk boards, I think.

The paper discusses the OP's style of support on the thread.

There is no indication of MN permission on that thread.

Post with quotation:

https://www.mumsnet.com/talk/sitestuff/5057903-mumsnet-corpus?reply=134741451&

ETA: xd with Whinge


Ormally · 26/04/2024 09:35

Whinge

Thanks, that's the one.
So it looks like it was created by an external, general poster, not someone internal to MN.

"This is an anonymous forum..."
"I will not tell your employer, family or friends..." OK. In that case, the thread needs a strictly limited lifespan, like 30 days only, and posters should take great care not to take at face value any terms that are extraneous to the ones that come from the top.

A bit like "nobody can be proved to be medically qualified on here - seek proper advice if you are in doubt".

Things I knew, but so worth slap-in-the-face reminders from time to time.

Ormally · 26/04/2024 09:40

...this might have been back in the days before MN had specific Talk boards, I think.

And if that's true, then I do think that the time span, and the 2008 to 2020-something coverage of the corpus, has relevance, even if whatever Ts & Cs there were changed in 2018 or whenever, while the earlier material was scrapeable and accessible.

Saisong · 26/04/2024 09:42

Something occurs to me about this vast, but presumably static, version of MN that Aston holds, and the PhD student's investigation of "transphobic hate crime".
We know that there are occasional transphobic posts on MN, possibly sometimes by malicious actors looking for screen grabs. However MN quite rightly delete these, and more, under their talk rules.
The PhD researcher could be analysing these as evidence of our dreadful transphobia, even though those posts no longer exist in the 'real world' of MN.

Ereshkigalangcleg · 26/04/2024 09:56

Yes, as other people have said, there will be text from outside and quotes, as well as deliberate trolling in the way you mention. When Emma Healey doxxed people, one of the people was a pro-trans person making a sarcastic comment, something completely ignored by TRAs and by puff pieces on Mumsnet "transphobia" written by TRA journalists later.

AmaryllisNightAndDay · 26/04/2024 10:36

Saisong · 26/04/2024 09:42

Something occurs to me about this vast, but presumably static, version of MN that Aston holds, and the PhD student's investigation of "transphobic hate crime".
We know that there are occasional transphobic posts on MN, possibly sometimes by malicious actors looking for screen grabs. However MN quite rightly delete these, and more, under their talk rules.
The PhD researcher could be analysing these as evidence of our dreadful transphobia, even though those posts no longer exist in the 'real world' of MN.

There would not be enough deleted posts caught by a scrape to do meaningful research. Though that might not stop everyone!

Boiledbeetle · 26/04/2024 10:56

I have so many questions about deleted posts and threads and how Aston planned to manage this, especially when they are trying to quantify 'transphobia'.

Was every user of the dataset intending to go back to the mumsnet source to ensure that say 'transphobic' comments still stood? Maybe they did their data scrape in the middle of the night and mumsnet didn't delete the posts until the morning. That's happened quite a few times that I've noticed.

Do they keep any tally of the number of posts in a thread when they scraped it versus the number of posts that remain at the source? Do they remove data connected to the now deleted posts from their studies? Or do they keep it, but make the reader aware that posts they are using to make their point have since been deleted? Or do they not even consider this?

How would they quantify deletions? Would they be able to distinguish between deleted and withdrawn posts? Would they understand that we now have an edit button? Would the scraped data include all versions of the edits? If so, would they get counted as separate posts?

Do they make assumptions as to the reason for the deletion about posts that were already deleted by the time of their data scrape?

So many questions!
