Mumsnet Corpus

1000 replies

TokyoBouncyBall · 19/04/2024 11:36

Not a TAAT, but a bit of googling as a result of a now deleted thread has led me to this:

https://fold.aston.ac.uk/handle/123456789/18

I note it says that the License is uncertain. Can you confirm that you have given permission for posts to be used in this way, or is there something that Aston might like to look into?

I note it says Users who wish to access this dataset must make a detailed application to FoLD and the researcher, as well as potentially gain additional agreement from an external organisation before they can be approved for access.

Given one of the uses it is being put to, I think it is a bit dubious to say the least.

logicisall · 06/05/2024 11:56

place marking.

YoucancallmeAI · 06/05/2024 12:47

@JustineMumsnet I know it's not a hack, but has the ICO been contacted? Given the way they've behaved, I really don't trust Aston to destroy the data they scraped (against the site Ts&Cs and probably the law). I think outside oversight is needed.

YoucancallmeAI · 06/05/2024 12:50

Also, it's not enough that they destroy the database - they also need to redact/destroy any research that used it - "fruits of the poison tree" as they say in American crime dramas.
We also need to know if all/parts of the database have been downloaded/stored by other entities.

everythingthelighttouches · 06/05/2024 20:13

ArabellaScott · 03/05/2024 10:24

I've had a chat with someone at the ICO.

Copied and pasted below (with their name redacted and shown as ICO):

[03/05/2024, 10:11:27] Arabella: Hi there. I have a query about data scraping and am hoping you can help.
[03/05/2024, 10:11:46] Arabella: I'm a registered user of Mumsnet, a large social media site that users use pseudonymously.
[03/05/2024, 10:11:59] Arabella: The Terms and Conditions explicitly forbid data scraping without permission.
[03/05/2024, 10:12:35] Arabella: It's recently come to light that several universities have been scraping enormous quantities of data from Mumsnet without asking permission from either the website or users.
[03/05/2024, 10:12:51] Arabella: This is then stored, sometimes 'made available' for research.
[03/05/2024, 10:13:10] Arabella: Many users are concerned that sometimes personal and intimate data has been shared without consent.
[03/05/2024, 10:13:16] [ICO]: It would very much depend on the lawful basis being relied upon to use the data. This can often be consensual during sign-up but we would need to investigate fully to be able to provide a definitive outcome decision.
[03/05/2024, 10:13:37] Arabella: Would this be when a university claims it's for 'research'?
[03/05/2024, 10:13:48] Arabella: How could we ask for an investigation?
[03/05/2024, 10:13:55] [ICO]: Potentially. I would first try raising your concerns with the organisation's Data Protection Officer (DPO) in writing (details often found on their privacy policy at the bottom of their website). If you fail to get a response or an inappropriate response please feel free to raise a complaint with the ICO and we will investigate accordingly.
https://ico.org.uk/make-a-complaint/data-protection-complaints/data-protection-complaints/
[03/05/2024, 10:14:06] [ICO]: In order to investigate, and as an evidence-based regulator, we require service users to raise their concerns with the organisation's Data Protection Officer (DPO) first, and have received a response to their correspondence. If you have already contacted the DPO and they have provided a response (or lack of after one calendar month) you would be in position to raise a complaint with the ICO here:
https://ico.org.uk/make-a-complaint/data-protection-complaints/data-protection-complaints/

If you haven't yet raised your concerns with the organisation's Data Protection Officer we would kindly ask that you do that first (details often found on their privacy policy at the bottom of their website). If you need further guidance, and a number of useful templates to assist with raising information rights complaints, please find additional material here: https://ico.org.uk/for-the-public/

If you are unable to raise your concerns with the DPO please feel free to raise your complaint via the link and we'll investigate accordingly.
https://ico.org.uk/make-a-complaint/data-protection-complaints/data-protection-complaints/
[03/05/2024, 10:14:28] Arabella: Thanks. Do you mean the Mumsnet Data protection officer or that of the Universities involved?
[03/05/2024, 10:14:57] Arabella: Part of the problem is that women have shared information anonymously and often don't want to compromise that by contacting either organisation directly
[03/05/2024, 10:15:05] [ICO]: Both parties. Whomever is processing or securing the data.
[03/05/2024, 10:15:12] Arabella: Thanks. That's really helpful.
[03/05/2024, 10:15:21] [ICO]: If you feel unable to raise your concerns with the DPO please feel free to raise your complaint via the link and we'll investigate accordingly. https://ico.org.uk/make-a-complaint/data-protection-complaints/data-protection-complaints/

I contacted the Mumsnet DPO 12 days ago and still waiting for a response. Any response. I have not even had an acknowledgment of receipt.

I would expect Mumsnet to have already referred itself to the ICO on the basis of risk (special category data, large volume of data and number of data subjects, data scraping).

ArabellaScott · 06/05/2024 20:15

everythingthelighttouches · 06/05/2024 20:13

I contacted the Mumsnet DPO 12 days ago and still waiting for a response. Any response. I have not even had an acknowledgment of receipt.

I would expect Mumsnet to have already referred itself to the ICO on the basis of risk (special category data, large volume of data and number of data subjects, data scraping).

Well, the ICO said a month. But perhaps worth emailing again to check receipt?

JustineMumsnet · 07/05/2024 17:55

Thanks for your patience. I’m pleased to say that Aston has agreed to delete the historic dataset and make no further use of it. (It’s worth saying again that they insist that they haven’t breached any copyright laws because of exemptions that exist for the purposes of research.)

We are still in dialogue with them about the second, much smaller, dataset and the PhD research project and will keep you posted.

When an organisation acquires an individual’s personal data from a publicly-available source they automatically become what’s known as the Data Controller of that data and are responsible for processing that data in compliance with GDPR. If anyone has any particular concerns about personal information Aston may hold then do contact their Data Protection Officer: https://www.aston.ac.uk/about/statutes-ordinances-regulations/publication-scheme/data-protection.

GDPR has strict rules ensuring that personal information provided for the purposes of submitting a Subject Access Request is treated as confidential by the Data Controller in question. Likewise, if anyone has concerns about particular posts on Mumsnet that contain personally identifiable information, please do contact [email protected] and we’d be happy to delete them.

We’ll keep you posted on further developments of course.

Exceptions to copyright

Details of the exceptions to copyright that allow limited use of copyright works without the permission of the copyright owner.

https://www.gov.uk/guidance/exceptions-to-copyright

ArabellaScott · 07/05/2024 18:00

That is good to hear, Justine, thanks for the update.

Beingboredisgoodforyou · 07/05/2024 18:21

From the Exceptions to copyright link

Text and data mining for non-commercial research
Text and data mining is the use of automated analytical techniques to analyse text and data for patterns, trends and other useful information. Text and data mining usually requires copying of the work to be analysed.
An exception to copyright exists which allows researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, they have ‘lawful access’ to the work). This exception only permits the making of copies for the purpose of text and data mining for non-commercial research. Researchers will still have to buy subscriptions to access material; this could be from many sources including academic publishers.
Publishers and content providers will be able to apply reasonable measures to maintain their network security or stability but these measures should not prevent or unreasonably restrict researcher’s ability to text and data mine. Contract terms that stop researchers making copies to carry out text and data mining will be unenforceable.
https://www.gov.uk/guidance/exceptions-to-copyright#text-and-data-mining-for-non-commercial-research

DrSpartacular · 07/05/2024 18:22

Wow, I was not expecting Aston to delete their MN corpus/sandbox without a fight.

Fab news @JustineMumsnet thank you and fingers crossed for a decent outcome with the PhD research...

RethinkingLife · 07/05/2024 18:22

Justine - I appreciate the update and thank you for your advocacy in addition to everything else.

I shall be interested to see what happens to some publications that used this and the response of any ethics bodies within the relevant institutions.

Like others, I await the outcome of the PhD discussion and scrape with interest. Particularly as the latest scrape was definitely after the T&Cs update that mentions scraping, iirc.

EasternStandard · 07/05/2024 18:23

Amazing. Thanks to the OP for spotting this.

AstonsDataThief · 07/05/2024 18:31

This isn’t just about copyright. It is about human rights and the right to respect for privacy, family and correspondence (Article 8) - the university is a public authority.

Crankywiddershins · 07/05/2024 19:05

Thank you @JustineMumsnet for the work you have done so far.

Aston seem petulant in their response. They're just sorry they got caught and are hoping to go ahead with the PhD. If this happens I for one will be logging off for good.

Talulahalula · 07/05/2024 19:53

Thank you very much for this update, which is welcome.

I would like to make two points.

Firstly, relating to the deleted dataset, individuals, the forensic linguistics department and the university have benefitted financially and in reputation from work done based in part on the deleted dataset, directly and indirectly. Is there any redress for MN from this?

Secondly, I appreciate the willingness of MN to delete posts when users wish, and indeed, moderators have been prompt to do so when I have requested this. The problem is, and this relates to the data scrape done in 2024, that I may ask MN to delete posts, but I have no way of asking Aston to delete posts. So there is no point in asking MN to delete posts.

Now aside from one thread I started on what used to be the feminist support board, I cannot remember the extent to which I posted personal stuff on FWR whilst engaging with debates about women’s rights. At some point, I reached the view that I should not have to lay out the things which had happened to me to justify why I believed that there were sex-based imbalances of power in society and that women historically had certain protections for a reason. However, it remains the fact that it is my experience of x, y or z which informs my views as well as seeing inequalities and injustices happen to other women. The personal is political - a central tenet of second wave feminism, which is where I have my intellectual roots, so to speak. FWR is no less personal or no less deserving of privacy and ethical use than any other area of the board.

And finally, whilst having a browse around Aston University’s Modern Languages and Linguistics submission to REF 2021, I was once again struck by the fact that the forensic linguistics papers submitted deal with crime, and particularly unsavoury crime at that.

DrSoupDragonsFriend · 07/05/2024 20:18

Justine, thank you very much from me too.

Please can we have the exact dates of the data that was scraped by Aston, including the additional set for the PhD? Until I know these, I don't know if any of my posts were affected.

As has been said, Aston will have benefited significantly, both in reputation and grants, from work that depends on the datasets they've taken, but there may be other sources of financial benefit too. Please would you ask Aston to let us know a) who else they authorised to use any of the MN data outside the AIFL, and b) if so, did Aston charge for its use?

It struck me, having glanced through the Exceptions page in the gov.uk link, that most people who publish within academia are - unless they have always been 100% open access - likely to make some sort of money from their published work being sold, copied or loaned. I know that, unless you are writing bestsellers, financial benefits from academic publishing can be pitiful, but authors do collect small sums from rights and secondary uses as well as sales.

Once the ranges of dates are known for the data scraping, please would you let all MN users know what has been happening? Thanks.

(Let's not forget that, in addition to Justine's continuing activity behind the scenes, we are also still waiting for the results of the two FoI requests about ethics etc.)

AstonCanKissMyArse · 07/05/2024 21:43

Thanks for the update Justine.

DrBlackbird · 07/05/2024 21:45

Beingboredisgoodforyou · 07/05/2024 18:21

From the Exceptions to copyright link

Text and data mining for non-commercial research
Text and data mining is the use of automated analytical techniques to analyse text and data for patterns, trends and other useful information. Text and data mining usually requires copying of the work to be analysed.
An exception to copyright exists which allows researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, they have ‘lawful access’ to the work). This exception only permits the making of copies for the purpose of text and data mining for non-commercial research. Researchers will still have to buy subscriptions to access material; this could be from many sources including academic publishers.
Publishers and content providers will be able to apply reasonable measures to maintain their network security or stability but these measures should not prevent or unreasonably restrict researcher’s ability to text and data mine. Contract terms that stop researchers making copies to carry out text and data mining will be unenforceable.
https://www.gov.uk/guidance/exceptions-to-copyright#text-and-data-mining-for-non-commercial-research

This is from an Act passed in 2014 and dare I say it would benefit from being updated. It reads to me as if it focuses on online publications in the first instance and contradicts data protection in the second. Possibly a rather loose interpretation by Aston.

In any event, @JustineMumsnet will you consider anti-scraping software for the platform? That would protect us and ensure that any legitimate research/ers has to request it from you.

Dumbledoreslemonsherbets · 08/05/2024 09:45

AstonsDataThief · 07/05/2024 18:31

This isn’t just about copyright. It is about human rights and the right to respect for privacy, family and correspondence (Article 8) - the university is a public authority.

This.

It is not just about copyright, it is about human rights and data protection.

I would argue the data protection and human rights violations of Aston are greater than any copyright breaches, certainly from our point of view.

I would recommend everyone contact Aston's DPO and ask for a SAR of any data held by them under usernames you are willing to give up, and then confirmation all the data under those usernames and any research including your data under those usernames, has been deleted. Making it clear that you did not and do not give consent for your writing to be used in the way they have used it.

Aston's statutory data protection officer is Samantha Burns, Director of Legal Services. [email protected]

Obviously there will be vulnerable women who for various reasons are unable to do this, which I think makes it even more important that those of us who can do it, do. I certainly will be.

Dumbledoreslemonsherbets · 08/05/2024 09:47

If you don't get an adequate response from Aston, then you can escalate to the ICO.

WookeyHole · 08/05/2024 10:00

I'm happy to set up a gmail account to email Aston's DPO using my username.

Could a wise person please provide a template with the best text to cover all aspects succinctly?

GrannyAchingsShepherdsHut · 08/05/2024 12:03

I'd like to do that, but I really don't want to send an email saying I am usernames 1,2,3,4 and 5. I can't see any way around it. (other than setting up email accounts for all the u/n individually?!)

everythingthelighttouches · 08/05/2024 12:49

@JustineMumsnet
Thank you for the updates and the work behind the scenes.

I would just like to pick up a point you made about Aston being the Controller.

Mumsnet is still a Controller of the data too.
As a data subject, I have proximity to Mumsnet and not Aston.

I believe Mumsnet still has obligations regarding my data and although Mumsnet is clearly not a joint controller with Aston, do you consider that Mumsnet has an obligation to its data subjects regarding concerns about use of our data at Aston?

Specifically does Mumsnet believe it has an obligation to report this incident to the ICO, and, to make representations to Aston on behalf of pseudonymised data subjects and their rights?

many thanks

Winnading · 08/05/2024 13:16

GrannyAchingsShepherdsHut · 08/05/2024 12:03

I'd like to do that, but I really don't want to send an email saying I am usernames 1,2,3,4 and 5. I can't see any way around it. (other than setting up email accounts for all the u/n individually?!)

I posted on the other thread 3? I think
I'm willing to set up an anonymous email, you can all add usernames and I will send the email off, or print and post if you like. Or both if necessary.

No need to out yourselves.

AstonCanKissMyArse · 08/05/2024 17:10

Winnading · 08/05/2024 13:16

I posted on the other thread 3? I think
I'm willing to set up an anonymous email, you can all add usernames and I will send the email off, or print and post if you like. Or both if necessary.

No need to out yourselves.

I would like to do this!

AstonUniDataScraperWankers · 08/05/2024 17:12

AstonCanKissMyArse · 08/05/2024 17:10

I would like to do this!

Me too.

This thread is not accepting new messages.