I've been keeping an eye on the various FOI requests about the data scrape. One has been declined, with Aston saying they don't have the information requested.
However, I'm not sure what the exact status of the data is at the moment (IIRC Aston have agreed to delete it), but it is worth noting that the analysis was being done on the student's own computer, which is not encrypted, so hopefully anything related to those analyses has also been deleted. There is also a spreadsheet of usernames and the threads they post in. Presumably that will be deleted too?
Snippets from the ethics form for the PhD project:
"There were two factors I needed to consider when measuring whether informed consent must be gathered for the purpose of this project: the first is how much contact I will have with the Mumsnet posters, and the second is the level of privacy afforded by Mumsnet fora (Eysenbach and Till, 2001). I will carry out a ‘passive’ (Eysenbach and Till, 2001:1103) analysis of the linguistic data, meaning that I will not be directly involved with the Mumsnet posters. Instead, I will analyse linguistic data that already exists on the website. According to Roberts (2015), this usually means that informed consent is not required."
"Raw data used in the study will only be analysed on my own password protected laptop, and stored on Box, the cloud device used by Aston University. During the data collection portion of the project (as the text files are being created), any identifying information pertaining to the posters will be redacted for anonymization purposes (for example, some usernames and locations).
A key will also be created of Mumsnet usernames to quantify the number of posters in each thread, which will give me a number of participants overall. This will be done by copying and pasting each username on each thread into a spreadsheet for each corpus, and cross-referencing how many threads feature posters with the same name. This will also allow me to assign pseudonyms to participants if needed, and to track which pseudonym belongs to each user. This key will be deleted after the redaction has taken place."
"There is a very small risk of potential loss of anonymity of posters."
"Overall, it will be unlikely that posters’ true identities will be accidentally revealed during the project. This has been discussed at length between me and my supervisors."
"Could participation cause discomfort (physical and/or psychological – e.g., distressing, sensitive or embarrassing topics), inconvenience and/or danger beyond the risks encountered in normal life? Please indicate the level of risk and plans to address these potential risks. N/A".
(Apparently a risk to the researchers was anticipated, but this has been redacted.)
Supervisor's comment:
"I can confirm that I have had extensive discussions with <student's name> about this application and the design of the project as a whole. We have talked at great length about the importance of redacting usernames and applying pseudonyms, and destroying the key to pseudonyms as soon as soon as this is completed. While strings of words will remain searchable and discoverable on Mumsnet.com, they will not point to an identifiable individual as this information is not shared on Mumsnet. We have also discussed the importance of researcher welfare and ensured adequate support is in place."
From correspondence about the ethics form:
"The second is just to flag the possibility that any direct quotes you may use (I wasn’t sure if this was a possibility) could be googled and lead back to the Mumsnet user which MAY including identifiable information (as we can’t guarantee pseudonymisation via username e.g. if I was xxxxxxxxxxxxx, someone could work out it was me!). It would be great if you could discuss how you’re mitigating this a little more to demonstrate that you’ve thought about the possibility (even if remote) and what steps you’re taking to ensure accidental identification doesn’t take place. This might be via paraphrasing quotes instead of direct ones, etc. If you’re not planning to quote at all, this is fine – I just wanted to flag that this is something the Chairs will be thinking about – it’ll just have to be explicitly stated in your form for better understanding, and our records."
(Given that this document is in the public domain, I think it is reasonable to quote it here. The whole thing is available at Mumsnet corpus FOI for anyone interested!)