Reposting here because I know not everybody is across this and the site stuff thread...
I wanted to offer some thoughts on the doxxing threat angle taken up in this and the other thread, and give some potentially helpful info on other corpus linguistic technicalities... NOT to excuse or explain the PhD in question (because wow) but because this is my field and some of us have research integrity not reflected by this PhD and people at Aston. I'm livid that this kind of research is what makes corpus linguistics better known, and that people then think we're all unethical twits. Ahem. So:
The whole point of corpus linguistics is to look at a dataset as a whole. There are no individuals.
Forensic linguists might be looking at authorship identification but it is a whole different field to corpus linguistics.
The fact that the Mumsnet corpus held by Aston has been, or could be, used by both forensic and corpus linguists makes this confusing.
EP's PhD looks to be squarely in the corpus linguistics wheelhouse, likely without any access to any usernames at all. The data you use in corpus software is just the post content. Usernames are miles away.
It is likely that ethical approval has already been given for EP's PhD precisely because the dataset already exists at Aston, in a kind of "I'm using an approved internal dataset, nothing to see here" way. But any researcher worth their salt should CHECK that the rules haven't changed, and should check the rules of the website in question. So that's a failure on the part of both the supervisor and the student. Ethics is a big thing. Not getting informed consent from participants is a massive thing. And the rules on online datasets are changing rapidly, so it is vital to stay informed.
The big issue here is the ethics of the Mumsnet dataset existing in the first place as it contravenes the T&Cs of the site.
To be clear, I am not affiliated with Aston or the PhD in question. Just someone else in the field watching with interest to see how this all ends...