Page 15 | A corpus-assisted discourse analysis of linguistic transphobia on Mumsnet

IwantToRetire · 2024-04-18T16:32:52+00:00

^By Aston Institute for Forensic Linguistics^ ^It has been suggested that the forum-style parenting website Mumsnet is a hub for ‘gender-critical’ feminism, which directly opposes transgender rights, to be practised with little moderation (Livingston, 2018). This presentation reports on the initial stages of a project aiming to investigate that the potential intensification of linguistic transphobia on Mumsnet may lead to further marginalisation of transgender people offline (Powys Maurice, 2021). Though studies of non-linguistic transphobic rhetoric on Mumsnet (e.g., Pedersen, 2022; Mackenzie, 2019), and discourse analyses of other radical online communities (e.g., Krendel, 2020) have both occurred, this project is the first to analyse linguistic transphobia on Mumsnet. It also contributes to existing literature surrounding UK-based ‘gender-critical’ feminism; linguistic transphobia; and radical online community discourses.^ ^The presentation explores the rise of potentially ‘gender-critical’ linguistic transphobia on Mumsnet over time through the corpus linguistic (CL) analysis of the ‘Feminism: Sex & Gender Discussions’ board, using three corpora comprising a fifteen-year timeframe: 2008-2013; 2013-2018; and 2018-2023. As the project is still ongoing, preliminary findings will be presented, namely a comparative overview of trends yielded in frequency analyses. 
Overall, this presentation provides insights into the growing commonality of potentially ‘gender-critical’ feminist rhetoric on Mumsnet and its effect on increasing transphobic discourse on the site.^ [[https://www.eventbrite.co.uk/e/a-corpus-assisted-discourse-analysis-of-linguistic-transphobia-on-mumsnet-tickets-880795271367?aff=ebdssbdestsearch https://www.eventbrite.co.uk/e/a-corpus-assisted-discourse-analysis-of-linguistic-transphobia-on-mumsnet-tickets-880795271367?aff=ebdssbdestsearch]] (I had just finished my favourite tea time treat of catching up on FWR and was going to get back to the grindstone when this popped up on my feed. So have come back as it is too good not to be shared. Enjoy!)

KellieJaysLapdog · 20/04/2024 01:49

Ereshkigalangcleg · 20/04/2024 00:44

Full description of the FoLD here with explanation of data storage, research access etc

ojs.letras.up.pt/ojs/index.php/LLLD/article/view/12824/11680

There is an illustration in the above journal showing a dataset that isn’t currently present on the FoLD index page.

I found it via Internet archive:

https://archive.ph/OjwDu

This one was contributed by an even lowlier person in the Aston hierarchy than Eden Palmer, a former MA student.

Interesting that data pulled from Facebook was designated ‘restricted’ yet data pulled from Mumsnet was ‘controlled’ - does this indicate that less was done to anonymise the MN data? Or is it perhaps because MN is a U.K. website and FB a US website? Or the sheer volume of data taken from MN?

Whatever the reason, a decision was made to classify the Mumsnet scrape results as ‘restricted’. Who made that decision and why?

Ereshkigalangcleg · 20/04/2024 01:56

Whatever the reason, a decision was made to classify the Mumsnet scrape results as ‘restricted’. Who made that decision and why?

I imagine because it's third party data with serious consent/anonymity issues to which they need to control access. Justine has said it's against MN terms of service to scrape the site like this.

Ereshkigalangcleg · 20/04/2024 01:58

US also doesn't have GDPR which may be a factor.

Tinysoxxx · 20/04/2024 02:25

Tim Grant: ’Improving the delivery of justice through the analysis of language’.

So our language is being analysed to deliver justice to whom? What is the end goal? Is it that this board was deemed threatening and obscene because we discuss women’s rights? Or is it on trial for wrongthink? If so, who has taken that decision because surely it has legal and criminal consequences for mumsnet and posters?

It would be nice to get some feedback from Aston University please.

KellieJaysLapdog · 20/04/2024 02:30

I’ve pulled out some bits of the journal article that are probably relevant

(TLDR: “we know it’s probably not ok to hold all this data without anyone’s permission but we only show it to really special students at full moon who pinky swear not to use it to get middle aged mums arrested for thought crime”)

However, providing full and unrestricted access to some datasets is not possible because of ethical concerns, copyright or license issues, or constraints established by Data Sharing Agreement. This is why we have introduced the restricted and controlled access categories.

Datasets with controlled access contain highly sensitive material that may come from a third party and have even heavier constraints on access and use. Controlled datasets are therefore stored not on the FoLD web server but on an air-gapped, offline computer in our secure data lab at the Aston Institute for Forensic Linguistics.

Users who wish to access these datasets must make a detailed application to FoLD and the data owner, as well as potentially gain additional agreement from an external organisation before they can be approved for access. Although information on controlled datasets is detailed in the FoLD repository for users to search, the data itself is not available for download and users may need to visit Aston or agree a secure means of access. Datasets in this category include, for example, scraped data from white supremacist and dark-web child abuse discussion fora (Kredens and Pezik 2021b,a).

FoLD is completely unrestrictive in the sense that anyone who is willing to share their forensic linguistic datasets via FoLD can become a data donor. Given that our aim is to publish as many datasets as possible, we are constantly looking for new relevant datasets, and we actively encourage potential data donors from within and outside AIFL to contact us at [email protected] to discuss the suitability of their datasets for inclusion into FoLD. Publishing a dataset via FoLD has several benefits for data donors. Firstly, data donors retain full ownership of their datasets and can withdraw them from the databank at any time if they wish to do so. We never change the content, structure, or access category of any dataset we hold on FoLD without the data donor’s explicit permission.

As mentioned above, we invite all potential data donors to have an informal discussion with the FoLD team about the suitability of their dataset for inclusion before they submit the dataset to the FoLD website. During this discussion, we establish whether the dataset holds relevance to forensic linguistics and whether there are any obvious ethical or licensing issues that would prevent us from publishing the dataset via FoLD. Once we have established that the dataset is in principle publishable on FoLD, we ask the data donor to submit their dataset by completing an online form available on the FoLD website

Once the dataset has been submitted, we carry out our editorial review before making the data available on FoLD.
During this process, we check the metadata and the data files for any inaccuracies and decide on the most suitable access category for the dataset under review.
If we have any ethical concerns about the dataset that we are unable to resolve, we seek advice from the AIFL Research Ethics Committee, which works independently from the FoLD team. Whilst we normally follow the advice received from the Ethics Committee, all publication decisions lie with and are the responsibility of the FoLD team.

The ethical aspect of the editorial process is of great importance to a specialist repository such as FoLD.
As stated above, sharing and publishing research data through a repository poses ethical and some legal challenges, with the potential for harm to individuals or communities should they become identifiable or if data were to be misused. These challenges prompted many early decisions by the FoLD team about the structure and management of the repository. The gold standard for sharing data beyond a project is usually to consent research participants for the storage and reuse of their anonymised data (ESRC 2015).

However, it is acknowledged that such consent or anonymisation may not exist for pre-existing datasets or even be feasible for many types of research data and in these instances careful consideration and potentially a review by a research ethics committee is needed before a dataset is archived and reused.
This issue has perhaps been particularly acute in forensic linguistic contexts, where data may not originally have been intended for research, might be sourced from criminal or sensitive contexts, or be provided by another organisation that requires strict limits on use – long held difficulties for sharing data in the field. As a means of mitigating these difficulties, an early decision was made to provide datasets through layered access categories, whereby more complex datasets would not be published openly but could be restricted to authorised researchers

(“And we would’ve gotten away with it if it wasn’t for you meddling terves!”)

KellieJaysLapdog · 20/04/2024 02:32

Should we have a little wager on whether FoLD took the Mumsnet dataset to Ethics Committee prior to publishing?

RethinkingLife · 20/04/2024 02:35

Tinysoxxx · 20/04/2024 02:25

Tim Grant: ’Improving the delivery of justice through the analysis of language’.

So our language is being analysed to deliver justice to whom? What is the end goal? Is it that this board was deemed threatening and obscene because we discuss women’s rights? Or is it on trial for wrongthink? If so, who has taken that decision because surely it has legal and criminal consequences for mumsnet and posters?

It would be nice to get some feedback from Aston University please.

This sort of advice to the police to contribute to the criminal justice system? If that's the advice they're getting from the people whom they appoint to such positions…

‘I believe the College of Police is inherently corrupted by gender ideology’

‘Recently, a man named Clare, Head of the Independent Advisory Group to Essex Police, said women with gender critical views should be treated as terrorists’

- Sarah Phillimore, Co-founder WeAreFairCop

https://twitter.com/GBNEWS/status/1769464221874434067

Ereshkigalangcleg · 20/04/2024 02:41

Datasets in this category include, for example, scraped data from white supremacist and dark-web child abuse discussion fora (Kredens and Pezik 2021b,a).

The fact that these two also scraped Mumsnet is quite the judgement. I wonder if it's all of MN or just FWR.

NotBadConsidering · 20/04/2024 02:55

Finally RTFT. Bonkers. But as usual a brilliant response from everyone here.

songaboutjam · 20/04/2024 02:58

I've probably missed bits of the thread as it's been filling up so fast, so apologies if someone else has already said this, but the phrase "data donor" is a bit piss-boiling.

To donate something implies it was yours to donate in the first place.

Ereshkigalangcleg · 20/04/2024 03:09

Yes I thought that too.

NitroNine · 20/04/2024 03:28

I am practically dancing with rage (a form of non-linguistic communication they can stuff right up a certain part of their corpus…) over this.

AmaryllisNightAndDay · 20/04/2024 07:15

"Data thief" sounds so much more... well, accurate.

Special thanks to Eden and Nicci for putting us on to it. Without Eden's seminar announcement we wouldn't have known.

TokyoBouncyBall · 20/04/2024 07:19

So the short round up from all of this excellent research is that they have taken this data because they’ve already decided that we have committed crimes.

Something that was apparent from the now deleted LinkedIn page which described the work as a PhD on transphobic hate crimes on Mumsnet.

So what’s the best step? Email Tim The Head? Or ask questions of their ethics department?

Talulahalula · 20/04/2024 07:34

I think from the posts above that Kredens and Pezik at Aston have harvested the data and created the dataset which is held in FoLD at Aston University.
The PhD student has been given access to the dataset, rather than created it. Leaving aside the issues with the premise of the PhD, the question is why Kredens and Pezik created it in the first place.
The only output I have found about it is this
https://research.aston.ac.uk/en/publications/large-scale-authorship-attribution-with-sociolinguistically-dynam
You can read the abstract on the link - it doesn’t mention MN but I would speculatively link their creation of the dataset with this output.

Next step, I would like to see a copy of that paper and would recommend that MNHQ ask for it.

I mean, fuck’s sake, I have published as part of my job and I have also posted prolifically on MN over the years when I was going through a really difficult time and in support of other women experiencing similar. I did so in the knowledge that what I was writing was publicly accessible and having weighed up whether the benefits outweighed the risks of being identified (and I do think women supporting women is a huge benefit). But that people would set out to see if they could attribute language to people naively never crossed my mind, much less that legitimate public discourse would be used to allege hate crime.

The only thing I would say, and I will search a bit more, is that Kredens and Pezik do not seem to have published this paper, and their subsequent work does not seem to draw on this database. It also does not seem to be accessible anymore on the FoLD website. But I think it could be legitimate to do an FOI on what projects/researchers had gained access.

Boiledbeetle · 20/04/2024 07:39

NitroNine · 20/04/2024 03:28

I am practically dancing with rage (a form of non-linguistic communication they can stuff right up a certain part of their corpus…) over this.

Quite how they are going to quantify our apparent transphobic thoughts that we do through the medium of interpretative dance is going to be interesting.

I've just been catching up and the more posters uncover, the more my interpretative dance routine goes from solitary dancer just shuffling feet with a bit of tap thrown in to full 3 part two interval West End massive cast ensemble steel drum straight out of Stomp routine.

RealFeminist · 20/04/2024 07:45

Talulahalula · 20/04/2024 07:34

I think from the posts above that Kredens and Pezik at Aston have harvested the data and created the dataset which is held in FoLD at Aston University.
The PhD student has been given access to the dataset, rather than created it. Leaving aside the issues with the premise of the PhD, the question is why Kredens and Pezik created it in the first place.
The only output I have found about it is this
https://research.aston.ac.uk/en/publications/large-scale-authorship-attribution-with-sociolinguistically-dynam
You can read the abstract on the link - it doesn’t mention MN but I would speculatively link their creation of the dataset with this output.

Next step, I would like to see a copy of that paper and would recommend that MNHQ ask for it.

I mean, fuck’s sake, I have published as part of my job and I have also posted prolifically on MN over the years when I was going through a really difficult time and in support of other women experiencing similar. I did so in the knowledge that what I was writing was publicly accessible and having weighed up whether the benefits outweighed the risks of being identified (and I do think women supporting women is a huge benefit). But that people would set out to see if they could attribute language to people naively never crossed my mind, much less that legitimate public discourse would be used to allege hate crime.

The only thing I would say, and I will search a bit more, is that Kredens and Pezik do not seem to have published this paper, and their subsequent work does not seem to draw on this database. It also does not seem to be accessible anymore on the FoLD website. But I think it could be legitimate to do an FOI on what projects/researchers had gained access.

It was withdrawn yesterday, after MN contacted them regarding data scraping

Talulahalula · 20/04/2024 07:47

They are not the only people to scrape the site.
This is from a paper on discussion of vaccination in AIBU (different authorship and custom made dataset from MN).

RealFeminist · 20/04/2024 07:48

Just reading the Data Protection Act.

I think Mumsnet users need to know the rationale, reasoning, ethical statement for why Aston U was scraping and storing their data, how, and what they have done with it.

It seems they are attempting to accuse the whole site of a crime. Is Dawn Butler involved?

RealFeminist · 20/04/2024 07:50

TokyoBouncyBall · 20/04/2024 07:19

So the short round up from all of this excellent research is that they have taken this data because they’ve already decided that we have committed crimes.

Something that was apparent from the now deleted LinkedIn page which described the work as a PhD on transphobic hate crimes on Mumsnet.

So what’s the best step? Email Tim The Head? Or ask questions of their ethics department?

Or the Data Commissioner.

BettyFilous · 20/04/2024 07:51

If the researchers at Aston University think MNers have committed a crime then the right thing to do would be report it for investigation. However, as various GC tribunals and cases like Harry the Owl’s have demonstrated, people saying things you don’t agree with, even things that offend you, is not criminal. There is a high bar for interfering in freedom of expression, a fundamental human right. The belief that biological sex is real, immutable and important in some contexts is protected under the Equality Act thanks to Forstater. MN has a robust moderation policy on this issue and deletes content that it thinks breaches its policy, probably more often than it needs to in light of recent judgements. So my question to Aston is: who is really on the wrong side of the law here?

RealFeminist · 20/04/2024 07:51

Talulahalula · 20/04/2024 07:34

I think from the posts above that Kredens and Pezik at Aston have harvested the data and created the dataset which is held in FoLD at Aston University.
The PhD student has been given access to the dataset, rather than created it. Leaving aside the issues with the premise of the PhD, the question is why Kredens and Pezik created it in the first place.
The only output I have found about it is this
https://research.aston.ac.uk/en/publications/large-scale-authorship-attribution-with-sociolinguistically-dynam
You can read the abstract on the link - it doesn’t mention MN but I would speculatively link their creation of the dataset with this output.

Next step, I would like to see a copy of that paper and would recommend that MNHQ ask for it.

I mean, fuck’s sake, I have published as part of my job and I have also posted prolifically on MN over the years when I was going through a really difficult time and in support of other women experiencing similar. I did so in the knowledge that what I was writing was publicly accessible and having weighed up whether the benefits outweighed the risks of being identified (and I do think women supporting women is a huge benefit). But that people would set out to see if they could attribute language to people naively never crossed my mind, much less that legitimate public discourse would be used to allege hate crime.

The only thing I would say, and I will search a bit more, is that Kredens and Pezik do not seem to have published this paper, and their subsequent work does not seem to draw on this database. It also does not seem to be accessible anymore on the FoLD website. But I think it could be legitimate to do an FOI on what projects/researchers had gained access.

Is this saying they were attempting to use tech to ID women who'd posted on Mumsnet elsewhere on the Internet? A kind of massive Doxxing machine? Please tell me I'm wrong.

RealFeminist · 20/04/2024 07:55

'This paper presents an evaluation of two approaches to large-scale authorship attribution. The data sets contain over 60 million posts (ca. 3 billion word tokens) contributed to online discussion boards by over one million registered members, which makes them significantly larger in terms of both the number of documents and authors than any other experimental collection to date. Importantly from a forensic linguistic perspective, the data sets are also highly interactive and dynamic, featuring hundreds of thousands of authors engaging in complex polylogic exchanges on a wide range of topics over several years. We believe such an experimental setup reduces some of the typical biases found in automated authorship attribution experiments which have used fairly static data (e.g. blog posts or emails).
The first approach reported is a K-Nearest Neighbours (KNN) algorithm which transforms text samples into query vectors and collects aggregated relevance scores of probable authors. The second approach is a FastText classifier (Joulin et al. 2016) utilising recent advances in natural language processing such as vector-based word representations obtained through neural network training. Depending on the number of test samples used for classification, our recall rate is 44 to 75 per cent at the 30th rank of the prediction lists. We discuss the implications of our findings for the notion of idiolect and, more widely, for internet-scale authorship attribution.'

The numbers they cite roughly tally with Mumsnet.

Talulahalula · 20/04/2024 07:59

I don’t know because I have not read the paper. But the abstract refers to the potential for large scale author attribution on the internet. So if that is not what they were doing, it is what they were looking for the potential to do.

As an aside, there are at least 34 academic papers using MN content - not this dataset to be clear, other forms of data collecting of various scales (someone has a whole academic paper out of one thread on weird things in people’s homes during childhood) - including on PND discussions here. So I would safely say that posters including myself should consider themselves research subjects when they post.

AmaryllisNightAndDay · 20/04/2024 07:59

Aston wouldn't be the first university to have to delete a research dataset.

Back in the day IIRC another university had to delete a huge database of tweets they'd been using for research, not because they'd obtained it illegally but because they couldn't meet the Twitter Ts&Cs that said individuals could delete their tweets.