
Feminism: Sex and gender discussions

A corpus-assisted discourse analysis of linguistic transphobia on Mumsnet

1000 replies

IwantToRetire · 18/04/2024 17:32

By Aston Institute for Forensic Linguistics

It has been suggested that the forum-style parenting website Mumsnet is a hub where ‘gender-critical’ feminism, which directly opposes transgender rights, is practised with little moderation (Livingston, 2018). This presentation reports on the initial stages of a project investigating whether the potential intensification of linguistic transphobia on Mumsnet may lead to further marginalisation of transgender people offline (Powys Maurice, 2021). Though studies of non-linguistic transphobic rhetoric on Mumsnet (e.g., Pedersen, 2022; Mackenzie, 2019) and discourse analyses of other radical online communities (e.g., Krendel, 2020) have both been conducted, this project is the first to analyse linguistic transphobia on Mumsnet. It also contributes to existing literature on UK-based ‘gender-critical’ feminism, linguistic transphobia, and radical online community discourses.

The presentation explores the rise of potentially ‘gender-critical’ linguistic transphobia on Mumsnet over time through the corpus linguistic (CL) analysis of the ‘Feminism: Sex & Gender Discussions’ board, using three corpora that together span fifteen years: 2008-2013, 2013-2018 and 2018-2023. As the project is still ongoing, preliminary findings will be presented, namely a comparative overview of trends yielded by frequency analyses. Overall, this presentation provides insights into the growing commonality of potentially ‘gender-critical’ feminist rhetoric on Mumsnet and its effect on increasing transphobic discourse on the site.

https://www.eventbrite.co.uk/e/a-corpus-assisted-discourse-analysis-of-linguistic-transphobia-on-mumsnet-tickets-880795271367?aff=ebdssbdestsearch

(I had just finished my favourite tea time treat of catching up on FWR and was going to get back to the grindstone when this popped up on my feed. So have come back as it is too good not to be shared. Enjoy!)


songaboutjam · 19/04/2024 22:15

Threads, even.

AmaryllisNightAndDay · 19/04/2024 22:16

RethinkingLife · 19/04/2024 21:31

I have a minor interest in automated sentiment analysis.

I would be genuinely interested in a brief overview of that if you could spare your time and cognitive resources. I keep seeing mention of it (or did before Twitter was completely closed off to researchers) but have no idea what's actually involved in conducting such research (the nuts and bolts).

I completely understand if you don't have the time or energy to do this.

To be honest, it's really not my area; I just supervise occasional undergraduates who like to play with SentiStrength on one of the public datasets. It's many years since I've talked to anyone with access to anything more, and I haven't really kept up, which is why I'm interested to know what's going on now!

BettyFilous · 19/04/2024 22:17

This reply has been deleted

Message deleted by MNHQ.

Keeprejoining · 19/04/2024 22:32

This whole thing is horrific and unethical. I'm sure the Scottish hate crime police would love to access this database.

SaffronSpice · 19/04/2024 22:54

Remember, boards include 30-day chats, where people want information to disappear, and SN chat, which doesn’t get featured in trending and disappears after 90 days.

NeverDropYourMooncup · 19/04/2024 22:56

This reply has been deleted

Message deleted by MNHQ.

Headline: 'We're doing this to catch Nazis, paedophiles and terrorists'.

Small Print: 'We're going to use it primarily to doxx middle-aged females and make sure everybody gets to know all about their relationships, rapes, assaults, political views, voting habits and the occasional sarky comment about Carol Vorderman'.

ditalini · 19/04/2024 23:00

This reply has been deleted

Message deleted by MNHQ.

Yes, it's difficult to work out how many publications have used the dataset due to paywalls and Google Scholar sometimes being a bit off with its full-text mining (I just searched for each of the dataset upload authors + Mumsnet), but one of the uploaders did work on a project seeing if you could compare two texts and use linguistic analysis to tell if they were by the same author.

It's not a wild stretch from that to seeing if you could identify name changes on an anonymous forum, which makes jigsaw identification much more likely.

KellieJaysLapdog · 19/04/2024 23:01

Most of the other data sets in the FoLD library are absolutely nothing like Mumsnet. Looks like a lot of it has come from law enforcement/court cases?

songaboutjam · 19/04/2024 23:09

KellieJaysLapdog · 19/04/2024 23:01

Most of the other data sets in the FoLD library are absolutely nothing like Mumsnet. Looks like a lot of it has come from law enforcement/court cases?

That's insightful. So assuming the PhD student wasn't the person who added Mumsnet to the corpus data, we have the questions of who, why and when.

If the other data sets had been forums like Reddit, I'd still have ethical concerns, but I'd be far less suspicious about the overall motives behind lumping MN in with crime-related corpora.

SaffronSpice · 19/04/2024 23:15

Didn’t one of the screenshots say the Mumsnet dataset ran until 2023? If that was when it was created then that would suggest it was the postgrad.

KellieJaysLapdog · 19/04/2024 23:15

songaboutjam · 19/04/2024 23:09

That's insightful. So assuming the PhD student wasn't the person who added Mumsnet to the corpus data, we have the questions of who, why and when.

If the other data sets had been forums like Reddit, I'd still have ethical concerns, but I'd be far less suspicious about the overall motives behind lumping MN in with crime-related corpora.

There is one dataset from a particularly awful, now-defunct paedo chat forum, but it looks like it was collected as part of several court cases and was only released after convictions (fair enough).

There is also stuff from white supremacy forums and logs of nasty tweets sent to an anti-racist Twitter account, but no wholesale scraping of mainstream chat-forum data where no crimes have been committed.

TokyoBouncyBall · 19/04/2024 23:16

The PhD student had previously done an MA at Aston so quite possible that they created it.

songaboutjam · 19/04/2024 23:19

Good to know it's probably the student and the corpus was being collected for specific research rather than just some shady archive of no stated purpose.

Although the ethical issues remain, to borrow from Gen Z tumblrspeak, kind of yikes.

KellieJaysLapdog · 19/04/2024 23:20

I’ve made an archive of the PhD candidate’s event page as it appears on the Eventbrite website, which may be where people have seen dates for the Mumsnet data?

https://archive.ph/kosy9

It supposedly spans 15 years, 2008-2023.

KellieJaysLapdog · 19/04/2024 23:26

The uploaders of the Mumsnet data (aka ‘data donors’) are named on the version of the FoLD page we found via Internet archive (link posted upthread by REAL FEMINIST but it looked a bit bonkers and you had to copy and paste it to make it work).

Dumbledoreslemonsherbets · 19/04/2024 23:38

My understanding of GDPR is that data or information cannot be used, without consent, for purposes other than those originally intended. What's intended on here is discussion with other users. I think it's reasonable to expect that sometimes our discussions will be reported in the press (who should adhere to certain standards of accuracy). IIRC the terms of use also include the site owners (MNHQ) using data in certain circumstances (e.g. for advertising). But that's not what's happened here.

Regardless, using slurs against users breaches Talk Guidelines, so it's unlikely to be approved by MNHQ.

I certainly don't consent to my data being used to smear middle aged women as 'transphobic' for standing up for sex-based rights. I have never written anything 'transphobic' by any reasonable definition of that term.

I'm pretty sure the judge in the Jo Phoenix trial had something to say about using slurs as a bullying tactic in academia. I will be looking up the judgement, and I look forward to bringing that up on the Teams call at the event, as I've got a ticket. Plus the egregious lack of ethics and breach of GDPR that this "research" (heavily sarcastic air quotes) constitutes.

Not to mention the misogyny in targeting the free speech of women.

BoreOfWhabylon · 20/04/2024 00:06

Perhaps the more knowledgeable posters could let MNHQ/Justine know about some of the implications that may not be immediately obvious to non-specialists? Like the bit about comparing text to see if it comes from the same user?

Ereshkigalangcleg · 20/04/2024 00:10

This reply has been deleted

Message deleted by MNHQ.

Ereshkigalangcleg · 20/04/2024 00:28

Text Crimes is an online database that Beautiful Canoe implemented for Professor Tim Grant at the Centre for Forensic Linguistics, at Aston University. The Text Crimes website contains a database of curated malicious communications, designed to create a common set of corpora for forensic linguistic researchers. As well as storing the text of malicious communications, Text Crimes allows researchers to run a small number of common, automated NLP algorithms over the curated corpora.

I can't get the page to load, but this is the link: textcrimes.com/

Ereshkigalangcleg · 20/04/2024 00:29

From the Beautiful Canoe website (an Aston student-run software engineering company):

beautifulcanoe.com/research-software-engineering.html

songaboutjam · 20/04/2024 00:30

The wider implications of scraping Mumsnet for research, without anyone's prior knowledge or consent, could include a chilling effect on women's online speech. More vulnerable posters may censor or reduce their contributions if they know that, for example, their writing style could be analysed for similarities with other usernames, or simply that their personal stories might be quoted directly in published research.

People are angry enough about AI models being trained on this kind of data, and at least that doesn't involve a human nit-combing through it.

Ereshkigalangcleg · 20/04/2024 00:44

Full description of the FoLD here with explanation of data storage, research access etc

ojs.letras.up.pt/ojs/index.php/LLLD/article/view/12824/11680

Ereshkigalangcleg · 20/04/2024 00:51

From that document, apologies for any formatting issues:

One of the most frequently mentioned challenges is how to deal with sensitive data in linguistic datasets and corpora (Rock 2001; Anthony 2013; Leedham et al. 2021). In the social sciences generally, and particularly perhaps in linguistics, commonly used data, such as written texts, recordings of interactions and interviews, nearly always originate from individuals, which becomes particularly problematic when using data from sensitive contexts. A handful of qualitative studies have met these challenges for data-sharing through explicitly consenting data subjects for the use of their data and outlining the level of anonymity that will be achieved, but such methods are not always appropriate or feasible in forensic and legal contexts, particularly when working with secondary data from external organisations, and there are no direct research ‘participants’ in the traditional sense.

In the context of corpus linguistics, sensitive data almost exclusively refers to personal information of named individuals, such as names, addresses, phone numbers and other contact details (Leedham et al. 2021). The standard method for mitigating this problem is anonymisation, i.e., replacing personal information with standard placeholders in a corpus (Rock 2001). Anonymisation is widely used in corpus linguistics, especially in published language resources but it also presents its own challenges. Due to the sheer size of many present-day corpora, manual anonymisation or the manual inspection of the output of automated anonymisation tools has become unfeasible, which means that it is practically impossible to ensure that all personal information has been removed from published corpora (Baker 2018).

KellieJaysLapdog · 20/04/2024 01:03

Ereshkigalangcleg · 20/04/2024 00:44

Full description of the FoLD here with explanation of data storage, research access etc

ojs.letras.up.pt/ojs/index.php/LLLD/article/view/12824/11680

From the journal article linked above, coauthored by Professor of Forensic Linguistics at Aston Tim Grant:

FoLD. Why is FoLD needed?
Access to data in forensic linguistics in the broadest sense. Forensic linguistics can be defined as the application of linguistic knowledge, theory, and methods to legal and criminal contexts with the aim to improve the delivery of justice through the analysis of language.
Although it is evident that forensic linguistics can only fulfil this aim by conducting evidence-based, reliable, and replicable research, access to relevant forensic linguistic data has been notoriously challenging since the conception of the discipline.
The need for specific datasets for forensic linguistic analysis was recognised in the first edition of the journal then known as Forensic Linguistics […] gives mention to the setting up of a corpus to be “whimsically entitled the Habeas Corpus, which will include suicide notes, threatening letters, transcriptions of threatening and obscene telephone calls, court transactions, witness statements and police interview records

And you’ve decided it’s ethical for two random Polish bloke academics to data scrape a parenting chat forum with a particular emphasis on the rights of women and the safeguarding of children? Really, Tim?

Two screenshots from Tim Grant’s Twitter account (I’m working off the premise that calling Tim an asshole for overseeing the assholeness of having thousands of unsuspecting and unknowing women’s conversations data-scraped for academic clout and research funding doesn’t count as hate speech, based on Tim’s own tweets):
