Another data scrape from academia

60 replies

ArabellaSaurus · 08/11/2025 19:46

Hi, MNHQ.

Alas, a couple of twits have been scraping the site, again.

https://bulletin.appliedtransstudies.org/article/4/1-3/7/

'In this study, however, we excavate what it means to write like a GC by analyzing how GC forum users rely on reactionary language and deploy storytelling practices in ways that calcify their anti-trans ideologies as personal and natural while rendering transgender people as anti-feminist, dangerous, and monstrous. To identify how GC groups perform political mythmaking and construct extremist identities, we undertook a computationally assisted discursive analysis of two popular GC forums: Ovarit and Mumsnet’s “Feminism: Sex & Gender” board (abbreviated to “FSG”). Through comparative platform discourse analysis, we analyzed over 80k posts and comments scraped from Ovarit and over 60k posts and comments scraped from Mumsnet (Burgess and Matamoros-Fernández 2016; Lewis and Marwick 2017)'

'Data Collection
This study relied on a computationally-assisted discourse analysis of data collected from FSG and Ovarit using Python notebooks developed by one of the authors. Data from each platform was collected and analyzed separately, using the means below.
...
Mumsnet
Mumsnet posts within the FSG board are organized in a single feed. While Ovarit has forum subcategories (circles), FSG is a subforum itself, and does not have subcategories. Given that the discussion on FSG regularly involves trans- and gender-related subjects, it was likewise important to grab a wide sample. We scraped the most recent 3,767 threads and an accompanying 57,791 comments. In each case, the text, username, datetime, post type, and thread URL were collected. Data was collected in April of 2024.'

“I Took a Deep Breath and Came Out as GC”: Gender Critical Storytelling, Radicalization, and Discursive Practice on Ovarit and Mumsnet

Following the closure of the anti-trans subreddit r/GenderCritical, gender critical (GC) internet users have migrated to more obscure, invite-only spaces. A side-effect of this GC dispersal is that activity in online anti-trans spaces has become increa...

https://bulletin.appliedtransstudies.org/article/4/1-3/7/

ArabellaSaurus · 08/11/2025 19:48

I'm copy pasting this from @AstonUniversityDataScrapingDepartment:

'I have read it. I would be interested to see the peer reviews. (In my discipline these are also published.)
I would also be interested to see evidence of the editorial decision making process, not least evidence (or not) of due diligence wrt data use. It wouldn't take anyone long to see that (1) this kind of scraping is against T&Cs and (2) that MN is prepared to take legal action.
Finally, I hope MN looks at the "data availability statement" in which the authors of this article offer to give their scraped data to others.'

ArabellaSaurus · 08/11/2025 19:49

From the footnotes:

'“TIF” and “TIM” were banned across Mumsnet by founder Justine Roberts along with the terms “cis” and “TERF” as “many feminists are affronted by” the terms (Roberts 2018). These changes were aimed at encouraging “civil debate,” exemplifying the apolitical posturing that gives lip service to free speech and open debate while allowing Mumsnet leadership to abdicate responsibility for their culpability in fostering transphobia.'

https://bulletin.appliedtransstudies.org/article/4/1-3/7/#fnref7

ArabellaSaurus · 08/11/2025 19:50

'Acknowledgments

The authors would like to thank the special issue editors as well as our anonymized reviewers who provided insightful and crucial feedback. They also extend thanks to the panelists and attendees at the 2024 Association of Internet Researchers conference in Sheffield, who discussed this work with us. Finally, they extend love and gratitude to their friends, pets, and partners, who supported them through a deeply challenging research project.'

ArabellaSaurus · 08/11/2025 20:07

The article is defamatory of Mumsnet, and its users, throughout.

Scraping is against the T&Cs.

The journal is affiliated with Northwestern University Libraries as the publisher; the primary sponsoring organisation is the Center for Applied Transgender Studies.

I don't see anything at all about ethics, but MN may wish to ask for a statement about ethics approval, data handling and anonymisation.

Internet-mediated research has domain-specific ethical expectations (AoIR, ESRC, UKRIO). If the authors failed to obtain required ethics approval, failed to mitigate harms to vulnerable people, or used deceptive methods without justification, the university (if they are employed there or used its resources) could face internal complaints for research misconduct.

I'm sure MN's lawyers are expert on this by now, but you may want to:

Request an investigation under the university’s research-integrity rules (missing ethics approval, failure to follow AoIR/ESRC guidance).

Complain to the publisher / journal: ask for corrections, clarifications, or retraction if procedures or ethics statements were inadequate.

ArabellaSaurus · 08/11/2025 20:10

Archived:

archive.ph/e0u3Z

AoIR: https://archive.ph/uAzua

ArabellaSaurus · 08/11/2025 20:12

Found the ethics statement for the journal:

'Publication Ethics

Authors should observe high standards with respect to publication ethics as set out by the Committee on Publication Ethics. Falsification or fabrication of data, plagiarism, including duplicate publication of the authors’ own work without proper citation, and misappropriation of the work are all unacceptable practices. Any cases of ethical misconduct are treated very seriously and will be dealt with in accordance with the COPE guidelines. All research employing human subjects must have been conducted with the authorization and approval of an Institutional Review Board, an Ethical Review Board, or some other credentialed research ethics committee.
Members of the journal’s editorial team (including assistant editors and editorial board members) are welcome to submit papers to BATS. These submissions are not given any priority over other manuscripts, and editorial team members’ affiliation with the journal has no bearing on editorial decisions. If an editorial team member is an author on a submission, they will have no involvement in or access to confidential information on the editorial process.'

https://bulletin.appliedtransstudies.org/guidelines/

Author Guidelines - Bulletin of Applied Transgender Studies

doctorsleep · 08/11/2025 20:30

Why do you assume they didn’t get approval from MN?

c. No part of the Website may be distributed, scraped or copied for any purpose without express approval and a licence to do so from us or our licensors. If you are interested in copying, licensing or using Mumsnet content for any purpose, then contact us at [email protected].

M

MrsOvertonsWindow · 08/11/2025 20:33

doctorsleep · 08/11/2025 20:30

Why do you assume they didn’t get approval from MN?

c. No part of the Website may be distributed, scraped or copied for any purpose without express approval and a licence to do so from us or our licensors. If you are interested in copying, licensing or using Mumsnet content for any purpose, then contact us at [email protected].

M

😂 Just guessing but can't see Mumsnet happily allowing data scraping for the purpose of men trashing the Mumsnet brand and women posters.

Just guessing of course.

Boiledbeetle · 08/11/2025 20:34

doctorsleep · 08/11/2025 20:30

Why do you assume they didn’t get approval from MN?

c. No part of the Website may be distributed, scraped or copied for any purpose without express approval and a licence to do so from us or our licensors. If you are interested in copying, licensing or using Mumsnet content for any purpose, then contact us at [email protected].

M

Because we've been here before, with Aston University, and Justine got lawyers involved.

HonoriaBulstrode · 08/11/2025 20:47

Why do you assume they didn’t get approval from MN?

If they did, it should be in the acknowledgements, shouldn't it?

Boiledbeetle · 08/11/2025 20:52

doctorsleep · 08/11/2025 20:30

Why do you assume they didn’t get approval from MN?

c. No part of the Website may be distributed, scraped or copied for any purpose without express approval and a licence to do so from us or our licensors. If you are interested in copying, licensing or using Mumsnet content for any purpose, then contact us at [email protected].

M

https://www.mumsnet.com/talk/site_stuff/5057903-mumsnet-corpus

Previous data scraping debacle

doctorsleep · 08/11/2025 20:58

@HonoriaBulstrode not necessarily. It can be done to thank those who helped you write the article, either with their work or funding.
@Boiledbeetle I have posted before on MN having an extremely high number of trackers who use our data for ad bidding. With so many companies (more than 17) having access to MN, it could well be an indirect approval via a third company.

Until MNHQ comments on this, we can speculate as much as we want.

MassiveWordSalad · 08/11/2025 21:16

“Yo Mumsnet, are you down with us ‘borrowing’ your data for our totally legit academic research? Cards on the table, we may refer to you as ‘anti-trans’ or a ‘hateful community’ but that’d be cool, right? Lol”

“Sure, fill your boots lads.”

I have my doubts that this conversation took place, but we’ll see.

FuckOffMadison · 08/11/2025 23:42

Alas, a couple of twits have been scraping the site, again.

10/10 for restraint.

JadeSquid · 09/11/2025 01:58

I think any site on the Internet should be up for academic critique. If you don't like their paper, academically refute their findings. Other people will peer review everything written.

AreYouSureAskedNaomi · 09/11/2025 04:34

Bumping this for all of us

Thank you @ArabellaSaurus

TalulaHalulah · 09/11/2025 06:59

JadeSquid · 09/11/2025 01:58

I think any site on the Internet should be up for academic critique. If you don't like their paper, academically refute their findings. Other people will peer review everything written.

The issue is, I think, that data scraping is against MN’s terms and conditions; indeed any data use from MN is against the T&Cs. Therefore users have a reasonable expectation that their data (and I include user generated data as data) will be used only in research or projects MN have agreed to.

I don’t wish to rehearse all the arguments made on the Aston thread, where, to be fair, there were also examples given of excellent research on MN as well as academic papers on how to do this ethically. Data scraping three thousand odd threads from one sub forum without consent from the site owner is not how to do this ethically. Especially as the scraped data is being offered to other researchers as well.

Personally, I would have no objection if someone wrote a paper on FWR with attention to the particular UK political and social context of the content, looking at it over time and, as a poster on the thread on this paper on FWR said, with reference to the kind of issues posted by women (the biological kind) over the rest of the site. That is to say, with some understanding of the lived day-to-day experience of women and a genuine desire to understand why there might be strongly held views on the importance of single-sex spaces for women’s privacy and, I would argue, full civic participation, as well as on other aspects of gender ideology, especially relating to children.

ArabellaSaurus · 09/11/2025 07:06

Yes. The issues are as I've laid out in my posts. Jade appears to have either not read, or failed to understand.

OnlyOnAFriday · 09/11/2025 07:15

That’s very naughty of them. Be very interesting to hear what their ethical approval from their institution was. Especially bearing in mind that, as previously pointed out on the Aston thread, Mumsnet posts are pseudonymous, not anonymous.

BoulevardOfBrokenSleep · 09/11/2025 07:38

I've read (skimmed) it, bless them.
I mean if you're writing for the journal of applied transgender studies - best not to ask what the research impact factor of that is - you're not exactly on the cutting edge of science, are you?

Apparently, as well as not believing males can actually be female (so backward of us!), we complain we could get into trouble at work for that belief, and that the media ignores our viewpoint. What silly little ladies we are to imagine these things.

Also among our crimes is to "denigrate sex workers" by saying 'prostitutes'... Or another way of looking at that could be that we respect their humanity and empathise with them too much to gloss over their systematic financial exploitation and sexual abuse? Potato/pot-ah-to innit.

I mean sure they're breaking the T&Cs, but to me, coming from a science background, it's quite funny that you can call something like this research. Also funny/appalling that for all the care we take not to be biased in our work, your Humanities guys can just pile on in there without questioning their basic assumptions in any way. I mean if Phipps wrote something equally waffly and conjectural in 2022 it must be correct, cite it and move on, right?

Helleofabore · 09/11/2025 09:15

Considering MN were very clear last time that scraping the site was against the T&Cs, I don’t think there has been a change in the T&Cs since then.

TalulaHalulah · 09/11/2025 09:41

OnlyOnAFriday · 09/11/2025 07:15

That’s very naughty of them. Be very interesting to hear what their ethical approval from their institution was. Especially bearing in mind that, as previously pointed out on the Aston thread, Mumsnet posts are pseudonymous, not anonymous.

The lead author seems to be at the University of Alberta. I checked their ethics pages and I would imagine they are relying on the exemptions in my screenshot.
But I would question the reasonable expectation of privacy bit and the anonymous bit. User content is publicly available, but it is generated with the expectation that it will be used in line with the MN terms and conditions, which prohibit data scraping (or did at the time this research was done) and say that research needs the MN site owners’ permission. The dataset created by scraping is against the T&Cs and is certainly not the researchers’ to give away.
And I think people can be identified, as we know from the many threads of ‘have you ever identified someone you know on MN?’. There are identifiers associated with the user generated data in the form of user names, which can be cross-referenced with other posts should someone be so minded. So it’s not anonymous data with no identifiers. I mean, the authors, sitting in Alberta, may not have a clue who someone is, but I know at least one poster IRL by what they post because they live around the corner from me and some of what they post is identifiable if you know the area.
It’s also unethical to use forum data with no ethics approval or consideration, as the UK research councils guidance makes clear.

TalulaHalulah · 09/11/2025 09:42

Image currently under review

JadeSquid · 09/11/2025 10:59

TalulaHalulah · 09/11/2025 06:59

The issue is, I think, that data scraping is against MN’s terms and conditions; indeed any data use from MN is against the T&Cs. Therefore users have a reasonable expectation that their data (and I include user generated data as data) will be used only in research or projects MN have agreed to.

I don’t wish to rehearse all the arguments made on the Aston thread, where, to be fair, there were also examples given of excellent research on MN as well as academic papers on how to do this ethically. Data scraping three thousand odd threads from one sub forum without consent from the site owner is not how to do this ethically. Especially as the scraped data is being offered to other researchers as well.

Personally, I would have no objection if someone wrote a paper on FWR with attention to the particular UK political and social context of the content, looking at it over time and, as a poster on the thread on this paper on FWR said, with reference to the kind of issues posted by women (the biological kind) over the rest of the site. That is to say, with some understanding of the lived day-to-day experience of women and a genuine desire to understand why there might be strongly held views on the importance of single-sex spaces for women’s privacy and, I would argue, full civic participation, as well as on other aspects of gender ideology, especially relating to children.

I don't think that's necessary, and I don't think you should need a licence to mine the data of publicly available posts. It seems more like people are worried about others knowing what they think about certain topics and thought they could share their views here with absolutely no consequence.

JadeSquid · 09/11/2025 11:03

TalulaHalulah · 09/11/2025 09:41

The lead author seems to be at the University of Alberta. I checked their ethics pages and I would imagine they are relying on the exemptions in my screenshot.
But I would question the reasonable expectation of privacy bit and the anonymous bit. User content is publicly available, but it is generated with the expectation that it will be used in line with the MN terms and conditions, which prohibit data scraping (or did at the time this research was done) and say that research needs the MN site owners’ permission. The dataset created by scraping is against the T&Cs and is certainly not the researchers’ to give away.
And I think people can be identified, as we know from the many threads of ‘have you ever identified someone you know on MN?’. There are identifiers associated with the user generated data in the form of user names, which can be cross-referenced with other posts should someone be so minded. So it’s not anonymous data with no identifiers. I mean, the authors, sitting in Alberta, may not have a clue who someone is, but I know at least one poster IRL by what they post because they live around the corner from me and some of what they post is identifiable if you know the area.
It’s also unethical to use forum data with no ethics approval or consideration, as the UK research councils guidance makes clear.

Wrong quoted post