I wasn't going to keep responding on this thread because I know the p.o.v. I'm advocating for isn't popular here, and I've never wanted to derail or disrupt threads.
I just want to say this and will then duly shut up: IMV, private or other ownership interests on the public internet shouldn't be able to gatekeep public content from public scrutiny. Think of all the sites on the public internet that ought to be open to scrutiny.
Creating T&Cs may protect a public site from being ripped for commercial use, if the site has an appetite for lawsuits, but IMV they won't protect it from public-interest journalism or academic research where approved research protocols are properly in place and complied with. I'm familiar with those, though I no longer work in HEIs.
Research protocols provide guidelines, risk assessments and hurdles for ethical approval. If university researchers don't conduct studies or handle data properly and as agreed, they will / should be internally sanctioned. If the university hasn't taken enough care to ensure compliance, it can be prosecuted. What the consequences are depends on the case. Because of the principles of public interest and academic freedom in democracies, the parameters of a study aren't simply decided by the institutions, publications or people who might be studied.
The new element for protocols is the way the public internet, specifically social media, combines citizens and publication, so protocols for data gathering fall somewhere between those for studying publications / media and those for studying groups of people. If you set up a private group hosted on the internet, whose content no-one can see unless they sign up with whatever data the organisers require, you have an expectation of privacy, just as you would if you met in person. But if the content of a site is publicly visible to consume, contributors are not private in the same way: contributors are the monetised content, and their position isn't that of private people communicating only with one another. The dilemma is how researchers can follow protocols that default to anonymity for the general public, in this case site users. This is the debate.
My view is that if data is handled well, further anonymising of users is possible. The biggest risk of breaching anonymity will always be the content users themselves publicly post. Sites can offer various aids to anonymity and to erasure of content (though the content may still persist elsewhere), and it's their responsibility not to allow the private data they hold on users to be accessible. The arguments about the status of user-contributors can go on, but a site can't be both a public resource, a monetised public entertainment or a public influence, and also private.
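For concreteness, here's roughly what I mean by "handled well": a minimal Python sketch of pseudonymising gathered posts with a salted hash, so published extracts can't be trivially matched back to handles. The field names and the per-study salt are my own illustrative assumptions, not anyone's actual method, and as I say above, it can't fix the bigger risk of self-identifying content in the posts themselves.

```python
# Minimal sketch of pseudonymising scraped posts with a salted hash.
# Field names and the per-study salt are illustrative assumptions.
import hashlib
import secrets

# One random salt per study, stored separately and never published,
# so pseudonyms can't be reversed by hashing guessed usernames.
STUDY_SALT = secrets.token_hex(16)

def pseudonymise(records):
    """Replace each handle with a stable, study-specific pseudonym."""
    out = []
    for rec in records:
        digest = hashlib.sha256(
            (STUDY_SALT + rec["user"]).encode("utf-8")
        ).hexdigest()
        out.append({"user": f"user_{digest[:12]}", "text": rec["text"]})
    return out

sample = [{"user": "alice99", "text": "a publicly posted comment"}]
print(pseudonymise(sample))
# Note: this only pseudonymises handles; it can't remove identifying
# detail users put in the text itself, which is the bigger risk.
```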
It's not possible to acquire individual consent from user-content contributors, an unidentifiable number of people contributing over time, the way it is from participants in old-fashioned field studies. You could argue this means you can't study sites at all, but I disagree, and I don't think I'm out of line with ethics panels on that. IMV the consent, or lack of consent, of site or platform owners should not decide whether a site can be scrutinised or studied.
The technologies which enable the public internet and social media aren't in the hands of its users, or even their governments, so it's important to be able to scrutinise web content independently. Public web content isn't as easy to study as printed or other physical records, just by the nature of the medium: it can take far longer and far more bodies to gather data from an ever-moving, multifaceted source of information. The biggest obstacle to getting the data is therefore time / funding, which is why researchers use software to speed up collection.

Using software to access private or hidden data is hacking and is illegal. Using software tools to retrieve and catalogue public content isn't, though what you then do with it is variously restricted. Academic researchers have access to commercial made-for-purpose software which requires user licences; this removes some independent control of the data from the researchers and is potentially problematic for that reason. Using self-built open-source tools to do the same job shouldn't IMV impede data gathering: it's not ethically different from doing it by hand, and the tools aren't the problem so much as what is done with the data. But that doesn't mean university managements won't self-censor, as they are increasingly nervous of litigation, increasingly commercial themselves and fearful of funding cuts.
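To illustrate the kind of self-built tool I mean, here's a minimal Python sketch that checks a site's robots.txt, fetches a list of publicly visible pages at a polite rate, and catalogues what was retrieved. The URLs, user-agent string and output file name are placeholders I've made up; it's a sketch of the approach, not any particular project's scraper.

```python
# Minimal sketch of a self-built tool for retrieving and cataloguing
# public pages. URLs, user agent and file name are made-up placeholders.
import csv
import time
import urllib.robotparser
from urllib.parse import urlsplit

import requests  # third-party: pip install requests

URLS = ["https://example.com/public-thread-1"]  # hypothetical public pages
USER_AGENT = "research-crawler/0.1 (contact: researcher@example.ac.uk)"

def allowed(url: str) -> bool:
    """Only fetch what the site's robots.txt permits for this user agent."""
    parts = urlsplit(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False  # if robots.txt can't be read, err on the side of not fetching
    return rp.can_fetch(USER_AGENT, url)

with open("catalogue.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "status", "retrieved_at"])
    for url in URLS:
        if not allowed(url):
            continue  # skip anything the site disallows to crawlers
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
        writer.writerow([url, resp.status_code,
                         time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())])
        time.sleep(2)  # throttle: no faster than a patient human clicking through
```

The throttling and the robots.txt check are the point: done this way, the tool is just a faster version of reading and noting down public pages by hand, which is why I don't see it as ethically different.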
Most barriers to research come down to money / resources, and to confidence in being supported by governments and the law: the allocation of funding and validation of institutions, or the threat of their withdrawal, can have a chilling effect on research. Universities, media companies, journalists and even governments are increasingly subject to the power of global internet technology barons to withdraw their technologies. For these reasons, I think the principle of public scrutiny of the public internet is a prime concern.
Aside from bad or illegal use of data, the issue with sites being studied isn't the production of papers like the one that generated this thread, but the lack of motivation for people researching in unis right now to produce better, more challenging studies from a wide variety of angles. Maybe working academics on MN can do that, hopefully unhampered by 'T&Cs'.