Any statistics/research methods gurus want to hold my hand through my current masters module?
I'm doing a masters degree in Occupational Psychology. It's distance learning (part time) so I am finding it difficult to understand certain things and hard to get questions across to my tutors by email.
My current module is Research Methods. Technically this shouldn't be too much of a stretch as I already have an MSc in Psychological Research Methods, but that was 8 years ago. I could do with someone to give me some practical rather than abstract information, like, why is normal distribution so important anyway? And isn't removing outliers cheating?
Anyone out there with more experience than me who would hold my hand through the next few weeks?
No takers? I'll dip a toe in then.
I'm not sure that I understand your comment about "why is normal distribution so important anyway". It is what it is. If populations were spread evenly across the range then the normal distribution wouldn't apply and wouldn't matter. But, very often, populations cluster around the mid-point with a few variations up and down, i.e. they are normally distributed, so statistics has grown up to describe this very common pattern.
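Not from the thread, but the clustering idea can be sketched in a few lines of Python (all numbers hypothetical): if a score is the sum of many small, independent up/down influences, the scores pile up around the middle, and roughly 68% land within one standard deviation of the mean, as the normal curve predicts.

```python
# Sketch: a "trait score" built from many small independent influences
# clusters around the mid-point (hypothetical data, not from the thread).
import numpy as np

rng = np.random.default_rng(42)
# Each of 10,000 people gets a score = sum of 50 small +1/-1 influences.
scores = rng.choice([-1, 1], size=(10_000, 50)).sum(axis=1)

# Fraction of people within one standard deviation of the mean.
within_one_sd = np.mean(np.abs(scores - scores.mean()) <= scores.std())
print(round(within_one_sd, 2))  # roughly 0.68, matching the normal curve
```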
I could give you a practical example from my own experience if you want.
Removing outliers isn't cheating. It's the outliers themselves who are the 'cheats'. They are unrepresentative of the population so to draw generalisations from them would be misleading; it is better to exclude them.
Does that help? Keep posting and we'll try to get somewhere.
Thanks for the reply!
Re outliers: they are part of the population, aren't they? Unless you assume they are 'wrong' for some reason.
The normal population thing just seems like a high expectation. Say you were comparing how much people like David Cameron & were surveying Times, Guardian & Independent readers: you might expect clusters round the mean for partisan papers like the Times & Guardian, but perhaps the results from the Indie readers would be more spread out. The distribution might not be normal because of the question being asked. Does that make sense?
You cannot say for certain what a population looks like unless you survey 100% of it. Statistics allows you to take a sample, describe their attributes and then apply those findings to the whole population. The trouble comes when your sample is not representative so to apply data from that skewed subsection to the whole population gives erroneous results. In your Cameron example, it is not "due to the question being asked" but due to choosing a non-representative subsection. You could not apply the results to the whole country but you could apply them to (say) Left-wing, broadsheet-reading members of socioeconomic group B.
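The point about a skewed subsection giving erroneous results is easy to demonstrate. Here's a quick sketch (entirely hypothetical numbers): a random sample lands close to the true population mean, while a sample drawn only from a high-scoring subsection overshoots it badly.

```python
# Sketch: random vs biased sampling (hypothetical approval scores, 0-10).
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(5, 2, size=100_000)  # true mean is about 5

random_sample = rng.choice(population, size=1000)
# A "skewed subsection": only people who already score above 6.
biased_sample = rng.choice(population[population > 6], size=1000)

print(round(random_sample.mean(), 1))  # close to the true mean of 5
print(round(biased_sample.mean(), 1))  # well above it
```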
Again, outliers in the subsection could give you the problem of a skewed sample. I agree that (genuine) outliers can be part of the population, but if you end up with an unrepresentatively large number of outliers in the sample then you will get misleading answers. It is amazing how small some sample sizes are. For example, there are about 45 million voters in the UK, but when MORI etc. do an opinion poll to discover their intentions they only sample a few thousand. Detail here.
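The reason a few thousand is enough: for a well-drawn random sample, the margin of error depends on the sample size, not on the size of the population. A sketch of the standard 95% margin-of-error formula for a proportion:

```python
# Sketch: 95% margin of error for a polled proportion near 50%.
# It shrinks with sample size n; the 45m population size doesn't appear.
import math

def margin_of_error(n, p=0.5):
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1000) * 100, 1))    # about 3.1 percentage points
print(round(margin_of_error(10_000) * 100, 1))  # about 1.0
```

So a poll of 1,000 already pins the answer down to within about 3 points, and quadrupling precision requires sixteen times the sample, which is why pollsters stop at a few thousand.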
I suppose the correct answer is to investigate the outlier in more detail* or to repeat the survey. However, this means extra time and expense, so the more realistic answer is to exclude the outlier, especially if the problem has arisen not because they are genuine outliers but because of some other oddity, e.g. an error on the part of the researcher (quite common, apparently!)
If a population genuinely does have a lot of outliers then its distribution cannot be called normal; that goes against the definition.
*An example I read talked about the temperatures of items in a room. The table, the chair, the bowl etc. all had similar temperatures, which conformed to a normal distribution, but there was one outlier with a much higher temperature than the rest. When you realise that the outlier was a cooker, which you would expect to have its own temperature independently of the rest of the room, it's not such a surprise after all, and I'm sure you would agree that it should be excluded.
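That cooker example can be flagged automatically. A sketch using a robust (median-based) outlier test, with made-up temperatures: the median and MAD barely notice the cooker, so it stands out clearly, whereas the mean and standard deviation would themselves be dragged upwards by it.

```python
# Sketch: robust outlier flagging for the room-temperature example
# (hypothetical readings in degrees C; the last item is the "cooker").
import numpy as np

temps = np.array([20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 180.0])

median = np.median(temps)
mad = np.median(np.abs(temps - median))       # median absolute deviation
robust_z = 0.6745 * (temps - median) / mad    # scaled to match z-scores
outliers = temps[np.abs(robust_z) > 3.5]      # a common cut-off

print(outliers)  # [180.] - only the cooker is flagged
```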
I'm coming from a science angle, not social sciences, but in science it is bad practice to remove outliers: because you are dealing with proportionally very small sample sizes, you don't know whether the 'outliers' actually describe further trends in your data.
You should check your data for skew and kurtosis (to see if it is normally distributed or not). If it is normally distributed (which at least in biology is rare, so probably also the case in psychology), you can use fairly standard parametric methods to analyse your data. If it is not normally distributed you need to use non-parametric methods. The exact methods/tests will obviously depend on the exact research question you have.
If you apply parametric statistics to non-normally distributed data, you will get spurious results and hence the reason to use non-parametric methods.
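For the skew check mentioned above, here's a minimal sketch (hypothetical data; real analyses would normally use a package like SciPy, which has ready-made skewness, kurtosis and normality tests). Skewness near zero suggests symmetry; a large positive value means a long right tail, which is typical of things like reaction times.

```python
# Sketch: checking sample skewness before choosing parametric vs
# non-parametric tests (all data simulated for illustration).
import numpy as np

def skewness(x):
    # Sample skewness: the third standardised moment.
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(0)
normalish = rng.normal(size=5000)               # symmetric, bell-shaped
skewed = rng.lognormal(sigma=0.8, size=5000)    # long right tail

print(abs(skewness(normalish)) < 0.2)  # True: roughly symmetric
print(skewness(skewed) > 1.0)          # True: clearly right-skewed
```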
bump since I want to ensure my response is seen
Hi, yes, I'm still reading replies, thanks
I think it's the concept of the central limit theorem I'm struggling to get my head around.
This Andy Field video about the central limit theorem is quite helpful.
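The central limit theorem is also easy to see by simulation: start from something very non-normal (single dice rolls, a flat distribution) and look at the distribution of sample *means*. The means come out approximately normal around the true mean, which is why normality assumptions about means are less demanding than they first appear. A quick sketch:

```python
# Sketch: central limit theorem with simulated dice rolls.
import numpy as np

rng = np.random.default_rng(1)
# 5000 samples, each of 30 rolls of a fair die (flat, very non-normal).
rolls = rng.integers(1, 7, size=(5000, 30))
sample_means = rolls.mean(axis=1)

# CLT prediction: the MEANS are roughly normal around 3.5 with spread
# sigma/sqrt(30), so about 68% fall within one standard error of 3.5.
se = np.sqrt(35 / 12) / np.sqrt(30)  # sigma of a fair die is sqrt(35/12)
within = np.mean(np.abs(sample_means - 3.5) <= se)
print(round(within, 2))  # close to 0.68
```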
This is quite basic, but it covers why removing outliers isn't necessarily cheating, depending on what you are measuring. I do reaction time experiments; it's not cheating to take away very long responses made when people are sneezing, for example, which would otherwise disproportionately influence the mean in one condition. Of course, if you had more data (more people, a longer experiment, more of the full range of possible responses) there would be less need to remove outliers. The data in psychology is often normally distributed, btw.
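A sketch of that reaction-time point with made-up data (the 2000 ms cut-off is just an illustrative choice, not a universal rule): a handful of "sneeze" trials drag the condition mean up by a lot, and a simple fixed cut-off recovers the typical response.

```python
# Sketch: trimming implausibly long reaction times (hypothetical data, ms).
import numpy as np

rng = np.random.default_rng(2)
# Mostly ~500 ms responses, plus three "sneeze" trials in the seconds range.
rts = np.concatenate([rng.normal(500, 50, size=97), [2500.0, 3100.0, 2800.0]])

trimmed = rts[rts < 2000]  # illustrative cut-off for implausible responses

print(round(rts.mean()))      # mean pulled well above 500 by 3 trials
print(round(trimmed.mean()))  # back near the typical response
```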