# Talk

## Stats help?

(4 Posts)
Mrsmaudwatts Tue 13-Feb-18 11:34:09

Could anyone please help me with some basic stats? I'm doing an MA and I'm really struggling with the stats element. I did stats at undergrad but that was 20 years ago. My tutor is off on long-term sick leave and I have tried every possible avenue for help and drawn a complete blank. I have taught myself basic SPSS with books and YouTube tutorials but I'm confused. I feel presumptuous putting the detail here, but nothing ventured, nothing gained....

From the non parametric chi sq below, I need to know
1. Is this an appropriate test for this data?
2. If it is, how do I interpret the result? It's significant, but that's all I can tell. Do I need to run another test to infer anything more useful?
*these aren't the real questions!

Q "which do you prefer?" Choose one only:
Tea
Coffee
Water

106 people chose Tea
50 people chose coffee
13 people chose Water

I want to know if the difference between these scores (converted to %) is significant, so I have run a chi-sq (after watching a tutorial). Assuming this is the correct test for this data, does that mean I can say "participants prefer tea"? Can I say anything about the other drinks (e.g. that water is the least preferred)? I've changed the content as my real survey question is quite long!

Chi output

TEA
Observed: 50
Expected: 56.3
Residual: -6.3

COFFEE
Observed: 106
Expected: 56.3
Residual: 49.7

WATER
Observed: 13
Expected: 56.3
Residual: -43.3

Chi-Square 77.834a
df 2
Asymp. Sig. .000

a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 56.3

This is a non-parametric chi-squared test, as an SPSS tutorial suggested (Analyze / Nonparametric Tests / Legacy Dialogs / Chi-square is what I've done here).

Thanks so much if anyone with any stats knowledge has the time and inclination to help me. I'm in such a muddle

Abitstatty Tue 13-Feb-18 15:03:29

My advice would be to try to not think about how you feel about it (that it's compulsory, unfair and tutor is off sick) and just get it done - it's not as hard as it looks, and there is so much help online. If you do it and interpret it correctly you will pass; there is no magic involved.

Briefly and very, very informally: the SPSS output is telling us how far what we see in the breakdown of drinks is from what we would expect if there were no variation across choices. What do we see? Lots more coffee than expected, and a lot less water than expected if there were no differences across choices. Tea is close, but we are looking at the three choices as a group.

The formula to calculate chi-sq is available online and easy to do by hand, but SPSS does it for you anyway. There isn't a rule of thumb above which chi-sq must be to be 'significant' but anything above double figures when we don't have many categories is getting there.
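As a rough illustration of the by-hand calculation, here is a minimal Python sketch. It assumes the observed counts from the SPSS output above and equal expected counts across the three options; the variable names are just for illustration.

```python
# Observed counts as they appear in the SPSS output in this thread.
observed = {"Tea": 50, "Coffee": 106, "Water": 13}

# With no preference, every option is expected to get the same share.
total = sum(observed.values())
expected = total / len(observed)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((count - expected) ** 2 / expected for count in observed.values())

print(f"expected per option: {expected:.1f}")   # 56.3
print(f"chi-squared: {chi_sq:.3f}")             # 77.834
```

The two printed figures match the "Expected" and "Chi-Square" values SPSS reports.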

Then, how much larger is chi-sq than what we would expect if the distribution were uniform? The p-value (which SPSS refers to as Sig for significance) tells us. Our usual standard in the social sciences is 0.05 - at that point there are 5 chances in 100 that we observe differences or variation that extreme through random chance, when in fact there are no 'true' differences. In everyday life we're generally comfortable with that level of doubt (less so if our lives depended on it). This difference between the sample and the underlying population can arise because we are dealing with samples and sometimes you just get a fluke sample.

Here, the p-value has been rounded to 0.000 (to three decimal places) so there is less than 1 chance in 1000 that we observe that variation in drink choices in our sample when in reality there is none in the underlying population.
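To see what sits behind the rounded .000, here is a small sketch. It relies on the fact that with exactly 2 degrees of freedom the chi-squared tail probability has a simple closed form, p = exp(-x / 2); that shortcut does not hold for other df, where a stats package would be needed. The variable names are illustrative.

```python
import math

chi_sq = 77.834   # value reported by SPSS in this thread
alpha = 0.05      # usual social-science threshold

# For df = 2 only: P(chi-squared >= x) = exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

# Smallest chi-squared value that is significant at alpha, again for df = 2 only.
critical_value = -2 * math.log(alpha)

print(f"p-value: {p_value:.2e}")
print(f"critical value at {alpha}: {critical_value:.3f}")   # 5.991
print("significant" if chi_sq > critical_value else "not significant")
```

The p-value comes out at roughly 1e-17, far below 0.001, which is why SPSS displays it as .000.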

Is it an appropriate test? It's a one-way chi-squared goodness-of-fit test so if you are testing for goodness of fit, it's fine. Chi-squared tests tend to be a bit 'low power' at detecting genuine differences - but for categorical data and at introductory level there aren't many other options.

HTH. Not a proper statistician by the way but do have to use crosstabs and chi-sq tests quite often in my work. I'm spending a research day doing admin & catch up tasks, wouldn't usually comment during the day.

Mrsmaudwatts Tue 13-Feb-18 15:30:03

Ah!! I hadn't spotted that the "expected" figure is just an average: total responses / no. of options.

Right, that now makes so much sense!

So, I think I can fairly confidently say that coffee is the drink of choice....? The chance of this being wrong is 1/1000 ?

If tea, coffee, and water had all been closer scores (or 56.3 for example) I'd be saying confidently there is no preference over the other.

I have about 30 to interpret. Then I'm attempting to move on to my continuous variables and doing anova....

I'm very thankful to the internet, in particular mumsnet and YouTube right now! I'm quite enjoying the challenge but I'm not very confident with figures generally so I get somewhere and then doubt myself and that's where I'm struggling, no one to check things with. The internet is helpful but there is so much disagreement on a lot of stats, what test to use etc and I'm not confident enough to make a call either way.

But thank you!!!!!

Abitstatty Tue 13-Feb-18 15:43:03

Not disagreement so much as we have choices regarding which test to use - for which we're accountable.

The test is just showing that there is more variation than we would expect by chance if the distribution between the three choices were equal. It doesn't specifically tell us that coffee is the most popular - but we can make a graph, use our eyes and say so.
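One informal way to back up what the graph shows is to look at standardised residuals, (observed - expected) / sqrt(expected). This goes a step beyond what was run in SPSS above, so treat it as a sketch; the cut-off of about 2 in absolute value is a common rough guide and not a formal test.

```python
import math

# Observed counts as they appear in the SPSS output in this thread.
observed = {"Tea": 50, "Coffee": 106, "Water": 13}
expected = sum(observed.values()) / len(observed)

# Standardised residual for each option: how far it sits from the expected count.
residuals = {
    drink: (count - expected) / math.sqrt(expected)
    for drink, count in observed.items()
}

for drink, value in residuals.items():
    flag = "notable" if abs(value) > 2 else "close to expected"
    print(f"{drink}: {value:+.2f} ({flag})")
```

With these counts, coffee comes out well above and water well below the rough cut-off, while tea stays close to expected.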

"If tea, coffee, and water had all been closer scores (or 56.3 for example) I'd be saying confidently there is no preference over the other."

Depends on the p-value/sig. (but - probably, yes).

Keep going - though doing this one by one for 30 variables sounds a complete bore :-)
