
1 in 4 GCSE and A level exam results could be wrong.


ehughes7777 · 29/10/2018 21:10

1 in 4 GCSE and A level exam results could be wrong. Can I please refer you to the Facebook page of CAGE - Campaign Against Grade Errors, and the petition at Change.org. You could help do something about it and end this unfairness:

www.facebook.com/CAGE-Campaign-Against-Grade-Errors-GCSE-254326905277715/?modal=admin_todo_tour

www.change.org/u/me?source_location=my_petitions_dropdown

Please support our efforts to get the Government to review this.

Thank you.

OP posts:
goodbyestranger · 02/11/2018 12:28

Agree. Of course it matters if grades are wrong, whether towards the top end or the bottom end or in the middle. On principle it matters and in practical terms it matters.

dennishsherwood · 02/11/2018 15:37

"Does it matter?"

A few months ago, I was at a meeting attended by the great-and-the-good to discuss all this. In response to what I was saying, one of the greatest-and-the-goodest (alas, the Chatham House Rule prevents me from saying more) responded, "Well, let's suppose for a moment that what you are saying is valid, and that 1 grade in 4 is wrong. But they're wrong both ways. So a student taking 8 GCSEs has two wrong grades, one too high, one too low. It all comes out in the wash, so it doesn't matter." Most of the heads around the table nodded wisely, and the meeting moved on to something else.

I couldn't believe my ears. Yes, it is true that grades are wrong both ways. But to suggest that a higher grade in [this] offsets a lower grade in [that] is preposterous. Yet however preposterous it might be, it was said, and said by someone in a very important role.

So if this does matter to you, please say so. We need to have a strong and loud voice to make those careless hierarchs listen!!!

CherryPavlova · 02/11/2018 18:09

Pearson and AQA do let you have your script returned prior to getting re-marks. From 2019/2020 teachers will have free online access to scripts. It currently costs about £11 in administration charges, but you have time to look at the paper before the review-of-marking deadline.

noblegiraffe · 02/11/2018 18:16

AQA can’t send you a script before a re-mark for GCSE. They have to have this in place for 2020. Ofqual said all exam boards have to by then; Edexcel are ahead of the curve in allowing free access to all scripts.

LilyTheLanternPhantomMumsnet · 02/11/2018 18:19

We're just moving this over to our petitions topic. Flowers

CherryPavlova · 02/11/2018 22:52

Noblegiraffe, you are wrong. AQA can definitely send your script before the review of marking deadline. The 2020 deadline is for free, online access to scripts, and is not a regulatory requirement from Ofqual.

noblegiraffe · 03/11/2018 00:43

No I’m not. Not for GCSE. “Following Ofqual's decision about post-results, we will not provide priority copies of summer 2018 GCSE papers. We're reviewing this for future series.”
www.aqa.org.uk/exams-administration/results-days/post-results/copy-of-marked-paper

roundaboutthetown · 03/11/2018 09:15

Sounds to me like "wrong" is a matter of opinion here (ie different markers can have different opinions, not that they have generally done anything wrong), and that this issue has existed since exams marked by human beings were invented. When have any public, national exams been marked by the same person??? It has never, ever been the case that exam results are a reliable marker of someone's ability, skills and knowledge, and never will be. However "fuzzy" an exam grade, you aren't going to select someone with a lower fuzzy exam grade over someone with a higher fuzzy exam grade unless persuaded to by another human being you like more, who gives you yet another opinion, not fixed facts indisputable by other human beings...
So, what are people really hoping for here? That, eg, the taxpayer should pay for every single exam paper to be marked by five different people and then their marks averaged out for the final result, so as to reduce the risk of being shafted by a mean examiner? Or that schools should be allowed to see a copy of all papers and challenge all marks they do not like without risk of financial penalty, even though this is also phenomenally expensive and unfair, because then children are at the mercy of the determination and quality, or lack thereof, of their school and the ability of administrative systems to cope with being swamped by appeals? I'm failing to see what can be done about it to make it genuinely more fair and less opinionated that is feasible and affordable.

CherryPavlova · 03/11/2018 09:19

Actually most exam papers are marked by several different people. No paper is marked in its entire form. Questions are divided up and each sent to different markers. It’s not about one dodgy marker.

CherryPavlova · 03/11/2018 09:22

Apologies noblegiraffe. A level scripts can be returned in good time but GCSE papers take the longer route. This is so priority is given to ensuring as many young people as possible get their choice of university places.

roundaboutthetown · 03/11/2018 09:30

CherryPavlova - different people marking different bits of an exam paper is not remotely the same thing as 5 people marking the same bits of an exam paper and then averaging out their scores, though, it just means you have an inconsistently marked exam paper by anyone's standards, because it is littered with different people's opinions Grin.

dennishsherwood · 03/11/2018 10:08

Roundaboutthetown asks 'what is "wrong"?'. Pragmatically, if a grade is changed after an appeal, then, from the candidate's point-of-view, the original grade is "wrong". But determining the "right" grade is not so easy, and you're right about the impracticality of multiple marking.

So maybe a more useful term is "reliable" - a "reliable" assessment being one that has a very high probability (say, 99.9% or more) that the original assessment is confirmed if the same script were marked/reviewed/re-marked by any other one (or more) markers.

The current assessment in terms of grades fails this reliability test miserably. But if the assessment is done in another way, without the use of grades and without the need for hard-edged grade boundaries, then all sorts of possibilities open up - possibilities that are practical, inexpensive, and - most importantly - fair. The statistician, and senior examiner, Neil Sheldon neilsheldon.net has done some very good thinking on how this can be done, for real.

noblegiraffe · 03/11/2018 10:16

I looked at Neil Sheldon’s website and there’s nothing on it. What is his thinking about how this can be done?

dennishsherwood · 03/11/2018 11:11

One way is as briefly presented in The Sunday Telegraph, www.telegraph.co.uk/education/2018/08/25/million-gcse-exams-could-open-challenge-due-tounreliable-grading/, and as we have discussed before. This idea dispenses with grades altogether, and awards the candidate's 'raw' mark, say, 69 (say, current grade 7). But we know this mark is 'fuzzy', and that another marker might have given 72, or whatever. So the 'fuzziness' is also declared as, say, ± 5. The certificate therefore shows 69 ± 5 (in whatever format is easily meaningful), clearly showing that any mark from 64 to 74 is (as Ofqual would say) "reasonable". So there is no meaningful difference between this candidate, and one whose 'raw' score is 70 (say, current grade 8), also ± 5. Certainly, this new way of presenting assessments needs to be communicated clearly, and understood correctly, but I'm sure that's possible. And the claim that it's not possible should not be accepted as a reason to perpetuate the current injustice of unreliable grades.

There are two further things that need to be considered. Firstly, the ± 5 is a property of the subject exam, not a particular candidate's individual script. So ± 5 applies to all candidates doing, say, geography. For maths, the number might be ± 2; for English Lit, ± 8. To determine this number, once all scripts have been marked, the exam board could take one script marked, say, 62, and give that same script to each of, say, 50 different markers for a fair, 'blind', re-mark. Some of those re-marks will give 62, but others won't. This set of 50 re-marks will form a distribution, from which the 'fuzziness' can be determined. This process can be carried out for as many scripts originally marked 62 as you like, and also for scripts originally marked 40, 90, whatever. This will give a sensible average value for the subject's 'fuzziness'. A professional statistician can get the details of all this right.
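To make the estimation step concrete, here is a minimal sketch of how a distribution of blind re-marks could be turned into a ± number. The re-mark values and the "two standard deviations" rule are my assumptions for illustration; as I say, a professional statistician would choose the exact rule.

```python
import statistics

def estimate_fuzziness(re_marks, coverage=2.0):
    """Estimate a subject's 'fuzziness' (the ± half-width) from blind
    re-marks of one script. Using ~2 sample standard deviations as the
    half-width is an assumption here, not the definitive method."""
    sd = statistics.stdev(re_marks)  # sample standard deviation
    return round(coverage * sd)

# Hypothetical blind re-marks of a script originally marked 62
re_marks = [62, 60, 64, 61, 63, 65, 59, 62, 66, 58]
print(estimate_fuzziness(re_marks))  # → 5, i.e. report marks as ± 5
```

Averaging this estimate over many scripts, at several original marks, would then give the subject-wide figure.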

The second matter concerns appeals. Suppose that a candidate assessed as 69 ± 5 appeals, and suppose that the fair re-mark is 71. This is to be expected, for 71 is within the range 69 ± 5. So the original assessment is confirmed. The same applies to all re-marks in the range 64 to 74. That's why this process is robust under appeal.

But if the re-mark is, say, 76, that's outside the range 69 ± 5, and is indicative of a (significant) marking error. So the assessment is changed to 76 ± 5.
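The appeal rule above is simple enough to sketch in a few lines; the function name is mine, but the logic is exactly as described: a re-mark inside the declared range confirms the original, and one outside it replaces it.

```python
def review_mark(original, fuzziness, re_mark):
    """Appeal rule: a re-mark within the declared ± range confirms the
    original assessment; a re-mark outside it indicates a significant
    marking error, so the assessment changes to the re-mark."""
    if original - fuzziness <= re_mark <= original + fuzziness:
        return original  # confirmed, e.g. 71 is within 69 ± 5
    return re_mark       # changed, e.g. 76 is outside 69 ± 5

print(review_mark(69, 5, 71))  # → 69 (original confirmed)
print(review_mark(69, 5, 76))  # → 76 (assessment changed)
```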

Two last points to complete the picture. Any marking error within the ± 5 range is masked by the measure of 'fuzziness', and so is undetectable (unless you specifically look for it, but why would you?). This shouldn't matter.

But the point that does matter is that, if the fair range is 69 ± 5, then it is possible that the first mark might have been 74, implying that the assessment, as shown on the certificate, would be 74 ± 5, which might make a real difference. This is true, but if the measure ± 5 has been determined statistically properly, then the likelihood that this will be the case can be limited. But it is still possible. In which case the appeals system should pick it up, for if the awarded assessment is 69 ± 5, whereas the "real truth" is 74 ± 5, there is a 50% probability that a fair re-mark would be 75 or greater.

This is not perfect, but no system will ever be totally so.

The key question is "Is this more fair than the status quo?" I think it is.

An assessment awarded as [mark] ± [fuzziness] is one possibility... there are some others...

Does this make sense?

noblegiraffe · 03/11/2018 11:30

There is no way in hell that a system that reports raw marks will be better than the current system, regardless of confidence intervals.

E.g. Edexcel’s maths exam is tough this year, you’d need 60% for an A.
OCR’s was easier, you’d need 70% for an A.

In your system, instead of awarding those two marks the same grade which takes into account the difficulty of the exam, you’d make an OCR candidate look better than an Edexcel one by presenting the raw mark.

roundaboutthetown · 03/11/2018 12:42

Oh, ffs. To the vast majority of employers, the number before the + or - is what they will pay attention to (eg 69), and they won't give a toss about how accurate that is in the + or - 5 sense, because they have no way of knowing how that particular candidate was affected by that - statistics are a load of old bollocks to the individual case. It would probably be better to say the candidate got a mark in the range of 64-74 and nobody knows where in that range they really were, or whether they fairly fall into that range at all... and that isn't helpful to anyone!

The only way to make exams genuinely more "fair" in this sense is to dumb them down so much that there is virtually no room for differences of opinion on what makes a good answer - and there is nothing I loathe more than the idea that exams should be a tick box exercise where only one answer, structured in a particular way, is "correct." That's taking the idea of schools being exam factories to the extreme, if all students have to be trained to make robotic responses in order for the exams to be "fair."

dennishsherwood · 03/11/2018 12:58

Noblegiraffe: good point. The 'raw' mark, as I refer to it, is the mark after any adjustment made by the regulator to attempt to ensure equity across the different exam boards - exactly as happens today. A more subtle point is that the value of the fuzziness is quite likely to be different for different boards, and this idea would make that explicit. Fuzziness differing between boards is already the case today - but it is hidden. It's still there, all the same.

Roundaboutthetown: another good point. But as I said, declaring the (adjusted, thank you, NG) raw mark plus a measure of the fuzziness is just one possibility; two others address, and resolve, exactly the point you make.

titchy · 03/11/2018 14:47

If the value of the fuzziness changes from one exam board to another, how on Earth are employers and educational institutions supposed to deal with that? The system HAS to be simple to use and understand - as well as reasonably accurate. Introduce your system of fuzziness to increase accuracy (except it doesn't increase accuracy, just clarifies the marks for pedants), and you immediately lose simplicity - and simplicity and ease of use are pretty vital.

dennishsherwood · 03/11/2018 16:03

Simplicity. Yes. One way of making things more simple is for there to be one exam board for each subject. There can still be "competition", for different boards can administer different subjects, with the competitive threat that any misdemeanours can be punished by the loss of the 'franchise'.

The current system is deeply rotten, but most of the rot is hidden. The vast majority of the victims - all those young people with unreliable, "wrong", grades - don't even know that they have been awarded the wrong grade. That's hundreds of thousands of DS's and DD's each year.

Covering things up certainly protects the vested interests. But covering things up just perpetuates the problem, and the damage done to all those victims.

I'm arguing that the reliability of grades should be measured, and made public. This does not happen now, despite the great number of statistics published by Ofqual. Once more people realise the extent of the problem, there will be pressure to improve things. What that might look like, I don't know: yes, I have some ideas, but there are surely others too. These need to be identified, and wisely evaluated - alongside the maintenance of the status quo - to identify the best way forward.

If you click on this link cerp.aqa.org.uk/sites/default/files/pdf_upload/CERP_RP_MM_01052005.pdf, you will find a paper, published by AQA in 2005. It is a good read. Here is a sound-bite from page 70:

"However, to not routinely report the levels of unreliability associated with examinations leaves awarding bodies open to suspicion and criticism. For example, Satterly (1994) suggests that the dependability of scores and grades in many external forms of assessment will continue to be unknown to users and candidates because reporting low reliabilities and large margins of error attached to marks or grades would be a source of embarrassment to awarding bodies. Indeed it is unlikely that an awarding body would unilaterally begin reporting reliability estimates or that any individual awarding body would be willing to accept the burden of educating test users in the meanings of those reliability estimates."

Embarrassment. Yes. And that statement that "it is unlikely that an awarding body would unilaterally begin reporting reliability estimates" is insightful too - no-one likes to have their dirty washing paraded in public. But if the exam boards won't declare the truth, what is the role of the regulator?

Especially since the lead author of this report is Dr Michelle Meadows, who, in 2005, worked at AQA.

Today, Dr Meadows is the Executive Director of Strategy, Risk and Research... at Ofqual.

TeenTimesTwo · 03/11/2018 16:29

I've an idea.

Why not have fewer grades, and thus wider bands, and then maybe some extra checks put in place at the boundaries?

Maybe just 3 passing grades, perhaps called A, B and C.
And then maybe 2 lower grades D and E.

roundaboutthetown · 03/11/2018 17:11

dennishsherwood - I agree the current system is deeply rotten. I disagree that it is hidden at all. It doesn't take a genius to notice that with numerous different exam boards setting different exams for the same subject, there is never going to be a level playing field. It's equally obvious that marking is a fiasco, because there have been complaints for years about consistency, reliability, grade inflation and incompetence. It is nevertheless one thing to assess an exam board's competence by forcing it to publish reliability estimates and another thing to f*ck about with the way individual students' grades are reported, as though that helps anyone decide how good a candidate is, rather than how rubbish the exam board was that their school chose.

noblegiraffe · 03/11/2018 18:15

Parents don’t understand confidence intervals when it comes to Progress 8 and just assume a 0.1 is better than a 0.05 regardless of the extra info.
It would be the same with marks.

‘Raw mark’ means the actual unadulterated score in the test. If you want to fiddle with it to account for test difficulty these are called UMS (uniform mark scale). If you call it raw mark, people will get confused.

ehughes7777 · 27/11/2018 22:37

Ofqual have released information today, which confirms grade errors are as bad as we thought. Did Ofqual know this information before they introduced the new system for reviews/appeals, that makes it impossible to appeal unless a school will do it on your behalf?
Please support our attempts to change this by following our Facebook Page - Campaign Against Grade Errors (CAGE). Thank you.
assets.publishing.service.gov.uk/…/Marking_consiste…
www.gov.uk/gov…/publications/marking-roundtable-2018
www.gov.uk/…/opportunities-for-improving-quality-of…

OP posts:
dennishsherwood · 27/11/2018 23:17

As a supplement to ehughes7777's recent messages, towards the end of the summary of the Ofqual document entitled "Marking Consistency Metrics - An update", at the bottom of page 4, you will find these words:

"The probability of receiving the definitive grade or adjacent grade is above 0.95 for all qualifications, with many at or very close to 1.0 (ie suggesting that 100% of candidates receive the definitive or adjacent grade in these qualifications)."

You might like to read that again, and then think about how many job applications, university applications, apprenticeship applications... pay any attention to "the adjacent grade".

In plain English, Ofqual are stating that a grade 7, say, might be an 8, or a 6... we just don't know... and that for some subjects (including English Language and history - see Figure 12 on page 21) there is a 5% probability (meaning for 5 candidates out of every 100) that the right grade might in fact be a 9... or perhaps a 5...

Does this matter? Do you care?