Here is the evidence for just how unreliable GCSE and A level grades are. The picture is a slide from a presentation entitled "Quality of marking: confidence and consistency" given by Michelle Meadows, a Director of Ofqual, at a meeting held in June 2017 (see www.gov.uk/government/news/presentations-from-ofquals-summer-series-symposium-2017).
The picture answers the question "What is the likelihood that a script in a given subject will be given a different grade if the original grade is appealed?". That is an important question, because the answer shows how reliable, or unreliable, the originally awarded grade is.
As the picture shows, for GCSE physics there has been, in each of the last four years, a 15% probability that a script would be re-graded on appeal. That implies that 15 candidates in every 100 were awarded the wrong grade when they opened their envelope. For English Language, 30 candidates in every 100 were awarded the wrong grade; for history, 40.
These numbers – 15 for physics, 30 for Eng Lang, 40 for history – are averages over the entire marking range from 0 to 100. Another Ofqual report shows that the probability of being awarded the wrong grade increases to 50% or more for scripts marked close to a grade boundary (see Figures 12 and 13 of "Marking Consistency Metrics", November 2016, www.gov.uk/government/publications/marking-consistency-metrics). So, if a candidate’s mark is near a grade boundary, tossing a coin might be fairer!
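Why the grade boundary matters so much can be seen in a toy model. The sketch below assumes, purely for illustration, that each marker awards the script's "true" mark plus an error equally likely to be any whole number from -f to +f, and that the original mark and the re-mark are independent. The function name `disagreement_probability` and this error model are assumptions, not Ofqual's methodology. The function computes the exact chance that the original mark and a re-mark fall on opposite sides of a single boundary.

```python
from fractions import Fraction


def disagreement_probability(true_mark: int, boundary: int, f: int) -> Fraction:
    """Exact probability that an original mark and an independent re-mark
    land on opposite sides of one grade boundary.

    Toy model (an assumption for illustration): each marking awards
    true_mark + e, with e equally likely to be any integer in [-f, f].
    """
    errors = range(-f, f + 1)
    # p: chance that a single marking reaches the boundary (higher grade)
    above = sum(1 for e in errors if true_mark + e >= boundary)
    p = Fraction(above, len(errors))
    # Disagreement: one marking above the boundary, the other below it
    return 2 * p * (1 - p)


# Scripts whose true marks straddle a hypothetical boundary of 65, with f = 2
for mark in range(60, 70):
    print(mark, float(disagreement_probability(mark, boundary=65, f=2)))
```

With f = 2 and a boundary of 65, the chance of disagreement is zero for true marks of 62 or below and 67 or above, rises to 32% at 63 and 66, and peaks at 48% at 64 and 65. The toy model reproduces the shape described in the Ofqual report, not its actual figures.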
For English Language, the figure of 30 refers to grading on the A*, A, B, ... scale. This year, the figure could be as high as 45 candidates in every 100 getting the wrong grade, because of the change to grading on the 9, 8, 7, ... scale: 6 grades are now squeezed into the same ‘space’ previously occupied by 4, which narrows each grade (45 = 30 × 6 / 4), as explained on page 21 of "Marking Consistency Metrics".
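The figure of 45 is a simple proportional estimate. A minimal sketch of that arithmetic, assuming, as the text does, that the misgrading rate scales with the number of grades fitted into the same mark range (the function name `scaled_misgrade_rate` is hypothetical, and the cap at 100 is added only to keep the result meaningful):

```python
def scaled_misgrade_rate(old_rate: float, old_grades: int, new_grades: int) -> float:
    """Rough estimate of wrong grades per 100 candidates when the same mark
    range is split into more, narrower grades. Assumes the rate scales in
    proportion to the number of grades (an approximation, not a model)."""
    return min(100.0, old_rate * new_grades / old_grades)


# English Language: 30 per 100 on A*, A, B, ...; 6 new grades replace 4 old ones
print(scaled_misgrade_rate(30, old_grades=4, new_grades=6))  # 45.0
```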
These problems arise because different markers can legitimately give the same question slightly different marks. So a script might be marked 64, or perhaps 66. If the grade boundary is 65, this matters: a mark of 64 is grade C; a mark of 66 is grade B. And if an appeal is made for the script marked 64, the re-mark might be 66, and the C is up-graded to a B. In reality, the mark given to the script is not exactly 64, or 66: the script should be marked, say, 64 ± 2, where the ± 2 represents the legitimate variability in marking. In general, if this variability is represented as f marks (f = 2 in the example), a number that can be reliably measured, then any script given a mark m (m = 64 in the example) should be represented as m ± f (64 ± 2 in the example).
The candidate’s grade can then be determined not by m (= 64, grade C), but by giving the candidate the “benefit of the doubt” and basing the grade on m + f (= 66, grade B). If an appeal is made, it is very likely that the re-mark will not exceed 66, and so the originally awarded grade will be confirmed. The grade is therefore robust under appeal, and so reliable.
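The "benefit of the doubt" rule can be sketched in a few lines. This is a minimal illustration, not an official procedure: the grade boundaries below are hypothetical (only the B boundary of 65 comes from the example above), marks are assumed to be out of 100, and the function names are invented for the sketch.

```python
from bisect import bisect_right

# Hypothetical boundaries (lowest mark for each grade from E upward), out of 100.
# Only the B boundary of 65 comes from the example in the text; the rest are invented.
BOUNDARIES = [40, 50, 57, 65, 73, 80]
GRADES = ["U", "E", "D", "C", "B", "A", "A*"]


def grade_index(mark: int) -> int:
    """Position in GRADES of the band containing `mark` (higher is better)."""
    return bisect_right(BOUNDARIES, mark)


def grade_for(mark: int) -> str:
    return GRADES[grade_index(mark)]


def benefit_of_doubt_grade(m: int, f: int) -> str:
    """Award the grade for m + f rather than m, where f is the measured
    legitimate marking variability for the subject."""
    return grade_for(m + f)


def upheld_on_appeal(m: int, f: int, remark: int) -> bool:
    """True if the re-mark earns no higher grade than the one awarded."""
    return grade_index(remark) <= grade_index(m + f)


print(grade_for(64))                  # C
print(benefit_of_doubt_grade(64, 2))  # B
# Every re-mark within 64 ± 2 confirms the awarded B
print(all(upheld_on_appeal(64, 2, r) for r in range(62, 67)))  # True
```

Any re-mark inside the band m ± f earns at most the grade already awarded, which is what makes that grade robust under appeal. A re-mark outside the band, such as 73 here, would still change the grade, so the approach depends on f being measured reliably.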
Giving the “benefit of the doubt” ensures that no candidate is awarded a grade lower than he or she deserves, no candidate is disadvantaged, no candidate loses important life chances. Nor does this idea drive grade inflation – as described in the blogs on www.silverbulletmachine.com.