Getting a top mark on an end of topic test doesn't mean they've made any progress on the previous topic.
That is true, but if every setting test covers two topics, and there are two setting tests before a set move, then surely being put in set 1 after two setting tests shows you are doing well relative to the cohort. That gives you some real data. If you don't get put in set 1 because one test is good and the other isn't, it might be because you had a bad day, or because you didn't understand topics 1 and 2. If the test shows you lost most of your marks in a particular topic, and you get put down and your best friend doesn't, that suggests you are weak on a particular topic and she isn't.
Of course topics build on one another, so that certain Year 8 topics build on Year 7, for example. But so what? If one does badly on an end of topic test in Year 8, it might be because one never understood the work in Year 7 in the first place, or because one didn't understand the build. But the bottom line is that the test shows one is not performing at the level expected of Year 8, which is what a parent wants to know.
What a parent wants to know (and therefore hopefully what Ofsted should be testing for, and it is the only thing Ofsted should be testing for) is: how is my child doing in terms of what it is reasonable to expect at this point?
It is like a journey to the top of a mountain. Sure, the really tough cliff at the top of the mountain might prove impossible for me to climb, no matter how well I have navigated the bottom slopes. But if I am struggling with the bottom slopes, and am not far enough up the path, I won't even get as far as the cliff face at the top. If I get up the bottom slopes well, then I am clearly in reasonable shape to attempt the cliff face.
I don't believe any parent is asking for confirmation that their kid will be able to climb the cliff face of GCSE, based on what is going on in Year 7 or 8. But surely the opposite must be true. If they are not successfully navigating the lower slopes to the point most kids reach at this age, it will be that much harder (not impossible, but that much harder) to attempt the cliff face when that time comes. And if most kids who have had this level of preparation do then go on to navigate the cliff face, it is not unreasonable to mark certain points on the journey as adequate or not adequate.
And clearly kids have different strengths and weaknesses. I have one kid who handles algebraic equations with ease and is most likely to fall down on the geometric type, and another who struggles with the abstract nature of algebraic equations but can see geometric relationships instantly. Very different kinds of mathematicians. Both can do very well, ultimately, but both also need to know which types of questions they will almost certainly lose marks on, and need to work on, even if they both get the same marks in a test (a 9 or 8 or whatever).
Because I have some mathematical background I can see this. If I didn't, I would ideally want the teacher to be able to tell me, but I suspect it is likely they will not be able to (although topic tests should help). I might also ideally want to have some idea of the ratio of algebraic type questions to geometric type questions in the syllabus. So long as it is kept in roughly the same ratio all the way through, my kids are less likely to face major changes. If Years 7 and 8 are all geometry and then GCSE is all algebra, sure, what my kids get in Years 7 and 8 is not going to give a good picture of how they will do at GCSE. But if the subjects in Years 7 and 8 are well chosen because they will build to GCSE, then some sort of averaging of topic tests should give me a reasonable picture of how well they are prepared for the GCSE.
And surely a cohort of 100-200 kids is likely to be large enough to be somewhat statistically significant, in which case how that cohort does on a setting test should give me a reasonable feel for how my DC are positioned (unless the teaching is exceptionally poor, or brilliant, in which case the results will be correspondingly lower or higher).
Isn't this the sort of data that should be being produced?