OK, I'm trying to get this clear in my head, bear with me.
So, you take a population, and divide it into four: 'more intelligent' girls, 'more intelligent' boys, 'less intelligent' girls, 'less intelligent' boys (scare quotes as I'm not clear exactly what 'intelligent' signifies here - perhaps higher scoring on an IQ test?).
Then, the single group 'more intelligent' girls is observed to perform less well than any of the three other groups at a particular style of test - one that becomes extremely hard in the middle, and then easier again.
My first question would be, over how many studies has this been observed? If it is in one study or a small number of studies only, how large are these studies, and how robust are their methodologies? Sorry to ask these questions, but it's very easy to get a superficially interesting result purely by chance.
If this effect has been observed over a large population and/or in several studies, then it becomes interesting.
I'm thinking that stereotype threat can presumably be ruled out, since if this were the issue, then 'less intelligent' girls would underperform too (and certainly not outperform 'more intelligent' girls).
Similarly, one would assume that a specifically gender-related way of approaching problem solving (e.g. more strategic) can be ruled out for the same reason.
Which, I guess, does suggest perfectionism, and a particular way of socialising 'more intelligent' girls, as a place to start looking.