I have looked into the research on the "greater male variability hypothesis" in test-taking, the idea that male test-takers outnumber female test-takers in both tails of the score distribution, and I have a few extra points which are worth thinking about when interpreting those test findings:
- Who takes the tests? That is, are the samples randomly drawn from the underlying populations?
- Which sex uses guessing more in test-taking?
- What is happening to the extreme tail differences over time?
On the first point:
The people taking IQ tests or scholastic tests are usually not random samples of the underlying populations. Different countries, for instance, test for IQ for different reasons. Some do so to find those who need special education, which is the purpose the IQ test was originally developed for, rather than to measure general intelligence. And scholastic test-taking depends on students choosing to take the test.
In the US, for instance, there are many more girls taking the tests for university admission than there are boys. This affects both the average score and the distribution of the score.
It's not impossible that the larger female sample contains a higher percentage of low scorers than the smaller male sample. That would happen if the 'extra girls' are girls who are not doing as well at school but decide to take the test anyway, because job opportunities for women without a university education are not as good as those for men without one.
I would like to see studies based on random sampling in this field so that we could control for the sample size and self-selection problems.
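To make the self-selection worry concrete, here is a rough simulation sketch. Everything in it is an assumption made up for illustration: both sexes are drawn from the same normal score distribution, the participation rates (40% of boys, 55% of girls) are hypothetical, and "the best-scoring fraction of each sex takes the test" is a much cruder rule than real self-selection.

```python
import random
import statistics

def simulate_self_selection(n=50_000, male_rate=0.40, female_rate=0.55, seed=0):
    """Draw both sexes from the SAME score distribution, then keep only the
    top `rate` fraction of each sex as test-takers.
    The rates and the selection rule are hypothetical, not real data."""
    rng = random.Random(seed)
    result = {}
    for sex, rate in (("male", male_rate), ("female", female_rate)):
        # Identical underlying population for both sexes: mean 0, sd 1.
        scores = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        takers = scores[: int(n * rate)]  # crude self-selection: top scorers only
        result[sex] = {
            "n_takers": len(takers),
            "mean": statistics.fmean(takers),
            "stdev": statistics.pstdev(takers),
            # Share of test-takers scoring below the population average of 0.
            "share_below_population_mean": sum(s < 0.0 for s in takers) / len(takers),
        }
    return result

selection_result = simulate_self_selection()
for sex, stats in selection_result.items():
    print(sex, stats)
```

Under these made-up assumptions the female test-takers come out with a lower average and a larger share of below-average scorers than the male test-takers, even though the two populations are identical by construction. It says nothing about what actually happens in any real test, only that differences between samples need not reflect differences between populations.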
On the second point: Guessing in true-false and multiple-choice questions can increase the proportion of those who score at the extreme ends of the distribution, depending on how correct, incorrect and omitted answers are scored. Boys and men are more likely to treat the test as a game and more likely to guess than girls and women, who are more likely to leave a question unanswered when they are unsure of it.
This is unlikely to cause tremendous differences, but it will have some effects, especially in true-false type tests.
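A toy simulation can show the mechanism behind the guessing point. The setup is entirely hypothetical: a 50-question true-false test, two groups with identical knowledge (each question known with probability 0.6), one group guessing on every unknown question and the other leaving it blank, and an assumed scoring rule of +1 for a right answer, -1 for a wrong one and 0 for a blank.

```python
import random

def simulate_guessing(n_takers=10_000, n_questions=50, p_known=0.6, seed=0):
    """Compare two groups with identical knowledge on a true-false test.
    'guessers' answer every unknown question at random; 'omitters' skip it.
    Assumed scoring: +1 right, -1 wrong, 0 blank (all parameters hypothetical)."""
    rng = random.Random(seed)
    scores = {"guessers": [], "omitters": []}
    for group in scores:
        for _ in range(n_takers):
            known = sum(rng.random() < p_known for _ in range(n_questions))
            score = known  # every known question is answered correctly
            if group == "guessers":
                for _ in range(n_questions - known):
                    # A coin-flip guess: +1 if right, -1 if wrong.
                    score += 1 if rng.random() < 0.5 else -1
            scores[group].append(score)

    # Tail cutoffs taken from the pooled scores of both groups.
    pooled = sorted(scores["guessers"] + scores["omitters"])
    low_cut = pooled[int(0.05 * len(pooled))]
    high_cut = pooled[int(0.95 * len(pooled))]
    low = {g: sum(x <= low_cut for x in s) for g, s in scores.items()}
    high = {g: sum(x >= high_cut for x in s) for g, s in scores.items()}
    return {
        "guesser_share_low_tail": low["guessers"] / sum(low.values()),
        "guesser_share_high_tail": high["guessers"] / sum(high.values()),
    }

guessing_result = simulate_guessing()
print(guessing_result)
```

With this scoring rule guessing does not change the expected score, but it adds spread, so the guessers end up over-represented in both tails despite knowing exactly as much as the omitters. With no penalty for wrong answers, guessing would instead push the guessing group's scores upward on average. The size of the effect here comes from the assumption that one whole group always guesses and the other never does, which exaggerates any real-world difference.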
On the third point: In the US there are more boys than girls who score in the top one percent on the mathematics SAT, but that gap is getting smaller over time. So whatever the causes of the difference, at least some of them appear to be changing.
One possible cause is differences in practice, given that boys' traditional games and play tend to strengthen things like three-dimensional mental rotation, while girls' traditional games and play tend to strengthen linguistic skills.
Interestingly enough, in the verbal/linguistic SAT tests (and essay writing tests) the top one percent contains more girls than boys.
There are other, wider questions that can't be fully answered. One is how tests are created and calibrated: the weights assigned to different types of questions, for instance, the lack of memory effects in the testing, the absence of several kinds of intelligence which can't easily be tested in a short amount of time (creativity, social intelligence), and the focus on speed of answers over thoroughness. Another is to what extent we are measuring the same things at both extremes of some distributions (if the lower end, say, includes effects caused by medical conditions).
This is not to dispute the findings themselves, but just to place them in a wider framework and to argue that the questions don't have their final answers yet.