### A basic statistical analysis of the Physics GRE

So I am going to talk a bit about the Physics GRE, a test I spent many hours of my life studying for. I scored in the 56th percentile on the official test, which isn't terrible. On practice tests I scored anywhere from the 30th to the 80th percentile depending on the test, averaging somewhere in the 60s. (Image: the top of one of the practice Physics GRE tests, GR1777.) The practice test can be found on the ETS website: https://www.ets.org/s/gre/pdf/practice_book_physics.pdf

Usually the people I have seen defend this test are those who did well on it. To be frank, even though I was getting decent scores, I don't like this test, and I do not think it is a good indicator of talent, ability, or knowledge. I also have a reason for this that holds regardless of your position on the matter: the spread of my scores.

A question I ended up asking myself was, "Is this massive spread just happening to me? Do all Physics GRE test takers experience this?" Since much of my own research has involved statistics, I dug into the matter to see if I could unearth any insights. What I found was surprising.

### Making Assumptions

First we need to lay some ground rules. Since I don't have direct access to Physics GRE test results and I don't know how ETS officially calculates its percentiles, I have to make some assumptions.

To start, I'm going to assume that the scores follow either a binomial or a Gaussian distribution. More realistically, it's probably a combination of both, but for simplicity and clarity I'm going to look at each case individually.

I will also make the following base assumptions, which I consider reasonable:
• The statistics of GR1777 are representative of other modern Physics GRE tests, and the statistics it presents are accurate.
• There are no abnormalities in the statistical distributions of the test; mainly, scores come from a single Gaussian or binomial distribution.
• The percentile calculation depends only on how many questions are answered correctly and isn't affected by the number answered incorrectly (this is true for modern Physics GREs).
• A test taker will always randomly guess if they don't know an answer. This may not always be the case, but trying to factor in a test taker's strategy is kind of impossible here.
• The statistics on the GRE website, which I found here (accessed February 2020), are accurate.
• I'm allowed to reasonably round.

### Considering the Test is a Simple, Binomial Distribution

This is the case where I give the Physics GRE the benefit of the doubt: the only uncertainty in a test taker's score comes from their own random guessing. To clarify further, if a test taker knows how to correctly answer 50 questions on a test, they will get at least those 50 correct every single time they take it.

OK, time to get to the details of the analysis (which will certainly include even more assumptions).

Let's assume our test taker is perfectly average. For physics, the GRE website I linked to places a score of 700 at the 48th percentile and 720 at the 52nd, so I'll take the 50th percentile to be a score of 710.
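The 710 estimate is a straight-line interpolation between the two table entries. A minimal sketch, assuming scores vary linearly with percentile between those two points (the function name `interpolate_score` is just for illustration):

```python
def interpolate_score(target_pct, lo=(48, 700), hi=(52, 720)):
    """Return the scaled score at target_pct, assuming the score changes
    linearly with percentile between the two bracketing (percentile, score) points."""
    (p0, s0), (p1, s1) = lo, hi
    return s0 + (target_pct - p0) * (s1 - s0) / (p1 - p0)

median_score = interpolate_score(50)
print(median_score)  # 710.0
```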

If we then look at GR1777 (the official GRE website updates the practice test periodically, so you might have to google 'physics gre GR1777 pdf' to find a copy), a score of 710 corresponds to about 56 questions answered correctly.

Since I've assumed the test taker is perfectly average, we can deduce that they know 45 of the questions. Here's why: an average tester also guesses an average number of questions correctly. GR1777 has 100 questions with 5 choices each, so on average a student will get 1/5 of their guesses right. If that student perfectly knows 45 of the questions, they guess on the remaining 55 and on average get 11 of those correct (55/5 = 11). Adding the number guessed correctly to the number they perfectly knew (45 + 11 = 56) brings them up to the average score.
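The 45 comes from solving S = K + (N - K)/c for K, where S is the observed number correct, N the number of questions, and c the number of answer choices. A small sketch, assuming 100 five-choice questions and no penalty for wrong answers, as above (`known_questions` is a hypothetical helper name):

```python
def known_questions(observed_correct, n_questions=100, n_choices=5):
    """Solve observed = K + (n_questions - K) / n_choices for K,
    the number of questions the test taker actually knows."""
    # Multiply through by c:  c*S = c*K + N - K   =>   K = (c*S - N) / (c - 1)
    return (n_choices * observed_correct - n_questions) / (n_choices - 1)

known = known_questions(56)          # questions answered from knowledge
guessed = 100 - known                # questions left to blind guessing
expected_lucky = guessed / 5         # average number of correct guesses
print(known, guessed, known + expected_lucky)  # 45.0 55.0 56.0
```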

Treating the guesses as a binomial distribution, each guess is a trial with probability of success p = 1/5. The variance is n*p*(1-p), where n is the number of trials. The student guesses on n = 55 questions, so the standard deviation, the square root of the variance, is sqrt(55*0.2*0.8) ≈ 3.
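The same arithmetic as a quick check, using the binomial mean n*p and variance n*p*(1-p) for the 55 guessed questions:

```python
import math

n_guessed = 55   # questions the average tester guesses on
p = 1 / 5        # chance a blind guess on a five-choice question is right

mean = n_guessed * p                 # expected number of lucky guesses
variance = n_guessed * p * (1 - p)   # binomial variance n*p*(1-p)
sd = math.sqrt(variance)             # about 2.97, which rounds to 3

print(f"mean={mean:.1f}, variance={variance:.2f}, sd={sd:.2f}")
```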

All this means that an average tester who gets a raw score of 53 is statistically the same as one who gets 59. On GR1777 this corresponds to scaled scores between 680 and 740, or the 45th to the 56th percentile. If a tester gets lucky, say by around 2 standard deviations, they could get a 780, the 63rd percentile. If they get unlucky by the same amount, they would get a 640, which corresponds to the 36th percentile. Funnily enough, that means an average tester who lucks into the 63rd percentile is statistically the same as an unlucky one scoring in the 36th percentile.
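The raw-score bands behind those numbers are just the mean plus or minus whole multiples of the rounded standard deviation. A minimal sketch; converting these raw scores to scaled scores and percentiles still requires GR1777's own conversion table, which is not reproduced here:

```python
mean_correct = 56   # average raw score: 45 known + 11 expected lucky guesses
sd_correct = 3      # rounded binomial standard deviation

# Raw-score band for +/- k standard deviations around the average tester.
bands = {k: (mean_correct - k * sd_correct, mean_correct + k * sd_correct) for k in (1, 2)}
for k, (low, high) in bands.items():
    print(f"+/-{k} SD: {low} to {high} questions correct")
# +/-1 SD: 53 to 59 questions correct
# +/-2 SD: 50 to 62 questions correct
```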

If you look up how standard deviations work, you'll find that roughly 1/3 of all average testers fall into this lucky/unlucky category, with the other 2/3 falling into the normal range. (Figure: a normal distribution divided into standard-deviation bands.) About 68.2% of outcomes lie within ±1 standard deviation of the average, 27.2% lie between 1 and 2 standard deviations from it, and the remaining 4.6% lie beyond 2 standard deviations. I retrieved this figure from https://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg in February 2020.
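Those percentages come straight from the standard normal distribution. A short sketch using Python's `statistics.NormalDist`; note it treats the binomial count of lucky guesses as approximately normal, which is an approximation rather than the exact binomial:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

within_1 = z.cdf(1) - z.cdf(-1)         # about 0.683
within_2 = z.cdf(2) - z.cdf(-2)         # about 0.954
between_1_and_2 = within_2 - within_1   # about 0.272
beyond_2 = 1 - within_2                 # about 0.046
outside_1 = 1 - within_1                # about 0.317, the "roughly 1/3" lucky/unlucky share

print(f"{within_1:.3f} {between_1_and_2:.3f} {beyond_2:.3f} {outside_1:.3f}")
```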

### Considering the Test is not Binomial but is Gaussian as described by ETS

If the test accurately measured a test taker's ability and the scores followed a binomial distribution, I would say it's not a terrible idea to include it in admissions. HOWEVER, this is not the case.

This time around I am not going to give the Physics GRE the benefit of the doubt. The Physics GRE isn't perfectly binomial. There is going to be variation from student to student and from test to test. Even if two students are academically equal, one may get questions they specifically know how to solve and earn a higher score, while the other may have had a curriculum that didn't emphasize the content the Physics GRE tests, leaving them less prepared. This isn't because the less prepared student is worse or studied less; it's because they are unlucky.

Since I don't have the raw data for the test, and to push my point further, I am going to use the Physics GRE statistics published by ETS (the company behind the Physics GRE) as the standard deviation of my Gaussian distribution. I obtained this table from the ETS website https://www.ets.org/s/gre/pdf/gre_guide_table2.pdf in February 2020.

According to ETS, Physics GRE scores have a standard deviation of 160 points. Because this is the standard deviation of the population, not the spread a student might expect from one sitting to the next, one cannot say for certain that it describes what any given student will experience.

However, if you do take it to mean the uncertainty an average student might expect when taking this test, which I contend isn't entirely unfair, and assume that it describes a single Gaussian distribution of scores, then a student scoring in the 38th percentile is statistically the same as one scoring in the 79th. If this truly is a standard deviation, around 2/3 of perfectly average testers should fall within this absurd range. According to ETS's published statistics, a perfectly average tester who is unlucky by 2 standard deviations drops into the single-digit percentiles. On the flip side, an average tester who is lucky by 2 standard deviations is launched into the 90th percentiles. Once again, if this truly is a standard deviation as ETS claims, 1/3 of perfectly average, 50th-percentile students fall into this lucky/unlucky category.
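For reference, here are the scaled-score bands that a 160-point standard deviation implies around the 710 median. A minimal sketch; the 38th/79th percentile figures above come from looking these scores up in the ETS percentile table, which this code does not attempt to model. It assumes the usual 200 to 990 reporting scale for GRE Subject Tests, so the upper 2 SD band is clamped:

```python
MEDIAN_SCORE = 710              # 50th-percentile scaled score estimated earlier
SD_SCORE = 160                  # ETS-reported standard deviation of Physics GRE scores
SCALE_MIN, SCALE_MAX = 200, 990 # assumed reporting range for GRE Subject Tests

def clamp(score):
    """Keep a score inside the reported scale."""
    return max(SCALE_MIN, min(SCALE_MAX, score))

bands = {
    k: (clamp(MEDIAN_SCORE - k * SD_SCORE), clamp(MEDIAN_SCORE + k * SD_SCORE))
    for k in (1, 2)
}
for k, (low, high) in bands.items():
    print(f"+/-{k} SD: {low} to {high}")
```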

### What this means

If what ETS says is true and this SD bears any resemblance to what an average student might expect, then my spread of scores from the 30th to the 80th percentile is perfectly normal and to be expected.

So then why do we use this test? The only answer I've received that is believable is 'That's just how it is. Deal with it.' To which I would respond,

"Wouldn't it be more ethical to just apply a random cut? If a 20th-percentile tester pays to take the test 8 times, there is a solid chance they could end up in the 80th percentile. If an above-average, 80th-percentile tester who can only afford to take the test once gets reasonably unlucky, they could very well end up in the 20th percentile according to ETS."

If that doesn't dissuade them I might then say

"Isn't it hypocritical for a department to say they believe in helping those who are disadvantaged while simultaneously using this test in admissions? These absurd statistics nearly ensure that those who can pay to take the test multiple times will score well."

Or if I'm feeling extra salty

"If a physics faculty member understands the statistics of the Physics GRE and still accepts it as a valid way to judge an applicant, what does that say about the quality of the statistics they are willing to accept in their own research?"
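To put a rough number on the retake argument, here is a sketch under the same contested assumption used above: each sitting's score is the tester's true score plus Gaussian noise with the ETS standard deviation of 160, percentiles are read off a Gaussian population with that same spread, and sittings are independent. Real percentiles come from the ETS table, so treat this as illustrative only:

```python
from statistics import NormalDist

SD = 160                                    # ETS-reported standard deviation
population = NormalDist(mu=710, sigma=SD)   # hypothetical Gaussian population, median 710

true_score = population.inv_cdf(0.20)       # true score of a "20th-percentile" tester
target = population.inv_cdf(0.80)           # score needed to land in the 80th percentile

# Assumption: one sitting = true score + Gaussian noise with the same SD.
sitting = NormalDist(mu=true_score, sigma=SD)
p_single = 1 - sitting.cdf(target)          # chance of reaching the 80th percentile in one sitting
p_any_of_8 = 1 - (1 - p_single) ** 8        # chance in at least one of 8 independent sittings

print(f"one sitting: {p_single:.3f}, any of 8: {p_any_of_8:.3f}")
```

Under these assumptions a single sitting gives about a 4.6% chance, and eight sittings raise it to roughly 31%. By symmetry, an 80th-percentile tester has the same ~4.6% chance of dropping to the 20th percentile in a single sitting.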

Since I don't have the raw data behind these statistics, I can't say concretely what such a large standard deviation means. It could very well be that there are two Gaussians, one at the lower end and one at the higher end (which would produce a large standard deviation if you tried to approximate them as a single Gaussian), which could mean any number of things.
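To see how two Gaussians can masquerade as one wide one, recall that the variance of a mixture is the weighted within-group variance plus the variance of the group means. A minimal sketch with made-up numbers (two equally sized groups, each with a narrow 80-point spread; these are not ETS data):

```python
import math

def mixture_sd(weights, means, sds):
    """Overall mean and standard deviation of a mixture of Gaussians:
    total variance = weighted within-group variance + variance of the group means."""
    overall_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * s ** 2 for w, s in zip(weights, sds))
    between = sum(w * (m - overall_mean) ** 2 for w, m in zip(weights, means))
    return overall_mean, math.sqrt(within + between)

# Hypothetical groups centered at 560 and 860, each with SD 80.
mean, sd = mixture_sd([0.5, 0.5], [560, 860], [80, 80])
print(mean, sd)  # 710.0 170.0
```

Each group on its own is fairly tight, yet fitting a single Gaussian to the combined scores yields a standard deviation of 170, comparable to the figure ETS reports.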

Or it could be that this test is truly just this unreliable. Either way, it's probably something that shouldn't be taken seriously by any self-respecting physicist.
