Break Bad Data: Part 1 - Stanines
I have had the privilege of playing with complex sets of educational data for well over a decade now; I love it. It’s my happy place indeed.
Throughout this time, I have worked with teachers and school leaders to interrogate and understand many different forms of education-related data. This has included everything from the boring (e.g. percentages and traditional grades), to the obtuse (z-scores), to the insightful (developmental rubrics, learning progressions, student voice, staff perceptions, and so on).
One of the forms of data I see come up from time to time is stanines. Sigh.
Now, I am going to try and be as transparent as possible.
I CANNOT STAND STANINES AS A FORM OF DATA.
Indeed, I dislike them with the ferocity of 1000 suns.
Why, I hear you ask?
The reason is that, as far as I am concerned, they have exceptionally little practical utility and can actually obscure what really matters.
So, what exactly IS a stanine?
The word stanine is really just a contraction of the two words STANDARD NINE, and it is the name of a methodology for scaling and reporting scores (typically from a norm-referenced standardised test).
Essentially, give or take, scores are mapped onto an underlying normal distribution, which is divided into nine intervals (see below).
Using stanines, a student's test score is reported based upon where it falls within that distribution. For example, if I sat an assessment and my score placed me at the 65th percentile, then my score would be reported as stanine 6.
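To make that mapping concrete, here is a minimal sketch in Python. The cumulative percentile cut-offs (4, 11, 23, 40, 60, 77, 89, 96) are the standard stanine bands, but the function name and the handling of scores that land exactly on a boundary are my own illustrative choices, as conventions differ between test publishers.

```python
from bisect import bisect_right

# Standard stanine bands: the share of a normal distribution falling in
# each of the nine bands is 4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4%.
# These are the cumulative upper percentile bounds for stanines 1-8
# (stanine 9 covers everything above the 96th percentile).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def percentile_to_stanine(percentile: float) -> int:
    """Map a percentile rank (0-100) to a stanine (1-9)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # bisect_right counts how many band boundaries sit at or below the
    # score; adding 1 gives the stanine. Under this convention a score
    # at exactly the 40th percentile falls into stanine 5.
    return bisect_right(STANINE_UPPER_BOUNDS, percentile) + 1

print(percentile_to_stanine(65))  # 6, as in the example above
```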
Sounds pretty straightforward, so what is my problem?
My issue with stanines is really three-fold:
First, there is a lot of variation in student percentile scores within stanine bands. Indeed, there is a possible range of up to 20 percentile points for students categorised with the same stanine. For example, two students sitting at the 40th and 59th percentiles respectively would both be categorised as stanine 5. I argue that the difference between these students is not insignificant, and they may well need different supports in order to progress their learning. This is neither reflected nor obvious, though, as they are both reported as stanine 5.
Second, there can be more variation, in terms of student percentile, within a stanine than between stanines. For example, the difference between two students who are both categorised as stanine 8 can be up to 6 percentile points, whereas the difference between a stanine 8 student and a stanine 9 student can be as small as 1 percentile point. This can be a big problem when stanines are used as a metric for determining student eligibility for extension or gifted and talented programs (both of these points are demonstrated in the short sketch after this list).
Third, and this is perhaps the most important point: apart from identifying where a student sits, coarsely, relative to the other students who sat the assessment when it was normed, a stanine tells you very little else. Stanines do not tell you what a student is capable of, what they know, what they can say, what they can do, what they can make, or indeed, how much they have learnt or what they are ready to learn next [all of this is also true of percentiles and z-scores].
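To see the first two points in action, here is a short demonstration using the same illustrative band boundaries as the sketch above:

```python
from bisect import bisect_right

BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(pr: float) -> int:
    """Percentile rank (0-100) to stanine (1-9), same bands as above."""
    return bisect_right(BOUNDS, pr) + 1

# Point one: a wide spread within a single band.
print(stanine(40), stanine(59))  # 5 5 -> 19 percentile points apart, same stanine

# Point two: more variation within a band than between bands.
print(stanine(89), stanine(95))  # 8 8 -> 6 percentile points apart, same stanine
print(stanine(95), stanine(96))  # 8 9 -> 1 percentile point apart, different stanines
```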
My view is that educational assessment should have obvious utility and impact on student instruction. Unfortunately, stanines fail this test. Fortunately, many of the assessments from which stanines are derived actually have alternate reporting scales with much more practical utility. We will talk about these in future posts.
Over the next few weeks, we are going to delve a little deeper into other forms of educational data, both good and bad.
Past posts that might be of interest include:
Data, data, everywhere …. but not a drop to drink
Data in education: The good, The bad and The ugly.
At Educational Data Talks we love working with schools to understand their data better.
If you have not already subscribed to our newsletter, sign up at the bottom of the page!
Like to find out more? Give us a call or drop us an email.
Tim O’Leary