Prompted by recent discussions about illuminating house numbers to help Deliveroo riders deliver fast food after dark, I used an idle lockdown moment to peruse the incidence of house numbers in the UK using taginfo. The sample contains 2 293 590 house numbers with 29 881 distinct values.
Initially I thought I had stumbled upon a curious feature, as the frequency of house numbers follows a sequential pattern, i.e. 1 is the most frequent house number, followed by 2, and so on. I was about to muse on why this should be, and concoct all kinds of theories about mapper behaviour and urban planning, when a statistician informed me that this is merely an example of Zipf’s Law in operation.
Now I’m not a statistician, and after two lines of the Wikipedia article linked above I’m lost, so I don’t know what sample size you need for a perfect sequential distribution, or whether Zipf’s Law merely describes the phenomenon or explains it.
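As a rough illustration of what Zipf’s Law claims: the frequency of the item at rank r is roughly proportional to 1/r^s, so plotting log(frequency) against log(rank) gives an approximately straight line with slope -s. The sketch below estimates that slope with an ordinary least-squares fit. The function name `zipf_slope` and the counts are made up for illustration; they are not the taginfo figures, and this is only one simple way of checking for a Zipf-like pattern.

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    `counts` is a list of positive frequencies. They are ranked from most
    to least frequent before fitting.
    """
    ordered = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(count) for count in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Made-up counts that follow an idealised 1/rank pattern (NOT taginfo data).
made_up_counts = [round(100000 / rank) for rank in range(1, 51)]
slope = zipf_slope(made_up_counts)
print(round(slope, 2))
```

For these idealised counts the slope comes out close to -1. Real data would scatter around a line, and a slope well away from -1, or a visibly curved log-log plot, would suggest the pattern is not a close fit to Zipf’s Law.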
Our house number data follows Zipf’s Law until number 38 (except for the number 13, whose occurrence is depressed for cultural reasons, namely superstitious beliefs about bad luck). Then a curious pattern emerges, with 40 and 39 reversing sequence, repeated at 50 and 49; 80 and 79; and 90 and 89.
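Reversals of this kind can be picked out mechanically rather than by eye. The sketch below assumes the counts have already been loaded into a dictionary mapping integer house numbers to frequencies (real taginfo values are strings and include entries such as "12a", which would need filtering first). The function name `find_reversals` and the counts are hypothetical, chosen only to show the idea.

```python
def find_reversals(counts):
    """Return (n, n + 1) pairs where house number n + 1 is more frequent than n."""
    reversals = []
    for n in sorted(counts):
        if n + 1 in counts and counts[n + 1] > counts[n]:
            reversals.append((n, n + 1))
    return reversals


# Made-up counts for illustration only (NOT taken from taginfo).
example_counts = {37: 500, 38: 480, 39: 450, 40: 470, 41: 430}
reversals = find_reversals(example_counts)
print(reversals)  # [(39, 40)]
```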
After 100 there are only occasional runs in regular sequence, and fairly rapidly any discernible pattern disappears.
I guess this is a typical pattern of distribution which can be explained by the kind of complex tools that statisticians use, and that as the sample size increases the sequential order extends further. So I further guess there are no mysteries lurking in the data we have gathered, but it would be nice to have this confirmed (or even explained) by skilled practitioners, if possible in layman’s terms.