Posted by palfrey at 12:34am on 10/08/2008
As a result of an earlier conversation with valkyriekaren about the distribution of birthdays, and whether colder weather affects the frequency of multiple-partner sex (or the numbers involved) (as you do), I've been digging through various blocks of numbers today. Specifically: NSO data on birth dates, Wikipedia's date pages (with their notability-based inclusion criteria), and numbers culled from my friends on Facebook and LiveJournal.
Having written a parser to get all of these into a single format, I wanted to find smart ways to determine whether the distributions in these sets are similar to each other, and also whether they are similar to a theoretical, perfectly uniform distribution of birthdays. However, my statistics knowledge appears to have failed me at this point, and my brain is currently dribbling out of my ears with the force of trying to think of better ideas. I planned on doing a Kolmogorov-Smirnov test (yay for the R <-> Python bridge), but it doesn't appear to be getting the results I had hoped for. For example, comparing the uniform and NSO data says they're drastically different (p-value of 0.0097), which is kinda a "d'oh" moment because they are very different distributions... Does anyone want to point me in other directions?
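For binned birthday data, one commonly suggested alternative to Kolmogorov-Smirnov is Pearson's chi-squared test. K-S assumes a continuous distribution, whereas birthdays are discrete dates with lots of ties. Below is a minimal, stdlib-only Python sketch. It assumes each data set has already been reduced to 12 monthly counts (January first). It also assumes "uniform" means every day of a 365-day year is equally likely, with 29 February folded into February. The three functions are hypothetical names:

- `monthly_uniform_test` checks one set against that uniform baseline.
- `two_sample_test` is a chi-squared test of homogeneity between two sets.
- `chi2_sf` is a hand-rolled stand-in for `scipy.stats.chi2.sf`.

```python
import math

# Days per month in a non-leap year; 29 Feb birthdays are assumed folded into February.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def chi2_sf(x, df):
    """P(X >= x) for X ~ chi-squared(df), i.e. the regularized upper incomplete
    gamma Q(df/2, x/2): series for small x, continued fraction otherwise."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(10_000):
            n += 1
            term *= z / n
            total += term
            if term < total * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Modified Lentz continued fraction for Q(a, z).
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-14:
            break
    return h * math.exp(log_prefactor)


def monthly_uniform_test(counts):
    """Goodness of fit of 12 monthly birthday counts against 'every day of a
    365-day year is equally likely' (so 31-day months expect more births)."""
    if len(counts) != 12:
        raise ValueError("expected 12 monthly counts, January first")
    n = sum(counts)
    expected = [n * days / 365 for days in DAYS_IN_MONTH]
    stat = sum((obs - exp) ** 2 / exp for obs, exp in zip(counts, expected))
    return stat, chi2_sf(stat, len(counts) - 1)


def two_sample_test(a, b):
    """Chi-squared test of homogeneity: could monthly counts a and b (e.g. Facebook
    friends vs. Wikipedia) come from one common distribution?"""
    na, nb = sum(a), sum(b)
    if na == 0 or nb == 0:
        raise ValueError("both samples need at least one birthday")
    stat, used_bins = 0.0, 0
    for ca, cb in zip(a, b):
        col = ca + cb
        if col == 0:
            continue  # a month empty in both samples carries no information
        used_bins += 1
        for obs, row_total in ((ca, na), (cb, nb)):
            exp = row_total * col / (na + nb)
            stat += (obs - exp) ** 2 / exp
    return stat, chi2_sf(stat, used_bins - 1)
```

With scipy available, `scipy.stats.chisquare(counts, f_exp=expected)` and `scipy.stats.chi2_contingency([a, b])` should cover the same ground. Two caveats apply:

- The chi-squared approximation is usually trusted only when expected counts are around 5 or more per bin, which matters for small friend lists.
- With a data set as large as the NSO one, even tiny seasonal wobbles will produce a very small p-value. It is worth looking at how big the deviations are, not just at p.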
Other potential uses of this:
- Crazily determining a new set of months, based only on evenly spacing the number of friends whose birthdays fall in each one (I'd do this with the NSO data, but they don't provide a per-day breakdown yet; I've emailed them about that)
- Adding more evidence to the pile that further kills off the idea of zodiac compatibility
- Testing whether particular months/parts of the year include more famous (for Wikipedia-notable values of famous) people
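The first idea above, re-cutting the year into months that each hold the same number of birthdays, boils down to splitting the sorted birthdays at quantiles. Here is a minimal sketch, assuming the parser yields (month, day) pairs. The function names are hypothetical. Day-of-year is computed against the leap year 2000 so that 29 February maps to day 60.

```python
from datetime import date


def day_of_year(month, day):
    """Map a birthday to 1..366, using the leap year 2000 so 29 Feb is valid."""
    return date(2000, month, day).timetuple().tm_yday


def equal_count_months(birthdays, n_months=12):
    """Cut the year into n_months contiguous date ranges holding roughly equal
    numbers of the given (month, day) birthdays.

    Returns a list of inclusive (first_day, last_day) day-of-year pairs."""
    days = sorted(day_of_year(m, d) for m, d in birthdays)
    n = len(days)
    if n_months < 1 or n_months > 366 or n < n_months:
        raise ValueError("need 1..366 months and at least one birthday per month")
    starts = [1]  # the first new month always begins on 1 January
    for k in range(1, n_months):
        candidate = days[(k * n) // n_months]  # the k-th quantile birthday
        # Keep starts strictly increasing (shared birthdays could otherwise give an
        # empty month) while leaving room for the months still to come.
        latest_allowed = 366 - (n_months - 1 - k)
        starts.append(min(max(candidate, starts[-1] + 1), latest_allowed))
    ends = [s - 1 for s in starts[1:]] + [366]
    return list(zip(starts, ends))
```

When many people share a date, the blocks can end up uneven. The sketch only guarantees that every new month is a non-empty date range.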
There are 3 comments on this entry.