What’s in a Name?
Letters, of course!
I'm helping Amy plan some games for the baby shower, and I'm being a bit geekier than I probably should be. First, some background:
We haven't told anyone the baby's gender yet, mainly because we don't want a bunch of gender-specific junk that we'll never actually use. But we're now thinking that after the shower it won't really matter, since everyone will have already bought us everything we're going to get. So I thought it would be fun to do a word scramble of the name. We already have both a first and a middle name picked out, so we could give everyone the letters of the two names scrambled together, without telling them the gender, and see if anyone can come up with the right names.
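For anyone who wants to make their own scramble, here is a minimal Ruby sketch that shuffles the letters of a first and middle name together. The `scramble` method is hypothetical, and the sample names are placeholders borrowed from the girls' table, not the baby's actual names.

```ruby
# Shuffle the letters of a first and middle name into a single scramble.
# Pass a seeded Random (e.g. Random.new(42)) to get a repeatable result.
def scramble(first, middle, rng: Random.new)
  letters = (first + middle).upcase.chars
  # Reshuffle a few times so we don't hand out the letters in their original order.
  5.times do
    shuffled = letters.shuffle(random: rng)
    return shuffled.join if shuffled != letters
  end
  letters.join # only realistically reached when every letter is identical
end

# Hypothetical example names, not the real ones.
puts scramble("Olivia", "Hannah")
```

Seeding the `Random` is handy if you want to print the same scramble on every game card.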
So a thought struck me: can you figure out the gender of the baby just by looking at the letters in the scramble? Both the first and the middle name are very gender-specific, as American names go, so if there is a connection between letters and gender in American names, some simple analysis should reveal the baby's gender. That's the hypothesis, anyway. So I did what any card-carrying geek would do: I ran the numbers.
First, I needed a large sample of American names for boys and girls. It was hard to find a plain listing of names; most sites have search engines for names, but no downloadable list. The best source I found, and probably the most accurate, was the U.S. Social Security Administration's website. It has a list of the 1,000 most popular boys' and girls' names for the decade starting in 2000. It has other decades too, but I figured I'd focus on this one for now.
The first thing that jumped out at me was how exotic the top girls' names are. The top ten boys' names are:
| Rank | Name | Number | Percent |
|---|---|---|---|
| 1 | Jacob | 154161 | 1.49 |
| 2 | Michael | 141515 | 1.36 |
| 3 | Joshua | 127928 | 1.23 |
| 4 | Matthew | 126576 | 1.22 |
| 5 | Andrew | 111248 | 1.07 |
| 6 | Christopher | 109926 | 1.06 |
| 7 | Joseph | 107276 | 1.03 |
| 8 | Nicholas | 106558 | 1.03 |
| 9 | Daniel | 105822 | 1.02 |
| 10 | William | 100450 | 0.97 |
These make sense. I've known at least one person with each of these boys' names, from school or elsewhere. Fairly standard. But the girls:
| Rank | Name | Number | Percent |
|---|---|---|---|
| 1 | Emily | 125564 | 1.27 |
| 2 | Madison | 104263 | 1.05 |
| 3 | Hannah | 95257 | 0.96 |
| 4 | Emma | 86201 | 0.87 |
| 5 | Ashley | 78379 | 0.79 |
| 6 | Alexis | 77565 | 0.78 |
| 7 | Samantha | 75065 | 0.76 |
| 8 | Sarah | 74275 | 0.75 |
| 9 | Abigail | 74162 | 0.75 |
| 10 | Olivia | 73279 | 0.74 |
I've known an Emily, a Sarah and a Samantha. An Olivia? An Abigail? I never would have guessed these names would be in the top ten. Even Elizabeth is only number 11, beaten out by Alexis at number 6. And girls' names in general have a much smaller market share: only two (Emily and Madison) account for more than 1% of the girls born between 2000 and now, compared with nine boys' names. Is this because parents want girls to have more exotic-sounding names, and so spread their choices across a wider variety of names than they do for boys?
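The market-share comparison can be checked straight from the two tables. This Ruby sketch just reuses each table's Percent column; since those percentages are rounded to two places, the top-ten totals are approximate.

```ruby
# Percent column from the two top-ten tables above (SSA, decade starting in 2000).
BOYS_TOP10 = {
  "Jacob" => 1.49, "Michael" => 1.36, "Joshua" => 1.23, "Matthew" => 1.22,
  "Andrew" => 1.07, "Christopher" => 1.06, "Joseph" => 1.03,
  "Nicholas" => 1.03, "Daniel" => 1.02, "William" => 0.97
}.freeze
GIRLS_TOP10 = {
  "Emily" => 1.27, "Madison" => 1.05, "Hannah" => 0.96, "Emma" => 0.87,
  "Ashley" => 0.79, "Alexis" => 0.78, "Samantha" => 0.76, "Sarah" => 0.75,
  "Abigail" => 0.75, "Olivia" => 0.74
}.freeze

# How many names clear 1%, and what share the top ten cover together.
def share_summary(table)
  {
    over_one_percent: table.count { |_name, pct| pct > 1.0 },
    top_ten_share: table.values.sum.round(2)
  }
end

{ "boys" => BOYS_TOP10, "girls" => GIRLS_TOP10 }.each do |label, table|
  s = share_summary(table)
  puts "#{label}: #{s[:over_one_percent]} names over 1%, top ten = #{s[:top_ten_share]}%"
end
# boys: 9 names over 1%, top ten = 11.48%
# girls: 2 names over 1%, top ten = 8.72%
```

So the ten most popular boys' names cover roughly 11.5% of boys, while the ten most popular girls' names cover under 9% of girls.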
I may eventually study past decades to see if this is a running theme, but for now, back to the original question.
I took the top 1,000 names for boys and the top 1,000 names for girls, broke each name into its letters, and counted them. I did this in Ruby (my old pal): I split the names into letters, tallied the counts in a hash, put the results into a spreadsheet, and got this:

[Chart: letter counts in the top 1,000 boys' and girls' names. Boys are blue, girls red, for clarity.]
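The counting step might look something like this in Ruby. It's a minimal sketch, not the original script: the `letter_counts` method is hypothetical, and for brevity it runs on the two top-ten lists from the tables rather than the full 1,000-name lists. Even on this small sample, the ten girls' names contain 14 As (more As than names), while the ten boys' names contain 8.

```ruby
# Tally how often each letter appears across a list of names.
# Case is ignored; anything that isn't a-z (hyphens, apostrophes) is skipped.
def letter_counts(names)
  counts = Hash.new(0) # letters never seen default to a count of 0
  names.each do |name|
    name.downcase.each_char do |ch|
      counts[ch] += 1 if ch.match?(/[a-z]/)
    end
  end
  counts
end

boys  = %w[Jacob Michael Joshua Matthew Andrew Christopher Joseph Nicholas Daniel William]
girls = %w[Emily Madison Hannah Emma Ashley Alexis Samantha Sarah Abigail Olivia]

boy_counts  = letter_counts(boys)
girl_counts = letter_counts(girls)

# One CSV row per letter, ready to paste into a spreadsheet and chart.
puts "letter,boys,girls"
("a".."z").each do |letter|
  puts "#{letter},#{boy_counts[letter]},#{girl_counts[letter]}"
end
```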
The striking thing is that there are a few definite girl letters, like A, I, L, and Y, but not many boy letters: really only O and R. Also, there are more As in girls' names than there are girls' names on the list: 1,159 As in 1,000 names. A preponderance of girl letters should be a valid indicator of gender in a name scramble, but I wouldn't say the same about a preponderance of boy letters. A is common in both genders' names, but if you've got three As and three Ls, there's a good chance it's a girl. (Sorry, Alan. I'm just running the numbers here.)
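One way to turn "definite girl letters" into a rule is to flag any letter whose count for one gender is at least some multiple of the other's. This sketch compares raw counts, which is only fair because both lists have the same number of names, as in the 1,000-versus-1,000 comparison above. The 1.5 threshold, the minimum count of 2 and the method names are arbitrary assumptions, and the top-ten lists are only stand-in data. On them, A and Y come out as girl letters and O and R as boy letters, but ten names per gender is far too few to trust the rest (E, for instance, lands on the boys' side here).

```ruby
def letter_counts(names)
  names.each_with_object(Hash.new(0)) do |name, counts|
    name.downcase.scan(/[a-z]/) { |ch| counts[ch] += 1 }
  end
end

# Split letters into girl-leaning and boy-leaning groups.
# factor:    how many times more common a letter must be for one gender (assumed 1.5)
# min_count: skip letters too rare across both lists to say anything (assumed 2)
def letter_leanings(boy_counts, girl_counts, factor: 1.5, min_count: 2)
  leanings = { girl: [], boy: [] }
  ("a".."z").each do |letter|
    b = boy_counts[letter]
    g = girl_counts[letter]
    next if b + g < min_count
    if g >= factor * b
      leanings[:girl] << letter
    elsif b >= factor * g
      leanings[:boy] << letter
    end
  end
  leanings
end

boys  = %w[Jacob Michael Joshua Matthew Andrew Christopher Joseph Nicholas Daniel William]
girls = %w[Emily Madison Hannah Emma Ashley Alexis Samantha Sarah Abigail Olivia]

result = letter_leanings(letter_counts(boys), letter_counts(girls))
puts "girl letters: #{result[:girl].join(' ').upcase}"
puts "boy letters:  #{result[:boy].join(' ').upcase}"
```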
Since the names near the bottom of the 1,000 are less popular, and therefore not as well "tested" as the others, I also ran the numbers on just the top 100 names.
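Re-running on a smaller slice only changes which names go into the count. This sketch assumes the SSA table has been saved as plain text, one `rank,name,number,percent` row per line; the SSA page itself is an HTML table, so that conversion is a hypothetical step. Five rows of the girls' table serve as sample input, and the method names are made up for illustration.

```ruby
# Hypothetical input format: "rank,name,number,percent", one row per line,
# mirroring the columns of the SSA table.
SAMPLE = <<~ROWS
  1,Emily,125564,1.27
  2,Madison,104263,1.05
  3,Hannah,95257,0.96
  4,Emma,86201,0.87
  5,Ashley,78379,0.79
ROWS

# Parse rows into names ordered by rank.
def ranked_names(text)
  rows = text.each_line.map(&:strip).reject(&:empty?).map do |line|
    rank, name = line.split(",")
    [Integer(rank), name]
  end
  rows.sort_by(&:first).map(&:last)
end

def letter_counts(names)
  names.each_with_object(Hash.new(0)) do |name, counts|
    name.downcase.scan(/[a-z]/) { |ch| counts[ch] += 1 }
  end
end

# Count letters in only the n highest-ranked names (e.g. n = 100 vs. n = 1000).
# first(n) simply returns every name if the list is shorter than n.
def top_n_counts(text, n)
  letter_counts(ranked_names(text).first(n))
end

puts top_n_counts(SAMPLE, 3)["a"] # As in Emily, Madison, Hannah
puts top_n_counts(SAMPLE, 5)["a"] # As in all five sample rows
```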
[Chart: letter counts in the top 100 boys' and girls' names.]
Girls' names still contain more As than there are names, and girls claim even more letters, like E and L. But behold, boys lose their letters almost completely! The difference in R disappears, and O closes the gap enough that it's no longer significant. Boys have no significant letters left to claim as their own! From the look of it, the only thing you can tell from a scrambled name is whether it's a girl or maybe a girl. Even though girls' names are getting more exotic, the letters themselves are not, and certain letters lean feminine much more than others.
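To put a number on "significant," one option (my substitution, not part of the original analysis) is a two-proportion z-test on a letter's share of all letters. The toy numbers below are O in the two top-ten tables, counted by hand: 5 of the 69 letters in the boys' names and 2 of the 60 letters in the girls' names. The test treats every letter as an independent draw, which letters within a name aren't, so it's a rough gauge at best.

```ruby
# Two-proportion z-test: does a letter make up a different share of all letters
# in boys' names (a) than in girls' names (b)?
def two_proportion_z(count_a, total_a, count_b, total_b)
  p_a = count_a.fdiv(total_a)
  p_b = count_b.fdiv(total_b)
  pooled = (count_a + count_b).fdiv(total_a + total_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / total_a + 1.0 / total_b))
  return 0.0 if se.zero? # letter absent (or universal) in both lists
  (p_a - p_b) / se
end

# Toy numbers: O in the boys' and girls' top-ten lists (5/69 vs. 2/60).
z = two_proportion_z(5, 69, 2, 60)
puts format("z = %.2f, significant at the 5%% level: %s", z, z.abs > 1.96)
# z = 0.98, significant at the 5% level: false
```

With only ten names per side, even a letter that looks lopsided doesn't clear the usual 1.96 cutoff; the full lists would give much larger totals and a far more sensitive test.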
A more valid test of all this, though, would rely less on letter counts and more on phonics. I think I've seen some phonics-type programs before, so I'll see if I can dig anything up. If anyone knows of something, let me know.
I'd actually like to try to rig up a program that guesses the gender of a name from its letters or phonics. The results here are a little disheartening for guessing boys' names, though, but I'll see what I can put together.
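A first stab at the letters half of that program could be a naive Bayes-style classifier: estimate how likely each letter is in boys' and girls' names, then add up the log-likelihood ratios for the letters in a scramble. This is a minimal sketch under several assumptions: letters are treated as independent, both genders are assumed equally likely up front, add-one smoothing keeps a letter missing from one list from zeroing out a score, and the two top-ten lists stand in for real training data. The method names are hypothetical.

```ruby
# Stand-in training data: the top-ten lists from the tables above.
BOYS  = %w[Jacob Michael Joshua Matthew Andrew Christopher Joseph Nicholas Daniel William].freeze
GIRLS = %w[Emily Madison Hannah Emma Ashley Alexis Samantha Sarah Abigail Olivia].freeze
LETTERS = ("a".."z").to_a.freeze

def letter_counts(names)
  names.each_with_object(Hash.new(0)) do |name, counts|
    name.downcase.scan(/[a-z]/) { |ch| counts[ch] += 1 }
  end
end

# P(letter | gender) with add-one (Laplace) smoothing, so a letter that never
# appears in one list (like Y in the boys' top ten) still gets a small probability.
def letter_probs(names)
  counts = letter_counts(names)
  total = counts.values.sum + LETTERS.size
  LETTERS.to_h { |l| [l, (counts[l] + 1).fdiv(total)] }
end

GIRL_P = letter_probs(GIRLS)
BOY_P  = letter_probs(BOYS)

# Sum of log(P(letter | girl) / P(letter | boy)) over the scramble's letters.
# Positive leans girl, negative leans boy; non-letters are ignored.
def gender_score(scramble)
  scramble.downcase.scan(/[a-z]/).sum { |l| Math.log(GIRL_P[l] / BOY_P[l]) }
end

def guess_gender(scramble)
  score = gender_score(scramble)
  return :girl if score.positive?
  return :boy if score.negative?
  :unsure
end

puts guess_gender("aaalll")      # three As and three Ls => girl
puts guess_gender("rsehtocirhp") # "Christopher" scrambled => boy
```

Because "Christopher" is one of the training names, the second call only shows the mechanics work; a fair check would train on most of the full list and test on names held out from it.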
Of course, I don't think there will be a lot of geeks at the baby shower, so we'll probably just do the scramble. I did think it would be funny to give a presentation on these findings at the shower, with PowerPoint, a projector and everything. I don't think most of the attendees would get it, but we would.
And I'd give the name scramble here, but I'm afraid it would give too much away.