Archive for January, 2009

Explaining Things to Computer Scientists

Tuesday, January 20th, 2009

I love this introduction to git: Git for Computer Scientists.

The reason I like it is evident right in the abstract:

Quick introduction to git internals for people who are not scared by words like Directed Acyclic Graph.

Computer Science has such a rich vocabulary for describing data structures and their transformations, yet every other introduction to git I’ve seen uses mostly command line examples and hides technical details behind analogies.

The examples and analogies are helpful, but a little precise discussion of the way a system is put together can be even more helpful. It seems to me that most programmer targeted software introductions and tutorials tend to shy away from directly invoking basic computer science concepts.

It’s a shame, because you can communicate a lot of information in a short amount of time if you use the right vocabulary.

If you’re a programmer writing a document for other programmers, why not leverage this kind of common knowledge to communicate efficiently?

Atwood’s Unfinished Game Revisited

Friday, January 2nd, 2009

My previous post described how this puzzle doesn’t give enough information to be definitively solved. We need to know the mother’s strategy for revealing the gender of the first child.

In that earlier post, I approached things from the perspective of repeated simulations, giving some code to demonstrate. Here’s a more visual argument.

My thinking is that in order to make a judgment about the likeliness of different outcomes, we need to know the probability tree for the different events: the possible genders of the children and which child is revealed first. We can deduce almost the entire tree:

bggraph

But we are never told how the mother decides which child to reveal when she has both a boy and a girl.

In my last post, I gave two ways of completing this tree:

  • If the mother always reveals the gender of the female first whenever possible:

    bggraph1

  • In this case, the a priori chance of the second child being a boy is higher.

  • If the mother always randomly decides which child she will reveal the gender of first:

    bggraph2

    In this case, the a priori chance of the second child being a boy is 50%.

So, to answer the question using a given a probability tree, we examine the possible states we can be in (states in which the first child revealed is ‘G’). Then we prune off the other states and refigure the odds using the rules of conditional probability.

So, depending on which tree is being used, the answer to the question is different.

  • A girl is always chosen first if possible

    bggraph1_pruned

    Probability of a boy second = 2/3

  • First child is always chosen randomly

    bggraph2_pruned

    Probability of a boy second = 1/2

Jeff Atwood and the Unfinished Game

Friday, January 2nd, 2009

I don’t usually read Jeff Atwood, but Paul pointed out to me this post because he knows I like math. The post poses this problem:

Let’s say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?

Commenters argued contentiously between 1:2 and 2:3 (and there were some arguments over semantics as well, but I’m focusing on the math right now). Atwood made this follow-up post claiming the correct answer was 2:3.

I’m not so sure this is necessarily correct.

Atwood’s reasoning is that the possible child combinations are: BB, GB, BG, and GG. We know one child is a girl, so that eliminates BB, and 2:3 remaining options have a boy. The sticking point many people have is: why are GB and BG two separate cases?

Atwood’s answer is to draw analogy with The Unfinished Game, a problem he formulates thusly:

Two players, Harry and Ted, place equal bets on who will win the best of 5 coin tosses. In each round, Harry always chooses heads (H), and Ted always chooses tails (T). Suppose they are forced to abandon the game after 3 coin tosses, with Harry ahead 2 to 1. What is the fairest way to divide the pot?

The “intuitive” answer is that the only possible continuations of the game are: H, TH, or TT (since Harry wins on the next H) so 2/3 of the pot should go to Harry since he wins 2:3 of the expected games. The real answer is that the HT and HH continuations affect the expected games equally “strongly” as TH and TT, so Harry actually has a 3:4 chance of winning. Jeff demonstrates this with a code simulation.

The implied argument is that, just like HT and HH are “equally strong” in the Unfinished Game, so are the GBs and BGs in the child gender problem.

Again, I don’t think this is necessarily the case.

In Atwood’s original problem, there is the complication of the mother deciding to tell you one of her children is female. I think the answer hinges on the behavior of the mother.

Here is my answer to Atwood’s original child gender problem. It’s a two-parter, depending on how the mother acts:

  • Scenario 1: When deciding which child’s gender to report, the mother always picks a female if she can.

This yields the 2:3 answer.

If we do a large number of simulations where we randomly generate two genders sequentially, we’ll get BB, GB, BG, or GG equal numbers of times.

We know the mother told us one child is a girl. So 100% of the time we generate a BB, it never makes it to the mother reporting a female gender. So, that gender generation doesn’t count and we throw it out.

We know the mother will always pick G if she can, so 100% of the BGs, BGs, and GGs meet the assumptions of our scenario — they make it to the “mother reported a girl” phase. So, Atwood’s original logic applies and 2:3 times the second child will be a boy.

To demonstrate, here’s a simulation in Python 3.0:

import random
 
genders = ["B","G"]
 
def spawn(n):
    return [random.choice(genders) for i in range(n)]
 
def run(times):
 
    boy = 0
    girl = 0
 
    valid = 0
 
    for i in range(times):
 
        children = spawn(2)
 
        if not "G" in children:
            # This scenario is was already thrown out
            # by our initial assumption -- doesn't count
            continue
        else:
            valid += 1
 
        if "B" in children:
            boy += 1
        else:
            girl += 1
 
    boyRatio = float(boy) / valid
    girlRatio = float(girl) / valid
 
    print("Boy: %f" % boyRatio)
    print("Girl: %f" % girlRatio)
>>> run(10000)
Boy: 0.669774
Girl: 0.330226
  • Scenario 2: The mother randomly picks a child whose gender to report.

This yields the 1:2 answer.

Imagine running the simulation as before. Just as in the previous simulation, we have to throw out scenarios where the mother reports B, since we were assuming we had arrived at the point where she reported G. Previously, since she always reported G in the BG and GB cases, we only had to throw out the BBs.

This time, however, when we get a BG, the mother will choose B or G to report at random. 50% of the time, the mother will report B, which means we will have to throw that simulation out — just as we do the BBs. Similarly, 50% of the BGs will not count. The BGs and GBs are both at “half strength” since half of them don’t make it to the “mother reported a girl” stage. So, we’ll get (BG or GB) the same amount of time we get GG, giving us a 50-50 shot at a boy.

Here’s the Python simulation for this case:

import random
 
genders = ["B","G"]
 
def spawn(n):
    return [random.choice(genders) for i in range(n)]
 
def run(times):
 
    boy = 0
    girl = 0
 
    valid = 0
 
    for i in range(times):
 
        children = spawn(2)
 
        random.shuffle(children)   
 
        childX = children[0]
        childY = children[1]
 
        if childX != "G":
            # This scenario is was already thrown out
            # by our initial assumption
            continue
        else:
            valid += 1
 
        if childY != "B":
            boy += 1
        else:
            girl += 1
 
    boyRatio = float(boy) / valid
    girlRatio = float(girl) / valid
 
    print("Boy: %f" % boyRatio)
    print("Girl: %f" % girlRatio)
>>> run(10000)
Boy: 0.495971
Girl: 0.504029

The crucial difference in the simulations is:

if not "G" in children:
            # This scenario is was already thrown out
            # by our initial assumption -- doesn't count
            continue

versus:

if childX != "G":
            # This scenario is was already thrown out
            # by our initial assumption -- doesn't count
            continue

So, as far as I can tell, the real answer is that we don’t know enough about the mother’s behavior to give a definitive answer of how this scenario plays out on average.

Of course, this kind of reasoning is pretty tricky to pull off correctly, so I wouldn’t be surprised if a good counter arises. But I’ve thought about this problem for a while, and, for now at least, I’m pretty thoroughly convinced that this solution is correct.