from David MacKay

Sunday, 17 January 2010

from David MacKay

One of many joyful memories of Sam is the tradition he invented of giving presentations in the style of a cinema. He would precede "The Feature Presentation" (which was, say, 45 minutes long) by a 5-minute "Short" on a different topic, for example some neat algorithm he had picked up recently and wanted to pass on.

Sam also initiated a tradition of entertaining the community, usually at holiday-time, with well-chosen puzzles, some of which I am still chewing over. Here's a message from Sam.

From: Sam Roweis Wed Aug 6 00:22:26 2003 To: learnmeet@cs.toronto.edu cc: "David J.C. MacKay" Subject: coin puzzle Here's a summer coin flipping puzzle to keep you occupied: "You are given a biased coin, whose bias you do not know. Using a sequence of independent flips of this coin, simulate a sequence of independent flips of a fair (unbiased) coin." The goal is to do this as efficiently as possible, in the sense that you use as few flips (on average) of the biased coin per simulated flip of the unbiased coin. Here's an inefficient but correct way to get you thinking: Flip the biased coin twice. If it comes up HH or TT, try again. If it comes up HT, output H. If it comes up TH, output T. Of course if the bias is near 0 or 1 you will be rejecting a lot...

Sam, you were a wonderful, wonderful colleague and friend. Life was always filled with fun and humour and interest and intensity when you were around. We will miss you terribly, but you are not gone - you are still with us now, in the memes with which you have infected us.

5 comments:

Michael Littman18 January 2010 at 15:16
I love that you posted a puzzle. Over the past few days when I start to focus on how sad it is, I shift gears and work on the puzzle instead. As someone at the amazing memorial service said, "It's what Sam would have wanted."

I've made some progress, but I'm still a bit stumped. One observation is that, as the coin probability approaches 0 or 1, the coin has lower entropy and so more flips are needed to get a bit (fair flip) out. So, it's not surprising that the "flip twice" scheme ends up doing a lot of rejection at the extremes---any method will have to do a lot of flips there.

In fact, the expected number of flips for the "flip twice" solution seems nearly optimal at the extremes. In particular, my intuition is that the number of flips needed has to be at least 1/min(p,1-p) because that's the expected number of times you need to flip before you see both H and T come up. Anything less than that and you are only seeing H or only seeing T, and I don't see how you can get a bit out of that.

So, near the extremes, the expected number of flips to get a bit out of "flip twice" is maybe 1/2 flip more than what it takes to get both H and T.

So, maybe the hard part here is coming up with a scheme that helps when the coin is close to fair already?
ReplyDelete
Replies
Nicolas19 January 2010 at 09:36
That's the impression I got as well: at the extremes, this solution seems optimal since you need at least one of each draw to get a bit, which is almost what it provides (minus the HH case if P(T)~ 1, which is negligible). After a lunch discussion, John Winn came with a better solution:
- if HT or TH comes up, do as before;
- if HH or TT comes up, store it and wait until the next HH or TT comes up (while getting bits for the HT and TH draws);
- if the next homogeneous draw is different from the last stored one, get a bit (that is, if you have HH then TT, output H. If you get TT then HH, output T)
- if the next homogeneous draw is the same as before (HH then HH, for instance), store it and wait for the next sequence of two homogeneous draws (which may be interrupted by heterogeneous draw, or homogeneous draws of different kinds).

You only store the sequences in powers of two (HH, HH HH, HH HH HH HH, ...), waiting for a sequence of the same length, but of the opposite kind to happen.

And that's where it shows that one needs real talent to explain things in a clear way...
ReplyDelete
Replies
Tom Minka21 January 2010 at 23:38
Here is another way to generalize from Sam’s hint. Flip the coin n times and let h be the number of heads. There are choose(n,h) possible sequences with h heads, and they are equally likely. Number them and write the index of the sequence you got in binary. Thus the problem is reduced to converting a uniformly-distributed integer into uniform bits. This gives floor(log2(choose(n,h))) bits. For example, the sequence HTTT gives 2 bits (not just 1). Sam’s hint is the case n=2.
ReplyDelete
Replies
Hogg25 January 2010 at 19:05
I realize that we want completely uncorrelated flips, but if one is willing to live with small levels of correlation, can't you instead of flipping twice each time (as in Sam's solution) but rather flip once, and use the flip k flips ago to make the other half of the pair? This halves the number of flips relative to Sam's solution but of course creates some (small if k is large) correlations.

Yann LeCun suggested to me that using another past flip to re-define the TH or HT as tail or head or the other way might reduce the correlations further. But I am sure I can't figure that out.
ReplyDelete
Replies
Erik Winfree25 January 2010 at 21:35
Since Sam's puzzle sounded to me like a rephrasing of Shannon's source coding theorem, I gave his puzzle as an in-class quiz to my information theory & complexity class. I was expecting a solution along the lines of "The biased coin generates H(p) bits of entropy per flip, so choose N and flip the coin N times and assign the typical sequences uniformly to length N*H(p) bitstrings." I still think that this is the essence of the optimal solution, but there are all these issues of residual correlations and epsilons and deltas to worry about, as Hogg points out. So I was surprised when after 10 minutes, Tim Black announced his elegant solution, which works better as N goes to infinity:

Flip the biased coin N times. Let k be the number of heads. The probability of any sequence with k heads is equal to the probability of any other such sequence. So, let L_k be the largest such that 2^{L_k} <= (N choose k). If your observed sequence is among the first 2^{L_k}, say the n^{th} such sequence, then output the bits of n. Otherwise, toss the sequence into your "bit bucket" for future use.

First note that the output sequence (as you do this many times) yields independent bits, because for each k, the probability is uniform over all binary strings to output. Second, note that k will typically be near N*p, with very high probability. So L_k will be right near log (N choose N*p) = N*H(p) using Sterling's approximation. Third, if you want to eek out every last ounce of randomness, go back to the bit bucket every now and again, and extract bits with a larger N.

Beautiful, huh? I can imagine Sam's face, as he says, "You see it now, hey? Say, have you heard the one about..."
ReplyDelete
Replies

Add comment

Sam Roweis 1972 - 2010

Sunday, 17 January 2010

from David MacKay

5 comments:

Blog Archive