The Investors vs. the Tabby

There’s an amusing article making the rounds of the internet today, about the successful investment strategy of a cat named Orlando.

A group of people at the Observer put together a fun experiment.
They asked three groups to pretend that they had 5000 pounds, and asked each of them to invest it, however they wanted, in stocks listed on the FTSE. They could only change their investments at the end of a calendar quarter. At the end of the year, they compared the results of the three groups.

Who were the three groups?

  1. The first was a group of professional investors – people who are, at least in theory, experts at analyzing the stock market and using that analysis to make profitable investments.
  2. The second was a classroom of students, who are bright, but who have no experience at investment.
  3. The third was an orange tabby cat named Orlando. Orlando chose stocks by throwing his toy mouse at a target board randomly marked with investment choices.

As you can probably guess by the fact that we’re talking about this, Orlando the tabby won, by a very respectable margin. (Let’s be honest: if the professional investors came in first, and the students came in second, no one would care.) At the end of the year, the students had lost 160 pounds on their investments. The professional investors ended with a profit of 176 pounds. And the cat ended with a profit of 542 pounds – more than triple the profit of the professionals.

Most people, when they saw this, had an immediate reaction: “see, those investors are a bunch of idiots. They don’t know anything! They were beaten by a cat!”
And on one level, they’re absolutely right. Investors and bankers like to present themselves as the best of the best. They deserve their multi-million dollar earnings, because, so they tell us, they’re more intelligent, more hard-working, more insightful than the people who earn less. And yet, despite their self-alleged brilliance, professional investors can’t beat a cat throwing a toy mouse!

It gets worse, because this isn’t a one-time phenomenon: there’ve been similar experiments that selected stocks by throwing darts at a news-sheet, or by rolling dice, or by picking slips of paper from a hat. Many times, when people have done these kinds of experiments, the experts don’t win. There’s a strong implication that “expert investors” are not actually experts.

Does that really hold up? Partly yes, partly no. But mostly no.

Before getting to that, there’s one thing in the article that bugged the heck out of me: the author went out of their way to defend the humans, presenting their performance as if positive outcomes were due to human intelligence, and negative ones were due to bad luck. In fact, I think that in this experiment, it was all luck.

For example, the authors discuss how the professionals were making more money than the cat up to the last quarter of the year, and it’s presented as the human intelligence out-performing the random cat. But there’s no reason to believe that. There’s no evidence that there’s anything qualitatively different about the last quarter that made it less predictable than the first three.

The headmaster at the student’s school actually said “The mistakes we made earlier in the year were based on selecting companies in risky areas. But while our final position was disappointing, we are happy with our progress in terms of the ground we gained at the end and how our stock-picking skills have improved.” Again, there’s absolutely no reason to believe that the students’ stock-picking skills miraculously improved in the final quarter; it’s much more likely that they just got lucky.

The real question that underlies this is: is the performance of individual stocks in a stock market actually predictable, or is it predominantly random? Most of the evidence that I’ve seen suggests that there’s a combination: on a short timescale, it’s predominantly random, but on longer timescales it becomes much more predictable.

But people absolutely do not want to believe that. We humans are natural pattern-seekers. It doesn’t matter whether we’re talking about financial markets, pixel-patterns in a bitmap, or answers on a multiple choice test: our brains look for patterns. If you randomly generate data, and you look at it long enough, with enough possible strategies, you’ll find a pattern that fits. But it’s an imposed pattern, and it has no predictive value. It’s like the images of Jesus on toast: we see patterns in noise. So people see patterns in the market, and they want to believe that it’s predictable.
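
To see how easy it is to fool yourself that way, here’s a minimal simulation sketch. Everything in it is made up for illustration – the ±5% quarterly moves, the 1000 “stocks” – none of it comes from the article:

```python
import random

random.seed(1)

# 1000 "stocks", each a pure coin-flip walk: +5% or -5% per quarter,
# over two "years" (8 quarters). Nothing here is predictable.
stocks = [[random.choice([0.05, -0.05]) for _ in range(8)] for _ in range(1000)]

def growth(stock, quarters):
    total = 1.0
    for q in quarters:
        total *= 1.0 + stock[q]
    return total

# "Backtest" over the first year: the best-looking stock appears brilliant --
# with 1000 random stocks, some will have gone up every single quarter.
best = max(range(len(stocks)), key=lambda i: growth(stocks[i], range(4)))
print("best stock, year 1:", growth(stocks[best], range(4)))

# But the pattern was imposed on noise: in year 2, that same stock
# does no better than the average of all the others.
print("same stock, year 2:", growth(stocks[best], range(4, 8)))
print("average stock, year 2:",
      sum(growth(s, range(4, 8)) for s in stocks) / len(stocks))
```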

Second, people want to take responsibility for good outcomes, and excuse bad ones. If you make a million dollars betting on a horse, you’re going to want to say that it was your superior judgement of the horses that led to your victory. When an investor makes a million dollars on a stock, of course he wants to say that he made that money because he made a smart choice, not because he made a lucky choice. But when that same investor loses a million dollars, he doesn’t want to say that he lost a million dollars because he’s stupid; he wants to say that he lost money because of bad luck, of random factors beyond his control that he couldn’t predict.

The professional investors were doing well during part of the year: therefore, during that part of the year, they claim that their good performance was because they did a good job judging which stocks to buy. But when they lost money during the last quarter? Bad luck. But overall, their knowledge and skills paid off! What evidence do we have to support that? Nothing: but we want to assert that we have control, that experts understand what’s going on, and are able to make intelligent predictions.

The students’ performance was lousy, and if they had invested real money, they would have lost a tidy chunk of it. But their teacher believes that their performance in the last quarter wasn’t luck – it was that their skills had improved. Nonsense! They were lucky.

On the general question: Are “experts” useless for managing investments?

It’s hard to say for sure. In general, experts do perform better than random, but not by a huge margin, and certainly not by as much as they’d like us to believe. The Wall Street Journal used to do an experiment where they compared dartboard stock selection against human experts, and against passive investment in the Dow Jones Index stocks over a one-year period. The pros won 60% of the time. That’s better than chance: the experts’ knowledge and skills were clearly benefiting them. But: blindly throwing darts at a wall could beat experts 2 out of 5 times!

When you actually do the math and look at the data, it appears that human judgement does have value. Taken over time, human experts do outperform random choices, by a small but significant margin.

What’s most interesting is a time-window phenomenon. In most studies, the human performance relative to random choice is directly related to the amount of time that the investment strategy is followed: the longer the timeframe, the better the humans perform. In daily investments, like day-trading, most people don’t do any better than random. The performance of day-traders is pretty much in line with what you’d expect from random choice. Monthly, it’s still mostly a wash. But if you look at yearly performance, you start to see a significant difference: humans typically outperform random choice by a small but definite margin. If you look at longer time-frames, like five or ten years, then you start to see really sizeable differences. The data makes it look like daily fluctuations of the market are chaotic and unpredictable, but that there are long-term trends that we can identify and exploit.

Weekend Recipe: Flank Steak with Mushroom Polenta

I just finished eating a great new dinner, and I’m going to share the recipe with you.

Neither my wife nor I ever particularly liked polenta. But recently, we’ve had it in a couple of outstanding Italian restaurants, and realized that polenta could be wonderful. What made the difference were two things: first, coarse-ground polenta. If you use fine-ground cornmeal for the polenta, it comes out very smooth and creamy. A lot of people like it that way. I don’t. Second, keeping it soft. Polenta, because of the starch, can become very gluey. It needs to be cooked with enough liquid and enough fat to keep it light.

So after discovering that we liked it, I went out and bought some good stone-ground coarse polenta to experiment with. I knew from the places where I’d liked the polenta that it goes really well with strong-flavored meats. So I decided to make a flank steak. Since I absolutely adore mushrooms with steak, I wanted to find a way to get mushroom flavor into the polenta, so I went with a nice duxelles.

The result was absolutely phenomenal: one of the best meals I’ve made in the last several months.

Ingredients:

  • 2 lbs flank steak.
  • For the marinade:
    • 2 cloves minced garlic.
    • 1/2 teaspoon dijon mustard.
    • 2 teaspoons tomato paste
    • 1/2 cup red wine
    • 1 tablespoon red wine vinegar
  • For the duxelles:
    • 1 pound mushrooms, minced.
    • 2 olive oil
    • 2 shallots, minced.
    • 1/2 teaspoon dried thyme
    • Salt and pepper
    • 1/2 cup red wine
  • For the polenta:
    • 1 1/2 cups coarse polenta
    • 5 1/2 cups chicken stock.
    • 1 1/2 teaspoons salt.
    • 4 tablespoons butter
    • 1/4 cup parmesan cheese.
  • The sauce:
    • Drippings from the steak.
    • 3 tablespoons butter.
    • 1 minced shallot
    • 1/2 cup port wine
    • 1/2 cup chicken stock.

Instructions

  1. Marinate the steak. Mix together all of the marinade ingredients, and coat the steak with the marinade. Let it sit for a couple of hours.
  2. Make the duxelle for the polenta. Put a pan on high heat, and melt the butter. When it’s melted, add the shallots and the mushrooms. Sprinkle with salt and pepper. After the mushrooms start to shed some of their liquid, add the thyme. Keep stirring. If the pan starts to get dry, add some of the red wine. Keep cooking, stirring all the time, until you run out of wine. By that time, the mushrooms should have lost a lot of their volume, and turned a deep caramel brown. Remove it from the heat, and set aside.
  3. Start the polenta. Bring 4 1/2 cups of the chicken stock to a boil. Stir in the polenta and the salt. Reduce the heat to medium, and stir until it starts to thicken. Add in the duxelle, and reduce the heat a bit more, to medium-low. Now the polenta just sits and cooks. You want it to go for about 45 minutes at a minimum. But as long as you keep it moist, polenta just keeps getting better as it cooks, so don’t worry about it. Add some stock whenever it gets too dry, and stir it every few minutes.
  4. Preheat your oven to 350.
  5. Heat a cast iron pan on high heat. When it’s good and hot, sear the steak, about 3 minutes on each side. Then transfer it to a baking sheet, and put it in the oven for 10 minutes. At the end of the ten minutes, remove it, and transfer it to a cutting board, to rest for about ten minutes.
  6. Heat 1 tablespoon of the butter in a saucepan. Add in the shallots, and cook until they turn translucent. Add in whatever drippings are left on the baking sheet, and the port wine, and reduce nearly all of the liquid away. Then add the chicken stock. When it boils, add in salt to taste, and then remove from the heat. Add in the remaining butter and stir until it melts.
  7. While the steak is resting, add the butter and cheese to the polenta, and stir it in.
  8. Slice the steak against the grain.
  9. On each plate, put a nice mound of polenta, and a helping of the steak. Then drizzle the sauce over the steak, and a little bit of extra virgin olive oil over the polenta.
  10. Eat!

Define Distance Differently: the P-adic norm

As usual, sorry for the delay. Most of the time when I write this blog, I’m writing about stuff that I know about. I’m learning about the p-adic numbers as I write these posts, so it’s a lot more work for me. But I think many of you will be happy to hear about the other reason for the delay: I’ve been working on a book based on some of my favorite posts from the history of this blog. It’s nearly ready! I’ll have more news about getting pre-release versions of it later this week!

But now, back to p-adic numbers!

In the last post, we looked a bit at the definition of the p-adic numbers, and how p-adic arithmetic works. But what makes p-adic numbers really interesting and valuable is metrics.

Metrics are one of those ideas that are simultaneously simple and astonishingly complicated. The basic concept of a metric is straightforward: I’ve got two numbers, and I want to know how far apart they are. But it becomes complicated because it turns out that there are many different ways of defining metrics. We tend to think of metrics in terms of euclidean geometry and distance – the words that I use to describe metrics (“how far apart”) come from our geometric intuition. In math, though, you can’t ever just rely on intuition: you need to be able to define things precisely. And precisely defining a metric is difficult. It’s also fascinating: you can create the real numbers from the integers and rationals by defining a metric, and the metric will reveal the gaps between the rationals. Completing the metric – filling in those gaps – gives you the real numbers. Or, in fact, if you fill them in differently, the p-adic numbers.

To define just what a metric is, we need to start with fields and norms. A field is an abstract algebraic structure which describes the behavior of numbers. It’s an abstract way of talking about the basic structure of numbers with addition and multiplication operations. I’ve talked about fields before, most recently when I was debunking the crackpottery of E. E. Escultura here.

A norm is a generalization of the concept of absolute value. If you’ve got a field F, then a norm on F is a function |·| from values in F to the non-negative real numbers, satisfying three conditions:

  1. |x| = 0 if and only if x = 0.
  2. |xy| = |x|·|y|
  3. |x + y| ≤ |x| + |y|

A norm on F can be used to define a distance metric d(x, y) between x and y in F as | x - y|.

For example, the absolute value is clearly a norm over the real numbers, and it defines the euclidean distance between them.

So where do the gaps come from?

You can define a sequence a of values in F as a = {a_i}, indexed by the natural numbers. There’s a special kind of sequence called a Cauchy sequence, which is a sequence where lim_{i,j → ∞} |a_i - a_j| = 0.

You can show that any Cauchy sequence converges to a real number. But even if every element of a Cauchy sequence is a rational number, it’s pretty easy to show that many (in fact, most!) Cauchy sequences do not converge to rational numbers. There’s something in between the rational numbers which Cauchy sequences of rational numbers can converge to, but it’s not a rational number. When we talk about the gaps in the rational numbers, that’s what we mean. (Yes, I’m hand-waving a bit, but getting into the details
would be a distraction, and this is the basic idea!)

When you’re playing with number fields, the fundamental choice that you get is just how to fill in those gaps. If you fill them in using a metric based on a Euclidean norm, you get the real numbers. What makes the p-adic numbers is just a different norm, which defines a different metric.

The idea of the p-adic metric is that there’s another way of describing the distance between numbers. We’re used to thinking about distance measured like a ruler on a numberline, which is what gives us the reals. For the p-adics, we’re going to define distance in a different way, based on the structure of numbers. The way that the p-adic metric works is based on how a number is built relative to the prime-number base.

We define the p-adic metric in terms of the p-adic norm exactly the way that we defined Euclidean distance in terms of the absolute value norm. For the p-adic numbers, we start off with a norm on the integers, and then generalize it. In the P-adic integers, the norm of a number is based around the largest power of the base that’s a factor of that number: for an integer x, if p^n is the largest power of p that’s a factor of x, then the p-adic norm of x (written |x|_p) is p^{-n}. So the more times you multiply a number by the p-adic base, the smaller the p-adic norm of that number is.

The way we apply that to the rationals is to extend the definition of p-factoring: if p is our p-adic base, then we can define the p-adic norm of a rational number as:

  • |0|_p = 0
  • For other rational numbers x: |x|_p = p^{-ord_p(x)}, where:
    • If x is a natural number, then ord_p(x) is the exponent of the largest power of p that divides x.
    • If x is a rational number a/b, then ord_p(a/b) = ord_p(a) - ord_p(b).

Another way of saying that is based on a property of rational numbers and primes. For any prime number p, you can take any rational number x, and represent it as p^n·(a/b), where neither a nor b is divisible by p. That representation is unique – there’s only one possible set of values for a, b, and n where that’s true. In that case, the p-adic norm of x is |x|_p = p^{-n}.
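
Here’s a small sketch of that definition in code, just to make it concrete; the names ord_p and p_adic_norm are mine, not anything standard:

```python
from fractions import Fraction

def ord_p(x, p):
    # Exponent of the largest power of p "dividing" the rational x (x != 0):
    # for a fraction a/b, this is ord_p(a) - ord_p(b).
    x = Fraction(x)
    a, b, n = x.numerator, x.denominator, 0
    while a % p == 0:
        a, n = a // p, n + 1
    while b % p == 0:
        b, n = b // p, n - 1
    return n

def p_adic_norm(x, p):
    # |x|_p = p^(-ord_p(x)), with |0|_p = 0.
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** ord_p(x, p)

print(p_adic_norm(50, 5))               # 50 = 2 * 5^2, so |50|_5 = 1/25
print(p_adic_norm(Fraction(3, 25), 5))  # ord_5(3/25) = -2, so the norm is 25
```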

Ok, that’s a nice definition, but what on earth does it mean?

Two p-adic numbers x and y are close together if x - y is divisible by a large power of p.

In effect, this is the exact opposite of what we’re used to. In the real numbers, written out in decimal form as a series of digits, the metric says that the more digits numbers have in common moving from left to right, the closer together they are. So 9999 and 9998 are closer than 9999 and 9988.

But with P-adic numbers, it doesn’t work that way. Two P-adic numbers are closer together the longer the string of digits they share, reading from right to left. The distance ends up looking very strange. In 7-adic, the distance between 56666 and 66666 is smaller than the distance between 66665 and 66666!
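
Using the p_adic_norm sketch from above, you can check that example directly, reading the numerals as base-7 numbers:

```python
# d(x, y) = |x - y|_p, here with p = 7
a, b, c = int("66666", 7), int("56666", 7), int("66665", 7)
print(p_adic_norm(a - b, 7))  # 66666 - 56666 = 10000 (base 7) = 7^4, so distance 1/2401
print(p_adic_norm(a - c, 7))  # 66666 - 66665 = 1, so distance 1
```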

As strange as it looks, it does make a peculiar kind of sense. The p-adic distance is measuring a valuable and meaningful kind of distance between numbers – their distance in terms of
their relationship to the base prime number p. That leads to a lot of interesting stuff, much of which is, to be honest, well beyond my comprehension! For example, the Wiles proof of Fermat’s last theorem uses properties of the P-adic metric!

Without getting into anything as hairy as FLT, there are still ways of seeing why the p-adic metric is valuable. Next post, we’ll look at something called Hensel’s lemma, which both shows how something like Newton’s method for root-finding works in the p-adic numbers, and also shows some nice algebraic properties of root-finding that aren’t nearly as clear for the real numbers.

A Bad Mathematical Refutation of Atheism

At some point a few months ago, someone (sadly I lost their name and email) sent me a link to yet another Cantor crank. At the time, I didn’t feel like writing another Cantor crankery post, so I put it aside. Now, having lost it, I was using Google to try to find the crank in question. I didn’t, but I found something really quite remarkably idiotic.

(As a quick side-comment, my queue of bad-math-crankery is, sadly, empty. If you’ve got any links to something yummy, please shoot it to me at markcc@gmail.com.)

The item in question is this beauty. It’s short, so I’ll quote the whole beast.

MYTH: Cantor’s Set Theorem disproves divine omniscience

God is omniscient in the sense that He knows all that is not impossible to know. God knows Himself, He knows and does, knows every creature ideally, knows evil, knows changing things, and knows all possibilites. His knowledge allows free will.

Cantor’s set theorem is often used to argue against the possibility of divine omniscience and therefore against the existence of God. It can be stated:

  1. If God exists, then God is omniscient.
  2. If God is omniscient, then, by definition, God knows the set of all truths.
  3. If Cantor’s theorem is true, then there is no set of all truths.
  4. But Cantor’s theorem is true.
  5. Therefore, God does not exist.

However, this argument is false. The non-existence of a set of all truths does not entail that it is impossible for God to know all truths. The consistency of a plausible theistic position can be established relative to a widely accepted understanding of the standard model of Cantorian set theorem. The metaphysical Cantorian premises imply that Cantor’s theorem is inapplicable to the things that God knows. A set of all truths, if it exists, must be non-Cantorian.

The attempted disproof of God’s omniscience is, from a meta-mathematical standpoint, is inadequate to the extent that it doesn’t explain well-known mathematical contexts in which Cantor’s theorem is invalid. The “disproof” doesn’t acknowledge standard meta-mathematical conceptions that can analogically be used to establish the relative consistency of certain theistic positions. The metaphysical assertions concerning a set of all truths in the atheistic argument above imply that Cantor’s theorem is inapplicable to a set of all truths.

This is an absolute masterwork of crankery! It’s a remarkably silly argument on so many levels.

  1. The first problem is just figuring out what the heck he’s talking about! When you say “Cantor’s theorem”, what I think of is one of Cantor’s actual theorems: “For any set S, the powerset of S is larger than S.” But that is clearly not what he’s referring to. I did a bit of searching to make sure that this wasn’t my error, but I can’t find anything else called Cantor’s theorem.
  2. So what the heck does he mean by “Cantor’s set theorem”? From his text, it appears to be a statement something like: “there is no set of all truths”. The closest actual mathematical statement that I can come up with to match that is Gödel’s incompleteness theorem. If that’s what he means, then he’s messed it up pretty badly. The closest I can come to stating incompleteness informally is: “In any formal mathematical system that’s powerful enough to express Peano arithmetic, there will be statements that are true, but which cannot be proven”. It’s long, complex, not particularly intuitive, and it’s still not a particularly good statement of incompleteness.

    Incompleteness is a difficult concept, and as I’ve written about before, it’s almost impossible to state incompleteness in an informal way. When you try to do that, it’s inevitable that you’re going to miss some of its subtleties. When you try to take an informal statement of incompleteness, and reason from it, the results are pretty much guaranteed to be garbage – as he’s done. He’s using a mis-statement of incompleteness, and trying to reason from it. It doesn’t matter what he says: he’s trying to show that “Cantor’s set theorem” doesn’t disprove his notion of theism. Whether it does or not doesn’t matter: you can’t show that “Cantor’s set theorem”, or Gödel’s incompleteness theorem, or anything else fails to disprove a statement X if the thing you’re actually arguing about isn’t X.

  3. Ignoring his mis-identification of the supposed theorem, the way that he stated it is actually meaningless. When we talk about sets, we’re using the word set in the sense of either ZFC or NBG set theory. Mathematical set theory defines what a set is, using first order predicate logic. His version of “Cantor’s set theorem” talks about a set which cannot be a set!

    He wants to create a set of truths. In set theory terms, that’s something you’d define with the axiom of specification: you’d use a predicate ranging over your objects to select the ones in the set. What’s your predicate? Truth. At best, that’s going to be a second-order predicate. You can’t form sets using second-order predicates! The entire idea of “the set of truths” isn’t something that can be expressed in set theory.

  4. Let’s ignore the problems with his “Cantor’s theorem” for the moment. Let’s pretend that the “set of all truths” was well-defined and meaningful. How does his argument stand up? It doesn’t: it’s a terrible argument. It’s ultimately nothing more than “Because I say so!” hidden behind a collection of impressive-sounding words. The argument, ultimately, is that the set of all truths as understood in set theory isn’t the same thing as the set of all truths in theology (because he says that they’re different), therefore you can’t use a statement about the set of all truths from set theory to talk about the set of all truths in theology.
  5. I’ve saved what I think is the worst for last. The entire thing is a strawman. As a religious science blogger, I get almost as much mail from atheists trying to convince me that my religion is wrong as I do from Christians trying to convert me. After doing this blogging thing for six years, I’m pretty sure that I’ve been pestered with every argument, both pro- and anti-theistic, that you’ll find anywhere. But I’ve never actually seen this argument used anywhere except in articles like this one, which purport to show why it’s wrong. The entire argument being refuted is a total fake: no one actually argues that you should be an atheist using this piece of crap. It only exists in the minds of crusading religious folk who prop it up and then knock it down to show how smart they supposedly are, and how stupid the dirty rotten atheists are.

P-adic arithmetic

I’ve been having a pretty rough month. The day after Thanksgiving, I was diagnosed with shingles. Shingles is painful – very, very painful. Just the friction of clothing brushing over the shingles when I move caused so much pain that I spent weeks being as immobile as possible.

Needless to say, this has been extremely boring. When I wasn’t fuzzed out on pain-killers, I’ve been doing some reading on something called p-adic numbers. p-adics are a very strange kind of number: they’re strange-looking, strange-acting, and strangely useful.

P-adics are an alternative to the real numbers. In fact, in a way, they are the alternative to real numbers.

If you go back to number theory, there’s a reason why the rational numbers aren’t sufficient, and why we have to extend them to the reals by adding irrational numbers. If you take all of the possible rational numbers, you find that there are gaps – places where we know that there must be a number, and yet, if we limit ourselves to fractions, there isn’t anything to fit. For example, we know that there must be some number where if we multiply it by itself, it’s equal to 2. It’s more than 14/10ths, by about 1/100th. It’s more than 141/100ths, by about 4/1,000ths. It’s more than 1,414/1,000ths, by about 2/10,000ths. No matter how far we go, there’s no fraction that’s exactly right. But we know that there’s a number there! If we look at those gaps carefully, we’ll find that most numbers are actually in those gaps! The real numbers and the p-adics are both ways of creating number systems that allow us to define the numbers that fit into those gaps.

The easiest way to understand p-adic numbers is to think about how we represent real numbers in base-P. We represent real numbers using the powers of the base. So, for example, we’d write 24.31 in base-5 to mean 2*5^1 + 4*5^0 + 3*5^-1 + 1*5^-2. Using our normal notation for real numbers, we fill in the gaps by writing numbers as an integer part, followed by a decimal point, followed by a fractional part. For real numbers that aren’t rationals, we say that the fractional part goes on forever. So the square root of two starts 1.4142135624, and continues on and on, forever, without repeating. That gives us the real numbers. In that system, every number can be written using a finite number of digits to the left of the decimal point, and an infinite number of digits to the right.

P-adic numbers are exactly the opposite: every P-adic number has a finite number of digits to the right of the decimal, but it can have an infinite number to the left!

Defining p-adic numbers starts off being pretty similar to how we compute the representation of numbers in a standard numeric base. Take a number for your base, like 10. For a number n in base 10, take n modulo 10. That’s the right-most digit of your number. Divide by ten, dropping the fractional part, and take the result modulo 10, and that’s the second-rightmost digit. Continue this until there’s nothing left.

For example, take the number 222 in base-10. If we wanted to represent that in base-7, we’d do:

  1. If we divide 222 by 7, we get 31 with a remainder of 5. So the rightmost digit is 5.
  2. We take 31, and divide it by 7, giving us 4, with a remainder of 3. So the second digit is 3.
  3. We’re left with 4, so the last digit is 4.

So – 222 in base-7 is 435. It’s the same in 7-adic. For a particular base B, all positive integers are written the same in both base-B and B-adic. Integer arithmetic that doesn’t involve negative numbers is also the same.
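
That digit-extraction procedure is easy to write as code. Here’s a minimal sketch (the function name is my own):

```python
def digits_in_base(n, base):
    # Digits of a positive integer n in the given base, rightmost digit first:
    # repeatedly take n mod base, then divide n by base, exactly as described above.
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    return digits

print(digits_in_base(222, 7))  # [5, 3, 4] -- i.e. 435 written in the usual order
```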

There’s one really big catch, and it’s one that would make a lot of cranks (like E. E. Escultura) very happy: for real numbers, the decimal notation (for example) is just a representation – 35 in base 10 is written differently from 43 in base 8, but they’re the same actual number, just written differently. 35 in 10-adic is not the same number as 43 in 8-adic. As we’ll see when we get to metrics, they’re quite different. Each p-adic base produces a distinct system of p-adic numbers, and you can’t convert between them as freely as you can in the conventional reals. Decimal notation and hexadecimal notation are writing numbers in the same number system; 2-adic and 3-adic are different number systems!

The first essential difference between p-adic numbers and the reals comes when you try to do arithmetic.

As I said above, for integers, if you don’t
do anything with negative numbers, P-adic arithmetic is the same as real number integer arithmetic. In fact, if you’re working with P-adic numbers, there are no negative numbers at all! In a P-adic number system, subtracting 1 from 0 doesn’t give you -1. It “rolls over” like an infinite odometer. So for 7-adic, 0-1 = …666666666! That means that arithmetic gets a bit different. Actually, it’s really pretty easy: you just do arithmetic from the right to the left, exactly the way that you’re used to.

For addition and subtraction, P-adic works almost like normal real-number arithmetic using decimals. Addition is basically what you know from decimal arithmetic. Just go right to left, adding digits, and carrying to the left.

So, for example, in 5-adic, if you have a number …33333, and 24, to add them, you’d go through the following steps.

  1. 3 + 4 is 7, which is 12 in base-5. So the first digit of the sum is 2, and we carry 1.
  2. 3 + 2 is 5, plus the carried 1 is 6 – which is 11 in base-5. So the second digit is 1, and we carry 1.
  3. 3 + 0 is 3, plus the carried 1 is 4, so the third digit is 4.
  4. For all the rest of the infinite digits streaming off to the left, it’s 3+0=3.

So the sum is …3333412.

To do subtraction, it’s still basically the same as what you’re used to from decimal. There’s just one simple change: infinite borrowing. In normal subtraction, you can borrow from the position to your left if there’s anything to your left to borrow from. For example, in decimal, if you wanted to subtract 9 from 23, you’d borrow 10 from the 2, then subtract 9 from 13, giving you a result of 14. But if you wanted to subtract 3-9, you couldn’t borrow, because there’s nothing to the left to borrow from. In p-adic, you can always borrow. If you’re subtracting 3-9 in 10-adic, then you can borrow from the 0 to the left of the 3. Of course, there’s nothing there – so it needs to borrow from its left. And so on – giving you an infinite string of 9s. So 3-9 in 10-adic gives you …999999994.

Let’s do a full example: …33333 – 42 in 5-adic.

  1. As always, we start from the right. 3 – 2 = 1, so the first digit is 1.
  2. Since 3 is smaller than 4, we need to borrow 1 – so we have 13 base 5, which is 8. 8 – 4 = 4. So
    the second digit is 4.
  3. For the third digit, we just subtract the borrowed 1, so it’s 2.

So the result is …3333241.
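
A program can’t store infinitely many digits stretching off to the left, but it can work with a truncation: keeping only the rightmost k digits is the same as doing arithmetic modulo p^k. Here’s a rough sketch along those lines (the helper name is mine) that reproduces the two examples above:

```python
def to_digits(n, p, k):
    # The k rightmost p-adic digits of n, rightmost digit first.
    # Reducing mod p^k first makes negative numbers "roll over" correctly.
    n %= p ** k
    return [(n // p ** i) % p for i in range(k)]

p, k = 5, 7
x = int("3" * k, 5)                       # ...33333, truncated to k digits

print(to_digits(x + int("24", 5), p, k))  # [2, 1, 4, 3, 3, 3, 3] -> ...3333412
print(to_digits(x - int("42", 5), p, k))  # [1, 4, 2, 3, 3, 3, 3] -> ...3333241
print(to_digits(0 - 1, p, k))             # [4, 4, 4, 4, 4, 4, 4] -> ...44444, the 5-adic 0 - 1
```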

Multiplication and division get even stranger in P-adic. Because we can’t have an infinite number of digits to the right of the decimal, p-adic ends up representing fractions using infinite numbers of digits to the left of the decimal. And that means that we get collisions between fractions and negative numbers. (This should start to give you a clue why each p-adic base is really a different number system: the correspondence between roll-over negatives and infinitely long fractions is different in each base.) It’s easiest to see how this works by looking at a concrete example.

The fraction 1/3 can’t be written as a finite-length string in base-5. In 5-adic, that means we can’t write it using digits to the right of the decimal point – we would need an infinite number of digits, and we’re not allowed to do that. Instead, we need to write it with an infinite number of digits to the left! 1/3 in 5-adic is: …1313132.

Looks crazy, but it does work: if you do a tabular multiplication, right to left, multiplying …1313132 by 3 gives you one! Let’s work it through:

  • Start from the right: the rightmost digit is 2. 3*2 is 6, which is 11 in base 5; so the rightmost digit is 1, and you carry a one.
  • The next digit is 3: 3 times 3 is 9, plus the carried 1, gives 10 – which is 20 in base-5, so the next digit is a 0, and you carry 2.
  • The next digit is 1: 3*1 is 3 plus the carried 2 = 5; so 0, carry 1.

And so on: …13131313132 * 3 = 1, so …1313132 = 1/3 in 5-adic.
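
You can check this with the truncation trick from the earlier sketch: the rightmost k digits of 1/3 in 5-adic are just the base-5 digits of the inverse of 3 modulo 5^k. (This reuses to_digits from above, and pow(3, -1, m) needs Python 3.8 or later.)

```python
p, k = 5, 9
one_third = pow(3, -1, p ** k)         # inverse of 3 mod 5^9

print(to_digits(one_third, p, k))      # [2, 3, 1, 3, 1, 3, 1, 3, 1] -> ...131313132
print(to_digits(3 * one_third, p, k))  # [1, 0, 0, 0, 0, 0, 0, 0, 0] -> 1
```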

How can we compute the value of 1/3? Just like decimal real numbers: division.

Division in p-adics is actually easy. The trick is that, like all of the other arithmetic, it goes from right to left. Suppose we want to divide N by M. To make it easy, we’ll talk about the digits of M and N using subscripts: the rightmost digit of a number X is X_1, the second-rightmost is X_2, etc. The division algorithm is:

  1. Start at the rightmost digit of both numbers.
  2. Find the smallest digit d which, multiplied by M, has N_1 as its rightmost digit. That d is the next digit of the quotient.
  3. Subtract d*M from N.
  4. Drop the trailing zero digit from the result, giving N’.
  5. Now divide N’ by M to get the rest of the digits, putting d on the right.

Let’s walk through 1/3 in 5-adic:

  • The rightmost digit of 1 is 1.
  • What, multiplied by 3 will have a trailing digit of 1 in base-5? 2*3=6, which is 11 in base-5. So d = 2.
  • Now we subtract the “11” from 1 – which is really …00001. So it becomes …444440.
  • We drop the trailing 0, and N’ is …4444.
  • So now we divide …4444 by 3. What’s the smallest number which, multiplied by 3, ends in a 4 in base-5? 3*3=9, which is 14 in base-5. So the next digit of the result is 3.
  • Now, we subtract 14 from …4444. That gives us …44430. Drop the zero, and it’s …4443
  • Next digit is a 1, leaving …444

Crazy, huh? There is one catch about division: it only really works if the p-base in your p-adic system is a prime number. Otherwise, you get trouble – your p-adic system of numbers isn’t a field if p is non-prime.
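
Here’s a rough sketch of that right-to-left division in code, following the steps above. The function name is mine; it just produces the rightmost k digits of the quotient, and it assumes a prime base p and a divisor that isn’t a multiple of p:

```python
def p_adic_divide(n, m, p, k):
    # Rightmost k digits of n/m in the p-adics, rightmost digit first.
    digits = []
    for _ in range(k):
        target = n % p                # rightmost digit of what's left of N
        d = next(d for d in range(p) if (d * m) % p == target)
        digits.append(d)
        n = (n - d * m) // p          # subtract d*M, then drop the trailing zero
    return digits

print(p_adic_divide(1, 3, 5, 8))  # [2, 3, 1, 3, 1, 3, 1, 3] -> ...1313132, i.e. 1/3
```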

If you think about it, arithmetic with the P-adics is actually simpler than it is with conventional real numbers. Everything goes right to left. It’s all more uniform. In fact, p-adic has been seriously proposed as a number representation for computer hardware, because the hardware is much easier to build when everything can be done uniformly right to left!

There’s a lot more weirdness in the p-adics. We haven’t even started to talk about metrics and distances in p-adic numbers. That’s both where the p-adics get even stranger, and where they actually get useful. That’ll be the next post, hopefully within a day or two!

Let's Get Rid of Zero!

One of my tweeps sent me a link to a delightful pile of rubbish: a self-published “paper” by a gentleman named Robbert van Dalen that purports to solve the “problem” of zero. It’s an amusing pseudo-technical paper that defines a new kind of number which doesn’t work without the old numbers, and which gets rid of zero.

Before we start, why does Mr. van Dalen want to get rid of zero?

So what is the real problem with zeros? Zeros destroy information.

That is why they don’t have a multiplicative inverse: because it is impossible to rebuilt something you have just destroyed.

Hopefully this short paper will make the reader consider the author’s firm believe that: One should never destroy anything, if one can help it.

We practically abolished zeros. Should we also abolish simplifications? Not if we want to stay practical.

There’s nothing I can say to that.

So what does he do? He defines a new version of both integers and rational numbers. The new integers are called accounts, and the new rationals are called super-rationals. According to him, these new numbers get rid of that naughty information-destroying zero. (He doesn’t bother to define real numbers in his system; I assume that either he doesn’t know or doesn’t care about them.)

Before we can get to his definition of accounts, he starts with something more basic, which he calls “accounting naturals”.

He doesn’t bother to actually define them – he handwaves his way through, and sort-of defines addition and multiplication, with:

Addition
a + b == a concat b
Multiplication
a * b = a concat a concat a … (with b repetitions of a)

So… a sloppy definition of positive integer addition, and a handwave for multiplication.

What can we take from this introduction? Well, our author can’t be bothered to define basic arithmetic properly. What he really wants to say is, roughly, Peano arithmetic, with 0 removed. But my guess is that he has no idea what Peano arithmetic actually is, so he handwaves. The real question is, why did he bother to include this at all? My guess is that he wanted to pretend that he was writing a serious math paper, and he thinks that real math papers define things like this, so he threw it in, even though it’s pointless drivel.

With that rubbish out of the way, he defines an “Account” as his new magical integer, as a pair of “account naturals”. The first member of the pair is called the credit, and the second part is the debit. If the credit is a and the debit is b, then the account is written (a%b). (He used backslash instead of percent; but that caused trouble for my wordpress config, so I switched to percent-sign.)

Addition:
a%b ++ c%d = (a+c)%(b+d)
Multiplication
a%b ** c%d = ((a*c)+(b*d))%((a*d)+(b*c))
Negation
– a%b = b%a

So… for example, consider 5*6. We need an “account” for each: We’ll use (7%2) for 5, and (9%3) for 6, just to keep things interesting. That gives us: 5*6 = (7%2)*(9%3) = (63+6)%(21+18) = 69%39, or 30 in regular numbers.
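
Just to make the bookkeeping concrete, here’s a tiny sketch of accounts in code, using his addition, multiplication, and negation rules. The class and method names are mine, not his:

```python
class Account:
    # An "account" is a pair (credit % debit) of naturals.
    def __init__(self, credit, debit):
        self.credit, self.debit = credit, debit

    def __add__(self, other):   # a%b ++ c%d = (a+c)%(b+d)
        return Account(self.credit + other.credit, self.debit + other.debit)

    def __mul__(self, other):   # a%b ** c%d = ((a*c)+(b*d))%((a*d)+(b*c))
        return Account(self.credit * other.credit + self.debit * other.debit,
                       self.credit * other.debit + self.debit * other.credit)

    def __neg__(self):          # -(a%b) = b%a
        return Account(self.debit, self.credit)

    def __repr__(self):
        return f"{self.credit}%{self.debit}"

    def value(self):            # the ordinary integer this account stands for
        return self.credit - self.debit

five, six = Account(7, 2), Account(9, 3)
print(five * six)            # 69%39
print((five * six).value())  # 30
```

Note that the multiplication really does cost four ordinary multiplications and two additions, which is the point being made just below.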

Yippee, we’ve just redefined multiplication in a way that makes us use good old natural number multiplication, only now we need to do it four times, plus 2 additions, to multiply two numbers! Wow, progress! (Of a sort. I suppose that if you’re a cloud computing provider, where you’re renting CPUs, then this would be progress.)

Oh, but that’s not all. See, each of these “accounts” isn’t really a number. The numbers are equivalence classes of accounts. So once you get the result, you “simplify” it, to make it easier to work with.

So make that 4 multiplications, 2 additions, and one subtraction. Yeah, this is looking nice, huh?

So… what does it give us?

As far as I can tell, absolutely nothing. The author promises that we’re getting rid of zero, but it sure looks like this has zeros: 1%1 is zero, isn’t it? (And even if we pretend that there is no zero, since Mr. van Dalen specifically doesn’t define division on accounts, we don’t even get anything nice like closure.)

But here’s where it gets really rich. See, this is great, cuz there’s no zero. But as I just said, it looks like 1%1 is 0, right? Well it isn’t. Why not? Because he says so, that’s why! Really. Here’s a verbatim quote:

An Account is balanced when Debit and Credit are equal. Such a balanced Account can be interpreted as (being in the equivalence class of) a zero but we won’t.

Yeah.

But, according to him, we don’t actually get to see these glorious benefits of no zero until we add rationals. But not just any rationals, dum-ta-da-dum-ta-da! super-rationals. Why super-rationals, instead of account rationals? I don’t know. (I’m imagining a fraction with blue tights and a red cape, flying over a building. That would be a lot more fun than this little “paper”.)

So let’s look as the glory that is super-rationals. Suppose we have two accounts, e = a%b, and f = c%d. Then a “super-rational” is a ratio like e/f.

So… we can now define arithmetic on the super-rationals:

Addition
e/f +++ g/h = ((e**h)++(g**f))/(f**h); or in other words, pretty much exactly what we normally do to add two fractions. Only now those multiplications are much more laborious.
Multiplication
e/f *** g/h = (e**g)/(f**h); again, standard rational mechanics.
Multiplication Inverse (aka Reciprocal)
`e/f = f/e; (he introduces this hideous notation for no apparent reason – backquote is reciprocal. Why? I guess for the same reason that he did ++ and +++ – aka, no particularly good reason.)

So, how does this actually help anything?

It doesn’t.

See, zero is now not really properly defined anymore, and that’s what he wants to accomplish. We’ve got the simplified integer 0 (aka “balance”), defined as 1%1. We’ve got a whole universe of rational pseudo-zeros – 0/1, 0/2, 0/3, 0/4, all of which are distinct. In this system, (1%1)/(4%2) (aka 0/2) is not the same thing as (1%1)/(5%2) (aka 0/3)!

The “advantage” of this is that if you work through this stupid arithmetic, you essentially get something sort-of close to 0/0 = 0. Kind-of. (There’s no rule for converting a super-rational to an account; but assuming that if the denominator is 1, you can eliminate it, you get 1/0 = 0.)

I’m guessing that he intends identities to apply, so: (4%1)/(1%1) = ((4%1)/(2%1)) *** `((2%1)/(1%1)) = ((4%1)/(2%1)) *** ((1%1)/(2%1)) = (1%1)/(2%1). So 1/0 = 0/1 = 0… If you do the same process with 2/0, you end up getting the result being 0/2. And so on. So we’ve gotten closure over division and reciprocal by getting rid of zero, and replacing it with an infinite number of non-equal pseudo-zeros.

What’s his answer to that? Of course, more hand-waving!

Note that we also can decide to simplify a Super- Rational as we would a Rational by calculating the Greatest Common Divisor (GCD) between Numerator and Denominator (and then divide them by their GCD). There is a catch, but we leave that for further research.

The catch that he just waved away? Exactly what I just pointed out – an infinite number of pseudo-0s, unless, of course, you admit that there is a zero, in which case they all collapse down to be zero… in which case this is all pointless.

Essentially, this is all a stupidly overcomplicated way of saying something simple, but dumb: “I don’t like the fact that you can’t divide by zero, and so I want to define x/0=0.”

Why is that stupid? Because dividing by zero is undefined for a reason: it doesn’t mean anything! The nonsense of it becomes obvious when you really think about identities. If 4/2 = 2, then 2*2=4; if x/y=z, then x=z*y. But mix zero in to that: if 4/0 = 0, then 0*0=4. That’s nonsense.

You can also see it by rephrasing division in english. Asking “what is four divided by two” is asking “If I have 4 apples, and I want to distribute them into 2 equal piles, how many apples will be in each pile?”. If I say that with zero, “I want to distribute 4 apples into 0 piles, how many apples will there be in each pile?”: you’re not distributing the apples into piles. You can’t, because there’s no piles to distribute them to. That’s exactly the point: you can’t divide by zero.

If you do as Mr. van Dalen did, and basically define x/0 = 0, you end up with a mess. You can handwave your way around it in a variety of ways – but they all end up breaking things. In the case of this account nonsense, you end up replacing zero with an infinite number of pseudo-zeros which aren’t equal to each other. (Or, if you define the pseudo-zeros as all being equal, then you end up with a different mess, where (2/0)/(4/0) = 2/4, or other weirdness, depending on exactly how you define things.)

The other main approach is another pile of nonsense I wrote about a while ago, called nullity. Zero is an inevitable necessity to make numbers work. You can hate the fact that division by zero is undefined all you want, but the fact is, it’s both necessary and right. Division by zero doesn’t mean anything, so mathematically, division by zero is undefined.

For every natural number N, there's a Cantor Crank C(n)

More crankery? Of course! What kind? What else? Cantor crankery!

It’s amazing that so many people are so obsessed with Cantor. Cantor just gets under peoples’ skin, because it feels wrong. How can there be more than one infinity? How can it possibly make sense?

As usual in math, it all comes down to the axioms. In most math, we’re working from a form of set theory – and the result of the axioms of set theory are quite clear: the way that we define numbers, the way that we define sizes, this is the way it is.

Today’s crackpot doesn’t understand this. But interestingly, the focus of his problem with Cantor isn’t the diagonalization. He thinks Cantor went wrong way before that: Cantor showed that the set of even natural numbers and the set of all natural numbers are the same size!

Unfortunately, his original piece is written in Portuguese, and I don’t speak Portuguese, so I’m going from a translation, here.

The Brazilian philosopher Olavo de Carvalho has written a philosophical “refutation” of Cantor’s theorem in his book “O Jardim das Aflições” (“The Garden of Afflictions”). Since the book has only been published in Portuguese, I’m translating the main points here. The enunciation of his thesis is:

Georg Cantor believed to have been able to refute Euclid’s fifth common notion (that the whole is greater than its parts). To achieve this, he uses the argument that the set of even numbers can be arranged in biunivocal correspondence with the set of integers, so that both sets would have the same number of elements and, thus, the part would be equal to the whole.

And his main arguments are:

It is true that if we represent the integers each by a different sign (or figure), we will have a (infinite) set of signs; and if, in that set, we wish to highlight with special signs, the numbers that represent evens, then we will have a “second” set that will be part of the first; and, being infinite, both sets will have the same number of elements, confirming Cantor’s argument. But he is confusing numbers with their mere signs, making an unjustifiable abstraction of mathematical properties that define and differentiate the numbers from each other.

The series of even numbers is composed of evens only because it is counted in twos, i.e., skipping one unit every two numbers; if that series were not counted this way, the numbers would not be considered even. It is hopeless here to appeal to the artifice of saying that Cantor is just referring to the “set” and not to the “ordered series”; for the set of even numbers would not be comprised of evens if its elements could not be ordered in twos in an increasing series that progresses by increments of 2, never of 1; and no number would be considered even if it could be freely swapped in the series of integeres.

He makes two arguments, but they both ultimately come down to: “Cantor contradicts Euclid, and his argument just can’t possibly make sense, so it must be wrong”.

The problem here is: Euclid, in “The Elements”, wrote several different collections of axioms. One of them was the following five rules, the common notions:

  1. Things which are equal to the same thing are also equal to one another.
  2. If equals be added to equals, the wholes are equal.
  3. If equals be subtracted from equals, the remainders are equal.
  4. Things which coincide with one another are equal to one another.
  5. The whole is greater than the part.

The problem that our subject has is that Euclid’s axiom isn’t an axiom of mathematics. Euclid proposed it, but it doesn’t work in number theory as we formulate it. When we do math, the axioms that we start with do not include this axiom of Euclid.

In fact, Euclid’s axioms aren’t what modern math considers axioms at all. These aren’t really primitive ground statements. Most of them are statements that are provable from the actual axioms of math. For example, the second and third axioms are provable using the axioms of Peano arithmetic. The fourth one doesn’t appear to be a statement about numbers at all; it’s a statement about geometry. And in modern terms, the fifth one is either a statement about geometry, or a statement about measure theory.

The first argument is based on some strange notion of signs distinct from numbers. I can’t help but wonder if this is an error in translation, because the argument is so ridiculously shallow. Basically, it concedes that Cantor is right if we’re considering the representations of numbers, but then goes on to draw a distinction between representations (“signs”) and the numbers themselves, and argues that for the numbers, the argument doesn’t work. That’s the beginning of an interesting argument: numbers and the representations of numbers are different things. It’s definitely possible to make profound mistakes by confusing the two. You can prove things about representations of numbers that aren’t true about the numbers themselves. Only he doesn’t actually bother to make an argument beyond simply asserting that Cantor’s proof only works for the representations.

That’s particularly silly because Cantor’s proof that the even naturals and the naturals have the same cardinality doesn’t talk about representation at all. It shows that there’s a 1 to 1 mapping between the even naturals and the naturals. Period. No “signs”, no representations.

The second argument is, if anything, even worse. It’s almost the rhetorical equivalent of sticking his fingers in his ears and shouting “la la la la la”. Basically – he says that when you’re producing the set of even naturals, you’re skipping things. And if you’re skipping things, those things can’t possibly be in the set that doesn’t include the skipped things. And if there are things that got skipped and left out, well, that means that it’s ridiculous to say that the set that included the left out stuff is the same size as the set that omitted the left out stuff, because, well, stuff got left out!!!

Here’s the point. Math isn’t about intuition. The properties of infinitely large sets don’t make intuitive sense. That doesn’t mean that they’re wrong. Things in math are about formal reasoning: starting with a valid inference system and a set of axioms, and then using the inference to reason. If we look at set theory, we use the axioms of ZFC. And using the axioms of ZFC, we define the size (or, technically, the cardinality) of sets. Using that definition, two sets have the same cardinality if and only if there is a one-to-one mapping between the elements of the two sets. If there is, then they’re the same size. Period. End of discussion. That’s what the math says.

Cantor showed, quite simply, that there is such a mapping:

{ (i → i×2) | i ∈ N }

There it is. It exists. It’s simple. It works, by the axioms of Peano arithmetic and the axiom of comprehension from ZFC. It doesn’t matter whether it fits your notion of “the whole is greater than the part”. The entire proof is that set comprehension. It exists. Therefore the two sets have the same size.

Debunking Two Nate Silver Myths

I followed our election pretty closely. My favorite source of information was Nate Silver. He’s a smart guy, and I love the analysis that he does. He’s using solid math in a good way to produce excellent results. But in the aftermath of the election, I’ve seen a lot of bad information going around about him, his methods, and his result.

First: I keep seeing proclamations that “Nate Silver proves that big data works”.

Rubbish.

There is nothing big data about Nate’s methods. He’s using straightforward Bayesian methods to combine data, and the number of data points is remarkably small.

Big data is one of the popular jargon keywords that people use to appear smart. But it does actually mean something. Big data is using massive quantities of information to find patterns: using a million data points isn’t really big data. Big data means terabytes of information, and billions of datapoints.

When I was at Google, I did log analysis. We ran thousands of machines every day on billions of log records (I can’t say the exact number, but it was in excess of 10 billion records per day) to extract information. It took a data center with 10,000 CPUs running full-blast for 12 hours a day to process a single day’s data. Using that data, we could extract some obvious things – like how many queries per day for each of the languages that Google supports. We could also extract some very non-obvious things that weren’t explicitly in the data, but that were inferrable from the data – like probable network topologies of the global internet, based on communication latencies. That’s big data.

For another example, look at this image produced by some of my coworkers. At foursquare, we get about five million points of checkin data every day, and we’ve got a total of more than 2 1/2 billion data points. By looking at average checkin densities, and then comparing that to checkin densities after the hurricane, we can map out precisely where in the city there was electricity, and where there wasn’t. We couldn’t do that by watching one person, or a hundred people. But by looking at the patterns in millions and millions of records, we can. That is big data.

This doesn’t take away from Nate’s accomplishment in any way. He used data in an impressive and elegant way. The fact is, he didn’t need big data to do this. Elections are determined by aggregate behavior, and you just don’t need big data to predict them. The data that Nate used was small enough that a person could do the analysis of it with paper and pencil. It would be a huge amount of work to do by hand, but it’s just nowhere close to the scale of what we call big data. And trying to do big data would have made it vastly more complicated without improving the result.

Second: there are a bunch of things like this.

The point that many people seem to be missing is that Silver was not simply predicting who would win in each state. He was publishing the odds that one or the other candidate would win in each statewide race. That’s an important difference. It’s precisely this data, which Silver presented so clearly and blogged about so eloquently, that makes it easy to check on how well he actually did. Unfortunately, these very numbers also suggest that his model most likely blew it by paradoxically underestimating the odds of President Obama’s reelection while at the same time correctly predicting the outcomes of 82 of 83 contests (50 state presidential tallies and 32 of 33 Senate races).

Look at it this way, if a meteorologist says there a 90% chance of rain where you live and it doesn’t rain, the forecast wasn’t necessarily wrong, because 10% of the time it shouldn’t rain – otherwise the odds would be something other than a 90% chance of rain. One way a meteorologist could be wrong, however, is by using a predictive model that consistently gives the incorrect probabilities of rain. Only by looking a the odds the meteorologist gave and comparing them to actual data could you tell in hindsight if there was something fishy with the prediction.

Bzzt. Sorry, wrong.

There are two main ways of interpreting probability data: frequentist, and Bayesian.

In a frequentist interpretation, when you say that an outcome of an event has an X% probability of occurring, you’re saying that if you were to run an infinite series of repetitions of the event, then on average, the outcome would occur in X out of every 100 events.

The Bayesian interpretation doesn’t talk about repetition or observation. What it says is: for any specific event, it will have one outcome. There is no repetition. But given the current state of information available to me, I can have a certain amount of certainty about whether or not the event will occur. Saying that I assign probability P% to an event doesn’t mean that I expect my prediction to fail (100-P)% of the time. It just means that given the current state of my knowledge, I expect a particular outcome, and the information I know gives me that degree of certainty.

Bayesian statistics and probability are all about state of knowledge. The fundamental, defining theorem of Bayesian statistics is Bayes’ theorem, which tells you, given your current state of knowledge and a new piece of information, how to update your knowledge based on what the new information tells you. Getting more information doesn’t change anything about whether or not the event will occur: it will have one outcome or the other. But new information can allow you to improve your prediction and your certainty of that prediction’s correctness.
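
For reference, the theorem itself is small. For a hypothesis H and a new piece of evidence E:

    P(H | E) = P(E | H) · P(H) / P(E)

P(H) is your certainty before seeing the evidence, and P(H | E) is your updated certainty after taking it into account.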

The author that I quoted above is being a frequentist. In another section of his article, he’s more specific:

…The result is P= 0.199, which means there’s a 19.9% chance that it rained every day that week. In other words, there’s an 80.1% chance it didn’t rain on at least one day of the week. If it did in fact rain every day, you could say it was the result of a little bit of luck. After all, 19.9% isn’t that small a chance of something happening.

That’s a frequentist interpretation of the probability – which makes sense, since as a physicist, the author is mainly working with repeated experiments, which are a great place for frequentist interpretation. But looking at the same data, a Bayesian would say: “I have a 19.9% certainty that it will rain today”. Then they’d go look outside, see the clouds, and say “OK, so it looks like rain – that means that I need to update my prediction. Now I’m 32% certain that it will rain”. Note that nothing about the weather has changed: it’s not true that before looking at the clouds, 80.1 percent of the time it wouldn’t rain, and after looking, that changed. The actual fact of whether or not it will rain on that specific day didn’t change.
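
Here’s a minimal sketch of that update in code. The 19.9% prior comes from the quoted article; the likelihoods for “seeing clouds” are numbers I’ve made up so that the posterior lands near the 32% figure above (they’re illustrative, not meteorology).

    # A minimal sketch of the Bayesian update described above. The 19.9%
    # prior comes from the quoted article; the cloud likelihoods are
    # invented so the posterior lands near the 32% figure in the text.
    def bayes_update(prior, p_evidence_if_rain, p_evidence_if_no_rain):
        """Return P(rain | evidence) via Bayes' theorem."""
        numerator = p_evidence_if_rain * prior
        denominator = numerator + p_evidence_if_no_rain * (1 - prior)
        return numerator / denominator

    prior_rain = 0.199          # certainty of rain before looking outside
    p_clouds_if_rain = 0.90     # invented: clouds are very likely if it will rain
    p_clouds_if_no_rain = 0.47  # invented: clouds are fairly common even if it won't

    posterior = bayes_update(prior_rain, p_clouds_if_rain, p_clouds_if_no_rain)
    print(f"Updated certainty of rain: {posterior:.0%}")  # roughly 32%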

Another way of looking at this is to say that a frequentist believes that a given outcome has an intrinsic probability of occurring, and that our attempts to analyze it just bring us closer to the true probability; whereas a Bayesian says that there is no such thing as an intrinsic probability, because every event is different. All that changes is our ability to make predictions with confidence.

One last metaphor, and I’ll stop. Think about playing craps, where you’re rolling two six-sided dice. For a particular die, a frequentist would say “A fair die has a 1 in 6 chance of coming up with a 1”. A Bayesian would say “If I don’t know anything else, then my best guess is that I can be 16% certain that a 1 will result from a roll.” The result is the same – but the reasoning is different. And because of the difference in reasoning, you can produce different predictions.

Nate Silver’s predictions of the election are a beautiful example of Bayesian reasoning. He watched daily polls, and each time a new poll came out, he took the information from that poll, weighted it according to the historical reliability of that poll in that situation, and then used that to update his certainty. So based on his data, Nate was 90% certain that his prediction was correct.
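
To be clear about what that looks like mechanically, here’s a loose sketch of the reliability-weighting step. This is not Nate Silver’s actual model; the pollster names, margins, and weights below are invented.

    # A loose sketch of reliability-weighted poll averaging; this is not
    # Nate Silver's actual model. Pollster names, margins, and weights
    # are invented for illustration.
    polls = [
        # (pollster, candidate's margin in points, historical reliability weight)
        ("Poll A", +2.0, 0.9),
        ("Poll B", +0.5, 0.6),
        ("Poll C", -1.0, 0.3),
    ]

    total_weight = sum(weight for _, _, weight in polls)
    weighted_margin = sum(margin * weight for _, margin, weight in polls) / total_weight
    print(f"Reliability-weighted margin: {weighted_margin:+.2f} points")  # +1.00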

Did Global Warming Cause Hurricane Sandy?

I’ve been trapped in post-storm hell (no power, no heat for 10 days. Now power is back, but still no internet at home, which is frustrating, but no big deal), and so I haven’t been able to post this until now.

I’ve been getting a bunch of questions from people in response to an earlier post of mine about global warming, where I said that we can’t blame specific weather events on global warming. The questions come down to: “Can we say that Hurricane Sandy and yesterday’s nor’easter were caused by global warming?”

I try to be really careful about things like this. Increasing the amount of energy in the environment definitely has an effect on weather patterns. But for the most part, that effect is statistical. That is, we can’t generally say that a specific extreme weather event wouldn’t have happened without global warming. We can just say that we expect extreme weather events to become much more common.

But what about Hurricane Sandy?

Yes, it was caused by global warming.

How can I say that so definitively?

There were a lot of observations made around this particular hurricane. What made it such a severe event was a combination of three primary factors.

  • The ocean water over which it developed was warmer than historically normal. Warm water is, simply, fuel for hurricanes. We know this from years of observation. And we know that the water was warmer, by a couple of degrees, than it would normally be in this season. This is a direct cause of the power of the storm, and of the fact that as it moved north, it continued to become stronger rather than weakening. Those warm waters are, by definition, global warming: they’re one of the things we measure when we’re measuring global temperature trends.
  • Hurricane Sandy took a pretty dramatic left turn as it came north, which is what swept it into the east coast of the US. That is a very unusual trajectory. Why did it do that? Because of an unusual weather pattern in the Northeast Atlantic, called a negative North Atlantic oscillation (-NAO). And where did the -NAO come from? Our best models strongly suggest that it resulted, at least in part, from ice melt from Greenland. This is less certain than the first factor, but still likely enough that we can be pretty confident.
  • Hurricane Sandy merged with another weather front as it came inland, which intensified it as it came ashore. This one doesn’t have any direct relation to global warming: the front that it merged with is typical autumn weather on the east coast.

So of the three factors that caused the severe hurricane, one of them is absolutely, undeniably global warming. The second is very probably linked to global warming. And the third isn’t.

This is important to understand. We shouldn’t make broad statements about causation when we can’t prove them. But we also shouldn’t refrain from making definitive statements about causation when we can.

The nor’easter that we’re now recovering from falls into that first class. We simply don’t know if it would have happened without the hurricane. The best models that I’ve seen suggest that it probably wouldn’t have happened without the effects of the earlier hurricane, but it’s just not certain enough to draw a definitive conclusion.

But the Hurricane? There is absolutely no way that anyone can honestly look at the data, and conclude that it was not caused by warming. Anyone who says otherwise is, quite simply, a liar.

Everyone should program, or Programming is Hard? Both!

I saw something on twitter a couple of days ago, and I promised to write this blog post about it. As usual, I’m behind on all the stuff I want to do, so it took longer to write than I’d originally planned.

My professional specialty is understanding how people write programs. Programming languages, development environments, code management tools, code collaboration tools, and so on: that’s my bread and butter.

So, naturally, this ticked me off.

The article starts off by, essentially, arguing that most of the programming tutorials on the web stink. I don’t entirely agree with that, but to me, it’s not important enough to argue about. But here’s where things go off the rails:

But that’s only half the problem. Victor thinks that programming itself is broken. It’s often said that in order to code well, you have to be able to “think like a computer.” To Victor, this is absurdly backwards– and it’s the real reason why programming is seen as fundamentally “hard.” Computers are human tools: why can’t we control them on our terms, using techniques that come naturally to all of us?

And… boom! My head explodes.

For some reason, so many people have this bizarre idea that programming is this really easy thing that programmers just make difficult out of spite or elitism or cluelessness or something; I’m not sure what. And as long as I’ve been in the field, there’s been a constant drumbeat of people saying that it’s all easy, that programmers just want to make it difficult by forcing you to think like a machine, and that what we really need to do is just humanize programming, so that it will all be easy, everyone will do it, and the world will turn into a perfect computing utopia.

First, the whole “think like a machine” thing is a verbal shorthand that attempts to make programming as we do it sound awful. It’s not just hard to program, but those damned engineers are claiming that you need to dehumanize yourself to do it!

To be a programmer, you don’t need to think like a machine. But to program successfully, you do need to understand how machines work – because what you’re really doing is building a machine!

When you’re writing a program, on a theoretical level, what you’re doing is designing a machine that performs some mechanical task. That’s really what a program is: it’s a description of a machine. And what a programming language is, at heart, is a specialized notation for describing a particular kind of machine.

No one would go to an automotive engineer and tell him that there’s something wrong with the way transmissions are designed because they force you to understand how gears work. But that’s pretty much exactly the argument that Victor is making.

How hard is it to program? That all depends on what you’re trying to do. Here’s the thing: the complexity of the machine that you need to build is what determines the complexity of the program. If you’re trying to build a really complex machine, then a program describing it is going to be really complex.

Period. There is no way around that. That is the fundamental nature of programming.

In the usual argument, one thing that I constantly see is something along the lines of “programming isn’t plumbing: everyone should be able to do it”. And my response to that is: of course. Just like everyone should be able to do their own plumbing.

That sounds like an amazingly stupid thing to say. Especially coming from me: the one time I tried to fix my broken kitchen sink, I did over a thousand dollars worth of damage.

But: plumbing isn’t just one thing. It’s lots of related but different things:

  • There are people who design plumbing systems for managing water distribution and waste disposal for an entire city. That’s one aspect of plumbing. And that’s an incredibly complicated thing to do, and I don’t care how smart you are: you’re not going to be able to do it well without learning a whole lot about how plumbing works.
  • Then there are people who design the plumbing for a single house. That’s plumbing, too. That’s still hard, and requires a lot of specialized knowledge, most of which is pretty different from what the city designer needs.
  • There are people who don’t design plumbing, but are able to build the full plumbing system for a house from scratch using plans drawn by a designer. Once again, that’s still plumbing. But it’s yet another set of knowledge and skills.
  • There are people who can come into a house when something isn’t working, and without ever seeing the design, and figure out what’s wrong, and fix it. (There’s a guy in my basement right now, fixing a drainage problem that left my house without hot water, again! He needed to do a lot of work to learn how to do that, and there’s no way that I could do it myself.) That’s yet another set of skills and knowledge – and it’s still plumbing.
  • There are non-professional people who can fix leaky pipes, and replace damaged bits. With a bit of work, almost anyone can learn to do it. Still plumbing. But definitely: everyone really should be able to do at least some of this.

  • And there are people like me who can use a plumbing snake and a plunger when the toilet clogs. That’s still plumbing, but it requires no experience and no training, and absolutely everyone should be able to do it, without question.

All of those things involve plumbing, but they require vastly different amounts and kinds of training and experience.

Programming is exactly the same. There are different kinds of programming, which require different kinds of skills and knowledge. The tools and training methods that we use are vastly different for those different kinds of programming – so different that for many of them, people don’t even realize that they are programming. Almost everyone who uses computers does do some amount of programming:

  • When someone puts together a presentation in powerpoint, with things that move around, appear, and disappear on your command: that is programming.
  • When someone puts a formula into a spreadsheet: that is programming.
  • When someone builds a website – even a simple one – and uses either a set of tools, or CSS and HTML, to put the site together: that is programming.
  • When someone writes a macro in Word or Excel: that is programming.
  • When someone sets up an autoresponder to answer their email while they’re on vacation: that is programming.

People like Victor completely disregard those things as programming, and then gripe about how all programming is supercomplexmagicalsymbolic gobbledygook. Most people do write programs without knowing about it, precisely because they’re doing it with tools that present the programming task as something that’s so natural to them that they don’t even recognize that they are programming.

But on the other hand, the idea that you should be able to program without understanding the machine you’re using or the machine that you’re building: that’s also pretty silly.

When you get beyond the surface, and start to get to doing more complex tasks, programming – like any skill – gets a lot harder. You can’t be a plumber without understanding how pipe connections work, what the properties of the different pipe materials are, and how things flow through them. You can’t be a programmer without understanding something about the machine. The more complicated the kind of programming task you want to do, the more you need to understand.

Someone who does PowerPoint presentations doesn’t need to know very much about the computer. Someone who wants to write spreadsheet macros needs to understand something about how the computer processes numbers, what happens to errors in calculations that use floating point, etc. Someone who wants to build an application like Word needs to know a whole lot about how a single computer works, including details like how the computer displays things to people. Someone who wants to build Google doesn’t need to know how computers render text clearly on the screen, but they do need to know how computers work, and also how networks and communications work.
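
The floating point issue, for instance, is easy to demonstrate. Here’s a small sketch of the kind of surprise a spreadsheet or macro author runs into:

    # Computers store most decimal fractions approximately, so naive
    # arithmetic can surprise a spreadsheet or macro author.
    from decimal import Decimal

    total = 0.1 + 0.2
    print(total)         # 0.30000000000000004, not 0.3
    print(total == 0.3)  # False

    # Repeated additions drift further from the "obvious" answer.
    pennies = sum(0.01 for _ in range(1_000_000))
    print(pennies)       # close to, but not exactly, 10000.0

    # One common fix is exact decimal arithmetic where it matters.
    print(Decimal("0.1") + Decimal("0.2"))  # 0.3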

To be clear, I don’t think that Victor is being dishonest. But the way that he presents things often does come off as dishonest, which makes it all the worse. To give one demonstration, he compares how we teach programming to how we teach cooking. He talks about how we’d teach people to make a soufflé. He shows a picture of raw ingredients on one side, and a fully baked soufflé on the other, and says, essentially: “This is how we teach people to program. We give them the raw ingredients, and say fool around with them until you get the soufflé.”

The thing is: that’s exactly how we really teach people to cook – the example is just taken far out of context. If we want them to be able to prepare exactly one recipe, then we give them complete, detailed, step-by-step instructions. But once they know the basics, we don’t do that anymore. We encourage them to start fooling around. “Yeah, that soufflé is great. But what would happen if I steeped some cardamom in the cream? What if I left out the vanilla? Would it turn out as good? Would that be better?” In fact, if you never do that experimentation, you’ll probably never learn to make a truly great soufflé! Because the ingredients are never exactly the same, and the way that it turns out is going to depend on the vagaries of your oven, the weather, the particular batch of eggs that you’re using, the amount of gluten in the flour, and so on.

Writing complicated programs is complicated. To write programs that manipulate symbolic data, you need to understand how the data symbolizes things. To write a program that manipulates numbers, you need to understand how the numbers work, and how the computer represents them. To build a machine, you need to understand the machine that you’re building. It’s that simple.