Oy Veh! Power Series, Analytic Continuations, and Riemann Zeta

After the whole Plait fiasco with the sum of the infinite series of natural numbers, I decided it would be interesting to dig into the real math behind that mess. That means digging into the Riemann zeta function, and the concept of analytic continuation.

A couple of caveats before I start:

  1. This is the area of math where I’m at my worst. I am not good at analysis. I’m struggling to understand this stuff well enough to explain it. If I screw up, please let me know in the comments, and I’ll do my best to update the main post promptly.
  2. This is way more complicated than most of the stuff I write on this blog. Please be patient, and try not to get bogged down. I’m doing my best to take something that requires a whole lot of specialized knowledge, and explain it as simply as I can.

What I’m trying to do here is to get rid of some of the mystery surrounding this kind of thing. When people think about math, they frequently get scared. They say things like “Math is hard, I can’t hope to understand it.”, or “Math produces weird results that make no sense, and there’s no point in my trying to figure out what it means, because if I do, my brain will explode. Only a super-genius geek can hope to understand it!”

That’s all rubbish.

Math is complicated, because it covers a whole lot of subjects. To understand the details of a particular branch of math takes a lot of work, because it takes a lot of special domain knowledge. But it’s not fundamentally different from many other things.

I’m a professional software engineer. I did my PhD in computer science, specializing in programming languages and compiler design. Designing and building a compiler is hard. To be able to do it well and understand everything that it does takes years of study and work. But anyone should be able to understand the basic concepts of what it does, and what the problems are.

I’ve got friends who are obsessed with baseball. They talk about ERAs, DIERAs, DRSs, EQAs, PECOTAs, Pythagorean expectations, secondary averages, UZRs… To me, it’s a huge pile of gobbledygook. It’s complicated, and to understand what any of it means takes some kind of specialized knowledge. For example, I looked up one of the terms I saw in an article by a baseball fan: “Peripheral ERA is the expected earned run average taking into account park-adjusted hits, walks, strikeouts, and home runs allowed. Unlike Voros McCracken’s DIPS, hits allowed are included.” I have no idea what that means. But it seems like everyone who loves baseball – including people who think that they can’t do their own income tax return because they don’t understand how to compute percentages – understands that stuff. They care about it, and since it means something in a field that they care about, they learn it. It’s not beyond their ability to understand – it just takes some background to be able to make sense of it. Without that background, someone like me feels lost and clueless.

That’s the way that math is. When you go to look at a result from complex analysis without knowing what complex analysis is, it looks like terrifyingly complicated nonsensical garbage, like “A meromorphic function is a function on an open subset of the complex number plane which is holomorphic on its domain except at a set of isolated points where it must have a Laurent series”.

And it’s definitely not easy. But understanding, in a very rough sense, what’s going on and what it means is not impossible, even if you’re not a mathematician.


Anyway, what the heck is the Riemann zeta function?

It’s not easy to give even the simplest answer to that in a meaningful way.

Basically, Riemann Zeta is a function which describes fundamental properties of the prime numbers, and therefore of our entire number system. You can use the Riemann Zeta to prove that there’s no largest prime number; you can use it to talk about the expected frequency of prime numbers. It occurs in various forms all over the place, because it’s fundamentally tied to the structure of the realm of numbers.

The starting point for defining it is a power series defined over the complex numbers (note that the parameter we use is s instead of a more conventional x: this is a way of highlighting the fact that this is a function over the complex numbers, not over the reals).

\zeta(s) = \sum_{n=1}^{\infty} n^{-s}
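
Where the real part of s is bigger than 1, that series converges, and you can get a feel for it just by adding up terms. Here’s a quick sketch in Python (the 100,000-term cutoff is just an arbitrary choice for illustration):

import math

# Approximate zeta(s) by summing the first "terms" terms of the series.
# This only makes sense where the series converges (real part of s > 1).
def zeta_partial(s, terms=100000):
    return sum(n ** -s for n in range(1, terms + 1))

# zeta(2) is known to equal pi^2/6, so we can sanity-check the approximation.
print(zeta_partial(2))    # about 1.64492
print(math.pi ** 2 / 6)   # about 1.64493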

This function \zeta is not the Riemann function!

The Riemann function is something called the analytic continuation of \zeta. We’ll get to that in a moment. Before doing that: why the heck should we care? I said it talks about the structure of numbers and primes, but how?

The zeta function actually has a lot of meaning. It tells us something fundamental about the properties of our number system – in particular, about the properties of the prime numbers. Euler proved that zeta is deeply connected to the prime numbers, using something now called the Euler product formula. It says that for every value of s where the series converges:

\sum_{n=1}^{\infty} n^{-s} = \prod_{p \in \textbf{Primes}} \frac{1}{1-p^{-s}}

Which is a way of saying that the Riemann zeta function can describe the distribution of the prime numbers.
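
You can watch that connection numerically. Here’s a hedged little sketch for s=2, comparing a partial sum of the series against a partial Euler product (the cutoffs of 100,000 terms and primes below 1000 are arbitrary choices; both sides creep toward the same value, \pi^2/6):

# A quick numeric check of the Euler product identity at s = 2.
def primes_up_to(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]

s = 2
series_side = sum(n ** -s for n in range(1, 100001))
product_side = 1.0
for p in primes_up_to(1000):
    product_side *= 1 / (1 - p ** -s)

print(series_side, product_side)   # both land close to pi^2/6, which is about 1.645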


To really understand the Riemann Zeta, you need to know how to do analytic continuation. And to understand that, you need to learn a lot of number theory and a lot of math from the specialized field called complex analysis. But we can describe the basic concept without getting that far into the specialized stuff.

What is an analytic continuation? This is where things get really sticky. Basically, there are places where one way of solving a problem produces a diverging infinite series. When that happens, you say there’s no solution, that the point where you’re trying to solve it isn’t in the domain of the problem. But if you solve it in a different way, you can find a way of getting a solution that works. You’re using an analytic process to extend the domain of the problem, and get a solution at a point where the traditional way of solving it wouldn’t work.


A nice way to explain what I mean by that requires taking a
diversion, and looking at a metaphor. What we’re talking about here isn’t analytical continuation; it’s a different way of extending the domain of a function, this time in the realm of the real numbers. But as an example, it illustrates the concept of finding a way to get the value of a function in a place where it doesn’t seem to be defined.

In math, we like to play with limits. One example of that is in differential calculus. What we do in differential
calculus is look at continuous curves, and ask: at one specific location on the curve, what’s the slope?

If you’ve got a line, the slope is easy to determine. Take any two points on the line: (x_1, y_1), (x_2, y_2), where x_1 < x_2. Then the slope is \frac{y_2 - y_1}{x_2 - x_1}. It’s easy, because for a line, the slope never changes.

If you’re looking at a curve more complex than a line, then slopes get harder, because they’re constantly changing. If you’re looking at y=x^2, and you zoom in and look at it very close to x=0, it looks like the slope is very close to 0. If you look at it close to x=1, it looks like it’s around 2. If you look at it at x=10, it looks like it’s a bit more than 20. But there are no two points where the slope is exactly the same!

So how can you talk about the slope at a particular point x=k? By using a limit. You pick a point really close to x=k, and call it x=k+epsilon. Then an approximate value of the slope at k is:

\frac{(k+\epsilon)^2 - k^2}{(k+\epsilon) - k}

The smaller epsilon gets, the closer your approximation gets. But you can’t actually get to \epsilon=0, because if you did, that slope equation would have 0 in the denominator, and it wouldn’t be defined! But it is defined for all non-zero values of \epsilon. No matter how small, no matter how close to zero, the slope is defined. But at zero, it’s no good: it’s undefined.

So we take a limit. As \epsilon gets smaller and smaller, the slope gets closer and closer to some value. So we say that the slope at the point – at the exact place where the denominator of that fraction becomes zero – is defined as:

 \lim_{\epsilon \rightarrow 0}  \frac{(k+\epsilon)^2 - k^2}{k+\epsilon - k} =

 \lim_{\epsilon \rightarrow 0}  \frac{  k^2 + 2k\epsilon + \epsilon^2 - k^2}{\epsilon} =

(Note: the original version of the previous line had a missing “-“. Thanks to commenter Thinkeye for catching it.)

 \lim_{\epsilon \rightarrow 0}  \frac{ 2k\epsilon + \epsilon^2}{\epsilon} =

Since epsilon is getting closer and closer to zero, epsilon^2 is getting smaller much faster; so we can treat it as zero:

 \lim_{\epsilon \rightarrow 0}  \frac{ 2k\epsilon}{\epsilon} = 2k

So at any point x=k, the slope of y=x^2 is 2k. Even though computing that involves dividing by zero, we’ve used an analytical method to come up with a meaningful and useful value at \epsilon=0. This doesn’t mean that you can divide by zero. You cannot conclude that \frac{2*0}{0} = 2. But for this particular analytical setting, you can come up with a meaningful solution to a problem that involves, in some sense, dividing by zero.
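
You can watch that limit converge numerically. A tiny sketch (k=3 is just an arbitrary point to look at):

# Watch the slope approximation of y = x^2 approach 2k as epsilon shrinks.
k = 3
for epsilon in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    # Slope of the line through (k, k^2) and (k + epsilon, (k + epsilon)^2).
    slope = ((k + epsilon) ** 2 - k ** 2) / epsilon
    print(epsilon, slope)
# Prints slopes of roughly 7.0, 6.1, 6.01, 6.001, 6.0001, closing in on 2k = 6.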


The limit trick in differential calculus is not analytic continuation. But it’s got a tiny bit of the flavor.

Moving on: the idea of analytic continuation comes from the field of complex analysis. Complex analysis studies a particular class of functions in the complex number plane. It’s not one of the easier branches of mathematics, but it’s extremely useful. Complex analytic functions show up all over the place in physics and engineering.

In complex analysis, people focus on a particular group of functions that are called analytic, holomorphic, and meromorphic. (Those three are closely related, but not synonymous.)

A holomorphic function is a function over complex variables which has one important property. The property is almost like a kind of abstract smoothness. In the simplest case, suppose that we have a complex function of a single variable, and the domain of this function is D. Then it’s holomorphic if, and only if, for every point d \in D, the function is complex differentiable in some neighborhood of points around d.

(Differentiable means, roughly, that using a trick like the one we did above, we can take the slope (the derivative) around d. In the complex number system, “differentiable” is a much stronger condition than it would be in the reals. In the complex realm, if something is differentiable, then it is infinitely differentiable. In other words, given a complex equation, if it’s differentiable, that means that I can create a curve describing its slope. That curve, in turn, will also be differentiable, meaning that you can derive an equation for its slope. And that curve will be differentiable. Over and over, forever: the derivative of a differentiable curve in the complex number plane will always be differentiable.)

If you have a differentiable curve in the complex number plane, it’s got one really interesting property: it’s representable as a power series. (This property is what it means for a function to be called analytic; all holomorphic functions are analytic.) That is, a function f is analytic on a set S if, for all points s in S, you can represent the value of the function as a power series for a disk of values around s:

 f(z) = \sum_{n=0}^{\infty} a_n(z-c)^n

In the simplest case, the constant c is 0, and it’s just:

 f(z) = \sum_{n=0}^{\infty} a_nz^n

(Note: In the original version of this post, I miswrote the basic pattern of a power series, and put both z and s in the base. Thanks to John Armstrong for catching it.)

The function that we wrote, above, for the base of the zeta function is exactly this kind of power series. Zeta is an analytic function for a particular set of values. Not all values in the complex number plane; just for a specific subset.

If a function f is holomorphic, then the strong differentiability of it leads to another property. There’s a unique extension to it that expands its domain. The expansion always produces the same value for all points that are within the domain of f. It also produces exactly the same differentiability properties. But it’s also defined on a larger domain than f was. It’s essentially what f would be if its domain weren’t so limited. If D is the domain of f, then for any given domain D', where D \subset D', there’s exactly one function with domain D' that’s an analytic continuation of f.
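
A concrete example of the idea, one that’s much simpler than zeta: the geometric power series \sum_{n=0}^{\infty} z^n only converges when |z| < 1, but inside that disk it’s equal to \frac{1}{1-z}, and \frac{1}{1-z} is defined everywhere except z=1. That function is the analytic continuation of the geometric series. A little sketch:

# The geometric series only converges for |z| < 1. Inside that disk it agrees
# with 1/(1-z); outside, the series blows up, but 1/(1-z), its analytic
# continuation, is still perfectly well defined.
def geometric_partial(z, terms=200):
    return sum(z ** n for n in range(terms))

def continuation(z):
    return 1 / (1 - z)

print(geometric_partial(0.5), continuation(0.5))     # both about 2.0
print(geometric_partial(0.5j), continuation(0.5j))   # both about 0.8+0.4j
print(continuation(3.0))   # -0.5: the continuation is defined at z = 3,
# even though geometric_partial(3.0) is an astronomically huge number; the
# series itself simply doesn't work there.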

Computing analytic continuations is not easy. This is heavy enough already, without getting into the details. But the important thing to understand is that if we’ve got a function f with an interesting set of properties, we’ve got a method that might be able to give us a new function g that:

  1. Everywhere that f(s) was defined, f(s) = g(s).
  2. Everywhere that f(s) was differentiable, g(s) is also differentiable.
  3. Everywhere that f(s) could be computed as a sum of an infinite power series, g(s) can also be computed as a sum of an infinite power series.
  4. g(s) is defined in places where f(s) and the power series for f(s) are not.

So, getting back to the Riemann Zeta function: we don’t have a proper closed form equation for zeta. What we have is the power series of the function that zeta is the analytic continuation of:

\zeta(s) = \sum_{n=1}^{\infty} n^{-s}

If s=-1, then the series for that function expands to:

\sum_{n=1}^{\infty} n^1 = 1 + 2 + 3 + 4 + 5 + ...

The power series is undefined at this point; the base function that we’re using, that zeta is the analytic continuation of, is undefined at s=-1. The power series is an approximation of the zeta function, which works over some specific range of values. But it’s a flawed approximation. It’s wrong about what happens at s=-1. The approximation says that value at s=-1 should be a non-converging infinite sum. It’s wrong about that. The Riemann zeta function is defined at that point, even though the power series is not. If we use a different method for computing the value of the zeta function at s=-1 – a method that doesn’t produce an incorrect result! – the zeta function has the value -\frac{1}{12} at s=-1.

Note that this is a very different statement from saying that the sum of that power series is -\frac{1}{12} at s=-1. We’re talking about fundamentally different functions! The Riemann zeta function at s=-1 does not expand to the power series that we used to approximate it.
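
If you just want to see the continued function’s value without doing any of the heavy analysis yourself, the mpmath Python library (assuming you have it installed) implements the fully continued Riemann zeta function:

from mpmath import zeta

# mpmath's zeta is the analytically continued Riemann zeta function, so it is
# defined at s = -1 even though the series is not.
print(zeta(2))    # about 1.6449, where the series also converges
print(zeta(-1))   # about -0.08333, which is -1/12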

In physics, if you’re working with some kind of system that’s described by a power series, you can come across the power series that produces the sequence that looks like the sum of the natural numbers. If you do, and if you’re working in the complex number plane, and you’re working in a domain where that power series occurs, what you’re actually using isn’t really the power series – you’re playing with the analytic zeta function, and that power series is a flawed approximation. It works most of the time, but if you use it in the wrong place, where that approximation doesn’t work, you’ll see the sum of the natural numbers. In that case, you get rid of that sum, and replace it with the correct value of the actual analytic function, not with the incorrect value of applying the power series where it won’t work.

Ok, so that warning at the top of the post? Entirely justified. I screwed up a fair bit at the end. The series that defines the value of the zeta function for some values, the series for which the Riemann zeta is the analytical continuation? It’s not a power series. It’s a series alright, but not a power series, and not the particular kind of series that defines a holomorphic or analytical function.

The underlying point, though, is still the same. That series (not power series, but series) is a partial definition of the Riemann zeta function. It’s got a limited domain, where the Riemann zeta’s domain doesn’t have the same limits. The series definition still doesn’t work at s=-1. The series is still undefined at s=-1. At s=-1, the series expands to 1 + 2 + 3 + 4 + 5 + 6 + ..., which doesn’t converge, and which doesn’t add up to any finite value, -1/12 or otherwise. That series does not have a value at s=-1. No matter what you do, that equation – the definition of that series – does not work at s=-1. But the Riemann Zeta function is defined in places where that equation isn’t. Riemann Zeta at s=-1 is defined, and its value is -1/12.

Despite my mistake, the important point is still that last sentence. The value of the Riemann zeta function at s=-1 is not the sum of the set of natural numbers. The equation that produces the sequence doesn’t work at s=-1. The definition of the Riemann zeta function doesn’t say that it should, or that the sum of the natural numbers is -1/12. It just says that the first approximation of the Riemann zeta function for some, but not all values, is given by a particular infinite sum. In the places where that sum works, it gives the value of zeta; in places where that sum doesn’t work, it doesn’t.

Bad Math from the Bad Astronomer

This morning, my friend Dr24Hours pinged me on twitter about some bad math.

And indeed, he was right. Phil Plait the Bad Astronomer, of all people, got taken in by a bit of mathematical stupidity, which he credulously swallowed and chose to stupidly expand on.

Let’s start with the argument from his video.


We’ll consider three infinite series:

S1 = 1 - 1 + 1 - 1 + 1 - 1 + ...
S2 = 1 - 2 + 3 - 4 + 5 - 6 + ...
S3 = 1 + 2 + 3 + 4 + 5 + 6 + ...

S1 is something called Grandi’s series. According to the video, taken to infinity, Grandi’s series alternates between 0 and 1. So to get a value for the full series, you can just take the average – so we’ll say that S1 = 1/2. (Note, I’m not explaining the errors here – just repeating their argument.)

Now, consider S2. We’re going to add S2 to itself. When we write it, we’ll do a bit of offset:

1 - 2 + 3 - 4 + 5 - 6 + ...
    1 - 2 + 3 - 4 + 5 - ...
==============================
1 - 1 + 1 - 1 + 1 - 1 + ...

So 2S2 = S1; therefore S2 = S1/2 = 1/4.

Now, let’s look at what happens if we take S3, and subtract S2 from it:

   1 + 2 + 3 + 4 + 5 + 6 + ...
- [1 - 2 + 3 - 4 + 5 - 6 + ...]
================================
   0 + 4 + 0 + 8 + 0 + 12 + ... == 4(1 + 2 + 3 + ...)

So, S3 - S2 = 4S3, and therefore 3S3 = -S2, and S3 = -1/12.


So what’s wrong here?

To begin with, S1 does not equal 1/2. S1 is a non-converging series. It doesn’t converge to 1/2; it doesn’t converge to anything. This isn’t up for debate: it doesn’t converge!

In the 19th century, a mathematician named Ernesto Cesaro came up with a way of assigning a value to this series. The assigned value is called the Cesaro summation or Cesaro sum of the series. The sum is defined as follows:

Let A = a_1 + a_2 + a_3 + .... In this series, s_k = \sum_{n=1}^{k} a_n; s_k is called the kth partial sum of A.

The series A is Cesaro summable if the average of its partial sums converges towards a value C(A) = \lim_{n \rightarrow \infty} \frac{1}{n}\sum_{k=1}^{n} s_k.

So – if you take the first 2 partial sums of A, and average them; and then the first three partial sums and average them, and the first 4 and average them, and so on – and that sequence of averages converges towards a specific value, then the series is Cesaro summable.

Look at Grandi’s series. It produces the partial sum averages of 1, 1/2, 2/3, 2/4, 3/5, 3/6, 4/7, 4/8, 5/9, 5/10, … That series clearly converges towards 1/2. So Grandi’s series is Cesaro summable, and its Cesaro sum value is 1/2.
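
Here’s a small sketch of that procedure in Python (the 1000-term cutoff is an arbitrary choice):

# Cesaro procedure: take the running averages of the partial sums of a series.
def cesaro_averages(terms):
    averages, partial_sum, total_of_partials = [], 0, 0
    for k, term in enumerate(terms, start=1):
        partial_sum += term                 # the kth partial sum
        total_of_partials += partial_sum
        averages.append(total_of_partials / k)   # average of the first k partial sums
    return averages

grandi = [(-1) ** n for n in range(1000)]   # 1, -1, 1, -1, ...
print(cesaro_averages(grandi)[-4:])         # values hovering right around 0.5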

The important thing to note here is that we are not saying that the Cesaro sum is equal to the series. We’re saying that there’s a way of assigning a measure to the series.

And there is the first huge, gaping, glaring problem with the video. They assert that the Cesaro sum of a series is equal to the series, which isn’t true.

From there, they go on to start playing with the infinite series in sloppy algebraic ways, and using the Cesaro summation value in their infinite series algebra. This is, similarly, not a valid thing to do.

Just pull out that definition of the Cesaro summation from before, and look at the series of natural numbers. The partial sums for the natural numbers are 1, 3, 6, 10, 15, 21, … Their averages are 1, 4/2, 10/3, 20/4, 35/5, 56/6, … = 1, 2, 3 1/3, 5, 7, 9 1/3, … That’s not a converging sequence, which means that the series of natural numbers does not have a Cesaro sum.
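
Using the same cesaro_averages sketch from above on the natural numbers shows exactly that:

naturals = range(1, 1001)                # 1, 2, 3, ...
print(cesaro_averages(naturals)[:6])     # 1.0, 2.0, 3.33..., 5.0, 7.0, 9.33...
print(cesaro_averages(naturals)[-1])     # 167167.0: still growing, no limit in sight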

What does that mean? It means that if we substitute the Cesaro sum for a series using equality, we get inconsistent results: we get one line of reasoning in which the series of natural numbers has a Cesaro sum; a second line of reasoning in which the series of natural numbers does not have a Cesaro sum. If we assert that the Cesaro sum of a series is equal to the series, we’ve destroyed the consistency of our mathematical system.

Inconsistency is death in mathematics: any time you allow inconsistencies in a mathematical system, you get garbage: any statement becomes mathematically provable. Using the equality of an infinite series with its Cesaro sum, I can prove that 0=1, that the square root of 2 is a natural number, or that the moon is made of green cheese.

What makes this worse is that it’s obvious. There is no mechanism in real numbers by which addition of positive numbers can roll over into negative. It doesn’t matter that infinity is involved: you can’t follow a monotonically increasing trend, and wind up with something smaller than your starting point.

Someone as allegedly intelligent and educated as Phil Plait should know that.

The Latest Update in the Hydrino Saga

Lots of people have been emailing me to say that there’s a new article out about Blacklight, the company started by Randall Mills to promote his Hydrino stuff. It claims to have an independent validation of his work, and announces the any-day-now unveiling of the latest version of his hydrino-based generator.

First of all, folks, this isn’t an article, it’s a press release from Blacklight. The Financial Post just printed it in their online press-release section. It’s an un-edited release written by Blacklight.

There’s nothing new here. I continue to think that this is a scam. But what kind of scam?

To find out, let’s look at a couple of select quotes from this press release.

Using a proprietary water-based solid fuel confined by two electrodes of a SF-CIHT cell, and applying a current of 12,000 amps through the fuel, water ignites into an extraordinary flash of power. The fuel can be continuously fed into the electrodes to continuously output power. BlackLight has produced millions of watts of power in a volume that is one ten thousandths of a liter corresponding to a power density of over an astonishing 10 billion watts per liter. As a comparison, a liter of BlackLight power source can output as much power as a central power generation plant exceeding the entire power of the four former reactors of the Fukushima Daiichi nuclear plant, the site of one of the worst nuclear disasters in history.

One ten-thousandth of a liter of water produces millions of watts of power.

Sounds impressive, doesn’t it? Oh, but wait… how do we measure energy density of a substance? Joules per liter, or something equivalent – that is, energy per volume. But Blacklight is quoting energy density as watts per liter.

The joule is a unit of energy. A joule is a shorthand for \frac{\text{kilogram} \cdot \text{meter}^2}{\text{second}^2}. Watts are a different unit, a measure of power, which is a shorthand for \frac{\text{kilogram} \cdot \text{meter}^2}{\text{second}^3}. A watt is, therefore, one joule/second.

They’re quoting a rather peculiar unit there. I wonder why?

Our safe, non-polluting power-producing system catalytically converts the hydrogen of the H2O-based solid fuel into a non-polluting product, lower-energy state hydrogen called “Hydrino”, by allowing the electrons to fall to smaller radii around the nucleus. The energy release of H2O fuel, freely available in the humidity in the air, is one hundred times that of an equivalent amount of high-octane gasoline. The power is in the form of plasma, a supersonic expanding gaseous ionized physical state of the fuel comprising essentially positive ions and free electrons that can be converted directly to electricity using highly efficient magnetohydrodynamic converters. Simply replacing the consumed H2O regenerates the fuel. Using readily-available components, BlackLight has developed a system engineering design of an electric generator that is closed except for the addition of H2O fuel and generates ten million watts of electricity, enough to power ten thousand homes. Remarkably, the device is less than a cubic foot in volume. To protect its innovations and inventions, multiple worldwide patent applications have been filed on BlackLight’s proprietary technology.

Water, in the alleged hydrino reaction, produces 100 times the energy of high-octane gasoline.

Gasoline contains, on average, about 11.8 kWh/kg. A milliliter of gasoline weighs about 7/10ths of a gram, compared to the 1 gram weight of a milliliter of water; therefore, a kilogram of gasoline occupies around 1400 milliliters. So, let’s take 11.8 kWh/kg, and convert that to an equivalent measure of energy per milliliter: about 8.4 watt-hours per milliliter. How does that compare to hydrinos? Oh, wait… we can’t convert those, now can we? Because they’re using power density. And the power density of a substance depends not just on how much energy you can extract, but how long it takes to extract it. Explosives have fantastic power density! Gasoline – particularly high octane gasoline – is formulated to try to burn as slowly as possible, because internal combustion engines are more efficient on a slower burn.
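
If you want to check that conversion yourself, it’s a couple of lines of arithmetic (the figures are the same rough ones used above):

# Convert gasoline's energy density from kWh per kilogram to watt-hours per milliliter.
energy_per_kg_wh = 11.8 * 1000        # 11.8 kWh/kg, expressed in watt-hours
ml_per_kg = 1400                      # roughly 1400 milliliters in a kilogram of gasoline
print(energy_per_kg_wh / ml_per_kg)   # about 8.4 watt-hours per milliliter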

To bring just a bit of numbers into it, TNT has a much higher power density than gasoline. You can easily knock down buildings with TNT, because of the way that it emits all of its energy in one super short burst. But its energy density is just 1/4th the energy density of gasoline.

Hmm. I wonder why Mills is using the power density?

Here’s my guess. Mills has some bullshit process where he spikes his generator with 12,000 amps, and gets a very brief burst of energy out. If you can produce 100 joules from one milliliter in 1/1000th of a second, that’s a power density of 100,000 watts per milliliter.

Suddenly, the amount of power that’s being generated isn’t so huge – and there, I would guess, is the key to Mills’ latest scam. If you’re hitting your generating apparatus with 12,000 amperes of electric current, and you’re producing brief bursts of energy, it’s going to be very easy to produce that energy by consuming something in the apparatus, without that consumption being obvious to an observer who isn’t allowed to independently examine the apparatus in detail.


Now, what about the “independent verification”? Again, let’s look at the press release.

“We at The ENSER Corporation have performed about thirty tests at our premises using BLP’s CIHT electrochemical cells of the type that were tested and reported by BLP in the Spring of 2012, and achieved the three specified goals,” said Dr. Ethirajulu Dayalan, Engineering Fellow, of The ENSER Corporation. “We independently validated BlackLight’s results offsite by an unrelated highly qualified third party. We confirmed that hydrino was the product of any excess electricity observed by three analytical tests on the cell products, and determined that BlackLight Power had achieved fifty times higher power density with stabilization of the electrodes from corrosion.” Dr. Terry Copeland, who managed product development for several electrochemical and energy companies including DuPont Company and Duracell added, “Dr. James Pugh (then Director of Technology at ENSER) and Dr. Ethirajulu Dayalan participated with me in the independent tests of CIHT cells at The ENSER Corporation’s Pinellas Park facility in Florida starting on November 28, 2012. We fabricated and tested CIHT cells capable of continuously producing net electrical output that confirmed the fifty-fold stable power density increase and hydrino as the product.”

Who is the ENSER corporation? They’re an engineering consulting/staffing firm that’s located in the same town as Blacklight’s offices. So, pretty much, what we’re seeing is that Mills hired his next door neighbor to provide a data-free testimonial promising that the hydrino generator really did work.

Real scientists, doing real work, don’t pull nonsense like this. Mills has been promising a commercial product within a year for almost 25 years. In that time, he’s filed multiple patents, some of which have already expired! And yet, he’s never actually allowed an independent team to do a public, open test of his system. He’s never provided any actual data about the system!

He and his team have claimed things like “We can’t let people see it, it’s secret”. But they’re filing patents. You don’t get to keep a patent secret. A patent application, under US law, must contain: “a description of how to make and use the invention that must provide sufficient detail for a person skilled in the art (i.e., the relevant area of technology) to make and use the invention.”. In other words, if the patents that Mills and friends filed are legally valid, they must contain enough information for an interested independent party to build a hydrino generator. But Mills won’t let anyone examine his supposedly working generators. Why? It’s not to keep a secret!


Finally, the question that a couple of people, including one reporter for WiredUK asked: If it’s all a scam, why would Mills and company keep on making claims?

The answer is the oldest in the book: money.

In my email this morning, I got a new version of a 419 scam letter. It’s from a guy who claims to be the nephew of Ariel Sharon. He claims that his uncle owned some farmland, including an extremely valuable grove of olive trees, in the occupied west bank. Now, he claims, the family wants to sell that land – but as Sharon’s, they can’t let their names get in to the news. So, he says, he wants to “sell” the land to me for a pittance, and then I can sell it for what it’s really worth, and we’ll split the profits.

When you read about people who’ve fallen for 419 scams, you find that the scammers don’t ask for all of the money up front. They start off small: “There is a $500 fee for the transfer”. When they get that, they show you some “evidence” in the form of an official-looking transfer-clearance receipt. But then they say that there’s a new problem, and they need money to get around it. “We were preparing to transfer, but the clerk became suspicious; we need to bribe him!”, “There’s a new financial rule that you can’t transfer sums greater than $10,000 to someone without a Nigerian bank account containing at least $100,000”. It’s a continual process. They always show some kind of fake document at each step of the way. The fakes aren’t particularly convincing unless you really want to be convinced, but they’re enough to keep the money coming.

Mills appears to be operating in very much the same vein. He’s getting investors to give him money, promising that whatever they invest, they’ll get back manifold when he starts selling hydrino power generators! He promises they’ll be on market within a year or two – five at most!

Then he comes up with either a demonstration, or the testimonial from his neighbor, or the self-publication of his book, or another press release talking about the newest version of his technology. It’s much better than the old one! This time it’s for real – just look at these amazing numbers! It’s 10 billion watts per liter, a machine that fits on your desk can generate as much power as a nuclear power plant!! We just need some more money to fix that pesky problem with corrosion on the electrodes, and then we’ll go to market, and you’ll be rich, rich, rich!

It’s been going on for almost 25 years, this constant cycle of press release/demo/testimonial every couple of years. (Seriously; in this post, I showed links to claims from 2009 claiming commercialization within 12 to 18 months; from 2005 claiming commercialization within months; and claims from 1999 claiming commercialization within a year.) But he always comes up with an excuse why those deadlines needed to be missed. And he always manages to find more investors, willing to hand over millions of dollars. As long as suckers are still willing to give him money, why wouldn’t he keep on making claims?

Leading in to Machine Code: Why?

I’m going to write a few posts about programming in machine language. It seems that many more people are interested in learning about the ARM processor, so that’s what I’ll be writing about. In particular, I’m going to be working with the Raspberry Pi running Raspbian linux. For those who aren’t familiar with it, the Pi is a super-inexpensive computer that’s very easy to program, and very easy to interface with the outside world. It’s a delightful little machine, and you can get one for around $50!

Anyway, before getting started, I wanted to talk about a few things. First of all, why learn machine language? And then, just what the heck is the ARM thing anyway?

Why learn machine code?

My answer might surprise you. Or, if you’ve been reading this blog for a while, it might not.

Let’s start with the wrong reason. Most of the time, people say that you should learn machine language for speed: programming at the machine code level gets you right down to the hardware, eliminating any layers of junk that would slow you down. For example, one of the books that I bought to learn ARM assembly (Raspberry Pi Assembly Language RASPBIAN Beginners: Hands On Guide) said:

even the most efficient languages can be over 30 times
slower than their machine code equivalent, and that’s on a good
day!

This is pure, utter rubbish. I have no idea where he came up with that 30x figure, but it’s got no relationship to reality. (It’s a decent book, if a bit elementary in approach; this silly statement isn’t representative of the book as a whole!)

In modern CPUs – and the ARM definitely does count as modern! – the fact is, for real world programs, writing code by hand in machine language will probably result in slower code!

If you’re talking about writing a single small routine, humans can be very good at that, and they often do beat compilers. But once you get beyond that, and start looking at whole programs, any human advantage in machine language goes out the window. The constraints that actually affect performance have become incredibly complex – too complex for us to juggle effectively. We’ll look at some of these in more detail, but I’ll explain one example.

The CPU needs to fetch instructions from memory. But memory is dead slow compared to the CPU! In the best case, your CPU can execute a couple of instructions in the time it takes to fetch a single value from memory. This leads to an obvious problem: it can execute (or at least start executing) one instruction for each clock tick, but it takes several ticks to fetch an instruction!

To get around this, CPUs play a couple of tricks. Basically, they don’t fetch single instructions, but instead grab entire blocks of instructions; and they start retrieving instructions before they’re needed, so that by the time the CPU is ready to execute an instruction, it’s already been fetched.

So the instruction-fetching hardware is constantly looking ahead, and fetching instructions so that they’ll be ready when the CPU needs them. What happens when your code contains a conditional branch instruction?

The fetch hardware doesn’t know whether the branch will be taken or not. It can make an educated guess by a process called branch prediction. But if it guesses wrong, then the CPU is stalled until the correct instructions can be fetched! So you want to make sure that your code is written so that the CPU’s branch prediction hardware is more likely to guess correctly. Many of the tricks that humans use to hand-optimize code actually have the effect of confusing branch prediction! They shave off a couple of instructions, but by doing so, they also force the CPU to sit idle while it waits for instructions to be fetched. That branch prediction failure penalty frequently outweighs the cycles that they saved!

That’s one simple example. There are many more, and they’re much more complicated. And to write efficient code, you need to keep all of those in mind, and fully understand every tradeoff. That’s incredibly hard, and no matter how smart you are, you’ll probably blow it for large programs.

If not for efficiency, then why learn machine code? Because it’s how your computer really works! You might never actually use it, but it’s interesting and valuable to know what’s happening under the covers. Think of it like your car: most of us will never actually modify the engine, but it’s still good to understand how the engine and transmission work.

Your computer is an amazingly complex machine. It’s literally got billions of tiny little parts, all working together in an intricate dance to do what you tell it to. Learning machine code gives you an idea of just how it does that. When you’re programming in another language, understanding machine code lets you understand what your program is really doing under the covers. That’s a useful and fascinating thing to know!

What is this ARM thing?

As I said, we’re going to look at machine language coding on the
ARM processor. What is this ARM beast anyway?

It’s probably not the CPU in your laptop. Most desktop and laptop computers today are based on a direct descendant of the first microprocessor: the Intel 4004.

Yes, seriously: the Intel CPUs that drive most PCs are, really, direct descendants of the first CPU designed for desktop calculators! That’s not an insult to the intel CPUs, but rather a testament to the value of a good design: they’ve just kept on growing and enhancing. It’s hard to see the resemblance unless you follow the design path, where each step follows directly on its predecessors.

The Intel 4004, released in 1971, was a 4-bit processor designed for use in calculators. Nifty chip, state of the art in 1971, but not exactly what we’d call flexible by modern standards. Even by the standards of the day, they recognized its limits. So following on its success, they created an 8-bit version, which they called the 8008. And then they extended the instruction set, and called the result the 8080. The 8080, in turn, yielded successors in the 8088 and 8086 (and the Z80, from a rival chipmaker).

The 8086 was the processor chosen by IBM for its newfangled personal computers. Chip designers kept making it better, producing the 80286, 386, Pentium, and so on – up to today’s CPUs, like the Core i7 that drives my MacBook.

The ARM comes from a different design path. At the time that Intel was producing the 8008 and 8080, other companies were getting into the same game. From the PC perspective, the most important was the 6502, which
was used by the original Apple, Commodore, and BBC microcomputers. The
6502 was, incidentally, the first CPU that I learned to program!

The ARM isn’t a descendant of the 6502, but it is a product of the 6502 based family of computers. In the early 1980s, the BBC decided to create an educational computer to promote computer literacy. They hired a company called Acorn to develop a computer for their program. Acorn developed a
beautiful little system that they called the BBC Micro.

The BBC micro was a huge success. Acorn wanted to capitalize on its success, and try to move it from the educational market to the business market. But the 6502 was underpowered for what they wanted to do. So they decided to add a companion processor: they’d have a computer which could still run all of the BBC Micro programs, but which could do fancy graphics and fast computation with this other processor.

In a typical tech-industry NIH (Not Invented Here) moment, they decided that none of the other commercially available CPUs were good enough, so they set out to design their own. They were impressed by the work done by the Berkeley RISC (Reduced Instruction Set Computer) project, and so they adopted the RISC principles, and designed their own CPU, which they called the Acorn RISC Microprocessor, or ARM.

The ARM design was absolutely gorgeous. It was simple but flexible and powerful, able to operate on very low power while generating very little heat. It had lots of registers and an extremely simple instruction set, which made it a pleasure to program. Acorn built a lovely computer with a great operating system called RiscOS around the ARM, but it never really caught on. (If you’d like to try RiscOS, you can run it on your Raspberry Pi!)

But the ARM didn’t disappear. It didn’t catch on in the desktop computing world, but it rapidly took over the world of embedded devices. Everything from your cellphone to your dishwasher to your iPad is running on an ARM CPU.

Just like the Intel family, the ARM has continued to evolve: the ARM family has gone through 8 major design changes, and dozens of smaller variations. They’re no longer just produced by Acorn – the ARM design is maintained by a consortium, and ARM chips are now produced by dozens of different manufacturers – Motorola, Apple, Samsung, and many others.

Recently, they’ve even started to expand beyond embedded platforms: the Chromebook laptops are ARM based, and several companies are starting to market server boxes for datacenters that are ARM based! I’m looking forward to the day when I can buy a nice high-powered ARM laptop.

More Basics: Compilers, Programs, and Languages

After my “what is an OS?” post, a couple of readers asked me to write a similar post about compilers.

Before I can answer what a compiler is, it’s helpful to first answer a different question: what is a program?

And here we get to one of my pet peeves. The most common answer to that question is “a detailed step-by-step sequence of instructions”. For example, here’s what wikipedia says:

A computer program, or just a program, is a sequence of instructions, written to perform a specified task with a computer.

This is wrong.

Back when people first started to study the idea of computing devices, they talked about computing machines as devices that performed a single, specific task. If you think about a basic Turing machine, you normally define Turing machines that perform a single computation. They’ve got a built-in sequence of states, and a built-in transition table – the machine can only perform one computation. It takes one kind of input, performs its computation on that input, and produces its output.

Building up from these specific machines, they came up with the idea of a universal computing device. A universal computer was a computing machine whose input was a description of a different computing machine. By giving the universal machine different inputs, it could perform different computations.

The point of this diversion is that looking at this history tells us what a program really is: it’s a description of a computing machine. Our computers are universal computing machines; they take programs as input to describe the computing machines we want them to emulate. What we’re doing when we program is describing a computing machine that we’d like to create. Then we feed it into our universal computing machine, and it behaves as if we’d built a custom piece of hardware to do our computation!

The problem is, our computers are simultaneously very primitive and overwhelmingly complex. They can only work with data expressed in fixed-length sequences of on/off values; to do anything else, we need to find a way of expressing it in terms of extremely simple operations on those on/off values. To make them operate efficiently, they’ve got a complex structure: many different kinds of storage (registers, L1 and L2 caches, addressable memory), complicated instruction sets, and a whole lot of tricky performance tweaks. It’s really hard to program a computer in terms of its native instructions!

In fact, it’s so hard to program in terms of native instructions that we just don’t do it. What we do is write programs in terms of different machines. That’s the point of a programming language.

Looked at this way, a programming language is a way of describing computing machines. The difference between different programming languages is how they describe computing machines. A language like C describes von Neumann machines. Haskell describes machines that work via lambda calculus computations using something like a spineless G-machine. Prolog describes machines that perform computations in terms of intuitionistic logical inference like a Warren Abstract Machine.

So finally, we can get to the point: what is a compiler? A compiler is a program that takes a description of a computing device defined in one way, and translates it into the kind of machine description that can be used by our hardware. A programming language lets us ignore all of the complexities of how our actual hardware is built, and describe our computations in terms of a simple abstraction. A compiler takes that description, and turns it into a form that the computer hardware can actually use.
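
To make that a bit more concrete, here’s a toy sketch (nothing like a real compiler, just an illustration of the shape of the job): it takes a description written in one tiny language, arithmetic expressions, and translates it into instructions for a simple stack machine, which stands in for the hardware.

import ast

# "Compile" an arithmetic expression into instructions for a tiny stack machine.
def compile_expr(source):
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}

    def emit(node):
        if isinstance(node, ast.Constant):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp):
            return emit(node.left) + emit(node.right) + [(ops[type(node.op)], None)]
        raise ValueError("unsupported expression")

    return emit(ast.parse(source, mode="eval").body)

# The "hardware": a trivial stack machine that runs the compiled instructions.
def run(instructions):
    stack = []
    for op, arg in instructions:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b,
                          "MUL": a * b, "DIV": a / b}[op])
    return stack[0]

program = compile_expr("(1 + 2) * 4 - 6")
print(program)        # the "machine description": a list of PUSH/ADD/MUL/SUB instructions
print(run(program))   # 6

A real compiler does the same basic job, just with a vastly richer source language and a vastly messier target machine.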

For anyone who’s read this far: I’ve gotten a few requests to talk about assembly language. I haven’t programmed in assembly since the days of the Motorola 68000. This means that to do it, I’ll need to learn something more up-to-date. Would you be more interested in seeing Intel, or ARM?

Boot all the computers!

Moving on from last week’s operating system post, today we’ll look at how a computer boots up and loads an operating system.

Let’s start with why booting is a question at all. When a computer turns on, what happens? What we’re used to seeing is that the disk drive turns on and starts spinning, and the computer loads something from the disk.

The question is: how does the computer know how to turn on the disk? As I said in the OS post, the CPU only really knows how to work with memory. To talk to a disk drive, it needs to do some very specific things – write to certain memory locations, wait for things to happen. Basically, in order to turn on that disk drive and load the operating system, it needs to run a program. But how does it know what program to run?

I’m going to focus on how modern PCs work. Other computers have used/do use a similar process. The details vary, but the basic idea is the same.

A quick overview of the process:

  1. CPU startup.
  2. Run BIOS initialization.
  3. Load bootloader.
  4. Run bootloader.
  5. Load and run OS.

As that list suggests, it’s not a particularly simple process. We think of it as one step: turn on the computer, and it runs the OS. In fact, it’s a complicated dance of many steps.

On the lowest level, it’s all hardware. When you turn on a computer, some current gets sent to a clock. The clock is basically a quartz crystal; when you apply current to the crystal, it vibrates and produces a regular electrical pulse. That pulse is what drives the CPU. (When you talk about your computer’s speed, you generally describe it in terms of the frequency of the clock pulse. For example, in the laptop that I’m using to write this post, I’ve got a 2.4 GHz processor: that means that the clock chip pulses 2.4 billion times per second.)

When the CPU gets a clock pulse, it executes an instruction from memory. It knows what instruction to execute because it’s got a register (a special piece of memory built-in to the CPU) that tells it what instruction to execute. When the computer is turned on, that register is set to point at a specific location. Depending on the CPU, that might be 0, or it might be some other magic location; it doesn’t matter: what matters is that the CPU is built so that when it’s first turned on and it receives a clock pulse that starts it running, that register will always point at the same place.

The software part of the boot process starts there: the computer puts a chunk of read-only memory there – so when the computer turns on, there’s a program sitting at that location, which the computer can run. On PCs, that program is called the BIOS (Basic Input/Output System).

The BIOS knows how to tell the hardware that operates your display to show text on the screen, and it knows how to read stuff on your disk drives. It doesn’t know much beyond that. What it knows is extremely primitive. It doesn’t understand things like filesystems – the filesystem is set up and controlled by the operating system, and different operating systems will set up filesystems in different ways. The BIOS can’t do anything with a filesystem: it doesn’t include any programming to tell it how to read a filesystem, and it can’t ask the operating system to do it, because the OS hasn’t loaded yet!

What the BIOS does is something similar to what the CPU did when it started up. The CPU knew to look in a special location in memory to find a program to run. The BIOS knows to look at a special section on a disk drive to find a program to run. Every disk has a special chunk of data on it called the master boot record (MBR). The MBR contains another program, called a boot loader. So the BIOS loads the boot loader, and then uses it to actually load the operating system.
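
To make that concrete: in the classic layout, the MBR is just the first 512 bytes of the disk. The last two bytes are the boot signature 0x55 0xAA, and there’s a small partition table at a fixed offset, with the bootloader code sitting in front of it. Here’s a hedged sketch that pokes at a disk image file (the filename is hypothetical; reading a real disk device usually needs root privileges):

import struct

# Read the first 512 bytes of a disk image: the Master Boot Record.
with open("disk.img", "rb") as disk:       # hypothetical disk image file
    mbr = disk.read(512)

# The last two bytes of a valid MBR are the boot signature 0x55 0xAA.
print("boot signature ok:", mbr[510:512] == b"\x55\xaa")

# The partition table is four 16-byte entries starting at offset 446.
for i in range(4):
    entry = mbr[446 + 16 * i : 446 + 16 * (i + 1)]
    boot_flag, part_type, first_sector, num_sectors = struct.unpack("<B3xB3xII", entry)
    print(f"partition {i}: type=0x{part_type:02x}, "
          f"starts at sector {first_sector}, {num_sectors} sectors")

The first 446 bytes, which this sketch skips over, are where the bootloader code itself lives.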

This probably seems a bit weird. The computer starts up by looking in a specific location for a program to run (the BIOS), which loads something (the bootloader). The thing it loads (the bootloader) also just looks in a specific location for a program to run (the OS). Why the two layers?

Different operating systems are built differently, and the specific steps to actually load and run the OS are different. For example, on my laptop, I can run two operating systems: MacOS, and Linux. On MacOS (aka Darwin), there’s something called a microkernel that gets loaded. The microkernel is stored in a file named “mach_kernel” in the root directory of a type of filesystem called HFS. But in my installation of linux, the OS is stored in a file named “vmlinuz” in the root directory of a type of filesystem called EXT4. The BIOS doesn’t know what operating system it’s loading, and it doesn’t know what filesystem the OS uses – and that means that it knows neither the name of the file to load, nor how to find that file.

The bootloader was set up by the operating system. It’s specific to the operating system – you can think of it as part of the OS. So it knows what kind of filesystem it’s going to look at, and how to find the OS in that filesystem.

So once the bootloader gets started, it knows how to load and run the operating system, and once it does that, your computer is up and running, and ready for you to use!

Of course, all of this is a simplified version of how it works. But for understanding the process, it’s a reasonable approximation.

(To reply to commenters: I’ll try to do a post like this about compilers when I have some time to write it up.)

Basics: What is an OS?

A reader of this blog apparently likes the way I explain things, and wrote to me to ask a question: what is an operating system? And how does a computer know how to load it?

I’m going to answer that, but I’m going to do it in a roundabout way. The usual answer is something like: “An operating system or OS is a software program that enables the computer hardware to communicate and operate with the computer software.” In my opinion, that’s a cop-out: it doesn’t really answer anything. I’m going to take a somewhat roundabout approach, but hopefully give you an answer that actually explains things in more detail, which should help you understand it better.

When someone like me sets out to write a program, how can we do it? That sounds like an odd question, until you actually think about it. The core of the computer, the CPU, is a device which really can’t do very much. It’s a self-contained unit which can do lots of interesting mathematical and logical operations, but they all happen completely inside the CPU (how they happen inside the CPU is way beyond this post!). To get stuff in and out of the CPU, the only thing that the computer can do is read and write values from the computer’s memory. That’s really it.

So how do I get a program in to the computer? The computer can only read the program if it’s in the computer’s memory. And every way that I can get it into the memory involves the CPU!

Computers are built so that there are certain memory locations and operations that are used to interact with the outside world. They also have signal wires called interrupt pins where other devices (like disk drives) can apply a current to say “Hey, I’ve got something for you”. The exact mechanics are, of course, complicated, and vary from CPU to CPU. But to give you an idea of what it’s like, to read some data from disk, you’d do something like the following.

  1. Set aside a chunk of memory where the data should be stored after it’s read. This is called a buffer.
  2. Figure out where the data you want to read is stored on the disk. You can identify disk locations as a number. (It’s usually a bit more complicated than that, but we’re trying to keep this simple.)
  3. Write that number into a special memory location that’s monitored by the disk drive controller.
  4. Wait until the disk controller signals you via an interrupt that the data is ready. The data will be stored in a special memory location, that can be altered by the disk. (Simplifying again, but this is sometimes called a DMA buffer.)
  5. Copy the data from the controller’s DMA buffer into the application’s memory buffer that you allocated.
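
Here’s a toy model of that dance, just to make the shape of the steps concrete. Everything in it is made up: real device registers are fixed memory addresses and real interrupts are electrical signals, not Python attributes. But the sequence is the same one in the list above.

# A toy model of reading a disk block via memory-mapped registers and DMA.
# None of this is real hardware access; it just mirrors the steps above.
class FakeDiskController:
    def __init__(self, disk_blocks):
        self.disk_blocks = disk_blocks
        self.command_register = None     # step 3 writes here
        self.dma_buffer = None           # step 4: the controller fills this
        self.interrupt_pending = False

    def tick(self):
        # Pretend some time passes and the disk finishes the read.
        if self.command_register is not None:
            self.dma_buffer = self.disk_blocks[self.command_register]
            self.interrupt_pending = True

disk = FakeDiskController({7: b"hello from block 7"})

buffer = bytearray(32)                  # step 1: set aside a buffer
block_number = 7                        # step 2: figure out where the data lives
disk.command_register = block_number    # step 3: poke the controller's register

while not disk.interrupt_pending:       # step 4: wait for the "interrupt"
    disk.tick()

data = disk.dma_buffer                  # step 5: copy out of the DMA buffer
buffer[:len(data)] = data
print(bytes(buffer[:len(data)]))        # b'hello from block 7'

A real OS driver does this with actual memory addresses and interrupt handlers, but the choreography is the same.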

When you get down to that level, programming is an intricate dance! No one
wants to do that – it’s too complicated, too error prone, and just generally
too painful. But there’s a deeper problem: at this level, it’s every program
for itself. How do you decide where on the disk to put your data? How can you
make sure that no one else is going to use that part of the disk? How can you
tell another program where to find the data that you stored?

You want to have something that creates the illusion of a much simpler computational world. Of course, under the covers, it’s all going to be that incredibly messy stuff, but you want to cover it up. That’s the job of an operating system: it’s a layer between the hardware and the programs that you run, one that creates a new computational world that’s much easier to work in.

Instead of having to do the dance of mucking with the hard disk drive controller yourself, the operating system gives you a way of saying “Open a file named ‘foo'”, and then it takes that request, figures out where ‘foo’ is on the disk, talks to the disk drive, gets the data, and then hands you a buffer containing it. You don’t need to know what kind of disk drive the data is coming from, or how the name ‘foo’ maps to sectors of the disk. You don’t need to know where the control memory locations for the drive are. You just let the operating system do that for you.

So, ultimately, this is the answer: The operating system is a program that runs on the computer, and creates the environment in which other programs can run. It does a lot of things to create a pleasant environment in which to write and run other programs. Among the multitude of services provided by most modern operating systems are:

  1. Device input and output. This is what we talked about above: direct interaction with input and output devices is complicated and error prone; the operating system implements the input and output processes once, (hopefully) without errors, and then makes it easy for every other program to just use its correct implementation.
  2. Multitasking: your computer has enough power to do many things at once. Most modern computers have more than one CPU. (My current laptop has 4!) And most programs end up spending a lot of their time doing nothing: waiting for you to press a key, or waiting for the disk drive to send it data. The operating system creates sandboxes, called processes, and allows one program to run in each sandbox. It takes care of ensuring that each process gets to run on a CPU for a fair share of the time.
  3. Memory management. With more than one program running at the same time on your computer, you need to make sure that you’re using memory that isn’t also being used by some other program, and to make sure that no other program can alter the memory that you’re using without your permission. The operating system decides what parts of memory can be used by which program.
  4. Filesystems. Your disk drive is really just a huge collection of small sections, each of which can store a fixed number of bits, encoded in some strange format dictated by the mechanics of the drive. The OS provides an abstraction that’s a lot easier to deal with.

I think that’s enough for one day. Tomorrow: how the computer knows how to run the OS when it gets switched on!

The Birthday Paradox

To me, the thing that makes probability fun is that the results are frequently surprising. We’ve got very strong instincts about how we expect numbers to work. But when you do anything that involves a lot of computations with big numbers, our intuition goes out the window – nothing works the way we expect it to. A great example of that is something called the birthday paradox.

Suppose you’ve got a classroom full of people. What’s the probability that there are two people with the same birthday? Intuitively, most people expect that it’s pretty unlikely. It seems like it shouldn’t be likely – 365 possible birthdays, and 20 or 30 people in a classroom? Very little chance, right?

Let’s look at it, and figure out how to compute the probability.

Interesting probability problems are all about finding out how to put things together. You’re looking at things where there are huge numbers of possible outcomes, and you want to determine the odds of a specific class of outcomes. Finding the solutions is all about figuring out how to structure the problem.

A great example of this is something called the birthday paradox. This is a problem with a somewhat surprising outcome. It’s also a problem where finding the right way to structure the problem has a dramatic effect.

Here’s the problem: you’ve got a group of 30 people. What’s the probability that two people out of that group of thirty have the same birthday?

We’ll look at it with some simplifying assumptions. We’ll ignore leap year – so we’ve got 365 possible birthdays. We’ll assume that all birthdays are equally likely – no variation for weekdays/weekends, no variation for seasons, and no holidays, etc. Just 365 equally probable days.

How big is the space? That is, how many different ways are there to assign birthdays to 30 people? It’s 365^{30}, or something in the vicinity of 7.4 \times 10^{76}.

To start off, we’ll reverse the problem. It’s easier to structure the problem if we try to ask “What’s the probability that no two people share a birthday”. If P(B) is the probability that no two people share a birthday, then 1-P(B) is the probability that at least two people share a birthday.

So let’s look at a couple of easy cases. Suppose we’ve got two people: what are the odds that they’ve got the same birthday? 1 in 365: there are 365^2 possible pairs of birthdays, and 365 of those pairs have both people born on the same day. So there’s a probability of \frac{365}{365^2} = \frac{1}{365} that the two people have the same birthday. For just two people, it’s pretty easy. In the reverse form, there’s a 364/365 chance that the two people have different birthdays.

What about 3 people? It’s the probability of the first two having different birthdays, times the probability of the third person having a different birthday than either of those first two. There are 365 possible birthdays for the third person, and 363 possible days that don’t overlap with the first two. So for N people, the probability of having distinct birthdays is 1 \times (1 - 1/365) \times (1 - 2/365) \times \dots \times (1 - (N-1)/365).

At this point, we’ve got a nice recursive definition. Let’s say that f(N) is the probability of N people having distinct birthdays. Then:

  1. For 2 people, the probability of distinct birthdays is 364/365. (f(2) = \frac{364}{365})
  2. For N>2 people, the probability of distinct birthdays is
    \frac{365-(N-1)}{365} \times f(N-1).

Convert that to a closed form, and you get: f(n) = \frac{365!}{(365-n)!\,365^n}. For 30 people, that’s \frac{365!}{335!\times 365^{30}}. Work it out, and that’s about 0.294 – so the probability of everyone having distinct birthdays is roughly 29% – which means that the probability of at least two people in a group of 30 having the same birthday is about 71%!
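
Since the formula is just a short product, it’s easy to check by brute force. Here’s a minimal sketch (Python; the function name is mine, purely for illustration) that computes f(n) and reproduces the numbers in the table below.

    # Probability that n people all have distinct birthdays,
    # assuming 365 equally likely days.
    def distinct_birthday_prob(n, days=365):
        prob = 1.0
        for k in range(n):               # k people already have distinct birthdays
            prob *= (days - k) / days    # the next person must avoid those k days
        return prob

    print(distinct_birthday_prob(30))    # ~0.294, so ~71% chance of a shared birthday

    # Smallest group with a better-than-even chance of a shared birthday:
    n = 1
    while distinct_birthday_prob(n) > 0.5:
        n += 1
    print(n)                             # 23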

You can see why our intuitions are so bad? We’re talking about something where one factor in the computation is the factorial of 365!

Let’s look a bit further: how many people do you need before there’s a 50% chance of two people sharing a birthday? Use the formula we wrote up above, and it turns out to be 23. Here are the numbers – remember that this is the reverse probability, the probability of all birthdays being distinct.

People  P(all distinct)
1 1
2 0.997260273973
3 0.991795834115
4 0.983644087533
5 0.9728644263
6 0.959537516351
7 0.943764296904
8 0.925664707648
9 0.905376166111
10 0.883051822289
11 0.858858621678
12 0.832975211162
13 0.805589724768
14 0.776897487995
15 0.747098680236
16 0.716395994747
17 0.684992334703
18 0.653088582128
19 0.620881473968
20 0.588561616419
21 0.556311664835
22 0.524304692337
23 0.492702765676
24 0.461655742085
25 0.431300296031
26 0.401759179864
27 0.373140717737
28 0.345538527658
29 0.319031462522
30 0.293683757281
31 0.269545366271
32 0.24665247215
33 0.225028145824
34 0.20468313538
35 0.185616761125
36 0.16781789362
37 0.151265991784
38 0.135932178918
39 0.121780335633
40 0.108768190182
41 0.0968483885183
42 0.0859695284381
43 0.0760771443439
44 0.0671146314486
45 0.0590241005342
46 0.0517471566327
47 0.0452255971667
48 0.0394020271206
49 0.0342203906773
50 0.029626420422

With just 23 people, there’s a greater than 50% chance that two people will have the same birthday. By the time you get to just 50 people, there’s a greater than 97% chance that two people have the same birthday!

As an amusing aside, the first time I saw this problem worked through was in an undergraduate discrete probability theory class, with 37 people in the class, and no duplicate birthdays!

Now – remember at the beginning, I said that the trick to working probability problems is all about how you formulate the problem. There’s a much, much better way to formulate this.

Think of the assignment of birthdays as a function from people to birthdays: f: P \rightarrow B. The number of ways of assigning birthdays to people is the size of the set of functions from people to birthdays. How many possible functions are there? | B | ^{| P |}. | B | is the number of days in the year – 365, and | P | is the number of people in the group.

The number of assignments where everyone has a unique birthday is the number of injective functions from P to B. (An injective function is a function where f(x) = f(y) \Rightarrow x = y.) How many injective functions are there? \frac{| B |!}{(| B | - | P |)!}.

The probability of all birthdays being unique is the size of the set of injective functions divided by the size of the set of all assignments: \frac{\frac{| B |!}{(| B | - | P |)!}}{ | B | ^{| P |} } = \frac{365!}{365^{|P|}\times (365 - |P|)!}.

So we’ve got the exact same result – but it’s a whole lot easier in terms of the discrete functions!
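
If you want to check the counting form numerically, here’s a one-line sketch (Python 3.8+; math.perm(b, p) computes b!/(b-p)!, which is exactly the number of injective functions):

    import math

    def distinct_prob_by_counting(people, days=365):
        # injective functions divided by all functions
        return math.perm(days, people) / days ** people

    print(distinct_prob_by_counting(30))   # ~0.294, the same answer as the recurrence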

The Elegance of Uncertainty

I was recently reading yet another botched explanation of Heisenberg’s uncertainty principle, and it ticked me off. It wasn’t a particularly interesting one, so I’m not going to disassemble it in detail. What it did was the usual crackpot quantum dance: Heisenberg said that quantum means observers affect the universe, therefore our thoughts can control the universe. Blah blah blah.

It’s not worth getting into the cranky details. But it inspired me to actually take some time and try to explain what uncertainty really means. Heisenberg’s uncertainty principle is fascinating. It’s an extremely simple concept, and yet when you realize what it means, it’s the most mind-blowingly strange thing that you’ve ever heard.

One of the beautiful things about it is that you can take the math of uncertainty and reduce it to one simple equation. It says that given any object or particle, the following equation is always true:

\sigma_x \sigma_p \ge \frac{\hbar}{2}

Where:

  • \sigma_x is a measure of the amount of uncertainty
    about the position of the particle;
  • \sigma_p is the uncertainty about the momentum of the particle; and
  • \hbar is a fundamental constant, called the reduced Planck constant, which is roughly 1.05457173 \times 10^{-34} \frac{m^2\,kg}{s}.

That last constant deserves a bit of extra explanation. Planck’s constant describes the fundamental granularity of the universe. We perceive the world as being smooth. When we look at the distance between two objects, we can divide it in half, and in half again, and in half again. It seems like we should be able to do that forever. Mathematically we can, but physically we can’t! Eventually, we get to a point where there is no way to subdivide distance anymore. We hit the grain-size of the universe. The same goes for time: we can look at what happens in a second, or a millisecond, or a nanosecond. But eventually, it gets down to a point where you can’t divide time anymore! Planck’s constant essentially defines that smallest unit of time or space.

Back to that beautiful equation: what uncertainty says is that the product of the uncertainty about the position of a particle and the uncertainty about the momentum of a particle must be at least a certain minimum.
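
To get a feel for the sizes involved, here’s a tiny back-of-the-envelope sketch (Python). Confining a particle to about 10^{-10} meters – roughly the size of an atom – is my own choice of example, not something from the principle itself; the point is just that squeezing \sigma_x down forces a floor under \sigma_p.

    HBAR = 1.05457173e-34            # reduced Planck constant, in J*s (= m^2*kg/s)

    def min_sigma_p(sigma_x):
        # smallest momentum uncertainty allowed for a given position uncertainty
        return HBAR / (2 * sigma_x)

    sigma_x = 1e-10                  # confine a particle to roughly atomic size (meters)
    print(min_sigma_p(sigma_x))      # ~5.3e-25 kg*m/s, no matter how good your instruments are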

Here’s where people go wrong. They take that to mean that our ability to measure the position and momentum of a particle is uncertain – that the problem is in the process of measurement. But no: it’s talking about a fundamental uncertainty. This is what makes it an incredibly crazy idea. It’s not just talking about our inability to measure something: it’s talking about the fundamental true uncertainty of the particle in the universe because of the quantum structure of the universe.

Let’s talk about an example. Look out the window. See the sunlight? It’s produced by fusion in the sun. But fusion should be impossible. Without uncertainty, the sun could not exist. We could not exist.

Why should it be impossible for fusion to happen in the sun? Because it’s nowhere near dense or hot enough.

There are two forces that you need to consider in the process of nuclear fusion. There’s the electromagnetic force, and there’s the strong nuclear force.

The electromagnetic force, we’re all familiar with. Like charges repel, opposite charges attract. The nucleus of an atom has a positive charge – so nuclei repel each other.

The nuclear force we’re less familiar with. The protons in a nucleus repel each other – they’ve still got like charges! But there’s another force – the strong nuclear force – that holds the nucleus together. The strong nuclear force is incredibly strong at extremely short distances, but it diminishes much, much faster than electromagnetism. So if you can get a proton close enough to the nucleus of an atom for the strong force to outweigh the electromagnetic, then that proton will stick to the nucleus, and you’ve got fusion!

The problem with fusion is that it takes a lot of energy to get two hydrogen nuclei close enough to each other for that strong force to kick in. In fact, it turns out that hydrogen nuclei in the sun are nowhere close to energetic enough to overcome the electromagnetic repulsion – not by multiple orders of magnitude!

But this is where uncertainty comes in to play. The core of the sun is a dense soup of hydrogen nuclei. They can’t move around very much without the other nuclei around them moving. That means that their momentum is very constrained – \sigma_p is very small, because there’s just not much possible variation in how fast any of them is moving. But the product of \sigma_p and \sigma_x has to be at least \frac{\hbar}{2}, which means that \sigma_x needs to be pretty large to compensate for the certainty about the momentum.

If \sigma_x is large, that means that the particle’s position is not very constrained at all. It’s not just that we can’t tell exactly where it is: its position is fundamentally fuzzy. It doesn’t have a precise position!

That uncertainty about the position allows a strange thing to happen. The fuzziness of position of a hydrogen nucleus is large enough that it overlaps with the nucleus of another atom – and bang, they fuse.

This is an insane idea. A hydrogen nucleus doesn’t get pushed into a collision with another hydrogen nucleus. It randomly appears in a collided state, because its position wasn’t really fixed. The two nuclei that fused didn’t move: they simply didn’t have a precise position!

So where does this uncertainty come from? It’s part of the hard-to-comprehend world of quantum physics. Particles aren’t really particles. They’re waves. But they’re not really waves. They’re particles. They’re both, and they’re neither. They’re something in between, or they’re both at the same time. But they’re not the precise things that we think of. They’re inherently fuzzy probabilistic things. That’s the source of the uncertainty: at macroscopic scales, they behave as if they’re particles. But they aren’t really. So the properties that we associate with particles just don’t work. An electron doesn’t have an exact position and velocity. It has a haze of probability space where it could be. The uncertainty equation describes that haze – the inherent uncertainty that’s caused by the real particle/wave duality of the things we call particles.

This one's for you, Larry! The Quadrature BLINK Kickstarter

After yesterday’s post about the return of vortex math, one of my coworkers tweeted the following at me:

Larry’s a nice guy, even if he did give me grief at my new-hire orientation. So I decided to take a look. And oh my, what a treasure he found! It’s a self-proclaimed genius with a wonderful theory of everything. And he’s running a kickstarter campaign to raise money to publish it. So it’s a lovely example of profound crackpottery, with a new variant of the “buy my book” gambit!

To be honest, I’m a bit uncertain about this. At times, it seems like the guy is dead serious; at other times, it seems like it’s an elaborate prank. I’m going to pretend that it’s completely serious, because that will make this post more fun.

So, what exactly is this theory of everything? I don’t know for sure. He’s dropping hints, but he’s not going to tell us the details of the theory until enough people buy his book! But he’s happy to give us some hints, starting with an explanation of what’s wrong with physics, and why a guy with absolutely no background in physics or math is the right person to revolutionize physics! He’ll explain it to us in nine brief points!

First: Let me ask you a question. Since the inclusion of Relativity and Dirac’s Statistical Model, why has Physics been at loose ends to unify the field? Everyone has tried and failed, and for this reason so many have pointed out: what we don’t need, is another TOE, Theory of Everything. So if I was a Physicist, my theory would probably just be one of these… a failed TOE based on the previous literature.

But why do these theories fail? One thing for sure is that in academia every new ideas stems from previously accepted ideas, with a little tweak here or there. In the main, TOEs in Physics have this in common, and they all have failed. What does this tell you?

See, those physicists, they’re all just trying the same stuff, and they all failed, therefore they’ll never succeed.

When I look at modern physics, I see some truly amazing things. To pull out one particularly prominent example from this year, we’ve got the Higgs boson. He’ll sneer at the Higgs boson a bit later, but that was truly astonishing: decades ago, based on a deep understanding of the standard model of particle physics, a group of physicists worked out a theory of what mass was and how it worked. They used that to make a concrete prediction about how their theory could be tested. It was untestable at the time, because the kind of equipment needed to perform the experiment didn’t exist, and couldn’t be built with the technology of the day. 50 years later, after technology advanced, their prediction was confirmed.

That’s pretty god-damned amazing if you ask me.

Based on the arguments from our little friend, a decade ago, you could have waved your hands around, and said that physicists had tried to create theories about why things had mass, and they’d failed. Therefore, obviously, no theory of mass was going to come from physics, and if you wanted to understand the universe, you’d have to turn to non-physicists.

On to point two!

Second: the underlying assumptions in Physics must be wrong, or somehow grossly mis-specified.

That’s it. That’s the entire point. No attempt to actually support that argument. How do we know that the underlying assumptions in physics must be wrong? Because he says so. Period.

Third: Who can challenge the old paradigm of Physics, only Copernicus? Physicists these days cannot because they are too inured of their own system of beliefs and methodologies. Once a PhD is set in place, Lateral Thinking, or “thinking outside the box,” becomes almost impossible due to departmental “silo thinking.” Not that physicists aren’t smart – some are genius, but like everyone in the academic world they are focused on publishing, getting research grants, teaching and other administrative duties. This leaves little time for creative thinking, most of that went into the PhD. And a PhD will not be accepted unless a candidate is ready and willing to fall down the “departmental silo.” This has a name: Catch 22.

It’s the “good old boys” argument. See, all those physicists are just doing what their advisors tell them to; once they’ve got their PhD, they’re just producing more PhDs, enforcing the same bogus rules that their advisors inflicted on them. Not a single physicist in the entire world is willing to buck this! Not one single physicist in the world is willing to take the chance of going down as one of the greatest scientific minds in history by bucking the conventional wisdom.

Except, of course, there are plenty of people doing that. For an example, right off the top of my head, we’ve got the string theorists. Sure, they get lots of justifiable criticism. But they’ve worked out a theory that does seem to describe many things about the universe. It’s not testable with present technology, and it’s not clear that it will ever be testable with any kind of technology. But according to Bretholt’s argument, the string theorists shouldn’t exist. They’re bucking the conventional model, and they’re getting absolutely hammered for it by many of their colleagues – but they’re still going ahead and working on it, because they believe that they’re on to something important.

Fourth: There is not much new theory-making going on in Physics since its practitioners believe their Standard Model is almost complete: just a few more billion dollars in research and all the colors of the Higgs God Particle may be sorted, and possibly we may even glimpse the Higgs Field itself. But this is sort of like hunting down terrorists: if you are in control of defining what a terrorist is, then you will never be out of a job or be without a budget. This has a name too: Self-Fulfilling Prophesy. The brutal truth…

Right, there’s not much new theory-making going on in physics. No one is working on string theory. There’s no one coming up with theories about dark matter or dark energy. There’s no one trying to develop a theory of quantum gravity. No one ever does any of this stuff, because there’s no new theory-making going on.

Of course, he hand-waves away one of the most fantastic theory-confirmations in physics. The Higgs got lots of press, and lots of people like to hand-wave about it and overstate what it means. (“It’s the god particle!”) But even stripped down to its bare minimum, it’s an incredible discovery, and for a jackass like this to wave his hands and pretend that it’s meaningless and we need to stop wasting time on stuff like the LHC and listen to him: I just don’t even know the right words to describe the kind of disgust it inspires in me.

Fifth: Who then can mount such a paradigm-breaking project? Someone like me, prey tell! But birds like me just don’t sit around the cage and get fat, we fly to the highest vantage point, and see things for what they are! We have a name as well: Free Thinkers. We are exactly what your mother warned you of… There’s a long list of us include Socrates, Christ, Buddha, Taoist Masters, Tibetan Masters, Mohammed, Copernicus, Newton, Maxwell, Gödel, Hesse, Jung, Tesla, Planck… All are Free Thinkers, confident enough in their own knowledge and wisdom that they are willing to risk upsetting the applecart! We soar so humanity can peer beyond its petty day to day and discover itself.

There are two things that really annoy me about this paragraph. First of all, there’s the arrogance. This schmuck hasn’t done anything yet, but he sees fit to announce that he’s up there with Newton, Maxwell, etc.

Second, there’s the mushing together of scientists and religious figures. Look, I’m a religious Jew. I don’t have anything against respecting theology, theologians, or religious authorities. But science is different. Religion is about subjective experience. Even if you believe profoundly in, say, Buddhism, you can’t just go through the motions of what Buddha supposedly did and get exactly the same result. There’s no objective, repeatable way of testing it. Science is all about the hard work of repeatable, objective experimentation.

He continues point 5:

This chain might have included Einstein and Dirac had they not made three fatal mistakes in Free Thinking: They let their mathematical machine dictate what was true rather than using mathematics only to confirm their observations, they got fooled by their own anthropomorphic assumptions, and then they rooted these assumptions into their mathematical methods. This derailed the last two generations of scientific thinking.

Here’s where he strays into the real territory of this blog.

Crackpots love to rag on mathematics. They can’t understand it, and they want to believe that they’re the real geniuses, so the math must be there to confuse things!

Scientists don’t use math to be obscure. Learning math to do science isn’t some sort of hazing ritual. The use of math isn’t about making science impenetrable to people who aren’t part of the club. Math is there because it’s essential. Math gives precision to science.

Back to the Higgs boson for a second. The people who proposed the Higgs didn’t just say “There’s a field that gives things mass”. They described what the field was, how they thought it worked, how it interacted with the rest of physics. The only way to do that is with math. Natural language is both too imprecise, and too verbose to be useful for the critical details of scientific theories.

Let me give one example from my own field. When I was in grad school, there was a new system of computer network communication protocols under design, called OSI. OSI was complex, but it had a beauty to its complexity. It carefully divided the way that computer networks and the applications that run on them work into seven layers. Each layer only needed to depend on the details of the layer beneath it. When you contrast it against TCP/IP, it was remarkable. TCP/IP, the protocol that we still use today, is remarkably ad-hoc, and downright sloppy at times.

But we’re still using TCP/IP today. Why?

Because OSI was specified in English. After years of specification, several companies and universities implemented OSI network stacks. When they connected them together, what happened? It didn’t work. No two of the reference implementations could talk to each other. Each of them was perfectly conformant with the specification. But the specification was imprecise. To a human reader, it seemed precise. Hell, I read some of those specifications (I worked on a specification system, and read all of the specs for layers 3 and 4), and I was absolutely convinced that they were precise. But English isn’t a good language for precision. It turned out that what we all believed was a perfectly precise specification actually had numerous gaps.

There’s still a lot of debate about why the OSI effort failed so badly. My take, having been in the thick of it, is that this was the root cause: after all the work of building the reference implementations, they realized that their specifications needed to go back to the drawing board, and get the ambiguities fixed – and the world outside of the OSI community wasn’t willing to wait. TCP/IP, for all of its flaws, had a perfectly precise specification: the one, single, official reference implementation. It might have been ugly code, it might have been painful to try to figure out what it meant – but it was absolutely precise: whatever that code did was right.

That’s the point of math in science: it gives you that kind of unambiguous precision. Without precision, there’s no point to science.

Sixth: What happens to Relativity when the assumptions of Lorentz’ space-time is removed? Under these assumptions, the speed of light limits the speed of moving bodies. The Lorentz Transformation was designed specifically to set this speed limit, but there is no factual evidence to back it up. At first, the transformation assumed that there would be length and time dilations and a weight increase when travelling at sub-light speeds. But after the First Misguided Generation ended in the mid 70’s, the weight change idea was discarded as untenable. It was quietly removed because it implied that a body propagating at or near the speed of light would become infinitely massive and turn into a black hole. Thus, the body would swallow itself up and disappear!

Whoops… bad assumption!

The space contraction idea was left intact because it was imperative to Hilbert’s rendition of the space-time geodesic that he devised for Einstein in 1915. Hilbert was the best mathematician of his day, if not ever! He concocted the mathematical behemoth called General Relativity to encapsulate Einstein’s famous insight that gravitation was equivalent to an accelerating frame. Now, not only was length assumed to contract, but space was assumed to warp and gravitation was assumed to be an accelerating frame, though no factual evidence exists to back up these assumptions!

Whoops… 3 bad assumptions in a row!

This is an interestingly bizarre argument.

Relativity predicts a change in mass (not weight!) as velocity increases. That prediction has not changed. It has been confirmed, repeatedly, by numerous experiments. The entire reasoning here is based on the unsupported assertion that relativistic changes in mass have been discarded as incorrect. But that couldn’t be farther from the truth!

Similarly, he’s asserting that the space-warping effects of gravity – one of the fundamental parts of general relativity – are incorrect, again without the slightest support.

This is going to seem like a side-track, but bear with me:

When I came in to my office this morning, I took out my phone and used foursquare to check in. How did that work? Well, my phone received signals from a collection of satellites, and based on the tiny differences in data contained in those signals, it was able to pinpoint my location to precisely the corner of 43rd Street and Madison Avenue, outside of Grand Central Terminal in Manhattan.

To be able to pinpoint my location that precisely, it ultimately relies on clocks in the satellites. Those clocks are in orbit, moving very rapidly, and in a different position in earth’s gravity well. Space-time is less warped at their elevation than it is here on earth. Relativity predicts that, because of that, the clocks in those satellites must run at a different rate than clocks here on earth. In order to get precise positions, those clocks need to be adjusted to keep time with the receivers on the surface of the earth.

If relativity – with its interconnected predictions of changes in mass, time, and the warp of space-time – didn’t work, then the corrections made by the GPS satellites wouldn’t be needed. And yet, they are.
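
To put rough numbers on that, here’s a back-of-the-envelope sketch (Python). The orbital radius and constants are standard textbook values that I’m supplying for illustration – they’re not from anything above. It combines the special-relativistic slowdown of a moving clock with the general-relativistic speedup of a clock sitting higher in earth’s gravity well.

    C   = 2.998e8         # speed of light, m/s
    GM  = 3.986004e14     # gravitational parameter of the earth, m^3/s^2
    R_E = 6.371e6         # radius of the earth, m
    R_S = 2.6571e7        # GPS orbital radius (~20,200 km altitude), m
    V   = (GM / R_S) ** 0.5          # orbital speed, about 3.9 km/s
    SECONDS_PER_DAY = 86400

    sr = -(V ** 2) / (2 * C ** 2)                 # moving clock runs slow
    gr = GM * (1 / R_E - 1 / R_S) / C ** 2        # higher clock runs fast

    drift_us = (sr + gr) * SECONDS_PER_DAY * 1e6
    print(f"net drift: about {drift_us:.1f} microseconds per day")   # ~38.5

That tens-of-microseconds-per-day drift is exactly the kind of correction GPS applies; left uncorrected, it would translate into position errors that grow on the order of kilometers per day.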

There are numerous other examples of this. We’ve observed relativistic effects in many different ways, in many different experiments. Despite what Mr. Bretholt asserts, none of this has been disproven or discarded.

Seventh: Many, many, many scientists disagree with Relativity for these reasons and others, but Physics keeps it as a mainstream idea. It has been violated over and over again in various space programs, and is rarely used in the aerospace industry when serious results are expected. Physics would like to correct Relativity because it doesn’t jive with the Quantum Standard Model, but they can’t conceive how to fix it.

In Quadrature Theory the problem with Relativity is obvious and easily solved. The problem is that the origin and nature of space is not known, nor is the origin and nature of time or gravitation. Einstein did not prove anything about gravitation, nor has anyone since. The “accelerating frame” conjecture is for the convenience of mathematics and sheds no light on the nature of gravitation itself. Quantum Chromo Dynamics, QCD, hypothesizes the “graviton” on the basis of similarly convenient mathematics. Many scientists disagree with such “force carrier” propositions: they are all but silenced by the trends in Physics publishing, however. The “graviton” is, nevertheless, a mathematical fiction similar to Higgs Boson.

Whoops… a couple more bad assumptions, but where did they come from?

Are there any serious scientists who disagree with relativity? Mr. Bretholt doesn’t actually name any. I can’t think of any credible ones. Certainly pretty much all physicists agree that there’s a problem, because relativity and quantum physics both appear to be correct, but they’re not really compatible. It’s a major area of research. But that’s a different thing from saying that scientists “disagree” with or reject relativity. Relativity has passed every experimental test that anyone has been able to devise.

Of course, it’s completely true that Einstein didn’t prove anything about gravity. Science doesn’t deal with proof. Science devises models based on observations. It tries to find the best predictive model of the universe that it can, based on repeated observation. Science can disprove things, by showing that they don’t match our observations of reality, but it can’t prove that a theory is correct. So we can never be sure that our model is correct – just that it does a good job of making predictions that match further observations. Relativity could be completely, entirely, 100% wrong. But given everything we know now, it’s the best predictive theory we have, and nothing we’ve been able to do can disprove it.

Ok, I’ve gone on long enough. If you want to see his last couple of points, go ahead and follow the link to his “article”. After all of this, we still haven’t gotten to anything about what his supposed new theory actually says, and I want to get to just a little bit of that. He’s not telling us much – he wants money to print his book! – but what little he says is on his kickstarter page.

So let me introduce that modification: it’s called Quadrature, or Q. Quadrature arose from Awareness as the original separation of Awareness from itself. This may sound strangely familiar; I elaborate at length about it in BLINK. The Theory of Quadrature develops Q as the Central Generating Principle that creates the Universe step by step. After a total of 12 applications of Quadrature, it folds back on itself like a snake biting its tail. Due to this inevitable closure, the Universe is complete, replete with life, energy and matter, both dark and light. As a necessary consequence of this single Generating Principle, everything in the Universe is ultimately connected through ascending levels of Awareness.

The majesty and mystery of Awareness and its manifestation remains, but this vision puts us inside as co-creative participants. I think you will agree that this is highly desirable from a metaphysical point of view. Quadrature is the mechanism that science has been looking for to unify these two points of view. Q has been foreshadowed in many ways in both physics and metaphysics. As developed in BLINK, Quadrature Theory can serve as a Theory of Everything.

Pretty typical grandiose crackpottery. This looks an awful lot like a variation of Langan’s CTMU. It’s all about awareness! And there’s a simple “mathematical” construct called “quadrature” that makes it all work. Of course, I can’t tell you what quadrature is. No, you need to pay me! Give me money! And then I’ll deign to explain it to you.

To make a long story short, Quadrature Theory supports four essential claims that undermine Relativity, Quantum Mechanics, and Cosmology while placing these disciplines back on a more secure foundation once their erroneous assumptions have been removed. These are:

  1. The origin of space and its nature arise from Quadrature. Space is shown to be strictly rectilinear; space cannot warp under any conditions.
  2. The origin of the Tempic Field and its nature arise from Quadrature. This field facilitates all types of energetic interaction and varies throughout space. The idea of time arises solely from transactions underwritten by the Tempic Field. Therefore, time as we know it here on Earth is a local anomaly, which uniquely affects all interactions including the speed of light. “C,” in fact, is a velocity, and is variable in both speed and direction depending on the gradient of the Tempic Field. Thus, “C” varies drastically off-planet!
  3. Spin is a fundamental operation in space that constitutes the only absolute measurement. Its density throughout space is non-linear and it generates a variable Tempic Field within spinning systems such as atoms, or galaxies. This built-in “time” serves to hold the atom together eternally, and has many other consequences for Quantum Mechanics and Cosmology.
  4. Gravity is also a ringer in physics. Nothing of the fundamental origin of gravity is known, though we know how to use it quite well. Given the consequence of Spin, gravity can be traced to forms that have closed Tempic Fields. The skew electric component of spinning systems will align to create an aggregated, polarized, directional field: gravity.

Pop science, of course, loves to talk about black holes, worm holes, time warps and all manner of the ridiculous in physics. There is much more fascinating stuff than this in my book, and it is completely consistent with what is observable in the Universe. For example, I propose the actual purpose of the black hole and why every galaxy has one. At any rate, perhaps you now have an inkling of why Quadrature Theory is a Revolution Waiting to Happen!

Pure babble, stringing together words in nonsensical ways. As my mantra goes: the worst math is no math. Here he’s arguing that rigorous, well-tested mathematical models are incorrect – because vague reasons.