Category Archives: Good Math

The Next Step in Computation: Level-2 Languages

Time to move on to Chomsky level 2! Level two languages are also known as context free languages, or CFLs. Level 2 languages are wonderful things. They’re simple enough to be parsed easily, but expressive enough to support a very wide range of useful languages. Pretty much every programming language that’s widely used can have its syntax specified with a level-2 language.

Grammars for Context Free Languages

In terms of the grammar, a CFL is easy to describe: it’s a language where the left-hand side of every grammar rule consists of exactly one non-terminal symbol. That’s it: the right hand side of a rule in a CFL grammar can be anything at all. Unlike the regular grammars, there are no restrictions on the position or the number of NTSs on the right hand side.

This change makes a huge difference in what you can do. In a CFL, you can count. You can have distant relationships – things like a sub-string that can occurs at the end of a string only if a match for it occurs at the beginning. The canonical example of this is paren matching: you can write a language that makes sure that you have matching parens: the same number of open and close parens.

  • A ::= '(' ')'
  • A ::= A A

This language includes ()((())()()(())), but not ()((())()()(()) (the same thing, but with one trailing paren omitted – 8 opens, 7 closes), or ()((())()()(()))) (the same thing, but with an extra close paren at the end – 8 opens, 9 closes).

As a quick aside: this also illustrates what we mean when we say that a language supports counting. When we say that a language requires counting, what we mean is that is that some feature of a string in the language has to have a number of repetitions dictated by another feature in the same string. In a paren-matching grammar, we require that the number of close parens must be the same as the number of open parens. We don’t just make sure that that’s true for 1, or 2, or 3 open parens, we have matching close parens. For any number of parens, the number of closes must be the same as the number of opens.

We can look at a much less trivial example of a simple grammar. As I’ve written about at other times, in computer science, there’s a formal language that we use for a ton of valuable things called lambda calculus. Lambda calculus is the formal mathematical basis of the Haskell language, and the basic tool used for specifying formal semantics of computational systems. The complete grammar of the simply typed lambda calculus is:

  • ident ::= "A" | "B" | "C" | ... | "Z"
  • expr ::= "lambda" ident "." expr
  • expr ::= ident
  • expr ::= expr expr
  • expr ::= "(" expr ")"

You can see a practical example of counting in this grammar. It guarantees that expressions in the lambda calculus are well-formed. We couldn’t do that in a regular language. That’s a huge boost in capability.

So why do we call this a context free language? In terms of the grammar, when we’re constructing a string, we’re always doing it by replacing non-terminal symbols with sequences of other symbols. When we do one of those replacements, if there’s an NTS “S” in the current string, then we can replace it. We can’t look at anything but the individual non-terminals in the string. We can’t consider the context in which a non-terminal occurs before we decide whether or not we’re allowed to replace it.

Capability of Context Free Languages

As we’ve seen, CFLs give us some pretty significant additional capabilities. That abilities to count and to describe non-consecutive relationships between different parts of a string in a language are a huge upgrade in computational capability. But CFLs are still pretty limited in many ways. You can’t write a grammar for a spoken language using CFGs – natural languages aren’t context free. We use CFLs and CFGs all the time for compilers and such in real programs, but there’s always an extra step involved, because there are aspects of real computer languages that can’t be captured in context-free form.

So what can’t you do in a CFL? I’m not going to try to formally characterize the limits of CFLs, but here are two examples:

Complex counting/Arithmetic
if you want a language of strings with the same number of Xs and Ys, that language is a CFL. If you want a string with the same number of Xs, Ys, and Zs – that isn’t.
Constrained relationships
We can have some distant relationships in a context grammar – but it’s a very limited capability. You can’t specify that a particular symbol can only occur in one place in the string if it already occured in a prior part of the string. For example, in the lamba calculus example, it really should say that you can only use the “expr ::= ident” rule if the ident occured in an enclosing LAMBA ident”. CFLs can’t do that.

Computing CFLs: the PushDown Automaton

So – now that we know what a CFL is, and what it can express: what’s
the basic computational model that underlies them? CFLs are computable using a kind of machine called a pushdown automaton (PDA). Pushdown automata are relatively simple: take a finite state machine, and add a stack.

For the non-CS folks out there, a stack is a last in first out (LIFO) storage system. What that means is that you can store something on the top of the stack whenever you want to (called pushing), look at what’s on top of the stack (peeking), and removing the element on top (popping). For a PDA, the transitions look like:

(mbox{state},mbox{top-of-stack},mbox{input}) rightarrow (mbox{state}, mbox{stack-action})

  • The top-of-stack in the transition can be either a symbol from the machine’s alphabet, or it can be “*”. If it’s a symbol, then the transition can only be taken if both the machine state and the top-of-stack match. If it’s “*”, then the transition can be taken regardless of the value on top of the stack.
  • The stack-action can be “push(symbol)”; “pop”, or “none”.
  • The machine accepts the input if it reaches a final state with the stack empty. (There are alternate formulations that don’t require an empty stack, or that only require an empty stack but don’t use final states. They’re exactly equivalent to empty stack + final state.)

As usual, it’s easiest to understand this with an example. So let’s look at a simple language consisting of parens and identifiers, where the parens have to match. So, for example, “((I)(((I)I)(II)))” would be accepted, but “(I)(((I)I)(II)))” (the same string, but with the first open removed) would be rejected.

Our machine has an alphabet of “(“, “)”, and “I”. It has two states: 0, and 1. 0 is both the initial state, and the only final state. The available transitions are:

  • (0, *, "(") rightarrow (0, push("(")): in state 0, if you see an open paren, push it onto the stack, and stay in state 0 It doesn’t matter what’s on the stack – if there’s an open-paren in state 0, you can do this.
  • (0, "(", ")") rightarrow (0, pop): in state 0, if you see a close paren and there’s an open-paren on top of the stack, then you’ve matched a pair, so you can pop the top open off of the stack.
  • (0, epsilon, ")") rightarrow (1, _): if you’re in state 0, and you see a close paren, but the stack in empty, then go to state one. State one is an error state: it means that you saw a close paren without a corresponding open paren. That’s not allowed in this grammar. Once you’re in state 1, you’re stuck.
  • (0, *, "I") rightarrow (0, _): in state zero, if you see an “I”, then you consume it and continue in state zero. It doesn’t matter what’s on the stack, and you don’t change the stack.

Or graphically:

So – if it sees a “(“, it pushes a “(” on the stack. If it sees an identifier, it just keeps going past it. If it sees a “)”, and there’s an “(” on top of the stack, it pops the stack; if it sees a “)” and there’s no “(” on the stack, then it switches into state 1. Since state 1 is not a final state, and there is no transitions that can be taken from state 1, the machine rejects the string if it sees an extra “)”. If there’s a “(” without a matching close, then when the machine finishes, it will have a non-empty stack, and so it will reject the string.

Finally, one nifty little note. The pushdown automaton is a very limited kind of machine. It can’t do complex arithmetic, or process complex grammatical constructions. There’s a lot that it can’t do. So what happens if we take this very limited machine, and give it a second stack?

It becomes maximally powerful – Turing complete. In fact, it becomes a Turing machine. We’ll see more about Turing machines later, but a Turing machine is equivalent to a two-stack PDA, and a Turing
machine can perform any computation computable by any mechanized computing process. So one stack is extremely constrained; two stacks is as un-constrained as any computing device can ever be.

Regular Expressions and Derivatives

When you’re working with regular languages specified in regular expression form, there’s a really cool idea that you can use for building regular expression matchers, and for describing how to convert from a regular expression to a NFA. It’s called the Brzozozwksi derivative of a regular expression – or just simply the derivative of a regexp.

The basic idea of the derivative is that given a regular expression, r, you can derive a new regular expression called the derivative with respect to symbol c, D_c(r). D_c(r) is a regular expression describing the string matched by r after it’s matched an r.

Continue reading

Nondeterminism in Finite State Automata

In the last post, I mentioned the fact that regular expressions specify the same set of languages as regular grammars, and that finite state machines are the computational device that can recognize those languages.

It’s even pretty easy to describe how to convert from regular expressions to FSMs. But before we do that, to make it a bit easier, we’ll extend our finite state machines. Doing that is interesting in itself: What we’re going to do is create non-deterministic finite state machines – NFA (for nondeterministic finite automata) for short.

Continue reading

Regular Languages

The simplest family of languages – and correspondingly, the simplest kind of computing machine – deal with regular languages. A regular language is very limited: there’s no counting, no deep patterns. They really don’t have much in the way of computational power. But they’re still very useful – the ubiquitous regular expression libraries in most programming languages are just easy ways of using regular languages to decompose simple patterned inputs.

The regular languages can be described in three ways: as regular grammars; as regular expressions; or as a kind of computing machine called a finite state machine or finite state automaton.

Continue reading

3-Valued Semantics

Before we can move from three-valued logic to fuzzy logic, we need to take a look at semantics – both how conventional two-valued logic handle semantics, and how three-valued logics extend the basic semantic structure. This isn’t exactly one of the more exciting topics I’ve ever written about – but it is important, and going through it now will set the groundwork for the interesting stuff – the semantics of a true fuzzy logic.

What we’ve looked at so far has been propositional 3-valued logics. Propositional logics aren’t particularly interesting. You can’t do or say much with them. What we really care about is predicate logics. But all we need to do is take the three-valued logics we’ve seen, and allow statements to be predicate(object).

In a conventional first-order predicate logic, we define the semantics in terms of a model or interpretation of the logic. (Technically, a logic and an interpretation aren’t quite the same thing, but for our purposes here, we don’t need to get into the difference.)

An interpretation basically takes a domain consisting of a set of objects or values, and does two things:

  1. For each atomic symbol in the logic, it assigns an object from the domain. That value is called the interpretation of the symbol.
  2. For each predicate in the logic, it assigns a set, called the extension of the predicate. The extension contains the tuples for which the predicate is true.

For example, we could use logic to talk about Scientopia. The domain would be the set of bloggers, and the set of blogs. Then we could have predicates like “Writes”, which takes two parameters – A, and B – and which is true is A is the author of the blog B.

Then the extension of “Writes” would be a set of pairs: { (MarkCC, Good Math/Bad Math), (Scicurious, Neurotic Physiology), … }.

We can also define the counterextension, which is the set of pairs for whiche the predicate is not true. So the counterextension of “writes” would contain values like { (MarkCC, Neurotic Physiology), …}.

Given a domain of objects, and the extension of each predicate, we know the meaning of statements in the logic. We know what objects we’re reasoning about, and we know the truth or falsehood of every statement. Importantly, we don’t need to know the counterextension of a predicate: if we know the extension, then the counterextension is simple the complement of the extension.

In three-valued Lukasiewicz logic, that’s not true, for reasons that should be obvious: if I(A) is the interpretation of the predicate A, then the complement of I(A) is not the same thing as I(lnot A). L_3 requires three sets for a predicate: the extension, the counterextension, and the fringe. The fringe of a predicate P is the set of values x for which P(x) is N.

To be more precise, an interpretation I for first order L_3 consists of:

  1. A set of values, D, called the domain of the logic. This is the set of objects that the logic can be used to reason about.
  2. For each predicate P of arity n (that is, taking n arguments), three sets ext(P), cext(P), and fringe(P), such that:
    • the values of the members of all three sets are members of D^n.
    • the sets are mutually exclusive – that is, there is no value that is in more than one of the sets.
    • the sets are exhaustive: ext(P) cup cext(P) cup fringe(P) = D^n.
  3. For each constant symbol a in the logic, an assignment of a to some member of D: I(a) in D

With the interpretation, we can look at statements in the logic, and determine their truth or falsehood. But when we go through a proof, we’ll often have statements that don’t operate on specific values – they use variables inside of them. In order to make a statement have a truth value, all of the variables in that statement have to be bound by a quantifier, or assigned to a specific value by a variable assignment. So given a statement, we can frequently only talk about its meaning in terms of variable assignments for it.

So, for example, consider a simple statement: P(x,y,z). In the interpretation I, P(x, y, z) is satisfied if (I(x), I(y), I(z)) in ext(P); it’s dissatisfied if (I(x), I(y), I(z)) in cext(P). Otherwise, (I(x), I(y), I(z)) must be in fringe(P), and then the statement is undetermined.

The basic connectives – and, or, not, implies, etc., all have defining rules like the above – they’re obvious and easy to derive given the truth tables for the connectives, so I’m not going to go into detail. But it does get at least a little bit interesting when we get to quantified statements. But to talk about we need to first define a construction called a variant. Given a statement with variable assignment v, which maps all of the variables in the statement to values, an x-variant of v is a variable assignment where for every variable y except x, . In other words, it’s an assignment where all of the variables except x have the same value as in v.

Now we can finally get to the interpretation of quantified statements. Given a statement forall x P, P is satisfied by a variable assignment v if P is satisfied by every x-variant of v; it’s dissatisfied if P is dissatisfied by at least one x-variant of v. Otherwise, it’s undetermined.

Similarly, an existentially quantified statement exists x P is satisfied by v if P is satisfied by at least one x-variant of v; it’s dissatisfied if P is dissatisfied by every x-variant of v. Otherwise, it’s undetermined.

Finally, now, we can get to the most important bit: what it means for a statement to be true or false in L_3. A statement S is T (true) if it is satisfied by every variable assignment on I; it’s F (false) if it’s dissatisfied by every variable assignment on I, and it’s N otherwise.

Computability

I just recently realized that I only wrote about computability back in the earliest days of this blog. Those posts have never been re-run, and they only exist back on the original blogger site. When I wrote them, I was very new to blogging – looking back, I think I can do a much better job now. So I’m going to re-do that topic. This isn’t just going to be a re-post of those early articles, but a complete rewrite.

The way that I’m going to cover this is loosely based on the way that it was first taught to me by a wonderful professor, Dr. Eric Allender at Rutgers, where I went to college. Dr. Allender was a really tremendous professor: he managed to take an area of computer science that could seem hopelessly abstract and abstruse, and turned it into something fun and even exciting to learn about.

Computability is the most basic and fundamental sub-field of theoretical computer science. It’s the study of what a mechanical computing device can do. Not just what a specific mechanical computing device can do, but what can any mechanical computing device do? What are the limits of what you can do mechanically? And once we know the limits, what can we discover about the nature of computation?

Continue reading

Fuzzy Logic vs Probability

In the comments on my last post, a few people asked me to explain the difference between fuzzy logic and probability theory. It’s a very good question.

The two are very closely related. As we’ll see when we start looking at fuzzy logic, the basic connectives in fuzzy logic are defined in almost the same way as the corresponding operations in probability theory.

The key difference is meaning.

There are two major schools of thought in probability theory, and they each assign a very different meaning to probability. I’m going to vastly oversimplify, but the two schools are the frequentists and the Bayesians

First, there are the frequentists. To the frequentists, probability is defined by experiment. If you say that an event E has a probability of, say, 60%, what that means to the frequentists is that if you could repeat an experiment observing the occurrence or non-occurrence of E an infinite number of times, then 60% of the time, E would have occurred. That, in turn, is taken to mean that the event E has an intrinsic probability of 60%.

The other alternative are the Bayesians. To a Bayesian, the idea of an event having an intrinsic probability is ridiculous. You’re interested in a specific occurrence of the event – and it will either occur, or it will not. So there’s a flu going around; either I’ll catch it, or I won’t. Ultimately, there’s no probability about it: it’s either yes or no – I’ll catch it or I won’t. Bayesians say that probability is an assessment of our state of knowledge. To say that I have a 60% chance of catching the flu is just a way of saying that given the current state of our knowledge, I can say with 60% certainty that I will catch it.

In either case, we’re ultimately talking about events, not facts. And those events will either occur, or not occur. There is nothing fuzzy about it. We can talk about the probability of my catching the flu, and depending on whether we pick a frequentist or Bayesian interpretation, that means something different – but in either case, the ultimate truth is not fuzzy.

In fuzzy logic, we’re trying to capture the essential property of vagueness. If I say that a person whose height is 2.5 meters is tall, that’s a true statement. If I say that another person whose height is only 2 meters is tall, that’s still true – but it’s not as true as it was for the person 2.5 meters tall. I’m not saying that in a repeatable experiment, the first person would be tall more often than the second. And I’m not saying that given the current state of my knowledge, it’s more likely than the first person is tall than the second. I’m saying that both people possess the property tall – but in different degrees.

Fuzzy logic is using pretty much the same tools as probability theory. But it’s using them to trying to capture a very different idea. Fuzzy logic is all about degrees of truth – about fuzziness and partial or relative truths. Probability theory is interested in trying to make predictions about events from a state of partial knowledge. (In frequentist terms, it’s about saying that I know that if I repeated this 100 times, E would happen in 60; in Bayesian, it’s precisely a statement of partial knowledge: I’m 60% certain that E will happen.) But probability theory says nothing about how to reason about things that aren’t entirely true or false.

And, in the other direction: fuzzy logic isn’t particularly useful for talking about partial knowledge. If you allowed second-order logic, you could have fuzzy meta-predicates that described your certainty about crisp first-order predicates. But with first order logic (which is really where we want to focus our attention), fuzzy logic isn’t useful for the tasks where we use probability theory.

So probability theory doesn’t capture the essential property of meaning (partial truth) which is the goal of fuzzy logic – and fuzzy logic doesn’t capture the essential property of meaning (partial knowledge) which is the goal of probability theory.

More 3-valued logic: Lukasiewicz and Bochvar

Last time I wrote about fuzzy logic, we were looking at 3-valued logics, and I mentioned that there’s more than one version of 3-valued logic. We looked at one, called K^S_3, Kleene’s strong 3-valued logic. In K^S_3, we extended a standard logic so that for any statement, you can say that it’s true (T), false (F), or that you don’t know (N). In this kind of logic, you can see some of the effect of uncertainty. In many ways, it’s a very natural logic for dealing with uncertainty: “don’t know” behaves in a very reasonable way.

For example, suppose I know that Joe is happy, but I don’t know if Jane is happy. So the truth value of “Happy(Joe)” is T; the truth value of “Happy(Jane)” is N. In Kleene, the truth value of “Happy(Joe) ∨ Happy(Jane)” is T; since “Happy(Joe)” is true, then “Happy(Joe) ∨ anything” is true. And “Happy(Joe) ∧ Happy(Jane)” is N; since we know that Joe is happy, but we don’t know whether or not Jane is happy, we can’t know whether both Joe and Jane are happy. It works nicely. It’s a rather vague way of handling vagueness, (that is, it lets you say you’re not sure, but it doesn’t let you say how not sure you are) but in so far as it goes, it works nicely.

A lot of people, when they first see Kleene’s three-valued logic think that it makes so much sense that it somehow defines the fundamental, canonical three-valued logic in the same way that, say, first order predicatelogic defines the fundamental two-valued predicate logic.

It isn’t.

There are a bunch of different ways of doing three-valued logic. The difference between them is related to the meaning of the third value – which, in turn, defines how the various connectives work.

There are other 3-valued logics. We’ll talk about two others. There’s Bochvar’s logic, and there’s Lukasiewicz’s. In fact, we’ll end up building our fuzzy logic on Lukasiewicz’s. But Bochvar is interesting in its own right. So we’ll take a look at both.

Continue reading

Representational Crankery: the New Reals and the Dark Number

There’s one kind of crank that I haven’t really paid much attention to on this blog, and that’s the real number cranks. I’ve touched on real number crankery in my little encounter with John Gabriel, and back in the old 0.999…=1 post, but I’ve never really given them the attention that they deserve.

There are a huge number of people who hate the logical implications of our definitions real numbers, and who insist that those unpleasant complications mean that our concept of real numbers is based on a faulty definition, or even that the whole concept of real numbers is ill-defined.

This is an underlying theme of a lot of Cantor crankery, but it goes well beyond that. And the basic problem underlies a lot of bad mathematical arguments. The root of this particular problem comes from a confusion between the representation of a number, and that number itself. “\frac{1}{2}” isn’t a number: it’s a notation that we understand refers to the number that you get by dividing one by two.

There’s a similar form of looniness that you get from people who dislike the set-theoretic construction of numbers. In classic set theory, you can construct the set of integers by starting with the empty set, which is used as the representation of 0. Then the set containing the empty set is the value 1 – so 1 is represented as { 0 }. Then 2 is represented as { 1, 0 }; 3 as { 2, 1, 0}; and so on. (There are several variations of this, but this is the basic idea.) You’ll see arguments from people who dislike this saying things like “This isn’t a construction of the natural numbers, because you can take the intersection of 8 and 3, and set intersection is meaningless on numbers.” The problem with that is the same as the problem with the notational crankery: the set theoretic construction doesn’t say “the empty set is the value 0″, it says “in a set theoretic construction, the empty set can be used as a representation of the number 0.

The particular version of this crankery that I’m going to focus on today is somewhat related to the inverse-19 loonies. If you recall their monument, the plaque talks about how their work was praised by a math professor by the name of Edgar Escultura. Well, it turns out that Escultura himself is a bit of a crank.

The specify manifestation of his crankery is this representational issue. But the root of it is really related to the discomfort that many people feel at some of the conclusions of modern math.

A lot of what we learned about math has turned out to be non-intuitive. There’s Cantor, and Gödel, of course: there are lots of different sizes of infinities; and there are mathematical statements that are neither true nor false. And there are all sorts of related things – for example, the whole ideaof undescribable numbers. Undescribable numbers drive people nuts. An undescribable number is a number which has the property that there’s absolutely no way that you can write it down, ever. Not that you can’t write it in, say, base-10 decimals, but that you can’t ever write down anything, in any form that uniquely describes it. And, it turns out, that the vast majority of numbers are undescribable.

This leads to the representational issue. Many people insist that if you can’t represent a number, that number doesn’t really exist. It’s nothing but an artifact of an flawed definition. Therefore, by this argument, those numbers don’t exist; the only reason that we think that they do is because the real numbers are ill-defined.

This kind of crackpottery isn’t limited to stupid people. Professor Escultura isn’t a moron – but he is a crackpot. What he’s done is take the representational argument, and run with it. According to him, the only real numbers are numbers that are representable. What he proposes is very nearly a theory of computable numbers – but he tangles it up in the representational issue. And in a fascinatingly ironic turn-around, he takes the artifacts of representational limitations, and insists that they represent real mathematical phenomena – resulting in an ill-defined number theory as a way of correcting what he alleges is an ill-defined number theory.

His system is called the New Real Numbers.

In the New Real Numbers, which he notates as R^*, the decimal notation is fundamental. The set of new real numbers consists exactly of the set of numbers with finite representations in decimal form. This leads to some astonishingly bizarre things. From his paper:

3) Then the inverse operation to multiplication called division; the result of dividing a decimal by another if it exists is called quotient provided the divisor is not zero. Only when the integral part of the devisor is not prime other than 2 or 5 is the quotient well defined. For example, 2/7 is ill defined because the quotient is not a terminating decimal (we interpret a fraction as division).

So 2/7ths is not a new real number: it’s ill-defined. 1/3 isn’t a real number: it’s ill-defined.

4) Since a decimal is determined or well-defined by its digits, nonterminating decimals are ambiguous or ill-defined. Consequently, the notion irrational is ill-defined since we cannot cheeckd all its digits and verify if the digits of a nonterminaing decimal are periodic or nonperiodic.

After that last one, this isn’t too surprising. But it’s still absolutely amazing. The square root of two? Ill-defined: it doesn’t really exist. e? Ill-defined, it doesn’t exist. \pi? Ill-defined, it doesn’t really exist. All of those triangles, circles, everything that depends on e? They’re all bullshit according to Escultura. Because if he can’t write them down in a piece of paper in decimal notation in a finite amount of time, they don’t exist.

Of course, this is entirely too ridiculous, so he backtracks a bit, and defines a non-terminating decimal number. His definition is quite peculiar. I can’t say that I really follow it. I think this may be a language issue – Escultura isn’t a native english speaker. I’m not sure which parts of this are crackpottery, which are linguistic struggles, and which are notational difficulties in reading math rendered as plain text.

5) Consider the sequence of decimals,

(d)^na_1a_2…a_k, n = 1, 2, …, (1)

where d is any of the decimals, 0.1, 0.2, 0.3, …, 0.9, a_1, …, a_k, basic integers (not all 0 simultaneously). We call the nonstandard sequence (1) d-sequence and its nth term nth d-term. For fixed combination of d and the a_j’s, j = 1, …, k, in (1) the nth term is a terminating decimal and as n increases indefinitely it traces the tail digits of some nonterminating decimal and becomes smaller and smaller until we cannot see it anymore and indistinguishable from the tail digits of the other decimals (note that the nth d-term recedes to the right with increasing n by one decimal digit at a time). The sequence (1) is called nonstandard d-sequence since the nth term is not standard g-term; while it has standard limit (in the standard norm) which is 0 it is not a g-limit since it is not a decimal but it exists because it is well-defined by its nonstandard d-sequence. We call its nonstandard g-limit dark number and denote by d. Then we call its norm d-norm (standard distance from 0) which is d > 0. Moreover, while the nth term becomes smaller and smaller with indefinitely increasing n it is greater than 0 no matter how large n is so that if x is a decimal, 0 < d < x.

I think that what he’s trying to say there is that a non-terminating decimal is a sequence of finite representations that approach a limit. So there’s still no real infinite representations – instead, you’ve got an infinite sequence of finite representations, where each finite representation in the sequence can be generated from the previous one. This bit is why I said that this is nearly a theory of the computable numbers. Obviously, undescribable numbers can’t exist in this theory, because you can’t generate this sequence.

Where this really goes totally off the rails is that throughout this, he’s working on the assumption that there’s a one-to-one relationship between representations and numbers. That’s what that “dark number” stuff is about. You see, in Escultura’s system, 0.999999… is not equal to one. It’s not a representational artifact. In Escultura’s system, there are no representational artifacts: the representations are the numbers. The “dark number”, which he notates as d^*, is (1-0.99999999…) and is the smallest number greater than 0. And you can generate a complete ordered enumeration of all of the new real numbers, {0, d^*, 2d^*, 3d^*, ..., n-2d^*, n-d^*, n, n+d^*, ...}.

Reading Escultura, every once in a while, you might think he’s joking. For example, he claims to have disproven Fermat’s last theorem. Fermat’s theorem says that for n>2, there are no integer solutions for the equation x^n + y^n = z^n. Escultura says he’s disproven this:

The exact solutions of Fermat’s equation, which are the counterexamples to FLT, are given by the triples (x,y,z) = ((0.99…)10^T,d*,10^T), T = 1, 2, …, that clearly satisfies Fermat’s equation,

x^n + y^n = z^n, (4)

for n = NT > 2. Moreover, for k = 1, 2, …, the triple (kx,ky,kz) also satisfies Fermat’s equation. They are the countably infinite counterexamples to FLT that prove the conjecture false. One counterexample is, of course, sufficient to disprove a conjecture.

Even if you accept the reality of the notational artifact d^*, this makes no sense: the point of Fermat’s last theorem is that there are no integer solutions; d^* is not an integer; (1-d^*)10 is not an integer. Surely he’s not that stupid. Surely he can’t possibly believe that he’s disproven Fermat using non-integer solutions? I mean, how is this different from just claiming that you can use (2, 3, 351/3) as a counterexample for n=3?

But… he’s serious. He’s serious enough that he’s published published a real paper making the claim (albeit in crackpot journals, which are the only places that would accept this rubbish).

Anyway, jumping back for a moment… You can create a theory of numbers around this d^* rubbish. The problem is, it’s not a particularly useful theory. Why? Because it breaks some of the fundamental properties that we expect numbers to have. The real numbers define a structure called a field, and a huge amount of what we really do with numbers is built on the fundamental properties of the field structure. One of the necessary properties of a field is that it has unique identity elements for addition and multiplication. If you don’t have unique identities, then everything collapses.

So… Take \frac{1}{9}. That’s the multiplicative inverse of 9. So, by definition, \frac{1}{9}*9 = 1 – the multiplicative identity.

In Escultura’s theory, \frac{1}{9} is a shorthand for the number that has a representation of 0.1111…. So, \frac{1}{9}*9 = 0.1111....*9 = 0.9999... = (1-d^*). So (1-d^*) is also a multiplicative identity. By a similar process, you can show that d^* itself must be the additive identity. So either d^* == 0, or else you’ve lost the field structure, and with it, pretty much all of real number theory.

Apex: My Editor Project

Lots of people were intrigued by my reference to my editor project. So I’m sharing the current language design with you folks. I’m calling it Apex, as a homage to the brilliant Acme, which is the editor that comes closest to what I’d like to be able to use.

So. The language for Apex is sort of a combination of TECO and Sam. Once the basic system is working, I’m planning on a UI that’s modeled on Acme. For now, I’m going to focus on the language.

Continue reading