Yes, the thought of two vintage computing devices battling as digital gladiators in a cage match to the death under the watchful eye of our supercomputer overlord, the HAL 9000, is amusing at best. But this is Pi Day, and what better way to celebrate than with silicon blood. Who knows, perhaps this did happen a long time ago in The Grid.
Let the games begin!
Mano-a-mano
In a battle before time itself (I’m referring to the lack of a clock in the Apple II and 41C), this David and Goliath-esque story started to unfold as I was cleaning my office not too long ago. (Cleaning my office is really an excuse to get my retro on.)
I’ve always been a fan (and user) of the Apple II, RPN calculators, and of course, Pi. It’s no wonder that I have stacks of machines, journals, books, magazines, printouts, etc… related to all three topics.
As I was filtering through stacks of papers to determine the value of each scrap of paper, I stumbled across a 1980 article by Ron Knapp, PI TO 1,110 PLACES[1]. I had forgotten about this article. I remember printing it out along with a 1981 followup article by the same author, FASTEST PI TO 1,000 PLACES[2]. At the time I was interested in the relative 1000-digit Pi performance of all ’80s calculators, and gravitated towards the newer, faster code, and had forgotten to read the first article.
So, I read it. Hmmm…
The run-time for this program is relatively short: 30 places in 2 min., 90 places in 9 min., 200 places in 34 min., 1000 places in 11 ½ hr., and 1160 places in 15 ¼ hr. When you consider that an article in “The Best of Micro” some time ago stated that an Apple II had been programmed to calculate “pi” to 1000 places in 40 hours–the run-time for the 41-C, a shirt-pocked calculator, seems very fast indeed[2].
Wow! Really? Can this be true? I was determined to get to the bottom of this. I’ll be “cleaning” my office for a bit longer.
Confirming the Results
It didn’t take too long to track down the source of Ron Knapp’s statement; Robert J. Bishop’s article, APPLE PI[3]:
Since the program is written entirely in BASIC it is understandably slow[3].
Ah ha! BASIC! That explains everything. Right?
I knew from my Apple II days in high school that machine language was faster than BASIC, as was Pascal. But is a circa 1977 computer running BASIC really 4-5 times slower than a circa 1979 pocket calculator? I wanted to see the results myself, and find out why?.
I do not have a clock card in my Apple //c, nor did I have the time to fashion one out of a serial cable and a GPS unit (I’ve done this before), and I am not going to sit around with a stopwatch; I opted to use Virtual ][ on my Mac. IMHO, Virtual ][ is the most authentic Apple II series emulator available. And thankfully Virtual ][ emulates a clock card.
For the HP-41C I’ll be using my HP-41CX. The 41C and 41CX run at the same speed (the X in CX stands for eXpanded memory). Fortunately the CX (unlike the C) has a clock built-in.
I modified both programs to output to a printer and to leverage the clocks so that I could get an accurate reading of the performance. The output with source for the Apple II and HP-41C can be obtained here and here.
The results:
|
41 hours, 46 minutes, and 50 seconds. Confirmed, BASIC is indeed slow.
NOTE: I had to compute 1001 digits of Pi to get 1000 accurate digits.
|
NOTE: I did not use Ron’s original 1980 program (where the first stone was thrown), but opted to use the optimized 1981 version instead. The Apple II would have taking a a beating in either case.
And the Winner Is…
Ron was right to boast. I have been unable to find any public Apple II results dated before 1982 to take back the title, Microcomputer/Calculator Class: Fastest Pi to 1000 digits.
To add insult to injury, in 1949, the hulking 30-ton giant ENIAC enslaved a team of four humans and computed Pi to 2037 digits in 70 wall clock hours over a three day holiday weekend. ENIAC throughout the computation also reversed the calculations to check for errors. The humans were leveraged as bio-mechanical RAM since ENIAC could only hold 200 decimal numbers in memory. Team ENIAC worked in shifts collecting, organizing, and feeding IBM punch cards. All compute, check, and human interaction took 70 hours[4, pp. 277-281, 731].
I’ll demonstrate later that 2037 digits in 70 hours is computationally the same as 1000 digits in ~17 hours. The Apple II in 1978 cannot best a 30-something-year-old cyborg?
Why? Did all three use the same algorithm? In the same way? Are there technological factors or limitations? Can the Apple II best the HP-41C and ENIAC?
To answer all those questions we need to better understand how to compute Pi to 1000 digits.
Computing Pi with arctan
There are many different methods to compute the digits of Pi by hand or with a computer. The arctan formulas since the second half of the 17th century have been the most popular for computing a relatively small (less than a million) number of Pi digits[5, pp. 65, 205]. So it is no surprise that Ron, Robert, and Team ENIAC used arctan as well. Specifically, Machin’s formula.
Machin’s arctan Formula
John Machin discovered and used this formula in 1706 to compute the first 100 digits of Pi[5, p. 72]. It has been the most widely used formula by Pi digit hunters until the early ’80’s when the state-of-the-art shifted to Gauss AGM and other exotic formulas[5, pp. 206]. However, Machin is still very popular because it’s easy to implement, fast, and has a low memory footprint. All three are important factors when using retro tech.
Many mathematical functions can be numerically computed with a series which is relatively easy to calculate. The arctan function possesses just such a series which was discovered by James Gregory in 1671[5, p. 67].
Combine the two and voilà! We have something that we can compute efficiently that will converge quickly.
The first series converges with log10(52) = 1.3979 decimal places per series term and the second converges with log10(2392) = 4.7568 decimal places per series term[5, p. 72].
The total number of terms required for each series for 1000 decimal Pi digits can be calculated with ⎡1000 / log10(52)⎤ = 716 and ⎡1000 / log10(2392)⎤ = 211 respectively. 927 terms total.
NOTE: The ⎡ ⎤ brackets is defined as round up to the next integer.
NOTE: The number of terms does not change if the numeric base for computation changes. IOW, it is constant. More on this later.
TIP: http://www.codecogs.com/latex/eqneditor.php was used to create all equations used in this article.
Arbitrary-Precision Arithmetic
Go ahead, load up that last equation into Apple’s Integer BASIC and see what you get for Pi. 3? Don’t worry, you are are in good company; the biblical value of Pi = 3[6, p. 174] :-). Apple’s Floating Point BASIC doesn’t fair well either; at best, it would have computed 10 or 11 digits.
Clearly we are going to have to create some arbitrary-precision (AP) arithmetic code. The good news is that all we need is add, subtract, multiply, and divide. The simplest method for all four is to go old skool. That’s right, the same way you learned how to do this in elementary school is still applicable today. Simply represent each digit in an array and perform the operations the same way you would with pen and paper. Get out a pen and paper and try it!
It’s easier than it sounds. Although the adds and subtracts will be AP +/- AP, the multiply and divides will be AP ∗/÷ SP (single-precision). Now that’s even easier. AP:AP mult/div is harder and the subject of many papers and AP libraries.
For each operation, as your code loops from the least significant digit to the most significant digit (except divide, go the other way), the four arithmetic routines will perform two basic operations per iteration; add, sub, mult, div, and then a carry. I understand that this is an oversimplification, and if you want a clearer understanding, then I suggest you look at some code. However, the point I am trying to make is that if the number of digits doubles (e.g. computing Pi to 2000 digits), then the number of operations for AP:AP add/sub and SP:AP div/mult also doubles. It also works the other way. If I can half the number of digits, then I half my number of operations, and therefore half the time for basic arithmetic operations.
Covering Your Bases
The amount of effort it takes an 8-bit computer to add two base 10 numbers is the same effort as adding two base 256 (28) numbers. Therefore, if you can pack your 1000 digit base 10 arrays into fewer base 256 digits, then you can also reduce the number of operations and get a proportional speed up in time. Your new array size will = ⎡number of decimal digits / log10base⎤.
A few examples based on 1000 decimal digits of Pi:
Base | Calculation |
Array Size
|
10 | ⎡1000 / log1010⎤ |
1000
|
28 (256) | ⎡1000 / log10(28)⎤ |
416
|
216 (65536) | ⎡1000 / log10(216)⎤ |
208
|
100,000 | ⎡1000 / log10100,000⎤ |
200
|
10,000,000,000 | ⎡1000 / log1010,000,000,000⎤ |
100
|
The speed-up should be proportional to the array size assuming that a larger base will not have a disproportional overhead. E.g. The 6502, an 8-bit processor, will take more that twice the instructions to add two 16-bit numbers compared to two 8-bit numbers. However using 16-bit numbers on an 8-bit system is still more efficient that using base 10.
There is a caveat when not using base 10, 100, 1000, etc…; the end result must be converted back to base 10 if you are truly computing 1000 decimal digits of Pi. This adds a bit of overhead to the end of the computation, however it is only a fraction (e.g. 1/6th for base 216 to base 10) of the total computation.
I am assuming that ENIAC can leverage it’s double precision support and use base 1010 (10,000,000,000) and therefore only require an array of 100 base 1010 digits for 1000 decimal digits of Pi. Ron’s HP-41C program uses base 105 (100,000) requiring only 200 base 105 digit arrays. Robert’s Apple Pi uses base 10 requiring 1000 digit arrays.
This begins to shed some light on why Apple Pi performs so poorly. If Apple Pi could use base 100,000 or base 10,000,000,000 then its time would be reduced by a factor of 5 and 10 respectively. However, Integer BASIC and the 6502 are not designed to support this. ENIAC and the HP-41C are both designed and optimized for base 1010. As they should be; they are calculators.
Computation Complexity
Earlier I said, I’ll demonstrate later that 2037 digits in 70 hours is computationally the same as 1000 digits in ~17 hours.
That statement makes two assumptions. The first is that Team ENIAC used the Gregory expansion of Machin’s arctan formula, and the second is that their AP arithmetic scaled linearly. If that is the case, as is the case with Ron’s and Robert’s programs, and since the number of terms and the number of digits is a fixed ratio based on the desired number of digits; then if the number of desired digits double, then so does the number of terms to be computed and the size of the array to hold the digits, and therefore the amount of computation is double * double. IOW, the computational complexity is O(n2).
Simply put, if I double the digits the time increases by four. Therefore, the estimated time for ENIAC to compute 1000 digits is: 70 hr * (1000/2037)2 = 16.87 hr.
Back to Apple Pi
The number of terms to be computed will be constant regardless of the program’s numeric base. That leaves optimizing the arithmetic routines. Assuming that Robert’s BASIC program could be rewritten in base 256 (and it cannot without a lot of effort; effort that would not yield sufficient benefit), then the best speed up would be 41.78 hr * (416/1000) = 17.38 hr. Still deficient.
When writing fundamental AP routines it helps to have unsigned double precision integer support; if not, make your base the square-root of (the largest positive integer + 1), e.g. the largest HP-41C positive integer is 9,999,999,999, making the optimal base 100,000 since the HP-41C does not have double precision integer support. Because of the wacky integer support in Integer BASIC, the largest base would be 181 and not offer up enough performance.
Bottom line, Integer BASIC is most likely our culprit. In 1978 Robert had very few options. BASIC and assembly language were the dominate languages, and if I were in Robert’s shoes I would have picked BASIC too (Robert wrote the program in less than 40 hours)[3]. However, there was another option; Apple FORTH from Programma Consultants[7, p. 26]. And by 1980, when Ron first mentioned the Apple II’s poor performance, there were many more Apple languages to choose from, such as FORTRAN and Pascal.
Pi Day Rematch: Apple II vs. HP-41C
FORTH seems promising for a few reasons; one, it was available in 1978; two, it’s most like RPN, making a comparison to the 41C a bit more apples to apples; and three, with FORTH it will be easy to use a larger base (I used base 216) thus reducing the size of the arrays. Lastly, FORTH is fast.
Unfortunately I was unable to locate a copy of Apple FORTH from Programma Consultants, so I opted to use Mad Apple Forth (MAF) as a substitute. My program with timings can be located here. I didn’t have time to figure out how to access the clock card from MAF, so I used the Virtual ][ record function and then noted the time stamps on the output.
Start: 22:44:47 Print: 23:05:24 End: 23:09:43 Total Time: 00:24:56
The HP-41C and ENIAC were both schooled by the Apple II with FORTH.
HP-41C cries foul!
If you could only use what came in the box with your Apple II or HP-41C, then what would be the results? Fortunately the Apple II mini-assembler is in the box.
In 1982 Glen Bredon wrote an APPLE PI program in assembly that blows the battery door off the 41C with a time of 194 seconds. Game over! The Apple II wins again!
My 1496 sec. FORTH version doesn’t hold an LED to assembly.
If you are looking for Glen’s program, it is packaged with the Merlin Assembler.
HP-41C Machine Language
To be thorough I asked a friend of mine to compute Pi on the 41C with a 100% machine language (ML) program. He came back with 5 hours, 9 minutes. No question, the Apple II is king.
Victory Lap
In my quest to best the 41C, I tried C as well. The C language hit the seen in the mid ’80s making it easier to write and port over existing Pi programs.
Why not rub it in a bit?
C Version | C Year | Base |
Time (s)
|
Aztec K&R C 3.2b | 1986 | 216 |
2180
|
Aztec K&R C 3.2b | 1986 | 10 |
6746
|
cc65 ANSI C 2.12 | 2010 | 216 |
1764
|
cc65 ANSI C 2.13 | 2010 | 216 |
1514
|
cc65 ANSI C 2.13 | 2010 | 10 |
4241
|
Summary
BASIC sucks. The HP-41C is truly awesome, and for a time bested the Apple II in the Fastest to Compute 1000 digits of Pi benchmark. But, in the end, the true potential of the Apple II smoked the 41C.
BTW, you’ll never need more than 50 digits of Pi[5, p. 153].
On the other hand…
To continue reading this fascinating tale click here.
Some Great Pi Books
If you like Pi, especially computing Pi, then I recommend the following three books[4, 5, 6]:
Print References
- Knapp, Ron. PI TO 1,110 PLACES. PPC Calculator Journal Jun. 1980: 9-10.
- Knapp, Ron. FASTEST PI TO 1,000 PLACES. PPC Calculator Journal Aug.-Dec. 1981: 68-69
- Bishop, Robert J. APPLE PI. MICRO THE 6502 JOURNAL Aug.-Sep. 1978: 15-16.
- Berggren, Lennart, Jonathan M. Borwein, and Peter B. Borwein. Pi, a source book . 3rd ed. New York: Springer, 2004. Print.
- Arndt, Jörg, and Christoph Haenel. [Pi] – unleased . 2. ed. Berlin: Springer, 2001. Print.
- Beckmann, Petr. A history of [pi] (pi) . 4th ed. New York: Barnes & Noble, 19931971. Print.
- Apple Computer Inc. the best of the user group newsletters for 1978. Contact ’78 Dec. 1978: 26.
Credits
- Pi Day Deathmatch Poster. Dhemerae Ford (dhemerae@gmail.com).
- ENIAC photo (linked). Wikipedia.
I seem to recall that Steve Wozniak chained a number of apple computers (model ?) together to calculate Pi. While attempting to track that article down, I came across this jerkwerks article. I find it very informative and well written. I would also like to tell the author that I also have a working Apple II, double drives, and original monitor, and with several extra boards for various purposes. I had to buy a new power supply in the 1990’s.
As Ever,
Orin Keplinger – near Chicago
Pingback: Pi Day 2011 | Trends Pics
A soon to be available replacement CPU board for the original HP 41c will run at 50X turbo speed.
That would cut the 5 hour machine language program down to a bit over 6 minutes, beating the Apple II time of 24 minutes. Running the user code program at 50X would be a bit over 10 minutes.
Also, I would think that a full machine language program should be much faster than 5 hours. If a user language program takes 8 hours, machine code should be faster than 5 hours. :-)
When you get your 50X turbo results let me know. I can send you the barcode or bin, lif, dat, raw, etc… of the RPN program and then you can test for me. The replacement board does not have a clock so you’ll need that to.
For some reason on the 41CX if you have the printer connected it will run a bit slower, so run without the printer, then after the program completes connect the printer and then GTO E to print out the results and the time.
Yep, no problem. Will do.
I do think the MCODE program had to have had some problems to run at 60% of the user code speed. That just can’t be right. :-)
If using a CPU accelerator is fair game one could use an 8 MHz “Zip Chip” for the Apple II.
This this a) 20 year old technology and b) should result in a speedup of 7 to 8 times regular CPU speed (slightly above 1 MHz on the Apple II).
24 minutes divided by 8 would give around 3 minutes, obviously…
Aah, but the HP-41 lives on. A new microprocessor board capable of running up to 50x in speed will be shipping in a couple of weeks. See: : http://systemyde.com/hp41/
I knew Ron as a friend when he was working on the Pi programs. There is a story behind the story. While the article evaluates the math processes there is an aspect that is also important and that is utilizing the peculiarities of the machine to gain a speed advantage.
X Y,
Richard
Hi Richard. Yes, Ron’s program is genius and his ability to use very few registers was remarkable.
How long does it take the Apple ][ in emulation with the speed governor off? :)
35x faster on my 2.13GHz MacBook Air.
Interestingly, Glen Bredon’s assembly solution produces its result in 3:14.
Just a quick note. Although the old Apple II (not Plus, not e, not c, not gs) was VERY limited in some ways, with an inexpensive 8 bit 6502 processor with a limited instruction set and a 1 mHz cpu clock speed, its real strength was that it finally gave the public nerds inside access to a computing machine, largely because of Steve Wozniak’s contributions, a hacker at heart. And they took off and ran with it, helping to start Apple into becoming the company that it is today.
I, too, was intrigued by Bob Bishop’s _Micro_ magazine article on calculating Pi. So much so that in 1980 I wrote a machine language program for my 48K Apple II that was able to calculate Pi to 36,364 digits in 52 hours, 28 minutes and 29 seconds. I was able to check the accuracy of my result by visiting the UCLA library and comparing the last few hundred digits in my result. I wrote it in machine code (no assembler), using Woz’s internal disassembler to check my work. I tried Gauss’ equation and Stirling’s equation, but I found Machin’s formula to be the fastest on my Apple II. I printed the result with a short AppleSoft program that ‘co-habitated’ memory with my machine code. A major issue in outputing 36,000+ on a 48K machine (minus room for the lower 2k+ used for the Applesoft program) is that you don’t have a lot of room for partial results as you complete the task, particularly when changing the binary to decimal for output. And the code was fully relocatable on this 8 bit processor. It was a fun challenge.
I actually submitted it to Apple as a lark program. It was submitted on the media of that time, a cassette tape. :-)
Craig Peterson
Apple II owner since 1978
(and Apple II Plus, Apple IIgs, Macintosh 128K, 512K, iMac, Mac LC, and PCs as well)
Machine code. That’s hard core. :-)
Stomer’s Arc-tangent is a bit faster than Machin (http://sense.net/~egan/hpgcc/#Example:%20%20%CF%80%20Shootout).
If you still got your code, I’d really like to see it.
BTW, I am working on an article about the Apple II cassette tape. Hopefully I’ll be done in about a month.
PS – Using your ‘square of the ratios of the digits’ method, I would guess that the program above would have calculated the value of Pi to 1000 digits in about 2 minutes and 23 seconds. I’m sure I did some timings for this in 1980, but I can’t find any record of it. So machine code was faster than integer basic, but we all knew that anyway.
Craig Peterson
Yep. Now I really want to see your code.
Hi,
I am selling on eBay “HP-41C Synthetic Programming Manual”, the original manual (Larken Publications-Oregon, 92 pages (letterformat), March 1982)
http://www.benl.ebay.be/itm/120871262164?ssPageName=STRK:MESELX:IT&_trksid=p3984.m1558.l2649
I you are interested, could be an opportunity….
Thank you,