#36 - Apr 99 [Updated 2 May 99]

 

StuChat 36

Hi Folks,
Been a long time since I did this - actually took some time out to write a StuChat
rather than just throwing one together. This one will cover: Evangelist, fast square
root and arctan algorithms and finally a quick run down and some software related to
Genetic Algorithms.

 

First ladies and gentlemen, a minute of silence for the death of the EvangeList
which ceased operations on 16th April 1999. You've probably read it elsewhere
by now but here's the closing message:

In the past two years Apple has experienced a stunning turnaround. This
is due to many things including the steadfast loyalty of Apple's
customers--and EvangeListas are the most steadfast of the steadfast.

The original purpose of EvangeList was to counteract the negative news
about Apple and Macintosh, and I believe that EvangeList has served its
purpose--fantastically, as a matter of fact. So after discussing what we
should do with EvangeList with the folks at Apple, we've decided to
retire the list.

It is with fond memories I bid the Evangelist fairwell. Two years
ago were indeed very dark times. Many people (myself included) alluded to the notion
of the "Dark Side" slowly gaining ground and converts. I must admit it isn't half as
much fun these days; knowing we were right (I am aware that i may be getting a little
ahead of the situation here) and that we would climb out into the light one day "soon."

These days it is rather different; we screamed for on-board 3d hardware and standards;
we screamed for better performance (though we always knew this was a "given" considering
we were RISC and the darkside wasn't) and we screamed for memory protection. Now we have
them (or very soon will have) has the fun gone?

Yes and no. Yes, because there's very little to moan about at the moment; very little
to want. No because the fundamentals have changed very little - PC's are still running
antiquated instruction sets; Macs are still Macs and the HW has just standardised -
what's the difference betwen a 603 and a G3? Very little. A G4? AltiVec which is already
a very well understood and anticipated technology. People still expect Mac software to
be better than PC software and so it should be!

 

 

Square Root algorithm (with thanks to Prof. Ranko Bojanic)
Very often one needs fast trig functions and would be quite happy to trade accuracy
for speed. A classic use for sqrt is when calculating the length of a line (in 3d
space) viz:

sqrt(a^2+b^2+c^2)

Using Newton's method and selecting the starting point we can get a very good
approximation after just two or three iterations - we really don't need absolute
accuracy.

First compute R = a^2 + b^2 + c^2. Then

set x_0 = a+b+c and then compute

x_1 = 0.5*(x_0 + R/x_0)
x_2 = 0.5*(x_1 + R/x_1)
x_3 = 0.5*(x_2 + R/x_2)
etc for greater accuracy

For example, if we take 10,11,12 for a,b and c the result of sqrt(a^2+b^2+c^2)
is 19.10.

In this example:
R is calculated as 365 (10^2+11^2+12^2) and x_0 as 33
Now the first iteration:
x_1 = 0.5*(x_0 + R/x_0)

R/x_0 = 365 / 33 = 11.06
+x_0 = 11.06 + 33 = 44.06
*0.5 = 22.03

x_1 computes as 22.03

On the second iteration:
x_2 = 0.5*(x_1 + R/x_1)
R/x_1 = 365 / 22.03 = 16.57
+x_1 = 16.57 + 22.03 = 38.6
*0.5 = 19.3
x_2 computes as 19.3

The third iteration:
x_3 = 0.5*(x_2 + R/x_2)
R/x_2 = 365 / 19.3 = 18.91
+x_2 = 18.91 + 19.3 = 38.21
*0.5 = 19.10
x_3 computes as 19.10

 

As you can see, within a few iterations the result is close enough - in this case three.

The error estimate after n iterations can be calculated with:
x_n - Sqrt(a^2+b^2+c^2) <
< 2 * Sqrt(a^2+b^2+c^2) * (0.5(Sqrt(3)-1))^(2^n)

 

 

 

Fast ATAN (see below for update re: more accurate atan)

Atan can be used to calculate the angle or heading to an object on a 2d plane given
a right angle triangle as follows:
We know that:
sin(x)=opposite/hypotenuse
cos(x)=adjacent/hypotenuse
tan(x)=opposite/adjacent

When considering a 360 degrees "arc" if we split it into quadrants then we can construct
right angle triangles. From a right angle triangle using the inverse functions arcsin,
arccos and arctan we can calculate the angle given the length of the sides of the triangle.

Now you can use Pythagorus to calculate the length of the hypotonuse should you really want
to, but we already have the length of the adjacent and opposite sides, so if we had a quick
arctan function we could use this:
angle=arctan(opposite/adjacent)

The problem with arctan is that it gets very non-linear above an input of 1.0 which equates
to 45 degrees, or one half of a quadrant. What you can do is split the quadrant in half
which means the greatest input to the arctan function will be 1.0 and the greatest angle
returned will be 45 degrees - you then need to correct the result depending on what octant
the point lies in, but that's a simple addition.

The actual arctan function then becomes a simple table look up. In this example
I have used a table with 21 integer values declared. The pseudo-code routine
is accurate to about 2.5 degrees over a 45 degree arc.

**Takes a floating point input value called "in" between 0 and 1,
**returns an integer result in degrees
if in<0 or in>1 error
index=convert_to_int(in*20)
return atans(index)

atans: dc.w 0,3,6,8,11,14,17,19,22,24,26,29,31,33,35,37,38,40,42,43,45

**end


Update 02 May 99 by Ranko Bojanic
A more accurate atan function.

I have essentially found what I believe is the best method
for the evaluation of the atan function. The computation of
the atan function as described in StuChat 36 is very ingenious,
but its accuracy is not very good.

Here is a very simple formula to compute the atan function:

atan(x) = x/(1+ 0.28*x^2) (|x|<=1)

(To get degrees, multiply the right hand side by 180/pi).
For example, let's compute atan(0.15). By your formula,
20*0.15 = 3. Hence, from the table,

atan(0,15) = 8 degrees.

By this formula,

atan(0.15) = 0.15/(1+0.28*(0.15)^2)
= 0.15/1.0063 =
= 0.149060916

or, in degrees,

atan(0.15) = 0.149060916 *(180/3.14159265)
= 8.54 degrees

With a calculator we find that

atan(0.15) = 8.53076561...degrees

Hence the error is less than 0.01 degrees!

For values of x > 1 use the formula

atan(x) = pi/2 - x/(x^2 + 0.28) (|x| >=1),

or, in degrees,

atan(x) = 90 - (x/(x^2 + 0.28))*(180/pi) |x|>=1.

For example, let's compute atan(20). We have

atan(20) = 90 - (20/400.28)*180/3.14159265 =
= 90 - 2.862785029
= 87.13721497 degrees

Your calculator gives

atan(20) = 87.13759477 degrees

Hence the error is less than 0.0004 degrees!

To summarize: you need only to have a formula for |x| <= 1.
For values |x| >= 1 you use the identity

atan(x) = pi/2 - atan(1/x).

>From Approximation Theory (C. Hastings, Jr., Note 143,
Math. Tables Aids. Comp 6, 68 (1953)) we know that

atan(x) = x/(1+0.28*x^2) + e(x)

where |e(x)| <= 0.005 for |x| <=1. (If you are
interested, I can send you info how to get even more
precise approximations of the atan(x) function)

I am sure that you will see immediately the advantages
of using this method from the programming point of view.

See StuChat 37 for more on Atan

(My sincere thanks to Ranko)


 

Genetic Algorithms
These are fun. The basic premise is as follows:
1. Generate a load of genes. A gene is simply a list of instructions.
2. 'Run' them.
3. Score them according to how well they performed the "task".
4. Pick the top 10% of genes and discard the remaining 90%.
5. Fill the pool with duplications of the top 10%.
6. Mate the genes - for example for each pair of genes, swap one instruction from each.
7. Go to step 2.

A simple task might be to produce the number 10. The genes consist of instructions
that are executed either in a virtual machine or on a real processor (assuming suitable
safety mechanisms are in place to catch crash conditions). After executing they can be
scored according to their result and length - a shorter gene scores higher than a longer
gene and one that gets closer to 10 scores more than one that doesn't.

Here's a very simple (but complete) example in 68K assm I wrote a long time ago:

*****************************************************************
*Simple test of Genetic Algorithm
*
*
*What:
*The goal of a gene in this case it to produce the result 10.
*a perfect gene will have an instruction length of 1 and produce the result 10 when run
*this is impossible.
*Each gene can contain instructions as follows:
*0 - clear accumulator
*1 - inc accumulator
*2 - dec accumulator
*3 - shift left accumulator
*4 - shift right accumulator
*5 - nop
*
*The accumulator is always cleared before processing the genes code.

*A gene is preceded by it's instruction length, score, result and selected. Each instruction, each gene "entry", is
*a byte, thus a typical gene might
*look like 3,0,3,0,1,1,1
*length----^
*score-------^
*result--------^
*selected--------^ *Set to 1 by culler if deemed fit for next generation (top 10%)
*
*Process
*1. Initialise n genes with random garbage (but the length byte has to be correct
**maximum length allowed must be set!
*2 loop
*2a Process each gene, giving it a score depending on how close its result is to 10,
*and how long it's instruction length is.
*
*Thus the lower the score, the better the gene (code contained within it). The highest
*score is zero (impossible),
*a good one might be
* inc accum *1
* shift left *2
* shift left *4
* shift left *8
* inc accum *9
* inc accum *10
*Six instructions which would score ((10-10)+1)*6=6
**As it turns out this is what it produced
** inc accum *1
** shift left *2
** inc *3
** inc *4
** inc *5
** shift left *10!!!! (Quite frankly that's amazing, and it only took 4 iterations!)

*Show the score

*2b pick the top 10% of genes and delete the remaining 90%

*2c. duplicate these to fill the remainder of the 90%

*2d. mate these genes with each other as follows.
* 1. Pick two random genes. Swap an instruction at random between them.
**(To prevent stagnation, there's a 1 in thousand chance we'll just stick a dec into
**the first instruction of one of them)

 

*3 do it all again until we decide to stop

 

number_of_genes: equ 300 *lets have this many
max_gene_length: equ 64 *max. number of bytes in each gene
max_instruction_length: equ max_gene_length-4
min_instruction_length: equ 3
max_opcode: equ 5 *instructions between 0 and 5 are valid

GA_main: entry
bsr init_mac
bsr ga_init

ga_loop:
bsr score_genes *run their code in VM and calc score
**find top 10 %
bsr cull
debug *examine genes here. 5 steps produces optimal program!!!!!!!!!!!!!!!
lea genes(pc),a3 *quick vis check
move.l the_highest_score(a5),d0
bsr mate *mate each gene with it's neighbour
bra ga_loop
rts_ "GA_main"

**this routine mates each gene with it's neighbour
**this involves:
*1. pick two genes at random
*2. swap one random instruction between them.
*
mate:
move.l #number_of_genes/2,d5 *counter
**loop
do_next_mate:
**pick next two genes at random
movem.l d1-d7/a0-a6,-(sp)
move.l #number_of_genes-1,d0
bsr get_rand
movem.l (sp)+,d1-d7/a0-a6
mulu.l #64,d0
lea genes(pc),a3
add.l d0,a3

movem.l d1-d7/a0-a6,-(sp)
move.l #number_of_genes-1,d0
bsr get_rand
movem.l (sp)+,d1-d7/a0-a6
mulu.l #64,d0
lea genes(pc),a4
add.l d0,a4
**We just swap one byte at a random position determined by the shortest string
**find shortest
clr.l d0
clr.l d1
move.b (a3),d0
move.b (a4),d1
lea 4(a3),a3
lea 4(a4),a4 *to code
cmp.l d1,d0
ble.s first_is_shortest
move.l d1,d0
first_is_shortest:
movem.l d1-d7/a0-a6,-(sp)
bsr get_rand *<=d0
movem.l (sp)+,d1-d7/a0-a6
**just a tadge of randomness
movem.l d0/d2-d7/a0-a6,-(sp)
move.l #1000,d0
bsr get_rand
move.l d0,d1
movem.l (sp)+,d0/d2-d7/a0-a6
**if d1=500 then we make inst 1 in one gene a dec instruction
cmpi.w #500,d1
bne.s not_mutation
add.l d0,a3
add.l d0,a4
move.b #2,(a3) *dec instruction
bra.s done_mute
not_mutation:
**swap bytes
add.l d0,a3
add.l d0,a4
move.b (a3),d1
move.b (a4),(a3)
move.b d1,(a4) *swap just one instruction
done_mute:
*have a cigarette and do next
subq.l #1,d5
bne do_next_mate
rts_ "mate"

**walks the genes.
*first finds highest scorers
cull:
bsr mark_as_winner

lea genes(pc),a3 *the winners should now be marked
bsr move_winners_to_start *move em to the start of the table

**duplicate this 10% to other 90%
lea genes(pc),a3 *dest
lea genes(pc),a4 *copy from here
add.l #max_gene_length*(number_of_genes/10),a4 *to here
move.l #8,d0 *9 times..

replicate:
**copy (number_of_genes/10)*(max_gene_length/4) to a4 from a3
push a3 *save src
move.l #(number_of_genes/10)*(max_gene_length/4),d1
copy_10:
move.l (a3)+,(a4)+
subq.l #1,d1
bne.s copy_10
pop a3 *restore src

dbra d0,replicate
rts_ "cull"

move_winners_to_start:
lea genes(pc),a3 *dest
lea genes(pc),a4 *copy from here
move.l #number_of_genes-1,d7 *cntr
mwts_loop:
tst.b 3(a4)
beq.s not_a_winner

**Copy max_gene_length bytes from a4 to a3
moveq #max_gene_length/4-1,d0
move.l a4,a2 *save src ptr
**Clear winner byte
clr.b 3(a4)
cts:
move.l (A4)+,(a3)+
dbra d0,cts
move.l a2,a4 *restore src ptr
not_a_winner:
lea 64(a4),a4
dbra d7,mwts_loop
rts_ "move_winners_to_start"



**This routine runs through genes looking for winniers (top 10% scorers)
mark_as_winner:
**we need to find the highest 10% of scorers, or 1/10 of number_of_genes
clr.l d6 *score index
move.l #number_of_genes/10,d5 *counter
move.l #0xff,the_highest_score(a5)
*work our way through looking for genes with scores that match score index until
*we have marked number_of_genes/10
mark_loop:
clr.l d7 *gene index
ml_at_score:
move.l d7,d0
lea genes(pc),a3
mulu.l #max_gene_length,d0
add.l d0,a3 *to gene
cmp.b 1(a3),d6 *cmp score to score
bne.s no_match
clr.l d2
move.b 1(a3),d2 *the score
cmp.l the_highest_score(a5),d2
bge.s not_highest
move.l d2,the_highest_score(a5)
not_highest:
**mark gene as selected
move.b #1,3(a3)
subq.l #1,d5 *dec needed counter
no_match:
addq.l #1,d7
cmp.l #number_of_genes,d7
bne.s ml_at_score *carry on looking for results that match acceptable
inc.l d6 *dec acceptable score
tst.l d5
bgt.s mark_loop
rts_ "mark_as_winner"


score_genes:
clr.l current_gene(a5) *the one were working on
move.l current_gene(a5),d7
**
do_score:
move.l d7,d0
bsr run_gene *execute it's code to get result
addq.l #1,d7
cmp.l #number_of_genes,d7
bne.s do_score
rts_ "score_genes"

run_gene:
lea genes(pc),a3
mulu.l #max_gene_length,d0
add.l d0,a3 *to gene
move.l a3,a4 *save gene start for scoring
clr.l d4 *accum
clr.l d3
move.b (a3),d3 *instruction counter
move.l d3,d6 *save length
subq.l #1,d3
addq.l #4,a3 *to first instruction
process:
clr.l d0
move.b (a3)+,d0
bsr execute
dbra d3,process
**result in d4
andi.l #0x7fff,d4 *keep is reasonable
*calc score
*score is 0 if result =10 else score=>0>255
*as follows
*if result=10 then score=0+length
*else
*find diff between 10 and result and add length
**if result -ve, make +ve
clr.l d0 *Score
move.b d4,2(a4) *Save result
cmpi.l #10,d4
beq.s is_10
*get diff between result and 10
bge.s is_pos
moveq #10,d3
sub.l d4,d3
move.l d3,d0
bge.s is_10 *>0
neg.l d0
bra.s is_10
is_pos:
sub.l #10,d4
move.l d4,d0 *diff
is_10: add.l d6,d0 *plus length
cmpi.l #0xff,d0
ble.s score_ceil_ok
moveq #-1,d0
score_ceil_ok:
move.b d0,1(a4) *save score
rts_ "run_gene"
execute: *inst in d0 as int32
*0 - clear accumulator
*1 - inc accumulator
*2 - dec accumulator
*3 - shift left accumulator
*4 - shift right accumulator
*5 - nop
tst.l d0
beq.s clear
cmpi.l #1,d0
beq.s inc
cmpi.l #2,d0
beq.s dec
cmpi.l #3,d0
beq.s left
cmpi.l #4,d0
beq.s right
cmpi.l #5,d0
beq.s nop
debug *illegal instruction!!!!!!!
clear:
clr.l d4
bra.s end_proc
inc: addq.l #1,d4
bra.s end_proc
dec: subq.l #1,d4
bra.s end_proc
left: lsl.l #1,d4
bra.s end_proc
right: lsr.l #1,d4
bra.s end_proc
nop:
end_proc:
rts_ "execute"

***********************************************************************************

ga_init:
clr.l current_gene(a5) *the one were working on
move.l current_gene(a5),d7

do_init:
move.l d7,d0
bsr init_gene
addq.l #1,d7
cmp.l #number_of_genes,d7
bne.s do_init
rts_ "GA_init"


init_gene:
lea genes(pc),a3
mulu.l #max_gene_length,d0
add.l d0,a3 *to gene
**get number of instructions for this gene
move.l #max_instruction_length,d0
bsr get_rand2 *number between 2 and d0
move.b d0,(a3)+ *length
clr.b (a3)+ *Score
clr.b (a3)+ *result
clr.b (a3)+ *pad byte
subq.l #1,d0
bge ok
debug
ok:
cmpi.l #max_gene_length-1,d0
ble.s ok1
debug
ok1:
move.l d0,d6 *loop
set_inst:
move.l #max_opcode,d0
bsr get_rand
move.b d0,(a3)+
dbra d6,set_inst
rts_ "init_gene"

get_rand2:
push d0
OSRandom d0
pop d1 *input max value
andi.l #$ffff,d0
move.l d1,d3 *range
sub.l #min_instruction_length,d3 *now
mulu.l d0,d3 *(rndm*range)
lsr.l #8,d3
lsr.l #8,d3 *div 65535
move.l d3,d0
addq.l #min_instruction_length,d0
rts_ "get_rand2"

get_rand:
push d0
OSRandom d0
pop d1 *input max value
andi.l #$ffff,d0
move.l d1,d3 *range
mulu.l d0,d3 *(rndm*range)
lsr.l #8,d3
lsr.l #8,d3 *div 65535
move.l d3,d0

rts_ "get_rand"




******data
genes: ds.b number_of_genes*max_gene_length ;Space for genes

**GA_Globals
current_gene: globoff.l 1
the_highest_score: globoff.l 1
the_highest_addr: globoff.l 1

extern init_mac


Unfortunately there is no interface so you have to examine the genes with MacsBug after
each iteration but hopefully it's food for thought. There are seven more of these
examples so maybe I'll post the next in the next StuChat should anybody be interested.
I have posted a Fantasm 6 project for this example available on FTP as a Stuffit(tm) 5 archive as GA1.sit.hqx.

Till the next time,
Code on!

Stu.

Make a comment about this article

©Lightsoft 99