In this "issue":
In future we plan to release updates every two months or so. As usual, minor updates will be free, but the upgrade to V5 will be charged for. V5 will probably include any new G3 and G4 instructions as necessary, as and when we can get details of these chips.
Development - 410 bugs
There is one bug in Eddie we know of, which surfaces when
auto-inserting text - for example deleting and pasting a highlighted
block of text, or commenting or uncommenting a block of code.
We have seen, and had reports of, Eddie freezing sometimes. As soon
as we get this sorted we will release an updater. There appear
to be no bugs in either the Fantasm or PowerFantasm assemblers or
linkers, apart from a minor fault in one of the error strings in
PowerFantasm, which reports the first operand as being illegal
when in fact it is not.
Within the next month or two, we will release updates for both Fantasm and PowerFantasm. The updates will fix the above problems and add new features as per the preliminary outline given in the news section.
The dropping of Copland development affects us, as we had already redesigned a lot of our systems' communications around the (scant) microkernel information that has been published. As some of you know, when we get fast, standard inter-application communications we will publish the interface standards that Fantasm, PowerFantasm and Eddie use. Unfortunately the delay of the microkernel OS delays the release of this information - we can only apologise, but would like to add that it isn't any of our doing.
StuChat - PPC assm.
Yay, phew. Man, busy or what?
Yes. Somehow I now seem to be writing assemblers, editors, compilers
and now games - how come I'm involved in game writing? God only
knows. However, they do say that to appreciate your customers'
needs, you have to have been there. So now I am involved in writing
a game with PF. I can now truly better appreciate the email we
get on customer support. Why do you want alignment on 32 byte
boundaries? Oh, right I see!
It's been two years since I last looked
at any game code, and that was in 68k assembly language. The difference
is quite staggering. Simply having so many condition code fields
is brilliant. The most startling feature of PPC is the alignment.
For example, if I access a word-sized (32 bit) global variable
which crosses a 64 bit boundary, the chip has to make two accesses
- which really cripples it. The solution is simply to first align
the BSS section on an octal (8 byte) boundary and then use plenty of
rs_align directives - if necessary after every RS. This really adds
performance and stabilises any timing you are trying to make!
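
To make the "crosses a 64 bit boundary" condition concrete, here is a minimal sketch in C (not PF assembly). The function name crosses_8byte_boundary is made up for illustration; it only does the address arithmetic and makes no claim about how any particular chip handles the access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `size` bytes starting at `addr` touches two
   different 8-byte (64 bit) blocks, 0 if it stays inside one block.
   Hypothetical helper, for illustration only. */
int crosses_8byte_boundary(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= 8);
    /* Compare the block number of the first and the last byte touched. */
    return (addr >> 3) != ((addr + size - 1) >> 3);
}
```

So a 32 bit variable at offset 4 sits inside one block, while the same variable at offset 6 straddles two.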
How to align the BSS section? Simple. At the end of your global
RS directive block, add on 8 bytes by rs.l 1. Then all we need
to do is align the start of the BSS during our program initialisation,
viz:
**lets make bss octal aligned!
    li      r29,8
    mr      r28,r30         *copy bss pointer
    andi.   r28,r28,%111    *mask lower 3 bits <8
    sub     r29,r29,r28     *sub from 8
    add     r30,r30,r29     *and make aligned
This will give much better performance than just hoping the
BSS (and hence your global vars) are octal aligned. More often
than not it will simply be word aligned and would possibly be
a problem if writing in C.
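
For anyone who prefers to see the pointer arithmetic in C, the following is a rough sketch of the same idea. Both function names are invented for this example. The first mirrors the arithmetic of the assembly (add 8 minus the low three bits), the second is the usual round-up idiom, shown only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the assembly: add (8 - low three bits).
   An address that is already aligned is moved up by a full 8 bytes,
   which the 8 spare bytes at the end of the RS block allow for. */
uintptr_t align8_as_listing(uintptr_t base)
{
    uintptr_t result = base + (8 - (base & 7));
    assert((result & 7) == 0);
    return result;
}

/* Common alternative: an already aligned address is left unchanged. */
uintptr_t align8_round_up(uintptr_t base)
{
    uintptr_t result = (base + 7) & ~(uintptr_t)7;
    assert((result & 7) == 0);
    return result;
}
```

Either way the adjusted pointer never moves by more than 8 bytes, so the extra space reserved at the end of the block is enough.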
Another tip I can give you is to pre-calculate
an array of random values in advance of usage. We use two routines.
The first, random_init, fills an array with random values as halfs.
The second, get_random, gets the next value from the array and
increments the index. get_random takes a single parameter, which
is the upper limit for the value you want returned. We have optimised
get_random for 512: rather than having to divide and multiply to get
the MOD of the value, it simply ands with 0x1ff and hence is much
faster:
random_init:
    sub_in
    la      r20,random_array(`bss)
    li      r21,255
    subi    r20,r20,2       *for pre-increment addr mode
    mtctr   r21             *loop count
surloop:    Xcall   Random
    sthu    r3,2(r20)
    bdnz    surloop
    li      r3,0            *index into this array to get a value - not strictly necessary.
    stb     r3,random_index(`bss)
    sub_out
Note the sub_in and out macros - if
this routine did not call any OS functions or any other subroutine,
then we would not need sub_in and out, and we could simply return
with "blr".
**get random returns a random number >0 and less than r3. Optimised for r3=512
**Leaf routine
get_random:
    lbz     r5,random_index(`bss)
    mr      r28,r3          *save input param
    cmpwi   r28,512
    la      r4,random_array(`bss)
    lhzx    r3,r4,r5        *get random value into r3 from r4+r5
    addi    r5,r5,1         *Note, the random array should be 256 halfs
    stb     r5,random_index(`bss)
    beq     quick_rand      *we can and with 512
**now mod result with max wanted value
    divw    r4,r3,r28       *rand/mod
    mullw   r4,r4,r28       *times mod
    subf    r3,r4,r3        *rand-result=mod
    blr
quick_rand: andi    r3,r3,0x1ff
    blr
In this case, as it calls nothing, we don't use sub_in and out.
Also note the nice large gap between the "cmpwi" and
the "beq" instructions - this really is optimal for
conditional branches. The instructions between the compare and
the branch are effectively free instructions. Note also that PF
lets one get away with no dot on the "andi" instruction!
512 was chosen as a good optimisation, as this fits in as a nice
screen width when drawing (as you probably need margins of at
least 32 pixels either side of the visible area, which takes the
screen width to 512+64).
Those of you awake may well be wondering why the index - random_index
- isn't tested for its maximum value and then reset? Well, as
it's a byte sized item, it will auto wrap to zero when being incremented
from 0xff - hence the random array should be defined as 256 halfs
and all will be well. Note (Again!) that this routine cheats a
lot by having a byte sized index into an array of halfs! I'm not
going to even think about a high level equivalent:-)
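
For the curious, here is roughly what a high level version might look like. This is a loose C sketch and not a translation of the assembly: it uses the standard library rand() as a stand-in for the OS Random call, it fills all 256 entries, and it indexes the array by element rather than by byte offset. The names are borrowed from the assembly for readability only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RANDOM_COUNT 256            /* matches the range of a byte index */

static uint16_t random_array[RANDOM_COUNT];
static uint8_t  random_index;       /* wraps from 255 back to 0 by itself */

/* Fill the table once, up front. rand() is an assumption here. */
void random_init(void)
{
    for (int i = 0; i < RANDOM_COUNT; i++)
        random_array[i] = (uint16_t)(rand() & 0xffff);
    random_index = 0;
}

/* Return the next table value reduced to the range 0 .. max-1. */
unsigned get_random(unsigned max)
{
    assert(max > 0);
    unsigned value = random_array[random_index++];
    if (max == 512)
        return value & 0x1ff;       /* fast path: mask instead of divide */
    return value % max;             /* general path */
}
```

Because the index is a byte, the sequence simply repeats after 256 calls, with no compare or reset needed.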
I had to have a look at Game Sprockets(tm) as part of my research. So after being forced to hand over my name and email address to Apple, I got hold of the not incredibly small downloads and had a quick play. I have to be careful what I say here, but I am not impressed with the speed of DrawSprocket(tm) at all. On my 75MHz 601 it achieved 21 MB/sec on the screen splat when not locked to the VBL, and lower when it was.
This is not good, as our current splat is achieving close to 50MB/sec
and other people are achieving higher rates. When scrolling we
can achieve higher rates again. Even CopyBits on my machine can
crack 30 MB/sec! I would not use DrawSprocket. SoundSprocket is
OK, but has way too much parameter overhead and is rather large.
If registered users want a fast splat, contact me and I'll forward
some code.
We'll be testing the game's core routines shortly from this site.
If anybody wants to test, please let me know - we desperately
need testers with 604 and 603 based machines. All testers who
send in a report will receive a free copy of the completed game.
The game comprises both static and scrolling sections along with
mucho sprites, stars and lots of sfx explosions along with stereo
sound and as much simulation as we can cram in. It is very fast
paced.
Now, I am reliably informed that a war is about to start to see who can post the worst photo - so, I'll get my shot in first and show you this. This is a photo of Rob, sans shades. What's unusual about this shot is that, judging by the sky, it's obviously past midday and Rob is still awake! :-)
Finally, I must thank Ajay Nath who contacted me this month
and pointed me towards "The Compiler Writers Guide"
on I.B.M.'s WWW site. This book contains many assembly language
code examples and is well worth a look for anybody writing native
assembly language. I strongly suggest you give it a flick through.
Ajay tells me you can buy the book for one dollar from I.B.M.
Go to The Compiler Writers Guide
Code on!
Stu.