16th September 1996

 





What's Happening?

Quite a lot :-) We've had a heck of a busy month. We managed to get both Fantasm and PowerFantasm V4.10 out pretty much on schedule and with very few bugs (see Development below for details). PowerFantasm is now being hammered by many people, us included. Many native games are being written, which is rapidly highlighting features that would be nice to have in future. An updater to 4.11 is in the pipeline which will fix the two problems reported below and add new features. We expect the updater to be available within the next month or two. We are looking into being able to link with Apple's Game Sprockets(tm), but at present there are no plans to do so, as very few people seem to be interested in Sprockets. If you are, and want to link to these libraries, now is a good time to let us know.

In future we plan to release updates every two months or so. As usual, minor updates will be free, but the upgrade to V5 will be charged for. V5 will probably include any new G3 and G4 instructions, as and when we can get details of these chips.

Development - 4.10 bugs

There is one bug in Eddie we know of, which surfaces when auto inserting text - for example deleting and pasting a highlighted block of text, or commenting or uncommenting out a block of code. We have seen and had reports of Eddie freezing sometimes. As soon as we get this sorted we will release an updater. There appear to be no bugs in either Fantasm or PowerFantasm assemblers or linkers. There is a minor fault in one of the error strings in PowerFantasm which will report the first operand as being illegal, when in fact it is not.

Within the next month or two, we will release updates for both Fantasm and PowerFantasm. The update will fix the above problems and add new features as per the preliminary outline given in the news section.

The dropping off of Copland development affects us, as we had already redesigned a lot of our systems communications around the (scant) microkernel information that has been published. As some of you know, when we get fast standard inter-application communications we will publish the interface standards that Fantasm, PowerFantasm and Eddie use. Unfortunately the delay of the microkernel OS delays the release of this information - we can only apologise, but would like to add that it isn't any of our doing.

StuChat - PPC assm.

Yay, phew. Man, busy or what?
Yes. Somehow I now seem to be writing assemblers, editors, compilers and now games - how come I'm involved in game writing? God only knows. However, they do say that to appreciate your customers' needs, you have to have been there. So now I am involved in writing a game with PF. I can now truly better appreciate the email we get on customer support. Why do you want alignment on 32 byte boundaries? Oh, right I see!
It's been two years since I last looked at any game code, and that was in 68k assembly language. The difference is quite staggering. Simply having so many condition code fields is brilliant. The most startling feature of PPC is the alignment. For example, if I access a word (32 bit) sized global variable which crosses a 64 bit boundary, the chip has to make two accesses - which really cripples it. The solution is simply to first align the BSS section on an octal (8 byte) boundary and then use plenty of rs_align directives - if necessary after every RS. This really adds performance and stabilises any timing you are trying to make!
How to align the BSS section? Simple. At the end of your global RS directive block, add on 8 bytes by rs.l 1. Then all we need to do is align the start of the BSS during our program initialisation, viz:

**let's make the bss octal (8 byte) aligned!
	li r29,8
	mr r28,r30 			*copy bss pointer 
	andi. r28,r28,%111 	*mask lower 3 bits <8
	sub r29,r29,r28 		*sub from 8
	add r30,r30,r29 		*and make aligned

This will give much better performance than just hoping the BSS (and hence your global vars) are octal aligned. More often than not it will simply be word aligned and would possibly be a problem if writing in C.
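For anyone who would rather see the arithmetic spelled out, here is a rough C sketch of the same idea. It is only an illustration, not part of PF: the function names are made up for this example. The first function mirrors the listing, which always moves the pointer on, so a base that is already aligned advances by a full 8 bytes (one reason to reserve 8 bytes of padding). The second is an assumed alternative that rounds up only when needed, and the third checks whether an access straddles an 8 byte boundary.

```c
#include <stdint.h>

/* Mirrors the listing: base + (8 - (base & 7)).
   An already aligned base is advanced by a full 8 bytes. */
static uintptr_t align8_listing(uintptr_t base)
{
    uintptr_t low = base & 7u;          /* andi. r28,r28,%111 */
    return base + (8u - low);           /* sub, then add      */
}

/* Hypothetical alternative: round up only if not already aligned. */
static uintptr_t align8_round_up(uintptr_t base)
{
    return (base + 7u) & ~(uintptr_t)7u;
}

/* Non-zero if an access of `size` bytes at `addr` crosses an
   8 byte boundary and so needs two accesses. */
static int straddles8(uintptr_t addr, unsigned size)
{
    return (addr & 7u) + size > 8u;
}
```

With a base of 0x1004 both versions give 0x1008. With a base of 0x1008 the listing's version gives 0x1010, while the round-up version leaves it alone.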
Another tip I can give you is to pre-calculate an array of random values in advance of usage. We use two routines. The first, random_init, fills an array with random values as halfs. The second, get_random, gets the next value from the array and increments the index. get_random takes a single parameter, which is the upper limit for the value you want returned. We have optimised get_random for 512: rather than having to divide and multiply to get the MOD of the value, it simply ands with 0x1ff and hence is much faster:

random_init: 
	sub_in 
	la r20,random_array(`bss)
	li r21,255 
	subi r20,r20,2 	*for pre-increment addr mode 
	mtctr r21 			*loop count 
surloop: Xcall Random 
	sthu r3,2(r20) 
	bdnz surloop 
	li r3,0 			*index into this array to get a value - not strictly necessary. 
	stb r3,random_index(`bss) 
	sub_out


Note the sub_in and out macros - if this routine did not call any OS functions or any other subroutine, then we would not need sub_in and out, and we could simply return with "blr".

**get random returns a random number >=0 and less than r3. Optimised for r3=512 
**Leaf routine 
get_random:
	lbz r5,random_index(`bss) 
	mr r28,r3 				*save input param 
	cmpwi r28,512 
	la r4,random_array(`bss) 
	lhzx r3,r4,r5 		*get random value into r3 from r4+r5 
	addi r5,r5,1 			*Note, the random array should be 256 halfs 
	stb r5,random_index(`bss) 
	beq quick_rand 		*we can and with 0x1ff (511) 
**now mod result with max wanted value 
	divw r4,r3,r28 		*rand/mod 
	mullw r4,r4,r28 		*times mod 
	subf r3,r4,r3 		*rand-result=mod 
	blr 
quick_rand: andi r3,r3,0x1ff 
	blr


In this case, as it calls nothing, we don't use sub_in and out. Also note the nice large gap between the "cmpwi" and the "beq" instructions - this really is optimal for conditional branches. The instructions between the compare and the branch are effectively free instructions. Note also that PF lets one get away with no dot on the "andi" instruction!
512 was chosen as a good optimisation, as this fits in as a nice screen width when drawing (as you probably need margins of at least 32 pixels either side of the visible area, which takes the screen width to 512+64).
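The reduction step in get_random can be sketched in C as below. This is just an illustration of the arithmetic, with a made-up function name, and it assumes the value has already been fetched as an unsigned half (as lhzx does). The divide, multiply and subtract sequence is the long way of taking a remainder, and for a limit of 512 the mask gives the same answer without the divide.

```c
#include <stdint.h>

/* Reduce `value` to the range 0..max-1. `max` must be non-zero. */
static uint32_t reduce(uint32_t value, uint32_t max)
{
    if (max == 512u)
        return value & 0x1ffu;      /* quick_rand: mask instead of divide */

    uint32_t q = value / max;       /* divw:  rand / max              */
    uint32_t p = q * max;           /* mullw: times max               */
    return value - p;               /* subf:  rand - p, i.e. rand MOD max */
}
```

For example, reduce(1000, 512) and reduce(1000, 300) give 488 and 100 respectively. The mask shortcut only works because 512 is a power of two.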


Those of you awake may well be wondering why the index - random_index - isn't tested for its maximum value and then reset. Well, as it's a byte sized item, it will auto wrap to zero when being incremented from 0xff - hence the random array should be defined as 256 halfs and all will be well. Note (again!) that this routine cheats a lot by having a byte sized index into an array of halfs! I'm not going to even think about a high level equivalent :-)
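That said, the wrap-around trick on its own is easy enough to show in C. The sketch below is not a translation of the listing: it uses the standard rand() in place of the Toolbox Random call, the names are invented for the example, and it indexes whole halfs rather than using the byte index as a raw offset the way the assembly does. It only shows how a byte sized index walks a 256 entry table with no range test.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 256              /* must match the range of a byte index */

static uint16_t random_table[TABLE_SIZE];
static uint8_t  random_index;       /* wraps from 255 back to 0 by itself */

/* Fill the table once, up front. rand() stands in for Random here. */
static void random_init(unsigned seed)
{
    srand(seed);
    for (int i = 0; i < TABLE_SIZE; i++)
        random_table[i] = (uint16_t)(rand() & 0xffff);
    random_index = 0;
}

/* Fetch the next pre-calculated value; no compare or reset needed. */
static uint16_t next_random(void)
{
    return random_table[random_index++];
}
```

After 256 calls the index is back at zero and the sequence repeats, which is the price paid for never having to test the index.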

I had to have a look at Game Sprockets(tm) as part of my research. So after being forced to hand over my name and email address to Apple, I got hold of the not incredibly small downloads and had a quick play. I have to be careful what I say here, but I am not impressed with the speed of DrawSprocket(tm) at all. On my 75MHz 601 it achieved 21 MB/sec on the screen splat when not locked to the VBL, and lower when it was.


This is not good, as our current splat is achieving close to 50 MB/sec and other people are achieving higher rates. When scrolling we can achieve higher rates again. Even CopyBits on my machine can crack 30 MB/sec! I would not use DrawSprocket. SoundSprocket is OK, but has way too much parameter overhead and is rather large. If registered users want a fast splat, contact me and I'll forward some code.


We'll be testing the game's core routines shortly from this site. If anybody wants to test, please let me know - we desperately need testers with 604 and 603 based machines. All testers who send in a report will receive a free copy of the completed game. The game comprises both static and scrolling sections along with mucho sprites, stars and lots of sfx explosions along with stereo sound and as much simulation as we can cram in. It is very fast paced.


 

Now, I am reliably informed that a war is about to start to see who can post the worst photo - so, I'll get my shot in first and show you this. This is a photo of Rob, sans shades. What's unusual about this shot, is that judging by the sky, it's obviously past midday and Rob is still awake! :-)

Picture of Rob complete with important nose.


Finally, I must thank Ajay Nath who contacted me this month and pointed me towards "The Compiler Writer's Guide" on I.B.M.'s WWW site. This book contains many assembly language code examples and is well worth a look for anybody writing native assembly language. I strongly suggest you give it a flick through. Ajay tells me you can buy the book for one dollar from I.B.M.

Code on!

 

Stu.

©Lightsoft 1996. Unauthorised reproduction prohibited.




