13th July 1996.





What's Happening?

Well, 406 is putting up a brave fight - every time we think it's OK, another little bug raises its head and says "find me!", so testing is ongoing. There really isn't much more to say.

 

Development

Well, this week I've spent quite a bit of time talking to various PowerMac developers. The level of expertise is slowly rising as people work out their own little systems of programming without a stack, and how to get maximum speed out of the chip. It's as if the clock has been turned back 10 years, to when people were getting their heads around the likes of 6502s and Z80s. It just proves that no matter how powerful the machine, it's never powerful enough! I think it's going to be scary when we see the first machine code PPC games!

One of the conversations I had centered around hardware acceleration, and how you should make a CopyBits option available in a game, as on some machines CopyBits is hardware accelerated and will easily outperform the most devious of direct screen write algorithms. At the same time, not all machines have accelerated CopyBits, so you have to write to the lowest common denominator, which does mean devious graphics routines. Where this is leading is that we had planned to include various CLUT (Color Look Up Table) routines in the PPC video library, but then decided it was probably a waste of time, because whatever we wrote would not suit your particular situation.

For example, if we wrote a general purpose CLUT rotate, it would be too slow: you may only want to rotate colours 32-64, and a custom rotate for just that range will be far quicker. The last thing you want is a slow routine trying to work out what it is you want. Ok, so it makes coffee for you, but it doesn't do the job as quickly as it could.

I'll show you some example PPC CLUT code in the Interlude (a fully working small demoette).

The other thing of concern to quite a few people at the moment is the amount of time it can take to change the CLUT. We've talked to quite a few people about this, and the general opinion is that it can take up to one whole video frame for the SetEntries call to return to you - Eeek! This is borne out in our own code, where we can see the speed of the rotates varying quite considerably. Why? Well, it seems that most video hardware waits for the VBL (Vertical BLanking) period before actually changing the CLUT. This is a perfectly natural thing for video hardware to do: it may simply not have the time to transfer 2k of data whilst scanning the monitor at a high refresh rate, whereas it does nothing whilst the electron beam is returning to the top left of the monitor tube, and so can happily update its internal hardware palette then. The other reason is that you get no visual interference on screen when the CLUT is changed during the VBL, because the electron beam is blanked and so draws nothing.
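To put some numbers on that, here is a minimal back-of-envelope sketch in Python. It assumes the worst case described above (every SetEntries call waits for the next VBL) and that each of the 256 entries is passed as an 8-byte ColorSpec, which is where the 2k figure comes from. The refresh rates are example values, not measurements, and the function names are made up for illustration.

```python
# Back-of-envelope sketch, assuming every SetEntries call blocks until the
# next VBL (the worst case described above). Refresh rates are examples.

CLUT_ENTRIES = 256
BYTES_PER_ENTRY = 8  # ColorSpec: value + red + green + blue, 16 bits each


def clut_bytes(entries: int = CLUT_ENTRIES) -> int:
    """Size of a full colour table: 256 entries x 8 bytes = 2048 bytes."""
    return entries * BYTES_PER_ENTRY


def worst_case_wait_ms(refresh_hz: float) -> float:
    """Longest a single SetEntries call might block: one whole frame."""
    return 1000.0 / refresh_hz


def worst_case_fade_seconds(steps: int, refresh_hz: float) -> float:
    """If each fade step calls SetEntries once, each step can cost a frame."""
    return steps / refresh_hz


if __name__ == "__main__":
    print(clut_bytes())                      # 2048
    print(round(worst_case_wait_ms(67), 1))  # 14.9
    print(worst_case_fade_seconds(300, 60))  # 5.0
```

In other words, a 300 step fade that calls SetEntries once per step can tie the machine up for around five seconds at 60 Hz, which is exactly why doing anything else at the same time becomes a problem.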

This poses a big problem if you need to rotate colours whilst other action is happening.

Why would you want to change colours whilst doing other things? Millions of reasons - fading backdrops or just selected areas of graphics, animation, general fast graphics frigs etc - it's always been a time-honoured way of making things look busier than they really are.

So we have a real problem, in that we can't do colour rotates and other things at the same time - or can we? As a matter of fact, I don't think I've ever seen it done on the Mac.

Thus we have our second puzzle, as posted on the puzzle page. I'll give you one of the ways we thought of doing it in a few weeks. In the meantime, if you have a way of doing this, send it in and we'll post it up.

 


One of the examples that never got sent out with PowerFantasm (it will be included in the 406 distribution) shows how to create a Transition Vector with PF. As you may know, PF does not create TVs in its TOC when you define a code pointer with toc_routine. It creates half a TV - the code pointer.

A Transition Vector is two words - the first is the code pointer, the second is the code pointer's TOC value.
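As a layout illustration only, the following Python sketch packs and unpacks the two words of a transition vector as big-endian 32-bit values: code pointer at offset 0, TOC value at offset 4, the same offsets the assembly below uses with stw r4,(r10) and stw rtoc,4(r10). The addresses in the example are hypothetical.

```python
import struct
from typing import Tuple

# Layout sketch of a PowerPC transition vector: two big-endian 32-bit words.
TV_FORMAT = ">II"  # offset 0: code pointer, offset 4: TOC value


def make_transition_vector(code_ptr: int, toc_value: int) -> bytes:
    """Pack the code pointer followed by the TOC value the code expects in RTOC."""
    return struct.pack(TV_FORMAT, code_ptr, toc_value)


def read_transition_vector(tv: bytes) -> Tuple[int, int]:
    """Split an 8-byte transition vector back into (code pointer, TOC value)."""
    code_ptr, toc_value = struct.unpack(TV_FORMAT, tv)
    return code_ptr, toc_value


if __name__ == "__main__":
    # Hypothetical addresses, purely for illustration.
    tv = make_transition_vector(0x00123450, 0x00200000)
    print(len(tv), tv.hex())  # 8 0012345000200000
```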

You generally need a TV when you want to create a Universal Procedure Pointer (UPP). UPPs are needed whenever you pass a callback procedure to the system, for example in an async sound driver or for controls.
As we now know what a transition vector is, we can create one quite easily, normally in the BSS section, by storing the relocated code pointer followed by the current RTOC value. The following code sums it up:

	includeh general_usage.def
	import CallUniversalProc *Note - not defined in headers
*****************************************************************************************
*Example of setting up a Universal Proc Pointer for PowerFantasm V4.xx
*©Lightsoft 1995.
*****************************************************************************************
**
**Theory of operation:
**Step 1
**Because PowerFantasm does not create transition vectors in its TOC (because it wastes
**valuable space), we create a transition vector for the routine to be called in the BSS.
**Step 2
**We then pass that TV to NewRoutineDescriptor, which will return a UPP which we can save
**away. Note that all your UPPs should be created as early as possible in your application's
**boot up stage, to prevent heap fragmentation.
*****************************************************************************************
**Note, for your edification, we've used the quicker startup and exit forms of code, rather
**than the longer (but more educational) macros (Startup and tidy_up).
	bss: reg r30
**********Start up
	ENTRY
Startup: mflr r0
	stw r0,8(sp) *Store link register on the stack
	stmw r10,-88(sp) *Save r10-r31 (22 registers x 4 bytes = 88 bytes)
	stwu sp,-152(sp) *Create our frame: -(64+88), 64 bytes linkage + 88 bytes saved regs
	lwz r30,(rtoc) *Load global data (bss) pointer (first entry in TOC)
	stw r2,20(sp) *Save RTOC
**********Call Macsbug to examine
; Xcall Debugger
**Step 1
**Because Fantasm does not create transition vectors in the TOC, we create our own in the
**BSS.
**Set up transition vector
	la r10,test_tv(`bss) *TV here in bss
	lwz r4,[t]my_test(rtoc) *Actual pointer to code
	stw r4,(r10) *into transition vector
	stw rtoc,4(r10) *followed by our toc
**Step 2
**Get a routine descriptor for our test upp.
	la r3,test_tv(`bss) *pointer to tv to code we want to execute
	lwz r4,my_test_info(rtoc) *info record for code
	li r5,1 *isa=ppc
	Xcall NewRoutineDescriptor
	cmpwi r3,0 *Check if call failed
	beq Error *NewRoutineDescriptor failed.
	stw r3,test_upp(`bss) *save the upp
**End of actual set up, we can now test...
***********The acid test - call my_test through the upp
	lwz r3,test_upp(`bss)
	lwz r4,my_test_info(rtoc)
	Xcall CallUniversalProc
***********Quit app
Error: *On error, quit.
Exit:
	lwz r0,64+8+88(sp) *Get saved link register
	mtlr r0
	addi sp,sp,64+88 *Reset stack pointer
	lmw r10,-88(sp) *Restore r10-r31
	blr
*****************************************************************************************
**This is our universal headered ppc test routine
my_test: toc_routine *so we get a relocated pointer to this code entry point
	mflr r0
	stw r0,8(sp) *Save link register in the caller's linkage area
	stwu sp,-64(sp) *Our own minimal frame, so Xcall has a linkage area to save RTOC in
	Xcall Debugger *so we can check we get here
	nop *Do what you have to do.
	nop
	nop
	addi sp,sp,64 *Drop our frame
	lwz r0,8(sp) *Restore link register (r29 is non-volatile, so don't borrow it for this)
	mtlr r0
	blr *End of routine
*****************************************************************************************
*DATA
*****************************************************************************************
*Initialised data
**procinfo data for my_test
my_test_info: dc.w 0 *procinfo 0=no parameters, see IM PowerPC System Software
	dc.b 0 *reserved
	dc.b 1 *ISA: ppc=1 (68k=0)
	dc.h 4 *Routine flags: 4=use native ISA (2=needs init, 1=offset)
*****************************************************************************************
*Uninitialised data into BSS section
test_tv: rs.w 2 *Transition vector for routine
test_upp: rs.w 1 *save our upp here


This little snippet should be assembled in the Stand Alone mode. It sets up a transition vector to the routine "my_test", then calls NewRoutineDescriptor to create a UPP, and finally, calls CallUniversalProc just as a test.


General

Well, that low end platform I was talking about last week finally has a name - "LERP" or Low End Reference Platform. Come on guys, are you serious or what? Hello?

I see now that the cloning operation is proceeding smoothly, what with the likes of the big Japanese and Taiwanese corporations getting a slice. Rumor has it that LG Electronics, or Goldstar as they are more commonly known, think they'll have a box out by September, and both I.B.M. (God bless 'em) and Motorola are sub-licensing thick and fast. This is good.
However, I did have a rather silly thought about all this. It is a well known fact that some people buy a Mac because it is a top of the range item: it exudes quality and looks expensive - to some, a status symbol. I do hope we don't end up in a situation where people scorn other Mac users simply because they have a "Goldstar Spesh" rather than a "real" Mac. It is possible, especially if the cheaper clones won't do what a real Mac does - for example, corners could be cut in the licensing of things like fonts and internal technologies. What if a clone has problems running QT? What if it won't take an ADB keyboard? I think I'm talking complete, unadulterated cr*p here, so...

On a different note, I received a Macsbug dump this week from 406, and couldn't help noticing the hard disk was called "MacintoshHD" - how can you have a hard disk called that? James - get it sorted - it's a Mac :-)
But, this started me looking into what people call their Macs and associated devices - a quick check round here revealed the following:
"Victoria", "Annabel", "Connie", "Susan", "LightWorld" - soon to be changed apparently, "The Beast" and of course, "Elsie". The server under the stairs (and out of view) has not got a name, because it is a PeeCee, and as such simply isn't worthy.
Why are we allowed to name Macs? So they can become ours. The same way you can change a great deal of the interface, to make the Mac yours, you can also give it a name. Is this silly?
Nah! It's great.

 


Interlude

Ok, so now onto what can cause problems for some Mac developers - fades. You want one of those cool screen fades for your latest and greatest?
We'll talk about fading 256 colour screens, as that's just about the de facto standard for games these days. We've uploaded a complete example project (20k) to the downloads section, but it is not a supported Lightsoft project; it is just example code that shows you how to do fades and rotates. It is native only, and will run on any PowerMac that has a 256 colour indexed mode video driver (that is, 8 bit or 256 colours). The code therein has not been edited or commented to any great extent - it is "as is".

Now then, how to fade? Well, most Macs support what's called an "indexed" video mode. In this mode the colour you draw with is translated to an index; the index addresses a table, and the table entry holds the best available match for that colour (as 16 bit RGB values). This means the Mac can supply a close approximation of the colour you want. Colours on the Mac are normally defined as red, green and blue values, each 16 bits, and you can set the foreground drawing colour with RGBForeColor and the background drawing colour with RGBBackColor. When the Mac is actually refreshing the screen, it looks at the pixel in VRAM, uses its value to index into the table, and sends the contents of that table entry to the monitor.
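The refresh side of that can be modelled in a few lines. This Python sketch is a toy model, not how any real video card is programmed: VRAM is a list of pixel indices, the CLUT is a list of 16 bit RGB triples, and scan_out (a made-up name) returns what would be sent to the monitor. Changing one table entry recolours every pixel that uses it, without touching VRAM.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]  # 16-bit red, green, blue


def scan_out(vram: List[int], clut: List[RGB]) -> List[RGB]:
    """Toy model of indexed-mode refresh: each pixel value in VRAM is an
    index, and the colour sent to the monitor is the CLUT entry at that index."""
    return [clut[pixel] for pixel in vram]


if __name__ == "__main__":
    vram = [0, 1, 1, 0]                 # four 8-bit pixels
    clut = [(0, 0, 0), (0xFFFF, 0, 0)]  # index 0 = black, index 1 = red
    print(scan_out(vram, clut))
    clut[1] = (0, 0, 0xFFFF)            # change the table, not the pixels
    print(scan_out(vram, clut))         # the "red" pixels now come out blue
```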

The table which is indexed is called the Color Look Up Table, or CLUT. Thus in this mode, if we change the CLUT periodically, we can alter colours on screen dynamically without writing any pixels. From this you might deduce that each entry in the CLUT is three halfwords - Red, Green and Blue values, right? Wrong: each entry is actually four halfwords, which makes a great deal of sense from an addressing point of view! The values we want to alter are the last three halfwords - these are the Red, Green and Blue values for this index.
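A small Python sketch of one entry, laid out like the Toolbox ColorSpec record (a 16 bit value field followed by 16 bit red, green and blue, big-endian). The addressing point is that 4 halfwords make 8 bytes, so finding entry n is a shift by 3 rather than a multiply by 6. The helper names are hypothetical, and offsets are measured from the start of the entry array, not from the ColorTable header that precedes it in memory.

```python
import struct

# One CLUT entry, laid out like a ColorSpec: value, red, green, blue.
COLORSPEC = struct.Struct(">hHHH")  # big-endian, 8 bytes per entry


def pack_colorspec(value: int, red: int, green: int, blue: int) -> bytes:
    """Pack one entry; red, green and blue are the three halfwords we fade."""
    return COLORSPEC.pack(value, red, green, blue)


def rgb_offset(index: int) -> int:
    """Byte offset of entry `index`'s red component in the entry array:
    skip whole 8-byte entries (index << 3), then the 2-byte value field."""
    return (index << 3) + 2


if __name__ == "__main__":
    print(COLORSPEC.size)                                # 8
    print(pack_colorspec(5, 0xFFFF, 0x8000, 0).hex())    # 0005ffff80000000
    print(rgb_offset(3))                                 # 26
```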

Fading
The simplest form of fade is to fade all the CLUT entries to zero. To do this we need to determine the difference between the starting value and zero (?! I know, but this theory helps later), then divide this difference by the number of steps we want to fade by. For example, if we want a slow, smooth fade, we could fade over 300 steps; if we want a quick fade, we can fade over say 10 or 12 steps. If entry 0 has the values r=1000, g=500, b=2000 and we want to fade over 100 steps, we divide the red, green and blue values by 100, which gives 10, 5 and 20. These values can then be subtracted from this CLUT entry on each step to fade it to zero.
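The calculation itself, as a Python sketch (calc_fades_to_zero is a made-up name, not the calc_fades routine from the example project). It uses integer division, as an assembler routine would, and reproduces the worked example above.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def calc_fades_to_zero(clut: List[RGB], steps: int) -> List[RGB]:
    """Per-step amounts to subtract from each entry's red, green and blue
    so that it heads to zero over `steps` steps (integer division)."""
    return [(r // steps, g // steps, b // steps) for (r, g, b) in clut]


if __name__ == "__main__":
    print(calc_fades_to_zero([(1000, 500, 2000)], 100))  # [(10, 5, 20)]
    print(calc_fades_to_zero([(1000, 500, 2000)], 300))  # [(3, 1, 6)]
```

Note the second case: 300 subtractions of 3 only remove 900 of the 1000, so a fade loop built on integer division needs to set the final values explicitly rather than trust the last subtraction.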

As I'm sure you can see, we need an array to hold the fade values, where each entry is three halfwords - the fade values for the RGB components of the CLUT entry. We also need an array holding a copy of the original CLUT, so we can modify it.

Once we have calculated these fade values, it is simply a matter of looping around for the number of steps, subtracting the fade values from each CLUT entry until the whole CLUT has been modified. When every entry in the clut has been modified we can call SetEntries to send the new CLUT to the video hardware, then go back and subtract the differences again. When the loop is finished, the screen will be dark.
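That loop, sketched in Python under a few assumptions: the set_entries argument is a stand-in for SetEntries (here it just records each table), and the extra target parameter is our addition so the last step lands exactly on the destination values despite integer rounding. It is not the run_fade routine from the example project. Because it simply subtracts the fade values, it works unchanged for signed fade values too.

```python
from typing import Callable, List, Tuple

RGB = Tuple[int, int, int]


def run_fade(clut_copy: List[RGB], fade_values: List[RGB], steps: int,
             target: List[RGB],
             set_entries: Callable[[List[RGB]], None]) -> List[RGB]:
    """Each step subtracts fade_values from every entry, then hands the
    whole table to set_entries (standing in for SetEntries). On the last
    step the entries are set to `target` exactly, so rounding leftovers
    from the integer division don't stay on screen."""
    work = list(clut_copy)  # leave the saved copy untouched
    for step in range(1, steps + 1):
        if step == steps:
            work = list(target)
        else:
            work = [(r - dr, g - dg, b - db)
                    for (r, g, b), (dr, dg, db) in zip(work, fade_values)]
        set_entries(work)
    return work


if __name__ == "__main__":
    frames = []  # record each table instead of talking to video hardware
    final = run_fade([(1000, 500, 2000)], [(10, 5, 20)], 100,
                     [(0, 0, 0)], frames.append)
    print(len(frames), frames[0], final)  # 100 [(990, 495, 1980)] [(0, 0, 0)]
```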

A variation of fading to zero is fading to a colour. Again we use an array of fade values, but this time they are calculated not from the original colours minus zero, but from the original colours minus the new colour (again divided by the number of steps), so we can get positive or negative fade values. Then, as with the previous fade, we loop round modifying the CLUT entries until the requisite number of fade steps have completed.
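A Python sketch of the signed fade values, with made-up function names. One detail worth getting right is the direction of rounding: dividing toward zero (as C integer division does) means the running subtraction can never overshoot the target colour, whereas Python's floor division would round negative values away from zero.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def div_toward_zero(n: int, d: int) -> int:
    """Integer division truncating toward zero (as C does), for d > 0."""
    q = abs(n) // d
    return q if n >= 0 else -q


def calc_fades_to_colour(clut: List[RGB], colour: RGB, steps: int) -> List[RGB]:
    """Signed per-step deltas: (original - target colour) / steps.
    Subtracting a negative delta moves that component up towards the target."""
    tr, tg, tb = colour
    return [(div_toward_zero(r - tr, steps),
             div_toward_zero(g - tg, steps),
             div_toward_zero(b - tb, steps)) for (r, g, b) in clut]


if __name__ == "__main__":
    # Fade (1000, 500, 2000) towards (2000, 500, 0) over 100 steps.
    print(calc_fades_to_colour([(1000, 500, 2000)], (2000, 500, 0), 100))
    # [(-10, 0, 20)]
```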

The last fade incorporated into the example code is a "fade to clut" routine. This works similarly to the above, except that this time, when calculating the fade values, we find the difference between corresponding entries in the two CLUTs (divided by the number of steps) and store these as the fade values.
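The same idea per entry, as a Python sketch with hypothetical names. It computes original minus target and is meant to be subtracted; the example project's run_fade_add instead adds its values, which amounts to the same thing if the difference is taken the other way round (target minus original).

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def div_toward_zero(n: int, d: int) -> int:
    """Integer division truncating toward zero (as C does), for d > 0."""
    q = abs(n) // d
    return q if n >= 0 else -q


def calc_fades_to_clut(clut: List[RGB], target_clut: List[RGB],
                       steps: int) -> List[RGB]:
    """Per-entry signed deltas between two CLUTs: (original - target) / steps
    for each of red, green and blue."""
    return [tuple(div_toward_zero(c - t, steps) for c, t in zip(entry, goal))
            for entry, goal in zip(clut, target_clut)]


if __name__ == "__main__":
    start = [(1000, 500, 2000), (0, 0, 0)]
    goal = [(0, 500, 1000), (300, 600, 900)]
    print(calc_fades_to_clut(start, goal, 100))
    # [(10, 0, 10), (-3, -6, -9)]
```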

When working with CLUTs, it is very important to get the Port correct, otherwise you may copy or set the wrong colours! Remember, there is only one CLUT - in the video hardware. In reality, when we get a CLUT for modification, we are reading a PMTable out of a Port - the current Port - so make sure it is the port you want (unless of course you are synthesising your own CLUT).

Rotating
A simple procedure whereby the whole CLUT is rotated either left or right, more commonly called forwards or backwards. Included in the example code is a simple rotate which rotates all the entries left by one position; these rotates can be seen in the vertical, horizontal and circle examples in the code. To rotate down by one, you copy the first entry, shift the remaining 255 entries down one place, then paste the saved first entry into position 255.
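The copy, shift and paste steps, as a Python sketch on a plain list standing in for the table (rotate_left_one is a made-up name, not the rotate_clut routine from the project):

```python
from typing import List


def rotate_left_one(clut: List) -> None:
    """Rotate the whole table left (down) by one entry, in place:
    save entry 0, shift every later entry down one, put the saved entry last."""
    if not clut:
        return
    first = clut[0]
    for i in range(1, len(clut)):  # 255 moves for a 256-entry CLUT
        clut[i - 1] = clut[i]
    clut[-1] = first


if __name__ == "__main__":
    table = list(range(256))
    rotate_left_one(table)
    print(table[0], table[254], table[255])  # 1 255 0
```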

You may like to rotate by more than one position, or rotate just a section of the CLUT, say colours 32-64, thus animating only part of the colour table - play with it, it's good fun!
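Both variations in one Python sketch, assuming "colours 32-64" means entries 32 to 64 inclusive. rotate_range is a hypothetical helper: it rotates only that slice of the table by n places, leaving every other entry alone, and a negative n rotates the other way.

```python
from typing import List


def rotate_range(clut: List, first: int, last: int, n: int = 1) -> None:
    """Rotate only entries first..last (inclusive) left by n places, in place.
    Entries outside the range are left alone; a negative n rotates right."""
    section = clut[first:last + 1]
    if not section:
        return
    n %= len(section)
    clut[first:last + 1] = section[n:] + section[:n]


if __name__ == "__main__":
    table = list(range(256))
    rotate_range(table, 32, 64)
    print(table[31], table[32], table[64], table[65])  # 31 33 32 65
```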

The routines

The file "fade.s" contains most of the routines we've talked about previously.

fade_down, fade_to_clut and fade_to_colour are the main routines here.

copy_active_clut gets memory for the clut, then copies the PMTable of the active port and puts the handle to this clut in "current_clut_h".

get_current_pm_table returns in r3 a pointer to the PMTable of the active port. It is used by copy_active_clut, but is usable in its own right. get_clut is also used by copy_active_clut, and copies the PMTable pointed to by r3 to the clut in "current_clut_h".

load_clut is a resource loader that loads a clut from the resource fork. Takes the clut ID in r3, and returns with the handle to the resource in r3.

calc_fades, calc_fades_clut and calc_fades_colour are the routines that calculate the fade arrays for the three different forms of fade.

run_fade is the actual fade routine that will fade a clut. Needs the clut_copy and fade_values arrays correctly set up. run_fade_add is the same routine, but instead of subtracting the fade_values from the clut_copy, adds them - this works better when fading to a clut.

splat_clut takes a new clut in r3 and calls SetEntries to set the new clut.

copy_clut is a general routine that copies the clut in r3 to the memory pointed to by r4.

rotate_clut is a simple rotate routine that will rotate the clut in r3 to the left one position.

The file cf_utilities contains various window and drawing routines that I won't document here as they're just standard routines that call system functions.

"clut_fade_main" is naturally enough the top level of the program. The theory of operation follows this path:

That's it! Ah, just one final thing - does anybody have a CGI script that will accept a form and forward the fields as email? If so, can we have a copy, as we haven't got a clue!

Code on!
Stu.

©Lightsoft 1996. Unauthorised reproduction prohibited.

