14th October 1996

 

In this "issue":



What's Happening?

Well it's a bit of a bitty month really - so many things are being worked on, it's had to definitively say "We're doing this", cause we're doing bits and pieces all over the shop. Some of us are doing docs (Moi!), some Fanta_C, some working on stuff for V5, some working on Eclipse, some are researching. With help from Salvo we got the list server working and of course 4.11 is being tested.

I guess you could say that October is generally a bitsy (spell check that you smart arsed computer - yeah!) month anyway - kind of non-descript time of year - it's not Winter, and it's not Summer, so what is it? Don't say Autumn, 'cause we don't get Autumn anymore - not in the UK anyway.

Before proceeding onto the development topic for this time, I really must congratulate a certain new Brit champion - well done.


Development - desperately seeking...

a search algorithm.

Yes, something not often talked about, err, not by me anyway!

This has the possibility of being the most boring StuChat ever! But hey, seriously, it's something very often overlooked. You know that old programmers saying "Heck, you can always speed it up by changing the algorithm"? Well, if there's one place you can really get a big speed up, it's in any searches you may have. I know all you budding games programmers are currently thinking - "he's talking complete cr*p - I am never gonna need to search big data sets in any of my games". To which I'd reply, "Lets hope you never want to write a 3D game then." in as deadpan a voice as possible.

This is going to be tricky to write, because if there's two subjects that go hand in hand it's searching and sorting - mainly because it is far quicker to search a sorted data set than it is an unsorted set (sounds obvious but it's easily overlooked in the rush to get something working...). However, I really don't want to get into sorting here, but I will show you one method of searching that inherently dictates the data is sorted - a binary tree. But first, more simple searching.

Do we want fast searches, or slow, but more generic searching? Well, for text books it's always considered good practice to steer away from hardware and stick to the theory - they sell more that way. Here, whilst I will be describing principles, the leaning will be towards modern 64 bit processors - Pentium programmers, as much as you want to believe it, it's not you, so go away now :-)

No! We want fast extraction routines that ran yesterday - not in five minutes time. The fastest possible way to extract data is to know the position of the data before hand, then simply index into it. This, I know, is a blindingly obvious statement, but if you can possibly make a note of something before you need it, do it - it may be one instruction now, but it can save many instructions retrieving the data. Just one other point, before we dive in - if you are not hand coding assembler, then get to know your compiler - there is a particular Mac compiler that outputs the following code very often (you know who you are :-)):

move.l d0,d3
tst.l d3
Bcc xxxx

As I'm sure you can tell, the tst instruction is slowing the code down, as the "move" inherently sets the flags exactly as per the "tst"instruction - the "tst" is not needed. I do not know why the compiler is doing this, but can hazard a few guesses.

 

Linear Searching

This type of searching is called "linear" because the search starts at the start of the data list, and progressively works its way through, one item at a time. Do not scoff - this search is fine for smallish sets as it is very small, always fits into the cache of even the most lowly processor and often surprises people at how fast it can run within it's limits. It can naturally be highly optimised for the target processor. For example, most 68k systems have a 32 bit data bus. The 68k has a "cmpm" instruction which can increment both the source and destinations by the size - thus:
cmpm.l (a0)+,(A1)+ compares four bytes, or 32 bits, and both the pointers are now pointing at the next four bytes to be compared.

This of course means that you have to zero out all unused bytes, and it helps if the strings are a fixed maximum length. To compare two 32 byte strings in a0 and a1 use:

moveq #7,d0
cmp_loop:
cmpm (a0)+,(A1)+
bne.s nope
dbra d0,cmp_loop
if we get here, the strings are equal
nope:
if we get here, the strings are not equal

If programming for PPC, things are even better. You can use the FPU to compare 8 bytes at a time with the "fcmpu" instruction - just make sure that where you are getting the data from (as doubles) is octal aligned.

Now, as always in programming, to get things moving faster, we trade memory for speed...

 

Linear searching with hints

This is where the fun really starts. Ok, lets assume you have a list of 15000 items to search. The items are arranged in memory as a simple list. Each slot in the list has a maximum length of, say, 128 bytes. If an item is less than 128 bytes in length, it is padded with zeros. The item you are searching for is naturally padded with zeros if not 128 bytes long. You want to know where in the list the item is

Now, even with the cmpm instruction, or the 8 byte compare of the PPC, it's still can take a while to ascertain that an item does NOT match the target item - for example they could match right up to byte 127, then be different - they do not match.

What you want in this case is something that gives you a hint about whether the items may match, before carrying out a full compare. Any ideas?

"Please, Ladies and Gentlemen, a big round of applause for ..... Mr. Hash!"

No officer, please, you've misunderstood....

Mr. Hash is every bodies' savior when it comes to quickly searching big lists. Why? Well hashing allows us to provide a rough representation of an item, in a much smaller piece of data. There are literally zillions of ways of hashing data - some good, some bad. All you do is make some kind of note about the contents of a data item - how you do this is often regarded as a touchy subject - it doesn't matter, just as long as what you end up with roughly represents the value of the data.

For example, we're adding a new item to our big long list - the item is 83 bytes long - it's a C type string (that is, it's terminated in zero) - we want a hash from this. A simple hash is to simply add each character of the string into a register, and then rotate the register either way by one bit. At the end of the string the register will contain a number that is representative of the string. Another string may produce exactly the same result, but in this case it doesn't matter, because we store the hash values in a parallel array to the main storage array. We use these hash values as our hints.

Now, when we want to search the list, we first hash up the search item (using exactly the same hash algorithm) and then search the hash list first, using simple byte compares. When we get a match, then we can do a full compare on the strings to see if they are an exact match. Thus the hash array can give us a hint that the item may match and significantly speed up search times. The accuracy of the hashes can be improved by a) increasing the size of the hash items to 16, 32 or a larger number of bits, or b) improving the hashing algorithm, or c) a mixture of both.

A good hashing algorithm is generally regarded as one which doesn't take much time to run and provides uniqueness in the hash values. With 8 bits used as the hash size, it's fairly obvious that within 257 items, there will be at least two items with the same hash value.

 

Hashing algorithms

It's easy to see where the simple accumulate and rotate algorithm falls down - two strings can easily hash up to exactly the same irrespective of the size of the hash value. To alleviate this, you could expand the add and rotate to become add, rotate, add serial, inc serial. This means that no two items will have exactly the same value, unless the number of datums in the item being hashed exceeds the size of the serial counter.

At the start of the algorithm we set a counter to zero. As we're hashing the string, after every add and rotate we add on the counter and then increment it. Whilst providing a guard against hashing "dab" as "bad" this slows down the hashing process. This, unfortunately is a fact of life, as the algorithm gets better, it slows down.

However, in the above hash algorithm, we may be able to improve it's effectiveness by NOT setting the counter to zero at the start of every hash. This will speed it up and may provide better results.

A traditional hashing algorithm is to add the data item to the hash value, add on constant1, multiply by constant2 and then make the hash value = hash value MOD constant2.

Both constants should be odd.
31 is a good value for constant 1.
Ideally, constant2 will be the length of the array you are searching, but if not, 3 times constant1 is good.

.

Chop!

If you find yourself in the envious position of having pre-sorted data to search, by far the fastest "find it" method is the "binary chop". You pick a point bang in the middle of the data. If the value there is less than what you are looking for you make that the new start. If the value is greater than your wanted item, you make it the end. You then pick a value bang in the middle of the new start/end positions and repeat the process. You are effectively divide the amount of data by two at every compare.

Eventually you will land on an item that matches what you wanted (if it's there!). If not your start and end positions will be equal an you know you have not found it.

 

The green way - binary trees.

Combining presorting with very fast search times is a binary tree. A binary tree is not really a tree at all - it is inverted - bad name. The tree starts at the top and expands outwards and downwards. To insert an item into a tree, it has to be searched to find the correct place for the new data. Because this search is a sort of binary chop, it is fast - err, that's with a capital F.

Trees are made up of nodes. A node can be considered a container that holds the data and links to other nodes. The links go left and right. When designing a tree the designer will dictate at the very start which way means what. Either it's left is greater and right is less than or the opposite. I prefer left is less than and right is greater than, as it's easier to envisage.

Nodes must not be considered to be double linked (that is a pointer to both the object before and after) - if you have to go up a tree, something is wrong with your insertion routine.

So, lets build a tree. We have four items to insert - "cat","coat" ,"cath" and "cat" again. The first node is "cat" - the insertion routine takes the data and creates a node for it - in this case it is the very first node, right at the top of the tree.

The next item is "coat" - this is lexically greater than "cat", so the insertion routine creates a new node to the right of "cat" and puts "coat" in there. The next item is "cath" - this is lexically greater than "cat" so the insertion routine goes right from "cat" - it is less than "coat", so the insertion routine creates a new node from "coat" to the left, and inserts "cath" in the new node. The last item is "cat". The insertion routine examines the first node, finds "cat" which is lexically the same as "cat", so the insertion routine returns with an "already present" error - what the caller does at this point is irrelevant. A diagram is in order...

Here, I've taken the liberty of adding a few more items to give you a better idea of how it works.
If we now want to add another item, say "artwork" we can simply work our way through the tree:
Lexically compare "artwork" to "cat" - it is less than "cat", so take the left route - we get to "bone". Compare "artwork" to "bone" - again it is less than, so go left - Oh no! We can't - so we have to create a new node and put "artwork" there.

Of course, to be useful, the nodes will contain useful information - for example the "dog" node could contain some information describing a dog(or a pointer to the information).

A few notes about implementation - the connections between nodes are simply pointers. When we were in "bone" and needed to create a new node for "artwork", we find out where the node is (i.e. where we have free memory) and put a pointer to it in "bone". Thus a minimal node is three pointers - 1 to the node identifier (or even faster, the identifier can be kept in the node) - i..e. "dog", "bone" etc, 2 a pointer to the node to the left, 3 a pointer to the node right. From this we can calculate the size of a tree for any given number of entries - they can get big.

The pointers need not be 32 bit pointers - they are after all just offsets from a given start address - you can use any size pointer you like, but it's difficult designing a maximum size into a piece of software. Besides, the architecture you are running on will normally dictate an inherent optimum size for the pointers - in general, 32 bit pointers will allow trees up to 4 Gigabytes in size. 24 bit pointers will allow trees up to 16 Megabytes and 16 bit pointers will give 65k of addressable data.

 

After the trees...

You can use many trees - for example if the item you are looking for, say a string, starts with ASCII "a" then use tree "a", same for "b","c","d" etc. - use an array to point to the tree starts and use the first character of the string as an array index times 4 to get the correct tree start - i.e.
ASCII character - "0" *4 is your index.

For specialised data there are specialised searches - for example helix, double and multi helixes - these are most useful where one has to feed a super staged (i.e. normally more than 16 stages where a stall could be disastrous) pipeline with data - the data is sorted on input to the helix, and as the helix rotates the data comes out pre-sorted with no delays - very often implemented in hardware where the helix is rotated under control of a master pipeline clock, normally via some form of feed forward accelerator taking account of capacitive and inductive delays, or in the case of software, interrupt latency times for real time systems (most real time systems have a guaranteed maximum latency).


StuChat

Well, rumors, rumors everywhere - when will it all end? I'm convinced some people make up "rumors" just to fill editorial pages? It's a fact of life that things get quiet in computing as we approach the Winter months (apologies to readers in New Zealand), and so the rumors get worse. For example, I would be most surprised if Apple (Bless you!) we're to abandon MacOS 8 in favour of Be's OS. Ok, so you think I'm being naive? Can you imagine what that would do to the moral of Apples' work force who have spent years working on it?

As I understand it the big bug bear is backwards compatibility with 68k applications. This is not a trivial problem, and is probably on a par with the switch from 680x0 to PowerPC processors - but Apple managed that just fine - sure there were a few bugs in the emulator, but it worked good enough (which is about as good as any modern software gets!).

Problems of compatibility should have been addressed in the initial design stage. Apple has some of the worlds' best engineers, so I'm sure if they thought that compatibility would be no problem, then there is no problem. It is effectively the first new mainstream operating system for many years - indeed it could even be classed as the world's first new operating system, as all others have slowly evolved to where they are now.

Apple simply cannot afford to "cock it up" - if the new OS does not work that'll be it. Hence long delays, redesigns, compatibility modifications. Coupled with an ever moving goalposts - that of rationalised product lines, clone machines and advancing technology, it must be extremely difficult to specify design way points - if you like events that determine how the project is progressing.

It is a shame the development is taking so long, but Apple and Macintosh must have the best OS.

 

Moving ever onwards, something I love doing is reading, and this month I found some classic stuff to read. If you like microprocessors, their history and related information, get over to http://infopad.eecs.berkeley.edu/CIC/archive/cpu_history.html#68000
It's huge, but engrossing. Covers Motorola, Ferranti, I.B.M., Intel amongst many others. It also contains one of the better overviews of PowerPC - it does contain a its fair share of errors, but is good nonetheless.

 

Personal insights into 4.11

And now that it's finished, and in testing, a little personal insight into PF411. Please note that 411 is it's internal version - it will probably be released as 4.15.

First job was to fix the bugs - this took about, ooh, 3 or 4 seconds barring the Eddie crash bug, which was a real killer. It was hard to find because, as some of you may know, Eddie contains its own task switching code. This means that a lot of Eddie code has to be "pure" - i.e. the same piece of code can be called from two places "simultaneously". Code like this has to save it's "state" whenever it is terminating, and restore the state when re-entering - for example doing a global find and hyperjumping at the same time uses the same code. Eddie is never idle - it may look like it's just flashing the cursor, but behind the scenes it is terribly busy - jiggling memory, creating line start lists and so on. It takes advantage of idle periods to prepare for what you *may* want to do next - scroll up a page, hit carriage return (a major event in a text editor!), paste from the scrap- whatever. This particular bug was because Eddie had run out of space for context changes. Hopefully it's all sorted out now.

Ok, so after fixing the bugs, we decided to add some new features that we had lying about - the first was voice notes:- I always use some form of voice recording facility when working, and it's so annoying to find out that the "Dictaphone" has flat batteries. So, we added some simple facilities to Eddie 1.4 - record and playback. Nothing fancy. No interface to speak of, just record and playback. Because we didn't want playback interfering with normal Eddie operation we made it playback in the background, courtesy of the ever so nice sound manager PlaySoundFile (thanks Jim!) routines. For those interested, the file format is AAIF-C (compressed). It's interesting to note that Apple define the qualities as "good", "better" and "best". Someone at Apple definitely has an ego problem :-) I would call them cr*p, poor and good. Eddie uses the cr*p setting in the interests of not wishing to pay the telephone company too much for modem bills.

Next up, and courtesy of Kev, is a block select mode - you know - place the cursor at the start of a block, move to the end and shift-click - the block is highlighted. Another Kev spesh is: ESC now has the same affect as hitting cancel in most dialog boxes. Still on Eddie, had to add an auto repeat to the CD controller and had a general tidy up. For example if you eject a disk, the CD status display now blanks out :-)

We also fixed a few minor buggettes: the block cursor now remember's its a block cursor across Eddie reboots and improved the word select logic. We added a "Run" command from Eddie, which is only available if Build completed successfully and is very nice.

In the back end we revamped the fragment compiler (CXF) for the PPC output stage. This is a big improvement over the old one - it tells you any imports it can't tie up - previously you'd get just "unresolved import" - now you get the list of names it can't resolve. It was lot of work, but worth it. I also noticed that the Shared library linker wasn't working terribly well if you had extra shared libraries selected - it was idiosyncratic to say the least! Well, in 411 it works fine :-)

As an aside, we can't call Shared Libraries DLL's anymore - it's a Micr*soft thing...but hey, at least Mac DLL versions work as their supposed to!

Also, and mainly of use to PPC users, we added more alignment options to rs_align (32,64 and 256) , and added an rs_auto_align directive that saves loads of typing, as you can auto align to 8 after every rs directive. It means the BSS section can be bigger than it needs to be, but it's up to the programmer to use the directives intelligently - well it is :-)

 

Blatant plug for list server

Salvo has been kind enough to set up a list server for Fantasm - do get on it - feel free to post. If it's not used, it'll go....

We'll be posting all kinds of rubbish on it - for example common mistakes and useful tips, along with update news and additions/changes to the WWW pages. If you're not on it, don't blame us when you miss something important. We post all kinds of tips as they happen - if you're not on, you don't get - simple.

 

Oh, that Brit champion I mentioned earlier - John Ball - UK Conker champion 1996 :-)

Not forgetting good old Damon and Williams of course.
Next year....Ferrari?

And, definitely, finally, I must have a go at Derek Smith of MacFormat fame who keeps having a go at Fantasy Role players - admit it Derek, you love it! We all know. How do you know Elves don't like rain?

 

Till next time...
Hang on, that cat is about. I swear that cat has a higher I.Q. than Spock. It's Ok, she's gone now.

Code on!

 

Stu.












For all my fans out there, here we go. I finally managed to scavange a semi decent shot off the server, and here I am!

Yes, it's finally me! Fluffy! Drool away besotted fans.


©Lightsoft 1996. Unauthorised reproduction prohibited.


Send mail to Stu
Send mail to Fluffy



Back to Stu's Page Top Level

Back to Home Page