[Coco] the final stage of optimization
asa.rand at yahoo.com
Fri Nov 20 22:24:41 EST 2009
> I'm not entirely sure I understand what you are doing/trying to do. Do
> you need to maintain a sorted and searchable set of items dynamically?
> If so, perhaps what you need is a tree or heap structure. The
> insertion will be on average O(log N) instead of the constant time
> insertion cost of a list, but it maintains the sequence and can be
> searched in the same O(log N) time.
The search and sort routines are separate. The search routine determines whether a record has already been established for the data being compared. If it has, the matching record's reference count is incremented and the record is saved back to the file. If no match is found, a new record is added. The problem appears when the number of records becomes large, say more than 25 records (I haven't actually pinned down when the search becomes noticeably longer). Since every record may be examined, the search time grows in proportion to the number of records to search.
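To illustrate the difference between that linear scan and a keyed lookup, here is a minimal Python sketch. The field names ("refcount") and the in-memory dictionary are assumptions for illustration only, not the actual decode record layout:

```python
# Sketch: instead of scanning every record for a match, index records
# by their data value so the "already established?" check is one lookup.

def add_or_increment(index, key):
    """Bump the ref count if a record for key exists, else create one."""
    record = index.get(key)
    if record is None:
        index[key] = {"refcount": 1}
    else:
        record["refcount"] += 1

index = {}
for name in ["A", "B", "A", "C", "A"]:
    add_or_increment(index, name)
# index["A"]["refcount"] is now 3; no record was scanned more than once.
```

With a linear scan, each new item costs time proportional to the records already stored; with a keyed lookup the cost stays roughly flat as the file grows.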
The sort speed problem is a little more involved, especially in the literals file. The line sort compares only 2 fields, but the literal sort compares one of 5 pairs of fields, the actual pair depending on the value of the first field compared.
The reason is that literal references occur in 5 forms: byte, integer, hex integer, real, and string literals. The sort first checks whether the 2 records being compared have the same type field value. If they don't, it swaps them if necessary. If they do, it then compares the byte field, integer field, real field, or string field, and swaps if necessary. Strings take 2 comparisons: first by string length, and, if the lengths are equal, by contents. In every case, "necessary" means the first record's compared values are greater than (and not equal to) the second record's compared values.
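That comparison order can be collapsed into a single sort key. The following Python sketch uses made-up record tuples, not the actual decode data layout; the point is only that "type first, then the type-specific value, with strings ordered by length then contents" maps directly onto tuple comparison:

```python
def literal_key(rec):
    """Sort key mirroring the described order: type field first,
    then the type-specific value; strings compare by length, then contents."""
    ltype, value = rec
    if ltype == "string":
        return (ltype, len(value), value)
    return (ltype, value)

records = [("integer", 40), ("string", "abc"), ("integer", 7),
           ("string", "ab"), ("byte", 3)]
records.sort(key=literal_key)
# bytes sort together, then integers by value, then strings by length.
```

Building the key once per record also avoids re-deciding which field pair to compare on every swap test.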
The comparisons themselves are not slow, judging by how the program behaves at the beginning of the decode (search) or with a smaller number of records (sort).
I firmly believe it is the speed of the disk accesses that is slowing the program down. I don't know if disk accesses can be sped up, but I do know that the recursive sorts I used in DCom were much faster than the original linear (bubble?) sorts were.
From: Steven Hirsch <snhirsch at gmail.com>
To: CoCoList for Color Computer Enthusiasts <coco at maltedmedia.com>
Sent: Fri, November 20, 2009 2:26:39 PM
Subject: Re: [Coco] the final stage of optimization
On Fri, 20 Nov 2009, Wayne Campbell wrote:
> I googled quicksort. The wiki page is very informative. While reading it, I remembered how I wrote the sorts in DCom. They were based on quicksort. Because they were contained in separate procedures, I was able to use them recursively, and that's what made it work.
> In decode, the sorts and searches are part of that procedure, and not separate. This rules out recursion, as I can't call decode repeatedly. But I did gain an understanding I didn't previously have. I need to deal with the records to be searched or sorted in groups. This will require extra code to determine if a split is required, and how to divide it. I'm reasonably certain that an n/2 division will be faster, but I may be able to have it do an n/4 division when the number of records becomes greater than a specific number.
> I think that doing it this way will speed things up because it won't have to search every record to find a match. We'll see.
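For what it's worth, quicksort does not strictly require recursive procedure calls: the same division into groups can be driven by an explicit stack of (lo, hi) ranges, which fits a sort that lives inline in one procedure. A Python sketch (not the decode code itself):

```python
def quicksort_iterative(a):
    """In-place quicksort using an explicit stack of (lo, hi) ranges
    instead of recursive calls (Lomuto partition, last element as pivot)."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]       # pivot into final position
        stack.append((lo, i - 1))       # left group still to sort
        stack.append((i + 1, hi))       # right group still to sort
    return a
```

The stack plays exactly the role the recursive calls played in DCom's separate sort procedures.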
I'm not entirely sure I understand what you are doing/trying to do. Do you need to maintain a sorted and searchable set of items dynamically? If so, perhaps what you need is a tree or heap structure. The insertion will be on average O(log N) instead of the constant time insertion cost of a list, but it maintains the sequence and can be searched in the same O(log N) time.
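A sketch of the kind of O(log N) search being suggested, here using binary search over a sorted list rather than a tree (Python's bisect module; the parallel items/counts lists are just for illustration):

```python
import bisect

def insert_or_count(items, counts, key):
    """Keep items sorted; binary-search for key, bumping its count
    if present, otherwise inserting it at the sorted position."""
    i = bisect.bisect_left(items, key)   # O(log N) search
    if i < len(items) and items[i] == key:
        counts[i] += 1
    else:
        items.insert(i, key)
        counts.insert(i, 1)

items, counts = [], []
for k in ["cat", "dog", "cat", "ape"]:
    insert_or_count(items, counts, k)
# items is now ["ape", "cat", "dog"], already sorted — no separate sort pass.
```

Because the list stays sorted as records arrive, the separate sort step disappears and only the insertion shift remains as a linear-time cost.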