[Coco] Mod10 Suggestions
Dave Philipsen
dave at davebiz.com
Sun Feb 19 00:02:46 EST 2017
Or maybe not, after all....
On 2/18/2017 10:57 PM, Dave Philipsen wrote:
> Yeah, I think the BNE is one less cycle if the branch isn't taken, right?
>
> Dave
>
>
> On 2/18/2017 10:53 PM, William Astle wrote:
>> It would be 8 BNEs actually. It's executed even for the last loop.
>>
>> BNE is 3 cycles and DECB is 2 cycles so 40 cycles total.
>>
>> You can also save a cycle for each "temporary" reference by just
>> using RESULT as the temporary instead of using the stack. It's one
>> byte longer but one cycle faster as long as RESULT is in range of an
>> 8 bit offset from PC. That would be 2 cycles gained per iteration for
>> a total of 16 cycles. It's faster to use the stack if a PCR access to
>> result would need a 16 bit offset.
>>
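The figures in the reply above can be tallied with a few lines of Python, used here instead of 6809 assembly so the arithmetic can be run anywhere. This is only a sketch: the DECB and BNE timings and the 4 cycle penalty for a 16 bit PCR offset are the ones quoted in the thread, while the PSHS and indexed timings are assumptions taken from the standard 6809 instruction tables.

```python
# Cycle counts per instruction. DECB and BNE are quoted in the thread;
# the other entries are assumed from the standard 6809 timing tables.
CYCLES = {
    "DECB": 2,
    "BNE": 3,
    "PSHS A": 6,            # 5 + 1 per byte pushed
    "ADDA ,S+": 6,          # 4 + 2 for the auto-increment
    "STA RESULT,PCR": 5,    # 4 + 1 for an 8 bit PCR offset
    "ADDA RESULT,PCR": 5,   # 4 + 1 for an 8 bit PCR offset
}
PCR16_PENALTY = 4   # extra cycles per access with a 16 bit offset (from the thread)
ITERATIONS = 8      # digit pairs in a 16 digit number


def loop_overhead() -> int:
    """Cycles spent on DECB/BNE over the whole loop."""
    return ITERATIONS * (CYCLES["DECB"] + CYCLES["BNE"])


def temp_cost(store: str, reload: str, penalty: int = 0) -> int:
    """Cycles spent parking and re-adding the running sum, all iterations."""
    return ITERATIONS * (CYCLES[store] + CYCLES[reload] + 2 * penalty)


stack = temp_cost("PSHS A", "ADDA ,S+")
pcr8 = temp_cost("STA RESULT,PCR", "ADDA RESULT,PCR")
pcr16 = temp_cost("STA RESULT,PCR", "ADDA RESULT,PCR", PCR16_PENALTY)

print("DECB/BNE overhead:", loop_overhead())
print("gain from RESULT with an 8 bit offset:", stack - pcr8)
print("loss if RESULT needs a 16 bit offset:", pcr16 - stack)
```

Under these assumptions the stack only loses when RESULT is within reach of an 8 bit offset, which matches the advice above.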
>>
>> On 2017-02-18 09:15 PM, Dave Philipsen wrote:
>>> How much speed would you gain by completely eliminating 8 DECBs and 7
>>> BNEs?:
>>>
>>>         ORG   $1200
>>> CCD     RMB   16
>>> RESULT  RMB   1
>>>
>>> START   LEAX  CCD+16,PCR
>>>         CLRA
>>>
>>> LOOP    ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP2
>>>         SUBA  #9
>>> LOOP2   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP3
>>>         SUBA  #9
>>> LOOP3   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP4
>>>         SUBA  #9
>>> LOOP4   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP5
>>>         SUBA  #9
>>> LOOP5   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP6
>>>         SUBA  #9
>>> LOOP6   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP7
>>>         SUBA  #9
>>> LOOP7   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP8
>>>         SUBA  #9
>>> LOOP8   ADDA  ,S+
>>>         DAA
>>>
>>>         ADDA  ,-X
>>>         DAA
>>>         PSHS  A
>>>         LDA   ,-X
>>>         LSLA
>>>         CMPA  #10
>>>         BLO   LOOP9
>>>         SUBA  #9
>>> LOOP9   ADDA  ,S+
>>>         DAA
>>>
>>>         ANDA  #$0F
>>>         STA   RESULT,PCR
>>> ENDPGM  RTS
>>>         END   START
>>>
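Typing out eight copies by hand invites copy-and-paste slips, so one option is to generate the unrolled source instead. The following is a minimal Python sketch, not something posted to the list; the function name `unrolled_source` and the `SKIPn` labels are made up for illustration (the listing above uses LOOP2 through LOOP9).

```python
def unrolled_source(pairs: int = 8) -> str:
    """Return 6809 source text with the digit-pair body repeated `pairs` times."""
    size = 2 * pairs  # one byte per digit, two digits per pass
    lines = [
        "        ORG   $1200",
        f"CCD     RMB   {size}",
        "RESULT  RMB   1",
        "",
        f"START   LEAX  CCD+{size},PCR",
        "        CLRA",
    ]
    for n in range(1, pairs + 1):
        skip = f"SKIP{n}"  # hypothetical label, unique per copy of the body
        lines += [
            "",
            "        ADDA  ,-X",
            "        DAA",
            "        PSHS  A",
            "        LDA   ,-X",
            "        LSLA",
            "        CMPA  #10",
            f"        BLO   {skip}",
            "        SUBA  #9",
            f"{skip:<8}ADDA  ,S+",
            "        DAA",
        ]
    lines += [
        "",
        "        ANDA  #$0F",
        "        STA   RESULT,PCR",
        "ENDPGM  RTS",
        "        END   START",
    ]
    return "\n".join(lines) + "\n"


print(unrolled_source())
```

Each copy needs its own branch target, which is why the label is numbered per pass; the loop counter in B disappears entirely.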
>>> On 2/18/2017 8:22 PM, William Mikrut wrote:
>>>> Which is the beauty of this project.
>>>>
>>>> Clearly there are at least 3 ways to do this...each with a slightly
>>>> different outcome.
>>>>
>>>> Some optimization for size, speed... or both.
>>>>
>>>> There is a wealth of information and experience here from everyone,
>>>> and I truly appreciate all the input!
>>>>
>>>> I can't wait to start the next project and see where it leads!!
>>>>
>>>>
>>>>
>>>> On Feb 18, 2017 8:10 PM, "L. Curtis Boyle" <curtisboyle at sasktel.net>
>>>> wrote:
>>>>
>>>>> I was just going to mention that if speed is more important, doing an
>>>>> leas -1,s before the loop, then just a sta ,s / adda ,s (instead of
>>>>> pshs a / adda ,s+), and a final leas 1,s after the loop is done would
>>>>> be a bit longer, but a bit faster.
>>>>>
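A rough tally of that trade-off, sketched in Python so it can be run as-is. All cycle and byte counts below are assumptions taken from the standard 6809 instruction tables (none are quoted in this message), and the loop is assumed to run 8 times, as in the code under discussion.

```python
# (cycles, bytes) per instruction; assumed from the standard 6809 tables.
COST = {
    "PSHS A":    (6, 2),
    "ADDA ,S+":  (6, 2),
    "LEAS -1,S": (5, 2),
    "STA ,S":    (4, 2),
    "ADDA ,S":   (4, 2),
    "LEAS 1,S":  (5, 2),
}
ITERATIONS = 8  # digit pairs in a 16 digit number


def total(setup, body, teardown, field):
    """Sum one field (0 = cycles, 1 = bytes); the body repeats only at run time."""
    repeat = ITERATIONS if field == 0 else 1
    once = sum(COST[name][field] for name in setup + teardown)
    return once + repeat * sum(COST[name][field] for name in body)


push_pop = ([], ["PSHS A", "ADDA ,S+"], [])
reserved = (["LEAS -1,S"], ["STA ,S", "ADDA ,S"], ["LEAS 1,S"])

cycles_saved = total(*push_pop, 0) - total(*reserved, 0)
bytes_added = total(*reserved, 1) - total(*push_pop, 1)
print("cycles saved:", cycles_saved, "bytes added:", bytes_added)
```

The two LEAS instructions are paid once, so the saving grows with the number of passes through the loop.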
>>>>> L. Curtis Boyle
>>>>> curtisboyle at sasktel.net
>>>>>
>>>>> TRS-80 Color Computer Games website
>>>>> http://www.lcurtisboyle.com/nitros9/coco_game_list.html
>>>>>
>>>>>
>>>>>
>>>>>> On Feb 18, 2017, at 7:41 PM, Dave Philipsen <dave at davebiz.com>
>>>>>> wrote:
>>>>>>
>>>>>> That's pretty well optimized! Have you ever considered the
>>>>>> difference between optimizing for size and optimizing for speed?
>>>>>> So, for instance, if you weren't necessarily constrained for size
>>>>>> but you knew you were going to process a list of jillions of cc
>>>>>> numbers, would you write it differently?
>>>>>>
>>>>>> Dave Philipsen
>>>>>>
>>>>>>> On Feb 18, 2017, at 5:06 PM, William Mikrut <wmikrut72 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Some slight reordering of the code and it works perfectly!
>>>>>>> 48 bytes total, less 17 for storage -- 31 program bytes to get the
>>>>>>> job done.
>>>>>>> My original code was 61 program bytes... down to half the size, and
>>>>>>> it does the exact same thing.
>>>>>>> Absolutely amazing!
>>>>>>>
>>>>>>>
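>>>>>>>         ORG   $1200
>>>>>>> CCD     RMB   16            ; 16 digits, one per byte, values 0-9
>>>>>>> RESULT  RMB   1             ; low digit of the sum
>>>>>>>
>>>>>>> START   LEAX  CCD+16,PCR    ; X points just past the last digit
>>>>>>>         CLRA                ; running BCD sum
>>>>>>>         LDB   #8            ; 8 pairs of digits
>>>>>>>
>>>>>>> LOOP    ADDA  ,-X           ; add the digit that is not doubled
>>>>>>>         DAA
>>>>>>>         PSHS  A             ; park the sum on the stack
>>>>>>>         LDA   ,-X           ; next digit to the left
>>>>>>>         LSLA                ; double it
>>>>>>>         CMPA  #10
>>>>>>>         BLO   LOOP2
>>>>>>>         SUBA  #9            ; 10 or more: subtract 9
>>>>>>> LOOP2   ADDA  ,S+           ; add the parked sum back in
>>>>>>>         DAA
>>>>>>>
>>>>>>>         DECB
>>>>>>>         BNE   LOOP
>>>>>>>
>>>>>>>         ANDA  #$0F          ; keep the low BCD digit (sum mod 10)
>>>>>>>         STA   RESULT,PCR
>>>>>>> ENDPGM  RTS
>>>>>>>         END   START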
>>>>>>>
>>>>>>>> On Sat, Feb 18, 2017 at 1:03 PM, William Mikrut
>>>>>>>> <wmikrut72 at gmail.com> wrote:
>>>>>>>>
>>>>>>>> You are right -- I looked at it closer.
>>>>>>>> One thing I need to do is reverse the order of operations.
>>>>>>>>
>>>>>>>> The LSLA is performed first.
>>>>>>>> First I need to store the byte and LSLA the next byte.
>>>>>>>>
>>>>>>>> Otherwise if I flip it from left to right:
>>>>>>>> (LEAX CCD,PCR
>>>>>>>> ...
>>>>>>>> LDA ,X+
>>>>>>>> ...
>>>>>>>> ADDA ,X+)
>>>>>>>>
>>>>>>>> it works perfectly.
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Sat, Feb 18, 2017 at 11:35 AM, William Astle <lost at l-w.ca>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Take a closer look. It only does the LSLA on every other digit. It
>>>>>>>>> does *two* digits per loop, just like Brett's version.
>>>>>>>>>
>>>>>>>>> You can easily pretend all numbers are 16 digits by right
>>>>>>>>> justifying the numbers in your buffer and padding with zeros.
>>>>>>>>>
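The padding step can be sketched in Python as a model of how the 16 byte CCD buffer would be filled. The helper name `to_ccd_buffer` is hypothetical; it assumes the buffer holds one digit per byte as a value 0-9, which is what the CMPA #10 test in the posted code implies.

```python
def to_ccd_buffer(number: str, width: int = 16) -> list:
    """Right-justify a digit string in a fixed-width buffer, zero-filled on the left."""
    if not (number.isascii() and number.isdigit()) or len(number) > width:
        raise ValueError("expected 1 to %d decimal digits" % width)
    return [int(ch) for ch in number.zfill(width)]


# A 13 digit number gains three leading zeros.
print(to_ccd_buffer("4222222222222"))
```

A zero contributes nothing whether or not it lands on a doubled position, so the padded number gives the same check result as the short one.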
>>>>>>>>>
>>>>>>>>>> On 2017-02-18 10:06 AM, William Mikrut wrote:
>>>>>>>>>>
>>>>>>>>>> I like how this works from right to left.
>>>>>>>>>> The only issue is the LSLA on every number.
>>>>>>>>>>
>>>>>>>>>> The algo is to double every other number, starting with the
>>>>>>>>>> rightmost digit, and sub 9 if the result is 10 or more.
>>>>>>>>>>
>>>>>>>>>> Now if the number is always 16 digits, Brett's 16 bit word seems
>>>>>>>>>> the easiest way to go.
>>>>>>>>>> If the number is 13 digits long the 16 bit word method won't work,
>>>>>>>>>> but I am happy to pretend all numbers are 16 digits!
>>>>>>>>>>
>>>>>>>>>> I am going to try to include a couple things you showed me into
>>>>>>>>>> Brett's 16 bit chunk method and try a slightly different routine!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 18, 2017 at 10:22 AM, William Astle <lost at l-w.ca>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2017-02-18 12:43 AM, msmcdoug wrote:
>>>>>>>>>>>> Actually I'm surprised no one has suggested bcd arithmetic on
>>>>>>>>>>>> the result to eliminate divide by 10 loop
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> BCD would certainly give a predictable overall cycle count. It
>>>>>>>>>>> would require a significantly different approach, though. The
>>>>>>>>>>> only register you can use for BCD arithmetic is A and DAA is only
>>>>>>>>>>> useful after ADDA or ADCA.
>>>>>>>>>>>
>>>>>>>>>>> I had thought about using BCD but had initially dismissed it due
>>>>>>>>>>> to possible complexity. However, upon reflection, the extra
>>>>>>>>>>> cycles to use BCD would probably be less than the average cycle
>>>>>>>>>>> time of the modulus loop combined or checking for digit overflow
>>>>>>>>>>> during the loop.
>>>>>>>>>>>
>>>>>>>>>>> I think you could use code that looks something like the
>>>>>>>>>>> following, which is based off Mr. Mikrut's most recent posted
>>>>>>>>>>> code. (Warning: mailer code™ follows, so it may have errors.)
>>>>>>>>>>>
>>>>>>>>>>>         ORG   $1200
>>>>>>>>>>> CCD     RMB   16
>>>>>>>>>>> RESULT  RMB   1
>>>>>>>>>>> START   LEAX  CCD+16,PCR
>>>>>>>>>>>         CLRA
>>>>>>>>>>>         LDB   #8
>>>>>>>>>>> LOOP    PSHS  A
>>>>>>>>>>>         LDA   ,-X
>>>>>>>>>>>         LSLA
>>>>>>>>>>>         CMPA  #10
>>>>>>>>>>>         BLO   LOOP2
>>>>>>>>>>>         SUBA  #9
>>>>>>>>>>> LOOP2   ADDA  ,S+
>>>>>>>>>>>         DAA
>>>>>>>>>>>         ADDA  ,-X
>>>>>>>>>>>         DAA
>>>>>>>>>>>         DECB
>>>>>>>>>>>         BNE   LOOP
>>>>>>>>>>>         ANDA  #$0F
>>>>>>>>>>>         STA   RESULT,PCR
>>>>>>>>>>> ENDPGM  RTS
>>>>>>>>>>>
>>>>>>>>>>> I'm using the stack for a temporary storage location instead of
>>>>>>>>>>> something PCR relative for code size reasons. You could use the
>>>>>>>>>>> "RESULT" variable for the temporary to eliminate stack usage. That
>>>>>>>>>>> would probably be slightly faster at the expense of two more code
>>>>>>>>>>> bytes. This is one of those size/speed trade-offs.
>>>>>>>>>>>
>>>>>>>>>>> DAA has to be used after every addition and only applies to A.
>>>>>>>>>>> Using BCD means we can eliminate the mod 10 loop and just mask off
>>>>>>>>>>> the upper digit (BCD stores two decimal digits in a byte). That
>>>>>>>>>>> gives a constant time for the "mod 10" result and also only takes
>>>>>>>>>>> 2 bytes (and 2 cycles).
>>>>>>>>>>>
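To see that masking the low BCD digit matches a binary sum taken mod 10, the routine can be modelled in Python. This is a sketch, not code from the thread: `adda_daa` is a hypothetical helper that imitates ADDA followed by DAA on an 8 bit accumulator, and `mod10_bcd` follows the reordered loop from Mr. Mikrut's follow-up (plain digit first, doubled digit second), with digits assumed to be stored as values 0-9.

```python
def adda_daa(a: int, m: int) -> int:
    """Model ADDA followed by DAA; returns the new 8 bit accumulator."""
    half = (a & 0x0F) + (m & 0x0F) > 0x0F   # half-carry out of bit 3
    total = a + m
    carry = total > 0xFF
    a = total & 0xFF
    lo, hi = a & 0x0F, a >> 4
    fix = 0
    if half or lo > 9:
        fix |= 0x06
    if carry or hi > 9 or (hi > 8 and lo > 9):
        fix |= 0x60
    return (a + fix) & 0xFF


def mod10_bcd(digits):
    """Walk the digit values right to left, two per pass, as the loop does."""
    a, i = 0, len(digits)
    for _ in range(len(digits) // 2):
        i -= 1
        a = adda_daa(a, digits[i])      # ADDA ,-X / DAA
        saved = a                       # PSHS A
        i -= 1
        a = digits[i] << 1              # LDA ,-X / LSLA
        if a >= 10:                     # CMPA #10 / BLO
            a -= 9                      # SUBA #9
        a = adda_daa(a, saved)          # ADDA ,S+ / DAA
    return a & 0x0F                     # ANDA #$0F


def mod10_binary(digits):
    """Reference: plain integer sum of the same terms, then % 10."""
    total = 0
    for pos, d in enumerate(reversed(digits)):
        if pos % 2:                     # every second digit from the right
            d = d * 2 - 9 if d >= 5 else d * 2
        total += d
    return total % 10


for text in ("4111111111111111", "4111111111111112", "9999999999999999"):
    digits = [int(c) for c in text]
    print(text, mod10_bcd(digits), mod10_binary(digits))
```

The all-nines case pushes the BCD sum past 99, so the upper digits wrap, but the low digit that the mask keeps is unaffected.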
>>>>>>>>>>> I have also eliminated the STATUS variable and just store the
>>>>>>>>>>> result. You can test RESULT for non-zero trivially so there's no
>>>>>>>>>>> need for a separate STATUS value.
>>>>>>>>>>>
>>>>>>>>>>> By my calculation, this version is 32 bytes, requires 1 byte of
>>>>>>>>>>> stack space, 17 bytes of data space, and runs in a maximum of 351
>>>>>>>>>>> cycles (and a minimum of 336 cycles if none of the doubled digits
>>>>>>>>>>> goes above 9). For this analysis, I've assumed 8 bit offsets for
>>>>>>>>>>> the PCR references. 16 bit offsets in PCR mode are quite a bit
>>>>>>>>>>> more expensive (4 extra cycles and 1 extra byte).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Coco mailing list
>>>>>>>>>>> Coco at maltedmedia.com
>>>>>>>>>>> https://pairlist5.pair.net/mailman/listinfo/coco