[Coco] Mod10 Suggestions
William Mikrut
wmikrut72 at gmail.com
Sat Feb 18 21:22:21 EST 2017
Which is the beauty of this project.
Clearly there are at least 3 ways to do this...each with a slightly
different outcome.
Some optimized for size, some for speed... or both.
There is a wealth of information and experience here from everyone and I
truly appreciate all the input!
I can't wait to start the next project and see where it leads!!
On Feb 18, 2017 8:10 PM, "L. Curtis Boyle" <curtisboyle at sasktel.net> wrote:
> I was just going to mention that if speed is more important, doing a
> leas -1,s before the loop, then just sta ,s / adda ,s (instead of
> pshs a / adda ,s+), and a final leas 1,s after the loop is done would be
> a bit longer, but a bit faster.
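
For comparison with the 48-byte listing quoted further down, here is a sketch of the loop with that change applied. It is an untested illustration, not code anyone posted in the thread. The labels and data layout are carried over from that listing, and the cycle figures in the comments are my own reading of the 6809 timing tables.

```asm
* Speed-oriented variant: reserve one stack byte once, outside the loop.
        ORG   $1200
CCD     RMB   16             ; 16 digits, one per byte, values 0-9
RESULT  RMB   1

START   LEAX  CCD+16,PCR     ; point just past the last digit
        CLRA
        LDB   #8             ; 8 passes, two digits per pass
        LEAS  -1,S           ; reserve the temporary (2 bytes, 5 cycles)

LOOP    ADDA  ,-X            ; add the undoubled digit
        DAA
        STA   ,S             ; 4 cycles, versus 6 for PSHS A
        LDA   ,-X            ; next digit to the left
        LSLA                 ; double it
        CMPA  #10
        BLO   LOOP2
        SUBA  #9             ; 10..18 becomes 1..9
LOOP2   ADDA  ,S             ; 4 cycles, versus 6 for ADDA ,S+
        DAA
        DECB
        BNE   LOOP

        LEAS  1,S            ; release the temporary (2 bytes, 5 cycles)
        ANDA  #$0F           ; low BCD digit = sum mod 10
        STA   RESULT,PCR
        RTS
```

If those timings are right, the loop saves about 4 cycles per pass (32 over 8 passes) against 10 cycles and 4 bytes for the two LEAS instructions.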
>
> L. Curtis Boyle
> curtisboyle at sasktel.net
>
> TRS-80 Color Computer Games website
> http://www.lcurtisboyle.com/nitros9/coco_game_list.html
>
>
>
> > On Feb 18, 2017, at 7:41 PM, Dave Philipsen <dave at davebiz.com> wrote:
> >
> > That's pretty well optimized! Have you ever considered the difference
> > between optimizing for size and optimizing for speed? So, for instance, if
> > you weren't necessarily constrained for size but you knew you were going to
> > process a list of jillions of cc numbers, would you write it differently?
> >
> > Dave Philipsen
> >
> >> On Feb 18, 2017, at 5:06 PM, William Mikrut <wmikrut72 at gmail.com> wrote:
> >>
> >> Some slight reordering of the code and it works perfectly!
> >> 48 bytes total, less 17 for storage -- 31 program bytes to get the job done.
> >>
> >> My original code was 61 program bytes... down to half the size and it does
> >> the exact same thing.
> >> Absolutely amazing!
> >>
> >>
> >>         ORG   $1200
> >> CCD     RMB   16
> >> RESULT  RMB   1
> >>
> >> START   LEAX  CCD+16,PCR
> >>         CLRA
> >>         LDB   #8
> >>
> >> LOOP    ADDA  ,-X
> >>         DAA
> >>         PSHS  A
> >>         LDA   ,-X
> >>         LSLA
> >>         CMPA  #10
> >>         BLO   LOOP2
> >>         SUBA  #9
> >> LOOP2   ADDA  ,S+
> >>         DAA
> >>
> >>         DECB
> >>         BNE   LOOP
> >>
> >>         ANDA  #$0F
> >>         STA   RESULT,PCR
> >> ENDPGM  RTS
> >>         END   START
> >>
> >>> On Sat, Feb 18, 2017 at 1:03 PM, William Mikrut <wmikrut72 at gmail.com> wrote:
> >>>
> >>> You are right -- I looked at it closer.
> >>> One thing I need to do is reverse the order of operations.
> >>>
> >>> The LSLA is performed first.
> >>> First I need to store the byte and LSLA the next byte.
> >>>
> >>> Otherwise if I flip it from left to right:
> >>> (LEAX CCD,PCR
> >>> ...
> >>> LDA ,X+
> >>> ...
> >>> ADDA ,X+)
> >>>
> >>> it works perfectly.
> >>>
> >>>
> >>>> On Sat, Feb 18, 2017 at 11:35 AM, William Astle <lost at l-w.ca> wrote:
> >>>>
> >>>> Take a closer look. It only does the LSLA on every other digit. It does
> >>>> *two* digits per loop, just like Brett's version.
> >>>>
> >>>> You can easily pretend all numbers are 16 digits by right-justifying the
> >>>> numbers in your buffer and padding with zeros.
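
To illustrate the padding idea, here is a hypothetical data layout for a 13-digit number in the 16-byte buffer. The value 4222222222222 is only an example chosen for this sketch. It assumes, as the CMPA #10 in the posted code implies, that each byte holds a binary digit 0-9 rather than an ASCII character.

```asm
* 13-digit number 4222222222222, right-justified in the 16-byte buffer.
        ORG   $1200
CCD     FCB   0,0,0                       ; three leading pad digits
        FCB   4,2,2,2,2,2,2,2,2,2,2,2,2   ; the 13 real digits
RESULT  RMB   1
```

A zero contributes nothing to the sum whether or not it is doubled, so the pad digits leave the result unchanged. For this value the undoubled digits add up to 16 and the doubled ones to 24, so the total is 40 and the check digit is 0.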
> >>>>
> >>>>
> >>>>> On 2017-02-18 10:06 AM, William Mikrut wrote:
> >>>>>
> >>>>> I like how this works from right to left.
> >>>>> The only issue is the LSLA on every number.
> >>>>>
> >>>>> The algo is to double every other number, starting with the rightmost
> >>>>> digit, and sub 9 if the result is 10 or more.
> >>>>>
> >>>>> Now if the number is always 16 digits, Brett's 16 bit word seems the
> >>>>> easiest way to go.
> >>>>> If the number is 13 digits long the 16 bit word method won't work, but I
> >>>>> am happy to pretend all numbers are 16 digits!
> >>>>>
> >>>>> I am going to try to include a couple things you showed me into Brett's
> >>>>> 16 bit chunk method and try a slightly different routine!
> >>>>>
> >>>>>
> >>>>> On Sat, Feb 18, 2017 at 10:22 AM, William Astle <lost at l-w.ca> wrote:
> >>>>>
> >>>>> On 2017-02-18 12:43 AM, msmcdoug wrote:
> >>>>>>
> >>>>>>> Actually I'm surprised no one has suggested BCD arithmetic on the result
> >>>>>>> to eliminate the divide-by-10 loop
> >>>>>>>
> >>>>>>>
> >>>>>> BCD would certainly give a predictable overall cycle count. It would
> >>>>>> require a significantly different approach, though. The only register
> >>>>>> you can use for BCD arithmetic is A and DAA is only useful after ADDA or
> >>>>>> ADCA.
> >>>>>>
> >>>>>> I had thought about using BCD but had initially dismissed it due to
> >>>>>> possible complexity. However, upon reflection, the extra cycles to use
> >>>>>> BCD would probably be less than the average cycle time of the modulus
> >>>>>> loop combined or checking for digit overflow during the loop.
> >>>>>>
> >>>>>> I think you could use code that looks something like the following, which
> >>>>>> is based off Mr. Mikrut's most recent posted code. (warning: mailer code™
> >>>>>> follows so it may have errors)
> >>>>>>
> >>>>>>         ORG   $1200
> >>>>>> CCD     RMB   16
> >>>>>> RESULT  RMB   1
> >>>>>> START   LEAX  CCD+16,PCR
> >>>>>>         CLRA
> >>>>>>         LDB   #8
> >>>>>> LOOP    PSHS  A
> >>>>>>         LDA   ,-X
> >>>>>>         LSLA
> >>>>>>         CMPA  #10
> >>>>>>         BLO   LOOP2
> >>>>>>         SUBA  #9
> >>>>>> LOOP2   ADDA  ,S+
> >>>>>>         DAA
> >>>>>>         ADDA  ,-X
> >>>>>>         DAA
> >>>>>>         DECB
> >>>>>>         BNE   LOOP
> >>>>>>         ANDA  #$0F
> >>>>>>         STA   RESULT,PCR
> >>>>>> ENDPGM  RTS
> >>>>>>
> >>>>>> I'm using the stack for a temporary storage location instead of
> >>>>>> something PCR relative for code size reasons. You could use the "RESULT"
> >>>>>> variable for the temporary to eliminate stack usage. That would probably
> >>>>>> be slightly faster at the expense of two more code bytes. This is one of
> >>>>>> those size/speed trade-offs.
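
The RESULT-as-temporary alternative described above might look roughly like the sketch below. It is untested and was not posted in the thread. The digit order follows the reordered version that appears later in the thread (quoted above), and it assumes the assembler picks 8-bit offsets for the PCR references.

```asm
* Variant that parks the running sum in RESULT instead of on the stack.
        ORG   $1200
CCD     RMB   16             ; 16 digits, one per byte, values 0-9
RESULT  RMB   1              ; doubles as the loop temporary

START   LEAX  CCD+16,PCR     ; point just past the last digit
        CLRA
        LDB   #8             ; 8 passes, two digits per pass
LOOP    ADDA  ,-X            ; add the undoubled digit
        DAA
        STA   RESULT,PCR     ; park running BCD sum (was PSHS A)
        LDA   ,-X            ; next digit to the left
        LSLA                 ; double it
        CMPA  #10
        BLO   LOOP2
        SUBA  #9             ; 10..18 becomes 1..9
LOOP2   ADDA  RESULT,PCR     ; add running sum back in (was ADDA ,S+)
        DAA
        DECB
        BNE   LOOP
        ANDA  #$0F           ; low BCD digit = sum mod 10
        STA   RESULT,PCR     ; final answer overwrites the temporary
        RTS
```

Each of the two replaced instructions grows from 2 bytes to 3 with an 8-bit PCR offset, which accounts for the two extra code bytes mentioned above.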
> >>>>>>
> >>>>>> DAA has to be used after every addition and only applies to A. Using BCD
> >>>>>> means we can eliminate the mod 10 loop and just mask off the upper digit
> >>>>>> (BCD stores two decimal digits in a byte). That gives a constant time
> >>>>>> for the "mod 10" result and also only takes 2 bytes (and 2 cycles).
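
A small worked example of one accumulation step and the final mask, using made-up values (a running sum of 38 and a digit of 7). The DEMO label is hypothetical and only there to make the fragment stand alone.

```asm
* One BCD accumulation step followed by the "mod 10" mask.
DEMO    LDA   #$38           ; running sum so far, BCD 38
        ADDA  #$07           ; plain binary add gives $3F
        DAA                  ; decimal adjust: A = $45, i.e. BCD 45
        ANDA  #$0F           ; keep the low digit: A = $05 = 45 mod 10
        RTS
```

The sum of 16 digits can exceed 99, in which case the upper BCD digit wraps, but only the low digit is kept, so the masked result should still be the sum mod 10.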
> >>>>>>
> >>>>>> I have also eliminated the STATUS variable and just store the result.
> >>>>>> You can test RESULT for non-zero trivially so there's no need for a
> >>>>>> separate STATUS value.
> >>>>>>
> >>>>>> By my calculation, this version is 32 bytes, requires 1 byte of stack
> >>>>>> space, 17 bytes of data space, and runs in a maximum of 351 cycles (and
> >>>>>> a minimum of 336 cycles if none of the doubled digits goes above 9). For
> >>>>>> this analysis, I've assumed 8 bit offsets for the PCR references. 16 bit
> >>>>>> offsets in PCR mode are quite a bit more expensive (4 extra cycles and 1
> >>>>>> extra byte).
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Coco mailing list
> >>>>>> Coco at maltedmedia.com
> >>>>>> https://pairlist5.pair.net/mailman/listinfo/coco