[Coco] FW: Multi-Processor 6809 Computer System

Kip Koon computerdoc at sc.rr.com
Tue Apr 30 23:20:54 EDT 2013


John,
Thank you for sharing your experiences with your multicore 6809 FPGA
project.  Any chance of encouraging you to finish that design?  That would
be the trigger to get me into the FPGA arena!  It would be a dream come true
for me to see a multi-core 6809 processor chip booting a Multicore version
of NitrOS-9!  The things we could do with that in a 6809 Multicore Coco!  I
have been admiring your talent for some time now.  Keep up the most
excellent work.  You have a fantastic technical engineering design aptitude!
I'll be eagerly waiting for any further developments you care to do!  Thanks
again for sharing!  Happy 6809ing!
Kip

-----Original Message-----
From: coco-bounces at maltedmedia.com [mailto:coco-bounces at maltedmedia.com] On
Behalf Of John Kent
Sent: Monday, April 29, 2013 4:09 PM
To: CoCoList for Color Computer Enthusiasts
Subject: Re: [Coco] FW: Multi-Processor 6809 Computer System


You can instantiate as many 6809 cores as will fit in the FPGA, although you
will probably need block RAM cache, as the CPUs would have to share common
memory. I was working on a quad core 6809 system at one stage, many years
ago, using an XC3S1000 on the Spartan 3 starter FPGA board from Digilent,
but I ran into some difficulties trying to work out the bus arbitration for
shared memory and never completed it. I wanted to use rotating priority
between CPUs for memory access. It needs a priority encoder, which I've
subsequently made, a small barrel shifter or cross point switch to rotate
the encoder inputs, and a counter connected to an adder on the output of the
priority encoder to offset the rotated encoded inputs, if you can work that
out. The counter would also drive the barrel shifter and would be
incremented on each bus access. The result is that the priority of the
requests from each CPU for memory is rotated, but the adder offsets the
encoded priority so that the code produced still identifies the particular
CPU generating the request. The rotating priority of memory requests means
that each CPU is more likely to get an even share of memory accesses. If you
are looking for speed, an FPGA 6809 will work much faster than the original
chip (25MHz as opposed to 1 or 2MHz). Actually 1 FPGA 6809 is equivalent to
about 12 x 2MHz 6809s or even slightly faster.
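
As a rough illustration of the rotating priority scheme described above,
here is a minimal behavioural sketch in Python. It is not the original
hardware design: the names (RotatingArbiter, rotate_right, priority_encode),
the rotation direction, the choice that the lowest set bit wins, and the
counter wrapping modulo the number of CPUs are all assumptions made for the
example.

```python
N_CPUS = 4  # quad core system, as in the description above


def rotate_right(requests, shift, width=N_CPUS):
    """Barrel shifter: rotate a width-bit request vector right by shift."""
    mask = (1 << width) - 1
    shift %= width
    return ((requests >> shift) | (requests << (width - shift))) & mask


def priority_encode(bits, width=N_CPUS):
    """Priority encoder: index of the lowest set bit, or None if no request.

    Assumption: bit 0 has the highest priority.
    """
    for i in range(width):
        if bits & (1 << i):
            return i
    return None


class RotatingArbiter:
    """Counter + barrel shifter + priority encoder + adder (hypothetical)."""

    def __init__(self, width=N_CPUS):
        self.width = width
        self.counter = 0  # selects which CPU currently has top priority

    def grant(self, requests):
        """Return the CPU number granted the bus, or None if nobody asked.

        Bit n of requests is set when CPU n wants the bus.
        """
        rotated = rotate_right(requests, self.counter, self.width)
        code = priority_encode(rotated, self.width)
        if code is None:
            return None
        # The adder undoes the rotation so the code names the real CPU.
        winner = (code + self.counter) % self.width
        # The counter advances on each bus access.
        self.counter = (self.counter + 1) % self.width
        return winner
```

With all four CPUs requesting on every cycle, successive calls to grant()
hand the bus to CPU 0, 1, 2, 3 and then 0 again, which is the even sharing
the rotation is meant to give.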

The XC3S1000 I think has 16 x 2KB block RAMs so I was going to use 3 block
RAMs per CPU, 2 for the address (tag?) cache and one for the program and
data cache. (no separate instruction and data cache). I came up with a
scheme for cache coherency so that the CPUs could communicate through common
memory. You still need some form of signalling between CPUs to indicate when
data is ready to be used by another CPU. You could use a polling system on
memory, but interrupts would be more efficient. 
The extra 4 block RAMs I was going to use for the VDU text buffer, attribute
buffer, character generator and monitor program. You probably need an
additional CPU as the supervisor or I/O processor, but I don't think it
fitted in the XC3S1000. 4 CPUs was the limit with that FPGA. 
The XESS XuLA2 board I think has 1600K gates, so might be able to
accommodate 5 CPUs, but it uses SDRAM which is slower than the 10ns SRAM on
the Digilent Spartan 3 starter board. The Altera DE1 board might also be a
candidate, although I think the FPGA may be slightly smaller than the
XC3S1000.

The XC3S1000 also has 16 hardware multipliers, which would allow for 4 x
32 bit hardware multipliers, i.e. one per CPU. The hardware multipliers in
the Spartan 3 have 18 bit inputs and 36 bit outputs. You can use 4 of them
to make a multiplier with 32 bit inputs and a 64 bit output that will
operate in only a few clock cycles. You'd use them for Hi x Hi, Hi x Lo, Lo
x Hi and Lo x Lo, each on 16 bit halves, and add the 32 bit partial results
at the appropriate offsets. You would use 32 bits rather than the full 36
bits, as it would map onto byte boundaries more cleanly. If the floating
point maths in BASIC uses 5 bytes (1 byte signed exponent and 4 byte signed
mantissa) then the multiplier could possibly be used to speed up the BASIC
multiply routine.
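
The Hi/Lo decomposition described above can be sketched in a few lines of
Python. This is only an arithmetic model under stated assumptions, not the
FPGA implementation: the function name mul32x32 is made up for the example,
the operands are treated as unsigned, and each 16 x 16 product stands in for
one hardware multiplier restricted to 16 bit inputs.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32x32(a, b):
    """Unsigned 32 x 32 -> 64 bit multiply built from four 16 x 16 products."""
    a &= MASK32
    b &= MASK32
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16

    # One partial product per hardware multiplier (each fits in 32 bits).
    hh = a_hi * b_hi  # Hi x Hi
    hl = a_hi * b_lo  # Hi x Lo
    lh = a_lo * b_hi  # Lo x Hi
    ll = a_lo * b_lo  # Lo x Lo

    # Add the partial products at their bit offsets: 32, 16, 16 and 0.
    return (hh << 32) + ((hl + lh) << 16) + ll
```

The result agrees with an ordinary multiply, for example
mul32x32(0x12345678, 0x9ABCDEF0) == 0x12345678 * 0x9ABCDEF0. In hardware the
two middle terms and the carries between them are what take the extra clock
cycles.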

I sent a QIX arcade board to Mark Mc Dougall a few years ago. It used 2
6809s, one for the game and one for the graphics processor. It used a
6845 CRTC with 64K RAM mapped into two 32K pages. The display was 256 x
256 pixels with 2 bits each for R, G, B and I (intensity) signal giving
256 colors.  It used dual port memory to communicate between the two 6809s.
I think the CPUs were clocked 90 degrees out of phase to avoid race problems
with the dual port memory. The board(s) also had a 6800 for the sound
processor which I think communicated with the game 6809 through a pair of
back to back (?) 6821 PIAs and it had two 8 bit DACs connected to a 6821 PIA
to generate stereo sound. It used a pair of LM2002 amplifier chips for the
output. I still have the manual and schematics for the board in the filing
cabinet, which I could probably scan, although I think it's probably more
complicated and difficult to build than Kip was looking for.


John.

On 29/04/2013 11:40 PM, Luis Antoniosi (CoCoDemus) wrote:
> The Fujitsu FM-7 computer uses two 6809s, one of them for generating
> graphics. There is no VDP on this computer; the second 6809 is used
> exclusively for generating 8 color 640x200 pixel graphics.
>
> Some dual 6809 arcades do the same.
>

--
http://www.johnkent.com.au
http://members.optusnet.com.au/jekent



--
Coco mailing list
Coco at maltedmedia.com
http://five.pairlist.net/mailman/listinfo/coco



