[Coco] Update on new Coco 3 game engine
Fedor Steeman
petrander at gmail.com
Tue Aug 13 05:28:33 EDT 2013
Holy crap that is awesome!!!
I have long been considering something similar, but never had the time nor
the know-how.
This could become an enormous stimulus for CoCo game development!
Cheers,
Fedor
On 13 August 2013 07:26, Richard Goedeken
<Richard at fascinationsoftware.com> wrote:
> Hello Coco fans!
>
> In my long-term quest to write a side-scrolling arcade/adventure game for
> my daughter, I began earlier this year with one of the hardest parts:
> building a fast enough graphics engine to handle the scrolling and sprites.
> I figured that if I couldn't get this working well enough, then there
> would be no point in doing all of the work creating the game elements. I'm
> writing this email because this graphics code is nearly complete, and I
> wanted to share some of the many interesting things that I learned about
> the Coco and the 6809 during this development.
>
> In the design of the graphics engine, there are many decisions to be made
> which trade off between performance and visual quality. The one major
> advantage that the Coco has over the other 8-bit micros of the era is the
> large available memory pool of 512k. I wanted to use this to my advantage
> as much as possible, and you will see it in some of the choices that I made.
>
> I decided to use double buffering, which is very common, to eliminate
> tearing and flashing artifacts. This requires twice as much memory usage,
> and also requires us to redraw twice as many background pixels as we
> otherwise would. For example, consider the case in which the screen
> background is moving at 1 byte (2 pixels) per frame horizontally. Buffer 0
> is drawn at a starting point of (0,0). For the next frame, buffer 1 is
> drawn at (1,0). For the following frame we will switch back to buffer 0.
> We need to draw the new pixels for this screen buffer at a starting point
> of (2,0). We have already drawn this buffer at (0,0), so we only need to
> add two columns of bytes (4 pixels) on the right side to paint in the
> missing part of the screen. So we must draw a column 4 pixels wide, even
> though we have only moved 2 pixels from the previous frame. This is because
> each buffer only gets updated every other frame.
>
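The column bookkeeping described above can be sketched in a few lines. This is only an illustration in Python, not the engine's 6809 code; `SCREEN_BYTES`, `new_columns`, and `simulate` are names made up here, and only rightward scrolling is handled.

```python
SCREEN_BYTES = 160  # visible width: 320 pixels at 2 pixels per byte

def new_columns(last_drawn_x, new_x):
    """Byte columns exposed on the right edge when one buffer jumps from
    last_drawn_x to new_x (both in bytes; rightward scrolling only)."""
    return list(range(last_drawn_x + SCREEN_BYTES, new_x + SCREEN_BYTES))

def simulate(speed_bytes, frames):
    """Width (in byte columns) painted per frame after the two initial full draws."""
    last_x = [0, speed_bytes]  # buffer 0 drawn at x=0, buffer 1 at x=speed
    widths = []
    for frame in range(2, frames):
        buf = frame % 2              # the two buffers alternate every frame
        x = frame * speed_bytes      # current scroll position in bytes
        widths.append(len(new_columns(last_x[buf], x)))
        last_x[buf] = x
    return widths

print(new_columns(0, 2))   # [160, 161]
print(simulate(1, 6))      # [2, 2, 2, 2]
```

Each buffer is two frames stale when it is reused, so at 1 byte per frame it always has 2 byte columns (4 pixels) to paint.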
> I also decided to use the 256-byte wide screen mode. This increases
> memory usage for the screens by 60%, but it gives us some good advantages:
> 1. Pixel location calculations are greatly simplified (no need to multiply
> by 160). 2. We do not need to clip sprites on the sides when we draw them,
> because it's okay to draw a little offscreen. 3. Background block
> redrawing can be faster and more consistent in time between frames by
> always drawing the full width of the blocks.
>
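To make the first advantage concrete, here is a small Python sketch of the offset arithmetic. It is an assumption about how the calculation would look, not code from the engine; the function names are hypothetical.

```python
VISIBLE_BYTES = 160   # 320 pixels at 2 pixels per byte
WIDE_BYTES = 256      # row stride in the 256-byte wide mode

def offset_160(x_byte, y):
    """Offset into a 160-byte-wide buffer: needs a multiply by 160."""
    return y * VISIBLE_BYTES + x_byte

def offset_256(x_byte, y):
    """Offset into a 256-byte-wide buffer: the row number is simply the
    high byte and the byte column is the low byte of a 16-bit value."""
    return (y << 8) | x_byte

# Extra memory per screen, as an integer percentage.
extra_percent = (WIDE_BYTES - VISIBLE_BYTES) * 100 // VISIBLE_BYTES

print(offset_160(3, 2), offset_256(3, 2), extra_percent)   # 323 515 60
```

On an 8-bit CPU with a 16-bit register pair, the second form needs no arithmetic at all, which is where the saving comes from.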
> I really wish the GIME designers had provided for byte-level horizontal
> screen positioning. It is extremely unfortunate that it can only set the
> horizontal scroll position in 2-byte (4 pixel) increments. The only way to
> make it scroll smoothly with this constraint is to scroll A) very fast, and B)
> at a constant speed. Some games (Crystal City) do this and it looks
> impressive, but this scrolling is faster than I want, and I also would like
> to vary the scrolling speed. Slower scrolling is too jerky with 2-byte
> positioning. In software, we can do 2-pixel scrolling by using a pair of
> screen planes (even and odd) for each of the front and back buffers. One
> screen plane is offset by one byte, and we choose which plane to display on
> the monitor when we are flipping front/back buffers in the vertical
> interrupt by looking at the lowest bit of the X screen start position in
> bytes. The penalty for this finer scrolling is doubling the video memory
> usage, and about 30% more time to draw the background pixels.
>
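The even/odd plane choice can be sketched as below. This is a guess at the logic in Python, assuming the odd plane holds the same image shifted by one byte and that the hardware scroll value counts 2-byte steps; `select_plane` is a made-up name.

```python
PLANES_PER_BUFFER = 2   # even and odd
BUFFERS = 2             # front and back

def select_plane(x_start_bytes):
    """Return (plane, hw_offset) for a screen start position in bytes.

    plane     : 0 = even plane, 1 = odd plane (image shifted by one byte)
    hw_offset : hardware horizontal scroll value, in 2-byte units
    """
    plane = x_start_bytes & 1       # lowest bit picks the plane
    hw_offset = x_start_bytes >> 1  # the rest goes to the hardware
    return plane, hw_offset

print(select_plane(4))   # (0, 2)
print(select_plane(5))   # (1, 2)
print(BUFFERS * PLANES_PER_BUFFER)   # 4 screens held in video memory
```

Positions 4 and 5 use the same hardware offset and differ only in which plane is displayed, which is how the 1-byte (2-pixel) step is recovered.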
> So that's the background scrolling engine. I posted a demo on this list a
> few months ago. I recently rewrote the block drawing functions with an
> improved copying algorithm, so that it is now interrupt friendly (this is
> required for sound), and also a little bit faster.
>
> Regarding performance, the amount of time which passes between one field
> of the NTSC video output from the Coco and the next is 16.7 milliseconds.
> This is our "time budget". To achieve 60fps operation, we must
> draw/erase/redraw everything necessary, as well as read input
> keyboard/joystick state and do physics calculations in less than this time.
> Similarly, to run at 30fps we need to finish all these calculations in
> less than 33.3ms. One thing that I realized is that the computational
> workload for the game can vary greatly depending upon number of objects on
> the screen, the positions of the objects, whether the background is
> scrolling and by how much, etc. Rather than try to achieve a constant
> frame rate (at which every frame will be bound by the worst case), it is
> better to support a variable frame rate. This is a common technique used
> in modern games, and in fact I even noticed it in the new Pikmin 3 game
> for the Wii U. Since we already use double-buffering, this can be supported
> with a small penalty when doing the physics calculations. So my game
> engine does this: I track the number of 60 Hz fields which pass between
> frame updates, and use this value for updating the game state ('physics'
> calculations). For example, all objects will move at 3x their nominal
> speed if there were 3 field durations which passed between the last pair of
> frame updates. For simplicity and performance, I only support 1x, 2x, and
> 3x field times for the variable frame rate. If it takes more than 3 field
> durations to calculate a frame, then the game will appear to 'lag' or slow
> down. Otherwise, it will just get a little choppier as the frame rate
> drops, but will appear to run at the same perceptual speed.
>
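The variable frame rate rule above amounts to scaling each physics step by a clamped field count. A minimal Python sketch, with hypothetical names and simple linear motion as the only "physics":

```python
MAX_FIELDS = 3  # only 1x, 2x and 3x field times are supported

def step_multiplier(fields_elapsed):
    """Clamp the number of 60 Hz fields since the last frame to 1..3."""
    return max(1, min(fields_elapsed, MAX_FIELDS))

def advance(position, velocity_per_field, fields_elapsed):
    """Move an object by its nominal speed times the clamped field count."""
    return position + velocity_per_field * step_multiplier(fields_elapsed)

print(advance(10, 2, 1))   # 12: full 60fps
print(advance(10, 2, 3))   # 16: 20fps, same perceived speed
print(advance(10, 2, 5))   # 16: too slow, so the game visibly lags
```

Once a frame takes more than 3 fields, the object covers less ground than real time would demand, which is the slowdown described above.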
> The performance of the scrolling engine is pretty good. Here is a table
> which shows the number of milliseconds required to update the background
> (in terms of bytes for horizontal scrolling, and rows for vertical
> scrolling):
>
> Time (millisec)    -8    -6    -4    -2     0     2     4     6     8
> ---------------------------------------------------------------------
> Horizontal       12.3   9.7   7.2   4.3   1.4   4.3   7.2   9.8  12.3
> Vertical         12.0  10.2   8.4   5.0   1.4   5.0   8.3  10.1  12.0
> ---------------------------------------------------------------------
>
> The total overhead of the engine with no objects running is 1.4
> milliseconds. This includes reading 2 axes of one joystick and all the
> screen redraw logic. One thing that I noticed is that IRQ overhead of the
> 6809 is really high. The horizontal interrupt is a killer. The overhead
> for even an FIRQ is 21 cycles (10 cycles to enter, 5 for the LBRA at $FExx,
> and 6 for the RTI). The fastest routine that I can come up with to handle
> both VSync and sound is a minimum of 45 cycles in 9 instructions, and I
> would probably need more than this to dynamically update the screen based
> on row number. So we need a minimum of 66 cycles for this interrupt
> routine, and here's the kicker: the horizontal interrupt signals arrive
> only 114 cycles apart. Therefore, using the horizontal interrupt will
> occupy a minimum of 58% of all clock cycles, regardless of frame rate.
> This is too steep for me, so I will not use this and won't be able to
> split up the screen into horizontal regions, like Nick is doing for Popstar
> Pilot. I'll run the sound at a lower frequency off of the 12-bit timer,
> and turn off this interrupt source when the sound is not playing.
>
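The cycle budget in the paragraph above works out as follows. The figures are the ones quoted in the email; the constant names are just labels chosen for this sketch.

```python
FIRQ_ENTRY = 10     # cycles to enter the FIRQ
LBRA_VECTOR = 5     # LBRA at $FExx
RTI_RETURN = 6      # RTI
HANDLER_BODY = 45   # fastest VSync + sound routine (9 instructions)
HSYNC_PERIOD = 114  # cycles between horizontal interrupts

interrupt_overhead = FIRQ_ENTRY + LBRA_VECTOR + RTI_RETURN
total_per_interrupt = interrupt_overhead + HANDLER_BODY
cpu_share = total_per_interrupt / HSYNC_PERIOD

print(interrupt_overhead)     # 21
print(total_per_interrupt)    # 66
print(f"{cpu_share:.1%}")     # 57.9%
```

That rounds to the 58% quoted, and it is paid on every scanline whether or not anything on screen changes.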
> During the last few months I've made a lot of progress on the sprite
> portion of the graphics engine. I believe that my design for this is
> novel, and it is about as fast as it could be. My goal here was maximum
> theoretical performance. Again, I traded off memory consumption for speed.
> Part of the challenge with drawing sprites is that the 16 color mode packs
> 2 pixels into a single byte. If you want to support sprites with 1-pixel
> wide features, you must mask the background bytes with a logical AND, and
> then OR/ADD the results with the sprite pixels before writing back to the
> screen. The fastest general-purpose sprite routines that I can write
> require 3720 cycles to erase and write (while saving background data for
> later erasing) a 16x16 sprite. This works out to 14.5 cycles per pixel
> (assuming that all 256 pixels are drawn).
>
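The AND/OR masking step can be illustrated per byte. This Python sketch assumes the left pixel sits in the high nibble and uses `None` to mark a transparent pixel; both the names and that representation are choices made here for illustration.

```python
def pack(left, right):
    """Build (mask, data) for one screen byte from two 4-bit pixels.
    A pixel of None is transparent, so its background nibble is kept."""
    mask, data = 0x00, 0x00
    if left is None:
        mask |= 0xF0
    else:
        data |= left << 4
    if right is None:
        mask |= 0x0F
    else:
        data |= right
    return mask, data

def blit_byte(background, mask, data):
    """Keep the masked background bits, then OR in the sprite pixels."""
    return (background & mask) | data

mask, data = pack(0xA, None)                 # left pixel color 10, right transparent
print(hex(blit_byte(0x37, mask, data)))      # 0xa7
print(3720 / 256)                            # 14.53125 cycles per pixel
```

The background's right pixel (7) survives while the left pixel is replaced, which is exactly what 1-pixel-wide sprite features require.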
> To achieve the maximum possible performance with my sprite engine, I wrote
> a sprite compiler. This software is a large and complex Python script,
> which reads sprite data from a file and writes out near-optimal 6809
> assembly for drawing and erasing sprites on the screen. It basically
> paints them, byte by byte. Even though the sprite compiler includes a lot
> of crazy optimizations, the performance gains that I get on a
> cycle-per-pixel basis are relatively small and mostly attributable to two
> techniques: 1) I don't need to AND mask the bytes/words which will get
> completely overwritten, and 2) I can minimize foreground pixel loads by
> grouping together writes with the same byte/word values. For the few
> sprites with which I've been testing, the compiled sprite code takes an
> average of 12.9 cycles/pixel to erase and draw, which is only 12% faster
> than the general purpose routine, but the big gain comes from the fact that
> we only draw and erase the bytes which contain non-transparent pixels.
> When we look at the overall time consumed (rather than cycles per pixel),
> the new sprite engine turns out to be much faster than the general purpose
> routine. For example, I can draw+erase a 15-pixel diameter ball in under
> 2000 cycles, versus the 3720 cycles that the general purpose routine would
> take for a 16x16 sprite. I can draw+erase nice outlined 8x16 numeric
> characters in 1000 cycles or less each.
>
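To give a feel for the two techniques named above, here is a toy code generator in Python. It is not the DynoSprite compiler: it handles bytes only (no words), draws without saving the background, and the input format and function name are invented for this sketch.

```python
from collections import defaultdict

def compile_sprite(cells):
    """cells: list of (offset, mask, data), offset relative to X.
    mask 0xFF = fully transparent byte, 0x00 = fully opaque byte.
    Returns 6809 assembly lines that draw the sprite."""
    opaque = defaultdict(list)   # data value -> offsets written with it
    masked = []
    for offset, mask, data in cells:
        if mask == 0xFF:
            continue             # nothing to draw, emit no code at all
        if mask == 0x00:
            opaque[data].append(offset)
        else:
            masked.append((offset, mask, data))
    lines = []
    for data, offsets in sorted(opaque.items()):
        lines.append(f" LDA #${data:02X}")       # one load per distinct value
        for offset in offsets:
            lines.append(f" STA {offset},X")     # no AND mask needed
    for offset, mask, data in masked:
        lines += [f" LDA {offset},X", f" ANDA #${mask:02X}",
                  f" ORA #${data:02X}", f" STA {offset},X"]
    lines.append(" RTS")
    return lines

cells = [(0, 0xFF, 0x00), (1, 0x00, 0x11), (2, 0x00, 0x11), (256, 0xF0, 0x02)]
print("\n".join(compile_sprite(cells)))
```

For the four example bytes, the transparent one costs nothing, the two opaque ones share a single load, and only the half-transparent byte pays for the read/AND/OR sequence.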
> So I'm happy with the performance. As I mentioned before, the tradeoff is
> increased memory consumption. For a general-purpose engine, you would use
> probably 2 bytes per pixel to store sprite data (each sprite object would
> have 2 copies to get single-pixel positioning, and each copy would contain
> a mask byte and a foreground pixel byte for each screen byte). For my
> engine, the generated machine code for drawing and erasing the sprites
> varies, but comes out to about 6 bytes per pixel if you only need
> byte-level positioning (i.e., for letters/numbers), or 9 bytes per pixel if
> you want pixel-level positioning. It's a pretty heavy memory penalty, but
> I think it's worth the speed. The maximum sprite size is about 62x32, and
> the cool thing is that the sprites can be any shape or size, and will be
> optimized for just the pixels which get written to the screen.
>
> With the high memory consumption (and a scrolling graphics aperture which
> can move anywhere in the physical RAM space), it is desirable to abstract
> the 8k memory page (de)allocation and mapping. So, one of the very first
> modules of code that I wrote is a simple virtual memory manager which
> tracks the 8k pages which are allocated by different parts of the engine.
> It automatically moves them when the screen aperture moves to overlap
> with an in-use block.
>
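A page manager of the kind described could look roughly like this. The class, its methods, and the first-free allocation policy are all assumptions made for this Python sketch; the real module also has to copy the page contents and remap them, which is omitted here.

```python
class PageManager:
    """Tracks which 8k physical pages are in use and by whom."""

    def __init__(self, total_pages=64):      # 512k / 8k = 64 pages
        self.total_pages = total_pages
        self.owner = {}                      # physical page -> owner tag
        self.screen = set()                  # pages under the screen aperture

    def allocate(self, tag):
        """Claim the first page that is neither owned nor under the screen."""
        for page in range(self.total_pages):
            if page not in self.owner and page not in self.screen:
                self.owner[page] = tag
                return page
        raise MemoryError("no free 8k pages")

    def move_screen(self, pages):
        """Move the aperture; relocate any in-use pages it now overlaps.
        Returns {old_page: new_page} so callers can fix up their mappings."""
        self.screen = set(pages)
        moved = {}
        for page in sorted(self.screen & set(self.owner)):
            tag = self.owner.pop(page)
            moved[page] = self.allocate(tag)
        return moved

pm = PageManager(total_pages=8)
pm.allocate("sprites")            # takes page 0
pm.allocate("level")              # takes page 1
print(pm.move_screen([0, 1, 2]))  # {0: 3, 1: 4}
```

The point of the abstraction is that the rest of the engine holds tags rather than physical page numbers, so a scrolling aperture can evict data without the owners needing to know in advance.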
> I'm really excited about this graphics/game engine, because it is
> sufficiently generalized that it could be used for a lot of great games in
> addition to platformers. It's not suitable for every genre, but it would
> work well for several different game types. I would love to do a top-down
> racer like Micro Machines. If it were simple enough, it could look
> beautiful running at 60fps. This engine is also suitable for horizontal
> shoot-em-ups and for top-down or isometric RPGs, graphic adventures, and
> arcade action games. The sprite functionality could be extracted separately from
> the background scrolling engine and could be used in any type of game.
>
> I also came up with a name for this engine: I call it DynoSprite. I have
> a few more weeks of work to do on a demo that I will release to show the
> sprite functionality. With any luck I should have something cool to show
> you soon.
>
> Richard
>
> --
> Coco mailing list
> Coco at maltedmedia.com
> http://five.pairlist.net/mailman/listinfo/coco
>