I wanted to follow up and explain how 64-column text is handled.
Since each character cell is 5 pixels wide (64 cols * 5 px = 320 px), many of the characters straddle the boundaries of the normal 8x8 cells. What I did here was break the x-axis into 8 zones of 40 pixels (5 normal 8x8 cells). Eight 64-column chars fit into each 40-pixel zone (8 * 5 px = 40 px). So when determining where the next character should go, the routine divides the x-coordinate (range 0-63) by 8 to get the correct 40-pixel zone, numbered 0-7. Once the base address is calculated for the Y-row ($a000 + (320 * Y)), we add (40 * zone #) to get the base address for this zone. The x-coordinate is then masked down to its lowest three bits (range 0-7), which determines which of the eight custom draw routines will be used in this 40-pixel zone. When calculating complicated addresses I don't even think about messing around with math, even at 20 MHz. Every component comes from a look-up table: tbl_320, tbl_40, even tbl_8. The final, effective address then becomes a matter of simple addition, something the 65x family does very well.
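For anyone who wants to see the zone arithmetic spelled out, here is a minimal Python sketch (not the original 65x code). It assumes Y is a text row in the range 0-24 and that the bitmap uses the standard C64 hires layout, where each 8x8 cell takes 8 bytes, so a 40-pixel zone is also 40 bytes wide. The names BITMAP_BASE, TBL_320, TBL_40, and zone_address are made up for illustration; only the $a000 base and the table idea come from the post.

```python
BITMAP_BASE = 0xA000  # bitmap base address mentioned in the post

# Precomputed tables, so no multiplication happens at draw time.
TBL_320 = [320 * row for row in range(25)]  # byte offset of each text row (assumed 0-24)
TBL_40 = [40 * zone for zone in range(8)]   # byte offset of each 40-pixel zone


def zone_address(x, y):
    """Return (zone base address, sub-zone index) for column x (0-63) and row y (0-24)."""
    zone = x >> 3    # divide by 8 -> zone number 0-7
    sub = x & 0x07   # lowest three bits -> which of the eight draw routines
    return BITMAP_BASE + TBL_320[y] + TBL_40[zone], sub


if __name__ == "__main__":
    addr, sub = zone_address(9, 1)
    print(hex(addr), sub)
```

The only operations left at draw time are a shift, a mask, two table reads, and two additions, which is the point of the table-driven approach.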
Each custom sub-zone drawing routine is accessed via an indirect jump table. Within a 40-pixel zone, character locations 0, 2, 5, and 7 are fairly easy (relatively speaking, that is) and only require one pass, because all 5 pixels fall within one normal 8x8 character cell. Character locations 1, 3, 4, and 6 are a different story, however. Each of these has a left and a right component, and it was quite a challenge to figure all of it out with the right pixel masks (and the offset from the beginning of the 40-pixel zone). Lucky for me I did this 25 years ago when the brain still had a little life! Just kidding. Something like this stays with you forever. Over the years I expanded the function to include four different types of blending masks, so that you could place the text behind other foreground objects, combine, exclude, or replace other objects, much as you can do with hardware sprites. If you have been thinking the 40-pixel zones seem a little like GCR encoding, you would be correct: the process is vaguely similar.
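To show why locations 0, 2, 5, and 7 need one pass and 1, 3, 4, and 6 need two, here is a small Python sketch that works out which 8x8 cells each 5-pixel character touches and the pixel mask for each piece. It is an illustration under assumptions, not the original routines: it assumes bit 7 of a bitmap byte is the leftmost pixel and that each cell is 8 bytes further into the zone. The function name subzone_layout is hypothetical.

```python
def subzone_layout(sub):
    """For sub-zone character 0-7, return a list of (cell, byte_offset, mask) pieces."""
    start = sub * 5  # first pixel of the character inside the 40-pixel zone
    pieces = []
    for cell in range(start // 8, (start + 4) // 8 + 1):
        mask = 0
        for px in range(start, start + 5):
            if px // 8 == cell:
                mask |= 0x80 >> (px % 8)  # bit 7 is the leftmost pixel (assumed)
        # byte_offset assumes 8 bytes per 8x8 cell, counted from the zone base
        pieces.append((cell, cell * 8, mask))
    return pieces


if __name__ == "__main__":
    for sub in range(8):
        layout = subzone_layout(sub)
        passes = "one pass" if len(layout) == 1 else "two passes"
        print(sub, passes, [(c, off, f"{m:08b}") for c, off, m in layout])
```

In this sketch a one-pass character yields a single 5-bit mask, while a split character yields a left mask in one cell and a right mask in the next cell, 8 bytes further on.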
I encourage anyone who has the inclination to learn about goofy stuff like this to look at the source. I did manage to scrape up the 8-bit versions for the C64 and have included them here. Unfortunately, I did not include all of the equates, zp locations, checks for non-print chars, etc. with these. They are also not really commented at all, because they were written a long time ago directly from formulas and drawings on graph paper, and then ripped from the original 5 1/4" floppies and dot-matrix fan-fold paper in 2008. They are still great examples to work from, though it might be a little like putting a puzzle together. The version with the blending code (6425long.asm) does add an intense amount of complexity to the function. As it is now, even the 8-bit version puts up 64-column hires text faster than the kernal displays normal text characters.
If there is any part you would like clarified, please do not hesitate to contact me. This 64-column text is my 25-year-old child and I definitely don't mind talking about her...