Retrochallenge 2018/09, 2019/03

Cycle Scavenging

September 11, 2018

10:15 pm


Outputting each pixel in the VGA routine takes two instructions:
	inc zh                     //increment address
	out io(RAMADDR_PORT), zh   //output new address
This is repeated many times as an unrolled loop:
	//define some macros

	.macro PX
		inc zh
		out io(RAMADDR_PORT), zh
	.endm

	.macro PX4
		PX
		PX
		PX
		PX
	.endm

	.macro PX16
		PX4
		PX4
		PX4
		PX4
	.endm


	//a bit later, when the time is right

	PX16
	PX16
	PX16
	...
	PX16  //256 pixels

The AVR has a feature where writing to the PIN (input) register of a port XORs the written value into that port's output register, flipping every bit that is written as a 1. If I already have the right value (the number 1) loaded in a register, I can output 2 pixels using only 3 instructions:


.macro PX2noppy_odd_even
	nop                       //do nothing!
	out io(RAMADDR_PIN), zl   //flip the last bit: the 0 on the address becomes a 1
	subi zh, -2               //count forward 2 (skip to the next even address)
	out io(RAMADDR_PORT), zh  //output the next even address
.endm

I replaced 16 pixels of the display with the 'noppy' version as a test, and everything works the same. It creates the same display, but every 4th instruction is a nop. What is the advantage? Each nop can be replaced with any other single-cycle instruction. So one quarter of the 512 instructions used to generate a 256-pixel scanline (128 instructions) is potentially reclaimed, with caveats: I am limited to single-cycle instructions (any 2-cycle one will break the timing), and there can be no jumps. That leaves basic arithmetic and bit operations: essentially branchless combinational logic. I am not sure what I can put there yet.
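As a sanity check of the trick, here is a minimal Python model (not AVR code) of both macros. It assumes the port already holds the same even value as zh when the first macro starts and that zl holds 1. The function names are made up for illustration. It records the cycle and address of every `out` and counts the nop slots.

```python
def run_px(zh, count):
    """Model `count` PX macros; return (cycle, address) for each `out`."""
    events, cycle = [], 0
    for _ in range(count):
        zh = (zh + 1) & 0xFF           # inc zh (1 cycle)
        cycle += 1
        cycle += 1                     # out PORT, zh (1 cycle)
        events.append((cycle, zh))
    return events


def run_px2_noppy(zh, count, zl=1):
    """Model `count` PX2noppy_odd_even macros.

    Assumption: the port starts out equal to zh, and zh is even.
    Returns the (cycle, address) events and the number of nop slots.
    """
    port = zh
    events, cycle, free_slots = [], 0, 0
    for _ in range(count):
        cycle += 1                     # nop: the reclaimable slot
        free_slots += 1
        port ^= zl                     # out PIN, zl: XOR toggles bit 0
        cycle += 1
        events.append((cycle, port))
        zh = (zh + 2) & 0xFF           # subi zh, -2
        cycle += 1
        port = zh                      # out PORT, zh: next even address
        cycle += 1
        events.append((cycle, port))
    return events, free_slots


plain = run_px(0, 256)                 # 256 pixels, 512 cycles
noppy, slots = run_px2_noppy(0, 128)   # same 256 pixels, 512 cycles
print(plain == noppy, slots)
```

In this model both versions put the same address on the port at the same cycle, and a full 256-pixel line leaves 128 single-cycle slots.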

On the original Hackaday.io project page for this, I had implemented a horizontal resolution higher than should be possible by making use of one of the AVR's PWM timers. PORTB is used for addressing memory. PORTB also has PWM pins, so I configured a PWM pin to toggle every cycle. With the code timed correctly, even though I could only increment and output once every 2 clock cycles, the PWM broke each 2-cycle address into 2 addresses. This is similar to what I tried above by flipping one bit without incrementing. On that page, I realized even back then that if I used the PWM trick on a low-resolution video mode, I only needed to increment and write once every 4 clock cycles, leaving 2-cycle nops available. The idea back then was to implement an interpreter within those cycles. Now that idea seems crazy, but I think if I can reclaim enough CPU, I may be able to process the PS2 keyboard within the nops recaptured from video. There are two problems with this: 1) The way the PWM toggles bits complicates horizontal scrolling, since it's not as simple as 'add one'; the bit being toggled is actually bit number 3, which is very annoying. To make it work before, I needed to write to memory in a strange swizzled order. 2) If I were to implement hi-res video, the PS2 handling code would still have to go somewhere.

To get on with this project, I think I am just going to have the PS2 keyboard blank out a scanline for handling incoming bits. I can clean it up later.



VGA Bug

September 11, 2018


I messed up writing the VGA routines the other day. I was wondering why the screen was vertically offset a little bit, despite the vertical blanking being in the right spot. Turns out, it was not. I messed up my row counter: I never reset it! It would just overflow, so I was counting out 512 scanlines instead of 525. To save CPU time, I had used two byte variables: one is the row to draw, and the other tracks odd vs. even. This wastes fewer clocks than loading, incrementing, and testing 16-bit values. Without this optimization I would have to notch out pixels.
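The overflow can be reproduced with a small Python model of the two byte variables. This is only a sketch of the counting logic described above, not the interrupt routine itself, and the function name is hypothetical.

```python
def buggy_frame_length():
    """Count scanlines until the never-reset counters wrap back to the start."""
    row, odd = 0, False        # 8-bit row counter plus odd/even flag
    lines = 0
    while True:
        lines += 1             # one scanline drawn
        if odd:
            row = (row + 1) & 0xFF   # row advances every second scanline
        odd = not odd
        if row == 0 and not odd:     # wrapped around: the frame repeats
            return lines


print(buggy_frame_length())    # 512, which is 13 lines short of 525
```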

The real fix would be to use 16-bit math, but the longer loads, stores, and compares were blowing the length of the interrupt routine. I need the active video scanline case to be fast.

The trick was to keep the existing active video parts all the same. The interrupt routine is already abbreviated during the vertical blanking interval; it runs just enough to get the hsync/vsync timing right and then exits to the user application. So, in the vertical blanking interval, the row counter is hacked up to make the frame take 525 lines. The video driver still alternates odd/even for rows 0 to 254 (scanlines 0 to 509). For the lines after that, the counters are locked to 'even' and 255. A separate counter tracks how many scanlines have passed beyond the counter limit, and when it equals 15, all counters are reset to zero (510 from 2x255, then + 15 = 525 lines). This works fine! This additional counter is stored in the leftover bits of the odd/even flag. (Since it was a boolean, 7 other bits were wasted!)
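A rough Python model of this counting scheme is below. The bit layout is an assumption made for illustration (bit 0 as the odd/even flag, the upper 7 bits as the extra-line counter), and the names are hypothetical; the real routine is AVR assembly.

```python
ODD_BIT = 0x01        # assumed: bit 0 of the flag byte is odd/even
EXTRA_SHIFT = 1       # assumed: bits 1..7 hold the extra-line counter
EXTRA_LINES = 15      # lines counted after row reaches 255


def next_line(row, flags):
    """Advance the (row, flags) byte pair by one scanline."""
    if row < 255:
        if flags & ODD_BIT:
            return row + 1, flags & 0xFE      # odd line done: next row, even
        return row, flags | ODD_BIT           # even line done: same row, odd
    # row is locked at 255 and 'even'; only the extra counter moves
    extra = (flags >> EXTRA_SHIFT) + 1
    if extra == EXTRA_LINES:
        return 0, 0                           # reset everything: new frame
    return 255, extra << EXTRA_SHIFT


def frame_length():
    """Count scanlines until the counters return to their start state."""
    row, flags, lines = 0, 0, 0
    while True:
        row, flags = next_line(row, flags)
        lines += 1
        if (row, flags) == (0, 0):
            return lines


print(frame_length())   # 525
```

Both values stay within one byte each, so no 16-bit loads or compares are needed in this model.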


Reducing display width?

Originally I rewrote the VGA routine to get better control over timing. Part of the goal was to optimize it so I can get the full 256 pixel width of the display. However, now there's extra stuff I haven't crammed in yet, like PS2 keyboard polling. If there is a change on the PS2CLK input, I suppose a scanline can be skipped, or notched out, without too much ill effect. But another thing I wanted to do with this is support good horizontal scrolling. The VGA driver already supports scrolling in both directions. However, vertical scrolling has something horizontal doesn't: a hidden area. There are 256 pages in a bank of memory, vertical scrolling just changes which page is the first scanline, and there are only 240 visible lines. This gives 16 lines that can be written to in software BEFORE scrolling, and the user will not see the drawing. To do the same thing horizontally, there needs to be a strip of hidden pixels. If the width of the display is shrunk from 256 to 248 pixels, this gives an 8 pixel wide hidden area, and also provides 16 additional clock cycles to the video interrupt. If the text font used is 8 pixels wide, this gives a strange 31 character wide display. If the font is switched to a 6 pixel wide font, the display will be 41 characters wide, with 2 pixels left over.
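The numbers above can be checked with a few lines of Python. The 2 cycles per pixel comes from the inc/out pair used for each pixel; the function and key names are just for illustration.

```python
FULL_WIDTH = 256        # pixels addressable per scanline
CYCLES_PER_PIXEL = 2    # one inc plus one out per pixel


def layout(visible_width, font_width):
    """Hidden strip, reclaimed cycles, and text columns for a given width."""
    hidden = FULL_WIDTH - visible_width
    return {
        "hidden_pixels": hidden,
        "cycles_reclaimed": hidden * CYCLES_PER_PIXEL,
        "columns": visible_width // font_width,
        "leftover_pixels": visible_width % font_width,
    }


print(layout(248, 8))   # 8 hidden pixels, 16 cycles, 31 columns, 0 left over
print(layout(248, 6))   # 8 hidden pixels, 16 cycles, 41 columns, 2 left over
```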
