;
; TopOctaveV3.asm
;
; Created: 5/11/2018 8:21:56 AM
; Author : Alan
;
; ATmega328P emulation of a Top Octave Generator (TOG) IC (e.g. MK50242) driven by a 2MHz clock
;
; note frequency(Hz) Output
;  C9    8368.2     D0 (PD0)
;  B8    7905.1     D1 (PD1)
;  A#8   7462.7     D2 (PD2)
;  A8    7042.3     D3 (PD3)
;  G#8   6644.5     D4 (PD4)
;  G8    6269.6     D5 (PD5)
;  F#8   5917.2     D6 (PD6)
;  F8    5586.6     D7 (PD7)
;  E8    5277.0     D8 (PB0)
;  D#8   4975.1     D9 (PB1)
;  D8    4694.8     D10 (PB2)
;  C#8   4434.6     D11 (PB3)
;
; 12 outputs are toggled at various rates to produce 12 square waves at the required frequencies. These
; approximate the 12-tone equal temperament notes C#8 to C9. The program has two functional parts: a note
; creator, which computes the time when the port bit for a particular note should be toggled, and a note
; producer, which does the toggling at the correct time. The pitch of a note is stored as the number of 16MHz
; clock cycles between toggles of its output port bit. The values used are 4 times the dividers used in the
; TOG IC (the clock is 8 times the IC clock, but one cycle of a note requires 2 toggles). The values are
; held in an array in program memory and are copied into working memory (SRAM) at initialisation.
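;
; A rough C model of the arithmetic above (illustrative only, not part of the build; the array
; name is hypothetical). Each value is the number of 16MHz cycles between toggles, so the note
; frequency is f = 16e6 / (2 * period), e.g. 16e6 / (2 * 956) = 8368.2Hz for C9. Summing the
; toggle rate of all 12 notes gives roughly 149,000 toggles per second, i.e. about 107 CPU
; cycles per toggle on average, which is the per-note cycle budget discussed below.
;
;   #include <stdio.h>
;   static const int period[12] = {956,1012,1072,1136,1204,1276,1352,1432,1516,1608,1704,1804};
;   int main(void) {
;       double toggles = 0.0;
;       for (int i = 0; i < 12; i++) {
;           double f = 16e6 / (2.0 * period[i]);   /* note frequency in Hz */
;           printf("%2d %5d %7.1f\n", i + 1, period[i], f);
;           toggles += 2.0 * f;                     /* two toggles per cycle */
;       }
;       printf("cycles per toggle: %.1f\n", 16e6 / toggles);
;       return 0;
;   }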
;
; method - The note creator generates a queue of 2 byte cells of two types. A match value cell holds a value
; to load into a timer comparator register. It is followed by one or more pattern cells, each containing a
; bit pattern to send to the output ports. Only 12 of the 16 bits in a pattern cell are sent to the ports;
; 1 other bit indicates whether another pattern cell follows. The note producer loads the match value into
; the comparator register. When the hardware detects the time has been reached, the note producer sends the
; bit pattern to the output ports. If the cell indicates another cell follows, it is read and output 500nS
; (8 clock cycles) later, the process repeating until a cell indicates there is no following pattern cell.
; The cell after that is assumed to be the next timer match value, which starts the next cell sequence.
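;
; A C sketch of the cell encoding, as read from the code below (illustrative only, not part of
; the build; the function names are hypothetical). Cells are stored high byte first. In a
; pattern cell, bit 0 of the first byte is the continuation flag and bits 1-4 drive PB0-PB3
; (the ISR shifts the byte right before writing PINB); the second byte drives PD0-PD7. Further
; notes due at the same time are ORed into the same cell. A pad cell is 0x01,0x00.
;
;   #include <stdint.h>
;   /* note: 1 (C9) .. 12 (C#8); cont: non-zero if another pattern cell follows */
;   static void pattern_cell(uint8_t cell[2], int note, int cont) {
;       cell[0] = (note > 8) ? (uint8_t)(1u << (note - 8)) : 0;  /* PINB bits << 1 */
;       cell[1] = (note <= 8) ? (uint8_t)(1u << (note - 1)) : 0; /* PIND bits      */
;       if (cont) cell[0] |= 0x01;                               /* continuation   */
;   }
;   /* match cell: 12 bit time, note field removed, rounded down to a multiple of 8 */
;   static void match_cell(uint8_t cell[2], uint16_t timer) {
;       cell[0] = (uint8_t)((timer >> 8) & 0x0F);
;       cell[1] = (uint8_t)(timer & 0xF8);
;   }
;   int main(void) {
;       uint8_t c[2];
;       pattern_cell(c, 12, 1);  /* C#8 on PB3, another cell follows: 0x11, 0x00 */
;       match_cell(c, 0x47FD);   /* note field 4 (C9), time 0x7FD: 0x07, 0xF8    */
;       return 0;
;   }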
;
; The note producer is an interrupt routine activated by a match between the 16 bit TC1 timer and Output
; Compare Register 1A. Although the cells were described as a timer cell followed by one or more pattern
; cells, the interrupt routine actually processes pattern cells first, then before exiting loads the
; comparator from the next timer cell.
;
; The interface between note creator and note producer is a large memory buffer, 3 * 256 byte pages and part
; of a fourth page. The fourth page is used if a pattern cell starts near the end of the third page but a
; series of cells can't fit entirely within the page. The note creator fills pages until it reaches the page
; that the note producer is currently reading, then waits until the producer moves to the next page. This
; keeps the creator well ahead of the producer. This is desirable because the time the creator needs per
; cell varies unpredictably in the short term, but on average the note creator creates cells faster than
; the producer reads them.
;
; The continuation of pattern cells is required if one or more port bits need toggling in less than the
; minimum time to exit the interrupt routine and return again. The minimum time is calculated to be 39 cycles,
; so shorter intervals are dealt with by the continuation method. Each pattern cell takes 8 CPU clock cycles
; to process; with a 16MHz clock this results in transitions at 500nS intervals (a 2MHz rate). When
; transitions are more than 500nS apart but closer than the minimum interrupt time, padding cells are
; inserted. A padding cell has only the continuation bit set; it is sent to the ports like any pattern cell
; but causes no transition because all the port bits are zero. There is a theoretical maximum sequence length
; for pattern cells; reaching it is statistically unlikely, but the program has been written to cater for it.
;
; The note creator has to work out which of the 12 notes is to be output next. There is a timer for each
; note; the note with the lowest value timer is selected for output, its timer is updated with the note period
; and normally some other note will then be selected for output [it is unlikely but possible that C9 is
; followed by C9; this does not affect the logic]. The note timers are kept in an array using registers 0-23,
; 2 registers per note. The array is ordered so the lowest value timer is in registers r23,r22. When a note
; timer is updated, all timers with a lower value than the new one are moved one place down the array
; (overwriting the old value in r23,r22) and the new value is inserted in the gap [down in the array is
; towards higher numbered registers].
;
; The note creator must determine how the note bit is inserted in a bit pattern cell. The options are:
; (1) the time between notes is longer than the interrupt time, so a new match value cell is required,
;     followed by a pattern cell for the note.
; (2) this note toggles at the same time as the previous note, so its bit can be added to that note's pattern
; (3) the note requires a new pattern cell as a continuation of an existing pattern, perhaps with padding cells
; This decision is actually made when processing the previous lowest note, when both timers are in the array
; and it is easy to determine the time difference (if any). So the determination for the next note is made and
; stored (in GPIOR0), and the determination for this note is retrieved and actioned.
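;
; A C sketch of how the three options are told apart, modelled on the first lines of NOTEPROC
; (illustrative only, not part of the build; the enum and function names are hypothetical).
; 'cur' and 'nxt' are the register pairs for this note (r23,r22) and the next one (r21,r20),
; including their 4 bit note fields. The result applies to the next note; the code stores it
; in GPIOR0 and acts on it when the next note is processed.
;
;   #include <stdint.h>
;   enum action { NEW_MATCH_CELL, SAME_CELL, CONTINUATION };
;   /* *pads receives the number of padding cells needed before a continuation cell */
;   static enum action classify(uint16_t cur, uint16_t nxt, int *pads) {
;       uint16_t diff = (uint16_t)(nxt - (cur & 0xFFF8)); /* cur rounded down to 8      */
;       *pads = 0;
;       if (diff & 0x0F00) return NEW_MATCH_CELL;         /* note bits ignored; >= 256  */
;       uint8_t d = (uint8_t)(diff & 0xF8);               /* round down to 8 cycles     */
;       if (d >= 40) return NEW_MATCH_CELL;               /* time to exit the ISR       */
;       if (d == 0) return SAME_CELL;                     /* toggles at the same time   */
;       *pads = d / 8 - 1;                                /* one cell per 8 cycles      */
;       return CONTINUATION;
;   }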
;
; There are about 100 cycles available to process each note. This restriction can only be met by keeping the
; array of timers in registers. A few fortunate properties make this possible. The note and next timer value
; are held in a register pair as [4 bit note:12 bit match time]; the 4 bit field actually holds note + 3,
; the page of the note's code (see EPILOG). The longest note period (1804 cycles, C#8) fits in 11 bits, so
; the 16 bit timer is reset after 0x07FF, but the times in the registers are allowed to overflow into the 12th
; bit until the lowest timer is over 0x07FF, at which time 0x0800 is subtracted from all timers to bring them
; back to 11 bits. This makes comparisons easy because the timers don't wrap around. Comparing register pairs
; would normally not work without removing the 4 bit note information. However, after a CP/CPC pair the half
; carry flag holds the borrow out of bit 11, which is the result of the 12 bit compare, so the registers don't
; need extra processing for comparison. The note information is stripped when the timer is put in the compare
; register as a match value.
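;
; The half carry trick can be checked with a small C model of a CP/CPC pair (illustrative only,
; not part of the build; the function name is hypothetical). H after CPC is the borrow out of
; bit 3 of the high byte, i.e. out of bit 11 of the 16 bit value, so it matches an unsigned
; compare of the low 12 bits whatever the 4 note bits hold. In the note routines, brhs is
; therefore taken when the array timer is earlier than the new value in Z.
;
;   #include <assert.h>
;   #include <stdint.h>
;   /* H flag after: cp a_lo,b_lo ; cpc a_hi,b_hi */
;   static int h_after_cp_cpc(uint16_t a, uint16_t b) {
;       unsigned c = (a & 0xFF) < (b & 0xFF);                /* C from cp         */
;       unsigned nib_a = (a >> 8) & 0x0F, nib_b = (b >> 8) & 0x0F;
;       return nib_a < nib_b + c;                           /* borrow from bit 3 */
;   }
;   int main(void) {
;       for (uint32_t a = 0; a < 0x10000; a += 37)
;           for (uint32_t b = 0; b < 0x10000; b += 41)
;               assert(h_after_cp_cpc((uint16_t)a, (uint16_t)b) ==
;                      ((a & 0x0FFF) < (b & 0x0FFF)));
;       return 0;
;   }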
;
; Because most of the registers are taken up by the note array, there are only a few left for use by the
; creator/producer.
; The producer has dedicated use of r25 and X (r26,r27) and 'borrows and returns' r24 (saved in GPIOR2).
; The creator has use of r24, Y and Z (r28-r31).
;
.MACRO NOTEPROC
.ORG (1<<8)*(@0+3)
;
; processing for a note - entered by indirect jump
;
; parameter 0 is the note (1 to 12). This determines
;     1. location of this code in memory - 0x0400 thru 0x0F00
;     2. Which period is retrieved from the array of periods
;     3. Which bit is set on the output ports
;
; determine how the note after this one is processed
; This note data in r23,22, next note in r21,20
;
	mov     r24,r22 ; round this timer down to multiple of 8
	andi    r24,0xF8
	movw    ZL,r20 ; copy the timer after this one into Z
	sub     ZL,r24 ; subtract this timer
	sbc     ZH,r23
	clc             ; initially assume next note starts a new timer string
	andi    ZH,0x0F ; gets rid of the note info in bits 4:7 and tests for zero
	brne    n@0lab1 ; if not zero, big difference so start a new string
	andi    ZL,0xF8 ; round down to multiple of 8
	cpi     ZL,40   ; less than 40 cycles: need to add to previous bit pattern cells
n@0lab1:
	rol     ZL ; picks up carry - set if ZL was less than 40
; retrieve the processing for this timer, save the processing for the next
	in      r24,GPIOR0
	out     GPIOR0,ZL ; save for next time through
; determine if adding to existing flags, adding new flags with optional padding,
; or start a new timer string
	lsr     r24 ; retrieve the result of cpi ZL,40 into C
	brcc    n@0lab5 ; delay between notes justifies new timer string
	breq    n@0lab2 ; no delay between times, add this note bit to last note
; adding new bit pattern cell, after optional padding
	ld      ZH,Y 
	ori		ZH,0x01 ; set continuation flag
	st      Y,ZH
	adiw	YL,2 ; move to new cell
	subi	r24,8 ; difference = 8? if so no padding
	breq    n@0lab6
	ldi     ZH,0x01 ; pad cell is 0x0100
	clr     ZL
; pad 1
	st      Y+,ZH
	st      Y+,ZL
	subi    r24,8
	breq    n@0lab6
; pad 2
	st      Y+,ZH
	st      Y+,ZL
	subi    r24,8
	breq    n@0lab6
; pad 3
	st      Y+,ZH
	st      Y+,ZL
	subi    r24,8
	breq    n@0lab6
; pad 4
	st      Y+,ZH
	st      Y+,ZL
; maximum 4 pads
	rjmp    n@0lab6

n@0lab2:
; add the bit for this note to previous note bits
.IF @0 < 9
	ldd     r24,Y+1
	ori		r24,1<<(@0-1)
	std		Y+1,r24
.ELSE
	ld      r24,Y
	ori		r24,1<<(@0-8)
	st      Y,r24
.ENDIF
	rjmp	n@0lab7

n@0lab3:
; if there is time to spare, 'call' the counter routine. The anti jitter code in
; the interrupt routine can't handle the delay caused by interrupting a call or
; ret instruction (both 4 cycle instructions) so these are emulated by several
; 1 and 2 cycle instructions. r24 is loaded with the note page address so the
; 'counter' routine knows which page to return to. The address within the page is
; that of the 'cp XH,YH' instruction, which is hard coded into the counter routine.
	ldi     r24,high(n@0labx) ; return page for the 'counter' routine
	ldi     ZH,high(counter)
	ldi     ZL,low(counter)
	ijmp     

n@0lab4:
; lowest selected value > 0x07FF, so all values will now have the most
; significant bit (bit 11) set and will not match the timer (which runs
; to 0x07FF and resets to 0x0000).
; So clear bit 11 (subtract 0x0800) in all values
	ldi     r24,0xF7
	and     r1,r24
	and     r3,r24
	and     r5,r24
	and     r7,r24
	and     r9,r24
	and     r11,r24
	and     r13,r24
	and     r15,r24
	and     r17,r24
	and     r19,r24
	and     r21,r24
	and     r23,r24
n@0lab5:
	sbrc    r23,3 ; count > 0x07FF ?
	rjmp    n@0lab4 ; yes, pull all counter back by 0x0800
; create a new match value cell, after removing note info and rounding down
; to a multiple of 8
	adiw    YL,2 ; move to new cell
	movw    ZL,r22
	andi    ZH,0x0F ; remove note
	andi    ZL,0xF8 ; round down
	st      Y+,ZH ; save the timer match value
	st      Y+,ZL
; when the pointer gets to 0x0400 or greater, it is pulled back to
; the same YL offset in page 0x0100. The exact offset doesn't matter because
; the interrupt routine does the same thing to X, so it will follow
	sbrc    YH,2 ; if into 0x04?? go back to 0x01??
	ldi     YH,0x01

n@0labx: ; counter routine returns here ** HARD CODED **
	cp      XH,YH ; (waiting for interrupt routine to move X)
	breq    n@0lab3
;	breq    n@0labx ; ** DEBUG **
n@0lab6:
; now add the bit pattern for the note
; Y is not advanced, allowing processing of subsequent
; notes to add their bit, or set the
; continue flag without changing Y
.IF @0 < 9
	ldi     ZL,1<<(@0-1) ; value for note
	clr     ZH
	st      Y,ZH
	std     Y+1,ZL 
.ELSE
	clr     ZL
	ldi     ZH,1<<(@0-8) ; value for note
	st      Y,ZH
	std     Y+1,ZL 
.ENDIF
n@0lab7:
; calculate the next timer value for this note - add the period to this time
; result in Z awaiting insertion into the note array
	movw    ZL,r22 ; new timer value for this note
	lds     r24,periods+((@0-1)*2)
	add     ZL,r24
	lds     r24,periods+((@0-1)*2)+1
	adc     ZH,r24
; each note has its own routine to determine where the timer is inserted into the array,
; the specific routine follows the note's instantiation of this macro
.ENDMACRO
.MACRO EPILOG
; epilog - go to another note handler
	mov     ZH,r23 ; pick out the top 4 bits, use them as a vector
	andi    ZH,0xF0 ; to a code page for the particular note
	swap    ZH
	clr     ZL 
	ijmp
.ENDMACRO
; a set of macros to insert a timer (in Z) into the timer array.
.MACRO R0INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,r8
	movw    r8,r6
	movw    r6,r4
	movw    r4,r2
	movw    r2,r0
	movw    r0,ZL
	EPILOG
.ENDMACRO
.MACRO R2INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,r8
	movw    r8,r6
	movw    r6,r4
	movw    r4,r2
	movw    r2,ZL
	EPILOG
.ENDMACRO
.MACRO R4INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,r8
	movw    r8,r6
	movw    r6,r4
	movw    r4,ZL
	EPILOG
.ENDMACRO
.MACRO R6INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,r8
	movw    r8,r6
	movw    r6,ZL
	EPILOG
.ENDMACRO
.MACRO R8INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,r8
	movw    r8,ZL
	EPILOG
.ENDMACRO
.MACRO R10INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,r10
	movw    r10,ZL
	EPILOG
.ENDMACRO
.MACRO R12INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,r12
	movw    r12,ZL
	EPILOG
.ENDMACRO
.MACRO R14INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,r14
	movw    r14,ZL
	EPILOG
.ENDMACRO
.MACRO R16INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,r16
	movw    r16,ZL
	EPILOG
.ENDMACRO
.MACRO R18INSERT
	movw    r22,r20
	movw    r20,r18
	movw    r18,ZL
	EPILOG
.ENDMACRO
.MACRO R20INSERT
	movw    r22,r20
	movw    r20,ZL
	EPILOG
.ENDMACRO
.MACRO R22INSERT
	movw    r22,ZL
	EPILOG
.ENDMACRO
.LISTMAC

.DSEG

.ORG 0x100
buff:     .BYTE	1024
periods:  .BYTE 24 ; working copy of periods (not over a page boundary)
loops:    .BYTE	4 ; accumulate wait times (for debugging)

.CSEG

;
;  ################### NOTE PERIODS ###################
;
.ORG 0x1000

; note half periods in 16MHz clock periods. These are 4 times the divisors used in the
; Top Octave Generator ICs (e.g. 239 for C9) and produce jitter free output.
; Toggle times are rounded down to a multiple of 8 cycles. If the period is divisible
; by 8, it produces a 50% duty cycle. If it is only divisible by 4 the duty cycle isn't
; 50%, but the full period is constant, so when halved in a divider chain all notes will
; have a 50% duty cycle
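;
; A small C sketch of why a half period that is only divisible by 4 still gives a constant
; full period (illustrative only, not part of the build). Toggle times are rounded down to a
; multiple of 8, as the note routines do with 'andi ZL,0xF8'. With 956 the half periods
; alternate between 952 and 960 cycles, but every full period is 1912 cycles.
;
;   #include <stdio.h>
;   int main(void) {
;       const unsigned period = 956;              /* C9 half period in cycles */
;       unsigned t = 0, prev = 0;
;       for (int i = 1; i <= 6; i++) {
;           t += period;                          /* exact toggle time        */
;           unsigned out = t & ~7u;               /* rounded down to 8 cycles */
;           printf("toggle %d at %u (+%u)\n", i, out, out - prev);
;           prev = out;
;       }
;       return 0;
;   }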

initperiod: .DW 956,1012,1072,1136,1204,1276,1352,1432,1516,1608,1704,1804

; this is an alternate set of periods that will make some notes closer to
; the equal temperament scale. The adjusted notes will appear to have jitter
; of +-250nS but after division by 8 in a divider chain this disappears.
;
; .DW 956,1012,1073,1136,1204,1276,1351,1432,1517,1607,1703,1804

;
;  ################### TIMER MATCH INTERRUPT ###################
;
; ONEOUT is the basic pattern cell handler. A byte is output to each of the Pin registers
; of Ports B and D. Bits set to 1 will cause the port to toggle. Bits set to 0 have no
; effect. Only 4 bits are significant for Port B. The first byte of the cell (for Port B)
; also holds the continuation bit in bit 0. This is shifted out into the carry flag (C) by
; a right shift (LSR), which also aligns the Port B bits, and is tested in a subsequent branch.
.MACRO ONEOUT
	ld      r25,X+ ; send one bit pattern to the output ports
	lsr     r25 ; will set C if there is another pattern following
	ld      r24,X+
	out     PINB,r25
	out     PIND,r24
.ENDMACRO
; SEVENOUT is seven pattern handlers with six continuation tests. There is a limit to how
; far the branch that tests for continuation can jump (about 64 words); the SEVENOUT
; code block (41 words) is shorter, so all branches can go to the same place (the parameter @0).
.MACRO SEVENOUT
;
	ONEOUT
	brcc    @0
;
	ONEOUT
	brcc    @0
;
	ONEOUT
	brcc    @0
;
	ONEOUT
	brcc    @0
;
	ONEOUT
	brcc    @0
;
	ONEOUT
	brcc	@0
;
	ONEOUT
.ENDMACRO
; INTEXIT - 12 words, 17 clock cycles
.MACRO INTEXIT
	ld      r25,X+ ; set up the timer match for the next interrupt [1 word]
	sts     OCR1AH,r25 ; [2]
	ld      r25,X+ ; [1]
	sts     OCR1AL,r25 ; [2]
	sbrc    XH,2 ; if into 0x04?? go back to 0x01?? [1]
	ldi     XH,0x01 ; [1]
	in      r24,GPIOR2 ; restore r24 [1]
	in      r25,GPIOR1 ; restore SREG [1]
	out     SREG ,r25 ; [1]
	reti    ; [1]
.ENDMACRO

; Now the timer match interrupt routine. Since there is only one enabled interrupt, the routine
; is entered directly when the interrupt occurs, rather than the more conventional method of
; jumping to another location. Saves 2 clock cycles per interrupt
 
.ORG 0x000A ; is 0x0016 (the entry point for the COMPA interrupt) minus the length of INTEXIT
ISTIM1_exit1:
	INTEXIT
; .ORG    0x0016
; ISTIM1_COMPA: ; Timer1 CompareA
; From interrupt to entering the first SEVENOUT macro takes 13 clock cycles. It takes a minimum
; of 4 clock cycles to reach the IN instruction; this may increase if a multi cycle
; instruction was in progress when the interrupt occurred. The anti jitter routine
; delays 6 cycles if the interrupt latency is 4 cycles, but less if the latency is
; more, down to a minimum of 4 cycles for a 6 cycle latency.
; The note creator decides if a new timer is loaded or a pattern is appended to
; an existing pattern based on the time it would take to exit and re-enter the interrupt
; routine. An appended pattern will be processed 8 cycles later. To exit and re-enter the
; interrupt routine takes 18 cycles to exit (INTEXIT plus one cycle for the branch to get to
; INTEXIT) + 13 for this preamble + 8 for the pattern = 39 cycles, so patterns are
; appended for any transition occurring less than 40 cycles after the previous pattern.
	in      r25, SREG  ; save SREG
	out     GPIOR1,r25
	out     GPIOR2,r24 ; save r24
	in      r24,TCNT0 ; jitter reduction.
	sbrc    r24,1     ; Timer0 normally reads xxxxxx00
	rjmp    PC+3      ; if xxxxxx1x delay 4 cycles [in(1), sbrc(no skip: 1), rjmp(2)]
	sbrs    r24,0     ; if xxxxxx01 delay 5 cycles [in(1), sbrc(skip: 2), sbrs(skip: 2)]
	rjmp    PC+1      ; if xxxxxx00 delay 6 cycles [in(1), sbrc(skip: 2), sbrs(no skip: 1), rjmp(2)]
; Allow for a sequence of 12 pattern cells separated by 4 pads each: 12 + 11 * 4 = 56 cells (8 * 7).
; This is the theoretical maximum but not likely to be encountered. Where a SEVENOUT
; block is too far from an INTEXIT to be reached in a single relative branch, the
; branch is taken to a branch that is closer to the exit. Two exits are provided
; so the first 4 SEVENOUT blocks jump to the lower address exit and the last 4
; jump to the higher address exit. The nearest exit is reached in a maximum of 4 hops.
	SEVENOUT ISTIM1_exit1 ; output up to seven cells
ISTIM1_exit2: ; jump here when too far from exit code, double jump
	brcc     ISTIM1_exit1
	SEVENOUT ISTIM1_exit2 ; output 7 cells
ISTIM1_exit3: ; jump here when too far from exit code, triple jump
	brcc    ISTIM1_exit2
	SEVENOUT ISTIM1_exit3 ; output 7 cells
ISTIM1_exit4: ; jump here when too far from exit code, four jumps
	brcc    ISTIM1_exit3
	SEVENOUT ISTIM1_exit4 ; output 7 cells
	brcc    ISTIM1_exit4
	SEVENOUT ISTIM1_exit5 ; output 7 cells
ISTIM1_exit5: ; jump here when too far from exit code, four jumps
	brcc    ISTIM1_exit6
	SEVENOUT ISTIM1_exit6 ; output 7 cells
ISTIM1_exit6: ; jump here when too far from exit code, three jumps
	brcc    ISTIM1_exit7
	SEVENOUT ISTIM1_exit7 ; output 7 cells
ISTIM1_exit7: ; jump here when too far from exit code, two jumps
	brcc	ISTIM1_exit8
	SEVENOUT ISTIM1_exit8 ; output 7 cells
ISTIM1_exit8:
	INTEXIT

;
;  ################### INITIALISATION ###################
;
.ORG 0

	jmp     start ; Reset

; the timer match interrupt is between 0 and 0x200
 
.ORG 0x200

start:
	ldi     r24,high(RAMEND) ; Main program start
	out     SPH,r24 ; Set Stack Pointer to top of RAM
	ldi     r24,low(RAMEND)
	out     SPL,r24
;
	ldi     XH,HIGH(loops) ;
	ldi     XL,LOW(loops)  ;
	clr     r24            ; ** DEBUG
	st      X+,r24         ; idle time counter
	st      X+,r24         ; not required for
	st      X+,r24         ; normal operation
	st      X+,r24         ;

; copy the note periods from program memory to working memory
	ldi     ZL,low(initperiod<<1)
	ldi     ZH,high(initperiod<<1)
	ldi     YL,low(periods)
	ldi     YH,high(periods)
moveperd:
	lpm     r24,Z+ ; load a scale from program memory
	st      Y+,r24
	cpi     YL,low(periods+24)
	brne    moveperd
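; set up the 12 note counters in r0 to r23. The top 4 bits of each pair hold the
; note's code page (note + 3: 0x4 for C9 in r23,r22 up to 0xF for C#8 in r1,r0).
; Initially, the counter match value for all counters is 0, so all bits will transition
; when Timer1 is zero (the second time around, see initial condition below)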
	clr     XL
	ldi     XH,0x40
	movw    r22,XL
	ldi     XH,0x50
	movw    r20,XL
	ldi     XH,0x60
	movw    r18,XL
	ldi     XH,0x70
	movw    r16,XL
	ldi     XH,0x80
	movw    r14,XL
	ldi     XH,0x90
	movw    r12,XL
	ldi     XH,0xA0
	movw    r10,XL
	ldi     XH,0xB0
	movw    r8,XL
	ldi     XH,0xC0
	movw    r6,XL
	ldi     XH,0xD0
	movw    r4,XL
	ldi     XH,0xE0
	movw    r2,XL
	ldi     XH,0xF0
	movw    r0,XL

; initial condition - assume a timer match value was fetched from location 0x03FA so the interrupt routine
; is positioned (via reg X) to output bits from 0x03FC (set to zero so has no effect). On the first
; timer interrupt, it outputs those bits and loads the next timer match value (set to 0) from 0x03FE.
; Before exit, X is at 0x0400; this is tested and reset to 0x0100. The pattern there is actioned on the
; next interrupt. Prior to the first interrupt, the background task is (via reg Y) at location 0x0100. It
; retrieves data from GPIOR0, which is set to indicate that new bits are to be added to an existing pattern
; cell at that location. The background task will then create data until it reaches 0x0300, at which time
; it spins waiting for the first interrupt.

	clr     r24
	ldi     YH,0x01
	ldi     YL,0x02 ; background pointer start at 0x0102 clear back to 0x0100
	st      -Y,r24  ; 
	st      -Y,r24
	ldi     XH,0x04
	clr     XL
	st      -X,r24 ; interrupt pointer start at 0x0400 clear back to 0x03FC
	st      -X,r24
	st      -X,r24
	st      -X,r24
; GPIOR0 holds timer information for the last bits set up by the mainline.
; Set it to 1, indicating the next note flag(s) are added to the last bit
; pattern. Initially, this is located at 0x0100 and has no bits set. The
; interrupt routine will output it on the second match. The first match
; occurs when the timer hits zero for the first time, resulting in the
; output of a zero pattern from 0x03FC (which does nothing), followed by loading
; a new match value of zero from 0x03FE and the interrupt pointer wrapping to
; 0x0100 (by which time the background task has filled in data from 0x0100 to
; at least 0x0300 and is waiting for the interrupt to move from 0x03FC).
; Normal operation starts when Timer1 hits zero for the second time.
	ldi     r24,0x01
	out     GPIOR0,r24
; Set PORTB to outputs on bottom 4 bits
	ldi     r25,0x0f
	out     DDRB,r25
; Set PORTD to all outputs
	ldi     r25,0xff
	out     DDRD,r25
; Set up Timer0 for jitter removal
; Normal Operation no interrupts
; clkIO/1 (no prescaling)
	out     TCCR0A,YL ; zero - normal operation
	sts     TIMSK0,YL ; zero - no interrupts
; experimentally determined that the counter should start at 1 for the jitter removal
; routine. This reflects the relationship between starting Timer0 and starting Timer1.
; It will need recalculating if the code changes
	ldi     r24,1
	out     TCNT0,r24
	ldi     r24,(1<<CS00) ; start the timer
	out     TCCR0B,r24
; Set up Timer1
; Initial TCNT1 = 0x8000. This is above TOP (ICR1), so the timer first counts up to
; 0xFFFF and wraps to 0, which allows plenty of time (about 2ms) to load up the buffer
	ldi     R24,0x80
	sts     TCNT1H,r24
	sts     TCNT1L,YL
; Set ICR1 = 0x07FF - the value at which timer 1 resets to 0 the next time through
	ldi     r24,0x07
	sts     ICR1H,r24
	sts     ICR1L,r25 ; still 0xFF
; first timer interrupt when timer wraps to 0
	sts     OCR1AH,YL
	sts     OCR1AL,YL
; Normal port operation, CTC mode with TOP = ICR1 (WGM13:0 = 12, reset on ICR1 match)
; clkIO/1 (no prescaling)
	sts     TCCR1A,YL ; still zero
	ldi     r24,(1<<CS10)|(1<<WGM13)|(1<<WGM12)
	sts     TCCR1B,r24

; Enable OCR1A Interrupt
	ldi     r24,1<<OCIE1A
	sts     TIMSK1,r24

; Enable Interrupts
	sei
; first candidate is r23,r22
	EPILOG	; jumps to the first note routine
;
; ################### COUNTING SUBROUTINE ###################
;
; counter: is a debugging aid, it soaks up idle time and counts each time it is
; 'called'. If the mainline uses a CALL/RET it could introduce output jitter.
; They are both 4 cycle instructions and the anti jitter routine only caters
; for 3 cycle instructions. Instead, the 'CALL'ing code loads its high
; address in r24 then jumps to this routine. The return is an indirect jump
; using the high address provided plus a known offset for the low address.
; This is NOT good practice, but fits the circumstances. 
;
counter:
	push    r24 ; save the high address of the 'caller'
	lds     r24,loops+3 ; 4 byte counter, big endian
	inc     r24
	sts     loops+3,r24
	brne    waste1
	lds     r24,loops+2
	inc     r24
	sts     loops+2,r24
	brne    waste1
	lds     r24,loops+1
	inc     r24
	sts     loops+1,r24
	brne    waste1
	lds     r24,loops
	inc     r24
	sts     loops,r24
waste1:
; this code is not required, it just copies the same data that was copied
; on setup. It demonstrates that the period information can be altered while
; the code is running. There is plenty of spare code space, into which could be
; loaded hundreds of alternate 12-tone scales. By using some input mechanism
; (e.g. polling some port input bits) the scales can be altered 'on the fly'.
	push    YL ; save Y
	push    YH
	ldi     ZL,low(initperiod<<1)
	ldi     ZH,high(initperiod<<1)
	ldi     YL,low(periods)
	ldi     YH,high(periods)
waste2:
	lpm     r24,Z+ ; load a scale from program memory
	st      Y+,r24
	cpi     YL,low(periods+24)
	brne    waste2
	pop     YH ; restore Y
	pop     YL
; because all note routines are similar, the low address is the same for
; all. The offset 0x46 (70 words into the note page, label n@0labx) returns
; to the comparison of XH,YH
	ldi     ZL,0x46 ; ** WARNING ** change this if the note macro is modified
	pop     ZH
	ijmp

;
;  ################### NOTE ROUTINES ###################
;
; All note routines are similar, if there were spare cycles only one routine would
; be needed. Because cycles are precious and code space is plentiful, cycles are
; saved by having each routine hard coded for a particular note. Also, different
; strategies are used to determine where in the array an updated note timer should
; be placed. For example: the lowest note always goes to the back of the array, but
; the highest note may be inserted anywhere with a preference for particular 
; positions. The code aims to minimise the comparisons needed for the most common
; insertion points, and the comparisons are arranged so that dropping through
; a comparison is favoured over taking a branch (saving a cycle)
;

;
;  ################### C9  ###################
;
	NOTEPROC 1
;
; do a binary/linear search to see where to insert new value
; n1cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n1cpR10 ;
n1cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhs    n1cpR0
n1cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhs    n1insR4
; n1insR6:
	R6INSERT
n1insR4:
	R4INSERT

n1cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n1insR0
; n1insR2:
	R2INSERT

n1cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n1cpR12
; n1cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n1insR10
; n1insR8:
	R8INSERT
n1insR10:
	R10INSERT

n1insR0:
	R0INSERT ; moved down so branch to cpR10 is in range

n1cpR12:
	cp      r12,ZL
	cpc     r13,ZH
	brhc    n1cpR14
; n1insR12:
	R12INSERT

n1cpR14:
	cp      r14,ZL
	cpc     r15,ZH
	brhc    n1cpR16 ;
; n1insR14:
	R14INSERT

n1cpR16:
	cp      r16,ZL
	cpc     r17,ZH
	brhc    n1cpR18
;n1insR16:
	R16INSERT

n1cpR18:
	cp      r18,ZL
	cpc     r19,ZH
	brhc    n1cpR20
; n1insR18:
	R18INSERT

n1cpR20:
	cp      r20,ZL
	cpc     r21,ZH
	brhc    n1insR22
	R20INSERT

n1insR22:
	R22INSERT
;
;  ################### B8 ###################
;
	NOTEPROC 2
; do a search to see where to insert new value
; n2cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n2cpR10 ;
n2cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhs    n2cpR0
n2cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n2insR6
; n2insR4:
	R4INSERT
n2insR6:
	R6INSERT

n2cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n2insR0
; n2insR2:
	R2INSERT

n2cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n2cpR12
; n2cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n2insR10
; n2insR8:
	R8INSERT
n2insR10:
	R10INSERT

n2insR0:
	R0INSERT ; moved down so branch to cpR10 is in range

n2cpR12:
	cp      r12,ZL
	cpc     r13,ZH
	brhc    n2cpR14
; n2insR12:
	R12INSERT

n2cpR14:
	cp      r14,ZL
	cpc     r15,ZH
	brhc    n2cpR16 ;
; n2insR14:
	R14INSERT

n2cpR16:
	cp      r16,ZL
	cpc     r17,ZH
	brhc    n2cpR18
;n2insR16:
	R16INSERT

n2cpR18:
	cp      r18,ZL
	cpc     r19,ZH
	brhc    n2insR20
; n2insR18:
	R18INSERT

n2insR20:
	R20INSERT

;
;  ################### A#8 ###################
;
	NOTEPROC 3
; do a search to see where to insert new value
; n3cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n3cpR10 ;
n3cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhs    n3cpR0
n3cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n3insR6
; n3insR4:
	R4INSERT
n3insR6:
	R6INSERT

n3cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n3insR0
; n3insR2:
	R2INSERT

n3cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n3cpR12
; n3cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n3insR10
; n3insR8:
	R8INSERT
n3insR10:
	R10INSERT

n3insR0:
	R0INSERT ; moved down so cpR10 is in range

n3cpR12:
	cp      r12,ZL
	cpc     r13,ZH
	brhc    n3cpR14
; n3insR12:
	R12INSERT

n3cpR14:
	cp      r14,ZL
	cpc     r15,ZH
	brhc    n3cpR16 ;
; n3insR14:
	R14INSERT

n3cpR16:
	cp      r16,ZL
	cpc     r17,ZH
	brhc    n3insR18
;n3insR16:
	R16INSERT
n3insR18:
	R18INSERT
;
; ################### A8 ###################
;
	NOTEPROC 4
; do a search to see where to insert new value
; n4cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhs    n4cpR0
; n4cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n4cpR8
; n4insR4:
	R4INSERT

n4cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n4insR0
; n4insR2:
	R2INSERT

n4insR0:
	R0INSERT

n4cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n4cpR10
; n4cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n4insR8
; n4insR6:
	R6INSERT
	 ;
n4insR8:
	R8INSERT

n4cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n4cpR12
; n4insR10:
	R10INSERT

n4cpR12:
	cp      r12,ZL
	cpc     r13,ZH
	brhc    n4cpR14
; n4insR12:
	R12INSERT

n4cpR14:
	cp      r14,ZL
	cpc     r15,ZH
	brhc    n4insR16 ;
; n4insR14
	R14INSERT

n4insR16:
	R16INSERT

;
; ################### G#8 ###################
;
	NOTEPROC 5
; do a search to see where to insert new value
; n5cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n5cpR6
; n5cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n5insR0
	R2INSERT

n5insR0:
	R0INSERT

n5cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n5cpR8

; n5cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n5insR6
	R4INSERT

n5insR6:
	R6INSERT

n5cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n5cpR10
	R8INSERT

n5cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n5cpR12
	R10INSERT

n5cpR12:
	cp      r12,ZL
	cpc     r13,ZH
	brhc    n5insR14
	R12INSERT

n5insR14:
	R14INSERT
;
;  ################### G8 ###################
;
	NOTEPROC 6
; do a search to see where to insert new value
n6cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n6cpR4
n6cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhs    n6insR0
	R2INSERT

n6insR0:
	R0INSERT

n6cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n6cpR6 ;
	R4INSERT

n6cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n6cpR8
	R6INSERT

n6cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n6cpR10
	R8INSERT

n6cpR10:
	cp      r10,ZL
	cpc     r11,ZH
	brhc    n6insR12
	R10INSERT

n6insR12:
	R12INSERT
;
;  ################### F#8 ###################
;
	NOTEPROC 7
; do a linear search to see where to insert new value
; n7cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhc    n7cpR2 ; not lower than new value (12 bit compare), look further down
	R0INSERT

n7cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n7cpR4 ; not lower than new value (12 bit compare), look further down
	R2INSERT

n7cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n7cpR6 ;
	R4INSERT

n7cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n7cpR8
	R6INSERT

n7cpR8:
	cp      r8,ZL
	cpc     r9,ZH
	brhc    n7insR10
	R8INSERT

n7insR10:
	R10INSERT
;
;  ################### F8 ###################
;
	NOTEPROC 8
; do a linear search to see where to insert new value
; n8cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhc    n8cpR2
	R0INSERT

n8cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n8cpR4
	R2INSERT

n8cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n8cpR6
	R4INSERT

n8cpR6:
	cp      r6,ZL
	cpc     r7,ZH
	brhc    n8insR8
	R6INSERT

n8insR8:
	R8INSERT
;
;  ################### E8 ###################
;
	NOTEPROC 9
; do a linear search to see where to insert new value
; n9cpR0:
	cp      r0,ZL
	cpc     r1,ZH
	brhc    n9cpR2
	R0INSERT

n9cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n9cpR4
	R2INSERT

n9cpR4:
	cp      r4,ZL
	cpc     r5,ZH
	brhc    n9insR6 ;
	R4INSERT

n9insR6:
	R6INSERT
;
;  ################### D#8 ###################
;
	NOTEPROC 10
; do a linear search to see where to insert new value
	cp      r0,ZL
	cpc     r1,ZH
	brhc    n10cpR2
	R0INSERT
;
n10cpR2:
	cp      r2,ZL
	cpc     r3,ZH
	brhc    n10insR4
	R2INSERT

n10insR4:
	R4INSERT
;
;  ################### D8 ###################
;
	NOTEPROC 11
; do a linear search to see where to insert new value
	cp      r0,ZL
	cpc     r1,ZH
	brhc    n11insR2
; n11insR0:
	R0INSERT
n11insR2:
	R2INSERT
;
;  ################### C#8 ###################
;
	NOTEPROC 12
; no search needed, always to the back of the queue
	R0INSERT
