33Notes for the developers of the RPN83P app, likely myself in 6 months when I
44cannot remember how the code works.
55
6+ **Version**: 0.8.0-dev (2023-11-29)
7+
8+ **Project Home**: https://github.com/bxparks/rpn83p
9+
10+ ## Table of Contents
11+
12+ - [Debug Statements](#debug-statements)
13+ - [DRAW Mode](#draw-mode)
14+ - [PRIM Prime Factor](#prim-prime-factor)
15+     - [Prime Factor Algorithm](#prime-factor-algorithm)
16+     - [Prime Factor Improvements](#prime-factor-improvements)
17+ - [TVM Algorithms](#tvm-algorithms)
18+
19+ ## Debug Statements
20+
21+ Debugging a TI calculator app written in Z80 assembly language is a royal pain
22+ in the neck. Maybe there exist Z80 debuggers that are useful, but I don't know
23+ how to use any of them. The ` debug.asm ` file contains a number of routines that
24+ I have incrementally added to help me debug this app.
25+
26+ They are normally excluded from the binary. They are included only when the
27+ ` DEBUG ` macro is defined with the ` -DDEBUG ` flag like this in the ` Makefile ` :
28+
29+ ```
30+ $(SPASM) -DDEBUG -I $(SPASM_INC) -N rpn83p.asm $@
31+ ```
32+
33+ Currently, all debug routines are placed in Flash Page 1, but they could easily
34+ be moved to another Flash Page if needed. The routines are placed in the branch
35+ table in ` rpn83p.asm ` , and all of them start with the prefix ` _Debug ` :
36+
37+ - ` _DebugInputBuf `
38+ - ` _DebugParseBuf `
39+ - ` _DebugString `
40+ - ` _DebugPString `
41+ - ` _DebugClear `
42+ - ` _DebugOP1 `
43+ - ` _DebugEEPos `
44+ - ` _DebugUnsignedA `
45+ - ` _DebugSignedA `
46+ - ` _DebugFlags `
47+ - ` _DebugU32AsHex `
48+ - ` _DebugHLAsHex `
49+ - ` _DebugPause `
50+ - ` _DebugU32DEAsHex `
51+
52+ They are called with the usual ` bcall() ` convention like
53+ ` bcall(_DebugUnsignedA) ` . Most of these debugging functions write to the empty
54+ line on the LCD, just below the top Status line. That line is purposely left
55+ unused in the RPN83P app, for the explicit goal of allowing these debug routines
56+ to print to that line without interfering with the normal operation of the app.
57+
58+ These debug statements are intended to have no side effects so that they can be
59+ inserted into most places in the application code, without affecting the logic
60+ of the code being debugged. They should save all modified registers (including
61+ the accumulator A and the flags with the ` AF ` register), save the display cursor
62+ variables ` CurRow ` and ` CurCol ` , and restore these variables at the end of the
63+ routine. It is probably a bug if any of these routines cause side effects,
64+ because that means that adding a debug statement would cause the normal flow of
65+ the application code to change.
66+
667## DRAW Mode
768
8- The secret ` DRAW ` (maybe call it "Debug") modes are activated by the ` 2ND DRAW `
9- command. It prompts the user for a number, like the ` FIX ` or ` STO ` command.
10- Currently 4 modes defined:
69+ The secret ` DRAW ` modes are activated by the ` 2ND DRAW ` command. It prompts the
70+ user for a number, like the ` FIX ` or ` STO ` command. Currently 4 modes are defined:
1171
12- - 0 (drawNodeNormal ): Normal rendering, this is the default.
72+ - 0 (drawModeNormal): Normal rendering, this is the default.
1373- 1 (drawModeTvmSolverI): Single step through the ` I%YR ` TVM Solver
1474 calculations, and show the iteration counter (` tvmSolverCount ` ), and the
1575 internal interest rate variables (` tvmI0 ` , ` tvmI1 ` ) in place of the RPN stack
@@ -23,3 +83,92 @@ Currently 4 modes defined:
2383 always shown, instead of being overwritten by the ` inputBuf ` in Edit mode.
2484 This helps debugging the complex interaction between the input buffer and the
2585 X register.
86+
87+ Any other value is treated the same as 0 (drawModeNormal).
88+
89+ ## PRIM Prime Factor
90+
91+ ### Prime Factor Algorithm
92+
93+ The [USER_GUIDE.md#prime-factors](USER_GUIDE.md#prime-factors) section explains
94+ how to use the ` PRIM ` menu function to successively calculate all the prime
95+ factors of an integer ` N ` in the range ` [0, 2^32) `. The largest prime less than 2^16 is
96+ ` 65521 `. Therefore the longest time that ` PRIM ` can spend is the factorization
97+ of ` 65521*65521=4 293 001 441 `. On a TI-84 Plus, that calculation takes 33
98+ seconds (at 6 MHz) or 13 seconds (at 15 MHz).
99+
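The worst-case input quoted above can be double-checked with a few lines of Python. This is only a sanity check of the arithmetic, not code from the app, and the helper name `is_prime` is made up for this sketch.

```python
def is_prime(n):
    """Plain trial division; slow, but fine for n < 2^16."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Largest prime below 2^16, found by searching downward from 65535.
largest = next(p for p in range((1 << 16) - 1, 1, -1) if is_prime(p))
worst_case = largest * largest

print(largest)                  # 65521
print(worst_case)               # 4293001441
print(worst_case < (1 << 32))   # True
```
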
100+ Here are some notes about how the ` PRIM ` algorithm works:
101+
102+ - The basic algorithm tests every candidate prime factor from 2
103+ to ` sqrt(N) `.
104+ - We could simply start at 3 and increment by 2 to test every odd number to
105+ ` sqrt(N) `. But we can do slightly better. All prime numbers ` >=5 ` are of the
106+ form ` 6k-1 ` or ` 6k+1 `. So each iteration can increment by 6, but perform 2
107+ checks. This effectively means that we step by 3 through the candidate prime
108+ factors, instead of just by 2 (for all odd numbers), which makes the loop 50%
109+ faster.
110+ - We use integer operations instead of TI-OS floating point ops. If I recall,
111+ this makes it about 2-3X faster (floating point ops in TI-OS are surprisingly
112+ fast).
113+ - Z80 does not support integer division operations in hardware, so we have to
114+ write our own in software. The integer size of ` N ` is limited to 32 bits, so
115+ we need to write a ` div(u32, u32) ` routine.
116+ - But the loop only needs to go up to ` sqrt(N) ` , so we actually only need a
117+ ` div(u32, u16) ` routine, which if I recall is about 2X faster. This is because
118+ the bit-wise loop is reduced by 2X, but also because the divisor can be
119+ stored in a 16-bit Z80 register, instead of stored in 4 bytes of RAM.
120+ - Finally, we actually don't need a full ` div() ` operation for the ` PRIM `
121+ function. We don't need the quotient, we need only the remainder. So we
122+ implement a custom ` mod(u32, u16) ` function which is about 25% faster than the
123+ full ` div(u32, u16) ` function.
124+
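The steps in the list above can be sketched in Python. This is a model of the idea only, not a transcription of the Z80 code: the function names `mod_u32_u16` and `smallest_prime_factor` are hypothetical, and the shift-and-subtract loop is my assumption about the general shape of a software remainder routine.

```python
def mod_u32_u16(n, d):
    """Remainder of a 32-bit n divided by a 16-bit d, by shift-and-subtract.

    One iteration per bit of the dividend. Only the remainder is kept;
    the quotient bits are never stored.
    """
    assert 0 <= n < (1 << 32) and 0 < d < (1 << 16)
    rem = 0
    for bit in range(31, -1, -1):
        rem = (rem << 1) | ((n >> bit) & 1)  # shift in the next dividend bit
        if rem >= d:
            rem -= d
    return rem

def smallest_prime_factor(n):
    """Smallest prime factor of n (2 <= n < 2^32); returns n if n is prime."""
    for p in (2, 3):
        if mod_u32_u16(n, p) == 0:
            return p
    c = 5  # candidates come in pairs: 6k-1 (c) and 6k+1 (c + 2)
    while c * c <= n:
        if mod_u32_u16(n, c) == 0:
            return c
        if mod_u32_u16(n, c + 2) == 0:
            return c + 2
        c += 6
    return n

print(smallest_prime_factor(4293001441))  # 65521
```

Each pass of the `while` loop advances by 6 but performs 2 remainder checks, which is the "step by 3" behavior described above.
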
125+ I think there are additional micro-optimizations left on the table that could
126+ make the ` PRIM ` function maybe 1.5X to 2X faster, without resorting to a
127+ completely different algorithm. But I suspect that the resulting code would be
128+ difficult to understand and maintain. So I decided to stop here.
129+
130+ ### Prime Factor Improvements
131+
132+ For completeness, here are some improvements that could be made in the prime
133+ factoring algorithm:
134+
135+ 1. The ` PRIM ` function currently returns only the smallest prime factor. It must
136+ be manually called repeatedly to get additional prime factors. But each time
137+ it is called, the search for the next prime factor restarts at 2 and loops to
138+ sqrt(N).
139+
140+ This is inefficient because the search should have started at the *last*
141+ prime factor, since all candidates smaller than that number have already been
142+ tested. We could implement another function (maybe call it ` PRFS `) that
143+ returns *all* prime factors of a number ` N `. It could be more efficient
144+ by restarting the loop at the previous prime factor. However, this new
145+ function would need support for vectors in the RPN83P app so that it can
146+ return multiple numbers as the result. Vectors unfortunately are not
147+ currently (v0.7.0) supported.
148+ 1. The [Prime Number
149+ Theorem](https://en.wikipedia.org/wiki/Prime_number_theorem) tells us that
150+ the number of prime numbers less than ` n ` is roughly ` n/ln(n) `. Since we
151+ restrict our input to the ` PRIM ` function to 32-bit unsigned integers, the
152+ largest prime factor that we need to consider is ` sqrt(2^32) ` or ` 2^16 `. That
153+ means that the number of candidate prime factors that we need to consider is
154+ roughly ` 65536/ln(65536) ` or about ` 5909 `. According to the [Prime Counting
155+ Function](https://www.dcode.fr/prime-number-pi-count), the actual number is
156+ ` 6542 `. (Apparently, the ` n/ln(n) ` expression *underestimates* the actual
157+ number of primes).
158+
159+ We could pre-calculate those 6542 prime numbers into a table, consuming 13084
160+ bytes (using 16-bit integers), which is less than one flash page (16 KiB) of
161+ a TI calculator. The ` PRIM ` function would need to iterate only 6542 times
162+ through this table. In comparison, the current algorithm effectively
163+ increments through the candidates by 3, up to ` 2^16 `, so about 21845
164+ iterations. The lookup table method would be about 3.3X faster, but would increase
165+ the app flash memory size by at least 13084 bytes (most likely another flash
166+ page, so 16 KiB).
167+
168+ I'm not sure if the increase in flash size is worth it, but the ` PRIM `
169+ function could be made blindingly fast, finishing the toughest prime factor
170+ problem (of less than ` 2^32 ` ) in about 10 seconds on a TI-84+ calculator.
171+
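Both improvements above can be modeled in a short Python sketch, which also reproduces the counts quoted in the text. Nothing here exists in the app: `primes_below` and `prime_factors` are hypothetical names, the Python list stands in for a table stored in flash, and the `%` operator stands in for the app's remainder routine.

```python
import math

def primes_below(limit):
    """Sieve of Eratosthenes: all primes p < limit, in increasing order."""
    is_prime = bytearray([1]) * limit
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, limit):
        if p * p >= limit:
            break
        if is_prime[p]:
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = 0
    return [p for p in range(limit) if is_prime[p]]

def prime_factors(n, table):
    """All prime factors of n (2 <= n < 2^32), smallest first.

    The scan over the table is never restarted: after a factor is divided
    out, the search continues from that same prime.
    """
    factors = []
    for p in table:
        if p * p > n:
            break
        while n % p == 0:
            factors.append(p)
            n //= p
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

table = primes_below(1 << 16)
print(len(table))                             # 6542 primes below 2^16
print(2 * len(table))                         # 13084 bytes as 16-bit integers
print(round((1 << 16) / math.log(1 << 16)))   # 5909, the n/ln(n) estimate
print(prime_factors(4293001441, table))       # [65521, 65521]
print(prime_factors(360, table))              # [2, 2, 2, 3, 3, 5]
```
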
172+ ## TVM Algorithms
173+
174+ See [TVM Algorithms](TVM.md).