-
Notifications
You must be signed in to change notification settings - Fork 25
code clean up and some optimisations #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
code clean up and some optimisations #1
Conversation
|
Can you explain the code logic the comments? How are you using time? Why not just base the open and close events on a straight pulse count based on the tooth wheel? Are you trying to relate valve duration to RPM at this point? |
| hallcounter++; | ||
| timeGap = millis() - timeGap; | ||
|
|
||
| if (timeGap >= (int)(1.5 * last_timeGap)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand why this is cast to an (int) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably to avoid a casting for timeGap to a float. Only numbers of the same type can be compared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. It should probably be a (long) to keep it consistent with the type declaration of these variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider multiplying the timeGap by 10 and last_timeGap by 15 to prevent calling into the floating point libraries. See this: https://github.com/nkapron/Freevalve_Arduino/blob/054797a65dec251340a225ef7532224851ff92c9/valve_control.ino#L142
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider multiplying the timeGap by 10 and last_timeGap by 15 to prevent calling into the floating point libraries. See this: https://github.com/nkapron/Freevalve_Arduino/blob/054797a65dec251340a225ef7532224851ff92c9/valve_control.ino#L142
Actually I like this approach quite a lot!
@ricallison floats are not "convinient" numbers to work on in computers, they don't exactly have the value you think they have, also conversion will happen automatically to compare an int to a float, as such losing accuracy. My way of casting the float to int basically turncates the float number to decimals only, i.e. 14.7 => 14 and 14.3 => 14, which was ok since the comparison was for larger, but the multiplication method is better and more accurate, even if potentialy a smige slower because now we have to do 2 multiplications, instead of a casting
|
Other than my one minor question nice code cleanup dude! +1 |
|
General advice, avoid using print inside an interrupt. Print statements take a considerable long time and in this case being on time is crucial for the success of the program. The workaround is to use a flag variable that you set inside the interrupt to indicate a change and in the loop function check for the flag, print, and reset. Also, all variables that are used both inside an interrupt and outside must be declared volatile to instruct the compiler no to cache their value. void loop() {
if(magnet_detected){
Serial.print(degree)
magnet_detected = 0;
}
}
void magnet_detect() {
hallcounter++;
timeGap = millis() - timeGap;
if (timeGap >= last_timeGap * 3 / 2) {
hallcounter = 1;
cycle = -1 * cycle;
}
last_timeGap = timeGap;
degree = hallcounter * 6;
if (cycle > 0)
digitalWrite(INTAKE_V, degree <= 180);
else
digitalWrite(EXHAUST_V, 180 <= degree);
magnet_detected = 1;
}Note, you can use a bool expression to set a pin value which makes the code more readable than nested ifs. |
|
@valeriu-balaban Wow my mind is 100% blown right now!🤯 I’m a high performance street car turbo engine tuner, and also somewhat of a computer programmer. This video randomly appeared on my YouTube recommended tonight and I was intrigued to see a GitHub and wanted to see what’s up and Aha! It’s buzzing with activity in the past 4 hours! I just learned a few things from your improvements that were just the kind of thing I need for the iOS jailbreak programming I’m doing right now. The idea of controlling my race engine’s valves with code instead of a custom camshaft grind like I have now, is truly next level! 👏🏼👏🏼👏🏼👏🏼 I’ve always posted my tuning results publicly and I always hated what I call “Secret Hoarders” - racers who won’t dare share any details for fear that if anyone learns their “secret”, then they’ll easily be defeated and won’t feel so special anymore. So it’s a triple-whammy for me to have this video with the code operated valves, tuning an engine performance, and using open source code & discussion to do so!🏆 |
Valid questions all of them. I didn't change the logic, I just gave it a clean up and mved thing around a bit. From what I can see the timing is depedent on the RPM. Time is used to decide wheter the sensor is above a tooth/magnet or not, if not it will take more time than before to register the next tooth/magnet, approx. 1.5*(time from tooth n-2 to n-1). The count already happens, it's the hallcounter. Hope these answer your questions.
I left the prints where they were since the OP is still testing the system and the prints can be very usefull, otherwise I agree with you on that abd I like the way you're using the comparison in the digital write. Although, since this is a time critical system, we should test which version would be faster. |
| degree = hallcounter * 6; | ||
| if (cycle > 0) { //Bool wasn't working so this dumpster fire got started. | ||
| Serial.print("STATE 1 "); | ||
| if (degree <= 180) { //EDIT HERE INTAKE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment on PR #2 - a single degree value conditional per "cam" is not enough - there should be a range and it can also overlap between phases/cycles https://github.com/Tamaren1/Freevalve_Arduino/pull/2/files#r543460537
|
tl;dr compiler no dummy lds r17, 0x013D ; 0x80013d <last_timeGap+0x1>
lds r18, 0x013E ; 0x80013e <last_timeGap+0x2>
lds r19, 0x013F ; 0x80013f <last_timeGap+0x3>
movw r26, r18
movw r24, r16
lsr r27
ror r26
ror r25
ror r24
add r24, r16
adc r25, r17
adc r26, r18
adc r27, r19
cp r20, r24
cpc r21, r25
cpc r22, r26
cpc r23, r27
brcs .+46 ; 0x598 <_Z13magnet_detectv+0xc8>Fairly lengthy, but that's because we're using 32 bit values on an 8 bit cpu. lds r19, 0x013D ; 0x80013d <last_timeGap+0x1>
lds r20, 0x013E ; 0x80013e <last_timeGap+0x2>
lds r21, 0x013F ; 0x80013f <last_timeGap+0x3>
ldi r26, 0x03 ; 3
ldi r27, 0x00 ; 0
call 0xa0c ; 0xa0c <__muluhisi3>
call 0xa2e ; 0xa2e <__umulhisi3
mul r26, r18
movw r22, r0
mul r27, r19
movw r24, r0
mul r26, r19
add r23, r0
adc r24, r1
eor r1, r1
adc r25, r1
mul r27, r18
add r23, r0
adc r24, r1
eor r1, r1
adc r25, r1
ret ; end of __umulhisi3
mul r26, r21
add r25, r0
mul r27, r20
add r25, r0
mul r26, r20
add r24, r0
adc r25, r1
eor r1, r1
ret ; end of __muluhisi3
lsr r25
ror r24
ror r23
ror r22
cp r12, r22
cpc r13, r23
cpc r14, r24
cpc r15, r25
brcs .+46 ; 0x594 <_Z13magnet_detectv+0xc4>It's twice as many instructions and has two calls. Significantly better than floating point operations, but not better than the original. And remember, the call instruction takes 4 cycles, and the return takes 4. Original code takes 24 cycles, "optimized" takes 65 (assuming branch is taken, else 23 and 64). As in, ~2.7x as long to execute that segment. As for the digitalWrites, the compiler still only does one call to write for each pin and needs to conditionally store a 0 or 1 in the argument anyway. So I get it, and honestly I prefer the single line version, but if you're going to be optimizing digitalWrite it'll have to be done with direct port manipulation. The long and short of it is that the compiler knows what's up. Before sacrificing understanding from the author, maybe see if you need to. |
|
@Neywiny - Great analysis. What's your opinion on using |
|
Thank you. For that you basically, take the downsides of *3 / 2, and double them. It's doing 2 calls to __mul... meaning 8 calls/returns (I did check the assembly, that's not speculation/opinion). As in, if the AVR ISA manual is correct in putting each at 4 cycles, it spends more time just calling and returning than the original code did in total, let alone everything else it's doing. If I'm counting correctly, it works out to 112 cycles. Which is 4.66x. And the worst part is that this is all just for the if statement guard. I suppose the lesson here why I personally don't use the Uno for time sensitive applications. Even with hardware multiplication it's just too slow. |
|
Indeed a nice and detailed analysis. Didn't expect that, my intuition was that the compiler will optimize the two and produce the same byte code but I guess the Arduino does not enable any compiler optimizations.
The main advantage of Arduino is simplicity, a lot of resources, and makes the project accessible to a larger community. I think we can throw any code inside the interrupt as long as the interrupt is executed fast. Now, the question is how fast is fast and how to measure it. For an engine running at @Neywiny made a good point, the interrupt code must be precisely tracked in terms of cycles. Also, the loop code I wrote above is also not good, there are not enough cycles in the Arduino to send data over serial after each interrupt. This complicates debugging but does not mean an Arduino can not control the engine valves. Note, pushing the interrupt latency to 100% CPU utilization is not a good practice (it just made the math easier above) since interrupts might not be serviced on time or at all. Usually, the CPU utilization of all interrupts is kept below 20% to 50% depending on the application. To measure the duration of the interrupt we can use the volatile uint8_t interrupt_duration = 0;
volatile uint8_t magnet_detected = 0;
void loop() {
if(magnet_detected){
magnet_detected = 0;
if(interrupt_duration > 16){ // Interrupt longer than 1024 clock cycles
Serial.print("!");
}
}
}
void magnet_detect() {
uint8_t start_tcnt0 = TCNT0;
// Interrupt code here
interrupt_duration = TCNT0 - start_tcnt0;
magnet_detected = 1;
}Note, not sure if Timer 0 is initialized by default when the code does not call any timing related function. |
|
Similar to your TCNT0, we can actually pull the least significant byte of the millis() result faster. This is because the millis() function has some steps for ensuring all 4 bytes are returned correctly, but if we don't care about that (which we don't) we can avoid a few instructions. (For those curious, it disables/re-enables interrupts and saves/restores status register). It's important to work in values that the ALU operates in. Using 8-bit values is crucial here. And for anyone asking, because this does take some getting used to, no matter what the original value was it can keep track of time as long as the difference is < 256. For example, if the last interrupt fired at 184 milliseconds, and we pull the least significant byte of millis() which gets 183, the math works out. 183-184 = -1, which is represented as 255. We can verify this using 16-bit numbers. 184+255=439, in which case the lower byte is 183. The engine would need to be spinning very slowly for us to overflow that. We need < 170 mS between interrupts for our *1.5 comparison, so it needs to be doing one revolution every ~10 seconds. Which is fine. By working in 8-bit values execution time is ~1/4 what it would be otherwise, and more things can stay in registers which is good for speed. Likewise, doing things like multiplying by 6 just for readability is wasteful. I prefer to say something like What does this all mean? It means that I have gotten the code to ~51 cycles with optimizations and clean up. Total. Depending on what's needed for debugging in loop(), a lot of these values should get moved to be static inside magnet_detect. Cleans up the namespace, but no speed up. const uint8_t hall = 2;
uint8_t degrees_6 = 0;
uint8_t hallstate = 0;
uint8_t lasthallstate = 0;
uint8_t triggerTime;
uint8_t last_triggerTime;
uint8_t timeGap;
uint8_t last_timeGap;
uint8_t state;
uint8_t cycle;
extern uint32_t timer0_millis; // need to tell compiler it'll find this variable later
void setup() {
pinMode(hall, INPUT);
Serial.begin(115200);
pinMode(13, OUTPUT); // 13 is the pin the exhaust valve is connected to
pinMode(12, OUTPUT); // 12 is the pin the input valve is connected to
attachInterrupt(0, magnet_detect, RISING);//Initialize the intterrupt pin digital pin 2
}
void loop()
{
//leave this here just to be extra extra super duper sure we'll still have millis() running
Serial.println(millis());
delay(255);
}
void magnet_detect() {
degrees_6++;
triggerTime = timer0_millis;// note you need a "extern uint32_t timer0_millis;" somewhere
timeGap = triggerTime - last_triggerTime;
last_triggerTime = triggerTime;
if (timeGap >= last_timeGap + last_timeGap / 2) {
degrees_6 = 0; // changing this from 0 won't be the end of the world
cycle += 1;// ++, +=, who cares
}
last_timeGap = timeGap;
if (cycle % 2 == 0)
if (degrees_6 <= 30) { // adding lower bounds will add a clock cycle or two
PORTB |= 1 << 5;
} else {
PORTB &= ~(1 << 5);
}
else {
if (30 <= degrees_6) {
PORTB |= 1 << 4;
} else {
PORTB &= ~(1 << 4);
}
}
}If you count the cycles of each instruction its > 51, but remember we're only doing 1 degree_6 comparison. gpio setting, and return each time. Where do we go from here? Once the final values are known, we can do some tricks to generate a 256 byte table, aligned to 256 bytes, indexed by an 8-bit register. The most significant bit being the cycle 0-1, the remaining being the degrees_6 as they're < 127. This collapses the entire Now this is pod racing. |
All well and good except the interrupt is occurring every 1ms with a 60 tooth wheel at 1000rpm, necessitating the use of a usec time base to span any useful rpm range. Solve the problem, then optimize. |
|
Original code used millis(), so I used millis(). If we need micros(), we can do micros(). It's the same problem and optimization, just replace |
|
Again, solve the problem, then focus on optimization. The unsigned 8 bit integers will overflow when using a usec time base. There is no need to focus on optimizing every cycle while experimenting. Logic will almost certainly have to be changed from this point going forward. Focusing on optimizing for speed in the middle of developing the logic will only make that more cumbersome. |
|
My understanding from the video was that the logic is fine as the engine did start eventually. As for overflow, we just up the prescaler to 256. Easy fix. At 10,000 RPM we'll see 100 uSec between interrupts regularly. 256 prescaler gives us 16uS of precision, allowing us to go to 4.096 mS, which is ~250 RPM. we could even switch from starting up to running on the fly with a dynamic prescale switch after a certain RPM from 1024 to 256, allowing us to start the engine at 60 RPM, which is well in range of regular cranking speed. Or, switch functions to use 16-bit values at the start then go down to 8-bit once we're running. Also, this pull request is on optimization. If you don't want to be looking at optimizations, don't. |
|
It's good to know the constraints of critical optimizations so you can keep them in mind when working on the logic. There are some key information points missing in the test code we have been looking at here;
My guess would be the most common use is a v8 with an RPM range from 1000 to 8000. |
|
That is true, there's a lot we don't know about the final goals of this project. As you point out, constraints are important and at the moment we don't have very many except "be fast and easy". Luckily, the optimizations still hold true whether we're using microseconds or milliseconds or anything else. We should be working in 8-bit values wherever we can and avoiding any unnecessary function calls. If the timing wheel has 120 teeth my table idea still holds out. It potentially could with switching the high byte as needed for up to 255 teeth but I'm unsure on the speed implications. I do think though that using timing to measure the start of the wheel is not the greatest solution, though it is the easiest for the hardware. It would be nice if the other single pin interrupt could be connected to another sensor with one pulse/revolution, that way none of the timing matters anyway. As for number of cylinders, that's going to be a limitation on the chip and board. There are 20 pins for us on the R3 board, since we need 2 per cylinder anyway we can budget for my desired second sensor. That gives us 10 pairs to work with. Note that the Analog pins can also be used as digital pins. So max of 10 configurations of intake/exhaust open/closed. I'm not too much of a gear head, but I think often cylinders are run in a lockstep equivalent, so that should be fine. The nice thing about my table idea is that even if each of the 10 pairs has a different mapping for duration and phase and whatnot, you can just tabulate that and it'll take care of it, and to top it all off it'll be done in parallel just as a real camshaft would do it. The more I think about it the more I like it. Some initial testing shows that compared to 6 calls to digitalWrite, indexing a table is ~15x faster, and no change when it's stored in flash or changing alignment. |
|
A second pulse per rotation would solve all the timing issues for me. Going to add it to #3. |
have a look in the comments as well, I think there's more that can be done to improve performance.