Skip to content

Conversation

@glenn-slayden
Copy link
Contributor

This is one example of code where the attempt to increase the cycle count is incorrect due to the use of the += operator. (There may be others)

There are opcode calls such as dd_prefix and fd_prefix which internally increase the cycle count. Any such changes will be discarded/ignored, because the outer += operator has already captured the prior value of self->cycles locally, meaning the addition will be based on that earlier value, and will not include any increase of the self->cycles value that occurred inside the opcode.

Assuming that both the explicit opcode increase of the cycle count and the returned value from the opcode function are both meant to be added, the fix shown in this PR is one way to solve the problem.


Legal notice (do not delete)

Contributors are required to assign copyright to the original author of the project so that he retains full ownership. This ensures that other entities can use the software more easily, as they only need to deal with a single copyright holder. It also provides the original author with the assurance that he can make decisions in the future without needing to consult or obtain consent from all contributors.

By submitting this pull request (PR), you agree to the following:

You hereby assign the copyright in the code included in this pull request to the original author of the Z80 library, Manuel Sainz de Baranda y Goñi, to be licensed under the same terms as the rest of the code. You also agree to relinquish any and all copyright interest in the software, to the detriment of your heirs and successors.

@glenn-slayden
Copy link
Contributor Author

glenn-slayden commented Apr 21, 2025

I should note that this bug causes the emulator to hang in an infinite loop on FD instructions, when calling with z80_run(self, 1) because the cycle count stays at zero and thus never exceeds the limit which allows that API to return. The fix in this PR does seem to solve that problem.

@agaxia agaxia added the bug Something isn't working label Apr 21, 2025
@agaxia
Copy link
Collaborator

agaxia commented Apr 21, 2025

Have you checked that this really happens or is it a theory? Because I have not managed to reproduce the behavior you describe. In addition, self->cycles = insn_table[DATA[0] = FETCH_OPCODE(PC)](self) + self->cycles; is inherently dependent on the compiler's evaluation order, which means that the value of self->cycles could be captured before insn_table[DATA[0] = FETCH_OPCODE(PC)](self) if the compiler evaluates right-to-left. Conversely, += should implicitly require that the RHS is fully evaluated before the LHS is accessed, or at least that's what I have always assumed (specially because the parameter of zuint8 (* Insn)(Z80 *self) is not const).

The following test checks whether the cycle count stays at zero on instructions with the DD or FD prefix when calling with z80_run(self, 1). Please, try it with your compiler and report your results:

#include <Z80.h>
#include <stdio.h>
#include <string.h>

static zuint8 memory[65536];


static zuint8 cpu_read(void *context, zuint16 address)
	{return memory[address];}


static void cpu_write(void *context, zuint16 address, zuint8 value)
	{memory[address] = value;}


int main(int argc, char **argv)
	{
	Z80 cpu;
	zusize cycles;

	cpu.context      = nullptr;
	cpu.fetch_opcode = cpu_read;
	cpu.nop          = cpu_read;
	cpu.fetch        = cpu_read;
	cpu.read         = cpu_read;
	cpu.write        = cpu_write;
	cpu.in           = cpu_read;
	cpu.out          = cpu_write;
	cpu.halt         = nullptr;
	cpu.ld_i_a       = nullptr;
	cpu.ld_r_a       = nullptr;
	cpu.reti         = nullptr;
	cpu.retn         = nullptr;
	cpu.hook         = nullptr;
	cpu.illegal      = nullptr;

	memset(memory, 0, 65536);
	memory[0] = 0xFD;
	memory[1] = 0x00; // FD nop
	memory[2] = 0xDD;
	memory[3] = 0xFD; // DD FD (prefix sequence)
	memory[4] = 0xDD;
	memory[5] = 0x45; // ld b,ixl

	z80_power(&cpu, true);

	cycles = z80_run(&cpu, 1);
	printf("process prefix -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);
	cycles = z80_run(&cpu, 1);
	printf("process opcode -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);
	cycles = z80_run(&cpu, 1);
	printf("process prefix -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);
	cycles = z80_run(&cpu, 1);
	printf("replace prefix -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);
	cycles = z80_run(&cpu, 1);
	printf("replace prefix -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);
	cycles = z80_run(&cpu, 1);
	printf("process opcode -- PC: %04X, cycles: %lu\n", Z80_PC(cpu), (zulong)cycles);

	return 0;
	}
image

@redcode
Copy link
Owner

redcode commented Apr 21, 2025

Thanks for the PR Glenn,

However, according to the standard, the left-hand side (LHS) is evaluated only once. Therefore, the right-hand side (RHS) MUST be evaluated and executed BEFORE the load/modify/store sequence of the LHS occurs.


ISO/IEC 9899:1990

6.3.16.2 Compound assignment

Semantics

A compound assignment of the form E1 op= E2 differs from the simple assignment expression E1 = E1 op (E2) only in that the lvalue E1 is evaluated only once.

ISO/IEC 9899:2011

6.5.16.2 Compound assignment

Semantics

A compound assignment of the form E1 op= E2 is equivalent to the simple assignment expression E1 = E1 op (E2), except that the lvalue E1 is evaluated only once, and with respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. If E1 has an atomic type, compound assignment is a read-modify-write operation with memory_order_seq_cst memory order semantics. 113)


I'm sorry, but I don't think we can approve this PR. Also, your fix introduces undefined behavior because makes the code dependent on the compiler evaluation order, as Sofía pointed out.

However, if you find a compiler that generates the bug you have explained, we will accept this PR with the appropriate modifications that do not introduce UB (e.g., use a variable to store the value returned by the function and then add to self->cycles that variable).

@glenn-slayden
Copy link
Contributor Author

Yes it's a real problem that blocked me and actually occurs in my environment.

However, the environment in this case is far outside your supported platforms: I'm using MSVC in C++/CLI fully managed mode, which also unfortunately requires that the entire compilation unit be treated as C++. Although it's possible that the C vs. C++ language-standard is making a difference, it seems more likely that the much more dramatic compilation of the entire Z80.c file in #pragma managed mode is introducing problems for me.

For example, the target bytecode for this compilation is the ECMA CIL synthetic instruction set, which is a fundamentally stack-based virtual machine. So its possible that the C++/CLI spec allows for the (fetched value of the) left-hand side to be pushed onto the execution stack before considering anything about the RHS.

I can plainly see in the debugger that--as you note--because zuint8 (* Insn)(Z80 *self) is not const, the function call duly mutates the self->cycles member of the struct, but indeed, this is after the compiler has already begun the += operation by stacking the prior value of self->cycles.

@glenn-slayden
Copy link
Contributor Author

I don't think your understanding of the language specs pertains to this case. When they talk about the LHS being evaluated only once, I think they mean the computation of the address. I don't think the specs you quoted discuss the case I submitted above, where the interior contents of the fetched value, and not the fetched value itself, is explicitly mutated by the RHS.

The idea of += not computing the outermost address of the LHS side first defies all my intuitions about C going back decades.

The same problem happens in C#:

struct test
{
    public int x;
}

static unsafe int foo(test* p) { p->x = 999; return 0; }

Code:

var test = new test();

test* ptest = &test;

ptest->x += foo(ptest);

After running this, the value of ptest->x in the outer function is 0, not 999. Because that's what it started out as, and we added 0 to it. The attempted mutation inside the non-const function foo does not, and I think, should not have any effect, because += is an outer-scoping operation, which must be underway by having already established provenance on the interior contents of the struct. It really seems wrong to be otherwise.


Here's another example:

byte* data = stackalloc byte[4];
data[0] = 0xAA;
data[1] = 0xBB;
data[2] = 0xCC;
data[3] = 0xDD;

byte* p = &data[0];

*(p += 2) = *(p = &data[1]);

Clearly, the LHS is evaluated before the RHS. The result is that data[2] is modified to 0xBB, even though p ends up pointing to data[1]. I think everyone expects this. If, as you say, the RHS is evaluated without considering the LHS at all, then this example would end up having p point to data[3]. As above, nobody expects the starting value of p on the LHS to adopt the altered version from the RHS!


I recommend that you fix z80.c to introduce a variable and avoid this problem:

zuint8 cyc = insn_table[DATA[0] = FETCH_OPCODE(PC)](self);
self->cycles += cyc;

@glenn-slayden glenn-slayden reopened this Apr 21, 2025
@glenn-slayden
Copy link
Contributor Author

If E1 has an atomic type, compound assignment is a read-modify-write operation

In this discussion, self-> is most absolutely not an "atomic type". It's hard to make the issue arise with primitive types in the first place. The problem is that you are explicitly modifying the internal contents of the struct by reference, and the compiler has no way of knowing which internal fields of the struct any function will mutate--especially since the function is dynamically coming out of a table at runtime.

The "single access" feature of += doesn't mean that the compiler can't spread the "single" read-modify-write operation as spanning across/over the function call.

@glenn-slayden
Copy link
Contributor Author

Actually, my struct z80 in C++/CLI is a managed value struct z80 { }, which although it is a blittable struct with the traditionally obvious by-value semantics, is nevertheless a "managed value-type" in .NET parlance, which has considerably different semantics from a raw native memory image. This probably explains the different behavior.

What happened was that last week I was compiling the z80.c as a .NET assembly using so-called C++/CLI mixed-mode, where the compilation of your z80 library code was treated as a true native C struct exactly as-is, but within that obscure type of mixed .NET assembly that integrates the managed code directly into the same binary.

In that situation, I didn't have the += problem described above.

Although mixed-mode is a great, albeit aging, legacy technology, it's often the case with that the constant managed-to-native (and back) transitions can be quite a penalty, especially for this emulator, since the z80.h API can be chatty, depending on where the developer chooses to put the transition boundaries.

But I was soon amazed to discover that putting the whole z80.c/z80.h thing simply into managed mode worked 99.44% great. (Except for this PR.) Switching it all to managed mode pretty much seemed to require that the struct z80 become value struct z80--which the C++/CLI compiler sees as very different--even though the memory image/layout of the two are exactly byte-identical!

But meanwhile now it's super-duper fast compared to the mixed-mode, because there are no more managed/native transitions. Thanks.

@redcode
Copy link
Owner

redcode commented Apr 22, 2025

I have asked for opinions on this matter and have been discussing it. At least, since C++17, it is guaranteed that:

https://en.cppreference.com/w/cpp/language/eval_order:

  1. In every simple assignment expression E1 = E2 and every compound assignment expression E1 @= E2, E2 is sequenced before E1.

References

https://en.wikipedia.org/wiki/Sequence_point#cite_note-9
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf


However, the C standard is somewhat ambiguous:

is there a sequence point between function argument evaluation and function call, but not between compound assignment and function call?

C90

imagen imagen

C11

imagen imagen imagen

According to ChatGPT and DeepSeek, there is no UB:

imagen

Case evaluated:

typedef struct {int counter;} S;

int do_something(S *s)
{
    s->counter += 1;
    return 4;
}

int main(int argc, char **argv)
{
    S s;
    s.counter = 0;
    s.counter += do_something(&s);
    return 0;
}

Actually, my struct z80 in C++/CLI is a managed value struct z80 { }, which although it is a blittable struct with the traditionally obvious by-value semantics, is nevertheless a "managed value-type" in .NET parlance, which has considerably different semantics from a raw native memory image. This probably explains the different behavior.

What happened was that last week I was compiling the z80.c as a .NET assembly using so-called C++/CLI mixed-mode, where the compilation of your z80 library code was treated as a true native C struct exactly as-is, but within that obscure type of mixed .NET assembly that integrates the managed code directly into the same binary.

In that situation, I didn't have the += problem described above.

Although mixed-mode is a great, albeit aging, legacy technology, it's often the case with that the constant managed-to-native (and back) transitions can be quite a penalty, especially for this emulator, since the z80.h API can be chatty, depending on where the developer chooses to put the transition boundaries.

But I was soon amazed to discover that putting the whole z80.c/z80.h thing simply into managed mode worked 99.44% great. (Except for this PR.) Switching it all to managed mode pretty much seemed to require that the struct z80 become value struct z80--which the C++/CLI compiler sees as very different--even though the memory image/layout of the two are exactly byte-identical!

But meanwhile now it's super-duper fast compared to the mixed-mode, because there are no more managed/native transitions. Thanks.

OK, that explains a lot, because I haven't seen this problem with compound assignment in my almost 3 decades writing C code.

So, I propose you something:

Fix the PR to not introduce UB and change z80_run and z80_execute in the 12 places where self->cyles += whatever(self) is used (yes, it's not just 1 line, it's 2 in z80_execute and at least 9 in z80_run). Use, as I explained in my previous comment, a temporary variable, and make this optional through a pre-defined macro and a package-specific CMake configuration option. Example:

#ifdef Z80_WITH_FORCED_EVALUATION_ORDER
	zusize step_cycles;
#endif

/* ... */

#ifdef Z80_WITH_FORCED_EVALUATION_ORDER
	step_cycles = insn_table[DATA[0] = FETCH_OPCODE(PC)](self);
	self->cycles += step_cycles;
#else
	self->cycles += insn_table[DATA[0] = FETCH_OPCODE(PC)](self);
#endif

Also, modify CMakeLists.txt, README, README.md, documentation/installation-from-source-code.rst and documentation/integration.rst accordingly by adding support for Z80_WITH_FORCED_EVALUATION_ORDER and explaining what it does.

I think we could accept a PR that does that. But if you are going to do so don't rush, take your time and check that everything is ok before asking for the review.

@glenn-slayden
Copy link
Contributor Author

haven't seen this problem with compound assignment in my almost 3 decades writing C code

I wrote C since 1988, so I'd say the case is quite obscure. Seems so unusual to compute on a field inside a function, by reference, when the computation is already pending externally on the same field? I think the reason we haven't run across it is because it's a strange thing to do.

I don't think it's the best practice to depend on this sketchy area, regardless of the language specs, because even when it works as you intended, it is doing an extra fetch, addition, and store on the main memory of self->cycles, that is, once inside the XY_PREFIX macro, and then again at the bottom of Run(), during the same cycle. I thought you were trying to be aggressive about minimizing that sort of thing.

That is, unless you don't intend for both additions to happen, which I asked earlier.

This is perhaps the more serious problem with the code as it currently stands... whether it works or not, it's not clear what is supposed to happen.

Either mutate the structure completely, including self->cycles within the instruction callbacks,... or have Run() gather the totals externally as a policy. The cases like XY_PREFIX where it is supposed (?) to happen in both (?) places is inefficient, inconsistent, and with all due respect, inelegant.

@redcode
Copy link
Owner

redcode commented Apr 22, 2025

This emulator was written in x86 assembly before being ported to C. Originally, Z80::cycles was incremented inside the instruction functions but it turned out to be more efficient to load the increment into eax and return from the function, allowing the part that controls the main loop to perform the addition and use the result to evaluate whether the loop should continue. This reduced the number of memory accesses. Some functions return 0 because they are special cases. halt() has its own execution loop, which simplifies the main loop in z80_run() and z80_execute(). On the other hand, xy_xy() handles prefix sequences and needs to be able to break the execution when the clock cycle limit is exceeded. This means that the cycles for the XY prefixes must be computed independently, as if they were instructions, making it necessary to do this as well in dd_prefix() and fd_prefix().

It may not look elegant, but from a certain point of view, it is; The goal isn't uniformity in doing everything the same way, but rather strategically using the most efficient and effective method in each specific case.

@glenn-slayden
Copy link
Contributor Author

It occurred to me that the inconsistent treatment by C++/CLI of the "managed" value struct might be resolved by simply marking the Z80::cycles field as volatile. This might allow your desired code design to run correctly. I'll try it when I get a chance.

@agaxia
Copy link
Collaborator

agaxia commented May 3, 2025

It occurred to me that the inconsistent treatment by C++/CLI of the "managed" value struct might be resolved by simply marking the Z80::cycles field as volatile. This might allow your desired code design to run correctly. I'll try it when I get a chance.

All right. Try it and if it works, I'll modify Z80::cycles by making it conditionally volatile-qualified through the Z80_WITH_VOLATILE_CYCLES configuration option:

#if !defined(__DOXYGEN__) && defined(Z80_WITH_VOLATILE_CYCLES)
	volatile
#endif
zusize cycles;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants