[Coco] Devastated. Long term OVCC project falls short
William Astle
lost at l-w.ca
Fri Oct 4 12:47:51 EDT 2019
On 2019-10-04 10:19 a.m., James Ross wrote:
> Those are quite surprising results, for sure. I see why you were taken aback.
>
> I wonder why the assembly version for Linux would be that much slower than the Windows version. There’s got to be something weird going there – there should not be a big difference between the two running essentially the same native code (at least on same class of processor)
That could even be down to process layout causing the running code to
overflow the processor caches. When running the tests in isolation, you
would likely have been well within the processor's cache so you wouldn't
have been taking the cache miss penalty (which can be quite brutal). You
could even see the results flip, with Windows showing a huge slowdown
and Linux not, simply by adding or removing code elsewhere in OVCC.
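
To make the cache effect a bit more concrete, here is a minimal sketch in C. The function name sum_strided is made up for illustration and has nothing to do with the OVCC source. It adds up the same array either sequentially (stride 1) or by hopping through it with a large stride. The answer is identical, but on a buffer much bigger than the cache the strided walk will typically take far more cache misses. How much slower it actually runs depends on the CPU, the buffer size, and whatever else is running, so treat it as a demonstration and not a benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of OVCC.
 * Sums every element of buf exactly once, visiting elements `stride`
 * apart on each pass. stride == 1 walks memory sequentially; a stride
 * of 16 or more uint32_t values typically lands on a different cache
 * line for every access. The returned sum is the same either way. */
uint64_t sum_strided(const uint32_t *buf, size_t n, size_t stride)
{
    assert(stride > 0);
    uint64_t total = 0;
    for (size_t start = 0; start < stride; start++) {
        /* One pass: start, start + stride, start + 2*stride, ... */
        for (size_t i = start; i < n; i += stride) {
            total += buf[i];
        }
    }
    return total;
}
```

Timing sum_strided(buf, n, 1) against sum_strided(buf, n, 4096) on a buffer of a few hundred megabytes is one way to see the penalty, assuming the compiler does not rearrange the loops for you.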
Also, differences in the compiler can matter a great deal. For
instance, one might be building a 64 bit binary and the other a 32 bit
binary. Things like which optimizations the compiler knows about, or
will use, can also make a huge difference. What you might be seeing
is that the compiler you're using to build the Windows version is less
good at optimization than the one used to build the Linux version, or is
optimizing for a different target architecture. It may be
counter-intuitive, but what is optimal on one CPU may actually be
pessimal on another, even if they're the same "architecture".

Something else to keep in mind is that on modern CPUs, the shortest code
is often the slowest, especially when mixed in with other running code.
Even the order of two instructions can make a huge difference, even if
they don't look related.

Both of the above are actually good arguments for letting compilers
figure out how to order instructions on modern CPUs. Or choose which
instructions to use for that matter.
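
As an illustration of the point about the shortest code not necessarily being the fastest, here is a small sketch in C. Both function names are invented for this example. The first version is the obvious short loop, where every addition has to wait for the previous one. The second is longer but keeps four independent running totals, which gives a superscalar CPU more work it can overlap. Whether it is actually faster depends on the CPU and on what the compiler does with each version, so this is only a sketch of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not OVCC code.
 * Short version: a single accumulator, so each add depends on the
 * result of the one before it. */
uint64_t sum_short(const uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}

/* Longer version: four accumulators with no dependencies between
 * them inside the loop body, combined once at the end. */
uint64_t sum_unrolled(const uint32_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a += buf[i];
        b += buf[i + 1];
        c += buf[i + 2];
        d += buf[i + 3];
    }
    /* Handle the 0 to 3 elements left over. */
    for (; i < n; i++) {
        a += buf[i];
    }
    return a + b + c + d;
}
```

A good optimizing compiler may well turn the first form into something like the second (or vectorize it) without being asked, which is rather the point.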