[Coco] Devastated. Long term OVCC project falls short

Fri Oct 4 12:47:51 EDT 2019

On 2019-10-04 10:19 a.m., James Ross wrote:
> Those are quite surprising results, for sure.  I see why you were taken aback.
> 
> I wonder why the assembly version for Linux would be that much slower than the Windows version.  There’s got to be something weird going on there – there should not be a big difference between the two running essentially the same native code (at least on the same class of processor).

That could even be down to process layout causing the running code to
overflow the processor caches. When running the tests in isolation, you
would likely have been well within the processor's cache, so you
wouldn't have been taking the cache miss penalty (which can be quite
brutal). You could even see the results flip, with Windows showing a
huge slowdown and Linux not, simply by adding or removing code
elsewhere in OVCC.
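
To make the cache effect concrete, here is a rough C sketch (not taken
from OVCC; the names touch_lines and time_touches and the 64-byte cache
line size are my own assumptions). It does the same number of memory
reads over a working set of a chosen size, hopping between lines in a
scrambled order so the hardware prefetcher is less likely to hide the
misses.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define LINE_BYTES 64  /* assumed cache line size; common, not universal */

/* Read one byte from each visited cache line, `touches` times in total.
   `buf` must hold nlines * LINE_BYTES bytes and `nlines` must be a
   power of two. The index sequence line = (5 * line + 1) mod nlines
   visits every line once per cycle. The returned checksum keeps the
   compiler from discarding the loop. */
uint64_t touch_lines(const uint8_t *buf, size_t nlines, size_t touches)
{
    uint64_t sum = 0;
    size_t line = 0;
    size_t mask = nlines - 1;

    for (size_t i = 0; i < touches; i++) {
        sum += buf[line * LINE_BYTES];
        line = (line * 5 + 1) & mask;
    }
    return sum;
}

/* CPU time in seconds taken by touch_lines(), measured with clock(). */
double time_touches(const uint8_t *buf, size_t nlines, size_t touches,
                    uint64_t *sum_out)
{
    clock_t start = clock();
    *sum_out = touch_lines(buf, nlines, touches);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_touches() with the same touch count on a buffer of a few
KiB and then on one of many MiB should show the difference between
running inside the cache and spilling out of it. How big the gap is
depends entirely on the machine, so treat any numbers as local.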

Differences between compilers can have a huge effect, too. For
instance, one may be building a 64 bit binary and the other a 32 bit
binary. Which optimizations the compiler knows about, or will use, also
matters a great deal. What you might be seeing is that the compiler
you're using to build the Windows version is worse at optimization than
the one used to build the Linux version, or is optimizing for a
different target architecture. It may be counter-intuitive, but what is
optimal on one CPU may actually be pessimal on another, even if they're
the same "architecture".
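
One way to check this, sketched under the assumption that both builds
can use GCC or Clang: compile the same small function with different
options and compare the generated assembly. The function names below
are made up for illustration and have nothing to do with OVCC.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer width of this build in bits, typically 32 or 64. A quick way
   to confirm what kind of binary the compiler actually produced. */
unsigned build_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* A deliberately plain loop. Depending on the optimization level and
   target options, a compiler may emit a scalar loop, an unrolled one,
   or a vectorized one for this same source. */
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

With GCC, for example, building this with "-O2 -S", then with
"-O2 -march=native -S", and (where 32 bit libraries are installed) with
"-m32 -O2 -S" gives three listings to compare. I would expect them to
differ, but how much depends on the compiler version and the CPU.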

Something else to keep in mind is that on modern CPUs, the shortest code 
is often the slowest, especially when mixed in with other running code. 
Even the order of two instructions can make a huge difference, even if 
they don't look related.
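
A small C sketch of the "shortest is not fastest" point (my own example,
not code from OVCC): two ways of summing an array of doubles. The short
version makes every addition wait for the previous one. The longer
version keeps four independent running sums, which an out-of-order CPU
can often overlap.

```c
#include <stddef.h>

/* Shortest form: one accumulator, so each add depends on the last. */
double sum_short(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Longer form: four independent accumulators, combined at the end.
   More instructions, but shorter dependency chains. */
double sum_unrolled(const double *v, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];

    return (s0 + s1) + (s2 + s3);
}
```

Whether the second one actually wins depends on the CPU and on what the
compiler does with each loop, so it needs measuring. Note too that
floating point addition is not associative, so the two functions can
differ in the last bits for general input.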

Both of the above are actually good arguments for letting compilers 
figure out how to order instructions on modern CPUs. Or choose which 
instructions to use for that matter.