For floating-point code, the `<=` comparison may indeed be slower (by one instruction), even on modern architectures. Here's the first function:
```c
int compare_strict(double a, double b) { return a < b; }
```
On PowerPC, this first performs a floating-point comparison (which updates `cr`, the condition register), then moves the condition register into a general-purpose register (GPR), shifts the "compared less than" bit into place, and returns. That is four instructions in total.
Now consider this function instead:
```c
int compare_loose(double a, double b) { return a <= b; }
```
This requires the same work as `compare_strict` above, but now there are two bits of interest: "was less than" and "was equal to." Combining them into one bit takes an extra instruction, `cror` (condition register bitwise OR). So `compare_loose` requires five instructions, while `compare_strict` requires four.
You might think that the compiler could optimize the second function like so:
```c
int compare_loose(double a, double b) { return !(a > b); }
```
However, this handles NaNs incorrectly. `NaN1 <= NaN2` and `NaN1 > NaN2` must both evaluate to false, so `!(NaN1 > NaN2)` would wrongly evaluate to true.