division is roughly 20-40 CPU cycles on a lot of hardware. multiplication is 3-4. if you're dividing by the same value on every pass through a loop, you're paying that tax a million times for no reason.
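one reciprocal. computed once. everything else is multiplication. (with floats the last bit can differ from a true divide, which is why the compiler won't do this for you unless you opt in.)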
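a minimal sketch of the idea in C, assuming a loop that scales an array of doubles by a divisor only known at runtime. the names `scale_div` and `scale_mul` are made up for illustration.

```c
#include <stddef.h>

/* baseline: one floating-point division per element */
void scale_div(double *out, const double *in, size_t n, double d) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / d;
}

/* reciprocal computed once, outside the loop; the loop body is a multiply */
void scale_mul(double *out, const double *in, size_t n, double d) {
    const double inv = 1.0 / d;   /* the only division */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * inv;     /* may differ from in[i] / d in the last bit */
}
```

the two versions agree exactly when the reciprocal is exactly representable (a power-of-two divisor, for example). otherwise expect occasional last-bit differences. how much time you actually save depends on the CPU and on whether the loop is limited by memory instead of arithmetic, so measure it.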
if your loop branches on the data, sorting the array first can make it around 6x faster. same data. same algorithm. the CPU just stopped guessing wrong.
a mispredicted branch flushes the entire pipeline. 15-20 wasted cycles. per wrong guess.
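a rough sketch of the kind of loop this applies to, in C. the threshold of 128 and the names `sum_large`, `cmp_int` and `sort_ints` are assumptions for illustration, not anything from a specific codebase.

```c
#include <stddef.h>
#include <stdlib.h>

/* data-dependent branch: whether it is taken depends on each element's value */
long long sum_large(const int *data, size_t n) {
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= 128)   /* random data: close to a coin flip per element */
            sum += data[i];
    }
    return sum;
}

/* three-way comparison for qsort, written to avoid overflow from x - y */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* sort in place so the branch above goes false, false, ..., true, true */
void sort_ints(int *data, size_t n) {
    qsort(data, n, sizeof data[0], cmp_int);
}
```

call `sum_large` on random values, then `sort_ints`, then `sum_large` again: same answer, very different branch pattern. two caveats. the sort itself costs time, so it only pays off if you run the loop many times over the same data. and an optimizing compiler may turn the `if` into a conditional move or vectorize the loop, in which case the gap mostly disappears.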