Ultimately the lesson I took away from it was that you shouldn't believe low-level performance claims without micro-benchmarks, and even then you should take them with a large grain of salt.
You raise a good point. So I microbenchmarked it - took the three versions of the code and threw lots of randomly generated arrays at them (the same arrays for each piece of code, naturally).
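For anyone who wants to try this sort of comparison themselves, here is a minimal sketch of such a harness in Python. It is not the code that produced the numbers in this post. It assumes that "rationalising" an array means stripping the nulls out of it (None stands in for null here), and the two candidate functions, compact_loop and compact_comprehension, are hypothetical stand-ins rather than the three versions from this thread. The one thing it does copy is the method: every candidate is timed against the same pre-generated arrays.

```python
import random
import time


def make_arrays(count, length, filled, seed=0):
    """Build `count` lists of `length` slots, each with exactly `filled` non-None entries."""
    rng = random.Random(seed)  # fixed seed so every run sees the same data
    arrays = []
    for _ in range(count):
        arr = [None] * length
        for i in rng.sample(range(length), filled):
            arr[i] = rng.randint(0, 1_000_000)
        arrays.append(arr)
    return arrays


# Hypothetical candidate 1: explicit loop that copies the non-None entries.
def compact_loop(arr):
    out = []
    for x in arr:
        if x is not None:
            out.append(x)
    return out


# Hypothetical candidate 2: the same filtering as a list comprehension.
def compact_comprehension(arr):
    return [x for x in arr if x is not None]


def bench(fn, arrays):
    """Return the wall-clock time in milliseconds to run `fn` over every array."""
    start = time.perf_counter()
    for arr in arrays:
        fn(arr)
    return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    count, length = 200, 1000
    for filled in (0, 100, 500, 900, 1000):
        arrays = make_arrays(count, length, filled)
        # Check the candidates agree before trusting any timings.
        assert all(compact_loop(a) == compact_comprehension(a) for a in arrays)
        for name, fn in (("loop", compact_loop),
                         ("comprehension", compact_comprehension)):
            ms = bench(fn, arrays)
            print(f"{name} - rationalised {count} arrays "
                  f"of size {filled}/{length} in {ms:.0f}ms")
```

Timings from something like this will wobble from run to run, so it is worth repeating each measurement a few times and swapping the order of the candidates before reading anything into a small difference.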
The result?
My thrown-out-there code and Water's original suggestion, the one he was sure would be more effective, are basically the same. There's a millisecond in it here and there when processing thousands of arrays containing a thousand elements - sometimes one is faster, sometimes the other. They're basically the same regardless of how dense the arrays are - I tried them with 50% nulls, 10% nulls and 90% nulls, and they were within a millisecond of each other every time.
Water's other code, the one he was sure would be slower? Faster, except at the extremes: when array density is very low, or when the arrays are completely full or completely empty, the others take over. An example run:
iapetus' attempt - rationalised 5000 arrays of size 5000/10000 in 270ms
Water's attempt - rationalised 5000 arrays of size 5000/10000 in 97ms
Water's other approach - rationalised 5000 arrays of size 5000/10000 in 283ms
An example run with low array density:
iapetus' attempt - rationalised 5000 arrays of size 500/10000 in 60ms
Water's attempt - rationalised 5000 arrays of size 500/10000 in 84ms
Water's other approach - rationalised 5000 arrays of size 500/10000 in 61ms
An example run with high array density:
iapetus' attempt - rationalised 5000 arrays of size 9000/10000 in 147ms
Water's attempt - rationalised 5000 arrays of size 9000/10000 in 101ms
Water's other approach - rationalised 5000 arrays of size 9000/10000 in 133ms
And a couple of edge cases:
iapetus' attempt - rationalised 5000 arrays of size 10000/10000 in 24ms
Water's attempt - rationalised 5000 arrays of size 10000/10000 in 139ms
Water's other approach - rationalised 5000 arrays of size 10000/10000 in 27ms
iapetus' attempt - rationalised 5000 arrays of size 0/10000 in 41ms
Water's attempt - rationalised 5000 arrays of size 0/10000 in 57ms
Water's other approach - rationalised 5000 arrays of size 0/10000 in 44ms
All of which tells us... well, not very much more than we knew before.