So, regarding the point that L2 and especially L3 cache misses are not covered even by obscenely large out-of-order ROB windows (Apple's A-series cores can only look ahead ~600 instructions, and that is the window of opportunity to find independent instructions to execute [~224 entries on Zen 2]... and even then, with branches you may fall into replay traps and other hazards anyway), the reply is "well, not a problem if you have the data in the cache before it is needed..."? Well... sure, but I think you are both saying the same thing. Back of the envelope: even at a sustained 8 instructions per cycle, a ~600-entry window buys you roughly 75 cycles of slack, which is in the ballpark of an L2/L3 hit but nowhere near the 300+ cycles of a DRAM access.
OOOE engines are designed to maximise parallelism by finding independent work that can be scheduled, and thus indirectly cover L1 misses mostly and, depending on L3 availability and latency, L2 misses too. If we are talking about covering misses beyond that, I think we are exaggerating. I need to look into more studies, but there was an older one that actually linked overly aggressive memory-instruction reordering and wide OOOE windows to higher miss rates and lower efficiency... so it's not trivial:
https://citeseerx.ist.psu.edu/docum...&doi=8ffeda1abde50055e9b2308cc8c05c17e7dac2dc