Yeah, the final row (I've completed everything previous to that now) does seem like it contains some rather heavy stuff. I feel this way about most of the early levels. Probably 1/3 of my total playtime was the game sitting with Signal Multiplier open and me occasionally going over to it and trying some new way to beat Cyan's score (I eventually did, and our solutions are nothing alike, but I did only get mine after looking at his). It's great how easy it is to try something radically different from what you were doing before.
The later ones definitely feel tedious to me in the same way that later SpaceChem levels did, though. Sequence Sorter is probably the best example of one that I really have no desire to try to optimize or rework. For a bunch of those I basically just went through and got a solution, and then maybe parallelized it for some easy cycle gains, before moving on. I've had the most fun trying to optimize some of the really simple levels. I'm really happy with my 127 on Differential Converter (#3) or my 232 on Comparator (#4), for example, because those are the ones where it's fun for me to keep trying new kinds of solutions. I like the later ones too, but the appeal is in doing them once and not in going back and doing them over and over to minimize cycles.
Anyway, I'm exceedingly curious about how your image test pattern programs work, especially #2. I really thought the one I just implemented was rather close to optimal (given that the output node has 0% idle time according to the game and spends what I considered the maximum possible amount of time just pushing "pixels"), and it looks good enough compared to everyone else on my list, but yours still saves over 100 cycles (almost 10%!).
(Note: "exceedingly curious" as in "this won't let me sleep", not as in "tell me". Please don't tell me

In terms of optimization, my first major breakthrough idea to get good scores on the cycles metric was realizing that you can (and should) use loop unrolling
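
For anyone who hasn't run into the term: here's a toy sketch of why loop unrolling helps with cycle counts, written in Python rather than the game's own assembly. Everything in it is an assumption made up for illustration: the cost model (one step per emitted value, one to decrement a counter, one to jump back), the function names `rolled` and `unrolled`, and the width of 30. It is not anybody's actual solution to any level, and real cycle counts in the game will differ.

```python
# Toy cost model (an assumption, not the game's real timing):
#   emitting one value costs 1 step,
#   decrementing the loop counter costs 1 step,
#   the jump back to the top of the loop costs 1 step.

def rolled(n, value=3):
    """Emit n values with an ordinary counted loop; return (output, steps)."""
    out, steps = [], 0
    remaining = n
    while remaining > 0:
        out.append(value)      # emit
        remaining -= 1         # decrement
        steps += 3             # emit + decrement + jump
    return out, steps

def unrolled(n, factor=4, value=3):
    """Same output, but the loop body is repeated `factor` times per pass."""
    out, steps = [], 0
    remaining = n
    while remaining >= factor:
        out.extend([value] * factor)   # `factor` emits written out back to back
        remaining -= factor            # one decrement covers all of them
        steps += factor + 2            # emits + one decrement + one jump
    while remaining > 0:               # leftover values use the plain loop
        out.append(value)
        remaining -= 1
        steps += 3
    return out, steps

if __name__ == "__main__":
    print(rolled(30)[1], unrolled(30, 4)[1])
```

Both functions produce the same output; the unrolled one just pays the counter and jump overhead once per four values instead of once per value. The price is more lines of code, which matters when each node only holds a handful of instructions.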