It's just a way of making the thumbnail more interesting. Facial expressions (mostly human ones) convey a huge amount of information, which makes the thumbnail inherently more distinctive.
Selection processes, which happen naturally on YouTube because of its high traffic, produce this outcome: thumbnails that draw more clicks win out, so expressive faces spread.
Some of the creepier or weirder outcomes of these processes are exemplified by the following picture:
In short: blame the normies.