Real nightmare fuel.
Also, does the AI decide on its own what to do?
If so, I at least consider it impressive that it attempted to do the pose with Liquid raising his fist.
It depends on the training data (videos in this case). It is most likely some sort of Transformer-based network, so it has learned correlations between the elements of a sequence (here, frames). Essentially, it is trained to generate the next or missing frames in a sequence so that they match what came before, based on its built-up latent space. Most generative Transformers are still autoregressive: they generate the next element in a sequence based on what has come before, append it to the input, and repeat the process. Obviously, Sora and other networks are a bit more sophisticated in their training process and probably have some post-processing going on. But most of the time, it's basically just brute-forcing quality with an enormous quantity of training data on large neural networks, and not so much using really sophisticated architectures.
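To make the "generate, feed back in, repeat" idea concrete, here is a minimal sketch of an autoregressive loop. Everything in it is an assumption for illustration: `toy_next_frame` is a hypothetical stand-in that just extrapolates from the last two frames, whereas a real model would be a trained network, and a "frame" here is only a short list of numbers.

```python
from typing import Callable, List

Frame = List[float]  # toy "frame": a short list of pixel values


def toy_next_frame(context: List[Frame]) -> Frame:
    """Hypothetical stand-in for a trained network.

    Linearly extrapolates from the last two frames. A real model
    would be a learned function of the whole context.
    """
    if len(context) < 2:
        return list(context[-1])
    prev, last = context[-2], context[-1]
    return [b + (b - a) for a, b in zip(prev, last)]


def generate(model: Callable[[List[Frame]], Frame],
             prompt: List[Frame], n_new: int) -> List[Frame]:
    """Autoregressive loop: predict one frame, append it, repeat."""
    frames = [list(f) for f in prompt]
    for _ in range(n_new):
        # The prediction becomes part of the input for the next step.
        frames.append(model(frames))
    return frames


video = generate(toy_next_frame, [[0.0, 1.0], [1.0, 3.0]], n_new=3)
print(video)
# [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0]]
```

The point is only the shape of the loop: nothing outside the model "decides" what happens next, each new frame is whatever the model predicts from the frames so far.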
TL;DR: It decides what the next frame should be based on the previous frames in the input sequence and what it "knows", which is based on what it has learnt (the training data) and how it has learnt it (the loss function and the training objective, such as next-element prediction or missing-element prediction, which is a variation on sequence prediction).
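For the "how it has learnt" part, here is a rough sketch of how the two training objectives mentioned above turn one clip into training examples. The helper names, the `MASK` placeholder, and the choice of mean squared error as the loss are all assumptions for illustration, not what any particular model actually uses.

```python
from typing import List, Optional, Sequence, Tuple

MASK = None  # hypothetical placeholder standing in for a hidden frame


def next_frame_examples(seq: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Next-element prediction: predict frame t from all frames before t."""
    return [(list(seq[:t]), seq[t]) for t in range(1, len(seq))]


def masked_frame_example(seq: Sequence[str],
                         hidden_index: int) -> Tuple[List[Optional[str]], str]:
    """Missing-element prediction: hide one frame and predict it from the rest."""
    visible = [MASK if i == hidden_index else f for i, f in enumerate(seq)]
    return visible, seq[hidden_index]


def mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, one common loss for continuous values (an assumption here)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


clip = ["f0", "f1", "f2", "f3"]  # frame labels stand in for real image data

print(next_frame_examples(clip))
# [(['f0'], 'f1'), (['f0', 'f1'], 'f2'), (['f0', 'f1', 'f2'], 'f3')]

print(masked_frame_example(clip, 2))
# (['f0', 'f1', None, 'f3'], 'f2')

print(mse([1.0, 2.0], [1.0, 4.0]))
# 2.0
```

During training, the model's prediction for each target is scored with the loss, and the weights are nudged to reduce it; that is where what it "knows" comes from.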