I mean, yeah, it's cool. But it's just a scriptable sound processor, and the idea seems similar to how one would do graphics. You've got a pipeline: you put in basic primitives and/or meshes/scene descriptions, process that shit through the pipeline, and have stages that each transform the output of the previous stage in a fixed way (no shaders in that era), until you've got your culled, projected, and rasterized frame in the frame buffer, which you can then show on the screen. They applied the same philosophy to how they compose sound. My main question before seeing this video was whether the scripting "language" is Turing complete. It seems like it isn't. And the Tetris thing is just injecting code through an exploit in the sound engine that lets you access memory.
It's all fascinating, but nothing too crazy. It is of course unexpected if you'd expect the game to just play some MIDI files, which they probably would have preferred for most games if they hadn't had the space limitations of N64 cartridges. But it all seems like a sensible evolution from what they did during the SNES era. Still, it's great engineering.