30 fps because Nintendo keyframed every animation. Putting it up to 60 fps would just make the animations cycle through faster. It'd look really weird. Getting around it would require redoing the animations. No simple fix.
How are you sure that they are key framing every frame (I assume you meant every frame, since all animations have "key frames")? And how do you know that's why they limited it to 30fps? I'm asking because if you're getting frame drops at 30fps, it's highly unlikely that it could be pushed to 60fps without massive frame drops.
Also, 3d animation is trivial to shift from 30fps to 60fps, or whatever FPS you want. Even if EVERY frame was key framed (which is a horrible way to do things, and I would say there's a 100% chance that's not the case), you can still tween between key frames. It's not as smooth, but it's possible. Another fallback (if the unlikely was happening) would be to duplicate animation frames, which would give you 30fps character animation but 60fps camera and player movement in the world and everything else. I doubt this would ever be necessary, though (Rayman does it because it's 2D animated, and every frame is "drawn" separately).
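To make the frame duplication fallback concrete, here is a minimal sketch. The function name and the 30/60 defaults are made up for illustration, and this is not a claim about how any particular game does it. Each rendered frame just picks the hand-authored pose that covers its point in time, so every pose is held for two rendered frames.

```python
def held_pose_index(render_frame, render_fps=60, anim_fps=30):
    """Index of the authored pose to show on a given rendered frame.

    Hypothetical helper: with 30fps poses and a 60fps renderer,
    each pose is simply shown twice in a row.
    """
    return (render_frame * anim_fps) // render_fps

# First six rendered frames at 60fps map onto three authored poses.
poses = [held_pose_index(n) for n in range(6)]
print(poses)  # [0, 0, 1, 1, 2, 2]
```

The character still updates 30 times a second, but the camera and everything else can be redrawn on all 60 frames.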
How does that work? Does it simply add frames with coordinates averaged from the previous and next frame for all data points, or is there some clever way of doing it that accounts for inverse kinematics, center of mass etc?
This depends on the engine, but to answer the question you have to go back to how 3d art/animation is created.
1) 3d models are created in a program like 3D Studio, Maya, or the like, which gives you a 3d mesh.
2) Then, for things that are complex (like NPCs), bones are created, which essentially give you fewer control points to move/animate. They're virtual, so the player never sees them, and it beats having to animate every vertex on a complicated 3d mesh directly.
3) So you have a smaller number of control points that you can "animate". You can decide to do forward or inverse kinematics on them. Forward is the most basic, which means you rotate at the joints to make your animation. Inverse is more complicated: you connect multiple joints in a virtual group, and you can move the end point (say the hand), and the rest of the joints in the chain will move to allow that end point to be placed where it needs to be (the elbow and wrist will bend automatically). It's good for walking, as you can get feet to not slide around. This is the key to your question, though. These "angles" or "positions" in forward or inverse kinematics are "key framed". So the animator will pick key points in time where the animation is right. For an arm moving, you would have a key frame at the beginning of the animation, and then one at the end. The engine/animation player will calculate the in-between moves. To add more "flair" you can add as many key frames in between as you want that deviate from the automatic "tween".
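Here's a rough sketch of what "the engine calculates the in-between moves" can look like for a single joint angle. The names (`sample`, `elbow`) and the data are invented for the example, and it assumes plain linear interpolation on one angle. Real engines usually interpolate rotations as quaternions, but the idea is the same: the key frames are stored against time, and you can ask for the pose at any time you like.

```python
from bisect import bisect_right

def sample(keyframes, t):
    """Return the joint angle at time t (seconds).

    keyframes: list of (time_seconds, angle_degrees), sorted by time.
    Linear interpolation between the two surrounding key frames.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)          # first key frame strictly after t
    t0, a0 = keyframes[i - 1]
    t1, a1 = keyframes[i]
    u = (t - t0) / (t1 - t0)            # 0..1 progress between the two keys
    return a0 + (a1 - a0) * u

# Two key frames: elbow goes from 0 to 90 degrees over one second.
elbow = [(0.0, 0.0), (1.0, 90.0)]

# The same two key frames can be sampled at any frame rate.
frames_30 = [sample(elbow, n / 30) for n in range(31)]
frames_60 = [sample(elbow, n / 60) for n in range(61)]
```

Nothing about the key frames changes between the two lists. Sampling at 60fps just asks the same curve for twice as many points.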
What Thunder_monkey is suggesting is that Nintendo opted not to use tweening and hand animated every frame @ 30fps, and thus the engine can't tween up to 60fps. I guess another possibility is that the source files are "exported" to the engine in a way that strips out any tweening and key frames every frame, but this would only be a problem if Nintendo didn't have any of the source animation files.
Tweening can be rudimentary, averaging and whatnot, but it can also be complicated where you can have ease in/ease out and even curves to better control the motion.
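As a small illustration of the difference, assuming a single value tweened between two key frames: `lerp` is the rudimentary version, and `ease_in_out` is one common easing curve (smoothstep) that starts slow, speeds up, and slows down again. The function names are just for this sketch, and animation tools typically offer editable curves rather than one fixed formula.

```python
def lerp(a, b, u):
    """Plain linear tween: u runs from 0 (at a) to 1 (at b)."""
    return a + (b - a) * u

def ease_in_out(u):
    """Smoothstep easing: slow start, fast middle, slow end."""
    return u * u * (3.0 - 2.0 * u)

def tween(a, b, u, easing=None):
    """Tween from a to b, optionally reshaping progress with an easing curve."""
    if easing is not None:
        u = easing(u)
    return lerp(a, b, u)

# A quarter of the way through a 0 -> 90 degree move:
linear_angle = tween(0.0, 90.0, 0.25)               # 22.5
eased_angle = tween(0.0, 90.0, 0.25, ease_in_out)   # 14.0625, still easing in
```

Both versions hit the same start, middle, and end values. Only the pacing in between differs.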
My opinion is that animation isn't a factor in why it's 30fps... But I'm just looking from the outside, and it just doesn't make sense to me. Feel free to prove me wrong.