Buggy Loop
It's just interesting information, as I had no clue Sony were doing that.
It's more about deploying ML R&D infrastructure worldwide for PlayStation, but it's an interesting insight into their direction. Teams used to buy small GPU systems and run them on hobbyist-grade infrastructure: hardware scattered across offices, researchers maintaining their own setups, no visibility, no isolation, no scheduler, etc. They centralized all of that into a large-scale GPU cluster. From an infosec standpoint, the old way of doing things with local systems is probably also part of why Sony studios have been hacked multiple times.
It's mainly IT, but it's interesting to hear them talk about the deployment problems they had and their reflections on allocation and scaling. It went from "hey, we're Sony, we know how to deploy servers, right?" to hiring a company that had more experience than them, which still had launch-day fuckups.
What's interesting here is that PlayStation has big ML ambitions, so much so that for a long time any GPUs they added were basically put to use by their researchers immediately.
A few key slides from the presentation, so you don't have to sit through 40 minutes of IT talk. An interesting tidbit here is the cost explosion. Will their ML R&D tackle that eventually? I guess that's one of the goals of scaling up ML.