
Google: Project Genie | Experimenting with infinite interactive worlds

Anyone thinking that this technology is only a few steps away from making full games would do well to learn about Cargo Cults


[image: cargo cult aircraft replica]


This is not a few visual refinements away from flying.

[image: cargo cult radio replica]


This is not a generation or two away from communicating.

Creating a facsimile of a game will not entice the gods to deliver you a game. Though I do fear that this tech will successfully entice Cargo Cult-believing investors to divert money from traditional game developers to AI bro startups in the hope that they will deliver the riches.
 
that doesn't work tho. you can't even define game mechanics in this thing, let alone keep them consistent from one moment to the next.

how do you prototype a gameplay idea, if the obstacle in front of you just disappears the moment it leaves the camera view? how do you test game mechanics if they aren't kept consistent and work differently every time you press the button?

this tech, as seen here, is quite literally useless for anything. it has no practical use in any field or any production.
It reminds me of the golf game on the Philips CD-i, an amazing visual illusion but too incoherent to sell even the most rudimentary level of persistence that was essential to modern gaming even back then.

The other thing is that these are GaaS cloud AI, even if they get past the other pitfalls. Unless they can make these AI models run locally on commodity hardware at proper interactive frame rates, it is still only going to be part of the puzzle IMO, in the same way CD fast storage, 3D, audio and physics accelerators all were.

I could see aspects of this already being part of the pipeline by the end of next-gen games, mostly to replace complex simulation for inconsequential particle FX that by their very nature have no real need for persistence and can be calculated far more cheaply as a two-dimensional generative image matching a real 3D game scene's changing context frame by frame. But unless they have more tricks, this offers little to replace real game engines.
 
the only way to not see it as AI slop is if you have both no clue how video games work and no clue how these generative AI models work.

because if you knew both of these things, you'd also know why this is not even remotely useable for anything other than a tech demo that people play around with for an hour and then discard again.

but I guess I shouldn't expect someone on a gaming forum to know how video games function... that's my bad.


"how video games work"

Man that's way over my head. Keep the technical talk to yourself, nobody is going to understand you...

I've been gaming since NES, have a computer science degree, and have been using and following AI since its infancy.

But please enlighten us oh wise one...
 
I hate this.
I mean, don't get me wrong, it's fun that you can, for example, draw something or use a picture, put it into the AI and suddenly move around inside it, but... it kind of stops there.

This would kill creativity, interesting level design, etc... I really wouldn't want to see AI-generated worlds.

But what annoys me the most is everyone wanting to force this. I don't get it. All these comments about "game devs are cooked", "this is the future of games", "this is how games will be made now", so many comments trying to force a particular future on us with their remarks.

I also find it very strange how people don't realize the amount of stuff required to make an interesting game. I see so many posts of people saying that this AI can create games in one day; I even saw a post saying that GTA 6 will be much less impressive now because of this. I mean, come on, seriously? Only someone who has never played any games and has no idea what games involve could say something like this.

But go ahead, don't play GTA 6, play a generated video game based on a GTA 6 screenshot, and we'll see how long you have fun with that. I know I know, this is just the beginning blablabla, of course, but people act like it's already better than current games and that it must absolutely replace how games are made, and I don't get why.
 
It reminds me of the golf game on the Philips CD-i, an amazing visual illusion but too incoherent to sell even the most rudimentary level of persistence that was essential to modern gaming even back then.
FMVs are actually a good historical comparison: a new breakthrough in tech that convinced a whole bunch of tech companies that this was the future of games, only for the realities of its limitations to come crashing in, with few people adopting it, though not completely invalidating its potential uses.

We actually already have good use cases for AI tech in games, such as procedural facial and body animation, voice synthesis, procedural generation of scenarios, or the ever-present DLSS and even ray tracing. I don't see much future in things like Genie 3 specifically, though.
 
It reminds me of the golf game on the Philips CD-i, an amazing visual illusion but too incoherent to sell even the most rudimentary level of persistence that was essential to modern gaming even back then.

The other thing is that these are GaaS cloud AI, even if they get past the other pitfalls. Unless they can make these AI models run locally on commodity hardware at proper interactive frame rates, it is still only going to be part of the puzzle IMO, in the same way CD fast storage, 3D, audio and physics accelerators all were.

I could see aspects of this already being part of the pipeline by the end of next-gen games, mostly to replace complex simulation for inconsequential particle FX that by their very nature have no real need for persistence and can be calculated far more cheaply as a two-dimensional generative image matching a real 3D game scene's changing context frame by frame. But unless they have more tricks, this offers little to replace real game engines.

FMVs are actually a good historical comparison: a new breakthrough in tech that convinced a whole bunch of tech companies that this was the future of games, only for the realities of its limitations to come crashing in, with few people adopting it, though not completely invalidating its potential uses.

Agreed. I remember the brouhaha in the '90s when CD-ROM gaming appeared and we got FMV, and all the talk about fitting so much data on a CD. Games could look as good as real life, everything had changed, but really it hadn't. FMV did produce some fun titles, but its limiting factors meant that 3D game engines, textures and polygons became the dominant force, and FMV got relegated to cutscenes.

Why was that? Well, computers are great at manipulating data; they can do more when not restricted to a linear stream of images (video).

This AI approach does get rid of that linear stream of images, essentially creating a Dragon's Lair where at every rendered frame you can fork off in a new direction, but in some ways it still has the limitations of FMV: there are no models, no scene to deconstruct into individual pieces (3D assets, sounds, triggers, particles), just one amorphous blob of fuzzy data.

It lacks the ability to reference files from disk as an oracle of truth. A car in GTA is a car, always the same (with some customizable variables), and can therefore be recalled from disk today or 100 years from now. That car can be manipulated and that manipulation replayed over and over forever. It's incredibly efficient, and it's comfortably predictable in all the right ways. We like predictability as humans; it allows us to converse, compare and share based on those ground truths.
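To make that "oracle of truth" point concrete, here's a tiny, purely illustrative Python sketch (nothing GTA-specific, all names made up): a file-based asset plus a recorded input stream is deterministic and replayable forever, which is exactly the property a frame-by-frame generative model doesn't give you.

```python
# Purely illustrative sketch (hypothetical names) of deterministic,
# file-backed game state versus generative output.

def load_car(path: str) -> bytes:
    """The same bytes on disk always produce the same car, today or in 100 years."""
    with open(path, "rb") as f:
        return f.read()

def simulate(inputs: list[str]) -> list[tuple[float, float]]:
    """A toy, fully deterministic update loop: state changes only through inputs."""
    x, y, trace = 0.0, 0.0, []
    for cmd in inputs:
        if cmd == "forward":
            x += 1.0
        elif cmd == "left":
            y -= 1.0
        elif cmd == "right":
            y += 1.0
        trace.append((x, y))
    return trace

if __name__ == "__main__":
    recorded = ["forward", "forward", "left", "forward"]
    # Replaying the same inputs yields identical results, every single time.
    assert simulate(recorded) == simulate(recorded)
```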
 
None of the examples so far are "interactive" or interesting (beyond the underlying tech).

I don't mean to downplay the potential impact as the technology continues to evolve and grow. But none of it looks as interesting to interact with as your average asset flip game. Copyright violations are also going to be a problem (assuming big tech with its lawyers and lobbyists don't just conveniently rewrite everything in the name of artistic license while simultaneously sucking up all the revenue that artists could have hoped to earn).

I'm much more interested to see how generative models can work in tandem with human design: working out unique ragdoll animations on the fly, coming up with filler NPC dialogue and choices, etc.
 
Yeah, it wasn't so long ago that Google was showing off their AI picture generation where a request for a dog would produce a crude image of a dog made out of images of dogs. Now they have real-time interactive video.
But the progression is not linear. Early gains are relatively easy; later ones are exponentially harder.
 
FMVs are actually a good historical comparison: a new breakthrough in tech that convinced a whole bunch of tech companies that this was the future of games, only for the realities of its limitations to come crashing in, with few people adopting it, though not completely invalidating its potential uses.

We actually already have good use cases for AI tech in games, such as procedural facial and body animation, voice synthesis, procedural generation of scenarios, or the ever-present DLSS and even ray tracing. I don't see much future in things like Genie 3 specifically, though.

I honestly don't understand how people on an enthusiast forum are only seeing the limitations in this instead of the possibilities. There are plenty of use cases for this tech, some of which we have already seen successfully demonstrated in a crude state. Many of us are looking at the current progression and beyond, specifically at how this can be used to get to hybrid rendering, which is the first step toward the end goal.

In a hybrid renderer, the game engine handles the "logic and math" (collisions, movement, game state), while the Genie-like model sits at the end of the pipeline. It acts as a generative reshader, creating photorealistic "infinite textures" and surface details in real time.

The latest research into world models in Project Genie has a dual-memory configuration. Beyond the 60-second environmental memory, they're now implementing a short-term "physics buffer" (about 5 seconds) to predict more advanced interactions. This will lead to use cases where, instead of the CPU brute-forcing millions of particles, the AI predicts the outcome (like water splashes or collapsing buildings) without the normal performance cost. This tech moves us forward towards "generative detail injection", where the AI fills in the details that traditional engines just can't afford to render... much like DLSS.
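As a rough illustration of that division of labour (a sketch only; the class and method names below are hypothetical, not anything Google has published), the per-frame loop would look something like this: the engine stays the source of truth for logic and geometry, and the generative model only reshades the final image, constrained by the engine's G-buffers so the detail stays anchored to real geometry.

```python
# Hedged sketch of the hybrid-rendering idea above, not Genie's real API.
from dataclasses import dataclass

@dataclass
class GBuffer:
    albedo: bytes   # base colour per pixel
    normals: bytes  # surface orientation per pixel
    depth: bytes    # distance from camera per pixel
    motion: bytes   # per-pixel motion vectors, for temporal stability

class GenerativeReshader:
    """Stand-in for a Genie-like model used as the last stage of the pipeline."""
    def reshade(self, raw_frame: bytes, gbuf: GBuffer, style: str) -> bytes:
        # A real model would synthesise "infinite texture" detail here; the
        # G-buffer conditioning is what keeps it from hallucinating geometry.
        return raw_frame  # placeholder pass-through

def render_frame(engine, reshader: GenerativeReshader, dt: float) -> bytes:
    state = engine.update(dt)                  # collisions, movement, game state
    raw_frame, gbuf = engine.rasterize(state)  # conventional deterministic render
    return reshader.reshade(raw_frame, gbuf, style="photoreal")
```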


Edit:

Here's an early demonstration of what AI could do with a hybrid renderer. It uses the G-buffers (the math/geometry) from the GTA V engine to guide the AI and apply a dashcam look to it. Interestingly, this demo is not real-time; it was processed frame by frame back in 2021, but it would be possible in real time now.




And btw, here's the magic of DLSS + motion vectors in a nutshell, and it partly describes why e.g. Project Genie is exciting for the future:

Native 38x22 resolution (!)
[image: native 38x22 frame]


38x22 resolution upscaled to 4K by DLSS:
[image: the same frame upscaled to 4K by DLSS]
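For anyone wondering what the motion vectors actually buy you, here's a deliberately crude sketch of temporal accumulation (this is not NVIDIA's algorithm; DLSS replaces the hand-written blend with a learned network, but the principle of reusing past samples via motion vectors is the same):

```python
# Toy temporal accumulation along motion vectors (illustrative only).
import numpy as np

def nn_upscale(lowres: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of the tiny native frame to output resolution."""
    ys = np.arange(out_h) * lowres.shape[0] // out_h
    xs = np.arange(out_w) * lowres.shape[1] // out_w
    return lowres[ys[:, None], xs[None, :]]

def reproject(history: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Warp last frame's accumulated image to where those pixels are this frame."""
    h, w = history.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(xs - motion[..., 0], 0, w - 1).astype(int)
    src_y = np.clip(ys - motion[..., 1], 0, h - 1).astype(int)
    return history[src_y, src_x]

def temporal_upscale(history: np.ndarray, lowres: np.ndarray,
                     motion: np.ndarray, blend: float = 0.1) -> np.ndarray:
    """Blend a little of the new low-res frame into the warped history each frame,
    so detail accumulates over time instead of being invented from scratch."""
    fresh = nn_upscale(lowres, history.shape[0], history.shape[1])
    return (1.0 - blend) * reproject(history, motion) + blend * fresh
```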
 
I honestly don't understand how people on an enthusiast forum are only seeing the limitations in this instead of the possibilities. There are plenty of use cases for this tech, some of which we have already seen successfully demonstrated in a crude state. Many of us are looking at the current progression and beyond, specifically at how this can be used to get to hybrid rendering, which is the first step toward the end goal.

In a hybrid renderer, the game engine handles the "logic and math" (collisions, movement, game state), while the Genie-like model sits at the end of the pipeline. It acts as a generative reshader, creating photorealistic "infinite textures" and surface details in real time.

The latest research into world models in Project Genie has a dual-memory configuration. Beyond the 60-second environmental memory, they're now implementing a short-term "physics buffer" (about 5 seconds) to predict more advanced interactions. This will lead to use cases where, instead of the CPU brute-forcing millions of particles, the AI predicts the outcome (like water splashes or collapsing buildings) without the normal performance cost. This tech moves us forward towards "generative detail injection", where the AI fills in the details that traditional engines just can't afford to render... much like DLSS.
I'm not going to speak for the entire forum, right? But (as always) extreme positions tend to make people retarded. From a technical/technological point of view, sure, the possibilities are endless

The biggest issue for me, in the context of something like Genie, is the idea of using Gen AI to create the entire product as if it were "magic." Some prompts here and there, and voilà. It does not make sense economically; something is not adding up, and that is at the baseline.

On top of that, we are still in the AI bubble phase. While some cracks are starting to show, no one, not even the companies themselves, knows the actual cost of AI implementation and integration
 
I'm not going to speak for the entire forum, right? But (as always) extreme positions tend to make people retarded. From a technical/technological point of view, sure, the possibilities are endless

The biggest issue for me, in the context of something like Genie, is the idea of using Gen AI to create the entire product as if it were "magic." Some prompts here and there, and voilà. It does not make sense economically; something is not adding up, and that is at the baseline.

On top of that, we are still in the AI bubble phase. While some cracks are starting to show, no one, not even the companies themselves, knows the actual cost of AI implementation and integration
Yeah, that's a fair perspective. There's definitely a lot of "icky" stuff in the current AI market and the bubble debate is real. But those are market aspects that I'm intentionally down-prioritizing when talking about the tech itself. And to me, the most interesting part isn't the "magic prompt" idea per se, that's a side quest to me, but how this tech can eventually merge with traditional engines to solve some age-old challenges in the pipeline.
 
I honestly don't understand how people on an enthusiast forum are only seeing the limitations in this instead of the possibilities. There are plenty of use cases for this tech, some of which we have already seen successfully demonstrated in a crude state. Many of us are looking at the current progression and beyond, specifically at how this can be used to get to hybrid rendering, which is the first step toward the end goal.

In a hybrid renderer, the game engine handles the "logic and math" (collisions, movement, game state), while the Genie-like model sits at the end of the pipeline. It acts as a generative reshader, creating photorealistic "infinite textures" and surface details in real time.

The latest research into world models in Project Genie has a dual-memory configuration. Beyond the 60-second environmental memory, they're now implementing a short-term "physics buffer" (about 5 seconds) to predict more advanced interactions. This will lead to use cases where, instead of the CPU brute-forcing millions of particles, the AI predicts the outcome (like water splashes or collapsing buildings) without the normal performance cost. This tech moves us forward towards "generative detail injection", where the AI fills in the details that traditional engines just can't afford to render... much like DLSS.


Edit:

Here's an early demonstration of what AI could do with a hybrid renderer. It uses the G-buffers (the math/geometry) from the GTA V engine to guide the AI and apply a dashcam look to it. Interestingly, this demo is not real-time; it was processed frame by frame back in 2021, but it would be possible in real time now.




And btw, here's the magic of DLSS + motion vectors in a nutshell, and it partly describes why e.g. Project Genie is exciting for the future:

Native 38x22 resolution (!)
[image: native 38x22 frame]


38x22 resolution upscaled to 4K by DLSS:
[image: the same frame upscaled to 4K by DLSS]

Like I said, there are use cases, and having it act as a shader of sorts is actually one of the uses I thought of. I do have a few bones to pick with the idea, but it's still a far more reasonable application for gaming than what Genie is doing.

As mentioned before, Genie 3 is essentially a video generator; it's doing something that's barely different from what other AI video generators do.
 
there's nothing to get to, there is no end product.
this tech as seen here cannot be used to create anything other than a temporally incoherent interactive animation that falls apart within a few seconds.

there's a reason you don't see anyone do a full 360° turn in any of the example videos shown. because if they did that, literally everything you saw would disappear and be replaced with something unrelated to the original environment.

so all you see is someone slowly moving forward, or just standing around turning the camera a tiny bit left or right.

even doing a 90° turn left, followed by a 90° turn right would instantly break the demo.
There will be a point where these A/V generative tools will be able to output something tangible and editable. For example, the Suno music creation tool is great at creating a decent final waveform you can listen to, but that's it.

However, they're already building towards exporting accurate stems and (much more importantly) MIDI files, meaning we're not far away from being able to generate a song, essentially get its "source" MIDI file, load it into a DAW, and edit it however we'd like. Then even release it as "human-created".

This will eventually happen for all of these generative AI products. It'll be much harder for games, since I don't know, technically, how they would turn generated content like this into proper meshes, but rest assured that they are certainly working on it as we speak.

Genie 3 is a neat tech demo, but this type of service is not the end goal. Mark my words that we'll eventually get to a place where we'll be able to export all of the components of a generated video game (meshes, textures, code, etc.), and then the line between human-created and AI-created content will be forever blurred.
 
"how video games work"

Man that's way over my head. Keep the technical talk to yourself, nobody is going to understand you...

I've been gaming since NES, have a computer science degree, and have been using and following AI since its infancy.

But please enlighten us oh wise one...

if you think what is shown here is in any way useable by any video game you clearly don't know how they work or how this AI works.

that's just a fact. all you see here is an image generator with a bunch of prompt macros mapped to a keyboard. it has no inner logic, no temporal consistency, no spatial awareness.
the moment anything leaves the screen, it's gone forever. even your own character.

and the amount of memory needed to keep any simulation visually coherent (even if we assume a future version of something like this has a full 3D engine running underneath for game logic etc.) would be pure insanity. and even then the visual stability over time would not be guaranteed, so details of objects would just randomly change. you could do a 360° turn and the red door that was there before is suddenly brown and has a small window in it.
or effects wouldn't look as previously intended: where before there were godrays coming through the trees, after you looked away for a second there are suddenly none, because the AI decided they wouldn't fit.
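to put the memory point another way, here's a deliberately simplified sketch (not genie's actual architecture, just the general shape of an autoregressive world model): whatever falls out of the model's context window stops existing, so there's nothing to keep that red door red.

```python
# crude sketch, not Genie's real design: an autoregressive world model only
# conditions on whatever still fits in its context window, so anything that
# scrolled out of that window no longer constrains the next frame.
from collections import deque

CONTEXT_FRAMES = 1440  # hypothetical: roughly 60 seconds at 24 fps

class WorldModel:
    """stand-in for a learned next-frame predictor"""
    def predict_next(self, context: list[bytes], action: str) -> bytes:
        # a real model samples from p(next_frame | context, action).
        # nothing outside `context` can influence that sample, which is why
        # a door can silently change colour after you look away long enough.
        return b"frame"

def play(model: WorldModel, first_frame: bytes, actions: list[str]) -> list[bytes]:
    context: deque = deque([first_frame], maxlen=CONTEXT_FRAMES)
    frames = []
    for action in actions:
        frame = model.predict_next(list(context), action)
        context.append(frame)  # oldest frames silently fall off the far end
        frames.append(frame)
    return frames
```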


so as to your previous statement, yes this is slop. and you'd know that if you knew how any of this functions.
 
There will be a point where these A/V generative tools will be able to output something tangible and editable. For example, the Suno music creation tool is great at creating a decent final waveform you can listen to, but that's it.

However, they're already building towards exporting accurate stems and (much more importantly) MIDI files, meaning we're not far away from being able to generate a song, essentially get its "source" MIDI file, load it into a DAW, and edit it however we'd like. Then even release it as "human-created".

This will eventually happen for all of these generative AI products. It'll be much harder for games, since I don't know, technically, how they would turn generated content like this into proper meshes, but rest assured that they are certainly working on it as we speak.

Genie 3 is a neat tech demo, but this type of service is not the end goal. Mark my words that we'll eventually get to a place where we'll be able to export all of the components of a generated video game (meshes, textures, code, etc.), and then the line between human-created and AI-created content will be forever blurred.

asset generation through AI is already happening.
and sure, that's totally something that will become normal.

the main thing I was talking about is that tech demos like these here have absolutely no use case for actual video game production. they are just image generators that generate images in quick succession based on prompts that happen behind the scenes as you press a button.

all this right here is, is a way to show how fast image generation works now. it's so fast that you can fake interactivity... for a few seconds, until it loses track of what is happening and just breaks down or almost resets as you turn the camera too far away from the original image that was used to generate the first scene.
 
Yeah, that's a fair perspective. There's definitely a lot of "icky" stuff in the current AI market and the bubble debate is real. But those are market aspects that I'm intentionally down-prioritizing when talking about the tech itself. And to me, the most interesting part isn't the "magic prompt" idea per se, that's a side quest to me, but how this tech can eventually merge with traditional engines to solve some age-old challenges in the pipeline.
I think that as soon as next gen, we could see an actual game using "AI" to drive NPC conversations (I mean, there are already examples of this). I think Unity already has AI integrations, and Photoshop as well. And I'm sure software like Maya or ZBrush already has this planned, too.

"AI," along with ray tracing, are fundamentally changing the entire development pipeline and hardware requirements. However, unlike the way AI is often being sold as something that will reduce costs and speed everything up, I think it will ultimately make production/development way more sophisticated and complex.
 
This will eventually happen for all of these generative AI products. It'll be much harder for games, since I don't know, technically, how they would turn generated content like this into proper meshes, but rest assured that they are certainly working on it as we speak.

Genie 3 is a neat tech demo, but this type of service is not the end goal. Mark my words that we'll eventually get to a place where we'll be able to export all of the components of a generated video game (meshes, textures, code, etc.), and then the line between human-created and AI-created content will be forever blurred.
Genie 3 isn't generating assets, it's generating images and (attempting) to string them together. You need a completely different type of training and different neural networks for making assets; worth mentioning that some game engines are already implementing stuff like that.
 
Genie 3 isn't generating assets, it's generating images and (attempting) to string them together. You need a completely different type of training and different neural networks for making assets; worth mentioning that some game engines are already implementing stuff like that.
Yeah, I think we're all basically saying the same thing.
 
No. I'm pointing out this neat tech demo in particular will lead nowhere, even if that doesn't apply to AI in general
Yeah, I agree with that. Specifically Genie is not going to ever become an end product. But part of this tech will lead to stuff that will.
 
Yeah, I agree with that. Specifically Genie is not going to ever become an end product. But part of this tech will lead to stuff that will.
Not for asset generation though, which requires a completely different process. At the very most a type of shader.
 
Not for asset generation though, which requires a completely different process. At the very most a type of shader.
I don't know how you can confidently say part of this tech will not lead to asset generation eventually unless you're on the development team.

The parallel I'm drawing is to Suno, which was purely generative to a final waveform. However, now they're adapting that tech to produce source MIDI files.

Similarly, I could see something like Genie eventually using a form of photogrammetry, or even some kind of internal volumetric capture, to essentially create an export based off of the real-time A/V it's generating.

That's not this product. That's not Genie 3. But we are undoubtedly headed in that direction, whether with something Google is cooking up, or something more bespoke to a dev environment.
 
I don't know how you can confidently say part of this tech will not lead to asset generation eventually unless you're on the development team.

The parallel I'm drawing is to Suno, which was purely generative to a final waveform. However, now they're adapting that tech to produce source MIDI files.

Similarly, I could see something like Genie eventually using a form of photogrammetry, or even some kind of internal volumetric capture, to essentially create an export based off of the real-time A/V it's generating.

That's not this product. That's not Genie 3. But we are undoubtedly headed in that direction, whether with something Google is cooking up, or something more bespoke to a dev environment.

Can't say for sure, but it doesn't seem clear to me what the obvious steps are for this tech to start outputting assets.

Current mesh-generation tools tend to focus on rendering out objects from all angles. I guess this can do that part, but translating them into a mesh is the really hard challenge, and there's zero of that within this tech (I'm presuming, because there doesn't need to be). And it's not even just mesh generation; it's clean, well-modeled meshes with good UVs and textures.

Audio, maybe, but it's overkill in many ways and there are other solutions already doing that.

Physics? That's code, there's no code in this.

I agree with others that something in the general ballpark of using a traditional game engine with an "AI" surface layer to upres and improve the final visuals could be a product, but we're already going down that road with DLSS and frame gen to a point. But yeah, just like you can draw a crappy sketch, give it to an image generator and say "make it pretty with ray tracing".

None of this means Genie IS the solution or is steps away from being the solution. It's all a pivot.
 
I don't know how you can confidently say part of this tech will not lead to asset generation eventually unless you're on the development team.
Because this tech already exists and has existed for a while now. Transforming 2D photos, single or multiple angles, into 3D assets.
 
Because this tech already exists and has existed for a while now. Transforming 2D photos, single or multiple angles, into 3D assets.
Yep, exactly. And I'm sure it'll eventually be integrated into either Genie, or something like it.
 
Yep, exactly. And I'm sure it'll eventually be integrated into either Genie, or something like it.
It cannot, because, as I already said, Genie is a video generator. It's not creating actual environments with meshes and assets; it's only giving the illusion of it by creating a video on the fly based on player input.

And even if you try to do a weird mix of the two in an attempt to fix these problems - like trying to generate 3D assets on the fly based on the video being made - you just run into the simple question of "why not just use AI to create the 3D world directly instead of doing all this convoluted stuff??"
 
It cannot, because, as I already said, Genie is a video generator. It's not creating actual environments with meshes and assets; it's only giving the illusion of it by creating a video on the fly based on player input.

And even if you try to do a weird mix of the two in an attempt to fix these problems - like trying to generate 3D assets on the fly based on the video being made - you just run into the simple question of "why not just use AI to create the 3D world directly instead of doing all this convoluted stuff??"
My brother in Christ. We both are agreeing on the tech, how it works, and how it's currently being utilized. I just think we have different ideas on how it'll be implemented in the future.

It's all good.
 
if you think what is shown here is in any way useable by any video game you clearly don't know how they work or how this AI works.

that's just a fact. all you see here is an image generator with a bunch of prompt macros mapped to a keyboard. it has no inner logic, no temporal consistency, no spatial awareness.
the moment anything leaves the screen, it's gone forever. even your own character.

and the amount of memory needed to keep any simulation visually coherent (even if we assume a future version of something like this has a full 3D engine running underneath for game logic etc.) would be pure insanity. and even then the visual stability over time would not be guaranteed, so details of objects would just randomly change. you could do a 360° turn and the red door that was there before is suddenly brown and has a small window in it.
or effects wouldn't look as previously intended: where before there were godrays coming through the trees, after you looked away for a second there are suddenly none, because the AI decided they wouldn't fit.


so as to your previous statement, yes this is slop. and you'd know that if you knew how any of this functions.


Everything you mention is easily solvable over the next 3-5 years at most.

You're looking at this as if it's supposed to represent the final product.

It's an experiment, and it's insanely impressive. This in some form is the future of games.

Yes it's energy intensive for now, but so is producing billions of components to create game consoles.
 
this is great for pumping out even more crap onto the playstore and app store!!

fill the marketplace with slopware!!

it is useless for actual game production
 
Games are fun because they consist of intricately conceived systems. The presentation is not the meal.

Have fun herding LLMs into churning out a finely curated, novel and consistent experience, in a cost effective manner.
 
Anyone thinking that this technology is only a few steps away from making full games would do well to learn about Cargo Cults


[image: cargo cult aircraft replica]


This is not a few visual refinements away from flying.

[image: cargo cult radio replica]


This is not a generation or two away from communicating.

Creating a facsimile of a game will not entice the gods to deliver you a game. Though I do fear that this tech will successfully entice Cargo Cult-believing investors to divert money from traditional game developers to AI bro startups in the hope that they will deliver the riches.
Bingo.

The naivety or outright willful ignorance in this thread is mindbending
 
I hope that video game users are not so simple-minded as to support a 99% automated industry and leave 99% of the artists and creators who have made this industry great unemployed. All because it looks so beautiful and realistic... ridiculous.
 
You severely underestimate just how energy-intensive these servers are. Not to mention they still need the same components, even more of them actually.

Just to do what we can do today already. Instead of putting such hardware into our hands, AI scammers are reinventing the wheel and selling it as something new.

Project Genie is just like Sora: stitching AI imagery into moving video with added on-rails controls.

So let's say they eventually get AI able to write game code in UE5 and run it on local hardware like today: what's the point, when human developers are already doing this at a fraction of the power needed for AI servers?
 