Support NeoGAF

Panajev2001a · Aug 9, 2004

Would you rather use a more OpenGL-like approach, like this:

Code:

		myStack.Push (temp_stack);
		
		IdentityMatrix ( &temp_stack );

		temp_stack.m[3][0] = 2048.0f;	// x
		temp_stack.m[3][1] = 2048.0f;	// y
		temp_stack.m[3][2] = 1000.0f;	// z
		RotMatrixY(&temp_stack, &temp_stack, an++);

		if ( an >= 360.0f ) an = 0.0f;

		tie.upload_matrix (iSPS2Device);
		sps2FlushCache(iSPS2Device);	

		tie.FirePacketToVIF (m_Prim1Base,pMemEnd,m_Prim1Base,iSPS2Device);
		myStack.Pop(); //updates temp_stack, the global matrix

Or this ?

Code:

		myStack.Push (temp_stack);

		tie1.synch_matrix(); //copies the matrix stored in the tie object 
                                              //into the global temp_stack
		
		temp_stack.m[3][0] = 2148.0f;	// x
		temp_stack.m[3][1] = 1948.0f;	// y
		temp_stack.m[3][2] = 1000.0f;	// z
		RotMatrixY(&temp_stack, &temp_stack, -0.017f);


		tie1.upload_matrix (iSPS2Device);
		
		sps2FlushCache(iSPS2Device);
		
		tie1.FirePacketToVIF (m_Prim1Base,pMemEnd,m_Prim1Base,iSPS2Device);

		myStack.Pop();

I tend to prefer the OpenGL-like approach, but I know some would prefer something "similar", but of course modified and enhanced, to the second approach.

They both produce an object on screen that rotates, so they both work.
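For anyone who wants to see the push/pop idea in isolation, here is a minimal sketch of what such a matrix stack could look like. The names `Matrix4`, `MakeIdentity` and `MatrixStack` are made up for illustration and are not the real classes behind `myStack`; the only thing taken from the snippets above is that `Push` saves a matrix and `Pop` writes the saved one back into the global.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 4x4 matrix; m[3][0..2] holds the translation, as in the snippets above.
struct Matrix4 {
    float m[4][4];
};

// Returns an identity matrix (ones on the diagonal, zeros elsewhere).
inline Matrix4 MakeIdentity() {
    Matrix4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Hypothetical stack bound to one "global" matrix.
// Push saves a copy; Pop restores the last saved copy into the bound matrix.
class MatrixStack {
public:
    explicit MatrixStack(Matrix4& bound) : bound_(bound) {}

    void Push(const Matrix4& m) { saved_.push_back(m); }

    // Returns false (and changes nothing) if there is nothing to pop.
    bool Pop() {
        if (saved_.empty()) return false;
        bound_ = saved_.back();
        saved_.pop_back();
        return true;
    }

    std::size_t Depth() const { return saved_.size(); }

private:
    Matrix4& bound_;
    std::vector<Matrix4> saved_;
};
```

With this, a per-object block can scribble on the global matrix freely between `Push` and `Pop`, and the next object starts from the state that was saved.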

FirePacketToVif... for those who have looked at ps2linux stuff before, this is only a shell of Fortuna's function: really all it is doing now is setting a few EE registers to start the DMA transfer.

We will talk more about it and show pictures when we get something worthy of being shown: there is still so much to do... no matter how much you work on it.

We are at 0.01% I think, lol

PlayStation 2 is fun to play with: first you seem to get lost, then you seem to think you got the hang of it and then you see that it was only the beginning... rinse and repeat... this feeling continues.

Even with parts you have already "done", you keep going back to them and/or adding new stuff that goes along with them.

Bog · Aug 9, 2004

Definitely the OpenGL-like approach.

Panajev2001a · Aug 9, 2004

Bog said:
Definitely the OpenGL-like approach.

Thank you for your comment.

NotMSRP · Aug 9, 2004

Does the first one work with the object directly while the second one works with a copy of the object?

rastex · Aug 9, 2004

I like the second approach simply because it cuts out the if-statement which I always find ugly.

I got a lesson on PS2 dev from a guy at work the other day and it was pretty enlightening to say the least. Having only ONE bus to use between all the processors is just evil >_<

LinesInTheSand · Aug 9, 2004

Both approaches are hideos.

rastex · Aug 9, 2004

LinesInTheSand said:
Both approaches are hideos.

ya... what's the deal with [0] [1] [2]? #defines are your friend yo!

gofreak · Aug 9, 2004

I'd prefer the first one (OpenGL-like).

Panajev2001a · Aug 9, 2004

rastex said:
I like the second approach simply because it cuts out the if-statement which I always find ugly.

I got a lesson on PS2 dev from a guy at work the other day and it was pretty enlightening to say the least. Having only ONE bus to use between all the processors is just evil >_<

Writing DMA chains... well it can be entertaining... after a while lol.

To me it is the number of interface processors you have to prepare packets for that confuses you the most.

Want to send a Vertex to the GS, but have the VU1 process it ?

You have to prepare a DMA chain, insert into it the VIF codes for the VIF1 interface processor, insert the GIFTag(s) for the GIF unit and then fire.

There is a lot of work to be done setting things up before you are actually doing anything useful.

Still, if you think that is evil... on ps2linux, when you allocate memory with a low-level application like sps2dev (which lets you access the hardware directly, including the DMAC, which can crash the Linux OS just nicely; it has utilities to give you a physical address from a virtual one, it allocates unswappable memory in 4 KB pages, etc.), you are only guaranteed that each 4 KB (non-swappable) page is in itself physically contiguous (the virtual vs. physical address problem).

That is a problem when doing DMA transfers as the DMAC wants the Quad Words sent to be contiguous.

(The following assumes a DMA transfer mode called Source Chain, the one I am familiar with at this time, as it is the one that seems to be the most useful in ps2linux due to the 4 KB page limitation.)

What you do is stitch DMA chains together: you put a DMA RET tag at the beginning of a page, then you keep adding QWs (incrementing the pointer) and increasing the QW count for that tag (max QWC is 255 QWs here, since a 4 KB page holds 256 QWs and the tag takes one). If you get to the beginning of another page, you change the old RET tag to a NEXT tag and put a RET tag at the beginning of the new page... and so on.
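The stitching can be sketched off-target in plain C++. This is only a model, not sps2dev code: `Page`, `ChainBuilder` and `TagId` are made-up names, each quadword is stood in for by one 32-bit value, and a page index plays the role of the physical address that would go in the ADDR field.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class TagId { Next, Ret };  // only the two tags used for stitching

constexpr std::size_t kQwPerPage  = 4096 / 16;       // 256 quadwords in a 4 KB page
constexpr std::size_t kMaxPayload = kQwPerPage - 1;  // one quadword is the tag itself

// Model of one 4 KB page: a tag at the start, then up to 255 payload quadwords.
struct Page {
    TagId id = TagId::Ret;               // a fresh page always starts as RET
    std::size_t qwc = 0;                 // payload quadwords after the tag
    std::size_t next = 0;                // next page index (stand-in for ADDR)
    std::vector<std::uint32_t> payload;  // one entry per quadword (assumption)
};

class ChainBuilder {
public:
    ChainBuilder() { pages_.emplace_back(); }

    void Append(std::uint32_t qw) {
        if (pages_.back().qwc == kMaxPayload) {
            // Page is full: turn its RET into a NEXT that points at a new
            // page, and let the new page carry the terminating RET.
            pages_.back().id = TagId::Next;
            pages_.back().next = pages_.size();
            pages_.emplace_back();
        }
        pages_.back().payload.push_back(qw);
        ++pages_.back().qwc;
    }

    const std::vector<Page>& Pages() const { return pages_; }

private:
    std::vector<Page> pages_;
};
```

Appending 300 quadwords, for example, leaves a first page with a NEXT tag and QWC 255 and a second page with a RET tag and QWC 45.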

NEXT tags do this basically: they transfer the amount of QWs specified inside the QWC field that are physically after the DMA tag (we are not transferring the TAG itself though) and then they jump to the (Physical) address contained in the ADDR field and that QW (1 QW = 16 bytes = 128 bits) is interpreted as a DMA tag.

RET tags end a DMA transfer (like END tags would): they can transfer a number of QWs as specified in the QWC field and they also pop the address of the previous DMA CALL tag from the stack if the DMA chain was called from a DMA CALL tag.

DMA chains can be called using CALL tags.

The CALL tag jumps to the (Physical) address contained in the ADDR field (the QW there is interpreted as a DMA tag) and the address of the CALL tag is put into the stack (you can have a maximum nesting level of 2).

Once you hit a RET tag you do what you have to do, and when you jump back to the CALL tag you move to the following QW and interpret it as a DMA tag (it can be a CALL tag again, a NEXT tag, an END tag, etc.).
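To make the NEXT/CALL/RET/END rules above concrete, here is a small software walker over a fake memory. It is a sketch under assumptions, not how the DMAC is programmed: `QWord`, `Tag`, `Data` and `WalkChain` are invented names, addresses are plain array indices, and each data quadword is modelled as one 32-bit value. After a RET it resumes at the quadword that follows the CALL tag and its payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class TagId { Next, Call, Ret, End };

// One slot of the fake memory: either a DMA tag or a data quadword.
struct QWord {
    bool isTag;
    TagId id;
    std::size_t qwc;   // data quadwords that follow the tag
    std::size_t addr;  // jump target (array index standing in for ADDR)
    std::uint32_t data;
};

inline QWord Tag(TagId id, std::size_t qwc, std::size_t addr) {
    return QWord{true, id, qwc, addr, 0};
}
inline QWord Data(std::uint32_t v) { return QWord{false, TagId::End, 0, 0, v}; }

// Walks the chain starting at tadr and appends the transferred data to out.
// Returns false on a malformed chain or when CALL nesting exceeds 2.
inline bool WalkChain(const std::vector<QWord>& mem, std::size_t tadr,
                      std::vector<std::uint32_t>& out) {
    std::size_t stack[2];
    int depth = 0;
    for (int budget = 1024; budget > 0; --budget) {  // guard against loops
        if (tadr >= mem.size() || !mem[tadr].isTag) return false;
        const QWord& tag = mem[tadr];
        if (tadr + tag.qwc >= mem.size()) return false;  // payload must fit
        for (std::size_t i = 1; i <= tag.qwc; ++i) out.push_back(mem[tadr + i].data);
        const std::size_t after = tadr + tag.qwc + 1;  // quadword after the payload
        switch (tag.id) {
            case TagId::Next: tadr = tag.addr; break;
            case TagId::Call:
                if (depth == 2) return false;  // only two nesting levels
                stack[depth++] = after;
                tadr = tag.addr;
                break;
            case TagId::Ret:
                if (depth == 0) return true;  // not called: ends like END
                tadr = stack[--depth];
                break;
            case TagId::End: return true;
        }
    }
    return false;
}
```

A top-level list of CALL tags followed by an END tag then behaves like a list of subroutine calls: each called chain runs until its RET and control returns to the next tag in the list.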

Panajev2001a · Aug 9, 2004

rastex said:
ya... what's the deal with [0] [1] [2]? #defines are your friend yo!

Well, neither approach was anything definite.

Just parts of the while loop in main().

It was mostly a test of the Matrix stack used with the rest of the code.

Both approaches might be cleaned up in looks (so that they do not look "hideos"), but their functionality will still be split.

The first approach lends itself to having pseudo-GL draw calls for each object, which you can abstract from that code very nicely.

I could simply have a bunch of tie<insert number>.Draw(); calls there and put what you see there in its function, like you do with OpenGL.

Panajev2001a · Aug 9, 2004

if ( an >= 360.0f ) an = 0.0f;

This can be hidden in the Rotate function since an is a global (for now).
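A minimal sketch of hiding the wrap-around, assuming the angle is in degrees as the 360.0f check suggests. `WrapDegrees` and `Spinner` are hypothetical names, not part of the existing code, and unlike the original `if` this keeps the remainder instead of snapping back to exactly 0.

```cpp
#include <cmath>

// Hypothetical helper: wraps an angle in degrees into the range [0, 360).
inline float WrapDegrees(float a) {
    a = std::fmod(a, 360.0f);
    if (a < 0.0f) a += 360.0f;
    if (a >= 360.0f) a = 0.0f;  // a tiny negative input can round up to 360
    return a;
}

// Hypothetical per-object rotation state, so the caller never sees the wrap.
struct Spinner {
    float angle = 0.0f;

    // Advances the angle by step degrees and returns the wrapped result.
    float Advance(float step) {
        angle = WrapDegrees(angle + step);
        return angle;
    }
};
```

The draw loop then only calls something like `Advance(1.0f)` and passes the result to the rotation function, with no `if` at the call site.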

Fafalada · Aug 9, 2004

I don't really care for either if we're talking about matrix math stuff, but that's mostly because I've converted to metaprogramming whore over the course of last year or so.

Pana, linking of DMA chains is definitely the correct approach for dynamically generated display lists. Ideally you'd want to use chunks of 16-32 KB though (in terms of efficient memory usage), but I guess the limitations of your environment won't let you do that.

Panajev said:
The first approach lends itself to having pseudo-GL draw calls for each object, which you can abstract from that code very nicely.
I could simply have a bunch of tie<insert number>.Draw(); calls there and put what you see there in its function, like you do with OpenGL.

Not sure if the DMA stuff is there only as an example or do you seriously suggest to keep it there for every draw call - if the latter then it's a really horrible idea. The DMA setup needs to be split away from the rest of the code in a manner that will allow you to batch stuff together - whether to fully defer the rendering for the entire scene, or to do multiple calls if you are afraid of extra frame latency, but either way, minimize the number of DMA calls to absolute minimum.

rastex said:
Having only ONE bus to use between all the processors is just evil

It's just a fact of life with UMA-centric architectures - frankly the downside to it on PS2 is the woefully inadequate cache sizes on the R5900 core. All other units are well served, and the R59k would be as well if the data cache was 2-4x as big with 4x the associativity.

Panajev2001a · Aug 9, 2004

Fafalada said:
Not sure if the DMA stuff is there only as an example or do you seriously suggest to keep it there for every draw call - if the latter then it's a really horrible idea. The DMA setup needs to be split away from the rest of the code in a manner that will allow you to batch stuff together - whether to fully defer the rendering for the entire scene, or to do multiple calls if you are afraid of extra frame latency, but either way, minimize the number of DMA calls to absolute minimum.

The FireToVif function only sets the TADR, QWC and CHCR registers of the EE (channel 1, from memory to VIF1).

I use static DMA chains for the geometry and the textures which are generated before the while loop, when you import the geometry.

In the while loop, in main(), I only update (then reset) a dynamic DMA chain for things like World/Camera/etc. matrix uploads to the VUs and special GIFTags to change the texture buffer.

Basically all the stuff you re-do each frame.

I know that the trick is somehow to have tons of pre-compiled DMA chains and basically nothing dynamically done, but I guess I am not up to that level yet.

I do not plan to have the DMA fire for every draw call and that is why I am using CALL chains.

I plan to build, partially statically before the while loop and partially dynamically in the while loop, a CALL chain with some CALL tags and a RET tag: each CALL tag calls a DMA chain, but I can fire the chain once if I want.

Is this a more acceptable path for you ?

At the end I want to have something like this:

Code:

		load_textures_build_DMA_chains ();
		load_geometry_build_DMA_chain ();	// common textures and geometry loaded into RAM

		while (... ) {

			...

			tie1.draw();
			tie2.draw();	// or a drawallOBJs() function that calls all the draw functions

			...

			FireDMA ();	// this would simply fire the very first CALL tag in the DMA chain

		}
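The same structure as a small self-contained sketch: each draw call only records a CALL entry, and the flush plus DMA kick happens once per frame. `FrameList`, `Object` and the chain indices are invented for illustration, and `Fire` is a stand-in that just counts kicks rather than touching any hardware register.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame list of CALL entries; each entry names a static
// chain that was built before the main loop (index stands in for ADDR).
class FrameList {
public:
    void AddCall(std::size_t staticChain) { calls_.push_back(staticChain); }

    // Stand-in for the single cache flush + DMA kick of the frame.
    // Returns how many static chains this one kick covers, then resets.
    std::size_t Fire() {
        ++kicks_;
        const std::size_t covered = calls_.size();
        calls_.clear();
        return covered;
    }

    std::size_t Kicks() const { return kicks_; }

private:
    std::vector<std::size_t> calls_;
    std::size_t kicks_ = 0;
};

// Hypothetical drawable: Draw() never starts a transfer, it only records.
struct Object {
    std::size_t staticChain;
    void Draw(FrameList& frame) const { frame.AddCall(staticChain); }
};
```

Two objects drawn in one frame still cost a single `Fire()`; the number of kicks grows with frames, not with objects.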

Fafalada · Aug 9, 2004

Panajev2001a said:
The FireToVif function only sets the TADR, QWC and CHCR registers of the EE (channel 1, from memory to VIF1).

Which starts the DMA transfer - that's why you flushed the cache before calling it in your code also. Together these two functions will run you several thousand cpu cycles, so it's worth keeping that in mind when using them (for drawing it's easy to set it up to call them just once per frame, but when you get down to do other stuff like VU0 or IPU stuff, you can't really avoid multiple dma calls...).
Anyway I asked because the sample wasn't clear on where the stuff is executed, you made it sound like it's executed per object inside the loop.

What you described now is good.
Also, there's no such thing as completely static lists so you don't need to worry about that. You will always need to generate some things dynamically - but yes, trying to minimize that is where you save CPU time.

I need a word of advice (from C/C++ programmers): something basic...
