Questions about PS2

That says four passes is equivalent to something. Multipass rendering can't really be capped in hardware beyond how many times you can read and write buffers.
And because so many passes are available, there is a need for such high eDRAM bandwidth?
Yes. That's Sony's solution to the problem. Rather than draw a few times with complicated drawing and low BW requirements, draw lots of times with simple drawing requiring massive BW.
 
But is it the frame buffer that gets rewritten, or the textures? And BTW, how do textures even get into the eDRAM? It has 3 buses: one for frame-buffer writes, one for frame-buffer reads and one for texture reads. But how do the textures get in?
 
Then how were all those great-looking games possible, like GT4, GOW2, Black and many others? I just can't believe that all this was done only on the EE main core + VU1.

I was bored so I decided to try this.

Black, switching to the VU0 interpreter drops me from around 200% speed to 120%.

Jak II goes from ~130% to 70% which is even bigger.

MGS2 goes from around 120% to 70%, and MGS3 is ~210% -> 105%.

GT4 only goes from 150% to 145%.

Ratchet and Clank 3 though drops from 85% to under 10%...

I also tried LoTR: The Two Towers, and it can go from over 200% to around 105%, which is also pretty huge.
 
WOW. Please explain what the VU0 interpreter is, and what all those percentages mean. :D
 
Well, PCSX2 gives you a few choices in ways to emulate the PS2 CPUs (EE, IOP, VU0 and VU1): an interpreter or a JIT compiler (PCSX2 calls them dynarecs, or dynamic recompilers). The interpreter is the slowest option; it works by looking at each instruction, decoding it and then emulating it. I'm not an expert on this, but the way I understand a JIT is that it translates larger blocks of code (instead of single instructions) before executing them, while also saving the translated blocks in memory (so it doesn't have to redo the translation each time they come up). A JIT compiler will usually be a lot faster than an interpreter.

And those percentages are percent of full speed.
 
No, that's total. But think about what's happening here. A screen has 0.3M pixels. Drawing 12 million pixels in one frame means drawing 40x as many pixels as there are on the display. That means the equivalent of 40 passes, not two, with the average being 12 times. So that's a typical 12 passes (assuming a pass is a full screen's worth of pixels). From another perspective, 4K has 8 million pixels per screen. PS2 was typically drawing one and a half 4K screens' worth of pixels, breaking the final image down into lots of separate layers, each drawn on top of the previous work.
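
The arithmetic behind the 40x and "one and a half 4K screens" figures can be checked with a quick sketch. The exact resolutions (640x480 for "0.3M pixels" and 3840x2160 for "8 million") are my assumption; the post only gives the rounded numbers.

```python
def screens_worth(pixels_drawn, width, height):
    """How many full screens of the given resolution the drawn pixels add up to."""
    return pixels_drawn / (width * height)


PIXELS_PER_FRAME = 12_000_000  # the 12 million pixels per frame quoted above

# Assumed resolutions matching the rounded figures in the post.
ps2_screens = screens_worth(PIXELS_PER_FRAME, 640, 480)       # about 39x
uhd_screens = screens_worth(PIXELS_PER_FRAME, 3840, 2160)     # about 1.45x
```

So 12 million pixels is roughly 39 full 640x480 screens, or a bit under one and a half 4K screens, in line with the rounded numbers in the post.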

Remember that there's also overdraw, which can easily be > 4x depending on the scene. I doubt anything uses something like 40 passes.

Looking at change in FPS or percent speed is not a good way to look at how well an optimization works in an emulator. This is a huge pet peeve for emulator developers who care about improving performance :p

This is what the numbers look like when converted to ms differentials:

Jak II: 12.82 -> 23.81 (+10.99)
MGS2: 13.89 -> 23.81 (+9.92)
MGS3: 7.94 -> 15.87 (+7.93)
GT4: 11.11 -> 11.49 (+0.38)
R&C3: 19.61 -> 166.7 (+147.09)
LOTR: 8.33 -> 15.87 (+7.54)

So the cost is about 7 to 11ms in most cases with one extreme positive and one extreme negative outlier. Without knowing what kind of hardware you ran this on it's hard to really try to guess at how severe that looks, but I doubt that's what interpreting heavy VU0 loads looks like. Interpreting VU0 is going to be much, much less efficient than recompiling it, especially since they would have little incentive to optimize it.

That and the magnitude of cost for Ratchet & Clank 3 would suggest that it has a very heavy load and the others have very light loads (5-10% utilization) with one that basically doesn't use VU0 at all. But I can't really say this definitively without knowing more about what the games and emulator are doing at a lower level. eg, it's possible Ratchet & Clank 3 does something that's a pathologically bad case for the VU0 interpreter.
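
For anyone wanting to redo the conversion: the millisecond figures above match what you get if 100% speed means 60 frames per second. That 60 fps baseline is an assumption on my part (it would be 50 for a PAL game), and the function names are just for illustration.

```python
def frame_time_ms(speed_percent, native_fps=60.0):
    """Time spent per emulated frame, in milliseconds, at a given percent speed."""
    return 1000.0 / (native_fps * speed_percent / 100.0)


def interpreter_cost_ms(speed_before, speed_after, native_fps=60.0):
    """Extra milliseconds per frame after switching to the slower setting."""
    return (frame_time_ms(speed_after, native_fps)
            - frame_time_ms(speed_before, native_fps))


# Jak II: ~130% with the recompiler, ~70% with the VU0 interpreter.
jak2_cost = interpreter_cost_ms(130, 70)
# GT4: 150% -> 145%.
gt4_cost = interpreter_cost_ms(150, 145)
```

The reason to work in milliseconds is that the added time per frame is comparable across games, while a drop in percent speed depends heavily on how fast the game was running to begin with.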
 
Well, PCSX2 gives you a few choices in ways to emulate the PS2 CPUs (EE, IOP, VU0 and VU1): an interpreter or a JIT compiler (PCSX2 calls them dynarecs, or dynamic recompilers). The interpreter is the slowest option; it works by looking at each instruction, decoding it and then emulating it. I'm not an expert on this, but the way I understand a JIT is that it translates larger blocks of code (instead of single instructions) before executing them, while also saving the translated blocks in memory (so it doesn't have to redo the translation each time they come up). A JIT compiler will usually be a lot faster than an interpreter.

And those percentages are percent of full speed.
Great explanation! Thank you. So are you saying that with VU0 the speed of the game increases or decreases?

Remember that there's also overdraw, which can easily be > 4x depending on the scene. I doubt anything uses something like 40 passes.
Does that mean that no PS2 game used more than 4 passes? But that PDF already says that one game uses 40 passes. :-|

This is what the numbers look like when converted to ms differentials:

Jak II: 12.82 -> 23.81 (+10.99)
MGS2: 13.89 -> 23.81 (+9.92)
MGS3: 7.94 -> 15.87 (+7.93)
GT4: 11.11 -> 11.49 (+0.38)
R&C3: 19.61 -> 166.7 (+147.09)
LOTR: 8.33 -> 15.87 (+7.54)
Same question. Does it mean that using VU0 increases or decreases speed? Or does all this mean something different? :D
 
But is it the frame buffer that gets rewritten, or the textures? And BTW, how do textures even get into the eDRAM? It has 3 buses: one for frame-buffer writes, one for frame-buffer reads and one for texture reads. But how do the textures get in?

All data that goes to the eDRAM, be it primitive commands or textures, goes through the GIF and the GS.

The 4MB of eDRAM is a single shared memory. What's framebuffer vs what's texture is a function of what you're doing with the memory. Usually, the 16 pixel pipelines access the 1024-bit read and write interfaces to do depth + pixel read and depth + pixel output respectively. But the write bus is also connected to the host interface to allow for texture uploading while you're not outputting pixels.

Does that mean that no PS2 game used more than 4 passes? But that PDF already says that one game uses 40 passes. :-|

No, of course it doesn't mean that, you need to get this 4 pass thing out of your head. The technical spec you gave just gives an example of what per-pixel fillrate is available per-pass if 4 passes are used.

Read the PDF more closely because it never says "40 passes." It says 40 full screens worth of pixels. If you render a hill then render a person on top of it that means you're drawing multiple things at the same locations. That's what overdraw is. Part of that 40x surely includes substantial overdraw.

Same question. Does it mean that using VU0 increases or decreases speed? Or does all this mean something different? :D

I already said it in my post. If Ratchet & Clank 3 is what a normal "heavy" load looks like then that suggests the others that use VU0 at all are using it at about 5-10% at best.

But this is a very crude way of estimating it. Better would be if PCSX2 actually logs the utilization or number of VU0 instructions per frame or something.
 
Great explanation! Thank you. So are you saying that with VU0 the speed of the game increases or decreases?


Not completely sure what you're asking here. Using the interpreter slows down the emulation speed, which is what the percentages represent.

Does that mean that no PS2 game used more than 4 passes? But that PDF already says that one game uses 40 passes. :-|

No, he just means that some of those passes (is that the right term in this case? idk) were just drawing overlapping geometry.

Looking at change in FPS or percent speed is not a good way to look at how well an optimization works in an emulator. This is a huge pet peeve for emulator developers who care about improving performance :p

...

That and the magnitude of cost for Ratchet & Clank 3 would suggest that it has a very heavy load and the others have very light loads (5-10% utilization) with one that basically doesn't use VU0 at all. But I can't really say this definitively without knowing more about what the games and emulator are doing at a lower level. eg, it's possible Ratchet & Clank 3 does something that's a pathologically bad case for the VU0 interpreter.

Yeah, it's probably a really bad way of measuring it. Maybe Ratchet just really pushes the VU0 though! :p
 
All data that goes to the eDRAM, be it primitive commands or textures, goes through the GIF and the GS.
I was just asking about it with regard to multipass. Does the GS write to the frame buffer or to textures in eDRAM during multipass?

The 4MB of eDRAM is a single shared memory. What's framebuffer vs what's texture is a function of what you're doing with the memory. Usually, the 16 pixel pipelines access the 1024-bit read and write interfaces to do depth + pixel read and depth + pixel output respectively. But the write bus is also connected to the host interface to allow for texture uploading while you're not outputting pixels.
Great! This is what I wanted to know. Thank you.

Read the PDF more closely because it never says "40 passes." It says 40 full screens worth of pixels. If you render a hill then render a person on top of it that means you're drawing multiple things at the same locations. That's what overdraw is. Part of that 40x surely includes substantial overdraw.
I don't think I understood you completely, but I'll try. :D But as you said, if you render a hill, then a person on top of it, does that mean this is two passes or not? Because as I understand it, multipass rendering is when you texture polygons in the first pass, then texture them again in a second pass to get (as an example) a specular map, then a third time in a third pass to get normal mapping, then a fourth time in a fourth pass to get glow mapping, then a fifth time in a fifth pass to get motion blur, etc. That is multipass rendering, and you just make as many passes as you need to get the result you want. Am I right?

I already said it in my post. If Ratchet & Clank 3 is what a normal "heavy" load looks like then that suggests the others that use VU0 at all are using it at about 5-10% at best.
OK. Sorry. I was just confused and thought that if you use VU0 you get worse results.
 
Yeah, it's probably a really bad way of measuring it. Maybe Ratchet just really pushes the VU0 though! :p

If it is, maybe they talk about it somewhere buried in here:

https://www.youtube.com/playlist?list=PL7D649DF9B66678B7&feature=plcp

I'm not going to listen to all that to find out though ;)

Another comparison point would be to see what the impact of turning on VU1 interpretation is like. It'll probably be (usually) much worse than that from VU0 interpretation.
 
Not completely sure what you're asking here. Using the interpreter slows down the emulation speed, which is what the percentages represent.
My question was: if a developer uses VU0, does it increase or decrease overall game speed? Because sometimes using VU0 just slows down the main core of the EE.

No, he just means that some of those passes (is that the right term in this case? idk) were just drawing overlapping geometry.
Aha, now it's clear. But was it bad after all? I think the game ran superbly and looked great for 2002.
 
I was just asking about it with regard to multipass. Does the GS write to the frame buffer or to textures in eDRAM during multipass?

Passes don't write to textures. Texture upload commands write to textures. But it's sort of arbitrary what's texture and what's framebuffer other than how you're currently using it. For example, what's the framebuffer at one point could end up being textures later if the game is doing motion blur.

Technically "framebuffer" isn't really a great term anyway, it's generally going to be a backbuffer, or you could say render target.

I don't think I understood you completely, but I'll try. :D But as you said, if you render a hill, then a person on top of it, does that mean this is two passes or not? Because as I understand it, multipass rendering is when you texture polygons in the first pass, then texture them again in a second pass to get (as an example) a specular map, then a third time in a third pass to get normal mapping, then a fourth time in a fourth pass to get glow mapping, then a fifth time in a fifth pass to get motion blur, etc. That is multipass rendering, and you just make as many passes as you need to get the result you want. Am I right?

Yes, that is what passes are like. Generally, multiple passes use the same geometry but different textures, colors, modulation and blend coefficients, etc.

The person on the hill example is not two passes, it's an overlap within the same pass.
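
A tiny sketch of the difference, counting pixel writes on a made-up 8x8 "screen". The rectangles standing in for sky, hill and person are purely illustrative, and real hardware obviously draws triangles with textures and blending rather than counting writes. Overlap between objects adds writes inside one pass, while each extra pass redraws the same geometry and multiplies the total.

```python
def draw_rect(write_counts, x0, y0, x1, y1):
    """Record one write for every pixel covered by the rectangle [x0,x1) x [y0,y1)."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            write_counts[y][x] += 1


def writes_per_screen_pixel(width, height, rects, passes):
    """Total pixel writes divided by screen size, i.e. "screens' worth" drawn."""
    write_counts = [[0] * width for _ in range(height)]
    for _ in range(passes):            # each pass redraws the same geometry
        for rect in rects:             # overlap within a pass is overdraw
            draw_rect(write_counts, *rect)
    total_writes = sum(sum(row) for row in write_counts)
    return total_writes / (width * height)


# Hypothetical scene: full-screen sky, a hill on the bottom half, a person on the hill.
SCENE = [(0, 0, 8, 8), (0, 4, 8, 8), (3, 2, 5, 6)]

one_pass = writes_per_screen_pixel(8, 8, SCENE, passes=1)
two_pass = writes_per_screen_pixel(8, 8, SCENE, passes=2)
```

With one pass this scene already writes about 1.6 screens' worth of pixels because of the overlap, and two passes double that. This is why a figure like "40 screens' worth of pixels" can't be read directly as a pass count.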

My question was: if a developer uses VU0, does it increase or decrease overall game speed? Because sometimes using VU0 just slows down the main core of the EE.

If any game uses VU0 to net negative effect then the game's management/coding oversight must have been terrible. Programming for the VUs is hard, and that always means incurring more risk that your game will have difficult-to-fix bugs. It's hard to imagine there were developers using VU0 who weren't also properly profiling game performance before and after doing so, and if the VU0 optimizations had only a very minor positive effect they probably wouldn't be allowed to stay in (much more so if they had a negative effect).

My guess is that whatever (probably relatively few) games that used VU0 heavily were designed with VU0 in mind early on.
 
Another comparison point would be to see what the impact of turning on VU1 interpretation is like. It'll probably be (usually) much worse than that from VU0 interpretation.

Oh yeah, it's much worse. At least most of the time. Tekken Tag Tournament is one game that takes about the same hit from interpreting VU0 and VU1. I heard it does all its 3D calculations on the EE and VU0.

My guess is that whatever (probably relatively few) games that used VU0 heavily were designed with VU0 in mind early on.

That's probably true, but, if I understand right, VU0 was mostly used in macro mode, which doesn't show up on the PS2 performance analyzer. So the actual usage could be considerably higher than 8%.
 
Oh yeah, it's much worse. At least most of the time. Tekken Tag Tournament is one game that takes about the same hit from interpreting VU0 and VU1. I heard it does all its 3D calculations on the EE and VU0.

If the VU0 and VU1 loads are similar, my guess is that VU1 is still doing 3D. VU0 has no interface to the GS, so I think at some point its results would have to go through the CPU and then VU1 or the GS. Not using VU1 at all for primitive data calculation seems very odd. Maybe there was multi-pass vector calculation.

Maybe VU0 is doing higher level geometry stuff. I was wondering about occlusion testing but that probably doesn't fit very well with VU0's tiny memory.

That's probably true, but, if I understand right, VU0 was mostly used in macro mode, which doesn't show up on the PS2 performance analyzer. So the actual usage could be considerably higher than 8%.

Would still show up in the profiler as something though.

I haven't used PCSX2; I hear that it has utilization percentages in the top bar. Does it list VU0 independently?

I remember asking a PS2 emulator dev the macro vs micro utilization question but that was probably over 10 years ago and I don't remember the answer :( All I can find right now is this presentation that says macro mode is the "most popular method, hands down": http://webpages.charter.net/atruseps/VU-Assembly.ppt

You'd kind of think that all these games with low VU0 utilization are using macro mode while the rare ones with high utilization tend to use micro, otherwise there'd be too little time left over for the normal CPU processing. But who knows.

If games use macro mode a lot I would imagine that's more expensive to emulate since the emulator has to go back and forth between CPU and VU0 state a lot.
 
I haven't used PCSX2; I hear that it has utilization percentages in the top bar. Does it list VU0 independently?

Nope. By default it just has EE and GS load. VU1 only shows up when you enable the mtvu option.

I remember asking a PS2 emulator dev the macro vs micro utilization question but that was probably over 10 years ago and I don't remember the answer :( All I can find right now is this presentation that says macro mode is the "most popular method, hands down": http://webpages.charter.net/atruseps/VU-Assembly.ppt

You'd kind of think that all these games with low VU0 utilization are using macro mode while the rare ones with high utilization tend to use micro, otherwise there'd be too little time left over for the normal CPU processing. But who knows.

I don't know, I just saw this post at some point a while ago.

https://forum.beyond3d.com/posts/184625/
 
Thanks, very insightful thread.

I can see why developers would prefer VU0 macro mode. In addition to being easier to program it has some big advantages being fed by the CPU's caches.

It looks like in hindsight Sony should have just had macro mode. Without micro mode there'd also be no need to have the integer ALUs or registers, control flow, fetch, dedicated memory interface, or 4K instruction + 4K data RAMs. So it'd probably save a significant amount of space, that could be used at a minimum to bring the L1 dcache up to 16KB. It looks like there's a consensus on the CPU being too limited by the weak cache subsystem - ideally some L2 cache would have been available.

I think they would have also been better off with a traditional SIMD setup where integer/FP SIMD and scalar FP share the same register file, while the remaining scalar integer part uses another register file. This makes much more sense with respect to the datapaths these items take in the CPU. Instead, R5900 + VU0 macro mode is an unusual combination of 128-bit SIMD shared with the general purpose registers, a dedicated scalar single-precision FPU and register file, the VU0 128-bit FP register file and a 16-bit integer register file.
 
What do you guys think was the most technically impressive game on PS2? Personally I'd say Ghost Hunter: it has high-poly character models, a light map for the flashlight (which is absent in all the Silent Hill games on PS2), 3D water, individual blades of grass that interact with the character, 480p + widescreen, and post FX that were advanced for the time. All of that with solid performance; I thought I was playing a high-end Xbox 1 game. Many say GT4, since it has 1080i support and runs at 60fps with car models that were high quality for the time.
 