Is this accurate?

dodo3 · Dec 1, 2005

Edge said:
Yeah XENOS is a monster. After all, playing with Kameo tonight, I saw blurry backgrounds (crappy depth of field technique?), aliasing!!! (read: jaggies), and a total lack of anisotropic filtering, and all that from a MS first party developer.

You'd think that if anyone could have gotten the graphics right, it would have been them!

Nice shader power though, but too bad about all those other flaws. I really thought the next generation would solve all those problems, but I guess not.

[/sarcasm]

Is your signature accurate?

Edge · Dec 1, 2005

Yes, my signature is accurate, and I think that's one of the main reasons why CELL is so impressive.

Yes sarcasm on Xenos, as the end result is the most important thing, and Kameo falls short in a lot of ways, but then again, it is early software, and I'm sure developers will solve those flaws apparent in that game, given more time. I should hope so!

Jawed · Dec 1, 2005

Bobbler said:
I'd say that's insulting to Jawed. Jawed at least knows what he's talking about (if a bit overzealous about guesstimating performance).

Sometimes I know what I'm talking about - sometimes I'm guessing (and hopefully highlight that, consistently). Sometimes, I will admit, I'm throwing a bit of fat on the fire to get some of the more reticent (and useful) people to say something, anything. It's always good when you can get a straight answer from people like nAo and Fafalada.

I've definitely been burnt a few times on performance guesstimates...

The posting at the beginning of the thread is certainly a mish-mash.

Efficiency is still a great un-answered question (and somewhat multi-dimensional: ALUs v pipelines v texturing v pixel:vertex split, etc.) - there's a vast amount written about this in threads on here, and how unified shader architectures tackle one part of the issue. It's too early for any comparative benchmarking - perhaps with R600 (perhaps the first PC unified GPU) we'll get some insights.

Jawed

Shifty Geezer · Dec 1, 2005

It's one of those questions that can't be answered through discussion. All the pros and cons have already been weighed. No-one knows what the real-world benefits and losses are though. It almost seems to me, perish the thought, that one GPU will be better at some things and worse at others than the other GPU.

Yes! Can you believe it? That perhaps there is no one super-killer-GPU-that-beats-the-other-at-everything-and-makes-it-look-like-dirt? Now with any luck that won't be true, and one army of sixstars will be able to gloat over the other army of sixstars for the life of the hardware. Then I'm sure we'll all be happy. But there's still that risk that, maybe, they'll both perform fairly similarly, and neither side will have the smug satisfaction of knowing their machine is better than their rivals'. And I pity the gaming community if that's true - such an occurrence would seemingly end the point of all the console forums in the world.

Guden Oden · Dec 1, 2005

seismologist said:
I'm not too familiar with shader stalling but what if I send 500 vertex shading operations to Xenos. Is every shader going to switch over to work on vertices and stall any pixel shader ops?

I'm not a dev, but looking at publicly available info one can determine there is a dynamic scheduler in Xenos that sends vertex and pixel shading jobs off to the different shader execution units on the chip. How it balances between these two different types of tasks (every other, or ratio according to the total number of each type, or some other form), I've no idea. Maybe nobody outside ATi and possibly MS does.

I've also no idea how flexible the scheduler of xenos is. Assuming it has a large list of tiny one-pixel polys to draw, there might be some form of limit on the total number of polys it can process at any one time. While it has 48 shader units, it can only work on 8 pixel fragments at a time for example.

In the dedicated shaders I'd guess it would stall the vertex ops up to the number of vertex shaders while still processing pixels in the meantime.

It might be worth noting that so far on the PC side, vertex shaders rarely, if ever, stall the action. I've run Painkiller on my ancient Geforce 3, and despite some very complex levels, the game seems to be draw limited more than anything else even on such a primitive GPU, at least between fights when the CPU doesn't have to spend time ragdolling characters and running pathfinding and AI and such for enemies... And the GF3 has only ONE vertex shader. Which doesn't even complete one vertex/clock, I might add.

Farid · Dec 1, 2005

Question

Ok... What's going on in here?

I've read the enigmatic and unclear thread title (which had, interestingly, two question marks... I guess that emphasizes the question. Not.).

But I saw no point there.

I've read the first post, and it mentioned Gamespot forums...

But I saw no point there.

I've read the thread.

I found a few points made there, but I don't exactly comprehend what logical connectors are used to tie all those points together (shader efficiency, Z/color buffer bandwidth, the usual quality of Jawed's posting, guesswork of Gamespot forum members, etc.).

Ok... Anyway, next time, !eVo!-X Ant UK, when you start a thread or make a post, read this first:
http://www.beyond3d.com/forum/showthread.php?t=22121

- A clear thread title, plus correct punctuation, plus a good subject of discussion (if possible) make a good discussion.

And if this thread turns into a discussion about the efficiency of the different shader architectures in real-world situations, then it'll remain open. If it continues sliding into the everything/nothing mish-mash oblivion, it will be locked.

PS: It might be better if one would start a new thread with a comprehensive title and all. But I'm just saying...

!eVo!-X Ant UK · Dec 1, 2005

There were 25 other posts before yours and you're the only one to complain and not have a clue about the thread title. The others understood what it means and got on with discussing the topic.

Jawed · Dec 1, 2005

Guden Oden said:
I'm not a dev, but looking at publicly available info one can determine there is a dynamic scheduler in Xenos that sends vertex and pixel shading jobs off to the different shader execution units on the chip. How it balances between these two different types of tasks (every other, or ratio according to the total number of each type, or some other form), I've no idea. Maybe nobody outside ATi and possibly MS does.

They're being deliberately secretive. All we know is that 31 vertex and 63 pixel threads (it's better to call them threads than batches, really) are supported concurrently - though if shaders use more than 12 registers (I think 12 is the cut-off) then concurrent thread count falls off. Each thread consists of either 64 vertices or 64 fragments (pixels).

There's a thread somewhere that discusses scheduling techniques.

It's worth observing that pixel shaders tend to perform lots of texturing (which means lots of latency due to memory accesses or texture filtering) whereas texturing in vertex shaders is (currently, at least) pretty rare. This means that pixel shaders will often be "halting" for a thread-context switch while texturing takes place. This is a natural time for vertex shaders (which won't be halting if there's no texturing) to use the shader ALUs. To be fair, other pixel shader threads will also be grabbing a slice of the action, so this is far from a complete characterisation of scheduling.
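The latency-hiding idea in the paragraph above can be sketched as a toy model. This is only an illustration under stated assumptions, not how the Xenos sequencer actually works: the single issue slot, the 4-cycle `TEX_LATENCY`, the "first ready thread wins" policy, and the names `Thread` and `run` are all made up for the example. Every instruction (including the fetch itself) takes one issue cycle, and a thread that issues a texture fetch is parked until the data would arrive.

```python
from dataclasses import dataclass

TEX_LATENCY = 4  # assumed fetch latency in cycles; real figures are not public

@dataclass
class Thread:
    name: str
    ops: str           # 'A' = ALU instruction, 'T' = texture fetch
    pc: int = 0        # index of the next instruction to issue
    ready_at: int = 0  # first cycle at which the thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.ops)

def run(threads):
    """Issue at most one instruction per cycle; return (busy_cycles, total_cycles)."""
    cycle = busy = 0
    while not all(t.done() for t in threads):
        ready = [t for t in threads if not t.done() and t.ready_at <= cycle]
        if ready:
            t = ready[0]  # hypothetical policy: the first ready thread wins
            op = t.ops[t.pc]
            t.pc += 1
            # A texture fetch parks the thread until its data arrives.
            t.ready_at = cycle + 1 + (TEX_LATENCY if op == "T" else 0)
            busy += 1
        cycle += 1
    return busy, cycle

if __name__ == "__main__":
    alone = run([Thread("pixel", "TATA")])
    mixed = run([Thread("pixel", "TATA"), Thread("vertex", "AAAAAAAA")])
    print("pixel thread alone: busy/total =", alone)
    print("pixel + vertex:     busy/total =", mixed)
```

In this model the texture-heavy pixel thread on its own leaves the issue slot idle while it waits, and a texture-free vertex thread fills those gaps. Real hardware interleaves many threads of both kinds, so this shows the principle rather than predicting any real utilisation figure.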

I've also no idea how flexible the scheduler of xenos is. Assuming it has a large list of tiny one-pixel polys to draw, there might be some form of limit on the total number of polys it can process at any one time.

Yeah, the debate on this is still open... Suggestions that all the pixels/polys running the same shader will be batched-together are only mildly convincing to me. I suspect there's a crude limit to this.

While it has 48 shader units, it can only work on 8 pixel fragments at a time for example.

It can work on 3 threads, each of 16 fragments - or 2 threads of fragments and 1 of vertices; or 1 thread of fragments and 2 of vertices; or 3 threads of vertices.
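Taking the figures in this post at face value (3 threads in flight, 16 fragments or vertices each, which matches the 48 shader units mentioned earlier), the possible mixes can be enumerated with a few lines. The constant and function names are hypothetical, and the numbers are the poster's, not confirmed hardware documentation.

```python
ARRAYS = 3             # from the post: three threads executing at once
LANES_PER_ARRAY = 16   # from the post: 16 fragments (or vertices) per thread

def mixes(arrays=ARRAYS):
    """Return every (fragment_threads, vertex_threads) split across the arrays."""
    return [(frag, arrays - frag) for frag in range(arrays, -1, -1)]

if __name__ == "__main__":
    for frag, vert in mixes():
        print(f"{frag} fragment thread(s) x {LANES_PER_ARRAY} lanes, "
              f"{vert} vertex thread(s) x {LANES_PER_ARRAY} lanes, "
              f"total lanes = {ARRAYS * LANES_PER_ARRAY}")
```

Each of the four splits keeps all 48 lanes occupied, which is the point of the reply: the split between pixel and vertex work is not fixed.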

Jawed

Pugger · Dec 1, 2005

EDGE, and SN Systems is owned by whom? Yep that's it, SONY. And when that interview was conducted Sony were in the final days of buying them and the guys who made that statement were sitting pretty on a pile of cash. Further, in that article from EDGE they failed to admit that SN never had access to a 360 development system, were never going to get access to a 360 and got their information from 3rd parties. Still it was a good article to read.

Alpha_Spartan · Dec 1, 2005

SN Systems is about as objective as Factor5 is when it comes to the PS3. Moneyhats always come with a complimentary pair of rose-colored glasses.

Shifty Geezer · Dec 1, 2005

!eVo!-X Ant UK said:
There were 25 other posts before yours and you're the only one to complain and not have a clue about the thread title. The others understood what it means and got on with discussing the topic.

Only Vysez is a mod concerned about the structure of the information in the forum, and I agree with him. I actually posted more in my response, saying discussion of other forums' threads isn't really appropriate, but deleted those comments as I'm not a mod and probably shouldn't be telling people what to write or not.

What you should really do/have done, to fit in with the ideology of the forum, is take the concepts you read in that Gamespot post and post them as questions, e.g.

I've just read a thread looking at the comparative GFlop ratings for RSX and Xenos, and it suggests the attainable performance from RSX is substantially below that of Xenos. Key points were that RSX only has 4 GFlops worth of vector processing capability, whereas Xenos can use its entire range of 240 GFlops on vertices if the dev wants, and that a lot of RSX Flops are unusable when those pipes are doing texture ops. It also says Xenos is 90-95% efficient whereas RSX is likely to be 50-60% efficient. The article's here.

Are these points valid? Specifically that RSX has at most 4.4 GFlops of vertex processing power, loses a lot of shader power to texture ops, and may be nearly half as efficient as Xenos in operation?

Okay, that's something of a long-winded way of saying what you wanted to know and which you conveyed anyway, but the format of posting a thread from another forum with not much more than a 'discuss' tagline is frowned upon. A thread should ideally have specific questions that can be picked up and discussed. Without a leading question debate just rambles, and that's a pain in the arse when information gets spread around different threads, e.g. if something were mentioned here about RSX that was really worth knowing, it'd be lost when it came time to search. Having information pertaining to RSX in RSX threads, for example, keeps everything together. Ideally a thread's title should convey the discussion that thread will cover, which should be a technical question in the main, this being the technical forum.

Dave Baumann · Dec 1, 2005

Jawed said:
They're being deliberately secretive. All we know is that 31 vertex and 63 pixel threads (it's better to call them threads than batches, really) are supported concurrently - though if shaders use more than 12 registers (I think 12 is the cut-off) then concurrent thread count falls off. Each thread consists of either 64 vertices or 64 fragments (pixels).

According to ATI there are 64 PS and 48 VS threads.

It can work on 3 threads, each of 16 fragments - or 2 threads of fragments and 1 of vertices; or 1 thread of fragments and 2 of vertices; or 3 threads of vertices.

In context at any one point in time are 6 threads, two for each of the ALU arrays - plus whatever is going on over the two texture arrays.

You shouldn't get "halting" on the shader arrays, if possible. Because there are two threads in context, as one thread ends or requires another instruction, the next thread is operated on for an instruction, and a third thread can then be placed in context while the second is in operation. Should a thread be running and a texture instruction be required, it is placed back into the reservation station until such a time as the texture data is ready.

Jawed · Dec 1, 2005

Dave, I think you're confusing render state contexts with threads. Previous GPUs such as R420 could only support a single render state, whereas Xenos supports 8 concurrently.

Each render state's commands will ultimately generate thousands of pixels. It's a hierarchical level above threading. Presumably in multi-pass rendering of a frame there might be 10, 20 or more render states in use.

The Sequencer in Xenos not only manages scheduling due to load-balancing (vertex versus fragments) and fetch latency (vertex or texture), but it also directly supports render state switching.

Being frank, I don't know what programmers would do with concurrent multiple render states (particularly if they need to run sequentially), but the documentation encourages devs to use them in a disciplined fashion as a way to hide the latency that's normally incurred by render-state switching, since they can overlap each other in time.

Also, when I was referring to "halting" it was in relation to a thread, not a pipeline. When a thread "halts" it's because of a vertex or texture fetch. That's a direct signal to the Sequencer to shift that thread into a different mode and swap contexts.

Jawed

seismologist · Dec 1, 2005

So Jawed...correct me if I'm wrong (trying to boil this down).

The 50% efficiency number for conventional GPU doesn't come from stalling on vertex vs. pixel ops.
But it comes from stalling pixel shaders when there's a texture request?

So the real benefit of Xenos seems to be the way it's handling texture requests. This doesn't seem to be tied to the fact that it's using unified shaders.
What's to stop nvidia from using a similar mechanism to improve efficiency while still using dedicated pixel shaders?

ERP · Dec 1, 2005

seismologist said:
So Jawed...correct me if I'm wrong (trying to boil this down).

The 50% efficiency number for conventional GPU doesn't come from stalling on vertex vs. pixel ops.
But it comes from stalling pixel shaders when there's a texture request?

So the real benefit of Xenos seems to be the way it's handling texture requests. This doesn't seem to be tied to the fact that it's using unified shaders.
What's to stop nvidia from using a similar mechanism to improve efficiency while still using dedicated pixel shaders?

I have no idea where ATI got the 50% number from, but there is a significant loss of performance across a frame because of pixel shaders stalling vertex shaders and the opposite case.

It's incredibly hard to get useful figures; you can't just benchmark it. Xbox had enough performance counters that you could estimate the loss of pixel shader output, but how often vertex shaders were stalled would have had to be a guess.

I'm sure ATI/NVidia probably have the facilities internally to establish the amount of idle time, but even with that a lot of the data is going to be application dependent; there is no way you can put one number on it.

Jawed · Dec 1, 2005

It's a combination of all the factors I listed earlier, pipeline stalls, imbalances between vertex/pixel pipe utilisation, ALUs sitting idle due to instruction dependency (including texturing), etc.

Xenos is the only GPU with this architecture and there aren't any usable benchmarks that I'm aware of.

A simple example of Xenos's gain from the USA relates to the rendering passes that work purely on geometry (there's no pixel shading required), e.g. Z-buffer pre-fill. These rendering passes can utilise all 48 shader pipelines, hence tackling a higher geometry complexity than, say, an 8 pipeline SM3 GPU running at the same speed in the same time. In this scenario, an 8v/24p GPU is wasting 75% of its pipelines.
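The 75% figure above is simple arithmetic, and it can be checked with a short sketch. The assumptions are the ones stated in the post plus one more that is purely for illustration: every pipeline, unified or dedicated, has the same per-clock throughput at the same clock speed, and nothing else (setup, bandwidth, CPU) is the bottleneck. The function names are hypothetical.

```python
def idle_fraction(vertex_pipes: int, pixel_pipes: int) -> float:
    """Fraction of a dedicated design's pipes that sit idle in a geometry-only pass."""
    return pixel_pipes / (vertex_pipes + pixel_pipes)

def geometry_throughput_ratio(unified_pipes: int, vertex_pipes: int) -> float:
    """Idealised geometry throughput of a unified design relative to dedicated vertex pipes."""
    return unified_pipes / vertex_pipes

if __name__ == "__main__":
    # 8 vertex + 24 pixel pipes, as in the post's 8v/24p example
    print(f"idle in a Z pre-fill pass: {idle_fraction(8, 24):.0%}")
    # 48 unified pipes versus 8 dedicated vertex pipes
    print(f"idealised geometry ratio:  {geometry_throughput_ratio(48, 8):.1f}x")
```

The throughput ratio is an upper bound under these idealised assumptions, not a measured result; as the rest of the post says, no usable benchmarks exist yet.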

This thread, intermittently, is quite useful:

http://www.beyond3d.com/forum/showpost.php?p=581315&postcount=64

That posting refers to 5 vertex shader passes.

Currently there's no evidence from R520 to support/deny that out of order thread scheduling is a win (a feature shared by Xenos, and directly related to efficiency gains centred on ameliorating texturing latency). That's because GPUs already have advanced texture-latency hiding techniques. It's frustrating not getting a clear indication.

RV530 might provide such evidence (since it utilises a 3:1 ALU:Texture ratio, like Xenos) but it's clouded by too much other stuff. With only 4 texture pipes it seems to be radically better at texturing than RV515 (also 4 texture pipes and nearly identical core and memory clocks). Hinting that higher ALU:Texture ratio is a good thing - but as I say, clouded by other variables.

So, right now, we have no evidence for a range of efficiency gains in Xenos that relate specifically to the ALU:texture workload of typical game shaders.

Jawed

Edge · Dec 1, 2005

Pugger said:
EDGE, and SN Systems is owned by whom? Yep that's it, SONY. And when that interview was conducted Sony were in the final days of buying them and the guys who made that statement were sitting pretty on a pile of cash. Further, in that article from EDGE they failed to admit that SN never had access to a 360 development system, were never going to get access to a 360 and got their information from 3rd parties. Still it was a good article to read.

I know SN Systems are owned by Sony, as they were bought by them recently. Still my signature reflects what I would say about CELL, and so that's why I like it.

Now if I say it, and I'm not owned by Sony, at least I don't think I am, does that make it more valid?

London Geezer · Dec 1, 2005

Edge said:
I know SN Systems are owned by Sony, as they were bought by them recently. Still my signature reflects what I would say about CELL, and so that's why I like it.

Now if I say it, and I'm not owned by Sony, at least I don't think I am, does that make it more valid?

It would make it more credible as there would be no conflict of interest.

aaaaa00 · Dec 1, 2005

london-boy said:
It would make it more credible as there would be no conflict of interest.

Only if such a person was someone who was likely to have any credible knowledge of the subjects in question in the first place.

London Geezer · Dec 1, 2005

aaaaa00 said:
Only if such a person was someone who was likely to have any credible knowledge of the subjects in question in the first place.

Well yes, but I was saying that any involvement of the person with Sony would make him less credible, whether he's Einstein or not.
