Xenos Questions

Has 3Dc or 3Dc2 been implemented for use in the Xbox2, since it is a bandwidth-saving feature? If so, what are the new features (if any) of 3Dc2?

--------------------

If the R500 does have 3Dc or higher please ask this question.

Does the R500 have any other features that haven't been mentioned i.e. Fast14?

or

Has Fast14 technology been implemented for use in the R500 or on any part of the Xbox2?
 
Pete said:
Luminescent said:
Can its units work on both vertex and pixel streams, simultaneously, within the same clock cycle?
You might have seen this already, but apparently not. (Sorry, Dave.)
Thank you very much for pointing that out. Let's see if Dave can confirm whether the work-arounds mentioned in that thread are the ones R500 employs.
 
NP, Lumi.

Hmmm, the rumor was that Xenos would only be able to output 8 pixels per clock. ET says otherwise:

The 48 ALUs are divided into three SIMD groups of 16. When it reaches the final shader pipe, each of the 16 ALUs has the ability to write out two samples to the 10MB of EDRAM. Thus, the chip is capable of writing out a maximum of 32 samples per clock. At 500MHz, that means a peak fill rate of 16 gigasamples.
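The arithmetic behind that quoted figure can be checked with a quick sketch. The numbers are taken straight from the ET quote; the variable names are only illustrative, and this says nothing about how the hardware is actually organized.

```python
# Peak fill rate as described in the ExtremeTech quote.
# All names here are illustrative, not from any ATI documentation.
ALUS_WRITING = 16        # ALUs in the final shader pipe, per the quote
SAMPLES_PER_ALU = 2      # samples each can write per clock, per the quote
CLOCK_HZ = 500_000_000   # 500 MHz

samples_per_clock = ALUS_WRITING * SAMPLES_PER_ALU
peak_samples_per_second = samples_per_clock * CLOCK_HZ

print(samples_per_clock)                # 32
print(peak_samples_per_second / 1e9)    # 16.0 (gigasamples per second)
```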

Do samples per clock translate directly into pixels drawn per clock, or does that relate to z-/stencil-ops or AA ops or FP buffer ops or anything other than physically updating the back-buffer? If it's 32 pixels per clock, why double that of current high-end PC GPUs for "only" 720p--and a shader-laden 720p, at that?

(Eh, this is probably delving into discussion territory, but I'm not sure how ET went from three SIMD groups of 16 ALUs to two samples from each of the 16 ALUs. What happened to the other 32 ALUs? Does each SIMD group take turns, outputting every third clock?)

Edit: Xmas, even I should've known how samples translate to pixels, as I did manage to read Dave's X800 and 6800 articles. Shame on me, and thanks. Actually, that raises a logical follow-up: if the 32 samples per clock figure is correct, does that mean R500 can do 4xMSAA in a single clock (8 pixels/clock * 4 samples/pixel = 32 samples/clock), vs. the 2x current cards are limited to?

MoHonRi, I don't mean to dip this thread any further into discussion (as Dave warned), but Xmas' point was that a unified shader is possible because pixel and vertex shading operations are so similar. Thus, there isn't separate, exclusive vertex and pixel functionality in each of the 48 ALUs. Rather, each ALU is more general purpose than a typical, discrete pixel or vertex shader (on, say, a GF6 or X800) b/c it can perform either operation. I don't think a typical pixel shader ALU can perform typical vertex shader functions. The appeal of a unified shader architecture is that, if you become vertex or pixel limited, you can switch all of your shader units to either functionality, whereas with a typical, split architecture, you can't use an idle pixel shader unit to supplement maxed out vertex shader units.
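To make the load-balancing argument concrete, here is a toy sketch. Everything in it is an assumption for illustration: the 8/40 split, the workload sizes, the abstract "unit-clocks" of work, and the idea that a unified pool balances perfectly with no switching cost. It is not a model of R500 or of any real split-architecture card.

```python
def split_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Clocks needed when each pool can only process its own kind of work.

    The frame takes as long as the slower of the two pools.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def unified_time(vertex_work, pixel_work, units):
    """Clocks needed when any unit can take either kind of work.

    Assumes ideal load balancing and zero switching overhead.
    """
    return (vertex_work + pixel_work) / units


# Hypothetical vertex-heavy workload, in abstract unit-clocks of work.
vertex_work, pixel_work = 960, 960

# Hypothetical split design: 8 vertex units + 40 pixel units (48 total).
print(split_time(vertex_work, pixel_work, 8, 40))   # 120.0, pixel units sit idle
# Unified design with the same 48 units.
print(unified_time(vertex_work, pixel_work, 48))    # 40.0
```

With these made-up numbers the split design is vertex limited while most of its pixel units idle, which is exactly the case a unified design is meant to avoid.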
 
Unknown Soldier said:
Does the R500 have any other features that haven't been mentioned i.e. Fast14?

or

1) Has Fast14 technology been implemented for use in the R500 or on any part of the Xbox2?

I would also like to know about this tech (however, it looks too slow to use).

If it isn't used, why not?

2) What is "Fluid Reality"? A constructive answer, please (i.e. no PR talk).
 
I would like to know what type of real-world performance they are getting from this thing in comparison to their current top-of-the-line GPUs. Will software need to go through some massive optimization before we see real-world performance improvements?
 
Re: NP, Lumi.

Pete said:
Do samples per clock translate directly into pixels drawn per clock, or does that relate to z-/stencil-ops or AA ops or FP buffer ops or anything other than physically updating the back-buffer? If it's 32 pixels per clock, why double that of current high-end PC GPUs for "only" 720p--and a shader-laden 720p, at that?
Pixels/clock is samples/clock divided by samples/pixel (i.e. the antialiasing mode). However, the limiting factor is pixels/clock first, so without AA you just get fewer samples/clock but the same number of pixels. Both R420 and NV40 can output 32 samples/clock but only 16 pixels/clock with either no AA or 2xAA, and 8 pixels/clock with 4xAA.
R500 seems to be limited to 8 pixels/clock with color, and 16 with Z/stencil only.
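The relationship between the two limits can be sketched as follows. The function names are made up for illustration; the R420/NV40 limits (16 pixels/clock, 32 samples/clock) are the ones stated above.

```python
def pixels_per_clock(max_pixels, max_samples, aa_samples=1):
    """Pixels written per clock: capped by the pixel limit first,
    then by how many whole pixels fit into the sample limit."""
    if aa_samples < 1:
        raise ValueError("aa_samples must be at least 1")
    return min(max_pixels, max_samples // aa_samples)


def samples_per_clock(max_pixels, max_samples, aa_samples=1):
    """Samples actually written per clock at the given AA mode."""
    return pixels_per_clock(max_pixels, max_samples, aa_samples) * aa_samples


# R420 / NV40 limits as stated in the post above.
for aa in (1, 2, 4):
    print(aa, pixels_per_clock(16, 32, aa), samples_per_clock(16, 32, aa))
# 1 16 16
# 2 16 32
# 4 8 32
```

Note that without AA the sample rate drops to 16 while the pixel rate stays at 16, which is the "pixels/clock first" point.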

(Eh, this is probably delving into discussion territory, but I'm not sure how ET went from three SIMD groups of 16 ALUs to two samples from each of the 16 ALUs. What happened to the other 32 ALUs? Does each SIMD group take turns, outputting every third clock?)
I'm pretty sure the internal organization is different and that the ALUs are not "outputting 2 samples per clock each". It's quad-based.
 
DemoCoder said:
Fluid reality = emotion synthesis = PR nonsense.

That's probably true, but maybe they really did put in something specifically for animation/physics, like in the examples they gave.

Hope is the last to die :LOL: .
 
Luminescent said:
Pete said:
Luminescent said:
Can its units work on both vertex and pixel streams, simultaneously, within the same clock cycle?
You might have seen this already, but apparently not. (Sorry, Dave.)
Thank you very much for pointing that out. Let's see if Dave can confirm whether the work-arounds mentioned in that thread are the ones R500 employs.

This is not really correct.
 
Yeah,

Hasn't it been stated that each ALU can do a Vertex shader operation AND a Pixel shader operation each clock cycle?

With this piece of information I take it to mean that internally on a clock cycle it goes something like this - All Vertex operations are performed -- All Pixel Operations Performed.

Seems to me that both areas of the GPU get 'fired' each clock cycle. Sure, at each moment in time only one or the other is happening, but they both happen each cycle.

MoH

*edit - left out 'pixel' in my first sentence :oops:
 
I thought ALUs are like vector4 / scalar1.
You'd use the vector4 to work on RGBA pixels or XYZ? vertices, and the scalar1 for any 1-operand calculations. Right? No vertex/pixel processing going on at once on the same ALU...
 
MoHonRi said:
Yeah,

Hasn't it been stated that each ALU can do a Vertex operation AND a shader operation each clock cycle?
Maybe you're confusing this with vector and scalar operations.
Vertex and pixel operations are basically the same mathematical operations, except for a few instructions (gradients) that only make sense in the context of pixels. That's why a unified shader architecture can work at all.
 
Xmas said:
Maybe you're confusing this with vector and scalar operations.
Vertex and pixel operations are basically the same mathematical operations, except for a few instructions (gradients) that only make sense in the context of pixels. That's why a unified shader architecture can work at all.

Nah, I wasn't confused there, I was referring to the linked conversation thread here..

*edit - At least I don't THINK I was confused there.

Luminescent said:
Can its units work on both vertex and pixel streams, simultaneously, within the same clock cycle?

I believe the answer is yes. Each ALU might not be able to do both in the same cycle, but the whole group of 48 can do both operations in a clock cycle, some of them with vertex operations, some with pixel. This confusion was brought up by the ExtremeTech article, where they state this:

All 48 of the ALUs are able to perform operations on either pixel or vertex data. All 48 have to be doing the same thing during the same clock cycle (pixel or vertex operations), but this can alternate from clock to clock. One cycle, all 48 ALUs can be crunching vertex data, the next, they can all be doing pixel ops, but they cannot be split in the same clock cycle.

They take it to mean that all the ALUs, as a group, can only do one thing at a time. I really doubt this is true. I was saying that, most likely, the vertex and pixel 'areas' of every ALU each 'fire' once per clock: just as a combustion engine rotates which cylinder fires, the vertex area fires first, crunching through whatever operations it has, and then the pixel operations are performed in the second half of the cycle.

MoH
 
Here is my question....

Is this correct?

TechReports Article on Xenos said:
On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle.

If this is correct, then we really do have 48 shader pipelines and a total of 192 ALUs. Or is this just semantics...

MoH
 
MoHonRi said:
....
If this is correct, then we really do have 48 shader pipelines and a total of 192 ALUs. Or is this just semantics...

It's semantics. One 4-way SIMD ALU has the same theoretical performance as four scalar ALUs. The scalar ALUs will be more flexible, but have higher instruction issue overhead.
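A small sketch of why the two counts describe the same peak throughput. The function names are illustrative only, and the 48 x 4 figures come from the Tech Report quote, not from anything confirmed here.

```python
def vec4_mul(a, b):
    """One 4-way SIMD instruction: the same multiply applied to all four lanes."""
    return tuple(x * y for x, y in zip(a, b))


def scalar_mul_x4(a, b):
    """Four separate scalar multiplies, one instruction issued per lane."""
    out = []
    for i in range(4):
        out.append(a[i] * b[i])  # each iteration stands for one issued instruction
    return tuple(out)


a, b = (1, 2, 3, 4), (5, 6, 7, 8)
print(vec4_mul(a, b) == scalar_mul_x4(a, b))   # True: same result, 1 issue vs 4

# Counting per the quoted article: 48 shaders, each 4 lanes wide.
SHADERS, LANES = 48, 4
ops_per_cycle = SHADERS * LANES
print(ops_per_cycle)                            # 192
```

Whether you call that 48 vec4 ALUs or 192 scalar ALUs, the peak floating-point ops per cycle come out the same; the difference is in how many instructions have to be issued to get there.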

Cheers
Gubbi
 
MoHonRi said:
Here is my question....

Is this correct?

TechReports Article on Xenos said:
On chip, the shaders are organized in three SIMD engines with 16 processors per unit, for a total of 48 shaders. Each of these shaders is comprised of four ALUs that can execute a single operation per cycle, so that each shader unit can execute four floating-point ops per cycle.

If this is correct, then we really do have 48 shader pipelines and a total of 192 ALUs. Or is this just semantics...

MoH

I was also going to ask the same question.
 
Shit. I'm really beginning to understand why there is so much confusion about how this actually operates. It really doesn't operate in a manner that fits the concepts we've understood before.
 