#101 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,951
|
It's a shame Rene knew shit about Xenos before he spoke. A lot of what he said was actually wrong.
|
|
|
|
|
|
#102 |
|
Junior Member
Join Date: May 2004
Location: Tempe, AZ
Posts: 43
|
So, that means that Xenos actually does have a PPP for creating/modifying/deleting vertices, then?
As a layman, I wanted to ask a question in one of the other threads where DeanoC spoke of Xenos' ability to read/write anywhere in main RAM...what purpose would that have for game visuals, other than the academic possibilities? Physics? |
|
|
|
|
|
#103 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,951
|
Hopefully I'll have some more up later next week.
|
|
|
|
|
|
#104 | |
|
Senior Member
Join Date: Mar 2004
Location: Portugal
Posts: 3,528
|
Quote:
Well, at least it will be worth the wait. BTW, thanks in advance for the article. |
|
|
|
|
|
|
#105 | ||||
|
Senior Member
Join Date: Jun 2004
Posts: 1,908
|
Quote:
Anyway, just stating the obvious, but those comparisons would be between SM3.0 and SM2.0 architectures, with even more disparity... Quote:
Quote:
So...you've read the 'leak' wrong. It seems 'Billions' and 'cycles' and 'seconds' and 'numbers' are being confused... What makes this more frustrating is that we've discussed these leaks many times in the other threads... Quote:
Please read my first post on the first page... If you have, then it would be obvious that CELL + RSX ~ 100 Billion shader operations per second. And you're quoting ONLY the Xenos GPU ~ 120 Billion shader operations per second?? Let's keep this logical here... |
||||
|
|
|
|
|
#106 |
|
Regular
|
Jaws, we have reasonably detailed architectural diagrams for NV40 and R420 plus explanations on how they work. Care to explain, in detail, how they perform against each other, based purely on theory?
In other words, can you convert the theoretical capabilities of these two architectures into a realistic prediction of the performance of them? NV40 has 2x the SM2.0/3.0 ALU capability of R420, which should overhaul its core-clock disadvantage. But it doesn't. etc.

What I've learnt over the last few days is this is a road to nowhere. I'm aghast that you still think it's worth pursuing this.

I'm quite happy to speculate on the architectures, but I'm going to stick to throwing around stupid performance numbers for the sake of taking the piss out of the marketing. ATI's now counting 120Gsops for Xenos. It's now time for NVidia to counter that.

Jawed |
|
|
|
|
|
#107 |
|
Regular
|
Oh yeah, you're right, I confused the "96 ops per cycle" and "96 Gops per second" numbers.
Sigh. Jawed |
|
|
|
|
|
#108 | |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,951
|
Quote:
|
|
|
|
|
|
|
#109 | ||
|
Member
Join Date: Jun 2003
Posts: 862
|
Quote:
|
||
|
|
|
|
|
#110 | ||||
|
Senior Member
Join Date: Jun 2004
Posts: 1,908
|
Quote:
Quote:
This is essentially no different than putting the *official*, released spec metrics side by side. At least it has context and discussion in this thread. And NOBODY has *real* world numbers. My point? FACTS, not the BS that's been flying round recently. Even if these facts are *purely* theoretical, they can nevertheless be derived logically. I've read some threads recently with people STILL crying foul at PS2 specs and yet conveniently forgetting to cry foul at XBOX specs and vice-versa. I smell fanb**s... Quote:
Quote:
Have you read my derivations of the 'peak' numbers on the first page? If you can dispute them then please feel free, as I want FACTS that can be consistently and independently derived from *official* info. I've used a consistent method that's held solid so far with *official* numbers. It's explained all the other, conflicting numbers, including yours. If you can't accept/dispute those numbers then I've learnt something new today... |
||||
|
|
|
|
|
#111 |
|
Regular
|
Maybe you want to look at page 13 of the PDF I linked:
- Pixel shader operations/pixel 8
- Pixel shader operations/clock 128

These are the claimed numbers for NV40. 51.2Gsops. Roughly half of what's claimed for RSX. How much more black and white do you want?...

If only we could talk in terms of pixel shader instructions, comparisons would start to get meaningful. This example shows SM3 executing 102 instructions in 46.75 cycles, 2.2 instructions per cycle:
http://www.beyond3d.com/forum/viewto...=327176#327176

It's also interesting to ask about the effect of RSX's likely SIMD pixel shader architecture. NV40 appears to be SIMD across all 16 pipelines, i.e. only one shader can be executing at a time:
http://www.beyond3d.com/forum/viewtopic.php?t=23295

R420 is 4-way MIMD across 16 pipelines, i.e. each quad can execute a different shader. Counting transistors, this means that R420 has prolly got a greater overhead in instruction decode logic than NV40.

I wonder if Xenos will be 48-way MIMD, i.e. each ALU can be running a different shader. I'm sorta doubtful, to be honest, because that's an awful lot of decode-logic overhead - though I admit to not knowing what that amounts to in percentage terms. I aint got the foggiest!

RSX and Xenos are looking as incomparable as NV30 and R300 did a few years ago.

All of this still leaves us high and dry on Cell versus XB360 CPU.

Jawed |
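The 51.2Gsops figure can be reproduced from the per-clock number in the PDF. This is only a sketch: the 400MHz core clock is an assumption (the post does not state it), and `gsops` is a hypothetical helper name.

```python
def gsops(ops_per_clock, clock_ghz):
    """Peak shader operations per second, in billions (Gsops)."""
    return ops_per_clock * clock_ghz

# 16 pipelines x 8 pixel shader operations per pixel = 128 operations per clock.
NV40_OPS_PER_CLOCK = 16 * 8
NV40_CLOCK_GHZ = 0.4  # assumption: core clock is not given in the post

nv40 = gsops(NV40_OPS_PER_CLOCK, NV40_CLOCK_GHZ)
print(f"NV40 claimed peak: {nv40:.1f} Gsops")
```

Like every number in this thread, this is a theoretical peak and says nothing about how many of those operations a real shader keeps busy.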
|
|
|
|
|
#112 | ||
|
Regular
|
Quote:
Jaws is determined to compare architectures with absolutely no regard for their respective architectures. Jawed |
||
|
|
|
|
|
#113 |
|
Naughty Boy!
Join Date: May 2005
Posts: 994
|
I apologize for being such a noob here, but where is this article from Dave? Thanks much.
|
|
|
|
|
|
#114 | |||||||
|
Senior Member
Join Date: Jun 2004
Posts: 1,908
|
Quote:
It's another component operation specific to pixels, as I've already pointed out to you earlier in the thread with your 'page 3' reference. And if you're going to use 'components' again, you've missed out the 'vertices' too for the total... Quote:
Considering you've also missed out 'vertex' ops from the '51.2 GSop', it's nothing near "half" of what was claimed and would in fact be similar. Quote:
Quote:
Quote:
Quote:
Quote:
|
|||||||
|
|
|
|
|
#115 | |
|
Friends call me xbd
Join Date: Feb 2005
Posts: 6,293
|
Quote:
I have a feeling you'll have no way of not knowing once he actually posts it.
__________________
Somebody set up us the bomb. |
|
|
|
|
|
|
#116 | ||
|
Naughty Boy!
Join Date: May 2005
Posts: 994
|
Quote:
|
||
|
|
|
|
|
#117 |
|
Regular
|
136 shader operations per cycle is what, exactly?
24 pixel pipelines doing 4 operations? plus 10 vertex pipelines doing 4 operations? Should we be making allowances for texture blending? Texture address calculation? What else? Unluckily we have two different claims from ATI for Xenos, 48Gsops (two ops per cycle) and 120Gsops (five ops per cycle). Which are you going to use in your comparison? Why? Jawed |
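The arithmetic behind these questions is easy to check. The sketch below takes the pipeline split as the guess it is in the post, and takes the 48 ALUs and 0.5GHz clock for Xenos from the figures used elsewhere in this thread; `xenos_gsops` is a hypothetical helper.

```python
# Guessed split of RSX's 136 shader operations per cycle (a guess, not a confirmed layout).
pixel_ops = 24 * 4    # 24 pixel pipelines, 4 operations each
vertex_ops = 10 * 4   # 10 vertex pipelines, 4 operations each
rsx_ops_per_cycle = pixel_ops + vertex_ops

XENOS_ALUS = 48
XENOS_CLOCK_GHZ = 0.5

def xenos_gsops(ops_per_alu_per_cycle):
    """Xenos peak in Gsops for a given way of counting operations per ALU."""
    return XENOS_ALUS * ops_per_alu_per_cycle * XENOS_CLOCK_GHZ

print(rsx_ops_per_cycle)   # guessed RSX total per cycle
print(xenos_gsops(2))      # counting two ops per ALU per cycle
print(xenos_gsops(5))      # counting five ops per ALU per cycle
```

The same hardware yields 48 or 120 depending purely on what is counted as an operation, which is why the choice matters for any comparison.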
|
|
|
|
|
#118 |
|
Regular
|
In the code I linked to earlier:
http://www.beyond3d.com/forum/viewto...=327176#327176 which in SM3 is 102 instructions, at an average of 2.2 instructions executed per cycle. A 6800 Ultra would shade 137 million pixels per second. Assuming RSX operates in the same way, at 550MHz across 24 pipelines, this shader would shade 282 million pixels per second. The same shader executed on Xenos would need to operate at 1.2 instructions per cycle to shade 282 million pixels per second. But I have no idea if Xenos could run this shader at more than 1 instruction per cycle. Jawed |
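These pixel rates can be reproduced as follows. This is a sketch under stated assumptions: the 46.75 cycles per pixel comes from the earlier post in this thread, the 6800 Ultra is taken as 16 pipelines at 400MHz (the clock is an assumption), and Xenos is taken as 48 ALUs at 500MHz.

```python
INSTRUCTIONS = 102        # SM3 instruction count of the linked shader
CYCLES_PER_PIXEL = 46.75  # about 2.2 instructions per cycle

def pixels_per_second(pipelines, clock_hz, cycles_per_pixel=CYCLES_PER_PIXEL):
    """Peak pixels shaded per second when every pipeline runs this shader."""
    return pipelines * clock_hz / cycles_per_pixel

nv40 = pixels_per_second(16, 400e6)  # assumed 6800 Ultra configuration
rsx = pixels_per_second(24, 550e6)   # assumes RSX behaves like NV40 per pipeline

# Instructions per cycle each Xenos ALU would need to match the RSX pixel rate.
xenos_alu_cycles_per_second = 48 * 500e6
required_ipc = rsx * INSTRUCTIONS / xenos_alu_cycles_per_second

print(f"6800 Ultra: {nv40 / 1e6:.0f} Mpixels/s")
print(f"RSX:        {rsx / 1e6:.0f} Mpixels/s")
print(f"Xenos needs {required_ipc:.1f} instructions per cycle per ALU")
```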
|
|
|
|
|
#119 | |||||||||||
|
Senior Member
Join Date: Jun 2004
Posts: 1,908
|
Quote:
Quote:
Quote:
Xenos ~ 96 shop/cycle

These numbers/metrics on their own are meaningless without further parameters. But both numbers also cross-reference with other metrics that I calculated on the first page without any conflicts. So they are consistent but need further analysis.

All the per-CYCLE figure is essentially telling us is the number of execution units that run shaders, i.e. the number of shader execution units. It is not telling us the amount of work/computation being done per clock cycle nor the precision of the data being worked on. E.g. it's not differentiating between 1-way, 2-way, 3-way or 4-way execution units, i.e. all of those shops/cycle can be from 136 scalar units or 136 vector units or a combination. Also these vector units can be vec(2-4) units! So we can't go into any further detail without further information.

However, from *official* MS spec from xbox.com,

Quote:
Quote:
The 'leak' above also mentions the 48 ALUs consisting of a vector + scalar unit,

===> Xenos > 48 vec4 + 48 scalar units > 96 Shop/cycle

The 4-way vector components of vec4 units are not included in the definition.

===> RSX > x + y units ~ 136 Shop/cycle*

* we need more info to determine more detail...and this is deduced from the DOT products information below.

Quote:
*Vec4 unit is assumed to provide 1 Dot/cycle, and this means Dot products per cycle is an 'integer' number, e.g. 34 Dot/cycle. And more importantly, that falls way short of the claimed CELL+RSX ~ 51 GDot/sec.

Taking the contribution of DOT products from CELL, either from 7SPUs or 7SPUs+1VMX, we get,

RSX ~ 25.4 OR 28.6 GDot/sec

Which one is accurate?

25.4/0.55 GHz ~ 46.18 Dot product/Cycle? or
28.6/0.55 GHz ~ 52 Dot product/Cycle?

The 46.18 Dot/cycle is rejected in favor of the 52 Dot/Cycle because it's not an 'integer' from the above assumption. From our earlier definition of a Shop/cycle, this then suggests 52 Vec4 units contribute to RSX's 136 Shops/cycle.

RSX ~ 52 Vec4 units + 84 units not contributing DOT products.

http://www.beyond3d.com/forum/viewto...28&start=0
http://www.beyond3d.com/forum/viewto...=531473#531473

===> RSX ~ 28.6 GDot/sec

Jawed-RSX' ~ 18.7 GDot/sec is way short of my (Jaws*) RSX ~ 28.6 GDot/sec and does not have enough Dot product computation to match the CELL+RSX claim. Therefore 18.7 GDot/sec and its pipeline arrangement is unlikely.

* Yes, as if we don't have enough confusion, Jaws and Jawed are now officially confusing the shit out of me too (Jaws)! :P

Quote:
Quote:
The 120 GShop/sec for Xenos is greater than BOTH CELL+RSX ~ 100 GShop/sec. We can reject the 120 GShop/sec number for Xenos here for being inconsistent. Even though that '120' number is a valid number, the 'unit' of the metric is not consistent. It would be more accurate to call it Xenos~ 120 Billion component (5D) operations per second and leave out 'shader' from the metric. And also, 120*2FMADD ~ 240 GFlop/sec, (32bit because of SM3.0). Quote:
Quote:
Taking this consistency, the following was derived,

RSX ~ 136 Shop/cycle ~ 52 Vec4 units + 84 units NOT contributing Dot products.
Xenos ~ 96 Shop/cycle ~ 48 Vec4 + 48 Scalar units.

Those 84 units for RSX can ALL be scalar for all we know or ALL be Vec3. So the measure of computation performed per cycle can vary. In that sense the aforementioned 'component operation per cycle' metric will give more detail. But we don't have that for BOTH systems.

From what I've derived above,

RSX ~ 136 Shop/cycle ~ 52 Vec4 units + 84 units NOT contributing Dot products.

I'd be guessing now on the following, usually, scalar units are paired with Vec units so,

RSX ~ 136 Shop/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units
32 Pixel Shaders ~ 32 Vec4 + 32 Scalar + 32 Other units*
20 Vertex Shaders ~ 20 Vec4 + 20 Scalar

*Other units can be Vec3 or Scalar etc...

Quote:
In any case, both Xenos and RSX will have assembly level, to-the-metal access on both consoles, irrespective of whether Xenos uses SM3+ or RSX uses OpenGL|ES.

Looking at the Xenon 'leak' text above, it suggests that one Xenos ALU ~ vec4 + Scalar, and those ALUs can dual issue to a Vec4 and a scalar unit. So,

Xenos ~ each ALU (vec4 + scalar) can dual issue per cycle
48*2 ~ 96 instructions per cycle
96*0.5 GHz ~ 48 Billion INSTRUCTIONS per second*

* Not SHADER ops per second and so another number to get confused with!

I'll stop right here! :P |
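One way to reproduce the 25.4 and 28.6 GDot/sec figures and the 52 Vec4 estimate is sketched below. The 3.2GHz CELL clock and the rule of one dot product per CELL unit per cycle are assumptions that happen to match the quoted numbers; the post itself does not spell them out. `rsx_share` is a hypothetical helper.

```python
RSX_CLOCK_GHZ = 0.55
CELL_CLOCK_GHZ = 3.2       # assumption: not stated in the post
CLAIMED_TOTAL_GDOT = 51.0  # claimed CELL+RSX dot products per second, in billions

def rsx_share(cell_units):
    """Return (RSX GDot/sec, RSX dot products per cycle).

    Assumes each CELL unit retires one dot product per cycle.
    """
    cell_gdot = cell_units * CELL_CLOCK_GHZ
    rsx_gdot = CLAIMED_TOTAL_GDOT - cell_gdot
    return rsx_gdot, rsx_gdot / RSX_CLOCK_GHZ

for label, units in (("7 SPUs", 7), ("7 SPUs + 1 VMX", 8)):
    gdot, per_cycle = rsx_share(units)
    print(f"{label}: RSX ~ {gdot:.1f} GDot/sec, {per_cycle:.2f} dot products per cycle")

# The post keeps the case that gives a whole number of Vec4 units.
vec4_units = round(rsx_share(7)[1])
other_units = 136 - vec4_units

# Xenos: 48 ALUs dual-issuing (vec4 + scalar) at 0.5 GHz, in billions of instructions.
xenos_ginstr = 48 * 2 * 0.5
print(vec4_units, other_units, xenos_ginstr)
```

The whole-number test is the only thing separating the two cases, so the 52 Vec4 result is only as good as the assumption that a Vec4 unit delivers exactly one dot product per cycle.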
|||||||||||
|
|
|
|
|
#120 | |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,951
|
Quote:
|
|
|
|
|
|
|
#121 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
RSX: 8 VS + 24 PS
1 VS = 1 vec4 + 1 scalar ops per cycle
1 PS = 1 vec4 + 1 vec4 (with co-issue 2 vec2) + 2 scalar ops per cycle (from RSX presentation diagram, there are 2 SFU units)
2 * 8 + (1 + 2 + 2) * 24 = 136 |
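The same count written out, as a quick check of the sum. Variable names are illustrative only, and the per-shader operation counts are the ones proposed in this post rather than confirmed hardware details.

```python
VERTEX_SHADERS = 8
PIXEL_SHADERS = 24

# Operations counted per cycle for each shader unit.
vs_ops = 1 + 1      # one vec4 op + one scalar op
ps_ops = 1 + 2 + 2  # one vec4 op + one co-issued vec4 (2 x vec2) + two scalar (SFU) ops

total_ops_per_cycle = VERTEX_SHADERS * vs_ops + PIXEL_SHADERS * ps_ops
print(total_ops_per_cycle)
```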
|
|
|
|
|
#122 |
|
Regular
|
Actually those SFUs look like 2 Vec4 units, each SFU is a stack of 4 "planes", just like each Vector ALU.
I wonder if Dave knows what SFU means (apart from "special function unit")? Jawed |
|
|
|
|
|
#123 |
|
Nutella Nutellae
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297
|
SFUs are special function units, that is. SFUs do simple and complex scalar ops such as reciprocal, exp, log, sin, etc.
NV40 has SFUs too and those 'things' stacked upon each other are just pixel pipelines, imho. |
|
|
|
|
|
#124 |
|
Regular
|
Ah, I've never seen an SFU on an NVidia diagram before. Good thinking.
I suppose, alternatively, it could be the Fog ALU that you can see here: http://www.beyond3d.com/previews/nvi.../index.php?p=9 SM3 requires that Fog is done in shader code rather than as a fixed function unit in the ROP. Jawed |
|
|
|
|
|
#125 |
|
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
That fog ALU is just a fixed point 4-component linear interpolation.
__________________
Binary prefixes for bits and bytes |
|
|
|