#4526
Senior Member
Join Date: Mar 2006
Posts: 1,713
Not knowing the architecture of the competition, it's the low-risk, low-cost way to transition to your next product. Most established companies work that way, especially when they are resource-limited by a number of other projects on the side (Xbox/Wii): instead of throwing everything overboard, you improve the existing design and spend most of your time on what needs to be designed from scratch (DX10 features, tessellation, ...). I don't really see how the shader architecture is going to be the source of performance problems for 3D work.

#4527
Certified not a majority
Join Date: Sep 2003
Location: Sittard, the Netherlands
Posts: 3,182
I really think that the instructions issued to the individual ALUs include wide (vec4 + scalar) as well as narrow (scalar) ops, simply because that would reduce the needed bandwidth a whole lot. They might include a loop counter and some conditionals as well. Those conditionals could apply to the loop exit condition, or to the conditional writing/branching of individual instructions.
__________________
The Laws of nature are NOT subject to the majority vote. In the long run.

#4528
Junior Member
Join Date: May 2007
Posts: 57
Could you please explain this?
Quote:
We will show two examples. Example 1: let's add two 4D vectors and two 1D scalars. ATI will do this in a single clock because it has one 4D and one 1D unit to spend. Nvidia will also do this in a single clock, but it will take 5 units out of 128 possible, while ATI used only 2 out of 128 possible (64 4D + 64 1D). ATI is theoretically faster, but Nvidia has shaders that run at 1.35 GHz, some 60 percent faster than ATI's, which run at 800 MHz. This makes the situation much better for Nvidia.

This example is very rare in real games. Shaders very rarely use operations on 4D vectors. They usually work with 3D data such as 3D coordinates, 3D normals and RGB channels without alpha. In some cases shaders use only 2D functions, e.g. a 2D texture coordinate, or even 1D.

Example 2: let's add two 3D vectors and two 2D vectors, plus one 1D scalar addition. Nvidia again needs 3+2+1 = 6 scalar units, while ATI cannot do this in a single clock. ATI needs one clock to add the two 3D vectors, and in the same clock it can do the 1D scalar addition. However, it needs a second clock to add the two 2D vectors. This means that ATI used 1+1 units in the first clock and a single unit in the second, three units in two clocks altogether. As Nvidia's shaders are clocked some 60 percent higher than R600's, R600 is roughly two times slower than G80 in this particular example.

If you write a shader that does eight 1D additions, Nvidia can use 8 units in a single clock, while ATI has to do it in four clocks, as it can use one 4D and one 1D unit per clock for this operation. In this case Nvidia can be four times faster.

Even if it looks really bad for ATI, there is still some hope for R600. You have to take into consideration that Microsoft's HLSL (High-Level Shading Language) compiler will work its small miracle and try to optimize the shader code. The compiler will try to "glue" a few scalars into a single vec4 unit.

Compiler: we have two scalar values, the first one called Fudo and the second one called JenHsun. The compiler will glue these two scalars into a single 2D vector called "FudoJenHsun" and will access the Fudo part as FudoJenHsun.X and the JenHsun part as FudoJenHsun.Y. The good part is that all operations on the new 2D vector FudoJenHsun will run in parallel as one 2D vector operation. This won't help Nvidia at all, but it will mean a lot for ATI. Nvidia still has the advantage of fully flexible scalar units, while ATI doesn't, at least not as far as we or any of our sources know.

We know this part is complicated, and we apologise for that, but we simplified it as much as we could.

And will future games use these vec4 instructions in their shaders? Thanks.
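The article's three examples can be checked with a small cycle-count model. This is only a sketch of the article's own simplified assumptions, not how either chip really schedules work: per thread, R600 is modelled as one 4D slot (taking any single op of width 1 to 4) plus one 1D slot per clock, and G80 as scalar units needing one slot per vector component. The function names are made up for illustration.

```python
def r600_clocks(widths: list[int]) -> int:
    """Clocks for one thread under the article's simplified 4D + 1D model.

    Each clock the 4D slot takes one op of width 1 to 4 and the 1D slot
    takes one scalar (width-1) op. Ops are packed greedily, widest first;
    this is not how the real R600 schedules instructions.
    """
    if any(w < 1 or w > 4 for w in widths):
        raise ValueError("op widths must be between 1 and 4")
    pending = sorted(widths, reverse=True)
    clocks = 0
    while pending:
        clocks += 1
        pending.pop(0)        # 4D slot: widest remaining op
        if 1 in pending:      # 1D slot: one remaining scalar op, if any
            pending.remove(1)
    return clocks


def g80_scalar_slots(widths: list[int]) -> int:
    """Scalar unit slots needed when every component is issued separately."""
    return sum(widths)


examples = {
    "example 1: vec4 + scalar": [4, 1],
    "example 2: vec3 + vec2 + scalar": [3, 2, 1],
    "eight scalar adds": [1] * 8,
}
for name, ops in examples.items():
    print(f"{name}: R600 {r600_clocks(ops)} clock(s), "
          f"G80 {g80_scalar_slots(ops)} scalar slots")
```

Under these assumptions it reports 1, 2 and 4 R600 clocks against 5, 6 and 8 G80 scalar slots, which is the comparison the article makes before shader clock speeds are factored in.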

#4529
Join Date: May 2002
Location: New York, NY
Posts: 12,679
There's really no reason to expect any increase in vec4 instruction utilization.
__________________
April 20, 1979 - America must never forget.

#4530
Junior Member
Join Date: May 2007
Posts: 57
So why do people expect R600 to perform better than G80 in DX10 games?

#4531
Senior Member
Join Date: May 2005
Posts: 2,042
Source: Fudzilla. That's the whole explanation.
__________________
Sorry for my English. But I hope it's better than your Czech

#4532
Junior Member
Join Date: May 2007
Posts: 57
But I really want to know why R600 would be better in DX10 while G80 is better in DX9.
Will DX10 take more advantage of the vec5 (vec4 + scalar) architecture, so that R600's 320 SPs are used more efficiently?

#4533
Heteroscedasticitate
Join Date: Mar 2005
Posts: 2,362
It will, because it must: it's from ATi, it's late, and it may not be a scorcher in terms of general performance, so there must be a catch, because ATi doesn't make sucky stuff, ever, ever, ever.
On a serious note, I think it's fairly hard to predict whether or not it'll rule WRT DX10, because we've yet to see typical DX10 workloads (no, dubious demos quoted in marketing slides don't count). It'll come down to what people actually do with DX10 and how that jibes with the two competing architectures. The R600 seems to be more adept than the G80 at geometry shading; whether that'll matter, or whether it's fast enough for it to matter, is a different thing. Then there's compiler magic that comes into play... Until the NDA is lifted and a pertinent analysis can be performed, it's all a complex exercise in who can pull theories from the deeper parts of their anatomy at a greater pace (no disrespect intended towards fellows like Jawed, mind you).
__________________
Donald Knuth: Science is what we understand well enough to explain to a computer. Art is everything else we do.

#4534
Certified not a majority
Join Date: Sep 2003
Location: Sittard, the Netherlands
Posts: 3,182
Say you have an instruction stream coming into your GPU that uses half the available memory bandwidth; the other half is used for data. Say you have 256 ALUs (processing elements) that all need a new instruction each clock to do something useful, each instruction plus operands is 64 bits wide (that's very conservative), and you can read one each clock from memory. That requires a 256 * 64 * 2 (for data) = 32,768 (!!) bit wide bus.
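The arithmetic above can be written as a tiny calculator. It is a sketch of this post's back-of-the-envelope model only: the factor of 2 is the "half for data" assumption, and the 16-lane SIMD grouping in the second call is a hypothetical figure, chosen to show how sharing one instruction across many ALUs shrinks the bus.

```python
def bus_width_bits(alus: int, instr_bits: int, lanes_per_instr: int = 1) -> int:
    """Bus width needed if every instruction is fetched from memory each clock.

    Instructions take half the bandwidth and data the other half, hence * 2.
    lanes_per_instr > 1 models SIMD: that many ALUs share one instruction.
    """
    if alus % lanes_per_instr:
        raise ValueError("ALUs must divide evenly into SIMD groups")
    instructions_per_clock = alus // lanes_per_instr
    return instructions_per_clock * instr_bits * 2


print(bus_width_bits(256, 64))                      # 32768: one instruction per ALU
print(bus_width_bits(256, 64, lanes_per_instr=16))  # 2048: hypothetical 16-wide SIMD
```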
Of course, that isn't practical. Fortunately, GPUs are SIMD: Single Instruction, Multiple Data. In other words, the ALUs are grouped, and each instruction is executed multiple times in parallel. And there are other ways to reduce the bandwidth (bus width) needed: either you use instructions that tell multiple groups of ALUs what to do (like the superscalar R600), or you execute those instructions over multiple clock cycles (like the serial 8800). Either way, you reduce the number of instructions needed each clock to a manageable amount.

Branching further increases the need for instruction scheduling, or reduces the throughput, because at each if..then..else statement some of the elements might go one way and the others the other way. At that point you can split the streams (doubling the number of ALUs and the bandwidth needed), calculate both possibilities in sequence and only write the results that are valid for each element, or simply calculate both paths in sequence. The latter two both halve the throughput every time they happen.

As the R600 has 4 ALU blocks that each consist of 6 units (4 vec, 1 scalar and 1 branch/conditional), all of which have to receive instructions each clock, you need instructions that tell all of them what to do in all cases (anything from 6 independent scalar instructions up to a single combined vec4 + scalar and a conditional instruction). That requires either very long instruction words (VLIW, say up to 1024 bits for each block) or clever scheduling, because most combined instructions can be simplified into a single instruction for the whole ALU block. But if you only schedule a single instruction for a single ALU, all the others are wasted for that clock. So it's most likely that they kept the instruction length manageable, but made it possible to stack instructions so they can be executed all at once.
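The "calculate both paths in sequence and only write the valid results" option can be sketched as follows. The 16-lane batch and the 10-cycle path lengths are made-up numbers, not R600 or G80 figures, and `if_else_cycles` is a hypothetical helper.

```python
def if_else_cycles(lane_takes_then: list[bool], then_cycles: int, else_cycles: int) -> int:
    """Cycles for one SIMD batch running an if..then..else in lockstep.

    All lanes share one instruction stream, so a path is executed if at
    least one lane needs it; lanes on the other path are masked and their
    results discarded.
    """
    cycles = 0
    if any(lane_takes_then):
        cycles += then_cycles
    if not all(lane_takes_then):
        cycles += else_cycles
    return cycles


uniform = [True] * 16                  # every lane takes the same path
divergent = [True] * 8 + [False] * 8   # lanes split between both paths
print(if_else_cycles(uniform, 10, 10))    # only the then-path runs
print(if_else_cycles(divergent, 10, 10))  # both paths run: half the throughput
```

A single disagreeing lane is enough to make the whole batch pay for both paths, which is why wide batches suffer more from branchy code.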
Last edited by Frank; 05-May-2007 at 22:49.

#4535
Senior Member
Join Date: Mar 2006
Posts: 1,713
ATI will do the 4D operation and the 1D operation in one clock cycle, and can do 64 of those (for 64 different threads) at the same time. The 8800GTX requires 5 clock cycles to do them, but can do 128 in parallel, at a higher clock speed.
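Putting numbers on this: the sketch below assumes the clocks quoted from Fudzilla earlier in the thread (800 MHz for R600's shaders, 1.35 GHz for G80's), which are not verified specs here, and counts how many vec4 + scalar instructions each chip could finish per second if every instruction had exactly that shape.

```python
def vec5_per_second(units: int, clocks_per_vec5: int, clock_hz: float) -> float:
    """vec4 + scalar instructions completed per second across the chip."""
    return units / clocks_per_vec5 * clock_hz


# Unit counts from this post; clocks assumed from the Fudzilla quote.
r600 = vec5_per_second(units=64, clocks_per_vec5=1, clock_hz=800e6)
g80 = vec5_per_second(units=128, clocks_per_vec5=5, clock_hz=1.35e9)
print(f"R600: {r600 / 1e9:.2f} G/s, G80: {g80 / 1e9:.2f} G/s, "
      f"ratio {r600 / g80:.2f}")
```

With those assumptions R600 comes out at roughly 51 billion against about 35 billion for G80, around 1.5x. That is the best case for R600; code made mostly of scalar or vec2/vec3 ops leaves some of its slots empty, which is the point the Fudzilla examples were making.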

#4536
Senior Member
Join Date: Oct 2006
Location: Germany
Posts: 1,003
G80 never works on vec2, vec3, vec4... instructions, but always on vec1. If there is, e.g., a vec4 instruction, G80's compiler splits it into four vec1 instructions and it takes four cycles to work on them. Imagine you have to work on four RGBA quads. G70's quad ALU (4D) needs four cycles for these four RGBA quads: first cycle, first RGBA quad; second cycle, second quad; and so on. G80 splits these four quads into four pixel groups of 16 pixels each: 16*R, 16*G, 16*B, 16*A. A vec16 ALU needs four cycles for these four groups: first cycle 16*R, second cycle 16*G, and so on. A vec16 ALU is clocked at up to 1512 MHz. And now forget all this pixel stuff: current GPUs like R520/R580 or G80 work on threads/batches/green elephants. What really matters is the input (10101010101...) and the output (what you see on your display).
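The regrouping described above, from per-pixel RGBA vectors into per-channel groups of 16, is essentially an array-of-structures to structure-of-arrays transpose. A minimal sketch, assuming a 16-lane batch and using a multiply as the vec4 instruction; the names are made up and this does not model real G80 hardware.

```python
Pixel = tuple[float, float, float, float]  # (r, g, b, a)


def scale_rgba_per_channel(pixels: list[Pixel], s: float) -> tuple[list[Pixel], int]:
    """Apply a vec4 multiply G80-style: one scalar pass per colour channel."""
    channels = list(zip(*pixels))  # 16*R, 16*G, 16*B, 16*A
    passes = 0
    scaled = []
    for lane_values in channels:   # one scalar instruction per channel...
        scaled.append([v * s for v in lane_values])  # ...applied to every lane
        passes += 1
    return list(zip(*scaled)), passes  # transpose back to per-pixel RGBA


quads = [(1.0, 2.0, 3.0, 4.0)] * 16  # four 2x2 quads = 16 pixels
result, passes = scale_rgba_per_channel(quads, 0.5)
print(result[0], passes)
```

The number of passes equals the vector width (4 for RGBA), regardless of how many pixels are in the batch; the batch width only decides how many lanes each pass covers.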
__________________
Hail Brothers and Sisters! Coranon Silaria, Ozoo Mahoke Eta Kooram Nah Smech! Find Chuck Norris.
Last edited by Arnold Beckenbauer; 05-May-2007 at 23:01.

#4537
Senior Member
Join Date: Mar 2006
Posts: 1,713
But the execution pipes (shaders) don't have much to do with DX9 vs DX10 performance. For that, the stuff surrounding the shaders is what counts: how do you store geometry shader data? How do you feed the shaders with data? How fast can it branch? We currently don't know a lot about the DX10 organization of G80, and even less about R600. It's impossible to know how they will behave in DX10 games, but if there's a major difference, I think it's reasonable to say that the difference in shader execution pipelines won't be the major factor. Right now I know of only one report about relative GS performance, and that's a blog post about MS Flight Simulator. Not exactly a large body of evidence to go on...

#4538
Regular
Join Date: Mar 2007
Posts: 9,227
Aye, it'd be interesting to hear what the developers of Call of Juarez, Company of Heroes, Crysis, etc., who are either working on DX10 or patching their games for DX10, think of the two architectures. Although I suppose they are also still under NDA from ATI.
From the admittedly VERY few screens of Call of Juarez, the DX10 version looks absolutely nothing like the DX9 version I tried out, other than buildings and terrain being in the same place. I'm not sure, however, whether that is a sign of things to come or just that they've had an extra year or so to add more bling to the game. Regards, SB

#4539
Registered
Join Date: Apr 2004
Posts: 8
What are the ramifications of the R600 "doing" audio as well as graphics? Is this going to render (no pun intended) aftermarket sound cards superfluous?

#4540
Member
Join Date: May 2004
Location: Somewhere, IN USA
Posts: 313
1) Offload processing from the CPU
2) Better sound quality with analog output

3D effects are the only thing that comes to mind that requires serious audio processing, and with multicore CPUs becoming more prevalent, the need to offload that processing to a card seems ever smaller. As far as quality goes, a CPU can do just as well as a card, if not better, since there are no processing restrictions. The big sound difference in the past came from the better components used to output analog signals. With digital signals there isn't a whole lot that can be done to improve quality component-wise. I'd be willing to bet that the only audio function the card will have is outputting that digital signal through HDMI. Most onboard audio solutions are nothing but software; the hardware part is just analog-to-digital conversion and vice versa.

#4541
Senior Member
Join Date: Jul 2002
Posts: 2,178
I'm surprised that no one picked up that the GDDR4 1 GB card seems to be underclocked. The original rumours said the GDDR4 card (a.k.a. XTX?) would run at 2.2 GHz (2200 MHz), yet these pics only show 2000 MHz. Of course, as someone mentioned, these are most probably developer cards, which would explain why it only runs at 2000 MHz and is called XT. US
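To put the 2000 vs 2200 MHz difference in perspective, here is a small sketch that treats both figures as effective data rates (MT/s) and assumes a 512-bit memory interface; the bus width is an assumption, not something stated in this thread.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mt_s: float) -> float:
    """Peak memory bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


BUS_BITS = 512  # assumed bus width, not given in the thread
rumoured = peak_bandwidth_gb_s(BUS_BITS, 2200)
pictured = peak_bandwidth_gb_s(BUS_BITS, 2000)
print(f"{rumoured:.1f} GB/s vs {pictured:.1f} GB/s "
      f"({(1 - pictured / rumoured) * 100:.1f}% less)")
```

Under those assumptions that is about 140.8 GB/s against 128 GB/s, roughly 9% less peak bandwidth.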
__________________
God put me on earth to do a certain number of things. Right now i'm so far behind that i'll never die. Random 512Kb onboard -> S3 Virge 4MB -> RivaTNT2 -> GeforcePro -> GF3 -> NV3x -> R420 -> R580 -> G80 -> G92 -> 5870 -> ???

#4542
Senior Member
Join Date: May 2005
Posts: 2,042
HD2900XT GDDR4 OEM? (750/2000). Maybe the same version as in the DailyTech "preview"... (they call it XTX, but I think they are just wrong.)

#4543
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,883
According to publicly leaked information, the processing is still done on the CPU, just like any modern integrated audio solution: these things leave all the major processing duty to the CPU. In the end, that kind of work is so minimal it doesn't really matter. Some EAX effects (which neither R6xx nor 99% of integrated solutions support in practice anyway) *might* be a bit more expensive.
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."

#4544
Senior Member
So R600 would just have an integrated audio codec, similar to those found on mainboards everywhere?
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic. Microsoft: Russia -- Big and bloated. Linux: EU -- Diverse and broke.

#4545
Senior Member
Join Date: Jul 2002
Posts: 2,178
Other sound cards usually work off PCI. US

#4546
Senior Member
Join Date: Jan 2007
Location: TDO, Germany
Posts: 1,222
It will be nice to finally get rid of those sound + graphics IRQ conflicts anyway.
__________________
Trade Steam games with other B3D members

#4547
yes, i'm drunk
This should be from AMD's own papers:
__________________
I'm nothing but a shattered soul... Been ravaged by the chaotic beauty... Ruined by the unreal temptations... I was betrayed by my own beliefs...

#4548
Junior Member
Join Date: Sep 2006
Location: North West UK
Posts: 81
I thought GDDR4 was supposed to be more energy-efficient than GDDR3 at the same clock speeds? (I know 2.0 W is next to nothing anyway, but still.) And the GPU TDP looks off too: both are estimated at 750-800 MHz with the same voltages etc., just one uses GDDR3 and the other GDDR4, yet the latter has a TDP 20 W higher? Maybe I'm just being oblivious to the obvious, though (I'll blame the hangover).

#4549
Member
Join Date: Oct 2006
Posts: 386

#4550
Junior Member
Join Date: Sep 2006
Location: North West UK
Posts: 81
*goes and sits in the corner with his dunce hat*