View Full Version : The ATI R600 Rumours & Speculation Centrum



Rys
15-Nov-2006, 22:01
I don't get why something as low-level as dual-issue of a pixel unit can be disabled at as high a level as the driver, or why they would.

How can a buggy driver louse that up?

In CPUs, that kind of switching off is done in microcode, and not without good reason.

What if we have to wait until G85 for a fully capable second MUL, if ever?
If the shader assembler in the driver never asks for a unit or function to be used, it won't. Performance can be limited in all kinds of ways by that part of the performance equation, and the chip would have no choice but to obey. The chip itself is dumb that way (as is an x86 CPU).

And it's maybe not a buggy driver at all (as far as people think about a 'bug'), but rather an intentionally limited one until they figure out how to expose it for general shading for most cases.

dnavas
15-Nov-2006, 22:20
I'm concerned it may reflect a limitation on the number of writes to the register file. That would explain why it handles perspective correction (output is to the texture addressing units, and not the register file?), but seems to conflict with the notion that the SFU and MAD units appear to be able to run at the same time....

^eMpTy^
15-Nov-2006, 22:28
Do you guys think G80 vs R600 will come down to shading horsepower? Or do you think they'll be about even and ATi's 512bit (rumored) memory bus will make the difference?

PeterAce
15-Nov-2006, 22:37
In NV40 I seem to recall that in ALU 0 (shader computation top) that the MUL (before the TEX) shared a data path with SFU (RCP) and that if a SFU was required then a MUL4 was blocked but a MUL2 + SFU or MUL3 + SFU could still run together.

Could sharing data paths be the cause of the missing MUL on G80?

Razor1
15-Nov-2006, 22:59
Do you guys think G80 vs R600 will come down to shading horsepower? Or do you think they'll be about even and ATi's 512bit (rumored) memory bus will make the difference?

If it's 512-bit, it's going to come down to both; the G80 doesn't take a pronounced hit from having less bandwidth. Chris Ray demonstrated this in one of his articles, I think: when he downclocked the memory of the G80, the hit was very small.

Geo
15-Nov-2006, 23:11
I'm going to be very, very surprised if R600 is 512-bit. But, y'know, I've been very very surprised before. It looks to me like they are very competitive bw-wise with just GDDR4. Now, 65nm refresh time it might be more interesting, if NV sticks with 384-bit *and* has GDDR4.

Razor1
15-Nov-2006, 23:43
I'm going to be very, very surprised if R600 is 512-bit. But, y'know, I've been very very surprised before. It looks to me like they are very competitive bw-wise with just GDDR4. Now, 65nm refresh time it might be more interesting, if NV sticks with 384-bit *and* has GDDR4.


If nV feels the need to go to 512-bit in their refresh, they would definitely be able to do it; the G80 seems to partition the bus much as previous generations partitioned quads. And 1 GB boards are definitely coming sooner rather than later, but I'd be very surprised if they came this soon; it's only been one generation of 512s!

INKster
16-Nov-2006, 00:12
If nV feels the need to go to 512-bit in their refresh, they would definitely be able to do it; the G80 seems to partition the bus much as previous generations partitioned quads. And 1 GB boards are definitely coming sooner rather than later, but I'd be very surprised if they came this soon; it's only been one generation of 512s!

Not if you count the rare 6800 Ultra's and X850 XT PE with 512MB. ;)

Razor1
16-Nov-2006, 00:48
Not if you count the rare 6800 Ultra's and X850 XT PE with 512MB. ;)


ah yeah but who's counting MIA cards :lol:

INKster
16-Nov-2006, 00:53
ah yeah but who's counting MIA cards :lol:

I had one in my very hands (a 6800 U 512) :D. I'm amazed Nvidia even bothered to do a special, longer PCB just for this almost non-existent product.

Geeforcer
16-Nov-2006, 06:56
Regarding the R600 bus width, all I can say is that if someone had asked me 3 months ago to bet on which was less likely, R600 having a 512-bit bus or G80 being a USA with 128 scalar ALUs, I would be quite poor right now.

zgemboandislic
16-Nov-2006, 07:35
Well, it will be the first time that the amount of my system memory is not 8 times the video memory.

I had a 9800pro 128MB and 1GB RAM, then a 7800GTX 256MB with 2 gigs, but I will definitely not buy 8GB of system memory when R600 hits the market...

The way it is going, in 2 years we will have more video memory than system memory! :D

neliz
16-Nov-2006, 10:32
Having that much memory available at that bandwidth, combined with ati's general low latency on memory requests could actually mean they're doing more with their memory than current generations of hardware. IQ wise anyway.

Robbitop
16-Nov-2006, 10:35
R5x0 ALU = 12 FLOPs
Xenos ALU = 9 FLOPs
G80 SP = 3 FLOPs


You do count the additional ADD from the R5xx mini-ALUs? AFAIK this isn't always available (of course, in synthetics which only use separate MAD/ADD/MUL shaders it is). For SFU, reciprocal, change of the algebraic sign etc. this unit is blocked. Many shaders and calculations need those functions, so the ADD will not be available most of the time (DP3 BM is a popular example).

Sorry for my horrable english. :-(

Radeon600
16-Nov-2006, 12:43
Isn't 128 G80 scalar processors equal to 4 quads (16) of R520 with a Vector3+Scalar or Vector4 ALU setup? Since the frequency of the G80 scalar processors is almost doubled, doesn't it compete with almost 32 Vector3+Scalar based pixel pipelines (8 quads) running at 650MHz? I don't think 96 vector-based shader processors makes any sense for R600; however, I suppose the figure of 96 shader processors may be correct for R680 instead.

What I believe is that R600 will indeed be based on the Xenos style, with an additional 16x array/cluster with Vector4+Scalar alignment, simply because ATI has already built Xenos on a similar architecture. And this suits ATI well since, as we've seen historically on their cards, all units operate at a similar frequency.


My take on R600 (quite similar to what Inquirer is reporting)

64 Shader Processors [Vector4+Scalar]
700MHz Core
16 TMUs
16 ROPs
384bit @ 2200MT/s GDDR4
1GB Memory
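
For reference, the peak bandwidth implied by a bus width and transfer rate like the rumoured figures in this post can be worked out directly. This is only a rough sketch: the function name is made up for illustration, and it assumes the quoted 2200MT/s is the effective per-pin transfer rate. The 512-bit case is the other rumour discussed in this thread, evaluated at the same assumed rate.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits     -- total width of the memory interface
    transfer_rate_mt_s -- effective transfers per second per pin, in millions
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


if __name__ == "__main__":
    # 2200 MT/s is the rumoured GDDR4 rate from the post, not a confirmed spec.
    for width in (256, 384, 512):
        print(f"{width}-bit @ 2200MT/s: {peak_bandwidth_gb_s(width, 2200):.1f} GB/s")
```

Under these assumptions a 384-bit bus comes out at about 105.6 GB/s and a 512-bit bus at about 140.8 GB/s. These are theoretical peaks and say nothing about how efficiently a memory controller uses them.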

trinibwoy
16-Nov-2006, 12:50
If you want to work it that way you'd come up with ~ 62 vec4 ALU's at 700Mhz.

= 128/4 * 1350/700.

I'm not buying the R600 is just Xenos with another array tacked on thing though. ATi must have some surprises in store for us given the extra time they've had with the architecture.
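
The "128/4 * 1350/700" arithmetic in this post can be written out as a small helper. It is a sketch of that one calculation only: the function name is hypothetical, and it compares peak lane throughput while ignoring utilisation, co-issue and everything else that separates the two designs.

```python
def equivalent_vec4_alus(scalar_units: int,
                         scalar_clock_mhz: float,
                         target_clock_mhz: float,
                         lanes: int = 4) -> float:
    """How many `lanes`-wide ALUs at `target_clock_mhz` match the peak
    lane throughput of `scalar_units` scalar ALUs at `scalar_clock_mhz`."""
    return scalar_units / lanes * scalar_clock_mhz / target_clock_mhz


if __name__ == "__main__":
    # 128 scalar ALUs at 1350MHz versus vec4 ALUs at 700MHz, as in the post.
    print(round(equivalent_vec4_alus(128, 1350, 700), 1))
```

The result is roughly 61.7, which is where the "~62 vec4 ALUs at 700MHz" figure comes from.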

Ailuros
16-Nov-2006, 21:29
You do count the additional ADD from the R5xx mini-ALUs? AFAIK this isn't always available (of course, in synthetics which only use separate MAD/ADD/MUL shaders it is). For SFU, reciprocal, change of the algebraic sign etc. this unit is blocked. Many shaders and calculations need those functions, so the ADD will not be available most of the time (DP3 BM is a popular example).

Sorry for my horrable english. :-(

Maximum theoretical throughput; does that ring a bell? Ok I'll revise for accuracy's sake:

R5x0 = 8 FLOPs
Xenos = 9 FLOPs
G80 = 2 FLOPs :P

PeterAce
16-Nov-2006, 22:30
Maximum theoretical throughput; does that ring a bell? Ok I'll revise for accuracy's sake:

R5x0 = 8 FLOPs
Xenos = 9 FLOPs
G80 = 2 FLOPs :P

ZOMG!!!1!! You're still counting R500/C1's extra 9'th FLOP which is the Scalar ADD +/ SFU? (I jest ;))


.....I hate counting/comparing FLOPs because it completely ignores an architecture's efficiency (or in some cases lack of it!).

Ailuros, I strongly agree with your view that 'knowing the number of units' (or in this case FLOPs) is pointless without knowing each unit's (or architecture's) capabilities.

<The reason for this post is that for the last few days I've been comparing G80's deliverable performance vs its theoretical FLOPs and then doing the same comparison with 9800 Pro, X800 XT and 6800 GT, and in lots of pixel-shader-limited cases G80 is performing between 10% and 70% faster than expected. I still have to do many more tests, but so far it looks like the scalar nature of G80's units really helps in upping efficiency.>

LeStoffer
16-Nov-2006, 22:55
... for the last few days I've been comparing G80's deliverable performance vs its theoretical FLOPs and then doing the same comparison with 9800 Pro, X800 XT and 6800 GT, and in lots of pixel-shader-limited cases G80 is performing between 10% and 70% faster than expected.

Faster than expected? Well, in theory the scalar design could reach an efficiency at near 100 percent, so what kind of efficiency were/are you expecting?

PeterAce
16-Nov-2006, 23:59
Faster than expected? Well, in theory the scalar design could reach an efficiency at near 100 percent, so what kind of efficiency were/are you expecting?

You're right, I was overly sceptical beforehand because I had read the part of the NVIDIA architecture docs (the day before launch, while my marketing/hype filter was switched on) where they stated that 128 scalar units had 2x the performance of 32 vec4s.

So after testing G80 I allowed myself to be much more impressed at the delivered results ;) :)

ChrisRay
17-Nov-2006, 04:12
If it's 512-bit, it's going to come down to both; the G80 doesn't take a pronounced hit from having less bandwidth. Chris Ray demonstrated this in one of his articles, I think: when he downclocked the memory of the G80, the hit was very small.

Actually.. I think I should clarify this since it may be misunderstood. I didn't write an article about memory bandwidth. But I did do some tests with bandwidth regarding 4xAA performance, where the performance impact was minimal. With 8xQ AA the performance impact of reducing the bandwidth was much larger. The 4xAA results were obviously at least partially related to the single-cycle 4xAA that occurs on the G80.

Chris

rwolf
17-Nov-2006, 05:58
Well, it will be the first time that the amount of my system memory is not 8 times the video memory.

I had a 9800pro 128MB and 1GB RAM, then a 7800GTX 256MB with 2 gigs, but I will definitely not buy 8GB of system memory when R600 hits the market...

The way it is going, in 2 years we will have more video memory than system memory! :D

Soon they will put the CPU on a card and the GPU on the motherboard.

INKster
17-Nov-2006, 06:30
Soon they will put the CPU on a card and the GPU on the motherboard.

Old news (http://www.asrock.com/product/AM2CPU%20Board.htm). :wink:

Shtal
17-Nov-2006, 23:51
I'm looking forward/truth about R600, but I still have to wait....

Shtal
17-Nov-2006, 23:51
64 Shader Processors [Vector4+Scalar]
700MHz Core
16 TMUs
16 ROPs
384bit @ 2200MT/s GDDR4
1GB Memory

I'm not sure how/why a 384-bit memory bus gives 1GB of RAM in total on the ATI side when Nvidia's 384-bit bus gives 768MB; 256-bit + 128-bit does not double 512MB to 1GB.

Also: the problem is that for R600 to beat G80 at anything, it has to run at a higher clock speed, at least around 700MHz for the GPU. And its 4-way SIMD has to be 100% as effective as 4 single units. "Lower clock speed on the ATI GPU will hurt the effectiveness of 4-way vs. Nvidia's 1-way." You need a higher clock frequency to get 100% effect out of 4-way {64 shader processors}.

I liked the previous rumor 96 Shaders without 4-way operational.
The way I look at it: fact is that a 4x design isn't as effective as a 1x design and never can be and the real world number.

Anarchist4000
18-Nov-2006, 00:36
My understanding of a ring bus is that you keep forwarding data from one point to the next until it gets to where it's going. How wide that bus is only determines how much data can be in flight at any given time. How you get the data into the bus is independent of how wide the bus itself is. Memory capacity/bandwidth could be increased simply by adding more points on the loop at the cost of additional hops for things to get where they're going. They could maintain a 256bit wide bus and still double the bandwidth simply by sticking more points/chips onto the loop, latency would likely be horrific however. I have no idea if the bus even has to run at the same speed as the memory. I'd imagine it'd be possible to have more available bandwidth on the bus than the memory chips themselves can handle.

As for utilization, anything in a 3D environment should predominantly be using vectors for almost everything. Vector ALUs should be quicker at doing vector operations than a series of scalar ALUs working to complete the same task.

Chalnoth
18-Nov-2006, 00:41
As for utilization, anything in a 3D environment should predominantly be using vectors for almost everything. Vector ALUs should be quicker at doing vector operations than a series of scalar ALUs working to complete the same task.
Correction: vector ALU's should be smaller than a series of scalar ALU's doing the same vector operations. Thus it should conceivably be possible to build more vector ALU's than an equivalent number of scalar ALU's (i.e. if nVidia made their 128 scalar ALU's into 4-vector ALU's, they might conceivably have fit more than 32 vector ALU's on the 8800).

But according to nVidia the performance improvement from full scalar is up to 2x, so nVidia may well be correct that the space savings just isn't worth it.

Shtal
18-Nov-2006, 01:49
The participants on this forum have many different points of view about ATI R600.
I made a poll; if anybody doesn't mind, please vote on your R600 expectations. Just to see what you hope "or" think ATI's engineers could do.
http://snappoll.com/poll/147813.php

Thanks :) :) :) :)

Jawed
18-Nov-2006, 02:29
My understanding of a ring bus is that you keep forwarding data from one point to the next until it gets to where it's going. How wide that bus is only determines how much data can be in flight at any given time. How you get the data into the bus is independent of how wide the bus itself is. Memory capacity/bandwidth could be increased simply by adding more points on the loop at the cost of additional hops for things to get where they're going. They could maintain a 256bit wide bus and still double the bandwidth simply by sticking more points/chips onto the loop, latency would likely be horrific however. I have no idea if the bus even has to run at the same speed as the memory. I'd imagine it'd be possible to have more available bandwidth on the bus than the memory chips themselves can handle.
GPUs prefer bandwidth over latency, generally. I think latency only becomes a serious question when you're looking at the pipeline level of a unit (a ROP or a TMU, say) where increased latency between a buffer and the pipeline means that the buffer has to be sized-up and the pipeline lengthened.

As for utilization, anything in a 3D environment should predominantly be using vectors for almost everything. Vector ALUs should be quicker at doing vector operations than a series of scalar ALUs working to complete the same task.
Vectorisation at the hardware level is arguably easier to implement. Though I suppose once you get into the twists and turns of arbitrary swizzling, co-issuing, dual-issuing and instruction predication/dynamic-branching the increased complexity of a vector pipeline becomes sufficient that an alternative approach seems quite interesting.

I think G80's scalar pipeline is pretty impressive:

utilisation of each ALU is significantly improved as a vector ALU's channels can't always be fully utilised (instruction level parallelism in the code is often too coarse grained for the co-issue options available)
register file utilisation is improved by having registers that only take the space they use since unused channels in vec4 registers no longer exist, wasting space - the GPU is no longer dependent upon the programmer packing all register usage as tightly as possible into the set of allocated registers
register file allocation gets a significant boost from (transparent to the dev) instruction reordering. Instead of using vec4s for intermediate values, you can "unroll" code into a "loop" that processes a single channel at a time. So if you have complex code that makes use of four intermediate vec4s, you can replace that code with four passes (as an unrolled loop) operating on scalars, using four scalar intermediate registers instead. That saves the space corresponding to three vec4s in the register file

What I wonder about G80's scalar pipeline is:

how costly is the hardware required to organise scalar instruction despatch? It's much harder to schedule a different instruction every clock cycle than every few clock cycles, or every few hundred cycles as we see in older GPU designs (clearly, though, this is the norm in CPU implementations)
how complex is the hardware required to access the register file and perform swizzling and predication? GPUs have to work with huge register files (which are difficult enough) and a scalar pipeline seems to add a significant degree of irregularity into the access patterns (read and write) used against a register file

The end result of G80's scalar pipeline seems to be a significant boost in real-world ALU utilisation (guess: 33%) and register file consumption (guess: 150% increase in threads in flight for a given register file size).
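
The register-file saving described in this post (four intermediate vec4s replaced by four scalar intermediates) is simple to count out. The sketch below does only that bookkeeping; the helper names and the register file size in the usage note are made up for illustration and are not G80 figures.

```python
VEC4_WIDTH = 4  # scalar slots occupied by one vec4 register


def scalar_slots(vec4_regs: int = 0, scalar_regs: int = 0) -> int:
    """Register file space, in scalar slots, used by one thread."""
    return vec4_regs * VEC4_WIDTH + scalar_regs


def threads_in_flight(regfile_slots: int, slots_per_thread: int) -> int:
    """How many threads fit in a register file of the given size."""
    return regfile_slots // slots_per_thread


# Four vec4 intermediates versus four scalar intermediates (one channel per pass).
before = scalar_slots(vec4_regs=4)
after = scalar_slots(scalar_regs=4)
saved_vec4s = (before - after) // VEC4_WIDTH

if __name__ == "__main__":
    print(before, after, saved_vec4s)
```

Here `before` is 16 slots, `after` is 4, and the difference is the three vec4s mentioned in the post. With a purely hypothetical 8192-slot register file, `threads_in_flight(8192, before)` and `threads_in_flight(8192, after)` show how a smaller per-thread footprint turns directly into more threads in flight; real shaders hold other live registers too, so the overall gain would be smaller than this isolated case.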

Jawed

silent_guy
18-Nov-2006, 08:53
How wide that bus is only determines how much data can be in flight at any given time. How you get the data into the bus is independent of how wide the bus itself is. Memory capacity/bandwidth could be increased simply by adding more points on the loop at the cost of additional hops for things to get where they're going.
:idea: It's actually just the opposite!
Bandwidth is the amount of data per unit of time that can pass a particular reference point. By widening the bus, you can deliver more data per clock cycle. Adding ring stages increases the number of in-flight transactions and, given the same number of injection points, reduces the chance of a bus collision, but it doesn't increase bandwidth.

I have no idea if the bus even has to run at the same speed as the memory. I'd imagine it'd be possible to have more available bandwidth on the bus than the memory chips themselves can handle.
That's correct: you'd better make the bandwidth larger than the input feed. This is why it's important to look at the R580 ringbus as 2 separate 256-bit buses: each clock cycle, the full input bandwidth can be transferred from one hop to the next.
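
The distinction drawn in this post, that width sets bandwidth while the number of stages only sets how much is in flight, can be shown with a toy model. It assumes every ring stage forwards one bus-width word per clock; the function names and the stage counts in the usage note are illustrative, not R580 specifics.

```python
def bandwidth_bytes_per_clock(width_bits: int, rings: int = 1) -> int:
    """Data passing any single point on the ring(s) each clock."""
    return rings * width_bits // 8


def in_flight_bytes(width_bits: int, stages: int, rings: int = 1) -> int:
    """Data held in the ring(s) at once: one word per stage per ring."""
    return rings * stages * width_bits // 8


if __name__ == "__main__":
    for stages in (4, 8):
        print(stages,
              bandwidth_bytes_per_clock(256),
              in_flight_bytes(256, stages))
    # Two separate 256-bit rings, as the post suggests viewing the R580 ringbus.
    print(bandwidth_bytes_per_clock(256, rings=2))
```

Doubling the stages from 4 to 8 doubles the in-flight data (128 to 256 bytes) but leaves the per-clock bandwidth at 32 bytes; only widening the bus or adding a second ring raises it.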

Radeon600
18-Nov-2006, 12:00
I'm not sure how/why a 384-bit memory bus gives 1GB of RAM in total on the ATI side when Nvidia's 384-bit bus gives 768MB; 256-bit + 128-bit does not double 512MB to 1GB.

:?: I am talking about the memory controller being 384 bits wide. What does that have to do with the amount of memory? Care to explain those 128-bit video cards with 512MB of RAM?


Also: the problem is that for R600 to beat G80 at anything, it has to run at a higher clock speed, at least around 700MHz for the GPU. And its 4-way SIMD has to be 100% as effective as 4 single units. "Lower clock speed on the ATI GPU will hurt the effectiveness of 4-way vs. Nvidia's 1-way." You need a higher clock frequency to get 100% effect out of 4-way {64 shader processors}.

Talking about Xenos style ALU, it's a Vector4+Scalar, so don't exclude that Scalar ALU, it makes 64 of them in total if R600 comes up with four arrays of 16x

rendezvous
18-Nov-2006, 12:50
:?: I am talking about the memory controller being 384 bits wide. What does that have to do with the amount of memory? Care to explain those 128-bit video cards with 512MB of RAM?

You tend to use memory chips of the same size and width for the whole bus.
For it to have 1 GB of RAM on a 384-bit bus you would need to split the bus up into two parts, one holding 512 MB over 256 bits and one holding 512 MB over 128 bits. This would be an unbalanced design.

512 MB on a 128-bit bus can easily be explained: one could use 8 32Mx16 chips where two chips share 16 bits of the bus.

Radeon600
18-Nov-2006, 14:24
You tend to use memory chips of the same size and width for the whole bus.
For it to have 1 GB of RAM on a 384-bit bus you would need to split the bus up into two parts, one holding 512 MB over 256 bits and one holding 512 MB over 128 bits. This would be an unbalanced design.

With the 8800GTX, NVIDIA used six 64-bit channels, with each channel connected to two 64MB DRAMs (12x64MB); even ATI used eight 32-bit channels in X1800 for maximum I/O efficiency.
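
The channel arithmetic in this exchange (six 64-bit channels with two 64MB chips each) can be checked with a few lines. This is a sketch with hypothetical helper names; it assumes identical chips on every channel and that DRAM chips only come in power-of-two capacities, which is the usual case but is an assumption here.

```python
def board_config(channels: int, channel_width_bits: int,
                 chips_per_channel: int, chip_mb: int) -> tuple:
    """Return (total bus width in bits, total memory in MB)."""
    bus_bits = channels * channel_width_bits
    total_mb = channels * chips_per_channel * chip_mb
    return bus_bits, total_mb


def chip_mb_needed(total_mb: int, channels: int, chips_per_channel: int) -> float:
    """Per-chip capacity required to reach total_mb with identical chips."""
    return total_mb / (channels * chips_per_channel)


def is_power_of_two(x: float) -> bool:
    """True if x is a positive whole number that is a power of two."""
    n = int(x)
    return x == n and n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    print(board_config(6, 64, 2, 64))     # the 8800GTX layout from the post
    print(chip_mb_needed(1024, 6, 2))     # chip size a 1GB board would need
    print(board_config(6, 64, 2, 128))    # next step up with the same layout
```

With this layout the 8800GTX numbers give a 384-bit bus and 768MB. Reaching exactly 1GB across twelve identical chips would need about 85.3MB per chip, which is not a power of two, so the natural capacities for this kind of balanced 384-bit layout are 768MB or 1536MB.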

The_Wolf_Who_Cried_Boy
18-Nov-2006, 23:23
even ATI used eight 32bit channels in X1800 for maximum I/O efficiency.

Didn't ATI design their memory controller with growth into GDDR4 in mind in that it used 8 cycle bursts as opposed to GDDR3 using 4 cycle bursts?
Wasn't it also suggested at the time it could already scale to very high clocks and as such would be reused in the next generation of designs?

That being the case, an R600 utilising the current 8x32-bit memory controller may not equal a G8800 in theoretical bandwidth if both used identical memory, but would be much more efficient with 8 concurrent transactions as opposed to six.

bdmosky
19-Nov-2006, 03:16
The participants on this forum have many different points of view about ATI R600.
I made a poll; if anybody doesn't mind, please vote on your R600 expectations. Just to see what you hope "or" think ATI's engineers could do.
http://snappoll.com/poll/147813.php

Where are the slower options?

Shtal
19-Nov-2006, 03:24
Where are the slower options?

About the same option could also mean slower on R600 side.
Because "about" could mean slower or faster or same.
I don't think R600 could suck bad.

Chalnoth
19-Nov-2006, 03:44
Well, it could, but I don't think any of us think it will.

MfA
19-Nov-2006, 04:35
But according to nVidia the performance improvement from full scalar is up to 2x
That's a bit disingenuous ... by saying "full scalar" you are suggesting the 2x improvement is in comparison to "partly scalar". Which isn't what they said.

My guess is that the split vector/scalar should perform well enough for "normal" rendering for it not to be crippling for ATI to stick with it ... of course, if you want to use it to run other types of algorithms, the full scalar architecture is going to be very advantageous a lot of the time.

Chalnoth
19-Nov-2006, 07:18
That's a bit disingenuous ... by saying "full scalar" you are suggesting the 2x improvement is in comparison to "partly scalar". Which isn't what they said.
I assumed they meant with respect to co-issue hardware, like the NV4x.

Shtal
19-Nov-2006, 07:30
Well, it could, but I don't think any of us think it will.

Sorry, personally it never occurred to me that I should include a slower option percentage-wise; but if any of you think that, go ahead and post your comment here and it will count. Sorry....

So far 15-20% is winning at the moment; I hope people here are positive about ATI/AMD's R600.

Also, just curious: if any of you think that R600 could be much slower than G80, please discuss. I would like to see in what way it could be, if somebody does think that.

Tim Murray
19-Nov-2006, 07:49
Why are so many people assuming that R600 is nothing more than Xenos on a PCI Express bus? That seems awfully silly, considering the platform differences, DX10 requirements, and the lessons they've learned from Xenos, including those in the ALU configuration. Plus, looking at DX10 processors that are currently on the market, every one uses scalar ALUs. I'm guessing there are significant advantages to this in the D3D10 specs, and I'm pretty sure ATI would know this as well.

SA2
19-Nov-2006, 08:16
It would indeed be more than a bit surprising for the R600 to simply be a Xenos with a few more shaders, given the rumored die size, memory bandwidth, power requirements, delivery timeframe, etc.

Shtal
19-Nov-2006, 08:55
I read somewhere "I think it was from X-bit labs" that R600 is going to be based on Xenos R500 tech with improved shader capabilities, plus of course DX10-compliant hardware. The only downside is that it will not include the embedded eDRAM that comes in Xenos.

satein
19-Nov-2006, 09:52
Well, it could, but I don't think any of us think it will.
Well, you sound so confident in the post... any bean to spill here to indicate such a point :wink:

It would indeed be more than a bit surprising for the R600 to simply be a Xenos with a few more shaders, given the rumored die size, memory bandwidth, power requirements, delivery timeframe, etc.
Yes, I agree :yep2:. At least two of the differences are the ring bus that is absent from Xenos and the EDRAM that will be absent from R600. Anyway, I believe the offspring of a Xenos and R5xx combination would do well against G8x. It seems like right now AMD/ATi is so quiet, as if they are :sleeping:...

Hope we will see any leaks soon :cool:

Edit: Typo (as usual :oops:)

MfA
19-Nov-2006, 10:15
I assumed they meant with respect to co-issue hardware, like the NV4x.
From the direct quote (http://www.hothardware.com/viewarticle.aspx?page=2&articleid=903&cid=2), "NVIDIA engineers have estimated as much as 2X performance improvement can be realized from a scalar architecture that uses 128 scalar processors versus one that uses 32 4-component vector processors".
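
To make the 128-scalar versus 32-vec4 comparison concrete, here is a toy utilisation model. It is an illustration only, not NVIDIA's method: it assumes a plain vec4 unit that spends one cycle per instruction regardless of how many components the instruction uses (no co-issue), and scalar units that are always kept busy. The instruction mixes are invented.

```python
VEC4_LANES = 4


def vec4_utilisation(widths: list) -> float:
    """Fraction of vec4 lanes doing useful work.

    `widths` lists the number of components (1-4) each instruction uses;
    every instruction is assumed to occupy the whole unit for one cycle.
    """
    return sum(widths) / (VEC4_LANES * len(widths))


def scalar_speedup(widths: list) -> float:
    """Ideal throughput of 4 scalar units relative to 1 such vec4 unit."""
    return 1.0 / vec4_utilisation(widths)


if __name__ == "__main__":
    mixed = [3, 1, 3, 1, 4, 2]  # hypothetical shader instruction widths
    print(f"{vec4_utilisation(mixed):.2f} {scalar_speedup(mixed):.2f}")
```

In this model a shader made entirely of 4-component instructions gains nothing from the scalar design, one averaging two components per instruction gains 2x, and the invented mix above lands at roughly 1.7x. Co-issue hardware narrows the gap further, which this sketch does not model.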

hoom
19-Nov-2006, 10:35
I read somewhere "I think it was from X-bit labs" that R600 is going to be based on Xenos R500 tech with improved shader capabilities, plus of course DX10-compliant hardware. The only downside is that it will not include the embedded eDRAM that comes in Xenos.
Yeah, that's what everyone assumes, but well, we mostly assumed a non-unified, beefed-up G71 for G80 and that wound up being very wrong, so who knows? Other than NDAed people anyway :???:

no-X
19-Nov-2006, 11:16
From the direct quote (http://www.hothardware.com/viewarticle.aspx?page=2&articleid=903&cid=2), "NVIDIA engineers have estimated as much as 2X performance improvement can be realized from a scalar architecture that uses 128 scalar processors versus one that uses 32 4-component vector processors".
I'd like to know which configuration has the higher transistor count: 128 scalar processors or 64 vec4 processors? I still think that the old-style vec3+scalar architecture is more effective (at least) for today's games (performance per square mm). G80's shader core should be 2x more effective than G71's, it's 2x bigger than G71, and its ALUs are clocked 2x higher than G71's, but the performance isn't 2x2x2 = 8x better... it's still about twice as fast as G71...

nAo
19-Nov-2006, 12:05
It's much bigger cause it's also more advanced and it supports much more features

DemoCoder
19-Nov-2006, 12:41
A DX10 SIMD vector architecture IMHO will be more complex than a DX10 scalar architecture, especially if you count software complexity. The SP architecture achieves efficiency from simplicity. Utilization rates go up even as individual scheduling logic is simpler. SIMD architectures impose a lot more work in the driver and silicon to make sure HW doesn't sit idle. It's hard to measure this effect on transistors because the G80 is a massive boost in features as well.

Geo
19-Nov-2006, 13:25
A DX10 SIMD vector architecture IMHO will be more complex than a DX10 scalar architecture, especially if you count software complexity. The SP architecture achieves efficiency from simplicity. Utilization rates go up even as individual scheduling logic is simpler. SIMD architectures impose a lot more work in the driver and silicon to make sure HW doesn't sit idle. It's hard to measure this effect on transistors because the G80 is a massive boost in features as well.

We're assuming R600 is SIMD because of Xenos? They've clearly said that they're leveraging Xenos, but one would have to believe they are bringing some v2 wrinkles as well.

Arun
19-Nov-2006, 13:33
G71 isn't Vec4... It's Vec2+Vec2/Vec3+Scalar. There still are efficiency gains with a purely scalar architecture, and they can still be up to 2x in the absolute corner cases, but generally speaking they're much smaller.

I do not believe NVIDIA's implementation of their scalar units is more expensive than G71's implementation of Vec2+Vec2/Vec3+Scalar (at least not by more than, say, 10%) - it's just smarter, imo.


Uttar

silent_guy
19-Nov-2006, 19:26
I'd like to know, which configuration has higher transistor count. 128 scalar-processors or 64 vec4 processors?
Irrespective of whether they are scalar or not, GPU's have simple execution units, in the sense that they don't have single thread out-of-order execution, register renaming, branch prediction and other fancy stuff that you can find in contemporary CPU's. They also don't have support for exceptions, no interrupt support etc etc. So after stripping off all this, you're left with the ALU's, register files (that are much larger for a GPU than CPU) and probably still quite a lot of control logic but a lot of it won't be directly scalar/non-scalar related.
My guess would be that the key factor in determining the relative perf/mm2 efficiency of the processor is the complexity of the register file design: single-ported/double-ported/triple-ported? The area increases more or less linearly with the number of ports. If a vec3/4 unit can produce 3 or 4 MULs per clock cycle, the data has to come from and go to somewhere?
Maybe Jawed has a better idea about this:?:

I still think that the old-style vec3+scalar architecture is more effective (at least) for today's games (performance per square mm). G80's shader core should be 2x more effective than G71's, it's 2x bigger than G71, and its ALUs are clocked 2x higher than G71's, but the performance isn't 2x2x2 = 8x better... it's still about twice as fast as G71...
I'm sure that a bit of creative thinking will allow you to find some holes in your calculation. (Hint: count the number of MADDs and try to find out if G80 has functionality that has been added or improved compared to G70.)

Shtal
19-Nov-2006, 23:08
Right now AMD is getting ready with its 65nm shrink, which will soon be widely available. Does anybody think that, since AMD acquired ATI, development of a 65nm shrink for the ATI R6xx series will speed up too? Or would it not have made any difference even if ATI had never been bought by AMD?

Has anybody ever thought about this question?

Razor1
19-Nov-2006, 23:12
AMD doesn't have the capacity to do both right now, or in the near to mid future; that's the only thing really stopping AMD from making ATi chips in their fabs. That, and it will take time to shift over to the AMD libraries. I would think possibly in 2 or 3 years we might see a transition, once the NY fab opens up. But for the time being, nada :wink:

Kaotik
19-Nov-2006, 23:13
Right now AMD is getting ready with its 65nm shrink, which will soon be widely available. Does anybody think that, since AMD acquired ATI, development of a 65nm shrink for the ATI R6xx series will speed up too? Or would it not have made any difference even if ATI had never been bought by AMD?

Has anybody ever thought about this question?

It doesn't really make any difference, R6xx chips are manufactured by TSMC, not AMD

Shtal
19-Nov-2006, 23:15
AMD doesn't have the capacity to do both right now, or in the near to mid future; that's the only thing really stopping AMD from making ATi chips in their fabs. That, and it will take time to shift over to the AMD libraries. I would think possibly in 2 or 3 years we might see a transition, once the NY fab opens up. But for the time being, nada :wink:

So in other words, it feels like it's still two different companies, only with the AMD logo now vs. the ATI logo.

Jawed
19-Nov-2006, 23:16
This patent:

Simulating Multiported Memories Using Lower Port Count Memories (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=WO2006017135&F=0)

needs some study for the register file complexity question. As far as I can tell the primary issue with G80's register file is that on each clock cycle the access pattern can be different from the prior clock.

Additionally, it now seems that there are three kinds of units writing to the register file:

primary MAD pipeline - reading 16, 32 or 48 scalars per clock and writing 16 scalars per clock
SF unit - reading and writing four scalars per clock
TEX pipeline - writing 16 scalars (4x vec4) per clock (much slower clock)

and I haven't even considered the flow of data required in reading constants (from the constant buffer - a process that appears to be more like texel-fetching, but with the data normally expected to reside in L1 cache, I guess).

---

The "co-issuable" MUL in G80 appears to be Int24, only. It seems G80 actually offers ~345GFLOPs and NVidia, for some reason, never quotes GFLOPs in documents (TIA: if someone can find a GFLOPs figure for G80 in an NVidia document - it used to mystify me why NVidia's been so thoroughly quiet on this).

Obviously if you compare R580's GFLOPs, they come from the co-issue of MAD+ADD. In MADs, alone, G80 is ~33% faster, whilst in ADDs R580 is ~equal. Of course, that's ignoring R580's VS GFLOPs.

Obviously the whole thing is skewed in G80's favour because of unified shading and the scalar pipeline's inherently greater utilisation in code that doesn't occupy all four channels of vec4 ALUs.

Jawed

Razor1
19-Nov-2006, 23:29
So in other words, it feels like they're still two different companies, only with an AMD logo now instead of an ATI logo.

Not two different companies; kinda like "if it ain't broken, don't fix it"!

Well, it really depends. AMD could choose to make ATi GPUs in their fabs, but if they are getting more money from AMD chips, it wouldn't be smart for them to shift already-low capacity over to something less profitable overall.

Shtal
19-Nov-2006, 23:38
Not two different companies; kinda like "if it ain't broken, don't fix it"!

Well, it really depends. AMD could choose to make ATi GPUs in their fabs, but if they are getting more money from AMD chips, it wouldn't be smart for them to shift already-low capacity over to something less profitable overall.

I remember the days when Nvidia acquired 3dfx; later they used 3dfx engineers to make the NV30 (FX) chips. I was never sure whether the failure was caused by that or not. But the original question was: are two heads better than one?

Just to let you know, I'm 100% with you on your answer!

Geeforcer
20-Nov-2006, 02:54
As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.

Kaotik
20-Nov-2006, 03:05
As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.

Why not? It's all about how much you beef it up :wink:

Geeforcer
20-Nov-2006, 03:13
I should have been more specific - I had "64 Xenos-type ALUs" rumor in mind. IMO, for the 700M transistors rumor to be true, either R600 ALU >> Xenos ALU or R600 ALU # >> 64.

SugarCoat
20-Nov-2006, 04:36
As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.



If you had tried to tell me 4 months ago that the 8800GTX was being powered by a 700M transistor chip, I would have told you your head might be screwed on backwards and stuck in your ass. Everyone should be humbled and accepting after that ;).

Calling the R600 an improved Xenos really doesn't do it justice at all. It would have to be quite large regardless to pack in support for a 512-bit bus, let alone what added features might do to the transistor count.

I had "64 Xenos-type ALUs" rumor in mind.

Well, erase it from your mind; they're significantly beefed up.

Geeforcer
20-Nov-2006, 04:50
All of that is exactly my point: A) R600 will either feature some pretty dramatic changes to its 64 ALUs OR B) have more than 64 of them OR C) will not be 700+ Million transistors. My money is on A.

rwolf
20-Nov-2006, 09:59
All of that is exactly my point: A) R600 will either feature some pretty dramatic changes to its 64 ALUs OR B) have more than 64 of them OR C) will not be 700+ Million transistors. My money is on A.

We know that to be true based on patents. Accumulator, min, max in simd unit.

rwolf
20-Nov-2006, 10:08
We're assuming R600 is SIMD because of Xenos? They've clearly said that they're leveraging Xenos, but one would have to believe they are bringing some v2 wrinkles as well.

It is a good assumption, given that some of their latest patents show vec4 + scalar.

PeterAce
20-Nov-2006, 13:27
With R520, AMD (ATi) adopted a cautious strategy: as they were making a big change from the previous R4xx generation, they decided that SM 3.0 plus the new ultra-threaded design was enough 'risk', and that they would not 'go for the moon' by adding lots of extra PS ALUs. Once the design was trusted, they went for the 'many more ALUs' R580.

As more noises have been made recently about 64 ALUs (Vec4 MADD + Scalar ADD/SF) in R600, I'm wondering if they are doing the same thing this time round with the first high-end R600. Maybe the 'next gen refresh' of R600 (labelled as 65nm on the current roadmap) will be more like the 'going for the moon' version (many more ALUs + new 10.1 requirements) and will be more like my previous speculation of 96 ALUs:

http://www.beyond3d.com/forum/showpost.php?p=854164&postcount=27

Razor1
20-Nov-2006, 13:56
The "co-issuable" MUL in G80 appears to be Int24, only. It seems G80 actually offers ~345GFLOPs and NVidia, for some reason, never quotes GFLOPs in documents (TIA: if someone can find a GFLOPs figure for G80 in an NVidia document - it used to mystify me why NVidia's been so thoroughly quiet on this).

Obviously if you compare R580's GFLOPs, they come from the co-issue of MAD+ADD. In MADs, alone, G80 is ~33% faster, whilst in ADDs R580 is ~equal. Of course, that's ignoring R580's VS GFLOPs.

Obviously the whole thing is skewed in G80's favour because of unified shading and the scalar pipeline's inherently greater utilisation in code that doesn't occupy all four channels of vec4 ALUs.

Jawed


It's in the gf8800 tech brief:

http://www.nvidia.com/object/IO_37100.html

Each stream processor on a GeForce 8800 GTX operates at 1.35 GHz and supports the dual issue of a scalar MAD and a scalar MUL operation, for a total of roughly 520 gigaflops of raw shader horsepower. But raw gigaflops do not tell the whole performance story. Instruction issue is 100 percent efficient with scalar shader units, and the mixed scalar and vector shader program code will perform much better compared to vector-based GPU hardware shader units that have instruction issue limitations (such as 3+1 and 2+2).

There is a good chance ATi's r600 might have more gflops, so they probably aren't going to market the gflop side too much right now.
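
For anyone wanting to check the two G80 numbers in this exchange (the ~345 GFLOPs above and the brief's "roughly 520"), here is a rough sketch of the arithmetic. It assumes 128 stream processors for the 8800 GTX, the 1.35 GHz clock from the brief, a MAD counted as 2 flops and a MUL as 1; the function name is made up for illustration and the results are theoretical peaks only.

```python
def peak_gflops(units, clock_ghz, flops_per_unit_per_clock):
    """Theoretical peak: units x clock x flops each unit can issue per clock."""
    return units * clock_ghz * flops_per_unit_per_clock

G80_UNITS = 128       # assumed stream processor count for the 8800 GTX
G80_CLOCK_GHZ = 1.35  # shader clock quoted in the tech brief

# A MAD is one multiply plus one add, so it counts as 2 flops.
mad_only = peak_gflops(G80_UNITS, G80_CLOCK_GHZ, 2)
# The brief's figure adds the co-issued MUL: 2 + 1 = 3 flops per clock.
mad_plus_mul = peak_gflops(G80_UNITS, G80_CLOCK_GHZ, 2 + 1)

print(f"MAD only:  {mad_only:.1f} GFLOPs")
print(f"MAD + MUL: {mad_plus_mul:.1f} GFLOPs")
```

So the gap between the two quoted figures is just whether the second MUL is counted for general shading or not.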

Jawed
20-Nov-2006, 14:14
Ta, it was under my nose. I guess they better hurry up and get it working then.

Jawed

Razor1
20-Nov-2006, 15:09
Nah, it was hard to find; I actually looked through that doc too yesterday and missed it, lol.

But that would be one hell of a backfire if nV starts promoting gflops and they get the short end of the stick!

trinibwoy
20-Nov-2006, 15:24
Flop numbers have never been a focus of PC GPU marketing before....why would it become so now?

Kaotik
20-Nov-2006, 15:40
Flop numbers have never been a focus of PC GPU marketing before....why would it become so now?

I guess GPGPU solutions might be a reason for GFLOP-advertising?

trinibwoy
20-Nov-2006, 16:00
I guess but wouldn't that be a completely different realm of marketing? I'd be surprised to see gflop quotes on a retail box for example.

no-X
20-Nov-2006, 17:58
I remember the days when Nvidia acquired 3dfx; later they used 3dfx engineers to make the NV30 (FX) chips. I was never sure whether the failure was caused by that or not. But the original question was: are two heads better than one?
nVidia used ex-3Dfx engineers to make NV4x, too http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=9

//edited

_xxx_
20-Nov-2006, 22:05
I wouldn't say "they used 3dfx engineers", it's nV's employees after all ;)

dnavas
21-Nov-2006, 01:20
This patent:

Simulating Multiported Memories Using Lower Port Count Memories (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=WO2006017135&F=0)

needs some study for the register file complexity question.

Honestly, I'm confused by that patent, because I'm not sure exactly what problem it's trying to solve. This appears to me to be similar to multibank DRAM and/or XDR2, but I don't expect, during the normal course of graphics rendering, to require randomized access to the register file. You're always dealing with one of a handful of registers on a particular thread. I'm clearly missing something, but I'm not sure where to start looking :(

Shtal
21-Nov-2006, 08:05
I was wondering: does ATI really need a 512-bit memory bus? Could it really happen, and would it affect performance enough to justify the cost? Would it hurt Nvidia if ATI can show 512-bit to customers choosing a new video card, when at the high end they get four choices: 256-bit, 320-bit, 384-bit, or the mighty 512-bit option from ATI?

Or will ATI choose the cheaper route instead of going with 512-bit memory? My understanding of the X1950XTX, with 2.0GHz GDDR4 at 64GB/s of bandwidth: it becomes memory-limited when you start overclocking the R580 core to the ~900MHz range; that's where the impact begins. At lower frequencies, 2.0GHz GDDR4 feeds the R580 core just fine, even when overclocking to the ~800MHz range. Possibly the high bandwidth of 512-bit memory would only be needed for a high-frequency GPU, or for running insane resolutions with AA+HDR.
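
To put numbers on the bus-width options, here is a quick sketch of the usual peak-bandwidth formula (bus width in bytes times effective data rate). It assumes the X1950XTX's 256-bit bus and holds the 2.0 GT/s effective rate constant across all four widths purely for comparison; real boards ship with different memory clocks, and the function name is made up.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gtps):
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * data_rate_gtps

DATA_RATE_GTPS = 2.0  # effective GDDR4 rate from the post, assumed equal for every width

bandwidths = {width: bandwidth_gb_s(width, DATA_RATE_GTPS)
              for width in (256, 320, 384, 512)}

for width, gb_s in bandwidths.items():
    print(f"{width}-bit @ {DATA_RATE_GTPS} GT/s: {gb_s:.0f} GB/s")
```

The 256-bit case reproduces the 64GB/s figure; at the same memory speed a 512-bit bus would simply double it.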

Radeon600
21-Nov-2006, 10:28
This thread revolves entirely around how many shader processors R600 will have and around its memory interface, completely ignoring R600's texturing capabilities and technological advancements in ROP design; I am pretty sure those make a huge difference as well. To be honest, more speculation will only lead to more confusion. R600 will be powerful enough, that's for sure. There are strong chances that R600 might be able to beat 8800GTX, given that ATI is taking more time.

Corwin_B
21-Nov-2006, 10:32
There are strong chances that R600 might be able to beat 8800GTX, given that ATI is taking more time.

Yep, and NV30 really kicked the R300's ass... :grin:

Radeon600
21-Nov-2006, 10:43
That was NV's fault: they were wrong in predicting what future games would hold for their architecture. In fact, I am pretty sure the same applies to the X1800XT, as initially the X1800XT was slightly slower than the 7800GTX, yet with the help of later drivers it managed to beat the 7800GTX, thanks to the fully programmable Ring Bus Memory Controller.

Xmas
21-Nov-2006, 10:51
Honestly, I'm confused by that patent, because I'm not sure exactly what problem it's trying to solve. This appears to me to be similar to multibank DRAM and/or XDR2, but I don't expect, during the normal course of graphics rendering, to require randomized access to the register file. You're always dealing with one of a handful of registers on a particular thread. I'm clearly missing something, but I'm not sure where to start looking :(
Even if it's only a handful of registers, it's still random access. And you need to be able to switch threads quickly.

_xxx_
21-Nov-2006, 10:53
I think we'll have the same situation as always, speed-wise both will be within 5% difference in most cases, prices will be about the same but nV's chip/boards are cheaper to manufacture and thus nV will have better margins again (and better sales thanks to the timing).

Ailuros
21-Nov-2006, 11:17
That was NV's fault: they were wrong in predicting what future games would hold for their architecture. In fact, I am pretty sure the same applies to the X1800XT, as initially the X1800XT was slightly slower than the 7800GTX, yet with the help of later drivers it managed to beat the 7800GTX, thanks to the fully programmable Ring Bus Memory Controller.

Must be the reason why they released the R580 with 3x the floating point power on paper, I guess; before we end up running around in circles again, his point was valid, since a later release does not guarantee higher performance at any price. It's what everyone expects, that much is true (and better be that way IMHO).

Last but not least, also read xxx's comment above. It's not that I expect R600 to not end up more than competitive; later releases though do come with a specific amount of disadvantages.

Radeon600
21-Nov-2006, 11:24
I think we'll have the same situation as always, speed-wise both will be within 5% difference in most cases, prices will be about the same but nV's chip/boards are cheaper to manufacture and thus nV will have better margins again (and better sales thanks to the timing).

If R600 is able to beat the 8800GTX convincingly and if ATI prices R600 sensibly from the start, then ATI will be able to recover very quickly, especially as they've been looking more aggressive in the mainstream segment for about a year or so.

And I am pretty sure most people are waiting for R600 (besides die-hard NV fans), before doing any upgrade. There are no mainstream DX10 cards present yet, so I believe it isn't hurting AMD/ATI business much.

nAo
21-Nov-2006, 11:27
And I am pretty sure most people are waiting for R600 (besides die-hard NV fans) before doing any upgrade.
1) Developers don't wait
2) Even common ppl don't wait forever

Radeon600
21-Nov-2006, 11:38
1) Developers don't wait

From what I think and from what I've read, many developers have already been provided with an R600 sample.


2) Even common ppl don't wait forever

I am not saying people will have to wait forever ;) And you may not know it, but the enthusiast market is pretty small compared to the mainstream and low-end markets. Think about it from a marketing and profit perspective. AMD/ATI isn't losing much money, and ATI is barely three months behind NV.

nAo
21-Nov-2006, 11:43
From what I think and from what I've read, many developers have already been provided with an R600 sample.
Yeah..it's full of devs with a R600 out there..

Radeon600
21-Nov-2006, 11:45
Yeah..it's full of devs with a R600 out there..

Yea perhaps, many excludes all devs ;)

The_Wolf_Who_Cried_Boy
21-Nov-2006, 11:48
1) Developers don't wait
2) Even common ppl don't wait forever

Yeah..it's full of devs with a R600 out there..

Would this be a subtle hint that R600 is running really late, potentially missing even a Q1-07 release?

vertex_shader
21-Nov-2006, 11:49
If R600 is able to beat the 8800GTX convincingly and if ATI prices R600 sensibly from the start, then ATI will be able to recover very quickly, especially as they've been looking more aggressive in the mainstream segment for about a year or so.

And I am pretty sure most people are waiting for R600 (besides die-hard NV fans), before doing any upgrade. There are no mainstream DX10 cards present yet, so I believe it isn't hurting AMD/ATI business much.

Once R600 is released, AMD won't have much time to release the mainstream products: nv's g80 mainstream parts are coming in 2007Q1.
In its last few attempts ATi couldn't make a good price/performance mainstream card; it always came out with high-end cards with some pipelines and other things disabled (x800gt/x800gto, x1800gto).
They now have a good product in the x1950pro. The x1650xt is only $30-40 cheaper than the x1950pro and much slower; it is also very late, and not really faster than the 7600gt. The x1950pro is now the best performance/$ card in the $180-200 category.

AMD needs to release r600 mainstream cards in 2007Q1, and can't make a performance mistake like last time with the x1300/1600 series; rv515/rv530 were very unbalanced.
High-end is good for PR but not good for earning money; the mainstream market and OEM are the real golden chest :smile:

Radeon600
21-Nov-2006, 11:52
Must be the reason why they released the R580 with 3x the floating point power on paper, I guess; before we end up running around in circles again, his point was valid, since a later release does not guarantee higher performance at any price. It's what everyone expects, that much is true (and better be that way IMHO).

Last but not least, also read xxx's comment above. It's not that I expect R600 to not end up more than competitive; later releases though do come with a specific amount of disadvantages.

Also Ailuros, I see your point. But chances are that ATI is going to give NV tough competition, and ATI is always aggressive in product pricing from the start.

Arun
21-Nov-2006, 11:56
There are no mainstream DX10 cards present yet, so I believe it isn't hurting AMD/ATI business much.
It's fairly accurate to say that NVIDIA and ATI have a profit of ~$100 on the ultra-high-end chips, unless they need to price it stupidly aggressively to remain competitive. So that's $1M of pure profit for every 10K boards sold. While I agree that this isn't likely to significantly change anyone's profitability, claiming it to be inconsequential would be quite insane, too, because it's tons of profit, with high margins.
That was NV's fault: they were wrong in predicting what future games would hold for their architecture
Errr, you'd be quite naive to think that either IHV does not have the proper intelligence to figure that shit out. The problem isn't that they don't know what to optimize for, but rather that they don't know how long specific trends will last and how long their specific architecture will last.

Consider the fact that when NVIDIA designed the NV40, they expected the G80 to be ready much earlier than it was. And they couldn't drastically change the NV4x architecture for G70 (and especially G71), when workloads started changing, as that would have taken a substantial investment of time and money that was better spent elsewhere. Their architecture wasn't that flexible there anyway; decoupling the TEX units would have been a big job, and adding an ALU might have been just as problematic for compiler reasons, and because the pipeline might become too long then. Of course, those are just a few of the reasons I can think of, and may not even be the best ones - but you get the point, I think.


Uttar

nAo
21-Nov-2006, 12:03
Another way to make good predictions is to make sure that devs fulfill your own predictions :)

Radeon600
21-Nov-2006, 12:22
It's fairly accurate to say that NVIDIA and ATI have a profit of ~$100 on the ultra-high-end chips, unless they need to price it stupidly aggressively to remain competitive. So that's $1M of pure profit for every 10K boards sold. While I agree that this isn't likely to significantly change anyone's profitability, claiming it to be inconsequential would be quite insane, too, because it's tons of profit, with high margins.

That's a rough analysis, and as I said, most of the people will wait for R600 before declaring a final winner and will upgrade according to that.


Errr, you'd be quite naive to think that either IHV does not have the proper intelligence to figure that shit out. The problem isn't that they don't know what to optimize for, but rather that they don't know how long specific trends will last and how long their specific architecture will last.


Well, then in that case, don't you think ATI's predictions had always been more accurate? :D

_xxx_
21-Nov-2006, 12:30
No and no.

nAo
21-Nov-2006, 12:35
LOL, that way you're not giving him any hope :)

Arun
21-Nov-2006, 12:37
Well, then in that case, don't you think ATI's predictions had always been more accurate? :D
I'm not sure what you're basing that on. The X1800s and X1900s are exactly what NVIDIA refused to do: major changes to their prior architecture, involving substantial development time and money. I think that even nowadays, NVIDIA has slightly fewer total engineers dedicated to GPU products; ATI was working on both R500 and R520 at the same time, while NVIDIA finished G70 about 3-4 months before either, and part of those engineers moved to the expanding G80 project, while others continued to work on G71/G72/G73.

If you look at ATI, you can claim a very similar thing happened to them in the R400/R420/R480 timeframe. While their architecture didn't do too horribly against NVIDIA's in that timeframe, they might still have benefited from a completely new design, which got cancelled. If you look at the middle of 2005, that hit them hard, as pretty much their entire product line became worthless as OEM pressure for checkboxes began rising, and they lost mindshare due to the 7800GTX's success and the X1800XT's delay. And even if the X1800XT hadn't been delayed, they wouldn't have had the RV515 and RV530 in that timeframe, so they'd have been screwed anyway.

So while I'm just slightly detailing a few specific scenarios here, I completely fail to see your point. This kind of thing happens just as much to both NVIDIA and ATI, although to differing extents and for varying timeframes. In the examples I gave, NVIDIA was hit more in the high-end and as a "future games" kind of thing, while ATI was hit more in the mid-end/low-end. I'm sure you could find historical examples of the opposite though (Radeon 9000/GeForce4 MX for a few months, for example) - but if you don't get my point yet after all this, then I'm going to seriously ponder whether you're even trying to respond with more than one-liners...


Uttar

Ailuros
21-Nov-2006, 12:55
Also Ailuros, I see your point. But chances are that ATI is going to give NV tough competition, and ATI is always aggressive in product pricing from the start.

As I said they better be, but that doesn't come without consequences either. Or do you really believe that they wouldn't prefer to make X more profit in the end?

When a competitor arrives with a product months earlier, he's not getting only a headstart in sales figures, he has also time to raise availability, price products cheaper etc etc. Yes, ATI can follow suit almost instantly, but you're not seriously suggesting that it's the exact same thing when it comes to profits, are you?

Well, then in that case, don't you think ATI's predictions had always been more accurate?

In terms of the high end segment alone lately, or other parts of the market too? :roll:

_xxx_
21-Nov-2006, 13:28
LOL, that way you're not giving him any hope :)

He's got enough hope already, obviously...

Bjorn
21-Nov-2006, 13:36
When a competitor arrives with a product months earlier, he's not getting only a headstart in sales figures, he has also time to raise availability, price products cheaper etc etc.

And the availability for the G80 seems to be rather good compared to other launches. Komplett.se has around 70 GTS and 50 GTX cards in stock currently.

pjbliverpool
21-Nov-2006, 13:41
R600 doesn't have to just beat the 8800GTX. It needs to beat its refresh which will almost certainly be out very soon after.

I'm not expecting more than a speed bump, but it could be good for another 10% of performance.

Radeon600
21-Nov-2006, 13:58
I don't think 8900GTX will make it to the market before July/August. But why do you think R600 should be powerful enough to compete with 8900GTX? ATI can also release R680. Historically I haven't seen anything like this, and it's unlikely.

Bjorn
21-Nov-2006, 14:08
So Ati is three or so months late with the R600 release (compared to the G80 at least) and yet you seem to believe that they can release a refresh at the same time as Nvidia ?

Arun
21-Nov-2006, 14:12
The R200 had to compete against the GF3 Ti500 instead of the plain GF3. The NV30 had to compete against the 9800Pro instead of the 9700Pro. The R520 would have had to compete against the 7800GTX 512MB had NVIDIA not made that product a complete joke in terms of availability. The Voodoo 5500 had to compete against the GeForce2 GTS instead of the GeForce 256 DDR. And the list goes on.

It's relatively fair to claim history teaches us something is unlikely, but it tends to be a good idea to make sure of that before doing so ;)


Uttar

Radeon600
21-Nov-2006, 15:38
So Ati is three or so months late with the R600 release (compared to the G80 at least) and yet you seem to believe that they can release a refresh at the same time as Nvidia ?

R520<->R580?

Razor1
21-Nov-2006, 15:41
The r520 was more than 3 months late... it was around 6 months if I remember, which should have been the intended launch of the r580, but then the r580 came out a few months after.

So ATi won't be going for a quick refresh: because this time their chip shouldn't be as late, they won't have the capability of a quick r580-style launch, since they would need at least a few months to get the refresh ready.

Ailuros
21-Nov-2006, 16:44
And the availability for the G80 seems to be rather good compared to other launches. Komplett.se has around 70 GTS and 50 GTX cards in stock currently.

That might suggest though something else too: adoption isn't as enthusiastic as expected.

Ailuros
21-Nov-2006, 16:50
R520<->R580?

Meaning that they'll let another high end product vanish on such short notice? Given the circumstances that surrounded R520 they didn't have any other choice, but if something like that turns out to continue itself over and over again it's rather sub-optimal for their yields.

What you're implying by the way suggests that R600 won't be as competitive as others expect it to be; there's a reason why they pushed the R580 release that close after the R520 and it's called floating point power.

If ATI is confident enough that R600 is highly competitive (which I personally believe it to be in the end) then there shouldn't be a single need for a 3 month refresh, effectively killing sales of the R600 as soon as it sets foot on shelves.

_xxx_
21-Nov-2006, 16:59
That might suggest though something else too: adoption isn't as enthusiastic as expected.

Just because there are no games out there requiring it yet. Wait till something appears that will make the DX9-gen HW run it in slow-motion, everyone will go out and get the shiny-new-whatever's-best-now.

Razor1
21-Nov-2006, 17:00
Just because there are no games out there requiring it yet. Wait till something appears that will make the DX9-gen HW run it in slow-motion, everyone will go out and get the shiny-new-whatever's-best-now.


That, and possibly the wait for Christmas sales.

dnavas
21-Nov-2006, 17:03
Even if it's only a handful of registers, it's still random access. And you need to be able to switch threads quickly.

So, it doesn't seem all that random to me, even after you split out the 16 st.units. If you have 32 fragments per st. unit, then all of your operands within a cycle are within ~3% of each other. That seems pretty spatially close to me. Additionally, while you want to be able to switch quickly, if you need to pause every cycle to wait for texture access, you're going to be limited on texture throughput anyway. It's more likely that you have at least 2-3 ALU cycles back-2-back. Without vec->scalar "unrolling", those additional cycles are likely to be accessing other parts of the same vector, so that makes for some smaller amount of temporal locality. [If the vec->scalar process includes "unrolling" in order to save temp. register usage, then there's a more complex access-path relationship which might make less sense to take advantage of.]

I've been assuming:
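#include <stdint.h>
typedef float fp32;  /* 32-bit float; member names below are placeholders */
struct FragmentReg { fp32 reg[16]; };                        /* 16 fp32 values per fragment */
struct StreamingUnitReg { struct FragmentReg frag[32]; };    /* 32 fragments per streaming unit */
struct ClusterReg { struct StreamingUnitReg unit[16]; };     /* 16 streaming units */
int32_t regAllocatedBitField;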
...but maybe there's a reason to arrange things differently? Insight appreciated. Thanks!
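
For scale, the assumed layout above can be mirrored with Python's ctypes to see how big each level would be. This is only a sketch of that guess, not a claim about how G80 actually organises its register file; the member names are placeholders, and the 1/32 share at the end is one possible reading of the "~3%" figure.

```python
import ctypes

class FragmentReg(ctypes.Structure):
    # 16 fp32 values per fragment (placeholder member name)
    _fields_ = [("reg", ctypes.c_float * 16)]

class StreamingUnitReg(ctypes.Structure):
    # 32 fragments per streaming unit
    _fields_ = [("frag", FragmentReg * 32)]

class ClusterReg(ctypes.Structure):
    # 16 streaming units
    _fields_ = [("unit", StreamingUnitReg * 16)]

fragment_bytes = ctypes.sizeof(FragmentReg)
unit_bytes = ctypes.sizeof(StreamingUnitReg)
cluster_bytes = ctypes.sizeof(ClusterReg)

# Share of one streaming unit's registers that belongs to a single fragment.
fragment_share = fragment_bytes / unit_bytes

print(fragment_bytes, unit_bytes, cluster_bytes)
print(f"one fragment = {fragment_share:.1%} of its streaming unit's registers")
```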

nAo
21-Nov-2006, 17:37
'Randomness' in this case can be addressed by trying to prefetch registers that are going to be used in the next clock cycles from unused banks, and also by reusing some operands over multiple passes, recirculating them into the pipeline without storing them in the register file.
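
A toy model of why the bank an operand lands in matters. This is purely illustrative and is not the scheme from the patent: it assumes single-ported banks that each serve one read per cycle, and a hypothetical mapping of register index modulo bank count.

```python
from collections import Counter

def fetch_cycles(register_indices, num_banks=4):
    """Cycles to read all operands if each bank serves one read per cycle.

    Assumes a made-up mapping: register i lives in bank i % num_banks.
    """
    if not register_indices:
        return 0
    reads_per_bank = Counter(r % num_banks for r in register_indices)
    # The busiest bank determines how long the whole fetch takes.
    return max(reads_per_bank.values())

conflict_free = fetch_cycles([0, 1, 2])  # three operands in three different banks
conflicting = fetch_cycles([0, 4, 8])    # all three operands map to bank 0
with_reuse = fetch_cycles([0, 4])        # third operand recirculated, not re-read

print(conflict_free, conflicting, with_reuse)
```

In this model, prefetching from idle banks or recirculating an operand both reduce how many reads pile up on one bank in the same cycle.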

Shtal
21-Nov-2006, 18:43
This thread revolves entirely around how many shader processors R600 will have and around its memory interface, completely ignoring R600's texturing capabilities and technological advancements in ROP design; I am pretty sure those make a huge difference as well. To be honest, more speculation will only lead to more confusion. R600 will be powerful enough, that's for sure. There are strong chances that R600 might be able to beat 8800GTX, given that ATI is taking more time.

The rumor so far does not “please a lot”, except maybe a little on texture capabilities. R600 has 64 complex shader units, each composed of 4 simple units: 4x64=256 simple shader units. G80 has 128 simple shader units but at about double the clock, 1350MHz. So R600 is 256shaders@800MHz and G80 is 128@1350MHz. The R600 chip design is not exactly the same as G80's, much as with the R580 chip; the R600 is supposed to be able to handle more intense and complex visual dynamics better vs. G80. Based on the R600 information right now, it can render 16 pixels @ 700 - 800MHz as opposed to G80's 24 pixels @ 575MHz. In the end 16x750 is smaller than 24x575, but not by much; don’t forget that R580 also has 16 ROPs as opposed to Nvidia G71 24 and it doesn’t make the ATI card perform lower vs. NVIDIA.
What I would rather see is an increase to 24 ROP units.
It would be nice for R600 to render 24 pixels @ 700 – 800MHz.
Also on the texture units, if they could stretch a little beyond 256shaders@800MHz, like 256shaders @ 1GHz, just to stay further ahead.
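
The comparison above is just units times clock, so here is the arithmetic spelled out. The R600 figures are the rumoured ones from this post (16 ROPs at a 750MHz midpoint, 256 simple shader units at 800MHz), not confirmed specs, and the function name is made up.

```python
def per_clock_throughput(units, clock_mhz):
    """Units x clock in MHz: millions of operations (or pixels) per second."""
    return units * clock_mhz

# Pixel output (ROPs x core clock), in Mpixels/s
r600_fill = per_clock_throughput(16, 750)  # rumoured R600 figures
g80_fill = per_clock_throughput(24, 575)   # G80 figures from the post

# Simple shader units x shader clock, in millions of scalar issues per second
r600_shader = per_clock_throughput(256, 800)   # rumoured
g80_shader = per_clock_throughput(128, 1350)

print(f"fill:   R600 {r600_fill} vs G80 {g80_fill} ({r600_fill / g80_fill:.0%})")
print(f"shader: R600 {r600_shader} vs G80 {g80_shader} ({r600_shader / g80_shader:.0%})")
```

On these raw numbers the rumoured R600 would be a bit behind on pixel output and a bit ahead on shader issue rate, which says nothing about how well either design keeps its units busy.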

trinibwoy
21-Nov-2006, 18:51
the R600 is supposed to be able to handle more intense and complex visual dynamics better vs. G80.

I'm seeing this all over the place. Where did this R600 will be better at more "intense and complex" shaders come from?

Acert93
21-Nov-2006, 19:05
On the late issue...

I know a lot of people waiting for Vista to upgrade anything. ATI will be surely late if they miss the Vista launch. And while G80 and R600 accelerate DX9 games, since D3D10 is Vista only and not available on DX9, I do wonder how "late" R600 is when the primary PR platform for new D3D10 GPUs isn't even available to mass consumers.

Missing the holiday season, especially when your competitor has hit the lucrative enthusiast market with a product that performs insanely well with current hardware, is a big loss for MS. Missing the Vista launch, even in Spring, may be even worse though.

Razor1
21-Nov-2006, 20:19
The rumor so far does not “please a lot”, except maybe a little on texture capabilities. R600 has 64 complex shader units, each composed of 4 simple units: 4x64=256 simple shader units. G80 has 128 simple shader units but at about double the clock, 1350MHz. So R600 is 256shaders@800MHz and G80 is 128@1350MHz. The R600 chip design is not exactly the same as G80's, much as with the R580 chip; the R600 is supposed to be able to handle more intense and complex visual dynamics better vs. G80. Based on the R600 information right now, it can render 16 pixels @ 700 - 800MHz as opposed to G80's 24 pixels @ 575MHz. In the end 16x750 is smaller than 24x575, but not by much; don’t forget that R580 also has 16 ROPs as opposed to Nvidia G71 24 and it doesn’t make the ATI card perform lower vs. NVIDIA.
What I would rather see is an increase to 24 ROP units.
It would be nice for R600 to render 24 pixels @ 700 – 800MHz.
Also on the texture units, if they could stretch a little beyond 256shaders@800MHz, like 256shaders @ 1GHz, just to stay further ahead.


Vector ALUs might be more complex to make, but that doesn't mean they can do complex shaders better. Actually, if you think about it, when optimizing code you tend to try to turn a complex algorithm into something simpler, because the hardware, be it a CPU, GPU or whatever, will have an easier and faster time executing it. You can look at how the scalar units work in the same way.
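
A small sketch of the utilisation argument. It assumes a simplified model where a vec4 ALU spends one issue slot per instruction no matter how many components the instruction uses (ignoring co-issue such as 3+1 or 2+2), while a scalar design spends exactly one slot per component; the instruction mix is invented for illustration.

```python
def vec4_utilisation(component_counts):
    """Fraction of vec4 lanes doing useful work, one instruction per issue slot."""
    used_lanes = sum(component_counts)
    issued_lanes = 4 * len(component_counts)
    return used_lanes / issued_lanes

def scalar_utilisation(component_counts):
    """A scalar unit issues one component per slot, so no lane is ever left idle."""
    used_slots = sum(component_counts)
    return used_slots / used_slots

# Hypothetical shader: components used by each instruction (vec4, vec3, scalar, ...)
instruction_mix = [4, 3, 1, 2, 3]

vec4_util = vec4_utilisation(instruction_mix)
scalar_util = scalar_utilisation(instruction_mix)

print(f"vec4 ALU:   {vec4_util:.0%} of lanes busy")
print(f"scalar ALU: {scalar_util:.0%} of slots busy")
```

Real vec4 hardware claws some of this back with co-issue, so treat the gap as an upper bound on the effect rather than a measurement.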

Bob
21-Nov-2006, 20:28
don’t forget that R580 also has 16 ROPs as opposed to Nvidia G71 24 and it doesn’t make the ATI card perform lower vs. NVIDIA.
G71 has 16 ROPs, but 24 fragment and texture pipes.

Shtal
21-Nov-2006, 22:54
G71 has 16 ROPs, but 24 fragment and texture pipes.

Sorry, typo.

This is what I meant: R580 has 16 ROPs and 16 textures per clock as opposed to Nvidia G71's 16 ROPs and 24 textures per clock, and it doesn’t make the ATI card perform lower vs. NVIDIA.

Shtal
22-Nov-2006, 01:22
I'm seeing this all over the place. Where did this R600 will be better at more "intense and complex" shaders come from?

I'm wondering the same thing: where did the rumor come up with this idea? I was just repeating what they want you to believe!


Vector ALUs might be more complex to make, but that doesn't mean they can do complex shaders better. Actually, if you think about it, when optimizing code you tend to try to turn a complex algorithm into something simpler, because the hardware, be it CPU or GPU or whatever, will have an easier and faster time executing it. You can look at how the scalar units work in the same way.

Exactly the point you made! Agree....

Rangers
22-Nov-2006, 01:44
That might suggest though something else too: adoption isn't as enthusiastic as expected.

Maybe because there aren't any games? Let alone games that stress high end cards.

Seriously, if ATI and Nvidia want their market to remain relevant, maybe they should invest in games development. Or more likely, hope microsoft's games for windows type stuff takes off in a big way.

Originally Posted by trinibwoy View Post
I'm seeing this all over the place. Where did this R600 will be better at more "intense and complex" shaders come from?

The Inq was one who said it. Of course, they're pulling it out of their ass. People just want to pigeonhole things, and even easier, it was true last generation, so they go with that. It sounds nice.

INKster
22-Nov-2006, 02:11
True high-end DX10 hardware adoption will only start when Vista is finally in stores.
So, starting with a G80 before the holiday season was probably a way to kick-start driver feedback and have a nice little (but fat) profit margin, as only an ultra-high-end GPU can provide. :D

The question remains, then.
Will ATI come out with a full family of R600 cheaper derivatives right away, like R5xx before it, or will they wait a little longer to flush mainstream DX9 parts from the channel/stock ?
Nvidia is probably releasing mainstream DX10 GPUs around that time, so they better be steady, or risk losing even more in that vital segment.

Shtal
22-Nov-2006, 02:53
The Inq was one who said it. Of course, they're pulling it out of their ass.

Don't say that, you might upset them....
You should be nicer with words (like "from their butt" instead), if you really want to say it.
But who knows, if you say a lie loud enough, many times enough people will believe it.

Shtal
22-Nov-2006, 03:23
My unanswered question still bothers me:
Did the original engineers who worked on the (Xbox 360) Xenos R500 core get together with the R580 engineering team and work together on the R600 project? ATI has several teams that work on different generations of designs:
A. One team works on the first revision of a core
B. Then it is passed over to a second team for finalizing the core and then the frequency
C. A third team works on the next core.

The reason I ask: if ATI combined the best of the best teams together, R600 should be incredible.

Correct me if I’m wrong; I’m no expert on exactly how the cycle goes.

Cuthalu
22-Nov-2006, 11:04
Maybe because there aren't any games? Let alone games that stress high end cards.


Put 4x TRAA or above and 16x HQAF on and you'll see several games that stress high end cards.

Arun
22-Nov-2006, 11:53
But who knows, if you say a lie loud enough, many times enough people will believe it.

That doesn't mean YOU have to believe it. No offense, but your posts are a tad repetitive and/or openly biased. There's no problem asking questions, but it can become a bit annoying if every single one of them ignores the big picture.

Things like "The reason I asked if ATI combine best of the best team together R600 should be incredible." are rather ridiculous; while the statement itself is quite acceptable considering it's likely some of ATI's best engineers were working on R500 instead of R520, I'm not sure what you can objectively conclude out of this.

The only thing that's relatively obvious at this point is that both NVIDIA and ATI have had tons more time and R&D to invest into this than in the G70/R520 refreshes, and even way more compared to that invested in G71/R580. So, yes, this will be much more of a titans' fight - but this is rather traditional for new architectures. You get used to it after you've watched the industry for a while ;)


Uttar

leoneazzurro
22-Nov-2006, 13:01
What I've heard (and I trust the source, of course) is that the R600 shader core is going to be bigger than G80's (I don't know if it's FAR bigger or only A BIT bigger, but the latter seems to be the case). Of course this does not say anything about the shading power, but we can speculate that more transistors equates to more units, especially if these units contain vector units (AFAIK building a 4D unit needs fewer transistors than four scalar units). No mention of clock speeds.
But R600 definitely seems headed to perform more work per clock cycle with respect to G80.

Ailuros
22-Nov-2006, 16:52
That doesn't say a single thing though about its efficiency :roll:

leoneazzurro
22-Nov-2006, 17:04
Of course, especially because if the ALUs work on vectors, efficiency will be reduced.
And the "more work per clock cycle" does not say anything about absolute performance, because we don't know the clock frequency of R600, and the clock frequency of the shader core on G80 will probably not be reached by R600, by a "fair" amount :) . My point is that R600 could have more than the Inquirer's projected 64 ALUs (if they are really Vec4 at all), or these ALUs could be "superscalar" and split like 2+2 or 3+1 in NV40/G70 fashion, or they could have a completely different arrangement.
Anyway, my guess is that R600 will trade (a little) efficiency for more peak "horsepower" in the shader department, even if a simple vec4 ALU doesn't seem really reasonable to me.

leoneazzurro
22-Nov-2006, 17:05
(please forgive me for my English, you can see it's not my native language) ;)

3dilettante
22-Nov-2006, 17:09
Are we sure the trade-off in going Vec4 didn't reap some rewards in other areas?

What exactly is the common mix of instructions that GPUs tend to encounter? If a given shader is heavy in vector ops, the vector units could do better.

It may be that other parts of the GPU may have benefited, if there are transistor savings to sticking with vector units.
I'm curious if the scheduling mechanism used by R600 has been influenced by the choice to use wider units.

Geo
22-Nov-2006, 17:16
Are we sure the trade-off in going Vec4 didn't reap some rewards in other areas?

What exactly is the common mix of instructions that GPUs tend to encounter? If a given shader is heavy in vector ops, the vector units could do better.

It may be that other parts of the GPU may have benefited, if there are transistor savings to sticking with vector units.
I'm curious if the scheduling mechanism used by R600 has been influenced by the choice to use wider units.


That's an interesting question. Say for the sake of argument that R600 ALUs have received more love than the Xenos ones and are now fully MIMD, or something very close to it. Why would they have still grouped them as a unit that way rather than the single unit scalars of G80? What are the tradeoffs?

trinibwoy
22-Nov-2006, 17:22
Wouldn't a combination of MIMD and scalar call for an exorbitant amount of control logic and thread handling overhead? And what happens to the caches in this case?

I'd like to see it though - it was bounced around as a possibility for g80 once the scalar stuff started to leak.

leoneazzurro
22-Nov-2006, 17:29
Are we sure the trade-off in going Vec4 didn't reap some rewards in other areas?

What exactly is the common mix of instructions that GPUs tend to encounter? If a given shader is heavy in vector ops, the vector units could do better.

It may be that other parts of the GPU may have benefited, if there are transistor savings to sticking with vector units.
I'm curious if the scheduling mechanism used by R600 has been influenced by the choice to use wider units.

This is a question I cannot answer, but if you have only vector units, you have to maximize the throughput by keeping all vector data together and fed to the ALU. If you have to do a single scalar operation, you immediately lose 75% efficiency for that operation, which is of course not good. This is why I don't trust a "simple vec4" approach (but of course I can be completely wrong on this). So it's possible that the unit can be superscalar, and capable of operating with the same instruction but with 2+2 or 3+1 and even 2+1 or 1+1 patterns. Or it's possible that the arrangement is simply not vec4 at all, or that there is a very complex scheduling and dispatching/reordering of the instructions/data on the fly.
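The 75% figure, and what a 3+1 split would buy back, can be sketched as simple lane counting. This is a toy model, assuming every issue slot costs one cycle on a 4-wide ALU; the function name and the slot lists are made up for the illustration and say nothing about how R600 really issues instructions.

```python
def utilisation(slots, alu_width=4):
    """Fraction of ALU lanes doing useful work.

    slots: list of issue slots; each slot is a list of operation widths
    packed together into that slot (hypothetical co-issue model).
    """
    for slot in slots:
        assert sum(slot) <= alu_width, "slot is wider than the ALU"
    used = sum(sum(slot) for slot in slots)
    return used / (alu_width * len(slots))

lone_scalar = utilisation([[1]])       # one scalar op on a plain vec4 unit
separate = utilisation([[3], [1]])     # vec3 and scalar issued in two slots
co_issued = utilisation([[3, 1]])      # same work packed as 3+1 in one slot

print(lone_scalar)   # 0.25, i.e. the 75% loss mentioned above
print(separate)      # 0.5
print(co_issued)     # 1.0
```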

Ailuros
22-Nov-2006, 17:44
That's an interesting question. Say for the sake of argument that R600 ALUs have received more love than the Xenos ones and are now fully MIMD, or something very close to it. Why would they have still grouped them as a unit that way rather than the single unit scalars of G80? What are the tradeoffs?

It seems to me that full MIMDs aren't an absolute panacea under all circumstances either; plus according to my understanding they aren't exactly cheap in terms of hardware either.

silent_guy
22-Nov-2006, 18:29
Not a whole lot you can do hardware-wise with three weeks, that I've ever heard.
Just a small point: one can assume that most of the functionality should be ok by now, but there are a ton of small details that must be right before final release for production, and almost all of them can usually be fixed with fairly simple metal fixes. I'm thinking about ESD and latchup problems, small fixes for yield improvement, maybe a few functional corner cases that only just popped up. If they delay by 3 weeks now, that still gives them 2 months: just enough to do a spin and produce a sufficient amount for launch, while they continue qualification on existing samples. (Not saying that's the case, but it's definitely possible.)

Shtal
22-Nov-2006, 18:32
That doesn't mean YOU have to believe it. No offense, but your posts are a tad repetitive and/or openly biased. There's no problem asking questions, but it can become a bit annoying if every single one of them ignores the big picture.

Things like "The reason I asked if ATI combine best of the best team together R600 should be incredible." are rather ridiculous; while the statement itself is quite acceptable considering it's likely some of ATI's best engineers were working on R500 instead of R520, I'm not sure what you can objectively conclude out of this.

The only thing that's relatively obvious at this point is that both NVIDIA and ATI have had tons more time and R&D to invest into this than in the G70/R520 refreshes, and even way more compared to that invested in G71/R580. So, yes, this will be much more of a titans' fight - but this is rather traditional for new architectures. You get used to it after you've watched the industry for a while ;)


Uttar

Sorry....

3dilettante
22-Nov-2006, 18:48
That's an interesting question. Say for the sake of argument that R600 ALUs have received more love than the Xenos ones and are now fully MIMD, or something very close to it. Why would they have still grouped them as a unit that way rather than the single unit scalars of G80? What are the tradeoffs?

I'm not sure if they'd go full MIMD or fully Vec4. Perhaps there's a compromise solution between being a full-width vector unit and wholly scalar.
A 2-wide vector unit could be designed with a restricted form of 2-way scalar issue, if there's some creative renaming of components, either by the compiler or some scheduling unit in the array.

I don't have enough of an understanding of the GPU architectures being compared to go beyond some theoretical ideas I have about why things might not be fully scalar.

The grouping of individual processors in arrays might not be that big a deal, if during simulations or profile studies, the designers saw that it was very common to see ~8 or 16 units working on the same task.
If 90% of the time this is the case, there is a very strong argument for not bothering to maintain the additional context and routing necessary for fully independent function.

At least I think that's what you're asking. I think ATI's spec seems to indicate the units are kept together in arrays that operate on the same program, though not necessarily in lock-step.

I could use some feedback, since I don't know enough, but it seems to me that ATI and Nvidia's GPUs differ at what level or criteria they use to dispense tasks/instructions.

Perhaps reducing the context associated with heavy threading might be a reason why ATI hasn't gone fully scalar. Since scalar units have to serialize vector operations, the thread running those scalar ops needs to last several times longer.
The context cost for fully independent scalar units is expanded both horizontally with respect to each unit operating on its own, and vertically with respect to time (double-clocking the shader units could have helped with this in G80).

Scheduling and threading contexts could be reduced, if either the units could churn through instruction packets faster, or the chip can simplify how it monitors unit or array execution status.
It could be that the command processor can rely on a less intense scheduling method, or that the freed resources can allow for thread contexts that have less meta-data and allow for more registers per thread.
On the other hand, vector operands would double, triple, or quadruple the amount of data needed at any given time, so I can't say they would be a win either.

Rumors of issues with "threading resources" for Xenos might be a motivation for an emphasis on reducing meta-context, but that's such a vague statement that it could mean that fully Vec4 units had too much context as well.

Once again I'm going very speculative, since I know squat about R600's actual implementation, and no deep detail on G80.

rwolf
22-Nov-2006, 18:57
This is a question I cannot answer, but if you have only vector units, you have to maximize the throughput by keeping all vector data together and fed to the ALU. If you have to do a single scalar operation, you immediately lose 75% efficiency for that operation, which is of course not good. This is why I don't trust a "simple vec4" approach (but of course I can be completely wrong on this). So it's possible that the unit can be superscalar, and capable of operating with the same instruction but with 2+2 or 3+1 and even 2+1 or 1+1 patterns. Or it's possible that the arrangement is simply not vec4 at all, or that there is a very complex scheduling and dispatching/reordering of the instructions/data on the fly.

I don't think that is a problem because R600 is probably [vec4 + scalar]. You can see the scalar unit in their SIMD patent applications.

psurge
22-Nov-2006, 19:01
3dilettante - but if your threads are capable of finishing faster and main memory latency remains constant, you need more of them to fully hide memory latency. So how does this reduce threading context - or am I misunderstanding what you're saying?
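The point about needing more threads when each one finishes its math faster can be put as a back-of-the-envelope formula. This is a rough sketch under a simplified model (every thread alternates a fixed block of ALU cycles with one memory fetch); the cycle counts are invented numbers, not figures for any real chip.

```python
import math

def threads_to_hide_latency(memory_latency_cycles, alu_cycles_per_fetch):
    """Threads needed so one thread always has ALU work while the rest wait.

    Simplified model: each thread runs alu_cycles_per_fetch cycles of math,
    then stalls for memory_latency_cycles on a fetch.
    """
    return 1 + math.ceil(memory_latency_cycles / alu_cycles_per_fetch)

# Hypothetical numbers: 200-cycle memory latency in both cases.
slow_threads = threads_to_hide_latency(200, 16)  # math takes 16 cycles per fetch
fast_threads = threads_to_hide_latency(200, 4)   # same math done in 4 cycles

print(slow_threads)  # 14
print(fast_threads)  # 51
```

In this model, making each thread four times faster at its math roughly quadruples the number of threads that must be resident, so any saving would have to come from each thread carrying less context.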

leoneazzurro
22-Nov-2006, 19:04
I don't think that is a problem because R600 is probably [vec4 + scalar]. You can see the scalar unit in their SIMD patent applications.

Yes, but the patent could refer to R580 or Xenos (as it's 5D +1).
But in R600 it could be very different.

3dilettante
22-Nov-2006, 19:26
3dilettante - but if your threads are capable of finishing faster and main memory latency remains constant, you need more of them to fully hide memory latency. So how does this reduce threading context - or am I misunderstanding what you're saying?

There's that, and the use of an array configuration and vector instructions to reduce the amount of data needed to track execution. By reducing the amount of work the chip does in tracking each unit, it frees up transistors and storage that can then be used to run more threads.

Since the threads are leaner, more can fit in the scheduling slots, and there could be more of them thanks to the transistors freed up from earlier.

Faster thread execution also keeps math-heavy threads from taking up slots for too long, relative to memory-limited threads. The sooner a thread that needs a lot of memory accesses can start, the better.

edit:
It's all speculative. The bottlenecks in any given system just move around, nothing eliminates them.
It could easily be that ATI stuck with a vector architecture because they misjudged the pros and cons, given what they predicted would be the target workload and process generation. They had to make a lot of choices long before they were sure they would work out.

rwolf
22-Nov-2006, 19:26
Vector ALUs might be more complex to make, but that doesn't mean they can do complex shaders better. Actually, if you think about it, when optimizing code you tend to try to turn a complex algorithm into something simpler, because the hardware, be it CPU or GPU or whatever, will have an easier and faster time executing it. You can look at how the scalar units work in the same way.

Apply that logic now to SSE and Altivec. Think about the x86 machine language instructions necessary to do the same work as SSE and therein lies your answer.

SIMD will not do single operations better, but math on sets is much more efficient.

trinibwoy
22-Nov-2006, 19:48
Apply that logic now to SSE and Altivec. Think about the x86 machine language instructions necessary to do the same work as SSE and therein lies your answer.

That's a pretty extreme analogy to what we're discussing here. You're comparing two completely different instruction sets (x86/SSE). The vector/scalar comparison is about executing the same instruction over multiple operands simultaneously vs sequentially.

Razor1
22-Nov-2006, 19:51
Apply that logic now to SSE and Altivec. Think about the x86 machine language instructions necessary to do the same work as SSE and therein lies your answer.

SIMD will not do single operations better, but math on sets is much more efficient.


That is true, but shaders tend not to use complex algorithms: because of previous hardware, and to make them scalable, they are almost always broken down into something very simple for the hardware to do fast. And that goes for new generation cards too. Unlike CPUs, GPUs are all about real-time tasks.

3dilettante
22-Nov-2006, 19:57
That's a pretty extreme analogy to what we're discussing here. You're comparing two completely different instruction sets (x86/SSE). The vector/scalar comparison is about executing the same instruction over multiple operands simultaneously vs sequentially.

That's not all that extreme.

x86 is by default scalar, SSE's SIMD instructions allow for a watered-down version of vector processing on small vectors.

If the scalar SSE instructions are used versus the SIMD ones, the same argument would still apply, though not to as great an extent because there's just not as much crap in SSE fp code as there is in x86.

trinibwoy
22-Nov-2006, 20:28
That's not all that extreme.

Isn't SSE* about more than just the vectorization of existing x86 scalar instructions?

psurge
22-Nov-2006, 20:57
well yes, IIRC it also has scalar DP instructions (with recent compilers I believe you can generate code containing no x87 cruft at all).

3dilettante
22-Nov-2006, 20:57
Isn't SSE* about more than just the vectorization of existing x86 scalar instructions?

There's a lot more than just vectorization, but I think the SSE instructions being discussed are the vector operations compared to the x86 instructions, which are all scalar.

The SIMD instructions can shave a lot of cycles off compared to code that is forced to use scalar operations.

Razor1
22-Nov-2006, 21:00
There's a lot more than just vectorization, but I think the SSE instructions being discussed are the vector operations compared to the x86 instructions, which are all scalar.

The SIMD instructions can shave a lot of cycles off compared to code that is forced to use scalar operations.


That is also true, but when you have parallel execution this can be done in one cycle in a GPU, well, as long as it's not larger than the number of free scalar ALUs.

trinibwoy
22-Nov-2006, 21:04
The SIMD instructions can shave a lot of cycles off compared to code that is forced to use scalar operations.

Yep, that's why you have to make sure that moving to a scalar architecture facilitates a higher clock and/or more units.

3dilettante
22-Nov-2006, 21:12
That is also true, but when you have parallel execution this can be done in one cycle in a GPU, well, as long as it's not larger than the number of free scalar ALUs.

There's a lot of stuff going on in the background that is eliminated or simplified with SIMD or vector ops.

For a CPU:

SIMD:

1 packed load, check dependencies, check when load completes
1 SIMD op (mark unit as busy, bypass, etc)
1 store (various checks)

Scalar
4x(1 load, check dependencies even though there are none, check when load completes)
4xop (mark each unit as busy, bypass, etc)
4xstore (various checks)

If most of the scalar work winds up looking like the second form, you could save a lot of transistors or speed up the units by ditching all those extra checks.
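The SIMD versus scalar tally above can be counted mechanically. A small sketch, assuming the only cost being compared is the number of load/op/store instructions the CPU has to issue and track; the function name is made up for the illustration.

```python
def tracked_instructions(elements, simd_width):
    """Count the loads, ops and stores needed to process `elements` values.

    Each instruction handles up to `simd_width` values, and every one of
    them needs its own dependency and completion checks.
    """
    groups = -(-elements // simd_width)  # ceiling division
    return {"loads": groups, "ops": groups, "stores": groups,
            "total": 3 * groups}

simd = tracked_instructions(4, simd_width=4)    # one packed load, op, store
scalar = tracked_instructions(4, simd_width=1)  # four of each

print(simd["total"])    # 3
print(scalar["total"])  # 12
```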

Yep, that's why you have to make sure that moving to a scalar architecture facilitates a higher clock and/or more units.

That's not the main reason for scalar units. They increase utilization, not straight-line speed. Vector units tend to use fewer transistors and instructions for the amount of calculations they do, and scalar units complicate stores and bypassing compared to consolidated signal traffic.

If the scalar work is highly data parallel and independent most of the time, it's a good argument for something non-scalar.

trinibwoy
22-Nov-2006, 21:22
That's not the main reason for scalar units. They increase utilization, not straight-line speed. Vector units tend to use fewer transistors and instructions for the amount of calculations they do, and scalar units complicate stores and bypassing compared to consolidated signal traffic.

Oh I'm not saying that increased unit count and higher clocks are motivations for going scalar. But once you decide to go scalar increased efficiency isn't going to help you much if you don't increase the number of units or clock vs a vector based architecture. I'm just agreeing with your work per cycle statement earlier.

Shtal
23-Nov-2006, 06:25
I would like to update the vote information that I did earlier.
So far people here think R600 is going to be a good chip.

15-20% faster = 36% 24 votes
About the same = 26% 17 votes
35% or more = 24% 16 votes
Slightly faster = 14% 9 votes

Total votes are: 66

The results are not that bad…. :)
http://snappoll.com/poll/147813.php

We just have to wait a little bit longer for the truth behind the R600 mystery.

There is some reporting that AMD will ensure that ATI won't be skipping its schedules any more. We all hope the target release date is January 2007, and that the mystical R600 project will be revealed without the lies that are spreading around the net.

---------------------------------------------------------------------------------
To be honest about what I said earlier, "if you say a lie loud enough, many times enough people will believe it": I meant that the Inq does that, not me. :) :)
I apologize for the misunderstanding. Sometimes when I read something I start thinking hmmm.... Interesting information. "But most likely a lie"

rwolf
23-Nov-2006, 10:08
It's in the gf8800 tech brief

http://www.nvidia.com/object/IO_37100.html

Each stream processor on a GeForce 8800 GTX operates at 1.35 GHz and supports the dual issue of a scalar MAD and a scalar MUL operation, for a total of roughly 520 gigaflops of raw shader horsepower. But raw gigaflops do not tell the whole performance story. Instruction issue is 100 percent efficient with scalar shader units, and the mixed scalar and vector shader program code will perform much better compared to vector-based GPU hardware shader units that have instruction issue limitations (such as 3+1 and 2+2).
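The "roughly 520 gigaflops" figure can be reproduced from the numbers in the quote itself, counting a MAD as two floating-point operations and a MUL as one. A quick sketch; the variable names are made up for the illustration.

```python
STREAM_PROCESSORS = 128   # GeForce 8800 GTX, per the quoted tech brief
SHADER_CLOCK_MHZ = 1350   # 1.35 GHz
FLOPS_MAD = 2             # multiply + add
FLOPS_MUL = 1             # the dual-issued MUL

# units * MHz * flops per clock gives MFLOPS; divide by 1000 for GFLOPS
peak_gflops = STREAM_PROCESSORS * SHADER_CLOCK_MHZ * (FLOPS_MAD + FLOPS_MUL) / 1000
mad_only_gflops = STREAM_PROCESSORS * SHADER_CLOCK_MHZ * FLOPS_MAD / 1000

print(peak_gflops)      # 518.4
print(mad_only_gflops)  # 345.6
```

The second number is what remains if the extra MUL is not available for general shading.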

There is a good chance ATi's r600 might have more gflops, so they probably aren't going to market the gflop side too much right now.


It must be true it came from Nvidia marketing material. :wink:

Arun
23-Nov-2006, 11:05
To be honest and very honest truth about what I earlier said about: "if you say a lie loud enough, many times enough people will believe it" I meant the Inq that they do, but not me. :) :)
I apologize for misunderstanding. - Sometimes when I read something I start thinking hmmm.... Interesting information. "But most likely a lie"

Yeah, what I meant is that you were repeating these things yourself, thus becoming part of the "If someone repeats a lie many times, I'll believe it!" category :) I was just mocking the fact you were repeating these things as if they were consensus, when I think many people here aren't fully convinced of them yet.

Right on. You tell Shtal. Nobody else on this forum is openly biased or repetitive.

Heh. If this is directly aimed at me, I don't think I'm ever going to deny I have a slight bias (the amount is relative and subjective, ofc) towards NVIDIA's GPUs, but that's historically partially because I have a better understanding of their architecture than ATI's. It's easier to get excited about things you know about, or at least think you know about, than other things you have no clue on!

There's a pretty big difference (purely IMO!) in knowing you have a bias while admitting as much, and acting as if you were perfectly objective while trying to interpret or influence others' opinions to fit your own better. The only thing that annoyed me with Shtal above is that he sometimes repeats others' words blindly without applying much if any critical thinking, and that his understanding of performance seemed a bit black-and-white.

What tells you G80 won't be 30%+ faster in some games, and R600 30%+ faster in others, for example? What makes you think R600 will be at least as fast as G80 in every single case, or that it is unlikely R600 is going to be significantly faster than 35% more than the G80, even in corner cases that fit the architecture perfectly? Or is what we're benchmarking exclusively 3DMark06? :)


Uttar

Razor1
23-Nov-2006, 11:27
It must be true it came from Nvidia marketing material. :wink:


Well, if you add up the gigaflops on a per-ALU basis we get that number anyway. Jawed was asking if nV stated the gigaflops anywhere, because at the moment it isn't using all its calculation abilities :wink: . And also nV would have marketed gflops a lot more, but as I said there is a very good chance the r600 will have more raw horsepower.

Rangers
23-Nov-2006, 12:02
There is some reporting that AMD will ensure that ATI won't be skipping its schedules any more


Oh yeah I think there's no doubt about this. We should start to see a much more aggressive ATI high end out of the partnership.

Xmas
23-Nov-2006, 12:24
So, it doesn't seem all that random to me, even after you split out the 16 st.units. If you have 32 fragments per st. unit, then all of your operands within a cycle are within ~3% of each other. That seems pretty spatially close to me. Additionally, while you want to be able to switch quickly, if you need to pause every cycle to wait for texture access, you're going to be limited on texture throughput anyway. It's more likely that you have at least 2-3 ALU cycles back-to-back. Without vec->scalar "unrolling", those additional cycles are likely to be accessing other parts of the same vector, so that makes for some smaller amount of temporal locality. [If the vec->scalar process includes "unrolling" in order to save temp. register usage, then there's a more complex access-path relationship which might make less sense to take advantage of.]
I have the feeling that maybe we're discussing different things, so I suggest going back to the initial issue:
Honestly, I'm confused by that patent, because I'm not sure exactly what problem it's trying to solve. This appears to me to be similar to multibank DRAM and/or XDR2, but I don't expect, during the normal course of graphics rendering, to require randomized access to the register file. You're always dealing with one of a handful of registers on a particular thread. I'm clearly missing something, but I'm not sure where to start looking :(
So what kind of access pattern that could be taken advantage of do you expect during the normal course of graphics rendering?

I've been assuming:
struct FragmentReg { fp32[16]; }
struct StreamingUnitReg { FragmentReg[32]; }
struct ClusterReg { StreamingUnitReg[16]; }
int32 regAllocatedBitField
...but maybe there's a reason to arrange things differently? Insight appreciated. Thanks!
You've lost me here. I don't quite understand the Cluster->StreamingUnit->Fragment grouping you're using. I'm thinking more along the lines of threads, consisting of 32 fragments or 16 vertices, that need to allocate a number of wide registers (possibly 16*fp32 or 32*fp32, one or two scalars per fragment/vertex), depending on the shader program in use.

But register allocation is an interesting problem anyway, since now the units have to be able to run different threads with different register requirements.
In the CineFX pixel pipeline all threads (quads) need the same number of temp registers. So the register file can be nicely divided into indexable chunks, and as soon as a quad is finished and leaves the pipeline, a new quad can enter, just reusing the leaving quad's register chunk.
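The fixed-chunk scheme described for CineFX can be sketched as a trivial free-list allocator. This is only an illustration of the idea as stated above (equal-sized chunks, a finished quad's chunk reused as-is); the class, its methods and the sizes are hypothetical and not taken from any NVIDIA documentation.

```python
class FixedChunkRegisterFile:
    """Register file split into equal chunks, one chunk per thread (quad)."""

    def __init__(self, total_registers, regs_per_thread):
        self.regs_per_thread = regs_per_thread
        # Every chunk starts out free; chunk i covers registers
        # [i * regs_per_thread, (i + 1) * regs_per_thread).
        self.free_chunks = list(range(total_registers // regs_per_thread))

    def allocate(self):
        """Return the base register index for a new thread, or None if full."""
        if not self.free_chunks:
            return None
        return self.free_chunks.pop() * self.regs_per_thread

    def release(self, base):
        """A thread has left the pipeline; its chunk can be reused unchanged."""
        self.free_chunks.append(base // self.regs_per_thread)


rf = FixedChunkRegisterFile(total_registers=16, regs_per_thread=4)
bases = [rf.allocate() for _ in range(4)]   # fills the register file
full = rf.allocate()                        # None: no chunk left
rf.release(8)                               # one quad finishes
reused = rf.allocate()                      # the new quad gets the same chunk
print(sorted(bases), full, reused)          # [0, 4, 8, 12] None 8
```

With threads that need different register counts, as in a unified design, the chunks are no longer interchangeable, which is what makes the allocation problem more interesting.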

_xxx_
23-Nov-2006, 14:44
Oh yeah I think there's no doubt about this. We should start to see a much more aggressive ATI high end out of the partnership.

I actually expect the high-end ATI stuff to disappear in a few years. Now where's that crystal ball when you need it... ;)

Geo
23-Nov-2006, 15:11
The "Does G80 still have quads in there somewhere?" threadlet has been moved to the G80 Arch thread, where it will be more at home.

leoneazzurro
23-Nov-2006, 18:15
I actually expect the high-end ATI stuff to disappear in a few years. Now where's that crystal ball when you need it... ;)

From a marketing perspective it makes no sense if ATI loses its "high end graphics" feeling.
Having high-end, high-performance parts can provide leverage for the low end, as we saw in the past for everything from CPU to GPU.

Shtal
23-Nov-2006, 21:13
The only thing that annoyed me with Shtal above is that he sometimes repeats others' words blindly without applying much if any critical thinking, and that his understanding of performance seemed a bit black-and-white.

Applying critical thinking the way you describe it would not, in my opinion, change a bit of the real truth behind the R600 project, and your critical thinking would not change the R600 design. I'm not saying I should not apply critical thinking the way you describe; critical thinking is very important, but at the moment, for me, it would be a waste of time. I would rather get real/true specs first before applying critical thinking. I have to get the facts on R600 first before I start talking. Sorry for my opinion.

--------------------------------------------------------------------------------------
Just because I don't share my own thoughts and only repeat what other people say does not give you the right to call me black-and-white. How would you know who I am....

Ailuros
24-Nov-2006, 04:53
From a marketing perspective it makes no sense if ATI loses its "high end graphics" feeling.
Having high-end, high-performance parts can provide leverage for the low end, as we saw in the past for everything from CPU to GPU.

I could easily point out, though, that Intel has the largest market share in total units sold, without even selling standalone graphics solutions.

Truth is there is a point for the above, but I could think of far better reasons why AMD would want to continue to address the high end market aggressively. A simple example would be the console market, in which ATI already has an excellent penetration.

Shtal
24-Nov-2006, 06:47
I could easily point out, though, that Intel has the largest market share in total units sold, without even selling standalone graphics solutions.

Truth is there is a point for the above, but I could think of far better reasons why AMD would want to continue to address the high end market aggressively. A simple example would be the console market, in which ATI already has an excellent penetration.

The way I look at it: if there is continued demand for high-end stuff, people are willing to pay a high price for video cards, and AMD still makes a profit, even a small one, then I would assume there is no reason for AMD to cut off the high end. Unless AMD decides it doesn't care.

dizietsma
24-Nov-2006, 08:33
Are people happy that the core will be roughly 400-425mm² in size and therefore a lot smaller than the G80, which is about 480mm²? If this is true, can we see ATi put price pressure on nvidia if their solution is in the ballpark for performance?

Sunrise
24-Nov-2006, 09:01
Are people happy that the core will be roughly 400-425mm² in size and therefore a lot smaller than the G80, which is about 480mm²? If this is true, can we see ATi put price pressure on nvidia if their solution is in the ballpark for performance?
For one, I don´t see why anyone would be happy about core size alone. If it´s at least the same perf/watt or perf/mm², there may be a point in that, otherwise those are all relatively uninteresting paper facts.

Also, you need to know a little more about yields/margin model, before you can come to conclusions like that. If it performs well (read: at least on par), I also don´t see any reason why ATI should put the pressure on NV.

Sell it at the highest price possible, then lower it when the time comes, which should be about Q4 for their (or NV`s next part) on 65nm, depending on the maturity of TSMC´s process targets, of course.

Jawed
24-Nov-2006, 09:20
Apart from the size of R600 I think we'll all be keenly watching out for the sizes of the lower dies.

After all "unified" is supposed to be "efficient" per unit area - so we should see some startlingly small/powerful GPUs in the $75/100/150 brackets early next year.

If we don't then it's time to start wondering about execution. Why would you delay releasing small/more-capable/power-efficient SKUs (i.e. more attractive in the market, better margins)?

The ghost of the 80nm fracas hangs over R6xx, so actually I'm expecting a fairly unhappy debut for the line-up. Even if it is technically amazing and "good value".

If AMD can't sort out mid-range R6xx from day 1 (in addition to the budget/value parts) then they'll lose a significant amount of face. We're expecting NVidia to have mid-range G8x ready for Vista release, so why not AMD? The halo effect is pretty useless if there's nothing under the halo worth buying.

Jawed

satein
24-Nov-2006, 09:32
Are people happy that the core will be roughly 400-425mm² in size and therefore a lot smaller than the G80, which is about 480mm²? If this is true, can we see ATi put price pressure on nvidia if their solution is in the ballpark for performance?

That sounds promising, as a smaller core will result in a smaller chip and thus, possibly, a smaller card design too :smile: . The only thing I am starting to wonder is whether, if ATi packs 1GB of memory on the card... the final card might end up as long as we heard.

Also, the points Jawed made are very interesting too... the size of the lower-end chips. I would be happier if they can produce them at a proper size and within the thermal envelope, so that the mobility parts don't get screwed up as in the X1k series... as it took them too long to get the mobility parts working for the X1k.

leoneazzurro
24-Nov-2006, 09:46
I could easily think though that Intel does have the largest market share as in total units sold, without even selling standalone graphics solutions.


That's true, but I think we were speaking about discrete graphics and 3D :razz: .
Intel has of course heavy penetration in the corporate and notebook market with integrated GPU.
But I think that when someone speaks about "graphics" the names in mind are ATI and Nvidia, not Intel.
And this not because R200 or 6150 chipsets :wink:
Anyway, speaking about margins, I think high-end graphics GPUs have more headroom than integrated chipsets, like in the server CPU market, so if it's profitable (or it can be profitable), why sell them ?
Fusion will, in the first place, be a substitute for integrated graphics.
But I think if AMD wanted only to have an integrated chipset with graphics, they could have chosen to buy Sis or Via instead.

Sunrise
24-Nov-2006, 10:39
That would sound promising as smaller core will result in smaller chip and thus, possibly, smaller card design too :smile:
That´s highly irrelevant. Don´t forget that e.g. G80 GTX is a fair bit longer than the GTS - and that´s not a result of the core size, but rather because of several voltage related elements that feed the core (at its official SKU speed) and its amount of fairly voltage-hungry memory. Not even talking about some of G80´s base "features" (interface) which happen to limit NV on the size of those cards, too. GDDR4 offsets some of those, but that´s a little offtopic.

R600 will also be produced on the 80nm half-node, which is another point why core size isn´t really relevant. If their yields are as good as NV´s, core size results in more dice per wafer and that´s about the only thing that could give ATI an advantage, but i´m not really expecting that. 90nm should be far more mature than 80nm and even if ATI does pull out some magic tricks (of which there aren´t too many) to produce high-class silicon, they would still be limited by other factors.

It will be far more interesting, how both handle their midrange-lineup, since with NV´s current G8X architecture the scalability should easily best G7X´s. I´m also with Jawed here, since that will also mark the point where anybody is free to judge if ATI is still able to compete, not only at the high-end, but at the very important mid-end as well. Theoretically, NV is able to scale top-to-bottom and they should also be able to ramp them up really fast. Since ATI supposedly prefers to use the 65nm node (if those plans are still accurate) for their mid-/low-end that alone could induce delays, which wouldn´t be too good, to say the least.

To come back to R600, ATI currently has (had?) the problem that they need(ed) quite a hefty cooling solution for R600 and since there is a limit on the height, they have to make it longer instead. 1Gig should also be present on their next top-of-the-line FireGL SKU.

Ailuros
24-Nov-2006, 11:18
That's true, but I think we were speaking about discrete graphics and 3D :razz: .
Intel has of course heavy penetration in the corporate and notebook market with integrated GPU.
But I think that when someone speaks about "graphics" the names in mind are ATI and Nvidia, not Intel.
And this not because R200 or 6150 chipsets :wink:
Anyway, speaking about margins, I think high-end graphics GPUs have more headroom than integrated chipsets, like in the server CPU market, so if it's profitable (or it can be profitable), why sell them ?
Fusion will, in the first place, be a substitute for integrated graphics.
But I think if AMD wanted only to have an integrated chipset with graphics, they could have chosen to buy Sis or Via instead.

That's why I used the console market as an example, for which AMD doesn't have any penetration at all at the moment. The console market usually needs potentially high-end graphics designs, and while potential customers probably also negotiate with smaller players than ATI/NVIDIA, there's a reason why they mostly concentrate on the big players. One would be a very high wealth of resources and the other experience with high-end/very complicated designs.

I'm not sure but it wouldn't surprise me in the least if AMD would be also interested to sell CPUs in that and in other markets they don't have a single foothold in.

What I was saying in my former post is that Intel is a perfect example that breaks the rule that suggests that an IHV needs high-end graphics to sell lower-end whatever. On the other hand, what if a third IHV re-entered the GPU market tomorrow and introduced highly competitive high-end GPUs, yet extremely underwhelming low-end GPUs: would they really sell as well as their competitors in the budget segment after all?

satein
24-Nov-2006, 11:29
That´s highly irrelevant. Don´t forget that e.g. G80 GTX is a fair bit longer than the GTS - and that´s not a result of the core size, but rather because of several voltage related elements that feed the core (at its official SKU speed) and its amount of fairly voltage-hungry memory. Not even talking about some of G80´s base "features" (interface) which happen to limit NV on the size of those cards, too. GDDR4 offsets some of those, but that´s a little offtopic.

R600 will also be produced on the 80nm half-node, which is another point why core size isn´t really relevant. If their yields are as good as NV´s, core size results in more dice per wafer and that´s about the only thing that could give ATI an advantage, but i´m not really expecting that. 90nm should be far more mature than 80nm and even if ATI does pull out some magic tricks (of which there aren´t too many) to produce high-class silicon, they would still be limited by other factors.

It will be far more interesting, how both handle their midrange-lineup, since with NV´s current G8X architecture the scalability should easily best G7X´s. I´m also with Jawed here, since that will also mark the point where anybody is free to judge if ATI is still able to compete, not only at the high-end, but at the very important mid-end as well. Theoretically, NV is able to scale top-to-bottom and they should also be able to ramp them up really fast. Since ATI supposedly prefers to use the 65nm node (if those plans are still accurate) for their mid-/low-end that alone could induce delays, which wouldn´t be too good, to say the least.

To come back to R600, ATI currently has (had?) the problem that they need(ed) quite a hefty cooling solution for R600 and since there is a limit on the height, they have to make it longer instead. 1Gig should also be present on their next top-of-the-line FireGL SKU.

Thank you for the kind explanation :grin:. I just forgot that if R600 really has a 512bit bus... the pin-out would need a bigger package too :oops: And there are also a lot of parameters to take into account when designing a PCB.

Anyway, I think a high-end part is still needed, as it shows the company's advancement in technology. There is also a market for the high-end part, both gaming and workstation, which I think is very important to respond to. If AMD wants to be good in the workstation and server segment, it would be nice if they could offer a complete high-end workstation system (all the way from CPU, chipset and GPU to the add-on card subsystem) on its own... and right now AMD already has FireGL in hand. I think this would be enough to persuade AMD to put more pressure on ATi to deliver a promising high-end GPU on schedule (yes, no more delays) to serve that area.

Edit: Missing quote :oops:

leoneazzurro
24-Nov-2006, 11:56
That's why I used the console market as an example, for which AMD doesn't have any penetration at all at the moment. The console market usually needs potentially high-end graphics designs, and while potential customers probably also negotiate with smaller players than ATI/NVIDIA, there's a reason why they mostly concentrate on the big players. One would be a very high wealth of resources and the other experience with high-end/very complicated designs.

I'm not sure but it wouldn't surprise me in the least if AMD would be also interested to sell CPUs in that and in other markets they don't have a single foothold in.

What I was saying in my former post is that Intel is a perfect example that breaks the rule that suggests that an IHV needs high-end graphics to sell lower-end whatever. On the other hand, what if a third IHV re-entered the GPU market tomorrow and introduced highly competitive high-end GPUs, yet extremely underwhelming low-end GPUs: would they really sell as well as their competitors in the budget segment after all?

Yes, but Intel sells these GPUs as "chipsets". And Intel has almost complete dominance in the chipset market (as its market share is over 75% and 90% of this 75% uses Intel chipsets, IMHO). I mean, Intel's focus is not graphics, it's CPUs and chipsets. History tells us that when Intel tried to compete in the discrete graphics market, they failed. So yes, they sell the most units. But with the ATI acquisition AMD has gained access to a market (discrete graphics, professional graphics) where Intel has zero market share. And I think it's foolish to give away this advantage. The same goes for the console market, as you rightly point out.

Prometheus
24-Nov-2006, 12:57
Hmmm?

http://img218.imageshack.us/img218/7868/r600kn2.th.jpg (http://img218.imageshack.us/my.php?image=r600kn2.jpg)
http://img220.imageshack.us/img220/1613/r600mkl9.th.jpg (http://img220.imageshack.us/my.php?image=r600mkl9.jpg)
http://img215.imageshack.us/img215/4365/r600nyc6.th.jpg (http://img215.imageshack.us/my.php?image=r600nyc6.jpg)
http://img217.imageshack.us/img217/3494/r600foq2.th.jpg (http://img217.imageshack.us/my.php?image=r600foq2.jpg)
http://img295.imageshack.us/img295/8572/r600siu2.th.jpg (http://img295.imageshack.us/my.php?image=r600siu2.jpg)
http://img218.imageshack.us/img218/6650/r600qra1.th.jpg (http://img218.imageshack.us/my.php?image=r600qra1.jpg)

satein
24-Nov-2006, 13:23
Hmmm?

http://img218.imageshack.us/img218/7868/r600kn2.th.jpg (http://img218.imageshack.us/my.php?image=r600kn2.jpg)
http://img220.imageshack.us/img220/1613/r600mkl9.th.jpg (http://img220.imageshack.us/my.php?image=r600mkl9.jpg)
http://img215.imageshack.us/img215/4365/r600nyc6.th.jpg (http://img215.imageshack.us/my.php?image=r600nyc6.jpg)

:shock: That is a strange core positioning!!
Does anyone know the size of the coin used to compare against the core?

Roughly estimated from measurement the size is about 238x238mm^2....

Prometheus
24-Nov-2006, 13:26
:shock: That is a strange core positioning!!
Does anyone know the size of the coin used to compare against the core?
About 2.5cm.

satein
24-Nov-2006, 13:37
About 2.5cm.

Thank you! I did measure the component around the core and get a ratio to the real one on CPU beside me...

The core size is estimated to be about 23.8x23.8 mm^2 to 24.0x24.0 mm^2 :grin: .

[The procedure: the 6-pin component measures 4.4mm on screen (on my 2405) and the R600 core measures 52.4mm (same screen).
The same 6-pin component package on a Pentium M measures 2mm (using a vernier caliper), so the ratio of the image to the real package is 2.2...
Dividing 52.4 by 2.2 gives 23.8mm.... and it looks square!!]
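
The procedure above can be sketched in a few lines of Python. This is only a restatement of the arithmetic in the post, and it assumes (as the post does) that the 6-pin component on the R600 board is the same package as the one on the Pentium M; the variable names are made up for illustration.

```python
# Values taken from the post above; variable names are illustrative only.
onscreen_component_mm = 4.4   # 6-pin component as measured on the monitor
real_component_mm = 2.0       # same package measured with a vernier caliper
onscreen_die_mm = 52.4        # R600 die edge as measured on the monitor

# Magnification of the photo as displayed on screen.
scale = onscreen_component_mm / real_component_mm

# Estimated real die edge, and the area if the die is assumed to be square.
die_edge_mm = onscreen_die_mm / scale
die_area_mm2 = die_edge_mm ** 2

print(f"scale ~ {scale:.1f}x")
print(f"die edge ~ {die_edge_mm:.1f} mm")
print(f"die area ~ {die_area_mm2:.0f} mm^2 (if square)")
```

Any error in the 2mm reference measurement is magnified by the same ratio, so the result is only as good as that one caliper reading.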

Anyway, if this is supposed to be the real core... it would be a bit smaller than the G80 core!!

Edit: typo and add some point...

Edit2: :oops: digit error...

NocturnDragon
24-Nov-2006, 13:44
About 2.5cm.

Wikipedia says 26mm of diameter for the coin.
So the chip should be around 450mm^2.

IbaneZ
24-Nov-2006, 13:46
Has someone ripped the IHS off an 8800 card yet? R600 must be a bigger chip though right?

I wonder why we always get leaked stuff on Fridays. People start drinking at work? :lol:

Arnold Beckenbauer
24-Nov-2006, 13:52
Do you remember this picture?
http://img394.imageshack.us/img394/1659/r5201iv2tm.jpg

R600 = 2xR520 = ~ 640-650 M transistors?

Geo
24-Nov-2006, 14:09
I'm trying to think of why you'd turn the core that way, and all I'm coming up with is maybe straightening/shortening memory routings on the PCB, which in turn might make PCB manufacture a bit easier and fractionally reduce power reqs. But my thumb is kinda smelly on that one, given where I just pulled it from. . . :smile:

That kind of positioning do anything for you on 512-bit?

satein
24-Nov-2006, 14:13
Has someone ripped the IHS off an 8800 card yet? R600 must be a bigger chip though right?

I wonder why we always get leaked stuff on Fridays. People start drinking at work? :lol:

:yep2:

Now I would like someone to get drunk and leak some benchmark scores :twisted:

satein
24-Nov-2006, 14:15
I'm trying to think of why you'd turn the core that way, and all I'm coming up with is maybe straightening/shortening memory routings on the PCB, which in turn might make PCB manufacture a bit easier and fractionally reduce power reqs. But my thumb is kinda smelly on that one, given where I just pulled it from. . . :smile:

That kind of positioning do anything for you on 512-bit?

I don't think so!! If you wanted that, you could just rotate the chip by 45 degrees and get the same thing! It probably relates to how the connections are made from the die to a package with so many pins...

Jawed
24-Nov-2006, 14:17
I'm trying to think of why you'd turn the core that way, and all I'm coming up with is maybe straightening/shortening memory routings on the PCB, which in turn might make PCB manufacture a bit easier and fractionally reduce power reqs. But my thumb is kinda smelly on that one, given where I just pulled it from. . . :smile:

That kind of positioning do anything for you on 512-bit?
The grid of pins on the base of the package is still going to be "square".

Dunno, really. Memory bus is certainly what I've been thinking, too, since trace-lengths like to be nice and equal.

Indeed, this layout seems to imply that pins on the bottom of the package, right in the corners, will be "a long way" from any point on the die - further than if the die was square to the package.

Now, you might argue that you can absorb some trace-length equalisation routing, in the circuit board, by putting more equalisation (than normal) into the package<->die pin-to-pad mapping.

Erm...

Jawed

satein
24-Nov-2006, 14:24
Geo, could you please notice that now we get the leaked core picture!! :cool: (Probably direct to the post)
I am now counting on how long the INQ will catch this pic and post on their website!!

Geo
24-Nov-2006, 14:37
Geo, could you please notice that now we get the leaked core picture!! :cool: (Probably direct to the post)
I am now counting on how long the INQ will catch this pic and post on their website!!

These pics aren't appearing anywhere else? I'm assuming they're originally from some chinese forum.

Prometheus
24-Nov-2006, 14:40
I've uploaded more pics in my previous post and one more here. The pics are from http://www.pcdvd.com.tw/showthread.php?t=672250&page=8&pp=10

http://img220.imageshack.us/img220/8455/r600wso2.th.jpg (http://img220.imageshack.us/my.php?image=r600wso2.jpg)

neliz
24-Nov-2006, 14:42
I'm just amazed by the polished surface, probably hinting that someone used his brain and saved all the oc'ers out there some effort in polishing the bases of their coolers.

But I can only see the rotated base as something to use for shortening the board, whereas the old chips were always vertical and some rotated on the edges of the chip the diamond setup <> would allow them to save some millimeters on the length since some components can now be placed in the available space..

wait a minute.. that last part was bs..

Kaotik
24-Nov-2006, 14:42
Has someone ripped the IHS off an 8800 card yet? R600 must be a bigger chip though right?

I wonder why we always get leaked stuff on Fridays. People start drinking at work? :lol:

http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=117&orderby=release_date&order=Order&cname=

Based on that and estimations here, G80 is bigger (even without NVIO included in the size?)

Sinistar
24-Nov-2006, 14:46
These pics aren't appearing anywhere else? I'm assuming they're originally from some chinese forum.

They are here (http://www.pcdvd.com.tw/showthread.php?t=672250&page=8&pp=10).

Geo
24-Nov-2006, 14:49
What's with "Roden" on the chip? "Rodin" (with an 'i') was R580's codename. . . R600 is supposedly Pele. . .

Razor1
24-Nov-2006, 14:52
It's a Taiwanese 10 dollar coin and it's 26 mm in diameter.

R300King!
24-Nov-2006, 14:53
I resized the die as best as I could. Here it is in comparison to the R520.
http://www.headlinerkaps.com/0000/ForumPics/R520_R600_DieShot.jpg

Prometheus
24-Nov-2006, 14:54
R600 nda expires January 20th and the core is 20.23mm X 21.17mm according to the guy that originally posted the pics.
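
For comparison with the photo-based guesses in the thread, the area implied by those reported dimensions is a one-line product. A tiny sketch, taking the 20.23mm and 21.17mm figures from the post at face value:

```python
# Reported die dimensions from the post above (unverified, per the original poster).
die_width_mm = 20.23
die_height_mm = 21.17

die_area_mm2 = die_width_mm * die_height_mm
print(f"die area ~ {die_area_mm2:.1f} mm^2")
```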

leoneazzurro
24-Nov-2006, 14:56
Do you remember this picture?
http://img394.imageshack.us/img394/1659/r5201iv2tm.jpg

R600 = 2xR520 = ~ 640-650 M transistors?

If R600 is 80 nm it can be even more. 700-720 M

{Sniping}Waste
24-Nov-2006, 14:59
I'm just amazed by the polished surface, probably hinting that someone used his brain and saved all the oc'ers out there some effort in polishing the bases of their coolers.

But I can only see the rotated base as something to use for shortening the board, whereas the old chips were always vertical and some rotated on the edges of the chip the diamond setup <> would allow them to save some millimeters on the length since some components can now be placed in the available space..

wait a minute.. that last part was bs..
Every ES ATI card I have has a nice polished core on it.

Jawed
24-Nov-2006, 15:07
I've piddled about doing various re-sizings in Photoshop and averaging.

I reckon this R600 die is 470mm2.

Assuming it's 80nm, scaling it up by ~1.27x for 90nm would make it around 595mm2. But you can't do a simple linear scaling... But if you did, that's 23% bigger than G80 at 90nm (which is missing functionality that's in the NVIO chip)...

So, there's little doubt that an 80nm G80 would be notably smaller than R600, ~81% (again, assuming the somewhat faulty linear scaling) or 382mm2.
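
The "somewhat faulty linear scaling" used above can be written out as a rough sketch. It assumes die area scales with the square of the process node, which (as the post itself says) real shrinks do not follow exactly. The G80 area of 480mm² is the approximate figure quoted earlier in the thread, not a measured value, so the percentages land close to, but not exactly on, the numbers in the post.

```python
# Naive optical-shrink model: area scales with (node ratio)^2.
# This is an idealisation; real processes do not shrink every structure equally.
def scale_area(area_mm2, from_nm, to_nm):
    return area_mm2 * (to_nm / from_nm) ** 2

r600_area_80nm = 470.0   # photo-based estimate from the post above
g80_area_90nm = 480.0    # approximate figure quoted earlier in the thread (assumption)

r600_area_90nm = scale_area(r600_area_80nm, from_nm=80, to_nm=90)
g80_area_80nm = scale_area(g80_area_90nm, from_nm=90, to_nm=80)

print(f"R600 scaled up to 90nm ~ {r600_area_90nm:.0f} mm^2")
print(f"R600@90nm vs G80@90nm  ~ {r600_area_90nm / g80_area_90nm:.2f}x")
print(f"G80 scaled down to 80nm ~ {g80_area_80nm:.0f} mm^2")
print(f"G80@80nm vs R600@80nm  ~ {g80_area_80nm / r600_area_80nm:.0%}")
```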

Jawed

Jawed
24-Nov-2006, 15:10
R600 nda expires January 20th and the core is 20.23mm X 21.17mm according to the guy that originally posted the pics.
Woah! that's way smaller than my estimate, 430mm2 :oops:

Jawed

Razor1
24-Nov-2006, 15:12
well the pics aren't an exact scale lol.

_xxx_
24-Nov-2006, 15:15
there's a reason why they concenctrate on the big players mostly. One would be a very high weatlh of resources and the other experience with high end/very complicated designs.

Actually, the biggest reason is the availability/volumes. Big players can more or less guarantee these.

_xxx_
24-Nov-2006, 15:19
So, anyone out there still believing in 512 bit bus?

Well maybe, if they use a wireless bus, or magic... :razz:

Kaotik
24-Nov-2006, 15:28
So, anyone out there still believing in 512 bit bus?

Well maybe, if they use a wireless bus, or magic... :razz:

I think someone suggested that the position/angle of the chip would suggest that?

Jawed
24-Nov-2006, 15:29
So, anyone out there still believing in 512 bit bus?
G71, at 196mm2, is the smallest die with a 256-bit bus.

So, what is it about this R600 that excludes the possibility of it being 512-bit :?:

I don't see how the picture changes anything (i.e. the 512-bit bus is still a wild rumour, one that I don't believe, even if its implications are rather natty). If the die was 250mm2 then you'd have a point, but...

EDIT: for me, by far the best argument against 512-bit is that a 65nm refresh really would be too small and magic would be needed.

2nd EDIT: and while I'm thinking about it, GDDR4's huge increase in maximum bandwidth over GDDR3 sorta implies that ATI was never aiming for 512-bit.

Jawed

_xxx_
24-Nov-2006, 15:45
So, what is it about this R600 that excludes the possibility of it being 512-bit :?:

Pincount. I can't say for sure, but I doubt there would be enough place for the number of pins required in this package.

Jawed
24-Nov-2006, 16:53
Pincount. I can't say for sure, but I doubt there would be enough place for the number of pins required in this package.
So you're saying it's the pincount on the package, not the pad count on the die that's the restriction?

Inspired by that, I've piddled around with the package size and it does seem that R600's package is ~ the same size as R580's (much less error than my earlier erroneous sizing of R600). So, yeah, fitting a 512-bit bus's pins onto that package could be a problem - though I've no idea what kind of pincount that package supports... how densely populated with pins is R580/R520's package? Anyone got a pic of the bottom?

Jawed

Xmas
24-Nov-2006, 17:19
I'm just amazed by the polished surface, probably hinting that someone used his brain and saved all the oc'ers out there some effort in polishing the bases of their coolers.

But I can only see the rotated base as something to use for shortening the board, whereas the old chips were always vertical and some rotated on the edges of the chip the diamond setup <> would allow them to save some millimeters on the length since some components can now be placed in the available space..

wait a minute.. that last part was bs..
The first part as well. ;)
Silicon wafers are that shiny in general, there's nothing special about the "polished surface". And having one surface polished doesn't mean the other doesn't require polishing for best heat transfer (so you need as little thermal grease as possible to fill the microscopic scratches and bumps).

Jawed
24-Nov-2006, 17:21
Comparing RV535 (80nm) against RV530 (90nm), both supposedly the same design (157M transistors), I get an area scaling of 87.5% for 80nm.

If I apply that to R580, which is 352mm2 at 90nm, I get 308mm2. From there to 428mm2 (R600) is 139%.

So a naive transistor count scaling for R600 puts it at 139% * 384 = 534M.

Not many, huh?
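
The naive estimate above can be reproduced step by step. This is only a sketch of the arithmetic in the post: the 87.5% shrink factor, the 352mm² and 428mm² areas and the 384M count are the post's own numbers, and the assumption that transistor count grows in proportion to die area is exactly the "naive" part.

```python
# Figures from the post above; names are illustrative.
shrink_90_to_80 = 0.875          # RV530 (90nm) -> RV535 (80nm) area ratio
r580_area_90nm_mm2 = 352.0
r580_transistors_m = 384.0       # millions
r600_area_80nm_mm2 = 428.0

# Hypothetical R580 shrunk to 80nm.
r580_area_80nm_mm2 = r580_area_90nm_mm2 * shrink_90_to_80

# Naive assumption: same transistor density as the shrunk R580.
area_ratio = r600_area_80nm_mm2 / r580_area_80nm_mm2
r600_transistors_m = area_ratio * r580_transistors_m

print(f"R580 at 80nm ~ {r580_area_80nm_mm2:.0f} mm^2")
print(f"area ratio   ~ {area_ratio:.0%}")
print(f"R600 naive   ~ {r600_transistors_m:.0f}M transistors")
```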

Jawed

Lux_
24-Nov-2006, 17:33
So a naive transistor count scaling for R600 puts it at 139% * 384 = 534M.
It's ~150M less than G80. If it's indeed true, then ATI presumably has cost advantage (per chip) for a change :wink:.

fellix
24-Nov-2006, 17:47
how densely populated with pins is R580/R520's package? Anyone got a pic of the bottom?

Shall we count? (http://img108.imageshack.us/img108/3457/davefudogv0.jpg) :lol:

Lux_
24-Nov-2006, 17:55
I got 1293 (on Friday night :oops: )

trinibwoy
24-Nov-2006, 18:02
How many pins do you need per bit for GDDR3/4 ?

Lux_
24-Nov-2006, 18:10
How many pins do you need per bit for GDDR3/4 ?
AFAIK it's not strictly defined. For example addressing: (in theory) you can use half the pins, if you address in two cycles instead of one.

fellix
24-Nov-2006, 18:13
I got 1293 (on Friday night :oops: )
Missed with just an inch -- 1265. :lol:

How many pins do you need per bit for GDDR3/4 ?
62 pins are for common I/O (data, address, clock & etc.) per GDDR3 device -- those are wired to the GPU I believe, the rest is for power/ground.
For GDDR4 subtract ~5 pins of the address line, gone for the grounding.
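
To put rough numbers on that, here is a back-of-the-envelope sketch of the GPU-side signal pins for different bus widths. It takes the 62 I/O pins per GDDR3 device (and ~5 fewer for GDDR4) from the post above, and additionally assumes 32-bit-wide memory devices with every I/O pin wired individually to the GPU. Those last two points are assumptions, and since address/command lines can in principle be shared or multiplexed, treat the result as an upper bound. Power and ground pins are not counted.

```python
# Per-device figures from the post above.
GDDR3_IO_PINS = 62
GDDR4_IO_PINS = GDDR3_IO_PINS - 5   # "~5 pins" fewer, per the post

# Assumption (not from the thread): each memory device is 32 bits wide.
DEVICE_WIDTH_BITS = 32

def memory_io_pins(bus_width_bits, io_pins_per_device):
    """Upper-bound count of GPU signal pins if nothing is shared between devices."""
    devices = bus_width_bits // DEVICE_WIDTH_BITS
    return devices * io_pins_per_device

for bus in (256, 384, 512):
    gddr3 = memory_io_pins(bus, GDDR3_IO_PINS)
    gddr4 = memory_io_pins(bus, GDDR4_IO_PINS)
    print(f"{bus}-bit: GDDR3 ~ {gddr3} pins, GDDR4 ~ {gddr4} pins")
```

Under these assumptions, going from 256-bit to 512-bit roughly doubles the memory signal pins, before any extra power and ground balls are added.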

JoshMST
24-Nov-2006, 18:15
It's ~150M less than G80. If it's indeed true, then ATI presumably has cost advantage (per chip) for a change :wink:.

Well, you are assuming that ATI will not increase the amount of cache in their R600 vs. R580. Cache is far more densely packed as you know, and the G80 has significantly more L1/L2 cache than any previous NVIDIA product. So, with the flexibility and programmability that ATI will also include in their design, you have to assume as well that they will increase the size of the caches around the chip. My feeling is that ATI will have a higher transistor count with R600 than NVIDIA does for G80, but my gut is telling me that they will be pretty close in overall count.

_xxx_
24-Nov-2006, 18:15
Shall we count? (http://img108.imageshack.us/img108/3457/davefudogv0.jpg) :lol:

No need, it's pretty obvious that it won't fit in there, at least not with this pin spacing.

_xxx_
24-Nov-2006, 18:24
For the data just one AFAIK. But you need other stuff as well, like voltage supply pins, clocks, enable pins, chip select pins, data mask etc, etc. You need roughly 50-70% more pins (out of the blue, just to point out that it's not double the count) for the memory interface compared to what we have now. Say, 300-400 more :???:

Here's the GDDR3 datasheet, you can find the pinning in there. GDDR4 has less pins, but not much less AFAICR.

http://www.samsung.com/Products/Semiconductor/GraphicsMemory/GDDR4SDRAM/512Mbit/K4U52324QE/ds_k4u52324qe_rev10.pdf

Jawed
24-Nov-2006, 19:18
Shall we count? (http://img108.imageshack.us/img108/3457/davefudogv0.jpg) :lol:
Thanks - nice pic.

Pretty clear from that that R600 can't fit a few hundred more pins in, presuming that the package really is the same size as R580's (which I'm pretty sure of).

Jawed

Jawed
24-Nov-2006, 19:48
It's ~150M less than G80. If it's indeed true, then ATI presumably has cost advantage (per chip) for a change :wink:.
We already know the die size is smaller while transistor count can't be used to compare GPUs from different families (and particularly not from different IHVs).

80nm will prolly cost more per unit area of die (than 90nm) for a fair while - it seems to be 1 year+ less mature. Having said that, I think NVidia's yield compromise (~1/4 of the ALUs/TMUs and 1/6 of the ROPs/memory channels/L2-cache - 20% of the die overall?) for GTS is costing them a hell of a lot. And there's the cost of NVIO on top.

Jawed

Ailuros
24-Nov-2006, 19:56
I hope we can soon bury that 512bit pipe dream; it'll arrive when the time is right anyway and it still doesn't seem to be just yet. Apart from that no-one has been able yet to explain what would someone do today with bandwidth in excess of 120GB/sec. Granted more bandwidth is always a good thing, but I'd think that you need a few other aspects to increase too for that kind of bandwidth to make even sense; be it fillrate, processing power or anything else.

Has anyone excluded yet the possibility that it might after all have also a 384bit bus?

Topman
24-Nov-2006, 21:50
Comparing RV535 (80nm) against RV530 (90nm), both supposedly the same design (157M transistors), I get an area scaling of 87.5% for 80nm.

If I apply that to R580, which is 352mm2 at 90nm, I get 308mm2. From there to 428mm2 (R600) is 139%.

So a naive transistor count scaling for R600 puts it at 139% * 384 = 534M.

Not many, huh?

Jawed

RV570 80nm 230mm2 330mi
--->
R600 80nm 430 mm2 =~ 617mi ???
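
The same estimate, written out: take RV570's transistor density at 80nm and apply it to the rumoured R600 area. A minimal sketch using only the numbers in the post; it assumes both chips have the same density, which ignores differences in cache, I/O and logic mix.

```python
# Figures from the post above ("mi" read as millions of transistors).
rv570_area_mm2 = 230.0
rv570_transistors_m = 330.0
r600_area_mm2 = 430.0

# Transistor density of RV570 at 80nm, in millions per mm^2.
density_m_per_mm2 = rv570_transistors_m / rv570_area_mm2

# Assumption: R600 reaches the same density on the same process.
r600_transistors_m = density_m_per_mm2 * r600_area_mm2

print(f"RV570 density ~ {density_m_per_mm2:.2f}M per mm^2")
print(f"R600 estimate ~ {r600_transistors_m:.0f}M transistors")
```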

bye

Tim Murray
24-Nov-2006, 21:54
RV570 80nm 230mm2 330mi
--->
R600 80nm 430 mm2 =~ 617mi ???

bye
Depends what kind of 80nm stepping you're using. TSMC has three, if I recall correctly, and ATI is using 80HS, which is significantly more than just a 90nm optical shrink.

Topman
24-Nov-2006, 22:02
New photos...
http://www.iamxtreme.net/andre/R600.jpg
http://www.iamxtreme.net/andre/R600_1.jpg
http://www.iamxtreme.net/andre/R600_2.jpg
http://www.iamxtreme.net/andre/R600_3.jpg


???

trinibwoy
24-Nov-2006, 22:08
Whoa that's a shitload of pins!

silent_guy
24-Nov-2006, 22:10
Thanks - nice pic.

Pretty clear from that that R600 can't fit a few hundred more pins in, presuming that the package really is the same size as R580's (which I'm pretty sure of).

Jawed

This picture by itself doesn't really prove anything: the R520 ball pitch on that PCB is large. There are BGAs of similar area with smaller pitch that can hold up to 1900 balls and probably even more. Just have a look at the foot prints of the RAMs: they have a ball pitch of 0.8mm. I'm not sure this pitch is available for the big ones, but 1.0mm is and on this PCB the pitch seems larger than that.

Razor1
24-Nov-2006, 22:10
hmm looks like Trumpsuio's pin count of 2000 is correct?

pax
24-Nov-2006, 22:11
Ugh mine eyes! Can anyone have the audacity to count the pins?

Reputator
24-Nov-2006, 22:15
Whoa that's a shitload of pins!
yeah no kidding, maybe there's still hope for a 512 interface?

silent_guy
24-Nov-2006, 22:17
hmm looks like Trumpsuio's pin count of 2000 is correct?

First estimate: 2140 pins.
What a monster... :shock:

Razor1
24-Nov-2006, 22:19
how many pins does the g80 have? I know its not a good estimate but just wondering.

trinibwoy
24-Nov-2006, 22:19
yeah no kidding, maybe there's still hope for a 512 interface?

Unless somebody has a good alternative explanation I'm at the point where I'm expecting something big in the memory arena.

Sound_Card
24-Nov-2006, 22:26
http://i21.photobucket.com/albums/b299/Genocide737/R600_3.jpg
http://i21.photobucket.com/albums/b299/Genocide737/R600_2.jpg
http://i21.photobucket.com/albums/b299/Genocide737/R600_1.jpg
http://i21.photobucket.com/albums/b299/Genocide737/R600.jpg

I counted roughly 600 additional pins, from the sides only!!! (roughly 150 on each side) I forgot to include the additional pins in the center as well, damn. :shock: My guess is 800 to 1000 new pins.


512bit???:shock:

Jawed
24-Nov-2006, 22:30
RV570 80nm 230mm2 330mi
--->
R600 80nm 430 mm2 =~ 617mi ???

bye
Good thinking.

Jawed

Jawed
24-Nov-2006, 22:32
Unless somebody has a good alternative explanation I'm at the point where I'm expecting something big in the memory arena.
Me too, that's really put the cat among the pigeons.

FUCK

So, ahem, for the 65nm refresh ...

Jawed

pax
24-Nov-2006, 22:37
Now wth does that core have that needs such monster bw?

trinibwoy
24-Nov-2006, 22:37
So, ahem, for the 65nm refresh ...

Heh, well if we're lucky the 65nm refresh die will be just as big as the 80nm one :)

Sound_Card
24-Nov-2006, 22:38
So, anyone out there still believing in 512 bit bus?

Well maybe, if they use a wireless bus, or magic... :razz:


So as you were saying??:cool:

Farhan
24-Nov-2006, 22:39
First estimate: 2140 pins.
What a monster... :shock:

I counted 2140 as well. 512bit memory bus ftw?

Jawed
24-Nov-2006, 22:44
http://i21.photobucket.com/albums/b299/Genocide737/R600_3.jpg

So, what is that other thing then? Is it their version of NVIO?

Jawed

R300King!
24-Nov-2006, 22:45
Counted 2200 pins. (Actually counted 2204 pins, but who's counting? :D It's a nicer round number.)

no-X
24-Nov-2006, 22:48
2254... maybe nVidia was correct (http://img6.picsplace.to/img6/22/512bit.png)?

Jawed
24-Nov-2006, 22:56
Heh, well if we're lucky the 65nm refresh die will be just as big as the 80nm one :)
Hey, well I'm just basking in my apparent glorified wrongness from earlier.

I really can't figure out what the refresh will be - unless there'll be no refresh and this multi-die R700 follows in short order.

Does it get easier to have a tighter pin/ball spacing on smaller packages?

Jawed

fellix
24-Nov-2006, 22:57
w00t!

512-bit ... oh, come on -- 768 is the latest groove now. :lol:

Jawed
24-Nov-2006, 22:57
2254... maybe nVidia was correct (http://img6.picsplace.to/img6/22/512bit.png)?
Damn, this is funny. It's a proper soap opera.

Jawed

vertex_shader
24-Nov-2006, 23:00
http://i21.photobucket.com/albums/b299/Genocide737/R600_3.jpg

So, what is that other thing then? Is it their version of NVIO?

Jawed

No, intel core2duo.

fellix
24-Nov-2006, 23:02
BTW, the pin layout is very similar to the R5x0 one, isn't it?

Reputator
24-Nov-2006, 23:07
No, intel core2duo.
yeah I'm sure that was a serious question....

SugarCoat
24-Nov-2006, 23:14
So, anyone out there still believing in 512 bit bus?

Well maybe, if they use a wireless bus, or magic... :razz:


:o

Still doing my free AA dance.

The angle of the core may very well have something to do with the configuration of the new PCB, which is reportedly packed and not much longer than the X1800/X1900 PCB. It's got a little....shall we say girth?

vertex_shader
24-Nov-2006, 23:14
yeah I'm sure that was a serious question....

:wink:

Jawed
24-Nov-2006, 23:19
yeah I'm sure that was a serious question....
Er, actually I was being serious, I saw the fuzzy writing on it and gave up!

But yeah, looking properly I can see it's just about legible.

Jawed

Twinkie
24-Nov-2006, 23:20
I'm thinking this card will have 1GB of VRAM most definitely, and if it's a 512-bit memory interface, then we are talking about 16 memory chips on one PCB :shock:
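
The 16-chip figure falls out if each memory chip has a 32-bit interface, which is typical for GDDR3/GDDR4 parts but is an assumption here, not something confirmed for R600. A quick sketch of the arithmetic (function names are made up for illustration):

```python
def chips_for_bus(bus_width_bits, chip_width_bits=32):
    """Number of memory chips needed to fill the bus, one chip per channel slice.
    The 32-bit per-chip width is an assumption."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a multiple of the chip width")
    return bus_width_bits // chip_width_bits

def chip_density_mbit(total_mbytes, chip_count):
    """Capacity each chip must provide, in megabits."""
    return total_mbytes * 8 / chip_count

chips = chips_for_bus(512)                 # rumoured 512-bit interface
density = chip_density_mbit(1024, chips)   # speculated 1GB of VRAM
print(f"{chips} chips of {density:.0f} Mbit each")
```

Under those assumptions, 1GB over a 512-bit bus means sixteen 512Mbit chips, i.e. 64MB per chip.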

What is the exact pin count on the G80? It would be great if someone could do a side-by-side comparison between the G80 die WITHOUT the heat spreader and the leaked shots of the R600 core. (Also a shot of the backside of the G80 core so that we can compare the number of pins and the layout.)