AMD: R7xx Speculation

Status
Not open for further replies.
More than transistor switch times, the limiting factors are heat, leakage current and signal propagation times.

For example, a single transistor at 55nm could switch at 10GHz, but the frequency of a chip depends on the maximum signal propagation time (i.e. the longest path). A very deeply pipelined architecture like the Pentium 4 can reach a higher frequency; in that case the thermal and leakage current limits would constrain the design first.
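To put rough numbers on the longest-path point, here is a minimal sketch. The stage delays are made up for illustration, and register overhead (setup time, clock-to-Q) is ignored, so treat it as a toy model rather than how any real chip is timed.

```python
def max_clock_ghz(stage_delays_ns):
    """Clock is bounded by the slowest stage: f_max = 1 / longest delay.

    Delays are in nanoseconds, so the result is in GHz.
    """
    return 1.0 / max(stage_delays_ns)


# Hypothetical stage delays, for illustration only.
short_pipeline = [2.0, 1.5, 1.8]                 # few stages, a lot of logic in each
deep_pipeline = [0.5, 0.4, 0.5, 0.45, 0.5, 0.4]  # similar work cut into shorter stages

print(max_clock_ghz(short_pipeline))  # limited by the 2.0 ns stage
print(max_clock_ghz(deep_pipeline))   # limited by the 0.5 ns stages
```

The deep pipeline clocks four times higher in this toy model, which is exactly the situation where heat and leakage become the first limit instead of path length.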

So to summarize, it is not so simple. Good designs make smart compromises to achieve optimal performance.

edit:

But I would think clock propagation presents its own set of problems too. Syncing a very high clock across the chip should get more difficult with larger chips, right?

Clock propagation is a problem, but not the most significant. The biggest problem today is related to leakage current more than anything else and it is not a function of frequency. It has to do with the number of transistors and the process technology.

Edit2:

To add to my previous point, increasing frequency does indeed increase power and heat. What we are seeing today is that, because leakage current accounts for a significant part of the heat and power budget, one cannot raise the frequency as much as in the past.
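A back-of-the-envelope way to see this is the usual first-order model: dynamic power is roughly activity x capacitance x V^2 x f, and leakage is a frequency-independent term on top. The sketch below uses that model with invented numbers (budget, capacitance, activity factor and leakage values are all assumptions), and it holds voltage fixed, which real designs usually cannot do when pushing frequency.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, freq_hz):
    """First-order switching power: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def max_freq_hz(budget_w, leakage_w, activity, capacitance_f, voltage_v):
    """Highest frequency that fits in the budget once leakage is subtracted."""
    headroom_w = budget_w - leakage_w
    return headroom_w / (activity * capacitance_f * voltage_v ** 2)


# All values below are hypothetical, chosen only to show the trend.
BUDGET_W = 100.0
ACTIVITY = 0.2
CAPACITANCE_F = 250e-9
VOLTAGE_V = 1.0

for leakage_w in (10.0, 40.0):
    f = max_freq_hz(BUDGET_W, leakage_w, ACTIVITY, CAPACITANCE_F, VOLTAGE_V)
    print(f"leakage {leakage_w:.0f} W -> max clock about {f / 1e9:.2f} GHz")
```

With the same power budget, the higher-leakage case leaves less room for switching power, so the achievable clock drops even though leakage itself does not depend on frequency.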
 
But I would think clock propagation presents its own set of problems too. Syncing a very high clock across the chip should get more difficult with larger chips, right?
Not really. It's not all that important that the clock edges at distant points on the die go up and down at the same time (within reason): there will never be direct communication between those 2 points anyway.

When you're dealing with macro blocks, each of them has its own clock tree, which is then balanced by a master clock network on the top level. Some trimming may be required (by manually inserting or removing delay cells) to balance things up, but it's not rocket science.

Even for FFs that are close to each other, it's not always a big deal: as long as the clock period leaves enough margin on the path between the output of one FF and the input of another, it's easy to survive clock skew. In fact, for the last 7 years or so, when asked to, placement and CTS tools will explicitly create skew to help the design make timing, so-called 'useful skew'. If one side of a FF easily makes timing and the other side does not, the tool will shift the clock forward or backward to make the easy part tighter and loosen up the difficult side.
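A small worked example of useful skew, as a sketch only: three flops A -> B -> C, with made-up delays. Setup slack is taken as period + (capture clock delay - launch clock delay) - path delay, with clock-to-Q and setup time lumped into the path delay. Hold checks are ignored here, although a real tool has to respect them.

```python
def setup_slack(period, launch_clk_delay, capture_clk_delay, path_delay):
    """Setup slack in ns; negative means the path fails timing."""
    return period + (capture_clk_delay - launch_clk_delay) - path_delay


# Hypothetical numbers, in nanoseconds.
PERIOD = 1.0
PATH_A_TO_B = 1.2  # the difficult side of flop B
PATH_B_TO_C = 0.6  # the easy side of flop B


def report(clk_delay_b):
    """Slack on both sides of B when B's clock arrives clk_delay_b late."""
    slack_ab = setup_slack(PERIOD, 0.0, clk_delay_b, PATH_A_TO_B)
    slack_bc = setup_slack(PERIOD, clk_delay_b, 0.0, PATH_B_TO_C)
    return slack_ab, slack_bc


for delay in (0.0, 0.3):
    ab, bc = report(delay)
    print(f"B clock delayed {delay:.1f} ns: slack A->B {ab:.2f}, slack B->C {bc:.2f}")
```

With a balanced clock, A->B misses timing while B->C has slack to spare. Delaying B's clock by 0.3 ns borrows time from the easy side, and both paths pass.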
 
R680 was said to be 775MHz, but it turned out to be 825MHz.
You've got that part bass ackwards ;) : most 3870 cards without "OC" in their name shipped at 775 MHz and still seem to be there. But the X2 is at 825, so this makes up for it a bit. Anyway...
 
You've got that part bass ackwards ;) : most 3870 cards without "OC" in their name shipped at 775 MHz and still seem to be there. But the X2 is at 825, so this makes up for it a bit. Anyway...

He's correct, actually. R680 = X2. RV670 is the 775MHz part.
 
In order to believe that information you've got to believe that AMD has performed a full custom logic overhaul of the entire architecture. Count me as very sceptical. The "shader domain" clocking is particularly untrustworthy in my view.

Jawed
 
In order to believe that information you've got to believe that AMD has performed a full custom logic overhaul of the entire architecture.

You think S3 did this for low-cost Chrome 400 series? :???:

I would think that a special, lower delta between ALUs and TMUs should be possible without implementing this domain through custom logic.

In addition, they have tackled the "MSAA shader resolve problem" (the resolve is normally performed by the ROPs. On R600 the ALUs take this over; whether this is intentional or a defect in the ROPs is still unknown) and increased its efficiency. According to David Nalasco, this is achieved through a higher clock rate of the "shader core".
http://www.computerbase.de/artikel/...0_rv670/3/#abschnitt_technik_im_detail_part_1
They say David Nalasco mentioned that shader resolve for MSAA was improved through a higher frequency for the shader cores.
 
A shader domain would be the chip clock x a constant multiplier, e.g. 1.2, but they say that both GPUs have a +200MHz shader clock. So it sounds different.
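To make the difference concrete, here is a quick sketch comparing the two schemes. The core clocks are hypothetical placeholders, not the rumoured specs; only the 1.2 multiplier and the +200MHz offset come from the discussion above.

```python
def shader_clock_multiplier(core_mhz, multiplier):
    """Shader clock derived as core clock times a constant multiplier."""
    return core_mhz * multiplier


def shader_clock_offset(core_mhz, offset_mhz):
    """Shader clock derived as core clock plus a fixed offset."""
    return core_mhz + offset_mhz


# Hypothetical core clocks for two GPUs, for illustration only.
core_clocks_mhz = [700.0, 850.0]

for core in core_clocks_mhz:
    by_mult = shader_clock_multiplier(core, 1.2)
    by_offset = shader_clock_offset(core, 200.0)
    print(
        f"core {core:.0f} MHz: x1.2 -> {by_mult:.0f} MHz (delta {by_mult - core:.0f}), "
        f"+200 -> {by_offset:.0f} MHz (ratio {by_offset / core:.3f})"
    )
```

With a constant multiplier the MHz delta grows with the core clock. A fixed +200MHz on two different core clocks means two different ratios, which is not what a simple multiplier-based domain would give.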

Besides, afaik these specs are complete BS
 
You think S3 did this for low-cost Chrome 400 series? :???:

I would think that a special, lower delta between ALUs and TMUs should be possible without implementing this domain through custom logic.
I'm not trying to suggest that the clocking domains require custom logic. Instead I'm saying that an ALU-specific clock is extremely unlikely.

Separately, I'm saying that, in my opinion, custom logic for the entire chip would be required to stuff 32 TUs into a die that's < 30% bigger than RV670.

---

I'm dubious about a clock domain for the ALUs because of the use of the register file. Both ALUs and TUs appear to work directly with the register file which makes me sceptical about the timing of reads and writes if ALUs and TUs are clocked independently.

Bear in mind that the ALUs are given 3 clocks to fetch all their operands per batch from the register file (4.7.4 of the R600 ISA). I think this implies that the 4th clock is given over to the TUs. This then implies that ALUs and TUs must use the register file synchronously. In that case there's no way the architecture could support independent clocks for ALUs and TUs.

Jawed
 
Fast14? :D Sorry I just couldn't resist!
Separately, I'm saying that, in my opinion, custom logic for the entire chip would be required to stuff 32 TUs into a die that's < 30% bigger than RV670.
But the die size is possibly the least reliable part of all current rumours; why are you and so many others taking that as a starting point?
Anyway, personally, I'm expecting 40 TMUs... :devilish:
 
http://www.computerbase.de/artikel/...0_rv670/3/#abschnitt_technik_im_detail_part_1
They say David Nalasco mentioned that shader resolve for MSAA was improved through a higher frequency for the shader cores.

As was already pointed out in 3DCF, this is basically true, since the shader cores of all HD3870s run at 775 MHz or more, which is, after all, a higher shader frequency than in R600.
Dave "Tabasco" Nalasco does not explicitly mention a separate clock domain. Besides - that would also show in shader fillrate benchmarks.


A shader domain would be the chip clock x a constant multiplier, e.g. 1.2, but they say that both GPUs have a +200MHz shader clock. So it sounds different.

Besides, afaik these specs are complete BS
Yep, they're just camping at 3dc forums converting those floatingly rumoresque specs into a table.
 
No idea - 32 is so boring though!
[note: this is a caricature of R7xx Rumours. Stop believing everything you read and randomly combining unreliable rumours please? ;)]
 
Separately, I'm saying that, in my opinion, custom logic for the entire chip would be required to stuff 32 TUs into a die that's < 30% bigger than RV670.

Kombatant said it could be or it could not be... lol. This was a while back on rage3d.

So yes, I think the die size is just as unreliable as anything else.
 
But the die size is possibly the least reliable part of all current rumours; why are you and so many others taking that as a starting point?
The picture of a cooler that turned up a while back seems like a good indication.

Jawed
 
Kombatant said it could be or it could not be... lol. This was a while back on rage3d.

So yes, I think the die size is just as unreliable as anything else.
Die size is a function of manufacturing process, number of transistors and transistor density, and AMD seems to be positioning the upcoming chip as a value part (hence the "V" in RV770). So we should be reasonable about what we can expect from a chip on the same, relatively mature manufacturing process (which already enables a very high transistor density on AMD's parts), designed for roughly the same market segment as RV670.
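As a rough sanity check on that reasoning, here is a sketch of the arithmetic. The RV670 figures (about 666 million transistors on about 192 mm^2) are the commonly quoted numbers and are treated here as assumptions. The 30% ceiling is the rumoured die size limit mentioned earlier in the thread, and keeping the same density on the same process is a simplification.

```python
def die_area_mm2(transistors_millions, density_m_per_mm2):
    """Die area implied by a transistor count at a given density."""
    return transistors_millions / density_m_per_mm2


# Assumptions: commonly quoted RV670 figures, not taken from this thread.
RV670_TRANSISTORS_M = 666.0
RV670_AREA_MM2 = 192.0

# Transistor density in millions of transistors per mm^2.
density = RV670_TRANSISTORS_M / RV670_AREA_MM2

# Rumoured ceiling from the thread: a die less than 30% bigger than RV670.
max_area = RV670_AREA_MM2 * 1.30
max_transistors = max_area * density

print(f"density about {density:.2f} M transistors per mm^2")
print(f"area ceiling about {max_area:.0f} mm^2")
print(f"transistor budget at the same density about {max_transistors:.0f} M")
```

Under those assumptions the budget grows by the same 30% as the area, so anything beyond that has to come from denser layout rather than from the process.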
 