The LAST R600 Rumours & Speculation Thread

Status
Not open for further replies.
Thanks for that. Single port register file up to 32K bits. Gulp, not very much at all.

---

The other big question with register files seems to be to do with clock rates. CPUs seem quite happy with multi-GHz fetches. How does that work? Does that enforce some kind of round-robin (staggered) fetching, e.g. of nibbles, in order to attain the required fetch rate?

Or are CPU register files typically so small that they can be implemented directly as flip flops like you implied earlier?

Assuming that there's ~256KB of register file per "bank" or some other portion of the SIMD array (e.g. one quarter) could the memory support ~1GHz fetches? Or is the speed going to tail-off much faster?
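
To put the two sizes mentioned above side by side, here is a rough back-of-the-envelope sketch. It assumes "32K bits" means 32 × 1024 bits and "256KB" means 256 × 1024 bytes; the names are made up for illustration, and the count says nothing about what clock rate such an array could sustain.

```python
# Back-of-the-envelope sizing. Both figures come from the post above;
# reading "K" as 1024 is an assumption.
MACRO_BITS = 32 * 1024      # "up to 32K bits" per single-port register file
BANK_BYTES = 256 * 1024     # "~256KB of register file per bank"


def macros_needed(bank_bytes, macro_bits):
    """Number of macros of macro_bits needed to hold bank_bytes, rounded up."""
    bank_bits = bank_bytes * 8
    return -(-bank_bits // macro_bits)  # ceiling division


if __name__ == "__main__":
    print(macros_needed(BANK_BYTES, MACRO_BITS))
```

Under those assumptions a 256KB bank would have to be assembled from dozens of such macros, which is why the 32K bit limit looks small.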

Jawed
 
Thanks for that. Single port register file up to 32K bits. Gulp, not very much at all.
There's a difference between a physical register file and a logical one. I'm not entirely sure how a physical register file differs architecturally from an SRAM: it doesn't matter much for the designer because they often have a similar interface and behavior and they are used interchangeably.
So assuming your SRAM is fast enough, you can use that one even if its logical function is a register file.

Or are CPU register files typically so small that they can be implemented directly as flip flops like you implied earlier?
Yes, they are most likely implemented as flip flops or latches, but placed in a regular grid. A memory operates in 2 phases for a read: when the clock is high, you precharge a certain capacitive node that is shared among a bit column. During the low phase, this capacitive node is either discharged or not, depending on the value of the bit cell of a specific row. (This is different than a DRAM cell, where the discharge destroys the contents of the bit cell itself.)
This is not the case for a flip flop based register file, which has a driving stage for each individual bit. (As a result, their bit densities are way lower than for a memory.)
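
The two-phase read described above can be sketched as a toy behavioural model. This is only an illustration, not a circuit simulation: the class and method names are hypothetical, and the polarity (a stored 0 discharges the shared node) is an assumption chosen for simplicity.

```python
# Toy model of a precharged-bitline read for one bit column.
# Names and polarity are assumptions made for this sketch.
class BitColumn:
    def __init__(self, cells):
        self.cells = list(cells)  # stored bits, one per row
        self.bitline = 0          # shared capacitive node for the column

    def precharge(self):
        # Clock high: charge the shared node regardless of the stored data.
        self.bitline = 1

    def evaluate(self, row):
        # Clock low: the selected cell either discharges the node or leaves it.
        # Assumption: a stored 0 pulls the node low, a stored 1 leaves it high.
        if self.cells[row] == 0:
            self.bitline = 0
        return self.bitline

    def read(self, row):
        self.precharge()
        # The read does not modify self.cells, unlike a destructive DRAM read.
        return self.evaluate(row)


if __name__ == "__main__":
    column = BitColumn([1, 0, 1, 1])
    print([column.read(r) for r in range(4)])
```

Every cell in the column shares one node and one sense path, which is where the density advantage over a flip flop with its own driving stage per bit comes from.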

Assuming that there's ~256KB of register file per "bank" or some other portion of the SIMD array (e.g. one quarter) could the memory support ~1GHz fetches? Or is the speed going to tail-off much faster?
Don't know.
 
The other big question with register files seems to be to do with clock rates. CPUs seem quite happy with multi-GHz fetches. How does that work? Does that enforce some kind of round-robin (staggered) fetching, e.g. of nibbles, in order to attain the required fetch rate?
Modern CPUs can typically access multiple 128-bit registers in a single clock cycle, as far as I know.
Or are CPU register files typically so small that they can be implemented directly as flip flops like you implied earlier?
They are small (~128 registers), but they don't normally use standard static CMOS flip-flops. Dynamic registers store their data in a capacitor (really just a short wire) and are much faster. They consume more power though. And there are lots of other variations for register design. Anyway, GPU and CPU design is still worlds apart.

I can highly recommend Digital Integrated Circuits - A Design Perspective...
 
They are small (~128 registers), but they don't normally use standard static CMOS flip-flops. Dynamic registers store their data in a capacitor (really just a short wire) and are much faster. They consume more power though. And there are lots of other variations for register design. Anyway, GPU and CPU design is still worlds apart.
This is true for the inner pipeline FFs that are guaranteed to be rewritten each clock cycle, but I doubt that this is the case for register files: you need some way to keep the values intact even if there's no activity and that's not going to happen with dynamic logic, so a 6T feedback structure will be necessary.

Obviously, they won't use standard cell based FFs, but some optimized variation of them.

That said, I'm looking now at the schematics of the Alpha 21264 register files: they have a 6T storage cell and are using the precharge-read method to get the contents of a row. 7 years ago, so this may be a bit dated. It shouldn't be too hard to find a paper that describes what Intel and AMD are doing.
 
There's a difference between a physical register file and a logical one. I'm not entirely sure how a physical register file differs architecturally from an SRAM: it doesn't matter much for the designer because they often have a similar interface and behavior and they are used interchangeably.
So assuming your SRAM is fast enough, you can use that one even if its logical function is a register file.
Hmm, now I'm thinking that the kind of cell is very much determined by typical access modes. e.g. I'm thinking that perhaps DRAM type cells are more suited to bursts, successive rows, say, while SRAM is better for random accesses.

So, erm, older styles of GPU register files, such as G7x, might utilise DRAM, where the address is setup once every few hundred clocks and then the register file iterates through several hundred rows to feed the pipeline, one row at a time.

Still it's fascinating to contemplate the scale and speed of G80's register file and puzzling to think what R600 does, if it's not a truly scalar ALU like G80 (as I suspect is the case).

Jawed
 
Hmm, now I'm thinking that the kind of cell is very much determined by typical access modes. e.g. I'm thinking that perhaps DRAM type cells are more suited to bursts, successive rows, say, while SRAM is better for random accesses.
Yes, DRAM is better for accesses within the same row, but that is using a different definition than a row for an SRAM! In an SRAM, a row is one or 2 or 4 words. In a DRAM, a row is 256 or 512 words.
A column in an SRAM is a single bit of a word. In DRAM, it's a single word of a row. ;)
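
To illustrate how much the two definitions of a row differ, here is a small sketch that counts how often a sequential burst moves to a new row. The function names are hypothetical, and the words-per-row values are simply picked from the ranges quoted above (4 for the SRAM case, 512 for the DRAM case).

```python
# Sketch: split a flat word address into (row, word within row) and count
# how often a stream of accesses crosses into a different row.
def split_address(word_addr, words_per_row):
    """Return (row, word_within_row) for a flat word address."""
    return divmod(word_addr, words_per_row)


def row_changes(addresses, words_per_row):
    """Count how many consecutive access pairs land in different rows."""
    rows = [split_address(a, words_per_row)[0] for a in addresses]
    return sum(1 for prev, cur in zip(rows, rows[1:]) if prev != cur)


if __name__ == "__main__":
    burst = range(1024)  # 1024 sequential word accesses
    print("SRAM-style rows (4 words):", row_changes(burst, 4))
    print("DRAM-style rows (512 words):", row_changes(burst, 512))
```

With the small row, almost every few accesses select a new row, while with the large row a long burst stays inside one row, which is the sense in which DRAM favours accesses within the same row.
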
So, erm, older styles of GPU register files, such as G7x, might utilise DRAM, where the address is setup once every few hundred clocks and then the register file iterates through several hundred rows to feed the pipeline, one row at a time.
I find that hard to believe. DRAM consumes a lot of power, requires a lot of additional refresh circuitry, is much slower and usually requires additional processing steps (to reduce leakage.) All that doesn't sound very compatible with GPU requirements.
 
March 8th-9th looks like launch day!

Sounds right. The last rumors had parts going out to editors in late Feb, and I think a hard launch around CeBIT, so that sounds exactly right for an NDA lift.

So 46-47 days, or almost 7 weeks. Could be thought of as around six weeks if you want to make it seem shorter :). Since you can kind of disregard any partial weeks in my thinking, aka the extra 3-4 days beyond six weeks.


IIRC G70 launched June 22 and R520 not until Oct 8 that year, about 3.5 months. This delay would then be November 8 (G80) to March 8, about 4 months. So ATI slipped half a month.
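
The gaps quoted above can be checked with a few lines of date arithmetic. The post gives only months and days, so the years below (2005 for G70/R520, 2006 to 2007 for G80/R600) are an assumption, and the dates themselves are the poster's recollection plus a rumoured launch day.

```python
from datetime import date

# Dates as recalled in the post; the years are an assumption.
g70 = date(2005, 6, 22)
r520 = date(2005, 10, 8)
g80 = date(2006, 11, 8)
r600_rumoured = date(2007, 3, 8)


def gap_days(first, second):
    """Whole days between two launch dates."""
    return (second - first).days


r520_gap = gap_days(g70, r520)            # G70 -> R520
r600_gap = gap_days(g80, r600_rumoured)   # G80 -> rumoured R600

if __name__ == "__main__":
    print(r520_gap, r600_gap, r600_gap - r520_gap)
```

On these dates the second gap is roughly a dozen days longer than the first, which is consistent with "about half a month".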
 
Big sigh.

I've been hunkering down and just biting my lip waiting for R600 to show, or at the very damn least, release the specs...but alas no. Many times over the past few months I've had more than enough money to buy an 8800 or two, but I again just bit my lip and said a big 'NO', because I like my apples red. Now that it seems like it's still a good few months away, it's just killing me...I have the upgrade bug, it NEEDS nourishment. NOW! :cry:

Just...anything. Specs, features, tidbits, anything. Just no more copy/pasta INQ crap or NDA shenanigans. My head's going to explode :p
 
1. G80 restricted range constraints, the original plan has changed. It is responsible for

2 to seize the time and site. G81 .... how the legends say, it should be a good product, But I do not expect

3. R600 has been the follow-up version of the noisy ring for the other drums, seemingly unable to resist NV. But they do not reasonably NV card that will be beyond everyone's expectations. Please do not use your thinking to measure NV strategy, so wait, the third quarter of 0907, also will play, storm approaching. R600 rumored version of the A15 (According to the Ministry of certain ATi market is greatly surname 2GHz/2GB million) and R680. time to time appeared to have a program that was really excellent.
 
This is true for the inner pipeline FFs that are guaranteed to be rewritten each clock cycle, but I doubt that this is the case for register files: you need some way to keep the values intact even if there's no activity and that's not going to happen with dynamic logic, so a 6T feedback structure will be necessary.

Obviously, they won't use standard cell based FFs, but some optimized variation of them.

That said, I'm looking now at the schematics of the Alpha 21264 register files: they have a 6T storage cell and are using the precharge-read method to get the contents of a row. 7 years ago, so this may be a bit dated. It shouldn't be too hard to find a paper that describes what Intel and AMD are doing.
http://www.research.ibm.com/journal/rd/475/oklobdzija.pdf

Many different fancy designs. Pulse latches FTW!
 
1. G80 restricted range constraints, the original plan has changed. It is responsible for

2 to seize the time and site. G81 .... how the legends say, it should be a good product, But I do not expect

3. R600 has been the follow-up version of the noisy ring for the other drums, seemingly unable to resist NV. But they do not reasonably NV card that will be beyond everyone's expectations. Please do not use your thinking to measure NV strategy, so wait, the third quarter of 0907, also will play, storm approaching. R600 rumored version of the A15 (According to the Ministry of certain ATi market is greatly surname 2GHz/2GB million) and R680. time to time appeared to have a program that was really excellent.

*chris tucker voice*

what the hell did you just say
 
*chris tucker voice*

what the hell did you just say

Allow me to try to translate.

He says the G80 has clock restraints that will limit its future performance capabilities (OC versions for example).

He doesn't think the G81 will be all that great and says ATI doesn't expect it to be a dramatic (G71ish) answer to the R600.

He also says ATI will have a refreshed respun 80nm R600 launched in September with 2GB of memory with either a 2GHz memory clock or a 2GHz internal processor clock (I think the latter) followed closely by the 65nm refresh.
 
90nm R600? Must be a typo SugarCoat since I've never seen a refresh to date that goes from a smaller to a larger manufacturing process.

As for the supposed extravagant clock frequencies even on 65nm: yeahrightsureok LOL :D
 
There seems to be a very strong assumption by several people in this thread that R600 must be substantially superior to G80, simply because it is coming out 6 months later. This assumes that ATI has always intended to launch R600 6 months after G80. I don't think that's necessarily true.

If you go back in time a bit most of the rumours about the R600 release date suggested that R600 and G80 would be out almost simultaneously. Rumoured R600 dates then slipped from November to January, then February, and now March/April.

I think it's very likely that ATI intended R600 to be a direct, immediate rival to G80, but then simply didn't manage to get it out the door. Maybe they were unlucky compared with Nvidia and needed one more respin than G80 did? I don't know. But, either way, I think we are in a situation not unlike what happened with R520 (although not quite as bad!): Nvidia and ATI parts were supposed to launch at the same time, but the ATI one experienced long delays and the Nvidia one didn't. (This is the reason there were only about 3 months between R520 and R580 - they were being worked on by separate teams, and R580 came out more or less on time).

I may well be wrong; but I can't see any reason to assume that ATI did intend there to be such a long gap between G80 and R600. And I think it is therefore very rash to assume that R600 must necessarily be a "next generation after G80" product; it's at least as likely to be the same generation with directly comparable performance.


Latest rumors suggest that R600 will completely crush G80 and even be superior to a rumored 80nm G80 refresh. (We are talking +40% on average compared to G80 in regular settings 1600x1200 *cough* - AA and AF maxed)
 
It's just as likely to be AMD trying to make Nvidia aim lower with G81.

I will forever doubt this kind of stuff. These companies have spies no doubt so no mis-information campaign is likely to be effective. Also, how exactly does one aim lower? There are upper limits to the G8x architecture on a given process node...it would make sense to just aim for those limits and give yourself the flexibility to throttle things back based on the hand the competition plays.

With all those extra transistors, a lower process node, all that memory bandwidth and the 512bit memory controller, there's got to be something more in R600 that we don't yet know about.

Yeah, definitely. Although I'm not putting too much into the transistor count because there have to be many ways to approach a DX10 solution and some of them may be more transistor efficient than others. We're pretty much agreed that ATI's approach is significantly different to Nvidia's so that in itself could account for the transistor difference. But yeah, I'm still expecting something .....
 
Latest rumors suggest that R600 will completely crush G80 and even be superior to a rumored 80nm G80 refresh. (We are talking +40% on average compared to G80 in regular settings 1600x1200 *cough* - AA and AF maxed)
No offense, but random bullshit and/or guesswork does not qualify as rumours... :) I assume you are referring to the level505 numbers? heh.


Uttar
 