R500 will Rock

RussSchultz said:
Will TSMC have to do some special work for ATI to utilize fast14? Dunno. Maybe. Part of the mojo might be extra mask steps, or a tweaked process or something.

This answered (or didn't answer) my question. :D

I was asking whether special work needs to be done, and if so, how much.
 
The Xbox 2 / Xenon VPU is probably not a plain R500 but more of a custom VPU somewhere in between R500 and R600.

Actually, whoever said there is no R500 is probably right.

The PC will first be getting R520, which is less than what R500 would have been, since R520 is a souped-up R300 architecture like R420. Then, in 2006, the PC should be getting the R600 with Windows Graphics Foundation 1.x / DirectX Next / DirectX 10 and Shader Model 4.0.

R520 = Shader Model 3.0 or 3.0+

Xbox 2 / Xenon = Shader Model 3.0++

R600 = Shader Model 4.0

The Xbox 2 / Xenon VPU should definitely be beyond R520 and what is/was R500, much like the Xbox GPU (NV2A) was beyond the GeForce 3 (NV20).
 
Megadrive1988 said:
R520, which is less than what R500 would have been, since R520 is a souped-up R300 architecture like R420

so, really, the R520 will end up being LESS than what the R400 was gonna be? damn :(

pssst... there is no R600 either. Well, there is, but we will never see it. They are actually going to release the R620 which is based on the R300 ;) :LOL:
 
I love the "+" after specifications. To make it easier for people to understand: graphics chips will be either fully DX9.0 compliant or WGF(-whatever) compliant in the foreseeable future.

Megadrive1988 said:
The Xbox 2 / Xenon VPU should definitely be beyond R520 and what is/was R500, much like the Xbox GPU (NV2A) was beyond the GeForce 3 (NV20).

How do you define "beyond", exactly? If what I've seen and heard so far about R500/Xenon (under the presupposition that it's as close to reality as it can get) really made it "more" than the desktop R520, then I'd rather think that ATI is in trouble. Frankly, I consider Orton to be a lot smarter than that ;)

Console chips are entirely different beasts and the requirements are naturally different too; chances are low, IMHO, that R520 will have an on-chip framebuffer, and that's the first fundamental difference. Since both chips carry an R5xx prefix after the roadmap change, though, it's logical to assume that they belong to the same generation. In other words, SM3.0-compliant DX9.0 chips.
 
Am I off base here assuming a 200M transistor GPU that switched at several gigahertz would need a staggering amount of current to drive all those active transistors?

Think of how much current a modern GPU needs today just to switch those transistors at 400 MHz. Speed the switching up by 5x to 10x and you'd need more current for transistors identical to those used today to switch at that increased speed. If you have 1/10th the time to switch, then you need 10x the electrons arriving at the gates compared to what arrives currently (pun intended!).

Ten times the current arriving implies a lot more voltage to drive it there, and from memory heat is proportional to the square of the voltage in these cases. A chip of today forced to reach 3 GHz would surely be sucking in hundreds of amps and need a fast-flow liquid helium bath to stop it imitating a flash bulb.

To combat this I presume you need faster-switching materials that leak far less current, have much lower capacitance, and can switch at lower voltages. That's quite an ask, and I understand significant breakthroughs have been made in all these areas this year.

I presume a CPU can reach such high clock speeds today because it is far less parallel than a GPU; at peak load at any moment I believe only 5% of a CPU's instruction transistors may be in use, while for a GPU that figure can be 95% of all transistors in use. That 20 times more transistors in use means 20 times more current demand and a lot more than 20 times more heat produced. I thought heat management of all the active transistors was the key reason GPUs are clocked 10 times slower than CPUs, so they only need about 20 / 10 = twice the power, and thereby generate a manageable amount of heat.

I wouldn't expect a single chip GPU running at 1 GHz before 2007 at best, unless some really advanced materials are introduced into fab labs in the next 18 months.
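
For a rough sense of scale, here's a quick back-of-the-envelope sketch (in Python) using the textbook dynamic-power relation P ≈ α·C·V²·f. Every figure in it is invented purely for illustration, not a real chip spec:

```
# Dynamic power: P = alpha * C * V^2 * f
# alpha = fraction of transistors switching per clock (activity factor)
# C     = total switched capacitance, V = supply voltage, f = clock rate
# All numbers below are illustrative guesses, not measured chip figures.

def dynamic_power_watts(alpha, capacitance_nf, voltage, freq_hz):
    return alpha * (capacitance_nf * 1e-9) * voltage**2 * freq_hz

# A "today" GPU: high activity factor, modest clock
today = dynamic_power_watts(alpha=0.9, capacitance_nf=200, voltage=1.2, freq_hz=400e6)

# The same design pushed to 3 GHz, assuming the voltage must also rise to switch faster
pushed = dynamic_power_watts(alpha=0.9, capacitance_nf=200, voltage=1.6, freq_hz=3e9)

print(f"400 MHz @ 1.2 V: {today:.0f} W")   # ~104 W
print(f"3 GHz   @ 1.6 V: {pushed:.0f} W")  # ~1382 W - hence the flash-bulb worry
```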
 
I think the main push behind Intrinsity's technology is to give near custom cell performance at standard cell prices. From my limited understanding, the basic difference between the performance of a full custom cell vs. a standard cell is about 3 to 4 times. This is with the same amount of power applied to the custom cell. AMD and Intel do a full custom approach to their processors, and Chalnoth is right in that a large portion of each CPU is cache (which can be high speed, and highly repeatable in overall design).

So, if Intrinsity's design tools can generate dynamic logic that comes close to getting half the performance of full custom cells on TSMC's 130/110 nm process, then we can expect to see large GPUs running at 750 MHz while consuming as much power and generating as much heat as a standard-cell 130 nm low-k product running at 450 MHz. While it is not the multi-GHz increase that many are hoping for, it is still a significant jump.
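
For what it's worth, here's the arithmetic behind that estimate spelled out in Python; the 3-4x full-custom factor and the 50% capture assumption are my own rough guesses, not Intrinsity's figures:

```
# Rough clock estimate for dynamic logic, assuming it captures about half of the
# full-custom advantage over standard cells. All factors are assumptions.

standard_cell_clock_mhz = 450        # assumed 130 nm low-k standard-cell baseline
full_custom_speedup = (3.0, 4.0)     # assumed full-custom vs. standard-cell range
dynamic_logic_fraction = 0.5         # assume Fast14-style tools capture ~half of that

low  = standard_cell_clock_mhz * full_custom_speedup[0] * dynamic_logic_fraction
high = standard_cell_clock_mhz * full_custom_speedup[1] * dynamic_logic_fraction
print(f"Estimated clock range: {low:.0f} - {high:.0f} MHz")  # 675 - 900 MHz
```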
 
I think that is closer to the mark, but the only way I could see that you could run faster and produce less heat is to have either:

1) less current leakage == better materials and fab process

2) less current == a profoundly simpler design that achieves the same end using fewer transistors to do exactly the same workload, and therefore needs less current to switch them.

Point 2 is a holy grail of circuit design. It means your layout needs a revelation in designing silicon to do more with less. While possible, it requires a eureka moment or two. I have seen that happen once or twice with silicon design a long time ago, but today we are working with a multitude of complex parts that I presume a lot of really smart people are already trying to optimise. I expect a step change will only occur if someone has a breakaway insight.
 
g__day said:
Am I off base here assuming a 200M transistor GPU that switched at several gigahertz would need a staggering amount of current to drive all those active transistors?

Think of how much current a modern GPU needs today just to switch those transistors at 400 MHz.

The above indicates a significant misunderstanding of how a chip's MHz rating relates to transistor switching speed.

Transistors in nodes below 90 nm can switch at almost 1 THz (1,000 GHz), and this is the same for CPUs and GPUs, though Intel's and AMD's are faster and more highly tuned.
http://www.reed-electronics.com/electronicnews/article/CA185610?pubdate=12/10/2001

Transistors today are in the hundreds of GHz in switching speed. The transistor switching speed is only one factor in the clock frequency of a chip, and a small one.

A GPU runs at 400 MHz not because that is the speed at which its transistors switch, but because that is the clock rate at which a pipeline stage completes whatever work it is meant to do in a single clock. Each pipeline stage has many cascaded transistors (and wire delays) and completes a task of larger scope than one switch turning on or off.
That larger-scope task is typically doing some sort of calculation, or retrieving or writing some data.

g__day said:
I presume a CPU can reach such high clock speeds today because it is far less parallel than a GPU; at peak load at any moment I believe only 5% of a CPU's instruction transistors may be in use, while for a GPU that figure can be 95% of all transistors in use. That 20 times more transistors in use means 20 times more current demand and a lot more than 20 times more heat produced. I thought heat management of all the active transistors was the key reason GPUs are clocked 10 times slower than CPUs, so they only need about 20 / 10 = twice the power, and thereby generate a manageable amount of heat.

Nope, the reason is nothing of the like.

A single CPU pipeline stage does very little work compared to a GPU's. A CPU pipeline stage might add two 64-bit numbers together. A multiply will take 4 clocks (on a K8 core, for reference). An L1 cache read may take 3 clocks on a CPU.

Clearly, if all you have to do is add two numbers in a clock cycle, you can run at high frequencies.

What does a GPU typically do in one clock? Maybe a full dot product of two 4-vectors (4 multiplies and three dependent adds). Or a texture blend (a bilinear filter = a handful of adds and multiplies). Both of these tasks are far larger than a single add and should be expected to take longer, meaning a lower clock.

Quite simply, GPUs are lower clock because they are doing more work per clock. The transistors are switching at similar speeds to a CPU, but there are just more transistors cascaded together in a row in a single pipeline stage. Per clock, more transistors have to switch.
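
As a crude illustration of that point, here's a toy Python model where the clock period is just the number of logic levels in a stage's critical path times a per-gate delay (wire delay, setup time, and clock skew ignored; the delay and level counts are invented for the example):

```
# Toy model: clock period ~= logic levels in critical path * per-gate delay.
# Both the gate delay and the level counts are made-up illustrative numbers.

GATE_DELAY_PS = 25  # assumed delay through one logic level on some process

def max_clock_mhz(levels_of_logic):
    period_ps = levels_of_logic * GATE_DELAY_PS
    return 1e6 / period_ps  # 1e12 ps per second / period_ps, expressed in MHz

# CPU-style stage: just enough logic to add two numbers -> short critical path
print(f"CPU-ish stage, 15 levels : {max_clock_mhz(15):.0f} MHz")   # ~2667 MHz

# GPU-style stage: a 4-wide dot product or a bilinear blend -> much deeper path
print(f"GPU-ish stage, 100 levels: {max_clock_mhz(100):.0f} MHz")  # 400 MHz
```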

Although parallelism does play a role in the design differences between a CPU and a GPU, it does not directly affect clock speed. Otherwise, you'd see chips with half the pipelines running at twice the clock rate. But that isn't the case.
 
Scott C said:
Transistors are in the hundreds of Ghz in switching speed today. The transistor switching speed is only part of the clock frequency of a chip, and a small one.
I'm not sure that's true, as it determines how quickly a signal can propagate through the chip. Given infinite cooling, the time it takes for a signal to propagate through the longest pathway in the chip determines the maximum clockspeed of that chip. I claim that this is one reason why, even with extreme cooling, more modern processors clock higher than older ones.

What will be nice is a move to asynchronous processing, where the speed of the chip will be limited only by the average transistor switching speed and the length of the line being used to process.

What does a GPU typically do in one clock? Maybe a full dot product of two 4-vectors (4 multiplies and three dependent adds). Or a texture blend (a bilinear filter = a handful of adds and multiplies). Both of these tasks are far larger than a single add and should be expected to take longer, meaning a lower clock.
I really doubt this is true, for the simple reason that GPUs need to hide many cycles of latency to keep texture reads moving at a good clip. Also consider that the performance hit for state changes (which is, essentially, flushing the pipelines) on a GPU is huge (on the order of hundreds of cycles).

From all that I've read, it seems to be obvious that GPU pipelines are much, much longer than CPU pipelines. The lack of high clockspeeds is probably due to two primary factors:
1. More logic density, with more units active at any one time, means much higher heat dissipation.
2. Fast product cycles mean not nearly as much time is available for optimizing the various components for high-speed operation.
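
To put a toy number on the latency-hiding point above, here's a tiny Python sketch; both figures are invented for the example:

```
# Why latency hiding forces lots of in-flight work (and long pipelines/queues).
# Both numbers below are invented for illustration.

texture_latency_cycles = 200   # assumed round trip to memory for a texture fetch
pixels_issued_per_clock = 4    # assumed pixels entering the pipeline each clock

# To keep issuing every clock while fetches are outstanding, the chip must hold
# at least this many pixels in flight:
pixels_in_flight = texture_latency_cycles * pixels_issued_per_clock
print(f"Pixels in flight to cover the latency: {pixels_in_flight}")  # 800

# Flushing all of that on a state change is why the hit is "hundreds of cycles".
```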
 
I think AGP will be around for a while. I mean, consider how many people will really have PCI Express within the next year. Not the majority of customers. And as a result, the GPU makers will push for an AGP interface to meet the demands of the majority of us who still have AGP.
 
marsumane said:
I think AGP will be around for a while. I mean, consider how many people will really have PCI Express within the next year. Not the majority of customers. And as a result, the GPU makers will push for an AGP interface to meet the demands of the majority of us who still have AGP.
No. If you want a new video card, you're going to have to get PCI Express in the very near future. The #1 market for GPU manufacturers is, by far, the OEM market. The OEM market has no problem switching platforms, since OEMs specialize in selling whole PCs.
 
g__day said:
Am I off base here assuming a 200M transistor GPU that switched at several gigahertz would need a staggering amount of current to drive all those active transistors?

If you read the information on their site you will see that one of the big advantages of their technology is substantially reduced power consumption.
 
rwolf said:
If you read the information on their site you will see that one of the big advantages of their technology is substantially reduced power consumption.
That's fine to say, but I don't know. There's no way that GPUs are going to get better at power consumption than CPUs made by the likes of AMD and Intel. And since GPUs have much higher logic densities and greater usage efficiency, I just don't see GPUs approaching the frequencies of CPUs.

So, in short, I'm very dubious about those claims. They may be true in limited, synthetic circumstances, but I doubt that we'll see a change in a final product of more than about 20% or so.
 
Guys, thanks for your thoughts, but one or two points in your descriptions still aren't clear.

Scott C said:
Transistors today are in the hundreds of GHz in switching speed. The transistor switching speed is only one factor in the clock frequency of a chip, and a small one.

Are you saying some parts of the GPU run in the gigahertz range, but because of how deep the pipelines are, the chip outside the pipeline runs in the hundreds of megahertz to counter, say, the propagation time for data to get all the way through a 10-stage, 1,000 MHz pipeline?

That just seems weird to me. Inside the pipelines data runs fast through each stage, but throughput on your first data item takes a while, so you slow your entire GPU down - er, no, I doubt you address latency in that manner.

I tend to think Chalnoth might be nearer the truth - it would be limited design optimisation time / skill / IP determining how complex your design is and therefore how hot it will run. How many transistors switch (charge / discharge) each second has a direct bearing on power requirements and heat produced. Push that much power through a chip and I bet capacitance becomes a real design nightmare too.

rwolf, reading that paper it is hard to tell whether a whole chip with tens or hundreds of millions of transistors can all be switching at gigahertz rates - they specifically don't say that can happen. I am sure you can design a half adder that rockets along in speed. But I wonder if 95% of the logic circuits are active doing work all the time?
 
I think many people here are essentially correct.

Not only is the base transistor speed important, but also the overall design and usage. Guys like NV and ATI are adopting software tools to help not only speed up the design process, but also speed up the products. These designs are just plain huge, and because of this guys like ATI and NVIDIA have to rely on standard cell designs. AMD and Intel have the engineers and time to do a full custom cell design. Now, the physical and electrical properties of a full custom cell design are very well known, and when integrating all these custom cells together AMD and Intel are able to achieve very high speeds with relatively low power usage.

Now, this isn't taking into account developments in dynamic logic, and the tools (such as Intrinsity's Fast-14) to do a full dynamic logic layout. I think we might be seeing the fruits of these types of labors with the R5x0 series and possibly the NV50. Merely speculation here, but with such tools we could possibly see a 20% to 25% improvement in clockspeed and power consumption over a standard cell design on a certain process. That is not insignificant in this industry!
 