Torrenza on die

Techno+

Regular
hi guys

If you haven't heard of it, 'Torrenza on die' is based on the same concept as Torrenza, except that the coprocessors are placed on the CPU die itself. Now, ignoring all other types of coprocessors except GPUs, I have two questions.

1) Will the GPU run at the same speed as the CPU?
2) Do you think it will be able to compete with the high-end gfx market?

I believe it will. If AMD develops cores similar to the SPEs in Cell, they could use them to work with the GPU while also handling physics etc.; for example, these cores could be used for vertex and texture calculations. A second reason is that if Intel acquires Nvidia, at some point there would be no independent company left to design gfx cards, since both ATI and Nvidia would be used by AMD and Intel for on-die gfx processing. I am just expressing my views.
 
If the GPU logic has anywhere near the complexity of today's mid- to low-range GPUs, running synchronously with CPU clocks is impossible.
 
If the GPU logic has anywhere near the complexity of today's mid- to low-range GPUs, running synchronously with CPU clocks is impossible.
Why exactly? Just use a whole lot more stages. And instead of needing 16 pipelines, 4 would be plenty at 2+ GHz.
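
To put some rough numbers on that (my assumptions: one operation per pipeline per clock, and ~600 MHz for the wide design; memory traffic and utilisation are ignored):

# Hypothetical raw-throughput comparison: narrow-and-fast vs. wide-and-slow.
wide_slow = 16 * 0.6e9     # 16 pipelines at ~600 MHz -> 9.6e9 ops/s
narrow_fast = 4 * 2.4e9    # 4 pipelines at 2.4 GHz   -> 9.6e9 ops/s
print(narrow_fast / wide_slow)   # ~1.0, i.e. roughly the same raw throughput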
 
If the GPU logic has anywhere near the complexity of today's mid- to low-range GPUs, running synchronously with CPU clocks is impossible.
That makes absolutely no sense at all.

A modern CPU is much more complicated than a GPU. The GPU is a lot *wider*, which is why it has more logic.
 
That makes absolutely no sense at all.

A modern CPU is much more complicated than a GPU. The GPU is a lot *wider*, which is why it has more logic.

Yeah, I always thought it was the fast turn-around time of GPUs that necessitates not-so-optimal standard-cell-based layouts and very little manual tweaking.
 
Yes, but that turn-around time is finally starting to slow (at least wrt capability)... performance is trickier because the long-term implications probably can't be fully understood right now.
 
I think it could go either way really, depending on how much time and money AMD puts into the effort.

Getting a GPU up to 2+ GHz is not a trivial task, but there is no reason why it can't be done given enough time, money, and man hours. Current GPUs use a lot of design automation based on standard cells, but this is not optimal for clock speed. With a fully custom design (or as near as possible), clock speed can be significantly improved, along with power usage and heat generation.

Now, AMD has two ways that they could go...

The first is to port an ATI design to their SOI process (assuming 45 nm here) and use a clock divider so that the CPU portion runs at full speed (let's say 3 GHz) and the GPU portion runs at 500 MHz to 600 MHz. This would be the simplest way to do it, and the quickest to market, but suboptimal in terms of overall performance vs. die space. By making the GPU portion slower, they can theoretically make it more complex (wider) to get better performance in different applications.
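
As a trivial sketch of the divider idea (hypothetical clocks, just picking integer ratios off the CPU clock):

# Hypothetical integer clock dividers off a 3 GHz CPU core clock.
cpu_clock = 3.0e9
for divider in (5, 6):
    gpu_clock = cpu_clock / divider
    print(divider, gpu_clock / 1e6, "MHz")   # /5 -> 600 MHz, /6 -> 500 MHz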

The second way is to do a full-custom GPU design, keeping it as simple as possible but with a clock speed equal to the CPU portion. So, let's say this smaller unit has 16 shader units; because it runs at 5X the speed of a larger 48-shader-unit design, it actually has very good performance compared to the larger yet slower GPU.
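
Rough arithmetic for that trade-off, assuming one operation per shader per clock (texturing, memory and utilisation will move these numbers around):

# Back-of-the-envelope shader throughput using the example unit counts and clocks above.
small_fast = 16 * 3.0e9    # 16 shaders at 3 GHz   -> 48e9 ops/s
large_slow = 48 * 0.6e9    # 48 shaders at 600 MHz -> 28.8e9 ops/s
print(small_fast / large_slow)   # ~1.67x in favour of the smaller full-custom part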

At this point I can see it going either way. There are pros and cons to each method, but it all depends on how much time AMD wants to spend on it, and how much performance/die space they are willing to sacrifice.
 
That makes absolutely no sense at all.

Only if one doesn't know anything about IC design.

A modern CPU is much more complicated than a GPU.

Complexity vis-à-vis GPU/CPU is kind of irrelevant here. The P4 Prescott is WILDLY complex in its design, but clocks very high because it is designed that way, i.e. 30-something pipeline stages and a LOT of logic (I've heard >50%) to resolve pipeline hazards and prefetch misses.

It is partly _because_ the GPU pipeline is relatively simple that it doesn't scale as well clock-speed-wise. It may be that the operations just aren't granular enough to be broken off into smaller stages. The delay of the longest stage is what sets the maximum clock frequency, f = 1/T.
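
For concreteness, with made-up stage delays:

# Clock ceiling set by the slowest pipeline stage: f_max = 1 / T_longest.
stage_delays_ps = [350, 420, 500, 380]     # hypothetical per-stage delays in picoseconds
f_max_ghz = 1.0 / (max(stage_delays_ps) * 1e-12) / 1e9
print(f_max_ghz)                           # longest stage 500 ps -> 2.0 GHz ceiling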
 
It is partly _because_ the GPU pipeline is relatively simple that it doesn't scale as well clock-speed-wise. It may be that the operations just aren't granular enough to be broken off into smaller stages. The delay of the longest stage is what sets the maximum clock frequency, f = 1/T.
Unlike what is the case with CPUs, it is not particularly hard to design a GPU pipeline that is not chock full of dependencies back and forth across pipeline stages; once you get rid of such dependencies, chopping it up into the requisite number of stages to reach a given clock speed target is really quite straightforward. Without such dependencies, there really aren't very many operations that cannot be partitioned into successively smaller stages until you get down to the 1-gate-delay level.

The main problem with making a 2+ GHz GPU is not so much the timing budget as power consumption. The registers that form the actual pipeline stages burn a lot of power, so by e.g. doubling the max clock speed of your GPU design (which requires a little more than a doubling of the number of pipeline stages), you will very approximately double the amount of heat produced per operation.
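
A toy model of that register cost (my assumption: register power scales with register count times frequency, ignoring voltage changes and the combinational logic):

# Toy model: pipeline-register dynamic power ~ register count * clock frequency.
def register_power(n_registers, clock_hz, k=1.0):
    return k * n_registers * clock_hz

base = register_power(10, 1.0e9)       # 10 stage-register banks at 1 GHz
doubled = register_power(22, 2.0e9)    # "a little more than double" the stages, 2x the clock
print(doubled / base)                  # 4.4x register power for 2x throughput -> ~2.2x heat per op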
 
no

no


Aaron Spink
speaking for myself inc.

I like those answers. LOL

And the original questions are too generic; you have to consider many variables here. For all we know, it could be a whole different approach compared to current GPUs by the time it comes to market. And as others have already said, we also need to solve the memory bandwidth problem first.
 
Somehow I don't think they are terribly worried about memory bandwidth. Considering that an AM2-based processor doesn't particularly utilize that much memory bandwidth by itself, there is plenty left over. I also don't think that AMD is really aiming these products at users looking to play the latest 2008 games at 1920 x 1200 and above with high levels of AA. DDR3 will relieve this situation some, but again I don't think the designers, or the particular marketplace these products are aimed at, are really going to "need" 80 GB/sec of bandwidth.
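
Some rough context, using numbers I'm assuming here (dual-channel DDR2-800 on AM2, and a crude colour-only frame-buffer estimate that ignores textures, Z and AA):

# Rough comparison: available system memory bandwidth vs. basic frame-buffer traffic.
ddr2_800_dual = 2 * 800e6 * 8                    # dual-channel DDR2-800: 2 * 800 MT/s * 8 bytes = 12.8 GB/s
frame_traffic = 1920 * 1200 * 4 * 60 * 4         # 32-bit colour, 60 fps, ~4x overdraw/read-write factor
print(ddr2_800_dual / 1e9, frame_traffic / 1e9)  # ~12.8 GB/s available vs ~2.2 GB/s for colour alone

High resolutions with lots of AA and heavy texture traffic are what blow that second number up, and that's exactly the market segment I don't think these parts are aimed at.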
 
Do you think they could use some kind of edited GPU core with the CPU?

I kind of think so; if not, it will be by 2010. For example, in Intel's Terascale demonstration they mentioned that they will be able to place special-purpose cores in place of the normal compute mini-cores. What I think will happen is that they will develop a GPU whose shader cores, TMUs, etc. replace a specific number of those compute cores.
 