Old 21-Mar-2012, 16:30   #26
Malo

Quote:
Originally Posted by Davros
When being asked for a video card recommendation, no one's ever asked how big the GPU is.
This is increasingly frustrating here. This is hardly the place where the layman is a factor in determining how/why aspects of a GPU are better/worse. If it were, then every time a new GPU was released, all you would need is a thread asking whether it's faster in Farmville. This is Beyond 3D.
Old 22-Mar-2012, 02:42   #27
rpg.314

Quote:
Originally Posted by Davros
From whose viewpoint?
From the architecture's point of view.
Quote:
if cards containing both chips are the same price, then chip A wins
From a consumer's point of view.
Old 22-Mar-2012, 02:52   #28
rpg.314

Quote:
Originally Posted by Albuquerque
But power has obvious offshoots that make sense -- more power means more heat, means more cost to operate, means more regulation circuitry to operate correctly. Die size doesn't have any intrinsic limits, except an absolute ceiling on size. Yeah, OK, so a comparatively "big" chip is going to have a higher initial cost than a "small" chip, but when we're talking about really big ticket prices, the example I gave -- a $9 final package difference on a $599 MSRP card -- is essentially rounding error in the final cost to the consumer. I understand that the bill of materials is going to see it as a higher percentage impact, but even at that price level, is the BOM going to see a $9 hike as anything larger than 10% at most? I don't know; this actually is a question that I have...
Yes, bigger dies cost more -- non-linearly more. And the bigger you get, the steeper the climb. It's even steeper on a brand-new process like 28nm.
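
To put rough numbers on that, here's a back-of-the-envelope sketch using the classic Poisson yield model. The wafer cost and defect density below are invented for illustration, not real 28nm figures:

Code:
import math

WAFER_COST = 5000.0      # assumed cost of one 300 mm wafer, in dollars
DEFECT_DENSITY = 0.25    # assumed defects per cm^2 (highish, new process)
WAFER_DIAMETER = 300.0   # mm

def dies_per_wafer(die_area_mm2):
    # Gross wafer area divided by die area, minus an edge-loss term.
    r = WAFER_DIAMETER / 2.0
    return int(math.pi * r * r / die_area_mm2
               - math.pi * WAFER_DIAMETER / math.sqrt(2.0 * die_area_mm2))

def poisson_yield(die_area_mm2):
    # Poisson defect model: fraction of defect-free dies = exp(-A * D0).
    return math.exp(-(die_area_mm2 / 100.0) * DEFECT_DENSITY)

for area in (100, 200, 400):
    good_dies = dies_per_wafer(area) * poisson_yield(area)
    print(f"{area:4d} mm^2: yield {poisson_yield(area):6.1%}, "
          f"cost per good die ${WAFER_COST / good_dies:6.2f}")

Under those made-up numbers, doubling the die from 100 mm^2 to 200 mm^2 roughly triples the cost per good die, and 400 mm^2 is nearly 10x -- the climb steepens exactly as described.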
Quote:
I get the basic thought that you're conveying, but that assumes "all else is equal" -- and it never is. What performance metric is 20% better? Is it floating-point operations? Integer operations? Texture filtering? Raster ops? What if you're trying to tell me that floating-point ops are the part that only gained 20%, but it turns out that they're purposefully limited by the vendor (i.e., what we already have today)?
Those are synthetic microbenchmarks.

They are useful for architectural exploration, but not for evaluation by any sane measure. Like silent_guy pointed out, you have to consider entire applications.
Quote:
The problem is exacerbated by the fact that, in your example, A is still 20% faster than B. In absolute terms, A wins, regardless of die size. When you declare B the winner by virtue only of a smaller die, you have now placed some direct importance on the size of the die -- but why does this matter?
It matters because some (myself included) care about architecture. Deeply. In this example, if everything else like power, features, price etc. were equal, as a consumer I might still end up buying chip B.
Quote:
If the argument is price, then we would have to assume that B is half the cost of A -- but that isn't going to be true. What happens if a 100% larger die only costs 10% more to make? Does that make A the winner? What happens if the half-sized die needs two extra PCB layers and requires eight memory chips rather than four to fill out the required bus width? Is the half-sized die still a winner?
PCB costs are rounding error. As for memory chips, you have that the other way around: smaller dies usually have narrower interfaces. Even so, die costs are probably the biggest contributor to the BOM. As for a 2x bigger die being only 10% more expensive to make, that's almost impossible at realistic sizes on a cutting-edge process.

Quote:
Die size, as part of an entire video card, still seems utterly meaningless. There are dozens if not hundreds of other things that will affect price and performance; the BOM isn't the die and an HDMI connector. Even if it were, every possible performance metric would NOT be exactly 20% different between the two.
Performance differences lie in the eye of the beholder.
Old 23-Mar-2012, 03:47   #29
dkanter

Quote:
Originally Posted by Albuquerque
Alright, fair enough. I didn't realize that physical layout had come to the point where density is basically constant -- in relative terms. My ignorance of that factor precipitated my inability to wrap my head around why die size matters in performance terms.

I get it now, or at least more than I did four hours ago. Thank you for all the replies and the time you spent explaining it in a way that I could understand!
Physical design is not nearly that simple or uniform, but it is abstracted.

As Silent Guy alluded to earlier, different transistors can have leakage that varies by around 10-100X. The drive strength (i.e. speed) probably varies by around 2-8X. The size of the transistor is a fairly important variable as well, and can directly influence those other two factors. E.g. larger transistors leak less. So there is always a trade-off between power, area and performance.

The reason that you care about perf/mm2 or perf/W is that it captures that trade-off and normalizes for different factors.

If you have a GPU that is 1.2X faster, but is 8X larger, it's not a very efficient design. It's the best design for people who need the maximum performance...but for anyone who cares about cost it's going to be a worse choice.

The same goes for power.


That being said, as a consumer...you don't really care directly about perf/mm2. However, you care about a number of factors (e.g. power, cost, performance, board size, etc.) that are all profoundly influenced by perf/mm2 and perf/W.

So studying perf/mm2 and perf/W is valuable, because it can give you insights about the things you do care about.
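
A toy illustration of that normalization, with two hypothetical chips (all numbers invented, loosely echoing the 1.2x-faster-but-8x-larger example):

Code:
# Two hypothetical GPUs: A is 1.2x faster than B but 8x larger.
chips = {
    "A": {"perf": 120.0, "area_mm2": 800.0, "power_w": 300.0},  # assumed
    "B": {"perf": 100.0, "area_mm2": 100.0, "power_w": 150.0},  # assumed
}

for name, c in chips.items():
    print(f"Chip {name}: perf/mm^2 = {c['perf'] / c['area_mm2']:.2f}, "
          f"perf/W = {c['perf'] / c['power_w']:.2f}")

# A wins on absolute performance (120 vs. 100), but under these
# assumptions B is ~6.7x better in perf/mm^2 and ~1.7x better in perf/W.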

DK
Old 23-Mar-2012, 06:13   #30
3dilettante

One thing I've been mulling over concerning perf/mm2 is that top-end comparisons usually work because we compare designs on mostly comparable processes, or their direct predecessors.
With the rising costs and lengthening delays in making node transitions, and the lack of guarantees that making a transition is beneficial, perf/mm2 analysis may become more complicated.

AMD made a conscious decision prior to the 32nm cancellation that its lower-end chips would remain on 40nm, because of the cost argument.
28nm has not yet given us the volumes or perf/$ many have expected.
AMD may be pressured to lower prices or roll out some kind of OC edition, but supply limitations may push out the crossover point where any downward trend in demand crosses its supply.

Slides were shown with chip design costs more than doubling from 45nm to 22nm; I'm not certain whether that includes escalating mask costs.
If 2.5D and 3D integration come to pass at wide scale, we may be comparing designs whose perf/mm2 per layer is inferior, but whose cost/mm2 and cost per design is competitive.

While it is generally not considered elegant or efficient to throw a lot more silicon at something, it doesn't look so bad if--and I am not claiming this is true yet--you can achieve the same end result while pocketing some cash savings.

When silicon scaling does hit a wall, we may have to start comparing perf/mm3 or perf/elements used in the recipe as designers and fabs work their way around any impediments.
Old 23-Mar-2012, 12:11   #31
upnorthsox

Quote:
Originally Posted by 3dilettante
One thing I've been mulling over concerning perf/mm2 is that top-end comparisons usually work because we compare designs on mostly comparable processes, or their direct predecessors.
With the rising costs and lengthening delays in making node transitions, and the lack of guarantees that making a transition is beneficial, perf/mm2 analysis may become more complicated.

AMD made a conscious decision prior to the 32nm cancellation that its lower-end chips would remain on 40nm, because of the cost argument.
28nm has not yet given us the volumes or perf/$ many have expected.
AMD may be pressured to lower prices or roll out some kind of OC edition, but supply limitations may push out the crossover point where any downward trend in demand crosses its supply.

Slides were shown with chip design costs more than doubling from 45nm to 22nm; I'm not certain whether that includes escalating mask costs. If 2.5D and 3D integration come to pass at wide scale, we may be comparing designs whose perf/mm2 per layer is inferior, but whose cost/mm2 and cost per design is competitive.

While it is generally not considered elegant or efficient to throw a lot more silicon at something, it doesn't look so bad if--and I am not claiming this is true yet--you can achieve the same end result while pocketing some cash savings.

When silicon scaling does hit a wall, we may have to start comparing perf/mm3 or perf/elements used in the recipe as designers and fabs work their way around any impediments.
I was just looking at this one the other day:

Old 23-Mar-2012, 19:45   #32
3dilettante

The top end of those design cost ranges could show a $400 million difference, so even if a next-node design is better than an oversized current-node device, is it $400 million better?
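
One hedged way to frame that question: how many chips would the next-node design have to ship before an assumed per-die manufacturing saving pays back the extra design cost? The per-chip savings below are invented:

Code:
EXTRA_DESIGN_COST = 400e6  # dollars; the slide's top-end difference

# Assumed per-chip manufacturing saving from moving to the next node.
for saving_per_chip in (5.0, 10.0, 20.0):
    breakeven_units = EXTRA_DESIGN_COST / saving_per_chip
    print(f"${saving_per_chip:5.2f}/chip saved -> break-even at "
          f"{breakeven_units / 1e6:5.0f} million units")

At an assumed $10/chip saving, you'd need 40 million units just to break even on design cost -- more than many individual GPU designs ever ship.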

In terms of GPUs, interposers and die stacking may be something to watch out for, even if no radical new design changes take place in the graphics architecture.

Let's posit that 2.5D integration and a wide stacked GDDR standard became feasible in the time frame of that slide (just to have some numbers to play with). A 300-400 mm2 redesigned GPU on the next node with the usual PCB mounting might face off against a 28nm dual gpu (say Pitcairn++, or Tahiti++ if you're into that kind of thing) with stacked DRAM.

In this manufactured scenario, the next node might not win. The interposer might make certain things possible that weren't before, such as a very high-bandwidth link between GPUs.
Old 24-Mar-2012, 05:09   #33
rpg.314

Even then, there would be an order-of-magnitude throughput and latency penalty for going off-die. Graphics architectures will have to change significantly to scale with that.
Old 25-Mar-2012, 00:48   #34
3dilettante

The latencies in question are of the same order of magnitude as a memory access. An access to the second die will require a second hit to the memory controller or associated L2 slice, then possibly a trip to a DRAM module. It may be 2-5x the latency, but optimizations can be made so that accesses to the other GPU bypass the rather significant buffering and queueing related to high utilization of GDDR on the first die.

The latency of NUMA for CPUs is within 1.5-2x of a local hit, and that is considered close enough to treat as a single pool.
GPUs can tolerate much higher latency; what they could not obtain was the raw bandwidth.
In terms of throughput, a link would have the same pitch and signalling rates as available to the on-interposer DRAM.
Various streamout scenarios on single GPUs already contend with similar round trips to memory.
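
To see why bandwidth rather than latency is the crux, Little's law gives the amount of data that must be in flight to sustain a given bandwidth at a given latency. The bandwidth and latencies below are illustrative, not any particular GPU's:

Code:
# Little's law: data in flight = bandwidth x latency.
# Conveniently, 1 GB/s x 1 ns = 1 byte.
BANDWIDTH_GBPS = 256.0  # assumed bandwidth target for the link

def bytes_in_flight(bandwidth_gbps, latency_ns):
    return bandwidth_gbps * latency_ns  # bytes

for latency_ns in (400.0, 800.0, 2000.0):  # assumed local vs. cross-die
    kib = bytes_in_flight(BANDWIDTH_GBPS, latency_ns) / 1024.0
    print(f"{latency_ns:6.0f} ns at {BANDWIDTH_GBPS:.0f} GB/s -> "
          f"{kib:6.1f} KiB must be in flight")

Even several times the latency only means a few hundred KiB more outstanding requests -- buffering a latency-tolerant GPU can afford -- whereas missing raw bandwidth cannot be hidden at all.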
Old 25-Mar-2012, 02:59   #35
rpg.314

I wasn't speaking of DRAM latency, which GPUs are good at hiding. I was speaking of the latency of going off-die to the second GPU die. A sort-last renderer *might* not be a good fit for that.

But perhaps it motivates them to switch to sort-middle.
Old 25-Mar-2012, 10:05   #36
CarstenS

Quote:
Originally Posted by dkanter
As Silent Guy alluded to earlier, different transistors can have leakage that varies by around 10-100X. The drive strength (i.e. speed) probably varies by around 2-8X. The size of the transistor is a fairly important variable as well, and can directly influence those other two factors. E.g. larger transistors leak less. So there is always a trade-off between power, area and performance.
So, all other factors being equal, making your transistors (or the most critical transistors, in this case) a bit larger on purpose could make them less leaky? Is that (one of the things) you're implying above? For example, forgoing the absolutely smallest overall die size in order to achieve more consistent results across your yielded chips?
Old 25-Mar-2012, 16:12   #37
silent_guy

Quote:
Originally Posted by CarstenS
So, all other factors being equal, making your transistors (or the most critical transistors, in this case) a bit larger on purpose could make them less leaky? Is that (one of the things) you're implying above? For example, forgoing the absolutely smallest overall die size in order to achieve more consistent results across your yielded chips?
I must admit that I wasn't aware of this. It's not something you get exposed to in the standard-cell world: you simply have a ton of different cells with a matrix of different speeds and different driving strengths. How this is implemented inside the cell is something you never look at.
Old 25-Mar-2012, 17:00   #38
CarstenS

So... that's a "no" to my question?
Old 26-Mar-2012, 15:53   #39
3dilettante

There appear to be density trade-offs made at least implicitly for the sake of mitigating variation, meaning that the transistors being used could be smaller but default to larger sizes.

Intel's SRAM cell sizes are often larger than those offered by foundry processes such as TSMC's.
However, when it comes to designing something manufacturable, Intel's actual products do tend to have cell sizes that match the advertised size.
One of the reasons is to provide enough margin for a mass-produced product's memory to meet its reliability and power requirements at the desired voltages.
Old 30-Mar-2012, 05:06   #40
dkanter

Quote:
Originally Posted by silent_guy
I must admit that I wasn't aware of this. It's not something you get exposed to in the standard-cell world: you simply have a ton of different cells with a matrix of different speeds and different driving strengths. How this is implemented inside the cell is something you never look at.
Yes. If you look at presentations, you'll see a number of discussions of "long Leffective" devices. Those are slightly longer transistors that have lower leakage.

DK
Old 30-Mar-2012, 06:57   #41
vking

Typically, how we control leakage is:

(a) we design for the target frequencies we want to hit
(b) we replace the cells in paths with timing slack with low-leakage cells

Obviously the process is iterative (in terms of achieving a balance between leakage and performance, but that discussion is for another day). In step (b) above, cells with equivalent drive strength but lower leakage tend to be larger than their leakier counterparts (but you all knew that already).
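
A minimal sketch of that slack-driven swap in step (b); the cell names, delays, and leakage figures below are all hypothetical:

Code:
# Greedily downgrade cells on a path to low-leakage variants while the
# path's timing slack allows it.  All cell data here is hypothetical.
LOW_LEAKAGE_VARIANT = {
    # fast cell      -> (LL variant,   extra delay ns, leakage saved nW)
    "NAND2_X2_FAST": ("NAND2_X2_LL", 0.02, 40.0),
    "INV_X4_FAST":   ("INV_X4_LL",   0.01, 25.0),
}

def swap_for_leakage(path_cells, slack_ns):
    """Swap cells for low-leakage (typically larger) variants while the
    path still meets timing."""
    swapped, saved_nw = [], 0.0
    for cell in path_cells:
        variant = LOW_LEAKAGE_VARIANT.get(cell)
        if variant and slack_ns >= variant[1]:
            swapped.append(variant[0])
            slack_ns -= variant[1]  # each swap consumes timing slack
            saved_nw += variant[2]
        else:
            swapped.append(cell)    # no slack left: keep the fast cell
    return swapped, saved_nw

cells, saved = swap_for_leakage(
    ["NAND2_X2_FAST", "INV_X4_FAST", "NAND2_X2_FAST"], slack_ns=0.03)
print(cells, f"-> {saved:.0f} nW leakage saved")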

In my experience, we can expect anywhere from a 5% to 20% increase in area (largely a function of the timing slack you have and the leakage requirements).

I'd say perf per mm^2 is more a function of the choice of architecture than of floorplanning and cell design (based on my experience in a very niche area). Designs that tend to move large chunks of data across the length and breadth of the die tend to be less efficient in perf/mm^2 (again, based on my experience).