|
|
#26 |
|
Oz Yak
Join Date: Feb 2002
Location: US of A
Posts: 2,528
|
This is increasingly frustrating here. This is hardly the place where the layman is a factor in determining how/why aspects of a GPU are better/worse. If it were then every time a new GPU was released, all you would need is a thread asking if it's faster in Farmville. This is Beyond 3D.
__________________
Is EA still bleeding cash like an executive doing an ED-209 demonstration.... - Grall |
|
|
|
|
|
#27 | |
|
Senior Member
|
From architecture's pov.
Quote:
|
|
|
|
|
|
|
#28 | |||||
|
Senior Member
|
Quote:
Quote:
They are useful for architectural exploration, but not for evaluation by any sane measure. Like silent_guy pointed out, you have to consider entire applications. Quote:
Quote:
interfaces. Even otherwise, die costs are probably the biggest contributor to BOM. As for 2x bigger die only 10% more expensive to make, that's almost impossible for realistic sizes and cutting edge processes. Quote:
|
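The point above about die cost can be sanity-checked with a textbook model. This is only a rough sketch: it uses the common dies-per-wafer approximation and a Poisson yield model, and the wafer cost and defect density are made-up placeholders, not figures from this thread or from any foundry.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return gross - edge_loss


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(die_area_mm2, wafer_cost, defects_per_cm2):
    """Wafer cost spread over the dies that actually work."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies


# Hypothetical inputs, chosen only for illustration.
WAFER_COST = 5000.0      # dollars per 300 mm wafer (assumption)
DEFECT_DENSITY = 0.25    # defects per cm^2 (assumption)

small = cost_per_good_die(200.0, WAFER_COST, DEFECT_DENSITY)
large = cost_per_good_die(400.0, WAFER_COST, DEFECT_DENSITY)
cost_ratio = large / small

print(f"200 mm2 die: ${small:.2f} per good die")
print(f"400 mm2 die: ${large:.2f} per good die")
print(f"cost ratio for 2x the area: {cost_ratio:.2f}x")
```

Under this model the bigger die costs more than 2x as much even with a defect density of zero, because fewer large dies fit around the wafer edge. Any real defect density widens the gap further.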
|||||
|
|
|
|
|
#29 | |
|
Regular
Join Date: Jan 2008
Posts: 354
|
Quote:
As Silent Guy alluded to earlier, different transistors can have leakage that varies by around 10-100X. The drive strength (i.e. speed) probably varies by around 2-8X. The size of the transistor is a fairly important variable as well, and can directly influence those other two factors. E.g. larger transistors leak less. So there is always a trade-off between power, area and performance.
The reason that you care about perf/mm2 or perf/W is that it captures that trade-off and normalizes for different factors. If you have a GPU that is 1.2X faster, but is 8X larger, it's not a very efficient design. It's the best design for people who need the maximum performance... but for anyone who cares about cost it's going to be a worse choice. The same goes for power.
That being said, as a consumer... you don't really care directly about perf/mm2. However, you care about a number of factors (e.g. power, cost, performance, board size, etc.) that are all profoundly influenced by perf/mm2 and perf/W. So studying perf/mm2 and perf/W is valuable, because it can give you insights about the things you do care about.
DK
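To put a number on the 1.2X faster, 8X larger example, here is a minimal sketch. The baseline performance and area values are arbitrary placeholders (only the 1.2X and 8X ratios come from the post), and `efficiency` is just a helper name made up for this illustration.

```python
def efficiency(perf, resource):
    """Performance normalized by a resource: area (perf/mm2) or power (perf/W)."""
    if resource <= 0:
        raise ValueError("resource must be positive")
    return perf / resource


# Arbitrary baseline (assumption); only the ratios matter here.
BASE_PERF = 1.0
BASE_AREA_MM2 = 100.0

# The design from the post: 1.2X the performance at 8X the area.
big_perf = 1.2 * BASE_PERF
big_area_mm2 = 8.0 * BASE_AREA_MM2

base_eff = efficiency(BASE_PERF, BASE_AREA_MM2)
big_eff = efficiency(big_perf, big_area_mm2)
relative_eff = big_eff / base_eff

print(f"baseline perf/mm2: {base_eff:.5f}")
print(f"big design perf/mm2: {big_eff:.5f}")
print(f"big design delivers {relative_eff:.0%} of the baseline perf/mm2")
```

The larger design ends up at 1.2 / 8 = 0.15 of the baseline perf/mm2. The same helper works for perf/W by passing power instead of area.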
__________________
www.realworldtech.com |
|
|
|
|
|
|
#30 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,242
|
One thing I've been mulling over concerning perf/mm2 is that top-end comparisons usually work because we compare designs on mostly comparable processes, or their direct predecessors.
With the rising costs and lengthening delays in making node transitions, and the lack of guarantees that making a transition is beneficial, perf/mm2 analysis may become more complicated. AMD made a conscious decision prior to the 32nm cancellation that its lower-end chips would remain on 40nm, because of the cost argument. 28nm has not yet given us the volumes or perf/$ many have expected. AMD may be pressured to lower prices or roll out some kind of OC edition, but supply limitations may push out the crossover point where any downward trend in demand crosses its supply.
Slides were shown with chip design costs more than doubling from 45nm to 22nm, though I'm not certain if that includes escalating mask costs. If 2.5D and 3D integration come to pass in wide scale, we may be comparing designs whose perf/mm2 per layer is inferior, but whose cost/mm2 and cost per design are competitive. While it is generally not considered elegant or efficient to throw a lot more silicon at something, it doesn't look so bad if (and I am not claiming this is true yet) you can achieve the same end result while pocketing some cash savings.
When silicon scaling does hit a wall, we may have to start comparing perf/mm3 or perf/elements used in the recipe as designers and fabs work their way around any impediments.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#31 | |
|
Senior Member
Join Date: May 2008
Posts: 1,233
|
Quote:
|
|
|
|
|
|
|
#32 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,242
|
The top end of the design cost ranges could show a $400 million difference in design costs, so even if a next-node design is better than an oversized current-node device, is it $400 million better?
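One rough way to frame the "is it $400 million better?" question is a break-even volume: how many units have to ship before a per-unit saving pays back the extra design cost. This is a sketch only. The $400 million figure is the one quoted above; the per-unit savings are invented for illustration.

```python
def break_even_units(extra_design_cost, per_unit_saving):
    """Units needed before a per-unit saving repays the extra up-front design cost."""
    if per_unit_saving <= 0:
        return float("inf")  # no saving per unit, so the cost is never recovered
    return extra_design_cost / per_unit_saving


EXTRA_DESIGN_COST = 400e6  # dollars, from the design cost range mentioned above

# Hypothetical per-unit savings of the next-node part (assumptions).
for saving in (10.0, 20.0, 40.0):
    units = break_even_units(EXTRA_DESIGN_COST, saving)
    print(f"${saving:.0f} saved per unit -> break even at {units / 1e6:.0f} million units")
```

With a hypothetical $20 saving per chip, the next-node design would need 20 million units just to break even on design cost, before mask costs or yield ramp are considered.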
In terms of GPUs, interposers and die stacking may be something to watch out for, even if no radical new design changes take place in the graphics architecture. Let's posit that 2.5D integration and a wide stacked GDDR standard became feasible in the time frame of that slide (just to have some numbers to play with). A 300-400 mm2 redesigned GPU on the next node with the usual PCB mounting might face off against a 28nm dual-GPU part (say Pitcairn++, or Tahiti++ if you're into that kind of thing) with stacked DRAM. In this manufactured scenario, the next node might not win. The interposer might make certain things possible that weren't before, such as a very high-bandwidth link between GPUs.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#33 |
|
Senior Member
|
Even then, there would be an order-of-magnitude throughput and latency penalty for going off die. Graphics architectures will have to change significantly to scale with that.
|
|
|
|
|
|
#34 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,242
|
The latencies in question are in the same order of magnitude as a memory access. An access to the second die will require a second hit to the memory controller or associated L2 slice, then possibly on to a DRAM module. It may be 2-5x the latency, but there are optimizations that can be made to make accesses to the other GPU bypass the rather significant buffering and queueing related to high utilization of GDDR on the first die.
The latency of NUMA for CPUs is within 1.5-2x a local hit, and that is considered close enough to treat as a single pool. GPUs can tolerate much higher latency; what they could not obtain was the raw bandwidth. In terms of throughput, a link would have the same pitch and signalling rates as available to the on-interposer DRAM. Various streamout scenarios on single GPUs already contend with similar round trips to memory.
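The numbers above can be turned into a simple weighted-average model. This is a sketch under stated assumptions: the 2-5x multipliers are the range mentioned above, while the local latency and the fraction of accesses that go to the other die are placeholders.

```python
def average_latency(local_ns, remote_multiplier, remote_fraction):
    """Mean access latency when a fraction of accesses go to the other die."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_multiplier
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 100.0        # hypothetical local memory latency (assumption)
REMOTE_FRACTION = 0.5   # hypothetical: half of all accesses hit the other die

for multiplier in (2.0, 5.0):  # the 2-5x range discussed above
    avg = average_latency(LOCAL_NS, multiplier, REMOTE_FRACTION)
    print(f"remote = {multiplier:.0f}x local -> average is {avg / LOCAL_NS:.1f}x local")
```

Even with half of all accesses going remote, the average stays between 1.5x and 3x the local latency in this model, which is the kind of increase a GPU can usually cover with more threads in flight. Whether the link can supply the bandwidth is a separate question.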
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#35 |
|
Senior Member
|
I wasn't speaking of DRAM latency, which GPUs are good at hiding. I was speaking of the latency to go off die for the second GPU die. A sort last renderer *might* not be a good fit for that.
But if it motivates them to switch to sort middle, |
|
|
|
|
|
#36 | |
|
Senior Member
|
Quote:
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts. Work | Recreation. Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration! |
|
|
|
|
|
|
#37 | |
|
Senior Member
Join Date: Mar 2006
Posts: 1,713
|
Quote:
|
|
|
|
|
|
|
#38 |
|
Senior Member
|
So... that's a "no" to my question?
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts. Work | Recreation. Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration! |
|
|
|
|
|
#39 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,242
|
There appear to be density trade-offs made at least implicitly for the sake of mitigating variation, meaning that the transistors being used could be smaller but default to larger sizes.
Intel's SRAM cell sizes are often larger than those offered by foundry processes, such as TSMC. However, when it comes to designing something manufacturable, Intel's actual products tend to have cell sizes that are the size advertised. One of the reasons is to provide enough margin for a mass-produced product's memory to meet its reliability and power requirements at the desired voltages.
__________________
Dreaming of a .065 micron etch-a-sketch. |
|
|
|
|
|
#40 | |
|
Regular
Join Date: Jan 2008
Posts: 354
|
Quote:
DK
__________________
www.realworldtech.com |
|
|
|
|
|
|
#41 |
|
Junior Member
Join Date: Jun 2007
Posts: 15
|
Typically, how we control leakage is:
(a) we go for the target frequencies we want to hit
(b) we replace the cells in paths with slack with low-leakage cells
Obviously the process is iterative (in terms of achieving balance between leakage and perf, and that discussion is for another day). In step (b) above, cells with equivalent drive strength but lower leakage tend to be larger than their leakier counterparts (but you all knew that already). From my experience, we can expect anywhere from a 5% to 20% increase in area (largely a function of the timing slack you have and the leakage requirements).
I'd say perf per mm^2 is more a function of the choice of architecture than floor planning and cell designs (based on my experience in a very niche area). Designs that tend to move large chunks of data across the length and breadth of the die tend to be less efficient in perf/mm^2 (again based on my experience). |
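Step (b) above can be sketched as a greedy pass over cells on paths that still have timing slack. This is a toy model, not how any particular EDA tool does it. The `Cell` fields, the one-path-per-cell simplification, the assumption that a low-leakage swap costs some delay, and all the numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    path: str                 # toy simplification: each cell sits on exactly one path
    extra_delay_ps: float     # delay added by the low-leakage variant (assumed > 0)
    leakage_saved_nw: float   # leakage saved by swapping
    extra_area_um2: float     # area added by swapping


def recover_leakage(cells, slack_ps):
    """Greedily swap cells, best leakage saving per ps first, never going negative on slack."""
    remaining = dict(slack_ps)
    swapped = []
    by_benefit = sorted(
        cells, key=lambda c: c.leakage_saved_nw / c.extra_delay_ps, reverse=True
    )
    for cell in by_benefit:
        if cell.extra_delay_ps <= remaining[cell.path]:
            remaining[cell.path] -= cell.extra_delay_ps
            swapped.append(cell.name)
    return swapped, remaining


# Hypothetical netlist fragment: path A has 30 ps of slack, path B is critical.
cells = [
    Cell("u1", "A", 10.0, 50.0, 0.4),
    Cell("u2", "A", 20.0, 60.0, 0.5),
    Cell("u3", "B", 5.0, 10.0, 0.2),
    Cell("u4", "A", 15.0, 30.0, 0.3),
]
swapped, remaining = recover_leakage(cells, {"A": 30.0, "B": 0.0})
extra_area = sum(c.extra_area_um2 for c in cells if c.name in swapped)

print("swapped:", swapped)
print("remaining slack:", remaining)
print(f"extra area: {extra_area:.1f} um^2")
```

Cells on the critical path are left alone, and the area cost of the swaps is what shows up as the growth described above. A real flow re-runs timing after each batch of swaps, which is why the process is iterative.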
|
|
|
|
|