#1 |
|
Senior Member
Join Date: Apr 2007
Posts: 1,393
|
How do you think they will compare?
Llano:
- 32nm
- 400SPs (5D VLIW) @ up to 600MHz
- dual-channel DDR3 @ ~ 1.6Gbps
- mid 2011

Intel Graphics HD 200:
- 32nm
- 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz
- Direct3D 10.1 support, OpenCL, DirectCompute
- connected to 8MiB LL-cache
- dual-channel DDR3 @ ~ 1.6Gbps
- early 2011

Ivy Bridge Graphics:
- 22nm
- 16 EUs according to Intel
- Direct3D 11 support
- stacked DRAM?
- early 2012

Last edited by AnarchX; 14-Apr-2011 at 13:21. Reason: Update
|
|
|
|
|
#2 |
|
Member
Join Date: Aug 2010
Location: Hungary
Posts: 104
|
Is the SNB IGP OpenCL compatible?
|
|
|
|
|
|
#3 | |
|
Senior Member
Join Date: Apr 2007
Posts: 1,393
|
Quote:
|
|
|
|
|
|
|
#4 | |
|
Senior Member
Join Date: Oct 2002
Posts: 2,434
|
Quote:
I'm quite sure there were 8 TMUs even for i965 already (though I'm not sure what they could do per clock), and I certainly wouldn't expect SNB to have fewer (in theory it could have more: since it appears some versions will have 6 EUs and others 12 EUs, it's possible, at least on paper, that the TMU block isn't shared). In any case, texture fillrate should be quite good even with 8 TMUs (possibly approaching Llano levels), with the caveat that I've no idea about FP16 etc.
For flops, if those are 4D units, you're looking at ~120GFlops if you count that MAC as 2 ops. If those are 8D units, then that's twice as much, which would begin to look nearly comparable to Llano. So for Ivy Bridge, if that basically doubles SNB graphics performance, that could be quite a challenge for Llano.
Though of course there's a lot more to graphics performance than just ALUs/TMUs. One area where Intel was very weak was what AMD initially named HyperZ: things like early-z (though Intel can do this now), z-buffer compression etc. to save bandwidth. I think SNB improves this quite a bit though, and the 8MB cache could give it a huge advantage in some situations, since these chips are quite bandwidth-challenged.
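To make the flops arithmetic in this post explicit, here is a rough sketch of the peak-throughput calculation. It assumes the unit counts and maximum clocks listed in the first post (12 EUs at up to 1.35GHz for SNB, 400 SPs at up to 600MHz for Llano) and counts one MAC/MADD as 2 ops. The function name is made up for illustration, and the results are theoretical peaks, not measured performance.

```python
def peak_gflops(units: int, lanes_per_unit: int, clock_ghz: float,
                ops_per_lane: int = 2) -> float:
    """Theoretical peak: units * lanes * ops per lane per clock * clock in GHz."""
    return units * lanes_per_unit * ops_per_lane * clock_ghz


# SNB with 12 EUs at the rumored 1.35GHz maximum clock (assumption from post #1).
snb_4d = peak_gflops(units=12, lanes_per_unit=4, clock_ghz=1.35)
snb_8d = peak_gflops(units=12, lanes_per_unit=8, clock_ghz=1.35)

# Llano: each of the 400 SPs counted as one MADD lane at up to 600MHz.
llano = peak_gflops(units=400, lanes_per_unit=1, clock_ghz=0.6)

print(f"SNB, 4D EUs: {snb_4d:.1f} GFLOPS")
print(f"SNB, 8D EUs: {snb_8d:.1f} GFLOPS")
print(f"Llano:       {llano:.1f} GFLOPS")
```

At the full 1.35GHz the 4D case comes out at roughly 130 GFLOPS, in the same ballpark as the ~120 quoted above, and the 8D case is exactly double that.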
|
|
|
|
|
|
#5 |
|
Senior Member
Join Date: Apr 2007
Posts: 1,393
|
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg
- still 32nm
- probably L3-Cache connection for IGP
- probably increased die-size (Thuban level ~300mm²), which should allow increasing SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @ 4D)
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport

Last edited by AnarchX; 09-Nov-2010 at 19:02.
|
|
|
|
|
#6 | |
|
Senior Member
Join Date: Feb 2003
Location: Sofia, BG
Posts: 1,136
|
Quote:
__________________
"There are three types of lies - lies, damn lies, and statistics." |
|
|
|
|
|
|
#7 |
|
Senior Member
Join Date: Apr 2007
Posts: 1,393
|
|
|
|
|
|
|
#8 | ||||
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
Quote:
Quote:
Quote:
And I don't expect them to increase die size much; that would be too costly to manufacture. My estimate is an increase from 3(*80) to 4(*64).
Quote:
|
||||
|
|
|
|
|
#9 | |
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
And there won't be a sideport in a chip which does not contain a GPU.
AMD's PDF document for the investor day: http://phx.corporate-ir.net/External...xUeXBlPTM=&t=1 Quote:
Last edited by hkultala; 11-Nov-2010 at 06:41. |
|
|
|
|
|
|
#10 |
|
Member
Join Date: Sep 2005
Location: Rage3D
Posts: 301
|
"designed to couple with" doesn't prove the existence of a sideport.
|
|
|
|
|
|
#11 | |
|
Member
Join Date: Apr 2009
Posts: 140
|
Quote:
Komodo is a CPU, and I'm guesstimating that it will probably be augmented with a GPU similar to the one used in the Ontario/Zacate APUs: up to 80SPs (5D-VLIW), but more probably 64SPs of "3rd Gen DX11" 4D-VLIW, with the TMU:ROP setup otherwise unchanged from O/Z. My guess is that Komodo will address the lack of IGPs in the new chipsets and also make it more comparable to Intel's SB. And it will be socket compatible with Zambezi (AM3r2).

As for the Trinity APU, which is in the slides with 2-4 BD cores, I in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI), with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay at 2-4 BD cores just so they can add the necessary 4MB of L3 cache instead of an extra 2 BD cores.

Trinity
2-4 BD cores (4MB L2 cache)
4MB L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

or better (?)

4-6 BD cores (6MB L2 cache)
no L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

The second solution would certainly need less work to adapt the Llano-style APU design to Trinity. And does the GPU really benefit from an additional 4MB of L3, instead of the already large 6MB of L2 (total for six BDv1 cores) available in the HPC case?

For most 3D/gaming work, Llano and probably Trinity will rely on cheap 128-bit DDR3 1866MHz memory giving 30GB/s of bandwidth in total (shared w/ CPU), which is probably even good enough for budget dual-display 1080p noAA/noAF gaming (considering the praised 640SP), or single 1080p 2AA/16AF?
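The 30GB/s figure can be checked with a quick sketch. It assumes "DDR3 1866MHz" means DDR3-1866, i.e. 1866 mega-transfers per second, on a 128-bit (dual-channel) bus. The function name is hypothetical, and this is the theoretical peak, which the CPU and GPU would share.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000.0  # MB/s -> GB/s


ddr3_1866 = peak_bandwidth_gbs(bus_width_bits=128, transfer_rate_mts=1866)
ddr3_1600 = peak_bandwidth_gbs(bus_width_bits=128, transfer_rate_mts=1600)

print(f"128-bit DDR3-1866: {ddr3_1866:.1f} GB/s")
print(f"128-bit DDR3-1600: {ddr3_1600:.1f} GB/s")
```

The DDR3-1600 line is included only for comparison with the ~1.6Gbps memory listed for Llano in the first post.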
|
|
|
|
|
|
#12 | |
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
Quote:
It's still manufactured at 32nm, and it's not a high-end product, so it must not be too big/too expensive to manufacture. And I don't see L3 cache as a "necessary thing" for this market segment. With 2*2 MB of L2 cache there is already plenty of cache.
|
|
|
|
|
|
#13 | |
|
Senior Member
Join Date: Oct 2002
Posts: 2,434
|
Quote:
Well, the advantage of L3 is that you can use it for graphics too, with L2 being exclusive to the CPU cores. This also probably means you can make the L2 cache attached to the ROPs smaller if you've got shared L3, and it's still faster (as the GPU L2 cache wasn't that large). Clearly, for Phenom II / Athlon II the L3 cache did not really help THAT much, but the balance in terms of performance benefit per area should shift towards the solution with L3 cache if you can also use it for the graphics core. It might require some changes to the MC/graphics core though, which might be something AMD isn't willing to do (as they couldn't just use basically unchanged discrete GPU cores).
|
|
|
|
|
|
#14 |
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
|
|
|
|
|
|
#15 |
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
|
|
|
|
|
|
#16 | |
|
Senior Member
|
Quote:
240SPs at ~1040MHz just doesn't seem realistic, power-wise. That GPU part looks to be around 100mm², which is close to Redwood's size, but on 32nm.
|
|
|
|
|
|
#17 | ||
|
Senior Member
Join Date: Oct 2002
Posts: 2,434
|
Quote:
Quote:
|
||
|
|
|
|
|
#18 |
|
Senior Member
|
There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now, but I'll try to find it and link it later today.
|
|
|
|
|
|
#19 |
|
Senior Member
Join Date: Oct 2002
Posts: 2,434
|
Even with 500+ GFLOPS for the GPU, shouldn't 400 SPs be more than sufficient? That would only need 625MHz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted, the structure doesn't really look like that. But it would be strange imho if there were so many SIMDs (hence increasing cost) and then they were clocked so low.
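The 625MHz figure follows from inverting the usual peak-FLOPS formula. A minimal sketch, assuming each SP does one MADD (2 flops) per clock and that the 500 GFLOPS is a GPU-only theoretical peak; the function name is made up for illustration.

```python
def required_clock_mhz(target_gflops: float, sps: int,
                       flops_per_sp_per_clock: int = 2) -> float:
    """Clock in MHz needed for `sps` stream processors to reach `target_gflops`."""
    return target_gflops * 1000.0 / (sps * flops_per_sp_per_clock)


clock_400 = required_clock_mhz(500, sps=400)
clock_240 = required_clock_mhz(500, sps=240)

print(f"400 SPs need {clock_400:.0f} MHz for 500 GFLOPS")
print(f"240 SPs need {clock_240:.0f} MHz for 500 GFLOPS")
```

The 240 SP case lands at roughly 1040MHz, which is the combination questioned on power grounds earlier in the thread.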
|
|
|
|
|
|
#20 | |
|
Senior Member
|
Quote:
I can't find a free transcript for Tuesday's analyst day, but I think the quote in question was during the Client platforms breakout session, for which the webcast is still available. |
|
|
|
|
|
|
#21 |
|
Member
Join Date: Aug 2005
Location: Mars
Posts: 181
|
Why are there no APU GPUs running at 2+ GHz?
|
|
|
|
|
|
#22 |
|
Member
Join Date: Sep 2006
Posts: 273
|
It's all about balance. Remember, we aren't talking about the 1980s, when components drew 3W and got by with passive cooling. We are already limited by cooling and power consumption.
It's probably better to get 400SPs at 650MHz than 200SPs at 1300MHz. GPU code has extremely high parallelism, so adding more SPs is easier than clocking them high. Nvidia does have high clock speeds for its SPs, but again, it's just for the SPs. All other blocks clock much lower. ATI's design calls for having everything run at the base clock. I guess they can change it, but that's not something that'll happen overnight. Even if the process technology, thermal and power limits, and costs of development allow clocking the GPU at 2GHz, does the design allow it?
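A rough sketch of why the wide-and-slow option tends to win on power. It uses the textbook approximation that dynamic power scales with switched capacitance times voltage squared times frequency, with SP count standing in for capacitance. The 1.2x voltage needed to reach 1300MHz is a purely hypothetical number chosen for illustration, not a figure from any AMD or Intel part, and the function names are made up.

```python
def peak_gflops(sps: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput, counting one MADD as 2 flops."""
    return sps * flops_per_clock * clock_mhz / 1000.0


def relative_dynamic_power(sps: int, clock_mhz: float, rel_voltage: float) -> float:
    """Unitless estimate: P ~ C * V^2 * f, with SP count as a stand-in for C."""
    return sps * clock_mhz * rel_voltage ** 2


# Both configurations have the same theoretical peak throughput.
wide_gflops = peak_gflops(400, 650)
narrow_gflops = peak_gflops(200, 1300)

# Hypothetical assumption: doubling the clock needs about 20% more voltage.
wide_power = relative_dynamic_power(400, 650, rel_voltage=1.0)
narrow_power = relative_dynamic_power(200, 1300, rel_voltage=1.2)
power_ratio = narrow_power / wide_power

print(f"Peak: {wide_gflops:.0f} vs {narrow_gflops:.0f} GFLOPS")
print(f"Narrow/fast draws about {power_ratio:.2f}x the dynamic power")
```

Under these assumptions the throughput is identical, and any voltage increase needed for the higher clock shows up squared in the power estimate. The trade-off is die area, since the wide design needs twice the SPs.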
|
|
|
|
|
#23 | |
|
Member
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
|
It seems Intel is finally at least developing an OpenCL implementation for their integrated GPUs:
They just sent an email to the llvm-developers list, recruiting people to develop their LLVM-based OpenCL implementation:
Quote:
|
|
|
|
|
|
|
#24 |
|
Senior Member
|
|
|
|
|
|
|
#25 | |
|
Member
Join Date: Jan 2010
Posts: 416
|
Quote:
Wouldn't they already be using it in server CPUs if they could get 1 GB of memory at 5770 speeds in the Ivy Bridge design?
|
|
|
|
|