Old 29-Oct-2010, 14:40   #1
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,393
Default Llano IGP vs SNB IGP vs IVB IGP

How do you think they will compare?

Llano:
- 32nm
- 400SPs (5D VLIW) @ up to 600MHz
- dual-channel DDR3 @ ~ 1.6Gbps
- mid 2011

Intel Graphics HD 200:
- 32nm
- 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz
- Direct3D 10.1 support, OpenCL, DirectCompute
- connected to 8MiB LL-cache
- dual-channel DDR3 @ ~ 1.6Gbps
- early 2011

Ivy Bridge Graphics:
- 22nm
- 16 EUs according to Intel
- Direct3D 11 support
- stacked DRAM?
- early 2012

Last edited by AnarchX; 14-Apr-2011 at 13:21. Reason: Update
AnarchX is offline   Reply With Quote
Old 29-Oct-2010, 16:43   #2
Chabi
Member
 
Join Date: Aug 2010
Location: Hungary
Posts: 104
Default

SNB IGP OpenCL compatible?
Chabi is offline   Reply With Quote
Old 29-Oct-2010, 16:44   #3
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,393
Default

Quote:
This graphics core does, however, integrate antialiasing support so that it can move up to DirectX 10.1. It also supports OpenGL 3.1 and, more interestingly, OpenCL. DirectCompute version 4.1 is also on the menu.
http://www.hardware.fr/articles/803-...e-honneur.html
AnarchX is offline   Reply With Quote
Old 29-Oct-2010, 18:01   #4
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,434
Default

Quote:
Originally Posted by AnarchX View Post
Intel Graphics HD 200:
- 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz
The EUs still can't do MAD. They can, however, do MAC (with a special accumulator register) and, in contrast to the last generation, can enable or disable the accumulator update per instruction, which might make this easier to exploit. Earlier EUs were 4D physical, 8D logical (they had a 4D mode, but such a 4D instruction still took 2 cycles), so it's possible (but I don't know) that they are 8D physical now, which would explain the "doubled throughput", though maybe that quote was meant to describe something else.
I'm quite sure there were already 8 TMUs even on i965 (though I'm not sure what they could do per clock), and I certainly wouldn't expect SNB to have fewer. In theory it could have more: since it appears some versions will have 6 EUs and others 12 EUs, it's possible, at least on paper, that the TMU block isn't shared.
In any case, texture fillrate should be quite good even with 8 TMUs (possibly approaching Llano levels), with the caveat that I've no idea about FP16 etc. For flops, if those are 4D units, you're looking at ~120 GFlops if you count that MAC as 2 ops. If those are 8D units, that's twice as much, which would begin to look nearly comparable to Llano.
So for Ivy Bridge, if that basically doubles SNB graphics performance, that could be quite a challenge for Llano. Though of course there's a lot more to graphics performance than just ALUs/TMUs: one area where Intel was very weak was what AMD initially named HyperZ, things like early-Z (though Intel can do this now), Z-buffer compression etc. to save bandwidth. I think SNB improves this quite a bit, though, and the 8MB cache could give it a huge advantage in some situations, since these chips are quite bandwidth-challenged.
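
As a rough check of the flops estimate above, here is a minimal sketch of the peak-rate arithmetic. The function name is made up for illustration, and it assumes 12 EUs at the 1.35GHz top clock from the first post, with a MAC counted as 2 ops per lane per clock.

```python
def peak_gflops(units, lanes_per_unit, clock_ghz, ops_per_lane=2):
    """Theoretical peak: units * lanes * ops per lane per clock * clock in GHz."""
    return units * lanes_per_unit * ops_per_lane * clock_ghz


# Assumption: 12 EUs at 1.35 GHz, as listed in the first post.
snb_4d = peak_gflops(12, 4, 1.35)  # EUs that are 4 lanes wide physically
snb_8d = peak_gflops(12, 8, 1.35)  # EUs that are 8 lanes wide physically

print(f"4D EUs: {snb_4d:.1f} GFLOPS")
print(f"8D EUs: {snb_8d:.1f} GFLOPS")
```

At the top clock this gives about 130 GFLOPS for 4D units, in the same ballpark as the ~120 quoted, and about 259 GFLOPS for 8D units. These are paper numbers only and say nothing about achievable throughput.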
mczak is offline   Reply With Quote
Old 09-Nov-2010, 18:54   #5
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,393
Default

Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm
- probably L3-Cache connection for IGP
- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport
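
The SP counts in the list above follow from simple multiplication. A small sketch, assuming 16 VLIW lanes per SIMD (the value implied by 800 SPs / 10 SIMDs / 5D); the helper name is hypothetical:

```python
def sp_count(simds, vliw_width, lanes_per_simd=16):
    """Total SPs = SIMDs * lanes per SIMD * ALUs per VLIW lane."""
    return simds * lanes_per_simd * vliw_width


# Assumption: 16 lanes per SIMD, inferred from the 800SP/640SP figures above.
for simds, width in [(6, 5), (10, 5), (10, 4)]:
    print(f"{simds} SIMDs @ {width}D -> {sp_count(simds, width)} SPs")
```

Under that assumption 6 SIMDs at 5D come to 480 SPs, and 10 SIMDs come to 800 SPs at 5D or 640 SPs at 4D.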

Last edited by AnarchX; 09-Nov-2010 at 19:02.
AnarchX is offline   Reply With Quote
Old 09-Nov-2010, 20:05   #6
chavvdarrr
Senior Member
 
Join Date: Feb 2003
Location: Sofia, BG
Posts: 1,136
Default

Quote:
Originally Posted by AnarchX View Post
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm
- probably L3-Cache connection for IGP
- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport
I had a feeling that Zacate has 2 SIMDs with 80SPs total
__________________
"There are three types of lies - lies, damn lies, and statistics."
chavvdarrr is offline   Reply With Quote
Old 10-Nov-2010, 07:24   #7
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,393
Default

Quote:
Originally Posted by chavvdarrr View Post
I had a feeling that Zacate has 2 SIMDs with 80SPs total
The topic is about higher performance APUs/CPU-IGP-chips: Llano IGP vs SNB IGP vs IVB IGP.
AnarchX is offline   Reply With Quote
Old 10-Nov-2010, 07:41   #8
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

Quote:
Originally Posted by AnarchX View Post
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm
Yes, of course.

Quote:
- probably L3-Cache connection for IGP
I see nothing suggesting this.

Quote:
- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)
.. except that Llano will not have 6 but 3 SIMD cores (240 ALUs).
And I don't expect them to increase die size much; that would be too costly to manufacture.

My estimate is increase from 3(*80) to 4(*64)


Quote:
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport
AMD has never used non-power-of-2 memory buses before. I don't expect them to do it with Komodo either.
hkultala is offline   Reply With Quote
Old 10-Nov-2010, 12:30   #9
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

Quote:
Originally Posted by AnarchX View Post
- Komodo probably with 3 memory channels or GDDR5 sideport
And there won't be a sideport in a chip which does not contain a GPU.

AMD's PDF document for the investor day:

http://phx.corporate-ir.net/External...xUeXBlPTM=&t=1

Quote:
Originally Posted by AMD nov 9 pdf
“Komodo”
Market: Server and Performance Desktops
What is it? “Komodo” is AMD’s next generation CPU and is primarily intended for
servers and high-performance desktops. “Komodo” will feature next-generation
“Bulldozer” CPU cores and, in desktop PC platforms, is designed to couple with
DirectX® 11 GPUs to provide enthusiast-level system performance.

Planned for introduction: 2012

Last edited by hkultala; 11-Nov-2010 at 06:41.
hkultala is offline   Reply With Quote
Old 10-Nov-2010, 20:00   #10
caveman-jim
Member
 
Join Date: Sep 2005
Location: Rage3D
Posts: 301
Default

"designed to couple with" doesn't prove the existence of sideport.
caveman-jim is offline   Reply With Quote
Old 11-Nov-2010, 05:14   #11
keritto
Member
 
Join Date: Apr 2009
Posts: 140
Default

Quote:
Originally Posted by AnarchX View Post
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg
Komodo is listed as a CPU, and you should differentiate it from Llano and NG-Trinity, as can be seen in the slides.

Komodo is a CPU, and I'm guesstimating that it will probably be augmented with a GPU similar to the one used in the Ontario/Zacate APUs: up to 80SPs (5D-VLIW), but more probably 64SPs "3rd Gen DX11" 4D-VLIW, with the rest (TMUs:ROPs) unchanged from O/Z. My guess is that Komodo will probably address the lack of IGPs in the new chipsets and also make it more comparable to Intel's SB. And it will be socket-compatible with Zambezi (AM3r2).

As for the Trinity APU, which per the slides has 2-4 BD cores, I in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI), with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay at 2-4 BD cores just so they can add the necessary 4MB of L3 cache instead of 2 extra BD cores.

Trinity
2-4BD cores (4MB L2 cache)
4MB L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

or better (?)
4-6BD cores (6MB L2 cache)
no L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

The second solution would certainly need less work to adapt the Llano-style APU design to the Trinity design.

And does the GPU really benefit from an additional 4MB of L3, instead of the already large 6MB of L2 (total for six BDv1 cores) available in the HPC case? For most 3D/gaming work, Llano and probably Trinity will rely on cheap 128-bit DDR3-1866 memory giving 30GB/s of bandwidth in total (shared with the CPU), which is probably even good enough for budget dual-display 1080p noAA/noAF gaming (considering the praised 640SPs), or single 1080p 2AA/16AF?
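
The 30GB/s figure can be checked with a short sketch of the peak-bandwidth arithmetic (bus width in bytes times transfer rate). The function name is just for illustration, and the DDR3-1600 line uses the memory speed from the first post for comparison.

```python
def ddr_bandwidth_gb_s(bus_width_bits, transfers_mt_s):
    """Peak bandwidth in GB/s: bytes per transfer * mega-transfers per second / 1000."""
    return bus_width_bits / 8 * transfers_mt_s / 1000.0


ddr3_1866 = ddr_bandwidth_gb_s(128, 1866)  # dual-channel (2 x 64-bit) DDR3-1866
ddr3_1600 = ddr_bandwidth_gb_s(128, 1600)  # dual-channel DDR3-1600

print(f"DDR3-1866, 128-bit: {ddr3_1866:.1f} GB/s")
print(f"DDR3-1600, 128-bit: {ddr3_1600:.1f} GB/s")
```

That works out to just under 30GB/s for DDR3-1866 and 25.6GB/s for DDR3-1600, both theoretical peaks shared between CPU and GPU.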
keritto is offline   Reply With Quote
Old 11-Nov-2010, 06:45   #12
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

Quote:
Originally Posted by keritto View Post
As for the Trinity APU, which per the slides has 2-4 BD cores, I in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI), with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay at 2-4 BD cores just so they can add the necessary 4MB of L3 cache instead of 2 extra BD cores.
More than 4 Bulldozer cores/2 Bulldozer modules would make it too big.
It's still manufactured at 32nm, and it's not a high-end product, so it must not be too big/too expensive to manufacture.

And I don't see L3 cache as "necessary thing" for this market segment. With 2*2 MB L2 cache there is already plenty of cache.
hkultala is offline   Reply With Quote
Old 11-Nov-2010, 12:46   #13
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,434
Default

Quote:
Originally Posted by keritto View Post
As for the Trinity APU, which per the slides has 2-4 BD cores, I in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI), with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders).
I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.

Quote:
Originally Posted by hkultala View Post
And I don't see L3 cache as "necessary thing" for this market segment. With 2*2 MB L2 cache there is already plenty of cache.
Well, the advantage of L3 is that you can use it for graphics too - L2 being exclusive to the cpu cores. This also probably means you can make the L2 cache attached to the ROPs smaller if you've got shared L3 and it's still faster (as the gpu l2 cache wasn't that large). Clearly, for Phenom II / Athlon II the L3 cache did not really help THAT much - but that balance should shift towards the solution with L3 cache in terms of performance benefits / area if you can also use it for the graphic core. It might require some changes to the MC/graphic core though, which might be something AMD isn't willing to do (as they couldn't just use basically unchanged discrete gpu cores).
mczak is offline   Reply With Quote
Old 11-Nov-2010, 21:43   #14
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

Quote:
Originally Posted by mczak View Post

Well, the advantage of L3 is that you can use it for graphics too - L2 being exclusive to the cpu cores.
What makes this an advantage?
hkultala is offline   Reply With Quote
Old 11-Nov-2010, 21:45   #15
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

Quote:
Originally Posted by mczak View Post
I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.
Yep.

And the size of the GPU part of the chip also seems to indicate it has 240 shader ALU's, not 480.
hkultala is offline   Reply With Quote
Old 11-Nov-2010, 21:47   #16
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,019
Send a message via MSN to Alexko
Default

Quote:
Originally Posted by mczak View Post
I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.
They said 500+ GFLOPS. That sounds to me like 480SPs @ ~550MHz or maybe 400SPs @ ~630MHz.

240SPs at ~1040MHz just doesn't seem realistic, power-wise.
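
To make the comparison concrete, here is a small sketch that inverts the peak-flops formula to get the clock each SP count would need for 500 GFLOPS. It assumes 2 flops (one MAD) per SP per clock; the function name is hypothetical.

```python
def required_clock_mhz(target_gflops, sps, ops_per_sp=2):
    """Clock in MHz needed for `sps` shader processors to reach the target peak."""
    return target_gflops * 1000.0 / (sps * ops_per_sp)


# Assumption: each SP retires one MAD (2 flops) per clock.
for sps in (480, 400, 240):
    print(f"{sps} SPs need ~{required_clock_mhz(500, sps):.0f} MHz for 500 GFLOPS")
```

For exactly 500 GFLOPS this gives roughly 521MHz, 625MHz and 1042MHz respectively, so "500+" lands near the ~550MHz, ~630MHz and ~1040MHz mentioned above.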



That GPU-part looks to be around 100mm², which is close to Redwood's size, but on 32nm.
Alexko is offline   Reply With Quote
Old 12-Nov-2010, 03:25   #17
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,434
Default

Quote:
Originally Posted by Alexko View Post
They said 500+ GFLOPS. That sounds to me like 480SPs @ ~550MHz or maybe 400SPs @ ~630MHz.

240SPs at ~1040MHz just doesn't seem realistic, power-wise.
The quote was 400-500 GFlops. And from how it was worded, it was for the whole chip. Which leaves 300-400Gflops for the GPU. With 240SPs that gives you 625-830Mhz. Sounds doable to me.
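
The arithmetic behind that range, as a quick sketch. The 100 GFLOPS CPU share is simply what the numbers above imply (400-500 for the chip minus 300-400 for the GPU), not a quoted AMD figure, and 2 flops per SP per clock is assumed.

```python
CHIP_TOTAL_GFLOPS = (400, 500)  # whole-chip range quoted above
CPU_SHARE_GFLOPS = 100          # assumption: implied by the 300-400 left for the GPU
SPS = 240
OPS_PER_SP = 2                  # assumption: one MAD per SP per clock

# What remains for the GPU after subtracting the CPU share.
gpu_budget = tuple(total - CPU_SHARE_GFLOPS for total in CHIP_TOTAL_GFLOPS)

# Clock (MHz) that 240 SPs would need to deliver each end of that budget.
clocks_mhz = tuple(g * 1000.0 / (SPS * OPS_PER_SP) for g in gpu_budget)

print(gpu_budget)
print([round(c) for c in clocks_mhz])
```

With those inputs the GPU budget is 300-400 GFLOPS and the required clock runs from 625MHz to about 833MHz.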
Quote:
That GPU-part looks to be around 100mm², which is close to Redwood's size, but on 32nm.
You are right it looks quite big.
mczak is offline   Reply With Quote
Old 12-Nov-2010, 06:57   #18
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,019
Send a message via MSN to Alexko
Default

Quote:
Originally Posted by mczak View Post
The quote was 400-500 GFlops. And from how it was worded, it was for the whole chip. Which leaves 300-400Gflops for the GPU. With 240SPs that gives you 625-830Mhz. Sounds doable to me.

You are right it looks quite big.
There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now but I'll try to find a link to it later today.
Alexko is offline   Reply With Quote
Old 12-Nov-2010, 13:16   #19
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,434
Default

Quote:
Originally Posted by Alexko View Post
There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now but I'll try to find a link to it later today.
Even with 500+ gflops for the gpu, shouldn't 400 SPs be more than sufficient? That would only need 625Mhz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted the structure doesn't really look like that. But it would be strange imho if there would be so many simds (hence increasing cost) but then they'd be clocked so low.
mczak is offline   Reply With Quote
Old 12-Nov-2010, 13:45   #20
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,019
Send a message via MSN to Alexko
Default

Quote:
Originally Posted by mczak View Post
Even with 500+ gflops for the gpu, shouldn't 400 SPs be more than sufficient? That would only need 625Mhz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted the structure doesn't really look like that. But it would be strange imho if there would be so many simds (hence increasing cost) but then they'd be clocked so low.
400 SPs seems plausible, but 240 doesn't, IMO.

I can't find a free transcript for Tuesday's analyst day, but I think the quote in question was during the Client platforms breakout session, for which the webcast is still available.
Alexko is offline   Reply With Quote
Old 12-Nov-2010, 18:41   #21
Karoshi
Member
 
Join Date: Aug 2005
Location: Mars
Posts: 181
Default

Why are there no APU's GPUs running at 2+ GHz?
Karoshi is offline   Reply With Quote
Old 15-Nov-2010, 09:00   #22
DavidC
Member
 
Join Date: Sep 2006
Posts: 273
Default

Quote:
Originally Posted by Karoshi View Post
Why are there no APU's GPUs running at 2+ GHz?
It's all about balance. Remember we aren't talking about the 1980s, when components ran with passive cooling on 3W. We are already limited by cooling and power consumption.

It's probably better to get 400SPs at 650MHz than 200SPs at 1300MHz. GPU code has extremely high parallelism, so adding more SPs is easier than clocking them high.
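
A very rough sketch of why the wide-and-slow option tends to win on power. It uses the textbook dynamic-power relation P ~ C * V^2 * f and additionally assumes voltage has to scale linearly with clock, so per-SP power grows with f^3. That scaling is a crude illustrative assumption, not a measurement of any real process, and the function names are made up.

```python
def relative_throughput(sps, clock_mhz):
    """Peak throughput in arbitrary units: SP count times clock."""
    return sps * clock_mhz


def relative_power(sps, clock_mhz):
    """Dynamic power in arbitrary units, assuming V scales with f, so P ~ sps * f^3."""
    return sps * clock_mhz ** 3


wide = (400, 650)    # 400 SPs at 650 MHz
fast = (200, 1300)   # 200 SPs at 1300 MHz

same_throughput = relative_throughput(*wide) == relative_throughput(*fast)
power_ratio = relative_power(*fast) / relative_power(*wide)

print(same_throughput, power_ratio)
```

Under this simplified model both configurations have the same peak throughput, but the 1300MHz one burns about four times the dynamic power. Real chips behave less neatly, and the wider design pays in die area instead.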

Nvidia does have high clock speeds for its SPs, but again, it's just for the SPs. All other blocks clock much lower. ATI's design calls for having everything run at the base clock. I guess they can change it, but it's not something that'll happen overnight.

Even if the process technology, thermal and power limits, and costs of development allow clocking the GPU at 2GHz, does the design allow it?
DavidC is offline   Reply With Quote
Old 15-Nov-2010, 19:48   #23
hkultala
Member
 
Join Date: May 2002
Location: Herwood, Tampere, Finland
Posts: 264
Default

It seems Intel is finally at least developing an OpenCL implementation for their integrated GPUs:

They just sent an email to the llvm-developers list, recruiting people to develop their LLVM-based OpenCL implementation:

Quote:
Originally Posted by Intel recruitment email

LLVM Software engineer at Intel,CA(Santa Clara or Folsom)

In this position, you will be responsible for designing and developing highly competitive OpenCL (Open Compute Language, a new industry standard for heterogeneous data and task parallel computing across GPU's and CPU's). You will be supporting on integrated graphics processors. This includes a JIT compiler, a library of built-in functions and OpenCL runtime driver support. Responsibilities (depending on your skill set) will include applying state of the art compilation/JIT technology, knowledge of high performance math algorithms and system architecture skills to allow applications to tap into the computation power of GPUs previously only available to graphics applications ....
hkultala is offline   Reply With Quote
Old 30-Dec-2010, 06:28   #24
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070
Send a message via Skype™ to rpg.314
Default

http://www.semiaccurate.com/2010/12/...ry-ivy-bridge/

Interesting.
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.
rpg.314 is offline   Reply With Quote
Old 30-Dec-2010, 10:42   #25
GZ007
Member
 
Join Date: Jan 2010
Posts: 416
Default

Quote:
Originally Posted by rpg.314 View Post
The size and the bandwidth don't sound too realistic to me.
Wouldn't they already use it in server CPUs if they could get 1 GB of memory at 5770 speeds in the Ivy Bridge design?
GZ007 is offline   Reply With Quote
