Haswell vs Kaveri

I hate to say it, but Intel is starting to close the gap on AMD big time. If they start putting some real effort into drivers on top of the crazy silicon improvements we've seen on the graphics side over the last two years, then AMD could really be in serious trouble...

Nevertheless, looking forward to Trinity.
 
I think if they get the whole integration (with LLC sharing etc.) right it could probably get close even if it only has half the bandwidth. Otherwise (with Llano-like integration) it would stay well below an HD 5750, and would probably only match a 6670 or thereabouts. I've got no idea, though, what integration looks like on Kaveri.

I wouldn't be so sure about that.
Supposedly, there'll be much less CPU-GPU bandwidth overhead because of not having to pass the same data around different places in the system RAM.
The iGPU's memory efficiency in Kaveri should be way ahead of Llano's.


Besides, I get this feeling that raw memory bandwidth keeps getting overrated in mid to high-end GPUs.
I don't know if this comes from improvements in memory controllers, better Z-buffering, a greater emphasis on less bandwidth-hungry operations during game development, or something else, but the truth is that general performance-per-GB/s in graphics cards has been steadily rising.
Just seeing how close Pitcairn XT is to Tahiti Pro makes me wonder how much of all that bandwidth goes to waste in anything short of a 3-monitor Eyefinity setup.
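
Just to show what I mean by that ratio: the memory bandwidths below are the two cards' published specs, but the fps values are pure placeholders (swap in whatever benchmark average you trust).

    # Quick-and-dirty "performance per GB/s" comparison. Bandwidths are the
    # published specs; the fps numbers are hypothetical placeholders.
    cards = {
        "Pitcairn XT (HD 7870)": (153.6, 55.0),   # (GB/s, fps); fps is made up
        "Tahiti Pro (HD 7950)":  (240.0, 62.0),   # (GB/s, fps); fps is made up
    }

    for name, (bandwidth_gbps, fps) in cards.items():
        print(f"{name}: {fps / bandwidth_gbps:.3f} fps per GB/s")
    # If the 256-bit card comes out clearly ahead on this ratio, that's the
    # "overrated raw bandwidth" effect in action.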


That said, I think Kaveri might actually nip at the heels of an HD 7750 at 720p + 4x MSAA, or even 1080p without AA.
 
I don't know if this comes from improvements in memory controllers, better Z-buffering, a greater emphasis on less bandwidth-hungry operations during game development, or something else, but the truth is that general performance-per-GB/s in graphics cards has been steadily rising.

a) It's a useless number.

b) Of course it is rising. A number can increase and still be the bottleneck.
 
a) It's a useless number.

b) Of course it is rising. A number can increase and still be the bottleneck.

I have no idea what you're referring to, or what you meant in your post.
 
ToTTenTranz said:
Besides, I get this feeling that raw memory bandwidth keeps getting overrated in mid to high-end GPUs.
I tend to agree with you, at least for mid-range performance. Being bandwidth limited for graphics applications is usually not as dire a situation as is portrayed. If you are bandwidth limited to 150 frames per second, who cares? Your monitor is only going to show you 120 (if you are lucky). The main performance threshold to cross is 60 Hz. As long as they can provide enough bandwidth to not be primarily bandwidth limited below 60 frames per second, customers (in that market range) are not going to care.
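
To put a toy number on that threshold (the per-frame traffic figure is an arbitrary assumption; the bandwidth figures are just familiar examples):

    # Toy model: if rendering were purely bandwidth-limited, the sustainable
    # frame rate is just bandwidth / traffic-per-frame. The 0.45 GB/frame
    # figure is an invented assumption for illustration.
    TRAFFIC_GB_PER_FRAME = 0.45

    examples = {
        "dual-channel DDR3-1600 (APU-style)": 25.6,
        "HD 7750-class GDDR5": 72.0,
        "HD 7870-class GDDR5": 153.6,
    }

    for name, bw_gbps in examples.items():
        fps = bw_gbps / TRAFFIC_GB_PER_FRAME
        verdict = "over 60 Hz, nobody cares" if fps >= 60 else "under 60 Hz, now it matters"
        print(f"{name}: {fps:5.1f} fps if bandwidth-bound ({verdict})")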
 
Supposedly, there'll be much less CPU-GPU bandwidth overhead because of not having to pass the same data around different places in the system RAM.

With a smart driver, the CPU-GPU memory bandwidth overhead can be essentially driven to zero on this kind of platform.

The catch is that the present-day graphics pipeline was developed for an architecture where the link between CPU and GPU was narrow and slow, so that traffic is already a small proportion of total bandwidth. Even drastic cuts to it have a fairly minimal effect on the total.
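
To put a made-up but illustrative number on that; both per-frame traffic figures below are invented.

    # If CPU->GPU transfer traffic is only a small slice of what the GPU
    # reads and writes per frame, eliminating it entirely barely moves the
    # total. Both figures are arbitrary assumptions.
    gpu_local_mb_per_frame = 500.0    # texture fetches, render targets, Z traffic, ...
    cpu_to_gpu_mb_per_frame = 20.0    # uploads/copies across the CPU-GPU link

    total = gpu_local_mb_per_frame + cpu_to_gpu_mb_per_frame
    print(f"Driving the CPU-GPU overhead to zero saves "
          f"{cpu_to_gpu_mb_per_frame / total:.1%} of total per-frame bandwidth")
    # ~3.8% here: real, but nowhere near enough to offset a 2-3x raw bandwidth deficit.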


I find it really, really unlikely that it will actually act as an L4 cache. Memory requests would have to be issued from the largest cache level, which means either moving the memory controllers to the external chip (so they can't use cheap mass-produced RAM chips and have to ship two custom high-speed chips; bye-bye margins), or making requests to memory cross the external bus twice, adding a lot of latency and possibly hurting as many tasks as it helps.

I find it much more likely that if there is a special back-side bus, it will attach to a memory pool that is high-bandwidth, relatively high-latency, directly accessed, and mostly used as the frame buffer for the iGPU.
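
To put some completely made-up numbers on the latency argument, assuming the memory controllers stay on the main die; only the structure of the sums matters, not the values.

    # Toy latency model for the "L4 behind an external bus" objection.
    bus_hop_ns   = 20.0   # one crossing of the external link (assumed)
    l4_lookup_ns = 10.0   # tag check / array access on the external chip (assumed)
    dram_ns      = 60.0   # system DRAM access from the on-die controller (assumed)

    no_l4   = dram_ns                                  # today's path
    l4_hit  = bus_hop_ns + l4_lookup_ns + bus_hop_ns   # out and back over the link
    l4_miss = l4_hit + dram_ns                         # two link crossings, then DRAM anyway

    print(f"no L4   : {no_l4:.0f} ns")
    print(f"L4 hit  : {l4_hit:.0f} ns")
    print(f"L4 miss : {l4_miss:.0f} ns  <- extra latency every miss now eats")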

Big L4. How big do you think that would be, maybe 512 MB?

Depends on what it is. If it's some kind of DRAM, 256-512 MB is realistic. If it's SRAM or a relative, 64 MB would be about it.
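
For a rough sense of why the split falls where it does, here's a back-of-envelope area check; the cell sizes and the 1.6x array/periphery overhead factor are guesses on my part, not anybody's spec.

    # Ballpark area behind the DRAM-vs-SRAM capacity split.
    BITS_PER_MB = 8 * 1024 * 1024

    def array_mm2(capacity_mb, cell_um2, overhead=1.6):
        return capacity_mb * BITS_PER_MB * cell_um2 * overhead / 1e6  # um^2 -> mm^2

    print(f" 64 MB SRAM (~0.09 um^2/cell): ~{array_mm2(64, 0.09):.0f} mm^2")
    print(f"256 MB DRAM (~0.02 um^2/cell): ~{array_mm2(256, 0.02):.0f} mm^2")
    print(f"512 MB DRAM (~0.02 um^2/cell): ~{array_mm2(512, 0.02):.0f} mm^2")
    # SRAM much past 64 MB turns into a huge die; DRAM gets you a few hundred
    # MB in a comparable footprint.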
 
I wouldn't be so sure about that.
Supposedly, there'll be much less CPU-GPU bandwidth overhead because of not having to pass the same data around different places in the system RAM.

I don't think so. Pixel data and texture files always take up a lot of space and there isn't enough cache in the CPU, so they still need to be kept in main memory, which means high bandwidth is still required.
 
I find it really, really unlikely that it will actually act as an L4 cache. Memory requests would have to be issued from the largest cache level, which means either moving the memory controllers to the external chip (so they can't use cheap mass-produced RAM chips and have to ship two custom high-speed chips; bye-bye margins), or making requests to memory cross the external bus twice, adding a lot of latency and possibly hurting as many tasks as it helps.

I find it much more likely that if there is a special back-side bus, it will attach to a memory pool that is high-bandwidth, relatively high-latency, directly accessed, and mostly used as the frame buffer for the iGPU.
Fully agree.

Depends on what it is. If it's some kind of DRAM, 256-512 MB is realistic. If it's SRAM or a relative, 64 MB would be about it.
Samsung started delivering 8Gbit DRAM chips over three years ago. Intel/Micron can certainly do much better than that in 2013, should they so desire. I think it can safely be assumed that they will equip the chip with enough memory to take the business of whatever part of the graphics market they're after. My personal guess would be 1GB.

Once CPUs start to integrate the graphics memory, the bandwidth advantage of discrete cards could well be a thing of the past, leaving only the very high TDP/large die area parts with any justification at all. I'm not sure how I feel about that personally; it's not a market segment I favour, and I'd rather see dedicated GPUs fight back with large dies alone, but I doubt that will happen. It's an open question whether I will ever buy a dedicated GPU again, either privately or professionally.
 
I'm surprised they chose to use SRAM instead of an 8x bigger chunk of DRAM. I guess it's lower latency and potentially faster, but it seems like a much bigger amount would be more useful. Then again, this being Intel, they probably didn't want to have to go outside the company for DRAM and chose SRAM because it's something they can make in-house.
 
I don't think it is SRAM.

Really? That seems tiny if it's DRAM on an interposer; I wouldn't think they would need an interposer at all if it was only 64MB of DRAM. I mean, in 2013 they should have 4Gbit DRAM chips, and 2Gbit has a raw die size of ~55mm^2 nowadays IIRC? That's got to be below 10mm^2 for 64MB of DRAM, so what's the interposer for?

Unless of course this info is completely wrong and it's much more than 64MB... I was expecting 512MB or 1GB stacked next to the die, personally.
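
For what it's worth, scaling those die-size numbers down (linear with capacity, which ignores pads and periphery; the 4Gbit-class line just assumes double the density at the same die size):

    # How much raw DRAM die area a 64 MB (512 Mbit) slice would need.
    target_mbit = 512.0
    generations = {
        "2Gbit-class die, ~55 mm^2": (2048.0, 55.0),
        "4Gbit-class die, ~55 mm^2 (assumed)": (4096.0, 55.0),
    }

    for gen, (die_mbit, die_mm2) in generations.items():
        print(f"{gen}: ~{die_mm2 * target_mbit / die_mbit:.0f} mm^2 for 64 MB")
    # ~14 mm^2 on the older density, ~7 mm^2 on the newer one: tiny either
    # way, which is the point. Hard to see what an interposer buys you there.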
 
Maybe it was 512 MBytes but the article's authors thought it was 512 Mbits and converted it to 64 MBytes.
 
And yet, it would be epic badassery if there were 512 MBytes of RAM stacked on the die at full clock with a fat connection. I'm not counting on that, though...
 