Haswell vs Kaveri

Raqia · Nov 13, 2013

RecessionCone said:
The context just holds memory allocations. It's not going to gain any speed from HSA.

Well, let's say you want to grep the results of your sort (which I think makes much more sense to do on a CPU) to find patterns, or present them in an Excel range for a client; now you have to pass that whole context back into the CPU's memory space, right? I'd be thrilled if this were a fast and abstractable process on Kaveri.

RecessionCone · Nov 14, 2013

Raqia said:
Well, let's say you want to grep the results of your sort (which I think makes much more sense to do on a CPU) to find patterns, or present them in an Excel range for a client; now you have to pass that whole context back into the CPU's memory space, right? I'd be thrilled if this were a fast and abstractable process on Kaveri.

=)
It's already fast and abstractable. I haven't had to deal with this problem since I started using Thrust years ago.

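//headers this snippet needs
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <iostream>

//your CPU data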
T* data; int len;

//allocate space and transfer data
thrust::device_vector<T> gpu_data(data, data+len);

//sort it
thrust::sort(gpu_data.begin(), gpu_data.end());

//look at an element (transfers data for you)
T el = gpu_data[10];
std::cout << el << std::endl;

//transfer the whole vector back
thrust::host_vector<T> sorted_data = gpu_data;

Abstraction over memory spaces isn't too bad...
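
For the "sort on the GPU, then search on the CPU" case raised above, a minimal sketch might look like the following. It assumes `int` elements and a CUDA toolchain with Thrust available; the function name `sort_on_gpu_then_search_on_cpu` and the use of `std::lower_bound` as a stand-in for the CPU-side search are illustrative choices, not something from the posts above.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): sort `input` on the GPU, then
// search the sorted result on the CPU. Returns the index of the first element
// >= `key` in the sorted data, or the data size if there is no such element.
std::size_t sort_on_gpu_then_search_on_cpu(const std::vector<int>& input, int key)
{
    // Allocate device memory and copy the host data over in one transfer.
    thrust::device_vector<int> gpu_data(input.begin(), input.end());

    // Sort entirely on the device.
    thrust::sort(gpu_data.begin(), gpu_data.end());

    // One bulk copy back to host memory. Indexing gpu_data[i] in a loop would
    // instead trigger a separate small transfer for every element read.
    thrust::host_vector<int> sorted = gpu_data;

    // From here on this is ordinary CPU code working on host memory.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
    return static_cast<std::size_t>(it - sorted.begin());
}
```

Calling it with `{5, 3, 9, 1}` and a key of `4` sorts the data to `1, 3, 5, 9` and returns index `2`. The point of the bulk copy is that the CPU-side pass touches every element, so paying for one large transfer is likely cheaper than many implicit single-element ones.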

Raqia · Nov 14, 2013

RecessionCone said:
=)
It's already fast and abstractable. I haven't had to deal with this problem since I started using Thrust years ago.

//your CPU data
T* data; int len;

//allocate space and transfer data
thrust::device_vector<T> gpu_data(data, data+len);

//sort it
thrust::sort(gpu_data.begin(), gpu_data.end());

//look at an element (transfers data for you)
T el = gpu_data[10];
std::cout << el << std::endl;

//transfer the whole vector back
thrust::host_vector<T> sorted_data = gpu_data;

Abstraction over memory spaces isn't too bad...

Thanks, that's downright clean, I'll have to try it! Still, if you want speed, the whole paradigm of GPU + CPU interoperation still seems limited for now to running long chunks of math-dense code, like massive linear algebra or merge sorts, entirely on the GPU until it finishes, and only then accessing the data, instead of freely interleaving work between the two types of processors.

Anyway, it sounds like the memory allocation issue is addressed with Kaveri, so after some initial overhead maybe we'll be able to do all this w/o specialized memory-abstracting libraries, just passing raw pointers around to GPU threads (or waves, or warps, whatever they call them) running our functions.

It'd be nice to have new CPU instructions in the future that directly use the GPU hardware like the extra-wide SIMD unit it is, on a synchronous CPU thread, and let the OS or a more sophisticated GPU scheduler handle managing resources. (Another possibility is reserving one or two privileged units on the GPU side for this purpose, with full cache coherence logic for those select units, etc.)

Paran · Nov 20, 2013

Andrew Lauritzen said:
Well I contend that the market for "high end" socketed APUs has yet to be proven. The cost analysis just never comes out in favor of these things compared to cheap dGPUs unless you are form factor or power-constrained, and I don't expect that to change any time soon.

So sure, you can say that they fill that niche and thus aren't comparable to anything Intel ships, but I'm not convinced that niche exists to start with.

I guess if they are going to resist that comparison I'll have to wait for the mobile chips. It'll be even harder for them to compete there though due to a process disadvantage I imagine.

Suddenly Intel found a market. Broadwell-K gets GT3e according to this: http://www.cpu-world.com/news_2013/...socket_1150_CPUs_to_feature_GT3_graphics.html

Chabi · Nov 23, 2013

entity279 · Nov 23, 2013

2.0? Does this hint at a Google Chrome-style versioning system or what?

Kaotik · Nov 23, 2013

Source?

no-X · Nov 23, 2013

Maybe: http://vozforums.com/showpost.php?p=64095925&postcount=299 ?

Chabi · Nov 23, 2013

The Kaveri Steamroller core is faster than the Trinity Piledriver core, even at a slower clock speed...

A8-5600K @ 3.60 GHz vs. Kaveri ES 3.5GHz
http://browser.primatelabs.com/geekbench3/compare/209001?baseline=223722

I do not know whether Family 21 Model 48 Stepping 1 marks the Kaveri 2.0 or not, but here is an older Kaveri ES, Family 21 Model 48 Stepping 0:
http://cosmologyathome.org/show_host_detail.php?hostid=187215

moozoo · Nov 23, 2013

Dedicated SSD PCIe = SATA Express?

pjbliverpool · Nov 23, 2013

Chabi said:
The Kaveri Steamroller core is faster than the Trinity Piledriver core, even at a slower clock speed...

A8-5600K @ 3.60 GHz vs. Kaveri ES 3.5GHz
http://browser.primatelabs.com/geekbench3/compare/209001?baseline=223722

I do not know whether Family 21 Model 48 Stepping 1 marks the Kaveri 2.0 or not, but here is an older Kaveri ES, Family 21 Model 48 Stepping 0:
http://cosmologyathome.org/show_host_detail.php?hostid=187215

Cool, looks like this might actually be a decent CPU. Just a shame about the lack of GPU and memory oomph.

Chabi · Nov 23, 2013

moozoo said:
Dedicated SSD PCIe = SATA Express?

Direct-access PCIe lanes to the SSD
(I think)

kalelovil · Nov 24, 2013

Chabi said:
The Kaveri Steamroller core is faster than the Trinity Piledriver core, even at a slower clock speed...

A8-5600K @ 3.60 GHz vs. Kaveri ES 3.5GHz
http://browser.primatelabs.com/geekbench3/compare/209001?baseline=223722

A good multi-threaded improvement as expected (likely due to the split decoders per module), but single-threaded performance in those benchmarks remains lacklustre.

I see the Kaveri ES system has only half as much DRAM installed as the Trinity system. Is the memory size, or the possibility that it is running only a single module in single-channel mode, likely to affect those benchmarks?
Less likely but possible: the low installed memory size and lack of memory details could point to the ES using GDDR5M, which AMD was earlier strongly rumoured to be considering but will not bring to market at this time.

DSC · Nov 24, 2013

Intel has released 10.18.10.3345 drivers for IVB & HSW.

32bit
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=23406

64bit
https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=23405

Kaotik · Nov 24, 2013

Anyone know if the "2.0" refers to "Kaveri 2.0" or "APU 2.0" now that it has all the HSA stuff?

Alexko · Nov 24, 2013

There have been mentions of Kaveri 2.0 before, e.g. on the LinkedIn profiles of a few AMD employees. I think the original Kaveri was scrapped and replaced by what's about to be released, hence the delay and the introduction of Richland.

Chabi · Nov 24, 2013

http://www.bitsandchips.it/9-hardwa...lla-apu-a8-76x0k-con-moltiplicatore-sbloccato

Alexko · Nov 24, 2013

The article says that AMD has yet to decide what the Turbo frequency will be, but that seems hard to believe so close to launch.

fellix · Nov 24, 2013

Chabi said:
The Kaveri Steamroller core is faster than the Trinity Piledriver core, even at a slower clock speed...

A8-5600K @ 3.60 GHz vs. Kaveri ES 3.5GHz
http://browser.primatelabs.com/geekbench3/compare/209001?baseline=223722

I do not know whether Family 21 Model 48 Stepping 1 marks the Kaveri 2.0 or not, but here is an older Kaveri ES, Family 21 Model 48 Stepping 0:
http://cosmologyathome.org/show_host_detail.php?hostid=187215

Is the Mandelbrot FPU subtest x87 coded? Looks like the legacy stack remains untouched.

Raqia · Nov 24, 2013

Chabi said:
http://www.bitsandchips.it/9-hardwa...lla-apu-a8-76x0k-con-moltiplicatore-sbloccato

Looks like the CEO is slashing costs and ensuring execution w/ this next round of parts. The bulk process is probably cheaper and TSMC can probably deliver better volume too, but it seems like clock speed is down; hopefully their turbo boost is working more selectively now. We have DDR3 instead of GDDR5, we're keeping sockets yet again, and there's no enthusiast part. The enthusiast in me wants them to liberate the Steamroller B core, but maybe I should buy the company's stock for some consolation when it eventually turns around.
