Haswell vs Kaveri

The context just holds memory allocations. It's not going to gain any speed from HSA.

Well, let's say you want to grep the results of your sort (which I think makes much more sense to do on a CPU) to find patterns, or present them in an Excel range for a client. Now you have to pass that whole context back into the CPU's memory space, right? I'd be thrilled if this were a fast and abstractable process on Kaveri.
 

=)
It's already fast and abstractable. I haven't had to deal with this problem since I started using Thrust years ago.


//your CPU data
T* data; int len;

//allocate space and transfer data
thrust::device_vector<T> gpu_data(data, data+len);

//sort it
thrust::sort(gpu_data.begin(), gpu_data.end());

//look at an element (transfers data for you)
T el = gpu_data[10];
std::cout << el << std::endl;

//transfer the whole vector back
thrust::host_vector<T> sorted_data = gpu_data;

Abstraction over memory spaces isn't too bad...
 

Thanks, that's downright clean, I'll have to try it! Still, if you want speed, the whole paradigm of GPU + CPU interoperation for now seems limited to running long chunks of math-dense code, like massive linear algebra or merge sorts, entirely on the GPU until it finishes, and only then accessing the data, instead of freely interleaving the two types of processors.

Anyway, it sounds like the memory allocation issue is addressed with Kaveri, so after some initial overhead maybe we'll be able to do all this w/o specialized memory-abstracting libraries and just pass raw pointers around to GPU threads (or waves, warps, whatever they call them) running our functions.

It'd be nice to have new CPU instructions in the future that directly use the hardware GPU like the extra-wide SIMD unit it is, on a synchronous CPU thread, and let the OS or a more sophisticated GPU scheduler handle managing resources. (Another possibility is reserving one or two privileged units on the GPU side for this purpose, with full cache coherence logic for those select units, etc.)
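To make the "finish on the GPU, then read back" point a bit more concrete, here is a rough host-only sketch that just counts how often data crosses between the two memory spaces. Everything in it is made up for illustration: `ModelDeviceBuffer`, `transfers_interleaved` and `transfers_bulk` are hypothetical names, the "device" is only another `std::vector`, and it is not how Thrust is implemented. The one assumption it leans on is that a host-side element read like `gpu_data[10]` costs a separate small copy each time, while assigning to a `host_vector` is one bulk copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for a device buffer. This is NOT Thrust:
// the "device" storage is an ordinary std::vector, and every crossing
// between the two modelled memory spaces bumps a counter.
template <typename T>
class ModelDeviceBuffer {
public:
    // Bulk upload: counted as one transfer for the whole range.
    ModelDeviceBuffer(const T* first, const T* last) : storage_(first, last) {
        ++transfers_;
    }

    // Work done "on the device": no transfer is counted.
    void sort() { std::sort(storage_.begin(), storage_.end()); }

    // Single-element read from the host: one transfer per call.
    T read(std::size_t i) const {
        assert(i < storage_.size());
        ++transfers_;
        return storage_[i];
    }

    // Bulk download: counted as one transfer for the whole buffer.
    std::vector<T> to_host() const {
        ++transfers_;
        return storage_;
    }

    std::size_t size() const { return storage_.size(); }
    std::size_t transfers() const { return transfers_; }

private:
    std::vector<T> storage_;
    mutable std::size_t transfers_ = 0;
};

// Host code that touches the sorted data one element at a time.
inline std::size_t transfers_interleaved(const std::vector<int>& host) {
    ModelDeviceBuffer<int> dev(host.data(), host.data() + host.size());
    dev.sort();
    long long sum = 0;
    for (std::size_t i = 0; i < dev.size(); ++i) sum += dev.read(i);
    (void)sum;  // the sum only exists to force the reads
    return dev.transfers();
}

// Host code that copies everything back once, then works locally.
inline std::size_t transfers_bulk(const std::vector<int>& host) {
    ModelDeviceBuffer<int> dev(host.data(), host.data() + host.size());
    dev.sort();
    std::vector<int> back = dev.to_host();
    long long sum = 0;
    for (int v : back) sum += v;
    (void)sum;
    return dev.transfers();
}
```

In this model the element-by-element version pays one transfer per element plus the upload, while the bulk version pays two transfers in total no matter how large the input is. That gap is the reason fine-grained interleaving looks unattractive when the memory spaces are separate.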
 
Well I contend that the market for "high end" socketed APUs has yet to be proven. The cost analysis just never comes out in favor of these things compared to cheap dGPUs unless you are form factor or power-constrained, and I don't expect that to change any time soon.

So sure, you can say that they fill that niche and thus aren't comparable to anything Intel ships, but I'm not convinced that niche exists to start with :)

I guess if they are going to resist that comparison I'll have to wait for the mobile chips. It'll be even harder for them to compete there though due to a process disadvantage I imagine.


Suddenly Intel found a market. Broadwell-K gets GT3e according to this: http://www.cpu-world.com/news_2013/...socket_1150_CPUs_to_feature_GT3_graphics.html
 
 
The Kaveri Steamroller core is faster than the Trinity Piledriver core, even at a slower clock speed...

A8-5600K @ 3.60 GHz vs. Kaveri ES 3.5GHz
http://browser.primatelabs.com/geekbench3/compare/209001?baseline=223722

I do not know whether Family 21 Model 48 Stepping 1 marks Kaveri 2.0 or not, but here is an older Kaveri ES, Family 21 Model 48 Stepping 0:
http://cosmologyathome.org/show_host_detail.php?hostid=187215

Cool, looks like this might actually be a decent CPU. Just a shame about the lack of GPU and memory oomph.
 

A good multi-threaded improvement as expected (likely due to the split decoders per module), but single-threaded performance in those benchmarks remains lacklustre.

I see the Kaveri ES system has only half as much DRAM installed as the Trinity system. Is the memory size, or the possibility that it is running only a single module in single-channel mode, likely to affect those benchmarks?
Less likely but possible: the low installed memory size and lack of memory details could point to the ES using GDDR5m, which AMD was earlier strongly rumoured to be considering but will not bring to market at this time.
 
Anyone know if the "2.0" refers to "Kaveri 2.0" or "APU 2.0" now that it has all the HSA stuff?
 
There have been mentions of Kaveri 2.0 before, e.g. on the LinkedIn profiles of a few AMD employees. I think the original Kaveri was scrapped and replaced by what's about to be released, hence the delay and the introduction of Richland.
 
The article says that AMD has yet to decide what the Turbo frequency will be, but that seems hard to believe so close to launch.
 
Is the Mandelbrot FPU subtest x87 coded? Looks like the legacy stack remains untouched.
 

Looks like the CEO is slashing costs and ensuring execution w/ this next round of parts. The bulk process is probably cheaper and TSMC can probably deliver better volume too, but it seems like clock speed is down; hopefully their turbo boost is working more selectively now. We have DDR3 instead of GDDR5, we're keeping sockets yet again, and there's no enthusiast part. The enthusiast in me wants them to liberate the Steamroller B core, but maybe I should buy the company's stock for some consolation when it eventually turns around.
 