AMD: Sea Islands R1100 (8*** series) Speculation/Rumour Thread

I just don't see how perf/flop matters.

Like others were trying to argue in the other thread, perf/flop doesn't matter to anything tangible. However, it does matter when you're trying to understand what an architecture is doing. Since when does intellectual curiosity have such a low standing among this crowd? :LOL:
 
Southern Islands made huge strides in perf/flop but still isn't quite up to Fermi. I know there are some who aren't interested in such trivia, but it's interesting to note that even after dropping VLIW, SI is still lagging Fermi somewhat in utilization, at least in these tests. nVidia will probably still need upward of 3.5 TFLOPS to compete for the compute crown though.

Note that the Mandelbrot numbers should be even higher on a Tesla card given the artificial double-precision hobbling of Geforces. Will have to wait and see if FirePros have more DP throughput than Radeons.

Too bad ComputeMark is based on DirectCompute... Anyway, I suspect that SI performance can be improved by moving to OpenCL.

If the workload is unbalanced (i.e. some wavefronts run for much longer than others), then SI would benefit from running multiple kernels simultaneously. In OpenCL, we are able to run concurrent kernels to achieve higher utilization when there are no output dependencies. This should all be documented publicly soon.
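
For what it's worth, here's a minimal host-side sketch of the kind of thing I mean. This is my own illustration, not ComputeMark or AMD sample code; kernelA/kernelB and the sizes are placeholders, and whether the two launches actually overlap is entirely up to the runtime.

Code:
#include <CL/cl.h>

// Enqueue two independent kernels with no event dependencies on an
// out-of-order queue, so the runtime is free to overlap them when the
// outputs don't depend on each other.
void launch_independent(cl_context ctx, cl_device_id dev,
                        cl_kernel kernelA, cl_kernel kernelB)
{
    cl_int err = CL_SUCCESS;
    cl_command_queue q = clCreateCommandQueue(
        ctx, dev, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);

    size_t globalA = 1 << 20;   // a long-running kernel
    size_t globalB = 1 << 14;   // a short one that can fill otherwise-idle CUs

    // No wait lists and no shared output buffers: the runtime may schedule
    // the second launch before the first has drained.
    clEnqueueNDRangeKernel(q, kernelA, 1, NULL, &globalA, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q, kernelB, 1, NULL, &globalB, NULL, 0, NULL, NULL);

    clFinish(q);
    clReleaseCommandQueue(q);
}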
 
Like others were trying to argue in the other thread, perf/flop doesn't matter to anything tangible. However, it does matter when you're trying to understand what an architecture is doing. Since when does intellectual curiosity have such a low standing among this crowd?

Well, what I should have said was that perf/flop doesn't tell you much when weighing architectures.
 
Too bad ComputeMark is based on DirectCompute... Anyway, I suspect that SI performance can be improved by moving to OpenCL.

If the workload is unbalanced (i.e. some wavefronts run for much longer than others), then SI would benefit from running multiple kernels simultaneously. In OpenCL, we are able to run concurrent kernels to achieve higher utilization when there are no output dependencies. This should all be documented publicly soon.

I take it that's not possible with DirectCompute. Is that a limitation of DirectCompute itself, or just the way it's implemented at the moment?
 
Note that the Mandelbrot numbers should be even higher on a Tesla card given the artificial double-precision hobbling of Geforces. Will have to wait and see if FirePros have more DP throughput than Radeons.

There's a little documentation on Jan Vlietinck's homepage as to some of the tests used in ComputeMark:
http://users.skynet.be/fquake/

The tests though were obviously modified by Robert Varga, so the descriptions might not necessarily still apply.
 
Southern Islands made huge strides in perf/flop but still isn't quite up to Fermi.
I'm afraid to say I think it's reasonable to assume the driver compiler isn't sorted yet. We've seen plenty of cases where NVidia's compute performance fluctuates wildly with different drivers.

Note that the Mandelbrot numbers should be even higher on a Tesla card given the artificial double-precision hobbling of Geforces. Will have to wait and see if FirePros have more DP throughput than Radeons.
This code doesn't use double.
 
You're right, I can't be sure which version Hardware Canucks has used.

I didn't rummage properly earlier and looked solely at the float version.

But, when I look at the other HLSL file (mandel.hlsl instead of mandel_float.hlsl) it uses float, not double, for computations. So I don't know what's happening when the "double" version of the benchmark is running.

---

If the benchmark was using doubles I expect the performance difference between scalar and vector on HD6970 would be far smaller than it is. This is because float-scalar is about 40% utilisation on HD5870 (or, at a guess, 50% utilisation on HD6970). Double-scalar would be in the region of 100% utilisation on HD6970 and therefore practically as fast as double-vector (or faster, due to reduced incoherence).

(GPU Shader Analyzer 1.54 crashes when I try to change the shader to use double in place of float :rolleyes: )
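
For reference, this is the shape of loop we're talking about; a plain C++ rendition I wrote to make the float/double point concrete, not the actual ComputeMark shader.

Code:
// A scalar Mandelbrot iteration for one point, written in float; a "double"
// variant would simply swap the type. The dependent x/y chain is why a
// one-point-per-thread "scalar" shader struggles to fill VLIW slots, while
// the "vector" variant iterates several points per thread to give the
// compiler independent instructions to pack.
int mandel_iterations(float cx, float cy, int maxIter)
{
    float x = 0.0f, y = 0.0f;
    int i = 0;
    while (i < maxIter && x * x + y * y < 4.0f)
    {
        float xn = x * x - y * y + cx;   // needs the previous x and y
        y = 2.0f * x * y + cy;           // ditto
        x = xn;
        ++i;
    }
    return i;
}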

---

So the evidence I have is that we're looking at float, not double, computation.
 
If the benchmark was using doubles I expect the performance difference between scalar and vector on HD6970 would be far smaller than it is. This is because float-scalar is about 40% utilisation on HD5870 (or, at a guess, 50% utilisation on HD6970). Double-scalar would be in the region of 100% utilisation on HD6970 and therefore practically as fast as double-vector (or faster, due to reduced incoherence).

Yeah that makes sense, not sure what it's doing then.

/shrug
 
I take it that's not possible with DirectCompute. Is that a limitation of DirectCompute itself, or just the way it's implemented at the moment?
It might be possible with DirectCompute as well, but I know the DX runtime gets in the way at times. I.e. if kernels use the same UAV indices there can be stalls inserted by the runtime. That's my understanding at least, DirectCompute is not my domain.
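
To make the UAV point concrete, a bare-bones D3D11 sketch (my own illustration with placeholder names; the stall behaviour described above, if it happens, would come from the runtime's dependency tracking, not from anything explicit in this code):

Code:
#include <d3d11.h>

// Two back-to-back compute dispatches that are actually independent: each
// writes its own resource, but both are bound through the same UAV slot (u0).
// Per the understanding above, reusing the same slot index is where the
// runtime may conservatively insert a wait between the dispatches.
void dispatch_pair(ID3D11DeviceContext* ctx,
                   ID3D11ComputeShader* csA, ID3D11ComputeShader* csB,
                   ID3D11UnorderedAccessView* uavA,
                   ID3D11UnorderedAccessView* uavB)
{
    ctx->CSSetShader(csA, NULL, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &uavA, NULL);   // kernel A's output on u0
    ctx->Dispatch(256, 1, 1);

    ctx->CSSetShader(csB, NULL, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &uavB, NULL);   // different resource, same slot index
    ctx->Dispatch(64, 1, 1);
}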
 
It might be possible with DirectCompute as well, but I know the DX runtime gets in the way at times. I.e. if kernels use the same UAV indices there can be stalls inserted by the runtime. That's my understanding at least, DirectCompute is not my domain.

OK, thanks.
 
Is there evidence or documentation for this?
There is some statement in the old Stream Computing guides about the rasterization process. It pertains to pixel shaders doing compute, but I guess one can relate the "blocks" within the domain of execution to triangles in the case of graphics.
Stream Computing User Guide 1.4 said:
Thread Optimization
ATI hardware is designed to maximize the number of active threads in a wavefront. So, if there are partial 8x8 blocks [smaller triangles], the stream processor tries to fill the rest of the wavefront from other blocks [triangles], but within the quad limitation.
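
To put rough numbers on that "quad limitation", here's a toy model I threw together (my own, not from the guide, and the per-triangle pixel counts are made up): it packs partially covered 8x8 blocks into 64-lane wavefronts at 2x2-quad granularity and reports how many lanes end up doing useful work.

Code:
#include <cstdio>
#include <vector>

int main()
{
    // Made-up covered-pixel counts for a handful of small triangles.
    std::vector<int> coveredPixels = { 6, 10, 3, 24, 7, 14 };

    const int waveSize = 64;
    int allocatedLanes = 0;  // lanes consumed once rounded up to whole 2x2 quads
    int usefulLanes    = 0;  // lanes that actually shade a covered pixel

    for (int px : coveredPixels)
    {
        usefulLanes    += px;
        allocatedLanes += ((px + 3) / 4) * 4;  // quad limitation: whole quads only
    }

    int waves = (allocatedLanes + waveSize - 1) / waveSize;
    std::printf("wavefronts needed: %d, useful-lane utilisation: %.1f%%\n",
                waves, 100.0 * usefulLanes / (waves * waveSize));
    return 0;
}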
 
Probably a stupid question, but is it likely that Xbox3's GPU might be based on Sea Islands?

I don't think it's a stupid question at all, in fact I think that's quite likely. It all depends on the timeframe though: if it's slated for a launch in 2013 then it's probably Sea Islands, if it's sometime in 2014 or later it's probably going to be based on the generation after Sea Islands. Anyway, I don't expect Sea Islands to be revolutionary in nature. It's probably the same base GCN architecture with some updates and tweaks, like Cypress to Cayman.

My personal speculation for the next Xbox is an AMD Fusion chip, with say 12-16 Bulldozer cores (Steamroller/Excavator) and a Tahiti-class GPU (at 22/20nm in say late 2014). Of course this is all a completely made-up and random thought :p
 
I don't think it's a stupid question at all, in fact I think that's quite likely. It all depends on the timeframe though: if it's slated for a launch in 2013 then it's probably Sea Islands, if it's sometime in 2014 or later it's probably going to be based on the generation after Sea Islands. Anyway, I don't expect Sea Islands to be revolutionary in nature. It's probably the same base GCN architecture with some updates and tweaks, like Cypress to Cayman.

My personal speculation for the next Xbox is an AMD Fusion chip, with say 12-16 Bulldozer cores (Steamroller/Excavator) and a Tahiti-class GPU (at 22/20nm in say late 2014). Of course this is all a completely made-up and random thought :p

That would never fit in a reasonable power budget.

DK
 
That would never fit in a reasonable power budget.

DK

At the highest possible clocks and taking current 32/28nm power levels, yes, it wouldn't. But at 22/20nm and with lower clocks I think they should manage to get it within about 200 watts (which afaik is what the original Xbox 360 consumed). The CPU should be fine at lower clocks since it's got a large number of cores. GPU power would be more important. A 60:40 split between the GPU and CPU would mean 120 watts for the GPU and 80 watts for the CPU. I'd imagine that a 16-core Bulldozer at 22nm could run at say 2.2-2.4 GHz and stay within that power budget. And a 22nm shrink of Tahiti with similar or lower clocks could well scale down to 120 watts or so. I think they could manage it, but like I said, I'm just pulling things and numbers out of thin air, I'm sure I'm waaay off :smile: But overall, a Fusion chip would be quite cost competitive, and simplify the design as well. I wouldn't at all be surprised if they end up going with a Fusion chip (even if the specs are radically different).
 
I don't think it's a stupid question at all, in fact I think that's quite likely. It all depends on the timeframe though: if it's slated for a launch in 2013 then it's probably Sea Islands, if it's sometime in 2014 or later it's probably going to be based on the generation after Sea Islands. Anyway, I don't expect Sea Islands to be revolutionary in nature. It's probably the same base GCN architecture with some updates and tweaks, like Cypress to Cayman.

My personal speculation for the next Xbox is an AMD Fusion chip, with say 12-16 Bulldozer cores (Steamroller/Excavator) and a Tahiti-class GPU (at 22/20nm in say late 2014). Of course this is all a completely made-up and random thought :p

Wrong choice given Bulldozer's horrible performance. A smaller CPU with higher IPC would be a much better option.
 