For all we know... is Rys correct?

Is GCN the most elegant architecture despite its rough edges, or did Kepler, a much-improved Fermi, change this? Which architecture has a brighter future?

My vote is for Rys! And the proof is PITCAIRN! I see GCN as, at least on paper, the more straightforward and flexible architecture.

And what about the rough edges? What are the rough edges in these architectures?
 
Love_In_Rio said:
After the front-end fiasco of R600, ATI improved the ROPs in RV720. But what has happened to them since then?
I think you and Atlantis have a very different definition of front-end. ;)

ROPs are typically considered the very back-end, because they're the last stage in the complete pipe that touches a pixel.
 
You're reading an awful lot into that statement, IMHO. It's a reasonably murky thing, there are niceties on both sides really, and I don't think that currently employed investigative methods do them proper justice. Also, the time is coming for us to keep a tighter eye on software... that's where the big wins are going to come from in the medium term, with hardware being on a convergent-ish path.
 
Would it be accurate to state that at the ISA level GCN is not SIMT, but explicit SIMD programming?
In terms of synchronization, which under SIMT can lead to weird outcomes with per-lane synchronization under divergence, isn't a wavefront treated as a single thread executing 64-wide vector instructions?
 
Would it be accurate to state that at the ISA level GCN is not SIMT, but explicit SIMD programming?
In terms of synchronization, which under SIMT can lead to weird outcomes with per-lane synchronization under divergence, isn't a wavefront treated as a single thread executing 64-wide vector instructions?

I have never seen GCN described as SIMT by AMD, always as SIMD + scalar.
 
GCN is a scalar + vector design at the ISA level. We map languages like OpenCL and DirectCompute, which are SIMT languages (although I personally dislike that term...), onto that architecture. And you are right that thinking about synchronization and communication in a SIMT model is a little funky, since the real nature of the machine you're mapping to does bleed through, especially in assumptions about if/when/how you can remove barriers, the performance behavior under divergence, or even the visibility ordering of things.
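To make the mapping concrete, here is a minimal sketch (plain Python, all names illustrative and not actual GCN ISA) of how per-work-item SIMT code can be lowered onto one thread executing wide vector instructions with a lane mask, roughly the way a wavefront runs:

```python
WAVE_WIDTH = 8  # real GCN wavefronts are 64 lanes; 8 keeps the example small

def simt_kernel(x):
    # What the programmer writes: scalar-looking, per-work-item code.
    if x > 0:
        return x * 2
    else:
        return -x

def wavefront_execute(xs):
    # What the machine actually does: one thread, vector ops, a lane mask.
    assert len(xs) == WAVE_WIDTH
    exec_mask = [True] * WAVE_WIDTH
    result = [0] * WAVE_WIDTH

    # "if x > 0" becomes a vector compare producing a mask.
    taken = [x > 0 for x in xs]

    # Then-branch: the same instructions issue for lanes where taken is set.
    for i in range(WAVE_WIDTH):
        if exec_mask[i] and taken[i]:
            result[i] = xs[i] * 2

    # Else-branch: the mask is inverted and the other path issues.
    for i in range(WAVE_WIDTH):
        if exec_mask[i] and not taken[i]:
            result[i] = -xs[i]

    return result

xs = [3, -1, 0, 5, -4, 2, -7, 1]
assert wavefront_execute(xs) == [simt_kernel(x) for x in xs]
```

Both branches issue whenever the wavefront diverges, which is exactly why divergence costs performance and why per-lane synchronization assumptions get funky.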
 
mhouston said:
GCN is a scalar + vector design at the ISA level. We map languages like OpenCL and DirectCompute, which are SIMT languages (although I personally dislike that term...) onto that architecture.
Is the scalar part ever used/useful for vertex or pixel shaders?
 
So is SIMT just SPMD by another name? At the hardware level, I've never understood the difference between SIMT and SIMD + predication + scatter/gather.

Also, I've got a similar question to silent_guy - can bits of a kernel that are known to be constant across work-items be run on the scalar core and then broadcast to the vector ALUs for power savings?
 
Also, I've got a similar question to silent_guy - can bits of a kernel that are known to be constant across work-items be run on the scalar core and then broadcast to the vector ALUs for power savings?

Can't nearly everything that is known to be constant across work items be lifted completely out of the computation and just injected as constants?
 
One of the consequences of having the scalar instructions is that it exposes what was once automated in the divergence handling of older GPU architectures.

The GCN presentation showed that vector instructions can take a broadcast value from an SGPR.
 
@tunafish - I don't mean compile-time constants. Contrived example (just the first thing that popped into my head): let's say you are writing a software rasterizer, and your design fires off wavefronts of 8x8 pixel blocks coming from a single triangle. The type of thing I'm thinking of would be, for example, to set up edge equations in the scalar unit and then use the results in the vector units to decide whether or not to shade a particular pixel.

@3dilettante - thanks. That aspect of GCN (broadcast, and making divergence handling explicit) seems very nice indeed.
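A minimal sketch of that contrived rasterizer example (plain Python, all names illustrative): the edge-equation coefficients are uniform across the wavefront, so they could be computed once as scalar work and broadcast to the lanes for the per-pixel coverage test:

```python
def scalar_setup(v0, v1):
    # Runs once per wavefront: coefficients of the edge equation
    # A*x + B*y + C for the edge v0 -> v1. In hardware these would
    # land in scalar registers (SGPRs).
    a = v0[1] - v1[1]
    b = v1[0] - v0[0]
    c = v0[0] * v1[1] - v1[0] * v0[1]
    return a, b, c

def vector_coverage(a, b, c, pixels):
    # Runs per lane: the scalar results are broadcast to every lane,
    # and each lane tests its own pixel against the edge.
    return [a * x + b * y + c >= 0 for (x, y) in pixels]

# Edge from (0,0) to (4,0): pixels with y >= 0 are on the inside.
a, b, c = scalar_setup((0, 0), (4, 0))
mask = vector_coverage(a, b, c, [(1, 1), (2, -1), (3, 2), (0, 0)])
```

The resulting mask is exactly the kind of per-lane predicate that would then gate the pixel-shading work in the vector units.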
 
You're reading an awful lot into that statement, IMHO. It's a reasonably murky thing, there are niceties on both sides really, and I don't think that currently employed investigative methods do them proper justice. Also, the time is coming for us to keep a tighter eye on software... that's where the big wins are going to come from in the medium term, with hardware being on a convergent-ish path.

*nods* nVidia moving more control logic to software potentially means a wider margin for the performance-improvement curve in future drivers. As long as they can keep the overhead under control in the initial drivers, of course.
 
At the hardware level, I've never understood the difference between SIMT and SIMD + predication + scatter/gather.
That's understandable, as there is none. :LOL:
"SIMT" (a really ridiculous term in my opinion) is just a stupid way to describe the mapping of a problem formulated in an implicitly parallel way onto a SIMD architecture with lane-masking support. That's all.
At least so far. The stuff one hears about nV's Einstein architecture sounds a bit like this may change in the future (some flexible hybrid of MIMD and SIMD).
Also, I've got a similar question to silent_guy - can bits of a kernel that are known to be constant across work-items be run on the scalar core and then broadcast to the vector ALUs for power savings?
GCN's scalar ALU is a pure integer unit. Think of it as a really cut-down integer core. It handles the program and control flow (including the lane masks) for the threads (the real ones, a.k.a. wavefronts in AMD's case), while the SIMD units do the actual computations.
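That division of labor can be sketched in a few lines (plain Python, names illustrative, not actual GCN ISA): the scalar unit owns the loop counter and the lane mask, while one wide vector instruction does the arithmetic for all active lanes:

```python
def run_wavefront(data, exec_mask, iterations):
    counter = iterations                  # loop counter: scalar integer work
    while counter > 0:                    # scalar unit decides whether to loop
        # Vector unit: one wide instruction; masked-off lanes are untouched.
        data = [x + 1 if active else x
                for x, active in zip(data, exec_mask)]
        counter -= 1                      # scalar integer decrement
    return data

# Lane 2 masked off: it keeps its value while active lanes are incremented.
result = run_wavefront([0, 10, 20, 30], [True, True, False, True], 3)
```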
 
Is GCN the most elegant architecture despite its rough edges, or did Kepler, a much-improved Fermi, change this? Which architecture has a brighter future?
Good question; they are closer to each other than ever. Apparently GCN has some sharp edges to be polished, so let's wait.

Is the scalar part ever used/useful for vertex or pixel shaders?
It is useful; you don't waste significant resources on redundant operations. But I can't say whether it is actually used, judging by previous drivers...
 