Difference between various AMD architectures

silhouette

Dear all,

I am trying to understand the differences between various AMD architectures, particularly XENON (360), VLIW 4/5 and GCN. I have read a couple of articles, but I am still not sure I understand them correctly. I would appreciate it if someone could verify my understanding and correct me where I am mistaken:

Xenos (Xbox 360)

- We have 3 CUs, each configured as 16 4D+1D ALUs.
- In each cycle, one 4D vector operation and one scalar operation can be performed simultaneously.
- In each cycle, a single instruction (or two, if both the vector and the scalar ALU are used) is decoded for a CU, and that instruction is performed on 16 units (pixels or vertices).
- If a branch occurs and different units take different paths, the units whose branch path is not taken sit idle until the other path's execution is completed.
- If all units need to wait (e.g. because of a texture access that goes to main memory), a new thread is dispatched instead of blocking the whole CU.

Disadvantages:
1- If an instruction does not operate on full 4D vector (such as r1.xy = r1.xy + r3.zw), some of the ALUs can be left unused for that cycle
2- If not all units take the same branch path, the ALUs for units whose branch path is not taken are left unused for that cycle (i.e. number of units whose branch is not taken x 5D ALUs)
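
To make disadvantage 1 concrete, here is a rough sketch of how I picture it. It is a toy model only: the function name and the idea of counting write-mask components are my own assumptions, not anything taken from AMD documentation.

```python
def vector_alu_utilization(write_mask: str, width: int = 4) -> float:
    """Fraction of a 4-wide vector ALU doing useful work for one instruction.

    write_mask is the set of destination components the instruction writes,
    e.g. "xy" for r1.xy = r1.xy + r3.zw. Hypothetical helper for illustration.
    """
    components = set(write_mask)
    if not components or not components <= set("xyzw"):
        raise ValueError(f"invalid write mask: {write_mask!r}")
    return len(components) / width


# r1.xy = r1.xy + r3.zw only needs two of the four vector lanes.
print(vector_alu_utilization("xy"))    # 0.5
# A full float4 operation keeps every lane busy.
print(vector_alu_utilization("xyzw"))  # 1.0
```

The scalar (1D) ALU is ignored here; in this model it could only help if there happens to be an independent scalar operation to co-issue.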

VLIW 4/5

- We have a number of CUs, each configured as 16 groups of 4/5 1D ALUs.
- In each cycle, up to 4/5 instructions can be issued to utilize the 4/5 ALUs in a group, so up to 4/5 1D operations can be performed.
- In each cycle, up to 4/5 instructions can be decoded for a CU, and those instructions are performed on 16 units (pixels or vertices).
- If a branch occurs and different units take different paths, the units whose branch path is not taken sit idle until the other path is executed.
- If all units need to wait (e.g. because of a texture access that goes to main memory), a new thread is dispatched instead of blocking the whole CU.

Advantages:
- The Xenon disadvantage-(1) can be eliminated as we do not have to operate on 4D vectors any more.

Disadvantages:
- If there are dependencies between instructions and those cannot be resolved, not all of the ALUs in a group of five 1D ALUs can be used
- The number of decode units is bumped from 2 to 5 per CU
- The Xenon disadvantage-(2) (branching) is pretty much the same here (i.e. number of units whose branch is not taken x 5D ALUs)
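
The dependency problem can be sketched with a toy bundle packer. This is a simplified greedy model under my own assumptions (the names `pack_vliw` and `utilization` are made up, and real compilers and hardware have extra rules this ignores): an operation may only go into a bundle if everything it depends on finished in an earlier bundle.

```python
def pack_vliw(ops, width=5):
    """Greedily pack scalar ops into VLIW bundles of at most `width` slots.

    ops: list of (name, set_of_names_it_depends_on) in program order.
    Ops in the same bundle must not depend on each other.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = []
        for name, deps in remaining:
            if len(bundle) == width:
                break
            if deps <= done:  # all inputs were produced by earlier bundles
                bundle.append(name)
        if not bundle:
            raise ValueError("dependency can never be satisfied")
        done.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in done]
        bundles.append(bundle)
    return bundles


def utilization(bundles, width=5):
    """Filled slots divided by available slots."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


independent = [(n, set()) for n in "abcde"]
chain = [("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", {"c"}), ("e", {"d"})]

print(len(pack_vliw(independent)), utilization(pack_vliw(independent)))  # 1 1.0
print(len(pack_vliw(chain)), utilization(pack_vliw(chain)))              # 5 0.2
```

Five independent operations fill one bundle, while a chain of five dependent operations needs five bundles with four slots empty in each.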


GCN

- We have a number of CUs, each configured as 4 x 16-wide 1D SIMD ALU groups.
- In each cycle, 4 instructions are decoded per CU, one for each group of 16 elements (pixel/vertex/geometry etc).
- There are no vector operations; it is always a scalar operation, but on 16 elements at a time.
- If a branch occurs and different units take different paths, the units whose branch path is not taken sit idle until the other path is executed.
- If all 16 elements need to wait (e.g. because of a texture access that goes to main memory), a new thread (wavefront) is dispatched on those 16 SIMD units

Advantages:
- The Xenon disadvantage-(1) can be eliminated as we do not have to operate on 4D vectors.
- The VLIW4/VLIW5 disadvantage can be eliminated, as we always operate on 1D scalars one at a time and dependencies do not affect the execution
- If not all units take the same branch path, the ALUs for units whose branch path is not taken are left unused for that cycle, but this time it is limited to 1D ALUs rather than 4/5D vector units.
- The number of decode units is kept at 4 per CU

Disadvantages:
- The Xenon disadvantage-(2) (branching) is still here but now a lot less costly (i.e. number of units whose branch is not taken x 1D ALUs)
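
The branching cost that shows up in all three designs can be sketched like this. It is a toy model under my own assumptions: `divergent_branch_cost` is a made-up name, both sides of the branch are executed one after the other with inactive lanes masked off, and "issue slots" just counts instructions rather than real hardware cycles.

```python
def divergent_branch_cost(takes_if, if_len, else_len):
    """Model a SIMD group executing an if/else with per-lane masking.

    takes_if: one bool per lane, True if that lane takes the 'if' side.
    if_len / else_len: number of instructions on each side.
    Returns (issue_slots, lane_utilization).
    """
    lanes = len(takes_if)
    n_if = sum(takes_if)
    n_else = lanes - n_if
    # A side is skipped entirely only when no lane needs it.
    issue_slots = (if_len if n_if else 0) + (else_len if n_else else 0)
    useful = n_if * if_len + n_else * else_len
    total = lanes * issue_slots
    return issue_slots, (useful / total if total else 1.0)


# All 16 lanes agree: only one side is executed and nothing idles.
print(divergent_branch_cost([True] * 16, 10, 10))                 # (10, 1.0)
# 4 lanes go one way and 12 the other: both sides run, half the lane-slots idle.
print(divergent_branch_cost([True] * 4 + [False] * 12, 10, 10))   # (20, 0.5)
```

What differs between the architectures in this picture is how much hardware sits behind each idle lane (a 4D+1D ALU, a 4/5-wide VLIW group, or a single 1D ALU), which the sketch does not model.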

Is my understanding correct? Thanks!!
 
Sorry, I'm still trying to fully understand your question...I figure if I do I'll have learned something. :oops:

(Your question was a bit over my head, too advanced for me)
 
This is a broad topic that has been discussed to death here on B3D time and again. If you are looking for deeper clarification and understanding of GPUs, don't limit your scope to only one vendor. Search for official documentation such as programmer's manuals and API/SDK documentation. Understanding how the software works on those architectures is the best way to familiarize yourself with any platform's specifics.

On a related note, here is a good, brief comparison between different architectures:

http://forum.beyond3d.com/showthread.php?p=1590247#post1590247
 
Your understanding is correct, though you sometimes wrote Xenon where it should have said Xenos.
 