View Full Version : ATI RV740 GPU and Architecture analysis


B3D News
26-Sep-2009, 13:36
After too long a hiatus, we're back with a new GPU and architecture analysis! ATI's 40nm marvel, RV740, is under Alex's microscope in Radeon HD 4770 form. Bringing almost everything that made RV770 (and now RV870) great to the lower end of the market, does RV740 ultimately impress?

Read the full news item (http://www.beyond3d.com/content/news/734)

rpg.314
26-Sep-2009, 14:58
Thank God. You guys finally do a review. :) Better late than never.

I guess this means that a rv770 review is nuked. Let's hope you guys can deliver a rv870 review, even if it is late.

AlStrong
26-Sep-2009, 14:58
:eek: :eek: :eek: :eek: :eek: :eek:

Nice!

Rys
26-Sep-2009, 15:42
RV770 board-level review is nuked, but the deeper arch piece isn't (since we'll just combine that with RV870's latest improvements into one thing). HD 5870 is done and I'm in the editing phase with Alex now.

rpg.314
26-Sep-2009, 15:53
From here

http://www.beyond3d.com/content/reviews/52/9


In a single clock, any instruction can read from at most 3 distinct GPR addresses due to read port restrictions.


In the code posted here,

http://forum.beyond3d.com/showpost.php?p=1322632&postcount=1

In instruction 35, I see 5 registers being read. Are there differences between RV740 and RV770's shaders?

LordEC911
26-Sep-2009, 20:23
Dang, another review to read...
Haven't even got through all the 58x0 reviews yet.

AlStrong
26-Sep-2009, 20:27
Are you complaining about having to read a Beyond3D article? :shocked: :wink3: :runaway: :runaway: :runaway:

Although if *recent* history serves *correctly* *cough*, you'll have plenty of time to catch up until the next one. :wink4:

wishiknew
26-Sep-2009, 22:59
I haven't checked the front page in a year and I hit jackpot!

mczak
27-Sep-2009, 03:02
In instruction 35, I see 5 registers being read. Are there differences between RV740 and RV770's shaders?
Not that I know of. But I think you're interpreting that wrong: the docs say storage is separated per element. Hence you can access 3 different .x components, 3 different .y components, etc. (with some limitations), which is fulfilled by that instruction (x 2/3/13, y 3/13, z 3/16, w 0/3/16).
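
To make the per-element reading concrete, here is a minimal Python sketch of that check. It assumes the only rule is "at most 3 distinct GPR addresses per component" and ignores the extra limitations mentioned above; the function name and the operand encoding are made up for illustration and are not taken from any AMD tool or document.

```python
from collections import defaultdict

# Assumption: the limit applies per component (x, y, z, w), not per instruction.
MAX_READS_PER_COMPONENT = 3


def gpr_reads_fit(operands):
    """Return True if no component reads more than 3 distinct GPR addresses.

    operands: iterable of (register_index, component) pairs,
    with component being one of "x", "y", "z", "w".
    """
    per_component = defaultdict(set)
    for reg, comp in operands:
        per_component[comp].add(reg)
    return all(len(regs) <= MAX_READS_PER_COMPONENT
               for regs in per_component.values())


# The reads listed for instruction 35: x 2/3/13, y 3/13, z 3/16, w 0/3/16.
instr_35 = [
    (2, "x"), (3, "x"), (13, "x"),
    (3, "y"), (13, "y"),
    (3, "z"), (16, "z"),
    (0, "w"), (3, "w"), (16, "w"),
]

# Five different registers are touched overall, yet no single component
# needs more than three of them.
distinct_registers = {reg for reg, _ in instr_35}
print(len(distinct_registers), gpr_reads_fit(instr_35))
```

Under this simplified model the instruction passes even though it touches five registers, while four different .x reads in one group would be rejected.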

trinibwoy
27-Sep-2009, 22:42
Nice job guys, keep em coming.

Jawed
28-Sep-2009, 00:28
Integer and float instructions can't be processed in parallel.
Both types can share an instruction group.

The transcendental unit (the Rys unit!) is different from its more silhouette conscious brethren: it's (surprisingly!!!) capable of handling transcendentals (cos, sin, log, exp, rcp et al.) at a rate of 1/cycle, INT MUL, due to a slightly higher internal precision (40-bit versus 32-bit, allowing expression of a 32-bit int in the FP exponent) than the other ALUs, and format conversions, all whilst not being able to process dot products or double precision work (so it's idle when double precision processing is happening).
T unit is fully functional while XY or ZW or XYZW unit-combinations are executing double-precision instructions. T is only subject to operand bandwidth/register-file porting.

Work-assignment for the ALUs is also asymmetrical, the slim ALUs being tasked first, with the T ALU being the last to be issued an instruction, except when the instruction group contains transcendental or other instructions only it can execute, in which case it is immediately issued the corresponding instruction.
Order of work-assignment (XYZW versus T) is actually an option in the hardware: CONFIG.ALU_INST_PREFER_VECTOR.
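
As a rough illustration of the two orderings, here is a toy Python model. It is only a sketch: the `prefer_vector` parameter is a stand-in named after CONFIG.ALU_INST_PREFER_VECTOR, its exact hardware semantics are assumed here, and the slot-picking logic is invented for illustration (the real hardware and compiler also have to respect destination components and operand constraints).

```python
SLIM_SLOTS = ["x", "y", "z", "w"]


def assign_slots(instructions, prefer_vector=True):
    """Place up to five instructions of one group into x/y/z/w/t slots.

    instructions: list of (name, t_only) tuples; t_only marks instructions
    that only the T unit can execute (e.g. transcendentals).
    prefer_vector: hypothetical flag; True fills the slim ALUs first,
    False fills the T unit first.
    """
    free_slim = list(SLIM_SLOTS)
    t_free = True
    placement = {}

    # Instructions only T can run claim the T slot immediately.
    for name, t_only in instructions:
        if t_only:
            if not t_free:
                raise ValueError("only one T-only instruction fits in a group")
            placement["t"] = name
            t_free = False

    # Everything else goes to the preferred free slot.
    for name, t_only in instructions:
        if t_only:
            continue
        t_slot = ["t"] if t_free else []
        order = free_slim + t_slot if prefer_vector else t_slot + free_slim
        if not order:
            raise ValueError("instruction group is full")
        slot = order[0]
        placement[slot] = name
        if slot == "t":
            t_free = False
        else:
            free_slim.remove(slot)
    return placement


print(assign_slots([("add", False), ("sin", True), ("mul", False)]))
print(assign_slots([("add", False), ("mul", False)], prefer_vector=False))
```

In this model the first call puts `sin` on T and the other two on x and y; the second call, with the preference flipped, sends the first ordinary instruction to T instead.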

The transcendental ALU shares GPR read ports with the other ALUs, so it can either load a needed operand in a single cycle if and only if one of the slim ALUs loads the same operand, otherwise it has to wait until such a time when an unused read port is available.
T can also use any of the in-pipe registers (PV and PS) or any type of constant. The operand fetching algorithm and resulting constraints are byzantine and not worth the effort of comprehending (unless you're writing a compiler or are after the last 1% of performance).

Writes are owner-exclusive (only the owner thread can write to the owned location), reads are shared (all other threads can read any location).
"Owner-exclusive" is merely an option. Though I've seen one document that suggests otherwise, it's battling both the R700 Family ISA and the Intermediate Language Specification (which, admittedly, could be considered "forward looking", beyond R700).

Jawed

Freak'n Big Panda
28-Sep-2009, 15:56
Holy crap. Amazing. Keep it up Rys, I'm hoping for an 870 review.

Richard
28-Sep-2009, 17:16
This is Alex's fault. :eek:

mboeller
08-Oct-2009, 11:32
Are you complaining about having to read a Beyond3D article?

Although if *recent* history serves *correctly* *cough*, you'll have plenty of time to catch up until the next one. :wink4:


not this time:

http://www.beyond3d.com/content/reviews/53/1

two for the price of one ;)