Why Barts is really VLIW4, not VLIW5 (and more on HD 5830/6790 being mainly 128-bit)

Math does not lie.
Sure, math doesn't lie (if used properly), but it can be used as part of a larger argument in which the non-math portions are flawed in some way.

How can anyone respond to "My math proves it."? More faulty math?
How about with "the math doesn't prove it"?

How about the bleeding obvious: 1120/64 = 17.5?
I was going to mention this earlier, but I thought it was too obvious (plus I didn't know that it had to be in multiples of 64)….
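The divisibility point can be sketched quickly. A minimal check, assuming the usual SIMD widths (16 lanes, so 80 ALUs per VLIW5 SIMD and 64 per VLIW4 SIMD):

```python
# Divisibility check for Barts' 1120 stream processors.
# Assumption: each SIMD is 16 lanes wide, so a VLIW5 SIMD holds
# 16 * 5 = 80 ALUs and a VLIW4 SIMD holds 16 * 4 = 64 ALUs.
BARTS_SPS = 1120

vliw5_simds = BARTS_SPS / (16 * 5)   # 14.0 -> whole number of SIMDs, fits
vliw4_simds = BARTS_SPS / (16 * 4)   # 17.5 -> fractional SIMD count, impossible

print(vliw5_simds, vliw4_simds)  # 14.0 17.5
```

With 1120 SPs, only the VLIW5 layout yields a whole number of SIMDs.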
 
steampoweredgod is that you ;)

Can someone provide some test data proving what Barts is? Otherwise this will go on and on.

Nope, the great Bo_Fox. ;)

You guys are acting as if you really hate math... :oops:

So far, you are deliberately avoiding the math like the plague. :rolleyes:

Come on, it's not that hard. Silly! I've never seen anything this ridiculous before in my life, honestly! You guys like drama a bit more than the math. :LOL:
 
From the other thread:

The more reliable source is directed tests, which clearly show an instruction throughput of transcendental functions scheduled alongside other instructions like MADD. That is just not possible with VLIW4, which is why the 6870 is faster in this particular directed test than the VLIW4-based HD 6970.

And the 5870 even bests the HD 7970 here, because it can schedule a transcendental alongside four MADs if fed right.
(Also, that seems like math to me, as long as the performance results are expressed as numbers or similar.)
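The co-issue point can be illustrated with a toy issue model. The slot layouts below (VLIW5: 4 MAD slots plus 1 transcendental T slot; VLIW4: a transcendental occupying 3 of the 4 ALUs) are the commonly described ones, and the function is purely illustrative:

```python
# Toy per-clock issue model for one VLIW bundle.
# Assumptions: VLIW5 co-issues 4 MADs plus 1 transcendental in the T slot;
# on VLIW4 a transcendental is built from 3 of the 4 ALUs, leaving 1 for a MAD.
def ops_per_clock(arch, want_transcendental):
    if arch == "VLIW5":
        return {"mads": 4, "trans": 1 if want_transcendental else 0}
    if arch == "VLIW4":
        if want_transcendental:
            return {"mads": 1, "trans": 1}  # 3 slots consumed by the T op
        return {"mads": 4, "trans": 0}

print(ops_per_clock("VLIW5", True))   # 4 MADs alongside the transcendental
print(ops_per_clock("VLIW4", True))   # only 1 MAD left beside it
```

In this model, a MAD+transcendental mix keeps all five VLIW5 slots busy, which is the behavior the directed test above is detecting.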
 
More words and still nothing concrete. You type quite a bit. I typed out the math for you.
*Ahem* OpenGL Guy works at AMD, in the driver development department (AFAIK). If anyone knows this stuff, it'll be him. ;)

Barts is VLIW5. This is fact, it's not open to discussion, debate, or interpretation.

Me having an attitude? Nah, not half as bad as yours, I honestly think.
I think he's getting exasperated by you not listening to reason and instead insisting on clinging to your personal opinions/delusions regardless of what anyone says.

I just want to have a concrete discussion -
You enjoy tilting at windmills, eh? How can you discuss that apples aren't in fact apples, but rather oranges? That seems less like a discussion to me and more like a stupid waste of time. Barts is VLIW5. End of story.
 
You want more math?

OK, let's try something else that's new...

Barts XT vs HD 6950:

About a 12% overall performance difference. All right? Everybody OK with that?

Now, increase all of Barts XT's specs by 12% while leaving the clocks alone, to try to make it equal to the HD 6950 in overall performance. Or should I do it by 13%, just for the heck of it? Let's just do 13% then.

shader power: 2016 GFLOPs plus 13% = 2278 GFLOPs
texturing: 50.4 Gtexels/sec plus 13% = 57 GT/s
bandwidth: 134.4 GB/s plus 13% = 151.9 GB/s

Ignore the Gpixels/sec, as Barts XT already has 32 ROPs and higher clocks than Cayman, which means almost nothing (less than 1% practical real-world difference in gaming).

HD 6950 has:
shader power: 2253 GFLOPs
texturing power: 70.4 Gtexels/sec
bandwidth: 160 GB/s

Whoa! Wait a minute here... increasing ALL of Barts XT's specs by 13% in order to increase overall perf by 13% only gives it about 1% more GFLOPS than the HD 6950, while STILL having considerably LESS texturing power and bandwidth.

How does that add up?

Even worse yet - how does that add up if Barts XT is based upon VLIW5 rather than VLIW4?

Would somebody here please have the integrity to answer how it is possible for Barts XT to be that AMAZINGLY efficient if it's actually VLIW5? It would mean that VLIW4 is meaningless for Cayman, netting zero overall gaming performance benefit over Barts XT.
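The scaling exercise above can be rerun mechanically; this is just the post's own arithmetic, with the same spec numbers:

```python
# Re-running the post's scaling exercise: bump Barts XT's headline
# specs by 13% and compare against the HD 6950 numbers quoted above.
barts_xt = {"gflops": 2016.0, "gtexels": 50.4, "gbps": 134.4}
hd_6950  = {"gflops": 2253.0, "gtexels": 70.4, "gbps": 160.0}

scaled = {k: round(v * 1.13, 1) for k, v in barts_xt.items()}
print(scaled)  # {'gflops': 2278.1, 'gtexels': 57.0, 'gbps': 151.9}

# Scaled GFLOPS barely exceed the 6950's, while texturing and
# bandwidth remain well below it.
for k in scaled:
    print(k, round(scaled[k] / hd_6950[k], 2))
```

The output matches the figures in the post: ~1% more GFLOPS, but only ~81% of the texturing rate and ~95% of the bandwidth.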

So after all that non-math (playing with a bunch of theoretical maximum numbers isn't the math of it; where is the accounting for the differences in the internal buses, changes in cache, etc.?), how come you haven't answered:

So how exactly does Barts work, then, if the shader compiler sends transcendentals to the T unit?

All your dodgy "math" aside, how do you explain how transcendentals are handled? Because that changed quite a bit between VLIW4 and VLIW5.
 
Nobody here wants to rise to my challenge:

Logically answer the questions I've been asking -
Like:
Why does Barts XT perform so well in games against Cayman specs-wise if it's VLIW5 rather than VLIW4?

Why does Barts XT absolutely destroy HD 5850 and HD 5870, specs-wise, by a ridiculous margin?

Why does HD 6790 perform about the same as HD 5830 if the latter has 33% more shader and texturing power, with other specs being roughly the same - if BOTH are VLIW5?

More questions, but you get the idea... I trust???
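For the HD 6790 vs. HD 5830 question, the 33% figure can be reproduced from the launch specs (a quick sketch; the SP/TMU counts and clocks below are the commonly published ones, so treat them as assumptions):

```python
# Spec check behind the HD 6790 vs HD 5830 comparison.
# Assumed launch specs: HD 5830: 1120 SPs, 56 TMUs, 800 MHz;
#                       HD 6790:  800 SPs, 40 TMUs, 840 MHz.
hd5830_gflops = 1120 * 2 * 0.800   # MAD = 2 flops/clock -> 1792 GFLOPS
hd6790_gflops = 800 * 2 * 0.840    # -> 1344 GFLOPS
hd5830_gtexels = 56 * 0.800        # 44.8 GT/s
hd6790_gtexels = 40 * 0.840        # 33.6 GT/s

print(f"shader: {hd5830_gflops / hd6790_gflops - 1:.0%} more")     # 33% more
print(f"texturing: {hd5830_gtexels / hd6790_gtexels - 1:.0%} more") # 33% more
```

Both ratios come out to exactly 4/3, which is where the "33% more shader and texturing power" claim comes from.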

So after all that non-math (playing with a bunch of theoretical maximum numbers isn't the math of it; where is the accounting for the differences in the internal buses, changes in cache, etc.?), how come you haven't answered:



All your dodgy "math" aside, how do you explain how transcendentals are handled? Because that changed quite a bit between VLIW4 and VLIW5.
That just makes it even more so, as the cache is supposed to improve things even further for Cayman, right? All the more reason for Barts XT to be VLIW4-based, if Cayman has further refinements on top... Common sense, please, rather than spouting nonsense. "Dodgy"? You are the ones dodging the numbers here.

The questions I ask above should be answered by a nice person, if there is one, without this "drama queen" thing going on here; it seems like a competition for the drama queen crown.

Are those truly "impossible" questions to answer? :runaway:

EDIT: One guy gives a bit of insight into the drama going on here right now: http://alienbabeltech.com/abt/viewtopic.php?f=6&t=24350&view=unread#p59695
 
Last edited by a moderator:
One does not need logic when one has facts.

And there are actually facts:

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Evergreen-Family_Instruction_Set_Architecture.pdf
It's named 'Evergreen', but it actually covers every GPU from R600 up to Evergreen/Northern Islands, excluding Cayman.

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf
This one is specifically for Cayman.

Look at the "Chapter 4: ALU clauses" in both and you'll find the info you need.

And as mczak already told you, you could just look at the Linux Mesa drivers... or the mailing list.
You'd find posts like this one:
http://lists.freedesktop.org/archives/mesa-dev/2012-February/019495.html
R600 (HD2xxx) up to Evergreen/Northern Islands (HD6xxx except HD69xx) are VLIW5. So that's not exactly scalar but it doesn't quite fit any simd vector model (as you can have 5 different instructions per instruction slot). (Cayman, aka HD69xx is VLIW4, and the new GCN chips aka HD7xxx indeed use a scalar model, as does nvidia.).
It's coming from a VMware dev who actually writes low-level GPU stuff.
 
You're applying dodgy math to flawed data, showing a spurious correlation that leads you to jump to an erroneous conclusion.

And you apparently lack the ability to comprehend the overwhelming evidence pointed out to you disproving your wild theory.

My poor statistician's heart aches reading this thread. It's like having _xxx_ back, except not in RPSC. :(
 
You're applying dodgy math to flawed data, showing a spurious correlation that leads you to jump to an erroneous conclusion.

And you apparently lack the ability to comprehend the overwhelming evidence pointed out to you disproving your wild theory.

My poor statistician's heart aches reading this thread. It's like having _xxx_ back, except not in RPSC. :(
Surprise: I challenge you! But you might escape anyway...

Which data is flawed?

What part of my math is dodgy?

What evidence are you talking about, against my extrapolation that Barts could only have the performance it currently has if it had the efficiency of VLIW4-based shaders?


I'm now thinking that maybe one of the published specs for Barts is WRONG, if it's indeed VLIW5-based. Something has to be VERY wrong to account for the absolutely MASSIVE performance that seems possible only with the efficiency boost of VLIW4. So which is it: false specifications, or VLIW4?
 
Why does Barts XT perform so well in games against Cayman specs-wise if it's VLIW5 rather than VLIW4?
....

Why does HD 6790 perform about the same as HD 5830 if the latter has 33% more shader and texturing power, with other specs being roughly the same - if BOTH are VLIW5?

More questions, but you get the idea... I trust???


That just makes it even more so, as the cache is supposed to improve things even further for Cayman, right? All the more reason for Barts XT to be VLIW4-based, if Cayman has further refinements on top... Common sense, please, rather than spouting nonsense. "Dodgy"? You are the ones dodging the numbers here.
You have a very "ignorant" (or naive) assumption that the ISA difference alone between the two VLIW architectures plays a pivotal role in whatever performance metrics you are flashing all over the place here. A GPU's potential is still very much defined by a ton of dedicated graphics hardware in the pipeline, alongside the ALU throughput.
And no, the caching architecture in Cayman is pretty much the same as in the previous two generations.
 
Why is the 5830 not at least 30% faster, according to the numbers above?
Uhm, because Barts has a more efficient front end to feed the available resources? I always thought Cypress was kind of a brute-force approach of throwing more shaders at the card to achieve higher performance, and I wouldn't consider it a good example of an efficient VLIW5 GPU architecture. The law of diminishing returns is a bitch, and Barts shows this.

If you want to further hunt conspiracies, figure out how a GTX580 is so much faster than Cypress despite lower theoreticals.
 
You have a very "ignorant" (or naive) assumption that the ISA difference alone between the two VLIW architectures plays a pivotal role in whatever performance metrics you are flashing all over the place here. A GPU's potential is still very much defined by a ton of dedicated graphics hardware in the pipeline, alongside the ALU throughput.
And no, the caching architecture in Cayman is pretty much the same as in the previous two generations.

You have a very "ignorant" attitude regarding the "WHY's" -

- Why does Barts perform so darn well against Evergreen given its specifications, if it's indeed VLIW5? And why is it able to perform just like its Cayman cousin, specs-wise, as if both had the same VLIW4 benefits over Evergreen?

About the caching, sorry, my mistake: it was the HD 7970 that had more L2, but then that renders the other poster's question about cache irrelevant.

Either we're all ignorant of a certain flawed specification for Barts that is just as deceptive as AMD stating in one of its slideshows that Bulldozer has 2B transistors. Did any of you even think about that slideshow until AMD admitted the "error", correcting it to 1.2B transistors? Sorry about the strawman argument, but looking at where you guys are headed, you just do not want to sincerely consider the massive performance of Barts given its stated specifications, which truly seems impossible with a VLIW5 architecture (either the specs have to be wrong, or it's VLIW4).

No hard feelings. The drama didn't bruise me at all. I could take it like a goliath, no problem.

Now, just be sincere and constructive.

I welcome you guys to answer the "why's", assuming a few of you here are actually capable of answering them. I know most of you have pretty high IQs (at least 120-130, maybe more on average), so please let them shine gracefully. Just one good post with an explanation could be enough. It's like an engineer sitting down and explaining how a TV works: explain the basics for two minutes, and the listener can fully grasp the concept. The problem here is that nobody wants to explain it (or is allowed to truly explain it)???
 
Uhm, because Barts has a more efficient front end to feed the available resources? I always thought Cypress was kind of a brute-force approach of throwing more shaders at the card to achieve higher performance, and I wouldn't consider it a good example of an efficient VLIW5 GPU architecture. The law of diminishing returns is a bitch, and Barts shows this.

If you want to further hunt conspiracies, figure out how a GTX580 is so much faster than Cypress despite lower theoreticals.

It sounds like you're guessing as to the "why". It's a good effort, and I appreciate it. But a more efficient front end does not actually boost a VLIW5 card the way VLIW4 could, IMHO...

No thanks, it's not my intent to hunt around for conspiracies, nor do I regard myself as a conspiracist compared to many others (on a scale of 1 to 10, I'm maybe a "2" or "3" while some others are a "10"). I just try to connect the dots, or at least figure out the "why". If you saw Bulldozer looking more like 1.2B transistors rather than 2B, how would you feel being labeled a conspiracist just because things didn't add up in your most honest estimation (and it was only the math, not much else, really)?

It's just utterly amazing how none of you tried to figure out the MASSIVE discrepancy, spec-wise, between the gaming performance of standard VLIW5 cards and Barts. :oops: :cool: :LOL: (Sorry, I literally laughed pretty hard! Laughter is healthy, so be happy for me at least!)
 
Barts has among many other improvements:

- a new front end, boosting utilization of the shader core
- improved tessellation
- compared to the 5830, fully functional 32 ROPs

On top of that, game code is rarely limited by shader performance!
There are few purely shader-limited tests; Perlin Noise from 3DMark is one of them. Just look at the results on this page and think about them for a second:
http://techreport.com/articles.x/20126/7

Barts performs exactly as you would expect from VLIW5 there.
Look at more complex tests to see how Barts is in line with Cypress on Particle, Cloth, and other tests where Cayman is showing good progress :)


Games are workloads too complex to pinpoint any major similarities or differences between architectures. There might be 10% of the code running 300% faster on one architecture but 90% of the code running a bit slower, and you will get almost the same speed on both of them. This kind of analysis is best done on low-level code and synthetic benchmarks.
Game benches are great for establishing how balanced an architecture is for CURRENT workloads.

PS. You're not coming across as a nice and polite person to me, Bo_Fox, but this might be my non-native English. Try to listen with understanding more before going on a crusade like the one seen in this topic.
 
Correct me if I am wrong, but isn't it all about shader utilization?

I did some calculations when the HD 7750 came out, and it turned out that it had something like 35% more performance per shader per clock than the HD 7950, which in turn is faster per shader per clock than the HD 7970 (looking at this card alone, it seems that GCN made no improvements in perf/shader).

And all of this on the same architecture.

If the HD 7970 had the same performance per shader as its lower-end siblings, it would be much faster than even the previous dual-GPU cards.

I've read somewhere that it has much to do with the number of Asynchronous Compute Engines (ACEs) on Tahiti, which is actually the same as on Cape Verde. Not sure though.

Maybe someone with more knowledge could explain it better?
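The per-shader-per-clock comparison above can be sketched like this. The shader counts and clocks are the published launch specs, but the fps figures are purely hypothetical placeholders, not measurements:

```python
# Normalizing performance per shader per clock, as the post above does.
# Shader counts and core clocks are launch specs; the fps numbers are
# HYPOTHETICAL placeholders chosen only to illustrate the normalization.
cards = {
    # name: (stream processors, core clock MHz, hypothetical fps)
    "HD 7750": (512,  800, 30.0),
    "HD 7950": (1792, 800, 90.0),
    "HD 7970": (2048, 925, 105.0),
}

perf_per_shader_clock = {
    name: fps / (sps * mhz) for name, (sps, mhz, fps) in cards.items()
}
for name, v in perf_per_shader_clock.items():
    print(name, f"{v:.2e}")
```

With numbers shaped like these, the smallest chip comes out ahead once you divide by shaders and clock, which is the effect the post is describing.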
 
Can I ask a noob question?
vliw 5 I sort of understand: 4 simple shaders + 1 complex
vliw 4 = 4 complex shaders
cgn is what?

ps: about shader power, the 2000 series had way more shader power than its NV competition, but it wasn't until years later, when the card was obsolete, that the extra shading power helped framerates
(hope I got the right card)
 
Can I ask a noob question?
vliw 5 I sort of understand: 4 simple shaders + 1 complex
vliw 4 = 4 complex shaders
cgn is what?
gcn, not cgn. VLIW4 isn't exactly 4 complex ALUs; they are slightly more complex than the simple VLIW5 ones, as they need to perform the operations of the former complex ALU, but they have to work together for that.
For GCN, there's just one ALU in that sense, with perhaps similar complexity to the ones in VLIW4 (only looking at the shader ALU itself, which doesn't really mean much, but anyway), since multiple ALUs aren't working together for the more complex operations; instead they take multiple clocks. I haven't actually seen a detailed analysis of how fast these operations are on GCN.

ps: about shader power, the 2000 series had way more shader power than its NV competition, but it wasn't until years later, when the card was obsolete, that the extra shading power helped framerates
(hope I got the right card)
The HD 2000 series didn't actually have that much more shader power (not counting the SFU MUL on the 8800 GTX, it was something like 30% more peak flops comparing the HD 2900 XT to the 8800 GTX). I don't think this ever really helped that much with newer titles. Only starting with the HD 4xxx did AMD cards really have a huge peak-flops advantage (often around 100%), since AMD managed to cram in a lot more SIMDs.
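mczak's ~30% figure can be sanity-checked from launch-era specs (a rough sketch; the ALU counts and clocks below are the commonly published ones, treated here as assumptions, and the SFU MUL is ignored as in the post):

```python
# Peak MAD FLOPS check for the HD 2900 XT vs 8800 GTX comparison.
# Assumed specs: HD 2900 XT: 320 ALUs at 742 MHz core clock;
#                8800 GTX: 128 SPs at 1350 MHz shader clock (SFU MUL ignored).
hd2900xt = 320 * 2 * 0.742   # MAD = 2 flops/clock -> ~474.9 GFLOPS
gtx8800  = 128 * 2 * 1.350   # -> ~345.6 GFLOPS

advantage = hd2900xt / gtx8800 - 1
print(f"{advantage:.0%}")    # 37%, in the "something like 30%" ballpark
```

That lands in the mid-30s percent, consistent with the "something like 30% more" characterization once the SFU MUL is left out.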

btw this thread is hilarious :)
 
Thanks for that answer, mczak. I've asked the question a couple of times but never really got an answer.
So, in simple terms, VLIW4 ALUs work in groups of 4 while GCN ALUs are single entities?
 