Alright, I finally worked it out after going through a few driver revisions. It was Cat 14.12 (Omega) where TessMark performance jumped up. I don't run these low level tests every time (they never changed, until now), so I hadn't picked up on it. Still not quite sure what AMD did to boost performance though.
AMD's tessellation performance can vary quite substantially with driver revision.
Hope you can see this graph directly:
http://www.pcgameshardware.de/scree...ellation_Performance_vs_R9_280_part1-pcgh.png
If not, you need to scroll down a bit here:
http://www.pcgameshardware.de/Grafikkarten-Grafikkarte-97980/Tests/AMD-Radeon-R9-285-Test-1134146/4/
x64: 152.9fps (was: 134)
Wow! I wonder what happened there. Is there a similar boost for Tonga?
Trying to install the latest beta driver, but since it's fucking beta shit it doesn't even work. It stops at the 'identifying graphics hardware' step and never progresses from there. Un-fucking believable, tried two times now and nothing ever happens.
How hard can it be to just get a god damn installer to work? All it has to do is stick some files in their right place (and create a billion registry keys, for some incomprehensible reason.)
The one thing I keep asking in this thread is "What kind of "big architectural improvements" do you want in GCN 2.0 exactly" and I still haven't heard the answer.
AMD's problem is that they need an architectural improvement for GCN and honestly that improvement needs to be included in their first FinFET GPUs or they're going to have a bad time.
Updating just a few modular blocks still requires a complete redesign with all the validation stages, so this will be a new chip, not a minor revision/stepping.
Or perhaps this is the place where those "IP-blocks", as I think Dave Baumann once called them, come into play. If my memory serves me right, it's not just "GCN x.y", but there are individual "IP-levels" for smaller blocks inside the GPU, which aren't tied to updating every other block too.
Alright, I finally worked it out after going through a few driver revisions. It was Cat 14.12 (Omega) where TessMark performance jumped up. I don't run these low level tests every time (they never changed, until now), so I hadn't picked up on it. Still not quite sure what AMD did to boost performance though.
x64: 152.9fps (was: 134)
x32: 302.5fps (was: 294)
So a slight uptick in perf, but not like on 290. I'm not sure why 285 basically got this boost first, since 290 wasn't this fast when I initially reviewed 285.
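To put the before/after figures quoted above into relative terms, here is a quick sketch. The function name `uplift_percent` and the `results` dict are just illustrative names I made up; the only real data is the four fps numbers from the post.

```python
def uplift_percent(before_fps, after_fps):
    """Relative change from before_fps to after_fps, in percent."""
    return (after_fps - before_fps) / before_fps * 100.0

# TessMark figures as quoted in the post (fps before -> fps after).
# The keys are the tessellation factors used in the thread.
results = {"x64": (134.0, 152.9), "x32": (294.0, 302.5)}

for level, (before, after) in results.items():
    print(f"{level}: {uplift_percent(before, after):+.1f}%")
```

That works out to roughly 14% at x64 and about 3% at x32, so the gain sits almost entirely at the highest tessellation factor.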
Obviously, but it still may be a relatively easy job compared to trying to fit actually new blocks like the TrueAudio DSPs in there.
Updating just a few modular blocks still requires a complete redesign with all the validation stages, so this will be a new chip, not a minor revision/stepping.
Maybe, maybe not - I wouldn't agree with the way you read that (supposed) slide, especially since all the GCNs (at least 1.1+) already do FL12_0 with resource binding tier 3.
Unfortunately it doesn't really seem that AMD implemented feature level 12_1 in Fiji.
I like GCN compute units. The design is elegant. I definitely do not want OoO. I like that GCN is a memory based architecture. All the resources are stored in memory and cached by general purpose cache hardware. There are no sudden performance pitfalls. Performance is all about the memory access patterns (= something that the developer has full control over).
The one thing I keep asking in this thread is "What kind of "big architectural improvements" do you want in GCN 2.0 exactly" and I still haven't heard the answer.
Does it really make sense to increase complexity and increase bus/register bit width, implement out-of-order and superscalar execution, etc.? Especially when known AMD performance problems seem to come from underutilizing existing GCN cores with non-optimal driver/compiler/optimizer code.
If I may (if you have the time): are Maxwell and current Intel iGPUs also memory based architectures, or are they something else? And if you really have the time or a quick link, what is the alternative to "memory based"?
I like that GCN is a memory based architecture. All the resources are stored in memory and cached by general purpose cache hardware. There are no sudden performance pitfalls. Performance is all about the memory access patterns (= something that the developer has full control over).
So there is a difference? And if so, is the console compiler better because it has a different API?
Current generation consoles have roughly the same GPU architecture as the AMD PC GPUs, making it easy for the developers to compare the compiler results (instruction counts, GPR counts, etc).
I still don't see any reason to believe so.
Color compression makes AMD bandwidth usage comparable to NVIDIA.
Ahaha, okay, so it's about politics then? Anyway, I chose "express uninstall of all AMD software", which reasonably should include the not-the-driver-but-also-part-of-the-driver Catalyst, AND the actual driver itself. Yet when driver cleaner sorted through my stuff, it found tons of remnants. After killing all of that stuff the install program was able to proceed.
Although it is absolutely necessary for many things, it is somehow considered technically separate from the actual driver.
When you have 4096 very simple CISC cores - 50-80 times more than Intel Xeon Phi, which uses modified Atom cores - you can improve performance by simply scaling down the process node to add more cores and increase running frequency.
If I'm not mistaken, it's Windows that wants to keep old driver files, and those are the ones driver cleaner software finds and cleans up.
Ahaha, okay, so it's about politics then? Anyway, I chose "express uninstall of all AMD software", which reasonably should include the not-the-driver-but-also-part-of-the-driver Catalyst, AND the actual driver itself. Yet when driver cleaner sorted through my stuff, it found tons of remnants. After killing all of that stuff the install program was able to proceed.
This - leaving basically everything behind after "uninstalling" - is an issue AMD has never bothered to resolve, and it's been extremely well documented over the years. It takes something like a minute for the Catalyst installer to do its thing (I haven't actually timed it, but it won't be wildly off), while "uninstall" happens in a snap of your fingers in comparison. It stands to reason that it doesn't actually do all that much uninstalling in that short amount of time.
If this is true, Fiji has 2x more memory bandwidth than it needs.
I still don't see any reason to believe so.
I also never understood why people are expecting all the bandwidth saving technologies in Fiji. For example, if the compression had anything to do with the booming size of Tonga, then you'd better drop that.
If this is true, Fiji has 2x more memory bandwidth than it needs.
I still expect that full Tonga has a 384-bit memory bus; the CU count has already been confirmed to equal Tahiti's.
I also never understood why people are expecting all the bandwidth saving technologies in Fiji. For example, if the compression had anything to do with the booming size of Tonga, then you'd better drop that.