Old 15-Mar-2012, 04:36   #3151
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,670
Default

Quote:
Originally Posted by Jawed View Post
What we could simply be seeing in this scenario, is that NVidia's "CUDA-specific" linear access hardware is better than AMD's. It wasn't that long ago that doing linear buffers in OpenCL on AMD was a disaster zone (because it was based upon the vertex fetch hardware) and AMD might still be climbing that curve.
I'd think that GCN's better cache architecture should potentially fix such issues? Though I think I'm largely missing how any necessary synchronization etc. really works for UAVs...
Old 15-Mar-2012, 05:24   #3152
OpenGL guy
Senior Member
 
Join Date: Feb 2002
Posts: 2,333
Default

Quote:
Originally Posted by Jawed View Post
Going back into the mists of time, CUDA was often (though not always) higher performance reading linearly organised buffers through non-TMU paths rather than through the TMUs.

What we could simply be seeing in this scenario, is that NVidia's "CUDA-specific" linear access hardware is better than AMD's. It wasn't that long ago that doing linear buffers in OpenCL on AMD was a disaster zone (because it was based upon the vertex fetch hardware) and AMD might still be climbing that curve.
We've had caching for buffer reads for EG/NI chips for quite a while now. It doesn't help when a buffer is read/write, but there are plenty of buffers that are read only so it's still quite beneficial. SI has caching all the time of course.

Quote:
Originally Posted by Jawed
AMD's initial support for UAVs was something of a kludge as far as I can tell - for the multiple UAVs that are required by D3D, using an emulation that configures a single physical UAV in hardware and splits it up. Additionally AMD hardware has severe constraints on the size of a UAV - a common complaint amongst OpenCL programmers is (was?) that it is impossible to allocate a single monster UAV (that is, a linear buffer) to use the majority of graphics memory (e.g. 900MB out of 1GB). There's some kind of hardware/driver restriction that only allows for 50% allocation. Allocating texture memory in OpenCL is less constrained.
There are some reasons for this. First, the GPU's memory pool is split into two regions: CPU visible and invisible. The CPU visible region we expose is 256MB, normally. This means that you have at most 768MB of contiguous memory on a 1GB card. The way the OpenCL conformance tests are written, you have to be able to allocate a buffer of the maximal size you report, which is sort of impossible to guarantee unless you're conservative. I believe Nvidia only exposes 128MB of CPU visible memory, so they have a larger contiguous pool to work with. They also may handle memory allocations differently, but we use VidMM and expose two memory pools. Note that I believe we've improved this (memory allocation) behavior recently, but you're still going to have some limits caused by having two memory pools.

My understanding is that if everyone were using 64-bit OSes (and apps) we could expose all the video memory to the CPU and not worry about having separate memory pools, not to mention facilitating faster data uploads in some cases.
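The pool arithmetic above can be sketched in a few lines of Python. This is only a simplified model: the function name is made up for illustration, it ignores fragmentation and any other allocations, and the 128MB Nvidia figure is the "I believe" value from the post, not a confirmed number.

```python
MB = 1024 * 1024

def largest_contiguous_pool(total_bytes, cpu_visible_bytes):
    """Largest single pool when VRAM is split into a CPU-visible
    region and a CPU-invisible region (hypothetical helper)."""
    if not 0 <= cpu_visible_bytes <= total_bytes:
        raise ValueError("CPU-visible size must be between 0 and the total size")
    invisible_bytes = total_bytes - cpu_visible_bytes
    # A single buffer has to fit entirely inside one of the two pools.
    return max(cpu_visible_bytes, invisible_bytes)

# 1GB card with a 256MB CPU-visible window (the case described above).
amd = largest_contiguous_pool(1024 * MB, 256 * MB)
# 1GB card with a 128MB window (the value the post attributes to Nvidia).
nv = largest_contiguous_pool(1024 * MB, 128 * MB)
# Everything CPU-visible, as imagined for an all-64-bit world: one pool.
unified = largest_contiguous_pool(1024 * MB, 1024 * MB)

print(amd // MB, nv // MB, unified // MB)
```

Under this model the 900MB-out-of-1GB allocation mentioned earlier fits only in the single-pool case.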
__________________
I speak only for myself.
Old 15-Mar-2012, 08:27   #3153
Dade
Member
 
Join Date: Dec 2009
Posts: 182
Default

Quote:
Originally Posted by OpenGL guy View Post
There are some reasons for this. First, the GPU's memory pool is split into two regions: CPU visible and invisible. The CPU visible region we expose is 256MB, normally. This means that you have at most 768MB of contiguous memory on a 1GB card. The way the OpenCL conformance tests are written, you have to be able to allocate a buffer of the maximal size you report, which is sort of impossible to guarantee unless you're conservative. I believe Nvidia only exposes 128MB of CPU visible memory, so they have a larger contiguous pool to work with. They also may handle memory allocations differently, but we use VidMM and expose two memory pools. Note that I believe we've improved this (memory allocation) behavior recently, but you're still going to have some limits caused by having two memory pools.

My understanding is that if everyone were using 64-bit OSes (and apps) we could expose all the video memory to the CPU and not worry about having separate memory pools, not to mention facilitating faster data uploads in some cases.
At least on the HD5xxx family, you cannot allocate a single OpenCL buffer larger than 128MB (though you can allocate multiple 128MB buffers). I haven't recently verified whether this limit is still present, but I assume so. It is one of the most annoying limitations of AMD's older hardware, and it was a severe limitation for most OpenCL applications on AMD.

In my opinion, this limit was more annoying than not having access to all GPU memory pool.
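For readers who hit the same limit, a rough sketch of the usual workaround (splitting one large data set across several buffers) might look like the following. The helper names are hypothetical and the 128MB cap is simply the figure quoted in this post; no OpenCL calls are made here, only the bookkeeping.

```python
MB = 1024 * 1024

def plan_chunks(total_bytes, max_buffer_bytes=128 * MB):
    """Sizes of the buffers needed to hold total_bytes when no single
    buffer may exceed max_buffer_bytes."""
    if total_bytes < 0 or max_buffer_bytes <= 0:
        raise ValueError("sizes must be non-negative and the cap positive")
    full, rest = divmod(total_bytes, max_buffer_bytes)
    return [max_buffer_bytes] * full + ([rest] if rest else [])

def locate(offset, max_buffer_bytes=128 * MB):
    """Map a global byte offset to (buffer index, offset inside that buffer)."""
    return divmod(offset, max_buffer_bytes)

chunks = plan_chunks(300 * MB)      # three buffers: 128MB, 128MB, 44MB
where = locate(130 * MB)            # second buffer, 2MB in
print([c // MB for c in chunks], where[0], where[1] // MB)
```

The annoying part in practice is that every kernel then has to take several buffer arguments and do the `locate`-style index split itself.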

Another note: in the past, storing linear data in an OpenCL image buffer rather than in an OpenCL linear buffer was an effective way to improve performance. This optimization was quite annoying to code, too.
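To give an idea of why it was annoying, here is a minimal sketch of the index bookkeeping involved, assuming four floats are packed per texel (as with a CL_RGBA / CL_FLOAT image) and an image width of 1024 texels. Both choices are assumptions for illustration, as are the function names.

```python
def element_to_texel(index, width, channels=4):
    """Map a linear element index to (x, y, channel) in a 2D image
    that stores `channels` elements per texel, row by row."""
    texel, channel = divmod(index, channels)
    y, x = divmod(texel, width)
    return x, y, channel

def image_height(n_elements, width, channels=4):
    """Rows needed to hold n_elements (rounded up)."""
    texels = -(-n_elements // channels)   # ceiling division
    return -(-texels // width)

coords = element_to_texel(4100, width=1024)
rows = image_height(1_000_000, width=1024)
print(coords, rows)
```

Every read in the kernel needs the equivalent of `element_to_texel`, and the last row is usually only partly filled, so bounds have to be handled by hand.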
Old 15-Mar-2012, 09:05   #3154
Gipsel
Senior Member
 
Join Date: Jan 2010
Location: Hamburg, Germany
Posts: 1,448
Default

To get back to the question of MSAA performance:
Quote:
Originally Posted by mczak View Post
I would think everybody reading a render target in a pixel shader would do so through the tmu data path? A render target should look pretty much like any ordinary texture when accessed in the pixel shader. Maybe it's more likely to have non-full speed throughput due to "odd" format but otherwise what's the difference?
Exactly that was my idea.
Just imagine nV can do it full speed and AMD only half speed (or some other difference). Factor in that AMD TMUs are already slower for the FP16 data format used quite often (afaik) for such render targets and you may arrive at a significant difference.
Old 15-Mar-2012, 15:16   #3155
OpenGL guy
Senior Member
 
Join Date: Feb 2002
Posts: 2,333
Default

Quote:
Originally Posted by Dade View Post
At least on the HD5xxx family, you cannot allocate a single OpenCL buffer larger than 128MB (though you can allocate multiple 128MB buffers). I haven't recently verified whether this limit is still present, but I assume so. It is one of the most annoying limitations of AMD's older hardware, and it was a severe limitation for most OpenCL applications on AMD.
If you're using Linux, then the issue is lack of VM support. In Windows we support VM for all EG/NI/SI chips and don't have these issues. Currently, only SI has VM support in Linux.
Quote:
Originally Posted by Dade
Another note: in the past, storing linear data in an OpenCL image buffer rather than in an OpenCL linear buffer was an effective way to improve performance. This optimization was quite annoying to code, too.
This is probably because read-only images are always cached. Buffers used read-only would be cached as well, as long as you don't alias pointers. I.e. given "kernel void foo(global float* in, global float* out)", if the same memory object were bound to both "in" and "out", then "in" would not be cached.
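The aliasing rule can be modelled roughly as below. This is a toy model of the behaviour described in the post, not the driver's actual logic; the function name and the string buffer IDs are invented for illustration.

```python
def cacheable_inputs(bindings, written):
    """bindings: kernel argument name -> memory object bound to it.
    written:  names of the arguments the kernel writes through.
    Returns the arguments whose reads could be cached under the
    simplified rule 'read-only and not aliased with a written object'."""
    written_objects = {bindings[name] for name in written}
    return {
        name
        for name, mem_object in bindings.items()
        if name not in written and mem_object not in written_objects
    }

# foo(in, out) with two distinct memory objects: "in" can be cached.
separate = cacheable_inputs({"in": "bufA", "out": "bufB"}, written={"out"})
# The same memory object bound to both arguments: "in" loses caching.
aliased = cacheable_inputs({"in": "bufA", "out": "bufA"}, written={"out"})
print(separate, aliased)
```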

Sorry for the OT, but I thought it was worth explaining.
__________________
I speak only for myself.
Old 20-Mar-2012, 10:34   #3156
UniversalTruth
Former Member
 
Join Date: Sep 2010
Posts: 1,529
Default

lol

AMD Radeon HD 7990 Clock Speeds, Chip-Configuration Surface

Old 20-Mar-2012, 10:53   #3157
Kaotik
Drunk Member
 
Join Date: Apr 2003
Posts: 5,365
Default

Quote:
Originally Posted by UniversalTruth View Post
What's so "lol" about this? It looks like they might actually keep it within the 300W barrier this time, with some binned chips at those clocks.
__________________
I'm nothing but a shattered soul...
Been ravaged by the chaotic beauty...
Ruined by the unreal temptations...
I was betrayed by my own beliefs...
Old 20-Mar-2012, 12:10   #3158
Mianca
Member
 
Join Date: Aug 2010
Posts: 330
Default

Quote:
Originally Posted by Kaotik View Post
What's so "lol" about this?
Well, I chuckled a bit at the notion of 6GB of RAM on one card.

Should come with some kind of "4k Eyefinity ready" sticker ...
Old 20-Mar-2012, 12:44   #3159
Dooby
Junior Member
 
Join Date: Jul 2003
Posts: 478
Default

Well, that's only 3GB per chip, much like all their other CrossFire-on-a-stick cards.

Given that there's already a 6GB card with ONE GPU coming out, 6GB for two GPUs is hardly amazing.
__________________
.טאָ לאָמיר אַלע שפּילן ,אין דרײדל, אײנס און צוויי
webdesign & ecommerce | tv guide
Old 20-Mar-2012, 13:09   #3160
Psycho
Member
 
Join Date: Jun 2008
Location: Copenhagen
Posts: 668
Default

So 2 full Tahitis @ 850 MHz with only 300W TDP?
(and probably a BIOS switch making it a full 7970x2 around 375W)
That sounds pretty efficient.
Old 20-Mar-2012, 13:12   #3161
Love_In_Rio
Senior Member
 
Join Date: Apr 2004
Posts: 1,157
Default

Quote:
Originally Posted by Psycho View Post
So 2 full Tahitis @ 850 MHz with only 300W TDP?
(and probably a BIOS switch making it a full 7970x2 around 375W)
That sounds pretty efficient.
Or two full 7980s (Tahiti at 1/16 rate DP)
Old 20-Mar-2012, 13:22   #3162
denev2004
Member
 
Join Date: Apr 2010
Location: China
Posts: 143
Default

Quote:
Originally Posted by Dooby View Post
Well, that's only 3GB per chip, much like all their other CrossFire-on-a-stick cards.

Given that there's already a 6GB card with ONE GPU coming out, 6GB for two GPUs is hardly amazing.
Exactly. It's just a normal improvement over the previous cards; there's nothing special.
But I wonder whether 850MHz is a bit low.
__________________
Well I'm not a native English speaker so there might be misuse through my words. I just hope it won't cause too much misunderstanding.
Old 20-Mar-2012, 14:04   #3163
tunafish
Member
 
Join Date: Aug 2011
Posts: 406
Default

Quote:
Originally Posted by denev2004 View Post
But I wonder whether 850MHz is a bit low.
Probably necessary for 300W. They are free to provide a beefy enough VRM that overclocking it to 1GHz should be a breeze.
Old 20-Mar-2012, 17:51   #3164
Acert93
Artist formerly known as Acert93
 
Join Date: Dec 2004
Location: Seattle
Posts: 7,806
Default

Where is the 300W figure coming from?
__________________
"In games I don't like, there is no such thing as "tradeoffs," only "downgrades" or "lazy devs" or "bugs" or "design failures." Neither do tradeoffs exist in games I'm a rabid fan of, and just shut up if you're going to point them out." -- fearsomepirate
Old 20-Mar-2012, 18:09   #3165
TKK
Member
 
Join Date: Jan 2010
Posts: 146
Default

Quote:
Originally Posted by Acert93 View Post
Where is the 300W figure coming from?
I wonder the same. I doubt we'll see anything less than a 375W TDP.
Old 20-Mar-2012, 18:15   #3166
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,485
Default

Maybe AMD will reconfigure PowerTune as Boost:
- 925MHz dual GPU boost, when front-end is limiting
- 1100MHz single GPU boost, in cases when AFR is not working, through render-to-texture or similar

Of course, enthusiasts should be able to unlock it to >375W, and an included water cooler would be appropriate if they are aiming at >$800.
Old 20-Mar-2012, 20:46   #3167
Silent_Buddha
Regular
 
Join Date: Mar 2007
Posts: 10,302
Default

If they have the power circuitry on board for this, then it should theoretically be capable of overclocking to the same limits as a 7970.

The clocks are probably just there to keep TDP under a certain level for certification.

Regards,
SB
Old 20-Mar-2012, 22:51   #3168
Zaphod
Remember
 
Join Date: Aug 2003
Posts: 2,097
Default

Quote:
Originally Posted by Acert93 View Post
Where is the 300W figure coming from?
Quote:
Originally Posted by TKK View Post
I wonder the same. I doubt we'll see anything less than a 375W TDP.
Maximum PCIe power spec. If they want the card to be certified, they won't go above 300W @ stock settings.
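As a quick sketch of where the 300W and 375W figures come from: the slot supplies up to 75W, a 6-pin connector 75W and an 8-pin connector 150W. The helper below just adds these up; the function name is made up for illustration, and the connector layouts shown are examples rather than a confirmed 7990 design.

```python
SLOT_W = 75        # PCIe x16 slot
SIX_PIN_W = 75     # each 6-pin auxiliary connector
EIGHT_PIN_W = 150  # each 8-pin auxiliary connector
SPEC_MAX_W = 300   # the ceiling referred to in this post

def board_power_budget(six_pin=0, eight_pin=0):
    """Nominal power available to a card with the given connectors."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W

six_plus_eight = board_power_budget(six_pin=1, eight_pin=1)
two_eight = board_power_budget(eight_pin=2)
print(six_plus_eight, six_plus_eight <= SPEC_MAX_W)
print(two_eight, two_eight <= SPEC_MAX_W)
```

Slot plus 6-pin plus 8-pin lands exactly on 300W, while two 8-pin connectors give the 375W number several posters mention.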
Old 20-Mar-2012, 23:11   #3169
cal_guy
Member
 
Join Date: Jun 2008
Posts: 204
Default

Quote:
Originally Posted by Zaphod View Post
Maximum PCIe power spec. If they want the card to be certified, they won't go above 300W @ stock settings.
That's not really true

Quote:
At the end of the day as the PCI-SIG is a pro-compliance organization as opposed to being a standard-enforcement organization, there’s little to lose for AMD or their partners by not being compliant with the PCIe power specifications. By not having passed compliance testing the only “penalty” for AMD is that they cannot claim the 6990 is PCIe compliant; funny enough they can even use the PCIe logo (we’ve already seen a Sapphire 6990 box with it). So does PCIe compliance matter? For mainstream products PCIe compliance matters for the purposes of getting OEM sales; for everything else including niche products like the 6990, PCIe compliance does not matter.
http://www.anandtech.com/show/4209/a...e-card-king/18
Old 20-Mar-2012, 23:11   #3170
Tchock
Member
 
Join Date: Mar 2008
Location: Jurong West
Posts: 843
Default

Hmm... chances of pushing a 1+GHz 7950 1.5GB out there at $380-400, post-GTX680?

Now's a good time to do a balanced price-perf SKU that doesn't necessarily hurt AMD (1GHz vs full-unit chips, I've no idea, but it'd probably be a good idea), while maintaining almost the same performance gap wrt the 680 (probably 15% or so, at a 25-30% price gap).
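To put rough numbers on that trade-off, the sketch below turns a performance gap and a price gap into relative performance per dollar. The gaps are the guesses from this post, not measurements, and the function name is hypothetical.

```python
def relative_perf_per_dollar(perf_gap, price_gap):
    """Perf-per-dollar of the cheaper card relative to the faster one,
    given fractional gaps (0.15 means 15% slower or 15% cheaper)."""
    if not (0 <= perf_gap < 1 and 0 <= price_gap < 1):
        raise ValueError("gaps must be fractions in [0, 1)")
    return (1 - perf_gap) / (1 - price_gap)

low = relative_perf_per_dollar(0.15, 0.25)   # 15% slower, 25% cheaper
high = relative_perf_per_dollar(0.15, 0.30)  # 15% slower, 30% cheaper
print(round(low, 2), round(high, 2))
```

So under these assumptions such a card would offer roughly 13% to 21% more performance per dollar.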



(I want my quarter-rate DP fix)
__________________
<rpg.314> - I have a feeling that shielding 480 from the evils of afr, embodied in that creation of satan called 5970, will be a part of epic battle between good and evil
<neliz> - The Devil doesn't wear green.
Old 20-Mar-2012, 23:25   #3171
Zaphod
Remember
 
Join Date: Aug 2003
Posts: 2,097
Default

Quote:
Originally Posted by cal_guy View Post
That's not really true
AMD may or may not care about PCIe compliance for a dual GPU halo card, but that's almost certainly where the assumption (that may or may not be warranted) of max 300W comes from (which was the question).
Old 20-Mar-2012, 23:32   #3172
AlphaWolf
Specious Misanthrope
 
Join Date: May 2003
Location: Treading Water
Posts: 8,119
Default

Considering the range of power use for the 7970, I don't find it that hard to believe they could manage 300W; they can use lower voltage and aggressive binning. I also doubt it matters to most of the people that would buy one.
Old 22-Mar-2012, 22:30   #3173
Alexko
Senior Member
 
Join Date: Aug 2009
Posts: 2,809
Default

From Tridam (Hardware.fr):

"AMD nous a indiqué hier avoir l'intention de proposer une version GHz Edition de la Radeon HD 7970, sans préciser la forme exacte que cette déclinaison prendrait."

Which you could translate as:

"Last night, AMD told us that they intend to launch a GHz Edition of the Radeon HD 7970, without specifying exactly what it would be."

I sure hope it's running above 1 GHz, otherwise the 8% overclock will leave most people underwhelmed.
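For reference, the 8% figure follows from the 7970's stock 925MHz clock going to exactly 1GHz. A tiny sketch (the function name is just for illustration):

```python
def overclock_percent(base_mhz, new_mhz):
    """Clock increase as a percentage of the base clock."""
    return (new_mhz - base_mhz) / base_mhz * 100

ghz_edition = overclock_percent(925, 1000)
print(round(ghz_edition, 1))
```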
__________________
"Well, you mentioned Disneyland, I thought of this porn site, and then bam! A blue Hulk." —The Creature
My (currently dormant) blog: Teχlog
Old 22-Mar-2012, 23:22   #3174
DarthShader
Member
 
Join Date: Jul 2010
Location: Land of Mu
Posts: 350
Default

Could be enough, if they can add a Boost on top of that. This could be doable with current hardware, but with a new BIOS. (or not?)
Old 23-Mar-2012, 14:09   #3175
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,439
Default

Quote:
Originally Posted by DarthShader View Post
Could be enough, if they can add a Boost on top of that. This could be doable with current hardware, but with a new BIOS. (or not?)
Could be, but it is a question for each particular board target if it is necessary.
__________________
Radeon is Gaming
Tweet Tweet!

All times are GMT +1.