Yesterday, on a completely different internet connection, your image did show up, while on multiple computers at this connection it didn't. Today it shows up on my computer.
Are you using Chrome?
No, FF and IE. Anyway, I can only guess that my ISP was being creative with something or other, since the problem has disappeared. I had wondered whether Win8CP was the cause, but I ruled that out.
The Radeon 7900 review thread discussed a possible weakness in AMD's design when it came to MRTs and MSAA. Building the G-buffer in various deferred schemes is an area where Nvidia handled things significantly better.
There's no "why" in that discussion, as far as I can tell.The Radeon 7900 review thread discussed a possible weakness in AMD's design when it came to MRTs and MSAA. Building the G-buffer in various deferred schemes is an area where Nvidia handled things significantly better.
To add to this point: if I'm not mistaken, AMD GPUs read a render target (configured as input to a pixel shader) through the TMU data path. Therefore it could actually be a TMU weakness, not one of the ROPs. That would also explain why the performance relation between Pitcairn and Tahiti stays virtually the same with MSAA.
I would think everybody reading a render target in a pixel shader would do so through the TMU data path? A render target should look pretty much like any ordinary texture when accessed in the pixel shader. Maybe it's more likely to have non-full-speed throughput due to an "odd" format, but otherwise what's the difference?
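To make that concrete, here's a rough OpenCL-style sketch (purely illustrative, not code from the thread; all kernel and variable names are made up). Reading previously-rendered data bound as an image looks just like any other image read, and on typical hardware a read like this goes through the texture/TMU path:

// Hypothetical kernel: consuming render-target contents as an
// ordinary image. read_imagef generally goes through the texture
// (TMU) units, which is the data path being discussed above.
constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                         CLK_ADDRESS_CLAMP_TO_EDGE |
                         CLK_FILTER_NEAREST;

kernel void read_rt_as_texture(read_only image2d_t rt,
                               global float4* out,
                               int width)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    // Sampled exactly like a regular texture; the "odd" format case
    // mentioned above would just be a different channel layout here.
    out[y * width + x] = read_imagef(rt, smp, (int2)(x, y));
}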
Going back into the mists of time, CUDA was often (though not always) higher performance reading linearly organised buffers through non-TMU paths rather than through the TMUs. What we could simply be seeing in this scenario is that NVidia's "CUDA-specific" linear access hardware is better than AMD's. It wasn't that long ago that doing linear buffers in OpenCL on AMD was a disaster zone (because it was based upon the vertex fetch hardware) and AMD might still be climbing that curve.
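For contrast with the image read above, the linear-buffer access being described here is just pointer arithmetic on a raw global buffer — again a made-up sketch, names included:

// Hypothetical kernel: the same data consumed as a linear buffer,
// i.e. the non-TMU path discussed above - no sampler, no filtering,
// just address arithmetic on a global pointer.
kernel void read_rt_as_buffer(global const float4* rt,
                              global float4* out,
                              int width)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    out[y * width + x] = rt[y * width + x];
}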
I'd think that GCN's better cache architecture should potentially fix such issues? Though I'm largely missing how any necessary synchronization etc. really works for UAVs...
We've had caching for buffer reads on EG/NI chips for quite a while now. It doesn't help when a buffer is read/write, but there are plenty of buffers that are read-only, so it's still quite beneficial. SI has caching all the time, of course.
Jawed said:
AMD's initial support for UAVs was something of a kludge as far as I can tell - for the multiple UAVs that are required by D3D, it uses an emulation that configures a single physical UAV in hardware and splits it up. Additionally, AMD hardware has severe constraints on the size of a UAV - a common complaint amongst OpenCL programmers is (was?) that it is impossible to allocate a single monster UAV (that is, a linear buffer) to use the majority of graphics memory (e.g. 900MB out of 1GB). There's some kind of hardware/driver restriction that only allows for 50% allocation. Allocating texture memory in OpenCL is less constrained.

There are some reasons for this. First, the GPU's memory pool is split into two regions: CPU-visible and invisible. The CPU-visible region we expose is normally 256MB. This means that you have at most 768MB of contiguous memory on a 1GB card. The way the OpenCL conformance tests are written, you have to be able to allocate a buffer of the maximal size you report, which is sort of impossible to guarantee unless you're conservative. I believe Nvidia only exposes 128MB of CPU-visible memory, so they have a larger contiguous pool to work with. They also may handle memory allocations differently, but we use VidMM and expose two memory pools. Note that I believe we've improved this (memory allocation) behavior recently, but you're still going to have some limits caused by having two memory pools.
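For reference, the figures being described can be inspected through the standard OpenCL host API — a minimal sketch (first GPU device only, error handling omitted; the numbers will vary by driver and card):

#include <stdio.h>
#include <CL/cl.h>

/* Minimal sketch: compare the device's total global memory with the
   largest single allocation it reports. On the AMD stacks discussed
   above, the max allocation sits well below the total, for the
   two-pool reasons described. */
int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong global_mem = 0, max_alloc = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);

    printf("global memory : %llu MB\n", (unsigned long long)(global_mem >> 20));
    printf("max allocation: %llu MB\n", (unsigned long long)(max_alloc >> 20));
    return 0;
}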
My understanding is that if everyone were using 64-bit OSes (and apps) we could expose all the video memory to the CPU and not worry about having separate memory pools, not to mention facilitating faster data uploads in some cases.
Quote:
Maybe it's more likely to have non-full-speed throughput due to an "odd" format, but otherwise what's the difference?

Exactly, that was my idea.
Quote:
At least on the HD5xxx family, you cannot allocate a single OpenCL buffer larger than 128MB (though you can allocate multiple 128MB buffers). I haven't recently verified whether this limit is still present, but I assume so. It is one of the most annoying limitations of AMD's older hardware, and it was a severe one for most OpenCL applications.

If you're using Linux, then the issue is lack of VM support. In Windows we support VM for all EG/NI/SI chips and don't have these issues. Currently, only SI has VM support in Linux.
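A common workaround for a per-buffer ceiling like that was simply to split the data across several smaller buffers. A rough host-side sketch, assuming a valid context; the helper name is hypothetical:

#include <CL/cl.h>

/* Hypothetical sketch of the usual workaround: instead of one huge
   allocation, carve the data into chunks at or below the per-buffer
   ceiling and keep an array of cl_mem handles on the host side. */
enum { CHUNK_BYTES = 128 << 20 };  /* the 128MB limit discussed above */

static size_t alloc_chunked(cl_context ctx, size_t total_bytes,
                            cl_mem chunks[], size_t max_chunks)
{
    size_t n = (total_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    if (n > max_chunks)
        return 0;  /* caller's handle array is too small */
    for (size_t i = 0; i < n; ++i) {
        size_t bytes = (i + 1 < n) ? (size_t)CHUNK_BYTES
                                   : total_bytes - i * (size_t)CHUNK_BYTES;
        /* NULL host pointer, no error checking - sketch only */
        chunks[i] = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, NULL);
    }
    return n;
}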
Dade said:
Another note: in the past, using linear data stored in an OpenCL image buffer was an effective way to improve performance over storing the data in an OpenCL linear buffer. This optimization was quite annoying to code, too.

This is probably because read-only images are always cached. Buffers used read-only would be cached as well, as long as you don't alias pointers. I.e. with "kernel void foo(global float* in, global float* out)", if the same memory object were bound to "in" and "out", then "in" would not be cached.
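As a made-up example of giving the compiler that no-aliasing guarantee: qualifying the arguments with const and restrict promises that the pointers don't overlap, so reads through the input can be treated as read-only (binding the same memory object to both arguments would then be the caller's bug):

// Hypothetical kernel: 'const' plus 'restrict' tells the compiler
// the input cannot alias the output, so reads through 'in' can be
// treated as read-only and cached, per the post above.
kernel void scale(global const float* restrict in,
                  global float* restrict out,
                  float k)
{
    size_t i = get_global_id(0);
    out[i] = k * in[i];
}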
Quote:
What's so "lol" about this?

Well, I chuckled a bit at the notion of 6GB of RAM on one card.