What is the OpenGL equivalent to SM2.0/3.0?

Sigma said:
ATI provides its own compiler, NVIDIA does the same, as does 3DLabs, etc.

Yes, and as I pointed out, not MS. Any performance gap may be made up by the advantages of GLSL, but not having the large MS compiler team working on the optimisation problem is a disadvantage.
 
DeanoC said:
Sigma said:
ATI provides its own compiler, NVIDIA does the same, as does 3DLabs, etc.

Yes, and as I pointed out, not MS. Any performance gap may be made up by the advantages of GLSL, but not having the large MS compiler team working on the optimisation problem is a disadvantage.

Yes, but no one is as interested in optimization as the IHVs, what with the eternal benchmark war and all that. And as far as MS is concerned, it has won the 3D API war (for games at least). I just hope D3D doesn't turn into another IE once they have 95% of the 3D API "market".
 
MS have some very good reasons (the upcoming console war with Sony) for keeping their compiler tech up there with the best.

Don't get me wrong, the NVIDIA and ATI teams are excellent, but I suspect even they would concede that for 'pure' compiler tech, MS is hard to beat.

MS are doing a lot of work at the moment, and given that on one platform they get the GLSL advantage (compiling directly to hardware) for HLSL (and that it's also the DX10 methodology), I expect them to keep producing the best shader compilers for the foreseeable future.

Another technique I wouldn't be surprised to see coming into shader compilers (this is pure guesswork, BTW) is profile-guided optimisation. It's only just hit PC and console for main code, but I can see it being used for shaders soon.
 
DeanoC said:
Another technique I wouldn't be surprised to see coming into shader compilers (this is pure guesswork, BTW) is profile-guided optimisation. It's only just hit PC and console for main code, but I can see it being used for shaders soon.
Would you mind summarizing profile guidance? If you're too busy with actual work (:P), I'll just Google it.
 
DeanoC said:
MS are doing a lot of work at the moment, and given that on one platform they get the GLSL advantage (compiling directly to hardware) for HLSL (and that it's also the DX10 methodology), I expect them to keep producing the best shader compilers for the foreseeable future.

Obviously true. But compiling to the assembly-like bytecode they use today on their PC OSes is suboptimal, because it favours hardware whose implementation has a structure similar to that particular asm.

DeanoC said:
Another technique I wouldn't be surprised to see coming into shader compilers (this is pure guesswork, BTW) is profile-guided optimisation. It's only just hit PC and console for main code, but I can see it being used for shaders soon.
Aren't they doing this already? By manually analyzing workloads and putting shader replacements into drivers :)

Seriously, I think we will see this when SM3 shaders are widespread. PGO, after all, is mostly about getting branches right.

Cheers
Gubbi
 
Inane_Dork: Profile-guided optimisation just means actually using a real run of the code in situ (i.e. the real game, etc.) to find out which bits get run the most, and then optimising for those paths through the code.
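
To make that concrete, here's a minimal CPU-side sketch of the usual two-step workflow (the shade() function is invented for illustration; the flags in the comments are the standard GCC and MSVC ones for their PGO modes):

```cpp
// Typical two-step PGO build (GCC shown; MSVC uses /GL plus /LTCG:PGINSTRUMENT
// for the instrumented link and /LTCG:PGOPTIMIZE for the final one):
//   1. g++ -O2 -fprofile-generate game.cpp -o game    (instrumented build)
//   2. run the game on representative levels to collect the profile data
//   3. g++ -O2 -fprofile-use game.cpp -o game         (optimised rebuild)

// Invented per-fragment routine: the profile tells the compiler the error
// branch is almost never taken, so it lays the common case out as the
// straight-line "hot" path and moves the cold code out of the way.
float shade(float n_dot_l, bool bad_input)
{
    if (bad_input)                              // cold according to the profile
        return 0.0f;                            // pushed out of the hot path
    return n_dot_l < 0.0f ? 0.0f : n_dot_l;     // hot path, kept contiguous
}
```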

Gubbi:
You mean we're not all SM3 by now? ;-) Just playing, ATI fans, just a little joke :)

Yep, PGO is mainly about branching, inlining, etc.
What's particularly interesting for shaders is the limited instruction RAM; most systems store a cache of shaders on chip.
PGO should potentially make the choice of when to favour size over speed (for example, inlining functions) much better, and this should give a good speed increase; I'd expect better than the 20% generally quoted for CPU code, particularly if the PGO can get information on upload times to the GPU, call rates, etc. If I can predict that a shader is called at every pixel, full screen, I can definitely afford to spend iRAM inlining stuff; however, if it's a vertex shader used by 4 vertices, then it's probably not worth potentially kicking out other shaders.

This will be even more important for unified shader architectures.
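
Purely to illustrate the kind of decision described above, here's a hypothetical heuristic a shader compiler could apply once it has that profile data; every name and threshold below is made up:

```cpp
#include <cstdint>

// Hypothetical per-shader profile data a driver could gather.
struct ShaderProfile {
    std::uint64_t invocations_per_frame; // ~2M for a full-screen pixel shader, ~4 for a tiny vertex shader
    std::uint32_t inlined_size;          // instruction slots with everything inlined
    std::uint32_t called_size;           // instruction slots using subroutine calls
};

// Toy heuristic: spend scarce instruction RAM on inlining only when the shader
// is hot enough that the extra size is worth possibly evicting other shaders.
bool should_inline(const ShaderProfile& p, std::uint32_t free_iram_slots)
{
    const bool fits = p.inlined_size <= free_iram_slots;
    const bool hot  = p.invocations_per_frame > 100000;   // invented threshold
    return fits && hot && p.inlined_size > p.called_size;
}
```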
 
DeanoC said:
Yep, PGO is mainly about branching, inlining, etc.
What's particularly interesting for shaders is the limited instruction RAM; most systems store a cache of shaders on chip.
PGO should potentially make the choice of when to favour size over speed (for example, inlining functions) much better.

<snip>
This will be even more important for unified shader architectures.

Another possibility with PGO is to look at texture lookups. If a shader accesses a texture with poor locality, it could choose to use a non-temporal load of the texture, so that the lookup won't evict data from the texture cache that has temporal (or spatial) locality.

It'll be a significant step up in driver (and even hardware) complexity though.
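
As a sketch of that idea only (everything here is invented; real drivers and hardware may expose nothing like it):

```cpp
// Invented profile data: how often a given sampler's fetches hit the texture cache.
struct SamplerProfile {
    double cache_hit_rate;   // 0.0 - 1.0, measured over a profiling run
};

enum class FetchPolicy { Cached, NonTemporal };

// If the profile says a sampler's accesses almost never hit, stream its texels
// around the cache so they stop evicting data that does get reused.
FetchPolicy choose_fetch_policy(const SamplerProfile& s)
{
    return s.cache_hit_rate < 0.05 ? FetchPolicy::NonTemporal   // invented threshold
                                   : FetchPolicy::Cached;
}
```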

Cheers
Gubbi
 
Gubbi said:
If a shader accesses a texture with poor locality, it could choose to use a non-temporal load of the texture, so that the lookup won't evict data from the texture cache that has temporal (or spatial) locality.
AFAIK current texture caches are so small that this is a non-issue.
EDIT: I'm not saying texture caches are not effective, I'm saying texture caches are not built to make sure the hw reuses texels even between batches, imho.
 
nAo said:
Gubbi said:
If a shader accesses a texture with poor locality, it could choose to use a non-temporal load of the texture, so that the lookup won't evict data from the texture cache that has temporal (or spatial) locality.
AFAIK current texture caches are so small that this is a non-issue.
EDIT: I'm not saying texture caches are not effective, I'm saying texture caches are not built to make sure the hw reuses texels even between batches, imho.

But don't you think they will be, going forward? I'd imagine texture caches will increase in size, not so much to ease the associated bandwidth load, but more to lower the latency of texture reads (in particular dependent reads); otherwise GPUs would have to commit *a lot* of logic to keeping instructions in flight to avoid stalls.

The size of such a cache would have to be large enough to hold most (if not all) of the texels used in a scene, something along the lines of the Nintendo GameCube's Flipper with its 1MB texture cache.

Cheers
Gubbi
 
Gubbi said:
The size of such a cache would have to be large enough to hold most (if not all) of the texels used in a scene
This is not what I'd call a cache.
 
nAo said:
Gubbi said:
The size of such a cache would have to be large enough to hold most (if not all) of the texels used in a scene
This is not what I'd call a cache.

Why not? It'll still be demand-loaded on a per-texel basis (or, in the case of compression, per texel block), i.e. I'm not talking about loading an entire texture into on-die RAM (à la Sony's PS2).

If we're talking about a 1:1 ratio of texels to pixels and 3-4 texture layers, that'll be 8M texels in a 1600x1200 resolution scene ... @ 4 bits per texel (with compression) that'll be 4MB of data.
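
The arithmetic spelled out, for what it's worth (same assumptions as above):

```cpp
// Back-of-the-envelope check of the figures above.
constexpr long long pixels      = 1600LL * 1200;   // 1,920,000 pixels
constexpr long long texels      = pixels * 4;      // 4 layers -> 7,680,000 (~8M) texels
constexpr long long bytes       = texels / 2;      // 4 bits/texel -> ~3.8 MB (~4MB)
constexpr long long bytes_per_s = bytes * 60;      // @60 fps -> ~230 MB/s (~240 MB/s if you round to 4MB first)
```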

Cheers
Gubbi
 
Gubbi said:
If we're talking about a 1:1 ratio of texels to pixels and 3-4 texture layers, that'll be 8M texels in a 1600x1200 resolution scene ... @ 4 bits per texel (with compression) that'll be 4MB of data.
I would not waste a lot of transistors for a big cache just to load 4 MBytes (240 Mbytes/s @ 60 fps) of unique data per frame ;)

ciao,
Marco
 
Well after watching you guys discuss stuff way out of my league, I think I picked up a few bits...

OpenGL games like D3 etc. can just use whatever features are on a card, but they need an extension from the card vendor.

D3D games like HL2 or FarCry need to wait for a DX update from M$, but they don't have to worry about which card vendor writes an extension, etc...

Am I correct?
 
XxStratoMasterXx said:
Well after watching you guys discuss stuff way out of my league, I think I picked up a few bits...

OpenGL games like D3 etc. can just use whatever features are on a card, but they need an extension from the card vendor.

D3D games like HL2 or FarCry need to wait for a DX update from M$, but they don't have to worry about which card vendor writes an extension, etc...

Am I correct?
Mostly.

With D3D, they have to worry about which shader versions are supported and what D3D caps bits are set - something which varies between different cards (a caps bit is basically a bit supplied by the driver that specifies whether a given feature is supported or not). The main advantage of the D3D caps bit system over the OpenGL extension system is that if a feature is present, there is ONE standard way to use it, rather than one way per card vendor.
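
For concreteness, a minimal sketch of such a check under D3D9 (assumes an already-created IDirect3D9 interface and the default adapter; the function name and the particular caps tested are just for illustration):

```cpp
#include <d3d9.h>

// Query the driver-supplied caps structure and test a couple of entries:
// one shader-version field and one plain limit.
bool SupportsPs20AndThreeTextures(IDirect3D9* d3d)
{
    D3DCAPS9 caps;
    if (FAILED(d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps)))
        return false;

    const bool ps20   = caps.PixelShaderVersion >= D3DPS_VERSION(2, 0);
    const bool layers = caps.MaxSimultaneousTextures >= 3;
    return ps20 && layers;   // one standard way to ask, whatever the vendor
}
```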
 
So... OpenGL can support features faster, but each card vendor has their own way of doing it (as Carmack mentioned with the render-to-texture pbuffer interface). D3D doesn't support features as fast, but they're one-size-fits-all, right?

Am I right?
 
arjan de lumens said:
The main advantage of the D3D caps bit system over the OpenGL extension system is that if a feature is present, there is ONE standard way to use it, rather than one way per card vendor.

That's only the case when there isn't one widely supported extension for the feature.
Most of the time, you have vendor-specific extensions for a similar feature set, and then an ARB extension that encompasses everything, is well thought out, and is widely supported.
Some vendor-specific extensions can also become widely supported, such as GL_NV_texgen_reflection.

Basically, in DirectX you check caps, in OpenGL you check extensions; in DirectX there's only one way to access the feature, in OpenGL there might be more than one. (And it's likely that one of them is widely supported; just stick to it if you don't want to code for too many extensions.)
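
A rough sketch of the OpenGL side of that check, assuming a current GL context (the helper name is just for illustration):

```cpp
// On Windows, <windows.h> must be included before <GL/gl.h>.
#include <GL/gl.h>
#include <cstring>

// Walk the space-separated extension string the driver reports and match
// whole tokens (a bare strstr() would, e.g., report GL_EXT_texture as present
// whenever GL_EXT_texture3D is).
bool HasExtension(const char* name)
{
    const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    if (!ext)
        return false;

    const std::size_t len = std::strlen(name);
    for (const char* p = std::strstr(ext, name); p; p = std::strstr(p + 1, name)) {
        const bool starts_ok = (p == ext) || (p[-1] == ' ');
        const bool ends_ok   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_ok && ends_ok)
            return true;   // e.g. HasExtension("GL_ARB_fragment_program")
    }
    return false;
}
```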
 
XxStratoMasterXx said:
So... OpenGL can support features faster, but each card vendor has their own way of doing it (as Carmack mentioned with the render-to-texture pbuffer interface). D3D doesn't support features as fast, but they're one-size-fits-all, right?

Am I right?

Yes, though the OpenGL Architecture Review Board makes standard, widely supported extensions for important features.
(ARB_vertex_buffer_object, ARB_fragment_program, ARB_vertex_program, OpenGL Shading Language...)

So it's more that you can access the hardware capabilities very quickly, in a vendor-specific way, and you'll get a standard interface later on.
 
nAo said:
Gubbi said:
If we're talking about a 1:1 ratio of texels to pixels and 3-4 texture layers, that'll be 8M texels in a 1600x1200 resolution scene ... @ 4 bits per texel (with compression) that'll be 4MB of data.
I would not waste a lot of transistors for a big cache just to load 4 MBytes (240 Mbytes/s @ 60 fps) of unique data per frame ;)

Like I said earlier, it's not to save bandwidth, it's to reduce latency for dependent texture reads.

Cheers
Gubbi
 
Ingenu said:
XxStratoMasterXx said:
So... OpenGL can support features faster, but each card vendor has their own way of doing it (as Carmack mentioned with the render-to-texture pbuffer interface). D3D doesn't support features as fast, but they're one-size-fits-all, right?

Am I right?

Yes, though the OpenGL Architecture Review Board makes standard, widely supported extensions for important features.
(ARB_vertex_buffer_object, ARB_fragment_program, ARB_vertex_program, OpenGL Shading Language...)

So it's more that you can access the hardware capabilities very quickly, in a vendor-specific way, and you'll get a standard interface later on.

Ah, so say NVIDIA wanted to show off a new feature in an OpenGL game (let's just say Doom 3, since it's a recent OGL game) that only works with their extension, and they had the code, they could implement it? Then later on the ARB makes a standard?
 
So... OpenGL can support features faster, but each card vendor has their own way of doing it (as Carmack mentioned with the render-to-texture pbuffer interface). D3D doesn't support features as fast, but they're one-size-fits-all, right?

It's fast in the sense that they don't have to wait for MS to put out, say, DX9.0c. But they still do need to put it out. Like the ATI F-buffer. That one took a while; actually, I haven't checked whether they've put out an extension for it yet. Have they?
 