How many of 'em are developing with D3D10?

poly-gone

Newcomer
Just curious: how many game developers out there are (presumably) developing a pure D3D10 engine? That way they could really leverage the full power of D3D10 without having to scale across various hardware. I think Unreal Engine 4 could be a potential candidate. Any ideas?
 
poly-gone said:
Just curious: how many game developers out there are (presumably) developing a pure D3D10 engine? That way they could really leverage the full power of D3D10 without having to scale across various hardware. I think Unreal Engine 4 could be a potential candidate. Any ideas?
I would imagine it'll only be the R&D guys that are working on this properly. It's too early to be putting any heavyweight resources in (unless you are one of the huge AAA shops - e.g. Epic/Unreal).

The next-generation CryEngine has been shown with a D3D10 path, but with no indication as to whether it was pure D3D10 or merely used it. I forget the name, but there's a presentation at this year's GDC revealing some WIP of a D3D10 title - details are suitably sketchy on that one as well.

Although some people have had D3D10 a lot longer than it's been public, the fact that there is no hardware (that I know of!) available makes development somewhat challenging. Yes, you can develop on the REFRAST, but realistic expectations of the HW are few and far between. The only one I've seen recently is that NV/ATI recommend the GS should only emit a "few dozen" primitives out of the 1024 possible...

Jack
 
poly-gone said:
Just curious: how many game developers out there are (presumably) developing a pure D3D10 engine? That way they could really leverage the full power of D3D10 without having to scale across various hardware. I think Unreal Engine 4 could be a potential candidate. Any ideas?

Well, none that you'd care about, even in the mid-term. Since D3D10 will require Vista, no one would be crazy enough to release a game for such a small segment of the market, not to mention the lack of D3D10 hardware.
 
I agree with Mordenkainen. I think Vista will need to get popular before Direct3D 10 becomes popular, though I doubt that will take very long since it will be bundled with every vanilla PC sold.
 
JHoxley said:
The only one I've seen recently is that NV/ATI recommend the GS should only emit a "few dozen" primitives out of the 1024 possible...

Jack
Do you remember where you saw them comment on GS performance? And it's 1024 verts, not primitives AFAIK.
 
Mordenkainen said:
Well, none that you'd care about, even in the mid-term. Since D3D10 will require Vista, no one would be crazy enough to release a game for such a small segment of the market, not to mention the lack of D3D10 hardware.

By the time low-end D3D10 hardware has a sufficiently large user base (that is, 3-4 years from now), the Vista requirement might not be an issue...
Until then, the only pure engines will be in tech demos and benchmarks.

But I'm sure some developers will start to work on some of those engines as soon as they have hardware.
 
3dcgi said:
Do you remember where you saw them comment on GS performance? And it's 1024 verts, not primitives AFAIK.
Forget my vert comment - with a tri strip you can essentially have one vert per primitive.
 
Does anyone know if D3D10 has any way of specifying that the processing order of primitives (or even batches of primitives) is unimportant?
I.e. can I tell the GPU that for N instances of some object, the order in which the instances (and the triangles within an instance) are rasterized or shaded is completely unimportant?

Basically I'm wondering how parallel the implementation of the GS is going to be - as it stands, it looks like the GS will be operating on a single primitive at a time - is this anywhere near correct?

Serge
 
psurge said:
Basically I'm wondering how parallel the implementation of the GS is going to be - as it stands, it looks like the GS will be operating on a single primitive at a time - is this anywhere near correct?

Yes, the GS program operates on a single primitive and can output multiple primitives.
It can, however, use information from adjacent vertices if adjacency information is provided (that means more indices per primitive).

OTOH, I don't see why multiple primitives couldn't be run in parallel with appropriate buffering of the output.
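As a minimal sketch of that model (D3D10 HLSL; identifiers are illustrative, not from any shipping engine): a single geometry shader invocation receives one triangle with adjacency - six vertices, i.e. twice the indices of a plain triangle - and emits the central triangle back out as a strip.

```hlsl
struct GSInput  { float4 pos : SV_POSITION; };
struct GSOutput { float4 pos : SV_POSITION; };

// One invocation per input primitive. triangleadj supplies the three
// triangle vertices plus the three adjacent vertices (6 indices total).
[maxvertexcount(3)]
void PassThroughGS(triangleadj GSInput input[6],
                   inout TriangleStream<GSOutput> stream)
{
    // Vertices 0, 2, 4 are the triangle itself; 1, 3, 5 are adjacency.
    for (int i = 0; i < 6; i += 2)
    {
        GSOutput v;
        v.pos = input[i].pos;
        stream.Append(v);
    }
    stream.RestartStrip();
}
```

A real shader would presumably do something with the adjacent vertices (silhouette detection, fin extrusion, etc.); this only shows the single-primitive-in, multiple-vertices-out shape of the stage.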
 
psurge said:
Does anyone know if D3D10 has any way of specifying that the processing order of primitives (or even batches of primitives) is unimportant?
I.e. can I tell the GPU that for N instances of some object, the order in which the instances (and the triangles within an instance) are rasterized or shaded is completely unimportant?

Basically I'm wondering how parallel the implementation of the GS is going to be - as it stands, it looks like the GS will be operating on a single primitive at a time - is this anywhere near correct?

Serge
AFAIK, no such mechanism currently exists. And even if it did, it would run into situations with non-deterministic rendering results, which are a nightmare for both hardware and game developers to deal with.

Even with the requirement to emulate strictly serial operation, it isn't really that hard to parallelize GS operation: there are standard parallel-computing algorithms/data structures that can handle both parallel generation and fast traversal of variable-length data structures. The main complication is that the number of polygons the GS can output per input polygon is large; if that number is N (1024 has been mentioned in this thread) and the GS has M threads, then you need to dynamically buffer a section of the generated data with at least N*M polygons (e.g. N = 1024 and M = 16 already means buffering 16K polygons), which implies that you pretty much have to be able to use off-chip memory to buffer some of the polygons in the more pathological cases.
 
I get the impression that using off-chip memory is the only way this will work, which is where Stream Out (memexport) comes in.

As I understand it, the GPU is generating a replacement vertex buffer, based on the buffer given to it. Indeed I think the idea is that the GPU generates multiple buffers.

Or, at the very least, the GPU generates an additional buffer to be read back in on a second pass at the same time as the given buffer.

Jawed
 
Jawed said:
I get the impression that using off-chip memory is the only way this will work, which is where Stream Out (memexport) comes in.
Off-chip buffers will likely be required for the Geometry Shader to ever reach reasonable performance at a reasonable cost (because of the severe lack of 1:1 correspondence between input and output, plus the desire to parallelize operation), but there is no need to expose the off-chip buffers/traffic as an API-visible feature, as memexport would be.
As I understand it, the GPU is generating a replacement vertex buffer, based on the buffer given to it. Indeed I think the idea is that the GPU generates multiple buffers.
The GS output data set can presumably be considered as one vertex buffer generated per input polygon, which is then read back in later (by a 2nd-pass vertex shader or triangle setup or whatever).
 
When writing a GS you have to specify how much data you're going to write out (or, more specifically, a maximum), which is used to allocate a correctly sized buffer.

The maximum number of 32-bit values that can be output from a geometry shader is 1024, so if you write out a plain XYZ position per vertex you can only get ~340 vertices.
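To make that budget concrete, here's a hedged HLSL sketch (identifiers are mine, not from the thread): the declared [maxvertexcount], multiplied by the output vertex size in 32-bit components, has to fit within the 1024-scalar limit.

```hlsl
struct Pos3 { float3 pos : POSITION; };  // 3 x 32-bit components per vertex

// 341 vertices * 3 components = 1023 <= 1024, so this is about the most
// float3-only vertices one invocation can declare;
// [maxvertexcount(342)] would exceed the 1024-scalar output budget.
[maxvertexcount(341)]
void EmitManyGS(point Pos3 input[1], inout PointStream<Pos3> stream)
{
    for (int i = 0; i < 341; ++i)
    {
        Pos3 v;
        v.pos = input[0].pos + float3(i, 0.0f, 0.0f);
        stream.Append(v);
    }
}
```

Add a second attribute (say a float4 colour) and the ceiling drops accordingly, since it's the total scalar count that's capped, not the vertex count.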

hth
Jack
 
psurge said:
Does anyone know if D3D10 has any way of specifying that the processing order of primitives (or even batches of primitives) is unimportant?
I.e. can I tell the GPU that for N instances of some object, the order in which the instances (and the triangles within an instance) are rasterized or shaded is completely unimportant?

Basically I'm wondering how parallel the implementation of the GS is going to be - as it stands, it looks like the GS will be operating on a single primitive at a time - is this anywhere near correct?

Serge
It's just like a vertex shader, only with primitives. So in situations where vertices can be processed in parallel, primitives can be processed in parallel.
 
3dcgi said:
It's just like a vertex shader, only with primitives. So in situations where vertices can be processed in parallel, primitives can be processed in parallel.
Not quite. In both cases you need to buffer the data at the output of the unit so that data order can be restored before it is presented to the next unit in the pipeline. There are two differences that make the geometry shader much harder to parallelize in practice:
  • The vertex shader outputs a fixed-size data set, while the geometry shader outputs a variable-size data set. This introduces problems with parallelizing the processing of geometry shader output that have no equivalent with vertex shaders.
  • The vertex shader dumps all its data at once at the end of the program, whereas the geometry shader can dump data continuously during the entire program invocation. (You can maintain vertex order out of a vertex shader array extremely cheaply by simply not retiring a shader program if doing so would cause out-of-order vertices; a geometry shader array doesn't permit that.)
 
arjan de lumens said:
but there is no need to expose the off-chip buffers/traffic as an API-visible feature, as memexport would be.
I think that would be rather like giving C programmers malloc(), but taking away pointers.

Jawed
 
Jawed said:
I think that would be rather like giving C programmers malloc(), but taking away pointers.
Why use stream out only to read it back into the vertex shader manually, when the pipeline can simply do all the work for you automatically?
 