D3D10 Streamout: What kind of hardware?

Jawed

I'm curious about the kind of hardware that'll be used to implement streamout in D3D10 GPUs.

I suppose we have two precursors: Xenos's memexport/backbuffer->frontbuffer copy, and render to vertex buffer (R2VB) in ATI's DX9 GPUs.

In Xenos, a portion of the Backend Central handles memexport. Taking a wild stab in the dark, I'd guess that the same hardware also handles the render-target data produced by the resolve process, i.e. the resolved backbuffer data that needs to be written to the frontbuffer. Both processes seem to require taking a stream of data and putting it into system RAM.

In R2VB, ROPs write directly to memory. I suppose, depending on the quantity of data points written, this may or may not occupy all of the ROPs simultaneously.
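
For context, the API side of streamout in D3D10 looks roughly like this. A minimal sketch of declaring a position-only output stream and binding a streamout target (gsBytecode, gsBytecodeLength and MAX_VERTICES are placeholders; device and shader creation are omitted):

```cpp
// Declare what the geometry shader streams out: one float4 position.
D3D10_SO_DECLARATION_ENTRY soDecl[] = {
    // SemanticName, SemanticIndex, StartComponent, ComponentCount, OutputSlot
    { "SV_POSITION", 0, 0, 4, 0 },
};

ID3D10GeometryShader* gsWithSO = NULL;
device->CreateGeometryShaderWithStreamOutput(
    gsBytecode, gsBytecodeLength,   // compiled GS (placeholder)
    soDecl, 1,                      // one declaration entry
    4 * sizeof(float),              // output stream stride in bytes
    &gsWithSO);

// Buffer that receives the streamed-out vertices; also bindable as a VB.
D3D10_BUFFER_DESC bd = {};
bd.ByteWidth = 4 * sizeof(float) * MAX_VERTICES;
bd.Usage     = D3D10_USAGE_DEFAULT;
bd.BindFlags = D3D10_BIND_STREAM_OUTPUT | D3D10_BIND_VERTEX_BUFFER;
ID3D10Buffer* soBuffer = NULL;
device->CreateBuffer(&bd, NULL, &soBuffer);

UINT offset = 0;
device->SOSetTargets(1, &soBuffer, &offset);
// ...draw; later rebind soBuffer as a vertex buffer and consume it
// with DrawAuto(), without any round-trip to the CPU.
```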

---

So, what kind of hardware configuration is best suited to performing streamout, and what degree of parallelism would suit it?

Is it a bad idea to use a GPU's ROPs to handle streamout? If so, what sort of complexity would a dedicated streamout unit entail? Presuming that streamout and pixel output can occur simultaneously, what kind of demands is streamout going to place on the overall architecture?

What kind of streamout bandwidth will early D3D10 GPUs aim at? Will streamout be a features-first, performance-later runt in the first D3D10 GPUs?

Jawed
 
Just a nit: nVidia cards support R2VB in OpenGL, so it's not exactly new. What's new is the FOURCC exposure technique used in DX9.
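
To illustrate the DX9 trick: as I recall it, support is probed and the extension enabled by smuggling a magic FOURCC through existing D3D9 entry points. The constants and the POINTSIZE render-state mechanism below are from memory, so treat this as a sketch and check ATI's R2VB SDK header for the real definitions:

```cpp
#include <d3d9.h>

// Hypothetical sketch of ATI's FOURCC-based R2VB exposure in D3D9.
const D3DFORMAT R2VB_FOURCC = (D3DFORMAT)MAKEFOURCC('R', '2', 'V', 'B');

bool EnableR2VB(IDirect3D9* d3d, IDirect3DDevice9* device)
{
    // 1. Probe: a supporting driver reports the magic FOURCC as a
    //    valid render-target surface format.
    if (FAILED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                      D3DFMT_X8R8G8B8, D3DUSAGE_RENDERTARGET,
                                      D3DRTYPE_SURFACE, R2VB_FOURCC)))
        return false;

    // 2. Enable: write the FOURCC into a render state the driver
    //    watches for. After this, a render-target texture can be
    //    bound as a vertex stream via further magic values -- no
    //    CPU-side copy involved.
    device->SetRenderState(D3DRS_POINTSIZE, (DWORD)R2VB_FOURCC);
    return true;
}
```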
 
Jawed said:
an honest post
The existence of such a question is evidence of the horrible mess Microsoft, NVIDIA and ATI (please excuse me if I missed any other noteworthy IHVs... not that any others exist in the current landscape) have collaboratively made of the GPU/VPU/what-have-you architecture.

On reasonable hardware, you'd simply have support for pointers and "mov mem,reg" and "mov reg,mem" instructions like the 6502 and 8088 did back in the late 1970s. Of course, they were borrowing well-known mainframe and minicomputer instructions from the early 1960s.

But oh no; graphics companies need (or needed; I hope this is already past tense to some IHVs) arcane dedicated hardware and a huge set of APIs and specifications to do this.

Why?

Because they still think graphics computing is a special case, completely unlike any sane and reasonable model of ordinary computing.
 
Sadly I lack the low-level hardware knowledge to add anything useful, but.... LOL @ Nom De Guerre.

Graphics processing in the theoretical sense has a lot of characteristics that make it very easy to build specialized hardware for (e.g. the pipelined process A->B->C->D and the parallelizable, repetitive, simple operations on pixels). However, the fact that it becomes specialized usually brings other trade-offs, such as making some simple CPU-like tasks less simple (we've already seen that with dynamic branching in SM2/SM3).

I'd be very surprised if you could come up with conclusive evidence to suggest that a more CPU-like and non-specialized architecture is all-round better (performance and cost-effectiveness) than what we currently have and will have.

Jack
 
DemoCoder said:
Just a nit: nVidia cards support R2VB in OpenGL, so it's not exactly new.

It's not really R2VB, more a copy to vertex buffer using PBO, unless there's another technique I don't know of.
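
For reference, the PBO path is just buffer-object aliasing: pack the framebuffer into a buffer bound as GL_PIXEL_PACK_BUFFER, then rebind that same buffer as a vertex source. A minimal sketch, assuming GL 2.1-era entry points (e.g. via GLEW) and width/height taken from the render target:

```cpp
GLuint buf;
glGenBuffers(1, &buf);

// Pack the rendered framebuffer into the buffer object.
glBindBuffer(GL_PIXEL_PACK_BUFFER, buf);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4 * sizeof(GLfloat),
             NULL, GL_STREAM_COPY);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, 0); // 0 = offset into PBO
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// Rebind the very same buffer object as a vertex array source; whether
// the data ever leaves video memory is up to the driver.
glBindBuffer(GL_ARRAY_BUFFER, buf);
glVertexPointer(4, GL_FLOAT, 0, 0); // offset 0 into the buffer
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_POINTS, 0, width * height);
```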
 
Zeross said:
It's not really R2VB, more a copy to vertex buffer using PBO, unless there's another technique I don't know of.

You're right, it copies (I thought I recalled that the copy could be eliminated somehow, but can't remember where), but in any case the copy isn't that expensive. Does R2VB not use a copy? Seems to me that it could be implemented either way by the driver.
 
DemoCoder said:
You're right, it copies (I thought I recalled that the copy could be eliminated somehow, but can't remember where), but in any case the copy isn't that expensive. Does R2VB not use a copy? Seems to me that it could be implemented either way by the driver.

If the output is already properly formatted, it's just a matter of 'renaming' that memory area from render target to vertex input buffer or whatever. And this could be done by the driver just changing a few internal GPU registers.
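
Purely conceptually (this is not any real driver's code, and the struct and register names are made up), the 'rename' amounts to something like this: one allocation backs both views, and binding it as a vertex stream is just a register write:

```cpp
#include <cstdint>
#include <cstddef>

struct GpuAllocation   { uint64_t gpuAddress; size_t bytes; };
struct RenderTarget    { GpuAllocation* mem; uint32_t pitch; uint32_t format; };
struct VertexFetchRegs { uint64_t baseAddress; uint32_t stride; };

// "Rename" the render target as a vertex stream: no data moves,
// only the fetch unit's base address and stride change.
void bindRenderTargetAsVertexStream(VertexFetchRegs& fetch,
                                    const RenderTarget& rt,
                                    uint32_t vertexStride)
{
    fetch.baseAddress = rt.mem->gpuAddress;
    fetch.stride      = vertexStride;
}
```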
 
DemoCoder said:
Does R2VB not use a copy? Seems to me that it could be implemented either way by the driver.

It doesn't. It could of course be implemented that way, but that would be slower. It could still be workable, just like glCopyTexImage2D() was commonly used for render to texture in OpenGL before the days of pbuffers and FBOs.
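
For comparison, that old idiom was simply this (tex, width and height set up elsewhere):

```cpp
// Pre-pbuffer/FBO render-to-texture: draw into the back buffer, then
// copy the result into the currently bound texture.
drawScene();                                // render normally
glBindTexture(GL_TEXTURE_2D, tex);
glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,  // target, level, internal format
                 0, 0, width, height, 0);   // src x, y, size, border
// 'tex' now holds the rendered image for later passes.
```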
 
Nom De Guerre said:
...
On reasonable hardware, you'd simply have support for pointers and "mov mem,reg" and "mov reg,mem" instructions like the 6502 and 8088 did back in the late 1970s. Of course, they were borrowing well-known mainframe and minicomputer instructions from the early 1960s....

But oh no; graphics companies need (or needed; I hope this is already past tense to some IHVs) arcane dedicated hardware and a huge set of APIs and specifications to do this.

Why?

Because they still think graphics computing is a special case, completely unlike any sane and reasonable model of ordinary computing.

Great...;) Show me the ordinary CPU that can render 3D graphics like the current GPUs you're decrying as a "mess" and we'll talk...;) I cannot think of a single one that can even come close--presumably because 3D graphics is indeed quite a special case apart from general CPU technology.

As to APIs and the need for them, haven't you considered the plight of the games software developer? It seems to me that APIs were developed to help that developer rather than hinder him. To borrow your logic and twist it a bit, if APIs weren't helping developers, developers would never have accepted them as a framework and the concept of an API would have failed on the drawing board, wouldn't it?

After all, long before there was 3D there was 2D acceleration, and before that software 2D, and in the beginning, with the model of ordinary computing, 2D graphics were about as ordinary as they could get--for instance, black & white (2-color) monochrome and text-only displays.

What, do you think a return to GPU-less, assembly-language programming is the proper path for the future? I'm not really sure I understand what your point, or complaint, might be...?


I mean, what's wrong with specialization?
 
WaltC said:
Great...;) Show me the ordinary CPU that can render 3D graphics like the current GPUs you're decrying as a "mess" and we'll talk...;)
I said no such thing. I was talking about situations. Graphics IHVs do not live on desert islands.

I now personally understand the term "taken out of context".
 