Emulating pixel shaders?

Zvekan

OK, I have a question: why can't DirectX emulate PS effects?

I know that VS can be emulated quite successfully on a modern CPU, but what is preventing PS emulation? Is it insufficient bandwidth?

Also, how can the DX reference rasterizer on a CPU be so slow compared to VPUs when their MIPS aren't that different? (20 fps vs more than an hour per frame)

Zvekan
 
Pixel shading can't be emulated on a CPU without moving the entire rasterization process, including vertex shading etc., to the CPU. This would obviously be insanely slow compared to a hardware implementation.

Take the latest Radeon 9800XT, for example: it has about 3.2 billion pixels per second of fillrate in theory. To match that, a 3.2GHz P4 would get *one* clock cycle per pixel to complete all the work needed to finish rendering that pixel. Obviously that's nowhere near enough.
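
The arithmetic behind that "one clock cycle per pixel" figure can be written down as a tiny helper. This is only a back-of-the-envelope sketch: the function name `cyclesPerPixel` is made up for illustration, and it ignores everything that matters in practice (memory stalls, overdraw, SIMD).

```cpp
#include <cassert>

// Clock cycles a CPU could spend on each pixel if it had to keep up with
// a given pixel rate. Both inputs are "per second" figures.
double cyclesPerPixel(double cpuClockHz, double pixelsPerSecond)
{
    assert(pixelsPerSecond > 0.0);
    return cpuClockHz / pixelsPerSecond;
}
```

`cyclesPerPixel(3.2e9, 3.2e9)` gives 1.0, the figure quoted above. For comparison, filling a 1024x768 screen exactly once at 60 fps is about 47 million pixels per second, which leaves the same CPU roughly 68 cycles per pixel: more room, but still tight for a shader with texture lookups.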

3D renderers out-OPS a CPU so badly it's not even funny. We'll most likely never see general CPUs catch up to dedicated 3D rendering chips, simply because the tasks they're aimed at are so different.


*G*
 
Bandwidth is more than sufficient.

One of the problems is that CPU instructions are not as powerful as GPU instructions. For example, in the SSE instruction set you can do a 4-component vector addition in 2 clock cycles, but only that, and the result overwrites one of the two operands rather than going to a separate register. The GPU can do it in one clock cycle, plus swizzle and mask components, and write the result to an independent register. On the other hand, GPU floating-point formats are not IEEE-compliant.
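
To make the difference concrete, here is a sketch of what a single shader-style instruction such as `add r0.xy, r1.wzyx, r2` turns into with SSE/SSE2 intrinsics. The instruction is only an illustration, and `addSwizzleMask` and `Float4` are names invented for this example; the intrinsics themselves are the standard ones from `<xmmintrin.h>` and `<emmintrin.h>`, so this needs an x86 compiler.

```cpp
#include <array>
#include <xmmintrin.h>  // SSE:  _mm_add_ps, _mm_shuffle_ps, _mm_and_ps, ...
#include <emmintrin.h>  // SSE2: _mm_set_epi32, _mm_castsi128_ps

using Float4 = std::array<float, 4>;  // components in x, y, z, w order

// Emulates:  add r0.xy, r1.wzyx, r2
// Returns the new value of r0; z and w keep their old contents.
Float4 addSwizzleMask(const Float4& r0, const Float4& r1, const Float4& r2)
{
    __m128 dst = _mm_loadu_ps(r0.data());
    __m128 a   = _mm_loadu_ps(r1.data());
    __m128 b   = _mm_loadu_ps(r2.data());

    // Swizzle .wzyx: reverse the component order of r1.
    __m128 swizzled = _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 1, 2, 3));

    // The addition itself.
    __m128 sum = _mm_add_ps(swizzled, b);

    // Write mask .xy: all bits set in lanes x and y, zero in z and w.
    __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, 0, -1, -1));

    // Merge: take sum where the mask is set, the old r0 elsewhere.
    __m128 merged = _mm_or_ps(_mm_and_ps(mask, sum), _mm_andnot_ps(mask, dst));

    Float4 out;
    _mm_storeu_ps(out.data(), merged);
    return out;
}
```

Leaving the loads and the store aside, that is a shuffle, an add and three logical operations where the GPU needs one instruction.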

The CPU also lacks registers. You have to read/write data from/to memory quite often. And although the cache speeds that up a lot, it still isn't optimal. And before you ask, there's little that can be done about it. AMD has doubled the number of registers, but this is bad for multi-threading since more data has to be stored/restored at every context switch. The GPU doesn't have that disadvantage since registers are managed by hardware and not software.

Another issue is parallelism. The GPU is designed to do 4 or 8 vertices/pixels in parallel. That's possible since nearly everything is independent. So even though the clock speed isn't that high, it processes more data in less time. Current desktop CPUs have absolutely no parallelism, but future forms of Hyper-Threading could introduce it.

[shameless plug]

Anyway, have a look at my signature to see a near-optimal ps 2.0 emulator. Actually it's not an emulator. It compiles ps 2.0 code directly into optimized MMX/SSE instructions. Except for copy propagation and peephole optimization it's as good as it gets.

The reference rasterizer is much slower because it's totally written in C. It 'compiles' the shader into interpreted instructions, much like Java. But unlike swShader it does not use MMX or SSE, and doesn't keep variables in registers. Furthermore, things like texture lookup checks every render state...
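
A rough sketch of the difference between interpreting and compiling a shader, using a made-up three-opcode shader language. None of this is swShader or reference rasterizer code; `Instruction`, `interpret` and `compiled` are hypothetical names, and the "compiled" function is hand-written here to stand in for code that would really be generated at run time.

```cpp
#include <array>
#include <vector>

using Float4 = std::array<float, 4>;

enum class Op { Mov, Add, Mul };

// One instruction of the toy language: dst = src0 (op) src1.
struct Instruction { Op op; int dst, src0, src1; };

// Interpreted execution: every instruction is decoded again for every
// pixel, and all registers live in memory (the array).
void interpret(const std::vector<Instruction>& program, std::array<Float4, 4>& reg)
{
    for (const Instruction& in : program) {
        for (int c = 0; c < 4; ++c) {
            switch (in.op) {
            case Op::Mov: reg[in.dst][c] = reg[in.src0][c]; break;
            case Op::Add: reg[in.dst][c] = reg[in.src0][c] + reg[in.src1][c]; break;
            case Op::Mul: reg[in.dst][c] = reg[in.src0][c] * reg[in.src1][c]; break;
            }
        }
    }
}

// The toy program:  mul r2, r0, r1  followed by  add r2, r2, r3
std::vector<Instruction> sampleProgram()
{
    return { { Op::Mul, 2, 0, 1 }, { Op::Add, 2, 2, 3 } };
}

// What a compiler could emit for that fixed program: straight-line code,
// no decoding, and the optimizer is free to keep values in CPU registers.
Float4 compiled(const Float4& r0, const Float4& r1, const Float4& r3)
{
    Float4 r2;
    for (int c = 0; c < 4; ++c)
        r2[c] = r0[c] * r1[c] + r3[c];
    return r2;
}
```

Both paths produce the same r2. The interpreter pays for the loop and the `switch` on every instruction of every pixel, while the compiled version pays only for the arithmetic.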
 
Nick said:
Anyway, have a look at my signature to see a near-optimal ps 2.0 emulator. Actually it's not an emulator. It compiles ps 2.0 code directly into optimized MMX/SSE instructions. Except for copy propagation and peephole optimization it's as good as it gets.

The reference rasterizer is much slower because it's totally written in C. It 'compiles' the shader into interpreted instructions, much like Java. But unlike swShader it does not use MMX or SSE, and doesn't keep variables in registers. Furthermore, things like texture lookup checks every render state...

Um, little OT - many of the mirrors that are supposed to be hosting the file, don't have it :/
 
Rambler said:
Um, little OT - many of the mirrors that are supposed to be hosting the file, don't have it :/
Yeah, sometimes they do, sometimes they don't. That's why there's more than one, isn't it? ;)

Anyway, the CVS server is quite reliable and you can have a preview of my most recent features...
 
Nick said:
[shameless plug]

Anyway, have a look at my signature to see a near-optimal ps 2.0 emulator. Actually it's not an emulator. It compiles ps 2.0 code directly into optimized MMX/SSE instructions. Except for copy propagation and peephole optimization it's as good as it gets.

The reference rasterizer is much slower because it's totally written in C. It 'compiles' the shader into interpreted instructions, much like Java. But unlike swShader it does not use MMX or SSE, and doesn't keep variables in registers. Furthermore, things like texture lookup checks every render state...

I just downloaded your swShader and it looks very promising, but of course I have many questions :)

Unfortunately I don't have much experience with programming, so I don't quite get how to plug in my own shaders. Do I have to change VShader.txt and PShader.txt?

If that is true, I have to put assembly in txt files? Would it work if I just copy 3DMark03 shaders for example? Or would missing textures for tex. lookup prevent it from functioning properly?

Sorry for my ignorance :),

Zvekan
 
Zvekan said:
Unfortunately I don't have much experience with programming, so I don't quite get how to plug in my own shaders. Do I have to change VShader.txt and PShader.txt?
Yes, currently the teapot is using those shaders.
If that is true, I have to put assembly in txt files? Would it work if I just copy 3DMark03 shaders for example? Or would missing textures for tex. lookup prevent it from functioning properly?
The shader file can have any extension. As long as they follow ps 2.0 specifications most of it 'should' work. I say most of it because some features like fog are not implemented yet. I think 3DMark03 shaders won't work just like that, but you could try it 'line by line'. It would be great if you could let me know if you find some unexpected incompatibilities. It's all still very beta...

To load extra textures, have a look at Main/Application.cpp. With a little bit of C++ knowledge you can extend the texture array, load extra .tga or .jpg files and assign them to texture stages/samplers. The fixed-function pipeline also 'should' be functionally compatible with DirectX 9.
Sorry for my ignorance :)
No problem! Thanks a lot for the interest!
 
Nick said:
The shader file can have any extension. As long as they follow ps 2.0 specifications most of it 'should' work. I say most of it because some features like fog are not implemented yet. I think 3DMark03 shaders won't work just like that, but you could try it 'line by line'. It would be great if you could let me know if you find some unexpected incompatibilities. It's all still very beta...

To load extra textures, have a look at Main/Application.cpp. With a little bit of C++ knowledge you can extend the texture array, load extra .tga or .jpg files and assign them to texture stages/samplers. The fixed-function pipeline also 'should' be functionally compatible with DirectX 9.

I'll try it when I obtain 3DMark03 (I don't have it at home). Thanks for the additional info. Looking at Application.cpp helped a lot.

Now if I understood it correctly, I can specify under setVertexShader and setPixelShader filenames that I want to use (ie. where shader code is)?

model = new Model3DS("Teapot.3ds"); --> can I use any 3ds model?

Also I see that you use two textures. If I want to add more, what must I do? I presume adding texture[2] = new Texture("name.jpg"); and renderer->setTextureMap(2, texture[2]); would just add it but it wouldn't be used for anything.

Since you've spent so much time working on software emulation, you probably have some performance numbers, which would be very interesting. What is the difference in speed between a P4 and an R350? You already specified the reasons why software shading takes so much time, but how does that convert to real numbers?

I already tried Eric Bron's application, which can use over 16 billion vertices, although it runs at under 1 fps on every machine I tried it on.

Sorry if I'm bothering you, and if you feel something is too complicated to respond to, feel free to skip the difficult questions :)

Zvekan
 
Zvekan said:
Now if I understood it correctly, I can specify under setVertexShader and setPixelShader filenames that I want to use (ie. where shader code is)?
Indeed.
model = new Model3DS("Teapot.3ds"); --> can I use any 3ds model?
The 3ds loader is very minimalistic. swShader is a renderer, not an engine. I'm just using it to draw something more interesting than a single triangle. But I wouldn't hope for anything better than a teapot...
Also I see that you use two textures. If I want to add more, what must I do? I presume adding texture[2] = new Texture("name.jpg"); and renderer->setTextureMap(2, texture[2]); would just add it but it wouldn't be used for anything.
Yes. You would also have to make the array bigger in Application.hpp, and best delete it in the destructor Application::~Application().
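
Put together, the three changes (bigger array, extra texture, cleanup in the destructor) look roughly like the sketch below. The `Texture` and `Renderer` types here are minimal stand-ins so that it compiles on its own. They are NOT the real swShader classes, whose members and signatures may differ, and the first two file names are placeholders.

```cpp
#include <string>

// Stand-in for swShader's texture class; the real one loads the image.
struct Texture {
    explicit Texture(const std::string& file) : file(file) {}
    std::string file;
};

// Stand-in renderer that only remembers which texture sits on which stage.
struct Renderer {
    static const int STAGES = 16;          // assumed number of sampler stages
    const Texture* stage[STAGES] = {};
    void setTextureMap(int index, const Texture* t) { stage[index] = t; }
};

class Application {
public:
    Application()
    {
        texture[0] = new Texture("first.tga");   // placeholder name
        texture[1] = new Texture("second.jpg");  // placeholder name
        texture[2] = new Texture("name.jpg");    // the newly added texture
        for (int i = 0; i < TEXTURES; ++i)
            renderer.setTextureMap(i, texture[i]);
    }

    ~Application()
    {
        for (int i = 0; i < TEXTURES; ++i)
            delete texture[i];                   // clean up every entry
    }

    Application(const Application&) = delete;
    Application& operator=(const Application&) = delete;

    Renderer renderer;

private:
    static const int TEXTURES = 3;               // was 2: make the array bigger
    Texture* texture[TEXTURES];
};
```

The third texture is then bound to stage 2, but it only shows up on screen once the pixel shader actually samples from that stage.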
Since you've spent so much time working on software emulation, you probably have some performance numbers, which would be very interesting. What is the difference in speed between a P4 and an R350? You already specified the reasons why software shading takes so much time, but how does that convert to real numbers?
I don't have an R350. ;) Actually I don't even have hardware that supports ps 2.0 shaders! So I'm just working 'in the dark' and trying to stick as close to the documentation as possible. That's why I would be very grateful if you or anyone else with real ps 2.0 experience could inform me of any strange behaviour.

I don't have real performance numbers yet because I haven't started the real optimization phase yet (although the overall architecture should ensure quite optimal performance). You could guesstimate it by running the application in debug mode, and looking at the generated listing files. As a first indication of performance you could count the number of instructions.
I already tried Eric Bron's application, which can use over 16 billion vertices, although it runs at under 1 fps on every machine I tried it on.
Just transforming vertices can be done extremely quickly with SSE, processing four vertices in parallel. But with vs 2.0 this is not possible because of branching so that they have to be processed separately. This is far less optimal.
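
As an illustration of the "four vertices in parallel" layout, here is a portable sketch that stores four vertices component-wise, so each inner loop over the four lanes corresponds to what one 4-wide SSE instruction would do. `Vertex4`, `Matrix`, `translation` and `transform4` are names made up for this example, and plain loops are used instead of intrinsics to keep it compiler-independent.

```cpp
#include <array>

// Four vertices stored component-wise (structure of arrays):
// lane i of x, y, z and w together forms vertex i.
struct Vertex4 {
    std::array<float, 4> x, y, z, w;
};

// Row-major 4x4 matrix, m[row][column], applied as out = M * v.
using Matrix = std::array<std::array<float, 4>, 4>;

// Helper: a matrix that translates by (tx, ty, tz).
Matrix translation(float tx, float ty, float tz)
{
    Matrix m{};                                   // all zeros
    m[0][0] = m[1][1] = m[2][2] = m[3][3] = 1.0f;
    m[0][3] = tx;
    m[1][3] = ty;
    m[2][3] = tz;
    return m;
}

// Transforms all four vertices at once. Every lane runs exactly the same
// arithmetic, which is why it maps onto SIMD so well.
Vertex4 transform4(const Matrix& m, const Vertex4& in)
{
    auto row = [&](int r) {
        std::array<float, 4> out;
        for (int lane = 0; lane < 4; ++lane)
            out[lane] = m[r][0] * in.x[lane] + m[r][1] * in.y[lane]
                      + m[r][2] * in.z[lane] + m[r][3] * in.w[lane];
        return out;
    };
    Vertex4 result;
    result.x = row(0);
    result.y = row(1);
    result.z = row(2);
    result.w = row(3);
    return result;
}
```

This only works because all four lanes execute the same instruction sequence. Once a shader branches per vertex, the lanes may need different code paths, and the simple layout above no longer applies.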

But I suggest you try doing a few dozen texture samples. I think you might be surprised that performance doesn't drop radically.
Sorry if I'm bothering you, and if you feel something is too complicated to respond to, feel free to skip the difficult questions :)
You are certainly not bothering me. If you have any more questions I'd be happy to answer them!
 
no_way said:
Pixel shaders in software ? Hm ... give http://www.realstorm.com benchmark a try.

I have seen RealStorm, but it doesn't emulate pixel shader effects; instead it uses things that won't be available on graphics cards for at least some time.

Like: Primitives which are not triangles (Spheres, Cylinders, Hyperbols, Planes, Polygons), raytracing and such....

Some things do seem interesting, though, like reflection mapping: a texture which defines per pixel whether there is reflection or not. But I'm not sure whether that qualifies as a pixel shader effect.

Zvekan
 
davepermen said:
uhm, no, realstorm has nothing to do with pixelshading at all, hehe.. :D
Yes, I'm aware that it doesn't do "pixel shading" as defined in HW 3D APIs, but it _does_ quite impressive rasterizing on the CPU at _almost_ interactive framerates.
 
uhm, no. it doesn't perform any rasterizing either. it's a raytracing engine. that's WAY OFF anything gpu's can do (they rasterize..)

btw, realstorm benchmark 2004 is out.. and it looks amazing
 
davepermen said:
it doesn't perform any rasterizing either. it's a raytracing engine. that's WAY OFF anything gpu's can do (they rasterize..)
Isn't "rasterizing" just any process that gets pixels as the final result (doesn't matter whether it's raytracing, scanline rendering or whatnot)?
BTW, by some perverse GPU usage you can get raytracing (Stanford's papers), but, yes, it's not 'natural' to GPUs.
And while we're at it: 'pixel shading' in turn could also be any process that calculates something on pixels. Then, raytracers also do some form of 'pixel shading' (adding reflection ray result and diffuse texture, etc.)
 
i know you can map anything to anything:D done raytracing on hw myself, too.. very limited and non-usable today, but performance is promising. (1500 fps to trace one sphere at 320x240 IS promising, not?:D and that with an "old" radeon9700pro..)

well, rasterizing is actually the scanline thing. it's when you map some geometrical (2d!!) thing onto a raster, and fill out the parts where it fits. that's what scanline based renderers do. they do rasterize. word does, too, for its circles, and polygons, etc. it's essentially a 2d thing and will always be (that's why i don't believe in a future for gpu's in that direction in the long term).
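
a minimal sketch of that "map a 2d shape onto a raster and fill where it fits" idea, for one triangle. it tests every cell centre against the three edges instead of walking scanlines, which is just the shortest way to show it, and it has no proper fill rule for shared edges. `Point`, `edge` and `rasterize` are names made up for this example.

```cpp
#include <string>
#include <vector>

struct Point { float x, y; };

// Signed area test: the sign tells on which side of the directed
// edge a -> b the point p lies (0 means exactly on the edge).
float edge(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Marks every raster cell whose centre lies inside the 2D triangle
// (a, b, c) with '#'; everything else stays '.'.
std::vector<std::string> rasterize(Point a, Point b, Point c, int width, int height)
{
    std::vector<std::string> raster(height, std::string(width, '.'));
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Point p = { x + 0.5f, y + 0.5f };     // centre of the cell
            float e0 = edge(a, b, p);
            float e1 = edge(b, c, p);
            float e2 = edge(c, a, p);
            // Inside if p is on the same side of all three edges
            // (accepts both clockwise and counter-clockwise triangles).
            bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                          (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (inside)
                raster[y][x] = '#';
        }
    }
    return raster;
}
```

everything here happens in 2d: by the time the triangle gets to this loop, the 3d part (transform and projection) is already over.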

pixelshading is generally a simple small function that gets executed per pixel. as only rasterizers generate real "pixels", it only fits there well. raytracers generate 3d intersection points..

in a raytracer, you generally need 2 or 3 shader types: intersection shaders which describe where an object is (equation of a sphere, or whatever object you want), material shaders, which possibly emit reflection/refraction or whatever rays, and finally light shaders which spit out light.
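
as an example of the first type, a sketch of an "intersection shader" for a sphere: it solves the sphere equation along the ray and returns the distance to the nearest hit. `Vec3` and `intersectSphere` are names invented here, not code from realstorm or any other engine, and the ray direction is assumed to be normalized.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// Distance along the ray (origin + t * direction) to the nearest hit with
// the sphere, or -1 if the ray misses it or the sphere lies behind the ray.
// Assumes direction has length 1.
float intersectSphere(Vec3 origin, Vec3 direction, Vec3 center, float radius)
{
    Vec3 oc = sub(origin, center);
    float b = dot(oc, direction);
    float c = dot(oc, oc) - radius * radius;
    float discriminant = b * b - c;       // from |oc + t * direction|^2 = r^2
    if (discriminant < 0.0f)
        return -1.0f;                     // ray misses the sphere
    float root = std::sqrt(discriminant);
    float t = -b - root;                  // nearer of the two solutions
    if (t < 0.0f)
        t = -b + root;                    // ray starts inside the sphere
    return t < 0.0f ? -1.0f : t;
}
```

the material and light shaders then start from the 3d point `origin + t * direction`, not from a pixel.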

about that.

but of course, you can mix all those terms around in much funny ways. all you want in the end is an image..

i wanna see athlon64 benches for realstorm 2004!!!!
 
davepermen said:
i know you can map anything to anything:D done raytracing on hw myself, too.. very limited and non-usable today, but performance is promising. (1500 fps to trace one sphere at 320x240 IS promising, not?:D and that with an "old" radeon9700pro..)

Is it something like this?
 