If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.
![]() |
|
|
#1 |
|
Member
Join Date: Feb 2003
Location: NZ
Posts: 363
|
I'm currently writing a simple software renderer for a university project (rendering a bezier patch basically). Now, I've noticed that I'm doing a lot of matrix/vector multiplication (in the bezier patch tesselator and for all the transform/projection calcualtions for all the vertices). I might download the AMD analyzer program to see if that really is a big bottleneck but I suspect so (especially since I'm not using indexed triangle lists at the moment, so lots of duplicate vertices - I'll fix that soon though).
So I'm wondering if it is wise to use SSE for the matrix/vector multiplication? Does anybody have some code for that they might like to share, because unfortunately I know close to no assembly and looking at the Intel SSE documentation wasn't all that helpful. If its too complex I'll skip it and just try to optimize what I can but since I've heard that using SSE can double the speed of these sorts of parallel calculations I would think it worth a shot. |
|
|
|
|
|
#2 |
|
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,396
|
Out of curiousity, what method are you using to do the tessellation? If it's recursive subdivision then, IIRC, the subdivision matrix contains a lot of very simple constants so using a matrix multiply might be a bit of a brute force approach. Why not take a look at the code in Watt&Watt's "Advanced Animation and Rendering Techniques"?
There are other approaches, of course, but the recursive subdiv' algorithm is a simple one to start with and would be challenging enough for a Uni' project.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
|
|
|
|
|
#3 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Like Simon F sais, you should first look for the most optimal algorithm. I can imagine that since you're not using indexed vertices, some are re-used up to eight times. So even though assembly can help, just using an efficient algorithm can yield much more impressive results.
Anyway, I'd be glad to help you out with the SIMD code, I just first want to make sure you've got all other optimizations implemented first. It happens a lot that assembly optimizations become useless when the algorithm is improved. So try to work from top to bottom as much as possible. Good luck! |
|
|
|
|
|
#4 |
|
Member
Join Date: Feb 2003
Location: NZ
Posts: 363
|
Right now its very brute force with matrix multiplication because that's what we got taught and it seemed easy. The book we are using is Foley/Van Dam and that explains it the same way. I was looking at an old gamedev mag article on central differencing though, that approach looks interesting and I might experiment with it.
No matter what though each vertex needs to be projected and that's a lot of multiplies (Matrix * Vector). I figured it wouldn't be all that hard to load 4 floats in one SSE register, another set in another register and then do a combined multiply, from what I've read on SIMD that's exactly the kind of thing it is for or am I completely wrong? It's not that important, there are probably lots of other ways of optimizing my code that will make more of a difference, I just thought it would be easy, sort of three or four lines of inline assembly. |
|
|
|
|
|
#5 |
|
Naughty Boy!
|
You could use the D3DX library from the DirectX SDK. It contains matrix/vector/quaternion functions which are optimized for x87, 3dnow!, SSE and SSE2. The best path for the CPU is automatically selected at runtime.
|
|
|
|
|
|
#6 | |||
|
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,396
|
Quote:
Code:
[ 8 0 0 0] [ 4 4 0 0] * 1/8 [ 2 4 2 0] [ 1 3 3 1] You can reformulate that to save a LOT of calculations. I really suggest you take a look at Watt & Watt. Quote:
Quote:
The local->world matrix, OTOH is more expensive but subdivision is a linear operation... so you can apply your transform/projection matrix to the original control points before subdividing. Does that make you happier?
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
|||
|
|
|
|
|
#7 |
|
Senior Member
Join Date: Jul 2002
Location: UK
Posts: 1,758
|
Man overboard?
|
|
|
|
|
|
#8 | ||
|
Member
Join Date: Feb 2003
Location: NZ
Posts: 363
|
Quote:
where Mb = Code:
[ -1 3 -3 1 ] [ 3 -6 3 0 ] [ -3 3 0 0 ] [ 1 0 0 0 ] Gbx is the geometry matrix of all the x values of the control points. The middle part (Mb * Gbx * Mbt) is the same for all points during each tesselation so I compute that before a huge loop through s and t. It is very slow but Foley/Van Dam doesn't cover anything else and I don't have access to Watt & Watt although I'll have a look if there is a copy at the uni library and if they don't have it I might order it because it sounds really worthwhile. Quote:
view transform - translates the scene down z and rotates around x and y. projection - projects the points viewport transform - moves all the points so that world coordinates of 0,0 match to the center of the view window rather than 0,0 on the view window and also scales the scene. Now when I go to redraw the scene I apply those to each of the points without re-tesselating the bezier surface. I also don't really want to couple tesselation and projection because of how I've structured my code. At maximum tesselation (four divisions of the control mesh) I worked out there are 13824 verticies (indexing would make a huge difference, I know and will implement that soon!) which makes that many vector*matrix operations and each of those is 16 multiplies and 12 adds if my maths is right so that's a lot of calculations going on there, plus each point needs to be homogonized. And I'm not doing any lighting calculations yet. Although I was thinking of computing lighting right after tesselating so that when the user spins the view around the patch it doesn't need to be calculated every frame. So there's it really. Any suggestions on where I should optimize are welcome. Looking at the swShader source has actually helped a lot, especially when I was creating my vector and matrix classes, I just wanted to say thank you Nick. |
||
|
|
|
|
|
#9 | ||||
|
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,396
|
Quote:
Quote:
Quote:
Quote:
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
||||
|
|
|
|
|
#10 | |
|
Member
Join Date: Feb 2003
Location: NZ
Posts: 363
|
Quote:
I got out the Watt & Watt book from the library and had a look over that. It looks mildly complicated but I'll try to implement the central differencing as soon as I have indexed lists, simple lighting and zbuffering working. If anyone is interested the app so far can be had here. Its about 320kb zipped up (the size is mostly from using wxWindows for the graphics). Be sure to read the readme for the commands. |
|
|
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Vertex Shaders: MIMD versus SIMD | Jawed | 3D Architectures & Chips | 9 | 24-Jun-2005 21:50 |
| Fast writes, they are actually useful!?! | Colourless | 3D Architectures & Chips | 20 | 08-May-2003 04:18 |
| Nvidia CEO: working on something twice as fast? | megadrive0088 | 3D Architectures & Chips | 35 | 18-Feb-2003 12:36 |
| specialized HW is fast, but new CPUs are fast as well... | Panajev2001a | 3D Architectures & Chips | 64 | 24-Dec-2002 23:38 |
| VIA Launches LAN-on-Motherboard Fast Ethernet Controller | marco | Press Releases | 0 | 19-Apr-2002 07:12 |