### Welcome, Unregistered.

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 11-Aug-2004, 08:58 #1 Goragoth Member   Join Date: Feb 2003 Location: NZ Posts: 363 Fast matrix/vector math with SIMD? I'm currently writing a simple software renderer for a university project (rendering a bezier patch basically). Now, I've noticed that I'm doing a lot of matrix/vector multiplication (in the bezier patch tesselator and for all the transform/projection calcualtions for all the vertices). I might download the AMD analyzer program to see if that really is a big bottleneck but I suspect so (especially since I'm not using indexed triangle lists at the moment, so lots of duplicate vertices - I'll fix that soon though). So I'm wondering if it is wise to use SSE for the matrix/vector multiplication? Does anybody have some code for that they might like to share, because unfortunately I know close to no assembly and looking at the Intel SSE documentation wasn't all that helpful. If its too complex I'll skip it and just try to optimize what I can but since I've heard that using SSE can double the speed of these sorts of parallel calculations I would think it worth a shot.
 11-Aug-2004, 09:44 #2 Simon F Tea maker   Join Date: Feb 2002 Location: In the Island of Sodor, where the steam trains lie Posts: 4,396 Out of curiousity, what method are you using to do the tessellation? If it's recursive subdivision then, IIRC, the subdivision matrix contains a lot of very simple constants so using a matrix multiply might be a bit of a brute force approach. Why not take a look at the code in Watt&Watt's "Advanced Animation and Rendering Techniques"? There are other approaches, of course, but the recursive subdiv' algorithm is a simple one to start with and would be challenging enough for a Uni' project. __________________ "Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
 11-Aug-2004, 10:01 #3 Nick Senior Member   Join Date: Jan 2003 Location: Ottawa, Ontario Posts: 1,783 Like Simon F sais, you should first look for the most optimal algorithm. I can imagine that since you're not using indexed vertices, some are re-used up to eight times. So even though assembly can help, just using an efficient algorithm can yield much more impressive results. Anyway, I'd be glad to help you out with the SIMD code, I just first want to make sure you've got all other optimizations implemented first. It happens a lot that assembly optimizations become useless when the algorithm is improved. So try to work from top to bottom as much as possible. Good luck!
 11-Aug-2004, 10:06 #4 Goragoth Member   Join Date: Feb 2003 Location: NZ Posts: 363 Right now its very brute force with matrix multiplication because that's what we got taught and it seemed easy. The book we are using is Foley/Van Dam and that explains it the same way. I was looking at an old gamedev mag article on central differencing though, that approach looks interesting and I might experiment with it. No matter what though each vertex needs to be projected and that's a lot of multiplies (Matrix * Vector). I figured it wouldn't be all that hard to load 4 floats in one SSE register, another set in another register and then do a combined multiply, from what I've read on SIMD that's exactly the kind of thing it is for or am I completely wrong? It's not that important, there are probably lots of other ways of optimizing my code that will make more of a difference, I just thought it would be easy, sort of three or four lines of inline assembly.
 11-Aug-2004, 11:04 #5 Scali Naughty Boy!   Join Date: Nov 2003 Posts: 2,127 You could use the D3DX library from the DirectX SDK. It contains matrix/vector/quaternion functions which are optimized for x87, 3dnow!, SSE and SSE2. The best path for the CPU is automatically selected at runtime.
11-Aug-2004, 12:15   #6
Simon F
Tea maker

Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,396

Quote:
 Originally Posted by Goragoth Right now its very brute force with matrix multiplication because that's what we got taught and it seemed easy. The book we are using is Foley/Van Dam and that explains it the same way.
So you're using...:
Code:
```[ 8 0 0 0]
[ 4 4 0 0] * 1/8
[ 2 4 2 0]
[ 1 3 3 1]```

You can reformulate that to save a LOT of calculations. I really suggest you take a look at Watt &amp; Watt.

Quote:
 I was looking at an old gamedev mag article on central differencing though, that approach looks interesting and I might experiment with it.
I guess that's Clark's method. That's also described in Watt&amp;Watt. It makes trying to decide when to terminate due to flatness cheaper as well.

Quote:
 No matter what though each vertex needs to be projected and that's a lot of multiplies (Matrix * Vector).
Ahhhh. The projection matrix +division is very simple so doesn't really cost much.

The local->world matrix, OTOH is more expensive but subdivision is a linear operation... so you can apply your transform/projection matrix to the original control points before subdividing. Does that make you happier?
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay

 11-Aug-2004, 13:22 #7 Dio Senior Member   Join Date: Jul 2002 Location: UK Posts: 1,758 Man overboard?
11-Aug-2004, 13:38   #8
Goragoth
Member

Join Date: Feb 2003
Location: NZ
Posts: 363

Quote:
 So you're using...: Code: [ 8 0 0 0] [ 4 4 0 0] * 1/8 [ 2 4 2 0] [ 1 3 3 1] Question
Ok, to be exact I'm doing x(s,t) = S * Mb *Gbx * Mbt * Tt
where Mb =
Code:
```[ -1  3 -3  1 ]
[  3 -6  3  0 ]
[ -3  3  0  0 ]
[  1  0  0  0 ]```
Mbt is its transpose (itself) and S and T are [s^3, s^2, s, 1 ] and [t^3, t^2, t, 1 ]

Gbx is the geometry matrix of all the x values of the control points. The middle part (Mb * Gbx * Mbt) is the same for all points during each tesselation so I compute that before a huge loop through s and t. It is very slow but Foley/Van Dam doesn't cover anything else and I don't have access to Watt &amp; Watt although I'll have a look if there is a copy at the uni library and if they don't have it I might order it because it sounds really worthwhile.

Quote:
 Ahhhh. Very Happy The projection matrix +division is very simple so doesn't really cost much. The local->world matrix, OTOH is more expensive but subdivision is a linear operation... so you can apply your transform/projection matrix to the original control points before subdividing. Does that make you happier? Smile
I see what you are saying but I have several matrices for the final projection that get concatonated together:
view transform - translates the scene down z and rotates around x and y.
projection - projects the points
viewport transform - moves all the points so that world coordinates of 0,0 match to the center of the view window rather than 0,0 on the view window and also scales the scene.

Now when I go to redraw the scene I apply those to each of the points without re-tesselating the bezier surface. I also don't really want to couple tesselation and projection because of how I've structured my code.
At maximum tesselation (four divisions of the control mesh) I worked out there are 13824 verticies (indexing would make a huge difference, I know and will implement that soon!) which makes that many vector*matrix operations and each of those is 16 multiplies and 12 adds if my maths is right so that's a lot of calculations going on there, plus each point needs to be homogonized. And I'm not doing any lighting calculations yet. Although I was thinking of computing lighting right after tesselating so that when the user spins the view around the patch it doesn't need to be calculated every frame.

So there's it really. Any suggestions on where I should optimize are welcome.

Looking at the swShader source has actually helped a lot, especially when I was creating my vector and matrix classes, I just wanted to say thank you Nick.

11-Aug-2004, 14:35   #9
Simon F
Tea maker

Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,396

Quote:
 Originally Posted by Goragoth Ok, to be exact I'm doing x(s,t) = S * Mb *Gbx * Mbt * Tt where Mb = Code: ```[ -1 3 -3 1 ] [ 3 -6 3 0 ] [ -3 3 0 0 ] [ 1 0 0 0 ]``` Mbt is its transpose (itself) and S and T are [s^3, s^2, s, 1 ] and [t^3, t^2, t, 1 ]
Oh no! That's merely the common way to succinctly specify the equations - you should not use that to evaluate the curve. It's inefficient and mathematically unstable.
Quote:
 I don't have access to Watt & Watt although I'll have a look if there is a copy at the uni library
Go and look. It is worth your time.

Quote:
 I see what you are saying but I have several matrices for the final projection that get concatonated together:
As I would expect. Just concatenate all of them except for the final projection to cannonical clip/Projection coordinates.

Quote:
 Now when I go to redraw the scene I apply those to each of the points without re-tesselating the bezier surface.
OK, if you are only doing the tessellation once but changing the viewpoint frequently, then it's not worth doing.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay

12-Aug-2004, 03:00   #10
Goragoth
Member

Join Date: Feb 2003
Location: NZ
Posts: 363

Quote:
 Oh no! That's merely the common way to succinctly specify the equations - you should not use that to evaluate the curve. It's inefficient and mathematically unstable.
Well, I knew it was bad but I'm definatively going to look for another way to do tesselation now
I got out the Watt &amp; Watt book from the library and had a look over that. It looks mildly complicated but I'll try to implement the central differencing as soon as I have indexed lists, simple lighting and zbuffering working.

If anyone is interested the app so far can be had here. Its about 320kb zipped up (the size is mostly from using wxWindows for the graphics). Be sure to read the readme for the commands.

 Thread Tools Display Modes Linear Mode

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is Off Forum Rules
 Forum Jump User Control Panel Private Messages Subscriptions Who's Online Search Forums Forums Home News Forums     Beyond3D News         Press Releases         Beyond3D Articles Core Forums     3D Architectures & Chips         3D Beginner's Questions     3D Technology & Algorithms         3D Programming & Tools     3D Hardware, Software & Output Devices         Video Technology, Displays, & HTPC     3D & Semiconductor Industry     GPGPU Technology & Programming Embedded 3D Forums     Console Forum         Console Technology         Console Games             PC Games     Handheld Gaming     Handheld Technology     CellPerformance@B3D PC Forums     Hardware & Software Talk         Politics & Ethics of Technology         Unix, Mac, & BSD (3D)     Processor & Chipset Technology     Purchase Decisions Help     PC Games         Console Games Site Forums     General Discussion     Folding For Beyond3D Team #32377     Industry Jobs     Site Feedback Beyond3D Hall of Fame     Pre-release GPU Speculation     General 3D Technology     Consoles     Other

 Similar Threads Thread Thread Starter Forum Replies Last Post Jawed 3D Architectures & Chips 9 24-Jun-2005 21:50 Colourless 3D Architectures & Chips 20 08-May-2003 04:18 megadrive0088 3D Architectures & Chips 35 18-Feb-2003 12:36 Panajev2001a 3D Architectures & Chips 64 24-Dec-2002 23:38 marco Press Releases 0 19-Apr-2002 07:12

All times are GMT +1. The time now is 01:07.

 -- vB3D -- vBulletin Default Style Contact Us - Beyond3D - Archive - Top