optimized by post-T&L vertex cache

ultrafly · Feb 20, 2003

Hi.

I am using the method described by Mike Abrash (<<Xbox Vertex Performance>>), but I have found some strange things.

I assume the post-T&L vertex cache holds 11 vertices on my R9500 Pro 64M.

First, with 20,000 triangles I got two results: the optimized code runs at 255 fps and the unoptimized code at 177 fps. But when I reduce the size of the triangles (without changing the number of triangles), the unoptimized code is faster than the optimized code. Why?

Second, I found that D3DFILL_SOLID is faster than D3DFILL_WIREFRAME. Why?

I am sorry for my poor English.

Dio · Feb 20, 2003

ultrafly said:
Second, I found that D3DFILL_SOLID is faster than D3DFILL_WIREFRAME. Why?

Solid generates fewer primitives than wireframe. For every triangle sent, in wireframe you get 3 lines.

ultrafly · Feb 20, 2003

Why is the unoptimized code faster than the optimized code after I changed the triangle size?

ultrafly · Feb 20, 2003

Can anyone help me? :?:

ultrafly · Feb 20, 2003

My optimization method is:

For example, suppose the post-T&L vertex cache holds 15 vertices.
The triangles:
30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
| \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
| \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
0   1   2   3   4   5   6   7   8   9   10  11  12  13  14

The index order of the optimized code is:
0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0, 0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...

The index order of the unoptimized code is:
0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...
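
Editor's note: the two index orders above follow a regular pattern, so they can be generated for any grid. The following is a minimal Python sketch, assuming a grid `width` vertices wide with row-major numbering as in the diagram. The function names `strip_indices` and `prefilled_indices` are made up for illustration and do not come from the Abrash article or from ultrafly's code.

```python
def strip_indices(width, rows):
    """Plain (unoptimized) order: one triangle strip per row of quads,
    joined by repeating the last index of a row and the first of the next."""
    out = []
    for r in range(rows):
        base = r * width
        row = []
        for i in range(width):
            row += [base + i, base + width + i]  # lower vertex, then upper
        if out:
            out += [out[-1], row[0]]             # degenerate stitch
        out += row
    return out


def prefilled_indices(width, rows):
    """Optimized order as posted: walk the bottom row first through
    degenerate triangles (0, 1, 1, 2, 2, ..., 0), then draw the strips."""
    prefill = [0]
    for i in range(1, width):
        prefill += [i, i]
    prefill.append(0)
    return prefill + strip_indices(width, rows)


# First few indices of each order for the 15-wide grid in the post.
print(strip_indices(15, 2)[:8])
print(prefilled_indices(15, 2)[:8])
```

With `width=15` both functions reproduce the sequences listed in the post up to the point where it trails off.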

Tagrineth · Feb 20, 2003

ultrafly said:
My optimization method is:

For example, suppose the post-T&L vertex cache holds 15 vertices.
The triangles:
30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
| \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
| \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
0   1   2   3   4   5   6   7   8   9   10  11  12  13  14

The index order of the optimized code is:
0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0, 0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...

The index order of the unoptimized code is:
0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...

I'm no software guru, but it looks like you're reusing a LOT of vertices. Why?

The unoptimised is using half as many entries for the same data...

Dio · Feb 20, 2003

The no-optimised order certainly isn't _bad_.

I presume what you're trying to do with the optimal order is to fill in cache entries per line, then use them, then make sure the next line's LRU is updated so they don't get evicted. Problem is your line is too large; you've got twice as many vertices in your cache (if you check, you will see you need 30 vertices for the size you have).

I'm not sure I'd recommend sending that many degenerate triangles (more than 50%). I'd stick with the 'no-optimised order' myself, with a shorter line repeat rate maybe.
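
Editor's note: the remark that the row is too long can be checked with a small simulation. The sketch below assumes a strict LRU replacement policy (which nobody in the thread has confirmed for this hardware) and uses the hypothetical helper names `prefilled_strip` and `lru_transforms`. It counts how many indices miss the cache, that is, how many vertices would have to be transformed.

```python
from collections import OrderedDict


def prefilled_strip(width, rows):
    """Index order from the post: degenerate pre-fill of the bottom row,
    then one stitched triangle strip per row (hypothetical helper)."""
    out = [0] + [i for i in range(1, width) for _ in (0, 1)] + [0]
    for r in range(rows):
        base = r * width
        if r:
            out += [out[-1], base]            # degenerate stitch between rows
        for i in range(width):
            out += [base + i, base + width + i]
    return out


def lru_transforms(indices, cache_size):
    """Count cache misses under an assumed LRU policy."""
    cache = OrderedDict()
    misses = 0
    for idx in indices:
        if idx in cache:
            cache.move_to_end(idx)            # a hit refreshes the entry
        else:
            misses += 1
            cache[idx] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)     # evict the least recently used
    return misses


indices = prefilled_strip(15, 2)              # 45 distinct vertices
print(lru_transforms(indices, 15), lru_transforms(indices, 30))
```

Under this assumption a 30-entry cache transforms each of the 45 vertices once, while a 15-entry cache evicts bottom-row vertices just before they are needed and transforms many of them again.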

ultrafly · Feb 20, 2003

Dio said:
The no-optimised order certainly isn't _bad_.

I presume what you're trying to do with the optimal order is to fill in cache entries per line, then use them, then make sure the next line's LRU is updated so they don't get evicted. Problem is your line is too large; you've got twice as many vertices in your cache (if you check, you will see you need 30 vertices for the size you have).

I'm not sure I'd recommend sending that many degenerate triangles (more than 50%). I'd stick with the 'no-optimised order' myself, with a shorter line repeat rate maybe.

The cache is FIFO, not LRU.

Optimized code:

After
0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0

the cache (15 vertices) is:
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14

After
0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29

the cache (15 vertices) is:
15,16,17,18,19,20,21,22,23,24,25,26,27,28,29

After
15,30,16,31,17,32,18,33,19,34,20,35,21,36,22,37,23,38,24,39,25,40,26,41,27,42,28,43,29,44

the cache (15 vertices) is:
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44

.............................

Every vertex is processed only once.

Unoptimized code:

After
0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29

the cache (15 vertices) is:
22,8,23,9,24,10,25,11,26,12,27,13,28,14,29

When processing the second row:
15,30,16,31,17,32,18,33,19,34,20,35,21,36,22,37,23,38,24,39,25,40,26,41,27,42,28,43,29,44

15: not in the cache, so it is reloaded and processed again
30: not in the cache, so it is loaded and processed
...............

So a vertex can be processed more than once.
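
Editor's note: the walk-through above can be reproduced with a short simulation. This is only a sketch under the poster's own assumption of a 15-entry FIFO cache; the helper names `grid_strip` and `fifo_transforms` are hypothetical, and real hardware may behave differently.

```python
from collections import deque


def grid_strip(width, rows, prefill=False):
    """Index list for the grid in this thread (hypothetical helper).
    With prefill=True the degenerate bottom-row walk is sent first."""
    out = []
    if prefill:
        out = [0] + [i for i in range(1, width) for _ in (0, 1)] + [0]
    for r in range(rows):
        base = r * width
        if r:
            out += [out[-1], base]        # degenerate stitch between rows
        for i in range(width):
            out += [base + i, base + width + i]
    return out


def fifo_transforms(indices, cache_size):
    """Count vertices that miss an assumed FIFO cache."""
    cache = deque(maxlen=cache_size)      # oldest entry drops off the left
    misses = 0
    for idx in indices:
        if idx not in cache:              # a hit leaves the FIFO order untouched
            cache.append(idx)
            misses += 1
    return misses


plain = fifo_transforms(grid_strip(15, 2), 15)
filled = fifo_transforms(grid_strip(15, 2, prefill=True), 15)
print(plain, filled)                      # 60 45 for this 45-vertex grid
```

For the 3-by-15 vertex grid in the example, the plain order transforms 60 vertices and the pre-filled order 45, which matches the claim that every vertex is processed once only in the pre-filled case. The count says nothing about the cost of the extra degenerate triangles themselves.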

Dio · Feb 20, 2003

Where do you get the information that the R9500 Pro cache is FIFO?

ultrafly · Feb 20, 2003

Dio said:
Where do you get the information that the R9500 Pro cache is FIFO?

It is only my assumption. I couldn't find any information about ATI's vertex cache.
The vertex cache on NVIDIA GPUs is FIFO, so I assume ATI's vertex cache is also FIFO.

ultrafly · Feb 20, 2003

Is ATI's vertex cache FIFO or LRU?

Dio · Feb 20, 2003

I wouldn't assume one way or the other... I must admit I don't know myself, maybe one of the other ATI chaps on here might?

ultrafly · Feb 20, 2003

Dio said:
I wouldn't assume one way or the other... I must admit I don't know myself, maybe one of the other ATI chaps on here might?

Thanks for your reply. :idea:

Simon F · Feb 20, 2003

Dio said:
I wouldn't assume one way or the other... I must admit I don't know myself, maybe one of the other ATI chaps on here might?

Probably a FIFO, as that should actually perform better than an LRU (at least according to Hoppe).

ultrafly said:
The index order of the optimized code is:
0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, ...

I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?

ultrafly · Feb 20, 2003

Simon F said:
Dio said:

I wouldn't assume one way or the other... I must admit I don't know myself, maybe one of the other ATI chaps on here might?

Probably a FIFO, as that should actually perform better than an LRU (at least according to Hoppe).

Why is the unoptimized code faster than the optimized code in my project after I changed the triangle size?

Thanks.

Dio · Feb 20, 2003

Simon F said:
I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?

Tristrips.

I think the algorithm's valid, but requires a lot of assumptions about how the hardware works. Personally, I wouldn't use an algorithm of this kind - I'd take a few % inefficiency in exchange for portability, but what do I know?

ultrafly · Feb 21, 2003

Dio said:
Simon F said:

I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?

Tristrips.

I think the algorithm's valid, but requires a lot of assumptions about how the hardware works. Personally, I wouldn't use an algorithm of this kind - I'd take a few % inefficiency in exchange for portability, but what do I know?

I think the method should be efficient.
But I don't understand why I get the opposite result when I reduce the size of the triangles.

Simon F · Feb 21, 2003

Dio said:
Simon F said:

I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?

Tristrips.

But surely they don't even form tristrips. The vals given would make triangles (0,1,1) (1,1,2) (1,2,2)... etc!

Dio · Feb 21, 2003

Yep. It uses degenerate triangles to pre-fill a FIFO style cache. Then the second row does the actual drawing, and pre-fills the second line of the cache.

My uncertainty about its efficiency is that I'm not sure that many degenerate triangles are at all a good idea.
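
Editor's note: to see how many degenerate triangles the pre-filled order actually sends, the strip can be expanded into triangles and counted. The sketch below uses hypothetical helper names and treats a triangle as degenerate when two of its indices are equal. It ignores winding order, which alternates along a strip.

```python
def prefilled_strip(width, rows):
    """Index order from the thread: degenerate pre-fill of the bottom row,
    then one stitched triangle strip per row (hypothetical helper)."""
    out = [0] + [i for i in range(1, width) for _ in (0, 1)] + [0]
    for r in range(rows):
        base = r * width
        if r:
            out += [out[-1], base]        # degenerate stitch between rows
        for i in range(width):
            out += [base + i, base + width + i]
    return out


def strip_triangles(indices):
    """Every three consecutive strip indices form one triangle."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def count_degenerate(indices):
    """Return (degenerate, total) triangle counts for a strip."""
    tris = strip_triangles(indices)
    degenerate = sum(1 for t in tris if len(set(t)) < 3)
    return degenerate, len(tris)


print(strip_triangles(prefilled_strip(15, 1))[:3])  # [(0, 1, 1), (1, 1, 2), (1, 2, 2)]
print(count_degenerate(prefilled_strip(15, 1)))     # (30, 58)
print(count_degenerate(prefilled_strip(15, 2)))     # (34, 90)
```

For the single row shown in the post, 30 of the 58 triangles are degenerate, just over half. If the pre-fill is only needed once, as the description above suggests, each further row adds 4 stitching degenerates against 28 real triangles, so the share falls as the grid gets taller.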

ultrafly · Feb 21, 2003

Simon F said:
Dio said:

Simon F said:

I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?

Tristrips.

But surely they don't even form tristrips. The vals given would make triangles (0,1,1) (1,1,2) (1,2,2)... etc!

It uses degenerate triangles to fill the post-T&L vertex cache by sending the vertices in a special order.
