First of all, I'm sorry for the provocative title and my poor English.
Recently I came across a person (P) who claims that 'Quadro's SP count is more marketing than performance'.
-----
P's claim: 'Quadro's SP count is marketing'
1. In OpenGL, what matters for polygon throughput is the GPC count, not the processor (CUDA core) count.
2. Regardless of the SM (SP) count, the Quadro 5000 processes 3 polygons per clock cycle and the Quadro 6000 processes 4.
3. That is the reason the ROPs were not cut down. (GTX470: 40 ROPs, 320-bit; Quadro 6000: 48 ROPs, 384-bit)
These links are the basis of P's opinion.
http://techreport.com/articles.x/19404/4
http://www.behardware.com/articles/787-8/r...tx-480-470.html
-----
So I read several GF100 architecture reviews (and the GF100 whitepaper).
They all say that GF100 is a parallel geometry processing architecture with 16 PolyMorph Engines and 4 Raster Engines.
http://techreport.com/articles.x/18332/2
http://www.bjorn3d.com/read.php?cID=1778&pageID=8317
http://www.scribd.com/doc/35710178/NVIDIA-GF100-Whitepaper
'To facilitate high triangle rates, we designed a scalable geometry engine called the PolyMorph Engine.
Each of the 16 PolyMorph engines has its own dedicated vertex fetch unit and tessellator, greatly expanding geometry performance.'
I think P simply puts all the emphasis on the GPC's Raster Engine.
1. - I don't understand why he mentions only the GPC (Raster Engine).
2. - That is only natural.
...... Quadro 5000: 352 CUDA cores (3 GPCs), Quadro 6000: 448 CUDA cores (4 GPCs)
...... 1 GPC needs 1~4 SMs. (1 Raster Engine per GPC)
...... 1 SM = 32 CUDA cores (GF100). (1 PolyMorph Engine per SM)
As far as I know, the PolyMorph Engine (SM) and the Raster Engine (GPC) are closely related.
techreport.com/articles.x/18332/2
'Once the polymorph engines have finished their work, the resulting data are forwarded the GF100's four raster engines.'
3. - The ROPs can explain AA performance. (GeForce 32x, Quadro 64x)
http://techreport.com/articles.x/18332/4
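To make the unit counts in point 2 concrete, here is a minimal sketch that derives the number of SMs (and therefore PolyMorph Engines) from the CUDA core counts quoted above. The function name sm_count and the dictionary layout are only illustrative; the 32-cores-per-SM figure and the GPC counts are the ones given in this post.

```python
# Unit counts for GF100-based Quadro cards, using only figures quoted in this post.
CORES_PER_SM = 32  # GF100: 32 CUDA cores per SM, 1 PolyMorph Engine per SM


def sm_count(cuda_cores: int) -> int:
    """Return the number of active SMs implied by a CUDA core count."""
    if cuda_cores % CORES_PER_SM != 0:
        raise ValueError("core count is not a multiple of 32")
    return cuda_cores // CORES_PER_SM


# name -> (CUDA cores, GPCs); one Raster Engine per GPC
cards = {"Quadro 5000": (352, 3), "Quadro 6000": (448, 4)}

for name, (cores, gpcs) in cards.items():
    sms = sm_count(cores)
    print(f"{name}: {sms} SMs / PolyMorph Engines, {gpcs} GPCs / Raster Engines")
```

So the two cards differ in both the number of Raster Engines and the number of PolyMorph Engines, which is why looking at the GPC count alone does not tell the whole story.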
Also, I can explain why the SP count is not only marketing:
Adobe Premiere Pro CS5 - Mercury Playback Engine GPU acceleration (or RapiHD = Elemental Accelerator on GT200)
mental images iray, Arion Render, Octane Render, etc. (refer to the CUDA showcase)
and these:
http://www.awn.com/articles/article/fermi-entering-era-computational-visualization/page/1,1
http://pressroom.nvidia.com/easyir/...rsion=live&releasejsp=release_157&prid=645616
Reference 1
NVIDIA Fermi Quadro 6000
GPU clock: 574 MHz
CUDA cores: 448, clock 1148 MHz
Memory: 384-bit, 6 GB, clock 1500 (750*2) MHz
48 ROPs
OpenGL 4.x
Shader Model 5.x
1.3 billion triangles per second. (Based on GLperf, run by NVIDIA Performance Lab)
Could you explain the following so that I can understand it more easily?
1. Is the SM (PolyMorph Engine) / SP (CUDA core) count really not very useful for OpenGL performance?
2. Why does the Quadro have more ROPs than the GeForce? (For OpenGL? For AA? For memory (bus width, capacity)?)
3. Why is the Quadro 6000 rated at 1.3 BTris/s? (Why not 1.9~2.4 BTris/s? How?)
ex) GTX470: 2428 MTris/s = 4 * 607 (4 GPCs * GPU clock in MHz)
I don't understand how the result comes out to 1.3 BTris/s. (But I think the SMs (PolyMorph Engines) influence the result.)
4. Which has more effect on (or is more important for) OpenGL performance: the PolyMorph Engine or the Raster Engine?
(Both, surely, but I think the PE more than the RE.)
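The arithmetic behind question 3 can be sketched as follows. This is only a back-of-the-envelope calculation, assuming the peak rate is one triangle per Raster Engine per GPU clock (as in the GTX470 example) and taking the 3.2 and 2.5 - 2.7 triangles-per-clock figures from the reviews quoted in Reference 2. The function names are hypothetical.

```python
# Triangle-rate arithmetic using only numbers quoted in this post.

def peak_mtris(raster_engines: int, core_mhz: float, tris_per_clock: float = 1.0) -> float:
    """Theoretical rate in MTris/s: engines * triangles per clock per engine * MHz."""
    return raster_engines * tris_per_clock * core_mhz


def implied_tris_per_clock(measured_mtris: float, core_mhz: float) -> float:
    """Triangles per clock (whole chip) implied by a measured MTris/s figure."""
    return measured_mtris / core_mhz


print(peak_mtris(4, 607))                 # GTX470 peak: 4 * 607
print(peak_mtris(4, 574))                 # Quadro 6000 peak: 4 * 574
print(574 * 3.2)                          # at the 3.2 tris/clock from directed tests
print(574 * 2.5, 574 * 2.7)               # at the 2.5 - 2.7 tris/clock figure
print(implied_tris_per_clock(1300, 574))  # what the 1.3 BTris/s GLperf figure implies
```

Under these assumptions, the 1.3 BTris/s figure works out to roughly 2.3 triangles per clock for the whole chip, which is below even the "realistic" review figures. The calculation cannot say which unit is the limit; it only shows how far the measured number sits from the theoretical peak.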
Reference 2
'Once the polymorph engines have finished their work, the resulting data are forwarded the GF100's four raster engines.
Optimally, each one of those engines can process a single triangle per clock cycle.
The GF100 can thus claim a peak theoretical throughput rate of four polygons per cycle, although Alben called that "the impossible-to-achieve rate," since other factors will limit throughput in practice.
Nvidia tells us that in directed tests, GF100 has averaged as many as 3.2 triangles per clock, which is still quite formidable.'
'Fermi can (theoretically) produce 4 triangles at once. The reality is that it can process about 2.5 - 2.7 simultaneously.
That might not seem like a lot but previous GPU's processed one so even 2.5 per clock is a 250% polygon processing performance increase.'
Each rasterizer can do 8 pixels per clock, for a total of 32 pixels per clock over the entirety of GF100.
4 GPCs = 32 pixels per clock * 574 MHz (Quadro 6000) = about 18.4 Gpixels/s
48 ROPs = 48 pixels per clock * 574 MHz (Quadro 6000) = about 27.6 Gpixels/s
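The two pixel-rate figures above can be reproduced with a short calculation. It is a rough sketch that assumes, as the lines above do, one pixel per ROP per clock and that both the rasterizers and the ROPs run at the 574 MHz GPU clock. The helper name gpixels_per_s is made up for this example.

```python
# Pixel-rate arithmetic for the Quadro 6000, using figures quoted in this post.
CORE_MHZ = 574            # Quadro 6000 GPU clock
RASTER_ENGINES = 4        # one per GPC
PIXELS_PER_RASTER = 8     # pixels per clock per rasterizer (from the quoted review)
ROPS = 48                 # assumed to output 1 pixel per clock each


def gpixels_per_s(pixels_per_clock: int, core_mhz: float) -> float:
    """Convert pixels per clock at a given clock (MHz) to Gpixels/s."""
    return pixels_per_clock * core_mhz / 1000.0


raster_rate = gpixels_per_s(RASTER_ENGINES * PIXELS_PER_RASTER, CORE_MHZ)
rop_rate = gpixels_per_s(ROPS, CORE_MHZ)

print(f"Rasterizers: {raster_rate:.3f} Gpixels/s")
print(f"ROPs:        {rop_rate:.3f} Gpixels/s")
print(f"Lower of the two: {min(raster_rate, rop_rate):.3f} Gpixels/s")
```

With these numbers the ROPs can accept more pixels per clock than the rasterizers can produce, which fits the idea that the extra ROPs are there for something other than plain fill rate (for example AA), though the calculation alone does not prove that.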
Thank you for reading.