NVIDIA Fermi: Architecture discussion

You mean compositing render targets from different GPUs? That would be awesome, but we would need some kind of industry-wide standard.

Think of a new X11, supporting the same kind of transparency. Your rendered window may come from your IGP, your other GPU, from software or from another GPU on the network.

VirtualGL allows a remote OpenGL Unix application to run with rendering done on the server's GPU (frames are sent down the network with MJPEG compression, at near real-time or real-time performance).

That software appears to be doing pretty much what you're proposing, but in the Unix/OpenGL/X11 realm!
http://www.virtualgl.org/About/Introduction

Normally, a Unix OpenGL application would send all of its drawing commands and data, both 2D and 3D, to an X-Windows server, which may be located across the network from the application server. VirtualGL, however, employs a technique called "split rendering" to force the 3D commands from the application to go to a 3D graphics card in the application server. VGL accomplishes this by pre-loading a dynamic shared object (DSO) into the application at run time. This DSO intercepts a handful of GLX, OpenGL, and X11 commands necessary to perform split rendering. Whenever a window is created by the application, VirtualGL creates a corresponding 3D pixel buffer ("Pbuffer") on a 3D graphics card in the application server. Whenever the application requests that an OpenGL rendering context be created for the window, VirtualGL intercepts the request and creates the context on the corresponding Pbuffer instead. Whenever the application swaps or flushes the drawing buffer to indicate that it has finished rendering a frame, VirtualGL reads back the Pbuffer and sends the rendered 3D image to the client.

(with OpenGL and X11, you can run a remote application, but it results in 3D rendering commands sent over the network and rendered locally in software mode. ugh!)

[Image: VirtualGL transport diagram (vgltransport.png)]


I guess it can be hacked to run your scenario.
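
For what it's worth, the interception trick described above is the standard LD_PRELOAD interposer pattern. Here's a minimal sketch of that pattern, not VirtualGL's actual code: send_frame_to_client() is a made-up placeholder for VGL's compression/transport stage, and real VGL redirects rendering into a Pbuffer rather than reading back the window's back buffer.

Code:
/* Build (roughly): gcc -shared -fPIC -o interpose.so interpose.c -ldl -lGL
 * Run:             LD_PRELOAD=./interpose.so ./some_glx_app               */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <GL/gl.h>
#include <GL/glx.h>

/* Hypothetical hook: this is where a VirtualGL-like tool would compress the
 * image (e.g. JPEG) and push it over the network to the client. */
static void send_frame_to_client(const void *pixels, int w, int h)
{
    (void)pixels; (void)w; (void)h;
}

void glXSwapBuffers(Display *dpy, GLXDrawable drawable)
{
    /* Look up the real glXSwapBuffers the first time we're called. */
    static void (*real_swap)(Display *, GLXDrawable) = NULL;
    if (!real_swap)
        real_swap = (void (*)(Display *, GLXDrawable))dlsym(RTLD_NEXT, "glXSwapBuffers");

    /* Read back the frame that was just rendered on the server-side GPU. */
    GLint vp[4];
    glGetIntegerv(GL_VIEWPORT, vp);
    void *pixels = malloc((size_t)vp[2] * (size_t)vp[3] * 4);
    if (pixels) {
        glReadBuffer(GL_BACK);
        glReadPixels(vp[0], vp[1], vp[2], vp[3], GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        send_frame_to_client(pixels, vp[2], vp[3]);
        free(pixels);
    }

    /* Let the application's swap proceed as usual. */
    if (real_swap)
        real_swap(dpy, drawable);
}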
 
Who knows if this guy is blowing smoke or not, but surely he meant to say 4x MSAA (with 16xAF)?

Those tests were run using a Core i7 920, 6GB of DDR3-1333, and an Intel 64GB SSD paired with a single GF100 card. The tests were run at 1920x1200 with 4x SSAA and 16xAF. Happy? If you want more details, wait for a benchmarking site to run benchmarks like everyone else. I am spoiling you as it is. Don't bite the hand that feeds.

http://www.guildwarsguru.com/forum/showpost.php?p=5005444&postcount=8

There's no correction anywhere in that thread from his side clarifying whether application A or B was run with MSAA or SSAA, as someone else claimed.

In any case I can't help but disregard that kind of nonsense, especially since his entire initial post is full of other off-topic nonsense.
 
http://www.brightsideofnews.com/new...oak-ridge-bushes-legacy-threats-progress.aspx
"First of all, the part of the rumor which pegs the power consumption of NV100-class architecture as the reason for rumored cancelation is absolute nonsense"
According to our confidential sources, GPGPU excels when page files exceed 8K
:oops:

I could be wrong, but I don't think modern-day CPUs and GPUs expose page size as a tunable parameter.

Is Theo really smoking some really exotic stuff?
 
http://www.brightsideofnews.com/new...oak-ridge-bushes-legacy-threats-progress.aspx
"First of all, the part of the rumor which pegs the power consumption of NV100-class architecture as the reason for rumored cancelation is absolute nonsense"

Seriously, why do people read anything Theo writes? The article is filled with so much manure that it isn't funny. Oak Ridge would have to get power from the Mississippi? Oh really? It's not like Knoxville is home to the TVA, one of the largest operators of hydroelectric power in the world, with hydro plants throughout the entire state.
 
If the TVA sources are already spoken for, it could be a problem, though obviously the Mississippi doesn't have the hydro resources anyway. There could be some question about renewable power, but I did not read the article.
 
I think it will be a step forward in the sense that it will work more often than the dual issue of G80/GT200. The "missing" in the term "missing MUL" came from the fact that it didn't work so well there, you know? ;)
And Nvidia added those MULs to their peak performance figures. It should be easier for Fermi to approach the advertised throughput than it was for the previous generations.

From a general point of view the design got cleaner and therefore a bit simpler (it should save some transistors compared to a GT200-like design with 4 SMs in a TPC) while not sacrificing (much) performance. Good choice in my opinion. Only SFU-limited code will probably suffer.
Are you saying the famous MUL was there only for cosmetic purposes?

http://www.beyond3d.com/content/reviews/52/14

What miracle happened for an 8800GT to show more than 100% efficiency disregarding them, if that's the case? In fact, given the workload used for the 5-way independent float test (MUL + 3x MAD + transcendental), it's close to 100% efficiency counting it.
 
The article is filled with so much manure that it isn't funny.

Perhaps but give him credit for writing an interesting (and believable) story unlike SemiAccurate's. I have no doubt Charlie could have accessed this same information but instead of laying it out like this he simply decided to take another infantile jab at Nvidia.

It's not only a sneak peek; the press will be going under NDA then too.

Excellent, leaks galore next week then? :)

What miracle happened for an 8800GT to show more than 100% efficiency disregarding them, if that's the case? In fact, given the workload used for the 5-way independent float test (MUL + 3x MAD + transcendental), it's close to 100% efficiency counting it.

How do you figure that? Theoretical peak for an 8800GT is 336 GInstr/s counting the MUL and 210 GInstr/s counting transcendentals. So it's far from 100% efficiency. It seems what you're seeing in that 191 GInstr/s number is the transcendental throughput. The workload nicely maps to the 4:1 MAD:SFU ratio.
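
Just to make the arithmetic explicit, here's the back-of-the-envelope version of those peaks. It assumes the usual 8800 GT figures of 112 SPs at a 1.5 GHz shader clock and the 4:1 MAD:SFU ratio; it's an illustration of the numbers above, not a measurement.

Code:
#include <stdio.h>

int main(void)
{
    const double sps        = 112.0;  /* scalar processors on an 8800 GT (assumed) */
    const double shader_ghz = 1.5;    /* shader clock in GHz (assumed)             */

    double mad_rate = sps * shader_ghz;     /* 168 GInstr/s from the SPs               */
    double sfu_rate = mad_rate / 4.0;       /*  42 GInstr/s of transcendentals         */
    double with_sfu = mad_rate + sfu_rate;  /* 210 GInstr/s, SPs + SFUs together       */
    double with_mul = 2.0 * mad_rate;       /* 336 GInstr/s if the extra MUL co-issues */

    double measured = 191.0;                /* the figure from the B3D test            */
    printf("MAD-only peak: %.0f GInstr/s\n", mad_rate);
    printf("MAD+SFU peak:  %.0f GInstr/s  (191 measured -> %.0f%%)\n",
           with_sfu, 100.0 * measured / with_sfu);
    printf("MAD+MUL peak:  %.0f GInstr/s  (191 measured -> %.0f%%)\n",
           with_mul, 100.0 * measured / with_mul);
    return 0;
}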
 
Perhaps but give him credit for writing an interesting (and believable) story unlike SemiAccurate's.

lolwat

That's perhaps believable if you hold American politics to the very naive, basic stereotype that Theo (too much TV?) seems to have.

Occam's Razor here. Why is nVidia so insistent on ORNL then?

I'm pretty sure AMD "gave away" those 4870X2s and Barcelona Opterons to two separate Chinese entities (one was video processing and the other supercomputing, IIRC), so why exclusively ORNL again?
 
Yes, having interned at Los Alamos I definitely give more credit to bureaucratic red tape at government organizations than I do to Charlie's personal vendetta. Notwithstanding the points raised before that Fermi's power consumption would have to be ridiculously off from initial estimates for it to even be considered an issue. Nvidia is insistent on expanding their business and breaking into HPC in a big way. ORNL is just one front in that effort, albeit a significant one.
 
How do you figure that? Theoretical peak for an 8800GT is 336 GInstr/s counting the MUL and 210 GInstr/s counting transcendentals. So it's far from 100% efficiency. It seems what you're seeing in that 191 GInstr/s number is the transcendental throughput. The workload nicely maps to the 4:1 MAD:SFU ratio.
If you read the article, it's quite clearly stated that the workload has been tweaked to issue 1 MUL, 3 MADs and 1 transcendental each clock, which gives 168 Gi/s, as in the Int workload.

If it rises to 191 Gi/s using SP, it's entirely due to the "free" MUL being used.

As it's only 1/5th of the workload, a 113% efficiency translates into more than 50% of the MULs being issued to this ALU.

At first I was thinking the peak should be 198 Gi/s with the MUL ALU, but since that would give a peak Int throughput of 142 Gi/s, that's not the case.

Btw, I don't know how this translates to real-world workloads, but the infamous missing MUL is not missing anymore, at least since G92 (there's no updated G80 number, so it could be down to the drivers).
 
Btw, I don't know how this translates to real-world workloads, but the infamous missing MUL is not missing anymore, at least since G92 (there's no updated G80 number, so it could be down to the drivers).

Err, I wouldn't go that far, and I wrote the article. Can the MUL be sort of kind of seen? Yeah, it can (do MUL MUL MUL MUL MUL ad infinitum ad nauseam, and it'll probably be there). Can it be leveraged in practical workloads? Not quite. In this case, I'm not entirely sure it's not an artefact of how the test was set up, and I should've mentioned that in writing. What I am 99.9% sure of is that the missing MUL remains quite missing outside of toy scenarios like the one outlined above.
 
If you read the article, it's quite clearly stated that the workload has been tweaked to issue 1 MUL, 3 MADs and 1 transcendental each clock, which gives 168 Gi/s, as in the Int workload.

If it rises to 191 Gi/s using SP, it's entirely due to the "free" MUL being used.
No, the MULs are not the "missing" ones, as trinibwoy already said. The test is laid out in a way that the 3 MADs and the MUL are executed on the normal SPs. Only the transcendentals are done by the SFUs and add to the instruction rate. So trinibwoy was perfectly right to calculate the theoretical instruction rate for this test as 210 GInst/s for the 8800GT.
But try to achieve north of 300 GInst/s ;)

By the way, I never claimed that it is impossible to see some MULs executed by the SFUs. I only stated that dual issue of the Fermi flavour is very likely to be used more often than the G80/GT200 dual issue.
 

Hope it works. I really don't like their current 3D solution. Though having three monitors hopefully means Eyefinity-like support too, which is what I'm more interested in.

Btw, I don't like the 3D solution because it gives me really bad headaches after a few minutes. I think it's how close I am to the screens, but 3D in theaters doesn't do that for me. Though towards the end of Avatar a headache was starting to develop.
 
By the way, I never claimed that it is impossible to see some MULs executed by the SFUs. I only stated that dual issue of the Fermi flavour is very likely to be used more often than the G80/GT200 dual issue.
Given that otherwise half the ALUs will idle, it had better be more likely!
 
Given that otherwise half the ALUs will idle, it had better be more likely!

But this happens with GT200, too. Anandtech wrote that you cannot use the vec8 ALU and the SFUs at the same time, so 1/3 of the ALUs are idling for 8 cycles.

In previous architectures, the SM dispatch logic was closely coupled to the execution hardware. If you sent threads to the SFU, the entire SM couldn't issue new instructions until those instructions were done executing. If the only execution units in use were in your SFUs, the vast majority of your SM in GT200/G80 went unused. That's terrible for efficiency.

The good news is that the SFU doesn't tie up the entire SM anymore. One dispatch unit can send 16 threads to the array of cores, while another can send 16 threads to the SFU. After two clocks, the dispatchers are free to send another pair of half-warps out again. As I mentioned before, in GT200/G80 the entire SM was tied up for a full 8 cycles after an SFU issue.
http://www.anandtech.com/video/showdoc.aspx?i=3651&p=4
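
A toy cycle count just to illustrate the difference the quote describes, under the (over)simplified assumption that an SFU issue stalls the whole SM for 8 cycles on G80/GT200 while Fermi's second dispatch unit simply overlaps the SFU work. It's an illustration of the quoted behaviour, not a real scheduler model.

Code:
#include <stdio.h>

int main(void)
{
    const int total_instr = 100;  /* instructions in the stream (made-up)      */
    const int sfu_every   = 5;    /* assume every 5th instruction hits the SFU */

    /* G80/GT200-style: an SFU instruction blocks further issue for 8 cycles. */
    int cycles_gt200 = 0;
    for (int i = 0; i < total_instr; ++i)
        cycles_gt200 += (i % sfu_every == 0) ? 8 : 1;

    /* Fermi-style: SFU work overlaps with ALU issue, so each instruction
     * costs roughly one issue slot in this toy model.                     */
    int cycles_fermi = total_instr;

    printf("G80/GT200-style issue: %d cycles\n", cycles_gt200);
    printf("Fermi-style issue:     %d cycles\n", cycles_fermi);
    return 0;
}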
 
Hope it works. I really don't like their current 3D solution. Though having three monitors hopefully means Eyefinity-like support too, which is what I'm more interested in.

Btw, I don't like the 3D solution because it gives me really bad headaches after a few minutes. I think it's how close I am to the screens, but 3D in theaters doesn't do that for me. Though towards the end of Avatar a headache was starting to develop.

Unless you literally have cash to burn, I don't see how 3D Vision Surround is even remotely feasible. 3D Vision requires each frame to be rendered twice, once for each eye. So you've already cut your framerates in half when only using one monitor. And now do that on three displays? So you've now got 1/6 the framerate you had when playing the game in non-3D on a single monitor.
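
To put rough numbers on that (assuming, purely for illustration, that performance scales linearly with the number of rendered views and screens): a game managing 60 fps on one screen in 2D would land around 60 / 2 (stereo) / 3 (three screens) ≈ 10 fps.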

S-L-I-D-E-S-H-O-W....
 