The Inquirer Trying to Scoop B3D!

Makes you wonder about NV35: a 4x2 architecture with a 256-bit, DDR-II memory interface? No way!

I am pretty much 100% certain regarding the memory tech. Perhaps NV35 is an 8x2 configuration (?!). That would certainly cast a new light on recent claims of something coming that is "twice as fast". Twice the fillrate, twice the memory bandwidth... :oops:
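A rough back-of-envelope, using the usually quoted 5800 Ultra numbers (500 MHz core, 500 MHz DDR-II on a 128-bit bus), so take it with a grain of salt:

128 bits x 2 (DDR) x 500 MHz = 16 GB/s
256 bits x 2 (DDR) x 500 MHz = 32 GB/s

Widening the bus alone doubles the bandwidth, and going from 4 to 8 colour pixels per clock at the same core clock would double the fillrate to match.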

"Hmmmmmmm...." indeed.

MuFu.
 
Wow, The Inquirer has had a lot of stories with various rumors, but this seems like it could be pretty legit.

What exactly is color+z rendering, compared to z-only rendering and the various other operations? If shader operations are running at 8 pixels per clock, will the FX get better or worse in the future as things become more shader oriented?

Still, pretty disappointing news. Nvidia has been really misleading about this.

Nite_Hawk
 
I'm about "this" close to actually believing this story...It just makes sense.

In the event this turns out to be the case, is there _any_ more proof needed that nVidia was completely side-swiped by R300? Doesn't that really lead one to think that the whole "it's a manufacturing issue" line is really only half the story?

Slowly but surely, a picture is forming of what really happened at nVidia when they finally got the R300 scoop.

Furthermore, if this is the case, my whole impression of nVidia PR would sink to an all-time low... and it's already _real_ low right now.
 
Nite_Hawk said:
What exactly is color+z rendering, compared to z-only rendering and the various other operations? If shader operations are running at 8 pixels per clock, will the FX get better or worse in the future as things become more shader oriented?

Nite_Hawk

Well, as mentioned in the other thread ( http://www.beyond3d.com/forum/viewtopic.php?t=4252&start=60 )

the key thing is a post from an OpenGL developer called "pixelpipes":

" Normally I have Z test disabled, but Z write enabled, and of course also color write. Enabling Z test will invoke the 'early out' tests, which are done per tile, thus screwing the measurement.

I tried it with Z write DISabled, and the result is the same. (equivalent to NV25 with appropriate GPU clock ratio boost) "

The post can be found in this thread:
http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008757-3.html

So, every time you write a pixel out to the framebuffer, with z-buffer update OR WITHOUT IT, you are writing 4 pixels at a time. NOT 8, like nVidia has led people to believe. Plain z-buffer and stencil updates are done in parallel, 8 pixels at a time, like a true 8x1 architecture should do them. Basically, if I understand this correctly, heavier shader programs are good because then the execution time gets amortized better (= the slight hit from the 4x2 pixel output doesn't matter that much).
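If it helps to picture the two cases Nite_Hawk asked about, here is a minimal GL sketch (my own illustration, not pixelpipes' code; it assumes a current GL context, and drawScene()/drawShadowVolumes() are just hypothetical placeholders for real draw calls):

Code:
/* 1) Ordinary colour+Z rendering: colour and depth are both written per
      pixel. This is the "color+z" case where NV30 reportedly manages
      only 4 pixels per clock. */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LEQUAL);
glDepthMask(GL_TRUE);
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
drawScene();

/* 2) Z/stencil-only rendering (depth pre-pass, shadow volume fill):
      colour writes are masked off, only the depth/stencil buffer is
      touched. This is where the full 8 pixels per clock applies. */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 0, ~0u);
glStencilOp(GL_KEEP, GL_INCR, GL_KEEP);
drawShadowVolumes();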

This would also mean that all the PS 1.4 fallbacks to PS 1.1 are really bad, because they involve multiple passes through the framebuffer.
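To put a rough, purely illustrative number on that: PS 1.1 allows at most 4 textures per pass while PS 1.4 allows 6, so an effect written for 6 textures has to be split into two PS 1.1 passes. At 1024x768 that is roughly 786,000 colour writes for the single PS 1.4 pass versus roughly 1.6 million (plus the framebuffer reads needed to blend the second pass) for the fallback, all of it going through the 4-pixel-per-clock colour path.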

Now we can wonder where you might find those pixel shader fallbacks... ;) ;)
 
epicstruggle said:
I wish the inq would have credited b3d, this is obviously the site they got that info from.

later,

Not the first time rumours and technical issues discussed on this and other 3D forums have become stories over at The Inqwell.
 
demalion said:
the inquirer said:
IN A MARATHON INVESTIGATION that The Inquirer launched a couple of days ago, we have some solid stuff to present to you.

This is just plain disgusting. :-?

Um yes, but at least they asked nVidia about the matter, or did I misunderstand something?! Isn't that "... only 4 pixels with color+z ..." quote from nVidia's tech manager?!
 
By the way, I was told by someone that this is half the story. Seems the NVIDIA Technical Manager didn't give the whole story, and probably couldn't because of NDA material.
 
Matt said:
By the way, I was told by someone that this is half the story. Seems the NVIDIA Technical Manager didn't give the whole story, and probably couldn't because of NDA material.

You mean it can be even worse?
 
RoOoBo said:
Matt said:
By the way, I was told by someone that this is half the story. Seems the NVIDIA Technical Manager didn't give the whole story, and probably couldn't because of NDA material.

You mean it can be even worse?

Not sure about worse, just misunderstood. When the Inquirer doesn't get the full story, they tend to try and make one up.

Talking with Ben Sun right now, and he pointed out this article over at ExtremeTech:

http://www.extremetech.com/article2/0,3973,713547,00.asp

Note what David Kirk said:

Pipes don't mean as much as they used to. In the [dual-pipeline] TNT2 days you used to be able to do two pixels in one clock if they were single textured, or one dual-textured pixel per pipe in every two clocks, it could operate in either of those two modes. We've now taken that to an extreme. Some things happen at sixteen pixels per clock. Some things happen at eight. Some things happen at four, and a lot of things happen in a bunch of clock cycles four pixels at a time. For instance, if you're doing sixteen textures, it's four pixels per clock, but it takes more than one clock. There are really 32 functional units that can do things in various multiples. We don't have the ability in NV30 to actually draw more than eight pixels per cycle. It's going to be a less meaningful question as we move forward...[GeForceFX] isn't really a texture lookup and blending pipeline with stages and maybe loop back anymore. It's a processor, and texture lookups are decoupled from this hard-wired pipe.
 
Matt said:
By the way, I was told by someone that this is half the story. Seems the NVIDIA Technical Manager didn't give the whole story, and probably couldn't because of NDA material.

Oh God!!! :oops:
 
Pipes don't mean as much as they used to. In the [dual-pipeline] TNT2 days you used to be able to do two pixels in one clock if they were single textured, or one dual-textured pixel per pipe in every two clocks, it could operate in either of those two modes. We've now taken that to an extreme. Some things happen at sixteen pixels per clock. Some things happen at eight. Some things happen at four, and a lot of things happen in a bunch of clock cycles four pixels at a time. For instance, if you're doing sixteen textures, it's four pixels per clock, but it takes more than one clock. There are really 32 functional units that can do things in various multiples. We don't have the ability in NV30 to actually draw more than eight pixels per cycle. It's going to be a less meaningful question as we move forward...[GeForceFX] isn't really a texture lookup and blending pipeline with stages and maybe loop back anymore. It's a processor, and texture lookups are decoupled from this hard-wired pipe.

I think the only thing that is clear from that is that they are actually outputting 4 pixels per clock in some cases. It is also clear that they are trying to hide something with that confusing explanation. BTW, who said that? Because it doesn't even seem to be proper English (not that I'm that good with English either, but it reads really weird).

32 functional units could be 8 pixel pipes with 4 32-bit FP units each, or 4 pixel pipes with 8 16-bit FP units each. Or 4 pipes with 4 32-bit FP units each, plus 2 FP units for each TMU and 2 TMUs per pipe. As we don't know what NVidia calls a 'functional unit', that means nothing. And it isn't just the functional units: how many read/write ports to memory or the caches? How many Z units?

From the ExtremeTech article:

Interestingly, neither Kirk nor Tamasi were willing to disclose how many texturing units are present in each pixel pipe. In our interview, they both stressed the decreasing importance of pipes, and focused on the number and sophistication of internal processing units -- which they feel is more important. Kirk describes GeForceFX as a network of processing units on a single die, and adds...

Pipes are important, as are throughput and latency. If they don't have 4 or 8 pixel pipes like everyone else and instead use a 'sea of functional units' (words that mean absolutely nothing), they still have throughput and latency. If they don't want to say how many pixels can be output per cycle in each case (single texture, multitexture, point filtering, bilinear, trilinear, stencil, shaders) and how much latency each operation has (for example, the R300 has a latency of two cycles for trilinear and one for bilinear), it's because there is something they think people shouldn't know. And I guess it isn't something good for NVidia.

ATI also has a 'network of processing units on a single die'. It is just that we know how it is arranged and what most of those units are: 4 vertex shader units with a scalar and a SIMD vector unit each, and 8 pixel pipes with 1 TMU, one scalar and one SIMD vector unit per pipe. And the TMU is capable of doing bilinear filtering in one cycle (4 reads from a single texture). It is just that they aren't playing the FUD game.
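To put some rough numbers on the throughput point (using the usually quoted clocks, so treat it as back-of-envelope):

NV30 @ 500 MHz: 4 colour pixels/clock = 2.0 Gpixels/s, 8 Z/stencil ops/clock = 4.0 Gops/s
R300 @ 325 MHz: 8 colour pixels/clock = 2.6 Gpixels/s, with 1-cycle bilinear that is also 2.6 Gtexels/s (about half that for 2-cycle trilinear)

So despite the big clock advantage, NV30 would actually trail R300 in plain colour+Z fillrate if the 4x2 story is right.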
 
NVIDIA never ceases to amaze me. At the NV30 launch, you could see their Tony Tamasi smiling like the sun as he proudly declared, repeatedly: "it's eight pixel pipelines!" And then he went on to compare it to ENIACs and black holes. It's not going to be so funny when this 'black hole' eats you first!

By the way, if you want to know where this 'collection of processing elements' idea came from, look no further than the 'FX' in GeForceFX and the 'jo' in 'Mojo'. ;)
 