Fermi Temperature. It's still in line with Celsius (NV10), Kelvin (NV20), Rankine (NV30), Curie (NV40), Tesla (NV50). They are supposed to be interfaces though, not actual hardware architectures; newer chips also support older interfaces (in a compatibility mode). Jensen made it clear that "Tesla" was never the name of the GT200 architecture, and Brian Burke told me that Fermi is the first GPU architecture officially named after a scientist.
Each core can access any data in its shared L1D cache, but cannot generally see the contents of remote L1D caches. At the end of a kernel, cores must write the dirty L1D data through to the L2 to make it visible both to other cores in the GPU and to the host. This might be described as a lazy write-through policy for the L1D, but it seems more accurately described as write-back with periodic synchronization.
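As a side note, here is a minimal CUDA sketch of what that kernel-boundary synchronization point means in practice (the kernel names and sizes are mine): per the quoted description, data a kernel writes to global memory is only guaranteed to be visible to all SMs and to the host once that kernel has finished, e.g. to a second kernel launched afterwards in the same stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// First kernel: each thread writes its index to global memory. Per the
// quoted description, the dirty L1D data only has to be synchronized to
// the L2 by the time the kernel ends.
__global__ void producer(int *buf)
{
    buf[threadIdx.x] = threadIdx.x;
}

// Second kernel: launched afterwards in the same stream, so it is
// guaranteed to observe the producer's writes, regardless of which SM
// each of its blocks runs on.
__global__ void consumer(const int *buf, int *sum)
{
    if (threadIdx.x == 0) {
        int s = 0;
        for (int i = 0; i < 32; ++i)
            s += buf[i];
        *sum = s;
    }
}

int main()
{
    int *buf, *sum, host_sum = 0;
    cudaMalloc(&buf, 32 * sizeof(int));
    cudaMalloc(&sum, sizeof(int));

    producer<<<1, 32>>>(buf);
    consumer<<<1, 32>>>(buf, sum);   // same (default) stream: ordered after producer

    cudaMemcpy(&host_sum, sum, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d (expected %d)\n", host_sum, 31 * 32 / 2);
    cudaFree(buf);
    cudaFree(sum);
    return 0;
}
```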
That is really disappointing. I can't wait until they share more information about what features it has that will benefit graphics.
RealWorldTech said: Perhaps the most significant demonstration of Nvidia's commitment to compute is the fact that a great deal of the new features are not particularly beneficial for graphics. Double precision is not terribly important, and while cleaning up the programming model is attractive, it's hardly required. The real question is whether Nvidia has strayed too far from the path of graphics, which again depends on observing and benchmarking real products throughout AMD's, Nvidia's and Intel's line up; but it seems like the risk is there, particularly with AMD's graphics focus.
A few questions,
I haven't understood the notion of semi-coherent caches. Quoting from here:
1) Can an SM see the contents of another SM's L1D or not?
2) If the L1D is private to an SM, then what happens if two SMs write different data to the same location without atomics?
3) Is it the programmer's job to make sure that doesn't happen?
4) And how does an SM know what to cache?
5) What about the L2 cache? Everything is supposed to go through it, so how does the notion of (semi/full) coherence fit here?
I'd really appreciate it if anyone could answer these questions.
Thanks,
So to those who understand, this new chip: is it Yay, Meh or Wtf?
Yay²
My gut reaction is that NVidia's built a slower version of Larrabee in about the same die size with no x86 and added ECC. It might be a bit smaller.
I read that they showed a raytraced car running on Fermi.
Does anyone have a picture?
Is it just me or is nVidia running a really huge risk here? I can't imagine this having a better price/performance ratio than the HD5xx0 line, so it really looks like they're trying to transition away from mainstream consumer graphics, or else just couldn't adjust their plans fast enough to have anything else in this time frame.
Rys,
16 pixels/clock of address and setup per SM? Are you sure 256 TMUs aren't overkill for that kind of bandwidth?
Also, when you state 8Z/8C samples/clock for the ROPs, I assume it's either/or, as in today's GPUs?
I wonder how many people waiting to see what Nvidia has against 58xx will see this and think that Fermi is going to be late, expensive, and not really a gaming product, but more of a science/supercomputing chip?
1) No.
2) It's undefined on current hardware, so it should remain undefined (see the sketch after this list).
3) Yes.
4) Like any other cache, presumably with an LRU replacement policy.
5) The L2 cache is supposedly tied to the memory controllers (each memory controller has 128 KB of L2), so each slice is coherent by itself because it only has to cache its own partition's data.
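To illustrate answers 2) and 3), a minimal CUDA sketch (kernel names and launch sizes are mine): plain concurrent read-modify-writes to one global location from many blocks give an undefined result, while the same update done with an atomic is deterministic.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Undefined: every thread in every block does a plain read-modify-write
// on the same location. Nothing orders the SMs against each other, so
// the final value depends on how the updates happen to interleave.
__global__ void racy_increment(int *counter)
{
    *counter = *counter + 1;
}

// Well-defined: atomicAdd serializes the updates (reportedly at the
// shared L2 on Fermi), so the count is exact.
__global__ void atomic_increment(int *counter)
{
    atomicAdd(counter, 1);
}

int main()
{
    int *counter, result;
    cudaMalloc(&counter, sizeof(int));

    cudaMemset(counter, 0, sizeof(int));
    racy_increment<<<64, 256>>>(counter);
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("racy:   %d (almost certainly not %d)\n", result, 64 * 256);

    cudaMemset(counter, 0, sizeof(int));
    atomic_increment<<<64, 256>>>(counter);
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("atomic: %d (always %d)\n", result, 64 * 256);

    cudaFree(counter);
    return 0;
}
```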
That's my question also.
Also, hardware.fr is reporting that each SFU unit can do 8 interpolations (if I understood correctly).
Each SM has 1 SFU unit, so the GT300 can do 16×8 = 128 interpolations.
Wouldn't the design (256 TMUs) be limited that way? (128 interpolations)
EDIT: I just rechecked hardware.fr. It reports 16 interpolations, so probably we do have 256 TMUs (16 SMs × 16 = 256).
If the L1Ds are not supposed/expected to communicate, then where do you see possible uses for a unified L2? Faster atomic operations?
c) This is going to invite some irritation/criticism/flames, but why is implementing trees and linked lists in G80-esque shared memory inefficient? Or plain hard? I haven't tried doing this, so please make allowance for that before you berate me.
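Not a full answer, but a minimal sketch of a per-block linked list in shared memory (all names and sizes here are mine) may help frame the usual objections: shared memory is small (16 KB per SM on G80) and private to a block, concurrent insertion wants atomics that G80's shared memory doesn't offer (shared-memory atomics arrived with compute capability 1.2), and traversal is serial pointer chasing that leaves most of a warp idle.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// "Pointers" are array indices, since real pointers into shared memory
// are only meaningful within a single block.
struct Node {
    int value;
    int next;   // index of the next node; -1 terminates the list
};

__global__ void shared_list(int *out_sum)
{
    __shared__ Node nodes[256];   // 256 * 8 bytes = 2 KB of a 16 KB budget
    __shared__ int head;

    // Building the list: done by one thread here, because concurrent
    // insertion would need shared-memory atomics or careful staging.
    if (threadIdx.x == 0) {
        head = -1;
        for (int i = 0; i < 256; ++i) {
            nodes[i].value = i;
            nodes[i].next  = head;   // prepend
            head = i;
        }
    }
    __syncthreads();

    // Traversal is inherently serial index chasing, so only one thread
    // of the warp does useful work: a poor fit for SIMT hardware.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = head; i != -1; i = nodes[i].next)
            sum += nodes[i].value;
        *out_sum = sum;   // 0 + 1 + ... + 255 = 32640
    }
}

int main()
{
    int *d_sum, h_sum = 0;
    cudaMalloc(&d_sum, sizeof(int));
    shared_list<<<1, 256>>>(d_sum);
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_sum);
    cudaFree(d_sum);
    return 0;
}
```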
Is it just me or is nVidia running a really huge risk here? I can't imagine this having a better price/performance ratio than the HD5xx0 line, so it really looks like they're trying to transition away from mainstream consumer graphics, or else just couldn't adjust their plans fast enough to have anything else in this time frame.
That's been my feel since yesterday, especially after I read (from Anand) that Tesla sales last quarter amounted to 1.3% of their business. Nvidia is forcefully trying to carve out its niche in the HPC market.
But that puts them in a position where they're forced to cede even more of the consumer market to AMD, and Intel and Larrabee seem like they could directly compete with this sort of concept. It's like they're running toward the giant behemoth that is Intel while being nipped at the heels by an AMD with a far stronger bite than expected.
I can't believe that's an enviable position to be in at all.
HPC is a niche market. [...] Concentrating on GPGPU at the expense of retail/OEM/mobile
Of course HPC is a niche. With GF100 they can cover the mass market and the niche as well. They are not leaving the GPU market; they have hopefully built a product which can compete in both the GPU market and the HPC market (niche chips into a niche market).
see where it takes them
I see the opposite: this is what NVidia has to do now if it wants to survive in 7 years, when both Intel and AMD have integrated/combined/scalable CPUs and GPUs.
I think they added DP and ECC not as a major focus, but because they had spare transistor budget to do it
They've had spare budget when the chip interconnects were dictating the minimum size; in R700's case, for example, it got 800 SPs instead of some smaller number because of this.
GF100 is 3 billion transistors. If the product were good enough at half the size, those features would get cut without thinking twice. But it isn't.