Nvidia GT300 core: Speculation

One day in the near future, a message on the screen when you start your brand new IBM PC compatible...


"no CPU detected, starting software emulation"

LOL
 
WTF "native C, C++, Fortran etc" means?!
NV were able to make ANSI C++ compiler for the new chip?
It's not really a big deal, hell there are C++ to C translators ... although I fail to see the point of Fortran apart from the warm and fuzzy feeling the name generates. It's not like even porting legacy code to it is an option; the level of algorithmic change needed to suit a GPU makes a rewrite the only realistic option. Fortran doesn't seem to me to be a great language to write kernels in.

PS. http://www.pgroup.com/resources/cudafortran.htm
 
Well, the question is what features of C++ they currently do NOT support. The answer to that question would probably provide hints as to what else they've changed.
 
Well, the question is what features of C++ they currently do NOT support. The answer to that question would probably provide hints as to what else they've changed.

I think the biggest issue is that of using function pointers. A lot of the object-oriented features of C++ are implemented through the manipulation of function pointers. As far as I know, they could only branch with a fixed offset so far.
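Just to illustrate what that means in practice, here's a hand-rolled sketch (plain C++ I made up, not anything NVIDIA has published) of roughly what a compiler lowers a virtual call to: the object carries a pointer to a table of function pointers, and the call site loads an address out of that table and jumps to it.

#include <cstdio>

// Hand-rolled "vtable": roughly what a compiler generates for a virtual call.
struct Shape;
typedef float (*AreaFn)(const Shape*);   // function pointer type

struct Shape {
    const AreaFn* vtable;   // pointer to a table of function pointers
    float a, b;
};

static float circle_area(const Shape* s) { return 3.14159f * s->a * s->a; }
static float rect_area(const Shape* s)   { return s->a * s->b; }

static const AreaFn circle_vtable[] = { circle_area };
static const AreaFn rect_vtable[]   = { rect_area };

int main() {
    Shape c = { circle_vtable, 2.0f, 0.0f };
    Shape r = { rect_vtable,   3.0f, 4.0f };
    const Shape* shapes[] = { &c, &r };
    for (int i = 0; i < 2; ++i) {
        // Indirect call: the branch target comes from memory, not a fixed offset.
        printf("area = %f\n", shapes[i]->vtable[0](shapes[i]));
    }
    return 0;
}

That last call is an indirect branch whose target is only known at run time; hardware that can only branch to fixed offsets can't do it, which is why virtual functions (and function pointers in general) would need new branch support on the GPU.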
 
I think GT300 packs an additional SM (streaming multiprocessor) per TPC, up from 3 in GT200, and 16 TPCs, up from 10 in GT200.

G80 had 8 TPCs with 2 SMs inside each TPC. Each SM contains 8 SPs, for a total of 16 SPs per TPC, which equals 128 SPs overall:

8(TPC) * (8 * 2) = 128

GT200 had 10 TPCs with 3 SMs inside each TPC. Each SM contains 8 SPs, for a total of 24 SPs per TPC, which equals 240 SPs overall:

10(TPC) * (8 * 3) = 240

Now, according to the BSN rumour, I think GT300 will have 16 TPCs with 4 SMs inside each TPC. Each SM containing 8 SPs gives 32 SPs per TPC, which equals 512 SPs overall:

16 (TPC) * (8 * 4) = 512
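Just to sanity-check the arithmetic, a throwaway sketch (the GT300 line is the rumour, not a confirmed spec):

#include <cstdio>

// total SPs = TPCs * (SMs per TPC) * (SPs per SM)
static int total_sps(int tpcs, int sms_per_tpc, int sps_per_sm) {
    return tpcs * sms_per_tpc * sps_per_sm;
}

int main() {
    printf("G80:   %d\n", total_sps(8, 2, 8));    // 128
    printf("GT200: %d\n", total_sps(10, 3, 8));   // 240
    printf("GT300: %d\n", total_sps(16, 4, 8));   // 512 (rumoured)
    return 0;
}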


GT300 might be GT200 on steroids... I may be wrong.
 
It's really shaping up like Nvidia built Larrabee while Intel was talking about building it. I'm itching to know what fixed function stuff they might have gotten rid of, or if there are any changes to the rendering pipeline.

GT300 might be GT200 on steroids... I may be wrong.

According to Rys, you are :)
 
It's not really a big deal, hell there are C++ to C translators ... although I fail to see the point of Fortran apart from the warm and fuzzy feeling the name generates. It's not like even porting legacy code to it is an option; the level of algorithmic change needed to suit a GPU makes a rewrite the only realistic option. Fortran doesn't seem to me to be a great language to write kernels in.

PS. http://www.pgroup.com/resources/cudafortran.htm

It might be a big deal for the lazy ones out there *shrugs*
 
It's not really a big deal, hell there are C++ to C translators ...

I don't think those will work, since 'C for CUDA' isn't fully ANSI C. C++ can only be translated to C if the target supports all the necessary features... like I said, function pointers are key to the object model.
 
It's really shaping up like Nvidia built Larrabee while Intel was talking about building it. I'm itching to know what fixed function stuff they might have gotten rid of, or if there are any changes to the rendering pipeline.

If it's true, it looks like Nvidia went GPU->CPU while Intel are trying to do CPU->GPU, i.e. both are trying to solve the same problems from opposite starting points. It's certainly a very ambitious approach and a big step towards convergence of GPU/CPU.

I guess all those questions about Nvidia not having an x86 licence are kind of moot if you can talk to the new chip via a compiler the same way you talk to any CPU.
 
Very typical of Theo, convenient how both hardware-Infos & bsn come out with this 'exclusive' 'breaking' news story AFTER Rys' hint :D

By the way, when is the webcast? (est)

There's a huge difference between an educated hw analysis and being first and second at nothing. :oops:
 
It's not really a big deal, hell there are C++ to C translators ... although I fail to see the point of Fortran apart from the warm and fuzzy feeling the name generates. It's not like even porting legacy code to it is an option; the level of algorithmic change needed to suit a GPU makes a rewrite the only realistic option. Fortran doesn't seem to me to be a great language to write kernels in.

PS. http://www.pgroup.com/resources/cudafortran.htm
Quite a lot of scientific work is still done in Fortran, and though it is possible to get Fortran and C/C++ to play together, it can be awkward and fraught with build and compilation issues. So having a native Fortran version of CUDA could be a boon for getting it adopted within the scientific community.
 
Looks like that's exactly what they're trying to do. Strange that there's no mention of any graphics-specific bits so far. Not saying there aren't any, but the focus seems to have veered sharply away from graphics.
The Rys blur-o-gram had a lot of the same colors in the area that the GT200 one had for shader and triangle setup. It looks like there's still some kind of texture block.
The compute portion appears to be heavily reworked, and the area that was the ROP section is still there, but I can't infer much from a gray (oddly dark gray...) smudge.

If the setup, texturing, and ROP specialized sections persist, the Fermi architecture would be the answer to the question "what if we made Larrabee without x86, and gave it ROPs and a rasterizer?"
The next question would be, "what if we built Larrabee with an inferior process", but I digress.

That's true, but the same could be said for G71->G80, which was an even bigger change. Though they are trying to do more stuff now, which could have put a strain on resources.
The rumors seem to reflect that the birthing process for this new chip could have been smoother.

It's probably safe to assume that if they're serious about computing, performance of atomics would have been high on their to-do list. Side question - are the existing caches on GPUs generally useful for non-texture data (not referring to the specialized caches like PTVC)?
I'm not sure.
They are pretty small, and they are structured to provide peak bandwidth for the common case of filtered texture fetches.
I'm not sure how much of their behavior changes if they are tasked with linearly addressed memory. If the data is structured to make the most of them, then their bandwidth can be used.
Their size and read-only nature make them less than generally useful.
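For what it's worth, CUDA does let you pull an ordinary linear buffer through the texture cache by binding it to a texture reference. A rough sketch of the idea (my own made-up kernel and buffer names, using the since-deprecated texture-reference API):

#include <cstdio>
#include <cuda_runtime.h>

// Texture reference bound to plain linear device memory, so reads go
// through the (read-only) texture cache instead of straight to DRAM.
texture<float, 1, cudaReadModeElementType> tex_in;

__global__ void scale(float* out, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * tex1Dfetch(tex_in, i);   // fetch via the texture cache
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaBindTexture(0, tex_in, d_in, n * sizeof(float));  // bind linear memory

    scale<<<(n + 255) / 256, 256>>>(d_out, 2.0f, n);
    cudaDeviceSynchronize();

    cudaUnbindTexture(tex_in);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

That only pays off if the access pattern has some locality and the buffer is effectively read-only for the kernel's lifetime, which is exactly the limitation mentioned above.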
 
Very typical of Theo, convenient how both hardware-Infos & bsn come out with this 'exclusive' 'breaking' news story AFTER Rys' hint :D
Wouldn't that be NVIDIA's hint? (If you're under NDA you tell what you are told you can tell.)
 
All will be revealed later today anyway, not long to go now.

Wow, no wonder the parking lot was full late last night! I thought we had a couple of weeks to go. I wonder if they brought the demo forward for competitive reasons?

Also, I realize Rys says everything has changed, but:
1) 16 KB per 8-wide set of SPs does actually work out (quick arithmetic after this list)
2) I like that there are four blue dots and four sets of SPs in there
3) I wonder what bits in the chip run the C++ code
4) If DP runs half-speed, they really have done some work in there.
5) Isn't it great that each SP can run an instruction per clock per thread? Why, all I have to do to increase performance is add more threads! Infinite TFlops!
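On (1): assuming the figure in question is 64 KB of shared memory/L1 per 32-SP multiprocessor (my assumption, not something stated in this thread), the ratio lines up with GT200's 16 KB per 8-SP SM:

64 KB / (32 SPs / 8) = 16 KB per 8-wide set of SPs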
 