Closed Thread
30-Sep-2009, 14:19   #2751
Scali
Naughty Boy!
 
Join Date: Nov 2003
Posts: 2,127

Quote:
Originally Posted by MfA View Post
It's not really a big deal, hell there are C++ to C translators ...
I don't think those will work, since 'C for CUDA' isn't fully ANSI C. C++ can only be translated to C if the target C supports every feature the translation relies on... like I said, function pointers are key to the object model.
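To make the function-pointer point concrete: a C++-to-C translator typically lowers each virtual call to an indirect call through a per-class table of function pointers that every object points to. The Python sketch below only models that lowering, with plain callables standing in for C function pointers; the names (`Shape`, `AREA_SLOT`, `call_virtual`) are hypothetical. If the target dialect can't make indirect calls, there is nothing to translate the last line of `call_virtual` into.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical model of C++ virtual dispatch as a C++-to-C translator emits it:
# one table of function pointers per class, and a pointer to it in each object.
AREA_SLOT = 0  # index of the virtual area() entry in every vtable

@dataclass
class Shape:
    vtable: Tuple[Callable[["Shape"], float], ...]  # stands in for a C vptr
    size: float  # radius for circles, side length for squares

def circle_area(self: Shape) -> float:
    return math.pi * self.size * self.size

def square_area(self: Shape) -> float:
    return self.size * self.size

CIRCLE_VTABLE = (circle_area,)
SQUARE_VTABLE = (square_area,)

def call_virtual(obj: Shape, slot: int) -> float:
    # Generated C would read roughly obj->vptr[slot](obj): an indirect call.
    return obj.vtable[slot](obj)

shapes = [Shape(CIRCLE_VTABLE, 1.0), Shape(SQUARE_VTABLE, 3.0)]
areas = [call_virtual(s, AREA_SLOT) for s in shapes]
```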
__________________
ZX81 -> C64 -> Hercules -> Plantronics CGA -> Paradise VGA -> Amiga ECS -> Amiga AGA -> Cirrus Logic 5428 VLB -> S3 Trio64 -> Matrox Mystique -> PCX2 -> Matrox G200 -> Matrox G450 -> GeForce2 GTS -> Kyro II -> Radeon 8500 -> Radeon 9600XT -> GeForce 7600GT -> GeForce 8800GTS -> HD5770
30-Sep-2009, 14:21   #2752
Bouncing Zabaglione Bros.
Regular
 
Join Date: Jun 2003
Posts: 6,179

Quote:
Originally Posted by trinibwoy View Post
It's really shaping up like Nvidia built Larrabee while Intel was talking about building it. I'm itching to know what fixed function stuff they might have gotten rid of, or if there are any changes to the rendering pipeline.
If it's true, it looks like Nvidia went GPU->CPU while Intel are trying to do CPU->GPU, i.e. both are trying to solve the same problems from opposite starting points. It's certainly a very ambitious approach and a big step towards convergence of GPU/CPU.

I guess all those questions about Nvidia not having an x86 licence are kind of moot if you can talk to the new chip via a compiler the same way as you talk to any CPU.
30-Sep-2009, 14:27   #2753
Arty
KEPLER
 
Join Date: Jun 2005
Posts: 1,893

Quote:
Originally Posted by Bouncing Zabaglione Bros. View Post

Is this the first big leak? Sounds impressive on paper, though I do wonder if all the focus on GPGPU means that the gaming side of things will be taking a backseat.
Very typical of Theo, convenient how both hardware-Infos & bsn come out with this 'exclusive' 'breaking' news story AFTER Rys' hint

By the way, when is the webcast? (est)
__________________
People like you - Silent_Buddha laying an epic smackdown on XMAN26's double standards.
So you're mixing apples and oranges to calculate grapes and then compare it to apples. - silent_guy's witty retort on sweeping comparisons.
30-Sep-2009, 14:32   #2754
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,831

Quote:
Originally Posted by Arty View Post
Very typical of Theo, convenient how both hardware-Infos & bsn come out with this 'exclusive' 'breaking' news story AFTER Rys' hint

By the way, when is the webcast? (est)
There's a huge difference between an educated hw analysis and being first and 2nd at nothing.
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs.
30-Sep-2009, 14:32   #2755
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,679

Quote:
Originally Posted by MfA View Post
It's not really a big deal, hell there are C++ to C translators ... although I fail to see the point of Fortran apart from the warm and fuzzy feeling the name generates. It's not like even porting legacy code to it is an option, the level of algorithmic changes needed to suit a GPU make a rewrite the only realistic option. Fortran doesn't seem to me to be a great language to write kernels in.

PS. http://www.pgroup.com/resources/cudafortran.htm
Quite a lot of scientific work is still done in Fortran, and though it is possible to get Fortran and C/C++ to play together, it can be difficult, and getting mixed-language code to compile and link properly is fraught with problems. So having a native Fortran version of CUDA could be a boon for getting it adopted within the scientific community.
30-Sep-2009, 14:33   #2756
Scali
Naughty Boy!
 
Join Date: Nov 2003
Posts: 2,127

Quote:
Originally Posted by Arty View Post
By the way, when is the webcast? (est)
Keynote is at 1 PM PT, which I assume will be webcast live; see here for more info:
http://www.nvidia.com/object/gpu_tec...onference.html
30-Sep-2009, 14:33   #2757
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810

Quote:
Originally Posted by Arty View Post
By the way, when is the webcast? (est)
4pm EST / 1pm PST

http://www.nvidia.com/object/gpu_tec...onference.html
__________________
What the deuce!?
30-Sep-2009, 14:37   #2758
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,280

Quote:
Originally Posted by trinibwoy View Post
Looks like that's exactly what they're trying to do. Strange that there's no mention of any graphics specific bits so far. Not saying there aren't any but the focus seems to have veered sharply away from graphics.
The Rys blur-o-gram had a lot of the same colors in the area that the GT200 one had for shader and triangle setup. It looks like there's still some kind of texture block.
The compute portion appears to be heavily reworked, and the area that was the ROP section is still there, but I can't infer much from a gray (oddly dark gray...) smudge.

If the setup, texturing, and ROP specialized sections persist, the Fermi architecture would be the answer to the question "what if we made Larrabee without x86, and gave it ROPs and a rasterizer?"
The next question would be, "what if we built Larrabee with an inferior process", but I digress.

Quote:
That's true, but the same could be said for G71->G80 which was an even bigger change. Though they are trying to do more stuff now which could have put a strain on resources.
The rumors seem to reflect that the birthing process for this new chip could have been smoother.

Quote:
It's probably safe to assume that if they're serious about computing, performance of atomics would have been high on their to-do list. Side question: are the existing caches on GPUs generally useful for non-texture data (not referring to the specialized caches like PTVC)?
I'm not sure.
They are pretty small, and they are structured to provide peak bandwidth for the common case of filtered texture fetches.
I'm not sure how much of their behavior changes if they are tasked with linearly addressed memory. If the data is structured to make the most of them, then their bandwidth can be used.
Their size and read-only nature makes them less than generally useful.
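On why these caches suit filtered fetches better than linear data: textures are commonly stored in a tiled or swizzled layout so that a 2x2 bilinear footprint lands close together in memory. Z-order (Morton) indexing is the textbook example; whether any particular GPU uses exactly this layout is an assumption, since real layouts aren't documented. This Python sketch compares the address spread of an aligned 2x2 footprint under Morton and plain row-major addressing; the texture width is a hypothetical value.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n apart, leaving a zero bit between each."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(x: int, y: int) -> int:
    """Z-order (Morton) index: x bits in even positions, y bits in odd ones."""
    return part1by1(x) | (part1by1(y) << 1)

def linear_index(x: int, y: int, width: int) -> int:
    """Plain row-major index, i.e. linearly addressed memory."""
    return y * width + x

def footprint_span(index_of, x: int, y: int) -> int:
    """Distance in texels between the lowest and highest address touched
    by a 2x2 bilinear footprint whose top-left texel is (x, y)."""
    idx = [index_of(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]
    return max(idx) - min(idx)

WIDTH = 1024  # hypothetical texture width
morton_span = footprint_span(morton_index, 2, 2)
linear_span = footprint_span(lambda x, y: linear_index(x, y, WIDTH), 2, 2)
```

With the footprint aligned to an even texel, the Morton addresses are four consecutive texels (a span of 3), while row-major addressing spreads them a full row apart (a span of WIDTH + 1).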
__________________
Dreaming of a .065 micron etch-a-sketch.
30-Sep-2009, 14:40   #2759
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,328

Quote:
Originally Posted by Arty View Post
Very typical of Theo, convenient how both hardware-Infos & bsn come out with this 'exclusive' 'breaking' news story AFTER Rys' hint
Wouldn't that be NVIDIA's hint? (If you're under NDA you tell what you are told you can tell.)
30-Sep-2009, 15:00   #2760
dnavas
Member
 
Join Date: Apr 2004
Posts: 326

Quote:
Originally Posted by Rys View Post
All will be revealed later today anyway, not long to go now.
Wow, no wonder the parking lot was full late last night! I thought we had a couple of weeks to go. I wonder if they brought the demo forward for competitive reasons?

Also, I realize Rys says everything has changed, but:
1) 16KB per 8-wide set of SPs does actually work out
2) I like that there are four blue dots and four sets of SPs in there
3) I wonder what bits in the chip run the C++ code
4) If DP runs half-speed, they really have done some work in there.
5) Isn't it great that each SP can run an instruction per clock per thread? Why, all I have to do to increase performance is add more threads! Infinite TFlops!
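The joke in point 5 is easy to check with the usual peak-rate arithmetic: peak throughput is ALUs x flops per clock x clock, and the number of threads doesn't appear in it. Threads in flight only decide how much latency gets hidden, i.e. how close a kernel gets to the ceiling. A minimal Python sketch using the GTX285 figures quoted later in this thread (30 SIMDs x 8 ALUs at 1476MHz), counting a MAD as 2 flops and ignoring the co-issued MUL; the function name is hypothetical.

```python
def peak_gflops(alus: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Peak rate in GFLOPS. There is deliberately no thread-count parameter:
    more threads can only help reach this ceiling, never raise it."""
    return alus * flops_per_clock * clock_mhz / 1000.0

GTX285_ALUS = 30 * 8  # 30 SIMDs x 8 ALUs
gtx285_mad_peak = peak_gflops(GTX285_ALUS, 2, 1476)  # MAD counted as 2 flops
```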
30-Sep-2009, 15:57   #2761
Richard
Mord's imaginary friend
 
Join Date: Jan 2004
Location: PT, EU
Posts: 3,506

Quote:
Originally Posted by chavvdarrr View Post
1 day in near future, a writing on the screen when starting your brand new IBM PC compatible...


"no CPU detected, starting software emulation"

LOL
One would hope that by then we wouldn't need POST-screens anymore. Who am I kidding, we're going to still be posting in x86 mode regardless... the future will bring x86 bootstrap hw in the mobo using CMOS for working set; mark my words. For an industry so quick to change we are an awful crotchety bunch.
__________________
The optimist proclaims that we live in the best of all possible worlds, and the pessimist fears this is true. - James Branch Cabell
30-Sep-2009, 16:24   #2762
GrapeApe
Junior Member
 
Join Date: Apr 2004
Location: Calgary, Canada
Posts: 57

With nV being so die-space limited again, focusing heavily on the Tesla family in design, and trying to pack as much compute power onto the die as possible on the current process, I'm guessing the return of the NVIO is a safe assumption, no?

With that, what would the limitations be on putting more than one traditional NVIO on the PCB to allow for greater multiple-monitor configurations (more as a rarer 'we can do it too' configuration than as a general design)? With the DRAM and ROP/RBE partitions being an odd number as inferred from the blurry diagram, I'm assuming a six-cluster arrangement would be easier to feed to two external NVIOs than 3 distinct groups of even numbers.

It would be another way to address a PR checkbox, in an era of the return of the checkbox (3DVision, Eyefinity, PhysX etc.), and if possible would be simpler than a near-term NVIO redesign.

I'm just not sure of the restriction on the NVIO as there's not too much on the underlying design, just the base components included (TMDS, RAMDACs, etc).

I always thought the NVIO was a near-term cop-out, but it would be essential if you wanted to move to a multi-die, MCM-style future design, both to avoid duplicating resources and to maximize the transistor budget for this and for multiple offspring designs (like Tesla).

I know there are 2 NVIOs on the GTX295, but that's primarily due to the SLI considerations when communicating with the bridge.

Anywhoo, just curious if anyone knows for sure if dual NVIOs per chip was possible, or if it's limited by memory interface or RBE/ROP restrictions by design?
__________________
"I'm Sorry That would be playing God."
"GOD Shmod, I want my Monkey Man! "
30-Sep-2009, 16:33   #2763
jaredpace
Member
 
Join Date: Sep 2009
Posts: 135

What is this?

30-Sep-2009, 16:33   #2764
DegustatoR
Senior Member
 
Join Date: Mar 2002
Location: msk.ru/spb.ru
Posts: 1,311

Quote:
Originally Posted by GrapeApe View Post
With that, what would the limitations be towards putting more than one traditional NVIO on the the PCB to allow for greater multiple monitor configurations (more as a rarer 'we can do it too' configuration than as a general design).
AFAIK even the first version of NVIO allows for 4 simultaneous outputs.
And NVIO has nothing to do with being die size limited.
30-Sep-2009, 16:34   #2765
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by DegustatoR View Post
That's a bit of an overstatement.
At best 10-20% better performance than HD4890 in games despite having more bandwidth and being dramatically larger. I don't see any overstatement there.

Quote:
So how come nobody did?
Maybe they did but AMD blanked them

Quote:
GT218 is a 60mm^2 GPU. I don't think that you can compare it to the 140mm^2 RV740.
It wasn't a reference to the performance of RV740. It was a reference to the ability to refresh on a new node and improve all performance-per metrics significantly.

Quote:
And you surely can't compare it to a GPU made on another process.
Why? They're direct competitors (until Cedar arrives). If it was higher performance and/or lower-power we'd say "that's the benefit of 40nm". Instead we're just scratching our heads.

Quote:
In other words we need more information before any conclusion on GT21x being a failure can be made. One review of GT218 isn't enough for such conclusion.
It'll need to be quite a turnaround. Remember NVidia was boasting about expecting to be first with 40nm chips.

When something as "simple" as GT218 is delayed and working badly it's not particularly surprising that NVidia's not ready for W7 launch with a 40nm D3D11 GPU.

Jawed
__________________
Can it play WoW?
30-Sep-2009, 16:37   #2766
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,396

Quote:
Originally Posted by jaredpace View Post
What is this?

G92(b)
30-Sep-2009, 16:44   #2767
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by Humus View Post
Previously the interpolator provided the interpolated values to the shader. In SM 5.0 the shader can ask for interpolated values by itself. There are some functions for it: EvaluateAttributeAtCentroid(), EvaluateAttributeAtSample() and EvaluateAttributeSnapped().
Ooh, very interesting, thanks. Can't find anything about those online

Is there something similar for use in DS to help in obtaining attributes at the newly generated points?
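For reference, the math behind the pull-model functions Humus lists is just interpolation at a position the shader chooses (pixel centre, a particular sample, the centroid) instead of one position fixed by the hardware. The Python sketch below shows plain screen-space barycentric interpolation; `evaluate_attribute_at` is a hypothetical name, not the HLSL API, the sample position is an arbitrary example rather than a real MSAA pattern, and perspective correction is ignored.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Screen-space barycentric weights of p with respect to triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area2 = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)  # twice the signed area
    w_b = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area2
    w_c = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area2
    return 1.0 - w_b - w_c, w_b, w_c

def evaluate_attribute_at(p: Point, tri: Sequence[Point], attrs: Sequence[float]) -> float:
    """Hypothetical pull-model helper: interpolate a per-vertex attribute at
    whatever position the shader asks for (pixel centre, a sample, a centroid)."""
    wa, wb, wc = barycentric(p, tri[0], tri[1], tri[2])
    return wa * attrs[0] + wb * attrs[1] + wc * attrs[2]

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
attrs = [0.0, 8.0, 4.0]  # the attribute equals 2x + y across this triangle
at_centre = evaluate_attribute_at((1.5, 0.5), tri, attrs)    # pixel centre
at_sample = evaluate_attribute_at((1.25, 0.75), tri, attrs)  # arbitrary sample position
```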

Jawed
30-Sep-2009, 16:49   #2768
DegustatoR
Senior Member
 
Join Date: Mar 2002
Location: msk.ru/spb.ru
Posts: 1,311

Quote:
Originally Posted by Jawed View Post
When something as "simple" as GT218 is delayed and working badly it's not particularly surprising that NVidia's not ready for W7 launch with a 40nm D3D11 GPU.
I fail to see any correlation between GT218 and DX11.
And it was late because of TSMC, not NVIDIA. Which raises the question of who's to blame for its power characteristics, too.
30-Sep-2009, 17:00   #2769
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by 3dilettante View Post
Any such data path puts RV870 one step closer to fully closing the write/read loop in the manner CPU caches do.
It would probably still be less flexible and have higher latency, but at least there's an on-chip path.
Another thought is merely that the L2 system can query the RBE-owned render target structures and either decode the RBEs' compression tag tables or request decompression semantics for the data it wants to fetch from memory. So the on-chip linkage might be quite simple and L2 is simply doing most of the work, rather than having RBEs fetching the data and using the render target caches.

Quote:
As a side note, I'm curious about the additional non-texture L1 that was added alongside the regular texture cache, as mentioned in the Anandtech article. What this brings to the table at that size compared to the larger texture and LDS, I'm not sure. It would help with problems with thrashing, if graphics and compute shaders hit the same SIMD, I suppose.
In a GPGPU situation, what would it offer over using the larger L1?
Sure this isn't just the regular L1 cache that's used for textures? I don't trust Anandtech.

Jawed
30-Sep-2009, 17:12   #2770
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by nAo View Post
With now both NVIDIA and ATI interpolating in the shader cores divisions are used even more often
Even Larrabee has RCP and RSQRT intrinsics. I wonder what the throughput for these is. I guess that's the cost of doing graphics, rather than just general compute. The EXP2 and LOG2 functions are useful too, though base-2 stuff is pretty easy I dare say (partly re-using FTOI/ITOF, I guess?).
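To spell out nAo's point: once interpolation runs in the shader cores, perspective-correct interpolation needs a reciprocal per pixel, since what varies linearly in screen space is attr/w and 1/w, not attr itself. A minimal Python sketch, assuming the screen-space barycentric weights are already known; the function name is hypothetical.

```python
from typing import Sequence

def perspective_correct(weights: Sequence[float], attrs: Sequence[float],
                        ws: Sequence[float]) -> float:
    """Perspective-correct interpolation from screen-space barycentric weights.

    attr/w and 1/w vary linearly in screen space, so both are interpolated
    linearly; getting attr back costs one reciprocal per pixel (the RCP).
    """
    # A real pipeline would set up attr/w and 1/w per vertex once per
    # triangle; they are recomputed here for brevity.
    attr_over_w = sum(wt * a / w for wt, a, w in zip(weights, attrs, ws))
    one_over_w = sum(wt / w for wt, w in zip(weights, ws))
    return attr_over_w * (1.0 / one_over_w)  # per-pixel reciprocal, then multiply

# Equal w at every vertex: same result as plain linear interpolation.
flat = perspective_correct([0.5, 0.25, 0.25], [0.0, 8.0, 4.0], [2.0, 2.0, 2.0])
# Vertex 1 is 3x further away, so halfway along the edge in screen space
# is well short of halfway in attribute space.
skewed = perspective_correct([0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [1.0, 3.0, 1.0])
```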

Jawed
30-Sep-2009, 17:19   #2771
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by trinibwoy View Post
How do you figure that?

I get 30 * 16 banks * 4 bytes * 600MHz ≈ 1.15TB/s
I was looking at it from the point of view of the throughput for the ALUs, that 1 operand per clock is available per MAD: 30 SIMDs * 8 ALUs * 4 bytes * 1476MHz (GTX285) = 1.417TB/s.
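Both figures come from the same product formula, units x lanes x bytes per lane x clock; the only disagreement is which clock and which lane count apply. A small Python sketch reproducing the two numbers (the helper name is hypothetical, and whether shared memory runs at the 600MHz core clock or the hot clock is exactly the open question here):

```python
def onchip_bandwidth_tb_s(units: int, lanes: int, bytes_per_lane: int,
                          clock_mhz: float) -> float:
    """Aggregate bandwidth in TB/s (10^12 bytes/s): units * lanes * bytes * clock."""
    return units * lanes * bytes_per_lane * clock_mhz * 1e6 / 1e12

# trinibwoy: 30 SIMDs x 16 shared-memory banks x 4 bytes at 600MHz
smem_tb_s = onchip_bandwidth_tb_s(30, 16, 4, 600)
# Jawed: one 4-byte operand per ALU per hot clock, 30 x 8 ALUs at 1476MHz (GTX285)
operand_tb_s = onchip_bandwidth_tb_s(30, 8, 4, 1476)

print(f"{smem_tb_s:.3f} TB/s vs {operand_tb_s:.3f} TB/s")  # 1.152 TB/s vs 1.417 TB/s
```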

Jawed
30-Sep-2009, 17:23   #2772
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,280

Quote:
Originally Posted by Jawed View Post
Sure this isn't just the regular L1 cache that's used for textures? I don't trust Anandtech.
I don't know. It seemed like an odd thing to just make up out of thin air.
30-Sep-2009, 17:27   #2773
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810

Quote:
Originally Posted by Jawed View Post
I was looking at it from the point of view of the throughput for the ALUs, that 1 operand per clock is available per MAD: 30 SIMDs * 8 ALUs * 4 bytes * 1476MHz (GTX285) = 1.417TB/s.

Jawed
Oh, ok, though I'm not sure if the register file and/or shared memory runs at the hot clock. For one thing, results from the pipeline are written 16 at a time which implies some sort of buffering.
30-Sep-2009, 17:35   #2774
Arty
KEPLER
 
Join Date: Jun 2005
Posts: 1,893

Quote:
Originally Posted by DegustatoR View Post
I fail to see any correlation between GT218 and DX11.
And it was late because of TSMC, not NVIDIA. Which raises the question of who's to blame for its power characteristics, too.
That's hardly an excuse; AMD didn't suffer as much, so it comes down to NV's design.

Thanks for the webcast time, appreciate it.
30-Sep-2009, 17:57   #2775
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867

Quote:
Originally Posted by dnavas View Post
At any rate, from my limited experience, if I wanted to make my ALUs more generic, DIV would have to be close to the top of my list....
See figure 4:

http://www.ece.ubc.ca/~aamodt/papers...m.ispass09.pdf

Sure, it's not comprehensive, but SFU isn't getting much use there.
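On DIV: hardware without a divider usually computes a/b as a * rcp(b) and, where more precision is needed, refines the reciprocal estimate with Newton-Raphson steps x' = x(2 - bx) that run on the ordinary MAD units. A minimal Python sketch; the function names are hypothetical and the 12-bit starting estimate is an arbitrary assumption standing in for whatever precision a real SFU provides.

```python
import math

def approx_rcp(b: float, bits: int = 12) -> float:
    """Stand-in for a low-precision hardware reciprocal estimate: 1/b with the
    mantissa truncated to `bits` bits (an exact divide is used here only to
    emulate the estimate; the 12-bit default is an arbitrary assumption)."""
    m, e = math.frexp(1.0 / b)  # 1/b == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)

def divide(a: float, b: float, iterations: int = 1) -> float:
    """a / b as a * rcp(b), refining the estimate with Newton-Raphson steps."""
    x = approx_rcp(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)  # each step roughly doubles the correct bits
    return a * x

rel_err_estimate = abs(approx_rcp(3.0) * 3.0 - 1.0)  # about 2**-12
rel_err_refined = abs(divide(1.0, 3.0) * 3.0 - 1.0)  # about (2**-12)**2
```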

Jawed

Tags
nvidia, speculation

All times are GMT +1.