30-Sep-2009, 05:31 #2701
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,768

Quote:
Originally Posted by nAo View Post
You clearly haven't read what I wrote about data transfer errors. We are dealing with GDDR5, it won't fail, it will scale badly or even impact perf. Moreover no app is entirely ALU limited or bw limited, bottlenecks are dynamic and constantly change while rendering a single frame.
BTW, I'm not just talking about the memory modules 'failing'; the GDDR5 interface can fail as well.
I don't know anything yet, but assuming the 384-bit bus for GF100 is true, what guarantees that we won't see something similar here too?
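nAo's point, that an overclocked GDDR5 link degrades rather than dies, can be illustrated with a toy model. This is a minimal sketch, assuming detected transfer errors are simply retried and each burst is corrupted independently with probability p. The error-versus-clock curve in `hypothetical_error_rate` is invented for illustration and is not measured data, and the 256-bit bus width is only an example parameter.

```python
def raw_bandwidth_gbs(clock_mhz: float, bus_bits: int = 256,
                      transfers_per_clock: int = 4) -> float:
    """Peak bandwidth in GB/s: bytes per transfer * transfers per clock * clock.
    GDDR5 moves 4 bits per pin per command clock, hence the default of 4."""
    return bus_bits / 8 * transfers_per_clock * clock_mhz / 1000.0


def effective_bandwidth_gbs(raw_gbs: float, p_error: float) -> float:
    """Useful bandwidth when each burst fails with probability p_error and is
    resent until it succeeds (geometric number of attempts)."""
    if not 0.0 <= p_error < 1.0:
        raise ValueError("p_error must be in [0, 1)")
    expected_attempts = 1.0 / (1.0 - p_error)
    return raw_gbs / expected_attempts


def hypothetical_error_rate(clock_mhz: float, stable_mhz: float = 1200.0) -> float:
    """HYPOTHETICAL: no errors up to stable_mhz, then a quadratic rise.
    Made up purely to show the shape of the effect."""
    over = max(0.0, clock_mhz - stable_mhz) / stable_mhz
    return min(0.99, 20.0 * over * over)


if __name__ == "__main__":
    for clock in (1200, 1250, 1300, 1350):
        raw = raw_bandwidth_gbs(clock)
        useful = effective_bandwidth_gbs(raw, hypothetical_error_rate(clock))
        print(f"{clock} MHz: peak {raw:.1f} GB/s, useful {useful:.1f} GB/s")
```

With these made-up parameters, useful bandwidth rises slightly from 1200 to 1250 MHz and then drops below the stock value. That is the "scale badly or even impact perf" behaviour rather than a hard failure.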
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs.
30-Sep-2009, 06:28 #2702
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070

I, on the other hand, believe that CPU-style caches don't scale. LRB's rendering pipeline is ample proof of that. We'll need scratchpad memories, just like Cell and today's GPUs. However, the one thing I'd change relative to Cell is to allow vector scatter/gather from global memory as well, not just async DMAs.

Cell programmers might be banging their heads against walls, stones, etc., but GPU programmers have gotten on pretty fine with CUDA over the last 2.5 years.
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.
30-Sep-2009, 06:31 #2703
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297

Quote:
Originally Posted by rpg.314 View Post
But gpu programmers have got on pretty fine in the last 2.5 years on CUDA.
If you believe that, you haven't read enough CUDA-based research papers.
edit: sooner or later NVIDIA & ATI will add proper coherent r/w caches to their architectures; it's just a matter of time.
__________________
[twitter]
More samples, we need more samples! [Dean Calver]
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way
30-Sep-2009, 06:31 #2704
FUDie
Member
 
Join Date: Sep 2002
Posts: 559

Quote:
Originally Posted by nAo View Post
You clearly haven't read what I wrote about data transfer errors. We are dealing with GDDR5, it won't fail, it will scale badly or even impact perf. Moreover no app is entirely ALU limited or bw limited, bottlenecks are dynamic and constantly change while rendering a single frame.
Yes, I did read what you wrote, and I do understand it. Nothing you say contradicts the fact that Crysis scaled better with engine clock. It doesn't matter if the memory wasn't scaling as well due to errors: a 9% engine clock increase gave a 5% performance boost. If both engine and memory clocks were increased by 9%, the maximum gain we'd expect would be 9%, so a 9% memory clock increase could give at most 4% more performance.

Engine clock is having a larger impact here. Note that engine speed regulates more than just ALU speed; it also controls ROP performance, vertex rates, etc.

-FUDie
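FUDie's bound is simple arithmetic. This is a minimal sketch, assuming (as the post does) that raising every clock by the same fraction can speed things up by at most that fraction, and treating the engine and memory contributions as roughly additive. The 9% and 5% figures come from the post; the function names are illustrative.

```python
def scaling_efficiency(perf_gain: float, clock_gain: float) -> float:
    """Fraction of a clock increase that showed up as performance."""
    return perf_gain / clock_gain


def max_memory_gain(clock_gain: float, engine_only_gain: float) -> float:
    """Upper bound on what the memory overclock can still add.

    Raising engine AND memory clocks by clock_gain yields at most clock_gain,
    so once the engine clock alone delivered engine_only_gain, memory can add
    at most the remainder (first-order, additive approximation)."""
    return max(0.0, clock_gain - engine_only_gain)


engine_eff = scaling_efficiency(0.05, 0.09)
memory_cap = max_memory_gain(0.09, 0.05)
print(f"engine clock efficiency: {engine_eff:.0%}")       # 56%
print(f"memory clock can add at most: {memory_cap:.0%}")  # 4%
```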
__________________
Ph.D. - Piled Higher and Deeper
30-Sep-2009, 07:00 #2705
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070

Quote:
Originally Posted by nAo View Post
If you believe that you haven't read enough CUDA based research papers
Maybe. But to be convinced otherwise, I'd like to see someone using r/w cache coherency with high performance on, say, an O(50)-core chip.

Quote:
edit: sooner or later nvidia & ati will add proper coherent r/w caches to their architectures, it's just a matter of time.
I am in the software-managed caches camp for now. R/w coherent caches hurt more than they help in the O(50)-core regime, as your compute increases as O(p) but your communication increases as O(p^2).
30-Sep-2009, 07:31 #2706
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,297

Quote:
Originally Posted by rpg.314 View Post
your communication increases by O(p^2).
With naive/simple hw implementations.
30-Sep-2009, 07:57 #2707
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070

Maybe it is possible to reduce the O(p^2) to something lower, but I am still waiting for something that uses r/w cache coherency on an O(50)-core chip with high performance.
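The O(p^2) versus "something lower" exchange can be made concrete with a toy message count. This is a minimal sketch under simplified, assumed models, not a description of any real GPU. With naive broadcast snooping, every write that needs ownership is announced to all p-1 other cores. With a directory, the write goes to the home directory, which invalidates only the cores actually sharing the line (assumed here to be a small fixed number). Acknowledgements and data replies are ignored.

```python
def snoop_messages(p: int, writes_per_core: int = 1) -> int:
    """Broadcast snooping: each write reaches the other p - 1 cores,
    so p cores writing generate p * (p - 1) messages, i.e. O(p^2)."""
    return p * writes_per_core * (p - 1)


def directory_messages(p: int, sharers: int, writes_per_core: int = 1) -> int:
    """Directory protocol: one request to the home directory plus one
    invalidation per current sharer. With the sharer count bounded,
    traffic grows linearly in p."""
    k = min(sharers, p - 1)
    return p * writes_per_core * (1 + k)


for cores in (4, 16, 50, 64):
    print(f"{cores:3d} cores: snoop {snoop_messages(cores):5d}, "
          f"directory {directory_messages(cores, sharers=2):4d}")
```

Whether real workloads keep the sharer count small is exactly what the two posters disagree about. Widely shared, frequently written data pushes the directory case back toward the broadcast one.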
30-Sep-2009, 08:49 #2708
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675

Quote:
Originally Posted by Anteru View Post
Seriously, if Rys is already doing diagrams of GF100 (while those for HD5k are still not out yet?), I'll definitely wait for the GF100 before deciding where to sink my money.
Those for HD 5870 are done, and were done before I started work on GF100 (thanks Alex!). We'll publish on it soon.
__________________
A major redesign of the core ALU pineapple boomerang fortress.
30-Sep-2009, 10:44 #2709
Davros
Darlek ******
 
Join Date: Jun 2004
Posts: 9,498

Quote:
Originally Posted by Ailuros View Post
the 384bit bus for GF100 is true,
GF100? Where did this come from? I know about G300, but GF100???

edit: and GT212, what the bloody hell is that?
__________________
Guardian of the Most holy Two Terabytes of Gaming Goodness™
30-Sep-2009, 10:58 #2710
Dr Evil
Anas platyrhynchos
 
Join Date: Jul 2004
Location: Finland
Posts: 4,373

Quote:
Originally Posted by Davros View Post
GF100 ? where did this come from I know about G300, but Gf100 ???
Go back to post nr. 2548 and read forward.
30-Sep-2009, 11:05 #2711
AnarchX
Senior Member
 
Join Date: Apr 2007
Posts: 1,393

Quote:
GPU specifications
This is the meat part you always want to read first. So, here's how it goes:
* 3.0 billion transistors
* 40nm TSMC
* 384-bit memory interface
* 512 shader cores [renamed into CUDA Cores]
* 32 CUDA cores per Shader Cluster
* 1MB L1 cache memory [divided into 16KB Cache - Shared Memory]
* 768KB L2 unified cache memory
* Up to 6GB GDDR5 memory
* Half Speed IEEE 754 Double Precision
BSN
30-Sep-2009, 11:08 #2712
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

Quote:
What makes a single unit important is the fact that it can execute an integer or a floating point instruction per clock per thread.
__________________
What the deuce!?
30-Sep-2009, 11:15 #2713
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,229

Quote:
built-in ECC features inside the GDDR5 SDRAM memory
Is he just being disingenuous here, or does he still not get that it will generally only correct transfer errors?
30-Sep-2009, 11:21 #2714
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,768

Quote:
Originally Posted by Davros View Post
GF100 ? where did this come from I know about G300, but Gf100 ???

edit: and Gt212 what the bloody hell is that ?
GT212 was, IMHO, a 40nm/D3D10.1 project, which would have been a pretty dumb release considering that it also had a 384-bit bus and 32 SPs/cluster. It wouldn't have come close to GF100, though it would most likely have come close to a future performance iteration of it. I'd say that, if they had any common sense, when they cancelled that project they moved its human resources into a GF10x performance GPU project.

Since you're asking questions, I hope some can now understand the reason for the intentionally false information in supposed roadmaps. They just "named" the D12U something like GTX280 1.5GB.
30-Sep-2009, 11:29 #2715
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070

Isn't there supposed to be 32 KB of shared memory per block in DX11?
30-Sep-2009, 11:35 #2716
DegustatoR
Senior Member
 
Join Date: Mar 2002
Location: msk.ru/spb.ru
Posts: 1,311

Quote:
Originally Posted by rpg.314 View Post
Isn't there supposed to be 32 kb shared mem per block in dx11?
48>32?
Ah, I see, it's Theo again.
He's talking about L1 cache there. Considering there is 1 MB of memory total, 16 KB of L1 per SM and 16 SMs (512/32=16), how do you get to 1 MB from 16x16 KB?
30-Sep-2009, 11:36 #2717
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

Quote:
Originally Posted by 3dilettante View Post
Maybe it could happen, though like Charlie I would question whether it would be wise to try to out-Larrabee Larrabee.
Looks like that's exactly what they're trying to do. Strange that there's no mention of any graphics-specific bits so far. Not saying there aren't any, but the focus seems to have veered sharply away from graphics.

Quote:
A clean-sheet design that would basically abandon a huge chunk of the G80-GT200 framework would take time and resources to bring about. Given the time cycles for something like that, the roughly four years since the completion of G80 (assuming GT200's somewhat underwhelming improvements meant it was a secondary effort) would be a frighteningly tight timeline to architect a general purpose VLSI architecture.
That's true, but the same could be said for G71->G80, which was an even bigger change. Then again, they are trying to do more now, which could have put a strain on resources.

Quote:
A big problem I see, as was noted in the discussion concerning the latency of Nvidia's atomic ops, was how the read-write-read process for GPUs with their read-only caches was so very long. As far as general computation is concerned, the rearchitecting of how caches interact would be something Nvidia would be interested in looking at...
It's probably safe to assume that if they're serious about computing, the performance of atomics would have been high on their to-do list. Side question: are the existing caches on GPUs generally useful for non-texture data (not referring to the specialized caches like the PTVC)?
30-Sep-2009, 11:37 #2718
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

Quote:
Originally Posted by DegustatoR View Post
48>32?
Heh, where did you see 48? Theo didn't mention it.

Ah, I see what you did thar! 1024/16-16=48
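The arithmetic behind the last few posts, as a quick sketch. The inputs are the rumored figures from the BSN list (512 cores, 32 per SM, 1 MB total) plus DegustatoR's assumed 16 KB of L1 per SM. None of them are confirmed, and Rys disputes the 16 KB figure.

```python
TOTAL_KB = 1024            # rumored "1MB L1" from the BSN list
CORES = 512                # rumored CUDA core count
CORES_PER_SM = 32          # rumored cores per SM
ASSUMED_L1_PER_SM_KB = 16  # DegustatoR's reading of "16KB Cache"

sms = CORES // CORES_PER_SM                    # 512 / 32 = 16 SMs
l1_total_kb = sms * ASSUMED_L1_PER_SM_KB       # 16 x 16 KB = 256 KB, well short of 1 MB
per_sm_kb = TOTAL_KB // sms                    # 1024 / 16 = 64 KB per SM
shared_kb = per_sm_kb - ASSUMED_L1_PER_SM_KB   # 64 - 16 = 48 KB left over, more than 32 KB

print(sms, l1_total_kb, per_sm_kb, shared_kb)  # 16 256 64 48
```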
30-Sep-2009, 11:53 #2719
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675

Quote:
Originally Posted by DegustatoR View Post
Considering there is 1 MB of memory total and 16 KB L1 per SM and 16 SMs (512/32=16) how do you get to 1 MB from 16x16KB?
There isn't 16KB of L1 per SM.
30-Sep-2009, 11:57 #2720
DegustatoR
Senior Member
 
Join Date: Mar 2002
Location: msk.ru/spb.ru
Posts: 1,311

Quote:
Originally Posted by Rys View Post
There isn't 16KB of L1 per SM.
There might be :-)
30-Sep-2009, 12:00 #2721
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,819

Or is it 32 KB per 16-wide SM (two per cluster), for a grand total of 512 SPs in 16 clusters and a 1024 KB array?!
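fellix's alternative layout also adds up to the rumored totals. A sketch with his guessed numbers (16 clusters of two 16-wide SMs, 32 KB each), which are speculation, not a known configuration:

```python
CLUSTERS = 16
SMS_PER_CLUSTER = 2   # fellix's guess: two 16-wide SMs per cluster
SM_WIDTH = 16
KB_PER_SM = 32

fellix_sms = CLUSTERS * SMS_PER_CLUSTER   # 32 SMs
fellix_sps = fellix_sms * SM_WIDTH        # 32 x 16 = 512 SPs
fellix_kb = fellix_sms * KB_PER_SM        # 32 x 32 KB = 1024 KB

print(fellix_sms, fellix_sps, fellix_kb)  # 32 512 1024
```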
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
30-Sep-2009, 12:02 #2722
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

Quote:
As you can read for yourself, the GT300 packs three billion transistors of silicon real estate, packing 16 Streaming Multiprocessor [new name for former Shader Cluster] in a single chip. Each of these sixteen multiprocessors packs 32 cores
What's he talking about? An SM previously referred to each independent 8-wide SIMD, of which there are 30 in GT200. He's describing each SM as 32-wide, with only 16 of them per chip. Is it safe to assume he's mucking up the terminology, or has the architecture really changed, or is he making up the whole thing?
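The comparison in numbers: a sketch multiplying SM count by SIMD width, using the GT200 figures from the post and Theo's rumored 16 x 32 layout (unconfirmed at the time of the post).

```python
def total_lanes(num_sms: int, simd_width: int) -> int:
    """Total ALU lanes = number of SMs x SIMD width of each SM."""
    return num_sms * simd_width


gt200_lanes = total_lanes(30, 8)      # 30 SMs x 8-wide = 240
rumored_lanes = total_lanes(16, 32)   # 16 SMs x 32-wide = 512
print(gt200_lanes, rumored_lanes, f"{rumored_lanes / gt200_lanes:.2f}")  # 240 512 2.13
```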
30-Sep-2009, 12:07 #2723
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675

It really has changed. I can't say (well, I could) whether Theo's right or not, but GF100 is not terribly GT200-like in places. All will be revealed later today anyway; not long to go now.
30-Sep-2009, 12:10 #2724
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

http://www.fudzilla.com/content/view/15741/1/

Yep, so Fuad says as well. JHH will give us the business during his keynote at 1pm EST. Delays aside, it's good to know we'll have something new to dissect over the next few months.
30-Sep-2009, 12:22 #2725
Vincent
Member
 
Join Date: May 2007
Location: London
Posts: 235

Quote:
Originally Posted by trinibwoy View Post
http://www.fudzilla.com/content/view/15741/1/

Yep, so Fuad says as well. JHH will give us the business during his keynote at 1pm EST. Delays aside, it's good to know we'll have something new to dissect over the next few months

My present next year

Tags
nvidia, speculation

All times are GMT +1.

