NVIDIA Fermi: Architecture discussion

But everything else he's right about, including that they EOL'd the GT200 parts, something you steadfastly denied across various forums until you couldn't do it anymore.

And I'll steadfastly continue to do so until it's proven true. I still rank common sense over Charlie's rants. Feel free to blindly follow/defend him if that floats your boat. Where do you get that he's right about the EOL story though? And are you ignoring his convenient backup story about them clearing the shelves before a price drop?
 
At any rate, GT200 parts are not available in the market today. Whether nv EOL'ed them or screwed up the ordering earlier doesn't matter for nv's Q4 results. And it is hard for an outside observer to tell exactly what happened.

But yeah, companies rarely abandon market share totally. They usually prefer to take losses instead of letting the other side win share, unless, of course, the loss is too much to take. On GT200b parts, the latter might have been true, but there is no easy way to tell.
 
Can atomics be used to implement barrier synchronization of threads that have arbitrarily diverged?
No - the problem is that even atomics are only useful for pseudo-communication when used together with barriers, which are not only memory barriers but *execution* barriers. The problem is that if you sit around in one "thread" waiting for some value to change, there's no guarantee that it *ever* will. In reality you may be just stalling a for loop whose future iteration is what will actually change the value...
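
To make the failure mode concrete, here's a minimal CUDA sketch (the kernel and the flag protocol are made up purely for illustration): two lanes of the same warp, one spinning on a flag the other is supposed to set. On SIMT hardware that serializes divergent paths within a warp, the spinning path can be replayed indefinitely while the writing path never gets a turn, so the kernel hangs.

// Illustrative only: a spin-wait between two lanes of the same warp.
// The hardware may keep executing the spinning path while the path
// that would set the flag is never scheduled, so this never terminates.
__global__ void intra_warp_wait(volatile int *flag)
{
    if (threadIdx.x == 0) {
        while (*flag == 0) { /* lane 0 waits for lane 1 */ }
    } else if (threadIdx.x == 1) {
        *flag = 1;   // lane 1 would end the wait, but may never run
    }
}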
 
At any rate, GT200 parts are not available in the market today.

Sigh, based on what?

1. Semi-accurate says they're clearing shelves before a price drop
2. Semi-accurate says they're going EOL
3. Digitimes says 55nm is in short supply from both Nvidia and ATi
4. Fudzilla says partners can't get parts
5. BSN claims partners say the parts are still in full production
6. Newegg has lots of 260s, 275s, 285s and 295s in stock

Take your pick.
 
Seriously, I refuse to fuel any fool's kindergarten crusade. You can't get me to care about GT200 availability one bit, much less in a GF100 thread. Once my finances get better next year I'll see what I get then with a complete system overhaul, but at the moment Cypress looks like the most serious real candidate.
 
Sigh, based on what?

1. Semi-accurate says they're clearing shelves before a price drop
2. Semi-accurate says they're going EOL
3. Digitimes says 55nm is in short supply from both Nvidia and ATi
4. Fudzilla says partners can't get parts
5. BSN claims partners say the parts are still in full production
6. Newegg has lots of 260s, 275s, 285s and 295s in stock

Take your pick.

7. ASUS says that there won't be GTX285's coming to northern Europe anymore, and once the current 260/275 stocks run out, there won't be those either. Most likely this applies to the 295 too. (This was asked specifically about northern Europe; it's possible or even likely that it applies to other parts of the world too.)
8. 285's haven't been available in many parts of Europe for several weeks.

Newegg and/or the US isn't the whole world.
 
Of course it isn't but just as you can use arbitrary evidence to support one side of the story you can do the same for the other. Point being that all we have so far is he said, she said BS.
 
Of course it isn't but just as you can use arbitrary evidence to support one side of the story you can do the same for the other. Point being that all we have so far is he said, she said BS.

I wouldn't call ASUS saying something "he said, she said BS".
 
No - the problem is that even atomics are only useful for pseudo-communication when used together with barriers, which are not only memory barriers but *execution* barriers. The problem is that if you sit around in one "thread" waiting for some value to change, there's no guarantee that it *ever* will. In reality you may be just stalling a for loop whose future iteration is what will actually change the value...

What happens if you yank the __syncthreads out of the code and leave something like a loop on an atomic CAS in place?
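
For concreteness, the pattern being asked about would look roughly like this in CUDA (the lock variable and kernel are hypothetical, not taken from any real code in the thread): a plain atomicCAS spinlock with no __syncthreads anywhere.

__device__ int lock = 0;               // 0 = free, 1 = held (hypothetical)

__global__ void cas_loop_no_sync(int *shared_value)
{
    while (atomicCAS(&lock, 0, 1) != 0) { /* spin until we win the lock */ }
    *shared_value += 1;                // critical section
    atomicExch(&lock, 0);              // release
}

Whether this makes forward progress is exactly the question: if the lane that won the lock and the lanes spinning on it sit in the same warp, the winner may never get scheduled to reach the release.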
 
Why is this still a point of contention? Nvidia pretty much confirmed to Anandtech that the 260-285 are EOL.
NVIDIA told me two things. One, that they have shared with some OEMs that they will no longer be making GT200b based products. That’s the GTX 260 all the way up to the GTX 285. The EOL (end of life) notices went out recently and they request that the OEMs submit their allocation requests asap otherwise they risk not getting any cards.
 
Why is this still a point of contention? Nvidia pretty much confirmed to Anandtech that the 260-285 are EOL.

Why exclude the sentence immediately following that paragraph?

The second was that despite the EOL notices, end users should be able to purchase GeForce GTX 260, 275 and 285 cards all the way up through February of next year.

What does EOL mean if products are still available in retail? Hmmmmm.

What does all this have to do with Fermi?

Nothing. Anything you'd care to share to reinvigorate that particular discussion :)
 
Nothing. Anything you'd care to share to reinvigorate that particular discussion

Nah.. not yet...

What does EOL mean if products are still available in retail? Hmmmmm.

It's just doom and gloom. Nvidia had inventory issues with some launch cards because they over-ordered. They probably didn't order as many to begin with this time, to avoid that with Fermi's coming launch.

I look at typical European stores such as overclockers.co.uk and stores like Newegg and still see plenty in stock.
 
Much of the time when using atomics on CPUs, the goal is not really to wait for something to be finished/chained, but to ensure that you do not read or write something in an invalid state. Check out Cliff Click's no-lock, no-barrier concurrent hashmap http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf. CAS is only needed so that one can atomically update pointers. There are concurrent lock-free, barrier-free versions of queues, priority queues, and trees as well.
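
As a rough illustration of "CAS is only needed so that one can atomically update pointers", here is a sketch of a lock-free stack push using CUDA's 64-bit atomicCAS (the Node type and push function are my own invention; Click's hashmap is Java and far more involved, and ABA/memory-ordering subtleties are glossed over here):

struct Node { int value; Node *next; };

// Lock-free push: no locks, no barriers, just one CAS that atomically
// swings the head pointer to the new node if nobody beat us to it.
__device__ void push(Node **head, Node *n)
{
    Node *old_head;
    do {
        old_head = *head;              // snapshot the current head
        n->next  = old_head;           // link the new node in front of it
    } while (atomicCAS((unsigned long long *)head,
                       (unsigned long long)old_head,
                       (unsigned long long)n)
             != (unsigned long long)old_head);
}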

Barriers are useful when you need the output of another thread to make progress. Problems arise when some threads finish much sooner than others and are left waiting, especially if the wait is a spin-lock. One of the traditional approaches for dealing with this that I'm aware of is barrier splitting (coarsely partitioning into multiple barriers to avoid contention on a single barrier).

The threaded programming model has lots of problems that make it error-prone and difficult; on the other hand, it offers the possibility of maximum performance with careful coding on shared-memory architectures. I'm still not convinced that presenting a 'thread' model to the developer makes sense here.
 
Poor Charlie boy is on a rampage again: Fermi is for a second tape out, more delays
Just a quick note before I go to bed: the trick is obviously that NVIDIA has the respin-ready wafers parked at TSMC. Given that, they won't need 6 weeks to get hot lots; and they've presumably got enough wafers parked for mass production/initial availability, not just for hot lots, as Charlie himself previously reported. Of course, if they don't tape out very soon, even that won't be enough to get anything out this year...
 
No - the problem is that even atomics are only useful for pseudo-communication when used together with barriers, which are not only memory barriers but *execution* barriers. The problem is that if you sit around in one "thread" waiting for some value to change, there's no guarantee that it *ever* will. In reality you may be just stalling a for loop whose future iteration is what will actually change the value...

Sorry if I'm being thick, but I don't understand. Let's outlaw the use of __sync for a moment, and just reimplement barriers by having each participating work item atomically decrement a common counter value and then spin on the counter value till it becomes <= 0. In what way does the hardware (rather than buggy software) prevent forward progress?
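
For reference, the software barrier I'm describing would look roughly like this in CUDA (the function name is made up, and the counter is assumed to be pre-initialized to the number of participating threads):

// counter must be initialized to the number of participating threads
// before the kernel launches.
__device__ void software_barrier(volatile int *counter)
{
    atomicSub((int *)counter, 1);         // announce arrival
    while (*counter > 0) { /* spin until everyone has arrived */ }
}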

I'm not on-board with the NV terminology here (inability to spawn work-items from within a work-item without CPU-level intervention, and inability to execute irreducible CFGs, are convincing arguments to me), just trying to understand this particular point.

Edit: sorry, didn't see 3dilettante has already asked the same question.
 
by having each participating work item atomically decrement a common counter value and then spin on the counter value till it becomes <= 0. In what way does the hardware (rather than buggy software) prevent forward progress?

I thought the concern was a spinning thread hogging the warp's execution? T1-T16 belong to W1. T1 completes and spins. T2-T16 need to run, but W1 never runs T2-T16; instead it keeps running T1's spin loop, waiting for it to finish.

At least, that's what I understood, otherwise he's got someone else to educate

-Dave
 