#3426
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
Quote:
Quote:
__________________
What the deuce!?
#3427
Regular
The assembler instructions generated by the 4-offset gather in HLSL are not something you'd ever expect to encounter unless you knew the HLSL instruction existed and would generate them. You have to know something will occur before you can decide to design hardware to accelerate it ... so if keeping this out of the docs and refrast kept AMD from knowing about the existence of the HLSL instruction for X months, then that's a competitive advantage (this isn't about Evergreen, this is about future hardware).
#3428
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
Well color me confused. I thought jittered point sampling was an old technique, the only difference being that prior hardware would do it one sample at a time instead of two at a time in Fermi.
__________________
What the deuce!?
#3429
Regular
Jittered point sampling is an old technique, but it stopped really making sense for a while when you could gather4 at about the same cost ... it's the changed structure of Fermi texture addressing which makes it relevant again. You'd never use this on Evergreen hardware except if you were lazy or running sponsored code.
#3430
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
Quote:
If jittered point samples are useless it's curious that Nvidia is making a big deal about it (granted they're using screens from 3dmark06 to demonstrate the effect). I tried doing a quick google but came up empty. What are the new techniques for shadow mapping / SSAO that render jittered sampling irrelevant?
__________________
What the deuce!?
#3431
Regular
It's better to have 16 quads from jittered locations than 16 single texel samples ... that's basically the choice with Evergreen.
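A rough sketch of the trade-off described above, assuming a toy depth texture stored as a Python list of lists. The helper names (`jittered_locations`, `point_sample`, `gather4`, `shadow_fraction`) are hypothetical, and the texel ordering inside `gather4` is an arbitrary choice for illustration, not the component order any real API returns. It only shows that 16 quad fetches at jittered locations yield four times as many texels as 16 single-texel fetches.

```python
import random


def jittered_locations(n, width, height, seed=0):
    """Pick n pseudo-random texel coordinates that leave room for a 2x2 quad."""
    rng = random.Random(seed)
    return [(rng.randrange(width - 1), rng.randrange(height - 1)) for _ in range(n)]


def point_sample(tex, x, y):
    """One fetch, one texel."""
    return [tex[y][x]]


def gather4(tex, x, y):
    """One fetch, the whole 2x2 quad whose top-left texel is (x, y)."""
    return [tex[y][x], tex[y][x + 1], tex[y + 1][x], tex[y + 1][x + 1]]


def shadow_fraction(texels, ref_depth):
    """PCF-style estimate: fraction of fetched depths closer than ref_depth."""
    return sum(1 for d in texels if d < ref_depth) / len(texels)


# Toy 64x64 depth texture with values in [0, 1).
tex = [[((x * 7 + y * 13) % 32) / 32.0 for x in range(64)] for y in range(64)]
locs = jittered_locations(16, 64, 64)

point_texels = [t for (x, y) in locs for t in point_sample(tex, x, y)]
quad_texels = [t for (x, y) in locs for t in gather4(tex, x, y)]

print(len(point_texels), len(quad_texels))  # 16 64
print(shadow_fraction(point_texels, 0.5), shadow_fraction(quad_texels, 0.5))
```

Both variants issue 16 fetches; the quad version simply feeds more depth comparisons into the same estimate.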
#3432
Senior Member
Why would you not want to go through existing caches for this?
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts.
Work | Recreation
Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
#3433
Senior Member
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Quote:
Each domain has separate clocks, and each domain has to sync with its internals; this goes for the PolyMorph engines as well. The data is then spit out to wherever it needs to go. As long as each domain syncs up within itself there are no problems. It's just like previous architectures; that's why the domain clocks are there. Each domain works independently of the others, and data sent from domain to domain has to be synced within the domain where it is being processed at a given time. Now with the GF100 there is out-of-order processing, but I'm pretty sure it's always within the domain, *this is me guessing*.
Last edited by Razor1; 19-Jan-2010 at 01:06.
#3434
Itchy
Join Date: Feb 2002
Location: United Queendom
Posts: 2,858
#3435
Regular
Conspiracy theory! Let's make one thing crystal clear: even though I let myself get drawn into lengthy arguments on this, it is pretty far out there. If it's true we will probably never hear of it; some people at AMD might get mad at Microsoft, but it still would not be in their best interest to antagonize them in public. If it's false and AMD confirms it was only a public documentation error, I will look foolish and we can all quickly forget about it.
Quote:
The other component is the center of my conspiracy theory ... the IHVs have to put a lot of cards on the table during DirectX standardization. Their competition might not be immediately able to take that into account for their own hardware, but they will take any implicit information about the other's upcoming hardware into consideration for their next generation. If NVIDIA got instructions into HLSL but got Microsoft to keep them out of the documentation, allowing NVIDIA to simply declare "oh this is part of DirectX 11 too" at their convenience, then yes, they got a clear competitive advantage. In a bad way.
#3436
Senior Member
Join Date: Mar 2008
Posts: 4,917
Why would MS go to Nvidia after they were royally screwed by them back in the original Xbox days, and why would they leave ATI when ATI delivered a fantastic part in the Xenos that has allowed them to stay competitive with the PS3 despite launching a year earlier?
#3437
Itchy
Join Date: Feb 2002
Location: United Queendom
Posts: 2,858
@eastmen
Business is business. Sometimes you get stung, but other times you might get a good deal with no obvious caveats that is cheaper and faster than the competition. Money talks louder than words.
#3438
Senior Member
Join Date: Mar 2008
Posts: 4,917
That's what I'm saying. I pointed out two reasons why ATI would be more likely than Nvidia, and as we see, ATI's and Nvidia's performance is quite in sync with each other.
#3439
Junior Member
Join Date: May 2004
Posts: 91
Not to get too far off-topic, but it's not like Nvidia sprang something on them - they mutually came to an agreement in the design phase which benefited Nvidia quite a bit in the long run. I don't see sticking to the terms of a contract and taking your profits as an unscrupulous business practice. I seriously doubt that Microsoft would have done anything different if the roles were reversed and they were in Nvidia's position.
#3440
super willyjuice
Join Date: May 2005
Location: Astoria, NY
Posts: 986
Yes, let's try to keep the discussion on Fermi's architecture, not the internals of the next Xbox. You are always welcome to make a new thread over in the console forum if you wish to continue the discussion.
#3441
Member
Join Date: Apr 2004
Posts: 416
__________________
Vincent: G80 is designed for time to market, whereas the R600 is specialized in the rich feature.
#3442
Senior Member
I still don't get it: why is there a dedicated tessellator for each SM?
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
#3443
Beyond3d isn't defined yet
Join Date: Jan 2008
Location: New Zealand
Posts: 3,037
Quote:
But they have always been naughty like that: when Nvidia didn't play ball, they grabbed them by the balls and partially made R300 the runaway success it was. What are the chances that the actual feature doesn't exist in the DX11 specification and the feature they implemented here is an intercept which nets the same result anyway?
#3444
Iron "BEAST" Man
Join Date: Mar 2007
Location: NGC2264
Posts: 8,382
Just read the whitepaper and it sure looks innovative, with great improvements in different areas that are IMO key to pushing graphics further, both in games and professionally. Glad I waited, as a GTX380 will be the deal for me. I'm pretty sure about this.
#3445
Regular
Read the bit about the code: it exists in HLSL (but not in the docs or assembler/refrast, which is relevant because assembler/refrast are more strictly documented, since drivers are written with them).
#3446
Senior Member
Since it's apparently only using L/S (load/store), I'd assume it wouldn't go through the texture cache at all, but would rather use the (larger) L1/shared-memory pool.
__________________
English is not my native tongue. Before flaming please consider the possibility that I did not mean to say what you might have read from my posts.
Work | Recreation
Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
#3447
Senior Member
Join Date: Mar 2002
Posts: 3,779
Quote:
I think NVidia is saying that they can do the latter in one or two instructions, or equivalently they can gather 8-16 jittered point samples with the same four instructions. That wouldn't make a difference in a 4x4 sampling footprint, but it would if your samples were farther apart.
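To make the "same four instructions" point concrete, here is a minimal sketch assuming a toy single-channel texture held in a Python list of lists. `gather4_offsets` is a hypothetical stand-in for a gather that takes four independent texel offsets; the clamp addressing and the particular offset values are assumptions for illustration, not the behaviour or limits of any real hardware or API.

```python
def gather4_offsets(tex, x, y, offsets):
    """One fetch returning four single texels, one per (dx, dy) offset.

    Coordinates are clamped to the texture edge (an assumption of this sketch).
    """
    assert len(offsets) == 4
    height, width = len(tex), len(tex[0])
    samples = []
    for dx, dy in offsets:
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        samples.append(tex[sy][sx])
    return samples


# Toy 32x32 texture where each texel stores its own linear index.
WIDTH, HEIGHT = 32, 32
tex = [[y * WIDTH + x for x in range(WIDTH)] for y in range(HEIGHT)]

# Four hypothetical sets of jittered offsets, one set per gather instruction.
offset_sets = [
    [(-5, -3), (2, -6), (6, 1), (-1, 4)],
    [(-7, 2), (3, 5), (5, -4), (0, -7)],
    [(-2, -1), (7, 6), (-6, 5), (4, -2)],
    [(1, 7), (-4, -6), (6, -7), (-7, -5)],
]

center = (16, 16)
samples = []
for offsets in offset_sets:  # four gather instructions in total
    samples.extend(gather4_offsets(tex, center[0], center[1], offsets))

print(len(samples))  # 16
```

With a plain 2x2 gather the same four instructions would return 16 texels packed into four tight quads; with per-sample offsets the 16 texels can be spread across a much wider footprint.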
#3448
Regular
More samples are better even if they aren't ideally distributed, and on ATI you get the extra samples from a quad virtually for free with gather4, so you should simply try to make use of them ... the optimal algorithms for both architectures are neither here nor there though.
What NVIDIA is saying is that there is an instruction in HLSL which up to this point has remained hidden, and which, if you know it exists, you can decompile from assembler level and design hardware to run efficiently. Knowing it exists is a rather important step though; without that knowledge you simply wouldn't expect those types of assembly instructions. They make absolutely no sense on the original hardware from which gather4 came (HD3/4/5, where you will just take all the samples).
Last edited by MfA; 19-Jan-2010 at 15:33.
#3449
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
Quote:
I still don't know why that matters if you can only cull / setup / rasterize 4 triangles per clock though. Maybe there's something they're doing in the GS to discard primitives before they even get to the setup/rasterization stages.
Quote:
__________________
What the deuce!?
#3450
Regular
Quote:
ATI always samples 128 bits at a time. If you choose to throw away 96 bits, then so be it. The hardware will not bother loading the 96 bits into registers if you choose not to use them, but the memory transaction is still 128 bits.
The compiler should be able to coalesce distinct fetches when it sees they are using a common sample address, resulting in one 128-bit fetch rather than several 32-bit fetches. That's very much dependent on how the code's written though, and wouldn't apply when the developer chooses to use nicely jittered samples, which average no more than 32 bits of data per 128-bit sampling address.
Of course, looking at all the pixels together, the average fetch per 128-bit address is likely to be higher, so global memory traffic won't show such a severe disparity in effort versus results. But there's no doubt that slowing the samplers down to one 32-bit result per pixel per clock is going to make ATI slow here (actually, 1/4 of that, once ALU:fetch is taken into account).
Jawed
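A back-of-the-envelope version of the 128-bit fetch argument, assuming 32-bit texels and a fixed 128-bit transaction per sample address as stated in the post. The function names are hypothetical, and the sketch deliberately ignores caching, coalescing across pixels, and the ALU:fetch ratio mentioned above.

```python
import math

FETCH_BITS = 128  # size of one memory transaction, per the post
TEXEL_BITS = 32   # one single-channel texel (assumed format)


def useful_fraction(texels_used_per_fetch):
    """Share of each 128-bit transaction that the shader actually consumes."""
    return texels_used_per_fetch * TEXEL_BITS / FETCH_BITS


def fetches_needed(total_texels, texels_used_per_fetch):
    """Number of 128-bit transactions required to collect total_texels."""
    return math.ceil(total_texels / texels_used_per_fetch)


# Jittered point sampling: one texel kept per fetch, 96 bits discarded.
print(useful_fraction(1), fetches_needed(16, 1))  # 0.25 16

# Quad sampling (gather4-style): all four texels of the fetch are used.
print(useful_fraction(4), fetches_needed(16, 4))  # 1.0 4
```

Under these assumptions, collecting 16 texels one at a time costs four times as many transactions as collecting them as quads, which is the disparity the post describes.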
__________________
Can it play WoW?
Tags: delay, fermi, geforce, gf100