Sound_Card
Regular
http://www.nordichardware.com/news,7623.html
RV770 to stay on GDDR3 instead of GDDR5 but will come out sooner?
"How do you think the 2 SIMD RV610 handles the 3 types of shaders it has to run?"
It handles them sequentially, from what I've gleaned.
"Input/output queues for the win."
Sure, that works. It's not the only solution I can think of, though, or at least not single queues.
"Which part of R600 are you thinking of here? Each part of the chip has queues."
Eventually the instruction has to leave the queue and the control signals are fed to each of the 16 elements in the SIMD.
"With time-slicing for general kernels I honestly don't see any relevance in the absolute number of SIMDs in a chip (ignoring their width)."
As long as it's time-slicing, yes.
When I posted the speculation on our own website's forums earlier today, it was dismissed as "BS" and "link bait" (you can read both here at B3D).
"Eventually the instruction has to leave the queue and the control signals are fed to each of the 16 elements in the SIMD. That connection would be analogous to a single-instruction issue port. Actually, it's pretty much the same thing as the issue ports to an x86 processor's SIMD units."
Well, for what it's worth, it might be best to think of an R600 clause as a macro op and the individual instructions as micro ops. There can be as many as 128 micro ops, each of which is a 5-component VLIW.
"As long as it's time-slicing, yes. I guess my hangup is that it's not really concurrent execution when doing that. Parallel units of any sort that are underutilized during a given time slice can't be opportunistically allocated to code waiting for its time slice."
Are you talking about the VLIW instruction or something else?
"Well, for what it's worth, it might be best to think of an R600 clause as a macro op and the individual instructions as micro ops. There can be as many as 128 micro ops, each of which is a 5-component VLIW."
AMD's CPU macro ops actually might map most closely to the individual instructions in a clause.
"Are you talking about the VLIW instruction or something else?"
Just portions of the chip in general.
Jawed
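To make the clause / micro-op analogy above a bit more concrete, here is a rough sketch of the structure being described: a clause holding up to 128 instructions, each a 5-slot VLIW bundle. The class names, the string opcodes and the `slot_utilisation` helper are all made up for illustration; this is not the actual R600 encoding, just the shape of it as described in the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_INSTRUCTIONS_PER_CLAUSE = 128  # "as many as 128 micro ops" per the thread
VLIW_SLOTS = 5                     # "5-component VLIW" per the thread


@dataclass
class VliwInstruction:
    # One entry per VLIW slot; None marks a slot the compiler could not fill.
    slots: List[Optional[str]]

    def __post_init__(self) -> None:
        if len(self.slots) != VLIW_SLOTS:
            raise ValueError(f"expected {VLIW_SLOTS} slots, got {len(self.slots)}")


@dataclass
class Clause:
    # Hypothetical container: the "macro op" in the analogy.
    instructions: List[VliwInstruction] = field(default_factory=list)

    def append(self, instr: VliwInstruction) -> None:
        if len(self.instructions) >= MAX_INSTRUCTIONS_PER_CLAUSE:
            raise ValueError("clause is full")
        self.instructions.append(instr)

    def slot_utilisation(self) -> float:
        """Fraction of VLIW slots in this clause that hold a real operation."""
        if not self.instructions:
            return 0.0
        used = sum(op is not None for i in self.instructions for op in i.slots)
        return used / (len(self.instructions) * VLIW_SLOTS)


clause = Clause()
clause.append(VliwInstruction(["mul", "add", None, None, None]))
clause.append(VliwInstruction(["mad"] * 5))
print(clause.slot_utilisation())  # 7 filled slots out of 10
```

The utilisation figure is only there to show why a 5-wide bundle is not the same thing as 5 ALUs always doing useful work.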
With 1.1GHz+ GDDR3 and a 256-bit memory interface, RV770 would have only a bit over 70GB/s of bandwidth, which would be less than what RV670 has, so IMHO this is pure BS.
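For anyone who wants to check the "bit over 70GB/s" figure: it is the usual peak-bandwidth arithmetic, bus width in bytes times the effective (double data rate) transfer rate. A minimal sketch follows; the function name is made up for illustration, the 1100MHz clock is just the rumoured number from the post above, and the 1125MHz GDDR4 clock used for the HD 3870 comparison is its reference spec rather than anything stated in this thread.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mem_clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 models double data rate signalling, which is how
    GDDR3 and GDDR4 clocks are normally quoted.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Rumoured RV770 configuration: 256-bit bus, 1.1GHz GDDR3.
rumoured_rv770 = peak_bandwidth_gb_s(256, 1100)
# Assumed HD 3870 (RV670) reference configuration: 256-bit bus, 1125MHz GDDR4.
hd3870 = peak_bandwidth_gb_s(256, 1125)

print(f"rumoured RV770: {rumoured_rv770:.1f} GB/s")
print(f"HD 3870:        {hd3870:.1f} GB/s")
```

Note that this is peak external bandwidth only; it says nothing about how efficiently either chip uses it.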
"Long time ago I heard rumors that RV770 will use 512bit memory, it may be false. Otherwise there could be 2 versions of RV770: 256bit memory with 1200MHz GDDR4 and 512bit memory with 900MHz GDDR3."
Or it's the well-known game of FUD: feed someone a snippet about some magical "internal ringbus" with 512 bits and he'll keep relaying that, not knowing that you're not counting external bits.
HD2900's ringbus is also supposed to be 1024 bits wide.
"Supposed to? It actually is, internally."
Proves my point - thanks!
['supposed to' in the sense that most of the time when you talk about memory interfaces, you're characterizing their external bit width]
That's if you're not in marketing
If you're to calculate the maximum bandwidth of a GPU, internal bandwidth means jackshit, and no, marketing departments haven't yet gone as far as calculating bandwidth based on internal bus widths.
As for RV770 my gut feeling tells me that it's more likely it'll come with GDDR4 than GDDR3, yet still such a detail is not that important to me as long as the chip gets supplied with sufficient bandwidth.
If it truly can now do 4z/clock the first thing I'd like to see is its 8x MSAA performance.
Equally interesting would be whether resolve still takes place in the ALUs or whether it has "bounced back" to dedicated hardware; in the latter case I'd really like to hear what prompted that change, since we've heard time and time again that shader resolve isn't a problem after all.
Gah, you're all nazis. Marketing doesn't have to calculate bandwidth in order to strategically place 512/1024 bits here and there and voila, it creates noise about it. I wasn't arguing that it's relevant or irrelevant... in fact, I wasn't arguing at all. And this is O/T.
I'd be surprised if resolve went back to dedicated HW.
No, because each of the 4 SIMDs in R600 is independent of the others. The 64 pixels in a batch (where 63 don't want to run some instructions) all execute over a period of 4 consecutive clocks in just one of the SIMDs. The other SIMDs can be doing VS, GS or PS work.
Jawed
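The batch timing described above can be put into numbers. A rough sketch, assuming only what the post states (16 elements per SIMD, 64 pixels per batch); the function names and the boolean "active mask" are illustrative, not how the hardware actually represents predication.

```python
SIMD_LANES = 16   # elements per SIMD, as stated in the thread
BATCH_SIZE = 64   # pixels per batch, as stated in the thread


def clocks_per_instruction(batch_size: int = BATCH_SIZE,
                           lanes: int = SIMD_LANES) -> int:
    """Clocks one SIMD needs to push a whole batch through one instruction."""
    return -(-batch_size // lanes)  # ceiling division


def lane_utilisation(active_mask: list) -> float:
    """Fraction of lane-slots doing useful work for a single instruction.

    Masked-off pixels still occupy their lane for the clock; they just
    produce no result, so the cost stays the same.
    """
    clocks = clocks_per_instruction(len(active_mask))
    return sum(active_mask) / (clocks * SIMD_LANES)


# The case from the post: 63 of the 64 pixels don't want to run the instruction.
mask = [True] + [False] * 63

print(clocks_per_instruction())   # clocks spent regardless of the mask
print(lane_utilisation(mask))     # useful fraction of those lane-slots
```

The point is that the waste is confined to the one SIMD running that batch for those 4 clocks; the other SIMDs are free to run unrelated VS, GS or PS batches in the meantime.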
I think it's possible.
If you compare 8800GTS-512 performance against HD3870 you can conclude that RV670 is wasting a lot of bandwidth. You could argue that RV670 is performing the same as if G92 had, say, 50GB/s available.
A persistent rumour is that RV770 is 50% faster than RV670. It might turn out that this is when MSAA is turned on and is due to having 4 Zs per clock instead of 2.
So, if RV770 with enhanced Z is capable of using bandwidth more effectively, then 50->70+ GB/s is much like the 50% performance gain that's rumoured.
This would then indicate that it has 16 TUs and then I wouldn't be at all surprised about 5:1 ALU:TEX. That would leave "R780" as a 800 SP, 32 TU, 32 ROP 2-chip board.
Reminds me of when I decided that there was a distinct possibility that R600 would be only 16 TUs (instead of the much rumoured 32)...
Jawed
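The arithmetic behind the 5:1 guess and the bandwidth comparison can be sketched as follows. Assumptions, none of which are confirmed specs: the ALU:TEX ratio is counted per 5-wide VLIW unit (which is the convention that makes R600's 320 SPs and 16 TUs come out as 4:1), the rumoured "R780" is two identical chips, 50GB/s is the hypothetical "effective" RV670 figure from the post, and 70.4GB/s is the rumoured 256-bit, 1.1GHz GDDR3 configuration.

```python
VLIW_WIDTH = 5  # stream processors per VLIW ALU unit (5-component VLIW)


def alu_tex_ratio(stream_processors: int, texture_units: int,
                  chips: int = 1) -> float:
    """ALU:TEX ratio per chip, counting one ALU per 5-wide VLIW unit."""
    alus_per_chip = stream_processors / chips / VLIW_WIDTH
    tus_per_chip = texture_units / chips
    return alus_per_chip / tus_per_chip


# Rumoured "R780" board: 800 SP and 32 TU across 2 chips.
r780_ratio = alu_tex_ratio(800, 32, chips=2)
# R600 for comparison: 320 SP, 16 TU on one chip.
r600_ratio = alu_tex_ratio(320, 16)

# Bandwidth side of the argument: hypothetical 50GB/s "effective" RV670
# versus the rumoured 70.4GB/s for RV770.
bandwidth_gain = 70.4 / 50 - 1

print(f"R780 ALU:TEX = {r780_ratio:.0f}:1, R600 ALU:TEX = {r600_ratio:.0f}:1")
print(f"bandwidth gain = {bandwidth_gain:.0%}")
```

So 50 to 70.4GB/s is closer to a 40% step than 50%; the "+" in "70+" and any clock bump would have to cover the rest.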