Coherent access may not need CPU or GPU power boost at all. It's a communication thing.
E.g. one may batch work on the same CPU to reduce external "sharing".
And 30 GB/s of coherent access between the GPU and CPU is not cache related. Durango's performance drops to 15 GB/s when it comes to actually hitting the CPU caches.
I read about this in the leak and I'm not sure what to make of it... isn't a cache hit better than a miss? Why would this incur a performance drop?
The Onion bus has over 10 GB/s of bandwidth, and there's no restriction that traffic over it fall within a 2 MB window.
Wouldn't you pin any portion of the cpu cache that needed to maintain coherency with the gpu?
If you want bandwidth, the page table entry would be assigned the necessary attributes for Garlic.
Instead of flushing that data to a cacheable portion of the system memory, could you flush the data using the write-combining buffer to uncached system memory, which the gpu could access over Garlic?
If you meant taking data going over Onion and putting it in the non-coherent write combining buffers, that would then require just throwing away the APU when it comes out of the factory.
I'm afraid I'm not following you on this.
No, I meant data that is being flushed from the cache to main memory. Under normal circumstances wouldn't servicing cache misses be slower?
Not quite. The arrow goes over the cores and L2, which makes sense since Llano's L1s are exclusive and would need to be snooped as well.
In the AMD zero copy presentation, Llano's GPU reads and writes to cacheable memory were limited to 4.5-5.5 GB/s. AMD presented a data path figure for GPU accesses to cacheable memory which went over the UNB to L2, back to the UNB, then to the system memory.
Assuming we're talking about GPU accesses, that's Garlic.
While access to uncacheable memory was 6-12 GB/s, and some midrange Llanos' zero copy transfer rates hit 15 GB/s.
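To put those figures in perspective, here is a rough back-of-the-envelope sketch. It is only an illustration: the 64 MiB buffer size is an arbitrary assumption, the rates are simply the numbers quoted in this thread (taking 1 GB as 10^9 bytes), and real transfers would not sustain a flat rate.

```python
def transfer_ms(num_bytes: int, gb_per_s: float) -> float:
    """Milliseconds to move num_bytes at a sustained rate (1 GB = 1e9 bytes)."""
    return num_bytes / (gb_per_s * 1e9) * 1e3


# Hypothetical buffer size, chosen only for illustration.
BUFFER_BYTES = 64 * 1024 * 1024

# Rates as quoted in the thread, in GB/s.
RATES_GB_S = {
    "Llano GPU, cacheable memory (low end)": 4.5,
    "Llano GPU, cacheable memory (high end)": 5.5,
    "Llano GPU, uncacheable memory (low end)": 6.0,
    "Llano GPU, uncacheable memory (high end)": 12.0,
    "Llano zero copy, best midrange figure": 15.0,
}

for label, rate in RATES_GB_S.items():
    print(f"{label}: {transfer_ms(BUFFER_BYTES, rate):.2f} ms")
```

At these rates the gap between the cacheable and uncacheable paths amounts to several milliseconds per 64 MiB moved, which is a sizeable slice of a 16.7 ms frame.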
The CPU has no idea what data the GPU might want, and in the interests of latency CPU cache writeback data isn't going to hang around on chip for very long.
I was wondering if it was possible that coherent data that initially resided in the cpu cache but got flushed to main memory could be moved in a way that allows the gpu faster access to that data.
AFAIK, that type of access to the CPU requires the use of the Onion+ bus.
Could we be looking at the PS4 SoC & not even know it?
“We are looking at an architecture where the bulk of processing will still sit on the main board, with CPU and graphics added to by more digital signal processing and some configurable logic.”
Aren't we fairly confident that the systems are just AMD Fusion based?
Would anyone be surprised by that?
Fixed Function Accelerator = Vector Co-processor
Huh? Wouldn't that be quite the shocker? Why hasn't Sony spoken about this?
Seems like the newest incarnation of secret sauce.
When your enemy is down, that's when you deal the killing blow. You don't give them a chance to recover. Sony took the performance advantage line and ran with it. MS has dealt a couple of come-back blows. If Sony have something to retaliate with, it behoves them to use that. There's something to be gained from telling the world that your hardware has a capable DSP that'll add to the experience. There's nothing to be gained from withholding that info.
They could always be keeping their full deck hidden. After all, they're under no compulsion at all to try one-upmanship with MS over hardware. They've already got that in the bag.
The only thing I can relate it to is the zlib block.