Because there's a large difference between coherent read and coherent write; one does not imply the other.
Do you have proof that it's anything but?
Or do you just feel that there's more to the story?
There are two types of coherency in the Durango memory system:
Fully hardware coherent
I/O coherent
The two CPU modules are fully coherent. The term fully coherent means that the CPUs do not need to explicitly flush in order for the latest copy of modified data to be available (except when using Write Combined access).
The rest of the Durango infrastructure (the GPU and I/O devices such as Audio and the Kinect Sensor) is I/O coherent. The term I/O coherent means that those clients can access data in the CPU caches, but that their own caches cannot be probed.
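To make the two definitions above concrete, here is a toy Python sketch. It is not based on any real Durango interface: the `Cache` class, the `probeable` flag, and `coherent_read` are all hypothetical names made up for illustration. It only shows the asymmetry the article describes, namely that an I/O coherent client can see modified data sitting in the CPU caches, while nobody can look inside that client's own cache.

```python
class Cache:
    """A tiny write-back cache holding modified lines as address -> value."""

    def __init__(self, name, probeable):
        self.name = name
        self.probeable = probeable  # may other clients look inside this cache?
        self.lines = {}

    def probe(self, addr):
        """Return the cached value if this cache can be probed and holds addr."""
        if self.probeable and addr in self.lines:
            return self.lines[addr]
        return None


def coherent_read(addr, dram, other_caches):
    """Read addr, preferring a modified copy found by probing other caches."""
    for cache in other_caches:
        value = cache.probe(addr)
        if value is not None:
            return value
    return dram.get(addr)


dram = {0x100: "old", 0x200: "old"}
cpu_cache = Cache("cpu", probeable=True)    # fully coherent side
gpu_cache = Cache("gpu", probeable=False)   # I/O coherent client (assumed model)

cpu_cache.lines[0x100] = "cpu-new"  # CPU modified a line, not yet written back
gpu_cache.lines[0x200] = "gpu-new"  # GPU modified a line, not yet flushed

gpu_sees = coherent_read(0x100, dram, [cpu_cache])  # GPU probes the CPU cache
cpu_sees = coherent_read(0x200, dram, [gpu_cache])  # CPU cannot probe the GPU cache
```

In this model the GPU read picks up the CPU's modified value, while the CPU read falls through to memory and misses the GPU's unflushed data, which is why the GPU side needs an explicit flush or a cache bypass.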
Proof that coherent read doesn't equal coherent write?
Well, for one, it could be that, as I have previously mentioned, coherent read requires you to snoop the CPU's cached values in some way, whilst coherent write requires you to either flush the entire cache of the GPU or to bypass the cache entirely.
You're just saying the same thing. I acknowledge it's a package, and I didn't suggest that the eSRAM is a band-aid solution; it IS, however, there to provide the bandwidth that the system would lack once DDR3/DDR4 was chosen. If you think this is a "band-aid" solution, then you will find many engineering designs to be full of band-aids.
This is accurate AFAIK: http://www.vgleaks.com/durango-memory-system-overview/
eSRAM is a band-aid as much as GDDR5 is a band-aid for a less well-designed system. It's like saying we need a four-lane freeway everywhere. What if we could isolate the traffic so that we build a four-lane freeway where it's needed and a regular two-lane street where it isn't? It's like saying TBR is a band-aid for faster memory. It's designed to solve the same problem differently.
In my world, the term band-aid means a fix for something that is flawed, whether it's a design flaw or an implementation flaw, and it's not meant to be a permanent part of the design paradigm. eSRAM is an evolution of the 360's eDRAM, so I'm not sure how one can conclude it's a band-aid.
The parts of this specifically relating to your comments are that the CPU does not snoop the GPU cache.
It actually works by having the GPU invalidate CPU cache lines on write, and any reads from those invalidated cache lines have to wait for the data to be flushed from the GPU cache.
The GPU can snoop the CPU's cache.
Can you elaborate on why you would believe that reads can be coherent if writes aren't? Wouldn't writes being non-coherent automatically make any reads to altered locations also be non-coherent? MS couldn't claim hardware coherency if they didn't support coherent writes from the GPU.
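A toy Python sketch of the mechanism described in this post, assuming the GPU invalidates the CPU's copy of a line when it writes and the data only reaches memory when the GPU cache is flushed. `ToySystem` and its methods are hypothetical names, not anything from the vgleaks article, and the model deliberately ignores timing (whether a CPU read actually waits for the flush).

```python
class ToySystem:
    """Toy model: a CPU cache, a non-probeable GPU cache, and shared DRAM."""

    def __init__(self):
        self.dram = {}
        self.cpu_cache = {}   # addr -> value
        self.gpu_cache = {}   # addr -> value, dirty until flushed

    def cpu_read(self, addr):
        # Hit in the CPU cache if present, otherwise fill from DRAM.
        if addr not in self.cpu_cache:
            self.cpu_cache[addr] = self.dram.get(addr)
        return self.cpu_cache[addr]

    def gpu_write(self, addr, value, coherent):
        if coherent:
            # Assumed behaviour: a coherent write invalidates the CPU's copy.
            self.cpu_cache.pop(addr, None)
        self.gpu_cache[addr] = value

    def gpu_flush(self):
        # Write the whole GPU cache back to DRAM, then empty it.
        self.dram.update(self.gpu_cache)
        self.gpu_cache.clear()


soc = ToySystem()
soc.dram[0x40] = 1
soc.cpu_read(0x40)                        # CPU now caches the value 1

soc.gpu_write(0x40, 2, coherent=False)    # no invalidation sent to the CPU
soc.gpu_flush()
stale = soc.cpu_read(0x40)                # CPU still hits its old cached copy

soc.gpu_write(0x40, 3, coherent=True)     # CPU line is invalidated
soc.gpu_flush()
fresh = soc.cpu_read(0x40)                # CPU misses and refills from DRAM
```

Without the invalidation the CPU keeps reading its stale line even after the GPU has flushed, which is the sense in which non-coherent writes would make later reads of that location non-coherent too.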
You're only pointing out the advantages of one system (and ignoring some of the consequences and relevance of those advantages) while ignoring its disadvantages.
Why not list the advantages of the other design?
There is a reason why people are giving thumbs up to one console for having 8GB of GDDR5.
Again, if you view the eSRAM with the underlying assumption that its motivating purpose is to alleviate main RAM bandwidth constraints, you run the risk of concluding that it must be some exotic design that devs will struggle with.
That's what I'm talking about, that you have to invalidate the L2 cache to get coherent writes, and according to later vgleaks articles it's not single lines, it's the entire cache. Unless I'm reading this wrong.
I'd have to double check the specifics again but it was my understanding that the GPU could invalidate individual lines in the CPU cache giving coherent writes and can snoop the CPU cache for coherent reads.
The GPU has to flush its entire cache when writing coherent memory, but that's separate from how the CPU is notified of the changes.
I think there will be a lot of work being done to figure out which types of jobs benefit from coherency with less bandwidth.
Oh, we're good then; I was just talking about the GPU writing coherent memory itself. So you're saying that when the flush happens, the CPU sees the changes because lines in the CPU's cache get invalidated?
That's my understanding: the GPU invalidates CPU cache lines on write. Presumably the CPU would then stall until the GPU cache write-back is complete.
Is the GPU also effectively stalled whilst it is writing back its cache?
It doesn't look like the CPU side is aware of what the GPU's cache hierarchy is doing, so it won't be stalling for what could be a long-latency operation.
Coherent traffic over the Onion bus inserts itself into the request queue that orders CPU coherent traffic.
Until that happens, the CPU doesn't know what the GPU is doing, and the request queue is the mechanism for coherence after traffic gets to the end of the Onion bus.
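As a rough illustration of that ordering point, here is a toy Python sketch in which the CPU side only learns about a GPU write when the request reaches the front of a single ordered queue. The `run_queue` function and the request tuples are hypothetical, and treating the GPU write as landing in DRAM at the moment it is dequeued is a simplifying assumption, not something the article states.

```python
from collections import deque


def run_queue(requests, dram, cpu_cache):
    """Process requests strictly in arrival order.

    Returns the list of values observed by CPU reads.
    """
    queue = deque(requests)
    seen = []
    while queue:
        kind, addr, value = queue.popleft()
        if kind == "gpu_write":
            # Only now does the CPU side learn of the write: drop its copy.
            cpu_cache.pop(addr, None)
            dram[addr] = value
        elif kind == "cpu_read":
            if addr not in cpu_cache:
                cpu_cache[addr] = dram[addr]
            seen.append(cpu_cache[addr])
    return seen


# Same two requests, queued in opposite orders.
read_first = run_queue(
    [("cpu_read", 0x80, None), ("gpu_write", 0x80, 7)], {0x80: 0}, {}
)
write_first = run_queue(
    [("gpu_write", 0x80, 7), ("cpu_read", 0x80, None)], {0x80: 0}, {}
)
```

In this model the CPU read never waits on anything happening inside the GPU; what it observes depends only on where the GPU's request sits in the queue relative to the read.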
The CPU isn't aware. The GPU invalidates the CPU cache so if it does that and starts a write, wouldn't a CPU read have to wait for that to complete before it can begin?