Question about cache size, and its benefits for 3D rendering
How is it OT? Discussing the kinds of PC components that would be equivalent to the One, their pricing, and the technical comparison to the One seems on topic for a technical hardware investigation thread.
I'll stop now, but I figured this was the best thread to put it on, since none of the others seemed to fit.
Well, it is not that pointing to that article, as you did, was OT, but the follow-up had to steer away from the original topic and investigate current and upcoming PCs, which I did, and others too.
Anyway, Shifty vouched for a new thread, so the discussion can go on.
--------------------------------------------------------------------------------------
Anyway, I agree with you that putting together, from off-the-shelf parts, a PC that is a match for the PS4 and/or Xbox One (damn, I missed the "ps360", laziness...) is doable, at least in the performance ballpark. Form factor, power consumption, etc. are out of reach... for now.
I will elaborate further on my previous post before I ask the resident tech heads here a couple of questions.
For example, Kaveri could be an interesting case if, as some rumors hinted, AMD uses GDDR5.
Everybody mostly knows what Kaveri consists of, but for the sake of the conversation:
2 modules / 4 cores, 8 CUs, 8 ROPs and a 128-bit bus.
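To put that 128-bit bus in perspective, here is a back-of-envelope comparison of peak bandwidth with typical DDR3 versus hypothetical GDDR5; the 5.5 Gbps GDDR5 data rate is my own assumption for illustration, not a confirmed spec for any Kaveri SKU:

```python
# Peak bandwidth on a 128-bit bus: bytes per transfer * transfers per second.
# DDR3-2133 (dual channel) vs an assumed 5.5 Gbps GDDR5 configuration.

def peak_bandwidth_gbs(bus_width_bits, data_rate_gtps):
    """Peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_gtps

ddr3 = peak_bandwidth_gbs(128, 2.133)  # dual-channel DDR3-2133
gddr5 = peak_bandwidth_gbs(128, 5.5)   # assumed GDDR5 at 5.5 Gbps effective

print(f"DDR3-2133: {ddr3:.1f} GB/s")   # ~34.1 GB/s
print(f"GDDR5:     {gddr5:.1f} GB/s")  # 88.0 GB/s
```

So on the same narrow bus, GDDR5 would roughly two-and-a-half-fold the bandwidth available to the iGPU, which is the whole point of the rumor.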
With GDDR5 that thing could be a "little beast" that would make quite a few budget gamers and people in emerging markets happy.
Not up there with the upcoming generation of consoles (it should not support more than 4GB of GDDR5), but head and shoulders above the ps360 (from processing power to the amount of RAM), by such a margin that it could almost qualify as in the ballpark of next-gen performance. Definitely not "last-geny", even if not completely next-geny; in any case, way closer to the latter than the former.
Kaveri should ship at mostly the same time as the next generation of consoles, and (sadly for AMD) I expect it to be a quite affordable chip; GDDR5 (if used) should spice up the price, but it should definitely offer incredible value (as a PC part).
For the ref, some estimates put the die size of Kaveri at ~220mm^2, so it is slightly smaller than Trinity (the TSMC process is really dense).
Whereas Kaveri is yet to be released, Intel just released Haswell and, more relevant to my rant, Crystalwell.
Crystalwell seems to hold on to its promise: Intel claims that to match their set-up it would take a 100-130GB/s GDDR5 interface.
If the reviews I read are any hint, putting aside what could be lacking in current Intel GPUs, they may not be overdoing it; at least the bottom of that range sounds solid (and that is quite impressive already, imho).
Courtesy of Anandtech:
There’s only a single size of eDRAM offered this generation: 128MB. Since it’s a cache and not a buffer (and a giant one at that), Intel found that hit rate rarely dropped below 95%. It turns out that for current workloads, Intel didn’t see much benefit beyond a 32MB eDRAM
I find that data quite interesting, and I actually wish Intel had released info about how cache hit rates were affected by going even smaller than 32MB.
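The reason that 95% hit rate figure matters so much for the bandwidth question can be sketched in one line of arithmetic: only the misses have to be served by main memory. The 100 GB/s demand figure below is an arbitrary assumption for illustration, and this ignores write traffic and other real-world effects:

```python
# Rough illustration: DRAM bandwidth actually needed once a cache
# absorbs the hits. Only cache misses generate main-memory traffic.

def dram_traffic_gbs(total_demand_gbs, hit_rate):
    """Memory traffic that falls through to DRAM, given a cache hit rate."""
    return total_demand_gbs * (1.0 - hit_rate)

demand = 100.0  # GB/s the GPU would like to consume (assumed figure)
for hit_rate in (0.80, 0.90, 0.95):
    needed = dram_traffic_gbs(demand, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {needed:.1f} GB/s from DRAM")
```

At a 95% hit rate, a 100 GB/s appetite shrinks to ~5 GB/s of actual DRAM traffic, which is why Intel can credibly compare Crystalwell to a 100-130GB/s GDDR5 interface while sitting on a modest DDR3 set-up.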
Now on to the 22nm process and AMD's next APUs.
Estimates have Kaveri at ~220mm^2 on TSMC's 28nm process. Assuming perfect scaling (good enough as a ballpark figure), AMD could fit within roughly the same silicon footprint: 4 modules / 8 cores, 16 CUs, 16 ROPs.
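The ballpark arithmetic behind that claim looks like this; note that real designs never shrink perfectly (I/O and analog blocks barely scale), so this is strictly a rough estimate:

```python
# Naive area scaling: a 220 mm^2 die at 28 nm, shrunk linearly to 22 nm,
# then doubled up (2x modules, 2x CUs/ROPs). Ideal scaling only.

kaveri_mm2 = 220.0
shrink = (22.0 / 28.0) ** 2           # ideal area scaling factor, ~0.617
kaveri_at_22nm = kaveri_mm2 * shrink  # ~136 mm^2
doubled_units = 2 * kaveri_at_22nm    # ~272 mm^2

print(f"shrink factor:   {shrink:.3f}")
print(f"Kaveri at 22nm:  {kaveri_at_22nm:.0f} mm^2")
print(f"doubled-up chip: {doubled_units:.0f} mm^2")
```

So a doubled-up design lands around ~272mm^2, about 23% bigger than the 28nm die; same rough footprint class, which is all the "ballpark" claim needs.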
Whereas such a chip "from a distance" looks pretty close to, say, the PS4, it should fall quite short at rendering (at best, using GDDR5, half the bandwidth and half the ROPs). The point is that AMD is unlikely to produce such a chip anyway.
Actually, I don't expect AMD to go further than 4 cores (/2 modules). For the iGPU, I don't expect AMD to stuff 1024 Stream Processors (16 CUs) in their next APU either:
1) it could hurt some of their discrete GPU sales.
2) they would not be able to feed it properly (/bandwidth constraint).
3) AMD said they were to leverage gaming; they have discrete parts for higher-end set-ups, so the APU just has to be in the ballpark of the upcoming generation of consoles (/lower end).
Overall, I would not be too surprised if AMD went for a configuration really close to Durango's, aka 768 SPs. It sounds "right": it would provide a neat improvement over Kaveri and allow AMD to clock the iGPU low in mobile parts, saving quite some power.
I read a few guesses/rumors about AMD making changes to its CUs, and I thought it could make sense (looking at what Nvidia did): I could see AMD making some improvements in compute density (even normalized to the process used) by lowering their TEX/ALU ratio. It seems to me that AMD has not radically changed its good old SIMD moving from Cayman/VLIW4 to GCN: a Compute Unit / SIMD array is still comprised of 4 SIMD units, each comprised of 16 Stream Processors (I hope I got their naming policies right).
It would not surprise me too much if AMD were to beef up the front end of their SIMD/CU and go with 6 groups of 16 Stream Processors instead of 4 (8 sounds like pushing it without increasing the texturing power). I would hope that would further increase their compute density (maybe not by much, but, like going from VLIW4 to GCN, enough to cover the cost of other improvements; it seems they have not touched their ROPs much in a while, for example) while achieving perfect, or close to perfect, scaling.
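The arithmetic behind that speculative "wider CU" (keep the 4 TMUs per CU, grow from 4 to 6 SIMDs of 16 lanes) works out neatly; to be clear, these configurations are my reading of the speculation above, not anything AMD has announced:

```python
# SP count and ALU:TEX ratio for a standard GCN CU vs the speculated
# wider CU (6 SIMDs of 16 lanes instead of 4, same 4 TMUs).

def cu_stats(simds_per_cu, lanes=16, tmus=4):
    """Return (SPs per CU, ALU:TEX ratio) for a given CU configuration."""
    sps = simds_per_cu * lanes
    return sps, sps / tmus

gcn_sps, gcn_ratio = cu_stats(4)    # current GCN CU
wide_sps, wide_ratio = cu_stats(6)  # speculative wider CU

print(f"GCN CU:  {gcn_sps} SPs, ALU:TEX = {gcn_ratio:.0f}")   # 64 SPs, 16
print(f"wide CU: {wide_sps} SPs, ALU:TEX = {wide_ratio:.0f}") # 96 SPs, 24
print(f"8 wide CUs -> {8 * wide_sps} SPs")                    # 768 SPs
```

Interestingly, 8 of these wider CUs would land exactly on the 768 SP, Durango-class count mentioned above, while needing fewer CUs (and so less front-end and texturing hardware) than a 12-CU GCN layout.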
So, long story short, a 4-core, 8 "new/wider" Compute Unit follow-up to Kaveri could end up way below Kaveri as far as die size is concerned, which leads me (finally) to my question for our dear resident tech heads.
It seems that an Intel-like solution (à la Crystalwell) is out of reach (for multiple reasons), though AMD just put together a chip (Durango) that includes 32MB of eSRAM (not a cache, though). It got me wondering about the usefulness of "big" caches for rendering. I would not expect AMD to be able to fit 32MB of L3 on die in its upcoming APUs, but I wonder about the benefits of a lesser amount of cache, say 16MB.
I can't wrap my head around it, but I think there is data available that could allow some people here to make guesstimates about the benefits.
For example, that type of data, courtesy of the TechReport. There are also interesting posts in that thread, especially sebbbi's (about the amount of texture data accessed per frame, the size of shadow maps, etc., and how that could fit in a high-end Intel CPU's L3 cache).
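To give a feel for the sizes involved, here is a rough per-frame working-set estimate against a 16-32MB cache. The formats and resolutions are my own illustrative assumptions, not the figures from the linked thread:

```python
# Illustrative render-target working-set sizes at 1080p, to put a
# 16-32 MB on-die cache in perspective. Assumed formats, not measured data.

MiB = 1024 ** 2

color_1080p = 1920 * 1080 * 4  # RGBA8 color buffer, 4 bytes/pixel
depth_1080p = 1920 * 1080 * 4  # D24S8 depth/stencil, 4 bytes/pixel
shadow_2048 = 2048 * 2048 * 4  # one 2048^2 shadow map, 4 bytes/texel

total = color_1080p + depth_1080p + shadow_2048
print(f"color:  {color_1080p / MiB:.1f} MiB")  # ~7.9 MiB
print(f"depth:  {depth_1080p / MiB:.1f} MiB")  # ~7.9 MiB
print(f"shadow: {shadow_2048 / MiB:.1f} MiB")  # 16.0 MiB
print(f"total:  {total / MiB:.1f} MiB")        # ~31.8 MiB
```

Under those assumptions, the main render targets alone already fill roughly 32MB, and a 16MB cache could hold the color+depth buffers but not the shadow map on top; texture reads per frame come in addition, so the hit rate would hinge on what the cache manages to keep resident.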
I can't wrap my head around it myself, not even close, but as silicon budgets are growing and exotic memory solutions are yet to come, I wonder to what extent the "big caches" that used to be reserved to server-type CPUs could put a dent in the "bandwidth wall".
PS: I think this is relevant to the thread as it relates to how good (and competitive with the upcoming consoles) a pretty affordable piece of kit (a 200-250mm^2 piece of silicon with a dual-channel memory set-up using DDR3/4) could get in the really near future.