Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

GDDR6 will arrive in early 2018.

Seoul, April 23, 2017 – SK Hynix Inc. (or ‘the Company’, www.skhynix.com) today introduced the world’s fastest 2Znm 8Gb(Gigabit) GDDR6(Graphics DDR6) DRAM. The product operates with an I/O data rate of 16Gbps(Gigabits per second) per pin, which is the industry’s fastest. With a forthcoming high-end graphics card of 384-bit I/Os, this DRAM processes up to 768GB(Gigabytes) of graphics data per second. SK Hynix has been planning to mass produce the product for a client to release high-end graphics card by early 2018 equipped with high performance GDDR6 DRAMs.
http://www.skhynix.com/eng/pr/pressReleaseView.do?seq=2086&offset=1
 

Would this forthcoming high-end graphics card be from Nvidia or AMD?

We know Vega will use HBM2, don't we?

Is this a portent of GPU vendors abandoning HBM wholesale?
 
Hm, so with a 256-bit bus you would get 448~512 GB/s at 14~16 Gbps. I don't think you would need more bandwidth than that.

Bandwidth-wise it's fine, but RAM amount could be problematic. This still has a limit of 1GB per chip, which means the realistic limit is 16GB.
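Quick back-of-the-envelope for both points, assuming 32-bit-wide GDDR6 chips with 16 of them in clamshell mode on a 256-bit bus:

```python
# Peak bandwidth: bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte.
def peak_gbs(bus_bits, pin_gbps):
    return bus_bits * pin_gbps / 8

for rate in (14, 16):
    print(f"256-bit @ {rate} Gbps: {peak_gbs(256, rate):.0f} GB/s")
# 256-bit @ 14 Gbps: 448 GB/s
# 256-bit @ 16 Gbps: 512 GB/s

# Capacity ceiling with 8Gb (1GB) parts, 16 chips in clamshell on 256-bit:
print(f"16 x 8Gb chips: {16 * 1} GB")   # 16 GB
```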
 
The other issue is power consumption: how many watts of difference between 512 GB/s of GDDR6 vs 2 stacks of HBM2?

Couldn't find anything on HBM2 power consumption, but I did find an AnandTech article with estimated power consumption for HBM1 (on the AMD Fury X) and GDDR5X, at around 14W and 20W respectively.

Seems in general HBM is lower power, but I think it may carry a significant price premium at the moment.
 
The first one announced is 8Gb, but it's very possible 16Gb chips will come soon after.
 
Question: is 512GB/s of memory bw actually enough for a hypothetical 12-14 TFLOPS PS5?

Considering the console will likely have the same unified memory architecture, possibly a GPU with 64 ROPs** clocked in the GHz range? And that's not even considering the increased bw requirements of a Zen-based CPU.

** I'm thinking about the bw requirement for the ROPs. I have no idea if this represents peak GPU bw consumption or not. Do compute shaders use more?

Vega will have 512GB/s of memory bw with HBM2. However, Vega doesn't have to share that bw with a CPU the way consoles do.
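For scale, here's a crude bandwidth-per-TFLOPS comparison. The PS4 numbers are official; the PS5 and Vega figures are the hypothetical/rumored ones from this thread, not confirmed specs:

```python
# GB/s of memory bandwidth available per TFLOPS of compute.
systems = {
    "PS4 (1.84 TF, 176 GB/s)":             (1.84, 176),
    "Vega, rumored (12.5 TF, 512 GB/s)":   (12.5, 512),
    "Hypothetical PS5 (12 TF, 512 GB/s)":  (12.0, 512),
    "Hypothetical PS5 (14 TF, 512 GB/s)":  (14.0, 512),
}
for name, (tf, gbs) in systems.items():
    print(f"{name}: {gbs / tf:.0f} GB/s per TFLOPS")
# PS4: ~96, Vega: ~41, PS5 @ 12 TF: ~43, PS5 @ 14 TF: ~37
```

By that crude ratio, a 512 GB/s PS5 would have well under half the bandwidth per flop of the PS4, before the CPU even takes its cut, so the question is a fair one.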
 
From sebbbi's posts, compute does not use the ROPs, so its bandwidth use is independent. I believe he also said it was beneficial to break work down to a level where internal cache is used, so that reads and writes to RAM are minimal, as those use bandwidth but also waste flops.

Depending on how things change, perhaps a larger internal cache is a bigger win than huge external bandwidth.
 
A GCN ROP can write up to 8 bytes per cycle (RG32 or RGBA16). With 64 ROPs this is 8 bytes * 64 = 512 bytes / cycle (= 512 GB/s at 1 GHz). Maximum bandwidth can only be reached with 8+ byte formats; RGBA8, for example, reaches only 1/2 of maximum ROP bandwidth.

GCN compute units have an issue rate of 16 cycles for non-coalesced writes (64-wide vector). All formats have full rate (including 32 bits per channel, 4 channels), thus RGBA32 gives the best bandwidth usage: 16 bytes * 64 / 16 = 64 bytes / cycle for a single CU. Fury X has 64 CUs = 64 * 64 = 4096 bytes / cycle total (= 4 TB/s at 1 GHz). Maximum bandwidth usage can only be reached with the widest formats (32 bits per channel, 4 channels). But there's one exception: GCN is able to coalesce 1d writes that hit sequential addresses. Coalesced writes have an issue rate of 4 cycles (4x faster), thus 4x R32_UINT writes are as fast as a single RGBA32_UINT write if the memory access pattern is fully linear.
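Putting those peak numbers in one place (trivial arithmetic from the figures above, assuming the 1 GHz clock sebbbi uses):

```python
CLOCK_GHZ = 1.0

# ROPs: up to 8 bytes/cycle each with wide formats (RG32 / RGBA16).
rops = 64
rop_peak = rops * 8 * CLOCK_GHZ             # bytes/cycle * GHz = GB/s
print(f"ROP peak write: {rop_peak:.0f} GB/s")        # 512 GB/s

# CUs: a 64-wide RGBA32 (16-byte) non-coalesced write issues over 16 cycles,
# i.e. 64 lanes * 16 bytes / 16 cycles = 64 bytes/cycle per CU.
cus = 64                                     # Fury X
cu_peak = cus * (64 * 16 / 16) * CLOCK_GHZ
print(f"CU peak write:  {cu_peak / 1000:.1f} TB/s")  # ~4.1 TB/s
```

Both are theoretical issue-rate ceilings; on a real Fury X the 512 GB/s of HBM is the actual limit, which is exactly why work that stays inside the caches matters so much.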

Pixel shaders can also write directly to memory using UAVs. This way you can sidestep the ROP bottlenecks. You need to use the [earlydepthstencil] attribute on your pixel shader, as UAV writes don't have a late-Z test like ROPs do.

Compute shader memory load benchmark:
https://github.com/sebbbi/perftest

GCN loads and stores are the same speed. I don't know about other architectures. Need to add write tests at some point. It's important to note that this test application isn't a memory bandwidth tester: the whole data set fits in L1 cache. This way you can measure maximum theoretical throughput better.
 
I'm normally wrong when I suggest this ... but ... could GDDR6 be part of the roadmap for Scorpio?

6 x 16Gb GDDR6 @ 14 Gbps on three 64-bit memory channels (192-bit total) would give them a touch more BW than they have currently while using a low speed bin for GDDR6, and (very) small increases in CPU and GPU clock may be able to cover any performance hit from having less concurrency on a narrower bus.
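Quick sanity check on that config, using Scorpio's announced setup (12 GDDR5 chips @ 6.8 Gbps on a 384-bit bus) as the baseline:

```python
def peak_gbs(bus_bits, pin_gbps):
    return bus_bits * pin_gbps / 8

print(f"Scorpio today (384-bit GDDR5 @ 6.8 Gbps):   {peak_gbs(384, 6.8):.1f} GB/s")
# 326.4 GB/s, 12 chips
print(f"Hypothetical refresh (192-bit GDDR6 @ 14):  {peak_gbs(192, 14):.1f} GB/s")
# 336.0 GB/s, 6 chips; 6 x 16Gb (2GB) parts keep the same 12 GB of RAM
```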

So you halve the number of chips, shrink the board, save 30+ mm^2 of chip area from the reduced interface, shrink the chip to 10 nm, save power on that too, move to a cheaper cooler, move to a smaller PSU .... etc etc

Basically, supersede both the S and the Scorpio with a single cheaper, smaller Scorpio equivalent in 2019/20 and make Scorpio the minimum target for software. Then bring along the next "premium" beast machine a couple of years later.

Rinse and repeat.
 
I think GDDR6 will be part of the roadmap for PS5 (16 chips of 1 GByte). I don't think Ryzen will ever be used in a console... 5 billion transistors for the CPU alone is really too much... most probably Jaguar @ 2.5GHz, maybe 20 cores (so 5 clusters of 4). I think Jaguar's problems become a bit less of an issue with a very fast memory bus...

From PS4 APU pictures, a 4-core Jaguar cluster is about the size of 5 graphics CUs...
 
That Ryzen transistor count is for the highest tier, the majority of which is probably in the L3 cache. The next consoles won't use that, and they certainly won't use Jaguar. It will be some Ryzen variant that won't take up more than a 1/4 of the chip area and power consumption budget.
 

If they go AMD it will be Zen based; for one thing it's ~44 mm^2 for the 4-core CCX on 14nm. Heaps of Zeppelin's space is taken up by "uncore" functions: massive SRAM arrays outside the CCX (probably cache directories), 32 PCIe lanes, 8 PHYs for GMI, crypto accelerators, multiple 10GbE, USB controllers etc.

7nm is supposed to bring a 50% area reduction vs 14LPP, so if CCXs stay at 4 cores that's under 50 mm^2 for an 8-core Zen-based chip. If the CCXs grow to 6 cores*, then a console could get 12 cores for ~75 mm^2. IMO if PS5/XB2 are AMD based they're going to be Zen based, with a minimum of 8 cores @ 25-30 watts, which right now is around a 7700K in terms of throughput.


*There's a rumor of a 48-core Zen on 7nm, but we know AM4 is long-lived, so the easiest way that happens is a 6-core CCX.
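The area math, spelled out (the 50% shrink is a foundry marketing number, and the 6-core CCX case naively assumes area grows linearly with core count):

```python
ccx_14nm = 44.0       # 4-core Zen CCX incl. L3, mm^2 at 14LPP
shrink   = 0.5        # claimed 7nm area reduction vs 14LPP

ccx4_7nm = ccx_14nm * shrink
print(f"8 Zen cores (2x 4-core CCX) at 7nm:  ~{2 * ccx4_7nm:.0f} mm^2")   # ~44

ccx6_7nm = ccx_14nm * (6 / 4) * shrink
print(f"12 Zen cores (2x 6-core CCX) at 7nm: ~{2 * ccx6_7nm:.0f} mm^2")   # ~66
```

That linear estimate lands at ~66 mm^2, a bit under the ~75 mm^2 quoted above; the gap is plausible slack for L3 and interconnect that don't scale as well as logic.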
 
How much die space do you think would be left for the GPU if they went with an 8-core Zen?
That should give us an idea of GPU power should they pair it like this.
 
The complete Jaguar 4-core cluster was about 26 mm2 at 28 nm... that means something -I guess- like 10 mm2 at 7 nm... so 20 cores are around 10*5 = 50 mm2 @ 7nm, or 60 mm2 for 24 cores... I know Zen cores are much better, but maybe keeping compatibility is more important (with online sales taking on much more importance, it would be nice to sell PS4 titles also on PS5). Another idea is that they use a heterogeneous approach... 8 old Jaguar cores + 8 new Zen cores... I see this also on Switch.
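Putting that next to the earlier Zen CCX numbers, all at 7nm (the Jaguar shrink factor here is the ~2.6x guess above; ideal scaling over two-plus nodes would shrink it even further):

```python
jaguar_cluster_7nm = 10.0   # 4 Jaguar cores + L2, mm^2 (guess above)
zen_ccx_7nm        = 22.0   # 44 mm^2 at 14nm, halved (earlier post)

for clusters in (5, 6):
    print(f"{clusters * 4} Jaguar cores: ~{clusters * jaguar_cluster_7nm:.0f} mm^2")
print(f" 8 Zen cores (2 CCX): ~{2 * zen_ccx_7nm:.0f} mm^2")
# 20 Jaguar cores ~50 mm^2, 24 ~60 mm^2, vs ~44 mm^2 for 8 Zen cores
```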
 