Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Partially.
TSMC's 7nm+ is 7nm that uses EUV on 4 layers.
Then there's 6nm that uses EUV on more layers (amount undisclosed).

Theoretically, the more layers they pattern with EUV instead of multi-patterning, the higher the yields they'll get.
The first 9th-gen console SoCs will most probably use 7nm+, which might already be in mass production (the first tape-outs were back in October 2018).
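The yield intuition above can be sketched with a standard Poisson defect model; the die area and defect densities below are illustrative assumptions, not TSMC figures:

```python
# Illustrative Poisson die-yield model: yield = exp(-area * defect_density).
# The defect densities are assumptions for illustration only.
import math

def die_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of good dies under a simple Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

# A ~300 mm^2 console SoC is 3.0 cm^2. Fewer multi-patterning steps
# should mean fewer defects (assumed trend, not measured data).
for d0 in (0.5, 0.3, 0.2):
    print(f"D0={d0}/cm^2 -> yield {die_yield(3.0, d0):.1%}")
```

The point is only the shape of the curve: cutting defect density from 0.5 to 0.2/cm² roughly doubles the yield of a large die.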
Zen 2 was taped out on TSMC 7nm, so that’s likely what next gen consoles will use. 7nm+ hasn’t seen a huge HPC uptake.
 
Are you saying that because using 7nm+ would add cost compared to 7nm, since the chip would have to be redesigned for the new node?
 
Yes. They’re not compatible. Supposedly 6nm is compatible with 7nm though, oddly enough. I think this is because TSMC knows 7nm customers want a path to EUV and it was an oversight to not make 7nm+ that path. I think the density gains are almost identical between 7nm+ and 6nm.

Zen 3 is expressly 7nm+ per roadmap, but all HPC besides that I’ve read is either 7nm or 5nm.
 
Yes



I've been wondering whether 8 GB of HBM2 would really be viable, though. How many GDDR6 chips would it take to be equivalent?
It might be viable for the reasons he listed:
  • Samsung, Micron and SK Hynix are already shifting part of their capacity towards HBM due to falling NAND prices
  • HBM is expected to scale down in price a lot more than GDDR6 over the console's lifetime
  • Sony got an "amazing" deal for HBM in part by buying up other customers' bad bins, chips which can't clock higher than 1.6 Gbps but still run at 1.2
  • Sony will be one of the first high-volume customers of TSMC's InFO_MS when mass production starts later this year (regular InFO is already used by Apple in the iPhone)
  • InFO_MS brings down cost compared to traditional silicon interposers, and has thermal and performance advantages as well
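For the GDDR6-equivalence question, a back-of-envelope comparison; the pin speeds are assumed (2.0 Gbps/pin HBM2 and 14 Gbps/pin GDDR6 were typical 2019 figures):

```python
# Back-of-envelope memory bandwidth: bus width (bits) * pin speed (Gbps) / 8.
def bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits * gbps_per_pin / 8  # Gbit/s -> GB/s

hbm2_stack = bandwidth_gbs(1024, 2.0)   # one 1024-bit HBM2 stack
gddr6_chip = bandwidth_gbs(32, 14.0)    # one x32 GDDR6 device

print(f"HBM2 stack: {hbm2_stack:.0f} GB/s")   # 256 GB/s
print(f"GDDR6 chip: {gddr6_chip:.0f} GB/s")   # 56 GB/s
print(f"GDDR6 chips to match one stack: {hbm2_stack / gddr6_chip:.1f}")
```

So roughly five x32 GDDR6 devices match one HBM2 stack's bandwidth under these assumptions, at the cost of a much wider PCB routing job instead of an interposer.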
 
I've seen the violet table... really interesting. I think HBM2 with a 3072-bit bus will be used, with 12 GB of the slowest memory type (615 GB/s; no more is really needed if the PS5 is around 10 TF, as I think it is)... plus a LOT of slow DDR4, maybe 32 GB, used mostly as an HDD buffer (though not only: the OS would mostly live there too). I don't believe in an SSD; I think it will have a big HDD (maybe 3 TB). So a double memory controller: the one facing the DDR4 would act as a transparent, intelligent bridge towards the HDD as well. The game environment would see just the HDD and not the DDR4; the OS would see both the HBM2 and the DDR4. This would also be useful against piracy.
 
Sony's fiscal year is funky, off by a quarter compared to the calendar year.

I've worked in a lot of large multinationals and government bodies, and financial years running April to March are the only thing I've known. Preparing EOY financials over Christmas and first thing in the new year? :runaway:
 
I've seen the violet table... really interesting. I think HBM2 with a 3072-bit bus will be used, with 12 GB of the slowest memory type (615 GB/s; no more is really needed if the PS5 is around 10 TF, as I think it is)... plus a LOT of slow DDR4, maybe 32 GB, used mostly as an HDD buffer (though not only: the OS would mostly live there too). I don't believe in an SSD; I think it will have a big HDD (maybe 3 TB). So a double memory controller: the one facing the DDR4 would act as a transparent, intelligent bridge towards the HDD as well. The game environment would see just the HDD and not the DDR4; the OS would see both the HBM2 and the DDR4. This would also be useful against piracy.

A 3072-bit bus is only available for 12, 16 and 32 GB modules, and you would have to use two, so a minimum of 24 GB of HBM2.

And with NAND prices expected to fall to $0.08 per GB in 2019, why shouldn't it be used?
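The ~615 GB/s figure quoted above does check out for a 3072-bit bus at the 1.6 Gbps slow bin:

```python
# Bandwidth of a 3072-bit HBM2 bus (three 1024-bit stacks) at the
# 1.6 Gbps/pin "slow bin" mentioned in the post above.
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8  # Gbit/s -> GB/s

print(bandwidth_gbs(3072, 1.6))  # 614.4 GB/s, i.e. the ~615 GB/s figure
```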
 
Wouldn't it make sense for the RT hardware they develop with AMD to take the custom ID buffer work as a starting point? It's an element that could be heavily involved as a buffer for calculating intersections.
 

I don't see how that would help. What are you thinking of?
 
In a separate RT core fed from the ID buffer to make intersection calculations, avoiding register accesses.
And they could use it in PS4 BC for games using the ID buffer (which is used not only for checkerboard rendering but also for TAA).

Now an interesting question would be: how are they going to emulate FP16 RPM on PS5? Are they going to brute-force it using FP32, or will PS5 also have FP16 RPM?
 
What would make you think AMD would take such a huge step back as to remove 2xFP16?
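For reference, "rapid packed math" just means two FP16 values sharing one 32-bit register so a single instruction operates on both lanes. A minimal sketch of the packing itself (illustration only, not PS4 Pro's hardware path):

```python
import numpy as np

# Two FP16 values packed into one 32-bit word, as in 2xFP16 "RPM":
# one 32-bit register holds both lanes.
a = np.array([1.5, -2.25], dtype=np.float16)
packed = a.view(np.uint32)[0]          # reinterpret 2x16-bit as 1x32-bit

# Unpacking recovers both lanes exactly.
unpacked = np.array([packed], dtype=np.uint32).view(np.float16)
assert (unpacked == a).all()
print(hex(int(packed)))
```

Emulating this on hardware without 2xFP16 would mean issuing one FP32 operation per lane, halving the effective rate, which is why removing it for BC would be surprising.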
 
In a separate RT core fed from the ID buffer to make intersection calculations, avoiding register accesses.

How do you use the ID buffer data to intersect against geometry? Isn't the ID buffer just a flat colour buffer with different values for each poly?
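A toy sketch of that layout, assuming a plain per-pixel primitive-ID buffer; this is an illustration of the data structure only, not Sony's implementation:

```python
import numpy as np

# Toy ID buffer: one primitive ID per pixel. All values here are made up.
H, W = 4, 4
id_buffer = np.zeros((H, W), dtype=np.uint16)
id_buffer[1:3, 1:3] = 7          # pixels covered by triangle 7
id_buffer[0, :] = 2              # pixels covered by triangle 2

def primitive_at(x: int, y: int) -> int:
    """Which triangle is visible at pixel (x, y)?"""
    return int(id_buffer[y, x])

print(primitive_at(1, 1))  # 7
print(primitive_at(3, 0))  # 2
```

As the question implies, this tells you *which* primitive won the depth test at each pixel, but it contains no geometry, so it is not obvious how it would feed ray-triangle intersection on its own.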
 
If memory prices are going that low, maybe it will have 24 GB of HBM2 + 64 or even 128 GB of DDR4 as system memory + HDD buffer... it seems unbelievable now, but...
 
Sorry guys, but I can't post links. I recommend searching for Gen-Z; it might be the key for PS5 and the integration of IF2, HBCC and SSG.

The Gen-Z switch: The backbone of Gen-Z fabric


April 9, 2019

By Tim Symons, Storage Architect, Microchip

Gen-Z fabric incorporates a number of features for enabling low-latency, memory-semantic communication, and one of the key components of the Gen-Z ecosystem is the Gen-Z switch. The Consortium showcased a Gen-Z switch at Flash Memory Summit 2018 as part of our multi-vendor technology demonstration. The server rack display utilized Field-Programmable Gate Array (FPGA)-based Gen-Z bridges connecting compute nodes to memory pools through a Gen-Z switch.

The Gen-Z switch is the backbone of the Gen-Z fabric. It is essentially a component or component-integrated functionality that performs packet relay between component interfaces. The configuration of Gen-Z fabric is achieved and managed by a Fabric Manager that identifies all of the components attached to the fabric and executes policies for managing packets, creating domains, sub-domains, and access privileges by configuring the switches within the fabric. Fabric management is in-band using Gen-Z commands over Gen-Z connections.


To enable its full functionality, Gen-Z memory fabric involves switching, routing, security, zoning, access control and fabric management. The core properties of Gen-Z fabric consist of:

  • Scalable and provisioned memory infrastructure
  • Shared memory for data processing
  • Connectivity of processors, GPUs, accelerators, and optimized engines
  • Next generation DRAM, FLASH and storage class memory
  • Enabling persistent memory
These characteristics are configurable to meet application needs by managing access to memory domains, high-speed routing zones, and processing domains. Fabric switches can interconnect to create a larger infrastructure and can also be provisioned into multiple subdomains and zones designed to support variable application workloads and system requirements.

Local switches enable small-scale Gen-Z fabrics, as well as routing and provisioning of resources with minimal switching latency and almost transparent routing. Peer-to-peer operation codes within the fabric ensure that direct attached memory is among the lowest latency Gen-Z fabric configurations.

Finally, it’s important to note that data integrity is paramount. Cyclic Redundancy Check (CRC) is performed at three levels: on the packet header, within a packet and optionally for each PHIT, to ensure error detection and to eliminate the possibility of a false packet acceptance.

At the higher link rates (53.125 Gb/s and above), Forward Error Correction (FEC) is implemented to correct PHIT bit errors, ensuring data integrity and minimizing retry events to optimize bandwidth utilization.
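The layered CRC scheme described above can be sketched as follows, using `zlib.crc32` as a stand-in for Gen-Z's actual CRC polynomials (an assumption; the real spec defines its own):

```python
import zlib

# Sketch of per-header plus whole-packet CRC, as described above.
# Packet layout (illustrative): header | header CRC | payload | packet CRC.
def make_packet(header: bytes, payload: bytes) -> bytes:
    hdr_crc = zlib.crc32(header).to_bytes(4, "little")
    pkt_crc = zlib.crc32(header + payload).to_bytes(4, "little")
    return header + hdr_crc + payload + pkt_crc

def check_packet(pkt: bytes, header_len: int) -> bool:
    header = pkt[:header_len]
    hdr_crc = int.from_bytes(pkt[header_len:header_len + 4], "little")
    payload = pkt[header_len + 4:-4]
    pkt_crc = int.from_bytes(pkt[-4:], "little")
    return (zlib.crc32(header) == hdr_crc
            and zlib.crc32(header + payload) == pkt_crc)

pkt = make_packet(b"HDR0", b"payload-bytes")
assert check_packet(pkt, 4)

# Flip one payload bit: the packet-level CRC catches it.
corrupted = pkt[:9] + bytes([pkt[9] ^ 0xFF]) + pkt[10:]
assert not check_packet(corrupted, 4)
```

Checking the header separately lets a switch reject a bad header before relaying the packet, which is the point of the multi-level scheme.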

Gen-Z is truly the fabric for next-generation workloads. As the amount of data and multi-processing demands continue to increase, a high-bandwidth, low-latency interconnect is now imperative to meet the industry’s needs. Gen-Z technology delivers business and technology leaders a solution for overcoming current challenges within existing computer architecture and presents open, efficient, simple and cost-effective future solution opportunities.

The Gen-Z consortium is a trade group of technology vendors involved in designing CPUs, random access memory, servers, storage, and accelerators. The goal was an open and royalty-free "memory-semantic" protocol, which is not limited by the memory controller of a CPU. The basic operations consist of simple loads and stores with the addition of modular extensions. It is intended to be used in a switched fabric or point-to-point where each device connects using a standard connector.[1]

The consortium was publicly announced on October 11, 2016.[2] Server vendor members include Cisco Systems, Cray, Dell Technologies, Hewlett Packard Enterprise, Huawei, IBM, and Lenovo. CPU vendor members include Advanced Micro Devices, ARM Holdings, Broadcom Limited, IBM, and Marvell (formerly Cavium). Memory and storage vendor members include Micron Technology, Samsung, Seagate Technology, SK Hynix, and Western Digital. Other members include IDT Corporation, Mellanox Technologies, Microsemi, Red Hat, and Xilinx.[1] Analysts noted the absence of Intel (which announced an inter-connect technology of its own called Omni-Path a year before) and Nvidia (with its own NVLink technology).[3] Some of the vendors also joined a group to promote the Cache coherent interconnect for accelerators (CCIX) protocol on the same day.[4] At about the same time, yet another consortium formed to work on an open specification for the Coherent Accelerator Processor Interface (CAPI).[5] The efforts followed years of delays before products were available with version 4.0 of PCI Express.[6]

Last one.
Customers are demanding new levels of performance, functionality, security to solve the growing challenges associated with processing and analyzing massive amounts of data in real time, while avoiding today’s system bottlenecks and security risks. After many months of investigation, the member companies determined that a new, comprehensive data-access technology was required – one that could support a wide range of new storage-class memory media, new hybrid and data-centric computing technologies, new memory-centric solution architectures, and a wide range of applications using a highly-efficient and performance-optimized solution stack.

Gen-Z is the solution. It is an open-systems interconnect designed to provide memory semantic access to data and devices via direct-attached, switched or fabric topologies. This means Gen-Z will allow any device to communicate with any other device as if it were communicating with its own local memory using simple commands. Sometimes called load/store protocol, we refer to it as a “memory-semantic communications” because it uses the same language as local memory does today. Memory-semantic communications are used to move data between buffers located on different components with minimal overhead. For example, Gen-Z-attached memory can be mapped into a processor memory management unit (MMU). Any processor load, store, or atomic operation is transparently translated into Gen-Z read, write, or atomic operation and transported to the destination memory component. Similarly, Gen-Z supports buffer "put" and "get" operations to move up to 2^32 bytes of data between buffers without any processor involvement.

This leads to much simpler software and hardware, and this simplicity drives performance and lower costs. Gen-Z will provide this memory-semantic connectivity to devices including System on a Chip (SoC), data accelerators, storage, and memory on the motherboard and beyond the motherboard to rack scale. In practice, that means that Gen-Z will deliver businesses more flexibility, performance, efficiency, and choice in the design and configuration of their core data center technology investments, all connected in a unifying industry standard interconnect.
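A toy sketch of the memory-semantic idea from the excerpt above: processor loads and stores become generic read/write commands handled by a media controller that hides what the media actually is. All class and method names here are illustrative, not from the Gen-Z spec:

```python
# Toy model of memory-semantic access: the link carries only generic
# read/write commands; the media controller hides the media type.
class MediaController:
    """Owns the actual media; sees only generic reads and writes."""
    def __init__(self, size: int):
        self.media = bytearray(size)   # could be DRAM, flash, SCM...

    def read(self, addr: int, length: int) -> bytes:
        return bytes(self.media[addr:addr + length])

    def write(self, addr: int, data: bytes) -> None:
        self.media[addr:addr + len(data)] = data

class FabricLink:
    """Translates processor load/store into fabric read/write commands."""
    def __init__(self, target: MediaController):
        self.target = target

    def load(self, addr: int, length: int) -> bytes:
        return self.target.read(addr, length)      # one generic read

    def store(self, addr: int, data: bytes) -> None:
        self.target.write(addr, data)              # one generic write

link = FabricLink(MediaController(1024))
link.store(0x40, b"hello")
print(link.load(0x40, 5))  # b'hello'
```

The processor side never needs to know the media type, which is the "media-agnostic" property the AnandTech excerpt further down describes.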
 
I think Gen-Z seems like a poor fit for the context of a standalone console, especially if AMD's existing technologies are in use.
Gen-Z seems more applicable to things beyond the reach of the infinity fabric, and between devices with storage in the same rack or between different points in a data center. Supporting a broad range of devices, dynamically allocated resources, massive aggregate bandwidth, RAS, and security for a data center is wasted on a console where the relevant components are meant for a single static context within the reach of a PCIe or infinity fabric link (if not on the same die).
There are some signs AMD could have given some capability to use Gen-Z in its server products, but its own internal interconnect methods could satisfy the needs of a consumer console APU or MCM at lower overhead and cost.


The description of the PS5 having a custom audio unit even with the TrueAudio work done by AMD's modern GPUs reminds me of Sony's presentation on audio engines using HSA back in the first year of the console. Sony's engineer found the GCN architecture available at the time to be wholly unsuited for anything but the most latency-insensitive audio effects, and the coarse batch granularity unappealing as well. He did make note that at the time he had not looked into whether there were priority or preemption options in that hardware that could help with any of them. Going from what we've seen in the years since, it doesn't seem like the console GPU architecture was made suitable. In theory, TrueAudio Next would allow for a massive improvement on the latency and consistency objections raised years ago, but if Sony has a custom audio unit it's not clear it did enough. Even if latency were acceptable, the batching and flexibility objections could still remain.

On a side note, the DSPs that Sony had for its audio unit that had some resemblance to TrueAudio were better than the GPU, but not necessarily acceptable either. They were in use for platform functions, behind a secure API that added some latency, and often lacked the grunt of the GPU or speed of the CPU.
The Jaguar CPU had the single-digit millisecond to sub-millisecond latency advantage, higher clock speeds, highest programmability, and small batch size.
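The latency figures above follow directly from buffer math (latency = buffer samples / sample rate); the buffer sizes below are illustrative, not Sony's:

```python
# Audio buffer latency: how long a buffer of N samples represents at a
# given sample rate. Smaller batches mean lower latency, which is why
# batch granularity matters for an audio engine.
def buffer_latency_ms(samples: int, sample_rate_hz: int) -> float:
    return 1000.0 * samples / sample_rate_hz

for n in (64, 256, 1024):
    print(f"{n:5d} samples @ 48 kHz -> {buffer_latency_ms(n, 48000):.2f} ms")
```

A 1024-sample batch at 48 kHz is already over 21 ms per dispatch, which illustrates why coarse GPU batch granularity was deemed unsuitable for latency-sensitive effects.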


As far as TLC or QLC storage goes this seems theoretically fine, if the platform is updated to properly handle it. The PS4's lack of TRIM support made using SSDs long term a bit iffy.
I am concerned if there's an inexpensive or home-grown TLC or QLC control implementation that doesn't have the experience or lessons learned by the PC or server drive manufacturers.
There are SSDs with surprisingly poor latency for a solid-state device, and pathological garbage collection or write bandwidth loss when drives are filled or exceed a limited range of "fast" storage. Additionally, I think I'll keep an eye out for any signs of QLC drives having performance drop-off after several months like Samsung's initial TLC drive offerings.
The inconsistencies or drops from peak usually don't drive things down to the performance of a spinning disk, although some poorly handled corner cases get uncomfortably close in my opinion. A PS5 promising to standardize on a solid-state storage system with an order of magnitude improvement in performance has a much less forgiving baseline, especially if there are APIs or hardware controllers that bake in assumptions like 10x better latency and bandwidth.
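A sketch of how one might probe for the tail-latency outliers described above: time many small random reads and look at percentiles rather than averages. The path argument is a placeholder; run it against a large pre-existing file on the drive under test:

```python
import os
import random
import time

# Time small random reads and report tail percentiles, since SSD
# pathologies show up in p99/max latency, not in the average.
def read_latencies_ms(path: str, block: int = 4096, iters: int = 1000):
    size = os.path.getsize(path)
    lat = []
    with open(path, "rb", buffering=0) as f:  # unbuffered to hit the device
        for _ in range(iters):
            f.seek(random.randrange(0, max(1, size - block)))
            t0 = time.perf_counter()
            f.read(block)
            lat.append((time.perf_counter() - t0) * 1000)
    lat.sort()
    return {"p50": lat[len(lat) // 2],
            "p99": lat[int(len(lat) * 0.99)],
            "max": lat[-1]}
```

Note this only measures the read path through the OS cache and filesystem; a fair device test would also need to drop caches or use direct I/O, which is OS-specific and omitted here.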
 
While I was reading the Wired article I noticed some nuances that led me to the idea that the SSD he is talking about is not a conventional one: not SATA-based, nor NVMe-based.
"What’s built into Sony’s next-gen console is something a little more specialized."

"but Cerny claims that it has a raw bandwidth higher than any SSD available for PCs. That’s not all. “The raw read speed is important,“ Cerny says, “but so are the details of the I/O [input-output] mechanisms and the software stack that we put on top of the"

Based on those words I would strongly suggest the possibility of custom and specialized hardware, and I would add the possibility that we are going to see direct lanes to the CPU, GPU, I/O (3D audio engine), and maybe even main memory, eliminating some of the bandwidth restrictions and latency caused by this. I sincerely don't know if Infinity Fabric can cope with those extraordinary connections, but Gen-Z can:
  • Scalable and provisioned memory infrastructure
  • Shared memory for data processing
  • Connectivity of processors, GPUs, accelerators, and optimized engines
  • Next generation DRAM, FLASH and storage class memory
  • Enabling persistent memory
"This leads to much simpler software and hardware, and this simplicity drives performance and lower costs. Gen-Z will provide this memory-semantic connectivity to devices including System on a Chip (SoC), data accelerators, storage, and memory on the motherboard and beyond the motherboard to rack scale."

The AnandTech article says:
"The Core Specification released today primarily addresses connecting processors to memory, with the goal of allowing the memory controllers in processors to be media-agnostic: the details of whether the memory is some type of DRAM (eg. DDR4, GDDR6) or a persistent memory like 3D XPoint are handled by a media controller at the memory end of a Gen-Z link, while the processor itself issues simple and generic read and write commands over the link. In this use case, Gen-Z doesn't completely remove the need for traditional on-die memory controllers or the highest-performance solutions like HBM2, but Gen-Z can enable more scalability and flexibility by allowing new memory types to be supported without altering the processor, and by providing access to more banks of memory than can be directly attached to the processor's own memory controller."

AnandTech: show/12431/genz-interconnect-core-specification-10-published
 