Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

To insist on a specific game is pedantic and overly constraining, as MrFox alludes.
You've got it wrong: it's not about Gears 4 per se, it's about subjecting the 580 and the One X to the same test, whether that's Gears 4 or 5 or Forza or whatever. If you test the Xbox at 4K30, then test the 580 at 4K30 too.

Locked framerate isn't an issue here; games are optimized for a fixed hardware configuration specifically to maximize performance out of it.
It is an issue. At 1080p60 the console only consumed 128W (down from 175W) even though it is pushing double the fps, so locking the fps to 60 at 1080p reduced the power draw significantly: the GPU is underutilized, capable of more fps than that, yet the need for consistency forced the developer to lock the framerate. The same thing is happening at 4K too. Don't underestimate how much V-Sync and an fps lock can reduce power draw.

Subject the 580 to the same treatment (V-Sync, fps lock) and you will have a similar pattern in power draw.

In fact, don't take my word for it: here is AMD doing the same thing with their Radeon Chill technology, capping fps to reduce power draw significantly and improve efficiency in a massive way.

[Attached: AMD Radeon Chill slides showing frame-rate caps cutting GPU power draw while maintaining responsiveness.]
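
The mechanism behind the savings is simply the render loop going idle once a frame finishes early, so the GPU spends part of every frame interval doing nothing instead of racing ahead. A minimal sketch of a frame limiter, purely illustrative (the render_frame call and its 8 ms cost are made-up stand-ins, not any real API):

```python
import time

TARGET_FPS = 60
FRAME_BUDGET = 1.0 / TARGET_FPS    # ~16.7 ms per frame

def render_frame():
    # Stand-in for the real render/game work; on hardware capable of far
    # more than 60 fps, this finishes well inside the frame budget.
    time.sleep(0.008)              # pretend the frame took ~8 ms

for _ in range(600):               # ~10 seconds at 60 fps
    start = time.perf_counter()
    render_frame()
    elapsed = time.perf_counter() - start
    # Sleep off the rest of the frame interval. During this idle time the
    # GPU clocks down, which is where the measured power savings come from.
    if elapsed < FRAME_BUDGET:
        time.sleep(FRAME_BUDGET - elapsed)
```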
 
WTF does it matter if some games are idling and lowering the average? It changes nothing about the MAX gaming power consumption, which is what we need to know for the console's design.

Please answer the original question from the first post. How many watts from the wall for the rumored 12TF console?

Pasting all these images in the thread is just noise.
 
WTF does it matter if some games are idling and lowering the average? It changes nothing about the MAX gaming power consumption, which is what we need to know for the console's design.
We are not discussing console design for power here; we are discussing whether an RX 580 consumes way more power than the entire Xbox One X console or not.
Please answer the original question from the first post. How many watts from the wall for the rumored 12TF console?
I am not interested in such hogwash. I extrapolate the numbers from existing data, so my sole focus here is the GPU side of the equation. As I said earlier, a hypothetical 64 CU GPU @ 1500 MHz will be pushing about 280W, based on the power draw of the 5700 XT. Maybe voltage tailoring will push this down a little, or maybe not.
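
For what it's worth, here is the back-of-the-envelope version of that extrapolation as a minimal sketch, assuming GPU power scales roughly linearly with CU count at comparable clocks and voltages (it glosses over clock, voltage, memory and binning differences, so treat it as a sanity check rather than a prediction):

```python
# Reference point: Navi 10 (RX 5700 XT), GPU-only power as cited in this thread.
ref_cus = 40
ref_gpu_power_w = 180            # W, GPU/ASIC power only, not total board power

# Hypothetical next-gen part from the post above.
target_cus = 64

# Naive linear scaling with CU count, everything else held equal.
scaled_power_w = ref_gpu_power_w * target_cus / ref_cus
print(f"~{scaled_power_w:.0f} W")  # ~288 W, same ballpark as the ~280 W estimate
```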
 
So long as they have the cooling figured out...does the wattage number matter?


Also would moving to NVMe and GDDR6 shave some wattage compared to last gen?
 
I am not interested in such hogwash. I extrapolate the numbers from existing data, so my sole focus here is the GPU side of the equation. As I said earlier, a hypothetical 64 CU GPU @ 1500 MHz will be pushing about 280W, based on the power draw of the 5700 XT.
Do you know the 5700 XT's GPU power is 180W?
 
The blog covers that.
Filesystem and cache organization are covered by the batch sizes and the different types of read requests.
Drivers and implementation will improve performance, but since we are talking about consoles, the expectation is that they will write the implementation for best performance.

i.e. a filesystem set up well for sequential scans (reads) is going to maximize bandwidth when running a batch size of >64, but a filesystem doing random block reads will halve the performance on the hardware used in the blog.
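
As a toy illustration of that sequential-versus-random gap (not the blog's benchmark; the file path and block size here are arbitrary stand-ins), the same total bytes read in order versus in shuffled order can land at very different throughput on the same drive:

```python
import os
import random
import time

PATH = "testfile.bin"        # hypothetical pre-created test file, ideally a few GiB
BLOCK = 256 * 1024           # arbitrary 256 KiB read size for the illustration

def measure(path, sequential=True):
    size = os.path.getsize(path)
    offsets = list(range(0, size - BLOCK, BLOCK))
    if not sequential:
        random.shuffle(offsets)               # same total bytes, random order
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    secs = time.perf_counter() - start
    return len(offsets) * BLOCK / secs / 1e6  # MB/s

print("sequential:", round(measure(PATH, True)), "MB/s")
print("random:    ", round(measure(PATH, False)), "MB/s")
```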

Also of interest is whether they reserve a portion of the drive as SLC cache, and what the drive-management firmware that does so looks like.
 
Also of interest is whether they reserve a portion of the drive as SLC cache, and what the drive-management firmware that does so looks like.
I'm not sure how useful it will be in such an optimized solution. SLC cache is useful in the PC space, but if the games and the OS are going to optimize block and batch sizes to hit maximum drive performance, perhaps they could save money by going without it.
 
I'm not sure how useful it will be in such an optimized solution. SLC cache is useful in the PC space, but if the games and the OS are going to optimize block and batch sizes to hit maximum drive performance, perhaps they could save money by going without it.
Well, that's my point. PC drives "create" SLC cache by just partitioning part of the drive as SLC. The same concept allows them to work around failed blocks and keep the drive functional. I wonder how they'll do it on console, and I'd bet there will be numerous firmware updates to tweak NAND performance. Perhaps even an FPGA in the controller path.
 
Power draw (wattage) is very game-dependent. Heck, the RX 5700 XT has been shown to hit 295W (e.g., Metro Last Light Redux) and higher in other cases.
We might start seeing a shift to base/boost clocks on consoles this time around instead of fixed clocks. It would allow them to enforce a TDP ceiling.
 
https://thessdguy.com/ssds-need-controllers-with-more-no-less-power/

I feel this is applicable to the discussion.

In the Computational Storage community there’s a difference of opinion about how much computing power belongs inside of an SSD. The Micron example used the hardware that the SSD already contained. SSDs from NGD and ScaleFlux use more powerful processors and boost the available DRAM...A completely opposite school of thought was expressed at SDC by Alibaba, CNEX, and Western Digital. These companies argued that the processing normally done in an SSD might be better managed by the host server’s CPU. There’s validity to this argument as well.

SSD housekeeping functions are performed asynchronously to the server’s application program, and this can create troublesome conflicts. An SSD that encounters too much write traffic might enter a garbage collection process that cannot be interrupted at a time when the host requires immediate access...A lot of thought has been dedicated to addressing this problem, including the Trim Command, which helps the host to manage obsolete blocks, and other controls that can be used to disallow garbage collection at specific times. Baidu did some groundbreaking work in moving control of a number of basic SSD functions outside of the drive to allow application programs to control exactly when an SSD would perform its housekeeping. This lets the application program determine the least painful time to perform such functions. This only works in systems where the hardware and software can be designed to complement each other, a luxury available to few entities other than hyperscale Internet firms.

Like the other approach, this one avails itself of the higher bandwidth within the SSD, but in a different way: The application program can be crafted to better match the number of internal channels in the SSD. SSDs typically have multiple internal flash channels (4-32, for many, but some have more or fewer) and if the number of concurrent processes in the application program is made to match the number of SSD flash channels, and if each I/O stream in the application program is assigned its own flash channel in the SSD, the performance of the entire system will be optimized.

This approach is generally known as the OpenChannel SSD, and is favored by hyperscale companies, not only because they have the ability to optimize their applications around specific hardware, but also because it reduces the cost of the SSD controller, and this is very important to companies that are deploying millions of new SSDs every year.

The Computational Storage approach also requires some change to the application software, but this will be supported by libraries supplied by the SSD manufacturers. The libraries provide calls that invoke the SSD’s help as an alternative to performing certain functions in the processor. This approach to streamlining the application is significantly simpler than dividing the application into multiple streams. While there’s no standard way to do this, leading advocates of Computational Storage have joined forces to create standards that will allow a hardware + software ecosystem to be developed for this new architecture. Even without this support, many eager buyers are already using these companies’ products and are willing to work with non-standard hardware to benefit from the competitive edge that they offer.
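
A minimal sketch of the stream-per-channel idea from that passage, assuming a hypothetical drive exposing N independent flash channels; the queue-per-channel dispatcher below is illustrative only, not a real OpenChannel API:

```python
import queue
import threading

N_CHANNELS = 8                   # hypothetical number of flash channels exposed to the host

channel_queues = [queue.Queue() for _ in range(N_CHANNELS)]

def channel_worker(chan_id, q):
    # One I/O stream per channel: requests queued here never contend with
    # work bound for the other channels, which is the whole point.
    while True:
        req = q.get()
        if req is None:          # shutdown sentinel
            break
        # (a real implementation would issue `req` against channel `chan_id` here)

workers = [threading.Thread(target=channel_worker, args=(i, q), daemon=True)
           for i, q in enumerate(channel_queues)]
for w in workers:
    w.start()

def submit(block_addr, payload=None):
    # Stripe requests across channels by block address so that concurrent
    # application streams line up with the drive's internal parallelism.
    channel_queues[block_addr % N_CHANNELS].put((block_addr, payload))
```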
 

Hardware and software, like Cerny said and like it is described in the different patents. In one of the patents they talk about garbage collection and a system of queues to decide when to do it. Since it uses the second CPU, it is maybe automated, but I would not be surprised if Sony left a manual mode where the dev decides when to do it. They give an example with a guaranteed latency of 14 ms.
 
Hardware and software, like Cerny said and like it is described in the different patents. In one of the patents they talk about garbage collection and a system of queues to decide when to do it. Since it uses the second CPU, it is maybe automated, but I would not be surprised if Sony left a manual mode where the dev decides when to do it. They give an example with a guaranteed latency of 14 ms.
14 ms seems very high. It must be faster than that, or how it's being measured is different.
As per the earlier blog, we can see latency in the microseconds and up to 10 ms for blocks as large as 2^18 on a normal SATA drive.

[Chart from the blog: read latency across block sizes.]
 
Open-Channel NVMe drives are going to be standard from many manufacturers, so we might get upgradable storage after all.
 
Gears 5 is the most stressful case for the XB1 known to date. That's still less than this single 580 example. Requesting an exhaustive dataset is disingenuous in my opinion. There is a fairly extensive technical disclosure of directed efforts on Microsoft's part to limit power consumption by aggressive voltage tailoring. That is exactly the result we see.

Descending into pedantry, moaning about the lack of matching usage conditions (across platforms where that equalization of parameters is by design difficult and open to argument), or claiming gotchas because the actual TF usage is 5% higher than the XB1X, simply distracts from what MS has demonstrated they can accomplish relative to PC use cases.

I will also note that in my power budget for the XSX, I didn't even claim the benefits of N7P over N7.

[Attached table: GPU performance and power-draw comparison.]

If anything, that table shows how badly the 580 performs for what it draws. A 1080 is much faster yet has a lower power draw.

Is the PS5's SSD in-game speed faster than the "theoretical speed" of a PC SSD?

Probably, at the time that PR statement was made (early spring 2019), yes.
 
14 ms seems very high. It must be faster than that, or how it's being measured is different.
As per the earlier blog, we can see latency in the microseconds and up to 10 ms for blocks as large as 2^18 on a normal SATA drive.


It is the guaranteed latency when you have a cleaning operation going on. For a real-time application they need to guarantee the latency; this is the worst-case latency, and it means devs can stream data knowing it will never be worse than this. It means you can guarantee data for the next frame at 30 and 60 fps, and two frames later at 120 fps. ;) This is enough.
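
The frame-budget arithmetic behind that claim, just spelled out as a quick sanity check (nothing more than the worst-case 14 ms figure against common frame times):

```python
worst_case_gc_latency_ms = 14.0

for fps in (30, 60, 120):
    frame_ms = 1000.0 / fps
    frames_needed = -(-worst_case_gc_latency_ms // frame_ms)   # ceiling division
    print(f"{fps:>3} fps: {frame_ms:.1f} ms/frame -> data ready within {int(frames_needed)} frame(s)")

#  30 fps: 33.3 ms/frame -> within 1 frame
#  60 fps: 16.7 ms/frame -> within 1 frame
# 120 fps:  8.3 ms/frame -> within 2 frames
```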
 
If anything, that table shows how badly the 580 performs for what it draws. A 1080 is much faster yet has a lower power draw.



Probably, at the time that PR statement was made (early spring 2019), yes.

Lol

https://thessdguy.com/baidu-goes-beyond-ssds/

Baidu did something with the same NAND: they improved the SSD write speed by 7 to 10 times and reduced the cost per megabyte of the SSD system by 50%.

http://ranger.uta.edu/~sjiang/pubs/papers/ouyang14-SDF.pdf

There is a research paper about it

The result was SDF, or “Software-Defined Flash”, a card that appears to be based on a Huawei PCIe SSD whose FPGAs were reprogrammed to remove several internal functions. The designers omitted the internal DRAM buffer, and the logic for garbage collection, inter-chip parity coding, and static wear leveling, while exposing all 44 internal NAND channels to the system software. With these changes SDF delivers high-throughput I/O with consistent low latency to highly concurrent applications.

The host system software was redesigned to bypass the standard Linux I/O stack, reducing a 75μs delay down to 2-4μs, and writes are now consolidated, to convert random 4KB writes into 2MB streaming erase/write operations that perform 7-10 times faster.

This approach paid off – in its intended environment Baidu’s SDF delivers three times the I/O bandwidth at a 50% lower hardware cost per-megabyte than the company’s original SSD-based system. Bandwidth improvements stem from the large writes, and the removal of SSD’s internal garbage collection and static wear leveling. Most of the cost benefit resulted from the fact that the SSD no longer needs overprovisioning once these functions have been eliminated.
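
A hedged sketch of the write-consolidation trick described above (coalescing small random writes into one large streaming write); the buffer sizes and the log-file target are made-up stand-ins for illustration, not Baidu's implementation:

```python
import io

SMALL_WRITE = 4 * 1024          # 4 KB random writes coming from the application
BIG_WRITE   = 2 * 1024 * 1024   # 2 MB streaming erase/write unit, as in the SDF paper

class WriteCoalescer:
    """Accumulate small writes in RAM and flush them as one large sequential write."""
    def __init__(self, backing_file):
        self.buf = io.BytesIO()
        self.backing = backing_file

    def write(self, data: bytes):
        self.buf.write(data)
        if self.buf.tell() >= BIG_WRITE:
            self.flush()

    def flush(self):
        payload = self.buf.getvalue()
        if payload:
            self.backing.write(payload)   # one big append instead of many 4 KB writes
            self.backing.flush()
        self.buf = io.BytesIO()

# Usage: 512 random-looking 4 KB writes become a single 2 MB streaming write.
with open("sdf_log.bin", "ab") as f:      # hypothetical log file
    wc = WriteCoalescer(f)
    for _ in range(512):
        wc.write(b"\x00" * SMALL_WRITE)
    wc.flush()
```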
 
Lol

https://thessdguy.com/baidu-goes-beyond-ssds/

Baidu did something with the same NAND: they improved the SSD write speed by 7 to 10 times and reduced the cost per megabyte of the SSD system by 50%.

http://ranger.uta.edu/~sjiang/pubs/papers/ouyang14-SDF.pdf

There is a research paper about it

That's all well and good, but we have no clue what Cerny was talking about. He clearly said it's faster than anything right now on PC, but was he referring to his laptop also? He most likely meant PCIe 3.0, as MS's console is going to have NVMe.
 
That's all well and good, but we have no clue what Cerny was talking about. He clearly said it's faster than anything right now on PC, but was he referring to his laptop also? He most likely meant PCIe 3.0, as MS's console is going to have NVMe.

I think the raw speed of the SSD will be 4 to 5 GB/s, but if, like Baidu, they can reach 95% of the peak read speed in the end, they will be able to deliver 8 to 10 GB per second of uncompressed data to memory because of the real-time decompression.
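
The arithmetic behind those figures, assuming roughly a 2:1 compression ratio on typical game data (the ratio is my assumption, not a stated spec):

```python
raw_read_gbps = (4.0, 5.0)        # assumed raw SSD read speed range, GB/s
compression_ratio = 2.0           # assumed average ratio for compressed game assets

effective = tuple(r * compression_ratio for r in raw_read_gbps)
print(f"effective uncompressed delivery: {effective[0]:.0f}-{effective[1]:.0f} GB/s")
# -> 8-10 GB/s, matching the estimate above
```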
 