Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Discussion in 'Console Technology' started by TheAlSpark, Dec 31, 2018.

Thread Status:
Not open for further replies.
  1. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,502
    Likes Received:
    24,399
    Yes - some QLC SSDs can have QLC cells set to SLC mode, but then overall capacity takes a large hit since each cell stores only 1/4 of the bits.
     
    anexanhume and function like this.
  2. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,405
    Location:
    Wrong thread
    Ah, so that explains variable size SLC caches ... :)

    That'd give you a lot of options as to how you balance durability and capacity for a single drive. It's amazing no-one is selling 256 GB SLC SSDs then. A drive with 10+ PB of write endurance would seem to have at least some uses.

    If you were to format a 1TB QLC drive down to, say, 240GB could you force it to always act as an SLC drive with the corresponding write endurance?
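A rough back-of-envelope sketch of that trade-off. The P/E-cycle figures below are assumed ballpark values for illustration, not vendor specs:

```python
# Illustrative capacity/endurance trade-off when running QLC NAND
# entirely in pseudo-SLC mode (1 bit per cell instead of 4).
# P/E-cycle counts are assumed round numbers, not vendor specs.

def pslc_drive(qlc_capacity_gb, slc_pe_cycles=50_000):
    """Return (capacity_gb, write_endurance_pb) for a QLC drive
    run entirely in SLC mode."""
    slc_capacity_gb = qlc_capacity_gb / 4            # 1/4 of the bits per cell
    endurance_pb = slc_capacity_gb * slc_pe_cycles / 1_000_000  # GB*cycles -> PB
    return slc_capacity_gb, endurance_pb

cap, endurance = pslc_drive(1024)   # a 1TB QLC drive forced to SLC
print(f"{cap:.0f} GB usable, ~{endurance:.0f} PB of writes")
# → 256 GB usable, ~13 PB of writes
```

Which lines up with the "10+ PB write drive" figure above, assuming SLC-class endurance per cell.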
     
  3. Proelite

    Veteran Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,620
    Likes Received:
    1,106
    Location:
    Redmond
    New math for console APUs.

    Usually only ~40% of a discrete GPU is the CUs + ROPs. For the 60 CU Vega 7 on 7nm, the CUs and ROPs would be ~140mm2. The majority of the chip is taken up by memory controllers, audio processing blocks, video encode/decode hardware, the PCIe 3.0 bus interface, and a number of other low-level silicon blocks.

    Zen2 should be the same size as the 28nm Jaguars. The 28nm PS4 dedicated 88mm2 of its die to the GPU. The PS4 APU is 328mm2.
    Given a 350mm2 die, we have an extra 22mm2 for the gpu.
    88mm2 + 22mm2 = 110mm2.
    That translates to 48-50 CUs and 64 ROPs. I don't expect Navi to be a denser design than Vega 7 (in fact, I expect the opposite), so 50 CUs should be the expected maximum.

    48-50 CUs top out at 11-11.5 teraflops at 1.8 ghz.

    I expect 11-11.5 teraflops to be the absolute max in an APU design.
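The arithmetic above, spelled out. The areas and the 1.8 GHz clock are the post's assumptions; the 64 FP32 lanes and 2 ops/clock (FMA) per CU is the usual GCN/Vega figure:

```python
# Worked version of the die-budget estimate above. Areas and clocks
# are the poster's assumptions, not confirmed specs.

GPU_AREA_PS4_MM2 = 88       # GPU portion of the 28nm PS4 APU
APU_BUDGET_MM2 = 350 - 328  # extra area vs the 328mm2 PS4 APU

gpu_area = GPU_AREA_PS4_MM2 + APU_BUDGET_MM2  # area available for the GPU

def gcn_teraflops(cus, clock_ghz):
    # GCN/Vega-style CU: 64 FP32 lanes, 2 ops (FMA) per clock
    return cus * 64 * 2 * clock_ghz * 1e9 / 1e12

print(gpu_area)                           # → 110 (mm2)
print(round(gcn_teraflops(50, 1.8), 2))   # → 11.52 (teraflops)
```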
     
    iroboto likes this.
  4. sniffy

    Newcomer

    Joined:
    Nov 14, 2014
    Messages:
    55
    Likes Received:
    83
    Why would it be an APU design when AMD is already buying wafers for 7nm Zen 2 chiplets en masse? Wouldn't it be financially sensible for AMD, Sony, and probably MS to dip into that pre-existing pool of product and glue ;-) it to a custom GPU?
     
    MBTP likes this.
  5. goonergaz

    Veteran

    Joined:
    Jun 3, 2005
    Messages:
    4,494
    Likes Received:
    1,693
    The great thing with Windows 95 was when I found out about batch files... being able to stick batch files on the HDD and install from them was like going from a completely PITB manual process to fully automatic (including installing drivers etc). But yes, the old days were fun... I remember the piles of Amiga floppy disks for some games!

    Whoops - sorry for the OT post!
     
    Lightman likes this.
  6. I don't know how these contracts work, but I would guess if AMD, Sony and MS all use the same Zen 2 chiplet then it's probably easier to acquire Gigafab capacity from TSMC which would bring down the cost.

    There is something I've been thinking about recently: from what I've read, consoles usually don't use the highest-binning silicon - in fact more the opposite. MS also uses AMD Epyc in their datacenters. Could they negotiate a special deal with AMD for cheaper prices on that basis? Along the lines of:

    "Hey AMD, we will not customize the Zen 2 cores for our Scarlett family like we did with Jaguar in the previous consoles; instead we want to order standard Zen 2 chiplets. Due to the sheer volume of our Zen 2 order, this will improve your binning options immensely. We will use the lower-binned dies for our consoles and give you the higher-binned dies from our order for your Epyc and Threadripper CPUs. But in turn we get a really good deal for our console silicon (including the GPU) and for our Azure order as well."
     
  7. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Only for writes; running games is effectively read-only.

    Don't know if that is down to the software Anandtech uses for testing random performance. Storagereview.com uses VDBench, and here the Crucial P1 (QLC) goes above 300k IOPS for 4K reads.

    The consumer-oriented WD Black SN750 TB also hits 300K IOPS, but with worse latencies.


    Edit: Anandtech seems to test random performance with one queue and one thread. The sustained random performance is the maximum of one, two or four queues, but still with one thread. VDBench uses 32 queues, Storagereview tests with 1 and 8 threads, so that's a maximum of 256 ops in flight for SR vs just four for Anandtech.

    I do not think they will develop the entire storage solution themselves; I think they will partner with someone, WD or Micron, to develop it for them. I do think it will be soldered onto the main board rather than being an M.2 solution.

    Cheers
     
    #1867 Gubbi, Apr 30, 2019
    Last edited: Apr 30, 2019
  8. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    Lightman and Cyan like this.
  9. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,029
    Likes Received:
    3,428
  10. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France
    I don't get the Stadia comparison... It's a cloud thing, and they can "stack" them, no? Or when it's not powerful enough, I guess, upgrade them...
     
  11. shiznit

    Regular

    Joined:
    Nov 27, 2007
    Messages:
    345
    Likes Received:
    95
    Location:
    Oblast of Columbia
    The 100K+ IOPS numbers you often see quoted for NAND are at very high queue depths. Not a gaming workload. NAND random read performance falls off a cliff at low queue depths, and contrary to popular belief reads are more expensive than writes. If you want to augment DRAM with a non-volatile medium, you really need Optane.

    Devs will still have to arrange game data for sequential access and large block sizes to achieve 1GB/s+ throughput. That said, it's not hard to do, and unlike with hard drives the potential gains are enormous.
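A simple model of why block size matters: at queue depth 1, every access pays a fixed latency before any data streams. The latency and streaming-bandwidth figures below are assumed round numbers for illustration:

```python
# QD1 throughput model: each read pays a fixed access latency, then
# streams at the drive's internal bandwidth. Figures are assumed
# round numbers for illustration, not measurements of any real drive.

LATENCY_S = 90e-6   # per-access latency (~90 us, typical NAND read)
STREAM_BW = 3e9     # streaming bandwidth once data flows, bytes/s (assumed)

def qd1_throughput_mb_s(block_bytes):
    time_per_op = LATENCY_S + block_bytes / STREAM_BW
    return block_bytes / time_per_op / 1e6

for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{size // 1024:>5} KiB blocks: {qd1_throughput_mb_s(size):7.1f} MB/s")
```

Small 4K reads land in the tens of MB/s; only with ~1 MiB blocks does the model approach multi-GB/s, which is the "large block sizes" point above.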
     
    #1871 shiznit, Apr 30, 2019
    Last edited: Apr 30, 2019
  12. manux

    Veteran

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,276
    Location:
    Self Imposed Exhile
    Given low enough latency and a high enough number of IOPS, I wonder if the game engine and even the hardware could treat the SSD as RAM. Just mmap the game files into the address space and access blocks, letting hits and misses happen. HBCC on steroids. Of course you'd want to optimize access patterns, but I wonder if such a simplistic approach (mmap, SSD as RAM) could be the base to start from? Maybe have an additional metadata file so mmap can be told what to never cache, what to preload to RAM, etc.

    Console games should already be fairly optimized for streaming in a sequential manner. Seeks on existing console hw are brutally slow.
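A minimal sketch of that mmap idea using Python's mmap module. The file name and sizes are made up for the example; a real engine would do this in native code against its own asset pack:

```python
# Sketch: map an asset file into the address space and let the OS
# page data in on demand. "assets.pak" is a made-up stand-in file.
import mmap
import os

path = "assets.pak"
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of fake asset data for the demo

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The "metadata" idea: hint the kernel about the access pattern
    # (availability of madvise is platform-dependent, hence the guard).
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        mm.madvise(mmap.MADV_RANDOM)
    # Touching this range triggers a page fault and a demand read
    # from the backing storage if it isn't cached yet.
    block = mm[512 * 1024 : 512 * 1024 + 4096]
    mm.close()
os.remove(path)
print(len(block))  # → 4096
```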
     
  13. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    It is not a gaming workload because games have been loading in large chunks since the beginning of time - to circumvent the atrocious random seek latency of legacy media (optical and HDDs).

    What I'm talking about is demand loading missing texture mipmaps at 4K page boundaries. Such a workload could use as many outstanding transactions as the NAND controller supports.

    Exactly!

    Cheers
     
    MBTP likes this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I missed that the numbers I was using were at lower queue depths.
    There is a selectable section that goes to 32-deep, but it doesn't seem to me that the throughput at a depth of 32 for the Intel and Crucial QLC drives matches Storagereview.
    The Crucial drive's random read throughput at Anandtech seems far too low for the IOPs given by Storagereview. The Intel drive fares better than Crucial though still notably slower at Anandtech. A significant caveat is that greater performance doesn't manifest if the drive is full.

    I'm not sure what to attribute this to. Perhaps a testing suite quirk, or a platform difference between a setup matching a desktop PC versus a Dell Poweredge?

    There are other examples in Anandtech's list, like a WD TLC drive review that shows lower performance than Storagereview, but they're still in the ballpark.
     
  15. HBRU

    Regular

    Joined:
    Apr 6, 2017
    Messages:
    837
    Likes Received:
    180
    I think it will just have RAM, RAM and RAM... RAM of the cheapest, slowest type of DDR, used to buffer the HD... with an OS (also running mostly out of it) that presents it to games as just a really quick HD

    (also for backwards compatibility reasons)....
     
  16. bgroovy

    Banned

    Joined:
    Oct 15, 2014
    Messages:
    799
    Likes Received:
    626
    It is: "Custom 2.7GHz hyper-threaded x86 CPU with AVX2 SIMD and 9.5MB L2+L3 cache." The cache amount implies to me each instance only gets three cores on a Xeon many core processor. Compared to an 8 core, 16 thread Zen 2 based PS5, the Sony console would be significantly more powerful.
     
  17. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    SR also runs 8 threads in parallel.

    Anyway, we can deduce the minimum access latency from Anandtech's numbers: NVMe SSDs seem to average around 45MB/s for serialized 4K random accesses - that's ~11k IOPS, or an access latency of ~90 us.

    For 300k IOPS, we would need (at least) around 30 simultaneous transactions in flight, so game engine developers will have to rework some of their asset-loading subsystems after decades of minimizing seeks/accesses.
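The derivation spelled out. The ~30-in-flight figure is an application of Little's law (concurrency = throughput x latency); the 45MB/s input is the observation quoted above:

```python
# Back-of-envelope from the post, computed explicitly.
MB = 1e6
qd1_bw = 45 * MB      # observed ~45 MB/s for serialized 4K random reads
block = 4096          # 4K reads

iops_qd1 = qd1_bw / block                  # IOPS at one access at a time
latency_us = 1e6 / iops_qd1                # seconds per access, in us
in_flight = 300_000 * (latency_us / 1e6)   # Little's law: L = lambda * W

print(round(iops_qd1))    # → 10986  (~11k IOPS)
print(round(latency_us))  # → 91    (~90 us)
print(round(in_flight))   # → 27    (~30 transactions in flight)
```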

    Cheers
     
    #1877 Gubbi, May 1, 2019
    Last edited: May 1, 2019
  18. Nisaaru

    Veteran

    Joined:
    Jan 19, 2013
    Messages:
    1,133
    Likes Received:
    403
    Am I missing something here, or are you really deriving the max bandwidth from how many 512B/4K blocks the SSD can process in parallel? :)
     
  19. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    No, I derived how many transactions would need to be served in parallel to hit 300k IOPS. 300k IOPS each reading 4K only totals ~1.2GB/s, a lot less than the internal >6 GB/s bandwidth of eight ONFI 4 channels running at 800MT/s.
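Checking those two figures (assuming one byte per transfer on each ONFI channel, as the post's >6 GB/s implies):

```python
# Host-side demand vs internal NAND bandwidth, from the figures above.
iops = 300_000
read_bytes = 4096
host_bw_gb = iops * read_bytes / 1e9        # GB/s needed for 300k 4K IOPS

channels = 8
mt_per_s = 800e6                            # ONFI 4 channel at 800 MT/s
internal_bw_gb = channels * mt_per_s / 1e9  # aggregate internal GB/s

print(round(host_bw_gb, 2), round(internal_bw_gb, 1))  # → 1.23 6.4
```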

    Cheers
     
  20. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535