Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    NGG was introduced but not really implemented with Vega. It wasn't cited as something brought into the Pro, although perhaps some element of it was brought in.
    Rapid packed math is the Vega feature I remember being adopted by the Pro.

    Could another interpretation be that NGG legacy is when vertex shader code is being converted or run through the new pipeline? Interpreting older code to run on NGG was a big part of the work done with Vega, and possibly a contributor to why it wasn't adopted then.
     
    thicc_gaf and BRiT like this.
  2. QPlayer

    Newcomer

    Joined:
    May 17, 2019
    Messages:
    49
    Likes Received:
    23
    XSX needs the modern shader language.

    Old code? No way.
     
  3. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    5,971
    Likes Received:
    6,094
    Location:
    Barcelona Spain
    Seeing the hair technology in FIFA 21 and, to a lesser extent, Spider-Man: Miles Morales; UE5's Nanite and Lumen; Demon's Souls' level of geometry; Watch Dogs: Legion's GI; and ray tracing in Watch Dogs: Legion and Spider-Man: Miles Morales.

    We begin to have an idea of what we can really do with the PS5 and Xbox Series X|S as the target. It will be interesting when everything is merged together and the PS4, XB1, and mid-gen consoles are left behind.
     
    Johnny Awesome and milk like this.
  4. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    PS5 has 64 ROPs (Github), so its pixel fillrate has increased from previous gen and is plentiful:
    64x2.23GHz = 142.7 Gpix/s

    Triangle throughput:
    PS5:
    2x2.23GHz = 4.46 GTri/s (if 2 tri/cycle for RDNA2)
    PS4 Pro:
    4x0.911GHz = 3.64 GTri/s
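
    The peak-rate arithmetic above can be sketched as a quick check (the clock and per-cycle figures are the post's assumptions from the Github leak and RDNA2 guesses, not confirmed specs):

    ```python
    # Quick check of the peak-rate arithmetic above.
    # All inputs are assumptions from the post (Github leak / RDNA2 guesses),
    # not confirmed specifications.

    def peak_rate(units_per_cycle, clock_ghz):
        """Peak throughput in G-ops/s: units processed per cycle x clock (GHz)."""
        return units_per_cycle * clock_ghz

    ps5_fill = peak_rate(64, 2.23)    # 64 ROPs -> ~142.7 Gpix/s
    ps5_tri = peak_rate(2, 2.23)      # assumed 2 tri/cycle -> 4.46 GTri/s
    pro_tri = peak_rate(4, 0.911)     # PS4 Pro: 4 tri/cycle at 911 MHz

    print(f"PS5 fillrate: {ps5_fill:.1f} Gpix/s")
    print(f"PS5 triangles: {ps5_tri:.2f} GTri/s")
    print(f"PS4 Pro triangles: {pro_tri:.2f} GTri/s")
    ```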

    Triangle throughput has also increased from the PS4 Pro. And RDNA2 Raster Units apparently are closer to peak performance and therefore more efficient:
    https://forum.beyond3d.com/threads/...6900-xt-2020-10-28.62091/page-53#post-2176773
    Doesn't sound like this is a limitation.

    I'm expecting a normal distribution centred around 16-fragment triangle sizes or thereabouts. From 32 down to single-fragment triangles, I'm still expecting better performance than RDNA1 as well. We'll need benchmarks to see. As games progress, triangles are getting smaller, so with a new Raster Unit using two scan converters per triangle, I'd expect efficiency gains for both larger and smaller triangles.
    Was this the DF interview with Bluepoint? I recall the devs saying they would tessellate wireframe meshes down to triangles so small that you couldn't tell the difference - implying triangles are down to pixel sizes. It's a shame the interview didn't follow up on whether this was done with the new Geometry Engine and its capabilities.
    https://forum.beyond3d.com/threads/...6900-xt-2020-10-28.62091/page-65#post-2177723
    Coarse rasteriser (scan converter) is feeding multiple fine rasterisers. Not sure on the topology - parallel, series or combination of scan converters.

    For example, for Navi21: 1 Shader Engine has 2 scan converters, working on 1 triangle. You could arrange those scan converters in a number of coarse and fine arrangements and feed the Shader Arrays appropriately. With 4 Shader Engines and 8 scan converters working on 4 triangles, you could have some kind of network of coarse and fine rasterisation feeding 8 Shader Arrays.
    From Github, Prim Legacy is still 4 triangles per cycle and depending on clocks for BC, you'll have enough triangle throughput. You could possibly use the NGG Legacy path as well. Perhaps the scan converters can be arranged to work on 1 triangle with 1 scan converter in Legacy mode, and 1 triangle with 2 scan converters in Fast/ Native mode.
    NGG Primitive Shaders still use fixed-function scan converters? IIRC, Raster Units are still involved, but not Prim Units.
    The suffixes aren't very clear in that Github table. However, there's only 1 NGG Legacy entry. This could be for PS4 Pro culling BC. Why are the corresponding entries missing - a pre-cull 8 for NGG Legacy and a post-cull 4 for NGG Fast/Native? Without a distinction, I assume they correspond.

    If NGG Legacy is pre-cull = 8 prim/cycle
    And NGG Fast/ Native is pre-cull = 4 prim/cycle

    Then NGG Fast/ Native is post-cull = 2 prim/ cycle
    This feeds Prim Fast/ Native at 2 triangles per cycle - assumed PS5 fixed-function RDNA2 Raster Units
    Yes, the suffixes/ labels are not explicit.
    From the Github leak, which is mainly about backwards compatibility, there is an entry missing: "peak Prim Fast". This is the assumed fixed-function native capability for rasterisation/scan conversion. Since it's absent, in my previous post and the above post I deduced it in 2 different ways - as 2 triangles per clock.
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    That's different from having a coarse rasterizer working alongside a fine rasterizer. The coarse rasterizer is routing primitives to multiple finer rasterizers; the coarse rasterizer itself doesn't provide coverage, since it doesn't sample finely enough to give the appropriate coverage information.

    Coarse rasterization would give a general screen space tile or tiles that a primitive might cover. That's too broad for the pixel coverage needed for a pixel shader wavefront's launch. What coarse rasterization can do is provide information on which rasterizers may be responsible for providing the per-pixel coverage data needed.

    If it's 4 triangles per clock then I don't see the point in speculating about a 2-triangle per clock arrangement. Per the BC testing and some of the discussion of the boost and non-boost forms for the PS5's backwards compatibility, a fallback to PS4 or PS4 Pro clocks would not work very well if the hardware was half as wide.

    As of the last time AMD discussed them with any detail, yes. Vega's primitive shader still fed into a primitive assembler. Primitive shaders have optional levels of culling available, leaving it up to the primitive unit and rasterizer to make the final determination.

    I'm seeing NGG Vertex legacy and NGG Prim legacy, which one are you referring to?

    Do you mean the ones that include the (tri list) descriptor? They may not be testing the same input mode, and the B column which isn't expanded fully in the screenshot indicates there are other possible differences.

    This is ignoring that the test names hint they aren't testing the same thing.

    If fast launch is unique to NGG, the non-NGG tests wouldn't need to test it.
     
    BRiT likes this.
  6. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    https://forum.beyond3d.com/threads/...6900-xt-2020-10-28.62091/page-53#post-2176773
    A triangle can touch a single pixel or up to 32 in the above link. We still have 1 Raster Unit per Shader Engine and two scan converters per triangle. Scan Converters (rasterisers) and their detailed arrangements aren't explicit.
    This started with Navi21; Navi22, with half the number of Shader Engines and half the number of scan converters, would do half the number of triangles per cycle, at 2 - hence the speculation on the PS5 being similar.

    The Github data isn't explicit enough to confirm or deny it, which is the point of the speculation.
    Yes, I'm only looking at the Prim entries - NGG Prim Legacy, NGG Prim Fast and Prim Legacy. And I'm aware they may not be testing the same input method as it isn't explicit in the table. It makes more sense to test corresponding inputs for both Legacy and Fast. Still could be the opposite.
    This isn't clear.
    Yes, my point is that Github is testing BC, a corresponding Prim Fast entry is missing, and so a deduction was made.
     
    ethernity likes this.
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    The number of pixels a triangle covers isn't what differentiates coarse and fine rasterization. The ability of a given rasterizer stage to determine how many pixels a triangle covers determines whether it is coarse or fine rasterization. A coarse rasterizer generally cannot give pixel-level information, and might be as coarse as a screen tile or region of pixels/quads. A standard rasterizer would need to follow up with the actual coverage information, and the coarse rasterizer's output can be used to determine which rasterizers would need to do this.
    A very coarse-level check was mentioned for Vega's primitive shaders, where there was an instruction used to look up how many shader engines (1:1 with rasterizers at that time) would be responsible for evaluating a given primitive. Such a coarse check might be handled with primitive shaders now, or may possibly be part of the workload handled by the geometry processor.
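
    A minimal sketch of the kind of coarse check being described: map a triangle's screen-space bounding box to the set of tile rasterizers it might touch, without computing any per-pixel coverage. Tile size, count, and the interleave pattern here are illustrative assumptions, not documented hardware behavior.

    ```python
    # Illustrative sketch: a coarse check maps a triangle's screen-space
    # bounding box to the set of rasterizers whose tiles it may touch.
    # Tile size/layout and interleave are assumptions for illustration only.

    TILE_W, TILE_H = 64, 64          # assumed screen-tile dimensions
    RASTERIZERS = 4                  # e.g. one per shader engine

    def coarse_check(tri):
        """Return the rasterizer IDs whose tiles the triangle's bbox overlaps.
        No per-pixel coverage is computed at this stage."""
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        tiles = set()
        for ty in range(int(min(ys)) // TILE_H, int(max(ys)) // TILE_H + 1):
            for tx in range(int(min(xs)) // TILE_W, int(max(xs)) // TILE_W + 1):
                # assume tiles are interleaved across rasterizers
                tiles.add((ty * 31 + tx) % RASTERIZERS)
        return tiles

    # A small triangle usually maps to a single rasterizer's tile,
    # so the other rasterizers needn't spend cycles on it.
    print(coarse_check([(10, 10), (20, 10), (15, 20)]))
    ```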

    The Github data gives 4 primitives per cycle natively. Its PS4 Pro mode gives 4, and the PS4 mode gives 2, which gives a decent pattern for the number of primitives the hardware can process per clock. I'm not following on why generally proven details on the actual hardware would be disputed by the settings of a different GPU.

    I'm not following your wording here. I was responding to your statement that there was only one legacy entry, when I found two. Your answer now gives three, and one of them isn't legacy.
    If the fast case is a special case for NGG that isn't considered special for the legacy case, it wouldn't be necessary. One specific form of primitive input is being singled out, which may make it a condition for whatever fast launch is.

    It's not a definitive statement, but they bothered to change the naming in a specific way, when if there were no change they wouldn't need to.

    Perhaps if the full rows of that section were available, some of the differences may be partly explained. There are elements in the spreadsheet that explicitly measure functionality that do not have equivalents in the original hardware, like the earlier row on WGP mode LDS bandwidth or L1 graphics cache bandwidth.
     
    PSman1700, iroboto and BRiT like this.
  8. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    There seems to be some vagueness of definitions here. We know that the Raster Unit spits out 1-32 fragments per triangle. There are 2 scan converters involved, from the driver leak. The 'coarse' and 'fine' arrangement of scan converters (rasterisers) is not detailed. We are speculating on how we get 1-32 fragments.
    These are not explicit PS5 modes telling us what the non-NGG, non-legacy PS5 native Raster Units are capable of. We have the suffixes 'legacy', 'NGG' and 'fast', as below:

    - peak prim legacy = 4 prim/clk (fixed-function)
    - peak NGG legacy = 8 prim/clk (primitive shader)
    - peak NGG fast = 3.3 prim/clk (weird, non-integer)
    - peak NGG fast / scan conv = 4 prim/clk (native?, primitive shader)

    We are missing 'peak prim fast', the corresponding entry to 'peak prim legacy'.

    As already mentioned, NGG legacy is twice NGG fast, with no extra details about them being pre or post cull.

    1) You are trying to say NGG legacy is 8 because of pre cull, and NGG fast is 4 because of post cull. That's why it's higher. I'm saying we don't have that info, and it's just as valid to say they are both pre cull.

    2) If they are both pre-cull, why is NGG fast 4 instead of 8? Because for RDNA2 and Navi21, 8:4 is the pre- to post-cull ratio, but for Navi22, a 4:2 pre- to post-cull ratio is expected.

    Which means 2 prim/cycle sent to 2 Raster Units for Navi22, instead of 4 prim/cycle sent to 4 Raster Units for Navi21. If the PS5 follows Navi22, then 2 prim/cycle is expected. You are arguing 1) and I'm arguing 2), as far as I can see.
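
    The two competing readings can be written out as plain arithmetic (the entries are from the leak as quoted above; everything else is the posts' assumptions):

    ```python
    # The two readings of the Github entries, as arithmetic.
    # Leaked entries (per clock): NGG Legacy = 8, NGG Fast = 4.
    # Everything else here is assumption, per the discussion.

    ngg_legacy = 8
    ngg_fast = 4

    # Reading 1: Legacy is pre-cull, Fast is post-cull of the same pipeline,
    #   so the native rasterized rate is 4 tri/clk (matching 4 raster units).
    reading1_rasterized = ngg_fast                  # 4 tri/clk

    # Reading 2: both figures are pre-cull; a 2:1 pre:post cull ratio applies
    #   (as assumed for Navi21's 8 -> 4 and Navi22's 4 -> 2).
    reading2_rasterized = ngg_fast // 2             # 2 tri/clk

    print(f"Reading 1 (Fast is post-cull): {reading1_rasterized} tri/clk")
    print(f"Reading 2 (both pre-cull):     {reading2_rasterized} tri/clk")
    ```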

    See above, hope that is clearer.
    It's not clear as mentioned above.
    Yes, the data isn't clear to confirm explicitly what I said above.
     
    BillSpencer likes this.
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    The rasterizer stage produces coverage data for a 32/64-wide pixel shader wavefront. I'm guessing there's an assumption that the path between rasterizer and the launch hardware is sized for 32 pixels, although equality isn't necessary since there's no requirement that launches occur every cycle (GCN was 16:64). A triangle can cover more than 32 pixels, since that is a function of the dimensions of the triangle (could be screen-sized if necessary) but that would require additional wavefronts that cannot launch in the same cycle, blocking further geometry processing upstream. That coverage information is at the fine level of granularity, as coarse rasterization doesn't give an adequate answer for the individual elements in the coverage mask.
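
    The point about large triangles blocking upstream geometry can be made concrete: a triangle covering more than one wavefront's worth of pixels needs multiple wavefront launches. The wave width is RDNA's 32-wide mode; the one-launch-per-cycle framing is the post's assumption.

    ```python
    import math

    WAVE_WIDTH = 32  # RDNA wave32 pixel-shader wavefront

    def wavefronts_needed(covered_pixels):
        """Number of pixel-shader wavefronts a triangle's coverage requires.
        Each wavefront carries coverage for up to WAVE_WIDTH pixels."""
        return math.ceil(covered_pixels / WAVE_WIDTH)

    # Assumption from the post: roughly one wavefront launch per cycle,
    # so a large triangle occupies the launch path for extra cycles,
    # stalling geometry processing upstream.
    for pixels in (1, 32, 33, 1000):
        print(pixels, "pixels ->", wavefronts_needed(pixels), "wavefront(s)")
    ```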
    Having multiple rasterizers opens up the question of how many of them need to evaluate a given triangle. The straightforward approach would be to submit the same triangle to however many rasterizers there are, but that wastes their cycles in many cases because rasterizers are responsible for separate tiles of screen space, and most triangles touch fewer tiles than there are rasterizers. Coarse rasterization can flag which rasterizers need to be sent a triangle for coverage evaluation, which may make more sense if there are additional subdivisions in the scan conversion process beyond the original 4.


    This is where having visibility on all the columns in that spreadsheet and their headers would give more information. I'd have to search where the data was discussed before, but there are columns for the native clocks at the time and modes coinciding with the PS4 Pro and PS4, with per-clock adjustments for things like the PS4 being half as wide as the Pro.
    The pipeline for processing geometry would be present for at least 4 primitives per clock, going by the tests for the native, PS4 Pro BC mode, and 2 per clock in the compatibility mode for the PS4.
    Much of the hardware is shared between the types, so I would think it would be more straightforward to maintain the throughput.
    The "weird" non-integer value may not be that weird if we don't delete the text related to triangle lists and the row related to there being 10 vertices per clock in NGG Vertex Fast mode.
    There are 3 vertices per triangle, so I can imagine one way to get 3.3 triangles/clock from a process that supports 10 vertices/clock.
    Since Fast always mentions (Tri list) it could be that it's part of the condition for the fast test, in which case we have the peak value for fast.
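
    The 10-vertices-per-clock reading explains the fractional figure directly. This is the post's hypothesis, not a confirmed mechanism:

    ```python
    # Hypothesis from the posts above: NGG Vertex Fast handles 10 vertices
    # per clock, and a non-indexed triangle list consumes 3 unique vertices
    # per triangle, yielding the leak's odd 3.3 prim/clk figure.

    verts_per_clock = 10
    verts_per_triangle = 3   # triangle list, no vertex reuse

    tri_per_clock = verts_per_clock / verts_per_triangle
    print(f"{tri_per_clock:.1f} prim/clk")   # matches the 3.3 entry
    ```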


    The rest of the process around primitive processing is sized for 4 primitives/clock, going by the wave launch section. Dropping NGG's throughput, despite the hardware being mostly there with the legacy pipeline and wave launch path sized for 4, doesn't seem necessary to me.

    The first part is a possibility I've mentioned.
    I'm theorizing NGG fast's behavior may be related to the triangle list condition, which could have different constraints.
    That they have different culling settings is something I've noted, though I don't know what the settings specifically mean.
    The 3.3 prim/clk ratio aligns with the NGG Vertex Fast row that wasn't in the list posted, given triangles have 3 vertices.
    The fractional throughput limit there may constrain the overall rate, but I don't know if that means other formats would have the same ceiling.
    The Peak NGG Prim Fast (Tri list)/Scan Conv line may have some implication for the throughput or culling capabilities of the chip in fast mode, since there are multiple scan converters. The BC testing's scaling from a 2 SE mode to a 4 SE mode would seem to indicate there are 4, however.

    Navi 22 is supposedly a 2 SE GPU, but the PS5's testing behavior is consistent with there being 4.
     
    turkey, thicc_gaf, Shompola and 5 others like this.
  10. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,987
    Likes Received:
    134
    Are you using 'rasterisers' and 'scan converters' interchangeably?

    Given a Shader Engine, there are 2 scan converters in 2 Shader Arrays, and the Raster Unit sits above the SAs at the SE level, processing 1 triangle. It still isn't clear how we are getting 1-32 fragments from 'coarse' and 'fine' scan converters. Is every scan converter upgraded to 1-32, or still 1-16 but in some combination? A whitepaper with a pipeline breakdown would be great.

    Then there are 8 Packers in each RDNA2 Shader Engine from the driver leak, so 8 2x2 quads would make 32 fragments being sent, which matches the wave size.

    Well, if you have a better spreadsheet that has more information, then that would make things clearer. As mentioned previously, what we have isn't explicit enough.
    Yes, but the 'tri list' doesn't make it clear if the numbers are pre or post cull, as mentioned previously, and doesn't make it any more explicit for 'peak Prim Fast', so we are still looking at different views.
    The 'wave launch' section has the first row for Work Items at 64 items/clock, which is GCN wavefronts as far as I can see. So all subsequent entries in that section would be legacy BC, and seeing 4 prim/clk entries isn't surprising. Github was about BC, so the RDNA-native capabilities are obfuscated.
    Do you mean SEs as Shader Engines or Shader Arrays? What is 4 referring to? I expect 4 scan converters. Even Navi22 has 4 from the driver leak.
    Sorry, not following '4' - is that referring to 4 scan converters? Both Navi22 and PS5 should have the same numbers.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    Rasterization as a process is synonymous with scan conversion; in this case I'm treating scan converters as a later stage of the rasterization hardware's functionality. Whether they're physically distinct in a manner that hasn't been indicated before isn't clear, but for the purposes of determining peak geometry rate, the rasterizer block would be accepting the geometry first.

    Perhaps there's something about the links being used to reference posts, but I haven't seen what is supposed to indicate there are coarse and fine scan converters like you've claimed.
    The amount of coverage information being used for pixel shader wavefront launch is equal to the size of the wavefront. Whether that coverage mask is filled with active lanes is based on how many pixels/quads a triangle is found to be touching, where a scan converter probably provides the information that populates the mask.
    I'm not seeing gain in restricting the amount of coverage information being generated by narrowing the scan converter output, a wavefront isn't going to launch until it has that information, irrespective of the number of pixels the triangle covers--which can be more than 32.

    The model I'm working with for now is what was documented in AMD's patent for a binning rasterizer, which is presumably the DSBR introduced with Vega. https://www.freepatentsonline.com/20190122417.pdf
    What AMD has publicly described as its rasterizer covers the primitive batching module, accumulator, and a scan converter. If AMD has split or duplicated scan conversion hardware, the path from the binning and culling portion of the rasterizer would define peak geometry rate for triangles that are rendered.

    I'm not 100% certain on the identity of the packers in the driver leak, but if it's related to POPS packers in the ISA it's not how they would be used. A wavefront can reference a packer ID, but that ID is for all pixels in the wavefront. The point of it is to provide a way to detect that exports from different triangles' pixel shaders are hitting the same pixels, and the packer ID and the value given by that packer give the order those exports should retire in based on what sequence the triangle entered the rasterization process.

    It came up some time ago, I would need to see if any pages were attached to posts or there are other repositories.

    The vertex entry and the 3.3 throughput may indicate there is another factor. There are choices that can be made in terms of how the geometry is passed to the GPU that can have significant throughput impacts, although the most recent example of synthetics being used to check this was for Vega.

    GCN has a 4-clock cadence, so 16 items would be brought up per clock, per shader engine.
    The PS4 Pro's rate was 64, and it has 4 shader engines. The PS4's rate is 32, and it has 2 shader engines.
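
    Those rates line up with GCN's issue cadence; a quick sketch using the figures from this post (a consistency check, not documented launch behavior):

    ```python
    # GCN issues a 64-wide wavefront over a 4-clock cadence,
    # i.e. 16 work-items per clock per shader engine (figures from the post).

    ITEMS_PER_CLOCK_PER_SE = 64 // 4   # 16

    def launch_rate(shader_engines):
        """Work-items brought up per clock across all shader engines."""
        return shader_engines * ITEMS_PER_CLOCK_PER_SE

    print("PS4 Pro (4 SE):", launch_rate(4))  # 64 items/clock
    print("PS4     (2 SE):", launch_rate(2))  # 32 items/clock
    ```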

    SE means shader engine, Navi 22 has two SEs in the leak.
    At least so far, shader launch has seemed to be part of the hardware that is actually at the shader engine level, versus shader array.

    Shader engines.
     
    thicc_gaf and PSman1700 like this.
  12. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    12,119
    Likes Received:
    3,109

    This is why I believe the digital-only versions of the consoles will be the most popular moving forward, and why I think console refreshes or next-gen consoles won't have Blu-ray drives.

    I also think that for the next gen, post-PS5/Xbox Series, Blu-ray makes zero sense, as that would be the 4th generation with the same storage format, and it's going to have capacity issues.

    Blu-ray is 25/50 GB; BDXL is 100/128 GB.
    Blu-ray has a maximum speed of 72 MB/s @ 16x. I am not sure if XL discs can read faster; can't find the info.
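
    The 16x figure follows from Blu-ray's base rate (1x is defined as 36 Mbit/s, i.e. 4.5 MB/s):

    ```python
    # Blu-ray 1x is defined as 36 Mbit/s of data, i.e. 4.5 MB/s.
    BD_1X_MB_S = 36 / 8          # 4.5 MB/s

    def bd_speed(multiplier):
        """Peak read speed in MB/s at a given drive speed multiplier."""
        return BD_1X_MB_S * multiplier

    print(f"16x: {bd_speed(16):.0f} MB/s")   # the 72 MB/s quoted above
    ```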

    About the only thing Blu-ray has going for it is the disc cost. But it's going to be harder and harder to justify that disc cost when 75% of customers buy digital. The drive price is carried with every console sold that has one.

    I think if we see any physical format in the next generation of consoles, it will be some NAND cart. 128 GB of NAND continues to get cheaper and cheaper, with some on Black Friday dipping under $10 and 256 GB dipping to $20. Fast forward another 6 years and what would we have?
     
    thicc_gaf likes this.
  13. Pete

    Pete Moderate Nuisance
    Moderator Legend Veteran

    Joined:
    Feb 7, 2002
    Messages:
    5,401
    Likes Received:
    1,161
    IIRC, the PC/console preorder split was 59/41. Assuming virtually no one on PC got a physical version, that's 26/41 ≈ 63% physical on console. 63% × 41% × 8 million ≈ 2 million copies, nothing to sneeze at.
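
    Spelling that arithmetic out (the 26% figure is read as the physical share of all preorders, which is an assumption about what the post's numbers mean):

    ```python
    # Sketch of the preorder arithmetic above. The 26% overall physical
    # share is taken from the post and assumed to mean physical copies
    # as a fraction of ALL copies (PC + console).

    total_copies = 8_000_000
    console_share = 0.41               # PC/console split 59/41
    physical_share_overall = 0.26

    # If virtually all physical copies are console copies:
    physical_of_console = physical_share_overall / console_share   # ~63%
    physical_copies = physical_of_console * console_share * total_copies

    print(f"{physical_of_console:.0%} of console copies physical")
    print(f"~{physical_copies / 1e6:.1f} million physical copies")
    ```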

    Still, a $400 PS5 DE is nothing to sneeze at, either.
     
    DSoup likes this.
  14. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    3,905
    Likes Received:
    2,840
    Location:
    France
    Many customers still can't realistically download 100 GB games. I know I can't.
    And game sizes of >300 GB are coming, along with even greedier publishers.
     
    Pete and DSoup like this.
  15. thicc_gaf

    Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    198
    Likes Received:
    160
    NAND-based USB carts could definitely be a thing for at least one of the 10th-gen systems. In arcade/FEC markets, there are already systems like the exA-Arcadia which use something like this as physical media for game delivery.

    Microsoft already has a proto-form of this on the market with the expansion cards. A few years from now, cut the capacity down a ton, keep the same bandwidth spec, maybe tune the decompression capabilities a bit, and 64 GB/128 GB, maybe even 256 GB, cards for physical delivery could be doable at affordable prices. Though it'll probably still cost a bit more than an equivalent Blu-ray disc by that time.

    Anyway about Cyberpunk, the split is interesting but not surprising. Been reading a lot about the glitches and performance issues though, even on cards like the 3080 it seems to be very unoptimized. Hopefully by the time the next-gen patch is ready the glitches and performance issues will be fixed.

    xD that's a true if pessimistic outlook on that note. But even if any type of NAND-based USB cart doesn't happen, let's at least hope SSD sizes will be a lot bigger.

    4 TB should be completely doable by then, probably at a lower cost than the 1 TB/768 GB equivalents cost the 9th-gen systems.
     
  16. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,925
    Likes Received:
    900
    Location:
    Somewhere over the ocean
    If you must lose compatibility with the disc, go directly to digital only.
    The customers that you lose are nothing compared to the cost of maintaining an expensive niche format.
     
    PSman1700 and turkey like this.
  17. liams

    Newcomer

    Joined:
    Jul 1, 2020
    Messages:
    181
    Likes Received:
    169
    They could just use straight-up USB memory sticks with a read-only mode. The benefit of a USB-based solution that still requires an install process is that you could introduce it as a distribution format whenever you wanted; you don't need to wait for a hard gen-to-gen cut.
     
  18. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,346
    Likes Received:
    2,629
    Discs are hardly even a distribution format now. Not when day-1 patches are the same size as the install, or only part of the game is actually on the disc to begin with.

    What I've always thought places like GameStop should do is provide a service where they download the games, and you can either rent the USB drive or bring your own and copy the game to it.
    So it's still digital, but people with data caps or bad internet just pay a couple of bucks to use theirs (obviously they would have loads of games already downloaded, etc.).
     
    egoless likes this.
  19. liams

    Newcomer

    Joined:
    Jul 1, 2020
    Messages:
    181
    Likes Received:
    169

    I think we will see that on the Xbox side with GameStops in the near future. Once the expandable storage cards come down in price a bit and the world is a bit less covidy, I think they will roll something like that out. I could see them wanting to delay it at the moment because they don't want to be seen encouraging people to risk their health by going into stores. The expandable storage cards for the Xboxes are perfect for this. I could even see it being a free service: if Microsoft provided the kiosks to GameStop for free and GameStop provided the floor space for free, it's just another reason for customers to go into a GameStop, giving them more opportunity for sales.

    Plus, you can download and install any Xbox game now, even if you don't own it, which would help with this. You wouldn't have to own every game that you transfer, so if you saw something cool you could just copy it over.
     
    thicc_gaf and Jay like this.
  20. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,043
    Likes Received:
    927
    Location:
    Planet Earth.
    I think ROMs (cartridges or whatever read-only electronic system) are better than discs, and digital-only is not such a good idea; it should always be an option, IMO.
    (Some people buy games at release, finish them, and resell them to play more games; there's no reason to prevent that, and discs are sooo slow...)
     