Xbox Series X [XBSX] [Release November 10 2020]

Discussion in 'Console Technology' started by Megadrive1988, Dec 13, 2019.

  1. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,512
    Likes Received:
    2,855
Considering the upscaling would need to be implemented in-engine?
Just rendering at a higher resolution, like the X360 & OG Xbox BC, would be a huge benefit.

They actually demoed it on the Gears remaster to DF during the initial reveal, but nothing's come of it yet.

    Some of those 120fps boost titles could've done with a resolution bump instead.
    Some titles could even do both.
     
  2. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    12,534
    Likes Received:
    3,467
    I'm wondering if they could just enable it through the emulation software they are using and just force it.
     
  3. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,512
    Likes Received:
    2,855
    I wouldn't hold your breath for ML Upscaling that doesn't also require motion vectors as input. But I'm no expert.

    It must've run into problems, but as I said they did have a way to render XO games at higher resolutions. Probably more demanding than ML upscaling but doesn't need to be done in engine, and would give good results.

    It's really needed for XO/1S games running on XSX and especially XSS. 1X titles aren't as bad.
     
    BRiT likes this.
  4. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,786
    Likes Received:
    7,836
    Sure, but in any given period of time, if your processing is limited by how much data you can fetch in X amount of time, it doesn't matter if you process items more quickly due to clock speed or due to more CUs.

    Basically if your SOC is sitting idle because you can only get X amount of data to the SOC then it does not matter how quickly it can process that data. The faster you process it, the faster you get to sit there and wait for more data to come in.

    In that case, regardless of the system architecture, the system that can provide more data to the SOC will have an advantage.

    That's what happens if you become limited by your bandwidth. Your SOC could be operating at 10 GHz instead of 3 GHz, but both are rendering at the same speed if they both have the same bandwidth and are limited by it. Another SOC could have 1000 CUs or 300 CUs, but they are both rendering a scene at the same speed if they are limited by the same bandwidth.

    In situations like this where pure bandwidth is the main limitation, then whatever system has more bandwidth will have the advantage regardless how much faster or wider an SOC is than another SOC. It doesn't matter how quickly your SOC can process the data if your system can get less data to the SOC than another architecture.
    • The PS5 can, in ideal situations, process 448 GB/s of data from main memory. The SOC can process more than that depending on the workload, but that is the absolute limit if it has to access main memory.
• The XBS-X can, in ideal situations, process 560 GB/s of data from main memory. The SOC can process more than that depending on the workload, but that is the absolute limit if it has to access main memory.
Bandwidth-limited situations mean the SOC could process more than those data rates at that point in time, but it's limited to processing data at those rates. When this situation arises, it doesn't matter that you have similar or greater bandwidth per CU; your system is still sitting there twiddling its thumbs if it's already pulling in the maximum amount of data that it can.

    Bandwidth obviously isn't the only limitation you can run into, so how much of a limitation it is depends on how often your system needs more data than it can pull from main memory.
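To make the point concrete, here's a toy roofline-style model (my own sketch; the workload numbers are invented, not from either console's specs). Frame time is bounded by whichever is slower, compute or memory traffic, so once the bandwidth term dominates, tripling the compute rate changes nothing:

```python
# Toy roofline model: frame time is bounded by whichever is slower,
# raw compute or getting the data through the memory bus.
def frame_time(flops_needed, bytes_needed, peak_tflops, bandwidth_gbs):
    compute_s = flops_needed / (peak_tflops * 1e12)
    memory_s = bytes_needed / (bandwidth_gbs * 1e9)
    return max(compute_s, memory_s)  # the limiter wins

# Hypothetical workload: 40 GFLOPs of math over 6 GB of traffic per frame.
work = dict(flops_needed=40e9, bytes_needed=6e9)

slow_wide = frame_time(**work, peak_tflops=10.0, bandwidth_gbs=500)
fast_wide = frame_time(**work, peak_tflops=30.0, bandwidth_gbs=500)
# Both land on the same frame time (6e9 / 500e9 = 12 ms): memory-bound.
```

With these made-up numbers both configurations are memory-bound at 12 ms per frame; the extra 20 TF buys nothing until bandwidth rises.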

    Regards,
    SB
     
    mr magoo likes this.
  5. snc

    snc
    Regular Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    919
    Likes Received:
    668
yes, as I wrote, XSX has a 25% bandwidth advantage (if we take into account only the 10 GB of faster RAM)
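The 25% figure is just the ratio of the two quoted peak rates; a quick sanity check (the labels are mine):

```python
ps5_bw = 448  # GB/s, PS5's unified memory pool
xsx_bw = 560  # GB/s, XSX's 10 GB "GPU-optimal" portion

advantage = (xsx_bw / ps5_bw - 1) * 100
print(f"{advantage:.0f}%")  # prints "25%"
```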
     
  6. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
The only thing I would add to this is that you can also have a CU bottleneck. If you have a workload that is significantly larger than the number of available ALUs, you have a compute deficiency that cannot be rectified by bandwidth.

So for extreme examples in this case: 1 CU @ 78 GHz vs 36 CUs @ 2.3 GHz. The latter will outperform the former with the same bandwidth and memory setup. This is because the former has to make 36x more write trips and 36x more read trips than the latter. So while the bandwidth is available, there aren't enough CUs to take advantage of it.

So you could have 1 TB/s of bandwidth, but a single CU is only capable of requesting so much data before it's full. You can process it fast, sure, but requesting and writing data is likely the slowest part of the process here, because latency becomes a factor the more times you make requests. We traditionally hide latency by introducing more threads, but once again there is a limit to that as well.
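A rough way to model why one very fast CU can't soak up a wide bus (all constants invented for illustration): by Little's law, each CU's achievable traffic is its in-flight requests times bytes per request divided by memory latency, and that cap doesn't move with clock speed:

```python
# Toy model via Little's law: per-CU traffic = outstanding requests
# x bytes per request / round-trip latency. Clock speed doesn't appear:
# once a CU is waiting on memory round trips, clocking it higher just
# means it waits faster.
def achievable_bw(num_cus, bus_bw_gbs, outstanding_per_cu=128,
                  bytes_per_request=64, latency_ns=400):
    per_cu = outstanding_per_cu * bytes_per_request / (latency_ns * 1e-9)
    return min(bus_bw_gbs * 1e9, num_cus * per_cu)  # bytes/second

one_cu   = achievable_bw(num_cus=1,  bus_bw_gbs=560)  # latency-bound, ~20 GB/s
many_cus = achievable_bw(num_cus=36, bus_bw_gbs=560)  # saturates the 560 GB/s bus
```

The made-up constants (128 outstanding requests, 400 ns round trip) only set the scale; the shape is the point: one CU leaves most of the bus idle regardless of its clock, while 36 CUs can actually fill it.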

    I definitely don't think having a 36x faster front end on the graphics side will make up for the number of memory trips later down the pipeline.

1 CU will need to make 36 requests for every 1 request the 36-CU part makes to fulfill the same amount of work. You can eventually extrapolate this to other items over time.

The reason we don't see things like 80-CU parts get a huge lift over smaller configurations is likely that workloads just haven't been large enough for the smaller ALU counts combined with smaller caches to fall off a cliff.

It's not always linear, and more often than not things run very well until you reach the workload point that breaks the camel's back, after which performance gets progressively worse.
     
    #2686 iroboto, Apr 30, 2021 at 10:15 PM
    Last edited: Apr 30, 2021 at 10:21 PM
  7. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    14,987
    Likes Received:
    11,086
    Location:
    London, UK
Yup. Within a certain envelope, the clock speed and number of compute units do not matter; it's about how many free compute cycles you have available in any given timeframe, and Series X should have more unless that higher ceiling has been eaten into by running at a higher resolution.

    All things being equal, Series X should support better RT than PS5.
     
    Johnny Awesome likes this.
  8. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,786
    Likes Received:
    7,836
Yes. In a nutshell, many people don't realize that every architecture out there is going to be limited at X time by Y thing. Until such time as there are unlimited resources and capabilities on some piece of hardware, there will always be times when it's limited by some bit of its architecture.

It's just that when something is the fastest piece of hardware, you don't think about the times when it hits its limitations, because it's human nature to assume it's not limited if it's the fastest thing on the market. But anytime some part of the hardware is idling, it means another part of the hardware is operating at its limits, thus limiting the overall performance of the hardware at that point.

The practical holy grail of hardware isn't to design something with no limits, but to design something where each piece can be the limiting factor at some point. IE - the HD 2900 XT wasn't a good design because its bandwidth was never a limitation for any real-world use (so those transistors could have been better used for something else).

Now, when comparing 2 pieces of hardware, the interesting thing is to tease out how one architecture might be limited by X feature in Y situations versus another architecture. Unfortunately, a lot of noise often comes in with partisan comments that this means one architecture is overall better than another because of that, when that isn't necessarily the case at all.

    Just because one system might be slightly better at RT doesn't suddenly make the other architecture not good. Just because one arch has a lower clock speed doesn't make it worse. Just because one arch has more CU doesn't mean the other arch is bad. Etc., etc.

    If someone can't acknowledge when their arch might have a limitation that another arch is less limited by, then there's no way they can fairly judge different architectures. Likewise if they can never admit that another arch than the one they like is better in some areas, the same problem arises.

    Of course, in a good technical discourse, there will always be a back and forth about the relative strengths or weaknesses and how that impacts the overall performance of an arch or even potential discussion about whether or not some feature is a weakness or a strength with evidence provided by real world applications after a product has spent enough time on the market.

    It's unfortunate that I sometimes see too much of X is better than Y because ... limited data. It's still early in the product cycle. Each arch has been on the market for less than a year. Very little software has been written to utilize the features of either product. Yet, some are already making claims that X is better at Y thing on such limited data.

    That said, I do appreciate all the people that keep an open mind and attempt to steer the discourse into talking about why A might be better than B in S product doing X, Y, or Z thing. Is it the hardware? Is it the software? Is it the development environment? Is it something non-obvious? Is it the skill level of the developer? Is it the time spent on A or B arch? Etc.

It's also a little frustrating when someone keeps pointing out that X thing is true because of how that person interprets Cerny's words, yet at the same time dismisses anything Andrew Goosen might have said about the arch he helped create. Likewise going the other way around: pointing out things Goosen said while ignoring things that Cerny said.

Also, if someone is going to go through a video frame by frame to find places where X arch is doing something better than Y arch, your argument will be stronger if you also point out the frames where Y arch is doing something better than X arch. Otherwise you'll often come out looking partisan, when that may or may not be your intention. There have been a lot of screenshots posted here attempting to show that X arch is better or worse, only for someone else to come in later and show the opposite, depending on which frame or screenshot was cherry-picked to show some alleged superiority or inferiority. At the very least, while you're looking for proof that your arch is definitely better, try to make sure there isn't also evidence in the footage of your favorite arch doing the exact same thing? :)

People, it's not the end of the world if your system of preference is slightly worse or slightly better at this or that. :) The fun is in looking at what is happening and trying to tease out any details we can from it.

    Bleh, this turned out a lot longer than I intended. Perhaps a side-effect of my not wanting to put people on ignore. :p

    Regards,
    SB
     
    Johnny Awesome, mr magoo and iroboto like this.
  9. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
It’s tough to saturate a GPU. I have nvidia-smi open when I profile my code for data work, and I can barely make it blip above 14%. It’s just sitting around waiting for data to work on. Computation is just so fast.

    I get the challenges that everyone puts forward as pros and cons. But so much of that is just hardware talk; making software maximize said hardware is incredibly difficult and we probably don’t put enough focus on how hard that may be.
     
    Silent_Buddha likes this.
  10. snc

    snc
    Regular Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    919
    Likes Received:
    668
ok so we have a new, quite good benchmark (RE8's RT mode) and still the gap is minimal, so maybe CU count * clock is indeed a better performance indicator than CU count alone? ;)
     
  11. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,883
    Likes Received:
    21,271
We only have a time-limited demo; perhaps wait until the full game releases and tech-heads have more time and more areas to analyze?

    Though I do think that it sometimes can be closer to Count*Clock-vs-Count*Clock than just Count-vs-Count. It all depends where the limiters are for the workloads.
     
    snc likes this.
  12. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,883
    Likes Received:
    21,271
    *Ahem* Behave. Also, not the thread for console comparisons. We generally avoid those because of how quickly it loses the ability to remain as a civil discussion.
     
  13. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
They just do different things, if we're talking about particular functions required in a pipeline. Clock speed increases the speed of the whole pipeline; having more CUs may only assist in improving one or two aspects of it (in scenarios where more hardware units would be advantageous).

Since you're measuring just the final output, you don't know which aspects PS5 is doing better or worse than XSX. This isn't the same as talking strictly about RT performance; for there to be a larger gap in RT performance, the RT workload has to be a greater % of frame time.
     
  14. snc

    snc
    Regular Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    919
    Likes Received:
    668
yeah but that's generally how games work; but maybe you are right and there exists a synthetic benchmark that would show RT performance increasing relative to compute performance, though I'm not sure
     
  15. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
    yes, games work that way because not all parts of the game are being run in parallel over the CUs. At least with the older APIs, there is a heavy reliance on the 3D pipeline and the fixed function units to do a lot of heavy lifting.

    But in the question of just asking if having more CUs (and RT Units) would be more advantageous in RT performance, vs having less but with more clock speed, I would say yes, it likely is. Ray traversal takes a while, having more units and more bandwidth would be ideal in this scenario. But if the workload is not large enough, that advantage in RT units will likely not make up the deficit elsewhere (ie, being much slower on the front end of the pipeline).

When games leave behind last generation, I would re-assess this, because the new APIs rely significantly less on the fixed-function hardware and more on the compute units to do the work. And the only engine I know of that's capable of doing next-gen things without using newer API features is UE5... which isn't out.
     
    mr magoo and snc like this.
  16. snc

    snc
    Regular Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    919
    Likes Received:
    668
ok, do you think it would still have an advantage with the same TF and the same bandwidth but more CUs (so a proportionally slower clock)?
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
The best way to represent this would likely be a graph of CUs and clock speed vs performance while locking teraflops and bandwidth. Looking at the graph you'll find a local maximum, and you'll also likely find particular combinations where too many CUs underperform significantly, and too few CUs underperform as well.

The answer would probably just be reading off that graph, if that makes sense. You're going to want to choose the profile that gives you the best maximum in this case. But different workloads will likely result in different maxima.

So there is that consideration as well. I'm not sure if Sony and MS went for the best overall average performance, or if they biased their configurations toward what they thought the future would look like. Unfortunately this is a complete unknown, but an interesting discussion nonetheless.
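That trade-off can be eyeballed with a toy sweep (the cost weights are invented; this sketches the shape of the curve, not either console). Holding teraflops fixed, fewer CUs mean a higher clock, which helps serial, clock-bound work but starves the memory system of request parallelism:

```python
TFLOPS = 10.0
FLOPS_PER_CU_CYCLE = 128  # 64 ALUs x 2 ops per FMA, roughly RDNA-shaped

def perf(num_cus, serial=1.0, parallel=50.0, mem=20.0):
    # Fixed teraflops: the clock falls as the CU count rises.
    clock_ghz = TFLOPS * 1e3 / (num_cus * FLOPS_PER_CU_CYCLE)
    t_serial = serial / clock_ghz                   # clock-bound front end
    t_parallel = parallel / (num_cus * clock_ghz)   # constant at fixed TF
    t_memory = mem / min(560.0, num_cus * 20.0)     # per-CU request cap (toy)
    return 1.0 / (t_serial + t_parallel + t_memory)

best = max(range(4, 129, 4), key=perf)  # interior optimum, not an endpoint
```

Sweeping CU count at fixed TF and bandwidth produces exactly the local maximum described above: too few CUs can't keep enough memory requests in flight, too many drag the clock down for the serial portions.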
     
  18. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    193
    Likes Received:
    322
    Location:
    Stockholm
Maybe the new Metro: Exodus will shed some light on this matter. I personally think RT will play less of a role the further we go into next gen. The new consoles are not powerful enough, with first-gen AMD RT hardware, for advanced RT. And tbh the RT cost vs. visual fidelity is very bad; if I remember correctly, Alex from DF said that the simplest RT effect adds 6 ms of rendering time on a 6800. That budget could be spent elsewhere with a possibly bigger impact on the overall visual presentation. But maybe in time engines will change and RT will play a bigger role and have a bigger impact. Right now I feel it's like tessellation on the X360.
     
    snc likes this.
  19. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,786
    Likes Received:
    7,836
    Singular games are almost never a great benchmark for anything other than overall performance of an arch for a given game. IE - you can't usually make judgements about any specific implementation detail of hardware or software.

    Even on the PC, this is the case, although on PC you have greater control in that you can enable or disable things to try to tease out possibilities.

Imagine it this way. Let's say some hypothetical game is:
    • Limited by physics calculations 5% of the time.
    • Limited by compute 25% of the time.
    • Limited by rasterization 10% of the time.
    • Limited by RT 15% of the time.
    • Limited by trips to main memory (bandwidth) 10% of the time.
    • Etc.
If the game runs well on X hardware versus Y hardware, can you definitively say that A portion, B portion, or C portion of hardware X is the main reason X hardware performs better than Y hardware? Now, to further complicate things, let's say those percentages hold for X hardware but change for Y hardware, because Y hardware made different design choices. Yeah, not easy to say...
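That intuition fits a small weighted-limiter calculation (the percentages and speed ratios below are invented, in the spirit of the hypothetical above): total frame time is the sum over phases of each phase's share divided by how much faster the other hardware runs that phase:

```python
# Weighted-limiter model: if fraction f of frame time is bound by a
# subsystem, and hardware Y runs that subsystem r times as fast as X,
# then Y's frame time relative to X is sum(f / r) over all subsystems.
def relative_frame_time(fractions, speed_ratios):
    assert abs(sum(fractions) - 1.0) < 1e-9  # shares must cover the frame
    return sum(f / r for f, r in zip(fractions, speed_ratios))

# Invented numbers: Y is 20% faster at RT but 10% slower at raster.
#              physics compute raster  RT    bw    other
fractions    = [0.05,  0.25,   0.10,  0.15, 0.10, 0.35]
speed_ratios = [1.00,  1.05,   0.90,  1.20, 1.00, 1.00]

t = relative_frame_time(fractions, speed_ratios)  # ~0.97 of X's frame time
```

A 20% edge in one subsystem moves the total by only a few percent when that subsystem is just 15% of the frame, which is why a single game's final framerate says so little about any one hardware feature.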

    We can take guesses, certainly, but they'll never be anything more than guesses without being able to see the developer's internal performance graphs.

    Things are further complicated by the fact that we generally can't run console games with unlocked framerate. Games often run at different resolutions, further complicated by dynamic resolution where we're relying on someone attempting to determine resolution of a very VERY (as in extremely) small sample of frames. The potential for error there is huge even with people and sites that have been doing this for years.

This is further complicated when the performance lead between 2 pieces of hardware might swap depending on what level, or what portions of a level, are benchmarked. In the PC space, you can cherry-pick (and some sites have) where you benchmark a game, because that particular section puts the favored hardware in a more flattering light, while another site might choose a different point in the game because it puts their hardware ahead in that portion.

Benchmarking to determine the performance of specific parts of a given piece of hardware is already hard, and more often than not either indeterminate or misinterpreted. And that's where you at least have some level of control over the benchmark conditions in attempting apples-to-apples comparisons, and some tools available to attempt to see what the hardware is doing.

    Trying to do that on consoles where you have almost no control over the benchmark conditions and no hardware level tools?

    If one console performs relatively better in RT on mode versus RT off mode (if it has that setting), this might or might not tell us anything about relative RT performance ... assuming that anything that changes between RT on and RT off is limited only to RT. But if other things also change (like resolution, effects, etc.)? It'll be interesting certainly, but I'm not sure how much we would be able to take away from it.

    BTW - this isn't to say that it won't be fun to talk about it. Just trying to emphasize that there is likely no case and no one game we can look at that will definitely tell us that X platform or Y platform is better or worse due to A hardware implementation or B hardware implementation.

    At the end of the generation after looking at the body of work? Maybe we can make some generalizations? But talking about it and trying to figure it out is fun. :) Just no-one should take any of the discussion as evidence of a fact.

    Oh and back to why I originally posted this. There are no good benchmarks on console. :) If we had performance graphs like we do on the PC version of Gears 5? And we could toggle individual rendering features on and off? Man that would help a lot. But we don't and we likely never will have something that breaks things down to that level on console.

    Regards,
    SB
     
    mr magoo, BRiT and snc like this.
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    13,182
    Likes Received:
    16,039
    Location:
    The North
    I’m going to revise my answer here now thinking about it.
You should never have more CUs than the bandwidth can support, and vice versa; otherwise bandwidth or CUs will sit idle.

so your hypothetical would never occur. A single CU with a high clock rate could never pull more than a couple of GB/s from a larger memory pool, and this is a result of transport and latency. You can only make so many read/write requests per second per CU.

this is largely why GPUs continually increase in bandwidth (more cores with each generation) while CPUs generally require more or less the same amount of bandwidth (fewer, but much more powerful, cores).

    so if you lower clock rate and increase CU to match the computational power, you must increase bandwidth to support the additional CUs otherwise technically they have nothing to pull.

    If you lower the CU count having more bandwidth won’t help even if you vastly increase the clockspeed.

There is a bit of wiggle room in this argument, however; there is definitely a point where a marginal CU increase is easily offset by more clock speed. But there is a limit to clock speed, past which the returns taper off as you push further.
     
    PSman1700, cwjs, mr magoo and 2 others like this.