Intel ARC GPUs, Xe Architecture for dGPUs

Discussion in 'Architecture and Products' started by DavidGraham, Dec 12, 2018.

  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    That XMX throughput is assuming the same capabilities as for the Matrix Engines in Ponte Vecchio. When asked, Intel reps would not give more details on Alchemist than what was presented in the slides. That may have been different for US press, but if so, they should be explicit about it.

    It is possible, if unlikely, that for consumer-grade GPUs Intel chose to have pure inference engines there, while Ponte Vecchio's HPC-style XMX are half as many, twice as wide and churn out 2048 ops/clk on TF32 (!), 4096 on FP16 & BF16 as well as 8192 on INT8. Its Vector Engines have been refactored too (8 x 512 Bit vs. 16 x 256 Bit), if that's any indication.
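
    As a quick sanity check on those figures, here is a minimal Python sketch that works only from the numbers quoted above. The function names are made up for illustration, and it says nothing about what Alchemist actually ships with. It shows that the quoted rates scale 1:2:4 from TF32 to FP16/BF16 to INT8, and that the two Vector Engine layouts add up to the same total width.

```python
# Ops per clock as quoted in the post for Ponte Vecchio's XMX.
PVC_XMX_OPS_PER_CLK = {"TF32": 2048, "FP16": 4096, "BF16": 4096, "INT8": 8192}


def relative_throughput(ops_per_clk, baseline="TF32"):
    """Return each format's ops/clk as a multiple of the baseline format."""
    base = ops_per_clk[baseline]
    return {fmt: ops / base for fmt, ops in ops_per_clk.items()}


def total_vector_width_bits(engines, width_bits):
    """Total SIMD width when `engines` units are each `width_bits` wide."""
    return engines * width_bits


ratios = relative_throughput(PVC_XMX_OPS_PER_CLK)
width_a = total_vector_width_bits(8, 512)   # the "8 x 512 Bit" layout
width_b = total_vector_width_bits(16, 256)  # the "16 x 256 Bit" layout

print(ratios)
print(width_a, width_b)
```

    So the refactoring changes how the width is split up, not the total width, at least going by these two figures.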
     
  2. Insight

    Newcomer

    Joined:
    Sep 30, 2020
    Messages:
    148
    Likes Received:
    415
  3. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    284
    Likes Received:
    474
    Krteq, Cyan, Janne Kylliö and 6 others like this.
  4. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,410
  5. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    No CXL host attach for PVC.
    Sad!
     
  6. This part of the interview made me think they're planning on running XeSS completely off the Xe iGPUs present in the current Tiger Lake and future Alder Lake CPUs.


     
    digitalwanderer, BRiT and Picao84 like this.
  7. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,032
    Likes Received:
    3,428
    Why is running at 1080p internal resolution only up to 2x faster compared to native 4K?
     
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Because you're assuming that the slide, which says it's for illustrative purposes only, is actually showing accurate figures.
     
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Is this good enough?
    Straight from Intel blog @ Medium
    https://medium.com/intel-tech/the-n...el-arc-high-performance-graphics-f68e7d2dc068
     
  10. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    801
    Likes Received:
    1,630
    No, there is no word on "deep-learning upscaling method running efficiently on their integrated Iris Xe using RPM DP4A truly".

    This part, "supported by Intel Xᵉ LP-based integrated and discrete graphics", means that XeSS runs on the Iris Xe, which tells us nothing new, since we all know that Xe LP supports Shader Model 6.4 with the dp4a instructions. There is no word on whether XeSS is performant enough even for reconstruction to 1080p on Iris Xe in games.
    What we really have is Intel's data on 1.5x frequency in comparison with the desktop DG1 part, perf modelling with this data in mind (though, in order to compete with RTX 3070 grade GPUs, Xe Max has to be ~12x of the discrete Xe LP) and Intel's XMX/DP4A graphs that are based on real testing.
    Based on these Intel numbers, I estimated Xe LP runtimes here and here. Do you remember how restrictive DLSS 1.0 was on far more performant GPUs? Would you consider something like 5.5 ms for 1080p reconstruction to be efficient for the low-performance Xe LP?
     
    pharma and DegustatoR like this.
  11. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    801
    Likes Received:
    1,630
    Probably because Lumen does a whole bunch of work that has constant cost across different resolutions.
    From Epic's docs, it seems Lumen's shading cache is decoupled from screen resolution, world space probes are decoupled, and geometry is at least partially decoupled.
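
    To illustrate the argument, here is a toy frame-time model in Python. It assumes frame time is a resolution-independent fixed cost plus a cost proportional to pixel count plus the upscaler's own cost. All the millisecond values are made-up placeholders, not measurements of Lumen, XeSS or any real GPU.

```python
def frame_time_ms(fixed_ms, per_megapixel_ms, width, height, upscale_ms=0.0):
    """Toy model: fixed work + work proportional to pixels + upscaler pass."""
    megapixels = width * height / 1e6
    return fixed_ms + per_megapixel_ms * megapixels + upscale_ms


def speedup_vs_native(fixed_ms, per_megapixel_ms, upscale_ms):
    """Speedup of 1080p-internal rendering (plus upscale) over native 4K."""
    native = frame_time_ms(fixed_ms, per_megapixel_ms, 3840, 2160)
    upscaled = frame_time_ms(fixed_ms, per_megapixel_ms, 1920, 1080, upscale_ms)
    return native / upscaled


# No fixed cost and a free upscaler: the full 4x from rendering 1/4 of the pixels.
print(speedup_vs_native(fixed_ms=0.0, per_megapixel_ms=1.0, upscale_ms=0.0))

# Hypothetical 5 ms of resolution-independent work and a 2 ms upscaler pass:
# the speedup drops to a little under 2x.
print(speedup_vs_native(fixed_ms=5.0, per_megapixel_ms=2.0, upscale_ms=2.0))
```

    The larger the fixed share of the frame, the further the speedup falls below the 4x pixel ratio.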
     
    T2098, pharma, Jay and 1 other person like this.
  12. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,032
    Likes Received:
    3,428
    Did wonder about that, but that would be a really bad example to use for promotional purposes.
    Do we know if it was using UE5 for the demo?

    Just seems very strange that running at 1080p can only get you up to 2x speed, which implies that's the most you should really expect.
    I would have thought that would have been at 1440p.

    I need to check what the quality settings and internal resolutions are for DLSS2.1
     
  13. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    man talk about being anal about semantics.
    Are you trying to justify GEMM tumors everywhere now?
    They don't do anything duhhhhhhhhhh.
     
    Deleted member 13524 likes this.

  14. During the Digital Foundry interview, it really seemed like Richard Leadbetter was trying to figure out if the DG2's dedicated tensor units were instrumental for XeSS to run effectively and efficiently, and IMO the answer was "No", again because Tom Petersen pointed to the integrated DG1 in their current and future CPUs as candidates to run XeSS.

    Here's the timestamped video:

     
    Kaotik likes this.
  15. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    It seems to me that the answer to that question significantly depends on the target framerate. If DLSS/XeSS took, say, an extra 2ms per frame on tensor units and 4ms using DP4A then the latter might be fine if you aim to get 30-60fps, but not for 120fps.
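
    A minimal Python sketch of that arithmetic, using the hypothetical 2 ms and 4 ms figures from the sentence above (they are not measured DLSS or XeSS costs): the frame budget is 1000 ms divided by the target framerate, and the upscaler's share is its cost divided by that budget.

```python
def frame_budget_ms(fps):
    """Time available per frame at a given target framerate."""
    return 1000.0 / fps


def budget_share(cost_ms, fps):
    """Fraction of the per-frame budget consumed by a pass costing `cost_ms`."""
    return cost_ms / frame_budget_ms(fps)


for fps in (30, 60, 120):
    for cost_ms in (2.0, 4.0):  # hypothetical tensor-unit vs. DP4A cost
        share = budget_share(cost_ms, fps)
        print(f"{fps:>3} fps, {cost_ms:.0f} ms pass: {share:.0%} of the frame budget")
```

    At 120 fps the 4 ms pass eats close to half of the roughly 8.3 ms budget, while at 30 fps it is a small slice of about 33.3 ms.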
     
  16. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    605
    Likes Received:
    1,125
    TAAU in UE4 costs less than 1ms for 1080p -> 2160p on a 3090. The difference between a 4ms ML and a 1ms TAAU approach will never be worth the huge performance impact.
     
  17. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    IIRC on a 3090 the cost of DLSS is very close to the cost of TAAU.
     
    Deleted member 13524 likes this.
  18. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    That 4ms approach becomes the new 1ms approach as the hardware improves, though, until we get no more returns from method improvements. If we assume that current methods are already there then yes, a 3090 might well be beyond the point where tensor cores add much for temporal sampling/upscaling, while it's the 3060/3070 range that benefits most. Personally I believe there's a huge unexplored space of ML based methods for 3D rendering that would benefit from lower precision matmul acceleration, yet it's rather difficult to predict what might come out of it.
     
    Dictator likes this.
  19. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,032
    Likes Received:
    3,428
    Also, to get the same IQ, TAAU may need to go from, say, 1440p -> 2160p rather than from 1080p.
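
    A rough way to put numbers on that trade-off, sketched in Python. It assumes render cost scales linearly with pixel count and reuses the 4 ms (ML) and 1 ms (TAAU) upscaler costs floated earlier in the thread as placeholders. The per-megapixel cost is a free parameter, not a measurement.

```python
def megapixels(width, height):
    """Pixel count of a render target, in millions."""
    return width * height / 1e6


def total_ms(per_megapixel_ms, width, height, upscaler_ms):
    """Toy cost: pixel-proportional render time plus the upscaler pass."""
    return per_megapixel_ms * megapixels(width, height) + upscaler_ms


def break_even_per_mp_ms(ml_ms, taau_ms):
    """Per-megapixel render cost at which ML from 1080p ties TAAU from 1440p."""
    extra_pixels = megapixels(2560, 1440) - megapixels(1920, 1080)
    return (ml_ms - taau_ms) / extra_pixels


threshold = break_even_per_mp_ms(ml_ms=4.0, taau_ms=1.0)
print(f"break-even render cost: {threshold:.2f} ms per megapixel")

# Above the threshold, the pricier upscaler fed from 1080p is cheaper overall.
print(total_ms(2.0, 1920, 1080, 4.0), total_ms(2.0, 2560, 1440, 1.0))
```

    Under these assumptions the extra 3 ms pays for itself once rendering costs more than about 1.86 ms per megapixel, because 1440p has roughly 1.6 million more pixels to shade than 1080p.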
     
  20. Dangerman

    Newcomer

    Joined:
    Apr 1, 2014
    Messages:
    43
    Likes Received:
    8
    I'm mostly wondering about the roadmap of Alchemist > Battlemage > Celestial > Druid. Annual releases, perhaps? Given the rumored CPU roadmap from Alder Lake to Nova Lake in 2025, I wonder whether Druid and Nova Lake (on the Intel 20A/18A process?) will mark the start of annual CPU+GPU releases, where Intel aims not simply to beat AMD in performance per watt but also to take on Apple.
     