Intel ARC GPUs, Xe Architecture for dGPUs

Discussion in 'Architecture and Products' started by DavidGraham, Dec 12, 2018.

Tags:
  1. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    ODI is very not soon(TM).
    Also Co-EMIB is not ODI.
     
  2. Those are excellent results, and they should give Renoir a hard time.

    I wonder how the 48 EU Tiger Lake is so much better than the 64 EU Ice Lake. Maybe the Xe EUs are wider?
     
  3. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
  4. Dayman1225

    Newcomer

    Joined:
    Sep 9, 2017
    Messages:
    77
    Likes Received:
    169
    Raja has posted photos of what appear to be three separate Xe-HP based GPUs.


    We have seen the one on the left-hand side before, but the smaller and larger ones are new.

    Many are speculating that the larger one is a 4 Tile Arctic Sound GPU
    And this picture here that Raja posted seems to prove that:
    [attached image: upload_2020-6-26_22-4-47.png]
    ATS = Arctic Sound
    4T = 4 Tiles.

    Raja also gave a vague hint on performance:


    Almost 1 PetaOps, many assume this is INT8.
     
    tinokun, Lightman and digitalwanderer like this.
  5. 256 TOPS per Arctic Sound chip.
    If that's INT8, then 128 TFLOPs FP16, 64 TFLOPs FP32?

    Naah, way too much. Unless it's using some dedicated tensor units and those are matrix operations like nvidia's hardware.
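
    The arithmetic behind these guesses is easy to check (a quick sketch; the rule that throughput halves each time operand width doubles, and the 4-tile count, are assumptions taken from the posts above, not confirmed figures):

    ```python
    # Back-of-envelope check of the speculated Arctic Sound numbers
    # (assumption: throughput halves as operand width doubles, as on
    # most GPU ALUs without dedicated matrix units).
    int8_tops_per_chip = 256            # speculated per-chip INT8 rate
    tiles = 4                           # "ATS 4T" = 4-tile Arctic Sound

    fp16 = int8_tops_per_chip / 2       # 128 TFLOPS FP16 per chip
    fp32 = int8_tops_per_chip / 4       # 64 TFLOPS FP32 per chip
    total = int8_tops_per_chip * tiles  # 1024 TOPS, i.e. "almost 1 PetaOps"

    print(fp16, fp32, total)
    ```

    Four tiles at 256 INT8 TOPS each would land at 1024 TOPS, which matches the "almost 1 PetaOps" hint, and the 128/64 TFLOPS figures are what you get if the same units run FP16/FP32. If tensor-style matrix units are involved, the scalar FP rates would be much lower.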
     
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    They could very well have some tensor units in there, or some other means to run low precisions at much higher rate.
    These chips fit nicely with the old leak too
    [attached image: upload_2020-6-27_5-11-3.png]
     
    Dayman1225 and Lightman like this.
  7. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    https://newsroom.intel.com/press-kits/architecture-day-2020

    All kinds of details; also, the gaming GPUs will be "Xe-HPG" and they'll be made at an external foundry (read: TSMC).
    In Xe-LP they've gone from Gen11's 4 FP/Int + 4 FP/ExtMath pipes to 8 FP/Int + 2 ExtMath pipes, and two EUs now share a thread controller. 6 texturing units capable of 48 texels/clock, and 24 ROPs, for 96 EUs.

    edit:
    Also, Xe-HP FP32 FLOPS: 1 tile ~10.6 TFLOPS, 2 tiles ~21.2 TFLOPS (1.999x) and 4 tiles ~42.3 TFLOPS (3.993x). 512 EUs running at 1.3 GHz
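
    The per-tile number lines up with a simple throughput formula (a sketch; the 8-wide FP32 SIMD per EU and 2 ops per FMA are assumptions carried over from the Xe-LP EU description, not confirmed Xe-HP specs):

    ```python
    # Sanity check of the quoted Xe-HP figures.
    # Assumed formula: FLOPS = EUs x FP32 lanes per EU x ops per FMA x clock.
    eus_per_tile = 512
    lanes_per_eu = 8        # 8-wide FP32 SIMD, as on the Xe-LP EUs
    ops_per_fma = 2         # fused multiply-add counts as 2 FLOPs
    clock_hz = 1.3e9

    per_tile_tflops = eus_per_tile * lanes_per_eu * ops_per_fma * clock_hz / 1e12
    print(per_tile_tflops)  # ~10.65, matching the quoted ~10.6 TFLOPS

    # The 2- and 4-tile figures (21.2 and 42.3 TFLOPS) are 1.999x and
    # 3.993x the 1-tile figure, i.e. near-perfect tile scaling.
    ```

    Whatever the exact formula, the quoted scaling factors suggest the multi-tile configurations lose almost nothing to the tile interconnect on this workload.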
     
    Could be Samsung. TSMC will be making their CPUs, but OTOH TSMC seems pretty full with orders.
     
  9. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,987
    Likes Received:
    3,529
    Location:
    Winfield, IN USA
    Well Raja sure seems excited, so you know it's gonna bomb! :D
     
    Kej, Lightman, Cuthalu and 6 others like this.
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    I am not expecting the gaming chips (Xe-HPG) to provide any stellar performance competitive with AMD or NVIDIA, since they rely on the same scalability scheme as Xe-HPC, i.e. racking up several graphics tiles to scale up core count. This will be a mess for drivers and games in general.
     
  11. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    No, those are single dies packed in organic carriers.
    The actual IP is just subpar.
     
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    The architecture relies purely on software scoreboarding (software schedulers), which means Intel will have its hands full writing good drivers to achieve good utilization (VLIW5 days, anyone?). On top of that, they are scaling it up through tiling (a multi-core/die approach), which is going to be a nightmare to write drivers for and to extract good performance from.

    This is literally the laziest effort at making a new GPU in recent memory.
     
  13. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Removing scoreboarding from hardware enabled the power nightmare that was Fermi to become the somewhat efficient Kepler (among others of course). So, at least for the starting point, which is integrated graphics, that step totally makes sense. And Intel has a ton of software people (idk though if they are necessarily good at gfx driver compilers).
     
    Lightman likes this.
  14. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Not really.
    Nah, QC takes the cake.
    Their s/w top talent is very much compiler people so this move isn't unwarranted.
    Unfortunately the IP is still "meh" at best and they clearly lack focus.
    Like dear god, what, 4 flavors of Gen12?
    Why even.
     
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    It is also one of the reasons why Kepler sucks in modern games, years after its drivers reached end-of-life status. In addition to its weird FP32 unit arrangement, it required very high effort in writing compilers, which didn't really help it in the long run.

    Fermi had trouble with the 40nm process; that was the main reason for its power-hungry status, not hardware schedulers. Tesla and G80 had them before, and they were not power-hungry chips. Furthermore, Kepler didn't remove them completely, and if I recall correctly, most elements of hardware scheduling came back in Maxwell, Volta and Turing.
     
  16. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Thanks for shortening my quote to fit your narrative. I explicitly said "(among others of course)". And yes, Kepler had some failsafe mechanism to keep things in check in case the SW did not work that well.
     
  17. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France

    Maybe wait to see real performance before being so definitive?
     
  18. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    +20% over Vega8 so I don't even.
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    We have a long history of GPU architectures to judge and forecast performance from. Nothing is affirmed, of course, but it's worth going through the motions to predict where performance will lie given what we already know from past experience.

    Furthermore, Xe-LP still retains the abysmal max 1 primitive per clock rate, and worse yet, it lacks all of the DX12U features except hardware RT.

    Intel removed hardware scoreboarding after Gen11, where it wasn't really that effective to begin with. Gen11 had one Thread Control unit handling 2 ALUs, and each ALU had control over 4 FP32 instructions, so in total each Thread Control unit had access to 8 FP32 instructions, which I would call a pretty weak arrangement to begin with. Intel didn't change this arrangement in Xe-LP; instead it allowed each Thread Control unit to supervise 16 FP32 instructions, further weakening their already weak position.
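
    The tally behind that complaint can be sketched as follows (illustrative numbers taken from the post above, not official Intel figures; "instructions" is read loosely as FP32 lanes per controller):

    ```python
    # FP32 work supervised per Thread Control unit, per the post's description.
    gen11_alus_per_tc = 2
    gen11_fp32_per_alu = 4
    gen11_per_tc = gen11_alus_per_tc * gen11_fp32_per_alu    # 8 per TC

    # Xe-LP: the EU pipes widened to 8 FP/Int, still 2 EUs sharing one
    # thread controller, so each controller now supervises twice the work.
    xelp_eus_per_tc = 2
    xelp_fp32_per_eu = 8
    xelp_per_tc = xelp_eus_per_tc * xelp_fp32_per_eu         # 16 per TC

    print(gen11_per_tc, xelp_per_tc)
    ```

    The point of contention is that doubling the lanes per controller without adding control resources makes each Thread Control unit's scheduling job harder, not easier.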
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.