Intel Gen9 Skylake

Discussion in 'Architecture and Products' started by Paran, Aug 5, 2015.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Page 21 has some figures for preemption between the generations, which looks like it could be the switching delay for task preemption between a thread group and mid-thread preemption.
    The "before" time periods are reminiscent of some of the early evaluations of PS4 GPU audio compute by Sony's audio engineer, with App 2 being in the same range. I would imagine he'd have been happier if the numbers given back then had been closer to Gen9's usecs rather than the tens of msecs that Gen8 and the PS4 GPU (loaded latency) provided at the time.
     
  2. So Skylake's GT4e is already in the same ballpark as the 8th generation of consoles, with a fraction of the TDP.

    That was fast.
     
  3. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Yes, the new 72 EU GPU is getting close to consoles in performance. If there still are developers that do not validate and optimize their games for Intel GPUs, now is the time to start doing it. Intel is definitely now a competitor in the laptop gaming market :)

    GT4e requires an external EDRAM die, so it is more expensive to manufacture than the single chip console solutions. But Intel CPUs have huge (L3 and L4) shared caches and much faster CPU than the consoles... And TSX. AVX-512 is the best SIMD instruction set I have seen for ages. It would be nice to have all these goodies on consoles :)
     
  4. The SoC+eDRAM chips are probably more expensive (they're a lot smaller, but the cost of using 14nm is probably higher?).

    But the whole system may actually be cheaper to make? For starters, its peak performance is achieved using a much narrower system memory bus: 128-bit 2133MHz DDR4 on Skylake vs. 256-bit 2133MHz DDR3 on the XBone. This means a less complex PCB, fewer memory chips and lower power consumption.
    And then the fact that it consumes less power and generates less heat means they can save on the power regulation and the cooling system.
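
A quick sanity check of the bus comparison above, as a minimal Python sketch. It assumes the quoted "2133MHz" figures are transfer rates (2133 MT/s) and looks only at peak theoretical system-memory bandwidth; the eDRAM is deliberately left out. The function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, mega_transfers_per_s):
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Figures quoted in the post: same transfer rate, half the bus width.
skylake_ddr4 = peak_bandwidth_gb_s(128, 2133)
xbone_ddr3 = peak_bandwidth_gb_s(256, 2133)

print(f"Skylake 128-bit DDR4-2133: {skylake_ddr4:.1f} GB/s")
print(f"XBone   256-bit DDR3-2133: {xbone_ddr3:.1f} GB/s")
```

Under those assumptions the narrower bus has exactly half the peak system-memory bandwidth, which is the gap the eDRAM would have to cover.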

    Of course, this is highly speculative and it'll always be. As they stand right now, Intel would never subject their chips to the low-margin deals that a console requires.
     
  5. Newguy

    Regular

    Joined:
    Nov 10, 2014
    Messages:
    263
    Likes Received:
    122
    It's probably even higher considering they used a 1GHz clock as the base; doesn't Intel usually use roughly 1100MHz-1300MHz clocks for its higher-end iGPUs? Very impressive performance, I'm sure.
     
  6. Kaarlisk

    Regular Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Depends on whether it will be power limited. If there is a 95W TDP GT4e, it might run at the full 1300.

    Yup. The premium for GT4e (instead of GT2) may be as low as $50-$100. That may actually make it competitive with quad+discrete. Before Broadwell-C, the Haswell-R wasn't socketed and the systems it was in were too expensive.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    This came up in the 16-core thread, but it seems like Intel has bifurcated its core line, with one path being client and the other being server. There is a slide in one of the latest IDF presentations to the effect that cores are being specialized between the two markets.
    Is there evidence that the client line, and not just the server Xeon line, will get AVX-512? So far it has only come up for Xeon and Xeon Phi. Perhaps it's AVX 512/2?
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    The fastest existing i7 consumer Skylake only supports AVX2 (http://ark.intel.com/products/88195/Intel-Core-i7-6700K-Processor-8M-Cache-up-to-4_20-GHz). So I assume there will not be AVX-512 in any i7 models. I believe that workstation Xeons (4-8 cores) have AVX-512 (in addition to big server chips). But there is no official info yet.
     
  9. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    IMHO it still has far too many DP FLOPs, and other modern GPUs tend to agree. I'd easily trade another factor or two in DP FLOPs for more SP/3D throughput. Frankly I see this the other way around - if you really need heavy DP (and I still argue in most cases you don't, or only for a small part of the computation) then there are many suitable targets, including, as you note, CPUs, which are quite competent. Xeon/Xeon Phi if you want to go nuts :)
     
    sebbbi and BRiT like this.
  10. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    Which slide is this?
     
  11. Paran

    Regular

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
  12. Kaarlisk

    Regular Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    If it is just a GPU, absolutely.
    However, one might also position this as an alternative to AVX-512:
    On server CPUs, it is more efficient to implement high-throughput units within the cores, as they do not need an iGPU.
    On client CPUs, as they already have the GPU, it is cheaper to add DP capability to the GPU instead of implementing AVX-512.
    So the issue of DP not being exposed through OpenCL may be even more important, as it limits developers when experimenting and figuring out whether they have a use for the GPU for pure compute applications.

    Slide 10
     
    iMacmatician likes this.
  13. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    GPUs are many things, but drop-in replacements for established CPU SIMD/vector bits they are not. Also I share Andrew's opinion that people frequently misuse doubles i.e. "it makes the bug go away, clearly I need MOAR PRECISION". What would be beneficial is a better understanding and mastery of numerical analysis, as opposed to more double throughput to be thrown at the problem.
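
To illustrate the numerical-analysis point, here is a minimal Python sketch (not from the thread) that adds 0.1 one hundred thousand times in emulated single precision, first naively and then with Kahan compensated summation. Kahan summation is just one example technique chosen for illustration; the helper names `f32`, `naive_sum` and `kahan_sum` are made up, and single precision is emulated by rounding every intermediate result through `struct`.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum(values):
    """Plain running sum with every intermediate rounded to float32."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def kahan_sum(values):
    """Kahan compensated sum, also entirely in (emulated) float32."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(v - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total

values = [f32(0.1)] * 100_000  # ideal sum is 10000
naive = naive_sum(values)
kahan = kahan_sum(values)

print(f"naive float32 sum: {naive!r}")
print(f"Kahan float32 sum: {kahan!r}")
```

The compensated sum stays in single precision and still lands far closer to 10000 than the naive loop, so the fix here is a better algorithm rather than wider floats.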
     
  14. Kaarlisk

    Regular Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Still, isn't there at least some overlap in possible applications?

    I certainly did not mean to dispute this.
     
  15. entity279

    Veteran Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,332
    Likes Received:
    500
    Location:
    Romania
    1/4 DP rate is a very decent rate for a consumer GPU. AFAIR, 1/4 should mean no investment into DP and no crippling either. 1/2 would have needed some more hardware to be added by Intel.

    Intel, and I expect the great majority of programmers, would rather see some investment in extra hardware on the CPU vector extension side, as those extensions are far more widely applicable.
     
  16. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    1/4 DP rate is perfect. I don't understand the need for 1/2 rate DP. A good programmer analyses his data and formulas and knows where the extra precision is important and where it doesn't give any improvement. Even if you need DP, an optimized program should be a mix of 32 bit float, 64 bit float and 16/32 bit integer. In consumer applications (such as games and image processing), you get very good results by mixing 32 bit floats and 16 bit floats (and some 16/32 bit integers of course).

    In games (and image processing applications) we analyze our data carefully and frequently use formats such as 8/16 bit normalized (fixed point) and 10/11/16 bit float to optimize our memory bandwidth. Double rate 16 bit float processing (and halved GPR cost of 16 bit registers) is much more important than DP for us. Intel's fp16 implementation is now top notch. This, combined with the 2x improved integer processing rate (in Broadwell) and 1/4 rate DP (which is better than on other consumer GPUs), makes their GPUs very good for multiple purposes, both consumer workloads and workloads that need DP. For brute force pure DP workloads you should consider a Tesla (1/3 DP rate) or a Xeon Phi.
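
For a rough sense of what those rate ratios mean in absolute terms, a small Python sketch follows. The 72 EU count, the 1 GHz clock, the 1/4 DP rate and the double-rate fp16 come from this thread; the figure of 16 single-precision FLOPs per clock per EU is an assumption added here, as is the function name, so treat the output as back-of-envelope only.

```python
def peak_gflops(eus, flops_per_clock_per_eu, clock_ghz, rate=1.0):
    """Peak theoretical GFLOPS, scaled by a rate relative to fp32."""
    return eus * flops_per_clock_per_eu * clock_ghz * rate

EUS = 72             # GT4e EU count mentioned in the thread
SP_FLOPS_PER_EU = 16 # ASSUMPTION: fp32 FLOPs per clock per EU
CLOCK_GHZ = 1.0      # base clock mentioned in the thread

fp32 = peak_gflops(EUS, SP_FLOPS_PER_EU, CLOCK_GHZ)
fp64 = peak_gflops(EUS, SP_FLOPS_PER_EU, CLOCK_GHZ, rate=0.25)  # 1/4 rate DP
fp16 = peak_gflops(EUS, SP_FLOPS_PER_EU, CLOCK_GHZ, rate=2.0)   # double rate fp16

for name, value in (("fp16", fp16), ("fp32", fp32), ("fp64", fp64)):
    print(f"{name}: {value:.0f} GFLOPS peak")
```

The point of the ratios is the spread: under these assumptions, moving work from fp64 to fp16 where precision allows is an 8x difference in peak rate.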
     
    Kaarlisk likes this.
  17. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Does anyone know how many ROPs and texture units are in GT2?
     
  18. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    8 pixels per clock per slice (with 1 slice for GT2).
     
    pjbliverpool likes this.
  19. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Thanks, that's more than I was expecting, so potentially 24 ROPs for GT4e. Accounting for the higher clock speed, that's in PS4 fill rate territory!
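
The fill rate comparison can be sketched in a few lines of Python. The 8 pixels per clock per slice and the 1 GHz to 1.3 GHz clock range come from this thread, and 3 slices is what the 24 ROP estimate implies. The PS4 figures (32 ROPs at 800 MHz) are commonly cited numbers that do not appear in this thread, so treat them as an assumption.

```python
def fill_rate_gpix_s(pixels_per_clock, clock_ghz):
    """Peak theoretical pixel fill rate in Gpixels/s."""
    return pixels_per_clock * clock_ghz

PIXELS_PER_CLOCK_PER_SLICE = 8  # per the post above
GT4E_SLICES = 3                 # implied by the 24 ROP estimate

gt4e_pixels = PIXELS_PER_CLOCK_PER_SLICE * GT4E_SLICES
gt4e_low = fill_rate_gpix_s(gt4e_pixels, 1.0)   # 1 GHz base clock
gt4e_high = fill_rate_gpix_s(gt4e_pixels, 1.3)  # if it can hold 1300 MHz

# ASSUMPTION: PS4 figures are not from this thread.
ps4 = fill_rate_gpix_s(32, 0.8)

print(f"GT4e @ 1.0 GHz: {gt4e_low:.1f} Gpix/s")
print(f"GT4e @ 1.3 GHz: {gt4e_high:.1f} Gpix/s")
print(f"PS4  @ 0.8 GHz: {ps4:.1f} Gpix/s")
```

These are peak numbers only; sustained fill rate also depends on memory bandwidth, which this sketch ignores.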
     
  20. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    BRiT likes this.