Intel i9 7900x CPUs

Discussion in 'PC Industry' started by Davros, May 29, 2017.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,982
    Likes Received:
    2,427
    Location:
    Well within 3d
    From the high level summary of the caching model, it would be undesirable for the L2 to have a line populated and accessible before the L3 and its coherence/snoop information is populated with a state consistent with the L2's status and core-use information.

    The L2 prefetcher's fetches create an L2 miss, which I think in a more straightforward implementation would then go to the L3 to see if the data is there.
    If not there, then the L3 slice could generate an L3 miss that would then send a request to memory or broadcast a request if it is an SMP setup.
    The listed behavior would readily fall out of this chain if the sequence is maintained, and the rule is that the prefetcher's miss can be discarded and the L3 slice's miss cannot.

    I'm not sure if that means a message or signal is sent out to actively cancel the L2's request or the L2 makes a note of ignoring or rejecting it. Ignoring it might work since Intel's level of inclusion is not total, and the cores can silently evict lines without telling the L3. This would be like a preemptive eviction of a line. At worst, that leads to redundant snoops or invalidates that yield nothing.

    That might not be what the hardware necessarily does. There are events separated by variable amounts of time, and it may be possible to shift events around or bypass stages as long as the arbitrating hardware properly isolates intermediate states or recovers from problems. For Intel's inclusive L3, the slices each have agents that manage that arbitration, so it would seem like the L3 would be the nearest place to update when the caching agent starts processing the transaction.
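The chain described above can be sketched as a toy model. This is only an illustration of the ordering rule as stated in the post (the L2's prefetch miss may be dropped, the L3 slice's miss may not); the class and method names are hypothetical, and it makes no claim about what Intel's hardware actually does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHierarchy:
    """Toy model of one L2, one L3 slice, and memory. Not a real protocol."""
    l2: set = field(default_factory=set)
    l3: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def l2_prefetch(self, line, l2_drops_request=False):
        # Step 1: the L2 prefetcher's fetch looks up the L2.
        if line in self.l2:
            self.log.append("L2 hit")
            return
        self.log.append("L2 miss (prefetch)")
        # Step 2: the miss goes to the L3 slice; an L3 miss goes on to memory.
        if line in self.l3:
            self.log.append("L3 hit")
        else:
            self.log.append("L3 miss -> memory request")
            # Assumed rule: the L3 slice's miss is never discarded.
            self.l3.add(line)
            self.log.append("L3 fill")
        # Step 3: the L2 may discard its own prefetch miss.
        if l2_drops_request:
            self.log.append("L2 prefetch dropped")
        else:
            self.l2.add(line)
            self.log.append("L2 fill")


h = ToyHierarchy()
h.l2_prefetch(0x40, l2_drops_request=True)  # line ends up in the L3 only
h.l2_prefetch(0x80)                         # line ends up in both levels
print(h.log)
```

Because the L3 is always filled before the L2 in this sketch, the L2 never holds a line the L3 does not know about, which is the property the first paragraph asks for.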

    I'm not sure if any of the mentions yesterday were meant to have gone through or if mentions don't archive, since my list doesn't seem to have any from this thread.
     
    #141 3dilettante, Jul 20, 2017
    Last edited: Jul 20, 2017
    CarstenS likes this.
  2. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    898
    Likes Received:
    366
    Be careful with the OC...

    [IMG]

    That was 1.25v on a 7800X
     
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,737
    Likes Received:
    1,970
    Location:
    Germany
    There's something else wrong with that CPU/board then. 1.25 V is the default VID for 4.5 GHz 2c-TBM3 in SKX.
     
    BRiT likes this.
  4. Clukos

    Clukos Bloodborne 2 when?
    Veteran Newcomer Subscriber

    Joined:
    Jun 25, 2014
    Messages:
    4,426
    Likes Received:
    3,737
  5. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    Meanwhile I also got a retail 7820X octa core.
    I can confirm it indeed has both AVX-512 FMA units enabled.

    Here is an AVX-512 Julia/Mandelbrot real-time zoomer that I've brought up to date.
    Computations are done in double precision.
    Compared to a Titan XP, it runs twice as fast :)

    Warning: when running all 8 cores at 4 GHz, CPU power is up to 208 W!
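For a sense of scale, the theoretical double-precision peak implied by those figures can be worked out as below. This is a back-of-envelope sketch, not a measurement: the function name is hypothetical, and it assumes every FMA unit issues one 512-bit FMA per cycle at the quoted clock.

```python
def peak_dp_gflops(cores, fma_units_per_core, clock_ghz, vector_bits=512):
    """Theoretical peak double-precision GFLOPS, assuming one FMA per unit per cycle."""
    lanes = vector_bits // 64  # 64-bit doubles per vector register
    flops_per_fma = 2          # one multiply plus one add per lane
    return cores * fma_units_per_core * lanes * flops_per_fma * clock_ghz


# 8 cores, both FMA units enabled, 4 GHz under AVX-512 load
print(peak_dp_gflops(cores=8, fma_units_per_core=2, clock_ghz=4.0))  # 1024.0
```

Real code will land below this number, since it also has to load data, compare against the escape radius, and count iterations.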

    Edit
    - Replaced with a slightly less optimized version to reduce the heat: 10% less heat and speed.
    (My cooler can't cope, with the CPU at ~100 degrees Celsius.)
    - Added a fallback to AVX2 if no AVX-512 is present
    - Added a missing libmmd.dll (I had to use an Intel compiler and didn't find a way to get rid of this DLL)
     
    #145 Voxilla, Jul 28, 2017
    Last edited: Jul 29, 2017
    Laurent06, pharma, Alexko and 4 others like this.
  6. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,194
    Likes Received:
    397
    Location:
    Romania
  7. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    Very interesting! What happens if you keep the optimized code path, but downclock and undervolt the CPU a bit?
     
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,737
    Likes Received:
    1,970
    Location:
    Germany
    4 GHz under AVX-512 load, with the UEFI leaving power unlimited, sounds very much like the MSI X299 board. Other boards enforce the 140 W TDP, downclocking the 7900X, for example, to 3.1-3.2 GHz in AVX-512 loads.
     
    Lightman likes this.
  9. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    I'll be adding the fully optimized version, so it can be tried on CPUs with safer settings.
    The additional optimization is 4-way interleaving of computations. The fractal computations are one long dependency chain, and the FMAs have 4 or 6 cycles of latency. Interleaving and SMT mitigate the dependencies.
    The less optimized version does only 2-way interleaving.
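The idea of interleaving can be sketched as follows. This is not the zoomer's actual code (which is not shown in the thread) but a minimal illustration with hypothetical function names: several independent points are advanced in lockstep, so each loop iteration contains several FMA chains that do not depend on each other. Python will not show any speedup; only the structure of the computation carries over to SIMD code.

```python
def step(zr, zi, cr, ci):
    """One iteration z -> z*z + c, split into real and imaginary parts."""
    return zr * zr - zi * zi + cr, 2.0 * zr * zi + ci


def escape_count(cr, ci, max_iter=64):
    """Reference version: one point, one long dependency chain."""
    zr = zi = 0.0
    for n in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return n
        zr, zi = step(zr, zi, cr, ci)
    return max_iter


def escape_counts_interleaved(points, max_iter=64):
    """Advance all points in lockstep (4-way interleaving for 4 points)."""
    ways = len(points)
    zr = [0.0] * ways
    zi = [0.0] * ways
    counts = [max_iter] * ways
    active = [True] * ways
    for n in range(max_iter):
        # In hand-written SIMD code this inner loop would be unrolled,
        # giving the core several independent FMA chains to overlap.
        for k in range(ways):
            if not active[k]:
                continue
            if zr[k] * zr[k] + zi[k] * zi[k] > 4.0:
                counts[k] = n
                active[k] = False
            else:
                zr[k], zi[k] = step(zr[k], zi[k], points[k][0], points[k][1])
        if not any(active):
            break
    return counts


points = [(0.0, 0.0), (2.0, 2.0), (-0.75, 0.1), (0.3, 0.5)]
print(escape_counts_interleaved(points))
```

As a rough rule of thumb, keeping a pipelined unit busy takes about latency times issue width independent operations in flight. With a 4-cycle FMA latency and two FMA units that is around 8 per core, which 4-way interleaving on two SMT threads would supply; treat that as an estimate rather than a statement about the actual implementation.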

    I'd like to keep my CPU at 4 GHz for AVX-512; it is only a problem for this extreme kind of code.
     
    Alexko and Lightman like this.
  10. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    Indeed, the board is an MSI X299 Tomahawk. I'm running with Enhanced Turbo on, which means all cores normally run at 4.3 GHz. AVX-512 would not run at that frequency. To fix that I set 'AVX offset' to -3, which causes the frequency to be reduced to 4 GHz when running AVX/AVX-512.
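The arithmetic behind that setting, as a small sketch. It assumes the usual 100 MHz base clock and that each offset step lowers the multiplier by one bin; the function name is hypothetical and nothing here reads or writes actual UEFI settings.

```python
def avx_clock_mhz(turbo_multiplier, avx_offset_bins, bclk_mhz=100):
    """Clock under AVX load when the multiplier drops by avx_offset_bins."""
    return (turbo_multiplier - avx_offset_bins) * bclk_mhz


# 4.3 GHz all-core turbo (multiplier 43) with an AVX offset of 3 bins
print(avx_clock_mhz(43, 3))  # 4000, i.e. 4.0 GHz
```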
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,737
    Likes Received:
    1,970
    Location:
    Germany
    I see. 4.0 GHz is still the all-core turbo and not what non-insane UEFIs use. No wonder you're having problems cooling that amount of heat with air. :)
    Any chance you could make that more optimized torture version of your Mandelbrot/Julia renderer available again? And does it tax GPUs equally heavily? For now, your Waves3D is hammering GPUs the most, even though it largely depends on bandwidth.
     
    Lightman likes this.
  12. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    I'll be adding the fully optimized version tonight.
    The CPU and GPU code are very similar. On GPUs there is no explicit interleaving of computations but I would think the inherent threading takes care of FMA dependencies, so it's likely optimal on GPUs too.
     
  13. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    OK, I've updated the AVX2 / AVX-512 / GPU fractal zoomer to include the fastest AVX-512 computation.
    This can be toggled on/off with the 'F' key. You may have to disable waiting for vsync to see the difference (V key).
    Warning: this code can produce extreme heat, even more than Prime95. Use at your own risk!
     
    Alexko, Lightman, BRiT and 2 others like this.
  14. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,737
    Likes Received:
    1,970
    Location:
    Germany
    Thanks! If I find the time, I'll test it against my current worst case tomorrow (but with a more tame UEFI that's honoring the 140 Watt TDP - I'm measuring achieved clock rates here instead) :)
     
    Alexko likes this.
  15. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,283
    Location:
    Helsinki, Finland
    http://www.anandtech.com/show/11687/coffee-lake-not-supported-by-intels-200series-motherboards

    The forthcoming Coffee Lake (6-core / 12-thread non-HEDT consumer chips) needs new motherboards. This makes HEDT 6/8-core and Ryzen much more appealing upgrade options for many consumers, since you can't simply plug the new 6-core Coffee Lake into your existing Skylake 6600K/6700K socket. Someone needs to update the Wikipedia page (https://en.wikipedia.org/wiki/LGA_1151).

    I was also considering the highest-clocked 6-core Coffee Lake as a cost-effective upgrade path for our non-programmers (we all have Skylake 6700Ks now). I will get myself a Threadripper in any case, but now it seems that Threadripper would be a pretty good upgrade path for all of us (that 12-core / 24-thread model at $799 is very aggressively priced).
     
    #155 sebbbi, Aug 3, 2017
    Last edited: Aug 3, 2017
    DavidGraham, Lightman, Kyyla and 2 others like this.
  16. Malo

    Malo YakTribe.games
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    6,557
    Likes Received:
    2,591
    Location:
    Pennsylvania
    Gee what a surprise.
     
  17. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    689
    Likes Received:
    242
    What do you use the large number of threads for?
     
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,283
    Location:
    Helsinki, Finland
    A UE4 code recompile takes 25 minutes on a 6700K. A shader recompile (console target) takes over an hour (UE4 has so many shader permutations). Data cooking is also slow on a quad-core (I have a fast SSD, obviously). With many console platforms + PC + debug/release, there are plenty of these operations happening. A quad-core loses you 30+ minutes every day, and more than an hour on bad days.
     
    Lightman, Alexko, DavidGraham and 4 others like this.
  19. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    What would non-programmers do with all that power?
     
  20. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    11,050
    Likes Received:
    6,732
    Location:
    Cleveland
    Play Crysis ...
     