Intel Gen9 Skylake

Discussion in 'Architecture and Products' started by Paran, Aug 5, 2015.

  1. Turtle 1

    Newcomer

    Joined:
    Oct 21, 2005
    Messages:
    77
    Likes Received:
    0
    Location:
    Mapleview MN
    I understand what you're trying to do with the highest supported memory speed, but these are K chips: enthusiast parts. People buy these to O/C, and the highest supported memory speed has really nothing to do with anything. The boards support XMP, right? So they support higher memory speeds. Latency means a lot. My really old DDR1 BH1 timing was 2-2-2-2-2, and it matters. I ran those timings while overclocking the memory; as far as I know I was the only person to run those timings above 20 O/C. I also held the record at HWBot for a while; I had the gold. It's been 10 years; I believe it was 6,000 points on PC mark 6. About Sandy stock speed. I'm sure if I went and looked I could find it. But 6,000 back then had everyone shook up. I believe I gave that setup to an AT member.

    My thinking is: keep the CPU stock and run the memory as hard as it will go. That's what we all want to see. These are not the stock desktop parts. Who do you believe makes the better memory controller, Intel or AMD? If AMD runs the higher default, then match it on the Intel. These are performance chips; treat them as such. You got all the other chips to run at default. It's just that your results were the worst out of everyone's. If I were Intel, you would never get another K-type chip. Me and my wife's brother-in-law talked about just that earlier. You tried to O/C the CPU without upping the DRAM speed. Keeping the base memory speed, one would think your O/C would be better. I will buy that chip from you if it's retail. I'm pretty sure 4.7 would be easy.

    What a lot of people need to do is go back and reread Anand's SB review and the original IPC that was stated. This great SB chip was laughed at back then. Read the replies to that review. Everyone had it below 10% IPC; I believe AT had it the lowest. My chip was showing around a 16% IPC increase. Go to your forum; we had a big debate on it. Today, the way people talk, SB was the second coming. I had it right back then; everyone else missed big time.

    Same with Conroe: I hit the IPC right on the numbers, and that too is in your forum backlog. Go read the forum topic. I was banned for saying it would cream the AMD 64. That was after Intel had already shown us benchmarks 5 months earlier. I just figured Intel was telling the truth, and they were. Everyone else was saying Intel lied. Yet I got the ban hammer. I pretty much stay away from hardware forums nowadays. This release got me active. Let's see if there is an Angel canyon, as my wife's brother-in-law tells me.
     
  2. Turtle 1

    Newcomer

    Joined:
    Oct 21, 2005
    Messages:
    77
    Likes Received:
    0
    Location:
    Mapleview MN
    I would like to see the math on that, if you would. I'm a little confused here, as this DDR3 is at 1866 CL9 and the DDR4 is at 2133 CL15. That is exactly what others were saying, and that's how I read it. Did they make a mistake and correct it? Where did you get those numbers from? It's exactly the same as I had it. My memory is really good, and my reading comprehension is better yet. So latency was lower with DDR3 using 1866. But if they used 1600, OK; I believe there's more to it than that. I believe the same topic was going on at AT today, and they used a different formula. I stayed out of it as much as I could, as I am banned there. I deserved it, too. I told a mod in a PM to go screw himself and wished him harm, LOL. I was sick of my user name anyway. I still post there, as I have 4 landlines here, 1 in the house and 3 in the shop: 3 fiber optic and 1 DSL. I will go find their math and copy-paste the results here.
     
  3. Turtle 1

    Newcomer

    Joined:
    Oct 21, 2005
    Messages:
    77
    Likes Received:
    0
    Location:
    Mapleview MN
    I just took a quick look at AT's setup; here is what I found:
    Corsair DDR4-2133 2x8
    G.Skill DDR4-2133 2x8
    G.Skill DDR3-1866 4x4
    *Memory Timings used were the supported frequencies of each architecture,
    except DDR3L vs DDR4 testing, which used DDR3-1866 C9.
    For Skylake's DDR3L requirement, this was a DDR3 kit running with an undervolt to 1.42V.
    At 1.5V, the system failed to boot.


    Originally Posted by Walter E Kurtz
    Hardware Canucks uses DDR3 1866 Cas 11 and DDR4 2666 CAS 13(!). That is not even close. Hothardware and PcPer don't even post CAS latency under test system setup so god knows what they are testing. There are plenty of reviews where the memory difference between Haswell / Skylake comparison was reduced as much as possible, including here on anandtech showing the real IPC gain to be nowhere near the "usual" 10-15%
    Let's see what Anandtech said about their RAM, since you allege that Anandtech reduced difference in RAM "as much as possible".
    http://anandtech.com/show/9483/intel...h-generation/7
    How to measure performance, according to AT:
    Quote:
    Normally in our DRAM reviews I refer to the performance index, which has a similar effect in gauging general performance:
    DDR3-1600 C11: 1600/11 = 145.5
    DDR4-2133 C15: 2133/15 = 142.2
    As you have faster memory, you get a bigger number, and if you reduce the CL, we get a bigger number also. Thus for comparing memory kits, if the difference > 10, then the kit with the biggest performance index tends to win out, though for similar kits the one with the highest frequency is preferred.
    Performance index=frequency/CAS, supposedly.


    And now the RAM they chose:
    Quote:
    For these tests, both sets of numbers were run at 3.0 GHz with hyperthreading disabled. Memory speeds were DDR4-2133 C15 and DDR3-1866 C9 respectively.
    DDR4: 2133/15=142.2
    DDR3L: 1866/9=207.3
    A difference of 65 in favor of DDR3L



    Compare these to the "not even close" RAM that Hardware Canucks chose:
    DDR4: 2666/13=205.1
    DDR3L: 1866/11=169.6
    A difference of 35 in favor of DDR4


    It looks to me like Hardware Canucks' choice of RAM is actually significantly closer than Anandtech's.
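
    For anyone who wants to check the index arithmetic quoted in this post, here is a minimal Python sketch. It assumes only the "frequency divided by CAS" formula quoted from AT above; the function name `performance_index` and the dictionary layout are made up for illustration and are not anything AT publishes.

```python
def performance_index(transfer_rate_mts: float, cas_latency: int) -> float:
    """Index as quoted above: rated speed (MT/s) divided by CAS latency (cycles)."""
    return transfer_rate_mts / cas_latency


# (DDR3 kit, DDR4 kit) used by each site, as listed in this post.
pairs = {
    "AnandTech": ((1866, 9), (2133, 15)),
    "Hardware Canucks": ((1866, 11), (2666, 13)),
}

for site, (ddr3, ddr4) in pairs.items():
    pi_ddr3 = performance_index(*ddr3)
    pi_ddr4 = performance_index(*ddr4)
    # A positive gap favors the DDR4 kit, a negative gap favors the DDR3 kit.
    gap = pi_ddr4 - pi_ddr3
    print(f"{site}: DDR3 {pi_ddr3:.1f}, DDR4 {pi_ddr4:.1f}, gap {gap:+.1f}")
```

    The index is only a rough rule of thumb, so the gap says which pairing is better matched on paper, not how much real performance differs.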
     
  4. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Depends on how you define "a lot"; in almost all workloads, the fastest, most expensive, lowest-possible-latency boutique RAM will only buy you a few percent speed increase, because most memory accesses hit the CPU's caches and not main RAM. Most people wouldn't agree that's "a lot", and would think it a terrible waste of money (it is, btw).

    Your old DDR1 had only a couple of clocks of latency because the memory was clocked very slowly. Don't stare yourself blind at clock cycle counts; what really matters is the actual latency as measured in ns.
     
    RedVi and BRiT like this.
  5. gongo

    Regular

    Joined:
    Jan 26, 2008
    Messages:
    582
    Likes Received:
    12
    Guys... can we get back to snooping around with Skylake's dynamic clocks and FIVR (or the lack of it)?

    RAM talk is boring...
     
    pjbliverpool likes this.
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Some reviews mentioned backbuffer color compression for the Gen9 GPU. Color compression is huge for integrated GPUs, as the bandwidth of dual-channel DDR3/DDR4 is limited to 25.6 GB/s - 34.1 GB/s (a tenth of a high-end discrete GPU's). Color compression would be an easy 30%+ improvement for purely bandwidth-bound cases.
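
    The two bandwidth figures can be reproduced with the usual peak-bandwidth arithmetic. A minimal sketch, assuming they refer to DDR3-1600 and DDR4-2133 in dual channel with a 64-bit bus per channel (the post does not name the speed grades, so that pairing is an assumption); the function name is illustrative only.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float,
                       channels: int = 2,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed speed grades; chosen because they match the range quoted above.
for name, rate in (("DDR3-1600", 1600), ("DDR4-2133", 2133)):
    print(f"dual-channel {name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

    These are theoretical peaks shared between the CPU cores and the GPU, so the GPU's usable share is lower in practice.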
     
  7. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Sorry gongo.

    I was trying, from memory (and most probably failing), to approximate the latency to the critical first word. I didn't take the command rate into account since I didn't remember how. In addition, I just used the CAS latency since I don't remember the timing diagrams anymore. So I quickly did a 1/f to get the period and multiplied it by the CAS latency. If someone can point me to a good timing diagram or operational description for DDR3/4 DIMMs, it would be appreciated. Thanks.
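
    The period-times-CAS estimate described above can be sketched in a few lines of Python. It assumes the clock is half the rated transfer rate (DDR moves two transfers per clock) and, as in the post, ignores command rate and the other timings (tRCD, tRP), so it covers only the CAS portion of the access. The function name is made up, and the DDR-400 CL2 entry is an assumed stand-in for an old DDR1 kit, not a figure from the thread.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds: cycles multiplied by the clock period."""
    clock_mhz = transfer_rate_mts / 2      # two transfers per clock on DDR
    period_ns = 1000.0 / clock_mhz         # 1/f, converted to nanoseconds
    return cas_cycles * period_ns


kits = [
    ("DDR-400 CL2 (assumed DDR1 example)", 400, 2),
    ("DDR3-1600 CL11", 1600, 11),
    ("DDR3-1866 CL9", 1866, 9),
    ("DDR4-2133 CL15", 2133, 15),
]

for name, rate, cl in kits:
    print(f"{name}: {cas_latency_ns(rate, cl):.2f} ns")
```

    Under these assumptions a 2-cycle CAS at DDR-400 works out to the same order of nanoseconds as a 9-cycle CAS at DDR3-1866, which is why cycle counts alone are misleading.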

    Skylake's performance in some benchmarks is really impressive, especially given that Intel is still extracting IPC improvements from x86... I wonder how long that's going to last given the same bog-standard cache hierarchy? Can't wait for the 72 EU eDRAM Skylake benchies; I want a cheap laptop that I can play some games on. Does anyone know offhand how much eDRAM adds to the price of an Intel CPU compared with the same clock without eDRAM?
     
  8. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
  9. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    Nice find!

    The changes from Gen8 that grabbed my attention are:

    [​IMG]

    Preemption is interesting.

    But if the EU thread scheduling scheme in pre-Gen9 wasn't round-robin then what was it?
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,853
    Likes Received:
    4,463
    Isn't the GPU proportion getting bigger in each iteration?
    If that's a chip with 24 EUs, then the 72 EU chip will be almost twice as big, not to mention the eDRAM in a separate chip.
     
  11. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Yup.
    In Haswell, a GT2 GPU is approximately 3.4 times larger than a single CPU core.
    In Skylake, a GT2 GPU is approximately 5.4 times larger than a single CPU core.
    However, as always, the math might be different. And it may also be impossible to make the CPU core much larger.
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    I don't remember the old document describing it exactly. Previously one EU had 7 HW threads (waves) ready to execute (each of these was either SIMD8/16/32 or SIMD4x2). There were two SIMD4 execution units. On a single cycle the two execution units could not both take an instruction from the same HW thread. If I understood correctly, there were no other scheduling limitations. I didn't find any instruction latency chart either.

    My guess would be that the new hardware has stricter scheduling limitations, allowing a more efficient hardware implementation.
     
  13. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Another change: eDRAM is now a "memory-side cache", not a "victim cache". The eDRAM controller is now a part of the system agent (previously it was a separate stop on the ring).
     
    fellix and Lightman like this.
  14. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14

    35.2% for the GPU doesn't look much different to me. Assuming ~120 mm² is correct, the Gen9 GT2 is only about 42 mm².
     
  15. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    It was 31% of the die for the GPU in Haswell. There is rather a lot of unidentified space in the Skylake die image, unlike Haswell.

    Yup.
    Still, unlike the CPU cores, which shrank, the GT2 GPU has grown. Skylake GT2 is the first one that has become performant enough that it actually makes sense. Before Skylake, it was either don't care about graphics, so a GT1 will do, or go cheap discrete. Maybe except in the case of 4K desktop or something else specific.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Maybe it has to do with a flattening of the prioritization of instruction issue, which may help with quality of service and possibly preemption.
    Going further back: http://www.realworldtech.com/sandy-bridge-gpu/5/
    This description seems to indicate that a thread with sufficient priority is able to take successive issue cycles. This does sound more complex to manage than round-robin, and might allow a thread that didn't stall to dominate execution time, which might raise fairness issues.
     
    Kaarlisk likes this.
  17. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    That is my assumption as well. Round robin is simpler and fairer. And it gives the threads more time to hide instruction latency (assuming some complex instructions need this). The downside, of course, is that on average round robin finishes threads slightly slower, potentially causing slightly more resource and cache contention (depending, of course, on data access patterns).
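
    A toy model makes the "finishes threads slower on average" point concrete. This is only a sketch under loud assumptions: one issue slot per cycle, no stalls or instruction latency, and a "priority" policy that simply lets the highest-priority thread issue until it is done. None of this is taken from Intel's documentation, and the function names are invented for illustration.

```python
from collections import deque


def finish_times_round_robin(instr_counts):
    """Cycle at which each thread retires its last instruction under round robin."""
    remaining = list(instr_counts)
    finish = [0] * len(remaining)
    queue = deque(i for i, n in enumerate(remaining) if n > 0)
    cycle = 0
    while queue:
        cycle += 1
        thread = queue.popleft()
        remaining[thread] -= 1
        if remaining[thread] == 0:
            finish[thread] = cycle
        else:
            queue.append(thread)   # go to the back of the line
    return finish


def finish_times_priority(instr_counts):
    """Same, but the highest-priority (lowest index) thread issues until it is done."""
    finish, cycle = [], 0
    for count in instr_counts:
        cycle += count
        finish.append(cycle)
    return finish


threads = [4] * 7   # 7 HW threads with 4 instructions each (made-up workload)
rr = finish_times_round_robin(threads)
pr = finish_times_priority(threads)
print("round robin:", rr, "mean", sum(rr) / len(rr))
print("priority:   ", pr, "mean", sum(pr) / len(pr))
```

    Both policies finish the last thread on the same cycle in this model; only the average completion time differs, which is where the extra register and cache residency would come from.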
     
  18. Paran

    Regular Newcomer

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
  19. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    This is weird. I was pretty sure, and I checked a couple of places: Ironlake was 45 nm (the GMCH was 45 nm, the CPU 32 nm).

    Also new in Skylake: EU simplified to “scalar” mode.
    Is that about those different SIMD widths or something else?
     
  20. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
    I'm just sad that, besides not implementing fp64 in OpenCL, they are also capping the DP flops at or below the CPU cores' flops.
    i.e. a 1/4 ratio means that 1152 Gflops fp32 -> 288 Gflops fp64
    Note that fp64 is available in DirectX compute shaders, C++ AMP and OpenGL compute shaders. It's not a hardware issue.
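
    One way the 1152 Gflops figure could be arrived at, as a hedged sketch: 72 EUs, two SIMD4 units per EU, an FMA counted as 2 flops per lane, and a 1.0 GHz clock. The EU count and the two SIMD4 units come from earlier posts in this thread; the 1.0 GHz clock is purely an assumption picked because it reproduces the number, and the function name is invented.

```python
def peak_gflops(eus: int,
                simd_units_per_eu: int = 2,
                lanes_per_unit: int = 4,
                flops_per_lane: int = 2,      # FMA counted as multiply + add
                clock_ghz: float = 1.0) -> float:  # assumed clock, not a spec
    """Theoretical peak fp32 throughput in Gflops."""
    return eus * simd_units_per_eu * lanes_per_unit * flops_per_lane * clock_ghz


fp32 = peak_gflops(72)
fp64 = fp32 / 4   # the 1/4 DP ratio mentioned above
print(f"fp32: {fp32:.0f} Gflops, fp64 at 1/4 rate: {fp64:.0f} Gflops")
```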
     
    Grall likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.