Intel Gen9 Skylake

Discussion in 'Architecture and Products' started by Paran, Aug 5, 2015.

  1. Turtle 1

    Turtle 1 Newcomer

    I understand what you're trying to do with the highest supported memory speed. But these are K chips: enthusiast parts. People buy these to overclock, and the highest officially supported memory speed really has nothing to do with anything. The boards support XMP, right? So they support higher memory speeds. Latency means a lot. My really old DDR1 BH1 timings were 2-2-2-2-2, and it matters. I ran those timings while overclocking the memory; as far as I know I was the only person to run those timings above 20 O/C. I also held the record at HWBOT for a while; I had the gold. It's been 10 years, but I believe it was 6,000 points in PC Mark 6, about Sandy Bridge stock speed. I'm sure if I went and looked I could find it. But 6,000 back then had everyone shook up. I believe I gave that setup to an AT member. My thinking is: keep the CPU at stock and run the memory as hard as it will go. That's what we all want to see. These are not the stock desktop parts. Who do you believe makes the better memory controller, Intel or AMD? If AMD runs a higher default, then match it with the Intel. These are performance chips; treat them as such. You got all the other chips to run at default. It's just that your results were the worst out of everyone's. If I were Intel, you would never get another K-series chip. My wife's brother-in-law and I talked about just that earlier. You tried to overclock the CPU without upping the DRAM speed. Keeping the base memory speed, one would think your overclocking would be better. I will buy that chip from you if it's retail. I'm pretty sure 4.7 would be easy. What a lot of people need to do is go back and reread Anand's SB review and the original IPC figure that was stated. This great SB chip was laughed at back then. Read the replies to that review. Everyone had it below 10% IPC; I believe AT had it the lowest. My chip was showing around a 16% IPC increase. Go to your forum; we had a big debate on it. Today, the way people talk, SB was the second coming. I had it right back then; everyone else missed big time.
    Same with Conroe: I hit the IPC right on the numbers, and that too is in your forum backlog. Go read the forum topic; I was banned for saying it would cream the AMD 64. That's after Intel had already shown us benchmarks 5 months earlier. I just figured Intel was telling the truth, and they were. Everyone else was saying Intel lied. Yet, ban hammer. I pretty much stay away from hardware forums nowadays. This release got me active. Let's see if there is an Angel Canyon, as my wife's brother-in-law tells me.
     
  2. Turtle 1

    Turtle 1 Newcomer

    I would like to see the math on that, if you would. I'm a little confused here, as this DDR3 is at 1866 CL9 and the DDR4 is at 2133 CL15. That is exactly what others were saying, and that's how I read it. Did they make a mistake and correct it? Where did you get those numbers from? They're exactly as I had them. My memory is really good and my reading comprehension is better yet. So latency was lower with DDR3 using 1866. But if they used 1600, OK; I believe there's more to it than that. I believe the same topic was going on at AT today and they used a different formula. I stayed out of it as much as I could, as I'm banned there. I deserved it, too. I told a mod in a PM to go screw himself and I wished him harm, LOL. I was sick of my user name anyway. I still post there, as I have 4 land lines here: 1 in the house and 3 in the shop (3 fiber optic, 1 DSL). I will go find their math and copy-paste the results here.
     
  3. Turtle 1

    Turtle 1 Newcomer

    I just took a quick look at AT's setup; here is what I found:
    Corsair DDR4-2133 2x8
    G.Skill DDR4-2133 2x8
    G.Skill DDR3-1866 4x4
    *Memory Timings used were the supported frequencies of each architecture,
    except DDR3L vs DDR4 testing, which used DDR3-1866 C9.
    For Skylake's DDR3L requirement, this was a DDR3 kit running with an undervolt to 1.42V.
    At 1.5V, the system failed to boot.


    Originally Posted by Walter E Kurtz
    Hardware Canucks uses DDR3-1866 CAS 11 and DDR4-2666 CAS 13(!). That is not even close. HotHardware and PCPer don't even post CAS latency under the test system setup, so god knows what they are testing. There are plenty of reviews where the memory difference in the Haswell / Skylake comparison was reduced as much as possible, including here on AnandTech, showing the real IPC gain to be nowhere near the "usual" 10-15%.
    Let's see what AnandTech said about their RAM, since you allege that AnandTech reduced the difference in RAM "as much as possible".
    http://anandtech.com/show/9483/intel...h-generation/7
    How to measure performance, according to AT:
    Quote:
    Normally in our DRAM reviews I refer to the performance index, which has a similar effect in gauging general performance:
    DDR3-1600 C11: 1600/11 = 145.5
    DDR4-2133 C15: 2133/15 = 142.2
    As you have faster memory, you get a bigger number, and if you reduce the CL, we get a bigger number also. Thus for comparing memory kits, if the difference > 10, then the kit with the biggest performance index tends to win out, though for similar kits the one with the highest frequency is preferred.
    Performance index=frequency/CAS, supposedly.


    And now the RAM they chose:
    Quote:
    For these tests, both sets of numbers were run at 3.0 GHz with hyperthreading disabled. Memory speeds were DDR4-2133 C15 and DDR3-1866 C9 respectively.
    DDR4: 2133/15=142.2
    DDR3L: 1866/9=207.3
    A difference of 65 in favor of DDR3L



    Compare these to the "not even close" RAM that Hardware Canucks chose:
    DDR4: 2666/13=205.1
    DDR3L: 1866/11=169.6
    A difference of 35 in favor of DDR4


    It looks to me like Hardware Canucks' choice of RAM is actually significantly closer than Anandtech's.
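    A minimal Python sketch of the arithmetic above, assuming the quoted rule of thumb that the performance index is simply the transfer rate (MT/s) divided by the CAS latency in cycles. The kits are the ones named in the posts; the function and dictionary names are illustrative only.

```python
def performance_index(transfer_rate_mts: float, cas_cycles: int) -> float:
    """AnandTech-style rule of thumb: frequency / CAS, higher is better."""
    return transfer_rate_mts / cas_cycles


# (label, MT/s, CAS) for each review's DDR4 and DDR3 setups, as quoted above.
SETUPS = {
    "AnandTech": (("DDR4-2133 C15", 2133, 15), ("DDR3-1866 C9", 1866, 9)),
    "Hardware Canucks": (("DDR4-2666 C13", 2666, 13), ("DDR3-1866 C11", 1866, 11)),
}

for site, ((ddr4, f4, c4), (ddr3, f3, c3)) in SETUPS.items():
    pi4 = performance_index(f4, c4)
    pi3 = performance_index(f3, c3)
    winner = "DDR4" if pi4 > pi3 else "DDR3"
    print(f"{site}: {ddr4}={pi4:.1f}  {ddr3}={pi3:.1f}  "
          f"gap={abs(pi4 - pi3):.1f} in favor of {winner}")
```

    Run as-is, this gives a gap of about 65.1 (DDR3 ahead) for AnandTech and about 35.4 (DDR4 ahead) for Hardware Canucks. Note that frequency/CAS is just 2000 divided by the CAS latency in nanoseconds (latency_ns = 2000 × CAS / MT/s), so the index is really a latency comparison rather than a bandwidth one.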
     
  4. Grall

    Grall Invisible Member Legend

    Depends on how you define "a lot"; in almost all workloads, the fastest, most expensive, lowest-possible-latency boutique RAM will only buy you a few percent of speed, because most memory accesses hit the CPU's caches and not main RAM. Most people wouldn't call that "a lot", and would think it a terrible waste of money (it is, btw.)

    Your old DDR1 had only a couple of clocks of latency because the memory was clocked very slowly. Don't fixate on clock-cycle numbers; what really matters is actual latency (measured in ns), not clock-cycle counts.
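    To put the cache argument in numbers, here is a sketch using the textbook average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The hit time, miss rate, and DRAM penalties are hypothetical round numbers chosen for illustration, not Skylake measurements, and a real hierarchy has several cache levels rather than one.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one cache level in front of DRAM."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: 5 ns cache hit, 2% of accesses go to DRAM,
# and "boutique" RAM shaves 10 ns off an 80 ns DRAM round trip.
HIT_NS, MISS_RATE = 5.0, 0.02
slow = amat_ns(HIT_NS, MISS_RATE, 80.0)
fast = amat_ns(HIT_NS, MISS_RATE, 70.0)
print(f"AMAT {slow:.2f} ns -> {fast:.2f} ns ({(slow - fast) / slow:.1%} faster)")
```

    Under these assumptions a 12.5% cut in DRAM latency improves average access time by only about 3%, and memory access is itself only part of total run time.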
     
    RedVi and BRiT like this.
  5. gongo

    gongo Regular

    Guys... can we get back to snooping around Skylake's dynamic clocks / FIVR (lack of)...

    RAM talk is boring...
     
    pjbliverpool likes this.
  6. sebbbi

    sebbbi Veteran

    Some reviews mentioned backbuffer color compression for the Gen9 GPU. Color compression is huge for integrated GPUs, as the bandwidth of dual-channel DDR3/DDR4 is limited to 25.6-34.1 GB/s (about a tenth of a high-end discrete GPU's). Color compression would be an easy 30%+ improvement in purely bandwidth-bound cases.
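    The bandwidth range follows from transfer rate × 8-byte (64-bit) channel × 2 channels, assuming DDR3-1600 and DDR4-2133 for the two ends. The second function is a simple model and an assumption, not anything from Intel: if a pass is purely bandwidth bound, run time scales with bytes moved, so removing a fraction of the traffic gives a speedup of 1 / (1 - fraction).

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int = 2,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    return channels * bus_bytes * transfer_rate_mts / 1000.0


def bandwidth_bound_speedup(traffic_saved: float) -> float:
    """Speedup when run time is proportional to bytes moved (toy model)."""
    if not 0.0 <= traffic_saved < 1.0:
        raise ValueError("traffic_saved must be in [0, 1)")
    return 1.0 / (1.0 - traffic_saved)


print(f"DDR3-1600 dual channel: {peak_bandwidth_gbs(1600):.1f} GB/s")
print(f"DDR4-2133 dual channel: {peak_bandwidth_gbs(2133):.1f} GB/s")
# Hypothetical: compression removing a quarter of render-target traffic.
print(f"25% less traffic -> {bandwidth_bound_speedup(0.25):.2f}x")
```

    Under this model, cutting traffic by roughly 23% is already enough for a 30% gain, which is why compression pays off so well on a 25-34 GB/s bus.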
     
  7. Infinisearch

    Infinisearch Veteran

    Sorry gongo.

    I was trying, from memory (and most probably failing), to approximate the latency to the critical first word. I didn't take the command rate into account since I didn't remember how. In addition, I just used the CAS latency since I don't remember the timing diagrams anymore. So I quickly did a 1/f to get the period and multiplied it by the CAS latency. If someone can point me to a good timing diagram or operational description for DDR3/4 DIMMs, it would be appreciated. Thanks.
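    The period × CAS approach works as long as f is the DRAM I/O clock, which is half the marketed data rate because DDR transfers on both clock edges. A sketch under that assumption; like the post, it ignores command rate, tRCD and everything else on the path to the critical word, and it uses nominal MT/s figures (the true rates are 1866.67, 2133.33 and so on). The DDR-400 speed for the CL2 DDR1 kit mentioned earlier in the thread is an assumption, since its speed wasn't stated.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """Time from read command to first data, counting CAS only."""
    io_clock_mhz = transfer_rate_mts / 2  # DDR: two transfers per I/O clock
    period_ns = 1000.0 / io_clock_mhz     # 1/f, with f in MHz, gives ns
    return cas_cycles * period_ns


KITS = [
    ("DDR-400 CL2 (assumed speed)", 400, 2),
    ("DDR3-1600 C11", 1600, 11),
    ("DDR3-1866 C9", 1866, 9),
    ("DDR3-1866 C11", 1866, 11),
    ("DDR4-2133 C15", 2133, 15),
    ("DDR4-2666 C13", 2666, 13),
]

for name, rate, cas in KITS:
    print(f"{name:28s} {cas_latency_ns(rate, cas):6.2f} ns")
```

    The DDR1 kit comes out at 10 ns: between DDR3-1866 C9 (about 9.6 ns) and DDR4-2133 C15 (about 14.1 ns), despite its much smaller cycle count. Cycle counts only mean something once multiplied by the clock period.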

    Skylake's performance in some benchmarks is really impressive; I'm impressed that Intel is still extracting IPC improvements from x86... I wonder how long that's going to last given the same bog-standard cache hierarchy. Can't wait for the 72-EU eDRAM Skylake benchies; I want a cheap laptop that I can play some games on. Does anyone know offhand how much eDRAM adds to the price of an Intel CPU compared with the same clock without eDRAM?
     
  8. Paran

    Paran Regular

  9. pixelio

    pixelio Newcomer

    Nice find!

    The changes from Gen8 that grabbed my attention are:


    Preemption is interesting.

    But if the EU thread scheduling scheme in pre-Gen9 wasn't round-robin then what was it?
     
  10. Isn't the GPU proportion getting bigger in each iteration?
    If that's a chip with 24 EUs, then the 72 EU chip will be almost twice as big, not to mention the eDRAM in a separate chip.
     
  11. Kaarlisk

    Kaarlisk Regular Subscriber

    Yup.
    In Haswell, a GT2 GPU is approximately 3.4 times larger than a single CPU core.
    In Skylake, a GT2 GPU is approximately 5.4 times larger than a single CPU core.
    However, as always, the math might be different. And it may also be impossible to make the CPU core much larger.
     
  12. sebbbi

    sebbbi Veteran

    I don't remember the old document describing it exactly. Previously one EU had 7 HW threads (waves) ready to execute (each of these was either SIMD8/16/32 or SIMD4x2). There were two SIMD4 execution units. On a single cycle, the two execution units could not both take an instruction from the same HW thread. If I understood correctly, there were no other scheduling limitations. I didn't find any instruction latency chart either.

    My guess would be that the new hardware has more strict scheduling limitations, allowing more efficient hardware implementation.
     
  13. Kaarlisk

    Kaarlisk Regular Subscriber

    Another change: eDRAM is now a "memory-side cache", not a "victim cache". The eDRAM controller is now a part of the system agent (previously it was a separate stop on the ring).
     
    fellix and Lightman like this.
  14. Paran

    Paran Regular


    35.2% for the GPU doesn't look much different to me. Assuming ~120 mm² is correct, GT2 Gen9 is only about 42 mm².
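    The area estimate is just the GPU's share of the die times the assumed total die size; both inputs (35.2% and ~120 mm²) come from the post, and the helper name is illustrative.

```python
def block_area_mm2(die_area_mm2: float, fraction: float) -> float:
    """Area of a block that occupies `fraction` of the whole die."""
    return die_area_mm2 * fraction


gt2_area = block_area_mm2(120.0, 0.352)  # ~120 mm^2 die is an assumption
print(f"GT2 Gen9 ~ {gt2_area:.1f} mm^2")
```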
     
  15. Kaarlisk

    Kaarlisk Regular Subscriber

    It was 31% of the die for the GPU in Haswell. There is rather a lot of unidentified space in the Skylake die image, unlike Haswell.

    Yup.
    Still, unlike the CPU cores, which shrank, the GT2 GPU has grown. Skylake GT2 is the first one that has become performant enough that it actually makes sense. Before Skylake, it was either "don't care about graphics, so GT1 will do" or "go cheap discrete", except maybe for a 4K desktop or some other specific case.
     
  16. 3dilettante

    3dilettante Legend Alpha

    Maybe it has to do with a flattening of the prioritization of instruction issue, which may help with quality of service and possibly preemption.
    Going further back: http://www.realworldtech.com/sandy-bridge-gpu/5/
    This description seems to indicate that a thread with sufficient priority is able to take successive issue cycles. This does sound more complex to manage than round-robin, and might allow a thread that didn't stall to dominate execution time, which might raise fairness issues.
     
    Kaarlisk likes this.
  17. sebbbi

    sebbbi Veteran

    That is my assumption as well. Round robin is simpler and more fair. And it gives the threads more time to hide instruction latency (assuming some complex instructions need this). Downside of course is that on average round robin finishes threads slightly slower, potentially causing slightly more resource and cache contention (depending of course on data access patterns).
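    To make the fairness trade-off concrete, here is a toy Python simulation. It is not Intel's scheduler: it assumes 7 hardware threads, two issue pipes that may not take from the same thread in one cycle (sebbbi's description of the pre-Gen9 EU), and no instruction latencies or stalls. `pick_fixed_priority` is a hypothetical stand-in for a priority-based scheme like the one in the RWT article, not a documented Intel policy.

```python
from typing import Callable, List

# Toy EU model (assumed, not Intel's documented scheduler).
NUM_THREADS = 7  # HW threads per EU
PIPES = 2        # two SIMD4 issue pipes; never both from the same thread

Policy = Callable[[List[int], int], List[int]]


def pick_fixed_priority(remaining: List[int], start: int) -> List[int]:
    """Hypothetical priority scheme: lowest-numbered ready threads always win."""
    ready = [t for t in range(len(remaining)) if remaining[t] > 0]
    return ready[:PIPES]


def pick_round_robin(remaining: List[int], start: int) -> List[int]:
    """Scan from a rotating pointer and take the next distinct ready threads."""
    n = len(remaining)
    order = [(start + i) % n for i in range(n)]
    return [t for t in order if remaining[t] > 0][:PIPES]


def simulate(policy: Policy, instrs_per_thread: int = 10) -> List[int]:
    """Return the cycle in which each thread issues its last instruction."""
    remaining = [instrs_per_thread] * NUM_THREADS
    finish = [0] * NUM_THREADS
    start, cycle = 0, 0
    while any(remaining):
        cycle += 1
        picked = policy(remaining, start)
        for t in picked:  # distinct threads: at most one instruction each
            remaining[t] -= 1
            if remaining[t] == 0:
                finish[t] = cycle
        start = (picked[-1] + 1) % NUM_THREADS  # pointer only used by round-robin
    return finish


for name, policy in (("fixed priority", pick_fixed_priority),
                     ("round robin", pick_round_robin)):
    done = simulate(policy)
    print(f"{name:15s} finish={done} mean={sum(done) / len(done):.1f} last={max(done)}")
```

    With 10 instructions per thread, fixed priority finishes threads at cycles 10, 10, 20, 20, 30, 30, 40 (mean about 22.9), while round-robin finishes them between cycles 32 and 35 (mean about 33.7). Round-robin keeps every thread progressing, and here it even ends sooner overall because fixed priority leaves a pipe idle once only one thread remains, but the average thread finishes later, in line with the downside described above. Latency hiding, which favors round-robin, is not modelled.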
     
  18. Paran

    Paran Regular

  19. Kaarlisk

    Kaarlisk Regular Subscriber

    This is weird. I was pretty sure, and I checked a couple of places: Ironlake was 45nm (the GMCH was 45nm, the CPU 32nm).

    Also new in Skylake: EU simplified to “scalar” mode.
    Is that about those different SIMD widths or something else?
     
  20. moozoo

    moozoo Newcomer

    I'm just sad that, besides not implementing fp64 in OpenCL, they are also capping the DP FLOPS at or below the CPU cores' FLOPS.
    i.e. a 1/4 ratio means 1152 GFLOPS fp32 -> 288 GFLOPS fp64.
    Note that fp64 is available in DirectX compute shaders, C++ AMP and OpenGL compute shaders. It's not a hardware issue.
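    The 1152 GFLOPS figure is consistent with 72 EUs × 16 FP32 FLOPs per clock (two SIMD4 FPUs per EU, each doing a fused multiply-add) at 1.0 GHz. The 1.0 GHz clock is an assumption back-solved from the quoted number, not a spec.

```python
FP32_FLOPS_PER_EU_PER_CLK = 2 * 4 * 2  # 2 FPUs x SIMD4 lanes x 2 FLOPs per FMA


def peak_gflops(eus: int, clock_ghz: float, ratio: float = 1.0) -> float:
    """Peak throughput in GFLOPS; ratio scales FP32 down to e.g. FP64."""
    return eus * FP32_FLOPS_PER_EU_PER_CLK * clock_ghz * ratio


fp32 = peak_gflops(72, 1.0)               # assumed 1.0 GHz GPU clock
fp64 = peak_gflops(72, 1.0, ratio=0.25)   # the 1/4 DP rate quoted above
print(f"FP32 {fp32:.0f} GFLOPS, FP64 {fp64:.0f} GFLOPS")
```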
     
    Grall likes this.