Bring back high performance single core CPUs already!

Discussion in 'PC Hardware, Software and Displays' started by Frontino, Apr 10, 2012.

  1. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    Yeah, not sure what the bottleneck is on 8192 shadows, but I ran headlong into the same bottleneck on my Q9450 + 5850 config a few months ago. I made the (naive?) assumption it was VRAM limited, but never did the proper research to prove that.

    GPU-Z and MSI Afterburner both show >2500MB of VRAM usage while I'm dorking around the outdoors in Skyrim; typically less when I'm indoors somewhere. I haven't compared it after making Richard's shadow and uGrids changes; I'll check it out tonight and report back.
     
  2. almighty

    Banned

    Joined:
    Dec 17, 2006
    Messages:
    2,469
    Likes Received:
    5
    Why have 5GHz when you can have 5.5GHz like me ;)
     
  3. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    416
    Likes Received:
    105
    Location:
    Brazil
    X2 3.3GHz = 44FPS
    X4 3.7GHz = 54FPS
    X6 3.3 (up to 3.7GHz with turbo) = 49FPS

    the game doesn't seem to care for more cores than maybe 3,
    I think you shouldn't look at the i7 3xxx compared to the 2600k because there is simply too much difference: a lot more L3 cache, a lot more memory bandwidth and, in this game, maybe a higher clock with turbo too.
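
    A quick way to read the three results above is to normalise FPS by clock speed. This is only a rough sketch: the figures are the ones quoted in this post, the helper name `fps_per_ghz` is made up for illustration, and the X6 is shown at both its base and turbo clock because the post doesn't say which one it actually held.

```python
# Figures quoted above: (label, cores, clock in GHz, average FPS).
# The X6 is listed twice: 3.3 GHz base and 3.7 GHz turbo.
results = [
    ("X2 @ 3.3", 2, 3.3, 44),
    ("X4 @ 3.7", 4, 3.7, 54),
    ("X6 @ 3.3 (base)", 6, 3.3, 49),
    ("X6 @ 3.7 (turbo)", 6, 3.7, 49),
]


def fps_per_ghz(fps, ghz):
    """Frames per second delivered per GHz of clock speed."""
    return fps / ghz


for label, cores, ghz, fps in results:
    print(f"{label:18s} {cores} cores  {fps_per_ghz(fps, ghz):5.2f} FPS/GHz")
```

    FPS per GHz stays in a narrow band (roughly 13 to 15) while the core count triples, which fits the idea that clock speed matters more here than cores beyond the first few.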

    my experience with Skyrim was divided between 3 CPUs,
    first an E5400 (2mb L2, 2 Cores) at 3.7GHz,
    without the first patches the framerate was already OK at my settings (above 20 on the most intensive places but normally above 40), BUT the game had some terrible stuttering and freezes which as far as I know were only happening with dual core CPUs, there was an unofficial fix that worked (from enbdev.com), but it was later solved with the patches,
    I tested some underclocking: at 2.4GHz the game was still playable, with more than 20fps in Riften, and here is the funny thing, with a higher level of details (all at ultra, which is too much for my VGA) the experience was still smooth at this clock, with constant framerate, while at 3.7GHz it was awful, with a lot of variation, with the framerate jumping up and down all the time,
    anyway, the 1.4 patch made the game a lot lighter, I started seeing the framerate at 50fps or more most of the time, with the lowest going to the 30s...
    I swapped this CPU with a Core 2 Quad 65nm at 2.85GHz, and performance decreased a little but stayed close enough, it was clear by looking at task manager that this game uses very little more than 2 cores....

    going for a i3 2100 the framerate definitely improved, I guess the architecture improvements really can make up for the "missing" cores easily in this case (gaming), a lot faster memory/IO subsystem I guess...

    but there are games that are far more successful at using "more threads" than Skyrim,
    comparing the e54@3.75 to the C2Q@2.85GHz I saw a huge advantage on the C2Q in some games like Witcher 2 and GTA 4, with things jumping from 20 to 30FPS basically (but the C2Q also had more L2 cache),

    I think a dual core Sandy Bridge with HT at 4.5GHz would mostly make more cores useless for gaming right now... and I also think that this is the reason why Intel only unlocks overclocking on their more expensive parts... so yes, the OP has a point I think... most users don't really need 4/6 cores, but could use 2 stronger cores,
     
  4. I.S.T.

    Veteran

    Joined:
    Feb 21, 2004
    Messages:
    3,174
    Likes Received:
    389
    Read the rest of his data, you'll see that HT just isn't quite enough.
     
  5. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    Yeah, uh, try reading the rest of my posts :lol: ;) Let me help with one of my cliff notes from an earlier post (but long after the one you quoted...)
    I've done a bit of homework for the world to see, and HT isn't much help. You want real cores, not HT... The i5-2500k seems to be your absolute best "bang for the buck" in terms of gaming performance, which really isn't news to anyone... Also, at very low speeds (ie low-power processors found in laptops) there is a measurable performance benefit to having four physical cores in Skyrim. The benches indicated a jump from 10fps -> 30fps via jumping from single core to quad core (hyperthreading helped at lower core counts, but maximum performance was found at 4c / 4t rather than 2c / 4t)
     
    #65 Albuquerque, Apr 17, 2012
    Last edited by a moderator: Apr 17, 2012
  6. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,296
    Likes Received:
    395
    Location:
    Australia
    so what you're really trying to say is that 17 watt Trinity is going to be awesome :lol:
     
  7. Ninjaprime

    Regular

    Joined:
    Jun 8, 2008
    Messages:
    337
    Likes Received:
    1
    I'm pretty sure the CPU market took the turn towards multicore the way it did for a reason, a reason smarter and more informed people than me decided. However, one has to wonder what kind of performance a hypothetical single core Sandy Bridge pushed to 5GHz+, with 4 threads, a 512 bit Larrabee-style vector unit and a massive 32-64MB L2 cache, would look like had the single core method stuck around. Might be able to keep up with or maybe even beat a dual core SB of today. Though 4/6 cores would probably crush it. Someone make it happen! :razz:
     
  8. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    No comment ;)

    Meh. Four threads from a single core sounds foolish and unlikely useful; a fat vector unit makes a lot of assumptions on how game code would get written, and epic L2 cache actually doesn't seem of much use given prior history and the 'usefulness' of parts that sport the same clockspeed and architecture but fatter cache size (ie: very little difference.)

    Case in point: my six core, twelve thread 3930k sports 50% more cache, 50% more cores, 50% more threads, 100% more main memory bandwidth, and 200% more PCI-E lanes and yet basically equals or loses to a 2600k when talking strictly about games. Of course, when you throw in something that is "compute" related (H.264 encoding, raytracing, blah-de-blah) then the SB-E platform brings out the big guns and lays waste to the 2600k.

    Single core with all that jazz? Not seeing it.
     
  9. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    284
    Likes Received:
    6
    Location:
    Herwood, Tampere, Finland
    AFAIK Willamette/Northwood had 28 stages, and AFAIK Bulldozer has only about 20 stages (precise number not stated).
    And Prescott had 42 stages.

    So, Bulldozer's pipeline length is only "halfway from P6/K7 to Willamette and Northwood" and "1/3 of the way from P6/K7 to Prescott".
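
    The "halfway" and "1/3" figures can be sanity-checked with a bit of arithmetic. This is a sketch under one loud assumption: the post gives no stage count for P6/K7, so `BASELINE_STAGES = 10` below is a placeholder of roughly the right size, not a number from this thread. The other counts are the ones claimed above, and the function name is made up for illustration.

```python
# Stage counts as claimed in the post above.
BULLDOZER = 20
WILLAMETTE_NORTHWOOD = 28
PRESCOTT = 42

# ASSUMPTION: roughly 10 stages for P6/K7. Not stated anywhere in the thread.
BASELINE_STAGES = 10


def fraction_of_the_way(start, target, value):
    """How far `value` sits between `start` (0.0) and `target` (1.0)."""
    return (value - start) / (target - start)


to_northwood = fraction_of_the_way(BASELINE_STAGES, WILLAMETTE_NORTHWOOD, BULLDOZER)
to_prescott = fraction_of_the_way(BASELINE_STAGES, PRESCOTT, BULLDOZER)

print(f"toward Willamette/Northwood: {to_northwood:.2f}")
print(f"toward Prescott:             {to_prescott:.2f}")
```

    With that baseline the fractions come out a bit over one half and a bit under one third; a different baseline shifts both numbers, so treat them as ballpark only.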

    And the long pipeline was not the biggest/only problem with Willamette/Northwood's IPC; slow shifts, slow multiplications and a small L1D cache were bigger IPC limiters.

    On Prescott, with many of these fixed or improved and an even longer pipeline, the pipeline length was really the biggest IPC-reducer.

    And the pipeline length of Bulldozer is quite equal to the pipeline length of Power7, the world's fastest microprocessor.
     
  10. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    No, pre-Prescott Netbursts had a 20 stage pipeline and Prescott and up had 31 stages.
    In what workloads?
     
  11. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    284
    Likes Received:
    6
    Location:
    Herwood, Tampere, Finland
    20 stages AFTER the trace cache

    8 stages before the trace cache/total.

    28 stages total.

    And Prescott had.. 31 + 11?
     
  12. imaxx

    Newcomer

    Joined:
    Mar 9, 2012
    Messages:
    131
    Likes Received:
    1
    Location:
    cracks
    Counting the TC into the stages for the P4 is like counting L1I latency; it doesn't sound very fair.
    NetBurst did indeed pay a high price for such performance, but it was funny: a nice example I remember was that it needed an additional µop for INC vs ADD for masking out flags, or the TC space 'borrows' alchemy.

    I'm not saying BD is slower because it has the same number of stages as the P4, but rather that it sounds like a trend reversal for x86. I'm sure AMD ensured that the 10/15% clock advantage more than covers the issues it brings (well, the same could have been said for Intel so..). BD looks a bit like K10 to me: a caged, immense firepower toy with a tiny entrance (K10 with a tiny exit, too).

    Power7... if x86 had fixed instruction size, maybe with VLIW possibilities, more registers, an optimized set of instructions etc... but you see where Itanium ended up: at AMD64. Compatibility wins, for large scale.
     
  13. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Yes... a 4.5 GHz single core (with HT) already reaches 52 fps (very near the 60 fps cap), and dual core (without HT) reaches the max. Adding more cores or threads does not help at all, since two beefy high clocked Sandy Bridge cores can already execute all the game threads sequentially in the allocated 16 ms time slot (60 fps target).
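
    To make the frame budget argument concrete, here is a minimal sketch. Everything in it is hypothetical except the 60 fps cap: the per-thread costs are made-up numbers picked so that one core lands near the 52 fps mentioned above, the helper names are invented, and HT is not modelled at all.

```python
FPS_CAP = 60  # frame rate cap mentioned above


def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


def frame_time_ms(thread_costs_ms, cores):
    """Frame time if independent threads are packed onto `cores` cores.

    Greedy packing, longest thread first, each onto the least loaded core.
    """
    loads = [0.0] * cores
    for cost in sorted(thread_costs_ms, reverse=True):
        loads[loads.index(min(loads))] += cost
    return max(loads)


def capped_fps(thread_costs_ms, cores):
    """Frame rate after applying the cap."""
    return min(FPS_CAP, 1000.0 / frame_time_ms(thread_costs_ms, cores))


# HYPOTHETICAL per-frame cost of three game threads on a fast core, in ms.
game_threads = [9.0, 6.2, 4.0]

print(f"budget at 60 fps: {frame_budget_ms(60):.1f} ms")
for cores in (1, 2, 4):
    print(f"{cores} core(s): {frame_time_ms(game_threads, cores):.1f} ms "
          f"-> {capped_fps(game_threads, cores):.1f} fps")
```

    Once the packed frame time drops under the budget the cap takes over, so going from two to four cores changes nothing you can see on an FPS counter.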

    At 1.5 GHz however you see good scaling. 1 core = 10.5 fps, 1 core with HT = 15.5 fps, 2 cores = 16.5 fps, 2 cores with HT = 23.3 fps, and 4 cores = 29.5 fps. Also you see that extra hardware threads provided by HT give very good gains at low core counts (1 core + HT = 93% of 2 cores, 2 cores + HT = 78% of 4 cores).
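
    The ratios are easy to recompute from the quoted figures. A small sketch, with the dictionary layout and the `ratio` helper being illustrative names only:

```python
# FPS at 1.5 GHz as quoted above, keyed by (physical cores, hardware threads).
fps = {
    (1, 1): 10.5,
    (1, 2): 15.5,
    (2, 2): 16.5,
    (2, 4): 23.3,
    (4, 4): 29.5,
}


def ratio(config_a, config_b):
    """FPS of config_a relative to config_b."""
    return fps[config_a] / fps[config_b]


print(f"1 core + HT vs 2 cores:  {ratio((1, 2), (2, 2)):.1%}")
print(f"2 cores + HT vs 4 cores: {ratio((2, 4), (4, 4)):.1%}")
print(f"HT gain on 1 core:       {ratio((1, 2), (1, 1)) - 1:.1%}")
print(f"HT gain on 2 cores:      {ratio((2, 4), (2, 2)) - 1:.1%}")
```

    This gives about 94% and 79% for the two comparisons, in line with the 93% and 78% above (those look truncated rather than rounded), and an HT gain in the 40 to 50% range at both core counts.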

    It seems that Skyrim is designed to run at 30 fps on lower end processors (and consoles). The 29.5 fps is almost exactly half of the 59.3 fps cap seen in high end benchmarks. Maybe the game detects the CPU clock speed and lowers the cap to half if a low end CPU is detected (very good idea, since a constant frame rate is always better than a fluctuating one). That's why there are no additional gains when going over 4 cores. You could try to slightly increase the CPU clocks, and see when the game switches to 60 fps mode. 2 GHz would be a good even number for example... The 3 GHz CPU slightly scales up from 4 -> 6 cores, so there's likely extra scaling to be discovered at lower clocks (as long as the frame cap is not lowered to 30 fps).
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    What's wrong with real cores AND hyperthreading?

    I'm sure my application would benefit from it.
     
  15. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    If I can choose between 2c / 4t and 4c / 4t, then the obvious winner is the latter (given all other things are equal.) If you can get both, then more power to you!

    My overclocked 3930k has no problem obviously crushing this game and anything else I throw at it, but I leave HT turned on regardless. It certainly helps when transcoding all the videos of my daughter...
     
  16. hoho

    Veteran

    Joined:
    Aug 21, 2007
    Messages:
    1,218
    Likes Received:
    0
    Location:
    Estonia
    But that's the whole point with HT, things aren't equal at all :)
     
  17. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    284
    Likes Received:
    6
    Location:
    Herwood, Tampere, Finland
    Counting P4 pipeline stages without the decode stages, while counting the decode stages on other processors, is even less "fair".

    The "correct" way of comparing these pipeline lengths would be calculating the maximum sequential gate count of the pipeline stages, i.e. how long one stage is, not the total number of pipeline stages. And there P4 is clearly longer than Bulldozer, and Bulldozer is very close to Power7.
     
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    But the reality is that HT costs minimal die space, while doubling core count almost doubles the die space (and raises power draw dramatically). You should be comparing 2c/2t with 2c/4t (and 4c/4t with 4c/8t) because those designs are similar in power draw, transistor count and manufacturing costs.

    Comparing 2c/4t directly with 4c/4t is not a fair comparison, because the 2c/4t CPU is much cheaper to produce and consumes considerably less power (half the cores, half the execution units, etc).

    HT is a good way to get extra performance efficiently (only minimal extra transistors needed, doesn't raise power consumption that much). A processor without HT (or similar SMT technology) wastes lots of CPU cycles doing nothing. This happens because it's very hard to keep long pipelines filled with instructions from only one thread. The parallelism that can be extracted from one thread (ILP) is just not enough. A processor with HT can fill the empty slots of the pipeline (that would just execute NOP) with instructions from another thread. That's basically free performance (without having to add any new execution resources).
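
    A toy model of the slot filling idea, just to show the bookkeeping. It is not how any real core schedules instructions: the issue width, the per-cycle numbers and the function name are all made up, and instructions that don't issue in a cycle are simply dropped rather than carried over.

```python
ISSUE_WIDTH = 4  # HYPOTHETICAL number of issue slots per cycle


def slots_used(ready_a, ready_b=None):
    """Count filled issue slots over a run of cycles.

    ready_a / ready_b give how many instructions each thread has ready in
    each cycle. Thread A picks first; with SMT, thread B fills what is left.
    """
    used = 0
    for cycle, a in enumerate(ready_a):
        from_a = min(a, ISSUE_WIDTH)
        from_b = 0
        if ready_b is not None:
            from_b = min(ready_b[cycle], ISSUE_WIDTH - from_a)
        used += from_a + from_b
    return used


# Made-up readiness per cycle; zeros stand for stalls (cache miss, dependency).
thread_a = [2, 0, 1, 3, 0, 2]
thread_b = [1, 2, 0, 2, 3, 1]

total_slots = ISSUE_WIDTH * len(thread_a)
print(f"one thread:  {slots_used(thread_a) / total_slots:.0%} of slots filled")
print(f"two threads: {slots_used(thread_a, thread_b) / total_slots:.0%} of slots filled")
```

    With these made-up inputs the single thread fills 8 of 24 slots and the pair fills 16 of 24, using the same execution resources. Real gains depend entirely on how often the two threads stall at different times.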

    HT is most important for CPUs that have a low core count to begin with. Good examples are those 17W dual core Sandy Bridge CPUs (found in Ultrabooks and the MacBook Air). With good turbo clocks these CPUs can handle single core situations pretty well, and with HT these CPUs can also handle four threaded code pretty well. Without HT these ULV processors would really crumble in code that is designed for highly multithreaded execution. Of course if you already have four or six real cores (with high desktop clock rates), adding HT doesn't help that much in most games and applications (that are often designed for four core execution).
     
  19. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Well, non-P4 processors typically do not have a trace cache either, whose whole purpose is to essentially take the decoder stages out of the equation, from what I've read.

    I could be wrong though, it's been known to happen. :D

    Sounds like an entirely pointless metric IMO, since Bulldog has crap performance regardless of how many or few gates comprise its pipeline.

    Absolute numbers mean nothing; it's the real-world performance that counts.
     
  20. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    I agree and already know everything you've said, and yet it does not invalidate what I said. Which was, specifically, two physical cores (regardless of HT implementation) will perform worse than four physical cores -- all else being equal. Yes, it's obvious. I know. I was responding to the "one core, four threads" concept brought up earlier regarding the building of some conceptual high-performance 'single core' CPU with a fat AVX unit and 64MB of L2 cache.

    And the whole "well they aren't equal if one has HT, Duuuhhhrr" response is not necessary, because "all ELSE being equal" has a unique defining term that I've helpfully highlighted for you. Hyperthreading would be the unique delta in this case; all ELSE would be the same. If you want to argue this, consider that HT / non-HT Intel processors on the SB platform are identical minus a capabilities fuse that was blown. Thus, an HT versus non-HT core is indeed EQUAL, except for the HT being enabled. See? All else? That else part, yeah, it's equal :D

    So, carry on :)
     
    #80 Albuquerque, Apr 17, 2012
    Last edited by a moderator: Apr 17, 2012