There are several possible explanations for the worse-than-expected performance in some games.
Generic hyperthreading / SMT related issues:
- Intel had similar issues with Nehalem/Sandy/Ivy in games and applications. Reviewers suggested disabling hyperthreading. Intel made some hardware changes to improve the situation, and Windows scheduling was improved. But there are still cases where Intel chips show reduced performance when HT is active.
- Some reviews show a big performance boost from tweaking power-saving options. This affects how Windows schedules threads (it could affect whether both SMT threads of a single core are filled first, or whether each core gets one thread first). AMD seems to take a slightly bigger hit from SMT than Intel.
AMD SMT cores are mapped differently than Intel:
- Some websites claim that Intel's logical core mapping is: thread 1 of every core = logical CPUs 1, 2, 3, ..., 8, and thread 2 of every core = logical CPUs 9, 10, 11, ..., 16.
- AMD Ryzen logical cores are apparently mapped sequentially (one core at a time): core 1 = 1, 2; core 2 = 3, 4; ... core 8 = 15, 16.
- This causes problems in game engines that core-lock their 6-8 worker threads (assuming a console port). A game engine like this would only use 3 or 4 cores (both SMT threads on each) on an 8-core AMD Ryzen. This would explain the huge gains seen by some reviewers when disabling SMT.
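To make the arithmetic above concrete, here is a minimal sketch that counts how many physical cores a core-locked engine ends up using under each mapping. It assumes the two mappings are as claimed above (which is unconfirmed), uses 0-based logical CPU indices, and the function names are made up for illustration.

```python
# Sketch only: both mappings are the *claimed* ones from the post, not verified.
# Logical CPU indices are 0-based here (0..15 on an 8-core / 16-thread chip).

def intel_style_core(logical_cpu, physical_cores=8):
    """Claimed Intel layout: first all 'thread 1' slots, then all 'thread 2' slots."""
    return logical_cpu % physical_cores

def ryzen_style_core(logical_cpu, threads_per_core=2):
    """Claimed Ryzen layout: both SMT threads of a core are adjacent."""
    return logical_cpu // threads_per_core

def physical_cores_used(pinned_logical_cpus, core_of):
    """Count the distinct physical cores behind a set of pinned logical CPUs."""
    return len({core_of(cpu) for cpu in pinned_logical_cpus})

# A console port that locks 8 workers to logical CPUs 0..7:
workers = range(8)
print(physical_cores_used(workers, intel_style_core))  # 8
print(physical_cores_used(workers, ryzen_style_core))  # 4
```

With 6 workers instead of 8, the Ryzen-style mapping gives 3 physical cores, which matches the "3 or 4 cores" estimate.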
AMD has a separate L3 cache for each 4-core cluster:
- Apparently Windows doesn't know about this and migrates threads repeatedly between clusters. This is practically equivalent to an L3 flush. Intel has an L3 cache shared between all cores, and is not affected.
- A CPU driver and/or Windows scheduler patch could reduce this problem.
- But many game engines are simply designed to do lots of parallel for loops, where the workload is split across all cores, and then the results are returned to a single core. There is nothing the OS can do to help in this scenario. It can't analyze the memory access pattern of each core.
- AMD Jaguar has a similar LLC design. Each of its two 4-core clusters has its own L2 cache. It is best to keep communication between these two clusters as limited as possible. Many console game engines have already been designed to work around this limitation. But on consoles, all threads are locked to a core. Core-locking threads on PC is a double-edged sword (it has potential problems). In this case, core locking would be preferable, but you need different thread mappings for Ryzen than for Jaguar (as you have 16 logical cores vs 8). Hopefully AMD releases a best practices guide for Ryzen caches and logical core mappings. It should be relatively easy to patch a game engine thread scheduler to support Ryzen.
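As a rough idea of what a Ryzen-specific thread mapping could look like, the sketch below assigns one worker per physical core first and records which L3 cluster each worker lands in. Everything here is an assumption: the sequential logical core mapping described above, cores 0-3 sharing one L3 and cores 4-7 the other, and the function name `plan_worker_affinity`, which is hypothetical. It only computes a plan; the actual pinning would go through an OS call such as `SetThreadAffinityMask` on Windows.

```python
# Sketch only: assumes logical CPUs 2k and 2k+1 are the SMT threads of core k,
# and that each run of 4 consecutive cores shares one L3 cache.

def plan_worker_affinity(num_workers, physical_cores=8,
                         threads_per_core=2, cores_per_cluster=4):
    """Give each physical core one worker before using any second SMT thread."""
    plan = []
    for worker in range(num_workers):
        core = worker % physical_cores
        # 0 for the first pass over the cores, 1 for the second pass.
        smt_slot = (worker // physical_cores) % threads_per_core
        plan.append({
            "worker": worker,
            "logical_cpu": core * threads_per_core + smt_slot,
            "core": core,
            "cluster": core // cores_per_cluster,
        })
    return plan

for entry in plan_worker_affinity(8):
    print(entry)
```

With 8 workers this uses logical CPUs 0, 2, 4, ..., 14, so all 8 physical cores get one thread each. The `cluster` field is there so that a scheduler could keep tasks that share data on workers within the same cluster and limit traffic between the two L3 caches.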