Next Generation Hardware Speculation with a Technical Spin [2018]

It's worth noting that the multithreading in Jaguar isn't SMT - it's Clustered-multithreading. The way threading workloads affect the cores will be different and what devs have to do to maximise core utilisation will be different between Jaguar and Zen.
Makes for an interesting backward compatibility discussion, I guess. I now wonder whether not having the same number of available concurrent threads (i.e. 4 Ryzen cores + SMT gives 8 threads, but only 4 physical cores are active) might play badly with ND's engine.
 
OK... but how many are they going to sell? It may be a necessity. I remember a time when Sony developed Cell... Improving an existing architecture, maybe adding more cache and AVX-512 while shrinking to 7nm, is surely much simpler than Cell development... Also, costs may be shared between Sony and MS.
 
They are in the same AMD boat... And Nintendo is there selling a lot with a Tegra chip... Not to mention Google also wanting to jump into the market... But yes, I finally admit plain vanilla Jaguar is unfit (the lack of AVX-512 looks bad).
 
It's worth noting that the multithreading in Jaguar isn't SMT - it's Clustered-multithreading. The way threading workloads affect the cores will be different and what devs have to do to maximise core utilisation will be different between Jaguar and Zen.
Clustered multithreading sounds like Bulldozer or its ilk, with shared front ends and FPU.
A Jaguar cluster is 4 separate cores and shared L2.

The Jaguar module and the Zen CCX have some correlations. A CCX gives the same or more L1 cache, and while its L2 isn't shared, the total L2 capacity matches. Zen's L3 is extra capacity, and its shadow tags serve a purpose close to Jaguar's shared L2.

At least in the first revision of Zen, the quad-core module carries through, as do the mostly equivalent (poor) memory and cross-module latencies.
 
My response was specifically about ND's engine. Unfortunately there seems to be less information out there on how multicore programming works in other engines. But if I recall their presentation correctly, they want the cores to never switch threads away, because each thread is responsible for generating and switching fibres. Somewhere near the beginning of the presentation I recall that the OS would swap their threads between cores to run OS tasks, that this was detrimental to performance, and that it became the first thing they needed to lock down.

The way ND does multi-threading is a solution that aims for very high, even core utilization. It ensures as much as possible that each available core is either working on some item or generating new items for other cores to work on. I think without fibres this type of 'switch' would have too much overhead, but from what I remember of the presentation, fibre switches have minimal overhead. Fibre switching is just a form of job pool; in this case they had full control over how it operates. From what I can see, the largest discernible difference between this and standard job pools is that there is no master thread assigning work back onto the stack; they effectively decentralized that function.
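
Not ND's actual code, obviously, but here's a minimal sketch of that decentralized idea, with plain std::thread workers standing in for fibres: every worker both pops jobs from a shared queue and pushes any follow-up jobs it generates, so no master thread ever hands out work. All the names here are made up for illustration.

```cpp
// Minimal sketch of a decentralized job pool (illustrative only).
// Plain threads stand in for fibres: each worker pops a job, runs it,
// and the job itself may push new jobs, so no single thread assigns work.
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class JobPool {
public:
    using Job = std::function<void(JobPool&)>;

    explicit JobPool(unsigned workerCount) {
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Any thread (including a running job) can push more work.
    void push(Job job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    // Drain remaining jobs, then stop the workers.
    void shutdown() {
        done_ = true;
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

private:
    void workerLoop() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // done_ is set and nothing is left
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job(*this);   // the job may call push() to spawn follow-up work
        }
    }

    std::deque<Job> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::atomic<bool> done_{false};
    std::vector<std::thread> workers_;
};
```

The real win of fibres over something like this is that a fibre 'switch' is just swapping a small user-space context on the same core instead of going through the OS scheduler, which is presumably where the minimal overhead they mention comes from.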

But having a shared fibre pool to work from is just one way to do multi-core programming. Others opt to assign full roles (sound, animation, AI, etc.) to specific cores, so the threads never pay any thread-switching overhead; they just work on what's been assigned to them. It may not saturate all the cores as well as the shared fibre pool, but as long as no core is going over budget it will work. And in the scenario with SMT, where you are now given 2 threads per core, it should further alleviate any bottlenecking on each core.
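
For contrast, a rough sketch of that 'dedicated role per core' style, pinning whole threads to cores with Linux's pthread affinity call (consoles have their own equivalents; the core numbers and the subsystem loops are placeholders):

```cpp
// Rough sketch of dedicating cores to fixed roles (illustrative only).
// Linux/glibc affinity API; compile with -pthread.
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin a std::thread to a single core so the OS never migrates it.
void pinToCore(std::thread& t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

// Placeholder subsystem loops; each only ever touches its own work.
void audioLoop()     { /* mix audio until told to stop */ }
void animationLoop() { /* evaluate skeletons */ }
void aiLoop()        { /* tick behaviour trees */ }

int main() {
    std::thread audio(audioLoop), anim(animationLoop), ai(aiLoop);
    pinToCore(audio, 1);   // core 0 left for the OS / main thread
    pinToCore(anim,  2);
    pinToCore(ai,    3);
    audio.join(); anim.join(); ai.join();
}
```

Dead simple scheduling-wise, but as said above, whether it saturates the hardware depends entirely on how well the per-core budgets are balanced.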

With SMT, traditional shared thread/job pools should also improve. Mini jobs controlled by a single thread are put out on the job pool as tasks for idle cores to pick up and work on. SMT is probably going to help in that kind of setup, as not all jobs are equal and you'll get better saturation of the CPU when a thread stalls.

I think overall SMT is a good thing. And I assume in the console space it could be a feature that developers can turn off if they desire.

One thing I was wondering about was if there were benefits from SMT in maintaining data locality. For example, is there any benefit to assigning multiple tasks that use the same data to a pair of virtual cores that run on the same physical core?
 
@mrcorbo
I'd hazard a guess and say "probably not, or not very much": if you have the data you need in cache, then the thread will already be working on that data, so having two threads working on the same data with the same hardware resources likely isn't going to bring much, if any, benefit.

The way I understood simultaneous multithreading is that it helps when a thread stalls on memory accesses: the other thread might already have its data in cache and can keep the core busy while the first thread's data starts trickling in from memory...
 
The complete One X SoC is 7 billion transistors... Ryzen alone (8 cores) is almost 5 billion...

That's why it would maybe be improved, maybe 16 cores, maybe with more cache... but Jaguar is still the first choice.

https://en.m.wikipedia.org/wiki/Transistor_count
Yep, that is a lot of transistors, though a lot of it is cache. It turns out both Zen-powered chips we've seen are around 5 billion transistors and 210 mm^2; density isn't bad. According to the same data (wiki), a CPU cluster and its 8MB of L3 is around 44 mm^2.
Say you remove a cluster: you're at around 170 mm^2 with an 11/16 GPU and a dual memory controller, so there's some room to go before you reach the size of the PS4 Pro and XBX SoCs (325 mm^2 and 360 mm^2).
Not a great issue, if one at all, given the massive increase in CPU performance.
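
Spelling the arithmetic out with those same wiki numbers: 210 − 44 ≈ 166 mm^2 for everything that isn't that one CCX, and with the Pro and XBX SoCs at 325 and 360 mm^2, that's roughly 160-195 mm^2 of extra area to play with on top of it.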
 
Since Nvidia GPUs will have better performance per watt, why doesn't the PS5 use an x86 CPU and an Nvidia GPU, even a discrete graphics card?

Besides, how much more would a discrete CPU/GPU cost than a single large APU on a 7nm process?
 
Maybe I'm the only one with this dream, but why not a Snapdragon SoC (or something similar) connected to a fixed-function (task-specific) hardware raytracer? The raytracer would only do the heavy lighting passes (raytraced shadows, global illumination, etc.) and the GPU on the SoC would do everything else. So the only R&D would be custom raytracing hardware that only traces rays.
 
but why not a Snapdragon SoC (or something similar) connected to a fixed-function (task-specific) hardware raytracer?
I don't think very many mainstream devs are interested in a platform with hugely different performance characteristics and method of function compared to what is traditional. Maybe such a setup would be a big hit with the Jeff Minter-type indie crowd, but when middleware and content production software are all geared towards 3D rasterizers, and especially when a blockbuster-type game costs $100+ million to produce, you don't want to fuck around.
 
I don't think very many mainstream devs are interested in a platform with hugely different performance characteristics and method of function compared to what is traditional. Maybe such a setup would be a big hit with the Jeff Minter-type indie crowd, but when middleware and content production software are all geared towards 3D rasterizers, and especially when a blockbuster-type game costs $100+ million to produce, you don't want to fuck around.

You’re probably right, I’m just looking forward to the next gen...

This seems like the route to go https://github.com/justingallagher/fpga-trace
 
FPGAs are very costly and inefficient apparatuses. They're also quite slow when doing a task, compared to an ASIC performing the same task. They'll never be part of any mainstream gaming platform, that's virtually a guarantee.
 
I remember reading that the Amiga CD32 used a 256-bit FPGA for its chunky <> planar conversion. Can't back that up though.
Hm, I thought it was built into one of the system chips used in that device... It seems overkill to use an FPGA just for that, but I'm not dismissing the possibility... The CD32 always was a two-bit device, and Commodore basically had very little to no money at that stage, due to the incredibly inept company "leadership" (read: corporate thieves!) which was more interested in reaching into the company coffers and helping themselves to all the money and benefits they could stuff into their big greedy pockets than in doing anything even approaching a decent, much less good, job. Maybe they simply couldn't spare the expense of making a proper ASIC for the task.

Gods, it still boils my blood to this day how those fuckers crashed and burned my favorite computer company of all time. I was such an Amiga fanboy, you wouldn't believe. ...Well, maybe you would! lol *runs and hides in shame* :runaway:

:LOL:

Sidenote: the Super Magicom, which I certainly never owned for my SNES, absolutely did use an FPGA for its core functionality (I noticed this because I took mine apart). But its needs were surely very modest, considering how primitive the SNES was overall.
 
The Akiko chip is what I'm seeing everywhere, but I recall a magazine at the time (back then there wasn't much internet!) saying it was a 256-bit FPGA, and explicitly stating it could be reprogrammed/repurposed. May have been bollocks, but it was a proper magazine and not some brainless trendy gaming one (Amiga User International?), and back then journalism was actually a real profession; they could be expected to do a little research instead of parroting stuff from Twitter and reddit.

Possibly the prototype used an FPGA and eventually they went with a fixed-function ASIC? Or am I hallucinating memories? Searching is made harder by the many mentions of Amigas emulated on FPGA hardware.
 
SoC transistor budget... The max budget this gen was 7 billion on the One X, quite a lot... What's the minimum for next gen, to keep costs reasonable? I say 10 billion. But for Ryzen to find a place, maybe 15 billion are needed. Opinions?!?
 
I don't think that can be answered without an idea of what density we'll get at the production node used. It's more about mm^2 of silicon than number of transistors.

https://forum.beyond3d.com/threads/console-die-sizes.53343/
Die sizes mm^2

Xbox ~230
360 ~360
XBO ~360
XBOX ~360

GameCube ~150

PS2 >500
PS3 ~500
PS4 ~350

Interesting observations there about efficiency of the silicon, and how much more Xbox achieves per mm^2 than PS. Taking the sane middle ground, 350 mm^2 seems a good spot for a console. So the estimate would be however many transistors fit into 350 mm^2 on whatever process is used. Consoles using more area than that have been loss-leaders.
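
A very rough sanity check using the numbers already in this thread: the One X packs ~7 billion transistors into ~360 mm^2 (about 19 MTr/mm^2 on 16nm), and 8-core Zen is ~4.8 billion in ~213 mm^2 (about 23 MTr/mm^2 on 14nm). If 7nm lands somewhere around 2-3x those densities (a loose assumption, since SRAM and IO scale worse than logic), 350 mm^2 would give roughly 14-21 billion transistors, which would comfortably cover the ~15 billion suggested above for a Zen-based SoC.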
 