Next Generation Hardware Speculation with a Technical Spin [2018]

Status
Not open for further replies.
also the 40 bit Jaguar's memory address space means less data moving around... So saving bandwidth ...
Wot? I think you're completely out of your depth here and don't understand a thing you're talking about - not to be rude, but to tell you to take a step or six back, because this is a Beyond3D tech forum and we want a decent signal:noise ratio, and completely clueless (if well-intentioned) posts just generate noise. Links to verbose articles with no explanation of what they're supposed to be saying just waste users' time.

40 bit addressing means how much memory can be addressed - seen and accessed by the processor. 36 bits was 64 GBs, or roughly 64 billion unique locations, on Bobcat. 40 bits is 1 TB, or a trillion unique locations. That has nothing to do with data moving around, which is determined by the memory system and bus widths and whatnot at however many bits per clock. You'll be reading/writing so many bits to/from RAM and so many to/from caches. Even if Jaguar were somehow reading 40 bits at a time instead of 64, which I guess is what you're saying, if you have a 64 bit number you still need to read the whole thing in. And if it's 32 bits, how does a 40 bit read help??
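To make that distinction concrete, here's a toy Python sketch separating address-space size from bandwidth. The function names are made up for illustration, and the 256-bit / 5500 MT/s figures are just PS4-like example numbers:

```python
# Address-space width sets how much memory can be *named*,
# not how much data moves per access.

def addressable_bytes(address_bits: int) -> int:
    """Total byte locations a processor can address."""
    return 2 ** address_bits

# Bobcat-era 36-bit physical addressing vs Jaguar's 40-bit:
print(addressable_bytes(36) // 2**30)  # 64 (GiB)
print(addressable_bytes(40) // 2**40)  # 1 (TiB)

# Bandwidth is a separate property of the memory system:
def bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Peak bandwidth = bus width in bytes * transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000

# A 256-bit GDDR5 bus at 5500 MT/s (PS4-like figures):
print(round(bandwidth_gb_s(256, 5500)))  # 176 (GB/s)
```

Widening the address space from 36 to 40 bits changes the first pair of numbers only; the bus width and transfer rate alone set the second.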

If you don't have the technical know-how to make a technical argument in favour of Jaguar, please refrain from posting. Feel free to believe what you want but there's no discussion to be had.
 
But HDDs are not evolving anymore to allow large asset streaming. Actually they are degrading in performance (shingled magnetic recording). And I doubt SSDs will be cheap enough.
There is still an increase in transfer rate every time they improve the tracks' linear data density. Shingling didn't, but the latest 5400 RPM 2.5" drives can do 130MB/s, which is much better than the launch 500GB specs. With HAMR there should be another (small) increase too.
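As a rough illustration of why density gains only partially show up in sequential speed: areal density improvements split between more tracks per inch and more bits per inch, so the linear (sequential) rate grows roughly with the square root of the areal density. Numbers here are purely illustrative:

```python
# Toy model: sequential transfer rate scales ~sqrt(areal density),
# because only the bits-per-inch half of a density gain speeds up reads.
import math

def transfer_rate(base_rate_mb_s: float, density_gain: float) -> float:
    """Estimated sequential rate after an areal-density improvement."""
    return base_rate_mb_s * math.sqrt(density_gain)

# e.g. a 4x areal density gain over an old ~65 MB/s 2.5" drive:
print(round(transfer_rate(65, 4)))  # 130 -> in line with current 5400s
```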

But yeah, it's not really improving enough. Which is why a lot of people are proposing a tiered storage setup with something like a 2TB HDD and 128GB of flash. Copying the next level/area to flash while playing should be workable. Something developer-controlled instead of the unpredictable block cache of hybrid drives.
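A hypothetical sketch of what developer-controlled tiering could look like: the game explicitly stages the next area's asset pack into the flash tier while the current area plays, instead of relying on a hybrid drive's opaque block cache. Tiers are modeled as dicts for brevity; every name here is invented for illustration:

```python
# Toy two-tier store: big/slow HDD tier, small/fast flash tier.
hdd_tier = {"level1.pak": b"<level1 assets>", "level2.pak": b"<level2 assets>"}
flash_tier = {}          # e.g. a 128 GB flash partition
FLASH_CAPACITY = 2       # max packs resident in flash (toy limit)

def prefetch(pack: str) -> None:
    """Stage a pack into flash in the background, evicting if full."""
    if pack in flash_tier:
        return
    if len(flash_tier) >= FLASH_CAPACITY:
        flash_tier.pop(next(iter(flash_tier)))  # evict oldest staged pack
    flash_tier[pack] = hdd_tier[pack]           # slow sequential HDD read

def read_pack(pack: str) -> bytes:
    """Serve from flash when staged; fall back to the HDD otherwise."""
    return flash_tier.get(pack) or hdd_tier[pack]

prefetch("level2.pak")   # kicked off while level 1 is still playing
```

The point is that the developer knows what's needed next; a drive-level block cache can only guess.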
 
Because supposedly in 2020 they'll have access to a new "next-gen" GPU architecture instead of old GCN 6 found in Navi

[Image: AMD-2018-GPU-roadmap.jpg]

https://wccftech.com/amd-new-major-gpu-architecture-to-succeed-gcn-by-2020-2021/
hmm interesting.
I've been looking around here on the graphics forum and couldn't find a thread on this. Is there anywhere discussing this next gen, post-Navi? Curious to learn more if the information is available.
Also seems like a good area to begin 'next gen'.
 
There are a few mentions of the next gen in the Navi thread, since it's the next box after Navi. There's not been much else to go on, and I wouldn't expect that to be more than a placeholder name.
The claim that it is not GCN is a more recent one, and whether that's worth starting a thread over might also come down to the source's credibility.
 
but the latest 5400 2.5 can do 130MB/s which is much better than the launch 500GB specs.
It's also roughly as good as my 10k RPM 1TB Raptor drives, or maybe even better, I'm not sure what the linear transfer rate is. My drives should still beat these on access time/IOPS tho, but they're getting rather obsolete these days.

The HDD was invented in the 1950s, it's amazing these mechanical devices are still relevant. Pretty much nothing else in the way of high-level components found in computers was around then other than the basest of building blocks; DRAM wasn't invented yet, graphics did not exist, computers output to printouts, punchcards or maybe numerical displays and so on. CPUs were several cabinets' worth of electronics at least (or maybe an entire room) as the integrated circuit had yet to be invented so if the computer even used transistors it was discrete components being used... But the humble harddrive... It's still carrying on. :)

There's nothing in the consumer space built with the precision of a modern harddrive, and they're really underappreciated. Most of the time people curse them for how slow they are, but these things really are quite incredible. They pack tens of thousands of tracks per inch onto the platters, that's pretty crazy. And still they're also very cheap, unless you get right up to the bleeding edge drive sizes that is, that's even crazier. :)
 
I understand that in a console environment with a unified memory pool and greatly optimized code, most of Ryzen's benefits over Jaguar are lost... But Ryzen uses a lot more silicon that could be better utilized... That's why I keep on saying Jaguar will be used, maybe an improved version, ok... https://www.realworldtech.com/jaguar/
I would be surprised if that were the case; be it in the XB1 or the PS4, the whole CPU complex is an ugly patchwork plagued by terrible latency. As with most previous AMD architectures, I suspect the memory controllers are doing a bad job at extracting bandwidth from RAM. Then there is serial performance, which is critical: lots of operations are simply faster on Ryzen, much faster. They still fall behind Intel cores significantly, but compared to previous generations, from Bulldozer to Jaguar, it is a massive leap forward.
Improving Jaguar isn't trivial at all; the whole thing would have to be reworked down to the cache hierarchy and memory subsystem. It lags significantly behind what ARM offers: native support for 8-core clusters and another level of cache, for example, and overall more flexible cache configurations. I would also assume the latest ARM cores are better by themselves, and by the time the next gen releases ARM should have wider cores available (a follow-up to the A72) and a further refined (even if only slightly) A73 (they usually iterate quite fast). I'm not necessarily advocating a jump to ARM; I just want to show that it is a dated architecture. I find it "sad" because it was a great starting point, BUT AMD simply could not iterate on that architecture to make it shine due to their financial situation at the time.

EDIT
On the matter of ARM cores, I forgot they already introduced their DynamIQ cores, the A75/A55, which should significantly raise the bar above their prior CPUs (especially the A55 against either the A53 or the A35).
 
I think 48-bit vs 40-bit address space adds to memory controller complexity, and since the controllers are shared between the GPU and CPUs, that steals silicon, energy, and also somewhat bandwidth, because addresses are also data to be moved around - mostly inside the registers of the CPU and GPU of course, but also out to cache and RAM. This is what I think. Of course IMHO
 
You can't really have opinions on the function of technical matters; you can't say like, "well you can drive forwards in reverse gear - IMHO". Alternative facts isn't an actual thing.
 
I think 48-bit vs 40-bit address space adds to memory controller complexity, and since the controllers are shared between the GPU and CPUs, that steals silicon, energy, and also somewhat bandwidth, because addresses are also data to be moved around - mostly inside the registers of the CPU and GPU of course, but also out to cache and RAM. This is what I think. Of course IMHO
Are you talking about the physical address space increasing from 40 to 48 in AMD64?
Addresses used by programs would be virtual addresses, and in 64-bit mode they are 64 bits (48 bits sign-extended to 64).
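A quick sketch of that sign-extension (canonical address) rule, with hypothetical helper names: bits 48..63 of a 64-bit virtual address must all replicate bit 47, so program-visible addresses stay 64 bits wide regardless of the physical width.

```python
# AMD64 canonical-address rule: a 48-bit virtual address is
# sign-extended into 64 bits (bits 48..63 copy bit 47).

def sign_extend_48(va48: int) -> int:
    """Sign-extend a 48-bit virtual address to 64 bits."""
    if va48 & (1 << 47):                  # bit 47 set -> upper half
        return va48 | (0xFFFF << 48)
    return va48

def is_canonical(va64: int) -> bool:
    """True if the top 17 bits (47..63) are all 0s or all 1s."""
    top = va64 >> 47
    return top == 0 or top == 0x1FFFF

print(hex(sign_extend_48(0x0000_7FFF_FFFF_FFFF)))  # low canonical half
print(hex(sign_extend_48(0x0000_8000_0000_0000)))  # 0xffff800000000000
print(is_canonical(0x0000_8000_0000_0000))         # False: non-canonical
```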

The physical address values wouldn't show up in program data, as they are internal to the memory management hardware and cache hierarchy. If they were to make it to a storage location or be operated on by code, neither 40 nor 48 bits fits 32- or 64-bit granularity anyway. Some of the translation and tag hardware is a bit bigger in a few subsections of the chip, which seems survivable.
An external chip-to-chip link using classical Hypertransport might need an extra packet for handling more than 40 bits, but not much has been said about the latest fabric packet format. The fabric's various intermediary blocks could abstract it out for most of the SOC.
A single-chip solution with no aspirations for addressing even up to 40 bits of memory could still default to the compact format, and since the hardware still needs to implement the standard the complexity for the other mode is there regardless.

The on-die fabric has lower costs and could be customized, should there be a notable benefit for doing so. An identifier that has a few more bits being ignored isn't blowing out the complexity in the controller.
 
It's also roughly as good as my 10k RPM 1TB Raptor drives, or maybe even better, I'm not sure what the linear transfer rate is. My drives should still beat these on access time/IOPS tho, but they're getting rather obsolete these days.

The HDD was invented in the 1950s, it's amazing these mechanical devices are still relevant. Pretty much nothing else in the way of high-level components found in computers was around then other than the basest of building blocks; DRAM wasn't invented yet, graphics did not exist, computers output to printouts, punchcards or maybe numerical displays and so on. CPUs were several cabinets' worth of electronics at least (or maybe an entire room) as the integrated circuit had yet to be invented so if the computer even used transistors it was discrete components being used... But the humble harddrive... It's still carrying on. :)

There's nothing in the consumer space built with the precision of a modern harddrive, and they're really underappreciated. Most of the time people curse them for how slow they are, but these things really are quite incredible. They pack tens of thousands of tracks per inch onto the platters, that's pretty crazy. And still they're also very cheap, unless you get right up to the bleeding edge drive sizes that is, that's even crazier. :)
Oh yeah, the HDD is a mechanical engineering marvel. The delay of HAMR is a big problem, but cost per GB remains pretty good. Seagate is planning multiple actuators to keep up with the bandwidth needed, though that's probably enterprise-only. For small capacities or small form factors, flash has replaced the HDD. The high end is now flash too; apparently the 10k and 15k drives are no longer cost effective. But 5400/7200 RPM drives are the kings of cost per GB with reasonable performance. Especially for a console, where the procurement deals are around $35 for whatever single-platter 2.5" drive they can get. This won't change any time soon.

Back when I was studying, the guy teaching the CPU/MCU class brought a PDP-10 into the lab (I think, or some kind of DEC from the 60's, I forget) along with its HDD and wanted to make it work by the end of the semester. The HDD was the size of a dishwasher, 5MB, and the whole system needed many kilowatts of three-phase power. It ended up moved to a corner of the metal workshop to get the proper current. :LOL:
 
Yes 3dilettante... But consider ALL the extra memory controller wiring needed across ALL of the APU for 48-bit vs 40-bit physical addressing... how much extra silicon is needed? With that extra, maybe a couple of Jaguar cores could be built... Sorry for playing devil's advocate...
 
Yes 3dilettante... But consider ALL the extra memory controller wiring needed across ALL of the APU for 48-bit vs 40-bit physical addressing... how much extra silicon is needed? With that extra, maybe a couple of Jaguar cores could be built...
You're suggesting a 17% reduction in memory controller wiring equates to the same die area of two Jaguar cores? Have you spent even 15 seconds thinking about that before presenting it as an option? For that to be true, for '8 pins' of memory addressing to be worth 2 cores, 48 pins would be worth 12 cores. The memory controller would take up all the space of 12 CPU cores. Unless you've never seen a CPU die shot in your life, that should immediately strike you as ridiculous, ergo the idea that considerable silicon savings are to be had by reducing physical addressing from 48 bits to 40 bits would never even be entertained.
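The reductio above can be spelled out in a few lines. The linear per-bit scaling here is the poster's own (absurd) premise, not how memory controllers actually scale - that's the point of the argument:

```python
# If dropping 8 of 48 physical address bits really freed the area of
# two Jaguar cores, then by the same (false) proportionality the full
# 48 bits of addressing would account for 12 cores' worth of silicon.

saved_bits = 48 - 40
cores_claimed = 2
cores_per_bit = cores_claimed / saved_bits
implied_controller_cores = cores_per_bit * 48
print(implied_controller_cores)  # 12.0 -> the addressing alone would
                                 # dwarf the whole 8-core CPU complex
```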
 
You're suggesting a 17% reduction in memory controller wiring equates to the same die area of two Jaguar cores? Have you spent even 15 seconds thinking about that before presenting it as an option? For that to be true, for '8 pins' of memory addressing to be worth 2 cores, 48 pins would be worth 12 cores. The memory controller would take up all the space of 12 CPU cores. Unless you've never seen a CPU die shot in your life, that should immediately strike you as ridiculous, ergo the idea that considerable silicon savings are to be had by reducing physical addressing from 48 bits to 40 bits would never even be entertained.

Think of all the additional CPU cores Xbox One X could have if they cut down from 384 bit memory addressing to 16bit!
 
The One X is still 40-bit memory addressing... And yes, I don't know how much more silicon would be used with 48-bit addressing, but SURE there's no benefit at all from that...
 
Because supposedly in 2020 they'll have access to a new "next-gen" GPU architecture instead of old GCN 6 found in Navi

[Image: AMD-2018-GPU-roadmap.jpg]

https://wccftech.com/amd-new-major-gpu-architecture-to-succeed-gcn-by-2020-2021/
Wouldn't they be privy to that tech already?

Chances are that 7nm won't be very cost effective without EUV. So a next gen before 7nm+ would cost a lot per mm2?

It's phenomenal that they continued to shrink feature sizes down to 7nm with a 193nm light source. I don't clearly understand how the interferometry works, but they say the multi-patterning rocket surgery means cost per mm2 rises significantly using these tricks.
Yes but is that cost reduction more important than a possible year headstart over the competition?
 
The One X is still 40-bit memory addressing... And yes, I don't know how much more silicon would be used with 48-bit addressing, but SURE there's no benefit at all from that...
Yours is a dead horse... Please stop beating it. Move on to another topic of discussion, you're just creating noise.
 
yes 3dilettante... But consider ALL the extra memory controller wiring needed for ALL the APU 40 bit vs 48 phisical adressing... how much extra silicon is needed ? With that extra maybe a couple of Jaguar core can be build... Sorry for making devil advocate...
For the memory controllers, I'm not entirely sure that would change as much as you think if their area went to zero. The extra 8 bits isn't impactful for much of the system, and the controllers themselves are more readily adjusted to a system that knows that even 40 bits is overkill.
The areas potentially impacted are the incrementally larger translation buffers and cache structures for tags within the core and cache sections. Some of the structures use hashes, which means they wouldn't be scaling by 20%.
Many of these are the small rectangles next to the big rectangles that make up the arrays in the die shots for the core complexes, and a few small rectangles in the little rectangles in the core areas. Overall it's a minor part of a minority of the die, and while there may be some benefit in the most tightly constrained parts of the pipeline, that would have been more difficult multiple nodes ago when it was 40, so some of that would have been compensated for by now. At that point, it's also not significantly changing anything, the operations for 40 and 48 are not different, just mildly larger in a few spots.
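For a feel of the magnitude, here's a back-of-envelope estimate of the tag storage for a physically tagged cache. The geometry (2 MiB 16-way L2, 64 B lines) is a Jaguar-like assumption for illustration, not a confirmed figure:

```python
# Tag width = physical address bits - set-index bits - line-offset bits.
import math

def tag_bits(phys_bits: int, cache_bytes: int, ways: int, line: int) -> int:
    """Per-line tag width for a set-associative, physically tagged cache."""
    sets = cache_bytes // (ways * line)
    return phys_bits - int(math.log2(sets)) - int(math.log2(line))

def tag_array_kib(phys_bits: int, cache_bytes: int, ways: int, line: int) -> float:
    """Total tag storage in KiB (ignoring valid/state bits)."""
    lines = cache_bytes // line
    return lines * tag_bits(phys_bits, cache_bytes, ways, line) / 8 / 1024

kib40 = tag_array_kib(40, 2 * 1024 * 1024, 16, 64)
kib48 = tag_array_kib(48, 2 * 1024 * 1024, 16, 64)
print(kib40, kib48)  # 92.0 124.0 -> ~32 KiB extra tags on 2 MiB of data
```

So widening the physical address from 40 to 48 bits costs on the order of 1-2% of the cache's own area in extra tags - real, but nowhere near whole CPU cores.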

External to the TLBs and tags, there can be some entries in the IOMMU that might be mildly larger. The GPU's virtually addressed cache hierarchy doesn't need the physical address bits.
The rest of the system can be changed to be aware that the system it's in doesn't need the extra bits. There have been examples of CPUs in the past that dropped some of the external address bits, as they weren't intended for systems that needed full addressing. Many of the VM data structures are already formatted to take up additional space, since AMD64 allowed for room to grow from 40 all the way to 52, if they ever opted for it.
 