Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

So that'll be up to the bean counters to determine on an ROI basis. I'm not sure if Sony has a similar online feedback forum where they can similarly gauge interest in the community for particular titles as MS does.

Right, which is why I say make the emulator capable of playing discs, and track which games are played via emulation. There's then data to take to rights holders in order to, hopefully, reach some distribution agreement.

I wasn't very clear though, so apologies.

Keeping on target for discussion about hardware, are there any hardware choices needed to enable BC? PS1 and 2 emulators are already plentiful so they should be a given. PS4 emulation should just be a case of a few considerations in the GPU design?? Is there anything at all that could enable PS3 emulation short of a Cell processor? Could Cell be emulated on a GPU these days?

IIRC AVX-512 is somewhere between common and not common at all on PC, but that class of instruction was used in key ways on the Cell, and it may be implemented in Zen 2.

If it does, then we're pretty much looking at a core for core match between the Cell and, I presume, an 8 core Zen 2. 3.2GHz should be a doddle too.

I have no idea about the GPU side of things, but if there's any truth to the rumour of Sony being more deeply involved in the development of Navi, maybe that's one of their goals?

I remember one particular dev who worked on one of the Tomb Raider games talking about the way that the Cell pretty much constantly had to be used just to prop up the shitty GPU. And no other developers ever had anything positive to say about said GPU, so hopefully that means it was so poor it's easy to emulate.

I really hope it is, because I think that would be a great thing to announce, on the 25th anniversary of PlayStation, a new console that will play all of the past 25 years of everything on PlayStation. Except for PT :no:
 
If it does, then we're pretty much looking at a core for core match between the Cell and, I presume, an 8 core Zen 2. 3.2GHz should be a doddle too.
What do you do about the completely different instruction set and memory management? How do you get SPU code onto Zen 2 cores with negligible overhead? Is there such a thing as a hardware transcoder that could be used??
 
What do you do about the completely different instruction set and memory management? How do you get SPU code onto Zen 2 cores with negligible overhead? Is there such a thing as a hardware transcoder that could be used??
One of the project’s primary focuses:

Short Term Goals
  • Improve SPU/PPU LLVM recompiler compatibility, add useful optimizations. (Nekotekina)
  • Use compression to store compiled PPU modules. (Nekotekina)
  • Rewrite SSSE3-using code using runtime compilation for better consistency. (Nekotekina)
  • Fix some engine-specific rendering issues (kd-11)
  • Improve the shader decompiler/recompiler (kd-11)
  • Fix remaining problems with texture readback (write color buffers) (kd-11)
Medium Term Goals
  • Improve SPU instruction accuracy for Fast Interpreter and ASMJIT, investigate vectorized software FP implementation.
  • Improve audio and video decoders for better speed and compatibility.
  • Improve controller support. This includes emulated controllers (with mouse or keyboard) and real controllers as well.
  • Improve solution structure, move and rename some files.
  • Implement missing syscalls. Allow to LLE more system modules.
  • Write automatic tests to minimize bugs.
  • Enable hardware acceleration for decryption (AES-NI). (#2457)
  • Implement config tristate in GUI for per-game configurations.
  • Improvements to the shader generation and cache system
  • Implement parametrized PPU/SPU Interpreters reusing current LLVM IR generator, remove original hard-coded interpreters and make LLVM mandatory. Add options regulating its accuracy detached from the base choice of the Interpreter or the Recompiler. (Nekotekina)
  • Remove obstacles for ASLR support. (Nekotekina)
  • Rewrite signal handlers to improve reliability and support exceptions. (Nekotekina)
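
As a rough illustration of what the SPU/PPU LLVM recompiler mentioned in the first short-term goal has to do, here's a minimal, hypothetical C++ sketch: lowering a single SPU-style 128-bit "add word" (four 32-bit lanes) onto host SSE2 intrinsics. The register-file layout and the `spu_add_word` helper are assumptions made up for this example, not RPCS3's actual code.

```cpp
// Illustrative only: how one 128-bit SPU "add word" (4x 32-bit lanes) could be
// lowered to host SSE2 intrinsics. Names and layout are made up for the example.
#include <emmintrin.h>  // SSE2

struct SpuRegisterFile {
    // The real SPU has 128 general-purpose 128-bit registers.
    __m128i gpr[128];
};

// Emulate SPU "a rt, ra, rb": per-lane 32-bit add of two 128-bit registers.
inline void spu_add_word(SpuRegisterFile& regs, int rt, int ra, int rb) {
    regs.gpr[rt] = _mm_add_epi32(regs.gpr[ra], regs.gpr[rb]);
}

int main() {
    SpuRegisterFile regs;
    regs.gpr[1] = _mm_set_epi32(4, 3, 2, 1);
    regs.gpr[2] = _mm_set_epi32(40, 30, 20, 10);
    spu_add_word(regs, 3, 1, 2);  // gpr[3] now holds {11, 22, 33, 44} per lane
    return 0;
}
```

An interpreter effectively does something like this one instruction at a time; the LLVM recompiler instead emits the equivalent host code for whole blocks, which is where most of the speed comes from.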
 
What do you do about the completely different instruction set and memory management? How do you get SPU code onto Zen 2 cores with negligible overhead? Is there such a thing as a hardware transcoder that could be used??

That's for people far smarter than I. But as anexenhume has pointed out, there's already a capable emulator for the PC.

Edit:

So it's possible even without custom hardware or some kind of hardware transcoder.

/Edit

It makes me wonder: is there hypothetically any kind of CPU-specific customisation that would be beneficial to BC?

Sebbbi's been pretty clear that chiplet CPUs can be fed sufficiently over IF, whereas capable GPUs can't. It follows that the GPU and I/O will have to be on the same die, barring some miraculous IF 2.0 bandwidth increase. But that still leaves open the possibility of chiplet CPUs.

So could any BC specific hardware customisations manifest only in the GPU + I/O, using a standard Zen 2 chiplet?
 
Didn't know PC emulation had come this far along. Impressive stuff! Performance is very CPU intensive. I feel a custom hardware block could still be useful for BC purposes.
 
Didn't know PC emulation had come this far along. Impressive stuff! Performance is very CPU intensive. I feel a custom hardware block could still be useful for BC purposes.
They're also incentivized by not having to keep legacy PS3 racks up and running for PSNow.
 
I remember one particular dev who worked on one of the Tomb Raider games talking about the way that the Cell pretty much constantly had to be used just to prop up the shitty GPU. And no other developers ever had anything positive to say about said GPU, so hopefully that means it was so poor it's easy to emulate.

Performance is less of the issue for the GPU - it's more about returning the expected results when running code written for RSX's quirks on another GPU/driver, especially from the low-level point of view. That's probably where the emulation has run into a number of graphical bugs, which may eventually be sorted out depending on how successful they are at debugging (it's almost like trying to rebuild RSX from the ground up, but in software).

I'm not sure if there would be particular issues on the legal side for Sony doing it, versus these sorts of projects.
 
I don't know much about it, but the way the PS3 has so many direct busses, the SPUs can read each other's LS, and RSX can fetch data from the SPUs, it sounds like a race-condition nightmare to emulate. Like the Amiga emulator, where many games needed the machine-wide cycle-exact mode to be activated.

One of the Cerny patents was about making each opcode take a specific time to execute. I suppose this would require additional circuitry in the CPU, getting a sequence of instructions to take a very precise time to execute. I doubt it's just padding with a number of NOPs.

They have some incentives to emulate PS3 beyond the few gamers who want it on PS5: they need lots of nodes for PSNow, and provisioning the racks would be much easier with the same x86 hardware for every game.
 
I don't know much about it, but the way the PS3 has so many direct busses, the SPUs can read each other's LS, and RSX can fetch data from the SPUs, it sounds like a race-condition nightmare to emulate. Like the Amiga emulator, where many games needed the machine-wide cycle-exact mode to be activated.

One of the Cerny patents was about making each opcode take a specific time to execute. I suppose this would require additional circuitry in the CPU, getting a sequence of instructions to take a very precise time to execute. I doubt it's just padding with a number of NOPs.

They have some incentives to emulate PS3 beyond the few gamers who want it on PS5: they need lots of nodes for PSNow, and provisioning the racks would be much easier with the same x86 hardware for every game.
IIRC from reading the patent, they planned to equalize latency by slowing clocks or turning off functional units.
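
For what it's worth, a common software analogue of "make each opcode take a known time" (not the patent's hardware mechanism, just a hedged sketch) is to charge every emulated instruction a fixed cycle cost and only let each emulated unit advance to a shared target timestamp, which is also roughly what an Amiga-style cycle-exact mode does to avoid the race conditions described above. The opcode cost table and structure names here are invented for illustration.

```cpp
// Illustrative sketch of cycle-budgeted emulation (a software analogue of
// "make each opcode take a known time"); not the mechanism from the patent.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Op { int kind; };  // stand-in for a decoded guest instruction

// Hypothetical fixed cost table: every opcode kind is charged a known cycle count.
constexpr uint32_t cycle_cost[] = { 1, 2, 6, 40 };  // e.g. ALU, load, branch, DMA kick

struct Core {
    uint64_t cycles = 0;  // guest cycles consumed so far
    size_t   pc     = 0;
};

// Run one core until it reaches the target timestamp, then stop so the other
// emulated units (another SPU, RSX, ...) can catch up to the same point.
void run_until(Core& c, const std::vector<Op>& program, uint64_t target_cycle) {
    while (c.cycles < target_cycle && c.pc < program.size()) {
        const Op& op = program[c.pc++];
        // execute(op) would go here
        c.cycles += cycle_cost[op.kind];  // deterministic timing, no NOP padding
    }
}
```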
 
Short answer to the question somebody asked me about "what would replace unified shading architecture" way back in the thread.

Looking at Nvidia and the mobile lines of "non central processing units", it seems that we are past the sea (or ocean) of identical processing units.
AMD and Nvidia GPUs from the last couple of years were the pinnacle of GPU design fitted to GPGPU needs: lots of compute units able to process 32-bit numbers (with various takes on 64-bit ones).
That move was not justified by graphics accuracy requirements, as such precision is not required all the time and there are ways to do more with less.

Now it is getting clearer that post-processing, geometry, shading, shadowing, etc. have different hardware requirements, and that the units handling those requirements may have other bankable uses for GPU manufacturers. I skimmed an Nvidia presentation about "tasklets" (what those really are I don't know); I hardly understood it, though it seemed clear that the technology is held back by committee design at the API level.
The nice thing nowadays is mobile and those damned power constraints; that could be the push for Apple, for example, to break away from the pack and get the Metal API ahead of the curve. Those specialized units are more power efficient than a bunch of generic FP/Int32 ALUs. The extra power could be used for upgraded mobile gaming (including more serious/traditional gaming), but also VR and, a more sensible choice for me at least, AR.

So, back to my initial post: I don't know where AMD is right now, nor do I know Sony's or MSFT's plans, but I wonder if the way graphics are done could shift quickly after the release of the next generation of consoles. The consoles would find themselves on the wrong side of tech, as the PS360 did (foremost on the CPU side of the equation for the 360).
 
The consoles would find themselves on the wrong side of tech, as the PS360 did (foremost on the CPU side of the equation for the 360).

I wouldn't necessarily call it the "wrong side" when consoles have a way of defining how far graphics even on the PC will go. And I don't think the Xenon CPU was on the wrong side of technology; that was Cell. Xenon was actually somewhat in line with where PC CPUs were going (symmetric multicore with extended FPU/vector ISAs).

IIRC AVX-512 is somewhere between common and not common at all on PC, but that class of instruction was used in key ways on the Cell, and it may be implemented in Zen 2.

If it does, then we're pretty much looking at a core for core match between the Cell and, I presume, an 8 core Zen 2. 3.2GHz should be a doddle too.

Cell's SPEs were only 128 bits wide, using a very custom instruction set, presumably based on IBM's AltiVec ISA.

I doubt AVX-512 lacks any instructions found in what is a 15-year-old design. PC CPUs were on par with Cell's vector/FP performance years ago, and without the other huge limitations Cell suffers from.
 
My memory is a bit swizzled, but I thought there were FMA4-related things that the 360 could do that were missing on recent architectures, because reasons. Not sure if Cell had it too, or whether it's even a particular issue these days.

@3dilettante or @sebbbi might recall :?:
 
XB360 had a single-cycle dot product instruction. If it doesn't feature in subsequent processors, I guess it didn't make much difference.
 
Debatable, I suppose, since a lot will be dictated by Intel, who serve more than just gamers.
sebbbi. :V
See also confusion between AMD's plans for FMA4 vs Intel's switch to FMA3.

It may have had something to do with packing vectors into one register (4x 32-bit -> 128-bit) & being able to do things from there. o_O It was just one of those nice things to have (I thought it was sebbbi who mentioned it here years ago when comparing CPUs).

¯\_(ツ)_/¯
 
I'm not versed in VMX-128. Searching did turn up some discussion by sebbbi concerning the 360's dot product instructions and their utility in dealing with data in Array of Structures layout.
https://forum.beyond3d.com/posts/1595940/

There was an SSE extension that introduced a dot product instruction, although what Intel called its dot product probably did not match VMX-128. The initial x86 variant had the typical x86 destructive 2-operand encoding, and although it had a conditional mask for write-back, it worked by either writing the result over a destination element or setting it to zero. Merging into a result register was apparently more complicated.
With the AVX VEX and EVEX encodings it became possible for a dot product to have a destination register that isn't one of the sources, at least for single precision. Other constraints may apply.
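
For reference, the SSE4.1 instruction being described is DPPS, exposed as the `_mm_dp_ps` intrinsic. A small example (assuming an SSE4.1-capable target) showing how the immediate's upper bits pick the lanes that get multiplied and summed, while the lower bits pick which destination lanes receive the sum and which are zeroed:

```cpp
// SSE4.1 DPPS via the _mm_dp_ps intrinsic. The upper 4 bits of the immediate
// choose which lanes are multiplied and summed; the lower 4 bits choose which
// destination lanes receive the sum, with the rest written as zero - matching
// the "overwrite or zero" write-back behaviour described above.
#include <smmintrin.h>  // SSE4.1
#include <cstdio>

int main() {
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  // lanes: {1, 2, 3, 4}
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);  // lanes: {5, 6, 7, 8}

    // 0xF1: use all four lanes (0xF0), write the sum to lane 0 only (0x01).
    __m128 dp = _mm_dp_ps(a, b, 0xF1);

    float out[4];
    _mm_storeu_ps(out, dp);
    // 1*5 + 2*6 + 3*7 + 4*8 = 70  ->  out = {70, 0, 0, 0}
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```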

The 360 did have FMA, which Jaguar did not. Bulldozer had FMA4, whereas Zen officially does not (for undisclosed reasons the instructions are still executed by current Zen cores, however).
https://www.agner.org/optimize/blog/read.php?i=838
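
To make the FMA point concrete, here's a hedged sketch (requires an FMA3-capable compiler target, which Jaguar isn't) contrasting a fused multiply-add with the separate multiply and add a non-FMA core has to issue; FMA4 differs mainly in allowing a fourth, non-destructive destination operand.

```cpp
// FMA3 (_mm_fmadd_ps) vs the separate mul+add a non-FMA core like Jaguar issues.
// Needs a CPU/compiler target with FMA3 (e.g. -mfma); purely illustrative.
#include <immintrin.h>

__m128 fused(__m128 a, __m128 b, __m128 c) {
    return _mm_fmadd_ps(a, b, c);            // a*b + c: one instruction, one rounding
}

__m128 unfused(__m128 a, __m128 b, __m128 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);  // two instructions, two roundings
}
```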
 
Interesting given Sony’s AVX512 LLVM commits.

A later tweet posted a link to the listing that contained that entry: http://pci-ids.ucw.cz/read/PC/1022
However, the entries with Arden in them do not say family 18h. There is a set of PCI devices labelled as Arden device 18h, whereas other entries that mention the various families like 17h will state "family 17h". There are other device 18h IDs that are associated with family 17h chips.
That leaves the possibility of two separate numbering schemes in hex that are running into similar counts.

Family 18h was already documented elsewhere as being taken by something non-Sony--the Hygon family of chips being built by AMD's joint venture in China.
https://www.phoronix.com/scan.php?page=news_item&px=Hygon-Dhyana-AMD-China-CPUs
 