PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Status
Not open for further replies.
You will be reading from and writing to both RAM pools continuously. I really doubt it will just be a simple read/write buffer. 32MB isn't a lot, but it can store enough for the most needed bits of memory, and the large textures can be stored in main memory. Writing between eSRAM and main memory can be done by the move engines.

And writing between the eSRAM and DRAM uses the DRAM bandwidth.
 
Of course, but reading in the PRT (partially resident) textures and copying a completed framebuffer out at the end of the frame render is a small % of the overall bandwidth consumed while generating a frame. It's pretty clear that MS have measured 150GB/s to their eSRAM during real-world usage; getting that read/write activity off the system RAM bus has to result in less contention.
 
That is an assumption I would not make. Just because they hit 150GB/s doesn't actually mean anything in terms of a contention comparison. It's all about the sample interval: if your 95th-percentile average, or something like that, is 150GB/s over the entire frame, then it means something. Remember, GPUs can tolerate latency.
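To make the sample-interval point concrete, here is a small sketch that compares the busiest window, a nearest-rank 95th percentile, and the whole-frame average for a set of per-interval byte counts. The numbers in the demo are synthetic, made up purely for illustration; they are not measurements from either console, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rates_gb_s(bytes_per_interval: Sequence[float], interval_s: float) -> list:
    """Convert per-interval byte counts into GB/s (decimal GB) for each interval."""
    if not bytes_per_interval or interval_s <= 0:
        raise ValueError("need samples and a positive interval length")
    return [b / interval_s / 1e9 for b in bytes_per_interval]


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # integer-friendly to avoid float drift
    return ordered[max(rank, 1) - 1]


def summarize(bytes_per_interval: Sequence[float], interval_s: float) -> dict:
    """Peak, p95 and frame-average bandwidth for equally spaced samples."""
    rates = rates_gb_s(bytes_per_interval, interval_s)
    return {
        "peak": max(rates),
        "p95": percentile_nearest_rank(rates, 95),
        "frame_average": sum(rates) / len(rates),  # valid because intervals are equal
    }


if __name__ == "__main__":
    # SYNTHETIC frame: twenty 1 ms windows, one burst at 150 GB/s, the rest at 40 GB/s.
    samples = [40e6] * 19 + [150e6]
    print(summarize(samples, 1e-3))
```

With those made-up inputs the peak reads 150GB/s while the 95th percentile is 40GB/s and the frame average is 45.5GB/s, which is the gap between "we hit 150GB/s" and "we sustain 150GB/s".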

There will be a lot of data being read from and written to the eSRAM; it's too small for it to be otherwise. There will still be a lot of data being read/written from the GPU to the DRAM, and there will still be data read/written from the CPU to the DRAM.

Remember, GPUs have caches; it's not like every read and write on the PS4 is going to DRAM either.

Personally, I think devs will find the best way to use all the resources on the boxes. So unless the eSRAM is going to buy them the ability to run algorithms on the shaders that the PS4 can't, I don't think the eSRAM is going to be any kind of performance multiplier, as I expect shader utilization to be high on both platforms regardless.
 
It may be a small percentage, but it is not free, neither bandwidth- nor latency-wise, and as such it is important to keep in mind how often you use it.

Just to throw out some numbers (based on the time it takes to copy the data).

25.6GB/s in MB/s is 26214.4MB/s (using 1GB = 1024MB)
26214.4MB/s / 30fps is ~874MB/frame

In other words, copying 874MB of data to/from the eSRAM would take an entire frame.

You can do a full fill/read of the eSRAM about 27 times per frame if it's doing nothing else (874 / 32 ≈ 27.3).
This means that each total fill/read (i.e. 32MB) takes roughly 1/27th of a frame to complete, about 1.2 ms.

That doesn't seem so low to me.
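The arithmetic in this post can be reproduced with a few lines. This is a back-of-envelope sketch assuming the 25.6GB/s DME figure quoted in the thread, binary units (1GB = 1024MB) as the post uses, and no other traffic on the bus; the constant and function names are hypothetical.

```python
# Back-of-envelope DME copy budget.
# Assumptions: 25.6 GB/s peak DME throughput (figure quoted in the thread),
# binary units (1 GB = 1024 MB), and nothing else competing for the bus.

DME_PEAK_GB_S = 25.6
ESRAM_MB = 32


def mb_per_frame(bandwidth_gb_s: float, fps: float) -> float:
    """Megabytes that can be moved in one frame at the given peak bandwidth."""
    return bandwidth_gb_s * 1024 / fps


def full_esram_copies_per_frame(bandwidth_gb_s: float, fps: float,
                                esram_mb: float = ESRAM_MB) -> float:
    """How many times the whole eSRAM could be filled (or drained) per frame."""
    return mb_per_frame(bandwidth_gb_s, fps) / esram_mb


def ms_per_full_copy(bandwidth_gb_s: float, esram_mb: float = ESRAM_MB) -> float:
    """Milliseconds to move the whole eSRAM once, independent of frame rate."""
    return esram_mb / (bandwidth_gb_s * 1024) * 1000


if __name__ == "__main__":
    for fps in (30, 60):
        print(f"{fps} fps: {mb_per_frame(DME_PEAK_GB_S, fps):.1f} MB/frame, "
              f"{full_esram_copies_per_frame(DME_PEAK_GB_S, fps):.1f} full copies/frame")
    print(f"one 32MB copy: {ms_per_full_copy(DME_PEAK_GB_S):.2f} ms")
```

At 30fps this gives ~873.8MB/frame and ~27.3 full copies per frame; at 60fps it gives ~13.7, which lines up with the "13 times @ 60fps" figure used later in the thread. One full 32MB copy takes ~1.22 ms regardless of frame rate.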
 
Where is the 25.6 from?
 
That's using a single DME, which would make it the slowest you could expect it to be.
The fastest would be using all 4 together.
 
Not unless you're doing interleaved reads/writes, and the DMEs are bad at that.

The four move engines share a single memory path, yielding a total maximum throughput for all the move engines that is the same as for a single move engine.
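A toy model shows why splitting a copy across engines doesn't help when they share one path. This sketch assumes the shared path caps the aggregate at the 25.6GB/s figure used in this thread and is divided evenly between active engines; real arbitration may differ, and `copy_time_ms` is a hypothetical helper.

```python
# Toy model of several move engines sharing one memory path.
# Assumption: the shared path caps aggregate throughput at 25.6 GB/s and is
# split evenly between active engines (real arbitration may behave differently).

SHARED_PATH_GB_S = 25.6


def copy_time_ms(total_mb: float, engines: int,
                 shared_gb_s: float = SHARED_PATH_GB_S) -> float:
    """Time to move total_mb when the work is split evenly across `engines` DMEs."""
    if engines < 1:
        raise ValueError("need at least one engine")
    per_engine_gb_s = shared_gb_s / engines   # each engine gets a slice of the path
    per_engine_mb = total_mb / engines        # ...and a slice of the data
    # Engines run in parallel, so the copy finishes when one slice is done.
    return per_engine_mb / (per_engine_gb_s * 1024) * 1000


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} engine(s): {copy_time_ms(32, n):.3f} ms for 32MB")
```

Every engine count prints the same ~1.221 ms, which is the point of the quoted sentence: using all four together is no faster than using one.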
 
But copying with the GPU is only 2x faster, and it wastes GPU cycles. Besides, how realistic is this scenario in real titles? Why would anyone just move data around for nothing?

It's not for nothing, it's for processing; I'm just pointing out that it takes more than an 'insignificant amount of time'.
 
Why anyone would need to copy a finished frame is beyond me. Also, PRTs don't need to be refreshed completely every frame. I think the point is that the DMEs free up the GPU. At 25.6GB/s you can refresh the entire 32MB 13 times at 60fps; each full refresh is roughly 1M cycles, versus ~500K when going through the GPU? It's pretty insignificant IMO.
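The cycle estimate in this post can be checked roughly. The thread doesn't state a GPU clock, so this sketch assumes a hypothetical 800 MHz clock, 25.6GB/s for a DME copy, and "the GPU is 2x faster" taken literally (51.2GB/s); all names are illustrative.

```python
# Rough GPU-cycle cost of refreshing all 32MB of eSRAM.
# Assumptions (NOT stated in the thread): 800 MHz GPU clock, 25.6 GB/s DME
# copy, and a GPU copy at exactly 2x that (51.2 GB/s). Binary units throughout.

GPU_CLOCK_HZ = 800e6          # hypothetical clock used only for this estimate
ESRAM_BYTES = 32 * 1024**2


def copy_cycles(num_bytes: float, bandwidth_gb_s: float,
                clock_hz: float = GPU_CLOCK_HZ) -> float:
    """GPU clock cycles that elapse while num_bytes move at bandwidth_gb_s."""
    seconds = num_bytes / (bandwidth_gb_s * 1024**3)
    return seconds * clock_hz


def cycles_per_frame(fps: float, clock_hz: float = GPU_CLOCK_HZ) -> float:
    """Total GPU cycles available in one frame."""
    return clock_hz / fps


if __name__ == "__main__":
    dme = copy_cycles(ESRAM_BYTES, 25.6)
    gpu = copy_cycles(ESRAM_BYTES, 51.2)
    frame = cycles_per_frame(60)
    print(f"DME: {dme:,.0f} cycles, GPU: {gpu:,.0f} cycles, "
          f"frame budget: {frame:,.0f} cycles")
    print(f"one DME refresh = {dme / frame:.1%} of a 60 fps frame")
```

Under those assumptions a DME refresh comes to ~977K cycles and a GPU copy ~488K, matching the "1M vs 500K" estimate, and one refresh is about 7.3% of a 60fps frame. Whether that is "insignificant" depends on how often it happens, which is the disagreement in the thread.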
 
I'm not sure that's the point I was trying to make. I only mentioned the low % of reads/writes to DRAM as opposed to eSRAM in response to the post saying that 32MB of eSRAM is so small it's irrelevant to the overall bandwidth picture.

Clearly data is going to need to be moved between DRAM and eSRAM, but a bigger proportion of bandwidth is going to be consumed by the GPU reading/writing intermediate data in eSRAM. That has to be the point of having it in the first place.

Getting back to my original question, it seems that in the absence of any evidence we can only guess at the real-world performance of the Orbis bus. I proffer that it will not do any better than the X1 at 75%. Happy to hear anybody's ideas about why it could be more or less than this.
 
I'm merely giving upper bounds on specific buses and the times they take. The reason I gave numbers for the entire thing is that it's easier to reason with, that's all.

Yeah, you should be able to fill/empty it 13 times at 60FPS, but that's using the entire peak bandwidth of the DMEs, and they hang off a bus that has other devices on it (although those devices do not use a great deal of bandwidth).

Not to mention that's also using ~1/2 of your DDR bandwidth.
 
Well, to put it in perspective: the PS4, at 176GB/s with 8GB of RAM, allows ~3GB of data copying per frame at 60fps, so it'll take ~3 frames to copy the full 8GB? I don't get what you are saying at all.
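The PS4 comparison can be worked the same way. This sketch assumes the 176GB/s peak and 8GB figures from the post, decimal GB, and by default counts each byte of traffic once, as the post does. The optional `read_and_write` flag (a hypothetical parameter) reflects that a memory-to-memory copy within one pool both reads and writes every byte, so it roughly doubles the traffic.

```python
# PS4 comparison, back-of-envelope.
# Assumptions: 176 GB/s peak bandwidth and 8 GB of RAM (figures from the post),
# decimal GB, peak bandwidth with no other traffic.


def gb_per_frame(bandwidth_gb_s: float, fps: float) -> float:
    """Gigabytes of bus traffic available in one frame at peak bandwidth."""
    return bandwidth_gb_s / fps


def frames_to_move(total_gb: float, bandwidth_gb_s: float, fps: float,
                   read_and_write: bool = False) -> float:
    """Frames needed to push total_gb across the bus.

    With read_and_write=True every byte is counted twice (read from the
    source, written to the destination), as for a copy within one pool.
    """
    traffic_gb = total_gb * (2 if read_and_write else 1)
    return traffic_gb / gb_per_frame(bandwidth_gb_s, fps)


if __name__ == "__main__":
    print(f"{gb_per_frame(176, 60):.2f} GB per 60 fps frame")
    print(f"{frames_to_move(8, 176, 60):.2f} frames to touch all 8 GB once")
    print(f"{frames_to_move(8, 176, 60, read_and_write=True):.2f} frames for a full copy")
```

This gives ~2.93GB per frame and ~2.73 frames to move 8GB once (the "3f" in the post), or ~5.45 frames if a copy's read and write are both counted.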
 
I'm saying that there is an upper limit to the bandwidth the DMEs have, and as such I don't really think that copying even the entire 32MB from eSRAM to DDR or vice versa takes an 'insignificant' amount of time; it would require careful usage to get good performance out of it.
 
I don't think that use case (copying the entire contents of the eSRAM to DRAM, or filling the eSRAM with unmodified DRAM data) is going to be very prevalent. Since the GPU can read and write both pools, I would expect that ideally you'd want to accomplish the bulk of your data movement between them by tying it to GPU processing. At that point you're just using the DMEs for specific use cases, and in ALU-bound situations to take advantage of unused bandwidth.
 
Do you think it will be possible for the PS4 to emulate the PS2 without additional hardware?

Yes, because unlike the PS3's RSX, this time the GPU has enough bandwidth to emulate the Graphics Synthesizer without any problems. Sony could port any freeware PS2 PC emulator to PS4.
 