PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

A 1 GHz GPU clock was never realistic. Also, unless there was more to the tweet, nothing in it suggests a CPU frequency, so the CPU clock is still up in the air, and it's the spec that has always seemed the most subject to variation.

~1.8 GHz CPU is my bet.

I don't think they would desynchronize the CPU and GPU clocks. It's a simple 2:1 ratio.

Also, the gain from +200 MHz on this type of architecture, where the CPU is just there to run simple general-purpose routines, is equal to... nothing, or at least not worth it. It's not a Cell or some kind of stream processor.
 
According to DICE, BF4 was running at 3K resolution and 60 fps on an HD 7990.



http://m.computerandvideogames.com/...-had-to-build-battlefield-4-from-gut-feelings

1080p is 33.33% of 3K resolution. 1.8 TF is around 20.93% of the HD 7990's 8.6 TF.
So how will the PS4 run BF4, and which one will be sacrificed: resolution or frame rate?
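Just to put rough numbers on that question (purely illustrative; this uses the thread's own figures, treats "3K" as three times the 1080p pixel count to match the 33.33% above, and ignores bandwidth, CPU limits, and everything that doesn't scale with pixels):

```python
# Back-of-the-envelope scaling check using the numbers claimed above.
pixels_1080p = 1920 * 1080
pixels_720p = 1280 * 720
pixels_demo = 3 * pixels_1080p          # assumed "3K" pixel count of the BF4 demo

tflops_7990 = 8.6                       # HD 7990 figure quoted above
tflops_ps4 = 1.8                        # PS4 GPU figure quoted above
flop_ratio = tflops_ps4 / tflops_7990   # ~0.21 of the demo's ALU budget

for name, pixels in [("1080p", pixels_1080p), ("720p", pixels_720p)]:
    pixel_ratio = pixels / pixels_demo
    # If per-pixel cost stayed constant, holding 60 fps needs the FLOP ratio
    # to be at least as large as the pixel ratio.
    verdict = "fits" if flop_ratio >= pixel_ratio else "falls short"
    print(f"{name}: needs {pixel_ratio:.2f} of the demo workload, "
          f"budget is {flop_ratio:.2f} -> {verdict}")
```

On those very crude assumptions, 1080p/60 doesn't quite fit the FLOP budget but 720p/60 does, which is roughly the guess in the reply below.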

It's supposed to run at 720p on the PS4, isn't it? That would be the only power saving you need. Based on that, I'd expect these exact same graphics for the PS4 version, just at a much lower resolution, but still 60 fps. A single 680 should be able to do the same at 1080p.

Incidentally, the YouTube video does neither the resolution nor the framerate justice. I thought both looked much lower.
 
A compute program and a graphics shader are just instructions as far as the CU is concerned...
Yes, I can believe Liverpool can run compute and graphics concurrently. What's daft is thinking Liverpool can run compute programs alongside graphics programs on the same CU without any impact on the graphics potential. If you're running compute, those resources aren't available for graphics rendering. You're not going to have 1.8 TF of graphical number crunching and compute at the same time. You can't squeeze a quart into a pint pot!
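To put the quart-into-a-pint-pot point in rough numbers, here's a toy budget split; the 18 CU / 800 MHz figures are the rumoured Liverpool specs from this thread, not confirmed:

```python
# Toy model: compute and graphics draw from the same pool of CU cycles,
# so whatever fraction async compute occupies is unavailable to rendering.
CUS = 18                 # rumoured Liverpool CU count
ALUS_PER_CU = 64         # 4 x SIMD-16 vector lanes per CU
OPS_PER_CLOCK = 2        # a fused multiply-add counts as two FLOPs
CLOCK_HZ = 800e6         # rumoured 800 MHz GPU clock

peak_tflops = CUS * ALUS_PER_CU * OPS_PER_CLOCK * CLOCK_HZ / 1e12  # ~1.84 TF

def split_budget(compute_share: float) -> tuple[float, float]:
    """Graphics and compute throughput for a given fraction of CU time
    spent on compute jobs; the two always sum to the same 1.84 TF peak."""
    return peak_tflops * (1 - compute_share), peak_tflops * compute_share

for share in (0.0, 0.1, 0.25):
    gfx, cmp = split_budget(share)
    print(f"compute share {share:.0%}: graphics {gfx:.2f} TF, compute {cmp:.2f} TF")
```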
 
According to DICE, BF4 was running at 3K resolution and 60 fps on an HD 7990.



http://m.computerandvideogames.com/...-had-to-build-battlefield-4-from-gut-feelings

1080p is 33.33% of 3K resolution. 1.8 TF is around 20.93% of the HD 7990's 8.6 TF.
So how will the PS4 run BF4, and which one will be sacrificed: resolution or frame rate?


It doesn't have to; it can run at 1080p on medium settings and it will be fine. The PS4 doesn't have to keep up with the 7990; the PS4 is not running $2,000 hardware, it will cost $400 at worst, with a $170+ GPU. ;)

Most console owners will not even care about the PC edging it, and from what I read, most PCs on Steam will not be in a better position than the PS4 either. I don't see a 560 Ti or Intel integrated graphics doing 3K at 60 fps either.
 
I'm not convinced yet that it has 7970 dies in it. I feel like they are going for a bit lower power consumption with this, but 7970s wouldn't really surprise me either.
 
Or perhaps the vague bullet points filtered through random people aren't totally correct. I'm sure things have been said that contradict each other at some point, so I'm also pretty sure you're cherry-picking what you like.



So what you're saying is that for 4 out of 18 CUs, instead of having a 64 + 1 ALU arrangement, it's 64 + 1 + 1? And that this is done for the benefit of compute?

I think you must not understand why that scalar unit is there. Accelerating its computational throughput is just not worth it. Usually, if you have a good reason for using the CU in the first place, you'll probably spend at least as many instructions on the SIMDs as you need on the scalar unit, which is usually used for flow control or for managing some wide shared state like TMU bindings. And those scalar instructions will probably tend to be more latency-dependent in nature, in which case it'll be harder to even utilize a second unit. It's not worth the overhead in scheduling, register file access, etc. And it doesn't sound like a good idea to introduce asynchronous CUs in the first place.
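For reference, a minimal sketch of where the "64 + 1" arrangement comes from, based on AMD's public GCN material (the 18-CU total being the rumoured PS4 figure):

```python
# Rough sketch of a GCN compute unit as publicly described by AMD: four
# 16-wide vector SIMDs (the "64" ALUs) plus one scalar unit that handles
# flow control, constants and shared state rather than bulk math.
from dataclasses import dataclass

@dataclass
class ComputeUnit:
    vector_simds: int = 4        # 4 x SIMD-16
    lanes_per_simd: int = 16
    scalar_units: int = 1        # the "+1": one scalar ALU per CU

    @property
    def vector_alus(self) -> int:
        return self.vector_simds * self.lanes_per_simd   # 64

cu = ComputeUnit()
print(f"per CU: {cu.vector_alus} vector ALUs + {cu.scalar_units} scalar unit")

# The rumour being questioned above would give 4 of the 18 CUs a second
# scalar unit ("64 + 1 + 1"), which is exactly the part that doesn't pay off:
# scalar instructions are few and latency-bound, not throughput-bound.
```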



GCN is already "kinda like Larrabee" in all ways pertinent to this discussion.

Now, I am not a tech guy, but GDC is no E3; it's not for your average Joe gamer. I am sure you know that.

Sony was addressing developers there, so it would be incredibly implausible for Sony to sell developers bullsh**; developers will know one way or the other when the final hardware is in their hands, so there has to be something to their claims.

Hell, I know no GPU out there is 100% efficient or even close. What if Sony actually made some customizations so that they are able to get the most out of the GPU by using a mix of the two, graphics and compute, to actually maximize the GPU?

For example, let's say the 78XX is 70% efficient at best on PC, but on PS4 that same GPU is 95% efficient. What if that whole 25% efficiency gain comes from using the CUs to also do compute jobs?
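Taking those hypothetical percentages at face value, the difference in delivered throughput would look like this (purely illustrative; nobody has published real utilization numbers):

```python
# Illustrative only: what a "70% on PC vs 95% on PS4" utilization gap would
# mean in delivered FLOPs, if those guessed percentages were real.
peak_tflops = 1.84

pc_utilization = 0.70       # hypothetical PC figure from the post above
ps4_utilization = 0.95      # hypothetical console figure from the post above

pc_effective = peak_tflops * pc_utilization      # ~1.29 TF doing useful work
ps4_effective = peak_tflops * ps4_utilization    # ~1.75 TF doing useful work

print(f"PC-style utilization:  {pc_effective:.2f} TF effective")
print(f"PS4-style utilization: {ps4_effective:.2f} TF effective")
# A 25-point utilization gap is worth about 36% more delivered work:
print(f"relative gain: {ps4_effective / pc_effective - 1:.0%}")
```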

I don't know if I explained it right; my English is not so great.
 
What I hate is that much of this speculation could be cleared up with a brief interview with someone who worked on the system at Sony and a competent person to ask the questions. The following is a hypothetical interview...

Me -- So does the GPU have any processing power of any kind beyond the 1.8 tflops? For example, extra ALUs?

Him - No.

Me - So there are no hidden vector units or anything else that should be added to the 1.8 figure?

Him - There is no secret sauce, hidden vector units, or extra ALUs. All of you are crazy!
 
What I hate is that much of this speculation could be cleared up with a brief interview with someone who worked on the system at Sony and a competent person to ask the questions. The following is a hypothetical interview...

Me -- So does the GPU have any processing power of any kind beyond the 1.8 tflops? For example, extra ALUs?

Him - No.

Me - So there are no hidden vector units or anything else that should be added to the 1.8 figure?

Him - There is no secret sauce, hidden vector units, or extra ALUs. All of you are crazy!


I think people are getting this the wrong way.

To do both things at the same time, you don't need more than the 1.84 TF already confirmed; there is more to systems than TF performance, and even I know that, and I am not a tech geek or anything close.

There is a huge inefficiency claim, not only from AMD; people inside Nvidia (Timothy Lottes) and even top developers like John Carmack have complained about it. Making your game for 1 piece of hardware is not the same as making it for 100.

What if Sony's customization is actually allowing them to use what until now has been wasted to inefficiency?

So, using the extra power for compute?

That would not only fit very well but would also explain not having more than 1.84 TF. In fact, that and basically having an 8-core CPU dedicated just to gaming would balance the act very well. Sure, you will not get 7970 performance, but at least it will allow its 78XX to be the very best it can be. ;)
 
I have just read so many rumors that point towards some kind of extra power or ALUs that I tend to think there is a possibility of it. I wish some developer here could call Sony, get permission to tell us if there are any such extra flops, and report back. I think Sony would let them give us a yes or no answer, especially since they seem to be so open about the console.
 
There are no extra ALUs. They've said how many flops it is. If it had more ALUs, it would have more flops. There is no magic.
 
There are no extra ALUs. They've said how many flops it is. If it had more ALUs, it would have more flops. There is no magic.



What if the flops are already accounted for?

The 7850 has 1.76 TF, but it's at 860 MHz, not 800 MHz. Sure, there are 2 more CUs, but how much would a few extra ALUs increase the TF figure?
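For what it's worth, both figures drop straight out of the public GCN peak-FLOPS formula (CUs × 64 lanes × 2 ops per clock × clock speed), which is also why hidden extra vector ALUs would be hard to square with the quoted 1.84 TF:

```python
# Peak single-precision throughput of a GCN part:
# CUs x 64 vector lanes x 2 ops per clock (FMA) x clock speed.
def gcn_peak_tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

print(f"HD 7850 (16 CU @ 860 MHz): {gcn_peak_tflops(16, 860):.2f} TF")  # ~1.76
print(f"PS4 GPU (18 CU @ 800 MHz): {gcn_peak_tflops(18, 800):.2f} TF")  # ~1.84
# Extra vector ALUs would show up directly in this number; 1.84 TF is exactly
# what 18 standard CUs at 800 MHz produce.
```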
 
3dilettante, if we look at the GPU alone, the internal states and flow will be consistent. What happens if the CPU tries to read/write the GPU's internal states? (I don't quite know what this means because the freaking translated article s*cks.)

I do not think there's a direct method of writing to the GPU. It's all going through intermediary memory locations.
Data is not written to a CU, but to an address you expect it will read from.
If the GPU already has that line cached, a write by the CPU will first invalidate the GPU's cache entries.

One possible interpretation of that mangled text is that the cache bypass allows CUs to read and write without caching the data in the GPU's cache, going straight to memory, while the traffic is still considered coherent.
The description is too confused to know if this means the CPU's cache behavior changes when snooped by Onion+ (it might prevent the CPU from having to go through a snoop+invalidate trip with the GPU), but it does seem to point to main memory being the primary path for communication.

Given the absence of a last level cache (or large scratchpad), the bad latencies for remote cache hits, and the weaker memory model of the GPU pipeline, this is probably the most reliable way of doing things.
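As a toy model of that path (illustrative only, not any real API): the CPU never pokes a CU directly; it writes to an agreed address, any stale GPU cache line for that address gets invalidated, and the GPU's next read goes back to memory for the fresh data.

```python
# Toy coherency model of the CPU -> memory -> GPU path described above.
memory = {}          # shared address space
gpu_cache = {}       # stand-in for the GPU's cache, keyed by address

def gpu_read(addr):
    if addr in gpu_cache:            # cache hit: no memory traffic
        return gpu_cache[addr]
    value = memory.get(addr)         # miss: fetch from memory and cache it
    gpu_cache[addr] = value
    return value

def cpu_write(addr, value):
    gpu_cache.pop(addr, None)        # coherent write invalidates the GPU's copy first
    memory[addr] = value             # then the new data lands in memory

cpu_write(0x1000, "job descriptor v1")
print(gpu_read(0x1000))              # GPU pulls v1 into its cache
cpu_write(0x1000, "job descriptor v2")
print(gpu_read(0x1000))              # forced back to memory, sees v2
```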
 
I've been wondering when the full Gamasutra Cerny interview would post. Looks like it won't be for a while:

Eric Garant
29 Mar 2013 at 10:06 am PST

You said at the end of this article: "In a forthcoming article, Gamasutra will share the many details of the PlayStation 4's architecture and design that came to light during this extensive and highly technical conversation."

Just wondering when you are planning to publish it?

Thanks.

Christian Nutt
29 Mar 2013 at 10:25 am PST

Within the next couple of weeks, but no firm date just yet. Need to make sure all the technical information is correct, etc.

Pffft, fact checking. Fact checking is clearly for losers (and journalistic integrity).
 
I do not think there's a direct method of writing to the GPU. It's all going through intermediary memory locations.
Data is not written to a CU, but to an address you expect it will read from.
If the GPU already has that line cached, a write by the CPU will first invalidate the GPU's cache entries.

One possible interpretation of that mangled text is that the cache bypass allows CUs to read and write without caching the data in the GPU's cache, going straight to memory, while the traffic is still considered coherent.
The description is too confused to know if this means the CPU's cache behavior changes when snooped by Onion+ (it might prevent the CPU from having to go through a snoop+invalidate trip with the GPU), but it does seem to point to main memory being the primary path for communication.

Given the absence of a last level cache (or large scratchpad), the bad latencies for remote cache hits, and the weaker memory model of the GPU pipeline, this is probably the most reliable way of doing things.

Before I go further, how does the programmer choose when to use Onion or Garlic under the GCN model? Is it implicit in CPU read/write (always Onion) vs GPU read/write (always Garlic)?

According to the same interview, the "feature(s)" described by Cerny are optional. He doesn't expect developers to use them at the beginning.

I've been wondering when the full Gamasutra Cerny interview would post. Looks like it won't be for a while:



Pffft, fact checking. Fact checking is clearly for losers (and journalistic integrity).

Nice! Probably trying to catch Cerny. :p
 
Before I go further, how does the programmer choose when to use Onion or Garlic under the GCN model? Is it implicit in CPU read/write (always Onion) vs GPU read/write (always Garlic)?

I've only seen details for the older APUs publicized, and I suspect GCN's movement towards the x86 paging scheme has changed some things. Potentially, the older APU setup where memory is statically allocated as device memory is going to be modified or removed, and the pinning requirements might be different (or irrelevant if the console doesn't page things out to disk).

The general idea is that Garlic and Onion are used by the GPU based on whether the pages being written to are defined as being cacheable by the CPU. If an address might be found in the CPU's cache, traffic goes over Onion since accesses have to play by the rules or it's game over for the system.
Garlic is non-coherent, so it handles traffic for pages that have been defined as not cacheable.

It is possible GCN is more flexible as to when and how this can be defined compared to Llano's requirement for setting things up at initialization, but the continued existence of the two buses probably means the cacheable and non-cacheable distinction remains.
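A toy model of that routing rule (names and structure made up for illustration, not a real driver interface):

```python
# Toy model of the split described above: a GPU access is routed by the
# cacheability attribute of the page it touches, not by a per-access choice.
cacheable_pages = set()      # pages the CPU is allowed to cache

def allocate_page(page_id: str, cpu_cacheable: bool) -> None:
    """Stand-in for fixing a page's attributes at allocation time."""
    if cpu_cacheable:
        cacheable_pages.add(page_id)

def gpu_access_bus(page_id: str) -> str:
    # Coherent (Onion) traffic has to snoop the CPU caches; non-coherent
    # (Garlic) traffic goes straight to memory at full bandwidth.
    return "Onion (coherent)" if page_id in cacheable_pages else "Garlic (non-coherent)"

allocate_page("shared_job_queue", cpu_cacheable=True)
allocate_page("texture_pool", cpu_cacheable=False)

print(gpu_access_bus("shared_job_queue"))   # Onion (coherent)
print(gpu_access_bus("texture_pool"))       # Garlic (non-coherent)
```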
 