Will next gen consoles finally put an end to ugly low res textures?

Look, there is no way in hell you were going to get more than 512MB of GDDR3 shoehorned into a £300 white box in 2005. As someone else has pointed out, we were VERY VERY fortunate to get that; 256MB would have been high end for a console, and that is what we were going to get until Microsoft took a last-minute decision.
I remember being gobsmacked at the specs of both consoles at the time; they were years ahead of what developers were capable of taking advantage of.

For people's information, the Xbox 360 and PS3 are perfectly capable of running 1080p games (native, not scaled). We have had a few of them; I couldn't name them off-hand, but I know some were made.

But to improve things like textures, frame rates and lighting effects, as well as fit everything onto one disc, they had to make a decision, and in my opinion they made the right one.
 
Almighty said:
That's why I think that both Microsoft and Sony will use dedicated System and VRAM memory pools.

Well, that's something I suggested on another thread; I pretty much said exactly the same thing. What I didn't take into account was actually one of the 360's greatest strengths: the unified memory pool.
Developers love simplicity, and having access to the full RAM pool to do what they want without messing around is much preferred to the PS3's method of split pools.

If the choice, however, was a medium-sized unified pool or a large split pool, I wonder what the preference would be?
 
You're comparing apples to oranges.
If a 360 CPU were running the same OS and drivers as that Core 2, it would have worse problems running those games.

I took almighty's comments to be referring to actual in-game experiences, hence my putting "beats" in quotation marks and also making reference to optimisation and overheads being in the Xbox 360's favour.

The primary issues on PC CPUs are no exclusive access to the GPU, some less-than-ideal driver architecture that made draw-prim calls ludicrously expensive, and drivers that are trying to fix broken code in games written over the last 10 years.

What kind of proportion of a PC CPU's time would you guesstimate is actually being taken up by tasks/holdups that consoles are free from? No doubt it would vary on a game-by-game (and probably driver-by-driver) basis, but as a ballpark figure would it be fair to say something like 25%-50% of a C2D's time would be spent on such things in a graphically impressive/complex, top-tier game?

We actually ran various benchmarks on the XCPU and, at the time, high-end PC processors before it was released, and for non-vectorized code, like Zip-type compression/decompression, it wasn't even close; the PC processors were MUCH faster. I don't remember the exact numbers so I won't quote them. But even the PPC chips in the "alpha kits" were considerably faster than the shipped CPUs.

How closely would those benchmarks reflect the performance of current games, and could the console results on any of the benchmarks be improved with the benefit of ~7+ years of development experience? I'm asking to try and get additional opinion on just how far 360 development may or may not have come. Capcom's "A64 X2" comment is ancient history (relatively) and without much context.

My own extremely simple comparisons of an Athlon 64 and the 360 were done using XNA and I found that the A64 was about twice as fast clock for clock on the single threaded stuff I was doing, but I could make situations where the A64 was nearly 10 times as fast (and I don't think it was the garbage collector as I wasn't allocating). Never tried multithreaded code on it because I suck at it.
 
But then again, it was a compromise: designing a box that doesn't draw more than 200W, being able to deal with the IP of the CPU so it could be shrunk for cost savings...
I believe IBM offered them the best deal, even allowing Microsoft to implement several features on the CPU...

I understand that it is not possible to design a box that pleases everyone; looking back, I believe it was the best decision.
 
Almighty said:

Well, that's something I suggested on another thread; I pretty much said exactly the same thing. What I didn't take into account was actually one of the 360's greatest strengths: the unified memory pool.
Developers love simplicity, and having access to the full RAM pool to do what they want without messing around is much preferred to the PS3's method of split pools.

If the choice, however, was a medium-sized unified pool or a large split pool, I wonder what the preference would be?

But if they do use UMA, how will they generate enough bandwidth between the RAM and the GPU to not starve it?
 
A wider bus. What makes you think a split pool will solve the problem? You still need I/O for the second pool of memory. So um.... yeah, instead of using a second memory controller and a second RAM type, you just uh... increase the bus width of the single RAM type.

The second set of I/O isn't magically free. >_>
 
A wider bus. What makes you think a split pool will solve the problem? You still need I/O for the second pool of memory. So um.... yeah, instead of using a second memory controller and a second RAM type, you just uh... increase the bus width of the single RAM type.

The second set of I/O isn't magically free. >_>

If you use split pools you could have a good amount of system RAM, say 4GB of DDR3, which would provide ample bandwidth for the CPU.

And then between 1-2GB of GDDR5 on a big(ish) bus to provide 150-200GB/s of bandwidth, depending on memory clock and GPU bus width.

Would a UMA solution provide 150-200GB/s to the GPU without going for a massive bus that eats a lot of transistors? Or without resorting to EDRAM to alleviate the lack of bandwidth between VRAM and the GPU?
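
For reference, here's the rough peak-bandwidth arithmetic behind those figures (bus width in bytes times effective data rate); the clocks below are just assumptions I picked to bracket the 150-200GB/s range, not real specs:

```c
/* Back-of-the-envelope peak bandwidth: bus width in bytes times the
   effective data rate. The clocks below are illustrative assumptions,
   not real console specs. */
#include <stdio.h>

static double peak_gbs(int bus_bits, double data_rate_gtps)
{
    return (bus_bits / 8.0) * data_rate_gtps;   /* GB/s */
}

int main(void)
{
    printf("128-bit GDDR5 @ 5.0 GT/s: %.0f GB/s\n", peak_gbs(128, 5.0)); /*  80 */
    printf("256-bit GDDR5 @ 5.0 GT/s: %.0f GB/s\n", peak_gbs(256, 5.0)); /* 160 */
    printf("256-bit GDDR5 @ 6.0 GT/s: %.0f GB/s\n", peak_gbs(256, 6.0)); /* 192 */
    return 0;
}
```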
 
And if you use split pools, you still have a huge bus. That doesn't change.

NUMA actually complicates the design if you think about it, when you consider that you want the CPU and GPU to access both RAM spaces, lest you end up with a situation similar to how Cell can only access GDDR3 at 16MB/s. Extra wiring from GPU and CPU to XDR to avoid bus contention.... XDR I/O on RSX. Dealing with higher RAM speeds also complicates the design of the memory controller now that you're having to deal with two different types.

In case you still don't understand what I mean: you're suggesting, for example, a 128-bit bus to GDDR5 and a separate 128-bit bus to DDR3. So why don't you just skip the bullcrap of wiring and hardware design needed to accommodate the board complexity and just go with a single 256-bit bus?
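
To put rough numbers on that, here's the same arithmetic under assumed clocks (DDR3-1600 and 5GT/s GDDR5, purely illustrative): two 128-bit buses give you less total bandwidth than one 256-bit bus of the faster type, and what you do get is walled off into separate pools on top of that.

```c
/* Split vs unified at the same total pin count. Clock figures are
   illustrative assumptions, not leaked specs. */
#include <stdio.h>

static double peak_gbs(int bus_bits, double data_rate_gtps)
{
    return (bus_bits / 8.0) * data_rate_gtps;   /* GB/s */
}

int main(void)
{
    double ddr3  = peak_gbs(128, 1.6);  /* 128-bit DDR3-1600:   ~25.6 GB/s */
    double gddr5 = peak_gbs(128, 5.0);  /* 128-bit GDDR5 5GT/s: ~80 GB/s   */
    double uma   = peak_gbs(256, 5.0);  /* 256-bit GDDR5 5GT/s: ~160 GB/s  */

    printf("split:   %.1f + %.1f GB/s in two fenced-off pools\n", ddr3, gddr5);
    printf("unified: %.1f GB/s shared by CPU and GPU\n", uma);
    return 0;
}
```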
 
And if you use split pools, you still have a huge bus. That doesn't change.

NUMA actually complicates the design if you think about it, when you consider that you want the CPU and GPU to access both RAM spaces, lest you end up with a situation similar to how Cell can only access GDDR3 at 16MB/s. Extra wiring from GPU and CPU to XDR to avoid bus contention.... XDR I/O on RSX. Dealing with higher RAM speeds also complicates the design of the memory controller now that you're having to deal with two different types.

But modern CPUs and GPUs are designed to handle such memory speeds, so using them shouldn't complicate anything IMO.

I was thinking split memory pools would be better in terms of cost. Using DDR3 for system RAM and GDDR5 as VRAM would surely be cheaper than using a faster type of memory in a UMA setup? Would faster memory be required for UMA to help get the bandwidth up?

With the chance of the 720's GPU using a 256-bit bus being high (can't see it being wider, personally), would using XDR/XDR2 in a UMA provide as much bandwidth as a 256-bit bus running high-speed GDDR5 would? If not, what would need to be done to make up for the lack of bandwidth? EDRAM? More efficient compression routines? A wider GPU bus? How much wider?

So many questions, one could overload!
 
We actually ran various benchmarks on the XCPU and, at the time, high-end PC processors before it was released, and for non-vectorized code, like Zip-type compression/decompression, it wasn't even close; the PC processors were MUCH faster. I don't remember the exact numbers so I won't quote them. But even the PPC chips in the "alpha kits" were considerably faster than the shipped CPUs.
Decompressors designed for PC OoO CPUs tend to have lots of imuls and variable shifts. Running straight PC code on a console without modifying it is often not a good idea. A properly optimized LZMA-style algorithm runs over 5x faster on consoles. Of course there are many algorithms that cannot be run efficiently on consoles, but often there are many alternative ways to solve the same problem in a way that is efficient on consoles as well.
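
As a minimal sketch of the kind of alternative I mean (illustrative only, not our actual code): an LZMA-style binary range decoder normally takes a hard-to-predict branch per decoded bit, which is cheap on an OoO PC core but painful on a long-pipeline in-order core; a branchless variant trades the branch for a few extra ALU ops. Range normalization is omitted for brevity.

```c
#include <stdint.h>

#define PROB_BITS  11                    /* LZMA uses 11-bit probabilities */
#define PROB_MAX   (1u << PROB_BITS)
#define MOVE_BITS  5                     /* adaptation speed */

typedef struct {
    uint32_t  range;
    uint32_t  code;
    uint16_t *prob;                      /* adaptive probability for this context */
} RangeDec;

/* Typical PC-style decode: one data-dependent branch per bit. */
static int decode_bit_branchy(RangeDec *rd)
{
    uint32_t bound = (rd->range >> PROB_BITS) * (*rd->prob);
    if (rd->code < bound) {
        rd->range = bound;
        *rd->prob += (uint16_t)((PROB_MAX - *rd->prob) >> MOVE_BITS);
        return 0;
    }
    rd->range -= bound;
    rd->code  -= bound;
    *rd->prob -= (uint16_t)(*rd->prob >> MOVE_BITS);
    return 1;
}

/* Branchless variant: derive a 0/all-ones mask from the comparison and
   select both outcomes with bitwise ops, avoiding the mispredict penalty. */
static int decode_bit_branchless(RangeDec *rd)
{
    uint32_t p     = *rd->prob;
    uint32_t bound = (rd->range >> PROB_BITS) * p;
    uint32_t bit   = (rd->code >= bound);            /* 0 or 1 */
    uint32_t mask  = 0u - bit;                       /* 0 or 0xFFFFFFFF */

    rd->range = (bound & ~mask) | ((rd->range - bound) & mask);
    rd->code -= bound & mask;

    uint32_t inc = (PROB_MAX - p) >> MOVE_BITS;      /* applied when bit == 0 */
    uint32_t dec = p >> MOVE_BITS;                   /* applied when bit == 1 */
    *rd->prob = (uint16_t)(p + (inc & ~mask) - (dec & mask));

    return (int)bit;
}
```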
 
They didn't lowball last time, 512MB was a decent amount when they were designed.
If by "when they were designed" you meant two years before the launch, then maybe. At launch, 512MB of system+VRAM was definitely midrange at best compared with PCs. I can agree with that amount being as good as they could have managed, but I wouldn't say it was excellent.
If you use split pools you could have a good amount of system RAM, say 4GB of DDR3, which would provide ample bandwidth for the CPU.
Yeah, but in a console, where you don't need to keep all the textures in main RAM just in case someone decides to alt+tab out of the game, you won't need nearly as much of it as on PC. Instead of 4GB RAM/1.5GB VRAM I'd probably swap those numbers around, but then you won't really save money on a simpler design and the whole thing becomes rather pointless.
A properly optimized LZMA-style algorithm runs over 5x faster on consoles
How does that "5x faster" compare to the same algorithm running on a PC CPU and optimized for that?
 
How does that "5x faster" compare to the same algorithm running on a PC CPU and optimized for that?
The standard LZMA decompression algorithm is already well optimized for standard PC CPUs (out-of-order execution, relatively short pipeline, very good branching performance, pipelined integer multiplies). Check their official benchmark here: http://www.7-cpu.com/

As you can see, the algorithm clearly prefers short pipelines, as Bobcat beats the fastest P4 dual cores with ease.
 
Yes, but my point was: "5x faster" than what, exactly? x86-optimized code on PPC? PPC-optimized code on PPC vs x86-optimized code on x86?

From the URL I saw that x86 CPUs are several times faster even when using just a single core, and they scale WAY better than Cell (the Xbox 360 doesn't seem to be there). Another interesting thing I saw was the 1.2GHz Cortex A9 being faster and scaling better than Cell.
 
x86-optimized code on PPC?
This.
From the URL I saw that x86 CPUs are several times faster even when using just a single core, and they scale WAY better than Cell (the Xbox 360 doesn't seem to be there). Another interesting thing I saw was the 1.2GHz Cortex A9 being faster and scaling better than Cell.
Cortex A9 is an out-of-order CPU with all the required abilities to run the standard LZMA decoder quickly. Cell is an in-order speed demon (high clocks, long pipeline) style CPU and requires a specially crafted version, just like the XCPU (which likely results in many times higher performance).

The scaling: the Cortex A9 has two physically independent cores, while Cell has just one core with SMT ("hyperthreading"). An almost 50% boost from SMT is very good, so it seems to be working well in this case.
 
Thank you :)
Cortex A9 is an out-of-order CPU with all the required abilities to run the standard LZMA decoder quickly
The in-order Cortex A8, with a 200MHz lower clock speed than the A9, did remarkably well too, especially on decompression. It was basically on par with Cell, which has a 3x higher clock speed. Though the talk about ARM doing (for me) surprisingly well should probably go in the HW speculation thread.
 
sebbbi said:
but often there are many alternative ways to solve the same problem
In all fairness, a fair bit of that can apply to PC CPUs as well, but hardly anyone bothers anymore because there's so little point optimizing for the 'platform' (beyond the path of least resistance). And even when you can run something spectacularly fast on the CPU - e.g. PPAA - the overhead you pay to get it to/from the GPU just makes it all feel worthless.
 
Decompressors designed for PC OoO CPUs tend to have lots of imuls and variable shifts. Running straight PC code on a console without modifying it is often not a good idea. A properly optimized LZMA-style algorithm runs over 5x faster on consoles. Of course there are many algorithms that cannot be run efficiently on consoles, but often there are many alternative ways to solve the same problem in a way that is efficient on consoles as well.

It wasn't the only test we ran; there were many, including larger portions of existing code.

I have two points here. One is to note that for non-vector workloads, any notion that the 360's CPU or the PPU in the PS3 was competitive with existing x86 CPUs at the time is simply not true.
The second is that although you could get good performance out of them, it took a lot of work - work that in general doesn't get done to every line of a 3M+ line codebase, and at some level means work not getting done elsewhere.

I would consider poor single-threaded performance to be amongst the bigger problems with both processors.

What kind of proportion of a PC CPU's time would you guesstimate is actually being taken up by tasks/holdups that consoles are free from? No doubt it would vary on a game-by-game (and probably driver-by-driver) basis, but as a ballpark figure would it be fair to say something like 25%-50% of a C2D's time would be spent on such things in a graphically impressive/complex, top-tier game?

There is no way to quantify it; for generic "code" you're paying what may as well be nothing in overhead. If you were submitting primitives in DX9 it might have been >90% lost to the API/driver.
A lot of the poor ports developed on the 360 first didn't have reasonable batch counts; going from a platform where submitting batches is cheap to one where it's ridiculously expensive pretty much guarantees ending up CPU-bound on whatever thread is submitting primitives.
DX11 has much less of an issue in this regard.
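
To put very rough numbers on the shape of the problem (the per-draw costs below are made-up placeholders, not measurements from any real console or driver), the per-batch CPU cost is what caps how many draws a frame can afford:

```c
/* Illustrative only: how per-draw CPU cost caps batch counts per frame.
   The microsecond figures are placeholders, not benchmark results. */
#include <stdio.h>

static int max_draws(double frame_ms, double us_per_draw)
{
    return (int)((frame_ms * 1000.0) / us_per_draw);
}

int main(void)
{
    const double frame_ms = 33.3;   /* 30 fps frame budget */

    /* Assume a cheap console submission path vs an expensive DX9-era
       API/driver path; both costs are hypothetical. */
    printf("cheap submission (~5 us/draw):   %d draws/frame\n", max_draws(frame_ms, 5.0));
    printf("costly submission (~40 us/draw): %d draws/frame\n", max_draws(frame_ms, 40.0));
    return 0;
}
```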

Here is one of my favorite quotes from Tom Forsythe on Twitter, when he was asked what it was like to develop drivers for a PC:

"Driver writers spend >50% of their time bodging around broken apps. Then apps bodge around the bodges. Repeat until bloodshed."
 
ERP said:
Here is one of my favorite quotes from Tom Forsythe on Twitter, when he was asked what it was like to develop drivers for a PC:

"Driver writers spend >50% of their time bodging around broken apps. Then apps bodge around the bodges. Repeat until bloodshed."

I love Tom's wit, always puts a smile on my face... May need to update the sig...
 
So do you guys think everything is gonna be nice and sharp on my 720p TV next gen or am I still going to see ugly low res stuff when I walk up to the wall in Modern Warfare 5?
 