Predict: The Next Generation Console Tech

I received more solid confirmation of what I've been hearing about Durango vs Orbis: Durango is said to feature more RAM and a weaker GPU compared to Orbis, but both machines' specs are said to be non-final and so are difficult to judge.
 
I received more solid confirmation of what I've been hearing about Durango vs Orbis: Durango is said to feature more RAM and a weaker GPU compared to Orbis, but both machines' specs are said to be non-final and so are difficult to judge.

Is it from BGAssassin? If not, disregard it. The RAM and the GPU aren't finalized yet on Durango. Ditto for Orbis's RAM.
 
I doubt they will do it on this schedule. Still, even front-end-only changes would help -- at present it's actually very hard to keep the FPUs in BD fully occupied.
Interesting but disheartening... :LOL:

I follow the discussion here about BD; it's mostly over my head, but the rough understanding I get is that the whole idea should almost be thrown out of the window.

High-performance caches are crazy complicated (going by posters' comments here), and AMD was already lagging Intel. They chose a solution that called for even more complicated ones (among other serious complications).

I'm close to thinking (more a synthesis of things I've read than proper analysis, though) that AMD should revert to the Stars cores present in Llano and build from there.
They should rework the cache hierarchy.
Going with CMT (vs SMT) may have made sense when they were competing head to head with Intel; they expected more return from it than from SMT. Now that they have admittedly decoupled from Intel, I simply see no reason for them to pass on the tech.

The only thing from BD that could make sense is the shared SIMD/FP unit. It would save them silicon and power, and if they only go for "good enough" CPUs, one shared SIMD unit could be enough (for general consumer use and most likely server workloads).
In any case they will get hammered by Haswell in that sector.
 
I received more solid confirmation of what I've been hearing about Durango vs Orbis: Durango is said to feature more RAM and a weaker GPU compared to Orbis, but both machines' specs are said to be non-final and so are difficult to judge.
Would make sense with early rumors saying that Sony would be ahead.
Maybe MS is indeed investing more on the CPU side and/or planning for pretty complicated CPU/GPU interactions.
 
Is it from BGAssassin? If not, disregard it. The RAM and the GPU aren't finalized yet on Durango. Ditto for Orbis's RAM.

Not from BG, and yes, all specs are not final; this is just the current state.

I'd throw in that Bkilian has hinted here at Durango featuring a beefy CPU. Some say an IBM 6-core (but that part about the IBM 6-core is extremely speculative, so don't hold me to it).
 
I received more solid confirmation of what I've been hearing about Durango vs Orbis: Durango is said to feature more RAM and a weaker GPU compared to Orbis, but both machines' specs are said to be non-final and so are difficult to judge.

Isn't it a bit laughable that an almost-bankrupt company puts more expensive hardware inside its box?
 
http://www.geforce.com/whats-new/ar...-next-gen-gtx-680-powered-real-time-graphics/

The technique is known as SVOGI – Sparse Voxel Octree Global Illumination, and was developed by Andrew Scheidecker at Epic. UE4 maintains a real-time octree data structure encoding a multi-resolution record of all of the visible direct light emitters in the scene, which are represented as directionally-colored voxels. That octree is maintained by voxelizing any parts of the scene that change, and using a traditional set of Direct Lighting techniques, such as shadow buffers, to capture first-bounce lighting.

Performing a cone-trace through this octree data structure (given a starting point, direction, and angle) yields an approximation of the light incident along that path.

The trick is to make cone-tracing fast enough, via GPU acceleration, that we can do it once or more per-pixel in real-time. Performing six wide cone-traces per pixel (one for each cardinal direction) yields an approximation of second-bounce indirect lighting. Performing a narrower cone-trace in the direction of specular reflection enables metallic reflections, in which the entire scene is reflected off each glossy surface.

[Editor's Note: If the above sequence seems alien to you, it's because it is. Global Illumination requires a totally new lighting pipeline. In a traditional game, all indirect lighting (light that is bounced from a surface) is calculated in advance and stored in textures called lightmaps. Lightmaps give game levels a GI-look but since they are pre-computed, they only work on static objects.

In Unreal Engine 4, there are no pre-computed lightmaps. Instead, all lighting, direct and indirect, is computed in real-time for each frame. Instead of being stored in a 2D texture, they are stored in voxels. A voxel is a pixel in three dimensions. It has volume, hence the term "voxel."

The voxels are organized in a tree structure to make them efficient to locate. When a pixel is rendered, it effectively asks the voxel tree "which voxels are visible to me?" Based on this information it determines the amount of indirect light (Global Illumination) it receives.

The simple takeaway is this: UE4 completely eliminates pre-computed lighting. In its place, it uses voxels stored in a tree structure. This tree is updated per frame and all pixels use it to gather lighting information.

This technology is RAM-intensive, according to Epic.
They showcased it on a 16 GB machine.

If it's true, this may be one of the reasons Microsoft decided to go for more memory and a slightly less powerful GPU: voxel rendering.
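To make the cone-tracing idea above a bit more concrete, here is a rough CPU-side sketch of a single voxel cone trace. This is purely my own toy interpretation of Epic's description (the structure names, the placeholder octree lookup and the constants are all assumptions, not their actual code), just to show the basic loop: step along the cone, sample the octree at a coarseness matching the cone width, and accumulate light front-to-back until the path is blocked.

```cpp
#include <array>
#include <cstdio>

// Toy voxel sample: what one octree lookup returns for a region of space.
struct VoxelSample {
    std::array<float, 3> radiance{};  // directionally-filtered bounced light
    float occlusion = 0.0f;           // how opaque this region of space is
};

// Stand-in for the sparse voxel octree built by voxelizing the scene.
struct SparseVoxelOctree {
    // Placeholder lookup: a real implementation would descend the octree and
    // pick the level whose voxel size best matches 'footprint'.
    VoxelSample sample(const std::array<float, 3>& /*pos*/, float /*footprint*/) const {
        VoxelSample s;
        s.radiance = {0.1f, 0.1f, 0.1f};
        s.occlusion = 0.05f;
        return s;
    }
};

// March one cone through the octree and accumulate incoming light.
// 'halfAngleTan' controls how quickly the cone (and therefore the sampled
// octree level) widens with distance.
std::array<float, 3> coneTrace(const SparseVoxelOctree& svo,
                               std::array<float, 3> origin,
                               std::array<float, 3> dir,
                               float halfAngleTan,
                               float maxDistance)
{
    std::array<float, 3> accumulated{0.0f, 0.0f, 0.0f};
    float transmittance = 1.0f;  // how much light can still reach the origin
    float t = 0.05f;             // start slightly off the surface to avoid self-sampling

    while (t < maxDistance && transmittance > 0.01f) {
        const float footprint = 2.0f * halfAngleTan * t;  // cone diameter at this distance
        const std::array<float, 3> pos{origin[0] + dir[0] * t,
                                       origin[1] + dir[1] * t,
                                       origin[2] + dir[2] * t};

        const VoxelSample v = svo.sample(pos, footprint);

        // Front-to-back compositing: light from this step, attenuated by
        // everything already accumulated in front of it.
        for (int i = 0; i < 3; ++i)
            accumulated[i] += transmittance * v.radiance[i];
        transmittance *= (1.0f - v.occlusion);

        t += footprint * 0.5f;  // step size grows with the cone width
    }
    return accumulated;
}

int main() {
    SparseVoxelOctree svo;
    const auto gi = coneTrace(svo, {0.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, 0.577f, 10.0f);
    std::printf("accumulated indirect light: %.3f %.3f %.3f\n", gi[0], gi[1], gi[2]);
    return 0;
}
```

Per the article you'd run six wide cones like this per pixel for diffuse second-bounce light, plus one narrow cone along the reflection vector for glossy reflections -- and in UE4 it all runs GPU-accelerated, not on the CPU as in this sketch.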
 
I doubt they will do it on this schedule. Still, even front-end-only changes would help -- at present it's actually very hard to keep the FPUs in BD fully occupied.
AMD has known for years which way the vector ISA path was going. BD was far enough back from final design that it could change its decoders to handle AVX, which was published in 2008, although it was apparently too late when FMA3 was introduced in 2009.
The back end was too frozen to allow a change for AVX, which, if we go by public dates, means it was too far along, with a release date in 2011, 3 years after AVX was disclosed.

Steamroller would have been at a much earlier stage at that point, if it was moving along at all. 2008 to 2013 is long enough to have built an AVX-supporting core from start to finish.


I'm close to thinking (more a synthesis of things I've read than proper analysis, though) that AMD should revert to the Stars cores present in Llano and build from there.
Llano was what you get when moving a Stars core to 32nm.
That chip had horrible problems, and Trinity is by all material measures better.
 
Not from BG, and yes, all specs are not final; this is just the current state.

I'd throw in that Bkilian has hinted here at Durango featuring a beefy CPU. Some say an IBM 6-core (but that part about the IBM 6-core is extremely speculative, so don't hold me to it).

Big assumptions, but sometimes they can help set some "borders".

If Sony goes with 18 CUs, I would assume a GPU of ~200 sq.mm.
We hear MS is weaker on the GPU front. I'm not sure a dev would make such a comment over a small gap (though he didn't give a precise statement); I don't think 1 or 2 SIMDs would make the GPU "weaker". Both would be in the same ballpark, and at an early stage that kind of difference is pretty marginal (roughly 6% and 11% fewer ALUs, clock for clock).

On the other hand, I would not assume that Durango has, let's say, 50% fewer SIMDs; that would be 9 SIMDs.
Based on this early feedback, I would put Durango anywhere between 10 and 18 SIMDs.
I could definitely buy 12 SIMDs. That's weaker (33% fewer is significantly weaker), though it's nothing horrible.
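Quick sanity check on those percentages -- my own arithmetic, assuming an 18-SIMD Orbis as the baseline:

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
    const int orbisSIMDs = 18;  // assumed baseline from the 18 CU rumor

    // Relative ALU deficit, clock for clock, for a few hypothetical Durango counts.
    for (int durangoSIMDs : {17, 16, 12, 9}) {
        const double deficit = 100.0 * (orbisSIMDs - durangoSIMDs) / orbisSIMDs;
        std::printf("%2d vs %d SIMDs -> %4.1f%% fewer ALUs\n",
                    durangoSIMDs, orbisSIMDs, deficit);
    }
    // Prints ~5.6%, ~11.1%, ~33.3% and 50.0% -- the 6%, 11%, 33% and 50%
    // figures mentioned above.
    return 0;
}
```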

Some thoughts at this point:
The GPU is going to be tiny; without changes it could be ~150 sq.mm.
With a pretty tiny GPU, the odds of a SoC go up.

--------------------------------

Then we have the statement from BgAssassin that Durango is "weird".
The only thing I could think of as weird, taking the rumors above into account, is that the balance of power between the CPU and the GPU is indeed "weird".
Actually, I could push further and assert that even the CPU choice is weird, not what you would expect from MS looking at both the Xbox and the 360, which were pretty conventional designs.

------------------------------

So on one side we have ~150 sq.mm of silicon for the GPU, possibly a SoC, and a lot of CPU power.

If I go by AMD's own GPUs, from an economic POV I would say that anything above 350 sq.mm is too expensive. So I would assume that a reasonable (with respect to cost) high-end chip could be anywhere between 300 sq.mm (GTX 680) and 360 sq.mm (~HD 7970 and GTX 560).
So we have somewhere between 150 sq.mm and 200 sq.mm (let's keep it simple, the point is not to be accurate) left to be invested in CPU and cache.

That would definitely allow for 6 cores which, on a per-core basis, would be pretty much the same size as half a Piledriver module.

The point is: is that weird? Not really, to me. It's a pretty standard SoC set-up.
I would not be that surprised if MS went further, with more cores and more throughput-oriented ones.
----------------------------------------------------

Then we have the amount of RAM (>4 GB), which rules out GDDR5.
That leaves the system with a significant bandwidth problem.
I can only see it being addressed in two ways:
1) The Xenos approach, with a smart eDRAM die.
2) Enforcing tile-based rendering (with tiny tiles that would fit in the caches or LS of the GPU).

I could see option 1 turning into a significant investment, with MS possibly pursuing 1080p, deferred rendering becoming the standard, etc.

Taking "weird" into account, I could actually see MS enforcing the second approach. Risky business, though. The fastest DDR3 will only grant them ~60 GB/s worth of bandwidth.
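For reference, the arithmetic behind that bandwidth figure -- the 256-bit bus width is my assumption, not part of any rumor:

```cpp
#include <cstdio>

int main() {
    // Peak bandwidth = effective transfer rate (MT/s) * bytes moved per transfer.
    const double transferRateMTs = 2133.0;  // DDR3-2133, about the fastest JEDEC DDR3
    const int busWidthBits = 256;           // assumed console-style 256-bit bus

    const double bytesPerTransfer = busWidthBits / 8.0;
    const double peakGBs = transferRateMTs * 1e6 * bytesPerTransfer / 1e9;

    std::printf("Peak: %.1f GB/s\n", peakGBs);  // ~68 GB/s, in line with the ~60 GB/s above
    return 0;
}
```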

The impact on the GPU could be an increased size for the caches and local store. The CPU could also do with a healthy amount of cache to make the most of data locality.

--------------------------------------

So if I let my imagination work, I could envision something "weird", something like this:
*It could be a SoC.
*It would use IBM's 32nm process, which allows for eDRAM.
*It could be north of 300 sq.mm.
*There would be 8 "Xenon 2" cores.
*Those cores would be reworked Xenon cores:
-3-issue and a bit wider
-Out-of-order integer pipeline + other improvements and a bit more resources (ALUs, AGUs, LS units)
-In-order SIMD, improved VMX128, still 4-wide; would supporting integer ops help?
-4-way SMT
-Reworked cache hierarchy: faster L1, faster L2 included in the "core" (256 KB)
-A beefy L3, at least 8 MB
-Slower base clock speed than Xenon + turbo
*The GPU has 12 SIMDs, with two ACEs; it has more LS and cache than off-the-shelf GPUs. Based on AMD's latest products. Heavily compute-oriented.
*They implemented coherency between the CPU and the GPU; the GPU can access the L3.
*8 GB of RAM

With regard to software, the thing is complicated. MS would use C++ AMP or any other language that allows programming both the GPU and the CPU.

I would say 8 cores and 12 SIMDs are the bottom line for the design.

OK, that's enough hairy speculation. I've got serious stuff to do :)
 
Llano was what you get when moving a Stars core to 32nm.
That chip had horrible problems, and Trinity is by all material measures better.
That was still a pretty standard K10. AMD doesn't seem to have put the same amount of R&D behind BD and the Stars cores.
When I say rework, I mean changing the cache, including some of the niceties found in Trinity, using more up-to-date SIMD, and adding SMT to the design.

Thing is, developing a CPU takes a long time, and AMD is stuck with something that looks a bit like a dead end. I try to follow what you are discussing in the dedicated topic; I don't think I fully get it, but it still looks like AMD went for something overly complicated, whereas reworking what they had might have given better results. :?:
 
Not sure who to listen to, but it would be very strange for MS to somehow abandon the "GPU-centric" design of their consoles and go for a good CPU instead, regardless of the RAM difference. Their choices this gen have been very successful, so I'm not sure why they would leave that philosophy (maybe because of Kinect 2.0 and more entertainment on the new system).
 
Not sure who to listen to, but it would be very strange for MS to somehow abandon the "GPU-centric" design of their consoles and go for a good CPU instead, regardless of the RAM difference. Their choices this gen have been very successful, so I'm not sure why they would leave that philosophy (maybe because of Kinect 2.0 and more entertainment on the new system).
The 360 was not GPU-centric; Xenon and Xenos are mostly the same size.
The smart eDRAM/daughter die is mostly eDRAM and could be considered a fast type of RAM (more expensive than GDDR3 or XDR by itself, but allowing some nice cost reductions).

6 big CPU cores + 12 SIMDs would not be that CPU-centric; more like a 40/60 split (60 being the GPU).

If MS wants to avoid complications, they could go with something like:
6 cores, 12-14 SIMDs (/CUs/GCN, I'm lost in AMD naming) and move the ROPs to a big piece of eDRAM.

That would be a sub-300 sq.mm piece of silicon tied to 100-150 sq.mm of eDRAM (say, the ROPs + 64 MB of eDRAM).

WRT whom to believe, people with information sound like a good start. BgAssassin has proved to have solid connections through NeoGAF. Rangers also seems to have valuable connections.

You can't expect any devs or insiders here to state anything -- that is, if they don't want to retire early... :LOL:
 
It sounds to me like MS might be designing their next console around Kinect 2.0.

This would explain lots of RAM and a powerful CPU, while neglecting the GPU.

Bkilian said RAM is very important for the Kinect stuff, "by far the most precious resource on the box" he called it.

You can read the incriminating post here http://forum.beyond3d.com/showpost.php?p=1645931&postcount=409

And the Kinect is dependent on bandwidth, CPU, and memory. The Skeletal processing is a lib developers link in, and it uses a certain amount of memory and some part of a hardware thread. Speech is the same. The amount of memory required limits the skeletal database size and the number of joints. With huge gobs of ram, we could probably track lots more joints,

So to answer your question, I'd say memory would be the biggest concern for most of the Kinect functions, it's by far the most precious resource on the box.
 
When they talk about a SoC, I think they are talking about an interposer connecting a CPU and a GPU that are separate chips. Building one huge single-chip SoC would make yields very bad.

If this is the case, why not use a Kaveri APU and a GPU? It just doesn't make sense unless they are going for a cheap console price.

A Kaveri APU at 3.2 GHz is a 100 W TDP part; Cell was 110 W TDP. Use the GCN version of either of these cards: the 7870 is 120 W TDP, the 7850 is 90 W TDP. The RSX's TDP was 80 W.

"Testing performed by AMD Performance Labs. Calculated compute performance or Theoretical Maximum GFLOPS score for 2013 Kaveri (4C, 8CU) 100W APU, use standard formula of (CPU Cores x freq x 8 FLOPS) + (GPU Cores x freq x 2 FLOPS). The calculated GFLOPS for the 2013 Kaveri (4C, 8CU) 100W APU was 1050."

With the GPU paired with this APU you would have about 2800 GFLOPS, at about the same TDP as the launch PS3.
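Plugging the quoted formula in, just to show where "about 2800 GFLOPS" comes from -- the clocks and the 7850-class pairing are my assumptions for illustration:

```cpp
#include <cstdio>

// AMD's quoted formula: (CPU cores * freq * 8 FLOPS) + (GPU "cores" * freq * 2 FLOPS),
// with frequencies in GHz and GPU "cores" meaning individual shader ALUs.
double apuGflops(int cpuCores, double cpuGHz, int gpuShaders, double gpuGHz) {
    return cpuCores * cpuGHz * 8.0 + gpuShaders * gpuGHz * 2.0;
}

int main() {
    // AMD's 1050 figure is for the 4C/8CU part (8 CUs = 512 shaders). The clocks
    // they assumed aren't in the footnote; ~3.7 GHz CPU and ~0.9 GHz GPU land
    // close to their number.
    const double kaveri = apuGflops(4, 3.7, 512, 0.9);  // ~1040 GFLOPS

    // A discrete HD 7850-class GCN card: 1024 shaders at ~860 MHz.
    const double hd7850 = 1024 * 0.86 * 2.0;             // ~1761 GFLOPS

    std::printf("APU ~= %.0f GFLOPS, APU + dGPU ~= %.0f GFLOPS\n",
                kaveri, kaveri + hd7850);                // ~2800 GFLOPS total
    return 0;
}
```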

AMD is betting the company on Heterogeneous System Architecture, and this rumored CPU would be the first APU to fully support it. Why not show it off in the PS4?
 

Recap of "believable" rumors:
- MS went crazy high for specs
- Shitload of slow RAM (8 GB of DDR3)
- A mid-range GPU (~Cape Verde)
- Weird design
- Strong CPU


Could a really high-throughput processor, built for DirectCompute / C++ AMP / OpenCL, be the thing? Actually, it's almost a "PS3" approach (with the CPU helping the GPU a lot).
If the processor is from AMD, it could just be some reworked on-die CUs without texture units/ROPs + 2 Steamroller modules. eDRAM should be less costly than last gen; 64 MB requires just ~50 mm^2 of space.
How much space does a CU take?
 
It sounds to me like MS might be designing their next console around Kinect 2.0.

This would explain lots of RAM and a powerful CPU, while neglecting the GPU.

Bkilian said RAM is very important for the Kinect stuff, "by far the most precious resource on the box" he called it.

You can read the incriminating post here http://forum.beyond3d.com/showpost.php?p=1645931&postcount=409

Interesting!
Would speech recognition, skeletal processing, etc. benefit more from a heavily multithreaded approach, or from fewer but faster threads? Could GPU compute be used to accelerate those processes?
 
That was still a pretty standard K10. AMD doesn't seem to have put the same amount of R&D behind BD and the Stars cores.
Llano's architecture has had R&D behind it since K8 or K7.
It reaped the benefits of generations of work put into the same basic pipeline.
Its problems are that its pipeline worked best for the challenges of 130nm and 90nm, and it has been an increasing struggle to make it scale at lower process nodes.

Even if on a redo AMD didn't make the same choices as were made for Bulldozer, it wouldn't make another Llano either.

When I say rework, I mean changing the cache, including some of the niceties found in Trinity, using more up-to-date SIMD, and adding SMT to the design.
Some of those things are very fundamental to the design. Start changing them and you might not get a BD, but it wouldn't be a K10.

Thing is, developing a CPU takes a long time, and AMD is stuck with something that looks a bit like a dead end. I try to follow what you are discussing in the dedicated topic; I don't think I fully get it, but it still looks like AMD went for something overly complicated, whereas reworking what they had might have given better results. :?:
Bulldozer was probably a serious misstep.
However, Llano was a reworked K10.
Just because the new thing they tried wasn't all that great doesn't mean a warmed-over old pipeline could give them what they needed.
 
Not sure who to listen to, but it would be very strange for MS to somehow abandon the "GPU-centric" design of their consoles and go for a good CPU instead, regardless of the RAM difference. Their choices this gen have been very successful, so I'm not sure why they would leave that philosophy (maybe because of Kinect 2.0 and more entertainment on the new system).

Microsoft is listening to Tim Sweeney and other developers? The desire for a more developer-friendly platform to lower complexity and keep development costs in check would seem to be a driving force for a CPU-centric design.

My wild guess as to why there is a bit of mystery from Crytek is that Microsoft plans to use a PowerVR RTX and the design isn't done.
 