Xenon, PS3, Revolution ...

ERP said:
darkblu said:
psurge said:
If these new consoles all have 2 thread in-order general purpose units, maybe compilers will eventually support "scouting threads" (not sure what this is officially called), by which i mean a thread more or less dedicated to prefetching memory for a compute thread...

well, although theoretically possible, it would hardly come from the compiler side*. what the compiler could successfully do, though, is to put in prefetches for the more prominent accesses in the code.

* compilers hardly care about thread dispatching, at least not the C compilers.

It would be a pointless exercise because of the limit on outstanding prefetches. It's just about enough to hide latency if you think about it and put them inline.

Err, I'm not talking about prefetch instructions - the scout thread would be issuing normal memory loads and then doing nothing with the data (in an attempt to warm up the cache for the computation thread). Basically, your scout thread runs a version of the computation code stripped of all memory writes and all instructions not relevant to branches and memory reads. You then run it hundreds of instructions in advance of the computation thread and it becomes an intelligent prefetch engine for the main thread. But yeah, it would probably require hardware support to be effective...
 
The SNES and MD both had higher resolution modes than were normally used. On SNES, almost everything (bar almost nothing) was 256 x 224, whereas on MD most stuff was 320 x 224, but some things (presumably to save cart space for tile backgrounds?) were 256 x 224 (like the SNES games). The PC Engine had a higher resolution mode that was reasonably commonly used too - I'm struggling to remember, but it was around 356 x 224.

As I said, on the whole the SNES produced better visuals than previous technology (bar the Neo Geo), but in some ways it was also quite a step back.
 
function said:
26MB in the GC to 64MB in Xbox is a pretty big difference considering they both launched at pretty much the same time!
Yes, but that 64MB costs a lot of money! Timeline isn't the only factor affecting specs. Willingness of a company to lose billions also has quite an impact.
 
psurge said:
ERP said:
darkblu said:
psurge said:
If these new consoles all have 2 thread in-order general purpose units, maybe compilers will eventually support "scouting threads" (not sure what this is officially called), by which i mean a thread more or less dedicated to prefetching memory for a compute thread...

well, although theoretically possible, it would hardly come from the compiler side*. what the compiler could successfully do, though, is to put in prefetches for the more prominent accesses in the code.

* compilers hardly care about thread dispatching, at least not the C compilers.

It would be a pointless exercise because of the limit on outstanding prefetches. It's just about enough to hide latency if you think about it and put them inline.

Err, I'm not talking about prefetch instructions - the scout thread would be issuing normal memory loads and then doing nothing with the data (in an attempt to warm up the cache for the computation thread). Basically, your scout thread runs a version of the computation code stripped of all memory writes and all instructions not relevant to branches and memory reads. You then run it hundreds of instructions in advance of the computation thread and it becomes an intelligent prefetch engine for the main thread. But yeah, it would probably require hardware support to be effective...

But assuming you don't actually flush the cache frame to frame, you already get a lot of this for free. For the most part modern L2 caches miss infrequently at least in the benchmarks I've run on console games.

On PC's it's a different ballgame, because you have competing applications and the OS is large enough to cause contentions.

I'm not sure the extra thread would actually help with the execution speed either, because almost all code is bound by the number of memory fetches, and not by the ensuing calculations.

Besides, how hard is it to write

loop:
    prefetch(x + 32)
    do work on x
    x += 32

I've always been somewhat surprised that compilers don't attempt to insert prefetches, most of the useful cases are pretty mechanical.
 
16MB main RAM, 8MB video RAM, 2MB audio RAM. All of it 100MHz 64-bit SDRAM. Aggregate bandwidth almost the same as the GC main RAM too, come to think of it (remember that the DC's video chip had a tile buffer too).

Well, that'd be 800MB/s x 3, so 2.4GB/s.

Gamecube's main RAM is 2.6GB/s, plus ~10GB/s video/audio bandwidth, and then I think either 80 or 800MB/s for the ARAM. (guessing 80, since 800 seems like it would be fairly useful)

The NES was bested by the Mark 3 / Master System hardware though, although I can't remember exact timeframes off the top of my head ...

I think there were quite a few years difference in Japan, but less than a 2 year difference in America and probably even less in Europe. NES still remained one of the top systems though since no others really ventured into the console market during that time period.

Being "the most powerful in a gen" is all dependent on where you want to draw the boundary lines for that generation. The N64 was, on balance, more powerful than the PS1 and Saturn, but it came 18 months later. If you look less than 18 months later than the N64, you see a console appear (the Dreamcast) that absolutely beasts the N64 by a margin several times that by which the N64 was better than the Saturn and PlayStation. And that's despite costing less than the N64 at launch, and including a modem and optical drive.

In America, the DC came out almost exactly 3 years later. I think Sega took a loss on hardware though, while Nintendo didn't.
Sega also had access to technology that Nintendo didn't: SDRAM, modern PC video chips, and CPUs that resemble modern-day stuff weren't really available to Nintendo when designing the N64.

It doesn't have the GBA's sprite and tile hardware, but frankly, given how much extra power it has and how easy it is to do these things in software, you shouldn't need them to outperform the GBA at 2D! :p

I seem to recall GP32's games having framerate problems and such, even the few made by Capcom. (I think Capcom made some games for it anyhow, but I remember the launch titles for GP32 didn't look better than GBA games and ran worse, and I think Capcom made some of them)
 
Has Revolution been confirmed to have 384MB of RAM yet? I haven't heard that figure anywhere, plus it's more than the 256MB of Xbox 2, and PS3 is only rumored to have 256MB or 512MB.

No, this was all speculative.

function wrote:
26MB in the GC to 64MB in Xbox is a pretty big difference considering they both launched at pretty much the same time!

You must understand the differences between launches & spec finalizations. Nintendo's & XBX's respective time differences were about a year apart in that aspect, iirc. Again, the UMA does not even allow for 64MB of dedicated RAM; the whole machine must draw from the same pool: video, sound, textures, etc. Let's not act as if the A-RAM is totally useless even given its speed. Retro & Factor 5 found uses. (executables, etc)
 
Fox5,

The DC had dedicated on chip video memory too, in the form of a tile buffer. Copy out once per tile per frame, so very efficient use of main video ram, just as with GC and (if the leaks are genuine) Xenon main memory.

In America, the DC came out almost exactly 3 years later.

If you look around you can normally find territory specific releases that don't reflect the time the systems first launched, and so don't represent how cutting edge a design really is. While this is certainly an issue for customers in specific countries, I don't think it gives a good idea of how ambitious a design really is.

Of course, nintendo hardware often seems to date while being held back for launch software to be ready. ;)

I think Sega took a loss on hardware though, while Nintendo didn't.

This is one of the points I'm getting at. It's not that Nintendo are "too st00pid" to develop cutting edge hardware, it's just that they don't see it as central to their strategy, as MS, Sony and the Sega of old (usually) did.

Sega also had access to technology that Nintendo didn't: SDRAM, modern PC video chips, and CPUs that resemble modern-day stuff weren't really available to Nintendo when designing the N64.

Each new system has access to new technologies that were unavailable to the last. Having said that, a version of the N64 design was offered to, and rejected (possibly unwisely) by, Sega prior to the release of the Saturn. A CD drive and the "full" 8 megs of RAM would have transformed the N64 into something fully representative of its launch time IMO.

I seem to recall GP32's games having framerate problems and such, even the few made by capcom.(I think capcom made some games for it anyhow, but I remember the launch titles for gp32 didn't look better than gba games and ran worse, and I think capcom made some of them)

I don't think Capcom ever actually made anything for the GP32.

How "good" games look is as much down to the resources put into them as the hardware. This is illustrated by how Megadrive and SNES stuff on GP32 emulators look better than much of native GP32 development, and often original GBA games too. GP32 development never went anywhere.

There's some very impressive (for the system) textured 3D stuff both in terms of homebrew demos and ports of stuff like Quake btw.
 
Li Mu Bai said:
You must understand the differences between launches & spec. finalizations.

Because understanding this (assuming I don't) somehow changes what the consoles are capable of on launch day?
 
function said:
Li Mu Bai said:
You must understand the differences between launches & spec. finalizations.

Because understanding this (assuming I don't) somehow changes what the consoles are capable of on launch day?

Launch day isn't a correlation to technology available during the R&D process is what I was attempting to relate. I wasn't assuming your ignorance, my apologies if you took it in this way.
 
Li Mu Bai said:
Launch day isn't a correlation to technology available during the R&D process is what I was attempting to relate. I wasn't assuming your ignorance, my apologies if you took it in this way.

No worries, I was just (ineffectively) trying to say that on launch day it's what the consumers (like me) are gawking at in the shops that becomes significant. Of course, ability to mass produce the technology asap to meet global demand is also a factor. The DS for example might look rough compared to the far more cutting edge PSP, but Nintendo have avoided Sony's limited production woes which have led to Europe (as usual) getting the shaft in the form of a several month delay of the PSP launch (which was already several months behind Japan).

By the time the PSP hits Europe in big numbers it'll look much less impressive than it did last year, due to competition from the likes of the ...err.. N-Gage 2 and, err ... Gizmondo ...
 
function said:
Li Mu Bai said:
Launch day isn't a correlation to technology available during the R&D process is what I was attempting to relate. I wasn't assuming your ignorance, my apologies if you took it in this way.

No worries, I was just (ineffectively) trying to say that on launch day it's what the consumers (like me) are gawking at in the shops that becomes significant. Of course, ability to mass produce the technology asap to meet global demand is also a factor. The DS for example might look rough compared to the far more cutting edge PSP, but Nintendo have avoided Sony's limited production woes which have led to Europe (as usual) getting the shaft in the form of a several month delay of the PSP launch (which was already several months behind Japan).

By the time the PSP hits Europe in big numbers it'll look much less impressive than it did last year, due to competition from the likes of the ...err.. N-Gage 2 and, err ... Gizmondo ...

And Xboy!
 
I'll probably get a lot of shit for this, but I don't think hardware is Nintendo's strength.

Prepare for the fecal storm, my friend, as the GC has a wonderfully unique & technically impressive (as well as powerful) piece of hardware architecture. I cannot wait to see what Broadway offers, especially since it's supposed to be expanding upon the innovative features found initially in the Flipper chipset. (I would expect some aspects of Hollywood to follow suit from the Gekko as well) Let us begin with the GC's partial dissection:

-MoSys 1T-SRAM with a refresh/latency rate equivalent to, though not surpassing, that of conventional SRAM. Main memory: approximately 10ns sustainable latency


-2MB of on-chip embedded RAM for the Z- and framebuffer with 7.5GB/s dedicated bandwidth. This on-die Z-buffer completely removes all of those accesses from hogging the limited amount of main memory bandwidth the Flipper GPU is granted. 6.2ns sustainable latency (1T-SRAM)

-1MB texture cache with 10.5GB/s dedicated bandwidth, which can hold compressed textures & assists with texture load performance

-Early Z check HSR

-Texture Environment unit (TEV), which is essentially a pixel shader with extremely flexible texture reads (more so than even the NV2A's) but slightly less flexible combiners than the NV2A. (think indirect texturing effects like heat distortion)

-Half the L1 data cache is locked to keep needed information without wasting reads to the L2 cache, and ultimately main memory. The rest of the chip isn't penalized for accesses to the L2 data cache due to the non-blocking cache arrangement. Also, after all the data is transferred, it has to travel back through the L1 and L2 data caches while it makes its way back to the system bus.

So the 64 bit data bus to the processor from the L1 data cache is still 5.6 GB/s, and it is written back to the L2 cache using the remaining bandwidth of the 256-bit connection.

-A 128-byte FIFO write gather pipe accumulates data to be sent in 32-byte bursts to the graphics chip.

-32 byte fill buffer rests between the L2 cache and the L1 cache, and between the L1 cache and the FIFO write gather pipe.

-4:1 vertex compression can be held in the L1 cache, with a small cache for decompression.

-Separate FIFO write gather pipe for bursts of graphics data to main memory while the bus is not busy.

-8 layer multi-texturing per single pass (however fillrate intensive)

-8 hardware lights (global) offered at no computational penalty as they are performed in parallel to other functions

-Flipper does support virtual texturing, which is a fetch on demand for textures

-PPC 750CXE CPU with additional SIMD functionality. (40 instructions total) Data quantisation inclusive, which simplifies the use of compressed data and in effect ties in with the Gekko when needed for dynamic geometry processing in the system.

-Compresses textures at a 6:1 ratio via S3TC

-EMBM & per-pixel lighting supported in hw

-Trilinear filtering comes at no cost to the Gamecube's texel fillrate

-Gekko utilizes paired single capability

-A few of its features were added in dynamically, like self-shadowing and color tinting.

-A FSB frequency that results in a 1.3GB/s connection between Gekko and the North Bridge

-All bus clocks operate in sync with one another, which lends itself to much lower-latency operation (the memory bus is synchronized to the Gekko's FSB and Flipper's operating frequency, 162MHz x 2)

Official overly conservative polygon throughput numbers, etc, etc. Nintendo's biggest fault imo was its RAM allocations: providing too much ARAM, & too little main system 1T-SRAM. Far from being a "weak" system with specs finalized a year before the XBX's. Its design was made to allow for an efficient, fast, & uninterrupted data flow. (no bottlenecks, texture stalls, etc.) Although the much dreaded & time-consuming handcoding is necessary to truly exploit many of the system's more impressive feature sets. (I expect the Revolution to alleviate this) HW is not Nintendo's forte, you say?
 
PC-Engine said:
Nintendo could've added more RAM to GCN, but then they would've had to sell it for $250 instead of $200.

Indeed it would've been ideal, although not justifiable seeing as there was no DVD playback offered (while their competitors offered it for a mere $50 more). Which leads me to the question of the Revolution: I have no doubt that it will be priced at the $300 price point, although in all likelihood supporting yet another proprietary format. DVD playback will not even be such a touted feature this time (if one at all worth mentioning); WiFi out of the box & a free online gaming community could easily sway consumers imo. But the launch software must be both varied & plentiful.
 
Li Mu Bai said:

-DOT3, EMBM, & per-pixel lighting supported in hw

you sure bout the dot3?

otherwise i can only agree with you. of all the consoles this gen, the GC is the one that gets the crown for overall best price/performance design. which, IMHO, has always been the point about game consoles.
 
darkblu said:
Li Mu Bai said:

-DOT3, EMBM, & per-pixel lighting supported in hw

you sure bout the dot3?

otherwise i can only agree with you. of all the consoles this gen, the GC is the one that gets the crown for overall best price/performance design. which, IMHO, has always been the point about game consoles.

Yes I'm sure darkblu. Here are comparative methods on how DOT3 is accomplished on both the GC & PC (from an earlier post of mine):

GC DOT3 Method: Bump Mapping= Visually better results can be achieved using "real" bump mapping as supported with the indirect texture unit. (TEV) Using this method the hardware computes a normal per pixel and uses that to look up different textures including a diffuse light map (containing all directional and ambient lights), an environment map and even a specular map. Thereby all those shading effects are computed correctly in a bumped way. However, since the global lights are now fetched from a texture instead of being computed by the lighting hardware, the texture needs to be generated dynamically as soon as the camera orientation and/or the lights change. In addition, the height field needs to be pre-processed into a "delta U/delta V texture" (which is an intensity/alpha texture with four bits per component) and therefore needs (without further measures) twice as much memory for texture storage as the emboss mapping method.

The delta-texture is fed into the indirect unit where it is combined with the surface normals, describing the orientation of the bump map. In the last stage of this three-cycle setup, the diffuse light map is looked up and the result is the bumped light color for the global lights. Note that the local lights are still computed per vertex (because they have a location and the normal used as input data does not give this information) and are added later in the texture environment.



PC DOT3 Method: Bump Mapping= Take a height map as input - this would be a file that contains numbers that correspond to certain heights.
Internally this height map is translated into a slope map. This means that the slope is calculated along the U and V parameters (the x and y parameters of the texture and bump map). This is done quite simply by taking the height values and subtracting them from each other to indicate the change in height in the u and v directions (normalised, of course). These perturbations give the change of the normal relative to a normal perpendicular to the base polygon. Now when doing the light calculations you do a dot product between the light source (direction and intensity), the normal of the plane, and the perturbation in the u and v directions. The result is a changed light intensity calculation that takes into account the bump map (through the slope values).

As perturbed environment, blend, & pre-calculated bump mapping do not match the definitions listed above. Emboss mapping (pre-baked) computes light values per pixel; it is not possible to compute "bumped" specular highlights and reflections using it. Gamasutra was my source for the GC method, drawing some parallels from F5's work on RL.
 
Li Mu Bai said:
GC DOT3 Method: Bump Mapping= Visually better results can be achieved using "real" bump mapping as supported with the indirect texture unit. (TEV) Using this method the hardware computes a normal per pixel and uses that to look up different textures including a diffuse light map (containing all directional and ambient lights), an environment map and even a specular map. Thereby all those shading effects are computed correctly in a bumped way. However, since the global lights are now fetched from a texture instead of being computed by the lighting hardware, the texture needs to be generated dynamically as soon as the camera orientation and/or the lights change. In addition, the height field needs to be pre-processed into a "delta U/delta V texture" (which is an intensity/alpha texture with four bits per component) and therefore needs (without further measures) twice as much memory for texture storage as the emboss mapping method.

The delta-texture is fed into the indirect unit where it is combined with the surface normals, describing the orientation of the bump map. In the last stage of this three-cycle setup, the diffuse light map is looked up and the result is the bumped light color for the global lights. Note that the local lights are still computed per vertex (because they have a location and the normal used as input data does not give this information) and are added later in the texture environment.

That's how you do it, but that doesn't mean it's done in hw. AFAIK Flipper doesn't support it in hw and uses Gekko for some operations.
 