nFactor2 - an engine on X360

An 8KB buffer should be enough to hide up to 500 cycles of DMA latency in the worst-case scenario (one 128-bit load from LS to register per cycle), not counting the instruction buffer.
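
As a rough sanity check on that figure, here is a small Python sketch of the arithmetic. It assumes the worst case described above (one 128-bit quadword consumed from LS every cycle); the function name and constants are only illustrative.

```python
# Back-of-the-envelope check of the 8KB figure.
# Assumption: the SPU consumes one 128-bit (16-byte) quadword from LS per cycle.

QUADWORD_BYTES = 128 // 8  # 16 bytes per 128-bit load


def bytes_to_cover_latency(latency_cycles: int,
                           bytes_per_cycle: int = QUADWORD_BYTES) -> int:
    """Bytes that must already be in LS to keep the SPU fed during a DMA."""
    return latency_cycles * bytes_per_cycle


needed = bytes_to_cover_latency(500)
print(needed)               # 8000
print(needed <= 8 * 1024)   # True
```

So 500 cycles of latency at 16 bytes per cycle needs 8000 bytes of data already resident, which just fits in 8KB.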

In any case, when you can predict all the data you will use (so you aren't loading data from main memory that you won't use), as in the typical streaming case, you have a lot more memory in the LS than you need.

In fact, the SPUs originally had only 64KB of LS.
 
In that case the output dataset size might form the constraint. Say 8KB of input data leads to 32KB of output data (e.g. tessellation of input triangles). How long would it take, say, to output that data to RSX? Would a larger batch be more desirable in the case of this algorithm? Does it matter what size of batch is used, as long as 8KB is the minimum input dataset?...

Anyway, it's interesting what you say about LS being "larger than strictly necessary for a purely 128-bit Vec4 datastream".

Jawed
 
If you have an algorithm that gives you 4x more output, then you would reduce the size of the input buffer. Loading a 2KB buffer and storing an 8KB buffer would take you at least 600 cycles (probably a lot more) just to use the buffers.
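
The "at least 600 cycles" estimate can be reproduced with a quick sketch. The assumption here (mine, for illustration) is that the SPU can do at most one 128-bit LS access per cycle, either a load or a store, and that no cycles are spent on actual computation, so this is only a lower bound.

```python
# Lower bound on cycles spent just touching the buffers.
# Assumption: at most one 128-bit (16-byte) LS load or store per cycle.

QUADWORD_BYTES = 16


def ls_access_cycles(input_bytes: int, output_bytes: int) -> int:
    """Minimum cycles to read the input buffer and write the output buffer."""
    loads = input_bytes // QUADWORD_BYTES
    stores = output_bytes // QUADWORD_BYTES
    return loads + stores


cycles = ls_access_cycles(2 * 1024, 8 * 1024)
print(cycles)  # 640 (128 loads + 512 stores)
```

That is already longer than the 500-cycle DMA latency discussed earlier, before any arithmetic is done on the data.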

If you have an SPU that only does streaming work, you can have another SPU use some of its LS.

If you can't use buffers (because you don't know what data you will need), I think the best option would be doing two or more things at the same time.
 
http://www.watch.impress.co.jp/game/docs/20050929/3dinis.htm

Zenji Nishikawa uploaded his detailed report on the presentation after clearing the NDA, with 1024x576 direct-feed screenshots...

+ The engine, nFactor2, is an in-house engine by Inis that utilizes a multicore processor and an SM3.0 GPU.

+ FP10 (7e3) HDR + tone mapping
(pupil simulation by dynamic tone mapping)
http://www.watch.impress.co.jp/game/docs/20050929/ini04.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini05.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini06.htm
(HDR bloom/glare)
http://www.watch.impress.co.jp/game/docs/20050929/ini07.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini08.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini09.htm

+ Normal maps, parallax maps (details created by Zbrush)
http://www.watch.impress.co.jp/game/docs/20050929/ini13.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini14.htm

+ Light-Space Perspective Shadow Maps (LSPSM)
http://www.watch.impress.co.jp/game/docs/20050929/ini32.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini33.htm

+ The shader was written in assembly at first but was converted to HLSL, which resulted in some performance gain. The performance of creating soft penumbra for LSPSM could be doubled thanks to dynamic branching supported by Pixel Shader 3.0.
http://www.watch.impress.co.jp/game/docs/20050929/ini34.htm

+ 2x AA is applied to scene rendering without penalty thanks to eDRAM

+ The physics engine is NovodeX, which is suited to multicore

+ Hair physics is an original implementation by Inis, which runs on both CPU and GPU.
(hair and accessories driven by physics)
http://www.watch.impress.co.jp/game/docs/20050929/ini41.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini42.htm
(more physics demo)
http://www.watch.impress.co.jp/game/docs/20050929/ini43.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini44.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini45.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini46.htm
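
For anyone wondering what the "FP10 (7e3)" in the feature list means in practice, here is a hedged Python sketch of decoding one 10-bit channel: 7 mantissa bits and 3 exponent bits, unsigned. The exponent bias of 3 and the denormal handling are my understanding of the Xenos format and are not stated in the report, so treat them as assumptions.

```python
# Sketch of decoding one 7e3 channel (10 bits: 3-bit exponent, 7-bit mantissa).
# Assumptions (not from the report): unsigned, exponent bias 3, denormals at e == 0.

BIAS = 3  # assumed exponent bias


def decode_7e3(bits: int) -> float:
    """Convert a 10-bit 7e3 value to a Python float."""
    bits &= 0x3FF
    exponent = bits >> 7      # top 3 bits
    mantissa = bits & 0x7F    # low 7 bits
    if exponent == 0:
        # Denormal: no implicit leading 1.
        return (mantissa / 128.0) * 2.0 ** (1 - BIAS)
    return (1.0 + mantissa / 128.0) * 2.0 ** (exponent - BIAS)


print(decode_7e3(0x000))  # 0.0
print(decode_7e3(0x180))  # 1.0
print(decode_7e3(0x3FF))  # 31.875
```

Under these assumptions the representable range is 0 to 31.875, which is why the tone mapping step matters: anything brighter than that has to be scaled down before it is stored.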

Besides, Inis revealed the rendering pipeline of nFactor2.
http://www.watch.impress.co.jp/game/docs/20050929/ini47.htm
Its average polygon count per scene is 150,000. As the Vertex Shader works in the first 5 passes, the GPU apparently processes about 750,000 polygons/frame, so it's still Pixel Shader intensive.
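
The 750,000 figure is just the scene polygon count multiplied by the number of passes that re-submit geometry. A tiny sketch, with variable names of my own choosing and assuming every one of the 5 passes processes the full scene:

```python
# Polygons pushed through the vertex shader per frame.
# Assumption: each of the 5 geometry passes processes the whole scene.

AVG_POLYGONS_PER_SCENE = 150_000
GEOMETRY_PASSES = 5  # shadowmap x2, Z prepass, shadow color, color/lighting

polygons_per_frame = AVG_POLYGONS_PER_SCENE * GEOMETRY_PASSES
print(polygons_per_frame)  # 750000
```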

+ Pass 1/2 - Shadowmap
(rendered by LSPSM into FP24, D3DFMT_D24FS8 depth buffer)
http://www.watch.impress.co.jp/game/docs/20050929/ini48.htm

+ Pass 3 - Z buffer prepass
(Deferred Rendering, which is fast in Xenos thanks to depth buffer resident in eDRAM)
http://www.watch.impress.co.jp/game/docs/20050929/ini49.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini50.htm

+ Pass 4 - Shadow color
According to the developer, this pass was originally part of pass 5 but was separated out due to sub-par performance. He suggested that the texture cache was being hindered by the shadow rendering. The 5x5 Gaussian filter for soft shadows might have destroyed the texture cache locality of the color rendering. Nishikawa speculates that Xenos has a relatively small number of transistors because some cache memory was removed while more registers were added for multithreading, so Xbox 360 may require Xenos-specific optimization, unlike cache-rich NVIDIA GPUs.
http://www.watch.impress.co.jp/game/docs/20050929/ini51.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini50.htm

+ Pass 5 - Color, lighting
Diffuse, normal map, environmental map, gloss mapping etc. Lighting by all light sources in a scene is done in this pass.
http://www.watch.impress.co.jp/game/docs/20050929/ini52.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini50.htm

+ Pass 6 - 11 - Luminance measurement
Scanning HDR-rendered frames to get average luminance. It has relatively small GPU load as it's only framebuffer processing by Pixel Shader.

+ Pass 12 - 19 - Bloom/Glare
Blur is added to areas with higher-than-average luminance in a low-res buffer, which is then blended with the render target
http://www.watch.impress.co.jp/game/docs/20050929/ini54.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini50.htm

+ Pass 20 - Tonemapping
pupil simulation (dynamic exposure adjustment)
http://www.watch.impress.co.jp/game/docs/20050929/ini55.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini50.htm

+ Pass 21 - Depth of Field
http://www.watch.impress.co.jp/game/docs/20050929/ini56.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini57.htm
http://www.watch.impress.co.jp/game/docs/20050929/ini58.htm
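
To make the luminance and tone mapping passes in the list above a bit more concrete, here is a minimal CPU-side sketch in Python. It is not Inis's implementation: the repeated 2x2 downsample, the Reinhard-style curve, the `key` value, and the adaptation rate are all stand-ins I picked to illustrate "measure average luminance, then adjust exposure over time".

```python
# Toy version of: average luminance by repeated downsampling, then tone mapping.
# Assumptions: square power-of-two frame, 2x2 box filter, Reinhard-style curve.

def downsample_2x2(img):
    """Average each 2x2 block of a square image (list of rows)."""
    n = len(img) // 2
    return [
        [(img[2*y][2*x] + img[2*y][2*x+1]
          + img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
         for x in range(n)]
        for y in range(n)
    ]


def average_luminance(img):
    """Reduce the image to 1x1; return (average, number of passes used)."""
    passes = 0
    while len(img) > 1:
        img = downsample_2x2(img)
        passes += 1
    return img[0][0], passes


def adapt(current, target, rate):
    """Move the adapted luminance part of the way toward the measured one."""
    return current + (target - current) * rate


def tonemap(lum, adapted_lum, key=0.18):
    """Scale by exposure, then compress into [0, 1)."""
    scaled = lum * key / adapted_lum
    return scaled / (1.0 + scaled)


# 64x64 frame: dim (0.5) everywhere except a bright 16x16 patch (8.0).
frame = [[8.0 if x < 16 and y < 16 else 0.5 for x in range(64)]
         for y in range(64)]
avg, passes = average_luminance(frame)
print(avg, passes)            # 0.96875 6
print(adapt(1.0, avg, 0.5))   # 0.984375
```

The six passes here come only from the toy frame being 64x64. Calling `adapt` once per frame with a small rate is what gives the gradual "pupil" effect, as opposed to snapping the exposure to each frame's average.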

The rest is about the breakdown of threads which was covered before in this thread.
One new revelation is that an experimental version of the demo used 5 HW threads, but the demo shown at the presentation uses 4 HW threads, with Thread 0 running both the main game loop and the rendering engine.
 
Am i the only one to get some very strange fuzzy pixelisation in the shots with DOF? Looks terrible.

The rest looks ok i guess, the art is really awful, but i'm sure that technically it's a good engine.
 
what happened to polygons?

london-boy said:
Am i the only one to get some very strange fuzzy pixelisation in the shots with DOF? Looks terrible.

The rest looks ok i guess, the art is really awful, but i'm sure that technically it's a good engine.

Too many polygon edges for a tech demo of "optimized" engine. Lighting and art-style is strongest point.
 
london-boy said:
Am i the only one to get some very strange fuzzy pixelisation in the shots with DOF? Looks terrible.

The rest looks ok i guess, the art is really awful, but i'm sure that technically it's a good engine.
No, as I said, what's with the fuzz? That's the worst DOF fake I've ever seen, such that I thought maybe my graphics were getting screwed. Every other game's DOF is a Gaussian-like blur but this thing looks a mess.
 
ihamoitc2005 said:
Too many polygon edges for a tech demo of "optimized" engine. Lighting and art-style is strongest point.

Funny thing is most people think that the art-style is the worst thing.
 
art style

mckmas8808 said:
Funny thing is most people think that the art-style is the worst thing.


Yes that is very funny. I agree the creatures are very ugly but I admire the composition of the scene and in a troll-like way the creatures can be seen as charming as well no? Depends on how animation, voices, etc are combined. Also, physics could be interesting with multiple spheres. I am very disappointed by other design choices such as excessive GPU cycles for resulting poor shadowing. I might prefer if fake shadows used so polygon count improved hence trees, characters will be smooth. But I understand this is tech demo so goal is to show range of capabilities of engine rather than design choice capability of developer.
 
I wouldn't draw too many far-reaching conclusions on the quality of the DOF in this tech-demo, as all the pics look like they use very heavy jpeg compression (lots of artefacting all over the place). Things almost always get screwy when people try to pick apart pictures downloaded off the web and analyze them down to the smallest details.
 