Toshiba, Sony close to 65nm sample production

ERP, what do you mean by "pipelining" - pipelining usually refers to the multiple stages in a single processor
Pipelining is just an approach to parallelism, where you break down a complex operation into sequential parts and implement those sequential parts in parallel on different pieces of sequential data. It is the sort of parallelism that modern processors (and GPUs) exploit for performance.
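A rough sketch of the idea in plain C++ ( the three-stage split and the tiny channel type are mine, purely for illustration ): each stage does one part of the work and hands its result to the next, so different pieces of data are in flight in different stages at the same time.

```cpp
// Minimal software pipeline: three stages on three threads, joined by small
// blocking queues. Different data items occupy different stages at once.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

template <typename T>
class Channel {                              // tiny blocking queue between stages
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {                 // empty optional == "no more data"
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        std::optional<T> v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

int main() {
    Channel<int> fetched, transformed;

    std::thread stage1([&] {                 // stage 1: fetch / decode
        for (int i = 0; i < 8; ++i) fetched.push(i);
        fetched.push(std::nullopt);
    });
    std::thread stage2([&] {                 // stage 2: the actual "work"
        while (std::optional<int> v = fetched.pop()) transformed.push(*v * *v);
        transformed.push(std::nullopt);
    });
    std::thread stage3([&] {                 // stage 3: write back / consume
        while (std::optional<int> v = transformed.pop()) std::cout << *v << ' ';
        std::cout << '\n';
    });

    stage1.join(); stage2.join(); stage3.join();
}
```

Swap the ints for vertex packets or DMA buffers and you have the general shape of it.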
 
Argh... I was hoping we'd gotten past that part, simply because it's been the MOST STUPID running argument overall in this thread, and it's fundamentally useless.

I hate, hate, HATE things that boil down to "we don't know" or "you can't prove that XXX did/didn't do YYY" as they are PURE conjecture, and can be equally applied to both sides. What if Sony has had a team working full time on CELL since 1993, which would not attract notice but could involve a lot of man-hours and tracing chip trends for the past decade! What if Microsoft saw the success of the Atari 2600 and has been planning on breaking in since then, and has had two guys (they're each named "Rufus," amusingly) working at it straight since 1980? :oops: :oops:

Arguing the impact of funding is one thing, as one could reach across the industry or the partners' past to try to draw parallels, or examine the patents and the complexity and question just how much extra such a project would NEED... All that is fine (and has been done), as it's at least working with things that have left their traces on the industry. But when arguing simply the level of funding or project-involvement itself, I can see no reason to give anything but the information we can SEE any serious weight, as conjecture in that area can take any form and be applied to any side of the equation willy-nilly. (Basically, it factors out.)

Why change what we ALWAYS do, which is to re-figure things every time new pieces come to light?
 
Panajev2001a said:
DaveBaumann said:
Note that I was using those two examples as ones that Panajev brings up. The point I'm making is that you need to strike a balance of abilities / performance for the lowest level of application you are going to set them to in relation to the scaled up versions.

What kind of abilities do you see in the APU ( this is the building block and scaling is basically adding more APUs in the PE and then adding more PEs to the system ) that would result in the problem you mention ?

I don't know. But let's take a scenario.

Let's say you have a function that can operate at the right performance with the number of ALUs you have in the PS3; however, your design requirements also say you want this in, say, a PDA in the same time frame, but the number of APUs required to run that function at a decent rate for the PDA's purposes is more than the power budget of that device will allow - what do you do? You can decide not to use it in that device until the next process or two, or you put a specific instruction in at the APU level that would allow the PDA's requirement to be hit but would probably be wasteful in terms of silicon for the larger devices – at your design phase those priorities need to be understood, and dependent on the business need/importance one of those things may occur.

Take another example – in the previous thread where we were talking about texture operations, nAo suggested that the texld instruction could be in at the APU level – well, would this be a sensible instruction for all of the APUs in the PS3? What use would this instruction have in, say, an MP3 player or another non-graphics device? I'd guess at very little.

If they are very general-purpose processing units then you possibly aren't going to hit too many redundancy issues, but then again you also aren't likely to have specific instructions in the APU-level hardware such as nAo was requesting. Alternatively, if you do have specific instructions then you do run the risk of a higher level of redundancy.
 
If they are very general-purpose processing units then you possibly aren't going to hit too many redundancy issues, but then again you also aren't likely to have specific instructions in the APU-level hardware such as nAo was requesting. Alternatively, if you do have specific instructions then you do run the risk of a higher level of redundancy.

I have had that talk with nAo and, as I said to him, I do not think that those special instructions should have a place in the APUs' ISA.

We want the APUs' ISA to be uniform across all APUs for the CELL idea to hold true and produce the expected result: modifying that ISA would negate the ideals behind CELL and would, after a while, require the same kind of software layer for CELL-device-to-CELL-device communication.

This is from Suzuoki's patent:

[0002] The computers and computing devices of current computer networks, e.g., local area networks (LANs) used in office networks and global networks such as the Internet, were designed principally for stand-alone computing. The sharing of data and application programs ("applications") over a computer network was not a principal design goal of these computers and computing devices. These computers and computing devices also typically were designed using a wide assortment of different processors made by a variety of different manufacturers, e.g., Motorola, Intel, Texas Instruments, Sony and others. Each of these processors has its own particular instruction set and instruction set architecture (ISA), i.e., its own particular set of assembly language instructions and structure for the principal computational units and memory units for performing these instructions. A programmer is required to understand, therefore, each processor's instruction set and ISA to write applications for these processors. This heterogeneous combination of computers and computing devices on today's computer networks complicates the processing and sharing of data and applications. Multiple versions of the same application often are required, moreover, to accommodate this heterogeneous environment.

[0003] The types of computers and computing devices connected to global networks, particularly the Internet, are extensive. In addition to personal computers (PCs) and servers, these computing devices include cellular telephones, mobile computers, personal digital assistants (PDAs), set top boxes, digital televisions and many others. The sharing of data and applications among this assortment of computers and computing devices presents substantial problems.

[0004] A number of techniques have been employed in an attempt to overcome these problems. These techniques include, among others, sophisticated interfaces and complicated programming techniques. These solutions often require substantial increases in processing power to implement. They also often result in a substantial increase in the time required to process applications and to transmit data over networks.

[...]

[0009] Therefore, a new computer architecture, a new architecture for computer networks and a new programming model are required. This new architecture and programming model should overcome the problems of sharing data and applications among the various members of a network without imposing added computational burdens. This new computer architecture and programming model also should overcome the security problems inherent in sharing applications and data among the members of a network.

[...]

[0011] In accordance with the present invention, all members of a computer network, i.e., all computers and computing devices of the network, are constructed from a common computing module. This common computing module has a consistent structure and preferably employs the same ISA. The members of the network can be, e.g., clients, servers, PCs, mobile computers, game machines, PDAs, set top boxes, appliances, digital televisions and other devices using computer processors. The consistent modular structure enables efficient, high speed processing of applications and data by the network's members and the rapid transmission of applications and data over the network. This structure also simplifies the building of members of the network of various sizes and processing power and the preparation of applications for processing by these members.

[0012] In another aspect, the present invention provides a new programming model for transmitting data and applications over a network and for processing data and applications among the network's members. This programming model employs a software cell transmitted over the network for processing by any of the network's members. Each software cell has the same structure and can contain both applications and data. As a result of the high speed processing and transmission speed provided by the modular computer architecture, these cells can be rapidly processed. The code for the applications preferably is based upon the same common instruction set and ISA. Each software cell preferably contains a global identification (global ID) and information describing the amount of computing resources required for the cell's processing. Since all computing resources have the same basic structure and employ the same ISA, the particular resource performing this processing can be located anywhere on the network and dynamically assigned.
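( Just to picture what [0012] is describing, here is a rough C++ sketch of a software cell's header; the field names are my own guesses, not the patent's. )

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout of a "software cell" per paragraph [0012]: a global ID,
// a note of the computing resources the cell needs, and the code plus data it
// carries. Field names are illustrative only, not taken from the patent.
struct SoftwareCell {
    std::uint64_t             global_id;       // unique across the whole network
    std::uint32_t             apus_required;   // how many APUs the work wants
    std::uint32_t             sandbox_bytes;   // local memory the cell expects
    std::vector<std::uint8_t> program;         // code in the common ISA
    std::vector<std::uint8_t> data;            // operands the program works on
};
```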

The instructions nAo envisioned do not seem to be part of what CELL is about: if they were introduced, they would have to be part of every APU in some form, introducing the redundancy problem you described.

A better solution would be to embed in a CELL CPU or CELL GPU some dedicated Silicon which is not part of the APU specification.

Like you would add a Sound chip to a console, you can add Pixel Engines or other Hardware constructs without changing the APUs' structure and ISA.
 
The instructions nAo envisioned do not seem to be part of what CELL is about: if they were introduced, they would have to be part of every APU in some form, introducing the redundancy problem you described.

Yes.

But can you also see the flipside of that - making sure that the native instructions within the APUs hit the right level of functionality for the computational power of the range of devices you are looking to place them in, at the right die size / power requirements?

A better solution would be to embed in a CELL CPU or CELL GPU some dedicated Silicon which is not part of the APU specification.

Ahhh, so, fixed functionality isn't such a bad boy then? ;)

Doesn't this also defeat the entire object a little? Surely the point is to have a simple mechanism that is scalable up and down devices with the minimum of change to the basic construct - what's the point of doing that if you have to lob a bunch of extra instructions into a fixed unit somewhere for each different device?

[Kinda OT I guess] Do we have any clue what type of instruction set would be applicable to an APU as well? It strikes me that the "look at the number of FLOPS it can do" willy-waving exercises some people like to go into are missing the issue - dependent on your instruction set, some ops are going to take a ton of APUs/cycles in comparison to more focused hardware. The texld instruction is one such example - obviously that's useful for shader purposes and an NV30, for instance, will be able to carry 4 out in a single cycle - how many cycles would it take non-dedicated hardware? What other instructions that are commonly used in shader ops does specific shader hardware already have that the PS3 will be reliant on a number of APUs & clock cycles to achieve?
 
DaveBaumann said:
The instructions nAo envisioned do not seem to be part of what CELL is about: if they were introduced, they would have to be part of every APU in some form, introducing the redundancy problem you described.

Yes.

But can you also see the flipside of that - making sure that the native instructions within the APUs hit the right level of functionality for the computational power of the range of devices you are looking to place them in, at the right die size / power requirements?

I think that is why the instructions that are part of the APU ISA would be related not to the power requirements of any particular CELL-based device, but to the workloads CELL as an architecture is mostly geared towards.

I see the APUs' ISA as being quite small, and I am talking 50-60 Instructions kind of small.

A better solution would be to embed in a CELL CPU or CELL GPU some dedicated Silicon which is not part of the APU specification.

Ahhh, so, fixed functionality isn't such a bad boy then? ;)

Doesn't this also defeat the entire object a little? Surely the point is to have a simple mechanism that is scalable up and down devices with the minimum of change to the basic construct - what's the point of doing that if you have to lob a bunch of extra instructions into a fixed unit somewhere for each different device?

Hehe, no, fixed functionality is not that bad of a boy: even when talking about 3D Rendering we always assumed that PlayStation 3 was not going to go 100% Software even with CELL, as there are some tasks that are simply easier to implement in fast Silicon Logic than to run through software solutions.

Those tasks also do not benefit from being run in Software: they are solved problems, and they can find successful Hardware implementations.

The idea of CELL was not to replace Dedicated Silicon 100%: we want CELL devices to be flexible, modular and still able to communicate easily with each other, and for that we need a flexible, but common, building block.

We also have to face practical problems and that is why for each device we look at what it has to do and how we can complement CELL to be at its best in that device without ruining the ideals behind CELL.

A CELL based PlayStation 3 could even have a custom non CELL based GPU and PlayStation 3 would still fit in the big picture of a CELL based Home Network although a CELL based Visualizer would be better IMHO.

CELL is flexible enough to handle most of the tasks we want it to do, but sometimes there are a few tasks which are small in themselves but used very often, and which are not worth doing in software - that is why even CELL leaves some space for Dedicated Silicon.

If PlayStation 3 developers want to do software texture filtering because it fits their needs, then they can still do it: they have the flexibility and power to attempt that.

Generally with CELL we try to do with our trusty APUs all those tasks in which we would like to have programmable solutions, but as I keep saying there are some tasks that nobody would really want to code as they are fully solved problems.

[Kinda OT I guess] Do we have any clue what type of instruction set would be applicable to an APU as well? It strikes me that the "look at the number of FLOPS it can do" willy-waving exercises some people like to go into are missing the issue - dependent on your instruction set, some ops are going to take a ton of APUs/cycles in comparison to more focused hardware. The texld instruction is one such example - obviously that's useful for shader purposes and an NV30, for instance, will be able to carry 4 out in a single cycle - how many cycles would it take non-dedicated hardware? What other instructions that are commonly used in shader ops does specific shader hardware already have that the PS3 will be reliant on a number of APUs & clock cycles to achieve?

FP/FX:

ADD, SUB, MUL, DIV, MADD ( more than one kind... with broadcast and without, etc... ), LOAD/STORE, etc...

You would have SIMD and Scalar versions of several of the Arithmetic Instructions of course.

Of course they will have some more specialized Instructions, but those will be related to efficient message passing, general DMA and I/O related operations, etc...

I do not see the real need for a Dot Product Instruction, for example: they should have in their Libraries a function that does that work and maps to a certain sequence of simple instructions, but I am saying something very obvious here.
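For example, a 4-component dot product can simply be a library macro that expands into the basic multiply and multiply-add instructions ( plain C++ standing in for the APU instruction sequence, purely illustrative ):

```cpp
// A DP4 built out of one MUL and three MADD-style operations -- no dedicated
// dot-product instruction required at the ISA level.
float dot4(const float a[4], const float b[4]) {
    float acc = a[0] * b[0];   // MUL
    acc += a[1] * b[1];        // MADD
    acc += a[2] * b[2];        // MADD
    acc += a[3] * b[3];        // MADD
    return acc;
}
```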

If you look at the EE's VUs' ISA, minus some instructions like CLIP that are related to 3D Graphics in particular, we might find a good deal of the kind of ISA we are expecting the APUs to have.

I expect them to extensively profile 3D Engines, Image Processing applications, Networking Stacks, etc... ( all applications in which CELL has the advantage and for which its power and architecture is best suited ) and to include in the ISA the most used Instructions ( plus of course some set of useful general and basic operations ) they found.

I do not see complex 3D Operations being part of the ISA, but I do see the ISA containing the useful operations which would be needed to implement those Operations and make them relatively fast.

Some of those instructions might be used for several tasks... to give an example, we might see the usefulness of some comparison Instructions that work on absolute values.

Those would be useful for Physics, Image Processing and general Signal Processing, 3D Graphics, etc...
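( To give a trivial picture of the kind of pattern such an instruction would collapse into one op - a hypothetical C++ stand-in: )

```cpp
#include <cmath>

// A magnitude test of the kind that shows up in physics tolerances, image
// thresholding and general DSP: with an absolute-value compare this becomes a
// single instruction instead of an abs followed by a compare.
bool within_tolerance(float delta, float epsilon) {
    return std::fabs(delta) < epsilon;
}
```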

One thing about what ERP and Fafalada said about "Pipelines", again quoting Suzuoki's CELL patent:

[0131] The ability of APUs to perform tasks independently under the direction of a PU enables a PU to dedicate a group of APUs, and the memory resources associated with a group of APUs, to performing extended tasks.

The PUs can create "Pipelines", dedicating APUs to certain tasks: it seems to me that the STI guys thought about cases in which this might be useful to programmers.
 
Generally with CELL we try to do with our trusty APUs all those tasks in which we would like to have programmable solutions, but as I keep saying there are some tasks that nobody would really want to code as they are fully solved problems.

Well, bear in mind there should be an API in there, and I would assume that would present things as a "solved problem" to the developer but still actually run them in software. DX, for instance, has macros - NV30 can run sincos in a single cycle through dedicated hardware, however DX has a sincos macro that, for hardware without the native functionality, will execute it in up to 12 cycles. I would imagine there would be lots of instructions presented to the developer through the PS3 API that would then get executed in software (presented via the HAL) over numerous cycles / APUs.
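( Not the actual DX expansion - just my own illustration of how a sincos can be built from a handful of MULs and MADDs when there is no native instruction; the polynomial degrees are arbitrary and the angle is assumed to be range-reduced already. )

```cpp
#include <utility>

// sincos from plain arithmetic: truncated series evaluated with nested
// multiply-adds. Assumes x has already been reduced to roughly [-pi, pi].
std::pair<float, float> sincos_poly(float x) {
    float x2 = x * x;
    // sin(x) ~ x - x^3/3! + x^5/5! - x^7/7!
    float s = x * (1.0f + x2 * (-1.0f / 6.0f
                + x2 * (1.0f / 120.0f + x2 * (-1.0f / 5040.0f))));
    // cos(x) ~ 1 - x^2/2! + x^4/4! - x^6/6!
    float c = 1.0f + x2 * (-0.5f
                + x2 * (1.0f / 24.0f + x2 * (-1.0f / 720.0f)));
    return {s, c};
}
```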

ADD, SUB, MUL, DIV, MADD ( more than one kind... with broadcast and without, etc... ), LOAD/STORE, etc...

You would have SIMD and Scalar versions of several of the Arithmetic Instructions of course.

Of course they will have some more specialized Instructions, but those will be related to efficient message passing, general DMA and I/O related operations, etc...

Yeah, this is fairly low level then.

Question - have you looked at the instruction set of DX9-class shader hardware?

http://www.beyond3d.com/articles/nv30r300/index.php?p=4#vsinstr
http://www.beyond3d.com/articles/nv30r300/index.php?p=8#psinstr
 
DaveBaumann said:
Generally with CELL we try to do with our trusty APUs all those tasks in which we would like to have programmable solutions, but as I keep saying there are some tasks that nobody would really want to code as they are fully solved problems.

Well, bear in mind there should be an API in there, and I would assume that would present things as a "solved problem" to the developer but still actually run them in software. DX, for instance, has macros - NV30 can run sincos in a single cycle through dedicated hardware, however DX has a sincos macro that, for hardware without the native functionality, will execute it in up to 12 cycles. I would imagine there would be lots of instructions presented to the developer through the PS3 API that would then get executed in software (presented via the HAL) over numerous cycles / APUs.

Of course, I expect them to provide API support for some of those "solved problems", and we might basically have macros, which would mean that those problems are executed in software.

I expect though that some of those "solved problems" might need an implementation which provides a high throughput ( Texture Sampling for example ) and in those cases I see them providing some Dedicated Silicon ( like in the Pixel Engines in the Visualizer ).

ADD, SUB, MUL, DIV, MADD ( more than one kind... with broadcast and without, etc... ), LOAD/STORE, etc...

You would have SIMD and Scalar versions of several of the Arithmetic Instructions of course.

Of course they will have some more specialized Instructions, but those will be related to efficient message passing, general DMA and I/O related operations, etc...

Yeah, this is fairly low level then.

Question - have you looked at the instruction set of DX9-class shader hardware?

http://www.beyond3d.com/articles/nv30r300/index.php?p=4#vsinstr
http://www.beyond3d.com/articles/nv30r300/index.php?p=8#psinstr

I haven't had a good and extensive look at the DirectX 9 Instruction set yet, but I will.... thanks for linking those pages :)

Dave... this, this kind of discussion is what makes this forum so unique :) ( oh well, I like it a lot :D )
 
Ok, after a QUICK skim of the ISA for DX9...

I think we want the APUs to be able to branch and execute if-and-else statements: they are not supposed to be glorified FPUs.

Flow Control ( full or partial predication... or something as simple as CMOV ) should be part of the APUs' ISA.
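( A trivial C++ illustration of what even simple predication buys you - the second version needs no jump at all: )

```cpp
// Branchy clamp vs. a predicated / CMOV-style select. The first needs a
// conditional jump; the second is a compare plus arithmetic select, which is
// friendlier to a simple in-order unit with no branch prediction.
int clamp_branch(int x, int hi) {
    if (x > hi) return hi;
    return x;
}

int clamp_select(int x, int hi) {
    int take_hi = (x > hi);                    // compare -> 0 or 1
    return take_hi * hi + (1 - take_hi) * x;   // select without a jump
}
```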

JUMP, JUMPNZ, CALL, RETURN, IF, ELSE, LABEL, LOOP, etc... I think they should be featured in the APUs' ISA ( they are useful for making the APUs the flexible and independent processors the patent specifies they are ).

I am sure Branch Prediction will not be present in the APUs' ISA, although there is a chance the PUs might feature some form of ( at least static ) branch prediction.

I do not see DP3 and DP4 ( Dot-Product Instructions ) being part of the ISA ( they are efficiently doable with fast macros ), and I am not sure about SIN and COS, but I am pretty sure about DDX and DDY being implemented by macros.

To tell the truth, I would see no problem in implementing several of the listed Math Instructions in the ISA: I am sure that some of them might be useful for several applications and programmers might like a fast Hardware implementation.

A lot of programmers on PlayStation 2 do not use the EXP, LOG, SIN and COS functions the EFU provides on the VU1 because, compared to using LUTs, they are still too slow.
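( For illustration, the sort of LUT-based sine people reach for instead - my own minimal version: nearest entry, no interpolation, assuming a non-negative angle. )

```cpp
#include <cmath>

// 256-entry sine table: one table build up front, then a sine costs a
// multiply, a mask and a load instead of a call into the EFU.
constexpr int    kEntries = 256;
constexpr double kTwoPi   = 6.28318530717958647692;

static float g_sin_table[kEntries];

void init_sin_table() {
    for (int i = 0; i < kEntries; ++i)
        g_sin_table[i] = static_cast<float>(std::sin(kTwoPi * i / kEntries));
}

float fast_sin(float radians) {                 // assumes radians >= 0
    int idx = static_cast<int>(radians * (kEntries / kTwoPi)) & (kEntries - 1);
    return g_sin_table[idx];
}
```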

It will be up to STI's engineers to decide if EXP, FRAC, LOG, RCP, RSQ, for example, are useful enough to be implemented in the APUs' ISA and if they can afford to add those functions to the APUs.

Comparison Instructions should be present in what we consider the general Low Level Instructions the APUs' ISA should contain. The rest ( more complex operations ) is up to the STI engineers, the applications they see CELL focused on, and how big they want to make each APU ( if they can fit some more instructions they will again look at what is most used/needed and implement that [they probably already did all that work] ).
 
ERP said:
There just isn't that much stuff in the average game outside of rendering that works on a lot of sequential, non-dependent data.
I am not so sure about this - particularly, games that have lots of gameworld entities, or complex ones, could be broken down into non-dependent data processes that are also sequential.
Granted, it's perfectly possible to have code written in such a manner that this isn't feasible :|
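( A toy sketch of what I mean, with made-up types: when each entity's update reads and writes only its own state, the array can just be cut into chunks and handed to however many workers the hardware has. )

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Particle { float x, v; };   // hypothetical, dependency-free per-entity state

// Each element's update depends only on itself, so the loop parallelises
// trivially: cut the array into chunks, one worker per chunk, join at the end.
void update_all(std::vector<Particle>& ps, float dt, unsigned workers) {
    std::vector<std::thread> pool;
    std::size_t chunk = (ps.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end   = std::min(ps.size(), begin + chunk);
        pool.emplace_back([&ps, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                ps[i].x += ps[i].v * dt;       // no cross-entity reads or writes
        });
    }
    for (auto& t : pool) t.join();
}
```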

I would argue that the main bottlenecks for display building are reading and copying data.
Which would be fine if one core actually had the capacity to saturate the bandwidth instead of stalling on misses all the time. Somehow I doubt this will be the case, though.

I'm not really trying to say that games can't make use of multiple processors for their core logic, just that the way most are currently constructed, it's difficult and time consuming, not to mention debugging hell. And in a deadline-oriented industry that usually equates to "won't happen often".
No argument here.
 
Panajev2001a said:
I expect though that some of those "solved problems" might need an implementation which provides a high throughput ( Texture Sampling for example ) and in those cases I see them providing some Dedicated Silicon ( like in the Pixel Engines in the Visualizer ).

Yeah, but I get the impression that the functions of the pixel engine will probably stick to what would be termed as the "fixed function SGI raster end" - i.e. texturing, Z ops, AA, etc. I'll be surprised if you see anything particularly programmable in there.

Fafalada said:
I'm not really trying to say that games can't make use of multiple processors for their core logic, just that the way most are currently constructed, it's difficult and time consuming, not to mention debugging hell. And in a deadline-oriented industry that usually equates to "won't happen often".
No argument here.

Curiously, though, aren't Intel helping in this regard? They are running around telling developers how to make use of multiple threads in the game engine for HyperThreading, and this knowledge should be just as applicable to the pipelined ALU system in the PS3.
 
Curiously, though, aren't Intel helping in this regard? They are running around telling developers how to make use of multiple threads in the game engine for HyperThreading
Faf already explained it; it costs too much and takes too much time to construct a multi-threaded engine, unless the engine comes shrink-wrapped on a CD ready to compile and developers just supply the content file to feed into the engine. Just look at how long it takes to construct a decent multi-threaded engine like Doom3 and you see the problem.

and this knowledge should be just as applicable to the pipelined ALU system in the PS3.
Actually CELL programming is based on the concept of message passing; that is, many independently spawned processes cooperate with each other via message pipes, whereas multithreading assumes several threads working within a single process address space. The techniques for constructing a message-passing program and a multi-threaded program are very different. CELL programming draws on experience from MPI/PVM cluster programming, which is a rare skill among game coders and must be taught by SCEI. Kutaragi is gambling that he can teach the developers to program for PSX3 just like he got the developers to work on PSX2 five years ago. We will see what happens.
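( For the curious, a bare-bones sketch of that style using the MPI C API from C++ - two ranks, one hands out a job, the other does it and sends the result back; nothing is ever shared, only messages move. This is just my illustration of the programming model, not anything CELL-specific. )

```cpp
#include <mpi.h>
#include <iostream>

// Message passing in its simplest form: each rank owns its own data and
// cooperates only by sending and receiving, never by touching shared memory.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int work = 42;                         // rank 0 hands out a job
        MPI_Send(&work, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        int result = 0;
        MPI_Recv(&result, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "result = " << result << '\n';
    } else if (rank == 1) {
        int work = 0;
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int result = work * 2;                 // do the job on local data
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```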
 
DaveBaumann said:
Panajev2001a said:
I expect though that some of those "solved problems" might need an implementation which provides a high throughput ( Texture Sampling for example ) and in those cases I see them providing some Dedicated Silicon ( like in the Pixel Engines in the Visualizer ).

Yeah, but I get the impression that the functions of the pixel engine will probably stick to what would be termed as the "fixed function SGI raster end" - i.e. texturing, Z ops, AA, etc. I'll be surprised if you see anything particularly programmable in there.

You have the Visualizer's APUs :D
 
Deadmeat said:
Curiously, though, aren't Intel helping in this regard? They are running around telling developers how to make use of multiple threads in the game engine for HyperThreading
Faf already explained it; it costs too much and takes too much time to construct a multi-threaded engine, unless the engine comes shrink-wrapped on a CD ready to compile and developers just supply the content file to feed into the engine. Just look at how long it takes to construct a decent multi-threaded engine like Doom3 and you see the problem.

I don't think it necessarily has to cost too much, it's probably more a different way of thinking. Aquanox 2 is an example of it already in operation - I believe they run collision detection on a separate thread; other physics and AI are also examples that could probably be fairly easily threaded.
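( Something along these lines, with completely made-up types - collision detection kicked off on its own thread while the main thread carries on, joined before the results are needed: )

```cpp
#include <future>
#include <vector>

struct Entity  { float x, y, z, radius; };   // hypothetical game-side types
struct Contact { int a, b; };

// Trivial O(n^2) sphere-sphere test, purely a stand-in for a real narrow phase.
std::vector<Contact> detect_collisions(const std::vector<Entity>& world) {
    std::vector<Contact> hits;
    for (int i = 0; i < static_cast<int>(world.size()); ++i)
        for (int j = i + 1; j < static_cast<int>(world.size()); ++j) {
            float dx = world[i].x - world[j].x;
            float dy = world[i].y - world[j].y;
            float dz = world[i].z - world[j].z;
            float r  = world[i].radius + world[j].radius;
            if (dx * dx + dy * dy + dz * dz < r * r) hits.push_back({i, j});
        }
    return hits;
}

// One way a frame could overlap work, in the spirit of the Aquanox example:
// start collision detection on another thread, run other work on this one,
// then join before resolving physics.
void run_frame(std::vector<Entity>& world) {
    auto contacts = std::async(std::launch::async,
                               detect_collisions, std::cref(world));
    // update_ai(world);  // hypothetical: runs concurrently; must not touch
    //                    // the data the collision thread is reading
    std::vector<Contact> hits = contacts.get();   // join before physics
    // resolve_physics(world, hits);               // hypothetical follow-up
}
```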

Panajev2001a said:
DaveBaumann said:
Yeah, but I get the impression that the functions of the pixel engine will probably stick to what would be termed as the "fixed function SGI raster end" - i.e. texturing, Z ops, AA, etc. I'll be surprised if you see anything particularly programmable in there.

You have the Visualizer's APUs :D

Oh, absolutely, I was talking about the pixel engine specifically, not the visualiser as a whole.
 
Dave,

I don't have it, but perhaps someone can link you to SCE's somewhat recent patent related to a traditional rasterizer - the Pixel Engines you just referred to. It's been mumbled that its design is intended for PSP, but I'd assume it would be a decent starting point, as I question how much the PS3's fixed logic will deviate from it. It's not that elegant, and leaves much to be desired in some aspects, but it's doable considering what's in front of it.
 
DaveBaumann said:
Panajev2001a said:
DaveBaumann said:
Yeah, but I get the impression that the functions of the pixel engine will probably stick to what would be termed as the "fixed function SGI raster end" - i.e. texturing, Z ops, AA, etc. I'll be surprised if you see anything particularly programmable in there.

You have the Visualizer's APUs :D

Oh, absolutely, I was talking about the pixel engine specifically, not the visualiser as a whole.

I agree with your concept of Pixel Engines then completely: AA, Texturing ( sampling, etc... ) and Z-checks.

The Image Cache they have attached might be a Texture Cache of some sort, as they would have the chance of modifying the access to that Memory Pool to optimize it for Texture accesses while leaving the main pool of shared e-DRAM ( shared between the PEs ) unchanged.

If the main pool of Shared e-DRAM is fast enough we might leave the frame-buffer and the Z-buffer there.

Each PE would likely work on Rasterizing separate Triangles: after all, we expect next-generation average Triangle size to fall to even 1-2 Pixels or lower in some optimized engines, and dedicating 4 Pixel Pipelines or more to fill a single Triangle might be wasteful.

It is not like each Triangle would not have enough Shading power concentrated on it: you still have 4 APUs on each Visualizer PE.
 
Vince said:
Dave,

I don't have it, but perhaps someone can link you to SCE's somewhat recent patent related to a traditional rasterizer - the Pixel Engines you just referred to. It's been mumbled that its design is intended for PSP, but I'd assume it would be a decent starting point, as I question how much the PS3's fixed logic will deviate from it. It's not that elegant, and leaves much to be desired in some aspects, but it's doable considering what's in front of it.

I find what we hear about the PSP's GPU to be quite neat and if the rumors you hear about the Pixel Engines in the Visualizer PEs are related to the PSP's GPU Rasterizing logic, well I cannot say I would be unhappy :)

The patents I have seen that relate somewhat to the PSP were a bit more elegant than you give them credit for: I liked the P-buffer idea and the "DOOM III in Hardware" rendering scheme idea :)
 
...

I don't think it necessarily has to cost too much, it's probably more a different way of thinking.
Time is money. If a project takes twice as long to finish, then it is going to cost twice as much (if not more, if you consider the financial interest).

Aquanox 2 is an example of it already in operation - I believe they run collision detection on a separate thread; other physics and AI are also examples that could probably be fairly easily threaded.
Dealing with two or three threads isn't too bad; your brain can handle it. But things start getting out of hand when the thread count exceeds 8, and PSX3 certainly has more than 8 APUs if the patent application is to be believed.
 
Panajev2001a said:
Each PE would likely work on Rasterizing separate Triangles: after all, we expect next-generation average Triangle size to fall to even 1-2 Pixels or lower in some optimized engines, and dedicating 4 Pixel Pipelines or more to fill a single Triangle might be wasteful.
A good point, particularly for a console - well, non-HDTV at least - where resolutions are low.

Of course, it has the flip side that on larger triangles such an architecture could be inefficient. There is significant cost efficiency and reduction of redundant operations in raising granularity.
 
Re: ...

Panajev2001a said:
The patents I have seen that relate somewhat to the PSP were a bit more elegant than you give them credit for: I liked the P-buffer idea and the "DOOM III in Hardware" rendering scheme idea :)

Ahhh, the poor man's tiler! This is actually in use on 3DLabs' P9 and at one point S3's DeltaChrome was supposed to do it, but I'm not sure whether it's actually running or not.

However, the thing to bear in mind with the Doom3-style rendering system is that it only saves at render time, not geometry - your gains would be dependent on where you place the rendering onus (geometry or raster). This makes a lot of sense for PSP, as I would expect it to be largely raster-bound quite often, but I'm still not so sure with PS3 - conversely, with a system that uses lots of geometry, deferring the rendering will eat up memory or the geometry has to be calculated twice.

Deadmeat said:
Dealing with two or three threads isn't too bad; your brain can handle it. But things start getting out of hand when the thread count exceeds 8, and PSX3 certainly has more than 8 APUs if the patent application is to be believed.

Just going by what I see, I doubt you'll have multiple threads across each APU - that would be for the system to handle. I'd say the threads you see exist at the PU level.
 