Interview with one of the N64's system architects: Phil Gossett

Nightz

He was responsible for designing the RDP. He gives some insight into the reasoning behind several of the design decisions they made, such as the trilinear filtering implementation, the texture cache and the choice of Rambus memory. You also get to hear about other influential Nintendo hardware designers, Wei Yen and Tim Van Hook.

Skip to about 41:12 to hear the N64 stuff



Partial transcript


Gossett: I got a call from another set of, you know, friends that I’d worked with in the past who were at SGI, and they wanted to do a project with Nintendo, okay, which was sort of an odd thing, okay. I don’t think that was really in the genome of SGI to do low-end things. They were the super-high end. So, you know, they were thinking about doing it using people internally, but it became obvious that this was probably not a good idea, so they hired people like me and a bunch of other people, who had done lower-end things.


Mashey: Was also a terrific financial deal for SGI.


Gossett: Indeed, indeed. So the financial part of it was they got royalties on not just every console that was sold but also on every cartridge, every game that was sold, and this is a goldmine, okay.


Mashey: Dave Corbin, Andy Kean special


Gossett: So, you know, I don’t know who’s responsible for this, but it was genius.


Mashey: Yeah, it was those guys. Yes.


Gossett: Best deal ever.


Mashey: Yes.


Gossett: It was the perfect deal. There’s no down side and all up side. So anyway, they hired this-- well, they hired one architect who didn’t last very long. I saw his work and it wasn’t very inspired. Then they hired this guy, Tim Van Hook, who was very clever, and he had an outline, a sketch, I think he called it, and it was like 80 pages or something like that, that, you know, didn’t go into too much nitty-gritty detail but just kind of spec’ed out what it was, how it worked, how things were connected, blah, blah, blah. So anyway, and he was mostly interested in two parts of this design, so there were two sections to it. There was the reality signal processor-- it was called Project Reality internally. There was the reality signal processor, the RSP, and the reality data processor or display processor, I forget which, the RDP, and the RSP was like the geometry engine, and it was also all the audio processing, okay. So depending on what you were doing you wanted different size data, okay, so it had this vector unit that would be either eight 8-bit units, I think it was, or four 16-bit units or two 32-bit units. Something like that. I could be off by a factor of two on that, but anyway, there were some problems with that, okay. I mean, for the adds, that all makes perfect sense, but these were multiply adds, and multiplies go quadratically, right. It’s a square of the size. So it didn’t really tile very well, and we just ended up duplicating logic, right, you know, but that was fine. It actually still made sense, because from a programming point of view it was slick. Okay. So I didn’t really have that much to do with that side of the chip. That was Mary Jo Dougherty, who was the overall architect of that, and she was very good, and I was sort of the architect of the RDP side, okay. So the thing you have to remember, this is like ’94, I think?
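On the remark above that adds tile across lane widths but multiply-adds do not, a back-of-the-envelope sketch may help. It assumes a 64-bit vector datapath (taken from the "eight 8-bit ... two 32-bit" figures, which Gossett himself says may be off by a factor of two) and a deliberately crude cost model: an adder costs about one cell per bit, a multiplier about one cell per partial-product bit. The names `DATAPATH_BITS` and `lane_costs` are made up for this illustration.

```python
# Crude cost model, for illustration only (not taken from the RSP design):
#   adder      ~ lane_bits cells per lane      (linear)
#   multiplier ~ lane_bits ** 2 cells per lane (quadratic)
DATAPATH_BITS = 64  # assumption based on the lane counts quoted above


def lane_costs(lane_bits):
    """Return (lanes, total adder cells, total multiplier cells)."""
    lanes = DATAPATH_BITS // lane_bits
    adder_cells = lanes * lane_bits            # same total for every split
    multiplier_cells = lanes * lane_bits ** 2  # doubles when the lane width doubles
    return lanes, adder_cells, multiplier_cells


for bits in (8, 16, 32):
    lanes, adds, muls = lane_costs(bits)
    print(f"{lanes} x {bits}-bit: adder cells={adds}, multiplier cells={muls}")
```

Under this model the adder total stays constant for every lane split, while the multiplier total doubles each time the lane width doubles, which is one way to read "it didn't really tile very well".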


Mashey: Yeah, ’94, yeah.


Gossett: And, you know, technology was kind of just barely adequate. Okay. And, you know, the real graphics engines, the RealityEngine, I think. Yeah, yeah. Were like a million dollars, okay, so--


Mashey: Run multiple screens, the whole bit. Yes.


Gossett: Exactly. Exactly, and we were trying to make something that was going to retail for like 150 bucks, okay, so--


Mashey: Yeah, but with one chip.


Gossett: Yeah, yeah. So you can’t do that. I mean, you can’t do anything like that, okay. But we didn’t want it to look cheesy, okay, right, so we wanted it to look like it came from a RealityEngine, you know, be it a clearly reduced resolution, but, you know, we wanted all the fixings, so we wanted, you know, texture mapping--


Mashey: Texture mapping and the whole bit, yeah.


Gossett: --and we wanted anti-aliasing and, you know, all that good stuff. So the sort of classic ways of doing that were just way too expensive, not even close. So what we did, so it was mostly me, but Tim Van Hook added a nice little embellishment, which I’ll get to in a sec, but, well, Tim came up with this clever way of folding the texture memory, okay, so you had mipmaps, they were called, and so power of two different resolutions and you’d sort of interpolate between two of them, depending on what level of detail you were at. This was the cheap way to do that. So he had this very clever way of folding the memory addressing so that it all, you know, optimized the use of memory, on-chip memory. But pretty much everything else was mine. So, you know, I didn’t design every gate but, you know, most of the gates, and I designed the architecture, you know, the detailed architecture. Oh, I also-- I think I designed every arithmetic element on that chip, okay, so that was one of my specialties was doing, you know, math units, and that-- we had a very limited choice of cell libraries to use for this chip, okay. It was a full custom chip, but you don’t design everything from scratch. You have a cell library, right. So there was the so-called high-performance library and there was the so-called high-density library, which were really the low-density library and the low-performance library.
 
Mashey: Library. Yeah, right. Right. Yeah, okay.


Gossett: Okay. So the high-performance, low-density library was just-- we couldn’t afford it. There was no way we could’ve gotten anywhere close to the target. But the high-density, low-performance library was nowhere near fast enough. Now, for a lot of the chip, this didn’t really matter. You know, we weren’t pushing things. Except for the arithmetic parts, okay. There you really cared about performance. So what we did was we contracted some library design company to make I think it was a half dozen carefully selected cells. Okay. So there was like a D-flipflop, a full adder. You know, a 2-1 MUX, NAND gate, XOR, okay. I forget what the last one was, but anyway. They were very carefully chosen to fit the design for the arithmetic elements, so that worked great. Those knocked it out of the park. So anyway, the other things that I had to do, we couldn’t quite afford the normal trilinear interpolation that you do for a mipmapped thing. So I came up with this hack, which was dubbed triangular interpolation, which split the four-element square in each of the two levels into two triangles, okay, and then you got to save a couple of multipliers by doing this, okay. Which mattered, okay. We were counting gates. So I did that, and that gave it sort of this interesting characteristic look. You know, things looked sort of hexagonal kind of. You’ve seen the graphics.
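To make the "triangular interpolation" idea concrete, here is a minimal sketch for one color channel within a single mip level. It is a reading of the description above, not the actual RDP logic: the choice of diagonal, the function names and the floating-point arithmetic are all assumptions. `fs` and `ft` are the fractional texture coordinates inside the 2x2 texel square.

```python
def lerp(a, b, t):
    """Linear interpolation: one multiply."""
    return a + t * (b - a)


def bilinear(t00, t10, t01, t11, fs, ft):
    """Conventional four-texel filter: three lerps (three multiplies) per channel."""
    top = lerp(t00, t10, fs)
    bottom = lerp(t01, t11, fs)
    return lerp(top, bottom, ft)


def triangular(t00, t10, t01, t11, fs, ft):
    """Three-texel filter: split the square along one diagonal (assumed here
    to run from t10 to t01) and interpolate inside the triangle that
    contains (fs, ft). Two multiplies per channel."""
    if fs + ft <= 1.0:
        # Triangle with corners t00, t10, t01
        return t00 + fs * (t10 - t00) + ft * (t01 - t00)
    # Triangle with corners t11, t01, t10
    return t11 + (1.0 - fs) * (t01 - t11) + (1.0 - ft) * (t10 - t11)


# Where the four texels lie on a plane the two filters agree...
print(bilinear(0, 1, 2, 3, 0.25, 0.5), triangular(0, 1, 2, 3, 0.25, 0.5))
# ...and where they do not, the triangle edge becomes visible.
print(bilinear(0, 0, 0, 1, 0.5, 0.5), triangular(0, 0, 0, 1, 0.5, 0.5))
```

The second pair of values differs because the three-texel version ignores the fourth corner entirely inside each triangle, which is a plausible source of the characteristic look mentioned above.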


Mashey: I owned one of these things, so yeah.


Gossett: So, you know, had character.


Mashey: Yeah, it had character, yes.


Gossett: It didn’t look bad. It just...


Mashey: No. It looked, I mean, for the time, it looked really pretty good. Yeah.


Gossett: Yeah, yeah. So there was that hack. There was-- we wanted phong shading, you know, the highlights on Mario’s nose, right.


Mashey: Yes, yes, Mario’s nose, yes. I had that Mario, yeah.

Gossett: So we couldn’t afford to do that in the normal way, so I came up with this hack where you used the X and Y components of the half-angle normal, okay, and used that as addresses into a texture, okay, and in the texture you have a little Gaussian bump, okay, and that was the highlight, and that worked great and cost nothing, okay. It was very, very effective. It was just software to calculate the half-angle normals, and what was the-- well, okay. So there were two other really big ones. One was-- we built the first version of this. There were two versions, you know, two generations of the chip. The first generation we knew wasn’t going to ship. It was way too big.
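The highlight trick described above can be sketched in a few lines. This is only an illustration of the idea: the texture size, the Gaussian width `sigma`, the mapping of components to texels and both function names are assumptions, and the sketch takes the X and Y components as already computed in software, in a frame where a perfectly aligned highlight gives x = y = 0.

```python
import math


def make_highlight_texture(size=32, sigma=0.15):
    """Build a size x size intensity map with a Gaussian bump in the middle."""
    tex = []
    for j in range(size):
        row = []
        for i in range(size):
            # Texel centre mapped to the range [-1, 1]
            x = (i + 0.5) / size * 2.0 - 1.0
            y = (j + 0.5) / size * 2.0 - 1.0
            row.append(math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)))
        tex.append(row)
    return tex


def highlight(tex, x, y):
    """Use the X and Y components (each in [-1, 1]) as texture addresses."""
    size = len(tex)
    i = min(size - 1, max(0, int((x * 0.5 + 0.5) * size)))
    j = min(size - 1, max(0, int((y * 0.5 + 0.5) * size)))
    return tex[j][i]


tex = make_highlight_texture()
for x in (0.0, 0.2, 0.5):
    print(x, round(highlight(tex, x, 0.0), 4))
```

The per-pixel cost is just an ordinary texture fetch, which fits the remark that the hardware side "cost nothing"; the width of the bump plays the role of the shininess exponent.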

Mashey: Yeah, sure, yeah.

Gossett: We weren’t even trying to make the target.


Gossett: Right, right.

Mashey: Which was a bus size cut down version. Okay. Right.

Gossett: Right. That was sort of optimized for game use, and that was a separate--

Mashey: Right, and it was the graphics chip.


Gossett: That was a separate project and then there was the graphics chip.


Mashey: Yeah. Chip, yeah.


Gossett: And that was--


Mashey: And what else was in that?


Gossett: That was it.


Mashey: That was it, yeah.



Gossett: Yeah. I mean, there was some security nonsense and, you know, whatever.


Mashey: Yeah, okay. Was some RAM someplace, right?


Gossett: There-- Rambus, yes, right. I’ll get to that in a second, okay.



Gossett: So actually, let’s get to Rambus now. So one of the obvious mistakes that they made was choosing Rambus. Sorry, Rambus, but it was supposed to be a cost saver. So Takeda-san, who was the Nintendo exec who was running the project, was going on and on, you know, “Two yen per chip.” Okay. Or two yen per pin, right. So he was counting pins, right, because it cost, and, you know, true enough. The unfortunate reality of this was that Rambus was eight-way multiplexed, okay, so it was getting eight data bits across every clock. Okay. Now, buses are a challenging thing under the least constrained circumstances, and if you’re going eight times faster than normal, you know, I mean, duh. This is not going to go well, right. So you have to do things, and the things that you tend to do is have lots of grounds. Okay. So yeah, it had fewer signal pins, but it made up for it with ground pins.


Mashey: With grounds, yeah.


Gossett: Okay. So I don’t think it actually saved anything on the actual pin count, and it was a nightmare from a sort of architectural point of view, because the inputs and outputs, the data ins and outs, were shared. They were multiplexed, right. So you had to turn the corner, right, whenever you wanted to go from reading to writing, right, and we weren’t allowed very much in the way of buffering on chip, as we were trying to cut die size, cut cost. So that really sucked performance. There was just no way they could be helped. But, you know, we were stuck with it, so we did the best we could, and anyway, so going back to the RDP. So there were two tricky things. One was we didn’t have very much memory, okay. So even though, you know, this was, like, considered a high-end memory chip at the time, it wasn’t very high-end, okay. I think it was half a megabyte, which sounds ridiculous today, but...

Mashey: Yeah, but it’s right.

Gossett: Anyway...

Mashey: That was then.
 
Gossett: That was then. So, you know, we wanted to have a reasonable sized frame buffer, right, but we didn’t really have anywhere to put it, right. So we couldn’t afford to do even, you know, 8-bit per component, right, which would be a 32-bit word, you know, with some alpha. So it had to be 5-5-5. Okay. Five bits per component, and because it was a Rambus chip with parity, it was actually 9 bits times 2, so it was 18 bits. So it’s 5-5-5-3, okay, so 3. What are we going to do with that, right?
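The 5-5-5-3 arithmetic above (two 9-bit units, 18 bits, 15 of them color) can be illustrated with a small pack/unpack pair. The bit order used here (red in the top bits, the spare three bits at the bottom) is an assumption made for the example, as are the function names; the interview only gives the field widths.

```python
def pack_5553(r, g, b, extra):
    """Pack 5-bit R, G, B and a 3-bit spare field into one 18-bit word."""
    for name, value, limit in (("r", r, 32), ("g", g, 32), ("b", b, 32), ("extra", extra, 8)):
        if not 0 <= value < limit:
            raise ValueError(f"{name}={value} does not fit in its field")
    return (r << 13) | (g << 8) | (b << 3) | extra


def unpack_5553(word):
    """Inverse of pack_5553: return (r, g, b, extra)."""
    return (word >> 13) & 0x1F, (word >> 8) & 0x1F, (word >> 3) & 0x1F, word & 0x7


word = pack_5553(31, 0, 17, 5)
print(f"{word:018b}", unpack_5553(word))
```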

Mashey: Yeah.

Gossett: So I came up with this-- well, two things. We-- first thing we did was just implement that straight up, and it looked terrible. You had to dither the five bits, so you compute eight bits and you dither it down to five bits. The problem was that the dithering interacted with the NTSC color system. Okay. So it just looked, I mean, it probably would’ve caused an epileptic seizure, you know. So it was hard to look at. I mean, it was painful. So it’s like, “Oh, crap. We can’t ship that,” right. So I came up with this dither correction filter, okay, which was cute. So what you would do is you had a couple of line buffers on the way out to the display, okay, and we could just barely afford that. That was acceptable, and you used that to form a little three-by-three-pixel region, and you’d look at the color of the center pixel and then look at the color of the pixels surrounding it, and if they were, you know, above you’d add one into the LSB’s and if they were below, you’d subtract one or the-- or yeah. Anyway, this was just enough to use the neighborhood to smooth things out so that it looked right. Okay. It was surprisingly effective, okay. It sounds lame, but, you know, it worked. Then the other thing was anti-aliasing, okay. So if you do nothing, you get jaggies, okay, and it looks awful.
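The dither correction filter is only described loosely above, so the following is one possible reading rather than the shipped logic. It assumes a single 5-bit channel, treats the 5-bit value as the top bits of an 8-bit result, and nudges the low bits up or down by one for each of the eight neighbors that is above or below the center. The function name and the clamping are made up for the sketch.

```python
def correct_pixel(img5, x, y):
    """Estimate an 8-bit value for an interior pixel of a 5-bit image.

    img5 is a list of rows of 5-bit values for one channel; (x, y) must not
    lie on the border, because a full 3x3 neighbourhood is needed.
    """
    centre = img5[y][x]
    adjust = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            neighbour = img5[y + dy][x + dx]
            if neighbour > centre:
                adjust += 1   # neighbour is brighter: add one LSB
            elif neighbour < centre:
                adjust -= 1   # neighbour is darker: subtract one LSB
    return max(0, min(255, (centre << 3) + adjust))


# A dithered gradient: alternating 10s and 11s in 5-bit terms.
dithered = [
    [10, 11, 10],
    [11, 10, 11],
    [10, 11, 10],
]
print(correct_pixel(dithered, 1, 1))  # centre 10 with four brighter neighbours
```

In a flat area the value passes through unchanged; in a dithered area the center is pulled toward the local average, which is the smoothing effect the passage describes.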

Mashey: Yeah. You can’t have it, yeah. Right.


Gossett: You can’t have that. So we couldn’t afford to do the usual thing, which was to multi-sample, so the traditional multi-sample anti-aliasing algorithm is you chop each pixel up into a four-by-four subpixel region and you compute all 16 of those subpixels and average. Okay. That works great, okay, that’s fine. But, “pht,” there was no way we could--

Mashey: But you couldn’t afford, yeah.

Gossett: --afford that, okay. So you have the foreground color, okay, and you can compute the coverage of the subpixels that you would normally have. Now, you know, you’d sort of like to have a count of the subpixels, so you’d sort of liked to have had four bits, but we only had three, but it turns out if you just do sort of a checkerboarding of those subpixels you get eight black squares, okay, and that’s enough, okay, that’s good enough. Okay. So we have the foreground pixel color and we have the coverage count, okay, but we don’t have the background color. So what do you do? Well, okay, so we already had-- we’d already paid for these line buffers, okay, and, you know, could only be three pixels high but it could be any number of pixels wide, okay. So if you just did the three-by-three thing and used like the-- well, basically what the algorithm was, you used the maximum and the minimum of the neighborhood, okay. Now, one of those is going to be more or less the pixel you’re on and the other one’s going to be the background, okay. Except you don’t actually want to use the maximum and the minimum. You use the pen max and the pen-- the next to maximum and next to minimum to cover sort of corner cases, okay. That’s good enough, and then if you add the max and the min and then, you know, one of which is the foreground, one of which is the background, and then subtract the foreground, which you know, because we’re sitting on it, then you’re left with the background, okay, and then once you have the background and the foreground, you use the coverage value to interpolate between the two and you’re done. That’s anti-aliasing, okay, but the eight neighborhood, you know, the square neighborhood with four corners and four vertical and horizontal, doesn’t work, because the corners are square root of two further away than the other guys, okay, and that mattered. You could see it.
So I came up with this thing where you do a three-by-five pixel neighborhood, okay, and you checkerboard that, okay, which leaves you with seven pixels, so there’s a pixel on the center, and then there’s six pixels in sort of a hexagonal arrangement. Okay.
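Both checkerboards mentioned in this turn are easy to enumerate, which makes the counts (eight subpixels, seven pixels) easy to check. The sketch below is only an illustration: the helper name `checkerboard` and the choice of which parity to keep are assumptions, picked so that the center pixel of the three-by-five block is one of the kept squares.

```python
def checkerboard(rows, cols, keep_parity):
    """Return the (row, col) cells of a rows x cols grid whose parity matches."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if (r + c) % 2 == keep_parity]


# 4x4 subpixel grid: half of the 16 squares are sampled for coverage.
subpixels = checkerboard(4, 4, 0)

# 3x5 pixel block: keep the squares with the same parity as the centre (1, 2).
centre = (1, 2)
kept = checkerboard(3, 5, (centre[0] + centre[1]) % 2)

# Offsets (dy, dx) of the six neighbours relative to the centre pixel.
neighbours = [(r - centre[0], c - centre[1]) for r, c in kept if (r, c) != centre]

print(len(subpixels), len(kept), neighbours)
```

The six offsets come out as two pixels to the left and right on the same line and one pixel diagonally on the lines above and below, i.e. a hexagon that is stretched horizontally.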

Mashey: That’s the hexagonal thing, yeah, right, okay.

Gossett: Okay. Now, it’s squished, but it turns out that doesn’t matter. It’s an affine transform away from being, you know, the right distance, equal distances, and you can’t see that, okay. So you take that six neighborhood and take the penultimate maximum and penultimate minimum, do the same thing. You know, add them together, subtract the foreground and you’re left with the background, done, okay, and these two algorithms, the one for the dither correction filter and the one for the anti-aliasing, didn’t really play well together, but if you were doing the anti-aliasing thing, the dither didn’t matter, because you were on an edge, okay, and that overwhelmed whatever the dither pattern was, and if the coverage was full, okay, then you weren’t doing anti-aliasing, so just do the dither thing. So we just switched which one you were doing based on whether the coverage was zero or one, and the embellishment that Tim Van Hook came up with, which was very clever, I never would’ve thought of it, was to add transparency, okay. So this works fine if everything is opaque, but what if you have, like, a, you know, the glass on a cockpit of a fighter or whatever, right, and turns out you can just let the coverage value wrap, okay. So you just keep adding it on top of it and to do the mod-8, you know, just throw away the higher bits that are falling off the edge, and that does exactly the right thing, so that was that.
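Putting the pieces of this description together for one color channel gives roughly the following. It is a sketch of the idea as told in the interview, not the hardware: the function names, the clamp, the floating-point blend and the convention that a coverage count runs from 0 to 8 are all assumptions (how a full count is encoded in three bits is not explained here).

```python
def resolve_pixel(foreground, coverage, neighbours, full=8):
    """Blend an edge pixel with a background guessed from six neighbours.

    foreground: stored colour of this pixel (one channel, 0..255)
    coverage:   how many of the `full` sampled subpixels the foreground covers
    neighbours: the six surrounding values in the hexagonal pattern
    """
    if coverage >= full:
        return foreground                    # fully covered: nothing to blend
    ordered = sorted(neighbours)
    next_to_min = ordered[1]                 # penultimate minimum
    next_to_max = ordered[-2]                # penultimate maximum
    # One of the two is roughly the foreground, the other the background.
    background = next_to_max + next_to_min - foreground
    background = max(0, min(255, background))
    return background + (foreground - background) * coverage / full


def accumulate_coverage(stored, incoming):
    """Let a 3-bit coverage value wrap: add, then keep only the low three bits."""
    return (stored + incoming) & 0x7


# Bright foreground (200) half-covering a pixel over a dark background (40);
# the extreme values 255 and 0 are outliers that the penultimate picks ignore.
print(resolve_pixel(200, 4, [255, 200, 200, 40, 40, 0]))
print(accumulate_coverage(5, 6))
```

Using the next-to-extreme values means a single stray neighbor cannot drag the background estimate away, which matches the "corner cases" remark earlier in the transcript.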
 
Mashey: So this sounds like an example of lots of clever tricks to fit a really constrained design.

Gossett: Exactly. I mean, that’s exactly what it was. You know, this was all insane at some level, and, you know, if we didn’t have to do it, we wouldn’t have done it, and it was a dead-end, okay. So as soon as you could afford to do multi-sample anti-aliasing, everybody did that, right. I mean, you know, this other stuff is crazy, right. Some other crazy things about this. So we were under time pressure. You know, they wanted to ship by a particular Christmas, which meant many months before that particular Christmas, and we actually missed the original target, but they kind of knew that we probably weren’t going to hit it. But the next Christmas we were like damn well going to hit, okay. So the project wasn’t really staffed quick enough. So I was among the last people staffed onto it, and from then until the final, you know, tape-out, well, the chip coming back, you know, ready to go, was 18 months, which is phenomenal.

Mashey: Yeah, it’s insane. Yes, yeah.

Gossett: Okay. So we had the first chip, which was done very hastily. You know, there was no effort to make it optimal. It was like twice as large as it needed to be, and it had all these problems, you know, the dither issue and some other things. There was a bug that caused it to hang after a few minutes. It’s one of these things that you can’t simulate, right, because, you know, real-time is real and, you know, simulating out for a couple of minutes would’ve taken, you know, decades of simulation, even with an army of machines slaving away. So anyway, it was very wise to do that first chip because we caught all those bugs, and fixed them, okay. So second chip, you know, the natural instinct would’ve been to do as little as possible, okay. Not change hardly anything. But we knew we had to change a massive amount of stuff, and there’s just no way you can sort of half-change everything, okay. So we had to change everything, and, you know, the rest of SGI thought we were out of our freaking minds, but we did it, and, you know, it worked. There were a couple of things that didn’t work but they were nonessential, so we just declared them non-features and, you know, moved on. So that was amazing, and just, you know, there was, you know, there was disagreements and the occasional yelling and screaming, but it was all productive. You know, it was all-- everybody was rowing in the same direction.

Mashey: So how many people were, how many designers were in there?

Gossett: Right. So there were about 25 physical design people that were doing-- well, there was the team that did the MIPS-chip, which also did the physical design for the graphics chip, and there were about 25 of them. Then the rest of the team was maybe half. No, was a little less than half software and the rest hardware, so, you know, I don’t know, 14-ish, some number like that, 15, maybe, and, you know, it was sort of divvied up. There were the leads, you know, Tim Van Hook, Mary Jo Dougherty, myself, and then there were, you know, people who were designing individual units, and, you know, everybody was good, okay. Or great, actually. You know, everybody knew what they were doing. There was a unity of vision. You know, it was clear what we had to produce, and, you know, there were no arguments at that level. There were minor technical disagreements, but, you know, that’s always going to happen, so-- and amazingly, the company let us go. So there was this-- the VP of everything, I called him, Wei Yen, who was the grand poohbah, the guy running the operation, and he was a really amusing fellow. You know, I’d get called into his office every week and on alternating weeks he’d either chew my ass off or praise me, okay, and pretty much awesome, all right, and, you know, it was-- it got to be kind of amusing, you know, sort of a joke, but, you know, it was disturbing at times but on the other hand it all worked and it was fine, and the guy was phenomenal. He had this blackboard in his office that was divided up into tiny little squares. He was running 80 projects, 8-0, projects, okay, so he’d have people in for half an hour, okay, and he knew everything, okay. I mean, he was totally aware and got all the technical subtleties and, you know, you didn’t have to explain things to him. He’s a really amazing fellow. I’ve never seen anything quite like-- well, maybe one thing quite like that, but I’ll get to that, and anyway, it worked, it shipped, and woohoo. 
You know, it made the company tons of money.

Mashey: Tons of money. Yes.

Gossett: I think they ended up spinning MIPS off, at some point, and a large part of that valuation was because of the Nintendo money, okay. So that was great. Unfortunately, SGI did not do so well thereafter, so, you know, I was looking for something to do. I really should’ve left and gone to something else, but, you know, I was sort of a local hero, right, when I pulled this thing off, and, you know, was just seductive to stay there. So there were three divisions, three graphics divisions. There was AGD, the Advanced Graphics Division, the Desktop Systems Division, and I forget what they were called but basically the low-end division, okay, and, you know, originally, we were going to work on the low-end division. You know, makes sense, right, as that was kind of the space I was in. So I went off and designed a somewhat embellished variant of the Nintendo architecture, and anyway, they decided they didn’t want to do that. They knew what they were doing, “Go away,” okay. Fine. So the Desktop Systems Division had just finished shipping, on this horrible death march, the-- I think it was called the Octane Series of--
 