Starting GPU Core design using VHDL


Hi,

I've started learning VHDL in order to design my own graphics core. I've also studied the block diagrams of several GPU architectures from AMD and NVIDIA, and I've begun a computer graphics course, so I'm now familiar with the mathematical (matrix) operations involved in the core's work.

I decided to start with the ALU (stream processor) design and the memory controller design, so I can test simple arithmetic/logic operations by fetching data from the DDR2 memory on the Virtex-5 ML506 board and processing it with the ALU.

My question now: are there any useful resources that could help me design the ALU and the memory controller in VHDL? By the way, a floating-point unit is an essential part of the design, and I couldn't find useful resources on designing one in VHDL.

Also, I'd appreciate your kind advice on a good starting point for the GPU core design!

Thanks in advance.
 
Don't worry about which language to use. Or about floating point or DDR2 memory controllers.

If you've never done this before, first spend a lot of time just learning how digital design really works. It's conceptually simple, but it takes a long time to become good at it. So start out with flashing LEDs, small state machines, reading and writing data to and from a static memory, etc. Don't forget to actually synthesize your code for the FPGA too: getting it to work in simulation doesn't prove a thing. There are tons of constructs that are not synthesizable, and beginners have a tendency to run into a lot of them. ;)
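To make the "start small" advice concrete, here is roughly the scale of a sensible first synthesizable design: a minimal LED blinker sketch in VHDL (the clock frequency and port names are my assumptions; match them to your board's constraints file).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity led_blinker is
  generic (
    CLK_HZ : natural := 100_000_000  -- assumed 100 MHz board clock
  );
  port (
    clk : in  std_logic;
    led : out std_logic
  );
end entity;

architecture rtl of led_blinker is
  signal count : unsigned(26 downto 0) := (others => '0');
  signal state : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if count = CLK_HZ / 2 - 1 then
        count <= (others => '0');
        state <= not state;  -- toggle every half second: 1 Hz blink
      else
        count <= count + 1;
      end if;
    end if;
  end process;
  led <= state;
end architecture;
```

Getting something this small through synthesis, place-and-route, and onto a real pin already exercises the whole tool flow.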
 
@rpg:

The OpenSPARC T2 (see the OpenSPARC Internals book) has a lot of resources on its website, since it's an open-source core. I'm interested in this chip because its architecture is close to Intel's Larrabee. What's your opinion about it, guys? Would it be a good start, and which parts of the design should I start with?

@mmendez:

Thanks for sharing your opinion. I've seen the Open Graphics Project before, but there are no resources on the internal design.

@silent_guy:

Thanks for your valuable advice. What resources would you recommend for digital design topics? By the way, I've worked with microcontrollers and DSPs before, but this is my first time working with FPGAs and a hardware description language.
 
Thanks for your valuable advice. What resources would you recommend for digital design topics? By the way, I've worked with microcontrollers and DSPs before, but this is my first time working with FPGAs and a hardware description language.

Programming experience with microcontrollers and DSPs won't help you much: they still execute code linearly, even if it's assembly code.

You need to get into the mindset of lots and lots of things happening in parallel. That takes time and practice. I can't really point you to resources: in their day, the comp.lang.verilog and comp.lang.vhdl groups were pretty decent, but I haven't been there in years...
 
Thank you silent_guy.

Guys, any opinions on a good memory controller design to start implementing on an FPGA in VHDL?
 
I'd go with the one that usually comes with the toolkit. I'm not sure, but I think modern high-end FPGAs also have hard memory controllers in them. If so, that will make your life a lot simpler.
 
Guys, any opinions on a good memory controller design to start implementing on an FPGA in VHDL?
My recommendation is to ignore the MC and simply use the RAMs that are inside the FPGA. The chance that you'll get to the point where you say "I'm done with everything and now I'm really constrained by the lack of external memory" is very small.

Even a simple SDR memory controller takes weeks of full-time design effort by someone who knows what he's doing.
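To illustrate the point about using the RAMs inside the FPGA: block RAM needs no controller at all and can be inferred straight from VHDL. A minimal sketch (the data width and depth are arbitrary assumptions):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity simple_bram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(9 downto 0);    -- 1K words, assumed depth
    din  : in  std_logic_vector(31 downto 0);
    dout : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of simple_bram is
  type ram_t is array (0 to 1023) of std_logic_vector(31 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Synchronous read: this registered output is what lets the
      -- synthesis tool map the array onto a block RAM primitive.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture;
```

Single-cycle, deterministic access like this is far easier to design around than a DDR2 controller with refresh, bank timing, and variable latency.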

Try to detail what you really want to achieve: learn about digital design? Learn about GPU architecture in general? Design a shader core? Learn about memory controllers?

Each of those are already beyond the scope of a typical hobby project.

Did you look at opencores.org? You may be able to find some things there.
 
I also evaluated this idea a few times, but I always came to the conclusion that it would lead nowhere as a one-person project. It's the MMORPG of chip design ;)

If I were you, I would design a simple integer SIMT stream-processing architecture that can hold a few threads in flight, has access only to a local register file, and supports only static branching. Floating point or dynamic branching would be much more work; you could add these later if you're still interested.
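As a rough illustration of "a few threads in flight" (my sketch, not anything proposed in this thread): a barrel-style scheduler that issues a different thread each cycle hides ALU pipeline latency without any hazard logic, because no thread ever has two instructions in flight at once.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Barrel scheduler sketch: 4 hardware threads, each with its own PC.
-- Issuing threads in strict round-robin order means a 4-stage ALU
-- pipeline never sees back-to-back instructions from the same thread.
entity barrel_sched is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;
    thread_id : out unsigned(1 downto 0);
    pc        : out unsigned(11 downto 0)   -- assumed 4K-instruction space
  );
end entity;

architecture rtl of barrel_sched is
  type pc_file_t is array (0 to 3) of unsigned(11 downto 0);
  signal pcs : pc_file_t := (others => (others => '0'));
  signal tid : unsigned(1 downto 0) := "00";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        tid <= "00";
      else
        -- Sequential fetch; a static branch would mux a target in here.
        pcs(to_integer(tid)) <= pcs(to_integer(tid)) + 1;
        tid <= tid + 1;  -- wraps 3 -> 0: strict round-robin issue
      end if;
    end if;
  end process;
  thread_id <= tid;
  pc        <= pcs(to_integer(tid));
end architecture;
```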

An entire GPU would also require a triangle rasterizer that does generic 4D clipping in homogeneous coordinates. This is no easy task, even without guard-band clipping; the math behind it is quite challenging at first.

And if you want to do this in your spare time, it's important to have some motivating results after a month or two of work at most.
 
Thanks silent_guy and Novum.

Actually, it's neither a one-person project nor a hobby one ... it's a graduation project :)

Our team consists of 3 students, and we have about 6 months to finish it.
Did you look at opencores.org? You may be able to find some things there.
Yes I checked it and it helped me with the design of a simple FPU.

By the way, we could use the OpenSPARC T2 architecture in our design: we would need to understand the architecture very well, then start modifying it for graphics processing by increasing the number of FGUs (Floating-point and Graphics Units) and changing other parts to improve vector processing performance. What's your opinion?

Our problem now is that we don't know where to start! We have a lot of resources and architectural details for the OpenSPARC T2, but how to start is the problem facing us now.

If I were you, I would design a simple integer SIMT stream-processing architecture that can hold a few threads in flight, has access only to a local register file, and supports only static branching. Floating point or dynamic branching would be much more work; you could add these later if you're still interested.
The resources available to us now are some cores at opencores.org and the complete architecture of the OpenSPARC, which I'm actually interested in. Would this architecture be suitable for graphics processing, so that I can start studying it and modifying it as I mentioned before?

Any additional resources will be much appreciated, but our most challenging issue right now is the "Start" :???:
 
You're still very non-specific about the goals you want to achieve.

What should your project do to qualify as a success? What's the scope of the project?
- vertex transformations?
- triangle clipping?
- triangle rasterization?
- pixel shading?
- texturing?
- ROP operations?
- VGA output?


Let's reduce the problem to just pixel shading.

Some questions you need to answer:
- What's your experience? Have you ever designed something more complex than the proverbial traffic light controller?
- Which tools are available: Which simulator? Which waveform viewer? Which synthesis tool? Forget about debugging these things on an FPGA; that's the very last step. You're going to need to build a verification test bench for this to simulate everything. Not necessarily hard or complicated for a simple one that does directed tests, unless you've never done it before. You'll also have to learn to become comfortable analyzing waveforms; it takes a while before that becomes second nature.
- how are you going to feed data in and out of the design? One way or the other, you need an interface with a PC.
- do you have any performance goals? Throw them away! ;) Process just 1 pixel at a time. Basically, design a very restricted and very slow CPU. You don't have time for anything else.
- the OpenSPARC T2 architecture is way too complex, even if you don't change it. You have no idea how hard it is to change even an existing simple design, and the T2 is not simple. Expanding a CPU with additional units will ripple through all stages of the design. Even something like the LEON SPARC is too complex: you don't need register windows, interrupt handling, debugging features, or even branch instructions for an old-style GPU pixel shader.
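As an illustration of how small "a very restricted and very slow CPU" can start out, here is a one-pixel-at-a-time integer ALU sketch (the opcode encoding and data widths are invented for illustration, not taken from any design in this thread):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One-pixel-at-a-time integer ALU: two operands in, one result out per cycle.
entity pixel_alu is
  port (
    clk    : in  std_logic;
    opcode : in  std_logic_vector(1 downto 0);  -- invented encoding
    a, b   : in  unsigned(7 downto 0);          -- e.g. one 8-bit color channel
    result : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of pixel_alu is
begin
  process (clk)
    variable sum : unsigned(8 downto 0);
  begin
    if rising_edge(clk) then
      case opcode is
        when "00" =>                             -- saturating add (additive blend)
          sum := ('0' & a) + ('0' & b);
          if sum(8) = '1' then
            result <= (others => '1');           -- clamp to 255
          else
            result <= sum(7 downto 0);
          end if;
        when "01" =>                             -- modulate: (a*b)/256
          result <= resize(shift_right(a * b, 8), 8);
        when "10" =>
          result <= a and b;
        when others =>
          result <= a;                           -- pass-through
      end case;
    end if;
  end process;
end architecture;
```

Wrapping something like this with a program counter, an instruction ROM, and a small register file already gives you a shader core to grow from.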

You're massively underestimating the work involved.

(BTW: if you really still want floating point, you could use the FP unit from the LEON3 CPU.)
 
I would say that before you attempt to create a hardware graphics component, you should try writing it in software first.

(I will agree with Silent guy - the task you have set yourself is enormous)
 
Now that was an important post; thanks again, silent_guy.
You're still very non-specific about the goals you want to achieve.
What should your project do to qualify as a success? What's the scope of the project?
This is the thesis of our project:
The graphics processing unit is a basic part of any modern computer. It comes either as a graphics card or as an on-board graphics processor integrated with the motherboard. It is needed for gaming, animation, and real-time video display. The basic functionality includes image translation, rotation, shadowing, etc. Older generations used full hardware implementations; current generations use software-programmable general-purpose arithmetic units (ALUs).

In this project, we design a simple computer graphics processor card which covers these aspects. The project will look into a simplified basic 2-D programmable graphics processor, graphics memory, fast interfaces to the system CPU, power distribution, PCB design, and the porting of some standard graphics libraries. A simplified, scaled-down FPGA implementation will be tested to demonstrate basic knowledge of the internal architecture of the processor.
We decided to start with the GPU Core design first then the PCB design.
- What's your experience? Have you ever designed something more complex than the proverbial traffic light controller?
We designed the EC-1 General Purpose Microprocessor in VHDL and simulated it using "FPGA Advantage" and "ModelSim", but did not implement it on an FPGA.
- Which tools are available: Which simulator? Which waveform viewer? Which synthesis tool?
Simulator: FPGA Advantage and ModelSim.
Waveform viewer: ModelSim and a Logic analyzer (available in the college's lab).
Synthesis tools: Leonardo Spectrum and Xilinx ISE tool Suite that comes with the Virtex-5 FPGA.
You're going to need to build a verification test bench for this to simulate everything.
Unfortunately, I've never done it before.
- how are you going to feed data in and out of the design? One way or the other, you need an interface with a PC.
PCI Express Interface.

And regarding your point about the complexity of the OpenSPARC T2 or the LEON3, I got it.
Actually, I need to divide the project into several parts and objectives. I found this on Wikipedia:
A CPU design project generally has these major tasks:

* Programmer-visible instruction set architecture, which can be implemented by a variety of microarchitectures
* Architectural study and performance modeling in ANSI C/C++ or SystemC
* High-level synthesis (HLS) or RTL (e.g. logic) implementation
* RTL Verification
* Circuit design of speed critical components (caches, registers, ALUs)
* Logic synthesis or logic-gate-level design
* Timing analysis to confirm that all logic and circuits will run at the specified operating frequency
We do not want to go deep into software issues; our task is digital logic design, followed by PCB/RF design.

Our project will be similar to the Manticore Project with nearly the same goals.

So, based on the project thesis and my answers to your questions: if you were me, how would you start and divide the project?


Thank you for helping me, much respect dear silent_guy.

+ Sorry for all these quotes!
 
Some random points:
* once again, the scope of the thesis is insane. Your thesis promoter should never have accepted it as such. :???: Note that the Manticore project didn't get far in realizing its final goal; it's already impressive that they got as far as getting a rasterizer, VGA output, and SDRAM controller to work. I suggest you start with their code and take it from there, adding one or more units from their list.
* I've never worked with PCIe myself, but I know enough about it to say it is very complex. If you're starting from scratch (that is, no RTL available yet), you're looking at at least a man-year (or 2 or 3?) of work by an experienced designer. If your FPGA has a ready-made PCIe interface block available, just getting it to work (and run at speed) to the point where your PC can control an LED on your FPGA will be a major accomplishment. I suggest you just drop it and use a UART or a simple parallel port. You're trying to prove that you're worthy of a degree; the performance of your hardware is irrelevant.
* At the lowest level, a testbench is the same as RTL code, except that you're allowed to use all the features of the language, such as reading and writing files, which you use to feed data into the design and to compare what comes out against reference data. How you generate these input and output files is a different topic; often a C model is used as the reference to compare against...
* So it looks like you're focusing on 2D only. How about you start with Manticore and bolt a very simple pixel shader onto it? That should be doable. Have a look at DX8-style shaders (no branches). In addition to Manticore's Z interpolator, add more attributes that can be interpolated and calculate something with them to determine the final pixel color. A Gouraud-shaded triangle will be a big success. Additional points if you support reading data from external memory, which will allow some basic texturing.
* Forget about porting standard libraries. It's not going to happen. ;)
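To make the testbench point concrete: at its simplest it is ordinary VHDL that is never synthesized and uses std.textio to read stimulus from one file and compare results against a reference file (typically produced by a C model). A sketch, with the file names, formats, and the DUT hookup all placeholders:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use std.textio.all;

entity tb_pixel is
end entity;

architecture sim of tb_pixel is
  signal clk  : std_logic := '0';
  signal din  : unsigned(7 downto 0) := (others => '0');
  signal dout : unsigned(7 downto 0);
begin
  clk <= not clk after 5 ns;  -- 100 MHz simulation clock

  -- Instantiate the design under test here, e.g.:
  -- dut : entity work.my_shader port map (clk => clk, d => din, q => dout);

  stim : process
    file stimulus : text open read_mode is "stimulus.txt";  -- placeholder name
    file expected : text open read_mode is "expected.txt";  -- from the C model
    variable l    : line;
    variable v, e : integer;
  begin
    while not endfile(stimulus) loop
      readline(stimulus, l);
      read(l, v);
      din <= to_unsigned(v, 8);
      wait until rising_edge(clk);   -- assumes a 1-cycle DUT latency
      readline(expected, l);
      read(l, e);
      assert dout = to_unsigned(e, 8)
        report "mismatch against reference model" severity error;
    end loop;
    report "test done" severity note;
    wait;
  end process;
end architecture;
```

A directed testbench like this, plus time spent in the waveform viewer, is where most of the debugging happens long before the FPGA is involved.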
 

Unfortunately, without the source for the OpenGL implementation, the extensibility of the public version of ATTILA is quite limited ... but the code may help in understanding how an R300-era GPU works (if you can understand the code; the rasterization code, for example, is quite messy, with two different implementations and a couple of extra variations).

As the new OpenGL and D3D9 implementations using a common software base are already working pretty well, I plan to push for at least releasing the source for the old OpenGL implementation that corresponds to the public version of the simulator.

In any case, don't waste bandwidth mailing me to request the source code, as a few people have already done in the past. If it were just me, most of the code would be public already (as I'm barely using it for any kind of research right now, and it's a waste). The other PhDs working on it have their own right to decide about the exclusivity of their code and parts of the common code. Try also to guess who 'the shadow' is that really controls the group. It may not be that obvious at first sight ;).
 
Thanks silent_guy, mhouston and RoOoBo for cooperation! :smile:

After about 4 months of progress with the help of some tips in this thread, I have some new questions:

I am now designing my own instruction set architecture. I decided to write a simple program in OpenGL and disassemble it to see which instructions are used, then implement them. So I wrote a simple program that generates a rectangle and translates it, then disassembled the .exe file. The instructions shown should be stored in system memory, right?!

I need to know what happens to these instructions after they are stored. They are executed by the CPU, for sure, but what happens next? Why shouldn't I copy these instructions as-is to the GPU memory, to be executed there by the GPU?

I think I can't copy the instructions as-is, since they target an x86-based architecture and I am designing my own ISA, so I probably need a translator (a driver), correct? If so, how would I design my driver? And how do I link the OpenGL functions to the driver to create the command stream for the GPU?
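One way to picture the driver's job (purely illustrative; the packet layout below is invented, not from any real GPU): the driver is CPU-side software that translates OpenGL calls into command packets in a format you define, and the GPU only has to decode that format. A minimal decoder sketch:

```vhdl
-- Invented packet layout: bits [31:24] opcode, bits [23:0] payload.
library ieee;
use ieee.std_logic_1164.all;

entity cmd_decoder is
  port (
    clk, valid : in  std_logic;
    cmd        : in  std_logic_vector(31 downto 0);
    draw_go    : out std_logic;                     -- kick the rasterizer
    reg_we     : out std_logic;                     -- write a state register
    payload    : out std_logic_vector(23 downto 0)
  );
end entity;

architecture rtl of cmd_decoder is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      draw_go <= '0';
      reg_we  <= '0';
      if valid = '1' then
        payload <= cmd(23 downto 0);
        case cmd(31 downto 24) is
          when x"01"  => reg_we  <= '1';  -- SET_STATE (invented opcode)
          when x"02"  => draw_go <= '1';  -- DRAW (invented opcode)
          when others => null;            -- NOP / unknown command
        end case;
      end if;
    end if;
  end process;
end architecture;
```

The x86 instructions from your disassembly never reach hardware like this; only the command stream the driver emits does.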
 
On the GPU side, you may want to start by aiming for early-DX9-level capability, which means hardware supporting PS2.0 (or later) as directly as possible. Again, ATTILA is probably a good place to look at what should be on the GPU side and what on the CPU side.
 
In any case don't waste bw mailing me to request the source code as a few people have already done in the past. If it was just me most of the code would be public already (as I'm barely using it for any kind of research right now and it's a waste). The other PhDs working on it have their own right to decide about the exclusivity of their code and parts of the common code.
Can master's-level student projects get access? Assuming he's a student and passionate enough about it, he could always try to transfer.

PS. Gallium3D seems to be making some headway.
 