Old 22-May-2005, 23:40   #1
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default Asynchronous Hardware Communication

Guys, quick question -

Communication between components running at different clocks/speeds is taken for granted in hardware today. But how exactly does it work? For example, if the ROPs on a GPU are all busy, are the shader pipelines stalled, or is there some sort of intermediate buffer? This is just one example, but is that how it works in general?
Old 23-May-2005, 00:22   #2
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,956
Default

Generally speaking, elements have some sort of FIFOs/buffers in between. Should those fill or starve, the elements on either side will slow down.
__________________
Expand. Accelerate. Dominate.
Old 23-May-2005, 00:38   #3
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default

Thanks Dave.
Old 23-May-2005, 00:39   #4
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,768
Default

I just found out fairly recently that the ROPs on NV40 are running at memory frequency. Is it the same on Radeons?
Old 23-May-2005, 05:13   #5
3dcgi
Senior Member
 
Join Date: Feb 2002
Posts: 2,021
Default

To expand on Dave's answer: incorrectly sizing FIFOs can lead to big performance problems or a lot of wasted transistors. A big goal of performance testing is to figure out ideal FIFO sizes.

By this I mean a FIFO that is too small will result in a lot of stalls. In contrast, using a 32-deep FIFO when only 8 locations are typically needed is a waste of transistors.
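
The sizing trade-off can be tried out with a toy model. The sketch below is only an illustration, not how any vendor does performance testing: the write and read probabilities, the cycle count, and the name `count_stalls` are all made-up assumptions. A producer tries to write on some cycles, a consumer drains on some cycles, and a write that finds the FIFO full is counted as a stall.

```python
import random

def count_stalls(depth, cycles=10_000, p_write=0.5, p_read=0.6, seed=0):
    """Count writes that hit a full FIFO of the given depth (toy model)."""
    rng = random.Random(seed)
    occupancy = 0
    stalls = 0
    for _ in range(cycles):
        # Draw both decisions every cycle so every depth sees identical traffic.
        wants_write = rng.random() < p_write
        can_read = rng.random() < p_read
        # The consumer drains first, then the producer tries to write.
        if can_read and occupancy > 0:
            occupancy -= 1
        if wants_write:
            if occupancy < depth:
                occupancy += 1
            else:
                stalls += 1  # FIFO full: the source block has to wait
    return stalls

stalls_by_depth = {d: count_stalls(d) for d in (1, 2, 4, 8, 16, 32)}
for d, s in stalls_by_depth.items():
    print(f"depth {d:2d}: {s} stalled writes")
```

With the same traffic fed to every depth, the stall count can only stay flat or fall as the FIFO gets deeper. The depth at which it stops improving is roughly the point past which extra entries are wasted transistors.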
Old 24-May-2005, 19:37   #6
BobbleHead
Junior Member
 
Join Date: Sep 2002
Posts: 55
Default

One thing to note is that most of what you've talked about here is not really asynchronous. Rather this is just communication between blocks in the graphics pipeline that take differing numbers of cycles to complete their work.

Most of the time all of the parts of that pipeline run on the same clock ("engine" or "core" clock), so the data passing between them is still synchronous. The FIFOs are there to even out the workflow between blocks. They let the source block continue on with computations and generating data for the next block, until the FIFO is nearly full, and then they provide the destination block with a steady stream of data to work on.

Asynchronous communication is trickier. Depending on how much data needs to be sent across clock boundaries, you might put all of the actual data into a FIFO that is written with one clock and read with another, while having a parallel set of gray-coded signals communicating that data was added or removed. There is extra delay in this while you make sure everything is synchronized. So most of the time such async boundaries are only used when necessary. You must have one somewhere between the block running on AGP/PCIE clock and the engine clock, and you must have one somewhere in the path between engine clock and external memory clock.
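
For anyone who has not met Gray code before, here is a small software sketch of the pointer encoding. It assumes the common textbook arrangement of an 8-entry FIFO whose read and write pointers carry one extra wrap bit. That arrangement and every name here (`ADDR_BITS`, `is_full`, and so on) are illustrative assumptions, not details of any particular chip.

```python
def bin_to_gray(n):
    """Convert a binary count to its Gray-code representation."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Convert a Gray-code value back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

ADDR_BITS = 3                  # assumed 8-entry FIFO
PTR_BITS = ADDR_BITS + 1       # one extra bit to tell full from empty
PTR_MASK = (1 << PTR_BITS) - 1

def bits_changed(a, b):
    """Number of bit positions that differ between two pointer values."""
    return bin((a ^ b) & PTR_MASK).count("1")

# Bits that flip on each increment, including the wrap from 15 back to 0.
binary_flips = [bits_changed(i, (i + 1) & PTR_MASK) for i in range(1 << PTR_BITS)]
gray_flips = [bits_changed(bin_to_gray(i), bin_to_gray((i + 1) & PTR_MASK))
              for i in range(1 << PTR_BITS)]

def is_empty(wptr_gray, rptr_gray):
    """Empty when both pointers are equal."""
    return wptr_gray == rptr_gray

def is_full(wptr_gray, rptr_gray):
    """Full when the writer is exactly one FIFO depth ahead of the reader."""
    diff = (gray_to_bin(wptr_gray) - gray_to_bin(rptr_gray)) & PTR_MASK
    return diff == (1 << ADDR_BITS)

print("binary flips per increment:", binary_flips)
print("gray flips per increment:  ", gray_flips)
```

The point of the encoding is that every increment changes exactly one bit, while a plain binary counter changes up to four bits at once in this 4-bit example.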
Old 24-May-2005, 20:47   #7
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default

Quote:
Originally Posted by BobbleHead
You must have one somewhere between the block running on AGP/PCIE clock and the engine clock, and you must have one somewhere in the path between engine clock and external memory clock.
And between the core and ROPs/Memory Controller as well?
Old 24-May-2005, 22:42   #8
Guden Oden
Senior Member
 
Join Date: Dec 2003
Posts: 6,201
Default

Quote:
Originally Posted by BobbleHead
you might put all of the actual data into a FIFO that is written with one clock and read with another, while having a parallel set of gray-coded signals communicating that data was added or removed. There is extra delay in this while you make sure everything is synchronized.
Why not just have a FIFO that signals the supply end when it is full to not write any more to it? Why this other elaborate stuff, I can see no reason why it should be needed.
__________________
Top one reason why capital punishment is immoral and wrong:
You can release an innocently convicted man from jail,
but you cannot release an innocently convicted man from death.
Old 24-May-2005, 23:36   #9
t0y
Member
 
Join Date: Mar 2004
Location: Portugal
Posts: 149
Default

Quote:
Originally Posted by Guden Oden
Quote:
Originally Posted by BobbleHead
you might put all of the actual data into a FIFO that is written with one clock and read with another, while having a parallel set of gray-coded signals communicating that data was added or removed. There is extra delay in this while you make sure everything is synchronized.
Why not just have a FIFO that signals the supply end when it is full to not write any more to it? Why this other elaborate stuff, I can see no reason why it should be needed.
Try implementing a multi-threaded FIFO in C/C++ and you'll understand why...
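
As a rough software analogue, here is a minimal bounded FIFO shared by one producer thread and one consumer thread. It is written in Python rather than C/C++ to keep it short, and the class name `BoundedFifo` is made up for this sketch. Notice how much of it is locking and wake-up signalling rather than data storage. Across two hardware clock domains there is no shared lock to lean on, and the synchronized gray-coded pointers do that job instead.

```python
import threading
from collections import deque

class BoundedFifo:
    """Bounded FIFO whose put/get block when it is full/empty."""

    def __init__(self, depth):
        self._items = deque()
        self._depth = depth
        self._lock = threading.Lock()
        # Two conditions on one lock: one wakes writers, one wakes readers.
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._depth:
                self._not_full.wait()    # FIFO full: the writer stalls here
            self._items.append(item)
            self._not_empty.notify()     # tell a waiting reader there is data

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()   # FIFO empty: the reader starves here
            item = self._items.popleft()
            self._not_full.notify()      # tell a waiting writer there is room
            return item

fifo = BoundedFifo(depth=4)
received = []

def producer():
    for i in range(100):
        fifo.put(i)

def consumer():
    for _ in range(100):
        received.append(fifo.get())

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received[:8])
```

Take away the lock and the two conditions and the "full" check becomes a race between the two threads, which is the software version of the problem the pointer synchronization solves.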
Old 25-May-2005, 05:49   #10
BobbleHead
Junior Member
 
Join Date: Sep 2002
Posts: 55
Default

Quote:
Originally Posted by trinibwoy
Quote:
Originally Posted by BobbleHead
You must have one somewhere between the block running on AGP/PCIE clock and the engine clock, and you must have one somewhere in the path between engine clock and external memory clock.
And between the core and ROPs/Memory Controller as well?
I don't know the details of that design. But that would fall under "between engine clock and external memory clock." Given the extra time cost of the asynchronous crossing, they probably do it as little as possible. So if the ROPs are running on memory clock there is one place the data crosses from engine to memory clock. There is some piece of the chip running at memory clock (or some nice 2^n or 1/2^n multiple), and once the data makes it there it moves along a synchronous path to the external memory. Again there are bound to be some synchronous FIFOs in that path, just like there are between parts of the graphics pipeline.

Quote:
Originally Posted by Guden Oden
Quote:
Originally Posted by BobbleHead
you might put all of the actual data into a FIFO that is written with one clock and read with another, while having a parallel set of gray-coded signals communicating that data was added or removed. There is extra delay in this while you make sure everything is synchronized.
Why not just have a FIFO that signals the supply end when it is full to not write any more to it? Why this other elaborate stuff, I can see no reason why it should be needed. :)
All that "elaborate stuff" is the method to properly signal that it is full. :)
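
To make the hazard concrete, here is a toy model of what the other clock domain can observe if it samples a pointer while it is changing. The model is a deliberate simplification and an assumption of this sketch: each bit that is in the middle of toggling may independently be read as its old or its new value. The helper names are made up.

```python
from itertools import product

def bin_to_gray(n):
    return n ^ (n >> 1)

def gray_to_bin(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def possible_samples(old, new, width):
    """All values a sampler might see while a register moves from old to new."""
    changing = [b for b in range(width) if ((old ^ new) >> b) & 1]
    seen = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit   # this bit was caught after it toggled
        seen.add(value)
    return sorted(seen)

# Binary pointer stepping from 7 (0111) to 8 (1000): all four bits toggle.
binary_seen = possible_samples(7, 8, 4)

# The same step in Gray code: only one bit toggles.
gray_seen = sorted(gray_to_bin(g)
                   for g in possible_samples(bin_to_gray(7), bin_to_gray(8), 4))

print("binary pointer may read as:", binary_seen)
print("gray pointer may read as:  ", gray_seen)
```

Under this model the binary pointer can momentarily read as any value from 0 to 15, so a full or empty flag derived from it could be badly wrong. The gray-coded pointer reads as either 7 or 8, so the worst case is a flag that is one entry stale, which only costs a cycle or two of extra delay.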
Old 25-May-2005, 10:37   #11
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Default

Quote:
Originally Posted by Guden Oden
Why not just have a FIFO that signals the supply end when it is full to not write any more to it? Why this other elaborate stuff, I can see no reason why it should be needed.
Well, the only issue with doing this is: when do you tell the threads to start up again? In what order?
