BobbleHead
Newcomer
Do we? What's your source for that? AFAIK DDR3 handles one outstanding read or write request at a time, in the order each command is received. We do know DDR3 can handle a lot of accesses in flight though...
Dude, you're just annoying the shit out of me so I'm not gonna bother to reply to all of your drivel other than to say "typically" does not mean "cannot do more than". And DDR3 can still only SERVICE one request at a time. End of fucking story.
You believe that anything that can only service one request at a time cannot have multiple requests in flight? That's just completely incorrect. The whole point of newer high-speed buses with higher latencies is to push more data through a single set of wires (one request at a time) while keeping a pipeline of requests queued up. That's the only way to keep those wires filled with data as much as possible. Just because it processes the requests in order and sequentially does not mean there aren't multiple commands outstanding at a time.
Perhaps that is too generic a response for you. Let's look at DDR3 specifically.
Reading data from a DDR3 DRAM is a two-step process. First a command is sent with a row address and bank. This tells the DRAM to open a particular page in that bank. Then you have to wait a number of cycles while the DRAM does the page open (tRCD in the DDR3 spec). Once the page is open, a command is sent with a column address and bank. This tells the DRAM to read the data at the provided column address on the previously opened page (row) and return it. Then you wait the CAS latency (CL in the DDR3 spec) before the DRAM puts the read data on the data bus and sends it back.
In DDR3-1600 the row-to-column delay (page open delay) is 11 cycles and the CAS latency is also 11 cycles. Since any read is a burst of 8, completing the read data takes an additional 4 cycles (two transfers per clock). If DDR3 couldn't have multiple accesses in flight then you would be limited to one 8-transfer burst of data every 26 cycles. The *only* way you can keep the data bus full is by issuing multiple commands to the DRAM while it is still processing the previous commands.
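The arithmetic above is easy to sanity-check. A back-of-the-envelope sketch using the same DDR3-1600 figures (11-cycle page open, 11-cycle CAS latency, 4 clocks of data per burst of 8):

```python
# Back-of-the-envelope DDR3-1600 read timing, one read at a time.
# Numbers are the ones from the post: tRCD = 11, CL = 11, burst of 8.
tRCD = 11          # row-to-column delay (page open), in memory clocks
CL = 11            # CAS latency, in memory clocks
burst_clocks = 4   # burst of 8 on a DDR bus: 8 transfers / 2 per clock

# If only one access could be in flight, each burst pays the full latency:
serial_read = tRCD + CL + burst_clocks
print(serial_read)                 # 26 clocks per burst
print(burst_clocks / serial_read)  # data bus busy only ~15% of the time
```

With no overlap the data bus sits idle roughly 85% of the time, which is exactly why controllers pipeline requests.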
If you ignore the page open cost and only consider the CAS (column command) as the actual read, then a DDR3 interface has to have 3 reads outstanding to cover those 11 cycles of delay. Keeping the read data bus fully occupied requires a constant stream of CAS commands, one every 4 cycles. Include the page open cost and you need at least 6 reads outstanding (in one form or another) to keep DDR3 going at peak rate.
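Both of those outstanding-read counts fall out of dividing the latency by the CAS issue interval. A minimal sketch, using the same numbers:

```python
import math

tRCD, CL, burst_clocks = 11, 11, 4  # DDR3-1600 figures from the post

# A CAS must issue every burst_clocks to keep the data bus full, so the
# number of reads in flight is the latency divided by that issue interval.
cas_only = math.ceil(CL / burst_clocks)                  # CAS latency alone
with_page_open = math.ceil((tRCD + CL) / burst_clocks)   # include page open
print(cas_only, with_page_open)  # 3 6
```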
Now let's put those row/page commands back into the mix as well, since they are a required portion of any read. We know we have to keep issuing CAS commands every 4 cycles to keep the data bus filled, but that leaves 3 idle cycles on the command bus. Any good memory controller utilizes those cycles to send out the row commands that open up a page for future CAS commands to read from. When that happens you have the DRAM doing multiple things at the same time. While it is reading data for address A from bank 0, it is also opening row B of bank 1 and maybe even opening row C of bank 2. If you've scheduled things right, then by the time you're done with your reads for A you can immediately issue your reads for B, because you've used the A reads to hide the page open cost for B. Then while the reads for B are being issued you also send out a close-page command for A (you're done with it at this point), and throw in a page open for address D of bank 3. Not only are there multiple reads outstanding, there are different aspects of those reads going on simultaneously!
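The schedule described above can be sketched as a toy command-bus timeline. This is illustrative only (the bank/address labels and exact clock slots are made up, not from any real controller), but it respects the 11-cycle page open and the CAS-every-4-clocks cadence:

```python
# Toy command-bus timeline: a CAS every 4 clocks keeps the data bus full,
# and the idle command slots in between carry ACT (page open) and PRE
# (page close) commands for other banks. Slots are hypothetical.
tRCD = 11
timeline = {
    0:  "CAS A (bank 0)",
    1:  "ACT B (bank 1)",   # open B's page while A's reads stream out
    4:  "CAS A (bank 0)",
    5:  "ACT C (bank 2)",   # get a third bank's page open started too
    8:  "CAS A (bank 0)",
    12: "CAS B (bank 1)",   # tRCD for B (opened at clk 1) is satisfied
    13: "PRE A (bank 0)",   # done with A, close its page
    14: "ACT D (bank 3)",
    16: "CAS B (bank 1)",
}
for clk in range(17):
    print(f"clk {clk:2d}: {timeline.get(clk, '(idle)')}")
```

Note the first CAS to B lands exactly tRCD clocks after B's ACT, and the data bus never goes idle: a CAS issues every 4 clocks throughout.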
When we get to DDR4 it becomes even more complicated. In DDR3 and earlier, the minimum CAS-to-CAS delay (tCCD) for same-bank accesses has always been less than or equal to the burst length. You could keep the data bus full of read data by issuing a steady stream of column addresses that are all in the same row on a single bank. In DDR4 that is no longer true. As interface speed ratchets up, tCCD has increased to be bigger than the burst length. If you want to keep the data bus full of read data you now have to keep at least 2 banks open and interleave individual reads between the two. Combine that with the higher interface speed and longer latencies (in clock cycles) and you have even more reads outstanding in various forms.
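The DDR4 point reduces to the same kind of division. A sketch with illustrative numbers (a tCCD of 6 clocks is an assumption for the example, not a quote from the spec; the burst of 8 still takes 4 clocks of data):

```python
import math

# DDR4 case: when the minimum CAS-to-CAS delay to the same bank exceeds
# the burst length in clocks, one open row can't saturate the data bus.
burst_clocks = 4   # burst of 8, two transfers per clock
tCCD = 6           # illustrative same-bank CAS-to-CAS minimum, in clocks

banks_needed = math.ceil(tCCD / burst_clocks)
print(banks_needed)  # 2: interleave reads across two open banks
```

With tCCD <= burst length (the DDR3 case) this comes out to 1 and a single open row suffices; once tCCD exceeds the burst length you need a second open bank to fill the gap.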