There isn't much in the rumors concerning the DMEs that appears relevant to the contents of that paper.
The paper described a scheme to make it easier for FPGA designers to use PCIe, since the protocol itself is complex and is affected by the complex software and hardware environment in a PC.
Why would PCIe even be involved for Durango, when it's all on one chip?
Nothing about the description of the DMEs indicates they are as flexible as an FPGA, and at least one rumor flat out says they are fixed-function. The rumored DME bandwidth is an order of magnitude higher than the best bandwidth reported in that paper, and the latencies the paper reports look very bad next to what an APU should be capable of.
edit:
Just to clarify something, I suspect from the information given so far that the DMEs are an elaboration of the DMA engines already present in GPUs. These are used for handling data transfers over the PCIe bus in GPUs, but they look to be useful for data movement even without the actual bus. Durango's rumored first two DMEs appear to be just that, and the remaining two with compression/decompression capability have extra hardware added to them. The limited data path that is shared amongst the DMEs and the video decode block makes me think they are hanging off of the low-bandwidth hub all AMD GPUs have. This hub exists for ease of adding hardware that can function without a direct feed into the high-bandwidth cache system.