I don't know many of the specifics of fast14, but I know a bit about older dynamic logic.
In dynamic logic, you've got clocks on every single gate. A "gate" can be more complex than a NAND gate, but I don't think they build gates equivalent to more than a handful of NAND gates.
Each such gate can be seen (logically) as a small logic function followed by a latch. The latch is not stable, however. It stores the bit as a charge in a capacitor, like a DRAM cell, hence the name "dynamic logic". This means that the latch cannot hold the bit for more than one clock cycle, and this clock cycle must be short. In fact, IIRC (it was a long time ago), it might hold the bit for only half a clock cycle, so you'll need the next gate to run on the opposite phase (those are the multiple clocks they talk about).
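To make the "function plus leaky latch" picture a bit more concrete, here is a rough Python sketch of a single clocked gate. It is only a toy model under my own assumptions: the class name, the leak rate, the threshold and the choice of a NAND function are all made up for illustration and have nothing to do with the actual fast14 circuits.

```python
from dataclasses import dataclass


@dataclass
class DynamicGate:
    """Toy model of one clocked dynamic NAND gate (hypothetical, not fast14)."""

    leak_per_step: float = 0.2  # assumed fraction of charge lost per time step
    threshold: float = 0.5      # assumed level that still reads as "charged"
    charge: float = 0.0         # normalized charge on the storage node

    def precharge(self) -> None:
        # First clock phase: charge the node regardless of the inputs.
        self.charge = 1.0

    def evaluate(self, a: bool, b: bool) -> None:
        # Second clock phase: a series pull-down discharges the node
        # only when both inputs are high, which gives a NAND function.
        if a and b:
            self.charge = 0.0

    def leak(self, steps: int = 1) -> None:
        # Without a new precharge the stored charge decays, as in a DRAM cell.
        for _ in range(steps):
            self.charge *= 1.0 - self.leak_per_step

    def read(self) -> bool:
        # The stored bit is whatever the remaining charge still looks like.
        return self.charge > self.threshold


gate = DynamicGate()
gate.precharge()
gate.evaluate(True, False)
print(gate.read())   # value right after evaluation
gate.leak(10)
print(gate.read())   # value after waiting far too long without a refresh
```

The point of the sketch is only the last two lines: the result is valid right after evaluation, but if the next gate does not pick it up soon enough, the charge has leaked away and the bit is lost.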
This logic is much faster than a design with a normal gate followed by a normal latch, and it is very suitable for extreme pipelining. Think free pipeline stages for every couple of gates.
The drawback is that the capacitor in the latch will be charged and discharged every clock cycle, even if the data doesn't change. And this will of course increase the power needed.
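A crude way to see the power difference is to count how often a node has to be recharged. The two functions below are a hypothetical toy model, not a real power estimate: a static gate is assumed to draw charge only on a 0 -> 1 output transition, while a dynamic node is assumed to need a fresh precharge after every cycle in which its pull-down conducted. The clock load itself, which switches every cycle regardless, is ignored here.

```python
def static_charge_events(outputs):
    """Count 0 -> 1 transitions of a static gate output (assumed to start low)."""
    events = 0
    previous = 0
    for value in outputs:
        if value == 1 and previous == 0:
            events += 1
        previous = value
    return events


def dynamic_charge_events(pulldown_active):
    """Count cycles in which the dynamic node was discharged and must be recharged."""
    return sum(1 for active in pulldown_active if active)


# Eight cycles of unchanging data that keeps the pull-down conducting.
constant_data = [1] * 8
print(static_charge_events(constant_data))   # 1
print(dynamic_charge_events(constant_data))  # 8
```

With unchanging data the static gate pays once, while the dynamic node pays in every single cycle.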
And then of course it has (historically) been harder to design for if you want to make use of that extreme pipelining: you have to keep the waves of data in sync and hide the higher latencies (counted in clock cycles). Exact timing between the clocks of two consecutive gates has also been critical, which made it hard.
The exact timing has been made easier by fast14's quad clock. But I guess you still need to do some brain athletics to make good use of the extreme pipelining. (The tools might help with some of it though.)
This way of designing should work well if you have lots of independent, identical parallelism (PS or VS). Think hyperthreading.
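Here is a small sketch of why independent work hides the latency of a deep pipeline. Everything in it is an assumption for illustration (the function name, the round-robin issue policy, the depth of 8 stages): each thread runs a chain of dependent operations, one operation can be issued per cycle, and a result only becomes available `depth` cycles after issue.

```python
def cycles_needed(depth, ops_per_thread, threads):
    """Cycles until all threads have finished their chains of dependent ops."""
    ready_at = [0] * threads            # earliest cycle each thread may issue again
    remaining = [ops_per_thread] * threads
    cycle = 0
    finish = 0
    next_thread = 0                     # round-robin starting point
    while any(remaining):
        for offset in range(threads):
            t = (next_thread + offset) % threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + depth   # result leaves the pipeline here
                finish = max(finish, ready_at[t])
                next_thread = (t + 1) % threads
                break                         # at most one issue per cycle
        cycle += 1
    return finish


print(cycles_needed(depth=8, ops_per_thread=4, threads=1))  # 32
print(cycles_needed(depth=8, ops_per_thread=4, threads=8))  # 39
```

In this model a single thread gets 4 operations done in 32 cycles, because the pipeline sits mostly empty while it waits for its own results. Eight interleaved threads get 32 operations done in 39 cycles, since there is always some other thread ready to fill the next stage.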
I'll see if I can find some (transistor level) drawings of the actual gates.