Such a 32-bit integer multiplication can be constructed from four 16-bit integer multiplies and a series of adds. I hoped three would be enough, or that it would at least be possible to get the full 64-bit result with one VLIW instruction group (which should be possible if the adders are fast and wide enough). But it needs two VLIW bundles to get that:

Why the hell would that happen? I mean, isn't two slots enough for that? The only way I see that happening is if each slot performs 1/4 of the operation (8 bits)! Possibly utilizing the mantissa portion?
Code:
4  x: MULLO_INT  R3.x,  R2.x,  R2.x
   y: MULLO_INT  ____,  R2.x,  R2.x
   z: MULLO_INT  ____,  R2.x,  R2.x
   w: MULLO_INT  ____,  R2.x,  R2.x
5  x: MULHI_INT  R4.x,  R2.x,  R2.x
   y: MULHI_INT  ____,  R2.x,  R2.x
   z: MULHI_INT  ____,  R2.x,  R2.x
   w: MULHI_INT  ____,  R2.x,  R2.x
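
For reference, here is a rough sketch of the decomposition the quote talks about: a 32x32 multiply built from four 16x16 partial products plus adds. It is plain Python and only models the arithmetic, not how the hardware actually splits the work across the x/y/z/w slots. The function name `mul32_via_16` is made up for illustration, and it treats the operands as unsigned (a signed high half would need an extra correction step).

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul32_via_16(a, b):
    """Return (lo, hi): the low and high 32 bits of the unsigned product a * b.

    Hypothetical helper; only 16x16 multiplies and adds/shifts are used.
    """
    a &= MASK32
    b &= MASK32
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Four 16x16 -> 32-bit partial products.
    p0 = a_lo * b_lo   # contributes at bit 0
    p1 = a_lo * b_hi   # contributes at bit 16
    p2 = a_hi * b_lo   # contributes at bit 16
    p3 = a_hi * b_hi   # contributes at bit 32

    # Sum everything that lands in bits 16..31; the overflow is the carry
    # into the high word.
    mid = (p0 >> 16) + (p1 & MASK16) + (p2 & MASK16)

    lo = (p0 & MASK16) | ((mid & MASK16) << 16)
    hi = (p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16)) & MASK32
    return lo, hi

if __name__ == "__main__":
    lo, hi = mul32_via_16(0x12345678, 0x12345678)
    print(hex(lo), hex(hi))
```

Here `lo` plays the role of the low-half result and `hi` the high-half result, so the two values together give the full 64-bit product. Note that all four partial products feed into `hi` (through the carry), while `lo` only needs three of them.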