Recent content by metafor

Cortex A15 fp64 peak?

IIRC, most processors in the x86 space do divides as part of their pipeline. I'm obviously not going to confirm anything, but divides in the ARM space are more of a separate iterative sequencer deal. Those take up a lot less area, but it does mean divide throughput isn't very high, hence the desire for dual...
- metafor
- Post #16
- Aug 30, 2012
- Forum: Mobile Graphics Architectures and IP
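
To make the "separate iterative sequencer" idea in the post above concrete, here is a minimal sketch of a radix-2 restoring divider, which produces one quotient bit per step. This is a generic textbook scheme, not a description of any ARM or x86 core; the function name `restoring_divide` and the 32-bit default width are assumptions made for illustration.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 32):
    """Unsigned radix-2 restoring division.

    Returns (quotient, remainder, steps). Hypothetical helper: it models
    a sequencer that retires one quotient bit per iteration.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not 0 <= dividend < (1 << width) or not 0 < divisor < (1 << width):
        raise ValueError("operands must fit in `width` unsigned bits")

    remainder = 0
    quotient = 0
    steps = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        steps += 1
    return quotient, remainder, steps


print(restoring_divide(100, 7))
```

In this model the unit is busy for all `width` steps of one divide, so a second divide has to wait unless there is a second unit. That is the throughput cost the post is pointing at.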
Kishonti GFXbench

Also the Galaxy Nexus LTE. Tegra/Exynos still used SDIO for their modem communications, which made them unsuitable to work with the MDM9xxx series. I would imagine Exynos 5xxx resolves this. OMAP has never had this problem.
- metafor
- Post #73
- Aug 30, 2012
- Forum: Mobile Software
Cortex A15 fp64 peak?

Divides are only available in scalar mode; there isn't a SIMD divide instruction. I'm not sure how permute would be handled with regard to forwarding to the arithmetic path. I realize that A15's SIMD/FP pipeline is far more integrated with the other pipes than it was in A9 but whether they're...
- metafor
- Post #14
- Aug 25, 2012
- Forum: Mobile Graphics Architectures and IP
Cortex A15 fp64 peak?

Actually, there are dual dividers in A15. I wouldn't expect it to be completely symmetrical, but the things I wouldn't expect to be symmetrical would be FP64 MUL and MAC. Perhaps VFP can't be dual-issued either. It could. And I agree public information doesn't say one or the other. But I'll...
- metafor
- Post #12
- Aug 23, 2012
- Forum: Mobile Graphics Architectures and IP
Cortex A15 fp64 peak?

Yes, but you can interleave ADD/SH and other non-MUL instructions in there. I realize that's not as flexible or as preferable. The public information is the block diagram I've seen in various places, for instance: http://pc.watch.impress.co.jp/video/pcw/docs/513/347/p11.pdf There are...
- metafor
- Post #10
- Aug 22, 2012
- Forum: Mobile Graphics Architectures and IP
Cortex A15 fp64 peak?

Even the partial product tree isn't increased all that much for a base-16 multiplier. But more importantly, the actual tree can be mostly reused and shared by 4xINT32 without much waste. FP adds a lot of waste, but not in the partial product tree itself; the shifter for alignment is the biggest...
- metafor
- Post #8
- Aug 22, 2012
- Forum: Mobile Graphics Architectures and IP
Cortex A15 fp64 peak?

Doubling the width of an integer multiplier doesn't cost quite that much extra area in a relatively sane implementation. Floating point, yes.
- metafor
- Post #5
- Aug 22, 2012
- Forum: Mobile Graphics Architectures and IP
Qualcomm Krait & MSM8960 @ AnandTech

Depends on the MAC implementation. Chained implementations don't have much of a size advantage other than the bloat that comes with instruction tracking. How prevalent is a MAC-type op in GPGPU applications compared to chains of MUL or ADD? Any matrix-based operations would obviously benefit...
- metafor
- Post #223
- Aug 16, 2012
- Forum: Mobile Devices and SoCs
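
As a rough illustration of why matrix-style work is the usual argument for a MAC op, the sketch below counts arithmetic instructions for a dot product issued either as one MAC per element or as a standalone MUL followed by a standalone ADD. It models instruction count only, not rounding behaviour (fused vs unfused) or any specific GPU; the function names are hypothetical.

```python
def dot_with_mac(a, b):
    """Dot product where each element costs one multiply-accumulate op."""
    acc, ops = 0.0, 0
    for x, y in zip(a, b):
        acc = acc + x * y  # modelled as a single MAC instruction
        ops += 1
    return acc, ops


def dot_with_mul_add(a, b):
    """Same dot product issued as separate MUL and ADD instructions."""
    acc, ops = 0.0, 0
    for x, y in zip(a, b):
        product = x * y      # standalone MUL
        ops += 1
        acc = acc + product  # standalone ADD
        ops += 1
    return acc, ops


print(dot_with_mac([1, 2, 3], [4, 5, 6]))
print(dot_with_mul_add([1, 2, 3], [4, 5, 6]))
```

Both versions return the same sum here; the only difference in this model is that the standalone form issues twice as many arithmetic instructions. Whether that matters in practice depends on how much of a real workload is multiply-then-accumulate, which is the question the post raises.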
Qualcomm Krait & MSM8960 @ AnandTech

I wasn't referring to fused vs unfused. I was referring to single MAC op vs standalone ops for MUL and ADD. Are there significant advantages to having a MAC op for most highly parallel, loosely memory-coupled algorithms?
- metafor
- Post #217
- Aug 16, 2012
- Forum: Mobile Devices and SoCs
Qualcomm Krait & MSM8960 @ AnandTech

What are the typical use cases for FP64 in GPGPU? Even Kepler takes a significant hit when performing MAC compared to standalone MUL and ADD, so I imagine MACs aren't particularly desired?
- metafor
- Post #213
- Aug 15, 2012
- Forum: Mobile Devices and SoCs
Qualcomm Krait & MSM8960 @ AnandTech

The biggest problem (and often critical path) is alignment for FP64 MADD. Since FP64 has a much wider range of exponents than FP32, the initial exponent comparator would have to be widened, and the shift would have to be done over multiple cycles with the intermediates stored someplace. The...
- metafor
- Post #200
- Aug 14, 2012
- Forum: Mobile Devices and SoCs
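
For a sense of scale on the alignment point above, this small sketch compares the IEEE 754 binary32 and binary64 field widths and derives a few datapath widths from them. The exponent and significand sizes are from the standard; the `3 * p + 2` figure for the addend alignment range is a common textbook estimate for a fused multiply-add and is an assumption here, not something stated in the post or tied to any particular design.

```python
# (exponent bits, significand bits including the hidden bit) per IEEE 754
FORMATS = {"fp32": (8, 24), "fp64": (11, 53)}


def alignment_costs(name: str) -> dict:
    """Rough datapath widths for a MADD in the given format (illustrative)."""
    exp_bits, p = FORMATS[name]
    return {
        "exponent_comparator_bits": exp_bits,
        "product_bits": 2 * p,              # full-width significand product
        "fma_align_range_bits": 3 * p + 2,  # assumed textbook estimate
    }


for fmt in FORMATS:
    print(fmt, alignment_costs(fmt))
```

Under these assumptions the FP64 alignment range is a little more than twice the FP32 one, which is consistent with the post's remark that the shift may need to be spread over multiple cycles.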
Qualcomm Krait & MSM8960 @ AnandTech

But if the FP64 datapath is separate and sufficiently low performance, nominal power consumption shouldn't be that much higher. Particularly when we're talking about a ~500MHz operating frequency.
- metafor
- Post #198
- Aug 14, 2012
- Forum: Mobile Devices and SoCs
Samsung Exynos 5250 - production starting in Q2 2012

Samsung does make discrete LTE modems, albeit not true multi-mode ones like qcom's. Modem integration can cause quite a few hassles, not the least of which is qualification and validation on top of heavier IP protection that requires encryption ROMs and thus, difficulty with chip bringup...
- metafor
- Post #115
- Aug 12, 2012
- Forum: Mobile Devices and SoCs
Samsung Exynos 5250 - production starting in Q2 2012

They likely started this SoC around the time ARM first announced the A7. I think you severely underestimate how long it takes to integrate something like that into an SoC, especially considering you're supporting a hybrid coherency model. I believe the Korean version of the Galaxy S3 has...
- metafor
- Post #96
- Aug 10, 2012
- Forum: Mobile Devices and SoCs
Samsung Exynos 5250 - production starting in Q2 2012

Krait's pipeline is significantly shorter than A15's, though. I'm frankly very surprised they're only running at 1.7GHz. Perhaps it's a power consumption issue. A15's kinda power-hungry.
- metafor
- Post #94
- Aug 10, 2012
- Forum: Mobile Devices and SoCs