To achieve a low-power design, the CS301 development team combined a carefully envisioned architecture, conceived from the top down, with practical engineering detail built from the bottom up. It was a matter of taking everything back to basic principles and finding a solution that looked good from both the top and the bottom. Optimizing the architecture produced the greatest gains, but it was the detail that ultimately determined which approach was best.
Probably the single most important design target for the CS301 was to minimize the number of times information had to be moved, and to move it efficiently when it did. This principle was woven into both the architecture and the implementation, from the on-chip network, whose very simple control structure allows distributed arbitration and clock gating, to the fundamental structure of the multithreaded array processor. Rather than centralizing control of decision making and processing in a single unit, as a typical microprocessor does, the design lets local units decide for themselves, wherever possible, what processing is required. This confines the flow of data, control and clock signals to only the units needed to implement the correct functionality.
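To make the principle of local decisions and clock gating concrete, here is a toy Python model. It is an illustrative sketch, not ClearSpeed's design: the `ProcessingElement` class, the `clocked_cycles` function and the workload numbers are all hypothetical. Each element decides locally whether it needs a clock edge, and the model counts clock events with and without gating.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Hypothetical processing element that decides locally whether it needs a clock."""
    pending_ops: int = 0

    def wants_clock(self) -> bool:
        # Local decision: request a clock edge only when work is queued.
        return self.pending_ops > 0

    def tick(self) -> None:
        # One clocked cycle consumes one queued operation, if any.
        if self.pending_ops > 0:
            self.pending_ops -= 1


def clocked_cycles(pes: list[ProcessingElement], cycles: int, gated: bool) -> int:
    """Count per-element clock events over `cycles` cycles, with or without gating."""
    events = 0
    for _ in range(cycles):
        for pe in pes:
            # Without gating every element is clocked every cycle;
            # with gating only elements that asked for a clock toggle.
            if not gated or pe.wants_clock():
                events += 1
                pe.tick()
    return events


def make_pes() -> list[ProcessingElement]:
    # Hypothetical workload: queued operations per element.
    return [ProcessingElement(n) for n in (8, 2, 0, 5)]


print(clocked_cycles(make_pes(), 10, gated=False))  # 40
print(clocked_cycles(make_pes(), 10, gated=True))   # 15
```

With this hypothetical workload, gating cuts clock events from 40 to 15 because idle elements never toggle. Real savings also depend on gating overhead and activity patterns that this sketch ignores.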
The replicated processing element played a fundamental part in achieving both the performance and the power efficiency. The microarchitecture was critical, because every decision, from control coding and distribution through to the detail of the compute elements, had to be evaluated for efficiency. There is no trick to power-efficient design other than making sure the team understands the goal and is inspired to sweat the details and find an optimal solution. The only shortcuts in this design came from experience and sound engineering principles, plus an integrated design environment that provided fast feedback and predictability throughout the flow.
For ClearSpeed, achieving its performance and power goals meant stripping out complexity. Finding low-transistor-count solutions to each aspect of the design allowed the team to reduce the area of each component, which reduced capacitance locally and, as a by-product, reduced the capacitance associated with the global control and data flow. An essential requirement of the company's approach was the ability to take new ideas rapidly through to finished layout and validate expectations. Some ideas looked elegant as RTL but turned out to be inefficient when realized in silicon through a semicustom flow. The ideal flow therefore had to deliver rapid closure and also let the company's engineers understand the results and adapt their design strategies to work with the tools.
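The link between smaller area and lower power follows from the standard first-order model of CMOS dynamic (switching) power, shown below as a general illustration rather than ClearSpeed's own analysis. Here α is the activity factor, C the switched capacitance, V_dd the supply voltage and f the clock frequency; splitting C into local and global terms is an assumption made for clarity.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f,
\qquad C = C_{\mathrm{local}} + C_{\mathrm{global}}

\frac{P'_{\mathrm{dyn}}}{P_{\mathrm{dyn}}}
  = \frac{C'_{\mathrm{local}} + C'_{\mathrm{global}}}{C_{\mathrm{local}} + C_{\mathrm{global}}}
  \qquad \text{(at fixed } \alpha,\ V_{dd},\ f\text{)}
```

Smaller components lower C_local directly, and because they sit closer together, the global control and data wires between them become shorter, lowering C_global as well. Clock gating, by contrast, acts on the activity factor α.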
Originally, the company's engineers had tried a conventional point-tool IC design flow for the CS301, but various problems led the team to abandon it. Timing, signal and power integrity, and routing issues prevented the flow from achieving design closure. The designers suspected these problems stemmed from poor initial placement, and the team believed that its point-tool flow was not addressing all of these issues concurrently, as the design required. In addition, the point-tool flow provided no feedback, so it was impossible to identify the causes of the problems.
The development team adopted a new design flow from Magma Design Automation Inc. (Santa Clara, Calif.). With Magma's Blast Fusion APX, Blast Noise and Blast Rail, the team had an integrated flow that addressed timing, signal and power integrity, and routing issues concurrently throughout the flow. This correct-by-construction approach delivered better placement and provided insight into the design that allowed the team to reduce power significantly. With Magma's system, the team's engineers could accurately and efficiently perform timing-vs.-power and area-vs.-power trade-offs at different stages of the design flow.