Tsinghua Science and Technology


accelerator parallelization, point-to-point interconnect insertion, bus-based embedded system-on-chips


As performance requirements for bus-based embedded System-on-Chips (SoCs) increase, more and more on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments. The former helps to boost the computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. What's more, they interact with each other. This paper proposes a design flow to optimize accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm, whose goal is to reduce total SoC latency under the constraints of SoC area and total P2P wire length. Experimental results show that the performance difference between our proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.


Tsinghua University Press