Background
Heat on chip
Computer chips heat up when they are used. If they become too hot, they break. Even at moderately elevated temperatures, they consume much more power to get the same amount of work done. Therefore, to protect chips and increase their power efficiency, it is useful to search for ways to reduce the overall heat.
Heat on chips is generated in “hot spots”, those components on the chip that are more active than others. A hot spot dissipates its heat vertically towards the chip’s package, but also horizontally to neighboring components. Even if the neighboring components are not very active, the propagated heat reduces their efficiency and they may become hotter than they would if they were isolated.
So “science” (we) needs to work on reducing hot spots on chip, but this work must account for the relative position of components on the silicon.
Architecture design
“Early design” occurs when chip architects choose the components to put on the chip to perform some task, and decide how they are functionally connected, i.e. “which components need to be connected to which other components”. This establishes data paths, but there are still many ways to actually organize the components on the silicon chip. For example, early design can decide that a chip must contain four components A, B, C, D that must be fully connected. This is sufficient to “validate” the design and confirm it is appropriate to carry out some computation. However, when printed on silicon, these components can be organized in e.g. four different configurations (ABCD / ACBD / ADBC / ADCB), which do not influence the functional behavior but do influence which components are physically neighbours and thus how heat propagates.
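The effect of the configuration on physical neighbourhood can be made concrete with a small sketch. It assumes, purely for illustration, that the four components are placed side by side in a single row, so that only consecutive components touch; the text above does not fix a geometry, and the helper name `physical_neighbours` is hypothetical.

```python
from itertools import combinations

def physical_neighbours(row):
    """Return the unordered pairs of components that sit side by side in a one-row layout."""
    return {frozenset(pair) for pair in zip(row, row[1:])}

components = "ABCD"
# Functional design: fully connected, so every pair of components shares a data path.
data_paths = {frozenset(pair) for pair in combinations(components, 2)}

# The four example configurations from the text, read as one-row placements (assumption).
layouts = ["ABCD", "ACBD", "ADBC", "ADCB"]
neighbours = {layout: physical_neighbours(layout) for layout in layouts}

for layout, pairs in neighbours.items():
    # Every physical neighbour pair is also a data path, but not the other way round.
    assert pairs <= data_paths
    print(layout, sorted("".join(sorted(pair)) for pair in pairs))
```

All four layouts implement the same six data paths, yet each one yields a different set of three physically adjacent pairs, which is exactly the information a heat propagation model needs and the functional design alone does not provide.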
Hypothesis
Traditionally, temperature and heat were not considered in early design, because early design (as explained above) does not produce the final topology of the silicon chip, which seems required to model and simulate heat propagation.
Hypothesis: the information about component locality found in the functional chip design is sufficient to derive useful information about thermal behavior, without synthesising the actual chip floor plan.
The idea is as follows: a single functional design with data paths admits many possible floor plans. However, we can safely postulate that all candidate floor plans will put components that work together close to each other, because otherwise the early design phase would not have connected them with data paths.
So at a high level we can simply compute any random planar embedding of the graph that has components as nodes and data paths as edges, and postulate that the 2D topology of this embedding is representative of any other planar embedding of the same design for the purpose of simulating heat propagation. By “representative” we mean that the prediction of thermal behavior obtained with this planar embedding can be empirically shown to be more accurate than a prediction based on a fully random placement of the components that does not take the data path topology into account.
Experimental protocol
Infrastructure
A few of my colleagues develop Sesame, a simulation tool within the Daedalus architecture design framework.
Sesame is able to manipulate functional chip designs and co-simulate these high-level architecture models and the software that runs on them. We can thus propose to extend Sesame to compute a random planar embedding of the designs, and then simulate heat propagation using a naive model: compute heat sources using the load on each component, then heat propagation using the topology of the random planar embedding.
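A minimal sketch of what such a naive model could look like is given below. It is not Sesame code and uses no Sesame API: the function names, the grid placement, and the coefficients `k_lateral` and `k_vertical` are all assumptions made for illustration. Each component gains heat in proportion to its load, exchanges heat with its grid neighbours, and loses heat vertically towards the package.

```python
def grid_neighbours(positions):
    """Map each component to the components located in the 4 adjacent grid cells."""
    by_cell = {cell: name for name, cell in positions.items()}
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {
        name: [by_cell[(x + dx, y + dy)]
               for dx, dy in offsets if (x + dx, y + dy) in by_cell]
        for name, (x, y) in positions.items()
    }

def simulate_heat(load, positions, steps=200, k_lateral=0.1, k_vertical=0.2):
    """Explicit time stepping of a lumped thermal model (hypothetical coefficients).

    Temperatures are expressed as a rise above ambient, in arbitrary units.
    The step is stable as long as 4 * k_lateral + k_vertical <= 1.
    """
    neighbours = grid_neighbours(positions)
    temp = {name: 0.0 for name in load}
    for _ in range(steps):
        temp = {
            name: temp[name]
            + load[name]                                                   # heat source
            + k_lateral * sum(temp[n] - temp[name] for n in neighbours[name])  # horizontal
            - k_vertical * temp[name]                                      # towards package
            for name in load
        }
    return temp

# Illustrative placement taken from some embedding: four components in a row, only A is busy.
positions = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
load = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
temps = simulate_heat(load, positions)
for name in sorted(temps):
    print(name, round(temps[name], 3))
```

In this toy run the idle components B, C and D still warm up, and the closer they are to A the warmer they get, which is the neighbour effect described in the background section. A real extension would replace the made-up coefficients with calibrated thermal parameters.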
Experiment
It is possible to load/define in Sesame the functional design of existing embedded chips.
For some chips, the thermal behavior T_real has already been experimentally measured/modelled using the existing silicon implementation (e.g. using on-chip sensors or thermal cameras).
For each of these chips, we can thus make a functional model of its components in Sesame, without taking its actual silicon topology into account, then use the infrastructure defined above to make a prediction T_pred of its thermal behavior. We can also use the same infrastructure to make a prediction T_random of the thermal behavior using a fully random 2D planar topology which does not account for the locality information provided by the data paths.
If the experiment shows that T_pred is significantly closer to T_real than T_random is, for enough different chip designs, the hypothesis is validated and we can claim to have designed a method for early design space exploration of thermal behavior.
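The comparison could be sketched as follows. The protocol above does not fix a distance measure, so this sketch assumes the root-mean-square error over per-component temperatures, and averages the error over several random placements; the function names and the `margin` parameter are hypothetical. The numbers in the example are made up purely to exercise the code and are not measurements.

```python
import math

def rmse(predicted, real):
    """Root-mean-square error between two per-component temperature maps."""
    return math.sqrt(sum((predicted[c] - real[c]) ** 2 for c in real) / len(real))

def compare_predictions(t_real, t_pred, t_random_samples, margin=0.0):
    """Return (supports_hypothesis, error of T_pred, mean error of the T_random samples).

    The design counts in favour of the hypothesis when T_pred beats the average
    random placement by more than `margin` (an assumed, user-chosen threshold).
    """
    err_pred = rmse(t_pred, t_real)
    err_random = sum(rmse(t, t_real) for t in t_random_samples) / len(t_random_samples)
    return err_pred + margin < err_random, err_pred, err_random

# Made-up temperatures in degrees Celsius, for illustration only.
t_real = {"A": 80.0, "B": 60.0, "C": 50.0, "D": 40.0}
t_pred = {"A": 78.0, "B": 62.0, "C": 49.0, "D": 41.0}
t_random_samples = [{"A": 50.0, "B": 75.0, "C": 45.0, "D": 60.0}]

supported, err_pred, err_random = compare_predictions(t_real, t_pred, t_random_samples)
print(supported, round(err_pred, 3), round(err_random, 3))
```

A single comparison like this says nothing about significance. To support the claim across “enough different chip designs”, the per-design errors would need to be collected and passed to a proper statistical test, whose choice is left open here.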