AI-Empowered Thermal Modeling and Run-Time Management for Manycore Processor and Chiplet Designs
Principal Investigators
- Dr. Sheldon Tan (PI)
Graduate Students
Current Students
- Jincong Lu
- Subed Lamichlane
Graduate Students (graduated)
- Sheriff Sadiqbatcha
- Shuyuan Yu
- Wentian Jin
- Jinwei Zhang
- Mohammadamir Kavousi
- Yibo Liu
- Liang Chen (post-doc, SJTU)
- Han Zhou (First job: Synopsys)
Industry Liaisons
Funding
- National Science Foundation CISE CCF Core Small program (CCF-1816361), "SHF: Small: Data-Driven Thermal Monitoring and Run-Time Management for Manycore Processor and Chiplet Designs", $500,000, Oct. 1st, 2021 to Sept. 30th, 2024, single PI.
Project Goals
This project seeks to develop a new generation of data-driven, super-fast thermal modeling and monitoring techniques, together with smart run-time thermal, power, and reliability management techniques, by harnessing the latest advances in deep learning and numerical methods for commercial multi/many-core processors and chiplet designs. The project capitalizes on the unique thermal IR imaging system at VSCLAB to measure commercial multi/many-core processors and future chiplet-based system-in-package designs.
This project also leads to the first open-source "Thermal map database for commercial CPU/GPU/TPU multi/many-core processors".
Project Highlights
Publications
Journal publications
- S. Sadiqbatcha, J. Zhang, H. Zhao, H. Amrouch, J. Henkel, S. X.-D. Tan, “Post-silicon heat-source identification and machine-learning-based thermal modeling using infrared thermal imaging”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), July 2020, 10.1109/TCAD.2020.3007541
- S. Sadiqbatcha, J. Zhang, H. Amrouch and S. X.-D. Tan, “Real-time full-chip thermal tracking: a post-silicon, machine learning perspective”, IEEE Transactions on Computers (TC), June 2021, 10.1109/TC.2021.3086112
- J. Zhang, S. Sadiqbatcha, M. O’Dea, H. Amrouch and S. X.-D. Tan, “Full-chip power density and thermal map characterization for commercial microprocessors under heat sink cooling”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 41, no. 5, pp. 1453-1466, May 2022, 10.1109/TCAD.2021.3088081.
- L. Chen, S. Sadiqbatcha, H. Amrouch and S. X.-D. Tan, “Electrothermal simulation and optimal design of thermoelectric cooler using analytic approach”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 41, no. 9, pp. 3066-3077, 2022, 10.1109/TCAD.2021.3120533
- J. Zhang, S. Sadiqbatcha, L. Chen, C. Thi, S. Sachdeva, H. Amrouch and S. X.-D. Tan, “Hot-spot aware thermoelectric array based cooling for multicore processors”, Integration, vol. 89, pp. 73-82, 2023, https://doi.org/10.1016/j.vlsi.2022.11.006.
- J. Zhang, S. Sadiqbatcha and S. X.-D. Tan, “Hot-Trim: thermal and reliability management for commercial multi-core processors considering workload dependent hot spots”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 7, pp. 2290-2302, July 2023, doi: 10.1109/TCAD.2022.3216552.
- L. Chen, W. Jin, J. Zhang and S. X.-D. Tan, “Thermoelectric cooler modeling and optimization via surrogate modeling using implicit physics-constrained neural networks”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 11, Nov. 2023, 10.1109/TCAD.2023.3269385
Conference publications
- S. Sadiqbatcha, H. Zhao, H. Amrouch, J. Henkel and S. X.-D. Tan, “Hot spot identification and system parameterized thermal modeling for multi-core processors through infrared thermal imaging”, Proc. Design, Automation and Test in Europe (DATE'19), Florence, Italy, March 2019.
- Z. Sun, H. Zhou, and S. X.-D. Tan, “Dynamic reliability management for multi-core processor based on deep reinforcement learning”, International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD’19), Lausanne, Switzerland, July 2019.
- S. Sadiqbatcha, Y. Zhao, J. Zhang, H. Amrouch, J. Henkel and S. X.-D. Tan, "Machine learning based online full-chip heatmap estimation," Proc. Asia South Pacific Design Automation Conference (ASP-DAC’20), Beijing, China, Jan. 2020. (35% acceptance rate)
- J. Zhang, S. Sadiqbatcha, W. Jin and S. X.-D. Tan, “Accurate power density map estimation for commercial multi-core microprocessors”, Proc. Design, Automation and Test in Europe (DATE’20), Grenoble, France, March 2020. (26% acceptance rate)
- S. Yu, H. Zhou, H. Amrouch, J. Henkel, S. X.-D. Tan, “Run-time accuracy reconfigurable stochastic computing for dynamic reliability and power management: work-in-progress”, Proc. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’20), ESWeek 2020, Sept 2020.
- W. Jin, S. Sadiqbatcha, J. Zhang and S. X.-D. Tan, “Full-chip thermal map estimation for multi-core commercial CPUs with generative adversarial learning”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’20), San Diego, CA, Nov. 2020. (invited), https://doi.org/10.1145/3400302.3415764
- J. Zhang, S. Sadiqbatcha, Y. Gao, M. O’Dea, N. Yu, and S. X.-D. Tan, “HAT-DRL: Hotspot-Aware Task Mapping for Lifetime Improvement of Multicore System using Deep Reinforcement Learning”, Proc. 2nd IEEE/ACM Workshop on Machine Learning for CAD (MLCAD’20), Virtual Event, Nov. 2020, doi: 10.1145/3380446.3430623.
- L. Chen, W. Jin and S. X.-D. Tan, “Fast thermal analysis for chiplet design based on graph convolution networks”, Proc. Asia South Pacific Design Automation Conference (ASP-DAC’22), virtual, Jan. 2022. (invited), doi: 10.1109/ASP-DAC52403.2022.9712583
- J. Lu, J. Zhang, W. Jin, S. Sachdeva and S. X.-D. Tan, “Learning based spatial power characterization and full-chip power estimation for commercial TPUs”, Proc. Asia South Pacific Design Automation Conference (ASP-DAC’23), Japan, Jan. 2023. (invited), https://doi.org/10.1145/3566097.3568347
- J. Lu, J. Zhang and S. X.-D. Tan, “Real-time thermal map estimation for AMD multi-core CPUs using transformer”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’23), San Francisco, CA, Nov. 2023, 10.1109/ICCAD57390.2023.10323817
- L. Chen, J. Lu, W. Jin and S. X.-D. Tan, “Fast full-chip parametric thermal analysis based on enhanced physics enforced neural networks”, Proc. IEEE/ACM International Conf. on Computer-Aided Design (ICCAD’23), San Francisco, CA, Nov. 2023, 10.1109/ICCAD57390.2023.10323696
- J. Lu and S. X.-D. Tan, “Thermal map dataset for commercial multi/many core CPU/GPU/TPU”, Proc. of the 6th ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD'24), Snowbird, Utah, Sept. 2024, https://doi.org/10.1145/3670474.36859