Main Program

Monday, 27 February 2023

08:30 – 09:30 Keynote: Addressing Challenges of Core Microarchitecture Research, by Daniel A. Jiménez
09:30 – 10:00 Coffee Break
10:00 – 12:00 Session 1A: Neural Networks and Accelerators 1

Session Chair: Xue Lin

SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators
Mingi Yoo (Yonsei University),
Jaeyong Song (Yonsei University),
Jounghoo Lee (Yonsei University),
Namhyung Kim (Samsung Electronics),
Youngsok Kim (Yonsei University),
Jinho Lee (Seoul National University)


PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator
Shurui Li (UCLA),
Hangbo Yang (UCLA),
Chee Wei Wong (UCLA),
Volker Sorger (The George Washington University),
Puneet Gupta (UCLA)


INCA: Input-stationary Dataflow at Outside-the-box Thinking about Deep Learning Accelerators
Bokyung Kim (Duke University),
Shiyu Li (Duke University),
Hai “Helen” Li (Duke University)


GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks
Ranggi Hwang (KAIST),
Minhoo Kang (KAIST),
Jiwon Lee (KAIST),
Dongyun Kam (POSTECH),
Youngjoo Lee (POSTECH),
Minsoo Rhu (KAIST)


Logical/Physical Topology-Aware Collective Communication in Deep Learning Training
Sanghoon Jo (KAIST),
Hyojun Son (KAIST),
John Kim (KAIST)


Sibia: Signed Bit-slice Architecture for Dense DNN Acceleration with Slice-level Sparsity Exploitation
Dongseok Im (KAIST),
Gwangtae Park (KAIST),
Zhiyong Li (KAIST),
Junha Ryu (KAIST),
Hoi-Jun Yoo (KAIST)
Session 1B: NVRAM and Hybrid Memory

Session Chair: Prashant Nair

AstriFlash: A Flash-Based System for Online Services
Siddharth Gupta (EPFL),
Yunho Oh (Korea University),
Lei Yan (EPFL),
Mark Sutherland (EPFL),
Abhishek Bhattacharjee (Yale University),
Babak Falsafi (EPFL),
Peter Hsu (Peter Hsu & Associates)


Thoth: Bridging the Gap Between Persistently Secure Memories and Memory Interfaces of Emerging NVMs
Xijing Han (North Carolina State University),
James Tuck (North Carolina State University),
Amro Awad (North Carolina State University)


Multi-Granularity Shadow Paging with NVM Write Optimization for Crash-Consistent Memory-Mapped I/O
Hongchao Du (City University of Hong Kong),
Qiao Li (Xiamen University),
Riwei Pan (City University of Hong Kong),
Tei-Wei Kuo (National Taiwan University),
Chun Jason Xue (City University of Hong Kong)


MGC: Multiple-Gray-Code for 3D NAND Flash based High-Density SSDs
Yina Lv (East China Normal University),
Liang Shi (East China Normal University),
Qiao Li (Xiamen University),
Congming Gao (Xiamen University),
Yunpeng Song (East China Normal University),
Longfei Luo (East China Normal University),
Youtao Zhang (University of Pittsburgh)


Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking
Yiwei Li (Tsinghua University),
Mingyu Gao (Tsinghua University)


Root Crash Consistency of SGX-style Integrity Trees in Secure Non-Volatile Memory Systems
Jianming Huang (Huazhong University of Science and Technology),
Yu Hua (Huazhong University of Science and Technology)
Session 1C: Caching and Memory Management

Session Chair: Minsoo Rhu

ACIC: Admission-Controlled Instruction Cache
Yunjin Wang (Pennsylvania State University),
Chia-Hao Chang (Pennsylvania State University),
Anand Sivasubramaniam (Pennsylvania State University),
Niranjan K Soundararajan (Intel Labs)

Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs
Carlos Escuin (University of Zaragoza),
Asif Ali Khan (TU Dresden),
Pablo Ibáñez Marín (Universidad de Zaragoza),
Teresa Monreal (Universitat Politècnica de Catalunya),
Jeronimo Castrillon (Center for Advancing Electronics Dresden, TU Dresden),
Víctor Viñals Yúfera (Universidad de Zaragoza)


NOMAD: Enabling Non-blocking OS-managed DRAM Cache via Tag-Data Decoupling
Youngin Kim (Yonsei University),
Hyeonjin Kim (Yonsei University),
William Song (Yonsei University)


Safety Hints for HTM Capacity Abort Mitigation
Anirudh Jain (Georgia Tech),
Divya Kiran Kadiyala (Georgia Tech),
Alexandros Daglis (Georgia Tech)


iCACHE: An Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training
Weijian Chen (Zhejiang University),
Shuibing He (Zhejiang University),
Yaowen Xu (Zhejiang University),
Xuechen Zhang (Washington State University Vancouver),
Siling Yang (Zhejiang University),
Shuang Hu (Zhejiang University),
Xian-He Sun (Illinois Institute of Technology),
Gang Chen (Zhejiang University)


Are Randomized Caches Truly Random? Formal Analysis of Randomized-Partitioned Caches
Anirban Chakraborty (Indian Institute of Technology, Kharagpur),
Sarani Bhattacharya (Imec, Belgium),
Sayandeep Saha (Nanyang Technological University, Singapore),
Debdeep Mukhopadhyay (Indian Institute of Technology, Kharagpur)
12:00 – 13:30 Lunch
13:30 – 15:10 Session 2A: Accelerators

Session Chair: Christopher Batten

HIRAC: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications
Hesam Shabani (Lehigh University),
Abhishek Singh (Lehigh University),
Bishoy Youhana (Lehigh University),
Xiaochen Guo (Lehigh University)


VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs
Geonhwa Jeong (Georgia Tech),
Sana Damani (Georgia Tech),
Abhimanyu Bambhaniya (Georgia Tech),
Eric Qin (Georgia Tech),
Christopher J. Hughes (Intel Labs),
Sreenivas Subramoney (Intel Labs),
Hyesoon Kim (Georgia Tech),
Tushar Krishna (Georgia Tech)


ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You (Georgia Tech),
Zhanyi Sun (Rice University),
Huihong Shi (Nanjing University),
Zhongzhi Yu (Georgia Tech),
Yang Zhao (Rice University),
Yongan Zhang (Georgia Tech),
Chaojian Li (Georgia Tech),
Baopu Li (Oracle Health and AI),
Yingyan (Celine) Lin (Georgia Tech)


Leveraging Domain Information for the Efficient Automated Design of Deep Learning Accelerators
Chirag Sakhuja (The University of Texas at Austin),
Zhan Shi (The University of Texas at Austin),
Calvin Lin (The University of Texas at Austin)


DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing
Zhe Zhou (Peking University),
Cong Li (Peking University),
Fan Yang (Nankai University),
Guangyu Sun (Peking University)
Session 2B: Security

Session Chair: Magnus Själander

AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks
Mulong Luo* (Cornell University),
Wenjie Xiong* (Meta AI/Virginia Tech),
Geunbae Lee (Virginia Tech),
Yueying Li (Cornell University),
Xiaomeng Yang (Meta AI),
Amy Zhang (Meta AI),
Yuandong Tian (Meta AI),
Hsien-Hsin S. Lee (Intel),
Edward Suh (Meta AI/Cornell University)

(* indicates equal contributions)

SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling
Minbok Wi (Seoul National University),
Jaehyun Park (Seoul National University),
Seoyoung Ko (Seoul National University),
Michael Jaemin Kim (Seoul National University),
Nam Sung Kim (UIUC),
Eojin Lee (Inha University),
Jung Ho Ahn (Seoul National University)


Efficient Distributed Secure Memory with Migratable Merkle Tree
Erhu Feng (Shanghai Jiao Tong University),
Dong Du (Shanghai Jiao Tong University),
Yubin Xia (Shanghai Jiao Tong University),
Haibo Chen (Shanghai Jiao Tong University)


AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM
Mehrnoosh Raoufi (University of Pittsburgh),
Jun Yang (University of Pittsburgh),
Xulong Tang (University of Pittsburgh),
Youtao Zhang (University of Pittsburgh)


Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems
Jeonghyun Woo (University of British Columbia),
Gururaj Saileshwar (Georgia Institute of Technology),
Prashant J. Nair (University of British Columbia)
Session 2C: Applications 1

Session Chair: Tim Rogers

Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing
Yu Wen (University of Houston),
Chenhao Xie (Beihang University),
Shuaiwen Leon Song (University of Sydney),
Xin Fu (University of Houston)


ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds
Faquan Chen (Shanghai Jiao Tong University),
Rendong Ying (Shanghai Jiao Tong University),
Jianwei Xue (Shanghai Jiao Tong University),
Fei Wen (Shanghai Jiao Tong University),
Peilin Liu (Shanghai Jiao Tong University)


ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with Linear Taylor Attention
Jyotikrishna Dass (Rice University),
Shang Wu (Rice University),
Huihong Shi (Georgia Institute of Technology),
Chaojian Li (Georgia Institute of Technology),
Zhifan Ye (Rice University),
Zhongfeng Wang (Nanjing University),
Yingyan (Celine) Lin (Georgia Institute of Technology)


CTA: Hardware-Software Co-design for Compressed Token Attention Mechanism
Haoran Wang (Institute of Computing Technology, Chinese Academy of Sciences),
Haobo Xu (Institute of Computing Technology, Chinese Academy of Sciences),
Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences),
Yinhe Han (ICT, Chinese Academy of Sciences)


HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong (Northeastern University),
Mengshu Sun (Northeastern University),
Alec Lu (Simon Fraser University),
Yanyue Xie (Northeastern University),
Kenneth Liu (Simon Fraser University),
Zhenglun Kong (Northeastern University),
Xin Meng (Peking University),
Zhengang Li (Northeastern University),
Xue Lin (Northeastern University),
Zhenman Fang (Simon Fraser University),
Yanzhi Wang (Northeastern University)
15:10 – 15:40 Coffee Break
15:40 – 17:00 Session 3A: Best of CAL

Session Chair: Chia-Lin Yang

A First-Order Model to Assess Computer Architecture Sustainability
Lieven Eeckhout (Ghent University)

A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications
Kristin Barber (The Ohio State University),
Moein Ghaniyoun (The Ohio State University),
Yinqian Zhang (Southern University of Science and Technology),
Radu Teodorescu (The Ohio State University)

GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks
Sungmin Yun (Seoul National University),
Byeongho Kim (Seoul National University),
Jaehyun Park (Seoul National University),
Hwayong Nam (Seoul National University),
Jung Ho Ahn (Seoul National University),
Eojin Lee (Inha University) 
Session 3B: Datacenters and HPC

Session Chair: Osman Unsal

Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding
Bingyao Li (University of Pittsburgh),
Jieming Yin (Lehigh University),
Anup Holey (NVIDIA),
Youtao Zhang (University of Pittsburgh),
Jun Yang (University of Pittsburgh),
Xulong Tang (University of Pittsburgh)


Ah-Q: Quantifying and Handling the Interference within a Datacenter from a System Perspective
Yu-Hang Liu (ICT, CAS),
Xin Deng (ICT, CAS),
Jiapeng Zhou (ICT, CAS),
Mingyu Chen (ICT, CAS),
Yungang Bao (ICT, CAS)


Market Mechanism-Based User-in-the-Loop Scalable Power Oversubscription for HPC Systems
Md Rajib Hossen (UT Arlington),
Kishwar Ahmed (The University of Toledo),
Mohammad A. Islam (UT Arlington)


RAMBDA: RDMA-driven Acceleration Framework for Memory-intensive μs-scale Datacenter Applications
Yifan Yuan (UIUC/Intel Labs),
Jinghan Huang (UIUC),
Yan Sun (UIUC),
Tianchen Wang (UIUC),
Jacob Nelson (Microsoft Research),
Dan Ports (Microsoft Research),
Yipeng Wang (Intel Labs),
Ren Wang (Intel Labs),
Charlie Tai (Intel Labs),
Nam Sung Kim (UIUC)
Session 3C: GPUs

Session Chair: Daniel Wong

FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems
Harini Muthukrishnan (NVIDIA),
Daniel Lustig (NVIDIA),
Oreste Villa (NVIDIA),
Thomas Wenisch (University of Michigan),
David Nellans (NVIDIA)

Mitigating GPU Core Partitioning Performance Effects
Aaron Barnes (Purdue University),
Fangjia Shen (Purdue University),
Timothy G. Rogers (Purdue University)

Plutus: Bandwidth-Efficient Memory Security for GPUs
Rahaf Abdullah (North Carolina State University),
Huiyang Zhou (North Carolina State University),
Amro Awad (North Carolina State University)


MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism
Quan Zhou (University of Science and Technology of China),
Haiquan Wang (University of Science and Technology of China),
Xiaoyan Yu (University of Science and Technology of China),
Cheng Li (University of Science and Technology of China),
Youhui Bai (University of Science and Technology of China),
Feng Yan (University of Houston),
Yinlong Xu (University of Science and Technology of China)
17:00 – 17:15 Refreshments for Award Ceremony
17:15 – 17:45 HPCA Test-of-Time Awards
17:45 – 18:45 Business Meeting

Tuesday, 28 February 2023

08:30 – 09:30 Keynote
09:30 – 10:00 Coffee Break
10:00 – 12:00 Session 4A: Neural Networks and Accelerators 2

Session Chair: Dimitrios Soudris

DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling
Linyan Mei (KU Leuven),
Koen Goetschalckx (KU Leuven),
Arne Symons (KU Leuven),
Marian Verhelst (KU Leuven)


CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks
Yue Dai (University of Pittsburgh),
Youtao Zhang (University of Pittsburgh),
Xulong Tang (University of Pittsburgh)

ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining
Yifan Yang (MIT),
Joel Emer (MIT/NVIDIA),
Daniel Sanchez (MIT)


OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing
Junkyum Kim (SAIT),
Myeonggu Kang (KAIST),
Yunki Han (KAIST),
Yang-gon Kim (KAIST),
Lee-sup Kim (KAIST)


KRISP: Enabling Kernel-wise Right-sizing for Spatial Partitioned GPU Inference Servers
Marcus Chow (University of California, Riverside),
Ali Jahanshahi (University of California, Riverside),
Daniel Wong (University of California, Riverside)


MERCURY: Accelerating DNN Training By Exploiting Input Similarity
Vahid Janfaza (Texas A&M University),
Kevin Weston (Texas A&M University),
Moein Razavi Ghods (Texas A&M University),
Shantanu Mandal (Texas A&M University),
Farabi Mahmud (Texas A&M University),
Alex Hilty (Texas A&M University),
Abdullah Muzahid (Texas A&M University)
Session 4B: PIMs and Persistent Memory

Session Chair: Josep Torrellas

Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory
Ming Zhang (Huazhong University of Science and Technology),
Yu Hua (Huazhong University of Science and Technology)


Reconciling Selective Logging and Hardware Persistent Memory Transaction
Chencheng Ye (Huazhong University of Science and Technology),
Yuanchao Xu (North Carolina State University),
Xipeng Shen (North Carolina State University),
Yan Sha (Huazhong University of Science and Technology),
Xiaofei Liao (Huazhong University of Science and Technology),
Hai Jin (Huazhong University of Science and Technology),
Yan Solihin (University of Central Florida)


SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers
Alexander Freij (North Carolina State University),
Huiyang Zhou (North Carolina State University),
Yan Solihin (University of Central Florida)


EVE: Ephemeral Vector Engines
Khalid Al-Hawaj (Cornell University),
Tuan Ta (Cornell University),
Nick Cebry (Cornell University),
Shady Agwa (Cornell University),
Olalekan Afuye (Cornell University),
Eric Hall (Cornell University),
Courtney Golden (Cornell University),
Alyssa B. Apsel (Cornell University),
Christopher Batten (Cornell University)


On Consistency for Bulk-Bitwise Processing-in-Memory
Ben Perach (Technion – Israel Institute of Technology),
Ronny Ronen (Technion – Israel Institute of Technology),
Shahar Kvatinsky (Technion – Israel Institute of Technology)


Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications
Marcelo Orenes-Vera (Princeton University),
Esin Tureci (Princeton University),
David Wentzlaff (Princeton University),
Margaret Martonosi (Princeton University)
Session 4C: Quantum and FPGAs

Session Chair: Eddy Zhang

HyQSAT: A Hybrid Approach for 3-SAT Problems by Integrating Quantum Annealer with CDCL
Siwei Tan (Zhejiang University),
Mingqian Yu (Zhejiang University),
Andre Python (Zhejiang University),
Yongheng Shang (Zhejiang University),
Tingting Li (Zhejiang University),
Liqiang Lu (Zhejiang University),
Jianwei Yin (Zhejiang University)


Duet: Creating Harmony between Processors and Embedded FPGAs
Ang Li (Princeton University),
August Ning (Princeton University),
David Wentzlaff (Princeton University)


Co-Designed Architectures for Modular Superconducting Quantum Computers
Evan McKinney (University of Pittsburgh),
Mingkang Xia (University of Pittsburgh),
Chao Zhou (University of Pittsburgh),
Pinlei Lu (University of Pittsburgh),
Michael Hatridge (University of Pittsburgh),
Alex K. Jones (University of Pittsburgh)


A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis
Yanhao Chen (Rutgers University),
Yuwei Jin (Rutgers University),
Fei Hua (Rutgers University),
Ari Hayes (Rutgers University),
Ang Li (Pacific Northwest National Laboratory),
Yunong Shi (Amazon Web Services),
Eddy Z. Zhang (Rutgers University)


The Imitation Game: Leveraging CopyCats for Robust Native Gate Selection in NISQ Programs
Poulami Das (Georgia Tech),
Eric Kessler (Amazon Web Services),
Yunong Shi (Amazon Web Services)
12:00 – 13:30 Lunch
13:30 – 15:10 Session 5A: Cloud and Edge Computing

Session Chair: Alex Daglis

eNODE: Energy-Efficient and Low-Latency Edge Inference and Training of Neural ODEs
Junkang Zhu (University of Michigan, Ann Arbor),
Yaoyu Tao (University of Michigan, Ann Arbor),
Zhengya Zhang (University of Michigan, Ann Arbor)


SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution
Jovan Stojkovic (University of Illinois at Urbana-Champaign),
Tianyin Xu (University of Illinois at Urbana-Champaign),
Hubertus Franke (IBM Research),
Josep Torrellas (University of Illinois Urbana-Champaign)


MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Seah Kim (University of California, Berkeley),
Hasan Genc (University of California, Berkeley),
Vadim Vadimovich Nikiforov (University of California, Berkeley),
Krste Asanovic (University of California, Berkeley),
Borivoje Nikolic (University of California, Berkeley),
Yakun Sophia Shao (University of California, Berkeley)


Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving
Jun Yeol Ryu (Sungkyunkwan University),
Jongseok Kim (Sungkyunkwan University),
Euiseong Seo (Sungkyunkwan University)


Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures
Dimosthenis Masouros (National Technical University of Athens),
Christian Pinto (IBM Research Europe),
Michele Gazzetti (IBM Research),
Sotirios Xydis (Harokopio University of Athens, Greece),
Dimitrios Soudris (National Technical University of Athens)
Session 5B: Encryption and SGX

Session Chair: John Kim

Poseidon: Practical Homomorphic Encryption Accelerator
Yinghao Yang (Institute of Computing Technology, Chinese Academy of Sciences),
Huaizhi Zhang (Zhejiang Ocean University),
Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences),
Hang Lu (Institute of Computing Technology, Chinese Academy of Sciences),
Mingzhe Zhang (Institute of Information Engineering, Chinese Academy of Sciences),
Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences)


FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption
Rashmi Agrawal (Boston University),
Leo de Castro (MIT CSAIL),
Guowei Yang (Boston University),
Chiraag Juvekar (Analog Devices),
Rabia Yazicigil (Boston University),
Anantha Chandrakasan (MIT),
Vinod Vaikuntanathan (MIT CSAIL),
Ajay Joshi (Boston University)


FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference
Yilan Zhu (Shandong University),
Xinyao Wang (Shandong University),
Lei Ju (Shandong University),
Shanqing Guo (Shandong University)


D-Shield: Enabling Processor-Side Encryption and Integrity Verification for Secure NVMe Drives
Md Hafizul Islam Chowdhuryy (University of Central Florida),
Myoungsoo Jung (KAIST),
Fan Yao (University of Central Florida),
Amro Awad (North Carolina State University)


TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU
Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences),
Zhiwei Wang (Institute of Information Engineering, Chinese Academy of Sciences),
Weizhi Xu (Shandong Normal University),
Rui Hou (Institute of Information Engineering, Chinese Academy of Sciences),
Dan Meng (Institute of Information Engineering, Chinese Academy of Sciences),
Mingzhe Zhang (Institute of Information Engineering, Chinese Academy of Sciences)
Session 5C: Reliability

Session Chair: Vilas Sridharan

AVGI: Microarchitecture-Driven, Fast and Accurate Vulnerability Assessment
George Papadimitriou (University of Athens),
Dimitris Gizopoulos (University of Athens)


Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators
Abhishek Tyagi (University of Rochester),
Yiming Gan (University of Rochester),
Shaoshan Liu (PerceptIn),
Bo Yu (PerceptIn),
Paul Whatmough (Arm),
Yuhao Zhu (University of Rochester)


Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance
Jiangwei Zhang (Tsinghua University),
Chong Wang (Tsinghua University),
Zhenhua Zhu (Tsinghua University),
Donald Kline, Jr (Intel Corporation),
Alex K. Jones (University of Pittsburgh),
Huazhong Yang (Tsinghua University),
Yu Wang (Tsinghua University)


ESD: An ECC-assisted and Selective Deduplication for Non-Volatile Main Memory
Chunfeng Du (Xiamen University),
Suzhen Wu (Xiamen University),
Jiapeng Wu (Xiamen University),
Bo Mao (Xiamen University),
Shengzhe Wang (Xiamen University)
15:10 – 15:40 Coffee Break
15:40 – 17:00 Session 6A: Industry Track Session

Session Chair: Ravi Iyer

A Systematic Study of DDR4 DRAM Faults in the Field
Majed Valad Beigi (AMD),
Yi Cao (Google),
Sudhanva Gurumurthi (AMD),
Charles Recchia (AMD),
Andrew Walton (Google),
Vilas Sridharan (AMD)

High Performance and Power Efficient Accelerator for Cloud Inference
Jianguo Yao (SJTU/Enflame-Tech Inc.),
Hao Zhou (Enflame-Tech Inc.),
Yalin Zhang (Enflame-Tech Inc.),
Ying Li (Enflame-Tech Inc.),
Chuang Feng (Enflame-Tech Inc.),
Shi Chen (Enflame-Tech Inc.),
Jiaoyan Chen (Enflame-Tech Inc.),
Yongdong Wang (Enflame-Tech Inc.),
Qiaojuan Hu (Enflame-Tech Inc.)

LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler
Sungyeob Yoo (KAIST),
Hyunsung Kim (Rebellions Inc.),
Jinseok Kim (Rebellions Inc.),
Sunghyun Park (Rebellions Inc.),
Joo-Young Kim (KAIST),
Jinwook Oh (Rebellions Inc.)


BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment
Yiquan Chen (Alibaba Group and Zhejiang University),
Jiexiong Xu (Zhejiang University),
Chengkun Wei (Zhejiang University),
Yijing Wang (Alibaba Group),
Xin Yuan (Alibaba Group),
Yangming Zhang (Alibaba Group),
Xulin Yu (Alibaba Group),
Yi Chen (Zhejiang University),
Zeke Wang (Zhejiang University),
Shuibing He (Zhejiang University),
Wenzhi Chen (Zhejiang University)
Session 6B: NICs and Networks

Session Chair: Ajay Joshi

Turbo: SmartNIC-enabled Dynamic Load Balancing of μs-scale RPCs
Hamed Seyedroudbari (Georgia Tech),
Srikar Vanavasam (Georgia Tech),
Alexandros Daglis (Georgia Tech)

A Scalable Methodology for Designing Efficient Interconnection Network of Chiplets
Yinxiao Feng (Tsinghua University),
Dong Xiang (Tsinghua University),
Kaisheng Ma (Tsinghua University)


VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance
Hans Kasan (KAIST),
John Kim (KAIST)


17:00 – 18:00 Break
18:00 – 22:00 Banquet

Wednesday, 1 March 2023

08:30 – 09:30 Keynote
09:30 – 10:00 Coffee Break
10:00 – 12:00 Session 7A: Neural Networks and Accelerators 3

Session Chair: Abdullah Muzahid

Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices
Enrico Reggiani (Barcelona Supercomputing Center),
Alessandro Pappalardo (AMD AECG Research Labs),
Max Doblas Font (Barcelona Supercomputing Center),
Miquel Moreto (BSC and UPC),
Mauro Olivieri (Sapienza University of Rome),
Osman Sabri Unsal (Barcelona Supercomputing Center),
Adrian Cristal (Barcelona Supercomputing Center)

FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference
Rishov Sarkar (Georgia Institute of Technology),
Stefan Abi-Karam (Georgia Institute of Technology),
Yuqi He (Georgia Institute of Technology),
Lakshmi Sathidevi (Georgia Institute of Technology),
Cong "Callie" Hao (Georgia Institute of Technology)


Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion
Size Zheng (Peking University),
Siyuan Chen (Peking University),
Peidi Song (Peking University),
Renze Chen (Peking University),
Xiuhong Li (SenseTime Research & Shanghai AI Lab),
Shengen Yan (SenseTime Research),
Dahua Lin (The Chinese University of Hong Kong & Shanghai AI Lab),
Jingwen Leng (Shanghai Jiao Tong University),
Yun Liang (Peking University)


Securator: A Fast and Secure Neural Processing Unit
Nivedita Shrivastava (IIT Delhi),
Smruti R. Sarangi (IIT Delhi)


Tensor Movement Orchestration In Multi-GPU Training Systems
Shao-Fu Lin (National Taiwan University),
Chia-Lin Yang (National Taiwan University),
Yi-Jung Chen (National Chi Nan University),
Hsiang-Yun Cheng (Academia Sinica)
Session 7B: Microarchitecture and Memory Systems

Session Chair: Abhishek Bhattacharjee

A Storage-Effective BTB Organization for Servers
Truls Asheim (Norwegian University of Science and Technology),
Boris Grot (University of Edinburgh),
Rakesh Kumar (Norwegian University of Science and Technology (NTNU))


HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory
Haifeng Li (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences),
Ke Liu (Institute of Computing Technology, Chinese Academy of Sciences),
Ting Liang (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences),
Zuojun Li (Institute of Computing Technology, Chinese Academy of Sciences),
Tianyue Lu (Institute of Computing Technology, Chinese Academy of Sciences),
Hui Yuan (Huawei),
Yinben Xia (Huawei),
Yungang Bao (ICT, CAS),
Mingyu Chen (Institute of Computing Technology, Chinese Academy of Sciences),
Yizhou Shan (Huawei Cloud)


Speculative Register Reclamation
Sanyam Mehta (HPE)

SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPU
Jiwon Lee (Yonsei University),
Ju Min Lee (Yonsei University),
Yunho Oh (Korea University),
William Song (Yonsei University),
Won Woo Ro (Yonsei University)


CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework
Xiaoyang Lu (Illinois Institute of Technology),
Rujia Wang (Illinois Institute of Technology),
Xian-He Sun (Illinois Institute of Technology)


Memory-Efficient Hashed Page Tables
Jovan Stojkovic (University of Illinois at Urbana-Champaign),
Namrata Mantri (UIUC),
Dimitrios Skarlatos (Carnegie Mellon University),
Tianyin Xu (University of Illinois at Urbana-Champaign),
Josep Torrellas (University of Illinois Urbana-Champaign)

Session 7C: Applications 2 & Potpourri

Session Chair: David Kaeli

NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling
Yewen Li (Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences),
Xueqi Li (Institute of Computing Technology, Chinese Academy of Sciences),
Ruihao Gao (Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences),
Wanqi Liu (Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences),
Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)


Efficient Supernet Training Using Path Parallelism
Ying Xu (Institute of Computing Technology, Chinese Academy of Sciences),
Long Cheng (School of Control and Computer Engineering, North China Electric Power University),
Xuyi Cai (Institute of Computing Technology, Chinese Academy of Sciences),
Xiaohan Ma (Institute of Computing Technology, Chinese Academy of Sciences),
Weiwei Chen (Institute of Computing Technology, Chinese Academy of Sciences),
Lei Zhang (Institute of Computing Technology, Chinese Academy of Sciences),
Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences)

Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism
Quan Nguyen (MIT),
Daniel Sanchez (MIT)


CHOPPER: A Compiler Infrastructure for Programmable Bit-serial SIMD Processing Using Memory in DRAM
Xiangjun Peng (The Chinese University of Hong Kong),
Yaohua Wang (National University of Defense Technology),
Ming-Chang Yang (The Chinese University of Hong Kong)


VAQUERO: Vector Acceleration for Query Processing
Julian Pavon Rivera (Barcelona Supercomputing Center),
Ivan Vargas Valdivieso (Barcelona Supercomputing Center),
Joan Marimon (Barcelona Supercomputing Center),
Roger Figueras Bagué (Barcelona Supercomputing Center),
Francesc Moll Echeto (Universitat Politècnica de Catalunya),
Osman Unsal (Barcelona Supercomputing Center),
Mateo Valero (Barcelona Supercomputing Center),
Adrian Cristal (Barcelona Supercomputing Center)
12:00 – 12:20 Closing