Chiplets are transforming computer system designs, allowing system designers to combine heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic chips into smaller, connected chiplets helps performance continue scaling, avoids die size limitations, improves yield, and reduces design and integration costs. However, chiplet-based designs introduce an additional level of hierarchy, which causes indirection and non-uniformity. This clashes with typical heterogeneous systems: unlike CPU-based multi-chiplet systems, heterogeneous systems do not have significant OS support or complex coherence protocols to mitigate the impact of this indirection. Thus, exploiting locality across application phases is harder in multi-chiplet heterogeneous systems. We propose CPElide, which utilizes information already available in heterogeneous systems’ embedded microprocessor (the command processor) to track inter-chiplet data dependencies and aggressively perform implicit synchronization only when necessary, instead of conservatively like the state-of-the-art HMG. Across 24 workloads, CPElide improves average performance (13%, 19%), energy (14%, 11%), and network traffic (14%, 17%), respectively, over current approaches and HMG.