IMPACT 2023
13th International Workshop on Polyhedral Compilation Techniques
January 16th, 2023 | Toulouse, France
In conjunction with HiPEAC 2023,
January 16-18, 2023
Polyhedral compilation techniques receive more attention than ever
from both researchers and practitioners. Thanks to a unified formalism
for parallelization and memory optimization, polyhedral techniques
play a central role in a wide range of domain-specific languages and
optimizing compilers, with high impact in competitive areas such as
machine learning, scientific computing and media processing.
IMPACT is a unique event focusing on polyhedral compilation. The
workshop brings together researchers and practitioners for a
high-quality one-day event including a keynote, paper presentations,
and work-in-progress discussions. The workshop has a well-established
tradition of extensive, high-quality peer reviews, with a program committee
balancing core polyhedral expertise with applications and broader
computing systems research.
Program
Pierre Baudis Convention Centre
Room: Foyer St Exupéry
HiPEAC session site
Opening and Keynote (10:00 - 11:00)
-
Opening
[Slides]
-
Keynote: Compilation and Optimization of Static Control Flow Programs: Polyhedral Compilation at
Large.
Louis-Noël Pouchet
[Abstract]
The process of efficiently mapping an algorithm to a particular hardware is a constantly evolving challenge,
not least due to the increasing complexity of both the target hardware and the software stack to be used in
modern computing.
To cope with these challenges, a broad range of optimizing compilation techniques has been developed, with
the aim of simplifying programming for the user while automating optimization for performance. These
developments have dramatically increased the role of, and the expectations placed on, optimizing compilers:
we expect them to deliver high application performance across a wide variety of execution contexts, which is
critical for embedded and mainstream computing as well as for the push to exascale computing at the high end.
A key to success is the ability to capture the semantics of the original program, and execution behavior of
the machine, as accurately as possible at compile-time. Static control flow programs, where every branch taken
at runtime can be exactly modeled at compile-time, model a rich class of numerical algorithms, and have been
the target of polyhedral compilation. In this talk, I will attempt to pave a way between programs, compilers
and machines where polyhedral analysis and optimizations have delivered successful approaches to automatic
optimization.
[Bio]
Louis-Noël Pouchet is a tenured Associate Professor of Computer Science at Colorado State University, with a
Joint Appointment in the Electrical and Computer Engineering department at CSU. He received his Ph.D. in
Computer Science in 2010 from University of Paris-Sud 11 and INRIA, France, was a postdoctoral fellow
(2010-2012) and then Research Assistant Professor (2014-2016) in the Computer Science and Engineering
department at Ohio State University before joining CSU in 2016, and was a Visiting Assistant Professor at
University of California Los Angeles (2012-2014), as a member of the NSF Center for Domain-Specific Computing.
He works on a variety of topics for high-performance computing, including optimizing compilers for CPUs, GPUs
and FPGAs, domain-specific languages and optimizations, high-level synthesis, and distributed systems for
large-scale scientific computing. Pouchet is a leading expert in polyhedral compilation, with contributions in
modeling, scheduling and reconstructing static control flow programs, and proving their equivalence. His
research is funded by the U.S. National Science Foundation, the U.S. Department of Energy, and Intel. He has
co-authored 100+ publications; he is the recipient of an NSF CAREER grant, the William J. McCalla IEEE best
paper award at ICCAD’15, and has received several best paper awards at top conferences. He is the main author
of the PoCC polyhedral compiler, of the ROSE/PolyOpt software toolset (a complete polyhedral compilation
framework for the LLNL ROSE compiler), and of the popular PolyBench/C benchmarking suite, which has been used
in 400+ publications so far.
[Slides]
Break (11:00 — 11:30)
Session 1: Theory and Techniques (11:30 — 13:00) [25 min + 5min Q&A]
-
Maximal Atomic irRedundant Sets: a Usage-based Dataflow Partitioning Algorithm.
Corentin Ferry, Sanjay Rajopadhye and Steven Derrien.
[Abstract]
Programs admitting a polyhedral representation can be transformed in many ways for locality and parallelism,
notably by loop tiling. Data-flow analysis can then compute dependence relations between iterations and
between tiles. When tiling is applied, certain iteration-wise dependences cross tile boundaries, creating the
need for inter-tile data communication. Previous work computes this communication as the flow-in and flow-out
sets of iteration tiles.
In this paper, we propose a partitioning of the flow-out of a tile into the maximal sets of iterations that
are entirely consumed and incur no redundant storage or transfer. The computation is described as an
algorithm and performed on a selection of polyhedral programs. We then suggest possible applications of this
decomposition in compression and memory allocation. (An illustrative flow-out example follows this entry.)
[Slides]
[Paper]
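The following is not from the paper; it is a minimal sketch of what a flow-out set is, for a 1-D loop with a
uniform dependence i -> i+1 and tiles of size T (all values made up), computed by plain enumeration:

    # Illustrative only: flow-out of a tile in a 1-D loop with a uniform
    # dependence i -> i+1, computed by plain enumeration.
    N, T = 16, 4                                         # made-up iteration count and tile size
    deps = [(i, i + 1) for i in range(N) if i + 1 < N]   # producer -> consumer pairs

    def tile_of(i):
        return i // T

    def flow_out(t):
        # Iterations of tile t whose values are consumed by another tile.
        return sorted({src for src, dst in deps if tile_of(src) == t and tile_of(dst) != t})

    for t in range(N // T):
        print(f"tile {t}: flow-out = {flow_out(t)}")
    # Here each tile's flow-out is its last iteration (e.g. tile 0 -> [3]),
    # except the final tile, whose values are never consumed outside it.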
-
Algebraic Tiling.
Clément Rossetti and Philippe Clauss.
[Abstract]
In this paper, we present ongoing work whose aim is to propose a new loop tiling technique where tiles are
characterized by their volumes (the number of embedded iterations) instead of their sizes (the lengths of
their edges). Tiles of quasi-equal volumes are dynamically generated while the tiled loops are running,
whatever the original loop bounds, which may be constant or depend linearly on surrounding loop iterators.
The adopted strategy is to successively and hierarchically slice the iteration domain into parts of
quasi-equal volumes, from the outermost to the innermost loop dimensions. Since the number of such slices can
be chosen exactly, quasi-perfect load balancing is reached by choosing, for each parallel loop, the number of
slices to be equal to the number of parallel threads, or to a multiple of this number. Moreover, the approach
avoids partial tiles by construction, thus yielding a perfect covering of the iteration domain and minimizing
the loop control cost. Finally, algebraic tiling makes dynamic scheduling of the parallel threads largely
unnecessary for the handled parallel tiled loops. (A small volume-based slicing sketch follows this entry.)
[Slides]
[Paper]
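As an illustration only (not the authors' algorithm), the following sketch slices the outer loop of the
triangular domain {(i, j) : 0 <= i < N, 0 <= j <= i} into P chunks of quasi-equal volume, i.e. quasi-equal
numbers of (i, j) iterations rather than equal widths; N and P are made-up values:

    # Illustrative only: cut the outer loop so each slice covers roughly
    # the same number of (i, j) iterations of the triangular domain.
    N, P = 100, 4                          # made-up problem size and thread count
    rows = [i + 1 for i in range(N)]       # inner trip count for each outer iteration i
    total = sum(rows)                      # N*(N+1)//2 iterations in the whole domain

    bounds, acc, target, k = [0], 0, total / P, 1
    for i, r in enumerate(rows):
        acc += r
        if acc >= k * target and len(bounds) < P:
            bounds.append(i + 1)           # cut the outer loop after iteration i
            k += 1
    bounds.append(N)

    for t in range(P):
        lo, hi = bounds[t], bounds[t + 1]
        print(f"slice {t}: i in [{lo}, {hi})  volume = {sum(rows[lo:hi])}")
    # Volumes come out near 1250 each; equal-width slices of 25 outer iterations
    # would give volumes 325, 950, 1575 and 2200, i.e. badly unbalanced work.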
-
Automatic Algorithm-Based Fault Tolerance (AABFT) of Stencil Computations.
Louis Narmour, Steven Derrien and Sanjay Rajopadhye.
[Abstract]
In this work, we study fault tolerance against transient errors, such as those occurring due to cosmic
radiation or hardware component aging and degradation, using Algorithm-Based Fault Tolerance (ABFT). ABFT
methods typically work by adding some additional computation in the form of invariant checksums which, by
definition, should not change as the program executes. By computing and monitoring checksums, it is possible
to detect errors by observing differences in the checksum values. However, this is challenging for two key
reasons: (1) it requires careful manual analysis of the input program, and (2) care must be taken to carry
out the checksum computations efficiently enough for the scheme to be worthwhile. Prior work has shown how to
apply ABFT schemes with low overhead for a variety of input programs. Here, we focus on a subclass of
programs called stencil applications, an important class of computations found widely in various scientific
computing domains. We propose a new compilation scheme to automatically analyze and generate the checksum
computations. To the best of our knowledge, this is the first work to do so in a compiler. We show that
low-overhead code can be generated easily and provide a preliminary evaluation of the trade-off between
performance and effectiveness. (A toy checksum-invariant example follows this entry.)
[Slides]
[Paper]
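As a toy illustration of the checksum idea (not the paper's compilation scheme), consider a 1-D averaging
stencil with periodic boundaries: the array sum is preserved by every sweep, so it can serve as an invariant
checksum whose violation flags a transient error.

    # Toy ABFT-style invariant checksum (illustrative only). For the sweep
    # new[i] = (old[i-1] + old[i] + old[i+1]) / 3 with periodic boundaries,
    # the array sum is preserved, so it can be monitored as a checksum.
    import random

    def sweep(a):
        n = len(a)
        return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3.0 for i in range(n)]

    a = [random.random() for _ in range(1000)]
    checksum = sum(a)                      # invariant carried alongside the data

    for step in range(100):
        a = sweep(a)
        if step == 42:
            a[7] += 1.0                    # inject a transient error (stand-in for a bit flip)
        if abs(sum(a) - checksum) > 1e-6 * abs(checksum):
            print(f"checksum mismatch detected at step {step}")
            break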
Lunch (13:00 — 14:00)
Session 2: Performance optimization (14:00 — 14:50)
-
Kernel Merging for Throughput-Oriented Accelerator Generation. [25min + 5min Q&A]
Nicolas Derumigny, Louis-Noël Pouchet and Fabrice Rastello.
[Abstract]
Progress in High-Level Synthesis has enabled programmers to quickly explore design spaces for
high-performance accelerators (e.g., to explore trade-offs between coarse-grain and fine-grain hardware
parallelism). However, resource sharing opportunities are often under-exploited by HLS tools, especially in
coarse-grain pipelined designs. In this work, we target the issue of generating multi-purpose yet efficient
pipelined accelerators, demonstrating our approach on sequences of dense linear algebra computations. We
develop polyhedral program analysis to generate the accelerator structure, as well as its profitability
criteria. In particular, we leverage cross-loop compute unit reuse to create coarse-grain pipelined designs
suited for batched execution of sequences of operations.
[Slides]
[Paper]
-
Superloop Scheduling: Loop Optimization via Direct Statement Instance Reordering. [15min + 5min Q&A]
Cedric Bastoul, Alain Ketterlin and Vincent Loechner.
[Abstract]
Loop optimization in the polyhedral model is supported by the expressiveness of affine scheduling functions
to model statement iteration ordering. Discovering the best scheduling remains a grand challenge, reinforced
by the need to build affine functions. Automatic techniques based on solving systems of affine constraints
and/or composing affine scheduling primitives are limited either by the affine modeling or by their primitive
set.
In this paper we propose a radically different approach to affine scheduling construction called superloop
scheduling. We leverage basic-block-level instruction reordering techniques and the capacity to agglomerate
statement instances into "super" loops offered by modern trace analysis tools. This approach enables the
deepest possible reasoning about instruction ordering and a global approach to loop optimization, e.g., where
tiling and intra-tile optimization are considered along with other transformations.
[Slides]
[Paper]
Session 3: DSLs and MLIR (14:50 — 15:30) [15min + 5min Q&A]
-
PolyLingual: a Programmable Polyhedral Scheduler.
Tom Hammer and Vincent Loechner.
[Abstract]
In this short paper, we present PolyLingual,
a DSL and compiler to C source code, for easy generation
and fast prototyping of new polyhedral schedulers.
[Slides]
[Paper]
-
Building a Static HLS Pass with FPL.
Kunwar Shaanjeet Singh Grover, Arjun Pitchanathan, Julian Oppermann, Mike Urbach and Tobias Grosser.
[Abstract]
Compiler infrastructure like LLVM/MLIR provides modular and extensible building blocks for reuse in compiler
development. The previously existing polyhedral building blocks depended on external solvers. Due to the
perceived cost of depending on such external tools, these building blocks were used only in fully polyhedral
pipelines. For hybrid or one-off use cases, selective reimplementation has been preferred over taking the
dependency; several reimplementations of sub-polyhedral solvers now exist in the ecosystem. With the
introduction of a fast Presburger library (FPL) in MLIR, many core polyhedral building blocks are now
available in-tree. As a result, it is now possible to use polyhedral techniques within the MLIR-based CIRCT
project. Using the development of a static high-level synthesis (HLS) pass in CIRCT as a case study, we show
the impact of FPL's availability upstream.
[Slides]
[Paper]
Break (15:30 — 16:00)
Session 4: Benchmarks and Internals (16:00 — 16:40) [15min + 5min Q&A]
-
GeMS: Towards Generating Millions of SCoPs.
Venkatakeerthy S, Nilesh Shah, Anilava Kundu, Shikhar Jain and Ramakrishna Upadrasta.
[Abstract]
We propose an automatic polyhedral loop generator for Generating Millions of SCoPs/loops (GeMS).
In this presentation, we first give the motivation for and an outline of our generator. Using GeMS, we
generate a labeled set of compute-bound and memory-bound kernels by imposing constraints on cache locality.
Our initial loop generator is able to generate 24K kernels for a given configuration. We then illustrate the
characteristics of the generated kernels by studying their IPC statistics. Our experimental evaluation uses
various input parameter sizes on two machines with different cache configurations.
We conclude with a listing of possible applications where GeMS could be useful. We also suggest ways to
extend GeMS using various core polyhedral compilation libraries and data structures.
[Slides]
[Presentation-only]
-
Which is the Best Farkas Multipliers Elimination Method?
Nassim Tchoulak, Adilla Susungi, Gianpietro Consolaro, Harenome Razanajato, Nelson Lossing, Zhen Zhang,
Cedric Bastoul, Corinne Ancourt and Renwei Zhang.
[Abstract]
The projection method for polyhedra and the Fourier-Motzkin elimination method are existing approaches to
eliminate Farkas multipliers before passing a constraint system to an ILP solver. This paper focuses on
experimenting with variations of this elimination process in order to identify which could be the best
method. We assess the impact of such variations on criteria such as the number of constraints generated and
the scheduling time when coupled with a variety of solvers. (A minimal Fourier-Motzkin sketch follows this
entry.)
[Slides]
[Presentation-only]
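For readers unfamiliar with the elimination step, here is a minimal, illustrative Fourier-Motzkin projection
over rational inequalities (not one of the variations studied in the paper): it removes one variable, e.g. a
Farkas multiplier, from a system of constraints of the form a.x <= b.

    # Minimal Fourier-Motzkin step (illustrative only; rational arithmetic, no
    # redundancy removal). Each constraint is (coefficients, bound), meaning
    # sum(coefficients[j] * x[j]) <= bound; eliminate() projects out x[k].
    from fractions import Fraction

    def eliminate(constraints, k):
        lower, upper, result = [], [], []
        for coeffs, b in constraints:
            c = Fraction(coeffs[k])
            if c > 0:    # upper bound on x[k]: normalize so its coefficient is +1
                upper.append(([Fraction(x) / c for x in coeffs], Fraction(b) / c))
            elif c < 0:  # lower bound on x[k]: normalize so its coefficient is -1
                lower.append(([Fraction(x) / -c for x in coeffs], Fraction(b) / -c))
            else:
                result.append(([Fraction(x) for x in coeffs], Fraction(b)))
        # Pair every lower bound with every upper bound; the x[k] coefficients cancel.
        for lc, lb in lower:
            for uc, ub in upper:
                result.append(([l + u for l, u in zip(lc, uc)], lb + ub))
        return result

    cons = [([1, -1], 0),   # x0 - x1 <= 0
            ([-1, 0], -1),  # -x0     <= -1  (i.e. x0 >= 1)
            ([1, 0], 5)]    #  x0     <= 5
    print(eliminate(cons, 0))  # projection onto x1 contains -x1 <= -1, i.e. x1 >= 1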
Session 5: Modelling and Sparse matrices (16:40 — 17:30)
-
Modelling linear algebra kernels as polyhedral volume operations. [25min + 5min Q&A]
Karl F. A. Friebel, Asif Ali Khan, Lorenzo Chelini and Jeronimo Castrillon.
[Abstract]
Linear algebra de facto dominates an entire branch of current hardware and software design efforts focused on
bringing faster and more efficient machine learning and signal processing kernels. For many years, compilers
have leveraged the well-defined semantics of linear algebra operations for optimization, with much recent
research centered on machine learning. Due to the structure of these operations, the polyhedral model is an
ideal fit for reasoning about linear algebra programs. In this paper, we present a related model in which
such programs are represented as sequences of operations on indexed volumes characterized by their
element-wise dependencies. We show how this model opens a way towards efficiently implementing compound
kernels by partitioning and specializing over index domains. We demonstrate how memory layout, partitioning
(e.g., tiling) and compile-time sparsity are exploited using this model.
[Slides]
[Paper]
-
Sparse Tetris: Reconstructing Sparse Matrices with Polyhedra. [15min + 5min Q&A]
Gabriel Rodriguez and Louis-Noel Pouchet.
[Abstract]
Sparse data structures are ubiquitous in modern computing to represent sets of nonzero-valued regions within
a dense coordinate system, typically when nonzero values are rare in the structure. This avoids storing
useless zero values and bypasses useless computations (e.g., multiplication by zero). Sparse tensors and
matrices are widely used in physics simulation and graph analytics, or may occur after sparsification of a
neural network weight matrix. A sparse representation of the set of nonzero coordinates in a data structure
can be built by inspection and compression into multiple arrays, as exemplified by the Compressed Sparse Row
(CSR) format. Such formats are not amenable to fast insertion or deletion of a nonzero element in the
structure: one or more array resizings, and a shift of all their data, are typically required. There exist
numerous scenarios where the sparse structure is mostly immutable over time (e.g., a road network), and only
the values associated with the nonzero coordinates, that is, their associated field(s), change during the
computation. (A minimal CSR example follows this entry.)
[Slides]
[Presentation-only]
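For readers unfamiliar with CSR, the following minimal sketch (not taken from the presentation) shows the
three-array encoding and why inserting a new nonzero is costly:

    # Minimal CSR encoding (illustrative only) of the 3x4 matrix
    #   [[5, 0, 0, 2],
    #    [0, 0, 3, 0],
    #    [1, 0, 0, 4]]
    vals    = [5, 2, 3, 1, 4]   # nonzero values in row-major order
    col_idx = [0, 3, 2, 0, 3]   # column of each nonzero
    row_ptr = [0, 2, 3, 5]      # row r owns vals[row_ptr[r]:row_ptr[r+1]]

    def get(r, c):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            if col_idx[k] == c:
                return vals[k]
        return 0

    print(get(2, 3))  # prints 4

    # Inserting a new nonzero at (0, 1) means shifting vals and col_idx and
    # bumping every later row pointer: the costly mutation noted in the abstract.
    vals.insert(1, 7)
    col_idx.insert(1, 1)
    row_ptr = [p if r == 0 else p + 1 for r, p in enumerate(row_ptr)]
    print(get(0, 1))  # prints 7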
Community news and closing notes (17:30 — 17:40)
Call For Papers and Presentations
We welcome both theoretical and experimental papers and presentations
on all aspects of polyhedral compilation. We also welcome submissions describing
preliminary results, crazy new ideas, position papers, experience
reports, education material, and available tools, with an aim to
stimulate discussions, collaborations, and advances in the field. The
following illustrate potential IMPACT papers:
- Thorough theoretical discussion of a preliminary idea with an attempt to
place it in context but no experimental results.
- Experimental results comparing two or more existing ideas, followed by a
detailed analysis.
- Presentation of an existing idea in a different way, including
illustrations of how the idea applies to new use cases, code,
architectures, etc. Attribution should be as clear as possible.
Topics of interest include, but are not limited to:
- program optimization (automatic parallelization, tiling, etc.);
- code generation;
- data/communication management on GPUs, accelerators and distributed systems;
- hardware/high-level synthesis for affine programs;
- static analysis;
- program verification;
- model checking;
- theoretical foundations of the polyhedral model;
- extensions of the polyhedral model;
- scalability and robustness of polyhedral compilation techniques.
Important Dates
Paper and Presentation-only Submissions
- Paper deadline: Thursday, November 24, 2022 (AoE); extended to Sunday, November 27, 2022 (AoE)
- Author notification: Monday, December 19, 2022
- Final version due: Thursday, January 12, 2023 (AoE)
- Workshop: Monday, January 16, 2023 (in conjunction with HiPEAC)
Submission
Paper Submissions
Paper submissions should not exceed 10 pages (recommended 8 pages), excluding
references, formatted as per ACM SIGPLAN proceedings format.
Short paper submissions, even with only 2 pages, are welcome as well.
Use version 1.54 or above of the following templates to prepare your manuscript:
https://www.acm.org/publications/proceedings-template
Make sure to use the "sigplan" subformat. Visit
http://sigplan.org/Resources/Author/
for further information on SIGPLAN manuscript formatting.
NOTE: older versions of the article template use smaller fonts for
submission, please double-check that you are using the recent style file
(in particular, various LaTeX distributions ship older versions of the
acmart style file, please download the most recent one from ACM).
Submissions should use PDF format and be printable on US Letter or A4
paper. Please submit your manuscripts through EasyChair, selecting "paper" as the topic:
https://easychair.org/conferences/?conf=impact2023
Proceedings will be posted online. If the final version of an accepted
paper does not sufficiently address the comments of the reviewers, then
it may be accompanied by a note from the program committee. Publication
at IMPACT will not prevent later publication in conferences or journals
of the presented work. However, simultaneous submission to IMPACT and
another workshop, conference, or journal is often prohibited by the
policies of those venues. For instance, a manuscript overlapping
significantly with an IMPACT submission cannot be submitted to PLDI 2023 or any other
overlapping SIGPLAN event.
Please make sure that at least one of the authors can attend the
workshop, if your work is accepted, to give a 25-minute presentation of
your work.
Presentation-only Submissions
Submit a summary of your presentation through EasyChair:
https://easychair.org/conferences/?conf=impact2023
Submissions have no formatting requirements but should use PDF format.
Choose the "presentation-only" topic when submitting.
The presentation should take no longer than 25 minutes to present.
Please make sure that at least one of the authors can attend the workshop
if your work is accepted.
Workshop Chairs
- Michael Kruse, Argonne National Laboratory, USA
- Ramakrishna Upadrasta, IIT Hyderabad, India
Program Committee
- Lorenzo Chelini, Intel, Switzerland
- Albert Cohen, Google, France
- Alexandre Eichenberger, IBM, USA
- Renato Golin, Intel, UK
- Tobias Grosser, University of Edinburgh, UK
- Vinod Grover, NVIDIA, USA
- Guillaume Iooss, INRIA, France
- Andreas Kloeckner, UIUC, USA
- Martin Kong, Ohio State University, USA
- Antoine Miné, Sorbonne Université, France
- Diana Picus, AMD, Sweden
- Arjun Pitchanathan, University of Edinburgh, UK
- Louis-Noël Pouchet, Colorado State University, USA
- Benoit Pradelle, Silexica, Germany
- Sanjay Rajopadhye, Colorado State University, USA
- P. Sadayappan, University of Utah, USA
- Daniele Giuseppe Spampinato, Huawei, Switzerland
- Sven Verdoolaege, Cerebras Systems, Belgium
- David G. Wonnacott, Haverford College, USA
- Jie Zhao, State Key Laboratory Zhengzhou, China
Contact Us
Please send any questions related to the IMPACT'23 workshop to
Michael Kruse or
Ramakrishna Upadrasta.