New parallel programming languages, such as X10, make it easier to develop more reliable parallel code. However, despite the advances in parallel programming languages, developing parallel code remains challenging, with common bugs including atomicity violations, data races, and deadlocks. Software testing remains the most widely used approach for finding such bugs. This project proposes to parallelize systematic testing in and for X10. First, we plan to develop in X10 a parallel state-space exploration framework, which we will initially use for Java programs. Second, we plan to instantiate this framework for X10 programs, i.e., for systematic testing of programs written in X10. The benefits of the project are twofold. First, it will provide a state-space exploration framework in X10, which is interesting code in itself, e.g., as a performance benchmark. Second, it will provide a testing tool for X10, which can help in developing more reliable X10 programs and in teaching about X10.
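As an illustration of what systematic testing explores, the following minimal Java sketch enumerates every interleaving of a toy program in which each thread performs a non-atomic increment (read the shared counter, then write back the value plus one). The class `InterleavingExplorer` and its toy program are hypothetical and are not part of the proposed framework; the search here is sequential, whereas the project aims to run such an exploration in parallel.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: exhaustive depth-first exploration of thread interleavings.
final class InterleavingExplorer {

    // Explores all schedules of `threads` threads and returns every final
    // value of the shared counter that some schedule can produce.
    static Set<Integer> explore(int threads) {
        Set<Integer> finals = new TreeSet<>();
        dfs(0, new int[threads], new int[threads], finals);
        return finals;
    }

    // pc[t] is the next step of thread t: 0 = read, 1 = write, 2 = done.
    // local[t] is the value thread t read from the shared counter.
    private static void dfs(int shared, int[] pc, int[] local, Set<Integer> finals) {
        boolean anyEnabled = false;
        for (int t = 0; t < pc.length; t++) {
            if (pc[t] >= 2) {
                continue; // thread t has finished
            }
            anyEnabled = true;
            // Copy the state so that sibling branches do not interfere.
            int[] nextPc = pc.clone();
            int[] nextLocal = local.clone();
            int nextShared = shared;
            if (pc[t] == 0) {
                nextLocal[t] = shared;       // step 0: local = shared
            } else {
                nextShared = local[t] + 1;   // step 1: shared = local + 1
            }
            nextPc[t]++;
            dfs(nextShared, nextPc, nextLocal, finals);
        }
        if (!anyEnabled) {
            finals.add(shared); // all threads done: record the final state
        }
    }
}
```

For two threads, the exploration reports that the counter can end at either 1 or 2; the value 1 is the lost-update outcome that a single test run might never exhibit.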
We want to provide optimization and verification techniques for parallel programs that are on par with today's standards for sequential programs. We believe that the key enabler is to raise the level of abstraction of parallel programming.
We study parallel programs in the context of X10, a parallel language designed at IBM. Two of X10's key constructs for parallelism are async and finish. The async statement is a lightweight notation for spawning threads, while a finish statement (finish s) waits for termination of all async statement bodies started while executing the statement s. Additionally, X10 supports multidimensional distributed arrays.
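The behavior of async and finish described above can be imitated in plain Java. The sketch below is only an approximation under stated assumptions: `FinishScope` is a hypothetical helper, not X10 syntax or an X10 library class, and it uses one Java thread per async, which is heavier than X10's lightweight activities.

```java
import java.util.concurrent.Phaser;
import java.util.function.Consumer;

// Hypothetical Java imitation of X10's `finish { ... async { ... } ... }`.
final class FinishScope {
    // One party is registered up front for the body of the finish itself.
    private final Phaser phaser = new Phaser(1);

    // Stand-in for `async s`: run the body in a new thread tracked by this scope.
    void async(Runnable body) {
        phaser.register(); // registered before the spawning party arrives
        new Thread(() -> {
            try {
                body.run();
            } finally {
                phaser.arriveAndDeregister(); // signal termination of this async
            }
        }).start();
    }

    // Stand-in for `finish s`: run s, then wait for every async started while
    // executing s, including asyncs started by other asyncs.
    static void finish(Consumer<FinishScope> body) {
        FinishScope scope = new FinishScope();
        body.accept(scope);
        scope.phaser.arriveAndAwaitAdvance();
    }
}
```

A call such as `FinishScope.finish(s -> { s.async(taskA); s.async(taskB); })`, with `taskA` and `taskB` being arbitrary `Runnable` values, returns only after both tasks have terminated.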
We have rewritten a state-of-the-art plasma simulation program from Fortran 95 to X10. We have designed a core calculus with async and finish and shown that the semantics enables manageable proofs of key properties. We have designed a context-sensitive may-happen-in-parallel analysis that gives precise static information about the parallel behavior of a program. We have built a compiler for a subset of X10 1.5 and shown that for our plasma simulation program, our compiler produces code that is about 5,000 times faster than the code produced by IBM's X10 1.5 compiler.
Given the importance of performance tools for productive HPC development, this project aims to explore the challenges involved in enabling experimental performance analysis of X10 applications. Our exploration will leverage the existing Parallel Performance Wizard (PPW) system, one of the foremost performance tools that supports Partitioned Global-Address-Space (PGAS) programming models, and the Global Address Space Performance (GASP) interface, which specifies the interaction between a performance tool and a PGAS programming model implementation. By extending PPW and GASP to support X10’s unique features, we aim to provide a prototype performance tool supporting X10 application analysis. Such a tool would be of substantial benefit to X10 application developers, providing them with a variety of capabilities to capture and understand program performance in terms of X10 constructs. X10 provides high-level parallelism through abstractions that can hide the performance impact of some language constructs, which makes manual performance monitoring difficult. It is therefore particularly important for application developers to have access to tools that automatically collect and provide views into X10 application performance data. Successful completion of this project would constitute substantial progress toward bringing much-needed performance tool capabilities to the world of X10 application development.
Although many efforts have been made to improve the memory model and synchronization, atomicity violations that originate from buggy declarations of atomic blocks may still occur in X10 programs. The objective of this project is to develop a static approach to effectively and efficiently detect atomicity violations in X10 programs. In order to classify both “must” and “may” atomicity violations, we refine the unserializable patterns of shared-memory accesses by considering the dependencies between successive shared-memory accesses. We plan to devise a static analysis that produces sound detection results and eliminates as many false positives as possible. We will also use reduction techniques to ensure the scalability of our approach. The benefits of this project are as follows. First, it provides debugging support for X10 programs through the detection of atomicity violations. Second, since our approach does not rely on the declarations of atomic blocks, it can be applied to infer atomic blocks or eliminate unnecessary atomic blocks. Third, inter-activity data-flow analysis is extremely difficult due to the explosion of the state space of activity interleavings; this project partly concerns inter-activity data-flow analysis, and therefore has the potential to provide experience or a theoretical foundation in this field. Finally, our approach can be integrated with testing to provide more effective and efficient debugging support for X10 programs.
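To make the notion of an unserializable access pattern concrete, the sketch below classifies the commonly used three-access formulation: two consecutive accesses by one activity to a shared location, with one access by another activity interleaved between them. This is an assumption-laden baseline, not the refined, dependency-aware patterns this project proposes; the class `SerializabilityCheck` and its names are hypothetical.

```java
// Hypothetical sketch of the basic three-access serializability check.
final class SerializabilityCheck {
    enum Access { READ, WRITE }

    // `first` and `second` are consecutive accesses by one activity that are
    // intended to be atomic; `remote` is an access by another activity that is
    // interleaved between them. All three touch the same shared location.
    static boolean isUnserializable(Access first, Access remote, Access second) {
        if (remote == Access.READ) {
            // A remote read is harmful only if it observes the intermediate
            // value between two local writes (WRITE, READ, WRITE).
            return first == Access.WRITE && second == Access.WRITE;
        }
        // A remote write is harmless only if both local accesses are writes,
        // because the second local write then overwrites it (WRITE, WRITE, WRITE).
        return !(first == Access.WRITE && second == Access.WRITE);
    }
}
```

Of the eight possible triples, four are reported as unserializable under this formulation: read-write-read, write-write-read, read-write-write, and write-read-write, where the middle access is the remote one.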
The proposed project enhances the debuggability of X10 programs by designing, implementing, and evaluating a lightweight deterministic record and replay technique that works transparently with X10 programs through compiler-based program analysis and instrumentation. We seek both theoretical analysis and pragmatic treatments that faithfully reproduce problematic executions of X10 programs by effectively regulating all of the random inputs to the program. In addition, we plan to achieve practicality by designing a technique that incurs a low perceivable runtime overhead and requires minimal programmer intervention.
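The core idea of record and replay can be sketched in a few lines of Java. The example assumes that the only source of nondeterminism is a stream of integer inputs; the class `RecordReplay` is a hypothetical hand-written wrapper, whereas the proposed technique would insert such instrumentation through the compiler and would also have to handle other sources of nondeterminism such as scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical sketch: log nondeterministic inputs, then feed them back.
final class RecordReplay {
    private final IntSupplier source;   // live input source; null in replay mode
    private final List<Integer> log;    // values observed (record) or to return (replay)
    private int cursor = 0;             // next log position in replay mode

    private RecordReplay(IntSupplier source, List<Integer> log) {
        this.source = source;
        this.log = log;
    }

    // Record mode: pass values through from `source` and remember them.
    static RecordReplay record(IntSupplier source) {
        return new RecordReplay(source, new ArrayList<>());
    }

    // Replay mode: return the logged values in their original order.
    static RecordReplay replay(List<Integer> recorded) {
        return new RecordReplay(null, new ArrayList<>(recorded));
    }

    // The program under test calls this instead of the raw input source.
    int next() {
        if (source != null) {
            int value = source.getAsInt();
            log.add(value);
            return value;
        }
        if (cursor >= log.size()) {
            throw new IllegalStateException("replay log exhausted");
        }
        return log.get(cursor++);
    }

    // Snapshot of the values seen so far, suitable for a later replay.
    List<Integer> recorded() {
        return List.copyOf(log);
    }
}
```

A run that draws its inputs through `record(...)` can be repeated exactly by constructing `replay(run.recorded())` and executing the same program logic against it.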