Thursday, January 27, 2011

Philosopher examples

We have just posted two new examples to the ParaSail Google group -- a very simple dining philosophers example, and a more interesting drinking philosophers example, based on Chandy and Misra's 1984 ACM TOPLAS paper, "The Drinking Philosophers Problem."  The drinking philosophers example was more challenging, as it required orchestrating the use of two different concurrent data structures, one representing a philosopher's bar stool, and one representing the bottles being shared between the philosophers.

To see these examples, visit the ParaSail Google Group.
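As a rough analogue of the bottle-sharing idea -- emphatically not the posted ParaSail code, and with names invented purely for illustration -- here is a minimal C++ sketch where each bottle is a mutex-protected resource and a drinking session must acquire every bottle it needs at once.  Chandy and Misra's actual algorithm is cleverer, passing request tokens and tracking "clean" versus "dirty" bottles to guarantee fairness; this sketch omits all of that.

    // Minimal C++ analogue of the bottle-sharing idea; not the posted
    // ParaSail code.
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct Bottle { std::mutex m; };  // one shared, lockable resource

    // Each philosopher shares one bottle with each neighbor.  A drinking
    // session locks both bottles at once; std::scoped_lock acquires the
    // pair deadlock-free.
    void philosopher(int id, Bottle& left, Bottle& right, int rounds) {
        for (int r = 0; r < rounds; ++r) {
            std::scoped_lock session(left.m, right.m);  // both or neither
            std::printf("philosopher %d drinks (round %d)\n", id, r);
        }
    }

    int main() {
        const int n = 5;
        std::vector<Bottle> bottles(n);
        std::vector<std::thread> phils;
        for (int i = 0; i < n; ++i)
            phils.emplace_back(philosopher, i,
                               std::ref(bottles[i]),
                               std::ref(bottles[(i + 1) % n]), 3);
        for (auto& t : phils) t.join();
    }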

Tuesday, January 4, 2011

CUDA, OpenCL, Cilk, and ParaSail

NVIDIA, AMD, and Intel have all gotten interested in parallel programming languages over the past few years, and for good reason.  It is clear that all programmers will have to start learning how to take advantage of parallelism if they expect their programs to continue to benefit from Moore's law, with its wonderful doubling of computing power every two years.  And the chips that make that parallelism easier to exploit will be the more successful ones.

NVIDIA has been championing CUDA as the way to use their GPUs for general-purpose parallel processing (GPGPU), though they have also been part of the group designing OpenCL, the more hardware-independent approach first developed by Apple and now being standardized by the Khronos Group.  AMD (after swallowing ATI) started with CTM (Close to Metal) and some other approaches, but now seems to be putting most of its energy behind OpenCL for GPGPU programming.  Intel has been encouraging a number of different approaches, including Cilk, Threading Building Blocks (TBB), and Concurrent Collections (CnC), and is part of the OpenCL effort as well.

Cilk and OpenCL seem to be emerging as two important, rather distinct approaches to parallel programming.  Cilk makes it easy to spawn off threads to compute arbitrary expressions, and then uses a set of worker threads to service them, following the approach we discussed in an earlier post: threads spawned by a given worker are serviced LIFO, but when a worker runs out of threads, it steals from other workers using a FIFO strategy.  OpenCL, on the other hand, is focused on data parallelism, where many data items are all manipulated in nearly the same way.  It is roughly a SIMD (Single Instruction, Multiple Data) approach, though some variation in control flow is permitted, making it somewhat closer to the SPMD (Single Program, Multiple Data) approach fostered by the Connection Machine folks.  In OpenCL (and CUDA), the per-data-item code is bundled into a kernel, which is essentially a function that operates on a single data item in a large array or stream of such items.
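To make the contrast concrete, here is a small C++ sketch -- not actual Cilk or OpenCL code -- using std::async as a stand-in for Cilk-style spawning, and a per-element function applied across an array as a stand-in for an OpenCL kernel:

    #include <algorithm>
    #include <cstdio>
    #include <future>
    #include <vector>

    // Cilk-style task parallelism, approximated with std::async: spawn
    // one branch as a separate task, compute the other locally, then
    // join.  (A real Cilk runtime multiplexes spawns onto a fixed pool
    // of workers with work stealing, rather than creating raw threads.)
    long fib(long n) {
        if (n < 2) return n;
        auto left = std::async(std::launch::async, fib, n - 1);  // "spawn"
        long right = fib(n - 2);                                 // continue
        return left.get() + right;                               // "sync"
    }

    // OpenCL/CUDA-style data parallelism: the "kernel" is a function
    // applied independently to each element of a large array.
    void scale_kernel(float& x) { x *= 2.0f; }

    int main() {
        std::printf("fib(10) = %ld\n", fib(10));

        std::vector<float> data(1000, 1.0f);
        // Serial here; an OpenCL runtime would instead launch one
        // work-item per element across the device.
        std::for_each(data.begin(), data.end(), scale_kernel);
        std::printf("data[0] = %g\n", data[0]);
    }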

So how does this relate to ParaSail?  In some ways, Cilk and OpenCL can be seen as lower-level languages, where the programmer has to worry about identifying the particular places where threads should be spawned (in Cilk), or about extracting kernels from code that operates on an array or stream (in OpenCL).  ParaSail, on the other hand, is focused on expressing the algorithm at a relatively high level with pervasive opportunities for parallelism, presuming the compiler or run-time support routines can identify how best to map all of those opportunities onto the underlying parallel hardware.

One of the key advantages that ParaSail provides is that there is no hidden aliasing, unlike in C, C++, Java, etc., where aliasing is everywhere.  (Cilk and OpenCL are both piggy-backed on C/C++; various threading APIs are available to Java, C#, Python, etc.)  This makes it dramatically easier for a compiler or run-time to automatically partition computations into pieces to be executed in parallel.  Furthermore, the ParaSail compiler detects all race conditions (and various other problems) at compile time, while the language itself has no exceptions and allows memory to be partitioned into non-overlapping regions, all of which simplifies both the programmer's model and the process of automatic parallelization.
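A tiny sketch of the aliasing problem in C/C++ (the function names are invented for illustration): the compiler cannot safely parallelize or vectorize the first loop, because the two pointers might overlap; the second loop's __restrict qualifier is an unchecked, programmer-supplied promise, whereas ParaSail rules such aliasing out by construction.

    // The compiler must assume dst and src may overlap, so it cannot
    // freely reorder or parallelize this loop.
    void add(float* dst, const float* src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] += src[i];  // if dst aliased src + 1, order would matter
    }

    // __restrict (C99's restrict, as most C++ compilers spell it) is a
    // manual promise of no overlap; violating it is undefined behavior.
    // In ParaSail the no-aliasing guarantee is part of the language and
    // checked at compile time, so no such annotation is needed.
    void add_restrict(float* __restrict dst, const float* __restrict src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] += src[i];
    }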