Wednesday, November 11, 2009

ParaSail concurrent interfaces

A ParaSail interface can be marked as concurrent, meaning that an object of a type produced by instantiating such a concurrent interface can be accessed concurrently by multiple parallel threads.  We use the term concurrent type for a type produced by instantiating a concurrent interface, and concurrent object for an object of a concurrent type.

Both lock-based and lock-free synchronization techniques can be used to coordinate access to a concurrent object.  In addition, operation queues are provided to hold operation requests that cannot be serviced until the concurrent object attains a desired state, as specified by the dequeue condition.  When a thread initiates an operation request, if it can't be performed immediately because the condition is False, the thread is blocked until the request gets dequeued and the associated operation is completed, or the operation request is canceled for some reason, such as a timeout.   In general, multiple operation requests may be queued simultaneously on behalf of a single thread, with the first one to be dequeued causing the automatic cancellation of the others.
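The block-until-condition (and timeout-cancellation) part of this behavior can be approximated in other languages with a condition variable. Here is a minimal Python sketch; the class and method names are invented for illustration and do not model the full ParaSail mechanism (e.g. multiple simultaneous requests per thread):

```python
import threading

class QueuedRequest:
    """Illustrative sketch: a request blocks until its dequeue
    condition holds for the object's state, or is cancelled by
    a timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = 0  # internal state the dequeue condition inspects

    def update_state(self, value):
        # Any state change may satisfy a pending dequeue condition,
        # so wake all waiting requests to re-check their predicates.
        with self._cond:
            self._state = value
            self._cond.notify_all()

    def request(self, dequeue_condition, timeout=None):
        # Block until the condition holds for the current state;
        # return False if the request times out (i.e. is cancelled).
        with self._cond:
            return self._cond.wait_for(
                lambda: dequeue_condition(self._state), timeout)
```

A caller that issues `request(lambda s: s >= 3)` blocks until some other thread's `update_state` makes the predicate true, which mirrors a queued operation request waiting for its dequeue condition.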

With a lock-based concurrent interface, an operand of a procedure or function of the interface that is of the type defined by the interface can be marked locked, indicating it should be locked upon call: the operand's lock is acquired automatically upon entry to the operation, and released upon return.  Alternatively, one operand of a concurrent interface operation may be marked queued, meaning that the dequeue condition must be True before the operation will be performed, and when it is performed the operand will be locked automatically as well.
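A locked operand behaves much like a method that acquires its object's lock on entry and releases it on return. As a hedged Python analogy (the `locked` decorator and `Counter` class are invented for illustration, not ParaSail constructs):

```python
import threading
from functools import wraps

def locked(method):
    """Illustrative analogue of a 'locked' operand: the object's lock
    is acquired automatically on entry and released on return."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:                  # acquired upon call ...
            return method(self, *args, **kwargs)
    return wrapper                        # ... released upon return

class Counter:
    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    @locked
    def increment(self):
        # While the lock is held, components may be referenced
        # directly: access is single-threaded at this point.
        self._count += 1
        return self._count
```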

Within the concurrent class implementing a lock-based concurrent interface, the components of an operand marked locked or queued may be referenced directly, knowing that the access is single-threaded at that point.  If there are other (non-locked/queued) operands of the associated concurrent type, their non-concurrent components may not be referenced by the operation.  An operation with a queued operand has the further assurance that at the point when the operation starts, the dequeue condition is True.  The dequeue condition for a given operation may depend on the values of the non-concurrent parameters, plus the internal state of the (concurrent) operand marked queued.

Within the concurrent class implementing a lock-free concurrent interface, all data components must themselves be concurrent objects, since no locking is performed on entry to the operations of the class.  Typically these components will be of low-level concurrent types, such as a single memory cell that supports atomic load, atomic store, and atomic compare-and-swap (CAS), or a somewhat higher-level concurrent type that supports multi-word compare-and-swap (MCAS).  The ParaSail standard library provides a small number of such lower-level concurrent types to support implementing lock-free concurrent interfaces.  The higher-level queued operations are implemented using a lower-level race-free and lock-free concurrent type similar to a private semaphore, which can be used directly if desired.
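For readers unfamiliar with CAS, here is a Python sketch of the classic compare-and-swap retry loop. The `AtomicCell` class merely simulates a hardware atomic cell (its internal lock models the atomicity of a single machine instruction); the increment algorithm built on top of it is itself lock-free:

```python
import threading

class AtomicCell:
    """Stand-in for a hardware atomic memory cell; the internal lock
    only models the atomicity of one load/store/CAS instruction."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Atomically: if the cell holds `expected`, store `new`
        # and report success; otherwise leave it unchanged.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def lock_free_increment(cell):
    # Classic CAS retry loop: read, compute, attempt to publish;
    # retry if another thread updated the cell first.
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old + 1
```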

Here is an example of a concurrent interface and corresponding concurrent class for implementing a simple bounded buffer:
concurrent interface Bounded_Buffer
  <Element_Type is Assignable<>;
   Index_Type is Integer<>> is
    function Create_Buffer(
      Max_In_Buffer : Index_Type {Max_In_Buffer > 0})
      -> Result : Bounded_Buffer;
      // Create buffer of given capacity

    procedure Put(Buffer : queued Bounded_Buffer; 
      Element : Element_Type);
      // Add element to bounded buffer; 
      // remain queued until there is room
      // in the buffer.

    function Get(Buffer : queued Bounded_Buffer)
      -> Element_Type;
      // Retrieve next element from bounded buffer;
      // remain queued until there is an element.
end interface Bounded_Buffer; 

concurrent class Bounded_Buffer
  <Element_Type is Assignable<>;
   Index_Type is Integer<>> is
    const Max_In_Buffer : Index_Type {Max_In_Buffer > 0};
    var Data : Array<Element_Type, Index_Type> := 
      Create_Array(1, Max_In_Buffer);
    var Next_Add : Index_Type 
      {Next_Add in 1..Max_In_Buffer} := 1;
    var Num_In_Buffer : Index_Type 
      {Num_In_Buffer in 0..Max_In_Buffer} := 0;
  exports
    function Create_Buffer(
      Max_In_Buffer : Index_Type {Max_In_Buffer > 0}) 
      -> Bounded_Buffer
      // Create buffer of given capacity
    is
        return (Max_In_Buffer => Max_In_Buffer);
    end function Create_Buffer;

    procedure Put(Buffer : queued Bounded_Buffer; 
      Element : Element_Type)  
      queued until 
        Buffer.Num_In_Buffer < Buffer.Max_In_Buffer 
      // Add element to bounded buffer;
      // remain queued until there is room
      // in the buffer.
    is
        Buffer.Data[Buffer.Next_Add] := Element; 
        // Advance to next element, cyclically.
        Buffer.Next_Add := 
         (Buffer.Next_Add mod Buffer.Max_In_Buffer) + 1;
        Buffer.Num_In_Buffer += 1;
    end procedure Put;

    function Get(Buffer : queued Bounded_Buffer)
      -> Element_Type
      queued until Buffer.Num_In_Buffer > 0 
      // Retrieve next element from bounded buffer;
      // remain queued until there is an element.
    is
         const Element := Buffer.Data[
           ((Buffer.Next_Add - Buffer.Num_In_Buffer - 1)
             mod Buffer.Max_In_Buffer) + 1];
         Buffer.Num_In_Buffer -= 1;
         return Element;
    end function Get;
end class Bounded_Buffer; 
This concurrent interface has one constructor, plus two queued operations, Put and Get.  The dequeue conditions are provided as part of the implementation of an operation with a queued operand.  Here the dequeue conditions are Buffer.Num_In_Buffer < Buffer.Max_In_Buffer for Put, and Buffer.Num_In_Buffer > 0 for Get.  These dequeue conditions ensure that when the operations are performed, they can proceed without an error.
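For comparison, here is a rough Python rendering of the same bounded buffer, in which the two queued-until conditions become `Condition.wait_for` predicates and `get` removes the element it returns. Names are translated freely (0-based indexing) and the class is illustrative, not a ParaSail artifact:

```python
import threading

class BoundedBuffer:
    """Rough Python analogue of the ParaSail Bounded_Buffer."""

    def __init__(self, max_in_buffer):
        assert max_in_buffer > 0  # Create buffer of given capacity
        self.max_in_buffer = max_in_buffer
        self.data = [None] * max_in_buffer
        self.next_add = 0
        self.num_in_buffer = 0
        self.cond = threading.Condition()

    def put(self, element):
        with self.cond:
            # queued until Num_In_Buffer < Max_In_Buffer
            self.cond.wait_for(
                lambda: self.num_in_buffer < self.max_in_buffer)
            self.data[self.next_add] = element
            # Advance to next element, cyclically.
            self.next_add = (self.next_add + 1) % self.max_in_buffer
            self.num_in_buffer += 1
            self.cond.notify_all()

    def get(self):
        with self.cond:
            # queued until Num_In_Buffer > 0
            self.cond.wait_for(lambda: self.num_in_buffer > 0)
            oldest = (self.next_add - self.num_in_buffer) \
                     % self.max_in_buffer
            element = self.data[oldest]
            self.num_in_buffer -= 1
            self.cond.notify_all()
            return element
```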

An example of a lock-free concurrent interface will be given in a later post.

Monday, November 9, 2009

ParaSail region-based storage management

In ParaSail, heap storage is broken up into regions which are generally deallocated as a whole, rather than one element at a time.  The concept of region-based storage management originated a decade or more ago, and was at the heart of the Cyclone programming language.  The region-based storage management in ParaSail is a bit different, but is certainly inspired by all this earlier work. 

To prevent dangling references, regions in ParaSail are reference counted.  The reference counting is done at the region level rather than at the individual object level, since deallocation happens at the region-as-a-whole level.  For a similar reason, rather than counting each individual pointer into the region, we (effectively) just keep track of the number of other regions that might contain objects pointing into the given region.  Before storing a pointer from an object in region A to an object in region B, a check is made that region B is reachable from region A via one or more region-to-region "handles."  The region-to-region handles form a reachability graph.

The programmer decides when to create a region-to-region handle, in anticipation of creating one or more object-to-object pointers.  The operation for creating a region-to-region handle is "Allow_Reference(From_Region, To_Region)." After invoking this operation, objects in From_Region are allowed to point at objects in To_Region, or in other words, To_Region is reachable from From_Region.  This cannot be revoked -- the region-to-region handles once created are immutable, until the From_Region goes away as a whole.  Reachability is transitive, so after this call, anything reachable from the "To_Region" is now also reachable from the "From_Region."
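The handle graph and the pointer-store check can be sketched as follows; this is a toy model of the reachability rules, not ParaSail's actual run-time structures:

```python
class RegionGraph:
    """Toy model: Allow_Reference adds an immutable region-to-region
    handle; a pointer store is legal only if the target region is
    transitively reachable from the source region."""

    def __init__(self):
        self.handles = {}  # region -> set of directly reachable regions

    def add_region(self, name):
        self.handles.setdefault(name, set())

    def allow_reference(self, from_region, to_region):
        # Create a handle; it is never revoked while from_region lives.
        self.handles[from_region].add(to_region)

    def reachable(self, from_region, to_region):
        # Reachability is transitive: depth-first search over handles.
        seen, stack = set(), [from_region]
        while stack:
            r = stack.pop()
            if r == to_region:
                return True
            if r not in seen:
                seen.add(r)
                stack.extend(self.handles.get(r, ()))
        return False

    def check_pointer_store(self, from_region, to_region):
        # The check made before storing a pointer from an object in
        # from_region to an object in to_region.
        if not self.reachable(from_region, to_region):
            raise ValueError("no handle path from %s to %s"
                             % (from_region, to_region))
```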

Allow_Reference can create a loop in the reachability graph.  Because the region-to-region handles are immutable, once such a loop is created, all regions in the strongly connected subgraph containing From_Region and To_Region will necessarily be deallocated simultaneously (their lifetimes will end at the same moment).  From a reference-counting perspective, this means that the reference counting is really best done at the strongly-connected-subgraph level, and the possibly-cyclic region graph can be collapsed into a directed acyclic graph (DAG) between strongly-connected subgraphs of the original region graph.
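Collapsing the region graph into its strongly connected components can be done with a standard algorithm such as Tarjan's. The sketch below shows how regions joined by a handle cycle end up in a single component, whose members would share one lifetime and one reference count:

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm over a dict {node: [successors]}: regions
    whose handles form a cycle land in one component, and the graph
    of components is a DAG."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of a component: pop its members.
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            visit(v)
    return sccs
```

For instance, if region A has a handle to B and B has one back to A (plus one to C), then A and B form a single component while C stands alone.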

In addition to region-to-region handles, there are also scope-to-region handles associated with local scopes of the program.  To prevent pointers in local variables from ending up as dangling references, a pointer from a local variable to a region is only permitted if the region is reachable from the immediately enclosing program scope:  A region is (directly) reachable from a given program scope if there is a scope-to-region handle in that scope targeting that region.  Also, a region is reachable from a given scope if it is reachable from an enclosing scope.  The transitivity of reachability completes the picture in the usual way, so that if region A is reachable from region B, and region B is reachable from the current scope, then region A is reachable from the current scope.

As with region-to-region handles, scope-to-region handles are immutable, so that once a region is reachable from a given scope, it will remain so until the scope is exited.  The fundamental safety rule is that a region must not be deallocated while it is reachable from any still-active scope of the program.  Coupled with the requirement that local pointers may only point into reachable regions, this ensures that no local pointer can become a dangling reference.  Similarly, a pointer in a region A that is reachable from an active scope cannot become a dangling reference, because it can only point at an object in a region B if B is reachable from A and, transitively, from the same active scope.

An alternative way of describing this notion of scope-to-region handles is that each scope has its own local region, and scope-to-region handles are actually region-to-region, the difference being that they go from a local region to a non-local, reference-counted region.  Note that local regions would never be reference counted, since upon leaving the scope, they are always deallocated.  For similar reasons, one would never allow a handle or a pointer from a non-local region to a local region, nor from the local region of an enclosing scope to a local region of a nested scope.  So all in all, calling each scope a region may not actually simplify the model very much -- local regions would still have a number of special rules.

In addition to the two kinds of region handles we have described, there are two additional kinds of references that affect region lifetime -- strong references and weak references.   Both of these are internally a pair of a region handle and a pointer to an individual object.  The pointed-to object is guaranteed to be in a region reachable from the one designated by the handle. 

The region handle in a strong reference is reference counted, meaning that as long as the strong reference exists, the designated region and any region reachable from it will not be deallocated.  The big difference between the region handle in a strong reference and a normal region handle is that it is not immutable.  A strong reference can be a variable, and may be reassigned to point at a different region/object.  A normal region handle is immutable because it is "protecting" an arbitrary number of pointers that depend on it to prevent dangling references.  By contrast, the handle in a strong reference is only "protecting" the single pointer with which it is bound, so there is no problem allowing it to be reassigned in conjunction with reassigning the pointer.  To actually use a strong reference, the reference must be split into its constituent parts, producing a scope-to-region handle at the point of the use, along with a pointer protected by the scope-to-region handle in a local variable.  Once the strong reference has been split out, the strong reference could then be reassigned to point elsewhere, or to point nowhere (i.e. set to null).
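The split-and-reassign behavior can be modelled with an explicit reference count per region. The classes below are illustrative stand-ins, not ParaSail's actual representation; in particular, the scope-to-region handle produced by splitting is modelled as an extra retain that the caller releases on scope exit:

```python
class Region:
    """Toy reference-counted region, deallocated as a whole when
    its count reaches zero."""
    def __init__(self, name):
        self.name, self.count, self.live = name, 0, True

    def retain(self):
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.live = False  # region deallocated as a whole

class StrongRef:
    """Sketch of a strong reference: a counted region handle paired
    with a pointer; unlike a normal handle it may be reassigned."""
    def __init__(self, region, pointer):
        self.region, self.pointer = region, pointer
        region.retain()

    def split(self):
        # Produce a scope-level handle plus the raw pointer; the
        # handle keeps the region alive until the scope releases it.
        self.region.retain()
        return self.region, self.pointer

    def reassign(self, region, pointer):
        # Only this one pointer depends on the handle, so
        # reassignment is safe: release the old region, retain
        # the new one.
        self.region.release()
        self.region, self.pointer = region, pointer
        if region is not None:
            region.retain()
```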

A weak reference is similar to a strong reference, except that it does not prevent the designated region from being deallocated.  Whether the designated region has been deallocated is determined when the weak reference is split out.  If the result of splitting the weak reference is a null region handle and a null pointer, then the designated region has at some point been deallocated, presumably due to the underlying storage manager running low on overall available storage. 

A weak reference is useful for caching, by pointing to data that can be reconstructed when needed, but where reconstructing it is somewhat expensive -- for example, data coming from a file or a separate database, or data that is the result of a complex calculation.  The weak reference can be put in a cache of recently used results, which is checked first when a result is needed.  If a weak reference exists for the needed result, and after splitting turns out to be non-null, then the expense of reconstructing the result can be avoided.  If it is null, then the data must be reconstructed.
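Python's weakref module offers a direct analogue: "splitting" the weak reference corresponds to calling it, which yields either the live object or None. A small caching sketch, in which the ExpensiveResult class and the doubling computation are placeholders for some costly reconstruction:

```python
import weakref

class ExpensiveResult:
    """Stands in for data that is costly to reconstruct."""
    def __init__(self, value):
        self.value = value

_cache = {}  # key -> weak reference to a previously computed result

def compute(key):
    # Check the cache of weak references first; calling the weak
    # reference "splits" it: a live object, or None if the result
    # has been collected and must be reconstructed.
    ref = _cache.get(key)
    hit = ref() if ref is not None else None
    if hit is not None:
        return hit
    result = ExpensiveResult(key * 2)  # the expensive reconstruction
    _cache[key] = weakref.ref(result)
    return result
```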

More on regions in a later posting...