Designing ParaSail, a new programming language: September 2009

Wednesday, September 30, 2009

ParaSail extension, inheritance, and polymorphism

ParaSail fully supports object-oriented programming, including interface and class extension/inheritance/reuse, and both static (compile-time) and dynamic (run-time) polymorphism.

Interface extension is the most straightforward. One interface can be defined to extend another, meaning that it inherits all of the operations and generic parameters of some existing interface, and optionally adds more of each:

  interface Extensible_Array extends Array  is
     operator "[]"(Arr : ref Extensible_Array;
        Index : Index_Type {Index >= First(Arr)})
         -> ref Element_Type {Last(Arr) >= Index};
  end interface Extensible_Array;

Here we have essentially the same operation, but now the array automatically grows (at only one end) if the indexing operation is applied to it with an index that is greater than the prior value of Last(Arr). Note that the generic parameters for Extensible_Array are all inherited as is from Array.

The operations that are not redeclared in the new interface are inherited from Array, but with each occurrence of Array replaced systematically with Extensible_Array.

Unless Extensible_Array is declared as an abstract interface, there must be a corresponding Extensible_Array class that provides its default implementation. The Extensible_Array class might be defined as an extension of the Array interface (and indirectly its associated class), but it need not be. It could be a completely different implementation. Here is what it would look like if it were an extension of the Array interface.

  class Extensible_Array extends Parent: Array
  is
    ... 

  exports 
    operator "[]"(...) -> ... is ... end "[]"; 
    function Create_Array(...) -> Extensible_Array is ... end Create_Array;
  end class Extensible_Array;

Local operations and objects could be defined as needed, and exported operations could be redefined as appropriate. An object of the interface being extended (the underlying interface) is created as a local variable of the class (this variable is the underlying object). A name can be given to the underlying object -- we use "Parent" above.

Note that the class is extending an interface, rather than directly another class, and in fact any implementation of the interface can later be optionally specified when using Extensible_Array. Effectively the actual class to use as the implementation of the underlying interface becomes another optional generic parameter. If unspecified, it defaults to the default implementation of the underlying interface.

For those operations of the interface that are not provided explicitly in the class, an inherited operation is provided, based on the corresponding operation of the underlying interface. The implementation of this inherited operation invokes the corresponding operation of the Array interface, passing it the underlying Array object of each operand of type Extended_Array.

An exported operation that has any results of the type of the extended interface must always be explicitly defined, because the corresponding operation of the underlying interface returns the wrong type of object (that of the underlying interface), and there is no implicitly provided mechanism for creating an object of the extended interface given an object of the underlying interface. Hence we are obliged to explicitly define Create_Array, because in the underlying interface it returned a result of type Array, while to implement the extended interface it has to return a result of type Extended_Array. Of course inside the new Create_Array we could (and probably would) call the Create_Array operation of the underlying interface to create the underlying object, and then presumably take additional actions needed to finish creating an Extended_Array.

Although an interface may extend any number of other interfaces, a class can only extend one interface. Of course it may have local variables of many different interfaces, but only one of them is the underlying object, and only inherited operations corresponding to operations on the underlying interface will be automatically provided.

One important thing to note about how ParaSail implementation inheritance works. It is a completely black box. Each implicitly provided inherited operation calls the corresponding operation of the underlying interface, passing it the underlying objects of any operands of the type being defined. The underlying interface operation operates only on the underlying object(s), having no knowledge that it was called "on behalf" of some extended class. If the underlying operation calls other operations, they too are operating only on the underlying object(s). There is no "redispatch" on these internal calls, so the extended class can treat these underlying operations as black boxes, and not worry that if it explicitly defines some operations while inheriting others, that that might somehow interact badly with how the underlying operations are implemented.

Which brings us to polymorphism. Static, compile-time polymorphism has already been illustrated, through the use of generic parameters, and the name resolution rules. We also see it here in that the particular class implementing the underlying interface for an extended class can also vary depending on the particular instantiation of the extended class. Although we didn't mention the syntax for that, it makes sense for it to use the extends keyword as well at the point when we are providing the actual generic parameters:

    var XArr : Extended_Array 
       extends Sparse_Array := Create_Array(...);

Now what about dynamic, run-time polymorphism? When do we actually need it? A classic example is that of a heterogeneous tree structure, where all of the nodes in the tree are extensions of some basic Tree_Node type, but each is specialized to carry one particular kind of information or another. Static polymorphism is great for creating general purpose algorithms and data structures, but they are inevitably homogeneous to some degree. The algorithm manipulates numbers all of the same precision, or the data structure holds elements all of the same type. With dynamic polymorphism, heterogeneity rules.

In ParaSail, dynamic polymorphism is indicated by appending a "+" to a type name. What this means is that the corresponding object, parameter, or result might be of a type coming from any extension of the given type's interface, with the proviso that the generic parameters inherited from this original interface have the same bindings for the two types. That is important because, without that, the operations shared between the two types would not necessarily take the same types of parameters.

To illustrate, presume we give a name to the student-array type we have been using in some of the above examples:

  subtype Student_Array is Array<String, Student_ID>;

Now if we indicate a parameter of type "Student_Array+" then we can take parameters of type Student_Array but also objects of type "Extended_Array<String, Student_Id>" and "Sparse_Array<String, Student_ID>". Similarly, if we indicate a result of "Student_Array+", then the object returned might be of type Student_Array, or it might be of any other "similar" type. A Student_Array "factory" might typically specify its result type as "Student_Array+" to allow run-time flexibility in choosing which particular type of array-like structure to return.

So now we have a polymorphic Student_Array+ object, say "SA_Plus," what can we do with it? We can of course pass it to operations that take a parameter of type Student_Array+. But what about all those nice operations of Array? They take operations of type Array, not Array+. What happens if we try to pass SA_Plus to one of those, say "First(SA_Plus)" or "SA_Plus[Joes_Id]"? This is where dynamic, run-time polymorphism kicks in, and we go to the "appropriate" implementation of "First" or operator "[]", based on a run-time type identification carried around with objects of a polymorphic type.

If we happen to have a binary operation in our interface Array, for example Merge, which is defined to take two Array operands, and somehow combine them into a result Array, how does that work? The requirement is that either all or none of the operands are polymorphic, and if all are polymorphic, then they must have the same run-time type identification (henceforth "type-id"), and the result will also be polymorphic but will be guaranteed to similarly have that same type-id. Since all of the operands are required to have the same type-id, that type-id determines which particular implementation of the operation is actually invoked at run-time.

All of this only applies to operations declared in the interface defining the root type of the polymorphic type (Array in this example). Operations declared elsewhere that take some particular array type have no special semantics. But that makes sense because only the operations defined in the root interface are guaranteed to be present in all of the root interface's extensions.

What about operations that return an object of the enclosing interface type, but don't take any operands of the type? This would be typical for a constructor function, such as Create_Array()->Array, or for something resembling an enumeration literal like "Red" for a color type. Can such an operation be called in a "polymorphic" fashion? Yes, if it is used as an operand in a call and some other operand in the call is polymorphic, then the rule that requires all such operands to have the same type-id is used as a kind of self-fulfilling prophecy: the type-id to use for the call on Create_Array is that of the other operand. So if we use our hypothetical binary operation Merge and one of the operands is our polymorphic object SA_Plus, and the other is Create_Array(...), then we call the Create_Array associated with the type-id we get from SA_Plus, and then proceed to call Merge on the SA_Plus and the result of Create_Array, confident in the knowledge that the two operands will now have the same type-id.

This approach ensures that when you get into the implementation of a binary operation, the operands have the same type, and it is the type of the interface whose operation is actually being invoked.

That's probably enough for now!

Monday, September 28, 2009

Resolving names in ParaSail

In ParaSail, the names exported by an interface can in most cases be used without any explicit qualification of the interface from which they came. Hence, continuing with our Array interface example, we can generally write "First(Arr)" without having to specify which "First" function we mean, presuming we know the type of Arr. If qualification is necessary, the "::" notation of C++ is used, such as "Array::First(Arr)". ParaSail reserves "." for selecting a component of a composite object, such as "Arr.First".

ParaSail's name resolution works roughly as follows: Going bottom up, if you have a simple name (not used as the name of an operation being called), and that name is declared in the current scope, you use that interpretation. If you have an operation call, you determine the types of as many parameters as you can, and then look in the interfaces defining each of those types, for operations of the given name. If you find one or more that "work," then you consider all of those possibilities. If there are some operands which have no interpretations yet (perhaps because the operand is actually a parameterless call or an equivalent interface constant), then the type expected for that operand is used to help resolve the operand. If you get to the "top" and you have exactly one interpretation that "works," you use that one. If you have no interpretations, or you have more than one, the construct cannot be resolved, and an error is indicated. This sort of overload resolution sounds more complicated than it typically ends up being, because most constructs are relatively simple.

How a ParaSail class implements its interface

In ParaSail, a class implements an interface. Unless an interface is marked as abstract, there is required to be a class with the same name as the interface, which provides the default implementation of the interface. There may be multiple implementations of an interface. Some may be specializations of the default. Specializations specify certain restrictions on the generic parameters, along with a preference level, which allows the compiler to choose the appropriate implementation automatically if not specified explicitly.

Here is a class implementing the interface Array shown in an earlier post:

 class Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is 
   const First : Index_Type;
   const Last : Index_Type;
   function Length(Arr : Array) return Integer is
     return Last-First+1;
   end Length;
   var Data : Length(Array)*Element_Type;    
 exports  
   function First(Arr : Array) -> Index_Type is
     return Arr.First;
   end First;
   function Last(Arr : Array) -> Index_Type is
     return Arr.Last;
   end Last;
   operator "[]"(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type is
     return Data[Index-First];
   end "[]";
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last}
   is
     return (First => First, Last => Last, Data => ());
   end Create_Array ;
   ...
 end class Array;

A class must include implementations for each operation in its interface, plus any number of local declarations as necessary to implement the exported operations. The exported declarations inside a class follow the word exports. Declarations preceding exports are local to the class. Even though the interface associated with the class implies what operations are exported, we make the distinction explicit in the class itself, since changing the specification of an exported declaration has a much larger implication than does changing that of a local declaration. Also, by segregating the exported declarations at the end, we make it easier to find them. Finally, because a class might implement interfaces other than its own, the exported declarations allow it to fulfill the set of operations of other interfaces. C uses "extern" vs. "static" to make a similar distinction. Java uses "public" vs. "private".

Note that the above example is being pretty fast and loose with syntax and semantics. But at this point, ParaSail's design isn't complete yet, so any examples should be taken with a grain of salt. Nevertheless, it seems useful to try to produce examples. It is guaranteed that the examples won't all be consistent with one another, since the syntax and semantics will continue to evolve during this design process. And part of the reason for producing examples is to try out the evolving ideas, and see how they look. One good concrete example can often shatter or confirm many days of abstract philosophizing.

More on ParaSail interfaces

Here is the ParaSail example of an Array interface from the last installment:

 interface Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is
   function First(Arr : Array) -> Index_Type;
   function Last(Arr : Array) -> Index_Type;
   function Element(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last};
   ...
 end interface Array;

To make indexing into an Array look more familiar, we might want to use the "[]" operator instead of the function Element:

  operator "[]"(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;

Now we could use this interface roughly as follows:

   var Names: Array<String, Student_ID> := 
     Create_Array(First => First_Student_ID,
                  Last => Last_Student_ID);
   Names[ID_Of_Mike] := "Mike";

This kind of thing should look pretty familiar.

A few things to note: There is no separate notion of a constructor. A function that returns an object of the type defined by the enclosing interface is effectively a constructor. A function that returns a ref is not a constructor, but may be used on the left-hand side of an assignment, provided the ref parameter(s) to the function can be used on the left-hand side. Parameters are by default read-only, but ref parameters can be updated. The result of a function is specified after a "->". If the function has multiple results, they are enclosed in parentheses in the same way as the (input) parameters:

  function Two_Outputs(X : Input_Type) -> 
         (Y : Output_Type1; Z : Output_Type2);

Once we start talking about statements, we will see that in ParaSail, as in C++ and Java, declarations and executable statements may be interspersed. All declarations start with a reserved word, such as var, const, interface, subtype, etc. so they are easy to distinguish from assignment statements, procedure calls, etc.

Next time we will discuss how a class implements an interface.

Modules in ParaSail

Since ParaSail doesn't really exist at this point, I should probably write everything in the future tense. But instead I will use what my friends call the software present tense which is used to describe how it ought to work and hopefully will work someday, if that day ever comes...

Another thing I am somewhat uncomfortable with is using the first person ("I") so much in this description, so I will probably use "ParaSail" as the subject rather than "I" in many sentences. "We" seems a bit presumptious, though in a scholarly paper seems better than "I", even for a solo researcher. But in a programming language description, one really hopes that the language quickly comes to exist independently of any creator or creators. So (more than) 'nough said. On to the technical content:

ParaSail has only two kinds of modules: interface modules and class modules. A class module implements one or more interface modules. Perhaps a class ought to be called an implementation module, but that is an awfully long word, and in many ways a class module matches what languages like Simula and C++ and Java and C# and Eiffel call a class. In general ParaSail tries to use familiar terminology in familiar ways. Of course there is danger of confusion, but hopefully the benefits of familiarity outweigh the dangers of confusion.

Both interface and class modules are parameterized, essentially generic templates, as used in Ada, C++, Java, etc. Because of their wide-spread adoption, ParaSail uses "<...>" for generic parameters. While we are talking about notation, ParaSail uses "(...)" for normal function/procedure parameters, uses "[...]" for selecting from an array or similar container. ParaSail uses words for bracketing control flow (e.g. "if ... then ... else ... end if;"). ParaSail uses "{...}" for annotations such as constraints, assertions, pre/post-conditions, etc. This is intended to be familiar from Hoare logic, which uses the notation "{P} S {Q}" to represent a statement S with precondition P and postcondition Q.

Interface modules have the following basic syntax:

    interface name < generic_parameters >
    is
        interface_items     
    end interface name;

The interface_items comprise a sequence of function, procedure, and nested interface specifications. In addition constants and variables may be declared, but these are considered equivalent to corresponding functions. A constant is equivalent to a parameterless function returning an object of the constant's type. A variable is equivalent to a function with one by-reference parameter of the enclosing interface type, returning a by-reference object of the variable's type. In addition, operators may be declared. Operators are essentially functions with special syntax for calling them.

As subtly implied above, an interface defines a type, and within the interface module, the interface's name represents that type. A type is essentially an interface that has been bound to a set of actual generic parameters. There are also subtypes in ParaSail, which are types with some additional constraints (such as a range limitation), specified inside "{...}".

Generic parameters are themselves either types derived from specified interfaces, or static constants used to parameterize the implementation of the interface, and/or select among different implementations of the same interface. A generic array interface, with its generic parameter list, might look like this:

 interface Array < Element_Type is Assignable<>;
                    Index_Type is Discrete<>>
 is
   function First(Arr : Array) -> Index_Type;
   function Last(Arr : Array) -> Index_Type;
   function Element(Arr : ref Array;
     Index : Index_Type {Index in First(Arr)..Last(Arr)})
       -> ref Element_Type;
   function Create_Array(First, Last : Index_Type)
       -> Result:Array
           {First(Result) = First & Last(Result) = Last};
   ...
 end interface Array;

Note that there is no implicit parameter in ParaSail (identified as this or self in many object-oriented languages). All parameters are explicit. More about this example in the next installment.

Saturday, September 26, 2009

ParaSail language themes and philosophy

So what will make ParaSail an interesting programming language? What is the philosophy behind the language? ParaSail tries to minimize implicit operations, implicit parameters, implicit dynamic binding (virtual function calls), implicit initializations, implicit conversions, etc. This is both in the name of clarity for the human reader, and in the name of formal testability and verifiability. ParaSail uses a small number of concepts to represent all of the various composition mechanisms such as records, packages, classes, modules, templates, capsules, structures, etc. Arrays and more general containers are treated uniformly.

On the other hand, ParaSail allows many things to proceed in parallel by default, effectively inserting implicit parallelism everywhere. Parameter evaluation is logically performed in parallel. The language disallows uses that would make the result depend on the order or concurrency of parameter evaluation. The iterations of a for loop are by default executed in parallel. Explicit ordering must be specified if it is required by the algorithm. Even sequential statements are essentially converted into a data-flow based DAG which is then evaluated in parallel in so far as possible. In all cases, the language disallows code that could result in race conditions due to inadequately synchronized access to shared data, either by using per-thread data, structured safe synchronization, or a handoff semantics (similar to that of linear types, distributed languages like Hermes, or the UVM virtual memory system).

Of course much of this implicit parallelism is possible in pure functional languages, and ParaSail will support a functional programming style where it works naturally. Unfortunately, doing some relatively straightforward things in pure functional languages can require mind-bending (for me) concepts like monads, while a good old assignment statement is something that most programmers understand. With apologies to the political world, ParaSail will try to make assignments statements safe, legal, and rare.

Why blog about designing a programming language?

So why did I decide to start this blog? Designing a new language is a long process, but it is hard to find someone who is willing to sit down and discuss it. Those who like to design languages generally have their own strong biases, and the discussions tend to be more distracting than satisfying, because there are so many good answers to any interesting language design question. Those who don't have any interest in designing a new language tend to lack the patience to even talk about it, as they believe we already have more than enough programming languages, and all we need are better tools, better processes, and better training to be more productive and to achieve higher quality.

So writing a blog seems like a nice way to record the process, to perhaps get some feedback (though that may be optimistic), and to hopefully make progress by being forced to actually get the ideas down onto "paper."

That's probably enough meta-discussion for now. In the next post I plan to dive into the technical issues.

Friday, September 25, 2009

Why design a new programming language?

So why would anyone want to design a new programming language? For some of us who have the bug, it is the ultimate design project. Imagine actually creating the language in which you can express yourself. But there is another reason. I have been in the software business for over 40 years, and despite everything that might have been said to the contrary, I still believe that a well-designed programming language can result in more productive programmers building higher quality software. In the particular area of high-integrity software, including both safety-critical software and high-security software, there is all the more reason to use the very best programming language you can, because the problems you are trying to solve and the level of quality required is at the very limits of what can be accomplished.

This new language is meant to address the goals of producing inherently safe and secure software, while taking advantage of the wider availability of true parallel processing in the form of multi-core chips. It is intended to promote a formal approach to software, where the program text includes pre- and postconditions, liberal use of assertions and invariants, etc., with tool-supported proof of correctness with respect to the formal annotations.

The language is tentatively named ParaSail, for Parallel Specification and Implementation Language. I would have spelled it "ParaSAIL" but for the danger of confusion with the original Stanford AI Language, "SAIL," and its more modern follow-on "MAINSAIL" (for Machine Independent SAIL). I don't mind making the connection with SAIL, as it was a very interesting language in its day, and MAINSAIL remains worth a look today. ParaSail is a completely new language, but it steals liberally from other programming languages, including the ML series (SML, CAML, OCAML, etc.), the Algol/Pascal family (Algol, Pascal, Ada, Modula, Eiffel, Oberon, etc.), the C family (C, C++, Java, C#), and the region-based languages (especially Cyclone). Perhaps one significant deviation from the excellent baseline established by SAIL, ML, Eiffel, Java, etc. is that ParaSail is intended to avoid "fine-granule" garbage collection in favor of stack and region-based storage management.

So why a blog? I guess I'll leave that to the next post.

Designing ParaSail, a new programming language