Programming with types
Anand Kumar Keshavan (with inputs from Nirmalya Sengupta)
The world of computer science moves at an excruciatingly slow pace. Foundational work for functional programming was done over fifty years ago, and yet ideas from the paradigm have entered mainstream programming only in recent years, with even the lumbering Java adopting some of its features (lambdas and streams) in version 8.
Meanwhile, the Haskell community (and related communities such as those around Idris and Liquid Haskell) has been advocating ideas based on type theory, where the goal is to reach a nirvana-like state: “if it compiles, it runs”. (The ideas behind this are also several decades old. Computer-science-minded readers are encouraged to read Types and Programming Languages by Benjamin C. Pierce, and the more mathematically oriented ones are directed to the "Curry-Howard isomorphism".)
Type Driven Development
None of this is a prerequisite, however, for understanding the motivation behind Type Driven Development. And no, don’t confuse it with the better-known and often poorly implemented Test Driven Development. That’s a different beast altogether.
The basic idea behind Type Driven Development is simply this: when implementing a solution in a statically typed language, one takes care to define and use types that model the behaviour of its entities and components. This lets the compiler frown whenever a programmer writes an expression that violates what the types allow. The result is a more robust solution, because the compiler prevents certain classes of errors from creeping in at run time. The final payoff: less time and effort spent on testing!
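To make this concrete, here is a minimal Java sketch. The OrderStatus enum and StatusDemo class are hypothetical, made up for illustration: the same check is written once against a plain String and once against a type that enumerates the legal values. Only the typed version lets the compiler reject a misspelt value.

```java
// Hypothetical example: an order status checked as a String and as a type.
enum OrderStatus { PLACED, SHIPPED, DELIVERED }

class StatusDemo {
    // Stringly typed: the compiler accepts any String, so a typo such as
    // "SHIPED" compiles and simply returns the wrong answer at run time.
    static boolean isShippedLoose(String status) {
        return "SHIPPED".equals(status);
    }

    // Typed: only OrderStatus constants can be passed, and a misspelt
    // constant such as OrderStatus.SHIPED is rejected by the compiler.
    static boolean isShipped(OrderStatus status) {
        return status == OrderStatus.SHIPPED;
    }

    public static void main(String[] args) {
        System.out.println(isShippedLoose("SHIPED"));       // prints false: the typo went unnoticed
        System.out.println(isShipped(OrderStatus.SHIPPED)); // prints true
    }
}
```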
To be clear, “If it compiles, it runs” doesn’t mean that the compiler will detect every error in a well-typed program at compile time. Testing doesn’t go away, but the time and effort spent on it, particularly on the verification flavour of testing, are reduced significantly.
Let me attempt to illustrate what all this means without delving into theoretical computer science or mathematical concepts such as type theory or proof theory. I assume you know the difference between statically and dynamically typed programming languages (a topic of perennial flame wars on the net). A statically typed language such as Java refuses to compile programs with type mismatch errors (such as passing a String to a function that expects an int). Dynamically typed languages such as JavaScript detect and report such errors only at run time. In the latter case, one has to add checks in the code to verify type correctness, and may need to write many more test cases to ensure that such type mismatches do not cause run-time errors.
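A small Java sketch of the difference, assuming that a method accepting Object is a fair stand-in for a dynamically typed function (TypingDemo and its methods are illustrative names). The statically typed version needs no checks, while the Object version has to verify the type itself and can only fail at run time.

```java
class TypingDemo {
    // Statically typed: the compiler guarantees the argument is a String,
    // so a call such as length(42) is rejected before the program runs.
    static int length(String s) {
        return s.length();
    }

    // "Dynamic" style: any value is accepted, so the method must check the
    // type itself, and a caller's mistake surfaces only at run time.
    static int lengthChecked(Object value) {
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("expected a String but got " + value);
        }
        return ((String) value).length();
    }

    public static void main(String[] args) {
        System.out.println(length("hello"));        // prints 5
        System.out.println(lengthChecked("hello")); // prints 5
        try {
            lengthChecked(42); // compiles fine, fails only when executed
        } catch (IllegalArgumentException e) {
            System.out.println("runtime error: " + e.getMessage());
        }
    }
}
```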
Time to roll up our sleeves
To understand how type-based thinking can reduce testing, let us take a simple example of a common problem in virtually any SaaS product today: creating error messages in many languages. (This is not a blog on internationalization; for that I refer you to an old but excellent piece by Joel Spolsky.) Please also note that the Java implementation below is only meant to illustrate a point; it is far from an ideal implementation.
Let us define the problem.
1. When we add a new error message, we must ensure that it is added to the list of error strings for every language (English, Spanish, and so on).
2. When we add a new language, we must ensure that its list of error messages is complete.
If we use a dynamically typed language like JavaScript, or even a statically typed language like Java with a map-based design such as the one below, the only way to ensure (1) and (2) is to write exhaustive test cases for each language and to add more test cases as new languages are added. When we add a new error, the Java compiler cannot ensure that it has been added to every language.
Here is a sample of this approach written in Java.
// ErrorSystem.java
public class ErrorSystem {
    public static String getErrorString(String lang, Error error) {
        // Compare strings with equals(); == only compares object references
        if ("English".equals(lang))
            return EnglishErrorMessages.getInstance().getErrorString(error);
        else if ("French".equals(lang))
            return FrenchErrorMessages.getInstance().getErrorString(error);
        // A language missing from this chain is only noticed at run time
        return "Unknown Language";
    }
}

// Error.java (shadows java.lang.Error inside this package)
public enum Error {
    PASSWORD_ERROR,
    USERNAME_ERROR,
}

// ErrorMessages.java
public interface ErrorMessages {
    String getErrorString(Error error);
}

// EnglishErrorMessages.java
import java.util.HashMap;
import java.util.Map;

public class EnglishErrorMessages implements ErrorMessages {
    private final Map<Error, String> errors = new HashMap<>();
    private static EnglishErrorMessages instance;

    private EnglishErrorMessages() {
        errors.put(Error.PASSWORD_ERROR, "English: Password Error");
        errors.put(Error.USERNAME_ERROR, "English: User name error");
    }

    // Lazily created singleton (not thread-safe; fine for this illustration)
    public static ErrorMessages getInstance() {
        if (instance == null) {
            instance = new EnglishErrorMessages();
        }
        return instance;
    }

    @Override
    public String getErrorString(Error error) {
        // Returns null if no message was registered for this error
        return errors.get(error);
    }
}

// FrenchErrorMessages.java
import java.util.HashMap;
import java.util.Map;

public class FrenchErrorMessages implements ErrorMessages {
    private final Map<Error, String> errors = new HashMap<>();
    private static FrenchErrorMessages instance;

    private FrenchErrorMessages() {
        errors.put(Error.PASSWORD_ERROR, "French: Password Error");
        errors.put(Error.USERNAME_ERROR, "French: User name error");
    }

    public static ErrorMessages getInstance() {
        if (instance == null) {
            instance = new FrenchErrorMessages();
        }
        return instance;
    }

    @Override
    public String getErrorString(Error error) {
        return errors.get(error);
    }
}
...and so on for other languages.
Important points to note here are:
- When we add a new error, say EMAIL_ERROR, the compiler has no way to determine whether a corresponding error string has been mapped for every language in the system. If FrenchErrorMessages does not define a string for EMAIL_ERROR, the errors.get(error) call returns null, which can only be detected at run time. To test for this, we need to add new test cases for each language whenever a new error is added to the system.
- Conversely, if a new language is added, say KlingOnErrorMessages, but is not wired into the if-else-if chain in ErrorSystem.getErrorString, requests for that language will simply return “Unknown Language” at run time. Again, a new set of test cases has to be written whenever a new language is added.
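The first of these points can be sketched as a test helper. This is a hypothetical, trimmed-down variant of the map-based design: AppError stands in for the Error enum (renamed so it does not shadow java.lang.Error), and plain maps replace the singleton classes. A helper like missingMessages is the kind of completeness check that has to live in a test suite, because the compiler cannot perform it for this design.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Error enum, with EMAIL_ERROR newly added.
enum AppError { PASSWORD_ERROR, USERNAME_ERROR, EMAIL_ERROR }

class MessageCompletenessCheck {
    // Returns "language/error" pairs that have no message registered.
    static List<String> missingMessages(Map<String, Map<AppError, String>> tables) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<AppError, String>> table : tables.entrySet()) {
            for (AppError error : AppError.values()) {
                if (table.getValue().get(error) == null) {
                    missing.add(table.getKey() + "/" + error);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<AppError, String> english = new EnumMap<>(AppError.class);
        english.put(AppError.PASSWORD_ERROR, "English: Password Error");
        english.put(AppError.USERNAME_ERROR, "English: User name error");
        english.put(AppError.EMAIL_ERROR, "English: Email error");

        Map<AppError, String> french = new EnumMap<>(AppError.class);
        french.put(AppError.PASSWORD_ERROR, "French: Password Error");
        french.put(AppError.USERNAME_ERROR, "French: User name error");
        // EMAIL_ERROR was forgotten here, and the code still compiles.

        Map<String, Map<AppError, String>> tables = new LinkedHashMap<>();
        tables.put("English", english);
        tables.put("French", french);
        System.out.println(missingMessages(tables)); // prints [French/EMAIL_ERROR]
    }
}
```

Every new error or language means remembering to keep tables like these, and the tests over them, in sync by hand.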
Let us see how a language with a richer type system and pattern matching can help us do this without writing any test cases. Following is a type-based equivalent in Scala, which, like Haskell and Idris, draws on the ML family of languages, and where advanced type features such as refinement types are under serious consideration.
//Errors.scala
sealed trait Error
final case class PasswordError() extends Error
final case class UserIdError() extends Error
final case class SomeOtherError() extends Error

//Languages.scala
sealed trait Language
final case class English() extends Language
final case class French() extends Language

//ErrorMessages.scala
object ErrorMessages {
  def getMessage(err: Error)(lang: Language): String = lang match {
    case English() => englishMessages(err)
    case French()  => frenchMessages(err)
  }

  def englishMessages(err: Error): String = err match {
    case PasswordError()  => "Password Error"
    case UserIdError()    => "UserId error"
    case SomeOtherError() => "Some Other Error"
  }

  def frenchMessages(err: Error): String = err match {
    case PasswordError()  => "French: Password Error"
    case UserIdError()    => "French: UserId error"
    case SomeOtherError() => "French: Some Other Error"
  }
}
If you add a new error case class, such as:
final case class EmailError() extends Error
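Then ErrorMessages.scala will not compile. (Strictly speaking, the compiler emits a “match may not be exhaustive” warning, but you can configure it to treat warnings as errors, for example with scalac’s -Xfatal-warnings option, which is the better practice.)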
Similarly, if you add a new language in Languages.scala:
final case class KlingOn() extends Language
Then ErrorMessages.scala will not compile, because the compiler warns you that you have missed the case KlingOn() in:
def getMessage(err: Error)(lang: Language): String = lang match {
  case English() => englishMessages(err)
  case French()  => frenchMessages(err)
}
The magic happens because the Error and Language traits are sealed. Declaring a trait sealed tells the compiler that all of its direct subtypes are defined in the same file, which lets it determine whether a pattern match has missed any case. Neat, isn’t it?
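For comparison, and assuming Java 14 or later, a switch expression over an enum gives a similar, if more limited, guarantee: with no default branch it must cover every constant, or the file does not compile. The enum and class names below are hypothetical and simply mirror the Scala example.

```java
// Sketch for Java 14+ (switch expressions); names are illustrative only.
enum UiLanguage { ENGLISH, FRENCH }

enum LoginError { PASSWORD_ERROR, USER_ID_ERROR }

class ExhaustiveMessages {
    static String message(LoginError err, UiLanguage lang) {
        return switch (lang) {
            case ENGLISH -> english(err);
            case FRENCH -> french(err);
            // Adding KLINGON to UiLanguage without a case here is a compile error.
        };
    }

    static String english(LoginError err) {
        return switch (err) {
            case PASSWORD_ERROR -> "Password Error";
            case USER_ID_ERROR -> "UserId error";
            // Adding EMAIL_ERROR to LoginError without a case here is a compile error.
        };
    }

    static String french(LoginError err) {
        return switch (err) {
            case PASSWORD_ERROR -> "French: Password Error";
            case USER_ID_ERROR -> "French: UserId error";
        };
    }

    public static void main(String[] args) {
        System.out.println(message(LoginError.PASSWORD_ERROR, UiLanguage.FRENCH)); // prints French: Password Error
    }
}
```

Enums are flat sets of constants; the closer Java analogue of a Scala sealed trait is a sealed interface (Java 17) matched with a pattern-matching switch (Java 21).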
You don’t have to write any test cases to check whether a language is missing, or whether an error message has been left out for any of the languages. This is what is meant by “If it compiles, then it runs”. However, if someone makes a spelling mistake in an error message, that still has to be checked manually. Some tests are also needed to catch logical errors, especially if one is in the habit of copying and pasting code. For example:
def getMessage(err: Error)(lang: Language): String = lang match {
  case English() => englishMessages(err)
  case French()  => frenchMessages(err)
  case German()  => frenchMessages(err) // programmer copy-pasted code!
}
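A test that catches this kind of copy-paste slip might look like the following hypothetical Java sketch (Java 14+ switch expressions; SiteLanguage, SignInError and looksCopied are illustrative names). The German branch reuses the French text, yet every case is covered, so it compiles; the check compares two languages' messages and flags them when they are all identical. Treat it as a heuristic: real translations can legitimately share a word or two, which is why it only fires when every message matches.

```java
// Hypothetical sketch (Java 14+); the GERMAN branch has a copy-paste bug.
enum SiteLanguage { ENGLISH, FRENCH, GERMAN }

enum SignInError { PASSWORD_ERROR, USER_ID_ERROR }

class CopyPasteCheck {
    static String message(SignInError err, SiteLanguage lang) {
        return switch (lang) {
            case ENGLISH -> "English: " + err;
            case FRENCH -> "French: " + err;
            case GERMAN -> "French: " + err; // copy-pasted: exhaustive, so it compiles
        };
    }

    // True if two different languages produce identical text for every error,
    // which almost certainly means one table was copied from the other.
    static boolean looksCopied(SiteLanguage a, SiteLanguage b) {
        if (a == b) return false;
        for (SignInError err : SignInError.values()) {
            if (!message(err, a).equals(message(err, b))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksCopied(SiteLanguage.GERMAN, SiteLanguage.FRENCH));  // prints true
        System.out.println(looksCopied(SiteLanguage.ENGLISH, SiteLanguage.FRENCH)); // prints false
    }
}
```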
Note: Scala lets you define a default (wildcard) case, case _ =>, similar to default in Java switch statements. Strong advice: do not use it. If you do, the benefits above are lost: the compiler no longer flags missing cases, any new error or language silently falls into the default branch at run time, and you are back to writing hundreds of test cases.
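The same pitfall can be shown in Java 14+ with a default branch in a switch expression, which plays the role of Scala’s case _ =>. FormError and DefaultCaseDemo are hypothetical names. EMAIL_ERROR has no French translation here, but because of the default branch the code compiles and the gap only shows up as a wrong message at run time.

```java
// Hypothetical sketch (Java 14+): a default branch hides a missing case.
enum FormError { PASSWORD_ERROR, USER_ID_ERROR, EMAIL_ERROR }

class DefaultCaseDemo {
    static String french(FormError err) {
        return switch (err) {
            case PASSWORD_ERROR -> "French: Password Error";
            case USER_ID_ERROR -> "French: UserId error";
            // Without this default the compiler would demand an EMAIL_ERROR case;
            // with it, the missing translation goes unnoticed until run time.
            default -> "Unknown error";
        };
    }

    public static void main(String[] args) {
        System.out.println(french(FormError.EMAIL_ERROR)); // prints Unknown error
    }
}
```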
Type-based abstractions can help you reduce your testing effort by an order of magnitude. But this is only the tip of the iceberg. With ever-increasing complexity in software, type-driven development using languages with advanced type systems may well become a necessity if we are to build large, scalable and robust systems.