Reflecting on Haskell in 2017

Alas, another year has come and gone. It feels like just yesterday I was writing the last reflection blog post on my flight back to Boston for Christmas. I’ve spent most of the last year traveling and working in Europe, meeting a lot of new Haskellers and putting a lot of faces to names.

Haskell has had a great year and 2017 was defined by vast quantities of new code, including 14,000 new Haskell projects on Github . The amount of writing this year was voluminous and my list of interesting work is eight times as large as last year. At least seven new companies came into existence and many existing firms unexpectedly dropped large open source Haskell projects into the public sphere. Driven by a lot of software catastrophes, the intersection of security, software correctness and formal methods have been become quite an active area of investment and research across both industry and academia. It’s really never been an easier and more exciting time to be programming professionally in the world’s most advanced (yet usable) statically typed language.

Per what I guess is now a tradition, I will write my end of year retrospective on my highlights of what happened in the Haskell scene in retrospect.

Writing

The amount of Haskell writing this year was vast and it’s impossible to enumerate all of the excellent writing that occurred. Some of the truly exceptional bits that were drafted in 2017 included:

Books & Courses

Vladislav Zavialov and Artyom Kazak set out to write a book on the netherlands of Intermediate Haskell development, a mythical place that we all seemingly pass through but never speak of again. The project is intuitively called Intermediate Haskell and is slated to be released in 2018

Bartosz Milewski finished writing Category Theory for Programmers which is freely available and also generated as PDF.

Brent Yorgey drafted a new course for teaching introduction to Computer Science using Haskell at Hendrix University. Dmitry Kovanikov and Arseniy Seroka also drafted a course for a wide variety of intermediate to advanced Haskell topics at ITMO university. Some of which are in Russian but nevertheless большое письмо!

GHC

The Glorious Glasgow Haskell Compiler had it’s 8.2 release, and landed to much rejoicing. Major features such as unboxed sum types landed as planned in GHC 8.2. There were many longstanding issues that were closed and many miscellaneous fixes contributed in 2017. For instance GHC now uses alternative linkers such as ld.gold or ld.lld instead of the system default ld.

Semigroup is now a superclass of Monoid.

There was quite a bit of work on GHC cross compilation to ARM. The new build system Hadrian has been in work for the past three years, has was finally been merged into the GHC tree.

The DevOps Group has officially started and is being funded to help maintain the infrastructure used to host Haskell packages and build GHC. The general push of the group has been toward using hosted CI services, Appveyor and CircleCI and a greater use of more transparent platforms such as Github for GHC development.

There is work on a major refactor of the AST types to use the new Trees that Grow research to allow GHC API user to extend the AST for their own purposes. Eventually this may allow the split of the AST types out of the ghc package, allowing tooling authors, Template Haskell users, and the compiler itself to use the same AST representation.

GHC is partially accepting pull requests on Github although most of the development still occurs on the mailing list and Phabricator.

There was significant engineering effort to allow GHC to produce deterministic build artifacts to play nicely with caching reproducible build systems such as Buck and Bazel. Previously the emitted .hi files would contain non-deterministic data such as hashes, timestamps and unique name supply counts.

Errors

GHC 8.2 added wonderful new colorful error messages with caret diagnostics for syntax and type errors:

Colorful errors GHC 8.2

Compact Regions

Support for ‘Compact Regions’ landed in GHC 8.2. Compact regions are manually allocated regions where the data allocated inside it are compacted and not traversed by the GC. This is amenable for long-lived data structures that are resident in memory without mutation frequently occurring.

The interface can be accessed through the ghc-compact modules and used to create compact malloc’d memory.

import Control.DeepSeq
import Control.Exception
import qualified Data.ByteString.Char8 as B
import Data.Compact

main = compact (B.pack ['a'..'c'])

The test suite inside GHC maintains the best illustrations of its use for complex non-traditional data structures.

Type.Reflection

GHC added a new more expressive Typeable mechanism using the Type.Reflection module. typeRep can be applied with explicit type applications to arrows and functions can checked for type-level equality after application.

λ> typeRep @(->)
(->) 'LiftedRep 'LiftedRep
λ> typeRep @(Maybe Int) == App (typeRep @Maybe) (typeRep @Int)
True

Coercible

Not new to GHC 8.2 although the base library now exposes the Coercible constraints allowing polymorphism over types which have the same runtime representations to be safely coerced at runtime. This can also be extended to polymorphic functions which take a compile-time proof of the equivalence of two runtime data layouts.

import Data.Coerce

newtype Meters = Meters Integer deriving (Num)
newtype Feet = Feet Integer deriving (Num)

f :: (Coercible a b, Num b) => a -> b -> b
f x y = coerce x + y
λ> f (Meters 3) 4
7
λ> f (Feet 3) 4
7
λ> f 4 (Feet 3)
Feet 7 

GHC tracks the usage (i.e. role) of type variables used as parameters as either nominal representational or phantom allowing types that differ only in a phantom type of nominal parameters to be safely coerced.

Join Points

Luke Maurer and Simon Peyton Jones merged new work on join points which modifies GHC Core to optimize for join points in code. In Core, a join point is a specially tagged function whose only occurrences are saturated tail calls. In the actual GHC Core AST, a join point is simple bit of metadata indicated by IdDetails of the binder.

Simon Peyton Jones presented the keynote at Haskell Exchange on his collaboration on Compiling without Continuation which present the ideas and core optimizations that are allowed by the new join points.

Deriving Strategies

In GHC 8.0 there were two alternative methods for automatic deriving of typeclass instances, using GeneralizedNewtypeDeriving and DeriveAnyClass. In addition there was also the wired-in deriving mechanisms for Show, Read, etc that were hardcoded into the compiler. These all used the same syntax and would conflict if multiple pragmas were enabled in a module.

The addition of DerivingStrategies allows us to disambiguate which deriving mechanism to use for a specific instance generation.

newtype Meters = Meters Int
  deriving stock    (Read, Show)
  deriving newtype  (Num)
  deriving anyclass (ToJSON, FromJSON)

Backpack

Edward Yang finished his PhD thesis on Backpack which was integrated in the GHC tree. The new branch adds support .bkp files, which specify abstract interfaces which can be instantiated in modules and used to construct Haskell modules which work polymorphically across multiple module instantiations.

For example an abstract string type can be written which operates over a module parameter `Str``:

unit abstract-str where
    signature Str where
        data Str
        len :: Str -> Int

    module AStr (alen) where
        import Str

        alen :: Str -> Int
        alen = len

We can create (contrived) instantiations of this module for lists of ints and chars which expose a polymorphic length function over both.

unit str-string where
    module Str where
        type Str = String

        len :: Str -> Int
        len = length

unit str-list where
    module Str where
        type Str = [Int]

        len :: Str -> Int
        len = length

The modules can then be opened as specific namespaces with the exported functions able to be called over both module types.

unit main where
    dependency abstract-str[Str=str-string:Str] (AStr as AStr.Int)
    dependency abstract-str[Str=str-list:Str] (AStr as AStr.String)

    module Main (main) where
        import qualified AStr.Int
        import qualified AStr.String

        main :: IO ()
        main = do
            print $ AbstractStr.String.alen "Hello world"
            print $ AbstractStr.Int.alen [1, 2, 3]

With the latest GHC this can be compiled with the --backpack which generates the sum of all the hi files specified in the .bkp file and resolves values at link-time.

$ stack exec -- ghc --backpack example.bkp

While the functionality exists today, I’m not aware of any large projects using Backpack. Bolting this functionality onto an ecosystem that has spent a decade routing around many of the problems this system aims to solve, poses a huge engineering cost and may take a while to crystallize.

Summer of Haskell

Google lacked vision this year and did not sponsor the Haskell Organization for Summer of Code. But the program proceeded regardless with private sponsorship from industrial users. Fifteen students were paired with mentors and many successful projects and collaborations resulted.

LLVM

The LLVM bindings for Haskell saw quite a bit of work this year and were forked into a new organization llvm-hs and added support for LLVM 4.0 - 5.1:

  1. llvm-hs
  2. llvm-hs-pure
  3. llvm-hs-pretty

In a collaboration with Joachim Breitner and myself at Zurihac a type-safe LLVM library which embeds the semantics of LLVM instructions into the Haskell type-system was written.

Siddharth Bhat started work on an STG to LLVM backend simplehxc before doing a rewrite in C++.

Moritz Angermann has been continuing to develop a Haskell library for emitting and reading LLVM Bitcode format as well as work on the llvm-ng backend which is a major rewrite of the GHC LLVM code generator.

Linear Types

Arnaud Spiwack prototyped a extension of GHC which augments the type system with linear types. Edsko de Vries wrote a detailed blog post about the nature of linearity and its uses.

The proposal extends the typing of functions to include linearity constraints on arrows, enforcing that variables or references are created and consumed with constrained reference counts. This allows us to statically enforce reference borrowing and allocations in the typechecker potentially allowing us to enforce lifetime constraints on closures and eliminating long-lived memory from being used with constructed unbounded lifetimes, thereby eliminating garbage collection for some Haskell functions.

For instance, use of the linear arrow (a ->. b) can enrich the existing raw memory access functions enforcing the matching of allocation and free commands statically. The multiplicity of usage is either 0, 1 or ω and the linear arrow is syntatic sugar for unit multiplicity are aliases for (:'1 ->).

malloc :: Storable a => a ->. (Ptr a ->. Unrestricted b) ->. Unrestricted b
read :: Storable a => Ptr a ->. (Ptr a, a)
free :: Ptr a ->. ()

New abstractions such as movable, consumable and dupable references can be constructed out of existing class hierarchies and enriched with static linearity checks:

class Consumable a where
  consume :: a ->. ()

class Consumable a => Dupable a where
  dup :: a ->. (a, a)

class Dupable a => Movable a where
  move :: a ->. Unrestricted a

instance Dupable a => Dupable [a] where
  dup [] = ([], [])
  dup (a:l) = shuffle (dup a) (dup l)
    where
      shuffle :: (a, a) ->. ([a], [a]) ->. ([a], [a])
      shuffle (a, a') (l, l') = (a:l, a':l')

instance Consumable a => Consumable [a] where
  consume [] = ()
  consume (a:l) = consume a `lseq` consume l

As part of the Summer of Haskell Edvard Hübinette used linear types to construct a safer stream processing library which can enforce resource consumption statically using linearity.

There is considerable debate about the proposal and the nature of integration of linear types into GHC. With some community involvement these patches could be integrated quite quickly in GHC.

LiquidHaskell

LiquidHaskell the large suite of tools for adding refinement types to GHC Haskell continued development and became considerably more polished. At HaskellExchange several companies were using it in anger in production. For example, we can enforce statically the lists given at compile-time statically cannot contain certain values by constructing a proposition function in a (subset) of Haskell which can refine other definitions:

measure hasZero :: [Int] -> Prop
hasZero [] = false
hasZero (x:xs) = x == 0 || (hasZero xs)

type HasZero = {v : [Int] | (hasZero v)}

-- Rejected
xs :: HasZero
xs = [1,2,3,4]

-- Accepted
ys :: HasZero
ys = [0,1,2,3]

This can be used to statically enforce that logic that consumes only a Just value can provably only be called with a Just with a isJust measure:

measure isJust :: forall a. Data.Maybe.Maybe a -> Bool
isJust (Data.Maybe.Just x)  = true 
isJust (Data.Maybe.Nothing) = false 

This year saw the addition of inductive predicates, allowing more complex properties about non-arithmetic refinements to be checked. Including properties about lists:

measure len :: [a] -> Int
len [] = 0
len (x:xs) = 1 + (len xs)

-- Spec for Data.List exports refined types which can be statically refined with
-- length constraints.
[] :: {v:[a]| len v = 0}
(:) :: _ -> xs:_ -> {v:[a]| len v = 1 + len xs}

append :: xs:[a] -> ys:[a] -> {v:[a]| len v = len xs + len ys}

The full library of specifications is now quite extensive and adding LiquidHaskell to an existing codebase is pretty seamless.

Foundation

Foundation is an alternative Prelude informed by modern design practices and data structures. It ships a much more sensible and efficient packed array of UTF8 points as its default String type. Rethinks the Num numerical tower , and statically distinguishes partial functions. Also has fledgling documentation beyond just

Last year Foundation was a bit early, but this year at Zurihac several companies in London reported using it fully in production as a full industrial focused Prelude.

Editor Tooling

Editor integration improved, adding best in modern tooling to most of the common editors:

Editor Haskell Integration

Atom

https://atom.io/packages/ide-haskell

Emacs

https://commercialhaskell.github.io/intero/

IntelliJ

https://plugins.jetbrains.com/plugin/8258-intellij-haskell

VSCode

https://marketplace.visualstudio.com/items?itemName=Vans.haskero

Sublime

https://packagecontrol.io/packages/SublimeHaskell

Monica Lent wrote a lovely post on the state of art in Vim and Haskell integration.

Rik van der Kleij has done an impressive amount of work adapting the IntellJ IDE to work with Haskell. Including a custom parser handling all of the syntax extensions, name lookup, Intero integration and integration with haskell-tools refactoring framework.

Development on the haskell-ide-engine has picked up again in the last few months.

Formal Methods

The DeepSpec, a collaboration between MIT, UPenn, Princeton and Yale, is working on a network of specification that span many compilers, languages and intermediate representations with the goal of achieving full functional correctness of software and hardware. Both Haskell and LLVM are part of this network of specifications. The group has successfully written a new formal calculus describing the GHC core language and proved it type sound in Coq. The project is called corespec and is described in the paper “A Specification for Dependent Types in Haskell”.

In addition the group also published a paper “Total Haskell is Reasonable Coq” and provided a utlity hs-to-coq which converts haskell code to equivalent Coq code.

The Galois group started formalizing the semantics of Cryptol, a compiler for high-assurance cryptographic protocols which is itself written in Haskell.

Michael Burge wrote a cheeky article about extracting a specification for a “domain specific” browser from Coq into Haskell.

Pragma Proliferation & Prelude

Writing Haskell is almost trivial in practice. You just start with the magic fifty line {-# LANGUAGE ... #-} incantation to fast-forward to 2017, then add 150 libraries that you’ve blessed by trial and error to your cabal file, and then just write down a single type signature whose inhabitant is Refl to your program. In fact if your program is longer than your import list you’re clearly doing Haskell all wrong.

In all seriousness, Haskell is not the small language it once was in 1998, it’s reach spans many industries, hobbyists, academia and many classes of people with different incentives and whose ideas about the future of the language are mutually incompatible. Realistically the reason why the Prelude and extension situation aren’t going to change anytime soon is that no one person or company has the economic means to champion such a change. It would be enormously expensive and any solution will never satisfy everyone’s concerns and desires. Consensus is expensive, while making everything opt-in is relatively cheap. This is ultimately the equilibrium we’ve converged on and barring some large sea change the language is going to remain in this equilibrium.

Haskell Survey

Taylor Fausak conducted an unofficial survey of Haskell users with some surprising results about widespread use of Haskell. Surprisingly there are reportedly 100 or more people who maintain 100,000 or more lines of Haskell code. Not so surprisingly most people have migrated to Stack while vim and emacs are the editors of choice.

While the majority of respondents are satisfied with Haskell the language the response are somewhat mixed the quality of libraries and the bulk of respondents reported Haskell libraries being undocumented , hard to use hard to find, and don’t integrate well.

Projects

Idris, the experimental dependently typed language, reached 1.0 release and became one of the larger languages which is itself written in Haskell.

The most prolific Haskell library Pandoc released its version 2.0.

Several other groups published new compilers in the Haskell-family of languages. Intel finally open sourced the Intell Haskell compiler which was a research project in more optimal compilation techniques. Morgan Stanley also released Hobbes a Haskell-like language used internally at the bank featuring several novel extensions to row-types and C++ FFI integration. A prototype Haskell compiler was also written in Rust.

The SMT solver integration library SBV saw a major rewrite of its internal and its old tactics system. The library is heavily used in various projects as an interface to Z3 and CVC4 solvers.

Uber released a library for parsing and analysis of Vertica, Hive, and Presto SQL queries.

Wire released the backend service to their commercial offering.

The Advanced Telematic Systems group in Berlin released a Quickcheck family library for doing for property testing of models about state machines. Bose also released Smudge a tool for doing development and analysis of large state machines for hardware testing.

Florian Knupfer released a new high-performance HTML combinator library for templating.

Galois continued with HalVM unikernel development this year, and several HalVM Docker Imagaes were published allowing a very convenient way to write and test code against HalVM.

Facebook released a Haskell string parsing library duckling which parses human input into a restricted set of semantically tagged data. Facebook also prototyped a technique for hot-swapping Haskell code at runtime using clever GHC API trickery.

Chris Done released vado a browser engine written in Haskell.

Jonas Carpay released apecs an entity component system for game development in Haskell.

Csaba Hruska continued work on a GRIN compiler and code generator, an alternative core language for compiling lazy functional languages based on the work by Urban Boquist.

Dima Szamozvancev released mezzo a library and embedded domain-specific language for music description that can enforce rules of compositionality of music statically and prevent bad music from being unleashed on the world.

Conal Elliot and John Wiegley advanced a novel set of ideas on Compiling with Categories which allows the bidirectional representation of Haskell functions as categorical structures. Although currently implemented as a GHC typechecker extension it is a promising research area.

Harry Clarke published a paper on Generics Layout-Preserving Refactoring using a reprinter. i.e. a tool which takes a syntax tree, the original source file, and produces an updated source file which preserves secondary notation. This can be used to build tools like haskell-tools for arbitrary languages and write complicated refactoring suites.

Joachim Breitner contributed a prolific amount of open source projects including a new technique (ghc-proofs) for proving the equivalence of Haskell programs using a GHC plugin, (veggies) a verified simple LLVM code generator for GHC, and (inspection-testing) new technique for verifying properties of Core.

  1. veggies
  2. ghc-proofs
  3. inspecction-testing

The ghc-proofs plugin allows us to embed equation and relations into our Haskell module and potentially allow GHC to prove them correct by doing symbolic evaluation and evaluating them as far as possible and checking the equational relations of the equations sides. Right now it works for contrived and simple examples, but is quite a interesting approach that may yield further fruit.

{-# OPTIONS_GHC -O -fplugin GHC.Proof.Plugin #-}
module Simple where

import GHC.Proof
import Data.Maybe

my_proof1 = (\f x -> isNothing (fmap f x))
        === (\f x -> isNothing x)
$ ghc Simple.hs
[1 of 1] Compiling Simple           ( Simple.hs, Simple.o )
GHC.Proof: Proving my_proof1 …
GHC.Proof proved 1 equalities

Haddock

Haddock is creaking at the seams. Most large Haskell projects (GHC, Stack, Agda, Cabal, Idris, etc) no longer use it for documentation. The codebase is dated and long standing issues like dealing with reexported modules are still open.

There is a large vacuum for a better solution to emerge and compatibility with RestructuredText would allow easy migration of existing documentation.

Databases

This year saw two new approaches to Haskell database integration:

  1. Selda - A library interacting with relational databases inspired by LINQ and Opaleye.
  2. Squeal - A deep embedding of PostgreSQL in Haskell using generics-sop.

This still remains an area in the ecosystem where many solutions end up needlessly falling back to TemplateHaskell to generate data structures and typically tend to be a pain point. Both these libraries use generics and advanced type system features to statically enforce well-formed SQL query construction.

Data Science & Numerical Computing

Chris Doran presented a concise interpretation of Geometric Algebra, a generalization of linear algebra in terms of graded algebra written in Haskell. A talk on this project was presented at the Haskell Exchange.

At ZuriHac several people expressed interest in forming a Data Haskell organization to work on advancing the state of Haskell libraries for data science and macing learning. There is considerable interest of the constructing the right abstractions for a well-typed dataframe library.

Tom Nielson is furiously working on a suite of of projects, mostly related to data science, machine learning and statistics in Haskell.

The Accelerate project has been seeing a lot of development work recently and now has support for latest versions of LLVM and CUDA through llvm-hs project.

A group at the University of Gothenburg released TypedFlow a statically typed higher-order frontend to TensorFlow.

Eta

Typelead continued working on a Java Virtual Machine backend to GHC called Eta. The project has seen considerable amount of person-hours invested in compatibility and the firm has raised capital to pursue the project as a commercial venture.

The project does seem to be diverging from the Haskell mainline language in both syntax and semantics which raises some interesting questions about viability. Although the Haskell language standard seems to be stuck and not moving forward, so there’s an interesting conundrum faced by those trying to build on top of GHC in the long run.

WebAssembly

WebAssembly has been released and is supported by all major browsers as of December 2017. This is been piquing the interest of quite a few Haskellers (including myself) who have been looking for a saner way to interact with browsers than the ugly hack of ejecting hundreds of thousands of lines of generated Javascript source. WebAssembly in either standalone projects or as a target from GHC’s intermediate form STG to the browser offers an interesting path forward.

  1. Haskell WASM
  2. WebGHC
  3. ministgwasm
  4. forest-lang

Industry

An incomplete list of non-consulting companies who are actively using Haskell or have published Haskell open sources libraries follows:

Product Companies
Standard Chartered
Galois
Intel
Target
Uber
Vente Privee
FrontRow
NStack
Morgan Stanley
Takt
Fugue
Habito
Asahi Net
Sentenai
IOHK
Awake Networks
Facebook
Adjoint
DigitalAsset
AlphaSheets
Channable
SlamData
Wire
JP Morgan
Bose

There are four consulting companies also supporting Haskell in industry.

Consultancies
Well-Typed
Tweag
FP Complete
Obsidian Systems

Conferences

Both the Haskell Exchange and ZuriHac conference had record attendance this year. Haskell was well-represented in many interdisciplinary conferences include StrangeLoop, CurryOn, LambdaDays, LambdaConf, and YOW.

Most of the videos are freely available online.

  1. Haskell Exchange Videos
  2. ComposeConf Videos
  3. ZuriHac Videos
  4. ICFP Videos

Haskellers by the lake in Rapperswil, Switzerland

In 2018 I have plans to visit Zurich, Sydney, Budapest, Amsterdam, San Francisco, Paris, Copenhagen and Tokyo this year. Hopefully I get to meet more Haskellers and share some ideas about the future of our beautiful language. Until then, Merry Christmas and Happy New Haskell Year.