Fork me on GitHub

Background

As I usually do every Sunday, I skim through Sergeys F# Weekly just to see if there are anything interesting happening in the F# Community.

This week I found Lucas Reis’ blog post really well written, educational and didactic, specially the visualization of final state machine representation.

What seem to tingle a bit my OCD was the implementation of the EventStore:

1
2
3
4
5
6
7
8
9
type EventStore() =
    let eventList =
        new ResizeArray<String * ScoutEvent>()
            
    member this.Save(name, events) =
        events |> List.iter (fun e -> eventList.Add(name, e))
                        
    member this.Get() =
        eventList

Problem by introducing OO data structures into F# (or OCaml)

As Lucas mention, you can just declare a type with () and define it’s members, and then you have a new data structure in F#. As with Lucas EventStore, I will point out the main issue by taking this approach. If we look into MSDN, we can see that ResizeArray is just a type abbreviation for a generic .NET list:

type ResizeArray<'T> = System.Collections.Generic.List<'T>

So my example will also made by using the built-in ResizeArray data structure:

1
2
3
4
let xs = new ResizeArray<int>()
    
Array.Parallel.init 1000 (fun i -> xs.Add i) |> ignore
xs |> Seq.reduce(fun x y -> x + y)

We can see that the final reduced sum is a non-deterministic as well as incorrect result:

> val it : int = 991456
> val it : int = 1490956
> val it : int = 1990456

So why is this happening? Well if you are used to work with the .NET platform, you might as well (if you actually read the documentation on MSDN) have seen the following text on the bottom of almost every Class definition, under the Thread Safety sections:

Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

The main point here is that .NET collections are not immutable and therefore don’t fit well with the functional paradigm that F# is mainly built-on, even though it has support for other paradigms as imperative and OO.

Build your data structures the right way

Is there a way to solve this self inflicted problem? Yes, we can create constrained types in F#, see Scott Wlaschin Gist in the References below for more information, where you can avoid exposing types from a module. They are accessible from inside the module, but not from code importing the module.

With this in mind, I will create an immutable array like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
module Immutable =
    type 'a iarray = private | T of 'a array with
        override ia.ToString() =
            ia |> function | T xs -> xs |> sprintf "%A"
            
    module Array =
        let init n f =
            Array.Parallel.init n f |> T
        let map f (T xs) =
            xs |> Array.Parallel.map f |> T
        let iter f (T xs) =
            xs |> Array.iter f
        let reduce f (T xs) =
            xs |> Array.reduce f
        let fold init f (T xs) =
            xs |> Array.fold f init
        let length (T xs) = xs |> Array.length
        let at i (T xs as ixs) =
            if i < 0 || i >= (length ixs) then
                failwith (sprintf "index: %i is out of boundries." i)
            else
                xs.[i]
        let append (T xs) (T ys) =
            Array.append xs ys |> T
                
        module Extra =
            let add x (T xs) =
                Array.append xs [| x |] |> T
            let pop (T xs as ixs) = length ixs |> function
                | 0 -> failwith "the array is empty."
                | 1 -> [|         |] |> T
                | n -> xs.[0 .. n-2] |> T

where the 'a iarray is visible from outside, while the single-case union constructor T is marked as private | T of 'a array therefore it can only be accessed from inside the module (and sub modules).

As you can see in the sub (and sub sub) modules, I’m just extracting the standard (and mutable) array type from the single-case union constructor and then using the built-in functions to perform the desired logic.

If you look carefully, I’m never exposing the underlying and mutable array, therefore, as I don’t allow any external piece of code to instantiate my type iarray unless it’s by using the init function, I can therefore argue that my data structure is sound to be used as an immutable F# data structure as the native built-in would be used.

ResizeArray vs iarray

Snippets:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
let foobar =
    Array.Parallel.init 1000 id
    |> Array.reduce(fun x y -> x + y)

let foo =
    let xs = new ResizeArray<int>()
    
    Array.Parallel.init 1000 (fun i -> xs.Add i) |> ignore
    xs |> Seq.reduce(fun x y -> x + y)

let bar =
    let xs = Immutable.Array.init 0 id
    
    Array.Parallel.init 1000 (fun i -> xs |> Immutable.Array.Extra.add i)
    |> Array.reduce(fun x y -> Immutable.Array.append x y)
    |> Immutable.Array.reduce (fun x y -> x + y)

Output:

>
val foobar : int = 499500
val foo : int = 304641
val bar : int = 499500

Functor modules as in OCaml

In my current implementation of iarray my additions to the array are in linear time, as a new array +1 needs to be allocated on another spot in memory, while my indexed access still is in constant time. So in the case that I was using this data structure for a lot of reads but very few inserts, it would be ideal, but what about if I had a lot of inserts but very few reads? Or what if I had more or less fifty/fifty on reads and writes? Well, in the case that I had a lot of writes and few reads, I would have used a standard built in list as the underlying data structure due to constant addition and linear reads while in the case where I had fifty/fifty reads and writes I would probably go for a balanced tree, logarithmic reads and writes. In all these cases, I would actually have to create new and separated modules for each of the approaches I mention.

Therefore it would be really nice if F# could port the Functor modules from OCaml as it would allow us to change the underlying datastructures inside a module.

I’ve POC an approach where I used records as modules, as you can see in the References, but it’s very hackerish and doesn’t really gets the job done …

Conclusion:

I think it’s a change of the mindset that you need to do when your are coding with functional programming languages that are multi-paradigm, as you will be able to do things the way you are used to do, in an OO way, but that might not always be the appropriate approach.

References:

comments powered by Disqus