F# in Production

This post is part of the F# Advent Calendar 2016. Check it out for two F# blog posts a day for the whole month of December. And special thanks to Sergey Tihon for organizing this!

What’s the most efficient way to get the F# community into the Christmas spirit? Someone using F# in production, surely. So today I’m going to give a little intro to our latest project and reflect a bit on the challenges we faced.

Intro

I work at PFA Pension in the actuarial modelling department, where our primary task is to calculate the total liabilities for the company’s insurance policies. The liability is the amount of money we need to set aside to be sure that we’re able to pay our obligations to our customers.

I have previously spoken about how we’ve used F# to calculate the value of individual contracts. This is the extension of that work, in which we consider entire policies consisting of multiple contracts. So where calculating the value of one contract was math-heavy, this project’s focus is more on implementing business rules and taming their complexity, along with fleshing out the structure of the problem domain in types.

I won’t be getting into too many specifics, but instead I’ll focus on self-contained bits I think the community will find interesting.

The Existing Solution

Pension companies having to calculate liabilities is nothing new, and we already had an existing solution in place, written in the SAS statistical programming language. For those of you who, thankfully, don’t know it: SAS is a row-based data-processing language, where you write a function (a so-called data step) to process each row of a table. The main abstraction is the macro, which is like a function, except its body is inserted directly into your code. So you have to be careful not to use the same variable names in your macros as in the surrounding code. Oh, and did I mention that all variables are global?

People often shun the big rewrite, but in our case we were at a place where no one wanted to change the existing code for fear of what might happen. We were also using a language that was generating additional complexity instead of limiting it. So, for me at least, it was a no-brainer.

Getting Data

To compute anything, we need to get some data into an appropriate model. Here we make extensive use of railway-oriented programming, which enables us to streamline the error-prone process of parsing data.

Let’s look at a little example.

type X = X1 | X2
type Y = ... // defined similarly

type DataRecord = { PropX : X; PropY : Y }

type Error = | XError of string | YError of string

type Result<'a> = | OK of 'a | Failure of Error

let readX : string -> Result<X> = function
    | "1" -> OK X1
    | "2" -> OK X2
    | s   -> Failure (XError <| sprintf "Unknown X type %s" s)

let readY = ...

let example xStr yStr =
    data {
        let! x = readX xStr
        let! y = readY yStr

        return { PropX = x; PropY = y }
    }

Here we’re able to contain the error handling logic in individual, small functions and then compose them with our data computation expression, which just makes sure none of the let! bindings returns a Failure, short-circuiting with the first Failure if one does.
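
A minimal sketch of what a builder like our data expression could look like:

type DataBuilder() =
    // stop at the first Failure; otherwise keep feeding values forward
    member this.Bind(result, f) =
        match result with
        | OK value      -> f value
        | Failure error -> Failure error

    member this.Return(value) = OK value

let data = DataBuilder()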

Not only did this make for a relatively pleasant data-parsing experience, it also gave us an incremental view of how much of the input data we were able to parse correctly. Since all of the parsed data is wrapped in a Result, measuring completeness was a simple matter of counting how many Results were OK vs Failure.
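
The counting is as simple as it sounds. A sketch, for a list of parsed records:

let completeness results =
    // fraction of the records that parsed without errors
    let isOK = function OK _ -> true | Failure _ -> false
    let parsed = results |> List.filter isOK |> List.length
    float parsed / float (List.length results)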

We also took the same approach to handle possible errors in the liability calculations themselves. For each calculation, we call an external DLL several times, and these calls might throw exceptions. If one does, the exception is wrapped in the Failure case of Result, avoiding a crash. The advantage here is that whenever you run the program with a large workload, you can be sure it doesn’t crash halfway through, and any errors that might pop up can be saved and analyzed later.
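
A sketch of the wrapping, where CalcError is an illustrative extra case on the Error type from before, and call stands in for the external calculation:

// assumes: type Error = ... | CalcError of string
let protectedCall call input =
    try
        OK (call input)
    with ex ->
        // keep the error and keep going; the Failures are analyzed afterwards
        Failure (CalcError ex.Message)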

Debugging made easy with the REPL

Debugging idiomatic F# isn’t always a lot of fun. If you try to step through your program, it jumps back and forth between function definitions. And if you’re using custom computation expressions, as we do, you’re in for an incomprehensible ride through your code.

After a certain amount of time, you iron out all of the show-stopping bugs, and debugging becomes a matter of understanding why the model outputs the numbers it does. To aid with this, we’ve defined a helper script that can load up a single policy and then run various debugging functions on it. Here Deedle plays a big part: we not only use its basic data frame functionality, we also convert our existing types to be displayed as data frames.
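
A typical session looks something like this (loadPolicy, Cashflows and Values are stand-ins for our actual helpers; Frame.ofRecords and Print are Deedle’s):

#load "DebugHelpers.fsx" // illustrative script name
open Deedle

// load a single policy and inspect its cashflows as a data frame
let policy = loadPolicy "some-policy-id"
let frame  = policy.Cashflows |> Frame.ofRecords
frame.Print()

// ad hoc questions are just a function away
policy.Cashflows |> Seq.sumBy (fun cf -> Array.sum cf.Values)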

Debugging in this way with the REPL is a huge time saver, because all the data you need is right there in your FSI session. And if you need a question answered, you just write a function to compute it and run it immediately! Consider the alternative: mangling the program with Console.WriteLine and chucking the output into Excel for analysis. This approach is faster, easier and more flexible.

Testing

Given that this is a rewrite of an existing model, we can do a very thorough test with respect to the final results: they should simply match the output of the old model.

As for unit testing, I’ve found myself just not writing that many. Partly because of the type system and immutability by default, but also because of how the business rules we implement are structured. A lot of cases are just: “if case A then multiply value by 5%, else multiply value by 3%”. So naturally, for a given function, I can write a test that checks this holds, but what have I really achieved here? I’m just implementing the same logic twice. If I do things correctly, the test merely serves as a lock on existing results, and any legal change to the implementation will have me changing the test too. If I’ve misunderstood the rule, the test won’t help me catch the error, because I’m implementing the wrong test as well.
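
To make that concrete, here’s a made-up rule of that shape and its test (NUnit-style, purely illustrative):

open NUnit.Framework

// a made-up business rule
let adjust isCaseA value =
    if isCaseA then value * 1.05 else value * 1.03

// the test can't help but restate the rule with sample numbers
[<Test>]
let ``adjust applies the right factor`` () =
    Assert.AreEqual(105.0, adjust true 100.0, 1e-9)
    Assert.AreEqual(103.0, adjust false 100.0, 1e-9)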

It’s not like a sorting algorithm, where the test for correctness is vastly different from the algorithm, and there are many legal implementations which will pass tests.

Comments are greatly appreciated here, as it’s a question I’ve been struggling with. Right now we rely mostly on a full model test, which can’t be run on our workstations and therefore doesn’t get run as often as the unit tests.

Performance

Performance matters. Even if you’re not doing real-time processing of incoming requests, a better performing program is just easier to work with. In our case, being able to run the model 50 times a day instead of, say, 10 lets us answer more questions about its behaviour, and answer them in more detail.

First, a little bit about the type of workload we have. We have a lot (I can’t share specifics, unfortunately) of insurance policies that each need a few different numbers calculated. The policies are completely independent, so the problem becomes an embarrassingly parallel one. We run the model on a 64-core machine with 128 GB of RAM, so we’re able to keep a lot of data in memory if need be.
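
Since the policies are independent, the parallel part is the easy part. A sketch (policies and calculateLiability stand in for the real thing):

// policies : Policy []
let results =
    policies
    |> Array.Parallel.map calculateLiability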

(Im)mutability

Pretty much all our data structures are immutable, except for a few key ones. Our main unit of computation is a Cashflow, which is just a labeled float array with a length somewhere between 1 and 120. We allocate on the order of XX million of these every time we run the program.

While a standard array is inherently mutable, we have defined the standard operators (+, -, *, /) for Cashflow such that they create a new Cashflow with a new underlying array each time. So we get the benefits of immutability, at the price of developers having to know not to manipulate the underlying arrays by hand. It’s not the guarantee you’d get from an immutable structure like a linked list, but in our case the trade-off makes sense.
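
A sketch of the idea, with illustrative field names and just two of the operators:

type Cashflow =
    { Label  : string
      Values : float [] }

    // neither operand is touched; a fresh array is allocated instead
    // (assumes both cashflows have the same length)
    static member (+) (a : Cashflow, b : Cashflow) =
        { a with Values = Array.map2 (+) a.Values b.Values }

    static member (*) (a : Cashflow, factor : float) =
        { a with Values = Array.map (fun v -> v * factor) a.Values }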

There’s no doubt we could achieve better performance by just using a few arrays per policy and manipulating them over and over again. But when you factor in the added development time for the inevitable bug hunting and having to write a test suite to test all sorts of weird interactions, we deemed it wasn’t worth it. Now that we’ve started using the model in production, I’m happy to say that it ended up being fast enough for our use.

ToString()

Our output is just a CSV file with a bunch of numbers and identifiers in it. Some of these identifiers are just the ToString of a discriminated union without any data attached. Starting out, I thought I’d be really smart, so I implemented ToString() like this:

type Example =
    | A | B
    override this.ToString() = sprintf "%A" this

The good part is that if you extend the DU, you don’t have to touch the ToString method. The bad part is that this ended up being a severe performance bottleneck in several cases! It turns out the reflection-based %A formatting isn’t so fast when you call it millions of times. The fix was simple:

type Example =
    | A | B

    override this.ToString() =
        match this with
        | A -> "A"
        | B -> "B"

Now you have to change the method when you extend the DU, but even if you forget, the compiler’s incomplete-match warning will remind you. This change fixed all of our ToString-related performance issues.

GC Stats

Just for fun, I thought I’d share some numbers. Below is a table of the GC stats that PerfView generates. They were gathered by running the program on a 64-core machine with the CLR startup flags CONCURRENT_GC and SERVER_GC. Only the portion of the code that does the calculations was profiled; the generation of output was skipped.

GC Index  Pause Start  Trigger Reason  Gen  Pause MSec  % Pause Time  Gen0 Alloc MB  Gen0 Alloc Rate MB/sec
       2   15.681,315  AllocSmall      0N    5.204,196          50,4        682,229                  133,36
       3   31.311,055  AllocSmall      2B      495,062          29,3      6.339,091                2.200,44
       4   31.350,123  AllocSmall      1N    5.687,250          35,2      3.776,353                  360,86
       5   41.291,427  AllocSmall      0F      898,578          17,3      5.953,185                1.381,22
       6   46.439,654  AllocSmall      1N    1.225,027          47,2      6.520,651                4.763,03
       7   54.761,537  AllocSmall      0N      917,653          11,4      9.696,989                1.366,35
       8   59.798,991  AllocSmall      0N      880,756          17,6     10.376,745                2.518,66
       9   65.247,703  AllocSmall      2B    1.776,603          28,7      2.990,233                2.402,10
      10   65.260,024  AllocSmall      0N      945,024          17,1     10.985,220                2.398,28
      11   74.186,778  AllocSmall      1F    2.131,579          21,1      1.505,170                  188,39
      12   80.610,316  AllocSmall      0N    1.041,057          25,5     12.121,152                3.977,32
      13   85.577,245  AllocSmall      0N    1.007,499          20,4     12.253,635                3.121,14
      14   90.949,361  AllocSmall      0N      986,747          18,4     12.584,146                2.883,12
      15   96.015,221  AllocSmall      1N    2.276,321          35,8     12.645,542                3.099,95
      16  102.378,983  AllocSmall      0N    1.088,974          21,0     12.703,983                3.107,93
      17  106.666,817  AllocSmall      0N      832,306          20,6     11.686,306                3.653,11
      18  112.030,120  AllocSmall      0N    1.000,396          18,1     12.510,029                2.760,88
      19  117.049,092  AllocSmall      1N    2.130,006          34,6     12.518,743                3.115,09
      20  123.000,183  AllocSmall      0N    1.028,213          21,2     12.470,023                3.263,34
      21  128.056,713  AllocSmall      0N    1.064,758          20,9     12.588,864                3.124,97
      22  133.372,027  AllocSmall      0N    1.060,356          20,0     12.756,741                3.001,08
      24  138.586,971  AllocSmall      0N    1.115,656          21,2     13.464,913                3.240,87
Total Allocs : 212.300,084 MB
Total GC Pause: 34.794,0 msec
% Time paused for Garbage Collection: 28,1%
Max GC Heap Size: 26.251,720 MB

That’s a lot of allocations: 3 GB per second! The program ran for a bit over 2 minutes and allocated over 200 GB. The max heap size was 26 GB, so the vast majority of these allocations never made it past gen 0. Gen 0 collections are supposed to be cheap, but apparently you can put pressure on those too.

On one hand, the high performance guy in me wants to remove a ton of these allocations at the expense of code safety and development time, but right now we’re at a point where this is just good enough. The model is several times faster than our old solution, so at the moment everyone is praising performance. At the same time the development was a smooth experience, where most of the debugging time was spent figuring out what the old SAS code was really doing.

Summary

In closing, I just want to touch a bit on what I think made this project a success. It wasn’t profunctor optics. It was just the continued application of better types (DUs, records), immutability and better error handling (railway-oriented programming). This allowed us to focus on the business logic instead of the technical plumbing.

And finally, a shout-out to FAKE, Paket and F# Power Tools (we’re not on Code/Ionide …yet), which make our lives easier every day.
