Skip to content

typeAI/probability-monad

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Probability Distribution Monad

Makes it easy to create, manipulate and sample probability distributions.

For example, here's how you would code up the following problem: You are given either a fair coin or a biased coin with equal probability. If you flip it 5 times and it comes up heads each time, what is the probability you have the fair coin?

case class Trial(haveFairCoin: Boolean, flips: List[Coin])

def bayesianCoin(nflips: Int): Distribution[Trial] = {
  for {
    haveFairCoin <- tf()
    c = if (haveFairCoin) coin else biasedCoin(0.9)
    flips <- c.repeat(nflips)
  } yield Trial(haveFairCoin, flips)
}

bayesianCoin(5).given(_.flips.forall(_ == H)).pr(_.haveFairCoin)

Or: You repeatedly roll a 6-sided die and keep a running sum. What is the probability the sum reaches exactly 30?

def dieSum(rolls: Int): Distribution[List[Int]] = {
  markov(rolls, List(0))(runningSum => for {
    d <- die
  } yield (d + runningSum.head) :: runningSum)
}

dieSum(30).pr(_ contains 30)

Or: Each family has children until it has a boy, and then stops. What is the expected fraction of girls in the population?

sealed abstract class Child
case object Boy extends Child
case object Girl extends Child

def family = {
  discreteUniform(List(Boy, Girl)).until(_ contains Boy)
}

def population(families: Int) = {
  for {
    children <- family.repeat(families).map(_.flatten)
    val girls = children.count(_ == Girl)
  } yield 1.0 * girls / children.length
}

population(4).ev

How it works

A Distribution[T] represents a random variable that, when sampled, produces values of type T according to a particular probability distribution. For example, Distribution.uniform is a Distribution[Double] that produces Double values between 0.0 and 1.0, uniformly distributed. Distribution.coin is a Distribution[Coin] that produces the values H and T with equal probability, and Distribution.biasedCoin(0.3) is a Distribution[Coin] that produces the value H 30% of the time and the value T 70% of the time.

You can think of a Distribution[T] as a collection like any other scala collection that you can map, flatMap and filter over. The presence of these methods allow you to use scala's for-comprehensions to manipulate distributions. For example, here's how you would create a distribution that represents the sum of 2 die rolls:

val dice = for {
  d1 <- die
  d2 <- die
} yield d1 + d2

Here, die is a Distribution[Int], and d1 and d2 are both Ints. The type of dice is Distribution[Int]. You can see that for-comprehensions are an easy way to define new a distribution in terms of individual samples from other distributions.

You can visualize a distribution with hist:

scala> dice.hist
 2  2.61% ##
 3  5.48% #####
 4  8.70% ########
 5 10.53% ##########
 6 14.21% ##############
 7 16.90% ################
 8 13.90% #############
 9 11.43% ###########
10  8.35% ########
11  5.17% #####
12  2.72% ##

If you want more control over the display of continuous distributions, use bucketedHist:

scala> normal.map(_ * 2 + 1).bucketedHist(20)   // 20 buckets, min & max determined automatically
-7.0  0.02% 
-6.0  0.03% 
-5.0  0.22% 
-4.0  1.01% #
-3.0  2.69% ##
-2.0  6.43% ######
-1.0 12.19% ############
 0.0 17.07% #################
 1.0 19.74% ###################
 2.0 17.55% #################
 3.0 12.17% ############
 4.0  6.55% ######
 5.0  2.92% ##
 6.0  1.10% #
 7.0  0.23% 
 8.0  0.06% 
 9.0  0.01% 
10.0  0.01%

scala> cauchy.bucketedHist(-10, 10, 20)   // min=-10, max=10, #buckets=20
-10.0  0.20% 
 -9.0  0.38% 
 -8.0  0.44% 
 -7.0  0.55% 
 -6.0  0.82% 
 -5.0  1.23% #
 -4.0  1.85% #
 -3.0  2.92% ##
 -2.0  6.78% ######
 -1.0 16.78% ################
  0.0 30.04% ##############################
  1.0 16.64% ################
  2.0  6.22% ######
  3.0  3.06% ###
  4.0  1.76% #
  5.0  1.26% #
  6.0  0.84% 
  7.0  0.67% 
  8.0  0.48% 
  9.0  0.42% 
 10.0  0.14% 

This probability monad is based on sampling, so the values and plots produced will be inexact and will vary between runs.

scala> normal.stdev
res9: Double = 1.0044818262040809

scala> normal.stdev
res10: Double = 1.0071194147525722

Code and examples

Distribution.scala contains code for creating and manipulating probability distributions. Built-in distributions include:

  • uniform discrete (including die and fair coin)
  • weighted discrete (biased coin, uses the alias method)
  • bernoulli
  • geometric
  • binomial
  • negative binomial
  • poisson
  • zipf
  • uniform continuous
  • normal
  • cauchy
  • chi2
  • pareto
  • exponential
  • lognormal
  • student's t-distribution
  • gamma
  • beta

Methods for manipulating distributions include:

  • adding (convolution), subracting (cross-correlation), multiplying and dividing distributions
  • joint distributions (a flatMap)
  • marginal distributions (a filter)
  • creating Markov chains (an iterated flatMap)
  • finding the probability of arbitrary predicates, conditional probabililty
  • finding expected values (mean), standard deviation, variance, skewness and kurtosis
  • sampling, histogram

Examples.scala contains some example uses, and possibly a RISK simulator.

To try out some examples, do

$ ./sbt console

scala> runBayesianCoin(5)

Contributions welcome!

Releases

No releases published

Packages

No packages published

Languages

  • Scala 98.2%
  • Shell 1.8%