|
75 | 75 | "cell_type": "markdown", |
76 | 76 | "metadata": {}, |
77 | 77 | "source": [ |
78 | | - "In this equation, we call $P(p)$ the prior (distribution), $P(D|p)$ the likelihood and $P(p|D)$ the posterior (distribution). The intuition behind the nomenclature is as follows: the prior is the distribution containing our knowledge about $p$ prior to the introduction of the data $D$ & the posterior is the distribution containing our knowledge about $p$ after considering the data $D$.\n", |
79 | | - "\n", |
| 78 | + "In this equation, we call $P(p)$ the prior (distribution), $P(D|p)$ the likelihood and $P(p|D)$ the posterior (distribution). The intuition behind the nomenclature is as follows: the prior is the distribution containing our knowledge about $p$ prior to the introduction of the data $D$ & the posterior is the distribution containing our knowledge about $p$ after considering the data $D$." |
| 79 | + ] |
| 80 | + }, |
| 81 | + { |
| 82 | + "cell_type": "markdown", |
| 83 | + "metadata": {}, |
| 84 | + "source": [ |
| 85 | + "**Note** that we're _overloading_ the term _probability_ here. In fact, we have 3 distinct usages of the word:\n", |
| 86 | + "- The probability $p$ of seeing a head when flipping a coin;\n", |
| 87 | + "- The resulting binomial probability distribution $P(D|p)$ of seeing the data $D$, given $p$;\n", |
| 88 | + "- The prior & posterior probability distributions of $p$, encoding our _uncertainty_ about the value of $p$." |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "markdown", |
| 93 | + "metadata": {}, |
| 94 | + "source": [ |
80 | 95 | "**Key concept:** We only need to know the posterior distribution $P(p|D)$ up to a multiplicative constant at the moment: this is because we really only care about the values of $P(p|D)$ relative to each other – for example, what is the most likely value of $p$? To answer such questions, we only need to know what $P(p|D)$ is proportional to, as a function of $p$. Thus we don’t currently need to worry about the term $P(D)$. In fact,\n",
81 | 96 | "\n", |
82 | 97 | "$$P(p|D) \\propto P(D|p)P(p) $$\n", |
|
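The proportionality above can be made concrete with a short sketch (not part of the original notebook; the data here – 7 heads in 10 flips – and the uniform prior are illustrative assumptions): evaluate the unnormalised posterior $P(D|p)P(p)$ on a grid of $p$ values and read off the most likely $p$, with no need to compute $P(D)$.

```python
import numpy as np

# Hypothetical data: 7 heads observed in 10 coin flips.
n_flips, n_heads = 10, 7

# Grid of candidate values for p, the probability of heads.
p_grid = np.linspace(0, 1, 1001)

prior = np.ones_like(p_grid)  # uniform prior on p
# Binomial likelihood up to a constant: p^heads * (1-p)^tails.
likelihood = p_grid**n_heads * (1 - p_grid) ** (n_flips - n_heads)
unnormalised_posterior = likelihood * prior

# The posterior mode needs no normalisation: argmax is unchanged
# by multiplying by the constant 1/P(D).
p_map = p_grid[np.argmax(unnormalised_posterior)]
print(p_map)  # 0.7
```

Note that the grid approach works only because $p$ lives on a bounded one-dimensional interval; it is a pedagogical device, not a general inference method.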
169 | 184 | "cell_type": "markdown", |
170 | 185 | "metadata": {}, |
171 | 186 | "source": [ |
172 | | - "### Other priors" |
| 187 | + "### The choice of the prior" |
173 | 188 | ] |
174 | 189 | }, |
175 | 190 | { |
176 | 191 | "cell_type": "markdown", |
177 | 192 | "metadata": {}, |
178 | 193 | "source": [ |
179 | | - "But wait! We've had to specify a prior. Discuss specification of priors. Motivate Jeffries prior and show equation for it. " |
| 194 | + "You may have noticed that we needed to choose a prior and that, in the small to medium data limit, this choice can affect the posterior. We'll briefly introduce several types of priors and then you'll use one of them for the example above to see the effect of the prior:\n", |
| 195 | + "\n", |
| 196 | + "- **Informative priors** express specific, definite information about a variable: for example, if we got a coin straight from the mint, we might use an informative prior sharply peaked at $p=0.5$, i.e. with small variance.\n",
| 197 | + "- **Weakly informative priors** express partial information about a variable, such as a peak at $p=0.5$ (if we have no reason to believe the coin is biased), with a larger variance.\n", |
| 198 | + "- **Uninformative priors** express no information about a variable, except what we know for sure, such as knowing that $0\\leq p \\leq1$.\n", |
| 199 | + "\n", |
| 200 | + "Now you may think that the _uniform distribution_ is uninformative. However, what if I am thinking about this question in terms of the probability $p$ while Eric Ma is thinking about it in terms of the _odds ratio_ $r=\\frac{p}{1-p}$? Eric rightly feels that he has no prior knowledge as to what $r$ is and thus chooses the uniform prior on $r$.\n",
| 201 | + "\n", |
| 202 | + "With a bit of algebra (a transformation of variables), we can show that choosing the uniform prior on $p$ amounts to choosing a decidedly non-uniform prior on $r$ and vice versa: since $p = \\frac{r}{1+r}$, a uniform prior on $p$ transforms to $P(r) = \\left|\\frac{dp}{dr}\\right| = \\frac{1}{(1+r)^2}$ on $r$. So Eric and I have actually chosen different priors while using the same philosophy. How do we avoid this? Enter the **Jeffreys prior**, an uninformative prior that solves this problem by being invariant under such reparametrisations. You can read more about the Jeffreys prior [here](https://en.wikipedia.org/wiki/Jeffreys_prior) & in your favourite Bayesian textbook (Sivia gives a nice treatment).\n",
| 203 | + "\n", |
| 204 | + "In the binomial (coin-flip) case, the Jeffreys prior is $P(p) \\propto \\frac{1}{\\sqrt{p(1-p)}}$.\n",
| 205 | + "\n" |
180 | 206 | ] |
181 | 207 | }, |
182 | 208 | { |
|
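The effect of this choice can be seen in a small sketch (not from the original notebook; the flip counts are hypothetical): compute the posterior mode under both the uniform prior and the unnormalised Jeffreys prior for the same data, and note that they disagree.

```python
import numpy as np

# Hypothetical data: 3 heads in 4 flips (few data, so the prior matters).
n_flips, n_heads = 4, 3

# Open grid on (0, 1): the Jeffreys prior diverges at the endpoints.
p = np.linspace(0.001, 0.999, 999)

likelihood = p**n_heads * (1 - p) ** (n_flips - n_heads)
uniform_prior = np.ones_like(p)
jeffreys_prior = 1 / np.sqrt(p * (1 - p))  # unnormalised

mode_uniform = p[np.argmax(likelihood * uniform_prior)]
mode_jeffreys = p[np.argmax(likelihood * jeffreys_prior)]
print(mode_uniform, mode_jeffreys)
```

With a uniform prior the mode sits at the observed frequency $3/4$; the Jeffreys prior, which places extra weight near $0$ and $1$, pulls it elsewhere. With more flips the likelihood dominates and the two modes converge, which is the "small to medium data" caveat above.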