31 | 31 | "\n", |
32 | 32 | "Baseball players have many metrics measured for them. Suppose we are on a baseball team and would like to quantify player performance. One such metric is batting average, defined as the number of hits a batter gets divided by the number of times they came up to bat (their \"at bats\"). How would you go about this task?\n", |
33 | 33 | "\n", |
34 | | - "## Discussion\n", |
35 | | - "\n", |
36 | | - "Discuss with your neighbors the following questions.\n", |
37 | | - "\n", |
| 34 | + "**Discuss:** \n", |
38 | 35 | "1. What data would we need?\n", |
39 | 36 | "1. What metric would you rank by? \n", |
40 | 37 | "1. Would your metric be reasonable for rookie players?\n", |
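As a starting point for these questions, here is a minimal sketch of the naive metric: hits divided by at bats, used to rank players. The player names and counts below are hypothetical, chosen only to show what happens to a rookie with a single at bat.

```python
# Hypothetical (hits, at_bats) pairs, for illustration only.
players = {
    "veteran_a": (150, 500),
    "veteran_b": (120, 450),
    "rookie": (1, 1),
}


def batting_average(hits: int, at_bats: int) -> float:
    """Naive batting average: hits divided by at bats."""
    return hits / at_bats


# Rank players from highest to lowest naive batting average.
ranking = sorted(players, key=lambda name: batting_average(*players[name]), reverse=True)
for name in ranking:
    h, ab = players[name]
    print(f"{name}: {batting_average(h, ab):.3f} ({h}/{ab})")
```

With this metric a 1-for-1 rookie ranks above both veterans, which is the issue the third question is getting at.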
|
61 | 58 | "- Binomial distribution: a probability distribution modelling the number of successes in `n` independent trials, each succeeding with probability `p`. Parameterized by both `n` and `p`.\n", |
62 | 59 | "- Beta distribution: a probability distribution defined on the interval $(0, 1)$. It models the distribution of probability values, usually the `p` in a Bernoulli or Binomial distribution. Parameterized by $\\alpha$ and $\\beta$, which can be thought of as \"number of successes\" and \"number of failures\" respectively.\n", |
63 | 60 | "\n", |
| 61 | + "Every distribution has its \"story\". If you're curious, check out [Justin Bois' probability stories][probstory] page.\n", |
| 62 | + "\n", |
| 63 | + "[probstory]: http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2017/tutorials/t3b_probability_stories.html#Beta-distribution\n", |
| 64 | + "\n", |
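To make the two definitions concrete, here is a small sketch using only the Python standard library (the notebook itself may use a different library such as SciPy or PyMC3). It computes a Binomial probability mass and shows how the Beta parameters act like success and failure counts through the mean $\alpha / (\alpha + \beta)$. The helper names are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta): successes / (successes + failures)."""
    return alpha / (alpha + beta)


# Probability that a .250 hitter gets exactly 1 hit in 4 at bats.
print(binomial_pmf(1, 4, 0.25))  # 4 * 0.25 * 0.75**3 = 0.421875
# Beta(3, 7) behaves like "3 hits and 7 outs" and is centred on 0.3.
print(beta_mean(3, 7))
```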
64 | 65 | "### Focus on beta\n", |
65 | 66 | "\n", |
66 | 67 | "Let's say we wanted to model a probability distribution centered approximately on 0.2. Depending on how we parameterize the Beta distribution, we can express different levels of confidence (as measured by the spread of the distribution) in how sure we are that the underlying probability takes on that value.\n", |
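One way to see this numerically: every Beta with $\alpha / (\alpha + \beta) = 0.2$ has the same mean, but scaling both parameters up shrinks the spread. The sketch below uses the closed-form standard deviation $\sqrt{\alpha\beta / ((\alpha+\beta)^2(\alpha+\beta+1))}$; the three parameter pairs are illustrative choices, not values from the notebook.

```python
import math


def beta_std(alpha: float, beta: float) -> float:
    """Standard deviation of Beta(alpha, beta)."""
    n = alpha + beta
    return math.sqrt(alpha * beta / (n**2 * (n + 1)))


# Each pair has mean 0.2; larger alpha + beta expresses more confidence.
for alpha, beta in [(2, 8), (20, 80), (200, 800)]:
    mean = alpha / (alpha + beta)
    print(f"Beta({alpha}, {beta}): mean={mean:.2f}, sd={beta_std(alpha, beta):.3f}")
```

The standard deviations come out to roughly 0.12, 0.04 and 0.013, so each tenfold increase in pseudo-counts narrows the distribution by about a factor of three.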
|
111 | 112 | "cell_type": "markdown", |
112 | 113 | "metadata": {}, |
113 | 114 | "source": [ |
114 | | - "### Exercise\n", |
115 | | - "\n", |
116 | | - "Write a naive estimation model for the players above.\n", |
| 115 | + "**Exercise:** Write a naive estimation model for the players above.\n", |
117 | 116 | "\n", |
118 | 117 | "Hint: one possible model you could specify is as follows:\n", |
119 | 118 | "\n", |
|
186 | 185 | "cell_type": "markdown", |
187 | 186 | "metadata": {}, |
188 | 187 | "source": [ |
189 | | - "## Discussion\n", |
190 | | - "\n", |
191 | | - "Are the estimates reasonable, particularly for players that have had only one at bat (AB)?" |
| 188 | + "**Discuss:** Are the estimates reasonable, particularly for players that have had only one at bat (AB)?" |
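For a 0-for-1 player the naive estimate is exactly 0. A minimal sketch of the alternative, assuming a Beta prior on the batting average: because the Beta is conjugate to the Binomial, observing H hits in AB at bats turns a Beta(α, β) prior into a Beta(α + H, β + AB − H) posterior. The prior Beta(20, 80) below is a hypothetical choice centred on 0.2, not one taken from the notebook.

```python
def posterior_mean(hits: int, at_bats: int, alpha: float, beta: float) -> float:
    """Mean of the Beta(alpha + H, beta + AB - H) posterior under a Beta(alpha, beta) prior."""
    return (alpha + hits) / (alpha + beta + at_bats)


# Hypothetical prior centred on 0.2, worth about 100 "pseudo at bats".
ALPHA, BETA = 20, 80

hits, at_bats = 0, 1
print("naive estimate:", hits / at_bats)
print("posterior mean:", posterior_mean(hits, at_bats, ALPHA, BETA))  # 20 / 101
```

One hitless at bat barely moves the estimate away from the prior, which matches the intuition that a single at bat says little about a player.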
192 | 189 | ] |
193 | 190 | }, |
194 | 191 | { |
|
197 | 194 | "source": [ |
198 | 195 | "# Hierarchical Modelling\n", |
199 | 196 | "\n", |
200 | | - "## Discussion\n", |
201 | | - "\n", |
| 197 | + "**Discuss:** \n", |
202 | 198 | "- How do we deal with the fact that some players have only had 1 at bat (AB = 1), and zero hits (H = 0)? \n", |
203 | 199 | "- Would it be reasonable, fair, and in line with prior knowledge to conclude that the player's true batting average is zero? \n", |
204 | 200 | "\n", |
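A full hierarchical model places priors on the Beta parameters and infers them jointly with every player's batting average. As a lighter stand-in (empirical Bayes, a swapped-in technique, not the notebook's model), the sketch below fits α and β to established players' averages by the method of moments, then uses that population prior for a 0-for-1 rookie. The averages are hypothetical.

```python
import statistics


def fit_beta_moments(rates):
    """Method-of-moments Beta(alpha, beta) fit to rates in (0, 1).

    Assumes the sample variance is smaller than mean * (1 - mean).
    """
    m = statistics.fmean(rates)
    v = statistics.pvariance(rates)
    common = m * (1 - m) / v - 1  # this equals alpha + beta
    return m * common, (1 - m) * common


# Hypothetical batting averages of established players.
rates = [0.25, 0.27, 0.22, 0.30, 0.26, 0.24]
alpha, beta = fit_beta_moments(rates)

# A 0-for-1 rookie is shrunk toward the population mean instead of sitting at 0.
rookie = (alpha + 0) / (alpha + beta + 1)
print(f"prior: Beta({alpha:.1f}, {beta:.1f}), rookie estimate: {rookie:.3f}")
```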
|
337 | 333 | " ax.plot(r)\n", |
338 | 334 | " \n", |
339 | 335 | "ax.set_xticks([0, 1, 2])\n", |
340 | | - "ax.set_xticklabels(['no pooling (MAP)', 'partial pooling (Bayesian)', 'complete pooling (Population Average)'])\n", |
| 336 | + "ax.set_xticklabels(['no pooling', 'partial pooling', 'complete pooling'])\n", |
341 | 337 | "ax.set_ylim(0, 1)\n", |
342 | 338 | "despine(ax)" |
343 | 339 | ] |
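The three regimes in the plot can also be compared numerically. In the sketch below, no pooling uses each player's own H/AB, complete pooling uses the league-wide total H / total AB, and partial pooling uses a Beta prior centred on the league average with a hypothetical strength `K` of 100 pseudo at bats (a fixed stand-in for what a hierarchical model would learn). The player data are also hypothetical.

```python
# Hypothetical (hits, at_bats) per player; the last is a 0-for-1 rookie.
data = {"veteran_a": (150, 500), "veteran_b": (120, 450), "rookie": (0, 1)}
K = 100  # hypothetical prior strength, in pseudo at bats

total_h = sum(h for h, _ in data.values())
total_ab = sum(ab for _, ab in data.values())
complete = total_h / total_ab  # complete pooling: one estimate for everyone
alpha, beta = K * complete, K * (1 - complete)

estimates = {}
for name, (h, ab) in data.items():
    no_pool = h / ab                               # no pooling
    partial = (alpha + h) / (alpha + beta + ab)    # partial pooling
    estimates[name] = (no_pool, partial, complete)
    print(f"{name}: no={no_pool:.3f}, partial={partial:.3f}, complete={complete:.3f}")
```

Partial pooling is a weighted average of the player's own rate (weight AB) and the league rate (weight K), so players with many at bats stay close to their own numbers while the rookie is pulled almost all the way to the league average.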
|