ericmjl
diff --git a/images/bacteria_model.jpg b/images/bacteria_model.jpg
30.7 KB
diff --git a/images/bacteria_model.pdf b/images/bacteria_model.pdf
6.18 KB
diff --git a/notebooks/03-instructor-two-group-iq.ipynb b/notebooks/03-instructor-two-group-iq.ipynb
Lines changed: 18 additions & 32 deletions
diff --git a/notebooks/03-student-two-group-iq.ipynb b/notebooks/03-student-two-group-iq.ipynb
Lines changed: 19 additions & 33 deletions
diff --git a/notebooks/04-instructor-multi-group-comparsion-sterilization.ipynb b/notebooks/04-instructor-multi-group-comparsion-sterilization.ipynb
Lines changed: 8 additions & 38 deletions
@@ -120,9 +120,7 @@
     "\n",
     "Now that we have a first-pass generative model for the data, let's do some quick sanity checks against the data.\n",
     "\n",
-    "#### Exercise 1\n",
-    "\n",
-    "Load the dataset into a pandas DataFrame. It is available at the path `../data/iq.csv`."
+    "Let's get started by loading the data!"
    ]
   },
   {
@@ -139,9 +137,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise 2\n",
-    "\n",
-    "Plot the number of samples for drug and for treatment."
+    "**Exercise:** Plot the number of samples for drug and for treatment."
    ]
   },
   {
@@ -159,9 +155,7 @@
    "source": [
     "More important than the number of samples per treatment is the distribution of IQ, which will give us a hint as to whether we can expect a difference in effect.\n",
     "\n",
-    "#### Exercise 3\n",
-    "\n",
-    "Plot the ECDF of the treatments vs. control. If you need to inspect the source code of ECDF, it is available below."
+    "**Exercise:** Plot the ECDF of the treatments vs. control. If you need to inspect the source code of ECDF, it is available below."
    ]
   },
   {
@@ -197,9 +191,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Discuss\n",
-    "\n",
-    "Does it look like the treatment had an effect on the IQ of the participants? What numbers from the chart above can help support your conclusions?"
+    "**Discuss:** Does it look like the treatment had an effect on the IQ of the participants? What numbers from the chart above can help support your conclusions?"
    ]
   },
   {
@@ -208,9 +200,7 @@
    "source": [
     "### Step 3: Fit Model\n",
     "\n",
-    "#### Exercise\n",
-    "\n",
-    "We will specify the model below. Fill in the distributions as we go along in class. We are proceeding slowly here, simply to give you repetition practice with PyMC3's syntax."
+    "**Exercise:** We will specify the model below. Fill in the distributions as we go along in class. We are proceeding slowly here, simply to give you repetition practice with PyMC3's syntax."
    ]
   },
   {
@@ -310,9 +300,7 @@
     "\n",
     "We use posterior predictive checks (PPC) as one tool in our toolkit to evaluate and critique the model. The overarching goal of the PPC is to check that the data generating model generates simulated data that matches closely to the actual data. If this is the case, then we have a model that probably describes the data generating process well. If this is not the case, then we have evidence to go guide us towards re-doing the model.\n",
     "\n",
-    "#### Exercise\n",
-    "\n",
-    "To do a PPC, PyMC3 provides a `sample_ppc` function, which allows us to draw samples from the posterior distribution as a check. Run the following cell, filling in the appropriate `trace` and `model`."
+    "**Exercise:** To do a PPC, PyMC3 provides a `sample_ppc` function, which allows us to draw samples from the posterior distribution as a check. Run the following cell, filling in the appropriate `trace` and `model`."
    ]
   },
   {
@@ -328,9 +316,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise\n",
-    "\n",
-    "Let's now plot the ECDF of the sampled data against the original data."
+    "**Exercise:** Let's now plot the ECDF of the sampled data against the original data."
    ]
   },
   {
@@ -375,11 +361,14 @@
    "source": [
     "It looks like we have a model that, just by eyeballing the charts, models pretty well the distribution of the observed data.\n",
     "\n",
-    "For pedagogical brevity, we did not dive into a case where the model was plausibly but nonetheless incorrectly specified. Under an incorrect model, we would expect the PPC and data distributions to be anywhere from moderately to wildly off. Having detected this from a visual comparison of the PPC samples and data, we would go back and try to see where we went wrong. We might also opt to quantify this difference using the tools provided in PyMC3. \n",
-    "\n",
-    "#### Exercise\n",
-    "\n",
-    "Now, let us evaluate whether the drug actually did have an effect. Recall that we computed the difference in means, as well as an effect size, both with uncertainty. Using this information, plot the posterior distribution of the difference in means and effect sizes."
+    "For pedagogical brevity, we did not dive into a case where the model was plausibly but nonetheless incorrectly specified. Under an incorrect model, we would expect the PPC and data distributions to be anywhere from moderately to wildly off. Having detected this from a visual comparison of the PPC samples and data, we would go back and try to see where we went wrong. We might also opt to quantify this difference using the tools provided in PyMC3. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Exercise:** Now, let us evaluate whether the drug actually did have an effect. Recall that we computed the difference in means, as well as an effect size, both with uncertainty. Using this information, plot the posterior distribution of the difference in means and effect sizes."
    ]
   },
   {
@@ -405,9 +394,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise\n",
-    "\n",
-    "Compute the p-value of the t-test for this dataset."
+    "**Exercise:** Compute the p-value of the t-test for this dataset."
    ]
   },
   {
@@ -425,8 +412,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Discuss\n",
-    "\n",
+    "**Discuss**:\n",
     "1. Is there a significant difference between the drug-treated and placebo-treated participants of the intervention? (This question is intentionally vague on the definition of \"significant\", to encourage discussion of the difference between statistical and practical significance.)\n",
     "1. Would you recommend the intervention as a method to raise people's IQ? How much money would you be willing to pay for this intervention?"
    ]
@@ -435,7 +421,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Further Reading/Watching\n",
+    "## Further Reading/Watching\n",
     "\n",
     "- PyMC3's documentation contains an example of how to do [model selection][model_selection], which we did not touch on here. \n",
     "- John Kruschke's paper on [Bayesian Estimation][bayes_est] is what this notebook's example is based on. There is also a [YouTube video][bayes_yt] available.\n",
 
@@ -120,9 +120,7 @@
     "\n",
     "Now that we have a first-pass generative model for the data, let's do some quick sanity checks against the data.\n",
     "\n",
-    "#### Exercise 1\n",
-    "\n",
-    "Load the dataset into a pandas DataFrame. It is available at the path `../data/iq.csv`."
+    "Let's get started by loading the data!"
    ]
   },
   {
@@ -139,9 +137,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise 2\n",
-    "\n",
-    "Plot the number of samples for drug and for treatment."
+    "**Exercise:** Plot the number of samples for drug and for treatment."
    ]
   },
   {
@@ -159,9 +155,7 @@
    "source": [
     "More important than the number of samples per treatment is the distribution of IQ, which will give us a hint as to whether we can expect a difference in effect.\n",
     "\n",
-    "#### Exercise 3\n",
-    "\n",
-    "Plot the ECDF of the treatments vs. control. If you need to inspect the source code of ECDF, it is available below."
+    "**Exercise:** Plot the ECDF of the treatments vs. control. If you need to inspect the source code of ECDF, it is available below."
    ]
   },
   {
@@ -204,9 +198,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Discuss\n",
-    "\n",
-    "Does it look like the treatment had an effect on the IQ of the participants? What numbers from the chart above can help support your conclusions?"
+    "**Discuss:** Does it look like the treatment had an effect on the IQ of the participants? What numbers from the chart above can help support your conclusions?"
    ]
   },
   {
@@ -215,9 +207,7 @@
    "source": [
     "### Step 3: Fit Model\n",
     "\n",
-    "#### Exercise\n",
-    "\n",
-    "We will specify the model below. Fill in the distributions as we go along in class. We are proceeding slowly here, simply to give you repetition practice with PyMC3's syntax."
+    "**Exercise:** We will specify the model below. Fill in the distributions as we go along in class. We are proceeding slowly here, simply to give you repetition practice with PyMC3's syntax."
    ]
   },
   {
@@ -310,9 +300,7 @@
     "\n",
     "We use posterior predictive checks (PPC) as one tool in our toolkit to evaluate and critique the model. The overarching goal of the PPC is to check that the data generating model generates simulated data that matches closely to the actual data. If this is the case, then we have a model that probably describes the data generating process well. If this is not the case, then we have evidence to go guide us towards re-doing the model.\n",
     "\n",
-    "#### Exercise\n",
-    "\n",
-    "To do a PPC, PyMC3 provides a `sample_ppc` function, which allows us to draw samples from the posterior distribution as a check. Run the following cell, filling in the appropriate `trace` and `model`."
+    "**Exercise:** To do a PPC, PyMC3 provides a `sample_ppc` function, which allows us to draw samples from the posterior distribution as a check. Run the following cell, filling in the appropriate `trace` and `model`."
    ]
   },
   {
@@ -329,9 +317,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise\n",
-    "\n",
-    "Let's now plot the ECDF of the sampled data against the original data."
+    "**Exercise:** Let's now plot the ECDF of the sampled data against the original data."
    ]
   },
   {
@@ -376,11 +362,14 @@
    "source": [
     "It looks like we have a model that, just by eyeballing the charts, models pretty well the distribution of the observed data.\n",
     "\n",
-    "For pedagogical brevity, we did not dive into a case where the model was plausibly but nonetheless incorrectly specified. Under an incorrect model, we would expect the PPC and data distributions to be anywhere from moderately to wildly off. Having detected this from a visual comparison of the PPC samples and data, we would go back and try to see where we went wrong. We might also opt to quantify this difference using the tools provided in PyMC3. \n",
-    "\n",
-    "#### Exercise\n",
-    "\n",
-    "Now, let us evaluate whether the drug actually did have an effect. Recall that we computed the difference in means, as well as an effect size, both with uncertainty. Using this information, plot the posterior distribution of the difference in means and effect sizes."
+    "For pedagogical brevity, we did not dive into a case where the model was plausibly but nonetheless incorrectly specified. Under an incorrect model, we would expect the PPC and data distributions to be anywhere from moderately to wildly off. Having detected this from a visual comparison of the PPC samples and data, we would go back and try to see where we went wrong. We might also opt to quantify this difference using the tools provided in PyMC3. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Exercise:** Now, let us evaluate whether the drug actually did have an effect. Recall that we computed the difference in means, as well as an effect size, both with uncertainty. Using this information, plot the posterior distribution of the difference in means and effect sizes."
    ]
   },
   {
@@ -407,9 +396,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise\n",
-    "\n",
-    "Compute the p-value of the t-test for this dataset."
+    "**Exercise:** Compute the p-value of the t-test for this dataset."
    ]
   },
   {
@@ -427,17 +414,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Discuss\n",
-    "\n",
-    "1. Is there a significant difference between the drug-treated and placebo-treated participants of the intervention?\n",
+    "**Discuss**:\n",
+    "1. Is there a significant difference between the drug-treated and placebo-treated participants of the intervention? (This question is intentionally vague on the definition of \"significant\", to encourage discussion of the difference between statistical and practical significance.)\n",
     "1. Would you recommend the intervention as a method to raise people's IQ? How much money would you be willing to pay for this intervention?"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Further Reading/Watching\n",
+    "## Further Reading/Watching\n",
     "\n",
     "- PyMC3's documentation contains an example of how to do [model selection][model_selection], which we did not touch on here. \n",
     "- John Kruschke's paper on [Bayesian Estimation][bayes_est] is what this notebook's example is based on. There is also a [YouTube video][bayes_yt] available.\n",
 
@@ -109,9 +109,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Exercise \n",
-    "\n",
-    "View a random sample of 5 rows to get a feel for the structure of the data."
+    "**Exercise:** View a random sample of 5 rows to get a feel for the structure of the data."
    ]
   },
   {
@@ -127,9 +125,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Exercise \n",
-    "\n",
-    "To help you visualize what data are available and missing in the dataframe, run the cell below to get a visual matrix (using MissingNo). (By the way, be sure to make use of this awesome tool in your data analysis!)"
+    "**Exercise:** To help you visualize what data are available and missing in the dataframe, run the cell below to get a visual matrix (using MissingNo). (By the way, be sure to make use of this awesome tool in your data analysis!)"
    ]
   },
   {
@@ -146,9 +142,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Exercise \n",
-    "\n",
-    "Plot the average percentage reduction in colonies for each treatment."
+    "**Exercise:** Plot the average percentage reduction in colonies for each treatment."
    ]
   },
   {
@@ -167,9 +161,11 @@
    "source": [
     "### Step 3: Implement and Fit Model\n",
     "\n",
-    "#### Exercise\n",
+    "**Exercise:** Write the generative model for the data. \n",
+    "\n",
+    "To help you, this is a diagrammed version of the model below.\n",
     "\n",
-    "Write the generative model for the data. "
+    "![](../images/bacteria_model.jpg)"
    ]
   },
   {
@@ -218,8 +214,6 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Exercise\n",
-    "\n",
     "Check the traces to make sure that sampling has converged."
    ]
   },
@@ -237,20 +231,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Exercise\n",
-    "\n",
     "Visualize the posterior distributions of percentage reduction"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "sorted(mapping)"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -265,9 +248,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Discussion\n",
-    "\n",
-    "Find a neighbour who is working on the same notebook, and discuss this together.\n",
+    "**Discussion:** Find a neighbour who is working on the same notebook, and discuss this together.\n",
     "\n",
     "- Which method of sterilization is the most effective? \n",
     "- Given the data, is there any uncertainty surrounding this? Could we still be wrong about the uncertainty?"
@@ -283,17 +264,6 @@
     "- We estimate parameter of interest for each group, and then compare the parameter posterior distributions."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Hints\n",
-    "\n",
-    "A graphical version of one possible model implementation is provided below.\n",
-    "\n",
-    "![](../images/bacteria_model.jpg)"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,