
Commit fc866fd: Merge pull request #1493 from d2l-ai/master (Release v0.15.0)
2 parents: ac4e912 + 5f08dc6

209 files changed: +18240 additions, -2262 deletions


README.md
Lines changed: 1 addition & 1 deletion

@@ -63,4 +63,4 @@ This open source book is made available under the Creative Commons Attribution-S

 The sample and reference code within this open source book is made available under a modified MIT license. See the [LICENSE-SAMPLECODE](LICENSE-SAMPLECODE) file.

-[Chinese version](https://github.com/d2l-ai/d2l-zh) | [Discuss and report issues](https://discuss.d2l.ai/) | [Other Information](INFO.md)
+[Chinese version](https://github.com/d2l-ai/d2l-zh) | [Discuss and report issues](https://discuss.d2l.ai/) | [Code of conduct](CODE_OF_CONDUCT.md) | [Other Information](INFO.md)

chapter_appendix-mathematics-for-deep-learning/distributions.md
Lines changed: 2 additions & 2 deletions

@@ -334,7 +334,7 @@ def binom(n, k):
         comb = comb * (n - i) // (i + 1)
     return comb

-pmf = torch.tensor([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])
+pmf = d2l.tensor([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])

 d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
 d2l.plt.xlabel('x')

@@ -672,7 +672,7 @@ d2l.plot(x, np.array([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')
 ```{.python .input}
 #@tab pytorch
 def phi(x):
-    return (1.0 + erf((x - mu) / (sigma * torch.sqrt(torch.tensor(2.))))) / 2.0
+    return (1.0 + erf((x - mu) / (sigma * torch.sqrt(d2l.tensor(2.))))) / 2.0

 d2l.plot(x, torch.tensor([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')
 ```
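Both fixes above replace a direct `torch.tensor` call with the framework-agnostic `d2l.tensor` wrapper; the math being plotted is unchanged. As a framework-free sketch of the two quantities involved (the binomial p.m.f. using the same `binom` helper, and the normal c.d.f. via `math.erf`), with `n`, `p`, `mu`, and `sigma` chosen purely for illustration:

```python
import math

def binom(n, k):
    # iterative binomial coefficient, as in the diff above
    comb = 1
    for i in range(min(k, n - k)):
        comb = comb * (n - i) // (i + 1)
    return comb

n, p = 10, 0.3  # illustrative values; the chapter sets its own
pmf = [p**i * (1 - p)**(n - i) * binom(n, i) for i in range(n + 1)]

mu, sigma = 0.0, 1.0
def phi(x):
    # normal c.d.f., mirroring the patched phi but with math.erf
    return (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0)))) / 2.0
```

The p.m.f. entries sum to one by the binomial theorem, and `phi(mu)` evaluates to exactly one half, as a sanity check.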

chapter_appendix-mathematics-for-deep-learning/eigendecomposition.md
Lines changed: 1 addition & 1 deletion

@@ -538,7 +538,7 @@ for all practical purposes, our random vector has been transformed
 into the principle eigenvector!
 Indeed this algorithm is the basis
 for what is known as the *power iteration*
-for finding the largest eigenvalue and eigenvector of a matrix. For details see, for example, :cite:`Van-Loan.Golub.1983`.
+for finding the largest eigenvalue and eigenvector of a matrix. For details see, for example, :cite:`Van-Loan.Golub.1983`.

 ### Fixing the Normalization
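The line touched here names the *power iteration*. A minimal pure-Python sketch of that algorithm (the matrix `A`, the starting vector, and the iteration count are illustrative assumptions, not taken from the chapter):

```python
# Power iteration on an illustrative symmetric 2x2 matrix.
A = [[2.0, 1.0],
     [1.0, 3.0]]

def matvec(M, v):
    # matrix-vector product with plain lists
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

v = normalize([1.0, 0.0])  # arbitrary starting vector
for _ in range(100):
    v = normalize(matvec(A, v))  # repeatedly apply A and renormalize

# Rayleigh quotient v^T A v estimates the largest eigenvalue
lam = sum(vi * wi for vi, wi in zip(v, matvec(A, v)))
```

Repeated multiplication by `A` drives any starting vector with a nonzero component along the principal eigenvector toward that eigenvector; the Rayleigh quotient then recovers the dominant eigenvalue, here (5 + sqrt(5)) / 2.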

chapter_appendix-mathematics-for-deep-learning/geometry-linear-algebraic-ops.md
Lines changed: 194 additions & 194 deletions (large diff not rendered)

chapter_appendix-mathematics-for-deep-learning/information-theory.md
Lines changed: 34 additions & 33 deletions (large diff not rendered)

chapter_appendix-mathematics-for-deep-learning/integral-calculus.md
Lines changed: 18 additions & 18 deletions

@@ -1,7 +1,7 @@
 # Integral Calculus
 :label:`sec_integral_calculus`

-Differentiation only makes up half of the content of a traditional calculus education. The other pillar, integration, starts out seeming a rather disjoint question, "What is the area underneath this curve?" While seemingly unrelated, integration is tightly intertwined with the differentiation via what is known as the *fundamental theorem of calculus*.
+Differentiation only makes up half of the content of a traditional calculus education. The other pillar, integration, starts out seeming a rather disjoint question, "What is the area underneath this curve?" While seemingly unrelated, integration is tightly intertwined with the differentiation via what is known as the *fundamental theorem of calculus*.

 At the level of machine learning we discuss in this book, we will not need a deep understanding of integration. However, we will provide a brief introduction to lay the groundwork for any further applications we will encounter later on.
@@ -187,7 +187,7 @@ We will instead take a different approach. We will work intuitively with the no

 ## The Fundamental Theorem of Calculus

-To dive deeper into the theory of integration, let us introduce a function
+To dive deeper into the theory of integration, let us introduce a function

 $$
 F(x) = \int_0^x f(y) dy.

@@ -201,10 +201,10 @@ $$

 This is a mathematical encoding of the fact that we can measure the area out to the far end-point and then subtract off the area to the near end point as indicated in :numref:`fig_area-subtract`.

-![Visualizing why we may reduce the problem of computing the area under a curve between two points to computing the area to the left of a point.](../img/SubArea.svg)
+![Visualizing why we may reduce the problem of computing the area under a curve between two points to computing the area to the left of a point.](../img/sub-area.svg)
 :label:`fig_area-subtract`

-Thus, we can figure out what the integral over any interval is by figuring out what $F(x)$ is.
+Thus, we can figure out what the integral over any interval is by figuring out what $F(x)$ is.

 To do so, let us consider an experiment. As we often do in calculus, let us imagine what happens when we shift the value by a tiny bit. From the comment above, we know that
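The hunks above introduce $F(x) = \int_0^x f(y) \; dy$, the running integral at the heart of the fundamental theorem: differentiating $F$ recovers $f$. A quick numerical sketch of that fact (the integrand, step count, and $\epsilon$ below are illustrative choices):

```python
import math

def f(y):
    return math.cos(y)  # illustrative integrand; F(x) is then sin(x)

def F(x, steps=100000):
    # F(x) = integral of f from 0 to x, via a midpoint Riemann sum
    h = x / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

x, eps = 1.0, 1e-5
derivative = (F(x + eps) - F(x)) / eps  # finite difference of the running integral
```

The finite difference `derivative` lands close to `f(x)`, which is exactly the statement of the theorem for this choice of integrand.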

@@ -259,7 +259,7 @@ First, suppose that we have a function which is itself an integral:

 $$
 F(x) = \int_0^x f(y) \; dy.
-$$
+$$

 Let us suppose that we want to know how this function looks when we compose it with another to obtain $F(u(x))$. By the chain rule, we know

@@ -286,16 +286,16 @@ $$\int_{u(0)}^{u(x)} f(y) \; dy = \int_0^x f(u(y))\cdot \frac{du}{dy} \;dy.$$

 This is the *change of variables* formula.

-For a more intuitive derivation, consider what happens when we take an integral of $f(u(x))$ between $x$ and $x+\epsilon$. For a small $\epsilon$, this integral is approximately $\epsilon f(u(x))$, the area of the associated rectangle. Now, let us compare this with the integral of $f(y)$ from $u(x)$ to $u(x+\epsilon)$. We know that $u(x+\epsilon) \approx u(x) + \epsilon \frac{du}{dx}(x)$, so the area of this rectangle is approximately $\epsilon \frac{du}{dx}(x)f(u(x))$. Thus, to make the area of these two rectangles to agree, we need to multiply the first one by $\frac{du}{dx}(x)$ as is illustrated in :numref:`fig_rect-transform`.
+For a more intuitive derivation, consider what happens when we take an integral of $f(u(x))$ between $x$ and $x+\epsilon$. For a small $\epsilon$, this integral is approximately $\epsilon f(u(x))$, the area of the associated rectangle. Now, let us compare this with the integral of $f(y)$ from $u(x)$ to $u(x+\epsilon)$. We know that $u(x+\epsilon) \approx u(x) + \epsilon \frac{du}{dx}(x)$, so the area of this rectangle is approximately $\epsilon \frac{du}{dx}(x)f(u(x))$. Thus, to make the area of these two rectangles to agree, we need to multiply the first one by $\frac{du}{dx}(x)$ as is illustrated in :numref:`fig_rect-transform`.

-![Visualizing the transformation of a single thin rectangle under the change of variables.](../img/RectTrans.svg)
+![Visualizing the transformation of a single thin rectangle under the change of variables.](../img/rect-trans.svg)
 :label:`fig_rect-transform`

 This tells us that

 $$
 \int_x^{x+\epsilon} f(u(y))\frac{du}{dy}(y)\;dy = \int_{u(x)}^{u(x+\epsilon)} f(y) \; dy.
-$$
+$$

 This is the change of variables formula expressed for a single small rectangle.
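The rectangle argument in the hunk above is the change of variables formula in miniature. A numerical sketch checking the full formula $\int_{u(0)}^{u(x)} f(y)\;dy = \int_0^x f(u(y)) \frac{du}{dy}\;dy$ for the illustrative choices $u(x) = x^2$ and $f(y) = \cos(y)$, for which both sides equal $\sin(1)$:

```python
import math

def riemann(f, a, b, steps=100000):
    # midpoint-rule approximation of the integral of f over [a, b]
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

u = lambda x: x * x          # illustrative substitution
du = lambda x: 2 * x         # its derivative
f = lambda y: math.cos(y)    # illustrative integrand

lhs = riemann(lambda y: f(u(y)) * du(y), 0.0, 1.0)  # substituted integral
rhs = riemann(f, u(0.0), u(1.0))                    # original integral
```

The two Riemann sums agree to high precision, matching the rectangle-by-rectangle argument in the text.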

@@ -404,7 +404,7 @@ ax.set_zlim(0, 1)
 ax.dist = 12
 ```

-We write this as
+We write this as

 $$
 \int_{[a, b]\times[c, d]} f(x, y)\;dx\;dy.

@@ -416,7 +416,7 @@ $$
 \int_{[a, b]\times[c, d]} f(x, y)\;dx\;dy = \int_c^{d} \left(\int_a^{b} f(x, y) \;dx\right) \; dy.
 $$

-Let us see why this is.
+Let us see why this is.

 Consider the figure above where we have split the function into $\epsilon \times \epsilon$ squares which we will index with integer coordinates $i, j$. In this case, our integral is approximately

@@ -430,16 +430,16 @@ $$
 \sum _ {j} \epsilon \left(\sum_{i} \epsilon f(\epsilon i, \epsilon j)\right).
 $$

-![Illustrating how to decompose a sum over many squares as a sum over first the columns (1), then adding the column sums together (2).](../img/SumOrder.svg)
+![Illustrating how to decompose a sum over many squares as a sum over first the columns (1), then adding the column sums together (2).](../img/sum-order.svg)
 :label:`fig_sum-order`

-The sum on the inside is precisely the discretization of the integral
+The sum on the inside is precisely the discretization of the integral

 $$
 G(\epsilon j) = \int _a^{b} f(x, \epsilon j) \; dx.
 $$

-Finally, notice that if we combine these two expressions we get
+Finally, notice that if we combine these two expressions we get

 $$
 \sum _ {j} \epsilon G(\epsilon j) \approx \int _ {c}^{d} G(y) \; dy = \int _ {[a, b]\times[c, d]} f(x, y)\;dx\;dy.
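The decomposition in the hunk above (sum each column first, then add the column sums) can be checked numerically: reorganizing the double Riemann sum does not change its value. The integrand, domain, and grid size below are illustrative choices:

```python
import math

f = lambda x, y: math.exp(-(x + y))  # illustrative integrand
a, b, c, d = 0.0, 1.0, 0.0, 1.0      # unit square [a, b] x [c, d]
n = 400
hx, hy = (b - a) / n, (d - c) / n

# Double Riemann sum over the grid of epsilon x epsilon squares
total = sum(f(a + (i + 0.5) * hx, c + (j + 0.5) * hy) * hx * hy
            for i in range(n) for j in range(n))

# Same sum reorganized: inner sum over each column first (the G term),
# then the outer sum adds the column totals
by_columns = sum(hy * sum(hx * f(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
                          for i in range(n))
                 for j in range(n))
```

Both orderings agree with each other and with the exact value $(1 - e^{-1})^2$ for this separable integrand.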
@@ -466,9 +466,9 @@ $$
 $$

 ## Change of Variables in Multiple Integrals
-As with single variables in :eqref:`eq_change_var`, the ability to change variables inside a higher dimensional integral is a key tool. Let us summarize the result without derivation.
+As with single variables in :eqref:`eq_change_var`, the ability to change variables inside a higher dimensional integral is a key tool. Let us summarize the result without derivation.

-We need a function that reparameterizes our domain of integration. We can take this to be $\phi : \mathbb{R}^n \rightarrow \mathbb{R}^n$, that is any function which takes in $n$ real variables and returns another $n$. To keep the expressions clean, we will assume that $\phi$ is *injective* which is to say it never folds over itself ($\phi(\mathbf{x}) = \phi(\mathbf{y}) \implies \mathbf{x} = \mathbf{y}$).
+We need a function that reparameterizes our domain of integration. We can take this to be $\phi : \mathbb{R}^n \rightarrow \mathbb{R}^n$, that is any function which takes in $n$ real variables and returns another $n$. To keep the expressions clean, we will assume that $\phi$ is *injective* which is to say it never folds over itself ($\phi(\mathbf{x}) = \phi(\mathbf{y}) \implies \mathbf{x} = \mathbf{y}$).

 In this case, we can say that

@@ -486,7 +486,7 @@ D\boldsymbol{\phi} = \begin{bmatrix}
 \end{bmatrix}.
 $$

-Looking closely, we see that this is similar to the single variable chain rule :eqref:`eq_change_var`, except we have replaced the term $\frac{du}{dx}(x)$ with $\left|\det(D\phi(\mathbf{x}))\right|$. Let us see how we can to interpret this term. Recall that the $\frac{du}{dx}(x)$ term existed to say how much we stretched our $x$-axis by applying $u$. The same process in higher dimensions is to determine how much we stretch the area (or volume, or hyper-volume) of a little square (or little *hyper-cube*) by applying $\boldsymbol{\phi}$. If $\boldsymbol{\phi}$ was the multiplication by a matrix, then we know how the determinant already gives the answer.
+Looking closely, we see that this is similar to the single variable chain rule :eqref:`eq_change_var`, except we have replaced the term $\frac{du}{dx}(x)$ with $\left|\det(D\phi(\mathbf{x}))\right|$. Let us see how we can to interpret this term. Recall that the $\frac{du}{dx}(x)$ term existed to say how much we stretched our $x$-axis by applying $u$. The same process in higher dimensions is to determine how much we stretch the area (or volume, or hyper-volume) of a little square (or little *hyper-cube*) by applying $\boldsymbol{\phi}$. If $\boldsymbol{\phi}$ was the multiplication by a matrix, then we know how the determinant already gives the answer.

 With some work, one can show that the *Jacobian* provides the best approximation to a multivariable function $\boldsymbol{\phi}$ at a point by a matrix in the same way we could approximate by lines or planes with derivatives and gradients. Thus the determinant of the Jacobian exactly mirrors the scaling factor we identified in one dimension.
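The hunk above interprets $\left|\det(D\boldsymbol{\phi})\right|$ as a local area-scaling factor. A small sketch computing the Jacobian determinant of the polar-coordinate map by central differences; for this map the determinant is exactly $r$ (the map, test points, and step size are illustrative):

```python
import math

def phi(r, theta):
    # polar-coordinate map, an illustrative phi: R^2 -> R^2
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta, h=1e-6):
    # numerical Jacobian columns via central differences, then its determinant
    x_r = [(a - b) / (2 * h) for a, b in zip(phi(r + h, theta), phi(r - h, theta))]
    x_t = [(a - b) / (2 * h) for a, b in zip(phi(r, theta + h), phi(r, theta - h))]
    return x_r[0] * x_t[1] - x_r[1] * x_t[0]
```

Analytically the columns are $(\cos\theta, \sin\theta)$ and $(-r\sin\theta, r\cos\theta)$, so the determinant collapses to $r$, the familiar factor in $r\,dr\,d\theta$.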

@@ -502,7 +502,7 @@ $$
 \int _ 0^\infty \int_0 ^ {2\pi} e^{-r^{2}} \left|\det(D\mathbf{\phi}(\mathbf{x}))\right|\;d\theta\;dr,
 $$

-where
+where

 $$
 \left|\det(D\mathbf{\phi}(\mathbf{x}))\right| = \left|\det\begin{bmatrix}

@@ -517,7 +517,7 @@ $$
 \int _ 0^\infty \int _ 0 ^ {2\pi} re^{-r^{2}} \;d\theta\;dr = 2\pi\int _ 0^\infty re^{-r^{2}} \;dr = \pi,
 $$

-where the final equality follows by the same computation that we used in section :numref:`integral_example`.
+where the final equality follows by the same computation that we used in section :numref:`integral_example`.

 We will meet this integral again when we study continuous random variables in :numref:`sec_random_variables`.
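The final hunks evaluate the Gaussian integral in polar coordinates, $2\pi \int_0^\infty re^{-r^2}\,dr = \pi$. A numerical sketch confirming this, and cross-checking against the square of the one-dimensional Gaussian integral (truncation points and step counts are illustrative; the tails beyond 10 are negligible):

```python
import math

def riemann(f, a, b, steps=200000):
    # midpoint-rule approximation of the integral of f over [a, b]
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Radial factor: integral of r*exp(-r^2) from 0 to infinity is 1/2,
# truncated at r = 10 where the integrand is vanishingly small
radial = riemann(lambda r: r * math.exp(-r * r), 0.0, 10.0)
polar_value = 2 * math.pi * radial  # the polar-coordinate evaluation

# Cross-check: the square of the 1-D Gaussian integral gives the same pi
one_d = riemann(lambda x: math.exp(-x * x), -10.0, 10.0)
```

Both routes land on $\pi$, which is the punchline of the polar-coordinates trick the section describes.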
