Commit 626fb0b

Merge pull request #151 from jiangzhonglian/master
Update the chapter 15 project case study
2 parents df0396d + 2341737 commit 626fb0b

6 files changed

Lines changed: 81 additions & 355 deletions

docs/15.大数据与MapReduce.md

Lines changed: 17 additions & 12 deletions
@@ -10,14 +10,14 @@
1010
## Big data scenario
1111

1212
```
13-
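Suppose you work for an online shopping store. Many supporters visit the site; some of them buy products, while others browse casually and then leave.
13+
Suppose you work for an online shopping store. Many users visit the site; some of them buy products, while others browse casually and then leave.
1414
You would probably like to identify the users who intend to buy.
1515
Here is the problem: the dataset may be very large, and training on a single machine would take several days.
16-
Next, we will look at how Hadoop solves this kind of problem
16+
Next, we will look at how MapRedece solves this kind of problem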
1717
```
1818

1919

20-
## MapReduce
20+
## MapRedece
2121

2222
### Hadoop overview
2323

@@ -79,21 +79,21 @@ cat input/15.BigData_MapReduce/inputFile.txt | python src/python/15.BigData_MapR
7979

8080
#### Mahout in Action
8181

82-
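1. Naive Bayes:
83-
2. k-nearest neighbors:
82+
1. Naive Bayes: one of the few algorithms that map naturally onto MapReduce. It works by tallying the probability of each feature within each class.
83+
2. k-nearest neighbors: for high-dimensional data (such as text, images, and video), a popular nearest-neighbor search method is locality-sensitive hashing.
8484
3. Support vector machines (SVM): solved with stochastic gradient descent, for example the Pegasos algorithm.
8585
4. Singular value decomposition: the Lanczos algorithm is an efficient method for computing approximate eigenvalues.
8686
5. k-means clustering: the canopy algorithm initializes the k clusters, and k-means is then run to obtain the result.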
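
To make the Naive Bayes item concrete, here is a minimal single-process sketch of the counting step written in map/reduce style. It is not Mahout's implementation: the `mapper`, `shuffle_and_reduce`, and `cond_prob` names and the toy documents are all hypothetical, and no smoothing is applied.

```python
from collections import defaultdict


def mapper(label, features):
    """Emit one count for the class and one per distinct feature seen in it."""
    yield ("class", label), 1
    for feature in sorted(set(features)):
        yield ("feature", label, feature), 1


def shuffle_and_reduce(pairs):
    """Group the emitted pairs by key and sum the values (the reduce step)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def cond_prob(counts, label, feature):
    """Fraction of documents of class `label` that contain `feature`."""
    return counts.get(("feature", label, feature), 0) / counts[("class", label)]


# Hypothetical toy data: (class label, list of features) per document.
docs = [
    ("spam", ["buy", "now"]),
    ("spam", ["buy", "cheap"]),
    ("ham", ["meeting", "now"]),
]

pairs = [kv for label, feats in docs for kv in mapper(label, feats)]
counts = shuffle_and_reduce(pairs)
```

Because each mapper call only looks at one document and the reducer only sums, the documents could be split across machines without changing the counts.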
8787

88-
#### Automating MapReduce with the mrjob library
88+
### Automating MapReduce with the mrjob library
8989

9090
> Theory overview
9191
92-
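* Frameworks for automating MapReduce job flows: Cascading and Oozie.
93-
* mrjob is a good learning tool; it was open-sourced at the end of 2010 and comes from Yelp (a restaurant review site).
92+
* Frameworks for automating MapReduce job flows: Cascading and Oozie.
93+
* mrjob is a good learning tool; it was open-sourced at the end of 2010 and comes from Yelp (a restaurant review site).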
9494

9595
```Shell
96-
python mrMean.py < inputFile.txt > myOut.txt
96+
python src/python/15.BigData_MapReduce/mrMean.py < input/15.BigData_MapReduce/inputFile.txt > input/15.BigData_MapReduce/myOut.txt
9797
```
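
The command above pipes a file of numbers through a map/reduce job. As a rough illustration of what such a job does, the sketch below computes a mean and variance from per-chunk summaries in plain Python. It assumes, from the script name only, that `mrMean.py` computes a mean. It does not use mrjob, and `map_partial`, `combine`, `reduce_mean_var`, and `chunked` are hypothetical names.

```python
from functools import reduce


def map_partial(values):
    """Mapper: summarize one chunk as (count, sum, sum of squares)."""
    values = list(values)
    return (len(values), sum(values), sum(v * v for v in values))


def combine(a, b):
    """Merge two partial summaries element by element."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def reduce_mean_var(partials):
    """Reducer: merge all summaries, then derive mean and population variance."""
    n, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    mean = total / n
    var = total_sq / n - mean * mean
    return n, mean, var


def chunked(seq, size):
    """Split a list into consecutive chunks, standing in for input splits."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative input
partials = [map_partial(chunk) for chunk in chunked(data, 4)]
n, mean, var = reduce_mean_var(partials)
```

Each mapper only needs its own chunk, and the reducer only needs three numbers per chunk, which is what makes the computation easy to distribute.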
9898

9999
> Hands-on script
@@ -106,11 +106,11 @@ python mrMean.py < inputFile.txt > myOut.txt
106106
python src/python/15.BigData_MapReduce/mrMean.py < input/15.BigData_MapReduce/inputFile.txt
107107
```
108108

109-
#### Training a support vector machine in parallel with the Pegasos algorithm
109+
### Project case study: the Pegasos algorithm for distributed SVM
110110

111111
Pegasos stands for Primal Estimated sub-GrAdient SOlver.
112112

113-
> How Pegasos works
113+
#### How Pegasos works
114114

115115
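1. Randomly pick some sample points from the training set and add them to the to-be-processed list
116116
2. Check, in order, whether each sample point is classified correctly
@@ -130,7 +130,7 @@ Pegasos stands for Primal Estimated sub-GrAdient SOlver.
130130
Accumulate the updates to w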
131131
```
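
The steps above can be sketched on a single machine as follows. This is a minimal illustration of a mini-batch Pegasos update for a linear SVM without a bias term, not the project's mrjob code. The `batch_pegasos` function, its parameters (`lam`, `iterations`, `batch_size`, `seed`), and the toy data are assumptions made for the example.

```python
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def batch_pegasos(samples, labels, lam=0.1, iterations=50, batch_size=4, seed=0):
    """Train a linear SVM weight vector with mini-batch Pegasos updates."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim
    for t in range(1, iterations + 1):
        eta = 1.0 / (lam * t)  # step size shrinks as 1 / (lambda * t)
        batch = rng.sample(range(len(samples)), batch_size)
        update = [0.0] * dim
        for i in batch:
            # A sample contributes only if it violates the margin.
            if labels[i] * dot(w, samples[i]) < 1:
                for j in range(dim):
                    update[j] += labels[i] * samples[i][j]
        # Shrink w, then add the accumulated update averaged over the batch.
        w = [(1 - 1.0 / t) * w[j] + (eta / batch_size) * update[j]
             for j in range(dim)]
    return w


# Hypothetical, linearly separable toy data (labels are +1 / -1).
samples = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (2.5, 2.5),
           (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0), (-2.5, -2.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w = batch_pegasos(samples, labels)
predictions = [1 if dot(w, x) > 0 else -1 for x in samples]
```

In a MapReduce setting, the margin check inside the inner loop is the part that can run in mappers, one sample at a time, while summing `update` and applying it to `w` is the reducer's job.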
132132

133-
> Development workflow
133+
#### Development workflow
134134

135135
```
136136
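Collect data: the data is stored in text format.
@@ -141,6 +141,11 @@ Pegasos stands for Primal Estimated sub-GrAdient SOlver.
141141
Use the algorithm: this example does not show a complete application, but it does show how to train an SVM on a large dataset. One application of the algorithm is text classification, which typically involves a large number of documents and thousands of features.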
142142
```
143143

144+
> Training the algorithm
145+
146+
[Full code](https://github.com/apachecn/MachineLearning/blob/master/src/python/2.KNN/kNN.py): <https://github.com/apachecn/MachineLearning/blob/master/src/python/2.KNN/kNN.py>
147+
148+
144149
Next, let's look at the Python implementation.
145150

146151
* * *
-22.2 KB
Lines changed: 1 addition & 303 deletions
@@ -1,303 +1 @@
1-
shape(self.w): (2,)
2-
shape(X[index,:]) (1, 2)
3-
1 ["u", 79]
4-
shape(self.w): (2,)
5-
shape(X[index,:]) (1, 2)
6-
1 ["u", 115]
7-
shape(self.w): (2,)
8-
shape(X[index,:]) (1, 2)
9-
1 ["u", 107]
10-
shape(self.w): (2,)
11-
shape(X[index,:]) (1, 2)
12-
1 ["u", 109]
13-
shape(self.w): (2,)
14-
shape(X[index,:]) (1, 2)
15-
1 ["u", 109]
16-
shape(self.w): (2,)
17-
shape(X[index,:]) (1, 2)
18-
1 ["u", 88]
19-
shape(self.w): (2,)
20-
shape(X[index,:]) (1, 2)
21-
1 ["u", 56]
22-
shape(self.w): (2,)
23-
shape(X[index,:]) (1, 2)
24-
1 ["u", 94]
25-
shape(self.w): (2,)
26-
shape(X[index,:]) (1, 2)
27-
1 ["u", 50]
28-
shape(self.w): (2,)
29-
shape(X[index,:]) (1, 2)
30-
1 ["u", 86]
31-
shape(self.w): (2,)
32-
shape(X[index,:]) (1, 2)
33-
1 ["u", 75]
34-
shape(self.w): (2,)
35-
shape(X[index,:]) (1, 2)
36-
1 ["u", 30]
37-
shape(self.w): (2,)
38-
shape(X[index,:]) (1, 2)
39-
1 ["u", 20]
40-
shape(self.w): (2,)
41-
shape(X[index,:]) (1, 2)
42-
1 ["u", 157]
43-
shape(self.w): (2,)
44-
shape(X[index,:]) (1, 2)
45-
1 ["u", 15]
46-
shape(self.w): (2,)
47-
shape(X[index,:]) (1, 2)
48-
1 ["u", 19]
49-
shape(self.w): (2,)
50-
shape(X[index,:]) (1, 2)
51-
1 ["u", 63]
52-
shape(self.w): (2,)
53-
shape(X[index,:]) (1, 2)
54-
1 ["u", 124]
55-
shape(self.w): (2,)
56-
shape(X[index,:]) (1, 2)
57-
1 ["u", 132]
58-
shape(self.w): (2,)
59-
shape(X[index,:]) (1, 2)
60-
1 ["u", 3]
61-
shape(self.w): (2,)
62-
shape(X[index,:]) (1, 2)
63-
1 ["u", 140]
64-
shape(self.w): (2,)
65-
shape(X[index,:]) (1, 2)
66-
1 ["u", 139]
67-
shape(self.w): (2,)
68-
shape(X[index,:]) (1, 2)
69-
1 ["u", 127]
70-
shape(self.w): (2,)
71-
shape(X[index,:]) (1, 2)
72-
1 ["u", 98]
73-
shape(self.w): (2,)
74-
shape(X[index,:]) (1, 2)
75-
1 ["u", 30]
76-
shape(self.w): (2,)
77-
shape(X[index,:]) (1, 2)
78-
1 ["u", 16]
79-
shape(self.w): (2,)
80-
shape(X[index,:]) (1, 2)
81-
1 ["u", 4]
82-
shape(self.w): (2,)
83-
shape(X[index,:]) (1, 2)
84-
1 ["u", 2]
85-
shape(self.w): (2,)
86-
shape(X[index,:]) (1, 2)
87-
1 ["u", 75]
88-
shape(self.w): (2,)
89-
shape(X[index,:]) (1, 2)
90-
1 ["u", 123]
91-
shape(self.w): (2,)
92-
shape(X[index,:]) (1, 2)
93-
1 ["u", 42]
94-
shape(self.w): (2,)
95-
shape(X[index,:]) (1, 2)
96-
1 ["u", 16]
97-
shape(self.w): (2,)
98-
shape(X[index,:]) (1, 2)
99-
1 ["u", 94]
100-
shape(self.w): (2,)
101-
shape(X[index,:]) (1, 2)
102-
1 ["u", 163]
103-
shape(self.w): (2,)
104-
shape(X[index,:]) (1, 2)
105-
1 ["u", 159]
106-
shape(self.w): (2,)
107-
shape(X[index,:]) (1, 2)
108-
1 ["u", 23]
109-
shape(self.w): (2,)
110-
shape(X[index,:]) (1, 2)
111-
1 ["u", 16]
112-
shape(self.w): (2,)
113-
shape(X[index,:]) (1, 2)
114-
1 ["u", 160]
115-
shape(self.w): (2,)
116-
shape(X[index,:]) (1, 2)
117-
1 ["u", 5]
118-
shape(self.w): (2,)
119-
shape(X[index,:]) (1, 2)
120-
1 ["u", 42]
121-
shape(self.w): (2,)
122-
shape(X[index,:]) (1, 2)
123-
1 ["u", 53]
124-
shape(self.w): (2,)
125-
shape(X[index,:]) (1, 2)
126-
1 ["u", 83]
127-
shape(self.w): (2,)
128-
shape(X[index,:]) (1, 2)
129-
1 ["u", 46]
130-
shape(self.w): (2,)
131-
shape(X[index,:]) (1, 2)
132-
1 ["u", 121]
133-
shape(self.w): (2,)
134-
shape(X[index,:]) (1, 2)
135-
1 ["u", 73]
136-
shape(self.w): (2,)
137-
shape(X[index,:]) (1, 2)
138-
1 ["u", 123]
139-
shape(self.w): (2,)
140-
shape(X[index,:]) (1, 2)
141-
1 ["u", 93]
142-
shape(self.w): (2,)
143-
shape(X[index,:]) (1, 2)
144-
1 ["u", 99]
145-
shape(self.w): (2,)
146-
shape(X[index,:]) (1, 2)
147-
1 ["u", 106]
148-
shape(self.w): (2,)
149-
shape(X[index,:]) (1, 2)
150-
1 ["u", 173]
151-
shape(self.w): (2,)
152-
shape(X[index,:]) (1, 2)
153-
1 ["u", 192]
154-
shape(self.w): (2,)
155-
shape(X[index,:]) (1, 2)
156-
1 ["u", 132]
157-
shape(self.w): (2,)
158-
shape(X[index,:]) (1, 2)
159-
1 ["u", 57]
160-
shape(self.w): (2,)
161-
shape(X[index,:]) (1, 2)
162-
1 ["u", 47]
163-
shape(self.w): (2,)
164-
shape(X[index,:]) (1, 2)
165-
1 ["u", 164]
166-
shape(self.w): (2,)
167-
shape(X[index,:]) (1, 2)
168-
1 ["u", 157]
169-
shape(self.w): (2,)
170-
shape(X[index,:]) (1, 2)
171-
1 ["u", 199]
172-
shape(self.w): (2,)
173-
shape(X[index,:]) (1, 2)
174-
1 ["u", 62]
175-
shape(self.w): (2,)
176-
shape(X[index,:]) (1, 2)
177-
1 ["u", 175]
178-
shape(self.w): (2,)
179-
shape(X[index,:]) (1, 2)
180-
1 ["u", 154]
181-
shape(self.w): (2,)
182-
shape(X[index,:]) (1, 2)
183-
1 ["u", 110]
184-
shape(self.w): (2,)
185-
shape(X[index,:]) (1, 2)
186-
1 ["u", 0]
187-
shape(self.w): (2,)
188-
shape(X[index,:]) (1, 2)
189-
1 ["u", 116]
190-
shape(self.w): (2,)
191-
shape(X[index,:]) (1, 2)
192-
1 ["u", 49]
193-
shape(self.w): (2,)
194-
shape(X[index,:]) (1, 2)
195-
1 ["u", 76]
196-
shape(self.w): (2,)
197-
shape(X[index,:]) (1, 2)
198-
1 ["u", 121]
199-
shape(self.w): (2,)
200-
shape(X[index,:]) (1, 2)
201-
1 ["u", 178]
202-
shape(self.w): (2,)
203-
shape(X[index,:]) (1, 2)
204-
1 ["u", 75]
205-
shape(self.w): (2,)
206-
shape(X[index,:]) (1, 2)
207-
1 ["u", 167]
208-
shape(self.w): (2,)
209-
shape(X[index,:]) (1, 2)
210-
1 ["u", 41]
211-
shape(self.w): (2,)
212-
shape(X[index,:]) (1, 2)
213-
1 ["u", 105]
214-
shape(self.w): (2,)
215-
shape(X[index,:]) (1, 2)
216-
1 ["u", 71]
217-
shape(self.w): (2,)
218-
shape(X[index,:]) (1, 2)
219-
1 ["u", 5]
220-
shape(self.w): (2,)
221-
shape(X[index,:]) (1, 2)
222-
1 ["u", 135]
223-
shape(self.w): (2,)
224-
shape(X[index,:]) (1, 2)
225-
1 ["u", 80]
226-
shape(self.w): (2,)
227-
shape(X[index,:]) (1, 2)
228-
1 ["u", 116]
229-
shape(self.w): (2,)
230-
shape(X[index,:]) (1, 2)
231-
1 ["u", 198]
232-
shape(self.w): (2,)
233-
shape(X[index,:]) (1, 2)
234-
1 ["u", 164]
235-
shape(self.w): (2,)
236-
shape(X[index,:]) (1, 2)
237-
1 ["u", 105]
238-
shape(self.w): (2,)
239-
shape(X[index,:]) (1, 2)
240-
1 ["u", 98]
241-
shape(self.w): (2,)
242-
shape(X[index,:]) (1, 2)
243-
1 ["u", 156]
244-
shape(self.w): (2,)
245-
shape(X[index,:]) (1, 2)
246-
1 ["u", 72]
247-
shape(self.w): (2,)
248-
shape(X[index,:]) (1, 2)
249-
1 ["u", 54]
250-
shape(self.w): (2,)
251-
shape(X[index,:]) (1, 2)
252-
1 ["u", 62]
253-
shape(self.w): (2,)
254-
shape(X[index,:]) (1, 2)
255-
1 ["u", 57]
256-
shape(self.w): (2,)
257-
shape(X[index,:]) (1, 2)
258-
1 ["u", 87]
259-
shape(self.w): (2,)
260-
shape(X[index,:]) (1, 2)
261-
1 ["u", 68]
262-
shape(self.w): (2,)
263-
shape(X[index,:]) (1, 2)
264-
1 ["u", 163]
265-
shape(self.w): (2,)
266-
shape(X[index,:]) (1, 2)
267-
1 ["u", 140]
268-
shape(self.w): (2,)
269-
shape(X[index,:]) (1, 2)
270-
1 ["u", 40]
271-
shape(self.w): (2,)
272-
shape(X[index,:]) (1, 2)
273-
1 ["u", 70]
274-
shape(self.w): (2,)
275-
shape(X[index,:]) (1, 2)
276-
1 ["u", 120]
277-
shape(self.w): (2,)
278-
shape(X[index,:]) (1, 2)
279-
1 ["u", 172]
280-
shape(self.w): (2,)
281-
shape(X[index,:]) (1, 2)
282-
1 ["u", 71]
283-
shape(self.w): (2,)
284-
shape(X[index,:]) (1, 2)
285-
1 ["u", 82]
286-
shape(self.w): (2,)
287-
shape(X[index,:]) (1, 2)
288-
1 ["u", 168]
289-
shape(self.w): (2,)
290-
shape(X[index,:]) (1, 2)
291-
1 ["u", 42]
292-
shape(self.w): (2,)
293-
shape(X[index,:]) (1, 2)
294-
1 ["u", 144]
295-
shape(self.w): (2,)
296-
shape(X[index,:]) (1, 2)
297-
1 ["u", 27]
298-
shape(self.w): (2,)
299-
shape(X[index,:]) (1, 2)
300-
1 ["u", 36]
301-
in the map_fin
302-
1 ["w", [0.001, 0.001]]
303-
1 ["t", 1]
1+
0.5095697 0.08477803392127012
