apachecn
diff --git a/‎docs/12.使用FP-growth算法来高效发现频繁项集.md‎
Lines changed: 18 additions & 16 deletions b/‎docs/12.使用FP-growth算法来高效发现频繁项集.md‎
Lines changed: 18 additions & 16 deletions
diff --git a/‎…d-growth/apachecn_fd_growth_homepage.png‎ ‎…P-growth/apachecn_fp_growth_homepage.png‎images/12.Fd-growth/apachecn_fd_growth_homepage.png renamed to images/12.FP-growth/apachecn_fp_growth_homepage.png b/‎…d-growth/apachecn_fd_growth_homepage.png‎ ‎…P-growth/apachecn_fp_growth_homepage.png‎images/12.Fd-growth/apachecn_fd_growth_homepage.png renamed to images/12.FP-growth/apachecn_fp_growth_homepage.png
diff --git a/‎images/12.Fd-growth/步骤1-3.png‎ ‎images/12.FP-growth/步骤1-3.png‎images/12.Fd-growth/步骤1-3.png renamed to images/12.FP-growth/步骤1-3.png b/‎images/12.Fd-growth/步骤1-3.png‎ ‎images/12.FP-growth/步骤1-3.png‎images/12.Fd-growth/步骤1-3.png renamed to images/12.FP-growth/步骤1-3.png
diff --git a/‎images/12.Fd-growth/步骤4-5.png‎ ‎images/12.FP-growth/步骤4-5.png‎images/12.Fd-growth/步骤4-5.png renamed to images/12.FP-growth/步骤4-5.png b/‎images/12.Fd-growth/步骤4-5.png‎ ‎images/12.FP-growth/步骤4-5.png‎images/12.Fd-growth/步骤4-5.png renamed to images/12.FP-growth/步骤4-5.png
diff --git a/‎images/12.Fd-growth/步骤6-1.png‎ ‎images/12.FP-growth/步骤6-1.png‎images/12.Fd-growth/步骤6-1.png renamed to images/12.FP-growth/步骤6-1.png b/‎images/12.Fd-growth/步骤6-1.png‎ ‎images/12.FP-growth/步骤6-1.png‎images/12.Fd-growth/步骤6-1.png renamed to images/12.FP-growth/步骤6-1.png
diff --git a/‎images/12.Fd-growth/步骤6-2.png‎ ‎images/12.FP-growth/步骤6-2.png‎images/12.Fd-growth/步骤6-2.png renamed to images/12.FP-growth/步骤6-2.png b/‎images/12.Fd-growth/步骤6-2.png‎ ‎images/12.FP-growth/步骤6-2.png‎images/12.Fd-growth/步骤6-2.png renamed to images/12.FP-growth/步骤6-2.png
diff --git a/‎images/12.Fd-growth/步骤7-1.png‎ ‎images/12.FP-growth/步骤7-1.png‎images/12.Fd-growth/步骤7-1.png renamed to images/12.FP-growth/步骤7-1.png b/‎images/12.Fd-growth/步骤7-1.png‎ ‎images/12.FP-growth/步骤7-1.png‎images/12.Fd-growth/步骤7-1.png renamed to images/12.FP-growth/步骤7-1.png
diff --git a/‎images/12.Fd-growth/步骤7-2.png‎ ‎images/12.FP-growth/步骤7-2.png‎images/12.Fd-growth/步骤7-2.png renamed to images/12.FP-growth/步骤7-2.png b/‎images/12.Fd-growth/步骤7-2.png‎ ‎images/12.FP-growth/步骤7-2.png‎images/12.Fd-growth/步骤7-2.png renamed to images/12.FP-growth/步骤7-2.png
@@ -1,14 +1,16 @@
 
 # 第12章 使用FP-growth算法来高效发现频繁项集
 
-![](../images/12.Fd-growth/apachecn_fd_growth_homepage.png)
+![](../images/12.FP-growth/apachecn_fp_growth_homepage.png)
 
-### 第11章 我们已经介绍了用Apriori算法发现频繁项集与关联规则，本章将继续关注发现频繁项集这一任务，并使用FP-growth算法更有效的挖掘频繁项集。
+## 前言
+在 [第11章]() 时我们已经介绍了用 `Apriori` 算法发现 `频繁项集` 与 `关联规则`。  
+本章将继续关注发现 `频繁项集` 这一任务，并使用 `FP-growth` 算法更有效的挖掘 `频繁项集`。
 
 ## FP-growth 算法简介
 
 * 一种非常好的发现频繁项集算法。
-* 基于Apriori算法构建,但是数据结构不同，使用叫做 FP树 的数据结构结构来存储集合。下面我们会介绍这种数据结构。
+* 基于Apriori算法构建,但是数据结构不同，使用叫做 `FP树` 的数据结构结构来存储集合。下面我们会介绍这种数据结构。
 
 ## FP-growth 算法步骤
 - 基于数据构建FP树   
@@ -35,34 +37,34 @@ class treeNode:
    1. 遍历所有的数据集合，计算所有项的支持度。
    2. 丢弃非频繁的项。
    3. 基于 支持度 降序排序所有的项。
-   ![](../images/12.Fd-growth/步骤1-3.png)
+   ![](../images/12.FP-growth/步骤1-3.png)
    4. 所有数据集合按照得到的顺序重新整理。
    5. 重新整理完成后，丢弃每个集合末尾非频繁的项。
-   ![](../images/12.Fd-growth/步骤4-5.png)
+   ![](../images/12.FP-growth/步骤4-5.png)
 
 步骤2:
    6. 读取每个集合插入FP树中，同时用一个头部链表数据结构维护不同集合的相同项。  
-   ![](../images/12.Fd-growth/步骤6-1.png)
-   最终得到下面这样一棵FD树
-   ![](../images/12.Fd-growth/步骤6-2.png)
+   ![](../images/12.FP-growth/步骤6-1.png)
+   最终得到下面这样一棵FP树
+   ![](../images/12.FP-growth/步骤6-2.png)
 
 
 从FP树中挖掘出频繁项集
 
 步骤3:
    1. 对头部链表进行降序排序
    2. 对头部链表节点从小到大遍历，得到条件模式基，同时获得一个频繁项集。
-        ![](../images/12.Fd-growth/步骤6-2.png)
+        ![](../images/12.FP-growth/步骤6-2.png)
         如上图，从头部链表 t 节点开始遍历，t 节点加入到频繁项集。找到以 t 节点为结尾的路径如下:
-        ![](../images/12.Fd-growth/步骤7-1.png)
-        去掉FD树中的t节点，得到条件模式基<左边路径,左边是值>[z,x,y,s,t]:2，[z,x,y,r,t]:1 。条件模式基的值取决于末尾节点 t ，因为 t 的出现次数最小，一个频繁项集的支持度由支持度最小的项决定。所以 t 节点的条件模式基的值可以理解为对于以 t 节点为末尾的前缀路径出现次数。
+        ![](../images/12.FP-growth/步骤7-1.png)
+        去掉FP树中的t节点，得到条件模式基<左边路径,左边是值>[z,x,y,s,t]:2，[z,x,y,r,t]:1 。条件模式基的值取决于末尾节点 t ，因为 t 的出现次数最小，一个频繁项集的支持度由支持度最小的项决定。所以 t 节点的条件模式基的值可以理解为对于以 t 节点为末尾的前缀路径出现次数。
 
-   3. 条件模式基继续构造条件 FD树， 得到频繁项集，和之前的频繁项组合起来，这是一个递归遍历头部链表生成FD树的过程，递归截止条件是生成的FD树的头部链表为空。
-        根据步骤 2 得到的条件模式基 [z,x,y,s,t]:2，[z,x,y,r,t]:1 作为数据集继续构造出一棵FD树，计算支持度，去除非频繁项，集合按照支持度降序排序，重复上面构造FD树的步骤。最后得到下面 t-条件FD树 :
-        ![](../images/12.Fd-growth/步骤7-2.png)
-        然后根据 t-条件FD树 的头部链表进行遍历，从 y 开始。得到频繁项集 ty 。然后又得到 y 的条件模式基，构造出 ty的条件FD树，即 ty-条件FD树。继续遍历ty-条件FD树的头部链表，得到频繁项集 tyx，然后又得到频繁项集 tyxz. 然后得到构造tyxz-条件FD树的头部链表是空的，终止遍历。我们得到的频繁项集有 t->ty->tyz->tyzx，这只是一小部分。
+   3. 条件模式基继续构造条件 FP树， 得到频繁项集，和之前的频繁项组合起来，这是一个递归遍历头部链表生成FP树的过程，递归截止条件是生成的FP树的头部链表为空。
+        根据步骤 2 得到的条件模式基 [z,x,y,s,t]:2，[z,x,y,r,t]:1 作为数据集继续构造出一棵FP树，计算支持度，去除非频繁项，集合按照支持度降序排序，重复上面构造FP树的步骤。最后得到下面 t-条件FP树 :
+        ![](../images/12.FP-growth/步骤7-2.png)
+        然后根据 t-条件FP树 的头部链表进行遍历，从 y 开始。得到频繁项集 ty 。然后又得到 y 的条件模式基，构造出 ty的条件FP树，即 ty-条件FP树。继续遍历ty-条件FP树的头部链表，得到频繁项集 tyx，然后又得到频繁项集 tyxz. 然后得到构造tyxz-条件FP树的头部链表是空的，终止遍历。我们得到的频繁项集有 t->ty->tyz->tyzx，这只是一小部分。
    * 条件模式基:头部链表中的某一点的前缀路径组合就是条件模式基，条件模式基的值取决于末尾节点的值。
-   * 条件FD树:以条件模式基为数据集构造的FD树叫做条件FD树。
+   * 条件FP树:以条件模式基为数据集构造的FP树叫做条件FP树。
 
 FP-growth 算法优缺点: