xiaolong88’s blog

Hierarchical cluster analysis

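Hierarchical cluster analysis builds clusters by repeatedly merging the two closest points (or clusters) in the data.
The following six methods differ in how the distance between clusters is computed:
1. Single linkage (nearest neighbor method): uses the shortest distance between members of the two clusters
2. Complete linkage (furthest neighbor method): uses the longest distance between members of the two clusters
3. Group average method: uses the mean of the distances over all pairs of members
4. Ward's method: computes how much the within-cluster sum of squares would increase for each candidate merge, and merges the pair for which that increase is smallest. This tends to keep any one cluster from growing very large
5. Centroid method: uses the distance between the centroids of the two clusters
6. Median method: a variant of the centroid method. When the centroid of a merged cluster is computed, the two clusters are given equal weights regardless of their sizes, and the distance between these centroids is used as the distance between clusters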
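To make the definitions above concrete, here is a minimal sketch that computes several of these between-cluster distances by hand. The two clusters `a` and `b` are made-up points for illustration only, and `ss` is a hypothetical helper, not part of any package.

```r
# Two tiny hypothetical clusters of 2-D points (made up for illustration)
a <- rbind(c(0, 0), c(0, 2))
b <- rbind(c(3, 0), c(3, 4))

# Pairwise Euclidean distances between members of a (rows) and b (columns)
d <- as.matrix(dist(rbind(a, b)))[1:2, 3:4]

single   <- min(d)    # single linkage: shortest pair   -> 3
complete <- max(d)    # complete linkage: longest pair  -> 5
average  <- mean(d)   # group average: mean over all pairs

# Centroid method: distance between the two cluster centroids
centroid <- sqrt(sum((colMeans(a) - colMeans(b))^2))

# Ward: increase in the within-cluster sum of squares caused by the merge
ss <- function(m) sum(sweep(m, 2, colMeans(m))^2)  # hypothetical helper
ward_increase <- ss(rbind(a, b)) - ss(a) - ss(b)   # -> 10
```

Note that `hclust` reports merge heights on its own scale for each method, so these hand-computed values are meant to illustrate the criteria rather than to reproduce `hclust` output exactly.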

Let's try each of them.
First, Ward's method.

iris0 <- iris[,1:4] # use columns 1-4 of iris
iris1 <- dist(iris0) # build the distance matrix
hc.iris1 <- hclust(iris1,method = "ward.D2")
hc.iris1

##
## Call:
## hclust(d = iris1, method = "ward.D2")
##
## Cluster method : ward.D2
## Distance : euclidean
## Number of objects: 150
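The printed summary is brief, so it can help to look inside the returned object. The following is a small sketch (it rebuilds the same clustering under the name `hc` so that it runs on its own) that inspects the `merge`, `height`, and `order` components of an `hclust` result.

```r
# Rebuild the Ward clustering from the text so this block is self-contained
hc <- hclust(dist(iris[, 1:4]), method = "ward.D2")

# merge: one row per merge step (n - 1 rows in total).
# Negative entries are single observations; positive entries refer to
# the cluster formed at that earlier row.
head(hc$merge, 3)

# height: the dissimilarity at which each merge happened
head(hc$height, 3)

# order: left-to-right ordering of the observations in the dendrogram
hc$order[1:10]
```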

Dendrogram of the clustering result
plot(hc.iris1)

[Figure: dendrogram of hc.iris1]
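As an optional aid when reading the dendrogram, the clusters obtained by cutting the tree can be outlined directly on the plot. This is a sketch using `rect.hclust` from the stats package; the choice of `k = 4` matches the number of clusters used with `cutree` in this post, and the variable names `hc` and `groups` are introduced here for illustration.

```r
# Rebuild the Ward clustering so this block is self-contained
hc <- hclust(dist(iris[, 1:4]), method = "ward.D2")

# Draw the dendrogram without the 150 leaf labels, leaves aligned at the bottom
plot(hc, labels = FALSE, hang = -1)

# Outline the 4 clusters; the return value lists the members of each cluster
groups <- rect.hclust(hc, k = 4, border = "red")
sapply(groups, length)  # cluster sizes, in left-to-right plot order
```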

result_iris1 <- cutree(hc.iris1,4)
cluster <- iris[,5]     # ground truth for the clustering: species (column 5 of iris)
cluster_table <- table(cluster, result_iris1 ) # cross table of the ground truth and the result
cluster_table

##             result_iris1
## cluster       1  2  3  4
##   setosa     50  0  0  0
##   versicolor  0 24 25  1
##   virginica   0 14  1 35
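One possible way to summarize a cross table like this in a single number is purity: assign each cluster to its most frequent species and take the fraction of observations covered. This measure is not used in the original post; it is shown here only as a sketch. For the table printed above it would be (50 + 24 + 25 + 35) / 150, roughly 0.89.

```r
# Rebuild the clustering and the cross table so this block is self-contained
hc  <- hclust(dist(iris[, 1:4]), method = "ward.D2")
tab <- table(species = iris$Species, cluster = cutree(hc, k = 4))

# Purity: for each cluster (column), count its most frequent species,
# then divide the total by the number of observations
purity <- sum(apply(tab, 2, max)) / sum(tab)
purity
```

Purity rises as the number of clusters grows, so it is only meaningful when comparing results with the same number of clusters.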

Let's try the other methods as well.

iris_3 <- iris[1:20,1:4] # limit the data to 20 rows for readability
iris_3 <- dist(iris_3) # compute the Euclidean distance matrix

hc.sngl <- hclust(iris_3,"single") # single linkage
hc.comp <- hclust(iris_3) # complete linkage (the default method)
hc.aver <- hclust(iris_3,"average") # group average method
hc.ward <- hclust(iris_3,"ward.D2") # Ward's method
hc.cntr <- hclust(iris_3,"centroid") # centroid method
hc.medi <- hclust(iris_3,"median") # median method

op <- par(mfrow=c(2,3)) # arrange the plots in a 2-row, 3-column layout
par(family = "HiraKakuProN-W3") # on a Mac, avoids garbled Japanese characters in plots
plot(hc.sngl, main="Single linkage")
plot(hc.comp, main="Complete linkage")
plot(hc.aver, main="Group average method")
plot(hc.ward, main="Ward's method")
plot(hc.cntr, main="Centroid method")
plot(hc.medi, main="Median method")

[Figure: dendrograms for the six methods in a 2 x 3 layout]

par(op) # restore the original graphical parameters

Which result is best has to be judged in an exploratory way, by comparing against other analysis methods as well.
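As one example of such a comparison (an assumption on my part, not something the original post uses), the cophenetic correlation measures how well each dendrogram preserves the original pairwise distances. The sketch below uses `cophenetic` from the stats package on the same 20 rows; the names `d`, `methods`, and `coph` are introduced for illustration.

```r
# Same 20-row subset and Euclidean distances as in the text
d <- dist(iris[1:20, 1:4])

methods <- c("single", "complete", "average", "ward.D2", "centroid", "median")

# Correlation between the original distances and the cophenetic distances
# (the height at which two observations are first joined in the tree)
coph <- sapply(methods, function(m) cor(d, cophenetic(hclust(d, method = m))))
round(coph, 3)
```

A higher value means the tree distorts the original distances less, but it is only one criterion and does not by itself say which grouping is most useful.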

Renaming columns

#We usually rename all the columns at once like this
#But sometimes we want to change only one specific column name

names(x) <- c("a","b","c")

#let's try it
#First, I will create the data

> sex    <- c("F","F","F","M","M")
> height <- c(157,164,159,178,175)
> weight <- c(46,55,50,69,72)
> x    <- data.frame(SEX=sex, HEIGHT=height, WEIGHT=weight) 
> head(x)
  SEX HEIGHT WEIGHT
1   F    157     46
2   F    164     55
3   F    159     50
4   M    178     69
5   M    175     72

#Check the column names

> names(x)
[1] "SEX"    "HEIGHT" "WEIGHT"

#Use the usual method to change all the column names

> names(x) <- c("sex","height","weight")
> names(x)
[1] "sex"    "height" "weight"

#Change only one specific column name

> names(x)[2] <- "b"
> names(x)
[1] "sex"    "b"      "weight"
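Indexing by position breaks if the column order changes, so a variant that selects the column by its current name may be safer. A minimal sketch, using the same data as above (rebuilt here so the block runs on its own):

```r
# Same example data as above, with lower-case column names
x <- data.frame(sex    = c("F", "F", "F", "M", "M"),
                height = c(157, 164, 159, 178, 175),
                weight = c(46, 55, 50, 69, 72))

# Rename the column called "height", wherever it happens to be
names(x)[names(x) == "height"] <- "b"
names(x)
# [1] "sex"    "b"      "weight"
```

If no column has the given name, the assignment silently does nothing, so it is worth checking `names(x)` afterwards.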
