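The dissimilarity between two objects i and j can be computed from the ratio of mismatches:
d(i,j) = (p-m)/p;
where m is the number of matches (the number of attributes for which i and j are in the same state) and p is the total number of attributes describing the objects.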
The similarity can then be computed as:
sim(i,j) = 1-d(i,j) = m/p;
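The mismatch ratio and the corresponding similarity can be sketched in a few lines of Python. This is a minimal illustration, assuming each object is a tuple of categorical values of equal length; the function names `nominal_dissimilarity` and `nominal_similarity` are hypothetical and not part of the notes.

```python
def nominal_dissimilarity(obj_i, obj_j):
    """Mismatch ratio d(i,j) = (p - m) / p for two attribute tuples."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of attributes")
    p = len(obj_i)  # total number of attributes
    if p == 0:
        raise ValueError("objects must have at least one attribute")
    # m counts the attributes on which both objects are in the same state
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p


def nominal_similarity(obj_i, obj_j):
    """Similarity sim(i,j) = 1 - d(i,j) = m / p."""
    return 1.0 - nominal_dissimilarity(obj_i, obj_j)


# Illustrative data only: one mismatch out of three attributes.
a = ("red", "round", "small")
b = ("red", "square", "small")
print(nominal_dissimilarity(a, b))  # about 0.333
print(nominal_similarity(a, b))     # about 0.667
```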
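For symmetric binary attributes, both states are equally important. Dissimilarity based on symmetric binary attributes is called symmetric binary dissimilarity:
d(i,j)=(r+s)/(q+r+s+t);
where q is the number of attributes equal to 1 for both objects, r the number equal to 1 for i but 0 for j, s the number equal to 0 for i but 1 for j, and t the number equal to 0 for both.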
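For asymmetric binary attributes, the two states are not equally important. In the asymmetric binary dissimilarity, the number of negative matches t is considered unimportant and is dropped from the denominator:
d(i,j)=(r+s)/(q+r+s);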
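Both binary formulas rest on the same four counts, so a short Python sketch may help. It assumes objects are sequences of 0/1 values, with q counting 1/1 pairs, r counting 1/0 pairs, s counting 0/1 pairs and t counting 0/0 pairs. The function names are hypothetical, and returning 0.0 when q+r+s is zero is an assumption made here, not something the notes specify.

```python
def binary_counts(obj_i, obj_j):
    """Return (q, r, s, t) for two equal-length sequences of 0/1 values."""
    if len(obj_i) != len(obj_j):
        raise ValueError("objects must have the same number of attributes")
    q = r = s = t = 0
    for a, b in zip(obj_i, obj_j):
        if a not in (0, 1) or b not in (0, 1):
            raise ValueError("attributes must be 0 or 1")
        if a == 1 and b == 1:
            q += 1  # positive match
        elif a == 1 and b == 0:
            r += 1  # 1 for i, 0 for j
        elif a == 0 and b == 1:
            s += 1  # 0 for i, 1 for j
        else:
            t += 1  # negative match
    return q, r, s, t


def symmetric_binary_dissimilarity(obj_i, obj_j):
    q, r, s, t = binary_counts(obj_i, obj_j)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_dissimilarity(obj_i, obj_j):
    q, r, s, _ = binary_counts(obj_i, obj_j)  # t is ignored
    denom = q + r + s
    # Assumption: two objects with only negative matches are treated as identical.
    return (r + s) / denom if denom else 0.0


# Illustrative data only: q=2, r=0, s=1, t=3.
i = [1, 0, 1, 0, 0, 0]
j = [1, 0, 1, 0, 1, 0]
print(symmetric_binary_dissimilarity(i, j))   # 1/6, about 0.167
print(asymmetric_binary_dissimilarity(i, j))  # 1/3, about 0.333
```

Ignoring t makes the asymmetric value larger here, since the three shared zeros no longer count as agreement.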
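Dissimilarity of numeric attributes: Euclidean distance, Manhattan distance, Minkowski distance, and supremum distance.
Euclidean distance: d(i,j)=sqrt(power((x1-y1),2)+power((x2-y2),2)+...+power((xn-yn),2));
Manhattan distance: d(i,j)=abs(x1-y1)+abs(x2-y2)+...+abs(xn-yn);
Minkowski distance: d(i,j)=power(power(abs(x1-y1),h)+power(abs(x2-y2),h)+...+power(abs(xn-yn),h),1/h), with h>=1; h=1 gives the Manhattan distance and h=2 the Euclidean distance.
supremum distance: the maximum absolute difference over all dimensions of the two objects, d(i,j)=max(abs(x1-y1),abs(x2-y2),...,abs(xn-yn));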
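The numeric distances above translate directly into Python. The sketch below assumes objects are equal-length sequences of numbers; the function names are hypothetical, and `h` is the Minkowski order (h=1 for Manhattan, h=2 for Euclidean).

```python
import math


def minkowski_distance(x, y, h):
    """Minkowski distance of order h (h >= 1)."""
    if h < 1:
        raise ValueError("h must be >= 1")
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def supremum_distance(x, y):
    """Largest absolute difference over all dimensions."""
    return max(abs(a - b) for a, b in zip(x, y))


# Illustrative data only: differences are 2 and 3.
x = (1, 2)
y = (3, 5)
print(euclidean_distance(x, y))  # sqrt(13), about 3.606
print(manhattan_distance(x, y))  # 5
print(supremum_distance(x, y))   # 3
```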
-------------------------------------------------------
weighted Euclidean distance
Each attribute is given its own weight w1, w2, ..., wn according to its importance:
d(i,j)=sqrt(w1*power((x1-y1),2)+w2*power((x2-y2),2)+...+wn*power((xn-yn),2))
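A weighted Euclidean distance can be sketched as follows, assuming one non-negative weight per attribute. The function name and the `weights` parameter are hypothetical choices for illustration.

```python
import math


def weighted_euclidean_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Illustrative data only: 0.5*4 + 2*9 = 20, so the distance is sqrt(20).
print(weighted_euclidean_distance((1, 2), (3, 5), (0.5, 2)))  # about 4.472
```

With all weights set to 1 this reduces to the ordinary Euclidean distance.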
--------------------------------------------------------
So how can we calculate the dissimilarity of objects that have mixed attribute types?
One method is to group the attributes by type and then run a separate data mining analysis for each type. In real applications, however, analyzing each attribute type individually is unlikely to produce compatible results.
A better way is to process all attributes together in a single analysis. One such technique combines the different attribute types into a single dissimilarity matrix, bringing all meaningful attributes onto the common interval [0.0,1.0].
Assume that the dataset contains p attributes of mixed type. The dissimilarity between objects i and j is then defined by combining the per-attribute dissimilarities, each scaled to [0.0,1.0].
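One widely used way to combine mixed attributes, assumed here rather than taken from the notes, is to average the per-attribute dissimilarities over the attributes that can be compared. In the sketch below a nominal attribute contributes 0 or 1, a numeric attribute contributes its absolute difference divided by the attribute's range in the dataset, and an attribute with a missing value (`None`) is skipped. The function name and the `kinds` and `ranges` parameters are hypothetical.

```python
def mixed_dissimilarity(obj_i, obj_j, kinds, ranges):
    """Average of per-attribute dissimilarities, each in [0.0, 1.0].

    kinds[f]  is "nominal" or "numeric" for attribute f.
    ranges[f] is max - min of numeric attribute f over the dataset
              (ignored for nominal attributes).
    """
    total = 0.0
    compared = 0
    for a, b, kind, rng in zip(obj_i, obj_j, kinds, ranges):
        if a is None or b is None:
            continue  # missing value: this attribute contributes nothing
        if kind == "nominal":
            d_f = 0.0 if a == b else 1.0
        elif kind == "numeric":
            # Assumption: a constant attribute (range 0) contributes 0.
            d_f = abs(a - b) / rng if rng else 0.0
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        total += d_f
        compared += 1
    if compared == 0:
        raise ValueError("no comparable attributes")
    return total / compared


# Illustrative data only: nominal mismatch (1.0) and numeric 5/20 (0.25).
i = ("red", 10.0)
j = ("blue", 15.0)
print(mixed_dissimilarity(i, j, ("nominal", "numeric"), (None, 20.0)))  # 0.625
```

Because every per-attribute term lies in [0.0,1.0], the average does too, which keeps attributes of different types comparable.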
-------------------------------------------------
Cosine similarity:
s(i,j)=(i*j)/(|i|*|j|)=((x1*y1)+(x2*y2)+...+(xn*yn))/(sqrt(power(x1,2)+power(x2,2)+...+power(xn,2))*sqrt(power(y1,2)+power(y2,2)+...+power(yn,2)))
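The cosine formula is the dot product of the two vectors divided by the product of their lengths. A minimal Python sketch, assuming equal-length numeric vectors and a hypothetical function name; raising an error for a zero vector is a choice made here, since the formula is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Illustrative data only.
print(cosine_similarity((1, 0), (0, 1)))        # 0.0, orthogonal vectors
print(cosine_similarity((1, 2, 3), (2, 4, 6)))  # about 1.0, same direction
```

The result depends only on the angle between the vectors, not their lengths, which is why the second pair scores about 1.0 even though one vector is twice the other.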
---------------------------------------------------