DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.



Cluster Analysis

Command: Math -> Cluster Analysis...

The command Math/Cluster Analysis provides several methods for constructing dendrograms. The user may choose among five clustering procedures in combination with four distance measures. The resulting dendrograms can be used to assign new class numbers to the data objects. After activating the cluster analysis, the user first has to select the variables to be used for the clustering. The dendrogram is then calculated and displayed. It can be zoomed in and out, and panned, by setting the mouse function with the corresponding buttons of the command bar.
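The workflow described above (compute a dendrogram, then derive class numbers from it) can be illustrated with a small sketch. It is not DataLab's implementation: it assumes single linkage with Euclidean distances, numbers objects from 0 rather than 1, and the function names single_linkage and assign_classes are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points):
    """Naive agglomerative clustering (single linkage).

    Returns a list of merges (node_a, node_b, distance, new_node).
    Objects are numbered 0..N-1, dendrogram nodes from N upwards.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}   # node id -> member objects
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # single linkage: distance of the closest pair of members
                d = min(euclidean(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges

def assign_classes(n, merges, threshold):
    """Cut the dendrogram: only merges below the threshold are applied."""
    members = {i: [i] for i in range(n)}
    for a, b, d, new in merges:
        if d >= threshold:
            break  # single-linkage merges come in non-decreasing order
        members[new] = members.pop(a) + members.pop(b)
    labels = [0] * n
    # class numbers start at 1, ordered by the smallest object index
    for cls, group in enumerate(sorted(members.values(), key=min), start=1):
        for obj in group:
            labels[obj] = cls
    return labels

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = single_linkage(points)
print(assign_classes(len(points), merges, threshold=3.0))
```

Raising the threshold merges more clusters and therefore yields fewer classes, which mirrors what moving the cut line in a dendrogram does.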

Variable Selection: Select new variables for calculating the dendrogram. Any combination of variables can be chosen in the variable selection dialog, which is displayed when the Change Variables button is pressed.
Assign Classes: A dendrogram can be used to assign new class numbers to the objects. The user has to define the minimum distance between clusters, which serves as the criterion for assigning new class numbers. The distance can be set interactively by moving the dotted red line after activating the Assign Classes button.
Linkage Type: The dendrogram is recalculated whenever any of the parameters is changed. The user may select one of the following clustering methods:
  • Single Linkage
  • Complete Linkage
  • Average Linkage
  • Ward's Method
  • Flexible Strategy (this method requires an extra parameter alpha, which can be set by using the scrollbar below the Linkage Type box)
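The five linkage types listed above are commonly described by a single recurrence, the Lance-Williams update formula, which gives the distance between a cluster k and the cluster formed by merging i and j. The sketch below uses the textbook coefficients; whether DataLab computes its dendrograms this way is an assumption, as is the mapping of the flexible strategy's alpha to beta = 1 - 2*alpha and the default value 0.625. The function name lance_williams is hypothetical.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i=1, n_j=1, n_k=1, alpha=0.625):
    """Distance between cluster k and the merged cluster (i + j).

    d_ki, d_kj, d_ij are the distances before the merge,
    n_i, n_j, n_k are the cluster sizes.
    For "ward" the distances are expected to be squared Euclidean.
    """
    if method == "single":
        a_i = a_j = 0.5
        beta, gamma = 0.0, -0.5          # reduces to min(d_ki, d_kj)
    elif method == "complete":
        a_i = a_j = 0.5
        beta, gamma = 0.0, 0.5           # reduces to max(d_ki, d_kj)
    elif method == "average":
        a_i = n_i / (n_i + n_j)
        a_j = n_j / (n_i + n_j)
        beta, gamma = 0.0, 0.0           # size-weighted mean
    elif method == "ward":
        total = n_i + n_j + n_k
        a_i = (n_i + n_k) / total
        a_j = (n_j + n_k) / total
        beta, gamma = -n_k / total, 0.0
    elif method == "flexible":
        a_i = a_j = alpha
        beta, gamma = 1.0 - 2.0 * alpha, 0.0   # assumed: coefficients sum to 1
    else:
        raise ValueError(f"unknown linkage method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)

for name in ("single", "complete", "average", "ward", "flexible"):
    print(name, lance_williams(name, 2.0, 5.0, 3.0))
```

With alpha = 0.5 the flexible strategy reduces to the unweighted mean of the two distances; larger values of alpha make beta negative.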
Distance Measure: The dendrograms can be calculated using four different distance measures:
  • Euclidean
  • Squared Euclidean
  • Manhattan
  • Jaccard coefficient

Please note that the Jaccard coefficient is a measure of similarity, not of distance. A dendrogram based on it must therefore be interpreted differently from dendrograms obtained with "normal" distance measures.
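As an illustration of the four measures, the following sketch uses their standard textbook definitions. How DataLab computes the Jaccard coefficient is not stated here, so the binary (0/1) form below is an assumption, as is returning 1.0 when both vectors are all zero.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Euclidean distance without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    """Sum of absolute differences per variable (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_similarity(a, b):
    """Jaccard coefficient of two binary vectors (assumed form).

    Counts the positions set in both vectors, divided by the
    positions set in at least one of them.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # assumed convention

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q), squared_euclidean(p, q), manhattan(p, q))
print(jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0]))
```

Identical objects have distance 0 under the first three measures but a Jaccard coefficient of 1, which is why the two kinds of dendrogram read in opposite directions. A common way to turn the coefficient into a dissimilarity is 1 minus the similarity.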

Store in Newick Format: The current dendrogram can be stored in the Newick format.
Show Protocol: The protocol contains the numeric description of the dendrogram in two formats. The first part describes the dendrogram as a table; the second part contains the Newick string.

The cluster table contains four columns. The first and second columns show the object number and the object identifier, separated by a pipe symbol. Dendrogram nodes are indicated by the node number and a '+' character. Each object or dendrogram node has a parent node, which is specified in the third column. The distance of the object/node to the base line of the dendrogram is listed in column 4. Please note that the table always has N-1 rows (N = number of objects) and that the nodes are specified by numbers from N+1 upwards.
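The relation between a parent table of this kind and a Newick string can be sketched as follows. The row layout (number, label, parent, height) is a simplified stand-in rather than DataLab's exact protocol format, and the sketch assumes that objects sit on the base line (height 0), that the root has no parent, and that a branch length is the parent's height minus the child's height. The function name to_newick is hypothetical.

```python
def to_newick(rows):
    """Build a Newick string from (number, label, parent, height) rows.

    The root row has parent None. Leaves are written with their label,
    inner nodes as a parenthesised list of their children.
    """
    info = {}
    children = {}
    root = None
    for number, label, parent, height in rows:
        info[number] = (label, parent, height)
        if parent is None:
            root = number
        else:
            children.setdefault(parent, []).append(number)

    def render(number):
        label, parent, height = info[number]
        kids = children.get(number, [])
        if kids:
            text = "(" + ",".join(render(k) for k in kids) + ")"
        else:
            text = label
        if parent is None:
            return text                      # the root has no branch length
        length = info[parent][2] - height    # assumed: difference of heights
        return f"{text}:{length:g}"

    return render(root) + ";"

# Three objects (1-3) and two dendrogram nodes (4 and 5); values are made up.
rows = [
    (1, "A", 4, 0.0),
    (2, "B", 4, 0.0),
    (3, "C", 5, 0.0),
    (4, "4+", 5, 1.0),
    (5, "5+", None, 3.0),
]
print(to_newick(rows))
```

Labels containing spaces, commas, colons or parentheses would need quoting before being written to a Newick string; the sketch does not handle that.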


Last Update: 2012-Aug-27