DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.



Splitting a Data Set

Command: Tools -> Split Data...

During data analysis it is often necessary to create two or more disjoint subsets from a common data set, which can then be used as training and test sets. DataLab therefore provides two ways of creating such subsets: (1) drawing disjoint random samples and (2) splitting a data set without changing the order of the data. In addition, a data set may be split either rowwise or columnwise.

After choosing the Split Data command, a set-up box is displayed which allows the user to set the number of files to be created and the mode of sampling (random selection, blocked, or interleaved; columnwise vs. rowwise). The subsets are created from the current data matrix and are stored in the current working directory using the ASC format.
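The three sampling modes can be illustrated with a short Python sketch. This is not DataLab's own code: the function name split_indices, its parameters, and the exact way rows are assigned to subsets in each mode are assumptions made for illustration only.

```python
import random


def split_indices(n_rows, n_subsets, mode="random", seed=None):
    """Return n_subsets disjoint index lists that together cover range(n_rows).

    Hypothetical helper; the assignment rules below are one plausible
    reading of "random", "blocked" and "interleaved", not DataLab's
    documented behaviour.
    """
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    indices = list(range(n_rows))

    if mode == "random":
        # Shuffle once, then deal the shuffled rows out in turn, so the
        # subsets are random but still disjoint.
        random.Random(seed).shuffle(indices)
        return [indices[i::n_subsets] for i in range(n_subsets)]

    if mode == "interleaved":
        # Row 0 -> subset 0, row 1 -> subset 1, ... wrapping around;
        # the original order is kept inside each subset.
        return [indices[i::n_subsets] for i in range(n_subsets)]

    if mode == "blocked":
        # Consecutive blocks of (almost) equal size; the first
        # `extra` blocks receive one additional row.
        base, extra = divmod(n_rows, n_subsets)
        subsets, start = [], 0
        for i in range(n_subsets):
            size = base + (1 if i < extra else 0)
            subsets.append(indices[start:start + size])
            start += size
        return subsets

    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, split_indices(7, 2, "blocked") yields [[0, 1, 2, 3], [4, 5, 6]] and split_indices(7, 2, "interleaved") yields [[0, 2, 4, 6], [1, 3, 5]]. The same index lists could be applied to columns instead of rows to mimic a columnwise split.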

The names of the subsets are created automatically from the name of the original data set by appending a two-digit decimal number. If the data set has not yet been named, the subsets are stored under the name "noname_xx.asc", where xx stands for the two-digit number.
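The naming rule can be sketched in Python as follows. The function subset_names is hypothetical, and two details are assumptions not confirmed by this page: that named data sets use the same underscore separator as the "noname_xx.asc" pattern, and that numbering starts at 01.

```python
from pathlib import Path


def subset_names(dataset_name, n_subsets, start=1):
    """Build file names for the subsets of a data set.

    Hypothetical helper. Assumes an underscore separator (as in
    "noname_xx.asc") and numbering that begins at `start`.
    """
    # Fall back to "noname" when the data set has no name yet.
    stem = Path(dataset_name).stem if dataset_name else "noname"
    # Two-digit, zero-padded decimal suffix followed by the ASC extension.
    return [f"{stem}_{i:02d}.asc" for i in range(start, start + n_subsets)]
```

With these assumptions, subset_names("wine.asc", 3) gives wine_01.asc, wine_02.asc and wine_03.asc, while subset_names(None, 2) gives noname_01.asc and noname_02.asc.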

The process of subset creation is started by clicking the command Do It.


Last Update: 2011-Dec-08