DataLab is a compact statistics package aimed at exploratory data analysis.
See also: Transpose Data Matrix, Randomisation of the Data, Sorting the Data, Resizing the Data Matrix, Serializing the Data Matrix
Splitting a Data Set
During data analysis it is often necessary to create two or more disjoint subsets from a common set of data, which can then be used as training and test sets. DataLab therefore provides three ways of creating such subsets: (1) splitting the variables (columns), (2) splitting the objects (rows), and (3) creating a test set and a training set. The size of the data sets is controlled by the scroll bar at the left center. The mode of selection can be random, blocked, or interleaved.
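The three selection modes can be illustrated with a short Python sketch. It is not DataLab's own implementation, which is not described here. It assumes that "blocked" means contiguous runs of rows (or columns), that "interleaved" means every k-th item goes to the same subset, and that "random" means a shuffle followed by a blocked split. The names split_indices and _blocks are hypothetical.

```python
import random


def _blocks(indices, n_subsets):
    """Cut a list into n_subsets contiguous pieces of nearly equal size."""
    size, extra = divmod(len(indices), n_subsets)
    pieces, start = [], 0
    for k in range(n_subsets):
        # The first `extra` pieces receive one additional item.
        stop = start + size + (1 if k < extra else 0)
        pieces.append(indices[start:stop])
        start = stop
    return pieces


def split_indices(n_items, n_subsets, mode="random", seed=None):
    """Partition the indices 0..n_items-1 into n_subsets disjoint lists.

    The meaning given to each mode is an assumption, not DataLab's definition.
    """
    indices = list(range(n_items))
    if mode == "random":
        # Shuffle first, then cut the shuffled list into contiguous pieces.
        random.Random(seed).shuffle(indices)
        return _blocks(indices, n_subsets)
    if mode == "blocked":
        return _blocks(indices, n_subsets)
    if mode == "interleaved":
        # Subset k receives items k, k + n_subsets, k + 2 * n_subsets, ...
        return [indices[k::n_subsets] for k in range(n_subsets)]
    raise ValueError(f"unknown mode: {mode!r}")
```

Applied to the row indices, the result corresponds to a rowwise split (objects); applied to the column indices, it corresponds to a columnwise split (variables). In every mode each index lands in exactly one subset, so the subsets are disjoint.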
After the user chooses Tools/Split Data, a set-up box is displayed in which the number of files to be created and the mode of sampling (random, blocked, or interleaved selection; columnwise vs. rowwise) can be set. The subsets are created from the current data matrix and are stored in the current working directory in the ASC format.
The names of the subsets are generated automatically from the file template by appending a two-digit decimal number.
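The naming rule can be sketched in Python as follows. This is only an illustration: the function subset_names is hypothetical, the start of the numbering at 1 is an assumption, and no file extension is added because the text does not state one.

```python
def subset_names(template, count, start=1):
    """Build subset names by appending a zero-padded two-digit number.

    `start=1` is an assumption; the first number DataLab uses is not stated.
    """
    return [f"{template}{number:02d}" for number in range(start, start + count)]


# Example with a made-up template name.
print(subset_names("subset", 3))
```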
Subset creation is started by clicking the Do It command.