DataLab is a compact statistics package aimed at exploratory data analysis. Please visit the DataLab Web site for more information.



Format of ASC files

DataLab uses a simple ASCII format to import and export data. The data file is a plain text file with the following structure:

Line 1: Arbitrary header line of at most 255 characters. This comment line is displayed in the Open dialog of DataLab.
Line 2: Parameter NFEAT, the number of columns (variables, features) of the data matrix, not counting the optional object names and class information. Any comment may follow this number, as long as it is separated from the numeric value by at least one blank and the whole line is no longer than 255 characters.
Line 3: Parameter NOBJ, the number of objects (rows) of the data matrix. Any comment may follow this number, as long as it is separated from the numeric value by at least one blank and the whole line is no longer than 255 characters.
Line 4: Parameters FLAG_CLASSINFO, FLAG_FEATNAMES, FLAG_OBJNAMES (possible values: 'TRUE' or 'FALSE'). These parameters control whether additional information is present: the class information (FLAG_CLASSINFO), the names of features (FLAG_FEATNAMES), and the names of objects (FLAG_OBJNAMES). If a parameter is 'TRUE', the corresponding information is included in the data table that follows, and the format of the data table is adjusted accordingly. The values of the parameters must be separated by at least one blank. Any comment may follow these parameters.
Lines 5..k: Names of features. These lines are present only if the parameter FLAG_FEATNAMES is 'TRUE'. The feature identifiers must be separated by at least one blank or any ASCII character below 32, and they have to be stored in the same sequence as the variables. If a feature identifier contains blanks, it has to be enclosed in double quotes ("). A double quote character can be included in an identifier by writing two double quotes (""). The number of names has to equal the number of features. The feature names may be spread over any number of lines, and the lines may be of any length. Note that the maximum length of a column identifier is 50 characters.
Lines k+1..n: Class information, object names and data. The data table is stored row by row, starting with the first variable as the first entry. Each row of variables is preceded by optional class information and an optional row identifier (the object name), in that order. This additional information is stored only if the parameters FLAG_CLASSINFO and/or FLAG_OBJNAMES are 'TRUE'. If a row identifier contains blanks, it has to be enclosed in double quotes (").
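
The four header lines described above can be read with very little code. The following Python sketch is only an illustration and not part of DataLab: the names AscHeader and parse_header are hypothetical, and accepting the flag values with or without the single quotes, and in any letter case, is an assumption.

```python
from typing import NamedTuple

MAX_LINE = 255  # length limit stated for lines 1 to 3


class AscHeader(NamedTuple):
    title: str
    nfeat: int
    nobj: int
    has_classinfo: bool
    has_featnames: bool
    has_objnames: bool


def _flag(token):
    # Assumption: accept TRUE/FALSE with or without quotes, in any case.
    value = token.strip("'\"").upper()
    if value not in ("TRUE", "FALSE"):
        raise ValueError(f"expected TRUE or FALSE, got {token!r}")
    return value == "TRUE"


def parse_header(lines):
    """Parse the first four lines of an ASC file (given as a list of strings)."""
    if len(lines) < 4:
        raise ValueError("an ASC file needs at least four header lines")
    for number, line in enumerate(lines[:3], start=1):
        if len(line) > MAX_LINE:
            raise ValueError(f"line {number} is longer than {MAX_LINE} characters")
    # The number comes first; anything after the first blank is a comment.
    nfeat = int(lines[1].split()[0])
    nobj = int(lines[2].split()[0])
    # The first three blank-separated tokens of line 4 are the flags.
    tokens = lines[3].split()[:3]
    if len(tokens) != 3:
        raise ValueError("line 4 must contain three TRUE/FALSE flags")
    flags = [_flag(t) for t in tokens]
    return AscHeader(lines[0], nfeat, nobj, *flags)
```

Calling parse_header(text.splitlines()) on the contents of a file returns the title, NFEAT, NOBJ and the three flags, which is everything needed to interpret the rest of the file.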

Any number of carriage returns or blanks is allowed between the values of a row. It is nevertheless strongly recommended to lay out the data table so that it can be read and edited easily.

The values may be stored in any format (integer, floating point, exponential notation) and must be separated by at least one blank. The class information must be of integer type; the row identifiers are interpreted as strings. The lines of the data table can have any length but must not contain comments.
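
Because line breaks carry no meaning inside the feature names and the data table, everything after line 4 can be treated as one stream of tokens. The Python sketch below illustrates this under stated assumptions: tokenize and read_body are hypothetical names, the doubled-quote escape is assumed to apply inside quoted identifiers, and NFEAT, NOBJ and the three flags are assumed to have been read from the header already.

```python
import re

# A token is either a double-quoted identifier (with "" as an escaped quote)
# or a run of characters above ASCII 32 (blank and control characters separate).
_TOKEN = re.compile(r'"((?:[^"]|"")*)"|([^\x00-\x20]+)')


def tokenize(text):
    """Split text into tokens, removing quotes and resolving "" to "."""
    tokens = []
    for match in _TOKEN.finditer(text):
        quoted, bare = match.groups()
        tokens.append(quoted.replace('""', '"') if quoted is not None else bare)
    return tokens


def read_body(body, nfeat, nobj, has_classinfo, has_featnames, has_objnames):
    """Read feature names and the data table from the text after line 4."""
    tokens = tokenize(body)
    per_row = nfeat + int(has_classinfo) + int(has_objnames)
    expected = (nfeat if has_featnames else 0) + nobj * per_row
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} entries, found {len(tokens)}")

    pos = 0
    feat_names = []
    if has_featnames:
        feat_names = tokens[:nfeat]
        pos = nfeat

    classes, obj_names, data = [], [], []
    for _ in range(nobj):
        if has_classinfo:  # class information comes first and is an integer
            classes.append(int(tokens[pos]))
            pos += 1
        if has_objnames:  # the row identifier is kept as a string
            obj_names.append(tokens[pos])
            pos += 1
        # float() accepts integer, floating point and exponential notation
        data.append([float(t) for t in tokens[pos:pos + nfeat]])
        pos += nfeat
    return feat_names, classes, obj_names, data
```

Checking the total number of entries up front is a design choice of this sketch: it catches a missing value or an unquoted identifier containing a blank before any row is misread.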

The following example shows an ASCII data file that contains 10 rows of 3 variables each. Class information, feature names, and object names are included.

This is a sample file
3                 ;number of features
10                ;number of objects
TRUE TRUE TRUE    ;class info, feat.names, obj.names
                   F1      F2      "oil speed"
1   S23X4         3.380    2.20    -4
1   S24X4        15.900   -2.20    -4.033E-05
1   C24X3         3.607    1.20    2
2   "S12 early"  -3.305    2.20    -4
2   S12          35.340   -2.20    2.888E-05
1   SWINTER      13.670    1.20    22
2   "SPG MER 9"  -3.376    2.20    4
1   B1           25.375   -2.20    -1.113E+01
2   B2           -1.650    1.20    -0.1
2   B3            2.509    1.20    -10.0
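
A file with the same layout as the sample above could also be generated by a program. The following Python sketch is one possible way to do this and is not taken from DataLab: write_asc and quote are hypothetical names, and writing exponents with an uppercase E simply mirrors the sample file.

```python
def quote(name):
    """Enclose an identifier in double quotes if the format requires it."""
    needs_quotes = (
        name == ""
        or '"' in name
        or any(ord(ch) <= 32 for ch in name)  # blanks or control characters
    )
    if needs_quotes:
        return '"' + name.replace('"', '""') + '"'
    return name


def write_asc(title, rows, feat_names=None, classes=None, obj_names=None):
    """Return the text of an ASC file; optional parts are omitted when None."""
    if len(title) > 255 or "\n" in title:
        raise ValueError("the header must be one line of at most 255 characters")
    nfeat = len(rows[0]) if rows else len(feat_names or [])
    if any(len(row) != nfeat for row in rows):
        raise ValueError("all rows must have the same number of values")
    if feat_names is not None:
        if len(feat_names) != nfeat:
            raise ValueError("need exactly one name per feature")
        if any(len(name) > 50 for name in feat_names):
            raise ValueError("feature names are limited to 50 characters")

    # Line 4: FLAG_CLASSINFO, FLAG_FEATNAMES, FLAG_OBJNAMES
    flags = " ".join(
        "TRUE" if part is not None else "FALSE"
        for part in (classes, feat_names, obj_names)
    )
    out = [title, str(nfeat), str(len(rows)), flags]
    if feat_names is not None:
        out.append(" ".join(quote(name) for name in feat_names))
    for i, row in enumerate(rows):
        fields = []
        if classes is not None:
            fields.append(str(int(classes[i])))
        if obj_names is not None:
            fields.append(quote(obj_names[i]))
        # repr() keeps full precision; upper() turns "e-05" into "E-05"
        fields.extend(repr(float(value)).upper() for value in row)
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"
```

The sketch writes one object per line, in line with the recommendation to keep the data table easy to read and edit.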


Last Update: 2012-Jul-25