Importing and Exporting (I/O)

Importing data from tabular data files

To read data from a CSV-like file, use the readtable function:

DataTables.readtable - Function.

Read data from a tabular-file format (CSV, TSV, ...)

readtable(filename, [keyword options])

Arguments

  • filename::AbstractString : the filename to be read

Keyword Arguments

  • header::Bool – Use the information from the file's header line to determine column names. Defaults to true.

  • separator::Char – Assume that fields are split by the separator character. If not specified, it will be guessed from the filename: .csv defaults to ',', .tsv defaults to '\t', .wsv defaults to ' '.

  • quotemark::Vector{Char} – Assume that fields contained inside of two quotemark characters are quoted, which disables processing of separators and linebreaks. Set to Char[] to disable this feature and slightly improve performance. Defaults to ['"'].

  • decimal::Char – Assume that the decimal place in numbers is written using the decimal character. Defaults to '.'.

  • nastrings::Vector{String} – Translate any of the strings in this vector into a NULL value. Defaults to ["", "NULL", "NA"].

  • truestrings::Vector{String} – Translate any of the strings in this vector into a Boolean true. Defaults to ["T", "t", "TRUE", "true"].

  • falsestrings::Vector{String} – Translate any of the strings in this vector into a Boolean false. Defaults to ["F", "f", "FALSE", "false"].

  • makefactors::Bool – Convert string columns into CategoricalVectors for use as factors. Defaults to false.

  • nrows::Int – Read only nrows from the file. Defaults to -1, which indicates that the entire file should be read.

  • names::Vector{Symbol} – Use the values in this array as the names for all columns instead of the names in the file's header. Defaults to [], which indicates that the header should be used if present or that numeric names should be invented if there is no header.

  • eltypes::Vector – Specify the types of all columns. Defaults to [].

  • allowcomments::Bool – Ignore all text inside comments. Defaults to false.

  • commentmark::Char – Specify the character that starts comments. Defaults to '#'.

  • ignorepadding::Bool – Ignore all whitespace on left and right sides of a field. Defaults to true.

  • skipstart::Int – Specify the number of initial rows to skip. Defaults to 0.

  • skiprows::Vector{Int} – Specify the indices of lines in the input to ignore. Defaults to [].

  • skipblanks::Bool – Skip any blank lines in input. Defaults to true.

  • encoding::Symbol – Specify the file's encoding as either :utf8 or :latin1. Defaults to :utf8.

  • normalizenames::Bool – Ensure that column names are valid Julia identifiers. For instance this renames a column named "a b" to "a_b" which can then be accessed with :a_b instead of Symbol("a b"). Defaults to true.

Result

  • ::DataTable

Examples

dt = readtable("data.csv")
dt = readtable("data.tsv")
dt = readtable("data.wsv")
dt = readtable("data.txt", separator = '\t')
dt = readtable("data.txt", header = false)

readtable requires that you specify the path of the file that you would like to read as a String. To read data from a non-file source, you may also supply an IO object. readtable also supports many additional keyword arguments: these are documented in the section on advanced I/O operations.
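
As an illustration of reading from a non-file source, the following sketch passes an IOBuffer to readtable together with a few of the keyword arguments listed above. The data and column names are hypothetical, and the sketch assumes that the IO method of readtable accepts the same keyword arguments as the filename method. Because there is no filename to guess from, the separator is given explicitly.

```julia
using DataTables

# Hypothetical in-memory data: no header line, one missing value marked "NA".
raw = "1,Alice,36\n2,Bob,NA\n3,Carol,58\n"

# Assumption: an IO object can be passed in place of a filename.
dt = readtable(IOBuffer(raw),
               separator = ',',             # no file extension to guess from
               header = false,              # the first line is data
               names = [:id, :name, :age],  # supply column names ourselves
               nastrings = ["NA"])          # treat "NA" as a NULL value
```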

Exporting data to a tabular data file

To write data to a CSV file, use the writetable function:

DataTables.writetable - Function.

Write data to a tabular-file format (CSV, TSV, ...)

writetable(filename, dt, [keyword options])

Arguments

  • filename::AbstractString : the filename to be created

  • dt::AbstractDataTable : the AbstractDataTable to be written

Keyword Arguments

  • separator::Char – The separator character that you would like to use. Defaults to the output of getseparator(filename), which uses commas for files that end in .csv, tabs for files that end in .tsv and a single space for files that end in .wsv.

  • quotemark::Char – The character used to delimit string fields. Defaults to '"'.

  • header::Bool – Whether the file should contain a header line that specifies the column names from dt. Defaults to true.

  • nastring::AbstractString – What to write in place of missing data. Defaults to "NULL".

Result

  • ::DataTable

Examples

dt = DataTable(A = 1:10)
writetable("output.csv", dt)
writetable("output.dat", dt, separator = ',', header = false)
writetable("output.dat", dt, quotemark = '\'', separator = ',')
writetable("output.dat", dt, header = false)
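
A short round trip can help confirm that a file written by writetable reads back as expected. The sketch below is only an illustration: the table contents and the file name roundtrip.tsv are hypothetical, and it relies on the .tsv extension selecting a tab separator for both functions, as described in the keyword arguments above.

```julia
using DataTables

# Hypothetical table with one integer column and one string column.
dt = DataTable(A = 1:3, B = ["x", "y", "z"])

# Write into a fresh temporary directory rather than the working directory.
path = joinpath(mktempdir(), "roundtrip.tsv")

# The .tsv extension implies a tab separator on both the write and the read.
writetable(path, dt)
dt2 = readtable(path)
```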

Supplying DataTables inline with non-standard string literals

You can also provide CSV-like tabular data in a non-standard string literal to construct a new DataTable, as in the following:

dt = csv"""
    name,  age, squidPerWeek
    Alice,  36,         3.14
    Bob,    24,         0
    Carol,  58,         2.71
    Eve,    49,         7.77
    """

The csv string literal prefix indicates that the data are supplied in standard comma-separated value format. Common alternative formats are also available as string literals. For semicolon-separated values, with comma as a decimal, use csv2:

dt = csv2"""
    name;  age; squidPerWeek
    Alice;  36;         3,14
    Bob;    24;         0
    Carol;  58;         2,71
    Eve;    49;         7,77
    """

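The csv2 format corresponds to two of the readtable keyword arguments documented earlier: a ';' separator and a ',' decimal mark. The following sketch reads the same kind of data through readtable with those settings. It assumes that readtable accepts an IO object with these keywords; whether the csv2 literal is implemented this way internally is not confirmed here.

```julia
using DataTables

# Hypothetical semicolon-separated data with comma decimals, without padding.
raw = """
name;age;squidPerWeek
Alice;36;3,14
Bob;24;0
"""

# Explicit settings standing in for what the csv2 literal is described as doing.
dt = readtable(IOBuffer(raw), separator = ';', decimal = ',')
```
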
For whitespace-separated values, use wsv:

dt = wsv"""
    name  age squidPerWeek
    Alice  36         3.14
    Bob    24         0
    Carol  58         2.71
    Eve    49         7.77
    """

And for tab-separated values, use tsv:

dt = tsv"""
    name	age	squidPerWeek
    Alice	36	3.14
    Bob	24	0
    Carol	58	2.71
    Eve	49	7.77
    """