IOLink
IOL_v1.6.1_release
A dataframe is a model for tabular data, organised in columns and rows. More specifically, a dataframe is column-oriented: data is stored by column, and a row is just the aggregate of the values found at the same position in each column. Because of this construction, the data type is homogeneous within a column, but can be heterogeneous across a row.
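To make the column-oriented layout concrete, here is a minimal Python sketch. It is a toy model built on plain lists, not the IOLink API, and the `columns` table and `row` helper are hypothetical names chosen for illustration.

```python
# Toy column-oriented table: one homogeneous list per column.
columns = {
    "id": [1, 2, 3],          # every value in this column is an int
    "name": ["a", "b", "c"],  # every value in this column is a str
}


def row(table, index):
    """Assemble a row by taking the value at `index` from every column."""
    return tuple(column[index] for column in table.values())


# A row mixes types because it crosses columns of different types.
second_row = row(columns, 1)
```

Here `second_row` holds one integer and one string, while each column on its own holds a single type.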
The dataframe API defines a view that has various capabilities and methods. DataFrameView has three capabilities: READ for reading column data, WRITE for editing it, and RESHAPE for adding and removing columns or rows.
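One common way to model such a capability set is with combinable flags. The sketch below assumes that representation for illustration only; how IOLink actually stores and checks capabilities is not described here, and `Capability` and `require` are hypothetical names.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical capability flags mirroring READ, WRITE and RESHAPE."""
    READ = auto()
    WRITE = auto()
    RESHAPE = auto()


def require(available, needed):
    """Raise if `available` does not include every flag in `needed`."""
    if (available & needed) != needed:
        raise PermissionError(f"view lacks {needed!r}")


read_only = Capability.READ
read_write = Capability.READ | Capability.WRITE

# Membership tests work on combined flags.
can_write = Capability.WRITE in read_write
can_reshape = Capability.RESHAPE in read_write
```

Checking a capability before calling an operation that depends on it avoids attempting a write or reshape on a view that only supports reading.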
There are also structure methods available on all instances, mainly for information about the shape of the dataset and about its columns. Columns are usually accessed by index in this interface, with indices ranging from 0 up to, but not including, the shape's width. More information can be accessed using the following methods:
The reading interface of DataFrameView is quite simple and low level. You can only read data from one column at a time, and you must pass the reading method a buffer whose size matches the number of elements you want to read.
In this example, we want to read the first five rows of a dataframe that has one column of integer type and another storing strings.
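The buffer-based pattern can be illustrated independently of the IOLink API with a toy stand-in. In the Python sketch below, the `ToyFrame` class, its `read` method and its parameter order are all assumptions made for illustration, not the DataFrameView signatures.

```python
class ToyFrame:
    """Hypothetical stand-in for a column-oriented view (not the IOLink API)."""

    def __init__(self, columns):
        self._columns = columns  # list of lists, one list per column

    def read(self, column, offset, count, buffer):
        """Copy `count` values of one column, starting at row `offset`, into `buffer`."""
        data = self._columns[column]
        if len(buffer) < count:
            raise ValueError("buffer is smaller than the requested count")
        if offset < 0 or offset + count > len(data):
            raise IndexError("requested rows are out of range")
        buffer[:count] = data[offset:offset + count]


frame = ToyFrame([
    [10, 20, 30, 40, 50, 60],        # column 0: integers
    ["a", "b", "c", "d", "e", "f"],  # column 1: strings
])

# One buffer per column, each sized for the five rows to read.
int_buffer = [0] * 5
str_buffer = [""] * 5
frame.read(0, 0, 5, int_buffer)
frame.read(1, 0, 5, str_buffer)
```

Each call touches a single column, so reading five full rows of a two-column frame takes two calls and two caller-owned buffers.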
The interface to write data is quite similar, with the difference that you must fill the given buffer with the data to write. Similarly to the previous example, we can write the top five rows of our dataframe like this:
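The write pattern can be illustrated independently of the IOLink API with a toy stand-in. In this Python sketch, `ToyFrame`, `write` and `column` are hypothetical names, and the parameter order is an assumption rather than the DataFrameView signature.

```python
class ToyFrame:
    """Hypothetical stand-in for a column-oriented view (not the IOLink API)."""

    def __init__(self, columns):
        self._columns = columns  # list of lists, one list per column

    def write(self, column, offset, count, buffer):
        """Copy the first `count` values of `buffer` into one column, starting at row `offset`."""
        data = self._columns[column]
        if len(buffer) < count:
            raise ValueError("buffer holds fewer values than the requested count")
        if offset < 0 or offset + count > len(data):
            raise IndexError("target rows are out of range")
        data[offset:offset + count] = buffer[:count]

    def column(self, index):
        """Return a copy of one column, for inspection."""
        return list(self._columns[index])


# Six rows, two columns: integers and strings.
frame = ToyFrame([[0] * 6, [""] * 6])

# The caller fills each buffer first, then hands it to the write call.
frame.write(0, 0, 5, [1, 2, 3, 4, 5])
frame.write(1, 0, 5, ["a", "b", "c", "d", "e"])
```

As with reading, each call targets a single column, and rows outside the written range keep their previous values.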
To get a single element of a column, some more user-friendly methods are available:
The RESHAPE capability offers methods to change the shape of a DataFrameView instance by adding and removing columns and rows. Because of the column-oriented structure, columns can only be affected one by one, but rows can be processed in contiguous chunks.
Column operation examples:
Row operation example:
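The asymmetry between column and row operations can be sketched with a toy model. The Python class below is not the IOLink API: `ToyFrame` and its method names, parameters and default-value handling are assumptions made for illustration.

```python
class ToyFrame:
    """Hypothetical column-oriented table (not the IOLink API)."""

    def __init__(self):
        self.names = []     # one name per column
        self.defaults = []  # fill value used for new rows, per column
        self.columns = []   # one list of values per column
        self.row_count = 0

    def add_column(self, index, name, default):
        """Insert a single column at `index`, filled with `default`."""
        self.names.insert(index, name)
        self.defaults.insert(index, default)
        self.columns.insert(index, [default] * self.row_count)

    def remove_column(self, index):
        """Remove a single column."""
        del self.names[index]
        del self.defaults[index]
        del self.columns[index]

    def add_rows(self, index, count):
        """Insert a contiguous chunk of `count` rows at `index` in every column."""
        if not 0 <= index <= self.row_count:
            raise IndexError("row index out of range")
        for column, default in zip(self.columns, self.defaults):
            column[index:index] = [default] * count
        self.row_count += count

    def remove_rows(self, index, count):
        """Remove a contiguous chunk of `count` rows starting at `index`."""
        if index < 0 or index + count > self.row_count:
            raise IndexError("row range out of bounds")
        for column in self.columns:
            del column[index:index + count]
        self.row_count -= count


frame = ToyFrame()
frame.add_column(0, "id", 0)     # columns are added one at a time
frame.add_column(1, "name", "")
frame.add_rows(0, 4)             # rows are added as one chunk of four
frame.remove_rows(1, 2)          # and removed as one chunk of two
frame.remove_column(0)
```

A row operation has to touch every column, which is why it is expressed once for a whole range of rows, whereas a column operation only concerns one list.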
The factory DataFrameViewFactory has a method allocate that can be used to create a DataFrameView with its data stored in memory: