A Model/View Table for Large Datasets

by Reginald Stadlbauer

This article shows how to handle large datasets with QTable using a model/view approach, and without using QTableItems. Model/view means that only the data that the user can see needs to be retrieved and held in memory. This approach also makes it easy to show the same dataset in multiple tables without duplicating any data.

An Abstract Data Source
A TSV Data Source
A Model/View QTable Subclass
Conclusion

Using model/view (or document/view) to handle data means accessing the data via some data source and providing a means of viewing retrieved data in a widget. In the case of tabular data, this might mean having an SQL database that holds the data in a database table, and a widget to present a subset of the data's records. Qt already provides a mechanism for doing this with its QSql module and the QDataTable class. But what do you do if the data source is not an SQL database, but is a file or an in-memory data structure? In this article we will present a QTable model/view subclass that can work with any data source that provides a few simple functions.

For our example we will use a data source that operates on a tab-separated values (TSV) file. Our data source will not read the entire file into memory, but will only retrieve (and write back) the portions of the file that the user actually views.

[Figure: model/view schematic]

An Abstract Data Source

There are any number of possible abstractions we could use to model our data. For our example we have decided to use a column and row abstraction so that we can operate in terms of individual cells. And we will assume that each row has the same number of cells. For this to work our data source must be able to provide the following information: the number of rows, the number of columns, and the contents of a specified cell.

For a writable data source, an additional function that allows us to change the contents of a cell must be provided. The data source also needs to be able to notify the views that are associated with it that the data has changed, so that the views can update themselves if they are presenting changed cells. A writable data source should allow rows to be inserted and deleted, but we will leave this as an extension that you could consider implementing yourself.

Here's the declaration of an abstract DataSource class:

class DataSource : public QObject
{
    Q_OBJECT
public:
    DataSource();
    ~DataSource();
 
    virtual QString cell(int row, int col) const = 0;
    virtual void setCell(int row, int col, const QString &text) = 0;
 
    virtual int numRows() const = 0;
    virtual int numCols() const = 0;
 
signals:
    void dataChanged();
};

The numRows(), numCols(), and cell() functions provide the information required to display the data. The setCell() function is used to update the data when the user edits the data shown in a view. The dataChanged() signal is emitted if the underlying data is changed (either by a user editing the data in another view, or because the data source changes the data for some other reason), so that the views can update themselves. The use of the signal makes it necessary to include the Q_OBJECT macro and to make DataSource a QObject subclass.

For simplicity we have assumed that each cell contains string data. An alternative would have been to use QVariant, which can store many different C++ and Qt types, including QString.
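To make the interface more concrete, here is a minimal sketch of an in-memory data source written in plain standard C++ rather than Qt. It is not part of the article's code: the class name MemoryDataSource and the onDataChanged() function are hypothetical, and std::function callbacks stand in for the dataChanged() signal.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Plain C++ stand-in for the DataSource interface. Callbacks play the
// role of the dataChanged() signal (an assumption of this sketch).
class MemoryDataSource
{
public:
    MemoryDataSource(int rows, int cols)
        : nRows(rows), nCols(cols), cells(rows * cols) {}

    std::string cell(int row, int col) const { return cells[index(row, col)]; }

    void setCell(int row, int col, const std::string &text)
    {
        cells[index(row, col)] = text;
        // Notify every registered view, as emitting dataChanged() would.
        for (const auto &listener : listeners)
            listener();
    }

    int numRows() const { return nRows; }
    int numCols() const { return nCols; }

    // Hypothetical replacement for connect(): register a view's callback.
    void onDataChanged(std::function<void()> listener)
    {
        listeners.push_back(std::move(listener));
    }

private:
    // Maps a (row, col) pair to a position in the flat cell vector.
    std::size_t index(int row, int col) const
    {
        assert(row >= 0 && row < nRows && col >= 0 && col < nCols);
        return static_cast<std::size_t>(row) * nCols + col;
    }

    int nRows;
    int nCols;
    std::vector<std::string> cells;
    std::vector<std::function<void()>> listeners;
};
```

Each view would register one callback, so a single setCell() call reaches every view that shows the data.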

A TSV Data Source

To make use of a data source we must have a concrete implementation. For our example, we have chosen to implement a simple (and very inefficient) data source that operates on tab-separated values files. More efficient implementations are readily achievable, but since the aim of this article is to show how to create a model/view QTable subclass, we are not concerned about the details of the data source.

TSV files are very easy to parse. Records (rows) are separated by newlines (\n) and the fields in each row (cells) are separated by tabs (\t). Newlines and tabs are not permitted within the data itself. The definition of the TsvDataSource class, which implements the DataSource interface, looks like this:

class TsvDataSource : public DataSource
{
    Q_OBJECT
public:
    TsvDataSource(const QString &fileName);
 
    QString cell(int row, int col) const;
    void setCell(int row, int col, const QString &text);
 
    int numRows() const { return nRows; }
    int numCols() const { return nCols; }
 
private:
    void skipToRow(int row) const;
 
private:
    mutable QFile file;
    int nRows;
    int nCols;
};

The constructor takes the file name of the TSV file the data source will operate on. The cell(), setCell(), numRows(), and numCols() functions are reimplemented from the DataSource interface and work directly on the TSV file. Our QTable subclass will use these functions to access and modify the data.

The private QFile object, file, binds the class to the TSV file on disk. The QFile object is declared mutable because it will modify its data (the file position pointer) in the cell() function which is a const function. Using mutable variables is quite common for objects that serve as some kind of cache. We store the number of rows and columns in nRows and nCols since this is simple to do and is more efficient than recalculating them from the data whenever they are needed.

The private skipToRow() function makes sure that the QFile's position index is moved to the given row in the file.

The implementations of TsvDataSource's functions are straightforward because QFile already provides the underlying functionality we need.

TsvDataSource::TsvDataSource(const QString &fileName)
    : file(fileName)
{
    nRows = nCols = 0;
    if (!file.open(IO_ReadWrite)) {
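        // Pass the strings as arguments so that a '%' in the file name
        // is not interpreted as a format specifier.
        qWarning("Couldn't open %s: %s", fileName.latin1(),
                 file.errorString().latin1());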
    } else {
        QString line;
        while (file.readLine(line, 32767) != -1) {
            if (nRows == 0)
                nCols = line.contains('\t') + 1;
            ++nRows;
        }
    }
}

In the constructor we open the file and calculate the number of rows and columns. The number of rows is calculated by calling QFile::readLine() until it returns -1 and counting how often that succeeded. For the number of columns we take the first line and count the number of tabs in that line plus one, since there is always one more field in a row than there are tabs. Qt can handle very long lines, but we have chosen to apply an arbitrary limit of 32767 characters, just in case the file is not a text file with \n as the line terminator.
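The counting logic can also be tried outside Qt. The following is a minimal sketch in standard C++, assuming the data arrives through a std::istream. The function name countRowsAndCols() is hypothetical, and the assert is an addition that documents the article's assumption that every row has the same number of cells.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Returns {rows, cols} for tab-separated text read from in.
std::pair<int, int> countRowsAndCols(std::istream &in)
{
    int rows = 0;
    int cols = 0;
    std::string line;
    while (std::getline(in, line)) {
        // A row always has one more field than it has tabs.
        int fields = static_cast<int>(std::count(line.begin(), line.end(), '\t')) + 1;
        if (rows == 0)
            cols = fields;
        // Assumption from the article: all rows have the same width.
        assert(fields == cols);
        ++rows;
    }
    return std::make_pair(rows, cols);
}
```

Passing a std::istringstream makes it easy to check the function against a few hand-written rows before pointing it at a real file.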

QString TsvDataSource::cell(int row, int col) const
{
    skipToRow(row);
    QString line;
    file.readLine(line, 32767);
    return line.section('\t', col, col);
}

In the implementation of TsvDataSource::cell() we first skip to the requested row by using skipToRow(). Then we read that row with QFile::readLine() and use QString::section() to extract the field in column col.
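For readers without Qt 3 at hand, the field extraction can be sketched in standard C++. This is an illustration rather than the article's code: the function name field() is hypothetical, and it returns an empty string when the line has fewer fields than requested.

```cpp
#include <cassert>
#include <string>

// Returns the col-th tab-separated field of line (0-based), or an empty
// string if the line has fewer fields.
std::string field(const std::string &line, int col)
{
    assert(col >= 0);
    std::string::size_type start = 0;
    // Step over col tabs to find where the requested field begins.
    for (int i = 0; i < col; ++i) {
        std::string::size_type tab = line.find('\t', start);
        if (tab == std::string::npos)
            return std::string();
        start = tab + 1;
    }
    // The field ends at the next tab or at the end of the line.
    std::string::size_type end = line.find('\t', start);
    if (end == std::string::npos)
        end = line.size();
    return line.substr(start, end - start);
}
```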

void TsvDataSource::setCell(int row, int col, const QString &text)
{
    skipToRow(row);
    int tabCount = 0;
    while (col > tabCount) {
        int c = file.getch();
        if (c == -1 || c == '\n')
            return;     // the row has no such field, so there is nothing to overwrite
        if (c == '\t')
            tabCount++;
    }
    for (int i = 0;; ++i) {
        int c = file.getch();
        if (c == '\t' || c == '\n' || c == -1)
            break;
        file.ungetch(c);
        if (i < (int)text.length())
            file.putch(text[i].latin1());
        else
            file.putch(' ');
    }
    emit dataChanged();
}

In setCell() we again skip to the specified row. Then we skip to the given field in that row using QFile::getch(). After that we use QFile::putch() to overwrite the data in the given column, one character at a time. Our implementation is only suitable for fixed-width data, since it does not permit the size of a field to change: we truncate the input if it is too long, or pad it with spaces if it is too short.
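The truncate-or-pad rule is easier to see on an in-memory string than on a file. Here is a minimal sketch in standard C++ under that simplification. The function name overwriteField() is hypothetical, and it works on a single row held in a std::string instead of a QFile.

```cpp
#include <cassert>
#include <string>

// Overwrites field col of a tab-separated row in place, keeping the
// field's width: longer text is truncated, shorter text is padded with
// spaces. Returns false if the row has no such field.
bool overwriteField(std::string &row, int col, const std::string &text)
{
    assert(col >= 0);
    std::string::size_type start = 0;
    for (int i = 0; i < col; ++i) {
        std::string::size_type tab = row.find('\t', start);
        if (tab == std::string::npos)
            return false;
        start = tab + 1;
    }
    // The field ends at the next tab, at a newline, or at the end of the row.
    std::string::size_type end = row.find_first_of("\t\n", start);
    if (end == std::string::npos)
        end = row.size();
    for (std::string::size_type i = start; i < end; ++i) {
        std::string::size_type k = i - start;
        row[i] = k < text.size() ? text[k] : ' ';
    }
    return true;
}
```

Because the row never changes length, the rest of the data keeps its position, which is what lets the file-based version write in place.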

Handling variable-width data is much more complicated and expensive, since it involves overwriting the file with all the data from the first changed row to the end of the file. In this situation the simplest and most robust solution is to base the data source on a real database.

void TsvDataSource::skipToRow(int row) const
{
    file.reset();
    QString line;
    int i = 0;
    while (i++ < row)
        file.readLine(line, 32767);
}

In this function we first move the file's position index to the beginning of the file by calling QFile::reset(). Then we call QFile::readLine() until we reach the requested row. Clearly this is very inefficient for large files. A simple optimization that trades memory for speed is to store each row's file offset in a vector when the data source is constructed. This would mean that skipToRow() could be replaced with file.at(offsets[row]).
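The offset-vector idea can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: it uses std::istream instead of QFile, and the function names buildRowOffsets() and readRow() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scans the stream once and records the offset at which each row starts.
std::vector<std::streamoff> buildRowOffsets(std::istream &in)
{
    std::vector<std::streamoff> offsets;
    std::string line;
    in.clear();
    in.seekg(0, std::ios::beg);
    for (;;) {
        std::streamoff pos = in.tellg();
        if (!std::getline(in, line))
            break;
        offsets.push_back(pos);
    }
    return offsets;
}

// Jumps straight to the requested row instead of rescanning from the start.
std::string readRow(std::istream &in,
                    const std::vector<std::streamoff> &offsets, int row)
{
    assert(row >= 0 && static_cast<std::size_t>(row) < offsets.size());
    in.clear();     // clear any end-of-file state left by an earlier read
    in.seekg(offsets[row], std::ios::beg);
    std::string line;
    std::getline(in, line);
    return line;
}
```

The index costs one offset per row. With the fixed-width setCell() shown earlier the offsets stay valid after edits, because overwriting a field never moves the rows that follow.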

A Model/View QTable Subclass

Now that we have a data source we are in a position to implement our QTable subclass. The key difference from a standard QTable is that our table class must use an external data source (TsvDataSource or any other DataSource subclass) instead of its own data structure (QTableItem). To achieve this we must reimplement some functions which usually work on QTableItems. Here is the definition of our QTable subclass:

class Table : public QTable
{
    Q_OBJECT
public:
    Table(DataSource *dSource, QWidget *parent = 0, const char *name = 0);
 
    QString text(int row, int col) const;
    QWidget *createEditor(int row, int col, bool initFromCell) const;
    void setCellContentFromEditor(int row, int col);
    QWidget *cellWidget(int row, int col) const;
    void endEdit(int row, int col, bool accept, bool replace);
    void paintCell(QPainter *painter, int row, int col,
                   const QRect &cr, bool selected, const QColorGroup &cg);
 
    void resizeData(int) {}
    QTableItem *item(int, int) { return 0; }
    void setItem(int, int, QTableItem *) {}
    void clearCell(int, int) {}
    void insertWidget(int, int, QWidget *) {}
    void clearCellWidget(int, int) {}
 
private slots:
    void updateContents();
 
private:
    DataSource *dataSource;
    mutable QLineEdit *editor;
};

The constructor takes the data source it should work on in addition to the widget's parent and name. The text(), createEditor(), setCellContentFromEditor(), cellWidget(), endEdit(), and paintCell() functions must be reimplemented to work on the data source. We will discuss these functions later on.

The remaining public functions are reimplemented as empty functions. Normally QTable uses these functions in conjunction with QTableItems; but since we are not using QTableItems we must make sure that these functions do nothing.

The private slot updateContents() just calls QTable::updateContents(). We will connect this slot to the data source's dataChanged() signal, so that the table will repaint itself when the data changes.

Finally we must remember the data source associated with the table, and cache the current editor used to edit a cell. We'll now review the implementation.

Table::Table(DataSource *dSource, QWidget *parent, const char *name)
    : QTable(dSource->numRows(), dSource->numCols(), parent, name),
      dataSource(dSource), editor(0)
{
    connect(dataSource, SIGNAL(dataChanged()), this, SLOT(updateContents()));
}

In the constructor we initialize the member variables and connect the data source's dataChanged() signal to the table's private updateContents() slot, which in turn calls QTable::updateContents().

QString Table::text(int row, int col) const
{
    return dataSource->cell(row, col);
}

The implementation of text() simply forwards the request to the data source.

void Table::paintCell(QPainter *painter, int row, int col,
                      const QRect &cr, bool selected, const QColorGroup &cg)
{
    QRect rect(0, 0, cr.width(), cr.height());
    if (selected) {
        painter->fillRect(rect, cg.highlight());
        painter->setPen(cg.highlightedText());
    } else {
        painter->fillRect(rect, cg.base());
        painter->setPen(cg.text());
    }
    painter->drawText(0, 0, cr.width(), cr.height(), AlignCenter, text(row, col));
}

Since we are not using QTableItems we must paint the cell content ourselves by reimplementing QTable::paintCell(). QTable translates the painter to the cell's coordinates and provides the other information we require. We fill the cell's background and set the painter's foreground color depending on the cell's selection state. Then we call drawText() to paint the cell's content. The AlignCenter flag aligns both vertically and horizontally.

QTable makes sure that paintCell() is only called for visible cells and then only if really necessary. Nonetheless, paintCell() is a performance critical function since it is called very often in comparison to other functions. In our case its performance depends on the data source implementation, since we call TsvDataSource::cell() via Table::text().
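One way to reduce the cost of repeated cell lookups during painting is to cache the most recently read row, since neighbouring cells of one row are often requested one after another. The sketch below is in standard C++ and is not part of the article's code: the class name RowCache and its loader callback are hypothetical, and the mutable members follow the caching idiom mentioned earlier for the QFile member.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Remembers the most recently loaded row so that repeated requests for
// the same row cost a single load. The loader is a hypothetical callback
// that fetches one whole row, for example by seeking and reading a line.
class RowCache
{
public:
    explicit RowCache(std::function<std::string(int)> loader)
        : loadRow(std::move(loader)), cachedRow(-1) {}

    const std::string &row(int r) const
    {
        assert(r >= 0);
        if (r != cachedRow) {
            cachedText = loadRow(r);
            cachedRow = r;
        }
        return cachedText;
    }

    // Must be called whenever the underlying data changes.
    void invalidate() { cachedRow = -1; }

private:
    std::function<std::string(int)> loadRow;
    mutable int cachedRow;          // -1 means nothing is cached
    mutable std::string cachedText;
};
```

A data source using such a cache would call invalidate() from its setCell() before notifying the views, so that no view is repainted from stale text.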

QWidget *Table::createEditor(int row, int col, bool initFromCell) const
{
    editor = new QLineEdit(viewport());
    if (initFromCell)
        editor->setText(text(row, col));
    return editor;
}

If the user starts editing a cell, createEditor() is called with the row and column as parameters. If the function returns 0 it means that the cell cannot be edited. We just create, remember, and return a QLineEdit. QTable will make sure that the editor widget is placed correctly. The initFromCell parameter is used to determine whether or not we should initialize the editor with the cell's current contents.

QWidget *Table::cellWidget(int row, int col) const
{
    if (row == currEditRow() && col == currEditCol())
        return editor;
    return 0;
}

If the user changes a column width or row height in the table while editing a cell, the editor's geometry might need to be updated. QTable handles this automatically by calling cellWidget() for the affected cells, and if the function returns a valid widget, the widget's geometry is updated. To take advantage of this mechanism, we must reimplement cellWidget().

void Table::endEdit(int row, int col, bool accept, bool replace)
{
    QTable::endEdit(row, col, accept, replace);
    delete editor;
    editor = 0;
}

The endEdit() function is called when the user has finished editing, either by accepting (pressing Return) or rejecting (pressing Esc) their changes. We call QTable's implementation to handle the user's acceptance or rejection of their edit. Then we delete the editor and reset the pointer to 0, so that cellWidget() no longer returns it.

void Table::setCellContentFromEditor(int row, int col)
{
    if (editor)
        dataSource->setCell(row, col, editor->text());
}

If the user accepted the changes, QTable's endEdit() function calls setCellContentFromEditor(). This function should write the updated cell contents to the underlying data structure. DataSource::setCell() emits the dataChanged() signal so that all open tables using this data source will update themselves. This means that when the user changes one table, the change is propagated to all the other tables that are viewing the same data source.

Here is a simple test application that provides two editable views of the same data:

#include <qapplication.h>
#include "table.h"
#include "tsvdata.h"
 
int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    TsvDataSource dataSource("test.tsv");
    Table table1(&dataSource);
    Table table2(&dataSource);
    table1.show();
    table2.show();
    app.connect(&app, SIGNAL(lastWindowClosed()), &app, SLOT(quit()));
    return app.exec();
}

Conclusion

This article has shown how to implement a model/view QTable subclass, by reimplementing the QTableItem-related functions to do nothing, and by reimplementing cell editing and painting functions to make use of a data source. The model/view Table can be used for all kinds of data, simply by implementing appropriate DataSource subclasses.
