Efficiency comparison of C++ Xml parsing (Qt/TinyXml 2/RapidXml)

Keywords: xml Qt Windows Linux

Initializing a program or saving its configuration often involves XML files: reading and writing them, parsing their content, and so on. At work I ran into a problem: Qt parsed XML files very slowly on an ARM platform. At first I suspected that I was using the API incorrectly, or that file operations on the ARM platform were slow in themselves, so I set out to find where the time was actually going. Here are some of my tests to share with you.

Background of the problem

The following code is the slow section mentioned above:

QString filename = "...";
QFile file( filename );

//< step1 open file
if( !file.open(QIODevice::ReadOnly) )
{
    qDebug() << "failed in opening file!";
    return false;
}

//< step2 read file content
QDomDocument doc;       //< #include <qdom.h>
if( !doc.setContent( &file ) )
{
    qDebug() << "failed in setting content!";
    file.close();
    return false;
}
file.close();
...  //< operations on the content of file!

At first I assumed that opening and closing the file took most of the time, so I recorded the system time before and after each of those calls. The result showed otherwise: the time was being spent in doc.setContent, where Qt reads the content of the XML file and builds the DOM structure. So I started looking for a more efficient solution.
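The step-by-step timing described above can be sketched with a small stopwatch helper. This is only a minimal sketch in standard C++11 (std::chrono::steady_clock) rather than the platform-specific calls used in the tests below, so it assumes a newer compiler than the ones listed in the testing environment. The names Stopwatch and timeStepMs are hypothetical and are not part of Qt or of the original code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from Qt): measures wall-clock time in milliseconds
// with a monotonic clock, so system clock adjustments do not skew the result.
class Stopwatch
{
public:
    Stopwatch() : start_(std::chrono::steady_clock::now()) {}

    // Restart the measurement from now.
    void restart() { start_ = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since construction or the last restart().
    double elapsedMs() const
    {
        const std::chrono::duration<double, std::milli> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};

// Runs one step (any callable) and returns how long it took in milliseconds.
template <typename Step>
double timeStepMs(Step step)
{
    Stopwatch sw;
    step();
    const double ms = sw.elapsedMs();
    assert(ms >= 0.0);  // a monotonic clock never goes backwards
    return ms;
}
```

Wrapping each call separately, for example timeStepMs([&]{ doc.setContent(&file); }), gives one number per step, which is enough to see whether open, setContent or close dominates.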

Testing environment

Windows:
System: Windows 10
CPU: Intel Core i5-5200U @ 2.2GHz
IDE: Visual Studio 2010
Compiler: VC10

Linux:
System: Debian 4.4.5-8
CPU: Intel Core i5-3450 @ 3.3GHz
IDE: Vim
Compiler: gcc version 4.4.5

  • Qt version: 4.8.4
  • The test file is DriverConfig.xml: 245 KB, 1561 lines in total, mostly Chinese text.
  • The first items compared are TinyXml2 and QDomDocument, because their interfaces are used in a very similar way. I will add comparisons with other XML parsing libraries, such as xmlbooster, later.

Qt - QDomDocument

The following source code uses Qt's XML support to read the contents of the file:

#include <QtCore/QCoreApplication>
#include <qdom.h>
#include <QFile>
#include <QIODevice>
#include <iostream>
#ifdef Q_OS_WIN
# include <Windows.h>
#else
# include <sys/time.h>
# include <unistd.h>   //< sleep()
#endif

using std::cout;
using std::endl;

#define TEST_TIMES 10

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

#ifdef Q_OS_WIN  //< windows

    LARGE_INTEGER nFreq;
    LARGE_INTEGER nStartTime;
    LARGE_INTEGER nEndTime;
    double time = 0.;

    QueryPerformanceFrequency(&nFreq);
    QFile file( "D:/DriverConfig.xml" );
    QDomDocument doc;

    for( int i = 0; i < TEST_TIMES; ++i )
    {
        doc.clear();

        //< step1 open file
        if( !file.open(QIODevice::ReadOnly) )
        {
            cout << "failed to open file!" << endl;
            continue;
        }
        Sleep( 100 );
        QueryPerformanceCounter(&nStartTime); 

        //< step2 set content
        if( !doc.setContent(&file) )
        {
            cout << "Failed to read xml file!" << endl;
        }
        QueryPerformanceCounter(&nEndTime);
        time = (double)(nEndTime.QuadPart-nStartTime.QuadPart) / (double)nFreq.QuadPart * 1000.;  //< ms
        cout << " setting content costs " << time << "ms" << endl;

        file.close();
        Sleep( 100 );
    }

#else //< LINUX

    timeval starttime, endtime;
    QFile file( "/home/liuyc/DriverConfig.xml" );
    QDomDocument doc;
    double timeuse = 0.;
    double timeAverage = 0.;

    for( int i = 0; i < TEST_TIMES; ++i )
    {
        doc.clear();

        //< step1 open file
        if( !file.open(QIODevice::ReadOnly) )
        {
            cout << "failed to open file!" << endl;
            continue;
        }
        sleep( 1 );  //< delay for 1s
        gettimeofday( &starttime, 0 );

        //< step2 set content
        if( !doc.setContent(&file) )
        {
            cout << "Failed to read xml file!" << endl;
            file.close();  //< close first, otherwise the next open() fails
            continue;
        }
        gettimeofday( &endtime, 0 );
        timeuse = 1000000. * (endtime.tv_sec - starttime.tv_sec) + endtime.tv_usec - starttime.tv_usec;
        timeuse *= 0.001 ;
        timeAverage += timeuse;
        cout << " reading files costs : " << timeuse << "ms" << endl;

        file.close();
        sleep( 1 );  //< delay for 1s
    }

    timeAverage /= TEST_TIMES;
    cout << " The End *****************\n    average costs = " << timeAverage << "ms" << endl; 

#endif

    return 0;  //< the test is finished; no event loop is needed
}

Let's take a look at the results under Windows:

My reaction was: WTF? Why does the same function, reading the same file ten times, show such large differences in time? I added a delay after the file is opened and after it is closed, hoping to rule out any effect of opening and closing the file on the measurement, but that did not solve the problem. I hope someone more knowledgeable can help me figure this out!

Now let's look at the results under Linux:

Obviously, the times under Linux are relatively stable and credible, so the later tests only use the Linux times as the reference.
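To judge how stable a set of runs is, it can help to report the minimum, median and maximum next to the average. The following is a minimal sketch in standard C++ under that assumption; TimingSummary and summarize are hypothetical names, and nothing here comes from Qt or from the original test programs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of repeated timings, all in milliseconds.
struct TimingSummary
{
    double min;
    double median;
    double mean;
    double max;
};

// Summarizes the samples of one benchmark (e.g. the TEST_TIMES runs of a loop).
// The vector is taken by value because it is sorted internally.
TimingSummary summarize(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());

    const std::size_t n = samples.size();
    TimingSummary s;
    s.min  = samples.front();
    s.max  = samples.back();
    s.mean = std::accumulate(samples.begin(), samples.end(), 0.0) / n;
    // Median: the middle element, or the mean of the two middle elements.
    s.median = (n % 2 == 1) ? samples[n / 2]
                            : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return s;
}
```

Pushing each measured time into a std::vector<double> inside the loop and calling summarize once at the end would do. A large gap between the minimum and the maximum, or between the median and the mean, points to outliers such as the ones seen under Windows.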

TinyXml-2

Now let's look at the source code for reading using tinyxml2:

#include <cstdlib>   //< system()
#include <iostream>
#include "tinyxml2.h"
#ifdef _WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#endif
using namespace tinyxml2;
using std::cout;
using std::endl;

#define TEST_TIMES  10

int main()
{
#ifndef _WIN32  //< linux ------------------------------------------------

    tinyxml2::XMLDocument doc;
    timeval starttime, endtime;
    double timeuse = 0.;
    double timeAverage = 0.;
    for( int i = 0; i < TEST_TIMES; ++i )
    {
        gettimeofday( &starttime, 0 );
        if( XML_SUCCESS != doc.LoadFile( "/home/liuyc/DriverConfig.xml" ) )
        {
            cout << "failed in load xml file! _ " << i << endl;
            continue;
        }
        gettimeofday( &endtime, 0 );

        timeuse = 1000000. * (endtime.tv_sec - starttime.tv_sec) + endtime.tv_usec - starttime.tv_usec;
        timeuse *= 0.001 ;
        cout << " reading files costs : " << timeuse << "ms" << endl;
        timeAverage += timeuse;
    }
    timeAverage /= TEST_TIMES;
    cout << " \n** The end *******************\n    the average costs = " << timeAverage << "ms" << endl;

#else  //< windows ---------------------------------------------------

    LARGE_INTEGER nFreq;
    LARGE_INTEGER nStartTime;
    LARGE_INTEGER nEndTime;
    double time = 0.;

    QueryPerformanceFrequency(&nFreq);
    tinyxml2::XMLDocument doc;
    for( int i = 0; i < TEST_TIMES; ++i )
    {
        QueryPerformanceCounter(&nStartTime); 
        if( XML_SUCCESS != doc.LoadFile( "D:/DriverConfig.xml" ) )
        {
            cout << "failed in load xml file! _ " << i << endl;
            continue;
        }
        QueryPerformanceCounter(&nEndTime);
        time = (double)(nEndTime.QuadPart-nStartTime.QuadPart) / (double)nFreq.QuadPart * 1000.;  //< ms
        cout << " reading files costs : " << time << "ms" << endl;
    }
    cout << endl;
    system("pause");

#endif  //< end of windows ---------------------------------------------------
    return 0;
}

Next, let's look at the results under Linux (the results under Windows no longer have much reference value):

The performance under Linux is again stable, and the conclusion is clear: TinyXml2 is much more efficient than QDomDocument (roughly four times faster on this data, although this excludes calls to the other interfaces that process the information inside the XML file).

Although they have little reference value, let's still look at the test results under Windows:

TinyXml2 is obviously much faster here than Qt under Windows, and its execution time is relatively stable. The unstable times in the previous test are therefore tentatively attributed to Qt's implementation itself. I have not found out what exactly causes this, or whether it has been fixed in later versions of Qt.

RapidXml

Note: RapidXml version: 1.13
The introduction of the RapidXml Manual compares RapidXml with TinyXml and other XML parsing libraries (among which TinyXml is the slowest) and presents it as the fastest XML parser currently available:

As a rule of thumb, parsing speed is about 50-100x faster than Xerces DOM, 30-60x faster than TinyXml, 3-12x faster than pugxml, and about 5% - 30% faster than pugixml, the fastest XML parser I know of.

So I also wanted to try out RapidXml's efficiency in content parsing. Here's the source code:

#include <cstdlib>   //< system()
#include <iostream>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
#include "rapidxml_utils.hpp"
#ifdef _WIN32
# include <Windows.h>
#else
# include <sys/time.h>
#endif

using namespace rapidxml;
using std::cout;
using std::endl;

#define TEST_TIMES  10

int main()
{
#ifdef _WIN32  //< windows

    LARGE_INTEGER nFreq;
    LARGE_INTEGER nStartTime;
    LARGE_INTEGER nEndTime;
    double time = 0.;
    QueryPerformanceFrequency(&nFreq);

    //< parse xml
    for( int i = 0 ; i < TEST_TIMES; ++i )
    {
        rapidxml::file<> filename( "D:/DriverConfig.xml" );
        xml_document<> doc;
        QueryPerformanceCounter(&nStartTime); 

        doc.parse<0>( filename.data() );

        QueryPerformanceCounter(&nEndTime);
        time = (double)(nEndTime.QuadPart-nStartTime.QuadPart) / (double)nFreq.QuadPart * 1000.;  //< ms
        cout << " reading files costs : " << time << "ms" << endl;
        doc.clear();
    }

    system("pause");

#else

    timeval starttime, endtime;
    double timeuse = 0.;
    double timeAverage = 0.;

    //< parse xml
    for( int i = 0 ; i < TEST_TIMES; ++i )
    {
        rapidxml::file<> filename( "/home/liuyc/DriverConfig.xml" );
        xml_document<> doc;
        gettimeofday( &starttime, 0 );

        doc.parse<0>( filename.data() );

        gettimeofday( &endtime, 0 );

        timeuse = 1000000. * (endtime.tv_sec - starttime.tv_sec) + endtime.tv_usec - starttime.tv_usec;
        timeuse *= 0.001 ;
        cout << " reading files costs : " << timeuse << "ms" << endl;
        doc.clear();

        timeAverage += timeuse;
    }
    timeAverage /= TEST_TIMES;
    cout << " \n** The end *******************\n    the average costs = " << timeAverage << "ms" << endl;

#endif

    return 0;
}

Similarly, let's first look at the results under Linux:

RapidXml is a bit more than twice as fast as TinyXml2 here, but there is nothing like the 30-60x difference mentioned in RapidXml's manual (which compares against TinyXml, not TinyXml2). I don't know whether TinyXml2 is significantly faster than TinyXml or whether I am using RapidXml incorrectly. Later, I need to study the use of RapidXml's interface more carefully.
From my own initial use, I do not find RapidXml's interface as simple and easy to use as those of Qt and TinyXml2, so TinyXml2 may be a win-win in terms of development efficiency and runtime efficiency when the file is not large or the efficiency requirements are not very strict.
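One detail of the measurements may matter when comparing these numbers: in the RapidXml test the file is loaded by rapidxml::file<> before the timer starts, whereas the Qt and TinyXml2 tests time reading and parsing together. A hedged sketch of loading the file into memory first, so that only parsing is timed, could look like this. It uses standard C++ only, and loadZeroTerminated is a hypothetical helper name, not part of any of the three libraries.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into a zero-terminated, writable
// buffer, so that file I/O can be timed separately from parsing.
std::vector<char> loadZeroTerminated(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');  // terminator expected by in-place parsers
    assert(buffer.back() == '\0');
    return buffer;
}
```

The buffer can then be passed to doc.parse<0>( &buffer[0] ) for RapidXml, which is what rapidxml::file<> provides through data(). As far as I know, TinyXml2 and Qt also accept in-memory input (XMLDocument::Parse and the QByteArray overload of QDomDocument::setContent), so all three could in principle be timed on the same buffer. Whether this changes the ratios was not measured here.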

Let's take a look at the results under Windows:

They are still not very stable, so they are for reference only.

Summary

The measured times are as follows (Linux):

Parser            Time consumed (ms)   Efficiency multiple (relative to Qt)
Qt-QDomDocument   25.85                1
TinyXml2          6.64                 3.89
RapidXml          2.71                 9.54
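The efficiency multiples are simply the Qt time divided by each parser's time. A minimal sketch of that arithmetic follows; speedup and roundTo2 are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Speedup of a candidate relative to a baseline: baseline time / candidate time.
double speedup(double baselineMs, double candidateMs)
{
    assert(baselineMs > 0.0 && candidateMs > 0.0);
    return baselineMs / candidateMs;
}

// Rounds to two decimals, matching the precision used in the table.
double roundTo2(double x)
{
    return std::floor(x * 100.0 + 0.5) / 100.0;
}
```

With the times from the table, roundTo2(speedup(25.85, 6.64)) gives 3.89 and roundTo2(speedup(25.85, 2.71)) gives 9.54. The same function applied to TinyXml2 and RapidXml, speedup(6.64, 2.71), gives about 2.45, which is the "a bit more than twice" figure from the RapidXml section.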

Since my work is basically all done with Qt, I have come to appreciate that Qt's interface wrappers are complete and easy to use, but they inevitably sacrifice some efficiency (although I did not expect the loss to be this large). By contrast, TinyXml2 has a very similar interface and is significantly faster (nearly four times), so for now TinyXml2 is still my recommendation for parsing XML. Later, I will continue to study TinyXml2 and RapidXml carefully to see whether RapidXml's performance can really be brought into full play!

Posted by mvidberg on Fri, 21 Dec 2018 06:45:06 -0800