Quantcast
Channel: Jobs
Viewing all articles
Browse latest Browse all 18427

Large text file import into sqlite

$
0
0
So I have been working on a program which imports the contents of a text file into a sqlite database, divided by line. This is my first foray into the wonderful world of QT5, but not my first rodeo with C++. I have spent a goodly amount of time Googling this all, and looking through the documentation. I am not a stranger to working through my own problems and figuring out what is wrong. In fact, this program is working. I am not asking for help fixing an error, Google does a dang good job of answering those questions for me. I am asking for your experienced advice, simply because I am still pretty new at this. Back to my ‘issue’. Like I said, i am using sqlite to store a lot of small amounts of text. <30GB sizes of text. (worst case scenario, but still). The program has one functioning button, which allows a user to import a text file into the database. I have got that to work. My problem is that it can import (on my pretty good computer) about 4 lines of text per second. If i have a a 1.2GB file with >200,000,000 lines of text, this becomes an issue. Here is my relevant code; void MainWindow::on_pushButton_clicked() {     //open personal sqlite database     QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");     db.setDatabaseName("./data.sqlt");       //make sure the database opened properly     if(!db.open()) { //just in case         qFatal("Failed to connect to the database");     } else {         qDebug("Succesfully Connected to the database");     }     QSqlQuery qry;       //create database with table if it doesn't exist     qry.prepare("CREATE TABLE IF NOT EXISTS pw (pass VARCHAR(63), value INT(256))");     if(!qry.exec()) {         qDebug() << qry.lastError();     } else {         qDebug() << "Succesfully Connected to the table";     }       //open dialgue for finding new wordlist to import     QString database = QFileDialog::getOpenFileName(this, tr("Open Database"),"",tr("Password Database (*.*)"));     QFile fdatabase(database);       if (!fdatabase.open(QIODevice::ReadOnly | QIODevice::Text)) { //if the file doesn't exist         qDebug("User cancelled alphabetical wordlist import");     } else {         //show the dialogue for choosing the wordlist to import         QProgressDialog progress("Gathering information...", "Abort Import", 0, 0, this);             progress.show();             progress.setWindowModality(Qt::WindowModal);           //count the words in the wordlist         int numo = 0;         int numo2 = 100;         while(!fdatabase.atEnd()) {             //read the text file             fdatabase.readLine();               //count the lines of text             numo++;               //keep the user informed             if(numo2 == numo) {                 progress.setLabelText("Gathering information...\nFound " + QString::number(numo) + " new passwords");                 numo2 += 100;             }               //ensure the program doesn't freeze             qApp->processEvents(QEventLoop::ExcludeUserInputEvents);               //the user can cancel importing wordlist if he wants             if (progress.wasCanceled()) {                 break; }         }           //restart the database because i don't know how to read the file again         fdatabase.close();         fdatabase.open(QIODevice::ReadOnly | QIODevice::Text);           //Update the progress dialogue and set everything ready for actual import         QString tnumPass;         QString xnumPass;         tnumPass = QString::number(numo);         progress.setLabelText("Importing 1 of " + tnumPass + " new passwords...");         progress.setMaximum(numo);         qDebug() << "Counted" << numo << "passwords in selected file" <<  fdatabase.fileName();         numo = 0;           //read the wordlist into personal sqlite database         while(!fdatabase.atEnd()) {             QString line = fdatabase.readLine();               //show progress             numo++;             xnumPass = QString::number(numo);             progress.setValue(numo);             progress.setLabelText("Importing " + xnumPass + " of " + tnumPass + " new passwords...\nImporting Password: " + line);               //insert new password             qry.prepare("INSERT INTO pw (pass, value) values ('" + line + "', '1')");               //if the password was not inserted correctly             if (!qry.exec()) {                 qDebug() << qry.lastError();             }               //ensure the program doesn't freeze             qApp->processEvents(QEventLoop::ExcludeUserInputEvents);               //the user can cancel importing wordlist if he wants             if (progress.wasCanceled()) {                 break;             }         }     //close the sqlite database     db.close();     } } My question, or plea for advice, then, is this. How do I speed up this process? If I import a 10GB text file, going line by line, at about 4 lines a second into my sqlite database, it will take for ever for it to finish. I can’t load it all into memory and then push it to the sqlite database, because I can’t guarantee that the end user will have the amount of RAM needed to fit it all. Is there a faster way to store this type of data that i am unaware of? Am I doing something, really really, wrong that is causing this to put the data into the sqlite file? If you want to compile this yourself, here is the source code (run in debian based linux, on a 3.2ghz dual core AMD processor with a 1TB 7200RPM hard drive) http://www.sendspace.com/file/7t89ta. I would love to hear your opinion about what is the best way to work with VERY large text files, and the best way to store them in a easy access format. Any and all contributions would be greatly appreciated.

Viewing all articles
Browse latest Browse all 18427

Trending Articles