Thanks for taking the time to write this, Bob. ACID I needed to be reminded of, it's good to know there are more than just one or two basic DBs with interfaces other than SQL, and there are a few points in here I'm sure I'll be recalling in the future, thinking "oh that's what he meant!"
On 26 May 2010, at 22:45, Bob Pendleton wrote:
Since no one has mentioned this, I guess I will. The main reason to use a data base system is ACID. ACID ACID short for Atomicity – Consistency – Isolation – Durability and describes the four properties of an enterprise-level transaction: * ATOMICITY: a transaction should be done or undone completely. In the event of an error or failure, all data manipulations should be undone, and all data should rollback to its previous state. * CONSISTENCY: a transaction should transform a system from one consistent state to another consistent state. * ISOLATION: each transaction should happen independently of other transactions occurring at the same time. * DURABILITY: Completed transactions should remain stable/permanent, even during system failure. Insuring that every transaction is fully ACID compliant adds so complexity and overhead to the code that implements the database. But, it is absolutely wonderful when your program dies, or your system dies, and your data base integrity has not been compromised. Nothing quite as much fun as walking through a database looking for broken links and orphaned entries added by a transaction that was in play when the admin tripped over the power cord. :-) You can not get ACID compliance with any kind of purely in memory data management system. Another thing I think I noticed is a bit of confusion about SQL and databases. SQL is the not nearly Standard enough Query Language and it is used as an interface to most relation database management systems. But, if you look around, you'll find that a lot of DBMSes have a C/C++ api that doesn't use a lot of SQL. In fact, some of them don't use any SQL at all. Then there are the DBMSes that don't use SQL at all. Take a look at DBM, AKA GDBM. Go to Google and type in "man unix dbm" to find the man page. You'll probably have to install the package to use it. GDBM is a replacement for DBM which is a linked in, lives on disk, completely non SQL DBMS. The point of all that is that you can have a database with out SQL. I'm so old that when I took the class on databases in college we spent half the class on Codasyl databases and the other half on those new fangled relational databases. We looked at the relational databases just because my professor thought they might be important some day. We used Ingres which used its own QUEL query language. Ingres was the second relational database and did not support SQL. If you want to keep stuff on disk and can live without guaranteed ACID compliance look at something like DBM. It is really easy to use and works great. Now days, we are starting to see the cracks in the relational database model. RDBMSes are really good up to a certain size. After that they start to fall apart. The new big deal in DBMSes are the things like Googles BigTable and hadoop. These are systems that allow simple tag,value databases to span across thousands of machines. But, I don't think you are worried about petabyte databases right now. :-) Getting back to your question. Database management systems are designed to provide ACID compliance. (At least most of them are.) If you need to have your data survive a hardware failure or a system crash you need to use a database. Database servers, even when they are just an old PC, can be much faster than just keeping the data in memory if the data is big enough. That is because the data is indexed and the indexing systems have been tuned and worked over for decades. The indexing systems are part of the real value of a DBMS. Let's say I start out needing to store a few under key,value pairs in my array. I can just do a sequential search to find values, I can add new values at the end or in empty slots. And I can delete pairs by just setting their slot to NULL. Works great. Its simple and reliable you can do it up in a few minutes. What happens when you need to start storing several thousand items? Then your simple sequential search starts taking a meaningful amount of time. When you hit millions (not hard to do on hardware that can support 8 to 16 gigabytes) that simple search gets to be a real problem. So what do you do? Well, you move to a hash table and it gets fast again. Unless you have multiple copies of the same tag. The simplest fastest kinds of hash tables don't like multiple identical tags. You discover then that multmap is your friend, or you wind up writing a lot of code. The same thing happens if you have multiple tables with relationships between the tables. The look ups have to be done for each table and deletions and insertions have to work across all the different tables. You wind up implementing most of a database to get this all to work. Or, you learn a lot about more about the STL than you ever thought you wanted to know. I've done it both ways several times. I have a data base that I use to generate about 8 thousand web pages. I read every table into the appropriate STL data type and do the work in RAM. It took less than a minute to run last time I ran it. It took a fair amount of time to write that code. I did it because when I put the data on a server and used SQL to do the queries I had to let the job run over night. It was walking over every entry in the entire database doing all sorts of weird things including having to resort the data several different ways. And, at the time I was using MySQL running on a 300 Mhz PII that I picked up as surplus from some junk store. :-) So, it made sense do it all in memory. It took almost no time to code it up using SQL and ODBC to connect to my MySQL database. OTOH, coding it up in C++ using STL took a while to do and a while longer to get right. :-) Basically, if your database is simple, at most a few tables with simple relationships, the data all fits in memory, and you can live without ACID, then do it in RAM. You might want to make your life easier by learning the STL or by finding a nice implementation of a hash table. If that does not describe your application, use a DBMS. You don't have to use SQL with your DBMS. Like I said, a lot of DBMSes have a C/C++ api that you can use instead. If you are somewhere in between then look at things like good old DBM or Berkeley DB. I do believe that Berkeley DB may be just what you are looking for, it is like sqlite but has not SQL. Bob PendletonOn Sat, May 22, 2010 at 12:56 AM, Chris Nystrom <cnystrom@xxxxxxxxx> wrote:Below is an example c program that I wrote to use the SQLite library (http://www.sqlite.org/). It compiles without error under Linux. SQLite seems to be far more trouble to use than simply creating arrays. I assume that as you scale up though, it is important for performance to have an actual database instead of internal arrays of data. Is there a rule of thumb about when you should use a database? Or is there some other reason to use SQL besides performance with large systems? Chris -- #include <stdio.h> #include <stdlib.h> #include <sqlite3.h> sqlite3 *db; void open_db(char *db_name) { if (sqlite3_open(db_name, &db)) {fprintf(stderr, "Can't open database: %s\n", sqlite3_errmsg(db));sqlite3_close(db); } } char **result; int nrow = 0; int ncol = 0; int db_exe(char *s_exe) { int rc; char *zErrMsg = 0; rc = sqlite3_get_table(db, /* An open database */ s_exe, /* SQL to be executed */&result, /* Result written to a char *[] */&nrow, /* Number of result rows */ &ncol, /* Number of result columns */ &zErrMsg /* Error msg written here */ ); return rc; } void db_close(void) { sqlite3_close(db); } void get_data(void) { int i, j; if (nrow > 0) { printf("nrow = %d\n", nrow); printf("ncol = %d\n\n", ncol); /* header */ for (i = 0; i < ncol; ++i) printf("%s\t", result[i]); printf("\n"); /* data rows */ for (i = 0; i < nrow; i++) { for (j = 0; j < ncol; j++) { printf("%s\t", result[(ncol * (i + 1)) + j]); } printf("\n"); } } sqlite3_free_table(result); } int main(int argc, char **argv) { char *db_name = "test.db"; /* remove old db if it exists */ remove(db_name); /* create db */ open_db(db_name); /* create emp table */db_exe("create table emp (name varchar(15),age int,weight double)");/* insert data into tables */ db_exe("insert into emp (name,age,weight) values ('Bob',47,172)"); db_exe("insert into emp (name,age,weight) values ('Sue',38,134)"); /* show data in emp table */ db_exe("select * from emp"); get_data(); return 0; } -- E-Mail: Chris Nystrom <cnystrom@xxxxxxxxx> Saving the world from web programming. http://www.newio.org - G-Talk: cnystrom --------------------- To unsubscribe go to http://gameprogrammer.com/mailinglist.html-- +----------------------------------------------------------- + Bob Pendleton: writer and programmer + email: Bob@xxxxxxxxxxxxx + web: www.TheGrumpyProgrammer.com --------------------- To unsubscribe go to http://gameprogrammer.com/mailinglist.html
-- Simplicity does not precede complexity, but follows it. -- Alan Perlis --------------------- To unsubscribe go to http://gameprogrammer.com/mailinglist.html