ACID and BASE: MySQL and NoSQL
SQL is an acronym for Structured Query Language, which was standardized early in the history of relational databases. MySQL is the most widely-used open-source relational database. NoSQL stands for Not only SQL, and is an emerging alternative to SQL databases. NoSQL options include MongoDB, CouchDB, VoltDB, Membase, and Schooner Membrain: the smart memcached, optimized for flash memory. How does a relational database like MySQL differ from NoSQL, and what are the main use cases for each?
MySQL with the InnoDB plug-in storage engine is a full ACID-compliant relational database, like Oracle, IBM DB2, and Microsoft SQL Server. This means that database transactions are processed in a way that guarantees data integrity. ACID is an acronym for:
- Atomic: Database modifications must follow an all-or-nothing rule and either complete or not complete, but never leave data in some in-between state.
- Consistent: Any database modification must keep the database in a consistent state, or, more precisely, change the database from one consistent state to another consistent state.
- Isolated: This requires that other operations cannot access or see data that has been modified during a transaction that has not yet completed. In other words, concurrent transactions must not interfere with one another.
- Durable: The database guarantees that once the user is told that the transaction has succeeded, the transaction will not be lost even in the case of system failure.
The ACID mechanisms impose some performance overhead. That's obviously worthwhile when you need absolute data integrity, as when you're building an electronic funds transfer system. But some applications don't have stringent data integrity requirements. Losing a web cookie, for example, usually has no bad side effects other than a slight delay in web site access. For applications that don't need full data integity, why pay for the ACID overhead? This is especially true where performance requirements are very high. There are also important issues of elasticity and scalability: how many simultaneous writes and reads can be handled by a single database, and over how many servers must (or can) the workload be spread to achieve the desired performance goals?
The NoSQL data stores sacrifice ACID-compliance to eliminate the performance overhead. They range from key-value stores to document/object stores to extensible record stores. NoSQL data stores are genuinely useful for many applications. All NoSQL offerings relax one or more of the ACID properties, and usally have simpler APIs that don't offer the full power of SQL queries.
The NoSQL alternative to ACID is sometimes described as BASE, an acronym for:
- Basically Available,
- Soft state,
- Eventual consistency,
as in this 2008 ACM article by Dan Pritchett.
There's also an excellent 2010 paper by database luminary and Schooner Advisory Board member, Dr. Rick Cattell, entitled "High Performance Scalable Data Stores." Also, here's a CACM blog post — Errors in Database Systems, Eventual Consistency, and the CAP Theorem — by database pioneer Michael Stonebraker, which nicely discusses some of the tradeoffs between ACID's guaranteed permanent consistency and BASE's eventual consistency model.
Caching, or temporary storage of data retrieved from a relational database, is a very common technique for speeding up pure MySQL-based applications. Memcached, an open-source caching protocol, is very widely utilized because of its low latency and high scalability.
Because memcached is so popular we believe it is the right foundation for more powerful NoSQL data stores. Implement memcached to exploit a lot of fast flash memory and add persistence and enterprise-grade high-availability / disaster-recovery. Suddenly, you have a superb enterprise-ready NoSQL solution for production use on demanding workloads. That's what Membrain is. It complements SchoonerSQL, either as pure cache used with MySQL or a NoSQL replacement for jobs where MySQL is not needed.
