ACID

From Wikipedia, the free encyclopedia

Jump to: navigation, search

For other uses, see Acid (disambiguation).

This article needs additional citations for verification.
Please help improve this article by adding reliable references. Unsourced material may be challenged and removed. (March 2008)

This article needs attention from an expert on the subject. See the talk page for details. WikiProject Computer science or the Computer science Portal may be able to help recruit an expert. (August 2009)

In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even though that might involve multiple changes (such as debiting one account and crediting another), is a single transaction.

Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to automatically achieve them.[1] In 1983, Andreas Reuter and Theo Haerder coined the acronym ACID to describe them.[2] Contents[hide]

  • 1 Characteristics
    • 1.1 Atomicity
    • 1.2 Consistency
    • 1.3 Isolation
    • 1.4 Durability
  • 2 Examples
    • 2.1 Atomicity failure
    • 2.2 Consistency failure
    • 2.3 Isolation failure
    • 2.4 Durability failure
  • 3 Implementation
    • 3.1 Locking vs multiversioning
    • 3.2 Distributed transactions
  • 4 See also
  • 5 Notes
  • 6 References
[ edit] Characteristics [ edit] Atomicity

Atomicity requires that database modifications must follow an "all or nothing" rule. Each transaction is said to be atomic if when one part of the transaction fails, the entire transaction fails and database state is left unchanged. It is critical that the database management system maintains the atomic nature of transactions in spite of any application, DBMS, operating system or hardware failure.

An atomic transaction cannot be subdivided, and must be processed in its entirety or not at all. Atomicity means that users do not have to worry about the effect of incomplete transactions.

Transactions can fail for several kinds of reasons:

  1. Hardware failure: A disk drive fails, preventing some of the transaction's database changes from taking effect
  2. System failure: The user loses their connection to the application before providing all necessary information
  3. Database failure: E.g., the database runs out of room to hold additional data
  4. Application failure: The application attempts to post data that violates a rule that the database itself enforces, such as attempting to create a new account without supplying an account number.
[ edit] Consistency

The consistency property ensures that the database remains in a consistent state; more precisely, it says that any transaction will take the database from one consistent state to another consistent state.

The consistency property does not say how the DBMS should handle an inconsistency other than ensure the database is clean at the end of the transaction. If, for some reason, a transaction is executed that violates the database’s consistency rules, the entire transaction could be rolled back to the pre-transactional state - or it would be equally valid for the DBMS to take some patch-up action to get the database in a consistent state. Thus, if the database schema says that a particular field is for holding integer numbers, the DBMS could decide to reject attempts to put fractional values there, or it could round the supplied values to the nearest whole number: both options maintain consistency.

The consistency rule applies only to integrity rules that are within its scope. Thus, if a DBMS allows fields of a record to act as references to another record, then consistency implies the DBMS must enforce referential integrity: by the time any transaction ends, each and every reference in the database must be valid. If a transaction consisted of an attempt to delete a record referenced by another, each of the following mechanisms would maintain consistency:

  • abort the transaction, rolling back to the consistent, prior state;
  • delete all records that reference the deleted record (this is known as cascade delete); or,
  • nullify the relevant fields in all records that point to the deleted record.

These are examples of Propagation constraints; some database systems allow the database designer to specify which option to choose when setting up the schema for a database.

Application developers are responsible for ensuring application level consistency, over and above that offered by the DBMS. Thus, if a user withdraws funds from an account and the new balance is lower than the account's minimum balance threshold, as far as the DBMS is concerned, the database is in a consistent state even though this rule (unknown to the DBMS) has been violated. [edit] Isolation

Isolation refers to the requirement that other operations cannot access data that has been modified during a transaction that has not yet completed. The question of isolation occurs in case of concurrent transactions, (i.e. multiple transactions occurring at the same time) and within the same database. Each transaction must remain unaware of other concurrently executing transactions, except that one transaction may be forced to wait for the completion of another transaction that has modified data that the waiting transaction requires. If the isolation system does not exist, then the data could be put into an inconsistent state. This could happen if one transaction is in the process of modifying data but has not yet completed, and then a second transaction reads and modifies that uncommitted data from the first transaction. If the first transaction fails and the second one succeeds, that violation of transactional isolation will cause data inconsistency. Due to performance and deadlocking concerns with multiple competing transactions, many modern databases allow dirty reads which is a way to bypass some of the restrictions of the isolation system. A dirty read means that a transaction is allowed to read, but not modify, the uncommitted data from another transaction. [edit] Durability

Durability is the ability of the DBMS to recover the committed transaction updates against any kind of system failure (hardware or software). Durability is the DBMS's guarantee that once the user has been notified of a transaction's success the transaction will not be lost, the transaction's data changes will survive system failure, and that all integrity constraints have been satisfied, so the DBMS won't need to reverse the transaction. Many DBMSs implement durability by writing transactions into a transaction log that can be reprocessed to recreate the system state right before any later failure. A transaction is deemed committed only after it is entered in the log.

Durability does not imply a permanent state of the database. A subsequent transaction may modify data changed by a prior transaction without violating the durability principle. [edit] Examples

The following examples are used to further explain the ACID properties. In these examples, the database has two fields, A and B, in two records. An integrity constraint requires that the value in A and the value in B must sum to 100. [edit] Atomicity failure

The transaction subtracts 10 from A and adds 10 to B. If it succeeds, it would be valid because the data continues to satisfy the constraint. However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retains A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction complete or neither. [edit] Consistency failure

Consistency is a very general term that demands the data meets all validation rules that the overall application expects - but to satisfy the consistency property a database system only needs to enforce those rules that are within its scope. In the previous example, one rule was a requirement that A + B = 100; most database systems would not allow such a rule to be specified, and so would have no responsibility to enforce it - but they would be able to ensure the values were whole numbers. Example of rules that can be enforced by the database system are that the primary keys values of a record uniquely identify that record, that the values stored in fields are the right type (the schema might require that both A and B are integers, say) and in the right range, and that foreign keys are all valid.

Validation rules that cannot be enforced by the database system are the responsibility of the application programs using the database. [edit] Isolation failure

To demonstrate isolation, we assume two transactions execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation.

Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from B to A. Combined, there are four actions:

  • subtract 10 from A
  • add 10 to B.
  • subtract 10 from B
  • add 10 to A.

If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails half-way through. The database eliminates T1's effects, and T2 sees only valid data.

By interleaving the transactions, the actual order of actions might be: A-10, B-10, B+10, A+10. Again consider what happens if T1 fails. T1 still subtracts 10 from A. Now T2 adds 10 to A restoring it to its initial value. Now T1 fails. What should A's value be? T2 has already changed it. Also, T1 never changed B. T2 subtracts 10 from it. If T2 is allowed to complete, B's value will be 10 too low, and A's value will be unchanged, leaving an invalid database. This is known as a write-write failure because two transactions attempted to write to the same data field. [edit] Durability failure

Assume that a transaction transfers 10 from A to B. It removes 10 from A. It then adds 10 to B. At this point, a "success" message is sent to the user. However, the changes are still queued in the disk buffer waiting to be committed to the disk. Power fails and the changes are lost. The user assumes that the changes have been made, but they are lost.

To satisfy the durability constraint, the database system must ensure the success message is delayed until the transaction is finalized. (It may be enough to ensure that a transaction log has been fully written to disk; in the event of a crash and restart, lost transactions are rerun, bringing the database up to date. Got it? [edit] Implementation

Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.

There are two popular families of techniques: write ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated, and depending on the level of isolation, possibly on all data that is read as well. In write ahead logging, atomicity is guaranteed by copying the original (unchanged) data to a log before changing the database. That allows the database to return to a consistent state in the event of a crash.

In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the transaction commits. [edit] Locking vs multiversioning

Most databases rely upon locking to provide ACID capabilities. Locking means that the transaction marks the data that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that are read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions. For example, if user A is running a transaction that has to read a row of data that user B wants to modify, user B must wait until user A's transaction completes. Two phase locking is often applied to guarantee full isolation.[citation needed]

An alternative to locking is multiversion concurrency control, in which the database provides each reading transaction the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks. I.e., writing transactions do not block reading transactions, and readers do not block writers. Going back to the example, when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation relaxes the isolation property, namely snapshot isolation. [edit] Distributed transactions

Guaranteeing ACID properties in a distributed transaction across a distributed database where no single node is responsible for all data affecting a transaction presents additional complications. Network connections might fail, or one node might successfully complete its part of the transaction and then be required to roll back its changes because of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides atomicity for distributed transactions to ensure that each participant in the transaction agrees on whether the transaction should be committed or not.[citation needed] Briefly, in the first phase, one node (the coordinator) interrogates the other nodes (the participants) and only when all reply that they are prepared does the coordinator, in the second phase, formalize the transaction.