DDIA Ch7

Transactions are not a law of nature; they were created with a purpose, namely to simplify the programming model for applications accessing a database.

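Every tool exists to make some job easier, mathematical tools included: each was created to make a particular task more convenient (for example, imaginary numbers arose so that the solutions of cubic equations could be computed), and once a tool exists it may end up being used elsewhere.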

This includes thinking tools (tools that exist only in your head).

Definition of ACID

Atomicity:

In general, atomic refers to something that cannot be broken into smaller parts.

In multi-threaded programming, if one thread executes an atomic operation, that means there is no way that another thread could see the half-finished result of the operation. The system can only be in the state it was in before the operation or after the operation, not something in between.

By contrast, in the context of ACID, atomicity is not about concurrency. It does not describe what happens if several processes try to access the same data at the same time, because that is covered under the letter I, for isolation.

Rather, ACID atomicity describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed.

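This is a bit like the concurrency notion of the system never being left half-finished, but the emphasis here is different: the client makes several write requests, some of those writes have already completed, and then something goes wrong (a process crash, etc.).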

If the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction.

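So atomicity here is defined in terms of a transaction, not a single write: compared with an atomic operation in multi-threaded programming, it applies to a whole group of operations.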
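A minimal sketch of this with Python's built-in sqlite3 module (the accounts table, the names, and the simulated crash are made up for illustration): two writes are grouped into one transaction, and a fault after the first write makes the database undo it.

```python
import sqlite3

# In-memory database with two hypothetical accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 500)])
conn.commit()


def transfer(conn, src, dst, amount, crash_midway=False):
    """Group two writes into one transaction; if anything fails, both are rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            if crash_midway:
                raise RuntimeError("simulated fault after the first write")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except RuntimeError:
        pass  # the transaction was aborted and its first write has been undone


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer(conn, "alice", "bob", 100, crash_midway=True)
print(balances(conn))  # {'alice': 500, 'bob': 500}: the half-done transfer was discarded
transfer(conn, "alice", "bob", 100)
print(balances(conn))  # {'alice': 400, 'bob': 600}
```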

Consistency

In the context of ACID, consistency refers to an application-specific notion of the database being in a "good state".

The idea of ACID consistency is that you have certain statements about your data (invariants) that must always be true; for example, in an accounting system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these invariants, and any writes during the transaction preserve the validity, then you can be sure that the invariants are always satisfied.

In other words, if your system requires some data to satisfy certain conditions across every transaction (like the accounting example), then consistency means every transaction leaves that data valid.


The application decides what is an invariant and what is not.

However, this idea of consistency depends on the application’s notion of invariants, and it’s the application’s responsibility to define its transactions correctly so that they preserve consistency. This is not something that the database can guarantee: if you write bad data that violates your invariants, the database can’t stop you. (Some specific kinds of invariants can be checked by the database, for example using foreign key constraints or uniqueness constraints. However, in general, the application defines what data is valid or invalid; the database only stores it.)

The application defines what is valid; if that definition is wrong, the DB doesn't care, it only stores the data.

Consistency is an application-level property.

Atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application.
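A small sqlite3 sketch of this split, assuming a hypothetical accounts table: a CHECK constraint is an invariant the database itself can enforce, while "the total amount of money never changes" exists only in the application, so the database stores a violation of it without complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invariant the database CAN check: no negative balances (a CHECK constraint).
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 500)])
conn.commit()

# Invariant only the application knows about: money is never created or destroyed.
EXPECTED_TOTAL = 1000


def total(conn):
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


def overdraw(conn):
    """Breaks the CHECK constraint, so the database itself rejects the write."""
    try:
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - 900 WHERE name = 'alice'")
        return True
    except sqlite3.IntegrityError:
        return False


def buggy_bonus(conn):
    """Breaks the application's invariant; no constraint knows about it, so it is stored."""
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")


print(overdraw(conn))                 # False: the database enforced balance >= 0
buggy_bonus(conn)
print(total(conn) == EXPECTED_TOTAL)  # False: only application code could have caught this
```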

Isolation

Isolation deals with concurrency problems.

Isolation in the sense of ACID means that concurrently executing transactions are isolated from each other.

The classic database textbooks formalize isolation as serializability, which means that each transaction can pretend that it is the only transaction running on the entire database. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another).

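This gives a good definition of serializability: classic DB textbooks formalize isolation as serializability. That is, if two transactions run at the same time, the end result is the same as if they had run serially, one after the other.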
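To make the definition concrete, here is a brute-force check in Python, assuming transactions are modeled as pure functions from a state dict to a new state dict (t1 and t2 are hypothetical). It only compares final states, a simplification of the formal definition, but it shows the idea: a concurrent outcome is acceptable only if some serial order produces it.

```python
from itertools import permutations


# Hypothetical transactions: each maps a state dict to a new state dict.
def t1(state):
    return {**state, "x": state["x"] + 1}


def t2(state):
    return {**state, "x": state["x"] * 2}


def serial_outcomes(initial, txns):
    """Final states reachable by running txns one after another, in every possible order."""
    outcomes = []
    for order in permutations(txns):
        state = dict(initial)
        for txn in order:
            state = txn(state)
        outcomes.append(state)
    return outcomes


def is_serializable(initial, txns, concurrent_result):
    """A concurrent execution is acceptable if its result equals some serial result."""
    return concurrent_result in serial_outcomes(initial, txns)


initial = {"x": 1}
print(serial_outcomes(initial, [t1, t2]))            # [{'x': 4}, {'x': 3}]
# Interleaving where both read x = 1 before either writes, and t2's write lands last:
print(is_serializable(initial, [t1, t2], {"x": 2}))  # False
```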

Durability

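Durability means data is stored for the long term: whether there is one node or several, a transaction is only reported as committed after the data has been safely persisted on that node (or confirmed by the required number of nodes).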

In a single-node database, durability typically means that the data has been written to nonvolatile storage such as a hard drive or SSD. It usually also involves a write-ahead log or similar (see “Making B-trees reliable” on page 82), which allows recovery in the event that the data structures on disk are corrupted. In a replicated database, durability may mean that the data has been successfully copied to some number of nodes. In order to provide a durability guarantee, a database must wait until these writes or replications are complete before reporting a transaction as successfully committed.
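A minimal write-ahead-log sketch in Python, assuming a JSON-lines log file and a made-up WriteAheadLog class. The point it shows is the ordering: commit() only returns after flush() and os.fsync(), so a crash after the acknowledgment cannot lose the write. A real log would also need checksums to detect torn writes, which this sketch skips.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy WAL: a commit is acknowledged only after its record is fsynced."""

    def __init__(self, path):
        self.path = path

    def commit(self, txid, writes):
        record = json.dumps({"txid": txid, "writes": writes})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to nonvolatile storage
        return "committed"        # only now is it safe to report success

    def recover(self):
        """Rebuild the state after a restart by replaying every logged commit in order."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                state.update(json.loads(line)["writes"])
        return state


path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WriteAheadLog(path)
wal.commit(1, {"x": 1})
wal.commit(2, {"x": 2, "y": 5})
print(WriteAheadLog(path).recover())  # {'x': 2, 'y': 5}, as if after a crash and restart
```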

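Side question: does denormalization mean splitting data up? Not quite: denormalization means duplicating data across records so reads can avoid joins; normalization is what splits data into separate tables. (It matters here because updating denormalized data means writing several objects at once, which is where multi-object transactions help.)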

Different levels of isolation

Read committed

Two guarantees:

  1. When reading from the database, you will only see data that has been committed (no dirty reads), i.e., you can never see a half-finished transaction.
  2. When writing to the database, you will only overwrite data that has been committed (no dirty writes)
Dirty reads

If a transaction has written some data but has not yet committed or aborted, and another transaction can see that uncommitted data, that is a dirty read.

Dirty writes

Similarly, a dirty write happens when one transaction overwrites a value that another transaction has written but not yet committed.
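A toy in-memory model of how read committed is commonly enforced, following the book's description: row-level write locks prevent dirty writes, and while a write is pending, other transactions keep reading the old committed value. The ReadCommittedStore class is hypothetical, and instead of blocking, a conflicting write raises an error.

```python
class ReadCommittedStore:
    """Toy model of read committed (single process, no real concurrency)."""

    def __init__(self):
        self.committed = {}  # key -> last committed value
        self.pending = {}    # key -> (txid holding the write lock, uncommitted value)

    def read(self, txid, key):
        holder = self.pending.get(key)
        if holder is not None and holder[0] == txid:
            return holder[1]            # a transaction sees its own writes
        return self.committed.get(key)  # everyone else sees the committed value: no dirty reads

    def write(self, txid, key, value):
        holder = self.pending.get(key)
        if holder is not None and holder[0] != txid:
            # A real database would block here until the lock holder commits or aborts.
            raise RuntimeError(f"txn {txid} must wait: {key} is locked by txn {holder[0]}")
        self.pending[key] = (txid, value)  # take (or keep) the row-level write lock

    def commit(self, txid):
        for key, (owner, value) in list(self.pending.items()):
            if owner == txid:
                self.committed[key] = value
                del self.pending[key]      # release the lock

    def abort(self, txid):
        for key, (owner, _) in list(self.pending.items()):
            if owner == txid:
                del self.pending[key]      # discard the uncommitted value


store = ReadCommittedStore()
store.write(1, "x", 3)
print(store.read(2, "x"))  # None: T2 cannot see T1's uncommitted write
try:
    store.write(2, "x", 4)  # would be a dirty write
except RuntimeError as err:
    print(err)              # txn 2 must wait: x is locked by txn 1
store.commit(1)
print(store.read(2, "x"))  # 3
```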

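Preventing dirty writes does not prevent the race condition between two counter increments. Say two users increment the same counter concurrently, starting from x = 0: both read x = 0 and each computes x + 1 = 1. The first user writes x = 1 and commits; the second user then writes x = 1 and commits. Read committed only guarantees that the second transaction's write cannot overwrite the first transaction's uncommitted write (it has to wait until the first one commits); it does not stop the second transaction from having read the old value before that. The second write happens after the first transaction committed, so it is not a dirty write, yet x ends up as 1 when it should be 2. The isolation level is not violated, but the application's requirement still is.

In short: even though read committed prevents dirty writes, it does not prevent the race condition on incrementing a value (the two read-modify-write cycles can still interleave; only the overlapping write has to wait for the commit).

To rule out this race condition you need a stronger guarantee, such as serializability.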
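The counter race above, replayed step by step in Python with a plain dict standing in for the database (a deliberate simplification): every read sees only committed data and no write overwrites uncommitted data, so read committed is satisfied, yet one increment is lost.

```python
db = {"counter": 0}

# Both transactions read the committed value before either one writes.
t1_seen = db["counter"]      # T1 reads 0
t2_seen = db["counter"]      # T2 reads 0

db["counter"] = t1_seen + 1  # T1 writes 1 and commits
db["counter"] = t2_seen + 1  # T2 writes 1 after T1 committed: not a dirty write

lost = db["counter"]
print(lost)                  # 1: two increments ran, but one update was lost

# Running the two increments serially keeps both updates.
db = {"counter": 0}
for _ in range(2):
    db["counter"] = db["counter"] + 1
print(db["counter"])         # 2
```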

Read skew

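Read skew happens when you read two different objects while someone else is writing them: the reader reads the first object before the writer has updated it, and reads the second object only after the writer has finished updating both, so the reader sees an inconsistent result.
The figure below gives an example:
![[DDIA-figure-7-6.png]]
In this example the user would see the correct result simply by reading again, but some scenarios cannot tolerate this kind of read skew (nonrepeatable read):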

  1. Backups
    Taking a backup requires making a copy of the entire database, which may take hours on a large database. During the time that the backup process is running, writes will continue to be made to the database. Thus, you could end up with some parts of the backup containing an older version of the data, and other parts containing a newer version. If you need to restore from such a backup, the inconsistencies become permanent.

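    This example is really useful! Because a backup takes so long, if people keep writing to the DB in the meantime, some parts of the backup reflect values from before a write and other parts reflect values from after it. If the DB then has to be restored from that backup, the whole DB permanently has the problem from the example above (the balance is missing 100).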

  2. Analytic queries and integrity checks

    Sometimes, you may want to run a query that scans over large parts of the database. Such queries are common in analytics, or may be part of a periodic integrity check that everything is in order (monitoring for data corruption). These queries are likely to return nonsensical results if they observe parts of the database at different points in time.

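    Again, this is because read committed does not guarantee that all the reads in one transaction see the database at the same point in time, so any long-running read can run into read skew (nonrepeatable reads).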
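The account example from Figure 7-6 replayed in Python, with a dict standing in for the database and a dict copy standing in for a snapshot (both simplifications): under read committed the two reads straddle a committed transfer, while reading both values from one snapshot gives a consistent total.

```python
# Alice's two accounts, 500 each.
accounts = {"acct1": 500, "acct2": 500}


def transfer(db, src, dst, amount):
    """One committed transaction that moves money between the accounts."""
    db[src] -= amount
    db[dst] += amount


# Read committed: each read sees whatever is committed at that moment.
seen_acct1 = accounts["acct1"]              # read before the transfer: 500
transfer(accounts, "acct2", "acct1", 100)   # commits between Alice's two reads
seen_acct2 = accounts["acct2"]              # read after the transfer: 400
print(seen_acct1 + seen_acct2)              # 900: 100 seems to have vanished

# Snapshot isolation: both reads come from one point-in-time copy.
accounts = {"acct1": 500, "acct2": 500}
snapshot = dict(accounts)                   # taken when the read transaction starts
transfer(accounts, "acct2", "acct1", 100)
print(snapshot["acct1"] + snapshot["acct2"])  # 1000: consistent, just slightly old
```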

How do we fix this?

Snapshot isolation is the most common solution.

The 1995 paper on this is worth reading later when I have time.

The transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the old data from that particular point in time.

In other words, this guarantees you only read committed values (so how is that different from read committed?). Ah, the key part, which I highlighted, is "at the start of the transaction": each transaction takes a snapshot when it starts, and every later read in that transaction reads from that snapshot. Even if someone writes a new value in the meantime, you still read the old value.

So snapshot isolation is very friendly to long-running reads, because all of their reads see one consistent set of values. In the book's words:

Snapshot isolation is a boon for long-running, read-only queries such as backups and analytics.

So the root cause is that when a read operation takes a long time, we run into read skew (nonrepeatable reads).

Snapshot isolation implementation

PostgreSQL MVCC (snapshot isolation implementation)
There is a paper on this worth reading, but the book also outlines the idea.

When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible. By carefully defining visibility rules, the database can present a consistent snapshot of the database to the application. This works as follows:

  1. At the start of each transaction, the database makes a list of all the other transactions that are in progress (not yet committed or aborted) at that time. Any writes that those transactions have made are ignored, even if the transactions subsequently commit.
  2. Any writes made by aborted transactions are ignored.
  3. Any writes made by transactions with a later transaction ID (i.e., which started after the current transaction started) are ignored, regardless of whether those transactions have committed.
  4. All other writes are visible to the application’s queries.

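To summarize: at the start, collect all transactions that are in progress and ignore every write they make; ignore all transactions that started after the current one (determined by transaction ID); ignore all aborted transactions. All remaining writes are visible.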
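The visibility rules can be sketched as a Python function, assuming each row version carries created_by and deleted_by transaction IDs as in Figure 7-7 (RowVersion, write_visible, and is_visible are made-up names). The "a transaction sees its own writes" check is my addition; the book's list doesn't spell it out.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RowVersion:
    value: object
    created_by: int                   # txid that wrote this version
    deleted_by: Optional[int] = None  # txid that deleted it (an update = delete + create)


def write_visible(writer: int, reader: int, in_progress_at_start: Set[int], aborted: Set[int]) -> bool:
    """Apply the visibility rules to one write made by transaction `writer`."""
    if writer == reader:
        return True   # a transaction sees its own writes (my addition)
    if writer in in_progress_at_start:
        return False  # still in progress when the reader started
    if writer in aborted:
        return False  # aborted
    if writer > reader:
        return False  # started after the reader (later transaction ID)
    return True       # all other writes are visible


def is_visible(row: RowVersion, reader: int, in_progress_at_start: Set[int], aborted: Set[int]) -> bool:
    created = write_visible(row.created_by, reader, in_progress_at_start, aborted)
    deleted = row.deleted_by is not None and write_visible(
        row.deleted_by, reader, in_progress_at_start, aborted
    )
    return created and not deleted


# One account's balance after txn 13 updated it from 500 to 400.
versions = [RowVersion(500, created_by=3, deleted_by=13), RowVersion(400, created_by=13)]
print([v.value for v in versions if is_visible(v, 12, set(), set())])  # [500]: txn 13 is later
print([v.value for v in versions if is_visible(v, 14, set(), set())])  # [400]: txn 13 committed
```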

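The implementation idea here is worth borrowing, because it works by elimination: rule out every case that could cause a problem, and allow everything that remains. This is the same approach I took when implementing Go-Back-N in network hw4: first rule out the error cases, such as a corrupted packet or an ACK for an earlier packet (i.e., a packet was lost), and resend packets in those cases; once every error case has been ruled out, send the next packet.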

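This kind of elimination should come up often when implementing algorithms; it makes the logic much more rigorous, so it's worth learning and applying.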

This figure is quite good; I think it could be better, but it's good enough.

![[DDIA-Figure-7-7.png]]

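Snapshots save space here because new versions are created only for objects that are actually modified: a snapshot does not require copying the whole database, only keeping the old versions of the rows that changed.

An update is implemented as a delete followed by a create because if the row were updated in place, readers could get the wrong value (e.g., once a row carries updated by = 13) and the old value would no longer exist; so the old version is marked deleted and a new version is created.

Could we just mark the row updated by = 13 and create a new row below it? That would work, but when the transaction ends you would have an extra updated-by row, and the background deletion process would not know whether it can be removed, which wastes space.

So this implementation incurs only a small overhead.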
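A sketch of the delete-plus-create update and of the background cleanup, under one simplifying assumption: a version can be garbage-collected once the transaction that deleted it had already committed before the oldest still-running transaction started, because then no running or future snapshot can see it. RowVersion, update(), and garbage_collect() are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    value: object
    created_by: int
    deleted_by: Optional[int] = None


def update(versions: List[RowVersion], txid: int, new_value: object) -> None:
    """MVCC-style update: mark the live version as deleted, then append a new version."""
    for v in versions:
        if v.deleted_by is None:
            v.deleted_by = txid  # the old version stays around for older snapshots
    versions.append(RowVersion(new_value, created_by=txid))


def garbage_collect(versions: List[RowVersion], committed_before_oldest_active: Set[int]) -> List[RowVersion]:
    """Keep a version unless its deleting transaction had already committed before the
    oldest still-running transaction started; then no snapshot can see it any more."""
    return [v for v in versions if v.deleted_by not in committed_before_oldest_active]


row = [RowVersion(500, created_by=3)]
update(row, 13, 400)
print([(v.value, v.created_by, v.deleted_by) for v in row])  # [(500, 3, 13), (400, 13, None)]
print(len(garbage_collect(row, set())))                      # 2: an older snapshot may still need 500
print([v.value for v in garbage_collect(row, {3, 13})])      # [400]
```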

Index and snapshot isolation

How different DBs handle indexes under snapshot isolation:

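What does copy-on-write mean? It means not updating in place: instead of overwriting pages with the new values as a regular B-tree does, an append-only/copy-on-write B-tree does not overwrite pages of the tree when they are updated, but instead creates a new copy of each modified page. Parent pages, up to the root of the tree, are copied and updated to point to the new versions of their child pages.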

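CouchDB, Datomic, and LMDB use this approach: they still use B-trees, but every update copies the modified page and the chain of parent pages above it up to the root, then writes the new versions.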

PostgreSQL, on the other hand, avoids index updates if different versions of the same object can fit on the same page.
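Copy-on-write (path copying) is easiest to see on a plain binary search tree rather than a real B-tree, so this Python sketch makes that simplification: insert() never modifies a node; it copies the nodes on the path from the root down and shares every untouched subtree. In the book's description, every write creates a new root, and a particular root is a consistent snapshot of the data as of when it was created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int, value: str) -> Node:
    """Return a NEW root; only the nodes on the path to `key` are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # same key: copy with the new value


def lookup(node: Optional[Node], key: int) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


v1 = None
for k, val in [(5, "a"), (2, "b"), (8, "c")]:
    v1 = insert(v1, k, val)
v2 = insert(v1, 8, "C")  # a write creates a new root; v1 is never modified

print(lookup(v1, 8), lookup(v2, 8))  # c C: each root is its own consistent snapshot
print(v1.left is v2.left)            # True: the untouched subtree is shared, not copied
```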

Compare and set

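Compared with taking a lock, compare-and-set checks at write time whether the current value is still the value you read earlier; if it has changed, the update has no effect and the read-modify-write cycle must be retried.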
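The book's wiki-page example of this, sketched with Python's sqlite3 (the table and function names are assumptions): the UPDATE's WHERE clause carries the value we read, and rowcount tells us whether it still matched. The book's caveat applies: if the database lets the WHERE clause read from an old snapshot, this may not prevent lost updates, so check your database's behavior before relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_pages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO wiki_pages VALUES (1, 'old content')")
conn.commit()


def compare_and_set(conn, page_id, expected, new):
    """Apply the write only if the content is still what we read earlier."""
    with conn:
        cur = conn.execute(
            "UPDATE wiki_pages SET content = ? WHERE id = ? AND content = ?",
            (new, page_id, expected),
        )
    return cur.rowcount == 1  # 0 rows changed means someone else wrote first


def edit_with_retry(conn, page_id, transform, max_attempts=5):
    """Read-modify-write cycle that retries whenever the compare-and-set fails."""
    for _ in range(max_attempts):
        (current,) = conn.execute(
            "SELECT content FROM wiki_pages WHERE id = ?", (page_id,)
        ).fetchone()
        if compare_and_set(conn, page_id, current, transform(current)):
            return True
    return False


seen = "old content"                                   # client A read this earlier
compare_and_set(conn, 1, seen, "edited by B")          # client B wins the race
print(compare_and_set(conn, 1, seen, "edited by A"))   # False: A's premise is stale
print(edit_with_retry(conn, 1, lambda s: s + " + A"))  # True: re-read, then write
```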

Conflict resolution with replication

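With replication things get more complicated: locks and compare-and-set both assume there is a single up-to-date copy of the data. The usual approach then is to let concurrent writes create several conflicting versions and to use application code or special data structures to resolve and merge these versions after the fact.

Multi-leader and leaderless DBs usually require the user or the application to resolve conflicts.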

Write skew

Write skew is a generalization of the lost update problem.
![[DDIA-figure-7-8.png]]

This figure shows an example of write skew.

This effect, where a write in one transaction changes the result of a search query in another transaction, is called a phantom

That is, if a write in one transaction changes the result of another transaction's search condition, that is called a phantom.

Snapshot isolation avoids phantoms in read-only queries, but not in read-write transactions such as the doctor on-call example.
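The doctor on-call example as a Python sketch, where each transaction's snapshot is modeled as a plain dict copy (a simplification). The two transactions write different rows, so there is neither a dirty write nor a lost update, yet the invariant "at least one doctor on call" ends up broken.

```python
# Alice and Bob are both on call; the invariant is "at least one doctor on call".
on_call = {"alice": True, "bob": True}


def request_leave(snapshot, doctor):
    """Decide, from this transaction's snapshot, what to write."""
    if sum(snapshot.values()) >= 2:  # premise: someone else will still be on call
        return {doctor: False}
    return {}


snap_alice = dict(on_call)  # Alice's transaction starts and takes its snapshot
snap_bob = dict(on_call)    # Bob's transaction starts concurrently

# They update different rows, so there is no dirty write and no lost update...
on_call.update(request_leave(snap_alice, "alice"))
on_call.update(request_leave(snap_bob, "bob"))

print(on_call)  # {'alice': False, 'bob': False}: nobody is on call any more
```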

Two-phase locking

Two-phase locking is similar to the locks used to prevent dirty writes, but with much stronger locking requirements.

[[2021-12-27]] update

  • After a transaction has acquired the lock, it must continue to hold the lock until the end of the transaction (commit or abort). This is where the name “two-phase” comes from: the first phase (while the transaction is executing) is when the locks are acquired, and the second phase (at the end of the transaction) is when all the locks are released.

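So "two-phase" is about acquiring and then releasing locks: while the transaction runs it acquires locks (and a shared read lock may be upgraded to an exclusive write lock), and the locks on all objects are released together only when the transaction commits or aborts.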

Several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write (modify or delete) an object, exclusive access is required:

  • If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue. (This ensures that B can’t change the object unexpectedly behind A’s back.)
  • If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue. (Reading an old version of the object, like in Figure 7-1, is not acceptable under 2PL.)

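It's much like an object lock in Java: the object is locked, and as soon as anyone writes, it is locked exclusively. Because a write in one transaction could change the result of another transaction's reads, a writer has to wait until earlier readers have committed before it can proceed, and every later transaction has to wait too, because 2PL guarantees that the writing transaction finishes before anyone else can read or write that object.

This chapter really does stress the importance of transactions: everything is based on transactions rather than individual reads or writes, and one transaction can contain many reads and writes. So 2PL effectively turns each transaction that writes into an atomic unit that no one else can touch until it commits, and that itself waits for others to commit before touching their data.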

In 2PL, writers don’t just block other writers; they also block readers and vice versa. Snapshot isolation has the mantra readers never block writers, and writers never block readers, which captures this key difference between snapshot isolation and two-phase locking.

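Compared with snapshot isolation, under 2PL a transaction can be blocked by earlier transactions that have not committed yet, and it in turn blocks every later transaction that touches the same objects (whenever a write is involved).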
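A toy strict-2PL lock table in Python (LockManager is a made-up name): shared locks for reads, exclusive locks for writes, upgrades allowed only when no other reader holds the object, and everything released together at commit or abort. To keep it testable, the acquire methods return False instead of blocking; a real implementation would wait and would also need deadlock detection, which is left out here.

```python
from collections import defaultdict


class LockManager:
    """Toy strict-2PL lock table. acquire_* returns False where a real DB would block."""

    def __init__(self):
        self.shared = defaultdict(set)  # key -> txids holding a shared lock
        self.exclusive = {}             # key -> txid holding the exclusive lock

    def acquire_shared(self, txid, key):
        writer = self.exclusive.get(key)
        if writer is not None and writer != txid:
            return False                # someone is writing: readers wait
        self.shared[key].add(txid)
        return True

    def acquire_exclusive(self, txid, key):
        writer = self.exclusive.get(key)
        if writer is not None and writer != txid:
            return False                # another writer holds it
        if self.shared[key] - {txid}:
            return False                # other readers hold it: the writer waits
        self.exclusive[key] = txid      # fresh lock, or upgrade of our own shared lock
        return True

    def release_all(self, txid):
        """Second phase: called only at commit or abort, releasing every lock at once."""
        for holders in self.shared.values():
            holders.discard(txid)
        for key in [k for k, owner in self.exclusive.items() if owner == txid]:
            del self.exclusive[key]


locks = LockManager()
print(locks.acquire_shared(1, "x"))     # True: T1 reads x
print(locks.acquire_shared(2, "x"))     # True: readers can share
print(locks.acquire_exclusive(2, "x"))  # False: T2 can't upgrade while T1 holds its read lock
locks.release_all(1)                    # T1 commits
print(locks.acquire_exclusive(2, "x"))  # True: the upgrade succeeds
print(locks.acquire_shared(3, "x"))     # False: readers now wait for the writer T2
```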

Predicate locks

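A predicate lock is based on the query's condition: if a query searches for a time range, the predicate lock covers all objects that match that predicate, and the query only proceeds if none of them is currently locked by another transaction.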

It works similarly to the shared/exclusive lock described earlier, but rather than belonging to a particular object (e.g., one row in a table), it belongs to all objects that match some search condition

The key idea here is that a predicate lock applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms). If two-phase locking includes predicate locks, the database prevents all forms of write skew and other race conditions, and so its isolation becomes serializable.
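A minimal Python sketch of the key idea, using the meeting-room booking example with a hypothetical row shape and made-up names: a lock stores the search condition itself, so a write is checked against the predicate even for a row that did not exist when the lock was taken, which is how phantoms get caught.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Booking = Dict[str, int]  # hypothetical row shape: {"room_id": ..., "start": ..., "end": ...}


@dataclass
class PredicateLock:
    owner: int
    predicate: Callable[[Booking], bool]  # the search condition, not a specific row


def overlaps(room_id: int, start: int, end: int) -> Callable[[Booking], bool]:
    """Search condition for 'bookings of this room that overlap [start, end)'."""
    return lambda b: b["room_id"] == room_id and b["start"] < end and b["end"] > start


def may_insert(locks: List[PredicateLock], txid: int, row: Booking) -> bool:
    """Check a new row against every predicate lock held by other transactions."""
    return all(lock.owner == txid or not lock.predicate(row) for lock in locks)


# T1 searched for bookings of room 123 between 12 and 13 (hours).
locks = [PredicateLock(owner=1, predicate=overlaps(123, 12, 13))]
print(may_insert(locks, 2, {"room_id": 123, "start": 12, "end": 13}))  # False: must wait
print(may_insert(locks, 2, {"room_id": 123, "start": 14, "end": 15}))  # True
print(may_insert(locks, 2, {"room_id": 456, "start": 12, "end": 13}))  # True
```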

Index range locks

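This kind of lock simply locks a larger range of objects. For example, if you want to book meeting room 123 between 12pm and 1pm, an index-range lock might lock all bookings for room 123, or all rooms booked between 12pm and 1pm, so the database does not have to find and lock each matching object individually.

In short, it uses a coarser lock to save the cost of checking precise conditions.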
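Continuing the meeting-room example, a Python sketch of an index-range lock approximated on a hypothetical room_id index (all names are made up): checking a write is a single dictionary lookup, but it also blocks a booking that does not actually overlap, which is the precision-for-speed trade-off described above.

```python
from typing import Dict, Set

# Shared locks attached to the room_id index entry instead of to a precise predicate.
room_locks: Dict[int, Set[int]] = {}  # room_id -> txids holding a lock on that index entry


def lock_room(txid: int, room_id: int) -> None:
    room_locks.setdefault(room_id, set()).add(txid)


def may_insert(txid: int, booking: Dict[str, int]) -> bool:
    """One lookup instead of evaluating each predicate; may block more than needed."""
    return not (room_locks.get(booking["room_id"], set()) - {txid})


lock_room(1, 123)  # T1's search for room 123 at 12-13 is approximated as "all of room 123"
print(may_insert(2, {"room_id": 123, "start": 12, "end": 13}))  # False: a real conflict
print(may_insert(2, {"room_id": 123, "start": 14, "end": 15}))  # False: no overlap, blocked anyway
print(may_insert(2, {"room_id": 456, "start": 12, "end": 13}))  # True
```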

Serializable Snapshot Isolation (SSI)

Are serializable isolation and good performance fundamentally at odds? Perhaps not: an algorithm called serializable snapshot isolation (SSI) is very promising. It provides full serializability, but has only a small performance penalty compared to snapshot isolation. SSI is fairly new: it was first described in 2008 [40] and is the subject of Michael Cahill's PhD thesis.

Two-phase locking is a so-called pessimistic concurrency control mechanism: it is based on the principle that if anything might possibly go wrong (as indicated by a lock held by another transaction), it’s better to wait until the situation is safe again before doing anything. It is like mutual exclusion, which is used to protect data structures in multi-threaded programming.

2PL is just like a mutex in ordinary concurrent programming, only at the DB level.

By contrast, serializable snapshot isolation is an optimistic concurrency control technique.

SSI builds on snapshot isolation and adds a conflict-detection step on top; in the book's words:

On top of snapshot isolation, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transactions to abort.

Decisions based on an outdated premise

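I noticed this while reading about snapshot isolation too: write skew and phantoms happen because you first read a value and then decide, based on what you read, whether to write. But snapshot isolation does not necessarily give you the latest value: another transaction may already have updated it, but since you read from a snapshot, you can't see the update.

Of course, not seeing the latest value is exactly what snapshot isolation is designed to do, in order to avoid read skew.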

Put another way, the transaction is taking an action based on a premise (a fact that was true at the beginning of the transaction, e.g., “There are currently two doctors on call”). Later, when the transaction wants to commit, the original data may have changed—the premise may no longer be true.

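The book puts this very well! By the time you commit, your premise may well have changed, and that is exactly the problem.

But I keep wondering why they put all this logic into the DB. Couldn't the application check it? Wouldn't that save the DB a lot of work? Oh, right: the DB is usually provided by a vendor who doesn't know what your application needs to do, so they can only offer a general-purpose solution.

I think the DB wants to be as generic as possible.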

How does the database know if a query result might have changed? There are two cases to consider:
• Detecting reads of a stale MVCC object version (uncommitted write occurred before the read)
• Detecting writes that affect prior reads (the write occurs after the read)

Compared to two-phase locking, the big advantage of serializable snapshot isolation is that one transaction doesn’t need to block waiting for locks held by another transaction. Like under snapshot isolation, writers don’t block readers, and vice versa.
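A simplified Python sketch in the spirit of SSI, not the real algorithm: each transaction reads from its own snapshot and records which keys it read, and at commit time it is aborted if a transaction that committed after its snapshot was taken wrote any of those keys (its premise is outdated), so the first committer wins. Real SSI tracks read/write dependencies more precisely and therefore aborts less often; SimpleSSI and its methods are made up for illustration.

```python
class SimpleSSI:
    """Optimistic, snapshot-based validation: no locks, conflicts checked at commit."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_log = []  # (commit sequence number, keys written by that commit)
        self.seq = 0

    def begin(self):
        return {"snapshot": dict(self.data), "start": self.seq, "reads": set(), "writes": {}}

    def read(self, txn, key):
        txn["reads"].add(key)  # remember the premise
        return txn["writes"].get(key, txn["snapshot"][key])

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit; readers are never blocked

    def commit(self, txn):
        for seq, keys in self.commit_log:
            if seq > txn["start"] and keys & txn["reads"]:
                return False  # abort: something we read was overwritten after our snapshot
        self.seq += 1
        self.data.update(txn["writes"])
        self.commit_log.append((self.seq, set(txn["writes"])))
        return True


db = SimpleSSI({"alice": True, "bob": True})
ta, tb = db.begin(), db.begin()
for txn, me in [(ta, "alice"), (tb, "bob")]:
    if db.read(txn, "alice") + db.read(txn, "bob") >= 2:  # premise: two doctors on call
        db.write(txn, me, False)
print(db.commit(ta))  # True: the first committer wins
print(db.commit(tb))  # False: Bob's premise is stale, so his transaction is aborted
print(db.data)        # {'alice': False, 'bob': True}
```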

Summary

Transactions are an abstraction layer that allows an application to pretend that certain concurrency problems and certain kinds of hardware and software faults don't exist.

Whenever we add some capability, we wrap another abstraction layer around the outside and solve the more complex problems inside it.

dirty read

Seeing the results of another transaction that is only half finished is a dirty read.

One client reads another client’s writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.

dirty write

Overwriting the writes of another transaction that is only half finished is a dirty write.

One client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.

read skew

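The values you read are changed partway through by another transaction that commits in between, producing the situation where the bank balance appears to be missing 100.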

A client sees different parts of the database at different points in time. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time. It is usually implemented with multi-version concurrency control (MVCC).

lost update

The classic counter problem.

Two clients concurrently perform a read-modify-write cycle. One overwrites the other’s write without incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock (SELECT FOR UPDATE).

Write skew

You write based on a condition, but by then the condition has been changed, and things go wrong.

A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.

Phantom reads

Similar to the above: you read based on a condition, but by commit time another transaction has changed which objects match it.

A transaction reads objects that match some search condition. Another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.

Three implementations of serializable transactions

Literally executing transactions in serial order

Execute transactions one at a time on a single thread.

Two-phase locking

Equivalent to a mutex in ordinary concurrency (every read first acquires a shared lock, and a write upgrades it to an exclusive lock).

Serializable snapshot isolation

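An enhanced snapshot isolation: it adds an algorithm that, when the current transaction writes, checks whether that write invalidates the premise of any other transaction; if it does, the transaction that commits later is aborted (e.g., in the example of doctors asking to go off call, the transaction of the doctor who commits second is aborted).

Like snapshot isolation, writers don't block readers and readers don't block writers.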
