Dig into SQL Transactions


  • In any database, poor management of data operations leads to conflicts and performance problems in a multi-user system; as the number of users manipulating data increases, managing data manipulation effectively becomes extremely important.
  • In this article, I will introduce the mechanisms SQL Server uses to ensure data integrity for each transaction, and how to manage transactions effectively.

Transaction Basics

  • A transaction is a set of statements executed against the database. The statements are executed sequentially; if any statement fails, the transaction stops and rolls back all of the statements executed so far, returning the database to its state at the start of the transaction. A minimal example follows the list of properties below.
  • A transaction has four properties (ACID):
    • Atomicity : A transaction must act as a single unit of work; either all of its data changes are made, or none of them are.
    • Consistency : When completed, a transaction must leave all data in a consistent state. In a relational database, all integrity rules must be applied to the transaction's changes to keep the data intact. All internal data structures, such as indexes, must be correct at the end of the transaction.
    • Isolation : Modifications made by concurrent transactions must be isolated from one another. A transaction sees data only in the state it was in before or after another transaction changed it, never in an intermediate state.
    • Durability : After a transaction completes successfully, its changes become permanent; they persist and will not be rolled back.
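To make the definition above concrete, here is a minimal T-SQL sketch of an atomic transaction with error handling. The table dbo.Accounts and its columns are hypothetical examples, not from the original article:

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Transfer 100 from account 1 to account 2: both updates
    -- must succeed together or not at all (atomicity).
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;

    COMMIT TRANSACTION;  -- make the changes permanent (durability)
END TRY
BEGIN CATCH
    -- Any failure undoes every statement executed so far,
    -- returning the database to its state at the start.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error
END CATCH;
```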

Locking and Row Versioning

  • Locking and row versioning are the mechanisms SQL Server uses to ensure the integrity of transactions and maintain database consistency when multiple users manipulate data at the same time.
    • Locking : Each transaction requests locks of different types on resources, such as row locks, page locks, or table locks, depending on what the transaction needs. A lock prevents other transactions from changing the locked data. A transaction releases its locks when it no longer depends on the locked resources; at that point, other transactions can access those resources.
    • Row versioning : When a resource is locked, other transactions cannot manipulate it and must wait until the transaction holding the resource releases the lock. This can make transactions wait too long, significantly reducing system performance. Row versioning stores versions of locked resources, so other transactions that only need to read these resources are served the appropriate version without waiting for the resource to be released. This greatly reduces the likelihood of transactions waiting for one another, as shown in the sketch below.
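As a hedged sketch, row versioning is enabled in SQL Server through two database options; the database name MyDb is a hypothetical example:

```sql
-- READ_COMMITTED_SNAPSHOT makes READ COMMITTED reads use row versions
-- instead of shared locks; ALLOW_SNAPSHOT_ISOLATION enables the
-- SNAPSHOT isolation level discussed later in the article.
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
```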

Types of Locks

  • There are three main types of locks: shared locks, exclusive locks, and update locks.
    • Shared Lock : also known as a read lock, this is the lock a transaction acquires when it wants to read data. Transactions holding a shared lock may read the data but not write it. Multiple transactions can hold shared locks on the same unit of data at the same time.
    • Exclusive Lock : also known as a write lock, this is the lock a transaction acquires when it wants to read and write data. At most one transaction can hold an exclusive lock on a unit of data at a time, and a shared lock cannot be placed on data that already carries an exclusive lock.
    • Update Lock : a lock that declares the intent to write. An update lock is used when reading data with the intention of writing it back, and it is an intermediate mode between a shared lock and an exclusive lock. When the transaction actually performs the write, the update lock is automatically converted to an exclusive lock. A transaction holding an update lock may read and write the data. At most one transaction at a time can hold an update lock on a unit of data, but shared locks can still be placed on data that carries an update lock. The sketch after this list shows how these locks can be requested explicitly.
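As an illustration, T-SQL lets a query request specific lock types through table hints. This is a hedged sketch; dbo.Accounts is a hypothetical table:

```sql
BEGIN TRANSACTION;

-- HOLDLOCK asks for a shared lock held until the transaction ends.
SELECT Balance FROM dbo.Accounts WITH (HOLDLOCK) WHERE AccountId = 1;

-- UPDLOCK asks for an update lock: read now, with the intent to write.
SELECT Balance FROM dbo.Accounts WITH (UPDLOCK) WHERE AccountId = 2;

-- The UPDATE converts the update lock into an exclusive lock.
UPDATE dbo.Accounts SET Balance = Balance + 50 WHERE AccountId = 2;

COMMIT TRANSACTION;
```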

Managing concurrent data access

Effects that can be caused by concurrent access

  • Users manipulating data can affect other users manipulating the same data at the same time. If the system does not control this well, users may experience the following effects:
    • Lost update : This occurs when two or more transactions update the same row starting from its original value; the last update overwrites the updates made by the other transactions, so their changes are lost.
    • Uncommitted dependency (dirty read) : This occurs when transaction A reads a row while it is being updated by transaction B, which has not yet committed, so transaction A reads uncommitted data. For example, person A has 3 million in an account and runs a transaction to deposit 2 million. The transaction has updated the balance to 5 million, but some of its operations have not run yet and the data has not been committed. Person B, who shares the account, checks the balance and sees 5 million. Person A's transaction then fails with an error and the data is rolled back to 3 million, so person B has read incorrect data, as the two-session sketch after this list illustrates.
    • Inconsistent analysis (nonrepeatable read) : This occurs when transaction A reads the same unit of data several times and gets different results, because between those reads the data was changed and committed by another transaction.
    • Phantom reads : This happens when two identical queries return different sets of rows. For example, two transactions execute at the same time: the two SELECT statements in the first transaction may return different results because an INSERT statement in the second transaction changes the data read by both.
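A hedged two-session sketch of the dirty read described above; the table dbo.Accounts and its values are hypothetical:

```sql
-- Session 1: update the balance but do not commit yet.
BEGIN TRANSACTION;
UPDATE dbo.Accounts SET Balance = 5000000 WHERE AccountId = 1;
-- ... more operations still running, nothing committed ...

-- Session 2: under READ UNCOMMITTED, this sees the uncommitted 5,000,000.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;

-- Session 1: an error occurs and the update is undone.
ROLLBACK TRANSACTION;  -- Session 2 has read a value that never existed.
```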


Types of concurrency control

  • There are two types of concurrency control:
    • Pessimistic concurrency control : With this setting, when a transaction locks a resource to operate on it, other transactions cannot access the resource and must wait until the lock is released. It is called pessimistic control because it is mainly used in environments with high data contention, where protecting data with locks is cheaper than rolling back transactions whenever a conflict occurs.
    • Optimistic concurrency control : With this setting, transactions do not lock data when reading it. When a transaction updates data, the system checks whether another transaction has changed the data in the meantime; if so, an error is raised and the transaction is rolled back. This setting is mainly used in environments with low data contention. A common way to implement it is sketched after this list.
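A hedged sketch of one common optimistic pattern in SQL Server, using a rowversion column; the table dbo.Products, its columns, and the values are hypothetical:

```sql
DECLARE @PriceRead money, @RowVerRead binary(8);

-- 1. Read the row without holding a long-lived lock, remembering its version.
SELECT @PriceRead = Price, @RowVerRead = RowVer
FROM dbo.Products
WHERE ProductId = 7;

-- 2. Write back only if no other transaction changed the row in between.
UPDATE dbo.Products
SET Price = @PriceRead * 1.1
WHERE ProductId = 7 AND RowVer = @RowVerRead;

-- 3. If the version no longer matches, the update affected no rows: conflict.
IF @@ROWCOUNT = 0
    RAISERROR('The row was changed by another transaction.', 16, 1);
```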

Isolation Levels

  • An isolation level specifies the degree to which a transaction must be isolated from data modifications made by other transactions. Each transaction is assigned one isolation level.
  • The isolation level controls:
    • Whether locks are taken when data is read, and what type of lock is requested.
    • How long read locks are held.
    • How a read operation behaves when it references rows being modified by another transaction: whether it blocks, reads the committed version, or reads the uncommitted version.
  • A low isolation level increases concurrency and thus performance, but it also increases the risk of the adverse effects discussed in the previous section.
  • Depending on the circumstances, environment, and application requirements, we choose the most appropriate isolation level.
  • SQL Server provides the following isolation levels, in ascending order of data isolation: Read Uncommitted, Read Committed, Repeatable Read, Serializable, Snapshot. Each is selected as sketched below.
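As a hedged sketch, a session chooses its isolation level in T-SQL with SET TRANSACTION ISOLATION LEVEL; dbo.Accounts is a hypothetical table:

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- applies to this session

BEGIN TRANSACTION;
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;
-- ... further statements in this session run at REPEATABLE READ ...
COMMIT TRANSACTION;
```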

Read uncommitted

  • This is the lowest isolation level. When a transaction runs at this level, its queries can still read records that are being updated by another transaction, receiving the data as it is at that moment even though it has not been committed (uncommitted data). This can result in dirty reads.

Read committed

  • This is the default isolation level; if you do not set anything, transactions operate at this level. A transaction cannot read data that is being updated by another transaction; it has to wait until the update is done. This avoids the dirty read described above, but nonrepeatable reads and phantom reads may still occur.

Repeatable read

  • This isolation level works like read committed, but goes one notch further by preventing other transactions from modifying data that the transaction has read until the transaction completes.
  • This isolation level ensures that repeated reads within the same transaction produce the same results; in other words, the data being read is protected from updates by other transactions. However, it does not protect the data from inserts or deletes: if you replace the UPDATE statement in the second window with an INSERT statement, the two SELECT statements in the first window will produce different results. So it still cannot avoid phantom reads, as the sketch below shows.
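A hedged two-window sketch of the phantom read just described, using a hypothetical dbo.Orders table:

```sql
-- Window 1: start a repeatable-read transaction and count matching rows.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM dbo.Orders WHERE Amount > 100;  -- suppose it returns 5

-- Window 2: an INSERT is not blocked, because REPEATABLE READ locks only
-- the rows that were read, not the range they belong to.
INSERT INTO dbo.Orders (Amount) VALUES (250);

-- Window 1: the same query now returns 6; a phantom row has appeared.
SELECT COUNT(*) FROM dbo.Orders WHERE Amount > 100;
COMMIT TRANSACTION;
```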

Serializable

  • This is the highest isolation level; transactions are completely isolated from each other. SQL Server holds read and write locks on the data, including locks on the ranges of keys the queries touch, until the transaction ends. Therefore, phantom reads no longer occur at this level.

Snapshot

  • This level is only available when row versioning is enabled.
  • This level also provides guarantees close to Serializable isolation, but it differs slightly in its method of operation. When a transaction selects records, it does not lock them; instead it takes a snapshot and selects from that, so inserts and updates on those records by other transactions do not affect the original transaction. Its effect is to reduce blocking between transactions while still ensuring data integrity. However, there is a cost: additional storage is needed to keep copies of the records, and since versions are kept for each transaction, this overhead can grow considerably. A short sketch follows.
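A hedged sketch of running a transaction at the Snapshot level; dbo.Accounts is a hypothetical table, and ALLOW_SNAPSHOT_ISOLATION must already be on (see the row versioning sketch earlier):

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;
-- Reads come from a consistent snapshot taken for this transaction;
-- concurrent writers are neither blocked nor visible.
SELECT Balance FROM dbo.Accounts WHERE AccountId = 1;
COMMIT TRANSACTION;
```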

Concurrency effects possible at each isolation level

| Isolation level  | Dirty read | Nonrepeatable read | Phantom read |
|------------------|------------|--------------------|--------------|
| Read uncommitted | yes        | yes                | yes          |
| Read committed   | no         | yes                | yes          |
| Repeatable read  | no         | no                 | yes          |
| Serializable     | no         | no                 | no           |
| Snapshot         | no         | no                 | no           |
