You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

2764 lines
70 KiB

9 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
11 years ago
12 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14477 InnoDB update_time is wrongly updated after partial rollback or internal COMMIT The non-persistent UPDATE_TIME for InnoDB tables was not being updated consistently at transaction commit. If a transaction is partly rolled back so that in the end it will not modify a table that it intended to modify, the update_time will be updated nevertheless. This will also happen when InnoDB fails to write an undo log record for the intended modification. If a transaction is committed internally in InnoDB, instead of being committed from the SQL interface, then the trx_t::mod_tables will not be applied to the update_time of the tables. trx_t::mod_tables: Replace the std::set<dict_table_t*> with std::map<dict_table_t*,undo_no_t>, so that the very first modification within the transaction is identified. trx_undo_report_row_operation(): Update mod_tables for every operation after the undo log record was successfully written. trx_rollback_to_savepoint_low(): After partial rollback, erase from trx_t::mod_tables any tables for which all changes were rolled back. trx_commit_in_memory(): Tighten some assertions and simplify conditions. Invoke trx_update_mod_tables_timestamp() if persistent tables were affected. trx_commit_for_mysql(): Remove the call to trx_update_mod_tables_timestamp(), as it is now invoked at the lower level, in trx_commit_in_memory(). trx_rollback_finish(): Clear mod_tables before invoking trx_commit(), because the trx_commit_in_memory() would otherwise wrongly process mod_tables after a full ROLLBACK.
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-14477 InnoDB update_time is wrongly updated after partial rollback or internal COMMIT The non-persistent UPDATE_TIME for InnoDB tables was not being updated consistently at transaction commit. If a transaction is partly rolled back so that in the end it will not modify a table that it intended to modify, the update_time will be updated nevertheless. This will also happen when InnoDB fails to write an undo log record for the intended modification. If a transaction is committed internally in InnoDB, instead of being committed from the SQL interface, then the trx_t::mod_tables will not be applied to the update_time of the tables. trx_t::mod_tables: Replace the std::set<dict_table_t*> with std::map<dict_table_t*,undo_no_t>, so that the very first modification within the transaction is identified. trx_undo_report_row_operation(): Update mod_tables for every operation after the undo log record was successfully written. trx_rollback_to_savepoint_low(): After partial rollback, erase from trx_t::mod_tables any tables for which all changes were rolled back. trx_commit_in_memory(): Tighten some assertions and simplify conditions. Invoke trx_update_mod_tables_timestamp() if persistent tables were affected. trx_commit_for_mysql(): Remove the call to trx_update_mod_tables_timestamp(), as it is now invoked at the lower level, in trx_commit_in_memory(). trx_rollback_finish(): Clear mod_tables before invoking trx_commit(), because the trx_commit_in_memory() would otherwise wrongly process mod_tables after a full ROLLBACK.
8 years ago
MDEV-14477 InnoDB update_time is wrongly updated after partial rollback or internal COMMIT The non-persistent UPDATE_TIME for InnoDB tables was not being updated consistently at transaction commit. If a transaction is partly rolled back so that in the end it will not modify a table that it intended to modify, the update_time will be updated nevertheless. This will also happen when InnoDB fails to write an undo log record for the intended modification. If a transaction is committed internally in InnoDB, instead of being committed from the SQL interface, then the trx_t::mod_tables will not be applied to the update_time of the tables. trx_t::mod_tables: Replace the std::set<dict_table_t*> with std::map<dict_table_t*,undo_no_t>, so that the very first modification within the transaction is identified. trx_undo_report_row_operation(): Update mod_tables for every operation after the undo log record was successfully written. trx_rollback_to_savepoint_low(): After partial rollback, erase from trx_t::mod_tables any tables for which all changes were rolled back. trx_commit_in_memory(): Tighten some assertions and simplify conditions. Invoke trx_update_mod_tables_timestamp() if persistent tables were affected. trx_commit_for_mysql(): Remove the call to trx_update_mod_tables_timestamp(), as it is now invoked at the lower level, in trx_commit_in_memory(). trx_rollback_finish(): Clear mod_tables before invoking trx_commit(), because the trx_commit_in_memory() would otherwise wrongly process mod_tables after a full ROLLBACK.
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
11 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
12 years ago
12 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
8 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-12219 Discard temporary undo logs at transaction commit Starting with MySQL 5.7, temporary tables in InnoDB are handled differently from persistent tables. Because temporary tables are private to a connection, concurrency control and multi-versioning (MVCC) are not applicable. For performance reasons, purge is disabled as well. Rollback is supported for temporary tables; that is why we have the temporary undo logs in the first place. Because MVCC and purge are disabled for temporary tables, we should discard all temporary undo logs already at transaction commit, just like we discard the persistent insert_undo logs. Before this change, update_undo logs were being preserved. trx_temp_undo_t: A wrapper for temporary undo logs, comprising a rollback segment and a single temporary undo log. trx_rsegs_t::m_noredo: Use trx_temp_undo_t. (Instead of insert_undo, update_undo, there will be a single undo.) trx_is_noredo_rseg_updated(), trx_is_rseg_assigned(): Remove. trx_undo_add_page(): Remove the parameter undo_ptr. Acquire and release the rollback segment mutex inside the function. trx_undo_free_last_page(): Remove the parameter trx. trx_undo_truncate_end(): Remove the parameter trx, and add the parameter is_temp. Clean up the code a bit. trx_undo_assign_undo(): Split the parameter undo_ptr into rseg, undo. trx_undo_commit_cleanup(): Renamed from trx_undo_insert_cleanup(). Replace the parameter undo_ptr with undo. This will discard the temporary undo or insert_undo log at commit/rollback. trx_purge_add_update_undo_to_history(), trx_undo_update_cleanup(): Remove 3 parameters. Always operate on the persistent update_undo. trx_serialise(): Renamed from trx_serialisation_number_get(). trx_write_serialisation_history(): Simplify the code flow. If there are no persistent changes, do not update MONITOR_TRX_COMMIT_UNDO. trx_commit_in_memory(): Simplify the logic, and add assertions. trx_undo_page_report_modify(): Keep a direct reference to the persistent update_undo log. trx_undo_report_row_operation(): Simplify some code. Always assign TRX_UNDO_INSERT for temporary undo logs. trx_prepare_low(): Keep only one parameter. Prepare all 3 undo logs. trx_roll_try_truncate(): Remove the parameter undo_ptr. Try to truncate all 3 undo logs of the transaction. trx_roll_pop_top_rec_of_trx_low(): Remove. trx_roll_pop_top_rec_of_trx(): Remove the redundant parameter trx->roll_limit. Clear roll_limit when exhausting the undo logs. Consider all 3 undo logs at once, prioritizing the persistent undo logs. row_undo(): Minor cleanup. Let trx_roll_pop_top_rec_of_trx() reset the trx->roll_limit.
9 years ago
MDEV-14638 - Replace trx_sys_t::rw_trx_set with LF_HASH trx_sys_t::rw_trx_set is implemented as std::set, which does a few quite expensive operations under trx_sys_t::mutex protection: e.g. malloc/free when adding/removing elements. Traversing b-tree is not that cheap either. This has negative scalability impact, which is especially visible when running oltp_update_index.lua benchmark on a ramdisk. To reduce trx_sys_t::mutex contention std::set is replaced with LF_HASH. None of LF_HASH operations require trx_sys_t::mutex (nor any other global mutex) protection. Another interesting issue observed with std::set is reproducible ~2% performance decline after benchmark is ran for ~60 seconds. With LF_HASH results are stable. All in all this patch optimises away one of three trx_sys->mutex locks per oltp_update_index.lua query. The other two critical sections became smaller. Relevant clean-ups: Replaced rw_trx_set iteration at startup with local set. The latter is needed because values inserted to rw_trx_list must be ordered by trx->id. Removed redundant conditions from trx_reference(): it is (and even was) never called with transactions that have trx->state == TRX_STATE_COMMITTED_IN_MEMORY. do_ref_count doesn't (and probably even didn't) make any sense: now it is called only when reference counter increment is actually requested. Moved condition out of mutex in trx_erase_lists(). trx_rw_is_active(), trx_rw_is_active_low() and trx_get_rw_trx_by_id() were greatly simplified and replaced by appropriate trx_rw_hash_t methods. Compared to rw_trx_set, rw_trx_hash holds transactions only in PREPARED or ACTIVE states. Transactions in COMMITTED state were required to be found at InnoDB startup only. They are now looked up in the local set. Removed unused trx_assert_recovered(). Removed unused innobase_get_trx() declaration. Removed rather semantically incorrect trx_sys_rw_trx_add(). Moved information printout from trx_sys_init_at_db_start() to trx_lists_init_at_db_start().
8 years ago
MDEV-12289 Keep 128 persistent rollback segments for compatibility and performance InnoDB divides the allocation of undo logs into rollback segments. The DB_ROLL_PTR system column of clustered indexes can address up to 128 rollback segments (TRX_SYS_N_RSEGS). Originally, InnoDB only created one rollback segment. In MySQL 5.5 or in the InnoDB Plugin for MySQL 5.1, all 128 rollback segments were created. MySQL 5.7 hard-codes the rollback segment IDs 1..32 for temporary undo logs. On upgrade, unless a slow shutdown (innodb_fast_shutdown=0) was performed on the old server instance, these rollback segments could be in use by transactions that are in XA PREPARE state or transactions that were left behind by a server kill followed by a normal shutdown immediately after restart. Persistent tables cannot refer to temporary undo logs or vice versa. Therefore, we should keep two distinct sets of rollback segments: one for persistent tables and another for temporary tables. In this way, all 128 rollback segments will be available for both types of tables, which could improve performance. Also, MariaDB 10.2 will remain more compatible than MySQL 5.7 with data files from earlier versions of MySQL or MariaDB. trx_sys_t::temp_rsegs[TRX_SYS_N_RSEGS]: A new array of temporary rollback segments. The trx_sys_t::rseg_array[TRX_SYS_N_RSEGS] will be solely for persistent undo logs. srv_tmp_undo_logs. Remove. Use the constant TRX_SYS_N_RSEGS. srv_available_undo_logs: Change the type to ulong. trx_rseg_get_on_id(): Remove. Instead, let the callers refer to trx_sys directly. trx_rseg_create(), trx_sysf_rseg_find_free(): Remove unneeded parameters. These functions only deal with persistent undo logs. trx_temp_rseg_create(): New function, to create all temporary rollback segments at server startup. trx_rseg_t::is_persistent(): Determine if the rollback segment is for persistent tables. trx_sys_is_noredo_rseg_slot(): Remove. The callers must know based on context (such as table handle) whether the DB_ROLL_PTR is referring to a persistent undo log. trx_sys_create_rsegs(): Remove all parameters, which were always passed as global variables. Instead, modify the global variables directly. enum trx_rseg_type_t: Remove. trx_t::get_temp_rseg(): A method to ensure that a temporary rollback segment has been assigned for the transaction. trx_t::assign_temp_rseg(): Replaces trx_assign_rseg(). trx_purge_free_segment(), trx_purge_truncate_rseg_history(): Remove the redundant variable noredo=false. Temporary undo logs are discarded immediately at transaction commit or rollback, not lazily by purge. trx_purge_mark_undo_for_truncate(): Remove references to the temporary rollback segments. trx_purge_mark_undo_for_truncate(): Remove a check for temporary rollback segments. Only the dedicated persistent undo log tablespaces can be truncated. trx_undo_get_undo_rec_low(), trx_undo_get_undo_rec(): Add the parameter is_temp. trx_rseg_mem_restore(): Split from trx_rseg_mem_create(). Initialize the undo log and the rollback segment from the file data structures. trx_sysf_get_n_rseg_slots(): Renamed from trx_sysf_used_slots_for_redo_rseg(). Count the persistent rollback segment headers that have been initialized. trx_sys_close(): Also free trx_sys->temp_rsegs[]. get_next_redo_rseg(): Merged to trx_assign_rseg_low(). trx_assign_rseg_low(): Remove the parameters and access the global variables directly. Revert to simple round-robin, now that the whole trx_sys->rseg_array[] is for persistent undo log again. get_next_noredo_rseg(): Moved to trx_t::assign_temp_rseg(). srv_undo_tablespaces_init(): Remove some parameters and use the global variables directly. Clarify some error messages. Adjust the test innodb.log_file. Apparently, before these changes, InnoDB somehow ignored missing dedicated undo tablespace files that are pointed by the TRX_SYS header page, possibly losing part of essential transaction system state.
9 years ago
  1. /*****************************************************************************
  2. Copyright (c) 1996, 2016, Oracle and/or its affiliates. All Rights Reserved.
  3. Copyright (c) 2015, 2018, MariaDB Corporation.
  4. This program is free software; you can redistribute it and/or modify it under
  5. the terms of the GNU General Public License as published by the Free Software
  6. Foundation; version 2 of the License.
  7. This program is distributed in the hope that it will be useful, but WITHOUT
  8. ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
  9. FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
  10. You should have received a copy of the GNU General Public License along with
  11. this program; if not, write to the Free Software Foundation, Inc.,
  12. 51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
  13. *****************************************************************************/
  14. /**************************************************//**
  15. @file trx/trx0trx.cc
  16. The transaction
  17. Created 3/26/1996 Heikki Tuuri
  18. *******************************************************/
  19. #include "ha_prototypes.h"
  20. #include "trx0trx.h"
  21. #ifdef WITH_WSREP
  22. #include <mysql/service_wsrep.h>
  23. #endif
  24. #include <mysql/service_thd_error_context.h>
  25. #include "btr0sea.h"
  26. #include "lock0lock.h"
  27. #include "log0log.h"
  28. #include "os0proc.h"
  29. #include "que0que.h"
  30. #include "srv0mon.h"
  31. #include "srv0srv.h"
  32. #include "fsp0sysspace.h"
  33. #include "row0mysql.h"
  34. #include "srv0start.h"
  35. #include "trx0purge.h"
  36. #include "trx0rec.h"
  37. #include "trx0roll.h"
  38. #include "trx0rseg.h"
  39. #include "trx0undo.h"
  40. #include "trx0xa.h"
  41. #include "ut0new.h"
  42. #include "ut0pool.h"
  43. #include "ut0vec.h"
  44. #include <set>
  45. #include <new>
  46. extern "C"
  47. int thd_deadlock_victim_preference(const MYSQL_THD thd1, const MYSQL_THD thd2);
  48. /** The bit pattern corresponding to TRX_ID_MAX */
  49. const byte trx_id_max_bytes[8] = {
  50. 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
  51. };
  52. /** The bit pattern corresponding to max timestamp */
  53. const byte timestamp_max_bytes[7] = {
  54. 0x7f, 0xff, 0xff, 0xff, 0x0f, 0x42, 0x3f
  55. };
  56. static const ulint MAX_DETAILED_ERROR_LEN = 256;
  57. /** Set of table_id */
  58. typedef std::set<
  59. table_id_t,
  60. std::less<table_id_t>,
  61. ut_allocator<table_id_t> > table_id_set;
  62. /** Constructor */
  63. TrxVersion::TrxVersion(trx_t* trx)
  64. :
  65. m_trx(trx),
  66. m_version(trx->version)
  67. {
  68. /* No op */
  69. }
  70. /** Set flush observer for the transaction
  71. @param[in/out] trx transaction struct
  72. @param[in] observer flush observer */
  73. void
  74. trx_set_flush_observer(
  75. trx_t* trx,
  76. FlushObserver* observer)
  77. {
  78. trx->flush_observer = observer;
  79. }
  80. /*************************************************************//**
  81. Set detailed error message for the transaction. */
  82. void
  83. trx_set_detailed_error(
  84. /*===================*/
  85. trx_t* trx, /*!< in: transaction struct */
  86. const char* msg) /*!< in: detailed error message */
  87. {
  88. ut_strlcpy(trx->detailed_error, msg, MAX_DETAILED_ERROR_LEN);
  89. }
  90. /*************************************************************//**
  91. Set detailed error message for the transaction from a file. Note that the
  92. file is rewinded before reading from it. */
  93. void
  94. trx_set_detailed_error_from_file(
  95. /*=============================*/
  96. trx_t* trx, /*!< in: transaction struct */
  97. FILE* file) /*!< in: file to read message from */
  98. {
  99. os_file_read_string(file, trx->detailed_error, MAX_DETAILED_ERROR_LEN);
  100. }
  101. /********************************************************************//**
  102. Initialize transaction object.
  103. @param trx trx to initialize */
  104. static
  105. void
  106. trx_init(
  107. /*=====*/
  108. trx_t* trx)
  109. {
  110. /* This is called at the end of commit, do not reset the
  111. trx_t::state here to NOT_STARTED. The FORCED_ROLLBACK
  112. status is required for asynchronous handling. */
  113. trx->id = 0;
  114. trx->no = TRX_ID_MAX;
  115. trx->is_recovered = false;
  116. trx->op_info = "";
  117. trx->active_commit_ordered = 0;
  118. trx->isolation_level = TRX_ISO_REPEATABLE_READ;
  119. trx->check_foreigns = true;
  120. trx->check_unique_secondary = true;
  121. trx->lock.n_rec_locks = 0;
  122. trx->dict_operation = TRX_DICT_OP_NONE;
  123. trx->table_id = 0;
  124. trx->error_state = DB_SUCCESS;
  125. trx->error_key_num = ULINT_UNDEFINED;
  126. trx->undo_no = 0;
  127. trx->rsegs.m_redo.rseg = NULL;
  128. trx->rsegs.m_noredo.rseg = NULL;
  129. trx->read_only = false;
  130. trx->auto_commit = false;
  131. trx->will_lock = 0;
  132. trx->ddl = false;
  133. trx->internal = false;
  134. ut_d(trx->start_file = 0);
  135. ut_d(trx->start_line = 0);
  136. trx->magic_n = TRX_MAGIC_N;
  137. trx->lock.que_state = TRX_QUE_RUNNING;
  138. trx->last_sql_stat_start.least_undo_no = 0;
  139. ut_ad(!trx->read_view.is_open());
  140. trx->lock.rec_cached = 0;
  141. trx->lock.table_cached = 0;
  142. /* During asynchronous rollback, we should reset forced rollback flag
  143. only after rollback is complete to avoid race with the thread owning
  144. the transaction. */
  145. if (!TrxInInnoDB::is_async_rollback(trx)) {
  146. my_atomic_storelong(&trx->killed_by, 0);
  147. /* Note: Do not set to 0, the ref count is decremented inside
  148. the TrxInInnoDB() destructor. We only need to clear the flags. */
  149. trx->in_innodb &= TRX_FORCE_ROLLBACK_MASK;
  150. }
  151. /* Note: It's possible that this list is not empty if a transaction
  152. was interrupted after it collected the victim transactions and before
  153. it got a chance to roll them back asynchronously. */
  154. trx->hit_list.clear();
  155. trx->flush_observer = NULL;
  156. ++trx->version;
  157. }
  158. /** For managing the life-cycle of the trx_t instance that we get
  159. from the pool. */
  160. struct TrxFactory {
  161. /** Initializes a transaction object. It must be explicitly started
  162. with trx_start_if_not_started() before using it. The default isolation
  163. level is TRX_ISO_REPEATABLE_READ.
  164. @param trx Transaction instance to initialise */
  165. static void init(trx_t* trx)
  166. {
  167. /* Explicitly call the constructor of the already
  168. allocated object. trx_t objects are allocated by
  169. ut_zalloc() in Pool::Pool() which would not call
  170. the constructors of the trx_t members. */
  171. new(&trx->mod_tables) trx_mod_tables_t();
  172. new(&trx->lock.rec_pool) lock_pool_t();
  173. new(&trx->lock.table_pool) lock_pool_t();
  174. new(&trx->lock.table_locks) lock_pool_t();
  175. new(&trx->hit_list) hit_list_t();
  176. new(&trx->read_view) ReadView();
  177. trx->rw_trx_hash_pins = 0;
  178. trx_init(trx);
  179. DBUG_LOG("trx", "Init: " << trx);
  180. trx->state = TRX_STATE_NOT_STARTED;
  181. trx->dict_operation_lock_mode = 0;
  182. trx->xid = UT_NEW_NOKEY(xid_t());
  183. trx->detailed_error = reinterpret_cast<char*>(
  184. ut_zalloc_nokey(MAX_DETAILED_ERROR_LEN));
  185. trx->lock.lock_heap = mem_heap_create_typed(
  186. 1024, MEM_HEAP_FOR_LOCK_HEAP);
  187. lock_trx_lock_list_init(&trx->lock.trx_locks);
  188. UT_LIST_INIT(
  189. trx->trx_savepoints,
  190. &trx_named_savept_t::trx_savepoints);
  191. mutex_create(LATCH_ID_TRX, &trx->mutex);
  192. mutex_create(LATCH_ID_TRX_UNDO, &trx->undo_mutex);
  193. lock_trx_alloc_locks(trx);
  194. }
  195. /** Release resources held by the transaction object.
  196. @param trx the transaction for which to release resources */
  197. static void destroy(trx_t* trx)
  198. {
  199. ut_a(trx->magic_n == TRX_MAGIC_N);
  200. ut_ad(!trx->in_mysql_trx_list);
  201. ut_a(trx->lock.wait_lock == NULL);
  202. ut_a(trx->lock.wait_thr == NULL);
  203. ut_a(trx->dict_operation_lock_mode == 0);
  204. if (trx->lock.lock_heap != NULL) {
  205. mem_heap_free(trx->lock.lock_heap);
  206. trx->lock.lock_heap = NULL;
  207. }
  208. ut_a(UT_LIST_GET_LEN(trx->lock.trx_locks) == 0);
  209. UT_DELETE(trx->xid);
  210. ut_free(trx->detailed_error);
  211. mutex_free(&trx->mutex);
  212. mutex_free(&trx->undo_mutex);
  213. trx->mod_tables.~trx_mod_tables_t();
  214. ut_ad(!trx->read_view.is_open());
  215. if (!trx->lock.rec_pool.empty()) {
  216. /* See lock_trx_alloc_locks() why we only free
  217. the first element. */
  218. ut_free(trx->lock.rec_pool[0]);
  219. }
  220. if (!trx->lock.table_pool.empty()) {
  221. /* See lock_trx_alloc_locks() why we only free
  222. the first element. */
  223. ut_free(trx->lock.table_pool[0]);
  224. }
  225. trx->lock.rec_pool.~lock_pool_t();
  226. trx->lock.table_pool.~lock_pool_t();
  227. trx->lock.table_locks.~lock_pool_t();
  228. trx->hit_list.~hit_list_t();
  229. trx->read_view.~ReadView();
  230. }
  231. /** Enforce any invariants here, this is called before the transaction
  232. is added to the pool.
  233. @return true if all OK */
  234. static bool debug(const trx_t* trx)
  235. {
  236. ut_a(trx->error_state == DB_SUCCESS);
  237. ut_a(trx->magic_n == TRX_MAGIC_N);
  238. ut_ad(!trx->read_only);
  239. ut_ad(trx->state == TRX_STATE_NOT_STARTED
  240. || trx->state == TRX_STATE_FORCED_ROLLBACK);
  241. ut_ad(trx->dict_operation == TRX_DICT_OP_NONE);
  242. ut_ad(trx->mysql_thd == 0);
  243. ut_ad(!trx->in_mysql_trx_list);
  244. ut_a(trx->lock.wait_thr == NULL);
  245. ut_a(trx->lock.wait_lock == NULL);
  246. ut_a(trx->dict_operation_lock_mode == 0);
  247. ut_a(UT_LIST_GET_LEN(trx->lock.trx_locks) == 0);
  248. ut_ad(trx->autoinc_locks == NULL);
  249. ut_ad(trx->lock.table_locks.empty());
  250. ut_ad(!trx->abort);
  251. ut_ad(trx->hit_list.empty());
  252. ut_ad(trx->killed_by == 0);
  253. return(true);
  254. }
  255. };
  256. /** The lock strategy for TrxPool */
  257. struct TrxPoolLock {
  258. TrxPoolLock() { }
  259. /** Create the mutex */
  260. void create()
  261. {
  262. mutex_create(LATCH_ID_TRX_POOL, &m_mutex);
  263. }
  264. /** Acquire the mutex */
  265. void enter() { mutex_enter(&m_mutex); }
  266. /** Release the mutex */
  267. void exit() { mutex_exit(&m_mutex); }
  268. /** Free the mutex */
  269. void destroy() { mutex_free(&m_mutex); }
  270. /** Mutex to use */
  271. ib_mutex_t m_mutex;
  272. };
  273. /** The lock strategy for the TrxPoolManager */
  274. struct TrxPoolManagerLock {
  275. TrxPoolManagerLock() { }
  276. /** Create the mutex */
  277. void create()
  278. {
  279. mutex_create(LATCH_ID_TRX_POOL_MANAGER, &m_mutex);
  280. }
  281. /** Acquire the mutex */
  282. void enter() { mutex_enter(&m_mutex); }
  283. /** Release the mutex */
  284. void exit() { mutex_exit(&m_mutex); }
  285. /** Free the mutex */
  286. void destroy() { mutex_free(&m_mutex); }
  287. /** Mutex to use */
  288. ib_mutex_t m_mutex;
  289. };
  290. /** Use explicit mutexes for the trx_t pool and its manager. */
  291. typedef Pool<trx_t, TrxFactory, TrxPoolLock> trx_pool_t;
  292. typedef PoolManager<trx_pool_t, TrxPoolManagerLock > trx_pools_t;
  293. /** The trx_t pool manager */
  294. static trx_pools_t* trx_pools;
  295. /** Size of on trx_t pool in bytes. */
  296. static const ulint MAX_TRX_BLOCK_SIZE = 1024 * 1024 * 4;
  297. /** Create the trx_t pool */
  298. void
  299. trx_pool_init()
  300. {
  301. trx_pools = UT_NEW_NOKEY(trx_pools_t(MAX_TRX_BLOCK_SIZE));
  302. ut_a(trx_pools != 0);
  303. }
  304. /** Destroy the trx_t pool */
  305. void
  306. trx_pool_close()
  307. {
  308. UT_DELETE(trx_pools);
  309. trx_pools = 0;
  310. }
  311. /** @return a trx_t instance from trx_pools. */
  312. static
  313. trx_t*
  314. trx_create_low()
  315. {
  316. trx_t* trx = trx_pools->get();
  317. assert_trx_is_free(trx);
  318. mem_heap_t* heap;
  319. ib_alloc_t* alloc;
  320. /* We just got trx from pool, it should be non locking */
  321. ut_ad(trx->will_lock == 0);
  322. ut_ad(!trx->rw_trx_hash_pins);
  323. /* Background trx should not be forced to rollback,
  324. we will unset the flag for user trx. */
  325. trx->in_innodb |= TRX_FORCE_ROLLBACK_DISABLE;
  326. /* Trx state can be TRX_STATE_FORCED_ROLLBACK if
  327. the trx was forced to rollback before it's reused.*/
  328. DBUG_LOG("trx", "Create: " << trx);
  329. trx->state = TRX_STATE_NOT_STARTED;
  330. heap = mem_heap_create(sizeof(ib_vector_t) + sizeof(void*) * 8);
  331. alloc = ib_heap_allocator_create(heap);
  332. /* Remember to free the vector explicitly in trx_free(). */
  333. trx->autoinc_locks = ib_vector_create(alloc, sizeof(void**), 4);
  334. /* Should have been either just initialized or .clear()ed by
  335. trx_free(). */
  336. ut_a(trx->mod_tables.size() == 0);
  337. #ifdef WITH_WSREP
  338. trx->wsrep_event = NULL;
  339. #endif /* WITH_WSREP */
  340. return(trx);
  341. }
  342. /**
  343. Release a trx_t instance back to the pool.
  344. @param trx the instance to release. */
  345. static
  346. void
  347. trx_free(trx_t*& trx)
  348. {
  349. assert_trx_is_free(trx);
  350. trx_sys.rw_trx_hash.put_pins(trx);
  351. trx->mysql_thd = 0;
  352. trx->mysql_log_file_name = 0;
  353. // FIXME: We need to avoid this heap free/alloc for each commit.
  354. if (trx->autoinc_locks != NULL) {
  355. ut_ad(ib_vector_is_empty(trx->autoinc_locks));
  356. /* We allocated a dedicated heap for the vector. */
  357. ib_vector_free(trx->autoinc_locks);
  358. trx->autoinc_locks = NULL;
  359. }
  360. trx->mod_tables.clear();
  361. ut_ad(!trx->read_view.is_open());
  362. trx->read_view.close();
  363. /* trx locking state should have been reset before returning trx
  364. to pool */
  365. ut_ad(trx->will_lock == 0);
  366. trx_pools->mem_free(trx);
  367. trx = NULL;
  368. }
  369. /********************************************************************//**
  370. Creates a transaction object for background operations by the master thread.
  371. @return own: transaction object */
  372. trx_t*
  373. trx_allocate_for_background(void)
  374. /*=============================*/
  375. {
  376. trx_t* trx;
  377. trx = trx_create_low();
  378. return(trx);
  379. }
  380. /********************************************************************//**
  381. Creates a transaction object for MySQL.
  382. @return own: transaction object */
  383. trx_t*
  384. trx_allocate_for_mysql(void)
  385. /*========================*/
  386. {
  387. trx_t* trx;
  388. trx = trx_allocate_for_background();
  389. mutex_enter(&trx_sys.mutex);
  390. ut_d(trx->in_mysql_trx_list = TRUE);
  391. UT_LIST_ADD_FIRST(trx_sys.mysql_trx_list, trx);
  392. mutex_exit(&trx_sys.mutex);
  393. return(trx);
  394. }
  395. /** Check state of transaction before freeing it.
  396. @param trx trx object to validate */
  397. static
  398. void
  399. trx_validate_state_before_free(trx_t* trx)
  400. {
  401. ut_ad(!trx->declared_to_be_inside_innodb);
  402. ut_ad(!trx->n_mysql_tables_in_use);
  403. ut_ad(!trx->mysql_n_tables_locked);
  404. ut_ad(!trx->internal);
  405. if (trx->declared_to_be_inside_innodb) {
  406. ib::error() << "Freeing a trx (" << trx << ", "
  407. << trx_get_id_for_print(trx) << ") which is declared"
  408. " to be processing inside InnoDB";
  409. trx_print(stderr, trx, 600);
  410. putc('\n', stderr);
  411. /* This is an error but not a fatal error. We must keep
  412. the counters like srv_conc.n_active accurate. */
  413. srv_conc_force_exit_innodb(trx);
  414. }
  415. if (trx->n_mysql_tables_in_use != 0
  416. || trx->mysql_n_tables_locked != 0) {
  417. ib::error() << "MySQL is freeing a thd though"
  418. " trx->n_mysql_tables_in_use is "
  419. << trx->n_mysql_tables_in_use
  420. << " and trx->mysql_n_tables_locked is "
  421. << trx->mysql_n_tables_locked << ".";
  422. trx_print(stderr, trx, 600);
  423. ut_print_buf(stderr, trx, sizeof(trx_t));
  424. putc('\n', stderr);
  425. }
  426. trx->dict_operation = TRX_DICT_OP_NONE;
  427. assert_trx_is_inactive(trx);
  428. }
  429. /** Free and initialize a transaction object instantinated during recovery.
  430. @param trx trx object to free and initialize during recovery */
  431. void
  432. trx_free_resurrected(trx_t* trx)
  433. {
  434. trx_validate_state_before_free(trx);
  435. trx_init(trx);
  436. trx_free(trx);
  437. }
  438. /** Free a transaction that was allocated by background or user threads.
  439. @param trx trx object to free */
  440. void
  441. trx_free_for_background(trx_t* trx)
  442. {
  443. trx_validate_state_before_free(trx);
  444. trx_free(trx);
  445. }
  446. /** At shutdown, frees a transaction object. */
  447. void
  448. trx_free_at_shutdown(trx_t *trx)
  449. {
  450. ut_ad(trx->is_recovered);
  451. ut_a(trx_state_eq(trx, TRX_STATE_PREPARED)
  452. || (trx_state_eq(trx, TRX_STATE_ACTIVE)
  453. && (!srv_was_started
  454. || srv_operation == SRV_OPERATION_RESTORE
  455. || srv_operation == SRV_OPERATION_RESTORE_EXPORT
  456. || srv_read_only_mode
  457. || srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO
  458. || (!srv_is_being_started
  459. && !srv_undo_sources && srv_fast_shutdown))));
  460. ut_a(trx->magic_n == TRX_MAGIC_N);
  461. lock_trx_release_locks(trx);
  462. trx_undo_free_at_shutdown(trx);
  463. ut_a(!trx->read_only);
  464. DBUG_LOG("trx", "Free prepared: " << trx);
  465. trx->state = TRX_STATE_NOT_STARTED;
  466. /* Undo trx_resurrect_table_locks(). */
  467. lock_trx_lock_list_init(&trx->lock.trx_locks);
  468. /* Note: This vector is not guaranteed to be empty because the
  469. transaction was never committed and therefore lock_trx_release()
  470. was not called. */
  471. trx->lock.table_locks.clear();
  472. trx_free(trx);
  473. }
  474. /** Disconnect a transaction from MySQL and optionally mark it as if
  475. it's been recovered. For the marking the transaction must be in prepared state.
  476. The recovery-marked transaction is going to survive "alone" so its association
  477. with the mysql handle is destroyed now rather than when it will be
  478. finally freed.
  479. @param[in,out] trx transaction
  480. @param[in] prepared boolean value to specify whether trx is
  481. for recovery or not. */
  482. inline
  483. void
  484. trx_disconnect_from_mysql(
  485. trx_t* trx,
  486. bool prepared)
  487. {
  488. trx->read_view.close();
  489. mutex_enter(&trx_sys.mutex);
  490. ut_ad(trx->in_mysql_trx_list);
  491. ut_d(trx->in_mysql_trx_list = FALSE);
  492. UT_LIST_REMOVE(trx_sys.mysql_trx_list, trx);
  493. if (prepared) {
  494. ut_ad(trx_state_eq(trx, TRX_STATE_PREPARED));
  495. trx->is_recovered = true;
  496. trx->mysql_thd = NULL;
  497. /* todo/fixme: suggest to do it at innodb prepare */
  498. trx->will_lock = 0;
  499. }
  500. mutex_exit(&trx_sys.mutex);
  501. }
  502. /** Disconnect a transaction from MySQL.
  503. @param[in,out] trx transaction */
  504. inline
  505. void
  506. trx_disconnect_plain(trx_t* trx)
  507. {
  508. trx_disconnect_from_mysql(trx, false);
  509. }
  510. /** Disconnect a prepared transaction from MySQL.
  511. @param[in,out] trx transaction */
  512. void
  513. trx_disconnect_prepared(trx_t* trx)
  514. {
  515. trx_disconnect_from_mysql(trx, true);
  516. }
  517. /** Free a transaction object for MySQL.
  518. @param[in,out] trx transaction */
  519. void
  520. trx_free_for_mysql(trx_t* trx)
  521. {
  522. trx_disconnect_plain(trx);
  523. trx_free_for_background(trx);
  524. }
  525. /****************************************************************//**
  526. Resurrect the table locks for a resurrected transaction. */
  527. static
  528. void
  529. trx_resurrect_table_locks(
  530. /*======================*/
  531. trx_t* trx, /*!< in/out: transaction */
  532. const trx_undo_t* undo) /*!< in: undo log */
  533. {
  534. mtr_t mtr;
  535. page_t* undo_page;
  536. trx_undo_rec_t* undo_rec;
  537. table_id_set tables;
  538. ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE) ||
  539. trx_state_eq(trx, TRX_STATE_PREPARED));
  540. if (undo->empty) {
  541. return;
  542. }
  543. mtr_start(&mtr);
  544. /* trx_rseg_mem_create() may have acquired an X-latch on this
  545. page, so we cannot acquire an S-latch. */
  546. undo_page = trx_undo_page_get(
  547. page_id_t(undo->space, undo->top_page_no), &mtr);
  548. undo_rec = undo_page + undo->top_offset;
  549. do {
  550. ulint type;
  551. undo_no_t undo_no;
  552. table_id_t table_id;
  553. ulint cmpl_info;
  554. bool updated_extern;
  555. page_t* undo_rec_page = page_align(undo_rec);
  556. if (undo_rec_page != undo_page) {
  557. mtr.release_page(undo_page, MTR_MEMO_PAGE_X_FIX);
  558. undo_page = undo_rec_page;
  559. }
  560. trx_undo_rec_get_pars(
  561. undo_rec, &type, &cmpl_info,
  562. &updated_extern, &undo_no, &table_id);
  563. tables.insert(table_id);
  564. undo_rec = trx_undo_get_prev_rec(
  565. undo_rec, undo->hdr_page_no,
  566. undo->hdr_offset, false, &mtr);
  567. } while (undo_rec);
  568. mtr_commit(&mtr);
  569. for (table_id_set::const_iterator i = tables.begin();
  570. i != tables.end(); i++) {
  571. if (dict_table_t* table = dict_table_open_on_id(
  572. *i, FALSE, DICT_TABLE_OP_LOAD_TABLESPACE)) {
  573. if (!table->is_readable()) {
  574. mutex_enter(&dict_sys->mutex);
  575. dict_table_close(table, TRUE, FALSE);
  576. dict_table_remove_from_cache(table);
  577. mutex_exit(&dict_sys->mutex);
  578. continue;
  579. }
  580. if (trx->state == TRX_STATE_PREPARED) {
  581. trx->mod_tables.insert(
  582. trx_mod_tables_t::value_type(table,
  583. 0));
  584. }
  585. lock_table_ix_resurrect(table, trx);
  586. DBUG_LOG("ib_trx",
  587. "resurrect " << ib::hex(trx->id)
  588. << " IX lock on " << table->name);
  589. dict_table_close(table, FALSE, FALSE);
  590. }
  591. }
  592. }
  593. /**
  594. Resurrect the transactions that were doing inserts/updates the time of the
  595. crash, they need to be undone.
  596. */
  597. static void trx_resurrect(trx_undo_t *undo, trx_rseg_t *rseg,
  598. ib_time_t start_time, uint64_t *rows_to_undo,
  599. bool is_old_insert)
  600. {
  601. trx_state_t state;
  602. /*
  603. This is single-threaded startup code, we do not need the
  604. protection of trx->mutex or trx_sys.mutex here.
  605. */
  606. switch (undo->state)
  607. {
  608. case TRX_UNDO_ACTIVE:
  609. state= TRX_STATE_ACTIVE;
  610. break;
  611. case TRX_UNDO_PREPARED:
  612. /*
  613. Prepared transactions are left in the prepared state
  614. waiting for a commit or abort decision from MySQL
  615. */
  616. ib::info() << "Transaction " << undo->trx_id
  617. << " was in the XA prepared state.";
  618. state= TRX_STATE_PREPARED;
  619. break;
  620. default:
  621. if (is_old_insert && srv_force_recovery < SRV_FORCE_NO_TRX_UNDO)
  622. trx_undo_commit_cleanup(undo, false);
  623. return;
  624. }
  625. trx_t *trx= trx_allocate_for_background();
  626. trx->state= state;
  627. ut_d(trx->start_file= __FILE__);
  628. ut_d(trx->start_line= __LINE__);
  629. ut_ad(trx->no == TRX_ID_MAX);
  630. if (is_old_insert)
  631. trx->rsegs.m_redo.old_insert= undo;
  632. else
  633. trx->rsegs.m_redo.undo= undo;
  634. if (!undo->empty)
  635. {
  636. trx->undo_no= undo->top_undo_no + 1;
  637. trx->undo_rseg_space= undo->rseg->space;
  638. }
  639. trx->rsegs.m_redo.rseg= rseg;
  640. /*
  641. For transactions with active data will not have rseg size = 1
  642. or will not qualify for purge limit criteria. So it is safe to increment
  643. this trx_ref_count w/o mutex protection.
  644. */
  645. ++trx->rsegs.m_redo.rseg->trx_ref_count;
  646. *trx->xid= undo->xid;
  647. trx->id= undo->trx_id;
  648. trx->is_recovered= true;
  649. trx->start_time= start_time;
  650. if (undo->dict_operation)
  651. {
  652. trx_set_dict_operation(trx, TRX_DICT_OP_TABLE);
  653. trx->table_id= undo->table_id;
  654. }
  655. trx_sys.rw_trx_hash.insert(trx);
  656. trx_sys.rw_trx_hash.put_pins(trx);
  657. trx_resurrect_table_locks(trx, undo);
  658. if (trx_state_eq(trx, TRX_STATE_ACTIVE))
  659. *rows_to_undo+= trx->undo_no;
  660. }
  661. /** Initialize (resurrect) transactions at startup. */
  662. void
  663. trx_lists_init_at_db_start()
  664. {
  665. ut_a(srv_is_being_started);
  666. ut_ad(!srv_was_started);
  667. ut_ad(!purge_sys.is_initialised());
  668. if (srv_operation == SRV_OPERATION_RESTORE) {
  669. /* mariabackup --prepare only deals with
  670. the redo log and the data files, not with
  671. transactions or the data dictionary. */
  672. trx_rseg_array_init();
  673. return;
  674. }
  675. if (srv_force_recovery >= SRV_FORCE_NO_UNDO_LOG_SCAN) {
  676. return;
  677. }
  678. purge_sys.create();
  679. trx_rseg_array_init();
  680. /* Look from the rollback segments if there exist undo logs for
  681. transactions. */
  682. const ib_time_t start_time = ut_time();
  683. uint64_t rows_to_undo = 0;
  684. for (ulint i = 0; i < TRX_SYS_N_RSEGS; ++i) {
  685. trx_undo_t* undo;
  686. trx_rseg_t* rseg = trx_sys.rseg_array[i];
  687. /* Some rollback segment may be unavailable,
  688. especially if the server was previously run with a
  689. non-default value of innodb_undo_logs. */
  690. if (rseg == NULL) {
  691. continue;
  692. }
  693. /* Resurrect transactions that were doing inserts
  694. using the old separate insert_undo log. */
  695. undo = UT_LIST_GET_FIRST(rseg->old_insert_list);
  696. while (undo) {
  697. trx_undo_t* next = UT_LIST_GET_NEXT(undo_list, undo);
  698. trx_resurrect(undo, rseg, start_time, &rows_to_undo,
  699. true);
  700. undo = next;
  701. }
  702. /* Ressurrect other transactions. */
  703. for (undo = UT_LIST_GET_FIRST(rseg->undo_list);
  704. undo != NULL;
  705. undo = UT_LIST_GET_NEXT(undo_list, undo)) {
  706. trx_t *trx = trx_sys.rw_trx_hash.find(0, undo->trx_id);
  707. if (!trx) {
  708. trx_resurrect(undo, rseg, start_time,
  709. &rows_to_undo, false);
  710. } else {
  711. ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE) ||
  712. trx_state_eq(trx, TRX_STATE_PREPARED));
  713. ut_ad(trx->start_time == start_time);
  714. ut_ad(trx->is_recovered);
  715. ut_ad(trx->rsegs.m_redo.rseg == rseg);
  716. ut_ad(trx->rsegs.m_redo.rseg->trx_ref_count);
  717. trx->rsegs.m_redo.undo = undo;
  718. if (!undo->empty
  719. && undo->top_undo_no >= trx->undo_no) {
  720. if (trx_state_eq(trx,
  721. TRX_STATE_ACTIVE)) {
  722. rows_to_undo -= trx->undo_no;
  723. rows_to_undo +=
  724. undo->top_undo_no + 1;
  725. }
  726. trx->undo_no = undo->top_undo_no + 1;
  727. trx->undo_rseg_space =
  728. undo->rseg->space;
  729. }
  730. trx_resurrect_table_locks(trx, undo);
  731. }
  732. }
  733. }
  734. if (trx_sys.rw_trx_hash.size()) {
  735. ib::info() << trx_sys.rw_trx_hash.size()
  736. << " transaction(s) which must be rolled back or"
  737. " cleaned up in total " << rows_to_undo
  738. << " row operations to undo";
  739. ib::info() << "Trx id counter is " << trx_sys.get_max_trx_id();
  740. }
  741. trx_sys.clone_oldest_view();
  742. }
  743. /** Assign a persistent rollback segment in a round-robin fashion,
  744. evenly distributed between 0 and innodb_undo_logs-1
  745. @return persistent rollback segment
  746. @retval NULL if innodb_read_only */
  747. static
  748. trx_rseg_t*
  749. trx_assign_rseg_low()
  750. {
  751. if (srv_read_only_mode) {
  752. ut_ad(srv_undo_logs == ULONG_UNDEFINED);
  753. return(NULL);
  754. }
  755. /* The first slot is always assigned to the system tablespace. */
  756. ut_ad(trx_sys.rseg_array[0]->space == TRX_SYS_SPACE);
  757. /* Choose a rollback segment evenly distributed between 0 and
  758. innodb_undo_logs-1 in a round-robin fashion, skipping those
  759. undo tablespaces that are scheduled for truncation.
  760. Because rseg_slot is not protected by atomics or any mutex, race
  761. conditions are possible, meaning that multiple transactions
  762. that start modifications concurrently will write their undo
  763. log to the same rollback segment. */
  764. static ulong rseg_slot;
  765. ulint slot = rseg_slot++ % srv_undo_logs;
  766. trx_rseg_t* rseg;
  767. #ifdef UNIV_DEBUG
  768. ulint start_scan_slot = slot;
  769. bool look_for_rollover = false;
  770. #endif /* UNIV_DEBUG */
  771. bool allocated = false;
  772. do {
  773. for (;;) {
  774. rseg = trx_sys.rseg_array[slot];
  775. #ifdef UNIV_DEBUG
  776. /* Ensure that we are not revisiting the same
  777. slot that we have already inspected. */
  778. if (look_for_rollover) {
  779. ut_ad(start_scan_slot != slot);
  780. }
  781. look_for_rollover = true;
  782. #endif /* UNIV_DEBUG */
  783. slot = (slot + 1) % srv_undo_logs;
  784. if (rseg == NULL) {
  785. continue;
  786. }
  787. ut_ad(rseg->is_persistent());
  788. if (rseg->space != TRX_SYS_SPACE) {
  789. ut_ad(srv_undo_tablespaces > 1);
  790. if (rseg->skip_allocation) {
  791. continue;
  792. }
  793. } else if (trx_rseg_t* next
  794. = trx_sys.rseg_array[slot]) {
  795. if (next->space != TRX_SYS_SPACE
  796. && srv_undo_tablespaces > 0) {
  797. /** If dedicated
  798. innodb_undo_tablespaces have
  799. been configured, try to use them
  800. instead of the system tablespace. */
  801. continue;
  802. }
  803. }
  804. break;
  805. }
  806. /* By now we have only selected the rseg but not marked it
  807. allocated. By marking it allocated we are ensuring that it will
  808. never be selected for UNDO truncate purge. */
  809. mutex_enter(&rseg->mutex);
  810. if (!rseg->skip_allocation) {
  811. rseg->trx_ref_count++;
  812. allocated = true;
  813. }
  814. mutex_exit(&rseg->mutex);
  815. } while (!allocated);
  816. ut_ad(rseg->trx_ref_count > 0);
  817. ut_ad(rseg->is_persistent());
  818. return(rseg);
  819. }
  820. /** Assign a rollback segment for modifying temporary tables.
  821. @return the assigned rollback segment */
  822. trx_rseg_t*
  823. trx_t::assign_temp_rseg()
  824. {
  825. ut_ad(!rsegs.m_noredo.rseg);
  826. ut_ad(!trx_is_autocommit_non_locking(this));
  827. compile_time_assert(ut_is_2pow(TRX_SYS_N_RSEGS));
  828. /* Choose a temporary rollback segment between 0 and 127
  829. in a round-robin fashion. Because rseg_slot is not protected by
  830. atomics or any mutex, race conditions are possible, meaning that
  831. multiple transactions that start modifications concurrently
  832. will write their undo log to the same rollback segment. */
  833. static ulong rseg_slot;
  834. trx_rseg_t* rseg = trx_sys.temp_rsegs[
  835. rseg_slot++ & (TRX_SYS_N_RSEGS - 1)];
  836. ut_ad(!rseg->is_persistent());
  837. rsegs.m_noredo.rseg = rseg;
  838. if (id == 0) {
  839. trx_sys.register_rw(this);
  840. }
  841. ut_ad(!rseg->is_persistent());
  842. return(rseg);
  843. }
  844. /****************************************************************//**
  845. Starts a transaction. */
  846. static
  847. void
  848. trx_start_low(
  849. /*==========*/
  850. trx_t* trx, /*!< in: transaction */
  851. bool read_write) /*!< in: true if read-write transaction */
  852. {
  853. ut_ad(!trx->in_rollback);
  854. ut_ad(!trx->is_recovered);
  855. ut_ad(trx->hit_list.empty());
  856. ut_ad(trx->start_line != 0);
  857. ut_ad(trx->start_file != 0);
  858. ut_ad(trx->roll_limit == 0);
  859. ut_ad(trx->error_state == DB_SUCCESS);
  860. ut_ad(trx->rsegs.m_redo.rseg == NULL);
  861. ut_ad(trx->rsegs.m_noredo.rseg == NULL);
  862. ut_ad(trx_state_eq(trx, TRX_STATE_NOT_STARTED));
  863. ut_ad(UT_LIST_GET_LEN(trx->lock.trx_locks) == 0);
  864. ut_ad(!(trx->in_innodb & TRX_FORCE_ROLLBACK));
  865. ut_ad(!(trx->in_innodb & TRX_FORCE_ROLLBACK_ASYNC));
  866. ++trx->version;
  867. /* Check whether it is an AUTOCOMMIT SELECT */
  868. trx->auto_commit = thd_trx_is_auto_commit(trx->mysql_thd);
  869. trx->read_only = srv_read_only_mode
  870. || (!trx->ddl && !trx->internal
  871. && thd_trx_is_read_only(trx->mysql_thd));
  872. if (!trx->auto_commit) {
  873. ++trx->will_lock;
  874. } else if (trx->will_lock == 0) {
  875. trx->read_only = true;
  876. }
  877. #ifdef WITH_WSREP
  878. memset(trx->xid, 0, sizeof(xid_t));
  879. trx->xid->formatID = -1;
  880. #endif /* WITH_WSREP */
  881. /* The initial value for trx->no: TRX_ID_MAX is used in
  882. read_view_open_now: */
  883. trx->no = TRX_ID_MAX;
  884. ut_a(ib_vector_is_empty(trx->autoinc_locks));
  885. ut_a(trx->lock.table_locks.empty());
  886. /* If this transaction came from trx_allocate_for_mysql(),
  887. trx->in_mysql_trx_list would hold. In that case, the trx->state
  888. change must be protected by the trx_sys.mutex, so that
  889. lock_print_info_all_transactions() will have a consistent view. */
  890. /* No other thread can access this trx object through rw_trx_hash, thus
  891. we don't need trx_sys.mutex protection for that purpose. Still this
  892. trx can be found through trx_sys.mysql_trx_list, which means state
  893. change must be protected by e.g. trx->mutex.
  894. For now we update it without mutex protection, because original code
  895. did it this way. It has to be reviewed and fixed properly. */
  896. trx->state = TRX_STATE_ACTIVE;
  897. /* By default all transactions are in the read-only list unless they
  898. are non-locking auto-commit read only transactions or background
  899. (internal) transactions. Note: Transactions marked explicitly as
  900. read only can write to temporary tables, we put those on the RO
  901. list too. */
  902. if (!trx->read_only
  903. && (trx->mysql_thd == 0 || read_write || trx->ddl)) {
  904. /* Temporary rseg is assigned only if the transaction
  905. updates a temporary table */
  906. trx->rsegs.m_redo.rseg = trx_assign_rseg_low();
  907. ut_ad(trx->rsegs.m_redo.rseg != 0
  908. || srv_read_only_mode
  909. || srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO);
  910. trx_sys.register_rw(trx);
  911. } else {
  912. trx->id = 0;
  913. if (!trx_is_autocommit_non_locking(trx)) {
  914. /* If this is a read-only transaction that is writing
  915. to a temporary table then it needs a transaction id
  916. to write to the temporary table. */
  917. if (read_write) {
  918. ut_ad(!srv_read_only_mode);
  919. trx_sys.register_rw(trx);
  920. }
  921. } else {
  922. ut_ad(!read_write);
  923. }
  924. }
  925. if (trx->mysql_thd != NULL) {
  926. trx->start_time = thd_start_time_in_secs(trx->mysql_thd);
  927. trx->start_time_micro = thd_query_start_micro(trx->mysql_thd);
  928. } else {
  929. trx->start_time = ut_time();
  930. trx->start_time_micro = 0;
  931. }
  932. ut_a(trx->error_state == DB_SUCCESS);
  933. MONITOR_INC(MONITOR_TRX_ACTIVE);
  934. }
  935. /** Set the serialisation number for a persistent committed transaction.
  936. @param[in,out] trx committed transaction with persistent changes */
  937. static
  938. void
  939. trx_serialise(trx_t* trx)
  940. {
  941. trx_rseg_t *rseg = trx->rsegs.m_redo.rseg;
  942. ut_ad(rseg);
  943. ut_ad(mutex_own(&rseg->mutex));
  944. if (rseg->last_page_no == FIL_NULL) {
  945. mutex_enter(&purge_sys.pq_mutex);
  946. }
  947. trx_sys.assign_new_trx_no(trx);
  948. /* If the rollback segment is not empty then the
  949. new trx_t::no can't be less than any trx_t::no
  950. already in the rollback segment. User threads only
  951. produce events when a rollback segment is empty. */
  952. if (rseg->last_page_no == FIL_NULL) {
  953. purge_sys.purge_queue.push(TrxUndoRsegs(trx->no, *rseg));
  954. mutex_exit(&purge_sys.pq_mutex);
  955. }
  956. }
  957. /****************************************************************//**
  958. Assign the transaction its history serialisation number and write the
  959. update UNDO log record to the assigned rollback segment. */
  960. static
  961. void
  962. trx_write_serialisation_history(
  963. /*============================*/
  964. trx_t* trx, /*!< in/out: transaction */
  965. mtr_t* mtr) /*!< in/out: mini-transaction */
  966. {
  967. /* Change the undo log segment states from TRX_UNDO_ACTIVE to some
  968. other state: these modifications to the file data structure define
  969. the transaction as committed in the file based domain, at the
  970. serialization point of the log sequence number lsn obtained below. */
  971. /* We have to hold the rseg mutex because update log headers have
  972. to be put to the history list in the (serialisation) order of the
  973. UNDO trx number. This is required for the purge in-memory data
  974. structures too. */
  975. if (trx_undo_t* undo = trx->rsegs.m_noredo.undo) {
  976. /* Undo log for temporary tables is discarded at transaction
  977. commit. There is no purge for temporary tables, and also no
  978. MVCC, because they are private to a session. */
  979. mtr_t temp_mtr;
  980. temp_mtr.start();
  981. temp_mtr.set_log_mode(MTR_LOG_NO_REDO);
  982. mutex_enter(&trx->rsegs.m_noredo.rseg->mutex);
  983. trx_undo_set_state_at_finish(undo, &temp_mtr);
  984. mutex_exit(&trx->rsegs.m_noredo.rseg->mutex);
  985. temp_mtr.commit();
  986. }
  987. trx_rseg_t* rseg = trx->rsegs.m_redo.rseg;
  988. if (!rseg) {
  989. ut_ad(!trx->rsegs.m_redo.undo);
  990. ut_ad(!trx->rsegs.m_redo.old_insert);
  991. return;
  992. }
  993. trx_undo_t*& undo = trx->rsegs.m_redo.undo;
  994. trx_undo_t*& old_insert = trx->rsegs.m_redo.old_insert;
  995. if (!undo && !old_insert) {
  996. return;
  997. }
  998. ut_ad(!trx->read_only);
  999. ut_ad(!undo || undo->rseg == rseg);
  1000. ut_ad(!old_insert || old_insert->rseg == rseg);
  1001. mutex_enter(&rseg->mutex);
  1002. /* Assign the transaction serialisation number and add any
  1003. undo log to the purge queue. */
  1004. trx_serialise(trx);
  1005. /* It is not necessary to acquire trx->undo_mutex here because
  1006. only a single OS thread is allowed to commit this transaction.
  1007. The undo logs will be processed and purged later. */
  1008. if (UNIV_LIKELY_NULL(old_insert)) {
  1009. UT_LIST_REMOVE(rseg->old_insert_list, old_insert);
  1010. trx_purge_add_undo_to_history(trx, old_insert, mtr);
  1011. }
  1012. if (undo) {
  1013. UT_LIST_REMOVE(rseg->undo_list, undo);
  1014. trx_purge_add_undo_to_history(trx, undo, mtr);
  1015. }
  1016. mutex_exit(&rseg->mutex);
  1017. MONITOR_INC(MONITOR_TRX_COMMIT_UNDO);
  1018. trx->mysql_log_file_name = NULL;
  1019. }
  1020. /********************************************************************
  1021. Finalize a transaction containing updates for a FTS table. */
  1022. static
  1023. void
  1024. trx_finalize_for_fts_table(
  1025. /*=======================*/
  1026. fts_trx_table_t* ftt) /* in: FTS trx table */
  1027. {
  1028. fts_t* fts = ftt->table->fts;
  1029. fts_doc_ids_t* doc_ids = ftt->added_doc_ids;
  1030. mutex_enter(&fts->bg_threads_mutex);
  1031. if (fts->fts_status & BG_THREAD_STOP) {
  1032. /* The table is about to be dropped, no use
  1033. adding anything to its work queue. */
  1034. mutex_exit(&fts->bg_threads_mutex);
  1035. } else {
  1036. mem_heap_t* heap;
  1037. mutex_exit(&fts->bg_threads_mutex);
  1038. ut_a(fts->add_wq);
  1039. heap = static_cast<mem_heap_t*>(doc_ids->self_heap->arg);
  1040. ib_wqueue_add(fts->add_wq, doc_ids, heap);
  1041. /* fts_trx_table_t no longer owns the list. */
  1042. ftt->added_doc_ids = NULL;
  1043. }
  1044. }
  1045. /******************************************************************//**
  1046. Finalize a transaction containing updates to FTS tables. */
  1047. static
  1048. void
  1049. trx_finalize_for_fts(
  1050. /*=================*/
  1051. trx_t* trx, /*!< in/out: transaction */
  1052. bool is_commit) /*!< in: true if the transaction was
  1053. committed, false if it was rolled back. */
  1054. {
  1055. if (is_commit) {
  1056. const ib_rbt_node_t* node;
  1057. ib_rbt_t* tables;
  1058. fts_savepoint_t* savepoint;
  1059. savepoint = static_cast<fts_savepoint_t*>(
  1060. ib_vector_last(trx->fts_trx->savepoints));
  1061. tables = savepoint->tables;
  1062. for (node = rbt_first(tables);
  1063. node;
  1064. node = rbt_next(tables, node)) {
  1065. fts_trx_table_t** ftt;
  1066. ftt = rbt_value(fts_trx_table_t*, node);
  1067. if ((*ftt)->added_doc_ids) {
  1068. trx_finalize_for_fts_table(*ftt);
  1069. }
  1070. }
  1071. }
  1072. fts_trx_free(trx->fts_trx);
  1073. trx->fts_trx = NULL;
  1074. }
  1075. /**********************************************************************//**
  1076. If required, flushes the log to disk based on the value of
  1077. innodb_flush_log_at_trx_commit. */
  1078. static
  1079. void
  1080. trx_flush_log_if_needed_low(
  1081. /*========================*/
  1082. lsn_t lsn) /*!< in: lsn up to which logs are to be
  1083. flushed. */
  1084. {
  1085. bool flush = srv_file_flush_method != SRV_NOSYNC;
  1086. switch (srv_flush_log_at_trx_commit) {
  1087. case 3:
  1088. case 2:
  1089. /* Write the log but do not flush it to disk */
  1090. flush = false;
  1091. /* fall through */
  1092. case 1:
  1093. /* Write the log and optionally flush it to disk */
  1094. log_write_up_to(lsn, flush);
  1095. return;
  1096. case 0:
  1097. /* Do nothing */
  1098. return;
  1099. }
  1100. ut_error;
  1101. }
  1102. /**********************************************************************//**
  1103. If required, flushes the log to disk based on the value of
  1104. innodb_flush_log_at_trx_commit. */
  1105. static
  1106. void
  1107. trx_flush_log_if_needed(
  1108. /*====================*/
  1109. lsn_t lsn, /*!< in: lsn up to which logs are to be
  1110. flushed. */
  1111. trx_t* trx) /*!< in/out: transaction */
  1112. {
  1113. trx->op_info = "flushing log";
  1114. trx_flush_log_if_needed_low(lsn);
  1115. trx->op_info = "";
  1116. }
  1117. /**********************************************************************//**
  1118. For each table that has been modified by the given transaction: update
  1119. its dict_table_t::update_time with the current timestamp. Clear the list
  1120. of the modified tables at the end. */
  1121. static
  1122. void
  1123. trx_update_mod_tables_timestamp(
  1124. /*============================*/
  1125. trx_t* trx) /*!< in: transaction */
  1126. {
  1127. ut_ad(trx->id != 0);
  1128. /* consider using trx->start_time if calling time() is too
  1129. expensive here */
  1130. time_t now = ut_time();
  1131. trx_mod_tables_t::const_iterator end = trx->mod_tables.end();
  1132. for (trx_mod_tables_t::const_iterator it = trx->mod_tables.begin();
  1133. it != end;
  1134. ++it) {
  1135. /* This could be executed by multiple threads concurrently
  1136. on the same table object. This is fine because time_t is
  1137. word size or less. And _purely_ _theoretically_, even if
  1138. time_t write is not atomic, likely the value of 'now' is
  1139. the same in all threads and even if it is not, getting a
  1140. "garbage" in table->update_time is justified because
  1141. protecting it with a latch here would be too performance
  1142. intrusive. */
  1143. it->first->update_time = now;
  1144. }
  1145. trx->mod_tables.clear();
  1146. }
  1147. /****************************************************************//**
  1148. Commits a transaction in memory. */
  1149. static
  1150. void
  1151. trx_commit_in_memory(
  1152. /*=================*/
  1153. trx_t* trx, /*!< in/out: transaction */
  1154. const mtr_t* mtr) /*!< in: mini-transaction of
  1155. trx_write_serialisation_history(), or NULL if
  1156. the transaction did not modify anything */
  1157. {
  1158. trx->must_flush_log_later = false;
  1159. trx->read_view.unuse();
  1160. if (trx_is_autocommit_non_locking(trx)) {
  1161. ut_ad(trx->id == 0);
  1162. ut_ad(trx->read_only);
  1163. ut_a(!trx->is_recovered);
  1164. ut_ad(trx->rsegs.m_redo.rseg == NULL);
  1165. /* Note: We are asserting without holding the lock mutex. But
  1166. that is OK because this transaction is not waiting and cannot
  1167. be rolled back and no new locks can (or should not) be added
  1168. becuase it is flagged as a non-locking read-only transaction. */
  1169. ut_a(UT_LIST_GET_LEN(trx->lock.trx_locks) == 0);
  1170. /* This state change is not protected by any mutex, therefore
  1171. there is an inherent race here around state transition during
  1172. printouts. We ignore this race for the sake of efficiency.
  1173. However, the trx_sys_t::mutex will protect the trx_t instance
  1174. and it cannot be removed from the mysql_trx_list and freed
  1175. without first acquiring the trx_sys_t::mutex. */
  1176. ut_ad(trx_state_eq(trx, TRX_STATE_ACTIVE));
  1177. MONITOR_INC(MONITOR_TRX_NL_RO_COMMIT);
  1178. /* AC-NL-RO transactions can't be rolled back asynchronously. */
  1179. ut_ad(!trx->abort);
  1180. ut_ad(!(trx->in_innodb
  1181. & (TRX_FORCE_ROLLBACK | TRX_FORCE_ROLLBACK_ASYNC)));
  1182. DBUG_LOG("trx", "Autocommit in memory: " << trx);
  1183. trx->state = TRX_STATE_NOT_STARTED;
  1184. } else {
  1185. if (trx->id > 0) {
  1186. /* For consistent snapshot, we need to remove current
  1187. transaction from rw_trx_hash before doing commit and
  1188. releasing locks. */
  1189. trx_sys.deregister_rw(trx);
  1190. }
  1191. lock_trx_release_locks(trx);
  1192. /* Remove the transaction from the list of active
  1193. transactions now that it no longer holds any user locks. */
  1194. ut_ad(trx_state_eq(trx, TRX_STATE_COMMITTED_IN_MEMORY));
  1195. DEBUG_SYNC_C("after_trx_committed_in_memory");
  1196. if (trx->read_only || trx->rsegs.m_redo.rseg == NULL) {
  1197. MONITOR_INC(MONITOR_TRX_RO_COMMIT);
  1198. } else {
  1199. trx_update_mod_tables_timestamp(trx);
  1200. MONITOR_INC(MONITOR_TRX_RW_COMMIT);
  1201. }
  1202. }
  1203. ut_ad(!trx->rsegs.m_redo.undo);
  1204. if (trx_rseg_t* rseg = trx->rsegs.m_redo.rseg) {
  1205. mutex_enter(&rseg->mutex);
  1206. ut_ad(rseg->trx_ref_count > 0);
  1207. --rseg->trx_ref_count;
  1208. mutex_exit(&rseg->mutex);
  1209. if (trx_undo_t*& insert = trx->rsegs.m_redo.old_insert) {
  1210. ut_ad(insert->rseg == rseg);
  1211. trx_undo_commit_cleanup(insert, false);
  1212. insert = NULL;
  1213. }
  1214. }
  1215. ut_ad(!trx->rsegs.m_redo.old_insert);
  1216. if (mtr != NULL) {
  1217. if (trx_undo_t*& undo = trx->rsegs.m_noredo.undo) {
  1218. ut_ad(undo->rseg == trx->rsegs.m_noredo.rseg);
  1219. trx_undo_commit_cleanup(undo, true);
  1220. undo = NULL;
  1221. }
  1222. /* NOTE that we could possibly make a group commit more
  1223. efficient here: call os_thread_yield here to allow also other
  1224. trxs to come to commit! */
  1225. /*-------------------------------------*/
  1226. /* Depending on the my.cnf options, we may now write the log
  1227. buffer to the log files, making the transaction durable if
  1228. the OS does not crash. We may also flush the log files to
  1229. disk, making the transaction durable also at an OS crash or a
  1230. power outage.
  1231. The idea in InnoDB's group commit is that a group of
  1232. transactions gather behind a trx doing a physical disk write
  1233. to log files, and when that physical write has been completed,
  1234. one of those transactions does a write which commits the whole
  1235. group. Note that this group commit will only bring benefit if
  1236. there are > 2 users in the database. Then at least 2 users can
  1237. gather behind one doing the physical log write to disk.
  1238. If we are calling trx_commit() under prepare_commit_mutex, we
  1239. will delay possible log write and flush to a separate function
  1240. trx_commit_complete_for_mysql(), which is only called when the
  1241. thread has released the mutex. This is to make the
  1242. group commit algorithm to work. Otherwise, the prepare_commit
  1243. mutex would serialize all commits and prevent a group of
  1244. transactions from gathering. */
  1245. lsn_t lsn = mtr->commit_lsn();
  1246. if (lsn == 0) {
  1247. /* Nothing to be done. */
  1248. } else if (trx->flush_log_later) {
  1249. /* Do nothing yet */
  1250. trx->must_flush_log_later = true;
  1251. } else if (srv_flush_log_at_trx_commit == 0) {
  1252. /* Do nothing */
  1253. } else {
  1254. trx_flush_log_if_needed(lsn, trx);
  1255. }
  1256. trx->commit_lsn = lsn;
  1257. /* Tell server some activity has happened, since the trx
  1258. does changes something. Background utility threads like
  1259. master thread, purge thread or page_cleaner thread might
  1260. have some work to do. */
  1261. srv_active_wake_master_thread();
  1262. }
  1263. ut_ad(!trx->rsegs.m_noredo.undo);
  1264. /* Free all savepoints, starting from the first. */
  1265. trx_named_savept_t* savep = UT_LIST_GET_FIRST(trx->trx_savepoints);
  1266. trx_roll_savepoints_free(trx, savep);
  1267. if (trx->fts_trx != NULL) {
  1268. trx_finalize_for_fts(trx, trx->undo_no != 0);
  1269. }
  1270. trx_mutex_enter(trx);
  1271. trx->dict_operation = TRX_DICT_OP_NONE;
  1272. #ifdef WITH_WSREP
  1273. if (trx->mysql_thd && wsrep_on(trx->mysql_thd)) {
  1274. trx->lock.was_chosen_as_deadlock_victim = FALSE;
  1275. }
  1276. #endif
  1277. /* Because we can rollback transactions asynchronously, we change
  1278. the state at the last step. trx_t::abort cannot change once commit
  1279. or rollback has started because we will have released the locks by
  1280. the time we get here. */
  1281. if (trx->abort) {
  1282. trx->abort = false;
  1283. DBUG_LOG("trx", "Abort: " << trx);
  1284. trx->state = TRX_STATE_FORCED_ROLLBACK;
  1285. } else {
  1286. DBUG_LOG("trx", "Commit in memory: " << trx);
  1287. trx->state = TRX_STATE_NOT_STARTED;
  1288. }
  1289. /* trx->in_mysql_trx_list would hold between
  1290. trx_allocate_for_mysql() and trx_free_for_mysql(). It does not
  1291. hold for recovered transactions or system transactions. */
  1292. assert_trx_is_free(trx);
  1293. trx_init(trx);
  1294. trx_mutex_exit(trx);
  1295. ut_a(trx->error_state == DB_SUCCESS);
  1296. srv_wake_purge_thread_if_not_active();
  1297. }
  1298. /****************************************************************//**
  1299. Commits a transaction and a mini-transaction. */
  1300. void
  1301. trx_commit_low(
  1302. /*===========*/
  1303. trx_t* trx, /*!< in/out: transaction */
  1304. mtr_t* mtr) /*!< in/out: mini-transaction (will be committed),
  1305. or NULL if trx made no modifications */
  1306. {
  1307. assert_trx_nonlocking_or_in_list(trx);
  1308. ut_ad(!trx_state_eq(trx, TRX_STATE_COMMITTED_IN_MEMORY));
  1309. ut_ad(!mtr || mtr->is_active());
  1310. ut_ad(!mtr == !trx->has_logged_or_recovered());
  1311. /* undo_no is non-zero if we're doing the final commit. */
  1312. if (trx->fts_trx != NULL && trx->undo_no != 0) {
  1313. dberr_t error;
  1314. ut_a(!trx_is_autocommit_non_locking(trx));
  1315. error = fts_commit(trx);
  1316. /* FTS-FIXME: Temporarily tolerate DB_DUPLICATE_KEY
  1317. instead of dying. This is a possible scenario if there
  1318. is a crash between insert to DELETED table committing
  1319. and transaction committing. The fix would be able to
  1320. return error from this function */
  1321. if (error != DB_SUCCESS && error != DB_DUPLICATE_KEY) {
  1322. /* FTS-FIXME: once we can return values from this
  1323. function, we should do so and signal an error
  1324. instead of just dying. */
  1325. ut_error;
  1326. }
  1327. }
  1328. if (mtr != NULL) {
  1329. mtr->set_sync();
  1330. trx_write_serialisation_history(trx, mtr);
  1331. /* The following call commits the mini-transaction, making the
  1332. whole transaction committed in the file-based world, at this
  1333. log sequence number. The transaction becomes 'durable' when
  1334. we write the log to disk, but in the logical sense the commit
  1335. in the file-based data structures (undo logs etc.) happens
  1336. here.
  1337. NOTE that transaction numbers, which are assigned only to
  1338. transactions with an update undo log, do not necessarily come
  1339. in exactly the same order as commit lsn's, if the transactions
  1340. have different rollback segments. To get exactly the same
  1341. order we should hold the kernel mutex up to this point,
  1342. adding to the contention of the kernel mutex. However, if
  1343. a transaction T2 is able to see modifications made by
  1344. a transaction T1, T2 will always get a bigger transaction
  1345. number and a bigger commit lsn than T1. */
  1346. /*--------------*/
  1347. mtr_commit(mtr);
  1348. DBUG_EXECUTE_IF("ib_crash_during_trx_commit_in_mem",
  1349. if (trx->has_logged()) {
  1350. log_make_checkpoint_at(LSN_MAX, TRUE);
  1351. DBUG_SUICIDE();
  1352. });
  1353. /*--------------*/
  1354. }
  1355. #ifndef DBUG_OFF
  1356. /* In case of this function is called from a stack executing
  1357. THD::release_resources -> ...
  1358. innobase_connection_close() ->
  1359. trx_rollback_for_mysql... -> .
  1360. mysql's thd does not seem to have
  1361. thd->debug_sync_control defined any longer. However the stack
  1362. is possible only with a prepared trx not updating any data.
  1363. */
  1364. if (trx->mysql_thd != NULL && trx->has_logged_persistent()) {
  1365. DEBUG_SYNC_C("before_trx_state_committed_in_memory");
  1366. }
  1367. #endif
  1368. trx_commit_in_memory(trx, mtr);
  1369. }
  1370. /****************************************************************//**
  1371. Commits a transaction. */
  1372. void
  1373. trx_commit(
  1374. /*=======*/
  1375. trx_t* trx) /*!< in/out: transaction */
  1376. {
  1377. mtr_t* mtr;
  1378. mtr_t local_mtr;
  1379. DBUG_EXECUTE_IF("ib_trx_commit_crash_before_trx_commit_start",
  1380. DBUG_SUICIDE(););
  1381. if (trx->has_logged_or_recovered()) {
  1382. mtr = &local_mtr;
  1383. mtr_start_sync(mtr);
  1384. } else {
  1385. mtr = NULL;
  1386. }
  1387. trx_commit_low(trx, mtr);
  1388. }
  1389. /****************************************************************//**
  1390. Prepares a transaction for commit/rollback. */
  1391. void
  1392. trx_commit_or_rollback_prepare(
  1393. /*===========================*/
  1394. trx_t* trx) /*!< in/out: transaction */
  1395. {
  1396. /* We are reading trx->state without holding trx_sys.mutex
  1397. here, because the commit or rollback should be invoked for a
  1398. running (or recovered prepared) transaction that is associated
  1399. with the current thread. */
  1400. switch (trx->state) {
  1401. case TRX_STATE_NOT_STARTED:
  1402. case TRX_STATE_FORCED_ROLLBACK:
  1403. trx_start_low(trx, true);
  1404. /* fall through */
  1405. case TRX_STATE_ACTIVE:
  1406. case TRX_STATE_PREPARED:
  1407. /* If the trx is in a lock wait state, moves the waiting
  1408. query thread to the suspended state */
  1409. if (trx->lock.que_state == TRX_QUE_LOCK_WAIT) {
  1410. ut_a(trx->lock.wait_thr != NULL);
  1411. trx->lock.wait_thr->state = QUE_THR_SUSPENDED;
  1412. trx->lock.wait_thr = NULL;
  1413. trx->lock.que_state = TRX_QUE_RUNNING;
  1414. }
  1415. ut_a(trx->lock.n_active_thrs == 1);
  1416. return;
  1417. case TRX_STATE_COMMITTED_IN_MEMORY:
  1418. break;
  1419. }
  1420. ut_error;
  1421. }
  1422. /*********************************************************************//**
  1423. Creates a commit command node struct.
  1424. @return own: commit node struct */
  1425. commit_node_t*
  1426. trx_commit_node_create(
  1427. /*===================*/
  1428. mem_heap_t* heap) /*!< in: mem heap where created */
  1429. {
  1430. commit_node_t* node;
  1431. node = static_cast<commit_node_t*>(mem_heap_alloc(heap, sizeof(*node)));
  1432. node->common.type = QUE_NODE_COMMIT;
  1433. node->state = COMMIT_NODE_SEND;
  1434. return(node);
  1435. }
  1436. /***********************************************************//**
  1437. Performs an execution step for a commit type node in a query graph.
  1438. @return query thread to run next, or NULL */
  1439. que_thr_t*
  1440. trx_commit_step(
  1441. /*============*/
  1442. que_thr_t* thr) /*!< in: query thread */
  1443. {
  1444. commit_node_t* node;
  1445. node = static_cast<commit_node_t*>(thr->run_node);
  1446. ut_ad(que_node_get_type(node) == QUE_NODE_COMMIT);
  1447. if (thr->prev_node == que_node_get_parent(node)) {
  1448. node->state = COMMIT_NODE_SEND;
  1449. }
  1450. if (node->state == COMMIT_NODE_SEND) {
  1451. trx_t* trx;
  1452. node->state = COMMIT_NODE_WAIT;
  1453. trx = thr_get_trx(thr);
  1454. ut_a(trx->lock.wait_thr == NULL);
  1455. ut_a(trx->lock.que_state != TRX_QUE_LOCK_WAIT);
  1456. trx_commit_or_rollback_prepare(trx);
  1457. trx->lock.que_state = TRX_QUE_COMMITTING;
  1458. trx_commit(trx);
  1459. ut_ad(trx->lock.wait_thr == NULL);
  1460. trx->lock.que_state = TRX_QUE_RUNNING;
  1461. thr = NULL;
  1462. } else {
  1463. ut_ad(node->state == COMMIT_NODE_WAIT);
  1464. node->state = COMMIT_NODE_SEND;
  1465. thr->run_node = que_node_get_parent(node);
  1466. }
  1467. return(thr);
  1468. }
  1469. /**********************************************************************//**
  1470. Does the transaction commit for MySQL.
  1471. @return DB_SUCCESS or error number */
  1472. dberr_t
  1473. trx_commit_for_mysql(
  1474. /*=================*/
  1475. trx_t* trx) /*!< in/out: transaction */
  1476. {
  1477. TrxInInnoDB trx_in_innodb(trx, true);
  1478. if (trx_in_innodb.is_aborted()
  1479. && trx->killed_by != os_thread_get_curr_id()) {
  1480. return(DB_FORCED_ABORT);
  1481. }
  1482. /* Because we do not do the commit by sending an Innobase
  1483. sig to the transaction, we must here make sure that trx has been
  1484. started. */
  1485. switch (trx->state) {
  1486. case TRX_STATE_NOT_STARTED:
  1487. case TRX_STATE_FORCED_ROLLBACK:
  1488. ut_d(trx->start_file = __FILE__);
  1489. ut_d(trx->start_line = __LINE__);
  1490. trx_start_low(trx, true);
  1491. /* fall through */
  1492. case TRX_STATE_ACTIVE:
  1493. case TRX_STATE_PREPARED:
  1494. trx->op_info = "committing";
  1495. trx_commit(trx);
  1496. MONITOR_DEC(MONITOR_TRX_ACTIVE);
  1497. trx->op_info = "";
  1498. return(DB_SUCCESS);
  1499. case TRX_STATE_COMMITTED_IN_MEMORY:
  1500. break;
  1501. }
  1502. ut_error;
  1503. return(DB_CORRUPTION);
  1504. }
  1505. /**********************************************************************//**
  1506. If required, flushes the log to disk if we called trx_commit_for_mysql()
  1507. with trx->flush_log_later == TRUE. */
  1508. void
  1509. trx_commit_complete_for_mysql(
  1510. /*==========================*/
  1511. trx_t* trx) /*!< in/out: transaction */
  1512. {
  1513. if (trx->id != 0
  1514. || !trx->must_flush_log_later
  1515. || (srv_flush_log_at_trx_commit == 1 && trx->active_commit_ordered)) {
  1516. return;
  1517. }
  1518. trx_flush_log_if_needed(trx->commit_lsn, trx);
  1519. trx->must_flush_log_later = false;
  1520. }
  1521. /**********************************************************************//**
  1522. Marks the latest SQL statement ended. */
  1523. void
  1524. trx_mark_sql_stat_end(
  1525. /*==================*/
  1526. trx_t* trx) /*!< in: trx handle */
  1527. {
  1528. ut_a(trx);
  1529. switch (trx->state) {
  1530. case TRX_STATE_PREPARED:
  1531. case TRX_STATE_COMMITTED_IN_MEMORY:
  1532. break;
  1533. case TRX_STATE_NOT_STARTED:
  1534. case TRX_STATE_FORCED_ROLLBACK:
  1535. trx->undo_no = 0;
  1536. trx->undo_rseg_space = 0;
  1537. /* fall through */
  1538. case TRX_STATE_ACTIVE:
  1539. trx->last_sql_stat_start.least_undo_no = trx->undo_no;
  1540. if (trx->fts_trx != NULL) {
  1541. fts_savepoint_laststmt_refresh(trx);
  1542. }
  1543. return;
  1544. }
  1545. ut_error;
  1546. }
  1547. /**********************************************************************//**
  1548. Prints info about a transaction. */
  1549. void
  1550. trx_print_low(
  1551. /*==========*/
  1552. FILE* f,
  1553. /*!< in: output stream */
  1554. const trx_t* trx,
  1555. /*!< in: transaction */
  1556. ulint max_query_len,
  1557. /*!< in: max query length to print,
  1558. or 0 to use the default max length */
  1559. ulint n_rec_locks,
  1560. /*!< in: lock_number_of_rows_locked(&trx->lock) */
  1561. ulint n_trx_locks,
  1562. /*!< in: length of trx->lock.trx_locks */
  1563. ulint heap_size)
  1564. /*!< in: mem_heap_get_size(trx->lock.lock_heap) */
  1565. {
  1566. ibool newline;
  1567. const char* op_info;
  1568. fprintf(f, "TRANSACTION " TRX_ID_FMT, trx_get_id_for_print(trx));
  1569. /* trx->state cannot change from or to NOT_STARTED while we
  1570. are holding the trx_sys.mutex. It may change from ACTIVE to
  1571. PREPARED or COMMITTED. */
  1572. switch (trx->state) {
  1573. case TRX_STATE_NOT_STARTED:
  1574. fputs(", not started", f);
  1575. goto state_ok;
  1576. case TRX_STATE_FORCED_ROLLBACK:
  1577. fputs(", forced rollback", f);
  1578. goto state_ok;
  1579. case TRX_STATE_ACTIVE:
  1580. fprintf(f, ", ACTIVE %lu sec",
  1581. (ulong) difftime(time(NULL), trx->start_time));
  1582. goto state_ok;
  1583. case TRX_STATE_PREPARED:
  1584. fprintf(f, ", ACTIVE (PREPARED) %lu sec",
  1585. (ulong) difftime(time(NULL), trx->start_time));
  1586. goto state_ok;
  1587. case TRX_STATE_COMMITTED_IN_MEMORY:
  1588. fputs(", COMMITTED IN MEMORY", f);
  1589. goto state_ok;
  1590. }
  1591. fprintf(f, ", state %lu", (ulong) trx->state);
  1592. ut_ad(0);
  1593. state_ok:
  1594. /* prevent a race condition */
  1595. op_info = trx->op_info;
  1596. if (*op_info) {
  1597. putc(' ', f);
  1598. fputs(op_info, f);
  1599. }
  1600. if (trx->is_recovered) {
  1601. fputs(" recovered trx", f);
  1602. }
  1603. if (trx->declared_to_be_inside_innodb) {
  1604. fprintf(f, ", thread declared inside InnoDB %lu",
  1605. (ulong) trx->n_tickets_to_enter_innodb);
  1606. }
  1607. putc('\n', f);
  1608. if (trx->n_mysql_tables_in_use > 0 || trx->mysql_n_tables_locked > 0) {
  1609. fprintf(f, "mysql tables in use %lu, locked %lu\n",
  1610. (ulong) trx->n_mysql_tables_in_use,
  1611. (ulong) trx->mysql_n_tables_locked);
  1612. }
  1613. newline = TRUE;
  1614. /* trx->lock.que_state of an ACTIVE transaction may change
  1615. while we are not holding trx->mutex. We perform a dirty read
  1616. for performance reasons. */
  1617. switch (trx->lock.que_state) {
  1618. case TRX_QUE_RUNNING:
  1619. newline = FALSE; break;
  1620. case TRX_QUE_LOCK_WAIT:
  1621. fputs("LOCK WAIT ", f); break;
  1622. case TRX_QUE_ROLLING_BACK:
  1623. fputs("ROLLING BACK ", f); break;
  1624. case TRX_QUE_COMMITTING:
  1625. fputs("COMMITTING ", f); break;
  1626. default:
  1627. fprintf(f, "que state %lu ", (ulong) trx->lock.que_state);
  1628. }
  1629. if (n_trx_locks > 0 || heap_size > 400) {
  1630. newline = TRUE;
  1631. fprintf(f, "%lu lock struct(s), heap size %lu,"
  1632. " %lu row lock(s)",
  1633. (ulong) n_trx_locks,
  1634. (ulong) heap_size,
  1635. (ulong) n_rec_locks);
  1636. }
  1637. if (trx->undo_no != 0) {
  1638. newline = TRUE;
  1639. fprintf(f, ", undo log entries " TRX_ID_FMT, trx->undo_no);
  1640. }
  1641. if (newline) {
  1642. putc('\n', f);
  1643. }
  1644. if (trx->state != TRX_STATE_NOT_STARTED && trx->mysql_thd != NULL) {
  1645. innobase_mysql_print_thd(
  1646. f, trx->mysql_thd, static_cast<uint>(max_query_len));
  1647. }
  1648. }
  1649. /**********************************************************************//**
  1650. Prints info about a transaction.
  1651. The caller must hold lock_sys->mutex.
  1652. When possible, use trx_print() instead. */
  1653. void
  1654. trx_print_latched(
  1655. /*==============*/
  1656. FILE* f, /*!< in: output stream */
  1657. const trx_t* trx, /*!< in: transaction */
  1658. ulint max_query_len) /*!< in: max query length to print,
  1659. or 0 to use the default max length */
  1660. {
  1661. ut_ad(lock_mutex_own());
  1662. trx_print_low(f, trx, max_query_len,
  1663. lock_number_of_rows_locked(&trx->lock),
  1664. UT_LIST_GET_LEN(trx->lock.trx_locks),
  1665. mem_heap_get_size(trx->lock.lock_heap));
  1666. }
  1667. /**********************************************************************//**
  1668. Prints info about a transaction.
  1669. Acquires and releases lock_sys->mutex. */
  1670. void
  1671. trx_print(
  1672. /*======*/
  1673. FILE* f, /*!< in: output stream */
  1674. const trx_t* trx, /*!< in: transaction */
  1675. ulint max_query_len) /*!< in: max query length to print,
  1676. or 0 to use the default max length */
  1677. {
  1678. ulint n_rec_locks;
  1679. ulint n_trx_locks;
  1680. ulint heap_size;
  1681. lock_mutex_enter();
  1682. n_rec_locks = lock_number_of_rows_locked(&trx->lock);
  1683. n_trx_locks = UT_LIST_GET_LEN(trx->lock.trx_locks);
  1684. heap_size = mem_heap_get_size(trx->lock.lock_heap);
  1685. lock_mutex_exit();
  1686. trx_print_low(f, trx, max_query_len,
  1687. n_rec_locks, n_trx_locks, heap_size);
  1688. }
  1689. #ifdef UNIV_DEBUG
  1690. /**********************************************************************//**
  1691. Asserts that a transaction has been started.
  1692. The caller must hold trx_sys.mutex.
  1693. @return TRUE if started */
  1694. ibool
  1695. trx_assert_started(
  1696. /*===============*/
  1697. const trx_t* trx) /*!< in: transaction */
  1698. {
  1699. ut_ad(mutex_own(&trx_sys.mutex));
  1700. /* Non-locking autocommits should not hold any locks and this
  1701. function is only called from the locking code. */
  1702. check_trx_state(trx);
  1703. /* trx->state can change from or to NOT_STARTED while we are holding
  1704. trx_sys.mutex for non-locking autocommit selects but not for other
  1705. types of transactions. It may change from ACTIVE to PREPARED. Unless
  1706. we are holding lock_sys->mutex, it may also change to COMMITTED. */
  1707. switch (trx->state) {
  1708. case TRX_STATE_PREPARED:
  1709. return(TRUE);
  1710. case TRX_STATE_ACTIVE:
  1711. case TRX_STATE_COMMITTED_IN_MEMORY:
  1712. return(TRUE);
  1713. case TRX_STATE_NOT_STARTED:
  1714. case TRX_STATE_FORCED_ROLLBACK:
  1715. break;
  1716. }
  1717. ut_error;
  1718. return(FALSE);
  1719. }
  1720. #endif /* UNIV_DEBUG */
  1721. /*******************************************************************//**
  1722. Compares the "weight" (or size) of two transactions. Transactions that
  1723. have edited non-transactional tables are considered heavier than ones
  1724. that have not.
  1725. @return TRUE if weight(a) >= weight(b) */
  1726. bool
  1727. trx_weight_ge(
  1728. /*==========*/
  1729. const trx_t* a, /*!< in: transaction to be compared */
  1730. const trx_t* b) /*!< in: transaction to be compared */
  1731. {
  1732. ibool a_notrans_edit;
  1733. ibool b_notrans_edit;
  1734. /* If mysql_thd is NULL for a transaction we assume that it has
  1735. not edited non-transactional tables. */
  1736. a_notrans_edit = a->mysql_thd != NULL
  1737. && thd_has_edited_nontrans_tables(a->mysql_thd);
  1738. b_notrans_edit = b->mysql_thd != NULL
  1739. && thd_has_edited_nontrans_tables(b->mysql_thd);
  1740. if (a_notrans_edit != b_notrans_edit) {
  1741. return(a_notrans_edit);
  1742. }
  1743. /* Either both had edited non-transactional tables or both had
  1744. not, we fall back to comparing the number of altered/locked
  1745. rows. */
  1746. return(TRX_WEIGHT(a) >= TRX_WEIGHT(b));
  1747. }
  1748. /** Prepare a transaction.
  1749. @return log sequence number that makes the XA PREPARE durable
  1750. @retval 0 if no changes needed to be made durable */
  1751. static
  1752. lsn_t
  1753. trx_prepare_low(trx_t* trx)
  1754. {
  1755. ut_ad(!trx->rsegs.m_redo.old_insert);
  1756. ut_ad(!trx->is_recovered);
  1757. mtr_t mtr;
  1758. /* It is not necessary to acquire trx->undo_mutex here because
  1759. only the owning (connection) thread of the transaction is
  1760. allowed to perform XA PREPARE. */
  1761. if (trx_undo_t* undo = trx->rsegs.m_noredo.undo) {
  1762. ut_ad(undo->rseg == trx->rsegs.m_noredo.rseg);
  1763. mtr.start();
  1764. mtr.set_log_mode(MTR_LOG_NO_REDO);
  1765. mutex_enter(&undo->rseg->mutex);
  1766. trx_undo_set_state_at_prepare(trx, undo, false, &mtr);
  1767. mutex_exit(&undo->rseg->mutex);
  1768. mtr.commit();
  1769. }
  1770. trx_undo_t* undo = trx->rsegs.m_redo.undo;
  1771. if (!undo) {
  1772. /* There were no changes to persistent tables. */
  1773. return(0);
  1774. }
  1775. trx_rseg_t* rseg = trx->rsegs.m_redo.rseg;
  1776. ut_ad(undo->rseg == rseg);
  1777. mtr.start(true);
  1778. /* Change the undo log segment states from TRX_UNDO_ACTIVE to
  1779. TRX_UNDO_PREPARED: these modifications to the file data
  1780. structure define the transaction as prepared in the file-based
  1781. world, at the serialization point of lsn. */
  1782. mutex_enter(&rseg->mutex);
  1783. trx_undo_set_state_at_prepare(trx, undo, false, &mtr);
  1784. mutex_exit(&rseg->mutex);
  1785. /* Make the XA PREPARE durable. */
  1786. mtr.commit();
  1787. ut_ad(mtr.commit_lsn() > 0);
  1788. return(mtr.commit_lsn());
  1789. }
  1790. /****************************************************************//**
  1791. Prepares a transaction. */
  1792. static
  1793. void
  1794. trx_prepare(
  1795. /*========*/
  1796. trx_t* trx) /*!< in/out: transaction */
  1797. {
  1798. /* This transaction has crossed the point of no return and cannot
  1799. be rolled back asynchronously now. It must commit or rollback
  1800. synhronously. */
  1801. /* Only fresh user transactions can be prepared.
  1802. Recovered transactions cannot. */
  1803. ut_a(!trx->is_recovered);
  1804. lsn_t lsn = trx_prepare_low(trx);
  1805. DBUG_EXECUTE_IF("ib_trx_crash_during_xa_prepare_step", DBUG_SUICIDE(););
  1806. ut_a(trx->state == TRX_STATE_ACTIVE);
  1807. trx_mutex_enter(trx);
  1808. trx->state = TRX_STATE_PREPARED;
  1809. trx_mutex_exit(trx);
  1810. if (lsn) {
  1811. /* Depending on the my.cnf options, we may now write the log
  1812. buffer to the log files, making the prepared state of the
  1813. transaction durable if the OS does not crash. We may also
  1814. flush the log files to disk, making the prepared state of the
  1815. transaction durable also at an OS crash or a power outage.
  1816. The idea in InnoDB's group prepare is that a group of
  1817. transactions gather behind a trx doing a physical disk write
  1818. to log files, and when that physical write has been completed,
  1819. one of those transactions does a write which prepares the whole
  1820. group. Note that this group prepare will only bring benefit if
  1821. there are > 2 users in the database. Then at least 2 users can
  1822. gather behind one doing the physical log write to disk.
  1823. We must not be holding any mutexes or latches here. */
  1824. trx_flush_log_if_needed(lsn, trx);
  1825. }
  1826. }
  1827. /**
  1828. Does the transaction prepare for MySQL.
  1829. @param[in, out] trx Transaction instance to prepare */
  1830. dberr_t
  1831. trx_prepare_for_mysql(trx_t* trx)
  1832. {
  1833. trx_start_if_not_started_xa(trx, false);
  1834. TrxInInnoDB trx_in_innodb(trx, true);
  1835. if (trx_in_innodb.is_aborted()
  1836. && trx->killed_by != os_thread_get_curr_id()) {
  1837. return(DB_FORCED_ABORT);
  1838. }
  1839. trx->op_info = "preparing";
  1840. trx_prepare(trx);
  1841. trx->op_info = "";
  1842. return(DB_SUCCESS);
  1843. }
  1844. struct trx_recover_for_mysql_callback_arg
  1845. {
  1846. XID *xid_list;
  1847. uint len;
  1848. uint count;
  1849. };
  1850. static my_bool trx_recover_for_mysql_callback(rw_trx_hash_element_t *element,
  1851. trx_recover_for_mysql_callback_arg *arg)
  1852. {
  1853. mutex_enter(&element->mutex);
  1854. if (trx_t *trx= element->trx)
  1855. {
  1856. /*
  1857. The state of a read-write transaction can only change from ACTIVE to
  1858. PREPARED while we are holding the element->mutex. But since it is
  1859. executed at startup no state change should occur.
  1860. */
  1861. if (trx_state_eq(trx, TRX_STATE_PREPARED))
  1862. {
  1863. ut_ad(trx->is_recovered);
  1864. if (arg->count == 0)
  1865. ib::info() << "Starting recovery for XA transactions...";
  1866. ib::info() << "Transaction " << trx_get_id_for_print(trx)
  1867. << " in prepared state after recovery";
  1868. ib::info() << "Transaction contains changes to " << trx->undo_no
  1869. << " rows";
  1870. arg->xid_list[arg->count++]= *trx->xid;
  1871. }
  1872. }
  1873. mutex_exit(&element->mutex);
  1874. return arg->count == arg->len;
  1875. }
  1876. /**
  1877. Find prepared transaction objects for recovery.
  1878. @param[out] xid_list prepared transactions
  1879. @param[in] len number of slots in xid_list
  1880. @return number of prepared transactions stored in xid_list
  1881. */
  1882. int trx_recover_for_mysql(XID *xid_list, uint len)
  1883. {
  1884. trx_recover_for_mysql_callback_arg arg= { xid_list, len, 0 };
  1885. ut_ad(xid_list);
  1886. ut_ad(len);
  1887. /* Fill xid_list with PREPARED transactions. */
  1888. trx_sys.rw_trx_hash.iterate_no_dups(reinterpret_cast<my_hash_walk_action>
  1889. (trx_recover_for_mysql_callback), &arg);
  1890. if (arg.count)
  1891. ib::info() << arg.count
  1892. << " transactions in prepared state after recovery";
  1893. return(arg.count);
  1894. }
  1895. struct trx_get_trx_by_xid_callback_arg
  1896. {
  1897. XID *xid;
  1898. trx_t *trx;
  1899. };
  1900. static my_bool trx_get_trx_by_xid_callback(rw_trx_hash_element_t *element,
  1901. trx_get_trx_by_xid_callback_arg *arg)
  1902. {
  1903. my_bool found= 0;
  1904. mutex_enter(&element->mutex);
  1905. if (trx_t *trx= element->trx)
  1906. {
  1907. if (trx->is_recovered && trx_state_eq(trx, TRX_STATE_PREPARED) &&
  1908. arg->xid->eq(reinterpret_cast<XID*>(trx->xid)))
  1909. {
  1910. /* Invalidate the XID, so that subsequent calls will not find it. */
  1911. trx->xid->null();
  1912. arg->trx= trx;
  1913. found= 1;
  1914. }
  1915. }
  1916. mutex_exit(&element->mutex);
  1917. return found;
  1918. }
  1919. /**
  1920. Finds PREPARED XA transaction by xid.
  1921. trx may have been committed, unless the caller is holding lock_sys->mutex.
  1922. @param[in] xid X/Open XA transaction identifier
  1923. @return trx or NULL; on match, the trx->xid will be invalidated;
  1924. */
  1925. trx_t *trx_get_trx_by_xid(XID *xid)
  1926. {
  1927. trx_get_trx_by_xid_callback_arg arg= { xid, 0 };
  1928. if (xid)
  1929. trx_sys.rw_trx_hash.iterate(reinterpret_cast<my_hash_walk_action>
  1930. (trx_get_trx_by_xid_callback), &arg);
  1931. return arg.trx;
  1932. }
  1933. /*************************************************************//**
  1934. Starts the transaction if it is not yet started. */
  1935. void
  1936. trx_start_if_not_started_xa_low(
  1937. /*============================*/
  1938. trx_t* trx, /*!< in/out: transaction */
  1939. bool read_write) /*!< in: true if read write transaction */
  1940. {
  1941. switch (trx->state) {
  1942. case TRX_STATE_NOT_STARTED:
  1943. case TRX_STATE_FORCED_ROLLBACK:
  1944. trx_start_low(trx, read_write);
  1945. return;
  1946. case TRX_STATE_ACTIVE:
  1947. if (trx->id == 0 && read_write) {
  1948. /* If the transaction is tagged as read-only then
  1949. it can only write to temp tables and for such
  1950. transactions we don't want to move them to the
  1951. trx_sys_t::rw_trx_hash. */
  1952. if (!trx->read_only) {
  1953. trx_set_rw_mode(trx);
  1954. }
  1955. }
  1956. return;
  1957. case TRX_STATE_PREPARED:
  1958. case TRX_STATE_COMMITTED_IN_MEMORY:
  1959. break;
  1960. }
  1961. ut_error;
  1962. }
  1963. /*************************************************************//**
  1964. Starts the transaction if it is not yet started. */
  1965. void
  1966. trx_start_if_not_started_low(
  1967. /*==========================*/
  1968. trx_t* trx, /*!< in: transaction */
  1969. bool read_write) /*!< in: true if read write transaction */
  1970. {
  1971. switch (trx->state) {
  1972. case TRX_STATE_NOT_STARTED:
  1973. case TRX_STATE_FORCED_ROLLBACK:
  1974. trx_start_low(trx, read_write);
  1975. return;
  1976. case TRX_STATE_ACTIVE:
  1977. if (read_write && trx->id == 0 && !trx->read_only) {
  1978. trx_set_rw_mode(trx);
  1979. }
  1980. return;
  1981. case TRX_STATE_PREPARED:
  1982. case TRX_STATE_COMMITTED_IN_MEMORY:
  1983. break;
  1984. }
  1985. ut_error;
  1986. }
  1987. /*************************************************************//**
  1988. Starts a transaction for internal processing. */
  1989. void
  1990. trx_start_internal_low(
  1991. /*===================*/
  1992. trx_t* trx) /*!< in/out: transaction */
  1993. {
  1994. /* Ensure it is not flagged as an auto-commit-non-locking
  1995. transaction. */
  1996. trx->will_lock = 1;
  1997. trx->internal = true;
  1998. trx_start_low(trx, true);
  1999. }
  2000. /** Starts a read-only transaction for internal processing.
  2001. @param[in,out] trx transaction to be started */
  2002. void
  2003. trx_start_internal_read_only_low(
  2004. trx_t* trx)
  2005. {
  2006. /* Ensure it is not flagged as an auto-commit-non-locking
  2007. transaction. */
  2008. trx->will_lock = 1;
  2009. trx->internal = true;
  2010. trx_start_low(trx, false);
  2011. }
  2012. /*************************************************************//**
  2013. Starts the transaction for a DDL operation. */
  2014. void
  2015. trx_start_for_ddl_low(
  2016. /*==================*/
  2017. trx_t* trx, /*!< in/out: transaction */
  2018. trx_dict_op_t op) /*!< in: dictionary operation type */
  2019. {
  2020. switch (trx->state) {
  2021. case TRX_STATE_NOT_STARTED:
  2022. case TRX_STATE_FORCED_ROLLBACK:
  2023. /* Flag this transaction as a dictionary operation, so that
  2024. the data dictionary will be locked in crash recovery. */
  2025. trx_set_dict_operation(trx, op);
  2026. /* Ensure it is not flagged as an auto-commit-non-locking
  2027. transation. */
  2028. trx->will_lock = 1;
  2029. trx->ddl= true;
  2030. trx_start_internal_low(trx);
  2031. return;
  2032. case TRX_STATE_ACTIVE:
  2033. case TRX_STATE_PREPARED:
  2034. case TRX_STATE_COMMITTED_IN_MEMORY:
  2035. break;
  2036. }
  2037. ut_error;
  2038. }
  2039. /*************************************************************//**
  2040. Set the transaction as a read-write transaction if it is not already
  2041. tagged as such. Read-only transactions that are writing to temporary
  2042. tables are assigned an ID and a rollback segment but are not added
  2043. to the trx read-write list because their updates should not be visible
  2044. to other transactions and therefore their changes can be ignored by
  2045. by MVCC. */
  2046. void
  2047. trx_set_rw_mode(
  2048. /*============*/
  2049. trx_t* trx) /*!< in/out: transaction that is RW */
  2050. {
  2051. ut_ad(trx->rsegs.m_redo.rseg == 0);
  2052. ut_ad(!trx_is_autocommit_non_locking(trx));
  2053. ut_ad(!trx->read_only);
  2054. ut_ad(trx->id == 0);
  2055. if (high_level_read_only) {
  2056. return;
  2057. }
  2058. /* Function is promoting existing trx from ro mode to rw mode.
  2059. In this process it has acquired trx_sys.mutex as it plan to
  2060. move trx from ro list to rw list. If in future, some other thread
  2061. looks at this trx object while it is being promoted then ensure
  2062. that both threads are synced by acquring trx->mutex to avoid decision
  2063. based on in-consistent view formed during promotion. */
  2064. trx->rsegs.m_redo.rseg = trx_assign_rseg_low();
  2065. ut_ad(trx->rsegs.m_redo.rseg != 0);
  2066. trx_sys.register_rw(trx);
  2067. /* So that we can see our own changes. */
  2068. if (trx->read_view.is_open()) {
  2069. trx->read_view.set_creator_trx_id(trx->id);
  2070. }
  2071. }
  2072. /**
  2073. Kill all transactions that are blocking this transaction from acquiring locks.
  2074. @param[in,out] trx High priority transaction */
  2075. void
  2076. trx_kill_blocking(trx_t* trx)
  2077. {
  2078. if (trx->hit_list.empty()) {
  2079. return;
  2080. }
  2081. DEBUG_SYNC_C("trx_kill_blocking_enter");
  2082. ulint had_dict_lock = trx->dict_operation_lock_mode;
  2083. switch (had_dict_lock) {
  2084. case 0:
  2085. break;
  2086. case RW_S_LATCH:
  2087. /* Release foreign key check latch */
  2088. row_mysql_unfreeze_data_dictionary(trx);
  2089. break;
  2090. default:
  2091. /* There should never be a lock wait when the
  2092. dictionary latch is reserved in X mode. Dictionary
  2093. transactions should only acquire locks on dictionary
  2094. tables, not other tables. All access to dictionary
  2095. tables should be covered by dictionary
  2096. transactions. */
  2097. ut_error;
  2098. }
  2099. ut_a(trx->dict_operation_lock_mode == 0);
  2100. /** Kill the transactions in the lock acquisition order old -> new. */
  2101. hit_list_t::reverse_iterator end = trx->hit_list.rend();
  2102. for (hit_list_t::reverse_iterator it = trx->hit_list.rbegin();
  2103. it != end;
  2104. ++it) {
  2105. trx_t* victim_trx = it->m_trx;
  2106. ulint version = it->m_version;
  2107. /* Shouldn't commit suicide. */
  2108. ut_ad(victim_trx != trx);
  2109. ut_ad(victim_trx->mysql_thd != trx->mysql_thd);
  2110. /* Check that the transaction isn't active inside
  2111. InnoDB code. We have to wait while it is executing
  2112. in the InnoDB context. This can potentially take a
  2113. long time */
  2114. trx_mutex_enter(victim_trx);
  2115. ut_ad(version <= victim_trx->version);
  2116. ulint loop_count = 0;
  2117. /* start with optimistic sleep time of 20 micro seconds. */
  2118. ulint sleep_time = 20;
  2119. while ((victim_trx->in_innodb & TRX_FORCE_ROLLBACK_MASK) > 0
  2120. && victim_trx->version == version) {
  2121. trx_mutex_exit(victim_trx);
  2122. loop_count++;
  2123. /* If the wait is long, don't hog the cpu. */
  2124. if (loop_count < 100) {
  2125. /* 20 microseconds */
  2126. sleep_time = 20;
  2127. } else if (loop_count < 1000) {
  2128. /* 1 millisecond */
  2129. sleep_time = 1000;
  2130. } else {
  2131. /* 100 milliseconds */
  2132. sleep_time = 100000;
  2133. }
  2134. os_thread_sleep(sleep_time);
  2135. trx_mutex_enter(victim_trx);
  2136. }
  2137. /* Compare the version to check if the transaction has
  2138. already finished */
  2139. if (victim_trx->version != version) {
  2140. trx_mutex_exit(victim_trx);
  2141. continue;
  2142. }
  2143. /* We should never kill background transactions. */
  2144. ut_ad(victim_trx->mysql_thd != NULL);
  2145. ut_ad(!(trx->in_innodb & TRX_FORCE_ROLLBACK_DISABLE));
  2146. ut_ad(victim_trx->in_innodb & TRX_FORCE_ROLLBACK);
  2147. ut_ad(victim_trx->in_innodb & TRX_FORCE_ROLLBACK_ASYNC);
  2148. ut_ad(victim_trx->killed_by == os_thread_get_curr_id());
  2149. ut_ad(victim_trx->version == it->m_version);
  2150. /* We don't kill Read Only, Background or high priority
  2151. transactions. */
  2152. ut_a(!victim_trx->read_only);
  2153. ut_a(victim_trx->mysql_thd != NULL);
  2154. trx_mutex_exit(victim_trx);
  2155. #ifndef DBUG_OFF
  2156. char buffer[1024];
  2157. #endif /* !DBUG_OFF */
  2158. DBUG_LOG("trx",
  2159. "High Priority Transaction "
  2160. << trx->id << " killed transaction "
  2161. << victim_trx->id << " in hit list"
  2162. << " - "
  2163. << thd_get_error_context_description(
  2164. victim_trx->mysql_thd,
  2165. buffer, sizeof(buffer), 512));
  2166. trx_rollback_for_mysql(victim_trx);
  2167. trx_mutex_enter(victim_trx);
  2168. version++;
  2169. ut_ad(victim_trx->version == version);
  2170. my_atomic_storelong(&victim_trx->killed_by, 0);
  2171. victim_trx->in_innodb &= TRX_FORCE_ROLLBACK_MASK;
  2172. trx_mutex_exit(victim_trx);
  2173. }
  2174. trx->hit_list.clear();
  2175. if (had_dict_lock) {
  2176. row_mysql_freeze_data_dictionary(trx);
  2177. }
  2178. }