
4287 lines
116 KiB

12 years ago
MDEV-13564 Mariabackup does not work with TRUNCATE

Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP.

Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name.

In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails.

ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove.

row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition-like scenarios. The test innodb-innodb.truncate does not use any synchronization.

We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations.

== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similarly to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed.

recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
7 years ago
MDEV-12699 Improve crash recovery of corrupted data pages

InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.

Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
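
The idea behind RECV_WILL_NOT_READ and the init_lsn parameter can be sketched in isolation. The following is a simplified model under stated assumptions, not the InnoDB implementation: rec_type, page_state, page_plan, plan_page() and count_applicable() are hypothetical names, and MLOG_INDEX_LOAD handling as well as the FTS_ exception are left out. It shows that a page with a buffered initialization record need not be read from the data file, and that only records at or after the latest initialization LSN have to be applied.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical record classification: a record either fully initializes
// the page (in the spirit of MLOG_INIT_FILE_PAGE2) or modifies it.
enum class rec_type { INIT_PAGE, OTHER };

// Hypothetical counterpart of the recovery states named in the text.
enum class page_state { NOT_PROCESSED, WILL_NOT_READ };

struct log_rec {
    lsn_t lsn;
    rec_type type;
};

struct page_plan {
    page_state state;  // WILL_NOT_READ if the page is rebuilt from the log
    lsn_t init_lsn;    // LSN of the latest initialization record, or 0
};

// Decide whether the page must be read from the data file.
page_plan plan_page(const std::vector<log_rec>& recs)
{
    page_plan plan{page_state::NOT_PROCESSED, 0};
    for (const log_rec& r : recs) {
        if (r.type == rec_type::INIT_PAGE && r.lsn >= plan.init_lsn) {
            plan.state = page_state::WILL_NOT_READ;
            plan.init_lsn = r.lsn;
        }
    }
    return plan;
}

// Count the records that still need to be applied: everything that
// precedes the page initialization is ignored.
std::size_t count_applicable(const std::vector<log_rec>& recs,
                             const page_plan& plan)
{
    std::size_t n = 0;
    for (const log_rec& r : recs) {
        if (r.lsn >= plan.init_lsn) {
            ++n;
        }
    }
    return n;
}
```

Because a page in the WILL_NOT_READ state is never read, corruption in its old on-disk contents cannot make recovery fail, which is the motivation given at the start of this commit message.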
7 years ago
MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). 
Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. == Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. 
Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. 
If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
7 years ago
Shut down InnoDB after aborted startup.

This fixes memory leaks in tests that cause InnoDB startup to fail.

buf_pool_free_instance(): Also free buf_pool->flush_rbt, which would normally be freed when crash recovery finishes.

fil_node_close_file(), fil_space_free_low(), fil_close_all_files(): Relax some debug assertions to tolerate !srv_was_started.

innodb_shutdown(): Renamed from innobase_shutdown_for_mysql(). Changed the return type to void. Do not assume that all subsystems were started.

que_init(), que_close(): Remove (empty functions).

srv_init(), srv_general_init(): Remove as global functions.

srv_free(): Allow srv_sys=NULL.

srv_get_active_thread_type(): Only return SRV_PURGE if purge really is running.

srv_shutdown_all_bg_threads(): Do not reset srv_start_state. It will be needed by innodb_shutdown().

innobase_start_or_create_for_mysql(): Always call srv_boot() so that innodb_shutdown() can assume that it was called. Make more subsystems dependent on SRV_START_STATE_STAT.

srv_shutdown_bg_undo_sources(): Require SRV_START_STATE_STAT.

trx_sys_close(): Do not assume purge_sys!=NULL. Do not call buf_dblwr_free(), because the doublewrite buffer can exist while the transaction system does not.

logs_empty_and_mark_files_at_shutdown(): Do a faster shutdown if !srv_was_started.

recv_sys_close(): Invoke dblwr.pages.clear() which would normally be invoked by buf_dblwr_process().

recv_recovery_from_checkpoint_start(): Always release log_sys->mutex.

row_mysql_close(): Allow the subsystem not to exist.
9 years ago
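The idea behind this change, that shutdown must consult the recorded start state instead of assuming every subsystem was started, can be illustrated with a small sketch. The flag names, `Engine`, `engine_start()` and `engine_shutdown()` are hypothetical and only loosely modelled on srv_start_state; they are not InnoDB's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical start-state bits (not InnoDB's actual flags).
enum StartState : std::uint32_t {
    STATE_NONE = 0,
    STATE_BUFFER_POOL = 1u << 0,
    STATE_REDO_LOG = 1u << 1,
    STATE_PURGE = 1u << 2
};

struct Engine {
    std::uint32_t start_state = STATE_NONE;
    std::vector<std::string> closed;  // which subsystems shutdown released
};

// Start subsystems in order. Startup fails (returns false) just before the
// subsystem named by `fail_before`; the state bits set so far are kept so
// that shutdown can see what needs to be released.
bool engine_start(Engine& e, std::uint32_t fail_before)
{
    const StartState order[] = {STATE_BUFFER_POOL, STATE_REDO_LOG, STATE_PURGE};
    for (StartState s : order) {
        if (s == fail_before) {
            return false;  // simulated startup failure
        }
        e.start_state |= s;
    }
    return true;
}

// Release only what was actually started, in reverse order.
void engine_shutdown(Engine& e)
{
    if (e.start_state & STATE_PURGE) e.closed.push_back("purge");
    if (e.start_state & STATE_REDO_LOG) e.closed.push_back("redo_log");
    if (e.start_state & STATE_BUFFER_POOL) e.closed.push_back("buffer_pool");
    e.start_state = STATE_NONE;
}
```

Calling `engine_shutdown()` after a failed `engine_start()` releases just the subsystems whose bits were set, which is the property the commit relies on to avoid leaks after an aborted startup.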
MDEV-12699 Improve crash recovery of corrupted data pages

InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.

Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
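The core decision in MDEV-12699 (do not read a page from disk if the buffered redo log will initialize it anyway, and ignore records that precede the initialization) might be sketched like this. `Rec`, `PageState`, `PagePlan` and `plan_page_recovery()` are hypothetical names chosen for illustration; the actual logic is spread across recv_add_to_hash_table(), mlog_init and recv_recover_page() and handles more cases, such as MLOG_INDEX_LOAD.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified redo record for one page.
struct Rec {
    std::uint64_t lsn;
    bool is_page_init;  // true for a record that fully initializes the page
};

// Mirrors the idea of RECV_NOT_PROCESSED versus RECV_WILL_NOT_READ.
enum class PageState { NotProcessed, WillNotRead };

struct PagePlan {
    PageState state;
    std::uint64_t init_lsn;     // LSN of the latest init record, 0 if none
    std::vector<Rec> to_apply;  // records that still need to be applied
};

// Decide whether the page must be read from disk, and which buffered
// records to apply. Records older than the latest page initialization
// are skipped, because the initialization overwrites their effect.
PagePlan plan_page_recovery(const std::vector<Rec>& recs)
{
    PagePlan plan{PageState::NotProcessed, 0, {}};
    for (const Rec& r : recs) {
        if (r.is_page_init && r.lsn >= plan.init_lsn) {
            plan.state = PageState::WillNotRead;
            plan.init_lsn = r.lsn;
        }
    }
    for (const Rec& r : recs) {
        if (r.lsn >= plan.init_lsn) {
            plan.to_apply.push_back(r);
        }
    }
    return plan;
}
```

With records at LSN 10, 20 (initialization) and 30, the plan skips the disk read and applies only the records at LSN 20 and 30, so a corrupted on-disk copy of the page no longer matters.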
MDEV-11782: Redefine the innodb_encrypt_log format

Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from.

Issue notes, not warning messages for rewriting the redo log files.

recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql().

Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption.

LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format.

log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format.

recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.

srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more.

innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes.

innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums.

log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting.

Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format).
9 years ago
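The ordering that MDEV-11782 describes (encrypt everything except the 4-byte block number, then checksum the encrypted bytes) can be sketched as below. Loud assumptions: `toy_crypt()` is a placeholder XOR keystream and is not the cipher InnoDB uses; `HDR_NO_SIZE`, the big-endian header read and both function names are hypothetical. Only the CRC-32C algorithm itself is standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Size of the unencrypted block-number field at the start of a block
// (stands in for LOG_BLOCK_HDR_NO in this sketch).
const std::size_t HDR_NO_SIZE = 4;

// Placeholder cipher: XOR the block body with a keystream seeded from the
// key and the block number. NOT real encryption; it only shows that the
// header stays readable and that the per-block seed depends on it.
// Applying it twice with the same key restores the original block.
void toy_crypt(std::vector<std::uint8_t>& block, std::uint32_t key)
{
    assert(block.size() >= HDR_NO_SIZE);
    // Read the block number, interpreted here as big-endian.
    std::uint32_t block_no = (std::uint32_t(block[0]) << 24)
        | (std::uint32_t(block[1]) << 16)
        | (std::uint32_t(block[2]) << 8) | std::uint32_t(block[3]);
    std::uint32_t state = key ^ block_no;
    for (std::size_t i = HDR_NO_SIZE; i < block.size(); ++i) {
        state = state * 1664525u + 1013904223u;  // simple LCG keystream
        block[i] ^= static_cast<std::uint8_t>(state >> 24);
    }
}
```

Because the checksum is taken over the encrypted bytes, a reader without the key (for example a backup tool copying the log) can still call `crc32c()` on a block to detect a damaged or incomplete write.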
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-11782: Redefine the innodb_encrypt_log format Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
MDEV-12103 Reduce the time of looking for MLOG_CHECKPOINT during crash recovery This fixes MySQL Bug#80788 in MariaDB 10.2.5. When I made the InnoDB crash recovery more robust by implementing WL#7142, I also introduced an extra redo log scan pass that can be shortened. This fix will slightly extend the InnoDB redo log format that I introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT mini-transaction to the end of the log checkpoint page, so that recovery can jump straight to it without scanning all the preceding redo log. LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were written as 0. log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter end_lsn for writing LOG_CHECKPOINT_END_LSN. log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT mini-transaction is starting (or at which the redo log ends on shutdown). recv_init_crash_recovery(): Remove. recv_group_scan_log_recs(): Add the parameter checkpoint_lsn. recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN and if it is set, start the first scan from it instead of the checkpoint LSN. Improve some messages and remove bogus assertions. recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some file-level redo log records. recv_parse_or_apply_log_rec_body(): If we have not parsed all redo log between the checkpoint and the corresponding MLOG_CHECKPOINT record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records to recv_init_crash_recovery_spaces(). recv_init_crash_recovery_spaces(): Refuse recovery if MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
9 years ago
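The choice of the first scan position described for recv_recovery_from_checkpoint_start() can be sketched as below. This is an illustration only: `CheckpointInfo` and `first_scan_start` are hypothetical names, and the sketch assumes that a stored value of 0 means "not set" (the message says these bytes were previously written as 0) and that a set value is never below the checkpoint LSN.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = uint64_t;

// Hypothetical in-memory view of two fields read from the checkpoint page.
struct CheckpointInfo {
  lsn_t checkpoint_lsn;  // the checkpoint LSN
  lsn_t end_lsn;         // LOG_CHECKPOINT_END_LSN; 0 if the writer left it unset
};

// Pick where the first redo log scan should begin.
inline lsn_t first_scan_start(const CheckpointInfo& cp) {
  if (cp.end_lsn == 0) {
    // Older writer: fall back to scanning from the checkpoint LSN.
    return cp.checkpoint_lsn;
  }
  // Assumption of this sketch: the MLOG_CHECKPOINT mini-transaction
  // starts at or after the checkpoint LSN.
  assert(cp.end_lsn >= cp.checkpoint_lsn);
  return cp.end_lsn;
}
```

With `end_lsn` set, the first pass skips everything between the checkpoint LSN and the start of the MLOG_CHECKPOINT mini-transaction, which is the scan that the fix aims to shorten.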
MDEV-11254: innodb-use-trim has no effect in 10.2 The problem was that the implementation merged from 10.1 was incompatible with InnoDB 5.7. buf0buf.cc: Add functions that return whether a hole should be punched and how large it should be. buf0flu.cc: Add the written page to IORequest. fil0fil.cc: Remove an unneeded status call and, when a tablespace is created, test whether the file system supports sparse files and hole punching. Add a call to get the file system block size. The file node in use is added to IORequest. Add functions to check whether hole punching is supported and to enable it. ha_innodb.cc: Remove unneeded status variables (trim512-32768) and trim_op_saved. Deprecate innodb_use_trim and set it ON by default. Add a function to set innodb-use-trim dynamically. dberr.h: Add the error code DB_IO_NO_PUNCH_HOLE for a failed punch hole operation. fil0fil.h: Add the punch_hole variable to fil_space_t and the block size to fil_node_t. os0api.h: Header for the helper functions in buf0buf.cc and fil0fil.cc, for os0file.h. os0file.h: Remove the unneeded m_block_size from IORequest. Add bpage to IORequest so that the actual size of the block is known, and m_fil_node so that the file system block size of the tablespace and its punch hole support are known. os0file.cc: Add the function punch_hole() to IORequest to perform the punch hole operation, get the file system block size, and determine whether the file system supports sparse files (for punch hole). page0size.h: Remove the implicit copy disable and use the implicit copy to implement the copy_from() function. buf0dblwr.cc, buf0flu.cc, buf0rea.cc, fil0fil.cc, fil0fil.h, os0file.h, os0file.cc, log0log.cc, log0recv.cc: Remove the unneeded write_size parameter from fil_io calls. srv0mon.h, srv0srv.h, srv0mon.cc: Remove the unneeded trim512-trim32768 status variables, and remove them from the monitor tests.
9 years ago
MDEV-14717: Prevent crash-downgrade to earlier MariaDB 10.2 A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version would trigger a debug assertion failure during rollback, in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the TRX_UNDO_RENAME_TABLE record would be misinterpreted as an update_undo log record, and typically the file name would be interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching record would be found, row_undo_mod() would hit ut_error in switch (node->rec_type). Typically, ut_a(table2 == NULL) would fail when opening the table from SQL. Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2 versions by changing the InnoDB redo log format identifier to the 10.3 identifier, and by introducing a subformat identifier so that 10.2 can continue to refuse crash-downgrade from 10.3 or later. After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would still be possible thanks to MDEV-14909. A downgrade to older 10.2 versions is only possible after removing the log files (not recommended). LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format). log_group_t: Add subformat. For 10.2, we will use subformat 1, and will refuse crash recovery from any other subformat of the 10.3 format, that is, a genuine 10.3 redo log. recv_find_max_checkpoint(): Allow startup after clean shutdown from a future LOG_HEADER_FORMAT_10_4 (unencrypted only). We cannot handle the encrypted 10.4 redo log block format, which was introduced in MDEV-12041. Allow crash recovery from the original 10.2 format as well as the new format. In Mariabackup --backup, do not allow any startup from 10.3 or 10.4 redo logs. recv_recovery_from_checkpoint_start(): Skip redo log apply for clean 10.3 redo log, but not for the new 10.2 redo log (10.3 format, subformat 1). 
srv_prepare_to_delete_redo_log_files(): On format or subformat mismatch, set srv_log_file_size = 0, so that we will display the correct message. innobase_start_or_create_for_mysql(): Check for format or subformat mismatch. xtrabackup_backup_func(): Remove debug assertions that were made redundant by the code changes in recv_find_max_checkpoint().
7 years ago
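The format and subformat rules in the MDEV-14717 message can be summarized as a small decision function, seen from the point of view of a 10.2 server. This is a rough sketch, not the code in recv_find_max_checkpoint(): every name here is hypothetical, only the value 103 comes from the message (the other two identifier values are assumptions), and the handling of encrypted 10.3 logs and of Mariabackup is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Format identifiers. 103 is stated in the text; 1 and 104 are
// assumptions made for this illustration.
enum LogFormat : uint32_t {
  FORMAT_10_2_ORIGINAL = 1,
  FORMAT_10_3 = 103,
  FORMAT_10_4 = 104
};

enum class Startup { ApplyRedoLog, SkipRedoLogApply, Refuse };

// Hypothetical summary of what was read from the redo log header.
struct RedoLogHeader {
  uint32_t format;
  uint32_t subformat;
  bool encrypted;
  bool clean_shutdown;  // true if the log is logically empty
};

inline Startup decide_startup_10_2(const RedoLogHeader& h) {
  switch (h.format) {
  case FORMAT_10_2_ORIGINAL:
    // Crash recovery from the original 10.2 format stays allowed.
    return Startup::ApplyRedoLog;
  case FORMAT_10_3:
    // Subformat 1 is the new 10.2 log: recover from it normally.
    if (h.subformat == 1) return Startup::ApplyRedoLog;
    // A genuine 10.3 log is accepted only after a clean shutdown.
    return h.clean_shutdown ? Startup::SkipRedoLogApply : Startup::Refuse;
  case FORMAT_10_4:
    // Only an unencrypted, cleanly shut down 10.4 log is accepted.
    return (h.clean_shutdown && !h.encrypted) ? Startup::SkipRedoLogApply
                                              : Startup::Refuse;
  default:
    return Startup::Refuse;
  }
}

// Small usage example: a crashed genuine 10.3 log must be refused.
inline void self_check() {
  const RedoLogHeader crashed_10_3{FORMAT_10_3, 0, false, false};
  assert(decide_startup_10_2(crashed_10_3) == Startup::Refuse);
}
```

The point of the subformat is visible in the `FORMAT_10_3` branch: the same format identifier leads to different outcomes depending on whether the log was written by 10.2 (subformat 1) or by a real 10.3 server.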
MDEV-14717: Prevent crash-downgrade to earlier MariaDB 10.2

A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version would trigger a debug assertion failure during rollback, in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the TRX_UNDO_RENAME_TABLE record would be misinterpreted as an update_undo log record, and typically the file name would be interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching record would be found, row_undo_mod() would hit ut_error in switch (node->rec_type). Typically, ut_a(table2 == NULL) would fail when opening the table from SQL.

Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2 versions by changing the InnoDB redo log format identifier to the 10.3 identifier, and by introducing a subformat identifier so that 10.2 can continue to refuse crash-downgrade from 10.3 or later.

After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would still be possible thanks to MDEV-14909. A downgrade to older 10.2 versions is only possible after removing the log files (not recommended).

LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format).

log_group_t: Add subformat. For 10.2, we will use subformat 1, and will refuse crash recovery from any other subformat of the 10.3 format, that is, a genuine 10.3 redo log.

recv_find_max_checkpoint(): Allow startup after clean shutdown from a future LOG_HEADER_FORMAT_10_4 (unencrypted only). We cannot handle the encrypted 10.4 redo log block format, which was introduced in MDEV-12041. Allow crash recovery from the original 10.2 format as well as the new format. In Mariabackup --backup, do not allow any startup from 10.3 or 10.4 redo logs.

recv_recovery_from_checkpoint_start(): Skip redo log apply for clean 10.3 redo log, but not for the new 10.2 redo log (10.3 format, subformat 1).

srv_prepare_to_delete_redo_log_files(): On format or subformat mismatch, set srv_log_file_size = 0, so that we will display the correct message.

innobase_start_or_create_for_mysql(): Check for format or subformat mismatch.

xtrabackup_backup_func(): Remove debug assertions that were made redundant by the code changes in recv_find_max_checkpoint().
7 years ago
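The format and subformat rules described in MDEV-14717 can be sketched as a small decision function, seen from the point of view of a 10.2 server. This is only an illustration, not the actual recv_find_max_checkpoint() logic: the value 103 comes from the commit message, while the other numeric values, the enum RedoLogUse and the function classify_redo_log() are assumptions made for this sketch.

```cpp
#include <cstdint>

// Format identifiers. 103 is quoted in the commit message; the other two
// numeric values are assumptions made for this sketch.
constexpr std::uint32_t FORMAT_10_2 = 1;     // assumed value
constexpr std::uint32_t FORMAT_10_3 = 103;   // from the commit message
constexpr std::uint32_t FORMAT_10_4 = 104;   // assumed value
constexpr std::uint32_t SUBFORMAT_10_2 = 1;  // subformat written by 10.2

// Hypothetical result type: what a 10.2 server may do with a redo log.
enum class RedoLogUse {
  CrashRecovery,   // the log may be applied after a crash
  CleanStartOnly,  // acceptable only after a clean shutdown
  Refuse           // startup must be refused
};

// Classify a redo log by its header fields, following the rules in the text.
RedoLogUse classify_redo_log(std::uint32_t format, std::uint32_t subformat,
                             bool encrypted)
{
  switch (format) {
  case FORMAT_10_2:
    // The original 10.2 format remains recoverable.
    return RedoLogUse::CrashRecovery;
  case FORMAT_10_3:
    // Subformat 1 is the "new 10.2" log; anything else is a genuine 10.3 log.
    return subformat == SUBFORMAT_10_2 ? RedoLogUse::CrashRecovery
                                       : RedoLogUse::CleanStartOnly;
  case FORMAT_10_4:
    // The encrypted 10.4 block format cannot be handled at all.
    return encrypted ? RedoLogUse::Refuse : RedoLogUse::CleanStartOnly;
  default:
    return RedoLogUse::Refuse;
  }
}
```

The sketch ignores the Mariabackup --backup case, which according to the message refuses 10.3 and 10.4 redo logs outright.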
MDEV-13564: Implement innodb_unsafe_truncate=ON for compatibility

While MariaDB Server 10.2 is not really guaranteed to be compatible with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format change that could be present in XtraBackup, but was reverted from MariaDB in MDEV-12289), we do not want to disrupt users who have deployed xtrabackup and MariaDB Server 10.2 in their environments.

With this change, MariaDB 10.2 will continue to use the backup-unsafe TRUNCATE TABLE code, so that neither the undo log nor the redo log formats will change in an incompatible way.

Undo tablespace truncation will keep using the redo log only. Recovery or backup with old code will fail to shrink the undo tablespace files, but the contents will be recovered just fine.

In the MariaDB Server 10.2 series only, we introduce the configuration parameter innodb_unsafe_truncate and make it ON by default. To allow MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE operations, use loose_innodb_unsafe_truncate=OFF.

MariaDB Server 10.3.10 and later releases will always use the backup-safe TRUNCATE TABLE, and this parameter will not be added there.

recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables() unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan tables if RENAME operations are not transactional within InnoDB.

LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.

log_init(), log_group_file_header_flush(), srv_prepare_to_delete_redo_log_files(), innobase_start_or_create_for_mysql(): Choose the redo log format and subformat based on the value of innodb_unsafe_truncate.
7 years ago
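The last item of MDEV-13564, choosing the redo log format and subformat from innodb_unsafe_truncate, might look roughly like the following. This is a sketch under assumptions: the struct RedoLogHeader, the function choose_redo_log_header(), the numeric value of the 10.2 format and the use of subformat 0 for the unsafe mode are all hypothetical; only the value 103 and subformat 1 appear in the commit messages.

```cpp
#include <cstdint>

// Hypothetical pair of header fields written to the redo log file header.
struct RedoLogHeader {
  std::uint32_t format;
  std::uint32_t subformat;
};

constexpr std::uint32_t kFormat10_2 = 1;    // assumed value
constexpr std::uint32_t kFormat10_3 = 103;  // from the MDEV-14717 message

// Pick the header fields for a 10.2 server.
RedoLogHeader choose_redo_log_header(bool innodb_unsafe_truncate)
{
  if (innodb_unsafe_truncate) {
    // ON (the 10.2 default): keep the old, XtraBackup-compatible format.
    // Subformat 0 is an assumption of this sketch.
    return {kFormat10_2, 0};
  }
  // OFF: backup-safe TRUNCATE, which needs the 10.3 identifier with
  // subformat 1 so that older 10.2 servers refuse crash-downgrade.
  return {kFormat10_3, 1};
}
```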
MDEV-12103 Reduce the time of looking for MLOG_CHECKPOINT during crash recovery

This fixes MySQL Bug#80788 in MariaDB 10.2.5.

When I made the InnoDB crash recovery more robust by implementing WL#7142, I also introduced an extra redo log scan pass that can be shortened.

This fix will slightly extend the InnoDB redo log format that I introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT mini-transaction to the end of the log checkpoint page, so that recovery can jump straight to it without scanning all the preceding redo log.

LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were written as 0.

log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter end_lsn for writing LOG_CHECKPOINT_END_LSN.

log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT mini-transaction is starting (or at which the redo log ends on shutdown).

recv_init_crash_recovery(): Remove.

recv_group_scan_log_recs(): Add the parameter checkpoint_lsn.

recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN and if it is set, start the first scan from it instead of the checkpoint LSN. Improve some messages and remove bogus assertions.

recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some file-level redo log records.

recv_parse_or_apply_log_rec_body(): If we have not parsed all redo log between the checkpoint and the corresponding MLOG_CHECKPOINT record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records to recv_init_crash_recovery_spaces().

recv_init_crash_recovery_spaces(): Refuse recovery if MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
9 years ago
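The effect of LOG_CHECKPOINT_END_LSN on the first recovery scan, as described in MDEV-12103, can be illustrated with a minimal sketch. The function name first_scan_start() is hypothetical. The rule itself (use the stored LSN if it is set, where older logs wrote 0 in these bytes) is taken from the commit message.

```cpp
#include <cstdint>

using lsn_t = std::uint64_t;  // log sequence number

// Decide where the first redo log scan should begin.
// checkpoint_end_lsn is the value read from LOG_CHECKPOINT_END_LSN;
// logs written before this change contain 0 there.
lsn_t first_scan_start(lsn_t checkpoint_lsn, lsn_t checkpoint_end_lsn)
{
  // If the field is set, jump straight to the MLOG_CHECKPOINT
  // mini-transaction; otherwise scan from the checkpoint LSN.
  return checkpoint_end_lsn != 0 ? checkpoint_end_lsn : checkpoint_lsn;
}
```

For example, with a checkpoint at LSN 1000 and LOG_CHECKPOINT_END_LSN 5000, the first pass would skip the log between 1000 and 5000.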
Merge Google encryption

commit 195158e9889365dc3298f8c1f3bcaa745992f27f
Author: Minli Zhu <minliz@google.com>
Date: Mon Nov 25 11:05:55 2013 -0800

    Innodb redo log encryption/decryption.

    Use start lsn of a log block as part of AES CTR counter.

    Record key version with each checkpoint. Internally key version 0 means no encryption.

    Tests done (see test_innodb_log_encryption.sh for detail):
    - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted.
    - Verify start-up with no redo logs succeeds.
    - Verify fresh start-up succeeds.

    Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612

commit c1b97273659f07866758c25f4a56f680a1fbad24
Author: Jonas Oreland <jonaso@google.com>
Date: Tue Dec 3 18:47:27 2013 +0100

    encryption of aria data&index files

    this patch implements encryption of aria data & index files. this is implemented as
    1) add read/write hooks (renamed from callbacks) that encrypt/decrypt (also add pre_read and post_write hooks)
    2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption)
    3) modify index page 0 to contain IV (and crypt header)
    4) AES CTR crypt functions
    5) counter block is implemented using combination of page no, lsn and table specific id

    NOTE:
    1) log files are not encrypted, this is not needed if aria is only used for internal temporary tables and they are not transactional (i.e not logged)
    2) all encrypted tables are using PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not CHECKSUM:ed
    3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows get allocated to pages trying to fill the pages as much as possible. However, certain sql constructs materialize temporary results in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer.

    CHANGES:
    1) found bug in ma_write that made code try to abort a record that was never written; unsure why this is not exposed

    Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509

commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc
Author: Jonas Oreland <jonaso@google.com>
Date: Mon Feb 17 08:04:50 2014 -0800

    Implement encryption of innodb datafiles.

    Pages are encrypted before being written to disk and decrypted when read from disk. Each page except the first page (page 0) in a tablespace is encrypted. Page 0 is unencrypted and contains the IV for the tablespace.

    FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer.

    The encryption is performed using AES CTR.

    Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that, new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } have been added.

    The following tunables are introduced
    - innodb_encrypt_tables
    - innodb_encryption_threads
    - innodb_encryption_rotate_key_age
    - innodb_encryption_rotation_iops

    Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2

commit a17eef2f6948e58219c9e26fc35633d6fd4de1de
Author: Andrew Ford <andrewford@google.com>
Date: Thu Jan 2 15:43:09 2014 -0800

    Key management skeleton with debug hooks.

    Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866

commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1
Author: Andrew Ford <andrewford@google.com>
Date: Mon Oct 28 16:27:44 2013 -0700

    Add AES-128 CTR and GCM encryption classes.

    Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
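The InnoDB datafile commit above reuses the 64-bit FIL_PAGE_FILE_FLUSH_LSN field to carry a 32-bit key version and a 32-bit post-encryption checksum. A minimal sketch of that split follows. The helper names are hypothetical, and placing the key version in the upper half is an assumption of this sketch, since the message does not say which half holds which value.

```cpp
#include <cstdint>

// Combine a key version and a post-encryption checksum into one 64-bit
// field value. Assumption: key version in the upper 32 bits.
std::uint64_t pack_key_version_and_checksum(std::uint32_t key_version,
                                            std::uint32_t checksum)
{
  return (static_cast<std::uint64_t>(key_version) << 32) | checksum;
}

// Extract the key version (upper 32 bits under the assumption above).
std::uint32_t unpack_key_version(std::uint64_t field)
{
  return static_cast<std::uint32_t>(field >> 32);
}

// Extract the post-encryption checksum (lower 32 bits).
std::uint32_t unpack_checksum(std::uint64_t field)
{
  return static_cast<std::uint32_t>(field & 0xFFFFFFFFu);
}
```

Storing the key version per page is what lets pages encrypted with different keys coexist in one tablespace while a key rotation is in progress.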
MDEV-12253: Buffer pool blocks are accessed after they have been freed

Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read.

This patch should also address following test failures and bugs:

MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed.

MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot

MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot

MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field.

Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.

buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible.

buf_page_io_complete(): use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if the tablespace exists and the pages read from it are neither corrupted nor failed decryption.

Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication.

fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. Added test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead, the table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not a visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly.

recv_apply_hashed_log_recs() does not return anything.
9 years ago
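The three outcomes listed for buf_page_check_corrupt() in MDEV-12253 can be expressed as a small classification over checksum results. This is a sketch, not the InnoDB function: the enum PageStatus, the struct PageChecks and the function classify_page() are hypothetical, and the checksum verification itself is reduced to precomputed booleans.

```cpp
// Hypothetical stand-ins for DB_SUCCESS, DB_PAGE_CORRUPTED and
// DB_DECRYPTION_FAILED.
enum class PageStatus { Success, PageCorrupted, DecryptionFailed };

// Precomputed verification results for one page (an assumption of this
// sketch; the real code computes checksums on the page frame).
struct PageChecks {
  bool encrypted;                    // page was stored encrypted
  bool post_encryption_checksum_ok;  // checksum over the encrypted image
  bool page_checksum_ok;             // normal checksum, after decryption
};

PageStatus classify_page(const PageChecks& c)
{
  if (!c.encrypted) {
    // Plain page: only the normal checksum matters.
    return c.page_checksum_ok ? PageStatus::Success
                              : PageStatus::PageCorrupted;
  }
  if (!c.post_encryption_checksum_ok) {
    // The encrypted image itself is damaged.
    return PageStatus::PageCorrupted;
  }
  // Encrypted image is intact; a bad normal checksum after decryption
  // points to a wrong key rather than to on-disk corruption.
  return c.page_checksum_ok ? PageStatus::Success
                            : PageStatus::DecryptionFailed;
}
```

Distinguishing the last case matters because a page that fails decryption is intact on disk and may become readable once the correct key is available.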
MDEV-12699 Improve crash recovery of corrupted data pages

InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.

Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
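The core idea of MDEV-12699 is that a page with a buffered initialization record does not need to be read from disk, and that records older than the initialization can be ignored. A minimal sketch, assuming a simplified record type: LogRecord, latest_init_lsn(), will_not_read() and records_to_apply() are hypothetical names, and the FTS_ and MLOG_INDEX_LOAD special cases from the message are left out.

```cpp
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Simplified buffered redo log record for a single page (hypothetical).
struct LogRecord {
  lsn_t start_lsn;        // LSN at which the record starts
  bool initializes_page;  // e.g. an MLOG_INIT_FILE_PAGE2 record
};

// LSN of the latest page-initialization record, or 0 if there is none.
lsn_t latest_init_lsn(const std::vector<LogRecord>& recs)
{
  lsn_t init = 0;
  for (const LogRecord& r : recs) {
    if (r.initializes_page && r.start_lsn > init) {
      init = r.start_lsn;
    }
  }
  return init;
}

// Counterpart of the RECV_WILL_NOT_READ state: the page can be rebuilt
// from the log alone, so a corrupted on-disk copy does not matter.
bool will_not_read(const std::vector<LogRecord>& recs)
{
  return latest_init_lsn(recs) != 0;
}

// Keep only the records at or after the initialization, mirroring the
// init_lsn parameter described for recv_recover_page().
std::vector<LogRecord> records_to_apply(const std::vector<LogRecord>& recs)
{
  const lsn_t init = latest_init_lsn(recs);
  std::vector<LogRecord> out;
  for (const LogRecord& r : recs) {
    if (r.start_lsn >= init) {
      out.push_back(r);
    }
  }
  return out;
}
```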
MDEV-12699 Improve crash recovery of corrupted data pages

InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.

Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
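The MDEV-12699 message above describes two decisions: whether a page needs to be read at all, and which buffered records recv_recover_page() may skip once init_lsn is known. The following is only a minimal sketch of that idea, not the InnoDB code. Every name in it (init_map, classify, records_to_apply, the enum values) is a hypothetical stand-in, LSN 0 is assumed to mean "no initialization seen", and MLOG_INDEX_LOAD handling is left out. In line with the message, a compressed-page record is not treated as page initialization here.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the InnoDB types named above.
using lsn_t = std::uint64_t;
using page_id = std::pair<std::uint32_t, std::uint32_t>;  // (space, page_no)

enum class rec_type { INIT_FILE_PAGE2, INIT_FREE_PAGE, OTHER };
enum class addr_state { NOT_PROCESSED, WILL_NOT_READ };

struct log_rec {
  lsn_t lsn;
  rec_type type;
};

// Toy counterpart of mlog_init: the latest initialization LSN per page.
class init_map {
  std::map<page_id, lsn_t> inits_;

 public:
  // Remember an initialization record, keeping only the newest LSN.
  void add(page_id id, lsn_t lsn) {
    lsn_t& slot = inits_[id];
    if (slot < lsn) slot = lsn;
  }
  // Latest initialization LSN for the page, or 0 if none was seen.
  lsn_t last(page_id id) const {
    auto it = inits_.find(id);
    return it == inits_.end() ? 0 : it->second;
  }
};

// Scan phase: register only the record types that fully initialize a page.
inline void scan(init_map& m, page_id id, const log_rec& r) {
  if (r.type == rec_type::INIT_FILE_PAGE2 || r.type == rec_type::INIT_FREE_PAGE)
    m.add(id, r.lsn);
}

// A page with a known initialization does not have to be read from disk.
inline addr_state classify(const init_map& m, page_id id) {
  return m.last(id) ? addr_state::WILL_NOT_READ : addr_state::NOT_PROCESSED;
}

// Apply phase: ignore records that precede the page initialization.
inline std::vector<log_rec> records_to_apply(const std::vector<log_rec>& recs,
                                             lsn_t init_lsn) {
  std::vector<log_rec> out;
  for (const log_rec& r : recs)
    if (r.lsn >= init_lsn) out.push_back(r);
  return out;
}
```

With records at LSN 100 (ordinary), 200 (initialization) and 250 (ordinary) for one page, the sketch classifies the page as WILL_NOT_READ and keeps only the last two records for applying.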
MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2

InnoDB I/O and buffer pool interfaces and the redo log format have been changed between MariaDB 10.1 and 10.2, and the backup code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed. Instead of the file name xtrabackup_logfile, the file name ib_logfile0 is being used for the copy of the redo log. Unnecessary InnoDB startup and shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds, --dbug=d,ib_log can be specified to enable diagnostic messages for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster. If more crash-safety for --prepare is needed, double buffering can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log checksums in --backup.

Some messages have been cleaned up. Unless --export is specified, Mariabackup will not deal with undo log. The InnoDB mini-transaction redo log is not only about user-level transactions; it is actually about mini-transactions. To avoid confusion, call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the undo log processing would need to be able to evaluate virtual column expressions. To reduce the amount of code dependencies, we will not process any undo log in prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any further changes to data pages during --prepare, including the change buffer merge. This means that restoring incremental backups should reliably work even when change buffering is being used on the server. Because of this, preparing a backup will not generate any further redo log, and the redo log file can be safely deleted. (If the --export option is enabled in the future, it must generate redo log when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used, especially when restoring a series of incremental backups. So, we simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write a test that initiates a backup while an ALGORITHM=INPLACE operation is creating indexes or rebuilding a table. An error should be detected when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should ensure that if FSP_SIZE is modified, the file size will be adjusted accordingly.
8 years ago
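The MDEV-12548 message says that --prepare warns about missing files and ignores the redo log for them. A minimal sketch of that policy follows, assuming a flat list of redo records keyed by tablespace ID. The types redo_rec and prepare_plan, the function plan_prepare and the warning text are all hypothetical illustrations and are not taken from Mariabackup.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical redo record: only the fields this sketch needs.
struct redo_rec {
  std::uint32_t space_id;
  std::uint64_t lsn;
};

// Result of planning a prepare run over a possibly partial backup.
struct prepare_plan {
  std::vector<redo_rec> to_apply;     // records whose data file is present
  std::vector<std::string> warnings;  // one warning per missing tablespace
};

// Keep records for present tablespaces; warn once for each missing one
// and drop its records instead of failing the whole prepare.
inline prepare_plan plan_prepare(const std::vector<redo_rec>& log,
                                 const std::set<std::uint32_t>& present) {
  prepare_plan plan;
  std::set<std::uint32_t> warned;
  for (const redo_rec& r : log) {
    if (present.count(r.space_id)) {
      plan.to_apply.push_back(r);
      continue;
    }
    if (warned.insert(r.space_id).second)
      plan.warnings.push_back("tablespace " + std::to_string(r.space_id) +
                              " is missing; ignoring its redo log");
  }
  return plan;
}
```

Warning once per tablespace rather than once per record keeps the output readable when a partial backup omits a heavily written table.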
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read.

This patch should also address the following test failures and bugs:

MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not.

Removed dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added a test case for row compressed blobs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL.

buf_page_get_zip(): Issue error if page read fails.

buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.

buf_mark_space_corrupt(): Remove bpage from LRU also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible.

buf_page_io_complete(): Use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails.

btr_pcur_move_to_next_page(): Do not reference page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if the tablespace exists and pages read from the tablespace are not corrupted and page decryption has not failed.

Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication.

fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import.

Fixed error code on innodb.innodb test.

Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests.

Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table.

Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cannot be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead table is just mostly not usable.

fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly.

recv_apply_hashed_log_recs() does not return anything.
9 years ago
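The return values listed for buf_page_check_corrupt() in the MDEV-12253 message can be read as a small decision table. The sketch below is one possible reading, not the InnoDB function: page_status, page_checks and check_corrupt are hypothetical names, the three booleans are assumed to have been computed elsewhere, and treating a failed post-encryption checksum as plain corruption is an assumption of this sketch.

```cpp
// Hypothetical counterparts of DB_SUCCESS, DB_PAGE_CORRUPTED and
// DB_DECRYPTION_FAILED from the commit message.
enum class page_status { SUCCESS, PAGE_CORRUPTED, DECRYPTION_FAILED };

// Inputs assumed to be computed by the caller for one page read.
struct page_checks {
  bool encrypted;           // the page was stored encrypted
  bool post_encryption_ok;  // checksum over the encrypted contents matches
  bool plain_checksum_ok;   // normal page checksum matches (after decryption)
};

inline page_status check_corrupt(const page_checks& c) {
  if (!c.encrypted)
    return c.plain_checksum_ok ? page_status::SUCCESS
                               : page_status::PAGE_CORRUPTED;
  // Assumption: damaged ciphertext is reported as ordinary corruption.
  if (!c.post_encryption_ok) return page_status::PAGE_CORRUPTED;
  // Ciphertext is intact but the decrypted page does not verify,
  // which points at a wrong or missing key rather than a damaged file.
  return c.plain_checksum_ok ? page_status::SUCCESS
                             : page_status::DECRYPTION_FAILED;
}
```

A caller in the spirit of buf_page_get_gen() would stop processing the page read on any result other than SUCCESS instead of touching the block again.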
MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). 
Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. == Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. 
Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. 
If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
7 years ago
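The trimming step described for recv_addr_trim() above can be illustrated with a small standalone sketch. It is not the InnoDB implementation: the PageLog container, trim_page_log() and the representation of a log record as a bare LSN are all hypothetical, chosen only to show the rule "discard records for pages at or beyond the new end of the file if they were logged before the truncation LSN, and drop pages whose record list becomes empty".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using lsn_t = std::uint64_t;
using page_no_t = std::uint32_t;

// Hypothetical stand-in for the buffered redo log of one tablespace:
// page number -> LSNs of the records buffered for that page.
using PageLog = std::map<page_no_t, std::vector<lsn_t>>;

// For every page at or beyond new_size_pages, discard the records that
// were logged before truncate_lsn. Pages whose record list becomes empty
// are erased. Returns the number of erased pages (the amount by which a
// counter such as n_addrs would be reduced).
inline std::size_t trim_page_log(PageLog& log, page_no_t new_size_pages,
                                 lsn_t truncate_lsn)
{
  std::size_t removed = 0;
  for (auto it = log.lower_bound(new_size_pages); it != log.end();) {
    std::vector<lsn_t>& recs = it->second;
    recs.erase(std::remove_if(recs.begin(), recs.end(),
                              [=](lsn_t lsn) { return lsn < truncate_lsn; }),
               recs.end());
    if (recs.empty()) {
      it = log.erase(it);
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

Records logged at or after the truncation LSN are kept even for pages beyond the new size, because in this simplified model they describe the file after it has grown again.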
MDEV-12699 Improve crash recovery of corrupted data pages InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records. To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will fix also that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1. Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes). The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record. For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_. This is joint work with Thirunarayanan Balathandayuthapani. fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace. mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176. recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents. recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. 
This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables. recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the fil_name_t::enable_lsn. recv_init_crash_recovery_spaces(): Copy fil_name_t::enable_lsn to fil_space_t::enable_lsn. recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations. buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery. recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch. ibuf_merge_or_delete_for_page(): Relax a debug assertion. innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
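The idea behind RECV_WILL_NOT_READ and the init_lsn parameter described above can be sketched as follows. This is a simplified model, not the InnoDB code: LogRecord, Plan, plan_page_recovery() and records_to_apply() are hypothetical names, and the sketch ignores the MLOG_INDEX_LOAD and FTS_ special cases mentioned in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical buffered redo record for a single page.
struct LogRecord {
  lsn_t lsn;
  bool initializes_page;  // true for a record that rewrites the whole page
};

enum class PageAction { ReadFromFile, WillNotRead };

struct Plan {
  PageAction action;
  lsn_t init_lsn;  // LSN of the latest page-initializing record, or 0
};

// If any buffered record initializes the page completely, the page need
// not be read from the data file; recovery can start from that record.
inline Plan plan_page_recovery(const std::vector<LogRecord>& recs)
{
  Plan plan{PageAction::ReadFromFile, 0};
  for (const LogRecord& r : recs) {
    if (r.initializes_page && r.lsn >= plan.init_lsn) {
      plan.action = PageAction::WillNotRead;
      plan.init_lsn = r.lsn;
    }
  }
  return plan;
}

// Records that precede the page initialization are skipped.
inline std::vector<lsn_t> records_to_apply(const std::vector<LogRecord>& recs,
                                           lsn_t init_lsn)
{
  std::vector<lsn_t> out;
  for (const LogRecord& r : recs) {
    if (r.lsn >= init_lsn) {
      out.push_back(r.lsn);
    }
  }
  return out;
}
```

Because a possibly corrupted on-disk copy of such a page is never read in this model, corruption of a newly created page cannot make recovery fail.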
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address the following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when encrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write(). dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cannot be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not a visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
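The three outcomes documented for buf_page_check_corrupt() above can be modelled as a small decision function. This is only a sketch of the described return values, not the InnoDB code: PageStatus, PageChecks and classify_page() are hypothetical, and the order in which the two checksums are consulted is an assumption.

```cpp
#include <cassert>

// Mirrors the documented outcomes DB_SUCCESS, DB_PAGE_CORRUPTED and
// DB_DECRYPTION_FAILED.
enum class PageStatus { Success, PageCorrupted, DecryptionFailed };

// Hypothetical summary of the checks performed on a page that was read.
struct PageChecks {
  bool encrypted;                    // page was stored encrypted
  bool post_encryption_checksum_ok;  // checksum over the encrypted bytes
  bool page_checksum_ok;             // normal page checksum (for encrypted
                                     // pages: computed after decryption)
};

inline PageStatus classify_page(const PageChecks& c)
{
  if (!c.encrypted) {
    return c.page_checksum_ok ? PageStatus::Success
                              : PageStatus::PageCorrupted;
  }
  if (!c.post_encryption_checksum_ok) {
    // The stored bytes themselves are damaged.
    return PageStatus::PageCorrupted;
  }
  // The stored bytes are intact; a bad checksum after decryption points
  // at a failed decryption, for example a wrong key.
  return c.page_checksum_ok ? PageStatus::Success
                            : PageStatus::DecryptionFailed;
}
```

Distinguishing the last two cases matters because a decryption failure may be recoverable by supplying the correct key, whereas a corrupted page is not.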
MDEV-11556 InnoDB redo log apply fails to adjust data file sizes fil_space_t::recv_size: New member: recovered tablespace size in pages; 0 if no size change was read from the redo log, or if the size change was implemented. fil_space_set_recv_size(): New function for setting space->recv_size. innodb_data_file_size_debug: A debug parameter for setting the system tablespace size in recovery even when the redo log does not contain any size changes. It is hard to write a small test case that would cause the system tablespace to be extended at the critical moment. recv_parse_log_rec(): Note those tablespaces whose size is being changed by the redo log, by invoking fil_space_set_recv_size(). innobase_init(): Correct an error message, and do not require a larger innodb_buffer_pool_size when starting up with a smaller innodb_page_size. innobase_start_or_create_for_mysql(): Allow startup with any initial size of the ibdata1 file if the autoextend attribute is set. Require the minimum size of fixed-size system tablespaces to be 640 pages, not 10 megabytes. Implement innodb_data_file_size_debug. open_or_create_data_files(): Round the system tablespace size down to pages, not to full megabytes. (Our test truncates the system tablespace to more than 800 pages with innodb_page_size=4k. InnoDB should not imagine that it was truncated to 768 pages and then overwrite good pages in the tablespace.) fil_flush_low(): Refactored from fil_flush(). fil_space_extend_must_retry(): Refactored from fil_extend_space_to_desired_size(). fil_mutex_enter_and_prepare_for_io(): Extend the tablespace if fil_space_set_recv_size() was called. The test case has been successfully run with all the innodb_page_size values 4k, 8k, 16k, 32k, 64k.
9 years ago
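The rounding problem described for open_or_create_data_files() above is plain integer arithmetic and can be checked with a short sketch. The two helper functions are hypothetical and are not part of InnoDB; they only contrast rounding a file size down to whole pages with rounding it down to whole megabytes first.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole pages in a file of file_bytes bytes.
constexpr std::uint64_t pages_rounded_to_page(std::uint64_t file_bytes,
                                              std::uint64_t page_size)
{
  return file_bytes / page_size;
}

// Page count obtained when the file size is first rounded down to whole
// megabytes (1 MiB = 1048576 bytes), as in the behaviour being replaced.
constexpr std::uint64_t pages_rounded_to_megabyte(std::uint64_t file_bytes,
                                                  std::uint64_t page_size)
{
  constexpr std::uint64_t MiB = 1ULL << 20;
  return (file_bytes / MiB) * (MiB / page_size);
}
```

With innodb_page_size=4k, one megabyte holds 256 pages. A file of 801 pages (3280896 bytes) is slightly more than 3 megabytes, so rounding to megabytes yields 3 × 256 = 768 pages and the last 33 pages would be treated as not existing, while rounding to pages keeps all 801.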
MDEV-12699 Improve crash recovery of corrupted data pages

InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.

To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.

Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).

The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.

For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.

This is joint work with Thirunarayanan Balathandayuthapani.

fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.

mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.

recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.

recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.

recv_mark_log_index_load(): Process a MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.

recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.

recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.

buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.

recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.

ibuf_merge_or_delete_for_page(): Relax a debug assertion.

innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
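The core idea of MDEV-12699 (skip reading a page that the redo log initializes, and ignore records that precede the initialization) can be sketched as follows. This is a simplified model under stated assumptions, not the InnoDB implementation: LogRecord, latest_init_lsn(), must_read_page() and records_to_apply() are hypothetical names, and the MLOG_INDEX_LOAD and FTS_ special cases from the commit message are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical, simplified stand-in for one buffered redo log record of a page.
struct LogRecord {
  lsn_t lsn;         // start LSN of the record
  bool initializes;  // true if the record completely initializes the page
};

// LSN of the latest page-initializing record, or 0 if there is none.
inline lsn_t latest_init_lsn(const std::vector<LogRecord>& recs)
{
  lsn_t init_lsn = 0;
  for (const LogRecord& r : recs)
    if (r.initializes && r.lsn > init_lsn) init_lsn = r.lsn;
  return init_lsn;
}

// The page has to be read from the data file only if no buffered record
// initializes it (the "will not read" case otherwise).
inline bool must_read_page(const std::vector<LogRecord>& recs)
{
  return latest_init_lsn(recs) == 0;
}

// Records that precede the page initialization are ignored.
inline std::vector<LogRecord> records_to_apply(const std::vector<LogRecord>& recs)
{
  const lsn_t init_lsn = latest_init_lsn(recs);
  std::vector<LogRecord> out;
  for (const LogRecord& r : recs)
    if (r.lsn >= init_lsn) out.push_back(r);
  return out;
}

// Usage: the record at LSN 100 predates the initialization at LSN 200.
inline void init_lsn_example()
{
  const std::vector<LogRecord> recs = {{100, false}, {200, true}, {300, false}};
  assert(!must_read_page(recs));
  assert(records_to_apply(recs).size() == 2);
}
```

Because the page contents before the initialization are never consulted in this model, a corrupted on-disk copy of such a page cannot make recovery fail.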
MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2

InnoDB I/O and buffer pool interfaces and the redo log format have been changed between MariaDB 10.1 and 10.2, and the backup code has to be adjusted accordingly.

The code has been simplified, and many memory leaks have been fixed. Instead of the file name xtrabackup_logfile, the file name ib_logfile0 is being used for the copy of the redo log. Unnecessary InnoDB startup and shutdown and some unnecessary threads have been removed.

Some help was provided by Vladislav Vaintroub.

Parameters have been cleaned up and aligned with those of MariaDB 10.2.

The --dbug option has been added, so that in debug builds, --dbug=d,ib_log can be specified to enable diagnostic messages for processing redo log entries.

By default, innodb_doublewrite=OFF, so that --prepare works faster. If more crash-safety for --prepare is needed, double buffering can be enabled.

The parameter innodb_log_checksums=OFF can be used to ignore redo log checksums in --backup.

Some messages have been cleaned up. Unless --export is specified, Mariabackup will not deal with undo log. The InnoDB mini-transaction redo log is not only about user-level transactions; it is actually about mini-transactions. To avoid confusion, call it the redo log, not transaction log.

We disable any undo log processing in --prepare.

Because MariaDB 10.2 supports indexed virtual columns, the undo log processing would need to be able to evaluate virtual column expressions. To reduce the amount of code dependencies, we will not process any undo log in prepare.

This means that the --export option must be disabled for now.

This also means that the following options are redundant and have been removed:
xtrabackup --apply-log-only
innobackupex --redo-only

In addition to disabling any undo log processing, we will disable any further changes to data pages during --prepare, including the change buffer merge. This means that restoring incremental backups should reliably work even when change buffering is being used on the server. Because of this, preparing a backup will not generate any further redo log, and the redo log file can be safely deleted. (If the --export option is enabled in the future, it must generate redo log when processing undo logs and buffered changes.)

In --prepare, we cannot easily know if a partial backup was used, especially when restoring a series of incremental backups. So, we simply warn about any missing files, and ignore the redo log for them.

FIXME: Enable the --export option.

FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write a test that initiates a backup while an ALGORITHM=INPLACE operation is creating indexes or rebuilding a table. An error should be detected when preparing the backup.

FIXME: In --incremental --prepare, xtrabackup_apply_delta() should ensure that if FSP_SIZE is modified, the file size will be adjusted accordingly.
8 years ago
MDEV-12103 Reduce the time of looking for MLOG_CHECKPOINT during crash recovery

This fixes MySQL Bug#80788 in MariaDB 10.2.5.

When I made the InnoDB crash recovery more robust by implementing WL#7142, I also introduced an extra redo log scan pass that can be shortened. This fix will slightly extend the InnoDB redo log format that I introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT mini-transaction to the end of the log checkpoint page, so that recovery can jump straight to it without scanning all the preceding redo log.

LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were written as 0.

log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter end_lsn for writing LOG_CHECKPOINT_END_LSN.

log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT mini-transaction is starting (or at which the redo log ends on shutdown).

recv_init_crash_recovery(): Remove.

recv_group_scan_log_recs(): Add the parameter checkpoint_lsn.

recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN and if it is set, start the first scan from it instead of the checkpoint LSN. Improve some messages and remove bogus assertions.

recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some file-level redo log records.

recv_parse_or_apply_log_rec_body(): If we have not parsed all redo log between the checkpoint and the corresponding MLOG_CHECKPOINT record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records to recv_init_crash_recovery_spaces().

recv_init_crash_recovery_spaces(): Refuse recovery if MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
9 years ago
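The choice of the first scan position described for recv_recovery_from_checkpoint_start() in MDEV-12103 can be sketched like this. It is a minimal illustration, assuming a simplified checkpoint structure; CheckpointInfo and first_scan_start() are hypothetical names, not the InnoDB definitions.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = std::uint64_t;

// Hypothetical, simplified view of the fields read from a checkpoint page.
struct CheckpointInfo {
  lsn_t checkpoint_lsn;  // LSN of the checkpoint
  lsn_t end_lsn;         // LOG_CHECKPOINT_END_LSN; 0 if it was not written
};

// Start the first scan at the MLOG_CHECKPOINT mini-transaction if its start
// LSN was recorded; otherwise fall back to the checkpoint LSN.
inline lsn_t first_scan_start(const CheckpointInfo& cp)
{
  return cp.end_lsn != 0 ? cp.end_lsn : cp.checkpoint_lsn;
}

// Usage: a checkpoint page without the field versus one that has it.
inline void scan_start_example()
{
  assert(first_scan_start(CheckpointInfo{1000, 0}) == 1000);
  assert(first_scan_start(CheckpointInfo{1000, 5000}) == 5000);
}
```

Since the bytes were previously written as 0, a zero value doubles as "not set", so redo logs written before this change are still scanned from the checkpoint LSN.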
MDEV-11782: Redefine the innodb_encrypt_log format
Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from.
Issue notes, not warning messages for rewriting the redo log files.
recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql().
Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption.
LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format.
log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format.
recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.
srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more.
innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes.
innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums.
log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting.
Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format).
9 years ago
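The point made for log_group_write_buf() in MDEV-11782, that a checksum computed over the encrypted contents can be verified without the key, can be illustrated with a small sketch. Everything here is an assumption for illustration: the names seal, verify_without_key and toy_encrypt are hypothetical, the XOR "cipher" is a stand-in and not the actual redo log encryption, and the bitwise CRC-32C is a plain reference implementation rather than the optimized one a server would use.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Toy stand-in for a real cipher: XOR every byte with one key byte.
// This is NOT secure and NOT what InnoDB does; it only makes the
// ordering (encrypt first, checksum second) visible.
void toy_encrypt(std::vector<std::uint8_t>& block, std::uint8_t key)
{
  for (std::uint8_t& b : block) {
    b ^= key;
  }
}

struct SealedBlock {
  std::vector<std::uint8_t> ciphertext;
  std::uint32_t checksum;  // computed over the ciphertext
};

// Hypothetical writer side: encrypt, then checksum the encrypted bytes.
SealedBlock seal(std::vector<std::uint8_t> plaintext, std::uint8_t key)
{
  toy_encrypt(plaintext, key);
  const std::uint32_t sum = crc32c(plaintext.data(), plaintext.size());
  return SealedBlock{std::move(plaintext), sum};
}

// Hypothetical reader side: integrity can be checked with no key at all.
bool verify_without_key(const SealedBlock& block)
{
  return crc32c(block.ciphertext.data(), block.ciphertext.size())
      == block.checksum;
}
```

A tool that merely copies blocks can call verify_without_key() to detect a damaged or incomplete block, which would not be possible if the checksum covered the plaintext.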
MDEV-12548 Initial implementation of Mariabackup for MariaDB 10.2
InnoDB I/O and buffer pool interfaces and the redo log format have been changed between MariaDB 10.1 and 10.2, and the backup code has to be adjusted accordingly.
The code has been simplified, and many memory leaks have been fixed. Instead of the file name xtrabackup_logfile, the file name ib_logfile0 is being used for the copy of the redo log. Unnecessary InnoDB startup and shutdown and some unnecessary threads have been removed.
Some help was provided by Vladislav Vaintroub.
Parameters have been cleaned up and aligned with those of MariaDB 10.2.
The --dbug option has been added, so that in debug builds, --dbug=d,ib_log can be specified to enable diagnostic messages for processing redo log entries.
By default, innodb_doublewrite=OFF, so that --prepare works faster. If more crash-safety for --prepare is needed, double buffering can be enabled.
The parameter innodb_log_checksums=OFF can be used to ignore redo log checksums in --backup.
Some messages have been cleaned up. Unless --export is specified, Mariabackup will not deal with undo log. The InnoDB mini-transaction redo log is not only about user-level transactions; it is actually about mini-transactions. To avoid confusion, call it the redo log, not transaction log.
We disable any undo log processing in --prepare.
Because MariaDB 10.2 supports indexed virtual columns, the undo log processing would need to be able to evaluate virtual column expressions. To reduce the amount of code dependencies, we will not process any undo log in prepare.
This means that the --export option must be disabled for now.
This also means that the following options are redundant and have been removed: xtrabackup --apply-log-only and innobackupex --redo-only.
In addition to disabling any undo log processing, we will disable any further changes to data pages during --prepare, including the change buffer merge. This means that restoring incremental backups should reliably work even when change buffering is being used on the server. Because of this, preparing a backup will not generate any further redo log, and the redo log file can be safely deleted. (If the --export option is enabled in the future, it must generate redo log when processing undo logs and buffered changes.)
In --prepare, we cannot easily know if a partial backup was used, especially when restoring a series of incremental backups. So, we simply warn about any missing files, and ignore the redo log for them.
FIXME: Enable the --export option.
FIXME: Improve the handling of the MLOG_INDEX_LOAD record, and write a test that initiates a backup while an ALGORITHM=INPLACE operation is creating indexes or rebuilding a table. An error should be detected when preparing the backup.
FIXME: In --incremental --prepare, xtrabackup_apply_delta() should ensure that if FSP_SIZE is modified, the file size will be adjusted accordingly.
8 years ago
MDEV-12699 Improve crash recovery of corrupted data pages
InnoDB crash recovery used to read every data page for which redo log exists. This is unnecessary for those pages that are initialized by the redo log. If a newly created page is corrupted, recovery could unnecessarily fail. It would suffice to reinitialize the page based on the redo log records.
To add insult to injury, InnoDB crash recovery could hang if it encountered a corrupted page. We will also fix that problem. InnoDB would normally refuse to start up if it encounters a corrupted page on recovery, but that can be overridden by setting innodb_force_recovery=1.
Data pages are completely initialized by the records MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS. MariaDB 10.4 additionally recognizes MLOG_INIT_FREE_PAGE, which notifies that a page has been freed and its contents can be discarded (filled with zeroes).
The record MLOG_INDEX_LOAD notifies that redo logging has been re-enabled after being disabled. We can avoid loading the page if all buffered redo log records predate the MLOG_INDEX_LOAD record.
For the internal tables of FULLTEXT INDEX, no MLOG_INDEX_LOAD records were written before commit aa3f7a107ce3a9a7f80daf3cadd442a61c5493ab. Hence, we will skip these optimizations for tables whose name starts with FTS_.
This is joint work with Thirunarayanan Balathandayuthapani.
fil_space_t::enable_lsn, file_name_t::enable_lsn: The LSN of the latest recovered MLOG_INDEX_LOAD record for a tablespace.
mlog_init: Page initialization operations discovered during redo log scanning. FIXME: This really belongs in recv_sys->addr_hash, and should be removed in MDEV-19176.
recv_addr_state: Add the new state RECV_WILL_NOT_READ to indicate that according to mlog_init, the page will be initialized based on redo log record contents.
recv_add_to_hash_table(): Set the RECV_WILL_NOT_READ state if appropriate. For now, we do not treat MLOG_ZIP_PAGE_COMPRESS as page initialization. This works around bugs in the crash recovery of ROW_FORMAT=COMPRESSED tables.
recv_mark_log_index_load(): Process an MLOG_INDEX_LOAD record by resetting the state to RECV_NOT_PROCESSED and by updating the file_name_t::enable_lsn.
recv_init_crash_recovery_spaces(): Copy file_name_t::enable_lsn to fil_space_t::enable_lsn.
recv_recover_page(): Add the parameter init_lsn, to ignore any log records that precede the page initialization. Add DBUG output about skipped operations.
buf_page_create(): Initialize FIL_PAGE_LSN, so that recv_recover_page() will not wrongly skip applying the page-initialization record due to the field containing some newer LSN as a leftover from a different page. Do not invoke ibuf_merge_or_delete_for_page() during crash recovery.
recv_apply_hashed_log_recs(): Remove some unnecessary lookups. Note if a corrupted page was found during recovery. After invoking buf_page_create(), do invoke ibuf_merge_or_delete_for_page() via mlog_init.ibuf_merge() in the last recovery batch.
ibuf_merge_or_delete_for_page(): Relax a debug assertion.
innobase_start_or_create_for_mysql(): Abort startup if a corrupted page was found during recovery. Corrupted pages will not be flagged if innodb_force_recovery is set. However, the recv_sys->found_corrupt_fs flag can be set regardless of innodb_force_recovery if file names are found to be incorrect (for example, multiple files with the same tablespace ID).
7 years ago
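The effect of the init_lsn parameter that MDEV-12699 adds to recv_recover_page() can be sketched as a simple filter. This is an illustration under assumptions, not the actual implementation: LogRecord and records_to_apply are hypothetical names, and the sketch assumes that the initialization record itself carries the LSN init_lsn and must therefore be kept.

```cpp
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical, simplified redo log record for one page.
struct LogRecord {
  lsn_t lsn;  // LSN at which the record was written
  int type;   // opaque record type, unused by this sketch
};

// Keep only the records that are not older than the page initialization.
// Assumption: records with lsn < init_lsn describe an earlier incarnation
// of the page and can be ignored once the page is reinitialized.
std::vector<LogRecord> records_to_apply(const std::vector<LogRecord>& recs,
                                        lsn_t init_lsn)
{
  std::vector<LogRecord> out;
  for (const LogRecord& rec : recs) {
    if (rec.lsn >= init_lsn) {
      out.push_back(rec);
    }
  }
  return out;
}
```

For records at LSN 10, 20 and 30 with init_lsn 20, only the last two would be applied, so a corrupted older copy of the page never needs to be read.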
MDEV-13267 At startup with crash recovery: mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed
This is a bogus debug assertion failure that should be possible starting with MariaDB 10.2.2 (which merged WL#7142 via MySQL 5.7.9).
While generating page-change redo log records is strictly out of the question during certain parts of crash recovery, fil_names_clear() is only emitting informational MLOG_FILE_NAME and MLOG_CHECKPOINT records to guarantee that if the server is killed during or soon after the crash recovery, subsequent crash recovery will be possible.
The metadata buffer that fil_names_clear() is flushing to the redo log is being filled by recv_init_crash_recovery_spaces(), right before starting to apply redo log, by invoking fil_names_dirty() on every discovered tablespace for which there are changes to apply.
When it comes to Mariabackup (xtrabackup --prepare), it is strictly out of the question to generate any redo log whatsoever, because that could break the restore of incremental backups by causing LSN deviation. So, the fil_names_dirty() call must be skipped when restoring backups.
recv_recovery_from_checkpoint_start(): Do not invoke fil_names_clear() when restoring a backup.
mtr_t::commit_checkpoint(): Remove the failing assertion. The only caller is fil_names_clear(), and it must be called by recv_recovery_from_checkpoint_start() for normal server startup to be crash-safe. The debug assertion in mtr_t::commit() will still catch rogue redo log writes.
8 years ago
MDEV-14717: Prevent crash-downgrade to earlier MariaDB 10.2
A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version would trigger a debug assertion failure during rollback, in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the TRX_UNDO_RENAME_TABLE record would be misinterpreted as an update_undo log record, and typically the file name would be interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching record would be found, row_undo_mod() would hit ut_error in switch (node->rec_type). Typically, ut_a(table2 == NULL) would fail when opening the table from SQL.
Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2 versions by changing the InnoDB redo log format identifier to the 10.3 identifier, and by introducing a subformat identifier so that 10.2 can continue to refuse crash-downgrade from 10.3 or later.
After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would still be possible thanks to MDEV-14909. A downgrade to older 10.2 versions is only possible after removing the log files (not recommended).
LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format).
log_group_t: Add subformat. For 10.2, we will use subformat 1, and will refuse crash recovery from any other subformat of the 10.3 format, that is, a genuine 10.3 redo log.
recv_find_max_checkpoint(): Allow startup after clean shutdown from a future LOG_HEADER_FORMAT_10_4 (unencrypted only). We cannot handle the encrypted 10.4 redo log block format, which was introduced in MDEV-12041. Allow crash recovery from the original 10.2 format as well as the new format. In Mariabackup --backup, do not allow any startup from 10.3 or 10.4 redo logs.
recv_recovery_from_checkpoint_start(): Skip redo log apply for clean 10.3 redo log, but not for the new 10.2 redo log (10.3 format, subformat 1).
srv_prepare_to_delete_redo_log_files(): On format or subformat mismatch, set srv_log_file_size = 0, so that we will display the correct message.
innobase_start_or_create_for_mysql(): Check for format or subformat mismatch.
xtrabackup_backup_func(): Remove debug assertions that were made redundant by the code changes in recv_find_max_checkpoint().
7 years ago
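The format and subformat rules stated in MDEV-14717 for the 10.3 format identifier can be sketched as a small decision function. This is a hedged illustration only: Startup and decide_startup are hypothetical names, the value 103 and subformat 1 are taken from the commit message, and the original 10.2 format and the 10.4 format are deliberately left out because their identifiers are not given here.

```cpp
#include <cstdint>

// Hypothetical outcome of inspecting the redo log header at startup.
enum class Startup { ApplyRedoLog, SkipRedoLog, Refuse };

// Values quoted from the commit message.
constexpr std::uint32_t kFormat10_3 = 103;   // LOG_HEADER_FORMAT_CURRENT
constexpr std::uint32_t kSubformat10_2 = 1;  // subformat used by 10.2

// Sketch of the rule for a 10.2 server that sees a format-103 log:
// - subformat 1 is its own log, so redo log apply (crash recovery) is fine;
// - any other subformat is a genuine 10.3 log, which is accepted only if
//   it is clean, in which case redo log apply is skipped.
// Other format identifiers are not modelled and are refused in this sketch.
Startup decide_startup(std::uint32_t format, std::uint32_t subformat,
                       bool log_is_clean)
{
  if (format != kFormat10_3) {
    return Startup::Refuse;  // outside the scope of this sketch
  }
  if (subformat == kSubformat10_2) {
    return Startup::ApplyRedoLog;
  }
  return log_is_clean ? Startup::SkipRedoLog : Startup::Refuse;
}
```

Under these assumptions, a crashed genuine 10.3 log (subformat other than 1, not clean) is refused, while the same log after a clean shutdown is accepted without applying any redo log.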
MDEV-13564: Implement innodb_unsafe_truncate=ON for compatibility
While MariaDB Server 10.2 is not really guaranteed to be compatible with Percona XtraBackup 2.4 (for example, the MySQL 5.7 undo log format change that could be present in XtraBackup, but was reverted from MariaDB in MDEV-12289), we do not want to disrupt users who have deployed xtrabackup and MariaDB Server 10.2 in their environments.
With this change, MariaDB 10.2 will continue to use the backup-unsafe TRUNCATE TABLE code, so that neither the undo log nor the redo log formats will change in an incompatible way.
Undo tablespace truncation will keep using the redo log only. Recovery or backup with old code will fail to shrink the undo tablespace files, but the contents will be recovered just fine.
In the MariaDB Server 10.2 series only, we introduce the configuration parameter innodb_unsafe_truncate and make it ON by default. To allow MariaDB Backup (mariabackup) to work properly with TRUNCATE TABLE operations, use loose_innodb_unsafe_truncate=OFF.
MariaDB Server 10.3.10 and later releases will always use the backup-safe TRUNCATE TABLE, and this parameter will not be added there.
recv_recovery_rollback_active(): Skip row_mysql_drop_garbage_tables() unless innodb_unsafe_truncate=OFF. It is too unsafe to drop orphan tables if RENAME operations are not transactional within InnoDB.
LOG_HEADER_FORMAT_10_3: Replaces LOG_HEADER_FORMAT_CURRENT.
log_init(), log_group_file_header_flush(), srv_prepare_to_delete_redo_log_files(), innobase_start_or_create_for_mysql(): Choose the redo log format and subformat based on the value of innodb_unsafe_truncate.
7 years ago
MDEV-14717: Prevent crash-downgrade to earlier MariaDB 10.2

A crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version would trigger a debug assertion failure during rollback, in trx_roll_pop_top_rec_of_trx(). In a non-debug build, the TRX_UNDO_RENAME_TABLE record would be misinterpreted as an update_undo log record, and typically the file name would be interpreted as DB_TRX_ID,DB_ROLL_PTR,PRIMARY KEY. If a matching record would be found, row_undo_mod() would hit ut_error in switch (node->rec_type). Typically, ut_a(table2 == NULL) would fail when opening the table from SQL.

Because of this, we prevent a crash-downgrade to earlier MariaDB 10.2 versions by changing the InnoDB redo log format identifier to the 10.3 identifier, and by introducing a subformat identifier so that 10.2 can continue to refuse crash-downgrade from 10.3 or later.

After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would still be possible thanks to MDEV-14909. A downgrade to older 10.2 versions is only possible after removing the log files (not recommended).

LOG_HEADER_FORMAT_CURRENT: Change to 103 (originally the 10.3 format).

log_group_t: Add subformat. For 10.2, we will use subformat 1, and will refuse crash recovery from any other subformat of the 10.3 format, that is, a genuine 10.3 redo log.

recv_find_max_checkpoint(): Allow startup after clean shutdown from a future LOG_HEADER_FORMAT_10_4 (unencrypted only). We cannot handle the encrypted 10.4 redo log block format, which was introduced in MDEV-12041. Allow crash recovery from the original 10.2 format as well as the new format. In Mariabackup --backup, do not allow any startup from 10.3 or 10.4 redo logs.

recv_recovery_from_checkpoint_start(): Skip redo log apply for clean 10.3 redo log, but not for the new 10.2 redo log (10.3 format, subformat 1).

srv_prepare_to_delete_redo_log_files(): On format or subformat mismatch, set srv_log_file_size = 0, so that we will display the correct message.

innobase_start_or_create_for_mysql(): Check for format or subformat mismatch.

xtrabackup_backup_func(): Remove debug assertions that were made redundant by the code changes in recv_find_max_checkpoint().
7 years ago
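The format and subformat rules above can be summarized as a small decision function. This is a hedged sketch rather than the logic of recv_find_max_checkpoint(): classify_redo_log() and LogVerdict are hypothetical, the numeric values other than 103 and the position of the encryption flag are assumptions, and any format not mentioned in the notes is simply refused here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FORMAT_10_2 = 1;               // assumed value
constexpr uint32_t FORMAT_10_3 = 103;             // from the notes above
constexpr uint32_t FORMAT_10_4 = 104;             // assumed value
constexpr uint32_t FORMAT_ENCRYPTED = 1U << 31;   // assumed flag bit

enum class LogVerdict {
    CrashRecoveryOk,    // redo log may be applied
    CleanShutdownOnly,  // startup allowed only if the log is logically empty
    Refuse              // startup refused
};

// What a 10.2 server with this change might do with a given log header.
inline LogVerdict classify_redo_log(uint32_t format, uint32_t subformat)
{
    const bool encrypted = (format & FORMAT_ENCRYPTED) != 0;
    const uint32_t base = format & ~FORMAT_ENCRYPTED;

    switch (base) {
    case FORMAT_10_2:
        // Original 10.2 format: crash recovery stays possible.
        return LogVerdict::CrashRecoveryOk;
    case FORMAT_10_3:
        // Subformat 1 is the new 10.2 log; anything else is a genuine 10.3 log.
        return subformat == 1 ? LogVerdict::CrashRecoveryOk
                              : LogVerdict::CleanShutdownOnly;
    case FORMAT_10_4:
        // The encrypted 10.4 block format cannot be read at all.
        return encrypted ? LogVerdict::Refuse : LogVerdict::CleanShutdownOnly;
    default:
        // Formats outside the scope of this sketch.
        return LogVerdict::Refuse;
    }
}
```

The sketch consults the encryption flag only for the 10.4 format, because that is the only case the notes single out.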
MDEV-12103 Reduce the time of looking for MLOG_CHECKPOINT during crash recovery

This fixes MySQL Bug#80788 in MariaDB 10.2.5.

When I made the InnoDB crash recovery more robust by implementing WL#7142, I also introduced an extra redo log scan pass that can be shortened.

This fix will slightly extend the InnoDB redo log format that I introduced in MySQL 5.7.9 by writing the start LSN of the MLOG_CHECKPOINT mini-transaction to the end of the log checkpoint page, so that recovery can jump straight to it without scanning all the preceding redo log.

LOG_CHECKPOINT_END_LSN: At the end of the checkpoint page, the start LSN of the MLOG_CHECKPOINT mini-transaction. Previously, these bytes were written as 0.

log_write_checkpoint_info(), log_group_checkpoint(): Add the parameter end_lsn for writing LOG_CHECKPOINT_END_LSN.

log_checkpoint(): Remember the LSN at which the MLOG_CHECKPOINT mini-transaction is starting (or at which the redo log ends on shutdown).

recv_init_crash_recovery(): Remove.

recv_group_scan_log_recs(): Add the parameter checkpoint_lsn.

recv_recovery_from_checkpoint_start(): Read LOG_CHECKPOINT_END_LSN and if it is set, start the first scan from it instead of the checkpoint LSN. Improve some messages and remove bogus assertions.

recv_parse_log_recs(): Do not skip DBUG_PRINT("ib_log") for some file-level redo log records.

recv_parse_or_apply_log_rec_body(): If we have not parsed all redo log between the checkpoint and the corresponding MLOG_CHECKPOINT record, defer the check for MLOG_FILE_DELETE or MLOG_FILE_NAME records to recv_init_crash_recovery_spaces().

recv_init_crash_recovery_spaces(): Refuse recovery if MLOG_FILE_NAME or MLOG_FILE_DELETE records are missing.
9 years ago
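The shortcut for the first scan pass can be illustrated with a small helper. This is a sketch only: first_scan_start() is a hypothetical name, and it assumes that a zero LOG_CHECKPOINT_END_LSN means the field was never written, as with logs from before this change.

```cpp
#include <cassert>
#include <cstdint>

using lsn_t = uint64_t;

// Pick the LSN at which the first redo log scan pass starts.
// checkpoint_end_lsn is the value read from LOG_CHECKPOINT_END_LSN;
// older logs wrote these bytes as 0, so 0 is treated as "not set".
inline lsn_t first_scan_start(lsn_t checkpoint_lsn, lsn_t checkpoint_end_lsn)
{
    if (checkpoint_end_lsn != 0) {
        // Jump straight to the MLOG_CHECKPOINT mini-transaction.
        return checkpoint_end_lsn;
    }
    // Fall back to scanning from the checkpoint itself.
    return checkpoint_lsn;
}
```

The fallback keeps the old behaviour for log files written before the field existed, at the cost of the longer scan.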
Shut down InnoDB after aborted startup.

This fixes memory leaks in tests that cause InnoDB startup to fail.

buf_pool_free_instance(): Also free buf_pool->flush_rbt, which would normally be freed when crash recovery finishes.

fil_node_close_file(), fil_space_free_low(), fil_close_all_files(): Relax some debug assertions to tolerate !srv_was_started.

innodb_shutdown(): Renamed from innobase_shutdown_for_mysql(). Changed the return type to void. Do not assume that all subsystems were started.

que_init(), que_close(): Remove (empty functions).

srv_init(), srv_general_init(): Remove as global functions.

srv_free(): Allow srv_sys=NULL.

srv_get_active_thread_type(): Only return SRV_PURGE if purge really is running.

srv_shutdown_all_bg_threads(): Do not reset srv_start_state. It will be needed by innodb_shutdown().

innobase_start_or_create_for_mysql(): Always call srv_boot() so that innodb_shutdown() can assume that it was called. Make more subsystems dependent on SRV_START_STATE_STAT.

srv_shutdown_bg_undo_sources(): Require SRV_START_STATE_STAT.

trx_sys_close(): Do not assume purge_sys!=NULL. Do not call buf_dblwr_free(), because the doublewrite buffer can exist while the transaction system does not.

logs_empty_and_mark_files_at_shutdown(): Do a faster shutdown if !srv_was_started.

recv_sys_close(): Invoke dblwr.pages.clear() which would normally be invoked by buf_dblwr_process().

recv_recovery_from_checkpoint_start(): Always release log_sys->mutex.

row_mysql_close(): Allow the subsystem not to exist.
9 years ago
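The idea of shutting down only what was actually started can be sketched with a start-state bitmask. The flag names, the Engine type and the subsystem labels below are hypothetical and much simpler than srv_start_state; the sketch only shows why the state must not be reset before shutdown runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical start-state flags, one bit per subsystem.
enum StartState : unsigned {
    START_STATE_NONE = 0,
    START_STATE_IO   = 1U << 0,
    START_STATE_LOG  = 1U << 1,
    START_STATE_STAT = 1U << 2
};

struct Engine {
    unsigned start_state = START_STATE_NONE;
    std::vector<std::string> closed;  // records which subsystems were torn down

    // Startup sets one flag per subsystem; it may abort half-way.
    bool startup(bool fail_before_stat)
    {
        start_state |= START_STATE_IO;
        start_state |= START_STATE_LOG;
        if (fail_before_stat) {
            return false;  // aborted startup: STAT was never started
        }
        start_state |= START_STATE_STAT;
        return true;
    }

    // Shutdown consults the flags instead of assuming a complete startup,
    // and tears subsystems down in reverse order.
    void shutdown()
    {
        if (start_state & START_STATE_STAT) closed.push_back("stat");
        if (start_state & START_STATE_LOG)  closed.push_back("log");
        if (start_state & START_STATE_IO)   closed.push_back("io");
        start_state = START_STATE_NONE;
    }
};

// Helper for experiments: run startup and shutdown, return the teardown order.
inline std::vector<std::string> teardown_order(bool fail_before_stat)
{
    Engine e;
    e.startup(fail_before_stat);
    e.shutdown();
    return e.closed;
}
```

If the flags were cleared when background threads stop, shutdown() would have nothing to consult and the resources acquired by the partial startup would leak.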
MDEV-11782: Redefine the innodb_encrypt_log format

Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from.

Issue notes, not warning messages for rewriting the redo log files.

recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql().

Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption.

LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format.

log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format.

recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED.

srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more.

innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes.

innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums.

log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting.

Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
9 years ago
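How a format flag can drive the "rebuild when the encryption changes" decision might look roughly like this. The bit position of LOG_HEADER_FORMAT_ENCRYPTED is an assumption, and must_rebuild_redo_log() with its parameters is a hypothetical helper, not a function of the server.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit position; the notes only say that this is a flag in the format field.
constexpr uint32_t LOG_HEADER_FORMAT_ENCRYPTED = 1U << 31;

// Whether the on-disk redo log header declares an encrypted format.
inline bool is_encrypted(uint32_t format)
{
    return (format & LOG_HEADER_FORMAT_ENCRYPTED) != 0;
}

// Hypothetical startup check: rebuild the log files when the configured
// size differs from the files on disk, and also when encryption is being
// added or removed.
inline bool must_rebuild_redo_log(uint32_t on_disk_format,
                                  bool want_encryption,
                                  uint64_t on_disk_size,
                                  uint64_t configured_size)
{
    const bool size_changed = on_disk_size != configured_size;
    const bool encryption_changed = is_encrypted(on_disk_format) != want_encryption;
    return size_changed || encryption_changed;
}
```

Keeping the flag in the header lets startup decide this before any key is needed to read the log blocks.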
MDEV-13267 At startup with crash recovery: mtr_t::commit_checkpoint(lsn_t, bool): Assertion `!recv_no_log_write' failed

This is a bogus debug assertion failure that should be possible starting with MariaDB 10.2.2 (which merged WL#7142 via MySQL 5.7.9).

While generating page-change redo log records is strictly out of the question during certain parts of crash recovery, fil_names_clear() is only emitting informational MLOG_FILE_NAME and MLOG_CHECKPOINT records to guarantee that if the server is killed during or soon after the crash recovery, subsequent crash recovery will be possible.

The metadata buffer that fil_names_clear() is flushing to the redo log is being filled by recv_init_crash_recovery_spaces(), right before starting to apply redo log, by invoking fil_names_dirty() on every discovered tablespace for which there are changes to apply.

When it comes to Mariabackup (xtrabackup --prepare), it is strictly out of the question to generate any redo log whatsoever, because that could break the restore of incremental backups by causing LSN deviation. So, the fil_names_dirty() call must be skipped when restoring backups.

recv_recovery_from_checkpoint_start(): Do not invoke fil_names_clear() when restoring a backup.

mtr_t::commit_checkpoint(): Remove the failing assertion. The only caller is fil_names_clear(), and it must be called by recv_recovery_from_checkpoint_start() for normal server startup to be crash-safe. The debug assertion in mtr_t::commit() will still catch rogue redo log writes.
8 years ago
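The difference between normal startup and backup restore can be sketched as a function that lists the informational records that would be written. records_after_scan() and the string representation of records are hypothetical; the sketch only shows the rule that restoring a backup must emit nothing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the informational records that recovery would append to the redo
// log before applying changes. dirty_spaces holds the names of tablespaces
// for which there are changes to apply.
inline std::vector<std::string> records_after_scan(
    const std::vector<std::string>& dirty_spaces,
    bool restoring_backup)
{
    std::vector<std::string> out;
    if (restoring_backup) {
        // No redo log may be generated, or the LSN would deviate and
        // incremental backups could no longer be applied.
        return out;
    }
    for (const std::string& name : dirty_spaces) {
        out.push_back("MLOG_FILE_NAME " + name);
    }
    // A final checkpoint marker makes a second crash recovery possible.
    out.push_back("MLOG_CHECKPOINT");
    return out;
}
```

In this sketch a normal startup with two dirty tablespaces yields three records, while a backup restore yields none.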
Follow-up to MDEV-13407 innodb.drop_table_background failed in buildbot with "Tablespace for table exists"

This is a backport of commit 88aff5f471d3d9ae8ecc2f909bcf5bd0ddd6aa7c.

The InnoDB background DROP TABLE queue is something that we should really remove, but are unable to until we remove dict_operation_lock so that DDL and DML operations can be combined in a single transaction.

Because the queue is not persistent, it is not crash-safe. We should in some way ensure that the deferred-dropped tables will be dropped after server restart.

The existence of two separate transactions complicates the error handling of CREATE TABLE...SELECT. We should really not break locks in DROP TABLE.

Our solution to these problems is to rename the table to a temporary name, and to drop such-named tables on InnoDB startup. Also, the queue will use table IDs instead of names from now on.

check-testcase.test: Ignore #sql-ib*.ibd files, because tables may enter the background DROP TABLE queue shortly before the test finishes.

innodb.drop_table_background: Test CREATE...SELECT and the creation of tables whose file name starts with #sql-ib.

innodb.alter_crash: Adjust the recovery, now that the #sql-ib tables will be dropped on InnoDB startup.

row_mysql_drop_garbage_tables(): New function, to drop all #sql-ib tables on InnoDB startup.

row_drop_table_for_mysql_in_background(): Remove an unnecessary and misplaced call to log_buffer_flush_to_disk(). (The call should have been after the transaction commit. We do not care about flushing the redo log here, because the table would be dropped again at server startup.) Remove the entry from the list after the table no longer exists. If server shutdown has been initiated, empty the list without actually dropping any tables. They will be dropped again on startup.

row_drop_table_for_mysql(): Do not call lock_remove_all_on_table(). Instead, if locks exist, defer the DROP TABLE until they do not exist. If the table name does not start with #sql-ib, rename it to that prefix before adding it to the background DROP TABLE queue.
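The rename-then-drop-on-startup scheme can be sketched with a toy in-memory catalog. This is only an illustration under stated assumptions: Catalog, Table, drop_table(), run_background_drop() and restart() are hypothetical names, the map stands in for the persistent data dictionary, and the generated name "#sql-ib" plus the table ID is a simplification (real names also carry a database prefix).

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Toy table: a name plus a count of locks currently held on it.
struct Table {
	std::string name;
	unsigned locks;
};

struct Catalog {
	std::map<uint64_t, Table> tables;	// stands in for persistent state
	std::deque<uint64_t> drop_queue;	// volatile: lost on restart

	static bool is_garbage_name(const std::string& name)
	{
		return name.compare(0, 7, "#sql-ib") == 0;
	}

	// @return whether the table was dropped immediately
	bool drop_table(uint64_t id)
	{
		std::map<uint64_t, Table>::iterator it = tables.find(id);
		if (it == tables.end()) {
			return false;
		}
		if (it->second.locks == 0) {
			tables.erase(it);
			return true;
		}
		// Locks exist: do not break them. Rename the table to the
		// garbage prefix and queue it by ID for a later attempt.
		if (!is_garbage_name(it->second.name)) {
			it->second.name = "#sql-ib" + std::to_string(id);
		}
		drop_queue.push_back(id);
		return false;
	}

	// Retry queued drops; keep entries whose tables are still locked.
	void run_background_drop()
	{
		for (size_t n = drop_queue.size(); n--; ) {
			const uint64_t id = drop_queue.front();
			drop_queue.pop_front();
			std::map<uint64_t, Table>::iterator it = tables.find(id);
			if (it == tables.end()) {
				continue;	// the table no longer exists
			}
			if (it->second.locks == 0) {
				tables.erase(it);
			} else {
				drop_queue.push_back(id);
			}
		}
	}

	// Simulate a restart: the queue and all locks are gone, so every
	// table carrying the garbage prefix can be dropped right away.
	void restart()
	{
		drop_queue.clear();
		for (std::map<uint64_t, Table>::iterator it = tables.begin();
		     it != tables.end(); ) {
			if (is_garbage_name(it->second.name)) {
				tables.erase(it++);
			} else {
				it->second.locks = 0;
				++it;
			}
		}
	}
};
```

Because the rename is recorded in the persistent state while the queue is not, a crash between the rename and the actual drop leaves behind only tables that the startup pass recognizes by name.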
MDEV-13564 Mariabackup does not work with TRUNCATE

Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP.

Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name.

In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails.

ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove.

row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization.

We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations.

== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed.

recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
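The trimming rule described for recv_addr_trim() can be shown in isolation with a small sketch. It is a simplification under stated assumptions: a std::map keyed by page number stands in for the per-tablespace part of the recovery hash table, and Rec, PageRecs and trim_buffered_records() are hypothetical names. A buffered record is dropped only when its page lies at or beyond the new end of the file and it was logged before the truncation LSN; pages left without records are removed from the map.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A buffered redo log record; only the start LSN matters here.
struct Rec {
	uint64_t start_lsn;
};

// Buffered records of one tablespace, keyed by page number.
typedef std::map<uint32_t, std::vector<Rec> > PageRecs;

// Discard records for pages >= pages that were logged before lsn.
// @return the number of discarded records
size_t trim_buffered_records(PageRecs& recs, uint32_t pages, uint64_t lsn)
{
	size_t discarded = 0;
	for (PageRecs::iterator it = recs.lower_bound(pages);
	     it != recs.end(); ) {
		std::vector<Rec>& list = it->second;
		const size_t before = list.size();
		list.erase(std::remove_if(list.begin(), list.end(),
					  [lsn](const Rec& r) {
						  return r.start_lsn < lsn;
					  }),
			   list.end());
		discarded += before - list.size();
		if (list.empty()) {
			// Nothing left to apply to this page.
			recs.erase(it++);
		} else {
			++it;
		}
	}
	return discarded;
}
```

Records logged at or after the truncation LSN are kept even for pages beyond the trimmed size, because the file may have been extended again after the truncation.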
  1. /*****************************************************************************
  2. Copyright (c) 1997, 2017, Oracle and/or its affiliates. All Rights Reserved.
  3. Copyright (c) 2012, Facebook Inc.
  4. Copyright (c) 2013, 2020, MariaDB Corporation.
  5. This program is free software; you can redistribute it and/or modify it under
  6. the terms of the GNU General Public License as published by the Free Software
  7. Foundation; version 2 of the License.
  8. This program is distributed in the hope that it will be useful, but WITHOUT
  9. ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
  10. FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
  11. You should have received a copy of the GNU General Public License along with
  12. this program; if not, write to the Free Software Foundation, Inc.,
  13. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA
  14. *****************************************************************************/
  15. /**************************************************//**
  16. @file log/log0recv.cc
  17. Recovery
  18. Created 9/20/1997 Heikki Tuuri
  19. *******************************************************/
  20. #include "univ.i"
  21. #include <map>
  22. #include <string>
  23. #include <my_service_manager.h>
  24. #include "log0recv.h"
  25. #ifdef HAVE_MY_AES_H
  26. #include <my_aes.h>
  27. #endif
  28. #include "log0crypt.h"
  29. #include "mem0mem.h"
  30. #include "buf0buf.h"
  31. #include "buf0flu.h"
  32. #include "mtr0mtr.h"
  33. #include "mtr0log.h"
  34. #include "page0cur.h"
  35. #include "page0zip.h"
  36. #include "btr0btr.h"
  37. #include "btr0cur.h"
  38. #include "ibuf0ibuf.h"
  39. #include "trx0undo.h"
  40. #include "trx0rec.h"
  41. #include "fil0fil.h"
  42. #include "row0trunc.h"
  43. #include "buf0rea.h"
  44. #include "srv0srv.h"
  45. #include "srv0start.h"
  46. #include "trx0roll.h"
  47. #include "row0merge.h"
  48. /** Log records are stored in the hash table in chunks at most of this size;
  49. this must be less than UNIV_PAGE_SIZE as it is stored in the buffer pool */
  50. #define RECV_DATA_BLOCK_SIZE (MEM_MAX_ALLOC_IN_BUF - sizeof(recv_data_t) - REDZONE_SIZE)
  51. /** Read-ahead area in applying log records to file pages */
  52. #define RECV_READ_AHEAD_AREA 32
  53. /** The recovery system */
  54. recv_sys_t* recv_sys;
  55. /** TRUE when applying redo log records during crash recovery; FALSE
  56. otherwise. Note that this is FALSE while a background thread is
  57. rolling back incomplete transactions. */
  58. volatile bool recv_recovery_on;
  59. /** TRUE when recv_init_crash_recovery() has been called. */
  60. bool recv_needed_recovery;
  61. #ifdef UNIV_DEBUG
  62. /** TRUE if writing to the redo log (mtr_commit) is forbidden.
  63. Protected by log_sys->mutex. */
  64. bool recv_no_log_write = false;
  65. #endif /* UNIV_DEBUG */
  66. /** TRUE if buf_page_is_corrupted() should check if the log sequence
  67. number (FIL_PAGE_LSN) is in the future. Initially FALSE, and set by
  68. recv_recovery_from_checkpoint_start(). */
  69. bool recv_lsn_checks_on;
  70. /** If the following is TRUE, the buffer pool file pages must be invalidated
  71. after recovery and no ibuf operations are allowed; this becomes TRUE if
  72. the log record hash table becomes too full, and log records must be merged
  73. to file pages already before the recovery is finished: in this case no
  74. ibuf operations are allowed, as they could modify the pages read in the
  75. buffer pool before the pages have been recovered to the up-to-date state.
  76. TRUE means that recovery is running and no operations on the log files
  77. are allowed yet: the variable name is misleading. */
  78. bool recv_no_ibuf_operations;
  79. /** The type of the previous parsed redo log record */
  80. static mlog_id_t recv_previous_parsed_rec_type;
  81. /** The offset of the previous parsed redo log record */
  82. static ulint recv_previous_parsed_rec_offset;
  83. /** The 'multi' flag of the previous parsed redo log record */
  84. static ulint recv_previous_parsed_rec_is_multi;
  85. /** The maximum lsn we see for a page during the recovery process. If this
  86. is bigger than the lsn we are able to scan up to, that is an indication that
  87. the recovery failed and the database may be corrupt. */
  88. static lsn_t recv_max_page_lsn;
  89. #ifdef UNIV_PFS_THREAD
  90. mysql_pfs_key_t trx_rollback_clean_thread_key;
  91. mysql_pfs_key_t recv_writer_thread_key;
  92. #endif /* UNIV_PFS_THREAD */
  93. /** Is recv_writer_thread active? */
  94. bool recv_writer_thread_active;
  95. #ifndef DBUG_OFF
  96. /** Return string name of the redo log record type.
  97. @param[in] type redo log record type
  98. @return string name of the redo log record type */
  99. static const char* get_mlog_string(mlog_id_t type);
  100. #endif /* !DBUG_OFF */
  101. /** Tablespace item during recovery */
  102. struct file_name_t {
  103. /** Tablespace file name (MLOG_FILE_NAME) */
  104. std::string name;
  105. /** Tablespace object (NULL if not valid or not found) */
  106. fil_space_t* space;
  107. /** Tablespace status. */
  108. enum fil_status {
  109. /** Normal tablespace */
  110. NORMAL,
  111. /** Deleted tablespace */
  112. DELETED,
  113. /** Missing tablespace */
  114. MISSING
  115. };
  116. /** Status of the tablespace */
  117. fil_status status;
  118. /** FSP_SIZE of tablespace */
  119. ulint size;
  120. /** the log sequence number of the last observed MLOG_INDEX_LOAD
  121. record for the tablespace */
  122. lsn_t enable_lsn;
  123. /** Constructor */
  124. file_name_t(std::string name_, bool deleted) :
  125. name(name_), space(NULL), status(deleted ? DELETED: NORMAL),
  126. size(0), enable_lsn(0) {}
  127. /** Report a MLOG_INDEX_LOAD operation, meaning that
  128. mlog_init for any earlier LSN must be skipped.
  129. @param lsn log sequence number of the MLOG_INDEX_LOAD */
  130. void mlog_index_load(lsn_t lsn)
  131. {
  132. if (enable_lsn < lsn) enable_lsn = lsn;
  133. }
  134. };
  135. /** Map of dirty tablespaces during recovery */
  136. typedef std::map<
  137. ulint,
  138. file_name_t,
  139. std::less<ulint>,
  140. ut_allocator<std::pair<const ulint, file_name_t> > > recv_spaces_t;
  141. static recv_spaces_t recv_spaces;
  142. /** States of recv_addr_t */
  143. enum recv_addr_state {
  144. /** not yet processed */
  145. RECV_NOT_PROCESSED,
  146. /** not processed; the page will be reinitialized */
  147. RECV_WILL_NOT_READ,
  148. /** page is being read */
  149. RECV_BEING_READ,
  150. /** log records are being applied on the page */
  151. RECV_BEING_PROCESSED,
  152. /** log records have been applied on the page */
  153. RECV_PROCESSED,
  154. /** log records have been discarded because the tablespace
  155. does not exist */
  156. RECV_DISCARDED
  157. };
  158. /** Hashed page file address struct */
  159. struct recv_addr_t{
  160. /** recovery state of the page */
  161. recv_addr_state state;
  162. /** tablespace identifier */
  163. unsigned space:32;
  164. /** page number */
  165. unsigned page_no:32;
  166. /** list of log records for this page */
  167. UT_LIST_BASE_NODE_T(recv_t) rec_list;
  168. /** hash node in the hash bucket chain */
  169. hash_node_t addr_hash;
  170. };
  171. /** Report optimized DDL operation (without redo log),
  172. corresponding to MLOG_INDEX_LOAD.
  173. @param[in] space_id tablespace identifier
  174. */
  175. void (*log_optimized_ddl_op)(ulint space_id);
  176. /** Report backup-unfriendly TRUNCATE operation (with separate log file),
  177. corresponding to MLOG_TRUNCATE. */
  178. void (*log_truncate)();
  179. /** Report an operation to create, delete, or rename a file during backup.
  180. @param[in] space_id tablespace identifier
  181. @param[in] flags tablespace flags (NULL if not create)
  182. @param[in] name file name (not NUL-terminated)
  183. @param[in] len length of name, in bytes
  184. @param[in] new_name new file name (NULL if not rename)
  185. @param[in] new_len length of new_name, in bytes (0 if NULL) */
  186. void (*log_file_op)(ulint space_id, const byte* flags,
  187. const byte* name, ulint len,
  188. const byte* new_name, ulint new_len);
  189. /** Information about initializing page contents during redo log processing */
  190. class mlog_init_t
  191. {
  192. public:
  193. /** A page initialization operation that was parsed from
  194. the redo log */
  195. struct init {
  196. /** log sequence number of the page initialization */
  197. lsn_t lsn;
  198. /** Whether btr_page_create() avoided a read of the page.
  199. At the end of the last recovery batch, ibuf_merge()
  200. will invoke change buffer merge for pages that reside
  201. in the buffer pool. (In the last batch, loading pages
  202. would trigger change buffer merge.) */
  203. bool created;
  204. };
  205. private:
  206. typedef std::map<const page_id_t, init,
  207. std::less<const page_id_t>,
  208. ut_allocator<std::pair<const page_id_t, init> > >
  209. map;
  210. /** Map of page initialization operations.
  211. FIXME: Merge this to recv_sys->addr_hash! */
  212. map inits;
  213. public:
  214. /** Record that a page will be initialized by the redo log.
  215. @param[in] space tablespace identifier
  216. @param[in] page_no page number
  217. @param[in] lsn log sequence number */
  218. void add(ulint space, ulint page_no, lsn_t lsn)
  219. {
  220. ut_ad(mutex_own(&recv_sys->mutex));
  221. const init init = { lsn, false };
  222. std::pair<map::iterator, bool> p = inits.insert(
  223. map::value_type(page_id_t(space, page_no), init));
  224. ut_ad(!p.first->second.created);
  225. if (!p.second && p.first->second.lsn < init.lsn) {
  226. p.first->second = init;
  227. }
  228. }
  229. /** Get the latest page initialization recorded for a page.
  230. The page must have been registered by add(); otherwise
  231. the lookup would dereference inits.end().
  232. @param[in] page_id page id
  233. @return the latest page initialization;
  234. not valid after releasing recv_sys->mutex. */
  235. init& last(page_id_t page_id)
  236. {
  237. ut_ad(mutex_own(&recv_sys->mutex));
  238. return inits.find(page_id)->second;
  239. }
  240. /** At the end of each recovery batch, reset the 'created' flags. */
  241. void reset()
  242. {
  243. ut_ad(mutex_own(&recv_sys->mutex));
  244. ut_ad(recv_no_ibuf_operations);
  245. for (map::iterator i= inits.begin(); i != inits.end(); i++) {
  246. i->second.created = false;
  247. }
  248. }
  249. /** On the last recovery batch, merge buffered changes to those
  250. pages that were initialized by buf_page_create() and still reside
  251. in the buffer pool. Stale pages are not allowed in the buffer pool.
  252. Note: When MDEV-14481 implements redo log apply in the
  253. background, we will have to ensure that buf_page_get_gen()
  254. will not deliver stale pages to users (pages on which the
  255. change buffer was not merged yet). Normally, the change
  256. buffer merge is performed on I/O completion. Maybe, add a
  257. flag to buf_page_t and perform the change buffer merge on
  258. the first actual access?
  259. @param[in,out] mtr dummy mini-transaction */
  260. void ibuf_merge(mtr_t& mtr)
  261. {
  262. ut_ad(mutex_own(&recv_sys->mutex));
  263. ut_ad(!recv_no_ibuf_operations);
  264. mtr.start();
  265. for (map::const_iterator i= inits.begin(); i != inits.end();
  266. i++) {
  267. if (!i->second.created) {
  268. continue;
  269. }
  270. if (buf_block_t* block = buf_page_get_low(
  271. i->first, univ_page_size, RW_X_LATCH, NULL,
  272. BUF_GET_IF_IN_POOL, __FILE__, __LINE__,
  273. &mtr, NULL)) {
  274. mutex_exit(&recv_sys->mutex);
  275. ibuf_merge_or_delete_for_page(
  276. block, i->first,
  277. &block->page.size, true);
  278. mtr.commit();
  279. mtr.start();
  280. mutex_enter(&recv_sys->mutex);
  281. }
  282. }
  283. mtr.commit();
  284. }
  285. /** Clear the data structure */
  286. void clear() { inits.clear(); }
  287. };
  288. static mlog_init_t mlog_init;
  289. /** Process a MLOG_FILE_CREATE2 record that indicates that a tablespace
  290. is being shrunk in size.
  291. @param[in] space_id tablespace identifier
  292. @param[in] pages trimmed size of the file, in pages
  293. @param[in] lsn log sequence number of the operation */
  294. static void recv_addr_trim(ulint space_id, unsigned pages, lsn_t lsn)
  295. {
  296. DBUG_ENTER("recv_addr_trim");
  297. DBUG_LOG("ib_log",
  298. "discarding log beyond end of tablespace "
  299. << page_id_t(space_id, pages) << " before LSN " << lsn);
  300. ut_ad(mutex_own(&recv_sys->mutex));
  301. for (ulint i = recv_sys->addr_hash->n_cells; i--; ) {
  302. hash_cell_t* const cell = hash_get_nth_cell(
  303. recv_sys->addr_hash, i);
  304. for (recv_addr_t* addr = static_cast<recv_addr_t*>(cell->node),
  305. *next;
  306. addr; addr = next) {
  307. next = static_cast<recv_addr_t*>(addr->addr_hash);
  308. if (addr->space != space_id || addr->page_no < pages) {
  309. continue;
  310. }
  311. for (recv_t* recv = UT_LIST_GET_FIRST(addr->rec_list);
  312. recv; ) {
  313. recv_t* n = UT_LIST_GET_NEXT(rec_list, recv);
  314. if (recv->start_lsn < lsn) {
  315. DBUG_PRINT("ib_log",
  316. ("Discarding %s for"
  317. " page %u:%u at " LSN_PF,
  318. get_mlog_string(
  319. recv->type),
  320. addr->space, addr->page_no,
  321. recv->start_lsn));
  322. UT_LIST_REMOVE(addr->rec_list, recv);
  323. }
  324. recv = n;
  325. }
  326. }
  327. }
  328. if (fil_space_t* space = fil_space_get(space_id)) {
  329. ut_ad(UT_LIST_GET_LEN(space->chain) == 1);
  330. fil_node_t* file = UT_LIST_GET_FIRST(space->chain);
  331. ut_ad(file->is_open());
  332. os_file_truncate(file->name, file->handle,
  333. os_offset_t(pages) << srv_page_size_shift,
  334. true);
  335. }
  336. DBUG_VOID_RETURN;
  337. }
  338. /** Process a file name from a MLOG_FILE_* record.
  339. @param[in,out] name file name
  340. @param[in] len length of the file name
  341. @param[in] space_id the tablespace ID
  342. @param[in] deleted whether this is a MLOG_FILE_DELETE record */
  343. static
  344. void
  345. fil_name_process(
  346. char* name,
  347. ulint len,
  348. ulint space_id,
  349. bool deleted)
  350. {
  351. if (srv_operation == SRV_OPERATION_BACKUP) {
  352. return;
  353. }
  354. ut_ad(srv_operation == SRV_OPERATION_NORMAL
  355. || srv_operation == SRV_OPERATION_RESTORE
  356. || srv_operation == SRV_OPERATION_RESTORE_EXPORT);
  357. /* We will also insert space=NULL into the map, so that
  358. further checks can ensure that a MLOG_FILE_NAME record was
  359. scanned before applying any page records for the space_id. */
  360. os_normalize_path(name);
  361. file_name_t fname(std::string(name, len - 1), deleted);
  362. std::pair<recv_spaces_t::iterator,bool> p = recv_spaces.insert(
  363. std::make_pair(space_id, fname));
  364. ut_ad(p.first->first == space_id);
  365. file_name_t& f = p.first->second;
  366. if (deleted) {
  367. /* Got MLOG_FILE_DELETE */
  368. if (!p.second && f.status != file_name_t::DELETED) {
  369. f.status = file_name_t::DELETED;
  370. if (f.space != NULL) {
  371. fil_space_free(space_id, false);
  372. f.space = NULL;
  373. }
  374. }
  375. ut_ad(f.space == NULL);
  376. } else if (p.second // the first MLOG_FILE_NAME or MLOG_FILE_RENAME2
  377. || f.name != fname.name) {
  378. fil_space_t* space;
  379. /* Check if the tablespace file exists and contains
  380. the space_id. If not, ignore the file after displaying
  381. a note. Abort if there are multiple files with the
  382. same space_id. */
  383. switch (fil_ibd_load(space_id, name, space)) {
  384. case FIL_LOAD_OK:
  385. ut_ad(space != NULL);
  386. if (f.space == NULL || f.space == space) {
  387. if (f.size && f.space == NULL) {
  388. fil_space_set_recv_size(space->id, f.size);
  389. }
  390. f.name = fname.name;
  391. f.space = space;
  392. f.status = file_name_t::NORMAL;
  393. } else {
  394. ib::error() << "Tablespace " << space_id
  395. << " has been found in two places: '"
  396. << f.name << "' and '" << name << "'."
  397. " You must delete one of them.";
  398. recv_sys->found_corrupt_fs = true;
  399. }
  400. break;
  401. case FIL_LOAD_ID_CHANGED:
  402. ut_ad(space == NULL);
  403. break;
  404. case FIL_LOAD_NOT_FOUND:
  405. /* No matching tablespace was found; maybe it
  406. was renamed, and we will find a subsequent
  407. MLOG_FILE_* record. */
  408. ut_ad(space == NULL);
  409. if (srv_force_recovery) {
  410. /* Without innodb_force_recovery,
  411. missing tablespaces will only be
  412. reported in
  413. recv_init_crash_recovery_spaces().
  414. Enable some more diagnostics when
  415. forcing recovery. */
  416. ib::info()
  417. << "At LSN: " << recv_sys->recovered_lsn
  418. << ": unable to open file " << name
  419. << " for tablespace " << space_id;
  420. }
  421. break;
  422. case FIL_LOAD_INVALID:
  423. ut_ad(space == NULL);
  424. if (srv_force_recovery == 0) {
  425. ib::warn() << "We do not continue the crash"
  426. " recovery, because the table may"
  427. " become corrupt if we cannot apply"
  428. " the log records in the InnoDB log to"
  429. " it. To fix the problem and start"
  430. " mysqld:";
  431. ib::info() << "1) If there is a permission"
  432. " problem in the file and mysqld"
  433. " cannot open the file, you should"
  434. " modify the permissions.";
  435. ib::info() << "2) If the tablespace is not"
  436. " needed, or you can restore an older"
  437. " version from a backup, then you can"
  438. " remove the .ibd file, and use"
  439. " --innodb_force_recovery=1 to force"
  440. " startup without this file.";
  441. ib::info() << "3) If the file system or the"
  442. " disk is broken, and you cannot"
  443. " remove the .ibd file, you can set"
  444. " --innodb_force_recovery.";
  445. recv_sys->found_corrupt_fs = true;
  446. break;
  447. }
  448. ib::info() << "innodb_force_recovery was set to "
  449. << srv_force_recovery << ". Continuing crash"
  450. " recovery even though we cannot access the"
  451. " files for tablespace " << space_id << ".";
  452. break;
  453. }
  454. }
  455. }
  456. /** Parse or process a MLOG_FILE_* record.
  457. @param[in] ptr redo log record
  458. @param[in] end end of the redo log buffer
  459. @param[in] space_id the tablespace ID
  460. @param[in] first_page_no first page number in the file
  461. @param[in] type MLOG_FILE_NAME or MLOG_FILE_DELETE
  462. or MLOG_FILE_CREATE2 or MLOG_FILE_RENAME2
  463. @param[in] apply whether to apply the record
  464. @return pointer to next redo log record
  465. @retval NULL if this log record was truncated */
  466. static
  467. byte*
  468. fil_name_parse(
  469. byte* ptr,
  470. const byte* end,
  471. ulint space_id,
  472. ulint first_page_no,
  473. mlog_id_t type,
  474. bool apply)
  475. {
  476. if (type == MLOG_FILE_CREATE2) {
  477. if (end < ptr + 4) {
  478. return(NULL);
  479. }
  480. ptr += 4;
  481. }
  482. if (end < ptr + 2) {
  483. return(NULL);
  484. }
  485. ulint len = mach_read_from_2(ptr);
  486. ptr += 2;
  487. if (end < ptr + len) {
  488. return(NULL);
  489. }
  490. /* MLOG_FILE_* records should only be written for
  491. user-created tablespaces. The name must be long enough
  492. and end in .ibd. */
  493. bool corrupt = is_predefined_tablespace(space_id)
  494. || len < sizeof "/a.ibd\0"
  495. || (!first_page_no != !memcmp(ptr + len - 5, DOT_IBD, 5));
  496. if (!corrupt && !memchr(ptr, OS_PATH_SEPARATOR, len)) {
  497. if (byte* c = static_cast<byte*>
  498. (memchr(ptr, OS_PATH_SEPARATOR_ALT, len))) {
  499. ut_ad(c >= ptr);
  500. ut_ad(c < ptr + len);
  501. do {
  502. *c = OS_PATH_SEPARATOR;
  503. } while ((c = static_cast<byte*>
  504. (memchr(c, OS_PATH_SEPARATOR_ALT,
  505. len - ulint(c - ptr)))) != NULL);
  506. } else {
  507. corrupt = true;
  508. }
  509. }
  510. byte* end_ptr = ptr + len;
  511. switch (type) {
  512. default:
  513. ut_ad(0); // the caller checked this
  514. /* fall through */
  515. case MLOG_FILE_NAME:
  516. if (corrupt) {
  517. ib::error() << "MLOG_FILE_NAME incorrect:" << ptr;
  518. recv_sys->found_corrupt_log = true;
  519. break;
  520. }
  521. fil_name_process(
  522. reinterpret_cast<char*>(ptr), len, space_id, false);
  523. break;
  524. case MLOG_FILE_DELETE:
  525. if (corrupt) {
  526. ib::error() << "MLOG_FILE_DELETE incorrect:" << ptr;
  527. recv_sys->found_corrupt_log = true;
  528. break;
  529. }
  530. fil_name_process(
  531. reinterpret_cast<char*>(ptr), len, space_id, true);
  532. /* fall through */
  533. case MLOG_FILE_CREATE2:
  534. if (first_page_no) {
  535. ut_ad(first_page_no
  536. == SRV_UNDO_TABLESPACE_SIZE_IN_PAGES);
  537. ut_a(srv_is_undo_tablespace(space_id));
  538. compile_time_assert(
  539. UT_ARR_SIZE(recv_sys->truncated_undo_spaces)
  540. == TRX_SYS_MAX_UNDO_SPACES);
  541. recv_sys_t::trunc& t = recv_sys->truncated_undo_spaces[
  542. space_id - srv_undo_space_id_start];
  543. t.lsn = recv_sys->recovered_lsn;
  544. t.pages = uint32_t(first_page_no);
  545. } else if (log_file_op) {
  546. log_file_op(space_id,
  547. type == MLOG_FILE_CREATE2 ? ptr - 4 : NULL,
  548. ptr, len, NULL, 0);
  549. }
  550. break;
  551. case MLOG_FILE_RENAME2:
  552. if (corrupt) {
  553. ib::error() << "MLOG_FILE_RENAME2 incorrect:" << ptr;
  554. recv_sys->found_corrupt_log = true;
  555. }
  556. /* The new name follows the old name. */
  557. byte* new_name = end_ptr + 2;
  558. if (end < new_name) {
  559. return(NULL);
  560. }
  561. ulint new_len = mach_read_from_2(end_ptr);
  562. if (end < end_ptr + 2 + new_len) {
  563. return(NULL);
  564. }
  565. end_ptr += 2 + new_len;
  566. corrupt = corrupt
  567. || new_len < sizeof "/a.ibd\0"
  568. || memcmp(new_name + new_len - 5, DOT_IBD, 5) != 0;
  569. if (!corrupt && !memchr(new_name, OS_PATH_SEPARATOR, new_len)) {
  570. if (byte* c = static_cast<byte*>
  571. (memchr(new_name, OS_PATH_SEPARATOR_ALT,
  572. new_len))) {
  573. ut_ad(c >= new_name);
  574. ut_ad(c < new_name + new_len);
  575. do {
  576. *c = OS_PATH_SEPARATOR;
  577. } while ((c = static_cast<byte*>
  578. (memchr(c, OS_PATH_SEPARATOR_ALT,
  579. new_len
  580. - ulint(c - new_name))))
  581. != NULL);
  582. } else {
  583. corrupt = true;
  584. }
  585. }
  586. if (corrupt) {
  587. ib::error() << "MLOG_FILE_RENAME2 new_name incorrect:" << ptr
  588. << " new_name: " << new_name;
  589. recv_sys->found_corrupt_log = true;
  590. break;
  591. }
  592. fil_name_process(
  593. reinterpret_cast<char*>(ptr), len,
  594. space_id, false);
  595. fil_name_process(
  596. reinterpret_cast<char*>(new_name), new_len,
  597. space_id, false);
  598. if (log_file_op) {
  599. log_file_op(space_id, NULL,
  600. ptr, len, new_name, new_len);
  601. }
  602. if (!apply) {
  603. break;
  604. }
  605. if (!fil_op_replay_rename(
  606. space_id, first_page_no,
  607. reinterpret_cast<const char*>(ptr),
  608. reinterpret_cast<const char*>(new_name))) {
  609. recv_sys->found_corrupt_fs = true;
  610. }
  611. }
  612. return(end_ptr);
  613. }
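
/* Illustrative sketch, not part of this file: the MLOG_FILE_RENAME2 branch
above reads a 2-byte length followed by the new file name, returns NULL when
the buffer does not yet hold the whole record, and flags the record as corrupt
when the name is too short or does not end in ".ibd". The standalone helper
below mirrors those checks under the assumption that the length is stored
big-endian, as mach_read_from_2() reads it. The names parsed_name, read_be16
and parse_name are hypothetical and exist only for this example. */
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

/* Hypothetical stand-in for mach_read_from_2(): 2-byte big-endian read. */
inline std::size_t read_be16(const unsigned char* p)
{
	return (std::size_t(p[0]) << 8) | p[1];
}

/* Result of parsing one length-prefixed file name. */
struct parsed_name {
	const unsigned char*	name;	/* start of the name; nullptr if incomplete */
	std::size_t		len;	/* name length, including the NUL terminator */
	const unsigned char*	next;	/* first byte after the name */
	bool			corrupt;/* true if the name fails the sanity checks */
};

/* Parse [2-byte length][name bytes] from the buffer [ptr, end). */
parsed_name parse_name(const unsigned char* ptr, const unsigned char* end)
{
	parsed_name r = {nullptr, 0, nullptr, false};
	if (static_cast<std::size_t>(end - ptr) < 2) {
		return r;	/* the length field itself is incomplete */
	}
	const std::size_t len = read_be16(ptr);
	if (static_cast<std::size_t>(end - (ptr + 2)) < len) {
		return r;	/* the name is not fully buffered yet */
	}
	r.name = ptr + 2;
	r.len = len;
	r.next = r.name + len;
	/* Same tests as above: a minimum length, and a ".ibd" suffix
	compared together with its NUL terminator (5 bytes). */
	r.corrupt = len < sizeof "/a.ibd\0"
		|| std::memcmp(r.name + len - 5, ".ibd", 5) != 0;
	return r;
}
```
/* A caller would treat name == nullptr as "wait for more log data" and
corrupt == true as a damaged record, which is the distinction the code above
draws between return(NULL) and found_corrupt_log. */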

/** Clean up after recv_sys_init() */
void
recv_sys_close()
{
	if (recv_sys != NULL) {
		recv_sys->dblwr.pages.clear();
		if (recv_sys->addr_hash != NULL) {
			hash_table_free(recv_sys->addr_hash);
		}
		if (recv_sys->heap != NULL) {
			mem_heap_free(recv_sys->heap);
		}
		if (recv_sys->flush_start != NULL) {
			os_event_destroy(recv_sys->flush_start);
		}
		if (recv_sys->flush_end != NULL) {
			os_event_destroy(recv_sys->flush_end);
		}
		ut_free(recv_sys->buf);
		ut_ad(!recv_writer_thread_active);
		mutex_free(&recv_sys->writer_mutex);
		mutex_free(&recv_sys->mutex);
		ut_free(recv_sys);
		recv_sys = NULL;
	}
	recv_spaces.clear();
	mlog_init.clear();
}

/************************************************************
Reset the state of the recovery system variables. */
void
recv_sys_var_init(void)
/*===================*/
{
	recv_recovery_on = false;
	recv_needed_recovery = false;
	recv_lsn_checks_on = false;
	recv_no_ibuf_operations = false;
	recv_previous_parsed_rec_type = MLOG_SINGLE_REC_FLAG;
	recv_previous_parsed_rec_offset = 0;
	recv_previous_parsed_rec_is_multi = 0;
	recv_max_page_lsn = 0;
}

/******************************************************************//**
recv_writer thread tasked with flushing dirty pages from the buffer
pools.
@return a dummy parameter */
extern "C"
os_thread_ret_t
DECLARE_THREAD(recv_writer_thread)(
/*===============================*/
	void*	arg MY_ATTRIBUTE((unused)))
			/*!< in: a dummy parameter required by
			os_thread_create */
{
	my_thread_init();
	ut_ad(!srv_read_only_mode);
#ifdef UNIV_PFS_THREAD
	pfs_register_thread(recv_writer_thread_key);
#endif /* UNIV_PFS_THREAD */
#ifdef UNIV_DEBUG_THREAD_CREATION
	ib::info() << "recv_writer thread running, id "
		<< os_thread_pf(os_thread_get_curr_id());
#endif /* UNIV_DEBUG_THREAD_CREATION */
	while (srv_shutdown_state == SRV_SHUTDOWN_NONE) {
		/* Wait till we get a signal to clean the LRU list.
		Bounded by max wait time of 100ms. */
		ib_uint64_t	sig_count = os_event_reset(buf_flush_event);
		os_event_wait_time_low(buf_flush_event, 100000, sig_count);
		mutex_enter(&recv_sys->writer_mutex);
		if (!recv_recovery_is_on()) {
			mutex_exit(&recv_sys->writer_mutex);
			break;
		}
		/* Flush pages from end of LRU if required */
		os_event_reset(recv_sys->flush_end);
		recv_sys->flush_type = BUF_FLUSH_LRU;
		os_event_set(recv_sys->flush_start);
		os_event_wait(recv_sys->flush_end);
		mutex_exit(&recv_sys->writer_mutex);
	}
	recv_writer_thread_active = false;
	my_thread_end();
	/* We count the number of threads in os_thread_exit().
	A created thread should always use that to exit and not
	use return() to exit. */
	os_thread_exit();
	OS_THREAD_DUMMY_RETURN;
}

/** Initialize the redo log recovery subsystem. */
void
recv_sys_init()
{
	ut_ad(recv_sys == NULL);
	recv_sys = static_cast<recv_sys_t*>(ut_zalloc_nokey(sizeof(*recv_sys)));
	mutex_create(LATCH_ID_RECV_SYS, &recv_sys->mutex);
	mutex_create(LATCH_ID_RECV_WRITER, &recv_sys->writer_mutex);
	recv_sys->heap = mem_heap_create_typed(256, MEM_HEAP_FOR_RECV_SYS);
	if (!srv_read_only_mode) {
		recv_sys->flush_start = os_event_create(0);
		recv_sys->flush_end = os_event_create(0);
	}
	recv_sys->buf = static_cast<byte*>(
		ut_malloc_nokey(RECV_PARSING_BUF_SIZE));
	recv_sys->addr_hash = hash_create(buf_pool_get_curr_size() / 512);
	recv_sys->progress_time = time(NULL);
	recv_max_page_lsn = 0;
	/* Call the constructor for recv_sys_t::dblwr member */
	new (&recv_sys->dblwr) recv_dblwr_t();
}

/** Empty a fully processed hash table. */
static
void
recv_sys_empty_hash()
{
	ut_ad(mutex_own(&(recv_sys->mutex)));
	ut_a(recv_sys->n_addrs == 0);
	hash_table_free(recv_sys->addr_hash);
	mem_heap_empty(recv_sys->heap);
	recv_sys->addr_hash = hash_create(buf_pool_get_curr_size() / 512);
}

/********************************************************//**
Frees the recovery system. */
void
recv_sys_debug_free(void)
/*=====================*/
{
	mutex_enter(&(recv_sys->mutex));
	hash_table_free(recv_sys->addr_hash);
	mem_heap_free(recv_sys->heap);
	ut_free(recv_sys->buf);
	recv_sys->buf = NULL;
	recv_sys->heap = NULL;
	recv_sys->addr_hash = NULL;
	/* wake page cleaner up to progress */
	if (!srv_read_only_mode) {
		ut_ad(!recv_recovery_is_on());
		ut_ad(!recv_writer_thread_active);
		os_event_reset(buf_flush_event);
		os_event_set(recv_sys->flush_start);
	}
	mutex_exit(&(recv_sys->mutex));
}

/** Read a log segment to a buffer.
@param[out]	buf		buffer
@param[in]	group		redo log files
@param[in,out]	start_lsn	in: read area start; out: the last valid LSN read
@param[in]	end_lsn		read area end
@return false if an invalid (possibly incompletely written) block was
encountered (e.g. a checksum mismatch), true otherwise */
bool
log_group_read_log_seg(
	byte*			buf,
	const log_group_t*	group,
	lsn_t			*start_lsn,
	lsn_t			end_lsn)
{
	ulint	len;
	lsn_t	source_offset;
	bool	success = true;
	ut_ad(log_mutex_own());
	ut_ad(!(*start_lsn % OS_FILE_LOG_BLOCK_SIZE));
	ut_ad(!(end_lsn % OS_FILE_LOG_BLOCK_SIZE));
loop:
	source_offset = log_group_calc_lsn_offset(*start_lsn, group);
	ut_a(end_lsn - *start_lsn <= ULINT_MAX);
	len = (ulint) (end_lsn - *start_lsn);
	ut_ad(len != 0);
	const bool at_eof = (source_offset % group->file_size) + len
		> group->file_size;
	if (at_eof) {
		/* If the above condition is true then len (which is ulint)
		is > the expression below, so the typecast is ok */
		len = (ulint) (group->file_size -
			(source_offset % group->file_size));
	}
	log_sys->n_log_ios++;
	MONITOR_INC(MONITOR_LOG_IO);
	ut_a(source_offset / UNIV_PAGE_SIZE <= ULINT_MAX);
	const ulint	page_no
		= (ulint) (source_offset / univ_page_size.physical());
	fil_io(IORequestLogRead, true,
		page_id_t(SRV_LOG_SPACE_FIRST_ID, page_no),
		univ_page_size,
		(ulint) (source_offset % univ_page_size.physical()),
		len, buf, NULL);
	for (ulint l = 0; l < len; l += OS_FILE_LOG_BLOCK_SIZE,
		buf += OS_FILE_LOG_BLOCK_SIZE,
		(*start_lsn) += OS_FILE_LOG_BLOCK_SIZE) {
		const ulint block_number = log_block_get_hdr_no(buf);
		if (block_number != log_block_convert_lsn_to_no(*start_lsn)) {
			/* Garbage or an incompletely written log block.
			We will not report any error, because this can
			happen when InnoDB was killed while it was
			writing redo log. We simply treat this as an
			abrupt end of the redo log. */
fail:
			end_lsn = *start_lsn;
			success = false;
			break;
		}
		if (innodb_log_checksums || group->is_encrypted()) {
			ulint crc = log_block_calc_checksum_crc32(buf);
			ulint cksum = log_block_get_checksum(buf);
			DBUG_EXECUTE_IF("log_intermittent_checksum_mismatch", {
				static int block_counter;
				if (block_counter++ == 0) {
					cksum = crc + 1;
				}
			});
			if (crc != cksum) {
				ib::error() << "Invalid log block checksum."
					<< " block: " << block_number
					<< " checkpoint no: "
					<< log_block_get_checkpoint_no(buf)
					<< " expected: " << crc
					<< " found: " << cksum;
				goto fail;
			}
			if (group->is_encrypted()) {
				log_crypt(buf, *start_lsn,
					OS_FILE_LOG_BLOCK_SIZE, true);
			}
		}
		ulint dl = log_block_get_data_len(buf);
		if (dl < LOG_BLOCK_HDR_SIZE
		    || (dl > OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_TRL_SIZE
			&& dl != OS_FILE_LOG_BLOCK_SIZE)) {
			recv_sys->found_corrupt_log = true;
			goto fail;
		}
	}
	if (recv_sys->report(time(NULL))) {
		ib::info() << "Read redo log up to LSN=" << *start_lsn;
		service_manager_extend_timeout(INNODB_EXTEND_TIMEOUT_INTERVAL,
			"Read redo log up to LSN=" LSN_PF,
			*start_lsn);
	}
	if (*start_lsn != end_lsn) {
		goto loop;
	}
	return(success);
}
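
/* Illustrative sketch, not part of this file: two pieces of arithmetic in
log_group_read_log_seg() are easy to misread. A read that would run past the
end of the current log file is shortened to end exactly at the file boundary,
and the loop then continues with the remainder. A block's data length is
accepted only if it covers at least the header and is either at most
"block size minus trailer" or exactly the full block size. The constants below
(512-byte blocks, 12-byte header, 4-byte trailer) are assumptions made for the
example, and clamp_read_len() and data_len_valid() are hypothetical names. */
```cpp
#include <cassert>
#include <cstdint>

/* Assumed sizes, for illustration only. */
constexpr std::uint64_t BLOCK_SIZE = 512;
constexpr std::uint64_t BLOCK_HDR_SIZE = 12;
constexpr std::uint64_t BLOCK_TRL_SIZE = 4;

/* Shorten a read of len bytes at source_offset so that it does not cross
the end of a log file of file_size bytes. */
std::uint64_t clamp_read_len(std::uint64_t source_offset,
			     std::uint64_t len,
			     std::uint64_t file_size)
{
	const std::uint64_t in_file = source_offset % file_size;
	return in_file + len > file_size ? file_size - in_file : len;
}

/* Mirror of the data length sanity check applied to each block. */
bool data_len_valid(std::uint64_t dl)
{
	return dl >= BLOCK_HDR_SIZE
		&& (dl <= BLOCK_SIZE - BLOCK_TRL_SIZE || dl == BLOCK_SIZE);
}
```
/* With these assumed sizes, a block reporting 509, 510 or 511 bytes of data
would be rejected, because such a length would end inside the trailer. */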

/********************************************************//**
Copies a log segment from the most up-to-date log group to the other log
groups, so that they all contain the latest log data. Also writes the info
about the latest checkpoint to the groups, and inits the fields in the group
memory structs to up-to-date values. */
static
void
recv_synchronize_groups()
{
	const lsn_t recovered_lsn = recv_sys->recovered_lsn;
	/* Read the last recovered log block to the recovery system buffer:
	the block is always incomplete */
	lsn_t start_lsn = ut_uint64_align_down(recovered_lsn,
		OS_FILE_LOG_BLOCK_SIZE);
	log_group_read_log_seg(log_sys->buf, &log_sys->log,
		&start_lsn, start_lsn + OS_FILE_LOG_BLOCK_SIZE);
	/* Update the fields in the group struct to correspond to
	recovered_lsn */
	log_group_set_fields(&log_sys->log, recovered_lsn);
	/* Copy the checkpoint info to the log; remember that we have
	incremented checkpoint_no by one, and the info will not be written
	over the max checkpoint info, thus making the preservation of max
	checkpoint info on disk certain */
	if (!srv_read_only_mode) {
		log_write_checkpoint_info(true, 0);
		log_mutex_enter();
	}
}

/** Check the consistency of a log header block.
@param[in]	buf	log header block
@return true if ok */
static
bool
recv_check_log_header_checksum(
	const byte*	buf)
{
	return(log_block_get_checksum(buf)
	       == log_block_calc_checksum_crc32(buf));
}

/** Find the latest checkpoint in the format-0 log header.
@param[out]	max_group	log group, or NULL
@param[out]	max_field	LOG_CHECKPOINT_1 or LOG_CHECKPOINT_2
@return error code or DB_SUCCESS */
static MY_ATTRIBUTE((warn_unused_result))
dberr_t
recv_find_max_checkpoint_0(log_group_t** max_group, ulint* max_field)
{
	log_group_t*	group = &log_sys->log;
	ib_uint64_t	max_no = 0;
	ib_uint64_t	checkpoint_no;
	byte*		buf = log_sys->checkpoint_buf;
	ut_ad(group->format == 0);
	/** Offset of the first checkpoint checksum */
	static const uint CHECKSUM_1 = 288;
	/** Offset of the second checkpoint checksum */
	static const uint CHECKSUM_2 = CHECKSUM_1 + 4;
	/** Most significant bits of the checkpoint offset */
	static const uint OFFSET_HIGH32 = CHECKSUM_2 + 12;
	/** Least significant bits of the checkpoint offset */
	static const uint OFFSET_LOW32 = 16;
	*max_group = NULL;
	for (ulint field = LOG_CHECKPOINT_1; field <= LOG_CHECKPOINT_2;
	     field += LOG_CHECKPOINT_2 - LOG_CHECKPOINT_1) {
		log_group_header_read(group, field);
		if (static_cast<uint32_t>(ut_fold_binary(buf, CHECKSUM_1))
		    != mach_read_from_4(buf + CHECKSUM_1)
		    || static_cast<uint32_t>(
			    ut_fold_binary(buf + LOG_CHECKPOINT_LSN,
				    CHECKSUM_2 - LOG_CHECKPOINT_LSN))
		    != mach_read_from_4(buf + CHECKSUM_2)) {
			DBUG_LOG("ib_log",
				"invalid pre-10.2.2 checkpoint " << field);
			continue;
		}
		checkpoint_no = mach_read_from_8(
			buf + LOG_CHECKPOINT_NO);
		if (!log_crypt_101_read_checkpoint(buf)) {
			ib::error() << "Decrypting checkpoint failed";
			continue;
		}
		DBUG_PRINT("ib_log",
			("checkpoint " UINT64PF " at " LSN_PF " found",
			checkpoint_no,
			mach_read_from_8(buf + LOG_CHECKPOINT_LSN)));
		if (checkpoint_no >= max_no) {
			*max_group = group;
			*max_field = field;
			max_no = checkpoint_no;
			group->state = LOG_GROUP_OK;
			group->lsn = mach_read_from_8(
				buf + LOG_CHECKPOINT_LSN);
			group->lsn_offset = static_cast<ib_uint64_t>(
				mach_read_from_4(buf + OFFSET_HIGH32)) << 32
				| mach_read_from_4(buf + OFFSET_LOW32);
		}
	}
	if (*max_group != NULL) {
		return(DB_SUCCESS);
	}
	ib::error() << "Upgrade after a crash is not supported."
		" This redo log was created before MariaDB 10.2.2,"
		" and we did not find a valid checkpoint."
		" Please follow the instructions at"
		" https://mariadb.com/kb/en/library/upgrading/";
	return(DB_ERROR);
}
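
/* Illustrative sketch, not part of this file: the loop above visits the two
checkpoint slots in order, skips any slot that fails validation, and keeps the
slot with the highest checkpoint number (ties go to the later slot because the
comparison is >=). The 64-bit offset is assembled from separate high and low
32-bit fields. The model below reproduces only that selection logic;
checkpoint_slot and pick_latest_checkpoint are hypothetical names, and the
"valid" flag stands in for the checksum and decryption checks. */
```cpp
#include <cassert>
#include <cstdint>

/* Simplified view of one checkpoint slot after it has been read. */
struct checkpoint_slot {
	bool		valid;		/* passed checksum/decryption checks */
	std::uint64_t	checkpoint_no;	/* checkpoint sequence number */
	std::uint32_t	offset_high32;	/* upper half of the log offset */
	std::uint32_t	offset_low32;	/* lower half of the log offset */
};

/* Return the index of the latest valid slot, or -1 if none is valid.
On success, *lsn_offset receives the combined 64-bit offset. */
int pick_latest_checkpoint(const checkpoint_slot (&slots)[2],
			   std::uint64_t* lsn_offset)
{
	int		best = -1;
	std::uint64_t	max_no = 0;
	for (int i = 0; i < 2; i++) {
		if (!slots[i].valid) {
			continue;
		}
		if (slots[i].checkpoint_no >= max_no) {
			best = i;
			max_no = slots[i].checkpoint_no;
			*lsn_offset
				= std::uint64_t(slots[i].offset_high32) << 32
				| slots[i].offset_low32;
		}
	}
	return best;
}
```
/* A return value of -1 corresponds to the "did not find a valid checkpoint"
error path above. */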

/** Determine if a pre-MySQL 5.7.9/MariaDB 10.2.2 redo log is clean.
@param[in]	lsn	checkpoint LSN
@param[in]	crypt	whether the log might be encrypted
@return error code
@retval	DB_SUCCESS	if the redo log is clean
@retval	DB_CORRUPTION	if the checkpoint block fails the checksum check
@retval	DB_ERROR	if the redo log is dirty */
static dberr_t recv_log_format_0_recover(lsn_t lsn, bool crypt)
{
	log_mutex_enter();
	log_group_t*	group = &log_sys->log;
	const lsn_t	source_offset
		= log_group_calc_lsn_offset(lsn, group);
	log_mutex_exit();
	const ulint	page_no
		= (ulint) (source_offset / univ_page_size.physical());
	byte*		buf = log_sys->buf;
	static const char* NO_UPGRADE_RECOVERY_MSG =
		"Upgrade after a crash is not supported."
		" This redo log was created before MariaDB 10.2.2";
	static const char* NO_UPGRADE_RTFM_MSG =
		". Please follow the instructions at "
		"https://mariadb.com/kb/en/library/upgrading/";
	fil_io(IORequestLogRead, true,
		page_id_t(SRV_LOG_SPACE_FIRST_ID, page_no),
		univ_page_size,
		(ulint) ((source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1))
			% univ_page_size.physical()),
		OS_FILE_LOG_BLOCK_SIZE, buf, NULL);
	if (log_block_calc_checksum_format_0(buf)
	    != log_block_get_checksum(buf)
	    && !log_crypt_101_read_block(buf)) {
		ib::error() << NO_UPGRADE_RECOVERY_MSG
			<< ", and it appears corrupted"
			<< NO_UPGRADE_RTFM_MSG;
		return(DB_CORRUPTION);
	}
	if (log_block_get_data_len(buf)
	    == (source_offset & (OS_FILE_LOG_BLOCK_SIZE - 1))) {
	} else if (crypt) {
		ib::error() << "Cannot decrypt log for upgrading."
			" The encrypted log was created before MariaDB 10.2.2"
			<< NO_UPGRADE_RTFM_MSG;
		return DB_ERROR;
	} else {
		ib::error() << NO_UPGRADE_RECOVERY_MSG
			<< NO_UPGRADE_RTFM_MSG;
		return(DB_ERROR);
	}
	/* Mark the redo log for upgrading. */
	srv_log_file_size = 0;
	recv_sys->parse_start_lsn = recv_sys->recovered_lsn
		= recv_sys->scanned_lsn
		= recv_sys->mlog_checkpoint_lsn = lsn;
	log_sys->last_checkpoint_lsn = log_sys->next_checkpoint_lsn
		= log_sys->lsn = log_sys->write_lsn
		= log_sys->current_flush_lsn = log_sys->flushed_to_disk_lsn
		= lsn;
	log_sys->next_checkpoint_no = 0;
	return(DB_SUCCESS);
}

/** Determine if a redo log from MariaDB 10.3 is clean.
@return error code
@retval	DB_SUCCESS	if the redo log is clean
@retval	DB_CORRUPTION	if the redo log is corrupted
@retval	DB_ERROR	if the redo log is not empty */
static
dberr_t
recv_log_recover_10_3()
{
	log_group_t*	group = &log_sys->log;
	const lsn_t	lsn = group->lsn;
	const lsn_t	source_offset = log_group_calc_lsn_offset(lsn, group);
	const ulint	page_no
		= (ulint) (source_offset / univ_page_size.physical());
	byte*		buf = log_sys->buf;
	fil_io(IORequestLogRead, true,
		page_id_t(SRV_LOG_SPACE_FIRST_ID, page_no),
		univ_page_size,
		(ulint) ((source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1))
			% univ_page_size.physical()),
		OS_FILE_LOG_BLOCK_SIZE, buf, NULL);
	if (log_block_calc_checksum(buf) != log_block_get_checksum(buf)) {
		return(DB_CORRUPTION);
	}
	if (group->is_encrypted()) {
		log_crypt(buf, lsn, OS_FILE_LOG_BLOCK_SIZE, true);
	}
	/* On a clean shutdown, the redo log will be logically empty
	after the checkpoint lsn. */
	if (log_block_get_data_len(buf)
	    != (source_offset & (OS_FILE_LOG_BLOCK_SIZE - 1))) {
		return(DB_ERROR);
	}
	/* Mark the redo log for downgrading. */
	srv_log_file_size = 0;
	recv_sys->parse_start_lsn = recv_sys->recovered_lsn
		= recv_sys->scanned_lsn
		= recv_sys->mlog_checkpoint_lsn = lsn;
	log_sys->last_checkpoint_lsn = log_sys->next_checkpoint_lsn
		= log_sys->lsn = log_sys->write_lsn
		= log_sys->current_flush_lsn = log_sys->flushed_to_disk_lsn
		= lsn;
	log_sys->next_checkpoint_no = 0;
	return(DB_SUCCESS);
}
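
/* Illustrative sketch, not part of this file: both clean-log checks above
rely on the same observation. If nothing was written after the checkpoint,
the block that contains the checkpoint offset holds data only up to that
offset, so its data length equals the offset's position within the block.
The helpers below show the two bit operations involved, assuming a 512-byte
block size (a power of two, which is what makes the masks valid).
block_start() and log_is_logically_empty() are hypothetical names. */
```cpp
#include <cassert>
#include <cstdint>

/* Assumed block size, for illustration only; must be a power of two. */
constexpr std::uint64_t BLOCK_SIZE = 512;

/* Offset of the first byte of the block that contains source_offset. */
std::uint64_t block_start(std::uint64_t source_offset)
{
	return source_offset & ~(BLOCK_SIZE - 1);
}

/* True if the block ends exactly at the checkpoint offset, that is, if
no log records follow the checkpoint within this block. */
bool log_is_logically_empty(std::uint64_t source_offset,
			    std::uint64_t data_len)
{
	return data_len == (source_offset & (BLOCK_SIZE - 1));
}
```
/* For a checkpoint at offset 2060, the block starts at 2048 and the log is
considered clean only if the block reports exactly 12 bytes of data. */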

/** Find the latest checkpoint in the log header.
@param[out]	max_field	LOG_CHECKPOINT_1 or LOG_CHECKPOINT_2
@return error code or DB_SUCCESS */
dberr_t
recv_find_max_checkpoint(ulint* max_field)
{
	log_group_t*	group;
	ib_uint64_t	max_no;
	ib_uint64_t	checkpoint_no;
	ulint		field;
	byte*		buf;
	group = &log_sys->log;
	max_no = 0;
	*max_field = 0;
	buf = log_sys->checkpoint_buf;
	group->state = LOG_GROUP_CORRUPTED;
	log_group_header_read(group, 0);
	/* Check the header page checksum. There was no
	checksum in the first redo log format (version 0). */
	group->format = mach_read_from_4(buf + LOG_HEADER_FORMAT);
	group->subformat = group->format
		? mach_read_from_4(buf + LOG_HEADER_SUBFORMAT)
		: 0;
	if (group->format != 0
	    && !recv_check_log_header_checksum(buf)) {
		ib::error() << "Invalid redo log header checksum.";
		return(DB_CORRUPTION);
	}
	char creator[LOG_HEADER_CREATOR_END - LOG_HEADER_CREATOR + 1];
	memcpy(creator, buf + LOG_HEADER_CREATOR, sizeof creator);
	/* Ensure that the string is NUL-terminated. */
	creator[LOG_HEADER_CREATOR_END - LOG_HEADER_CREATOR] = 0;
	switch (group->format) {
	case 0:
		return(recv_find_max_checkpoint_0(&group, max_field));
	case LOG_HEADER_FORMAT_10_2:
	case LOG_HEADER_FORMAT_10_2 | LOG_HEADER_FORMAT_ENCRYPTED:
	case LOG_HEADER_FORMAT_10_3:
	case LOG_HEADER_FORMAT_10_3 | LOG_HEADER_FORMAT_ENCRYPTED:
	case LOG_HEADER_FORMAT_10_4:
		/* We can only parse the unencrypted LOG_HEADER_FORMAT_10_4.
		The encrypted format uses a larger redo log block trailer. */
		break;
	default:
		ib::error() << "Unsupported redo log format."
			" The redo log was created"
			" with " << creator <<
			". Please follow the instructions at "
			"https://mariadb.com/kb/en/library/upgrading/";
		/* Do not issue a message about a possibility
		to cleanly shut down the newer server version
		and to remove the redo logs, because the
		format of the system data structures may
		radically change after MySQL 5.7. */
		return(DB_ERROR);
	}
	for (field = LOG_CHECKPOINT_1; field <= LOG_CHECKPOINT_2;
	     field += LOG_CHECKPOINT_2 - LOG_CHECKPOINT_1) {
		log_group_header_read(group, field);
		const ulint crc32 = log_block_calc_checksum_crc32(buf);
		const ulint cksum = log_block_get_checksum(buf);
		if (crc32 != cksum) {
			DBUG_PRINT("ib_log",
				("invalid checkpoint,"
				" at " ULINTPF
				", checksum " ULINTPFx
				" expected " ULINTPFx,
				field, cksum, crc32));
			continue;
		}
		if (group->is_encrypted()
		    && !log_crypt_read_checkpoint_buf(buf)) {
			ib::error() << "Reading checkpoint"
				" encryption info failed.";
			continue;
		}
		checkpoint_no = mach_read_from_8(
			buf + LOG_CHECKPOINT_NO);
		DBUG_PRINT("ib_log",
			("checkpoint " UINT64PF " at " LSN_PF " found",
			checkpoint_no, mach_read_from_8(
				buf + LOG_CHECKPOINT_LSN)));
		if (checkpoint_no >= max_no) {
			*max_field = field;
			max_no = checkpoint_no;
			group->state = LOG_GROUP_OK;
			group->lsn = mach_read_from_8(
				buf + LOG_CHECKPOINT_LSN);
			group->lsn_offset = mach_read_from_8(
				buf + LOG_CHECKPOINT_OFFSET);
			log_sys->next_checkpoint_no = checkpoint_no;
		}
	}
	if (*max_field == 0) {
		/* Before 10.2.2, we could get here during database
		initialization if we created an ib_logfile0 file that
		was filled with zeroes, and were killed. After
		10.2.2, we would reject such a file already earlier,
		when checking the file header. */
		ib::error() << "No valid checkpoint found"
			" (corrupted redo log)."
			" You can try --innodb-force-recovery=6"
			" as a last resort.";
		return(DB_ERROR);
	}
	switch (group->format) {
	case LOG_HEADER_FORMAT_10_3:
	case LOG_HEADER_FORMAT_10_3 | LOG_HEADER_FORMAT_ENCRYPTED:
		if (group->subformat == 1) {
			/* 10.2 with new crash-safe TRUNCATE */
			break;
		}
		/* fall through */
	case LOG_HEADER_FORMAT_10_4:
		if (srv_operation == SRV_OPERATION_BACKUP) {
			ib::error()
				<< "Incompatible redo log format."
				" The redo log was created with " << creator;
			return DB_ERROR;
		}
		dberr_t err = recv_log_recover_10_3();
		if (err != DB_SUCCESS) {
			ib::error()
				<< "Downgrade after a crash is not supported."
				" The redo log was created with " << creator
				<< (err == DB_ERROR
				    ? "." : ", and it appears corrupted.");
		}
		return(err);
	}
	return(DB_SUCCESS);
}
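
/* Illustrative sketch, not part of this file: recv_find_max_checkpoint()
uses the header format and subformat to decide what kind of log it is facing.
Format 0 is handed to the pre-10.2.2 code path, the 10.2 formats (and the 10.3
format with subformat 1) are recovered normally, the remaining 10.3 logs and
the unencrypted 10.4 format are accepted only if the log is clean, and
anything else is rejected. The function below models just that decision. The
numeric constants are placeholder assumptions for the example and are not
taken from the InnoDB headers; log_action and classify_log_format are
hypothetical names. */
```cpp
#include <cassert>
#include <cstdint>

/* Placeholder values, assumed for illustration only. */
constexpr std::uint32_t FORMAT_10_2 = 1;
constexpr std::uint32_t FORMAT_10_3 = 103;
constexpr std::uint32_t FORMAT_10_4 = 104;
constexpr std::uint32_t FORMAT_ENCRYPTED = 1U << 31;

enum class log_action {
	LEGACY_FORMAT_0,	/* use the pre-10.2.2 checkpoint search */
	RECOVER,		/* normal crash recovery is possible */
	REQUIRE_CLEAN,		/* usable only after a clean shutdown */
	UNSUPPORTED		/* refuse to start */
};

/* Map (format, subformat) to the action taken in the code above. */
log_action classify_log_format(std::uint32_t format, std::uint32_t subformat)
{
	switch (format) {
	case 0:
		return log_action::LEGACY_FORMAT_0;
	case FORMAT_10_2:
	case FORMAT_10_2 | FORMAT_ENCRYPTED:
		return log_action::RECOVER;
	case FORMAT_10_3:
	case FORMAT_10_3 | FORMAT_ENCRYPTED:
		/* Subformat 1 marks 10.2 with the crash-safe TRUNCATE. */
		return subformat == 1
			? log_action::RECOVER : log_action::REQUIRE_CLEAN;
	case FORMAT_10_4:
		/* Only the unencrypted 10.4 format is accepted. */
		return log_action::REQUIRE_CLEAN;
	default:
		return log_action::UNSUPPORTED;
	}
}
```
/* In the code above, the REQUIRE_CLEAN case is additionally rejected when
the server runs as a backup operation (SRV_OPERATION_BACKUP). */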

/** Try to parse a single log record body and also apply it if
specified.
@param[in]	type		redo log entry type
@param[in]	ptr		redo log record body
@param[in]	end_ptr		end of buffer
@param[in]	space_id	tablespace identifier
@param[in]	page_no		page number
@param[in]	apply		whether to apply the record
@param[in,out]	block		buffer block, or NULL if
a page log record should not be applied
or if it is an MLOG_FILE_ operation
@param[in,out]	mtr		mini-transaction, or NULL if
a page log record should not be applied
@return log record end, NULL if not a complete record */
static
byte*
recv_parse_or_apply_log_rec_body(
	mlog_id_t	type,
	byte*		ptr,
	byte*		end_ptr,
	ulint		space_id,
	ulint		page_no,
	bool		apply,
	buf_block_t*	block,
	mtr_t*		mtr)
{
	ut_ad(!block == !mtr);
	ut_ad(!apply || recv_sys->mlog_checkpoint_lsn != 0);
	switch (type) {
	case MLOG_FILE_NAME:
	case MLOG_FILE_DELETE:
	case MLOG_FILE_CREATE2:
	case MLOG_FILE_RENAME2:
		ut_ad(block == NULL);
		/* Collect the file names when parsing the log,
		before applying any log records. */
		return(fil_name_parse(ptr, end_ptr, space_id, page_no, type,
			apply));
	case MLOG_INDEX_LOAD:
		if (end_ptr < ptr + 8) {
			return(NULL);
		}
		return(ptr + 8);
	case MLOG_TRUNCATE:
		if (log_truncate) {
			ut_ad(srv_operation != SRV_OPERATION_NORMAL);
			log_truncate();
			recv_sys->found_corrupt_fs = true;
			return NULL;
		}
		return(truncate_t::parse_redo_entry(ptr, end_ptr, space_id));
	default:
		break;
	}
	dict_index_t*	index = NULL;
	page_t*		page;
	page_zip_des_t*	page_zip;
#ifdef UNIV_DEBUG
	ulint		page_type;
#endif /* UNIV_DEBUG */
	if (block) {
		/* Applying a page log record. */
		ut_ad(apply);
		page = block->frame;
		page_zip = buf_block_get_page_zip(block);
		ut_d(page_type = fil_page_get_type(page));
	} else if (apply
		   && !is_predefined_tablespace(space_id)
		   && recv_spaces.find(space_id) == recv_spaces.end()) {
		if (recv_sys->recovered_lsn < recv_sys->mlog_checkpoint_lsn) {
			/* We have not seen all records between the
			checkpoint and MLOG_CHECKPOINT. There should be
			an MLOG_FILE_DELETE for this tablespace later. */
			recv_spaces.insert(
				std::make_pair(space_id,
					file_name_t("", false)));
			goto parse_log;
		}
		ib::error() << "Missing MLOG_FILE_NAME or MLOG_FILE_DELETE"
			" for redo log record " << type << " (page "
			<< space_id << ":" << page_no << ") at "
			<< recv_sys->recovered_lsn << ".";
		recv_sys->found_corrupt_log = true;
		return(NULL);
	} else {
parse_log:
		/* Parsing a page log record. */
		page = NULL;
		page_zip = NULL;
		ut_d(page_type = FIL_PAGE_TYPE_ALLOCATED);
	}
	const byte*	old_ptr = ptr;
	switch (type) {
#ifdef UNIV_LOG_LSN_DEBUG
	case MLOG_LSN:
		/* The LSN is checked in recv_parse_log_rec(). */
		break;
#endif /* UNIV_LOG_LSN_DEBUG */
	case MLOG_1BYTE: case MLOG_2BYTES: case MLOG_4BYTES: case MLOG_8BYTES:
#ifdef UNIV_DEBUG
		if (page && page_type == FIL_PAGE_TYPE_ALLOCATED
		    && end_ptr >= ptr + 2) {
			/* It is OK to set FIL_PAGE_TYPE and certain
			list node fields on an empty page. Any other
			write is not OK. */
			/* NOTE: There may be bogus assertion failures for
			dict_hdr_create(), trx_rseg_header_create(),
			trx_sys_create_doublewrite_buf(), and
			trx_sysf_create().
			These are only called during database creation. */
			ulint	offs = mach_read_from_2(ptr);
			switch (type) {
			default:
				ut_error;
			case MLOG_2BYTES:
				/* Note that this can fail when the
				redo log has been written with something
				older than InnoDB Plugin 1.0.4. */
				ut_ad(offs == FIL_PAGE_TYPE
				      || srv_is_undo_tablespace(space_id)
				      || offs == IBUF_TREE_SEG_HEADER
				      + IBUF_HEADER + FSEG_HDR_OFFSET
				      || offs == PAGE_BTR_IBUF_FREE_LIST
				      + PAGE_HEADER + FIL_ADDR_BYTE
				      || offs == PAGE_BTR_IBUF_FREE_LIST
				      + PAGE_HEADER + FIL_ADDR_BYTE
				      + FIL_ADDR_SIZE
				      || offs == PAGE_BTR_SEG_LEAF
				      + PAGE_HEADER + FSEG_HDR_OFFSET
				      || offs == PAGE_BTR_SEG_TOP
				      + PAGE_HEADER + FSEG_HDR_OFFSET
				      || offs == PAGE_BTR_IBUF_FREE_LIST_NODE
				      + PAGE_HEADER + FIL_ADDR_BYTE
				      + 0 /*FLST_PREV*/
				      || offs == PAGE_BTR_IBUF_FREE_LIST_NODE
				      + PAGE_HEADER + FIL_ADDR_BYTE
				      + FIL_ADDR_SIZE /*FLST_NEXT*/);
				break;
			case MLOG_4BYTES:
				/* Note that this can fail when the
				redo log has been written with something
				older than InnoDB Plugin 1.0.4. */
				ut_ad(0
				      /* fil_crypt_rotate_page() writes this */
				      || offs == FIL_PAGE_SPACE_ID
				      || srv_is_undo_tablespace(space_id)
				      || offs == IBUF_TREE_SEG_HEADER
				      + IBUF_HEADER + FSEG_HDR_SPACE
				      || offs == IBUF_TREE_SEG_HEADER
				      + IBUF_HEADER + FSEG_HDR_PAGE_NO
				      || offs == PAGE_BTR_IBUF_FREE_LIST
				      + PAGE_HEADER/* flst_init */
				      || offs == PAGE_BTR_IBUF_FREE_LIST
				      + PAGE_HEADER + FIL_ADDR_PAGE
				      || offs == PAGE_BTR_IBUF_FREE_LIST
				      + PAGE_HEADER + FIL_ADDR_PAGE
				      + FIL_ADDR_SIZE
				      || offs == PAGE_BTR_SEG_LEAF
				      + PAGE_HEADER + FSEG_HDR_PAGE_NO
				      || offs == PAGE_BTR_SEG_LEAF
				      + PAGE_HEADER + FSEG_HDR_SPACE
				      || offs == PAGE_BTR_SEG_TOP
				      + PAGE_HEADER + FSEG_HDR_PAGE_NO
				      || offs == PAGE_BTR_SEG_TOP
				      + PAGE_HEADER + FSEG_HDR_SPACE
				      || offs == PAGE_BTR_IBUF_FREE_LIST_NODE
				      + PAGE_HEADER + FIL_ADDR_PAGE
				      + 0 /*FLST_PREV*/
				      || offs == PAGE_BTR_IBUF_FREE_LIST_NODE
				      + PAGE_HEADER + FIL_ADDR_PAGE
				      + FIL_ADDR_SIZE /*FLST_NEXT*/);
				break;
			}
		}
#endif /* UNIV_DEBUG */
		ptr = mlog_parse_nbytes(type, ptr, end_ptr, page, page_zip);
		if (ptr != NULL && page != NULL
		    && page_no == 0 && type == MLOG_4BYTES) {
			ulint	offs = mach_read_from_2(old_ptr);
			switch (offs) {
				fil_space_t*	space;
				ulint		val;
			default:
				break;
			case FSP_HEADER_OFFSET + FSP_SPACE_FLAGS:
			case FSP_HEADER_OFFSET + FSP_SIZE:
			case FSP_HEADER_OFFSET + FSP_FREE_LIMIT:
			case FSP_HEADER_OFFSET + FSP_FREE + FLST_LEN:
				space = fil_space_get(space_id);
				ut_a(space != NULL);
				val = mach_read_from_4(page + offs);
				switch (offs) {
				case FSP_HEADER_OFFSET + FSP_SPACE_FLAGS:
					space->flags = val;
					break;
				case FSP_HEADER_OFFSET + FSP_SIZE:
					space->size_in_header = val;
					break;
				case FSP_HEADER_OFFSET + FSP_FREE_LIMIT:
					space->free_limit = val;
					break;
				case FSP_HEADER_OFFSET + FSP_FREE + FLST_LEN:
					space->free_len = val;
					ut_ad(val == flst_get_len(
						page + offs));
					break;
				}
			}
		}
		break;
	case MLOG_REC_INSERT: case MLOG_COMP_REC_INSERT:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type == MLOG_COMP_REC_INSERT,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = page_cur_parse_insert_rec(FALSE, ptr, end_ptr,
				block, index, mtr);
		}
		break;
	case MLOG_REC_CLUST_DELETE_MARK: case MLOG_COMP_REC_CLUST_DELETE_MARK:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type == MLOG_COMP_REC_CLUST_DELETE_MARK,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = btr_cur_parse_del_mark_set_clust_rec(
				ptr, end_ptr, page, page_zip, index);
		}
		break;
	case MLOG_REC_SEC_DELETE_MARK:
		ut_ad(!page || fil_page_type_is_index(page_type));
		ptr = btr_cur_parse_del_mark_set_sec_rec(ptr, end_ptr,
			page, page_zip);
		break;
	case MLOG_REC_UPDATE_IN_PLACE: case MLOG_COMP_REC_UPDATE_IN_PLACE:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type == MLOG_COMP_REC_UPDATE_IN_PLACE,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = btr_cur_parse_update_in_place(ptr, end_ptr, page,
				page_zip, index);
		}
		break;
	case MLOG_LIST_END_DELETE: case MLOG_COMP_LIST_END_DELETE:
	case MLOG_LIST_START_DELETE: case MLOG_COMP_LIST_START_DELETE:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type == MLOG_COMP_LIST_END_DELETE
				     || type == MLOG_COMP_LIST_START_DELETE,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = page_parse_delete_rec_list(type, ptr, end_ptr,
				block, index, mtr);
		}
		break;
	case MLOG_LIST_END_COPY_CREATED: case MLOG_COMP_LIST_END_COPY_CREATED:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type == MLOG_COMP_LIST_END_COPY_CREATED,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = page_parse_copy_rec_list_to_created_page(
				ptr, end_ptr, block, index, mtr);
		}
		break;
	case MLOG_PAGE_REORGANIZE:
	case MLOG_COMP_PAGE_REORGANIZE:
	case MLOG_ZIP_PAGE_REORGANIZE:
		ut_ad(!page || fil_page_type_is_index(page_type));
		if (NULL != (ptr = mlog_parse_index(
				     ptr, end_ptr,
				     type != MLOG_PAGE_REORGANIZE,
				     &index))) {
			ut_a(!page
			     || (ibool)!!page_is_comp(page)
			     == dict_table_is_comp(index->table));
			ptr = btr_parse_page_reorganize(
				ptr, end_ptr, index,
				type == MLOG_ZIP_PAGE_REORGANIZE,
				block, mtr);
		}
		break;
	case MLOG_PAGE_CREATE: case MLOG_COMP_PAGE_CREATE:
		/* Allow anything in page_type when creating a page. */
		ut_a(!page_zip);
		page_parse_create(block, type == MLOG_COMP_PAGE_CREATE, false);
		break;
	case MLOG_PAGE_CREATE_RTREE: case MLOG_COMP_PAGE_CREATE_RTREE:
  1505. page_parse_create(block, type == MLOG_COMP_PAGE_CREATE_RTREE,
  1506. true);
  1507. break;
  1508. case MLOG_UNDO_INSERT:
  1509. ut_ad(!page || page_type == FIL_PAGE_UNDO_LOG);
  1510. ptr = trx_undo_parse_add_undo_rec(ptr, end_ptr, page);
  1511. break;
  1512. case MLOG_UNDO_ERASE_END:
  1513. ut_ad(!page || page_type == FIL_PAGE_UNDO_LOG);
  1514. ptr = trx_undo_parse_erase_page_end(ptr, end_ptr, page, mtr);
  1515. break;
  1516. case MLOG_UNDO_INIT:
  1517. /* Allow anything in page_type when creating a page. */
  1518. ptr = trx_undo_parse_page_init(ptr, end_ptr, page, mtr);
  1519. break;
  1520. case MLOG_UNDO_HDR_CREATE:
  1521. case MLOG_UNDO_HDR_REUSE:
  1522. ut_ad(!page || page_type == FIL_PAGE_UNDO_LOG);
  1523. ptr = trx_undo_parse_page_header(type, ptr, end_ptr,
  1524. page, mtr);
  1525. break;
  1526. case MLOG_REC_MIN_MARK: case MLOG_COMP_REC_MIN_MARK:
  1527. ut_ad(!page || fil_page_type_is_index(page_type));
  1528. /* On a compressed page, MLOG_COMP_REC_MIN_MARK
  1529. will be followed by MLOG_COMP_REC_DELETE
  1530. or MLOG_ZIP_WRITE_HEADER(FIL_PAGE_PREV, FIL_NULL)
  1531. in the same mini-transaction. */
  1532. ut_a(type == MLOG_COMP_REC_MIN_MARK || !page_zip);
  1533. ptr = btr_parse_set_min_rec_mark(
  1534. ptr, end_ptr, type == MLOG_COMP_REC_MIN_MARK,
  1535. page, mtr);
  1536. break;
  1537. case MLOG_REC_DELETE: case MLOG_COMP_REC_DELETE:
  1538. ut_ad(!page || fil_page_type_is_index(page_type));
  1539. if (NULL != (ptr = mlog_parse_index(
  1540. ptr, end_ptr,
  1541. type == MLOG_COMP_REC_DELETE,
  1542. &index))) {
  1543. ut_a(!page
  1544. || (ibool)!!page_is_comp(page)
  1545. == dict_table_is_comp(index->table));
  1546. ptr = page_cur_parse_delete_rec(ptr, end_ptr,
  1547. block, index, mtr);
  1548. }
  1549. break;
  1550. case MLOG_IBUF_BITMAP_INIT:
  1551. /* Allow anything in page_type when creating a page. */
  1552. ptr = ibuf_parse_bitmap_init(ptr, end_ptr, block, mtr);
  1553. break;
  1554. case MLOG_INIT_FILE_PAGE2:
  1555. /* Allow anything in page_type when creating a page. */
  1556. if (block) fsp_apply_init_file_page(block);
  1557. break;
  1558. case MLOG_WRITE_STRING:
  1559. ptr = mlog_parse_string(ptr, end_ptr, page, page_zip);
  1560. break;
  1561. case MLOG_ZIP_WRITE_NODE_PTR:
  1562. ut_ad(!page || fil_page_type_is_index(page_type));
  1563. ptr = page_zip_parse_write_node_ptr(ptr, end_ptr,
  1564. page, page_zip);
  1565. break;
  1566. case MLOG_ZIP_WRITE_BLOB_PTR:
  1567. ut_ad(!page || fil_page_type_is_index(page_type));
  1568. ptr = page_zip_parse_write_blob_ptr(ptr, end_ptr,
  1569. page, page_zip);
  1570. break;
  1571. case MLOG_ZIP_WRITE_HEADER:
  1572. ut_ad(!page || fil_page_type_is_index(page_type));
  1573. ptr = page_zip_parse_write_header(ptr, end_ptr,
  1574. page, page_zip);
  1575. break;
  1576. case MLOG_ZIP_PAGE_COMPRESS:
  1577. /* Allow anything in page_type when creating a page. */
  1578. ptr = page_zip_parse_compress(ptr, end_ptr, block);
  1579. break;
  1580. case MLOG_ZIP_PAGE_COMPRESS_NO_DATA:
  1581. if (NULL != (ptr = mlog_parse_index(
  1582. ptr, end_ptr, TRUE, &index))) {
  1583. ut_a(!page || ((ibool)!!page_is_comp(page)
  1584. == dict_table_is_comp(index->table)));
  1585. ptr = page_zip_parse_compress_no_data(
  1586. ptr, end_ptr, page, page_zip, index);
  1587. }
  1588. break;
  1589. case MLOG_FILE_WRITE_CRYPT_DATA:
  1590. dberr_t err;
  1591. ptr = const_cast<byte*>(fil_parse_write_crypt_data(ptr, end_ptr, block, &err));
  1592. if (err != DB_SUCCESS) {
  1593. recv_sys->found_corrupt_log = TRUE;
  1594. }
  1595. break;
  1596. default:
  1597. ptr = NULL;
  1598. ib::error() << "Incorrect log record type "
  1599. << ib::hex(unsigned(type));
  1600. recv_sys->found_corrupt_log = true;
  1601. }
  1602. if (index) {
  1603. dict_table_t* table = index->table;
  1604. dict_mem_index_free(index);
  1605. dict_mem_table_free(table);
  1606. }
  1607. return(ptr);
  1608. }
  1609. /*********************************************************************//**
  1610. Calculates the fold value of a page file address: used in inserting or
  1611. searching for a log record in the hash table.
  1612. @return folded value */
  1613. UNIV_INLINE
  1614. ulint
  1615. recv_fold(
  1616. /*======*/
  1617. ulint space, /*!< in: space */
  1618. ulint page_no)/*!< in: page number */
  1619. {
  1620. return(ut_fold_ulint_pair(space, page_no));
  1621. }
  1622. /*********************************************************************//**
  1623. Calculates the hash value of a page file address: used in inserting or
  1624. searching for a log record in the hash table.
  1625. @return hash value */
  1626. UNIV_INLINE
  1627. ulint
  1628. recv_hash(
  1629. /*======*/
  1630. ulint space, /*!< in: space */
  1631. ulint page_no)/*!< in: page number */
  1632. {
  1633. return(hash_calc_hash(recv_fold(space, page_no), recv_sys->addr_hash));
  1634. }
  1635. /*********************************************************************//**
  1636. Gets the hashed file address struct for a page.
  1637. @return file address struct, NULL if not found from the hash table */
  1638. static
  1639. recv_addr_t*
  1640. recv_get_fil_addr_struct(
  1641. /*=====================*/
  1642. ulint space, /*!< in: space id */
  1643. ulint page_no)/*!< in: page number */
  1644. {
  1645. ut_ad(mutex_own(&recv_sys->mutex));
  1646. recv_addr_t* recv_addr;
  1647. for (recv_addr = static_cast<recv_addr_t*>(
  1648. HASH_GET_FIRST(recv_sys->addr_hash,
  1649. recv_hash(space, page_no)));
  1650. recv_addr != 0;
  1651. recv_addr = static_cast<recv_addr_t*>(
  1652. HASH_GET_NEXT(addr_hash, recv_addr))) {
  1653. if (recv_addr->space == space
  1654. && recv_addr->page_no == page_no) {
  1655. return(recv_addr);
  1656. }
  1657. }
  1658. return(NULL);
  1659. }
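The lookup above keys buffered redo records by the pair (space id, page number). A minimal sketch of the same idea, assuming a standard `std::unordered_map` in place of InnoDB's own hash table: `PageAddr`, `PageAddrFold` and `RecvTable` are hypothetical names, and the mixing function is only a stand-in, not the formula used by `ut_fold_ulint_pair()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for recv_addr_t: only the two key fields.
struct PageAddr {
    std::uint32_t space;
    std::uint32_t page_no;
    bool operator==(const PageAddr& o) const {
        return space == o.space && page_no == o.page_no;
    }
};

// Hypothetical fold of the (space, page_no) pair into one hash value.
// This is NOT the mixing used by ut_fold_ulint_pair().
struct PageAddrFold {
    std::size_t operator()(const PageAddr& a) const {
        std::uint64_t v = (std::uint64_t(a.space) << 32) | a.page_no;
        v ^= v >> 33;
        v *= 0xff51afd7ed558ccdULL;
        v ^= v >> 33;
        return static_cast<std::size_t>(v);
    }
};

// Counts buffered records per page, keyed by (space, page_no).
struct RecvTable {
    std::unordered_map<PageAddr, int, PageAddrFold> n_records;

    // Register one more record for the page, creating the entry if needed.
    void add(std::uint32_t space, std::uint32_t page_no) {
        ++n_records[PageAddr{space, page_no}];
    }

    // Number of buffered records, or 0 if the page has no entry.
    int find(std::uint32_t space, std::uint32_t page_no) const {
        auto it = n_records.find(PageAddr{space, page_no});
        return it == n_records.end() ? 0 : it->second;
    }
};
```

As in the original function, a hash match alone is not enough: the key comparison (`operator==` here, the explicit `space`/`page_no` check in the listing) decides whether an entry really belongs to the page.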
  1660. /*******************************************************************//**
  1661. Adds a new log record to the hash table of log records. */
  1662. static
  1663. void
  1664. recv_add_to_hash_table(
  1665. /*===================*/
  1666. mlog_id_t type, /*!< in: log record type */
  1667. ulint space, /*!< in: space id */
  1668. ulint page_no, /*!< in: page number */
  1669. byte* body, /*!< in: log record body */
  1670. byte* rec_end, /*!< in: log record end */
  1671. lsn_t start_lsn, /*!< in: start lsn of the mtr */
  1672. lsn_t end_lsn) /*!< in: end lsn of the mtr */
  1673. {
  1674. recv_t* recv;
  1675. ulint len;
  1676. recv_data_t* recv_data;
  1677. recv_data_t** prev_field;
  1678. recv_addr_t* recv_addr;
  1679. ut_ad(type != MLOG_FILE_DELETE);
  1680. ut_ad(type != MLOG_FILE_CREATE2);
  1681. ut_ad(type != MLOG_FILE_RENAME2);
  1682. ut_ad(type != MLOG_FILE_NAME);
  1683. ut_ad(type != MLOG_DUMMY_RECORD);
  1684. ut_ad(type != MLOG_CHECKPOINT);
  1685. ut_ad(type != MLOG_INDEX_LOAD);
  1686. ut_ad(type != MLOG_TRUNCATE);
  1687. len = rec_end - body;
  1688. recv = static_cast<recv_t*>(
  1689. mem_heap_alloc(recv_sys->heap, sizeof(recv_t)));
  1690. recv->type = type;
  1691. recv->len = rec_end - body;
  1692. recv->start_lsn = start_lsn;
  1693. recv->end_lsn = end_lsn;
  1694. recv_addr = recv_get_fil_addr_struct(space, page_no);
  1695. if (recv_addr == NULL) {
  1696. recv_addr = static_cast<recv_addr_t*>(
  1697. mem_heap_alloc(recv_sys->heap, sizeof(recv_addr_t)));
  1698. recv_addr->space = space;
  1699. recv_addr->page_no = page_no;
  1700. recv_addr->state = RECV_NOT_PROCESSED;
  1701. UT_LIST_INIT(recv_addr->rec_list, &recv_t::rec_list);
  1702. HASH_INSERT(recv_addr_t, addr_hash, recv_sys->addr_hash,
  1703. recv_fold(space, page_no), recv_addr);
  1704. recv_sys->n_addrs++;
  1705. }
  1706. switch (type) {
  1707. case MLOG_INIT_FILE_PAGE2:
  1708. case MLOG_ZIP_PAGE_COMPRESS:
  1709. /* Ignore any earlier redo log records for this page. */
  1710. ut_ad(recv_addr->state == RECV_NOT_PROCESSED
  1711. || recv_addr->state == RECV_WILL_NOT_READ);
  1712. recv_addr->state = RECV_WILL_NOT_READ;
  1713. mlog_init.add(space, page_no, start_lsn); /* fall through */
  1714. default:
  1715. break;
  1716. }
  1717. UT_LIST_ADD_LAST(recv_addr->rec_list, recv);
  1718. prev_field = &(recv->data);
  1719. /* Store the log record body in chunks of less than UNIV_PAGE_SIZE:
  1720. recv_sys->heap grows into the buffer pool, and bigger chunks could not
  1721. be allocated */
  1722. while (rec_end > body) {
  1723. len = rec_end - body;
  1724. if (len > RECV_DATA_BLOCK_SIZE) {
  1725. len = RECV_DATA_BLOCK_SIZE;
  1726. }
  1727. recv_data = static_cast<recv_data_t*>(
  1728. mem_heap_alloc(recv_sys->heap,
  1729. sizeof(recv_data_t) + len));
  1730. *prev_field = recv_data;
  1731. memcpy(recv_data + 1, body, len);
  1732. prev_field = &(recv_data->next);
  1733. body += len;
  1734. }
  1735. *prev_field = NULL;
  1736. }
  1737. /*********************************************************************//**
  1738. Copies the log record body from recv to buf. */
  1739. static
  1740. void
  1741. recv_data_copy_to_buf(
  1742. /*==================*/
  1743. byte* buf, /*!< out: buffer of length at least recv->len */
  1744. recv_t* recv) /*!< in: log record */
  1745. {
  1746. recv_data_t* recv_data;
  1747. ulint part_len;
  1748. ulint len;
  1749. len = recv->len;
  1750. recv_data = recv->data;
  1751. while (len > 0) {
  1752. if (len > RECV_DATA_BLOCK_SIZE) {
  1753. part_len = RECV_DATA_BLOCK_SIZE;
  1754. } else {
  1755. part_len = len;
  1756. }
  1757. ut_memcpy(buf, ((byte*) recv_data) + sizeof(recv_data_t),
  1758. part_len);
  1759. buf += part_len;
  1760. len -= part_len;
  1761. recv_data = recv_data->next;
  1762. }
  1763. }
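The two functions above store a record body as a chain of bounded chunks and later copy the chain back into one contiguous buffer. A simplified sketch of that round trip, assuming `std::string` chunks in a `std::vector` instead of heap-allocated `recv_data_t` nodes: `kChunkSize`, `split_body` and `join_body` are hypothetical, and the chunk size is kept tiny for illustration (the real `RECV_DATA_BLOCK_SIZE` is defined elsewhere in the source).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunk limit, deliberately small for illustration.
const std::size_t kChunkSize = 8;

// Split a record body into consecutive chunks of at most kChunkSize bytes,
// mirroring the while (rec_end > body) loop in recv_add_to_hash_table().
std::vector<std::string> split_body(const std::string& body) {
    assert(kChunkSize > 0);
    std::vector<std::string> chunks;
    for (std::size_t pos = 0; pos < body.size(); pos += kChunkSize) {
        // substr() clamps the length at the end of the string.
        chunks.push_back(body.substr(pos, kChunkSize));
    }
    return chunks;
}

// Concatenate the chunks back into one buffer,
// mirroring recv_data_copy_to_buf().
std::string join_body(const std::vector<std::string>& chunks) {
    std::string out;
    for (const std::string& c : chunks) {
        out += c;
    }
    return out;
}
```

A 20-byte body with an 8-byte limit yields chunks of 8, 8 and 4 bytes; only the last chunk may be shorter than the limit, which is why the copy loop in the listing can assume full-size chunks until the remaining length drops below the limit.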
  1764. /** Apply the hashed log records to the page, if the page lsn is less than the
  1765. lsn of a log record.
  1766. @param[in,out] block buffer pool page
  1767. @param[in,out] mtr mini-transaction
  1768. @param[in,out] recv_addr recovery address
  1769. @param[in] init_lsn the initial LSN where to start recovery */
  1770. static void recv_recover_page(buf_block_t* block, mtr_t& mtr,
  1771. recv_addr_t* recv_addr, lsn_t init_lsn = 0)
  1772. {
  1773. page_t* page;
  1774. page_zip_des_t* page_zip;
  1775. ut_ad(mutex_own(&recv_sys->mutex));
  1776. ut_ad(recv_sys->apply_log_recs);
  1777. ut_ad(recv_needed_recovery);
  1778. ut_ad(recv_addr->state != RECV_BEING_PROCESSED);
  1779. ut_ad(recv_addr->state != RECV_PROCESSED);
  1780. if (UNIV_UNLIKELY(srv_print_verbose_log == 2)) {
  1781. fprintf(stderr, "Applying log to page %u:%u\n",
  1782. recv_addr->space, recv_addr->page_no);
  1783. }
  1784. DBUG_LOG("ib_log", "Applying log to page " << block->page.id);
  1785. recv_addr->state = RECV_BEING_PROCESSED;
  1786. mutex_exit(&recv_sys->mutex);
  1787. page = block->frame;
  1788. page_zip = buf_block_get_page_zip(block);
  1789. /* The page may have been modified in the buffer pool.
  1790. FIL_PAGE_LSN would only be updated right before flushing. */
  1791. lsn_t page_lsn = buf_page_get_newest_modification(&block->page);
  1792. if (!page_lsn) {
  1793. page_lsn = mach_read_from_8(page + FIL_PAGE_LSN);
  1794. }
  1795. lsn_t start_lsn = 0, end_lsn = 0;
  1796. if (srv_is_tablespace_truncated(recv_addr->space)) {
  1797. /* The table will be truncated after applying
  1798. normal redo log records. */
  1799. goto skip_log;
  1800. }
  1801. for (recv_t* recv = UT_LIST_GET_FIRST(recv_addr->rec_list);
  1802. recv; recv = UT_LIST_GET_NEXT(rec_list, recv)) {
  1803. ut_ad(recv->start_lsn);
  1804. end_lsn = recv->end_lsn;
  1805. ut_ad(end_lsn <= log_sys->log.scanned_lsn);
  1806. if (recv->start_lsn < page_lsn) {
  1807. /* Ignore this record, because there are later changes
  1808. for this page. */
  1809. DBUG_LOG("ib_log", "apply skip "
  1810. << get_mlog_string(recv->type)
  1811. << " LSN " << recv->start_lsn << " < "
  1812. << page_lsn);
  1813. } else if (recv->start_lsn < init_lsn) {
  1814. DBUG_LOG("ib_log", "init skip "
  1815. << get_mlog_string(recv->type)
  1816. << " LSN " << recv->start_lsn << " < "
  1817. << init_lsn);
  1818. } else if (srv_was_tablespace_truncated(
  1819. fil_space_get(recv_addr->space))
  1820. && recv->start_lsn
  1821. < truncate_t::get_truncated_tablespace_init_lsn(
  1822. recv_addr->space)) {
  1823. /* If a per-table tablespace was truncated and
  1824. redo records from before the truncate are still
  1825. pending (no checkpoint has happened since the
  1826. truncate), skip those records based on their LSN,
  1827. because they may no longer be valid after the
  1828. truncate. */
  1829. } else {
  1830. if (!start_lsn) {
  1831. start_lsn = recv->start_lsn;
  1832. }
  1833. if (UNIV_UNLIKELY(srv_print_verbose_log == 2)) {
  1834. fprintf(stderr, "apply " LSN_PF ":"
  1835. " %d len " ULINTPF " page %u:%u\n",
  1836. recv->start_lsn, recv->type, recv->len,
  1837. recv_addr->space, recv_addr->page_no);
  1838. }
  1839. DBUG_LOG("ib_log", "apply " << recv->start_lsn << ": "
  1840. << get_mlog_string(recv->type)
  1841. << " len " << recv->len
  1842. << " page " << block->page.id);
  1843. byte* buf;
  1844. if (recv->len > RECV_DATA_BLOCK_SIZE) {
  1845. /* We have to copy the record body to
  1846. a separate buffer */
  1847. buf = static_cast<byte*>
  1848. (ut_malloc_nokey(recv->len));
  1849. recv_data_copy_to_buf(buf, recv);
  1850. } else {
  1851. buf = reinterpret_cast<byte*>(recv->data)
  1852. + sizeof *recv->data;
  1853. }
  1854. recv_parse_or_apply_log_rec_body(
  1855. recv->type, buf, buf + recv->len,
  1856. block->page.id.space(),
  1857. block->page.id.page_no(), true, block, &mtr);
  1858. end_lsn = recv->start_lsn + recv->len;
  1859. mach_write_to_8(FIL_PAGE_LSN + page, end_lsn);
  1860. mach_write_to_8(srv_page_size
  1861. - FIL_PAGE_END_LSN_OLD_CHKSUM
  1862. + page, end_lsn);
  1863. if (page_zip) {
  1864. mach_write_to_8(FIL_PAGE_LSN + page_zip->data,
  1865. end_lsn);
  1866. }
  1867. if (recv->len > RECV_DATA_BLOCK_SIZE) {
  1868. ut_free(buf);
  1869. }
  1870. }
  1871. }
  1872. skip_log:
  1873. #ifdef UNIV_ZIP_DEBUG
  1874. ut_ad(!fil_page_index_page_check(page)
  1875. || !page_zip
  1876. || page_zip_validate_low(page_zip, page, NULL, FALSE));
  1877. #endif /* UNIV_ZIP_DEBUG */
  1878. if (start_lsn) {
  1879. log_flush_order_mutex_enter();
  1880. buf_flush_recv_note_modification(block, start_lsn, end_lsn);
  1881. log_flush_order_mutex_exit();
  1882. }
  1883. /* Make sure that committing mtr does not change the modification
  1884. lsn values of page */
  1885. mtr.discard_modifications();
  1886. mtr.commit();
  1887. time_t now = time(NULL);
  1888. mutex_enter(&recv_sys->mutex);
  1889. if (recv_max_page_lsn < page_lsn) {
  1890. recv_max_page_lsn = page_lsn;
  1891. }
  1892. ut_ad(recv_addr->state == RECV_BEING_PROCESSED);
  1893. recv_addr->state = RECV_PROCESSED;
  1894. ut_a(recv_sys->n_addrs > 0);
  1895. if (ulint n = --recv_sys->n_addrs) {
  1896. if (recv_sys->report(now)) {
  1897. ib::info() << "To recover: " << n << " pages from log";
  1898. service_manager_extend_timeout(
  1899. INNODB_EXTEND_TIMEOUT_INTERVAL, "To recover: " ULINTPF " pages from log", n);
  1900. }
  1901. }
  1902. }
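The per-record filter in recv_recover_page() can be condensed into a small decision function. This is a sketch under simplifying assumptions: it covers only the page LSN and init LSN checks, leaves out the truncate-related branch, and the names `ApplyDecision` and `decide` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t lsn_t;

// Hypothetical outcome of the per-record check.
enum ApplyDecision {
    SKIP_ALREADY_ON_PAGE,  // the page already contains later changes
    SKIP_BEFORE_INIT,      // the record predates the page initialization
    APPLY                  // the record must be applied
};

// Decide what to do with one buffered record, following the order of the
// checks in the loop above (the truncate check is omitted here).
ApplyDecision decide(lsn_t rec_start_lsn, lsn_t page_lsn, lsn_t init_lsn) {
    if (rec_start_lsn < page_lsn) {
        return SKIP_ALREADY_ON_PAGE;
    }
    if (rec_start_lsn < init_lsn) {
        return SKIP_BEFORE_INIT;
    }
    return APPLY;
}
```

Both comparisons are strict, so a record whose start LSN equals the page LSN or the init LSN is still applied.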
  1903. /** Reduces recv_sys->n_addrs for the corrupted page.
  1904. This function should be called when srv_force_recovery > 0.
  1905. @param[in] page_id page id of the corrupted page */
  1906. void recv_recover_corrupt_page(page_id_t page_id)
  1907. {
  1908. ut_ad(srv_force_recovery);
  1909. mutex_enter(&recv_sys->mutex);
  1910. if (!recv_sys->apply_log_recs) {
  1911. } else if (recv_addr_t* recv_addr = recv_get_fil_addr_struct(
  1912. page_id.space(), page_id.page_no())) {
  1913. switch (recv_addr->state) {
  1914. case RECV_WILL_NOT_READ:
  1915. ut_ad(!"wrong state");
  1916. break;
  1917. case RECV_BEING_PROCESSED:
  1918. case RECV_PROCESSED:
  1919. break;
  1920. default:
  1921. recv_addr->state = RECV_PROCESSED;
  1922. ut_ad(recv_sys->n_addrs);
  1923. recv_sys->n_addrs--;
  1924. }
  1925. }
  1926. mutex_exit(&recv_sys->mutex);
  1927. }
  1928. /** Apply any buffered redo log to a page that was just read from a data file.
  1929. @param[in,out] bpage buffer pool page */
  1930. void recv_recover_page(buf_page_t* bpage)
  1931. {
  1932. mtr_t mtr;
  1933. mtr.start();
  1934. mtr.set_log_mode(MTR_LOG_NONE);
  1935. ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
  1936. buf_block_t* block = reinterpret_cast<buf_block_t*>(bpage);
  1937. /* Move the ownership of the x-latch on the page to
  1938. this OS thread, so that we can acquire a second
  1939. x-latch on it. This is needed for the operations to
  1940. the page to pass the debug checks. */
  1941. rw_lock_x_lock_move_ownership(&block->lock);
  1942. buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
  1943. ibool success = buf_page_get_known_nowait(
  1944. RW_X_LATCH, block, BUF_KEEP_OLD,
  1945. __FILE__, __LINE__, &mtr);
  1946. ut_a(success);
  1947. mutex_enter(&recv_sys->mutex);
  1948. if (!recv_sys->apply_log_recs) {
  1949. } else if (recv_addr_t* recv_addr = recv_get_fil_addr_struct(
  1950. bpage->id.space(), bpage->id.page_no())) {
  1951. switch (recv_addr->state) {
  1952. case RECV_BEING_PROCESSED:
  1953. case RECV_PROCESSED:
  1954. break;
  1955. default:
  1956. recv_recover_page(block, mtr, recv_addr);
  1957. goto func_exit;
  1958. }
  1959. }
  1960. mtr.commit();
  1961. func_exit:
  1962. mutex_exit(&recv_sys->mutex);
  1963. ut_ad(mtr.has_committed());
  1964. }
  1965. /** Reads in pages which have hashed log records, from an area around a given
  1966. page number.
  1967. @param[in] page_id page id */
  1968. static void recv_read_in_area(const page_id_t page_id)
  1969. {
  1970. ulint page_nos[RECV_READ_AHEAD_AREA];
  1971. ulint page_no = page_id.page_no()
  1972. - (page_id.page_no() % RECV_READ_AHEAD_AREA);
  1973. ulint* p = page_nos;
  1974. for (const ulint up_limit = page_no + RECV_READ_AHEAD_AREA;
  1975. page_no < up_limit; page_no++) {
  1976. recv_addr_t* recv_addr = recv_get_fil_addr_struct(
  1977. page_id.space(), page_no);
  1978. if (recv_addr
  1979. && recv_addr->state == RECV_NOT_PROCESSED
  1980. && !buf_page_peek(page_id_t(page_id.space(), page_no))) {
  1981. recv_addr->state = RECV_BEING_READ;
  1982. *p++ = page_no;
  1983. }
  1984. }
  1985. mutex_exit(&recv_sys->mutex);
  1986. buf_read_recv_pages(FALSE, page_id.space(), page_nos,
  1987. ulint(p - page_nos));
  1988. mutex_enter(&recv_sys->mutex);
  1989. }
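recv_read_in_area() rounds the requested page number down to the start of its read-ahead area and then scans every page in that area. A minimal sketch of the alignment arithmetic, assuming an area size of 32 pages: `kReadAheadArea` and `read_ahead_range` are hypothetical, and the actual value of `RECV_READ_AHEAD_AREA` is defined elsewhere in the source.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed area size, for illustration only.
const std::uint32_t kReadAheadArea = 32;

// Return the half-open page range [first, limit) of the read-ahead area
// that contains page_no.
std::pair<std::uint32_t, std::uint32_t> read_ahead_range(std::uint32_t page_no) {
    assert(kReadAheadArea > 0);
    const std::uint32_t first = page_no - (page_no % kReadAheadArea);
    return std::make_pair(first, first + kReadAheadArea);
}
```

With these assumptions, page 70 falls into the area covering pages 64 through 95, so any page in that range with state RECV_NOT_PROCESSED and no copy in the buffer pool would be queued in the same read request.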
  1990. /** This is another low level function for the recovery system
  1991. to create a page which has buffered page initialization redo log records.
  1992. @param[in] page_id page to be created using redo logs
  1993. @param[in,out] recv_addr Hashed redo logs for the given page id
  1994. @return the created page block, or NULL if the page was not created */
  1995. static buf_block_t* recv_recovery_create_page_low(const page_id_t page_id,
  1996. recv_addr_t* recv_addr)
  1997. {
  1998. mtr_t mtr;
  1999. mlog_init_t::init& i = mlog_init.last(page_id);
  2000. const lsn_t end_lsn = UT_LIST_GET_LAST(recv_addr->rec_list)->end_lsn;
  2001. if (end_lsn < i.lsn)
  2002. {
  2003. DBUG_LOG("ib_log", "skip log for page "
  2004. << page_id
  2005. << " LSN " << end_lsn
  2006. << " < " << i.lsn);
  2007. recv_addr->state = RECV_PROCESSED;
  2008. ignore:
  2009. ut_a(recv_sys->n_addrs);
  2010. recv_sys->n_addrs--;
  2011. return NULL;
  2012. }
  2013. fil_space_t* space = fil_space_acquire(recv_addr->space);
  2014. if (!space)
  2015. {
  2016. recv_addr->state = RECV_PROCESSED;
  2017. goto ignore;
  2018. }
  2019. if (space->enable_lsn)
  2020. {
  2021. init_fail:
  2022. fil_space_release(space);
  2023. recv_addr->state = RECV_NOT_PROCESSED;
  2024. return NULL;
  2025. }
  2026. /* Determine if a tablespace could be for an internal table
  2027. for FULLTEXT INDEX. For those tables, no MLOG_INDEX_LOAD record
  2028. used to be written when redo logging was disabled. Hence, we
  2029. cannot optimize away page reads, because all the redo
  2030. log records for initializing and modifying the page in the
  2031. past could be older than the page in the data file.
  2032. The check is too broad, causing all
  2033. tables whose names start with FTS_ to skip the optimization. */
  2034. if (strstr(space->name, "/FTS_"))
  2035. goto init_fail;
  2036. mtr.start();
  2037. mtr.set_log_mode(MTR_LOG_NONE);
  2038. buf_block_t* block = buf_page_create(page_id, page_size_t(space->flags),
  2039. &mtr);
  2040. if (recv_addr->state == RECV_PROCESSED)
  2041. /* The page happened to exist in the buffer pool, or it was
  2042. just being read in. Before buf_page_get_with_no_latch() returned,
  2043. all changes must have been applied to the page already. */
  2044. mtr.commit();
  2045. else
  2046. {
  2047. i.created = true;
  2048. buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);
  2049. mtr.x_latch_at_savepoint(0, block);
  2050. recv_recover_page(block, mtr, recv_addr, i.lsn);
  2051. ut_ad(mtr.has_committed());
  2052. }
  2053. fil_space_release(space);
  2054. return block;
  2055. }
  2056. /** This is a low level function for the recovery system
  2057. to create a page which has buffered page initialization redo log records.
  2058. @param[in] page_id page to be created using redo logs
  2059. @return the created page block, or NULL if the page was not created */
  2060. buf_block_t* recv_recovery_create_page_low(const page_id_t page_id)
  2061. {
  2062. buf_block_t* block= NULL;
  2063. mutex_enter(&recv_sys->mutex);
  2064. recv_addr_t* recv_addr= recv_get_fil_addr_struct(page_id.space(),
  2065. page_id.page_no());
  2066. if (recv_addr && recv_addr->state == RECV_WILL_NOT_READ)
  2067. {
  2068. block= recv_recovery_create_page_low(page_id, recv_addr);
  2069. }
  2070. mutex_exit(&recv_sys->mutex);
  2071. return block;
  2072. }
  2073. /** Apply the hash table of stored log records to persistent data pages.
  2074. @param[in] last_batch whether the change buffer merge will be
  2075. performed as part of the operation */
  2076. void recv_apply_hashed_log_recs(bool last_batch)
  2077. {
  2078. ut_ad(srv_operation == SRV_OPERATION_NORMAL
  2079. || srv_operation == SRV_OPERATION_RESTORE
  2080. || srv_operation == SRV_OPERATION_RESTORE_EXPORT);
  2081. mutex_enter(&recv_sys->mutex);
  2082. while (recv_sys->apply_batch_on) {
  2083. bool abort = recv_sys->found_corrupt_log;
  2084. mutex_exit(&recv_sys->mutex);
  2085. if (abort) {
  2086. return;
  2087. }
  2088. os_thread_sleep(500000);
  2089. mutex_enter(&recv_sys->mutex);
  2090. }
  2091. ut_ad(!last_batch == log_mutex_own());
  2092. recv_no_ibuf_operations = !last_batch
  2093. || srv_operation == SRV_OPERATION_RESTORE
  2094. || srv_operation == SRV_OPERATION_RESTORE_EXPORT;
  2095. ut_d(recv_no_log_write = recv_no_ibuf_operations);
  2096. if (ulint n = recv_sys->n_addrs) {
  2097. const char* msg = last_batch
  2098. ? "Starting final batch to recover "
  2099. : "Starting a batch to recover ";
  2100. ib::info() << msg << n << " pages from redo log.";
  2101. sd_notifyf(0, "STATUS=%s" ULINTPF " pages from redo log",
  2102. msg, n);
  2103. }
  2104. recv_sys->apply_log_recs = TRUE;
  2105. recv_sys->apply_batch_on = TRUE;
  2106. for (ulint id = srv_undo_tablespaces_open; id--; ) {
  2107. recv_sys_t::trunc& t = recv_sys->truncated_undo_spaces[id];
  2108. if (t.lsn) {
  2109. recv_addr_trim(id + srv_undo_space_id_start, t.pages,
  2110. t.lsn);
  2111. }
  2112. }
  2113. mtr_t mtr;
  2114. for (ulint i = 0; i < hash_get_n_cells(recv_sys->addr_hash); i++) {
  2115. for (recv_addr_t* recv_addr = static_cast<recv_addr_t*>(
  2116. HASH_GET_FIRST(recv_sys->addr_hash, i));
  2117. recv_addr;
  2118. recv_addr = static_cast<recv_addr_t*>(
  2119. HASH_GET_NEXT(addr_hash, recv_addr))) {
  2120. if (!UT_LIST_GET_LEN(recv_addr->rec_list)) {
  2121. ignore:
  2122. ut_a(recv_sys->n_addrs);
  2123. recv_sys->n_addrs--;
  2124. continue;
  2125. }
  2126. switch (recv_addr->state) {
  2127. case RECV_BEING_READ:
  2128. case RECV_BEING_PROCESSED:
  2129. case RECV_PROCESSED:
  2130. continue;
  2131. case RECV_DISCARDED:
  2132. goto ignore;
  2133. case RECV_NOT_PROCESSED:
  2134. case RECV_WILL_NOT_READ:
  2135. break;
  2136. }
  2137. if (srv_is_tablespace_truncated(recv_addr->space)) {
  2138. /* Avoid applying REDO log for the tablespace
  2139. that is scheduled for TRUNCATE. */
  2140. recv_addr->state = RECV_DISCARDED;
  2141. goto ignore;
  2142. }
  2143. const page_id_t page_id(recv_addr->space,
  2144. recv_addr->page_no);
  2145. if (recv_addr->state == RECV_NOT_PROCESSED) {
  2146. apply:
  2147. mtr.start();
  2148. mtr.set_log_mode(MTR_LOG_NONE);
  2149. if (buf_block_t* block = buf_page_get_low(
  2150. page_id, univ_page_size,
  2151. RW_X_LATCH, NULL,
  2152. BUF_GET_IF_IN_POOL,
  2153. __FILE__, __LINE__, &mtr, NULL)) {
  2154. buf_block_dbg_add_level(
  2155. block, SYNC_NO_ORDER_CHECK);
  2156. recv_recover_page(block, mtr,
  2157. recv_addr);
  2158. ut_ad(mtr.has_committed());
  2159. } else {
  2160. mtr.commit();
  2161. recv_read_in_area(page_id);
  2162. }
  2163. } else if (!recv_recovery_create_page_low(
  2164. page_id, recv_addr)) {
  2165. goto apply;
  2166. }
  2167. }
  2168. }
  2169. /* Wait until all the pages have been processed */
  2170. while (recv_sys->n_addrs != 0) {
  2171. const bool abort = recv_sys->found_corrupt_log
  2172. || recv_sys->found_corrupt_fs;
  2173. if (recv_sys->found_corrupt_fs && !srv_force_recovery) {
  2174. ib::info() << "Set innodb_force_recovery=1"
  2175. " to ignore corrupted pages.";
  2176. }
  2177. mutex_exit(&(recv_sys->mutex));
  2178. if (abort) {
  2179. return;
  2180. }
  2181. os_thread_sleep(500000);
  2182. mutex_enter(&(recv_sys->mutex));
  2183. }
  2184. if (!last_batch) {
  2185. /* Flush all the file pages to disk and invalidate them in
  2186. the buffer pool */
  2187. mutex_exit(&(recv_sys->mutex));
  2188. log_mutex_exit();
  2189. /* Stop the recv_writer thread from issuing any LRU
  2190. flush batches. */
  2191. mutex_enter(&recv_sys->writer_mutex);
  2192. /* Wait for any currently run batch to end. */
  2193. buf_flush_wait_LRU_batch_end();
  2194. os_event_reset(recv_sys->flush_end);
  2195. recv_sys->flush_type = BUF_FLUSH_LIST;
  2196. os_event_set(recv_sys->flush_start);
  2197. os_event_wait(recv_sys->flush_end);
  2198. buf_pool_invalidate();
  2199. /* Allow batches from recv_writer thread. */
  2200. mutex_exit(&recv_sys->writer_mutex);
  2201. log_mutex_enter();
  2202. mutex_enter(&(recv_sys->mutex));
  2203. mlog_init.reset();
  2204. } else if (!recv_no_ibuf_operations) {
  2205. /* We skipped this in buf_page_create(). */
  2206. mlog_init.ibuf_merge(mtr);
  2207. }
  2208. recv_sys->apply_log_recs = FALSE;
  2209. recv_sys->apply_batch_on = FALSE;
  2210. recv_sys_empty_hash();
  2211. mutex_exit(&recv_sys->mutex);
  2212. }
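The main loop of recv_apply_hashed_log_recs() dispatches on the state of each hashed page address. The sketch below restates that dispatch as a pure function. It is an illustration only: the enumerator order, `BatchAction` and `batch_action` are assumptions, and the fallback from page creation to a normal read when creation fails is not modeled.

```cpp
#include <cassert>

// State names as used in the listing; the order here is illustrative.
enum recv_addr_state {
    RECV_NOT_PROCESSED,
    RECV_WILL_NOT_READ,
    RECV_BEING_READ,
    RECV_BEING_PROCESSED,
    RECV_PROCESSED,
    RECV_DISCARDED
};

// Hypothetical summary of what the batch does with one page address.
enum BatchAction {
    LEAVE_ALONE,         // another step already handles or handled the page
    DROP_FROM_COUNT,     // nothing to apply: decrement n_addrs and move on
    READ_AND_APPLY,      // get the page (reading it if needed), then apply
    CREATE_WITHOUT_READ  // re-create the page from its initialization record
};

BatchAction batch_action(recv_addr_state state, bool has_records,
                         bool space_truncated) {
    if (!has_records) {
        return DROP_FROM_COUNT;
    }
    switch (state) {
    case RECV_BEING_READ:
    case RECV_BEING_PROCESSED:
    case RECV_PROCESSED:
        return LEAVE_ALONE;
    case RECV_DISCARDED:
        return DROP_FROM_COUNT;
    case RECV_NOT_PROCESSED:
    case RECV_WILL_NOT_READ:
        break;
    }
    if (space_truncated) {
        // The listing also sets the state to RECV_DISCARDED here.
        return DROP_FROM_COUNT;
    }
    return state == RECV_NOT_PROCESSED ? READ_AND_APPLY : CREATE_WITHOUT_READ;
}
```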
  2213. /** Tries to parse a single log record.
  2214. @param[out] type log record type
  2215. @param[in] ptr pointer to a buffer
  2216. @param[in] end_ptr end of the buffer
  2217. @param[out] space tablespace identifier
  2218. @param[out] page_no page number
  2219. @param[in] apply whether to apply MLOG_FILE_* records
  2220. @param[out] body start of log record body
  2221. @return length of the record, or 0 if the record was not complete */
  2222. static
  2223. ulint
  2224. recv_parse_log_rec(
  2225. mlog_id_t* type,
  2226. byte* ptr,
  2227. byte* end_ptr,
  2228. ulint* space,
  2229. ulint* page_no,
  2230. bool apply,
  2231. byte** body)
  2232. {
  2233. byte* new_ptr;
  2234. *body = NULL;
  2235. UNIV_MEM_INVALID(type, sizeof *type);
  2236. UNIV_MEM_INVALID(space, sizeof *space);
  2237. UNIV_MEM_INVALID(page_no, sizeof *page_no);
  2238. UNIV_MEM_INVALID(body, sizeof *body);
  2239. if (ptr == end_ptr) {
  2240. return(0);
  2241. }
  2242. switch (*ptr) {
  2243. #ifdef UNIV_LOG_LSN_DEBUG
  2244. case MLOG_LSN | MLOG_SINGLE_REC_FLAG:
  2245. case MLOG_LSN:
  2246. new_ptr = mlog_parse_initial_log_record(
  2247. ptr, end_ptr, type, space, page_no);
  2248. if (new_ptr != NULL) {
  2249. const lsn_t lsn = static_cast<lsn_t>(
  2250. *space) << 32 | *page_no;
  2251. ut_a(lsn == recv_sys->recovered_lsn);
  2252. }
  2253. *type = MLOG_LSN;
  2254. return(new_ptr - ptr);
  2255. #endif /* UNIV_LOG_LSN_DEBUG */
  2256. case MLOG_MULTI_REC_END:
  2257. case MLOG_DUMMY_RECORD:
  2258. *type = static_cast<mlog_id_t>(*ptr);
  2259. return(1);
  2260. case MLOG_CHECKPOINT:
  2261. if (end_ptr < ptr + SIZE_OF_MLOG_CHECKPOINT) {
  2262. return(0);
  2263. }
  2264. *type = static_cast<mlog_id_t>(*ptr);
  2265. return(SIZE_OF_MLOG_CHECKPOINT);
  2266. case MLOG_MULTI_REC_END | MLOG_SINGLE_REC_FLAG:
  2267. case MLOG_DUMMY_RECORD | MLOG_SINGLE_REC_FLAG:
  2268. case MLOG_CHECKPOINT | MLOG_SINGLE_REC_FLAG:
  2269. ib::error() << "Incorrect log record type "
  2270. << ib::hex(unsigned(*ptr));
  2271. recv_sys->found_corrupt_log = true;
  2272. return(0);
  2273. }
  2274. new_ptr = mlog_parse_initial_log_record(ptr, end_ptr, type, space,
  2275. page_no);
  2276. *body = new_ptr;
  2277. if (UNIV_UNLIKELY(!new_ptr)) {
  2278. return(0);
  2279. }
  2280. const byte* old_ptr = new_ptr;
  2281. new_ptr = recv_parse_or_apply_log_rec_body(
  2282. *type, new_ptr, end_ptr, *space, *page_no, apply, NULL, NULL);
  2283. if (UNIV_UNLIKELY(new_ptr == NULL)) {
  2284. return(0);
  2285. }
  2286. if (*page_no == 0 && *type == MLOG_4BYTES
  2287. && apply
  2288. && mach_read_from_2(old_ptr) == FSP_HEADER_OFFSET + FSP_SIZE) {
  2289. old_ptr += 2;
  2290. ulint size = mach_parse_compressed(&old_ptr, end_ptr);
  2291. recv_spaces_t::iterator it = recv_spaces.find(*space);
  2292. ut_ad(!recv_sys->mlog_checkpoint_lsn
  2293. || *space == TRX_SYS_SPACE
  2294. || srv_is_undo_tablespace(*space)
  2295. || it != recv_spaces.end());
  2296. if (it != recv_spaces.end() && !it->second.space) {
  2297. it->second.size = size;
  2298. }
  2299. fil_space_set_recv_size(*space, size);
  2300. }
  2301. return(new_ptr - ptr);
  2302. }
/*******************************************************//**
Calculates the new value for lsn when more data is added to the log. */
static
lsn_t
recv_calc_lsn_on_data_add(
/*======================*/
	lsn_t		lsn,	/*!< in: old lsn */
	ib_uint64_t	len)	/*!< in: this many bytes of data is
				added, log block headers not included */
{
	ulint		frag_len;
	ib_uint64_t	lsn_len;

	frag_len = (lsn % OS_FILE_LOG_BLOCK_SIZE) - LOG_BLOCK_HDR_SIZE;
	ut_ad(frag_len < OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_HDR_SIZE
	      - LOG_BLOCK_TRL_SIZE);
	lsn_len = len;
	lsn_len += (lsn_len + frag_len)
		/ (OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_HDR_SIZE
		   - LOG_BLOCK_TRL_SIZE)
		* (LOG_BLOCK_HDR_SIZE + LOG_BLOCK_TRL_SIZE);

	return(lsn + lsn_len);
}
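/* Illustrative sketch, not part of this file: the arithmetic of
recv_calc_lsn_on_data_add() written as a standalone function. The
constants (512-byte log block, 12-byte header, 4-byte trailer) are
assumptions standing in for OS_FILE_LOG_BLOCK_SIZE, LOG_BLOCK_HDR_SIZE
and LOG_BLOCK_TRL_SIZE; the name lsn_after_data_add is hypothetical. */

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for the log block layout; check the real headers.
constexpr std::uint64_t kBlockSize = 512;
constexpr std::uint64_t kHdrSize = 12;
constexpr std::uint64_t kTrlSize = 4;
// Payload bytes that fit in one block between header and trailer.
constexpr std::uint64_t kPayload = kBlockSize - kHdrSize - kTrlSize;

// Returns the LSN reached after appending `len` payload bytes at `lsn`.
// `lsn` must point into the payload area of its block.
std::uint64_t lsn_after_data_add(std::uint64_t lsn, std::uint64_t len)
{
    // Payload bytes already used in the current block.
    const std::uint64_t frag_len = (lsn % kBlockSize) - kHdrSize;
    assert(frag_len < kPayload);
    // Each time the payload area fills up, one trailer and one
    // header are skipped before the data continues.
    const std::uint64_t blocks_crossed = (len + frag_len) / kPayload;
    return lsn + len + blocks_crossed * (kHdrSize + kTrlSize);
}
```

/* Under these assumptions, adding 10 bytes at LSN 500 crosses one
block boundary, so the LSN advances by 10 + 16 bytes. */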
  2325. /** Prints diagnostic info of corrupt log.
  2326. @param[in] ptr pointer to corrupt log record
  2327. @param[in] type type of the log record (could be garbage)
  2328. @param[in] space tablespace ID (could be garbage)
  2329. @param[in] page_no page number (could be garbage)
  2330. @return whether processing should continue */
  2331. static
  2332. bool
  2333. recv_report_corrupt_log(
  2334. const byte* ptr,
  2335. int type,
  2336. ulint space,
  2337. ulint page_no)
  2338. {
  2339. ib::error() <<
  2340. "############### CORRUPT LOG RECORD FOUND ##################";
  2341. const ulint ptr_offset = ulint(ptr - recv_sys->buf);
  2342. ib::info() << "Log record type " << type << ", page " << space << ":"
  2343. << page_no << ". Log parsing proceeded successfully up to "
  2344. << recv_sys->recovered_lsn << ". Previous log record type "
  2345. << recv_previous_parsed_rec_type << ", is multi "
  2346. << recv_previous_parsed_rec_is_multi << " Recv offset "
  2347. << ptr_offset << ", prev "
  2348. << recv_previous_parsed_rec_offset;
  2349. ut_ad(ptr <= recv_sys->buf + recv_sys->len);
  2350. const ulint limit = 100;
  2351. const ulint prev_offset = std::min(recv_previous_parsed_rec_offset,
  2352. ptr_offset);
  2353. const ulint before = std::min(prev_offset, limit);
  2354. const ulint after = std::min(recv_sys->len - ptr_offset, limit);
  2355. ib::info() << "Hex dump starting " << before << " bytes before and"
  2356. " ending " << after << " bytes after the corrupted record:";
  2357. const byte* start = recv_sys->buf + prev_offset - before;
  2358. ut_print_buf(stderr, start, ulint(ptr - start) + after);
  2359. putc('\n', stderr);
  2360. if (!srv_force_recovery) {
  2361. ib::info() << "Set innodb_force_recovery to ignore this error.";
  2362. return(false);
  2363. }
  2364. ib::warn() << "The log file may have been corrupt and it is possible"
  2365. " that the log scan did not proceed far enough in recovery!"
  2366. " Please run CHECK TABLE on your InnoDB tables to check"
  2367. " that they are ok! If mysqld crashes after this recovery; "
  2368. << FORCE_RECOVERY_MSG;
  2369. return(true);
  2370. }
  2371. /** Report a MLOG_INDEX_LOAD operation.
  2372. @param[in] space_id tablespace id
  2373. @param[in] page_no page number
  2374. @param[in] lsn log sequence number */
  2375. ATTRIBUTE_COLD static void
  2376. recv_mlog_index_load(ulint space_id, ulint page_no, lsn_t lsn)
  2377. {
  2378. recv_spaces_t::iterator it = recv_spaces.find(space_id);
  2379. if (it != recv_spaces.end()) {
  2380. it->second.mlog_index_load(lsn);
  2381. }
  2382. if (log_optimized_ddl_op) {
  2383. log_optimized_ddl_op(space_id);
  2384. }
  2385. }
/** Check whether the memory used by the parsed redo log records has
reached the memory that the buffer pool makes available for recovery.
If so, stop storing records, and remember last_stored_lsn unless this
is the last phase.
@param[in,out]	store		whether to store page operations
@param[in]	available_mem	memory available in the buffer pool
				for reading redo log records
@return whether the memory limit was reached */
static bool recv_sys_heap_check(store_t* store, ulint available_mem)
{
	if (*store != STORE_NO
	    && mem_heap_get_size(recv_sys->heap) >= available_mem)
	{
		if (*store == STORE_YES)
			recv_sys->last_stored_lsn= recv_sys->recovered_lsn;
		*store= STORE_NO;
		DBUG_PRINT("ib_log",("Ran out of memory and last "
				     "stored lsn " LSN_PF " last stored offset "
				     ULINTPF "\n",recv_sys->recovered_lsn,
				     recv_sys->recovered_offset));
		return true;
	}
	return false;
}
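/* Illustrative sketch, not part of this file: the state transition
performed by recv_sys_heap_check(), reduced to plain data. StoreMode,
HeapState and heap_limit_check are hypothetical names; heap_size
stands in for mem_heap_get_size(recv_sys->heap). */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class StoreMode { No, Yes, IfExists };

struct HeapState {
    std::size_t heap_size;         // bytes used by stored records
    std::uint64_t recovered_lsn;   // how far parsing has progressed
    std::uint64_t last_stored_lsn; // where storing stopped
};

// Returns true if the memory limit was reached and storing was
// switched off. last_stored_lsn is only recorded when the mode was
// Yes, that is, not during the last phase (IfExists).
bool heap_limit_check(StoreMode& store, HeapState& st,
                      std::size_t available_mem)
{
    if (store != StoreMode::No && st.heap_size >= available_mem) {
        if (store == StoreMode::Yes) {
            st.last_stored_lsn = st.recovered_lsn;
        }
        store = StoreMode::No;
        return true;
    }
    return false;
}
```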
/** Parse log records from a buffer and optionally store them in a
hash table until they are merged to file pages.
@param[in]	checkpoint_lsn	the LSN of the latest checkpoint
@param[in,out]	store		whether to store page operations
@param[in]	available_mem	memory available for storing the
				parsed redo log records
@param[in]	apply		whether to apply the records
@return whether MLOG_CHECKPOINT record was seen the first time,
or corruption was noticed */
  2415. bool recv_parse_log_recs(lsn_t checkpoint_lsn, store_t* store,
  2416. ulint available_mem, bool apply)
  2417. {
  2418. byte* ptr;
  2419. byte* end_ptr;
  2420. bool single_rec;
  2421. ulint len;
  2422. lsn_t new_recovered_lsn;
  2423. lsn_t old_lsn;
  2424. mlog_id_t type;
  2425. ulint space;
  2426. ulint page_no;
  2427. byte* body;
  2428. const bool last_phase = (*store == STORE_IF_EXISTS);
  2429. ut_ad(log_mutex_own());
  2430. ut_ad(mutex_own(&recv_sys->mutex));
  2431. ut_ad(recv_sys->parse_start_lsn != 0);
  2432. loop:
  2433. ptr = recv_sys->buf + recv_sys->recovered_offset;
  2434. end_ptr = recv_sys->buf + recv_sys->len;
  2435. if (ptr == end_ptr) {
  2436. return(false);
  2437. }
  2438. /* Check for memory overflow and ignore the parsing of remaining
  2439. redo log records if InnoDB ran out of memory */
  2440. if (recv_sys_heap_check(store, available_mem) && last_phase) {
  2441. return false;
  2442. }
  2443. switch (*ptr) {
  2444. case MLOG_CHECKPOINT:
  2445. #ifdef UNIV_LOG_LSN_DEBUG
  2446. case MLOG_LSN:
  2447. #endif /* UNIV_LOG_LSN_DEBUG */
  2448. case MLOG_DUMMY_RECORD:
  2449. single_rec = true;
  2450. break;
  2451. default:
  2452. single_rec = !!(*ptr & MLOG_SINGLE_REC_FLAG);
  2453. }
  2454. if (single_rec) {
  2455. /* The mtr did not modify multiple pages */
  2456. old_lsn = recv_sys->recovered_lsn;
  2457. /* Try to parse a log record, fetching its type, space id,
  2458. page no, and a pointer to the body of the log record */
  2459. len = recv_parse_log_rec(&type, ptr, end_ptr, &space,
  2460. &page_no, apply, &body);
  2461. if (recv_sys->found_corrupt_log) {
  2462. recv_report_corrupt_log(ptr, type, space, page_no);
  2463. return(true);
  2464. }
  2465. if (recv_sys->found_corrupt_fs) {
  2466. return(true);
  2467. }
  2468. if (len == 0) {
  2469. return(false);
  2470. }
  2471. new_recovered_lsn = recv_calc_lsn_on_data_add(old_lsn, len);
  2472. if (new_recovered_lsn > recv_sys->scanned_lsn) {
  2473. /* The log record filled a log block, and we require
  2474. that also the next log block should have been scanned
  2475. in */
  2476. return(false);
  2477. }
  2478. recv_previous_parsed_rec_type = type;
  2479. recv_previous_parsed_rec_offset = recv_sys->recovered_offset;
  2480. recv_previous_parsed_rec_is_multi = 0;
  2481. recv_sys->recovered_offset += len;
  2482. recv_sys->recovered_lsn = new_recovered_lsn;
  2483. switch (type) {
  2484. lsn_t lsn;
  2485. case MLOG_DUMMY_RECORD:
  2486. /* Do nothing */
  2487. break;
  2488. case MLOG_CHECKPOINT:
  2489. #if SIZE_OF_MLOG_CHECKPOINT != 1 + 8
  2490. # error SIZE_OF_MLOG_CHECKPOINT != 1 + 8
  2491. #endif
  2492. lsn = mach_read_from_8(ptr + 1);
  2493. if (UNIV_UNLIKELY(srv_print_verbose_log == 2)) {
  2494. fprintf(stderr,
  2495. "MLOG_CHECKPOINT(" LSN_PF ") %s at "
  2496. LSN_PF "\n", lsn,
  2497. lsn != checkpoint_lsn ? "ignored"
  2498. : recv_sys->mlog_checkpoint_lsn
  2499. ? "reread" : "read",
  2500. recv_sys->recovered_lsn);
  2501. }
  2502. DBUG_PRINT("ib_log",
  2503. ("MLOG_CHECKPOINT(" LSN_PF ") %s at "
  2504. LSN_PF,
  2505. lsn,
  2506. lsn != checkpoint_lsn ? "ignored"
  2507. : recv_sys->mlog_checkpoint_lsn
  2508. ? "reread" : "read",
  2509. recv_sys->recovered_lsn));
  2510. if (lsn == checkpoint_lsn) {
  2511. if (recv_sys->mlog_checkpoint_lsn) {
  2512. ut_ad(recv_sys->mlog_checkpoint_lsn
  2513. <= recv_sys->recovered_lsn);
  2514. break;
  2515. }
  2516. recv_sys->mlog_checkpoint_lsn
  2517. = recv_sys->recovered_lsn;
  2518. return(true);
  2519. }
  2520. break;
  2521. #ifdef UNIV_LOG_LSN_DEBUG
  2522. case MLOG_LSN:
  2523. /* Do not add these records to the hash table.
  2524. The page number and space id fields are misused
  2525. for something else. */
  2526. break;
  2527. #endif /* UNIV_LOG_LSN_DEBUG */
  2528. default:
  2529. switch (*store) {
  2530. case STORE_NO:
  2531. break;
  2532. case STORE_IF_EXISTS:
  2533. if (fil_space_get_flags(space)
  2534. == ULINT_UNDEFINED) {
  2535. break;
  2536. }
  2537. /* fall through */
  2538. case STORE_YES:
  2539. recv_add_to_hash_table(
  2540. type, space, page_no, body,
  2541. ptr + len, old_lsn,
  2542. recv_sys->recovered_lsn);
  2543. }
  2544. /* fall through */
  2545. case MLOG_INDEX_LOAD:
  2546. if (type == MLOG_INDEX_LOAD) {
  2547. recv_mlog_index_load(space, page_no, old_lsn);
  2548. }
  2549. /* fall through */
  2550. case MLOG_FILE_NAME:
  2551. case MLOG_FILE_DELETE:
  2552. case MLOG_FILE_CREATE2:
  2553. case MLOG_FILE_RENAME2:
  2554. case MLOG_TRUNCATE:
  2555. /* These were already handled by
  2556. recv_parse_log_rec() and
  2557. recv_parse_or_apply_log_rec_body(). */
  2558. DBUG_PRINT("ib_log",
  2559. ("scan " LSN_PF ": log rec %s"
  2560. " len " ULINTPF
  2561. " page " ULINTPF ":" ULINTPF,
  2562. old_lsn, get_mlog_string(type),
  2563. len, space, page_no));
  2564. }
  2565. } else {
  2566. /* Check that all the records associated with the single mtr
  2567. are included within the buffer */
  2568. ulint total_len = 0;
  2569. ulint n_recs = 0;
  2570. bool only_mlog_file = true;
  2571. ulint mlog_rec_len = 0;
  2572. for (;;) {
  2573. len = recv_parse_log_rec(
  2574. &type, ptr, end_ptr, &space, &page_no,
  2575. false, &body);
  2576. if (recv_sys->found_corrupt_log) {
  2577. corrupted_log:
  2578. recv_report_corrupt_log(
  2579. ptr, type, space, page_no);
  2580. return(true);
  2581. }
  2582. if (ptr == end_ptr) {
  2583. } else if (type == MLOG_CHECKPOINT
  2584. || (*ptr & MLOG_SINGLE_REC_FLAG)) {
  2585. recv_sys->found_corrupt_log = true;
  2586. goto corrupted_log;
  2587. }
  2588. if (recv_sys->found_corrupt_fs) {
  2589. return(true);
  2590. }
  2591. if (len == 0) {
  2592. return(false);
  2593. }
  2594. recv_previous_parsed_rec_type = type;
  2595. recv_previous_parsed_rec_offset
  2596. = recv_sys->recovered_offset + total_len;
  2597. recv_previous_parsed_rec_is_multi = 1;
			/* MLOG_FILE_NAME redo log records do not make
			changes to persistent data. As long as only
			MLOG_FILE_NAME records have been seen in this
			mini-transaction, advance the parsing position by
			updating recovered_lsn and recovered_offset. */
  2602. if (type != MLOG_FILE_NAME && only_mlog_file == true) {
  2603. only_mlog_file = false;
  2604. }
  2605. if (only_mlog_file) {
  2606. new_recovered_lsn = recv_calc_lsn_on_data_add(
  2607. recv_sys->recovered_lsn, len);
  2608. mlog_rec_len += len;
  2609. recv_sys->recovered_offset += len;
  2610. recv_sys->recovered_lsn = new_recovered_lsn;
  2611. }
  2612. total_len += len;
  2613. n_recs++;
  2614. ptr += len;
  2615. if (type == MLOG_MULTI_REC_END) {
  2616. DBUG_PRINT("ib_log",
  2617. ("scan " LSN_PF
  2618. ": multi-log end"
  2619. " total_len " ULINTPF
  2620. " n=" ULINTPF,
  2621. recv_sys->recovered_lsn,
  2622. total_len, n_recs));
  2623. total_len -= mlog_rec_len;
  2624. break;
  2625. }
  2626. DBUG_PRINT("ib_log",
  2627. ("scan " LSN_PF ": multi-log rec %s"
  2628. " len " ULINTPF
  2629. " page " ULINTPF ":" ULINTPF,
  2630. recv_sys->recovered_lsn,
  2631. get_mlog_string(type), len, space, page_no));
  2632. }
  2633. new_recovered_lsn = recv_calc_lsn_on_data_add(
  2634. recv_sys->recovered_lsn, total_len);
  2635. if (new_recovered_lsn > recv_sys->scanned_lsn) {
  2636. /* The log record filled a log block, and we require
  2637. that also the next log block should have been scanned
  2638. in */
  2639. return(false);
  2640. }
  2641. /* Add all the records to the hash table */
  2642. ptr = recv_sys->buf + recv_sys->recovered_offset;
  2643. for (;;) {
  2644. old_lsn = recv_sys->recovered_lsn;
  2645. /* This will apply MLOG_FILE_ records. We
  2646. had to skip them in the first scan, because we
  2647. did not know if the mini-transaction was
  2648. completely recovered (until MLOG_MULTI_REC_END). */
  2649. len = recv_parse_log_rec(
  2650. &type, ptr, end_ptr, &space, &page_no,
  2651. apply, &body);
  2652. if (recv_sys->found_corrupt_log
  2653. && !recv_report_corrupt_log(
  2654. ptr, type, space, page_no)) {
  2655. return(true);
  2656. }
  2657. if (recv_sys->found_corrupt_fs) {
  2658. return(true);
  2659. }
  2660. ut_a(len != 0);
  2661. ut_a(!(*ptr & MLOG_SINGLE_REC_FLAG));
  2662. recv_sys->recovered_offset += len;
  2663. recv_sys->recovered_lsn
  2664. = recv_calc_lsn_on_data_add(old_lsn, len);
  2665. switch (type) {
  2666. case MLOG_MULTI_REC_END:
  2667. /* Found the end mark for the records */
  2668. goto loop;
  2669. #ifdef UNIV_LOG_LSN_DEBUG
  2670. case MLOG_LSN:
  2671. /* Do not add these records to the hash table.
  2672. The page number and space id fields are misused
  2673. for something else. */
  2674. break;
  2675. #endif /* UNIV_LOG_LSN_DEBUG */
  2676. case MLOG_INDEX_LOAD:
  2677. recv_mlog_index_load(space, page_no, old_lsn);
  2678. break;
  2679. case MLOG_FILE_NAME:
  2680. case MLOG_FILE_DELETE:
  2681. case MLOG_FILE_CREATE2:
  2682. case MLOG_FILE_RENAME2:
  2683. case MLOG_TRUNCATE:
  2684. /* These were already handled by
  2685. recv_parse_log_rec() and
  2686. recv_parse_or_apply_log_rec_body(). */
  2687. break;
  2688. default:
  2689. switch (*store) {
  2690. case STORE_NO:
  2691. break;
  2692. case STORE_IF_EXISTS:
  2693. if (fil_space_get_flags(space)
  2694. == ULINT_UNDEFINED) {
  2695. break;
  2696. }
  2697. /* fall through */
  2698. case STORE_YES:
  2699. recv_add_to_hash_table(
  2700. type, space, page_no,
  2701. body, ptr + len,
  2702. old_lsn,
  2703. new_recovered_lsn);
  2704. }
  2705. }
  2706. ptr += len;
  2707. }
  2708. }
  2709. goto loop;
  2710. }
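/* Illustrative sketch, not part of this file: the two-pass idea used
for multi-record mini-transactions in recv_parse_log_recs(). The first
pass only checks that the end marker is inside the buffer; the second
pass stores the records. The record layout below ([type][payload
length][payload]) is a toy format, NOT the InnoDB redo log format, and
the value 31 for the end marker is an assumption. All names are
hypothetical. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kMultiRecEnd = 31; // assumed end-marker type

// Length of the toy record at ptr, or 0 if it is incomplete.
std::size_t toy_record_len(const std::uint8_t* ptr, const std::uint8_t* end)
{
    if (end - ptr < 1) return 0;
    if (*ptr == kMultiRecEnd) return 1;
    if (end - ptr < 2) return 0;
    const std::size_t len = 2 + static_cast<std::size_t>(ptr[1]);
    return static_cast<std::size_t>(end - ptr) < len ? 0 : len;
}

// Pass 1: total length of the group including the end marker,
// or 0 if the group is not completely in the buffer yet.
std::size_t complete_group_len(const std::vector<std::uint8_t>& buf)
{
    const std::uint8_t* ptr = buf.data();
    const std::uint8_t* end = ptr + buf.size();
    std::size_t total = 0;
    for (;;) {
        const std::size_t len = toy_record_len(ptr, end);
        if (len == 0) return 0; // wait for more data
        total += len;
        if (*ptr == kMultiRecEnd) return total;
        ptr += len;
    }
}

// Pass 2: store the record types, but only for a complete group.
std::vector<std::uint8_t> store_group(const std::vector<std::uint8_t>& buf)
{
    std::vector<std::uint8_t> stored;
    if (complete_group_len(buf) == 0) return stored;
    const std::uint8_t* ptr = buf.data();
    const std::uint8_t* end = ptr + buf.size();
    while (*ptr != kMultiRecEnd) {
        stored.push_back(*ptr);
        ptr += toy_record_len(ptr, end);
    }
    return stored;
}
```

/* Nothing is stored for a group whose end marker has not arrived,
which mirrors why the function above returns false and waits for the
next log block in that case. */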
  2711. /** Adds data from a new log block to the parsing buffer of recv_sys if
  2712. recv_sys->parse_start_lsn is non-zero.
  2713. @param[in] log_block log block to add
  2714. @param[in] scanned_lsn lsn of how far we were able to find
  2715. data in this log block
  2716. @return true if more data added */
  2717. bool recv_sys_add_to_parsing_buf(const byte* log_block, lsn_t scanned_lsn)
  2718. {
  2719. ulint more_len;
  2720. ulint data_len;
  2721. ulint start_offset;
  2722. ulint end_offset;
  2723. ut_ad(scanned_lsn >= recv_sys->scanned_lsn);
  2724. if (!recv_sys->parse_start_lsn) {
		/* Cannot start parsing yet, because no start point
		for it has been found */
  2727. return(false);
  2728. }
  2729. data_len = log_block_get_data_len(log_block);
  2730. if (recv_sys->parse_start_lsn >= scanned_lsn) {
  2731. return(false);
  2732. } else if (recv_sys->scanned_lsn >= scanned_lsn) {
  2733. return(false);
  2734. } else if (recv_sys->parse_start_lsn > recv_sys->scanned_lsn) {
  2735. more_len = (ulint) (scanned_lsn - recv_sys->parse_start_lsn);
  2736. } else {
  2737. more_len = (ulint) (scanned_lsn - recv_sys->scanned_lsn);
  2738. }
  2739. if (more_len == 0) {
  2740. return(false);
  2741. }
  2742. ut_ad(data_len >= more_len);
  2743. start_offset = data_len - more_len;
  2744. if (start_offset < LOG_BLOCK_HDR_SIZE) {
  2745. start_offset = LOG_BLOCK_HDR_SIZE;
  2746. }
  2747. end_offset = data_len;
  2748. if (end_offset > OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_TRL_SIZE) {
  2749. end_offset = OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_TRL_SIZE;
  2750. }
  2751. ut_ad(start_offset <= end_offset);
  2752. if (start_offset < end_offset) {
  2753. ut_memcpy(recv_sys->buf + recv_sys->len,
  2754. log_block + start_offset, end_offset - start_offset);
  2755. recv_sys->len += end_offset - start_offset;
  2756. ut_a(recv_sys->len <= RECV_PARSING_BUF_SIZE);
  2757. }
  2758. return(true);
  2759. }
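/* Illustrative sketch, not part of this file: how
recv_sys_add_to_parsing_buf() clamps the copied range to the payload
area of a log block. The constants (512-byte block, 12-byte header,
4-byte trailer) are assumed values; PayloadSlice and
block_payload_slice are hypothetical names. */

```cpp
#include <cassert>
#include <cstddef>

// Assumed log block layout; check the real headers.
constexpr std::size_t kLogBlockSize = 512;
constexpr std::size_t kLogBlockHdr = 12;
constexpr std::size_t kLogBlockTrl = 4;

struct PayloadSlice {
    std::size_t start; // first byte offset to copy
    std::size_t end;   // one past the last byte offset to copy
};

// data_len: bytes used in the block, header (and trailer, when the
// block is full) included. more_len: how many of those bytes are new.
PayloadSlice block_payload_slice(std::size_t data_len, std::size_t more_len)
{
    assert(data_len >= more_len);
    std::size_t start = data_len - more_len;
    if (start < kLogBlockHdr) {
        start = kLogBlockHdr; // never copy the block header
    }
    std::size_t end = data_len;
    if (end > kLogBlockSize - kLogBlockTrl) {
        end = kLogBlockSize - kLogBlockTrl; // never copy the trailer
    }
    return PayloadSlice{start, end};
}
```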
/** Moves the parsing buffer data left to the buffer start. */
void recv_sys_justify_left_parsing_buf()
{
	memmove(recv_sys->buf,
		recv_sys->buf + recv_sys->recovered_offset,
		recv_sys->len - recv_sys->recovered_offset);

	recv_sys->len -= recv_sys->recovered_offset;

	recv_sys->recovered_offset = 0;
}
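/* Illustrative sketch, not part of this file: the same left
justification on a self-contained buffer type. ParseBuf and
justify_left are hypothetical names standing in for recv_sys and the
function above. std::memmove is used because source and destination
may overlap. */

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct ParseBuf {
    std::vector<unsigned char> buf; // backing storage
    std::size_t len;                // bytes of valid data in buf
    std::size_t recovered_offset;   // bytes already parsed
};

// Discard the already parsed prefix by moving the unparsed tail
// to the start of the buffer.
void justify_left(ParseBuf& p)
{
    std::memmove(p.buf.data(),
                 p.buf.data() + p.recovered_offset,
                 p.len - p.recovered_offset);
    p.len -= p.recovered_offset;
    p.recovered_offset = 0;
}
```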
/** Scan redo log from a buffer and store new log data to the parsing buffer.
Parse and hash the log records if new data is found.
Apply log records automatically when the hash table becomes full.
@param[in]	available_mem		we let the hash table of recs grow
					to this size, at the maximum
@param[in,out]	store_to_hash		whether the records should be
					stored to the hash table; this is
					reset if just debug checking is
					needed, or when the available_mem
					runs out
@param[in]	log_block		log segment
@param[in]	checkpoint_lsn		latest checkpoint LSN
@param[in]	start_lsn		buffer start LSN
@param[in]	end_lsn			buffer end LSN
@param[in,out]	contiguous_lsn		it is known that all groups contain
					contiguous log data up to this lsn
@param[out]	group_scanned_lsn	scanning succeeded up to this lsn
@return true if not able to scan any more in this log group */
  2787. static bool recv_scan_log_recs(
  2788. ulint available_mem,
  2789. store_t* store_to_hash,
  2790. const byte* log_block,
  2791. lsn_t checkpoint_lsn,
  2792. lsn_t start_lsn,
  2793. lsn_t end_lsn,
  2794. lsn_t* contiguous_lsn,
  2795. lsn_t* group_scanned_lsn)
  2796. {
  2797. lsn_t scanned_lsn = start_lsn;
  2798. bool finished = false;
  2799. ulint data_len;
  2800. bool more_data = false;
  2801. bool apply = recv_sys->mlog_checkpoint_lsn != 0;
  2802. ulint recv_parsing_buf_size = RECV_PARSING_BUF_SIZE;
  2803. const bool last_phase = (*store_to_hash == STORE_IF_EXISTS);
  2804. ut_ad(start_lsn % OS_FILE_LOG_BLOCK_SIZE == 0);
  2805. ut_ad(end_lsn % OS_FILE_LOG_BLOCK_SIZE == 0);
  2806. ut_ad(end_lsn >= start_lsn + OS_FILE_LOG_BLOCK_SIZE);
  2807. const byte* const log_end = log_block
  2808. + ulint(end_lsn - start_lsn);
  2809. do {
  2810. ut_ad(!finished);
  2811. if (log_block_get_flush_bit(log_block)) {
  2812. /* This block was a start of a log flush operation:
  2813. we know that the previous flush operation must have
  2814. been completed for all log groups before this block
  2815. can have been flushed to any of the groups. Therefore,
  2816. we know that log data is contiguous up to scanned_lsn
  2817. in all non-corrupt log groups. */
  2818. if (scanned_lsn > *contiguous_lsn) {
  2819. *contiguous_lsn = scanned_lsn;
  2820. }
  2821. }
  2822. data_len = log_block_get_data_len(log_block);
  2823. if (scanned_lsn + data_len > recv_sys->scanned_lsn
  2824. && log_block_get_checkpoint_no(log_block)
  2825. < recv_sys->scanned_checkpoint_no
  2826. && (recv_sys->scanned_checkpoint_no
  2827. - log_block_get_checkpoint_no(log_block)
  2828. > 0x80000000UL)) {
  2829. /* Garbage from a log buffer flush which was made
  2830. before the most recent database recovery */
  2831. finished = true;
  2832. break;
  2833. }
  2834. if (!recv_sys->parse_start_lsn
  2835. && (log_block_get_first_rec_group(log_block) > 0)) {
  2836. /* We found a point from which to start the parsing
  2837. of log records */
  2838. recv_sys->parse_start_lsn = scanned_lsn
  2839. + log_block_get_first_rec_group(log_block);
  2840. recv_sys->scanned_lsn = recv_sys->parse_start_lsn;
  2841. recv_sys->recovered_lsn = recv_sys->parse_start_lsn;
  2842. }
  2843. scanned_lsn += data_len;
  2844. if (data_len == LOG_BLOCK_HDR_SIZE + SIZE_OF_MLOG_CHECKPOINT
  2845. && scanned_lsn == checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT
  2846. && log_block[LOG_BLOCK_HDR_SIZE] == MLOG_CHECKPOINT
  2847. && checkpoint_lsn == mach_read_from_8(LOG_BLOCK_HDR_SIZE
  2848. + 1 + log_block)) {
  2849. /* The redo log is logically empty. */
  2850. ut_ad(recv_sys->mlog_checkpoint_lsn == 0
  2851. || recv_sys->mlog_checkpoint_lsn
  2852. == checkpoint_lsn);
  2853. recv_sys->mlog_checkpoint_lsn = checkpoint_lsn;
  2854. DBUG_PRINT("ib_log", ("found empty log; LSN=" LSN_PF,
  2855. scanned_lsn));
  2856. finished = true;
  2857. break;
  2858. }
  2859. if (scanned_lsn > recv_sys->scanned_lsn) {
  2860. ut_ad(!srv_log_files_created);
  2861. if (!recv_needed_recovery) {
  2862. recv_needed_recovery = true;
  2863. if (srv_read_only_mode) {
  2864. ib::warn() << "innodb_read_only"
  2865. " prevents crash recovery";
  2866. return(true);
  2867. }
  2868. ib::info() << "Starting crash recovery from"
  2869. " checkpoint LSN="
  2870. << recv_sys->scanned_lsn;
  2871. }
  2872. /* We were able to find more log data: add it to the
  2873. parsing buffer if parse_start_lsn is already
  2874. non-zero */
  2875. DBUG_EXECUTE_IF(
  2876. "reduce_recv_parsing_buf",
  2877. recv_parsing_buf_size
  2878. = (70 * 1024);
  2879. );
  2880. if (recv_sys->len + 4 * OS_FILE_LOG_BLOCK_SIZE
  2881. >= recv_parsing_buf_size) {
  2882. ib::error() << "Log parsing buffer overflow."
  2883. " Recovery may have failed!";
  2884. recv_sys->found_corrupt_log = true;
  2885. if (!srv_force_recovery) {
  2886. ib::error()
  2887. << "Set innodb_force_recovery"
  2888. " to ignore this error.";
  2889. return(true);
  2890. }
  2891. } else if (!recv_sys->found_corrupt_log) {
  2892. more_data = recv_sys_add_to_parsing_buf(
  2893. log_block, scanned_lsn);
  2894. }
  2895. recv_sys->scanned_lsn = scanned_lsn;
  2896. recv_sys->scanned_checkpoint_no
  2897. = log_block_get_checkpoint_no(log_block);
  2898. }
		/* During the last phase of scanning, there can be redo
		log records left in recv_sys->buf that still need to be
		parsed and stored in recv_sys->heap */
  2901. if (last_phase
  2902. && recv_sys->recovered_lsn < recv_sys->scanned_lsn) {
  2903. more_data = true;
  2904. }
  2905. if (data_len < OS_FILE_LOG_BLOCK_SIZE) {
  2906. /* Log data for this group ends here */
  2907. finished = true;
  2908. break;
  2909. } else {
  2910. log_block += OS_FILE_LOG_BLOCK_SIZE;
  2911. }
  2912. } while (log_block < log_end);
  2913. *group_scanned_lsn = scanned_lsn;
  2914. mutex_enter(&recv_sys->mutex);
  2915. if (more_data && !recv_sys->found_corrupt_log) {
  2916. /* Try to parse more log records */
  2917. if (recv_parse_log_recs(checkpoint_lsn,
  2918. store_to_hash, available_mem,
  2919. apply)) {
  2920. ut_ad(recv_sys->found_corrupt_log
  2921. || recv_sys->found_corrupt_fs
  2922. || recv_sys->mlog_checkpoint_lsn
  2923. == recv_sys->recovered_lsn);
  2924. finished = true;
  2925. goto func_exit;
  2926. }
  2927. recv_sys_heap_check(store_to_hash, available_mem);
  2928. if (recv_sys->recovered_offset > recv_parsing_buf_size / 4) {
  2929. /* Move parsing buffer data to the buffer start */
  2930. recv_sys_justify_left_parsing_buf();
  2931. }
		/* The redo log records that are stored in
		recv_sys->buf need to be parsed again */
  2934. if (last_phase && *store_to_hash == STORE_NO) {
  2935. finished = false;
  2936. }
  2937. }
  2938. func_exit:
  2939. mutex_exit(&recv_sys->mutex);
  2940. return(finished);
  2941. }
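/* Illustrative sketch, not part of this file: the checkpoint-number
comparison that recv_scan_log_recs() uses to recognize garbage left by
a log flush from before the most recent recovery. Only the comparison
is shown; the accompanying LSN condition is omitted. The 32-bit
parameter types and the name is_stale_block are assumptions. */

```cpp
#include <cassert>
#include <cstdint>

// A block counts as stale when its checkpoint number is lower than
// the last scanned one by more than half of the 32-bit range.
// Smaller gaps are not treated as garbage by this test.
bool is_stale_block(std::uint32_t block_checkpoint_no,
                    std::uint32_t scanned_checkpoint_no)
{
    return block_checkpoint_no < scanned_checkpoint_no
        && scanned_checkpoint_no - block_checkpoint_no > 0x80000000UL;
}
```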
  2942. /** Scans log from a buffer and stores new log data to the parsing buffer.
  2943. Parses and hashes the log records if new data found.
  2944. @param[in,out] group log group
  2945. @param[in] checkpoint_lsn latest checkpoint log sequence number
  2946. @param[in,out] contiguous_lsn log sequence number
  2947. until which all redo log has been scanned
  2948. @param[in] last_phase whether changes
  2949. can be applied to the tablespaces
  2950. @return whether rescan is needed (not everything was stored) */
  2951. static
  2952. bool
  2953. recv_group_scan_log_recs(
  2954. log_group_t* group,
  2955. lsn_t checkpoint_lsn,
  2956. lsn_t* contiguous_lsn,
  2957. bool last_phase)
  2958. {
  2959. DBUG_ENTER("recv_group_scan_log_recs");
  2960. DBUG_ASSERT(!last_phase || recv_sys->mlog_checkpoint_lsn > 0);
  2961. mutex_enter(&recv_sys->mutex);
  2962. recv_sys->len = 0;
  2963. recv_sys->recovered_offset = 0;
  2964. recv_sys->n_addrs = 0;
  2965. recv_sys_empty_hash();
  2966. srv_start_lsn = *contiguous_lsn;
  2967. recv_sys->parse_start_lsn = *contiguous_lsn;
  2968. recv_sys->scanned_lsn = *contiguous_lsn;
  2969. recv_sys->recovered_lsn = *contiguous_lsn;
  2970. recv_sys->scanned_checkpoint_no = 0;
  2971. recv_previous_parsed_rec_type = MLOG_SINGLE_REC_FLAG;
  2972. recv_previous_parsed_rec_offset = 0;
  2973. recv_previous_parsed_rec_is_multi = 0;
  2974. ut_ad(recv_max_page_lsn == 0);
  2975. ut_ad(last_phase || !recv_writer_thread_active);
  2976. mutex_exit(&recv_sys->mutex);
  2977. lsn_t start_lsn;
  2978. lsn_t end_lsn;
  2979. store_t store_to_hash = recv_sys->mlog_checkpoint_lsn == 0
  2980. ? STORE_NO : (last_phase ? STORE_IF_EXISTS : STORE_YES);
  2981. ulint available_mem = (buf_pool_get_n_pages() * 2 / 3)
  2982. << srv_page_size_shift;
  2983. group->scanned_lsn = end_lsn = *contiguous_lsn = ut_uint64_align_down(
  2984. *contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
  2985. do {
  2986. if (last_phase && store_to_hash == STORE_NO) {
  2987. store_to_hash = STORE_IF_EXISTS;
  2988. /* We must not allow change buffer
  2989. merge here, because it would generate
  2990. redo log records before we have
  2991. finished the redo log scan. */
  2992. recv_apply_hashed_log_recs(false);
  2993. /* Rescan the redo logs from last stored lsn */
  2994. end_lsn = recv_sys->recovered_lsn;
  2995. }
  2996. start_lsn = ut_uint64_align_down(end_lsn,
  2997. OS_FILE_LOG_BLOCK_SIZE);
  2998. end_lsn = start_lsn;
  2999. log_group_read_log_seg(
  3000. log_sys->buf, group, &end_lsn,
  3001. start_lsn + RECV_SCAN_SIZE);
  3002. } while (end_lsn != start_lsn
  3003. && !recv_scan_log_recs(
  3004. available_mem, &store_to_hash, log_sys->buf,
  3005. checkpoint_lsn, start_lsn, end_lsn,
  3006. contiguous_lsn, &group->scanned_lsn));
  3007. if (recv_sys->found_corrupt_log || recv_sys->found_corrupt_fs) {
  3008. DBUG_RETURN(false);
  3009. }
  3010. DBUG_PRINT("ib_log", ("%s " LSN_PF " completed",
  3011. last_phase ? "rescan" : "scan",
  3012. group->scanned_lsn));
  3013. DBUG_RETURN(store_to_hash == STORE_NO);
  3014. }
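/* Illustrative sketch, not part of this file: the two setup decisions
at the top of recv_group_scan_log_recs(), written as pure functions.
recovery_mem_budget mirrors the "two thirds of the buffer pool"
computation and initial_store_mode mirrors the choice of store_to_hash.
All names are hypothetical; n_pages stands in for
buf_pool_get_n_pages() and page_size_shift for srv_page_size_shift. */

```cpp
#include <cassert>
#include <cstdint>

enum class ScanStore { No, Yes, IfExists };

// Two thirds of the buffer pool pages, converted to bytes.
std::uint64_t recovery_mem_budget(std::uint64_t n_pages,
                                  unsigned page_size_shift)
{
    return (n_pages * 2 / 3) << page_size_shift;
}

// Before MLOG_CHECKPOINT has been seen, nothing is stored. After
// that, the last phase stores only records for known tablespaces.
ScanStore initial_store_mode(std::uint64_t mlog_checkpoint_lsn,
                             bool last_phase)
{
    if (mlog_checkpoint_lsn == 0) {
        return ScanStore::No;
    }
    return last_phase ? ScanStore::IfExists : ScanStore::Yes;
}
```

/* For example, 8192 pages of 16 KiB (shift 14) give a budget of
5461 pages, or 89473024 bytes. */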
  3015. /** Report a missing tablespace for which page-redo log exists.
  3016. @param[in] err previous error code
  3017. @param[in] i tablespace descriptor
  3018. @return new error code */
  3019. static
  3020. dberr_t
  3021. recv_init_missing_space(dberr_t err, const recv_spaces_t::const_iterator& i)
  3022. {
  3023. if (srv_operation == SRV_OPERATION_RESTORE
  3024. || srv_operation == SRV_OPERATION_RESTORE_EXPORT) {
  3025. ib::warn() << "Tablespace " << i->first << " was not"
  3026. " found at " << i->second.name << " when"
  3027. " restoring a (partial?) backup. All redo log"
  3028. " for this file will be ignored!";
  3029. return(err);
  3030. }
  3031. if (srv_force_recovery == 0) {
  3032. ib::error() << "Tablespace " << i->first << " was not"
  3033. " found at " << i->second.name << ".";
  3034. if (err == DB_SUCCESS) {
  3035. ib::error() << "Set innodb_force_recovery=1 to"
  3036. " ignore this and to permanently lose"
  3037. " all changes to the tablespace.";
  3038. err = DB_TABLESPACE_NOT_FOUND;
  3039. }
  3040. } else {
  3041. ib::warn() << "Tablespace " << i->first << " was not"
  3042. " found at " << i->second.name << ", and"
  3043. " innodb_force_recovery was set. All redo log"
  3044. " for this tablespace will be ignored!";
  3045. }
  3046. return(err);
  3047. }
  3048. /** Report the missing tablespace and discard the redo logs for the deleted
  3049. tablespace.
  3050. @param[in] rescan rescan of redo logs is needed
  3051. if hash table ran out of memory
  3052. @param[out] missing_tablespace missing tablespace exists or not
  3053. @return error code or DB_SUCCESS. */
  3054. static MY_ATTRIBUTE((warn_unused_result))
  3055. dberr_t
  3056. recv_validate_tablespace(bool rescan, bool& missing_tablespace)
  3057. {
  3058. dberr_t err = DB_SUCCESS;
  3059. for (ulint h = 0; h < hash_get_n_cells(recv_sys->addr_hash); h++) {
  3060. for (recv_addr_t* recv_addr = static_cast<recv_addr_t*>(
  3061. HASH_GET_FIRST(recv_sys->addr_hash, h));
  3062. recv_addr != 0;
  3063. recv_addr = static_cast<recv_addr_t*>(
  3064. HASH_GET_NEXT(addr_hash, recv_addr))) {
  3065. const ulint space = recv_addr->space;
  3066. if (is_predefined_tablespace(space)) {
  3067. continue;
  3068. }
  3069. recv_spaces_t::iterator i = recv_spaces.find(space);
  3070. ut_ad(i != recv_spaces.end());
  3071. switch (i->second.status) {
  3072. case file_name_t::MISSING:
  3073. err = recv_init_missing_space(err, i);
  3074. i->second.status = file_name_t::DELETED;
  3075. /* fall through */
  3076. case file_name_t::DELETED:
  3077. recv_addr->state = RECV_DISCARDED;
  3078. /* fall through */
  3079. case file_name_t::NORMAL:
  3080. continue;
  3081. }
  3082. ut_ad(0);
  3083. }
  3084. }
  3085. if (err != DB_SUCCESS) {
  3086. return(err);
  3087. }
	/* When rescan is not needed, recv_sys->addr_hash contains
	all tablespace IDs that are referenced by the redo log. If rescan
	is needed and innodb_force_recovery > 0, InnoDB can ignore
	missing tablespaces. */
  3091. for (recv_spaces_t::iterator i = recv_spaces.begin();
  3092. i != recv_spaces.end(); i++) {
  3093. if (i->second.status != file_name_t::MISSING) {
  3094. continue;
  3095. }
  3096. missing_tablespace = true;
  3097. if (srv_force_recovery > 0) {
  3098. ib::warn() << "Tablespace " << i->first
  3099. <<" was not found at " << i->second.name
  3100. <<", and innodb_force_recovery was set."
  3101. <<" All redo log for this tablespace"
  3102. <<" will be ignored!";
  3103. continue;
  3104. }
  3105. if (!rescan) {
  3106. ib::info() << "Tablespace " << i->first
  3107. << " was not found at '"
  3108. << i->second.name << "', but there"
  3109. <<" were no modifications either.";
  3110. }
  3111. }
  3112. if (!rescan || srv_force_recovery > 0) {
  3113. missing_tablespace = false;
  3114. }
  3115. return DB_SUCCESS;
  3116. }
  3117. /** Check if all tablespaces were found for crash recovery.
  3118. @param[in] rescan rescan of redo logs is needed
  3119. @param[out] missing_tablespace missing table exists
  3120. @return error code or DB_SUCCESS */
  3121. static MY_ATTRIBUTE((warn_unused_result))
  3122. dberr_t
  3123. recv_init_crash_recovery_spaces(bool rescan, bool& missing_tablespace)
  3124. {
  3125. bool flag_deleted = false;
  3126. ut_ad(!srv_read_only_mode);
  3127. ut_ad(recv_needed_recovery);
  3128. for (recv_spaces_t::iterator i = recv_spaces.begin();
  3129. i != recv_spaces.end(); i++) {
  3130. ut_ad(!is_predefined_tablespace(i->first));
  3131. ut_ad(i->second.status != file_name_t::DELETED || !i->second.space);
  3132. if (i->second.status == file_name_t::DELETED) {
  3133. /* The tablespace was deleted,
  3134. so we can ignore any redo log for it. */
  3135. flag_deleted = true;
  3136. } else if (i->second.space != NULL) {
  3137. /* The tablespace was found, and there
  3138. are some redo log records for it. */
  3139. fil_names_dirty(i->second.space);
  3140. i->second.space->enable_lsn = i->second.enable_lsn;
  3141. } else if (i->second.name == "") {
  3142. ib::error() << "Missing MLOG_FILE_NAME"
  3143. " or MLOG_FILE_DELETE"
  3144. " before MLOG_CHECKPOINT for tablespace "
  3145. << i->first;
  3146. recv_sys->found_corrupt_log = true;
  3147. return(DB_CORRUPTION);
  3148. } else {
  3149. i->second.status = file_name_t::MISSING;
  3150. flag_deleted = true;
  3151. }
  3152. ut_ad(i->second.status == file_name_t::DELETED || i->second.name != "");
  3153. }
  3154. if (flag_deleted) {
  3155. return recv_validate_tablespace(rescan, missing_tablespace);
  3156. }
  3157. return DB_SUCCESS;
  3158. }
  3159. /** Start recovering from a redo log checkpoint.
  3160. @see recv_recovery_from_checkpoint_finish
  3161. @param[in] flush_lsn FIL_PAGE_FILE_FLUSH_LSN
  3162. of first system tablespace page
  3163. @return error code or DB_SUCCESS */
  3164. dberr_t
  3165. recv_recovery_from_checkpoint_start(lsn_t flush_lsn)
  3166. {
  3167. ulint max_cp_field;
  3168. lsn_t checkpoint_lsn;
  3169. bool rescan;
  3170. ib_uint64_t checkpoint_no;
  3171. lsn_t contiguous_lsn;
  3172. byte* buf;
  3173. dberr_t err = DB_SUCCESS;
  3174. ut_ad(srv_operation == SRV_OPERATION_NORMAL
  3175. || srv_operation == SRV_OPERATION_RESTORE
  3176. || srv_operation == SRV_OPERATION_RESTORE_EXPORT);
  3177. /* Initialize red-black tree for fast insertions into the
  3178. flush_list during recovery process. */
  3179. buf_flush_init_flush_rbt();
  3180. if (srv_force_recovery >= SRV_FORCE_NO_LOG_REDO) {
  3181. ib::info() << "innodb_force_recovery=6 skips redo log apply";
  3182. return(DB_SUCCESS);
  3183. }
  3184. recv_recovery_on = true;
  3185. log_mutex_enter();
  3186. /* Look for the latest checkpoint from any of the log groups */
  3187. err = recv_find_max_checkpoint(&max_cp_field);
  3188. if (err != DB_SUCCESS) {
  3189. skip_apply:
  3190. log_mutex_exit();
  3191. return(err);
  3192. }
  3193. switch (log_sys->log.format) {
  3194. case 0:
  3195. break;
  3196. case LOG_HEADER_FORMAT_10_2:
  3197. case LOG_HEADER_FORMAT_10_2 | LOG_HEADER_FORMAT_ENCRYPTED:
  3198. break;
  3199. case LOG_HEADER_FORMAT_10_3:
  3200. case LOG_HEADER_FORMAT_10_3 | LOG_HEADER_FORMAT_ENCRYPTED:
  3201. if (log_sys->log.subformat == 1) {
  3202. /* 10.2 with new crash-safe TRUNCATE */
  3203. break;
  3204. }
  3205. /* fall through */
  3206. default:
  3207. /* This must be a clean log from a newer version. */
  3208. goto skip_apply;
  3209. }
  3210. log_group_header_read(&log_sys->log, max_cp_field);
  3211. buf = log_sys->checkpoint_buf;
  3212. checkpoint_lsn = mach_read_from_8(buf + LOG_CHECKPOINT_LSN);
  3213. checkpoint_no = mach_read_from_8(buf + LOG_CHECKPOINT_NO);
  3214. /* Start reading the log groups from the checkpoint lsn up. The
  3215. variable contiguous_lsn contains an lsn up to which the log is
  3216. known to be contiguously written to all log groups. */
  3217. recv_sys->mlog_checkpoint_lsn = 0;
  3218. ut_ad(RECV_SCAN_SIZE <= log_sys->buf_size);
  3219. const lsn_t end_lsn = mach_read_from_8(
  3220. buf + LOG_CHECKPOINT_END_LSN);
  3221. ut_ad(recv_sys->n_addrs == 0);
  3222. contiguous_lsn = checkpoint_lsn;
  3223. switch (log_sys->log.format) {
  3224. case 0:
  3225. log_mutex_exit();
  3226. return recv_log_format_0_recover(checkpoint_lsn,
  3227. buf[20 + 32 * 9] == 2);
  3228. default:
  3229. if (end_lsn == 0) {
  3230. break;
  3231. }
  3232. if (end_lsn >= checkpoint_lsn) {
  3233. contiguous_lsn = end_lsn;
  3234. break;
  3235. }
  3236. recv_sys->found_corrupt_log = true;
  3237. log_mutex_exit();
  3238. return(DB_ERROR);
  3239. }
  3240. /* Look for MLOG_CHECKPOINT. */
  3241. log_group_t* group = &log_sys->log;
  3242. recv_group_scan_log_recs(group, checkpoint_lsn, &contiguous_lsn,
  3243. false);
  3244. /* The first scan should not have stored or applied any records. */
  3245. ut_ad(recv_sys->n_addrs == 0);
  3246. ut_ad(!recv_sys->found_corrupt_fs);
  3247. if (srv_read_only_mode && recv_needed_recovery) {
  3248. log_mutex_exit();
  3249. return(DB_READ_ONLY);
  3250. }
  3251. if (recv_sys->found_corrupt_log && !srv_force_recovery) {
  3252. log_mutex_exit();
  3253. ib::warn() << "Log scan aborted at LSN " << contiguous_lsn;
  3254. return(DB_ERROR);
  3255. }
  3256. if (recv_sys->mlog_checkpoint_lsn == 0) {
  3257. lsn_t scan_lsn = group->scanned_lsn;
  3258. if (!srv_read_only_mode && scan_lsn != checkpoint_lsn) {
  3259. log_mutex_exit();
  3260. ib::error err;
  3261. err << "Missing MLOG_CHECKPOINT";
  3262. if (end_lsn) {
  3263. err << " at " << end_lsn;
  3264. }
  3265. err << " between the checkpoint " << checkpoint_lsn
  3266. << " and the end " << scan_lsn << ".";
  3267. return(DB_ERROR);
  3268. }
  3269. group->scanned_lsn = checkpoint_lsn;
  3270. rescan = false;
  3271. } else {
  3272. contiguous_lsn = checkpoint_lsn;
  3273. rescan = recv_group_scan_log_recs(
  3274. group, checkpoint_lsn, &contiguous_lsn, false);
  3275. if ((recv_sys->found_corrupt_log && !srv_force_recovery)
  3276. || recv_sys->found_corrupt_fs) {
  3277. log_mutex_exit();
  3278. return(DB_ERROR);
  3279. }
  3280. }
  3281. /* NOTE: we always perform a 'recovery' pass at startup, but we
  3282. print a message to the user about recovery only if something
  3283. is wrong: */
  3284. if (flush_lsn == checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT
  3285. && recv_sys->mlog_checkpoint_lsn == checkpoint_lsn) {
  3286. /* The redo log is logically empty. */
  3287. } else if (checkpoint_lsn != flush_lsn) {
  3288. ut_ad(!srv_log_files_created);
  3289. if (checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT < flush_lsn) {
  3290. ib::warn() << "Are you sure you are using the"
  3291. " right ib_logfiles to start up the database?"
  3292. " Log sequence number in the ib_logfiles is "
  3293. << checkpoint_lsn << ", less than the"
  3294. " log sequence number in the first system"
  3295. " tablespace file header, " << flush_lsn << ".";
  3296. }
  3297. if (!recv_needed_recovery) {
  3298. ib::info() << "The log sequence number " << flush_lsn
  3299. << " in the system tablespace does not match"
  3300. " the log sequence number " << checkpoint_lsn
  3301. << " in the ib_logfiles!";
  3302. if (srv_read_only_mode) {
  3303. ib::error() << "innodb_read_only"
  3304. " prevents crash recovery";
  3305. log_mutex_exit();
  3306. return(DB_READ_ONLY);
  3307. }
  3308. recv_needed_recovery = true;
  3309. }
  3310. }
  3311. log_sys->lsn = recv_sys->recovered_lsn;
  3312. if (recv_needed_recovery) {
  3313. bool missing_tablespace = false;
  3314. err = recv_init_crash_recovery_spaces(
  3315. rescan, missing_tablespace);
  3316. if (err != DB_SUCCESS) {
  3317. log_mutex_exit();
  3318. return(err);
  3319. }
  3320. /* If any tablespace is missing and a rescan is needed,
  3321. then it is possible that the hash table does not contain
  3322. the redo log records of all tablespace IDs. Rescan the remaining,
  3323. not yet stored redo log to validate the missing tablespaces. */
  3324. ut_ad(rescan || !missing_tablespace);
  3325. while (missing_tablespace) {
  3326. DBUG_PRINT("ib_log", ("Rescan of redo log to validate "
  3327. "the missing tablespace. Scan "
  3328. "from last stored LSN " LSN_PF,
  3329. recv_sys->last_stored_lsn));
  3330. lsn_t recent_stored_lsn = recv_sys->last_stored_lsn;
  3331. rescan = recv_group_scan_log_recs(
  3332. group, checkpoint_lsn,
  3333. &recent_stored_lsn, false);
  3334. ut_ad(!recv_sys->found_corrupt_fs);
  3335. missing_tablespace = false;
  3336. err = recv_sys->found_corrupt_log
  3337. ? DB_ERROR
  3338. : recv_validate_tablespace(
  3339. rescan, missing_tablespace);
  3340. if (err != DB_SUCCESS) {
  3341. log_mutex_exit();
  3342. return err;
  3343. }
  3344. rescan = true;
  3345. }
  3346. if (srv_operation == SRV_OPERATION_NORMAL) {
  3347. buf_dblwr_process();
  3348. }
  3349. ut_ad(srv_force_recovery <= SRV_FORCE_NO_UNDO_LOG_SCAN);
  3350. /* Spawn the background thread to flush dirty pages
  3351. from the buffer pools. */
  3352. recv_writer_thread_active = true;
  3353. os_thread_create(recv_writer_thread, 0, 0);
  3354. if (rescan) {
  3355. contiguous_lsn = checkpoint_lsn;
  3356. recv_group_scan_log_recs(group, checkpoint_lsn,
  3357. &contiguous_lsn, true);
  3358. if ((recv_sys->found_corrupt_log
  3359. && !srv_force_recovery)
  3360. || recv_sys->found_corrupt_fs) {
  3361. log_mutex_exit();
  3362. return(DB_ERROR);
  3363. }
  3364. }
  3365. } else {
  3366. ut_ad(!rescan || recv_sys->n_addrs == 0);
  3367. }
  3368. /* We currently have only one log group */
  3369. if (group->scanned_lsn < checkpoint_lsn
  3370. || group->scanned_lsn < recv_max_page_lsn) {
  3371. ib::error() << "We scanned the log up to " << group->scanned_lsn
  3372. << ". A checkpoint was at " << checkpoint_lsn << " and"
  3373. " the maximum LSN on a database page was "
  3374. << recv_max_page_lsn << ". It is possible that the"
  3375. " database is now corrupt!";
  3376. }
  3377. if (recv_sys->recovered_lsn < checkpoint_lsn) {
  3378. log_mutex_exit();
  3379. ib::error() << "Recovered only to lsn:"
  3380. << recv_sys->recovered_lsn << " checkpoint_lsn: " << checkpoint_lsn;
  3381. return(DB_ERROR);
  3382. }
  3383. /* Synchronize the uncorrupted log groups to the most up-to-date log
  3384. group; we also copy checkpoint info to groups */
  3385. log_sys->next_checkpoint_lsn = checkpoint_lsn;
  3386. log_sys->next_checkpoint_no = checkpoint_no + 1;
  3387. recv_synchronize_groups();
  3388. if (!recv_needed_recovery) {
  3389. ut_a(checkpoint_lsn == recv_sys->recovered_lsn);
  3390. } else {
  3391. srv_start_lsn = recv_sys->recovered_lsn;
  3392. }
  3393. log_sys->buf_free = (ulint) log_sys->lsn % OS_FILE_LOG_BLOCK_SIZE;
  3394. log_sys->buf_next_to_write = log_sys->buf_free;
  3395. log_sys->write_lsn = log_sys->lsn;
  3396. log_sys->last_checkpoint_lsn = checkpoint_lsn;
  3397. if (!srv_read_only_mode && srv_operation == SRV_OPERATION_NORMAL) {
  3398. /* Write a MLOG_CHECKPOINT marker as the first thing,
  3399. before generating any other redo log. This ensures
  3400. that subsequent crash recovery will be possible even
  3401. if the server were killed soon after this. */
  3402. fil_names_clear(log_sys->last_checkpoint_lsn, true);
  3403. }
  3404. MONITOR_SET(MONITOR_LSN_CHECKPOINT_AGE,
  3405. log_sys->lsn - log_sys->last_checkpoint_lsn);
  3406. log_sys->next_checkpoint_no = ++checkpoint_no;
  3407. mutex_enter(&recv_sys->mutex);
  3408. recv_sys->apply_log_recs = TRUE;
  3409. mutex_exit(&recv_sys->mutex);
  3410. log_mutex_exit();
  3411. recv_lsn_checks_on = true;
  3412. /* The database is now ready to start almost normal processing of user
  3413. transactions: transaction rollbacks and the application of the log
  3414. records in the hash table can be run in background. */
  3415. return(DB_SUCCESS);
  3416. }
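The comparison of `flush_lsn` against `checkpoint_lsn` in the function above can be read as a three-way classification. The following is a minimal sketch under stated assumptions: `LogState`, `classify_log_state` and `SIZE_OF_MLOG_CHECKPOINT_SKETCH` are hypothetical names, and the value 9 (one type byte plus an 8-byte LSN) is an assumption, not taken from this file.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t lsn_t;

// Assumed size of an MLOG_CHECKPOINT record: 1 type byte + 8-byte LSN.
const lsn_t SIZE_OF_MLOG_CHECKPOINT_SKETCH = 9;

enum class LogState {
	LOGICALLY_EMPTY,	// only the MLOG_CHECKPOINT follows the checkpoint
	IN_SYNC,		// checkpoint LSN equals the flushed LSN
	MISMATCH		// the LSNs differ: recovery would be flagged
};

// Mirrors the if / else-if chain in the startup routine: the
// "logically empty" test is made first, and any other difference
// between the two LSNs counts as a mismatch.
LogState classify_log_state(
	lsn_t	flush_lsn,
	lsn_t	checkpoint_lsn,
	lsn_t	mlog_checkpoint_lsn)
{
	if (flush_lsn == checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT_SKETCH
	    && mlog_checkpoint_lsn == checkpoint_lsn) {
		return LogState::LOGICALLY_EMPTY;
	}

	if (checkpoint_lsn != flush_lsn) {
		return LogState::MISMATCH;
	}

	return LogState::IN_SYNC;
}
```

The order of the tests matters: a log that ends right after its MLOG_CHECKPOINT record has differing LSNs, yet it is treated as empty and not as a mismatch.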
  3417. /** Complete recovery from a checkpoint. */
  3418. void
  3419. recv_recovery_from_checkpoint_finish(void)
  3420. {
  3421. /* Make sure that the recv_writer thread is done. This is
  3422. required because it grabs various mutexes and we want to
  3423. ensure that when we enable sync_order_checks there is no
  3424. mutex currently held by any thread. */
  3425. mutex_enter(&recv_sys->writer_mutex);
  3426. /* Free the resources of the recovery system */
  3427. recv_recovery_on = false;
  3428. /* By acquiring the mutex we ensure that the recv_writer thread
  3429. won't trigger any more LRU batches. Now wait for currently
  3430. in progress batches to finish. */
  3431. buf_flush_wait_LRU_batch_end();
  3432. mutex_exit(&recv_sys->writer_mutex);
  3433. ulint count = 0;
  3434. while (recv_writer_thread_active) {
  3435. ++count;
  3436. os_thread_sleep(100000);
  3437. if (srv_print_verbose_log && count > 600) {
  3438. ib::info() << "Waiting for recv_writer to"
  3439. " finish flushing of buffer pool";
  3440. count = 0;
  3441. }
  3442. }
  3443. recv_sys_debug_free();
  3444. /* Free up the flush_rbt. */
  3445. buf_flush_free_flush_rbt();
  3446. }
  3447. /********************************************************//**
  3448. Initiates the rollback of active transactions. */
  3449. void
  3450. recv_recovery_rollback_active(void)
  3451. /*===============================*/
  3452. {
  3453. ut_ad(!recv_writer_thread_active);
  3454. /* Switch latching order checks on in sync0debug.cc, if
  3455. --innodb-sync-debug=true (default) */
  3456. ut_d(sync_check_enable());
  3457. /* We can't start any (DDL) transactions if UNDO logging
  3458. has been disabled. In that case, also disable ROLLBACK of
  3459. recovered user transactions. */
  3460. if (srv_force_recovery < SRV_FORCE_NO_TRX_UNDO
  3461. && !srv_read_only_mode) {
  3462. /* Drop partially created indexes. */
  3463. row_merge_drop_temp_indexes();
  3464. /* Drop garbage tables. */
  3465. if (srv_safe_truncate)
  3466. row_mysql_drop_garbage_tables();
  3467. /* Drop any auxiliary tables that were not dropped when the
  3468. parent table was dropped. This can happen if the parent table
  3469. was dropped but the server crashed before the auxiliary tables
  3470. were dropped. */
  3471. fts_drop_orphaned_tables();
  3472. /* Roll back the uncommitted transactions which have no user
  3473. session */
  3474. trx_rollback_or_clean_is_active = true;
  3475. os_thread_create(trx_rollback_or_clean_all_recovered, 0, 0);
  3476. }
  3477. }
  3478. /** Find a doublewrite copy of a page.
  3479. @param[in] space_id tablespace identifier
  3480. @param[in] page_no page number
  3481. @return page frame
  3482. @retval NULL if no page was found */
  3483. const byte*
  3484. recv_dblwr_t::find_page(ulint space_id, ulint page_no)
  3485. {
  3486. const byte *result= NULL;
  3487. lsn_t max_lsn= 0;
  3488. for (list::const_iterator i = pages.begin(); i != pages.end(); ++i)
  3489. {
  3490. const byte *page= *i;
  3491. if (page_get_page_no(page) != page_no ||
  3492. page_get_space_id(page) != space_id)
  3493. continue;
  3494. const lsn_t lsn= mach_read_from_8(page + FIL_PAGE_LSN);
  3495. if (lsn <= max_lsn)
  3496. continue;
  3497. max_lsn= lsn;
  3498. result= page;
  3499. }
  3500. return result;
  3501. }
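The lookup above keeps the copy with the highest LSN among all doublewrite copies of one page. A standalone sketch of the same selection rule follows; `PageCopy` and `find_newest_copy` are hypothetical names for illustration, and the struct fields stand in for the values that the real code reads from the page frame.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a doublewrite page copy.
struct PageCopy {
	std::uint32_t	space_id;
	std::uint32_t	page_no;
	std::uint64_t	lsn;
};

// Returns the matching copy with the greatest LSN, or nullptr if no
// copy matches. As in the loop above, max_lsn starts at 0 and the
// comparison is "<=", so a copy whose LSN is 0 is never returned and
// the first of several equal-LSN copies wins.
const PageCopy* find_newest_copy(
	const std::vector<PageCopy>&	pages,
	std::uint32_t			space_id,
	std::uint32_t			page_no)
{
	const PageCopy*	result = nullptr;
	std::uint64_t	max_lsn = 0;

	for (const PageCopy& page : pages) {
		if (page.page_no != page_no || page.space_id != space_id) {
			continue;
		}
		if (page.lsn <= max_lsn) {
			continue;
		}
		max_lsn = page.lsn;
		result = &page;
	}

	return result;
}
```

The returned pointer refers to an element of the vector passed in, so it remains valid only as long as that vector is not modified.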
  3502. #ifndef DBUG_OFF
  3503. /** Return string name of the redo log record type.
  3504. @param[in] type redo log record type
  3505. @return string name of the redo log record type */
  3506. static const char* get_mlog_string(mlog_id_t type)
  3507. {
  3508. switch (type) {
  3509. case MLOG_SINGLE_REC_FLAG:
  3510. return("MLOG_SINGLE_REC_FLAG");
  3511. case MLOG_1BYTE:
  3512. return("MLOG_1BYTE");
  3513. case MLOG_2BYTES:
  3514. return("MLOG_2BYTES");
  3515. case MLOG_4BYTES:
  3516. return("MLOG_4BYTES");
  3517. case MLOG_8BYTES:
  3518. return("MLOG_8BYTES");
  3519. case MLOG_REC_INSERT:
  3520. return("MLOG_REC_INSERT");
  3521. case MLOG_REC_CLUST_DELETE_MARK:
  3522. return("MLOG_REC_CLUST_DELETE_MARK");
  3523. case MLOG_REC_SEC_DELETE_MARK:
  3524. return("MLOG_REC_SEC_DELETE_MARK");
  3525. case MLOG_REC_UPDATE_IN_PLACE:
  3526. return("MLOG_REC_UPDATE_IN_PLACE");
  3527. case MLOG_REC_DELETE:
  3528. return("MLOG_REC_DELETE");
  3529. case MLOG_LIST_END_DELETE:
  3530. return("MLOG_LIST_END_DELETE");
  3531. case MLOG_LIST_START_DELETE:
  3532. return("MLOG_LIST_START_DELETE");
  3533. case MLOG_LIST_END_COPY_CREATED:
  3534. return("MLOG_LIST_END_COPY_CREATED");
  3535. case MLOG_PAGE_REORGANIZE:
  3536. return("MLOG_PAGE_REORGANIZE");
  3537. case MLOG_PAGE_CREATE:
  3538. return("MLOG_PAGE_CREATE");
  3539. case MLOG_UNDO_INSERT:
  3540. return("MLOG_UNDO_INSERT");
  3541. case MLOG_UNDO_ERASE_END:
  3542. return("MLOG_UNDO_ERASE_END");
  3543. case MLOG_UNDO_INIT:
  3544. return("MLOG_UNDO_INIT");
  3545. case MLOG_UNDO_HDR_REUSE:
  3546. return("MLOG_UNDO_HDR_REUSE");
  3547. case MLOG_UNDO_HDR_CREATE:
  3548. return("MLOG_UNDO_HDR_CREATE");
  3549. case MLOG_REC_MIN_MARK:
  3550. return("MLOG_REC_MIN_MARK");
  3551. case MLOG_IBUF_BITMAP_INIT:
  3552. return("MLOG_IBUF_BITMAP_INIT");
  3553. #ifdef UNIV_LOG_LSN_DEBUG
  3554. case MLOG_LSN:
  3555. return("MLOG_LSN");
  3556. #endif /* UNIV_LOG_LSN_DEBUG */
  3557. case MLOG_WRITE_STRING:
  3558. return("MLOG_WRITE_STRING");
  3559. case MLOG_MULTI_REC_END:
  3560. return("MLOG_MULTI_REC_END");
  3561. case MLOG_DUMMY_RECORD:
  3562. return("MLOG_DUMMY_RECORD");
  3563. case MLOG_FILE_DELETE:
  3564. return("MLOG_FILE_DELETE");
  3565. case MLOG_COMP_REC_MIN_MARK:
  3566. return("MLOG_COMP_REC_MIN_MARK");
  3567. case MLOG_COMP_PAGE_CREATE:
  3568. return("MLOG_COMP_PAGE_CREATE");
  3569. case MLOG_COMP_REC_INSERT:
  3570. return("MLOG_COMP_REC_INSERT");
  3571. case MLOG_COMP_REC_CLUST_DELETE_MARK:
  3572. return("MLOG_COMP_REC_CLUST_DELETE_MARK");
  3573. case MLOG_COMP_REC_UPDATE_IN_PLACE:
  3574. return("MLOG_COMP_REC_UPDATE_IN_PLACE");
  3575. case MLOG_COMP_REC_DELETE:
  3576. return("MLOG_COMP_REC_DELETE");
  3577. case MLOG_COMP_LIST_END_DELETE:
  3578. return("MLOG_COMP_LIST_END_DELETE");
  3579. case MLOG_COMP_LIST_START_DELETE:
  3580. return("MLOG_COMP_LIST_START_DELETE");
  3581. case MLOG_COMP_LIST_END_COPY_CREATED:
  3582. return("MLOG_COMP_LIST_END_COPY_CREATED");
  3583. case MLOG_COMP_PAGE_REORGANIZE:
  3584. return("MLOG_COMP_PAGE_REORGANIZE");
  3585. case MLOG_FILE_CREATE2:
  3586. return("MLOG_FILE_CREATE2");
  3587. case MLOG_ZIP_WRITE_NODE_PTR:
  3588. return("MLOG_ZIP_WRITE_NODE_PTR");
  3589. case MLOG_ZIP_WRITE_BLOB_PTR:
  3590. return("MLOG_ZIP_WRITE_BLOB_PTR");
  3591. case MLOG_ZIP_WRITE_HEADER:
  3592. return("MLOG_ZIP_WRITE_HEADER");
  3593. case MLOG_ZIP_PAGE_COMPRESS:
  3594. return("MLOG_ZIP_PAGE_COMPRESS");
  3595. case MLOG_ZIP_PAGE_COMPRESS_NO_DATA:
  3596. return("MLOG_ZIP_PAGE_COMPRESS_NO_DATA");
  3597. case MLOG_ZIP_PAGE_REORGANIZE:
  3598. return("MLOG_ZIP_PAGE_REORGANIZE");
  3599. case MLOG_FILE_RENAME2:
  3600. return("MLOG_FILE_RENAME2");
  3601. case MLOG_FILE_NAME:
  3602. return("MLOG_FILE_NAME");
  3603. case MLOG_CHECKPOINT:
  3604. return("MLOG_CHECKPOINT");
  3605. case MLOG_PAGE_CREATE_RTREE:
  3606. return("MLOG_PAGE_CREATE_RTREE");
  3607. case MLOG_COMP_PAGE_CREATE_RTREE:
  3608. return("MLOG_COMP_PAGE_CREATE_RTREE");
  3609. case MLOG_INIT_FILE_PAGE2:
  3610. return("MLOG_INIT_FILE_PAGE2");
  3611. case MLOG_INDEX_LOAD:
  3612. return("MLOG_INDEX_LOAD");
  3613. case MLOG_TRUNCATE:
  3614. return("MLOG_TRUNCATE");
  3615. case MLOG_FILE_WRITE_CRYPT_DATA:
  3616. return("MLOG_FILE_WRITE_CRYPT_DATA");
  3617. }
  3618. DBUG_ASSERT(0);
  3619. return(NULL);
  3620. }
  3621. #endif /* !DBUG_OFF */