/*****************************************************************************

Copyright (c) 1997, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2017, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**************************************************//**
@file include/log0recv.h
Recovery

Created 9/20/1997 Heikki Tuuri
*******************************************************/

#ifndef log0recv_h
#define log0recv_h

#include "univ.i"
#include "ut0byte.h"
#include "buf0types.h"
#include "hash0hash.h"
#include "log0log.h"
#include <list>

/******************************************************//**
Checks the 4-byte checksum against the trailer checksum field of a log
block. We also accept a log block in the old format before
InnoDB-3.23.52 where the checksum field contains the log block number.
@return TRUE if ok, or if the log block may be in the format of InnoDB
version predating 3.23.52 */
UNIV_INTERN
ibool
log_block_checksum_is_ok_or_old_format(
/*===================================*/
	const byte*	block,		/*!< in: pointer to a log block */
	bool		print_err);	/*!< in: whether to print an error */

/*******************************************************//**
Calculates the new value for lsn when more data is added to the log. */
UNIV_INTERN
ib_uint64_t
recv_calc_lsn_on_data_add(
/*======================*/
	lsn_t		lsn,	/*!< in: old lsn */
	ib_uint64_t	len);	/*!< in: this many bytes of data is
				added, log block headers not included */

#ifdef UNIV_HOTBACKUP
extern ibool	recv_replay_file_ops;

/*******************************************************************//**
Reads the checkpoint info needed in hot backup.
@return TRUE if success */
UNIV_INTERN
ibool
recv_read_checkpoint_info_for_backup(
/*=================================*/
	const byte*	hdr,	/*!< in: buffer containing the log group
				header */
	lsn_t*		lsn,	/*!< out: checkpoint lsn */
	lsn_t*		offset,	/*!< out: checkpoint offset in the log group */
	lsn_t*		cp_no,	/*!< out: checkpoint number */
	lsn_t*		first_header_lsn)
				/*!< out: lsn of the start of the
				first log file */
	MY_ATTRIBUTE((nonnull));
/*******************************************************************//**
Scans the log segment and n_bytes_scanned is set to the length of valid
log scanned. */
UNIV_INTERN
void
recv_scan_log_seg_for_backup(
/*=========================*/
	byte*		buf,		/*!< in: buffer containing log data */
	ulint		buf_len,	/*!< in: data length in that buffer */
	lsn_t*		scanned_lsn,	/*!< in/out: lsn of buffer start,
					we return scanned lsn */
	ulint*		scanned_checkpoint_no,
					/*!< in/out: 4 lowest bytes of the
					highest scanned checkpoint number so
					far */
	ulint*		n_bytes_scanned);/*!< out: how much we were able to
					scan, smaller than buf_len if log
					data ended here */
#endif /* UNIV_HOTBACKUP */

/*******************************************************************//**
Returns TRUE if recovery is currently running.
@return recv_recovery_on */
UNIV_INLINE
ibool
recv_recovery_is_on(void);
/*=====================*/

/************************************************************************//**
Applies the hashed log records to the page, if the page lsn is less than the
lsn of a log record. This can be called when a buffer page has just been
read in, or also for a page already in the buffer pool. */
UNIV_INTERN
void
recv_recover_page_func(
/*===================*/
#ifndef UNIV_HOTBACKUP
	ibool		just_read_in,
				/*!< in: TRUE if the i/o handler calls
				this for a freshly read page */
#endif /* !UNIV_HOTBACKUP */
	buf_block_t*	block);	/*!< in/out: buffer block */

  106. #ifndef UNIV_HOTBACKUP
  107. /** Wrapper for recv_recover_page_func().
  108. Applies the hashed log records to the page, if the page lsn is less than the
  109. lsn of a log record. This can be called when a buffer page has just been
  110. read in, or also for a page already in the buffer pool.
  111. @param jri in: TRUE if just read in (the i/o handler calls this for
  112. a freshly read page)
  113. @param block in/out: the buffer block
  114. */
  115. # define recv_recover_page(jri, block) recv_recover_page_func(jri, block)
  116. #else /* !UNIV_HOTBACKUP */
  117. /** Wrapper for recv_recover_page_func().
  118. Applies the hashed log records to the page, if the page lsn is less than the
  119. lsn of a log record. This can be called when a buffer page has just been
  120. read in, or also for a page already in the buffer pool.
  121. @param jri ignored: TRUE if just read in (the i/o handler calls this for
  122. a freshly read page)
  123. @param block in/out: the buffer block
  124. */
  125. # define recv_recover_page(jri, block) recv_recover_page_func(block)
  126. #endif /* !UNIV_HOTBACKUP */
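/* Illustration only, not part of this header: a minimal sketch of the
wrapper-macro pattern used by recv_recover_page() above. The call site is
spelled the same way in both build flavors, and the macro drops the argument
that the function does not take in one of them. DEMO_HOTBACKUP,
demo_recover_page_func, demo_recover_page and demo_call_site are hypothetical
stand-ins, and the function bodies are placeholders. */

```cpp
// Hypothetical stand-in for UNIV_HOTBACKUP; uncomment to try the other flavor.
// #define DEMO_HOTBACKUP

#ifndef DEMO_HOTBACKUP
// Flavor with the extra flag: the function receives both arguments.
static int demo_recover_page_func(bool just_read_in, int block)
{
    return just_read_in ? block + 1 : block;
}
# define demo_recover_page(jri, block) demo_recover_page_func(jri, block)
#else
// Flavor without the flag: the function only takes the block.
static int demo_recover_page_func(int block)
{
    return block;
}
# define demo_recover_page(jri, block) demo_recover_page_func(block)
#endif

// The call site is spelled the same way in both flavors.
static int demo_call_site()
{
    return demo_recover_page(true, 41);
}
```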
  127. /** Recovers from a checkpoint. When this function returns, the database is able
  128. to start processing of new user transactions, but the function
  129. recv_recovery_from_checkpoint_finish should be called later to complete
  130. the recovery and free the resources used in it.
  131. @param[in] type LOG_CHECKPOINT or LOG_ARCHIVE
  132. @param[in] limit_lsn recover up to this lsn if possible
  133. @param[in] flushed_lsn flushed lsn from first data file
  134. @return error code or DB_SUCCESS */
  135. UNIV_INTERN
  136. dberr_t
  137. recv_recovery_from_checkpoint_start_func(
  138. #ifdef UNIV_LOG_ARCHIVE
  139. ulint type,
  140. lsn_t limit_lsn,
  141. #endif /* UNIV_LOG_ARCHIVE */
  142. lsn_t flushed_lsn)
  143. MY_ATTRIBUTE((warn_unused_result));
  144. #ifdef UNIV_LOG_ARCHIVE
  145. /** Wrapper for recv_recovery_from_checkpoint_start_func().
  146. Recovers from a checkpoint. When this function returns, the database is able
  147. to start processing of new user transactions, but the function
  148. recv_recovery_from_checkpoint_finish should be called later to complete
  149. the recovery and free the resources used in it.
  150. @param type in: LOG_CHECKPOINT or LOG_ARCHIVE
  151. @param lim in: recover up to this log sequence number if possible
  152. @param lsn in: flushed log sequence number from first data file
  153. @return error code or DB_SUCCESS */
  154. # define recv_recovery_from_checkpoint_start(type,lim,lsn) \
  155. recv_recovery_from_checkpoint_start_func(type,lim,lsn)
  156. #else /* UNIV_LOG_ARCHIVE */
  157. /** Wrapper for recv_recovery_from_checkpoint_start_func().
  158. Recovers from a checkpoint. When this function returns, the database is able
  159. to start processing of new user transactions, but the function
  160. recv_recovery_from_checkpoint_finish should be called later to complete
  161. the recovery and free the resources used in it.
  162. @param type ignored: LOG_CHECKPOINT or LOG_ARCHIVE
  163. @param lim ignored: recover up to this log sequence number if possible
  164. @param lsn in: flushed log sequence number from first data file
  165. @return error code or DB_SUCCESS */
  166. # define recv_recovery_from_checkpoint_start(type,lim,lsn) \
  167. recv_recovery_from_checkpoint_start_func(lsn)
  168. #endif /* UNIV_LOG_ARCHIVE */
  169. /********************************************************//**
  170. Completes recovery from a checkpoint. */
  171. UNIV_INTERN
  172. void
  173. recv_recovery_from_checkpoint_finish(void);
  174. /*======================================*/
  175. /********************************************************//**
  176. Initiates the rollback of active transactions. */
  177. UNIV_INTERN
  178. void
  179. recv_recovery_rollback_active(void);
  180. /*===============================*/
  181. /*******************************************************************//**
  182. Tries to parse a single log record and returns its length.
  183. @return length of the record, or 0 if the record was not complete */
  184. UNIV_INTERN
  185. ulint
  186. recv_parse_log_rec(
  187. /*===============*/
  188. byte* ptr, /*!< in: pointer to a buffer */
  189. byte* end_ptr,/*!< in: pointer to the buffer end */
  190. byte* type, /*!< out: type */
  191. ulint* space, /*!< out: space id */
  192. ulint* page_no,/*!< out: page number */
  193. byte** body); /*!< out: log record body start */
  194. /*******************************************************//**
  195. Scans log from a buffer and stores new log data to the parsing buffer.
  196. Parses and hashes the log records if new data found. Unless
  197. UNIV_HOTBACKUP is defined, this function will apply log records
  198. automatically when the hash table becomes full.
  199. @return TRUE if limit_lsn has been reached, or not able to scan any
  200. more in this log group */
  201. UNIV_INTERN
  202. ibool
  203. recv_scan_log_recs(
  204. /*===============*/
  205. ulint available_memory,/*!< in: we let the hash table of recs
  206. grow to at most this size */
  207. ibool store_to_hash, /*!< in: TRUE if the records should be
  208. stored to the hash table; this is set
  209. to FALSE if just debug checking is
  210. needed */
  211. const byte* buf, /*!< in: buffer containing a log
  212. segment or garbage */
  213. ulint len, /*!< in: buffer length */
  214. lsn_t start_lsn, /*!< in: buffer start lsn */
  215. lsn_t* contiguous_lsn, /*!< in/out: it is known that all log
  216. groups contain contiguous log data up
  217. to this lsn */
  218. lsn_t* group_scanned_lsn);/*!< out: scanning succeeded up to
  219. this lsn */
  220. /******************************************************//**
  221. Resets the logs. The contents of log files will be lost! */
  222. UNIV_INTERN
  223. void
  224. recv_reset_logs(
  225. /*============*/
  226. #ifdef UNIV_LOG_ARCHIVE
  227. ulint arch_log_no, /*!< in: next archived log file number */
  228. ibool new_logs_created,/*!< in: TRUE if resetting logs
  229. is done at the log creation;
  230. FALSE if it is done after
  231. archive recovery */
  232. #endif /* UNIV_LOG_ARCHIVE */
  233. lsn_t lsn); /*!< in: reset to this lsn
  234. rounded up to be divisible by
  235. OS_FILE_LOG_BLOCK_SIZE, after
  236. which we add
  237. LOG_BLOCK_HDR_SIZE */
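/* Illustration only, not part of this header: a minimal sketch of the lsn
adjustment described for the lsn parameter of recv_reset_logs() above (round
up to a multiple of the log block size, then add the block header size).
demo_lsn_t, DEMO_LOG_BLOCK_SIZE, DEMO_LOG_BLOCK_HDR_SIZE and demo_reset_lsn
are hypothetical names, and the values 512 and 12 are assumptions standing in
for OS_FILE_LOG_BLOCK_SIZE and LOG_BLOCK_HDR_SIZE, which are defined
elsewhere. */

```cpp
#include <cstdint>

typedef std::uint64_t demo_lsn_t;  // stand-in for lsn_t

// Assumed values; check the real definitions before relying on them.
static const demo_lsn_t DEMO_LOG_BLOCK_SIZE = 512;     // OS_FILE_LOG_BLOCK_SIZE
static const demo_lsn_t DEMO_LOG_BLOCK_HDR_SIZE = 12;  // LOG_BLOCK_HDR_SIZE

// Round lsn up to the next block boundary, then skip the block header.
static demo_lsn_t demo_reset_lsn(demo_lsn_t lsn)
{
    demo_lsn_t aligned = (lsn + DEMO_LOG_BLOCK_SIZE - 1)
        / DEMO_LOG_BLOCK_SIZE * DEMO_LOG_BLOCK_SIZE;

    return aligned + DEMO_LOG_BLOCK_HDR_SIZE;
}
```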
  238. #ifdef UNIV_HOTBACKUP
  239. /******************************************************//**
  240. Creates new log files after a backup has been restored. */
  241. UNIV_INTERN
  242. void
  243. recv_reset_log_files_for_backup(
  244. /*============================*/
  245. const char* log_dir, /*!< in: log file directory path */
  246. ulint n_log_files, /*!< in: number of log files */
  247. lsn_t log_file_size, /*!< in: log file size */
  248. lsn_t lsn); /*!< in: new start lsn, must be
  249. divisible by OS_FILE_LOG_BLOCK_SIZE */
  250. #endif /* UNIV_HOTBACKUP */
  251. /********************************************************//**
  252. Creates the recovery system. */
  253. UNIV_INTERN
  254. void
  255. recv_sys_create(void);
  256. /*=================*/
  257. /**********************************************************//**
  258. Release recovery system mutexes. */
  259. UNIV_INTERN
  260. void
  261. recv_sys_close(void);
  262. /*================*/
  263. /********************************************************//**
  264. Frees the recovery system memory. */
  265. UNIV_INTERN
  266. void
  267. recv_sys_mem_free(void);
  268. /*===================*/
  269. /********************************************************//**
  270. Inits the recovery system for a recovery operation. */
  271. UNIV_INTERN
  272. void
  273. recv_sys_init(
  274. /*==========*/
  275. ulint available_memory); /*!< in: available memory in bytes */
  276. #ifndef UNIV_HOTBACKUP
  277. /********************************************************//**
  278. Reset the state of the recovery system variables. */
  279. UNIV_INTERN
  280. void
  281. recv_sys_var_init(void);
  282. /*===================*/
  283. #endif /* !UNIV_HOTBACKUP */
  284. /** Apply the hash table of stored log records to persistent data pages.
  285. @param[in] last_batch whether the change buffer merge will be
  286. performed as part of the operation */
  287. UNIV_INTERN
  288. void
  289. recv_apply_hashed_log_recs(bool last_batch);
  290. #ifdef UNIV_HOTBACKUP
  291. /*******************************************************************//**
  292. Applies log records in the hash table to a backup. */
  293. UNIV_INTERN
  294. void
  295. recv_apply_log_recs_for_backup(void);
  296. /*================================*/
  297. #endif /* UNIV_HOTBACKUP */
  298. /** Block of log record data */
  299. struct recv_data_t{
  300. recv_data_t* next; /*!< pointer to the next block or NULL */
  301. /*!< the log record data is stored physically
  302. immediately after this struct, max amount
  303. RECV_DATA_BLOCK_SIZE bytes of it */
  304. };
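/* Illustration only, not part of this header: a minimal sketch of the layout
described for recv_data_t above, where the payload bytes sit directly after
the struct in the same allocation. demo_data_t, demo_block_payload and
demo_block_create are hypothetical names, and malloc() is used here as an
assumption in place of the memory heap used by the recovery system. */

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct demo_data_t {
    demo_data_t* next;  // pointer to the next block or NULL
    // the payload bytes follow this struct in memory
};

// Return a pointer to the payload stored right after the header.
static unsigned char* demo_block_payload(demo_data_t* block)
{
    return reinterpret_cast<unsigned char*>(block + 1);
}

// Allocate header and payload in one piece and copy the payload in.
// The caller releases the block with std::free().
static demo_data_t* demo_block_create(const unsigned char* src, std::size_t len)
{
    void* mem = std::malloc(sizeof(demo_data_t) + len);

    if (mem == NULL) {
        return NULL;
    }

    demo_data_t* block = static_cast<demo_data_t*>(mem);
    block->next = NULL;
    std::memcpy(demo_block_payload(block), src, len);

    return block;
}
```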
  305. /** Stored log record struct */
  306. struct recv_t{
  307. byte type; /*!< log record type */
  308. ulint len; /*!< log record body length in bytes */
  309. recv_data_t* data; /*!< chain of blocks containing the log record
  310. body */
  311. lsn_t start_lsn;/*!< start lsn of the log segment written by
  312. the mtr which generated this log record: NOTE
  313. that this is not necessarily the start lsn of
  314. this log record */
  315. lsn_t end_lsn;/*!< end lsn of the log segment written by
  316. the mtr which generated this log record: NOTE
  317. that this is not necessarily the end lsn of
  318. this log record */
  319. UT_LIST_NODE_T(recv_t)
  320. rec_list;/*!< list of log records for this page */
  321. };
  322. /** States of recv_addr_t */
  323. enum recv_addr_state {
  324. /** not yet processed */
  325. RECV_NOT_PROCESSED,
  326. /** page is being read */
  327. RECV_BEING_READ,
  328. /** log records are being applied on the page */
  329. RECV_BEING_PROCESSED,
  330. /** log records have been applied on the page, or they have
  331. been discarded because the tablespace does not exist */
  332. RECV_PROCESSED
  333. };
  334. /** Hashed page file address struct */
  335. struct recv_addr_t{
  336. enum recv_addr_state state;
  337. /*!< recovery state of the page */
  338. unsigned space:32;/*!< space id */
  339. unsigned page_no:32;/*!< page number */
  340. UT_LIST_BASE_NODE_T(recv_t)
  341. rec_list;/*!< list of log records for this page */
  342. hash_node_t addr_hash;/*!< hash node in the hash bucket chain */
  343. };
  344. struct recv_dblwr_t {
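  345. void add(byte* page); /*!< add a page to the list */
  346. byte* find_page(ulint space_id, ulint page_no); /*!< look up a page by space id and page number */
  347. std::list<byte *> pages; /*!< pages from the doublewrite buffer */
  348. void operator() () { /* discard all collected pages */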
  349. pages.clear();
  350. }
  351. };
  352. /** Recovery system data structure */
  353. struct recv_sys_t{
  354. #ifndef UNIV_HOTBACKUP
  355. ib_mutex_t mutex; /*!< mutex protecting the fields apply_log_recs,
  356. n_addrs, and the state field in each recv_addr
  357. struct */
  358. ib_mutex_t writer_mutex;/*!< mutex coordinating
  359. flushing between recv_writer_thread and
  360. the recovery thread. */
  361. #endif /* !UNIV_HOTBACKUP */
  362. ibool apply_log_recs;
  363. /*!< this is TRUE when log rec application to
  364. pages is allowed; this flag tells the
  365. i/o-handler if it should do log record
  366. application */
  367. ibool apply_batch_on;
  368. /*!< this is TRUE when a log rec application
  369. batch is running */
  370. lsn_t lsn; /*!< log sequence number */
  371. ulint last_log_buf_size;
  372. /*!< size of the log buffer when the database
  373. last wrote to the log */
  374. byte* last_block;
  375. /*!< possibly incomplete last recovered log
  376. block */
  377. byte* last_block_buf_start;
  378. /*!< the nonaligned start address of the
  379. preceding buffer */
  380. byte* buf; /*!< buffer for parsing log records */
  381. ulint len; /*!< amount of data in buf */
  382. lsn_t parse_start_lsn;
  383. /*!< this is the lsn from which we were able to
  384. start parsing log records and adding them to
  385. the hash table; zero if a suitable
  386. start point not found yet */
  387. lsn_t scanned_lsn;
  388. /*!< the log data has been scanned up to this
  389. lsn */
  390. ulint scanned_checkpoint_no;
  391. /*!< the log data has been scanned up to this
  392. checkpoint number (lowest 4 bytes) */
  393. ulint recovered_offset;
  394. /*!< start offset of non-parsed log records in
  395. buf */
  396. lsn_t recovered_lsn;
  397. /*!< the log records have been parsed up to
  398. this lsn */
  399. lsn_t limit_lsn;/*!< recovery should be made at most
  400. up to this lsn */
  401. ibool found_corrupt_log;
  402. /*!< this is set to TRUE if, during the log
  403. scan, we find a corrupt log block or a corrupt
  404. log record, or if the log parsing buffer
  405. overflows */
  406. /** the time when progress was last reported */
  407. time_t progress_time;
  408. #ifdef UNIV_LOG_ARCHIVE
  409. log_group_t* archive_group;
  410. /*!< in archive recovery: the log group whose
  411. archive is read */
  412. #endif /* UNIV_LOG_ARCHIVE */
  413. mem_heap_t* heap; /*!< memory heap of log records and file
  414. addresses*/
  415. hash_table_t* addr_hash;/*!< hash table of file addresses of pages */
  416. ulint n_addrs;/*!< number of not processed hashed file
  417. addresses in the hash table */
  418. recv_dblwr_t dblwr;
  419. /** Determine whether redo log recovery progress should be reported.
  420. @param[in] time the current time
  421. @return whether progress should be reported
  422. (the last report was at least 15 seconds ago) */
  423. bool report(time_t time)
  424. {
  425. if (time - progress_time < 15) {
  426. return false;
  427. }
  428. progress_time = time;
  429. return true;
  430. }
  431. };
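/* Illustration only, not part of this header: a minimal sketch of how the
15-second throttle in recv_sys_t::report() above behaves when polled
repeatedly. demo_progress_t is a hypothetical cut-down struct that keeps only
the progress_time field and the same comparison. */

```cpp
#include <ctime>

struct demo_progress_t {
    time_t progress_time;  // the time when progress was last reported

    // Return true at most once per 15 seconds and remember when.
    bool report(time_t time)
    {
        if (time - progress_time < 15) {
            return false;
        }

        progress_time = time;
        return true;
    }
};
```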
  432. /** The recovery system */
  433. extern recv_sys_t* recv_sys;
  434. /** TRUE when applying redo log records during crash recovery; FALSE
  435. otherwise. Note that this is FALSE while a background thread is
  436. rolling back incomplete transactions. */
  437. extern ibool recv_recovery_on;
  438. /** If the following is TRUE, the buffer pool file pages must be invalidated
  439. after recovery and no ibuf operations are allowed; this becomes TRUE if
  440. the log record hash table becomes too full, and log records must be merged
  441. to file pages already before the recovery is finished: in this case no
  442. ibuf operations are allowed, as they could modify the pages read in the
  443. buffer pool before the pages have been recovered to the up-to-date state.
  444. TRUE means that recovery is running and no operations on the log files
  445. are allowed yet: the variable name is misleading. */
  446. extern ibool recv_no_ibuf_operations;
  447. /** TRUE when recv_init_crash_recovery() has been called. */
  448. extern ibool recv_needed_recovery;
  449. #ifdef UNIV_DEBUG
  450. /** TRUE if writing to the redo log (mtr_commit) is forbidden.
  451. Protected by log_sys->mutex. */
  452. extern ibool recv_no_log_write;
  453. #endif /* UNIV_DEBUG */
  454. /** TRUE if buf_page_is_corrupted() should check if the log sequence
  455. number (FIL_PAGE_LSN) is in the future. Initially FALSE, and set by
  456. recv_recovery_from_checkpoint_start_func(). */
  457. extern ibool recv_lsn_checks_on;
  458. #ifdef UNIV_HOTBACKUP
  459. /** TRUE when the redo log is being backed up */
  460. extern ibool recv_is_making_a_backup;
  461. #endif /* UNIV_HOTBACKUP */
  462. /** Maximum page number encountered in the redo log */
  463. extern ulint recv_max_parsed_page_no;
  464. /** Size of the parsing buffer; it must accommodate RECV_SCAN_SIZE many
  465. times! */
  466. #define RECV_PARSING_BUF_SIZE (2 * 1024 * 1024)
  467. /** Size of block reads when the log groups are scanned forward to do a
  468. roll-forward */
  469. #define RECV_SCAN_SIZE (4 * UNIV_PAGE_SIZE)
  470. /** This many frames must be left free in the buffer pool when we scan
  471. the log and store the scanned log records in the buffer pool: we will
  472. use these free frames to read in pages when we start applying the
  473. log records to the database. */
  474. extern ulint recv_n_pool_free_frames;
  475. #ifndef UNIV_NONINL
  476. #include "log0recv.ic"
  477. #endif
  478. #endif