
6359 lines
174 KiB

MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0

This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
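The pending-I/O protocol described in MDEV-12602 can be illustrated with a small C++ sketch. Space, SpaceCache, add(), acquire_for_io(), release_for_io() and free_space() are hypothetical names chosen for illustration, not InnoDB's API; a std::mutex stands in for fil_system->mutex, and the 1 ms polling wait is an assumption rather than a description of the real wait loop.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for fil_space_t with only the fields this sketch needs.
struct Space {
    explicit Space(uint32_t space_id) : id(space_id) {}
    uint32_t id;
    // Number of buffer-pool I/O callers currently using this object.
    std::atomic<uint32_t> n_pending_ios{0};
};

// Hypothetical stand-in for the fil_system cache and its mutex.
class SpaceCache {
public:
    void add(uint32_t id) {
        std::lock_guard<std::mutex> guard(mutex_);
        spaces_.emplace(id, std::make_unique<Space>(id));
    }

    // Like fil_space_acquire_for_io(): look up under the mutex, then pin.
    Space* acquire_for_io(uint32_t id) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = spaces_.find(id);
        if (it == spaces_.end())
            return nullptr;  // never existed, or already detached
        it->second->n_pending_ios.fetch_add(1);
        return it->second.get();
    }

    // Like fil_space_release_for_io(): unpin. The caller must not touch
    // the object afterwards.
    static void release_for_io(Space* space) {
        space->n_pending_ios.fetch_sub(1);
    }

    // Like fil_space_free_and_mutex_exit(): detach, release the mutex,
    // wait for pending I/O to drain, then free.
    bool free_space(uint32_t id) {
        std::unique_lock<std::mutex> lock(mutex_);
        auto it = spaces_.find(id);
        if (it == spaces_.end())
            return false;
        std::unique_ptr<Space> space = std::move(it->second);
        spaces_.erase(it);  // acquire_for_io() can no longer find it
        lock.unlock();      // do not hold the cache mutex while waiting
        // No new pins are possible, so the count can only fall to 0.
        while (space->n_pending_ios.load() != 0)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        return true;        // unique_ptr frees the object here
    }

private:
    std::mutex mutex_;
    std::map<uint32_t, std::unique_ptr<Space>> spaces_;
};
```

The essential ordering is that the object leaves the lookup structure, under the mutex, before the wait begins. If it stayed findable, a concurrent acquire_for_io() could pin it after the waiter had observed a count of 0, and then access freed memory.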
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems

Pages that are encrypted contain a post-encryption checksum in a different location than the normal checksum fields. Therefore, we should check this checksum before decryption, to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted with an incorrect key.

Pages that are page-compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or decryption with an incorrect key.

buf0buf.cc: buf_page_is_corrupted(): Modified so that compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum, and calculated_checksum fields from buf_page_t.

buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log that the page is encrypted.

buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt.

fil0crypt.cc: fil_encrypt_buf(): Verify the post-encryption checksum. fil_space_decrypt(): Return true if we really decrypted the page.

fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. If the post-encryption checksum matches, we also check that the traditional checksum does not match.

fil0fil.ic: Add the missing page types for encrypted and page-compressed pages to fil_get_page_type_name().

Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV.

Fix test failures caused by buf page corruption injection.
9 years ago
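The verification order that MDEV-11759 describes can be sketched in C++ on a toy page. Everything here is hypothetical: ToyPage, the FNV-1a checksum, the XOR "cipher" and the run-length "decompression" stand in for InnoDB's real page format, checksum algorithms, encryption and page compression. verify_after_read() only mirrors the order of checks and the encrypted-versus-corrupted classification; it is not buf_page_check_corrupt().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy page model for illustration only; this is NOT the InnoDB page layout.
struct ToyPage {
    std::vector<uint8_t> data;       // ciphertext, compressed bytes, or plaintext
    uint32_t post_enc_checksum = 0;  // over the ciphertext; only meaningful for
                                     // encrypted, non-page-compressed pages
    uint32_t checksum = 0;           // "traditional" checksum over the final plaintext
    bool encrypted = false;
    bool page_compressed = false;
};

enum class ReadResult { OK, CORRUPTED, ENCRYPTED };

// FNV-1a (32-bit) stands in for the real InnoDB checksum algorithms.
uint32_t toy_checksum(const std::vector<uint8_t>& bytes) {
    uint32_t h = 2166136261u;
    for (uint8_t b : bytes) {
        h ^= b;
        h *= 16777619u;
    }
    return h;
}

// XOR "cipher" standing in for real encryption; it is its own inverse.
void toy_xor(std::vector<uint8_t>& bytes, uint8_t key) {
    for (uint8_t& b : bytes)
        b ^= key;
}

// Toy run-length decoding: the input is a sequence of (count, value) pairs.
std::vector<uint8_t> toy_decompress(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    return out;
}

// Order of checks described in MDEV-11759, applied to the toy page.
ReadResult verify_after_read(ToyPage& p, uint8_t key) {
    const bool was_encrypted = p.encrypted;
    if (p.encrypted) {
        // Page-compressed pages carry no checksum, so only the others
        // can be verified before decryption.
        if (!p.page_compressed &&
            toy_checksum(p.data) != p.post_enc_checksum) {
            // Still encrypted and damaged: do not try to decrypt it.
            return ReadResult::ENCRYPTED;
        }
        toy_xor(p.data, key);
        p.encrypted = false;
    }
    if (p.page_compressed) {
        p.data = toy_decompress(p.data);
        p.page_compressed = false;
    }
    if (toy_checksum(p.data) != p.checksum) {
        // If the page was encrypted, corruption and a wrong key look the
        // same here, so it is reported as encrypted.
        return was_encrypted ? ReadResult::ENCRYPTED : ReadResult::CORRUPTED;
    }
    return ReadResult::OK;
}
```

Checking the post-encryption checksum first means damaged ciphertext is reported without ever being decrypted. A page that decrypts but then fails the traditional checksum is still classified as encrypted, because at that point a wrong key cannot be distinguished from real corruption.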
12 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
12 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads (innodb-encryption-threads) that periodically go through all tablespaces in fil_system. For each tablespace, the key version currently in use is compared to the maximum key age (innodb-encryption-rotate-key-age). This process naturally consumes CPU. At the same time, the need for scrubbing is also investigated. Currently, key rotation is fully supported only by the Amazon AWS key management plugin, but InnoDB does not know which key management plugin is in use.

This patch repurposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to a special list for key rotation, and key rotation is driven by sending an event to the background encryption threads instead of by periodic checking (i.e. a timeout).

fil0fil.cc: Added the following functions:
fil_space_acquire_low() to acquire a tablespace that could be dropped concurrently. It is used from fil_space_acquire(), and from fil_space_acquire_silent(), which does not print any messages if we try to acquire a tablespace that does not exist.
fil_space_release() to release an acquired tablespace.
fil_space_next() to iterate over the tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() iterates over the new list fil_system->rotation_list, to which new tables are added if key rotation is disabled.

fil_node_open_file(): After page 0 is read, also read crypt_info if it has not been read yet.

btr_scrub_lock_dict_func(), buf_page_check_corrupt(), buf_page_encrypt_before_write(), buf_merge_or_delete_for_page(), lock_print_info_all_transactions(), row_fts_psort_info_init(), row_truncate_table_for_mysql(), row_drop_table_for_mysql(): Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read(): Use fil_space_get_crypt_data() because at this point we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Many changes. Pass fil_space_t* directly to the functions that need it and store fil_space_t* in the rotation state. Use fil_space_acquire()/release() when iterating over tablespaces, and removed the unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to a tablespace should be stopped. Removed the unnecessary fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and a new table with encryption enabled is created.

Removed the unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(); fil_space_acquire() and fil_space_release() are used instead.

Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data and release the space.

check_table_options(): Renamed FIL_SPACE_ENCRYPTION_* to FIL_ENCRYPTION_*.

i_s.cc: Added a ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show the current status of key rotation.
9 years ago
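A minimal sketch of the event-driven idea behind this patch, assuming C++17: instead of waking up on a timeout to scan every tablespace, a worker blocks on a condition variable until a new tablespace ID is queued (roughly what the commit describes fil_space_create() doing when rotation is disabled). RotationList, needs_rotation() and drain_until_shutdown() are hypothetical names, and the exact age comparison in needs_rotation() is an assumption rather than MariaDB's formula; the sketch also leaves out the fil_space_acquire()/release() reference counting.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Age-based check (assumed formula). A rotate_key_age of 0 disables
// age-based rotation, mirroring innodb-encryption-rotate-key-age=0.
bool needs_rotation(uint32_t space_key_version, uint32_t latest_key_version,
                    uint32_t rotate_key_age) {
    if (rotate_key_age == 0) return false;
    return latest_key_version >=
           static_cast<uint64_t>(space_key_version) + rotate_key_age;
}

// Hypothetical stand-in for fil_system->rotation_list plus the event
// that wakes the background encryption threads.
class RotationList {
 public:
    // Producer side: queue a tablespace and wake one waiting worker.
    void add(uint32_t space_id) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push_back(space_id);
        }
        cv_.notify_one();
    }

    // Consumer side: sleep until there is work or shutdown was requested.
    // Remaining items are still handed out after shutdown; std::nullopt
    // is returned only once the queue is empty and shutdown is set.
    std::optional<uint32_t> next() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || stopping_; });
        if (queue_.empty()) return std::nullopt;
        uint32_t id = queue_.front();
        queue_.pop_front();
        return id;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
    }

 private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<uint32_t> queue_;
    bool stopping_ = false;
};

// Worker loop: processes queued tablespaces without any periodic polling.
std::vector<uint32_t> drain_until_shutdown(RotationList& list) {
    std::vector<uint32_t> processed;
    while (auto id = list.next()) processed.push_back(*id);
    return processed;
}
```

Because the worker waits on a predicate rather than a timeout, an idle server with rotation disabled keeps its encryption threads asleep, which is the CPU problem MDEV-11738 reports.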
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
11 years ago
12 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
12 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
11 years ago
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add the missing page types (encrypted and page compressed) to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fix test failures caused by buffer page corruption injection.
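The read-path ordering described in MDEV-11759 (verify the post-encryption checksum before decrypting; for page-compressed pages, decrypt, decompress and only then run the traditional checksum) can be sketched as below. This is a minimal illustration under stated assumptions, not the InnoDB implementation: PageStatus, PageHooks and validate_after_read() are hypothetical names, the hooks stand in for the real checksum, decryption and decompression routines, and the result codes only loosely mirror DB_SUCCESS, DB_PAGE_CORRUPTED and DB_DECRYPTION_FAILED.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes, loosely modelled on DB_SUCCESS,
// DB_PAGE_CORRUPTED and DB_DECRYPTION_FAILED.
enum class PageStatus { OK, CORRUPTED, DECRYPTION_FAILED };

// Hypothetical hooks standing in for the real checksum, decryption and
// decompression routines; each returns true on success.
struct PageHooks {
  bool encrypted;                            // page carries a post-encryption checksum
  bool page_compressed;                      // page-compressed pages store no checksum
  std::function<bool()> crypt_checksum_ok;   // checksum over the encrypted bytes
  std::function<bool()> decrypt;
  std::function<bool()> decompress;
  std::function<bool()> normal_checksum_ok;  // traditional checksum on the plain page
};

PageStatus validate_after_read(const PageHooks& h) {
  if (h.encrypted && !h.page_compressed) {
    // Check the post-encryption checksum first, so that a page damaged
    // on disk is never handed to the decryption routine.
    if (!h.crypt_checksum_ok()) return PageStatus::CORRUPTED;
    if (!h.decrypt()) return PageStatus::DECRYPTION_FAILED;
    // The encrypted bytes were intact, so a bad plaintext checksum most
    // likely means the page was decrypted with the wrong key.
    return h.normal_checksum_ok() ? PageStatus::OK : PageStatus::DECRYPTION_FAILED;
  }
  if (h.page_compressed) {
    // Nothing to verify on disk: decrypt (if needed), decompress, then
    // run the traditional checksum. Corruption and a wrong key cannot be
    // told apart, so any failure on an encrypted page is reported as a
    // decryption failure in this sketch.
    const PageStatus fail =
        h.encrypted ? PageStatus::DECRYPTION_FAILED : PageStatus::CORRUPTED;
    if (h.encrypted && !h.decrypt()) return fail;
    if (!h.decompress() || !h.normal_checksum_ok()) return fail;
    return PageStatus::OK;
  }
  // Plain page: only the traditional checksum applies.
  return h.normal_checksum_ok() ? PageStatus::OK : PageStatus::CORRUPTED;
}
```

Passing the steps as callables rather than precomputed booleans makes the ordering observable: when the post-encryption checksum fails, the decrypt hook is never invoked.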
MDEV-12253: Buffer pool blocks are accessed after they have been freed
The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.
This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up the tablespace in PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is considered missing if fil_get_space(space_id) returns NULL, and encrypted otherwise. Removed the dict_table_t::is_corrupted field.
Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page() and dict_stats_save_defrag_stats().
Added test cases for reading an encrypted page during redo log crash recovery. Also added a test case for row-compressed blobs.
btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.
buf_page_get_zip(): Issue an error if the page read fails.
buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.
buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the page is corrupted according to the checksum check, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.
buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.
btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor failed decryption.
Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().
dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication.
fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.
Fixed the error code in the innodb.innodb test.
Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test.
Reduced unnecessary complexity in some long-running tests.
Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe() and fil_get_next_space_safe() functions.
fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables.
Fixed an out-of-page-frame bug for row-compressed tables in fil_space_verify_crypt_checksum(), found using ASAN: the wrong function was called for compressed tables.
Added new tests for discard, rename table and drop (these should be allowed even when page decryption fails). Alter table rename is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted.
Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.
Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead, the table is just mostly unusable.
fil0fil.h: fil_space_acquire_low() is not a visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly.
recv_apply_hashed_log_recs() does not return anything.
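The replacement of is_encrypted and ibd_file_missing by a single file_unreadable flag, with the "missing if fil_get_space() returns NULL, encrypted otherwise" rule, can be sketched as follows. This is a hypothetical model, not the real dict_table_t or fil_system code: Tablespace, TablespaceCache, Table and unreadable_reason() are illustrative names, TablespaceCache::get() only plays the role of fil_get_space(), and "encrypted" here stands for every unreadable table whose tablespace is still present.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a tablespace entry in the fil_system cache.
struct Tablespace {
  uint32_t id;
};

// Hypothetical cache; get() plays the role of fil_get_space() and returns
// nullptr when no tablespace with that id is known.
struct TablespaceCache {
  std::unordered_map<uint32_t, Tablespace> spaces;
  const Tablespace* get(uint32_t space_id) const {
    auto it = spaces.find(space_id);
    return it == spaces.end() ? nullptr : &it->second;
  }
};

// Hypothetical, heavily reduced table object: a single file_unreadable
// flag replaces the separate is_encrypted and ibd_file_missing flags.
struct Table {
  uint32_t space_id;
  bool file_unreadable;
  bool is_readable() const { return !file_unreadable; }
};

// Distinguish the two unreadable cases after the fact, following the rule
// "missing if the tablespace lookup returns NULL, encrypted otherwise".
std::string unreadable_reason(const Table& t, const TablespaceCache& cache) {
  if (t.is_readable()) return "readable";
  return cache.get(t.space_id) == nullptr ? "ibd file missing" : "encrypted";
}
```

Keeping one flag and deriving the reason on demand means the two states can no longer disagree; the cost is an extra tablespace lookup when the reason is actually needed, for example when reporting an error.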
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed
Problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.
This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8
Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. The table's .ibd file is missing if fil_get_space(space_id) returns NULL, and encrypted if not. Removed the dict_table_t::is_corrupted field.
Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page() and dict_stats_save_defrag_stats().
Added test cases for when an encrypted page could be read during redo log crash recovery. Also added a test case for row compressed blobs.
btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.
buf_page_get_zip(): Issue an error if the page read fails.
buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.
buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.
buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.
btr_pcur_move_to_next_page(): Do not reference page if it is NULL.
Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists, the pages read from it are not corrupted, and page decryption did not fail.
Removed buf_page_t::key_version. After page decryption the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().
dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication.
fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.
Fixed the error code in the innodb.innodb test. Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Decreased unnecessary complexity in some long-running tests.
Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe() and fil_get_next_space_safe() functions.
fil_space_verify_crypt_checksum(): Fixed a bug found using ASAN where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row compressed tables. Fixed an out-of-page-frame bug for row compressed tables in fil_space_verify_crypt_checksum(), also found using ASAN: the wrong function was called for compressed tables.
Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). ALTER TABLE ... RENAME is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery can't be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead the table is just mostly unusable.
fil0fil.h: fil_space_acquire_low() is not a visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly.
recv_apply_hashed_log_recs() does not return anything.
9 years ago
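The three outcomes documented for buf_page_check_corrupt() can be illustrated with a small classification sketch. It is not the InnoDB implementation: the page_err enum only mirrors the three dberr_t codes named in the commit message, and the page_check struct, classify_page_read() and check_examples() are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical mirror of the three dberr_t values named in the commit
// message; the real InnoDB dberr_t has many more members.
enum class page_err { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical summary of the checks performed on a freshly read page.
struct page_check {
  bool encrypted;          // page was written encrypted
  bool crypt_checksum_ok;  // post-encryption checksum of the on-disk frame matches
  bool page_checksum_ok;   // normal page checksum (after decryption, if any) matches
};

// Classify a page read following the @return contract quoted above.
page_err classify_page_read(const page_check& c) {
  if (c.encrypted && !c.crypt_checksum_ok) {
    // The encrypted on-disk frame itself is damaged.
    return page_err::DB_PAGE_CORRUPTED;
  }
  if (!c.page_checksum_ok) {
    // For an encrypted page the on-disk frame was intact, so the failure
    // happened during decryption (for example, a wrong or missing key).
    return c.encrypted ? page_err::DB_DECRYPTION_FAILED
                       : page_err::DB_PAGE_CORRUPTED;
  }
  return page_err::DB_SUCCESS;
}

// Usage: the three outcomes listed in the commit message.
void check_examples() {
  assert(classify_page_read({false, false, true}) == page_err::DB_SUCCESS);
  assert(classify_page_read({true, true, false}) ==
         page_err::DB_DECRYPTION_FAILED);
  assert(classify_page_read({true, false, false}) ==
         page_err::DB_PAGE_CORRUPTED);
}
```

Separating DB_DECRYPTION_FAILED from DB_PAGE_CORRUPTED is what lets a caller such as buf_page_get_gen() stop the read cleanly and report an unreadable (rather than damaged) table.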
12 years ago
Merge Google encryption
commit 195158e9889365dc3298f8c1f3bcaa745992f27f
Author: Minli Zhu <minliz@google.com>
Date: Mon Nov 25 11:05:55 2013 -0800
Innodb redo log encryption/decryption.
Use the start LSN of a log block as part of the AES CTR counter. Record the key version with each checkpoint. Internally, key version 0 means no encryption.
Tests done (see test_innodb_log_encryption.sh for details):
- Verify that turning the innodb_encrypt_log flag on or off, combined with various key versions passed through the CLI or set dynamically after startup, does not corrupt the database. This includes going from unencrypted to encrypted and from encrypted to unencrypted.
- Verify that start-up with no redo logs succeeds.
- Verify that a fresh start-up succeeds.
Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612
commit c1b97273659f07866758c25f4a56f680a1fbad24
Author: Jonas Oreland <jonaso@google.com>
Date: Tue Dec 3 18:47:27 2013 +0100
encryption of aria data&index files
This patch implements encryption of Aria data and index files. It is implemented as follows:
1) add read/write hooks (renamed from callbacks) that do encrypt/decrypt (also add pre_read and post_write hooks)
2) modify page headers for data/index to contain the key version (making the data-page header size differ with and without encryption)
3) modify index page 0 to contain the IV (and crypt header)
4) AES CTR crypt functions
5) the counter block is implemented using a combination of page number, LSN and a table-specific id
NOTE:
1) log files are not encrypted; this is not needed if Aria is only used for internal temporary tables and they are not transactional (i.e. not logged)
2) all encrypted tables use PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not checksummed
3) This patch adds insert-order semantics to the Aria block format. The default behaviour of the Aria block format is best-fit, meaning that rows get allocated to pages so as to fill the pages as much as possible. However, certain SQL constructs materialize temporary results in tmp-tables and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by the SQL layer.
CHANGES:
1) found a bug in ma_write that made the code try to abort a record that was never written; unsure why this is not exposed
Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509
commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc
Author: Jonas Oreland <jonaso@google.com>
Date: Mon Feb 17 08:04:50 2014 -0800
Implement encryption of innodb datafiles.
Pages are encrypted before being written to disk and decrypted when read from disk. Each page in a tablespace except the first page (page 0) is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer. The encryption is performed using AES CTR. Encryption can be monitored through the new IS table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed} have been added. The following tunables are introduced:
- innodb_encrypt_tables
- innodb_encryption_threads
- innodb_encryption_rotate_key_age
- innodb_encryption_rotation_iops
Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2
commit a17eef2f6948e58219c9e26fc35633d6fd4de1de
Author: Andrew Ford <andrewford@google.com>
Date: Thu Jan 2 15:43:09 2014 -0800
Key management skeleton with debug hooks.
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866
commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1
Author: Andrew Ford <andrewford@google.com>
Date: Mon Oct 28 16:27:44 2013 -0700
Add AES-128 CTR and GCM encryption classes.
Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
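The page-header layout described in the InnoDB datafile encryption commit (a 32-bit key version and a 32-bit post-encryption checksum sharing the 8-byte FIL_PAGE_FILE_FLUSH_LSN field) can be sketched as follows. This is an illustration, not the InnoDB code: the byte offset 26, the order of the two halves (key version first), the big-endian encoding and the helper names stamp_crypt_fields(), page_key_version(), page_crypt_checksum() and page_is_encrypted() are all assumptions. Treating key version 0 as unencrypted borrows the convention the redo log commit states for log blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed byte offset of the 8-byte FIL_PAGE_FILE_FLUSH_LSN header field.
constexpr std::size_t kFlushLsnOffset = 26;

// Big-endian 32-bit store/load, in the style of InnoDB's mach_write_to_4()
// and mach_read_from_4().
void write_be32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}

std::uint32_t read_be32(const unsigned char* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Hypothetical writer: key version in the first 4 bytes of the field,
// post-encryption checksum in the last 4 bytes (page 0 is never stamped).
void stamp_crypt_fields(unsigned char* page, std::uint32_t key_version,
                        std::uint32_t crypt_checksum) {
  assert(page != nullptr);
  write_be32(page + kFlushLsnOffset, key_version);
  write_be32(page + kFlushLsnOffset + 4, crypt_checksum);
}

std::uint32_t page_key_version(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset);
}

std::uint32_t page_crypt_checksum(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset + 4);
}

// Assumption: key version 0 means "not encrypted".
bool page_is_encrypted(const unsigned char* page) {
  return page_key_version(page) != 0;
}
```

Storing the key version per page rather than per tablespace is what the commit relies on to let multiple keys be active in one tablespace simultaneously, for example while key rotation is in progress.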
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
Merge Google encryption

commit 195158e9889365dc3298f8c1f3bcaa745992f27f
Author: Minli Zhu <minliz@google.com>
Date: Mon Nov 25 11:05:55 2013 -0800

InnoDB redo log encryption/decryption.

Use the start LSN of a log block as part of the AES CTR counter. Record the key version with each checkpoint. Internally, key version 0 means no encryption.

Tests done (see test_innodb_log_encryption.sh for details):
- Verify that turning innodb_encrypt_log on or off, combined with various key versions passed on the command line or set dynamically after startup, does not corrupt the database. This includes going from unencrypted to encrypted, and from encrypted to unencrypted.
- Verify that start-up with no redo logs succeeds.
- Verify that a fresh start-up succeeds.

Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612

commit c1b97273659f07866758c25f4a56f680a1fbad24
Author: Jonas Oreland <jonaso@google.com>
Date: Tue Dec 3 18:47:27 2013 +0100

Encryption of Aria data and index files.

This patch implements encryption of Aria data and index files. It does so by:
1) adding read/write hooks (renamed from callbacks) that encrypt/decrypt (pre_read and post_write hooks are also added)
2) modifying the data/index page headers to contain the key version (so the data-page header size differs with and without encryption)
3) modifying index page 0 to contain the IV (and crypt header)
4) adding AES CTR crypt functions
5) building the counter block from a combination of the page number, the LSN and a table-specific id

NOTE:
1) Log files are not encrypted. This is not needed if Aria is only used for internal temporary tables, which are not transactional (i.e. not logged).
2) All encrypted tables use PAGE_CHECKSUM (CRC); normal internal temporary tables are (currently) not checksummed.
3) This patch adds insert-order semantics to the Aria block format. The default behaviour of the Aria block format is best-fit, meaning that rows are allocated to pages so as to fill the pages as much as possible. However, certain SQL constructs materialize temporary results in tmp tables and expect a later table scan to return the rows in the same order they were inserted. Insert order is only enabled when explicitly requested by the SQL layer.

CHANGES:
1) Found a bug in ma_write that made the code try to abort a record that was never written; it is unclear why this was not exposed.

Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509

commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc
Author: Jonas Oreland <jonaso@google.com>
Date: Mon Feb 17 08:04:50 2014 -0800

Implement encryption of InnoDB data files.

Pages are encrypted before being written to disk and decrypted when read from disk. Every page except the first one (page 0) in a tablespace is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) stores a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer.

The encryption is performed using AES CTR.

Encryption can be monitored through the new INFORMATION_SCHEMA table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed} have been added.

The following tunables are introduced:
- innodb_encrypt_tables
- innodb_encryption_threads
- innodb_encryption_rotate_key_age
- innodb_encryption_rotation_iops

Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2

commit a17eef2f6948e58219c9e26fc35633d6fd4de1de
Author: Andrew Ford <andrewford@google.com>
Date: Thu Jan 2 15:43:09 2014 -0800

Key management skeleton with debug hooks.

Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866

commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1
Author: Andrew Ford <andrewford@google.com>
Date: Mon Oct 28 16:27:44 2013 -0700

Add AES-128 CTR and GCM encryption classes.

Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
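The InnoDB data-file commit in this merge packs two 32-bit values into the old 8-byte flush-LSN field: a key version and a checksum computed after encryption. The C++ sketch below illustrates that split under stated assumptions: the byte offset 26 and the big-endian byte order are modelled on the classic InnoDB page header, treating key version 0 as "unencrypted" is carried over from the redo-log commit, and every helper name is hypothetical. The checksum algorithm is not described here, so the checksum is passed in rather than computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: the 8-byte FIL_PAGE_FILE_FLUSH_LSN field starts at byte 26
// of the page, as in the classic InnoDB FIL header.
constexpr std::size_t kFlushLsnOffset = 26;

// Big-endian 32-bit store/load, in the style of InnoDB's mach_write_to_4().
inline void write_be32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}

inline std::uint32_t read_be32(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}

// Page 0 holds the tablespace IV and stays unencrypted.
inline bool page_is_encryptable(std::uint32_t page_no) { return page_no != 0; }

// Hypothetical helper: first 32 bits = key version,
// second 32 bits = checksum computed over the already-encrypted page.
inline void stamp_crypt_fields(unsigned char* page, std::uint32_t key_version,
                               std::uint32_t post_encryption_checksum) {
  assert(page != nullptr);
  write_be32(page + kFlushLsnOffset, key_version);
  write_be32(page + kFlushLsnOffset + 4, post_encryption_checksum);
}

inline std::uint32_t page_key_version(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset);
}

inline std::uint32_t page_crypt_checksum(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset + 4);
}

// ASSUMPTION (borrowed from the redo-log commit): key version 0 = not encrypted.
inline bool page_is_encrypted(const unsigned char* page) {
  return page_key_version(page) != 0;
}
```

Storing the key version on every page, rather than once per tablespace, is what allows pages written under different keys to coexist while keys are being rotated.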
10 years ago
12 years ago
MDEV-12113: install_db shows corruption for rest encryption with innodb_data_file_path=ibdata1:3M;

The problem was that the FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION field, which for encrypted pages (even in system data files) should contain the key_version on every page except the very first one (0:0), was overwritten with the flush LSN after encryption.

Ported WL#7990 Repurpose FIL_PAGE_FLUSH_LSN to 10.1

The field FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION is consulted during InnoDB startup. At startup, InnoDB reads FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION from the first page of each file in the InnoDB system tablespace. If there are multiple files, the minimum and maximum LSN can differ. These numbers are passed to InnoDB startup.

Storing the number in files other than the first file of the InnoDB system tablespace provides little additional value, and it conflicts with other uses of the field, such as on InnoDB R-tree index pages and for the encryption key_version.

This worklog stops writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to files other than the first file of the InnoDB system tablespace (page 0:0) when the system tablespace is encrypted. If the tablespace is not encrypted, we continue writing FIL_PAGE_FLUSH_LSN_OR_KEY_VERSION to the first page of every system tablespace file, to avoid unnecessary warnings on downgrade.

open_or_create_data_files(): Pass only one flushed_lsn parameter.
xb_load_tablespaces(): Pass only one flushed_lsn parameter.
buf_page_create(): Improve the comment about where FIL_PAGE_FIL_FLUSH_LSN_OR_KEY_VERSION is set.
fil_write_flushed_lsn(): A new function, merged from fil_write_lsn_and_arch_no_to_file() and fil_write_flushed_lsn_to_data_files(). Write only to the first page of the system tablespace (page 0:0) if the tablespace is encrypted, or write the first pages of all system tablespace files and invoke fil_flush_file_spaces(FIL_TYPE_TABLESPACE) afterwards.
fil_read_first_page(): Read flush_lsn and crypt_data only from the first data file.
fil_open_single_table_tablespace(): Remove output of the LSN, because it was only valid for the system tablespace and the undo tablespaces, not for user tablespaces.
fil_validate_single_table_tablespace(): Remove output of the LSN.
checkpoint_now_set(): Use fil_write_flushed_lsn() and output an error if the operation fails.
Remove the lsn variable from fsp_open_info.
recv_recovery_from_checkpoint_start(): Remove the unnecessary second flush_lsn parameter.
log_empty_and_mark_files_at_shutdown(): Use fil_write_flushed_lsn() and output an error if it fails.
open_or_create_data_files(): Pass only one flushed_lsn variable.
9 years ago
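The write policy described for fil_write_flushed_lsn() can be modelled in a few lines of C++. This is a hypothetical in-memory model, not the InnoDB function: SysDataFile and both helpers are invented for illustration, only the first-page LSN of each system tablespace file is represented, and the fil_flush_file_spaces() call is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one system tablespace data file: only the
// flush-LSN field on its first page is represented.
struct SysDataFile {
  std::uint64_t first_page_flush_lsn = 0;
  bool written = false;  // set when a call rewrote this file's first page
};

// Models the described policy: if the system tablespace is encrypted, only
// page 0:0 receives the LSN (first pages of the other files keep their
// key_version); otherwise every file's first page receives it.
// Returns the number of files written. The subsequent
// fil_flush_file_spaces(FIL_TYPE_TABLESPACE) call is not modelled.
inline int write_flushed_lsn_model(std::vector<SysDataFile>& files,
                                   std::uint64_t lsn, bool encrypted) {
  assert(!files.empty());
  const std::size_t n = encrypted ? 1 : files.size();
  for (std::size_t i = 0; i < n; ++i) {
    files[i].first_page_flush_lsn = lsn;
    files[i].written = true;
  }
  return static_cast<int>(n);
}

// At startup the flushed LSN is read only from the first data file.
inline std::uint64_t read_flushed_lsn_model(const std::vector<SysDataFile>& files) {
  assert(!files.empty());
  return files.front().first_page_flush_lsn;
}
```

Because the first pages of the other files may hold a key_version when the tablespace is encrypted, reading the LSN from the first data file alone is the only choice that is valid in both cases.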
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it had already been freed from the LRU. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.

This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is missing if fil_get_space(space_id) returns NULL, and encrypted if not. Removed the dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added a test case for row-compressed blobs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.
buf_page_get_zip(): Issue an error if the page read fails.
buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.
buf_mark_space_corrupt(): Remove bpage from the LRU also when it is encrypted.
buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.
buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.
btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor failed to decrypt.

Removed buf_page_t::key_version. After page decryption the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.
dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication.
fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.
PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.

Fixed the error code in the innodb.innodb test. Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Reduced unnecessary complexity in some long-running tests.

Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe() and fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables. Also fixed an out-of-page-frame bug for row-compressed tables in fil_space_verify_crypt_checksum(), found using ASAN: the wrong function was called for compressed tables.

Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed.

Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is just mostly unusable.

fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. The FilSpace class uses fil_space_acquire_low directly.
recv_apply_hashed_log_recs() does not return anything.
9 years ago
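The three outcomes documented for buf_page_check_corrupt() amount to a small decision table. The sketch below encodes it in C++, assuming the caller has already evaluated both checksums; PageCheck is a local stand-in for the relevant dberr_t values, and classify_page_read() is a hypothetical name, not part of InnoDB.

```cpp
#include <cassert>

// Local stand-in for the three dberr_t values named in the commit message;
// the real dberr_t enum has many more members.
enum class PageCheck { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical inputs, assumed to be computed by the caller:
//   encrypted         - the page carries a non-zero key version
//   crypt_checksum_ok - the post-encryption checksum matches the stored one
//   page_checksum_ok  - the normal page checksum matches (evaluated after
//                       decryption when the page was encrypted)
inline PageCheck classify_page_read(bool encrypted, bool crypt_checksum_ok,
                                    bool page_checksum_ok) {
  if (!encrypted) {
    return page_checksum_ok ? PageCheck::DB_SUCCESS
                            : PageCheck::DB_PAGE_CORRUPTED;
  }
  if (!crypt_checksum_ok) {
    // The stored (encrypted) bytes themselves fail verification.
    return PageCheck::DB_PAGE_CORRUPTED;
  }
  // The encrypted bytes verify, but the decrypted page does not.
  return page_checksum_ok ? PageCheck::DB_SUCCESS
                          : PageCheck::DB_DECRYPTION_FAILED;
}
```

Keeping the two error codes apart lets a caller distinguish damaged data from a page that is intact on disk but could not be decrypted, presumably because of a wrong or missing key.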
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
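The detach, release-mutex, wait-for-zero sequence of fil_space_free_and_mutex_exit() can be sketched as follows. This is a minimal model under stated assumptions: space_sketch, system_sketch, acquire_for_io(), release_for_io() and free_and_mutex_exit() are hypothetical simplifications of fil_space_t, fil_system and the functions named in the message, keeping only the n_pending_ios counter and a mutex-protected lookup table. Because the space is removed from the cache before the mutex is released, no new I/O can pin it, so the counter can only fall to 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical model of fil_space_t: only the I/O pin count is kept.
struct space_sketch {
    std::uint32_t id;
    std::atomic<std::uint32_t> n_pending_ios{0};
    explicit space_sketch(std::uint32_t i) : id(i) {}
};

// Hypothetical model of fil_system: a mutex-protected tablespace cache.
struct system_sketch {
    std::mutex mutex;
    std::unordered_map<std::uint32_t, space_sketch*> spaces;
};

// Modeled on fil_space_acquire_for_io(): find the space and pin it for I/O.
// Returns nullptr once the space has been detached from the cache.
space_sketch* acquire_for_io(system_sketch& sys, std::uint32_t id) {
    std::lock_guard<std::mutex> guard(sys.mutex);
    auto it = sys.spaces.find(id);
    if (it == sys.spaces.end()) return nullptr;
    it->second->n_pending_ios.fetch_add(1);
    return it->second;
}

// Modeled on fil_space_release_for_io(): drop the I/O pin.
void release_for_io(space_sketch* space) {
    assert(space->n_pending_ios.load() > 0);
    space->n_pending_ios.fetch_sub(1);
}

// Modeled on fil_space_free_and_mutex_exit(). Precondition: the caller
// holds sys.mutex. Postcondition: sys.mutex is released.
void free_and_mutex_exit(system_sketch& sys, std::uint32_t id) {
    auto it = sys.spaces.find(id);
    if (it == sys.spaces.end()) {
        sys.mutex.unlock();
        return;
    }
    space_sketch* space = it->second;
    sys.spaces.erase(it);  // 1. detach: future lookups cannot find it
    sys.mutex.unlock();    // 2. let other threads make progress
    // 3. wait for in-flight I/O; a production version would more likely
    //    sleep or use a condition variable than spin with yield().
    while (space->n_pending_ios.load() != 0) {
        std::this_thread::yield();
    }
    delete space;          // 4. no thread can still hold a pointer
}
```

Waiting outside the mutex is the key design choice: holding fil_system->mutex during the wait would block the very I/O threads that need to finish before the count can reach 0.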
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain a post-encryption checksum at a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key. Pages that are page compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or the use of an incorrect decryption key. buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum, calculated_checksum fields from buf_page_t. buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional checksum method buf_page_is_corrupted(). If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log that the page is encrypted. buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt. fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true if we really decrypted the page. fil_space_verify_crypt_checksum(): Rewritten to use the same method used when calculating the post-encryption checksum. If the post-encryption checksum matches, we also check that the traditional checksum does not match. 
fil0fil.ic: Add the missing page types encrypted and page compressed to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fix test failures caused by buf page corruption injection.
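The order of checks described in MDEV-11759 can be summarized as a small decision function. This is a minimal sketch, not buf_page_check_corrupt() itself: page_read_sketch, page_status and check_page() are hypothetical, and each boolean stands in for a real checksum or decryption step. The outcome names mirror the DB_SUCCESS, DB_PAGE_CORRUPTED and DB_DECRYPTION_FAILED codes mentioned in the MDEV-12253 message.

```cpp
#include <cassert>

// Hypothetical outcome codes mirroring the dberr_t values named in the
// commit messages; the real enum has many more members.
enum class page_status { SUCCESS, PAGE_CORRUPTED, DECRYPTION_FAILED };

// Hypothetical summary of a page read from disk; each flag stands in for
// the result of one real check.
struct page_read_sketch {
    bool encrypted;             // page carries an encryption key version
    bool page_compressed;       // page-compressed pages carry no checksum
    bool post_enc_checksum_ok;  // checksum written after encryption matches
    bool decrypt_ok;            // decryption (and decompression) succeeded
    bool plain_checksum_ok;     // traditional checksum on the plain page
};

page_status check_page(const page_read_sketch& p) {
    if (p.encrypted) {
        // 1. Non-compressed encrypted pages: verify the post-encryption
        //    checksum first, so corrupted ciphertext is never decrypted.
        if (!p.page_compressed && !p.post_enc_checksum_ok) {
            return page_status::PAGE_CORRUPTED;
        }
        // 2. Decrypt (and decompress), then apply the traditional checksum.
        //    A mismatch here is reported as a decryption failure, since a
        //    wrong key is the likely cause.
        if (!p.decrypt_ok || !p.plain_checksum_ok) {
            return page_status::DECRYPTION_FAILED;
        }
        return page_status::SUCCESS;
    }
    // Unencrypted page: only the traditional checksum applies.
    return p.plain_checksum_ok ? page_status::SUCCESS
                               : page_status::PAGE_CORRUPTED;
}
```

Mapping a post-decryption mismatch to a decryption failure is a simplification; for page-compressed pages the message notes that this single check cannot fully distinguish corruption from an incorrect key.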
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems

Encrypted pages store a post-encryption checksum in a different location than the normal checksum fields. Therefore, we should check this checksum before decryption, to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted with an incorrect key.

Page-compressed pages do not contain any checksum. For them, we must first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or decryption with an incorrect key.

buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum, and calculated_checksum fields from buf_page_t.

buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional checksum method, buf_page_is_corrupted(). If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message that the page is encrypted to the error log.

buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt.

fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true if we really decrypted the page.

fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. We also check that, when the post-encryption checksum matches, the traditional checksum check does not match.

fil0fil.ic: Add the missing page types (encrypted and page compressed) to fil_get_page_type_name().

Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV.

Fix test failures caused by buf page corruption injection.
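The order of checks described for encrypted and page-compressed pages can be modeled as a small decision function. In this hypothetical C++ sketch, page_ops_sketch bundles callbacks standing in for the real routines (the post-encryption checksum check, decryption, decompression, and buf_page_is_corrupted()). Reporting a failed traditional checksum on an encrypted page-compressed page as a decryption failure is an assumption, based on the error-log behavior described for buf_page_check_corrupt().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hooks standing in for the real InnoDB routines.
struct page_ops_sketch {
    std::function<bool()> crypt_checksum_ok;  // cf. fil_space_verify_crypt_checksum()
    std::function<void()> decrypt;            // cf. fil_space_decrypt()
    std::function<void()> decompress;         // page-compression step
    std::function<bool()> page_checksum_ok;   // cf. !buf_page_is_corrupted()
};

enum class verdict_sketch { ok, corrupted, decryption_failed };

verdict_sketch check_after_read(bool encrypted, bool page_compressed,
                                const page_ops_sketch& ops) {
    if (encrypted && !page_compressed) {
        // Verify the post-encryption checksum first, so that a corrupted
        // page is never passed to the decryption routine.
        if (!ops.crypt_checksum_ok()) {
            return verdict_sketch::corrupted;
        }
        ops.decrypt();
        // A mismatch now points to corruption or a wrong key.
        return ops.page_checksum_ok() ? verdict_sketch::ok
                                      : verdict_sketch::decryption_failed;
    }
    if (page_compressed) {
        // No checksum is usable before these steps: decrypt (if needed),
        // decompress, and only then run the traditional check.
        if (encrypted) {
            ops.decrypt();
        }
        ops.decompress();
        if (ops.page_checksum_ok()) {
            return verdict_sketch::ok;
        }
        // Assumption: report an encrypted page as a decryption failure.
        return encrypted ? verdict_sketch::decryption_failed
                         : verdict_sketch::corrupted;
    }
    // Plain page: only the traditional checksum applies.
    return ops.page_checksum_ok() ? verdict_sketch::ok
                                  : verdict_sketch::corrupted;
}
```

Passing the steps in as callbacks keeps the sketch independent of any page layout, so the only thing it asserts is the ordering of checks.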
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.

This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up the tablespace in PageConverter::validate(). This lookup is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's ibd file is missing if fil_get_space(space_id) returns NULL, and the table is encrypted if not. Removed the dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), and dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added a test case for row-compressed blobs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.

buf_page_get_zip(): Issue an error if the page read fails.

buf_page_get_gen(): Use dberr_t for error detection, and do not reference bpage after we have freed it.

buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the page is corrupted according to the checksum check, DB_DECRYPTION_FAILED if the page's post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case, only DB_SUCCESS is possible.

buf_page_io_complete(): Use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.

btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are not corrupted and did not fail decryption.

Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication.

fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.

Fixed the error code in the innodb.innodb test.

Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test.

Decreased unnecessary complexity in some long-running tests.

Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), and fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables.

Fixed an out-of-page-frame bug for row-compressed tables in fil_space_verify_crypt_checksum(), found using ASAN: an incorrect function was called for compressed tables.

Added new tests for discard, rename table, and drop (we should allow them even when page decryption fails). ALTER TABLE RENAME is not allowed.

Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery can't be decrypted.

Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead, the table is just mostly unusable.

fil0fil.h: fil_space_acquire_low() is not a visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly.

recv_apply_hashed_log_recs() does not return anything.
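The central MDEV-12253 fix, returning an error code and never touching bpage after it has been evicted, can be sketched as follows. All types here are hypothetical simplifications: db_err_sketch mirrors the DB_SUCCESS, DB_PAGE_CORRUPTED, and DB_DECRYPTION_FAILED codes named in the message, table_sketch models only dict_table_t::file_unreadable and is_readable(), and resetting a std::unique_ptr stands in for removing the block from the LRU list.

```cpp
#include <cassert>
#include <memory>

// Hypothetical mirror of the dberr_t codes named in the commit message.
enum class db_err_sketch { success, page_corrupted, decryption_failed };

// Hypothetical summary of the checksum checks done for one page read.
struct read_result_sketch {
    bool was_encrypted;
    bool crypt_checksum_ok;  // post-encryption checksum matched
    bool page_checksum_ok;   // normal checksum matched (after decryption)
};

struct page_sketch {
    int payload = 0;
};

// Hypothetical stand-in for dict_table_t, reduced to file_unreadable.
struct table_sketch {
    bool file_unreadable = false;
    bool is_readable() const { return !file_unreadable; }
};

// Classification following the buf_page_check_corrupt() @return text.
db_err_sketch classify(const read_result_sketch& r) {
    if (r.page_checksum_ok) {
        return db_err_sketch::success;
    }
    if (r.was_encrypted && r.crypt_checksum_ok) {
        return db_err_sketch::decryption_failed;
    }
    return db_err_sketch::page_corrupted;
}

// The buf_page_get_gen() pattern: on failure, evict the page, flag the
// table, and return nullptr without reading the freed page again.
page_sketch* get_page_sketch(std::unique_ptr<page_sketch>& slot,
                             table_sketch& table,
                             const read_result_sketch& r,
                             db_err_sketch* err) {
    *err = classify(r);
    if (*err != db_err_sketch::success) {
        slot.reset();                  // like removing the block from the LRU
        table.file_unreadable = true;  // is_readable() now returns false
        return nullptr;
    }
    return slot.get();
}
```

Callers check the returned pointer and the error code; none of them ever reads through a page handle obtained before the failure, which is the use-after-free the commit removes.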
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed. The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read. This patch should also address the following test failures and bugs: MDEV-12419: IMPORT should not look up the tablespace in PageConverter::validate(). This lookup is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot. MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot. MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8. Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is considered missing if fil_get_space(space_id) returns NULL, and encrypted if not. Removed the dict_table_t::is_corrupted field. Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), and dict_stats_save_defrag_stats(). Added test cases for reading an encrypted page during redo log crash recovery. Also added a test case for row-compressed BLOBs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL. buf_page_get_zip(): Issue an error if the page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it. buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows that the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case, only DB_SUCCESS is possible. buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails. btr_pcur_move_to_next_page(): Do not reference the page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor failed decryption. Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write(). dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication. fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import. Fixed the error code in the innodb.innodb test. Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Reduced unnecessary complexity in some long-running tests. Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), and fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables. Fixed an out-of-page-frame bug for row-compressed tables in fil_space_verify_crypt_checksum(), also found using ASAN: the wrong function was called for compressed tables. Added new tests for discard, rename table, and drop (these should be allowed even when page decryption fails). ALTER TABLE with RENAME is not allowed.
Added a test for restarting with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is mostly unusable. fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. The FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
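The three return values documented for buf_page_check_corrupt() amount to a small decision table. The sketch below is hypothetical: PageStatus, PageCheck, its boolean fields, and check_page() are assumptions for illustration, and the real function inspects the page frame and encryption metadata rather than taking precomputed flags. Treating a failed post-encryption checksum as corruption follows the MDEV-11759 rationale of verifying that checksum before decrypting.

```cpp
#include <cassert>

// Hypothetical status codes named after the dberr_t values in the commit message.
enum PageStatus { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical, precomputed facts about one page read.
struct PageCheck {
    bool encrypted;                    // the on-disk page image was encrypted
    bool post_encryption_checksum_ok;  // checksum of the encrypted image matches
    bool normal_checksum_ok;           // traditional checksum of the (decrypted) page matches
};

// Classify a page read the way the @return documentation describes.
PageStatus check_page(const PageCheck& p) {
    if (!p.encrypted) {
        // Plain page: only the traditional checksum applies.
        return p.normal_checksum_ok ? DB_SUCCESS : DB_PAGE_CORRUPTED;
    }
    if (!p.post_encryption_checksum_ok) {
        // The encrypted image itself is damaged; do not trust a decryption.
        return DB_PAGE_CORRUPTED;
    }
    // The encrypted image is intact, so a bad checksum after decryption points
    // at the key or the decryption rather than at the stored bytes.
    return p.normal_checksum_ok ? DB_SUCCESS : DB_DECRYPTION_FAILED;
}
```

Separating the two failure codes lets callers report a likely wrong or missing key differently from physical page damage.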
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
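The detach-then-drain ordering of fil_space_free_and_mutex_exit() can be illustrated with a small, self-contained C++ model. This is a hedged sketch, not InnoDB code: Space, SpaceCache, acquire_for_io(), release_for_io() and free_space() are hypothetical stand-ins for fil_space_t, the fil_system cache, fil_space_acquire_for_io(), fil_space_release_for_io() and fil_space_free_and_mutex_exit(). It assumes one std::mutex guards both the cache and the counter, and it waits on a std::condition_variable, which may differ from how the real code waits.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical model; not the real fil_space_t / fil_system API.
struct Space {
  explicit Space(std::uint32_t space_id) : id(space_id) {}
  const std::uint32_t id;
  std::uint32_t n_pending_ios = 0;  // guarded by SpaceCache::mutex_
};

class SpaceCache {
 public:
  void add(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    spaces_[id] = std::make_unique<Space>(id);
  }

  // Stand-in for fil_space_acquire_for_io(): pin a cached space for I/O.
  // Returns nullptr if the space is not (or no longer) in the cache.
  Space* acquire_for_io(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return nullptr;
    ++it->second->n_pending_ios;
    return it->second.get();
  }

  // Stand-in for fil_space_release_for_io(): unpin, waking a waiter at 0.
  void release_for_io(Space* space) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(space->n_pending_ios > 0);
    if (--space->n_pending_ios == 0) drained_.notify_all();
  }

  // Stand-in for fil_space_free_and_mutex_exit(): detach, drain, then free.
  bool free_space(std::uint32_t id) {
    std::unique_lock<std::mutex> lock(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return false;
    std::unique_ptr<Space> detached = std::move(it->second);
    spaces_.erase(it);
    // wait() releases mutex_ while blocked. acquire_for_io() can no longer
    // find the space, so the count can only be decremented towards 0.
    drained_.wait(lock, [&] { return detached->n_pending_ios == 0; });
    return true;  // `detached` is destroyed here; no I/O still references it
  }

 private:
  std::mutex mutex_;
  std::condition_variable drained_;
  std::unordered_map<std::uint32_t, std::unique_ptr<Space>> spaces_;
};
```

Because the space is erased from the map before the wait, no new I/O can pin it, so the wait only has to outlast I/O that was already in flight.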
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain a post-encryption checksum in a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or the decryption was done using an incorrect key. Pages that are page compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or the use of an incorrect decryption key. buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum and calculated_checksum fields from buf_page_t. buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log that the page is encrypted. buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt. fil0crypt.cc: fil_encrypt_buf(): Verify the post-encryption checksum. fil_space_decrypt(): Return true if we really decrypted the page. fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. We also check that, when the post-encryption checksum matches, the traditional checksum does not match.
fil0fil.ic: Added the missing page types encrypted and page compressed to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fixed test failures caused by buffer page corruption injection.
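The read-side verification order described above can be summarised as a function that lists the steps applied to a page. This is a sketch under stated assumptions: the Step enum and read_verification_steps() are hypothetical labels, not InnoDB identifiers, and the unencrypted page-compressed case (decompress, then the traditional checksum) is inferred rather than spelled out in the message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the read-side steps; not InnoDB identifiers.
enum class Step {
  VerifyPostEncryptionChecksum,  // checked on the still-encrypted page
  Decrypt,
  Decompress,
  VerifyChecksum                 // traditional checksum on the plain page
};

std::vector<Step> read_verification_steps(bool encrypted, bool page_compressed) {
  std::vector<Step> steps;
  // Only encrypted pages that are not page compressed carry a
  // post-encryption checksum, so only they can be rejected before decryption.
  if (encrypted && !page_compressed) {
    steps.push_back(Step::VerifyPostEncryptionChecksum);
  }
  if (encrypted) {
    steps.push_back(Step::Decrypt);
  }
  // Decrypt-then-decompress is from the message; the unencrypted
  // page-compressed path is an assumption.
  if (page_compressed) {
    steps.push_back(Step::Decompress);
  }
  // Last step on every path: detects corruption or a wrong decryption key.
  steps.push_back(Step::VerifyChecksum);
  return steps;
}
```

For example, read_verification_steps(true, true) yields Decrypt, Decompress, VerifyChecksum, because a page-compressed page has no checksum that could be verified earlier.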
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use the start LSN of a log block as part of the AES CTR counter. Record the key version with each checkpoint. Internally, key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for details): - Verify that turning the innodb_encrypt_log flag on or off, combined with various key versions passed through the CLI and dynamically set after startup, does not corrupt the database. This includes tests going from unencrypted to encrypted, and from encrypted to unencrypted. - Verify that start-up with no redo logs succeeds. - Verify that a fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that do encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain the key version (making the data-page header size different with and without encryption) 3) modify index page 0 to contain the IV (and crypt header) 4) AES CTR crypt functions 5) the counter block is implemented using a combination of page no, lsn and a table-specific id NOTE: 1) log files are not encrypted; this is not needed if aria is only used for internal temporary tables and they are not transactional (i.e. not logged) 2) all encrypted tables use PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows get allocated to pages trying to fill the pages as much as possible.
However, certain sql constructs materialize temporary results in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by the sql-layer. CHANGES: 1) found a bug in ma_write that made the code try to abort a record that was never written; unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before being written to disk and decrypted when read from disk. Each page except the first page (page 0) in a tablespace is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer. The encryption is performed using AES CTR. Monitoring of encryption is enabled using the new IS table INNODB_TABLESPACES_ENCRYPTION. In addition, new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed } have been added. The following tunables are introduced: - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks.
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
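The reuse of the 8-byte FIL_PAGE_FILE_FLUSH_LSN field described in the datafile encryption commit can be sketched as a pair of pack/unpack helpers. The message only says that the field holds a 32-bit key version and a 32-bit post-encryption checksum. The order of the two halves and the big-endian byte order are assumptions here, and CryptFields, store_crypt_fields() and load_crypt_fields() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; names and field layout are assumptions.
struct CryptFields {
  std::uint32_t key_version;  // assumed: first 4 bytes of the field
  std::uint32_t checksum;     // assumed: last 4 bytes (post-encryption checksum)
};

// Big-endian 32-bit read/write, assumed to match InnoDB's on-disk byte order.
static std::uint32_t read_be32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

static void write_be32(std::uint8_t* p, std::uint32_t v) {
  p[0] = static_cast<std::uint8_t>(v >> 24);
  p[1] = static_cast<std::uint8_t>(v >> 16);
  p[2] = static_cast<std::uint8_t>(v >> 8);
  p[3] = static_cast<std::uint8_t>(v);
}

// `field` points at the 8-byte FIL_PAGE_FILE_FLUSH_LSN field of a page
// other than page 0.
void store_crypt_fields(std::uint8_t* field, const CryptFields& f) {
  assert(field != nullptr);
  write_be32(field, f.key_version);
  write_be32(field + 4, f.checksum);
}

CryptFields load_crypt_fields(const std::uint8_t* field) {
  assert(field != nullptr);
  return CryptFields{read_be32(field), read_be32(field + 4)};
}
```

Storing the key version on every page, rather than once per tablespace, is what allows multiple keys to be active in a tablespace simultaneously, as the commit message notes.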
11 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read. This patch should also address the following test failures and bugs: MDEV-12419: IMPORT should not look up the tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is missing if fil_get_space(space_id) returns NULL, and it is encrypted if not. Removed the dict_table_t::is_corrupted field. Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page() and dict_stats_save_defrag_stats(). Added test cases for reading encrypted pages during redo log crash recovery. Also added a test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL. buf_page_get_zip(): Issue an error if a page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after it has been freed. buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows that the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case, only DB_SUCCESS is possible. buf_page_io_complete(): Use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if a page read fails. btr_pcur_move_to_next_page(): Do not reference the page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor undecryptable. Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, an old key_version is removed in buf_page_encrypt_before_write(). dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication. fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import. Fixed the error code in the innodb.innodb test. Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Reduced unnecessary complexity in some long-running tests. Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe() and fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row compressed tables. Fixed an out-of-page-frame bug for row compressed tables in fil_space_verify_crypt_checksum(), found using ASAN; the wrong function was called for compressed tables. Added new tests for discard, rename table and drop (these should be allowed even when page decryption fails). ALTER TABLE rename is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is mostly unusable. fil0fil.h: fil_space_acquire_low() is not a visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly. recv_apply_hashed_log_recs() does not return anything.
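The three outcomes documented for buf_page_check_corrupt() can be modelled as a pure classification function. This is a hedged sketch: PageCheck and classify_page() are hypothetical, and the enumerators only mirror the names of the dberr_t values. Mapping a failed post-encryption checksum to PageCorrupted is an assumption, based on the MDEV-11759 rule of checking that checksum before decrypting.

```cpp
// Hypothetical model; the enumerators only mirror the dberr_t names
// DB_SUCCESS, DB_PAGE_CORRUPTED and DB_DECRYPTION_FAILED.
enum class PageCheck { Success, PageCorrupted, DecryptionFailed };

// encrypted:          the page was read from disk in encrypted form
// post_encryption_ok: its post-encryption checksum matched (ignored if !encrypted)
// checksum_ok:        the traditional checksum matched, after decryption
//                     if the page was encrypted
constexpr PageCheck classify_page(bool encrypted, bool post_encryption_ok,
                                  bool checksum_ok) {
  if (!encrypted) {
    return checksum_ok ? PageCheck::Success : PageCheck::PageCorrupted;
  }
  if (!post_encryption_ok) {
    // Assumption: the stored ciphertext is damaged, so report corruption
    // without attempting decryption.
    return PageCheck::PageCorrupted;
  }
  // The ciphertext was intact; a bad checksum after decryption most
  // likely means the wrong key was used.
  return checksum_ok ? PageCheck::Success : PageCheck::DecryptionFailed;
}
```

Keeping the function free of side effects means the distinction between "damaged on disk" and "wrong key" can be checked at compile time with static_assert.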
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.

This patch should also address the following test failures and bugs:

MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. When file_unreadable is set, the table's .ibd file is missing if fil_get_space(space_id) returns NULL, and the table is encrypted otherwise. Removed the dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), and dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read during redo log crash recovery. Also added a test case for row-compressed BLOBs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.

buf_page_get_zip(): Issue an error if the page read fails.

buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.

buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows that the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.

buf_page_io_complete(): Use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.

btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are not corrupted and did not fail decryption.

Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication.

fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.

Fixed the error code in the innodb.innodb test.

Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test.

Reduced unnecessary complexity in some long-running tests.

Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), and fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables. Also fixed an out-of-page-frame access for row-compressed tables in this function, found using ASAN: the wrong function was called for compressed tables.

Added new tests for discard, rename table, and drop (we should allow them even when page decryption fails). Alter table rename is not allowed.

Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is mostly unusable.

fil0fil.h: fil_space_acquire_low() is not an externally visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly.

recv_apply_hashed_log_recs() does not return anything.
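The return-value contract stated above for buf_page_check_corrupt() can be illustrated with a small classification function. This is a minimal sketch under stated assumptions, not InnoDB code: PageCheck, PageState and classify_page() are hypothetical names, the enum is only a subset of dberr_t, and the three booleans stand in for the checks the real function performs on the page frame.

```cpp
#include <cassert>

// Hypothetical subset of InnoDB's dberr_t; the real enum has many more values.
enum class PageCheck { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical summary of what is known about a page after it was read.
struct PageState {
  bool was_encrypted;                // the page was encrypted on disk
  bool post_encryption_checksum_ok;  // checksum stored for the encrypted frame
  bool traditional_checksum_ok;      // normal page checksum, after any decryption
};

// Maps the check results to the three outcomes listed in the commit message.
PageCheck classify_page(const PageState& s) {
  if (s.traditional_checksum_ok) {
    return PageCheck::DB_SUCCESS;  // read (and decrypted) page is intact
  }
  if (s.was_encrypted && s.post_encryption_checksum_ok) {
    // The encrypted frame was intact, but the plaintext is not:
    // most likely the wrong key was used.
    return PageCheck::DB_DECRYPTION_FAILED;
  }
  return PageCheck::DB_PAGE_CORRUPTED;  // checksum check failed
}
```

In this sketch the caller only receives an error code, which mirrors the commit's intent that buf_page_get_gen() stop processing on a non-success result instead of dereferencing a bpage that may already have been freed.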
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0

This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken.

fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system.

fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops.

Adjust a number of places accordingly, and remove some redundant tablespace lookups.

The following parts of this fix differ from the 10.2 version of this fix:

buf_page_get_corrupt(): Add a tablespace parameter.

In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects).

fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects.

fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
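The acquire/release and detach-then-wait protocol described for n_pending_ios can be modeled in standard C++. This is a simplified sketch, not the InnoDB implementation: Space, SpaceCache and their methods are hypothetical stand-ins for fil_space_t, the fil_system cache, fil_space_acquire_for_io(), fil_space_release_for_io() and fil_space_free_and_mutex_exit(), and the yield loop is an assumed way of waiting for the count to reach 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for fil_space_t, reduced to the I/O pin count.
struct Space {
  std::uint32_t id;
  std::atomic<int> n_pending_ios{0};
  explicit Space(std::uint32_t space_id) : id(space_id) {}
};

// Hypothetical stand-in for the fil_system tablespace cache.
class SpaceCache {
 public:
  void add(std::uint32_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    spaces_.emplace(id, std::unique_ptr<Space>(new Space(id)));
  }

  // Like fil_space_acquire_for_io(): look up and pin under the same mutex
  // that protects detaching, so a detached space can never be pinned.
  Space* acquire_for_io(std::uint32_t id) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return nullptr;  // detached or never existed
    it->second->n_pending_ios.fetch_add(1);
    return it->second.get();
  }

  // Like fil_space_release_for_io(): drop one pin.
  static void release_for_io(Space* space) {
    space->n_pending_ios.fetch_sub(1);
  }

  // Like fil_space_free_and_mutex_exit(): detach under the mutex, release
  // the mutex, then wait for in-flight I/O before freeing the object.
  bool free_space(std::uint32_t id) {
    std::unique_ptr<Space> detached;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = spaces_.find(id);
      if (it == spaces_.end()) return false;
      detached = std::move(it->second);
      spaces_.erase(it);  // new lookups can no longer find this space
    }
    // The count can only decrease from here on.
    while (detached->n_pending_ios.load() != 0) std::this_thread::yield();
    return true;  // detached is destroyed when it goes out of scope
  }

 private:
  std::mutex mutex_;  // plays the role of fil_system->mutex
  std::unordered_map<std::uint32_t, std::unique_ptr<Space>> spaces_;
};
```

The essential property is that the lookup and the increment happen atomically with respect to detaching, so once free_space() has erased the entry, the pin count can only go down to 0.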
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems

Pages that are encrypted contain a post-encryption checksum at a different location than the normal checksum fields. Therefore, before decryption we should check this checksum to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key.

Pages that are page compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or the use of an incorrect decryption key.

buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum, and calculated_checksum fields from buf_page_t.

buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log that the page is encrypted.

buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt.

fil0crypt.cc: fil_encrypt_buf(): Verify the post-encryption checksum. fil_space_decrypt(): Return true if we really decrypted the page.

fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. We also check that, if the post-encryption checksum matches, the traditional checksum check does not match.

fil0fil.ic: Add the missing page types (encrypted and page compressed) to fil_get_page_type_name().

Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV.

Fix test failures caused by buf page corruption injection.
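The verification order described above can be summarized as a function that lists the steps applied to a page after it is read. This is a hedged sketch, not InnoDB code: Step and plan_page_read() are hypothetical, the boolean parameters abstract away how the real code detects encryption, page compression and a matching post-encryption checksum, and, following the commit text, page-compressed pages are treated as carrying no checksum of their own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names for the checks and transformations described above.
enum class Step {
  VerifyPostEncryptionChecksum,
  Decrypt,
  Decompress,
  VerifyTraditionalChecksum
};

// Returns the steps applied to a freshly read page, in order.
std::vector<Step> plan_page_read(bool encrypted, bool page_compressed,
                                 bool post_encryption_checksum_ok) {
  std::vector<Step> steps;
  if (encrypted && !page_compressed) {
    // Encrypted pages carry a post-encryption checksum: check it first
    // so that a corrupted page is never decrypted.
    steps.push_back(Step::VerifyPostEncryptionChecksum);
    if (!post_encryption_checksum_ok) return steps;  // report as corrupted
  }
  if (encrypted) steps.push_back(Step::Decrypt);
  // Page-compressed pages have no checksum of their own, so they are
  // decrypted (if needed) and decompressed before any checksum check.
  if (page_compressed) steps.push_back(Step::Decompress);
  // The traditional checksum detects corruption or a wrong decryption key.
  steps.push_back(Step::VerifyTraditionalChecksum);
  return steps;
}
```

For example, an encrypted, uncompressed page whose post-encryption checksum fails stops after the first step and is never decrypted.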
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.

This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up the tablespace in PageConverter::validate(). This lookup is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is considered missing if fil_get_space(space_id) returns NULL, and encrypted otherwise. Removed the dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), and dict_stats_save_defrag_stats().

Added test cases for reading an encrypted page during redo log crash recovery. Also added a test case for row-compressed BLOBs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.

buf_page_get_zip(): Issue an error if the page read fails.

buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after it has been freed.

buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.

buf_page_io_complete(): Use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.

btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor failed to decrypt.

Removed buf_page_t::key_version. After page decryption the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error() function to avoid code duplication.

fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.

Fixed the error code in the innodb.innodb test. Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Decreased unnecessary complexity in some long-running tests.

Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), and fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables. Also fixed an out-of-page-frame access for row-compressed tables, found using ASAN: the wrong function was called for compressed tables.

Added new tests for discard, rename table and drop (these should be allowed even when page decryption fails). ALTER TABLE RENAME is not allowed.

Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead, the table is mostly unusable.

fil0fil.h: fil_space_acquire_low() is not a visible function, and fil_space_acquire() and fil_space_acquire_silent() are inline functions. The FilSpace class uses fil_space_acquire_low() directly.

recv_apply_hashed_log_recs() does not return anything.
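The three-way result described for buf_page_check_corrupt() can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the InnoDB code: `PageCheck`, `PageState`, `check_page()` and `must_stop_processing()` are hypothetical names, and the real function operates on page frames, computes the checksums itself, and returns dberr_t.

```cpp
#include <cassert>

// Hypothetical stand-in for the three dberr_t values named in the commit message.
enum class PageCheck { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED };

// Hypothetical summary of what the checksum checks found for one page read.
struct PageState {
  bool was_encrypted;       // page carried a post-encryption checksum
  bool crypt_checksum_ok;   // post-encryption checksum matched before decryption
  bool normal_checksum_ok;  // traditional checksum matched (after decryption, if any)
};

PageCheck check_page(const PageState& p) {
  // A post-encryption checksum can only match if the page was encrypted.
  assert(p.was_encrypted || !p.crypt_checksum_ok);

  if (p.was_encrypted) {
    if (!p.crypt_checksum_ok) {
      return PageCheck::DB_PAGE_CORRUPTED;     // damaged before decryption
    }
    if (!p.normal_checksum_ok) {
      return PageCheck::DB_DECRYPTION_FAILED;  // ciphertext intact, plaintext wrong (e.g. wrong key)
    }
    return PageCheck::DB_SUCCESS;
  }
  return p.normal_checksum_ok ? PageCheck::DB_SUCCESS : PageCheck::DB_PAGE_CORRUPTED;
}

// Hypothetical caller-side rule: any result other than DB_SUCCESS stops
// processing the read, so the caller must not touch the block afterwards.
bool must_stop_processing(PageCheck r) { return r != PageCheck::DB_SUCCESS; }
```

A caller modeled on buf_page_get_gen() would branch on the returned code and return early, instead of dereferencing a bpage that may already have been removed from the LRU list and freed.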
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems

Pages that are encrypted contain a post-encryption checksum at a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption we can use the traditional checksum check to detect whether the page is corrupted or was decrypted with an incorrect key.

Pages that are page compressed do not contain any checksum. Here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or decryption with an incorrect key.

buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum, and calculated_checksum fields from buf_page_t.

buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message that the page is encrypted to the error log.

buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt.

fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true only if the page was really decrypted.

fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. If the post-encryption checksum matches, we also check that the traditional checksum does not match.

fil0fil.ic: Added the missing page types (encrypted and page compressed) to fil_get_page_type_name().

Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fixed test failures caused by buf page corruption injection.
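The read-side ordering described above can be summarized as a small standalone model. It is a sketch, not MariaDB code: `Step` and `read_pipeline()` are hypothetical names, and the handling of unencrypted pages (compressed or not) is an assumption extrapolated from the text rather than something the commit message states.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the steps described in the commit message.
enum class Step { VerifyCryptChecksum, Decrypt, Decompress, TraditionalChecksum };

// Returns the order in which checks and transformations are applied to a page
// read from disk, following the description above.
std::vector<Step> read_pipeline(bool encrypted, bool page_compressed) {
  std::vector<Step> steps;
  if (encrypted && !page_compressed) {
    // Checked first, so that corrupted pages are never decrypted.
    steps.push_back(Step::VerifyCryptChecksum);
  }
  if (encrypted) {
    steps.push_back(Step::Decrypt);
  }
  if (page_compressed) {
    // Page-compressed pages carry no checksum of their own: decompress first.
    steps.push_back(Step::Decompress);
  }
  // Every path ends with the traditional checksum, which also catches a wrong key.
  steps.push_back(Step::TraditionalChecksum);

  // Postcondition: for plain encrypted pages, verification precedes decryption.
  assert(!(encrypted && !page_compressed) ||
         steps.front() == Step::VerifyCryptChecksum);
  return steps;
}
```

The point of the ordering is that a failure before decryption can be reported as corruption, while a failure only after decryption points to a wrong key; page-compressed pages lose that distinction because they have no pre-decryption checksum.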
9 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed

The problem was that bpage was referenced after it had already been freed from the LRU list. Fixed by adding a new variable, encrypted, that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read.

This patch should also address the following test failures and bugs:
MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This lookup is now removed.
MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot
MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot
MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8

Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced them with dict_table_t::file_unreadable. The table's .ibd file is missing if fil_get_space(space_id) returns NULL, and the table is encrypted if it does not.

Removed the dict_table_t::is_corrupted field.

Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page() and dict_stats_save_defrag_stats().

Added test cases for when an encrypted page could be read during redo log crash recovery. Also added a test case for row-compressed BLOBs.

btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL.

buf_page_get_zip(): Issue an error if the page read fails.

buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it.

buf_mark_space_corrupt(): Remove bpage from the LRU list also when it is encrypted.

buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the checksum check shows that the page is corrupted, DB_DECRYPTION_FAILED if the post-encryption checksum matches but the normal page checksum does not match after decryption. In the read case only DB_SUCCESS is possible.

buf_page_io_complete(): Use dberr_t for error handling.

buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails.

btr_pcur_move_to_next_page(): Do not reference the page if it is NULL.

Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are neither corrupted nor failed to decrypt.

Removed buf_page_t::key_version. After page decryption, the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write().

dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted.

dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication.

fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin; if it is not, refuse to start.

PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import.

Fixed the error code in the innodb.innodb test.

Merged the test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test.

Decreased unnecessary complexity in some long-running tests.

Removed the fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe() and fil_get_next_space_safe() functions.

fil_space_verify_crypt_checksum(): Fixed a bug, found using ASAN, where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row-compressed tables. Fixed an out-of-page-frame bug for row-compressed tables in fil_space_verify_crypt_checksum(), also found using ASAN: an incorrect function was called for compressed tables.

Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed.

Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted.

Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted.

Adjusted the test case innodb_bug14147491 so that it no longer expects a crash. Instead, the table is just mostly unusable.

fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. The FilSpace class uses fil_space_acquire_low directly.

recv_apply_hashed_log_recs() does not return anything.
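The @return contract of buf_page_check_corrupt() described above amounts to a small decision table. The C++ sketch below models it with plain booleans. read_status, page_checks and classify_page_read() are hypothetical names, not InnoDB's API. The real function inspects the page frame and tablespace rather than taking pre-computed flags.

```cpp
#include <cassert>

// Hypothetical stand-in for the three dberr_t outcomes named above
// (DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED).
enum class read_status { success, page_corrupted, decryption_failed };

// Hypothetical summary of the checks performed on one page after a read.
struct page_checks {
  bool encrypted;              // page is marked as encrypted
  bool post_crypt_checksum_ok; // checksum over the still-encrypted frame
  bool plain_checksum_ok;      // traditional checksum after decryption
};

// Classify a page read according to the @return contract described above.
read_status classify_page_read(const page_checks& c) {
  if (!c.encrypted) {
    // Unencrypted page: only the traditional checksum matters.
    return c.plain_checksum_ok ? read_status::success
                               : read_status::page_corrupted;
  }
  if (!c.post_crypt_checksum_ok) {
    // The encrypted frame itself is damaged: ordinary corruption.
    return read_status::page_corrupted;
  }
  // The encrypted frame is intact, so a failing plaintext checksum most
  // likely means decryption did not restore the original page (e.g. wrong key).
  return c.plain_checksum_ok ? read_status::success
                             : read_status::decryption_failed;
}
```

Separating "frame damaged" from "frame intact but plaintext wrong" is what lets the caller report a decryption failure, and not generic corruption, when the key is wrong.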
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems

Encrypted pages store the post-encryption checksum in a different location than the normal checksum fields. Therefore, we should check this checksum before decryption, to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key.

Page-compressed pages do not contain any checksum. For them we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or decryption with an incorrect key.

buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped.

buf0buf.h, buf_block_init(), buf_page_init_low(): Removed the unnecessary page_encrypted, page_compressed, stored_checksum and calculated_checksum fields from buf_page_t.

buf_page_get_gen(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional checksum method, buf_page_is_corrupted(). If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log saying that the page is encrypted.

buf_page_io_complete(): Use the new buf_page_check_corrupt() function to detect corrupted pages.

buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt.

fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true if we really decrypted the page.

fil_space_verify_crypt_checksum(): Rewritten to use the same method that is used when calculating the post-encryption checksum. If the post-encryption checksum matches, we also check that the traditional checksum check does not match.

fil0fil.ic: Added the missing page types "encrypted" and "page compressed" to fil_get_page_type_name().

Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV.

Fixed test failures caused by buf page corruption injection.
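The commit above fixes an order of checks. For an encrypted page, the post-encryption checksum is verified first, then the page is decrypted, then the traditional checksum is checked. For a page-compressed page, which carries no checksum, the page is decrypted, then decompressed, and only then checked. The C++ sketch below encodes that order with injectable hooks. read_hooks and verify_after_read() are hypothetical names, and the hooks only stand in for roles played by functions such as fil_space_verify_crypt_checksum(), fil_space_decrypt() and buf_page_is_corrupted(), whose real signatures differ.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hooks; each returns true on success.
struct read_hooks {
  std::function<bool()> post_crypt_checksum_ok; // checksum of encrypted frame
  std::function<bool()> decrypt;
  std::function<bool()> decompress;
  std::function<bool()> normal_checksum_ok;     // traditional checksum
};

// Returns true if the page is usable. 'encrypted' and 'page_compressed'
// describe the page as it was read from disk.
bool verify_after_read(bool encrypted, bool page_compressed,
                       const read_hooks& h) {
  // Only non-page-compressed encrypted pages carry a post-encryption
  // checksum, so check it before spending effort on decryption.
  if (encrypted && !page_compressed && !h.post_crypt_checksum_ok())
    return false;
  if (encrypted && !h.decrypt()) return false;
  if (page_compressed && !h.decompress()) return false;
  // The final plaintext check catches both corruption and a wrong key.
  return h.normal_checksum_ok();
}
```

Short-circuiting on the cheap checksum avoids decrypting a frame that is already known to be damaged.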
Merge Google encryption

commit 195158e9889365dc3298f8c1f3bcaa745992f27f
Author: Minli Zhu <minliz@google.com>
Date: Mon Nov 25 11:05:55 2013 -0800

InnoDB redo log encryption/decryption.
Use the start LSN of a log block as part of the AES CTR counter.
Record the key version with each checkpoint. Internally, key version 0 means no encryption.
Tests done (see test_innodb_log_encryption.sh for details):
- Verify that turning the innodb_encrypt_log flag on or off, combined with various key versions passed through the CLI or set dynamically after startup, does not corrupt the database. This includes going from unencrypted to encrypted, and from encrypted to unencrypted.
- Verify that start-up with no redo logs succeeds.
- Verify that a fresh start-up succeeds.

Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612

commit c1b97273659f07866758c25f4a56f680a1fbad24
Author: Jonas Oreland <jonaso@google.com>
Date: Tue Dec 3 18:47:27 2013 +0100

Encryption of Aria data & index files.
This patch implements encryption of Aria data & index files. It is implemented as follows:
1) add read/write hooks (renamed from callbacks) that do encrypt/decrypt (also add pre_read and post_write hooks)
2) modify page headers for data/index to contain the key version (making the data-page header size different with and without encryption)
3) modify index page 0 to contain the IV (and crypt header)
4) AES CTR crypt functions
5) the counter block is implemented using a combination of page number, LSN and a table-specific id

NOTE:
1) Log files are not encrypted. This is not needed if Aria is only used for internal temporary tables and they are not transactional (i.e. not logged).
2) All encrypted tables use PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not CHECKSUM:ed.
3) This patch adds insert-order semantics to Aria block_format. The default behaviour of Aria block-format is best-fit, meaning that rows get allocated to pages trying to fill the pages as much as possible. However, certain SQL constructs materialize temporary results in tmp-tables and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by the SQL layer.

CHANGES:
1) Found a bug in ma_write that made the code try to abort a record that was never written; unsure why this is not exposed.

Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509

commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc
Author: Jonas Oreland <jonaso@google.com>
Date: Mon Feb 17 08:04:50 2014 -0800

Implement encryption of InnoDB datafiles.
Pages are encrypted before being written to disk and decrypted when read from disk. Each page except the first page (page 0) in a tablespace is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer. The encryption is performed using AES CTR.
Monitoring of encryption is enabled using the new IS table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed} have been added.
The following tunables are introduced:
- innodb_encrypt_tables
- innodb_encryption_threads
- innodb_encryption_rotate_key_age
- innodb_encryption_rotation_iops

Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2

commit a17eef2f6948e58219c9e26fc35633d6fd4de1de
Author: Andrew Ford <andrewford@google.com>
Date: Thu Jan 2 15:43:09 2014 -0800

Key management skeleton with debug hooks.

Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866

commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1
Author: Andrew Ford <andrewford@google.com>
Date: Mon Oct 28 16:27:44 2013 -0700

Add AES-128 CTR and GCM encryption classes.

Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
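The datafile encryption commit above stores a 32-bit key version and a 32-bit post-encryption checksum in the 8-byte FIL_PAGE_FILE_FLUSH_LSN field of every page except page 0. The C++ sketch below shows one way to pack and unpack those two halves. Several details are assumptions, not taken from the commit: the byte offset 26, the big-endian byte order, and the choice of which half holds which value all follow the usual InnoDB page header layout. stamp_crypt_fields() and the other helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed byte offset of the 8-byte FIL_PAGE_FILE_FLUSH_LSN header field.
constexpr std::size_t kFlushLsnOffset = 26;

// Big-endian 32-bit store, in the style of InnoDB's mach_write_to_4().
void write_be32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}

// Big-endian 32-bit load, in the style of mach_read_from_4().
std::uint32_t read_be32(const unsigned char* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Assumed split: key version in the first 4 bytes of the field,
// post-encryption checksum in the last 4 bytes. Not applied to page 0.
void stamp_crypt_fields(unsigned char* page, std::uint32_t key_version,
                        std::uint32_t post_crypt_checksum) {
  write_be32(page + kFlushLsnOffset, key_version);
  write_be32(page + kFlushLsnOffset + 4, post_crypt_checksum);
}

std::uint32_t page_key_version(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset);
}

std::uint32_t page_post_crypt_checksum(const unsigned char* page) {
  return read_be32(page + kFlushLsnOffset + 4);
}
```

Reusing an existing header field instead of adding new bytes leaves the page layout unchanged. Per the commit, page 0 is excluded because it stays unencrypted and holds the tablespace IV.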
11 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
11 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain post encryption checksum on different location that normal checksum fields. Therefore, we should before decryption check this checksum to avoid unencrypting corrupted pages. After decryption we can use traditional checksum check to detect if page is corrupted or unencryption was done using incorrect key. Pages that are page compressed do not contain any checksum, here we need to fist unencrypt, decompress and finally use tradional checksum check to detect page corruption or that we used incorrect key in unencryption. buf0buf.cc: buf_page_is_corrupted() mofified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed unnecessary page_encrypted, page_compressed, stored_checksum, valculated_checksum fields from buf_page_t buf_page_get_gen(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If page was not yet decrypted check if post encryption checksum still matches. If page is not anymore encrypted, use buf_page_is_corrupted() traditional checksum method. If page is detected as corrupted and it is not encrypted we print corruption message to error log. If page is still encrypted or it was encrypted and now corrupted, we will print message that page is encrypted to error log. buf_page_io_complete(): use new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify post encryption checksum before tring to decrypt. fil0crypt.cc: fil_encrypt_buf() verify post encryption checksum and ind fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating post encryption checksum. We also check if post encryption checksum matches that traditional checksum check does not match. 
fil0fil.ic: Add missed page type encrypted and page compressed to fil_get_page_type_name() Note that this change does not yet fix innochecksum tool, that will be done in separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use the start LSN of a log block as part of the AES CTR counter. Record the key version with each checkpoint. Internally, key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for details): - Verify that turning innodb_encrypt_log on or off, combined with various key versions passed through the CLI or set dynamically after startup, does not corrupt the database. This includes going from unencrypted to encrypted and from encrypted to unencrypted. - Verify that start-up with no redo logs succeeds. - Verify that a fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that do encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain the key version (making the data-page header size different with and without encryption) 3) modify index page 0 to contain the IV (and crypt header) 4) AES CTR crypt functions 5) the counter block is implemented using a combination of page no, lsn and a table-specific id NOTE: 1) log files are not encrypted; this is not needed if aria is only used for internal temporary tables and they are not transactional (i.e. not logged) 2) all encrypted tables use PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows get allocated to pages so as to fill the pages as much as possible.
However, certain SQL constructs materialize temporary results in tmp-tables and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by the SQL layer. CHANGES: 1) found a bug in ma_write that made the code try to abort a record that was never written; unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before being written to disk and decrypted when read from disk. Every page in a tablespace except the first page (page 0) is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer. The encryption is performed using AES CTR. Monitoring of encryption is enabled using the new IS table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed } have been added. The following tunables are introduced: - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks.
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
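The datafile-encryption message above says the 8-byte FIL_PAGE_FILE_FLUSH_LSN field of every page except page 0 is split into a 32-bit key version and a 32-bit post-encryption checksum. The sketch below only illustrates that split. The offset 26, the 16 KiB page size, the order of the two halves, and the big-endian encoding (in the spirit of InnoDB's mach_write_to_4()) are assumptions, and the helper names are hypothetical rather than InnoDB functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed offset of the 8-byte FIL_PAGE_FILE_FLUSH_LSN field in the page header.
constexpr std::size_t kFlushLsnOffset = 26;
constexpr std::size_t kPageSize = 16384;  // assumed default page size

using Page = std::array<std::uint8_t, kPageSize>;

// Big-endian 32-bit store and load.
void write_be32(std::uint8_t* p, std::uint32_t v) {
  p[0] = static_cast<std::uint8_t>(v >> 24);
  p[1] = static_cast<std::uint8_t>(v >> 16);
  p[2] = static_cast<std::uint8_t>(v >> 8);
  p[3] = static_cast<std::uint8_t>(v);
}

std::uint32_t read_be32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Hypothetical helper: first half = key version, second half = checksum
// computed over the already-encrypted page.
void stamp_encrypted_page(Page& page, std::uint32_t key_version,
                          std::uint32_t post_encryption_checksum) {
  // Assumption: reuse the redo-log convention that key version 0 means
  // "not encrypted", so an encrypted page never carries version 0.
  assert(key_version != 0);
  write_be32(page.data() + kFlushLsnOffset, key_version);
  write_be32(page.data() + kFlushLsnOffset + 4, post_encryption_checksum);
}

std::uint32_t page_key_version(const Page& page) {
  return read_be32(page.data() + kFlushLsnOffset);
}

std::uint32_t page_crypt_checksum(const Page& page) {
  return read_be32(page.data() + kFlushLsnOffset + 4);
}
```

Storing the key version per page, instead of once per tablespace, is what allows pages written under different keys to coexist while key rotation is in progress.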
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain a post-encryption checksum in a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key. Pages that are page compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or that an incorrect key was used for decryption. buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed the unnecessary page_encrypted, page_compressed, stored_checksum, calculated_checksum fields from buf_page_t buf_page_get_gen(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message that the page is encrypted to the error log. buf_page_io_complete(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt. fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewritten to use the method used when calculating the post-encryption checksum. We also check, when the post-encryption checksum matches, that the traditional checksum check does not match.
fil0fil.ic: Add the missing page types (encrypted and page compressed) to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
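The MDEV-11759 message describes an ordering rule: for encrypted pages, check the post-encryption checksum before decrypting; for page-compressed pages, which carry no checksum of their own, decrypt and decompress first and only then apply the traditional checksum. A minimal sketch of that ordering follows. The PageOps hooks and check_page() are hypothetical stand-ins, not InnoDB APIs, and treating a traditional-checksum mismatch after decryption as "wrong key or corruption" follows the message's wording.

```cpp
#include <cassert>
#include <functional>

enum class PageCheck { kOk, kCorrupted, kDecryptionFailed };

// Hypothetical hooks standing in for the real checks; NOT InnoDB APIs.
struct PageOps {
  std::function<bool()> post_encryption_checksum_ok;  // checksum stored after encryption
  std::function<void()> decrypt;
  std::function<bool()> decompress;                   // true on success
  std::function<bool()> traditional_checksum_ok;      // checksum of the plain page
};

PageCheck check_page(bool encrypted, bool page_compressed, const PageOps& ops) {
  if (page_compressed) {
    // No checksum is usable before decompression: decrypt (if needed),
    // decompress, then rely on the traditional checksum.
    if (encrypted) ops.decrypt();
    if (!ops.decompress()) return PageCheck::kCorrupted;
    if (ops.traditional_checksum_ok()) return PageCheck::kOk;
    return encrypted ? PageCheck::kDecryptionFailed : PageCheck::kCorrupted;
  }
  if (encrypted) {
    // Verify the post-encryption checksum first, so that corrupted
    // ciphertext is never passed to the decryption routine.
    if (!ops.post_encryption_checksum_ok()) return PageCheck::kCorrupted;
    ops.decrypt();
    // A mismatch now points to corruption or an incorrect key.
    return ops.traditional_checksum_ok() ? PageCheck::kOk
                                         : PageCheck::kDecryptionFailed;
  }
  return ops.traditional_checksum_ok() ? PageCheck::kOk : PageCheck::kCorrupted;
}
```

Checking the cheap post-encryption checksum first avoids spending a decryption on a page that is already known to be damaged, and it lets the caller tell on-disk corruption apart from a key problem.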
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off Background: Key rotation is based on background threads (innodb-encryption-threads) periodically going through all tablespaces in fil_system. For each tablespace, the currently used key version is compared to the maximum key age (innodb-encryption-rotate-key-age). This process naturally takes CPU. Similarly, the need for scrubbing is investigated at the same time. Currently, key rotation is fully supported only by the Amazon AWS key management plugin, but InnoDB has no knowledge of which key management plugin is used. This patch re-purposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to a special list for key rotation, and key rotation is based on sending an event to the background encryption threads instead of periodic checking (i.e. timeout). fil0fil.cc: Added functions fil_space_acquire_low() to acquire a tablespace when it could be dropped concurrently. This function is used from fil_space_acquire() or fil_space_acquire_silent(); the latter does not print any messages if we try to acquire a space that does not exist. fil_space_release() to release an acquired tablespace. fil_space_next() to iterate tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() to iterate the new list fil_system->rotation_list, where new tables are added if key rotation is disabled. Removed unnecessary functions fil_get_first_space_safe() fil_get_next_space_safe() fil_node_open_file(): After page 0 is read, also read crypt_info if it is not yet read.
btr_scrub_lock_dict_func() buf_page_check_corrupt() buf_page_encrypt_before_write() buf_merge_or_delete_for_page() lock_print_info_all_transactions() row_fts_psort_info_init() row_truncate_table_for_mysql() row_drop_table_for_mysql() Use fil_space_acquire()/release() to access fil_space_t. buf_page_decrypt_after_read(): Use fil_space_get_crypt_data() because at this point we might not yet have read page 0. fil0crypt.cc/fil0fil.h: Lots of changes. Pass fil_space_t* directly to functions needing it and store fil_space_t* in the rotation state. Use fil_space_acquire()/release() when iterating tablespaces and removed the unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to a tablespace should be stopped. Removed unnecessary fil_space_get_crypt_data(). fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and a new table with encryption enabled is created. Remove unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(). fil_space_acquire() and fil_space_release() are used instead. Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc. fsp_header_init(): Acquire fil_space_t*, write crypt_data and release space. check_table_options() Renamed FIL_SPACE_ENCRYPTION_* to FIL_ENCRYPTION_* i_s.cc: Added ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show the current status of key rotation.
9 years ago
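The acquire/release protocol introduced by MDEV-11738 lets a thread pin a tablespace that might be dropped concurrently, and refuses new acquisitions once the space is stopping. The sketch below is a simplified single-mutex model of that idea, not the InnoDB implementation. The field names n_pending_ops and stop_new_ops are borrowed from the MDEV-12602 message elsewhere on this page; the Space and SpaceRegistry types, begin_stop(), and the use of one std::mutex in place of fil_system->mutex are assumptions made for illustration.

```cpp
#include <cassert>
#include <mutex>

// Simplified model of a tablespace that can be pinned by readers.
struct Space {
  int n_pending_ops = 0;      // number of threads currently holding the space
  bool stop_new_ops = false;  // set when the space is being dropped or truncated
  bool is_stopping() const { return stop_new_ops; }
};

class SpaceRegistry {
 public:
  // Like fil_space_acquire_silent(): returns nullptr instead of a space
  // that is being stopped, without printing anything.
  Space* acquire(Space& s) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (s.is_stopping()) return nullptr;
    ++s.n_pending_ops;
    return &s;
  }

  void release(Space* s) {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(s->n_pending_ops > 0);
    --s->n_pending_ops;
  }

  // Hypothetical drop-side helper: block new acquisitions and report
  // whether the space may be freed right away.
  bool begin_stop(Space& s) {
    std::lock_guard<std::mutex> guard(mutex_);
    s.stop_new_ops = true;
    return s.n_pending_ops == 0;
  }

 private:
  std::mutex mutex_;  // stands in for fil_system->mutex
};
```

An iterator such as fil_space_next() can be pictured as acquiring the next space before releasing the current one, so the walk never touches a space that has already been freed.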
MDEV-12253: Buffer pool blocks are accessed after they have been freed The problem was that bpage was referenced after it had already been freed from the LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read. This patch should also address the following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. The table's ibd file is missing if fil_get_space(space_id) returns NULL, and encrypted if not. Removed the dict_table_t::is_corrupted field. Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added a test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL. buf_page_get_zip(): Issue an error if the page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it. buf_mark_space_corrupt(): remove bpage from the LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the page is corrupted according to the checksum check, DB_DECRYPTION_FAILED if the post-encryption checksum matches but after decryption the normal page checksum does not match. In the read case, only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails. btr_pcur_move_to_next_page(): Do not reference the page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists and the pages read from it are not corrupted and did not fail decryption. Removed buf_page_t::key_version. After page decryption the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin, and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import. Fixed the error code in the innodb.innodb test. Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Decreased unnecessary complexity in some long-running tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed a bug found using ASAN where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row compressed tables. Fixed an out-of-page-frame bug for row compressed tables in fil_space_verify_crypt_checksum(), found using ASAN: an incorrect function was called for compressed tables. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery cannot be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is just mostly unusable. fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. The FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
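MDEV-12253 makes buf_page_check_corrupt() report one of three outcomes (DB_SUCCESS, DB_PAGE_CORRUPTED, DB_DECRYPTION_FAILED) and has callers stop on an error instead of touching a block that may already have been freed, recording the problem through dict_table_t::file_unreadable. The sketch below models only that classification and the caller-side flag. DbErr, classify_read(), TableSketch and read_page() are hypothetical stand-ins, not InnoDB definitions, and marking the table unreadable on every error is a simplifying assumption.

```cpp
#include <cassert>

// Stand-ins for the dberr_t values named in the commit message.
enum class DbErr { kSuccess, kPageCorrupted, kDecryptionFailed };

// was_encrypted:     the page carried encryption metadata when it was read
// crypt_checksum_ok: the post-encryption checksum matched
// page_checksum_ok:  the traditional checksum matched (after decryption, if any)
DbErr classify_read(bool was_encrypted, bool crypt_checksum_ok,
                    bool page_checksum_ok) {
  if (was_encrypted && !crypt_checksum_ok) return DbErr::kPageCorrupted;
  if (!page_checksum_ok) {
    // Ciphertext was intact but plaintext is not: most likely a wrong key.
    return was_encrypted ? DbErr::kDecryptionFailed : DbErr::kPageCorrupted;
  }
  return DbErr::kSuccess;
}

// Simplified model of dict_table_t::file_unreadable / is_readable().
struct TableSketch {
  bool file_unreadable = false;
  bool is_readable() const { return !file_unreadable; }
};

// The caller propagates an error code and records the outcome on the table,
// rather than dereferencing a buffer-pool page that may have been freed.
DbErr read_page(TableSketch& table, bool was_encrypted, bool crypt_ok,
                bool page_ok) {
  DbErr err = classify_read(was_encrypted, crypt_ok, page_ok);
  if (err != DbErr::kSuccess) table.file_unreadable = true;
  return err;
}
```

Returning an error code instead of a pointer is the design point here: once the read fails, there is no block reference left for the caller to misuse.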
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off Background: Key rotation is based on background threads (innodb-encryption-threads) periodically going through all tablespaces on fil_system. For each tablespace current used key version is compared to max key age (innodb-encryption-rotate-key-age). This process naturally takes CPU. Similarly, in same time need for scrubbing is investigated. Currently, key rotation is fully supported on Amazon AWS key management plugin only but InnoDB does not have knowledge what key management plugin is used. This patch re-purposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to special list for key rotation and key rotation is based on sending a event to background encryption threads instead of using periodic checking (i.e. timeout). fil0fil.cc: Added functions fil_space_acquire_low() to acquire a tablespace when it could be dropped concurrently. This function is used from fil_space_acquire() or fil_space_acquire_silent() that will not print any messages if we try to acquire space that does not exist. fil_space_release() to release a acquired tablespace. fil_space_next() to iterate tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() to iterate new list fil_system->rotation_list where new tables. are added if key rotation is disabled. Removed unnecessary functions fil_get_first_space_safe() fil_get_next_space_safe() fil_node_open_file(): After page 0 is read read also crypt_info if it is not yet read. 
btr_scrub_lock_dict_func() buf_page_check_corrupt() buf_page_encrypt_before_write() buf_merge_or_delete_for_page() lock_print_info_all_transactions() row_fts_psort_info_init() row_truncate_table_for_mysql() row_drop_table_for_mysql() Use fil_space_acquire()/release() to access fil_space_t. buf_page_decrypt_after_read(): Use fil_space_get_crypt_data() because at this point we might not yet have read page 0. fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly to functions needing it and store fil_space_t* to rotation state. Use fil_space_acquire()/release() when iterating tablespaces and removed unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to tablespace should be stopped. Removed unnecessary fil_space_get_crypt_data(). fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and new table with encryption enabled is created. Remove unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(). fil_space_acquire() and fil_space_release() are used instead. Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc. fsp_header_init(): Acquire fil_space_t*, write crypt_data and release space. check_table_options() Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_* i_s.cc: Added ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show current status of key rotation.
9 years ago
MDEV-12253: Buffer pool blocks are accessed after they have been freed Problem was that bpage was referenced after it was already freed from LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing page read. This patch should also address following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. Table ibd file is missing if fil_get_space(space_id) returns NULL and encrypted if not. Removed dict_table_t::is_corrupted field. Ported FilSpace class from 10.2 and using that on buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases when enrypted page could be read while doing redo log crash recovery. Also added test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing block that is NULL. buf_page_get_zip(): Issue error if page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we hare freed it. buf_mark_space_corrupt(): remove bpage from LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if page has been read and is not corrupted, DB_PAGE_CORRUPTED if page based on checksum check is corrupted, DB_DECRYPTION_FAILED if page post encryption checksum matches but after decryption normal page checksum does not match. In read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling. 
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue error if page read fails. btr_pcur_move_to_next_page(): Do not reference page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable() that will return true if tablespace exists and pages read from tablespace are not corrupted or page decryption failed. Removed buf_page_t::key_version. After page decryption the key version is not removed from page frame. For unencrypted pages, old key_version is removed at buf_page_encrypt_before_write() dict_stats_update_transient_for_index(), dict_stats_update_transient() Do not continue if table decryption failed or table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that key read from redo log entry is found from encryption plugin and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t as tablespace is not available during import. Fixed error code on innodb.innodb test. Merged test cased innodb-bad-key-change5 and innodb-bad-key-shutdown to innodb-bad-key-change2. Removed innodb-bad-key-change5 test. Decreased unnecessary complexity on some long lasting tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed bug found using ASAN where FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed from row compressed tables. Fixed out of page frame bug for row compressed tables in fil_space_verify_crypt_checksum() found using ASAN. Incorrect function was called for compressed table. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed. 
Added test for restart with innodb-force-recovery=1 when page read on redo-recovery cant be decrypted. Added test for corrupted table where both page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is corrupted. Adjusted the test case innodb_bug14147491 so that it does not anymore expect crash. Instead table is just mostly not usable. fil0fil.h: fil_space_acquire_low is not visible function and fil_space_acquire and fil_space_acquire_silent are inline functions. FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off Background: Key rotation is based on background threads (innodb-encryption-threads) periodically going through all tablespaces on fil_system. For each tablespace current used key version is compared to max key age (innodb-encryption-rotate-key-age). This process naturally takes CPU. Similarly, in same time need for scrubbing is investigated. Currently, key rotation is fully supported on Amazon AWS key management plugin only but InnoDB does not have knowledge what key management plugin is used. This patch re-purposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to special list for key rotation and key rotation is based on sending a event to background encryption threads instead of using periodic checking (i.e. timeout). fil0fil.cc: Added functions fil_space_acquire_low() to acquire a tablespace when it could be dropped concurrently. This function is used from fil_space_acquire() or fil_space_acquire_silent() that will not print any messages if we try to acquire space that does not exist. fil_space_release() to release a acquired tablespace. fil_space_next() to iterate tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() to iterate new list fil_system->rotation_list where new tables. are added if key rotation is disabled. Removed unnecessary functions fil_get_first_space_safe() fil_get_next_space_safe() fil_node_open_file(): After page 0 is read read also crypt_info if it is not yet read. 
btr_scrub_lock_dict_func() buf_page_check_corrupt() buf_page_encrypt_before_write() buf_merge_or_delete_for_page() lock_print_info_all_transactions() row_fts_psort_info_init() row_truncate_table_for_mysql() row_drop_table_for_mysql() Use fil_space_acquire()/release() to access fil_space_t. buf_page_decrypt_after_read(): Use fil_space_get_crypt_data() because at this point we might not yet have read page 0. fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly to functions needing it and store fil_space_t* to rotation state. Use fil_space_acquire()/release() when iterating tablespaces and removed unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to tablespace should be stopped. Removed unnecessary fil_space_get_crypt_data(). fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and new table with encryption enabled is created. Remove unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(). fil_space_acquire() and fil_space_release() are used instead. Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc. fsp_header_init(): Acquire fil_space_t*, write crypt_data and release space. check_table_options() Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_* i_s.cc: Added ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show current status of key rotation.
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain SQL constructs materialize temporary results in tmp-tables and expect that a table scan will later return the rows in the same order they were inserted. This insert-order implementation is only enabled when explicitly requested by the SQL layer. CHANGES: 1) found a bug in ma_write that made the code try to abort a record that was never written; unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of InnoDB datafiles. Pages are encrypted before being written to disk and decrypted when read from disk. Every page in a tablespace except the first page (page 0) is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the doublewrite buffer. The encryption is performed using AES CTR. Monitoring of encryption is enabled using the new IS table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed} have been added. The following tunables are introduced: - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks.
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
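The Aria commit builds the CTR counter block from the page number, LSN and a table-specific id, and the redo-log commit treats key version 0 as "not encrypted". A minimal sketch of both ideas follows. The 16-byte layout (4-byte id, 4-byte page number, 8-byte LSN, big-endian) and every name in it are assumptions for illustration, not the actual Aria or InnoDB format. The reason for mixing in the LSN is that CTR mode must never reuse a counter value under the same key, and a page's LSN advances whenever the page is modified.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte AES-CTR counter block. The commit only says the
// counter combines a table-specific id, the page number and the LSN; this
// particular layout (big-endian, 4 + 4 + 8 bytes) is an assumption.
using CounterBlock = std::array<std::uint8_t, 16>;

// Write the low `bytes` bytes of v into dst, most significant byte first.
static void put_be(std::uint8_t* dst, std::uint64_t v, int bytes) {
  for (int i = bytes - 1; i >= 0; --i) {
    dst[i] = static_cast<std::uint8_t>(v & 0xff);
    v >>= 8;
  }
}

CounterBlock make_counter_block(std::uint32_t table_id, std::uint32_t page_no,
                                std::uint64_t lsn) {
  CounterBlock cb{};
  put_be(cb.data(), table_id, 4);     // bytes 0..3
  put_be(cb.data() + 4, page_no, 4);  // bytes 4..7
  put_be(cb.data() + 8, lsn, 8);      // bytes 8..15
  return cb;
}

// Per the redo-log commit message, key version 0 means "not encrypted".
constexpr std::uint32_t kUnencryptedKeyVersion = 0;

bool is_encrypted(std::uint32_t key_version) {
  return key_version != kUnencryptedKeyVersion;
}
```

Two writes of the same page at different LSNs therefore get different counter blocks, which is the property CTR mode depends on.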
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
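The two-phase free described above (detach the tablespace under the mutex, release the mutex, then wait for n_pending_ios to drain before freeing) can be modelled with an atomic counter. This is a hedged sketch: `Space`, `SpaceCache` and its methods are hypothetical names that only mirror the roles of fil_space_acquire_for_io(), fil_space_release_for_io() and fil_space_free_and_mutex_exit(); the real code differs in detail.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a tablespace with an I/O pin count.
struct Space {
  explicit Space(unsigned space_id) : id(space_id) {}
  unsigned id;
  std::atomic<unsigned> n_pending_ios{0};
};

class SpaceCache {
 public:
  ~SpaceCache() {
    for (auto& kv : spaces_) delete kv.second;  // free anything still cached
  }

  void add(Space* s) {
    std::lock_guard<std::mutex> guard(mutex_);
    spaces_[s->id] = s;
  }

  // Look the space up under the mutex and pin it for I/O. A detached
  // space is no longer found, so no new pins can appear after detach.
  Space* acquire_for_io(unsigned id) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return nullptr;
    it->second->n_pending_ios.fetch_add(1);
    return it->second;
  }

  static void release_for_io(Space* s) { s->n_pending_ios.fetch_sub(1); }

  // Phase 1: detach under the mutex. Phase 2: with the mutex released,
  // wait until existing pins drain to zero, then free the object.
  void free_space(unsigned id) {
    Space* s = nullptr;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = spaces_.find(id);
      if (it == spaces_.end()) return;
      s = it->second;
      spaces_.erase(it);
    }
    while (s->n_pending_ios.load() != 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    delete s;
  }

 private:
  std::mutex mutex_;
  std::map<unsigned, Space*> spaces_;
};
```

Because lookups and the detach both happen under the same mutex, the counter can only go down once the space is detached, which is what makes the final wait safe.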
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems. Pages that are encrypted contain a post-encryption checksum at a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key. Pages that are page-compressed do not contain any checksum; here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or the use of an incorrect key in decryption. buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed the unnecessary page_encrypted, page_compressed, stored_checksum and calculated_checksum fields from buf_page_t. buf_page_get_gen(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message that the page is encrypted to the error log. buf_page_io_complete(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt. fil0crypt.cc: fil_encrypt_buf() verifies the post-encryption checksum, and fil_space_decrypt() returns true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewritten to use the same method that is used when calculating the post-encryption checksum. When the post-encryption checksum matches, we also check whether the traditional checksum check does not match.
fil0fil.ic: Add the missing page types "encrypted" and "page compressed" to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fix test failures caused by buf page corruption injection.
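The read-path ordering from MDEV-11759 can be summarized as a small decision function. The callbacks in `PageOps` and the `ReadResult` values are hypothetical placeholders for the real checks (the post-encryption checksum, decryption, decompression and buf_page_is_corrupted()); only the ordering is taken from the commit message.

```cpp
#include <cassert>
#include <functional>

enum class ReadResult { ok, corrupted, wrong_key_or_corrupted };

// Hypothetical hooks standing in for the real checks; these are not
// InnoDB function signatures.
struct PageOps {
  std::function<bool()> post_encryption_checksum_ok;
  std::function<void()> decrypt;
  std::function<void()> decompress;
  std::function<bool()> traditional_checksum_ok;
};

ReadResult verify_after_read(bool encrypted, bool page_compressed,
                             const PageOps& ops) {
  // Encrypted pages that are not page-compressed carry a checksum computed
  // after encryption: reject damaged ciphertext before decrypting it.
  if (encrypted && !page_compressed && !ops.post_encryption_checksum_ok()) {
    return ReadResult::corrupted;
  }
  if (encrypted) ops.decrypt();           // decrypt first ...
  if (page_compressed) ops.decompress();  // ... then decompress
  // Only now does the traditional page checksum apply. On an encrypted page
  // a mismatch may mean corruption or decryption with the wrong key.
  if (!ops.traditional_checksum_ok()) {
    return encrypted ? ReadResult::wrong_key_or_corrupted
                     : ReadResult::corrupted;
  }
  return ReadResult::ok;
}
```

Checking the post-encryption checksum first avoids decrypting garbage. A failing traditional checksum after decryption cannot by itself distinguish corruption from a wrong key, which is why the sketch reports both possibilities together.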
9 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use the start lsn of a log block as part of the AES CTR counter. Record the key version with each checkpoint. Internally, key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify that the flag innodb_encrypt_log on or off, combined with various key versions passed through the CLI and dynamically set after startup, will not corrupt the database. This includes tests going from unencrypted to encrypted, and from encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that do encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different with and without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CTR crypt functions 5) counter block is implemented using a combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted; this is not needed if aria is only used for internal temporary tables and they are not transactional (i.e. not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc); normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows get allocated to pages trying to fill the pages as much as possible.
However, certain sql constructs materialize temporary results in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by the sql-layer. CHANGES: 1) found a bug in ma_write that made code try to abort a record that was never written; unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before being written to disk and decrypted when read from disk. Each page except the first page (page 0) in a tablespace is encrypted. Page 0 is unencrypted and contains the IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneously. The other 32 bits of the FIL_PAGE_FILE_FLUSH_LSN field contain a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from the double-write-buffer. The encryption is performed using AES CTR. Monitoring of encryption is enabled using the new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition, the new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified, pages_flushed } have been added. The following tunables are introduced: - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks.
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
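The data file encryption commit stores a 32-bit key version and a 32-bit post-encryption checksum in the 8-byte FIL_PAGE_FILE_FLUSH_LSN field of every page except page 0. The following minimal C++ sketch shows that packing. It rests on several assumptions: the key version occupies the first four bytes, integers are big-endian as in InnoDB page headers, the field sits at offset 26, and pages are 16 KiB. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constants: FIL_PAGE_FILE_FLUSH_LSN is assumed to be an
// 8-byte header field at byte offset 26; a 16 KiB page is assumed.
constexpr std::size_t kFlushLsnOffset = 26;
constexpr std::size_t kPageSize = 16384;
using Page = std::array<std::uint8_t, kPageSize>;

// Store multi-byte integers most-significant byte first.
void write_be32(std::uint8_t* p, std::uint32_t v) {
  p[0] = static_cast<std::uint8_t>(v >> 24);
  p[1] = static_cast<std::uint8_t>(v >> 16);
  p[2] = static_cast<std::uint8_t>(v >> 8);
  p[3] = static_cast<std::uint8_t>(v);
}

std::uint32_t read_be32(const std::uint8_t* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Assumed split of the 8-byte slot: key version first, then the checksum
// computed over the encrypted page. The commit excludes page 0.
void store_crypt_fields(Page& page, std::uint32_t page_no,
                        std::uint32_t key_version, std::uint32_t checksum) {
  assert(page_no != 0 && "page 0 holds the IV and stays unencrypted");
  (void)page_no;  // only used by the assertion
  write_be32(page.data() + kFlushLsnOffset, key_version);
  write_be32(page.data() + kFlushLsnOffset + 4, checksum);
}

std::uint32_t load_key_version(const Page& page) {
  return read_be32(page.data() + kFlushLsnOffset);
}

std::uint32_t load_crypt_checksum(const Page& page) {
  return read_be32(page.data() + kFlushLsnOffset + 4);
}
```

Keeping the key version on every page is what lets pages encrypted under different keys coexist in one tablespace during rotation.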
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off Background: Key rotation is based on background threads (innodb-encryption-threads) periodically going through all tablespaces in fil_system. For each tablespace, the currently used key version is compared to the maximum key age (innodb-encryption-rotate-key-age). This process naturally takes CPU. Similarly, the need for scrubbing is investigated at the same time. Currently, key rotation is fully supported only by the Amazon AWS key management plugin, but InnoDB has no knowledge of which key management plugin is used. This patch re-purposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to a special list for key rotation, and key rotation is based on sending an event to the background encryption threads instead of using periodic checking (i.e. a timeout). fil0fil.cc: Added functions fil_space_acquire_low() to acquire a tablespace when it could be dropped concurrently. This function is used from fil_space_acquire() or fil_space_acquire_silent(), the latter of which will not print any messages if we try to acquire a space that does not exist. fil_space_release() to release an acquired tablespace. fil_space_next() to iterate tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() to iterate the new list fil_system->rotation_list, where new tables are added if key rotation is disabled. Removed unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(). fil_node_open_file(): After page 0 is read, also read crypt_info if it is not yet read.
btr_scrub_lock_dict_func(), buf_page_check_corrupt(), buf_page_encrypt_before_write(), buf_merge_or_delete_for_page(), lock_print_info_all_transactions(), row_fts_psort_info_init(), row_truncate_table_for_mysql(), row_drop_table_for_mysql(): Use fil_space_acquire()/release() to access fil_space_t. buf_page_decrypt_after_read(): Use fil_space_get_crypt_data() because at this point we might not yet have read page 0. fil0crypt.cc/fil0fil.h: Many changes. Pass fil_space_t* directly to the functions needing it and store fil_space_t* in the rotation state. Use fil_space_acquire()/release() when iterating tablespaces, and removed the unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to a tablespace should be stopped. Removed unnecessary fil_space_get_crypt_data(). fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and a new table with encryption enabled is created. Remove unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe(); fil_space_acquire() and fil_space_release() are used instead. Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc. fsp_header_init(): Acquire fil_space_t*, write crypt_data and release the space. check_table_options(): Renamed FIL_SPACE_ENCRYPTION_* to FIL_ENCRYPTION_*. i_s.cc: Added ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show the current status of key rotation.
9 years ago
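MDEV-11738 replaces direct list walks with fil_space_acquire(), fil_space_release() and fil_space_next(). The C++ sketch below models that pin/unpin iteration under simplifying assumptions. A single mutex guards both the list and the per-space counter. Space and SpaceList are hypothetical stand-ins for fil_space_t and fil_system, not their real definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>

// Hypothetical, heavily simplified stand-in for fil_space_t.
struct Space {
  std::uint32_t id = 0;
  bool stop_new_ops = false;   // set once a DROP has started
  unsigned n_pending_ops = 0;  // guarded by SpaceList::mutex_
};

// Hypothetical stand-in for fil_system's tablespace list.
class SpaceList {
 public:
  void add(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    spaces_.emplace_back();
    spaces_.back().id = id;
  }

  // Mark a space as being dropped: later acquire()/next() calls skip it.
  void stop(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (Space& s : spaces_)
      if (s.id == id) s.stop_new_ops = true;
  }

  // Modelled on fil_space_acquire(): pin the space unless it is stopping.
  Space* acquire(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (Space& s : spaces_)
      if (s.id == id) return pin(s);
    return nullptr;
  }

  // Modelled on fil_space_release().
  void release(Space* s) {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(s->n_pending_ops > 0);
    --s->n_pending_ops;
  }

  // Modelled on fil_space_next(): unpin prev, then pin the next live space.
  // The pin held on prev is what would let a DROP (not modelled) wait.
  Space* next(Space* prev) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = spaces_.begin();
    if (prev != nullptr) {
      assert(prev->n_pending_ops > 0);
      --prev->n_pending_ops;
      while (it != spaces_.end() && &*it != prev) ++it;
      if (it != spaces_.end()) ++it;
    }
    for (; it != spaces_.end(); ++it)
      if (Space* s = pin(*it)) return s;
    return nullptr;
  }

 private:
  static Space* pin(Space& s) {
    if (s.stop_new_ops) return nullptr;
    ++s.n_pending_ops;
    return &s;
  }

  std::mutex mutex_;
  std::list<Space> spaces_;
};
```

The point is only that a space with stop_new_ops set is skipped by new acquirers, while a space that is already pinned stays valid until its holder moves on.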
MDEV-12602 InnoDB: Failing assertion: space->n_pending_ops == 0 This fixes a regression caused by MDEV-12428. When we introduced a variant of fil_space_acquire() that could increment space->n_pending_ops after space->stop_new_ops was set, the logic of fil_check_pending_operations() was broken. fil_space_t::n_pending_ios: A new field to track read or write access from the buffer pool routines immediately before a block write or after a block read in the file system. fil_space_acquire_for_io(), fil_space_release_for_io(): Similar to fil_space_acquire_silent() and fil_space_release(), but modify fil_space_t::n_pending_ios instead of fil_space_t::n_pending_ops. Adjust a number of places accordingly, and remove some redundant tablespace lookups. The following parts of this fix differ from the 10.2 version of this fix: buf_page_get_corrupt(): Add a tablespace parameter. In 10.2, we already had a two-phase process of freeing fil_space objects (first, fil_space_detach(), then release fil_system->mutex, and finally free the fil_space and fil_node objects). fil_space_free_and_mutex_exit(): Renamed from fil_space_free(). Detach the tablespace from the fil_system cache, release the fil_system->mutex, and then wait for space->n_pending_ios to reach 0, to avoid accessing freed data in a concurrent thread. During the wait, future calls to fil_space_acquire_for_io() will not find this tablespace, and the count can only be decremented to 0, at which point it is safe to free the objects. fil_node_free_part1(), fil_node_free_part2(): Refactored from fil_node_free().
9 years ago
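MDEV-12602 frees a tablespace in two phases. First it detaches the space from the cache under the mutex and releases the mutex. Then it waits for n_pending_ios to drain before freeing. The following C++ sketch illustrates that ordering. SpaceCache, detach() and wait_and_free() are hypothetical names, and the 1 ms polling sleep is an assumption rather than what fil_space_free_and_mutex_exit() actually does.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for fil_space_t, reduced to the I/O pin count.
struct Space {
  explicit Space(std::uint32_t space_id) : id(space_id) {}
  std::uint32_t id;
  std::atomic<unsigned> n_pending_ios{0};
};

// Hypothetical stand-in for the fil_system tablespace cache.
class SpaceCache {
 public:
  void create(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    spaces_[id] = std::make_unique<Space>(id);
  }

  // Modelled on fil_space_acquire_for_io(): look up under the mutex and pin.
  Space* acquire_for_io(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return nullptr;
    it->second->n_pending_ios.fetch_add(1);
    return it->second.get();
  }

  // Modelled on fil_space_release_for_io().
  static void release_for_io(Space* space) {
    space->n_pending_ios.fetch_sub(1);
  }

  // Phase 1: detach from the cache while holding the mutex. After this,
  // acquire_for_io() cannot find the space, so its count can only fall.
  std::unique_ptr<Space> detach(std::uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = spaces_.find(id);
    if (it == spaces_.end()) return nullptr;
    std::unique_ptr<Space> space = std::move(it->second);
    spaces_.erase(it);
    return space;
  }

  // Phase 2: called without the mutex. Wait for in-flight I/O to finish;
  // the object is destroyed when the unique_ptr goes out of scope.
  static void wait_and_free(std::unique_ptr<Space> space) {
    while (space->n_pending_ios.load() != 0)
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }

 private:
  std::mutex mutex_;
  std::map<std::uint32_t, std::unique_ptr<Space>> spaces_;
};
```

Detaching first is what makes the wait finite. Once the space is out of the map, acquire_for_io() returns nullptr for it, so n_pending_ios can only decrease. In a server the I/O threads would call release_for_io() concurrently with the wait.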
MDEV-11759: Encryption code in MariaDB 10.1/10.2 causes compatibility problems Pages that are encrypted contain a post-encryption checksum at a different location than the normal checksum fields. Therefore, we should check this checksum before decryption to avoid decrypting corrupted pages. After decryption, we can use the traditional checksum check to detect whether the page is corrupted or was decrypted using an incorrect key. Pages that are page compressed do not contain any checksum, so here we need to first decrypt, then decompress, and finally use the traditional checksum check to detect page corruption or the use of an incorrect key in decryption. buf0buf.cc: buf_page_is_corrupted() modified so that compressed pages are skipped. buf0buf.h, buf_block_init(), buf_page_init_low(): removed the unnecessary page_encrypted, page_compressed, stored_checksum, calculated_checksum fields from buf_page_t. buf_page_get_gen(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_check_corrupt(): If the page was not yet decrypted, check whether the post-encryption checksum still matches. If the page is no longer encrypted, use the traditional buf_page_is_corrupted() checksum method. If the page is detected as corrupted and it is not encrypted, we print a corruption message to the error log. If the page is still encrypted, or it was encrypted and is now corrupted, we print a message to the error log saying that the page is encrypted. buf_page_io_complete(): use the new buf_page_check_corrupt() function to detect corrupted pages. buf_page_decrypt_after_read(): Verify the post-encryption checksum before trying to decrypt. fil0crypt.cc: fil_encrypt_buf(): verify the post-encryption checksum, and in fil_space_decrypt() return true if we really decrypted the page. fil_space_verify_crypt_checksum(): rewrite to use the method used when calculating the post-encryption checksum. We also check, if the post-encryption checksum matches, that the traditional checksum check does not match.
fil0fil.ic: Add the missing page types encrypted and page compressed to fil_get_page_type_name(). Note that this change does not yet fix the innochecksum tool; that will be done in a separate MDEV. Fix test failures caused by buf page corruption injection.
9 years ago
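MDEV-11759 fixes the order of checks on a page read from disk. The C++ sketch below shows that decision order. The checksum, decryption and decompression routines are passed in as hypothetical callbacks, and the ReadResult values are illustrative, not InnoDB's dberr_t codes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Frame = std::vector<std::uint8_t>;

enum class ReadResult { ok, corrupted, decryption_failed };

// Hypothetical hooks standing in for the real checksum, decryption and
// decompression routines; only the order of the checks is the point here.
struct PageChecks {
  std::function<bool(const Frame&)> post_encryption_checksum_ok;
  std::function<bool(Frame&)> decrypt;
  std::function<bool(Frame&)> decompress;
  std::function<bool(const Frame&)> traditional_checksum_ok;
};

ReadResult verify_after_read(Frame& frame, bool encrypted,
                             bool page_compressed, const PageChecks& c) {
  // Encrypted (not page-compressed) pages carry a checksum over the
  // ciphertext: verify it first so corrupted pages are never decrypted.
  if (encrypted && !page_compressed && !c.post_encryption_checksum_ok(frame))
    return ReadResult::corrupted;
  if (encrypted && !c.decrypt(frame))
    return ReadResult::decryption_failed;
  // Page-compressed pages have no checksum of their own: decompress first.
  if (page_compressed && !c.decompress(frame))
    return ReadResult::corrupted;
  // On the plaintext, a mismatch means corruption or, if encrypted, a wrong key.
  if (!c.traditional_checksum_ok(frame))
    return encrypted ? ReadResult::decryption_failed : ReadResult::corrupted;
  return ReadResult::ok;
}
```

Separating a ciphertext checksum failure from a plaintext checksum failure is what allows the error log to distinguish "page corrupted" from "page is encrypted (possibly wrong key)".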
MDEV-12253: Buffer pool blocks are accessed after they have been freed The problem was that bpage was referenced after it had already been freed from the LRU. Fixed by adding a new variable encrypted that is passed down to buf_page_check_corrupt() and used in buf_page_get_gen() to stop processing the page read. This patch should also address the following test failures and bugs: MDEV-12419: IMPORT should not look up tablespace in PageConverter::validate(). This is now removed. MDEV-10099: encryption.innodb_onlinealter_encryption fails sporadically in buildbot MDEV-11420: encryption.innodb_encryption-page-compression failed in buildbot MDEV-11222: encryption.encrypt_and_grep failed in buildbot on P8 Removed dict_table_t::is_encrypted and dict_table_t::ibd_file_missing and replaced these with dict_table_t::file_unreadable. The table's ibd file is missing if fil_get_space(space_id) returns NULL, and the table is encrypted if not. Removed the dict_table_t::is_corrupted field. Ported the FilSpace class from 10.2 and used it in buf_page_check_corrupt(), buf_page_decrypt_after_read(), buf_page_encrypt_before_write(), buf_dblwr_process(), buf_read_page(), dict_stats_save_defrag_stats(). Added test cases for when an encrypted page could be read while doing redo log crash recovery. Also added a test case for row compressed blobs. btr_cur_open_at_index_side_func(), btr_cur_open_at_rnd_pos_func(): Avoid referencing a block that is NULL. buf_page_get_zip(): Issue an error if the page read fails. buf_page_get_gen(): Use dberr_t for error detection and do not reference bpage after we have freed it. buf_mark_space_corrupt(): remove bpage from the LRU also when it is encrypted. buf_page_check_corrupt(): @return DB_SUCCESS if the page has been read and is not corrupted, DB_PAGE_CORRUPTED if the page is corrupted based on the checksum check, DB_DECRYPTION_FAILED if the page's post-encryption checksum matches but after decryption the normal page checksum does not match. In the read case only DB_SUCCESS is possible. buf_page_io_complete(): use dberr_t for error handling.
buf_flush_write_block_low(), buf_read_ahead_random(), buf_read_page_async(), buf_read_ahead_linear(), buf_read_ibuf_merge_pages(), buf_read_recv_pages(), fil_aio_wait(): Issue an error if the page read fails. btr_pcur_move_to_next_page(): Do not reference the page if it is NULL. Introduced dict_table_t::is_readable() and dict_index_t::is_readable(), which return true if the tablespace exists, the pages read from it are not corrupted, and page decryption did not fail. Removed buf_page_t::key_version. After page decryption the key version is not removed from the page frame. For unencrypted pages, the old key_version is removed in buf_page_encrypt_before_write(). dict_stats_update_transient_for_index(), dict_stats_update_transient(): Do not continue if table decryption failed or the table is corrupted. dict0stats.cc: Introduced a dict_stats_report_error function to avoid code duplication. fil_parse_write_crypt_data(): Check that the key read from the redo log entry is found in the encryption plugin, and if it is not, refuse to start. PageConverter::validate(): Removed access to fil_space_t, as the tablespace is not available during import. Fixed the error code in the innodb.innodb test. Merged test cases innodb-bad-key-change5 and innodb-bad-key-shutdown into innodb-bad-key-change2. Removed the innodb-bad-key-change5 test. Reduced unnecessary complexity in some long-running tests. Removed fil_inc_pending_ops(), fil_decr_pending_ops(), fil_get_first_space(), fil_get_next_space(), fil_get_first_space_safe(), fil_get_next_space_safe() functions. fil_space_verify_crypt_checksum(): Fixed a bug found using ASAN where the FIL_PAGE_END_LSN_OLD_CHECKSUM field was incorrectly accessed for row compressed tables. Fixed an out-of-page-frame bug for row compressed tables in fil_space_verify_crypt_checksum(), found using ASAN: an incorrect function was called for compressed tables. Added new tests for discard, rename table and drop (we should allow them even when page decryption fails). Alter table rename is not allowed.
Added a test for restart with innodb-force-recovery=1 when a page read during redo recovery can't be decrypted. Added a test for a corrupted table where both the page data and FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION are corrupted. Adjusted the test case innodb_bug14147491 so that it no longer expects a crash; instead, the table is just mostly unusable. fil0fil.h: fil_space_acquire_low is not a visible function, and fil_space_acquire and fil_space_acquire_silent are inline functions. The FilSpace class uses fil_space_acquire_low directly. recv_apply_hashed_log_recs() does not return anything.
9 years ago
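MDEV-12253 folds dict_table_t::is_encrypted and ibd_file_missing into a single file_unreadable flag. It then recovers the reason from whether the tablespace can still be found. The following minimal C++ sketch shows that classification. The Table struct, the Unreadable enum and the std::set standing in for fil_get_space() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, trimmed-down stand-in for dict_table_t: a single
// file_unreadable flag replaces is_encrypted and ibd_file_missing.
struct Table {
  std::uint32_t space_id = 0;
  bool file_unreadable = false;
  bool is_readable() const { return !file_unreadable; }
};

enum class Unreadable { no, ibd_missing, encrypted };

// Following the commit message: an unreadable table whose tablespace is not
// found is missing its .ibd file; otherwise it is treated as encrypted.
// 'open_spaces' is an assumed stand-in for fil_get_space(space_id) != NULL.
Unreadable classify(const Table& table,
                    const std::set<std::uint32_t>& open_spaces) {
  if (table.is_readable()) return Unreadable::no;
  return open_spaces.count(table.space_id) != 0 ? Unreadable::encrypted
                                                : Unreadable::ibd_missing;
}
```

One flag plus a tablespace lookup is enough to tell the two cases apart when reporting errors. That is why the two separate fields could be dropped.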
Merge Google encryption commit 195158e9889365dc3298f8c1f3bcaa745992f27f Author: Minli Zhu <minliz@google.com> Date: Mon Nov 25 11:05:55 2013 -0800 Innodb redo log encryption/decryption. Use start lsn of a log block as part of AES CTR counter. Record key version with each checkpoint. Internally key version 0 means no encryption. Tests done (see test_innodb_log_encryption.sh for detail): - Verify flag innodb_encrypt_log on or off, combined with various key versions passed through CLI, and dynamically set after startup, will not corrupt database. This includes tests from being unencrypted to encrypted, and encrypted to unencrypted. - Verify start-up with no redo logs succeeds. - Verify fresh start-up succeeds. Change-Id: I4ce4c2afdf3076be2fce90ebbc2a7ce01184b612 commit c1b97273659f07866758c25f4a56f680a1fbad24 Author: Jonas Oreland <jonaso@google.com> Date: Tue Dec 3 18:47:27 2013 +0100 encryption of aria data&index files this patch implements encryption of aria data & index files. this is implemented as 1) add read/write hooks (renamed from callbacks) that does encrypt/decrypt (also add pre_read and post_write hooks) 2) modify page headers for data/index to contain key version (making the data-page header size different for with/without encryption) 3) modify index page 0 to contain IV (and crypt header) 4) AES CRT crypt functions 5) counter block is implemented using combination of page no, lsn and table specific id NOTE: 1) log files are not encrypted, this is not needed for if aria is only used for internal temporary tables and they are not transactional (i.e not logged) 2) all encrypted tables are using PAGE_CHECKSUM (crc) normal internal temporary tables are (currently) not CHECKSUM:ed 3) This patch adds insert-order semantics to aria block_format. The default behaviour of aria block-format is best-fit, meaning that rows gets allocated to page trying to fill the pages as much as possible. 
However, certain sql constructs materialize temporary result in tmp-tables, and expect that a table scan will later return the rows in the same order they were inserted. This implementation of insert-order is only enabled when explicitly requested by sql-layer. CHANGES: 1) found bug in ma_write that made code try to abort a record that was never written unsure why this is not exposed Change-Id: Ia82bbaa92e2c0629c08693c5add2f56b815c0509 commit 89dc1ab651fe0205d55b4eb588f62df550aa65fc Author: Jonas Oreland <jonaso@google.com> Date: Mon Feb 17 08:04:50 2014 -0800 Implement encryption of innodb datafiles. Pages are encrypted before written to disk and decrypted when read from disk. Each page except first page (page 0) in tablespace is encrypted. Page 0 is unencrypted and contains IV for the tablespace. FIL_PAGE_FILE_FLUSH_LSN on each page (except page 0) is used to store a 32-bit key-version, so that multiple keys can be active in a tablespace simultaneous. The other 32-bit of the FIL_PAGE_FILE_FLUSH_LSN field contains a checksum that is computed after encryption. This checksum is used by innochecksum and when restoring from double-write-buffer. The encryption is performed using AES CRT. Monitoring of encryption is enabled using new IS-table INNODB_TABLESPACES_ENCRYPTION. In addition to that new status variables innodb_encryption_rotation_{ pages_read_from_cache, pages_read_from_disk, pages_modified,pages_flushed } has been added. The following tunables are introduces - innodb_encrypt_tables - innodb_encryption_threads - innodb_encryption_rotate_key_age - innodb_encryption_rotation_iops Change-Id: I8f651795a30b52e71b16d6bc9cb7559be349d0b2 commit a17eef2f6948e58219c9e26fc35633d6fd4de1de Author: Andrew Ford <andrewford@google.com> Date: Thu Jan 2 15:43:09 2014 -0800 Key management skeleton with debug hooks. 
Change-Id: Ifd6aa3743d7ea291c70083f433a059c439aed866 commit 68a399838ad72264fd61b3dc67fecd29bbdb0af1 Author: Andrew Ford <andrewford@google.com> Date: Mon Oct 28 16:27:44 2013 -0700 Add AES-128 CTR and GCM encryption classes. Change-Id: I116305eced2a233db15306bc2ef5b9d398d1a3a2
11 years ago
/*****************************************************************************
Copyright (c) 1995, 2016, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2008, Google Inc.
Copyright (c) 2013, 2017, MariaDB Corporation.
Portions of this file contain modifications contributed and copyrighted by
Google, Inc. Those modifications are gratefully acknowledged and are described
briefly in the InnoDB documentation. The contributions by Google are
incorporated with their permission, and subject to the conditions contained in
the file COPYING.Google.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
/**************************************************//**
@file buf/buf0buf.cc
The database buffer buf_pool
Created 11/5/1995 Heikki Tuuri
*******************************************************/
#include "buf0buf.h"
#ifdef UNIV_NONINL
#include "buf0buf.ic"
#endif
#include "mem0mem.h"
#include "btr0btr.h"
#include "fil0fil.h"
#include "fil0crypt.h"
#ifndef UNIV_HOTBACKUP
#include "buf0buddy.h"
#include "lock0lock.h"
#include "btr0sea.h"
#include "ibuf0ibuf.h"
#include "trx0undo.h"
#include "log0log.h"
#endif /* !UNIV_HOTBACKUP */
#include "srv0srv.h"
#include "dict0dict.h"
#include "log0recv.h"
#include "page0zip.h"
#include "srv0mon.h"
#include "buf0checksum.h"
#ifdef HAVE_LIBNUMA
#include <numa.h>
#include <numaif.h>
#endif // HAVE_LIBNUMA
#include "fil0pagecompress.h"
#include "ha_prototypes.h"
#include "ut0byte.h"
#include <new>
#ifdef UNIV_LINUX
#include <stdlib.h>
#endif
#ifdef HAVE_LZO
#include "lzo/lzo1x.h"
#endif
#ifdef HAVE_SNAPPY
#include "snappy-c.h"
#endif
inline void* aligned_malloc(size_t size, size_t align) {
	void *result;
#ifdef _MSC_VER
	result = _aligned_malloc(size, align);
#else
	if (posix_memalign(&result, align, size)) {
		result = 0;
	}
#endif
	return result;
}

inline void aligned_free(void *ptr) {
#ifdef _MSC_VER
	_aligned_free(ptr);
#else
	free(ptr);
#endif
}
/*
IMPLEMENTATION OF THE BUFFER POOL
=================================

Performance improvement:
------------------------
Thread scheduling in NT may be so slow that the OS wait mechanism should
not be used even in waiting for disk reads to complete.
Rather, we should put waiting query threads into the queue of
waiting jobs, and let the OS thread do something useful while the i/o
is processed. In this way we could remove most OS thread switches in
an i/o-intensive benchmark like TPC-C.
A possibility is to put a user space thread library between the database
and NT. User space thread libraries might be very fast.
SQL Server 7.0 can be configured to use 'fibers' which are lightweight
threads in NT. These should be studied.

Buffer frames and blocks
------------------------
Following the terminology of Gray and Reuter, we call the memory
blocks where file pages are loaded buffer frames. For each buffer
frame there is a control block, or block for short, in the buffer
control array. The control information that does not need to be stored
in the file along with the file page resides in the control block.

Buffer pool struct
------------------
The buffer buf_pool contains a single mutex which protects all the
control data structures of the buf_pool. The content of a buffer frame is
protected by a separate read-write lock in its control block, though.
These locks can be locked and unlocked without owning the buf_pool->mutex.
The OS events in the buf_pool struct can be waited for without owning the
buf_pool->mutex.
The buf_pool->mutex is a hot-spot in main memory, causing a lot of
memory bus traffic on multiprocessor systems when processors
alternately access the mutex. On our Pentium, the mutex is accessed
maybe every 10 microseconds. We gave up, for instance, the idea of
having a separate mutex for each control block, because it seemed
too complicated.
A solution to reduce mutex contention of the buf_pool->mutex is to
create a separate mutex for the page hash table. On Pentium,
accessing the hash table takes 2 microseconds, about half
of the total buf_pool->mutex hold time.

Control blocks
--------------
The control block contains, for instance, the bufferfix count
which is incremented when a thread wants a file page to be fixed
in a buffer frame. The bufferfix operation does not lock the
contents of the frame, however. For this purpose, the control
block contains a read-write lock.
The buffer frames have to be aligned so that the start memory
address of a frame is divisible by the universal page size, which
is a power of two.
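
	// Illustrative sketch only, not InnoDB code: because the page size
	// is a power of two, a frame address is aligned exactly when its
	// low bits are zero, and rounding an address up to the next frame
	// boundary is a single mask operation. sketch_align_up() and
	// sketch_is_aligned() are hypothetical helpers.
	#include <cstdint>

	// Round addr up to a multiple of page_size (must be a power of two).
	inline std::uintptr_t sketch_align_up(std::uintptr_t addr,
					      std::uintptr_t page_size)
	{
		return (addr + page_size - 1) & ~(page_size - 1);
	}

	// True if addr is divisible by page_size (must be a power of two).
	inline bool sketch_is_aligned(std::uintptr_t addr,
				      std::uintptr_t page_size)
	{
		return (addr & (page_size - 1)) == 0;
	}
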
We intend to make the buffer buf_pool size on-line reconfigurable,
that is, the buf_pool size can be changed without closing the database.
Then the database administrator may adjust it to be bigger
at night, for example. The control block array must
contain enough control blocks for the maximum buffer buf_pool size
which is used in the particular database.
If the buf_pool size is cut, we exploit the virtual memory mechanism of
the OS, and just refrain from using frames at high addresses. Then the OS
can swap them to disk.
The control blocks containing file pages are put into a hash table
according to the file address of the page.
We could speed up the access to an individual page by using
"pointer swizzling": we could replace the page references on
non-leaf index pages by direct pointers to the page, if it exists
in the buf_pool. We could make a separate hash table where we could
chain all the page references in non-leaf pages residing in the buf_pool,
using the page reference as the hash key,
and at the time of reading of a page update the pointers accordingly.
Drawbacks of this solution are added complexity and,
possibly, extra space required on non-leaf pages for memory pointers.
A simpler solution is just to speed up the hash table mechanism
in the database, using tables whose size is a power of 2.

Lists of blocks
---------------
There are several lists of control blocks.
The free list (buf_pool->free) contains blocks which are currently not
used.
The common LRU list contains all the blocks holding a file page
except those for which the bufferfix count is non-zero.
The pages are in the LRU list roughly in the order of the last
access to the page, so that the oldest pages are at the end of the
list. We also keep a pointer to a position near the end of the LRU list,
which we can use when we want to artificially age a page in the
buf_pool. This is used if we know that some page is not needed
again for some time: we insert the block right after the pointer,
causing it to be replaced sooner than would normally be the case.
Currently this aging mechanism is used for read-ahead of pages,
and it can also be used when there is a scan of a full
table which cannot fit in the memory. By putting the pages near the
end of the LRU list, we make sure that most of the buf_pool stays
in the main memory, undisturbed.
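
	// Illustrative sketch only, not InnoDB code: an LRU list with a
	// pointer near its end. A normal access moves a page to the head;
	// a page that is artificially aged (read-ahead, full table scans)
	// is inserted at the pointer, so it reaches the tail and is
	// replaced sooner. SketchLRU and old_len are hypothetical; ints
	// stand in for control blocks.
	#include <algorithm>
	#include <cstddef>
	#include <iterator>
	#include <list>

	class SketchLRU {
	public:
		explicit SketchLRU(std::size_t old_len) : old_len_(old_len) {}

		// Normal access: the page becomes the youngest.
		void make_young(int page)
		{
			lru_.remove(page);
			lru_.push_front(page);
		}

		// Aging: insert the page so that at most old_len_ pages
		// remain between it and the tail.
		void add_old(int page)
		{
			lru_.remove(page);
			std::size_t n = std::min(lru_.size(), old_len_);
			lru_.insert(std::prev(lru_.end(),
					      static_cast<std::ptrdiff_t>(n)),
				    page);
		}

		// Replacement victim (list must not be empty).
		int tail() const { return lru_.back(); }

		const std::list<int>& pages() const { return lru_; }

	private:
		std::list<int>	lru_;
		std::size_t	old_len_;
	};

	// Usage: after make_young(1), make_young(2), make_young(3) and
	// add_old(9) with old_len 2, the order is 3, 9, 2, 1.
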
The unzip_LRU list contains a subset of the common LRU list. The
blocks on the unzip_LRU list hold a compressed file page and the
corresponding uncompressed page frame. A block is in unzip_LRU if and
only if the predicate buf_page_belongs_to_unzip_LRU(&block->page)
holds. The blocks in unzip_LRU will be in the same order as they are in
the common LRU list. That is, each manipulation of the common LRU
list will result in the same manipulation of the unzip_LRU list.
The chain of modified blocks (buf_pool->flush_list) contains the blocks
holding file pages that have been modified in the memory
but not written to disk yet. The block with the oldest modification
which has not yet been written to disk is at the end of the chain.
The access to this list is protected by buf_pool->flush_list_mutex.
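
	// Illustrative sketch only, not InnoDB code: a block is added to the
	// head of the flush list when it is first modified, and LSNs grow
	// monotonically, so the tail holds the oldest unflushed change and
	// the smallest oldest_modification can be read in O(1). This is the
	// value that buf_pool_get_oldest_modification() collects per
	// instance. SketchFlushList is hypothetical.
	#include <cstdint>
	#include <deque>

	struct SketchDirtyPage {
		std::uint32_t	page_no;
		std::uint64_t	oldest_modification;
	};

	class SketchFlushList {
	public:
		// Caller guarantees lsn >= every LSN already in the list.
		void add(std::uint32_t page_no, std::uint64_t lsn)
		{
			list_.push_front(SketchDirtyPage{page_no, lsn});
		}

		// Oldest modification LSN, or 0 if there are no dirty pages.
		std::uint64_t oldest_lsn() const
		{
			return list_.empty()
				? 0 : list_.back().oldest_modification;
		}

		// Writing the tail page first advances oldest_lsn().
		void flush_oldest()
		{
			if (!list_.empty()) {
				list_.pop_back();
			}
		}

	private:
		std::deque<SketchDirtyPage>	list_;
	};
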
The chain of unmodified compressed blocks (buf_pool->zip_clean)
contains the control blocks (buf_page_t) of those compressed pages
that are not in buf_pool->flush_list and for which no uncompressed
page has been allocated in the buffer pool. The control blocks for
uncompressed pages are accessible via buf_block_t objects that are
reachable via buf_pool->chunks[].
The chains of free memory blocks (buf_pool->zip_free[]) are used by
the buddy allocator (buf0buddy.cc) to keep track of currently unused
memory blocks of size sizeof(buf_page_t)..UNIV_PAGE_SIZE / 2. These
blocks are inside the UNIV_PAGE_SIZE-sized memory blocks of type
BUF_BLOCK_MEMORY that the buddy allocator requests from the buffer
pool. The buddy allocator is solely used for allocating control
blocks for compressed pages (buf_page_t) and compressed page frames.
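
	// Illustrative sketch only, not the buf0buddy.cc code: in a binary
	// buddy allocator every block size is a power of two, and a block of
	// size sz at byte offset off inside its page-sized parent has its
	// buddy at off ^ sz; two free buddies can merge into one block of
	// twice the size. SKETCH_MIN_SIZE and SKETCH_PAGE_SIZE are assumed
	// values, not InnoDB constants.
	#include <cstddef>

	const std::size_t SKETCH_MIN_SIZE = 64;
	const std::size_t SKETCH_PAGE_SIZE = 16384;

	// Size class for an n-byte request, or 0 if n exceeds half a page.
	inline std::size_t sketch_buddy_size(std::size_t n)
	{
		if (n > SKETCH_PAGE_SIZE / 2) {
			return 0;
		}
		std::size_t sz = SKETCH_MIN_SIZE;
		while (sz < n) {
			sz <<= 1;
		}
		return sz;
	}

	// Offset of the buddy of the block at off (off is a multiple of sz).
	inline std::size_t sketch_buddy_of(std::size_t off, std::size_t sz)
	{
		return off ^ sz;
	}
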

Loading a file page
-------------------
First, a victim block for replacement has to be found in the
buf_pool. It is taken from the free list or searched for from the
end of the LRU-list. An exclusive lock is reserved for the frame,
the io_fix field is set in the block fixing the block in buf_pool,
and the io-operation for loading the page is queued. The io-handler thread
releases the X-lock on the frame and resets the io_fix field
when the io operation completes.
A thread may request the above operation using the function
buf_page_get(). It may then continue to request a lock on the frame.
The lock is granted when the io-handler releases the x-lock.
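
	// Illustrative sketch only, not InnoDB code: choosing a block to
	// load a file page into. A free block is preferred; otherwise the
	// LRU list is scanned from its tail for a block that is neither
	// buffer-fixed nor io-fixed. SketchBlock and sketch_get_victim()
	// are hypothetical.
	#include <iterator>
	#include <list>
	#include <vector>

	struct SketchBlock {
		int		id;
		unsigned	buf_fix_count;
		bool		io_fixed;
	};

	// Returns the id of the chosen block, or -1 if none can be used.
	inline int sketch_get_victim(std::vector<int>& free_list,
				     std::list<SketchBlock>& lru)
	{
		if (!free_list.empty()) {
			int id = free_list.back();
			free_list.pop_back();
			return id;
		}
		for (auto it = lru.rbegin(); it != lru.rend(); ++it) {
			if (it->buf_fix_count == 0 && !it->io_fixed) {
				int id = it->id;
				// Erase via the matching forward iterator.
				lru.erase(std::next(it).base());
				return id;
			}
		}
		return -1;
	}
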

Read-ahead
----------
The read-ahead mechanism is intended to be intelligent and
isolated from the semantically higher levels of the database
index management. From the higher level we only need the
information whether a file page has a natural successor or
predecessor page. On the leaf level of a B-tree index,
these are the next and previous pages in the natural
order of the pages.
Let us first explain the read-ahead mechanism when the leaves
of a B-tree are scanned in an ascending or descending order.
When a page that has been read in is referenced in the buf_pool for
the first time, the buffer manager checks if it is at the border of
a so-called linear read-ahead area. The tablespace is divided into these
areas of size 64 blocks, for example. So if the page is at the
border of such an area, the read-ahead mechanism checks if
all the other blocks in the area have been accessed in an
ascending or descending order. If this is the case, the system
looks at the natural successor or predecessor of the page,
checks if that is at the border of another area, and in this case
issues read-requests for all the pages in that area. Maybe
we could relax the condition that all the pages in the area
have to be accessed: if data is deleted from a table, holes of
unused pages may appear in the area.
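
	// Illustrative sketch only, not the buf0rea.cc code: the strict
	// linear read-ahead test described above. access_time[i] is a
	// hypothetical last-access time of page i (0 = never accessed).
	// InnoDB itself relaxes the "all pages accessed" condition with
	// the innodb_read_ahead_threshold setting.
	#include <cstddef>
	#include <cstdint>
	#include <vector>

	// Returns +1 to read the next area, -1 to read the previous area,
	// or 0 for no read-ahead.
	inline int sketch_linear_read_ahead(
		const std::vector<std::uint64_t>& access_time,
		std::size_t page_no, std::size_t area)
	{
		std::size_t low = page_no - page_no % area;
		std::size_t high = low + area;
		if (high > access_time.size()
		    || (page_no != low && page_no != high - 1)) {
			return 0;	// not at a border of a complete area
		}
		bool asc = true;
		bool desc = true;
		for (std::size_t i = low; i < high; i++) {
			if (access_time[i] == 0) {
				return 0;	// some page was never accessed
			}
			if (i > low && access_time[i] < access_time[i - 1]) {
				asc = false;
			}
			if (i > low && access_time[i] > access_time[i - 1]) {
				desc = false;
			}
		}
		if (asc && page_no == high - 1) {
			return 1;
		}
		if (desc && page_no == low) {
			return -1;
		}
		return 0;
	}
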
A different read-ahead mechanism is used when there appears
to be a random access pattern to a file.
If a new page is referenced in the buf_pool, and several pages
of its random access area (for instance, 32 consecutive pages
in a tablespace) have recently been referenced, we may predict
that the whole area may be needed in the near future, and issue
the read requests for the whole area.
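
	// Illustrative sketch only, not the buf0rea.cc code: random
	// read-ahead. If at least threshold pages of the area containing
	// page_no were recently accessed, the whole area is read.
	// recent[] and threshold are hypothetical inputs.
	#include <cstddef>
	#include <vector>

	// Returns the first page of the area to read, or -1 for none.
	inline long sketch_random_read_ahead(const std::vector<bool>& recent,
					     std::size_t page_no,
					     std::size_t area,
					     std::size_t threshold)
	{
		std::size_t low = page_no - page_no % area;
		std::size_t high = low + area;
		if (high > recent.size()) {
			high = recent.size();
		}
		std::size_t n = 0;
		for (std::size_t i = low; i < high; i++) {
			if (recent[i]) {
				n++;
			}
		}
		return n >= threshold ? static_cast<long>(low) : -1;
	}
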
*/
#ifndef UNIV_HOTBACKUP
/** Value in microseconds */
static const int WAIT_FOR_READ = 100;
/** Number of attempts made to read in a page in the buffer pool */
static const ulint BUF_PAGE_READ_MAX_RETRIES = 100;
/** The buffer pools of the database */
UNIV_INTERN buf_pool_t* buf_pool_ptr;
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
static ulint buf_dbg_counter = 0; /*!< This is used to insert validation
					operations in execution in the
					debug version */
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG
/** If this is set to TRUE, the program prints info whenever
read-ahead or flush occurs */
UNIV_INTERN ibool buf_debug_prints = FALSE;
#endif /* UNIV_DEBUG */
#ifdef UNIV_PFS_RWLOCK
/* Keys to register buffer block related rwlocks and mutexes with
performance schema */
UNIV_INTERN mysql_pfs_key_t buf_block_lock_key;
# ifdef UNIV_SYNC_DEBUG
UNIV_INTERN mysql_pfs_key_t buf_block_debug_latch_key;
# endif /* UNIV_SYNC_DEBUG */
#endif /* UNIV_PFS_RWLOCK */
#ifdef UNIV_PFS_MUTEX
UNIV_INTERN mysql_pfs_key_t buffer_block_mutex_key;
UNIV_INTERN mysql_pfs_key_t buf_pool_mutex_key;
UNIV_INTERN mysql_pfs_key_t buf_pool_zip_mutex_key;
UNIV_INTERN mysql_pfs_key_t flush_list_mutex_key;
#endif /* UNIV_PFS_MUTEX */
#if defined UNIV_PFS_MUTEX || defined UNIV_PFS_RWLOCK
# ifndef PFS_SKIP_BUFFER_MUTEX_RWLOCK
/* Buffer block mutexes and rwlocks can be registered
in one group rather than individually. If PFS_GROUP_BUFFER_SYNC
is defined, register buffer block mutex and rwlock
in one group after their initialization. */
# define PFS_GROUP_BUFFER_SYNC
/* This define caps the number of mutexes/rwlocks that can
be registered with performance schema. Developers can
modify this define if necessary. Note that this is
effective only if PFS_GROUP_BUFFER_SYNC is defined. */
# define PFS_MAX_BUFFER_MUTEX_LOCK_REGISTER ULINT_MAX
# endif /* !PFS_SKIP_BUFFER_MUTEX_RWLOCK */
#endif /* UNIV_PFS_MUTEX || UNIV_PFS_RWLOCK */
/** Macro to determine whether the read or write counter is used depending
on the io_type */
#define MONITOR_RW_COUNTER(io_type, counter)		\
	((io_type == BUF_IO_READ)			\
	 ? (counter##_READ)				\
	 : (counter##_WRITTEN))

/** Decrypt a page.
@param[in,out]	bpage	Page control block
@param[in,out]	space	tablespace
@return whether the operation was successful */
static
bool
buf_page_decrypt_after_read(buf_page_t* bpage, fil_space_t* space)
	MY_ATTRIBUTE((nonnull));
/********************************************************************//**
Mark the table whose tablespace is bpage->space as corrupted.
Also remove the bpage from the LRU list.
@param[in,out]	bpage	Block */
static
void
buf_mark_space_corrupt(
	buf_page_t*	bpage);
/* prototypes for new functions added to ha_innodb.cc */
trx_t* innobase_get_trx();

/********************************************************************//**
Gets the smallest oldest_modification lsn for any page in the pool. Returns
zero if all modified pages have been flushed to disk.
@return oldest modification in pool, zero if none */
UNIV_INTERN
lsn_t
buf_pool_get_oldest_modification(void)
/*==================================*/
{
	ulint		i;
	buf_page_t*	bpage;
	lsn_t		lsn = 0;
	lsn_t		oldest_lsn = 0;
	/* When we traverse all the flush lists we don't want another
	thread to add a dirty page to any flush list. */
	log_flush_order_mutex_enter();
	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;
		buf_pool = buf_pool_from_array(i);
		buf_flush_list_mutex_enter(buf_pool);
		bpage = UT_LIST_GET_LAST(buf_pool->flush_list);
		if (bpage != NULL) {
			ut_ad(bpage->in_flush_list);
			lsn = bpage->oldest_modification;
		}
		buf_flush_list_mutex_exit(buf_pool);
		if (!oldest_lsn || oldest_lsn > lsn) {
			oldest_lsn = lsn;
		}
	}
	log_flush_order_mutex_exit();
	/* The returned answer may be out of date: the flush_list can
	change after the mutex has been released. */
	return(oldest_lsn);
}

/********************************************************************//**
Get the total lengths of the LRU, free and flush lists of all buffer pools. */
UNIV_INTERN
void
buf_get_total_list_len(
/*===================*/
	ulint*	LRU_len,	/*!< out: length of all LRU lists */
	ulint*	free_len,	/*!< out: length of all free lists */
	ulint*	flush_list_len)	/*!< out: length of all flush lists */
{
	ulint	i;
	*LRU_len = 0;
	*free_len = 0;
	*flush_list_len = 0;
	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;
		buf_pool = buf_pool_from_array(i);
		*LRU_len += UT_LIST_GET_LEN(buf_pool->LRU);
		*free_len += UT_LIST_GET_LEN(buf_pool->free);
		*flush_list_len += UT_LIST_GET_LEN(buf_pool->flush_list);
	}
}

/********************************************************************//**
Get total list size in bytes from all buffer pools. */
UNIV_INTERN
void
buf_get_total_list_size_in_bytes(
/*=============================*/
	buf_pools_list_size_t*	buf_pools_list_size)	/*!< out: list sizes
							in all buffer pools */
{
	ut_ad(buf_pools_list_size);
	memset(buf_pools_list_size, 0, sizeof(*buf_pools_list_size));
	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;
		buf_pool = buf_pool_from_array(i);
		/* We don't need mutex protection since this is
		for statistics purposes */
		buf_pools_list_size->LRU_bytes += buf_pool->stat.LRU_bytes;
		buf_pools_list_size->unzip_LRU_bytes +=
			UT_LIST_GET_LEN(buf_pool->unzip_LRU) * UNIV_PAGE_SIZE;
		buf_pools_list_size->flush_list_bytes +=
			buf_pool->stat.flush_list_bytes;
	}
}

/********************************************************************//**
Get total buffer pool statistics. */
UNIV_INTERN
void
buf_get_total_stat(
/*===============*/
	buf_pool_stat_t*	tot_stat)	/*!< out: buffer pool stats */
{
	ulint	i;
	memset(tot_stat, 0, sizeof(*tot_stat));
	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_stat_t*	buf_stat;
		buf_pool_t*		buf_pool;
		buf_pool = buf_pool_from_array(i);
		buf_stat = &buf_pool->stat;
		tot_stat->n_page_gets += buf_stat->n_page_gets;
		tot_stat->n_pages_read += buf_stat->n_pages_read;
		tot_stat->n_pages_written += buf_stat->n_pages_written;
		tot_stat->n_pages_created += buf_stat->n_pages_created;
		tot_stat->n_ra_pages_read_rnd += buf_stat->n_ra_pages_read_rnd;
		tot_stat->n_ra_pages_read += buf_stat->n_ra_pages_read;
		tot_stat->n_ra_pages_evicted += buf_stat->n_ra_pages_evicted;
		tot_stat->n_pages_made_young += buf_stat->n_pages_made_young;
		tot_stat->n_pages_not_made_young +=
			buf_stat->n_pages_not_made_young;
	}
}

/********************************************************************//**
Allocates a buffer block.
@return own: the allocated block, in state BUF_BLOCK_MEMORY */
UNIV_INTERN
buf_block_t*
buf_block_alloc(
/*============*/
	buf_pool_t*	buf_pool)	/*!< in/out: buffer pool instance,
					or NULL for round-robin selection
					of the buffer pool */
{
	buf_block_t*	block;
	ulint		index;
	static ulint	buf_pool_index;
	if (buf_pool == NULL) {
		/* We are allocating memory from any buffer pool; spread
		the allocations across all buffer pool instances. */
		index = buf_pool_index++ % srv_buf_pool_instances;
		buf_pool = buf_pool_from_array(index);
	}
	block = buf_LRU_get_free_block(buf_pool);
	buf_block_set_state(block, BUF_BLOCK_MEMORY);
	return(block);
}
  442. #endif /* !UNIV_HOTBACKUP */
  443. /** Check if a page is all zeroes.
  444. @param[in] read_buf database page
  445. @param[in] zip_size ROW_FORMAT=COMPRESSED page size, or 0
  446. @return whether the page is all zeroes */
  447. UNIV_INTERN
  448. bool
  449. buf_page_is_zeroes(const byte* read_buf, ulint zip_size)
  450. {
  451. const ulint page_size = zip_size ? zip_size : UNIV_PAGE_SIZE;
  452. for (ulint i = 0; i < page_size; i++) {
  453. if (read_buf[i] != 0) {
  454. return(false);
  455. }
  456. }
  457. return(true);
  458. }

/** Checks if the page is in crc32 checksum format.
@param[in]	read_buf	database page
@param[in]	checksum_field1	new checksum field
@param[in]	checksum_field2	old checksum field
@return true if the page is in crc32 checksum format */
UNIV_INTERN
bool
buf_page_is_checksum_valid_crc32(
	const byte*	read_buf,
	ulint		checksum_field1,
	ulint		checksum_field2)
{
	ib_uint32_t	crc32 = buf_calc_page_crc32(read_buf);

	if (!(checksum_field1 == crc32 && checksum_field2 == crc32)) {
		DBUG_PRINT("buf_checksum",
			   ("Page checksum crc32 not valid field1 " ULINTPF
			    " field2 " ULINTPF " crc32 %u.",
			    checksum_field1, checksum_field2, crc32));
		return(false);
	}

	return(true);
}
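buf_page_is_checksum_valid_crc32() accepts a page only if both checksum fields equal the single value returned by buf_calc_page_crc32(). InnoDB's "crc32" algorithm uses the CRC-32C (Castagnoli) polynomial. The sketch below is a slow bitwise CRC-32C plus the two-field comparison; it does not reproduce which byte ranges of the page buf_calc_page_crc32() covers, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (reflected polynomial 0x82F63B78). Slow but simple;
// production code would use a table-driven or hardware implementation.
std::uint32_t crc32c(const std::uint8_t* data, std::size_t len)
{
	std::uint32_t	crc = 0xFFFFFFFFu;

	for (std::size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			// Shift right; XOR in the polynomial if the low bit was set.
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		}
	}

	return crc ^ 0xFFFFFFFFu;
}

// Mirrors the acceptance rule: both stored fields must equal the
// freshly computed page checksum.
bool crc32_fields_valid(std::uint32_t field1, std::uint32_t field2,
			std::uint32_t computed)
{
	return field1 == computed && field2 == computed;
}
```

Over the ASCII string "123456789", `crc32c()` yields the standard CRC-32C check value 0xE3069283.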

/** Checks if the page is in innodb checksum format.
@param[in]	read_buf	database page
@param[in]	checksum_field1	new checksum field
@param[in]	checksum_field2	old checksum field
@return true if the page is in innodb checksum format */
UNIV_INTERN
bool
buf_page_is_checksum_valid_innodb(
	const byte*	read_buf,
	ulint		checksum_field1,
	ulint		checksum_field2)
{
	/* There are 2 valid formulas for checksum_field2 (old checksum
	field) which algo=innodb could have written to the page:

	1. Very old versions of InnoDB only stored 8 byte lsn to the
	start and the end of the page.

	2. Newer InnoDB versions store the old formula checksum
	(buf_calc_page_old_checksum()). */

	if (checksum_field2 != mach_read_from_4(read_buf + FIL_PAGE_LSN)
	    && checksum_field2 != buf_calc_page_old_checksum(read_buf)) {
		DBUG_PRINT("buf_checksum",
			   ("Page checksum innodb not valid field1 " ULINTPF
			    " field2 " ULINTPF " old checksum " ULINTPF
			    " lsn " ULINTPF ".",
			    checksum_field1, checksum_field2,
			    buf_calc_page_old_checksum(read_buf),
			    mach_read_from_4(read_buf + FIL_PAGE_LSN)));
		return(false);
	}

	/* The old field is fine; check the new field. */

	/* InnoDB versions < 4.0.14 and < 4.1.1 stored the space id
	(always equal to 0) to FIL_PAGE_SPACE_OR_CHKSUM. */

	if (checksum_field1 != 0
	    && checksum_field1 != buf_calc_page_new_checksum(read_buf)) {
		DBUG_PRINT("buf_checksum",
			   ("Page checksum innodb not valid field1 " ULINTPF
			    " field2 " ULINTPF " new checksum " ULINTPF
			    " lsn " ULINTPF ".",
			    checksum_field1, checksum_field2,
			    buf_calc_page_new_checksum(read_buf),
			    mach_read_from_4(read_buf + FIL_PAGE_LSN)));
		return(false);
	}

	return(true);
}
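Once the candidate values are known, the two tests in buf_page_is_checksum_valid_innodb() reduce to a small truth table. The sketch below takes those values as inputs instead of computing buf_calc_page_old_checksum() and buf_calc_page_new_checksum(), whose formulas it does not reproduce. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the stored fields and the precomputed candidates.
struct InnodbChecksumFields {
	std::uint32_t	field1;		// stored at FIL_PAGE_SPACE_OR_CHKSUM
	std::uint32_t	field2;		// stored in the page trailer
	std::uint32_t	lsn_high;	// first 4 bytes at FIL_PAGE_LSN
	std::uint32_t	old_checksum;	// buf_calc_page_old_checksum() result
	std::uint32_t	new_checksum;	// buf_calc_page_new_checksum() result
};

bool innodb_fields_valid(const InnodbChecksumFields& f)
{
	// field2: very old versions stored the LSN here; newer ones store
	// the old-formula checksum.
	const bool	old_ok = f.field2 == f.lsn_high
		|| f.field2 == f.old_checksum;

	// field1: versions before 4.0.14 / 4.1.1 stored a zero space id.
	const bool	new_ok = f.field1 == 0
		|| f.field1 == f.new_checksum;

	return old_ok && new_ok;
}
```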

/** Checks if the page is in none checksum format.
@param[in]	read_buf	database page
@param[in]	checksum_field1	new checksum field
@param[in]	checksum_field2	old checksum field
@return true if the page is in none checksum format */
UNIV_INTERN
bool
buf_page_is_checksum_valid_none(
	const byte*	read_buf,
	ulint		checksum_field1,
	ulint		checksum_field2)
{
	const bool	valid = checksum_field1 == checksum_field2
		&& checksum_field1 == BUF_NO_CHECKSUM_MAGIC;

	if (!valid) {
		DBUG_PRINT("buf_checksum",
			   ("Page checksum none not valid field1 " ULINTPF
			    " field2 " ULINTPF " magic " ULINTPF
			    " lsn " ULINTPF ".",
			    checksum_field1, checksum_field2,
			    (ulint) BUF_NO_CHECKSUM_MAGIC,
			    mach_read_from_4(read_buf + FIL_PAGE_LSN)));
	}

	return(valid);
}

/** Check if a page is corrupt.
@param[in]	check_lsn	true if the LSN should be checked
@param[in]	read_buf	page to be checked
@param[in]	zip_size	compressed page size, or 0
@param[in]	space		tablespace, or NULL
@return true if corrupted, false if not */
UNIV_INTERN
bool
buf_page_is_corrupted(
	bool			check_lsn,
	const byte*		read_buf,
	ulint			zip_size,
	const fil_space_t*	space)
{
	ulint		checksum_field1;
	ulint		checksum_field2;
	ulint		space_id = mach_read_from_4(
		read_buf + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID);
	ulint		page_type = mach_read_from_2(
		read_buf + FIL_PAGE_TYPE);

	/* We can trust the page type if page compression is set in the
	tablespace flags, because the page compression flag means that
	the file must have been created with 10.1 (later than the 5.5
	code base). In 10.1, page compressed tables store neither a
	post-compression checksum nor the FIL_PAGE_END_LSN_OLD_CHKSUM
	field. Note that space can be NULL if we are in
	fil_check_first_page() and the first page is neither compressed
	nor encrypted. The page checksum is verified after decompression
	(normally, pages are already decompressed at this stage). */
	if ((page_type == FIL_PAGE_PAGE_COMPRESSED
	     || page_type == FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED)
	    && space && FSP_FLAGS_HAS_PAGE_COMPRESSION(space->flags)) {
		return(false);
	}

	if (!zip_size
	    && memcmp(read_buf + FIL_PAGE_LSN + 4,
		      read_buf + UNIV_PAGE_SIZE
		      - FIL_PAGE_END_LSN_OLD_CHKSUM + 4, 4)) {

		/* Stored log sequence numbers at the start and the end
		of page do not match */
		ib_logf(IB_LOG_LEVEL_INFO,
			"Log sequence number at the start " ULINTPF
			" and the end " ULINTPF " do not match.",
			mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
			mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM + 4));
		return(true);
	}

#ifndef UNIV_HOTBACKUP
	if (check_lsn && recv_lsn_checks_on) {
		lsn_t	current_lsn;

		/* Since we are going to reset the page LSN during the import
		phase it makes no sense to spam the log with error messages. */

		if (log_peek_lsn(&current_lsn)
		    && current_lsn
		    < mach_read_from_8(read_buf + FIL_PAGE_LSN)) {
			ut_print_timestamp(stderr);

			fprintf(stderr,
				" InnoDB: Error: page " ULINTPF
				" log sequence number"
				" " LSN_PF "\n"
				"InnoDB: is in the future! Current system "
				"log sequence number " LSN_PF ".\n"
				"InnoDB: Your database may be corrupt or "
				"you may have copied the InnoDB\n"
				"InnoDB: tablespace but not the InnoDB "
				"log files. See\n"
				"InnoDB: " REFMAN
				"forcing-innodb-recovery.html\n"
				"InnoDB: for more information.\n",
				(ulint) mach_read_from_4(
					read_buf + FIL_PAGE_OFFSET),
				(lsn_t) mach_read_from_8(
					read_buf + FIL_PAGE_LSN),
				current_lsn);
		}
	}
#endif

	/* Check whether the checksum fields have correct values */

	if (srv_checksum_algorithm == SRV_CHECKSUM_ALGORITHM_NONE) {
		return(false);
	}

	if (zip_size) {
		return(!page_zip_verify_checksum(read_buf, zip_size));
	}

	checksum_field1 = mach_read_from_4(
		read_buf + FIL_PAGE_SPACE_OR_CHKSUM);

	checksum_field2 = mach_read_from_4(
		read_buf + UNIV_PAGE_SIZE - FIL_PAGE_END_LSN_OLD_CHKSUM);

#if FIL_PAGE_LSN % 8
#error "FIL_PAGE_LSN must be 64 bit aligned"
#endif

	/* declare empty pages non-corrupted */
	if (checksum_field1 == 0 && checksum_field2 == 0
	    && *reinterpret_cast<const ib_uint64_t*>(read_buf +
						     FIL_PAGE_LSN) == 0) {
		/* make sure that the page is really empty */
		for (ulint i = 0; i < UNIV_PAGE_SIZE; i++) {
			if (read_buf[i] != 0) {
				ib_logf(IB_LOG_LEVEL_INFO,
					"Checksum fields zero but page is not empty.");
				return(true);
			}
		}

		return(false);
	}

	DBUG_EXECUTE_IF("buf_page_is_corrupt_failure", return(true); );

	ulint	page_no = mach_read_from_4(read_buf + FIL_PAGE_OFFSET);

	const srv_checksum_algorithm_t	curr_algo =
		static_cast<srv_checksum_algorithm_t>(srv_checksum_algorithm);

	switch (curr_algo) {
	case SRV_CHECKSUM_ALGORITHM_CRC32:
	case SRV_CHECKSUM_ALGORITHM_STRICT_CRC32:

		if (buf_page_is_checksum_valid_crc32(read_buf,
			checksum_field1, checksum_field2)) {
			return(false);
		}

		if (buf_page_is_checksum_valid_none(read_buf,
			checksum_field1, checksum_field2)) {
			if (curr_algo
			    == SRV_CHECKSUM_ALGORITHM_STRICT_CRC32) {
				page_warn_strict_checksum(
					curr_algo,
					SRV_CHECKSUM_ALGORITHM_NONE,
					space_id, page_no);
			}

			return(false);
		}

		if (buf_page_is_checksum_valid_innodb(read_buf,
			checksum_field1, checksum_field2)) {
			if (curr_algo
			    == SRV_CHECKSUM_ALGORITHM_STRICT_CRC32) {
				page_warn_strict_checksum(
					curr_algo,
					SRV_CHECKSUM_ALGORITHM_INNODB,
					space_id, page_no);
			}

			return(false);
		}

		return(true);

	case SRV_CHECKSUM_ALGORITHM_INNODB:
	case SRV_CHECKSUM_ALGORITHM_STRICT_INNODB:

		if (buf_page_is_checksum_valid_innodb(read_buf,
			checksum_field1, checksum_field2)) {
			return(false);
		}

		if (buf_page_is_checksum_valid_none(read_buf,
			checksum_field1, checksum_field2)) {
			if (curr_algo
			    == SRV_CHECKSUM_ALGORITHM_STRICT_INNODB) {
				page_warn_strict_checksum(
					curr_algo,
					SRV_CHECKSUM_ALGORITHM_NONE,
					space_id, page_no);
			}

			return(false);
		}

		if (buf_page_is_checksum_valid_crc32(read_buf,
			checksum_field1, checksum_field2)) {
			if (curr_algo
			    == SRV_CHECKSUM_ALGORITHM_STRICT_INNODB) {
				page_warn_strict_checksum(
					curr_algo,
					SRV_CHECKSUM_ALGORITHM_CRC32,
					space_id, page_no);
			}

			return(false);
		}

		return(true);

	case SRV_CHECKSUM_ALGORITHM_STRICT_NONE:

		if (buf_page_is_checksum_valid_none(read_buf,
			checksum_field1, checksum_field2)) {
			return(false);
		}

		if (buf_page_is_checksum_valid_crc32(read_buf,
			checksum_field1, checksum_field2)) {
			page_warn_strict_checksum(
				curr_algo,
				SRV_CHECKSUM_ALGORITHM_CRC32,
				space_id, page_no);
			return(false);
		}

		if (buf_page_is_checksum_valid_innodb(read_buf,
			checksum_field1, checksum_field2)) {
			page_warn_strict_checksum(
				curr_algo,
				SRV_CHECKSUM_ALGORITHM_INNODB,
				space_id, page_no);
			return(false);
		}

		return(true);

	case SRV_CHECKSUM_ALGORITHM_NONE:
		/* should have returned false earlier */
		break;
	/* no default so the compiler will emit a warning if new enum
	is added and not handled here */
	}

	ut_error;
	return(false);
}
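The switch at the end of buf_page_is_corrupted() tries the configured algorithm first and then falls back to the other two formats; the strict_* settings still accept such a page but report it through page_warn_strict_checksum(). A minimal sketch of that decision table follows, assuming the per-format validity has already been computed. The enum and struct names are hypothetical, and SRV_CHECKSUM_ALGORITHM_NONE is modelled as "never corrupted" because the real function returns before reaching the switch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirrors of srv_checksum_algorithm_t and the page formats.
enum class Algo { CRC32, STRICT_CRC32, INNODB, STRICT_INNODB,
		  NONE, STRICT_NONE };
enum class Format { CRC32, INNODB, NONE };

// Which formats the two stored checksum fields happen to satisfy.
struct Matches {
	bool	crc32;
	bool	innodb;
	bool	none;
};

struct Verdict {
	bool			corrupted;
	std::optional<Format>	strict_warning;	// format to warn about
};

Verdict classify(Algo algo, const Matches& m)
{
	const bool	strict = algo == Algo::STRICT_CRC32
		|| algo == Algo::STRICT_INNODB
		|| algo == Algo::STRICT_NONE;

	// Accept the page; in a strict mode, warn about the format found.
	auto accept = [strict](std::optional<Format> found) {
		return Verdict{false, strict ? found : std::nullopt};
	};

	switch (algo) {
	case Algo::NONE:
		return Verdict{false, std::nullopt};
	case Algo::CRC32:
	case Algo::STRICT_CRC32:
		if (m.crc32) return accept(std::nullopt);
		if (m.none) return accept(Format::NONE);
		if (m.innodb) return accept(Format::INNODB);
		break;
	case Algo::INNODB:
	case Algo::STRICT_INNODB:
		if (m.innodb) return accept(std::nullopt);
		if (m.none) return accept(Format::NONE);
		if (m.crc32) return accept(Format::CRC32);
		break;
	case Algo::STRICT_NONE:
		if (m.none) return accept(std::nullopt);
		if (m.crc32) return accept(Format::CRC32);
		if (m.innodb) return accept(Format::INNODB);
		break;
	}

	// No accepted format matched.
	return Verdict{true, std::nullopt};
}
```

Returning the warning instead of printing it keeps the decision table testable separately from the logging side effect.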

/********************************************************************//**
Prints a page to stderr. */
UNIV_INTERN
void
buf_page_print(
/*===========*/
	const byte*	read_buf,	/*!< in: a database page */
	ulint		zip_size,	/*!< in: compressed page size, or
					0 for uncompressed pages */
	ulint		flags)		/*!< in: 0 or
					BUF_PAGE_PRINT_NO_CRASH or
					BUF_PAGE_PRINT_NO_FULL */
{
#ifndef UNIV_HOTBACKUP
	dict_index_t*	index;
#endif /* !UNIV_HOTBACKUP */
	ulint		size = zip_size;

	if (!size) {
		size = UNIV_PAGE_SIZE;
	}

	if (!(flags & BUF_PAGE_PRINT_NO_FULL)) {
		ut_print_timestamp(stderr);
		fprintf(stderr,
			" InnoDB: Page dump in ascii and hex ("
			ULINTPF " bytes):\n",
			size);
		ut_print_buf(stderr, read_buf, size);
		fputs("\nInnoDB: End of page dump\n", stderr);
	}

	if (zip_size) {
		/* Print compressed page. */
		ut_print_timestamp(stderr);
		fprintf(stderr,
			" InnoDB: Compressed page type (" ULINTPF "); "
			"stored checksum in field1 " ULINTPF "; "
			"calculated checksums for field1: "
			"%s " ULINTPF ", "
			"%s " ULINTPF ", "
			"%s " ULINTPF "; "
			"page LSN " LSN_PF "; "
			"page number (if stored to page already) " ULINTPF "; "
			"space id (if stored to page already) " ULINTPF "\n",
			fil_page_get_type(read_buf),
			mach_read_from_4(read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
			buf_checksum_algorithm_name(
				SRV_CHECKSUM_ALGORITHM_CRC32),
			page_zip_calc_checksum(read_buf, zip_size,
				SRV_CHECKSUM_ALGORITHM_CRC32),
			buf_checksum_algorithm_name(
				SRV_CHECKSUM_ALGORITHM_INNODB),
			page_zip_calc_checksum(read_buf, zip_size,
				SRV_CHECKSUM_ALGORITHM_INNODB),
			buf_checksum_algorithm_name(
				SRV_CHECKSUM_ALGORITHM_NONE),
			page_zip_calc_checksum(read_buf, zip_size,
				SRV_CHECKSUM_ALGORITHM_NONE),
			mach_read_from_8(read_buf + FIL_PAGE_LSN),
			mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
			mach_read_from_4(read_buf
					 + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
	} else {
		ut_print_timestamp(stderr);
		fprintf(stderr, " InnoDB: uncompressed page, "
			"stored checksum in field1 " ULINTPF ", "
			"calculated checksums for field1: "
			"%s " UINT32PF ", "
			"%s " ULINTPF ", "
			"%s " ULINTPF ", "
			"stored checksum in field2 " ULINTPF ", "
			"calculated checksums for field2: "
			"%s " UINT32PF ", "
			"%s " ULINTPF ", "
			"%s " ULINTPF ", "
			"page LSN " ULINTPF " " ULINTPF ", "
			"low 4 bytes of LSN at page end " ULINTPF ", "
			"page number (if stored to page already) " ULINTPF ", "
			"space id (if created with >= MySQL-4.1.1 "
			"and stored already) " ULINTPF "\n",
			mach_read_from_4(read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_CRC32),
			buf_calc_page_crc32(read_buf),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_INNODB),
			buf_calc_page_new_checksum(read_buf),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_NONE),
			BUF_NO_CHECKSUM_MAGIC,
			mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_CRC32),
			buf_calc_page_crc32(read_buf),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_INNODB),
			buf_calc_page_old_checksum(read_buf),
			buf_checksum_algorithm_name(SRV_CHECKSUM_ALGORITHM_NONE),
			BUF_NO_CHECKSUM_MAGIC,
			mach_read_from_4(read_buf + FIL_PAGE_LSN),
			mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
			mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
			mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
			mach_read_from_4(read_buf
					 + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));

		ulint page_type = fil_page_get_type(read_buf);

		fprintf(stderr, "InnoDB: page type " ULINTPF " meaning %s\n",
			page_type, fil_get_page_type_name(page_type));
	}

#ifndef UNIV_HOTBACKUP
	if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE)
	    == TRX_UNDO_INSERT) {
		fprintf(stderr,
			"InnoDB: Page may be an insert undo log page\n");
	} else if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR
				    + TRX_UNDO_PAGE_TYPE)
		   == TRX_UNDO_UPDATE) {
		fprintf(stderr,
			"InnoDB: Page may be an update undo log page\n");
	}
#endif /* !UNIV_HOTBACKUP */

	switch (fil_page_get_type(read_buf)) {
		index_id_t	index_id;
	case FIL_PAGE_INDEX:
		index_id = btr_page_get_index_id(read_buf);
		fprintf(stderr,
			"InnoDB: Page may be an index page where"
			" index id is %llu\n",
			(ullint) index_id);
#ifndef UNIV_HOTBACKUP
		index = dict_index_find_on_id_low(index_id);
		if (index) {
			fputs("InnoDB: (", stderr);
			dict_index_name_print(stderr, NULL, index);
			fputs(")\n", stderr);
		}
#endif /* !UNIV_HOTBACKUP */
		break;
	case FIL_PAGE_INODE:
		fputs("InnoDB: Page may be an 'inode' page\n", stderr);
		break;
	case FIL_PAGE_IBUF_FREE_LIST:
		fputs("InnoDB: Page may be an insert buffer free list page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_ALLOCATED:
		fputs("InnoDB: Page may be a freshly allocated page\n",
		      stderr);
		break;
	case FIL_PAGE_IBUF_BITMAP:
		fputs("InnoDB: Page may be an insert buffer bitmap page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_SYS:
		fputs("InnoDB: Page may be a system page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_TRX_SYS:
		fputs("InnoDB: Page may be a transaction system page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_FSP_HDR:
		fputs("InnoDB: Page may be a file space header page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_XDES:
		fputs("InnoDB: Page may be an extent descriptor page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_BLOB:
		fputs("InnoDB: Page may be a BLOB page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_ZBLOB:
	case FIL_PAGE_TYPE_ZBLOB2:
		fputs("InnoDB: Page may be a compressed BLOB page\n",
		      stderr);
		break;
	}

	ut_ad(flags & BUF_PAGE_PRINT_NO_CRASH);
}
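buf_page_print() and buf_page_is_corrupted() read their header and trailer fields with the big-endian mach_read_from_N() helpers. The sketch below decodes the same fields from a raw page and repeats the torn-page test that compares the low 4 bytes of FIL_PAGE_LSN with the copy in the trailer. The byte offsets are assumptions that match fil0fil.h as far as we know; check them against the headers of the tree in use. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets assumed to match fil0fil.h.
constexpr std::size_t	kSpaceOrChksum = 0;	// FIL_PAGE_SPACE_OR_CHKSUM
constexpr std::size_t	kOffset = 4;		// FIL_PAGE_OFFSET
constexpr std::size_t	kLsn = 16;		// FIL_PAGE_LSN
constexpr std::size_t	kType = 24;		// FIL_PAGE_TYPE
constexpr std::size_t	kSpaceId = 34;		// FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID
constexpr std::size_t	kEndLsnOldChksum = 8;	// FIL_PAGE_END_LSN_OLD_CHKSUM

// Big-endian read of n bytes, like mach_read_from_2/4/8().
std::uint64_t read_be(const std::uint8_t* p, int n)
{
	std::uint64_t	v = 0;
	for (int i = 0; i < n; i++) {
		v = (v << 8) | p[i];
	}
	return v;
}

// Big-endian write of the low n bytes of v.
void write_be(std::uint8_t* p, std::uint64_t v, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		p[i] = static_cast<std::uint8_t>(v & 0xFF);
		v >>= 8;
	}
}

struct PageHeader {
	std::uint32_t	checksum1;	// header checksum field
	std::uint32_t	page_no;
	std::uint64_t	lsn;
	std::uint16_t	type;
	std::uint32_t	space_id;
	std::uint32_t	checksum2;	// trailer (old) checksum field
	bool		lsn_tail_matches;
};

PageHeader decode_header(const std::vector<std::uint8_t>& page)
{
	assert(page.size() >= kSpaceId + 4 + kEndLsnOldChksum);
	const std::uint8_t*	b = page.data();
	const std::size_t	n = page.size();
	PageHeader		h;

	h.checksum1 = static_cast<std::uint32_t>(read_be(b + kSpaceOrChksum, 4));
	h.page_no = static_cast<std::uint32_t>(read_be(b + kOffset, 4));
	h.lsn = read_be(b + kLsn, 8);
	h.type = static_cast<std::uint16_t>(read_be(b + kType, 2));
	h.space_id = static_cast<std::uint32_t>(read_be(b + kSpaceId, 4));
	h.checksum2 = static_cast<std::uint32_t>(
		read_be(b + n - kEndLsnOldChksum, 4));
	// Torn-page check: low 4 bytes of the header LSN vs. the trailer copy.
	h.lsn_tail_matches = read_be(b + kLsn + 4, 4)
		== read_be(b + n - kEndLsnOldChksum + 4, 4);
	return h;
}
```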

#ifndef UNIV_HOTBACKUP
# ifdef PFS_GROUP_BUFFER_SYNC
/********************************************************************//**
This function registers mutexes and rwlocks in buffer blocks with
performance schema. If PFS_MAX_BUFFER_MUTEX_LOCK_REGISTER is
defined to be a value less than chunk->size, then only mutexes
and rwlocks in the first PFS_MAX_BUFFER_MUTEX_LOCK_REGISTER
blocks are registered. */
static
void
pfs_register_buffer_block(
/*======================*/
	buf_chunk_t*	chunk)		/*!< in/out: chunk of buffers */
{
	ulint		i;
	ulint		num_to_register;
	buf_block_t*	block;

	block = chunk->blocks;

	num_to_register = ut_min(chunk->size,
				 PFS_MAX_BUFFER_MUTEX_LOCK_REGISTER);

	for (i = 0; i < num_to_register; i++) {
		ib_mutex_t*	mutex;
		rw_lock_t*	rwlock;

# ifdef UNIV_PFS_MUTEX
		mutex = &block->mutex;
		ut_a(!mutex->pfs_psi);
		mutex->pfs_psi = (PSI_server)
			? PSI_server->init_mutex(buffer_block_mutex_key, mutex)
			: NULL;
# endif /* UNIV_PFS_MUTEX */

# ifdef UNIV_PFS_RWLOCK
		rwlock = &block->lock;
		ut_a(!rwlock->pfs_psi);
		rwlock->pfs_psi = (PSI_server)
			? PSI_server->init_rwlock(buf_block_lock_key, rwlock)
			: NULL;

#  ifdef UNIV_SYNC_DEBUG
		rwlock = &block->debug_latch;
		ut_a(!rwlock->pfs_psi);
		rwlock->pfs_psi = (PSI_server)
			? PSI_server->init_rwlock(buf_block_debug_latch_key,
						  rwlock)
			: NULL;
#  endif /* UNIV_SYNC_DEBUG */

# endif /* UNIV_PFS_RWLOCK */
		block++;
	}
}
# endif /* PFS_GROUP_BUFFER_SYNC */

/********************************************************************//**
Initializes a buffer control block when the buf_pool is created. */
static
void
buf_block_init(
/*===========*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	buf_block_t*	block,		/*!< in: pointer to control block */
	byte*		frame)		/*!< in: pointer to buffer frame */
{
	UNIV_MEM_DESC(frame, UNIV_PAGE_SIZE);

	block->frame = frame;

	block->page.buf_pool_index = buf_pool_index(buf_pool);
	block->page.flush_type = BUF_FLUSH_LRU;
	block->page.state = BUF_BLOCK_NOT_USED;
	block->page.buf_fix_count = 0;
	block->page.io_fix = BUF_IO_NONE;
	block->page.encrypted = false;
	block->page.real_size = 0;
	block->page.write_size = 0;
	block->modify_clock = 0;
	block->page.slot = NULL;

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	block->page.file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */

	block->check_index_page_at_flush = FALSE;
	block->index = NULL;

#ifdef UNIV_DEBUG
	block->page.in_page_hash = FALSE;
	block->page.in_zip_hash = FALSE;
	block->page.in_flush_list = FALSE;
	block->page.in_free_list = FALSE;
	block->page.in_LRU_list = FALSE;
	block->in_unzip_LRU_list = FALSE;
#endif /* UNIV_DEBUG */
#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
	block->n_pointers = 0;
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
	page_zip_des_init(&block->page.zip);

#if defined PFS_SKIP_BUFFER_MUTEX_RWLOCK || defined PFS_GROUP_BUFFER_SYNC
	/* If PFS_SKIP_BUFFER_MUTEX_RWLOCK is defined, skip registration
	of buffer block mutex/rwlock with performance schema. If
	PFS_GROUP_BUFFER_SYNC is defined, skip the registration
	since buffer block mutex/rwlock will be registered later in
	pfs_register_buffer_block() */

	mutex_create(PFS_NOT_INSTRUMENTED, &block->mutex, SYNC_BUF_BLOCK);
	rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);

# ifdef UNIV_SYNC_DEBUG
	rw_lock_create(PFS_NOT_INSTRUMENTED,
		       &block->debug_latch, SYNC_NO_ORDER_CHECK);
# endif /* UNIV_SYNC_DEBUG */

#else /* PFS_SKIP_BUFFER_MUTEX_RWLOCK || PFS_GROUP_BUFFER_SYNC */
	mutex_create(buffer_block_mutex_key, &block->mutex, SYNC_BUF_BLOCK);
	rw_lock_create(buf_block_lock_key, &block->lock, SYNC_LEVEL_VARYING);

# ifdef UNIV_SYNC_DEBUG
	rw_lock_create(buf_block_debug_latch_key,
		       &block->debug_latch, SYNC_NO_ORDER_CHECK);
# endif /* UNIV_SYNC_DEBUG */
#endif /* PFS_SKIP_BUFFER_MUTEX_RWLOCK || PFS_GROUP_BUFFER_SYNC */

	ut_ad(rw_lock_validate(&(block->lock)));
}

/********************************************************************//**
Allocates a chunk of buffer frames.
@return chunk, or NULL on failure */
static
buf_chunk_t*
buf_chunk_init(
/*===========*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	buf_chunk_t*	chunk,		/*!< out: chunk of buffers */
	ulint		mem_size)	/*!< in: requested size in bytes */
{
	buf_block_t*	block;
	byte*		frame;
	ulint		i;

	/* Round down to a multiple of page size,
	although it already should be. */
	mem_size = ut_2pow_round(mem_size, UNIV_PAGE_SIZE);
	/* Reserve space for the block descriptors. */
	mem_size += ut_2pow_round((mem_size / UNIV_PAGE_SIZE) * (sizeof *block)
				  + (UNIV_PAGE_SIZE - 1), UNIV_PAGE_SIZE);

	chunk->mem_size = mem_size;
	chunk->mem = os_mem_alloc_large(&chunk->mem_size);

	if (UNIV_UNLIKELY(chunk->mem == NULL)) {

		return(NULL);
	}

#ifdef HAVE_LIBNUMA
	if (srv_numa_interleave) {
		int	st = mbind(chunk->mem, chunk->mem_size,
				   MPOL_INTERLEAVE,
				   numa_all_nodes_ptr->maskp,
				   numa_all_nodes_ptr->size,
				   MPOL_MF_MOVE);
		if (st != 0) {
			ib_logf(IB_LOG_LEVEL_WARN,
				"Failed to set NUMA memory policy of buffer"
				" pool page frames to MPOL_INTERLEAVE"
				" (error: %s).", strerror(errno));
		}
	}
#endif // HAVE_LIBNUMA

	/* Allocate the block descriptors from
	the start of the memory block. */
	chunk->blocks = (buf_block_t*) chunk->mem;

	/* Align a pointer to the first frame. Note that when
	os_large_page_size is smaller than UNIV_PAGE_SIZE,
	we may allocate one fewer block than requested. When
	it is bigger, we may allocate more blocks than requested. */

	frame = (byte*) ut_align(chunk->mem, UNIV_PAGE_SIZE);
	chunk->size = chunk->mem_size / UNIV_PAGE_SIZE
		- (frame != chunk->mem);

	/* Subtract the space needed for block descriptors. */
	{
		ulint	size = chunk->size;

		while (frame < (byte*) (chunk->blocks + size)) {
			frame += UNIV_PAGE_SIZE;
			size--;
		}

		chunk->size = size;
	}

	/* Initialize the block descriptors and assign the remaining
	page frames to them, one frame per block, in order. (The
	memory was already mapped above.) */

	block = chunk->blocks;

	for (i = chunk->size; i--; ) {

		buf_block_init(buf_pool, block, frame);
		UNIV_MEM_INVALID(block->frame, UNIV_PAGE_SIZE);

		/* Add the block to the free list */
		UT_LIST_ADD_LAST(list, buf_pool->free, (&block->page));

		ut_d(block->page.in_free_list = TRUE);
		ut_ad(buf_pool_from_block(block) == buf_pool);

		block++;
		frame += UNIV_PAGE_SIZE;
	}

#ifdef PFS_GROUP_BUFFER_SYNC
	pfs_register_buffer_block(chunk);
#endif
	return(chunk);
}
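The size arithmetic in buf_chunk_init() can be checked in isolation. The sketch below repeats it on plain integers: round the request down to whole pages, add descriptor space rounded up to whole pages, skip a partial first page if the allocation is not page aligned, and then give up one frame at a time until the frames start after the descriptor array. Unlike os_mem_alloc_large(), it assumes the allocator returns exactly the rounded size. `desc_size` stands in for `sizeof(buf_block_t)` and `misalign` for the allocation address modulo the page size; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct ChunkLayout {
	std::size_t	alloc_bytes;		// bytes requested from the allocator
	std::size_t	n_frames;		// usable page frames (chunk->size)
	std::size_t	first_frame_offset;	// offset of the first frame
};

ChunkLayout plan_chunk(std::size_t requested, std::size_t page_size,
		       std::size_t desc_size, std::size_t misalign)
{
	// page_size must be a power of two, like UNIV_PAGE_SIZE.
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(misalign < page_size);

	// Round down to whole pages (ut_2pow_round).
	std::size_t	mem = requested & ~(page_size - 1);
	// Add room for one descriptor per page, rounded up to whole pages.
	mem += (mem / page_size * desc_size + page_size - 1)
		& ~(page_size - 1);

	// ut_align(): the first frame starts at the next page boundary.
	std::size_t	offset = misalign ? page_size - misalign : 0;
	std::size_t	size = mem / page_size - (offset != 0);

	// Descriptors occupy [0, size * desc_size); frames may not overlap.
	while (offset < size * desc_size) {
		offset += page_size;
		size--;
	}

	return ChunkLayout{mem, size, offset};
}
```

For a 1 MiB request with 16 KiB pages and a hypothetical 512-byte descriptor, an aligned allocation of 1081344 bytes yields 64 frames starting at offset 32768; if the base address lies 4096 bytes past a page boundary, only 63 frames remain.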

#ifdef UNIV_DEBUG
/*********************************************************************//**
Finds a block in the given buffer chunk that points to a
given compressed page.
@return buffer block pointing to the compressed page, or NULL */
static
buf_block_t*
buf_chunk_contains_zip(
/*===================*/
	buf_chunk_t*	chunk,	/*!< in: chunk being checked */
	const void*	data)	/*!< in: pointer to compressed page */
{
	buf_block_t*	block;
	ulint		i;

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		if (block->page.zip.data == data) {

			return(block);
		}
	}

	return(NULL);
}

/*********************************************************************//**
Finds a block in the buffer pool that points to a
given compressed page.
@return buffer block pointing to the compressed page, or NULL */
UNIV_INTERN
buf_block_t*
buf_pool_contains_zip(
/*==================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	const void*	data)		/*!< in: pointer to compressed page */
{
	ulint		n;

	ut_ad(buf_pool);
	ut_ad(buf_pool_mutex_own(buf_pool));

	buf_chunk_t*	chunk = buf_pool->chunks;

	for (n = buf_pool->n_chunks; n--; chunk++) {

		buf_block_t* block = buf_chunk_contains_zip(chunk, data);

		if (block) {
			return(block);
		}
	}

	return(NULL);
}
#endif /* UNIV_DEBUG */

/*********************************************************************//**
Checks that all file pages in the buffer chunk are in a replaceable state.
@return address of a non-free block, or NULL if all freed */
static
const buf_block_t*
buf_chunk_not_freed(
/*================*/
	buf_chunk_t*	chunk)	/*!< in: chunk being checked */
{
	buf_block_t*	block;
	ulint		i;

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		ibool	ready;

		switch (buf_block_get_state(block)) {
		case BUF_BLOCK_POOL_WATCH:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_ZIP_DIRTY:
			/* The uncompressed buffer pool should never
			contain compressed block descriptors. */
			ut_error;
			break;
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			/* Skip blocks that are not being used for
			file pages. */
			break;
		case BUF_BLOCK_FILE_PAGE:
			mutex_enter(&block->mutex);
			ready = buf_flush_ready_for_replace(&block->page);
			mutex_exit(&block->mutex);

			if (!ready) {

				return(block);
			}

			break;
		}
	}

	return(NULL);
}

/********************************************************************//**
Set buffer pool size variables after resizing it */
static
void
buf_pool_set_sizes(void)
/*====================*/
{
	ulint	i;
	ulint	curr_size = 0;

	buf_pool_mutex_enter_all();

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);
		curr_size += buf_pool->curr_pool_size;
	}

	srv_buf_pool_curr_size = curr_size;
	srv_buf_pool_old_size = srv_buf_pool_size;

	buf_pool_mutex_exit_all();
}

/********************************************************************//**
Initialize a buffer pool instance.
@return DB_SUCCESS if all goes well. */
UNIV_INTERN
ulint
buf_pool_init_instance(
/*===================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	ulint		buf_pool_size,	/*!< in: size in bytes */
	ulint		instance_no)	/*!< in: id of the instance */
{
	ulint		i;
	buf_chunk_t*	chunk;

	/* 1. Initialize general fields
	------------------------------- */
	mutex_create(buf_pool_mutex_key,
		     &buf_pool->mutex, SYNC_BUF_POOL);
	mutex_create(buf_pool_zip_mutex_key,
		     &buf_pool->zip_mutex, SYNC_BUF_BLOCK);

	buf_pool_mutex_enter(buf_pool);

	if (buf_pool_size > 0) {
		buf_pool->n_chunks = 1;

		buf_pool->chunks = chunk =
			(buf_chunk_t*) mem_zalloc(sizeof *chunk);

		if (!buf_chunk_init(buf_pool, chunk, buf_pool_size)) {
			/* buf_pool was not allocated by this function,
			so do not free it here. Release its mutex while
			buf_pool is still valid, then free the chunk
			descriptor. */
			buf_pool_mutex_exit(buf_pool);
			mem_free(chunk);
			buf_pool->chunks = NULL;
			buf_pool->n_chunks = 0;

			return(DB_ERROR);
		}

		buf_pool->instance_no = instance_no;
		buf_pool->old_pool_size = buf_pool_size;
		buf_pool->curr_size = chunk->size;
		buf_pool->curr_pool_size = buf_pool->curr_size * UNIV_PAGE_SIZE;

		/* Number of locks protecting page_hash must be a
		power of two */
		srv_n_page_hash_locks = static_cast<ulong>(
			ut_2_power_up(srv_n_page_hash_locks));
		ut_a(srv_n_page_hash_locks != 0);
		ut_a(srv_n_page_hash_locks <= MAX_PAGE_HASH_LOCKS);

		buf_pool->page_hash = ib_create(2 * buf_pool->curr_size,
						srv_n_page_hash_locks,
						MEM_HEAP_FOR_PAGE_HASH,
						SYNC_BUF_PAGE_HASH);

		buf_pool->zip_hash = hash_create(2 * buf_pool->curr_size);

		buf_pool->last_printout_time = ut_time();
	}
	/* 2. Initialize flushing fields
	-------------------------------- */

	mutex_create(flush_list_mutex_key, &buf_pool->flush_list_mutex,
		     SYNC_BUF_FLUSH_LIST);

	for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {
		buf_pool->no_flush[i] = os_event_create();
	}

	buf_pool->watch = (buf_page_t*) mem_zalloc(
		sizeof(*buf_pool->watch) * BUF_POOL_WATCH_SIZE);

	/* All fields are initialized by mem_zalloc(). */

	buf_pool->try_LRU_scan = TRUE;

	/* Initialize the hazard pointer for flush_list batches */
	new(&buf_pool->flush_hp)
		FlushHp(buf_pool, &buf_pool->flush_list_mutex);

	/* Initialize the hazard pointer for LRU batches */
	new(&buf_pool->lru_hp) LRUHp(buf_pool, &buf_pool->mutex);

	/* Initialize the iterator for LRU scan search */
	new(&buf_pool->lru_scan_itr) LRUItr(buf_pool, &buf_pool->mutex);

	/* Initialize the iterator for single page scan search */
	new(&buf_pool->single_scan_itr) LRUItr(buf_pool, &buf_pool->mutex);

	/* Initialize the temporary memory array and slots */
	buf_pool->tmp_arr = (buf_tmp_array_t*) mem_zalloc(
		sizeof(buf_tmp_array_t));
	ulint n_slots = (srv_n_read_io_threads + srv_n_write_io_threads)
		* (8 * OS_AIO_N_PENDING_IOS_PER_THREAD);
	buf_pool->tmp_arr->n_slots = n_slots;
	buf_pool->tmp_arr->slots = (buf_tmp_buffer_t*) mem_zalloc(
		sizeof(buf_tmp_buffer_t) * n_slots);

	buf_pool_mutex_exit(buf_pool);

	DBUG_EXECUTE_IF("buf_pool_init_instance_force_oom",
		return(DB_ERROR); );

	return(DB_SUCCESS);
}
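buf_pool_init_instance() rounds srv_n_page_hash_locks up to a power of two with ut_2_power_up() before checking it against MAX_PAGE_HASH_LOCKS, and sizes the temporary slot array from the I/O thread counts. A minimal sketch of both computations follows. The function names are hypothetical, and the lock limit and the per-thread constant (OS_AIO_N_PENDING_IOS_PER_THREAD) are passed in as parameters rather than assumed.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two >= n (1 for n == 0), in the spirit of
// ut_2_power_up().
std::uint64_t next_power_of_two(std::uint64_t n)
{
	assert(n <= (std::uint64_t{1} << 63));	// avoid overflow
	std::uint64_t	r = 1;
	while (r < n) {
		r <<= 1;
	}
	return r;
}

// Rounds the requested lock count and applies the same sanity checks
// as the ut_a() calls in buf_pool_init_instance().
std::uint64_t page_hash_locks(std::uint64_t requested,
			      std::uint64_t max_locks)
{
	const std::uint64_t	n = next_power_of_two(requested);
	assert(n != 0 && n <= max_locks);
	return n;
}

// Temporary slot count: (read + write I/O threads) * 8 * per_thread.
std::uint64_t tmp_slot_count(std::uint64_t read_threads,
			     std::uint64_t write_threads,
			     std::uint64_t per_thread)
{
	return (read_threads + write_threads) * (8 * per_thread);
}
```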

/********************************************************************//**
Frees one buffer pool instance. */
static
void
buf_pool_free_instance(
/*===================*/
	buf_pool_t*	buf_pool)	/*!< in,own: buffer pool instance
					to free */
{
	buf_chunk_t*	chunk;
	buf_chunk_t*	chunks;
	buf_page_t*	bpage;

	bpage = UT_LIST_GET_LAST(buf_pool->LRU);
	while (bpage != NULL) {
		buf_page_t*	prev_bpage = UT_LIST_GET_PREV(LRU, bpage);
		enum buf_page_state	state = buf_page_get_state(bpage);

		ut_ad(buf_page_in_file(bpage));
		ut_ad(bpage->in_LRU_list);

		if (state != BUF_BLOCK_FILE_PAGE) {
			/* We must not have any dirty block except
			when doing a fast shutdown. */
			ut_ad(state == BUF_BLOCK_ZIP_PAGE
			      || srv_fast_shutdown == 2);
			buf_page_free_descriptor(bpage);
		}

		bpage = prev_bpage;
	}

	mem_free(buf_pool->watch);
	buf_pool->watch = NULL;

	chunks = buf_pool->chunks;
	chunk = chunks + buf_pool->n_chunks;

	while (--chunk >= chunks) {
		os_mem_free_large(chunk->mem, chunk->mem_size);
	}

	mem_free(buf_pool->chunks);
	ha_clear(buf_pool->page_hash);
	hash_table_free(buf_pool->page_hash);
	hash_table_free(buf_pool->zip_hash);

	/* Free all used temporary slots, then the slot array itself.
	Everything is inside the NULL check so that an instance without
	a temporary array is not dereferenced. */
	if (buf_pool->tmp_arr) {
		for (ulint i = 0; i < buf_pool->tmp_arr->n_slots; i++) {
			buf_tmp_buffer_t* slot = &(buf_pool->tmp_arr->slots[i]);

			if (slot->crypt_buf) {
				aligned_free(slot->crypt_buf);
				slot->crypt_buf = NULL;
			}

			if (slot->comp_buf) {
				aligned_free(slot->comp_buf);
				slot->comp_buf = NULL;
			}
		}

		mem_free(buf_pool->tmp_arr->slots);
		mem_free(buf_pool->tmp_arr);
		buf_pool->tmp_arr = NULL;
	}
}
/********************************************************************//**
Creates the buffer pool.
@return DB_SUCCESS if success, DB_ERROR if not enough memory or error */
UNIV_INTERN
dberr_t
buf_pool_init(
/*==========*/
	ulint	total_size,	/*!< in: size of the total pool in bytes */
	ulint	n_instances)	/*!< in: number of instances */
{
	ulint	i;
	ut_ad(n_instances > 0);
	ut_ad(n_instances <= MAX_BUFFER_POOLS);
	ut_ad(n_instances == srv_buf_pool_instances);
	/* Divide only after n_instances has been asserted nonzero. */
	const ulint	size = total_size / n_instances;
#ifdef HAVE_LIBNUMA
	if (srv_numa_interleave) {
		ib_logf(IB_LOG_LEVEL_INFO,
			"Setting NUMA memory policy to MPOL_INTERLEAVE");
		if (set_mempolicy(MPOL_INTERLEAVE,
				  numa_all_nodes_ptr->maskp,
				  numa_all_nodes_ptr->size) != 0) {
			ib_logf(IB_LOG_LEVEL_WARN,
				"Failed to set NUMA memory policy to"
				" MPOL_INTERLEAVE (error: %s).",
				strerror(errno));
		}
	}
#endif // HAVE_LIBNUMA
	buf_pool_ptr = (buf_pool_t*) mem_zalloc(
		n_instances * sizeof *buf_pool_ptr);
	for (i = 0; i < n_instances; i++) {
		buf_pool_t*	ptr = &buf_pool_ptr[i];
		if (buf_pool_init_instance(ptr, size, i) != DB_SUCCESS) {
			/* Free all the instances created so far. */
			buf_pool_free(i);
			return(DB_ERROR);
		}
	}
	buf_pool_set_sizes();
	buf_LRU_old_ratio_update(100 * 3 / 8, FALSE);
	btr_search_sys_create(buf_pool_get_curr_size() / sizeof(void*) / 64);
#ifdef HAVE_LIBNUMA
	if (srv_numa_interleave) {
		ib_logf(IB_LOG_LEVEL_INFO,
			"Setting NUMA memory policy to MPOL_DEFAULT");
		if (set_mempolicy(MPOL_DEFAULT, NULL, 0) != 0) {
			ib_logf(IB_LOG_LEVEL_WARN,
				"Failed to set NUMA memory policy to"
				" MPOL_DEFAULT (error: %s).", strerror(errno));
		}
	}
#endif // HAVE_LIBNUMA
	buf_flush_event = os_event_create();
	return(DB_SUCCESS);
}
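/* Illustrative sketch (not part of InnoDB): two sizing expressions from
buf_pool_init() above. The helper names are hypothetical. The
per-instance size uses integer division, so any remainder of
total_size that does not divide evenly is not given to any instance.
btr_search_sys_create() is passed the value returned by
buf_pool_get_curr_size(), divided by sizeof(void*) and then by 64.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring "size = total_size / n_instances";
// integer division drops the remainder.
std::size_t per_instance_size(std::size_t total_size,
			      std::size_t n_instances)
{
	assert(n_instances > 0);
	return total_size / n_instances;
}

// Hypothetical helper mirroring the argument passed to
// btr_search_sys_create().
std::size_t ahi_hash_size(std::size_t curr_size)
{
	return curr_size / sizeof(void*) / 64;
}
```

On a 64-bit build, ahi_hash_size(134217728) is
134217728 / 8 / 64 = 262144. */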
/********************************************************************//**
Frees the buffer pool at shutdown. This must not be invoked before
freeing all mutexes. */
UNIV_INTERN
void
buf_pool_free(
/*==========*/
	ulint	n_instances)	/*!< in: number of instances to free */
{
	ulint	i;
	for (i = 0; i < n_instances; i++) {
		buf_pool_free_instance(buf_pool_from_array(i));
	}
	mem_free(buf_pool_ptr);
	buf_pool_ptr = NULL;
}
/********************************************************************//**
Clears the adaptive hash index on all pages in the buffer pool. */
UNIV_INTERN
void
buf_pool_clear_hash_index(void)
/*===========================*/
{
	ulint	p;
#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(&btr_search_latch, RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
	ut_ad(!btr_search_enabled);
	for (p = 0; p < srv_buf_pool_instances; p++) {
		buf_pool_t*	buf_pool = buf_pool_from_array(p);
		buf_chunk_t*	chunks = buf_pool->chunks;
		buf_chunk_t*	chunk = chunks + buf_pool->n_chunks;
		while (--chunk >= chunks) {
			buf_block_t*	block = chunk->blocks;
			ulint		i = chunk->size;
			for (; i--; block++) {
				dict_index_t*	index = block->index;
				/* We can set block->index = NULL
				when we have an x-latch on btr_search_latch;
				see the comment in buf0buf.h */
				if (!index) {
					/* Not hashed */
					continue;
				}
				block->index = NULL;
# if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
				block->n_pointers = 0;
# endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
			}
		}
	}
}
/********************************************************************//**
Relocate a buffer control block. Relocates the block on the LRU list
and in buf_pool->page_hash. Does not relocate bpage->list.
The caller must take care of relocating bpage->list. */
UNIV_INTERN
void
buf_relocate(
/*=========*/
	buf_page_t*	bpage,	/*!< in/out: control block being relocated;
				buf_page_get_state(bpage) must be
				BUF_BLOCK_ZIP_DIRTY or BUF_BLOCK_ZIP_PAGE */
	buf_page_t*	dpage)	/*!< in/out: destination control block */
{
	buf_page_t*	b;
	ulint		fold;
	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
	fold = buf_page_address_fold(bpage->space, bpage->offset);
	ut_ad(buf_pool_mutex_own(buf_pool));
	ut_ad(buf_page_hash_lock_held_x(buf_pool, bpage));
	ut_ad(mutex_own(buf_page_get_mutex(bpage)));
	ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
	ut_a(bpage->buf_fix_count == 0);
	ut_ad(bpage->in_LRU_list);
	ut_ad(!bpage->in_zip_hash);
	ut_ad(bpage->in_page_hash);
	ut_ad(bpage == buf_page_hash_get_low(buf_pool,
					     bpage->space,
					     bpage->offset,
					     fold));
	ut_ad(!buf_pool_watch_is_sentinel(buf_pool, bpage));
#ifdef UNIV_DEBUG
	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_POOL_WATCH:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_FILE_PAGE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
	case BUF_BLOCK_ZIP_DIRTY:
	case BUF_BLOCK_ZIP_PAGE:
		break;
	}
#endif /* UNIV_DEBUG */
	memcpy(dpage, bpage, sizeof *dpage);
	/* The hazard pointer must be adjusted before bpage is
	removed from the LRU list. */
	buf_LRU_adjust_hp(buf_pool, bpage);
	ut_d(bpage->in_LRU_list = FALSE);
	ut_d(bpage->in_page_hash = FALSE);
	/* relocate buf_pool->LRU */
	b = UT_LIST_GET_PREV(LRU, bpage);
	UT_LIST_REMOVE(LRU, buf_pool->LRU, bpage);
	if (b) {
		UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, b, dpage);
	} else {
		UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, dpage);
	}
	if (UNIV_UNLIKELY(buf_pool->LRU_old == bpage)) {
		buf_pool->LRU_old = dpage;
#ifdef UNIV_LRU_DEBUG
		/* buf_pool->LRU_old must be the first item in the LRU list
		whose "old" flag is set. */
		ut_a(buf_pool->LRU_old->old);
		ut_a(!UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)
		     || !UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)->old);
		ut_a(!UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)
		     || UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)->old);
	} else {
		/* Check that the "old" flag is consistent in
		the block and its neighbours. */
		buf_page_set_old(dpage, buf_page_is_old(dpage));
#endif /* UNIV_LRU_DEBUG */
	}
	ut_d(UT_LIST_VALIDATE(
		     LRU, buf_page_t, buf_pool->LRU, CheckInLRUList()));
	/* relocate buf_pool->page_hash */
	HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, bpage);
	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, fold, dpage);
}
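/* Illustrative sketch (not part of InnoDB): the list splice performed by
buf_relocate(), reduced to a hypothetical intrusive doubly-linked list.
Node, List and relocate() are made-up names. The sketch copies the old
node into the new one and re-points the neighbours at it, whereas
buf_relocate() removes bpage and re-inserts dpage after its former
predecessor (or at the head); both leave dpage exactly where bpage
was. Latching, the hazard pointer, LRU_old and the page_hash update are
omitted.

```cpp
#include <cassert>

// Hypothetical intrusive list node standing in for buf_page_t.
struct Node {
	int	id;	// payload
	Node*	prev;
	Node*	next;
};

// Hypothetical list base node standing in for buf_pool->LRU.
struct List {
	Node*	first = nullptr;
	Node*	last = nullptr;
};

// Put dst in src's position: dst takes over src's payload and links,
// the neighbours are re-pointed at dst, and src is unlinked.
void relocate(List& list, Node* src, Node* dst)
{
	*dst = *src;	// copies the payload and src's prev/next links
	if (dst->prev) {
		dst->prev->next = dst;
	} else {
		list.first = dst;
	}
	if (dst->next) {
		dst->next->prev = dst;
	} else {
		list.last = dst;
	}
	src->prev = nullptr;
	src->next = nullptr;
}
```
*/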
/** Hazard pointer implementation. */
/** Sets the current value of the hazard pointer.
@param bpage	buffer block to be set as hp */
void
HazardPointer::set(buf_page_t* bpage)
{
	ut_ad(mutex_own(m_mutex));
	ut_ad(!bpage || buf_pool_from_bpage(bpage) == m_buf_pool);
	ut_ad(!bpage || buf_page_in_file(bpage));
	m_hp = bpage;
}
/** Checks whether bpage is the hazard pointer.
@param bpage	buffer block to be compared
@return true if it is hp */
bool
HazardPointer::is_hp(const buf_page_t* bpage)
{
	ut_ad(mutex_own(m_mutex));
	ut_ad(!m_hp || buf_pool_from_bpage(m_hp) == m_buf_pool);
	ut_ad(!bpage || buf_pool_from_bpage(bpage) == m_buf_pool);
	return(bpage == m_hp);
}
/** Adjusts the value of hp. This happens when some other thread working
on the same list is about to remove the hp from the list.
@param bpage	buffer block about to be removed */
void
FlushHp::adjust(const buf_page_t* bpage)
{
	ut_ad(bpage != NULL);
	/* We only support reverse traversal for now. */
	if (is_hp(bpage)) {
		m_hp = UT_LIST_GET_PREV(list, m_hp);
	}
	ut_ad(!m_hp || m_hp->in_flush_list);
}
/** Adjusts the value of hp. This happens when some other thread working
on the same list is about to remove the hp from the list.
@param bpage	buffer block about to be removed */
void
LRUHp::adjust(const buf_page_t* bpage)
{
	ut_ad(bpage);
	/* We only support reverse traversal for now. */
	if (is_hp(bpage)) {
		m_hp = UT_LIST_GET_PREV(LRU, m_hp);
	}
	ut_ad(!m_hp || m_hp->in_LRU_list);
}
/** Selects where to start a scan. If we have scanned too deep into
the LRU list, it resets the value to the tail of the LRU list.
@return buf_page_t from where to start the scan */
buf_page_t*
LRUItr::start()
{
	ut_ad(mutex_own(m_mutex));
	if (!m_hp || m_hp->old) {
		m_hp = UT_LIST_GET_LAST(m_buf_pool->LRU);
	}
	return(m_hp);
}
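/* Illustrative sketch (not part of InnoDB): how a hazard pointer lets a
backward scan survive removal of the page it is about to visit, in the
spirit of FlushHp::adjust() and LRUHp::adjust() above. Page, ScanHp and
remove_page() are hypothetical. The sketch is single-threaded and
omits the list mutex that the real code asserts with
ut_ad(mutex_own(m_mutex)).

```cpp
#include <cassert>

// Hypothetical list node standing in for buf_page_t.
struct Page {
	int	id;
	Page*	prev;
	Page*	next;
};

// Minimal hazard pointer for a backward (tail-to-head) scan.
class ScanHp {
public:
	void set(Page* p) { m_hp = p; }
	Page* get() const { return m_hp; }
	bool is_hp(const Page* p) const { return p == m_hp; }

	// Must be called before p is unlinked: if the scanner was about
	// to visit p, step back to p's predecessor instead.
	void adjust(const Page* p)
	{
		if (is_hp(p)) {
			m_hp = m_hp->prev;
		}
	}

private:
	Page* m_hp = nullptr;
};

// Unlink p from the list [first, last], adjusting the hazard pointer
// first, in the same order as buf_relocate() calls buf_LRU_adjust_hp()
// before removing bpage.
void remove_page(Page*& first, Page*& last, Page* p, ScanHp& hp)
{
	hp.adjust(p);
	if (p->prev) {
		p->prev->next = p->next;
	} else {
		first = p->next;
	}
	if (p->next) {
		p->next->prev = p->prev;
	} else {
		last = p->prev;
	}
	p->prev = nullptr;
	p->next = nullptr;
}
```

Because the scan only moves towards the head, stepping the hazard
pointer to the predecessor never skips a page the scanner has yet to
visit. */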
/********************************************************************//**
Determine if a block is a sentinel for a buffer pool watch.
@return TRUE if a sentinel for a buffer pool watch, FALSE if not */
UNIV_INTERN
ibool
buf_pool_watch_is_sentinel(
/*=======================*/
	buf_pool_t*		buf_pool,	/*!< in: buffer pool instance */
	const buf_page_t*	bpage)		/*!< in: block */
{
	/* The caller must also hold the appropriate hash lock. */
	ut_ad(buf_page_hash_lock_held_s_or_x(buf_pool, bpage));
	ut_ad(buf_page_in_file(bpage));
	if (bpage < &buf_pool->watch[0]
	    || bpage >= &buf_pool->watch[BUF_POOL_WATCH_SIZE]) {
		ut_ad(buf_page_get_state(bpage) != BUF_BLOCK_ZIP_PAGE
		      || bpage->zip.data != NULL);
		return(FALSE);
	}
	ut_ad(buf_page_get_state(bpage) == BUF_BLOCK_ZIP_PAGE);
	ut_ad(!bpage->in_zip_hash);
	ut_ad(bpage->in_page_hash);
	ut_ad(bpage->zip.data == NULL);
	ut_ad(bpage->buf_fix_count > 0);
	return(TRUE);
}
/* Enable this for checksum error messages. Currently enabled by
default in UNIV_DEBUG builds, to help diagnose encryption bugs. */
#ifdef UNIV_DEBUG
#define UNIV_DEBUG_LEVEL2 1
#endif
/****************************************************************//**
Add watch for the given page to be read in. The caller must hold the
page_hash latch for the page in exclusive mode. This function may
release the hash_lock and reacquire it.
@return NULL if watch set, block if the page is in the buffer pool */
UNIV_INTERN
buf_page_t*
buf_pool_watch_set(
/*===============*/
	ulint	space,	/*!< in: space id */
	ulint	offset,	/*!< in: page number */
	ulint	fold)	/*!< in: buf_page_address_fold(space, offset) */
{
	buf_page_t*	bpage;
	ulint		i;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	rw_lock_t*	hash_lock;
	hash_lock = buf_page_hash_lock_get(buf_pool, fold);
#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
	if (bpage != NULL) {
page_found:
		if (!buf_pool_watch_is_sentinel(buf_pool, bpage)) {
			/* The page was loaded meanwhile. */
			return(bpage);
		}
		/* Add to an existing watch. */
#ifdef PAGE_ATOMIC_REF_COUNT
		os_atomic_increment_uint32(&bpage->buf_fix_count, 1);
#else
		++bpage->buf_fix_count;
#endif /* PAGE_ATOMIC_REF_COUNT */
		return(NULL);
	}
	/* From this point on, this function is fairly heavy in terms
	of latching: we acquire the buf_pool mutex as well as all the
	hash_locks. The buf_pool mutex is needed because any change to
	the page_hash must be covered by it, and the hash_locks are
	needed so that we do not read stale information in
	buf_pool->watch[]. However, this is not a critical code path,
	as this function is only called by the purge thread. */
	/* To obey the latching order, first release the hash_lock. */
	rw_lock_x_unlock(hash_lock);
	buf_pool_mutex_enter(buf_pool);
	hash_lock_x_all(buf_pool->page_hash);
	/* We have to recheck that the page was not loaded, and that no
	watch was set by some other purge thread, in the small time
	window between releasing the hash_lock and acquiring the
	buf_pool mutex above. */
	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
	if (UNIV_LIKELY_NULL(bpage)) {
		buf_pool_mutex_exit(buf_pool);
		hash_unlock_x_all_but(buf_pool->page_hash, hash_lock);
		goto page_found;
	}
	/* The maximum number of purge threads should never exceed
	BUF_POOL_WATCH_SIZE, so there is no way for a purge thread
	instance to hold a watch when setting another watch. */
	for (i = 0; i < BUF_POOL_WATCH_SIZE; i++) {
		bpage = &buf_pool->watch[i];
		ut_ad(bpage->access_time == 0);
		ut_ad(bpage->newest_modification == 0);
		ut_ad(bpage->oldest_modification == 0);
		ut_ad(bpage->zip.data == NULL);
		ut_ad(!bpage->in_zip_hash);
		switch (bpage->state) {
		case BUF_BLOCK_POOL_WATCH:
			ut_ad(!bpage->in_page_hash);
			ut_ad(bpage->buf_fix_count == 0);
			/* bpage is pointing to buf_pool->watch[],
			which is protected by buf_pool->mutex.
			Normally, buf_page_t objects are protected by
			buf_block_t::mutex or buf_pool->zip_mutex or both. */
			bpage->state = BUF_BLOCK_ZIP_PAGE;
			bpage->space = static_cast<ib_uint32_t>(space);
			bpage->offset = static_cast<ib_uint32_t>(offset);
			bpage->buf_fix_count = 1;
			ut_d(bpage->in_page_hash = TRUE);
			HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
				    fold, bpage);
			buf_pool_mutex_exit(buf_pool);
			/* Once the sentinel is in the page_hash we can
			safely release all locks except the relevant
			hash_lock. */
			hash_unlock_x_all_but(buf_pool->page_hash,
					      hash_lock);
			return(NULL);
		case BUF_BLOCK_ZIP_PAGE:
			ut_ad(bpage->in_page_hash);
			ut_ad(bpage->buf_fix_count > 0);
			break;
		default:
			ut_error;
		}
	}
	/* Allocation failed. Either the maximum number of purge
	threads should never exceed BUF_POOL_WATCH_SIZE, or this code
	should be modified to return a special non-NULL value and the
	caller should purge the record directly. */
	ut_error;
	/* Not reached; this silences a compiler warning. */
	return(NULL);
}
/****************************************************************//**
Remove the sentinel block for the watch before replacing it with a real block.
buf_pool_watch_unset() or buf_pool_watch_occurred() will notice that
the block has been replaced with the real block. */
static
void
buf_pool_watch_remove(
/*==================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	ulint		fold,		/*!< in: buf_page_address_fold(
					space, offset) */
	buf_page_t*	watch)		/*!< in/out: sentinel for watch */
{
#ifdef UNIV_SYNC_DEBUG
	/* We must also hold the page_hash latch for this fold
	in exclusive mode. */
	rw_lock_t*	hash_lock = buf_page_hash_lock_get(buf_pool, fold);
	ut_ad(rw_lock_own(hash_lock, RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
	ut_ad(buf_pool_mutex_own(buf_pool));
	HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, watch);
	ut_d(watch->in_page_hash = FALSE);
	watch->buf_fix_count = 0;
	watch->state = BUF_BLOCK_POOL_WATCH;
}
/****************************************************************//**
Stop watching a page, whether or not it has been read in.
buf_pool_watch_set(space,offset) must have returned NULL before. */
UNIV_INTERN
void
buf_pool_watch_unset(
/*=================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	buf_page_t*	bpage;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	ulint		fold = buf_page_address_fold(space, offset);
	rw_lock_t*	hash_lock = buf_page_hash_lock_get(buf_pool, fold);
	/* The buf_pool mutex is only needed if we end up calling
	buf_pool_watch_remove(), but to obey the latching order we
	acquire it here, before acquiring the hash_lock. This should
	not cause much contention, as this function is only ever
	called from the purge thread. */
	buf_pool_mutex_enter(buf_pool);
	rw_lock_x_lock(hash_lock);
	/* The page must exist because buf_pool_watch_set() increments
	buf_fix_count. */
	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
	if (!buf_pool_watch_is_sentinel(buf_pool, bpage)) {
		buf_block_unfix(reinterpret_cast<buf_block_t*>(bpage));
	} else {
		ut_ad(bpage->buf_fix_count > 0);
#ifdef PAGE_ATOMIC_REF_COUNT
		os_atomic_decrement_uint32(&bpage->buf_fix_count, 1);
#else
		--bpage->buf_fix_count;
#endif /* PAGE_ATOMIC_REF_COUNT */
		if (bpage->buf_fix_count == 0) {
			buf_pool_watch_remove(buf_pool, fold, bpage);
		}
	}
	buf_pool_mutex_exit(buf_pool);
	rw_lock_x_unlock(hash_lock);
}
/****************************************************************//**
Check if the page has been read in.
This may only be called after buf_pool_watch_set(space,offset)
has returned NULL and before invoking buf_pool_watch_unset(space,offset).
@return FALSE if the given page was not read in, TRUE if it was */
UNIV_INTERN
ibool
buf_pool_watch_occurred(
/*====================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	ibool		ret;
	buf_page_t*	bpage;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	ulint		fold = buf_page_address_fold(space, offset);
	rw_lock_t*	hash_lock = buf_page_hash_lock_get(buf_pool,
							     fold);
	rw_lock_s_lock(hash_lock);
	/* The page must exist because buf_pool_watch_set()
	increments buf_fix_count. */
	bpage = buf_page_hash_get_low(buf_pool, space, offset, fold);
	ret = !buf_pool_watch_is_sentinel(buf_pool, bpage);
	rw_lock_s_unlock(hash_lock);
	return(ret);
}
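/* Illustrative sketch (not part of InnoDB): a simplified,
single-threaded model of the watch protocol implemented by
buf_pool_watch_set(), buf_pool_watch_occurred() and
buf_pool_watch_unset(). PageHash, Entry and page_loaded() are
hypothetical. A std::unordered_map stands in for buf_pool->page_hash,
sentinels are map entries rather than slots of the fixed
buf_pool->watch[] array, and all latching is omitted. page_loaded()
models a real page replacing the sentinel and keeping its fix count.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// A page_hash entry: either a sentinel (a watch) or a real page.
struct Entry {
	bool		sentinel;
	unsigned	fix_count;
};

class PageHash {
public:
	// Returns true if the real page is already present; otherwise
	// registers a new watch or joins an existing one.
	bool watch_set(std::uint32_t page)
	{
		auto it = m_map.find(page);
		if (it != m_map.end()) {
			if (!it->second.sentinel) {
				return true;
			}
			++it->second.fix_count;
			return false;
		}
		m_map[page] = Entry{true, 1};
		return false;
	}

	// A read completes: the real page replaces the sentinel and
	// keeps its fix count.
	void page_loaded(std::uint32_t page)
	{
		Entry& e = m_map[page];
		e.sentinel = false;
	}

	// True once a real page has replaced the sentinel.
	bool watch_occurred(std::uint32_t page) const
	{
		return !m_map.at(page).sentinel;
	}

	// Drop one watcher; the last watcher removes an unused sentinel.
	void watch_unset(std::uint32_t page)
	{
		Entry& e = m_map.at(page);
		assert(e.fix_count > 0);
		if (--e.fix_count == 0 && e.sentinel) {
			m_map.erase(page);
		}
	}

	bool contains(std::uint32_t page) const
	{
		return m_map.count(page) != 0;
	}

private:
	std::unordered_map<std::uint32_t, Entry> m_map;
};
```
*/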
/********************************************************************//**
Moves a page to the start of the buffer pool LRU list. This high-level
function can be used to prevent an important page from slipping out of
the buffer pool. */
UNIV_INTERN
void
buf_page_make_young(
/*================*/
	buf_page_t*	bpage)	/*!< in: buffer block of a file page */
{
	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
	buf_pool_mutex_enter(buf_pool);
	ut_a(buf_page_in_file(bpage));
	buf_LRU_make_block_young(bpage);
	buf_pool_mutex_exit(buf_pool);
}
/********************************************************************//**
Moves a page to the start of the buffer pool LRU list if it is too old.
This high-level function can be used to prevent an important page from
slipping out of the buffer pool. */
static
void
buf_page_make_young_if_needed(
/*==========================*/
	buf_page_t*	bpage)	/*!< in/out: buffer block of a
				file page */
{
#ifdef UNIV_DEBUG
	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
	ut_ad(!buf_pool_mutex_own(buf_pool));
#endif /* UNIV_DEBUG */
	ut_a(buf_page_in_file(bpage));
	if (buf_page_peek_if_too_old(bpage)) {
		buf_page_make_young(bpage);
	}
}
/********************************************************************//**
Resets the check_index_page_at_flush field of a page if found in the buffer
pool. */
UNIV_INTERN
void
buf_reset_check_index_page_at_flush(
/*================================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	buf_block_t*	block;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	buf_pool_mutex_enter(buf_pool);
	block = (buf_block_t*) buf_page_hash_get(buf_pool, space, offset);
	if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) {
		ut_ad(!buf_pool_watch_is_sentinel(buf_pool, &block->page));
		block->check_index_page_at_flush = FALSE;
	}
	buf_pool_mutex_exit(buf_pool);
}
#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
/********************************************************************//**
Sets file_page_was_freed TRUE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated.
@return control block if found in page hash table, otherwise NULL */
UNIV_INTERN
buf_page_t*
buf_page_set_file_page_was_freed(
/*=============================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	buf_page_t*	bpage;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	rw_lock_t*	hash_lock;
	bpage = buf_page_hash_get_s_locked(buf_pool, space, offset,
					   &hash_lock);
	if (bpage) {
		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
		ut_ad(!buf_pool_watch_is_sentinel(buf_pool, bpage));
		mutex_enter(block_mutex);
		rw_lock_s_unlock(hash_lock);
		/* bpage->file_page_was_freed may already be set
		when this code is invoked from dict_drop_index_tree() */
		bpage->file_page_was_freed = TRUE;
		mutex_exit(block_mutex);
	}
	return(bpage);
}
/********************************************************************//**
Sets file_page_was_freed FALSE if the page is found in the buffer pool.
This function should be called when a freed file page is reallocated,
so that the debug version no longer treats accesses to it as errors.
@return control block if found in page hash table, otherwise NULL */
UNIV_INTERN
buf_page_t*
buf_page_reset_file_page_was_freed(
/*===============================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	buf_page_t*	bpage;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	rw_lock_t*	hash_lock;
	bpage = buf_page_hash_get_s_locked(buf_pool, space, offset,
					   &hash_lock);
	if (bpage) {
		ib_mutex_t*	block_mutex = buf_page_get_mutex(bpage);
		ut_ad(!buf_pool_watch_is_sentinel(buf_pool, bpage));
		mutex_enter(block_mutex);
		rw_lock_s_unlock(hash_lock);
		bpage->file_page_was_freed = FALSE;
		mutex_exit(block_mutex);
	}
	return(bpage);
}
#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */
/********************************************************************//**
Attempts to discard the uncompressed frame of a compressed page. The
caller should not be holding any mutexes when this function is called. */
static
void
buf_block_try_discard_uncompressed(
/*===============================*/
	ulint	space,	/*!< in: space id */
	ulint	offset)	/*!< in: page number */
{
	buf_page_t*	bpage;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	/* Discarding the uncompressed frame requires the buf_pool
	mutex. Because the page_hash latch ranks below the buf_pool
	mutex in the latching order, the page_hash latch must be
	released first, which means that the block can move out of
	page_hash in the meantime. Therefore we must look up the
	block in page_hash again. */
	buf_pool_mutex_enter(buf_pool);
	bpage = buf_page_hash_get(buf_pool, space, offset);
	if (bpage) {
		buf_LRU_free_page(bpage, false);
	}
	buf_pool_mutex_exit(buf_pool);
}
/********************************************************************//**
Get read access to a compressed page (usually of type
FIL_PAGE_TYPE_ZBLOB or FIL_PAGE_TYPE_ZBLOB2).
The page must be released with buf_page_release_zip().
NOTE: the page is not protected by any latch. Mutual exclusion has to
be implemented at a higher level. In other words, all possible
accesses to a given page through this function must be protected by
the same set of mutexes or latches.
@return pointer to the block, or NULL if the page could not be read
or has no compressed page frame */
UNIV_INTERN
buf_page_t*
buf_page_get_zip(
/*=============*/
	ulint	space,	/*!< in: space id */
	ulint	zip_size,/*!< in: compressed page size */
	ulint	offset)	/*!< in: page number */
{
	buf_page_t*	bpage;
	ib_mutex_t*	block_mutex;
	rw_lock_t*	hash_lock;
	ibool		discard_attempted = FALSE;
	ibool		must_read;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);
	buf_pool->stat.n_page_gets++;
	for (;;) {
lookup:
		/* If the page is found, the following call returns
		with the page_hash latch held in shared mode. */
		bpage = buf_page_hash_get_s_locked(buf_pool, space,
						   offset, &hash_lock);
		if (bpage) {
			ut_ad(!buf_pool_watch_is_sentinel(buf_pool, bpage));
			break;
		}
		/* Page not in buf_pool: needs to be read from file */
		ut_ad(!hash_lock);
		dberr_t err = buf_read_page(space, zip_size, offset);
		if (err != DB_SUCCESS) {
			ib_logf(IB_LOG_LEVEL_ERROR,
				"Reading compressed page " ULINTPF
				":" ULINTPF
				" failed with error: %s.",
				space, offset, ut_strerr(err));
			/* hash_lock is not held here. */
			return(NULL);
		}
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 5771 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
	}
	ut_ad(buf_page_hash_lock_held_s(buf_pool, bpage));
	if (!bpage->zip.data) {
		/* There is no compressed page. */
err_exit:
		rw_lock_s_unlock(hash_lock);
		return(NULL);
	}
	ut_ad(!buf_pool_watch_is_sentinel(buf_pool, bpage));
	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_POOL_WATCH:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		block_mutex = &buf_pool->zip_mutex;
		mutex_enter(block_mutex);
#ifdef PAGE_ATOMIC_REF_COUNT
		os_atomic_increment_uint32(&bpage->buf_fix_count, 1);
#else
		++bpage->buf_fix_count;
#endif /* PAGE_ATOMIC_REF_COUNT */
		goto got_block;
	case BUF_BLOCK_FILE_PAGE:
		/* Discard the uncompressed page frame if possible. */
		if (!discard_attempted) {
			rw_lock_s_unlock(hash_lock);
			buf_block_try_discard_uncompressed(space, offset);
			discard_attempted = TRUE;
			goto lookup;
		}
		block_mutex = &((buf_block_t*) bpage)->mutex;
		mutex_enter(block_mutex);
		buf_block_buf_fix_inc((buf_block_t*) bpage, __FILE__, __LINE__);
		goto got_block;
	}
	ut_error;
	goto err_exit;
got_block:
	must_read = buf_page_get_io_fix(bpage) == BUF_IO_READ;
	rw_lock_s_unlock(hash_lock);
#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	ut_a(!bpage->file_page_was_freed);
#endif /* defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG */
	buf_page_set_accessed(bpage);
	mutex_exit(block_mutex);
	buf_page_make_young_if_needed(bpage);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(bpage->buf_fix_count > 0);
	ut_a(buf_page_in_file(bpage));
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
	if (must_read) {
		/* Let us wait until the read operation
		completes */
		for (;;) {
			enum buf_io_fix	io_fix;
			mutex_enter(block_mutex);
			io_fix = buf_page_get_io_fix(bpage);
			mutex_exit(block_mutex);
			if (io_fix == BUF_IO_READ) {
				os_thread_sleep(WAIT_FOR_READ);
			} else {
				break;
			}
		}
	}
#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_page_get_space(bpage),
			    buf_page_get_page_no(bpage)) == 0);
#endif
	return(bpage);
}
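/* Illustrative sketch (not part of InnoDB): the wait at the end of
buf_page_get_zip() is a polling loop that re-reads io_fix under
block_mutex and calls os_thread_sleep(WAIT_FOR_READ) while it is still
BUF_IO_READ. The generic helper wait_while() is hypothetical; it uses
std::this_thread::sleep_for() in place of os_thread_sleep(), and the
predicate is where the mutex-protected io_fix check would go.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: call still_busy() repeatedly, sleeping for
// interval between calls, until it returns false. Returns how many
// times it slept.
template <typename Pred>
unsigned wait_while(Pred still_busy, std::chrono::microseconds interval)
{
	unsigned n_sleeps = 0;
	while (still_busy()) {
		std::this_thread::sleep_for(interval);
		++n_sleeps;
	}
	return n_sleeps;
}
```

As in buf_page_get_zip(), the predicate is evaluated once more after
the last sleep, so the loop exits as soon as the condition clears. */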
/********************************************************************//**
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_block_init_low(
/*===============*/
	buf_block_t*	block)	/*!< in/out: block to initialize */
{
	block->check_index_page_at_flush = FALSE;
	block->index		= NULL;
	block->n_hash_helps	= 0;
	block->n_fields		= 1;
	block->n_bytes		= 0;
	block->left_side	= TRUE;
}
#endif /* !UNIV_HOTBACKUP */

/********************************************************************//**
Decompress a block.
@return TRUE if successful */
UNIV_INTERN
ibool
buf_zip_decompress(
/*===============*/
	buf_block_t*	block,	/*!< in/out: block */
	ibool		check)	/*!< in: TRUE=verify the page checksum */
{
	const byte*	frame = block->page.zip.data;
	ulint		size = page_zip_get_size(&block->page.zip);
	/* Space is not found if this function is called during IMPORT */
	fil_space_t*	space = fil_space_acquire_for_io(block->page.space);
	const unsigned	key_version = mach_read_from_4(
		frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION);
	fil_space_crypt_t* crypt_data = space ? space->crypt_data : NULL;
	const bool	encrypted = crypt_data
		&& crypt_data->type != CRYPT_SCHEME_UNENCRYPTED
		&& (!crypt_data->is_default_encryption()
		    || srv_encrypt_tables);

	ut_ad(buf_block_get_zip_size(block));
	ut_a(buf_block_get_space(block) != 0);

	if (UNIV_UNLIKELY(check && !page_zip_verify_checksum(frame, size))) {
		ib_logf(IB_LOG_LEVEL_ERROR,
			"Compressed page checksum mismatch"
			" for %s [%u:%u]: stored: " ULINTPF ", crc32: " ULINTPF
			" innodb: " ULINTPF ", none: " ULINTPF ".",
			space ? space->chain.start->name : "N/A",
			block->page.space, block->page.offset,
			mach_read_from_4(frame + FIL_PAGE_SPACE_OR_CHKSUM),
			page_zip_calc_checksum(frame, size,
					       SRV_CHECKSUM_ALGORITHM_CRC32),
			page_zip_calc_checksum(frame, size,
					       SRV_CHECKSUM_ALGORITHM_INNODB),
			page_zip_calc_checksum(frame, size,
					       SRV_CHECKSUM_ALGORITHM_NONE));
		goto err_exit;
	}

	switch (fil_page_get_type(frame)) {
	case FIL_PAGE_INDEX: {
		if (page_zip_decompress(&block->page.zip,
					block->frame, TRUE)) {
			if (space) {
				fil_space_release_for_io(space);
			}
			return(TRUE);
		}

		ib_logf(IB_LOG_LEVEL_ERROR,
			"Unable to decompress space %s [%u:%u]",
			space ? space->chain.start->name : "N/A",
			block->page.space,
			block->page.offset);
		goto err_exit;
	}
	case FIL_PAGE_TYPE_ALLOCATED:
	case FIL_PAGE_INODE:
	case FIL_PAGE_IBUF_BITMAP:
	case FIL_PAGE_TYPE_FSP_HDR:
	case FIL_PAGE_TYPE_XDES:
	case FIL_PAGE_TYPE_ZBLOB:
	case FIL_PAGE_TYPE_ZBLOB2:
		/* Copy to uncompressed storage. */
		memcpy(block->frame, frame,
		       buf_block_get_zip_size(block));
		if (space) {
			fil_space_release_for_io(space);
		}
		return(TRUE);
	}

	ib_logf(IB_LOG_LEVEL_ERROR,
		"Unknown compressed page in %s [%u:%u]"
		" type %s [" ULINTPF "].",
		space ? space->chain.start->name : "N/A",
		block->page.space, block->page.offset,
		fil_get_page_type_name(fil_page_get_type(frame)),
		fil_page_get_type(frame));

err_exit:
	if (encrypted) {
		ib_logf(IB_LOG_LEVEL_INFO,
			"Row compressed page could be encrypted with key_version %u.",
			key_version);
		block->page.encrypted = true;
		dict_set_encrypted_by_space(block->page.space);
	} else {
		dict_set_corrupted_by_space(block->page.space);
	}

	if (space) {
		fil_space_release_for_io(space);
	}

	return(FALSE);
}

#ifndef UNIV_HOTBACKUP
/*******************************************************************//**
Gets the block whose frame the pointer points to, if it is found
in this buffer pool instance.
@return pointer to block, or NULL if not found in this instance */
UNIV_INTERN
buf_block_t*
buf_block_align_instance(
/*=====================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool in which the
					block resides */
	const byte*	ptr)		/*!< in: pointer to a frame */
{
	buf_chunk_t*	chunk;
	ulint		i;

	/* TODO: protect buf_pool->chunks with a mutex (it will
	currently remain constant after buf_pool_init()) */
	for (chunk = buf_pool->chunks, i = buf_pool->n_chunks; i--; chunk++) {
		ulint	offs;

		if (UNIV_UNLIKELY(ptr < chunk->blocks->frame)) {
			continue;
		}
		/* else */

		offs = ptr - chunk->blocks->frame;
		offs >>= UNIV_PAGE_SIZE_SHIFT;

		if (UNIV_LIKELY(offs < chunk->size)) {
			buf_block_t*	block = &chunk->blocks[offs];

			/* The function buf_chunk_init() invokes
			buf_block_init() so that block[n].frame ==
			block->frame + n * UNIV_PAGE_SIZE. Check it. */
			ut_ad(block->frame == page_align(ptr));
#ifdef UNIV_DEBUG
			/* A thread that updates these fields must
			hold buf_pool->mutex and block->mutex. Acquire
			only the latter. */
			mutex_enter(&block->mutex);

			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_POOL_WATCH:
			case BUF_BLOCK_ZIP_PAGE:
			case BUF_BLOCK_ZIP_DIRTY:
				/* These types should only be used in
				the compressed buffer pool, whose
				memory is allocated from
				buf_pool->chunks, in UNIV_PAGE_SIZE
				blocks flagged as BUF_BLOCK_MEMORY. */
				ut_error;
				break;
			case BUF_BLOCK_NOT_USED:
			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
				/* Some data structures contain
				"guess" pointers to file pages. The
				file pages may have been freed and
				reused. Do not complain. */
				break;
			case BUF_BLOCK_REMOVE_HASH:
				/* buf_LRU_block_remove_hashed_page()
				will overwrite the FIL_PAGE_OFFSET and
				FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID with
				0xff and set the state to
				BUF_BLOCK_REMOVE_HASH. */
				ut_ad(page_get_space_id(page_align(ptr))
				      == 0xffffffff);
				ut_ad(page_get_page_no(page_align(ptr))
				      == 0xffffffff);
				break;
			case BUF_BLOCK_FILE_PAGE: {
				ulint space = page_get_space_id(page_align(ptr));
				ulint offset = page_get_page_no(page_align(ptr));

				if (block->page.space != space ||
				    block->page.offset != offset) {
					ib_logf(IB_LOG_LEVEL_ERROR,
						"Corruption: Block space_id %lu != page space_id %lu or "
						"Block offset %lu != page offset %lu",
						(ulint)block->page.space, space,
						(ulint)block->page.offset, offset);
				}

				ut_ad(block->page.space
				      == page_get_space_id(page_align(ptr)));
				ut_ad(block->page.offset
				      == page_get_page_no(page_align(ptr)));
				break;
			}
			}

			mutex_exit(&block->mutex);
#endif /* UNIV_DEBUG */

			return(block);
		}
	}

	return(NULL);
}

/*******************************************************************//**
Gets the block whose frame the pointer points to.
@return pointer to block, never NULL */
UNIV_INTERN
buf_block_t*
buf_block_align(
/*============*/
	const byte*	ptr)	/*!< in: pointer to a frame */
{
	ulint	i;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_block_t*	block;

		block = buf_block_align_instance(
			buf_pool_from_array(i), ptr);
		if (block) {
			return(block);
		}
	}

	/* The block should always be found. */
	ut_error;
	return(NULL);
}

/********************************************************************//**
Find out if a pointer belongs to a buf_block_t. It can be a pointer to
the buf_block_t itself or a member of it. This function checks one of
the buffer pool instances.
@return TRUE if ptr belongs to a buf_block_t struct */
static
ibool
buf_pointer_is_block_field_instance(
/*================================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool instance */
	const void*	ptr)		/*!< in: pointer not dereferenced */
{
	const buf_chunk_t*		chunk	= buf_pool->chunks;
	const buf_chunk_t* const	echunk	= chunk + buf_pool->n_chunks;

	/* TODO: protect buf_pool->chunks with a mutex (it will
	currently remain constant after buf_pool_init()) */
	while (chunk < echunk) {
		if (ptr >= (void*) chunk->blocks
		    && ptr < (void*) (chunk->blocks + chunk->size)) {

			return(TRUE);
		}

		chunk++;
	}

	return(FALSE);
}

/********************************************************************//**
Find out if a pointer belongs to a buf_block_t. It can be a pointer to
the buf_block_t itself or a member of it.
@return TRUE if ptr belongs to a buf_block_t struct */
UNIV_INTERN
ibool
buf_pointer_is_block_field(
/*=======================*/
	const void*	ptr)	/*!< in: pointer not dereferenced */
{
	ulint	i;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		ibool	found;

		found = buf_pointer_is_block_field_instance(
			buf_pool_from_array(i), ptr);
		if (found) {
			return(TRUE);
		}
	}

	return(FALSE);
}

/********************************************************************//**
Find out if a buffer block was created by buf_chunk_init().
@return TRUE if "block" has been added to buf_pool->free by buf_chunk_init() */
static
ibool
buf_block_is_uncompressed(
/*======================*/
	buf_pool_t*		buf_pool,	/*!< in: buffer pool instance */
	const buf_block_t*	block)		/*!< in: pointer to block,
						not dereferenced */
{
	if ((((ulint) block) % sizeof *block) != 0) {
		/* The pointer should be aligned. */
		return(FALSE);
	}

	return(buf_pointer_is_block_field_instance(buf_pool, (void*) block));
}

#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
/********************************************************************//**
Return true if probe is enabled.
@return true if probe enabled. */
static
bool
buf_debug_execute_is_force_flush()
/*==============================*/
{
	DBUG_EXECUTE_IF("ib_buf_force_flush", return(true); );

	/* This is used during quiesce testing; we want to ensure maximum
	buffering by the change buffer. */
	if (srv_ibuf_disable_background_merge) {
		return(true);
	}

	return(false);
}
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */

/**
Wait for the block to be read in.
@param block	the block to wait for */
static
void
buf_wait_for_read(buf_block_t* block)
{
	/* Note: For the PAGE_ATOMIC_REF_COUNT case:

	We are using the block->lock to check for IO state (and a dirty read).
	We set the IO_READ state under the protection of the hash_lock
	(and block->mutex). This is safe because another thread can only
	access the block (and check for IO state) after the block has been
	added to the page hashtable. */

	if (buf_block_get_io_fix(block) == BUF_IO_READ) {

		/* Wait until the read operation completes */
		ib_mutex_t*	mutex = buf_page_get_mutex(&block->page);

		for (;;) {
			buf_io_fix	io_fix;

			mutex_enter(mutex);
			io_fix = buf_block_get_io_fix(block);
			mutex_exit(mutex);

			if (io_fix == BUF_IO_READ) {
				/* Wait by temporarily acquiring and
				releasing an S-latch on the block. */
				rw_lock_s_lock(&block->lock);
				rw_lock_s_unlock(&block->lock);
			} else {
				break;
			}
		}
	}
}

/********************************************************************//**
This is the general function used to get access to a database page.
@return pointer to the block or NULL */
UNIV_INTERN
buf_block_t*
buf_page_get_gen(
/*=============*/
	ulint		space,	/*!< in: space id */
	ulint		zip_size,/*!< in: compressed page size in bytes
				or 0 for uncompressed pages */
	ulint		offset,	/*!< in: page number */
	ulint		rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */
	buf_block_t*	guess,	/*!< in: guessed block or NULL */
	ulint		mode,	/*!< in: BUF_GET, BUF_GET_IF_IN_POOL,
				BUF_PEEK_IF_IN_POOL, BUF_GET_NO_LATCH, or
				BUF_GET_IF_IN_POOL_OR_WATCH */
	const char*	file,	/*!< in: file name */
	ulint		line,	/*!< in: line where called */
	mtr_t*		mtr,	/*!< in: mini-transaction */
	dberr_t*	err)	/*!< out: error code */
{
	buf_block_t*	block;
	ulint		fold;
	unsigned	access_time;
	ulint		fix_type;
	rw_lock_t*	hash_lock;
	ulint		retries = 0;
	buf_block_t*	fix_block;
	ib_mutex_t*	fix_mutex = NULL;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);

	ut_ad(mtr);
	ut_ad(mtr->state == MTR_ACTIVE);
	ut_ad((rw_latch == RW_S_LATCH)
	      || (rw_latch == RW_X_LATCH)
	      || (rw_latch == RW_NO_LATCH));

	if (err) {
		*err = DB_SUCCESS;
	}

#ifdef UNIV_DEBUG
	switch (mode) {
	case BUF_GET_NO_LATCH:
		ut_ad(rw_latch == RW_NO_LATCH);
		break;
	case BUF_GET:
	case BUF_GET_IF_IN_POOL:
	case BUF_PEEK_IF_IN_POOL:
	case BUF_GET_IF_IN_POOL_OR_WATCH:
	case BUF_GET_POSSIBLY_FREED:
		break;
	default:
		ut_error;
	}
#endif /* UNIV_DEBUG */
	ut_ad(zip_size == fil_space_get_zip_size(space));
	ut_ad(ut_is_2pow(zip_size));
#ifndef UNIV_LOG_DEBUG
	ut_ad(!ibuf_inside(mtr)
	      || ibuf_page_low(space, zip_size, offset,
			       FALSE, file, line, NULL));
#endif
	buf_pool->stat.n_page_gets++;
	fold = buf_page_address_fold(space, offset);
	hash_lock = buf_page_hash_lock_get(buf_pool, fold);
loop:
	block = guess;

	rw_lock_s_lock(hash_lock);

	if (block != NULL) {

		/* If the guess is a compressed page descriptor that
		has been allocated by buf_page_alloc_descriptor(),
		it may have been freed by buf_relocate(). */

		if (!buf_block_is_uncompressed(buf_pool, block)
		    || offset != block->page.offset
		    || space != block->page.space
		    || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {

			/* Our guess was bogus or things have changed
			since. */
			block = guess = NULL;
		} else {
			ut_ad(!block->page.in_zip_hash);
		}
	}

	if (block == NULL) {
		block = (buf_block_t*) buf_page_hash_get_low(
			buf_pool, space, offset, fold);
	}

	if (!block || buf_pool_watch_is_sentinel(buf_pool, &block->page)) {
		rw_lock_s_unlock(hash_lock);
		block = NULL;
	}

	if (block == NULL) {
		/* Page not in buf_pool: needs to be read from file */

		if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) {
			rw_lock_x_lock(hash_lock);
			block = (buf_block_t*) buf_pool_watch_set(
				space, offset, fold);

			if (UNIV_LIKELY_NULL(block)) {
				/* We can release hash_lock after we
				increment the fix count to make
				sure that no state change takes place. */
				fix_block = block;
				buf_block_fix(fix_block);

				/* Now safe to release page_hash mutex */
				rw_lock_x_unlock(hash_lock);
				goto got_block;
			}

			rw_lock_x_unlock(hash_lock);
		}

		if (mode == BUF_GET_IF_IN_POOL
		    || mode == BUF_PEEK_IF_IN_POOL
		    || mode == BUF_GET_IF_IN_POOL_OR_WATCH) {
#ifdef UNIV_SYNC_DEBUG
			ut_ad(!rw_lock_own(hash_lock, RW_LOCK_EX));
			ut_ad(!rw_lock_own(hash_lock, RW_LOCK_SHARED));
#endif /* UNIV_SYNC_DEBUG */
			return(NULL);
		}

		/* The call path is buf_read_page() ->
		buf_read_page_low() (_fil_io()) -> buf_page_io_complete()
		-> buf_decrypt_after_read(), where fil_space_t* is used and
		the page is decrypted, -> buf_page_check_corrupt(), where
		the page checksums are compared. Decryption, decompression
		and error handling are done at a lower level; here we only
		need to know whether the page is really corrupted or is an
		encrypted page with a correct checksum. */
		dberr_t	local_err = buf_read_page(space, zip_size, offset);

		if (local_err == DB_SUCCESS) {
			buf_read_ahead_random(space, zip_size, offset,
					      ibuf_inside(mtr));

			retries = 0;
		} else if (retries < BUF_PAGE_READ_MAX_RETRIES) {
			++retries;

			DBUG_EXECUTE_IF(
				"innodb_page_corruption_retries",
				retries = BUF_PAGE_READ_MAX_RETRIES;
			);
		} else {
			if (err) {
				*err = local_err;
			}

			/* Pages whose encryption key is unavailable, or
			whose key, encryption algorithm or encryption
			method is incorrect, are marked as encrypted in
			buf_page_check_corrupt(). An unencrypted page could
			be corrupted in a way where the key_id field is
			nonzero. There is no checksum on the field
			FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION. */
			if (local_err == DB_DECRYPTION_FAILED) {
				return (NULL);
			}

			/* Try to mark the table as corrupted instead of
			asserting. */
			if (space > TRX_SYS_SPACE &&
			    dict_set_corrupted_by_space(space)) {
				return (NULL);
			}

			ib_logf(IB_LOG_LEVEL_FATAL, "Unable"
				" to read tablespace " ULINTPF " page no "
				ULINTPF " into the buffer pool after "
				ULINTPF " attempts."
				" The most probable cause"
				" of this error may be that the"
				" table has been corrupted."
				" You can try to fix this"
				" problem by using"
				" innodb_force_recovery."
				" Please see " REFMAN " for more"
				" details. Aborting...",
				space, offset,
				BUF_PAGE_READ_MAX_RETRIES);
		}

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 5771 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
		goto loop;
	} else {
		fix_block = block;
	}

	buf_block_fix(fix_block);

	/* Now safe to release page_hash mutex */
	rw_lock_s_unlock(hash_lock);

got_block:
	fix_mutex = buf_page_get_mutex(&fix_block->page);

	ut_ad(page_zip_get_size(&block->page.zip) == zip_size);

	if (mode == BUF_GET_IF_IN_POOL || mode == BUF_PEEK_IF_IN_POOL) {

		bool	must_read;

		{
			buf_page_t*	fix_page = &fix_block->page;

			mutex_enter(fix_mutex);

			buf_io_fix	io_fix = buf_page_get_io_fix(fix_page);

			must_read = (io_fix == BUF_IO_READ);

			mutex_exit(fix_mutex);
		}

		if (must_read) {
			/* The page is being read into the buffer pool,
			but we cannot wait around for the read to
			complete. */
			buf_block_unfix(fix_block);

			return(NULL);
		}
	}

	switch(buf_block_get_state(fix_block)) {
		buf_page_t*	bpage;

	case BUF_BLOCK_FILE_PAGE:
		break;

	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		if (mode == BUF_PEEK_IF_IN_POOL) {
			/* This mode is only used for dropping an
			adaptive hash index. There cannot be an
			adaptive hash index for a compressed-only
			page, so do not bother decompressing the page. */
			buf_block_unfix(fix_block);

			return(NULL);
		}

		bpage = &block->page;

		/* Note: We have already buffer fixed this block. */
		if (bpage->buf_fix_count > 1
		    || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {

			/* This condition often occurs when the block
			is not buffer-fixed, but I/O-fixed by
			buf_page_init_for_read(). */
			buf_block_unfix(fix_block);

			/* The block is buffer-fixed or I/O-fixed.
			Try again later. */
			os_thread_sleep(WAIT_FOR_READ);

			goto loop;
		}

		/* The block is already buffer-fixed, so that it cannot
		be evicted or relocated while we are attempting to
		allocate an uncompressed page. */
		block = buf_LRU_get_free_block(buf_pool);

		buf_pool_mutex_enter(buf_pool);

		rw_lock_x_lock(hash_lock);

		/* Buffer-fixing prevents the page_hash from changing. */
		ut_ad(bpage == buf_page_hash_get_low(
			      buf_pool, space, offset, fold));

		buf_block_mutex_enter(block);

		mutex_enter(&buf_pool->zip_mutex);

		ut_ad(fix_block->page.buf_fix_count > 0);

#ifdef PAGE_ATOMIC_REF_COUNT
		os_atomic_decrement_uint32(&fix_block->page.buf_fix_count, 1);
#else
		--fix_block->page.buf_fix_count;
#endif /* PAGE_ATOMIC_REF_COUNT */

		fix_block = block;

		if (bpage->buf_fix_count > 0
		    || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {

			mutex_exit(&buf_pool->zip_mutex);
			/* The block was buffer-fixed or I/O-fixed while
			buf_pool->mutex was not held by this thread.
			Free the block that was allocated and retry.
			This should be extremely unlikely, for example,
			if buf_page_get_zip() was invoked. */

			buf_LRU_block_free_non_file_page(block);
			buf_pool_mutex_exit(buf_pool);
			rw_lock_x_unlock(hash_lock);
			buf_block_mutex_exit(block);

			/* Try again */
			goto loop;
		}

		/* Move the compressed page from bpage to block,
		and uncompress it. */

		/* Note: this is the uncompressed block and it is not
		accessible by other threads yet because it is not in
		any list or hash table */
		buf_relocate(bpage, &block->page);

		buf_block_init_low(block);

		/* Set after relocate(). */
		block->page.buf_fix_count = 1;

		block->lock_hash_val = lock_rec_hash(space, offset);

		UNIV_MEM_DESC(&block->page.zip.data,
			      page_zip_get_size(&block->page.zip));

		if (buf_page_get_state(&block->page) == BUF_BLOCK_ZIP_PAGE) {
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
			UT_LIST_REMOVE(list, buf_pool->zip_clean,
				       &block->page);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
			ut_ad(!block->page.in_flush_list);
		} else {
			/* Relocate buf_pool->flush_list. */
			buf_flush_relocate_on_flush_list(bpage, &block->page);
		}

		/* Buffer-fix, I/O-fix, and X-latch the block
		for the duration of the decompression.
		Also add the block to the unzip_LRU list. */
		block->page.state = BUF_BLOCK_FILE_PAGE;

		/* Insert at the front of the unzip_LRU list */
		buf_unzip_LRU_add_block(block, FALSE);

		buf_block_set_io_fix(block, BUF_IO_READ);
		rw_lock_x_lock_inline(&block->lock, 0, file, line);

		UNIV_MEM_INVALID(bpage, sizeof *bpage);

		rw_lock_x_unlock(hash_lock);

		++buf_pool->n_pend_unzip;

		mutex_exit(&buf_pool->zip_mutex);
		buf_pool_mutex_exit(buf_pool);

		access_time = buf_page_is_accessed(&block->page);

		buf_block_mutex_exit(block);

		buf_page_free_descriptor(bpage);

		/* Decompress the page while not holding
		buf_pool->mutex or block->mutex. */
		{
			bool	success = buf_zip_decompress(block, TRUE);

			if (!success) {
				buf_pool_mutex_enter(buf_pool);
				buf_block_mutex_enter(fix_block);
				buf_block_set_io_fix(fix_block, BUF_IO_NONE);
				buf_block_mutex_exit(fix_block);

				--buf_pool->n_pend_unzip;
				buf_block_unfix(fix_block);
				buf_pool_mutex_exit(buf_pool);
				rw_lock_x_unlock(&fix_block->lock);

				return NULL;
			}
		}

		if (!recv_no_ibuf_operations) {
			if (access_time) {
#ifdef UNIV_IBUF_COUNT_DEBUG
				ut_a(ibuf_count_get(space, offset) == 0);
#endif /* UNIV_IBUF_COUNT_DEBUG */
			} else {
				ibuf_merge_or_delete_for_page(
					block, space, offset, zip_size, TRUE);
			}
		}

		buf_pool_mutex_enter(buf_pool);

		/* Unfix and unlatch the block. */
		buf_block_mutex_enter(fix_block);

		buf_block_set_io_fix(fix_block, BUF_IO_NONE);

		buf_block_mutex_exit(fix_block);

		--buf_pool->n_pend_unzip;

		buf_pool_mutex_exit(buf_pool);

		rw_lock_x_unlock(&block->lock);

		break;

	case BUF_BLOCK_POOL_WATCH:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
		break;
	}

	ut_ad(block == fix_block);
	ut_ad(fix_block->page.buf_fix_count > 0);

#ifdef UNIV_SYNC_DEBUG
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_EX));
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_SHARED));
#endif /* UNIV_SYNC_DEBUG */

	ut_ad(buf_block_get_state(fix_block) == BUF_BLOCK_FILE_PAGE);

#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
	if ((mode == BUF_GET_IF_IN_POOL || mode == BUF_GET_IF_IN_POOL_OR_WATCH)
	    && (ibuf_debug || buf_debug_execute_is_force_flush())) {

		/* Try to evict the block from the buffer pool, to use the
		insert buffer (change buffer) as much as possible. */

		buf_pool_mutex_enter(buf_pool);

		buf_block_unfix(fix_block);

		/* Now we are only holding the buf_pool->mutex,
		not block->mutex or hash_lock. Blocks cannot be
		relocated or enter or exit the buf_pool while we
		are holding the buf_pool->mutex. */

		if (buf_LRU_free_page(&fix_block->page, true)) {
			buf_pool_mutex_exit(buf_pool);
			rw_lock_x_lock(hash_lock);

			if (mode == BUF_GET_IF_IN_POOL_OR_WATCH) {
				/* Set the watch, as it would have
				been set if the page were not in the
				buffer pool in the first place. */
				block = (buf_block_t*) buf_pool_watch_set(
					space, offset, fold);
			} else {
				block = (buf_block_t*) buf_page_hash_get_low(
					buf_pool, space, offset, fold);
			}

			rw_lock_x_unlock(hash_lock);

			if (block != NULL) {
				/* Either the page has been read in or
				a watch was set on that in the window
				where we released the buf_pool::mutex
				and before we acquired the hash_lock
				above. Try again. */
				guess = block;
				goto loop;
			}

			fprintf(stderr,
				"innodb_change_buffering_debug evict %u %u\n",
				(unsigned) space, (unsigned) offset);
			return(NULL);
		}

		mutex_enter(&fix_block->mutex);

		if (buf_flush_page_try(buf_pool, fix_block)) {
			fprintf(stderr,
				"innodb_change_buffering_debug flush %u %u\n",
				(unsigned) space, (unsigned) offset);
			guess = fix_block;
			goto loop;
		}

		buf_block_mutex_exit(fix_block);

		buf_block_fix(fix_block);

		/* Failed to evict the page; change it directly */

		buf_pool_mutex_exit(buf_pool);
	}
#endif /* UNIV_DEBUG || UNIV_IBUF_DEBUG */

	ut_ad(fix_block->page.buf_fix_count > 0);

#ifdef UNIV_SYNC_DEBUG
	/* We have already buffer fixed the page, and we are committed to
	returning this page to the caller. Register for debugging. */
	{
		ibool	ret;
		ret = rw_lock_s_lock_nowait(&fix_block->debug_latch, file, line);
		ut_a(ret);
	}
#endif /* UNIV_SYNC_DEBUG */

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	ut_a(mode == BUF_GET_POSSIBLY_FREED
	     || !fix_block->page.file_page_was_freed);
#endif
	/* Check if this is the first access to the page */
	access_time = buf_page_is_accessed(&fix_block->page);

	/* This is a heuristic and we don't care about ordering issues. */
	if (access_time == 0) {
		buf_block_mutex_enter(fix_block);

		buf_page_set_accessed(&fix_block->page);

		buf_block_mutex_exit(fix_block);
	}

	if (mode != BUF_PEEK_IF_IN_POOL) {
		buf_page_make_young_if_needed(&fix_block->page);
	}

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(fix_block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(fix_block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

#ifdef PAGE_ATOMIC_REF_COUNT
	/* We have to wait here because the IO_READ state was set
	under the protection of the hash_lock and the block->mutex
	but not the block->lock. */
	buf_wait_for_read(fix_block);
#endif /* PAGE_ATOMIC_REF_COUNT */

	switch (rw_latch) {
	case RW_NO_LATCH:

#ifndef PAGE_ATOMIC_REF_COUNT
		buf_wait_for_read(fix_block);
#endif /* !PAGE_ATOMIC_REF_COUNT */

		fix_type = MTR_MEMO_BUF_FIX;
		break;

	case RW_S_LATCH:
		rw_lock_s_lock_inline(&fix_block->lock, 0, file, line);

		fix_type = MTR_MEMO_PAGE_S_FIX;
		break;

	default:
		ut_ad(rw_latch == RW_X_LATCH);
		rw_lock_x_lock_inline(&fix_block->lock, 0, file, line);

		fix_type = MTR_MEMO_PAGE_X_FIX;
		break;
	}

	mtr_memo_push(mtr, fix_block, fix_type);

	if (mode != BUF_PEEK_IF_IN_POOL && !access_time) {
		/* In the case of a first access, try to apply linear
		read-ahead */

		buf_read_ahead_linear(
			space, zip_size, offset, ibuf_inside(mtr));
	}

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(fix_block),
			    buf_block_get_page_no(fix_block)) == 0);
#endif
#ifdef UNIV_SYNC_DEBUG
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_EX));
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_SHARED));
#endif /* UNIV_SYNC_DEBUG */
	return(fix_block);
}
/********************************************************************//**
This is the general function used to get optimistic access to a database
page. The caller passes a block it guessed earlier, together with the
modify clock value it recorded at that time. The access succeeds only if
the block can be latched without waiting and its modify clock has not
changed since then.
@return TRUE if success */
UNIV_INTERN
ibool
buf_page_optimistic_get(
/*====================*/
	ulint		rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH */
	buf_block_t*	block,	/*!< in: guessed buffer block */
	ib_uint64_t	modify_clock,/*!< in: modify clock value */
	const char*	file,	/*!< in: file name */
	ulint		line,	/*!< in: line where called */
	mtr_t*		mtr)	/*!< in: mini-transaction */
{
	buf_pool_t*	buf_pool;
	unsigned	access_time;
	ibool		success;
	ulint		fix_type;

	ut_ad(block);
	ut_ad(mtr);
	ut_ad(mtr->state == MTR_ACTIVE);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));

	mutex_enter(&block->mutex);

	if (UNIV_UNLIKELY(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE)) {

		mutex_exit(&block->mutex);

		return(FALSE);
	}

	buf_block_buf_fix_inc(block, file, line);

	access_time = buf_page_is_accessed(&block->page);

	buf_page_set_accessed(&block->page);

	mutex_exit(&block->mutex);

	buf_page_make_young_if_needed(&block->page);

	ut_ad(!ibuf_inside(mtr)
	      || ibuf_page(buf_block_get_space(block),
			   buf_block_get_zip_size(block),
			   buf_block_get_page_no(block), NULL));

	if (rw_latch == RW_S_LATCH) {
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait_inline(&(block->lock),
							     file, line);
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}

	if (UNIV_UNLIKELY(!success)) {
		buf_block_buf_fix_dec(block);

		return(FALSE);
	}

	if (UNIV_UNLIKELY(modify_clock != block->modify_clock)) {
		buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);

		if (rw_latch == RW_S_LATCH) {
			rw_lock_s_unlock(&(block->lock));
		} else {
			rw_lock_x_unlock(&(block->lock));
		}

		buf_block_buf_fix_dec(block);

		return(FALSE);
	}

	mtr_memo_push(mtr, block, fix_type);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	mutex_enter(&block->mutex);
	ut_a(!block->page.file_page_was_freed);
	mutex_exit(&block->mutex);
#endif

	if (!access_time) {
		/* In the case of a first access, try to apply linear
		read-ahead */
		buf_read_ahead_linear(buf_block_get_space(block),
				      buf_block_get_zip_size(block),
				      buf_block_get_page_no(block),
				      ibuf_inside(mtr));
	}

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	buf_pool = buf_pool_from_block(block);
	buf_pool->stat.n_page_gets++;

	return(TRUE);
}
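buf_page_optimistic_get() validates a remembered block through a version counter. The caller saves `block->modify_clock` while it holds a latch. Later it pins the block, re-latches it without waiting, and accepts the block only if the counter is unchanged. Every failure path undoes both the latch and the pin.

The following is a minimal single-threaded sketch of that pattern. `ToyRwLock`, `ToyBlock`, `toy_optimistic_get()` and `toy_release()` are hypothetical stand-ins, not InnoDB APIs. The toy lock only models the "nowait" behaviour and is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, single-threaded stand-in for an InnoDB rw-lock.
// It only models "try" (nowait) acquisition and is NOT thread-safe.
struct ToyRwLock {
	int  readers = 0;
	bool writer  = false;

	bool try_s_lock()
	{
		if (writer) {
			return false;
		}
		++readers;
		return true;
	}
	void s_unlock() { --readers; }
};

// Hypothetical stand-in for buf_block_t with only the fields used here.
struct ToyBlock {
	ToyRwLock     lock;
	std::uint64_t modify_clock  = 0; // bumped when the page may have moved
	unsigned      buf_fix_count = 0; // pins the block in the pool
};

// Pin, try to latch without waiting, then accept the block only if
// modify_clock still equals the value the caller saved earlier.
bool toy_optimistic_get(ToyBlock& block, std::uint64_t saved_clock)
{
	++block.buf_fix_count;              // cf. buf_block_buf_fix_inc()

	if (!block.lock.try_s_lock()) {     // cf. rw_lock_s_lock_nowait()
		--block.buf_fix_count;          // cf. buf_block_buf_fix_dec()
		return false;
	}

	if (block.modify_clock != saved_clock) {
		block.lock.s_unlock();          // stale guess: back out fully
		--block.buf_fix_count;
		return false;
	}

	return true;                        // caller holds latch and pin
}

void toy_release(ToyBlock& block)
{
	block.lock.s_unlock();
	--block.buf_fix_count;
}
```

The pin is taken before the latch attempt so that the block cannot be evicted while it is being inspected. This mirrors the order used in the function above.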
/********************************************************************//**
Gets access to a known database page when no waiting can be done, for
example when a search in the adaptive hash index leads to this frame.
@return TRUE if success */
UNIV_INTERN
ibool
buf_page_get_known_nowait(
/*======================*/
	ulint		rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH */
	buf_block_t*	block,	/*!< in: the known page */
	ulint		mode,	/*!< in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */
	const char*	file,	/*!< in: file name */
	ulint		line,	/*!< in: line where called */
	mtr_t*		mtr)	/*!< in: mini-transaction */
{
	buf_pool_t*	buf_pool;
	ibool		success;
	ulint		fix_type;

	ut_ad(mtr);
	ut_ad(mtr->state == MTR_ACTIVE);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));

	mutex_enter(&block->mutex);

	if (buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH) {
		/* Another thread is just freeing the block from the LRU list
		of the buffer pool: do not try to access this page. This
		attempt to access the page can only come through the hash
		index, because when the buffer block state is ..._REMOVE_HASH,
		we have already removed it from the page address hash table
		of the buffer pool. */

		mutex_exit(&block->mutex);

		return(FALSE);
	}

	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);

	buf_block_buf_fix_inc(block, file, line);

	buf_page_set_accessed(&block->page);

	mutex_exit(&block->mutex);

	buf_pool = buf_pool_from_block(block);

	if (mode == BUF_MAKE_YOUNG) {
		buf_page_make_young_if_needed(&block->page);
	}

	ut_ad(!ibuf_inside(mtr) || mode == BUF_KEEP_OLD);

	if (rw_latch == RW_S_LATCH) {
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait_inline(&(block->lock),
							     file, line);
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}

	if (!success) {
		buf_block_buf_fix_dec(block);

		return(FALSE);
	}

	mtr_memo_push(mtr, block, fix_type);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	if (mode != BUF_KEEP_OLD) {
		/* If mode == BUF_KEEP_OLD, we are executing an I/O
		completion routine. Avoid a bogus assertion failure
		when ibuf_merge_or_delete_for_page() is processing a
		page that was just freed due to DROP INDEX, or
		deleting a record from SYS_INDEXES. This check will be
		skipped in recv_recover_page() as well. */

		mutex_enter(&block->mutex);
		ut_a(!block->page.file_page_was_freed);
		mutex_exit(&block->mutex);
	}
#endif

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a((mode == BUF_KEEP_OLD)
	     || (ibuf_count_get(buf_block_get_space(block),
				buf_block_get_page_no(block)) == 0));
#endif
	buf_pool->stat.n_page_gets++;

	return(TRUE);
}
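buf_page_get_known_nowait() refuses a block in the REMOVE_HASH state. Otherwise it buffer-fixes the block, and only in BUF_MAKE_YOUNG mode may it move the block toward the young end of the LRU list. BUF_KEEP_OLD leaves the LRU position alone.

The following is a minimal sketch of just that decision, using a `std::list` as the LRU list. `ToyState`, `ToyMode`, `ToyPage`, `ToyLru` and `toy_get_known_nowait()` are hypothetical names. The latching step that follows in the real function is omitted.

```cpp
#include <cassert>
#include <list>

// Hypothetical subset of the block states and modes used above.
enum class ToyState { FILE_PAGE, REMOVE_HASH };
enum class ToyMode  { MAKE_YOUNG, KEEP_OLD };

struct ToyPage {
	int      id;
	ToyState state;
	unsigned buf_fix_count = 0;
};

// LRU list: front() is the "young" (most recently used) end.
using ToyLru = std::list<ToyPage*>;

// Refuse a block that is being evicted; otherwise pin it and, only in
// MAKE_YOUNG mode, move it to the young end of the LRU list.
bool toy_get_known_nowait(ToyLru& lru, ToyPage& page, ToyMode mode)
{
	if (page.state == ToyState::REMOVE_HASH) {
		return false;   // already gone from the page hash
	}

	++page.buf_fix_count;

	if (mode == ToyMode::MAKE_YOUNG) {
		// Simplification: buf_page_make_young_if_needed() moves
		// the page only when it is considered old enough.
		lru.remove(&page);
		lru.push_front(&page);
	}
	return true;
}
```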
/*******************************************************************//**
Given a tablespace id and page number, tries to get that page. If the
page is not in the buffer pool, it is not loaded and NULL is returned.
Suitable for use when holding the lock_sys_t::mutex.
@return pointer to a page or NULL */
UNIV_INTERN
buf_block_t*
buf_page_try_get_func(
/*==================*/
	ulint		space_id,/*!< in: tablespace id */
	ulint		page_no,/*!< in: page number */
	ulint		rw_latch,/*!< in: RW_S_LATCH, RW_X_LATCH */
	bool		possibly_freed,/*!< in: whether the page may have
				been freed; if true, the debug check of
				file_page_was_freed is skipped */
	const char*	file,	/*!< in: file name */
	ulint		line,	/*!< in: line where called */
	mtr_t*		mtr)	/*!< in: mini-transaction */
{
	buf_block_t*	block;
	ibool		success;
	ulint		fix_type;
	buf_pool_t*	buf_pool = buf_pool_get(space_id, page_no);
	rw_lock_t*	hash_lock;

	ut_ad(mtr);
	ut_ad(mtr->state == MTR_ACTIVE);

	block = buf_block_hash_get_s_locked(buf_pool, space_id,
					    page_no, &hash_lock);

	if (!block || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
		if (block) {
			rw_lock_s_unlock(hash_lock);
		}
		return(NULL);
	}

	ut_ad(!buf_pool_watch_is_sentinel(buf_pool, &block->page));

	mutex_enter(&block->mutex);
	rw_lock_s_unlock(hash_lock);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
	ut_a(buf_block_get_space(block) == space_id);
	ut_a(buf_block_get_page_no(block) == page_no);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

	buf_block_buf_fix_inc(block, file, line);
	mutex_exit(&block->mutex);

	if (rw_latch == RW_S_LATCH) {
		fix_type = MTR_MEMO_PAGE_S_FIX;
		success = rw_lock_s_lock_nowait(&block->lock, file, line);
	} else {
		success = FALSE;
	}

	if (!success) {
		/* Let us try to get an X-latch. If the current thread
		is holding an X-latch on the page, we cannot get an
		S-latch. */

		fix_type = MTR_MEMO_PAGE_X_FIX;
		success = rw_lock_x_lock_func_nowait_inline(&block->lock,
							     file, line);
	}

	if (!success) {
		buf_block_buf_fix_dec(block);

		return(NULL);
	}

	mtr_memo_push(mtr, block, fix_type);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	if (!possibly_freed) {
		mutex_enter(&block->mutex);
		ut_a(!block->page.file_page_was_freed);
		mutex_exit(&block->mutex);
	}
#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */

	buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);

	buf_pool->stat.n_page_gets++;

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	return(block);
}
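buf_page_try_get_func() falls back from an S-latch to an X-latch. An InnoDB S-latch attempt fails while any thread holds the X-latch, including the current thread, but the X-latch is recursive for its owner. So if the current thread already holds X, the X attempt still succeeds.

The following is a minimal single-threaded model of that behaviour. `ToyRwLock`, `ToyFix` and `toy_try_get()` are hypothetical, and thread identity is modelled as a plain `int`.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of an InnoDB-style rw-lock whose
// X-latch is recursive for its owner. NOT thread-safe.
struct ToyRwLock {
	int      readers = 0;
	int      x_owner = 0;   // 0 = no X holder, otherwise a thread id
	unsigned x_depth = 0;   // recursion depth of the X-latch

	bool try_s_lock()
	{
		if (x_owner != 0) {
			return false;   // refused even for the X owner
		}
		++readers;
		return true;
	}
	bool try_x_lock(int thread)
	{
		if (readers != 0) {
			return false;
		}
		if (x_owner != 0 && x_owner != thread) {
			return false;
		}
		x_owner = thread;
		++x_depth;
		return true;
	}
};

enum class ToyFix { NONE, S_FIX, X_FIX };

// Try an S-latch if one was requested; if that fails (or an X-latch was
// requested), try a possibly recursive X-latch.
ToyFix toy_try_get(ToyRwLock& lock, int thread, bool want_s)
{
	if (want_s && lock.try_s_lock()) {
		return ToyFix::S_FIX;
	}
	if (lock.try_x_lock(thread)) {
		return ToyFix::X_FIX;
	}
	return ToyFix::NONE;
}
```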
/********************************************************************//**
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_page_init_low(
/*==============*/
	buf_page_t*	bpage)	/*!< in: block to init */
{
	bpage->flush_type = BUF_FLUSH_LRU;
	bpage->io_fix = BUF_IO_NONE;
	bpage->buf_fix_count = 0;
	bpage->old = 0;
	bpage->freed_page_clock = 0;
	bpage->access_time = 0;
	bpage->newest_modification = 0;
	bpage->oldest_modification = 0;
	bpage->write_size = 0;
	bpage->encrypted = false;
	bpage->real_size = 0;
	bpage->slot = NULL;

	HASH_INVALIDATE(bpage, hash);

#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
	bpage->file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */
}
/********************************************************************//**
Initializes a page in the buffer pool and inserts it into
buf_pool->page_hash. */
static MY_ATTRIBUTE((nonnull))
void
buf_page_init(
/*==========*/
	buf_pool_t*	buf_pool,/*!< in/out: buffer pool */
	ulint		space,	/*!< in: space id */
	ulint		offset,	/*!< in: offset of the page within space
				in units of a page */
	ulint		fold,	/*!< in: buf_page_address_fold(space,offset) */
	ulint		zip_size,/*!< in: compressed page size, or 0 */
	buf_block_t*	block)	/*!< in/out: block to init */
{
	buf_page_t*	hash_page;

	ut_ad(buf_pool == buf_pool_get(space, offset));
	ut_ad(buf_pool_mutex_own(buf_pool));
	ut_ad(mutex_own(&(block->mutex)));
	ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);

#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(buf_page_hash_lock_get(buf_pool, fold),
			  RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */

	/* Set the state of the block */
	buf_block_set_file_page(block, space, offset);

#ifdef UNIV_DEBUG_VALGRIND
	if (!space) {
		/* Silence valid Valgrind warnings about uninitialized
		data being written to data files. There are some unused
		bytes on some pages that InnoDB does not initialize. */
		UNIV_MEM_VALID(block->frame, UNIV_PAGE_SIZE);
	}
#endif /* UNIV_DEBUG_VALGRIND */

	buf_block_init_low(block);

	block->lock_hash_val = lock_rec_hash(space, offset);

	buf_page_init_low(&block->page);

	/* Insert into the hash table of file pages */

	hash_page = buf_page_hash_get_low(buf_pool, space, offset, fold);

	if (hash_page == NULL) {
		/* Block not found in the hash table */
	} else if (buf_pool_watch_is_sentinel(buf_pool, hash_page)) {
		/* Preserve the reference count of the watch sentinel. */
		ib_uint32_t	buf_fix_count = hash_page->buf_fix_count;

		ut_a(buf_fix_count > 0);

#ifdef PAGE_ATOMIC_REF_COUNT
		os_atomic_increment_uint32(
			&block->page.buf_fix_count, buf_fix_count);
#else
		block->page.buf_fix_count += ulint(buf_fix_count);
#endif /* PAGE_ATOMIC_REF_COUNT */

		buf_pool_watch_remove(buf_pool, fold, hash_page);
	} else {
		fprintf(stderr,
			"InnoDB: Error: page %lu %lu already found"
			" in the hash table: %p, %p\n",
			space,
			offset,
			(const void*) hash_page, (const void*) block);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		mutex_exit(&block->mutex);
		buf_pool_mutex_exit(buf_pool);
		buf_print();
		buf_LRU_print();
		buf_validate();
		buf_LRU_validate();
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
		ut_error;
	}

	ut_ad(!block->page.in_zip_hash);
	ut_ad(!block->page.in_page_hash);
	ut_d(block->page.in_page_hash = TRUE);

	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, fold, &block->page);

	if (zip_size) {
		page_zip_set_size(&block->page.zip, zip_size);
	}
}
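When buf_page_init() finds a watch sentinel for the same page in the page hash, it takes over the sentinel's buffer-fix count and removes the sentinel before inserting the real block. Any other existing entry is a fatal error.

The following is a minimal sketch of that replacement using `std::unordered_map`. It assumes a packed 64-bit (space, offset) key in place of buf_page_address_fold(), which is a hash value rather than a unique key. `ToyPage`, `toy_key()`, `ToyPageHash` and `toy_page_init()` are hypothetical, and an exception stands in for `ut_error`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical page descriptor: a watch sentinel is a placeholder that
// other threads buffer-fix while waiting for the page to appear.
struct ToyPage {
	bool          sentinel      = false;
	std::uint32_t buf_fix_count = 0;
};

// Assumption: a packed (space, offset) key instead of a fold value.
using ToyKey = std::uint64_t;

inline ToyKey toy_key(std::uint32_t space, std::uint32_t offset)
{
	return (static_cast<ToyKey>(space) << 32) | offset;
}

using ToyPageHash = std::unordered_map<ToyKey, ToyPage*>;

// Insert block for (space, offset). A watch sentinel is replaced and its
// buffer-fix count carried over; any other existing entry is an error.
void toy_page_init(ToyPageHash& hash, std::uint32_t space,
		   std::uint32_t offset, ToyPage* block)
{
	const ToyKey key = toy_key(space, offset);
	auto it = hash.find(key);

	if (it != hash.end()) {
		if (!it->second->sentinel) {
			throw std::logic_error("page already in hash table");
		}
		// Waiters keep their buffer-fixes on the real block.
		block->buf_fix_count += it->second->buf_fix_count;
		hash.erase(it);   // cf. buf_pool_watch_remove()
	}
	hash.emplace(key, block);
}
```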
/********************************************************************//**
Initializes a page in the buffer pool for reading. This function does
nothing if
(1) the page is already in buf_pool,
(2) only ibuf pages were requested and the page is not an ibuf page, or
(3) the tablespace has been deleted or is being deleted.
Otherwise, it sets the io_fix flag to BUF_IO_READ and, if an uncompressed
frame was allocated, sets a non-recursive exclusive lock on that frame.
The I/O handler must clear the flag and release the lock later.
@return pointer to the block or NULL */
UNIV_INTERN
buf_page_t*
buf_page_init_for_read(
/*===================*/
	dberr_t*	err,	/*!< out: DB_SUCCESS or DB_TABLESPACE_DELETED */
	ulint		mode,	/*!< in: BUF_READ_IBUF_PAGES_ONLY, ... */
	ulint		space,	/*!< in: space id */
	ulint		zip_size,/*!< in: compressed page size, or 0 */
	ibool		unzip,	/*!< in: TRUE=request uncompressed page */
	ib_int64_t	tablespace_version,
				/*!< in: prevents reading from a wrong
				version of the tablespace in case we have done
				DISCARD + IMPORT */
	ulint		offset)	/*!< in: page number */
{
	buf_block_t*	block;
	buf_page_t*	bpage	= NULL;
	buf_page_t*	watch_page;
	rw_lock_t*	hash_lock;
	mtr_t		mtr;
	ulint		fold;
	ibool		lru	= FALSE;
	void*		data;
	buf_pool_t*	buf_pool = buf_pool_get(space, offset);

	ut_ad(buf_pool);

	*err = DB_SUCCESS;

	if (mode == BUF_READ_IBUF_PAGES_ONLY) {
		/* It is a read-ahead within an ibuf routine */

		ut_ad(!ibuf_bitmap_page(zip_size, offset));

		ibuf_mtr_start(&mtr);

		if (!recv_no_ibuf_operations
		    && !ibuf_page(space, zip_size, offset, &mtr)) {

			ibuf_mtr_commit(&mtr);

			return(NULL);
		}
	} else {
		ut_ad(mode == BUF_READ_ANY_PAGE);
	}

	if (zip_size && !unzip && !recv_recovery_is_on()) {
		block = NULL;
	} else {
		block = buf_LRU_get_free_block(buf_pool);
		ut_ad(block);
		ut_ad(buf_pool_from_block(block) == buf_pool);
	}

	fold = buf_page_address_fold(space, offset);
	hash_lock = buf_page_hash_lock_get(buf_pool, fold);

	buf_pool_mutex_enter(buf_pool);
	rw_lock_x_lock(hash_lock);

	watch_page = buf_page_hash_get_low(buf_pool, space, offset, fold);
	if (watch_page && !buf_pool_watch_is_sentinel(buf_pool, watch_page)) {
		/* The page is already in the buffer pool. */
		watch_page = NULL;
err_exit:
		rw_lock_x_unlock(hash_lock);
		if (block) {
			mutex_enter(&block->mutex);
			buf_LRU_block_free_non_file_page(block);
			mutex_exit(&block->mutex);
		}

		bpage = NULL;
		goto func_exit;
	}

	if (fil_tablespace_deleted_or_being_deleted_in_mem(
		    space, tablespace_version)) {
		/* The page belongs to a space which has been
		deleted or is being deleted. */
		*err = DB_TABLESPACE_DELETED;

		goto err_exit;
	}

	if (block) {
		bpage = &block->page;

		mutex_enter(&block->mutex);

		ut_ad(buf_pool_from_bpage(bpage) == buf_pool);

		buf_page_init(buf_pool, space, offset, fold, zip_size, block);

#ifdef PAGE_ATOMIC_REF_COUNT
		/* Note: We set the io state without the protection of
		the block->lock. This is because other threads cannot
		access this block unless it is in the hash table. */

		buf_page_set_io_fix(bpage, BUF_IO_READ);
#endif /* PAGE_ATOMIC_REF_COUNT */

		rw_lock_x_unlock(hash_lock);

		/* The block must be put to the LRU list, to the old blocks */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);

		/* We set a pass-type x-lock on the frame because then
		the same thread which called for the read operation
		(and is running now at this point of code) can wait
		for the read to complete by waiting for the x-lock on
		the frame; if the x-lock were recursive, the same
		thread would illegally get the x-lock before the page
		read is completed. The x-lock is cleared by the
		io-handler thread. */

		rw_lock_x_lock_gen(&block->lock, BUF_IO_READ);

#ifndef PAGE_ATOMIC_REF_COUNT
		buf_page_set_io_fix(bpage, BUF_IO_READ);
#endif /* !PAGE_ATOMIC_REF_COUNT */

		if (zip_size) {
			/* buf_pool->mutex may be released and
			reacquired by buf_buddy_alloc(). Thus, we
			must release block->mutex in order not to
			break the latching order in the reacquisition
			of buf_pool->mutex. We also must defer this
			operation until after the block descriptor has
			been added to buf_pool->LRU and
			buf_pool->page_hash. */
			mutex_exit(&block->mutex);
			data = buf_buddy_alloc(buf_pool, zip_size, &lru);
			mutex_enter(&block->mutex);
			block->page.zip.data = (page_zip_t*) data;

			/* To maintain the invariant
			block->in_unzip_LRU_list
			== buf_page_belongs_to_unzip_LRU(&block->page)
			we have to add this block to unzip_LRU
			after block->page.zip.data is set. */
			ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
			buf_unzip_LRU_add_block(block, TRUE);
		}

		mutex_exit(&block->mutex);
	} else {
		rw_lock_x_unlock(hash_lock);

		/* The compressed page must be allocated before the
		control block (bpage), in order to avoid the
		invocation of buf_buddy_relocate_block() on
		uninitialized data. */
		data = buf_buddy_alloc(buf_pool, zip_size, &lru);

		rw_lock_x_lock(hash_lock);

		/* If buf_buddy_alloc() allocated storage from the LRU list,
		it released and reacquired buf_pool->mutex. Thus, we must
		check the page_hash again, as it may have been modified. */
		if (UNIV_UNLIKELY(lru)) {

			watch_page = buf_page_hash_get_low(
				buf_pool, space, offset, fold);

			if (UNIV_UNLIKELY(watch_page
			    && !buf_pool_watch_is_sentinel(buf_pool,
							   watch_page))) {

				/* The block was added by some other thread. */
				rw_lock_x_unlock(hash_lock);
				watch_page = NULL;
				buf_buddy_free(buf_pool, data, zip_size);

				bpage = NULL;
				goto func_exit;
			}
		}

		bpage = buf_page_alloc_descriptor();

		/* Initialize the buf_pool pointer. */
		bpage->buf_pool_index = buf_pool_index(buf_pool);

		page_zip_des_init(&bpage->zip);
		page_zip_set_size(&bpage->zip, zip_size);
		bpage->zip.data = (page_zip_t*) data;

		bpage->slot = NULL;

		mutex_enter(&buf_pool->zip_mutex);
		UNIV_MEM_DESC(bpage->zip.data,
			      page_zip_get_size(&bpage->zip));

		buf_page_init_low(bpage);

		bpage->state	= BUF_BLOCK_ZIP_PAGE;
		bpage->space	= static_cast<ib_uint32_t>(space);
		bpage->offset	= static_cast<ib_uint32_t>(offset);

#ifdef UNIV_DEBUG
		bpage->in_page_hash = FALSE;
		bpage->in_zip_hash = FALSE;
		bpage->in_flush_list = FALSE;
		bpage->in_free_list = FALSE;
		bpage->in_LRU_list = FALSE;
#endif /* UNIV_DEBUG */

		ut_d(bpage->in_page_hash = TRUE);

		if (watch_page != NULL) {

			/* Preserve the reference count. */
			ib_uint32_t	buf_fix_count;

			buf_fix_count = watch_page->buf_fix_count;

			ut_a(buf_fix_count > 0);

#ifdef PAGE_ATOMIC_REF_COUNT
			os_atomic_increment_uint32(
				&bpage->buf_fix_count, buf_fix_count);
#else
			bpage->buf_fix_count += buf_fix_count;
#endif /* PAGE_ATOMIC_REF_COUNT */

			ut_ad(buf_pool_watch_is_sentinel(buf_pool, watch_page));
			buf_pool_watch_remove(buf_pool, fold, watch_page);
		}

		HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, fold,
			    bpage);

		rw_lock_x_unlock(hash_lock);

		/* The block must be put to the LRU list, to the old blocks.
		The zip_size is already set into the page zip */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		buf_LRU_insert_zip_clean(bpage);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

		buf_page_set_io_fix(bpage, BUF_IO_READ);

		mutex_exit(&buf_pool->zip_mutex);
	}

	buf_pool->n_pend_reads++;
func_exit:
	buf_pool_mutex_exit(buf_pool);

	if (mode == BUF_READ_IBUF_PAGES_ONLY) {

		ibuf_mtr_commit(&mtr);
	}

#ifdef UNIV_SYNC_DEBUG
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_EX));
	ut_ad(!rw_lock_own(hash_lock, RW_LOCK_SHARED));
#endif /* UNIV_SYNC_DEBUG */

	ut_ad(!bpage || buf_page_in_file(bpage));
	return(bpage);
}
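The compressed-only path of buf_page_init_for_read() releases the page-hash latch while it allocates memory, then looks the page up again before inserting it. Another thread may have registered the same page in the meantime. The real code re-checks only when buf_buddy_alloc() reports, through `lru`, that it released buf_pool->mutex. The sketch below always re-checks, which is the simpler conservative variant.

In this minimal model, `page_hash` is a `std::set` guarded by one `std::mutex`, and `slow_alloc` is a hypothetical callback standing in for the allocation. `toy_init_for_read()` is likewise hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>

// Hypothetical model: page_hash is a set of page numbers guarded by m,
// and slow_alloc stands in for an allocation that may let other threads
// run while the lock is not held.
bool toy_init_for_read(std::mutex& m, std::set<unsigned>& page_hash,
		       unsigned page_no,
		       const std::function<void()>& slow_alloc)
{
	std::unique_lock<std::mutex> guard(m);

	if (page_hash.count(page_no)) {
		return false;            // already in the pool
	}

	guard.unlock();              // allocate without holding the lock
	slow_alloc();
	guard.lock();

	// The hash may have changed while it was unlocked: look again.
	if (page_hash.count(page_no)) {
		return false;            // another thread won; free the memory
	}

	page_hash.insert(page_no);   // this caller now owns the read
	return true;
}
```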
/********************************************************************//**
Initializes a page in the buffer pool. The page is usually not read
from a file, even if it cannot be found in the buffer pool. This is one
of the functions that perform the block state transition NOT_USED =>
FILE_PAGE (the other is buf_page_get_gen()).
@return pointer to the block, page buffer-fixed */
UNIV_INTERN
buf_block_t*
buf_page_create(
/*============*/
	ulint	space,	/*!< in: space id */
	ulint	offset,	/*!< in: offset of the page within space in units of
			a page */
	ulint	zip_size,/*!< in: compressed page size, or 0 */
	mtr_t*	mtr)	/*!< in: mini-transaction handle */
{
	buf_frame_t*	frame;
	buf_block_t*	block;
	ulint		fold;
	buf_block_t*	free_block	= NULL;
	buf_pool_t*	buf_pool	= buf_pool_get(space, offset);
	rw_lock_t*	hash_lock;

	ut_ad(mtr);
	ut_ad(mtr->state == MTR_ACTIVE);
	ut_ad(space || !zip_size);

	free_block = buf_LRU_get_free_block(buf_pool);

	fold = buf_page_address_fold(space, offset);
	hash_lock = buf_page_hash_lock_get(buf_pool, fold);

	buf_pool_mutex_enter(buf_pool);
	rw_lock_x_lock(hash_lock);

	block = (buf_block_t*) buf_page_hash_get_low(
		buf_pool, space, offset, fold);

	if (block
	    && buf_page_in_file(&block->page)
	    && !buf_pool_watch_is_sentinel(buf_pool, &block->page)) {
#ifdef UNIV_IBUF_COUNT_DEBUG
		ut_a(ibuf_count_get(space, offset) == 0);
#endif
#if defined UNIV_DEBUG_FILE_ACCESSES || defined UNIV_DEBUG
		block->page.file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES || UNIV_DEBUG */

		/* Page can be found in buf_pool */
		buf_pool_mutex_exit(buf_pool);
		rw_lock_x_unlock(hash_lock);

		buf_block_free(free_block);

		return(buf_page_get_with_no_latch(space, zip_size, offset, mtr));
	}

	/* If we get here, the page was not in buf_pool: init it there */

#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
		fprintf(stderr, "Creating space %lu page %lu to buffer\n",
			space, offset);
	}
#endif /* UNIV_DEBUG */

	block = free_block;

	mutex_enter(&block->mutex);

	buf_page_init(buf_pool, space, offset, fold, zip_size, block);

	rw_lock_x_unlock(hash_lock);

	/* The block must be put to the LRU list */
	buf_LRU_add_block(&block->page, FALSE);

	buf_block_buf_fix_inc(block, __FILE__, __LINE__);
	buf_pool->stat.n_pages_created++;

	if (zip_size) {
		void*	data;
		ibool	lru;

		/* Prevent race conditions during buf_buddy_alloc(),
		which may release and reacquire buf_pool->mutex,
		by IO-fixing and X-latching the block. */

		buf_page_set_io_fix(&block->page, BUF_IO_READ);
		rw_lock_x_lock(&block->lock);

		mutex_exit(&block->mutex);
		/* buf_pool->mutex may be released and reacquired by
		buf_buddy_alloc(). Thus, we must release block->mutex
		in order not to break the latching order in
		the reacquisition of buf_pool->mutex. We also must
		defer this operation until after the block descriptor
		has been added to buf_pool->LRU and buf_pool->page_hash. */
		data = buf_buddy_alloc(buf_pool, zip_size, &lru);
		mutex_enter(&block->mutex);
		block->page.zip.data = (page_zip_t*) data;

		/* To maintain the invariant
		block->in_unzip_LRU_list
		== buf_page_belongs_to_unzip_LRU(&block->page)
		we have to add this block to unzip_LRU after
		block->page.zip.data is set. */
		ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
		buf_unzip_LRU_add_block(block, FALSE);

		buf_page_set_io_fix(&block->page, BUF_IO_NONE);
		rw_lock_x_unlock(&block->lock);
	}

	buf_pool_mutex_exit(buf_pool);

	mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);

	buf_page_set_accessed(&block->page);

	mutex_exit(&block->mutex);

	/* Delete possible entries for the page from the insert buffer:
	such entries can exist if the page belonged to an index which
	was dropped */
	ibuf_merge_or_delete_for_page(NULL, space, offset, zip_size, TRUE);

	frame = block->frame;

	memset(frame + FIL_PAGE_PREV, 0xff, 4);
	memset(frame + FIL_PAGE_NEXT, 0xff, 4);
	mach_write_to_2(frame + FIL_PAGE_TYPE, FIL_PAGE_TYPE_ALLOCATED);

	/* FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION is only used on the
	following pages:
	(1) The first page of the InnoDB system tablespace (page 0:0)
	(2) FIL_RTREE_SPLIT_SEQ_NUM on R-tree pages
	(3) key_version on encrypted pages (not page 0:0) */

	memset(frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	return(block);
}
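The tail of buf_page_create() resets a few FIL header fields of the new frame:
- the previous and next page pointers are set to all-ones (FIL_NULL),
- the 2-byte page type is written as FIL_PAGE_TYPE_ALLOCATED,
- the 8-byte flush-LSN/key-version field is zeroed.

The following is a minimal sketch of those byte-level writes. It assumes the FIL header offsets 8 (FIL_PAGE_PREV), 12 (FIL_PAGE_NEXT), 24 (FIL_PAGE_TYPE) and 26 (FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION), and the value 0 for FIL_PAGE_TYPE_ALLOCATED. These should be checked against fil0fil.h. `write_be16()` is a hypothetical helper that, like mach_write_to_2(), stores the most significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed FIL header offsets and page type value (see lead-in).
constexpr std::size_t   kFilPagePrev          = 8;
constexpr std::size_t   kFilPageNext          = 12;
constexpr std::size_t   kFilPageType          = 24;
constexpr std::size_t   kFilPageFlushLsnOrKey = 26;
constexpr std::uint16_t kFilPageTypeAllocated = 0;

// Big-endian 2-byte store (most significant byte first).
inline void write_be16(unsigned char* p, std::uint16_t v)
{
	p[0] = static_cast<unsigned char>(v >> 8);
	p[1] = static_cast<unsigned char>(v & 0xff);
}

// Reset the header fields that buf_page_create() touches.
void toy_init_created_frame(unsigned char* frame)
{
	std::memset(frame + kFilPagePrev, 0xff, 4);  // no previous page
	std::memset(frame + kFilPageNext, 0xff, 4);  // no next page
	write_be16(frame + kFilPageType, kFilPageTypeAllocated);
	std::memset(frame + kFilPageFlushLsnOrKey, 0, 8);
}
```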
/********************************************************************//**
Monitors buffer page read/write activity and increments the corresponding
counter if the MONITOR_MODULE_BUF_PAGE (module_buf_page) module is
enabled. */
static
void
buf_page_monitor(
/*=============*/
	const buf_page_t*	bpage,	/*!< in: pointer to the block */
	enum buf_io_fix		io_type)/*!< in: BUF_IO_READ or BUF_IO_WRITE */
{
	const byte*	frame;
	monitor_id_t	counter;

	/* If the counter module is not turned on, just return */
	if (!MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)) {
		return;
	}

	ut_a(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);

	frame = bpage->zip.data
		? bpage->zip.data
		: ((buf_block_t*) bpage)->frame;

	switch (fil_page_get_type(frame)) {
		ulint	level;

	case FIL_PAGE_INDEX:
		level = btr_page_get_level_low(frame);

		/* Check if it is an index page for insert buffer */
		if (btr_page_get_index_id(frame)
		    == (index_id_t)(DICT_IBUF_ID_MIN + IBUF_SPACE_ID)) {
			if (level == 0) {
				counter = MONITOR_RW_COUNTER(
					io_type, MONITOR_INDEX_IBUF_LEAF_PAGE);
			} else {
				counter = MONITOR_RW_COUNTER(
					io_type,
					MONITOR_INDEX_IBUF_NON_LEAF_PAGE);
			}
		} else {
			if (level == 0) {
				counter = MONITOR_RW_COUNTER(
					io_type, MONITOR_INDEX_LEAF_PAGE);
			} else {
				counter = MONITOR_RW_COUNTER(
					io_type, MONITOR_INDEX_NON_LEAF_PAGE);
			}
		}
		break;

	case FIL_PAGE_UNDO_LOG:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_UNDO_LOG_PAGE);
		break;

	case FIL_PAGE_INODE:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_INODE_PAGE);
		break;

	case FIL_PAGE_IBUF_FREE_LIST:
		counter = MONITOR_RW_COUNTER(io_type,
					     MONITOR_IBUF_FREELIST_PAGE);
		break;

	case FIL_PAGE_IBUF_BITMAP:
		counter = MONITOR_RW_COUNTER(io_type,
					     MONITOR_IBUF_BITMAP_PAGE);
		break;

	case FIL_PAGE_TYPE_SYS:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_SYSTEM_PAGE);
		break;

	case FIL_PAGE_TYPE_TRX_SYS:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_TRX_SYSTEM_PAGE);
		break;

	case FIL_PAGE_TYPE_FSP_HDR:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_FSP_HDR_PAGE);
		break;

	case FIL_PAGE_TYPE_XDES:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_XDES_PAGE);
		break;

	case FIL_PAGE_TYPE_BLOB:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_BLOB_PAGE);
		break;

	case FIL_PAGE_TYPE_ZBLOB:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_ZBLOB_PAGE);
		break;

	case FIL_PAGE_TYPE_ZBLOB2:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_ZBLOB2_PAGE);
		break;

	default:
		counter = MONITOR_RW_COUNTER(io_type, MONITOR_OTHER_PAGE);
	}

	MONITOR_INC_NOCHECK(counter);
}
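For index pages, buf_page_monitor() chooses a counter in two steps. It first decides whether the page belongs to the insert buffer tree, then whether it is a leaf (level 0). MONITOR_RW_COUNTER then picks the read or write variant of that counter, and nothing is counted while the module is off.

The following is a minimal sketch of that classification with one read and one write counter per class. `ToyIoType`, `ToyPageClass`, `toy_classify_index_page()` and `ToyMonitor` are hypothetical, and only the four index-page classes are modelled.

```cpp
#include <array>
#include <cassert>

// Hypothetical subset of the counters selected in buf_page_monitor().
enum ToyIoType { TOY_IO_READ = 0, TOY_IO_WRITE = 1 };
enum ToyPageClass {
	TOY_IBUF_LEAF,
	TOY_IBUF_NON_LEAF,
	TOY_INDEX_LEAF,
	TOY_INDEX_NON_LEAF,
	TOY_N_PAGE_CLASSES
};

// Same decision as the FIL_PAGE_INDEX case: insert buffer tree or not,
// then leaf (level 0) or non-leaf.
ToyPageClass toy_classify_index_page(bool is_ibuf_index, unsigned long level)
{
	if (is_ibuf_index) {
		return level == 0 ? TOY_IBUF_LEAF : TOY_IBUF_NON_LEAF;
	}
	return level == 0 ? TOY_INDEX_LEAF : TOY_INDEX_NON_LEAF;
}

// One read counter and one write counter per class.
struct ToyMonitor {
	bool enabled = false;   // cf. MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)
	std::array<std::array<unsigned long, 2>, TOY_N_PAGE_CLASSES> counters{};

	void record(ToyPageClass cls, ToyIoType io)
	{
		if (!enabled) {
			return;         // module off: count nothing
		}
		++counters[cls][io];
	}
};
```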
/********************************************************************//**
Mark the table stored in tablespace bpage->space as corrupted, or, if the
page is encrypted, flag it through dict_set_encrypted_by_space(). Also
remove bpage from the LRU list.
@param[in,out]	bpage	Block */
static
void
buf_mark_space_corrupt(
	buf_page_t*	bpage)
{
	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
	const ibool	uncompressed = (buf_page_get_state(bpage)
					== BUF_BLOCK_FILE_PAGE);
	ulint		space = bpage->space;

	/* First clear the I/O fix and release the latch on bpage */
	buf_pool_mutex_enter(buf_pool);
	mutex_enter(buf_page_get_mutex(bpage));
	ut_ad(buf_page_get_io_fix(bpage) == BUF_IO_READ);
	ut_ad(bpage->buf_fix_count == 0);

	/* Set BUF_IO_NONE before we remove the block from LRU list */
	buf_page_set_io_fix(bpage, BUF_IO_NONE);

	if (uncompressed) {
		rw_lock_x_unlock_gen(
			&((buf_block_t*) bpage)->lock,
			BUF_IO_READ);
	}

	mutex_exit(buf_page_get_mutex(bpage));

	/* If the block is not encrypted, find the table with the
	specified space id and mark it corrupted. Encrypted tables
	are marked unusable later, e.g. in ::open(). */
	if (!bpage->encrypted) {
		dict_set_corrupted_by_space(space);
	} else {
		dict_set_encrypted_by_space(space);
	}

	/* After this point bpage can't be referenced. */
	buf_LRU_free_one_page(bpage);

	ut_ad(buf_pool->n_pend_reads > 0);
	buf_pool->n_pend_reads--;

	buf_pool_mutex_exit(buf_pool);
}

/** Check whether a page that failed verification may be compressed,
encrypted, or both. Note that we cannot be 100% sure whether the page
is corrupted or decryption/decompression just failed.
@param[in,out]	bpage	page
@param[in,out]	space	tablespace from fil_space_acquire_for_io()
@return whether the operation succeeded
@retval	DB_SUCCESS		if page has been read and is not corrupted
@retval	DB_PAGE_CORRUPTED	if page based on checksum check is corrupted
@retval	DB_DECRYPTION_FAILED	if page post encryption checksum matches but
after decryption normal page checksum does not match. */
static
dberr_t
buf_page_check_corrupt(buf_page_t* bpage, fil_space_t* space)
{
	ut_ad(space->n_pending_ios > 0);

	ulint		zip_size = buf_page_get_zip_size(bpage);
	byte*		dst_frame = (zip_size) ? bpage->zip.data :
		((buf_block_t*) bpage)->frame;
	bool		still_encrypted = false;
	dberr_t		err = DB_SUCCESS;
	bool		corrupted = false;
	fil_space_crypt_t* crypt_data = space->crypt_data;

	/* In buf_page_decrypt_after_read() we have decrypted the page
	if the post-encryption checksum matched and the used key_id was
	found in the encryption plugin. If the checksum did not match,
	the page was not decrypted, and it could be encrypted and
	corrupted, only corrupted, or a good page. If we decrypted it,
	the page could still be corrupted if the used key does not
	match. */
	still_encrypted = (crypt_data &&
		crypt_data->type != CRYPT_SCHEME_UNENCRYPTED &&
		!bpage->encrypted &&
		fil_space_verify_crypt_checksum(dst_frame, zip_size,
						space, bpage->offset));

	if (!still_encrypted) {
		/* If the traditional checksums match, we assume that
		the page is no longer encrypted. */
		corrupted = buf_page_is_corrupted(true, dst_frame, zip_size,
						  space);

		if (!corrupted) {
			bpage->encrypted = false;
		} else {
			err = DB_PAGE_CORRUPTED;
		}
	}

	/* Pages that we think are unencrypted but do not match the checksum
	checks could be corrupted or encrypted or both. */
	if (corrupted && !bpage->encrypted) {
		/* An error will be reported by
		buf_page_io_complete(). */
	} else if (still_encrypted || (bpage->encrypted && corrupted)) {
		bpage->encrypted = true;
		err = DB_DECRYPTION_FAILED;

		ib_logf(IB_LOG_LEVEL_ERROR,
			"The page [page id: space=%u"
			", page number=%u]"
			" in file %s cannot be decrypted.",
			bpage->space, bpage->offset,
			space->name);

		ib_logf(IB_LOG_LEVEL_INFO,
			"However key management plugin or used key_version " ULINTPF
			" is not found or"
			" used encryption algorithm or method does not match.",
			mach_read_from_4(dst_frame+FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION));

		if (bpage->space > TRX_SYS_SPACE) {
			ib_logf(IB_LOG_LEVEL_INFO,
				"Marking tablespace as missing. You may drop this table or"
				" install correct key management plugin and key file.");
		}
	}

	return (err);
}
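
/* Illustrative sketch (not part of InnoDB): the result of
buf_page_check_corrupt() depends only on three flags: whether the
post-encryption checksum still verifies (still_encrypted), whether the
normal page checksum fails, and the page's encrypted flag. The
standalone model below reproduces that decision table, assuming both
checksum results have already been computed. CheckResult and
classify_page() are hypothetical names, not InnoDB APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins for DB_SUCCESS, DB_PAGE_CORRUPTED and
// DB_DECRYPTION_FAILED.
enum class CheckResult { Success, PageCorrupted, DecryptionFailed };

// still_encrypted: the post-encryption checksum still verifies; as in
//                  the original, this can only be true if encrypted is
//                  false on entry.
// checksum_bad:    result of the normal page checksum check; only
//                  consulted when !still_encrypted.
// encrypted:       in/out model of bpage->encrypted.
CheckResult classify_page(bool still_encrypted, bool checksum_bad,
			  bool& encrypted)
{
	assert(!(still_encrypted && encrypted));

	bool corrupted = false;
	CheckResult err = CheckResult::Success;

	if (!still_encrypted) {
		corrupted = checksum_bad;
		if (!corrupted) {
			encrypted = false;	// checksum OK: plain page
		} else {
			err = CheckResult::PageCorrupted;
		}
	}

	if (corrupted && !encrypted) {
		// Plain corruption; the caller reports it.
	} else if (still_encrypted || (encrypted && corrupted)) {
		encrypted = true;
		err = CheckResult::DecryptionFailed;
	}

	return err;
}
```

For instance, classify_page(false, true, enc) returns DecryptionFailed
when enc is true on entry, and PageCorrupted when it is false. */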

/** Complete a read or write request of a file page to or from the buffer pool.
@param[in,out]	bpage	Page to complete
@param[in]	evict	whether or not to evict the page
			from the LRU list
@return whether the operation succeeded
@retval	DB_SUCCESS		always when writing, or if a read page was OK
@retval	DB_TABLESPACE_DELETED	if the tablespace of a read page is not found
@retval	DB_PAGE_CORRUPTED	if the checksum fails on a page read
@retval	DB_DECRYPTION_FAILED	if page post encryption checksum matches but
				after decryption normal page checksum does
				not match */
UNIV_INTERN
dberr_t
buf_page_io_complete(buf_page_t* bpage, bool evict)
{
	enum buf_io_fix	io_type;
	buf_pool_t*	buf_pool = buf_pool_from_bpage(bpage);
	const ibool	uncompressed = (buf_page_get_state(bpage)
					== BUF_BLOCK_FILE_PAGE);
	byte*		frame = NULL;
	dberr_t		err = DB_SUCCESS;

	ut_a(buf_page_in_file(bpage));

	/* We do not need to protect io_fix with a mutex in order to read
	it, because this is the only function where we can change the value
	from BUF_IO_READ or BUF_IO_WRITE to some other value, and our code
	ensures that this is the only thread that handles the i/o for this
	block. */

	io_type = buf_page_get_io_fix(bpage);
	ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);

	if (io_type == BUF_IO_READ) {
		ulint	read_page_no = 0;
		ulint	read_space_id = 0;
		uint	key_version = 0;

		ut_ad(bpage->zip.data || ((buf_block_t*)bpage)->frame);
		fil_space_t* space = fil_space_acquire_for_io(bpage->space);
		if (!space) {
			return(DB_TABLESPACE_DELETED);
		}

		buf_page_decrypt_after_read(bpage, space);

		if (buf_page_get_zip_size(bpage)) {
			frame = bpage->zip.data;
			os_atomic_increment_ulint(&buf_pool->n_pend_unzip, 1);
			if (uncompressed
			    && !buf_zip_decompress((buf_block_t*) bpage,
						   FALSE)) {

				os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1);

				ib_logf(IB_LOG_LEVEL_INFO,
					"Page %u in tablespace %u zip_decompress failure.",
					bpage->offset, bpage->space);

				err = DB_PAGE_CORRUPTED;

				goto database_corrupted;
			}
			os_atomic_decrement_ulint(&buf_pool->n_pend_unzip, 1);
		} else {
			ut_a(uncompressed);
			frame = ((buf_block_t*) bpage)->frame;
		}

		/* If this page is not uninitialized and not in the
		doublewrite buffer, then the page number and space id
		should be the same as in block. */
		read_page_no = mach_read_from_4(frame + FIL_PAGE_OFFSET);
		read_space_id = mach_read_from_4(
			frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID);
		key_version = mach_read_from_4(
			frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION);

		if (bpage->space == TRX_SYS_SPACE
		    && buf_dblwr_page_inside(bpage->offset)) {

			ut_print_timestamp(stderr);
			fprintf(stderr,
				" InnoDB: Error: reading page %u\n"
				"InnoDB: which is in the"
				" doublewrite buffer!\n",
				bpage->offset);
		} else if (!read_space_id && !read_page_no) {
			/* This is likely an uninitialized page. */
		} else if ((bpage->space
			    && bpage->space != read_space_id)
			   || bpage->offset != read_page_no) {
			/* We did not compare space_id to read_space_id
			if bpage->space == 0, because the field on the
			page may contain garbage in MySQL < 4.1.1,
			which only supported bpage->space == 0. */

			ib_logf(IB_LOG_LEVEL_ERROR,
				"Space id and page n:o"
				" stored in the page"
				" read in are " ULINTPF ":" ULINTPF ","
				" should be %u:%u!",
				read_space_id,
				read_page_no,
				bpage->space,
				bpage->offset);
		}

		err = buf_page_check_corrupt(bpage, space);

database_corrupted:

		if (err != DB_SUCCESS) {
			/* Not a real corruption if it was triggered by
			error injection */
			DBUG_EXECUTE_IF("buf_page_is_corrupt_failure",
				if (bpage->space > TRX_SYS_SPACE) {
					buf_mark_space_corrupt(bpage);
					ib_logf(IB_LOG_LEVEL_INFO,
						"Simulated page corruption");
					fil_space_release_for_io(space);
					return(err);
				}
				err = DB_SUCCESS;
				goto page_not_corrupt;
			);

			if (err == DB_PAGE_CORRUPTED) {
				ib_logf(IB_LOG_LEVEL_ERROR,
					"Database page corruption on disk"
					" or a failed file read of tablespace %s"
					" page [page id: space=%u"
					", page number=%u]"
					". You may have to recover from "
					"a backup.",
					space->name,
					bpage->space, bpage->offset);

				buf_page_print(frame, buf_page_get_zip_size(bpage),
					       BUF_PAGE_PRINT_NO_CRASH);

				ib_logf(IB_LOG_LEVEL_INFO,
					"It is also possible that your"
					" operating system has corrupted"
					" its own file cache and rebooting"
					" your computer removes the error."
					" If the corrupt page is an index page,"
					" you can also try to fix the"
					" corruption by dumping, dropping,"
					" and reimporting the corrupt table."
					" You can use CHECK TABLE to scan"
					" your table for corruption. "
					"Please refer to " REFMAN "forcing-innodb-recovery.html"
					" for information about forcing recovery.");
			}

			if (srv_force_recovery < SRV_FORCE_IGNORE_CORRUPT) {
				/* If page space id is larger than TRX_SYS_SPACE
				(0), we will attempt to mark the corresponding
				table as corrupted instead of crashing server */
				if (bpage->space > TRX_SYS_SPACE) {
					buf_mark_space_corrupt(bpage);
					fil_space_release_for_io(space);
					return(err);
				} else {
					ib_logf(IB_LOG_LEVEL_FATAL,
						"Ending processing because of a corrupt database page.");
				}
			}
		}

		DBUG_EXECUTE_IF("buf_page_is_corrupt_failure",
				page_not_corrupt: bpage = bpage; );

		if (recv_recovery_is_on()) {
			/* Pages must be uncompressed for crash recovery. */
			ut_a(uncompressed);
			recv_recover_page(TRUE, (buf_block_t*) bpage);
		}

		if (uncompressed && !recv_no_ibuf_operations
		    && fil_page_get_type(frame) == FIL_PAGE_INDEX
		    && page_is_leaf(frame)) {

			if (bpage && bpage->encrypted) {
				ib_logf(IB_LOG_LEVEL_WARN,
					"Table in tablespace " ULINTPF " encrypted."
					" However key management plugin or used key_version %u is not found or"
					" used encryption algorithm or method does not match."
					" Can't continue opening the table.",
					read_space_id, key_version);
			} else {
				ibuf_merge_or_delete_for_page(
					(buf_block_t*)bpage, bpage->space,
					bpage->offset, buf_page_get_zip_size(bpage),
					TRUE);
			}
		}

		fil_space_release_for_io(space);
	} else {
		/* io_type == BUF_IO_WRITE */
		if (bpage->slot) {
			/* Mark slot free */
			bpage->slot->reserved = false;
			bpage->slot = NULL;
		}
	}

	buf_pool_mutex_enter(buf_pool);
	mutex_enter(buf_page_get_mutex(bpage));

#ifdef UNIV_IBUF_COUNT_DEBUG
	if (io_type == BUF_IO_WRITE || uncompressed) {
		/* For BUF_IO_READ of compressed-only blocks, the
		buffered operations will be merged by buf_page_get_gen()
		after the block has been uncompressed. */
		ut_a(ibuf_count_get(bpage->space, bpage->offset) == 0);
	}
#endif
	/* Because this thread which does the unlocking is not the same one
	that did the locking, we use a pass value != 0 in unlock, which simply
	removes the newest lock debug record, without checking the thread
	id. */

	buf_page_set_io_fix(bpage, BUF_IO_NONE);
	buf_page_monitor(bpage, io_type);

	switch (io_type) {
	case BUF_IO_READ:
		/* NOTE that the call to ibuf may have moved the ownership of
		the x-latch to this OS thread: do not let this confuse you in
		debugging! */

		ut_ad(buf_pool->n_pend_reads > 0);
		buf_pool->n_pend_reads--;
		buf_pool->stat.n_pages_read++;

		if (uncompressed) {
			rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_READ);
		}

		mutex_exit(buf_page_get_mutex(bpage));

		break;

	case BUF_IO_WRITE:
		/* Write means a flush operation: call the completion
		routine in the flush system */

		buf_flush_write_complete(bpage);

		if (uncompressed) {
			rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_WRITE);
		}

		buf_pool->stat.n_pages_written++;

		/* In case of flush batches, i.e. BUF_FLUSH_LIST and
		BUF_FLUSH_LRU, this function is always called from an IO
		helper thread. In this case, we decide whether or not
		to evict the page based on flush type. The value
		passed as evict is the default value in the function
		definition, which is false.
		We always evict in case of LRU batch and never evict
		in case of flush list batch. For single page flush
		the caller sets the appropriate value. */
		if (buf_page_get_flush_type(bpage) == BUF_FLUSH_LRU) {
			evict = true;
		}

		mutex_exit(buf_page_get_mutex(bpage));
		if (evict) {
			buf_LRU_free_page(bpage, true);
		}

		break;

	default:
		ut_error;
	}

#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
		fprintf(stderr, "Has %s page space %lu page no %lu\n",
			io_type == BUF_IO_READ ? "read" : "written",
			buf_page_get_space(bpage),
			buf_page_get_page_no(bpage));
	}
#endif /* UNIV_DEBUG */

	buf_pool_mutex_exit(buf_pool);

	return(err);
}
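
/* Illustrative sketch (not part of InnoDB): the read path of
buf_page_io_complete() compares the page number and space id stored in
the page header with the ones the block was read for, ignoring an
all-zero (likely uninitialized) header and skipping the space id
comparison for space 0. The model below assumes the fil0fil.h offsets
FIL_PAGE_OFFSET = 4 and FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID = 34 and the
big-endian encoding read by mach_read_from_4(); read_be32() and
page_id_mismatch() are hypothetical helpers, and the doublewrite
buffer case is left out.

```cpp
#include <cassert>
#include <cstdint>

// Assumed header offsets, mirroring fil0fil.h.
constexpr unsigned kPageOffset = 4;	// FIL_PAGE_OFFSET
constexpr unsigned kSpaceIdOffset = 34;	// FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID

// Big-endian 32-bit read, like mach_read_from_4().
inline std::uint32_t read_be32(const unsigned char* b)
{
	return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
		| (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Returns true if the ids stored in the frame disagree with the
// expected ones. An all-zero header is treated as an uninitialized
// page, and the space id is not compared for space 0 because that
// field may contain garbage in files from MySQL < 4.1.1.
bool page_id_mismatch(const unsigned char* frame,
		      std::uint32_t space, std::uint32_t page_no)
{
	std::uint32_t read_page_no = read_be32(frame + kPageOffset);
	std::uint32_t read_space_id = read_be32(frame + kSpaceIdOffset);

	if (!read_space_id && !read_page_no) {
		return false;	// likely an uninitialized page
	}

	return (space && space != read_space_id) || page_no != read_page_no;
}
```

In the original, a mismatch is only logged; the checksum verification
in buf_page_check_corrupt() runs afterwards either way. */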

/*********************************************************************//**
Asserts that all file pages in the buffer are in a replaceable state.
@return TRUE */
static
ibool
buf_all_freed_instance(
/*===================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
	ulint		i;
	buf_chunk_t*	chunk;

	ut_ad(buf_pool);

	buf_pool_mutex_enter(buf_pool);

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {

		const buf_block_t* block = buf_chunk_not_freed(chunk);

		if (UNIV_LIKELY_NULL(block)) {
			fil_space_t* space = fil_space_get(block->page.space);
			ib_logf(IB_LOG_LEVEL_ERROR,
				"Page %u %u still fixed or dirty.",
				block->page.space,
				block->page.offset);
			ib_logf(IB_LOG_LEVEL_ERROR,
				"Page oldest_modification " LSN_PF
				" fix_count %d io_fix %d.",
				block->page.oldest_modification,
				block->page.buf_fix_count,
				buf_page_get_io_fix(&block->page));
			/* Guard against fil_space_get() returning NULL. */
			ib_logf(IB_LOG_LEVEL_FATAL,
				"Page space_id %u name %s.",
				block->page.space,
				(space && space->name) ? space->name : "NULL");
		}
	}

	buf_pool_mutex_exit(buf_pool);

	return(TRUE);
}

/*********************************************************************//**
Invalidates file pages in one buffer pool instance */
static
void
buf_pool_invalidate_instance(
/*=========================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
	ulint		i;

	buf_pool_mutex_enter(buf_pool);

	for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {

		/* As this function is called during startup and
		during the redo application phase of recovery, InnoDB
		is single threaded (apart from IO helper threads) at
		this stage. No new write batch can be in the
		initialization stage at this point. */
		ut_ad(buf_pool->init_flush[i] == FALSE);

		/* However, it is possible that a write batch that has
		been posted earlier is still not complete. For buffer
		pool invalidation to proceed we must ensure there is NO
		write activity happening. */
		if (buf_pool->n_flush[i] > 0) {
			buf_flush_t	type = static_cast<buf_flush_t>(i);

			buf_pool_mutex_exit(buf_pool);
			buf_flush_wait_batch_end(buf_pool, type);
			buf_pool_mutex_enter(buf_pool);
		}
	}

	buf_pool_mutex_exit(buf_pool);

	ut_ad(buf_all_freed_instance(buf_pool));

	buf_pool_mutex_enter(buf_pool);

	while (buf_LRU_scan_and_free_block(buf_pool, TRUE)) {
	}

	ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
	ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);

	buf_pool->freed_page_clock = 0;
	buf_pool->LRU_old = NULL;
	buf_pool->LRU_old_len = 0;

	memset(&buf_pool->stat, 0x00, sizeof(buf_pool->stat));
	buf_refresh_io_stats(buf_pool);

	buf_pool_mutex_exit(buf_pool);
}

/*********************************************************************//**
Invalidates the file pages in the buffer pool when an archive recovery is
completed. All the file pages buffered must be in a replaceable state when
this function is called: not latched and not modified. */
UNIV_INTERN
void
buf_pool_invalidate(void)
/*=====================*/
{
	ulint	i;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_invalidate_instance(buf_pool_from_array(i));
	}
}

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
/*********************************************************************//**
Validates data in one buffer pool instance
@return TRUE */
static
ibool
buf_pool_validate_instance(
/*=======================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
	buf_page_t*	b;
	buf_chunk_t*	chunk;
	ulint		i;
	ulint		n_lru_flush	= 0;
	ulint		n_page_flush	= 0;
	ulint		n_list_flush	= 0;
	ulint		n_lru		= 0;
	ulint		n_flush		= 0;
	ulint		n_free		= 0;
	ulint		n_zip		= 0;
	ulint		fold		= 0;
	ulint		space		= 0;
	ulint		offset		= 0;

	ut_ad(buf_pool);

	buf_pool_mutex_enter(buf_pool);
	hash_lock_x_all(buf_pool->page_hash);

	chunk = buf_pool->chunks;

	/* Check the uncompressed blocks. */

	for (i = buf_pool->n_chunks; i--; chunk++) {

		ulint		j;
		buf_block_t*	block = chunk->blocks;

		for (j = chunk->size; j--; block++) {

			mutex_enter(&block->mutex);

			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_POOL_WATCH:
			case BUF_BLOCK_ZIP_PAGE:
			case BUF_BLOCK_ZIP_DIRTY:
				/* These should only occur on
				zip_clean, zip_free[], or flush_list. */
				ut_error;
				break;

			case BUF_BLOCK_FILE_PAGE:
				space = buf_block_get_space(block);
				offset = buf_block_get_page_no(block);
				fold = buf_page_address_fold(space, offset);
				ut_a(buf_page_hash_get_low(buf_pool,
							   space,
							   offset,
							   fold)
				     == &block->page);

#ifdef UNIV_IBUF_COUNT_DEBUG
				ut_a(buf_page_get_io_fix(&block->page)
				     == BUF_IO_READ
				     || !ibuf_count_get(buf_block_get_space(
								block),
							buf_block_get_page_no(
								block)));
#endif
				switch (buf_page_get_io_fix(&block->page)) {
				case BUF_IO_NONE:
					break;

				case BUF_IO_WRITE:
					switch (buf_page_get_flush_type(
							&block->page)) {
					case BUF_FLUSH_LRU:
						n_lru_flush++;
						goto assert_s_latched;
					case BUF_FLUSH_SINGLE_PAGE:
						n_page_flush++;
assert_s_latched:
						ut_a(rw_lock_is_locked(
							&block->lock,
							RW_LOCK_SHARED));
						break;
					case BUF_FLUSH_LIST:
						n_list_flush++;
						break;
					default:
						ut_error;
					}

					break;

				case BUF_IO_READ:

					ut_a(rw_lock_is_locked(&block->lock,
							       RW_LOCK_EX));
					break;

				case BUF_IO_PIN:
					break;
				}

				n_lru++;
				break;

			case BUF_BLOCK_NOT_USED:
				n_free++;
				break;

			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
			case BUF_BLOCK_REMOVE_HASH:
				/* do nothing */
				break;
			}

			mutex_exit(&block->mutex);
		}
	}

	mutex_enter(&buf_pool->zip_mutex);

	/* Check clean compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
		switch (buf_page_get_io_fix(b)) {
		case BUF_IO_NONE:
		case BUF_IO_PIN:
			/* All clean blocks should be I/O-unfixed. */
			break;
		case BUF_IO_READ:
			/* In buf_LRU_free_page(), we temporarily set
			b->io_fix = BUF_IO_READ for a newly allocated
			control block in order to prevent
			buf_page_get_gen() from decompressing the block. */
			break;
		default:
			ut_error;
			break;
		}

		/* It is OK to read oldest_modification here because
		we have acquired buf_pool->zip_mutex above which acts
		as the 'block->mutex' for these bpages. */
		ut_a(!b->oldest_modification);
		fold = buf_page_address_fold(b->space, b->offset);
		ut_a(buf_page_hash_get_low(buf_pool, b->space, b->offset,
					   fold) == b);
		n_lru++;
		n_zip++;
	}

	/* Check dirty blocks. */

	buf_flush_list_mutex_enter(buf_pool);
	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_ad(b->in_flush_list);
		ut_a(b->oldest_modification);
		n_flush++;

		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			n_lru++;
			n_zip++;
			switch (buf_page_get_io_fix(b)) {
			case BUF_IO_NONE:
			case BUF_IO_READ:
			case BUF_IO_PIN:
				break;
			case BUF_IO_WRITE:
				switch (buf_page_get_flush_type(b)) {
				case BUF_FLUSH_LRU:
					n_lru_flush++;
					break;
				case BUF_FLUSH_SINGLE_PAGE:
					n_page_flush++;
					break;
				case BUF_FLUSH_LIST:
					n_list_flush++;
					break;
				default:
					ut_error;
				}
				break;
			}
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_POOL_WATCH:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
		fold = buf_page_address_fold(b->space, b->offset);
		ut_a(buf_page_hash_get_low(buf_pool, b->space, b->offset,
					   fold) == b);
	}

	ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == n_flush);

	hash_unlock_x_all(buf_pool->page_hash);
	buf_flush_list_mutex_exit(buf_pool);

	mutex_exit(&buf_pool->zip_mutex);

	if (n_lru + n_free > buf_pool->curr_size + n_zip) {
		fprintf(stderr, "n LRU %lu, n free %lu, pool %lu zip %lu\n",
			n_lru, n_free,
			buf_pool->curr_size, n_zip);
		ut_error;
	}

	ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
	if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
		fprintf(stderr, "Free list len %lu, free blocks %lu\n",
			UT_LIST_GET_LEN(buf_pool->free),
			n_free);
		ut_error;
	}

	ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_page_flush);

	buf_pool_mutex_exit(buf_pool);

	ut_a(buf_LRU_validate());
	ut_a(buf_flush_validate(buf_pool));

	return(TRUE);
}
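
/* Illustrative sketch (not part of InnoDB): buf_pool_validate_instance()
counts the blocks that are under write I/O per flush type and then
requires each count to equal buf_pool->n_flush[]. The model below shows
that tally-and-compare pattern with simplified, hypothetical types
(IoFix, FlushType, PageDesc) in place of buf_page_t; latching and the
other list checks are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for buf_io_fix and buf_flush_t (hypothetical).
enum class IoFix { None, Read, Write, Pin };
enum FlushType { FLUSH_LRU, FLUSH_LIST, FLUSH_SINGLE_PAGE, FLUSH_N_TYPES };

struct PageDesc {
	IoFix		io_fix;
	FlushType	flush_type;	// only meaningful when io_fix == Write
};

// Count the pages under write I/O, per flush type.
std::array<std::size_t, FLUSH_N_TYPES>
tally_flushes(const std::vector<PageDesc>& pages)
{
	std::array<std::size_t, FLUSH_N_TYPES> n{};	// all zero
	for (const PageDesc& p : pages) {
		if (p.io_fix == IoFix::Write) {
			n[p.flush_type]++;
		}
	}
	return n;
}

// Validation passes only if every tally equals the pool's counter.
bool flush_counts_match(const std::vector<PageDesc>& pages,
			const std::array<std::size_t, FLUSH_N_TYPES>& n_flush)
{
	return tally_flushes(pages) == n_flush;
}
```

Only pages whose io_fix is Write contribute, mirroring the BUF_IO_WRITE
cases above; read-fixed and pinned pages are not counted. */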

/*********************************************************************//**
Validates the buffer buf_pool data structure.
@return TRUE */
UNIV_INTERN
ibool
buf_validate(void)
/*==============*/
{
	ulint	i;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		buf_pool_validate_instance(buf_pool);
	}
	return(TRUE);
}
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

#if defined UNIV_DEBUG_PRINT || defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
/*********************************************************************//**
Prints info of the buffer buf_pool data structure for one instance. */
static
void
buf_print_instance(
/*===============*/
	buf_pool_t*	buf_pool)
{
	index_id_t*	index_ids;
	ulint*		counts;
	ulint		size;
	ulint		i;
	ulint		j;
	index_id_t	id;
	ulint		n_found;
	buf_chunk_t*	chunk;
	dict_index_t*	index;

	ut_ad(buf_pool);

	size = buf_pool->curr_size;

	index_ids = static_cast<index_id_t*>(
		mem_alloc(size * sizeof *index_ids));

	counts = static_cast<ulint*>(mem_alloc(sizeof(ulint) * size));

	buf_pool_mutex_enter(buf_pool);
	buf_flush_list_mutex_enter(buf_pool);

	fprintf(stderr,
		"buf_pool size %lu\n"
		"database pages %lu\n"
		"free pages %lu\n"
		"modified database pages %lu\n"
		"n pending decompressions %lu\n"
		"n pending reads %lu\n"
		"n pending flush LRU %lu list %lu single page %lu\n"
		"pages made young %lu, not young %lu\n"
		"pages read %lu, created %lu, written %lu\n",
		(ulint) size,
		(ulint) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulint) UT_LIST_GET_LEN(buf_pool->free),
		(ulint) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulint) buf_pool->n_pend_unzip,
		(ulint) buf_pool->n_pend_reads,
		(ulint) buf_pool->n_flush[BUF_FLUSH_LRU],
		(ulint) buf_pool->n_flush[BUF_FLUSH_LIST],
		(ulint) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
		(ulint) buf_pool->stat.n_pages_made_young,
		(ulint) buf_pool->stat.n_pages_not_made_young,
		(ulint) buf_pool->stat.n_pages_read,
		(ulint) buf_pool->stat.n_pages_created,
		(ulint) buf_pool->stat.n_pages_written);

	buf_flush_list_mutex_exit(buf_pool);

	/* Count the number of blocks belonging to each index in the buffer */

	n_found = 0;

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block		= chunk->blocks;
		ulint		n_blocks	= chunk->size;

		for (; n_blocks--; block++) {
			const buf_frame_t* frame = block->frame;

			if (fil_page_get_type(frame) == FIL_PAGE_INDEX) {

				id = btr_page_get_index_id(frame);

				/* Look for the id in the index_ids array */
				j = 0;

				while (j < n_found) {

					if (index_ids[j] == id) {
						counts[j]++;

						break;
					}
					j++;
				}

				if (j == n_found) {
					n_found++;
					index_ids[j] = id;
					counts[j] = 1;
				}
			}
		}
	}

	buf_pool_mutex_exit(buf_pool);

	for (i = 0; i < n_found; i++) {
		index = dict_index_get_if_in_cache(index_ids[i]);

		fprintf(stderr,
			"Block count for index %llu in buffer is about %lu",
			(ullint) index_ids[i],
			(ulint) counts[i]);

		if (index) {
			putc(' ', stderr);
			dict_index_name_print(stderr, NULL, index);
		}

		putc('\n', stderr);
	}

	mem_free(index_ids);
	mem_free(counts);

	ut_a(buf_pool_validate_instance(buf_pool));
}
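
/* Illustrative sketch (not part of InnoDB): buf_print_instance()
counts index pages per index id with a linear search over the ids
found so far, keeping them in the order they were first seen. The
standalone version below does the same with std::vector; IndexId is a
hypothetical stand-in for index_id_t and count_blocks_per_index() is a
hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using IndexId = std::uint64_t;	// hypothetical stand-in for index_id_t

// Returns (index id, page count) pairs in first-seen order.
std::vector<std::pair<IndexId, std::size_t>>
count_blocks_per_index(const std::vector<IndexId>& page_index_ids)
{
	std::vector<std::pair<IndexId, std::size_t>> counts;

	for (IndexId id : page_index_ids) {
		std::size_t j = 0;
		// Linear search, as in buf_print_instance().
		while (j < counts.size() && counts[j].first != id) {
			j++;
		}
		if (j == counts.size()) {
			counts.emplace_back(id, 1);	// first block of this index
		} else {
			counts[j].second++;
		}
	}

	return counts;
}
```

The linear search costs O(n_found) per page, which is acceptable for a
debug printout; a hash map would be the usual choice if the number of
distinct indexes were large. */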

/*********************************************************************//**
Prints info of the buffer buf_pool data structure. */
UNIV_INTERN
void
buf_print(void)
/*===========*/
{
	ulint	i;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);
		buf_print_instance(buf_pool);
	}
}
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */

#ifdef UNIV_DEBUG
/*********************************************************************//**
Returns the number of latched pages in the buffer pool.
@return number of latched pages */
UNIV_INTERN
ulint
buf_get_latched_pages_number_instance(
/*==================================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
	buf_page_t*	b;
	ulint		i;
	buf_chunk_t*	chunk;
	ulint		fixed_pages_number = 0;

	buf_pool_mutex_enter(buf_pool);

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block;
		ulint		j;

		block = chunk->blocks;

		for (j = chunk->size; j--; block++) {
			if (buf_block_get_state(block)
			    != BUF_BLOCK_FILE_PAGE) {

				continue;
			}

			mutex_enter(&block->mutex);

			if (block->page.buf_fix_count != 0
			    || buf_page_get_io_fix(&block->page)
			    != BUF_IO_NONE) {
				fixed_pages_number++;
			}

			mutex_exit(&block->mutex);
		}
	}

	mutex_enter(&buf_pool->zip_mutex);

	/* Traverse the lists of clean and dirty compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
		ut_a(buf_page_get_io_fix(b) != BUF_IO_WRITE);

		if (b->buf_fix_count != 0
		    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
			fixed_pages_number++;
		}
	}

	buf_flush_list_mutex_enter(buf_pool);
	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_ad(b->in_flush_list);

		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			if (b->buf_fix_count != 0
			    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
				fixed_pages_number++;
			}
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_POOL_WATCH:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
	}

	buf_flush_list_mutex_exit(buf_pool);
	mutex_exit(&buf_pool->zip_mutex);
	buf_pool_mutex_exit(buf_pool);

	return(fixed_pages_number);
}

/*********************************************************************//**
Returns the number of latched pages in all the buffer pools.
@return number of latched pages */
UNIV_INTERN
ulint
buf_get_latched_pages_number(void)
/*==============================*/
{
	ulint	i;
	ulint	total_latched_pages = 0;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		total_latched_pages += buf_get_latched_pages_number_instance(
			buf_pool);
	}

	return(total_latched_pages);
}

#endif /* UNIV_DEBUG */

/*********************************************************************//**
Returns the number of pending buf pool read ios.
@return number of pending read I/O operations */
UNIV_INTERN
ulint
buf_get_n_pending_read_ios(void)
/*============================*/
{
	ulint	i;
	ulint	pend_ios = 0;

	for (i = 0; i < srv_buf_pool_instances; i++) {
		pend_ios += buf_pool_from_array(i)->n_pend_reads;
	}

	return(pend_ios);
}

/*********************************************************************//**
Returns the ratio in percent of modified pages in the buffer pool to
the database and free pages in the buffer pool.
@return modified page percentage ratio */
UNIV_INTERN
double
buf_get_modified_ratio_pct(void)
/*============================*/
{
	double		percentage = 0.0;
	ulint		lru_len = 0;
	ulint		free_len = 0;
	ulint		flush_list_len = 0;

	buf_get_total_list_len(&lru_len, &free_len, &flush_list_len);

	/* The 1.0 + term in the denominator avoids division by zero. */
	percentage = (100.0 * flush_list_len) / (1.0 + lru_len + free_len);

	return(percentage);
}
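
/* Illustrative sketch (not part of InnoDB): the formula in
buf_get_modified_ratio_pct() as a standalone function, assuming the
three list lengths have already been summed over all instances, as
buf_get_total_list_len() does. modified_ratio_pct() is a hypothetical
name.

```cpp
#include <cassert>
#include <cstddef>

// Percentage of modified (flush_list) pages relative to LRU + free
// pages; the 1.0 term keeps the denominator non-zero for an empty pool.
double modified_ratio_pct(std::size_t lru_len, std::size_t free_len,
			  std::size_t flush_list_len)
{
	return (100.0 * flush_list_len) / (1.0 + lru_len + free_len);
}
```

For example, 50 modified pages with 99 LRU pages and no free pages give
100.0 * 50 / (1.0 + 99 + 0) = 50.0. */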

/*******************************************************************//**
Adds the statistics of one buffer pool instance to the aggregated totals. */
static
void
buf_stats_aggregate_pool_info(
/*==========================*/
	buf_pool_info_t*	total_info,	/*!< in/out: the buffer pool
						info to store aggregated
						result */
	const buf_pool_info_t*	pool_info)	/*!< in: individual buffer pool
						stats info */
{
	ut_a(total_info && pool_info);

	/* Nothing to copy if total_info is the same as pool_info */
	if (total_info == pool_info) {
		return;
	}

	total_info->pool_size += pool_info->pool_size;
	total_info->lru_len += pool_info->lru_len;
	total_info->old_lru_len += pool_info->old_lru_len;
	total_info->free_list_len += pool_info->free_list_len;
	total_info->flush_list_len += pool_info->flush_list_len;
	total_info->n_pend_unzip += pool_info->n_pend_unzip;
	total_info->n_pend_reads += pool_info->n_pend_reads;
	total_info->n_pending_flush_lru += pool_info->n_pending_flush_lru;
	total_info->n_pending_flush_list += pool_info->n_pending_flush_list;
	total_info->n_pending_flush_single_page +=
		pool_info->n_pending_flush_single_page;
	total_info->n_pages_made_young += pool_info->n_pages_made_young;
	total_info->n_pages_not_made_young += pool_info->n_pages_not_made_young;
	total_info->n_pages_read += pool_info->n_pages_read;
	total_info->n_pages_created += pool_info->n_pages_created;
	total_info->n_pages_written += pool_info->n_pages_written;
	total_info->n_page_gets += pool_info->n_page_gets;
	total_info->n_ra_pages_read_rnd += pool_info->n_ra_pages_read_rnd;
	total_info->n_ra_pages_read += pool_info->n_ra_pages_read;
	total_info->n_ra_pages_evicted += pool_info->n_ra_pages_evicted;
	total_info->page_made_young_rate += pool_info->page_made_young_rate;
	total_info->page_not_made_young_rate +=
		pool_info->page_not_made_young_rate;
	total_info->pages_read_rate += pool_info->pages_read_rate;
	total_info->pages_created_rate += pool_info->pages_created_rate;
	total_info->pages_written_rate += pool_info->pages_written_rate;
	total_info->n_page_get_delta += pool_info->n_page_get_delta;
	total_info->page_read_delta += pool_info->page_read_delta;
	total_info->young_making_delta += pool_info->young_making_delta;
	total_info->not_young_making_delta += pool_info->not_young_making_delta;
	total_info->pages_readahead_rnd_rate += pool_info->pages_readahead_rnd_rate;
	total_info->pages_readahead_rate += pool_info->pages_readahead_rate;
	total_info->pages_evicted_rate += pool_info->pages_evicted_rate;
	total_info->unzip_lru_len += pool_info->unzip_lru_len;
	total_info->io_sum += pool_info->io_sum;
	total_info->io_cur += pool_info->io_cur;
	total_info->unzip_sum += pool_info->unzip_sum;
	total_info->unzip_cur += pool_info->unzip_cur;
}
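
/* Illustrative sketch, not part of InnoDB: a reduced model of
buf_stats_aggregate_pool_info(). PoolInfo is a hypothetical two-field
stand-in for buf_pool_info_t, and aggregate() is a hypothetical name.
The sketch shows why the function returns early when both arguments
point to the same entry. With a single buffer pool, buf_print_io()
makes the totals entry and the per-instance entry the same object.
Adding that entry to itself would double every value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of buf_pool_info_t.
struct PoolInfo {
	std::size_t pool_size;
	double pages_read_rate;
};

// Add one instance's figures into the running totals; a self-aggregation
// is a no-op, mirroring the total_info == pool_info check.
void aggregate(PoolInfo* total, const PoolInfo* one)
{
	assert(total != nullptr && one != nullptr);
	if (total == one) {
		return;
	}
	total->pool_size += one->pool_size;
	total->pages_read_rate += one->pages_read_rate;
}
```

Counts and per-second rates are both summed, so the totals entry
reports the combined rate of all instances. */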

/*******************************************************************//**
Collects statistics for one buffer pool instance into
all_pool_info[pool_id]. When there is more than one buffer pool, the
caller aggregates the instances (see buf_print_io()). */
UNIV_INTERN
void
buf_stats_get_pool_info(
/*====================*/
	buf_pool_t*		buf_pool,	/*!< in: buffer pool */
	ulint			pool_id,	/*!< in: buffer pool ID */
	buf_pool_info_t*	all_pool_info)	/*!< in/out: buffer pool info
						to fill */
{
	buf_pool_info_t*	pool_info;
	time_t			current_time;
	double			time_elapsed;

	/* Find appropriate pool_info to store stats for this buffer pool */
	pool_info = &all_pool_info[pool_id];

	buf_pool_mutex_enter(buf_pool);
	buf_flush_list_mutex_enter(buf_pool);

	pool_info->pool_unique_id = pool_id;
	pool_info->pool_size = buf_pool->curr_size;
	pool_info->lru_len = UT_LIST_GET_LEN(buf_pool->LRU);
	pool_info->old_lru_len = buf_pool->LRU_old_len;
	pool_info->free_list_len = UT_LIST_GET_LEN(buf_pool->free);
	pool_info->flush_list_len = UT_LIST_GET_LEN(buf_pool->flush_list);
	pool_info->n_pend_unzip = UT_LIST_GET_LEN(buf_pool->unzip_LRU);
	pool_info->n_pend_reads = buf_pool->n_pend_reads;

	pool_info->n_pending_flush_lru =
		(buf_pool->n_flush[BUF_FLUSH_LRU]
		 + buf_pool->init_flush[BUF_FLUSH_LRU]);

	pool_info->n_pending_flush_list =
		(buf_pool->n_flush[BUF_FLUSH_LIST]
		 + buf_pool->init_flush[BUF_FLUSH_LIST]);

	pool_info->n_pending_flush_single_page =
		(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
		 + buf_pool->init_flush[BUF_FLUSH_SINGLE_PAGE]);

	buf_flush_list_mutex_exit(buf_pool);

	current_time = time(NULL);
	time_elapsed = 0.001 + difftime(current_time,
					buf_pool->last_printout_time);

	pool_info->n_pages_made_young = buf_pool->stat.n_pages_made_young;
	pool_info->n_pages_not_made_young =
		buf_pool->stat.n_pages_not_made_young;
	pool_info->n_pages_read = buf_pool->stat.n_pages_read;
	pool_info->n_pages_created = buf_pool->stat.n_pages_created;
	pool_info->n_pages_written = buf_pool->stat.n_pages_written;
	pool_info->n_page_gets = buf_pool->stat.n_page_gets;
	pool_info->n_ra_pages_read_rnd = buf_pool->stat.n_ra_pages_read_rnd;
	pool_info->n_ra_pages_read = buf_pool->stat.n_ra_pages_read;
	pool_info->n_ra_pages_evicted = buf_pool->stat.n_ra_pages_evicted;

	pool_info->page_made_young_rate =
		(buf_pool->stat.n_pages_made_young
		 - buf_pool->old_stat.n_pages_made_young) / time_elapsed;

	pool_info->page_not_made_young_rate =
		(buf_pool->stat.n_pages_not_made_young
		 - buf_pool->old_stat.n_pages_not_made_young) / time_elapsed;

	pool_info->pages_read_rate =
		(buf_pool->stat.n_pages_read
		 - buf_pool->old_stat.n_pages_read) / time_elapsed;

	pool_info->pages_created_rate =
		(buf_pool->stat.n_pages_created
		 - buf_pool->old_stat.n_pages_created) / time_elapsed;

	pool_info->pages_written_rate =
		(buf_pool->stat.n_pages_written
		 - buf_pool->old_stat.n_pages_written) / time_elapsed;

	pool_info->n_page_get_delta = buf_pool->stat.n_page_gets
				      - buf_pool->old_stat.n_page_gets;

	if (pool_info->n_page_get_delta) {
		pool_info->page_read_delta = buf_pool->stat.n_pages_read
					     - buf_pool->old_stat.n_pages_read;

		pool_info->young_making_delta =
			buf_pool->stat.n_pages_made_young
			- buf_pool->old_stat.n_pages_made_young;

		pool_info->not_young_making_delta =
			buf_pool->stat.n_pages_not_made_young
			- buf_pool->old_stat.n_pages_not_made_young;
	}

	pool_info->pages_readahead_rnd_rate =
		(buf_pool->stat.n_ra_pages_read_rnd
		 - buf_pool->old_stat.n_ra_pages_read_rnd) / time_elapsed;

	pool_info->pages_readahead_rate =
		(buf_pool->stat.n_ra_pages_read
		 - buf_pool->old_stat.n_ra_pages_read) / time_elapsed;

	pool_info->pages_evicted_rate =
		(buf_pool->stat.n_ra_pages_evicted
		 - buf_pool->old_stat.n_ra_pages_evicted) / time_elapsed;

	pool_info->unzip_lru_len = UT_LIST_GET_LEN(buf_pool->unzip_LRU);

	pool_info->io_sum = buf_LRU_stat_sum.io;
	pool_info->io_cur = buf_LRU_stat_cur.io;
	pool_info->unzip_sum = buf_LRU_stat_sum.unzip;
	pool_info->unzip_cur = buf_LRU_stat_cur.unzip;

	buf_refresh_io_stats(buf_pool);
	buf_pool_mutex_exit(buf_pool);
}
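
/* Illustrative sketch, not part of InnoDB. buf_stats_get_pool_info()
turns counters into per-second rates. Each rate is the difference between
the live counters (stat) and the snapshot taken by buf_refresh_io_stats()
(old_stat), divided by the elapsed seconds plus 0.001. Counters,
per_second() and refresh() are hypothetical names, and only one counter
is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-field counter set; in InnoDB the pair is
// buf_pool->stat (live) and buf_pool->old_stat (last snapshot).
struct Counters {
	std::uint64_t n_pages_read;
};

// Rate per second since the last snapshot. Adding 0.001 s keeps the
// divisor non-zero when difftime() returns 0.
double per_second(const Counters& now, const Counters& old, double seconds)
{
	return double(now.n_pages_read - old.n_pages_read) / (0.001 + seconds);
}

// Counterpart of buf_refresh_io_stats(): start a new interval.
void refresh(Counters& old, const Counters& now)
{
	old = now;
}
```

buf_stats_get_pool_info() calls buf_refresh_io_stats() before it
returns, so each printout measures the interval since the previous one. */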

/*********************************************************************//**
Prints the I/O statistics held in one buf_pool_info_t: either a single
buffer pool instance or the aggregated totals. */
UNIV_INTERN
void
buf_print_io_instance(
/*==================*/
	buf_pool_info_t*pool_info,	/*!< in: buffer pool info */
	FILE*		file)		/*!< in/out: file where to print */
{
	ut_ad(pool_info);

	fprintf(file,
		"Buffer pool size %lu\n"
		"Free buffers %lu\n"
		"Database pages %lu\n"
		"Old database pages %lu\n"
		"Modified db pages %lu\n"
		"Percent of dirty pages(LRU & free pages): %.3f\n"
		"Max dirty pages percent: %.3f\n"
		"Pending reads %lu\n"
		"Pending writes: LRU %lu, flush list %lu, single page %lu\n",
		pool_info->pool_size,
		pool_info->free_list_len,
		pool_info->lru_len,
		pool_info->old_lru_len,
		pool_info->flush_list_len,
		(((double) pool_info->flush_list_len) /
		 (pool_info->lru_len + pool_info->free_list_len + 1.0)) * 100.0,
		srv_max_buf_pool_modified_pct,
		pool_info->n_pend_reads,
		pool_info->n_pending_flush_lru,
		pool_info->n_pending_flush_list,
		pool_info->n_pending_flush_single_page);

	fprintf(file,
		"Pages made young %lu, not young %lu\n"
		"%.2f youngs/s, %.2f non-youngs/s\n"
		"Pages read %lu, created %lu, written %lu\n"
		"%.2f reads/s, %.2f creates/s, %.2f writes/s\n",
		pool_info->n_pages_made_young,
		pool_info->n_pages_not_made_young,
		pool_info->page_made_young_rate,
		pool_info->page_not_made_young_rate,
		pool_info->n_pages_read,
		pool_info->n_pages_created,
		pool_info->n_pages_written,
		pool_info->pages_read_rate,
		pool_info->pages_created_rate,
		pool_info->pages_written_rate);

	if (pool_info->n_page_get_delta) {
		double hit_rate = double(pool_info->page_read_delta)
			/ pool_info->n_page_get_delta;

		if (hit_rate > 1) {
			hit_rate = 1;
		}

		fprintf(file,
			"Buffer pool hit rate " ULINTPF " / 1000,"
			" young-making rate " ULINTPF " / 1000 not "
			ULINTPF " / 1000\n",
			ulint(1000 * (1 - hit_rate)),
			ulint(1000 * double(pool_info->young_making_delta)
			      / pool_info->n_page_get_delta),
			ulint(1000 * double(pool_info->not_young_making_delta)
			      / pool_info->n_page_get_delta));
	} else {
		fputs("No buffer pool page gets since the last printout\n",
		      file);
	}

	/* Statistics about read ahead algorithm */
	fprintf(file, "Pages read ahead %.2f/s,"
		" evicted without access %.2f/s,"
		" Random read ahead %.2f/s\n",

		pool_info->pages_readahead_rate,
		pool_info->pages_evicted_rate,
		pool_info->pages_readahead_rnd_rate);

	/* Print some values to help us with visualizing what is
	happening with LRU eviction. */
	fprintf(file,
		"LRU len: %lu, unzip_LRU len: %lu\n"
		"I/O sum[%lu]:cur[%lu], unzip sum[%lu]:cur[%lu]\n",
		pool_info->lru_len, pool_info->unzip_lru_len,
		pool_info->io_sum, pool_info->io_cur,
		pool_info->unzip_sum, pool_info->unzip_cur);
}
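
/* Illustrative sketch, not part of InnoDB: the per-mille hit rate of
buf_print_io_instance() computed in isolation. Despite its name, the
local variable hit_rate in that function holds the miss ratio
page_read_delta / n_page_get_delta, capped at 1. The function then
prints 1000 * (1 - hit_rate) as the hit rate. hit_rate_per_mille() is a
hypothetical name. The reason given below for the cap is our assumption:
read-ahead may make page reads outnumber page gets in one interval.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Per-mille buffer pool hit rate over one printout interval.
std::uint64_t hit_rate_per_mille(std::uint64_t page_read_delta,
				 std::uint64_t n_page_get_delta)
{
	// The caller prints "No buffer pool page gets ..." instead.
	assert(n_page_get_delta > 0);
	// Miss ratio, capped at 1 (assumed: reads can exceed gets).
	double miss = std::min(1.0, double(page_read_delta) / n_page_get_delta);
	return std::uint64_t(1000 * (1 - miss));
}
```

For example, 250 pages read for 1000 page gets prints as 750 / 1000. */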

/*********************************************************************//**
Prints the I/O statistics of all buffer pools. The aggregated totals come
first, followed by each instance separately when there is more than one
buffer pool. */
UNIV_INTERN
void
buf_print_io(
/*=========*/
	FILE*	file)	/*!< in/out: file where to print */
{
	ulint			i;
	buf_pool_info_t*	pool_info;
	buf_pool_info_t*	pool_info_total;

	/* If srv_buf_pool_instances is greater than 1, allocate
	one extra buf_pool_info_t, the last one stores
	aggregated/total values from all pools */
	if (srv_buf_pool_instances > 1) {
		pool_info = (buf_pool_info_t*) mem_zalloc((
			srv_buf_pool_instances + 1) * sizeof *pool_info);

		pool_info_total = &pool_info[srv_buf_pool_instances];
	} else {
		ut_a(srv_buf_pool_instances == 1);

		pool_info_total = pool_info =
			static_cast<buf_pool_info_t*>(
				mem_zalloc(sizeof *pool_info));
	}

	for (i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		/* Fetch individual buffer pool info and calculate
		aggregated stats along the way */
		buf_stats_get_pool_info(buf_pool, i, pool_info);

		/* If we have more than one buffer pool, store
		the aggregated stats */
		if (srv_buf_pool_instances > 1) {
			buf_stats_aggregate_pool_info(pool_info_total,
						      &pool_info[i]);
		}
	}

	/* Print the aggregate buffer pool info */
	buf_print_io_instance(pool_info_total, file);

	/* If there is more than one buffer pool, print the info of each
	individual pool */
	if (srv_buf_pool_instances > 1) {
		fputs("----------------------\n"
		"INDIVIDUAL BUFFER POOL INFO\n"
		"----------------------\n", file);

		for (i = 0; i < srv_buf_pool_instances; i++) {
			fprintf(file, "---BUFFER POOL %lu\n", i);
			buf_print_io_instance(&pool_info[i], file);
		}
	}

	mem_free(pool_info);
}
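
/* Illustrative sketch, not part of InnoDB: the array layout used by
buf_print_io(). When there are several buffer pools, it allocates
srv_buf_pool_instances + 1 zeroed entries and keeps the totals in the
extra last entry. With a single buffer pool, the one entry doubles as its
own total. The sketch reproduces this with std::vector. Info, collect()
and totals() are hypothetical names. As in buf_print_io(), the totals
are accumulated in the same loop that fills the entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-field stand-in for buf_pool_info_t.
struct Info {
	std::size_t lru_len;
};

// n > 1 instances: n + 1 entries, totals in the last one.
// n == 1: a single entry that is also the total.
std::vector<Info> collect(const std::vector<std::size_t>& lru_lens)
{
	std::size_t n = lru_lens.size();
	assert(n >= 1);
	std::vector<Info> info(n > 1 ? n + 1 : 1, Info{0});
	for (std::size_t i = 0; i < n; i++) {
		info[i].lru_len = lru_lens[i];	// per-instance figures
		if (n > 1) {
			info[n].lru_len += info[i].lru_len;	// running total
		}
	}
	return info;
}

// In both layouts the totals entry is the last element.
const Info& totals(const std::vector<Info>& info)
{
	return info.back();
}
```

Because the totals always sit in the last element, the printing code
does not need to know which layout was chosen. */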

/**********************************************************************//**
Refreshes the statistics used to print per-second averages. */
UNIV_INTERN
void
buf_refresh_io_stats(
/*=================*/
	buf_pool_t*	buf_pool)	/*!< in: buffer pool instance */
{
	buf_pool->last_printout_time = ut_time();
	buf_pool->old_stat = buf_pool->stat;
}

/**********************************************************************//**
Refreshes the statistics used to print per-second averages in all buffer
pool instances. */
UNIV_INTERN
void
buf_refresh_io_stats_all(void)
/*==========================*/
{
	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		buf_refresh_io_stats(buf_pool);
	}
}

/**********************************************************************//**
Checks whether all pages in all buffer pools are in a replaceable state.
@return FALSE if not */
UNIV_INTERN
ibool
buf_all_freed(void)
/*===============*/
{
	for (ulint i = 0; i < srv_buf_pool_instances; i++) {
		buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		if (!buf_all_freed_instance(buf_pool)) {
			return(FALSE);
		}
	}

	return(TRUE);
}

/*********************************************************************//**
Counts the pending I/O operations (reads, and LRU, single-page and
flush-list writes) in all buffer pools. The caller can use the result to
check that none remain.
@return number of pending I/O operations */
UNIV_INTERN
ulint
buf_pool_check_no_pending_io(void)
/*==============================*/
{
	ulint		i;
	ulint		pending_io = 0;

	buf_pool_mutex_enter_all();

	for (i = 0; i < srv_buf_pool_instances; i++) {
		const buf_pool_t*	buf_pool;

		buf_pool = buf_pool_from_array(i);

		pending_io += buf_pool->n_pend_reads
			      + buf_pool->n_flush[BUF_FLUSH_LRU]
			      + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]
			      + buf_pool->n_flush[BUF_FLUSH_LIST];
	}

	buf_pool_mutex_exit_all();

	return(pending_io);
}

#if 0
Code currently not used
/*********************************************************************//**
Gets the current length of the free list of buffer blocks.
@return length of the free list */
UNIV_INTERN
ulint
buf_get_free_list_len(void)
/*=======================*/
{
	ulint	len;

	buf_pool_mutex_enter(buf_pool);

	len = UT_LIST_GET_LEN(buf_pool->free);

	buf_pool_mutex_exit(buf_pool);

	return(len);
}
#endif

#else /* !UNIV_HOTBACKUP */
/********************************************************************//**
Initializes a page in the buffer pool, for use in mysqlbackup --restore. */
UNIV_INTERN
void
buf_page_init_for_backup_restore(
/*=============================*/
	ulint		space,	/*!< in: space id */
	ulint		offset,	/*!< in: offset of the page within space
				in units of a page */
	ulint		zip_size,/*!< in: compressed page size in bytes
				or 0 for uncompressed pages */
	buf_block_t*	block)	/*!< in: block to init */
{
	block->page.state	= BUF_BLOCK_FILE_PAGE;
	block->page.space	= space;
	block->page.offset	= offset;

	page_zip_des_init(&block->page.zip);

	/* We assume that block->page.data has been allocated
	with zip_size == UNIV_PAGE_SIZE. */
	ut_ad(zip_size <= UNIV_ZIP_SIZE_MAX);
	ut_ad(ut_is_2pow(zip_size));
	page_zip_set_size(&block->page.zip, zip_size);
	if (zip_size) {
		block->page.zip.data = block->frame + UNIV_PAGE_SIZE;
	}
}
#endif /* !UNIV_HOTBACKUP */
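
/* Illustrative sketch, not part of InnoDB. The ut_is_2pow(zip_size)
assertion in buf_page_init_for_backup_restore() relies on the usual bit
test. We assume that ut_is_2pow(n) evaluates !(n & (n - 1)); check this
against ut0ut.h. Zero also passes the test, which is why an uncompressed
page (zip_size == 0) satisfies the assertion. is_2pow() is a
hypothetical stand-in.

```cpp
#include <cassert>
#include <cstddef>

// A power of two has exactly one bit set, so clearing its lowest set
// bit (n & (n - 1)) leaves zero. For n == 0 the expression is also zero.
constexpr bool is_2pow(std::size_t n)
{
	return (n & (n - 1)) == 0;
}

static_assert(is_2pow(8192), "8 KiB compressed page size");
static_assert(!is_2pow(3000), "not a valid page size");
```

The separate ut_ad(zip_size <= UNIV_ZIP_SIZE_MAX) check bounds the size
from above. */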

/********************************************************************//**
Reserves an unused slot in the temporary memory array and allocates its
temporary buffers if they have not been allocated yet.
@return reserved slot */
UNIV_INTERN
buf_tmp_buffer_t*
buf_pool_reserve_tmp_slot(
/*======================*/
	buf_pool_t*	buf_pool,	/*!< in: buffer pool where to
					reserve */
	bool		compressed)	/*!< in: is file space compressed */
{
	buf_tmp_buffer_t*	free_slot = NULL;

	/* Array is protected by buf_pool mutex */
	buf_pool_mutex_enter(buf_pool);

	for (ulint i = 0; i < buf_pool->tmp_arr->n_slots; i++) {
		buf_tmp_buffer_t*	slot = &buf_pool->tmp_arr->slots[i];

		if (slot->reserved == false) {
			free_slot = slot;
			break;
		}
	}

	/* We assume that a free slot is found */
	ut_a(free_slot != NULL);
	free_slot->reserved = true;
	/* Now that we have reserved this slot we can release
	buf_pool mutex */
	buf_pool_mutex_exit(buf_pool);

	/* Allocate temporary memory for encryption/decryption */
	if (free_slot->crypt_buf == NULL) {
		free_slot->crypt_buf = static_cast<byte*>(aligned_malloc(UNIV_PAGE_SIZE, UNIV_PAGE_SIZE));
		memset(free_slot->crypt_buf, 0, UNIV_PAGE_SIZE);
	}

	/* For page compressed tables allocate temporary memory for
	compression/decompression */
	if (compressed && free_slot->comp_buf == NULL) {
		ulint size = UNIV_PAGE_SIZE;

		/* Both snappy and lzo compression methods require that
		the output buffer used for compression be larger than the
		input buffer. Increase the allocated buffer size accordingly. */
#if HAVE_SNAPPY
		size = snappy_max_compressed_length(size);
#endif
#if HAVE_LZO
		size += LZO1X_1_15_MEM_COMPRESS;
#endif
		free_slot->comp_buf = static_cast<byte*>(aligned_malloc(size, UNIV_PAGE_SIZE));
		memset(free_slot->comp_buf, 0, size);
	}

	return (free_slot);
}
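
/* Illustrative sketch, not part of InnoDB: a simplified model of
buf_pool_reserve_tmp_slot() in which std::mutex stands in for the buffer
pool mutex. TmpSlot and TmpSlotArray are hypothetical names. One lazily
allocated buffer replaces crypt_buf and comp_buf. The mutex covers only
the search and the reservation flag. The allocation happens after the
mutex is released, which is safe because no other thread picks a slot
once it is marked reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical simplified slot; buf_tmp_buffer_t has separate crypt_buf
// and comp_buf buffers plus out_buf.
struct TmpSlot {
	bool reserved = false;
	std::unique_ptr<unsigned char[]> buf;	// allocated on first use
};

// Hypothetical slot array guarded by one mutex, in place of
// buf_pool->tmp_arr and the buffer pool mutex.
class TmpSlotArray {
public:
	explicit TmpSlotArray(std::size_t n_slots) : slots_(n_slots) {}

	// Find and mark a free slot under the mutex, then allocate its
	// buffer, if needed, after the mutex has been released.
	TmpSlot* reserve(std::size_t page_size)
	{
		TmpSlot* free_slot = nullptr;
		{
			std::lock_guard<std::mutex> guard(mutex_);
			for (TmpSlot& slot : slots_) {
				if (!slot.reserved) {
					free_slot = &slot;
					break;
				}
			}
			// Like ut_a(free_slot != NULL), running out of slots
			// is fatal (but assert() vanishes under NDEBUG).
			assert(free_slot != nullptr);
			free_slot->reserved = true;
		}
		if (!free_slot->buf) {
			// Value-initialised, i.e. zero-filled like the memset().
			free_slot->buf.reset(new unsigned char[page_size]());
		}
		return free_slot;
	}

	// Give a slot back; its buffer is kept for later reuse.
	void release(TmpSlot* slot)
	{
		std::lock_guard<std::mutex> guard(mutex_);
		slot->reserved = false;
	}

private:
	std::mutex mutex_;
	std::vector<TmpSlot> slots_;
};
```

In buf_page_decrypt_after_read(), a slot is released by storing false
into slot->reserved without holding the buffer pool mutex. release() in
this sketch takes the lock to keep the C++ model free of data races. */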

/** Encryption and page_compression hook that is called just before
a page is written to disk.
@param[in,out]	space		tablespace
@param[in,out]	bpage		buffer page
@param[in]	src_frame	physical page frame that is being encrypted
@return page frame to be written to file
(may be src_frame or an encrypted/compressed copy of it) */
UNIV_INTERN
byte*
buf_page_encrypt_before_write(
	fil_space_t*	space,
	buf_page_t*	bpage,
	byte*		src_frame)
{
	ut_ad(space->id == bpage->space);
	bpage->real_size = UNIV_PAGE_SIZE;

	fil_page_type_validate(src_frame);

	switch (bpage->offset) {
	case 0:
		/* Page 0 of a tablespace is not encrypted/compressed */
		return src_frame;
	case TRX_SYS_PAGE_NO:
		if (bpage->space == TRX_SYS_SPACE) {
			/* Do not encrypt or compress this page: it
			contains the address of the doublewrite buffer. */
			return src_frame;
		}
	}

	fil_space_crypt_t* crypt_data = space->crypt_data;

	const bool encrypted = crypt_data
		&& !crypt_data->not_encrypted()
		&& crypt_data->type != CRYPT_SCHEME_UNENCRYPTED
		&& (!crypt_data->is_default_encryption()
		    || srv_encrypt_tables);

	bool page_compressed = FSP_FLAGS_HAS_PAGE_COMPRESSION(space->flags);

	if (!encrypted && !page_compressed) {
		/* No need to encrypt or page compress the page.
		Clear key-version & crypt-checksum. */
		memset(src_frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8);
		return src_frame;
	}

	ulint zip_size = buf_page_get_zip_size(bpage);
	ulint page_size = (zip_size) ? zip_size : UNIV_PAGE_SIZE;
	buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);

	/* Find free slot from temporary memory array */
	buf_tmp_buffer_t* slot = buf_pool_reserve_tmp_slot(buf_pool, page_compressed);
	slot->out_buf = NULL;
	bpage->slot = slot;

	byte* dst_frame = slot->crypt_buf;

	if (!page_compressed) {
		/* Encrypt page content */
		byte* tmp = fil_space_encrypt(space,
					      bpage->offset,
					      bpage->newest_modification,
					      src_frame,
					      dst_frame);

		bpage->real_size = page_size;
		slot->out_buf = dst_frame = tmp;

		ut_d(fil_page_type_validate(tmp));
	} else {
		/* First we compress the page content */
		ulint out_len = 0;

		byte* tmp = fil_compress_page(
			space,
			(byte*) src_frame,
			slot->comp_buf,
			page_size,
			fsp_flags_get_page_compression_level(space->flags),
			fil_space_get_block_size(space, bpage->offset),
			encrypted,
			&out_len);

		bpage->real_size = out_len;

#ifdef UNIV_DEBUG
		fil_page_type_validate(tmp);
#endif

		if (encrypted) {
			/* And then we encrypt the page content */
			tmp = fil_space_encrypt(space,
						bpage->offset,
						bpage->newest_modification,
						tmp,
						dst_frame);
		}

		slot->out_buf = dst_frame = tmp;
	}

#ifdef UNIV_DEBUG
	fil_page_type_validate(dst_frame);
#endif

	// return dst_frame which will be written
	return dst_frame;
}
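
/* Illustrative sketch, not part of InnoDB. The branching in
buf_page_encrypt_before_write() reduces to a small decision table, which
the sketch encodes as a pure function. WriteAction and write_action()
are hypothetical names. The constants assume TRX_SYS_SPACE == 0 and
TRX_SYS_PAGE_NO == 5; check both against the headers. The encrypted
argument stands for the crypt_data test computed in the function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of what happens to a page before write.
enum class WriteAction {
	AS_IS,			// written unchanged
	CLEAR_KEY_VERSION,	// unchanged except 8 zeroed header bytes
	ENCRYPT,
	COMPRESS,
	COMPRESS_THEN_ENCRYPT
};

// Assumed values of TRX_SYS_SPACE and TRX_SYS_PAGE_NO.
constexpr std::uint32_t kTrxSysSpace = 0;
constexpr std::uint32_t kTrxSysPageNo = 5;

WriteAction write_action(std::uint32_t space_id, std::uint32_t page_no,
			 bool encrypted, bool page_compressed)
{
	// Page 0 of every tablespace, and the TRX_SYS page of the system
	// tablespace (it holds the doublewrite buffer address), are
	// never transformed.
	if (page_no == 0
	    || (page_no == kTrxSysPageNo && space_id == kTrxSysSpace)) {
		return WriteAction::AS_IS;
	}
	if (!encrypted && !page_compressed) {
		return WriteAction::CLEAR_KEY_VERSION;
	}
	if (!page_compressed) {
		return WriteAction::ENCRYPT;
	}
	// Compression comes first; encrypted output would not compress well.
	return encrypted ? WriteAction::COMPRESS_THEN_ENCRYPT
			 : WriteAction::COMPRESS;
}
```

Page 5 of a tablespace other than the system tablespace is treated like
any other page. */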

/** Decrypt a page.
@param[in,out]	bpage	Page control block
@param[in,out]	space	tablespace
@return whether the operation was successful */
static
bool
buf_page_decrypt_after_read(buf_page_t* bpage, fil_space_t* space)
{
	ut_ad(space->n_pending_ios > 0);
	ut_ad(space->id == bpage->space);

	ulint zip_size = buf_page_get_zip_size(bpage);
	ulint size = (zip_size) ? zip_size : UNIV_PAGE_SIZE;

	byte* dst_frame = (zip_size) ? bpage->zip.data :
		((buf_block_t*) bpage)->frame;
	unsigned key_version =
		mach_read_from_4(dst_frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION);
	bool page_compressed = fil_page_is_compressed(dst_frame);
	bool page_compressed_encrypted = fil_page_is_compressed_encrypted(dst_frame);
	buf_pool_t* buf_pool = buf_pool_from_bpage(bpage);
	bool success = true;

	if (bpage->offset == 0) {
		/* File header pages are not encrypted/compressed */
		return (true);
	}

	/* Page is encrypted if encryption information is found from
	tablespace and page contains used key_version. This is true
	also for pages first compressed and then encrypted. */
	if (!space->crypt_data) {
		key_version = 0;
	}

	if (page_compressed) {
		/* the page we read is unencrypted */
		/* Find free slot from temporary memory array */
		buf_tmp_buffer_t* slot = buf_pool_reserve_tmp_slot(buf_pool, page_compressed);

#ifdef UNIV_DEBUG
		fil_page_type_validate(dst_frame);
#endif

		/* decompress using comp_buf to dst_frame */
		fil_decompress_page(slot->comp_buf,
				    dst_frame,
				    ulong(size),
				    &bpage->write_size);

		/* Mark this slot as free */
		slot->reserved = false;
		key_version = 0;

#ifdef UNIV_DEBUG
		fil_page_type_validate(dst_frame);
#endif
	} else {
		buf_tmp_buffer_t* slot = NULL;

		if (key_version) {
			/* Verify encryption checksum before we even try to
			decrypt. */
			if (!fil_space_verify_crypt_checksum(dst_frame,
					zip_size, NULL, bpage->offset)) {

				/* Mark page encrypted in case it should
				be. */
				if (space->crypt_data->type
				    != CRYPT_SCHEME_UNENCRYPTED) {
					bpage->encrypted = true;
				}

				return (false);
			}

			/* Find free slot from temporary memory array */
			slot = buf_pool_reserve_tmp_slot(buf_pool, page_compressed);

#ifdef UNIV_DEBUG
			fil_page_type_validate(dst_frame);
#endif

			/* decrypt using crypt_buf to dst_frame */
			if (!fil_space_decrypt(space, slot->crypt_buf,
					       dst_frame, &bpage->encrypted)) {
				success = false;
			}

#ifdef UNIV_DEBUG
			fil_page_type_validate(dst_frame);
#endif
		}

		if (page_compressed_encrypted && success) {
			if (!slot) {
				slot = buf_pool_reserve_tmp_slot(buf_pool, page_compressed);
			}

#ifdef UNIV_DEBUG
			fil_page_type_validate(dst_frame);
#endif
			/* decompress using comp_buf to dst_frame */
			fil_decompress_page(slot->comp_buf,
					    dst_frame,
					    ulong(size),
					    &bpage->write_size);
			ut_d(fil_page_type_validate(dst_frame));
		}

		/* Mark this slot as free */
		if (slot) {
			slot->reserved = false;
		}
	}

	ut_ad(space->n_pending_ios > 0);
	return (success);
}
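
/* Illustrative sketch, not part of InnoDB. buf_page_decrypt_after_read()
undoes the write path in reverse order: a page that was compressed and
then encrypted is first decrypted and then decompressed. The
hypothetical read_steps() below traces which steps run for given page
properties. The flags stand for fil_page_is_compressed(),
fil_page_is_compressed_encrypted(), a non-zero key version in a
tablespace that has crypt_data, and the result of
fil_space_verify_crypt_checksum(). The sketch does not model a failure
inside fil_space_decrypt(). In the real function, such a failure also
skips decompression.

```cpp
#include <cassert>
#include <string>

// Returns the steps applied to a page read from disk, joined by '+',
// "none" if nothing is done, or "fail" if the crypt checksum is bad.
std::string read_steps(bool page0, bool page_compressed,
		       bool compressed_encrypted, bool has_key_version,
		       bool crypt_checksum_ok)
{
	if (page0) {
		return "none";			// file header page
	}
	if (page_compressed) {
		return "decompress";		// compressed, not encrypted
	}
	std::string steps;
	if (has_key_version) {
		if (!crypt_checksum_ok) {
			return "fail";		// checked before decrypting
		}
		steps = "decrypt";
	}
	if (compressed_encrypted) {
		steps += steps.empty() ? "decompress" : "+decompress";
	}
	return steps.empty() ? "none" : steps;
}
```

A compressed and encrypted page in a tablespace without crypt_data has
its key version forced to 0, so it is only decompressed. */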