
3941 lines
105 KiB

branches/zip: Improve the LRU algorithm with a separate unzip_LRU list of blocks that contain both an uncompressed and a compressed frame. This patch was designed by Heikki and Inaam, implemented by Inaam, and refined and reviewed by Marko and Sunny.

buf_buddy_n_frames, buf_buddy_min_n_frames, buf_buddy_max_n_frames: Remove.

buf_page_belongs_to_unzip_LRU(): New predicate: bpage->zip.data && buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE.

buf_pool_t, buf_block_t: Add the linked list unzip_LRU. A block in the regular LRU list is in unzip_LRU iff buf_page_belongs_to_unzip_LRU() holds.

buf_LRU_free_block(): Add a third return value to refine the case "cannot free the block".

buf_LRU_search_and_free_block(): Update the documentation to reflect the implementation.

buf_LRU_stat_t, buf_LRU_stat_cur, buf_LRU_stat_sum, buf_LRU_stat_arr[]: Statistics for the unzip_LRU algorithm.

buf_LRU_stat_update(): New function: update the statistics. Called once per second by srv_error_monitor_thread().

buf_LRU_validate(): Validate the unzip_LRU list as well.

buf_LRU_evict_from_unzip_LRU(): New predicate: should the unzip_LRU be used before falling back to the regular LRU?

buf_LRU_free_from_unzip_LRU_list(), buf_LRU_free_from_common_LRU_list(): Subfunctions of buf_LRU_search_and_free_block().

buf_LRU_search_and_free_block(): Reimplement. Try to evict an uncompressed page from the unzip_LRU list before falling back to evicting an entire block from the common LRU list.

buf_unzip_LRU_remove_block_if_needed(): New function.

buf_unzip_LRU_add_block(): New function: add a block to the unzip_LRU list.
18 years ago
branches/zip: Clean up the insert buffer subsystem.

Originally, there were provisions in InnoDB for multiple insert buffer B-trees, apparently one for each tablespace. When Heikki implemented innodb_file_per_table (multiple InnoDB tablespaces) in MySQL 4.1, he made the insert buffer live only in the system tablespace (space 0) but left the provisions in the code. When Osku Salerma implemented delete buffering, he also cleaned up the insert buffer subsystem so that only one insert buffer B-tree exists. This patch applies the clean-up to the InnoDB Plugin. Having a separate patch for the insert buffer clean-up should help us better compare the essential changes of the InnoDB Plugin and InnoDB+, and track down bugs that are specific to InnoDB+.

IBUF_SPACE_ID: New constant, defined as 0.

ibuf_data_t: Remove. ibuf_t: Add the applicable fields from ibuf_data_t. There is only one insert buffer tree from now on.

ibuf_page_low(), ibuf_page(): Merge into a single function ibuf_page().

fil_space_t: Remove ibuf_data. fil_space_get_ibuf_data(): Remove. There is only one ibuf_data, for space IBUF_SPACE_ID.

fil_ibuf_init_at_db_start(): Remove. ibuf_init_at_db_start(): Fuse with ibuf_data_init_for_space().

ibuf_validate_low(): Remove. There is only one ibuf tree.

ibuf_free_excess_pages(), ibuf_header_page_get(): Remove the parameter space, which was always 0.

ibuf_tree_root_get(): Remove the parameters space and data. There is only one ibuf tree, for space IBUF_SPACE_ID.

ibuf_data_sizes_update(): Rename to ibuf_size_update(), and remove the parameter data. There is only one ibuf data struct.

ibuf_build_entry_pre_4_1_x(): New function, refactored from ibuf_build_entry_from_ibuf_rec().

ibuf_data_enough_free_for_insert(), ibuf_data_too_much_free(): Remove the parameter data. There is only one insert buffer tree.

ibuf_add_free_page(), ibuf_remove_free_page(): Remove the parameters space and data. There is only one insert buffer tree.

ibuf_get_merge_page_nos(): Add parentheses, to reduce diffs to branches/innodb+.

ibuf_contract_ext(): Do not pick an insert buffer tree at random. There is only one.

ibuf_print(): Print the single insert buffer tree.

rb://19 approved by Heikki on IM
17 years ago
branches/zip: Merge 2437:2485 from branches/5.1 (r2478 was skipped for the obvious reason):

------------------------------------------------------------------------
r2464 | vasil | 2008-05-19 17:59:42 +0300 (Mon, 19 May 2008) | 9 lines

branches/5.1: Fix Bug#36600 (SHOW STATUS takes a lot of CPU in buf_get_latched_pages_number) by removing the Innodb_buffer_pool_pages_latched variable from SHOW STATUS output in non-UNIV_DEBUG compilation.

Approved by: Heikki
------------------------------------------------------------------------
r2466 | calvin | 2008-05-20 01:37:14 +0300 (Tue, 20 May 2008) | 12 lines

branches/5.1: Fix Bug#11894 (innodb_file_per_table crashes w/ Windows .sym symbolic link hack). The crash was due to unhandled error 3 (path not found). In the file-per-table case, change the call from os_file_handle_error() to os_file_handle_error_no_exit(). Also, check for the full path pattern during table creation (Windows only), which is used in symbolic link and temp table creation.

Approved by: Heikki
------------------------------------------------------------------------
r2478 | sunny | 2008-05-23 08:29:08 +0300 (Fri, 23 May 2008) | 3 lines

branches/5.1: Fix for Bug#36793. This is a backport from branches/zip. This code has been tested on a big-endian machine too.
------------------------------------------------------------------------
r2480 | vasil | 2008-05-27 11:40:07 +0300 (Tue, 27 May 2008) | 11 lines

branches/5.1: Fix Bug#36819 (ut_usectime does not handle errors from gettimeofday) by retrying gettimeofday() several times if it fails in ut_usectime(). If it fails on all calls, return an error to the caller to be handled at a higher level. Update the variable innodb_row_lock_time_max in SHOW STATUS output only if ut_usectime() was successful.
------------------------------------------------------------------------
r2482 | sunny | 2008-05-28 12:18:35 +0300 (Wed, 28 May 2008) | 5 lines

branches/5.1: Fix for Bug#35602, "Failed to read auto-increment value from storage engine". The test for REPLACE was an error of omission, since REPLACE is classified as a simple INSERT. With this fix, we do not acquire the special AUTOINC lock for REPLACE statements under AUTOINC_NEW_STYLE_LOCKING.
------------------------------------------------------------------------
r2485 | vasil | 2008-05-28 16:01:14 +0300 (Wed, 28 May 2008) | 9 lines

branches/5.1: Fix Bug#36149 (Read buffer overflow in srv0start.c found during "make test"). Use strncmp(3) instead of memcmp(3) to avoid reading past the end of the string if it is empty (*str == '\0'). This bug is _not_ a buffer overflow.

Discussed with: Sunny (via IM)
------------------------------------------------------------------------
18 years ago
20 years ago
20 years ago
20 years ago
20 years ago
20 years ago
20 years ago
20 years ago
20 years ago
20 years ago
branches/zip: Merge 2437:2485 from branches/5.1: (r2478 was skipped for the obvious reason)

------------------------------------------------------------------------
r2464 | vasil | 2008-05-19 17:59:42 +0300 (Mon, 19 May 2008) | 9 lines

branches/5.1:

Fix Bug#36600 SHOW STATUS takes a lot of CPU in buf_get_latched_pages_number
by removing the Innodb_buffer_pool_pages_latched variable from SHOW STATUS
output in non-UNIV_DEBUG compilation.

Approved by: Heikki
------------------------------------------------------------------------
r2466 | calvin | 2008-05-20 01:37:14 +0300 (Tue, 20 May 2008) | 12 lines

branches/5.1: Fix Bug#11894 innodb_file_per_table crashes w/ Windows .sym
symbolic link hack

The crash was due to un-handled error 3 (path not found). In the case of
file per table, change the call to os_file_handle_error_no_exit() from
os_file_handle_error(). Also, checks for full path pattern during table
create (Windows only), which is used in symbolic link and temp table
creation.

Approved by: Heikki
------------------------------------------------------------------------
r2478 | sunny | 2008-05-23 08:29:08 +0300 (Fri, 23 May 2008) | 3 lines

branches/5.1: Fix for bug# 36793. This is a back port from branches/zip.
This code has been tested on a big-endian machine too.
------------------------------------------------------------------------
r2480 | vasil | 2008-05-27 11:40:07 +0300 (Tue, 27 May 2008) | 11 lines

branches/5.1:

Fix Bug#36819 ut_usectime does not handle errors from gettimeofday
by retrying gettimeofday() several times if it fails in ut_usectime().
If it fails on all calls then return error to the caller to be handled
at higher level.

Update the variable innodb_row_lock_time_max in SHOW STATUS output only
if ut_usectime() was successful.
------------------------------------------------------------------------
r2482 | sunny | 2008-05-28 12:18:35 +0300 (Wed, 28 May 2008) | 5 lines

branches/5.1: Fix for Bug#35602, "Failed to read auto-increment value from
storage engine". The test for REPLACE was an error of omission since it's
classified as a simple INSERT. For REPLACE statements we don't acquire the
special AUTOINC lock for AUTOINC_NEW_STYLE_LOCKING with this fix.
------------------------------------------------------------------------
r2485 | vasil | 2008-05-28 16:01:14 +0300 (Wed, 28 May 2008) | 9 lines

branches/5.1:

Fix Bug#36149 Read buffer overflow in srv0start.c found during "make test".
Use strncmp(3) instead of memcmp(3) to avoid reading past end of the string
if it is empty (*str == '\0'). This bug is _not_ a buffer overflow.

Discussed with: Sunny (via IM)
------------------------------------------------------------------------
/* Innobase relational database engine; Copyright (C) 2001 Innobase Oy

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License 2
as published by the Free Software Foundation in June 1991.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License 2
along with this program (in file COPYING); if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */

/******************************************************
The database buffer buf_pool

(c) 1995 Innobase Oy

Created 11/5/1995 Heikki Tuuri
*******************************************************/
/***********************************************************************
# Copyright (c) 2008, Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#     * Redistributions of source code must retain the above copyright
#       notice, this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above
#       copyright notice, this list of conditions and the following
#       disclaimer in the documentation and/or other materials
#       provided with the distribution.
#     * Neither the name of the Google Inc. nor the names of its
#       contributors may be used to endorse or promote products
#       derived from this software without specific prior written
#       permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Note, the BSD license applies to the new code. The old code is GPL.
***********************************************************************/
#include "buf0buf.h"

#ifdef UNIV_NONINL
#include "buf0buf.ic"
#endif

#include "buf0buddy.h"
#include "mem0mem.h"
#include "btr0btr.h"
#include "fil0fil.h"
#include "lock0lock.h"
#include "btr0sea.h"
#include "ibuf0ibuf.h"
#include "dict0dict.h"
#include "log0recv.h"
#include "trx0undo.h"
#include "srv0srv.h"
#include "page0zip.h"
/*
		IMPLEMENTATION OF THE BUFFER POOL
		=================================

Performance improvement:
------------------------
Thread scheduling in NT may be so slow that the OS wait mechanism should
not be used even in waiting for disk reads to complete.
Rather, we should put waiting query threads to the queue of
waiting jobs, and let the OS thread do something useful while the i/o
is processed. In this way we could remove most OS thread switches in
an i/o-intensive benchmark like TPC-C.

A possibility is to put a user space thread library between the database
and NT. User space thread libraries might be very fast.

SQL Server 7.0 can be configured to use 'fibers' which are lightweight
threads in NT. These should be studied.

		Buffer frames and blocks
		------------------------
Following the terminology of Gray and Reuter, we call the memory
blocks where file pages are loaded buffer frames. For each buffer
frame there is a control block, or shortly, a block, in the buffer
control array. The control info which does not need to be stored
in the file along with the file page, resides in the control block.

		Buffer pool struct
		------------------
The buffer buf_pool contains a single mutex which protects all the
control data structures of the buf_pool. The content of a buffer frame is
protected by a separate read-write lock in its control block, though.
These locks can be locked and unlocked without owning the buf_pool mutex.
The OS events in the buf_pool struct can be waited for without owning the
buf_pool mutex.

The buf_pool mutex is a hot-spot in main memory, causing a lot of
memory bus traffic on multiprocessor systems when processors
alternately access the mutex. On our Pentium, the mutex is accessed
maybe every 10 microseconds. We gave up the solution to have mutexes
for each control block, for instance, because it seemed to be
complicated.

A solution to reduce mutex contention of the buf_pool mutex is to
create a separate mutex for the page hash table. On Pentium,
accessing the hash table takes 2 microseconds, about half
of the total buf_pool mutex hold time.

		Control blocks
		--------------
The control block contains, for instance, the bufferfix count
which is incremented when a thread wants a file page to be fixed
in a buffer frame. The bufferfix operation does not lock the
contents of the frame, however. For this purpose, the control
block contains a read-write lock.

The buffer frames have to be aligned so that the start memory
address of a frame is divisible by the universal page size, which
is a power of two.

We intend to make the buffer buf_pool size on-line reconfigurable,
that is, the buf_pool size can be changed without closing the database.
Then the database administrator may adjust it to be bigger
at night, for example. The control block array must
contain enough control blocks for the maximum buffer buf_pool size
which is used in the particular database.
If the buf_pool size is cut, we exploit the virtual memory mechanism of
the OS, and just refrain from using frames at high addresses. Then the OS
can swap them to disk.

The control blocks containing file pages are put to a hash table
according to the file address of the page.
We could speed up the access to an individual page by using
"pointer swizzling": we could replace the page references on
non-leaf index pages by direct pointers to the page, if it exists
in the buf_pool. We could make a separate hash table where we could
chain all the page references in non-leaf pages residing in the buf_pool,
using the page reference as the hash key,
and at the time of reading of a page update the pointers accordingly.
Drawbacks of this solution are added complexity and,
possibly, extra space required on non-leaf pages for memory pointers.
A simpler solution is just to speed up the hash table mechanism
in the database, using tables whose size is a power of 2.

		Lists of blocks
		---------------
There are several lists of control blocks.

The free list (buf_pool->free) contains blocks which are currently not
used.

The common LRU list contains all the blocks holding a file page
except those for which the bufferfix count is non-zero.
The pages are in the LRU list roughly in the order of the last
access to the page, so that the oldest pages are at the end of the
list. We also keep a pointer to near the end of the LRU list,
which we can use when we want to artificially age a page in the
buf_pool. This is used if we know that some page is not needed
again for some time: we insert the block right after the pointer,
causing it to be replaced sooner than would normally be the case.
Currently this aging mechanism is used by the read-ahead of pages,
and it can also be used when there is a scan of a full table which
cannot fit in memory. By putting the pages near the end of the LRU
list, we make sure that most of the buf_pool stays in the main
memory, undisturbed.

The unzip_LRU list contains a subset of the common LRU list. The
blocks on the unzip_LRU list hold a compressed file page and the
corresponding uncompressed page frame. A block is in unzip_LRU if and
only if the predicate buf_page_belongs_to_unzip_LRU(&block->page)
holds. The blocks in unzip_LRU will be in the same order as they are in
the common LRU list. That is, each manipulation of the common LRU
list will result in the same manipulation of the unzip_LRU list.

The chain of modified blocks (buf_pool->flush_list) contains the blocks
holding file pages that have been modified in the memory
but not written to disk yet. The block with the oldest modification
which has not yet been written to disk is at the end of the chain.

The chain of unmodified compressed blocks (buf_pool->zip_clean)
contains the control blocks (buf_page_t) of those compressed pages
that are not in buf_pool->flush_list and for which no uncompressed
page has been allocated in the buffer pool. The control blocks for
uncompressed pages are accessible via buf_block_t objects that are
reachable via buf_pool->chunks[].

The chains of free memory blocks (buf_pool->zip_free[]) are used by
the buddy allocator (buf0buddy.c) to keep track of currently unused
memory blocks of size sizeof(buf_page_t)..UNIV_PAGE_SIZE / 2. These
blocks are inside the UNIV_PAGE_SIZE-sized memory blocks of type
BUF_BLOCK_MEMORY that the buddy allocator requests from the buffer
pool. The buddy allocator is solely used for allocating control
blocks for compressed pages (buf_page_t) and compressed page frames.

		Loading a file page
		-------------------
First, a victim block for replacement has to be found in the
buf_pool. It is taken from the free list or searched for from the
end of the LRU-list. An exclusive lock is reserved for the frame,
the io_fix field is set in the block fixing the block in buf_pool,
and the io-operation for loading the page is queued. The io-handler thread
releases the X-lock on the frame and resets the io_fix field
when the io operation completes.

A thread may request the above operation using the function
buf_page_get(). It may then continue to request a lock on the frame.
The lock is granted when the io-handler releases the x-lock.

		Read-ahead
		----------

The read-ahead mechanism is intended to be intelligent and
isolated from the semantically higher levels of the database
index management. From the higher level we only need the
information if a file page has a natural successor or
predecessor page. On the leaf level of a B-tree index,
these are the next and previous pages in the natural
order of the pages.

Let us first explain the read-ahead mechanism when the leaf pages
of a B-tree are scanned in an ascending or descending order.
When a page is first referenced in the buf_pool,
the buffer manager checks if it is at the border of a so-called
linear read-ahead area. The tablespace is divided into these
areas of size 64 blocks, for example. So if the page is at the
border of such an area, the read-ahead mechanism checks if
all the other blocks in the area have been accessed in an
ascending or descending order. If this is the case, the system
looks at the natural successor or predecessor of the page,
checks if that is at the border of another area, and in this case
issues read-requests for all the pages in that area. Maybe
we could relax the condition that all the pages in the area
have to be accessed: if data is deleted from a table, there may
appear holes of unused pages in the area.

A different read-ahead mechanism is used when there appears
to be a random access pattern to a file.
If a new page is referenced in the buf_pool, and several pages
of its random access area (for instance, 32 consecutive pages
in a tablespace) have recently been referenced, we may predict
that the whole area may be needed in the near future, and issue
the read requests for the whole area.
*/
/* Value in microseconds */
static const int	WAIT_FOR_READ	= 5000;

/* The buffer buf_pool of the database */
UNIV_INTERN buf_pool_t*	buf_pool = NULL;

/* mutex protecting the buffer pool struct and control blocks, except the
read-write lock in them */
UNIV_INTERN mutex_t	buf_pool_mutex;
/* mutex protecting the control blocks of compressed-only pages
(of type buf_page_t, not buf_block_t) */
UNIV_INTERN mutex_t	buf_pool_zip_mutex;

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
static ulint	buf_dbg_counter	= 0; /* This is used to insert validation
					operations in execution in the
					debug version */
/** Flag to forbid the release of the buffer pool mutex.
Protected by buf_pool_mutex. */
UNIV_INTERN ulint	buf_pool_mutex_exit_forbidden = 0;
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG
/* If this is set TRUE, the program prints info whenever
read-ahead or flush occurs */
UNIV_INTERN ibool	buf_debug_prints = FALSE;
#endif /* UNIV_DEBUG */

/* A chunk of buffers. The buffer pool is allocated in chunks. */
struct buf_chunk_struct{
	ulint		mem_size;	/* allocated size of the chunk */
	ulint		size;		/* size of frames[] and blocks[] */
	void*		mem;		/* pointer to the memory area which
					was allocated for the frames */
	buf_block_t*	blocks;		/* array of buffer control blocks */
};
/************************************************************************
Calculates a page checksum which is stored to the page when it is written
to a file. Note that we must be careful to calculate the same value on
32-bit and 64-bit architectures. */
UNIV_INTERN
ulint
buf_calc_page_new_checksum(
/*=======================*/
				/* out: checksum */
	const byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	/* Since the field FIL_PAGE_FILE_FLUSH_LSN, and in versions <= 4.1.x
	..._ARCH_LOG_NO, are written outside the buffer pool to the first
	pages of data files, we have to skip them in the page checksum
	calculation.
	We must also skip the field FIL_PAGE_SPACE_OR_CHKSUM where the
	checksum is stored, and also the last 8 bytes of page because
	there we store the old formula checksum. */

	checksum = ut_fold_binary(page + FIL_PAGE_OFFSET,
				  FIL_PAGE_FILE_FLUSH_LSN - FIL_PAGE_OFFSET)
		+ ut_fold_binary(page + FIL_PAGE_DATA,
				 UNIV_PAGE_SIZE - FIL_PAGE_DATA
				 - FIL_PAGE_END_LSN_OLD_CHKSUM);
	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}
/************************************************************************
In versions < 4.0.14 and < 4.1.1 there was a bug that the checksum only
looked at the first few bytes of the page. This calculates that old
checksum.
NOTE: we must first store the new formula checksum to
FIL_PAGE_SPACE_OR_CHKSUM before calculating and storing this old checksum
because this takes that field as an input! */
UNIV_INTERN
ulint
buf_calc_page_old_checksum(
/*=======================*/
				/* out: checksum */
	const byte*	page)	/* in: buffer page */
{
	ulint	checksum;

	checksum = ut_fold_binary(page, FIL_PAGE_FILE_FLUSH_LSN);

	checksum = checksum & 0xFFFFFFFFUL;

	return(checksum);
}
/************************************************************************
Checks if a page is corrupt. */
UNIV_INTERN
ibool
buf_page_is_corrupted(
/*==================*/
					/* out: TRUE if corrupted */
	const byte*	read_buf,	/* in: a database page */
	ulint		zip_size)	/* in: size of compressed page;
					0 for uncompressed pages */
{
	ulint		checksum_field;
	ulint		old_checksum_field;
#ifndef UNIV_HOTBACKUP
	ib_uint64_t	current_lsn;
#endif
	if (UNIV_LIKELY(!zip_size)
	    && memcmp(read_buf + FIL_PAGE_LSN + 4,
		      read_buf + UNIV_PAGE_SIZE
		      - FIL_PAGE_END_LSN_OLD_CHKSUM + 4, 4)) {

		/* Stored log sequence numbers at the start and the end
		of page do not match */

		return(TRUE);
	}

#ifndef UNIV_HOTBACKUP
	if (recv_lsn_checks_on && log_peek_lsn(&current_lsn)) {
		if (current_lsn < mach_read_ull(read_buf + FIL_PAGE_LSN)) {
			ut_print_timestamp(stderr);

			fprintf(stderr,
				" InnoDB: Error: page %lu log sequence number"
				" %llu\n"
				"InnoDB: is in the future! Current system "
				"log sequence number %llu.\n"
				"InnoDB: Your database may be corrupt or "
				"you may have copied the InnoDB\n"
				"InnoDB: tablespace but not the InnoDB "
				"log files. See\n"
				"InnoDB: http://dev.mysql.com/doc/refman/"
				"5.1/en/forcing-recovery.html\n"
				"InnoDB: for more information.\n",
				(ulong) mach_read_from_4(read_buf
							 + FIL_PAGE_OFFSET),
				mach_read_ull(read_buf + FIL_PAGE_LSN),
				current_lsn);
		}
	}
#endif

	/* If we use checksums validation, make additional check before
	returning TRUE to ensure that the checksum is not equal to
	BUF_NO_CHECKSUM_MAGIC which might be stored by InnoDB with checksums
	disabled. Otherwise, skip checksum calculation and return FALSE */

	if (UNIV_LIKELY(srv_use_checksums)) {
		checksum_field = mach_read_from_4(read_buf
						  + FIL_PAGE_SPACE_OR_CHKSUM);

		if (UNIV_UNLIKELY(zip_size)) {
			return(checksum_field != BUF_NO_CHECKSUM_MAGIC
			       && checksum_field
			       != page_zip_calc_checksum(read_buf, zip_size));
		}

		old_checksum_field = mach_read_from_4(
			read_buf + UNIV_PAGE_SIZE
			- FIL_PAGE_END_LSN_OLD_CHKSUM);

		/* There are 2 valid formulas for old_checksum_field:
		1. Very old versions of InnoDB only stored 8 byte lsn to the
		start and the end of the page.
		2. Newer InnoDB versions store the old formula checksum
		there. */

		if (old_checksum_field != mach_read_from_4(read_buf
							   + FIL_PAGE_LSN)
		    && old_checksum_field != BUF_NO_CHECKSUM_MAGIC
		    && old_checksum_field
		    != buf_calc_page_old_checksum(read_buf)) {

			return(TRUE);
		}

		/* InnoDB versions < 4.0.14 and < 4.1.1 stored the space id
		(always equal to 0), to FIL_PAGE_SPACE_OR_CHKSUM */

		if (checksum_field != 0
		    && checksum_field != BUF_NO_CHECKSUM_MAGIC
		    && checksum_field
		    != buf_calc_page_new_checksum(read_buf)) {

			return(TRUE);
		}
	}

	return(FALSE);
}
/************************************************************************
Prints a page to stderr. */
UNIV_INTERN
void
buf_page_print(
/*===========*/
	const byte*	read_buf,	/* in: a database page */
	ulint		zip_size)	/* in: compressed page size, or
					0 for uncompressed pages */
{
	dict_index_t*	index;
	ulint		checksum;
	ulint		old_checksum;
	ulint		size	= zip_size;

	if (!size) {
		size = UNIV_PAGE_SIZE;
	}

	ut_print_timestamp(stderr);
	fprintf(stderr, " InnoDB: Page dump in ascii and hex (%lu bytes):\n",
		(ulong) size);
	ut_print_buf(stderr, read_buf, size);
	fputs("\nInnoDB: End of page dump\n", stderr);

	if (zip_size) {
		/* Print compressed page. */

		switch (fil_page_get_type(read_buf)) {
		case FIL_PAGE_TYPE_ZBLOB:
		case FIL_PAGE_TYPE_ZBLOB2:
			checksum = srv_use_checksums
				? page_zip_calc_checksum(read_buf, zip_size)
				: BUF_NO_CHECKSUM_MAGIC;
			ut_print_timestamp(stderr);
			fprintf(stderr,
				" InnoDB: Compressed BLOB page"
				" checksum %lu, stored %lu\n"
				"InnoDB: Page lsn %lu %lu\n"
				"InnoDB: Page number (if stored"
				" to page already) %lu,\n"
				"InnoDB: space id (if stored"
				" to page already) %lu\n",
				(ulong) checksum,
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_LSN),
				(ulong) mach_read_from_4(
					read_buf + (FIL_PAGE_LSN + 4)),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_OFFSET),
				(ulong) mach_read_from_4(
					read_buf
					+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
			return;
		default:
			ut_print_timestamp(stderr);
			fprintf(stderr,
				" InnoDB: unknown page type %lu,"
				" assuming FIL_PAGE_INDEX\n",
				fil_page_get_type(read_buf));
			/* fall through */
		case FIL_PAGE_INDEX:
			checksum = srv_use_checksums
				? page_zip_calc_checksum(read_buf, zip_size)
				: BUF_NO_CHECKSUM_MAGIC;

			ut_print_timestamp(stderr);
			fprintf(stderr,
				" InnoDB: Compressed page checksum %lu,"
				" stored %lu\n"
				"InnoDB: Page lsn %lu %lu\n"
				"InnoDB: Page number (if stored"
				" to page already) %lu,\n"
				"InnoDB: space id (if stored"
				" to page already) %lu\n",
				(ulong) checksum,
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_LSN),
				(ulong) mach_read_from_4(
					read_buf + (FIL_PAGE_LSN + 4)),
				(ulong) mach_read_from_4(
					read_buf + FIL_PAGE_OFFSET),
				(ulong) mach_read_from_4(
					read_buf
					+ FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));
			return;
		case FIL_PAGE_TYPE_XDES:
			/* This is an uncompressed page. */
			break;
		}
	}

	checksum = srv_use_checksums
		? buf_calc_page_new_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;
	old_checksum = srv_use_checksums
		? buf_calc_page_old_checksum(read_buf) : BUF_NO_CHECKSUM_MAGIC;

	ut_print_timestamp(stderr);
	fprintf(stderr,
		" InnoDB: Page checksum %lu, prior-to-4.0.14-form"
		" checksum %lu\n"
		"InnoDB: stored checksum %lu, prior-to-4.0.14-form"
		" stored checksum %lu\n"
		"InnoDB: Page lsn %lu %lu, low 4 bytes of lsn"
		" at page end %lu\n"
		"InnoDB: Page number (if stored to page already) %lu,\n"
		"InnoDB: space id (if created with >= MySQL-4.1.1"
		" and stored already) %lu\n",
		(ulong) checksum, (ulong) old_checksum,
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_SPACE_OR_CHKSUM),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_LSN + 4),
		(ulong) mach_read_from_4(read_buf + UNIV_PAGE_SIZE
					 - FIL_PAGE_END_LSN_OLD_CHKSUM + 4),
		(ulong) mach_read_from_4(read_buf + FIL_PAGE_OFFSET),
		(ulong) mach_read_from_4(read_buf
					 + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID));

	if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_TYPE)
	    == TRX_UNDO_INSERT) {
		fprintf(stderr,
			"InnoDB: Page may be an insert undo log page\n");
	} else if (mach_read_from_2(read_buf + TRX_UNDO_PAGE_HDR
				    + TRX_UNDO_PAGE_TYPE)
		   == TRX_UNDO_UPDATE) {
		fprintf(stderr,
			"InnoDB: Page may be an update undo log page\n");
	}

	switch (fil_page_get_type(read_buf)) {
	case FIL_PAGE_INDEX:
		fprintf(stderr,
			"InnoDB: Page may be an index page where"
			" index id is %lu %lu\n",
			(ulong) ut_dulint_get_high(
				btr_page_get_index_id(read_buf)),
			(ulong) ut_dulint_get_low(
				btr_page_get_index_id(read_buf)));

#ifdef UNIV_HOTBACKUP
		/* If the code is in ibbackup, dict_sys may be uninitialized,
		i.e., NULL */

		if (dict_sys == NULL) {
			break;
		}
#endif /* UNIV_HOTBACKUP */

		index = dict_index_find_on_id_low(
			btr_page_get_index_id(read_buf));
		if (index) {
			fputs("InnoDB: (", stderr);
			dict_index_name_print(stderr, NULL, index);
			fputs(")\n", stderr);
		}
		break;
	case FIL_PAGE_INODE:
		fputs("InnoDB: Page may be an 'inode' page\n", stderr);
		break;
	case FIL_PAGE_IBUF_FREE_LIST:
		fputs("InnoDB: Page may be an insert buffer free list page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_ALLOCATED:
		fputs("InnoDB: Page may be a freshly allocated page\n",
		      stderr);
		break;
	case FIL_PAGE_IBUF_BITMAP:
		fputs("InnoDB: Page may be an insert buffer bitmap page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_SYS:
		fputs("InnoDB: Page may be a system page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_TRX_SYS:
		fputs("InnoDB: Page may be a transaction system page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_FSP_HDR:
		fputs("InnoDB: Page may be a file space header page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_XDES:
		fputs("InnoDB: Page may be an extent descriptor page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_BLOB:
		fputs("InnoDB: Page may be a BLOB page\n",
		      stderr);
		break;
	case FIL_PAGE_TYPE_ZBLOB:
	case FIL_PAGE_TYPE_ZBLOB2:
		fputs("InnoDB: Page may be a compressed BLOB page\n",
		      stderr);
		break;
	}
}
/************************************************************************
Initializes a buffer control block when the buf_pool is created. */
static
void
buf_block_init(
/*===========*/
	buf_block_t*	block,	/* in: pointer to control block */
	byte*		frame)	/* in: pointer to buffer frame */
{
	UNIV_MEM_DESC(frame, UNIV_PAGE_SIZE, block);

	block->frame = frame;

	block->page.state = BUF_BLOCK_NOT_USED;
	block->page.buf_fix_count = 0;
	block->page.io_fix = BUF_IO_NONE;

	block->modify_clock = 0;

#ifdef UNIV_DEBUG_FILE_ACCESSES
	block->page.file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES */

	block->check_index_page_at_flush = FALSE;
	block->index = NULL;

#ifdef UNIV_DEBUG
	block->page.in_page_hash = FALSE;
	block->page.in_zip_hash = FALSE;
	block->page.in_flush_list = FALSE;
	block->page.in_free_list = FALSE;
	block->page.in_LRU_list = FALSE;
	block->in_unzip_LRU_list = FALSE;
#endif /* UNIV_DEBUG */
#if defined UNIV_AHI_DEBUG || defined UNIV_DEBUG
	block->n_pointers = 0;
#endif /* UNIV_AHI_DEBUG || UNIV_DEBUG */
	page_zip_des_init(&block->page.zip);

	mutex_create(&block->mutex, SYNC_BUF_BLOCK);

	rw_lock_create(&block->lock, SYNC_LEVEL_VARYING);
	ut_ad(rw_lock_validate(&(block->lock)));

#ifdef UNIV_SYNC_DEBUG
	rw_lock_create(&block->debug_latch, SYNC_NO_ORDER_CHECK);
#endif /* UNIV_SYNC_DEBUG */
}
/************************************************************************
Allocates a chunk of buffer frames. */
static
buf_chunk_t*
buf_chunk_init(
/*===========*/
					/* out: chunk, or NULL on failure */
	buf_chunk_t*	chunk,		/* out: chunk of buffers */
	ulint		mem_size)	/* in: requested size in bytes */
{
	buf_block_t*	block;
	byte*		frame;
	ulint		i;

	/* Round down to a multiple of page size,
	although it already should be. */
	mem_size = ut_2pow_round(mem_size, UNIV_PAGE_SIZE);
	/* Reserve space for the block descriptors. */
	mem_size += ut_2pow_round((mem_size / UNIV_PAGE_SIZE) * (sizeof *block)
				  + (UNIV_PAGE_SIZE - 1), UNIV_PAGE_SIZE);

	chunk->mem_size = mem_size;
	chunk->mem = os_mem_alloc_large(&chunk->mem_size);

	if (UNIV_UNLIKELY(chunk->mem == NULL)) {

		return(NULL);
	}

	/* Allocate the block descriptors from
	the start of the memory block. */
	chunk->blocks = chunk->mem;

	/* Align a pointer to the first frame. Note that when
	os_large_page_size is smaller than UNIV_PAGE_SIZE,
	we may allocate one fewer block than requested. When
	it is bigger, we may allocate more blocks than requested. */

	frame = ut_align(chunk->mem, UNIV_PAGE_SIZE);
	chunk->size = chunk->mem_size / UNIV_PAGE_SIZE
		- (frame != chunk->mem);

	/* Subtract the space needed for block descriptors. */
	{
		ulint	size = chunk->size;

		while (frame < (byte*) (chunk->blocks + size)) {
			frame += UNIV_PAGE_SIZE;
			size--;
		}

		chunk->size = size;
	}

	/* Init block structs and assign frames for them. Then we
	assign the frames to the first blocks (we already mapped the
	memory above). */

	block = chunk->blocks;

	for (i = chunk->size; i--; ) {

		buf_block_init(block, frame);

#ifdef HAVE_purify
		/* Wipe contents of frame to eliminate a Purify warning */
		memset(block->frame, '\0', UNIV_PAGE_SIZE);
#endif
		/* Add the block to the free list */
		UT_LIST_ADD_LAST(list, buf_pool->free, (&block->page));
		ut_d(block->page.in_free_list = TRUE);

		block++;
		frame += UNIV_PAGE_SIZE;
	}

	return(chunk);
}
#ifdef UNIV_DEBUG
/*************************************************************************
Finds a block in the given buffer chunk that points to a
given compressed page. */
static
buf_block_t*
buf_chunk_contains_zip(
/*===================*/
				/* out: buffer block pointing to
				the compressed page, or NULL */
	buf_chunk_t*	chunk,	/* in: chunk being checked */
	const void*	data)	/* in: pointer to compressed page */
{
	buf_block_t*	block;
	ulint		i;

	ut_ad(buf_pool);
	ut_ad(buf_pool_mutex_own());

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		if (block->page.zip.data == data) {

			return(block);
		}
	}

	return(NULL);
}

/*************************************************************************
Finds a block in the buffer pool that points to a
given compressed page. */
UNIV_INTERN
buf_block_t*
buf_pool_contains_zip(
/*==================*/
				/* out: buffer block pointing to
				the compressed page, or NULL */
	const void*	data)	/* in: pointer to compressed page */
{
	ulint		n;
	buf_chunk_t*	chunk = buf_pool->chunks;

	for (n = buf_pool->n_chunks; n--; chunk++) {
		buf_block_t* block = buf_chunk_contains_zip(chunk, data);

		if (block) {

			return(block);
		}
	}

	return(NULL);
}
#endif /* UNIV_DEBUG */

/*************************************************************************
Checks that all file pages in the buffer chunk are in a replaceable state. */
static
const buf_block_t*
buf_chunk_not_freed(
/*================*/
				/* out: address of a non-free block,
				or NULL if all freed */
	buf_chunk_t*	chunk)	/* in: chunk being checked */
{
	buf_block_t*	block;
	ulint		i;

	ut_ad(buf_pool);
	ut_ad(buf_pool_mutex_own());

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {
		mutex_enter(&block->mutex);

		if (buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE
		    && !buf_flush_ready_for_replace(&block->page)) {

			mutex_exit(&block->mutex);
			return(block);
		}

		mutex_exit(&block->mutex);
	}

	return(NULL);
}

/*************************************************************************
Checks that all blocks in the buffer chunk are in BUF_BLOCK_NOT_USED state. */
static
ibool
buf_chunk_all_free(
/*===============*/
					/* out: TRUE if all freed */
	const buf_chunk_t*	chunk)	/* in: chunk being checked */
{
	const buf_block_t*	block;
	ulint			i;

	ut_ad(buf_pool);
	ut_ad(buf_pool_mutex_own());

	block = chunk->blocks;

	for (i = chunk->size; i--; block++) {

		if (buf_block_get_state(block) != BUF_BLOCK_NOT_USED) {

			return(FALSE);
		}
	}

	return(TRUE);
}

/************************************************************************
Frees a chunk of buffer frames. */
static
void
buf_chunk_free(
/*===========*/
	buf_chunk_t*	chunk)	/* out: chunk of buffers */
{
	buf_block_t*		block;
	const buf_block_t*	block_end;

	ut_ad(buf_pool_mutex_own());

	block_end = chunk->blocks + chunk->size;

	for (block = chunk->blocks; block < block_end; block++) {
		ut_a(buf_block_get_state(block) == BUF_BLOCK_NOT_USED);
		ut_a(!block->page.zip.data);

		ut_ad(!block->page.in_LRU_list);
		ut_ad(!block->in_unzip_LRU_list);
		ut_ad(!block->page.in_flush_list);
		/* Remove the block from the free list. */
		ut_ad(block->page.in_free_list);
		UT_LIST_REMOVE(list, buf_pool->free, (&block->page));

		/* Free the latches. */
		mutex_free(&block->mutex);
		rw_lock_free(&block->lock);
#ifdef UNIV_SYNC_DEBUG
		rw_lock_free(&block->debug_latch);
#endif /* UNIV_SYNC_DEBUG */
		UNIV_MEM_UNDESC(block);
	}

	os_mem_free_large(chunk->mem, chunk->mem_size);
}
/************************************************************************
Creates the buffer pool. */
UNIV_INTERN
buf_pool_t*
buf_pool_init(void)
/*===============*/
				/* out, own: buf_pool object, NULL if not
				enough memory or error */
{
	buf_chunk_t*	chunk;
	ulint		i;

	buf_pool = mem_zalloc(sizeof(buf_pool_t));

	/* 1. Initialize general fields
	------------------------------- */
	mutex_create(&buf_pool_mutex, SYNC_BUF_POOL);
	mutex_create(&buf_pool_zip_mutex, SYNC_BUF_BLOCK);

	buf_pool_mutex_enter();

	buf_pool->n_chunks = 1;
	buf_pool->chunks = chunk = mem_alloc(sizeof *chunk);

	UT_LIST_INIT(buf_pool->free);

	if (!buf_chunk_init(chunk, srv_buf_pool_size)) {
		mem_free(chunk);
		mem_free(buf_pool);
		buf_pool = NULL;
		return(NULL);
	}

	srv_buf_pool_old_size = srv_buf_pool_size;
	buf_pool->curr_size = chunk->size;
	srv_buf_pool_curr_size = buf_pool->curr_size * UNIV_PAGE_SIZE;

	buf_pool->page_hash = hash_create(2 * buf_pool->curr_size);
	buf_pool->zip_hash = hash_create(2 * buf_pool->curr_size);

	buf_pool->last_printout_time = time(NULL);

	/* 2. Initialize flushing fields
	-------------------------------- */

	for (i = BUF_FLUSH_LRU; i < BUF_FLUSH_N_TYPES; i++) {
		buf_pool->no_flush[i] = os_event_create(NULL);
	}

	buf_pool->ulint_clock = 1;

	/* 3. Initialize LRU fields
	--------------------------- */
	/* All fields are initialized by mem_zalloc(). */

	buf_pool_mutex_exit();

	btr_search_sys_create(buf_pool->curr_size
			      * UNIV_PAGE_SIZE / sizeof(void*) / 64);

	/* 4. Initialize the buddy allocator fields */
	/* All fields are initialized by mem_zalloc(). */

	return(buf_pool);
}

/************************************************************************
Frees the buffer pool at shutdown.  This must not be invoked before
freeing all mutexes. */
UNIV_INTERN
void
buf_pool_free(void)
/*===============*/
{
	buf_chunk_t*	chunk;
	buf_chunk_t*	chunks;

	chunks = buf_pool->chunks;
	chunk = chunks + buf_pool->n_chunks;

	while (--chunk >= chunks) {
		/* Bypass the checks of buf_chunk_free(), since they
		would fail at shutdown. */
		os_mem_free_large(chunk->mem, chunk->mem_size);
	}

	buf_pool->n_chunks = 0;
}
/************************************************************************
Drops the adaptive hash index.  To prevent a livelock, this function
is only to be called while holding btr_search_latch and while
btr_search_enabled == FALSE. */
UNIV_INTERN
void
buf_pool_drop_hash_index(void)
/*==========================*/
{
	ibool		released_search_latch;

#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(&btr_search_latch, RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */
	ut_ad(!btr_search_enabled);

	do {
		buf_chunk_t*	chunks	= buf_pool->chunks;
		buf_chunk_t*	chunk	= chunks + buf_pool->n_chunks;

		released_search_latch = FALSE;

		while (--chunk >= chunks) {
			buf_block_t*	block	= chunk->blocks;
			ulint		i	= chunk->size;

			for (; i--; block++) {
				/* block->is_hashed cannot be modified
				when we have an x-latch on btr_search_latch;
				see the comment in buf0buf.h */

				if (!block->is_hashed) {
					continue;
				}

				/* To follow the latching order, we
				have to release btr_search_latch
				before acquiring block->latch. */
				rw_lock_x_unlock(&btr_search_latch);
				/* When we release the search latch,
				we must rescan all blocks, because
				some may become hashed again. */
				released_search_latch = TRUE;

				rw_lock_x_lock(&block->lock);

				/* This should be guaranteed by the
				callers, which will be holding
				btr_search_enabled_mutex. */
				ut_ad(!btr_search_enabled);

				/* Because we did not buffer-fix the
				block by calling buf_block_get_gen(),
				it is possible that the block has been
				allocated for some other use after
				btr_search_latch was released above.
				We do not care which file page the
				block is mapped to.  All we want to do
				is to drop any hash entries referring
				to the page. */

				/* It is possible that
				block->page.state != BUF_FILE_PAGE.
				Even that does not matter, because
				btr_search_drop_page_hash_index() will
				check block->is_hashed before doing
				anything.  block->is_hashed can only
				be set on uncompressed file pages. */

				btr_search_drop_page_hash_index(block);

				rw_lock_x_unlock(&block->lock);

				rw_lock_x_lock(&btr_search_latch);
				ut_ad(!btr_search_enabled);
			}
		}
	} while (released_search_latch);
}
/************************************************************************
Relocate a buffer control block.  Relocates the block on the LRU list
and in buf_pool->page_hash.  Does not relocate bpage->list.
The caller must take care of relocating bpage->list. */
UNIV_INTERN
void
buf_relocate(
/*=========*/
	buf_page_t*	bpage,	/* in/out: control block being relocated;
				buf_page_get_state(bpage) must be
				BUF_BLOCK_ZIP_DIRTY or BUF_BLOCK_ZIP_PAGE */
	buf_page_t*	dpage)	/* in/out: destination control block */
{
	buf_page_t*	b;
	ulint		fold;

	ut_ad(buf_pool_mutex_own());
	ut_ad(mutex_own(buf_page_get_mutex(bpage)));
	ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
	ut_a(bpage->buf_fix_count == 0);
	ut_ad(bpage->in_LRU_list);
	ut_ad(!bpage->in_zip_hash);
	ut_ad(bpage->in_page_hash);
	ut_ad(bpage == buf_page_hash_get(bpage->space, bpage->offset));
#ifdef UNIV_DEBUG
	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_ZIP_FREE:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_FILE_PAGE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
	case BUF_BLOCK_ZIP_DIRTY:
	case BUF_BLOCK_ZIP_PAGE:
		break;
	}
#endif /* UNIV_DEBUG */

	memcpy(dpage, bpage, sizeof *dpage);

	ut_d(bpage->in_LRU_list = FALSE);
	ut_d(bpage->in_page_hash = FALSE);

	/* relocate buf_pool->LRU */
	b = UT_LIST_GET_PREV(LRU, bpage);
	UT_LIST_REMOVE(LRU, buf_pool->LRU, bpage);

	if (b) {
		UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, b, dpage);
	} else {
		UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, dpage);
	}

	if (UNIV_UNLIKELY(buf_pool->LRU_old == bpage)) {
		buf_pool->LRU_old = dpage;
#ifdef UNIV_LRU_DEBUG
		/* buf_pool->LRU_old must be the first item in the LRU list
		whose "old" flag is set. */
		ut_a(!UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)
		     || !UT_LIST_GET_PREV(LRU, buf_pool->LRU_old)->old);
		ut_a(!UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)
		     || UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old)->old);
#endif /* UNIV_LRU_DEBUG */
	}

	ut_d(UT_LIST_VALIDATE(LRU, buf_page_t, buf_pool->LRU));

	/* relocate buf_pool->page_hash */
	fold = buf_page_address_fold(bpage->space, bpage->offset);

	HASH_DELETE(buf_page_t, hash, buf_pool->page_hash, fold, bpage);
	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash, fold, dpage);

	UNIV_MEM_INVALID(bpage, sizeof *bpage);
}
/************************************************************************
Shrinks the buffer pool. */
static
void
buf_pool_shrink(
/*============*/
	ulint	chunk_size)	/* in: number of pages to remove */
{
	buf_chunk_t*	chunks;
	buf_chunk_t*	chunk;
	ulint		max_size;
	ulint		max_free_size;
	buf_chunk_t*	max_chunk;
	buf_chunk_t*	max_free_chunk;

	ut_ad(!buf_pool_mutex_own());

try_again:
	btr_search_disable(); /* Empty the adaptive hash index again */
	buf_pool_mutex_enter();

shrink_again:
	if (buf_pool->n_chunks <= 1) {

		/* Cannot shrink if there is only one chunk */
		goto func_done;
	}

	/* Search for the largest free chunk
	not larger than the size difference */
	chunks = buf_pool->chunks;
	chunk = chunks + buf_pool->n_chunks;
	max_size = max_free_size = 0;
	max_chunk = max_free_chunk = NULL;

	while (--chunk >= chunks) {
		if (chunk->size <= chunk_size
		    && chunk->size > max_free_size) {
			if (chunk->size > max_size) {
				max_size = chunk->size;
				max_chunk = chunk;
			}

			if (buf_chunk_all_free(chunk)) {
				max_free_size = chunk->size;
				max_free_chunk = chunk;
			}
		}
	}

	if (!max_free_size) {

		ulint		dirty	= 0;
		ulint		nonfree	= 0;
		buf_block_t*	block;
		buf_block_t*	bend;

		/* Cannot shrink: try again later
		(do not assign srv_buf_pool_old_size) */
		if (!max_chunk) {

			goto func_exit;
		}

		block = max_chunk->blocks;
		bend = block + max_chunk->size;

		/* Move the blocks of chunk to the end of the
		LRU list and try to flush them. */
		for (; block < bend; block++) {
			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_NOT_USED:
				continue;
			case BUF_BLOCK_FILE_PAGE:
				break;
			default:
				nonfree++;
				continue;
			}

			mutex_enter(&block->mutex);
			/* The following calls will temporarily
			release block->mutex and buf_pool_mutex.
			Therefore, we have to always retry,
			even if !dirty && !nonfree. */

			if (!buf_flush_ready_for_replace(&block->page)) {

				buf_LRU_make_block_old(&block->page);
				dirty++;
			} else if (buf_LRU_free_block(&block->page, TRUE, NULL)
				   != BUF_LRU_FREED) {
				nonfree++;
			}

			mutex_exit(&block->mutex);
		}

		buf_pool_mutex_exit();

		/* Request for a flush of the chunk if it helps.
		Do not flush if there are non-free blocks, since
		flushing will not make the chunk freeable. */
		if (nonfree) {
			/* Avoid busy-waiting. */
			os_thread_sleep(100000);
		} else if (dirty
			   && buf_flush_batch(BUF_FLUSH_LRU, dirty, 0)
			   == ULINT_UNDEFINED) {

			buf_flush_wait_batch_end(BUF_FLUSH_LRU);
		}

		goto try_again;
	}

	max_size = max_free_size;
	max_chunk = max_free_chunk;

	srv_buf_pool_old_size = srv_buf_pool_size;

	/* Rewrite buf_pool->chunks.  Copy everything but max_chunk.
	Note that memcpy() takes a length in bytes, so the element
	counts must be multiplied by sizeof *chunks. */
	chunks = mem_alloc((buf_pool->n_chunks - 1) * sizeof *chunks);
	memcpy(chunks, buf_pool->chunks,
	       (max_chunk - buf_pool->chunks) * sizeof *chunks);
	memcpy(chunks + (max_chunk - buf_pool->chunks),
	       max_chunk + 1,
	       (buf_pool->chunks + buf_pool->n_chunks
		- (max_chunk + 1)) * sizeof *chunks);
	ut_a(buf_pool->curr_size > max_chunk->size);
	buf_pool->curr_size -= max_chunk->size;
	srv_buf_pool_curr_size = buf_pool->curr_size * UNIV_PAGE_SIZE;
	chunk_size -= max_chunk->size;
	buf_chunk_free(max_chunk);
	mem_free(buf_pool->chunks);
	buf_pool->chunks = chunks;
	buf_pool->n_chunks--;

	/* Allow a slack of one megabyte. */
	if (chunk_size > 1048576 / UNIV_PAGE_SIZE) {

		goto shrink_again;
	}

func_done:
	srv_buf_pool_old_size = srv_buf_pool_size;
func_exit:
	buf_pool_mutex_exit();
	btr_search_enable();
}
/************************************************************************
Rebuild buf_pool->page_hash. */
static
void
buf_pool_page_hash_rebuild(void)
/*============================*/
{
	ulint		i;
	ulint		n_chunks;
	buf_chunk_t*	chunk;
	hash_table_t*	page_hash;
	hash_table_t*	zip_hash;
	buf_page_t*	b;

	buf_pool_mutex_enter();

	/* Free, create, and populate the hash table. */
	hash_table_free(buf_pool->page_hash);
	buf_pool->page_hash = page_hash = hash_create(2 * buf_pool->curr_size);
	zip_hash = hash_create(2 * buf_pool->curr_size);

	HASH_MIGRATE(buf_pool->zip_hash, zip_hash, buf_page_t, hash,
		     BUF_POOL_ZIP_FOLD_BPAGE);

	hash_table_free(buf_pool->zip_hash);
	buf_pool->zip_hash = zip_hash;

	/* Insert the uncompressed file pages to buf_pool->page_hash. */

	chunk = buf_pool->chunks;
	n_chunks = buf_pool->n_chunks;

	for (i = 0; i < n_chunks; i++, chunk++) {
		ulint		j;
		buf_block_t*	block = chunk->blocks;

		for (j = 0; j < chunk->size; j++, block++) {
			if (buf_block_get_state(block)
			    == BUF_BLOCK_FILE_PAGE) {
				ut_ad(!block->page.in_zip_hash);
				ut_ad(block->page.in_page_hash);

				HASH_INSERT(buf_page_t, hash, page_hash,
					    buf_page_address_fold(
						    block->page.space,
						    block->page.offset),
					    &block->page);
			}
		}
	}

	/* Insert the compressed-only pages to buf_pool->page_hash.
	All such blocks are either in buf_pool->zip_clean or
	in buf_pool->flush_list. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
		ut_ad(!b->in_flush_list);
		ut_ad(b->in_LRU_list);
		ut_ad(b->in_page_hash);
		ut_ad(!b->in_zip_hash);

		HASH_INSERT(buf_page_t, hash, page_hash,
			    buf_page_address_fold(b->space, b->offset), b);
	}

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_ad(b->in_flush_list);
		ut_ad(b->in_LRU_list);
		ut_ad(b->in_page_hash);
		ut_ad(!b->in_zip_hash);

		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			HASH_INSERT(buf_page_t, hash, page_hash,
				    buf_page_address_fold(b->space,
							  b->offset), b);
			break;
		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;
		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
	}

	buf_pool_mutex_exit();
}

/************************************************************************
Resizes the buffer pool. */
UNIV_INTERN
void
buf_pool_resize(void)
/*=================*/
{
	buf_pool_mutex_enter();

	if (srv_buf_pool_old_size == srv_buf_pool_size) {

		buf_pool_mutex_exit();
		return;
	}

	if (srv_buf_pool_curr_size + 1048576 > srv_buf_pool_size) {

		buf_pool_mutex_exit();

		/* Disable adaptive hash indexes and empty the index
		in order to free up memory in the buffer pool chunks. */
		buf_pool_shrink((srv_buf_pool_curr_size - srv_buf_pool_size)
				/ UNIV_PAGE_SIZE);
	} else if (srv_buf_pool_curr_size + 1048576 < srv_buf_pool_size) {

		/* Enlarge the buffer pool by at least one megabyte */

		ulint		mem_size
			= srv_buf_pool_size - srv_buf_pool_curr_size;
		buf_chunk_t*	chunks;
		buf_chunk_t*	chunk;

		chunks = mem_alloc((buf_pool->n_chunks + 1) * sizeof *chunks);

		memcpy(chunks, buf_pool->chunks, buf_pool->n_chunks
		       * sizeof *chunks);

		chunk = &chunks[buf_pool->n_chunks];

		if (!buf_chunk_init(chunk, mem_size)) {
			mem_free(chunks);
		} else {
			buf_pool->curr_size += chunk->size;
			srv_buf_pool_curr_size = buf_pool->curr_size
				* UNIV_PAGE_SIZE;
			mem_free(buf_pool->chunks);
			buf_pool->chunks = chunks;
			buf_pool->n_chunks++;
		}

		srv_buf_pool_old_size = srv_buf_pool_size;
		buf_pool_mutex_exit();
	}

	buf_pool_page_hash_rebuild();
}
/************************************************************************
Moves the block to the start of the LRU list if there is a danger
that the block would drift out of the buffer pool. */
UNIV_INLINE
void
buf_block_make_young(
/*=================*/
	buf_page_t*	bpage)	/* in: block to make younger */
{
	ut_ad(!buf_pool_mutex_own());

	/* Note that we read freed_page_clock's without holding any mutex:
	this is allowed since the result is used only in heuristics */

	if (buf_page_peek_if_too_old(bpage)) {

		buf_pool_mutex_enter();
		/* There has been freeing activity in the LRU list:
		best to move to the head of the LRU list */

		buf_LRU_make_block_young(bpage);
		buf_pool_mutex_exit();
	}
}

/************************************************************************
Moves a page to the start of the buffer pool LRU list. This high-level
function can be used to prevent an important page from slipping out of
the buffer pool. */
UNIV_INTERN
void
buf_page_make_young(
/*================*/
	buf_page_t*	bpage)	/* in: buffer block of a file page */
{
	buf_pool_mutex_enter();

	ut_a(buf_page_in_file(bpage));

	buf_LRU_make_block_young(bpage);

	buf_pool_mutex_exit();
}

/************************************************************************
Resets the check_index_page_at_flush field of a page if found in the buffer
pool. */
UNIV_INTERN
void
buf_reset_check_index_page_at_flush(
/*================================*/
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;

	buf_pool_mutex_enter();

	block = (buf_block_t*) buf_page_hash_get(space, offset);

	if (block && buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE) {
		block->check_index_page_at_flush = FALSE;
	}

	buf_pool_mutex_exit();
}
/************************************************************************
Returns the current state of is_hashed of a page. FALSE if the page is
not in the pool. NOTE that this operation does not fix the page in the
pool if it is found there. */
UNIV_INTERN
ibool
buf_page_peek_if_search_hashed(
/*===========================*/
			/* out: TRUE if page hash index is built in search
			system */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_block_t*	block;
	ibool		is_hashed;

	buf_pool_mutex_enter();

	block = (buf_block_t*) buf_page_hash_get(space, offset);

	if (!block || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {
		is_hashed = FALSE;
	} else {
		is_hashed = block->is_hashed;
	}

	buf_pool_mutex_exit();

	return(is_hashed);
}

#ifdef UNIV_DEBUG_FILE_ACCESSES
/************************************************************************
Sets file_page_was_freed TRUE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated. */
UNIV_INTERN
buf_page_t*
buf_page_set_file_page_was_freed(
/*=============================*/
			/* out: control block if found in page hash table,
			otherwise NULL */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_page_t*	bpage;

	buf_pool_mutex_enter();

	bpage = buf_page_hash_get(space, offset);

	if (bpage) {
		bpage->file_page_was_freed = TRUE;
	}

	buf_pool_mutex_exit();

	return(bpage);
}

/************************************************************************
Sets file_page_was_freed FALSE if the page is found in the buffer pool.
This function should be called when we free a file page and want the
debug version to check that it is not accessed any more unless
reallocated. */
UNIV_INTERN
buf_page_t*
buf_page_reset_file_page_was_freed(
/*===============================*/
			/* out: control block if found in page hash table,
			otherwise NULL */
	ulint	space,	/* in: space id */
	ulint	offset)	/* in: page number */
{
	buf_page_t*	bpage;

	buf_pool_mutex_enter();

	bpage = buf_page_hash_get(space, offset);

	if (bpage) {
		bpage->file_page_was_freed = FALSE;
	}

	buf_pool_mutex_exit();

	return(bpage);
}
#endif /* UNIV_DEBUG_FILE_ACCESSES */
/************************************************************************
Get read access to a compressed page (usually of type
FIL_PAGE_TYPE_ZBLOB or FIL_PAGE_TYPE_ZBLOB2).
The page must be released with buf_page_release_zip().
NOTE: the page is not protected by any latch.  Mutual exclusion has to
be implemented at a higher level.  In other words, all possible
accesses to a given page through this function must be protected by
the same set of mutexes or latches. */
UNIV_INTERN
buf_page_t*
buf_page_get_zip(
/*=============*/
			/* out: pointer to the block */
	ulint	space,	/* in: space id */
	ulint	zip_size,/* in: compressed page size */
	ulint	offset)	/* in: page number */
{
	buf_page_t*	bpage;
	mutex_t*	block_mutex;
	ibool		must_read;

#ifndef UNIV_LOG_DEBUG
	ut_ad(!ibuf_inside());
#endif
	buf_pool->n_page_gets++;

	for (;;) {
		buf_pool_mutex_enter();
lookup:
		bpage = buf_page_hash_get(space, offset);
		if (bpage) {
			break;
		}

		/* Page not in buf_pool: needs to be read from file */

		buf_pool_mutex_exit();

		buf_read_page(space, zip_size, offset);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 37 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
	}

	if (UNIV_UNLIKELY(!bpage->zip.data)) {
		/* There is no compressed page. */
		buf_pool_mutex_exit();
		return(NULL);
	}

	block_mutex = buf_page_get_mutex(bpage);
	mutex_enter(block_mutex);

	switch (buf_page_get_state(bpage)) {
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
	case BUF_BLOCK_ZIP_FREE:
		ut_error;
		break;
	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		bpage->buf_fix_count++;
		break;
	case BUF_BLOCK_FILE_PAGE:
		/* Discard the uncompressed page frame if possible. */
		if (buf_LRU_free_block(bpage, FALSE, NULL)
		    == BUF_LRU_FREED) {

			mutex_exit(block_mutex);
			goto lookup;
		}

		buf_block_buf_fix_inc((buf_block_t*) bpage,
				      __FILE__, __LINE__);
		break;
	}

	must_read = buf_page_get_io_fix(bpage) == BUF_IO_READ;

	buf_pool_mutex_exit();

	buf_page_set_accessed(bpage, TRUE);

	mutex_exit(block_mutex);

	buf_block_make_young(bpage);

#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(!bpage->file_page_was_freed);
#endif

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(bpage->buf_fix_count > 0);
	ut_a(buf_page_in_file(bpage));
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

	if (must_read) {
		/* Let us wait until the read operation
		completes */

		for (;;) {
			enum buf_io_fix	io_fix;

			mutex_enter(block_mutex);
			io_fix = buf_page_get_io_fix(bpage);
			mutex_exit(block_mutex);

			if (io_fix == BUF_IO_READ) {

				os_thread_sleep(WAIT_FOR_READ);
			} else {
				break;
			}
		}
	}

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_page_get_space(bpage),
			    buf_page_get_page_no(bpage)) == 0);
#endif

	return(bpage);
}
/************************************************************************
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_block_init_low(
/*===============*/
	buf_block_t*	block)	/* in: block to init */
{
	block->check_index_page_at_flush = FALSE;
	block->index		= NULL;

	block->n_hash_helps	= 0;
	block->is_hashed	= FALSE;
	block->n_fields		= 1;
	block->n_bytes		= 0;
	block->left_side	= TRUE;
}

/************************************************************************
Decompress a block. */
static
ibool
buf_zip_decompress(
/*===============*/
				/* out: TRUE if successful */
	buf_block_t*	block,	/* in/out: block */
	ibool		check)	/* in: TRUE=verify the page checksum */
{
	const byte*	frame = block->page.zip.data;

	ut_ad(buf_block_get_zip_size(block));
	ut_a(buf_block_get_space(block) != 0);

	if (UNIV_LIKELY(check)) {
		ulint	stamp_checksum	= mach_read_from_4(
			frame + FIL_PAGE_SPACE_OR_CHKSUM);
		ulint	calc_checksum	= page_zip_calc_checksum(
			frame, page_zip_get_size(&block->page.zip));

		if (UNIV_UNLIKELY(stamp_checksum != calc_checksum)) {
			ut_print_timestamp(stderr);
			fprintf(stderr,
				"  InnoDB: compressed page checksum mismatch"
				" (space %u page %u): %lu != %lu\n",
				block->page.space, block->page.offset,
				stamp_checksum, calc_checksum);
			return(FALSE);
		}
	}

	switch (fil_page_get_type(frame)) {
	case FIL_PAGE_INDEX:
		if (page_zip_decompress(&block->page.zip,
					block->frame)) {
			return(TRUE);
		}

		fprintf(stderr,
			"InnoDB: unable to decompress space %lu page %lu\n",
			(ulong) block->page.space,
			(ulong) block->page.offset);
		return(FALSE);

	case FIL_PAGE_TYPE_ALLOCATED:
	case FIL_PAGE_INODE:
	case FIL_PAGE_IBUF_BITMAP:
	case FIL_PAGE_TYPE_FSP_HDR:
	case FIL_PAGE_TYPE_XDES:
	case FIL_PAGE_TYPE_ZBLOB:
	case FIL_PAGE_TYPE_ZBLOB2:
		/* Copy to uncompressed storage. */
		memcpy(block->frame, frame,
		       buf_block_get_zip_size(block));
		return(TRUE);
	}

	ut_print_timestamp(stderr);
	fprintf(stderr,
		"  InnoDB: unknown compressed page"
		" type %lu\n",
		fil_page_get_type(frame));
	return(FALSE);
}
/***********************************************************************
Gets the block to whose frame the pointer is pointing. */
UNIV_INTERN
buf_block_t*
buf_block_align(
/*============*/
				/* out: pointer to block, never NULL */
	const byte*	ptr)	/* in: pointer to a frame */
{
	buf_chunk_t*	chunk;
	ulint		i;

	/* TODO: protect buf_pool->chunks with a mutex (it will
	currently remain constant after buf_pool_init()) */
	for (chunk = buf_pool->chunks, i = buf_pool->n_chunks; i--; chunk++) {
		lint	offs = ptr - chunk->blocks->frame;

		if (UNIV_UNLIKELY(offs < 0)) {

			continue;
		}

		offs >>= UNIV_PAGE_SIZE_SHIFT;

		if (UNIV_LIKELY((ulint) offs < chunk->size)) {
			buf_block_t*	block = &chunk->blocks[offs];

			/* The function buf_chunk_init() invokes
			buf_block_init() so that block[n].frame ==
			block->frame + n * UNIV_PAGE_SIZE.  Check it. */
			ut_ad(block->frame == page_align(ptr));
#ifdef UNIV_DEBUG
			/* A thread that updates these fields must
			hold buf_pool_mutex and block->mutex.  Acquire
			only the latter. */
			mutex_enter(&block->mutex);

			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_ZIP_FREE:
			case BUF_BLOCK_ZIP_PAGE:
			case BUF_BLOCK_ZIP_DIRTY:
				/* These types should only be used in
				the compressed buffer pool, whose
				memory is allocated from
				buf_pool->chunks, in UNIV_PAGE_SIZE
				blocks flagged as BUF_BLOCK_MEMORY. */
				ut_error;
				break;
			case BUF_BLOCK_NOT_USED:
			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
				/* Some data structures contain
				"guess" pointers to file pages.  The
				file pages may have been freed and
				reused.  Do not complain. */
				break;
			case BUF_BLOCK_REMOVE_HASH:
				/* buf_LRU_block_remove_hashed_page()
				will overwrite the FIL_PAGE_OFFSET and
				FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID with
				0xff and set the state to
				BUF_BLOCK_REMOVE_HASH. */
				ut_ad(page_get_space_id(page_align(ptr))
				      == 0xffffffff);
				ut_ad(page_get_page_no(page_align(ptr))
				      == 0xffffffff);
				break;
			case BUF_BLOCK_FILE_PAGE:
				ut_ad(block->page.space
				      == page_get_space_id(page_align(ptr)));
				ut_ad(block->page.offset
				      == page_get_page_no(page_align(ptr)));
				break;
			}

			mutex_exit(&block->mutex);
#endif /* UNIV_DEBUG */

			return(block);
		}
	}

	/* The block should always be found. */
	ut_error;
	return(NULL);
}
/************************************************************************
Find out if a buffer block was created by buf_chunk_init(). */
static
ibool
buf_block_is_uncompressed(
/*======================*/
					/* out: TRUE if "block" has
					been added to buf_pool->free
					by buf_chunk_init() */
	const buf_block_t*	block)	/* in: pointer to block,
					not dereferenced */
{
	const buf_chunk_t*		chunk	= buf_pool->chunks;
	const buf_chunk_t* const	echunk	= chunk + buf_pool->n_chunks;

	ut_ad(buf_pool_mutex_own());

	if (UNIV_UNLIKELY((((ulint) block) % sizeof *block) != 0)) {
		/* The pointer should be aligned. */
		return(FALSE);
	}

	while (chunk < echunk) {
		if (block >= chunk->blocks
		    && block < chunk->blocks + chunk->size) {

			return(TRUE);
		}

		chunk++;
	}

	return(FALSE);
}
/************************************************************************
This is the general function used to get access to a database page. */
UNIV_INTERN
buf_block_t*
buf_page_get_gen(
/*=============*/
				/* out: pointer to the block or NULL */
	ulint		space,	/* in: space id */
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	ulint		offset,	/* in: page number */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */
	buf_block_t*	guess,	/* in: guessed block or NULL */
	ulint		mode,	/* in: BUF_GET, BUF_GET_IF_IN_POOL,
				BUF_GET_NO_LATCH */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	buf_block_t*	block;
	ibool		accessed;
	ulint		fix_type;
	ibool		must_read;

	ut_ad(mtr);
	ut_ad((rw_latch == RW_S_LATCH)
	      || (rw_latch == RW_X_LATCH)
	      || (rw_latch == RW_NO_LATCH));
	ut_ad((mode != BUF_GET_NO_LATCH) || (rw_latch == RW_NO_LATCH));
	ut_ad((mode == BUF_GET) || (mode == BUF_GET_IF_IN_POOL)
	      || (mode == BUF_GET_NO_LATCH));
	ut_ad(zip_size == fil_space_get_zip_size(space));
#ifndef UNIV_LOG_DEBUG
	ut_ad(!ibuf_inside() || ibuf_page(space, zip_size, offset, NULL));
#endif
	buf_pool->n_page_gets++;
loop:
	block = guess;
	buf_pool_mutex_enter();

	if (block) {
		/* If the guess is a compressed page descriptor that
		has been allocated by buf_buddy_alloc(), it may have
		been invalidated by buf_buddy_relocate().  In that
		case, block could point to something that happens to
		contain the expected bits in block->page.  Similarly,
		the guess may be pointing to a buffer pool chunk that
		has been released when resizing the buffer pool. */

		if (!buf_block_is_uncompressed(block)
		    || offset != block->page.offset
		    || space != block->page.space
		    || buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE) {

			block = guess = NULL;
		} else {
			ut_ad(!block->page.in_zip_hash);
			ut_ad(block->page.in_page_hash);
		}
	}

	if (block == NULL) {
		block = (buf_block_t*) buf_page_hash_get(space, offset);
	}

loop2:
	if (block == NULL) {
		/* Page not in buf_pool: needs to be read from file */

		buf_pool_mutex_exit();

		if (mode == BUF_GET_IF_IN_POOL) {

			return(NULL);
		}

		buf_read_page(space, zip_size, offset);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		ut_a(++buf_dbg_counter % 37 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
		goto loop;
	}

	ut_ad(page_zip_get_size(&block->page.zip) == zip_size);

	must_read = buf_block_get_io_fix(block) == BUF_IO_READ;

	if (must_read && mode == BUF_GET_IF_IN_POOL) {
		/* The page is only being read to buffer */
		buf_pool_mutex_exit();

		return(NULL);
	}

	switch (buf_block_get_state(block)) {
		buf_page_t*	bpage;
		ibool		success;

	case BUF_BLOCK_FILE_PAGE:
		break;

	case BUF_BLOCK_ZIP_PAGE:
	case BUF_BLOCK_ZIP_DIRTY:
		bpage = &block->page;

		if (bpage->buf_fix_count
		    || buf_page_get_io_fix(bpage) != BUF_IO_NONE) {
			/* This condition often occurs when the buffer
			is not buffer-fixed, but I/O-fixed by
			buf_page_init_for_read(). */
wait_until_unfixed:
			/* The block is buffer-fixed or I/O-fixed.
			Try again later. */
			buf_pool_mutex_exit();
			os_thread_sleep(WAIT_FOR_READ);

			goto loop;
		}

		/* Allocate an uncompressed page. */
		buf_pool_mutex_exit();

		block = buf_LRU_get_free_block(0);
		ut_a(block);

		buf_pool_mutex_enter();
		mutex_enter(&block->mutex);

		{
			buf_page_t*	hash_bpage
				= buf_page_hash_get(space, offset);

			if (UNIV_UNLIKELY(bpage != hash_bpage)) {
				/* The buf_pool->page_hash was modified
				while buf_pool_mutex was released.
				Free the block that was allocated. */

				buf_LRU_block_free_non_file_page(block);
				mutex_exit(&block->mutex);

				block = (buf_block_t*) hash_bpage;
				goto loop2;
			}
		}

		if (UNIV_UNLIKELY
		    (bpage->buf_fix_count
		     || buf_page_get_io_fix(bpage) != BUF_IO_NONE)) {

			/* The block was buffer-fixed or I/O-fixed
			while buf_pool_mutex was not held by this thread.
			Free the block that was allocated and try again.
			This should be extremely unlikely. */

			buf_LRU_block_free_non_file_page(block);
			mutex_exit(&block->mutex);

			goto wait_until_unfixed;
		}

		/* Move the compressed page from bpage to block,
		and uncompress it. */

		mutex_enter(&buf_pool_zip_mutex);

		buf_relocate(bpage, &block->page);
		buf_block_init_low(block);
		block->lock_hash_val = lock_rec_hash(space, offset);

		UNIV_MEM_DESC(&block->page.zip.data,
			      page_zip_get_size(&block->page.zip), block);

		if (buf_page_get_state(&block->page)
		    == BUF_BLOCK_ZIP_PAGE) {
			UT_LIST_REMOVE(list, buf_pool->zip_clean,
				       &block->page);
			ut_ad(!block->page.in_flush_list);
		} else {
			/* Relocate buf_pool->flush_list. */
			buf_page_t*	b;

			b = UT_LIST_GET_PREV(list, &block->page);
			ut_ad(block->page.in_flush_list);
			UT_LIST_REMOVE(list, buf_pool->flush_list,
				       &block->page);

			if (b) {
				UT_LIST_INSERT_AFTER(
					list, buf_pool->flush_list, b,
					&block->page);
			} else {
				UT_LIST_ADD_FIRST(
					list, buf_pool->flush_list,
					&block->page);
			}
		}

		/* Buffer-fix, I/O-fix, and X-latch the block
		for the duration of the decompression.
		Also add the block to the unzip_LRU list. */
		block->page.state = BUF_BLOCK_FILE_PAGE;

		/* Insert at the front of unzip_LRU list */
		buf_unzip_LRU_add_block(block, FALSE);

		block->page.buf_fix_count = 1;
		buf_block_set_io_fix(block, BUF_IO_READ);
		buf_pool->n_pend_unzip++;
		rw_lock_x_lock(&block->lock);
		mutex_exit(&block->mutex);
		mutex_exit(&buf_pool_zip_mutex);

		buf_buddy_free(bpage, sizeof *bpage);

		buf_pool_mutex_exit();

		/* Decompress the page and apply buffered operations
		while not holding buf_pool_mutex or block->mutex. */
		success = buf_zip_decompress(block, srv_use_checksums);

		if (UNIV_LIKELY(success)) {
			ibuf_merge_or_delete_for_page(block, space, offset,
						      zip_size, TRUE);
		}

		/* Unfix and unlatch the block. */
		buf_pool_mutex_enter();
		mutex_enter(&block->mutex);
		buf_pool->n_pend_unzip--;
		block->page.buf_fix_count--;
		buf_block_set_io_fix(block, BUF_IO_NONE);
		mutex_exit(&block->mutex);
		rw_lock_x_unlock(&block->lock);

		if (UNIV_UNLIKELY(!success)) {

			buf_pool_mutex_exit();
			return(NULL);
		}

		break;

	case BUF_BLOCK_ZIP_FREE:
	case BUF_BLOCK_NOT_USED:
	case BUF_BLOCK_READY_FOR_USE:
	case BUF_BLOCK_MEMORY:
	case BUF_BLOCK_REMOVE_HASH:
		ut_error;
		break;
	}

	ut_ad(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);

	mutex_enter(&block->mutex);
	UNIV_MEM_ASSERT_RW(&block->page, sizeof block->page);

	buf_block_buf_fix_inc(block, file, line);
	buf_pool_mutex_exit();

	/* Check if this is the first access to the page */

	accessed = buf_page_is_accessed(&block->page);

	buf_page_set_accessed(&block->page, TRUE);

	mutex_exit(&block->mutex);

	buf_block_make_young(&block->page);

#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(!block->page.file_page_was_freed);
#endif

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

	switch (rw_latch) {
	case RW_NO_LATCH:
		if (must_read) {
			/* Let us wait until the read operation
			completes */

			for (;;) {
				enum buf_io_fix	io_fix;

				mutex_enter(&block->mutex);
				io_fix = buf_block_get_io_fix(block);
				mutex_exit(&block->mutex);

				if (io_fix == BUF_IO_READ) {

					os_thread_sleep(WAIT_FOR_READ);
				} else {
					break;
				}
			}
		}

		fix_type = MTR_MEMO_BUF_FIX;
		break;

	case RW_S_LATCH:
		rw_lock_s_lock_func(&(block->lock), 0, file, line);

		fix_type = MTR_MEMO_PAGE_S_FIX;
		break;

	default:
		ut_ad(rw_latch == RW_X_LATCH);
		rw_lock_x_lock_func(&(block->lock), 0, file, line);

		fix_type = MTR_MEMO_PAGE_X_FIX;
		break;
	}

	mtr_memo_push(mtr, block, fix_type);

	if (!accessed) {
		/* In the case of a first access, try to apply linear
		read-ahead */

		buf_read_ahead_linear(space, zip_size, offset);
	}

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	return(block);
}
/************************************************************************
This is the general function used to get optimistic access to a database
page. */
UNIV_INTERN
ibool
buf_page_optimistic_get_func(
/*=========================*/
				/* out: TRUE if success */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */
	buf_block_t*	block,	/* in: guessed buffer block */
	ib_uint64_t	modify_clock,/* in: modify clock value if mode is
				..._GUESS_ON_CLOCK */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	ibool		accessed;
	ibool		success;
	ulint		fix_type;

	ut_ad(mtr && block);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));

	mutex_enter(&block->mutex);

	if (UNIV_UNLIKELY(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE)) {

		mutex_exit(&block->mutex);

		return(FALSE);
	}

	buf_block_buf_fix_inc(block, file, line);

	/* Check if this is the first access to the page */
	accessed = buf_page_is_accessed(&block->page);
	buf_page_set_accessed(&block->page, TRUE);

	mutex_exit(&block->mutex);

	buf_block_make_young(&block->page);

	ut_ad(!ibuf_inside()
	      || ibuf_page(buf_block_get_space(block),
			   buf_block_get_zip_size(block),
			   buf_block_get_page_no(block), NULL));

	if (rw_latch == RW_S_LATCH) {
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait(&(block->lock),
						     file, line);
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}

	if (UNIV_UNLIKELY(!success)) {
		mutex_enter(&block->mutex);
		buf_block_buf_fix_dec(block);
		mutex_exit(&block->mutex);

		return(FALSE);
	}

	if (UNIV_UNLIKELY(modify_clock != block->modify_clock)) {
		buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);

		if (rw_latch == RW_S_LATCH) {
			rw_lock_s_unlock(&(block->lock));
		} else {
			rw_lock_x_unlock(&(block->lock));
		}

		mutex_enter(&block->mutex);
		buf_block_buf_fix_dec(block);
		mutex_exit(&block->mutex);

		return(FALSE);
	}

	mtr_memo_push(mtr, block, fix_type);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(block->page.file_page_was_freed == FALSE);
#endif

	if (UNIV_UNLIKELY(!accessed)) {
		/* In the case of a first access, try to apply linear
		read-ahead */

		buf_read_ahead_linear(buf_block_get_space(block),
				      buf_block_get_zip_size(block),
				      buf_block_get_page_no(block));
	}

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	buf_pool->n_page_gets++;

	return(TRUE);
}
/************************************************************************
This is used to get access to a known database page, when no waiting can be
done. For example, if a search in an adaptive hash index leads us to this
frame. */
UNIV_INTERN
ibool
buf_page_get_known_nowait(
/*======================*/
				/* out: TRUE if success */
	ulint		rw_latch,/* in: RW_S_LATCH, RW_X_LATCH */
	buf_block_t*	block,	/* in: the known page */
	ulint		mode,	/* in: BUF_MAKE_YOUNG or BUF_KEEP_OLD */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	ibool		success;
	ulint		fix_type;

	ut_ad(mtr);
	ut_ad((rw_latch == RW_S_LATCH) || (rw_latch == RW_X_LATCH));

	mutex_enter(&block->mutex);

	if (buf_block_get_state(block) == BUF_BLOCK_REMOVE_HASH) {
		/* Another thread is just freeing the block from the LRU list
		of the buffer pool: do not try to access this page; this
		attempt to access the page can only come through the hash
		index because when the buffer block state is ..._REMOVE_HASH,
		we have already removed it from the page address hash table
		of the buffer pool. */

		mutex_exit(&block->mutex);

		return(FALSE);
	}

	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);

	buf_block_buf_fix_inc(block, file, line);

	mutex_exit(&block->mutex);

	if (mode == BUF_MAKE_YOUNG) {
		buf_block_make_young(&block->page);
	}

	ut_ad(!ibuf_inside() || (mode == BUF_KEEP_OLD));

	if (rw_latch == RW_S_LATCH) {
		success = rw_lock_s_lock_nowait(&(block->lock),
						file, line);
		fix_type = MTR_MEMO_PAGE_S_FIX;
	} else {
		success = rw_lock_x_lock_func_nowait(&(block->lock),
						     file, line);
		fix_type = MTR_MEMO_PAGE_X_FIX;
	}

	if (!success) {
		mutex_enter(&block->mutex);
		buf_block_buf_fix_dec(block);
		mutex_exit(&block->mutex);

		return(FALSE);
	}

	mtr_memo_push(mtr, block, fix_type);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(block->page.file_page_was_freed == FALSE);
#endif
#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a((mode == BUF_KEEP_OLD)
	     || (ibuf_count_get(buf_block_get_space(block),
				buf_block_get_page_no(block)) == 0));
#endif
	buf_pool->n_page_gets++;

	return(TRUE);
}
/***********************************************************************
Given a tablespace id and page number tries to get that page. If the
page is not in the buffer pool it is not loaded and NULL is returned.
Suitable for use when holding the kernel mutex. */
UNIV_INTERN
const buf_block_t*
buf_page_try_get_func(
/*==================*/
				/* out: pointer to a page or NULL */
	ulint		space_id,/* in: tablespace id */
	ulint		page_no,/* in: page number */
	const char*	file,	/* in: file name */
	ulint		line,	/* in: line where called */
	mtr_t*		mtr)	/* in: mini-transaction */
{
	buf_block_t*	block;
	ibool		success;
	ulint		fix_type;

	buf_pool_mutex_enter();
	block = buf_block_hash_get(space_id, page_no);

	if (!block) {
		buf_pool_mutex_exit();
		return(NULL);
	}

	mutex_enter(&block->mutex);
	buf_pool_mutex_exit();

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
	ut_a(buf_block_get_space(block) == space_id);
	ut_a(buf_block_get_page_no(block) == page_no);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */

	buf_block_buf_fix_inc(block, file, line);
	mutex_exit(&block->mutex);

	fix_type = MTR_MEMO_PAGE_S_FIX;
	success = rw_lock_s_lock_nowait(&block->lock, file, line);

	if (!success) {
		/* Let us try to get an X-latch. If the current thread
		is holding an X-latch on the page, we cannot get an
		S-latch. */

		fix_type = MTR_MEMO_PAGE_X_FIX;
		success = rw_lock_x_lock_func_nowait(&block->lock,
						     file, line);
	}

	if (!success) {
		mutex_enter(&block->mutex);
		buf_block_buf_fix_dec(block);
		mutex_exit(&block->mutex);

		return(NULL);
	}

	mtr_memo_push(mtr, block, fix_type);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 5771 || buf_validate());
	ut_a(block->page.buf_fix_count > 0);
	ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG_FILE_ACCESSES
	ut_a(block->page.file_page_was_freed == FALSE);
#endif /* UNIV_DEBUG_FILE_ACCESSES */
	buf_block_dbg_add_level(block, SYNC_NO_ORDER_CHECK);

	buf_pool->n_page_gets++;

#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif

	return(block);
}
/************************************************************************
Initialize some fields of a control block. */
UNIV_INLINE
void
buf_page_init_low(
/*==============*/
	buf_page_t*	bpage)	/* in: block to init */
{
	bpage->flush_type = BUF_FLUSH_LRU;
	bpage->accessed = FALSE;
	bpage->io_fix = BUF_IO_NONE;
	bpage->buf_fix_count = 0;
	bpage->freed_page_clock = 0;
	bpage->newest_modification = 0;
	bpage->oldest_modification = 0;
	HASH_INVALIDATE(bpage, hash);
#ifdef UNIV_DEBUG_FILE_ACCESSES
	bpage->file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES */
}
#ifdef UNIV_HOTBACKUP
/************************************************************************
Inits a page to the buffer buf_pool, for use in ibbackup --restore. */
UNIV_INTERN
void
buf_page_init_for_backup_restore(
/*=============================*/
	ulint		space,	/* in: space id */
	ulint		offset,	/* in: offset of the page within space
				in units of a page */
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	buf_block_t*	block)	/* in: block to init */
{
	buf_block_init_low(block);

	block->lock_hash_val	= 0;

	buf_page_init_low(&block->page);
	block->page.state	= BUF_BLOCK_FILE_PAGE;
	block->page.space	= space;
	block->page.offset	= offset;

	page_zip_des_init(&block->page.zip);

	/* We assume that block->page.data has been allocated
	with zip_size == UNIV_PAGE_SIZE. */
	ut_ad(zip_size <= UNIV_PAGE_SIZE);
	ut_ad(ut_is_2pow(zip_size));
	page_zip_set_size(&block->page.zip, zip_size);
}
#endif /* UNIV_HOTBACKUP */
/************************************************************************
Inits a page to the buffer buf_pool. */
static
void
buf_page_init(
/*==========*/
	ulint		space,	/* in: space id */
	ulint		offset,	/* in: offset of the page within space
				in units of a page */
	buf_block_t*	block)	/* in: block to init */
{
	buf_page_t*	hash_page;

	ut_ad(buf_pool_mutex_own());
	ut_ad(mutex_own(&(block->mutex)));
	ut_a(buf_block_get_state(block) != BUF_BLOCK_FILE_PAGE);

	/* Set the state of the block */
	buf_block_set_file_page(block, space, offset);

#ifdef UNIV_DEBUG_VALGRIND
	if (!space) {
		/* Silence valid Valgrind warnings about uninitialized
		data being written to data files.  There are some unused
		bytes on some pages that InnoDB does not initialize. */
		UNIV_MEM_VALID(block->frame, UNIV_PAGE_SIZE);
	}
#endif /* UNIV_DEBUG_VALGRIND */

	buf_block_init_low(block);

	block->lock_hash_val	= lock_rec_hash(space, offset);

	/* Insert into the hash table of file pages */

	hash_page = buf_page_hash_get(space, offset);

	if (UNIV_LIKELY_NULL(hash_page)) {
		fprintf(stderr,
			"InnoDB: Error: page %lu %lu already found"
			" in the hash table: %p, %p\n",
			(ulong) space,
			(ulong) offset,
			(const void*) hash_page, (const void*) block);
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
		mutex_exit(&block->mutex);
		buf_pool_mutex_exit();
		buf_print();
		buf_LRU_print();
		buf_validate();
		buf_LRU_validate();
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
		ut_error;
	}

	buf_page_init_low(&block->page);

	ut_ad(!block->page.in_zip_hash);
	ut_ad(!block->page.in_page_hash);
	ut_d(block->page.in_page_hash = TRUE);
	HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
		    buf_page_address_fold(space, offset), &block->page);
}
/************************************************************************
Function which inits a page for read to the buffer buf_pool. If the page is
(1) already in buf_pool, or
(2) if we specify to read only ibuf pages and the page is not an ibuf page, or
(3) if the space is deleted or being deleted,
then this function does nothing.
Sets the io_fix flag to BUF_IO_READ and sets a non-recursive exclusive lock
on the buffer frame. The io-handler must take care that the flag is cleared
and the lock released later. */
UNIV_INTERN
buf_page_t*
buf_page_init_for_read(
/*===================*/
				/* out: pointer to the block or NULL */
	ulint*		err,	/* out: DB_SUCCESS or DB_TABLESPACE_DELETED */
	ulint		mode,	/* in: BUF_READ_IBUF_PAGES_ONLY, ... */
	ulint		space,	/* in: space id */
	ulint		zip_size,/* in: compressed page size, or 0 */
	ibool		unzip,	/* in: TRUE=request uncompressed page */
	ib_int64_t	tablespace_version,/* in: prevents reading from a wrong
				version of the tablespace in case we have done
				DISCARD + IMPORT */
	ulint		offset)	/* in: page number */
{
	buf_block_t*	block;
	buf_page_t*	bpage;
	mtr_t		mtr;
	ibool		lru	= FALSE;
	void*		data;

	ut_ad(buf_pool);

	*err = DB_SUCCESS;

	if (mode == BUF_READ_IBUF_PAGES_ONLY) {
		/* It is a read-ahead within an ibuf routine */

		ut_ad(!ibuf_bitmap_page(zip_size, offset));
		ut_ad(ibuf_inside());

		mtr_start(&mtr);

		if (!recv_no_ibuf_operations
		    && !ibuf_page(space, zip_size, offset, &mtr)) {

			mtr_commit(&mtr);

			return(NULL);
		}
	} else {
		ut_ad(mode == BUF_READ_ANY_PAGE);
	}

	if (zip_size && UNIV_LIKELY(!unzip)
	    && UNIV_LIKELY(!recv_recovery_is_on())) {
		block = NULL;
	} else {
		block = buf_LRU_get_free_block(0);
		ut_ad(block);
	}

	buf_pool_mutex_enter();

	if (buf_page_hash_get(space, offset)) {
		/* The page is already in the buffer pool. */
err_exit:
		if (block) {
			mutex_enter(&block->mutex);
			buf_LRU_block_free_non_file_page(block);
			mutex_exit(&block->mutex);
		}

		bpage = NULL;
		goto func_exit;
	}

	if (fil_tablespace_deleted_or_being_deleted_in_mem(
		    space, tablespace_version)) {
		/* The page belongs to a space which has been
		deleted or is being deleted. */
		*err = DB_TABLESPACE_DELETED;

		goto err_exit;
	}

	if (block) {
		bpage = &block->page;
		mutex_enter(&block->mutex);
		buf_page_init(space, offset, block);

		/* The block must be put to the LRU list, to the old blocks */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);

		/* We set a pass-type x-lock on the frame because then
		the same thread which called for the read operation
		(and is running now at this point of code) can wait
		for the read to complete by waiting for the x-lock on
		the frame; if the x-lock were recursive, the same
		thread would illegally get the x-lock before the page
		read is completed.  The x-lock is cleared by the
		io-handler thread. */

		rw_lock_x_lock_gen(&block->lock, BUF_IO_READ);
		buf_page_set_io_fix(bpage, BUF_IO_READ);

		if (UNIV_UNLIKELY(zip_size)) {
			page_zip_set_size(&block->page.zip, zip_size);

			/* buf_pool_mutex may be released and
			reacquired by buf_buddy_alloc().  Thus, we
			must release block->mutex in order not to
			break the latching order in the reacquisition
			of buf_pool_mutex.  We also must defer this
			operation until after the block descriptor has
			been added to buf_pool->LRU and
			buf_pool->page_hash. */
			mutex_exit(&block->mutex);
			data = buf_buddy_alloc(zip_size, &lru);
			mutex_enter(&block->mutex);
			block->page.zip.data = data;

			/* To maintain the invariant
			block->in_unzip_LRU_list
			== buf_page_belongs_to_unzip_LRU(&block->page)
			we have to add this block to unzip_LRU
			after block->page.zip.data is set. */
			ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
			buf_unzip_LRU_add_block(block, TRUE);
		}

		mutex_exit(&block->mutex);
	} else {
		/* Defer buf_buddy_alloc() until after the block has
		been found not to exist.  The buf_buddy_alloc() and
		buf_buddy_free() calls may be expensive because of
		buf_buddy_relocate(). */

		/* The compressed page must be allocated before the
		control block (bpage), in order to avoid the
		invocation of buf_buddy_relocate_block() on
		uninitialized data. */
		data = buf_buddy_alloc(zip_size, &lru);
		bpage = buf_buddy_alloc(sizeof *bpage, &lru);

		/* If buf_buddy_alloc() allocated storage from the LRU list,
		it released and reacquired buf_pool_mutex.  Thus, we must
		check the page_hash again, as it may have been modified. */
		if (UNIV_UNLIKELY(lru)
		    && UNIV_LIKELY_NULL(buf_page_hash_get(space, offset))) {

			/* The block was added by some other thread. */
			buf_buddy_free(bpage, sizeof *bpage);
			buf_buddy_free(data, zip_size);

			bpage = NULL;
			goto func_exit;
		}

		page_zip_des_init(&bpage->zip);
		page_zip_set_size(&bpage->zip, zip_size);
		bpage->zip.data = data;

		mutex_enter(&buf_pool_zip_mutex);
		UNIV_MEM_DESC(bpage->zip.data,
			      page_zip_get_size(&bpage->zip), bpage);
		buf_page_init_low(bpage);
		bpage->state	= BUF_BLOCK_ZIP_PAGE;
		bpage->space	= space;
		bpage->offset	= offset;

#ifdef UNIV_DEBUG
		bpage->in_page_hash = FALSE;
		bpage->in_zip_hash = FALSE;
		bpage->in_flush_list = FALSE;
		bpage->in_free_list = FALSE;
		bpage->in_LRU_list = FALSE;
#endif /* UNIV_DEBUG */

		ut_d(bpage->in_page_hash = TRUE);
		HASH_INSERT(buf_page_t, hash, buf_pool->page_hash,
			    buf_page_address_fold(space, offset), bpage);

		/* The block must be put to the LRU list, to the old blocks */
		buf_LRU_add_block(bpage, TRUE/* to old blocks */);
		buf_LRU_insert_zip_clean(bpage);

		buf_page_set_io_fix(bpage, BUF_IO_READ);
  2393. mutex_exit(&buf_pool_zip_mutex);
  2394. }
  2395. buf_pool->n_pend_reads++;
  2396. func_exit:
  2397. buf_pool_mutex_exit();
  2398. if (mode == BUF_READ_IBUF_PAGES_ONLY) {
  2399. mtr_commit(&mtr);
  2400. }
  2401. ut_ad(!bpage || buf_page_in_file(bpage));
  2402. return(bpage);
  2403. }
/************************************************************************
Initializes a page to the buffer buf_pool. The page is usually not read
from a file, even if it cannot be found in the buffer buf_pool. This is
one of the functions which perform the state transition NOT_USED =>
FILE_PAGE on a block (the other is buf_page_get_gen). */
UNIV_INTERN
buf_block_t*
buf_page_create(
/*============*/
			/* out: pointer to the block, page bufferfixed */
	ulint	space,	/* in: space id */
	ulint	offset,	/* in: offset of the page within space in units of
			a page */
	ulint	zip_size,/* in: compressed page size, or 0 */
	mtr_t*	mtr)	/* in: mini-transaction handle */
{
	buf_frame_t*	frame;
	buf_block_t*	block;
	buf_block_t*	free_block	= NULL;

	ut_ad(mtr);
	ut_ad(space || !zip_size);

	free_block = buf_LRU_get_free_block(0);

	buf_pool_mutex_enter();

	block = (buf_block_t*) buf_page_hash_get(space, offset);

	if (block && buf_page_in_file(&block->page)) {
#ifdef UNIV_IBUF_COUNT_DEBUG
		ut_a(ibuf_count_get(space, offset) == 0);
#endif
#ifdef UNIV_DEBUG_FILE_ACCESSES
		block->page.file_page_was_freed = FALSE;
#endif /* UNIV_DEBUG_FILE_ACCESSES */

		/* Page can be found in buf_pool */
		buf_pool_mutex_exit();
		buf_block_free(free_block);

		return(buf_page_get_with_no_latch(space, zip_size,
						  offset, mtr));
	}

	/* If we get here, the page was not in buf_pool: init it there */

#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
		fprintf(stderr, "Creating space %lu page %lu to buffer\n",
			(ulong) space, (ulong) offset);
	}
#endif /* UNIV_DEBUG */

	block = free_block;

	mutex_enter(&block->mutex);

	buf_page_init(space, offset, block);

	/* The block must be put to the LRU list */
	buf_LRU_add_block(&block->page, FALSE);

	buf_block_buf_fix_inc(block, __FILE__, __LINE__);
	buf_pool->n_pages_created++;

	if (zip_size) {
		void*	data;
		ibool	lru;

		/* Prevent race conditions during buf_buddy_alloc(),
		which may release and reacquire buf_pool_mutex,
		by IO-fixing and X-latching the block. */

		buf_page_set_io_fix(&block->page, BUF_IO_READ);
		rw_lock_x_lock(&block->lock);

		page_zip_set_size(&block->page.zip, zip_size);
		mutex_exit(&block->mutex);
		/* buf_pool_mutex may be released and reacquired by
		buf_buddy_alloc().  Thus, we must release block->mutex
		in order not to break the latching order in
		the reacquisition of buf_pool_mutex.  We also must
		defer this operation until after the block descriptor
		has been added to buf_pool->LRU and buf_pool->page_hash. */
		data = buf_buddy_alloc(zip_size, &lru);
		mutex_enter(&block->mutex);
		block->page.zip.data = data;

		/* To maintain the invariant
		block->in_unzip_LRU_list
		== buf_page_belongs_to_unzip_LRU(&block->page)
		we have to add this block to unzip_LRU after
		block->page.zip.data is set. */
		ut_ad(buf_page_belongs_to_unzip_LRU(&block->page));
		buf_unzip_LRU_add_block(block, FALSE);

		buf_page_set_io_fix(&block->page, BUF_IO_NONE);
		rw_lock_x_unlock(&block->lock);
	}

	buf_pool_mutex_exit();

	mtr_memo_push(mtr, block, MTR_MEMO_BUF_FIX);

	buf_page_set_accessed(&block->page, TRUE);

	mutex_exit(&block->mutex);

	/* Delete possible entries for the page from the insert buffer:
	such can exist if the page belonged to an index which was dropped */

	ibuf_merge_or_delete_for_page(NULL, space, offset, zip_size, TRUE);

	/* Flush pages from the end of the LRU list if necessary */
	buf_flush_free_margin();

	frame = block->frame;

	memset(frame + FIL_PAGE_PREV, 0xff, 4);
	memset(frame + FIL_PAGE_NEXT, 0xff, 4);
	mach_write_to_2(frame + FIL_PAGE_TYPE, FIL_PAGE_TYPE_ALLOCATED);

	/* Reset to zero the file flush lsn field in the page; if the first
	page of an ibdata file is 'created' in this function into the buffer
	pool then we lose the original contents of the file flush lsn stamp.
	Then InnoDB could in a crash recovery print a big, false, corruption
	warning if the stamp contains an lsn bigger than the ib_logfile lsn. */

	memset(frame + FIL_PAGE_FILE_FLUSH_LSN, 0, 8);

#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
	ut_a(++buf_dbg_counter % 357 || buf_validate());
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_IBUF_COUNT_DEBUG
	ut_a(ibuf_count_get(buf_block_get_space(block),
			    buf_block_get_page_no(block)) == 0);
#endif
	return(block);
}
/************************************************************************
Completes an asynchronous read or write request of a file page to or from
the buffer pool. */
UNIV_INTERN
void
buf_page_io_complete(
/*=================*/
	buf_page_t*	bpage)	/* in: pointer to the block in question */
{
	enum buf_io_fix	io_type;
	const ibool	uncompressed = (buf_page_get_state(bpage)
					== BUF_BLOCK_FILE_PAGE);

	ut_a(buf_page_in_file(bpage));

	/* We do not need to protect io_fix here by mutex to read
	it because this is the only function where we can change the value
	from BUF_IO_READ or BUF_IO_WRITE to some other value, and our code
	ensures that this is the only thread that handles the i/o for this
	block. */

	io_type = buf_page_get_io_fix(bpage);
	ut_ad(io_type == BUF_IO_READ || io_type == BUF_IO_WRITE);

	if (io_type == BUF_IO_READ) {
		ulint	read_page_no;
		ulint	read_space_id;
		byte*	frame;

		if (buf_page_get_zip_size(bpage)) {
			frame = bpage->zip.data;
			buf_pool->n_pend_unzip++;
			if (uncompressed
			    && !buf_zip_decompress((buf_block_t*) bpage,
						   FALSE)) {

				buf_pool->n_pend_unzip--;
				goto corrupt;
			}
			buf_pool->n_pend_unzip--;
		} else {
			ut_a(uncompressed);
			frame = ((buf_block_t*) bpage)->frame;
		}

		/* If this page is not uninitialized and not in the
		doublewrite buffer, then the page number and space id
		should be the same as in block. */
		read_page_no = mach_read_from_4(frame + FIL_PAGE_OFFSET);
		read_space_id = mach_read_from_4(
			frame + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID);

		if (bpage->space == TRX_SYS_SPACE
		    && trx_doublewrite_page_inside(bpage->offset)) {

			ut_print_timestamp(stderr);
			fprintf(stderr,
				"  InnoDB: Error: reading page %lu\n"
				"InnoDB: which is in the"
				" doublewrite buffer!\n",
				(ulong) bpage->offset);
		} else if (!read_space_id && !read_page_no) {
			/* This is likely an uninitialized page. */
		} else if ((bpage->space
			    && bpage->space != read_space_id)
			   || bpage->offset != read_page_no) {
			/* We did not compare space_id to read_space_id
			if bpage->space == 0, because the field on the
			page may contain garbage in MySQL < 4.1.1,
			which only supported bpage->space == 0. */

			ut_print_timestamp(stderr);
			fprintf(stderr,
				"  InnoDB: Error: space id and page n:o"
				" stored in the page\n"
				"InnoDB: read in are %lu:%lu,"
				" should be %lu:%lu!\n",
				(ulong) read_space_id, (ulong) read_page_no,
				(ulong) bpage->space,
				(ulong) bpage->offset);
		}

		/* From version 3.23.38 up we store the page checksum
		to the 4 first bytes of the page end lsn field */

		if (buf_page_is_corrupted(frame,
					  buf_page_get_zip_size(bpage))) {
corrupt:
			fprintf(stderr,
				"InnoDB: Database page corruption on disk"
				" or a failed\n"
				"InnoDB: file read of page %lu.\n"
				"InnoDB: You may have to recover"
				" from a backup.\n",
				(ulong) bpage->offset);
			buf_page_print(frame, buf_page_get_zip_size(bpage));
			fprintf(stderr,
				"InnoDB: Database page corruption on disk"
				" or a failed\n"
				"InnoDB: file read of page %lu.\n"
				"InnoDB: You may have to recover"
				" from a backup.\n",
				(ulong) bpage->offset);
			fputs("InnoDB: It is also possible that"
			      " your operating\n"
			      "InnoDB: system has corrupted its"
			      " own file cache\n"
			      "InnoDB: and rebooting your computer"
			      " removes the\n"
			      "InnoDB: error.\n"
			      "InnoDB: If the corrupt page is an index page\n"
			      "InnoDB: you can also try to"
			      " fix the corruption\n"
			      "InnoDB: by dumping, dropping,"
			      " and reimporting\n"
			      "InnoDB: the corrupt table."
			      " You can use CHECK\n"
			      "InnoDB: TABLE to scan your"
			      " table for corruption.\n"
			      "InnoDB: See also"
			      " http://dev.mysql.com/doc/refman/5.1/en/"
			      "forcing-recovery.html\n"
			      "InnoDB: about forcing recovery.\n", stderr);

			if (srv_force_recovery < SRV_FORCE_IGNORE_CORRUPT) {
				fputs("InnoDB: Ending processing because of"
				      " a corrupt database page.\n",
				      stderr);
				exit(1);
			}
		}

		if (recv_recovery_is_on()) {
			/* Pages must be uncompressed for crash recovery. */
			ut_a(uncompressed);
			recv_recover_page(FALSE, TRUE, (buf_block_t*) bpage);
		}

		if (uncompressed && !recv_no_ibuf_operations) {
			ibuf_merge_or_delete_for_page(
				(buf_block_t*) bpage, bpage->space,
				bpage->offset, buf_page_get_zip_size(bpage),
				TRUE);
		}
	}

	buf_pool_mutex_enter();
	mutex_enter(buf_page_get_mutex(bpage));

#ifdef UNIV_IBUF_COUNT_DEBUG
	if (io_type == BUF_IO_WRITE || uncompressed) {
		/* For BUF_IO_READ of compressed-only blocks, the
		buffered operations will be merged by buf_page_get_gen()
		after the block has been uncompressed. */
		ut_a(ibuf_count_get(bpage->space, bpage->offset) == 0);
	}
#endif
	/* Because this thread which does the unlocking is not the same that
	did the locking, we use a pass value != 0 in unlock, which simply
	removes the newest lock debug record, without checking the thread
	id. */

	buf_page_set_io_fix(bpage, BUF_IO_NONE);

	switch (io_type) {
	case BUF_IO_READ:
		/* NOTE that the call to ibuf may have moved the ownership of
		the x-latch to this OS thread: do not let this confuse you in
		debugging! */

		ut_ad(buf_pool->n_pend_reads > 0);
		buf_pool->n_pend_reads--;
		buf_pool->n_pages_read++;

		if (uncompressed) {
			rw_lock_x_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_READ);
		}

		break;

	case BUF_IO_WRITE:
		/* Write means a flush operation: call the completion
		routine in the flush system */

		buf_flush_write_complete(bpage);

		if (uncompressed) {
			rw_lock_s_unlock_gen(&((buf_block_t*) bpage)->lock,
					     BUF_IO_WRITE);
		}

		buf_pool->n_pages_written++;

		break;

	default:
		ut_error;
	}

#ifdef UNIV_DEBUG
	if (buf_debug_prints) {
		fprintf(stderr, "Has %s page space %lu page no %lu\n",
			io_type == BUF_IO_READ ? "read" : "written",
			(ulong) buf_page_get_space(bpage),
			(ulong) buf_page_get_page_no(bpage));
	}
#endif /* UNIV_DEBUG */

	mutex_exit(buf_page_get_mutex(bpage));
	buf_pool_mutex_exit();
}
/*************************************************************************
Invalidates the file pages in the buffer pool when an archive recovery is
completed. All the file pages buffered must be in a replaceable state when
this function is called: not latched and not modified. */
UNIV_INTERN
void
buf_pool_invalidate(void)
/*=====================*/
{
	ibool	freed;

	ut_ad(buf_all_freed());

	freed = TRUE;

	while (freed) {
		freed = buf_LRU_search_and_free_block(100);
	}

	buf_pool_mutex_enter();

	ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);
	ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);

	buf_pool_mutex_exit();
}
#if defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
/*************************************************************************
Validates the buffer buf_pool data structure. */
UNIV_INTERN
ibool
buf_validate(void)
/*==============*/
{
	buf_page_t*	b;
	buf_chunk_t*	chunk;
	ulint		i;
	ulint		n_single_flush	= 0;
	ulint		n_lru_flush	= 0;
	ulint		n_list_flush	= 0;
	ulint		n_lru		= 0;
	ulint		n_flush		= 0;
	ulint		n_free		= 0;
	ulint		n_zip		= 0;

	ut_ad(buf_pool);

	buf_pool_mutex_enter();

	chunk = buf_pool->chunks;

	/* Check the uncompressed blocks. */

	for (i = buf_pool->n_chunks; i--; chunk++) {

		ulint		j;
		buf_block_t*	block = chunk->blocks;

		for (j = chunk->size; j--; block++) {

			mutex_enter(&block->mutex);

			switch (buf_block_get_state(block)) {
			case BUF_BLOCK_ZIP_FREE:
			case BUF_BLOCK_ZIP_PAGE:
			case BUF_BLOCK_ZIP_DIRTY:
				/* These should only occur on
				zip_clean, zip_free[], or flush_list. */
				ut_error;
				break;

			case BUF_BLOCK_FILE_PAGE:
				ut_a(buf_page_hash_get(buf_block_get_space(
							       block),
						       buf_block_get_page_no(
							       block))
				     == &block->page);

#ifdef UNIV_IBUF_COUNT_DEBUG
				ut_a(buf_page_get_io_fix(&block->page)
				     == BUF_IO_READ
				     || !ibuf_count_get(buf_block_get_space(
								block),
							buf_block_get_page_no(
								block)));
#endif
				switch (buf_page_get_io_fix(&block->page)) {
				case BUF_IO_NONE:
					break;

				case BUF_IO_WRITE:
					switch (buf_page_get_flush_type(
							&block->page)) {
					case BUF_FLUSH_LRU:
						n_lru_flush++;
						ut_a(rw_lock_is_locked(
							     &block->lock,
							     RW_LOCK_SHARED));
						break;
					case BUF_FLUSH_LIST:
						n_list_flush++;
						break;
					case BUF_FLUSH_SINGLE_PAGE:
						n_single_flush++;
						break;
					default:
						ut_error;
					}

					break;

				case BUF_IO_READ:

					ut_a(rw_lock_is_locked(&block->lock,
							       RW_LOCK_EX));
					break;
				}

				n_lru++;

				if (block->page.oldest_modification > 0) {
					n_flush++;
				}

				break;

			case BUF_BLOCK_NOT_USED:
				n_free++;
				break;

			case BUF_BLOCK_READY_FOR_USE:
			case BUF_BLOCK_MEMORY:
			case BUF_BLOCK_REMOVE_HASH:
				/* do nothing */
				break;
			}

			mutex_exit(&block->mutex);
		}
	}

	mutex_enter(&buf_pool_zip_mutex);

	/* Check clean compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
		switch (buf_page_get_io_fix(b)) {
		case BUF_IO_NONE:
			/* All clean blocks should be I/O-unfixed. */
			break;
		case BUF_IO_READ:
			/* In buf_LRU_free_block(), we temporarily set
			b->io_fix = BUF_IO_READ for a newly allocated
			control block in order to prevent
			buf_page_get_gen() from decompressing the block. */
			break;
		default:
			ut_error;
			break;
		}
		ut_a(!b->oldest_modification);
		ut_a(buf_page_hash_get(b->space, b->offset) == b);

		n_lru++;
		n_zip++;
	}

	/* Check dirty compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_ad(b->in_flush_list);

		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			ut_a(b->oldest_modification);
			n_lru++;
			n_flush++;
			n_zip++;
			switch (buf_page_get_io_fix(b)) {
			case BUF_IO_NONE:
			case BUF_IO_READ:
				break;

			case BUF_IO_WRITE:
				switch (buf_page_get_flush_type(b)) {
				case BUF_FLUSH_LRU:
					n_lru_flush++;
					break;
				case BUF_FLUSH_LIST:
					n_list_flush++;
					break;
				case BUF_FLUSH_SINGLE_PAGE:
					n_single_flush++;
					break;
				default:
					ut_error;
				}
				break;
			}
			break;

		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;

		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
		ut_a(buf_page_hash_get(b->space, b->offset) == b);
	}

	mutex_exit(&buf_pool_zip_mutex);

	if (n_lru + n_free > buf_pool->curr_size + n_zip) {
		fprintf(stderr, "n LRU %lu, n free %lu, pool %lu zip %lu\n",
			(ulong) n_lru, (ulong) n_free,
			(ulong) buf_pool->curr_size, (ulong) n_zip);
		ut_error;
	}

	ut_a(UT_LIST_GET_LEN(buf_pool->LRU) == n_lru);
	if (UT_LIST_GET_LEN(buf_pool->free) != n_free) {
		fprintf(stderr, "Free list len %lu, free blocks %lu\n",
			(ulong) UT_LIST_GET_LEN(buf_pool->free),
			(ulong) n_free);
		ut_error;
	}
	ut_a(UT_LIST_GET_LEN(buf_pool->flush_list) == n_flush);

	ut_a(buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE] == n_single_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LIST] == n_list_flush);
	ut_a(buf_pool->n_flush[BUF_FLUSH_LRU] == n_lru_flush);

	buf_pool_mutex_exit();

	ut_a(buf_LRU_validate());
	ut_a(buf_flush_validate());

	return(TRUE);
}
#endif /* UNIV_DEBUG || UNIV_BUF_DEBUG */
#if defined UNIV_DEBUG_PRINT || defined UNIV_DEBUG || defined UNIV_BUF_DEBUG
/*************************************************************************
Prints info of the buffer buf_pool data structure. */
UNIV_INTERN
void
buf_print(void)
/*===========*/
{
	dulint*		index_ids;
	ulint*		counts;
	ulint		size;
	ulint		i;
	ulint		j;
	dulint		id;
	ulint		n_found;
	buf_chunk_t*	chunk;
	dict_index_t*	index;

	ut_ad(buf_pool);

	size = buf_pool->curr_size;

	index_ids = mem_alloc(sizeof(dulint) * size);
	counts = mem_alloc(sizeof(ulint) * size);

	buf_pool_mutex_enter();

	fprintf(stderr,
		"buf_pool size %lu\n"
		"database pages %lu\n"
		"free pages %lu\n"
		"modified database pages %lu\n"
		"n pending decompressions %lu\n"
		"n pending reads %lu\n"
		"n pending flush LRU %lu list %lu single page %lu\n"
		"pages read %lu, created %lu, written %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulong) buf_pool->n_pend_unzip,
		(ulong) buf_pool->n_pend_reads,
		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU],
		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST],
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE],
		(ulong) buf_pool->n_pages_read, buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written);

	/* Count the number of blocks belonging to each index in the buffer */

	n_found = 0;

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block		= chunk->blocks;
		ulint		n_blocks	= chunk->size;

		for (; n_blocks--; block++) {
			const buf_frame_t* frame = block->frame;

			if (fil_page_get_type(frame) == FIL_PAGE_INDEX) {

				id = btr_page_get_index_id(frame);

				/* Look for the id in the index_ids array */
				j = 0;

				while (j < n_found) {

					if (ut_dulint_cmp(index_ids[j],
							  id) == 0) {
						counts[j]++;

						break;
					}
					j++;
				}

				if (j == n_found) {
					n_found++;
					index_ids[j] = id;
					counts[j] = 1;
				}
			}
		}
	}

	buf_pool_mutex_exit();

	for (i = 0; i < n_found; i++) {
		index = dict_index_get_if_in_cache(index_ids[i]);

		fprintf(stderr,
			"Block count for index %lu in buffer is about %lu",
			(ulong) ut_dulint_get_low(index_ids[i]),
			(ulong) counts[i]);

		if (index) {
			putc(' ', stderr);
			dict_index_name_print(stderr, NULL, index);
		}

		putc('\n', stderr);
	}

	mem_free(index_ids);
	mem_free(counts);

	ut_a(buf_validate());
}
#endif /* UNIV_DEBUG_PRINT || UNIV_DEBUG || UNIV_BUF_DEBUG */
#ifdef UNIV_DEBUG
/*************************************************************************
Returns the number of latched pages in the buffer pool. */
UNIV_INTERN
ulint
buf_get_latched_pages_number(void)
/*==============================*/
{
	buf_chunk_t*	chunk;
	buf_page_t*	b;
	ulint		i;
	ulint		fixed_pages_number = 0;

	buf_pool_mutex_enter();

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {
		buf_block_t*	block;
		ulint		j;

		block = chunk->blocks;

		for (j = chunk->size; j--; block++) {
			if (buf_block_get_state(block)
			    != BUF_BLOCK_FILE_PAGE) {

				continue;
			}

			mutex_enter(&block->mutex);

			if (block->page.buf_fix_count != 0
			    || buf_page_get_io_fix(&block->page)
			    != BUF_IO_NONE) {
				fixed_pages_number++;
			}

			mutex_exit(&block->mutex);
		}
	}

	mutex_enter(&buf_pool_zip_mutex);

	/* Traverse the lists of clean and dirty compressed-only blocks. */

	for (b = UT_LIST_GET_FIRST(buf_pool->zip_clean); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_a(buf_page_get_state(b) == BUF_BLOCK_ZIP_PAGE);
		ut_a(buf_page_get_io_fix(b) != BUF_IO_WRITE);

		if (b->buf_fix_count != 0
		    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
			fixed_pages_number++;
		}
	}

	for (b = UT_LIST_GET_FIRST(buf_pool->flush_list); b;
	     b = UT_LIST_GET_NEXT(list, b)) {
		ut_ad(b->in_flush_list);

		switch (buf_page_get_state(b)) {
		case BUF_BLOCK_ZIP_DIRTY:
			if (b->buf_fix_count != 0
			    || buf_page_get_io_fix(b) != BUF_IO_NONE) {
				fixed_pages_number++;
			}
			break;

		case BUF_BLOCK_FILE_PAGE:
			/* uncompressed page */
			break;

		case BUF_BLOCK_ZIP_FREE:
		case BUF_BLOCK_ZIP_PAGE:
		case BUF_BLOCK_NOT_USED:
		case BUF_BLOCK_READY_FOR_USE:
		case BUF_BLOCK_MEMORY:
		case BUF_BLOCK_REMOVE_HASH:
			ut_error;
			break;
		}
	}

	mutex_exit(&buf_pool_zip_mutex);
	buf_pool_mutex_exit();

	return(fixed_pages_number);
}
#endif /* UNIV_DEBUG */
/*************************************************************************
Returns the number of pending buf pool ios. */
UNIV_INTERN
ulint
buf_get_n_pending_ios(void)
/*=======================*/
{
	return(buf_pool->n_pend_reads
	       + buf_pool->n_flush[BUF_FLUSH_LRU]
	       + buf_pool->n_flush[BUF_FLUSH_LIST]
	       + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);
}
/*************************************************************************
Returns the ratio in percents of modified pages in the buffer pool /
database pages in the buffer pool. */
UNIV_INTERN
ulint
buf_get_modified_ratio_pct(void)
/*============================*/
{
	ulint	ratio;

	buf_pool_mutex_enter();

	ratio = (100 * UT_LIST_GET_LEN(buf_pool->flush_list))
		/ (1 + UT_LIST_GET_LEN(buf_pool->LRU)
		   + UT_LIST_GET_LEN(buf_pool->free));

	/* 1 + is there to avoid division by zero */

	buf_pool_mutex_exit();

	return(ratio);
}
/*************************************************************************
Prints info of the buffer i/o. */
UNIV_INTERN
void
buf_print_io(
/*=========*/
	FILE*	file)	/* in/out: buffer where to print */
{
	time_t	current_time;
	double	time_elapsed;
	ulint	size;

	ut_ad(buf_pool);

	size = buf_pool->curr_size;

	buf_pool_mutex_enter();

	fprintf(file,
		"Buffer pool size %lu\n"
		"Free buffers %lu\n"
		"Database pages %lu\n"
		"Modified db pages %lu\n"
		"Pending reads %lu\n"
		"Pending writes: LRU %lu, flush list %lu, single page %lu\n",
		(ulong) size,
		(ulong) UT_LIST_GET_LEN(buf_pool->free),
		(ulong) UT_LIST_GET_LEN(buf_pool->LRU),
		(ulong) UT_LIST_GET_LEN(buf_pool->flush_list),
		(ulong) buf_pool->n_pend_reads,
		(ulong) buf_pool->n_flush[BUF_FLUSH_LRU]
		+ buf_pool->init_flush[BUF_FLUSH_LRU],
		(ulong) buf_pool->n_flush[BUF_FLUSH_LIST]
		+ buf_pool->init_flush[BUF_FLUSH_LIST],
		(ulong) buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]);

	current_time = time(NULL);
	time_elapsed = 0.001 + difftime(current_time,
					buf_pool->last_printout_time);
	buf_pool->last_printout_time = current_time;

	fprintf(file,
		"Pages read %lu, created %lu, written %lu\n"
		"%.2f reads/s, %.2f creates/s, %.2f writes/s\n",
		(ulong) buf_pool->n_pages_read,
		(ulong) buf_pool->n_pages_created,
		(ulong) buf_pool->n_pages_written,
		(buf_pool->n_pages_read - buf_pool->n_pages_read_old)
		/ time_elapsed,
		(buf_pool->n_pages_created - buf_pool->n_pages_created_old)
		/ time_elapsed,
		(buf_pool->n_pages_written - buf_pool->n_pages_written_old)
		/ time_elapsed);

	if (buf_pool->n_page_gets > buf_pool->n_page_gets_old) {
		fprintf(file, "Buffer pool hit rate %lu / 1000\n",
			(ulong)
			(1000 - ((1000 * (buf_pool->n_pages_read
					  - buf_pool->n_pages_read_old))
				 / (buf_pool->n_page_gets
				    - buf_pool->n_page_gets_old))));
	} else {
		fputs("No buffer pool page gets since the last printout\n",
		      file);
	}

	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;

	/* Print some values to help us with visualizing what is
	happening with LRU eviction. */
	fprintf(file,
		"LRU len: %lu, unzip_LRU len: %lu\n"
		"I/O sum[%lu]:cur[%lu], unzip sum[%lu]:cur[%lu]\n",
		UT_LIST_GET_LEN(buf_pool->LRU),
		UT_LIST_GET_LEN(buf_pool->unzip_LRU),
		buf_LRU_stat_sum.io, buf_LRU_stat_cur.io,
		buf_LRU_stat_sum.unzip, buf_LRU_stat_cur.unzip);

	buf_pool_mutex_exit();
}
/**************************************************************************
Refreshes the statistics used to print per-second averages. */
UNIV_INTERN
void
buf_refresh_io_stats(void)
/*======================*/
{
	buf_pool->last_printout_time = time(NULL);
	buf_pool->n_page_gets_old = buf_pool->n_page_gets;
	buf_pool->n_pages_read_old = buf_pool->n_pages_read;
	buf_pool->n_pages_created_old = buf_pool->n_pages_created;
	buf_pool->n_pages_written_old = buf_pool->n_pages_written;
}
/*************************************************************************
Checks that all file pages in the buffer are in a replaceable state. */
UNIV_INTERN
ibool
buf_all_freed(void)
/*===============*/
{
	buf_chunk_t*	chunk;
	ulint		i;

	ut_ad(buf_pool);

	buf_pool_mutex_enter();

	chunk = buf_pool->chunks;

	for (i = buf_pool->n_chunks; i--; chunk++) {

		const buf_block_t* block = buf_chunk_not_freed(chunk);

		if (UNIV_LIKELY_NULL(block)) {
			fprintf(stderr,
				"Page %lu %lu still fixed or dirty\n",
				(ulong) block->page.space,
				(ulong) block->page.offset);
			ut_error;
		}
	}

	buf_pool_mutex_exit();

	return(TRUE);
}
/*************************************************************************
Checks that there currently are no pending i/o-operations for the buffer
pool. */
UNIV_INTERN
ibool
buf_pool_check_no_pending_io(void)
/*==============================*/
			/* out: TRUE if there is no pending i/o */
{
	ibool	ret;

	buf_pool_mutex_enter();

	if (buf_pool->n_pend_reads + buf_pool->n_flush[BUF_FLUSH_LRU]
	    + buf_pool->n_flush[BUF_FLUSH_LIST]
	    + buf_pool->n_flush[BUF_FLUSH_SINGLE_PAGE]) {
		ret = FALSE;
	} else {
		ret = TRUE;
	}

	buf_pool_mutex_exit();

	return(ret);
}
/*************************************************************************
Gets the current length of the free list of buffer blocks. */
UNIV_INTERN
ulint
buf_get_free_list_len(void)
/*=======================*/
{
	ulint	len;

	buf_pool_mutex_enter();

	len = UT_LIST_GET_LEN(buf_pool->free);

	buf_pool_mutex_exit();

	return(len);
}