
3449 lines
95 KiB

branches/zip: Try to synchronize the updates of uncompressed and compressed pages.

btr_root_raise_and_insert(): Distinguish root_page_zip and new_page_zip.

btr_cur_set_ownership_of_extern_field(): Do not log the write on the uncompressed page if it will be logged for page_zip.

lock_rec_insert_check_and_lock(), lock_sec_rec_modify_check_and_lock(): Update the max_trx_id field also on the compressed page.

mlog_write_ulint(): Add UNIV_UNLIKELY hints. Remove trailing white space.

mlog_log_string(): Remove trailing white space.

rec_set_field_extern_bits(): Remove parameter mtr, as the write will either occur in the heap, or it will be logged at a higher level.

recv_parse_or_apply_log_rec_body(), page_zip_write_header(): Add log record type MLOG_ZIP_WRITE_HEADER.

page_header_set_field(): Pass mtr=NULL to page_zip_write_header().

page_header_reset_last_insert(): Pass mtr to page_zip_write_header().

btr_page_set_index_id(), btr_page_set_level(), btr_page_set_next(), btr_page_set_prev(): Pass mtr to page_zip_write_header().

row_upd_rec_sys_fields(): Pass mtr=NULL to page_zip_write_trx_id() and page_zip_write_roll_ptr(), since the write will be logged at a higher level.

page_zip_write_header(): Add parameter mtr.

page_zip_write_header_log(): New function.

Remove rec_set_nth_field_extern_bit(). Make rec_set_nth_field_extern_bit_old() static. Rename rec_set_nth_field_extern_bit_new() to rec_set_field_extern_bits_new() and make it static.

row_ins_index_entry_low(): Remove bogus TODO comment.
branches/zip: dtuple_convert_big_rec(): Do not store anything locally of externally stored columns, and fix bugs introduced in r873. (Bug #22496)

btr_page_get_sure_split_rec(), btr_page_insert_fits(), rec_get_converted_size(), rec_convert_dtuple_to_rec(), rec_convert_dtuple_to_rec_old(), rec_convert_dtuple_to_rec_new(): Add parameters ext and n_ext. Flag external fields during the conversion.

rec_set_field_extern_bits(), rec_set_field_extern_bits_new(), rec_offs_set_nth_extern(), rec_set_nth_field_extern_bit_old(): Remove. The bits are set by rec_convert_dtuple_to_rec().

page_cur_insert_rec_low(): Remove the parameters ext and n_ext.

btr_cur_add_ext(): New utility function for updating and sorting ext[]. Low-level functions now expect the array to be in ascending order for performance reasons. Used in btr_cur_optimistic_insert(), btr_cur_pessimistic_insert(), and btr_cur_pessimistic_update().

btr_cur_optimistic_insert(): Remove some defensive code, because we cannot compute the added parameters of rec_get_converted_size().

btr_push_update_extern_fields(): Sort the array. Require the array to be twice the maximum usage, so that ut_ulint_sort() can be used.

dtuple_convert_big_rec(): Allocate new space for the BLOB pointer, to avoid overwriting prefix indexes to the same column. Adapt dtuple_convert_back_big_rec().

row_build_index_entry(): Fetch the columns also for prefix indexes of the clustered index.

page_zip_apply_log(), page_zip_decompress_clust(): Allow externally stored fields to lack a locally stored part.
/******************************************************
The B-tree

(c) 1994-1996 Innobase Oy

Created 6/2/1994 Heikki Tuuri
*******************************************************/

#include "btr0btr.h"

#ifdef UNIV_NONINL
#include "btr0btr.ic"
#endif

#include "fsp0fsp.h"
#include "page0page.h"
#include "page0zip.h"
#include "btr0cur.h"
#include "btr0sea.h"
#include "btr0pcur.h"
#include "rem0cmp.h"
#include "lock0lock.h"
#include "ibuf0ibuf.h"
#include "trx0trx.h"
/*
Latching strategy of the InnoDB B-tree
--------------------------------------
A tree latch protects all non-leaf nodes of the tree. Each node of a tree
also has a latch of its own.

A B-tree operation normally first acquires an S-latch on the tree. It
searches down the tree and releases the tree latch when it has the
leaf node latch. To save CPU time we do not acquire any latch on
non-leaf nodes of the tree during a search; those pages are only
buffer-fixed.

If an operation needs to restructure the tree, it acquires an X-latch on
the tree before searching to a leaf node. If it needs, for example, to
split a leaf,
(1) InnoDB decides the split point in the leaf,
(2) allocates a new page,
(3) inserts the appropriate node pointer to the first non-leaf level,
(4) releases the tree X-latch,
(5) and then moves records from the leaf to the newly allocated page.

Node pointers
-------------
Leaf pages of a B-tree contain the index records stored in the
tree. On levels n > 0 we store 'node pointers' to pages on level
n - 1. For each page there is exactly one node pointer stored:
thus our tree is an ordinary B-tree, not a B-link tree.

A node pointer contains a prefix P of an index record. The prefix
is long enough that it determines an index record uniquely.
The file page number of the child page is added as the last
field. In the child page we can store node pointers or index records
which are >= P in the alphabetical order, but < P1 if there is
a next node pointer on the level, and P1 is its prefix.

If a node pointer with a prefix P points to a non-leaf child,
then the leftmost record in the child must have the same
prefix P. If it points to a leaf node, the child is not required
to contain any record with a prefix equal to P. The leaf case
is decided this way to allow arbitrary deletions in a leaf node
without touching upper levels of the tree.

We have predefined a special minimum record which we
define as the smallest record in any alphabetical order.
A minimum record is denoted by setting a bit in the record
header. A minimum record acts as the prefix of a node pointer
which points to a leftmost node on any level of the tree.

File page allocation
--------------------
In the root node of a B-tree there are two file segment headers.
The leaf pages of a tree are allocated from one file segment, to
make them consecutive on disk if possible. From the other file segment
we allocate pages for the non-leaf levels of the tree.
*/
/******************************************************************
Gets the root node of a tree and x-latches it. */
static
buf_block_t*
btr_root_block_get(
/*===============*/
				/* out: root page, x-latched */
	dict_index_t*	index,	/* in: index tree */
	mtr_t*		mtr)	/* in: mtr */
{
	ulint		space;
	ulint		zip_size;
	ulint		root_page_no;
	buf_block_t*	block;

	space = dict_index_get_space(index);
	zip_size = dict_table_zip_size(index->table);
	root_page_no = dict_index_get_page(index);

	block = btr_block_get(space, zip_size, root_page_no, RW_X_LATCH, mtr);
	ut_a((ibool)!!page_is_comp(buf_block_get_frame(block))
	     == dict_table_is_comp(index->table));

	return(block);
}

/******************************************************************
Gets the root node of a tree and x-latches it. */

page_t*
btr_root_get(
/*=========*/
				/* out: root page, x-latched */
	dict_index_t*	index,	/* in: index tree */
	mtr_t*		mtr)	/* in: mtr */
{
	return(buf_block_get_frame(btr_root_block_get(index, mtr)));
}
/*****************************************************************
Gets pointer to the previous user record in the tree. It is assumed that
the caller has appropriate latches on the page and its neighbor. */

rec_t*
btr_get_prev_user_rec(
/*==================*/
			/* out: previous user record, NULL if there is none */
	rec_t*	rec,	/* in: record on leaf level */
	mtr_t*	mtr)	/* in: mtr holding a latch on the page, and if
			needed, also on the previous page */
{
	page_t*	page;
	page_t*	prev_page;
	ulint	prev_page_no;

	if (!page_rec_is_infimum(rec)) {
		rec_t*	prev_rec = page_rec_get_prev(rec);

		if (!page_rec_is_infimum(prev_rec)) {
			return(prev_rec);
		}
	}

	page = page_align(rec);
	prev_page_no = btr_page_get_prev(page, mtr);

	if (prev_page_no != FIL_NULL) {
		ulint		space;
		ulint		zip_size;
		buf_block_t*	prev_block;

		space = page_get_space_id(page);
		zip_size = fil_space_get_zip_size(space);

		prev_block = buf_page_get_with_no_latch(space, zip_size,
							prev_page_no, mtr);
		prev_page = buf_block_get_frame(prev_block);
		/* The caller must already have a latch on the
		neighboring page */
		ut_ad(mtr_memo_contains(mtr, prev_block,
					MTR_MEMO_PAGE_S_FIX)
		      || mtr_memo_contains(mtr, prev_block,
					   MTR_MEMO_PAGE_X_FIX));
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(prev_page) == page_is_comp(page));
		ut_a(btr_page_get_next(prev_page, mtr)
		     == page_get_page_no(page));
#endif /* UNIV_BTR_DEBUG */

		return(page_rec_get_prev(page_get_supremum_rec(prev_page)));
	}

	return(NULL);
}
/*****************************************************************
Gets pointer to the next user record in the tree. It is assumed that the
caller has appropriate latches on the page and its neighbor. */

rec_t*
btr_get_next_user_rec(
/*==================*/
			/* out: next user record, NULL if there is none */
	rec_t*	rec,	/* in: record on leaf level */
	mtr_t*	mtr)	/* in: mtr holding a latch on the page, and if
			needed, also on the next page */
{
	page_t*	page;
	page_t*	next_page;
	ulint	next_page_no;

	if (!page_rec_is_supremum(rec)) {
		rec_t*	next_rec = page_rec_get_next(rec);

		if (!page_rec_is_supremum(next_rec)) {
			return(next_rec);
		}
	}

	page = page_align(rec);
	next_page_no = btr_page_get_next(page, mtr);

	if (next_page_no != FIL_NULL) {
		ulint		space;
		ulint		zip_size;
		buf_block_t*	next_block;

		space = page_get_space_id(page);
		zip_size = fil_space_get_zip_size(space);

		next_block = buf_page_get_with_no_latch(space, zip_size,
							next_page_no, mtr);
		next_page = buf_block_get_frame(next_block);
		/* The caller must already have a latch on the
		neighboring page */
		ut_ad(mtr_memo_contains(mtr, next_block, MTR_MEMO_PAGE_S_FIX)
		      || mtr_memo_contains(mtr, next_block,
					   MTR_MEMO_PAGE_X_FIX));
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(next_page) == page_is_comp(page));
		ut_a(btr_page_get_prev(next_page, mtr)
		     == page_get_page_no(page));
#endif /* UNIV_BTR_DEBUG */

		return(page_rec_get_next(page_get_infimum_rec(next_page)));
	}

	return(NULL);
}
/******************************************************************
Creates a new index page (not the root, and also not
used in page reorganization). */
static
void
btr_page_create(
/*============*/
	buf_block_t*	block,	/* in/out: page to be created */
	page_zip_des_t*	page_zip,/* in/out: compressed page, or NULL */
	dict_index_t*	index,	/* in: index */
	ulint		level,	/* in: the B-tree level of the page */
	mtr_t*		mtr)	/* in: mtr */
{
	page_t*	page = buf_block_get_frame(block);

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

	if (UNIV_LIKELY_NULL(page_zip)) {
		page_create_zip(block, index, level, mtr);
	} else {
		page_create(block, mtr, dict_table_is_comp(index->table));
		/* Set the level of the new index page */
		btr_page_set_level(page, NULL, level, mtr);
	}

	block->check_index_page_at_flush = TRUE;

	btr_page_set_index_id(page, page_zip, index->id, mtr);
}
/******************************************************************
Allocates a new file page to be used in an ibuf tree. Takes the page from
the free list of the tree, which must contain pages! */
static
buf_block_t*
btr_page_alloc_for_ibuf(
/*====================*/
				/* out: newly allocated block, x-latched */
	dict_index_t*	index,	/* in: index tree */
	mtr_t*		mtr)	/* in: mtr */
{
	fil_addr_t	node_addr;
	page_t*		root;
	page_t*		new_page;
	buf_block_t*	new_block;

	root = btr_root_get(index, mtr);

	node_addr = flst_get_first(root + PAGE_HEADER
				   + PAGE_BTR_IBUF_FREE_LIST, mtr);
	ut_a(node_addr.page != FIL_NULL);

	new_block = buf_page_get(dict_index_get_space(index),
				 dict_table_zip_size(index->table),
				 node_addr.page, RW_X_LATCH, mtr);
	new_page = buf_block_get_frame(new_block);
#ifdef UNIV_SYNC_DEBUG
	buf_block_dbg_add_level(new_block, SYNC_TREE_NODE_NEW);
#endif /* UNIV_SYNC_DEBUG */

	flst_remove(root + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST,
		    new_page + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST_NODE,
		    mtr);
	ut_ad(flst_validate(root + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST,
			    mtr));

	return(new_block);
}
/******************************************************************
Allocates a new file page to be used in an index tree. NOTE: we assume
that the caller has made the reservation for free extents! */
buf_block_t*
btr_page_alloc(
/*===========*/
					/* out: new allocated block, x-latched;
					NULL if out of space */
	dict_index_t*	index,		/* in: index */
	ulint		hint_page_no,	/* in: hint of a good page */
	byte		file_direction,	/* in: direction where a possible
					page split is made */
	ulint		level,		/* in: level where the page is placed
					in the tree */
	mtr_t*		mtr)		/* in: mtr */
{
	fseg_header_t*	seg_header;
	page_t*		root;
	buf_block_t*	new_block;
	ulint		new_page_no;

	if (index->type & DICT_IBUF) {

		return(btr_page_alloc_for_ibuf(index, mtr));
	}

	root = btr_root_get(index, mtr);

	if (level == 0) {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_LEAF;
	} else {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_TOP;
	}

	/* Parameter TRUE below states that the caller has made the
	reservation for free extents, and thus we know that a page can
	be allocated: */

	new_page_no = fseg_alloc_free_page_general(seg_header, hint_page_no,
						   file_direction, TRUE, mtr);
	if (new_page_no == FIL_NULL) {

		return(NULL);
	}

	new_block = buf_page_get(dict_index_get_space(index),
				 dict_table_zip_size(index->table),
				 new_page_no, RW_X_LATCH, mtr);
#ifdef UNIV_SYNC_DEBUG
	buf_block_dbg_add_level(new_block, SYNC_TREE_NODE_NEW);
#endif /* UNIV_SYNC_DEBUG */

	return(new_block);
}

/******************************************************************
Gets the number of pages in a B-tree. */
ulint
btr_get_size(
/*=========*/
				/* out: number of pages */
	dict_index_t*	index,	/* in: index */
	ulint		flag)	/* in: BTR_N_LEAF_PAGES or BTR_TOTAL_SIZE */
{
	fseg_header_t*	seg_header;
	page_t*		root;
	ulint		n;
	ulint		dummy;
	mtr_t		mtr;

	mtr_start(&mtr);

	mtr_s_lock(dict_index_get_lock(index), &mtr);

	root = btr_root_get(index, &mtr);

	if (flag == BTR_N_LEAF_PAGES) {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_LEAF;

		fseg_n_reserved_pages(seg_header, &n, &mtr);

	} else if (flag == BTR_TOTAL_SIZE) {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_TOP;

		n = fseg_n_reserved_pages(seg_header, &dummy, &mtr);

		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_LEAF;

		n += fseg_n_reserved_pages(seg_header, &dummy, &mtr);
	} else {
		ut_error;
	}

	mtr_commit(&mtr);

	return(n);
}

/******************************************************************
Frees a page used in an ibuf tree. Puts the page to the free list of the
ibuf tree. */
static
void
btr_page_free_for_ibuf(
/*===================*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: block to be freed, x-latched */
	mtr_t*		mtr)	/* in: mtr */
{
	page_t*		root;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	root = btr_root_get(index, mtr);

	flst_add_first(root + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST,
		       buf_block_get_frame(block)
		       + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST_NODE, mtr);

	ut_ad(flst_validate(root + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST,
			    mtr));
}

/******************************************************************
Frees a file page used in an index tree. Can also be used to free (BLOB)
external storage pages, because the page level 0 can be given as an
argument. */
void
btr_page_free_low(
/*==============*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: block to be freed, x-latched */
	ulint		level,	/* in: page level */
	mtr_t*		mtr)	/* in: mtr */
{
	fseg_header_t*	seg_header;
	page_t*		root;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	/* The page gets invalid for optimistic searches: increment the frame
	modify clock */

	buf_block_modify_clock_inc(block);

	if (index->type & DICT_IBUF) {

		btr_page_free_for_ibuf(index, block, mtr);

		return;
	}

	root = btr_root_get(index, mtr);

	if (level == 0) {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_LEAF;
	} else {
		seg_header = root + PAGE_HEADER + PAGE_BTR_SEG_TOP;
	}

	fseg_free_page(seg_header,
		       buf_block_get_space(block),
		       buf_block_get_page_no(block), mtr);
}

/******************************************************************
Frees a file page used in an index tree. NOTE: cannot free field external
storage pages because the page must contain info on its level. */
void
btr_page_free(
/*==========*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: block to be freed, x-latched */
	mtr_t*		mtr)	/* in: mtr */
{
	ulint		level;

	level = btr_page_get_level(buf_block_get_frame(block), mtr);

	btr_page_free_low(index, block, level, mtr);
}

/******************************************************************
Sets the child node file address in a node pointer. */
UNIV_INLINE
void
btr_node_ptr_set_child_page_no(
/*===========================*/
	rec_t*		rec,	/* in: node pointer record */
	page_zip_des_t*	page_zip,/* in/out: compressed page whose uncompressed
				part will be updated, or NULL */
	const ulint*	offsets,/* in: array returned by rec_get_offsets() */
	ulint		page_no,/* in: child node address */
	mtr_t*		mtr)	/* in: mtr */
{
	byte*	field;
	ulint	len;

	ut_ad(rec_offs_validate(rec, NULL, offsets));
	ut_ad(!page_is_leaf(page_align(rec)));
	ut_ad(!rec_offs_comp(offsets) || rec_get_node_ptr_flag(rec));

	/* The child address is in the last field */
	field = rec_get_nth_field(rec, offsets,
				  rec_offs_n_fields(offsets) - 1, &len);

	ut_ad(len == REC_NODE_PTR_SIZE);

	if (UNIV_LIKELY_NULL(page_zip)) {
		page_zip_write_node_ptr(page_zip, rec,
					rec_offs_data_size(offsets),
					page_no, mtr);
	} else {
		mlog_write_ulint(field, page_no, MLOG_4BYTES, mtr);
	}
}

/****************************************************************
Returns the child page of a node pointer and x-latches it. */
static
buf_block_t*
btr_node_ptr_get_child(
/*===================*/
				/* out: child page, x-latched */
	const rec_t*	node_ptr,/* in: node pointer */
	dict_index_t*	index,	/* in: index */
	const ulint*	offsets,/* in: array returned by rec_get_offsets() */
	mtr_t*		mtr)	/* in: mtr */
{
	ulint	page_no;
	ulint	space;

	ut_ad(rec_offs_validate(node_ptr, index, offsets));
	space = page_get_space_id(page_align((rec_t*) node_ptr));
	page_no = btr_node_ptr_get_child_page_no(node_ptr, offsets);

	return(btr_block_get(space, dict_table_zip_size(index->table),
			     page_no, RW_X_LATCH, mtr));
}

/****************************************************************
Returns the upper level node pointer to a page. It is assumed that mtr holds
an x-latch on the tree. */
static
ulint*
btr_page_get_father_node_ptr(
/*=========================*/
				/* out: rec_get_offsets() of the
				node pointer record */
	ulint*		offsets,/* in: work area for the return value */
	mem_heap_t*	heap,	/* in: memory heap to use */
	btr_cur_t*	cursor,	/* in: cursor pointing to user record,
				out: cursor on node pointer record,
				its page x-latched */
	mtr_t*		mtr)	/* in: mtr */
{
	dtuple_t*	tuple;
	rec_t*		user_rec;
	rec_t*		node_ptr;
	ulint		level;
	ulint		page_no;
	dict_index_t*	index;

	page_no = buf_block_get_page_no(btr_cur_get_block(cursor));
	index = btr_cur_get_index(cursor);

	ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
				MTR_MEMO_X_LOCK));

	ut_ad(dict_index_get_page(index) != page_no);

	level = btr_page_get_level(btr_cur_get_page(cursor), mtr);
	user_rec = btr_cur_get_rec(cursor);
	ut_a(page_rec_is_user_rec(user_rec));
	tuple = dict_index_build_node_ptr(index, user_rec, 0, heap, level);

	btr_cur_search_to_nth_level(index, level + 1, tuple, PAGE_CUR_LE,
				    BTR_CONT_MODIFY_TREE, cursor, 0, mtr);

	node_ptr = btr_cur_get_rec(cursor);
	ut_ad(!page_rec_is_comp(node_ptr)
	      || rec_get_status(node_ptr) == REC_STATUS_NODE_PTR);
	offsets = rec_get_offsets(node_ptr, index, offsets,
				  ULINT_UNDEFINED, &heap);

	if (UNIV_UNLIKELY(btr_node_ptr_get_child_page_no(node_ptr, offsets)
			  != page_no)) {
		rec_t*	print_rec;
		fputs("InnoDB: Dump of the child page:\n", stderr);
		buf_page_print(page_align(user_rec), 0);
		fputs("InnoDB: Dump of the parent page:\n", stderr);
		buf_page_print(page_align(node_ptr), 0);

		fputs("InnoDB: Corruption of an index tree: table ", stderr);
		ut_print_name(stderr, NULL, TRUE, index->table_name);
		fputs(", index ", stderr);
		ut_print_name(stderr, NULL, FALSE, index->name);
		fprintf(stderr, ",\n"
			"InnoDB: father ptr page no %lu, child page no %lu\n",
			(ulong)
			btr_node_ptr_get_child_page_no(node_ptr, offsets),
			(ulong) page_no);
		print_rec = page_rec_get_next(
			page_get_infimum_rec(page_align(user_rec)));
		offsets = rec_get_offsets(print_rec, index,
					  offsets, ULINT_UNDEFINED, &heap);
		page_rec_print(print_rec, offsets);
		offsets = rec_get_offsets(node_ptr, index, offsets,
					  ULINT_UNDEFINED, &heap);
		page_rec_print(node_ptr, offsets);

		fputs("InnoDB: You should dump + drop + reimport the table"
		      " to fix the\n"
		      "InnoDB: corruption. If the crash happens at "
		      "the database startup, see\n"
		      "InnoDB: http://dev.mysql.com/doc/refman/5.1/en/"
		      "forcing-recovery.html about\n"
		      "InnoDB: forcing recovery. "
		      "Then dump + drop + reimport.\n", stderr);

		ut_error;
	}

	return(offsets);
}

/****************************************************************
Returns the upper level node pointer to a page. It is assumed that mtr holds
an x-latch on the tree. */
static
ulint*
btr_page_get_father_block(
/*======================*/
				/* out: rec_get_offsets() of the
				node pointer record */
	ulint*		offsets,/* in: work area for the return value */
	mem_heap_t*	heap,	/* in: memory heap to use */
	dict_index_t*	index,	/* in: b-tree index */
	buf_block_t*	block,	/* in: child page in the index */
	mtr_t*		mtr,	/* in: mtr */
	btr_cur_t*	cursor)	/* out: cursor on node pointer record,
				its page x-latched */
{
	rec_t*	rec
		= page_rec_get_next(page_get_infimum_rec(buf_block_get_frame(
			block)));
	btr_cur_position(index, rec, block, cursor);
	return(btr_page_get_father_node_ptr(offsets, heap, cursor, mtr));
}

/****************************************************************
Seeks to the upper level node pointer to a page.
It is assumed that mtr holds an x-latch on the tree. */
static
void
btr_page_get_father(
/*================*/
	dict_index_t*	index,	/* in: b-tree index */
	buf_block_t*	block,	/* in: child page in the index */
	mtr_t*		mtr,	/* in: mtr */
	btr_cur_t*	cursor)	/* out: cursor on node pointer record,
				its page x-latched */
{
	mem_heap_t*	heap;
	rec_t*		rec
		= page_rec_get_next(page_get_infimum_rec(buf_block_get_frame(
			block)));
	btr_cur_position(index, rec, block, cursor);

	heap = mem_heap_create(100);
	btr_page_get_father_node_ptr(NULL, heap, cursor, mtr);
	mem_heap_free(heap);
}

/****************************************************************
Creates the root node for a new index tree. */
ulint
btr_create(
/*=======*/
				/* out: page number of the created root,
				FIL_NULL if did not succeed */
	ulint		type,	/* in: type of the index */
	ulint		space,	/* in: space where created */
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	dulint		index_id,/* in: index id */
	dict_index_t*	index,	/* in: index */
	mtr_t*		mtr)	/* in: mini-transaction handle */
{
	ulint		page_no;
	buf_block_t*	block;
	buf_frame_t*	frame;
	page_t*		page;
	page_zip_des_t*	page_zip;

	/* Create the two new segments (one, in the case of an ibuf tree) for
	the index tree; the segment headers are put on the allocated root page
	(for an ibuf tree, not in the root, but on a separate ibuf header
	page) */

	if (type & DICT_IBUF) {
		/* Allocate first the ibuf header page */
		buf_block_t*	ibuf_hdr_block = fseg_create(
			space, 0,
			IBUF_HEADER + IBUF_TREE_SEG_HEADER, mtr);

#ifdef UNIV_SYNC_DEBUG
		buf_block_dbg_add_level(ibuf_hdr_block, SYNC_TREE_NODE_NEW);
#endif /* UNIV_SYNC_DEBUG */
		ut_ad(buf_block_get_page_no(ibuf_hdr_block)
		      == IBUF_HEADER_PAGE_NO);
		/* Allocate then the next page to the segment: it will be the
		tree root page */

		page_no = fseg_alloc_free_page(buf_block_get_frame(
						       ibuf_hdr_block)
					       + IBUF_HEADER
					       + IBUF_TREE_SEG_HEADER,
					       IBUF_TREE_ROOT_PAGE_NO,
					       FSP_UP, mtr);
		ut_ad(page_no == IBUF_TREE_ROOT_PAGE_NO);

		block = buf_page_get(space, zip_size, page_no,
				     RW_X_LATCH, mtr);
	} else {
		block = fseg_create(space, 0,
				    PAGE_HEADER + PAGE_BTR_SEG_TOP, mtr);
	}

	if (block == NULL) {

		return(FIL_NULL);
	}

	page_no = buf_block_get_page_no(block);
	frame = buf_block_get_frame(block);

#ifdef UNIV_SYNC_DEBUG
	buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
#endif /* UNIV_SYNC_DEBUG */

	if (type & DICT_IBUF) {
		/* It is an insert buffer tree: initialize the free list */

		ut_ad(page_no == IBUF_TREE_ROOT_PAGE_NO);

		flst_init(frame + PAGE_HEADER + PAGE_BTR_IBUF_FREE_LIST, mtr);
	} else {
		/* It is a non-ibuf tree: create a file segment for leaf
		pages */
		fseg_create(space, page_no,
			    PAGE_HEADER + PAGE_BTR_SEG_LEAF, mtr);
		/* The fseg create acquires a second latch on the page,
		therefore we must declare it: */
#ifdef UNIV_SYNC_DEBUG
		buf_block_dbg_add_level(block, SYNC_TREE_NODE_NEW);
#endif /* UNIV_SYNC_DEBUG */
	}

	/* Create a new index page on the allocated segment page */
	page_zip = buf_block_get_page_zip(block);

	if (UNIV_LIKELY_NULL(page_zip)) {
		page = page_create_zip(block, index, 0, mtr);
	} else {
		page = page_create(block, mtr,
				   dict_table_is_comp(index->table));
		/* Set the level of the new index page */
		btr_page_set_level(page, NULL, 0, mtr);
	}

	block->check_index_page_at_flush = TRUE;

	/* Set the index id of the page */
	btr_page_set_index_id(page, page_zip, index_id, mtr);

	/* Set the next node and previous node fields */
	btr_page_set_next(page, page_zip, FIL_NULL, mtr);
	btr_page_set_prev(page, page_zip, FIL_NULL, mtr);

	/* We reset the free bits for the page to allow creation of several
	trees in the same mtr, otherwise the latch on a bitmap page would
	prevent it because of the latching order */

	ibuf_reset_free_bits_with_type(type, block);

	/* In the following assertion we test that two records of maximum
	allowed size fit on the root page: this fact is needed to ensure
	correctness of split algorithms */

	ut_ad(page_get_max_insert_size(page, 2) > 2 * BTR_PAGE_MAX_REC_SIZE);

	return(page_no);
}

/****************************************************************
Frees a B-tree except the root page, which MUST be freed after this
by calling btr_free_root. */
void
btr_free_but_not_root(
/*==================*/
	ulint	space,		/* in: space where created */
	ulint	zip_size,	/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	ulint	root_page_no)	/* in: root page number */
{
	ibool	finished;
	page_t*	root;
	mtr_t	mtr;

leaf_loop:
	mtr_start(&mtr);

	root = btr_page_get(space, zip_size, root_page_no, RW_X_LATCH, &mtr);

	/* NOTE: page hash indexes are dropped when a page is freed inside
	fsp0fsp. */

	finished = fseg_free_step(root + PAGE_HEADER + PAGE_BTR_SEG_LEAF,
				  &mtr);
	mtr_commit(&mtr);

	if (!finished) {

		goto leaf_loop;
	}
top_loop:
	mtr_start(&mtr);

	root = btr_page_get(space, zip_size, root_page_no, RW_X_LATCH, &mtr);

	finished = fseg_free_step_not_header(
		root + PAGE_HEADER + PAGE_BTR_SEG_TOP, &mtr);
	mtr_commit(&mtr);

	if (!finished) {

		goto top_loop;
	}
}

/****************************************************************
Frees the B-tree root page. The rest of the tree MUST already have been
freed. */
void
btr_free_root(
/*==========*/
	ulint	space,		/* in: space where created */
	ulint	zip_size,	/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	ulint	root_page_no,	/* in: root page number */
	mtr_t*	mtr)		/* in: a mini-transaction which has already
				been started */
{
	buf_block_t*	block;
	fseg_header_t*	header;

	block = btr_block_get(space, zip_size, root_page_no, RW_X_LATCH, mtr);

	btr_search_drop_page_hash_index(block);

	header = buf_block_get_frame(block) + PAGE_HEADER + PAGE_BTR_SEG_TOP;

	while (!fseg_free_step(header, mtr));
}

/*****************************************************************
Reorganizes an index page. */
static
ibool
btr_page_reorganize_low(
/*====================*/
	ibool		recovery,/* in: TRUE if called in recovery:
				locks should not be updated, i.e.,
				there cannot exist locks on the
				page, and a hash index should not be
				dropped: it cannot exist */
	buf_block_t*	block,	/* in: page to be reorganized */
	dict_index_t*	index,	/* in: record descriptor */
	mtr_t*		mtr)	/* in: mtr */
{
	page_t*		page		= buf_block_get_frame(block);
	page_zip_des_t*	page_zip	= buf_block_get_page_zip(block);
	buf_block_t*	temp_block;
	page_t*		temp_page;
	ulint		log_mode;
	ulint		data_size1;
	ulint		data_size2;
	ulint		max_ins_size1;
	ulint		max_ins_size2;
	ibool		success		= FALSE;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	ut_ad(!!page_is_comp(page) == dict_table_is_comp(index->table));
#ifdef UNIV_ZIP_DEBUG
	ut_a(!page_zip || page_zip_validate(page_zip, page));
#endif /* UNIV_ZIP_DEBUG */
	data_size1 = page_get_data_size(page);
	max_ins_size1 = page_get_max_insert_size_after_reorganize(page, 1);

	/* Write the log record */
	mlog_open_and_write_index(mtr, page, index, page_is_comp(page)
				  ? MLOG_COMP_PAGE_REORGANIZE
				  : MLOG_PAGE_REORGANIZE, 0);

	/* Turn logging off */
	log_mode = mtr_set_log_mode(mtr, MTR_LOG_NONE);

	temp_block = buf_block_alloc(0);
	temp_page = temp_block->frame;

	/* Copy the old page to temporary space */
	buf_frame_copy(temp_page, page);

	if (UNIV_LIKELY(!recovery)) {
		btr_search_drop_page_hash_index(block);
	}

	/* Recreate the page: note that global data on page (possible
	segment headers, next page-field, etc.) is preserved intact */

	page_create(block, mtr, dict_table_is_comp(index->table));
	block->check_index_page_at_flush = TRUE;

	/* Copy the records from the temporary space to the recreated page;
	do not copy the lock bits yet */

	page_copy_rec_list_end_no_locks(block, temp_block,
					page_get_infimum_rec(temp_page),
					index, mtr);
	/* Copy max trx id to recreated page */
	page_set_max_trx_id(block, NULL, page_get_max_trx_id(temp_page));

	if (UNIV_LIKELY_NULL(page_zip)
	    && UNIV_UNLIKELY
	    (!page_zip_compress(page_zip, page, index, NULL))) {

		/* Restore the old page and exit. */
		buf_frame_copy(page, temp_page);

		goto func_exit;
	}

	if (UNIV_LIKELY(!recovery)) {
		/* Update the record lock bitmaps */
		lock_move_reorganize_page(block, temp_block);
	}

	data_size2 = page_get_data_size(page);
	max_ins_size2 = page_get_max_insert_size_after_reorganize(page, 1);

	if (UNIV_UNLIKELY(data_size1 != data_size2)
	    || UNIV_UNLIKELY(max_ins_size1 != max_ins_size2)) {
		buf_page_print(page, 0);
		buf_page_print(temp_page, 0);
		fprintf(stderr,
			"InnoDB: Error: page old data size %lu"
			" new data size %lu\n"
			"InnoDB: Error: page old max ins size %lu"
			" new max ins size %lu\n"
			"InnoDB: Submit a detailed bug report"
			" to http://bugs.mysql.com\n",
			(unsigned long) data_size1, (unsigned long) data_size2,
			(unsigned long) max_ins_size1,
			(unsigned long) max_ins_size2);
	} else {
		success = TRUE;
	}

	/* On compressed pages, recompute the insert buffer free bits. */
	if (UNIV_LIKELY_NULL(page_zip) && !dict_index_is_clust(index)) {
		ibuf_update_free_bits_if_full(
			index, page_zip_get_size(page_zip), block,
			UNIV_PAGE_SIZE, ULINT_UNDEFINED);
	}

func_exit:
#ifdef UNIV_ZIP_DEBUG
	ut_a(!page_zip || page_zip_validate(page_zip, page));
#endif /* UNIV_ZIP_DEBUG */
	buf_block_free(temp_block);

	/* Restore logging mode */
	mtr_set_log_mode(mtr, log_mode);

	return(success);
}

/*****************************************************************
Reorganizes an index page. */
ibool
btr_page_reorganize(
/*================*/
				/* out: TRUE on success, FALSE on failure */
	buf_block_t*	block,	/* in: page to be reorganized */
	dict_index_t*	index,	/* in: record descriptor */
	mtr_t*		mtr)	/* in: mtr */
{
	return(btr_page_reorganize_low(FALSE, block, index, mtr));
}

/***************************************************************
Parses a redo log record of reorganizing a page. */
byte*
btr_parse_page_reorganize(
/*======================*/
				/* out: end of log record or NULL */
	byte*		ptr,	/* in: buffer */
	byte*		end_ptr __attribute__((unused)),
				/* in: buffer end */
	dict_index_t*	index,	/* in: record descriptor */
	buf_block_t*	block,	/* in: page to be reorganized, or NULL */
	mtr_t*		mtr)	/* in: mtr or NULL */
{
	ut_ad(ptr && end_ptr);

	/* The record is empty, except for the record initial part */

	if (UNIV_LIKELY(block != NULL)) {
		btr_page_reorganize_low(TRUE, block, index, mtr);
	}

	return(ptr);
}

/*****************************************************************
Empties an index page. */
static
void
btr_page_empty(
/*===========*/
	buf_block_t*	block,	/* in: page to be emptied */
	page_zip_des_t*	page_zip,/* out: compressed page, or NULL */
	mtr_t*		mtr,	/* in: mtr */
	dict_index_t*	index)	/* in: index of the page */
{
	page_t*	page = buf_block_get_frame(block);

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
#ifdef UNIV_ZIP_DEBUG
	ut_a(!page_zip || page_zip_validate(page_zip, page));
#endif /* UNIV_ZIP_DEBUG */

	btr_search_drop_page_hash_index(block);

	/* Recreate the page: note that global data on page (possible
	segment headers, next page-field, etc.) is preserved intact */

	if (UNIV_LIKELY_NULL(page_zip)) {
		page_create_zip(block, index,
				btr_page_get_level(page, mtr), mtr);
	} else {
		page_create(block, mtr, dict_table_is_comp(index->table));
	}

	block->check_index_page_at_flush = TRUE;
}

/*****************************************************************
Makes tree one level higher by splitting the root, and inserts
the tuple. It is assumed that mtr contains an x-latch on the tree.
NOTE that the operation of this function must always succeed,
we cannot reverse it: therefore enough free disk space must be
guaranteed to be available before this function is called. */
rec_t*
btr_root_raise_and_insert(
/*======================*/
				/* out: inserted record */
	btr_cur_t*	cursor,	/* in: cursor at which to insert: must be
				on the root page; when the function returns,
				the cursor is positioned on the predecessor
				of the inserted record */
	dtuple_t*	tuple,	/* in: tuple to insert */
	const ulint*	ext,	/* in: array of extern field numbers */
	ulint		n_ext,	/* in: number of elements in ext */
	mtr_t*		mtr)	/* in: mtr */
{
	dict_index_t*	index;
	page_t*		root;
	page_t*		new_page;
	ulint		new_page_no;
	rec_t*		rec;
	mem_heap_t*	heap;
	dtuple_t*	node_ptr;
	ulint		level;
	rec_t*		node_ptr_rec;
	page_cur_t*	page_cursor;
	page_zip_des_t*	root_page_zip;
	page_zip_des_t*	new_page_zip;
	buf_block_t*	root_block;
	buf_block_t*	new_block;

	root = btr_cur_get_page(cursor);
	root_block = btr_cur_get_block(cursor);
	root_page_zip = buf_block_get_page_zip(root_block);
#ifdef UNIV_ZIP_DEBUG
	ut_a(!root_page_zip || page_zip_validate(root_page_zip, root));
#endif /* UNIV_ZIP_DEBUG */
	index = btr_cur_get_index(cursor);

	ut_ad(dict_index_get_page(index) == page_get_page_no(root));
	ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
				MTR_MEMO_X_LOCK));
	ut_ad(mtr_memo_contains(mtr, root_block, MTR_MEMO_PAGE_X_FIX));
	btr_search_drop_page_hash_index(root_block);

	/* Allocate a new page to the tree. Root splitting is done by first
	moving the root records to the new page, emptying the root, putting
	a node pointer to the new page, and then splitting the new page. */

	level = btr_page_get_level(root, mtr);

	new_block = btr_page_alloc(index, 0, FSP_NO_DIR, level, mtr);
	new_page = buf_block_get_frame(new_block);
	new_page_zip = buf_block_get_page_zip(new_block);
	ut_a(!new_page_zip == !root_page_zip);
	ut_a(!new_page_zip
	     || page_zip_get_size(new_page_zip)
	     == page_zip_get_size(root_page_zip));

	btr_page_create(new_block, new_page_zip, index, level, mtr);

	/* Set the next node and previous node fields of new page */
	btr_page_set_next(new_page, new_page_zip, FIL_NULL, mtr);
	btr_page_set_prev(new_page, new_page_zip, FIL_NULL, mtr);

	/* Copy the records from root to the new page one by one. */
	if (UNIV_UNLIKELY
	    (!page_copy_rec_list_end(new_block, root_block,
				     page_get_infimum_rec(root),
				     index, mtr))) {
		ut_a(new_page_zip);

		/* Copy the page byte for byte. */
		page_zip_copy(new_page_zip, new_page,
			      root_page_zip, root, index, mtr);
	}

	/* If this is a pessimistic insert which is actually done to
	perform a pessimistic update then we have stored the lock
	information of the record to be inserted on the infimum of the
	root page: we cannot discard the lock structs on the root page */

	lock_update_root_raise(new_block, root_block);

	/* Create a memory heap where the node pointer is stored */
	heap = mem_heap_create(100);

	rec = page_rec_get_next(page_get_infimum_rec(new_page));
	new_page_no = buf_block_get_page_no(new_block);

	/* Build the node pointer (= node key and page address) for the
	child */

	node_ptr = dict_index_build_node_ptr(index, rec, new_page_no, heap,
					     level);
	/* The node pointer must be marked as the predefined minimum record,
	as there is no lower alphabetical limit to records in the leftmost
	node of a level: */
	dtuple_set_info_bits(node_ptr,
			     dtuple_get_info_bits(node_ptr)
			     | REC_INFO_MIN_REC_FLAG);

	/* Rebuild the root page to get free space */
	if (UNIV_LIKELY_NULL(root_page_zip)) {
		page_create_zip(root_block, index, level + 1, mtr);
	} else {
		page_create(root_block, mtr, dict_table_is_comp(index->table));
		btr_page_set_level(root, NULL, level + 1, mtr);
	}

	/* Set the next node and previous node fields, although
	they should already have been set. The previous node field
	must be FIL_NULL if root_page_zip != NULL, because the
	REC_INFO_MIN_REC_FLAG (of the first user record) will be
	set if and only if btr_page_get_prev() == FIL_NULL. */
	btr_page_set_next(root, root_page_zip, FIL_NULL, mtr);
	btr_page_set_prev(root, root_page_zip, FIL_NULL, mtr);

	root_block->check_index_page_at_flush = TRUE;

	page_cursor = btr_cur_get_page_cur(cursor);

	/* Insert node pointer to the root */

	page_cur_set_before_first(root_block, page_cursor);

	node_ptr_rec = page_cur_tuple_insert(page_cursor, node_ptr,
					     index, NULL, 0, mtr);

	/* The root page should only contain the node pointer
	to new_page at this point.  Thus, the data should fit. */
	ut_a(node_ptr_rec);

	/* Free the memory heap */
	mem_heap_free(heap);

	/* We play safe and reset the free bits for the new page */

#if 0
	fprintf(stderr, "Root raise new page no %lu\n", new_page_no);
#endif

	ibuf_reset_free_bits_with_type(index->type, new_block);

	/* Reposition the cursor to the child node */
	page_cur_search(new_block, index, tuple,
			PAGE_CUR_LE, page_cursor);

	/* Split the child and insert tuple */
	return(btr_page_split_and_insert(cursor, tuple, ext, n_ext, mtr));
}

/*****************************************************************
Decides if the page should be split at the convergence point of inserts
converging to the left. */
ibool
btr_page_get_split_rec_to_left(
/*===========================*/
				/* out: TRUE if split recommended */
	btr_cur_t*	cursor,	/* in: cursor at which to insert */
	rec_t**		split_rec) /* out: if split recommended,
				the first record on upper half page,
				or NULL if tuple to be inserted should
				be first */
{
	page_t*	page;
	rec_t*	insert_point;
	rec_t*	infimum;

	page = btr_cur_get_page(cursor);
	insert_point = btr_cur_get_rec(cursor);

	if (page_header_get_ptr(page, PAGE_LAST_INSERT)
	    == page_rec_get_next(insert_point)) {

		infimum = page_get_infimum_rec(page);

		/* If the convergence is in the middle of a page, include also
		the record immediately before the new insert to the upper
		page. Otherwise, we could repeatedly move from page to page
		lots of records smaller than the convergence point. */

		if (infimum != insert_point
		    && page_rec_get_next(infimum) != insert_point) {

			*split_rec = insert_point;
		} else {
			*split_rec = page_rec_get_next(insert_point);
		}

		return(TRUE);
	}

	return(FALSE);
}

/*****************************************************************
Decides if the page should be split at the convergence point of inserts
converging to the right. */

ibool
btr_page_get_split_rec_to_right(
/*============================*/
				/* out: TRUE if split recommended */
	btr_cur_t*	cursor,	/* in: cursor at which to insert */
	rec_t**		split_rec) /* out: if split recommended,
				the first record on upper half page,
				or NULL if tuple to be inserted should
				be first */
{
	page_t*	page;
	rec_t*	insert_point;

	page = btr_cur_get_page(cursor);
	insert_point = btr_cur_get_rec(cursor);

	/* We use eager heuristics: if the new insert would be right after
	the previous insert on the same page, we assume that there is a
	pattern of sequential inserts here. */

	if (UNIV_LIKELY(page_header_get_ptr(page, PAGE_LAST_INSERT)
			== insert_point)) {

		rec_t*	next_rec;

		next_rec = page_rec_get_next(insert_point);

		if (page_rec_is_supremum(next_rec)) {
split_at_new:
			/* Split at the new record to insert */
			*split_rec = NULL;
		} else {
			rec_t*	next_next_rec = page_rec_get_next(next_rec);
			if (page_rec_is_supremum(next_next_rec)) {

				goto split_at_new;
			}

			/* If there are >= 2 user records up from the insert
			point, split all but 1 off.  We want to keep one
			because then sequential inserts can use the adaptive
			hash index, as they can do the necessary checks of
			the right search position just by looking at the
			records on this page. */

			*split_rec = next_next_rec;
		}

		return(TRUE);
	}

	return(FALSE);
}
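The split-to-right heuristic above reduces to a small decision rule that can be modeled outside InnoDB.  A minimal sketch (the helper name and the array model of a page are hypothetical, not InnoDB API): return "split at the new record" when fewer than two user records follow the insert point, otherwise split two records after it, keeping one on the lower page for the adaptive hash index checks.

```c
#include <assert.h>

/* Toy model of btr_page_get_split_rec_to_right()'s choice.
n_after is the number of user records following the insert point.
Returns -1 to mean "split at the new record to insert"
(split_rec == NULL in the real code), else the offset, counted
from the insert point, of the first record of the upper half. */
static int
split_rec_to_right(int n_after)
{
	if (n_after < 2) {
		/* next_rec or next_next_rec would be the supremum:
		split at the new record */
		return -1;
	}

	/* Keep exactly one record after the insert point on the
	lower page, split the rest off. */
	return 2;
}
```

With a long run of sequential inserts, this keeps the lower page nearly full while new records go to the fresh upper page.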
/*****************************************************************
Calculates a split record such that the tuple will certainly fit on
its half-page when the split is performed.  We assume in this function
only that the cursor page has at least one user record. */
static
rec_t*
btr_page_get_sure_split_rec(
/*========================*/
					/* out: split record, or NULL if
					tuple will be the first record on
					upper half-page */
	btr_cur_t*	cursor,		/* in: cursor at which insert
					should be made */
	dtuple_t*	tuple,		/* in: tuple to insert */
	const ulint*	ext,		/* in: array of extern field numbers */
	ulint		n_ext)		/* in: number of elements in ext */
{
	page_t*		page;
	page_zip_des_t*	page_zip;
	ulint		insert_size;
	ulint		free_space;
	ulint		total_data;
	ulint		total_n_recs;
	ulint		total_space;
	ulint		incl_data;
	rec_t*		ins_rec;
	rec_t*		rec;
	rec_t*		next_rec;
	ulint		n;
	mem_heap_t*	heap;
	ulint*		offsets;

	page = btr_cur_get_page(cursor);

	insert_size = rec_get_converted_size(cursor->index, tuple, ext, n_ext);
	free_space = page_get_free_space_of_empty(page_is_comp(page));

	page_zip = btr_cur_get_page_zip(cursor);
	if (UNIV_LIKELY_NULL(page_zip)) {
		/* Estimate the free space of an empty compressed page. */
		ulint	free_space_zip = page_zip_empty_size(
			cursor->index->n_fields,
			page_zip_get_size(page_zip));

		if (UNIV_LIKELY(free_space > (ulint) free_space_zip)) {
			free_space = (ulint) free_space_zip;
			ut_a(insert_size <= free_space);
		}
	}

	/* free_space is now the free space of a created new page */

	total_data = page_get_data_size(page) + insert_size;
	total_n_recs = page_get_n_recs(page) + 1;
	ut_ad(total_n_recs >= 2);
	total_space = total_data + page_dir_calc_reserved_space(total_n_recs);

	n = 0;
	incl_data = 0;
	ins_rec = btr_cur_get_rec(cursor);
	rec = page_get_infimum_rec(page);

	heap = NULL;
	offsets = NULL;

	/* We start by including records to the left half, and stop when
	the space reserved by them exceeds half of total_space.  If the
	included records fit on the left page and something is left over
	for the right page, the split is made after them; otherwise the
	last included record will be the first record on the right
	half-page. */

	do {
		/* Decide the next record to include */
		if (rec == ins_rec) {
			rec = NULL;	/* NULL denotes that tuple is
					now included */
		} else if (rec == NULL) {
			rec = page_rec_get_next(ins_rec);
		} else {
			rec = page_rec_get_next(rec);
		}

		if (rec == NULL) {
			/* Include tuple */
			incl_data += insert_size;
		} else {
			offsets = rec_get_offsets(rec, cursor->index,
						  offsets, ULINT_UNDEFINED,
						  &heap);
			incl_data += rec_offs_size(offsets);
		}

		n++;
	} while (incl_data + page_dir_calc_reserved_space(n)
		 < total_space / 2);

	if (incl_data + page_dir_calc_reserved_space(n) <= free_space) {
		/* The next record will be the first on
		the right half page if it is not the
		supremum record of page */

		if (rec == ins_rec) {
			rec = NULL;

			goto func_exit;
		} else if (rec == NULL) {
			next_rec = page_rec_get_next(ins_rec);
		} else {
			next_rec = page_rec_get_next(rec);
		}
		ut_ad(next_rec);
		if (!page_rec_is_supremum(next_rec)) {
			rec = next_rec;
		}
	}

func_exit:
	if (UNIV_LIKELY_NULL(heap)) {
		mem_heap_free(heap);
	}
	return(rec);
}
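The accumulation loop in btr_page_get_sure_split_rec() can be sketched on a toy page where each record costs its payload size plus a fixed directory reservation.  A minimal sketch under stated assumptions (the flat per-record directory cost and the helper name are hypothetical; the real page_dir_calc_reserved_space() amortizes slot costs over several records): include records until the left half reserves at least half of the total space.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flat per-record directory overhead, for illustration only */
enum { TOY_DIR_COST = 2 };

/* Toy version of the "sure split" loop: sizes[] holds the byte sizes
of all records (the tuple to insert already merged in, in order).
Returns how many records stay on the left half-page. */
static size_t
sure_split_point(const size_t *sizes, size_t n_recs)
{
	size_t	total_space = 0;
	size_t	incl_data = 0;
	size_t	n = 0;
	size_t	i;

	for (i = 0; i < n_recs; i++) {
		total_space += sizes[i] + TOY_DIR_COST;
	}

	/* Mirror of the do/while in btr_page_get_sure_split_rec():
	keep including records while the left half reserves less
	than half of the total space. */
	do {
		incl_data += sizes[n] + TOY_DIR_COST;
		n++;
	} while (incl_data < total_space / 2 && n < n_recs);

	return n;
}
```

With equal-sized records the split lands in the middle; one oversized record at the start ends the loop immediately, which is why the caller must still check that the chosen half fits in free_space.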
/*****************************************************************
Returns TRUE if the insert fits on the appropriate half-page with the
chosen split_rec. */
static
ibool
btr_page_insert_fits(
/*=================*/
					/* out: TRUE if fits */
	btr_cur_t*	cursor,		/* in: cursor at which insert
					should be made */
	rec_t*		split_rec,	/* in: suggestion for first record
					on upper half-page, or NULL if
					tuple to be inserted should be first */
	const ulint*	offsets,	/* in: rec_get_offsets(
					split_rec, cursor->index) */
	dtuple_t*	tuple,		/* in: tuple to insert */
	const ulint*	ext,		/* in: array of extern field numbers */
	ulint		n_ext,		/* in: number of elements in ext */
	mem_heap_t*	heap)		/* in: temporary memory heap */
{
	page_t*	page;
	ulint	insert_size;
	ulint	free_space;
	ulint	total_data;
	ulint	total_n_recs;
	rec_t*	rec;
	rec_t*	end_rec;
	ulint*	offs;

	page = btr_cur_get_page(cursor);

	ut_ad(!split_rec == !offsets);
	ut_ad(!offsets
	      || !page_is_comp(page) == !rec_offs_comp(offsets));
	ut_ad(!offsets
	      || rec_offs_validate(split_rec, cursor->index, offsets));

	insert_size = rec_get_converted_size(cursor->index, tuple, ext, n_ext);
	free_space = page_get_free_space_of_empty(page_is_comp(page));

	/* free_space is now the free space of a created new page */

	total_data = page_get_data_size(page) + insert_size;
	total_n_recs = page_get_n_recs(page) + 1;

	/* We determine which records (from rec to end_rec, not including
	end_rec) will end up on the other half page from tuple when it is
	inserted. */

	if (split_rec == NULL) {
		rec = page_rec_get_next(page_get_infimum_rec(page));
		end_rec = page_rec_get_next(btr_cur_get_rec(cursor));

	} else if (cmp_dtuple_rec(tuple, split_rec, offsets) >= 0) {

		rec = page_rec_get_next(page_get_infimum_rec(page));
		end_rec = split_rec;
	} else {
		rec = split_rec;
		end_rec = page_get_supremum_rec(page);
	}

	if (total_data + page_dir_calc_reserved_space(total_n_recs)
	    <= free_space) {

		/* Ok, there will be enough available space on the
		half page where the tuple is inserted */

		return(TRUE);
	}

	offs = NULL;

	while (rec != end_rec) {
		/* In this loop we calculate the amount of reserved
		space after rec is removed from page. */

		offs = rec_get_offsets(rec, cursor->index, offs,
				       ULINT_UNDEFINED, &heap);

		total_data -= rec_offs_size(offs);
		total_n_recs--;

		if (total_data + page_dir_calc_reserved_space(total_n_recs)
		    <= free_space) {

			/* Ok, there will be enough available space on the
			half page where the tuple is inserted */

			return(TRUE);
		}

		rec = page_rec_get_next(rec);
	}

	return(FALSE);
}
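The fit check above has a simple core: subtract the records that move to the other half-page, one by one, until the remainder (including the new tuple) fits in an empty page.  A minimal sketch, assuming a flat byte model and ignoring the page-directory reservation that the real code adds via page_dir_calc_reserved_space() (the helper name and parameters are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Toy version of btr_page_insert_fits(): page_data is the current
data size of the page, insert_size the size of the tuple, moved[]
the sizes of the records that will end up on the other half-page,
and free_space the capacity of an empty page.  Returns 1 if the
tuple's half-page will have room, 0 otherwise. */
static int
insert_fits(size_t page_data, size_t insert_size,
	    const size_t *moved, size_t n_moved, size_t free_space)
{
	size_t	total = page_data + insert_size;
	size_t	i;

	if (total <= free_space) {
		/* Everything fits even without the split */
		return 1;
	}

	for (i = 0; i < n_moved; i++) {
		/* Remove one record destined for the other half */
		total -= moved[i];

		if (total <= free_space) {
			return 1;
		}
	}

	return 0;
}
```

The early return mirrors the real function: as soon as enough data has been "moved away", the answer is known without walking the rest of the record list.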
/***********************************************************
Inserts a data tuple to a tree on a non-leaf level.  It is assumed
that mtr holds an x-latch on the tree. */

void
btr_insert_on_non_leaf_level(
/*=========================*/
	dict_index_t*	index,	/* in: index */
	ulint		level,	/* in: level, must be > 0 */
	dtuple_t*	tuple,	/* in: the record to be inserted */
	mtr_t*		mtr)	/* in: mtr */
{
	big_rec_t*	dummy_big_rec;
	btr_cur_t	cursor;
	ulint		err;
	rec_t*		rec;

	ut_ad(level > 0);

	btr_cur_search_to_nth_level(index, level, tuple, PAGE_CUR_LE,
				    BTR_CONT_MODIFY_TREE,
				    &cursor, 0, mtr);

	err = btr_cur_pessimistic_insert(BTR_NO_LOCKING_FLAG
					 | BTR_KEEP_SYS_FLAG
					 | BTR_NO_UNDO_LOG_FLAG,
					 &cursor, tuple, &rec,
					 &dummy_big_rec, NULL, 0, NULL, mtr);
	ut_a(err == DB_SUCCESS);
}
/******************************************************************
Attaches the halves of an index page on the appropriate level in an
index tree. */
static
void
btr_attach_half_pages(
/*==================*/
	dict_index_t*	index,		/* in: the index tree */
	buf_block_t*	block,		/* in/out: page to be split */
	rec_t*		split_rec,	/* in: first record on upper
					half page */
	buf_block_t*	new_block,	/* in/out: the new half page */
	ulint		direction,	/* in: FSP_UP or FSP_DOWN */
	mtr_t*		mtr)		/* in: mtr */
{
	ulint		space;
	ulint		zip_size;
	ulint		prev_page_no;
	ulint		next_page_no;
	ulint		level;
	page_t*		page		= buf_block_get_frame(block);
	page_t*		lower_page;
	page_t*		upper_page;
	ulint		lower_page_no;
	ulint		upper_page_no;
	page_zip_des_t*	lower_page_zip;
	page_zip_des_t*	upper_page_zip;
	dtuple_t*	node_ptr_upper;
	mem_heap_t*	heap;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	ut_ad(mtr_memo_contains(mtr, new_block, MTR_MEMO_PAGE_X_FIX));

	/* Create a memory heap where the data tuple is stored */
	heap = mem_heap_create(1024);

	/* Based on split direction, decide upper and lower pages */
	if (direction == FSP_DOWN) {

		btr_cur_t	cursor;
		ulint*		offsets;

		lower_page = buf_block_get_frame(new_block);
		lower_page_no = buf_block_get_page_no(new_block);
		lower_page_zip = buf_block_get_page_zip(new_block);
		upper_page = buf_block_get_frame(block);
		upper_page_no = buf_block_get_page_no(block);
		upper_page_zip = buf_block_get_page_zip(block);

		/* Look up the index for the node pointer to page */
		offsets = btr_page_get_father_block(NULL, heap, index,
						    block, mtr, &cursor);

		/* Replace the address of the old child node (= page) with the
		address of the new lower half */

		btr_node_ptr_set_child_page_no(
			btr_cur_get_rec(&cursor),
			btr_cur_get_page_zip(&cursor),
			offsets, lower_page_no, mtr);
		mem_heap_empty(heap);
	} else {
		lower_page = buf_block_get_frame(block);
		lower_page_no = buf_block_get_page_no(block);
		lower_page_zip = buf_block_get_page_zip(block);
		upper_page = buf_block_get_frame(new_block);
		upper_page_no = buf_block_get_page_no(new_block);
		upper_page_zip = buf_block_get_page_zip(new_block);
	}

	/* Get the level of the split pages */
	level = btr_page_get_level(buf_block_get_frame(block), mtr);

	/* Build the node pointer (= node key and page address) for the upper
	half */

	node_ptr_upper = dict_index_build_node_ptr(index, split_rec,
						   upper_page_no, heap, level);

	/* Insert it next to the pointer to the lower half.  Note that this
	may generate recursion leading to a split on the higher level. */

	btr_insert_on_non_leaf_level(index, level + 1, node_ptr_upper, mtr);

	/* Free the memory heap */
	mem_heap_free(heap);

	/* Get the previous and next pages of page */

	prev_page_no = btr_page_get_prev(page, mtr);
	next_page_no = btr_page_get_next(page, mtr);
	space = buf_block_get_space(block);
	zip_size = buf_block_get_zip_size(block);

	/* Update page links of the level */

	if (prev_page_no != FIL_NULL) {
		buf_block_t*	prev_block = btr_block_get(space, zip_size,
							   prev_page_no,
							   RW_X_LATCH, mtr);
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(prev_block->frame) == page_is_comp(page));
		ut_a(btr_page_get_next(prev_block->frame, mtr)
		     == buf_block_get_page_no(block));
#endif /* UNIV_BTR_DEBUG */

		btr_page_set_next(buf_block_get_frame(prev_block),
				  buf_block_get_page_zip(prev_block),
				  lower_page_no, mtr);
	}

	if (next_page_no != FIL_NULL) {
		buf_block_t*	next_block = btr_block_get(space, zip_size,
							   next_page_no,
							   RW_X_LATCH, mtr);
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(next_block->frame) == page_is_comp(page));
		ut_a(btr_page_get_prev(next_block->frame, mtr)
		     == page_get_page_no(page));
#endif /* UNIV_BTR_DEBUG */

		btr_page_set_prev(buf_block_get_frame(next_block),
				  buf_block_get_page_zip(next_block),
				  upper_page_no, mtr);
	}

	btr_page_set_prev(lower_page, lower_page_zip, prev_page_no, mtr);
	btr_page_set_next(lower_page, lower_page_zip, upper_page_no, mtr);
	btr_page_set_level(lower_page, lower_page_zip, level, mtr);

	btr_page_set_prev(upper_page, upper_page_zip, lower_page_no, mtr);
	btr_page_set_next(upper_page, upper_page_zip, next_page_no, mtr);
	btr_page_set_level(upper_page, upper_page_zip, level, mtr);
}
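The link updates at the end of btr_attach_half_pages() are a doubly linked list splice: the new half-page is inserted between the split page and one of its neighbours.  A minimal sketch for the FSP_UP case, with a toy in-memory page struct (hypothetical; the real code stores page numbers in the FIL header and FIL_NULL plays the role of NULL here):

```c
#include <assert.h>
#include <stddef.h>

/* Toy page in the level list: only the sibling links matter here */
struct toy_page {
	struct toy_page	*prev;
	struct toy_page	*next;
};

/* Splice 'upper' into the level list right after 'lower', fixing the
old successor's back link when one exists, exactly like the pointer
updates for the FSP_UP split direction. */
static void
attach_upper_half(struct toy_page *lower, struct toy_page *upper)
{
	upper->next = lower->next;
	upper->prev = lower;

	if (lower->next != NULL) {
		/* The old right neighbour now points back to upper */
		lower->next->prev = upper;
	}

	lower->next = upper;
}
```

The four pointer writes here correspond to the btr_page_set_prev()/btr_page_set_next() calls on the lower page, the upper page, and the old right neighbour.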
/*****************************************************************
Splits an index page to halves and inserts the tuple.  It is assumed
that mtr holds an x-latch to the index tree.  NOTE: the tree x-latch
is released within this function!  NOTE that the operation of this
function must always succeed, we cannot reverse it: therefore
enough free disk space must be guaranteed to be available before
this function is called. */

rec_t*
btr_page_split_and_insert(
/*======================*/
				/* out: inserted record; NOTE: the tree
				x-latch is released!  NOTE: 2 free disk
				pages must be available! */
	btr_cur_t*	cursor,	/* in: cursor at which to insert; when the
				function returns, the cursor is positioned
				on the predecessor of the inserted record */
	dtuple_t*	tuple,	/* in: tuple to insert */
	const ulint*	ext,	/* in: array of extern field numbers */
	ulint		n_ext,	/* in: number of elements in ext */
	mtr_t*		mtr)	/* in: mtr */
{
	buf_block_t*	block;
	page_t*		page;
	page_zip_des_t*	page_zip;
	ulint		page_no;
	byte		direction;
	ulint		hint_page_no;
	buf_block_t*	new_block;
	page_t*		new_page;
	page_zip_des_t*	new_page_zip;
	rec_t*		split_rec;
	buf_block_t*	left_block;
	buf_block_t*	right_block;
	buf_block_t*	insert_block;
	page_t*		insert_page;
	page_cur_t*	page_cursor;
	rec_t*		first_rec;
	byte*		buf = 0; /* remove warning */
	rec_t*		move_limit;
	ibool		insert_will_fit;
	ibool		insert_left;
	ulint		n_iterations = 0;
	rec_t*		rec;
	mem_heap_t*	heap;
	ulint		n_uniq;
	ulint*		offsets;

	heap = mem_heap_create(1024);
	n_uniq = dict_index_get_n_unique_in_tree(cursor->index);
func_start:
	mem_heap_empty(heap);
	offsets = NULL;

	ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(cursor->index),
				MTR_MEMO_X_LOCK));
#ifdef UNIV_SYNC_DEBUG
	ut_ad(rw_lock_own(dict_index_get_lock(cursor->index), RW_LOCK_EX));
#endif /* UNIV_SYNC_DEBUG */

	block = btr_cur_get_block(cursor);
	page = buf_block_get_frame(block);
	page_zip = buf_block_get_page_zip(block);

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	ut_ad(page_get_n_recs(page) >= 1);

	page_no = buf_block_get_page_no(block);

	/* 1. Decide the split record; split_rec == NULL means that the
	tuple to be inserted should be the first record on the upper
	half-page */

	if (n_iterations > 0) {
		direction = FSP_UP;
		hint_page_no = page_no + 1;
		split_rec = btr_page_get_sure_split_rec(cursor, tuple,
							ext, n_ext);
	} else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
		direction = FSP_UP;
		hint_page_no = page_no + 1;

	} else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
		direction = FSP_DOWN;
		hint_page_no = page_no - 1;
	} else {
		direction = FSP_UP;
		hint_page_no = page_no + 1;
		split_rec = page_get_middle_rec(page);
	}

	/* 2. Allocate a new page to the index */
	new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
				   btr_page_get_level(page, mtr), mtr);
	new_page = buf_block_get_frame(new_block);
	new_page_zip = buf_block_get_page_zip(new_block);
	btr_page_create(new_block, new_page_zip, cursor->index,
			btr_page_get_level(page, mtr), mtr);

	/* 3. Calculate the first record on the upper half-page, and the
	first record (move_limit) on original page which ends up on the
	upper half */

	if (split_rec) {
		first_rec = move_limit = split_rec;
	} else {
		buf = mem_alloc(rec_get_converted_size(cursor->index,
						       tuple, ext, n_ext));

		first_rec = rec_convert_dtuple_to_rec(buf, cursor->index,
						      tuple, ext, n_ext);
		move_limit = page_rec_get_next(btr_cur_get_rec(cursor));
	}

	/* 4. Do first the modifications in the tree structure */

	btr_attach_half_pages(cursor->index, block,
			      first_rec, new_block, direction, mtr);

	/* If the split is made on the leaf level and the insert will fit
	on the appropriate half-page, we may release the tree x-latch.
	We can then move the records after releasing the tree latch,
	thus reducing the tree latch contention. */

	if (split_rec) {
		offsets = rec_get_offsets(split_rec, cursor->index, offsets,
					  n_uniq, &heap);

		insert_left = cmp_dtuple_rec(tuple, split_rec, offsets) < 0;
		insert_will_fit = btr_page_insert_fits(cursor, split_rec,
						       offsets, tuple,
						       ext, n_ext, heap);
	} else {
		mem_free(buf);
		insert_left = FALSE;
		insert_will_fit = btr_page_insert_fits(cursor, NULL,
						       NULL, tuple,
						       ext, n_ext, heap);
	}

	if (insert_will_fit && page_is_leaf(page) && !page_zip) {

		mtr_memo_release(mtr, dict_index_get_lock(cursor->index),
				 MTR_MEMO_X_LOCK);
	}

	/* 5. Move then the records to the new page */
	if (direction == FSP_DOWN) {
		/*		fputs("Split left\n", stderr); */

		if (UNIV_UNLIKELY
		    (!page_move_rec_list_start(new_block, block, move_limit,
					       cursor->index, mtr))) {
			/* For some reason, compressing new_page failed,
			even though it should contain fewer records than
			the original page.  Copy the page byte for byte
			and then delete the records from both pages
			as appropriate.  Deleting will always succeed. */
			ut_a(new_page_zip);

			page_zip_copy(new_page_zip, new_page,
				      page_zip, page, cursor->index, mtr);
			page_delete_rec_list_end(move_limit - page + new_page,
						 new_block, cursor->index,
						 ULINT_UNDEFINED,
						 ULINT_UNDEFINED, mtr);
			page_delete_rec_list_start(move_limit, block,
						   cursor->index, mtr);
		}

		left_block = new_block;
		right_block = block;

		lock_update_split_left(right_block, left_block);
	} else {
		/*		fputs("Split right\n", stderr); */

		if (UNIV_UNLIKELY
		    (!page_move_rec_list_end(new_block, block, move_limit,
					     cursor->index, mtr))) {
			/* For some reason, compressing new_page failed,
			even though it should contain fewer records than
			the original page.  Copy the page byte for byte
			and then delete the records from both pages
			as appropriate.  Deleting will always succeed. */
			ut_a(new_page_zip);

			page_zip_copy(new_page_zip, new_page,
				      page_zip, page, cursor->index, mtr);
			page_delete_rec_list_start(move_limit - page
						   + new_page, new_block,
						   cursor->index, mtr);
			page_delete_rec_list_end(move_limit, block,
						 cursor->index,
						 ULINT_UNDEFINED,
						 ULINT_UNDEFINED, mtr);
		}

		left_block = block;
		right_block = new_block;

		lock_update_split_right(right_block, left_block);
	}

#ifdef UNIV_ZIP_DEBUG
	if (UNIV_LIKELY_NULL(page_zip)) {
		ut_a(page_zip_validate(page_zip, page));
		ut_a(page_zip_validate(new_page_zip, new_page));
	}
#endif /* UNIV_ZIP_DEBUG */

	/* At this point, split_rec, move_limit and first_rec may point
	to garbage on the old page. */

	/* 6. The split and the tree modification is now completed.  Decide
	the page where the tuple should be inserted */

	if (insert_left) {
		insert_block = left_block;
	} else {
		insert_block = right_block;
	}

	insert_page = buf_block_get_frame(insert_block);

	/* 7. Reposition the cursor for insert and try insertion */
	page_cursor = btr_cur_get_page_cur(cursor);

	page_cur_search(insert_block, cursor->index, tuple,
			PAGE_CUR_LE, page_cursor);

	rec = page_cur_tuple_insert(page_cursor, tuple,
				    cursor->index, ext, n_ext, mtr);

#ifdef UNIV_ZIP_DEBUG
	{
		page_zip_des_t*	insert_page_zip
			= buf_block_get_page_zip(insert_block);
		ut_a(!insert_page_zip
		     || page_zip_validate(insert_page_zip, insert_page));
	}
#endif /* UNIV_ZIP_DEBUG */

	if (UNIV_LIKELY(rec != NULL)) {

		goto func_exit;
	}

	/* 8. If insert did not fit, try page reorganization */

	if (UNIV_UNLIKELY
	    (!btr_page_reorganize(insert_block, cursor->index, mtr))) {

		goto insert_failed;
	}

	page_cur_search(insert_block, cursor->index, tuple,
			PAGE_CUR_LE, page_cursor);
	rec = page_cur_tuple_insert(page_cursor, tuple, cursor->index,
				    ext, n_ext, mtr);

	if (UNIV_UNLIKELY(rec == NULL)) {
		/* The insert did not fit on the page: loop back to the
		start of the function for a new split */
insert_failed:
		/* We play safe and reset the free bits for new_page */
		ibuf_reset_free_bits_with_type(cursor->index->type, new_block);

		/*		fprintf(stderr, "Split second round %lu\n",
		page_get_page_no(page)); */
		n_iterations++;
		ut_ad(n_iterations < 2
		      || buf_block_get_page_zip(insert_block));
		ut_ad(!insert_will_fit
		      || buf_block_get_page_zip(insert_block));

		goto func_start;
	}

func_exit:
	/* Insert fit on the page: update the free bits for the
	left and right pages in the same mtr */

	if (!dict_index_is_clust(cursor->index) && page_is_leaf(page)) {
		ibuf_update_free_bits_for_two_pages_low(
			cursor->index,
			buf_block_get_zip_size(left_block),
			left_block, right_block, mtr);
	}

#if 0
	fprintf(stderr, "Split and insert done %lu %lu\n",
		buf_block_get_page_no(left_block),
		buf_block_get_page_no(right_block));
#endif

	ut_ad(page_validate(buf_block_get_frame(left_block), cursor->index));
	ut_ad(page_validate(buf_block_get_frame(right_block), cursor->index));

	mem_heap_free(heap);
	return(rec);
}
/*****************************************************************
Removes a page from the level list of pages. */
static
void
btr_level_list_remove(
/*==================*/
	ulint		space,	/* in: space where removed */
	ulint		zip_size,/* in: compressed page size in bytes
				or 0 for uncompressed pages */
	page_t*		page,	/* in: page to remove */
	mtr_t*		mtr)	/* in: mtr */
{
	ulint	prev_page_no;
	ulint	next_page_no;

	ut_ad(page && mtr);
	ut_ad(mtr_memo_contains_page(mtr, page, MTR_MEMO_PAGE_X_FIX));
	ut_ad(space == page_get_space_id(page));

	/* Get the previous and next page numbers of page */

	prev_page_no = btr_page_get_prev(page, mtr);
	next_page_no = btr_page_get_next(page, mtr);

	/* Update page links of the level */

	if (prev_page_no != FIL_NULL) {
		buf_block_t*	prev_block
			= btr_block_get(space, zip_size, prev_page_no,
					RW_X_LATCH, mtr);
		page_t*		prev_page
			= buf_block_get_frame(prev_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(prev_page) == page_is_comp(page));
		ut_a(btr_page_get_next(prev_page, mtr)
		     == page_get_page_no(page));
#endif /* UNIV_BTR_DEBUG */

		btr_page_set_next(prev_page,
				  buf_block_get_page_zip(prev_block),
				  next_page_no, mtr);
	}

	if (next_page_no != FIL_NULL) {
		buf_block_t*	next_block
			= btr_block_get(space, zip_size, next_page_no,
					RW_X_LATCH, mtr);
		page_t*		next_page
			= buf_block_get_frame(next_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(page_is_comp(next_page) == page_is_comp(page));
		ut_a(btr_page_get_prev(next_page, mtr)
		     == page_get_page_no(page));
#endif /* UNIV_BTR_DEBUG */

		btr_page_set_prev(next_page,
				  buf_block_get_page_zip(next_block),
				  prev_page_no, mtr);
	}
}
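btr_level_list_remove() is the unlink counterpart of the splice done during a split: each existing neighbour is re-pointed past the removed page.  A minimal sketch with a toy in-memory struct (hypothetical; in the real code the links are page numbers in the FIL header, latched and logged through the mtr):

```c
#include <assert.h>
#include <stddef.h>

/* Toy page in the level list */
struct toy_page {
	struct toy_page	*prev;
	struct toy_page	*next;
};

/* Unlink 'page' from the doubly linked level list, fixing the
neighbours on both sides when they exist (NULL stands in for
FIL_NULL at the ends of the list). */
static void
level_list_remove(struct toy_page *page)
{
	if (page->prev != NULL) {
		page->prev->next = page->next;
	}

	if (page->next != NULL) {
		page->next->prev = page->prev;
	}
}
```

Note that only the neighbours are updated; the removed page's own links are left to be overwritten when the page is freed or reused, as in the real function.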
/********************************************************************
Writes the redo log record for setting an index record as the predefined
minimum record. */
UNIV_INLINE
void
btr_set_min_rec_mark_log(
/*=====================*/
	rec_t*	rec,	/* in: record */
	byte	type,	/* in: MLOG_COMP_REC_MIN_MARK or MLOG_REC_MIN_MARK */
	mtr_t*	mtr)	/* in: mtr */
{
	mlog_write_initial_log_record(rec, type, mtr);

	/* Write rec offset as a 2-byte ulint */
	mlog_catenate_ulint(mtr, page_offset(rec), MLOG_2BYTES);
}

/********************************************************************
Parses the redo log record for setting an index record as the predefined
minimum record. */

byte*
btr_parse_set_min_rec_mark(
/*=======================*/
			/* out: end of log record or NULL */
	byte*	ptr,	/* in: buffer */
	byte*	end_ptr,/* in: buffer end */
	ulint	comp,	/* in: nonzero=compact page format */
	page_t*	page,	/* in: page or NULL */
	mtr_t*	mtr)	/* in: mtr or NULL */
{
	rec_t*	rec;

	if (end_ptr < ptr + 2) {

		return(NULL);
	}

	if (page) {
		ut_a(!page_is_comp(page) == !comp);

		rec = page + mach_read_from_2(ptr);

		btr_set_min_rec_mark(rec, mtr);
	}

	return(ptr + 2);
}

/********************************************************************
Sets a record as the predefined minimum record. */

void
btr_set_min_rec_mark(
/*=================*/
	rec_t*	rec,	/* in: record */
	mtr_t*	mtr)	/* in: mtr */
{
	ulint	info_bits;

	if (UNIV_LIKELY(page_rec_is_comp(rec))) {
		info_bits = rec_get_info_bits(rec, TRUE);

		rec_set_info_bits_new(rec, info_bits | REC_INFO_MIN_REC_FLAG);

		btr_set_min_rec_mark_log(rec, MLOG_COMP_REC_MIN_MARK, mtr);
	} else {
		info_bits = rec_get_info_bits(rec, FALSE);

		rec_set_info_bits_old(rec, info_bits | REC_INFO_MIN_REC_FLAG);

		btr_set_min_rec_mark_log(rec, MLOG_REC_MIN_MARK, mtr);
	}
}
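btr_set_min_rec_mark() ORs REC_INFO_MIN_REC_FLAG into the record's info bits so that the other bits (such as the delete mark) are preserved.  A minimal sketch of just that bit update (the 0x10 value matches REC_INFO_MIN_REC_FLAG in rem0rec.h; the helper itself is illustrative, not InnoDB API):

```c
#include <assert.h>

/* Value assumed from rem0rec.h for illustration */
#define TOY_REC_INFO_MIN_REC_FLAG	0x10UL

/* Return the info bits with the minimum-record flag set;
OR-ing keeps any other info bits intact. */
static unsigned long
set_min_rec_flag(unsigned long info_bits)
{
	return info_bits | TOY_REC_INFO_MIN_REC_FLAG;
}
```

The operation is idempotent, which matters during redo log replay: applying the parsed MLOG_REC_MIN_MARK record twice leaves the same bits set.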
/*****************************************************************
Deletes on the upper level the node pointer to a page. */

void
btr_node_ptr_delete(
/*================*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: page whose node pointer is deleted */
	mtr_t*		mtr)	/* in: mtr */
{
	btr_cur_t	cursor;
	ibool		compressed;
	ulint		err;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

	/* Delete node pointer on father page */
	btr_page_get_father(index, block, mtr, &cursor);

	compressed = btr_cur_pessimistic_delete(&err, TRUE, &cursor, FALSE,
						mtr);
	ut_a(err == DB_SUCCESS);

	if (!compressed) {
		btr_cur_compress_if_useful(&cursor, mtr);
	}
}
/*****************************************************************
If page is the only one on its level, this function moves its records to the
father page, thus reducing the tree height. */
static
void
btr_lift_page_up(
/*=============*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: page which is the only one on its
				level; must not be empty: use
				btr_discard_only_page_on_level if the last
				record from the page should be removed */
	mtr_t*		mtr)	/* in: mtr */
{
	buf_block_t*	father_block;
	page_t*		father_page;
	ulint		page_level;
	page_zip_des_t*	father_page_zip;
	page_t*		page		= buf_block_get_frame(block);
	ulint		root_page_no;
	buf_block_t*	blocks[BTR_MAX_LEVELS];
	ulint		n_blocks;	/* last used index in blocks[] */
	ulint		i;

	ut_ad(btr_page_get_prev(page, mtr) == FIL_NULL);
	ut_ad(btr_page_get_next(page, mtr) == FIL_NULL);
	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

	page_level = btr_page_get_level(page, mtr);
	root_page_no = dict_index_get_page(index);

	{
		btr_cur_t	cursor;
		mem_heap_t*	heap	= mem_heap_create(100);
		ulint*		offsets;
		buf_block_t*	b;

		offsets = btr_page_get_father_block(NULL, heap, index,
						    block, mtr, &cursor);
		father_block = btr_cur_get_block(&cursor);
		father_page_zip = buf_block_get_page_zip(father_block);
		father_page = buf_block_get_frame(father_block);

		n_blocks = 0;

		/* Store all ancestor pages so we can reset their
		levels later on.  We have to do all the searches on
		the tree now because later on, after we've replaced
		the first level, the tree is in an inconsistent state
		and cannot be searched. */
		for (b = father_block;
		     buf_block_get_page_no(b) != root_page_no; ) {
			ut_a(n_blocks < BTR_MAX_LEVELS);

			offsets = btr_page_get_father_block(offsets, heap,
							    index, b,
							    mtr, &cursor);

			blocks[n_blocks++] = b = btr_cur_get_block(&cursor);
		}

		mem_heap_free(heap);
	}

	btr_search_drop_page_hash_index(block);

	/* Make the father empty */
	btr_page_empty(father_block, father_page_zip, mtr, index);

	/* Set the level before inserting records, because
	page_zip_compress() requires that the first user record
	on a non-leaf page has the min_rec_mark set. */
	btr_page_set_level(father_page, father_page_zip, page_level, mtr);

	/* Copy the records to the father page one by one. */
	if (UNIV_UNLIKELY
	    (!page_copy_rec_list_end(father_block, block,
				     page_get_infimum_rec(page),
				     index, mtr))) {
		ut_a(father_page_zip);

		/* Copy the page byte for byte. */
		page_zip_copy(father_page_zip, father_page,
			      buf_block_get_page_zip(block),
			      page, index, mtr);
	}

	lock_update_copy_and_discard(father_block, block);

	/* Go upward to root page, decrementing levels by one. */
	for (i = 0; i < n_blocks; i++, page_level++) {
		page_t*	page	= buf_block_get_frame(blocks[i]);

		ut_ad(btr_page_get_level(page, mtr) == page_level + 1);

		btr_page_set_level(page, buf_block_get_page_zip(blocks[i]),
				   page_level, mtr);
	}

	/* Free the file page */
	btr_page_free(index, block, mtr);

	/* We play safe and reset the free bits for the father */
	ibuf_reset_free_bits_with_type(index->type, father_block);

	ut_ad(page_validate(father_page, index));
	ut_ad(btr_check_node_ptr(index, father_block, mtr));
}
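A toy illustration of the ancestor-level bookkeeping above (hypothetical sketch, not InnoDB code): after the only page on a level is lifted into its father, every stored ancestor must drop one level, and each ancestor must have been exactly one level above the page that replaced it. Here `levels[]` and `sketch_lift_levels()` stand in for `blocks[]` and the "go upward to root" loop; the names are invented for the sketch.

```c
#include <assert.h>

/* Hypothetical sketch mirroring the level-decrement loop in
btr_lift_page_up(): levels[0..n_blocks-1] are the recorded levels of
the ancestors (father first, root last), page_level is the level of
the lifted page.  Each ancestor's level is decremented by one.
Returns the resulting level of the last (topmost) ancestor plus the
walk position, i.e. page_level after the loop. */
int
sketch_lift_levels(int levels[], int n_blocks, int page_level)
{
	int	i;

	for (i = 0; i < n_blocks; i++, page_level++) {
		/* Invariant from the real code: the ancestor was
		exactly one level above the page below it. */
		assert(levels[i] == page_level + 1);

		levels[i] = page_level;
	}

	return(page_level);
}
```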
/*****************************************************************
Tries to merge the page first to the left immediate brother if such a
brother exists, and the node pointers to the current page and to the brother
reside on the same page. If the left brother does not satisfy these
conditions, looks at the right brother. If the page is the only one on that
level, lifts the records of the page to the father page, thus reducing the
tree height. It is assumed that mtr holds an x-latch on the tree and on the
page. If cursor is on the leaf level, mtr must also hold x-latches to the
brothers, if they exist. */
ibool
btr_compress(
/*=========*/
				/* out: TRUE on success */
	btr_cur_t*	cursor,	/* in: cursor on the page to merge or lift;
				the page must not be empty: in record delete
				use btr_discard_page if the page would become
				empty */
	mtr_t*		mtr)	/* in: mtr */
{
	dict_index_t*	index;
	ulint		space;
	ulint		zip_size;
	ulint		left_page_no;
	ulint		right_page_no;
	buf_block_t*	merge_block;
	page_t*		merge_page;
	page_zip_des_t*	merge_page_zip;
	ibool		is_left;
	buf_block_t*	block;
	page_t*		page;
	btr_cur_t	father_cursor;
	mem_heap_t*	heap;
	ulint*		offsets;
	ulint		data_size;
	ulint		n_recs;
	ulint		max_ins_size;
	ulint		max_ins_size_reorg;
	ulint		level;

	block = btr_cur_get_block(cursor);
	page = btr_cur_get_page(cursor);
	index = btr_cur_get_index(cursor);
	ut_a((ibool) !!page_is_comp(page) == dict_table_is_comp(index->table));

	ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
				MTR_MEMO_X_LOCK));
	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	level = btr_page_get_level(page, mtr);
	space = dict_index_get_space(index);
	zip_size = dict_table_zip_size(index->table);

	left_page_no = btr_page_get_prev(page, mtr);
	right_page_no = btr_page_get_next(page, mtr);

#if 0
	fprintf(stderr, "Merge left page %lu right %lu \n",
		left_page_no, right_page_no);
#endif

	heap = mem_heap_create(100);
	offsets = btr_page_get_father_block(NULL, heap, index, block, mtr,
					    &father_cursor);

	/* Decide the page to which we try to merge and which will inherit
	the locks */

	is_left = left_page_no != FIL_NULL;

	if (is_left) {

		merge_block = btr_block_get(space, zip_size, left_page_no,
					    RW_X_LATCH, mtr);
		merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(btr_page_get_next(merge_page, mtr)
		     == buf_block_get_page_no(block));
#endif /* UNIV_BTR_DEBUG */
	} else if (right_page_no != FIL_NULL) {

		merge_block = btr_block_get(space, zip_size, right_page_no,
					    RW_X_LATCH, mtr);
		merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(btr_page_get_prev(merge_page, mtr)
		     == buf_block_get_page_no(block));
#endif /* UNIV_BTR_DEBUG */
	} else {
		/* The page is the only one on the level, lift the records
		to the father */
		btr_lift_page_up(index, block, mtr);
		mem_heap_free(heap);
		return(TRUE);
	}

	n_recs = page_get_n_recs(page);
	data_size = page_get_data_size(page);
#ifdef UNIV_BTR_DEBUG
	ut_a(page_is_comp(merge_page) == page_is_comp(page));
#endif /* UNIV_BTR_DEBUG */

	max_ins_size_reorg = page_get_max_insert_size_after_reorganize(
		merge_page, n_recs);
	if (data_size > max_ins_size_reorg) {

		/* No space for merge */
err_exit:
		mem_heap_free(heap);
		return(FALSE);
	}

	ut_ad(page_validate(merge_page, index));

	max_ins_size = page_get_max_insert_size(merge_page, n_recs);

	if (UNIV_UNLIKELY(data_size > max_ins_size)) {

		/* We have to reorganize merge_page */

		if (UNIV_UNLIKELY(!btr_page_reorganize(merge_block,
						       index, mtr))) {

			goto err_exit;
		}

		max_ins_size = page_get_max_insert_size(merge_page, n_recs);

		ut_ad(page_validate(merge_page, index));
		ut_ad(max_ins_size == max_ins_size_reorg);

		if (UNIV_UNLIKELY(data_size > max_ins_size)) {

			/* Add fault tolerance, though this should
			never happen */

			goto err_exit;
		}
	}

	merge_page_zip = buf_block_get_page_zip(merge_block);
#ifdef UNIV_ZIP_DEBUG
	if (UNIV_LIKELY_NULL(merge_page_zip)) {
		ut_a(page_zip_validate(merge_page_zip, merge_page));
		ut_a(page_zip_validate(buf_block_get_page_zip(block), page));
	}
#endif /* UNIV_ZIP_DEBUG */

	/* Move records to the merge page */
	if (is_left) {
		rec_t*	orig_pred = page_copy_rec_list_start(
			merge_block, block, page_get_supremum_rec(page),
			index, mtr);

		if (UNIV_UNLIKELY(!orig_pred)) {
			goto err_exit;
		}

		btr_search_drop_page_hash_index(block);

		/* Remove the page from the level list */
		btr_level_list_remove(space, zip_size, page, mtr);

		btr_node_ptr_delete(index, block, mtr);
		lock_update_merge_left(merge_block, orig_pred, block);
	} else {
		rec_t*		orig_succ;
#ifdef UNIV_BTR_DEBUG
		byte		fil_page_prev[4];
#endif /* UNIV_BTR_DEBUG */

		if (UNIV_LIKELY_NULL(merge_page_zip)) {
			/* The function page_zip_compress(), which will be
			invoked by page_copy_rec_list_end() below,
			requires that FIL_PAGE_PREV be FIL_NULL.
			Clear the field, but prepare to restore it. */
#ifdef UNIV_BTR_DEBUG
			memcpy(fil_page_prev, merge_page + FIL_PAGE_PREV, 4);
#endif /* UNIV_BTR_DEBUG */
#if FIL_NULL != 0xffffffff
# error "FIL_NULL != 0xffffffff"
#endif
			memset(merge_page + FIL_PAGE_PREV, 0xff, 4);
		}

		orig_succ = page_copy_rec_list_end(merge_block, block,
						   page_get_infimum_rec(page),
						   cursor->index, mtr);

		if (UNIV_UNLIKELY(!orig_succ)) {
			ut_a(merge_page_zip);
#ifdef UNIV_BTR_DEBUG
			/* FIL_PAGE_PREV was restored from merge_page_zip. */
			ut_a(!memcmp(fil_page_prev,
				     merge_page + FIL_PAGE_PREV, 4));
#endif /* UNIV_BTR_DEBUG */
			goto err_exit;
		}

		btr_search_drop_page_hash_index(block);

#ifdef UNIV_BTR_DEBUG
		if (UNIV_LIKELY_NULL(merge_page_zip)) {
			/* Restore FIL_PAGE_PREV in order to avoid an
			assertion failure in btr_level_list_remove(), which
			will set the field again to FIL_NULL.  Even though
			this makes merge_page and merge_page_zip
			inconsistent for a split second, it is harmless,
			because the pages are X-latched. */
			memcpy(merge_page + FIL_PAGE_PREV, fil_page_prev, 4);
		}
#endif /* UNIV_BTR_DEBUG */

		/* Remove the page from the level list */
		btr_level_list_remove(space, zip_size, page, mtr);

		/* Replace the address of the old child node (= page) with the
		address of the merge page to the right */
		btr_node_ptr_set_child_page_no(
			btr_cur_get_rec(&father_cursor),
			btr_cur_get_page_zip(&father_cursor),
			offsets, right_page_no, mtr);
		btr_node_ptr_delete(index, merge_block, mtr);

		lock_update_merge_right(merge_block, orig_succ, block);
	}

	mem_heap_free(heap);

	if (!dict_index_is_clust(index) && page_is_leaf(merge_page)) {
		/* We have added new records to merge_page:
		update its free bits */
		ibuf_update_free_bits_if_full(index, zip_size, merge_block,
					      UNIV_PAGE_SIZE, ULINT_UNDEFINED);
	}

	ut_ad(page_validate(merge_page, index));

	/* Free the file page */
	btr_page_free(index, block, mtr);

	ut_ad(btr_check_node_ptr(index, merge_block, mtr));

	return(TRUE);
}
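The space check in btr_compress() can be condensed into a small decision function (hypothetical sketch, not InnoDB code): a merge is refused when the page's data would not fit in the brother even after reorganization, and the brother is reorganized when the data fits only after defragmentation. The sizes stand in for `page_get_data_size()`, `page_get_max_insert_size()` and `page_get_max_insert_size_after_reorganize()`; the `sketch_` names are invented.

```c
#include <assert.h>

enum sketch_merge_outcome {
	SKETCH_MERGE_FAILS,		/* no space even after reorganize */
	SKETCH_MERGE_NEEDS_REORG,	/* fits, but only after reorganize */
	SKETCH_MERGE_FITS		/* fits as-is */
};

/* Decide the outcome of a merge attempt, mirroring the checks made by
btr_compress() against the merge target page. */
enum sketch_merge_outcome
sketch_merge_check(unsigned long data_size,
		   unsigned long max_ins_size,
		   unsigned long max_ins_size_reorg)
{
	/* Reorganizing a page can only create free space, never lose it. */
	assert(max_ins_size <= max_ins_size_reorg);

	if (data_size > max_ins_size_reorg) {
		return(SKETCH_MERGE_FAILS);
	}

	if (data_size > max_ins_size) {
		return(SKETCH_MERGE_NEEDS_REORG);
	}

	return(SKETCH_MERGE_FITS);
}
```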
/*****************************************************************
Discards a page that is the only page on its level. */
static
void
btr_discard_only_page_on_level(
/*===========================*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: page which is the only one on its
				level */
	mtr_t*		mtr)	/* in: mtr */
{
	btr_cur_t	father_cursor;
	buf_block_t*	father_block;
	page_t*		father_page;
	page_zip_des_t*	father_page_zip;
	page_t*		page	= buf_block_get_frame(block);
	ulint		page_level;

	ut_ad(btr_page_get_prev(page, mtr) == FIL_NULL);
	ut_ad(btr_page_get_next(page, mtr) == FIL_NULL);
	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

	btr_search_drop_page_hash_index(block);

	btr_page_get_father(index, block, mtr, &father_cursor);
	father_block = btr_cur_get_block(&father_cursor);
	father_page_zip = buf_block_get_page_zip(father_block);
	father_page = buf_block_get_frame(father_block);

	page_level = btr_page_get_level(page, mtr);

	lock_update_discard(father_block, PAGE_HEAP_NO_SUPREMUM, block);

	btr_page_set_level(father_page, father_page_zip, page_level, mtr);

	/* Free the file page */
	btr_page_free(index, block, mtr);

	if (UNIV_LIKELY(buf_block_get_page_no(father_block)
			== dict_index_get_page(index))) {
		/* The father is the root page */

		btr_page_empty(father_block, father_page_zip, mtr, index);

		/* We play safe and reset the free bits for the father */
		ibuf_reset_free_bits_with_type(index->type, father_block);
	} else {
		ut_ad(page_get_n_recs(father_page) == 1);

		btr_discard_only_page_on_level(index, father_block, mtr);
	}
}
/*****************************************************************
Discards a page from a B-tree. This is used to remove the last record from
a B-tree page: the whole page must be removed at the same time. This cannot
be used for the root page, which is allowed to be empty. */
void
btr_discard_page(
/*=============*/
	btr_cur_t*	cursor,	/* in: cursor on the page to discard: not on
				the root page */
	mtr_t*		mtr)	/* in: mtr */
{
	dict_index_t*	index;
	ulint		space;
	ulint		zip_size;
	ulint		left_page_no;
	ulint		right_page_no;
	buf_block_t*	merge_block;
	page_t*		merge_page;
	buf_block_t*	block;
	page_t*		page;
	rec_t*		node_ptr;

	block = btr_cur_get_block(cursor);
	index = btr_cur_get_index(cursor);

	ut_ad(dict_index_get_page(index) != buf_block_get_page_no(block));
	ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
				MTR_MEMO_X_LOCK));
	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));

	space = dict_index_get_space(index);
	zip_size = dict_table_zip_size(index->table);

	/* Decide the page which will inherit the locks */

	left_page_no = btr_page_get_prev(buf_block_get_frame(block), mtr);
	right_page_no = btr_page_get_next(buf_block_get_frame(block), mtr);

	if (left_page_no != FIL_NULL) {
		merge_block = btr_block_get(space, zip_size, left_page_no,
					    RW_X_LATCH, mtr);
		merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(btr_page_get_next(merge_page, mtr)
		     == buf_block_get_page_no(block));
#endif /* UNIV_BTR_DEBUG */
	} else if (right_page_no != FIL_NULL) {
		merge_block = btr_block_get(space, zip_size, right_page_no,
					    RW_X_LATCH, mtr);
		merge_page = buf_block_get_frame(merge_block);
#ifdef UNIV_BTR_DEBUG
		ut_a(btr_page_get_prev(merge_page, mtr)
		     == buf_block_get_page_no(block));
#endif /* UNIV_BTR_DEBUG */
	} else {
		btr_discard_only_page_on_level(index, block, mtr);

		return;
	}

	page = buf_block_get_frame(block);
	ut_a(page_is_comp(merge_page) == page_is_comp(page));
	btr_search_drop_page_hash_index(block);

	if (left_page_no == FIL_NULL && !page_is_leaf(page)) {

		/* We have to mark the leftmost node pointer on the right
		side page as the predefined minimum record */
		node_ptr = page_rec_get_next(page_get_infimum_rec(merge_page));

		ut_ad(page_rec_is_user_rec(node_ptr));

		/* This will make page_zip_validate() fail on merge_page
		until btr_level_list_remove() completes.  This is harmless,
		because everything will take place within a single
		mini-transaction and because writing to the redo log
		is an atomic operation (performed by mtr_commit()). */
		btr_set_min_rec_mark(node_ptr, mtr);
	}

	btr_node_ptr_delete(index, block, mtr);

	/* Remove the page from the level list */
	btr_level_list_remove(space, zip_size, page, mtr);
#ifdef UNIV_ZIP_DEBUG
	{
		page_zip_des_t*	merge_page_zip
			= buf_block_get_page_zip(merge_block);
		ut_a(!merge_page_zip
		     || page_zip_validate(merge_page_zip, merge_page));
	}
#endif /* UNIV_ZIP_DEBUG */

	if (left_page_no != FIL_NULL) {
		lock_update_discard(merge_block, PAGE_HEAP_NO_SUPREMUM,
				    block);
	} else {
		lock_update_discard(merge_block,
				    lock_get_min_heap_no(merge_block),
				    block);
	}

	/* Free the file page */
	btr_page_free(index, block, mtr);

	ut_ad(btr_check_node_ptr(index, merge_block, mtr));
}
#ifdef UNIV_BTR_PRINT
/*****************************************************************
Prints size info of a B-tree. */
void
btr_print_size(
/*===========*/
	dict_index_t*	index)	/* in: index tree */
{
	page_t*		root;
	fseg_header_t*	seg;
	mtr_t		mtr;

	if (index->type & DICT_IBUF) {
		fputs("Sorry, cannot print info of an ibuf tree:"
		      " use ibuf functions\n", stderr);

		return;
	}

	mtr_start(&mtr);

	root = btr_root_get(index, &mtr);

	seg = root + PAGE_HEADER + PAGE_BTR_SEG_TOP;

	fputs("INFO OF THE NON-LEAF PAGE SEGMENT\n", stderr);
	fseg_print(seg, &mtr);

	if (!(index->type & DICT_UNIVERSAL)) {

		seg = root + PAGE_HEADER + PAGE_BTR_SEG_LEAF;

		fputs("INFO OF THE LEAF PAGE SEGMENT\n", stderr);
		fseg_print(seg, &mtr);
	}

	mtr_commit(&mtr);
}

/****************************************************************
Prints recursively index tree pages. */
static
void
btr_print_recursive(
/*================*/
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: index page */
	ulint		width,	/* in: print this many entries from start
				and end */
	mem_heap_t**	heap,	/* in/out: heap for rec_get_offsets() */
	ulint**		offsets,/* in/out: buffer for rec_get_offsets() */
	mtr_t*		mtr)	/* in: mtr */
{
	const page_t*	page	= buf_block_get_frame(block);
	page_cur_t	cursor;
	ulint		n_recs;
	ulint		i	= 0;
	mtr_t		mtr2;

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	fprintf(stderr, "NODE ON LEVEL %lu page number %lu\n",
		(ulong) btr_page_get_level(page, mtr),
		(ulong) buf_block_get_page_no(block));

	page_print(block, index, width, width);

	n_recs = page_get_n_recs(page);

	page_cur_set_before_first(block, &cursor);
	page_cur_move_to_next(&cursor);

	while (!page_cur_is_after_last(&cursor)) {

		if (page_is_leaf(page)) {

			/* If this is the leaf level, do nothing */

		} else if ((i <= width) || (i >= n_recs - width)) {

			const rec_t*	node_ptr;

			mtr_start(&mtr2);

			node_ptr = page_cur_get_rec(&cursor);

			*offsets = rec_get_offsets(node_ptr, index, *offsets,
						   ULINT_UNDEFINED, heap);
			btr_print_recursive(index,
					    btr_node_ptr_get_child(node_ptr,
								   index,
								   *offsets,
								   &mtr2),
					    width, heap, offsets, &mtr2);
			mtr_commit(&mtr2);
		}

		page_cur_move_to_next(&cursor);
		i++;
	}
}

/******************************************************************
Prints directories and other info of all nodes in the tree. */
void
btr_print_index(
/*============*/
	dict_index_t*	index,	/* in: index */
	ulint		width)	/* in: print this many entries from start
				and end */
{
	mtr_t		mtr;
	buf_block_t*	root;
	mem_heap_t*	heap	= NULL;
	ulint		offsets_[REC_OFFS_NORMAL_SIZE];
	ulint*		offsets	= offsets_;
	*offsets_ = (sizeof offsets_) / sizeof *offsets_;

	fputs("--------------------------\n"
	      "INDEX TREE PRINT\n", stderr);

	mtr_start(&mtr);

	root = btr_root_block_get(index, &mtr);

	btr_print_recursive(index, root, width, &heap, &offsets, &mtr);
	if (UNIV_LIKELY_NULL(heap)) {
		mem_heap_free(heap);
	}

	mtr_commit(&mtr);

	btr_validate_index(index, NULL);
}
#endif /* UNIV_BTR_PRINT */
#ifdef UNIV_DEBUG
/****************************************************************
Checks that the node pointer to a page is appropriate. */
ibool
btr_check_node_ptr(
/*===============*/
				/* out: TRUE */
	dict_index_t*	index,	/* in: index tree */
	buf_block_t*	block,	/* in: index page */
	mtr_t*		mtr)	/* in: mtr */
{
	mem_heap_t*	heap;
	dtuple_t*	tuple;
	ulint*		offsets;
	btr_cur_t	cursor;
	page_t*		page = buf_block_get_frame(block);

	ut_ad(mtr_memo_contains(mtr, block, MTR_MEMO_PAGE_X_FIX));
	if (dict_index_get_page(index) == buf_block_get_page_no(block)) {

		return(TRUE);
	}

	heap = mem_heap_create(256);
	offsets = btr_page_get_father_block(NULL, heap, index, block, mtr,
					    &cursor);

	if (page_is_leaf(page)) {

		goto func_exit;
	}

	tuple = dict_index_build_node_ptr(
		index, page_rec_get_next(page_get_infimum_rec(page)), 0, heap,
		btr_page_get_level(page, mtr));

	ut_a(!cmp_dtuple_rec(tuple, btr_cur_get_rec(&cursor), offsets));
func_exit:
	mem_heap_free(heap);

	return(TRUE);
}
#endif /* UNIV_DEBUG */
/****************************************************************
Display identification information for a record. */
static
void
btr_index_rec_validate_report(
/*==========================*/
	const page_t*	page,	/* in: index page */
	const rec_t*	rec,	/* in: index record */
	dict_index_t*	index)	/* in: index */
{
	fputs("InnoDB: Record in ", stderr);
	dict_index_name_print(stderr, NULL, index);
	fprintf(stderr, ", page %lu, at offset %lu\n",
		page_get_page_no(page), (ulint) page_offset(rec));
}

/****************************************************************
Checks the size and number of fields in a record based on the definition of
the index. */
ibool
btr_index_rec_validate(
/*===================*/
					/* out: TRUE if ok */
	rec_t*		rec,		/* in: index record */
	dict_index_t*	index,		/* in: index */
	ibool		dump_on_error)	/* in: TRUE if the function
					should print hex dump of record
					and page on error */
{
	ulint		len;
	ulint		n;
	ulint		i;
	page_t*		page;
	mem_heap_t*	heap	= NULL;
	ulint		offsets_[REC_OFFS_NORMAL_SIZE];
	ulint*		offsets	= offsets_;
	*offsets_ = (sizeof offsets_) / sizeof *offsets_;

	page = page_align(rec);

	if (UNIV_UNLIKELY(index->type & DICT_UNIVERSAL)) {
		/* The insert buffer index tree can contain records from any
		other index: we cannot check the number of fields or
		their length */

		return(TRUE);
	}

	if (UNIV_UNLIKELY((ibool)!!page_is_comp(page)
			  != dict_table_is_comp(index->table))) {
		btr_index_rec_validate_report(page, rec, index);
		fprintf(stderr, "InnoDB: compact flag=%lu, should be %lu\n",
			(ulong) !!page_is_comp(page),
			(ulong) dict_table_is_comp(index->table));

		return(FALSE);
	}

	n = dict_index_get_n_fields(index);

	if (!page_is_comp(page)
	    && UNIV_UNLIKELY(rec_get_n_fields_old(rec) != n)) {
		btr_index_rec_validate_report(page, rec, index);
		fprintf(stderr, "InnoDB: has %lu fields, should have %lu\n",
			(ulong) rec_get_n_fields_old(rec), (ulong) n);

		if (dump_on_error) {
			buf_page_print(page, 0);

			fputs("InnoDB: corrupt record ", stderr);
			rec_print_old(stderr, rec);
			putc('\n', stderr);
		}
		return(FALSE);
	}

	offsets = rec_get_offsets(rec, index, offsets, ULINT_UNDEFINED, &heap);

	for (i = 0; i < n; i++) {
		ulint	fixed_size = dict_col_get_fixed_size(
			dict_index_get_nth_col(index, i));

		rec_get_nth_field_offs(offsets, i, &len);

		/* Note that prefix indexes are not fixed size even when
		their type is CHAR. */

		if ((dict_index_get_nth_field(index, i)->prefix_len == 0
		     && len != UNIV_SQL_NULL && fixed_size
		     && len != fixed_size)
		    || (dict_index_get_nth_field(index, i)->prefix_len > 0
			&& len != UNIV_SQL_NULL
			&& len
			> dict_index_get_nth_field(index, i)->prefix_len)) {

			btr_index_rec_validate_report(page, rec, index);
			fprintf(stderr,
				"InnoDB: field %lu len is %lu,"
				" should be %lu\n",
				(ulong) i, (ulong) len, (ulong) fixed_size);

			if (dump_on_error) {
				buf_page_print(page, 0);

				fputs("InnoDB: corrupt record ", stderr);
				rec_print_new(stderr, rec, offsets);
				putc('\n', stderr);
			}
			if (UNIV_LIKELY_NULL(heap)) {
				mem_heap_free(heap);
			}
			return(FALSE);
		}
	}

	if (UNIV_LIKELY_NULL(heap)) {
		mem_heap_free(heap);
	}
	return(TRUE);
}
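The per-field length rule enforced by btr_index_rec_validate() can be isolated as a small predicate (hypothetical sketch, not InnoDB code): a non-NULL value in a fixed-size, non-prefix column must have exactly the fixed length, and a non-NULL value in a prefix column must not exceed the prefix length. `SKETCH_SQL_NULL` stands in for `UNIV_SQL_NULL`, and the `sketch_` names are invented.

```c
#include <assert.h>

#define SKETCH_SQL_NULL	0xFFFFFFFFUL	/* stands in for UNIV_SQL_NULL */

/* Return 1 if the stored length is acceptable for the column
definition, 0 if it would be flagged as corrupt.  fixed_size is 0 for
a variable-length column; prefix_len is 0 when the column is not a
prefix column (prefix columns are never treated as fixed size, even
when their type is CHAR). */
int
sketch_field_len_ok(unsigned long len,		/* stored length */
		    unsigned long fixed_size,	/* 0 if variable-length */
		    unsigned long prefix_len)	/* 0 if not a prefix column */
{
	if (len == SKETCH_SQL_NULL) {
		return(1);	/* SQL NULL carries no length to check */
	}

	if (prefix_len == 0 && fixed_size && len != fixed_size) {
		return(0);	/* fixed-size column with wrong length */
	}

	if (prefix_len > 0 && len > prefix_len) {
		return(0);	/* value longer than the index prefix */
	}

	return(1);
}
```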
/****************************************************************
Checks the size and number of fields in records based on the definition of
the index. */
static
ibool
btr_index_page_validate(
/*====================*/
				/* out: TRUE if ok */
	buf_block_t*	block,	/* in: index page */
	dict_index_t*	index)	/* in: index */
{
	page_cur_t	cur;
	ibool		ret	= TRUE;

	page_cur_set_before_first(block, &cur);
	page_cur_move_to_next(&cur);

	for (;;) {
		if (page_cur_is_after_last(&cur)) {

			break;
		}

		if (!btr_index_rec_validate(cur.rec, index, TRUE)) {

			return(FALSE);
		}

		page_cur_move_to_next(&cur);
	}

	return(ret);
}

/****************************************************************
Report an error on one page of an index tree. */
static
void
btr_validate_report1(
/*=================*/
	dict_index_t*		index,	/* in: index */
	ulint			level,	/* in: B-tree level */
	const buf_block_t*	block)	/* in: index page */
{
	fprintf(stderr, "InnoDB: Error in page %lu of ",
		buf_block_get_page_no(block));
	dict_index_name_print(stderr, NULL, index);
	if (level) {
		fprintf(stderr, ", index tree level %lu", level);
	}
	putc('\n', stderr);
}

/****************************************************************
Report an error on two pages of an index tree. */
static
void
btr_validate_report2(
/*=================*/
	const dict_index_t*	index,	/* in: index */
	ulint			level,	/* in: B-tree level */
	const buf_block_t*	block1,	/* in: first index page */
	const buf_block_t*	block2)	/* in: second index page */
{
	fprintf(stderr, "InnoDB: Error in pages %lu and %lu of ",
		buf_block_get_page_no(block1),
		buf_block_get_page_no(block2));
	dict_index_name_print(stderr, NULL, index);
	if (level) {
		fprintf(stderr, ", index tree level %lu", level);
	}
	putc('\n', stderr);
}
  2498. /****************************************************************
  2499. Validates index tree level. */
  2500. static
  2501. ibool
  2502. btr_validate_level(
  2503. /*===============*/
  2504. /* out: TRUE if ok */
  2505. dict_index_t* index, /* in: index tree */
  2506. trx_t* trx, /* in: transaction or NULL */
  2507. ulint level) /* in: level number */
  2508. {
  2509. ulint space;
  2510. ulint zip_size;
  2511. buf_block_t* block;
  2512. page_t* page;
  2513. buf_block_t* right_block = 0; /* remove warning */
  2514. page_t* right_page = 0; /* remove warning */
  2515. page_t* father_page;
  2516. btr_cur_t node_cur;
  2517. btr_cur_t right_node_cur;
  2518. rec_t* rec;
  2519. ulint right_page_no;
  2520. ulint left_page_no;
  2521. page_cur_t cursor;
  2522. dtuple_t* node_ptr_tuple;
  2523. ibool ret = TRUE;
  2524. mtr_t mtr;
  2525. mem_heap_t* heap = mem_heap_create(256);
  2526. ulint* offsets = NULL;
  2527. ulint* offsets2= NULL;
  2528. #ifdef UNIV_ZIP_DEBUG
  2529. page_zip_des_t* page_zip;
  2530. #endif /* UNIV_ZIP_DEBUG */
  2531. mtr_start(&mtr);
  2532. mtr_x_lock(dict_index_get_lock(index), &mtr);
  2533. block = btr_root_block_get(index, &mtr);
  2534. page = buf_block_get_frame(block);
  2535. space = dict_index_get_space(index);
  2536. zip_size = dict_table_zip_size(index->table);
  2537. while (level != btr_page_get_level(page, &mtr)) {
  2538. const rec_t* node_ptr;
  2539. ut_a(space == buf_block_get_space(block));
  2540. ut_a(space == page_get_space_id(page));
  2541. #ifdef UNIV_ZIP_DEBUG
  2542. page_zip = buf_block_get_page_zip(block);
  2543. ut_a(!page_zip || page_zip_validate(page_zip, page));
  2544. #endif /* UNIV_ZIP_DEBUG */
  2545. ut_a(!page_is_leaf(page));
  2546. page_cur_set_before_first(block, &cursor);
  2547. page_cur_move_to_next(&cursor);
  2548. node_ptr = page_cur_get_rec(&cursor);
  2549. offsets = rec_get_offsets(node_ptr, index, offsets,
  2550. ULINT_UNDEFINED, &heap);
  2551. block = btr_node_ptr_get_child(node_ptr, index, offsets, &mtr);
  2552. page = buf_block_get_frame(block);
  2553. }
  2554. /* Now we are on the desired level. Loop through the pages on that
  2555. level. */
  2556. loop:
  2557. if (trx_is_interrupted(trx)) {
  2558. mtr_commit(&mtr);
  2559. mem_heap_free(heap);
  2560. return(ret);
  2561. }
  2562. mem_heap_empty(heap);
  2563. offsets = offsets2 = NULL;
  2564. mtr_x_lock(dict_index_get_lock(index), &mtr);
  2565. #ifdef UNIV_ZIP_DEBUG
  2566. page_zip = buf_block_get_page_zip(block);
  2567. ut_a(!page_zip || page_zip_validate(page_zip, page));
  2568. #endif /* UNIV_ZIP_DEBUG */
  2569. /* Check ordering etc. of records */
  2570. if (!page_validate(page, index)) {
  2571. btr_validate_report1(index, level, block);
  2572. ret = FALSE;
  2573. } else if (level == 0) {
  2574. /* We are on level 0. Check that the records have the right
  2575. number of fields, and field lengths are right. */
  2576. if (!btr_index_page_validate(block, index)) {
  2577. ret = FALSE;
  2578. }
  2579. }
  2580. ut_a(btr_page_get_level(page, &mtr) == level);
  2581. right_page_no = btr_page_get_next(page, &mtr);
  2582. left_page_no = btr_page_get_prev(page, &mtr);
  2583. ut_a(page_get_n_recs(page) > 0 || (level == 0
  2584. && page_get_page_no(page)
  2585. == dict_index_get_page(index)));
	if (right_page_no != FIL_NULL) {
		const rec_t*	right_rec;

		right_block = btr_block_get(space, zip_size, right_page_no,
					    RW_X_LATCH, &mtr);
		right_page = buf_block_get_frame(right_block);

		if (UNIV_UNLIKELY(btr_page_get_prev(right_page, &mtr)
				  != page_get_page_no(page))) {
			btr_validate_report2(index, level, block, right_block);
			fputs("InnoDB: broken FIL_PAGE_NEXT"
			      " or FIL_PAGE_PREV links\n", stderr);
			buf_page_print(page, 0);
			buf_page_print(right_page, 0);

			ret = FALSE;
		}

		if (UNIV_UNLIKELY(page_is_comp(right_page)
				  != page_is_comp(page))) {
			btr_validate_report2(index, level, block, right_block);
			fputs("InnoDB: 'compact' flag mismatch\n", stderr);
			buf_page_print(page, 0);
			buf_page_print(right_page, 0);

			ret = FALSE;

			goto node_ptr_fails;
		}

		rec = page_rec_get_prev(page_get_supremum_rec(page));
		right_rec = page_rec_get_next(page_get_infimum_rec(
						      right_page));
		offsets = rec_get_offsets(rec, index,
					  offsets, ULINT_UNDEFINED, &heap);
		offsets2 = rec_get_offsets(right_rec, index,
					   offsets2, ULINT_UNDEFINED, &heap);

		if (UNIV_UNLIKELY(cmp_rec_rec(rec, right_rec,
					      offsets, offsets2,
					      index) >= 0)) {

			btr_validate_report2(index, level, block, right_block);

			fputs("InnoDB: records in wrong order"
			      " on adjacent pages\n", stderr);

			buf_page_print(page, 0);
			buf_page_print(right_page, 0);

			fputs("InnoDB: record ", stderr);
			rec = page_rec_get_prev(page_get_supremum_rec(page));
			rec_print(stderr, rec, index);
			putc('\n', stderr);
			fputs("InnoDB: record ", stderr);
			rec = page_rec_get_next(
				page_get_infimum_rec(right_page));
			rec_print(stderr, rec, index);
			putc('\n', stderr);

			ret = FALSE;
		}
	}
	if (level > 0 && left_page_no == FIL_NULL) {
		ut_a(REC_INFO_MIN_REC_FLAG & rec_get_info_bits(
			     page_rec_get_next(page_get_infimum_rec(page)),
			     page_is_comp(page)));
	}
	if (buf_block_get_page_no(block) != dict_index_get_page(index)) {

		/* Check father node pointers */

		const rec_t*	node_ptr;

		offsets = btr_page_get_father_block(offsets, heap, index,
						    block, &mtr, &node_cur);
		father_page = btr_cur_get_page(&node_cur);
		node_ptr = btr_cur_get_rec(&node_cur);

		btr_cur_position(
			index, page_rec_get_prev(page_get_supremum_rec(page)),
			block, &node_cur);
		offsets = btr_page_get_father_node_ptr(offsets, heap,
						       &node_cur, &mtr);

		if (UNIV_UNLIKELY(node_ptr != btr_cur_get_rec(&node_cur))
		    || UNIV_UNLIKELY(btr_node_ptr_get_child_page_no(node_ptr,
								    offsets)
				     != buf_block_get_page_no(block))) {

			btr_validate_report1(index, level, block);

			fputs("InnoDB: node pointer to the page is wrong\n",
			      stderr);

			buf_page_print(father_page, 0);
			buf_page_print(page, 0);

			fputs("InnoDB: node ptr ", stderr);
			rec_print(stderr, node_ptr, index);

			rec = btr_cur_get_rec(&node_cur);
			fprintf(stderr, "\n"
				"InnoDB: node ptr child page n:o %lu\n",
				(ulong) btr_node_ptr_get_child_page_no(
					rec, offsets));

			fputs("InnoDB: record on page ", stderr);
			rec_print_new(stderr, rec, offsets);
			putc('\n', stderr);
			ret = FALSE;

			goto node_ptr_fails;
		}
		if (!page_is_leaf(page)) {
			node_ptr_tuple = dict_index_build_node_ptr(
				index,
				page_rec_get_next(page_get_infimum_rec(page)),
				0, heap, btr_page_get_level(page, &mtr));

			if (cmp_dtuple_rec(node_ptr_tuple, node_ptr,
					   offsets)) {
				const rec_t* first_rec = page_rec_get_next(
					page_get_infimum_rec(page));

				btr_validate_report1(index, level, block);

				buf_page_print(father_page, 0);
				buf_page_print(page, 0);

				fputs("InnoDB: Error: node ptrs differ"
				      " on levels > 0\n"
				      "InnoDB: node ptr ", stderr);
				rec_print_new(stderr, node_ptr, offsets);
				fputs("InnoDB: first rec ", stderr);
				rec_print(stderr, first_rec, index);
				putc('\n', stderr);
				ret = FALSE;

				goto node_ptr_fails;
			}
		}
		if (left_page_no == FIL_NULL) {
			ut_a(node_ptr == page_rec_get_next(
				     page_get_infimum_rec(father_page)));
			ut_a(btr_page_get_prev(father_page, &mtr) == FIL_NULL);
		}

		if (right_page_no == FIL_NULL) {
			ut_a(node_ptr == page_rec_get_prev(
				     page_get_supremum_rec(father_page)));
			ut_a(btr_page_get_next(father_page, &mtr) == FIL_NULL);
		} else {
			offsets = btr_page_get_father_block(
				offsets, heap, index, right_block,
				&mtr, &right_node_cur);

			if (page_rec_get_next((rec_t*) node_ptr)
			    != page_get_supremum_rec(father_page)) {

				if (btr_cur_get_rec(&right_node_cur)
				    != page_rec_get_next((rec_t*) node_ptr)) {
					ret = FALSE;
					fputs("InnoDB: node pointer to"
					      " the right page is wrong\n",
					      stderr);

					btr_validate_report1(index, level,
							     block);

					buf_page_print(father_page, 0);
					buf_page_print(page, 0);
					buf_page_print(right_page, 0);
				}
			} else {
				page_t*	right_father_page
					= btr_cur_get_page(&right_node_cur);

				if (btr_cur_get_rec(&right_node_cur)
				    != page_rec_get_next(
					    page_get_infimum_rec(
						    right_father_page))) {
					ret = FALSE;
					fputs("InnoDB: node pointer 2 to"
					      " the right page is wrong\n",
					      stderr);

					btr_validate_report1(index, level,
							     block);

					buf_page_print(father_page, 0);
					buf_page_print(right_father_page, 0);
					buf_page_print(page, 0);
					buf_page_print(right_page, 0);
				}

				if (page_get_page_no(right_father_page)
				    != btr_page_get_next(father_page, &mtr)) {

					ret = FALSE;
					fputs("InnoDB: node pointer 3 to"
					      " the right page is wrong\n",
					      stderr);

					btr_validate_report1(index, level,
							     block);

					buf_page_print(father_page, 0);
					buf_page_print(right_father_page, 0);
					buf_page_print(page, 0);
					buf_page_print(right_page, 0);
				}
			}
		}
	}
node_ptr_fails:
	/* Commit the mini-transaction to release the latch on 'page'.
	Re-acquire the latch on right_page, which will become 'page'
	on the next loop.  The page has already been checked. */
	mtr_commit(&mtr);

	if (right_page_no != FIL_NULL) {
		mtr_start(&mtr);

		block = btr_block_get(space, zip_size, right_page_no,
				      RW_X_LATCH, &mtr);
		page = buf_block_get_frame(block);

		goto loop;
	}

	mem_heap_free(heap);
	return(ret);
}
/******************************************************************
Checks the consistency of an index tree. */

ibool
btr_validate_index(
/*===============*/
				/* out: TRUE if ok */
	dict_index_t*	index,	/* in: index */
	trx_t*		trx)	/* in: transaction or NULL */
{
	mtr_t	mtr;
	page_t*	root;
	ulint	i;
	ulint	n;

	mtr_start(&mtr);
	mtr_x_lock(dict_index_get_lock(index), &mtr);

	root = btr_root_get(index, &mtr);
	n = btr_page_get_level(root, &mtr);

	for (i = 0; i <= n && !trx_is_interrupted(trx); i++) {
		if (!btr_validate_level(index, trx, n - i)) {

			mtr_commit(&mtr);

			return(FALSE);
		}
	}

	mtr_commit(&mtr);

	return(TRUE);
}