MDEV-13564 Mariabackup does not work with TRUNCATE

Implement undo tablespace truncation via normal redo logging.

Implement TRUNCATE TABLE as a combination of RENAME to a #sql-ib name, CREATE, and DROP.

Note: An orphan #sql-ib*.ibd file may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd while the data dictionary still refers to the table by its original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup, so this new TRUNCATE will be fully crash-safe in 10.3.

ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them.

rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that no stale references to the old table remain after truncating.

== TRUNCATE TABLE ==

WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs.

In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE.

A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717.

ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add an explicit rename-back in case the operation fails.

ha_innobase::delete_table(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints.

ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table.

create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx.

row_drop_table_for_mysql(): Replace a bool parameter with sqlcom.

row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql().

dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove.

row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place.

The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition-like scenarios. The test innodb.truncate does not use any synchronization.

We add a redo log subformat to indicate the backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations.
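The RENAME/CREATE/DROP flow described above can be illustrated with a short sketch. This is not the MariaDB implementation: trx_t is reduced to a stub, and rename_table(), create_empty_table() and drop_table() are hypothetical stand-ins for the dictionary operations named in this message.

    #include <string>

    struct trx_t {};  // stand-in for the InnoDB transaction handle

    // Hypothetical helpers, stubbed so the sketch is self-contained.
    static int rename_table(trx_t*, const std::string&, const std::string&) { return 0; }
    static int create_empty_table(trx_t*, const std::string&) { return 0; }
    static int drop_table(trx_t*, const std::string&, bool /*skip_foreign_keys*/) { return 0; }

    // TRUNCATE as RENAME + CREATE + DROP, with an explicit rename-back
    // because RENAME is not undo-logged before MariaDB 10.3.
    static int truncate_table(trx_t* trx, const std::string& name)
    {
      const std::string tmp = "#sql-ib-" + name;  // temporary #sql-ib name
      if (int err = rename_table(trx, name, tmp))
        return err;
      // The replacement table is created in the same transaction
      // that renamed the old one.
      if (int err = create_empty_table(trx, name)) {
        rename_table(trx, tmp, name);  // best-effort manual rollback
        return err;
      }
      // Drop the renamed original; TRUNCATE must not touch FOREIGN KEYs.
      return drop_table(trx, tmp, /*skip_foreign_keys=*/true);
    }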
== Undo tablespace truncation ==

MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similarly to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint.

We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, the total redo log currently written by undo tablespace truncation will exceed the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4.

recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen.

namespace undo: Remove some unnecessary declarations.

fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references.

fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more.

buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces.

fil_truncate_log(): Write a MLOG_FILE_CREATE2 record with a nonzero page number (the new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced.

fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without the .ibd file suffix) with a nonzero page number.

os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function.

fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[].

recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered after recv_addr_trim() removed them.

trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log.

trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed.

recv_addr_trim(): Discard any redo log records for pages that were logged after the new end of the file but before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file.

recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point.

buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0).

trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First flush all to-be-discarded pages (beyond the new end of the file), then trim space->size to make page allocation deterministic. At the only remaining crash injection point, flush the redo log so that recovery can be tested.
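The discard-then-truncate order performed by recv_addr_trim() can be sketched as follows. The data structures are deliberately simplified (the real recv_sys keys page addresses through a hash table and its record type carries more state); the point is only that buffered records for pages beyond the new end are dropped before the file itself is shrunk.

    #include <cstdint>
    #include <list>
    #include <vector>

    using lsn_t = uint64_t;
    struct recv_t { lsn_t start_lsn; };  // parsed redo record (simplified)
    struct recv_addr_t {
      uint32_t space, page_no;
      std::list<recv_t> rec_list;
    };
    struct recv_sys_t {
      std::vector<recv_addr_t> addrs;  // real code: hash by (space, page)
      size_t n_addrs = 0;              // pages with pending records
    };

    // Discard buffered redo records for pages beyond the new end of
    // the truncated undo tablespace, before shrinking the file.
    static void recv_addr_trim(recv_sys_t& sys, uint32_t space_id,
                               uint32_t new_size_pages, lsn_t truncation_lsn)
    {
      for (recv_addr_t& a : sys.addrs) {
        if (a.space != space_id || a.page_no < new_size_pages)
          continue;
        // Records logged before the truncation LSN refer to pages
        // that will no longer exist after the file is shrunk.
        a.rec_list.remove_if([&](const recv_t& r) {
          return r.start_lsn < truncation_lsn;
        });
        if (a.rec_list.empty())
          --sys.n_addrs;  // keep recovery's completion counter exact
      }
      // Only now would the data file actually be truncated on disk
      // (os_file_truncate(..., allow_shrink=true)).
    }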
MDEV-17938 ALTER TABLE reports ER_TABLESPACE_EXISTS after failed ALTER TABLE

There was a race condition in the error handling of ALTER TABLE when the table contains a FULLTEXT INDEX.

During the error handling of an erroneous ALTER TABLE statement, when InnoDB would drop the internally created tables for FULLTEXT INDEX, it could happen that one of the hidden tables was being concurrently accessed by a background thread. Because of this, InnoDB would defer the drop operation to the background.

However, related to MDEV-13564 backup-safe TRUNCATE TABLE and its prerequisite MDEV-14585, we had to make the background drop table queue crash-safe by renaming the table to a temporary name before enqueueing it. This renaming was introduced in a follow-up of the MDEV-13407 fix.

As part of this rename operation, we were unnecessarily parsing the current SQL statement, because the same rename operation could also be executed as part of ALTER TABLE via ha_innobase::rename_table(). If an ALTER TABLE statement was being refused due to an incorrectly formed FOREIGN KEY constraint, the renaming of the hidden internal tables for FULLTEXT INDEX could also fail, triggering a host of error log messages and causing a subsequent table-rebuilding ALTER TABLE operation to fail because the tablespace already exists.

innobase_rename_table(), row_rename_table_for_mysql(): Add the parameter use_fk for suppressing the parsing of FOREIGN KEY constraints. It will only be passed as use_fk=true by ha_innobase::rename_table(), which can be invoked as part of ALTER TABLE...ALGORITHM=COPY.
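A trimmed-down sketch of the use_fk gating described above; the real row_rename_table_for_mysql() takes more parameters and performs the actual dictionary rename, which is reduced here to a trace line.

    #include <cstdio>

    struct trx_t {};  // stand-in for the InnoDB transaction handle
    enum dberr_t { DB_SUCCESS, DB_ERROR };

    static dberr_t row_rename_table_for_mysql(const char* old_name,
                                              const char* new_name,
                                              trx_t*, bool use_fk)
    {
      if (use_fk) {
        // Parse FOREIGN KEY constraints from the current SQL statement.
        // Only ha_innobase::rename_table() (reachable via ALTER
        // TABLE...ALGORITHM=COPY) passes use_fk=true; internal renames
        // of hidden FULLTEXT auxiliary tables pass false and therefore
        // no longer parse an unrelated statement.
      }
      std::printf("rename %s -> %s\n", old_name, new_name);
      return DB_SUCCESS;
    }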
MDEV-12266: Change dict_table_t::space to fil_space_t*

InnoDB always keeps all tablespaces in the fil_system cache. The fil_system.LRU is only for closing file handles; the fil_space_t and fil_node_t for all data files will remain in main memory. Between startup and shutdown, they can only be created and removed by DDL statements. Therefore, we can let dict_table_t::space point directly to the fil_space_t.

dict_table_t::space_id: A numeric tablespace ID for the corner cases where we do not have a tablespace. The most prominent examples are ALTER TABLE...DISCARD TABLESPACE and a missing or corrupted file.

There are a few functional differences; most notably:
(1) DROP TABLE will delete matching .ibd and .cfg files, even if they were not attached to the data dictionary.
(2) Some error messages will report file names instead of numeric IDs.

There still are many functions that use numeric tablespace IDs instead of fil_space_t*, and many functions could be converted to fil_space_t member functions. Also, Tablespace and Datafile should be merged with fil_space_t and fil_node_t. page_id_t and buf_page_get_gen() could use fil_space_t& instead of a numeric ID, and after moving to a single buffer pool (MDEV-15058), buf_pool_t::page_hash could be moved to fil_space_t::page_hash.

FilSpace: Remove. Only a few calls to fil_space_acquire() will remain, and gradually they should be removed.

mtr_t::set_named_space_id(ulint): Renamed from set_named_space(), to prevent accidental calls to this slower function. Very few callers remain.

fseg_create(), fsp_reserve_free_extents(): Take fil_space_t* as a parameter instead of a space_id.

fil_space_t::rename(): Wrapper for fil_rename_tablespace_check(), fil_name_write_rename(), fil_rename_tablespace(). Mariabackup passes the parameter log=false; InnoDB passes log=true.

dict_mem_table_create(): Take fil_space_t* instead of space_id as a parameter.

dict_process_sys_tables_rec_and_mtr_commit(): Replace the parameter 'status' with 'bool cached'.

dict_get_and_save_data_dir_path(): Avoid copying the fil_node_t::name.

fil_ibd_open(): Return the tablespace.

fil_space_t::set_imported(): Replaces fil_space_set_imported().

truncate_t: Change many member function parameters to fil_space_t*, and remove page_size parameters.

row_truncate_prepare(): Merge to its only caller.

row_drop_table_from_cache(): Assert that the table is persistent.

dict_create_sys_indexes_tuple(): Write SYS_INDEXES.SPACE=FIL_NULL if the tablespace has been discarded.

row_import_update_discarded_flag(): Remove a constant parameter.
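The structural idea can be reduced to a minimal sketch; the real structs carry many more members, and table_space_id() is an invented helper for illustration only.

    #include <cstdint>

    struct fil_space_t { uint32_t id; /* fil_node_t list, flags, ... */ };

    struct dict_table_t {
      // Direct pointer into the fil_system cache. All tablespaces stay
      // cached from startup to shutdown, so outside of DDL the pointer
      // cannot dangle.
      fil_space_t* space = nullptr;
      // Numeric ID for the corner cases without a tablespace object:
      // ALTER TABLE...DISCARD TABLESPACE, or a missing/corrupted file.
      uint32_t space_id = 0;
    };

    // With the pointer at hand, callers avoid a fil_system lookup by
    // numeric ID on every access.
    inline uint32_t table_space_id(const dict_table_t& t)
    {
      return t.space ? t.space->id : t.space_id;
    }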
MDEV-11369 Instant ADD COLUMN for InnoDB

For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation.

ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:
MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used, or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes (see the encoding sketch after this list). (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC.
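The count stored by the new record header in item (4) is n_add = n_fields - n_core_fields - 1. A sketch of a matching variable-length encoding follows; the exact byte layout (1 byte below 128, otherwise 2 bytes with the high bit of the first byte set) is an assumption for illustration, and the authoritative format lives in the InnoDB rem0rec code.

    #include <cassert>
    #include <cstdint>
    #include <vector>

    static void encode_n_add_fields(std::vector<uint8_t>& out, unsigned n_add)
    {
      assert(n_add < (1u << 15));
      if (n_add < 0x80) {
        out.push_back(uint8_t(n_add));                // 1 byte: 0..127
      } else {
        out.push_back(uint8_t(0x80 | (n_add >> 8)));  // 2 bytes: 128..32767
        out.push_back(uint8_t(n_add));
      }
    }

    static unsigned decode_n_add_fields(const uint8_t*& p)
    {
      const unsigned b = *p++;
      return b < 0x80 ? b : ((b & 0x7f) << 8) | *p++;
    }

    // The record then has n_core_fields + n_add + 1 fields in total;
    // any trailing fields absent from an older row are served from
    // the 'default row'.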
This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. (A minimal sketch follows this section.)

dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column.

dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields. For secondary indexes, and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not yet initialized from the clustered index root page.

dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation.
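A minimal sketch of len_is_stored() and the two magic length values mentioned above. The concrete constants are placeholders chosen for the example; InnoDB defines them in its own headers (UNIV_SQL_NULL already existed, UNIV_SQL_DEFAULT is new in this work).

    #include <cstdint>

    constexpr uint32_t UNIV_SQL_NULL    = 0xFFFFFFFF;  // SQL NULL value
    constexpr uint32_t UNIV_SQL_DEFAULT = 0xFFFFFFFE;  // instantly added,
                                                       // absent from record

    // A length is 'actual' only if it is neither of the magic values.
    inline bool len_is_stored(uint32_t len)
    {
      return len != UNIV_SQL_NULL && len != UNIV_SQL_DEFAULT;
    }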
Dictionary objects will only be added to the cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to the cache.

dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and, if needed, set can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point).

dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This repurposes the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. (A bit-arithmetic sketch follows this section.)

page_get_instant(), page_set_instant(): Accessors for PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present.

dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is written.
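The packing of PAGE_INSTANT into the former PAGE_DIRECTION field, described above, reduces to simple bit arithmetic: the least significant 3 bits keep the direction, the upper 13 bits store dict_index_t::n_core_fields. The sketch below shows only the packing; reading and writing the big-endian page bytes is elided, so these are not the real accessor signatures.

    #include <cstdint>

    inline uint16_t page_get_instant(uint16_t field)   // upper 13 bits
    {
      return field >> 3;
    }

    inline uint8_t page_get_direction(uint16_t field)  // lower 3 bits
    {
      return field & 7;
    }

    inline uint16_t page_set_instant(uint16_t field, uint16_t n_core_fields)
    {
      // Preserve the 3 direction bits while storing the field count.
      return uint16_t((n_core_fields << 3) | (field & 7));
    }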
btr_index_rec_validate(): Account for instant ADD COLUMN.

row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and the adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().
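The flag combination for REC_INFO_DEFAULT_ROW can be sketched as below. The numeric values are illustrative, inferred from the description above; consult the InnoDB headers for the authoritative definitions.

    #include <cstdint>

    enum rec_comp_status_t : uint8_t {
      REC_STATUS_ORDINARY      = 0,
      REC_STATUS_NODE_PTR      = 1,
      REC_STATUS_INFIMUM       = 2,
      REC_STATUS_SUPREMUM      = 3,
      REC_STATUS_COLUMNS_ADDED = 4,  // header carries the n_add count
    };

    constexpr uint8_t REC_INFO_MIN_REC_FLAG = 0x10;
    constexpr uint8_t REC_INFO_DEFAULT_ROW =
        REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED;

    // A 'default row' record carries both the minimum-record flag and
    // the COLUMNS_ADDED status, a combination no user record has.
    inline bool is_default_row(uint8_t info_and_status_bits)
    {
      return (info_and_status_bits & REC_INFO_DEFAULT_ROW)
          == REC_INFO_DEFAULT_ROW;
    }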
8 years ago
MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. 
This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. 
Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. 
This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().
8 years ago
MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. 
This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. 
Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. 
This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().
8 years ago
MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. 
== Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
MDEV-11233 CREATE FULLTEXT INDEX with a token longer than 127 bytes crashes server

This bug is the result of merging the Oracle MySQL follow-up fix BUG#22963169 MYSQL CRASHES ON CREATE FULLTEXT INDEX without merging the base bug fix, Bug#79475 (Insert a token of 84 4-bytes chars into fts index causes server crash).

Unlike the above-mentioned fixes in MySQL, our fix will not change the storage format of fulltext indexes in InnoDB or XtraDB when a character encoding with mbmaxlen=2 or mbmaxlen=3 is used and the length of a word is between 128 and 84*mbmaxlen bytes. The Oracle fix would allocate 2 length bytes for these cases. Compatibility with other MySQL and MariaDB releases is ensured by persisting the used maximum length in the SYS_COLUMNS table in the InnoDB data dictionary.

This fix also removes some unnecessary strcmp() calls when checking for the legacy default collation my_charset_latin1 (my_charset_latin1.name=="latin1_swedish_ci").

fts_create_one_index_table(): Store the actual length in bytes. This metadata will be written to the SYS_COLUMNS table.

fts_zip_initialize(): Initialize only the first byte of the buffer. Actually, the code should not even care about this first byte, because the length is set as 0.

FTS_MAX_WORD_LEN: Define as HA_FT_MAXCHARLEN * 4, that is, 336 bytes, not as 254 bytes.

row_merge_create_fts_sort_index(): Set the actual maximum length of the column in bytes, similar to fts_create_one_index_table().

row_merge_fts_doc_tokenize(): Remove the redundant parameter word_dtype. Use the actual maximum length of the column. Calculate the extra_size in the same way as row_merge_buf_encode() does.
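The arithmetic behind the storage-format decision can be illustrated as follows. HA_FT_MAXCHARLEN (84 characters) is the real server constant; fts_len_bytes() is a hypothetical helper written only for this sketch, showing why mbmaxlen=2 and mbmaxlen=3 tokens still fit in one length byte while a 4-byte charset needs two.

    #include <cstdio>

    constexpr unsigned HA_FT_MAXCHARLEN = 84;  // widest fulltext token, in characters

    // Hypothetical helper: how many length bytes the FTS auxiliary table
    // column needs for a given charset width (mbmaxlen). A column whose
    // maximum length fits in 255 bytes needs only one length byte.
    unsigned fts_len_bytes(unsigned mbmaxlen) {
      const unsigned max_bytes = HA_FT_MAXCHARLEN * mbmaxlen;
      return max_bytes > 255 ? 2 : 1;
    }

    int main() {
      for (unsigned mb = 1; mb <= 4; mb++)
        std::printf("mbmaxlen=%u: max token %u bytes -> %u length byte(s)\n",
                    mb, HA_FT_MAXCHARLEN * mb, fts_len_bytes(mb));
    }

This prints 84, 168 and 252 bytes (one length byte) for mbmaxlen 1 to 3, and 336 bytes (two length bytes) for mbmaxlen=4, matching the 336-byte FTS_MAX_WORD_LEN above.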
MDEV-11369 Instant ADD COLUMN for InnoDB

For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:

MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used, or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields, introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column values of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple_t::trim() implements it on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN operations can use TRX_UNDO_UPD_EXIST_REC.

This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also, the semantics of higher-level redo log records that modify the PAGE_INSTANT field are changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC), because it would contain more fields than the rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT and UNIV_SQL_NULL.

dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column.

dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields. For secondary indexes, and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not yet initialized from the clustered index root page.

dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. Dictionary objects will only be added to the cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to the cache.

dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and, if needed, set can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now already exist in the dict_table_t object at this point).

dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This repurposes the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B.

page_get_instant(), page_set_instant(): Accessors for PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID, DB_ROLL_PTR. All these columns are always present.

dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written.

btr_index_rec_validate(): Account for instant ADD COLUMN.

row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum/supremum; they are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and the adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().
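As a concrete illustration of point (4) above, here is a sketch of encoding n_add = n_fields - n_core_fields - 1 in one or two bytes. The exact bit layout InnoDB uses in the REC_STATUS_COLUMNS_ADDED record header may differ; the scheme shown here, where the high bit of the first byte flags the presence of a second byte, is just one plausible encoding, with a symmetric decoder.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Encode n_add (assumed < 2^15) into buf; return the byte count.
    size_t encode_n_add(uint16_t n_add, unsigned char* buf) {
      if (n_add < 0x80) {          // small values fit in one byte
        buf[0] = (unsigned char) n_add;
        return 1;
      }
      buf[0] = (unsigned char) (0x80 | (n_add >> 8));  // flag + high bits
      buf[1] = (unsigned char) n_add;                  // low bits
      return 2;
    }

    // Decode; return the number of header bytes consumed.
    size_t decode_n_add(const unsigned char* buf, uint16_t* n_add) {
      if (!(buf[0] & 0x80)) { *n_add = buf[0]; return 1; }
      *n_add = (uint16_t) (((buf[0] & 0x7f) << 8) | buf[1]);
      return 2;
    }

    int main() {
      unsigned char buf[2];
      const uint16_t samples[] = {0, 5, 127, 128, 1000};
      for (uint16_t v : samples) {
        uint16_t back;
        size_t n = encode_n_add(v, buf);
        assert(decode_n_add(buf, &back) == n && back == v);
      }
    }

The point of the variable-length header is that tables with few instantly added columns, by far the common case, pay only one extra byte per record.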
MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing
MDEV-11581: Mariadb starts InnoDB encryption threads when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads (innodb-encryption-threads) periodically going through all tablespaces on fil_system. For each tablespace, the currently used key version is compared to the maximum key age (innodb-encryption-rotate-key-age). This process naturally takes CPU. At the same time, the need for scrubbing is investigated. Currently, key rotation is fully supported only with the Amazon AWS key management plugin, but InnoDB has no knowledge of which key management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0 to disable key rotation and background data scrubbing. All new tables are added to a special list for key rotation, and key rotation is based on sending an event to the background encryption threads instead of periodic checking (i.e. a timeout).

fil0fil.cc: Added the function fil_space_acquire_low() to acquire a tablespace when it could be dropped concurrently. It is used from fil_space_acquire() and fil_space_acquire_silent(); the latter will not print any messages if we try to acquire a space that does not exist. Added fil_space_release() to release an acquired tablespace, and fil_space_next() to iterate tablespaces in fil_system using fil_space_acquire() and fil_space_release(). Similarly, fil_space_keyrotation_next() iterates the new list fil_system->rotation_list, to which new tables are added if key rotation is disabled. Removed the unnecessary functions fil_get_first_space_safe() and fil_get_next_space_safe().

fil_node_open_file(): After page 0 is read, also read crypt_info if it has not yet been read.

btr_scrub_lock_dict_func(), buf_page_check_corrupt(), buf_page_encrypt_before_write(), buf_merge_or_delete_for_page(), lock_print_info_all_transactions(), row_fts_psort_info_init(), row_truncate_table_for_mysql(), row_drop_table_for_mysql(): Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read(): Use fil_space_get_crypt_data(), because at this point we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Many changes. Pass fil_space_t* directly to functions needing it and store fil_space_t* in the rotation state. Use fil_space_acquire()/release() when iterating tablespaces, and remove the unnecessary is_closing from fil_crypt_t. Use fil_space_t::is_stopping() to detect when access to a tablespace should be stopped. Removed the unnecessary fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could be something to do if key rotation is disabled and a new table with encryption enabled is created.

Moved fil_space_get_crypt_data() and fil_space_set_crypt_data() to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data, and release the space.

check_table_options(): Renamed FIL_SPACE_ENCRYPTION_* to FIL_ENCRYPTION_*.

i_s.cc: Added a ROTATING_OR_FLUSHING field to information_schema.innodb_tablespace_encryption to show the current status of key rotation.
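The acquire/release protocol can be modelled in isolation as follows. The snippet mirrors the names from the commit message (is_stopping, acquire, release) but is a simplified stand-in, not the real fil_system code: the actual implementation manages node lists, LRU eviction, and more.

    #include <atomic>
    #include <map>
    #include <mutex>
    #include <string>

    struct space_t {
      std::atomic<unsigned> n_pending{0};    // pending operations on this tablespace
      std::atomic<bool> stop_new_ops{false}; // set when the tablespace is being dropped
      bool is_stopping() const { return stop_new_ops.load(); }
    };

    std::mutex fil_system_mutex;
    std::map<std::string, space_t*> fil_system_spaces;

    // Acquire the tablespace for a pending operation, or return nullptr
    // if it does not exist or is being dropped.
    space_t* space_acquire(const std::string& name) {
      std::lock_guard<std::mutex> g(fil_system_mutex);
      auto it = fil_system_spaces.find(name);
      if (it == fil_system_spaces.end() || it->second->is_stopping())
        return nullptr;
      it->second->n_pending.fetch_add(1);
      return it->second;
    }

    // Release a previously acquired tablespace.
    void space_release(space_t* s) { s->n_pending.fetch_sub(1); }

    int main() {
      space_t t;
      fil_system_spaces["test/t1"] = &t;
      if (space_t* s = space_acquire("test/t1")) {
        // ... work on the tablespace here ...
        space_release(s);
      }
    }

The design point is that a concurrent DROP first sets stop_new_ops (so no new acquisitions succeed) and then waits for n_pending to drain to zero before freeing the object, which is why holding an acquired reference makes the pointer safe to use.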
9 years ago
12 years ago
MDEV-11369 Instant ADD COLUMN for InnoDB For InnoDB tables, adding, dropping and reordering columns has required a rebuild of the table and all its indexes. Since MySQL 5.6 (and MariaDB 10.0) this has been supported online (LOCK=NONE), allowing concurrent modification of the tables. This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously, with only minor changes performed to the table structure. The counter innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is incremented whenever a table rebuild operation is converted into an instant ADD COLUMN operation. ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN. Some usability limitations will be addressed in subsequent work: MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE The format of the clustered index (PRIMARY KEY) is changed as follows: (1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT, and a new field PAGE_INSTANT will contain the original number of fields in the clustered index ('core' fields). If instant ADD COLUMN has not been used or the table becomes empty, or the very first instant ADD COLUMN operation is rolled back, the fields PAGE_INSTANT and FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX. (2) A special 'default row' record is inserted into the leftmost leaf, between the page infimum and the first user record. This record is distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the same format as records that contain values for the instantly added columns. This 'default row' always has the same number of fields as the clustered index according to the table definition. The values of 'core' fields are to be ignored. For other fields, the 'default row' will contain the default values as they were during the ALTER TABLE statement. (If the column default values are changed later, those values will only be stored in the .frm file. The 'default row' will contain the original evaluated values, which must be the same for every row.) The 'default row' must be completely hidden from higher-level access routines. Assertions have been added to ensure that no 'default row' is ever present in the adaptive hash index or in locked records. The 'default row' is never delete-marked. (3) In clustered index leaf page records, the number of fields must reside between the number of 'core' fields (dict_index_t::n_core_fields introduced in this work) and dict_index_t::n_fields. If the number of fields is less than dict_index_t::n_fields, the missing fields are replaced with the column value of the 'default row'. Note: The number of fields in the record may shrink if some of the last instantly added columns are updated to the value that is in the 'default row'. The function btr_cur_trim() implements this 'compression' on update and rollback; dtuple::trim() implements it on insert. (4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a new record header that will encode n_fields-n_core_fields-1 in 1 or 2 bytes. (In ROW_FORMAT=REDUNDANT records, the record header always explicitly encodes the number of fields.) We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for covering the insert of the 'default row' record when instant ADD COLUMN is used for the first time. Subsequent instant ADD COLUMN can use TRX_UNDO_UPD_EXIST_REC. 
This is joint work with Vin Chen (陈福荣) from Tencent. The design that was discussed in April 2017 would not have allowed import or export of data files, because instead of the 'default row' it would have introduced a data dictionary table. The test rpl.rpl_alter_instant is exactly as contributed in pull request #408. The test innodb.instant_alter is based on a contributed test. The redo log record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT are as contributed. (With this change present, crash recovery from MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of higher-level redo log records that modify the PAGE_INSTANT field is changed. The redo log format version identifier was already changed to LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1. Everything else has been rewritten by me. Thanks to Elena Stepanova, the code has been tested extensively. When rolling back an instant ADD COLUMN operation, we must empty the PAGE_FREE list after deleting or shortening the 'default row' record, by calling either btr_page_empty() or btr_page_reorganize(). We must know the size of each entry in the PAGE_FREE list. If rollback left a freed copy of the 'default row' in the PAGE_FREE list, we would be unable to determine its size (if it is in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC) because it would contain more fields than the rolled-back definition of the clustered index. UNIV_SQL_DEFAULT: A new special constant that designates an instantly added column that is not present in the clustered index record. len_is_stored(): Check if a length is an actual length. There are two magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL. dict_col_t::def_val: The 'default row' value of the column. If the column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT. dict_col_t: Add the accessors is_virtual(), is_nullable(), is_instant(), instant_value(). dict_col_t::remove_instant(): Remove the 'instant ADD' status of a column. dict_col_t::name(const dict_table_t& table): Replaces dict_table_get_col_name(). dict_index_t::n_core_fields: The original number of fields. For secondary indexes and if instant ADD COLUMN has not been used, this will be equal to dict_index_t::n_fields. dict_index_t::n_core_null_bytes: Number of bytes needed to represent the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable). dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that n_core_null_bytes was not initialized yet from the clustered index root page. dict_index_t: Add the accessors is_instant(), is_clust(), get_n_nullable(), instant_field_value(). dict_index_t::instant_add_field(): Adjust clustered index metadata for instant ADD COLUMN. dict_index_t::remove_instant(): Remove the 'instant ADD' status of a clustered index when the table becomes empty, or the very first instant ADD COLUMN operation is rolled back. dict_table_t: Add the accessors is_instant(), is_temporary(), supports_instant(). dict_table_t::instant_add_column(): Adjust metadata for instant ADD COLUMN. dict_table_t::rollback_instant(): Adjust metadata on the rollback of instant ADD COLUMN. prepare_inplace_alter_table_dict(): First create the ctx->new_table, and only then decide if the table really needs to be rebuilt. We must split the creation of table or index metadata from the creation of the dictionary table records and the creation of the data. In this way, we can transform a table-rebuilding operation into an instant ADD COLUMN operation. 
Dictionary objects will only be added to cache when table rebuilding or index creation is needed. The ctx->instant_table will never be added to cache. dict_table_t::add_to_cache(): Modified and renamed from dict_table_add_to_cache(). Do not modify the table metadata. Let the callers invoke dict_table_add_system_columns() and if needed, set can_be_evicted. dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the system columns (which will now exist in the dict_table_t object already at this point). dict_create_table_step(): Expect the callers to invoke dict_table_add_system_columns(). pars_create_table(): Before creating the table creation execution graph, invoke dict_table_add_system_columns(). row_create_table_for_mysql(): Expect all callers to invoke dict_table_add_system_columns(). create_index_dict(): Replaces row_merge_create_index_graph(). innodb_update_n_cols(): Renamed from innobase_update_n_virtual(). Call my_error() if an error occurs. btr_cur_instant_init(), btr_cur_instant_init_low(), btr_cur_instant_root_init(): Load additional metadata from the clustered index and set dict_index_t::n_core_null_bytes. This is invoked when table metadata is first loaded into the data dictionary. dict_boot(): Initialize n_core_null_bytes for the four hard-coded dictionary tables. dict_create_index_step(): Initialize n_core_null_bytes. This is executed as part of CREATE TABLE. dict_index_build_internal_clust(): Initialize n_core_null_bytes to NO_CORE_NULL_BYTES if table->supports_instant(). row_create_index_for_mysql(): Initialize n_core_null_bytes for CREATE TEMPORARY TABLE. commit_cache_norebuild(): Call the code to rename or enlarge columns in the cache only if instant ADD COLUMN is not being used. (Instant ADD COLUMN would copy all column metadata from instant_table to old_table, including the names and lengths.) PAGE_INSTANT: A new 13-bit field for storing dict_index_t::n_core_fields. This is repurposing the 16-bit field PAGE_DIRECTION, of which only the least significant 3 bits were used. The original byte containing PAGE_DIRECTION will be accessible via the new constant PAGE_DIRECTION_B. page_get_instant(), page_set_instant(): Accessors for the PAGE_INSTANT. page_ptr_get_direction(), page_get_direction(), page_ptr_set_direction(): Accessors for PAGE_DIRECTION. page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION. page_direction_increment(): Increment PAGE_N_DIRECTION and set PAGE_DIRECTION. rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes, and assume that heap_no is always set. Initialize all dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the record contains fewer fields. rec_offs_make_valid(): Add the parameter 'leaf'. rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on the core fields. Instant ADD COLUMN only applies to the clustered index, and we should never build a search key that has more than the PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are always present. dict_index_build_data_tuple(): Remove assertions that would be duplicated in rec_copy_prefix_to_dtuple(). rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number of fields is between n_core_fields and n_fields. cmp_rec_rec_with_match(): Implement the comparison between two MIN_REC_FLAG records. trx_t::in_rollback: Make the field available in non-debug builds. trx_start_for_ddl_low(): Remove dangerous error-tolerance. A dictionary transaction must be flagged as such before it has generated any undo log records. 
This is because trx_undo_assign_undo() will mark the transaction as a dictionary transaction in the undo log header right before the very first undo log record is being written. btr_index_rec_validate(): Account for instant ADD COLUMN row_undo_ins_remove_clust_rec(): On the rollback of an insert into SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the last column from the table and the clustered index. row_search_on_row_ref(), row_undo_mod_parse_undo_rec(), row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default row' as a special case. dtuple_t::trim(index): Omit a redundant suffix of an index tuple right before insert or update. After instant ADD COLUMN, if the last fields of a clustered index tuple match the 'default row', there is no need to store them. While trimming the entry, we must hold a page latch, so that the table cannot be emptied and the 'default row' be deleted. btr_cur_optimistic_update(), btr_cur_pessimistic_update(), row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low(): Invoke dtuple_t::trim() if needed. row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling row_ins_clust_index_entry_low(). rec_get_converted_size(), rec_get_converted_size_comp(): Allow the number of fields to be between n_core_fields and n_fields. Do not support infimum,supremum. They are never supposed to be stored in dtuple_t, because page creation nowadays uses a lower-level method for initializing them. rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the number of fields. btr_cur_trim(): In an update, trim the index entry as needed. For the 'default row', handle rollback specially. For user records, omit fields that match the 'default row'. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip locking and adaptive hash index for the 'default row'. row_log_table_apply_convert_mrec(): Replace 'default row' values if needed. In the temporary file that is applied by row_log_table_apply(), we must identify whether the records contain the extra header for instantly added columns. For now, we will allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE records when the source table has been subject to instant ADD COLUMN. The ROW_T_DELETE records are fine, as they will be converted and will only contain 'core' columns (PRIMARY KEY and some system columns) that are converted from dtuple_t. rec_get_converted_size_temp(), rec_init_offsets_temp(), rec_convert_dtuple_to_temp(): Add the parameter 'status'. REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED: An info_bits constant for distinguishing the 'default row' record. rec_comp_status_t: An enum of the status bit values. rec_leaf_format: An enum that replaces the bool parameter of rec_init_offsets_comp_ordinary().
8 years ago
MDEV-13564 Mariabackup does not work with TRUNCATE Implement undo tablespace truncation via normal redo logging. Implement TRUNCATE TABLE as a combination of RENAME to #sql-ib name, CREATE, and DROP. Note: Orphan #sql-ib*.ibd may be left behind if MariaDB Server 10.2 is killed before the DROP operation is committed. If MariaDB Server 10.2 is killed during TRUNCATE, it is also possible that the old table was renamed to #sql-ib*.ibd but the data dictionary will refer to the table using the original name. In MariaDB Server 10.3, RENAME inside InnoDB is transactional, and #sql-* tables will be dropped on startup. So, this new TRUNCATE will be fully crash-safe in 10.3. ha_mroonga::wrapper_truncate(): Pass table options to the underlying storage engine, now that ha_innobase::truncate() will need them. rpl_slave_state::truncate_state_table(): Before truncating mysql.gtid_slave_pos, evict any cached table handles from the table definition cache, so that there will be no stale references to the old table after truncating. == TRUNCATE TABLE == WL#6501 in MySQL 5.7 introduced separate log files for implementing atomic and crash-safe TRUNCATE TABLE, instead of using the InnoDB undo and redo log. Some convoluted logic was added to the InnoDB crash recovery, and some extra synchronization (including a redo log checkpoint) was introduced to make this work. This synchronization has caused performance problems and race conditions, and the extra log files cannot be copied or applied by external backup programs. In order to support crash-upgrade from MariaDB 10.2, we will keep the logic for parsing and applying the extra log files, but we will no longer generate those files in TRUNCATE TABLE. A prerequisite for crash-safe TRUNCATE is a crash-safe RENAME TABLE (with full redo and undo logging and proper rollback). This will be implemented in MDEV-14717. ha_innobase::truncate(): Invoke RENAME, create(), delete_table(). Because RENAME cannot be fully rolled back before MariaDB 10.3 due to missing undo logging, add some explicit rename-back in case the operation fails. ha_innobase::delete(): Introduce a variant that takes sqlcom as a parameter. In TRUNCATE TABLE, we do not want to touch any FOREIGN KEY constraints. ha_innobase::create(): Add the parameters file_per_table, trx. In TRUNCATE, the new table must be created in the same transaction that renames the old table. create_table_info_t::create_table_info_t(): Add the parameters file_per_table, trx. row_drop_table_for_mysql(): Replace a bool parameter with sqlcom. row_drop_table_after_create_fail(): New function, wrapping row_drop_table_for_mysql(). dict_truncate_index_tree_in_mem(), fil_truncate_tablespace(), fil_prepare_for_truncate(), fil_reinit_space_header_for_table(), row_truncate_table_for_mysql(), TruncateLogger, row_truncate_prepare(), row_truncate_rollback(), row_truncate_complete(), row_truncate_fts(), row_truncate_update_system_tables(), row_truncate_foreign_key_checks(), row_truncate_sanity_checks(): Remove. row_upd_check_references_constraints(): Remove a check for TRUNCATE, now that the table is no longer truncated in place. The new test innodb.truncate_foreign uses DEBUG_SYNC to cover some race-condition like scenarios. The test innodb-innodb.truncate does not use any synchronization. We add a redo log subformat to indicate backup-friendly format. MariaDB 10.4 will remove support for the old TRUNCATE logging, so crash-upgrade from old 10.2 or 10.3 to 10.4 will involve limitations. 
== Undo tablespace truncation == MySQL 5.7 implements undo tablespace truncation. It is only possible when innodb_undo_tablespaces is set to at least 2. The logging is implemented similar to the WL#6501 TRUNCATE, that is, using separate log files and a redo log checkpoint. We can simply implement undo tablespace truncation within a single mini-transaction that reinitializes the undo log tablespace file. Unfortunately, due to the redo log format of some operations, currently, the total redo log written by undo tablespace truncation will be more than the combined size of the truncated undo tablespace. It should be acceptable to have a little more than 1 megabyte of log in a single mini-transaction. This will be fixed in MDEV-17138 in MariaDB Server 10.4. recv_sys_t: Add truncated_undo_spaces[] to remember for which undo tablespaces a MLOG_FILE_CREATE2 record was seen. namespace undo: Remove some unnecessary declarations. fil_space_t::is_being_truncated: Document that this flag now only applies to undo tablespaces. Remove some references. fil_space_t::is_stopping(): Do not refer to is_being_truncated. This check is for tablespaces of tables. Potentially used tablespaces are never truncated any more. buf_dblwr_process(): Suppress the out-of-bounds warning for undo tablespaces. fil_truncate_log(): Write a MLOG_FILE_CREATE2 with a nonzero page number (new size of the tablespace in pages) to inform crash recovery that the undo tablespace size has been reduced. fil_op_write_log(): Relax assertions, so that MLOG_FILE_CREATE2 can be written for undo tablespaces (without .ibd file suffix) for a nonzero page number. os_file_truncate(): Add the parameter allow_shrink=false so that undo tablespaces can actually be shrunk using this function. fil_name_parse(): For undo tablespace truncation, buffer MLOG_FILE_CREATE2 in truncated_undo_spaces[]. recv_read_in_area(): Avoid reading pages for which no redo log records remain buffered, after recv_addr_trim() removed them. trx_rseg_header_create(): Add a FIXME comment that we could write much less redo log. trx_undo_truncate_tablespace(): Reinitialize the undo tablespace in a single mini-transaction, which will be flushed to the redo log before the file size is trimmed. recv_addr_trim(): Discard any redo logs for pages that were logged after the new end of a file, before the truncation LSN. If the rec_list becomes empty, reduce n_addrs. After removing any affected records, actually truncate the file. recv_apply_hashed_log_recs(): Invoke recv_addr_trim() right before applying any log records. The undo tablespace files must be open at this point. buf_flush_or_remove_pages(), buf_flush_dirty_pages(), buf_LRU_flush_or_remove_pages(): Add a parameter for specifying the number of the first page to flush or remove (default 0). trx_purge_initiate_truncate(): Remove the log checkpoints, the extra logging, and some unnecessary crash points. Merge the code from trx_undo_truncate_tablespace(). First, flush all to-be-discarded pages (beyond the new end of the file), then trim the space->size to make the page allocation deterministic. At the only remaining crash injection point, flush the redo log, so that the recovery can be tested.
7 years ago
11 years ago
11 years ago
11 years ago
11 years ago
12 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
MDEV-11369 Instant ADD COLUMN for InnoDB

For InnoDB tables, adding, dropping and reordering columns has required
a rebuild of the table and all its indexes. Since MySQL 5.6 (and
MariaDB 10.0) this has been supported online (LOCK=NONE), allowing
concurrent modification of the tables.

This work revises the InnoDB ROW_FORMAT=REDUNDANT, ROW_FORMAT=COMPACT
and ROW_FORMAT=DYNAMIC so that columns can be appended instantaneously,
with only minor changes performed to the table structure. The counter
innodb_instant_alter_column in INFORMATION_SCHEMA.GLOBAL_STATUS is
incremented whenever a table rebuild operation is converted into an
instant ADD COLUMN operation.

ROW_FORMAT=COMPRESSED tables will not support instant ADD COLUMN.

Some usability limitations will be addressed in subsequent work:

MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT
MDEV-14016 Allow instant ADD COLUMN, ADD INDEX, LOCK=NONE

The format of the clustered index (PRIMARY KEY) is changed as follows:

(1) The FIL_PAGE_TYPE of the root page will be FIL_PAGE_TYPE_INSTANT,
and a new field PAGE_INSTANT will contain the original number of fields
in the clustered index ('core' fields). If instant ADD COLUMN has not
been used or the table becomes empty, or the very first instant ADD
COLUMN operation is rolled back, the fields PAGE_INSTANT and
FIL_PAGE_TYPE will be reset to 0 and FIL_PAGE_INDEX.

(2) A special 'default row' record is inserted into the leftmost leaf,
between the page infimum and the first user record. This record is
distinguished by the REC_INFO_MIN_REC_FLAG, and it is otherwise in the
same format as records that contain values for the instantly added
columns. This 'default row' always has the same number of fields as
the clustered index according to the table definition. The values of
'core' fields are to be ignored. For other fields, the 'default row'
will contain the default values as they were during the ALTER TABLE
statement. (If the column default values are changed later, those
values will only be stored in the .frm file. The 'default row' will
contain the original evaluated values, which must be the same for
every row.) The 'default row' must be completely hidden from
higher-level access routines. Assertions have been added to ensure
that no 'default row' is ever present in the adaptive hash index or in
locked records. The 'default row' is never delete-marked.

(3) In clustered index leaf page records, the number of fields must
reside between the number of 'core' fields (dict_index_t::n_core_fields
introduced in this work) and dict_index_t::n_fields. If the number of
fields is less than dict_index_t::n_fields, the missing fields are
replaced with the column value of the 'default row'. Note: The number
of fields in the record may shrink if some of the last instantly added
columns are updated to the value that is in the 'default row'. The
function btr_cur_trim() implements this 'compression' on update and
rollback; dtuple::trim() implements it on insert.

(4) In ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC records, the new
status value REC_STATUS_COLUMNS_ADDED will indicate the presence of a
new record header that will encode n_fields-n_core_fields-1 in 1 or 2
bytes. (In ROW_FORMAT=REDUNDANT records, the record header always
explicitly encodes the number of fields.)

We introduce the undo log record type TRX_UNDO_INSERT_DEFAULT for
covering the insert of the 'default row' record when instant ADD
COLUMN is used for the first time. Subsequent instant ADD COLUMN can
use TRX_UNDO_UPD_EXIST_REC.

This is joint work with Vin Chen (陈福荣) from Tencent. The design that
was discussed in April 2017 would not have allowed import or export of
data files, because instead of the 'default row' it would have
introduced a data dictionary table. The test rpl.rpl_alter_instant is
exactly as contributed in pull request #408. The test
innodb.instant_alter is based on a contributed test. The redo log
record format changes for ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPACT
are as contributed. (With this change present, crash recovery from
MariaDB 10.3.1 will fail in spectacular ways!) Also the semantics of
higher-level redo log records that modify the PAGE_INSTANT field is
changed. The redo log format version identifier was already changed to
LOG_HEADER_FORMAT_CURRENT=103 in MariaDB 10.3.1.

Everything else has been rewritten by me. Thanks to Elena Stepanova,
the code has been tested extensively.

When rolling back an instant ADD COLUMN operation, we must empty the
PAGE_FREE list after deleting or shortening the 'default row' record,
by calling either btr_page_empty() or btr_page_reorganize(). We must
know the size of each entry in the PAGE_FREE list. If rollback left a
freed copy of the 'default row' in the PAGE_FREE list, we would be
unable to determine its size (if it is in ROW_FORMAT=COMPACT or
ROW_FORMAT=DYNAMIC) because it would contain more fields than the
rolled-back definition of the clustered index.

UNIV_SQL_DEFAULT: A new special constant that designates an instantly
added column that is not present in the clustered index record.

len_is_stored(): Check if a length is an actual length. There are two
magic length values: UNIV_SQL_DEFAULT, UNIV_SQL_NULL.

dict_col_t::def_val: The 'default row' value of the column. If the
column is not added instantly, def_val.len will be UNIV_SQL_DEFAULT.

dict_col_t: Add the accessors is_virtual(), is_nullable(),
is_instant(), instant_value().

dict_col_t::remove_instant(): Remove the 'instant ADD' status of a
column.

dict_col_t::name(const dict_table_t& table): Replaces
dict_table_get_col_name().

dict_index_t::n_core_fields: The original number of fields. For
secondary indexes and if instant ADD COLUMN has not been used, this
will be equal to dict_index_t::n_fields.

dict_index_t::n_core_null_bytes: Number of bytes needed to represent
the null flags; usually equal to UT_BITS_IN_BYTES(n_nullable).

dict_index_t::NO_CORE_NULL_BYTES: Magic value signalling that
n_core_null_bytes was not initialized yet from the clustered index
root page.

dict_index_t: Add the accessors is_instant(), is_clust(),
get_n_nullable(), instant_field_value().

dict_index_t::instant_add_field(): Adjust clustered index metadata for
instant ADD COLUMN.

dict_index_t::remove_instant(): Remove the 'instant ADD' status of a
clustered index when the table becomes empty, or the very first
instant ADD COLUMN operation is rolled back.

dict_table_t: Add the accessors is_instant(), is_temporary(),
supports_instant().

dict_table_t::instant_add_column(): Adjust metadata for instant ADD
COLUMN.

dict_table_t::rollback_instant(): Adjust metadata on the rollback of
instant ADD COLUMN.

prepare_inplace_alter_table_dict(): First create the ctx->new_table,
and only then decide if the table really needs to be rebuilt. We must
split the creation of table or index metadata from the creation of
the dictionary table records and the creation of the data. In this
way, we can transform a table-rebuilding operation into an instant
ADD COLUMN operation. Dictionary objects will only be added to cache
when table rebuilding or index creation is needed. The
ctx->instant_table will never be added to cache.

dict_table_t::add_to_cache(): Modified and renamed from
dict_table_add_to_cache(). Do not modify the table metadata. Let the
callers invoke dict_table_add_system_columns() and if needed, set
can_be_evicted.

dict_create_sys_tables_tuple(), dict_create_table_step(): Omit the
system columns (which will now exist in the dict_table_t object
already at this point).

dict_create_table_step(): Expect the callers to invoke
dict_table_add_system_columns().

pars_create_table(): Before creating the table creation execution
graph, invoke dict_table_add_system_columns().

row_create_table_for_mysql(): Expect all callers to invoke
dict_table_add_system_columns().

create_index_dict(): Replaces row_merge_create_index_graph().

innodb_update_n_cols(): Renamed from innobase_update_n_virtual().
Call my_error() if an error occurs.

btr_cur_instant_init(), btr_cur_instant_init_low(),
btr_cur_instant_root_init(): Load additional metadata from the
clustered index and set dict_index_t::n_core_null_bytes. This is
invoked when table metadata is first loaded into the data dictionary.

dict_boot(): Initialize n_core_null_bytes for the four hard-coded
dictionary tables.

dict_create_index_step(): Initialize n_core_null_bytes. This is
executed as part of CREATE TABLE.

dict_index_build_internal_clust(): Initialize n_core_null_bytes to
NO_CORE_NULL_BYTES if table->supports_instant().

row_create_index_for_mysql(): Initialize n_core_null_bytes for
CREATE TEMPORARY TABLE.

commit_cache_norebuild(): Call the code to rename or enlarge columns
in the cache only if instant ADD COLUMN is not being used. (Instant
ADD COLUMN would copy all column metadata from instant_table to
old_table, including the names and lengths.)

PAGE_INSTANT: A new 13-bit field for storing
dict_index_t::n_core_fields. This is repurposing the 16-bit field
PAGE_DIRECTION, of which only the least significant 3 bits were used.
The original byte containing PAGE_DIRECTION will be accessible via the
new constant PAGE_DIRECTION_B.

page_get_instant(), page_set_instant(): Accessors for the
PAGE_INSTANT.

page_ptr_get_direction(), page_get_direction(),
page_ptr_set_direction(): Accessors for PAGE_DIRECTION.

page_direction_reset(): Reset PAGE_DIRECTION, PAGE_N_DIRECTION.

page_direction_increment(): Increment PAGE_N_DIRECTION and set
PAGE_DIRECTION.

rec_get_offsets(): Use the 'leaf' parameter for non-debug purposes,
and assume that heap_no is always set. Initialize all
dict_index_t::n_fields for ROW_FORMAT=REDUNDANT records, even if the
record contains fewer fields.

rec_offs_make_valid(): Add the parameter 'leaf'.

rec_copy_prefix_to_dtuple(): Assert that the tuple is only built on
the core fields. Instant ADD COLUMN only applies to the clustered
index, and we should never build a search key that has more than the
PRIMARY KEY and possibly DB_TRX_ID,DB_ROLL_PTR. All these columns are
always present.

dict_index_build_data_tuple(): Remove assertions that would be
duplicated in rec_copy_prefix_to_dtuple().

rec_init_offsets(): Support ROW_FORMAT=REDUNDANT records whose number
of fields is between n_core_fields and n_fields.

cmp_rec_rec_with_match(): Implement the comparison between two
MIN_REC_FLAG records.

trx_t::in_rollback: Make the field available in non-debug builds.

trx_start_for_ddl_low(): Remove dangerous error-tolerance. A
dictionary transaction must be flagged as such before it has generated
any undo log records. This is because trx_undo_assign_undo() will mark
the transaction as a dictionary transaction in the undo log header
right before the very first undo log record is being written.

btr_index_rec_validate(): Account for instant ADD COLUMN.

row_undo_ins_remove_clust_rec(): On the rollback of an insert into
SYS_COLUMNS, revert instant ADD COLUMN in the cache by removing the
last column from the table and the clustered index.

row_search_on_row_ref(), row_undo_mod_parse_undo_rec(),
row_undo_mod(), trx_undo_update_rec_get_update(): Handle the 'default
row' as a special case.

dtuple_t::trim(index): Omit a redundant suffix of an index tuple right
before insert or update. After instant ADD COLUMN, if the last fields
of a clustered index tuple match the 'default row', there is no need
to store them. While trimming the entry, we must hold a page latch, so
that the table cannot be emptied and the 'default row' be deleted.

btr_cur_optimistic_update(), btr_cur_pessimistic_update(),
row_upd_clust_rec_by_insert(), row_ins_clust_index_entry_low():
Invoke dtuple_t::trim() if needed.

row_ins_clust_index_entry(): Restore dtuple_t::n_fields after calling
row_ins_clust_index_entry_low().

rec_get_converted_size(), rec_get_converted_size_comp(): Allow the
number of fields to be between n_core_fields and n_fields. Do not
support infimum,supremum. They are never supposed to be stored in
dtuple_t, because page creation nowadays uses a lower-level method for
initializing them.

rec_convert_dtuple_to_rec_comp(): Assign the status bits based on the
number of fields.

btr_cur_trim(): In an update, trim the index entry as needed. For the
'default row', handle rollback specially. For user records, omit
fields that match the 'default row'.

btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): Skip
locking and adaptive hash index for the 'default row'.

row_log_table_apply_convert_mrec(): Replace 'default row' values if
needed. In the temporary file that is applied by
row_log_table_apply(), we must identify whether the records contain
the extra header for instantly added columns. For now, we will
allocate an additional byte for this for ROW_T_INSERT and ROW_T_UPDATE
records when the source table has been subject to instant ADD COLUMN.
The ROW_T_DELETE records are fine, as they will be converted and will
only contain 'core' columns (PRIMARY KEY and some system columns) that
are converted from dtuple_t.

rec_get_converted_size_temp(), rec_init_offsets_temp(),
rec_convert_dtuple_to_temp(): Add the parameter 'status'.

REC_INFO_DEFAULT_ROW = REC_INFO_MIN_REC_FLAG | REC_STATUS_COLUMNS_ADDED:
An info_bits constant for distinguishing the 'default row' record.

rec_comp_status_t: An enum of the status bit values.

rec_leaf_format: An enum that replaces the bool parameter of
rec_init_offsets_comp_ordinary().
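To make the "1 or 2 bytes" encoding in (4) concrete, here is a small
self-contained sketch (the helper names are invented; this is not the
actual InnoDB record code) of one way to store
n_fields - n_core_fields - 1 in one byte for small counts and two
bytes otherwise:

#include <cassert>
#include <cstdint>

/* Sketch only: one byte with the high bit clear for counts below 128,
otherwise two bytes with the high bit of the first byte set and 15
bits of payload. */
static unsigned n_add_encode(uint16_t n, uint8_t* b)
{
	assert(n < (1U << 15));
	if (n < 0x80) {
		b[0] = uint8_t(n);		/* 0xxxxxxx: one-byte form */
		return 1;
	}
	b[0] = uint8_t(0x80 | (n >> 8));	/* 1xxxxxxx: two-byte form */
	b[1] = uint8_t(n);			/* low 8 bits */
	return 2;
}

static uint16_t n_add_decode(const uint8_t* b, unsigned* len)
{
	if (!(b[0] & 0x80)) {
		*len = 1;
		return b[0];
	}
	*len = 2;
	return uint16_t(((b[0] & 0x7f) << 8) | b[1]);
}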
MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY

If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would
spend a lot of time rolling back writes to the intermediate copy of
the table. To reduce the amount of busy work done, a work-around was
introduced in commit fd069e2bb36a3c1c1f26d65dd298b07e6d83ac8b in MySQL
4.1.8 and 5.0.2, to commit the transaction after every 10,000 inserted
rows.

A proper fix would have been to disable the undo logging altogether
and to simply drop the intermediate copy of the table on subsequent
server startup. This is what happens in MariaDB 10.3 with
MDEV-14717,MDEV-14585. In MariaDB 10.2, the intermediate copy of the
table would be left behind with a name starting with the string #sql.

This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
contributed by jixianliang <271365745@qq.com>.

Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation
InnoDB must for now keep the undo logging enabled, so that the latest
row can be rolled back in case of an error. In Galera cluster, the
LOAD DATA statement will retain the existing behaviour and commit the
transaction after every 10,000 rows if the parameter
wsrep_load_data_splitting=ON is set. The logic to do so (the
wsrep_load_data_split() function and the call
handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work by Ji
Xianliang and Marko Mäkelä.

The original fix:

Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
Date:   Wed Dec 2 16:09:15 2015 +0530

Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY

Problem:
During ALTER TABLE, we commit and restart the transaction for every
10,000 rows, so that the rollback after recovery would not take so
long.

Fix:
Suppress the undo logging during copy alter operation. If fts_index is
present then insert directly into fts auxiliary table rather than
doing at commit time.

ha_innobase::num_write_row: Remove the variable.

ha_innobase::write_row(): Remove the hack for committing every 10000
rows.

row_lock_table_for_mysql(): Remove the extra 2 parameters.

lock_get_src_table(), lock_is_table_exclusive(): Remove.

Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
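As a compilable toy illustration of the two strategies described above
(all names are invented; this is not the real handler code): the
removed work-around committed and restarted the transaction every
10,000 rows, while the fix keeps a single transaction and suppresses
undo logging, except under ALTER IGNORE:

#include <cstdint>

struct copy_ctx { uint64_t rows = 0; bool undo_logging = true; };

/* Removed work-around: bound post-crash rollback time by committing
and restarting the transaction every 10,000 inserted rows. */
static void copy_one_row_old(copy_ctx& c)
{
	if (++c.rows % 10000 == 0) {
		/* commit; start a new transaction */
	}
}

/* The fix: one transaction for the whole copy, with undo logging
disabled; after a crash, the intermediate #sql table is simply
dropped. ALTER IGNORE still needs undo logging, so that the latest
row can be rolled back on error. */
static void copy_begin_new(copy_ctx& c, bool alter_ignore)
{
	c.undo_logging = alter_ignore;
}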
MDEV-22782 AddressSanitizer race condition in trx_free()

In trx_free() we used to declare the entire trx_t unaccessible and
then declare that some data members are accessible. This involves a
race condition with other threads that may concurrently access the
data members that must remain accessible. One type of error is
"AddressSanitizer: unknown-crash", whose exact cause we have not
determined. Another type of error (reported in MDEV-23472) is
"use-after-poison", where the reported shadow bytes would in fact be
00, indicating that the memory was no longer poisoned.

The poison-access-unpoison race condition was confirmed by "rr
replay". We eliminate the race condition by invoking MEM_NOACCESS on
each individual data member of trx_t before freeing the memory to the
pool. The memory would not be unpoisoned until the pool is freed or
the memory is being reused for another allocation.

trx_t::free(): Replaces trx_free().

trx_t::active_commit_ordered: Changed to bool, so that MEM_NOACCESS
can be invoked. Removed some accessor functions.

Pool: Remove all MEM_ instrumentation.

TrxFactory: Move the MEM_ instrumentation from Pool.

TrxFactory::debug(): Removed. Moved to trx_t::free(). Because the
memory was already marked unaccessible in trx_t::free(), the
Factory::debug() call in Pool::putl() would be unable to access it.

trx_allocate_for_background(): Replaces trx_create_low().

trx_t::free(): Perform all consistency checks while avoiding
duplication, and declare most data members unaccessible.
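A self-contained sketch of the ordering problem and the fix
(MEM_NOACCESS below is a no-op stand-in for the real instrumentation
macro, and trx_like_t is an invented stand-in for trx_t):

#include <cstddef>

/* Stand-in for the real macro, which wraps
ASAN_POISON_MEMORY_REGION or VALGRIND_MAKE_MEM_NOACCESS. */
#define MEM_NOACCESS(addr, size) ((void) (addr), (void) (size))

struct trx_like_t {
	int	state;		/* may be read by concurrent threads */
	char*	detailed_error;	/* private to the owning thread */
	void	free();
};

void trx_like_t::free()
{
	/* Racy (old) order: MEM_NOACCESS on the whole object, then
	un-poisoning the shared members; a concurrent reader of
	'state' can hit the poisoned window in between. Race-free
	(new) order: poison only the members that no other thread
	may access. */
	MEM_NOACCESS(&detailed_error, sizeof detailed_error);
	/* 'state' deliberately stays accessible until the pool
	frees or reuses this object. */
}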
/*****************************************************************************

Copyright (c) 2011, 2018, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2016, 2021, MariaDB Corporation.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

*****************************************************************************/

/**************************************************//**
@file fts/fts0fts.cc
Full Text Search interface
***********************************************************************/

#include "trx0roll.h"
#include "row0mysql.h"
#include "row0upd.h"
#include "dict0types.h"
#include "dict0stats_bg.h"
#include "row0sel.h"
#include "fts0fts.h"
#include "fts0priv.h"
#include "fts0types.h"
#include "fts0types.ic"
#include "fts0vlc.ic"
#include "fts0plugin.h"
#include "dict0priv.h"
#include "dict0stats.h"
#include "btr0pcur.h"
#include "sync0sync.h"

static const ulint FTS_MAX_ID_LEN = 32;

/** Column name from the FTS config table */
#define FTS_MAX_CACHE_SIZE_IN_MB	"cache_size_in_mb"

/** Verify whether an aux table name is an obsolete table
by looking up the keyword in the obsolete table names */
#define FTS_IS_OBSOLETE_AUX_TABLE(table_name)			\
	(strstr((table_name), "DOC_ID") != NULL			\
	 || strstr((table_name), "ADDED") != NULL		\
	 || strstr((table_name), "STOPWORDS") != NULL)

/** This is the maximum FTS cache for each table and is a
configurable variable */
ulong	fts_max_cache_size;

/** Whether the total memory used for FTS cache is exhausted, and we will
need a sync to free some memory */
bool	fts_need_sync = false;

/** Variable specifying the total memory allocated for FTS cache */
ulong	fts_max_total_cache_size;

/** This is the FTS result cache limit for each query and is a
configurable variable */
size_t	fts_result_cache_limit;

/** Variable specifying the maximum FTS max token size */
ulong	fts_max_token_size;

/** Variable specifying the minimum FTS max token size */
ulong	fts_min_token_size;

// FIXME: testing
static time_t	elapsed_time;
static ulint	n_nodes;

#ifdef FTS_CACHE_SIZE_DEBUG
/** The cache size permissible lower limit (1K) */
static const ulint FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB = 1;

/** The cache size permissible upper limit (1G) */
static const ulint FTS_CACHE_SIZE_UPPER_LIMIT_IN_MB = 1024;
#endif

/** Time to sleep after DEADLOCK error before retrying operation. */
static const ulint FTS_DEADLOCK_RETRY_WAIT = 100000;

/** InnoDB default stopword list:
There are different versions of stopwords; the stop words listed
below come from the "Google Stopword" list. Reference:
http://meta.wikimedia.org/wiki/Stop_word_list/google_stop_word_list.
The final version of the InnoDB default stopword list is still
pending a decision. */
const char *fts_default_stopword[] =
{
	"a",
	"about",
	"an",
	"are",
	"as",
	"at",
	"be",
	"by",
	"com",
	"de",
	"en",
	"for",
	"from",
	"how",
	"i",
	"in",
	"is",
	"it",
	"la",
	"of",
	"on",
	"or",
	"that",
	"the",
	"this",
	"to",
	"was",
	"what",
	"when",
	"where",
	"who",
	"will",
	"with",
	"und",
	"the",
	"www",
	NULL
};

/** For storing table info when checking for orphaned tables. */
struct fts_aux_table_t {
	table_id_t	id;		/*!< Table id */
	table_id_t	parent_id;	/*!< Parent table id */
	table_id_t	index_id;	/*!< Table FT index id */
	char*		name;		/*!< Name of the table */
};

/** FTS auxiliary table suffixes that are common to all FT indexes. */
const char* fts_common_tables[] = {
	"BEING_DELETED",
	"BEING_DELETED_CACHE",
	"CONFIG",
	"DELETED",
	"DELETED_CACHE",
	NULL
};

/** FTS auxiliary INDEX split intervals. */
const fts_index_selector_t fts_index_selector[] = {
	{ 9, "INDEX_1" },
	{ 65, "INDEX_2" },
	{ 70, "INDEX_3" },
	{ 75, "INDEX_4" },
	{ 80, "INDEX_5" },
	{ 85, "INDEX_6" },
	{ 0 , NULL }
};
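/* Illustrative sketch, not part of the original file: how a selector
table like fts_index_selector above can route a token to one of the
auxiliary INDEX_1..INDEX_6 tables by the numeric value derived from
the token's first character. The helper name is invented; the actual
lookup logic lives in fts0types.ic. */
static inline
const char*
fts_select_suffix_sketch(ulint first_char_value)
{
	for (ulint i = 0; fts_index_selector[i].value; ++i) {
		if (first_char_value <= fts_index_selector[i].value) {
			return(fts_index_selector[i].suffix);
		}
	}

	/* Values above the last threshold map to the last aux table. */
	return("INDEX_6");
}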
/** Default config values for FTS indexes on a table. */
static const char* fts_config_table_insert_values_sql =
	"BEGIN\n"
	"\n"
	"INSERT INTO $config_table VALUES('"
		FTS_MAX_CACHE_SIZE_IN_MB "', '256');\n"
	""
	"INSERT INTO $config_table VALUES('"
		FTS_OPTIMIZE_LIMIT_IN_SECS "', '180');\n"
	""
	"INSERT INTO $config_table VALUES ('"
		FTS_SYNCED_DOC_ID "', '0');\n"
	""
	"INSERT INTO $config_table VALUES ('"
		FTS_TOTAL_DELETED_COUNT "', '0');\n"
	"" /* Note: 0 == FTS_TABLE_STATE_RUNNING */
	"INSERT INTO $config_table VALUES ('"
		FTS_TABLE_STATE "', '0');\n";

/** FTS tokenize parameter for plugin parser */
struct fts_tokenize_param_t {
	fts_doc_t*	result_doc;	/*!< Result doc for tokens */
	ulint		add_pos;	/*!< Added position for tokens */
};

/** Run SYNC on the table, i.e., write out data from the cache to the
FTS auxiliary INDEX table and clear the cache at the end.
@param[in,out]	sync		sync state
@param[in]	unlock_cache	whether unlock cache lock when write node
@param[in]	wait		whether wait when a sync is in progress
@return DB_SUCCESS if all OK */
static
dberr_t
fts_sync(
	fts_sync_t*	sync,
	bool		unlock_cache,
	bool		wait);

/****************************************************************//**
Release all resources held by the words rb tree, e.g., the node ilist. */
static
void
fts_words_free(
/*===========*/
	ib_rbt_t*	words)	/*!< in: rb tree of words */
	MY_ATTRIBUTE((nonnull));

#ifdef FTS_CACHE_SIZE_DEBUG
/****************************************************************//**
Read the max cache size parameter from the config table. */
static
void
fts_update_max_cache_size(
/*======================*/
	fts_sync_t*	sync);	/*!< in: sync state */
#endif

/*********************************************************************//**
This function fetches the document just inserted right before we
commit the transaction, tokenizes the inserted text data and inserts
it into the FTS auxiliary table and its cache.
@return TRUE if successful */
static
ulint
fts_add_doc_by_id(
/*==============*/
	fts_trx_table_t*ftt,		/*!< in: FTS trx table */
	doc_id_t	doc_id,		/*!< in: doc id */
	ib_vector_t*	fts_indexes MY_ATTRIBUTE((unused)));
					/*!< in: affected fts indexes */

/******************************************************************//**
Update the last document id. This function could create a new
transaction to update the last document id.
@return DB_SUCCESS if OK */
static
dberr_t
fts_update_sync_doc_id(
/*===================*/
	const dict_table_t*	table,	/*!< in: table */
	doc_id_t		doc_id,	/*!< in: last document id */
	trx_t*			trx)	/*!< in: update trx, or NULL */
	MY_ATTRIBUTE((nonnull(1)));

/** Tokenize a document.
@param[in,out]	doc	document to tokenize
@param[out]	result	tokenization result
@param[in]	parser	pluggable parser */
static
void
fts_tokenize_document(
	fts_doc_t*		doc,
	fts_doc_t*		result,
	st_mysql_ftparser*	parser);

/** Continue to tokenize a document.
@param[in,out]	doc	document to tokenize
@param[in]	add_pos	add this position to all tokens from this tokenization
@param[out]	result	tokenization result
@param[in]	parser	pluggable parser */
static
void
fts_tokenize_document_next(
	fts_doc_t*		doc,
	ulint			add_pos,
	fts_doc_t*		result,
	st_mysql_ftparser*	parser);

/** Create the vector of fts_get_doc_t instances.
@param[in,out]	cache	fts cache
@return vector of fts_get_doc_t instances */
static
ib_vector_t*
fts_get_docs_create(
	fts_cache_t*	cache);

/** Free the FTS cache.
@param[in,out]	cache	to be freed */
static
void
fts_cache_destroy(fts_cache_t* cache)
{
	rw_lock_free(&cache->lock);
	rw_lock_free(&cache->init_lock);
	mutex_free(&cache->deleted_lock);
	mutex_free(&cache->doc_id_lock);
	os_event_destroy(cache->sync->event);

	if (cache->stopword_info.cached_stopword) {
		rbt_free(cache->stopword_info.cached_stopword);
	}

	if (cache->sync_heap->arg) {
		mem_heap_free(static_cast<mem_heap_t*>(cache->sync_heap->arg));
	}

	mem_heap_free(cache->cache_heap);
}

/** Get a character set based on precise type.
@param prtype precise type
@return the corresponding character set */
UNIV_INLINE
CHARSET_INFO*
fts_get_charset(ulint prtype)
{
#ifdef UNIV_DEBUG
	switch (prtype & DATA_MYSQL_TYPE_MASK) {
	case MYSQL_TYPE_BIT:
	case MYSQL_TYPE_STRING:
	case MYSQL_TYPE_VAR_STRING:
	case MYSQL_TYPE_TINY_BLOB:
	case MYSQL_TYPE_MEDIUM_BLOB:
	case MYSQL_TYPE_BLOB:
	case MYSQL_TYPE_LONG_BLOB:
	case MYSQL_TYPE_VARCHAR:
		break;
	default:
		ut_error;
	}
#endif /* UNIV_DEBUG */

	uint cs_num = (uint) dtype_get_charset_coll(prtype);

	if (CHARSET_INFO* cs = get_charset(cs_num, MYF(MY_WME))) {
		return(cs);
	}

	ib::fatal() << "Unable to find charset-collation " << cs_num;
	return(NULL);
}
/****************************************************************//**
This function loads the default InnoDB stopword list */
static
void
fts_load_default_stopword(
/*======================*/
	fts_stopword_t*	stopword_info)	/*!< in: stopword info */
{
	fts_string_t	str;
	mem_heap_t*	heap;
	ib_alloc_t*	allocator;
	ib_rbt_t*	stop_words;

	allocator = stopword_info->heap;
	heap = static_cast<mem_heap_t*>(allocator->arg);

	if (!stopword_info->cached_stopword) {
		stopword_info->cached_stopword = rbt_create_arg_cmp(
			sizeof(fts_tokenizer_word_t), innobase_fts_text_cmp,
			&my_charset_latin1);
	}

	stop_words = stopword_info->cached_stopword;

	str.f_n_char = 0;

	for (ulint i = 0; fts_default_stopword[i]; ++i) {
		char*			word;
		fts_tokenizer_word_t	new_word;

		/* We are going to duplicate the value below. */
		word = const_cast<char*>(fts_default_stopword[i]);

		new_word.nodes = ib_vector_create(
			allocator, sizeof(fts_node_t), 4);

		str.f_len = ut_strlen(word);
		str.f_str = reinterpret_cast<byte*>(word);

		fts_string_dup(&new_word.text, &str, heap);

		rbt_insert(stop_words, &new_word, &new_word);
	}

	stopword_info->status = STOPWORD_FROM_DEFAULT;
}

/****************************************************************//**
Callback function to read a single stopword value.
@return always returns TRUE */
static
ibool
fts_read_stopword(
/*==============*/
	void*	row,		/*!< in: sel_node_t* */
	void*	user_arg)	/*!< in: pointer to ib_vector_t */
{
	ib_alloc_t*	allocator;
	fts_stopword_t*	stopword_info;
	sel_node_t*	sel_node;
	que_node_t*	exp;
	ib_rbt_t*	stop_words;
	dfield_t*	dfield;
	fts_string_t	str;
	mem_heap_t*	heap;
	ib_rbt_bound_t	parent;

	sel_node = static_cast<sel_node_t*>(row);
	stopword_info = static_cast<fts_stopword_t*>(user_arg);

	stop_words = stopword_info->cached_stopword;
	allocator = static_cast<ib_alloc_t*>(stopword_info->heap);
	heap = static_cast<mem_heap_t*>(allocator->arg);

	exp = sel_node->select_list;

	/* We only need to read the first column */
	dfield = que_node_get_val(exp);

	str.f_n_char = 0;
	str.f_str = static_cast<byte*>(dfield_get_data(dfield));
	str.f_len = dfield_get_len(dfield);

	/* Only create a new node if the value does not already exist */
	if (str.f_len != UNIV_SQL_NULL
	    && rbt_search(stop_words, &parent, &str) != 0) {
		fts_tokenizer_word_t	new_word;

		new_word.nodes = ib_vector_create(
			allocator, sizeof(fts_node_t), 4);

		new_word.text.f_str = static_cast<byte*>(
			mem_heap_alloc(heap, str.f_len + 1));

		memcpy(new_word.text.f_str, str.f_str, str.f_len);

		new_word.text.f_n_char = 0;
		new_word.text.f_len = str.f_len;
		new_word.text.f_str[str.f_len] = 0;

		rbt_insert(stop_words, &new_word, &new_word);
	}

	return(TRUE);
}

/******************************************************************//**
Load user-defined stopwords from the designated user table
@return whether the operation is successful */
static
bool
fts_load_user_stopword(
/*===================*/
	fts_t*		fts,			/*!< in: FTS struct */
	const char*	stopword_table_name,	/*!< in: Stopword table
						name */
	fts_stopword_t*	stopword_info)		/*!< in: Stopword info */
{
	if (!fts->dict_locked) {
		mutex_enter(&dict_sys.mutex);
	}

	/* Validate that the user table exists in the right format */
	bool ret= false;
	stopword_info->charset = fts_valid_stopword_table(stopword_table_name);
	if (!stopword_info->charset) {
cleanup:
		if (!fts->dict_locked) {
			mutex_exit(&dict_sys.mutex);
		}

		return ret;
	}

	trx_t* trx = trx_create();
	trx->op_info = "Load user stopword table into FTS cache";

	if (!stopword_info->cached_stopword) {
		/* Create the stopword RB tree with the stopword column
		charset. All comparison will use this charset */
		stopword_info->cached_stopword = rbt_create_arg_cmp(
			sizeof(fts_tokenizer_word_t), innobase_fts_text_cmp,
			(void*)stopword_info->charset);
	}

	pars_info_t* info = pars_info_create();

	pars_info_bind_id(info, TRUE, "table_stopword", stopword_table_name);

	pars_info_bind_function(info, "my_func", fts_read_stopword,
				stopword_info);

	que_t* graph = fts_parse_sql_no_dict_lock(
		info,
		"DECLARE FUNCTION my_func;\n"
		"DECLARE CURSOR c IS"
		" SELECT value"
		" FROM $table_stopword;\n"
		"BEGIN\n"
		"\n"
		"OPEN c;\n"
		"WHILE 1 = 1 LOOP\n"
		"  FETCH c INTO my_func();\n"
		"  IF c % NOTFOUND THEN\n"
		"    EXIT;\n"
		"  END IF;\n"
		"END LOOP;\n"
		"CLOSE c;");

	for (;;) {
		dberr_t error = fts_eval_sql(trx, graph);

		if (UNIV_LIKELY(error == DB_SUCCESS)) {
			fts_sql_commit(trx);
			stopword_info->status = STOPWORD_USER_TABLE;
			ret = true;
			break;
		} else {
			fts_sql_rollback(trx);

			if (error == DB_LOCK_WAIT_TIMEOUT) {
				ib::warn() << "Lock wait timeout reading user"
					" stopword table. Retrying!";

				trx->error_state = DB_SUCCESS;
			} else {
				ib::error() << "Error '" << error
					<< "' while reading user stopword"
					" table.";
				ret = false;
				break;
			}
		}
	}

	que_graph_free(graph);
	trx->free();
	goto cleanup;
}
  456. /******************************************************************//**
  457. Initialize the index cache. */
  458. static
  459. void
  460. fts_index_cache_init(
  461. /*=================*/
  462. ib_alloc_t* allocator, /*!< in: the allocator to use */
  463. fts_index_cache_t* index_cache) /*!< in: index cache */
  464. {
  465. ulint i;
  466. ut_a(index_cache->words == NULL);
  467. index_cache->words = rbt_create_arg_cmp(
  468. sizeof(fts_tokenizer_word_t), innobase_fts_text_cmp,
  469. (void*) index_cache->charset);
  470. ut_a(index_cache->doc_stats == NULL);
  471. index_cache->doc_stats = ib_vector_create(
  472. allocator, sizeof(fts_doc_stats_t), 4);
  473. for (i = 0; i < FTS_NUM_AUX_INDEX; ++i) {
  474. ut_a(index_cache->ins_graph[i] == NULL);
  475. ut_a(index_cache->sel_graph[i] == NULL);
  476. }
  477. }
/*********************************************************************//**
Initialize FTS cache. */
void
fts_cache_init(
/*===========*/
	fts_cache_t*	cache)	/*!< in: cache to initialize */
{
	ulint	i;

	/* Just to make sure */
	ut_a(cache->sync_heap->arg == NULL);

	cache->sync_heap->arg = mem_heap_create(1024);

	cache->total_size = 0;

	mutex_enter((ib_mutex_t*) &cache->deleted_lock);
	cache->deleted_doc_ids = ib_vector_create(
		cache->sync_heap, sizeof(doc_id_t), 4);
	mutex_exit((ib_mutex_t*) &cache->deleted_lock);

	/* Reset the cache data for all the FTS indexes. */
	for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
		fts_index_cache_t*	index_cache;

		index_cache = static_cast<fts_index_cache_t*>(
			ib_vector_get(cache->indexes, i));

		fts_index_cache_init(cache->sync_heap, index_cache);
	}
}
/****************************************************************//**
Create an FTS cache. */
fts_cache_t*
fts_cache_create(
/*=============*/
	dict_table_t*	table)	/*!< in: table that owns the FTS cache */
{
	mem_heap_t*	heap;
	fts_cache_t*	cache;

	heap = static_cast<mem_heap_t*>(mem_heap_create(512));

	cache = static_cast<fts_cache_t*>(
		mem_heap_zalloc(heap, sizeof(*cache)));

	cache->cache_heap = heap;

	rw_lock_create(fts_cache_rw_lock_key, &cache->lock, SYNC_FTS_CACHE);

	rw_lock_create(
		fts_cache_init_rw_lock_key, &cache->init_lock,
		SYNC_FTS_CACHE_INIT);

	mutex_create(LATCH_ID_FTS_DELETE, &cache->deleted_lock);

	mutex_create(LATCH_ID_FTS_DOC_ID, &cache->doc_id_lock);

	/* This is the heap used to create the cache itself. */
	cache->self_heap = ib_heap_allocator_create(heap);

	/* This is a transient heap, used for storing sync data. */
	cache->sync_heap = ib_heap_allocator_create(heap);
	cache->sync_heap->arg = NULL;

	cache->sync = static_cast<fts_sync_t*>(
		mem_heap_zalloc(heap, sizeof(fts_sync_t)));

	cache->sync->table = table;
	cache->sync->event = os_event_create(0);

	/* Create the index cache vector that will hold the inverted indexes. */
	cache->indexes = ib_vector_create(
		cache->self_heap, sizeof(fts_index_cache_t), 2);

	fts_cache_init(cache);

	cache->stopword_info.cached_stopword = NULL;
	cache->stopword_info.charset = NULL;

	cache->stopword_info.heap = cache->self_heap;

	cache->stopword_info.status = STOPWORD_NOT_INIT;

	return(cache);
}
/*******************************************************************//**
Add a newly created index into the FTS cache. */
void
fts_add_index(
/*==========*/
	dict_index_t*	index,	/*!< in: FTS index to be added */
	dict_table_t*	table)	/*!< in: table */
{
	fts_t*			fts = table->fts;
	fts_cache_t*		cache;
	fts_index_cache_t*	index_cache;

	ut_ad(fts);
	cache = table->fts->cache;

	rw_lock_x_lock(&cache->init_lock);

	ib_vector_push(fts->indexes, &index);

	index_cache = fts_find_index_cache(cache, index);

	if (!index_cache) {
		/* Add new index cache structure */
		index_cache = fts_cache_index_cache_create(table, index);
	}

	rw_lock_x_unlock(&cache->init_lock);
}
/*******************************************************************//**
Recalibrate the get_doc structures after the index caches in
cache->indexes have changed. */
static
void
fts_reset_get_doc(
/*==============*/
	fts_cache_t*	cache)	/*!< in: FTS index cache */
{
	fts_get_doc_t*	get_doc;
	ulint		i;

	ut_ad(rw_lock_own(&cache->init_lock, RW_LOCK_X));

	ib_vector_reset(cache->get_docs);

	for (i = 0; i < ib_vector_size(cache->indexes); i++) {
		fts_index_cache_t*	ind_cache;

		ind_cache = static_cast<fts_index_cache_t*>(
			ib_vector_get(cache->indexes, i));

		get_doc = static_cast<fts_get_doc_t*>(
			ib_vector_push(cache->get_docs, NULL));

		memset(get_doc, 0x0, sizeof(*get_doc));

		get_doc->index_cache = ind_cache;
		get_doc->cache = cache;
	}

	ut_ad(ib_vector_size(cache->get_docs)
	      == ib_vector_size(cache->indexes));
}
/*******************************************************************//**
Check whether an index is in the table->indexes list.
@return TRUE if it exists */
static
ibool
fts_in_dict_index(
/*==============*/
	dict_table_t*	table,		/*!< in: Table */
	dict_index_t*	index_check)	/*!< in: index to be checked */
{
	dict_index_t*	index;

	for (index = dict_table_get_first_index(table);
	     index != NULL;
	     index = dict_table_get_next_index(index)) {

		if (index == index_check) {
			return(TRUE);
		}
	}

	return(FALSE);
}
/*******************************************************************//**
Check whether an index is in the fts->cache->indexes list.
@return TRUE if it exists */
static
ibool
fts_in_index_cache(
/*===============*/
	dict_table_t*	table,	/*!< in: Table */
	dict_index_t*	index)	/*!< in: index to be checked */
{
	ulint	i;

	for (i = 0; i < ib_vector_size(table->fts->cache->indexes); i++) {
		fts_index_cache_t*	index_cache;

		index_cache = static_cast<fts_index_cache_t*>(
			ib_vector_get(table->fts->cache->indexes, i));

		if (index_cache->index == index) {
			return(TRUE);
		}
	}

	return(FALSE);
}
/*******************************************************************//**
Check that the indexes in fts->indexes are also present in the index
cache and in the table->indexes list.
@return TRUE if all indexes match */
ibool
fts_check_cached_index(
/*===================*/
	dict_table_t*	table)	/*!< in: Table where indexes are dropped */
{
	ulint	i;

	if (!table->fts || !table->fts->cache) {
		return(TRUE);
	}

	ut_a(ib_vector_size(table->fts->indexes)
	     == ib_vector_size(table->fts->cache->indexes));

	for (i = 0; i < ib_vector_size(table->fts->indexes); i++) {
		dict_index_t*	index;

		index = static_cast<dict_index_t*>(
			ib_vector_getp(table->fts->indexes, i));

		if (!fts_in_index_cache(table, index)) {
			return(FALSE);
		}

		if (!fts_in_dict_index(table, index)) {
			return(FALSE);
		}
	}

	return(TRUE);
}
/** Clear all fts resources when there is no internal DOC_ID
and there are no new fts indexes to add.
@param[in,out]	table	table where fts is to be freed
@param[in]	trx	transaction to drop all fts tables */
void fts_clear_all(dict_table_t *table, trx_t *trx)
{
  if (DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID) ||
      !table->fts ||
      !ib_vector_is_empty(table->fts->indexes))
    return;

  for (const dict_index_t *index= dict_table_get_first_index(table);
       index; index= dict_table_get_next_index(index))
    if (index->type & DICT_FTS)
      return;

  fts_optimize_remove_table(table);
  fts_drop_tables(trx, table);
  fts_free(table);
  DICT_TF2_FLAG_UNSET(table, DICT_TF2_FTS);
}
/*******************************************************************//**
Drop auxiliary tables related to an FTS index
@return DB_SUCCESS or error number */
dberr_t
fts_drop_index(
/*===========*/
	dict_table_t*	table,	/*!< in: Table where indexes are dropped */
	dict_index_t*	index,	/*!< in: Index to be dropped */
	trx_t*		trx)	/*!< in: Transaction for the drop */
{
	ib_vector_t*	indexes = table->fts->indexes;
	dberr_t		err = DB_SUCCESS;

	ut_a(indexes);

	if ((ib_vector_size(indexes) == 1
	     && (index == static_cast<dict_index_t*>(
			ib_vector_getp(table->fts->indexes, 0)))
	     && DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID))
	    || ib_vector_is_empty(indexes)) {
		doc_id_t	current_doc_id;
		doc_id_t	first_doc_id;

		/* If we are dropping the only FTS index of the table,
		remove it from optimize thread */
		fts_optimize_remove_table(table);

		DICT_TF2_FLAG_UNSET(table, DICT_TF2_FTS);

		while (index->index_fts_syncing
		       && !trx_is_interrupted(trx)) {
			DICT_BG_YIELD(trx);
		}

		current_doc_id = table->fts->cache->next_doc_id;
		first_doc_id = table->fts->cache->first_doc_id;
		fts_cache_clear(table->fts->cache);
		fts_cache_destroy(table->fts->cache);
		table->fts->cache = fts_cache_create(table);
		table->fts->cache->next_doc_id = current_doc_id;
		table->fts->cache->first_doc_id = first_doc_id;
	} else {
		fts_cache_t*		cache = table->fts->cache;
		fts_index_cache_t*	index_cache;

		rw_lock_x_lock(&cache->init_lock);

		index_cache = fts_find_index_cache(cache, index);

		if (index_cache != NULL) {
			while (index->index_fts_syncing
			       && !trx_is_interrupted(trx)) {
				DICT_BG_YIELD(trx);
			}

			if (index_cache->words) {
				fts_words_free(index_cache->words);
				rbt_free(index_cache->words);
			}

			ib_vector_remove(cache->indexes, *(void**) index_cache);
		}

		if (cache->get_docs) {
			fts_reset_get_doc(cache);
		}

		rw_lock_x_unlock(&cache->init_lock);
	}

	err = fts_drop_index_tables(trx, index);

	ib_vector_remove(indexes, (const void*) index);

	return(err);
}
/****************************************************************//**
Free the query graph but check whether dict_sys.mutex is already
held */
void
fts_que_graph_free_check_lock(
/*==========================*/
	fts_table_t*		fts_table,	/*!< in: FTS table */
	const fts_index_cache_t*index_cache,	/*!< in: FTS index cache */
	que_t*			graph)		/*!< in: query graph */
{
	bool	has_dict = false;

	if (fts_table && fts_table->table) {
		ut_ad(fts_table->table->fts);

		has_dict = fts_table->table->fts->dict_locked;
	} else if (index_cache) {
		ut_ad(index_cache->index->table->fts);

		has_dict = index_cache->index->table->fts->dict_locked;
	}

	if (!has_dict) {
		mutex_enter(&dict_sys.mutex);
	}

	ut_ad(mutex_own(&dict_sys.mutex));

	que_graph_free(graph);

	if (!has_dict) {
		mutex_exit(&dict_sys.mutex);
	}
}
/****************************************************************//**
Get the character set used by an FTS index.
@return charset of the index's first column */
CHARSET_INFO*
fts_index_get_charset(
/*==================*/
	dict_index_t*	index)	/*!< in: FTS index */
{
	CHARSET_INFO*	charset = NULL;
	dict_field_t*	field;
	ulint		prtype;

	field = dict_index_get_nth_field(index, 0);
	prtype = field->col->prtype;

	charset = fts_get_charset(prtype);

#ifdef FTS_DEBUG
	/* Set up charset info for this index. Please note that all
	fields of the FTS index should have the same charset. */
	for (ulint i = 1; i < index->n_fields; i++) {
		CHARSET_INFO*	fld_charset;

		field = dict_index_get_nth_field(index, i);
		prtype = field->col->prtype;

		fld_charset = fts_get_charset(prtype);

		/* All FTS columns should have the same charset */
		if (charset) {
			ut_a(charset == fld_charset);
		} else {
			charset = fld_charset;
		}
	}
#endif

	return(charset);
}
/****************************************************************//**
Create an FTS index cache.
@return Index Cache */
fts_index_cache_t*
fts_cache_index_cache_create(
/*=========================*/
	dict_table_t*	table,	/*!< in: table with FTS index */
	dict_index_t*	index)	/*!< in: FTS index */
{
	ulint			n_bytes;
	fts_index_cache_t*	index_cache;
	fts_cache_t*		cache = table->fts->cache;

	ut_a(cache != NULL);

	ut_ad(rw_lock_own(&cache->init_lock, RW_LOCK_X));

	/* Must not already exist in the cache vector. */
	ut_a(fts_find_index_cache(cache, index) == NULL);

	index_cache = static_cast<fts_index_cache_t*>(
		ib_vector_push(cache->indexes, NULL));

	memset(index_cache, 0x0, sizeof(*index_cache));

	index_cache->index = index;
	index_cache->charset = fts_index_get_charset(index);

	n_bytes = sizeof(que_t*) * FTS_NUM_AUX_INDEX;

	index_cache->ins_graph = static_cast<que_t**>(
		mem_heap_zalloc(static_cast<mem_heap_t*>(
			cache->self_heap->arg), n_bytes));

	index_cache->sel_graph = static_cast<que_t**>(
		mem_heap_zalloc(static_cast<mem_heap_t*>(
			cache->self_heap->arg), n_bytes));

	fts_index_cache_init(cache->sync_heap, index_cache);

	if (cache->get_docs) {
		fts_reset_get_doc(cache);
	}

	return(index_cache);
}
/****************************************************************//**
Release all resources held by the words rb tree, e.g., the node ilist. */
static
void
fts_words_free(
/*===========*/
	ib_rbt_t*	words)	/*!< in: rb tree of words */
{
	const ib_rbt_node_t*	rbt_node;

	/* Free the resources held by a word. */
	for (rbt_node = rbt_first(words);
	     rbt_node != NULL;
	     rbt_node = rbt_first(words)) {

		ulint			i;
		fts_tokenizer_word_t*	word;

		word = rbt_value(fts_tokenizer_word_t, rbt_node);

		/* Free the ilists of this word. */
		for (i = 0; i < ib_vector_size(word->nodes); ++i) {

			fts_node_t* fts_node = static_cast<fts_node_t*>(
				ib_vector_get(word->nodes, i));

			ut_free(fts_node->ilist);
			fts_node->ilist = NULL;
		}

		/* NOTE: We are responsible for free'ing the node */
		ut_free(rbt_remove_node(words, rbt_node));
	}
}
/** Clear cache.
@param[in,out]	cache	fts cache */
void
fts_cache_clear(
	fts_cache_t*	cache)
{
	ulint	i;

	for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
		ulint			j;
		fts_index_cache_t*	index_cache;

		index_cache = static_cast<fts_index_cache_t*>(
			ib_vector_get(cache->indexes, i));

		fts_words_free(index_cache->words);

		rbt_free(index_cache->words);

		index_cache->words = NULL;

		for (j = 0; j < FTS_NUM_AUX_INDEX; ++j) {

			if (index_cache->ins_graph[j] != NULL) {

				fts_que_graph_free_check_lock(
					NULL, index_cache,
					index_cache->ins_graph[j]);

				index_cache->ins_graph[j] = NULL;
			}

			if (index_cache->sel_graph[j] != NULL) {

				fts_que_graph_free_check_lock(
					NULL, index_cache,
					index_cache->sel_graph[j]);

				index_cache->sel_graph[j] = NULL;
			}
		}

		index_cache->doc_stats = NULL;
	}

	fts_need_sync = false;

	cache->total_size = 0;

	mutex_enter((ib_mutex_t*) &cache->deleted_lock);
	cache->deleted_doc_ids = NULL;
	mutex_exit((ib_mutex_t*) &cache->deleted_lock);

	mem_heap_free(static_cast<mem_heap_t*>(cache->sync_heap->arg));
	cache->sync_heap->arg = NULL;
}
/*********************************************************************//**
Search the index specific cache for a particular FTS index.
@return the index cache else NULL */
UNIV_INLINE
fts_index_cache_t*
fts_get_index_cache(
/*================*/
	fts_cache_t*		cache,	/*!< in: cache to search */
	const dict_index_t*	index)	/*!< in: index to search for */
{
	ulint	i;

	ut_ad(rw_lock_own((rw_lock_t*) &cache->lock, RW_LOCK_X)
	      || rw_lock_own((rw_lock_t*) &cache->init_lock, RW_LOCK_X));

	for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
		fts_index_cache_t*	index_cache;

		index_cache = static_cast<fts_index_cache_t*>(
			ib_vector_get(cache->indexes, i));

		if (index_cache->index == index) {
			return(index_cache);
		}
	}

	return(NULL);
}
#ifdef FTS_DEBUG
/*********************************************************************//**
Search the index cache for a get_doc structure.
@return the fts_get_doc_t item else NULL */
static
fts_get_doc_t*
fts_get_index_get_doc(
/*==================*/
	fts_cache_t*		cache,	/*!< in: cache to search */
	const dict_index_t*	index)	/*!< in: index to search for */
{
	ulint	i;

	ut_ad(rw_lock_own((rw_lock_t*) &cache->init_lock, RW_LOCK_X));

	for (i = 0; i < ib_vector_size(cache->get_docs); ++i) {
		fts_get_doc_t*	get_doc;

		get_doc = static_cast<fts_get_doc_t*>(
			ib_vector_get(cache->get_docs, i));

		if (get_doc->index_cache->index == index) {
			return(get_doc);
		}
	}

	return(NULL);
}
#endif
/**********************************************************************//**
Find an existing word, or if not found, create one and return it.
@return specified word token */
static
fts_tokenizer_word_t*
fts_tokenizer_word_get(
/*===================*/
	fts_cache_t*	cache,		/*!< in: cache */
	fts_index_cache_t*
			index_cache,	/*!< in: index cache */
	fts_string_t*	text)		/*!< in: node text */
{
	fts_tokenizer_word_t*	word;
	ib_rbt_bound_t		parent;

	ut_ad(rw_lock_own(&cache->lock, RW_LOCK_X));

	/* If it is a stopword, do not index it */
	if (!fts_check_token(text,
			     cache->stopword_info.cached_stopword,
			     index_cache->charset)) {

		return(NULL);
	}

	/* Check if we found a match, if not then add word to tree. */
	if (rbt_search(index_cache->words, &parent, text) != 0) {
		mem_heap_t*		heap;
		fts_tokenizer_word_t	new_word;

		heap = static_cast<mem_heap_t*>(cache->sync_heap->arg);

		new_word.nodes = ib_vector_create(
			cache->sync_heap, sizeof(fts_node_t), 4);

		fts_string_dup(&new_word.text, text, heap);

		parent.last = rbt_add_node(
			index_cache->words, &parent, &new_word);

		/* Take into account the RB tree memory use and the vector. */
		cache->total_size += sizeof(new_word)
			+ sizeof(ib_rbt_node_t)
			+ text->f_len
			+ (sizeof(fts_node_t) * 4)
			+ sizeof(*new_word.nodes);

		ut_ad(rbt_validate(index_cache->words));
	}

	word = rbt_value(fts_tokenizer_word_t, parent.last);

	return(word);
}
/**********************************************************************//**
Add the given doc_id/word positions to the given node's ilist. */
void
fts_cache_node_add_positions(
/*=========================*/
	fts_cache_t*	cache,		/*!< in: cache */
	fts_node_t*	node,		/*!< in: word node */
	doc_id_t	doc_id,		/*!< in: doc id */
	ib_vector_t*	positions)	/*!< in: fts_token_t::positions */
{
	ulint		i;
	byte*		ptr;
	byte*		ilist;
	ulint		enc_len;
	ulint		last_pos;
	byte*		ptr_start;
	ulint		doc_id_delta;

#ifdef UNIV_DEBUG
	if (cache) {
		ut_ad(rw_lock_own(&cache->lock, RW_LOCK_X));
	}
#endif /* UNIV_DEBUG */

	ut_ad(doc_id >= node->last_doc_id);

	/* Calculate the space required to store the ilist. */
	doc_id_delta = (ulint)(doc_id - node->last_doc_id);
	enc_len = fts_get_encoded_len(doc_id_delta);

	last_pos = 0;
	for (i = 0; i < ib_vector_size(positions); i++) {
		ulint	pos = *(static_cast<ulint*>(
			ib_vector_get(positions, i)));

		ut_ad(last_pos == 0 || pos > last_pos);

		enc_len += fts_get_encoded_len(pos - last_pos);
		last_pos = pos;
	}

	/* The 0x00 byte at the end of the token positions list. */
	enc_len++;

	if ((node->ilist_size_alloc - node->ilist_size) >= enc_len) {
		/* No need to allocate more space, we can fit in the new
		data at the end of the old one. */
		ilist = NULL;
		ptr = node->ilist + node->ilist_size;
	} else {
		ulint	new_size = node->ilist_size + enc_len;

		/* Over-reserve space by a fixed size for small lengths and
		by 20% for lengths >= 48 bytes. */
		if (new_size < 16) {
			new_size = 16;
		} else if (new_size < 32) {
			new_size = 32;
		} else if (new_size < 48) {
			new_size = 48;
		} else {
			new_size = (ulint)(1.2 * new_size);
		}

		ilist = static_cast<byte*>(ut_malloc_nokey(new_size));
		ptr = ilist + node->ilist_size;

		node->ilist_size_alloc = new_size;
	}

	ptr_start = ptr;

	/* Encode the new fragment. */
	ptr += fts_encode_int(doc_id_delta, ptr);

	last_pos = 0;
	for (i = 0; i < ib_vector_size(positions); i++) {
		ulint	pos = *(static_cast<ulint*>(
			ib_vector_get(positions, i)));

		ptr += fts_encode_int(pos - last_pos, ptr);
		last_pos = pos;
	}

	*ptr++ = 0;

	ut_a(enc_len == (ulint)(ptr - ptr_start));

	if (ilist) {
		/* Copy old ilist to the start of the new one and switch the
		new one into place in the node. */
		if (node->ilist_size > 0) {
			memcpy(ilist, node->ilist, node->ilist_size);
			ut_free(node->ilist);
		}

		node->ilist = ilist;
	}

	node->ilist_size += enc_len;

	if (cache) {
		cache->total_size += enc_len;
	}

	if (node->first_doc_id == FTS_NULL_DOC_ID) {
		node->first_doc_id = doc_id;
	}

	node->last_doc_id = doc_id;
	++node->doc_count;
}
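/* A minimal sketch (not part of InnoDB) of how one ilist fragment is
sized in fts_cache_node_add_positions() above: a delta-encoded doc id,
followed by delta-encoded ascending word positions, terminated by a
0x00 byte. varint_len() below assumes a generic 7-bits-per-byte
variable-length integer; the real fts_get_encoded_len()/fts_encode_int()
encoding may differ in detail. Kept inside #if 0 so it is never
compiled. */
#if 0
#include <cstddef>
#include <cstdint>
#include <vector>

static size_t varint_len(uint64_t v)
{
	size_t n = 1;
	while (v >>= 7) {
		n++;
	}
	return n;
}

static size_t fragment_len(uint64_t doc_id, uint64_t last_doc_id,
			   const std::vector<uint64_t>& positions)
{
	/* The doc id is stored as a delta from the node's last doc id. */
	size_t len = varint_len(doc_id - last_doc_id);
	uint64_t last_pos = 0;

	for (uint64_t pos : positions) {
		/* Positions are ascending, so deltas are non-negative. */
		len += varint_len(pos - last_pos);
		last_pos = pos;
	}

	return len + 1;	/* the trailing 0x00 terminator */
}

/* Example: fragment_len(105, 100, {3, 10}) == 4: the deltas 5, 3 and 7
each fit in one byte, plus one byte for the terminator. */
#endif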
/**********************************************************************//**
Add document to the cache. */
static
void
fts_cache_add_doc(
/*==============*/
	fts_cache_t*	cache,		/*!< in: cache */
	fts_index_cache_t*
			index_cache,	/*!< in: index cache */
	doc_id_t	doc_id,		/*!< in: doc id to add */
	ib_rbt_t*	tokens)		/*!< in: document tokens */
{
	const ib_rbt_node_t*	node;
	ulint			n_words;
	fts_doc_stats_t*	doc_stats;

	if (!tokens) {
		return;
	}

	ut_ad(rw_lock_own(&cache->lock, RW_LOCK_X));

	n_words = rbt_size(tokens);

	for (node = rbt_first(tokens); node; node = rbt_first(tokens)) {

		fts_tokenizer_word_t*	word;
		fts_node_t*		fts_node = NULL;
		fts_token_t*		token = rbt_value(fts_token_t, node);

		/* Find and/or add token to the cache. */
		word = fts_tokenizer_word_get(
			cache, index_cache, &token->text);

		if (!word) {
			ut_free(rbt_remove_node(tokens, node));
			continue;
		}

		if (ib_vector_size(word->nodes) > 0) {
			fts_node = static_cast<fts_node_t*>(
				ib_vector_last(word->nodes));
		}

		if (fts_node == NULL || fts_node->synced
		    || fts_node->ilist_size > FTS_ILIST_MAX_SIZE
		    || doc_id < fts_node->last_doc_id) {

			fts_node = static_cast<fts_node_t*>(
				ib_vector_push(word->nodes, NULL));

			memset(fts_node, 0x0, sizeof(*fts_node));

			cache->total_size += sizeof(*fts_node);
		}

		fts_cache_node_add_positions(
			cache, fts_node, doc_id, token->positions);

		ut_free(rbt_remove_node(tokens, node));
	}

	ut_a(rbt_empty(tokens));

	/* Add to doc ids processed so far. */
	doc_stats = static_cast<fts_doc_stats_t*>(
		ib_vector_push(index_cache->doc_stats, NULL));

	doc_stats->doc_id = doc_id;
	doc_stats->word_count = n_words;

	/* Add the doc stats memory usage too. */
	cache->total_size += sizeof(*doc_stats);

	if (doc_id > cache->sync->max_doc_id) {
		cache->sync->max_doc_id = doc_id;
	}
}
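/* A minimal sketch (not part of InnoDB) of the node roll-over test in
fts_cache_add_doc() above: a fresh fts_node_t is started when there is
no node yet, the last node was already synced to disk, its ilist grew
past the size cap, or the incoming doc id would break the node's
ascending doc id order. Kept inside #if 0 so it is never compiled. */
#if 0
#include <cstddef>
#include <cstdint>

struct node_t {
	bool		synced;		/* already flushed to disk */
	size_t		ilist_size;	/* bytes used by the ilist */
	uint64_t	last_doc_id;	/* highest doc id in the node */
};

static bool need_new_node(const node_t* last, uint64_t doc_id,
			  size_t max_ilist_size)
{
	return last == NULL
		|| last->synced
		|| last->ilist_size > max_ilist_size
		|| doc_id < last->last_doc_id;
}
#endif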
/****************************************************************//**
Drops a table. If the table can't be found we return DB_FAIL, which
callers treat as a non-fatal condition.
@return DB_SUCCESS or error code; DB_FAIL if the table does not exist */
static MY_ATTRIBUTE((nonnull, warn_unused_result))
dberr_t
fts_drop_table(
/*===========*/
	trx_t*		trx,		/*!< in: transaction */
	const char*	table_name)	/*!< in: table to drop */
{
	dict_table_t*	table;
	dberr_t		error = DB_SUCCESS;

	/* Check that the table exists in our data dictionary.
	Similar to regular drop table case, we will open table with
	DICT_ERR_IGNORE_INDEX_ROOT and DICT_ERR_IGNORE_CORRUPT option */
	table = dict_table_open_on_name(
		table_name, TRUE, FALSE,
		static_cast<dict_err_ignore_t>(
			DICT_ERR_IGNORE_INDEX_ROOT | DICT_ERR_IGNORE_CORRUPT));

	if (table != 0) {

		dict_table_close(table, TRUE, FALSE);

		/* Pass nonatomic=false (don't allow data dict unlock),
		because the transaction may hold locks on SYS_* tables from
		previous calls to fts_drop_table(). */
		error = row_drop_table_for_mysql(table_name, trx,
						 SQLCOM_DROP_DB, false, false);

		if (UNIV_UNLIKELY(error != DB_SUCCESS)) {
			ib::error() << "Unable to drop FTS index aux table "
				<< table_name << ": " << error;
		}
	} else {
		error = DB_FAIL;
	}

	return(error);
}
/****************************************************************//**
Rename a single auxiliary table due to database name change.
@return DB_SUCCESS or error code */
static MY_ATTRIBUTE((nonnull, warn_unused_result))
dberr_t
fts_rename_one_aux_table(
/*=====================*/
	const char*	new_name,		/*!< in: new parent tbl name */
	const char*	fts_table_old_name,	/*!< in: old aux tbl name */
	trx_t*		trx)			/*!< in: transaction */
{
	char	fts_table_new_name[MAX_TABLE_NAME_LEN];
	ulint	new_db_name_len = dict_get_db_name_len(new_name);
	ulint	old_db_name_len = dict_get_db_name_len(fts_table_old_name);
	ulint	table_new_name_len = strlen(fts_table_old_name)
				     + new_db_name_len - old_db_name_len;

	/* Check if the new and old database names are the same, if so,
	nothing to do */
	ut_ad((new_db_name_len != old_db_name_len)
	      || strncmp(new_name, fts_table_old_name, old_db_name_len) != 0);

	/* Get the database name from "new_name", and table name
	from the fts_table_old_name */
	strncpy(fts_table_new_name, new_name, new_db_name_len);
	strncpy(fts_table_new_name + new_db_name_len,
		strchr(fts_table_old_name, '/'),
		table_new_name_len - new_db_name_len);
	fts_table_new_name[table_new_name_len] = 0;

	return row_rename_table_for_mysql(
		fts_table_old_name, fts_table_new_name, trx, false, false);
}
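/* A minimal sketch (not part of InnoDB) of the name splice performed
by fts_rename_one_aux_table() above: the database prefix of the new
parent table name is concatenated with the '/'-separated table
component of the old auxiliary table name. std::string replaces the
fixed MAX_TABLE_NAME_LEN buffer. Kept inside #if 0 so it is never
compiled. */
#if 0
#include <string>

static std::string new_aux_name(const std::string& new_parent,
				const std::string& old_aux)
{
	/* Length of the database prefix, up to but excluding '/'. */
	std::string::size_type new_db_len = new_parent.find('/');

	/* Keep everything from the old name's '/' onwards. */
	return new_parent.substr(0, new_db_len)
		+ old_aux.substr(old_aux.find('/'));
}

/* new_aux_name("newdb/t1", "olddb/FTS_000000000000002a_CONFIG")
   == "newdb/FTS_000000000000002a_CONFIG" */
#endif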
/****************************************************************//**
Rename the auxiliary tables of all FTS indexes on a table. This
renaming is due to a database name change.
@return DB_SUCCESS or error code */
dberr_t
fts_rename_aux_tables(
/*==================*/
	dict_table_t*	table,		/*!< in: user Table */
	const char*	new_name,	/*!< in: new table name */
	trx_t*		trx)		/*!< in: transaction */
{
	ulint		i;
	fts_table_t	fts_table;

	FTS_INIT_FTS_TABLE(&fts_table, NULL, FTS_COMMON_TABLE, table);

	dberr_t	err = DB_SUCCESS;
	char	old_table_name[MAX_FULL_NAME_LEN];

	/* Rename common auxiliary tables */
	for (i = 0; fts_common_tables[i] != NULL; ++i) {
		fts_table.suffix = fts_common_tables[i];
		fts_get_table_name(&fts_table, old_table_name, true);

		err = fts_rename_one_aux_table(new_name, old_table_name, trx);

		if (err != DB_SUCCESS) {
			return(err);
		}
	}

	fts_t*	fts = table->fts;

	/* Rename index specific auxiliary tables */
	for (i = 0; fts->indexes != 0 && i < ib_vector_size(fts->indexes);
	     ++i) {
		dict_index_t*	index;

		index = static_cast<dict_index_t*>(
			ib_vector_getp(fts->indexes, i));

		FTS_INIT_INDEX_TABLE(&fts_table, NULL, FTS_INDEX_TABLE, index);

		for (ulint j = 0; j < FTS_NUM_AUX_INDEX; ++j) {
			fts_table.suffix = fts_get_suffix(j);
			fts_get_table_name(&fts_table, old_table_name, true);

			err = fts_rename_one_aux_table(
				new_name, old_table_name, trx);

			DBUG_EXECUTE_IF("fts_rename_failure",
					err = DB_DEADLOCK;
					fts_sql_rollback(trx););

			if (err != DB_SUCCESS) {
				return(err);
			}
		}
	}

	return(DB_SUCCESS);
}
/** Drops the common ancillary tables needed for supporting an FTS index
on the given table. row_mysql_lock_data_dictionary must have been called
before this.
@param[in]	trx		transaction to drop fts common table
@param[in]	fts_table	table with an FTS index
@param[in]	drop_orphan	True if the function is used to drop
				orphaned table
@return DB_SUCCESS or error code */
static dberr_t
fts_drop_common_tables(
	trx_t*		trx,
	fts_table_t*	fts_table,
	bool		drop_orphan=false)
{
	ulint	i;
	dberr_t	error = DB_SUCCESS;

	for (i = 0; fts_common_tables[i] != NULL; ++i) {
		dberr_t	err;
		char	table_name[MAX_FULL_NAME_LEN];

		fts_table->suffix = fts_common_tables[i];

		fts_get_table_name(fts_table, table_name, true);

		err = fts_drop_table(trx, table_name);

		/* We only return the status of the last error. */
		if (err != DB_SUCCESS && err != DB_FAIL) {
			error = err;
		}

		if (drop_orphan && err == DB_FAIL) {
			char* path = fil_make_filepath(
				NULL, table_name, IBD, false);
			if (path != NULL) {
				os_file_delete_if_exists(
					innodb_data_file_key, path, NULL);
				ut_free(path);
			}
		}
	}

	return(error);
}
/****************************************************************//**
Since we do a horizontal split on the index table, we need to drop
all the split tables.
@return DB_SUCCESS or error code */
static
dberr_t
fts_drop_index_split_tables(
/*========================*/
	trx_t*		trx,	/*!< in: transaction */
	dict_index_t*	index)	/*!< in: FTS index */
{
	ulint		i;
	fts_table_t	fts_table;
	dberr_t		error = DB_SUCCESS;

	FTS_INIT_INDEX_TABLE(&fts_table, NULL, FTS_INDEX_TABLE, index);

	for (i = 0; i < FTS_NUM_AUX_INDEX; ++i) {
		dberr_t	err;
		char	table_name[MAX_FULL_NAME_LEN];

		fts_table.suffix = fts_get_suffix(i);

		fts_get_table_name(&fts_table, table_name, true);

		err = fts_drop_table(trx, table_name);

		/* We only return the status of the last error. */
		if (err != DB_SUCCESS && err != DB_FAIL) {
			error = err;
		}
	}

	return(error);
}
/****************************************************************//**
Drops FTS auxiliary tables for an FTS index
@return DB_SUCCESS or error code */
dberr_t
fts_drop_index_tables(
/*==================*/
	trx_t*		trx,	/*!< in: transaction */
	dict_index_t*	index)	/*!< in: Index to drop */
{
	return(fts_drop_index_split_tables(trx, index));
}
/****************************************************************//**
Drops FTS ancillary tables needed for supporting an FTS index
on the given table. row_mysql_lock_data_dictionary must have been called
before this.
@return DB_SUCCESS or error code */
static MY_ATTRIBUTE((nonnull, warn_unused_result))
dberr_t
fts_drop_all_index_tables(
/*======================*/
	trx_t*	trx,	/*!< in: transaction */
	fts_t*	fts)	/*!< in: fts instance */
{
	dberr_t	error = DB_SUCCESS;

	for (ulint i = 0;
	     fts->indexes != 0 && i < ib_vector_size(fts->indexes);
	     ++i) {

		dberr_t		err;
		dict_index_t*	index;

		index = static_cast<dict_index_t*>(
			ib_vector_getp(fts->indexes, i));

		err = fts_drop_index_tables(trx, index);

		if (err != DB_SUCCESS) {
			error = err;
		}
	}

	return(error);
}
/*********************************************************************//**
Drops the ancillary tables needed for supporting an FTS index on a
given table. row_mysql_lock_data_dictionary must have been called before
this.
@return DB_SUCCESS or error code */
dberr_t
fts_drop_tables(
/*============*/
	trx_t*		trx,	/*!< in: transaction */
	dict_table_t*	table)	/*!< in: table has the FTS index */
{
	dberr_t		error;
	fts_table_t	fts_table;

	FTS_INIT_FTS_TABLE(&fts_table, NULL, FTS_COMMON_TABLE, table);

	/* TODO: This is not atomic and can cause problems during recovery. */

	error = fts_drop_common_tables(trx, &fts_table);

	if (error == DB_SUCCESS && table->fts) {
		error = fts_drop_all_index_tables(trx, table->fts);
	}

	return(error);
}
/** Create dict_table_t object for FTS Aux tables.
@param[in]	aux_table_name	FTS Aux table name
@param[in]	table		table object of FTS Index
@param[in]	n_cols		number of columns for FTS Aux table
@return table object for FTS Aux table */
static
dict_table_t*
fts_create_in_mem_aux_table(
	const char*		aux_table_name,
	const dict_table_t*	table,
	ulint			n_cols)
{
	dict_table_t*	new_table = dict_mem_table_create(
		aux_table_name, NULL, n_cols, 0, table->flags,
		table->space_id == TRX_SYS_SPACE
		? 0 : table->space_id == SRV_TMP_SPACE_ID
		? DICT_TF2_TEMPORARY : DICT_TF2_USE_FILE_PER_TABLE);

	if (DICT_TF_HAS_DATA_DIR(table->flags)) {
		ut_ad(table->data_dir_path != NULL);
		new_table->data_dir_path = mem_heap_strdup(
			new_table->heap, table->data_dir_path);
	}

	return(new_table);
}
/** Function to create one FTS common table.
@param[in,out]	trx		InnoDB transaction
@param[in]	table		Table that has FTS Index
@param[in]	fts_table_name	FTS AUX table name
@param[in]	fts_suffix	FTS AUX table suffix
@param[in,out]	heap		temporary memory heap
@return table object if created, else NULL */
static
dict_table_t*
fts_create_one_common_table(
	trx_t*			trx,
	const dict_table_t*	table,
	const char*		fts_table_name,
	const char*		fts_suffix,
	mem_heap_t*		heap)
{
	dict_table_t*	new_table;
	dberr_t		error;
	bool		is_config = strcmp(fts_suffix, "CONFIG") == 0;

	if (!is_config) {

		new_table = fts_create_in_mem_aux_table(
			fts_table_name, table, FTS_DELETED_TABLE_NUM_COLS);

		dict_mem_table_add_col(
			new_table, heap, "doc_id", DATA_INT, DATA_UNSIGNED,
			FTS_DELETED_TABLE_COL_LEN);
	} else {
		/* Config table has different schema. */
		new_table = fts_create_in_mem_aux_table(
			fts_table_name, table, FTS_CONFIG_TABLE_NUM_COLS);

		dict_mem_table_add_col(
			new_table, heap, "key", DATA_VARCHAR, 0,
			FTS_CONFIG_TABLE_KEY_COL_LEN);

		dict_mem_table_add_col(
			new_table, heap, "value", DATA_VARCHAR, DATA_NOT_NULL,
			FTS_CONFIG_TABLE_VALUE_COL_LEN);
	}

	dict_table_add_system_columns(new_table, heap);
	error = row_create_table_for_mysql(new_table, trx,
		FIL_ENCRYPTION_DEFAULT, FIL_DEFAULT_ENCRYPTION_KEY);

	if (error == DB_SUCCESS) {

		dict_index_t*	index = dict_mem_index_create(
			new_table, "FTS_COMMON_TABLE_IND",
			DICT_UNIQUE|DICT_CLUSTERED, 1);

		if (!is_config) {
			dict_mem_index_add_field(index, "doc_id", 0);
		} else {
			dict_mem_index_add_field(index, "key", 0);
		}

		/* We save and restore trx->dict_operation because
		row_create_index_for_mysql() changes the operation to
		TRX_DICT_OP_TABLE. */
		trx_dict_op_t op = trx_get_dict_operation(trx);

		error = row_create_index_for_mysql(index, trx, NULL);

		trx->dict_operation = op;
	} else {
err_exit:
		new_table = NULL;
		ib::warn() << "Failed to create FTS common table "
			<< fts_table_name;
		trx->error_state = error;
		return NULL;
	}

	if (error != DB_SUCCESS) {
		dict_mem_table_free(new_table);
		trx->error_state = DB_SUCCESS;
		row_drop_table_for_mysql(fts_table_name, trx, SQLCOM_DROP_DB);
		goto err_exit;
	}

	return(new_table);
}
/** Creates the common auxiliary tables needed for supporting an FTS index
on the given table. row_mysql_lock_data_dictionary must have been called
before this.
The following tables are created.
CREATE TABLE $FTS_PREFIX_DELETED
	(doc_id BIGINT UNSIGNED, UNIQUE CLUSTERED INDEX on doc_id)
CREATE TABLE $FTS_PREFIX_DELETED_CACHE
	(doc_id BIGINT UNSIGNED, UNIQUE CLUSTERED INDEX on doc_id)
CREATE TABLE $FTS_PREFIX_BEING_DELETED
	(doc_id BIGINT UNSIGNED, UNIQUE CLUSTERED INDEX on doc_id)
CREATE TABLE $FTS_PREFIX_BEING_DELETED_CACHE
	(doc_id BIGINT UNSIGNED, UNIQUE CLUSTERED INDEX on doc_id)
CREATE TABLE $FTS_PREFIX_CONFIG
	(key CHAR(50), value CHAR(200), UNIQUE CLUSTERED INDEX on key)
@param[in,out]	trx			transaction
@param[in,out]	table			table with FTS index
@param[in]	skip_doc_id_index	Skip index on doc id
@return DB_SUCCESS if succeed */
dberr_t
fts_create_common_tables(
	trx_t*		trx,
	dict_table_t*	table,
	bool		skip_doc_id_index)
{
	dberr_t		error;
	que_t*		graph;
	fts_table_t	fts_table;
	mem_heap_t*	heap = mem_heap_create(1024);
	pars_info_t*	info;
	char		fts_name[MAX_FULL_NAME_LEN];
	char		full_name[sizeof(fts_common_tables) / sizeof(char*)]
				[MAX_FULL_NAME_LEN];

	dict_index_t*	index = NULL;
	trx_dict_op_t	op;

	/* common_tables vector is used for dropping FTS common tables
	on error condition. */
	std::vector<dict_table_t*>			common_tables;
	std::vector<dict_table_t*>::const_iterator	it;

	FTS_INIT_FTS_TABLE(&fts_table, NULL, FTS_COMMON_TABLE, table);

	op = trx_get_dict_operation(trx);

	error = fts_drop_common_tables(trx, &fts_table);

	if (error != DB_SUCCESS) {

		goto func_exit;
	}

	/* Create the FTS tables that are common to an FTS index. */
	for (ulint i = 0; fts_common_tables[i] != NULL; ++i) {

		fts_table.suffix = fts_common_tables[i];
		fts_get_table_name(&fts_table, full_name[i], true);
		dict_table_t*	common_table = fts_create_one_common_table(
			trx, table, full_name[i], fts_table.suffix, heap);

		if (!common_table) {
			trx->error_state = DB_SUCCESS;
			error = DB_ERROR;
			goto func_exit;
		} else {
			common_tables.push_back(common_table);
		}

		mem_heap_empty(heap);

		DBUG_EXECUTE_IF("ib_fts_aux_table_error",
			/* Return error after creating FTS_AUX_CONFIG table. */
			if (i == 4) {
				error = DB_ERROR;
				goto func_exit;
			}
		);
	}

	/* Write the default settings to the config table. */
	info = pars_info_create();

	fts_table.suffix = "CONFIG";
	fts_get_table_name(&fts_table, fts_name, true);
	pars_info_bind_id(info, true, "config_table", fts_name);

	graph = fts_parse_sql_no_dict_lock(
		info, fts_config_table_insert_values_sql);

	error = fts_eval_sql(trx, graph);

	que_graph_free(graph);

	if (error != DB_SUCCESS || skip_doc_id_index) {

		goto func_exit;
	}

	index = dict_mem_index_create(table, FTS_DOC_ID_INDEX_NAME,
				      DICT_UNIQUE, 1);
	dict_mem_index_add_field(index, FTS_DOC_ID_COL_NAME, 0);

	op = trx_get_dict_operation(trx);

	error = row_create_index_for_mysql(index, trx, NULL);

func_exit:
	if (error != DB_SUCCESS) {
		for (it = common_tables.begin(); it != common_tables.end();
		     ++it) {
			row_drop_table_for_mysql((*it)->name.m_name, trx,
						 SQLCOM_DROP_DB);
		}
	}

	trx->dict_operation = op;

	common_tables.clear();
	mem_heap_free(heap);

	return(error);
}
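/* A minimal sketch (not part of InnoDB) of the cleanup pattern in
fts_create_common_tables() above: remember every object created so
far and, on the first failure, destroy them all before returning the
error. make_one()/destroy_one() below are hypothetical stand-ins for
the create/drop table calls. Kept inside #if 0 so it is never
compiled. */
#if 0
#include <string>
#include <vector>

static bool make_one(const std::string& name);		/* hypothetical */
static void destroy_one(const std::string& name);	/* hypothetical */

static bool make_all(const std::vector<std::string>& names)
{
	std::vector<std::string> created;

	for (const std::string& name : names) {
		if (!make_one(name)) {
			/* Roll back everything created so far. */
			for (const std::string& c : created) {
				destroy_one(c);
			}
			return false;
		}
		created.push_back(name);
	}

	return true;
}
#endif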
/** Create one FTS auxiliary index table for an FTS index.
@param[in,out]	trx		transaction
@param[in]	index		the index instance
@param[in]	fts_table	fts_table structure
@param[in,out]	heap		temporary memory heap
@see row_merge_create_fts_sort_index()
@return table object if created, else NULL */
static
dict_table_t*
fts_create_one_index_table(
	trx_t*			trx,
	const dict_index_t*	index,
	const fts_table_t*	fts_table,
	mem_heap_t*		heap)
{
	dict_field_t*	field;
	dict_table_t*	new_table;
	char		table_name[MAX_FULL_NAME_LEN];
	dberr_t		error;
	CHARSET_INFO*	charset;

	ut_ad(index->type & DICT_FTS);

	fts_get_table_name(fts_table, table_name, true);

	new_table = fts_create_in_mem_aux_table(
			table_name, fts_table->table,
			FTS_AUX_INDEX_TABLE_NUM_COLS);

	field = dict_index_get_nth_field(index, 0);
	charset = fts_get_charset(field->col->prtype);

	dict_mem_table_add_col(new_table, heap, "word",
			       charset == &my_charset_latin1
			       ? DATA_VARCHAR : DATA_VARMYSQL,
			       field->col->prtype,
			       FTS_MAX_WORD_LEN_IN_CHAR
			       * unsigned(field->col->mbmaxlen));

	dict_mem_table_add_col(new_table, heap, "first_doc_id", DATA_INT,
			       DATA_NOT_NULL | DATA_UNSIGNED,
			       FTS_INDEX_FIRST_DOC_ID_LEN);

	dict_mem_table_add_col(new_table, heap, "last_doc_id", DATA_INT,
			       DATA_NOT_NULL | DATA_UNSIGNED,
			       FTS_INDEX_LAST_DOC_ID_LEN);

	dict_mem_table_add_col(new_table, heap, "doc_count", DATA_INT,
			       DATA_NOT_NULL | DATA_UNSIGNED,
			       FTS_INDEX_DOC_COUNT_LEN);

	/* The precise type calculation is as follows:
	least significant byte: MySQL type code (not applicable for sys cols)
	second least	      : DATA_NOT_NULL | DATA_BINARY_TYPE
	third least	      : the MySQL charset-collation code
				(DATA_MTYPE_MAX) */

	dict_mem_table_add_col(
		new_table, heap, "ilist", DATA_BLOB,
		(DATA_MTYPE_MAX << 16) | DATA_UNSIGNED | DATA_NOT_NULL,
		FTS_INDEX_ILIST_LEN);

	dict_table_add_system_columns(new_table, heap);
	error = row_create_table_for_mysql(new_table, trx,
		FIL_ENCRYPTION_DEFAULT, FIL_DEFAULT_ENCRYPTION_KEY);

	if (error == DB_SUCCESS) {
		dict_index_t*	index = dict_mem_index_create(
			new_table, "FTS_INDEX_TABLE_IND",
			DICT_UNIQUE|DICT_CLUSTERED, 2);
		dict_mem_index_add_field(index, "word", 0);
		dict_mem_index_add_field(index, "first_doc_id", 0);

		trx_dict_op_t op = trx_get_dict_operation(trx);

		error = row_create_index_for_mysql(index, trx, NULL);

		trx->dict_operation = op;
	} else {
err_exit:
		new_table = NULL;
		ib::warn() << "Failed to create FTS index table "
			<< table_name;
		trx->error_state = error;
		return NULL;
	}

	if (error != DB_SUCCESS) {
		dict_mem_table_free(new_table);
		trx->error_state = DB_SUCCESS;
		row_drop_table_for_mysql(table_name, trx, SQLCOM_DROP_DB);
		goto err_exit;
	}

	return(new_table);
}
/** Creates the column specific ancillary tables needed for supporting an
FTS index on the given table. row_mysql_lock_data_dictionary must have
been called before this.

All FTS AUX Index tables have the following schema.
CREATE TABLE $FTS_PREFIX_INDEX_[1-6](
	word		VARCHAR(FTS_MAX_WORD_LEN),
	first_doc_id	INT NOT NULL,
	last_doc_id	UNSIGNED NOT NULL,
	doc_count	UNSIGNED INT NOT NULL,
	ilist		VARBINARY NOT NULL,
	UNIQUE CLUSTERED INDEX ON (word, first_doc_id))
@param[in,out]	trx	dictionary transaction
@param[in]	index	fulltext index
@param[in]	id	table id
@return DB_SUCCESS or error code */
dberr_t
fts_create_index_tables(trx_t* trx, const dict_index_t* index, table_id_t id)
{
	ulint		i;
	fts_table_t	fts_table;
	dberr_t		error = DB_SUCCESS;
	mem_heap_t*	heap = mem_heap_create(1024);

	fts_table.type = FTS_INDEX_TABLE;
	fts_table.index_id = index->id;
	fts_table.table_id = id;
	fts_table.table = index->table;

	/* aux_idx_tables vector is used for dropping FTS AUX INDEX
	tables on error condition. */
	std::vector<dict_table_t*>			aux_idx_tables;
	std::vector<dict_table_t*>::const_iterator	it;

	for (i = 0; i < FTS_NUM_AUX_INDEX && error == DB_SUCCESS; ++i) {
		dict_table_t*	new_table;

		/* Create the FTS auxiliary tables that are specific
		to an FTS index. We need to preserve the table_id %s
		which fts_parse_sql_no_dict_lock() will fill in for us. */
		fts_table.suffix = fts_get_suffix(i);

		new_table = fts_create_one_index_table(
			trx, index, &fts_table, heap);

		if (new_table == NULL) {
			error = DB_FAIL;
			break;
		} else {
			aux_idx_tables.push_back(new_table);
		}

		mem_heap_empty(heap);

		DBUG_EXECUTE_IF("ib_fts_index_table_error",
			/* Return error after creating FTS_INDEX_5
			aux table. */
			if (i == 4) {
				error = DB_FAIL;
				break;
			}
		);
	}

	if (error != DB_SUCCESS) {

		for (it = aux_idx_tables.begin(); it != aux_idx_tables.end();
		     ++it) {
			row_drop_table_for_mysql((*it)->name.m_name, trx,
						 SQLCOM_DROP_DB);
		}
	}

	aux_idx_tables.clear();
	mem_heap_free(heap);

	return(error);
}
/******************************************************************//**
Calculate the new state of a row given the existing state and a new event.
@return new state of row */
static
fts_row_state
fts_trx_row_get_new_state(
/*======================*/
	fts_row_state	old_state,	/*!< in: existing state of row */
	fts_row_state	event)		/*!< in: new event */
{
	/* The rules for transforming states:

	I = inserted
	M = modified
	D = deleted
	N = nothing

	M+D -> D:

	If the row existed before the transaction started and it is modified
	during the transaction, followed by a deletion of the row, only the
	deletion will be signaled.

	M+ -> M:

	If the row existed before the transaction started and it is modified
	more than once during the transaction, only the last modification
	will be signaled.

	IM*D -> N:

	If a new row is added during the transaction (and possibly modified
	after its initial insertion) but it is deleted before the end of the
	transaction, nothing will be signaled.

	IM* -> I:

	If a new row is added during the transaction and modified after its
	initial insertion, only the addition will be signaled.

	M*DI -> M:

	If the row existed before the transaction started and it is deleted,
	then re-inserted, only a modification will be signaled. Note that
	this case is only possible if the table is using the row's primary
	key for FTS row ids, since those can be re-inserted by the user,
	which is not true for InnoDB generated row ids.

	It is easily seen that the above rules decompose such that we do not
	need to store the row's entire history of events. Instead, we can
	store just one state for the row and update that when new events
	arrive. Then we can implement the above rules as a two-dimensional
	look-up table, and get checking of invalid combinations "for free"
	in the process. */

	/* The lookup table for transforming states. old_state is the
	Y-axis, event is the X-axis. */
	static const fts_row_state table[4][4] = {
		/*	  I	       M	    D		 N */
	/* I */	{ FTS_INVALID, FTS_INSERT,  FTS_NOTHING, FTS_INVALID },
	/* M */	{ FTS_INVALID, FTS_MODIFY,  FTS_DELETE,  FTS_INVALID },
	/* D */	{ FTS_MODIFY,  FTS_INVALID, FTS_INVALID, FTS_INVALID },
	/* N */	{ FTS_INVALID, FTS_INVALID, FTS_INVALID, FTS_INVALID }
	};

	fts_row_state	result;

	ut_a(old_state < FTS_INVALID);
	ut_a(event < FTS_INVALID);

	result = table[(int) old_state][(int) event];
	ut_a(result != FTS_INVALID);

	return(result);
}
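/* A minimal sketch (not part of InnoDB) exercising the transition
rules of fts_trx_row_get_new_state() above with a standalone copy of
the lookup table: a row that is modified and then deleted inside one
transaction collapses into a single DELETE event (M+D -> D). Kept
inside #if 0 so it is never compiled. */
#if 0
enum state_t { ST_INSERT, ST_MODIFY, ST_DELETE, ST_NOTHING, ST_INVALID };

static state_t next_state(state_t old_state, state_t event)
{
	/* old_state is the Y-axis, event is the X-axis, exactly as in
	fts_trx_row_get_new_state(). */
	static const state_t table[4][4] = {
	/*	      I		  M	      D		  N */
	/* I */	{ ST_INVALID, ST_INSERT,  ST_NOTHING, ST_INVALID },
	/* M */	{ ST_INVALID, ST_MODIFY,  ST_DELETE,  ST_INVALID },
	/* D */	{ ST_MODIFY,  ST_INVALID, ST_INVALID, ST_INVALID },
	/* N */	{ ST_INVALID, ST_INVALID, ST_INVALID, ST_INVALID },
	};

	return table[old_state][event];
}

/* next_state(ST_MODIFY, ST_DELETE) == ST_DELETE	(M+D  -> D)
   next_state(ST_INSERT, ST_DELETE) == ST_NOTHING	(IM*D -> N)
   next_state(ST_DELETE, ST_INSERT) == ST_MODIFY	(M*DI -> M) */
#endif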
/******************************************************************//**
Create a savepoint instance.
@return savepoint instance */
static
fts_savepoint_t*
fts_savepoint_create(
/*=================*/
	ib_vector_t*	savepoints,	/*!< in/out: savepoints vector */
	const char*	name,		/*!< in: savepoint name */
	mem_heap_t*	heap)		/*!< in: heap */
{
	fts_savepoint_t*	savepoint;

	savepoint = static_cast<fts_savepoint_t*>(
		ib_vector_push(savepoints, NULL));

	memset(savepoint, 0x0, sizeof(*savepoint));

	if (name) {
		savepoint->name = mem_heap_strdup(heap, name);
	}

	savepoint->tables = rbt_create(
		sizeof(fts_trx_table_t*), fts_trx_table_cmp);

	return(savepoint);
}
/******************************************************************//**
Create an FTS trx.
@return FTS trx */
fts_trx_t*
fts_trx_create(
/*===========*/
	trx_t*	trx)	/*!< in/out: InnoDB transaction */
{
	fts_trx_t*		ftt;
	ib_alloc_t*		heap_alloc;
	mem_heap_t*		heap = mem_heap_create(1024);
	trx_named_savept_t*	savep;

	ut_a(trx->fts_trx == NULL);

	ftt = static_cast<fts_trx_t*>(mem_heap_alloc(heap, sizeof(fts_trx_t)));
	ftt->trx = trx;
	ftt->heap = heap;

	heap_alloc = ib_heap_allocator_create(heap);

	ftt->savepoints = static_cast<ib_vector_t*>(ib_vector_create(
		heap_alloc, sizeof(fts_savepoint_t), 4));

	ftt->last_stmt = static_cast<ib_vector_t*>(ib_vector_create(
		heap_alloc, sizeof(fts_savepoint_t), 4));

	/* Default instance has no name and no heap. */
	fts_savepoint_create(ftt->savepoints, NULL, NULL);
	fts_savepoint_create(ftt->last_stmt, NULL, NULL);

	/* Copy savepoints that were already set before. */
	for (savep = UT_LIST_GET_FIRST(trx->trx_savepoints);
	     savep != NULL;
	     savep = UT_LIST_GET_NEXT(trx_savepoints, savep)) {
		fts_savepoint_take(ftt, savep->name);
	}

	return(ftt);
}
/******************************************************************//**
Create an FTS trx table.
@return FTS trx table */
static
fts_trx_table_t*
fts_trx_table_create(
/*=================*/
	fts_trx_t*	fts_trx,	/*!< in: FTS trx */
	dict_table_t*	table)		/*!< in: table */
{
	fts_trx_table_t*	ftt;

	ftt = static_cast<fts_trx_table_t*>(
		mem_heap_alloc(fts_trx->heap, sizeof(*ftt)));

	memset(ftt, 0x0, sizeof(*ftt));

	ftt->table = table;
	ftt->fts_trx = fts_trx;

	ftt->rows = rbt_create(sizeof(fts_trx_row_t), fts_trx_row_doc_id_cmp);

	return(ftt);
}
/******************************************************************//**
Clone an FTS trx table.
@return FTS trx table */
static
fts_trx_table_t*
fts_trx_table_clone(
/*=================*/
	const fts_trx_table_t*	ftt_src)	/*!< in: FTS trx */
{
	fts_trx_table_t*	ftt;

	ftt = static_cast<fts_trx_table_t*>(
		mem_heap_alloc(ftt_src->fts_trx->heap, sizeof(*ftt)));

	memset(ftt, 0x0, sizeof(*ftt));

	ftt->table = ftt_src->table;
	ftt->fts_trx = ftt_src->fts_trx;

	ftt->rows = rbt_create(sizeof(fts_trx_row_t), fts_trx_row_doc_id_cmp);

	/* Copy the rb tree values to the new savepoint. */
	rbt_merge_uniq(ftt->rows, ftt_src->rows);

	/* These are only added on commit. At this stage we only have
	the updated row state. */
	ut_a(ftt_src->added_doc_ids == NULL);

	return(ftt);
}
/******************************************************************//**
Initialize the FTS trx instance.
@return FTS trx instance */
static
fts_trx_table_t*
fts_trx_init(
/*=========*/
	trx_t*		trx,		/*!< in: transaction */
	dict_table_t*	table,		/*!< in: FTS table instance */
	ib_vector_t*	savepoints)	/*!< in: Savepoints */
{
	fts_trx_table_t*	ftt;
	ib_rbt_bound_t		parent;
	ib_rbt_t*		tables;
	fts_savepoint_t*	savepoint;

	savepoint = static_cast<fts_savepoint_t*>(ib_vector_last(savepoints));

	tables = savepoint->tables;
	rbt_search_cmp(tables, &parent, &table->id, fts_trx_table_id_cmp, NULL);

	if (parent.result == 0) {
		fts_trx_table_t**	fttp;

		fttp = rbt_value(fts_trx_table_t*, parent.last);
		ftt = *fttp;
	} else {
		ftt = fts_trx_table_create(trx->fts_trx, table);
		rbt_add_node(tables, &parent, &ftt);
	}

	ut_a(ftt->table == table);

	return(ftt);
}
/******************************************************************//**
Notify the FTS system about an operation on an FTS-indexed table. */
static
void
fts_trx_table_add_op(
/*=================*/
	fts_trx_table_t*ftt,		/*!< in: FTS trx table */
	doc_id_t	doc_id,		/*!< in: doc id */
	fts_row_state	state,		/*!< in: state of the row */
	ib_vector_t*	fts_indexes)	/*!< in: FTS indexes affected */
{
	ib_rbt_t*	rows;
	ib_rbt_bound_t	parent;

	rows = ftt->rows;
	rbt_search(rows, &parent, &doc_id);

	/* Row id found, update state, and if new state is FTS_NOTHING,
	we delete the row from our tree. */
	if (parent.result == 0) {
		fts_trx_row_t*	row = rbt_value(fts_trx_row_t, parent.last);

		row->state = fts_trx_row_get_new_state(row->state, state);

		if (row->state == FTS_NOTHING) {
			if (row->fts_indexes) {
				ib_vector_free(row->fts_indexes);
			}

			ut_free(rbt_remove_node(rows, parent.last));
			row = NULL;
		} else if (row->fts_indexes != NULL) {
			ib_vector_free(row->fts_indexes);
			row->fts_indexes = fts_indexes;
		}

	} else { /* Row-id not found, create a new one. */
		fts_trx_row_t	row;

		row.doc_id = doc_id;
		row.state = state;
		row.fts_indexes = fts_indexes;

		rbt_add_node(rows, &parent, &row);
	}
}
/******************************************************************//**
Notify the FTS system about an operation on an FTS-indexed table. */
void
fts_trx_add_op(
/*===========*/
	trx_t*		trx,		/*!< in: InnoDB transaction */
	dict_table_t*	table,		/*!< in: table */
	doc_id_t	doc_id,		/*!< in: new doc id */
	fts_row_state	state,		/*!< in: state of the row */
	ib_vector_t*	fts_indexes)	/*!< in: FTS indexes affected
					(NULL=all) */
{
	fts_trx_table_t*	tran_ftt;
	fts_trx_table_t*	stmt_ftt;

	if (!trx->fts_trx) {
		trx->fts_trx = fts_trx_create(trx);
	}

	tran_ftt = fts_trx_init(trx, table, trx->fts_trx->savepoints);
	stmt_ftt = fts_trx_init(trx, table, trx->fts_trx->last_stmt);

	fts_trx_table_add_op(tran_ftt, doc_id, state, fts_indexes);
	fts_trx_table_add_op(stmt_ftt, doc_id, state, fts_indexes);
}
/******************************************************************//**
Fetch callback that converts a textual document id to a binary value and
stores it in the given place.
@return always FALSE */
static
ibool
fts_fetch_store_doc_id(
/*===================*/
	void*	row,		/*!< in: sel_node_t* */
	void*	user_arg)	/*!< in: doc_id_t* to store doc_id in */
{
	int		n_parsed;
	sel_node_t*	node = static_cast<sel_node_t*>(row);
	doc_id_t*	doc_id = static_cast<doc_id_t*>(user_arg);
	dfield_t*	dfield = que_node_get_val(node->select_list);
	dtype_t*	type = dfield_get_type(dfield);
	ulint		len = dfield_get_len(dfield);
	char		buf[32];

	ut_a(dtype_get_mtype(type) == DATA_VARCHAR);
	ut_a(len > 0 && len < sizeof(buf));

	memcpy(buf, dfield_get_data(dfield), len);
	buf[len] = '\0';

	n_parsed = sscanf(buf, FTS_DOC_ID_FORMAT, doc_id);
	ut_a(n_parsed == 1);

	return(FALSE);
}
  1972. #ifdef FTS_CACHE_SIZE_DEBUG
  1973. /******************************************************************//**
  1974. Get the max cache size in bytes. If there is an error reading the
  1975. value we simply print an error message here and return the default
  1976. value to the caller.
  1977. @return max cache size in bytes */
  1978. static
  1979. ulint
  1980. fts_get_max_cache_size(
  1981. /*===================*/
  1982. trx_t* trx, /*!< in: transaction */
  1983. fts_table_t* fts_table) /*!< in: table instance */
  1984. {
  1985. dberr_t error;
  1986. fts_string_t value;
  1987. ulong cache_size_in_mb;
  1988. /* Set to the default value. */
  1989. cache_size_in_mb = FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB;
  1990. /* We set the length of value to the max bytes it can hold. This
  1991. information is used by the callback that reads the value. */
  1992. value.f_n_char = 0;
  1993. value.f_len = FTS_MAX_CONFIG_VALUE_LEN;
  1994. value.f_str = ut_malloc_nokey(value.f_len + 1);
  1995. error = fts_config_get_value(
  1996. trx, fts_table, FTS_MAX_CACHE_SIZE_IN_MB, &value);
  1997. if (UNIV_LIKELY(error == DB_SUCCESS)) {
  1998. value.f_str[value.f_len] = 0;
  1999. cache_size_in_mb = strtoul((char*) value.f_str, NULL, 10);
  2000. if (cache_size_in_mb > FTS_CACHE_SIZE_UPPER_LIMIT_IN_MB) {
  2001. ib::warn() << "FTS max cache size ("
  2002. << cache_size_in_mb << ") out of range."
  2003. " Minimum value is "
  2004. << FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB
  2005. << "MB and the maximum value is "
  2006. << FTS_CACHE_SIZE_UPPER_LIMIT_IN_MB
  2007. << "MB, setting cache size to upper limit";
  2008. cache_size_in_mb = FTS_CACHE_SIZE_UPPER_LIMIT_IN_MB;
  2009. } else if (cache_size_in_mb
  2010. < FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB) {
  2011. ib::warn() << "FTS max cache size ("
  2012. << cache_size_in_mb << ") out of range."
  2013. " Minimum value is "
  2014. << FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB
2015. << "MB and the maximum value is "
  2016. << FTS_CACHE_SIZE_UPPER_LIMIT_IN_MB
  2017. << "MB, setting cache size to lower limit";
  2018. cache_size_in_mb = FTS_CACHE_SIZE_LOWER_LIMIT_IN_MB;
  2019. }
  2020. } else {
  2021. ib::error() << "(" << error << ") reading max"
  2022. " cache config value from config table "
  2023. << fts_table->table->name;
  2024. }
  2025. ut_free(value.f_str);
  2026. return(cache_size_in_mb * 1024 * 1024);
  2027. }
  2028. #endif
  2029. /*********************************************************************//**
2030. Update the next and last Doc ID in the CONFIG table to be the input
2031. "doc_id" value (+ 1). We do so after each FTS index build or
2032. table truncate. */
  2033. void
  2034. fts_update_next_doc_id(
  2035. /*===================*/
  2036. trx_t* trx, /*!< in/out: transaction */
  2037. const dict_table_t* table, /*!< in: table */
  2038. doc_id_t doc_id) /*!< in: DOC ID to set */
  2039. {
  2040. table->fts->cache->synced_doc_id = doc_id;
  2041. table->fts->cache->next_doc_id = doc_id + 1;
  2042. table->fts->cache->first_doc_id = table->fts->cache->next_doc_id;
  2043. fts_update_sync_doc_id(
  2044. table, table->fts->cache->synced_doc_id, trx);
  2045. }
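/* Usage sketch (illustrative only, not part of the original source):
after an FTS index build or TRUNCATE, the caller would reset the
stored Doc ID; last_used_doc_id is a hypothetical local variable
holding the last Doc ID in use. */
#if 0
	fts_update_next_doc_id(trx, table, last_used_doc_id);
#endif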
  2046. /*********************************************************************//**
  2047. Get the next available document id.
  2048. @return DB_SUCCESS if OK */
  2049. dberr_t
  2050. fts_get_next_doc_id(
  2051. /*================*/
  2052. const dict_table_t* table, /*!< in: table */
  2053. doc_id_t* doc_id) /*!< out: new document id */
  2054. {
  2055. fts_cache_t* cache = table->fts->cache;
  2056. /* If the Doc ID system has not yet been initialized, we
  2057. will consult the CONFIG table and user table to re-establish
  2058. the initial value of the Doc ID */
  2059. if (cache->first_doc_id == FTS_NULL_DOC_ID) {
  2060. fts_init_doc_id(table);
  2061. }
  2062. if (!DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID)) {
  2063. *doc_id = FTS_NULL_DOC_ID;
  2064. return(DB_SUCCESS);
  2065. }
  2066. DEBUG_SYNC_C("get_next_FTS_DOC_ID");
  2067. mutex_enter(&cache->doc_id_lock);
  2068. *doc_id = cache->next_doc_id++;
  2069. mutex_exit(&cache->doc_id_lock);
  2070. return(DB_SUCCESS);
  2071. }
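/* Usage sketch (illustrative only, not part of the original source):
allocating a Doc ID for a row that is about to be inserted. */
#if 0
	doc_id_t	doc_id;
	dberr_t		err = fts_get_next_doc_id(table, &doc_id);

	if (err == DB_SUCCESS && doc_id != FTS_NULL_DOC_ID) {
		/* Store doc_id in the row's hidden FTS_DOC_ID column. */
	}
#endif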
  2072. /*********************************************************************//**
2073. Fetch the Doc ID from the CONFIG table, compare it with the
2074. Doc ID supplied, and store the larger one back to the CONFIG table.
  2075. @return DB_SUCCESS if OK */
  2076. static MY_ATTRIBUTE((nonnull))
  2077. dberr_t
  2078. fts_cmp_set_sync_doc_id(
  2079. /*====================*/
  2080. const dict_table_t* table, /*!< in: table */
  2081. doc_id_t cmp_doc_id, /*!< in: Doc ID to compare */
  2082. ibool read_only, /*!< in: TRUE if read the
  2083. synced_doc_id only */
  2084. doc_id_t* doc_id) /*!< out: larger document id
  2085. after comparing "cmp_doc_id"
  2086. to the one stored in CONFIG
  2087. table */
  2088. {
  2089. trx_t* trx;
  2090. pars_info_t* info;
  2091. dberr_t error;
  2092. fts_table_t fts_table;
  2093. que_t* graph = NULL;
  2094. fts_cache_t* cache = table->fts->cache;
  2095. char table_name[MAX_FULL_NAME_LEN];
  2096. retry:
  2097. ut_a(table->fts->doc_col != ULINT_UNDEFINED);
  2098. fts_table.suffix = "CONFIG";
  2099. fts_table.table_id = table->id;
  2100. fts_table.type = FTS_COMMON_TABLE;
  2101. fts_table.table = table;
  2102. trx = trx_create();
  2103. if (srv_read_only_mode) {
  2104. trx_start_internal_read_only(trx);
  2105. } else {
  2106. trx_start_internal(trx);
  2107. }
  2108. trx->op_info = "update the next FTS document id";
  2109. info = pars_info_create();
  2110. pars_info_bind_function(
  2111. info, "my_func", fts_fetch_store_doc_id, doc_id);
  2112. fts_get_table_name(&fts_table, table_name);
  2113. pars_info_bind_id(info, true, "config_table", table_name);
  2114. graph = fts_parse_sql(
  2115. &fts_table, info,
  2116. "DECLARE FUNCTION my_func;\n"
  2117. "DECLARE CURSOR c IS SELECT value FROM $config_table"
  2118. " WHERE key = 'synced_doc_id' FOR UPDATE;\n"
  2119. "BEGIN\n"
  2120. ""
  2121. "OPEN c;\n"
  2122. "WHILE 1 = 1 LOOP\n"
  2123. " FETCH c INTO my_func();\n"
  2124. " IF c % NOTFOUND THEN\n"
  2125. " EXIT;\n"
  2126. " END IF;\n"
  2127. "END LOOP;\n"
  2128. "CLOSE c;");
  2129. *doc_id = 0;
  2130. error = fts_eval_sql(trx, graph);
  2131. fts_que_graph_free_check_lock(&fts_table, NULL, graph);
  2132. // FIXME: We need to retry deadlock errors
  2133. if (error != DB_SUCCESS) {
  2134. goto func_exit;
  2135. }
  2136. if (read_only) {
  2137. /* InnoDB stores actual synced_doc_id value + 1 in
  2138. FTS_CONFIG table. Reduce the value by 1 while reading
  2139. after startup. */
  2140. if (*doc_id) *doc_id -= 1;
  2141. goto func_exit;
  2142. }
  2143. if (cmp_doc_id == 0 && *doc_id) {
  2144. cache->synced_doc_id = *doc_id - 1;
  2145. } else {
  2146. cache->synced_doc_id = ut_max(cmp_doc_id, *doc_id);
  2147. }
  2148. mutex_enter(&cache->doc_id_lock);
2149. /* For each sync operation, we increment next_doc_id by 1
2150. to mark the sync operation */
  2151. if (cache->next_doc_id < cache->synced_doc_id + 1) {
  2152. cache->next_doc_id = cache->synced_doc_id + 1;
  2153. }
  2154. mutex_exit(&cache->doc_id_lock);
  2155. if (cmp_doc_id > *doc_id) {
  2156. error = fts_update_sync_doc_id(
  2157. table, cache->synced_doc_id, trx);
  2158. }
  2159. *doc_id = cache->next_doc_id;
  2160. func_exit:
  2161. if (UNIV_LIKELY(error == DB_SUCCESS)) {
  2162. fts_sql_commit(trx);
  2163. } else {
  2164. *doc_id = 0;
  2165. ib::error() << "(" << error << ") while getting next doc id "
  2166. "for table " << table->name;
  2167. fts_sql_rollback(trx);
  2168. if (error == DB_DEADLOCK) {
  2169. os_thread_sleep(FTS_DEADLOCK_RETRY_WAIT);
  2170. goto retry;
  2171. }
  2172. }
  2173. trx->free();
  2174. return(error);
  2175. }
  2176. /*********************************************************************//**
  2177. Update the last document id. This function could create a new
  2178. transaction to update the last document id.
  2179. @return DB_SUCCESS if OK */
  2180. static
  2181. dberr_t
  2182. fts_update_sync_doc_id(
  2183. /*===================*/
  2184. const dict_table_t* table, /*!< in: table */
  2185. doc_id_t doc_id, /*!< in: last document id */
  2186. trx_t* trx) /*!< in: update trx, or NULL */
  2187. {
  2188. byte id[FTS_MAX_ID_LEN];
  2189. pars_info_t* info;
  2190. fts_table_t fts_table;
  2191. ulint id_len;
  2192. que_t* graph = NULL;
  2193. dberr_t error;
  2194. ibool local_trx = FALSE;
  2195. fts_cache_t* cache = table->fts->cache;
  2196. char fts_name[MAX_FULL_NAME_LEN];
  2197. if (srv_read_only_mode) {
  2198. return DB_READ_ONLY;
  2199. }
  2200. fts_table.suffix = "CONFIG";
  2201. fts_table.table_id = table->id;
  2202. fts_table.type = FTS_COMMON_TABLE;
  2203. fts_table.table = table;
  2204. if (!trx) {
  2205. trx = trx_create();
  2206. trx_start_internal(trx);
  2207. trx->op_info = "setting last FTS document id";
  2208. local_trx = TRUE;
  2209. }
  2210. info = pars_info_create();
  2211. id_len = (ulint) snprintf(
  2212. (char*) id, sizeof(id), FTS_DOC_ID_FORMAT, doc_id + 1);
  2213. pars_info_bind_varchar_literal(info, "doc_id", id, id_len);
  2214. fts_get_table_name(&fts_table, fts_name,
  2215. table->fts->dict_locked);
  2216. pars_info_bind_id(info, true, "table_name", fts_name);
  2217. graph = fts_parse_sql(
  2218. &fts_table, info,
  2219. "BEGIN"
  2220. " UPDATE $table_name SET value = :doc_id"
  2221. " WHERE key = 'synced_doc_id';");
  2222. error = fts_eval_sql(trx, graph);
  2223. fts_que_graph_free_check_lock(&fts_table, NULL, graph);
  2224. if (local_trx) {
  2225. if (UNIV_LIKELY(error == DB_SUCCESS)) {
  2226. fts_sql_commit(trx);
  2227. cache->synced_doc_id = doc_id;
  2228. } else {
  2229. ib::error() << "(" << error << ") while"
  2230. " updating last doc id for table"
  2231. << table->name;
  2232. fts_sql_rollback(trx);
  2233. }
  2234. trx->free();
  2235. }
  2236. return(error);
  2237. }
  2238. /*********************************************************************//**
  2239. Create a new fts_doc_ids_t.
  2240. @return new fts_doc_ids_t */
  2241. fts_doc_ids_t*
  2242. fts_doc_ids_create(void)
  2243. /*====================*/
  2244. {
  2245. fts_doc_ids_t* fts_doc_ids;
  2246. mem_heap_t* heap = mem_heap_create(512);
  2247. fts_doc_ids = static_cast<fts_doc_ids_t*>(
  2248. mem_heap_alloc(heap, sizeof(*fts_doc_ids)));
  2249. fts_doc_ids->self_heap = ib_heap_allocator_create(heap);
  2250. fts_doc_ids->doc_ids = static_cast<ib_vector_t*>(ib_vector_create(
  2251. fts_doc_ids->self_heap, sizeof(doc_id_t), 32));
  2252. return(fts_doc_ids);
  2253. }
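/* Usage sketch (illustrative only, not part of the original source;
assumes ib_vector_push() from ut0vec.h): collecting Doc IDs that are
pending deletion. */
#if 0
	fts_doc_ids_t*	ids = fts_doc_ids_create();
	doc_id_t	doc_id = 42;

	ib_vector_push(ids->doc_ids, &doc_id);
#endif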
  2254. /*********************************************************************//**
  2255. Do commit-phase steps necessary for the insertion of a new row. */
  2256. void
  2257. fts_add(
  2258. /*====*/
  2259. fts_trx_table_t*ftt, /*!< in: FTS trx table */
  2260. fts_trx_row_t* row) /*!< in: row */
  2261. {
  2262. dict_table_t* table = ftt->table;
  2263. doc_id_t doc_id = row->doc_id;
  2264. ut_a(row->state == FTS_INSERT || row->state == FTS_MODIFY);
  2265. fts_add_doc_by_id(ftt, doc_id, row->fts_indexes);
  2266. mutex_enter(&table->fts->cache->deleted_lock);
  2267. ++table->fts->cache->added;
  2268. mutex_exit(&table->fts->cache->deleted_lock);
  2269. if (!DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID)
  2270. && doc_id >= table->fts->cache->next_doc_id) {
  2271. table->fts->cache->next_doc_id = doc_id + 1;
  2272. }
  2273. }
  2274. /*********************************************************************//**
  2275. Do commit-phase steps necessary for the deletion of a row.
  2276. @return DB_SUCCESS or error code */
  2277. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  2278. dberr_t
  2279. fts_delete(
  2280. /*=======*/
  2281. fts_trx_table_t*ftt, /*!< in: FTS trx table */
  2282. fts_trx_row_t* row) /*!< in: row */
  2283. {
  2284. que_t* graph;
  2285. fts_table_t fts_table;
  2286. dberr_t error = DB_SUCCESS;
  2287. doc_id_t write_doc_id;
  2288. dict_table_t* table = ftt->table;
  2289. doc_id_t doc_id = row->doc_id;
  2290. trx_t* trx = ftt->fts_trx->trx;
  2291. pars_info_t* info = pars_info_create();
  2292. fts_cache_t* cache = table->fts->cache;
2293. /* We do not index documents whose Doc ID value is 0. */
  2294. if (doc_id == FTS_NULL_DOC_ID) {
  2295. ut_ad(!DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_HAS_DOC_ID));
  2296. return(error);
  2297. }
  2298. ut_a(row->state == FTS_DELETE || row->state == FTS_MODIFY);
  2299. FTS_INIT_FTS_TABLE(&fts_table, "DELETED", FTS_COMMON_TABLE, table);
  2300. /* Convert to "storage" byte order. */
  2301. fts_write_doc_id((byte*) &write_doc_id, doc_id);
  2302. fts_bind_doc_id(info, "doc_id", &write_doc_id);
2303. /* It is possible that we are updating a record that was not yet
2304. synced into the cache after the last crash (deleting a document
2305. does not initialize the sync). Avoid any "added" counter accounting
2306. until the FTS cache is re-established and synced. */
  2307. if (table->fts->added_synced
  2308. && doc_id > cache->synced_doc_id) {
  2309. mutex_enter(&table->fts->cache->deleted_lock);
2310. /* The Doc ID could be one of those left in the
2311. ADDED table after the last crash, so we need to check
2312. whether it is less than first_doc_id when we initialize
2313. the Doc ID system after reboot. */
  2314. if (doc_id >= table->fts->cache->first_doc_id
  2315. && table->fts->cache->added > 0) {
  2316. --table->fts->cache->added;
  2317. }
  2318. mutex_exit(&table->fts->cache->deleted_lock);
  2319. /* Only if the row was really deleted. */
  2320. ut_a(row->state == FTS_DELETE || row->state == FTS_MODIFY);
  2321. }
  2322. /* Note the deleted document for OPTIMIZE to purge. */
  2323. if (error == DB_SUCCESS) {
  2324. char table_name[MAX_FULL_NAME_LEN];
  2325. trx->op_info = "adding doc id to FTS DELETED";
  2326. info->graph_owns_us = TRUE;
  2327. fts_table.suffix = "DELETED";
  2328. fts_get_table_name(&fts_table, table_name);
  2329. pars_info_bind_id(info, true, "deleted", table_name);
  2330. graph = fts_parse_sql(
  2331. &fts_table,
  2332. info,
  2333. "BEGIN INSERT INTO $deleted VALUES (:doc_id);");
  2334. error = fts_eval_sql(trx, graph);
  2335. fts_que_graph_free(graph);
  2336. } else {
  2337. pars_info_free(info);
  2338. }
2339. /* Increment the total deleted count; this is used to calculate
2340. the number of documents indexed. */
  2341. if (error == DB_SUCCESS) {
  2342. mutex_enter(&table->fts->cache->deleted_lock);
  2343. ++table->fts->cache->deleted;
  2344. mutex_exit(&table->fts->cache->deleted_lock);
  2345. }
  2346. return(error);
  2347. }
  2348. /*********************************************************************//**
  2349. Do commit-phase steps necessary for the modification of a row.
  2350. @return DB_SUCCESS or error code */
  2351. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  2352. dberr_t
  2353. fts_modify(
  2354. /*=======*/
  2355. fts_trx_table_t* ftt, /*!< in: FTS trx table */
  2356. fts_trx_row_t* row) /*!< in: row */
  2357. {
  2358. dberr_t error;
  2359. ut_a(row->state == FTS_MODIFY);
  2360. error = fts_delete(ftt, row);
  2361. if (error == DB_SUCCESS) {
  2362. fts_add(ftt, row);
  2363. }
  2364. return(error);
  2365. }
  2366. /*********************************************************************//**
  2367. The given transaction is about to be committed; do whatever is necessary
  2368. from the FTS system's POV.
  2369. @return DB_SUCCESS or error code */
  2370. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  2371. dberr_t
  2372. fts_commit_table(
  2373. /*=============*/
  2374. fts_trx_table_t* ftt) /*!< in: FTS table to commit*/
  2375. {
  2376. if (srv_read_only_mode) {
  2377. return DB_READ_ONLY;
  2378. }
  2379. const ib_rbt_node_t* node;
  2380. ib_rbt_t* rows;
  2381. dberr_t error = DB_SUCCESS;
  2382. fts_cache_t* cache = ftt->table->fts->cache;
  2383. trx_t* trx = trx_create();
  2384. trx_start_internal(trx);
  2385. rows = ftt->rows;
  2386. ftt->fts_trx->trx = trx;
  2387. if (cache->get_docs == NULL) {
  2388. rw_lock_x_lock(&cache->init_lock);
  2389. if (cache->get_docs == NULL) {
  2390. cache->get_docs = fts_get_docs_create(cache);
  2391. }
  2392. rw_lock_x_unlock(&cache->init_lock);
  2393. }
  2394. for (node = rbt_first(rows);
  2395. node != NULL && error == DB_SUCCESS;
  2396. node = rbt_next(rows, node)) {
  2397. fts_trx_row_t* row = rbt_value(fts_trx_row_t, node);
  2398. switch (row->state) {
  2399. case FTS_INSERT:
  2400. fts_add(ftt, row);
  2401. break;
  2402. case FTS_MODIFY:
  2403. error = fts_modify(ftt, row);
  2404. break;
  2405. case FTS_DELETE:
  2406. error = fts_delete(ftt, row);
  2407. break;
  2408. default:
  2409. ut_error;
  2410. }
  2411. }
  2412. fts_sql_commit(trx);
  2413. trx->free();
  2414. return(error);
  2415. }
  2416. /*********************************************************************//**
  2417. The given transaction is about to be committed; do whatever is necessary
  2418. from the FTS system's POV.
  2419. @return DB_SUCCESS or error code */
  2420. dberr_t
  2421. fts_commit(
  2422. /*=======*/
  2423. trx_t* trx) /*!< in: transaction */
  2424. {
  2425. const ib_rbt_node_t* node;
  2426. dberr_t error;
  2427. ib_rbt_t* tables;
  2428. fts_savepoint_t* savepoint;
  2429. savepoint = static_cast<fts_savepoint_t*>(
  2430. ib_vector_last(trx->fts_trx->savepoints));
  2431. tables = savepoint->tables;
  2432. for (node = rbt_first(tables), error = DB_SUCCESS;
  2433. node != NULL && error == DB_SUCCESS;
  2434. node = rbt_next(tables, node)) {
  2435. fts_trx_table_t** ftt;
  2436. ftt = rbt_value(fts_trx_table_t*, node);
  2437. error = fts_commit_table(*ftt);
  2438. }
  2439. return(error);
  2440. }
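/* Usage sketch (illustrative only, not part of the original source):
the transaction commit path would flush the accumulated FTS
operations roughly like this. */
#if 0
	if (trx->fts_trx != NULL) {
		dberr_t	err = fts_commit(trx);

		/* The commit path is expected to handle any error
		returned here, e.g. DB_READ_ONLY. */
	}
#endif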
  2441. /*********************************************************************//**
  2442. Initialize a document. */
  2443. void
  2444. fts_doc_init(
  2445. /*=========*/
  2446. fts_doc_t* doc) /*!< in: doc to initialize */
  2447. {
  2448. mem_heap_t* heap = mem_heap_create(32);
  2449. memset(doc, 0, sizeof(*doc));
  2450. doc->self_heap = ib_heap_allocator_create(heap);
  2451. }
  2452. /*********************************************************************//**
  2453. Free document. */
  2454. void
  2455. fts_doc_free(
  2456. /*=========*/
  2457. fts_doc_t* doc) /*!< in: document */
  2458. {
  2459. mem_heap_t* heap = static_cast<mem_heap_t*>(doc->self_heap->arg);
  2460. if (doc->tokens) {
  2461. rbt_free(doc->tokens);
  2462. }
  2463. ut_d(memset(doc, 0, sizeof(*doc)));
  2464. mem_heap_free(heap);
  2465. }
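/* Usage sketch (illustrative only, not part of the original source):
fts_doc_t is always used as an init/free pair around tokenization. */
#if 0
	fts_doc_t	doc;

	fts_doc_init(&doc);
	/* ... tokenize text into doc.tokens ... */
	fts_doc_free(&doc);
#endif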
  2466. /*********************************************************************//**
  2467. Callback function for fetch that stores the text of an FTS document,
  2468. converting each column to UTF-16.
  2469. @return always FALSE */
  2470. ibool
  2471. fts_query_expansion_fetch_doc(
  2472. /*==========================*/
  2473. void* row, /*!< in: sel_node_t* */
  2474. void* user_arg) /*!< in: fts_doc_t* */
  2475. {
  2476. que_node_t* exp;
  2477. sel_node_t* node = static_cast<sel_node_t*>(row);
  2478. fts_doc_t* result_doc = static_cast<fts_doc_t*>(user_arg);
  2479. dfield_t* dfield;
  2480. ulint len;
  2481. ulint doc_len;
  2482. fts_doc_t doc;
  2483. CHARSET_INFO* doc_charset = NULL;
  2484. ulint field_no = 0;
  2485. len = 0;
  2486. fts_doc_init(&doc);
  2487. doc.found = TRUE;
  2488. exp = node->select_list;
  2489. doc_len = 0;
  2490. doc_charset = result_doc->charset;
  2491. /* Copy each indexed column content into doc->text.f_str */
  2492. while (exp) {
  2493. dfield = que_node_get_val(exp);
  2494. len = dfield_get_len(dfield);
  2495. /* NULL column */
  2496. if (len == UNIV_SQL_NULL) {
  2497. exp = que_node_get_next(exp);
  2498. continue;
  2499. }
  2500. if (!doc_charset) {
  2501. doc_charset = fts_get_charset(dfield->type.prtype);
  2502. }
  2503. doc.charset = doc_charset;
  2504. if (dfield_is_ext(dfield)) {
2505. /* We ignore columns that are stored externally; tokenizing
2506. them could result in too many words to search. */
  2507. exp = que_node_get_next(exp);
  2508. continue;
  2509. } else {
  2510. doc.text.f_n_char = 0;
  2511. doc.text.f_str = static_cast<byte*>(
  2512. dfield_get_data(dfield));
  2513. doc.text.f_len = len;
  2514. }
  2515. if (field_no == 0) {
  2516. fts_tokenize_document(&doc, result_doc,
  2517. result_doc->parser);
  2518. } else {
  2519. fts_tokenize_document_next(&doc, doc_len, result_doc,
  2520. result_doc->parser);
  2521. }
  2522. exp = que_node_get_next(exp);
  2523. doc_len += (exp) ? len + 1 : len;
  2524. field_no++;
  2525. }
  2526. ut_ad(doc_charset);
  2527. if (!result_doc->charset) {
  2528. result_doc->charset = doc_charset;
  2529. }
  2530. fts_doc_free(&doc);
  2531. return(FALSE);
  2532. }
  2533. /*********************************************************************//**
2534. Fetch and tokenize the document. */
  2535. static
  2536. void
  2537. fts_fetch_doc_from_rec(
  2538. /*===================*/
  2539. fts_get_doc_t* get_doc, /*!< in: FTS index's get_doc struct */
  2540. dict_index_t* clust_index, /*!< in: cluster index */
  2541. btr_pcur_t* pcur, /*!< in: cursor whose position
  2542. has been stored */
  2543. rec_offs* offsets, /*!< in: offsets */
  2544. fts_doc_t* doc) /*!< out: fts doc to hold parsed
  2545. documents */
  2546. {
  2547. dict_index_t* index;
  2548. const rec_t* clust_rec;
  2549. const dict_field_t* ifield;
  2550. ulint clust_pos;
  2551. ulint doc_len = 0;
  2552. st_mysql_ftparser* parser;
  2553. if (!get_doc) {
  2554. return;
  2555. }
  2556. index = get_doc->index_cache->index;
  2557. parser = get_doc->index_cache->index->parser;
  2558. clust_rec = btr_pcur_get_rec(pcur);
  2559. ut_ad(!page_rec_is_comp(clust_rec)
  2560. || rec_get_status(clust_rec) == REC_STATUS_ORDINARY);
  2561. for (ulint i = 0; i < index->n_fields; i++) {
  2562. ifield = dict_index_get_nth_field(index, i);
  2563. clust_pos = dict_col_get_clust_pos(ifield->col, clust_index);
  2564. if (!get_doc->index_cache->charset) {
  2565. get_doc->index_cache->charset = fts_get_charset(
  2566. ifield->col->prtype);
  2567. }
  2568. if (rec_offs_nth_extern(offsets, clust_pos)) {
  2569. doc->text.f_str =
  2570. btr_rec_copy_externally_stored_field(
  2571. clust_rec, offsets,
  2572. btr_pcur_get_block(pcur)->zip_size(),
  2573. clust_pos, &doc->text.f_len,
  2574. static_cast<mem_heap_t*>(
  2575. doc->self_heap->arg));
  2576. } else {
  2577. doc->text.f_str = (byte*) rec_get_nth_field(
  2578. clust_rec, offsets, clust_pos,
  2579. &doc->text.f_len);
  2580. }
  2581. doc->found = TRUE;
  2582. doc->charset = get_doc->index_cache->charset;
  2583. /* Null Field */
  2584. if (doc->text.f_len == UNIV_SQL_NULL || doc->text.f_len == 0) {
  2585. continue;
  2586. }
  2587. if (!doc_len) {
  2588. fts_tokenize_document(doc, NULL, parser);
  2589. } else {
  2590. fts_tokenize_document_next(doc, doc_len, NULL, parser);
  2591. }
  2592. doc_len += doc->text.f_len + 1;
  2593. }
  2594. }
  2595. /** Fetch the data from tuple and tokenize the document.
  2596. @param[in] get_doc FTS index's get_doc struct
  2597. @param[in] tuple tuple should be arranged in table schema order
  2598. @param[out] doc fts doc to hold parsed documents. */
  2599. static
  2600. void
  2601. fts_fetch_doc_from_tuple(
  2602. fts_get_doc_t* get_doc,
  2603. const dtuple_t* tuple,
  2604. fts_doc_t* doc)
  2605. {
  2606. dict_index_t* index;
  2607. st_mysql_ftparser* parser;
  2608. ulint doc_len = 0;
  2609. ulint processed_doc = 0;
  2610. ulint num_field;
  2611. if (get_doc == NULL) {
  2612. return;
  2613. }
  2614. index = get_doc->index_cache->index;
  2615. parser = get_doc->index_cache->index->parser;
  2616. num_field = dict_index_get_n_fields(index);
  2617. for (ulint i = 0; i < num_field; i++) {
  2618. const dict_field_t* ifield;
  2619. const dict_col_t* col;
  2620. ulint pos;
  2621. ifield = dict_index_get_nth_field(index, i);
  2622. col = dict_field_get_col(ifield);
  2623. pos = dict_col_get_no(col);
  2624. const dfield_t* field = dtuple_get_nth_field(tuple, pos);
  2625. if (!get_doc->index_cache->charset) {
  2626. get_doc->index_cache->charset = fts_get_charset(
  2627. ifield->col->prtype);
  2628. }
  2629. ut_ad(!dfield_is_ext(field));
  2630. doc->text.f_str = (byte*) dfield_get_data(field);
  2631. doc->text.f_len = dfield_get_len(field);
  2632. doc->found = TRUE;
  2633. doc->charset = get_doc->index_cache->charset;
  2634. /* field data is NULL. */
  2635. if (doc->text.f_len == UNIV_SQL_NULL || doc->text.f_len == 0) {
  2636. continue;
  2637. }
  2638. if (processed_doc == 0) {
  2639. fts_tokenize_document(doc, NULL, parser);
  2640. } else {
  2641. fts_tokenize_document_next(doc, doc_len, NULL, parser);
  2642. }
  2643. processed_doc++;
  2644. doc_len += doc->text.f_len + 1;
  2645. }
  2646. }
2647. /** Fetch the document from a tuple, tokenize the text data and
2648. insert the text data into the FTS auxiliary table and
2649. its cache. The tuple fields do not contain any information
2650. about externally stored fields; the tuple contains data directly
2651. converted from MySQL.
  2652. @param[in] ftt FTS transaction table
  2653. @param[in] doc_id doc id
  2654. @param[in] tuple tuple from where data can be retrieved
  2655. and tuple should be arranged in table
  2656. schema order. */
  2657. void
  2658. fts_add_doc_from_tuple(
  2659. fts_trx_table_t*ftt,
  2660. doc_id_t doc_id,
  2661. const dtuple_t* tuple)
  2662. {
  2663. mtr_t mtr;
  2664. fts_cache_t* cache = ftt->table->fts->cache;
  2665. ut_ad(cache->get_docs);
  2666. if (!ftt->table->fts->added_synced) {
  2667. fts_init_index(ftt->table, FALSE);
  2668. }
  2669. mtr_start(&mtr);
  2670. ulint num_idx = ib_vector_size(cache->get_docs);
  2671. for (ulint i = 0; i < num_idx; ++i) {
  2672. fts_doc_t doc;
  2673. dict_table_t* table;
  2674. fts_get_doc_t* get_doc;
  2675. get_doc = static_cast<fts_get_doc_t*>(
  2676. ib_vector_get(cache->get_docs, i));
  2677. table = get_doc->index_cache->index->table;
  2678. fts_doc_init(&doc);
  2679. fts_fetch_doc_from_tuple(
  2680. get_doc, tuple, &doc);
  2681. if (doc.found) {
  2682. mtr_commit(&mtr);
  2683. rw_lock_x_lock(&table->fts->cache->lock);
  2684. if (table->fts->cache->stopword_info.status
  2685. & STOPWORD_NOT_INIT) {
  2686. fts_load_stopword(table, NULL, NULL,
  2687. true, true);
  2688. }
  2689. fts_cache_add_doc(
  2690. table->fts->cache,
  2691. get_doc->index_cache,
  2692. doc_id, doc.tokens);
  2693. rw_lock_x_unlock(&table->fts->cache->lock);
  2694. if (cache->total_size > fts_max_cache_size / 5
  2695. || fts_need_sync) {
  2696. fts_sync(cache->sync, true, false);
  2697. }
  2698. mtr_start(&mtr);
  2699. }
  2700. fts_doc_free(&doc);
  2701. }
  2702. mtr_commit(&mtr);
  2703. }
  2704. /*********************************************************************//**
2705. This function fetches the document inserted during the committing
2706. transaction, tokenizes the inserted text data and inserts it into
2707. the FTS auxiliary table and its cache.
  2708. @return TRUE if successful */
  2709. static
  2710. ulint
  2711. fts_add_doc_by_id(
  2712. /*==============*/
  2713. fts_trx_table_t*ftt, /*!< in: FTS trx table */
  2714. doc_id_t doc_id, /*!< in: doc id */
  2715. ib_vector_t* fts_indexes MY_ATTRIBUTE((unused)))
  2716. /*!< in: affected fts indexes */
  2717. {
  2718. mtr_t mtr;
  2719. mem_heap_t* heap;
  2720. btr_pcur_t pcur;
  2721. dict_table_t* table;
  2722. dtuple_t* tuple;
  2723. dfield_t* dfield;
  2724. fts_get_doc_t* get_doc;
  2725. doc_id_t temp_doc_id;
  2726. dict_index_t* clust_index;
  2727. dict_index_t* fts_id_index;
  2728. ibool is_id_cluster;
  2729. fts_cache_t* cache = ftt->table->fts->cache;
  2730. ut_ad(cache->get_docs);
  2731. /* If Doc ID has been supplied by the user, then the table
  2732. might not yet be sync-ed */
  2733. if (!ftt->table->fts->added_synced) {
  2734. fts_init_index(ftt->table, FALSE);
  2735. }
  2736. /* Get the first FTS index's get_doc */
  2737. get_doc = static_cast<fts_get_doc_t*>(
  2738. ib_vector_get(cache->get_docs, 0));
  2739. ut_ad(get_doc);
  2740. table = get_doc->index_cache->index->table;
  2741. heap = mem_heap_create(512);
  2742. clust_index = dict_table_get_first_index(table);
  2743. fts_id_index = table->fts_doc_id_index;
2744. /* Check whether the index on FTS_DOC_ID is the clustered index */
  2745. is_id_cluster = (clust_index == fts_id_index);
  2746. mtr_start(&mtr);
  2747. btr_pcur_init(&pcur);
  2748. /* Search based on Doc ID. Here, we'll need to consider the case
  2749. when there is no primary index on Doc ID */
  2750. tuple = dtuple_create(heap, 1);
  2751. dfield = dtuple_get_nth_field(tuple, 0);
  2752. dfield->type.mtype = DATA_INT;
  2753. dfield->type.prtype = DATA_NOT_NULL | DATA_UNSIGNED | DATA_BINARY_TYPE;
  2754. mach_write_to_8((byte*) &temp_doc_id, doc_id);
  2755. dfield_set_data(dfield, &temp_doc_id, sizeof(temp_doc_id));
  2756. btr_pcur_open_with_no_init(
  2757. fts_id_index, tuple, PAGE_CUR_LE, BTR_SEARCH_LEAF,
  2758. &pcur, 0, &mtr);
  2759. /* If we have a match, add the data to doc structure */
  2760. if (btr_pcur_get_low_match(&pcur) == 1) {
  2761. const rec_t* rec;
  2762. btr_pcur_t* doc_pcur;
  2763. const rec_t* clust_rec;
  2764. btr_pcur_t clust_pcur;
  2765. rec_offs* offsets = NULL;
  2766. ulint num_idx = ib_vector_size(cache->get_docs);
  2767. rec = btr_pcur_get_rec(&pcur);
  2768. /* Doc could be deleted */
  2769. if (page_rec_is_infimum(rec)
  2770. || rec_get_deleted_flag(rec, dict_table_is_comp(table))) {
  2771. goto func_exit;
  2772. }
  2773. if (is_id_cluster) {
  2774. clust_rec = rec;
  2775. doc_pcur = &pcur;
  2776. } else {
  2777. dtuple_t* clust_ref;
  2778. ulint n_fields;
  2779. btr_pcur_init(&clust_pcur);
  2780. n_fields = dict_index_get_n_unique(clust_index);
  2781. clust_ref = dtuple_create(heap, n_fields);
  2782. dict_index_copy_types(clust_ref, clust_index, n_fields);
  2783. row_build_row_ref_in_tuple(
  2784. clust_ref, rec, fts_id_index, NULL);
  2785. btr_pcur_open_with_no_init(
  2786. clust_index, clust_ref, PAGE_CUR_LE,
  2787. BTR_SEARCH_LEAF, &clust_pcur, 0, &mtr);
  2788. doc_pcur = &clust_pcur;
  2789. clust_rec = btr_pcur_get_rec(&clust_pcur);
  2790. }
  2791. offsets = rec_get_offsets(clust_rec, clust_index, NULL,
  2792. clust_index->n_core_fields,
  2793. ULINT_UNDEFINED, &heap);
  2794. for (ulint i = 0; i < num_idx; ++i) {
  2795. fts_doc_t doc;
  2796. dict_table_t* table;
  2797. fts_get_doc_t* get_doc;
  2798. get_doc = static_cast<fts_get_doc_t*>(
  2799. ib_vector_get(cache->get_docs, i));
  2800. table = get_doc->index_cache->index->table;
  2801. fts_doc_init(&doc);
  2802. fts_fetch_doc_from_rec(
  2803. get_doc, clust_index, doc_pcur, offsets, &doc);
  2804. if (doc.found) {
  2805. ibool success MY_ATTRIBUTE((unused));
  2806. btr_pcur_store_position(doc_pcur, &mtr);
  2807. mtr_commit(&mtr);
  2808. rw_lock_x_lock(&table->fts->cache->lock);
  2809. if (table->fts->cache->stopword_info.status
  2810. & STOPWORD_NOT_INIT) {
  2811. fts_load_stopword(table, NULL,
  2812. NULL, true, true);
  2813. }
  2814. fts_cache_add_doc(
  2815. table->fts->cache,
  2816. get_doc->index_cache,
  2817. doc_id, doc.tokens);
  2818. bool need_sync = false;
  2819. if ((cache->total_size > fts_max_cache_size / 10
  2820. || fts_need_sync)
  2821. && !cache->sync->in_progress) {
  2822. need_sync = true;
  2823. }
  2824. rw_lock_x_unlock(&table->fts->cache->lock);
  2825. DBUG_EXECUTE_IF(
  2826. "fts_instrument_sync",
  2827. fts_optimize_request_sync_table(table);
  2828. os_event_wait(cache->sync->event);
  2829. );
  2830. DBUG_EXECUTE_IF(
  2831. "fts_instrument_sync_debug",
  2832. fts_sync(cache->sync, true, true);
  2833. );
  2834. DEBUG_SYNC_C("fts_instrument_sync_request");
  2835. DBUG_EXECUTE_IF(
  2836. "fts_instrument_sync_request",
  2837. fts_optimize_request_sync_table(table);
  2838. );
  2839. if (need_sync) {
  2840. fts_optimize_request_sync_table(table);
  2841. }
  2842. mtr_start(&mtr);
  2843. if (i < num_idx - 1) {
  2844. success = btr_pcur_restore_position(
  2845. BTR_SEARCH_LEAF, doc_pcur,
  2846. &mtr);
  2847. ut_ad(success);
  2848. }
  2849. }
  2850. fts_doc_free(&doc);
  2851. }
  2852. if (!is_id_cluster) {
  2853. btr_pcur_close(doc_pcur);
  2854. }
  2855. }
  2856. func_exit:
  2857. mtr_commit(&mtr);
  2858. btr_pcur_close(&pcur);
  2859. mem_heap_free(heap);
  2860. return(TRUE);
  2861. }
  2862. /*********************************************************************//**
  2863. Callback function to read a single ulint column.
2864. @return always returns TRUE */
  2865. static
  2866. ibool
  2867. fts_read_ulint(
  2868. /*===========*/
  2869. void* row, /*!< in: sel_node_t* */
  2870. void* user_arg) /*!< in: pointer to ulint */
  2871. {
  2872. sel_node_t* sel_node = static_cast<sel_node_t*>(row);
  2873. ulint* value = static_cast<ulint*>(user_arg);
  2874. que_node_t* exp = sel_node->select_list;
  2875. dfield_t* dfield = que_node_get_val(exp);
  2876. void* data = dfield_get_data(dfield);
  2877. *value = mach_read_from_4(static_cast<const byte*>(data));
  2878. return(TRUE);
  2879. }
  2880. /*********************************************************************//**
  2881. Get maximum Doc ID in a table if index "FTS_DOC_ID_INDEX" exists
  2882. @return max Doc ID or 0 if index "FTS_DOC_ID_INDEX" does not exist */
  2883. doc_id_t
  2884. fts_get_max_doc_id(
  2885. /*===============*/
  2886. dict_table_t* table) /*!< in: user table */
  2887. {
  2888. dict_index_t* index;
  2889. dict_field_t* dfield MY_ATTRIBUTE((unused)) = NULL;
  2890. doc_id_t doc_id = 0;
  2891. mtr_t mtr;
  2892. btr_pcur_t pcur;
  2893. index = table->fts_doc_id_index;
  2894. if (!index) {
  2895. return(0);
  2896. }
  2897. ut_ad(!index->is_instant());
  2898. dfield = dict_index_get_nth_field(index, 0);
  2899. #if 0 /* This can fail when renaming a column to FTS_DOC_ID_COL_NAME. */
  2900. ut_ad(innobase_strcasecmp(FTS_DOC_ID_COL_NAME, dfield->name) == 0);
  2901. #endif
  2902. mtr_start(&mtr);
2903. /* Fetch the largest value in the index */
  2904. btr_pcur_open_at_index_side(
  2905. false, index, BTR_SEARCH_LEAF, &pcur, true, 0, &mtr);
  2906. if (!page_is_empty(btr_pcur_get_page(&pcur))) {
  2907. const rec_t* rec = NULL;
  2908. do {
  2909. rec = btr_pcur_get_rec(&pcur);
  2910. if (page_rec_is_user_rec(rec)) {
  2911. break;
  2912. }
  2913. } while (btr_pcur_move_to_prev(&pcur, &mtr));
  2914. if (!rec || rec_is_metadata(rec, *index)) {
  2915. goto func_exit;
  2916. }
  2917. doc_id = fts_read_doc_id(rec);
  2918. }
  2919. func_exit:
  2920. btr_pcur_close(&pcur);
  2921. mtr_commit(&mtr);
  2922. return(doc_id);
  2923. }
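/* Usage sketch (illustrative only, not part of the original source):
reading the largest Doc ID currently stored in the table. */
#if 0
	doc_id_t	max_doc_id = fts_get_max_doc_id(table);

	if (max_doc_id == 0) {
		/* Either the table is empty or FTS_DOC_ID_INDEX
		does not exist. */
	}
#endif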
  2924. /*********************************************************************//**
  2925. Fetch document with the given document id.
  2926. @return DB_SUCCESS if OK else error */
  2927. dberr_t
  2928. fts_doc_fetch_by_doc_id(
  2929. /*====================*/
  2930. fts_get_doc_t* get_doc, /*!< in: state */
  2931. doc_id_t doc_id, /*!< in: id of document to
  2932. fetch */
  2933. dict_index_t* index_to_use, /*!< in: caller supplied FTS index,
  2934. or NULL */
  2935. ulint option, /*!< in: search option, if it is
  2936. greater than doc_id or equal */
  2937. fts_sql_callback
  2938. callback, /*!< in: callback to read */
  2939. void* arg) /*!< in: callback arg */
  2940. {
  2941. pars_info_t* info;
  2942. dberr_t error;
  2943. const char* select_str;
  2944. doc_id_t write_doc_id;
  2945. dict_index_t* index;
  2946. trx_t* trx = trx_create();
  2947. que_t* graph;
  2948. trx->op_info = "fetching indexed FTS document";
  2949. /* The FTS index can be supplied by caller directly with
  2950. "index_to_use", otherwise, get it from "get_doc" */
  2951. index = (index_to_use) ? index_to_use : get_doc->index_cache->index;
  2952. if (get_doc && get_doc->get_document_graph) {
  2953. info = get_doc->get_document_graph->info;
  2954. } else {
  2955. info = pars_info_create();
  2956. }
  2957. /* Convert to "storage" byte order. */
  2958. fts_write_doc_id((byte*) &write_doc_id, doc_id);
  2959. fts_bind_doc_id(info, "doc_id", &write_doc_id);
  2960. pars_info_bind_function(info, "my_func", callback, arg);
  2961. select_str = fts_get_select_columns_str(index, info, info->heap);
  2962. pars_info_bind_id(info, TRUE, "table_name", index->table->name.m_name);
  2963. if (!get_doc || !get_doc->get_document_graph) {
  2964. if (option == FTS_FETCH_DOC_BY_ID_EQUAL) {
  2965. graph = fts_parse_sql(
  2966. NULL,
  2967. info,
  2968. mem_heap_printf(info->heap,
  2969. "DECLARE FUNCTION my_func;\n"
  2970. "DECLARE CURSOR c IS"
  2971. " SELECT %s FROM $table_name"
  2972. " WHERE %s = :doc_id;\n"
  2973. "BEGIN\n"
  2974. ""
  2975. "OPEN c;\n"
  2976. "WHILE 1 = 1 LOOP\n"
  2977. " FETCH c INTO my_func();\n"
  2978. " IF c %% NOTFOUND THEN\n"
  2979. " EXIT;\n"
  2980. " END IF;\n"
  2981. "END LOOP;\n"
  2982. "CLOSE c;",
  2983. select_str, FTS_DOC_ID_COL_NAME));
  2984. } else {
  2985. ut_ad(option == FTS_FETCH_DOC_BY_ID_LARGE);
2986. /* This is used for crash recovery of a table with a
2987. hidden DOC ID or FTS indexes. We will scan the table
2988. to re-process user table rows whose DOC ID or
2989. FTS indexed documents have not been synced to disk
2990. before the recent crash.
2991. In the case that all fulltext indexes are dropped
2992. for a table, we will keep the "hidden" FTS_DOC_ID
2993. column, and this scan retrieves the largest
2994. DOC ID being used in the table to determine the
2995. appropriate next DOC ID.
2996. In the case that fulltext index(es) exist, this
2997. operation will re-tokenize any docs that have not
2998. been synced to disk, and re-prime the FTS
2999. cache */
  3000. graph = fts_parse_sql(
  3001. NULL,
  3002. info,
  3003. mem_heap_printf(info->heap,
  3004. "DECLARE FUNCTION my_func;\n"
  3005. "DECLARE CURSOR c IS"
  3006. " SELECT %s, %s FROM $table_name"
  3007. " WHERE %s > :doc_id;\n"
  3008. "BEGIN\n"
  3009. ""
  3010. "OPEN c;\n"
  3011. "WHILE 1 = 1 LOOP\n"
  3012. " FETCH c INTO my_func();\n"
  3013. " IF c %% NOTFOUND THEN\n"
  3014. " EXIT;\n"
  3015. " END IF;\n"
  3016. "END LOOP;\n"
  3017. "CLOSE c;",
  3018. FTS_DOC_ID_COL_NAME,
  3019. select_str, FTS_DOC_ID_COL_NAME));
  3020. }
  3021. if (get_doc) {
  3022. get_doc->get_document_graph = graph;
  3023. }
  3024. } else {
  3025. graph = get_doc->get_document_graph;
  3026. }
  3027. error = fts_eval_sql(trx, graph);
  3028. fts_sql_commit(trx);
  3029. trx->free();
  3030. if (!get_doc) {
  3031. fts_que_graph_free(graph);
  3032. }
  3033. return(error);
  3034. }
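/* Usage sketch (illustrative only, not part of the original source):
fetching one document by Doc ID; my_fetch_cb and my_arg are
hypothetical names for the caller's fts_sql_callback and its
argument. */
#if 0
	dberr_t	err = fts_doc_fetch_by_doc_id(
		NULL, doc_id, fts_index,
		FTS_FETCH_DOC_BY_ID_EQUAL, my_fetch_cb, &my_arg);
#endif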
  3035. /*********************************************************************//**
  3036. Write out a single word's data as new entry/entries in the INDEX table.
  3037. @return DB_SUCCESS if all OK. */
  3038. dberr_t
  3039. fts_write_node(
  3040. /*===========*/
  3041. trx_t* trx, /*!< in: transaction */
  3042. que_t** graph, /*!< in: query graph */
  3043. fts_table_t* fts_table, /*!< in: aux table */
  3044. fts_string_t* word, /*!< in: word in UTF-8 */
  3045. fts_node_t* node) /*!< in: node columns */
  3046. {
  3047. pars_info_t* info;
  3048. dberr_t error;
  3049. ib_uint32_t doc_count;
  3050. time_t start_time;
  3051. doc_id_t last_doc_id;
  3052. doc_id_t first_doc_id;
  3053. char table_name[MAX_FULL_NAME_LEN];
  3054. ut_a(node->ilist != NULL);
  3055. if (*graph) {
  3056. info = (*graph)->info;
  3057. } else {
  3058. info = pars_info_create();
  3059. fts_get_table_name(fts_table, table_name);
  3060. pars_info_bind_id(info, true, "index_table_name", table_name);
  3061. }
  3062. pars_info_bind_varchar_literal(info, "token", word->f_str, word->f_len);
  3063. /* Convert to "storage" byte order. */
  3064. fts_write_doc_id((byte*) &first_doc_id, node->first_doc_id);
  3065. fts_bind_doc_id(info, "first_doc_id", &first_doc_id);
  3066. /* Convert to "storage" byte order. */
  3067. fts_write_doc_id((byte*) &last_doc_id, node->last_doc_id);
  3068. fts_bind_doc_id(info, "last_doc_id", &last_doc_id);
  3069. ut_a(node->last_doc_id >= node->first_doc_id);
  3070. /* Convert to "storage" byte order. */
  3071. mach_write_to_4((byte*) &doc_count, node->doc_count);
  3072. pars_info_bind_int4_literal(
  3073. info, "doc_count", (const ib_uint32_t*) &doc_count);
  3074. /* Set copy_name to FALSE since it's a static. */
  3075. pars_info_bind_literal(
  3076. info, "ilist", node->ilist, node->ilist_size,
  3077. DATA_BLOB, DATA_BINARY_TYPE);
  3078. if (!*graph) {
  3079. *graph = fts_parse_sql(
  3080. fts_table,
  3081. info,
  3082. "BEGIN\n"
  3083. "INSERT INTO $index_table_name VALUES"
  3084. " (:token, :first_doc_id,"
  3085. " :last_doc_id, :doc_count, :ilist);");
  3086. }
  3087. start_time = time(NULL);
  3088. error = fts_eval_sql(trx, *graph);
  3089. elapsed_time += time(NULL) - start_time;
  3090. ++n_nodes;
  3091. return(error);
  3092. }
  3093. /*********************************************************************//**
  3094. Add rows to the DELETED_CACHE table.
3095. @return DB_SUCCESS if all went well else error code */
  3096. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  3097. dberr_t
  3098. fts_sync_add_deleted_cache(
  3099. /*=======================*/
  3100. fts_sync_t* sync, /*!< in: sync state */
  3101. ib_vector_t* doc_ids) /*!< in: doc ids to add */
  3102. {
  3103. ulint i;
  3104. pars_info_t* info;
  3105. que_t* graph;
  3106. fts_table_t fts_table;
  3107. char table_name[MAX_FULL_NAME_LEN];
  3108. doc_id_t dummy = 0;
  3109. dberr_t error = DB_SUCCESS;
  3110. ulint n_elems = ib_vector_size(doc_ids);
  3111. ut_a(ib_vector_size(doc_ids) > 0);
  3112. ib_vector_sort(doc_ids, fts_doc_id_cmp);
  3113. info = pars_info_create();
  3114. fts_bind_doc_id(info, "doc_id", &dummy);
  3115. FTS_INIT_FTS_TABLE(
  3116. &fts_table, "DELETED_CACHE", FTS_COMMON_TABLE, sync->table);
  3117. fts_get_table_name(&fts_table, table_name);
  3118. pars_info_bind_id(info, true, "table_name", table_name);
  3119. graph = fts_parse_sql(
  3120. &fts_table,
  3121. info,
  3122. "BEGIN INSERT INTO $table_name VALUES (:doc_id);");
  3123. for (i = 0; i < n_elems && error == DB_SUCCESS; ++i) {
  3124. doc_id_t* update;
  3125. doc_id_t write_doc_id;
  3126. update = static_cast<doc_id_t*>(ib_vector_get(doc_ids, i));
  3127. /* Convert to "storage" byte order. */
  3128. fts_write_doc_id((byte*) &write_doc_id, *update);
  3129. fts_bind_doc_id(info, "doc_id", &write_doc_id);
  3130. error = fts_eval_sql(sync->trx, graph);
  3131. }
  3132. fts_que_graph_free(graph);
  3133. return(error);
  3134. }
  3135. /** Write the words and ilist to disk.
  3136. @param[in,out] trx transaction
  3137. @param[in] index_cache index cache
3138. @param[in] unlock_cache whether to unlock the cache when writing a node
  3139. @return DB_SUCCESS if all went well else error code */
  3140. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  3141. dberr_t
  3142. fts_sync_write_words(
  3143. trx_t* trx,
  3144. fts_index_cache_t* index_cache,
  3145. bool unlock_cache)
  3146. {
  3147. fts_table_t fts_table;
  3148. ulint n_nodes = 0;
  3149. ulint n_words = 0;
  3150. const ib_rbt_node_t* rbt_node;
  3151. dberr_t error = DB_SUCCESS;
  3152. ibool print_error = FALSE;
  3153. dict_table_t* table = index_cache->index->table;
  3154. FTS_INIT_INDEX_TABLE(
  3155. &fts_table, NULL, FTS_INDEX_TABLE, index_cache->index);
  3156. n_words = rbt_size(index_cache->words);
  3157. /* We iterate over the entire tree, even if there is an error,
  3158. since we want to free the memory used during caching. */
  3159. for (rbt_node = rbt_first(index_cache->words);
  3160. rbt_node;
  3161. rbt_node = rbt_next(index_cache->words, rbt_node)) {
  3162. ulint i;
  3163. ulint selected;
  3164. fts_tokenizer_word_t* word;
  3165. word = rbt_value(fts_tokenizer_word_t, rbt_node);
  3166. DBUG_EXECUTE_IF("fts_instrument_write_words_before_select_index",
  3167. os_thread_sleep(300000););
  3168. selected = fts_select_index(
  3169. index_cache->charset, word->text.f_str,
  3170. word->text.f_len);
  3171. fts_table.suffix = fts_get_suffix(selected);
  3172. /* We iterate over all the nodes even if there was an error */
  3173. for (i = 0; i < ib_vector_size(word->nodes); ++i) {
  3174. fts_node_t* fts_node = static_cast<fts_node_t*>(
  3175. ib_vector_get(word->nodes, i));
  3176. if (fts_node->synced) {
  3177. continue;
  3178. } else {
  3179. fts_node->synced = true;
  3180. }
3181. /* FIXME: we need to handle the error properly. */
  3182. if (error == DB_SUCCESS) {
  3183. if (unlock_cache) {
  3184. rw_lock_x_unlock(
  3185. &table->fts->cache->lock);
  3186. }
  3187. error = fts_write_node(
  3188. trx,
  3189. &index_cache->ins_graph[selected],
  3190. &fts_table, &word->text, fts_node);
  3191. DEBUG_SYNC_C("fts_write_node");
  3192. DBUG_EXECUTE_IF("fts_write_node_crash",
  3193. DBUG_SUICIDE(););
  3194. DBUG_EXECUTE_IF("fts_instrument_sync_sleep",
  3195. os_thread_sleep(1000000);
  3196. );
  3197. if (unlock_cache) {
  3198. rw_lock_x_lock(
  3199. &table->fts->cache->lock);
  3200. }
  3201. }
  3202. }
  3203. n_nodes += ib_vector_size(word->nodes);
  3204. if (UNIV_UNLIKELY(error != DB_SUCCESS) && !print_error) {
  3205. ib::error() << "(" << error << ") writing"
  3206. " word node to FTS auxiliary index table "
  3207. << table->name;
  3208. print_error = TRUE;
  3209. }
  3210. }
  3211. if (UNIV_UNLIKELY(fts_enable_diag_print)) {
  3212. printf("Avg number of nodes: %lf\n",
  3213. (double) n_nodes / (double) (n_words > 1 ? n_words : 1));
  3214. }
  3215. return(error);
  3216. }
  3217. /*********************************************************************//**
  3218. Begin Sync, create transaction, acquire locks, etc. */
  3219. static
  3220. void
  3221. fts_sync_begin(
  3222. /*===========*/
  3223. fts_sync_t* sync) /*!< in: sync state */
  3224. {
  3225. fts_cache_t* cache = sync->table->fts->cache;
  3226. n_nodes = 0;
  3227. elapsed_time = 0;
  3228. sync->start_time = time(NULL);
  3229. sync->trx = trx_create();
  3230. trx_start_internal(sync->trx);
  3231. if (UNIV_UNLIKELY(fts_enable_diag_print)) {
  3232. ib::info() << "FTS SYNC for table " << sync->table->name
  3233. << ", deleted count: "
  3234. << ib_vector_size(cache->deleted_doc_ids)
  3235. << " size: " << cache->total_size << " bytes";
  3236. }
  3237. }
  3238. /*********************************************************************//**
  3239. Run SYNC on the table, i.e., write out data from the index specific
  3240. cache to the FTS aux INDEX table and FTS aux doc id stats table.
  3241. @return DB_SUCCESS if all OK */
  3242. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  3243. dberr_t
  3244. fts_sync_index(
  3245. /*===========*/
  3246. fts_sync_t* sync, /*!< in: sync state */
  3247. fts_index_cache_t* index_cache) /*!< in: index cache */
  3248. {
  3249. trx_t* trx = sync->trx;
  3250. trx->op_info = "doing SYNC index";
  3251. if (UNIV_UNLIKELY(fts_enable_diag_print)) {
  3252. ib::info() << "SYNC words: " << rbt_size(index_cache->words);
  3253. }
  3254. ut_ad(rbt_validate(index_cache->words));
  3255. return(fts_sync_write_words(trx, index_cache, sync->unlock_cache));
  3256. }
  3257. /** Check if index cache has been synced completely
  3258. @param[in,out] index_cache index cache
  3259. @return true if index is synced, otherwise false. */
  3260. static
  3261. bool
  3262. fts_sync_index_check(
  3263. fts_index_cache_t* index_cache)
  3264. {
  3265. const ib_rbt_node_t* rbt_node;
  3266. for (rbt_node = rbt_first(index_cache->words);
  3267. rbt_node != NULL;
  3268. rbt_node = rbt_next(index_cache->words, rbt_node)) {
  3269. fts_tokenizer_word_t* word;
  3270. word = rbt_value(fts_tokenizer_word_t, rbt_node);
  3271. fts_node_t* fts_node;
  3272. fts_node = static_cast<fts_node_t*>(ib_vector_last(word->nodes));
  3273. if (!fts_node->synced) {
  3274. return(false);
  3275. }
  3276. }
  3277. return(true);
  3278. }
3279. /** Reset the synced flag in the index cache on rollback
  3280. @param[in,out] index_cache index cache */
  3281. static
  3282. void
  3283. fts_sync_index_reset(
  3284. fts_index_cache_t* index_cache)
  3285. {
  3286. const ib_rbt_node_t* rbt_node;
  3287. for (rbt_node = rbt_first(index_cache->words);
  3288. rbt_node != NULL;
  3289. rbt_node = rbt_next(index_cache->words, rbt_node)) {
  3290. fts_tokenizer_word_t* word;
  3291. word = rbt_value(fts_tokenizer_word_t, rbt_node);
  3292. fts_node_t* fts_node;
  3293. fts_node = static_cast<fts_node_t*>(ib_vector_last(word->nodes));
  3294. fts_node->synced = false;
  3295. }
  3296. }
  3297. /** Commit the SYNC, change state of processed doc ids etc.
  3298. @param[in,out] sync sync state
  3299. @return DB_SUCCESS if all OK */
  3300. static MY_ATTRIBUTE((nonnull, warn_unused_result))
  3301. dberr_t
  3302. fts_sync_commit(
  3303. fts_sync_t* sync)
  3304. {
  3305. dberr_t error;
  3306. trx_t* trx = sync->trx;
  3307. fts_cache_t* cache = sync->table->fts->cache;
  3308. doc_id_t last_doc_id;
  3309. trx->op_info = "doing SYNC commit";
  3310. /* After each Sync, update the CONFIG table about the max doc id
  3311. we just sync-ed to index table */
  3312. error = fts_cmp_set_sync_doc_id(sync->table, sync->max_doc_id, FALSE,
  3313. &last_doc_id);
  3314. /* Get the list of deleted documents that are either in the
  3315. cache or were headed there but were deleted before the add
  3316. thread got to them. */
  3317. if (error == DB_SUCCESS && ib_vector_size(cache->deleted_doc_ids) > 0) {
  3318. error = fts_sync_add_deleted_cache(
  3319. sync, cache->deleted_doc_ids);
  3320. }
  3321. /* We need to do this within the deleted lock since fts_delete() can
  3322. attempt to add a deleted doc id to the cache deleted id array. */
  3323. fts_cache_clear(cache);
  3324. DEBUG_SYNC_C("fts_deleted_doc_ids_clear");
  3325. fts_cache_init(cache);
  3326. rw_lock_x_unlock(&cache->lock);
  3327. if (UNIV_LIKELY(error == DB_SUCCESS)) {
  3328. fts_sql_commit(trx);
  3329. } else {
  3330. fts_sql_rollback(trx);
  3331. ib::error() << "(" << error << ") during SYNC of "
  3332. "table " << sync->table->name;
  3333. }
  3334. if (UNIV_UNLIKELY(fts_enable_diag_print) && elapsed_time) {
  3335. ib::info() << "SYNC for table " << sync->table->name
  3336. << ": SYNC time: "
  3337. << (time(NULL) - sync->start_time)
  3338. << " secs: elapsed "
  3339. << (double) n_nodes / elapsed_time
  3340. << " ins/sec";
  3341. }
  3342. /* Avoid assertion in trx_t::free(). */
  3343. trx->dict_operation_lock_mode = 0;
  3344. trx->free();
  3345. return(error);
  3346. }
  3347. /** Rollback a sync operation
  3348. @param[in,out] sync sync state */
  3349. static
  3350. void
  3351. fts_sync_rollback(
  3352. fts_sync_t* sync)
  3353. {
  3354. trx_t* trx = sync->trx;
  3355. fts_cache_t* cache = sync->table->fts->cache;
  3356. for (ulint i = 0; i < ib_vector_size(cache->indexes); ++i) {
  3357. ulint j;
  3358. fts_index_cache_t* index_cache;
  3359. index_cache = static_cast<fts_index_cache_t*>(
  3360. ib_vector_get(cache->indexes, i));
  3361. /* Reset synced flag so nodes will not be skipped
  3362. in the next sync, see fts_sync_write_words(). */
  3363. fts_sync_index_reset(index_cache);
  3364. for (j = 0; fts_index_selector[j].value; ++j) {
  3365. if (index_cache->ins_graph[j] != NULL) {
  3366. fts_que_graph_free_check_lock(
  3367. NULL, index_cache,
  3368. index_cache->ins_graph[j]);
  3369. index_cache->ins_graph[j] = NULL;
  3370. }
  3371. if (index_cache->sel_graph[j] != NULL) {
  3372. fts_que_graph_free_check_lock(
  3373. NULL, index_cache,
  3374. index_cache->sel_graph[j]);
  3375. index_cache->sel_graph[j] = NULL;
  3376. }
  3377. }
  3378. }
  3379. rw_lock_x_unlock(&cache->lock);
  3380. fts_sql_rollback(trx);
  3381. /* Avoid assertion in trx_t::free(). */
  3382. trx->dict_operation_lock_mode = 0;
  3383. trx->free();
  3384. }
  3385. /** Run SYNC on the table, i.e., write out data from the cache to the
  3386. FTS auxiliary INDEX table and clear the cache at the end.
  3387. @param[in,out] sync sync state
3388. @param[in] unlock_cache whether to unlock the cache lock when writing a node
3389. @param[in] wait whether to wait when a sync is in progress
  3390. @return DB_SUCCESS if all OK */
  3391. static
  3392. dberr_t
  3393. fts_sync(
  3394. fts_sync_t* sync,
  3395. bool unlock_cache,
  3396. bool wait)
  3397. {
  3398. if (srv_read_only_mode) {
  3399. return DB_READ_ONLY;
  3400. }
  3401. ulint i;
  3402. dberr_t error = DB_SUCCESS;
  3403. fts_cache_t* cache = sync->table->fts->cache;
  3404. rw_lock_x_lock(&cache->lock);
  3405. /* Check if cache is being synced.
  3406. Note: we release cache lock in fts_sync_write_words() to
  3407. avoid long wait for the lock by other threads. */
  3408. while (sync->in_progress) {
  3409. rw_lock_x_unlock(&cache->lock);
  3410. if (wait) {
  3411. os_event_wait(sync->event);
  3412. } else {
  3413. return(DB_SUCCESS);
  3414. }
  3415. rw_lock_x_lock(&cache->lock);
  3416. }
  3417. sync->unlock_cache = unlock_cache;
  3418. sync->in_progress = true;
  3419. DEBUG_SYNC_C("fts_sync_begin");
  3420. fts_sync_begin(sync);
  3421. begin_sync:
  3422. if (cache->total_size > fts_max_cache_size) {
3423. /* Avoid the case where the sync never finishes when
3424. inserts/updates keep coming. */
  3425. ut_ad(sync->unlock_cache);
  3426. sync->unlock_cache = false;
  3427. }
  3428. for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
  3429. fts_index_cache_t* index_cache;
  3430. index_cache = static_cast<fts_index_cache_t*>(
  3431. ib_vector_get(cache->indexes, i));
  3432. if (index_cache->index->to_be_dropped
  3433. || index_cache->index->table->to_be_dropped) {
  3434. continue;
  3435. }
  3436. DBUG_EXECUTE_IF("fts_instrument_sync_before_syncing",
  3437. os_thread_sleep(300000););
  3438. index_cache->index->index_fts_syncing = true;
  3439. error = fts_sync_index(sync, index_cache);
  3440. if (error != DB_SUCCESS) {
  3441. goto end_sync;
  3442. }
  3443. }
  3444. DBUG_EXECUTE_IF("fts_instrument_sync_interrupted",
  3445. sync->interrupted = true;
  3446. error = DB_INTERRUPTED;
  3447. goto end_sync;
  3448. );
  3449. /* Make sure all the caches are synced. */
  3450. for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
  3451. fts_index_cache_t* index_cache;
  3452. index_cache = static_cast<fts_index_cache_t*>(
  3453. ib_vector_get(cache->indexes, i));
  3454. if (index_cache->index->to_be_dropped
  3455. || index_cache->index->table->to_be_dropped
  3456. || fts_sync_index_check(index_cache)) {
  3457. continue;
  3458. }
  3459. goto begin_sync;
  3460. }
  3461. end_sync:
  3462. if (error == DB_SUCCESS && !sync->interrupted) {
  3463. error = fts_sync_commit(sync);
  3464. } else {
  3465. fts_sync_rollback(sync);
  3466. }
  3467. rw_lock_x_lock(&cache->lock);
3468. /* Clear the FTS syncing flags of all indexes in case the sync
3469. was interrupted */
  3470. for (i = 0; i < ib_vector_size(cache->indexes); ++i) {
  3471. static_cast<fts_index_cache_t*>(
  3472. ib_vector_get(cache->indexes, i))
  3473. ->index->index_fts_syncing = false;
  3474. }
  3475. sync->interrupted = false;
  3476. sync->in_progress = false;
  3477. os_event_set(sync->event);
  3478. rw_lock_x_unlock(&cache->lock);
  3479. /* We need to check whether an optimize is required, for that
  3480. we make copies of the two variables that control the trigger. These
  3481. variables can change behind our back and we don't want to hold the
  3482. lock for longer than is needed. */
  3483. mutex_enter(&cache->deleted_lock);
  3484. cache->added = 0;
  3485. cache->deleted = 0;
  3486. mutex_exit(&cache->deleted_lock);
  3487. return(error);
  3488. }
  3489. /** Run SYNC on the table, i.e., write out data from the cache to the
  3490. FTS auxiliary INDEX table and clear the cache at the end.
  3491. @param[in,out] table fts table
  3492. @param[in] wait whether wait for existing sync to finish
  3493. @return DB_SUCCESS on success, error code on failure. */
  3494. dberr_t fts_sync_table(dict_table_t* table, bool wait)
  3495. {
  3496. dberr_t err = DB_SUCCESS;
  3497. ut_ad(table->fts);
  3498. if (table->space && table->fts->cache
  3499. && !dict_table_is_corrupted(table)) {
  3500. err = fts_sync(table->fts->cache->sync, !wait, wait);
  3501. }
  3502. return(err);
  3503. }
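/* Usage sketch (illustrative, not part of the original source): how a
caller can force the FTS cache to disk. With wait=true, fts_sync() above
blocks on sync->event until any concurrent sync finishes before starting
its own; with wait=false it returns DB_SUCCESS at once if a sync is
already in progress. */
static dberr_t example_flush_fts_cache(dict_table_t* table)
{
	/* Blocking variant, e.g. before a consistency check */
	return fts_sync_table(table, true);
}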
3504. /** Check if an FTS token is a stopword, shorter than fts_min_token_size,
3505. or longer than fts_max_token_size.
  3506. @param[in] token token string
  3507. @param[in] stopwords stopwords rb tree
  3508. @param[in] cs token charset
3509. @retval true if it is not a stopword and its length is in range
3510. @retval false if it is a stopword or its length is out of range */
  3511. bool
  3512. fts_check_token(
  3513. const fts_string_t* token,
  3514. const ib_rbt_t* stopwords,
  3515. const CHARSET_INFO* cs)
  3516. {
  3517. ut_ad(cs != NULL || stopwords == NULL);
  3518. ib_rbt_bound_t parent;
  3519. return(token->f_n_char >= fts_min_token_size
  3520. && token->f_n_char <= fts_max_token_size
  3521. && (stopwords == NULL
  3522. || rbt_search(stopwords, &parent, token) != 0));
  3523. }
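/* Minimal usage sketch (hypothetical caller): length-only filtering of a
tokenizer output, as fts_add_token() below does. Passing stopwords=NULL
skips the stopword rb-tree lookup, so only the
fts_min_token_size/fts_max_token_size bounds are checked. */
static bool example_token_is_indexable(const fts_string_t* token,
				       const CHARSET_INFO* cs)
{
	return fts_check_token(token, NULL, cs);
}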
  3524. /** Add the token and its start position to the token's list of positions.
  3525. @param[in,out] result_doc result doc rb tree
  3526. @param[in] str token string
  3527. @param[in] position token position */
  3528. static
  3529. void
  3530. fts_add_token(
  3531. fts_doc_t* result_doc,
  3532. fts_string_t str,
  3533. ulint position)
  3534. {
3535. /* Ignore strings whose character count is less than
3536. "fts_min_token_size" or more than "fts_max_token_size" */
  3537. if (fts_check_token(&str, NULL, result_doc->charset)) {
  3538. mem_heap_t* heap;
  3539. fts_string_t t_str;
  3540. fts_token_t* token;
  3541. ib_rbt_bound_t parent;
  3542. ulint newlen;
  3543. heap = static_cast<mem_heap_t*>(result_doc->self_heap->arg);
  3544. t_str.f_n_char = str.f_n_char;
  3545. t_str.f_len = str.f_len * result_doc->charset->casedn_multiply + 1;
  3546. t_str.f_str = static_cast<byte*>(
  3547. mem_heap_alloc(heap, t_str.f_len));
  3548. /* For binary collations, a case sensitive search is
  3549. performed. Hence don't convert to lower case. */
  3550. if (my_binary_compare(result_doc->charset)) {
  3551. memcpy(t_str.f_str, str.f_str, str.f_len);
  3552. t_str.f_str[str.f_len]= 0;
  3553. newlen= str.f_len;
  3554. } else {
  3555. newlen = innobase_fts_casedn_str(
  3556. result_doc->charset, (char*) str.f_str, str.f_len,
  3557. (char*) t_str.f_str, t_str.f_len);
  3558. }
  3559. t_str.f_len = newlen;
  3560. t_str.f_str[newlen] = 0;
  3561. /* Add the word to the document statistics. If the word
  3562. hasn't been seen before we create a new entry for it. */
  3563. if (rbt_search(result_doc->tokens, &parent, &t_str) != 0) {
  3564. fts_token_t new_token;
  3565. new_token.text.f_len = newlen;
  3566. new_token.text.f_str = t_str.f_str;
  3567. new_token.text.f_n_char = t_str.f_n_char;
  3568. new_token.positions = ib_vector_create(
  3569. result_doc->self_heap, sizeof(ulint), 32);
  3570. parent.last = rbt_add_node(
  3571. result_doc->tokens, &parent, &new_token);
  3572. ut_ad(rbt_validate(result_doc->tokens));
  3573. }
  3574. token = rbt_value(fts_token_t, parent.last);
  3575. ib_vector_push(token->positions, &position);
  3576. }
  3577. }
  3578. /********************************************************************
  3579. Process next token from document starting at the given position, i.e., add
  3580. the token's start position to the token's list of positions.
  3581. @return number of characters handled in this call */
  3582. static
  3583. ulint
  3584. fts_process_token(
  3585. /*==============*/
  3586. fts_doc_t* doc, /* in/out: document to
  3587. tokenize */
  3588. fts_doc_t* result, /* out: if provided, save
  3589. result here */
  3590. ulint start_pos, /*!< in: start position in text */
  3591. ulint add_pos) /*!< in: add this position to all
  3592. tokens from this tokenization */
  3593. {
  3594. ulint ret;
  3595. fts_string_t str;
  3596. ulint position;
  3597. fts_doc_t* result_doc;
  3598. byte buf[FTS_MAX_WORD_LEN + 1];
  3599. str.f_str = buf;
  3600. /* Determine where to save the result. */
  3601. result_doc = (result != NULL) ? result : doc;
  3602. /* The length of a string in characters is set here only. */
  3603. ret = innobase_mysql_fts_get_token(
  3604. doc->charset, doc->text.f_str + start_pos,
  3605. doc->text.f_str + doc->text.f_len, &str);
  3606. position = start_pos + ret - str.f_len + add_pos;
  3607. fts_add_token(result_doc, str, position);
  3608. return(ret);
  3609. }
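/* Worked example (illustrative): tokenizing "aa bb" with start_pos=3
yields the token "bb" with str.f_len=2 after consuming ret=2 bytes, so
position = 3 + 2 - 2 + add_pos = 3 + add_pos, i.e. the byte offset of the
token within the whole document, shifted by add_pos when tokenization
continues across multiple indexed columns. */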
  3610. /*************************************************************//**
3611. Get the token's size in characters for the given charset.
3612. @return token size in characters */
  3613. ulint
  3614. fts_get_token_size(
  3615. /*===============*/
  3616. const CHARSET_INFO* cs, /*!< in: Character set */
  3617. const char* token, /*!< in: token */
  3618. ulint len) /*!< in: token length */
  3619. {
  3620. char* start;
  3621. char* end;
  3622. ulint size = 0;
  3623. /* const_cast is for reinterpret_cast below, or it will fail. */
  3624. start = const_cast<char*>(token);
  3625. end = start + len;
  3626. while (start < end) {
  3627. int ctype;
  3628. int mbl;
  3629. mbl = cs->cset->ctype(
  3630. cs, &ctype,
  3631. reinterpret_cast<uchar*>(start),
  3632. reinterpret_cast<uchar*>(end));
  3633. size++;
  3634. start += mbl > 0 ? mbl : (mbl < 0 ? -mbl : 1);
  3635. }
  3636. return(size);
  3637. }
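/* Usage sketch (illustrative): fts_get_token_size() counts characters,
not bytes, by stepping through the string with cs->cset->ctype(). For a
single-byte charset such as my_charset_latin1 (used elsewhere in this
file) the two coincide: */
static ulint example_char_count()
{
	/* "abc": 3 bytes, 3 characters */
	return fts_get_token_size(&my_charset_latin1, "abc", 3);
}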
  3638. /*************************************************************//**
3639. FTS plugin parser 'mysql_parse' callback function for document tokenization.
  3640. Refer to 'st_mysql_ftparser_param' for more detail.
  3641. @return always returns 0 */
  3642. int
  3643. fts_tokenize_document_internal(
  3644. /*===========================*/
  3645. MYSQL_FTPARSER_PARAM* param, /*!< in: parser parameter */
  3646. const char* doc,/*!< in/out: document */
  3647. int len) /*!< in: document length */
  3648. {
  3649. fts_string_t str;
  3650. byte buf[FTS_MAX_WORD_LEN + 1];
  3651. /* JAN: TODO: MySQL 5.7
  3652. MYSQL_FTPARSER_BOOLEAN_INFO bool_info =
  3653. { FT_TOKEN_WORD, 0, 0, 0, 0, 0, ' ', 0 };
  3654. */
  3655. MYSQL_FTPARSER_BOOLEAN_INFO bool_info =
  3656. { FT_TOKEN_WORD, 0, 0, 0, 0, ' ', 0};
  3657. ut_ad(len >= 0);
  3658. str.f_str = buf;
  3659. for (ulint i = 0, inc = 0; i < static_cast<ulint>(len); i += inc) {
  3660. inc = innobase_mysql_fts_get_token(
  3661. const_cast<CHARSET_INFO*>(param->cs),
  3662. (uchar*)(doc) + i,
  3663. (uchar*)(doc) + len,
  3664. &str);
  3665. if (str.f_len > 0) {
  3666. /* JAN: TODO: MySQL 5.7
  3667. bool_info.position =
  3668. static_cast<int>(i + inc - str.f_len);
  3669. ut_ad(bool_info.position >= 0);
  3670. */
  3671. /* Stop when add word fails */
  3672. if (param->mysql_add_word(
  3673. param,
  3674. reinterpret_cast<char*>(str.f_str),
  3675. static_cast<int>(str.f_len),
  3676. &bool_info)) {
  3677. break;
  3678. }
  3679. }
  3680. }
  3681. return(0);
  3682. }
  3683. /******************************************************************//**
3684. FTS plugin parser 'mysql_add_word' callback function for document tokenization.
  3685. Refer to 'st_mysql_ftparser_param' for more detail.
  3686. @return always returns 0 */
  3687. static
  3688. int
  3689. fts_tokenize_add_word_for_parser(
  3690. /*=============================*/
3691. MYSQL_FTPARSER_PARAM* param, /* in: parser parameter */
  3692. const char* word, /* in: token word */
  3693. int word_len, /* in: word len */
  3694. MYSQL_FTPARSER_BOOLEAN_INFO*)
  3695. {
  3696. fts_string_t str;
  3697. fts_tokenize_param_t* fts_param;
  3698. fts_doc_t* result_doc;
  3699. ulint position;
  3700. fts_param = static_cast<fts_tokenize_param_t*>(param->mysql_ftparam);
  3701. result_doc = fts_param->result_doc;
  3702. ut_ad(result_doc != NULL);
  3703. str.f_str = (byte*)(word);
  3704. str.f_len = ulint(word_len);
  3705. str.f_n_char = fts_get_token_size(
  3706. const_cast<CHARSET_INFO*>(param->cs), word, str.f_len);
  3707. /* JAN: TODO: MySQL 5.7 FTS
  3708. ut_ad(boolean_info->position >= 0);
  3709. position = boolean_info->position + fts_param->add_pos;
  3710. */
  3711. position = fts_param->add_pos;
  3712. fts_add_token(result_doc, str, position);
  3713. return(0);
  3714. }
  3715. /******************************************************************//**
  3716. Parse a document using an external / user supplied parser */
  3717. static
  3718. void
  3719. fts_tokenize_by_parser(
  3720. /*===================*/
  3721. fts_doc_t* doc, /* in/out: document to tokenize */
  3722. st_mysql_ftparser* parser, /* in: plugin fts parser */
  3723. fts_tokenize_param_t* fts_param) /* in: fts tokenize param */
  3724. {
  3725. MYSQL_FTPARSER_PARAM param;
  3726. ut_a(parser);
3727. /* Set the parameters for param */
  3728. param.mysql_parse = fts_tokenize_document_internal;
  3729. param.mysql_add_word = fts_tokenize_add_word_for_parser;
  3730. param.mysql_ftparam = fts_param;
  3731. param.cs = doc->charset;
  3732. param.doc = reinterpret_cast<char*>(doc->text.f_str);
  3733. param.length = static_cast<int>(doc->text.f_len);
  3734. param.mode= MYSQL_FTPARSER_SIMPLE_MODE;
  3735. PARSER_INIT(parser, &param);
  3736. parser->parse(&param);
  3737. PARSER_DEINIT(parser, &param);
  3738. }
  3739. /** Tokenize a document.
  3740. @param[in,out] doc document to tokenize
  3741. @param[out] result tokenization result
  3742. @param[in] parser pluggable parser */
  3743. static
  3744. void
  3745. fts_tokenize_document(
  3746. fts_doc_t* doc,
  3747. fts_doc_t* result,
  3748. st_mysql_ftparser* parser)
  3749. {
  3750. ut_a(!doc->tokens);
  3751. ut_a(doc->charset);
  3752. doc->tokens = rbt_create_arg_cmp(sizeof(fts_token_t),
  3753. innobase_fts_text_cmp,
  3754. (void*) doc->charset);
  3755. if (parser != NULL) {
  3756. fts_tokenize_param_t fts_param;
  3757. fts_param.result_doc = (result != NULL) ? result : doc;
  3758. fts_param.add_pos = 0;
  3759. fts_tokenize_by_parser(doc, parser, &fts_param);
  3760. } else {
  3761. ulint inc;
  3762. for (ulint i = 0; i < doc->text.f_len; i += inc) {
  3763. inc = fts_process_token(doc, result, i, 0);
  3764. ut_a(inc > 0);
  3765. }
  3766. }
  3767. }
  3768. /** Continue to tokenize a document.
  3769. @param[in,out] doc document to tokenize
  3770. @param[in] add_pos add this position to all tokens from this tokenization
  3771. @param[out] result tokenization result
  3772. @param[in] parser pluggable parser */
  3773. static
  3774. void
  3775. fts_tokenize_document_next(
  3776. fts_doc_t* doc,
  3777. ulint add_pos,
  3778. fts_doc_t* result,
  3779. st_mysql_ftparser* parser)
  3780. {
  3781. ut_a(doc->tokens);
  3782. if (parser) {
  3783. fts_tokenize_param_t fts_param;
  3784. fts_param.result_doc = (result != NULL) ? result : doc;
  3785. fts_param.add_pos = add_pos;
  3786. fts_tokenize_by_parser(doc, parser, &fts_param);
  3787. } else {
  3788. ulint inc;
  3789. for (ulint i = 0; i < doc->text.f_len; i += inc) {
  3790. inc = fts_process_token(doc, result, i, add_pos);
  3791. ut_a(inc > 0);
  3792. }
  3793. }
  3794. }
  3795. /** Create the vector of fts_get_doc_t instances.
  3796. @param[in,out] cache fts cache
  3797. @return vector of fts_get_doc_t instances */
  3798. static
  3799. ib_vector_t*
  3800. fts_get_docs_create(
  3801. fts_cache_t* cache)
  3802. {
  3803. ib_vector_t* get_docs;
  3804. ut_ad(rw_lock_own(&cache->init_lock, RW_LOCK_X));
  3805. /* We need one instance of fts_get_doc_t per index. */
  3806. get_docs = ib_vector_create(cache->self_heap, sizeof(fts_get_doc_t), 4);
  3807. /* Create the get_doc instance, we need one of these
  3808. per FTS index. */
  3809. for (ulint i = 0; i < ib_vector_size(cache->indexes); ++i) {
  3810. dict_index_t** index;
  3811. fts_get_doc_t* get_doc;
  3812. index = static_cast<dict_index_t**>(
  3813. ib_vector_get(cache->indexes, i));
  3814. get_doc = static_cast<fts_get_doc_t*>(
  3815. ib_vector_push(get_docs, NULL));
  3816. memset(get_doc, 0x0, sizeof(*get_doc));
  3817. get_doc->index_cache = fts_get_index_cache(cache, *index);
  3818. get_doc->cache = cache;
  3819. /* Must find the index cache. */
  3820. ut_a(get_doc->index_cache != NULL);
  3821. }
  3822. return(get_docs);
  3823. }
  3824. /********************************************************************
  3825. Release any resources held by the fts_get_doc_t instances. */
  3826. static
  3827. void
  3828. fts_get_docs_clear(
  3829. /*===============*/
  3830. ib_vector_t* get_docs) /*!< in: Doc retrieval vector */
  3831. {
  3832. ulint i;
  3833. /* Release the get doc graphs if any. */
  3834. for (i = 0; i < ib_vector_size(get_docs); ++i) {
  3835. fts_get_doc_t* get_doc = static_cast<fts_get_doc_t*>(
  3836. ib_vector_get(get_docs, i));
  3837. if (get_doc->get_document_graph != NULL) {
  3838. ut_a(get_doc->index_cache);
  3839. fts_que_graph_free(get_doc->get_document_graph);
  3840. get_doc->get_document_graph = NULL;
  3841. }
  3842. }
  3843. }
  3844. /*********************************************************************//**
  3845. Get the initial Doc ID by consulting the CONFIG table
  3846. @return initial Doc ID */
  3847. doc_id_t
  3848. fts_init_doc_id(
  3849. /*============*/
  3850. const dict_table_t* table) /*!< in: table */
  3851. {
  3852. doc_id_t max_doc_id = 0;
  3853. rw_lock_x_lock(&table->fts->cache->lock);
  3854. /* Return if the table is already initialized for DOC ID */
  3855. if (table->fts->cache->first_doc_id != FTS_NULL_DOC_ID) {
  3856. rw_lock_x_unlock(&table->fts->cache->lock);
  3857. return(0);
  3858. }
  3859. DEBUG_SYNC_C("fts_initialize_doc_id");
  3860. /* Then compare this value with the ID value stored in the CONFIG
  3861. table. The larger one will be our new initial Doc ID */
  3862. fts_cmp_set_sync_doc_id(table, 0, FALSE, &max_doc_id);
3863. /* If DICT_TF2_FTS_ADD_DOC_ID is set, we are in the process of
3864. creating the index (and adding the doc id column). There is no
3865. need to recover documents. */
  3866. if (!DICT_TF2_FLAG_IS_SET(table, DICT_TF2_FTS_ADD_DOC_ID)) {
  3867. fts_init_index((dict_table_t*) table, TRUE);
  3868. }
  3869. table->fts->added_synced = true;
  3870. table->fts->cache->first_doc_id = max_doc_id;
  3871. rw_lock_x_unlock(&table->fts->cache->lock);
  3872. ut_ad(max_doc_id > 0);
  3873. return(max_doc_id);
  3874. }
  3875. #ifdef FTS_MULT_INDEX
  3876. /*********************************************************************//**
  3877. Check if the index is in the affected set.
  3878. @return TRUE if index is updated */
  3879. static
  3880. ibool
  3881. fts_is_index_updated(
  3882. /*=================*/
  3883. const ib_vector_t* fts_indexes, /*!< in: affected FTS indexes */
  3884. const fts_get_doc_t* get_doc) /*!< in: info for reading
  3885. document */
  3886. {
  3887. ulint i;
  3888. dict_index_t* index = get_doc->index_cache->index;
  3889. for (i = 0; i < ib_vector_size(fts_indexes); ++i) {
  3890. const dict_index_t* updated_fts_index;
  3891. updated_fts_index = static_cast<const dict_index_t*>(
  3892. ib_vector_getp_const(fts_indexes, i));
  3893. ut_a(updated_fts_index != NULL);
  3894. if (updated_fts_index == index) {
  3895. return(TRUE);
  3896. }
  3897. }
  3898. return(FALSE);
  3899. }
  3900. #endif
  3901. /*********************************************************************//**
  3902. Fetch COUNT(*) from specified table.
  3903. @return the number of rows in the table */
  3904. ulint
  3905. fts_get_rows_count(
  3906. /*===============*/
  3907. fts_table_t* fts_table) /*!< in: fts table to read */
  3908. {
  3909. trx_t* trx;
  3910. pars_info_t* info;
  3911. que_t* graph;
  3912. dberr_t error;
  3913. ulint count = 0;
  3914. char table_name[MAX_FULL_NAME_LEN];
  3915. trx = trx_create();
  3916. trx->op_info = "fetching FT table rows count";
  3917. info = pars_info_create();
  3918. pars_info_bind_function(info, "my_func", fts_read_ulint, &count);
  3919. fts_get_table_name(fts_table, table_name);
  3920. pars_info_bind_id(info, true, "table_name", table_name);
  3921. graph = fts_parse_sql(
  3922. fts_table,
  3923. info,
  3924. "DECLARE FUNCTION my_func;\n"
  3925. "DECLARE CURSOR c IS"
  3926. " SELECT COUNT(*)"
  3927. " FROM $table_name;\n"
  3928. "BEGIN\n"
  3929. "\n"
  3930. "OPEN c;\n"
  3931. "WHILE 1 = 1 LOOP\n"
  3932. " FETCH c INTO my_func();\n"
  3933. " IF c % NOTFOUND THEN\n"
  3934. " EXIT;\n"
  3935. " END IF;\n"
  3936. "END LOOP;\n"
  3937. "CLOSE c;");
  3938. for (;;) {
  3939. error = fts_eval_sql(trx, graph);
  3940. if (UNIV_LIKELY(error == DB_SUCCESS)) {
  3941. fts_sql_commit(trx);
  3942. break; /* Exit the loop. */
  3943. } else {
  3944. fts_sql_rollback(trx);
  3945. if (error == DB_LOCK_WAIT_TIMEOUT) {
  3946. ib::warn() << "lock wait timeout reading"
  3947. " FTS table. Retrying!";
  3948. trx->error_state = DB_SUCCESS;
  3949. } else {
  3950. ib::error() << "(" << error
  3951. << ") while reading FTS table "
  3952. << table_name;
  3953. break; /* Exit the loop. */
  3954. }
  3955. }
  3956. }
  3957. fts_que_graph_free(graph);
  3958. trx->free();
  3959. return(count);
  3960. }
  3961. #ifdef FTS_CACHE_SIZE_DEBUG
  3962. /*********************************************************************//**
  3963. Read the max cache size parameter from the config table. */
  3964. static
  3965. void
  3966. fts_update_max_cache_size(
  3967. /*======================*/
  3968. fts_sync_t* sync) /*!< in: sync state */
  3969. {
  3970. trx_t* trx;
  3971. fts_table_t fts_table;
  3972. trx = trx_create();
  3973. FTS_INIT_FTS_TABLE(&fts_table, "CONFIG", FTS_COMMON_TABLE, sync->table);
  3974. /* The size returned is in bytes. */
  3975. sync->max_cache_size = fts_get_max_cache_size(trx, &fts_table);
  3976. fts_sql_commit(trx);
  3977. trx->free();
  3978. }
  3979. #endif /* FTS_CACHE_SIZE_DEBUG */
  3980. /*********************************************************************//**
  3981. Free the modified rows of a table. */
  3982. UNIV_INLINE
  3983. void
  3984. fts_trx_table_rows_free(
  3985. /*====================*/
  3986. ib_rbt_t* rows) /*!< in: rbt of rows to free */
  3987. {
  3988. const ib_rbt_node_t* node;
  3989. for (node = rbt_first(rows); node; node = rbt_first(rows)) {
  3990. fts_trx_row_t* row;
  3991. row = rbt_value(fts_trx_row_t, node);
  3992. if (row->fts_indexes != NULL) {
  3993. /* This vector shouldn't be using the
  3994. heap allocator. */
  3995. ut_a(row->fts_indexes->allocator->arg == NULL);
  3996. ib_vector_free(row->fts_indexes);
  3997. row->fts_indexes = NULL;
  3998. }
  3999. ut_free(rbt_remove_node(rows, node));
  4000. }
  4001. ut_a(rbt_empty(rows));
  4002. rbt_free(rows);
  4003. }
  4004. /*********************************************************************//**
  4005. Free an FTS savepoint instance. */
  4006. UNIV_INLINE
  4007. void
  4008. fts_savepoint_free(
  4009. /*===============*/
  4010. fts_savepoint_t* savepoint) /*!< in: savepoint instance */
  4011. {
  4012. const ib_rbt_node_t* node;
  4013. ib_rbt_t* tables = savepoint->tables;
  4014. /* Nothing to free! */
  4015. if (tables == NULL) {
  4016. return;
  4017. }
  4018. for (node = rbt_first(tables); node; node = rbt_first(tables)) {
  4019. fts_trx_table_t* ftt;
  4020. fts_trx_table_t** fttp;
  4021. fttp = rbt_value(fts_trx_table_t*, node);
  4022. ftt = *fttp;
  4023. /* This can be NULL if a savepoint was released. */
  4024. if (ftt->rows != NULL) {
  4025. fts_trx_table_rows_free(ftt->rows);
  4026. ftt->rows = NULL;
  4027. }
  4028. /* This can be NULL if a savepoint was released. */
  4029. if (ftt->added_doc_ids != NULL) {
  4030. fts_doc_ids_free(ftt->added_doc_ids);
  4031. ftt->added_doc_ids = NULL;
  4032. }
4033. /* Free the docs_added query graph, if one was created. */
  4034. if (ftt->docs_added_graph) {
  4035. fts_que_graph_free(ftt->docs_added_graph);
  4036. }
4037. /* NOTE: We are responsible for freeing the node */
  4038. ut_free(rbt_remove_node(tables, node));
  4039. }
  4040. ut_a(rbt_empty(tables));
  4041. rbt_free(tables);
  4042. savepoint->tables = NULL;
  4043. }
  4044. /*********************************************************************//**
  4045. Free an FTS trx. */
  4046. void
  4047. fts_trx_free(
  4048. /*=========*/
  4049. fts_trx_t* fts_trx) /* in, own: FTS trx */
  4050. {
  4051. ulint i;
  4052. for (i = 0; i < ib_vector_size(fts_trx->savepoints); ++i) {
  4053. fts_savepoint_t* savepoint;
  4054. savepoint = static_cast<fts_savepoint_t*>(
  4055. ib_vector_get(fts_trx->savepoints, i));
  4056. /* The default savepoint name must be NULL. */
  4057. if (i == 0) {
  4058. ut_a(savepoint->name == NULL);
  4059. }
  4060. fts_savepoint_free(savepoint);
  4061. }
  4062. for (i = 0; i < ib_vector_size(fts_trx->last_stmt); ++i) {
  4063. fts_savepoint_t* savepoint;
  4064. savepoint = static_cast<fts_savepoint_t*>(
  4065. ib_vector_get(fts_trx->last_stmt, i));
  4066. /* The default savepoint name must be NULL. */
  4067. if (i == 0) {
  4068. ut_a(savepoint->name == NULL);
  4069. }
  4070. fts_savepoint_free(savepoint);
  4071. }
  4072. if (fts_trx->heap) {
  4073. mem_heap_free(fts_trx->heap);
  4074. }
  4075. }
  4076. /*********************************************************************//**
  4077. Extract the doc id from the FTS hidden column.
  4078. @return doc id that was extracted from rec */
  4079. doc_id_t
  4080. fts_get_doc_id_from_row(
  4081. /*====================*/
  4082. dict_table_t* table, /*!< in: table */
  4083. dtuple_t* row) /*!< in: row whose FTS doc id we
  4084. want to extract.*/
  4085. {
  4086. dfield_t* field;
  4087. doc_id_t doc_id = 0;
  4088. ut_a(table->fts->doc_col != ULINT_UNDEFINED);
  4089. field = dtuple_get_nth_field(row, table->fts->doc_col);
  4090. ut_a(dfield_get_len(field) == sizeof(doc_id));
  4091. ut_a(dfield_get_type(field)->mtype == DATA_INT);
  4092. doc_id = fts_read_doc_id(
  4093. static_cast<const byte*>(dfield_get_data(field)));
  4094. return(doc_id);
  4095. }
  4096. /** Extract the doc id from the record that belongs to index.
  4097. @param[in] rec record containing FTS_DOC_ID
  4098. @param[in] index index of rec
  4099. @param[in] offsets rec_get_offsets(rec,index)
  4100. @return doc id that was extracted from rec */
  4101. doc_id_t
  4102. fts_get_doc_id_from_rec(
  4103. const rec_t* rec,
  4104. const dict_index_t* index,
  4105. const rec_offs* offsets)
  4106. {
  4107. ulint f = dict_col_get_index_pos(
  4108. &index->table->cols[index->table->fts->doc_col], index);
  4109. ulint len;
  4110. doc_id_t doc_id = mach_read_from_8(
  4111. rec_get_nth_field(rec, offsets, f, &len));
  4112. ut_ad(len == 8);
  4113. return doc_id;
  4114. }
  4115. /*********************************************************************//**
  4116. Search the index specific cache for a particular FTS index.
  4117. @return the index specific cache else NULL */
  4118. fts_index_cache_t*
  4119. fts_find_index_cache(
  4120. /*=================*/
  4121. const fts_cache_t* cache, /*!< in: cache to search */
  4122. const dict_index_t* index) /*!< in: index to search for */
  4123. {
4124. /* We cast away the const because our internal function takes a
4125. non-const cache arg and returns a non-const pointer. */
  4126. return(static_cast<fts_index_cache_t*>(
  4127. fts_get_index_cache((fts_cache_t*) cache, index)));
  4128. }
  4129. /*********************************************************************//**
  4130. Search cache for word.
  4131. @return the word node vector if found else NULL */
  4132. const ib_vector_t*
  4133. fts_cache_find_word(
  4134. /*================*/
  4135. const fts_index_cache_t*index_cache, /*!< in: cache to search */
  4136. const fts_string_t* text) /*!< in: word to search for */
  4137. {
  4138. ib_rbt_bound_t parent;
  4139. const ib_vector_t* nodes = NULL;
  4140. #ifdef UNIV_DEBUG
  4141. dict_table_t* table = index_cache->index->table;
  4142. fts_cache_t* cache = table->fts->cache;
  4143. ut_ad(rw_lock_own(&cache->lock, RW_LOCK_X));
  4144. #endif /* UNIV_DEBUG */
  4145. /* Lookup the word in the rb tree */
  4146. if (rbt_search(index_cache->words, &parent, text) == 0) {
  4147. const fts_tokenizer_word_t* word;
  4148. word = rbt_value(fts_tokenizer_word_t, parent.last);
  4149. nodes = word->nodes;
  4150. }
  4151. return(nodes);
  4152. }
  4153. /*********************************************************************//**
  4154. Append deleted doc ids to vector. */
  4155. void
  4156. fts_cache_append_deleted_doc_ids(
  4157. /*=============================*/
  4158. const fts_cache_t* cache, /*!< in: cache to use */
  4159. ib_vector_t* vector) /*!< in: append to this vector */
  4160. {
  4161. mutex_enter(const_cast<ib_mutex_t*>(&cache->deleted_lock));
  4162. if (cache->deleted_doc_ids == NULL) {
  4163. mutex_exit((ib_mutex_t*) &cache->deleted_lock);
  4164. return;
  4165. }
  4166. for (ulint i = 0; i < ib_vector_size(cache->deleted_doc_ids); ++i) {
  4167. doc_id_t* update;
  4168. update = static_cast<doc_id_t*>(
  4169. ib_vector_get(cache->deleted_doc_ids, i));
  4170. ib_vector_push(vector, &update);
  4171. }
  4172. mutex_exit((ib_mutex_t*) &cache->deleted_lock);
  4173. }
  4174. /*********************************************************************//**
  4175. Add the FTS document id hidden column. */
  4176. void
  4177. fts_add_doc_id_column(
  4178. /*==================*/
  4179. dict_table_t* table, /*!< in/out: Table with FTS index */
  4180. mem_heap_t* heap) /*!< in: temporary memory heap, or NULL */
  4181. {
  4182. dict_mem_table_add_col(
  4183. table, heap,
  4184. FTS_DOC_ID_COL_NAME,
  4185. DATA_INT,
  4186. dtype_form_prtype(
  4187. DATA_NOT_NULL | DATA_UNSIGNED
  4188. | DATA_BINARY_TYPE | DATA_FTS_DOC_ID, 0),
  4189. sizeof(doc_id_t));
  4190. DICT_TF2_FLAG_SET(table, DICT_TF2_FTS_HAS_DOC_ID);
  4191. }
  4192. /** Add new fts doc id to the update vector.
  4193. @param[in] table the table that contains the FTS index.
  4194. @param[in,out] ufield the fts doc id field in the update vector.
  4195. No new memory is allocated for this in this
  4196. function.
  4197. @param[in,out] next_doc_id the fts doc id that has been added to the
  4198. update vector. If 0, a new fts doc id is
  4199. automatically generated. The memory provided
  4200. for this argument will be used by the update
  4201. vector. Ensure that the life time of this
  4202. memory matches that of the update vector.
  4203. @return the fts doc id used in the update vector */
  4204. doc_id_t
  4205. fts_update_doc_id(
  4206. dict_table_t* table,
  4207. upd_field_t* ufield,
  4208. doc_id_t* next_doc_id)
  4209. {
  4210. doc_id_t doc_id;
  4211. dberr_t error = DB_SUCCESS;
  4212. if (*next_doc_id) {
  4213. doc_id = *next_doc_id;
  4214. } else {
  4215. /* Get the new document id that will be added. */
  4216. error = fts_get_next_doc_id(table, &doc_id);
  4217. }
  4218. if (error == DB_SUCCESS) {
  4219. dict_index_t* clust_index;
  4220. dict_col_t* col = dict_table_get_nth_col(
  4221. table, table->fts->doc_col);
  4222. ufield->exp = NULL;
  4223. ufield->new_val.len = sizeof(doc_id);
  4224. clust_index = dict_table_get_first_index(table);
  4225. ufield->field_no = dict_col_get_clust_pos(col, clust_index);
  4226. dict_col_copy_type(col, dfield_get_type(&ufield->new_val));
4227. /* It is possible that we update a record that has
4228. not yet been sync-ed after the last crash. */
  4229. /* Convert to storage byte order. */
  4230. ut_a(doc_id != FTS_NULL_DOC_ID);
  4231. fts_write_doc_id((byte*) next_doc_id, doc_id);
  4232. ufield->new_val.data = next_doc_id;
  4233. ufield->new_val.ext = 0;
  4234. }
  4235. return(doc_id);
  4236. }
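/* Usage sketch (hypothetical caller, not from this file): the buffer
passed as next_doc_id must outlive the update vector, because
fts_update_doc_id() stores a pointer to it in ufield->new_val.data
rather than copying the value. */
static doc_id_t example_assign_doc_id(dict_table_t* table,
				      upd_field_t* ufield,
				      doc_id_t* heap_buf)
{
	*heap_buf = 0; /* 0 = generate a fresh Doc ID */
	return fts_update_doc_id(table, ufield, heap_buf);
}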
  4237. /** fts_t constructor.
  4238. @param[in] table table with FTS indexes
  4239. @param[in,out] heap memory heap where 'this' is stored */
  4240. fts_t::fts_t(
  4241. const dict_table_t* table,
  4242. mem_heap_t* heap)
  4243. :
  4244. added_synced(0), dict_locked(0),
  4245. add_wq(NULL),
  4246. cache(NULL),
  4247. doc_col(ULINT_UNDEFINED), in_queue(false),
  4248. fts_heap(heap)
  4249. {
  4250. ut_a(table->fts == NULL);
  4251. ib_alloc_t* heap_alloc = ib_heap_allocator_create(fts_heap);
  4252. indexes = ib_vector_create(heap_alloc, sizeof(dict_index_t*), 4);
  4253. dict_table_get_all_fts_indexes(table, indexes);
  4254. }
  4255. /** fts_t destructor. */
  4256. fts_t::~fts_t()
  4257. {
  4258. ut_ad(add_wq == NULL);
  4259. if (cache != NULL) {
  4260. fts_cache_clear(cache);
  4261. fts_cache_destroy(cache);
  4262. cache = NULL;
  4263. }
  4264. /* There is no need to call ib_vector_free() on this->indexes
  4265. because it is stored in this->fts_heap. */
  4266. }
  4267. /*********************************************************************//**
  4268. Create an instance of fts_t.
  4269. @return instance of fts_t */
  4270. fts_t*
  4271. fts_create(
  4272. /*=======*/
  4273. dict_table_t* table) /*!< in/out: table with FTS indexes */
  4274. {
  4275. fts_t* fts;
  4276. mem_heap_t* heap;
  4277. heap = mem_heap_create(512);
  4278. fts = static_cast<fts_t*>(mem_heap_alloc(heap, sizeof(*fts)));
  4279. new(fts) fts_t(table, heap);
  4280. return(fts);
  4281. }
  4282. /*********************************************************************//**
  4283. Free the FTS resources. */
  4284. void
  4285. fts_free(
  4286. /*=====*/
  4287. dict_table_t* table) /*!< in/out: table with FTS indexes */
  4288. {
  4289. fts_t* fts = table->fts;
  4290. fts->~fts_t();
  4291. mem_heap_free(fts->fts_heap);
  4292. table->fts = NULL;
  4293. }
  4294. /*********************************************************************//**
4295. Copy an FTS savepoint. */
  4296. UNIV_INLINE
  4297. void
  4298. fts_savepoint_copy(
  4299. /*===============*/
  4300. const fts_savepoint_t* src, /*!< in: source savepoint */
  4301. fts_savepoint_t* dst) /*!< out: destination savepoint */
  4302. {
  4303. const ib_rbt_node_t* node;
  4304. const ib_rbt_t* tables;
  4305. tables = src->tables;
  4306. for (node = rbt_first(tables); node; node = rbt_next(tables, node)) {
  4307. fts_trx_table_t* ftt_dst;
  4308. const fts_trx_table_t** ftt_src;
  4309. ftt_src = rbt_value(const fts_trx_table_t*, node);
  4310. ftt_dst = fts_trx_table_clone(*ftt_src);
  4311. rbt_insert(dst->tables, &ftt_dst, &ftt_dst);
  4312. }
  4313. }
  4314. /*********************************************************************//**
4315. Take an FTS savepoint. */
  4316. void
  4317. fts_savepoint_take(
  4318. /*===============*/
  4319. fts_trx_t* fts_trx, /*!< in: fts transaction */
  4320. const char* name) /*!< in: savepoint name */
  4321. {
  4322. mem_heap_t* heap;
  4323. fts_savepoint_t* savepoint;
  4324. fts_savepoint_t* last_savepoint;
  4325. ut_a(name != NULL);
  4326. heap = fts_trx->heap;
  4327. /* The implied savepoint must exist. */
  4328. ut_a(ib_vector_size(fts_trx->savepoints) > 0);
  4329. last_savepoint = static_cast<fts_savepoint_t*>(
  4330. ib_vector_last(fts_trx->savepoints));
  4331. savepoint = fts_savepoint_create(fts_trx->savepoints, name, heap);
  4332. if (last_savepoint->tables != NULL) {
  4333. fts_savepoint_copy(last_savepoint, savepoint);
  4334. }
  4335. }
  4336. /*********************************************************************//**
  4337. Lookup a savepoint instance by name.
  4338. @return ULINT_UNDEFINED if not found */
  4339. UNIV_INLINE
  4340. ulint
  4341. fts_savepoint_lookup(
  4342. /*==================*/
  4343. ib_vector_t* savepoints, /*!< in: savepoints */
  4344. const char* name) /*!< in: savepoint name */
  4345. {
  4346. ulint i;
  4347. ut_a(ib_vector_size(savepoints) > 0);
  4348. for (i = 1; i < ib_vector_size(savepoints); ++i) {
  4349. fts_savepoint_t* savepoint;
  4350. savepoint = static_cast<fts_savepoint_t*>(
  4351. ib_vector_get(savepoints, i));
  4352. if (strcmp(name, savepoint->name) == 0) {
  4353. return(i);
  4354. }
  4355. }
  4356. return(ULINT_UNDEFINED);
  4357. }
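/* Example (illustrative): after SAVEPOINT a; SAVEPOINT b; the savepoints
vector holds { implied (name == NULL), "a", "b" }, so
fts_savepoint_lookup(savepoints, "a") returns 1 and an unknown name
returns ULINT_UNDEFINED; index 0, the implied savepoint, is never
matched. */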
  4358. /*********************************************************************//**
4359. Release the savepoint data identified by name. All savepoints created
4360. after the named savepoint are kept. The default (implied) savepoint
4361. is never released. */
  4362. void
  4363. fts_savepoint_release(
  4364. /*==================*/
  4365. trx_t* trx, /*!< in: transaction */
  4366. const char* name) /*!< in: savepoint name */
  4367. {
  4368. ut_a(name != NULL);
  4369. ib_vector_t* savepoints = trx->fts_trx->savepoints;
  4370. ut_a(ib_vector_size(savepoints) > 0);
  4371. ulint i = fts_savepoint_lookup(savepoints, name);
  4372. if (i != ULINT_UNDEFINED) {
  4373. ut_a(i >= 1);
  4374. fts_savepoint_t* savepoint;
  4375. savepoint = static_cast<fts_savepoint_t*>(
  4376. ib_vector_get(savepoints, i));
  4377. if (i == ib_vector_size(savepoints) - 1) {
  4378. /* If the savepoint is the last, we save its
  4379. tables to the previous savepoint. */
  4380. fts_savepoint_t* prev_savepoint;
  4381. prev_savepoint = static_cast<fts_savepoint_t*>(
  4382. ib_vector_get(savepoints, i - 1));
  4383. ib_rbt_t* tables = savepoint->tables;
  4384. savepoint->tables = prev_savepoint->tables;
  4385. prev_savepoint->tables = tables;
  4386. }
  4387. fts_savepoint_free(savepoint);
  4388. ib_vector_remove(savepoints, *(void**)savepoint);
  4389. /* Make sure we don't delete the implied savepoint. */
  4390. ut_a(ib_vector_size(savepoints) > 0);
  4391. }
  4392. }
  4393. /**********************************************************************//**
  4394. Refresh last statement savepoint. */
  4395. void
  4396. fts_savepoint_laststmt_refresh(
  4397. /*===========================*/
  4398. trx_t* trx) /*!< in: transaction */
  4399. {
  4400. fts_trx_t* fts_trx;
  4401. fts_savepoint_t* savepoint;
  4402. fts_trx = trx->fts_trx;
  4403. savepoint = static_cast<fts_savepoint_t*>(
  4404. ib_vector_pop(fts_trx->last_stmt));
  4405. fts_savepoint_free(savepoint);
  4406. ut_ad(ib_vector_is_empty(fts_trx->last_stmt));
  4407. savepoint = fts_savepoint_create(fts_trx->last_stmt, NULL, NULL);
  4408. }
  4409. /********************************************************************
4410. Undo the Doc ID add/delete operations of the last statement */
  4411. static
  4412. void
  4413. fts_undo_last_stmt(
  4414. /*===============*/
  4415. fts_trx_table_t* s_ftt, /*!< in: Transaction FTS table */
  4416. fts_trx_table_t* l_ftt) /*!< in: last stmt FTS table */
  4417. {
  4418. ib_rbt_t* s_rows;
  4419. ib_rbt_t* l_rows;
  4420. const ib_rbt_node_t* node;
  4421. l_rows = l_ftt->rows;
  4422. s_rows = s_ftt->rows;
  4423. for (node = rbt_first(l_rows);
  4424. node;
  4425. node = rbt_next(l_rows, node)) {
  4426. fts_trx_row_t* l_row = rbt_value(fts_trx_row_t, node);
  4427. ib_rbt_bound_t parent;
  4428. rbt_search(s_rows, &parent, &(l_row->doc_id));
  4429. if (parent.result == 0) {
  4430. fts_trx_row_t* s_row = rbt_value(
  4431. fts_trx_row_t, parent.last);
  4432. switch (l_row->state) {
  4433. case FTS_INSERT:
  4434. ut_free(rbt_remove_node(s_rows, parent.last));
  4435. break;
  4436. case FTS_DELETE:
  4437. if (s_row->state == FTS_NOTHING) {
  4438. s_row->state = FTS_INSERT;
  4439. } else if (s_row->state == FTS_DELETE) {
  4440. ut_free(rbt_remove_node(
  4441. s_rows, parent.last));
  4442. }
  4443. break;
4444. /* FIXME: Check if FTS_MODIFY needs to be addressed */
  4445. case FTS_MODIFY:
  4446. case FTS_NOTHING:
  4447. break;
  4448. default:
  4449. ut_error;
  4450. }
  4451. }
  4452. }
  4453. }
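/* Worked example (illustrative): if the last statement recorded
FTS_INSERT for doc id 42, the undo removes the row for 42 from the
savepoint's rows; if it recorded FTS_DELETE and the savepoint row was in
state FTS_NOTHING, the row is restored to FTS_INSERT, and if the
savepoint row was FTS_DELETE the row is removed altogether. */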
  4454. /**********************************************************************//**
4455. Roll back the Doc ID add/delete operations of the last
4456. statement in this transaction. */
  4457. void
  4458. fts_savepoint_rollback_last_stmt(
  4459. /*=============================*/
  4460. trx_t* trx) /*!< in: transaction */
  4461. {
  4462. ib_vector_t* savepoints;
  4463. fts_savepoint_t* savepoint;
  4464. fts_savepoint_t* last_stmt;
  4465. fts_trx_t* fts_trx;
  4466. ib_rbt_bound_t parent;
  4467. const ib_rbt_node_t* node;
  4468. ib_rbt_t* l_tables;
  4469. ib_rbt_t* s_tables;
  4470. fts_trx = trx->fts_trx;
  4471. savepoints = fts_trx->savepoints;
  4472. savepoint = static_cast<fts_savepoint_t*>(ib_vector_last(savepoints));
  4473. last_stmt = static_cast<fts_savepoint_t*>(
  4474. ib_vector_last(fts_trx->last_stmt));
  4475. l_tables = last_stmt->tables;
  4476. s_tables = savepoint->tables;
  4477. for (node = rbt_first(l_tables);
  4478. node;
  4479. node = rbt_next(l_tables, node)) {
  4480. fts_trx_table_t** l_ftt;
  4481. l_ftt = rbt_value(fts_trx_table_t*, node);
  4482. rbt_search_cmp(
  4483. s_tables, &parent, &(*l_ftt)->table->id,
  4484. fts_trx_table_id_cmp, NULL);
  4485. if (parent.result == 0) {
  4486. fts_trx_table_t** s_ftt;
  4487. s_ftt = rbt_value(fts_trx_table_t*, parent.last);
  4488. fts_undo_last_stmt(*s_ftt, *l_ftt);
  4489. }
  4490. }
  4491. }
  4492. /**********************************************************************//**
4493. Roll back to the savepoint identified by name. All savepoints
4494. taken after it are released. */
  4495. void
  4496. fts_savepoint_rollback(
  4497. /*===================*/
  4498. trx_t* trx, /*!< in: transaction */
  4499. const char* name) /*!< in: savepoint name */
  4500. {
  4501. ulint i;
  4502. ib_vector_t* savepoints;
  4503. ut_a(name != NULL);
  4504. savepoints = trx->fts_trx->savepoints;
4505. /* We pop all savepoints from the top of the stack up to
4506. and including the instance that was found. */
  4507. i = fts_savepoint_lookup(savepoints, name);
  4508. if (i != ULINT_UNDEFINED) {
  4509. fts_savepoint_t* savepoint;
  4510. ut_a(i > 0);
  4511. while (ib_vector_size(savepoints) > i) {
  4512. fts_savepoint_t* savepoint;
  4513. savepoint = static_cast<fts_savepoint_t*>(
  4514. ib_vector_pop(savepoints));
  4515. if (savepoint->name != NULL) {
  4516. /* Since name was allocated on the heap, the
  4517. memory will be released when the transaction
  4518. completes. */
  4519. savepoint->name = NULL;
  4520. fts_savepoint_free(savepoint);
  4521. }
  4522. }
4523. /* Pop all elements from the top of the stack that may
4524. have been released. We have to be careful that we don't
4525. delete the implied savepoint. */
  4526. for (savepoint = static_cast<fts_savepoint_t*>(
  4527. ib_vector_last(savepoints));
  4528. ib_vector_size(savepoints) > 1
  4529. && savepoint->name == NULL;
  4530. savepoint = static_cast<fts_savepoint_t*>(
  4531. ib_vector_last(savepoints))) {
  4532. ib_vector_pop(savepoints);
  4533. }
  4534. /* Make sure we don't delete the implied savepoint. */
  4535. ut_a(ib_vector_size(savepoints) > 0);
  4536. /* Restore the savepoint. */
  4537. fts_savepoint_take(trx->fts_trx, name);
  4538. }
  4539. }
  4540. bool fts_check_aux_table(const char *name,
  4541. table_id_t *table_id,
  4542. index_id_t *index_id)
  4543. {
  4544. ulint len= strlen(name);
  4545. const char* ptr;
  4546. const char* end= name + len;
  4547. ut_ad(len <= MAX_FULL_NAME_LEN);
  4548. ptr= static_cast<const char*>(memchr(name, '/', len));
  4549. if (ptr != NULL)
  4550. {
  4551. /* We will start the match after the '/' */
  4552. ++ptr;
  4553. len = end - ptr;
  4554. }
4555. /* All auxiliary tables are prefixed with "FTS_" and the name
4556. length will be greater than 20 bytes. */
  4557. if (ptr && len > 20 && !memcmp(ptr, "FTS_", 4))
  4558. {
  4559. /* Skip the prefix. */
  4560. ptr+= 4;
  4561. len-= 4;
  4562. const char *table_id_ptr= ptr;
  4563. /* Skip the table id. */
  4564. ptr= static_cast<const char*>(memchr(ptr, '_', len));
  4565. if (!ptr)
  4566. return false;
  4567. /* Skip the underscore. */
  4568. ++ptr;
  4569. ut_ad(end > ptr);
  4570. len= end - ptr;
  4571. sscanf(table_id_ptr, UINT64PFx, table_id);
  4572. /* First search the common table suffix array. */
  4573. for (ulint i = 0; fts_common_tables[i]; ++i)
  4574. {
  4575. if (!strncmp(ptr, fts_common_tables[i], len))
  4576. return true;
  4577. }
  4578. /* Could be obsolete common tables. */
  4579. if ((len == 5 && !memcmp(ptr, "ADDED", len)) ||
  4580. (len == 9 && !memcmp(ptr, "STOPWORDS", len)))
  4581. return true;
  4582. const char* index_id_ptr= ptr;
  4583. /* Skip the index id. */
  4584. ptr= static_cast<const char*>(memchr(ptr, '_', len));
  4585. if (!ptr)
  4586. return false;
  4587. sscanf(index_id_ptr, UINT64PFx, index_id);
  4588. /* Skip the underscore. */
  4589. ++ptr;
  4590. ut_a(end > ptr);
  4591. len= end - ptr;
  4592. if (len > 7)
  4593. return false;
  4594. /* Search the FT index specific array. */
  4595. for (ulint i = 0; i < FTS_NUM_AUX_INDEX; ++i)
  4596. {
4597. if (!memcmp(ptr, fts_get_suffix(i), len))
  4598. return true;
  4599. }
  4600. /* Other FT index specific table(s). */
  4601. if (len == 6 && !memcmp(ptr, "DOC_ID", len))
  4602. return true;
  4603. }
  4604. return false;
  4605. }
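/* Examples (illustrative) of names accepted by fts_check_aux_table(),
with the ids printed as 16 hex digits (UINT64PFx):
  "test/FTS_0000000000000123_CONFIG"
      -> common table, *table_id = 0x123
  "test/FTS_0000000000000123_00000000000000ab_INDEX_1"
      -> index table, *table_id = 0x123, *index_id = 0xab
  "test/t1" and other ordinary names return false. */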
  4606. typedef std::pair<table_id_t,index_id_t> fts_aux_id;
  4607. typedef std::set<fts_aux_id> fts_space_set_t;
4608. /** Iterate over all the tablespaces in the space list and collect
4609. the FTS parent table id and index id of each FTS auxiliary tablespace.
4610. @param[in,out] fts_space_set store the collected table ids and
4611. index ids */
  4612. static void fil_get_fts_spaces(fts_space_set_t& fts_space_set)
  4613. {
  4614. mutex_enter(&fil_system.mutex);
  4615. for (fil_space_t *space= UT_LIST_GET_FIRST(fil_system.space_list);
  4616. space;
  4617. space= UT_LIST_GET_NEXT(space_list, space))
  4618. {
  4619. index_id_t index_id= 0;
  4620. table_id_t table_id= 0;
  4621. if (space->purpose == FIL_TYPE_TABLESPACE
  4622. && fts_check_aux_table(space->name, &table_id, &index_id))
  4623. fts_space_set.insert(std::make_pair(table_id, index_id));
  4624. }
  4625. mutex_exit(&fil_system.mutex);
  4626. }
4627. /** Check the parent table id and index id of FTS auxiliary
4628. tables against SYS_INDEXES. If a matching entry exists, the
4629. auxiliary table is not orphaned and is removed from the set.
4630. @param[in,out] fts_space_set set of auxiliary table ids and
4631. index ids */
  4632. static void fts_check_orphaned_tables(fts_space_set_t& fts_space_set)
  4633. {
  4634. btr_pcur_t pcur;
  4635. mtr_t mtr;
  4636. trx_t* trx = trx_create();
  4637. trx->op_info = "checking fts orphaned tables";
  4638. row_mysql_lock_data_dictionary(trx);
  4639. mtr.start();
  4640. btr_pcur_open_at_index_side(
  4641. true, dict_table_get_first_index(dict_sys.sys_indexes),
  4642. BTR_SEARCH_LEAF, &pcur, true, 0, &mtr);
  4643. do
  4644. {
  4645. const rec_t *rec;
  4646. const byte *tbl_field;
  4647. const byte *index_field;
  4648. ulint len;
  4649. btr_pcur_move_to_next_user_rec(&pcur, &mtr);
  4650. if (!btr_pcur_is_on_user_rec(&pcur))
  4651. break;
  4652. rec= btr_pcur_get_rec(&pcur);
  4653. if (rec_get_deleted_flag(rec, 0))
  4654. continue;
  4655. tbl_field= rec_get_nth_field_old(rec, 0, &len);
  4656. if (len != 8)
  4657. continue;
  4658. index_field= rec_get_nth_field_old(rec, 1, &len);
  4659. if (len != 8)
  4660. continue;
  4661. table_id_t table_id = mach_read_from_8(tbl_field);
  4662. index_id_t index_id = mach_read_from_8(index_field);
  4663. fts_space_set_t::iterator it = fts_space_set.find(
  4664. fts_aux_id(table_id, index_id));
  4665. if (it != fts_space_set.end())
  4666. fts_space_set.erase(*it);
  4667. else
  4668. {
  4669. it= fts_space_set.find(fts_aux_id(table_id, 0));
  4670. if (it != fts_space_set.end())
  4671. fts_space_set.erase(*it);
  4672. }
  4673. } while(!fts_space_set.empty());
  4674. btr_pcur_close(&pcur);
  4675. mtr.commit();
  4676. row_mysql_unlock_data_dictionary(trx);
  4677. trx->free();
  4678. }
4679. /** Drop all FTS auxiliary index tables for the given FTS table.
4680. @param[in,out] trx transaction @param[in] fts_table FTS table identity */
  4681. static void fts_drop_all_aux_tables(trx_t *trx, fts_table_t *fts_table)
  4682. {
  4683. char fts_table_name[MAX_FULL_NAME_LEN];
  4684. for (ulint i= 0;i < FTS_NUM_AUX_INDEX; i++)
  4685. {
  4686. fts_table->suffix= fts_get_suffix(i);
  4687. fts_get_table_name(fts_table, fts_table_name, true);
4688. /* Drop the FTS auxiliary index table */
  4689. dberr_t err= fts_drop_table(trx, fts_table_name);
  4690. if (err == DB_FAIL)
  4691. {
  4692. char *path= fil_make_filepath(NULL, fts_table_name, IBD, false);
  4693. if (path != NULL)
  4694. {
  4695. os_file_delete_if_exists(innodb_data_file_key, path , NULL);
  4696. ut_free(path);
  4697. }
  4698. }
  4699. }
  4700. }
  4701. /** Drop all orphaned FTS auxiliary tables, those that don't have
  4702. a parent table or FTS index defined on them. */
  4703. void fts_drop_orphaned_tables()
  4704. {
  4705. fts_space_set_t fts_space_set;
  4706. fil_get_fts_spaces(fts_space_set);
  4707. if (fts_space_set.empty())
  4708. return;
  4709. fts_check_orphaned_tables(fts_space_set);
  4710. if (fts_space_set.empty())
  4711. return;
  4712. trx_t* trx= trx_create();
  4713. trx->op_info= "Drop orphaned aux FTS tables";
  4714. row_mysql_lock_data_dictionary(trx);
  4715. for (fts_space_set_t::iterator it = fts_space_set.begin();
  4716. it != fts_space_set.end(); it++)
  4717. {
  4718. fts_table_t fts_table;
  4719. dict_table_t *table= dict_table_open_on_id(it->first, TRUE,
  4720. DICT_TABLE_OP_NORMAL);
  4721. if (!table)
  4722. continue;
  4723. FTS_INIT_FTS_TABLE(&fts_table, NULL, FTS_COMMON_TABLE, table);
  4724. fts_drop_common_tables(trx, &fts_table, true);
  4725. fts_table.type= FTS_INDEX_TABLE;
  4726. fts_table.index_id= it->second;
  4727. fts_drop_all_aux_tables(trx, &fts_table);
  4728. dict_table_close(table, true, false);
  4729. }
  4730. trx_commit_for_mysql(trx);
  4731. row_mysql_unlock_data_dictionary(trx);
  4732. trx->dict_operation_lock_mode= 0;
  4733. trx->free();
  4734. }
  4735. /**********************************************************************//**
4736. Check whether the user-supplied stopword table is of the right format.
4737. The caller is responsible for holding the dictionary locks.
  4738. @return the stopword column charset if qualifies */
  4739. CHARSET_INFO*
  4740. fts_valid_stopword_table(
  4741. /*=====================*/
  4742. const char* stopword_table_name) /*!< in: Stopword table
  4743. name */
  4744. {
  4745. dict_table_t* table;
  4746. dict_col_t* col = NULL;
  4747. if (!stopword_table_name) {
  4748. return(NULL);
  4749. }
  4750. table = dict_table_get_low(stopword_table_name);
  4751. if (!table) {
  4752. ib::error() << "User stopword table " << stopword_table_name
  4753. << " does not exist.";
  4754. return(NULL);
  4755. } else {
  4756. const char* col_name;
  4757. col_name = dict_table_get_col_name(table, 0);
  4758. if (ut_strcmp(col_name, "value")) {
  4759. ib::error() << "Invalid column name for stopword"
  4760. " table " << stopword_table_name << ". Its"
  4761. " first column must be named as 'value'.";
  4762. return(NULL);
  4763. }
  4764. col = dict_table_get_nth_col(table, 0);
  4765. if (col->mtype != DATA_VARCHAR
  4766. && col->mtype != DATA_VARMYSQL) {
  4767. ib::error() << "Invalid column type for stopword"
  4768. " table " << stopword_table_name << ". Its"
  4769. " first column must be of varchar type";
  4770. return(NULL);
  4771. }
  4772. }
  4773. ut_ad(col);
  4774. return(fts_get_charset(col->prtype));
  4775. }
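/* Example (illustrative): a qualifying stopword table has a single
first column named 'value' of a VARCHAR type, e.g.
    CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE=InnoDB;
referenced as 'db_name/my_stopwords'. Any other first-column name or
type is rejected with the errors above. */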
  4776. /**********************************************************************//**
4777. This function loads the stopwords into the FTS cache. It also
4778. records/fetches the stopword configuration to/from the FTS CONFIG
4779. table, depending on whether we are creating or reloading the
4780. FTS.
  4781. @return true if load operation is successful */
  4782. bool
  4783. fts_load_stopword(
  4784. /*==============*/
  4785. const dict_table_t*
  4786. table, /*!< in: Table with FTS */
  4787. trx_t* trx, /*!< in: Transactions */
  4788. const char* session_stopword_table, /*!< in: Session stopword table
  4789. name */
  4790. bool stopword_is_on, /*!< in: Whether stopword
  4791. option is turned on/off */
  4792. bool reload) /*!< in: Whether it is
  4793. for reloading FTS table */
  4794. {
  4795. fts_table_t fts_table;
  4796. fts_string_t str;
  4797. dberr_t error = DB_SUCCESS;
  4798. ulint use_stopword;
  4799. fts_cache_t* cache;
  4800. const char* stopword_to_use = NULL;
  4801. ibool new_trx = FALSE;
  4802. byte str_buffer[MAX_FULL_NAME_LEN + 1];
  4803. FTS_INIT_FTS_TABLE(&fts_table, "CONFIG", FTS_COMMON_TABLE, table);
  4804. cache = table->fts->cache;
  4805. if (!reload && !(cache->stopword_info.status & STOPWORD_NOT_INIT)) {
  4806. return true;
  4807. }
  4808. if (!trx) {
  4809. trx = trx_create();
  4810. if (srv_read_only_mode) {
  4811. trx_start_internal_read_only(trx);
  4812. } else {
  4813. trx_start_internal(trx);
  4814. }
  4815. trx->op_info = "upload FTS stopword";
  4816. new_trx = TRUE;
  4817. }
  4818. /* First check whether stopword filtering is turned off */
  4819. if (reload) {
  4820. error = fts_config_get_ulint(
  4821. trx, &fts_table, FTS_USE_STOPWORD, &use_stopword);
  4822. } else {
  4823. use_stopword = (ulint) stopword_is_on;
  4824. error = fts_config_set_ulint(
  4825. trx, &fts_table, FTS_USE_STOPWORD, use_stopword);
  4826. }
  4827. if (error != DB_SUCCESS) {
  4828. goto cleanup;
  4829. }
4830. /* If stopwords are turned off, there is no need to load the
4831. stopword list into the cache, but we still need to initialize. */
  4832. if (!use_stopword) {
  4833. cache->stopword_info.status = STOPWORD_OFF;
  4834. goto cleanup;
  4835. }
  4836. if (reload) {
  4837. /* Fetch the stopword table name from FTS config
  4838. table */
  4839. str.f_n_char = 0;
  4840. str.f_str = str_buffer;
  4841. str.f_len = sizeof(str_buffer) - 1;
  4842. error = fts_config_get_value(
  4843. trx, &fts_table, FTS_STOPWORD_TABLE_NAME, &str);
  4844. if (error != DB_SUCCESS) {
  4845. goto cleanup;
  4846. }
  4847. if (*str.f_str) {
  4848. stopword_to_use = (const char*) str.f_str;
  4849. }
  4850. } else {
  4851. stopword_to_use = session_stopword_table;
  4852. }
  4853. if (stopword_to_use
  4854. && fts_load_user_stopword(table->fts, stopword_to_use,
  4855. &cache->stopword_info)) {
4856. /* Save the stopword table name to the CONFIG
4857. table */
  4858. if (!reload) {
  4859. str.f_n_char = 0;
  4860. str.f_str = (byte*) stopword_to_use;
  4861. str.f_len = ut_strlen(stopword_to_use);
  4862. error = fts_config_set_value(
  4863. trx, &fts_table, FTS_STOPWORD_TABLE_NAME, &str);
  4864. }
  4865. } else {
  4866. /* Load system default stopword list */
  4867. fts_load_default_stopword(&cache->stopword_info);
  4868. }
  4869. cleanup:
  4870. if (new_trx) {
  4871. if (error == DB_SUCCESS) {
  4872. fts_sql_commit(trx);
  4873. } else {
  4874. fts_sql_rollback(trx);
  4875. }
  4876. trx->free();
  4877. }
  4878. if (!cache->stopword_info.cached_stopword) {
  4879. cache->stopword_info.cached_stopword = rbt_create_arg_cmp(
  4880. sizeof(fts_tokenizer_word_t), innobase_fts_text_cmp,
  4881. &my_charset_latin1);
  4882. }
  4883. return error == DB_SUCCESS;
  4884. }
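/* Usage sketch (illustrative): reloading the persisted stopword
configuration at FTS startup, exactly as fts_init_index() below does.
With trx=NULL the function creates, and then commits or rolls back, its
own transaction. */
static bool example_reload_stopwords(dict_table_t* table)
{
	return fts_load_stopword(table, NULL, NULL, true, true);
}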
  4885. /**********************************************************************//**
4886. Callback function invoked when we initialize the FTS at startup.
4887. It recovers the maximum Doc ID present in the current table.
4888. @return always TRUE */
  4889. static
  4890. ibool
  4891. fts_init_get_doc_id(
  4892. /*================*/
  4893. void* row, /*!< in: sel_node_t* */
  4894. void* user_arg) /*!< in: fts cache */
  4895. {
  4896. doc_id_t doc_id = FTS_NULL_DOC_ID;
  4897. sel_node_t* node = static_cast<sel_node_t*>(row);
  4898. que_node_t* exp = node->select_list;
  4899. fts_cache_t* cache = static_cast<fts_cache_t*>(user_arg);
  4900. ut_ad(ib_vector_is_empty(cache->get_docs));
  4901. /* Copy each indexed column content into doc->text.f_str */
  4902. if (exp) {
  4903. dfield_t* dfield = que_node_get_val(exp);
  4904. dtype_t* type = dfield_get_type(dfield);
  4905. void* data = dfield_get_data(dfield);
  4906. ut_a(dtype_get_mtype(type) == DATA_INT);
  4907. doc_id = static_cast<doc_id_t>(mach_read_from_8(
  4908. static_cast<const byte*>(data)));
  4909. if (doc_id >= cache->next_doc_id) {
  4910. cache->next_doc_id = doc_id + 1;
  4911. }
  4912. }
  4913. return(TRUE);
  4914. }
  4915. /**********************************************************************//**
4916. Callback function invoked when we initialize the FTS at startup.
4917. It recovers Doc IDs that have not been sync-ed to the auxiliary
4918. table and need to be brought back into the FTS index.
4919. @return always TRUE */
  4920. static
  4921. ibool
  4922. fts_init_recover_doc(
  4923. /*=================*/
  4924. void* row, /*!< in: sel_node_t* */
  4925. void* user_arg) /*!< in: fts cache */
  4926. {
  4927. fts_doc_t doc;
  4928. ulint doc_len = 0;
  4929. ulint field_no = 0;
  4930. fts_get_doc_t* get_doc = static_cast<fts_get_doc_t*>(user_arg);
  4931. doc_id_t doc_id = FTS_NULL_DOC_ID;
  4932. sel_node_t* node = static_cast<sel_node_t*>(row);
  4933. que_node_t* exp = node->select_list;
  4934. fts_cache_t* cache = get_doc->cache;
  4935. st_mysql_ftparser* parser = get_doc->index_cache->index->parser;
  4936. fts_doc_init(&doc);
  4937. doc.found = TRUE;
  4938. ut_ad(cache);
  4939. /* Copy each indexed column content into doc->text.f_str */
  4940. while (exp) {
  4941. dfield_t* dfield = que_node_get_val(exp);
  4942. ulint len = dfield_get_len(dfield);
  4943. if (field_no == 0) {
  4944. dtype_t* type = dfield_get_type(dfield);
  4945. void* data = dfield_get_data(dfield);
  4946. ut_a(dtype_get_mtype(type) == DATA_INT);
  4947. doc_id = static_cast<doc_id_t>(mach_read_from_8(
  4948. static_cast<const byte*>(data)));
  4949. field_no++;
  4950. exp = que_node_get_next(exp);
  4951. continue;
  4952. }
  4953. if (len == UNIV_SQL_NULL) {
  4954. exp = que_node_get_next(exp);
  4955. continue;
  4956. }
  4957. ut_ad(get_doc);
  4958. if (!get_doc->index_cache->charset) {
  4959. get_doc->index_cache->charset = fts_get_charset(
  4960. dfield->type.prtype);
  4961. }
  4962. doc.charset = get_doc->index_cache->charset;
  4963. if (dfield_is_ext(dfield)) {
  4964. dict_table_t* table = cache->sync->table;
  4965. doc.text.f_str = btr_copy_externally_stored_field(
  4966. &doc.text.f_len,
  4967. static_cast<byte*>(dfield_get_data(dfield)),
  4968. table->space->zip_size(), len,
  4969. static_cast<mem_heap_t*>(doc.self_heap->arg));
  4970. } else {
  4971. doc.text.f_str = static_cast<byte*>(
  4972. dfield_get_data(dfield));
  4973. doc.text.f_len = len;
  4974. }
  4975. if (field_no == 1) {
  4976. fts_tokenize_document(&doc, NULL, parser);
  4977. } else {
  4978. fts_tokenize_document_next(&doc, doc_len, NULL, parser);
  4979. }
  4980. exp = que_node_get_next(exp);
  4981. doc_len += (exp) ? len + 1 : len;
  4982. field_no++;
  4983. }
  4984. fts_cache_add_doc(cache, get_doc->index_cache, doc_id, doc.tokens);
  4985. fts_doc_free(&doc);
  4986. cache->added++;
  4987. if (doc_id >= cache->next_doc_id) {
  4988. cache->next_doc_id = doc_id + 1;
  4989. }
  4990. return(TRUE);
  4991. }
  4992. /**********************************************************************//**
4993. This function brings the FTS index in sync when the FTS index is
4994. first used. Documents that had not yet been sync-ed to the
4995. auxiliary tables when the server was last shut down abnormally
4996. need to be brought back into the FTS cache before any further operations.
4997. @return TRUE if all OK */
  4998. ibool
  4999. fts_init_index(
  5000. /*===========*/
  5001. dict_table_t* table, /*!< in: Table with FTS */
  5002. ibool has_cache_lock) /*!< in: Whether we already have
  5003. cache lock */
  5004. {
  5005. dict_index_t* index;
  5006. doc_id_t start_doc;
  5007. fts_get_doc_t* get_doc = NULL;
  5008. fts_cache_t* cache = table->fts->cache;
  5009. bool need_init = false;
  5010. ut_ad(!mutex_own(&dict_sys.mutex));
  5011. /* First check cache->get_docs is initialized */
  5012. if (!has_cache_lock) {
  5013. rw_lock_x_lock(&cache->lock);
  5014. }
  5015. rw_lock_x_lock(&cache->init_lock);
  5016. if (cache->get_docs == NULL) {
  5017. cache->get_docs = fts_get_docs_create(cache);
  5018. }
  5019. rw_lock_x_unlock(&cache->init_lock);
  5020. if (table->fts->added_synced) {
  5021. goto func_exit;
  5022. }
  5023. need_init = true;
  5024. start_doc = cache->synced_doc_id;
  5025. if (!start_doc) {
  5026. fts_cmp_set_sync_doc_id(table, 0, TRUE, &start_doc);
  5027. cache->synced_doc_id = start_doc;
  5028. }
5029. /* No FTS index: this is the case when the previous FTS index was
5030. dropped and we re-initialize the Doc ID system for subsequent
5031. insertions. */
  5032. if (ib_vector_is_empty(cache->get_docs)) {
  5033. index = table->fts_doc_id_index;
  5034. ut_a(index);
  5035. fts_doc_fetch_by_doc_id(NULL, start_doc, index,
  5036. FTS_FETCH_DOC_BY_ID_LARGE,
  5037. fts_init_get_doc_id, cache);
  5038. } else {
  5039. if (table->fts->cache->stopword_info.status
  5040. & STOPWORD_NOT_INIT) {
  5041. fts_load_stopword(table, NULL, NULL, true, true);
  5042. }
  5043. for (ulint i = 0; i < ib_vector_size(cache->get_docs); ++i) {
  5044. get_doc = static_cast<fts_get_doc_t*>(
  5045. ib_vector_get(cache->get_docs, i));
  5046. index = get_doc->index_cache->index;
  5047. fts_doc_fetch_by_doc_id(NULL, start_doc, index,
  5048. FTS_FETCH_DOC_BY_ID_LARGE,
  5049. fts_init_recover_doc, get_doc);
  5050. }
  5051. }
  5052. table->fts->added_synced = true;
  5053. fts_get_docs_clear(cache->get_docs);
  5054. func_exit:
  5055. if (!has_cache_lock) {
  5056. rw_lock_x_unlock(&cache->lock);
  5057. }
  5058. if (need_init) {
  5059. mutex_enter(&dict_sys.mutex);
  5060. /* Register the table with the optimize thread. */
  5061. fts_optimize_add_table(table);
  5062. mutex_exit(&dict_sys.mutex);
  5063. }
  5064. return(TRUE);
  5065. }
5066. /** Check if all the auxiliary tables associated with the FTS index
5067. are in a consistent state. For now, consistency is checked only
5068. by ensuring index->page_no != FIL_NULL.
5069. @param[in,out] base_table table that hosts the FTS index
5070. @param[in,out] trx transaction handle */
  5071. void
  5072. fts_check_corrupt(
  5073. dict_table_t* base_table,
  5074. trx_t* trx)
  5075. {
  5076. bool sane = true;
  5077. fts_table_t fts_table;
5078. /* Iterate over the common tables and check their sanity. */
  5079. FTS_INIT_FTS_TABLE(&fts_table, NULL, FTS_COMMON_TABLE, base_table);
  5080. for (ulint i = 0; fts_common_tables[i] != NULL && sane; ++i) {
  5081. char table_name[MAX_FULL_NAME_LEN];
  5082. fts_table.suffix = fts_common_tables[i];
  5083. fts_get_table_name(&fts_table, table_name);
  5084. dict_table_t* aux_table = dict_table_open_on_name(
  5085. table_name, true, FALSE, DICT_ERR_IGNORE_NONE);
  5086. if (aux_table == NULL) {
  5087. dict_set_corrupted(
  5088. dict_table_get_first_index(base_table),
  5089. trx, "FTS_SANITY_CHECK");
  5090. ut_ad(base_table->corrupted == TRUE);
  5091. sane = false;
  5092. continue;
  5093. }
  5094. for (dict_index_t* aux_table_index =
  5095. UT_LIST_GET_FIRST(aux_table->indexes);
  5096. aux_table_index != NULL;
  5097. aux_table_index =
  5098. UT_LIST_GET_NEXT(indexes, aux_table_index)) {
5099. /* Check if the auxiliary table needed for FTS is sane. */
  5100. if (aux_table_index->page == FIL_NULL) {
  5101. dict_set_corrupted(
  5102. dict_table_get_first_index(base_table),
  5103. trx, "FTS_SANITY_CHECK");
  5104. ut_ad(base_table->corrupted == TRUE);
  5105. sane = false;
  5106. }
  5107. }
  5108. dict_table_close(aux_table, FALSE, FALSE);
  5109. }
  5110. }