/*****************************************************************************

Copyright (c) 1995, 2009, Innobase Oy. All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

*****************************************************************************/

/**************************************************//**
@file buf/buf0rea.c
The database buffer read

Created 11/5/1995 Heikki Tuuri
*******************************************************/
#include "buf0rea.h"

#include "fil0fil.h"
#include "mtr0mtr.h"
#include "buf0buf.h"
#include "buf0flu.h"
#include "buf0lru.h"
#include "ibuf0ibuf.h"
#include "log0recv.h"
#include "trx0sys.h"
#include "os0file.h"
#include "srv0start.h"
#include "srv0srv.h"
/** The size in blocks of the area where the random read-ahead algorithm counts
the accessed pages when deciding whether to read-ahead */
#define BUF_READ_AHEAD_RANDOM_AREA	BUF_READ_AHEAD_AREA

/** There must be at least this many pages in buf_pool in the area to start
a random read-ahead */
#define BUF_READ_AHEAD_RANDOM_THRESHOLD	(1 + BUF_READ_AHEAD_RANDOM_AREA / 2)

/** The linear read-ahead area size */
#define BUF_READ_AHEAD_LINEAR_AREA	BUF_READ_AHEAD_AREA

/** If more than buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT reads are
pending, read-ahead is not done: this is to prevent flooding the buffer pool
with i/o-fixed buffer blocks */
#define BUF_READ_AHEAD_PEND_LIMIT	2
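
/* Worked example of the thresholds above (assuming BUF_READ_AHEAD_AREA
works out to 64 pages, a common value; the macro is defined in buf0buf.h
and its actual value depends on the buffer pool configuration):
random read-ahead of a 64-page area is only triggered once at least
1 + 64 / 2 = 33 pages of that area have been accessed recently, and any
read-ahead is skipped while more reads are pending than
buf_pool->curr_size / 2, e.g. more than 4096 with an 8192-page pool. */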

/********************************************************************//**
Low-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there, in which case does nothing.
Sets the io_fix flag and sets an exclusive lock on the buffer frame. The
flag is cleared and the x-lock released by an i/o-handler thread.
@return 1 if a read request was queued, 0 if the page already resided
in buf_pool, or if the page is in the doublewrite buffer blocks in
which case it is never read into the pool, or if the tablespace does
not exist or is being dropped */
static
ulint
buf_read_page_low(
/*==============*/
        ulint*          err,    /*!< out: DB_SUCCESS or DB_TABLESPACE_DELETED if we are
                                trying to read from a non-existent tablespace, or a
                                tablespace which is just now being dropped */
        ibool           sync,   /*!< in: TRUE if synchronous aio is desired */
        ulint           mode,   /*!< in: BUF_READ_IBUF_PAGES_ONLY, ...,
                                ORed to OS_AIO_SIMULATED_WAKE_LATER (see below
                                at read-ahead functions) */
        ulint           space,  /*!< in: space id */
        ulint           zip_size,/*!< in: compressed page size, or 0 */
        ibool           unzip,  /*!< in: TRUE=request uncompressed page */
        ib_int64_t      tablespace_version, /*!< in: if the space memory object has
                                this timestamp different from what we are giving here,
                                treat the tablespace as dropped; this is a timestamp we
                                use to stop dangling page reads from a tablespace
                                which we have DISCARDed + IMPORTed back */
        ulint           offset) /*!< in: page number */
{
        buf_page_t*     bpage;
        ulint           wake_later;

        *err = DB_SUCCESS;

        wake_later = mode & OS_AIO_SIMULATED_WAKE_LATER;
        mode = mode & ~OS_AIO_SIMULATED_WAKE_LATER;

        if (trx_doublewrite && space == TRX_SYS_SPACE
            && ((offset >= trx_doublewrite->block1
                 && offset < trx_doublewrite->block1
                 + TRX_SYS_DOUBLEWRITE_BLOCK_SIZE)
                || (offset >= trx_doublewrite->block2
                    && offset < trx_doublewrite->block2
                    + TRX_SYS_DOUBLEWRITE_BLOCK_SIZE))) {
                ut_print_timestamp(stderr);
                fprintf(stderr,
                        " InnoDB: Warning: trying to read"
                        " doublewrite buffer page %lu\n",
                        (ulong) offset);

                return(0);
        }

        if (ibuf_bitmap_page(zip_size, offset)
            || trx_sys_hdr_page(space, offset)) {

                /* Trx sys header is so low in the latching order that we play
                safe and do not leave the i/o-completion to an asynchronous
                i/o-thread. Ibuf bitmap pages must always be read with
                synchronous i/o, to make sure they do not get involved in
                thread deadlocks. */

                sync = TRUE;
        }

        /* The following call will also check if the tablespace does not exist
        or is being dropped; if we succeed in initing the page in the buffer
        pool for read, then DISCARD cannot proceed until the read has
        completed */
        bpage = buf_page_init_for_read(err, mode, space, zip_size, unzip,
                                       tablespace_version, offset);
        if (bpage == NULL) {

                return(0);
        }

#ifdef UNIV_DEBUG
        if (buf_debug_prints) {
                fprintf(stderr,
                        "Posting read request for page %lu, sync %lu\n",
                        (ulong) offset,
                        (ulong) sync);
        }
#endif

        ut_ad(buf_page_in_file(bpage));

        if (zip_size) {
                *err = fil_io(OS_FILE_READ | wake_later,
                              sync, space, zip_size, offset, 0, zip_size,
                              bpage->zip.data, bpage);
        } else {
                ut_a(buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);

                *err = fil_io(OS_FILE_READ | wake_later,
                              sync, space, 0, offset, 0, UNIV_PAGE_SIZE,
                              ((buf_block_t*) bpage)->frame, bpage);
        }
        ut_a(*err == DB_SUCCESS);

        if (sync) {
                /* The i/o is already completed when we arrive from
                fil_read */
                buf_page_io_complete(bpage);
        }

        return(1);
}

/********************************************************************//**
Applies a random read-ahead in buf_pool if there are at least a threshold
value of accessed pages from the random read-ahead area. Does not read any
page, not even the one at the position (space, offset), if the read-ahead
mechanism is not activated. NOTE 1: the calling thread may own latches on
pages: to avoid deadlocks this function must be written such that it cannot
end up waiting for these latches! NOTE 2: the calling thread must want
access to the page given: this rule is set to prevent unintended read-aheads
performed by ibuf routines, a situation which could result in a deadlock if
the OS does not support asynchronous i/o.
@return number of page read requests issued; NOTE that if we read ibuf
pages, it may happen that the page at the given page number does not
get read even if we return a positive value! */
static
ulint
buf_read_ahead_random(
/*==================*/
        ulint   space,  /*!< in: space id */
        ulint   zip_size,/*!< in: compressed page size in bytes, or 0 */
        ulint   offset) /*!< in: page number of a page which the current thread
                        wants to access */
{
        ib_int64_t      tablespace_version;
        ulint           recent_blocks   = 0;
        ulint           count;
        ulint           LRU_recent_limit;
        ulint           ibuf_mode;
        ulint           low, high;
        ulint           err;
        ulint           i;
        ulint           buf_read_ahead_random_area;

        /* We have currently disabled random readahead */
        return(0);

        if (srv_startup_is_before_trx_rollback_phase) {
                /* No read-ahead to avoid thread deadlocks */
                return(0);
        }

        if (ibuf_bitmap_page(zip_size, offset)
            || trx_sys_hdr_page(space, offset)) {

                /* If it is an ibuf bitmap page or trx sys hdr, we do
                no read-ahead, as that could break the ibuf page access
                order */

                return(0);
        }

        /* Remember the tablespace version before we ask the tablespace size
        below: if DISCARD + IMPORT changes the actual .ibd file meanwhile, we
        do not try to read outside the bounds of the tablespace! */
        tablespace_version = fil_space_get_version(space);

        buf_read_ahead_random_area = BUF_READ_AHEAD_RANDOM_AREA;

        low  = (offset / buf_read_ahead_random_area)
                * buf_read_ahead_random_area;
        high = (offset / buf_read_ahead_random_area + 1)
                * buf_read_ahead_random_area;

        if (high > fil_space_get_size(space)) {

                high = fil_space_get_size(space);
        }
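
        /* Worked example of the area boundaries computed above (a 64-page
        area is assumed purely for illustration): for offset 135,
        low = (135 / 64) * 64 = 128 and high = 192, i.e. the area
        [128, 192) of page numbers that contains the requested page. */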

        /* Get the minimum LRU_position field value for an initial segment
        of the LRU list, to determine which blocks have recently been added
        to the start of the list. */
        LRU_recent_limit = buf_LRU_get_recent_limit();

        buf_pool_mutex_enter();

        if (buf_pool->n_pend_reads
            > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
                buf_pool_mutex_exit();

                return(0);
        }

        /* Count how many blocks in the area have been recently accessed,
        that is, reside near the start of the LRU list. */

        for (i = low; i < high; i++) {
                const buf_page_t*       bpage = buf_page_hash_get(space, i);

                if (bpage
                    && buf_page_is_accessed(bpage)
                    && (buf_page_get_LRU_position(bpage) > LRU_recent_limit)) {

                        recent_blocks++;

                        if (recent_blocks >= BUF_READ_AHEAD_RANDOM_THRESHOLD) {

                                buf_pool_mutex_exit();
                                goto read_ahead;
                        }
                }
        }

        buf_pool_mutex_exit();
        /* Do nothing */
        return(0);

read_ahead:
        /* Read all the suitable blocks within the area */

        if (ibuf_inside()) {
                ibuf_mode = BUF_READ_IBUF_PAGES_ONLY;
        } else {
                ibuf_mode = BUF_READ_ANY_PAGE;
        }

        count = 0;

        for (i = low; i < high; i++) {
                /* It is only sensible to do read-ahead in the non-sync aio
                mode: hence FALSE as the first parameter */

                if (!ibuf_bitmap_page(zip_size, i)) {
                        count += buf_read_page_low(
                                &err, FALSE,
                                ibuf_mode | OS_AIO_SIMULATED_WAKE_LATER,
                                space, zip_size, FALSE,
                                tablespace_version, i);
                        if (err == DB_TABLESPACE_DELETED) {
                                ut_print_timestamp(stderr);
                                fprintf(stderr,
                                        " InnoDB: Warning: in random"
                                        " readahead trying to access\n"
                                        "InnoDB: tablespace %lu page %lu,\n"
                                        "InnoDB: but the tablespace does not"
                                        " exist or is just being dropped.\n",
                                        (ulong) space, (ulong) i);
                        }
                }
        }

        /* In simulated aio we wake the aio handler threads only after
        queuing all aio requests, in native aio the following call does
        nothing: */

        os_aio_simulated_wake_handler_threads();

#ifdef UNIV_DEBUG
        if (buf_debug_prints && (count > 0)) {
                fprintf(stderr,
                        "Random read-ahead space %lu offset %lu pages %lu\n",
                        (ulong) space, (ulong) offset,
                        (ulong) count);
        }
#endif /* UNIV_DEBUG */

        ++srv_read_ahead_rnd;
        return(count);
}

/********************************************************************//**
High-level function which reads a page asynchronously from a file to the
buffer buf_pool if it is not already there. Sets the io_fix flag and sets
an exclusive lock on the buffer frame. The flag is cleared and the x-lock
released by the i/o-handler thread. Does a random read-ahead if it seems
sensible.
@return number of page read requests issued: this can be greater than
1 if read-ahead occurred */
UNIV_INTERN
ulint
buf_read_page(
/*==========*/
        ulint   space,  /*!< in: space id */
        ulint   zip_size,/*!< in: compressed page size in bytes, or 0 */
        ulint   offset) /*!< in: page number */
{
        ib_int64_t      tablespace_version;
        ulint           count;
        ulint           count2;
        ulint           err;

        tablespace_version = fil_space_get_version(space);

        count = buf_read_ahead_random(space, zip_size, offset);

        /* We do the i/o in the synchronous aio mode to save thread
        switches: hence TRUE */

        count2 = buf_read_page_low(&err, TRUE, BUF_READ_ANY_PAGE, space,
                                   zip_size, FALSE,
                                   tablespace_version, offset);
        srv_buf_pool_reads += count2;
        if (err == DB_TABLESPACE_DELETED) {
                ut_print_timestamp(stderr);
                fprintf(stderr,
                        " InnoDB: Error: trying to access"
                        " tablespace %lu page no. %lu,\n"
                        "InnoDB: but the tablespace does not exist"
                        " or is just being dropped.\n",
                        (ulong) space, (ulong) offset);
        }

        /* Flush pages from the end of the LRU list if necessary */
        buf_flush_free_margin();

        /* Increment number of I/O operations used for LRU policy. */
        buf_LRU_stat_inc_io();

        return(count + count2);
}

/********************************************************************//**
Applies linear read-ahead if in the buf_pool the page is a border page of
a linear read-ahead area and all the pages in the area have been accessed.
Does not read any page if the read-ahead mechanism is not activated. Note
that the algorithm looks at the 'natural' adjacent successor and
predecessor of the page, which on the leaf level of a B-tree are the next
and previous page in the chain of leaves. To know these, the page specified
in (space, offset) must already be present in the buf_pool. Thus, the
natural way to use this function is to call it when a page in the buf_pool
is accessed the first time, calling this function just after it has been
bufferfixed.
NOTE 1: as this function looks at the natural predecessor and successor
fields on the page, what happens if these are not initialized to any
sensible value? No problem, before applying read-ahead we check that the
area to read is within the span of the space, if not, read-ahead is not
applied. An uninitialized value may result in a useless read operation, but
only very improbably.
NOTE 2: the calling thread may own latches on pages: to avoid deadlocks this
function must be written such that it cannot end up waiting for these
latches!
NOTE 3: the calling thread must want access to the page given: this rule is
set to prevent unintended read-aheads performed by ibuf routines, a situation
which could result in a deadlock if the OS does not support asynchronous io.
@return number of page read requests issued */
UNIV_INTERN
ulint
buf_read_ahead_linear(
/*==================*/
        ulint   space,  /*!< in: space id */
        ulint   zip_size,/*!< in: compressed page size in bytes, or 0 */
        ulint   offset) /*!< in: page number of a page; NOTE: the current thread
                        must want access to this page (see NOTE 3 above) */
{
        ib_int64_t      tablespace_version;
        buf_page_t*     bpage;
        buf_frame_t*    frame;
        buf_page_t*     pred_bpage      = NULL;
        ulint           pred_offset;
        ulint           succ_offset;
        ulint           count;
        int             asc_or_desc;
        ulint           new_offset;
        ulint           fail_count;
        ulint           ibuf_mode;
        ulint           low, high;
        ulint           err;
        ulint           i;
        const ulint     buf_read_ahead_linear_area
                = BUF_READ_AHEAD_LINEAR_AREA;
        ulint           threshold;

        if (UNIV_UNLIKELY(srv_startup_is_before_trx_rollback_phase)) {
                /* No read-ahead to avoid thread deadlocks */
                return(0);
        }

        low  = (offset / buf_read_ahead_linear_area)
                * buf_read_ahead_linear_area;
        high = (offset / buf_read_ahead_linear_area + 1)
                * buf_read_ahead_linear_area;

        if ((offset != low) && (offset != high - 1)) {
                /* This is not a border page of the area: return */

                return(0);
        }

        if (ibuf_bitmap_page(zip_size, offset)
            || trx_sys_hdr_page(space, offset)) {

                /* If it is an ibuf bitmap page or trx sys hdr, we do
                no read-ahead, as that could break the ibuf page access
                order */

                return(0);
        }

        /* Remember the tablespace version before we ask the tablespace size
        below: if DISCARD + IMPORT changes the actual .ibd file meanwhile, we
        do not try to read outside the bounds of the tablespace! */
        tablespace_version = fil_space_get_version(space);

        buf_pool_mutex_enter();

        if (high > fil_space_get_size(space)) {
                buf_pool_mutex_exit();
                /* The area is not whole, return */

                return(0);
        }

        if (buf_pool->n_pend_reads
            > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
                buf_pool_mutex_exit();

                return(0);
        }

        /* Check that almost all pages in the area have been accessed; if
        offset == low, the accesses must be in a descending order, otherwise,
        in an ascending order. */

        asc_or_desc = 1;

        if (offset == low) {
                asc_or_desc = -1;
        }

        /* How many out of order accessed pages can we ignore
        when working out the access pattern for linear readahead */
        threshold = ut_min(srv_read_ahead_factor, BUF_READ_AHEAD_AREA);
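
        /* For illustration (the value 8 is an assumption, not taken from
        this code): with srv_read_ahead_factor = 8 and a 64-page area,
        up to 8 pages of the area may be missing from the pool or
        accessed out of order before linear read-ahead is given up
        for this call. */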

        fail_count = 0;

        for (i = low; i < high; i++) {
                bpage = buf_page_hash_get(space, i);

                if ((bpage == NULL) || !buf_page_is_accessed(bpage)) {
                        /* Not accessed */
                        fail_count++;

                } else if (pred_bpage) {
                        int res = (ut_ulint_cmp(
                                           buf_page_get_LRU_position(bpage),
                                           buf_page_get_LRU_position(pred_bpage)));
                        /* Accesses not in the right order */
                        if (res != 0 && res != asc_or_desc) {
                                fail_count++;
                        }
                }

                if (fail_count > threshold) {
                        /* Too many failures: return */

                        buf_pool_mutex_exit();

                        return(0);
                }

                if (bpage && buf_page_is_accessed(bpage)) {
                        pred_bpage = bpage;
                }
        }

        /* If we got this far, we know that enough pages in the area have
        been accessed in the right order: linear read-ahead can be sensible */

        bpage = buf_page_hash_get(space, offset);

        if (bpage == NULL) {
                buf_pool_mutex_exit();

                return(0);
        }

        switch (buf_page_get_state(bpage)) {
        case BUF_BLOCK_ZIP_PAGE:
                frame = bpage->zip.data;
                break;
        case BUF_BLOCK_FILE_PAGE:
                frame = ((buf_block_t*) bpage)->frame;
                break;
        default:
                ut_error;
                break;
        }

        /* Read the natural predecessor and successor page addresses from
        the page; NOTE that because the calling thread may have an x-latch
        on the page, we do not acquire an s-latch on the page, this is to
        prevent deadlocks. Even if we read values which are nonsense, the
        algorithm will work. */

        pred_offset = fil_page_get_prev(frame);
        succ_offset = fil_page_get_next(frame);

        buf_pool_mutex_exit();

        if ((offset == low) && (succ_offset == offset + 1)) {

                /* This is ok, we can continue */
                new_offset = pred_offset;

        } else if ((offset == high - 1) && (pred_offset == offset - 1)) {

                /* This is ok, we can continue */
                new_offset = succ_offset;
        } else {
                /* Successor or predecessor not in the right order */

                return(0);
        }
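
        /* Worked example of the decision above (a 64-page area is assumed
        purely for illustration): suppose offset = 128, so offset == low of
        the area [128, 192), and the accesses in that area were required to
        be descending. If the natural successor stored on the page is 129
        (= offset + 1), the leaf pages run in ascending page-number order
        and the scan is moving towards lower page numbers, so we take
        new_offset = pred_offset (e.g. 127) and consider reading ahead the
        preceding area [64, 128). */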

        low  = (new_offset / buf_read_ahead_linear_area)
                * buf_read_ahead_linear_area;
        high = (new_offset / buf_read_ahead_linear_area + 1)
                * buf_read_ahead_linear_area;

        if ((new_offset != low) && (new_offset != high - 1)) {
                /* This is not a border page of the area: return */

                return(0);
        }

        if (high > fil_space_get_size(space)) {
                /* The area is not whole, return */

                return(0);
        }

        /* If we got this far, read-ahead can be sensible: do it */

        if (ibuf_inside()) {
                ibuf_mode = BUF_READ_IBUF_PAGES_ONLY;
        } else {
                ibuf_mode = BUF_READ_ANY_PAGE;
        }

        count = 0;

        /* Since Windows XP seems to schedule the i/o handler thread
        very eagerly, and consequently it does not wait for the
        full read batch to be posted, we use special heuristics here */

        os_aio_simulated_put_read_threads_to_sleep();

        for (i = low; i < high; i++) {
                /* It is only sensible to do read-ahead in the non-sync
                aio mode: hence FALSE as the first parameter */

                if (!ibuf_bitmap_page(zip_size, i)) {
                        count += buf_read_page_low(
                                &err, FALSE,
                                ibuf_mode | OS_AIO_SIMULATED_WAKE_LATER,
                                space, zip_size, FALSE, tablespace_version, i);
                        if (err == DB_TABLESPACE_DELETED) {
                                ut_print_timestamp(stderr);
                                fprintf(stderr,
                                        " InnoDB: Warning: in"
                                        " linear readahead trying to access\n"
                                        "InnoDB: tablespace %lu page %lu,\n"
                                        "InnoDB: but the tablespace does not"
                                        " exist or is just being dropped.\n",
                                        (ulong) space, (ulong) i);
                        }
                }
        }

        /* In simulated aio we wake the aio handler threads only after
        queuing all aio requests, in native aio the following call does
        nothing: */

        os_aio_simulated_wake_handler_threads();

        /* Flush pages from the end of the LRU list if necessary */
        buf_flush_free_margin();

#ifdef UNIV_DEBUG
        if (buf_debug_prints && (count > 0)) {
                fprintf(stderr,
                        "LINEAR read-ahead space %lu offset %lu pages %lu\n",
                        (ulong) space, (ulong) offset, (ulong) count);
        }
#endif /* UNIV_DEBUG */

        /* Read ahead is considered one I/O operation for the purpose of
        LRU policy decision. */
        buf_LRU_stat_inc_io();

        ++srv_read_ahead_seq;
        return(count);
}

/********************************************************************//**
Issues read requests for pages which the ibuf module wants to read in, in
order to contract the insert buffer tree. Technically, this function is like
a read-ahead function. */
UNIV_INTERN
void
buf_read_ibuf_merge_pages(
/*======================*/
        ibool           sync,           /*!< in: TRUE if the caller
                                        wants this function to wait
                                        for the highest address page
                                        to get read in, before this
                                        function returns */
        const ulint*    space_ids,      /*!< in: array of space ids */
        const ib_int64_t* space_versions,/*!< in: the spaces must have
                                        this version number
                                        (timestamp), otherwise we
                                        discard the read; we use this
                                        to cancel reads if DISCARD +
                                        IMPORT may have changed the
                                        tablespace size */
        const ulint*    page_nos,       /*!< in: array of page numbers
                                        to read, with the highest page
                                        number the last in the
                                        array */
        ulint           n_stored)       /*!< in: number of elements
                                        in the arrays */
{
        ulint   i;

        ut_ad(!ibuf_inside());
#ifdef UNIV_IBUF_DEBUG
        ut_a(n_stored < UNIV_PAGE_SIZE);
#endif
        while (buf_pool->n_pend_reads
               > buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {
                os_thread_sleep(500000);
        }

        for (i = 0; i < n_stored; i++) {
                ulint   zip_size = fil_space_get_zip_size(space_ids[i]);
                ulint   err;

                if (UNIV_UNLIKELY(zip_size == ULINT_UNDEFINED)) {

                        goto tablespace_deleted;
                }

                buf_read_page_low(&err, sync && (i + 1 == n_stored),
                                  BUF_READ_ANY_PAGE, space_ids[i],
                                  zip_size, TRUE, space_versions[i],
                                  page_nos[i]);

                if (UNIV_UNLIKELY(err == DB_TABLESPACE_DELETED)) {
tablespace_deleted:
                        /* We have deleted or are deleting the single-table
                        tablespace: remove the entries for that page */

                        ibuf_merge_or_delete_for_page(NULL, space_ids[i],
                                                      page_nos[i],
                                                      zip_size, FALSE);
                }
        }

        os_aio_simulated_wake_handler_threads();

        /* Flush pages from the end of the LRU list if necessary */
        buf_flush_free_margin();

#ifdef UNIV_DEBUG
        if (buf_debug_prints) {
                fprintf(stderr,
                        "Ibuf merge read-ahead space %lu pages %lu\n",
                        (ulong) space_ids[0], (ulong) n_stored);
        }
#endif /* UNIV_DEBUG */
}

/********************************************************************//**
Issues read requests for pages which recovery wants to read in. */
UNIV_INTERN
void
buf_read_recv_pages(
/*================*/
        ibool           sync,           /*!< in: TRUE if the caller
                                        wants this function to wait
                                        for the highest address page
                                        to get read in, before this
                                        function returns */
        ulint           space,          /*!< in: space id */
        ulint           zip_size,       /*!< in: compressed page size in
                                        bytes, or 0 */
        const ulint*    page_nos,       /*!< in: array of page numbers
                                        to read, with the highest page
                                        number the last in the
                                        array */
        ulint           n_stored)       /*!< in: number of page numbers
                                        in the array */
{
        ib_int64_t      tablespace_version;
        ulint           count;
        ulint           err;
        ulint           i;

        zip_size = fil_space_get_zip_size(space);

        if (UNIV_UNLIKELY(zip_size == ULINT_UNDEFINED)) {
                /* It is a single table tablespace and the .ibd file is
                missing: do nothing */

                return;
        }

        tablespace_version = fil_space_get_version(space);

        for (i = 0; i < n_stored; i++) {

                count = 0;

                os_aio_print_debug = FALSE;

                while (buf_pool->n_pend_reads >= recv_n_pool_free_frames / 2) {

                        os_aio_simulated_wake_handler_threads();
                        os_thread_sleep(500000);

                        count++;

                        if (count > 100) {
                                fprintf(stderr,
                                        "InnoDB: Error: InnoDB has waited for"
                                        " 50 seconds for pending\n"
                                        "InnoDB: reads to the buffer pool to"
                                        " be finished.\n"
                                        "InnoDB: Number of pending reads %lu,"
                                        " pending pread calls %lu\n",
                                        (ulong) buf_pool->n_pend_reads,
                                        (ulong) os_file_n_pending_preads);

                                os_aio_print_debug = TRUE;
                        }
                }

                os_aio_print_debug = FALSE;

                if ((i + 1 == n_stored) && sync) {
                        buf_read_page_low(&err, TRUE, BUF_READ_ANY_PAGE, space,
                                          zip_size, TRUE, tablespace_version,
                                          page_nos[i]);
                } else {
                        buf_read_page_low(&err, FALSE, BUF_READ_ANY_PAGE
                                          | OS_AIO_SIMULATED_WAKE_LATER,
                                          space, zip_size, TRUE,
                                          tablespace_version, page_nos[i]);
                }
        }

        os_aio_simulated_wake_handler_threads();

        /* Flush pages from the end of the LRU list if necessary */
        buf_flush_free_margin();

#ifdef UNIV_DEBUG
        if (buf_debug_prints) {
                fprintf(stderr,
                        "Recovery applies read-ahead pages %lu\n",
                        (ulong) n_stored);
        }
#endif /* UNIV_DEBUG */
}