MDEV-21452 fixup: Fix fake server hang reports

srv_monitor_task(): Make the innodb_fatal_semaphore_wait_threshold
watchdog tolerate a non-monotonic clock. On NUMA systems, the values of
my_hrtime_coarse() observed on different NUMA nodes are not in sync,
and the clock could appear to run backwards. We must treat negative
time durations as zero, just like we did in
commit ff5d306e29 in
dict_sys_t::mutex_lock_wait().

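The wrong logic caused occasional crashes of the test
mariabackup.apply-log-only-incr when many instances of it were run
concurrently: if now was smaller than start, the unsigned subtraction
now - start could wrap around to a huge value, and the watchdog would
report a fatal hang that had not occurred.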
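
The clamping described in the message can be illustrated in isolation. This is a minimal sketch, not code from the tree: the helper names elapsed_seconds_clamped() and elapsed_seconds_unguarded() are hypothetical, and the timestamps are assumed to be in microseconds, as the division by 1000000 in the patch suggests. The patch itself skips the check when now < start instead of computing a zero duration, which has a similar effect for the fatal check.

```cpp
#include <cstdint>

// Hypothetical helper (not part of InnoDB): whole seconds elapsed between
// two microsecond timestamps. A clock that appears to have run backwards
// (now_us < start_us) is treated as zero elapsed time.
constexpr std::uint64_t elapsed_seconds_clamped(std::uint64_t now_us,
                                                std::uint64_t start_us)
{
  return now_us >= start_us ? (now_us - start_us) / 1000000 : 0;
}

// The same computation without the guard. Unsigned subtraction wraps
// around, so a small backwards step turns into an enormous duration.
constexpr std::uint64_t elapsed_seconds_unguarded(std::uint64_t now_us,
                                                  std::uint64_t start_us)
{
  return (now_us - start_us) / 1000000;
}

// Compile-time checks of both behaviours.
static_assert(elapsed_seconds_clamped(5000000, 2000000) == 3,
              "forward clock: 3 seconds");
static_assert(elapsed_seconds_clamped(2000000, 5000000) == 0,
              "backward clock: clamped to zero");
static_assert(elapsed_seconds_unguarded(2000000, 5000000) > 600,
              "backward clock: wrapped value exceeds a 600 s threshold");
```

With the unguarded variant, a start timestamp only 3 seconds ahead of now already exceeds any realistic threshold, which matches the spurious hang reports described above.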
pull/1727/head
Marko Mäkelä 5 years ago
parent
commit
8c68b54981
storage/innobase/srv/srv0srv.cc | 23

@@ -1346,20 +1346,23 @@ void srv_monitor_task(void*)
 	eviction policy. */
 	buf_LRU_stat_update();
-	const ulonglong now = my_hrtime_coarse().val;
+	ulonglong now = my_hrtime_coarse().val;
 	const ulong threshold = srv_fatal_semaphore_wait_threshold;
 	if (ulonglong start = dict_sys.oldest_wait()) {
-		ulong waited = static_cast<ulong>((now - start) / 1000000);
-		if (waited >= threshold) {
-			ib::fatal() << dict_sys.fatal_msg;
-		}
+		if (now >= start) {
+			now -= start;
+			ulong waited = static_cast<ulong>(now / 1000000);
+			if (waited >= threshold) {
+				ib::fatal() << dict_sys.fatal_msg;
+			}
-		if (waited == threshold / 4
-		    || waited == threshold / 2
-		    || waited == threshold / 4 * 3) {
-			ib::warn() << "Long wait (" << waited
-				   << " seconds) for dict_sys.mutex";
+			if (waited == threshold / 4
+			    || waited == threshold / 2
+			    || waited == threshold / 4 * 3) {
+				ib::warn() << "Long wait (" << waited
+					   << " seconds) for dict_sys.mutex";
+			}
 		}
 	}
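
For review purposes, the decision made by the patched hunk can be restated as a stand-alone function. This is a sketch under assumptions, not the InnoDB code: WaitAction and classify_wait() are hypothetical names, a start value of 0 is taken to mean "nobody is waiting" (mirroring the `if (ulonglong start = ...)` test), and it assumes C++14 or later for the multi-statement constexpr function.

```cpp
#include <cstdint>

// Hypothetical result type: what the watchdog would do for one sample.
enum class WaitAction { None, Warn, Fatal };

// Hypothetical restatement of the patched check. Timestamps are in
// microseconds, the threshold is in seconds.
constexpr WaitAction classify_wait(std::uint64_t now_us,
                                   std::uint64_t start_us,
                                   unsigned long threshold_s)
{
  // No waiter, or the clock appears to have run backwards: do nothing.
  if (start_us == 0 || now_us < start_us) {
    return WaitAction::None;
  }
  const unsigned long waited =
      static_cast<unsigned long>((now_us - start_us) / 1000000);
  if (waited >= threshold_s) {
    return WaitAction::Fatal;
  }
  // Warn at one quarter, one half and three quarters of the threshold.
  if (waited == threshold_s / 4
      || waited == threshold_s / 2
      || waited == threshold_s / 4 * 3) {
    return WaitAction::Warn;
  }
  return WaitAction::None;
}
```

Assuming a threshold of 600 seconds, waits of 150, 300 and 450 seconds yield Warn, a wait of 600 seconds or more yields Fatal, and a sample where now is smaller than start yields None instead of a spurious Fatal.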
