1. Data
We covered how the log reaches disk in the previous article; today we look at how the data itself reaches disk. Analyzing the data path is more troublesome, but the principle is the same. From the earlier analysis we already know that in MySQL, no matter what kind of data it is, it enters a cache first and is flushed to disk afterwards. What matters most in a database? Data, of course. Whatever 2PC, caches, threads, and so on are involved, the ultimate goal is to keep the data safe: to put it bluntly, to serve all kinds of SQL statements and to support recovery, backup, and database migration. Didn't Jack Ma say the DT era is coming and data is king? That is why the domestic database industry has been booming in recent years.
2. Data writing
Data writing starts after the log for a transaction commit has been handled; the modified pages sit in the buffer pool as cached (dirty) data pages. If something unexpected happens at this point, such as a power failure, how is data safety guaranteed? As mentioned earlier, the redo log takes care of this. Every transaction commit therefore triggers a redo log flush, and if the binlog is enabled, the binlog is flushed as well.
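As a mental model only — this is not InnoDB's actual code, and commit_transaction and FlushPolicy are made-up names — the commit-time ordering can be sketched like this. The FlushPolicy values loosely mirror the documented semantics of innodb_flush_log_at_trx_commit (0/1/2):

#include <cstdio>

// Hypothetical policy mirroring innodb_flush_log_at_trx_commit semantics.
enum class FlushPolicy { None = 0, FlushAndSync = 1, FlushOnly = 2 };

struct LogBuffer {
  void append_redo(const char *rec) { std::printf("redo -> log buffer: %s\n", rec); }
  void write_to_os_cache() { std::printf("log buffer -> OS page cache\n"); }
  void fsync() { std::printf("fsync redo log file\n"); }
};

// Sketch of the commit sequence: redo log first, then (if enabled) binlog.
// The dirty data pages themselves are NOT written here.
void commit_transaction(LogBuffer &redo, FlushPolicy policy, bool binlog_enabled) {
  redo.append_redo("txn changes");
  if (policy != FlushPolicy::None) {
    redo.write_to_os_cache();
    if (policy == FlushPolicy::FlushAndSync) redo.fsync();
  }
  if (binlog_enabled) std::printf("flush binlog (sync_binlog policy applies)\n");
  // Dirty pages stay in the buffer pool; background threads flush them later.
}

int main() {
  LogBuffer redo;
  commit_transaction(redo, FlushPolicy::FlushAndSync, /*binlog_enabled=*/true);
}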
When the transaction is complete (that is, both logs have been flushed successfully), MySQL uses a doublewrite mechanism: the dirty pages in the buffer pool are first written to the doublewrite buffer, which is stored in the shared (system) tablespace. At this point the pages have reached disk, but not yet their real locations in the data files. The purpose is to guard against partially written (torn) dirty pages: together with the redo log, the doublewrite buffer guarantees that data can be recovered safely. (The redo log works at page granularity and cannot by itself repair a torn page; doublewrite is what guarantees an intact copy of the page exists.)
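A minimal sketch of the doublewrite idea — hypothetical types and names, not InnoDB's API — write the page sequentially into the doublewrite area and sync it before writing it to its final location, so a torn page at either step can be repaired:

#include <array>
#include <cstdint>
#include <numeric>
#include <vector>

constexpr size_t kPageSize = 16 * 1024;  // InnoDB default page size
using Page = std::array<uint8_t, kPageSize>;

// Hypothetical storage with a doublewrite area and the data file proper.
struct Storage {
  std::vector<Page> dblwr_area;  // doublewrite pages (shared tablespace)
  std::vector<Page> data_file;   // final page locations
  void fsync() { /* flush OS caches to stable media */ }
};

// Write a dirty page safely: dblwr copy first, fsync, then in-place write.
void flush_page_with_dblwr(Storage &s, size_t page_no, const Page &dirty) {
  s.dblwr_area.push_back(dirty);  // step 1: sequential doublewrite write
  s.fsync();                      // step 2: make the copy durable
  if (s.data_file.size() <= page_no) s.data_file.resize(page_no + 1);
  s.data_file[page_no] = dirty;   // step 3: write to the final location
  s.fsync();
}

// On crash recovery: if a data-file page is torn (bad checksum), restore it
// from the intact dblwr copy, then let redo-log replay bring it up to date.
bool page_checksum_ok(const Page &p) {
  return std::accumulate(p.begin(), p.end(), 0u) != 0;  // stand-in check only
}

int main() {
  Storage s;
  Page p{};
  p[0] = 42;
  flush_page_with_dblwr(s, /*page_no=*/7, p);
  return page_checksum_ok(s.data_file[7]) ? 0 : 1;
}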
Apart from this, database tables have two kinds of indexes: the clustered index and non-clustered (secondary) indexes. The clustered index is easy to handle, since its pages are laid out in key order and writes to it are largely sequential. Secondary indexes are not: updating them on disk for every change would be very inefficient. Anyone with some experience will see where this is going — cache the changes and write them out in batches. Exactly; in MySQL this is called the Insert Buffer (the change buffer in newer versions).
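The idea behind the Insert Buffer can be sketched as follows — hypothetical types only; the real logic lives in ibuf0ibuf.cc. Changes to secondary-index leaf pages that are not in the buffer pool are buffered and merged later, turning many random reads into one:

#include <cstdint>
#include <map>
#include <vector>

using PageId = uint64_t;
struct IndexEntry { uint64_t key; uint64_t row_id; };

// Hypothetical sketch: buffer changes to uncached secondary-index leaf pages
// instead of reading them from disk just to apply a single insert.
struct InsertBufferSketch {
  std::map<PageId, std::vector<IndexEntry>> pending;  // buffered ops per page

  void insert(PageId leaf_page, IndexEntry e, bool page_in_buffer_pool) {
    if (page_in_buffer_pool) {
      apply_to_page(leaf_page, e);      // cheap: page already in memory
    } else {
      pending[leaf_page].push_back(e);  // defer: avoid a random disk read
    }
  }

  // Called when the page is later read in (or by a background merge):
  // replay all buffered entries in one pass.
  void merge(PageId leaf_page) {
    for (const auto &e : pending[leaf_page]) apply_to_page(leaf_page, e);
    pending.erase(leaf_page);
  }

  void apply_to_page(PageId, IndexEntry) { /* modify the B+-tree leaf */ }
};

int main() {
  InsertBufferSketch ibuf;
  ibuf.insert(42, {10, 1}, /*page_in_buffer_pool=*/false);  // buffered
  ibuf.insert(42, {11, 2}, /*page_in_buffer_pool=*/false);  // buffered
  ibuf.merge(42);  // page read in later: both entries applied at once
}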
3. Source code analysis
Let's analyze the source code that exports the InnoDB server status (srv_export_innodb_status):
/** Function to pass InnoDB status variables to MySQL */
void srv_export_innodb_status(void) {
  buf_pool_stat_t stat;
  buf_pools_list_size_t buf_pools_list_size;
  ulint LRU_len;
  ulint free_len;
  ulint flush_list_len;

  buf_get_total_stat(&stat);
  buf_get_total_list_len(&LRU_len, &free_len, &flush_list_len);
  buf_get_total_list_size_in_bytes(&buf_pools_list_size);

  mutex_enter(&srv_innodb_monitor_mutex);

  export_vars.innodb_data_pending_reads = os_n_pending_reads;

  export_vars.innodb_data_pending_writes = os_n_pending_writes;

  export_vars.innodb_data_pending_fsyncs =
      fil_n_pending_log_flushes + fil_n_pending_tablespace_flushes;

  export_vars.innodb_data_fsyncs = os_n_fsyncs;

  export_vars.innodb_data_read = srv_stats.data_read;

  export_vars.innodb_data_reads = os_n_file_reads;

  export_vars.innodb_data_writes = os_n_file_writes;

  export_vars.innodb_data_written = srv_stats.data_written;

  export_vars.innodb_buffer_pool_read_requests =
      Counter::total(stat.m_n_page_gets);

  export_vars.innodb_buffer_pool_write_requests =
      srv_stats.buf_pool_write_requests;

  export_vars.innodb_buffer_pool_wait_free = srv_stats.buf_pool_wait_free;

  export_vars.innodb_buffer_pool_pages_flushed = srv_stats.buf_pool_flushed;

  export_vars.innodb_buffer_pool_reads = srv_stats.buf_pool_reads;

  export_vars.innodb_buffer_pool_read_ahead_rnd = stat.n_ra_pages_read_rnd;

  export_vars.innodb_buffer_pool_read_ahead = stat.n_ra_pages_read;

  export_vars.innodb_buffer_pool_read_ahead_evicted = stat.n_ra_pages_evicted;

  export_vars.innodb_buffer_pool_pages_data = LRU_len;

  export_vars.innodb_buffer_pool_bytes_data =
      buf_pools_list_size.LRU_bytes + buf_pools_list_size.unzip_LRU_bytes;

  export_vars.innodb_buffer_pool_pages_dirty = flush_list_len;

  export_vars.innodb_buffer_pool_bytes_dirty =
      buf_pools_list_size.flush_list_bytes;

  export_vars.innodb_buffer_pool_pages_free = free_len;

#ifdef UNIV_DEBUG
  export_vars.innodb_buffer_pool_pages_latched = buf_get_latched_pages_number();
#endif /* UNIV_DEBUG */
  export_vars.innodb_buffer_pool_pages_total = buf_pool_get_n_pages();

  export_vars.innodb_buffer_pool_pages_misc =
      buf_pool_get_n_pages() - LRU_len - free_len;

  export_vars.innodb_page_size = UNIV_PAGE_SIZE;

  export_vars.innodb_log_waits = srv_stats.log_waits;

  export_vars.innodb_os_log_written = srv_stats.os_log_written;

  export_vars.innodb_os_log_fsyncs = fil_n_log_flushes;

  export_vars.innodb_os_log_pending_fsyncs = fil_n_pending_log_flushes;

  export_vars.innodb_os_log_pending_writes = srv_stats.os_log_pending_writes;

  export_vars.innodb_log_write_requests = srv_stats.log_write_requests;

  export_vars.innodb_log_writes = srv_stats.log_writes;

  // Focus on these two variables:
  // monitors the number of pages written via DOUBLE WRITE
  export_vars.innodb_dblwr_pages_written = srv_stats.dblwr_pages_written;

  // monitors the number of DOUBLE WRITE write operations
  export_vars.innodb_dblwr_writes = srv_stats.dblwr_writes;

  export_vars.innodb_pages_created = stat.n_pages_created;

  export_vars.innodb_pages_read = stat.n_pages_read;

  export_vars.innodb_pages_written = stat.n_pages_written;

  export_vars.innodb_redo_log_enabled = srv_redo_log;

  export_vars.innodb_row_lock_waits = srv_stats.n_lock_wait_count;

  export_vars.innodb_row_lock_current_waits =
      srv_stats.n_lock_wait_current_count;

  export_vars.innodb_row_lock_time = srv_stats.n_lock_wait_time / 1000;

  ......

  mutex_exit(&srv_innodb_monitor_mutex);
}
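These export_vars fields surface as server status counters, so the two doublewrite variables highlighted above can be watched at runtime with SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%', which reports Innodb_dblwr_pages_written and Innodb_dblwr_writes.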
Let's take another look at the DoubleWrite process. Several functions can initiate a page write — buf_flush_page_try, buf_flush_try_neighbors, and buf_flush_single_page_from_LRU — and they all end up calling the same function, buf_flush_page:
ibool buf_flush_page(buf_pool_t *buf_pool, buf_page_t *bpage,
                     buf_flush_t flush_type, bool sync) {
  BPageMutex *block_mutex;

  ut_ad(flush_type < BUF_FLUSH_N_TYPES);
  /* Hold the LRU list mutex iff called for a single page LRU flush. A single
  page LRU flush is already non-performant, and holding the LRU list mutex
  allows us to avoid having to store the previous LRU list page or to restart
  the LRU scan in buf_flush_single_page_from_LRU(). */
  ut_ad(flush_type == BUF_FLUSH_SINGLE_PAGE ||
        !mutex_own(&buf_pool->LRU_list_mutex));
  ut_ad(flush_type != BUF_FLUSH_SINGLE_PAGE ||
        mutex_own(&buf_pool->LRU_list_mutex));
  ut_ad(buf_page_in_file(bpage));
  ut_ad(!sync || flush_type == BUF_FLUSH_SINGLE_PAGE);

  block_mutex = buf_page_get_mutex(bpage);
  ut_ad(mutex_own(block_mutex));

  ut_ad(buf_flush_ready_for_flush(bpage, flush_type));

  bool is_uncompressed;

  is_uncompressed = (buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
  ut_ad(is_uncompressed == (block_mutex != &buf_pool->zip_mutex));

  ibool flush;
  rw_lock_t *rw_lock = nullptr;
  bool no_fix_count = bpage->buf_fix_count == 0;

  if (!is_uncompressed) {
    flush = TRUE;
    rw_lock = nullptr;
  } else if (!(no_fix_count || flush_type == BUF_FLUSH_LIST) ||
             (!no_fix_count &&
              srv_shutdown_state.load() < SRV_SHUTDOWN_FLUSH_PHASE &&
              fsp_is_system_temporary(bpage->id.space()))) {
    /* This is a heuristic, to avoid expensive SX attempts. */
    /* For table residing in temporary tablespace sync is done using IO_FIX
    and so before scheduling for flush ensure that page is not fixed. */
    flush = FALSE;
  } else {
    rw_lock = &reinterpret_cast<buf_block_t *>(bpage)->lock;
    if (flush_type != BUF_FLUSH_LIST) {
      flush = rw_lock_sx_lock_nowait(rw_lock, BUF_IO_WRITE);
    } else {
      /* Will SX lock later */
      flush = TRUE;
    }
  }

  if (flush) {
    /* We are committed to flushing by the time we get here */

    mutex_enter(&buf_pool->flush_state_mutex);

    buf_page_set_io_fix(bpage, BUF_IO_WRITE);

    buf_page_set_flush_type(bpage, flush_type);

    if (buf_pool->n_flush[flush_type] == 0) {
      os_event_reset(buf_pool->no_flush[flush_type]);
    }

    ++buf_pool->n_flush[flush_type];

    if (bpage->get_oldest_lsn() > buf_pool->max_lsn_io) {
      buf_pool->max_lsn_io = bpage->get_oldest_lsn();
    }

    if (!fsp_is_system_temporary(bpage->id.space()) &&
        buf_pool->track_page_lsn != LSN_MAX) {
      auto frame = bpage->zip.data;

      if (frame == nullptr) {
        frame = ((buf_block_t *)bpage)->frame;
      }
      lsn_t frame_lsn = mach_read_from_8(frame + FIL_PAGE_LSN);

      arch_page_sys->track_page(bpage, buf_pool->track_page_lsn, frame_lsn,
                                false);
    }

    mutex_exit(&buf_pool->flush_state_mutex);

    mutex_exit(block_mutex);

    if (flush_type == BUF_FLUSH_SINGLE_PAGE) {
      mutex_exit(&buf_pool->LRU_list_mutex);
    }

    if (flush_type == BUF_FLUSH_LIST && is_uncompressed &&
        !rw_lock_sx_lock_nowait(rw_lock, BUF_IO_WRITE)) {
      if (!fsp_is_system_temporary(bpage->id.space()) && dblwr::enabled) {
        dblwr::force_flush(flush_type, buf_pool_index(buf_pool));
      } else {
        buf_flush_sync_datafiles();
      }

      rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE);
    }

    /* If there is an observer that wants to know if the asynchronous
    flushing was sent then notify it.
    Note: we set flush observer to a page with x-latch, so we can guarantee
    that notify_flush and notify_remove are called in pair with s-latch on a
    uncompressed page. */
    if (bpage->get_flush_observer() != nullptr) {
      bpage->get_flush_observer()->notify_flush(buf_pool, bpage);
    }

    /* Even though bpage is not protected by any mutex at this point, it is
    safe to access bpage, because it is io_fixed and oldest_modification != 0.
    Thus, it cannot be relocated in the buffer pool or removed from
    flush_list or LRU_list. */

    buf_flush_write_block_low(bpage, flush_type, sync);
  }

  return (flush);
}
Several flush calls appear on this path: dblwr::force_flush, buf_flush_sync_datafiles, and buf_flush_write_block_low. At the same time, if an observer is waiting for this action, bpage->get_flush_observer()->notify_flush(buf_pool, bpage) is called to notify it.
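The observer hook is a plain observer pattern. A minimal, hypothetical stand-in for InnoDB's Flush_observer (the real class also tracks removals and supports waiting on events) might look like this:

#include <atomic>

struct buf_pool_t;  // opaque here
struct buf_page_t;  // opaque here

// Hypothetical, simplified sketch: code that schedules async flushes
// registers an observer on the page, and the flush path calls notify_flush()
// once the write has been posted, so the caller can count completions.
class FlushObserverSketch {
 public:
  void notify_flush(buf_pool_t *, buf_page_t *) {
    m_flushed.fetch_add(1, std::memory_order_relaxed);
  }
  unsigned flushed() const { return m_flushed.load(std::memory_order_relaxed); }

 private:
  std::atomic<unsigned> m_flushed{0};
};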
Let's look at the relevant source code (only part of it is listed):
void force_flush(buf_flush_t flush_type) noexcept {
  for (;;) {
    mutex_enter(&m_mutex);

    if (!m_buf_pages.empty() && !flush_to_disk(flush_type)) {
      ut_ad(!mutex_own(&m_mutex));
      continue;
    }

    break;
  }

  mutex_exit(&m_mutex);
}

bool flush_to_disk(buf_flush_t flush_type) noexcept {
  ut_ad(mutex_own(&m_mutex));

  /* Wait for any batch writes that are in progress. */
  if (wait_for_pending_batch()) {
    ut_ad(!mutex_own(&m_mutex));
    return false;
  }

  MONITOR_INC(MONITOR_DBLWR_FLUSH_REQUESTS);

  /* Write the pages to disk and free up the buffer. */
  write_pages(flush_type);

  ut_a(m_buffer.empty());
  ut_a(m_buf_pages.empty());

  return true;
}

void Double_write::write_pages(buf_flush_t flush_type) noexcept {
  ut_ad(mutex_own(&m_mutex));
  ut_a(!m_buffer.empty());

  Batch_segment *batch_segment{};

  auto segments = flush_type == BUF_FLUSH_LRU ? s_LRU_batch_segments
                                              : s_flush_list_batch_segments;

  while (!segments->dequeue(batch_segment)) {
    std::this_thread::yield();
  }

  batch_segment->start(this);

  // Write the batched pages to the doublewrite file
  batch_segment->write(m_buffer);

  m_buffer.clear();

#ifndef _WIN32
  if (is_fsync_required()) {
    batch_segment->flush();
  }
#endif /* !_WIN32 */

  batch_segment->set_batch_size(m_buf_pages.size());

  for (uint32_t i = 0; i < m_buf_pages.size(); ++i) {
    const auto bpage = std::get<0>(m_buf_pages.m_pages[i]);

    ut_d(auto page_id = bpage->id);

    bpage->set_dblwr_batch_id(batch_segment->id());

    ut_d(bpage->take_io_responsibility());
    auto err =
        write_to_datafile(bpage, false, std::get<1>(m_buf_pages.m_pages[i]),
                          std::get<2>(m_buf_pages.m_pages[i]));

    if (err == DB_PAGE_IS_STALE || err == DB_TABLESPACE_DELETED) {
      write_complete(bpage, flush_type);
      buf_page_free_stale_during_write(
          bpage, buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);

      const file::Block *block = std::get<1>(m_buf_pages.m_pages[i]);
      if (block != nullptr) {
        os_free_block(const_cast<file::Block *>(block));
      }
    } else {
      ut_a(err == DB_SUCCESS);
    }
    /* We don't hold io_responsibility here no matter which path through ifs
    and elses we've got here, but we can't assert:
    ut_ad(!bpage->current_thread_has_io_responsibility());
    because bpage could be freed by the time we got here. */

#ifdef UNIV_DEBUG
    if (dblwr::Force_crash == page_id) {
      DBUG_SUICIDE();
    }
#endif /* UNIV_DEBUG */
  }

  srv_stats.dblwr_writes.inc();

  m_buf_pages.clear();

  os_aio_simulated_wake_handler_threads();
}

dberr_t os_file_write_retry(IORequest &type, const char *name,
                            pfs_os_file_t file, const void *buf,
                            os_offset_t offset, ulint n) {
  dberr_t err;
  for (;;) {
    err = os_file_write(type, name, file, buf, offset, n);

    if (err == DB_SUCCESS || err == DB_TABLESPACE_DELETED) {
      break;
    } else if (err == DB_IO_ERROR) {
      ib::error(ER_INNODB_IO_WRITE_ERROR_RETRYING, name);
      std::chrono::seconds ten(10);
      std::this_thread::sleep_for(ten);
      continue;
    } else {
      ib::fatal(ER_INNODB_IO_WRITE_FAILED, name);
    }
  }
  return err;
}

dberr_t os_file_write_func(IORequest &type, const char *name, os_file_t file,
                           const void *buf, os_offset_t offset, ulint n) {
  ut_ad(type.validate());
  ut_ad(type.is_write());

  /* We never compress the first page.
  Note: This assumes we always do block IO. */
  if (offset == 0) {
    type.clear_compressed();
  }

  const byte *ptr = reinterpret_cast<const byte *>(buf);

  return os_file_write_page(type, name, file, ptr, offset, n,
                            type.get_encrypted_block());
}
Two macros need to be noted here:
#define os_file_write(type, name, file, buf, offset, n) \
  os_file_write_pfs(type, name, file, buf, offset, n)

#define os_file_write_pfs(type, name, file, buf, offset, n) \
  os_file_write_func(type, name, file, buf, offset, n)
So os_file_write ultimately resolves to os_file_write_func (the _pfs layer exists for Performance Schema instrumentation), which in turn calls os_file_write_page to write the page to disk. Other details, such as tablespace handling, can be understood by reading the other write functions.
Of course, in some cases pages are flushed to disk directly without going through DOUBLE WRITE: either the option is turned off (innodb_doublewrite = OFF), or the operation simply does not need it, such as DROP TABLE.
Next, let's look at how the data reaches the data file:
// write_to_datafile is called from the write_pages function above.

/** Writes a page that has already been written to the doublewrite buffer
to the data file. It is the job of the caller to sync the datafile.
@param[in] in_bpage Page to write.
@param[in] sync true if it's a synchronous write.
@param[in] e_block block containing encrypted data frame.
@param[in] e_len encrypted data length.
@return DB_SUCCESS or error code */
static dberr_t write_to_datafile(const buf_page_t *in_bpage, bool sync,
                                 const file::Block *e_block,
                                 uint32_t e_len) noexcept
    MY_ATTRIBUTE((warn_unused_result));

dberr_t Double_write::write_to_datafile(const buf_page_t *in_bpage, bool sync,
                                        const file::Block *e_block,
                                        uint32_t e_len) noexcept {
  ut_ad(buf_page_in_file(in_bpage));
  ut_ad(in_bpage->current_thread_has_io_responsibility());
  ut_ad(in_bpage->is_io_fix_write());
  uint32_t len;
  void *frame{};

  if (e_block == nullptr) {
    Double_write::prepare(in_bpage, &frame, &len);
  } else {
    frame = os_block_get_frame(e_block);
    len = e_len;
  }

  /* Our IO API is common for both reads and writes and is therefore geared
  towards a non-const parameter. */
  auto bpage = const_cast<buf_page_t *>(in_bpage);

  uint32_t type = IORequest::WRITE;

  if (sync) {
    type |= IORequest::DO_NOT_WAKE;
  }

  IORequest io_request(type);
  io_request.set_encrypted_block(e_block);

#ifdef UNIV_DEBUG
  {
    byte *page = static_cast<byte *>(frame);
    ut_ad(mach_read_from_4(page + FIL_PAGE_OFFSET) == bpage->page_no());
    ut_ad(mach_read_from_4(page + FIL_PAGE_SPACE_ID) == bpage->space());
  }
#endif /* UNIV_DEBUG */

  auto err =
      fil_io(io_request, sync, bpage->id, bpage->size, 0, len, frame, bpage);

  /* When a tablespace is deleted with BUF_REMOVE_NONE, fil_io() might
  return DB_PAGE_IS_STALE or DB_TABLESPACE_DELETED. */
  ut_a(err == DB_SUCCESS || err == DB_TABLESPACE_DELETED ||
       err == DB_PAGE_IS_STALE);

  return err;
}

dberr_t fil_io(const IORequest &type, bool sync, const page_id_t &page_id,
               const page_size_t &page_size, ulint byte_offset, ulint len,
               void *buf, void *message) {
  auto shard = fil_system->shard_by_id(page_id.space());
#ifdef UNIV_DEBUG
  if (!sync) {
    /* In case of async io we transfer the io responsibility to the thread
    which will perform the io completion routine. */
    static_cast<buf_page_t *>(message)->release_io_responsibility();
  }
#endif

  auto const err = shard->do_io(type, sync, page_id, page_size, byte_offset,
                                len, buf, message);
#ifdef UNIV_DEBUG
  /* If the error prevented async io, then we haven't actually transfered the
  io responsibility at all, so we revert the debug io responsibility info. */
  if (err != DB_SUCCESS && !sync) {
    static_cast<buf_page_t *>(message)->take_io_responsibility();
  }
#endif
  return err;
}
Now take a look at the function called when DOUBLE WRITE is not used:
// This path flushes data files directly, bypassing the doublewrite buffer.
// It is typically used when: 1) a DML touches a huge amount of data,
// 2) torn-page protection is not needed for the data, or 3) the write load
// is too heavy to afford the extra doublewrite pass.

/** Flush a batch of writes to the datafiles that have already been written
to the dblwr buffer on disk. */
static void buf_flush_sync_datafiles() {
  /* Wake possible simulated AIO thread to actually post the writes to the
  operating system */
  os_aio_simulated_wake_handler_threads();

  /* Wait that all async writes to tablespaces have been posted to the OS */
  os_aio_wait_until_no_pending_writes();

  /* Now we flush the data to disk (for example, with fsync) */
  fil_flush_file_spaces(FIL_TYPE_TABLESPACE);
}

/** Flush to disk the writes in file spaces of the given type possibly cached
by the OS.
@param[in] purpose FIL_TYPE_TABLESPACE or FIL_TYPE_LOG, can be ORred. */
void fil_flush_file_spaces(uint8_t purpose) {
  fil_system->flush_file_spaces(purpose);
}
The code above is actually very simple: wake the simulated AIO handler threads, wait until all pending writes have been posted to the OS, then flush the files to disk. Finally, let's look at the asynchronous write path:
/** Does an asynchronous write of a buffer page.
@param[in] bpage buffer block to write
@param[in] flush_type type of flush
@param[in] sync true if sync IO request */
static void buf_flush_write_block_low(buf_page_t *bpage,
                                      buf_flush_t flush_type, bool sync) {
  page_t *frame = nullptr;

#ifdef UNIV_DEBUG
  buf_pool_t *buf_pool = buf_pool_from_bpage(bpage);
  ut_ad(!mutex_own(&buf_pool->LRU_list_mutex));
#endif /* UNIV_DEBUG */

  DBUG_PRINT("ib_buf", ("flush %s %u page " UINT32PF ":" UINT32PF,
                        sync ? "sync" : "async", (unsigned)flush_type,
                        bpage->id.space(), bpage->id.page_no()));

  ut_ad(buf_page_in_file(bpage));

  /* We are not holding block_mutex here. Nevertheless, it is safe to access
  bpage, because it is io_fixed and oldest_modification != 0. Thus, it
  cannot be relocated in the buffer pool or removed from flush_list or
  LRU_list. */
  ut_ad(!buf_flush_list_mutex_own(buf_pool));
  ut_ad(!buf_page_get_mutex(bpage)->is_owned());
  ut_ad(bpage->is_io_fix_write());
  ut_ad(bpage->is_dirty());

#ifdef UNIV_IBUF_COUNT_DEBUG
  ut_a(ibuf_count_get(bpage->id) == 0);
#endif /* UNIV_IBUF_COUNT_DEBUG */

  ut_ad(recv_recovery_is_on() || bpage->get_newest_lsn() != 0);

  /* Force the log to the disk before writing the modified block */
  if (!srv_read_only_mode) {
    const lsn_t flush_to_lsn = bpage->get_newest_lsn();

    /* Do the check before calling log_write_up_to() because in most cases
    it would allow to avoid call, and because of that we don't want those
    calls because they would have bad impact on the counter of calls, which
    is monitored to save CPU on spinning in log threads. */
    if (log_sys->flushed_to_disk_lsn.load() < flush_to_lsn) {
      Wait_stats wait_stats;

      wait_stats = log_write_up_to(*log_sys, flush_to_lsn, true);

      MONITOR_INC_WAIT_STATS_EX(MONITOR_ON_LOG_, _PAGE_WRITTEN, wait_stats);
    }
  }

  DBUG_EXECUTE_IF("log_first_rec_group_test", {
    recv_no_ibuf_operations = false;
    const lsn_t end_lsn = mtr_commit_mlog_test(*log_sys);
    log_write_up_to(*log_sys, end_lsn, true);
    DBUG_SUICIDE();
  });

  switch (buf_page_get_state(bpage)) {
    case BUF_BLOCK_POOL_WATCH:
    case BUF_BLOCK_ZIP_PAGE: /* The page should be dirty. */
    case BUF_BLOCK_NOT_USED:
    case BUF_BLOCK_READY_FOR_USE:
    case BUF_BLOCK_MEMORY:
    case BUF_BLOCK_REMOVE_HASH:
      ut_error;
      break;
    case BUF_BLOCK_ZIP_DIRTY: {
      frame = bpage->zip.data;
      BlockReporter reporter =
          BlockReporter(false, frame, bpage->size,
                        fsp_is_checksum_disabled(bpage->id.space()));

      mach_write_to_8(frame + FIL_PAGE_LSN, bpage->get_newest_lsn());

      ut_a(reporter.verify_zip_checksum());
      break;
    }
    case BUF_BLOCK_FILE_PAGE:
      frame = bpage->zip.data;
      if (!frame) {
        frame = ((buf_block_t *)bpage)->frame;
      }

      buf_flush_init_for_writing(
          reinterpret_cast<const buf_block_t *>(bpage),
          reinterpret_cast<const buf_block_t *>(bpage)->frame,
          bpage->zip.data ? &bpage->zip : nullptr, bpage->get_newest_lsn(),
          fsp_is_checksum_disabled(bpage->id.space()),
          false /* do not skip lsn check */);
      break;
  }

  dberr_t err = dblwr::write(flush_type, bpage, sync);

  ut_a(err == DB_SUCCESS || err == DB_TABLESPACE_DELETED);

  /* Increment the counter of I/O operations used
  for selecting LRU policy. */
  buf_LRU_stat_inc_io();
}

dberr_t dblwr::write(buf_flush_t flush_type, buf_page_t *bpage,
                     bool sync) noexcept {
  dberr_t err;
  const space_id_t space_id = bpage->id.space();

  ut_ad(bpage->current_thread_has_io_responsibility());
  /* This is not required for correctness, but it aborts the processing
  early. */
  if (bpage->was_stale()) {
    /* Disable batch completion in write_complete(). */
    bpage->set_dblwr_batch_id(std::numeric_limits<uint16_t>::max());
    buf_page_free_stale_during_write(
        bpage, buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
    /* We don't hold io_responsibility here no matter which path through ifs
    and elses we've got here, but we can't assert:
    ut_ad(!bpage->current_thread_has_io_responsibility());
    because bpage could be freed by the time we got here. */
    return DB_SUCCESS;
  }

  if (srv_read_only_mode || fsp_is_system_temporary(space_id) ||
      !dblwr::enabled || Double_write::s_instances == nullptr ||
      mtr_t::s_logging.dblwr_disabled()) {
    /* Skip the double-write buffer since it is not needed. Temporary
    tablespaces are never recovered, therefore we don't care about
    torn writes. */
    bpage->set_dblwr_batch_id(std::numeric_limits<uint16_t>::max());
    err = Double_write::write_to_datafile(bpage, sync, nullptr, 0);
    if (err == DB_PAGE_IS_STALE || err == DB_TABLESPACE_DELETED) {
      buf_page_free_stale_during_write(
          bpage, buf_page_get_state(bpage) == BUF_BLOCK_FILE_PAGE);
      err = DB_SUCCESS;
    } else if (sync) {
      ut_ad(flush_type == BUF_FLUSH_LRU ||
            flush_type == BUF_FLUSH_SINGLE_PAGE);

      if (err == DB_SUCCESS) {
        fil_flush(space_id);
      }
      /* true means we want to evict this page from the LRU list as well. */
      buf_page_io_complete(bpage, true);
    }
  } else {
    ut_d(auto page_id = bpage->id);

    /* Encrypt the page here, so that the same encrypted contents are written
    to the dblwr file and the data file. */
    uint32_t e_len{};
    file::Block *e_block = dblwr::get_encrypted_frame(bpage, e_len);

    if (!sync && flush_type != BUF_FLUSH_SINGLE_PAGE) {
      MONITOR_INC(MONITOR_DBLWR_ASYNC_REQUESTS);

      ut_d(bpage->release_io_responsibility());
      Double_write::submit(flush_type, bpage, e_block, e_len);
      err = DB_SUCCESS;
#ifdef UNIV_DEBUG
      if (dblwr::Force_crash == page_id) {
        force_flush(flush_type, buf_pool_index(buf_pool_from_bpage(bpage)));
      }
#endif /* UNIV_DEBUG */
    } else {
      MONITOR_INC(MONITOR_DBLWR_SYNC_REQUESTS);
      /* Disable batch completion in write_complete(). */
      bpage->set_dblwr_batch_id(std::numeric_limits<uint16_t>::max());
      err = Double_write::sync_page_flush(bpage, e_block, e_len);
    }
  }
  /* We don't hold io_responsibility here no matter which path through ifs
  and elses we've got here, but we can't assert:
  ut_ad(!bpage->current_thread_has_io_responsibility());
  because bpage could be freed by the time we got here. */
  return err;
}
dblwr::write eventually reaches force_flush as well, so the different code paths stay consistent.
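To recap the write path as shown in the code above:

buf_flush_page -> buf_flush_write_block_low -> dblwr::write
  -> (doublewrite skipped) Double_write::write_to_datafile -> fil_io
  -> (async batch) Double_write::submit, later force_flush -> flush_to_disk -> write_pages -> write_to_datafile -> fil_io
  -> (sync) Double_write::sync_page_flush

with buf_flush_sync_datafiles handling the wake/wait/fsync sequence when doublewrite is not used.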
4. Summary
The code is tangled enough to make your head spin, but fortunately the main flow can be made out clearly.
Keep at it, young man!