involflt-0.1.0/0000755000000000000000000000000014467303212012045 5ustar rootrootinvolflt-0.1.0/SECURITY.md0000644000000000000000000000515014467303177013651 0ustar rootroot## Security Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below. ## Reporting Security Issues **Please do not report security vulnerabilities through public GitHub issues.** Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report). If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey). You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) * Full paths of source file(s) related to the manifestation of the issue * The location of the affected source code (tag/branch/commit or direct URL) * Any special configuration required to reproduce the issue * Step-by-step instructions to reproduce the issue * Proof-of-concept or exploit code (if possible) * Impact of the issue, including how an attacker might exploit the issue This information will help us triage your report more quickly. If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs. ## Preferred Languages We prefer all communications to be in English. ## Policy Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd). involflt-0.1.0/src/0000755000000000000000000000000014467303177012646 5ustar rootrootinvolflt-0.1.0/src/data-mode.h0000755000000000000000000000641714467303177014665 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : data-mode.h * * Description: Header file used by data mode implementation */ #ifndef LINVOLFLT_DATA_MODE_H #define LINVOLFLT_DATA_MODE_H struct _target_context; struct _write_metadata_tag; struct _change_node; struct _data_page; #define MIN_DATA_SZ_PER_CHANGE_NODE (1*1024*1024) /* 1MB */ #define DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE (4*1024*1024) /* 4MB */ #define MAX_DATA_SZ_PER_CHANGE_NODE (64*1024*1024) /* 64MB */ #define SECTOR_SIZE_MASK 0xFFFFFE00 /* struct for writedata information*/ struct inm_writedata { void *wd_iovp; inm_u32_t wd_iovcnt; inm_u32_t wd_cplen; inm_u32_t wd_flag; inm_u32_t wd_resrved; void *wd_privp; void (*wd_copy_wd_to_datapgs)(struct _target_context *, struct inm_writedata *, struct inm_list_head *); struct _change_node *wd_chg_node; inm_page_t *wd_meta_page; }; typedef struct inm_writedata inm_wdata_t; #define INM_WD_WRITE_OFFLOAD 0x1 /* This structure holds data mode filtering context. This structure will be * initialized during data mode initialization and the pointer is stored in * the driver context. */ typedef struct _data_flt { /* Lock to synchronize data_pages list. */ inm_spinlock_t data_pages_lock; /* List of free data pages */ struct inm_list_head data_pages_head; /* Cumulative pages allocated for data mode filtering. */ inm_u32_t pages_allocated; /* Current number of free pages available */ inm_u32_t pages_free; inm_u32_t dp_nrpgs_slab; inm_u32_t dp_least_free_pgs; inm_s32_t dp_pages_alloc_free; } data_flt_t; inm_s32_t init_data_flt_ctxt(data_flt_t *); void free_data_flt_ctxt(data_flt_t *); void save_data_in_data_mode(struct _target_context *, struct _write_metadata_tag *, inm_wdata_t *); inm_s32_t get_data_pages(struct _target_context *tgt_ctxt, struct inm_list_head *head, inm_s32_t num_pages); inm_s32_t inm_rel_data_pages(struct _target_context*,struct inm_list_head *, inm_u32_t); void data_mode_cleanup_for_s2_exit(void); inm_s32_t add_data_pages(inm_u32_t); void recalc_data_file_mode_thres(void); inm_s32_t add_tag_in_stream_mode(tag_volinfo_t *, tag_info_t *, int, tag_guid_t *, inm_s32_t); inm_s32_t inm_tc_resv_add(struct _target_context *, inm_u32_t); inm_s32_t inm_tc_resv_del(struct _target_context *, inm_u32_t); data_page_t *get_cur_data_pg(struct _change_node *, inm_s32_t *); void update_cur_dat_pg(struct _change_node *, data_page_t *, inm_s32_t); #endif involflt-0.1.0/src/segmented_bitmap.c0000755000000000000000000002262014467303177016326 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "involflt_debug.h" #include "bitmap_operations.h" segmented_bitmap_t *segmented_bitmap_ctr(fstream_segment_mapper_t *fssm, inm_u64_t bits_in_bitmap) { segmented_bitmap_t *sb = NULL; sb = INM_KMALLOC(sizeof(*sb), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!sb) return NULL; INM_MEM_ZERO(sb, sizeof(*sb)); INM_ATOMIC_SET(&sb->refcnt, 1); sb->fssm = fssm; sb->bits_in_bitmap = bits_in_bitmap; return sb; } void segmented_bitmap_dtr(segmented_bitmap_t *sb) { INM_BUG_ON(INM_ATOMIC_READ(&sb->refcnt) != 0); INM_KFREE(sb, sizeof(*sb), INM_KERNEL_HEAP); sb = NULL; } segmented_bitmap_t *segmented_bitmap_get(segmented_bitmap_t *sb) { INM_ATOMIC_INC(&sb->refcnt); return sb; } void segmented_bitmap_put(segmented_bitmap_t *sb) { if (INM_ATOMIC_DEC_AND_TEST(&sb->refcnt)) segmented_bitmap_dtr(sb); } inm_s32_t segmented_bitmap_process_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset, inm_s32_t bitmap_operation) { inm_s32_t ret = 0; unsigned char *bit_buffer = NULL; inm_u32_t bit_buffer_byte_size = 0; inm_u32_t adjusted_bitsinrun = 0; inm_u64_t adjusted_buffer_size = 0; inm_u32_t nr_bytes_changed = 0; unsigned char *first_byte_changed = NULL; inm_u64_t byte_offset = 0; inm_u32_t bits_to_process = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } nr_bytes_changed = 0; bits_to_process = bitsinrun; if (bitoffset + bitsinrun > sb->bits_in_bitmap) return EOF_BMAP; while(bits_to_process > 0) { byte_offset = bitoffset / 8; ret = fstream_segment_mapper_read_and_lock(sb->fssm, byte_offset, &bit_buffer, &bit_buffer_byte_size); if (ret) break; /* figure out if this run crosses segment boundries or the bitmap end */ adjusted_buffer_size = min(((inm_u64_t)bit_buffer_byte_size * 8), (sb->bits_in_bitmap - (byte_offset * 8))); /* prevent runs that span buffers, hand that in this function */ adjusted_bitsinrun = (inm_u32_t)min(((inm_u64_t)bits_to_process), (adjusted_buffer_size - (bitoffset % 8))); switch(bitmap_operation) { case BITMAP_OP_SETBITS: ret = SetBitmapBitRun(bit_buffer, (inm_u32_t)adjusted_buffer_size, adjusted_bitsinrun, (inm_u32_t)(bitoffset%8), &nr_bytes_changed, &first_byte_changed); break; case BITMAP_OP_CLEARBITS: ret = ClearBitmapBitRun(bit_buffer, (inm_u32_t) adjusted_buffer_size, adjusted_bitsinrun, (inm_u32_t) (bitoffset % 8), &nr_bytes_changed, &first_byte_changed); break; case BITMAP_OP_INVERTBITS: ret = InvertBitmapBitRun(bit_buffer, (inm_u32_t) adjusted_buffer_size, adjusted_bitsinrun, (inm_u32_t) (bitoffset % 8), &nr_bytes_changed, &first_byte_changed); break; default: err("Invalid operation code (%d) passed to segmented_bitmap_process_bitrun \n", bitmap_operation); return 1; } if (nr_bytes_changed) fstream_segment_mapper_unlock_and_mark_dirty(sb->fssm, (byte_offset + (first_byte_changed - bit_buffer))); else fstream_segment_mapper_unlock(sb->fssm, byte_offset); bits_to_process -= adjusted_bitsinrun; bitoffset += adjusted_bitsinrun; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with return status = %d", ret); } return ret; } inm_s32_t segmented_bitmap_set_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t 
bitoffset) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return segmented_bitmap_process_bitrun(sb, bitsinrun, bitoffset, BITMAP_OP_SETBITS); } inm_s32_t segmented_bitmap_clear_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return segmented_bitmap_process_bitrun(sb, bitsinrun, bitoffset, BITMAP_OP_CLEARBITS); } inm_s32_t segmented_bitmap_invert_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return segmented_bitmap_process_bitrun(sb, bitsinrun, bitoffset, BITMAP_OP_INVERTBITS); } inm_s32_t segmented_bitmap_clear_all_bits(segmented_bitmap_t *sb) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return segmented_bitmap_process_bitrun(sb, sb->bits_in_bitmap, 0, BITMAP_OP_CLEARBITS); } inm_s32_t segmented_bitmap_get_first_bitrun(segmented_bitmap_t *sb, inm_u32_t *bitsinrun, inm_u64_t *bitoffset) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } sb->next_search_offset = 0; return segmented_bitmap_get_next_bitrun(sb, bitsinrun, bitoffset); } inm_s32_t segmented_bitmap_get_next_bitrun(segmented_bitmap_t *sb, inm_u32_t *bitsinrun, inm_u64_t *bitoffset) { inm_s32_t ret = 0; unsigned char *bit_buffer = NULL; inm_u32_t bit_buffer_byte_size; //inm_u32_t adjusted_bitsinrun; inm_u64_t adjusted_buffer_size; inm_u64_t byte_offset; inm_u32_t search_bit_offset; inm_u32_t run_length; inm_u32_t run_offset; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } *bitsinrun = 0; *bitoffset = 0; run_length = 0; run_offset = 0; while(sb->next_search_offset < sb->bits_in_bitmap) { byte_offset = sb->next_search_offset / 8; dbg("bitmap search offset = %llu", byte_offset); ret = fstream_segment_mapper_read_and_lock(sb->fssm, byte_offset, &bit_buffer, &bit_buffer_byte_size); if (ret) break; search_bit_offset = (inm_u32_t) (sb->next_search_offset % 8); adjusted_buffer_size = (inm_u32_t) min(((inm_u64_t)bit_buffer_byte_size * 8), (sb->bits_in_bitmap - (byte_offset * 8))); ret = GetNextBitmapBitRun(bit_buffer, adjusted_buffer_size, &search_bit_offset, &run_length, &run_offset); fstream_segment_mapper_unlock(sb->fssm, byte_offset); if (ret) break; sb->next_search_offset = (byte_offset * 8) + search_bit_offset; if (run_length > 0) { *bitoffset = (byte_offset * 8) + run_offset; break; } } if (ret == 0) { if (run_length > 0) { *bitsinrun = run_length; } else { *bitsinrun = 0; *bitoffset = sb->next_search_offset; ret = EOF_BMAP; } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret status %d", ret); } return ret; } inm_s32_t find_bmap_io_pat(char *buf, long long bits_in_bmap, bmap_bit_stats_t *bbsp, int eobmap); inm_u64_t segmented_bitmap_get_number_of_bits_set(segmented_bitmap_t *sb, bmap_bit_stats_t *bbsp) { inm_s32_t ret = 0; unsigned char *bit_buffer = NULL; inm_u32_t bit_buffer_byte_size = 0; inm_u64_t adjusted_buffer_size = 0; inm_u64_t byte_offset = 0; inm_u64_t bit_offset = 0; inm_u64_t bits_to_process = 0; inm_u64_t bits_set = 0; inm_u32_t eobmap = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } bits_to_process = sb->bits_in_bitmap; while(bits_to_process > 0) { byte_offset = bit_offset / 8; ret = fstream_segment_mapper_read_and_lock(sb->fssm, byte_offset, &bit_buffer, 
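/* fstream_segment_mapper_read_and_lock() pins the cached segment holding
 * byte_offset and returns a pointer into it; every path below must drop
 * that lock again via fstream_segment_mapper_unlock() before moving on */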
&bit_buffer_byte_size); if (ret) break; adjusted_buffer_size = min(((inm_u64_t)bit_buffer_byte_size * 8) , (sb->bits_in_bitmap - (byte_offset * 8))); if (bbsp) { if ((byte_offset * 8 + 4096) >= sb->bits_in_bitmap) { eobmap = 1; } find_bmap_io_pat(bit_buffer, adjusted_buffer_size, bbsp, eobmap); info("bbsp = %d \n", bbsp->bbs_nr_dbs); } bits_set += (inm_u64_t)find_number_of_bits_set(bit_buffer, adjusted_buffer_size); bits_to_process -= adjusted_buffer_size; bit_offset += adjusted_buffer_size; fstream_segment_mapper_unlock(sb->fssm,byte_offset); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %llu", bits_set); } return bits_set; } inm_s32_t segmented_bitmap_sync_flush_all(segmented_bitmap_t *sb) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return (fstream_segment_mapper_sync_flush_all(sb->fssm)); } involflt-0.1.0/src/segmented_bitmap.h0000755000000000000000000000470214467303177016334 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INMAGE_SEGMENTED_BITMAP_H #define _INMAGE_SEGMENTED_BITMAP_H #include "involflt-common.h" typedef struct _segmented_bitmap_tag { fstream_segment_mapper_t *fssm; inm_u64_t next_search_offset; inm_u64_t bits_in_bitmap; inm_atomic_t refcnt; }segmented_bitmap_t; struct bmap_bit_stats; segmented_bitmap_t *segmented_bitmap_ctr(fstream_segment_mapper_t *fssm, inm_u64_t bits_in_bitmap); void segmented_bitmap_dtr(segmented_bitmap_t *sb); segmented_bitmap_t *segmented_bitmap_get(segmented_bitmap_t *sb); void segmented_bitmap_put(segmented_bitmap_t *sb); inm_s32_t segmented_bitmap_process_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset, inm_s32_t bitmap_operation); inm_s32_t segmented_bitmap_set_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset); inm_s32_t segmented_bitmap_clear_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset); inm_s32_t segmented_bitmap_invert_bitrun(segmented_bitmap_t *sb, inm_u32_t bitsinrun, inm_u64_t bitoffset); inm_s32_t segmented_bitmap_clear_all_bits(segmented_bitmap_t *sb); inm_s32_t segmented_bitmap_get_first_bitrun(segmented_bitmap_t *sb, inm_u32_t *bitsinrun, inm_u64_t *bitoffset); inm_s32_t segmented_bitmap_get_next_bitrun(segmented_bitmap_t *sb, inm_u32_t *bitsinrun, inm_u64_t *bitoffset); inm_u64_t segmented_bitmap_get_number_of_bits_set(segmented_bitmap_t *sb, struct bmap_bit_stats *); inm_s32_t segmented_bitmap_sync_flush_all(segmented_bitmap_t *sb); inm_s32_t get_next_bitmap_bitrun(char *bit_buffer, inm_u64_t adjusted_buffer_size, inm_u32_t *search_bit_offset, inm_u32_t *run_length, inm_u32_t *run_offset); #endif /* _INMAGE_SEGMENTED_BITMAP_H */ involflt-0.1.0/src/driver-context.h0000755000000000000000000003243214467303177016003 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef LINVOLFLT_DRIVER_CONTEXT_H #define LINVOLFLT_DRIVER_CONTEXT_H #include "inm_utypes.h" #include "telemetry-types.h" #define DC_FLAGS_BITMAP_WORK_ITEM_POOL_INIT 0x00004000 #define DC_FLAGS_WORKQUEUE_ENTRIES_POOL_INIT 0x00000400 #define DC_FLAGS_SERVICE_STATE_CHANGED 0x00001000 #define DC_FLAGS_TARGET_MODULE_REGISTERED 0x00020000 #define DC_FLAGS_SYSTEM_SHUTDOWN 0x00100000 #define DC_FLAGS_INVOLFLT_LOAD 0x00200000 #define DC_FLAGS_REBOOT_MODE 0x00400000 typedef struct __dc_statistics { inm_u32_t num_malloc_fails; inm_atomic_t pending_chg_nodes; } dc_stats_t; typedef struct __dc_tunable_params { /* thresholds */ inm_u32_t db_high_water_marks[MAX_SERVICE_STATES]; inm_u32_t db_low_water_mark_while_service_running; inm_u32_t db_topurge_when_high_water_mark_is_reached; inm_u32_t max_data_pages_per_target; inm_s32_t free_percent_thres_for_filewrite; inm_s32_t volume_percent_thres_for_filewrite; inm_s32_t free_pages_thres_for_filewrite; inm_u32_t max_data_size_per_non_data_mode_drty_blk; inm_s32_t enable_data_filtering; inm_s32_t enable_data_filtering_for_new_volumes; inm_s32_t enable_data_file_mode; inm_s32_t enable_data_file_mode_for_new_volumes; inm_u64_t data_to_disk_limit; inm_s32_t db_notify; inm_u32_t data_pool_size; /* in terms of MB */ inm_u32_t max_data_pool_percent; /* in terms of %age */ inm_u32_t volume_data_pool_size; /* in terms of MB */ char data_file_log_dir[INM_PATH_MAX]; inm_u32_t max_data_sz_dm_cn; /* max data size that a change node can hold in data mode*/ inm_u32_t max_sz_md_coalesce; /* max coalesce size for a change */ inm_u32_t percent_change_data_pool_size; inm_u32_t time_reorg_data_pool_sec; inm_u32_t time_reorg_data_pool_factor; inm_u32_t vacp_iobarrier_timeout; inm_u32_t fs_freeze_timeout; inm_u32_t vacp_app_tag_commit_timeout; inm_u32_t enable_recio; inm_u32_t stable_pages; inm_u32_t enable_chained_io; } dc_tune_params_t; typedef struct __kernel_thread_t { inm_s32_t initialized; inm_atomic_t wakeup_event_raised; inm_atomic_t shutdown_event_raised; inm_wait_queue_head_t wakeup_event; inm_wait_queue_head_t shutdown_event; inm_completion_t _completion; inm_completion_t _new_event_completion; } kernel_thread_t; struct drv_ctx_host_info { /* This list maintains list of request queue structures that we modified * its make request function. 
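 * Each entry records a request queue whose make_request function the
 * driver has hooked; rq_list_lock below serializes lookups and updates
 * of this list.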
*/ struct inm_list_head rq_list; inm_spinlock_t rq_list_lock; #ifndef INM_AIX inm_kmem_cache_t *bio_info_cache; #endif inm_kmem_cache_t *mirror_bioinfo_cache; #ifdef INM_AIX inm_kmem_cache_t *data_file_node_cache; #endif }; struct drv_ctx_fabric_info { void *target_priv; }; struct drv_ctx_bmap_info { struct inm_list_head head_for_volume_bitmaps; inm_kmem_cache_t *iob_obj_cache; /*iobuffer object Lookasidelist*/ inm_kmem_cache_t *iob_data_cache; /*iobuffer data Lookaside list*/ inm_mempool_t *iob_obj_pool; inm_mempool_t *iob_data_pool; inm_kmem_cache_t *bitmap_work_item_pool; inm_u32_t max_bitmap_buffer_memory; inm_u32_t current_bitmap_buffer_memory; inm_u32_t bitmap_512K_granularity_size; unsigned long num_volume_bitmaps; iobuffer_t write_filtering_obj; }; #ifdef INM_AIX typedef struct _queue_buf_thread { inm_atomic_t qbt_pending; inm_completion_t qbt_exit; inm_completion_t qbt_completion; }inm_queue_buf_thread_t; typedef struct _inm_cdb_dev_entry{ struct inm_list_head this_entry; char cdb_devname[INM_GUID_LEN_MAX]; struct file *cdb_fp; } inm_cdb_dev_entry_t; #endif /* Maintains VM level CX session */ typedef struct _vm_cx_session { /* Flags */ inm_u64_t vcs_flags; /* Transaction ID */ inm_u64_t vcs_transaction_id; /* Number of disk CX sessions */ inm_u64_t vcs_num_disk_cx_sess; /* CX session start time */ inm_u64_t vcs_start_ts; /* CX session end time */ inm_u64_t vcs_end_ts; /* This is the base time to calculate 1s intervals */ inm_u64_t vcs_base_secs_ts; /* Bytes tracked */ inm_u64_t vcs_tracked_bytes; /* Bytes drained */ inm_u64_t vcs_drained_bytes; /* Bytes tracked every second */ inm_u64_t vcs_tracked_bytes_per_second; /* Churn buckets */ inm_u64_t vcs_churn_buckets[DEFAULT_NR_CHURN_BUCKETS]; /* Disk level supported peak churn */ inm_u64_t vcs_default_disk_peak_churn; /* VM level supported peak churn */ inm_u64_t vcs_default_vm_peak_churn; /* Max peak churn */ inm_u64_t vcs_max_peak_churn; /* Time of first peak churn */ inm_u64_t vcs_first_peak_churn_ts; /* Time of first peak churn */ inm_u64_t vcs_last_peak_churn_ts; /* Excess churn on top of peak churn */ inm_u64_t vcs_excess_churn; /* Number of consecutive tag failres observed */ inm_u64_t vcs_num_consecutive_tag_failures; /* Time Jump timestamp */ inm_u64_t vcs_timejump_ts; /* Time Jump in msec */ inm_u64_t vcs_max_jump_ms; /* Drainer latency */ inm_u64_t vcs_max_s2_latency; /* CS session number */ inm_u32_t vcs_nth_cx_session; } vm_cx_session_t; #define VCS_CX_SESSION_STARTED 0x0001 #define VCS_CX_SESSION_ENDED 0x0002 #define VCS_CX_S2_EXIT 0x0004 #define VCS_CX_SVAGENT_EXIT 0x0008 #define VCS_CX_TAG_FAILURE 0x0010 #define VCS_CX_PRODUCT_ISSUE 0x0020 #define VCS_CX_TIME_JUMP_FWD 0x0040 #define VCS_CX_TIME_JUMP_BWD 0x0080 #define VCS_CX_UNSUPPORTED_BIO 0x0100 #define VCS_NUM_CONSECTIVE_TAG_FAILURES_ALLOWED 3 #define DISK_LEVEL_SUPPORTED_CHURN (25 * 1024 * 1024) /* 25MB */ #define VM_LEVEL_SUPPORTED_CHURN (50 * 1024 * 1024) /* 50MB */ #define FORWARD_TIMEJUMP_ALLOWED 180000 /* in msecs = 3 mins */ #define BACKWARD_TIMEJUMP_ALLOWED 3600000 /* in msecs = 60 mins */ #define DC_AT_INTO_INITING 0x1 #define DC_AT_INTO_INITED 0x2 /* This structure holds generic information about involflt target. There will * be one instance of this structure created and initialized during involflt * module load. */ typedef struct _driver_context { /* This lock is used to synchronize access to tgt_list structure. */ inm_rwsem_t tgt_list_sem; #ifdef INM_AIX inm_spinlock_t tgt_list_lock; #endif /* Doubly linked list of target context. 
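 * The list is walked and modified only under tgt_list_sem above.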
One target context created * per filter target and target specific information is stored here. */ struct inm_list_head tgt_list; /* service state */ svc_state_t service_state; inm_s32_t s2_started; /* This points to number of volumes that are being tracked currently */ inm_u16_t total_prot_volumes; inm_u16_t host_prot_volumes; inm_u16_t mirror_prot_volumes; /* This structure hold information specific to data mode filtering. */ data_flt_t data_flt_ctx; inm_dev_t flt_dev; inm_cdev_t flt_cdev; inm_spinlock_t tunables_lock; dc_tune_params_t tunable_params; /* per io time stamp, seqno file names */ char driver_time_stamp[INM_PATH_MAX]; void *driver_time_stamp_handle; char driver_time_stamp_seqno[INM_PATH_MAX]; void *driver_time_stamp_seqno_handle; char driver_time_stamp_buf[NUM_CHARS_IN_LONGLONG + 1 ]; /* 6.25% of system memory */ inm_u32_t default_data_pool_size_mb; /* Total unreserved pages for the driver context */ inm_u32_t dc_cur_unres_pages; /* Total reserved pages for target contexts */ inm_u32_t dc_cur_res_pages; /* Page reservations for new volume context based on tunable*/ inm_u32_t dc_vol_data_pool_size; dc_stats_t stats; /* statistics info */ /* service thread */ kernel_thread_t service_thread; /* This lock is used to synchronize access to various logging events * and statistic info **/ inm_spinlock_t log_lock; /* there is a check in metadata mode, for the follwoing flag */ inm_u8_t enable_data_filtering; /* bitmap mode */ inm_dev_t root_dev; #ifdef INM_AIX struct file *root_filp; #endif inm_u32_t sys_shutdown; /* to keep track of allocation of buffers */ inm_u32_t flags; inm_kmem_cache_t *wq_entry_pool; workq_t wqueue; inm_u32_t service_supports_data_filtering; inm_u32_t enable_data_files; inm_pid_t sentinal_pid; inm_pid_t svagent_pid; inm_devhandle_t *svagent_idhp; inm_devhandle_t *sentinal_idhp; /* Data Structures to maintain unique timestamps across all volumes. 
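 * A (timestamp, sequence number) pair is handed out under time_stamp_lock
 * so that changes across all protected volumes receive strictly ordered
 * stamps.
 */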
*/ inm_spinlock_t time_stamp_lock; inm_u64_t last_time_stamp; inm_u64_t last_time_stamp_seqno; /* reserved memory pool */ inm_spinlock_t page_pool_lock; struct inm_list_head page_pool; inm_u32_t dc_res_cnode_pgs; /* shutdown related data */ inm_completion_t shutdown_completion; /* Binary semaphore to serialize issuing tags */ inm_sem_t tag_sem; struct drv_ctx_host_info dc_host_info; struct drv_ctx_fabric_info dc_fabric_info; struct drv_ctx_bmap_info dc_bmap_info; /* lock to protect a_ops list */ inm_rwsem_t dc_inmaops_sem; /* inma_ops list for handling recursive writes */ /* list of duplicated address space operations */ struct inm_list_head dc_inma_ops_list; inm_atomic_t involflt_refcnt; inm_u32_t clean_shutdown; inm_u32_t unclean_shutdown; inm_u32_t dc_flags; inm_spinlock_t clean_shutdown_lock; inm_rwsem_t tag_guid_list_sem; struct inm_list_head tag_guid_list; #ifdef INM_SOLARIS inm_dev_open_t dev_major_open; #else inm_dc_at_lun_info_t dc_at_lun; #endif #ifdef INM_AIX inm_sem_t dc_mxs_sem; struct inm_list_head dc_mxs_list; inm_queue_buf_thread_t dc_qbt; pid_t dc_qbt_pid; #endif inm_spinlock_t recursive_writes_meta_list_lock; inm_list_head_t recursive_writes_meta_list; /* freeze volume list, head of freeze volume list*/ struct inm_list_head freeze_vol_list; /* flag to maintain the state of driver with io barrier */ inm_atomic_t is_iobarrier_on; /* Consistency Point State */ inm_u32_t dc_cp; /* App/Crash consistency guid */ char dc_cp_guid[GUID_LEN]; /* To sync App/Crash consstency */ inm_sem_t dc_cp_mutex; workq_t dc_tqueue; driver_telemetry_t dc_tel; /* Last Chance Writes */ inma_ops_t *dc_lcw_aops; void *dc_lcw_rhdl; int dc_lcw_rflag; struct _target_context *dc_root_disk; /* CX related */ vm_cx_session_t dc_vm_cx_session; inm_spinlock_t dc_vm_cx_session_lock; unsigned long dc_vm_cx_session_lock_flag; inm_u16_t total_prot_volumes_in_nwo; inm_u64_t dc_disk_level_supported_churn; inm_u64_t dc_vm_level_supported_churn; inm_u64_t dc_nth_cx_session; inm_u64_t dc_transaction_id; inm_wait_queue_head_t dc_vm_cx_session_waitq; inm_list_head_t dc_disk_cx_stats_list; inm_list_head_t dc_disk_cx_sess_list; inm_u16_t dc_num_disk_cx_stats; inm_u16_t dc_num_consecutive_tags_failed; inm_u64_t dc_max_fwd_timejump_ms; inm_u64_t dc_max_bwd_timejump_ms; inm_u16_t dc_wokeup_monitor_thread; inm_u32_t dc_verifier_on; inm_spinlock_t dc_verifier_lock; char *dc_verifier_area; inm_spinlock_t dc_tag_commit_status; inm_atomic_t dc_nr_tag_commit_status_pending_disks; inm_atomic_t dc_tag_commit_status_failed; inm_wait_queue_head_t dc_tag_commit_status_waitq; inm_u16_t dc_wokeup_tag_drain_notify_thread; char *dc_tag_drain_notify_guid; inm_u32_t dc_tag_commit_notify_flag; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_completion_t dc_alloc_thread_started; inm_wait_queue_head_t dc_alloc_thread_waitq; inm_completion_t dc_alloc_thread_exit; struct task_struct *dc_alloc_thread_task; inm_atomic_t dc_nr_bioinfo_allocs_failed; inm_atomic_t dc_nr_chgnode_allocs_failed; inm_atomic_t dc_nr_metapage_allocs_failed; inm_atomic_t dc_alloc_thread_quit; inm_atomic_t dc_nr_bioinfo_alloced; inm_atomic_t dc_nr_chdnodes_alloced; inm_atomic_t dc_nr_metapages_alloced; inm_atomic_t dc_nr_bioinfo_alloced_from_pool; inm_atomic_t dc_nr_chgnodes_alloced_from_pool; inm_atomic_t dc_nr_metapages_alloced_from_pool; inm_list_head_t dc_bioinfo_list; inm_list_head_t dc_chdnodes_list; TIME_STAMP_TAG_V2 dc_crash_tag_timestamps; #endif } driver_context_t; #define INM_CP_NONE 0 #define INM_CP_APP_ACTIVE 1 #define 
INM_CP_CRASH_ACTIVE 2 #define INM_CP_TAG_COMMIT_PENDING 4 #define INM_CP_SHUTDOWN 8 #define SYS_UNCLEAN_SHUTDOWN 0x1 #define SYS_CLEAN_SHUTDOWN 0x2 #define DRV_MIRROR_NOT_SUPPORT 0x4 #define DRV_DUMMY_LUN_CREATED 0x8 inm_s32_t init_driver_context(void); void free_driver_context(void); void add_tc_to_dc(struct _target_context *); void remove_tc_from_dc(struct _target_context *); void inm_svagent_exit(void); void inm_s2_exit(void); /* global pool management */ inm_s32_t alloc_cache_pools(void); inm_s32_t dealloc_cache_pools(void); void balance_page_pool(int, int); #endif involflt-0.1.0/src/filter.h0000755000000000000000000001610514467303177014312 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INM_FILTER_H #define _INM_FILTER_H #include "involflt.h" #include "involflt-common.h" #define INM_MIRROR_MODEL_SETUP 0x00000001 #define INM_MIRROR_IO_TO_ATLUN 0x00000002 #define INM_MIRROR_IO_TO_SOURCE 0x00000004 #define INM_MIRROR_IO_TO_ATLUN_DONE 0x00000008 #define INM_MIRROR_IO_TO_SOURCE_DONE 0x00000010 #define INM_MIRROR_IO_PAGELOCKDONE 0x00000020 #define INM_MIRROR_IO_ATLUN_ERR 0x00000040 #define INM_MIRROR_IO_SOURCE_ERR 0x00000080 #define INM_MIRROR_IO_ATLUN_DIFF_PATH 0x00000100 #define INM_MIRROR_IO_ATLUN_PATHS_FAILURE 0x00000200 #define INM_MIRROR_IO_PTLUN_PATHS_FAILURE 0x00000400 #define INM_PTIO_CANCEL_PENDING 0x00000800 #define INM_PTIO_CANCEL_SENT 0x00001000 #define INM_PTIO_FULL_FAILED 0x00002000 #define INM_ATIO_FULL_FAILED 0x00004000 #define INM_ATBUF_FULL_FAILED 0x00000001 #define INM_ATBUF_PARTIAL_FAILED 0x00000002 #define INM_ATBUF_DONT_LINK_PREV 0x00000004 #define UPDATE_ATIO_SEND(vol_entryp, io_sz) \ do{ \ if(io_sz){ \ vol_entryp->vol_byte_written += io_sz; \ vol_entryp->vol_io_issued++; \ vol_entryp->vol_io_succeeded++; \ } \ }while(0) #define UPDATE_ATIO_FAILED(vol_entryp, io_sz) \ do{ \ vol_entryp->vol_byte_written -= io_sz; \ vol_entryp->vol_io_succeeded--; \ }while(0) typedef struct inm_dev_extinfo { inm_device_t d_type; char d_guid[INM_GUID_LEN_MAX]; char d_mnt_pt[INM_PATH_MAX]; inm_u64_t d_nblks; inm_u32_t d_bsize; inm_u64_t d_flags; char d_pname[INM_GUID_LEN_MAX]; char d_src_scsi_id[INM_GUID_LEN_MAX]; char d_dst_scsi_id[INM_GUID_LEN_MAX]; struct inm_list_head *src_list; struct inm_list_head *dst_list; inm_u64_t d_startoff; } inm_dev_extinfo_t; void do_unstack_all(void); inm_s32_t isrootdev(struct _target_context *vcptr); inm_s32_t do_volume_stacking(inm_dev_extinfo_t *); inm_s32_t do_start_filtering(inm_devhandle_t *, inm_dev_extinfo_t *); inm_s32_t do_start_mirroring(inm_devhandle_t *, mirror_conf_info_t *); inm_s32_t init_boottime_stacking(void); inm_s32_t inm_dentry_callback(char *fname); void flt_cleanup_sync_tag(tag_guid_t *); tag_guid_t * get_tag_from_guid(char *); inm_s32_t populate_volume_lists(struct 
inm_list_head *src_mirror_list_head, struct inm_list_head *dst_mirror_list_head, mirror_conf_info_t *mirror_infop); void free_mirror_list(struct inm_list_head *list_head, int close_device); void print_mirror_list(struct inm_list_head *list_head); inm_s32_t is_flt_disabled(char *volname); void add_tags(tag_volinfo_t *tag_volinfop, tag_info_t *tag_info, inm_s32_t num_tags, tag_guid_t *tag_guid, inm_s32_t index); int flt_process_tags(inm_s32_t num_vols, void __INM_USER **user_buf, inm_s32_t flags, tag_guid_t *tag_guid); void load_bal_rr(struct _target_context *ctx, inm_u32_t io_sz); inm_s32_t ptio_cancel_send(struct _target_context *tcp, inm_u64_t write_off, inm_u32_t write_len); struct _mirror_vol_entry * get_cur_vol_entry(struct _target_context *tcp, inm_u32_t io_sz); void inm_atio_retry(wqentry_t *wqe); void issue_ptio_cancel_cdb(wqentry_t *wqe); inm_iodone_t INM_MIRROR_IODONE(inm_pt_mirror_iodone, pt_bp, done, error); inm_iodone_t INM_MIRROR_IODONE(inm_at_mirror_iodone, at_bp, done, error); inm_u32_t inm_devpath_to_maxxfer(char *device_name); void inm_free_atbuf_list(inm_list_head_t *); int process_tag_volume(tag_info_t_v2 *tag_vol, tag_info_t *tag_list, int commit_pending); tag_volinfo_t *build_volume_node_totag(volume_info_t *vol_info, inm_s32_t *error); void add_volume_tags(tag_info_t_v2 *tag_vol, tag_volinfo_t *tag_volinfop, tag_info_t *tag_info, int commit_pending); inm_s32_t issue_tag_volume(tag_info_t_v2 *tag_vol, tag_volinfo_t *vol_tag, tag_info_t *tag_list, int commit_pending); tag_info_t *build_tag_vol_list(tag_info_t_v2 *tag_vol, inm_s32_t *error); void set_tag_drain_notify_status(struct _target_context *ctxt, int tag_status, int dev_status); inm_s32_t modify_persistent_device_name(struct _target_context *ctx, char *p_name); #ifdef IDEBUG_MIRROR_IO #ifdef INM_LINUX #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) #define INM_INJECT_ERR(cond, error, bp) \ do{ \ if(cond){ \ dbg("injecting mirroring err"); \ bp->bi_error = cond; \ } \ }while(0) #else #define INM_INJECT_ERR(cond, error, bp) \ do{ \ if(cond){ \ dbg("injecting mirroring err"); \ error = cond; \ } \ }while(0) #endif #else #define INM_INJECT_ERR(cond, error, bp) \ do{ \ if(cond){ \ dbg("injecting mirroring err"); \ bp->b_error = EIO; \ } \ cond = 0; \ }while(0) #endif #define INJECT_VENDOR_CDB_ERR(cond, ret) \ do{ \ if(cond){ \ ret = 1; \ } \ cond = 0; \ }while(0) #else #define INM_INJECT_ERR(cond, error, bp) #define INJECT_VENDOR_CDB_ERR(inject_vendorcdb_err, ret) #endif struct inm_mirror_bufinfo { inm_buf_t imb_pt_buf; inm_list_head_t imb_atbuf_list; inm_list_head_t imb_list; inm_buf_t *imb_org_bp; void *imb_privp; inm_u64_t imb_io_off; inm_u64_t imb_volsz; inm_u32_t imb_io_sz; inm_atomic_t imb_done_cnt; wqentry_t ptio_can_wqe; struct _mirror_vol_entry *imb_vol_entry; inm_u32_t imb_flag; inm_u32_t imb_atbuf_cnt; inm_u64_t imb_atbuf_absoff; inm_s32_t imb_pt_err; inm_u32_t imb_pt_done; }; typedef struct inm_mirror_bufinfo inm_mirror_bufinfo_t; struct _mirror_atbuf { inm_list_head_t imb_atbuf_this; inm_u32_t imb_atbuf_flag; wqentry_t imb_atbuf_wqe; inm_buf_t imb_atbuf_buf; inm_mirror_bufinfo_t *imb_atbuf_imbinfo; struct _mirror_vol_entry *imb_atbuf_vol_entry; inm_u32_t imb_atbuf_iosz; inm_u32_t imb_atbuf_done; }; typedef struct _mirror_atbuf inm_mirror_atbuf; inm_s32_t inm_save_mirror_bufinfo(struct _target_context *, inm_mirror_bufinfo_t **, inm_buf_t **, struct _mirror_vol_entry *); #ifdef INM_AIX #define INM_ALLOC_MIRROR_BUFINFO(mrr_info) mrr_info = (inm_mirror_bufinfo_t *)INM_KMALLOC(sizeof(inm_mirror_bufinfo_t), 
INM_KM_SLEEP, INM_KERNEL_HEAP); #else #define INM_ALLOC_MIRROR_BUFINFO(mrr_info) mrr_info = INM_KMEM_CACHE_ALLOC(driver_ctx->dc_host_info.mirror_bioinfo_cache, INM_KM_NOIO); #endif #define INM_MIRROR_BIOSZ sizeof(inm_mirror_bufinfo_t) #endif /* _INM_FILTER_H */ involflt-0.1.0/src/involflt-common.h0000755000000000000000000001272214467303177016151 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef LINVOLFLT_COMMON_H #define LINVOLFLT_COMMON_H #include "osdep.h" #include "safecapismajor.h" /* KGDB couldn't handle inlined functions correctly. So, when KGDB is enabled, then inline * functions are treated as normal functions. */ #ifndef static_inline #ifdef CONFIG_KGDB # define static_inline static __attribute__ ((__unused__)) #else #define static_inline static inline #endif #endif #ifndef MSEC_PER_SEC #define MSEC_PER_SEC 1000L #endif /* #define ALLOC_CAN_WAIT_FLAG INM_KM_SLEEP #define ALLOC_CANT_WAIT_FLAG INM_KM_NOSLEEP */ #define MAX_TARGET_NAME_LENGTH 256 #define NUM_CHARS_IN_INTEGER 10 #define NUM_CHARS_IN_LONGLONG 20 #define PERSISTENT_DIR "/etc/vxagent/involflt" #define DRIVER_NAME "involflt" #define FALSE 0 #define TRUE 1 /* bitmap related constants */ #define MAX_LOG_PATHNAME (0x200) #define LOG_FILE_NAME_PREFIX "InMage-" #define LOG_FILE_NAME_SUFFIX ".VolumeLog" #define GIGABYTES (1024*1024*1024) #define MEGABYTES (0x100000) /*1024*1024*/ #define KILOBYTES (0x400) /*1024*/ #define THIRTY_TWO_K_SIZE (0x8000) #define SIXTEEN_K_SIZE (0x4000) #define EIGHT_K_SIZE (0x2000) #define FOUR_K_SIZE (0x1000) #define MEGABYTE_BIT_SHIFT (0x14) /* 20 bits */ #define ERROR_TO_REG_OUT_OF_MEMORY_FOR_DIRTY_BLOCKS 0x0001 #define VCF_GUID_OBTAINED (0x00000200) #define DEFAULT_DB_NOTIFY_THRESHOLD DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE #define CX_SESSION_PENDING_BYTES_THRESHOLD \ (2 * DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE) /* 8MB */ #define TAG_VOLUME_MAX_LENGTH 256 #define TAG_MAX_LENGTH 256 /*flag values to pass to the driver*/ #define TAG_VOLUME_INPUT_FLAGS_ATOMIC_TO_VOLUME_GROUP 0x0001 #define TAG_FS_CONSISTENCY_REQUIRED 0x0002 #define TAG_FS_FROZEN_IN_USERSPACE 0x0004 struct _tag_info { char tag_name[TAG_MAX_LENGTH]; unsigned short tag_len; }; typedef struct _tag_info tag_info_t; #ifndef MIN #define MIN(a,b) (((a) < (b)) ? (a) : (b)) #endif #ifndef INM_MAX #define INM_MAX(a,b) (((a) > (b)) ? 
(a) : (b)) #endif struct _target_context; #define GIGABYTES (1024*1024*1024) #define MEGABYTES (0x100000) /* (1024*1024) */ #define KILOBYTES (0x400) /* (1024) */ #define FIVE_TWELVE_K_SIZE (0x80000) #define TWO_FIFTY_SIX_K_SIZE (0x40000) #define ONE_TWENTY_EIGHT_K_SIZE (0x20000) #define SIXTY_FOUR_K_SIZE (0x10000) #define THIRTY_TWO_K_SIZE (0x8000) #define SIXTEEN_K_SIZE (0x4000) #define EIGHT_K_SIZE (0x2000) #define FOUR_K_SIZE (0x1000) #define INM_SECTOR_SIZE 512 #define INM_SECTOR_SHIFT 9 #define SINGLE_MAX_WRITE_LENGTH 0x10000 /* (64 KBytes) */ #define MAX_LOG_PATHNAME (0x200) #define MAX_NAME_LEN NAME_MAX /*255*/ #define MAX_UUID_LEN 128 #define MAX_NR_FS_BLKS_IN_16K 32 #define INM_DEV_BSIZE 512 #define INM_DEV_BSZ_SHIFT 9 static_inline int inm_is_little_endian(void) { inm_u32_t val = 0xabcdef; inm_uchar *cp = (inm_uchar *)&val; if ((*cp & 0xff) == 0xef) { return (1); } else { return (0); } } #ifdef INM_DEBUG #define INM_BUG_ON_TMP(vcptr) \ if (vcptr->dummy_tc_cur_mode > 0 || vcptr->dummy_tc_dev_state > 0) { \ info("Involflt driver bug %p tc_mode:%u,tc_pmode:%u tc_devst:%u tc_mode1:%u tc_pmode1:%u tc_devst1:%u", vcptr, vcptr->tc_cur_mode, vcptr->tc_prev_mode, vcptr->tc_dev_state,vcptr->dummy_tc_cur_mode, vcptr->dummy_tc_prev_mode, vcptr->dummy_tc_dev_state); \ INM_BUG_ON(1); \ } #else #define INM_BUG_ON_TMP(vcptr) #endif #define ASYNC_TAG 0 #define SYNC_TAG 1 typedef struct tag_guid { struct inm_list_head tag_list; inm_wait_queue_head_t wq; inm_s32_t *status; char *guid; inm_u16_t guid_len; inm_u16_t num_vols; }tag_guid_t; #define INM_SCSI_VENDOR_ID_SIZE 10 #define IMPOSSIBLE_SCSI_STATUS 0xff #define PAGE_0 0 #define PAGE_80 0x80 #define PAGE_83 0x83 #define VENDOR_LENGTH 8 #define MODEL_LENGTH 16 #define WRITE_CANCEL_CDB 0xC0 #define WRITE_CANCEL_CDB_LEN 16 #define VACP_CDB 0xC2 #define VACP_CDB_LEN 6 #define HEARTBEAT_CDB 0xC5 #define HEARTBEAT_CDB_LEN 6 typedef struct disk_cx_stats_info { inm_list_head_t dcsi_list; int dcsi_valid; DEVICE_CXFAILURE_STATS dcsi_dev_cx_stats; } disk_cx_stats_info_t; #endif involflt-0.1.0/src/inm_list.h0000755000000000000000000000364714467303177014652 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INM_LIST_H #define _INM_LIST_H #include #define inm_list_head list_head typedef struct list_head inm_list_head_t; #define inm_container_of container_of #define INM_LIST_HEAD_INIT LIST_HEAD_INIT #define INM_INIT_LIST_HEAD INIT_LIST_HEAD #define INM_LIST_HEAD LIST_HEAD #define INIT_LIST_HEAD INIT_LIST_HEAD #define inm_list_add list_add #define inm_list_add_tail list_add_tail #define inm_list_del list_del #define inm_list_replace list_replace #define inm_list_replace_init list_replace_init #define inm_list_del_init list_del_init #define inm_list_move list_move #define inm_list_is_last list_is_last #define inm_list_empty list_empty #define inm_list_splice list_splice #define inm_list_splice_init list_splice_init static inline inm_list_head_t * inm_list_first(inm_list_head_t *head) { return head->next; } static inline inm_list_head_t * inm_list_last(inm_list_head_t *head) { return head->prev; } #define inm_list_entry list_entry #define inm_list_first_entry list_first_entry #define __inm_list_for_each list_for_each #define inm_list_for_each_safe list_for_each_safe #endif /* _INM_LIST_H */ involflt-0.1.0/src/filestream_segment_mapper.c0000755000000000000000000003343214467303177020243 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : filestream_segment_mapper.c * * Description: This file contains data mode implementation of the * filter driver. 
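 * More precisely, it implements the file-stream segment mapper: an LRU
 * cache of fixed-size iobuffer segments through which the on-disk bitmap
 * is read, dirtied and flushed.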
* * Functions defined are **fstream_segment_mapper_ctr fstream_segment_mapper_dtr *fstream_segment_mapper_get fstream_segment_mapper_put fstream_segment_mapper_detach fstream_segment_mapper_sync_flush_all * */ #include "involflt-common.h" #include "involflt.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "work_queue.h" #include "driver-context.h" extern driver_context_t *driver_ctx; inm_u32_t prime_table[32] = { 7, 17, 29, 41, 59, 89, 137, 211, 293, 449, 673, 997, 1493, 2003, 3001, 4507, 6779, 9311, 13933, 19819, 29863, 44987, 66973, 90019, 130069, 195127, 301237, 0}; iobuffer_t* get_iobuffer_cache_ptr(fstream_segment_mapper_t *fssm, inm_u32_t buffer_index) { iobuffer_t *iob = NULL; unsigned char **page_ptr = NULL; inm_s32_t page_index = (buffer_index*(sizeof(iobuffer_t*))/ BITMAP_FILE_SEGMENT_SIZE); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered: buffer_index:%d page_index:%d", buffer_index, page_index); } if (fssm->buffer_cache_index) { page_ptr = (unsigned char **)fssm->buffer_cache_index[page_index]; iob = (iobuffer_t*)page_ptr[buffer_index % (BITMAP_FILE_SEGMENT_SIZE/(sizeof(iobuffer_t*)))]; } dbg("index:%d iob:%p page_ptr:%p", buffer_index, iob, page_ptr); return iob; } iobuffer_t* reset_iobuffer_cache_ptr(fstream_segment_mapper_t *fssm, inm_u32_t buffer_index, iobuffer_t *ptr) { inm_s32_t page_index = (buffer_index*(sizeof(iobuffer_t*))/ BITMAP_FILE_SEGMENT_SIZE); iobuffer_t *iob = NULL; unsigned char **page_ptr = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered: buffer_index:%d page_index:%d", buffer_index, page_index); } if (fssm->buffer_cache_index) { page_ptr = (unsigned char **)fssm->buffer_cache_index[page_index]; iob = (iobuffer_t*)page_ptr[buffer_index % (BITMAP_FILE_SEGMENT_SIZE/(sizeof(iobuffer_t*)))]; page_ptr[buffer_index%(BITMAP_FILE_SEGMENT_SIZE/(sizeof(iobuffer_t*)))] = (unsigned char*)ptr; } dbg("index:%d out iob:%p page_ptr:%p in ptr:%p", buffer_index, iob, page_ptr, ptr); return iob; } fstream_segment_mapper_t *fstream_segment_mapper_ctr() { fstream_segment_mapper_t *fssm = NULL; fssm = (fstream_segment_mapper_t *)INM_KMALLOC(sizeof(*fssm), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!fssm) return NULL; INM_MEM_ZERO(fssm, sizeof(*fssm)); INM_ATOMIC_SET(&fssm->refcnt, 1); return fssm; } void fstream_segment_mapper_dtr(fstream_segment_mapper_t *fssm) { inm_s32_t index = 0; inm_s32_t npages = 0; iobuffer_t *iob; if (!fssm) return; if (fssm->buffer_cache_index) { for (index = 0; index < fssm->cache_size; index++) { iob = reset_iobuffer_cache_ptr(fssm, index, NULL); dbg("resetting iobuffer:%p", iob); iobuffer_put(iob); } npages = (fssm->cache_size * sizeof(iobuffer_t *))/ BITMAP_FILE_SEGMENT_SIZE + (((fssm->cache_size * sizeof(iobuffer_t *))%BITMAP_FILE_SEGMENT_SIZE)?1:0); index = 0; while (index != npages) { dbg("Freeing buffer page:%p", fssm->buffer_cache_index[index]); INM_KFREE((unsigned char*)fssm->buffer_cache_index[index], BITMAP_FILE_SEGMENT_SIZE, INM_KERNEL_HEAP); fssm->buffer_cache_index[index] = NULL; index++; } dbg("Freeing buffer index page:%p", fssm->buffer_cache_index); INM_KFREE(fssm->buffer_cache_index,BITMAP_FILE_SEGMENT_SIZE, INM_KERNEL_HEAP); fssm->buffer_cache_index = NULL; } INM_KFREE(fssm, sizeof(fstream_segment_mapper_t), INM_KERNEL_HEAP); fssm = NULL; } fstream_segment_mapper_t * fstream_segment_mapper_get(fstream_segment_mapper_t * 
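/* _get and _put form the usual reference-counting pair: _get bumps the
 * refcnt and _put releases the mapper through _dtr once it reaches zero */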
fssm) { INM_ATOMIC_INC(&fssm->refcnt); return fssm; } void fstream_segment_mapper_put(fstream_segment_mapper_t * fssm) { if (INM_ATOMIC_DEC_AND_TEST(&fssm->refcnt)) fstream_segment_mapper_dtr(fssm); } inm_s32_t fstream_segment_mapper_attach(fstream_segment_mapper_t *fssm, bitmap_api_t *bapi, inm_u64_t offset, inm_u64_t min_file_size, inm_u32_t segment_cache_limit) { inm_u64_t _min_file_size = min_file_size; inm_s32_t _rc = 1; inm_s32_t npages = 0; unsigned char **page_ptr = NULL; inm_u64_t index = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } fssm->bapi = bapi; bapi->fssm = fssm; fssm->segment_size = BITMAP_FILE_SEGMENT_SIZE; fssm->starting_offset = offset; INM_DO_DIV(_min_file_size, fssm->segment_size); fssm->cache_size = (inm_u32_t)(_min_file_size + 1); npages = (fssm->cache_size * sizeof(iobuffer_t *))/BITMAP_FILE_SEGMENT_SIZE + (((fssm->cache_size * sizeof(iobuffer_t *))%BITMAP_FILE_SEGMENT_SIZE)?1:0); INM_INIT_LIST_HEAD(&fssm->segment_list); fssm->nr_free_buffers = segment_cache_limit; /* Index page maintains page pointers in it */ fssm->buffer_cache_index = (unsigned char **) (INM_KMALLOC(BITMAP_FILE_SEGMENT_SIZE, INM_KM_SLEEP, INM_KERNEL_HEAP)); dbg("Allocated fssm->buffer_cache_index:%p \n ", fssm->buffer_cache_index); if (fssm->buffer_cache_index) { INM_MEM_ZERO(fssm->buffer_cache_index, BITMAP_FILE_SEGMENT_SIZE); page_ptr = fssm->buffer_cache_index; while (index != npages) { page_ptr[index] = (INM_KMALLOC(BITMAP_FILE_SEGMENT_SIZE, INM_KM_SLEEP, INM_KERNEL_HEAP)); dbg("Allocated buffer cache ptr:%p page_ptr: %p\n", page_ptr[index], page_ptr); if (!page_ptr[index]) { err("Error allocating memory for iobuffer pointers"); break; } INM_MEM_ZERO(page_ptr[index], BITMAP_FILE_SEGMENT_SIZE); index++; } /* error ? 
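 * index < npages here means only some of the per-segment pointer pages
 * were allocated; unwind them in reverse order and release the index
 * page as well.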
*/ if (index && index != npages) { err("Error allocating memory for iobuffer pointers "); /* release incompletely allocated pages */ do { index--; info("Freeing buffer page:%p", page_ptr[index]); INM_KFREE(page_ptr[index], BITMAP_FILE_SEGMENT_SIZE, INM_KERNEL_HEAP); page_ptr[index] = NULL; } while (index != 0); dbg("Freeing buffer index page:%p", page_ptr); INM_KFREE(page_ptr, BITMAP_FILE_SEGMENT_SIZE, INM_KERNEL_HEAP); page_ptr = NULL; fssm->buffer_cache_index = NULL; } else _rc = 0; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with return status = %d", _rc); } return _rc; } inm_s32_t fstream_segment_mapper_detach(fstream_segment_mapper_t *fssm) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } return 0; } inm_s32_t fstream_segment_mapper_read_and_lock(fstream_segment_mapper_t *fssm, inm_u64_t offset, unsigned char **return_iobuf_ptr, inm_u32_t *return_seg_size) { inm_s32_t ret = 0; iobuffer_t *iob = NULL; unsigned char *data_buffer = NULL; inm_u32_t data_size = 0; inm_u32_t buffer_index = 0; //struct inm_list_entry *entry = NULL; inm_u64_t _offset = offset; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered\n"); } if (!fssm || !fssm->buffer_cache_index) return -EINVAL; INM_DO_DIV(_offset, fssm->segment_size); buffer_index = (inm_u32_t)_offset; /* first check the cache for the correct buffer */ if (get_iobuffer_cache_ptr(fssm, buffer_index) == NULL) { /* it's not in the cache, read it */ fssm->nr_cache_miss++; /* the segment is not in memory, try to bring it in */ if (fssm->nr_free_buffers) { /* We can allocate few more ioBuffers */ iob = iobuffer_ctr(fssm->bapi, fssm->segment_size, buffer_index); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("iob allocation = %p", iob); } if (iob) fssm->nr_free_buffers--; } if (iob == NULL) { /* The io buffer at the tail contains the segment which is LRU and * this can be replaced with the required segment */ iob = inm_list_entry(fssm->segment_list.prev, iobuffer_t, list_entry); /* remove from head */ inm_list_del(&iob->list_entry); INM_INIT_LIST_HEAD(&iob->list_entry); /* Since we are evacuating this segment from the ioBuffer, * we need to flush it */ ret = iobuffer_sync_flush(iob); if (ret) { /* We are unsuccesful in evacuating this segment from the * IOBuffer, so insert this ioBuffer back at the tail of * the list */ inm_list_add_tail(&iob->list_entry, &fssm->segment_list); iob = NULL; } if (iob) { /* Things went right in evacuating the existing segment from * the IOBuffer, erase this segment off from the mapping in * bufferCache, as we will be evacuating this segment */ reset_iobuffer_cache_ptr(fssm, iobuffer_get_owner_index(iob), NULL); /* Reuse this iobuffer for new bufferIndex */ iobuffer_set_owner_index(iob, buffer_index); } } if (!iob) { ret = -ENOMEM; } else { iobuffer_set_fstream(iob, fssm->bapi->fs); iobuffer_set_foffset(iob, fssm->starting_offset + (buffer_index * fssm->segment_size)); ret = iobuffer_sync_read(iob); if (ret) { iobuffer_put(iob); iob = NULL; fssm->nr_free_buffers++; } else { reset_iobuffer_cache_ptr(fssm, buffer_index, iob); /* Since this is the segment that is referenced RIGHT NOW, * it will be placed at the head of the list to signify that * it is the Most-Recently referenced segment * *add most recent one to head */ inm_list_add(&iob->list_entry, &fssm->segment_list); } } } else { fssm->nr_cache_hits++; } if (get_iobuffer_cache_ptr(fssm, buffer_index) != NULL) { inm_u64_t _mod = 0; iobuffer_t 
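/* cache hit: the code below works out how far into the segment the
 * requested byte lies, returns a pointer at exactly that byte, and
 * reports the bytes remaining from there to the end of the segment */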
*iob_ptr = NULL; /* it's in the cache */ iob_ptr = get_iobuffer_cache_ptr(fssm, buffer_index); data_buffer = iob_ptr->buffer; /* make return pointer be at correct byte */ /* 64bit division is not directly allowed on linux 32bit kernels * data_buffer += (offset % (inm_u64_t) fssm->segment_size); **/ #ifdef INM_LINUX _offset = offset; _mod = do_div(_offset, fssm->segment_size); data_buffer += (inm_u32_t)_mod; #else _mod = (offset % (inm_u64_t) fssm->segment_size); _mod = (offset % (inm_u64_t) fssm->segment_size); data_buffer += _mod; #endif data_size = (inm_u32_t)(fssm->segment_size - _mod); iobuffer_lockbuffer(iob_ptr); ret = 0; } *return_iobuf_ptr = data_buffer; *return_seg_size = data_size; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with status %d and buffer = %p", data_size, data_buffer); } return ret; } int fstream_segment_mapper_unlock_and_mark_dirty(fstream_segment_mapper_t * fssm, inm_u64_t offset) { inm_u32_t buffer_index; inm_u64_t _offset = offset; iobuffer_t *iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DO_DIV(_offset, fssm->segment_size); buffer_index = (inm_u32_t)_offset; iob = get_iobuffer_cache_ptr(fssm, buffer_index); if (iob == NULL) return -EINVAL; iobuffer_setdirty(iob); iobuffer_unlockbuffer(iob); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; } inm_s32_t fstream_segment_mapper_unlock(fstream_segment_mapper_t * fssm, inm_u64_t offset) { inm_u32_t buffer_index; inm_u64_t _offset = offset; iobuffer_t *iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DO_DIV(_offset, fssm->segment_size); buffer_index = _offset; iob = get_iobuffer_cache_ptr(fssm, buffer_index); if (iob == NULL) return -EINVAL; iobuffer_unlockbuffer(iob); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; } inm_s32_t fstream_segment_mapper_flush(fstream_segment_mapper_t * fssm, inm_u64_t offset) { inm_s32_t ret = 0; inm_u32_t buffer_index; inm_u64_t _offset = offset; iobuffer_t *iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DO_DIV(_offset, fssm->segment_size); buffer_index = _offset; iob = get_iobuffer_cache_ptr(fssm, buffer_index); if (iob == NULL) return EINVAL; ret = iobuffer_sync_flush(iob); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with return value = %d", ret); } return ret; } inm_s32_t fstream_segment_mapper_sync_flush_all(fstream_segment_mapper_t * fssm) { inm_s32_t ret = 0, r = 0; inm_u32_t buffer_index; iobuffer_t *iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (fssm->buffer_cache_index == NULL) return 0; /* flush all the iobuffers */ for (buffer_index = 0; buffer_index < fssm->cache_size; buffer_index++) { iob = get_iobuffer_cache_ptr(fssm, buffer_index); if (!iob) continue; r = iobuffer_sync_flush(iob); if (r) ret = r; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } involflt-0.1.0/src/involflt.mak0000755000000000000000000000251514467303177015203 0ustar rootroot# SPDX-License-Identifier: GPL-2.0-only # Copyright (C) 2022 Microsoft Corporation # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the 
License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. ifeq (, $(WORKING_DIR)) WORKING_DIR=${shell pwd} endif ifeq (, ${BLD_INVOLFLT}) BLD_INVOLFLT=bld_involflt endif BLD_DIR=${WORKING_DIR}/${BLD_INVOLFLT}/ .PHONY: all clean all: @rm -rf ${BLD_INVOLFLT} @rm -rf ${BLD_DIR} @mkdir -p ${BLD_DIR} @cp ${WORKING_DIR}/*.[ch] ${BLD_DIR}/ @cp ${WORKING_DIR}/Makefile ${BLD_DIR}/ @cp ${WORKING_DIR}/uapi/*.[ch] ${BLD_DIR}/ @ln -s ${BLD_DIR} ${BLD_INVOLFLT} $(MAKE) debug=$(debug) KDIR=$(KDIR) WORKING_DIR=${WORKING_DIR} BLD_DIR=${BLD_DIR} TELEMETRY=${TELEMETRY} clean: @rm -rf ${BLD_INVOLFLT} @rm -rf ${BLD_DIR} involflt-0.1.0/src/md5.h0000755000000000000000000000303714467303177013502 0ustar rootroot/* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * Define functions used to compute MD5 checksum */ #ifndef MD5__H #define MD5__H #ifdef __alpha typedef unsigned int uint32; #else //typedef unsigned long uint32; /* changed to inm_u32_t, on 64 bit m/c unsigned long is 64 bit */ typedef inm_u32_t uint32; #endif #define MD5TEXTSIGLEN 32 typedef struct MD5Context { uint32 buf[4]; uint32 bits[2]; unsigned char in[64]; } MD5Context; typedef struct MD5Context MD5_CTX; #ifdef __cplusplus extern "C" { #endif void byteReverse(unsigned char *buf, unsigned longs); void MD5Transform(uint32 buf[4], uint32 in[16]); void MD5Init(MD5Context *ctx); void MD5Update(MD5Context *ctx, unsigned char *buf, unsigned len); void MD5Final(unsigned char digest[16], struct MD5Context *ctx); #ifdef __cplusplus } #endif #endif /* !MD5__H */ involflt-0.1.0/src/filter_lun.h0000755000000000000000000000770414467303177015165 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_FILTER_TARGET_H_ #define _INMAGE_FILTER_TARGET_H_ #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include #include #include #include "emd.h" struct _change_node; struct _volume_context; #define VOLUME_FILTERING_DISABLED_ATTR "VolumeFilteringDisabled" #define AT_LUN_TYPE "LunType" #define AT_BLOCK_SIZE "BlockSize" //#define isdigit(n) ((n) >= '0' && (n) <= '9') #define INM_SECTORS_PER 63 #define DEF_SECTORS 56 #define DEF_HEADS 255 #define MODES_ENSE_BUF_SZ 256 #define DBD 0x08 /* disable block descriptor */ #define WP 0x80 /* write protect */ #define DPOFUA 0x10 /* DPOFUA bit */ #define WCE 0x04 /* write cache enable */ #define BYTE 8 #ifndef _TARGET_VOLUME_CTX #define _TARGET_VOLUME_CTX #define TARGET_VOLUME_DIRECT_IO 0x00000001 typedef struct initiator_node { struct inm_list_head init_list; char *initiator_wwpn; /* Can be FC wwpn or iSCSI iqn name */ inm_u64_t timestamp; /* Last IO timestamp */ } initiator_node_t; typedef struct target_volume_ctx { target_context_t *vcptr; inm_u32_t bsize; inm_u64_t nblocks; inm_u32_t virt_id; inm_atomic_t remote_volume_refcnt; /* keep track of last write that the initiator has performed. */ char initiator_name[MAX_INITIATOR_NAME_LEN]; char pt_guid[INM_GUID_LEN_MAX]; inm_u32_t flags; /* list of "initiator_node_t" */ struct inm_list_head init_list; } target_volume_ctx_t; #endif /*_TARGET_VOLUME_CTX */ target_volume_ctx_t *alloc_target_volume_context(void); inm_s32_t register_filter_target(void); inm_s32_t unregister_filter_target(void); inm_s32_t register_bypass_target(void); inm_s32_t unregister_bypass_target(void); inm_s32_t init_emd(void); void exit_emd(void); void copy_iovec_data_to_data_pages(inm_wdata_t *, struct inm_list_head *); inm_s32_t process_at_lun_delete(struct file *, void __user *); static inline void update_initiator(target_volume_ctx_t* tvcptr, char *iname) { strncpy_s(tvcptr->initiator_name, MAX_INITIATOR_NAME_LEN, iname, MAX_INITIATOR_NAME_LEN - 1); tvcptr->initiator_name[MAX_INITIATOR_NAME_LEN - 1] = '\0'; return; } inm_s32_t filter_lun_create(char*, inm_u64_t, inm_u32_t, inm_u64_t); inm_s32_t filter_lun_delete(char*); inm_s32_t get_at_lun_last_write_vi(char*,char*); inm_s32_t get_at_lun_last_host_io_timestamp(AT_LUN_LAST_HOST_IO_TIMESTAMP *); inm_s32_t get_lun_query_data(inm_u32_t ,inm_u32_t*,LunData*); #ifdef SV_FABRICLUN_PERSISTENCE void fabric_set_volume_name(char*,char*); void fabric_recreate_at_lun(char*); inm_s32_t fabric_read_persistent_attr(char* fname, char* fabric_attr, void **buf, inm_s32_t len, int* bytes_read); inm_s32_t volume_filter_disabled(char*, int*); inm_s32_t read_at_lun_block_size(char*, inm_u64_t*); inm_s32_t read_at_lun_type(char*,LunType*); #endif /* SV_FABRICLUN_PERSISTENCE */ inm_s32_t inm_validate_fabric_vol(target_context_t *tcp, inm_dev_info_t const *dip); #endif /* _INMAGE_FILTER_TARGET_H_ */ involflt-0.1.0/src/work_queue.h0000755000000000000000000000615014467303177015202 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later 
version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_WORK_QUEUE_H #define _INMAGE_WORK_QUEUE_H #include "involflt-common.h" #define WQ_FLAGS_THREAD_SHUTDOWN 0x00000001 #define WQ_FLAGS_THREAD_WAKEUP 0x00000002 #define WQ_FLAGS_REORG_DP_ALLOC 0x00000004 #define MIN_FREE_PAGES_TO_FREE_LAST_WHOLE_SLAB_PERCENT 190 #define MIN_FREE_PAGES_TO_ALLOC_SLAB_PERCENT 10 #define WQE_FLAGS_THREAD_SHUTDOWN 0x00000001 typedef struct _workq { struct inm_list_head worker_queue_head; inm_completion_t new_event_completion; inm_wait_queue_head_t wakeup_event; inm_wait_queue_head_t shutdown_event; inm_atomic_t wakeup_event_raised; inm_atomic_t shutdown_event_raised; int (*worker_thread_routine)(void *); inm_spinlock_t lock; inm_u32_t flags; inm_s32_t worker_thread_initialized; inm_completion_t worker_thread_completion; #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0) struct task_struct *task; #endif } workq_t; typedef struct _wqentry { struct inm_list_head list_entry; void (* work_func)(struct _wqentry *); void *context; inm_atomic_t refcnt; inm_u32_t witem_type; inm_u32_t flags; inm_u32_t extra1; inm_u32_t extra2; } wqentry_t; enum { WITEM_TYPE_UNINITIALIZED = 0, WITEM_TYPE_OPEN_BITMAP = 1, WITEM_TYPE_BITMAP_WRITE = 2, WITEM_TYPE_START_BITMAP_READ = 3, WITEM_TYPE_CONTINUE_BITMAP_READ = 4, WITEM_TYPE_VOLUME_UNLOAD = 5, WITEM_TYPE_SYSTEM_SHUTDOWN = 6, WITEM_TYPE_TIMEOUT = 7, WITEM_TYPE_TELEMETRY_FLUSH = 8, }; typedef struct flt_timer { struct _wqentry ft_task; inm_timer_t ft_timer; } flt_timer_t; typedef void (*timeout_t)(wqentry_t *); inm_s32_t force_timeout(flt_timer_t *timer); inm_s32_t end_timer(flt_timer_t *timer); void start_timer(flt_timer_t *timer, int timeout_ms, timeout_t callback); inm_s32_t init_work_queue(workq_t *work_q, int (*worker_thread_function)(void *)); void cleanup_work_queue(workq_t *work_q); void init_work_queue_entry(wqentry_t *wqe); wqentry_t *alloc_work_queue_entry(inm_u32_t gfpmask); void cleanup_work_queue_entry(wqentry_t *wqe); void get_work_queue_entry(wqentry_t *wqe); void put_work_queue_entry(wqentry_t *wqe); inm_s32_t add_item_to_work_queue(workq_t *work_q, wqentry_t *wq_entry); int generic_worker_thread_function(void *context); int timer_worker(void *context); inm_s32_t wrap_reorg_datapool(void); #endif /* _INMAGE_WORK_QUEUE_H */ involflt-0.1.0/src/bitmap_operations.h0000755000000000000000000000410614467303177016542 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : bitmap_operations.h * * Description: This file contains bitmap mode implementation of the * filter driver. */ #ifndef _INMAGE_BITMAP_OPERATIONS_H #define _INMAGE_BITMAP_OPERATIONS_H #include "involflt-common.h" inm_u32_t find_number_of_bits_set(unsigned char *bit_buffer, inm_u32_t buffer_size_in_bits); inm_s32_t SetBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap, inm_u32_t bitsInRun, inm_u32_t bitOffset, inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged); inm_s32_t ClearBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap, inm_u32_t bitsInRun, inm_u32_t bitOffset, inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged); inm_s32_t InvertBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap, inm_u32_t bitsInRun, inm_u32_t bitOffset, inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged); inm_s32_t GetNextBitmapBitRun( unsigned char * bitBuffer, inm_u32_t totalBitsInBitmap, inm_u32_t * startingBitOffset, inm_u32_t * bitsInRun, inm_u32_t * bitOffset); #endif /* _INMAGE_BITMAP_OPERATIONS_H */ involflt-0.1.0/src/filter.c0000755000000000000000000022426414467303177014314 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "metadata-mode.h" #include "statechange.h" #include "tunable_params.h" #include "db_routines.h" #include "filter.h" #include "filter_host.h" #include "osdep.h" #include "telemetry-types.h" #include "telemetry.h" #include "errlog.h" #ifdef INM_LINUX #include "filter_lun.h" #endif #define MIN_INFOS_POOL_NUMBER 256 extern driver_context_t *driver_ctx; #ifdef IDEBUG_MIRROR_IO extern inm_s32_t inject_atio_err; extern inm_s32_t inject_ptio_err; extern inm_s32_t inject_vendorcdb_err; extern inm_s32_t clear_vol_entry_err; #endif #ifdef INM_LINUX extern inm_s32_t driver_state; #endif void do_stop_filtering(target_context_t *); extern void gen_bmaphdr_fname(char *volume_name, char *bhfname); extern inm_s32_t dev_validate(inm_dev_extinfo_t *, host_dev_ctx_t **); static void mv_stale_entry_to_dead_list(target_context_t *, struct inm_list_head *, struct inm_list_head *, struct inm_list_head *); static inm_s32_t inm_deref_all_vol_entry_list(struct inm_list_head *, target_context_t *); static void inm_mirror_done(inm_mirror_bufinfo_t *mbufinfo); static mirror_vol_entry_t * inm_get_healthy_vol_entry(target_context_t *tcp); static inm_mirror_atbuf * inm_freg_atbuf(inm_mirror_atbuf *atbuf_wrap); static inm_mirror_atbuf * inm_alloc_atbuf_wrap(inm_mirror_bufinfo_t *mbufinfo); static void inm_map_abs_off_ln(inm_buf_t *, target_context_t *, inm_u64_t *); inm_s32_t isrootdev(target_context_t *vcptr) { int isroot = 0; volume_lock(vcptr); if (vcptr->tc_flags & VCF_ROOT_DEV) isroot = 1; volume_unlock(vcptr); return isroot; } void init_volume_fully(target_context_t *tgt_ctxt, inm_dev_extinfo_t *dev_info) { info("Initialising %s fully", tgt_ctxt->tc_guid); if (strncmp(tgt_ctxt->tc_pname, dev_info->d_pname, INM_GUID_LEN_MAX)) { /* Root disk persistent name is different */ info("Updating %s pname: %s", tgt_ctxt->tc_guid, dev_info->d_pname); strcpy_s(tgt_ctxt->tc_pname, INM_GUID_LEN_MAX, dev_info->d_pname); } /* Check for bmap file in deprecated path */ fill_bitmap_filename_in_volume_context(tgt_ctxt); /* Read the tunables for this protected volume */ load_volume_params(tgt_ctxt); volume_lock(tgt_ctxt); tgt_ctxt->tc_flags &= ~VCF_VOLUME_STACKED_PARTIALLY; volume_unlock(tgt_ctxt); } int do_volume_stacking(inm_dev_extinfo_t *dev_info) { target_context_t *ctx, *tgt_ctx = NULL; host_dev_ctx_t *hdcp = NULL; mirror_vol_entry_t *vol_entry = NULL; inm_s32_t err = -1; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("do_volume_stacking: entered"); } ctx = target_context_ctr(); if (!ctx) { err = INM_ENOMEM; err("failed to alloc space for target context"); return err; } if ((err = dev_validate(dev_info, &hdcp))) { err("Failed to validate the device with error %d\n", err); target_context_dtr(ctx); return err; } retry: INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); /* check if sysfs entry with the same d_guid/scsi id has already added to sysfs * This is required to break out from Double stacking retry loop */ switch(dev_info->d_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_FABRIC_LUN: tgt_ctx = get_tgt_ctxt_from_uuid_locked(dev_info->d_guid); /* Another disk with same persistent name is not protected */ if 
(!tgt_ctx) { tgt_ctx = get_tgt_ctxt_from_name_nowait_locked(dev_info->d_pname); if (tgt_ctx) put_tgt_ctxt(tgt_ctx); } break; case FILTER_DEV_MIRROR_SETUP: tgt_ctx = get_tgt_ctxt_from_scsiid_locked(dev_info->d_src_scsi_id); break; default: err("Invalid case for filter dev type:%d", dev_info->d_type); } if (!tgt_ctx) { ctx->tc_priv = hdcp; INM_INIT_LIST_HEAD(&ctx->tc_list); INM_INIT_LIST_HEAD(&ctx->tc_nwo_dmode_list); INM_INIT_LIST_HEAD(&ctx->cdw_list); INM_INIT_SEM(&ctx->cdw_sem); ctx->tc_dev_type = dev_info->d_type; strcpy_s(ctx->tc_guid, INM_GUID_LEN_MAX, dev_info->d_guid); INM_INIT_LIST_HEAD(&(ctx->tc_src_list)); INM_INIT_LIST_HEAD(&(ctx->tc_dst_list)); switch (ctx->tc_dev_type) { case FILTER_DEV_FABRIC_LUN: case FILTER_DEV_HOST_VOLUME: strcpy_s(ctx->tc_pname, INM_GUID_LEN_MAX, dev_info->d_pname); break; case FILTER_DEV_MIRROR_SETUP: strcpy_s(ctx->tc_pname, INM_GUID_LEN_MAX, dev_info->d_src_scsi_id); /* check for boot time stacking ioctl for mirror setup */ if (dev_info->d_flags & MIRROR_VOLUME_STACKING_FLAG) { INM_BUG_ON(inm_list_empty(dev_info->src_list)); INM_BUG_ON(inm_list_empty(dev_info->dst_list)); list_change_head(&ctx->tc_src_list, dev_info->src_list); list_change_head(&ctx->tc_dst_list, dev_info->dst_list); dev_info->src_list = &ctx->tc_src_list; dev_info->dst_list = &ctx->tc_dst_list; /* Get the first entry in the list and set it as tcp mirror_dev */ vol_entry = inm_list_entry(ctx->tc_dst_list.next, mirror_vol_entry_t, next); INM_BUG_ON(!vol_entry); ctx->tc_vol_entry = vol_entry; INM_BUG_ON(!(ctx->tc_vol_entry)); } break; default: err("invalid filter dev type:%d", ctx->tc_dev_type); } ctx->tc_flags = VCF_FILTERING_STOPPED | VCF_VOLUME_CREATING; #ifdef INM_LINUX if (driver_state & DRV_LOADED_PARTIALLY) { ctx->tc_flags |= VCF_VOLUME_STACKED_PARTIALLY; ctx->tc_flags |= VCF_VOLUME_INITRD_STACKED; } #endif INM_INIT_WAITQUEUE_HEAD(&ctx->tc_waitq); INM_TRY_MODULE_GET(); add_tc_to_dc(ctx); INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); if ((err = tgt_ctx_common_init(ctx, dev_info))) { if (err == INM_EEXIST) { dbg("Double stacking failed for %s\n", dev_info->d_guid); INM_DELAY(3*INM_HZ); goto retry; } return err; } if((err = tgt_ctx_spec_init(ctx, dev_info))) { free_data_file_flt_ctxt(ctx); put_tgt_ctxt(ctx); return err; } volume_lock(ctx); ctx->tc_flags &= ~VCF_VOLUME_CREATING; if (dev_info->d_flags & HOST_VOLUME_STACKING_FLAG) { ctx->tc_flags |= VCF_VOLUME_BOOTTIME_STACKED; #ifdef INITRD_MODE /* If any disk is not protected in initrd and then protected at * boot time stacking stage, needs to be marked for resync. 
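* Writes that land in that window are never tracked by the driver, so the replica can no longer be assumed to match the disk without a resync.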
*/ if ((driver_state & DRV_LOADED_FULLY) && !(ctx->tc_flags & VCF_VOLUME_INITRD_STACKED)) { err("The disk %s is not protected in initrd", ctx->tc_guid); queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_UNCLEAN_SYS_BOOT, LINVOLFLT_ERR_IN_SYNC); } #endif } ctx->tc_flags |= VCF_IN_NWO; volume_unlock(ctx); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); add_disk_sess_to_dc(ctx); driver_ctx->total_prot_volumes_in_nwo++; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); if(!(dev_info->d_flags & (MIRROR_VOLUME_STACKING_FLAG | HOST_VOLUME_STACKING_FLAG))){ ctx->tc_dev_startoff = dev_info->d_startoff; } wake_up_tc_state(ctx); INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); info("Stacked: %s -> %s", dev_info->d_pname, dev_info->d_guid); err = 0; } else { /* get_tgt_ctxt_from_uuid_locked() returns without reference */ get_tgt_ctxt(tgt_ctx); INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); if (tgt_ctx->tc_flags & VCF_VOLUME_STACKED_PARTIALLY) { if (driver_state & DRV_LOADED_FULLY) init_volume_fully(tgt_ctx, dev_info); err = 0; } else { if ((err = inm_validate_tc_devattr(tgt_ctx, (inm_dev_info_t *)dev_info))) { if ((err = inm_is_upgrade_pname(tgt_ctx->tc_pname, dev_info->d_pname))) { err("Existing: %s -> %s, Requested %s -> %s", tgt_ctx->tc_pname, tgt_ctx->tc_guid, dev_info->d_pname, dev_info->d_guid); } else { dbg("PNAME: %s == %s",tgt_ctx->tc_pname, dev_info->d_pname); } } else { dbg("Volume Stacking already done for %s\n", dev_info->d_guid); err = 0; } } put_tgt_ctxt(tgt_ctx); target_context_dtr(ctx); inm_free_host_dev_ctx(hdcp); } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("do_volume_stacking: leaving err:%d", err); } return err; } /* driver state */ extern inm_s32_t inm_mod_state; void do_unstack_all() { target_context_t *tgt_ctxt; #ifdef INM_AIX int ipl = 0; #endif inm_mod_state |= INM_ALLOW_UNLOAD; INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); get_time_stamp(&(driver_ctx->dc_tel.dt_unstack_all_time)); retry: while(!inm_list_empty(&driver_ctx->tgt_list)){ tgt_ctxt = inm_list_entry(driver_ctx->tgt_list.next, target_context_t, tc_list); info("unstack_all : ctx:%p tc_guid:%s", tgt_ctxt, tgt_ctxt->tc_guid); if(check_for_tc_state(tgt_ctxt, 1)){ tgt_ctxt = NULL; goto retry; } #ifdef INM_AIX INM_SPIN_LOCK(&driver_ctx->tgt_list_lock, ipl); #endif volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_VOLUME_DELETING; tgt_ctxt->tc_filtering_disable_required = 1; get_time_stamp(&(tgt_ctxt->tc_tel.tt_user_stop_flt_time)); telemetry_set_dbs(&tgt_ctxt->tc_tel.tt_blend, DBS_FILTERING_STOPPED_BY_USER); volume_unlock(tgt_ctxt); #ifdef INM_AIX INM_SPIN_UNLOCK(&driver_ctx->tgt_list_lock, ipl); #endif if (driver_ctx->dc_root_disk == tgt_ctxt) driver_ctx->dc_root_disk = NULL; INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); if(tgt_ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) inm_scst_unregister(tgt_ctxt); tgt_ctx_force_soft_remove(tgt_ctxt); INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); } INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); } void start_notify_completion(void) { data_mode_cleanup_for_s2_exit(); } void involflt_completion(target_context_t *tgt_ctxt, write_metadata_t *wmdp, inm_wdata_t *wdatap, int lock_held) { INM_BUG_ON_TMP(tgt_ctxt); if (!lock_held) volume_lock(tgt_ctxt); tgt_ctxt->tc_bytes_tracked += wmdp->length; update_cx_session(tgt_ctxt, wmdp->length); if(!(tgt_ctxt->tc_flags & VCF_FILTERING_STOPPED)) { switch(tgt_ctxt->tc_cur_mode) { case FLT_MODE_DATA: 
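/* Data mode captures the write's payload itself; the default arm below falls back to metadata-only tracking of the changed range. */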
save_data_in_data_mode(tgt_ctxt, wmdp, wdatap); break; default: if (save_data_in_metadata_mode(tgt_ctxt, wmdp, wdatap)) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("Save meta data function failed:...\n"); } } break; } if (!tgt_ctxt->tc_bp->volume_bitmap && !(tgt_ctxt->tc_flags & VCF_OPEN_BITMAP_REQUESTED) && can_open_bitmap_file(tgt_ctxt, FALSE) && (!driver_ctx->sys_shutdown)) { request_service_thread_to_open_bitmap(tgt_ctxt); } } if (!lock_held) volume_unlock(tgt_ctxt); if(should_wakeup_s2(tgt_ctxt)) INM_WAKEUP_INTERRUPTIBLE(&tgt_ctxt->tc_waitq); } int do_start_filtering(inm_devhandle_t *idhp, inm_dev_extinfo_t *dev_infop) { inm_s32_t r = 0; target_context_t *ctx = NULL; inm_block_device_t *src_dev; switch(dev_infop->d_type) { case FILTER_DEV_HOST_VOLUME: r = validate_pname(dev_infop->d_pname); if (!r) r = do_volume_stacking(dev_infop); if (r){ err("volume:%s -> %s stacking failed with %d", dev_infop->d_guid, dev_infop->d_pname, r); } break; case FILTER_DEV_FABRIC_LUN: #ifdef INM_LINUX src_dev = NULL; src_dev = open_by_dev_path(dev_infop->d_guid, 0); if (src_dev) { close_bdev(src_dev, FMODE_READ); r = INM_EINVAL; err("Device:%s incorrectly sent as source device for AT LUN", dev_infop->d_guid); } #else INM_BUG_ON(1); #endif break; default: r = INM_EINVAL; err("Invalid source:%s device type:%d", dev_infop->d_guid, dev_infop->d_type); } if (r) return r; ctx = get_tgt_ctxt_from_uuid(dev_infop->d_guid); if (ctx) { inm_s32_t flt_on = FALSE; volume_lock(ctx); get_time_stamp(&(ctx->tc_tel.tt_start_flt_time_by_user)); if(is_target_filtering_disabled(ctx)) { ctx->tc_flags &= ~VCF_FILTERING_STOPPED; ctx->tc_flags |= VCF_IGNORE_BITMAP_CREATION; flt_on = TRUE; } /* Request the service thread to open the bitmap file, by which * the filtering mode & write order state would change and helps * in issuing tags for a disk (belonging to a volume group) with * no I/Os. */ if (!ctx->tc_bp->volume_bitmap && !(ctx->tc_flags & VCF_OPEN_BITMAP_REQUESTED) && can_open_bitmap_file(ctx, FALSE) && (!driver_ctx->sys_shutdown)){ request_service_thread_to_open_bitmap(ctx); } volume_unlock(ctx); if (flt_on) { set_int_vol_attr(ctx, VolumeFilteringDisabled, 0); set_unsignedlonglong_vol_attr(ctx, VolumeRpoTimeStamp, ctx->tc_hist.ths_start_flt_ts*HUNDREDS_OF_NANOSEC_IN_SECOND); } if (!idhp->private_data) { idhp->private_data = (void *)ctx; } else { put_tgt_ctxt(ctx); } if (dev_infop->d_type == FILTER_DEV_HOST_VOLUME && strcmp(dev_infop->d_mnt_pt, ctx->tc_mnt_pt)) { set_string_vol_attr(ctx, VolumeMountPoint, dev_infop->d_mnt_pt); } set_unsignedlonglong_vol_attr(ctx, VolumeIsDeviceMultipath, ((dev_infop->d_flags & INM_IS_DEVICE_MULTIPATH)? 
1:0)); set_unsignedlonglong_vol_attr(ctx, VolumeDiskFlags, dev_infop->d_flags); telemetry_clear_dbs(&ctx->tc_tel.tt_blend, DBS_FILTERING_STOPPED_BY_USER); telemetry_clear_dbs(&ctx->tc_tel.tt_blend, DBS_FILTERING_STOPPED_BY_KERNEL); r = 0; } else { r = INM_EINVAL; err("start filtering:%s failing with EINVAL", dev_infop->d_guid); } return r; } mirror_vol_entry_t * build_mirror_volume(inm_s32_t vol_length, void *ptr, struct inm_list_head *mirror_list_head, int keep_device_open, int lun_type) { mirror_vol_entry_t *vol_entry = NULL; char *vol_ptr = NULL; keep_device_open = 1; vol_ptr = (char*)INM_KMALLOC(INM_GUID_LEN_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!vol_ptr){ err("MIRROR Setup Failed: failed to allocate"); goto out; } if (!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)ptr, vol_length)) { err("MIRROR Setup Failed: Access violation while accessing \ element in volume list"); goto out; } if (INM_COPYIN(vol_ptr, ptr, vol_length)) { err("MIRROR Setup Failed: INM_COPYIN failed while accessing \ volumes"); goto out; } vol_ptr[vol_length-1] = '\0'; vol_entry = (mirror_vol_entry_t*)INM_KMALLOC(sizeof(mirror_vol_entry_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!vol_entry){ err("MIRROR Setup Failed: failed to allocate vol_entry"); goto out; } if(INM_PIN(vol_entry, sizeof(mirror_vol_entry_t))){ INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t), INM_KERNEL_HEAP); vol_entry = NULL; goto out; } INM_MEM_ZERO(vol_entry, sizeof(mirror_vol_entry_t)); strncpy_s(vol_entry->tc_mirror_guid, INM_GUID_LEN_MAX, vol_ptr, vol_length); vol_entry->vol_flags = lun_type; /* Find out the mirror device */ if(inm_get_mirror_dev(vol_entry)){ INM_UNPIN(vol_entry, sizeof(mirror_vol_entry_t)); INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t), INM_KERNEL_HEAP); vol_entry = NULL; goto out; } dbg("Mirror volume: Volume Path %s", vol_ptr); if (vol_entry && mirror_list_head) { vol_entry->vol_error = 0; vol_entry->vol_state = INM_VOL_ENTRY_ALIVE; vol_entry->vol_count = NO_SKIPS_AFTER_ERROR; INM_ATOMIC_SET(&vol_entry->vol_ref, 1); inm_list_add_tail(&vol_entry->next, mirror_list_head); } out: if (vol_ptr) { INM_KFREE(vol_ptr, INM_GUID_LEN_MAX, INM_KERNEL_HEAP); } return vol_entry; } void free_mirror_list(struct inm_list_head *list_head, int close_device) { struct inm_list_head *ptr = NULL,*nextptr = NULL; mirror_vol_entry_t *vol_entry; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("free_mirror_list: entered"); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("Traversing volume list"); } close_device = 1; inm_list_for_each_safe(ptr, nextptr, list_head) { inm_list_del(ptr); vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("vol:%s ",vol_entry->tc_mirror_guid); } inm_free_mirror_dev(vol_entry); vol_entry->vol_state = INM_VOL_ENTRY_FREED; INM_UNPIN(vol_entry, sizeof(mirror_vol_entry_t)); INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t), INM_KERNEL_HEAP); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("\nfree_mirror_list: leaving"); } } void print_mirror_list(struct inm_list_head *list_head) { struct inm_list_head *ptr = NULL,*nextptr = NULL; mirror_vol_entry_t *vol_entry; inm_list_for_each_safe(ptr, nextptr, list_head) { vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next); info(":%s: mirror_dev:%p",vol_entry->tc_mirror_guid, vol_entry->mirror_dev); } } inm_s32_t populate_volume_lists(struct inm_list_head *src_mirror_list_head, struct inm_list_head *dst_mirror_list_head, 
mirror_conf_info_t *mirror_infop) { inm_s32_t r, num_src_vols, num_dst_vols, i, vol_length; void *user_buf = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("entered"); } r = 0; /* Get the number of source volumes */ num_src_vols = mirror_infop->nsources; vol_length = INM_GUID_LEN_MAX; /* Form a list of source volume guids */ user_buf = mirror_infop->src_guid_list; for (i=1; i<=num_src_vols; i++) { if (!build_mirror_volume(vol_length, user_buf, src_mirror_list_head, 0, INM_PT_LUN)) { err("Error while building the source volume of mirror setup"); r = SRC_LUN_INVALID; break; } user_buf += vol_length; } if (r) { /* Release volume entries for source volume */ free_mirror_list(src_mirror_list_head, 0); return r; } /* Get the number of destination volumes */ num_dst_vols = mirror_infop->ndestinations; /* Form a list of AT LUN guids */ user_buf = mirror_infop->dst_guid_list; for (i=1; i<=num_dst_vols; i++) { if (!build_mirror_volume(vol_length, user_buf, dst_mirror_list_head, 1, INM_AT_LUN)) { err("Error while building the destination volume of mirror setup"); r = ATLUN_INVALID; break; } user_buf += vol_length; } if (r) { /* Release volume entries for source and destination volumes */ free_mirror_list(src_mirror_list_head, 0); free_mirror_list(dst_mirror_list_head, 1); return r; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("leaving r:%d",r); } return r; } void write_src_dst_attr(target_context_t *ctxt, mirror_conf_info_t *mirror_infop) { struct inm_list_head *ptr = NULL, *nextptr = NULL; mirror_vol_entry_t *vol_entry; char *buf = NULL; int buf_len = 0; char *sptr; int size = mirror_infop->nsources*INM_GUID_LEN_MAX; buf = INM_KMALLOC(size, INM_KM_SLEEP, INM_KERNEL_HEAP); INM_MEM_ZERO(buf, size); sptr = buf; volume_lock(ctxt); inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_src_list) { vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next); if (sptr!=buf) { snprintf(sptr, size-buf_len, ","); sptr += 1; buf_len += 1; } snprintf(sptr, size-buf_len, "%s", vol_entry->tc_mirror_guid); sptr += strlen(vol_entry->tc_mirror_guid); buf_len += strlen(vol_entry->tc_mirror_guid); } volume_unlock(ctxt); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("Formed source list:%s size:%d", buf, (int)strlen(buf)); } set_string_vol_attr(ctxt, VolumeMirrorSourceList, buf); if (buf) { INM_KFREE(buf, size, INM_KERNEL_HEAP); buf = NULL; } size = mirror_infop->ndestinations*INM_GUID_LEN_MAX; buf = INM_KMALLOC(size, INM_KM_SLEEP, INM_KERNEL_HEAP); INM_MEM_ZERO(buf, size); sptr = buf; volume_lock(ctxt); inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_dst_list) { vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next); if (sptr!=buf) { snprintf(sptr, size-buf_len, ","); sptr += 1; buf_len += 1; } snprintf(sptr, size-buf_len, "%s", vol_entry->tc_mirror_guid); sptr += strlen(vol_entry->tc_mirror_guid); buf_len += strlen(vol_entry->tc_mirror_guid); } volume_unlock(ctxt); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("Formed destination list:%s size:%d", buf, (int)strlen(buf)); } set_string_vol_attr(ctxt, VolumeMirrorDestinationList, buf); if (buf) { INM_KFREE(buf, size, INM_KERNEL_HEAP); } } int do_start_mirroring(inm_devhandle_t *idhp, mirror_conf_info_t *mirror_infop) { inm_s32_t r; inm_s32_t mark_resync = 0; target_context_t *ctx; inm_dev_extinfo_t *dev_infop = NULL; struct inm_list_head src_mirror_list_head, dst_mirror_list_head; struct inm_list_head del_src_mirror_list_head, 
del_dst_mirror_list_head; struct inm_list_head deref_mirror_list; struct inm_list_head *ptr1, *nextptr1; struct inm_list_head *ptr2, *nextptr2; mirror_vol_entry_t *vol_entry1, *vol_entry2; mirror_vol_entry_t *vol_entry = NULL; host_dev_t *hdc_dev = NULL; host_dev_ctx_t *hdcp = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("entered"); } ctx = NULL; r = INM_EINVAL; INM_INIT_LIST_HEAD(&src_mirror_list_head); INM_INIT_LIST_HEAD(&dst_mirror_list_head); INM_INIT_LIST_HEAD(&del_src_mirror_list_head); INM_INIT_LIST_HEAD(&del_dst_mirror_list_head); INM_INIT_LIST_HEAD(&deref_mirror_list); mirror_infop->d_status = populate_volume_lists(&src_mirror_list_head, &dst_mirror_list_head, mirror_infop); if (mirror_infop->d_status) { r = EINVAL; goto out; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("source volumes:%u scsi_id:%s: list: ", mirror_infop->nsources, mirror_infop->src_scsi_id); print_mirror_list(&src_mirror_list_head); info("destination volumes:%u scsi_id:%s: list: ", mirror_infop->ndestinations, mirror_infop->dst_scsi_id); print_mirror_list(&dst_mirror_list_head); } if (!mirror_infop->nsources) { err("Number of source devices is zero"); r = EINVAL; mirror_infop->d_status = SRC_LUN_INVALID; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } if (!mirror_infop->ndestinations) { err("Number of destination devices is zero"); r = EINVAL; mirror_infop->d_status = ATLUN_INVALID; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } if (mirror_infop->src_scsi_id[0] == ' ' || mirror_infop->src_scsi_id[0] == '\0') { err("Empty source scsi id:%s:",mirror_infop->src_scsi_id); r = EINVAL; mirror_infop->d_status = SRC_DEV_SCSI_ID_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } if (mirror_infop->dst_scsi_id[0] == ' ' || mirror_infop->dst_scsi_id[0] == '\0') { err("Empty destination scsi id:%s:",mirror_infop->dst_scsi_id); r = EINVAL; mirror_infop->d_status = DST_DEV_SCSI_ID_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } vol_entry = inm_list_entry(src_mirror_list_head.next, mirror_vol_entry_t, next); INM_BUG_ON(!vol_entry); dev_infop = (inm_dev_extinfo_t *)INM_KMALLOC(sizeof(inm_dev_extinfo_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!dev_infop) { err("INM_KMALLOC failed to allocate memory for inm_dev_info_t"); r = INM_ENOMEM; mirror_infop->d_status = DRV_MEM_ALLOC_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } INM_MEM_ZERO(dev_infop, sizeof(inm_dev_extinfo_t)); dev_infop->d_type = mirror_infop->d_type; strncpy_s(dev_infop->d_guid, INM_GUID_LEN_MAX, vol_entry->tc_mirror_guid, strlen(vol_entry->tc_mirror_guid)); strncpy_s(dev_infop->d_src_scsi_id, INM_GUID_LEN_MAX, mirror_infop->src_scsi_id, INM_MAX_SCSI_ID_SIZE); strncpy_s(dev_infop->d_dst_scsi_id, INM_GUID_LEN_MAX, mirror_infop->dst_scsi_id, INM_MAX_SCSI_ID_SIZE); dev_infop->d_guid[INM_GUID_LEN_MAX-1] = '\0'; dev_infop->d_src_scsi_id[strlen(mirror_infop->src_scsi_id)] = '\0'; dev_infop->d_dst_scsi_id[strlen(mirror_infop->dst_scsi_id)] = '\0'; dev_infop->d_flags = mirror_infop->d_flags; dev_infop->d_nblks = mirror_infop->d_nblks; dev_infop->d_bsize = mirror_infop->d_bsize; dev_infop->src_list = &src_mirror_list_head; dev_infop->dst_list = &dst_mirror_list_head; dev_infop->d_startoff = mirror_infop->startoff; if (mirror_infop->d_type == 
FILTER_DEV_MIRROR_SETUP) { r = do_volume_stacking(dev_infop); if (r) { mirror_infop->d_status = MIRROR_STACKING_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } } ctx = get_tgt_ctxt_from_scsiid(dev_infop->d_src_scsi_id); if (ctx) { inm_s32_t flt_on = FALSE; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); if (mirror_infop->d_flags & MIRROR_SETUP_PENDING_RESYNC_CLEARED_FLAG) { reset_volume_out_of_sync(ctx); } volume_lock(ctx); if (mirror_infop->d_flags & MIRROR_SETUP_PENDING_RESYNC_CLEARED_FLAG) { ctx->tc_flags &= ~VCF_MIRRORING_PAUSED; } if (r) { goto err_case; } if (is_target_filtering_disabled(ctx)) { ctx->tc_flags &= ~VCF_FILTERING_STOPPED; ctx->tc_flags |= VCF_IGNORE_BITMAP_CREATION; ctx->tc_flags |= VCF_DATA_FILES_DISABLED; ctx->tc_flags |= VCF_BITMAP_WRITE_DISABLED; ctx->tc_flags |= VCF_BITMAP_READ_DISABLED; flt_on = TRUE; } /* disable data files for mirror setup */ ctx->tc_flags |= VCF_DATA_FILES_DISABLED; /* In case of update, source devices may have new path in source list */ if (!inm_list_empty(&ctx->tc_src_list)) { /* Discard filtered paths from the list */ inm_list_for_each_safe(ptr1, nextptr1, &src_mirror_list_head) { vol_entry1 = inm_list_entry(ptr1, mirror_vol_entry_t, next); inm_list_for_each_safe(ptr2, nextptr2, &ctx->tc_src_list) { vol_entry2 = inm_list_entry(ptr2, mirror_vol_entry_t, next); if (!strcmp(vol_entry1->tc_mirror_guid, vol_entry2->tc_mirror_guid)) { inm_list_del(ptr1); inm_list_add_tail(ptr1, &del_src_mirror_list_head); break; } } } /* check if new path has been added for source device */ if (!inm_list_empty(&src_mirror_list_head)) { hdcp = (host_dev_ctx_t*)ctx->tc_priv; inm_list_for_each_safe(ptr2, nextptr2, &src_mirror_list_head) { vol_entry2 = inm_list_entry(ptr2, mirror_vol_entry_t, next); info("Mirror setup for new path:%s for scsi id:%s", vol_entry2->tc_mirror_guid, mirror_infop->src_scsi_id); /* Process vol_entry for adding to the list */ hdc_dev = (host_dev_t*)INM_KMALLOC(sizeof(host_dev_t), INM_KM_NOSLEEP, INM_KERNEL_HEAP); if (hdc_dev) { #if (defined(INM_LINUX)) req_queue_info_t *q_info; INM_MEM_ZERO(hdc_dev, sizeof(host_dev_t)); hdc_dev->hdc_dev = vol_entry2->mirror_dev->bd_inode->i_rdev; hdc_dev->hdc_disk_ptr = vol_entry2->mirror_dev->bd_disk; volume_unlock(ctx); q_info = alloc_and_init_qinfo(vol_entry2->mirror_dev, ctx); volume_lock(ctx); if (!q_info) { r = INM_EINVAL; mirror_infop->d_status = DRV_MEM_ALLOC_ERR; INM_KFREE(hdc_dev, sizeof(host_dev_t), INM_KERNEL_HEAP); err("Failed to allocate and initialize q_info during " "mirror setup"); break; } hdc_dev->hdc_req_q_ptr = q_info; INM_ATOMIC_INC(&q_info->vol_users); init_tc_kobj(vol_entry2->mirror_dev, &hdc_dev->hdc_disk_kobj_ptr); inm_list_add_tail(&hdc_dev->hdc_dev_list, &hdcp->hdc_dev_list_head); #endif #if (defined(INM_SOLARIS) || defined(INM_AIX)) INM_MEM_ZERO(hdc_dev, sizeof(host_dev_t)); hdc_dev->hdc_dev = *(vol_entry2->mirror_dev); inm_list_add_tail(&hdc_dev->hdc_dev_list, &hdcp->hdc_dev_list_head); #endif inm_list_del(ptr2); inm_list_add_tail(ptr2, &ctx->tc_src_list); } } mark_resync = 1; } } else if (inm_list_empty(&ctx->tc_src_list)) { list_change_head(&ctx->tc_src_list, &src_mirror_list_head); } mv_stale_entry_to_dead_list(ctx, &dst_mirror_list_head, &del_dst_mirror_list_head, &deref_mirror_list); if (!inm_list_empty(&dst_mirror_list_head)) { inm_list_splice_at_tail(&dst_mirror_list_head, &ctx->tc_dst_list); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("destination scsi_id:%s: list: ", mirror_infop->dst_scsi_id); 
print_mirror_list(&(ctx->tc_dst_list)); } /* Get the first entry in the list and set it as tcp mirror_dev */ vol_entry = inm_list_entry(ctx->tc_dst_list.next, mirror_vol_entry_t, next); INM_BUG_ON(!vol_entry); ctx->tc_vol_entry = vol_entry; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ #if (defined(INM_LINUX)) info("Setting up mirror for source:%s mirror dev:%s(%d,%d)", ctx->tc_guid, vol_entry->tc_mirror_guid, INM_GET_MAJOR((ctx->tc_vol_entry->mirror_dev)->bd_dev), INM_GET_MINOR((ctx->tc_vol_entry->mirror_dev)->bd_dev)); #endif #if (defined(INM_SOLARIS) || defined(INM_AIX)) info("Setting up mirror for source:%s mirror dev:%s(%d,%d)", ctx->tc_guid, vol_entry->tc_mirror_guid, INM_GET_MAJOR(*(ctx->tc_vol_entry->mirror_dev)), INM_GET_MINOR(*(ctx->tc_vol_entry->mirror_dev))); #endif } err_case: volume_unlock(ctx); inm_deref_all_vol_entry_list(&deref_mirror_list, ctx); INM_UP_READ(&(driver_ctx->tgt_list_sem)); if (!inm_list_empty(&del_src_mirror_list_head)) { free_mirror_list(&del_src_mirror_list_head, 0); } if (!inm_list_empty(&del_dst_mirror_list_head)) { free_mirror_list(&del_dst_mirror_list_head, 1); } if (r) { if (!inm_list_empty(&del_src_mirror_list_head)) { free_mirror_list(&src_mirror_list_head, 0); } if (!inm_list_empty(&dst_mirror_list_head)) { free_mirror_list(&dst_mirror_list_head, 1); } put_tgt_ctxt(ctx); goto out; } write_src_dst_attr(ctx, mirror_infop); if (flt_on) { set_int_vol_attr(ctx, VolumeFilteringDisabled, 0); } set_unsignedlonglong_vol_attr(ctx, VolumeIsDeviceMultipath, ((mirror_infop->d_flags & INM_IS_DEVICE_MULTIPATH)? 1:0)); set_int_vol_attr(ctx, VolumeDeviceVendor, mirror_infop->d_vendor); set_unsignedlonglong_vol_attr(ctx, VolumeDiskFlags, mirror_infop->d_flags); set_unsignedlonglong_vol_attr(ctx, VolumeDevStartOff, mirror_infop->startoff); if (mark_resync) { queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_NEW_SOURCE_PATH_ADDED, INM_EINVAL); } if (!idhp->private_data) { idhp->private_data = (void *)ctx; } else { put_tgt_ctxt(ctx); } } else { //need to change agent stuff r = INM_EINVAL; mirror_infop->d_status = MIRROR_STACKING_ERR; /* Release volume entries for source and destination volumes */ free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); } out: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("leaving r:%d status:%u",r, mirror_infop->d_status); } if(dev_infop){ INM_KFREE(dev_infop, sizeof(inm_dev_extinfo_t), INM_KERNEL_HEAP); } return r; } extern void inm_exchange_strategy(host_dev_ctx_t *); void do_stop_filtering(target_context_t *tgt_ctxt) { struct inm_list_head chg_nodes_hd ; struct inm_list_head *curp = NULL, *nxtp = NULL; volume_bitmap_t *vbmap = NULL; bitmap_info_t *bitmap = tgt_ctxt->tc_bp; inm_s32_t seton = FALSE; inm_u64_t PrevEndSequenceNumber; inm_u64_t PrevEndTimeStamp; inm_u32_t PrevSequenceIDforSplitIO; host_dev_ctx_t *hdcp = tgt_ctxt->tc_priv; INM_INIT_LIST_HEAD(&chg_nodes_hd); volume_lock(tgt_ctxt); if(!is_target_filtering_disabled(tgt_ctxt)) { /* pending change nodes are marked as orphaned nodes as we do * in clear diffs */ if (tgt_ctxt->tc_pending_confirm && !(tgt_ctxt->tc_pending_confirm->flags & CHANGE_NODE_ORPHANED)) { change_node_t *cnp = tgt_ctxt->tc_pending_confirm; cnp->flags |= CHANGE_NODE_ORPHANED; inm_list_del_init(&cnp->next); /* If change node on non write order data mode list, then * remove it from that list */ if (!inm_list_empty(&cnp->nwo_dmode_next)) { inm_list_del_init(&cnp->nwo_dmode_next); } deref_chg_node(cnp); } 
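/* Unlink any change nodes parked on the non write order data mode list; the nodes themselves are released along with the main node list below. */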
inm_list_for_each_safe(curp, nxtp, &tgt_ctxt->tc_nwo_dmode_list) { inm_list_del_init(curp); } list_change_head(&chg_nodes_hd, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_node_head); if(tgt_ctxt->tc_filtering_disable_required) tgt_ctxt->tc_flags |= VCF_FILTERING_STOPPED; tgt_ctxt->tc_cur_node = NULL; vbmap = bitmap->volume_bitmap; stop_filtering_device(tgt_ctxt, TRUE, &vbmap); info(" filtering mode = %d, stop filtering flag = %d, write order state = %d", tgt_ctxt->tc_cur_mode, is_target_filtering_disabled(tgt_ctxt), tgt_ctxt->tc_cur_wostate); seton = TRUE; } if(!tgt_ctxt->tc_filtering_disable_required){ seton = is_target_filtering_disabled(tgt_ctxt)? 1 : 0; PrevEndTimeStamp = tgt_ctxt->tc_PrevEndTimeStamp; PrevEndSequenceNumber = tgt_ctxt->tc_PrevEndSequenceNumber; PrevSequenceIDforSplitIO = tgt_ctxt->tc_PrevSequenceIDforSplitIO; }else{ seton = TRUE; PrevEndTimeStamp = 0; PrevEndSequenceNumber = 0; PrevSequenceIDforSplitIO = 0; } tgt_ctxt->tc_filtering_disable_required = 0; volume_unlock(tgt_ctxt); set_int_vol_attr(tgt_ctxt, VolumeFilteringDisabled, seton); set_unsignedlonglong_vol_attr(tgt_ctxt, VolumePrevEndTimeStamp, PrevEndTimeStamp); set_unsignedlonglong_vol_attr(tgt_ctxt, VolumePrevEndSequenceNumber, PrevEndSequenceNumber); set_unsignedlonglong_vol_attr(tgt_ctxt, VolumePrevSequenceIDforSplitIO, PrevSequenceIDforSplitIO); set_int_vol_attr(tgt_ctxt, VolumeDrainBlocked, 0); cleanup_change_nodes(&chg_nodes_hd, ecFilteringStopped); volume_lock(tgt_ctxt); tgt_ctxt->tc_stats.st_mode_switch_time = INM_GET_CURR_TIME_IN_SEC; tgt_ctxt->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; tgt_ctxt->tc_pending_changes = 0; tgt_ctxt->tc_bytes_pending_changes = 0; tgt_ctxt->tc_pending_wostate_data_changes = 0; tgt_ctxt->tc_pending_wostate_md_changes = 0; tgt_ctxt->tc_pending_wostate_bm_changes = 0; tgt_ctxt->tc_pending_wostate_rbm_changes = 0; tgt_ctxt->tc_commited_changes = 0; tgt_ctxt->tc_bytes_commited_changes = 0; bitmap = tgt_ctxt->tc_bp; bitmap->num_bitmap_open_errors = 0; bitmap->num_bitmap_clear_errors = 0; bitmap->num_bitmap_read_errors = 0; bitmap->num_bitmap_write_errors = 0; bitmap->num_changes_queued_for_writing = 0; bitmap->num_byte_changes_queued_for_writing = 0; bitmap->num_changes_read_from_bitmap = 0; bitmap->num_changes_written_to_bitmap = 0; bitmap->num_of_times_bitmap_written = 0; bitmap->num_of_times_bitmap_read = 0; telemetry_clear_dbs(&tgt_ctxt->tc_tel.tt_blend, DBS_DRIVER_RESYNC_REQUIRED); tgt_ctxt->tc_resync_required = 0; tgt_ctxt->tc_resync_indicated = 0; tgt_ctxt->tc_nr_out_of_sync = 0; tgt_ctxt->tc_out_of_sync_err_code = 0; tgt_ctxt->tc_out_of_sync_time_stamp = 0; tgt_ctxt->tc_out_of_sync_err_status = 0; tgt_ctxt->tc_nr_out_of_sync_indicated = 0; volume_unlock(tgt_ctxt); remove_disk_sess_from_dc(tgt_ctxt); /* resetting resync required flags if set any */ set_int_vol_attr(tgt_ctxt, VolumeResyncRequired, tgt_ctxt->tc_resync_required); set_int_vol_attr(tgt_ctxt, VolumeOutOfSyncErrorCode, tgt_ctxt->tc_out_of_sync_err_code); set_int_vol_attr(tgt_ctxt, VolumeOutOfSyncCount, tgt_ctxt->tc_nr_out_of_sync); set_longlong_vol_attr(tgt_ctxt, VolumeOutOfSyncTimestamp, tgt_ctxt->tc_out_of_sync_time_stamp); /* Exchange the strategy functions for host target context */ if (tgt_ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) { inm_exchange_strategy(hdcp); } /* Wait for all I/Os to complete */ while (INM_ATOMIC_READ(&tgt_ctxt->tc_nr_in_flight_ios)) { INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(tgt_ctxt->tc_wq_in_flight_ios, (INM_ATOMIC_READ(&tgt_ctxt->tc_nr_in_flight_ios) == 0), 
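/* re-evaluate the in-flight I/O count at least once every 3 seconds */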
3 * INM_HZ); } volume_lock(tgt_ctxt); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if(!inm_list_empty(&tgt_ctxt->tc_non_drainable_node_head)) { tgt_ctxt->tc_flags &= ~VCF_IO_BARRIER_ON; inm_list_splice_at_tail(&tgt_ctxt->tc_non_drainable_node_head, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_non_drainable_node_head); } #endif list_change_head(&chg_nodes_hd, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_node_head); volume_unlock(tgt_ctxt); cleanup_change_nodes(&chg_nodes_hd, ecFilteringStopped); if (vbmap) { close_bitmap_file(vbmap, TRUE); bitmap->volume_bitmap = NULL; } } void free_volume_list(tag_volinfo_t *vinfo, inm_s32_t num_vols) { inm_s32_t temp = 0; tag_volinfo_t *lptr = vinfo; if(!vinfo) return; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } while(temp < num_vols) { if(lptr->ctxt) { volume_lock(lptr->ctxt); lptr->ctxt->tc_flags &= ~VCF_VOLUME_TO_BE_FROZEN; volume_unlock(lptr->ctxt); put_tgt_ctxt(lptr->ctxt); } #ifdef INM_AIX if(lptr->chg_node){ inm_free_change_node(lptr->chg_node); } if(lptr->meta_page){ inm_page_t *pgp = lptr->meta_page; INM_UNPIN(pgp->cur_pg, INM_PAGESZ); INM_FREE_PAGE(pgp->cur_pg, INM_KERNEL_HEAP); pgp->cur_pg = NULL; INM_UNPIN(pgp, sizeof(inm_page_t)); INM_KFREE(pgp, sizeof(inm_page_t), INM_KERNEL_HEAP); } #endif temp++; lptr++; } INM_KFREE(vinfo, num_vols * sizeof(tag_volinfo_t), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } } void free_tag_list(tag_info_t *tag_list, inm_s32_t num_tags) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } INM_KFREE(tag_list, num_tags * sizeof(tag_info_t), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } } void add_tags(tag_volinfo_t *tag_volinfop, tag_info_t *tag_info, inm_s32_t num_tags, tag_guid_t *tag_guid, inm_s32_t index) { target_context_t *ctxt = tag_volinfop->ctxt; if (ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ inm_form_tag_cdb(ctxt, tag_info, num_tags); return; } volume_lock(ctxt); if(ctxt->tc_cur_wostate != ecWriteOrderStateData) { add_tag_in_non_stream_mode(tag_volinfop, tag_info, num_tags, tag_guid, index, TAG_COMMIT_NOT_PENDING, NULL); } else { if(!add_tag_in_stream_mode(tag_volinfop, tag_info, num_tags, tag_guid, index)) add_tag_in_non_stream_mode(tag_volinfop, tag_info, num_tags, tag_guid, index, TAG_COMMIT_NOT_PENDING, NULL); } volume_unlock(ctxt); INM_WAKEUP_INTERRUPTIBLE(&ctxt->tc_waitq); } inm_s32_t issue_tags(inm_s32_t vols, tag_volinfo_t *vol_list, inm_s32_t tags, tag_info_t *tag_list, inm_s32_t flags, tag_guid_t *tag_guid) { inm_s32_t num_vols = 0, index = 0, error = 0; tag_volinfo_t *temp = vol_list; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } /* Freeze all volumes */ if (!(flags & TAG_FS_FROZEN_IN_USERSPACE)) { freeze_volumes(vols, vol_list); } while(num_vols < vols) { if(temp->ctxt && temp->ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP && temp->ctxt->tc_cur_wostate != ecWriteOrderStateData) { dbg("One of the volumes is not in Write Order State Data"); error = INM_EAGAIN; goto unfreeze_all; } num_vols++; temp++; } temp = vol_list; num_vols = 0; /* Issue tags to all volumes with the supplied timestamp. 
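* The volumes are still frozen at this point (either above or by user space), so each one records the tag at a consistent point in its write stream.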
*/ while(num_vols < vols) { if(temp->ctxt) { add_tags(temp, tag_list, tags, tag_guid, index); } index++; num_vols++; temp++; } unfreeze_all: /* Unfreeze all volumes */ if (!(flags & TAG_FS_FROZEN_IN_USERSPACE)) { unfreeze_volumes(vols, vol_list); } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return error; } static_inline tag_volinfo_t * build_volume_list(inm_s32_t num_vols, void __INM_USER **user_buf, inm_s32_t *error) { inm_u16_t vol_length = 0; char vol_ptr[TAG_VOLUME_MAX_LENGTH]; inm_s32_t i = 0; target_context_t *ctxt = NULL; tag_volinfo_t *list = NULL, *list_temp = NULL; inm_s32_t vols = 0; #ifdef INM_AIX change_node_t *chg_node = NULL; inm_page_t *pgp = NULL; #endif if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } *error = 0; list = (tag_volinfo_t *)INM_KMALLOC((num_vols * sizeof(tag_volinfo_t)), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!list) { err("TAG Input Failed: INM_KMALLOC failed for volumes"); return NULL; } list_temp = list; INM_MEM_ZERO(list, (num_vols * sizeof(tag_volinfo_t))); while(i < num_vols) { if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)*user_buf, sizeof(unsigned short))){ err("TAG Input Failed: Access violation while accessing %d \ element in volume list", i); *error = -EFAULT; break; } if(INM_COPYIN(&vol_length, *user_buf, sizeof(unsigned short))) { err("TAG Input Failed: INM_COPYIN failed while accessing \ flags"); *error = -EFAULT; break; } if(vol_length >= TAG_VOLUME_MAX_LENGTH) { err("TAG Input Failed: volume length exceeds limit"); *error = -EFAULT; break; } *user_buf += sizeof(unsigned short); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)*user_buf, vol_length)){ err("TAG Input Failed: Access violation while accessing %d \ element in volume list", i); *error = -EFAULT; break; } if(INM_COPYIN(vol_ptr, *user_buf, vol_length)) { err("TAG Input Failed: INM_COPYIN failed while accessing \ flags"); *error = -EFAULT; break; } vol_ptr[vol_length] = '\0'; dbg("TAG: Volume Path %s", vol_ptr); ctxt = get_tgt_ctxt_from_uuid_nowait(vol_ptr); if(ctxt) { if(!is_target_filtering_disabled(ctxt) && !is_target_being_frozen(ctxt)) { list_temp->ctxt = ctxt; vols++; volume_lock(ctxt); list_temp->ctxt->tc_flags |= VCF_VOLUME_TO_BE_FROZEN; volume_unlock(ctxt); } else { put_tgt_ctxt(ctxt); } } else { dbg("TAG Input Failed: can't issue tag to %s", vol_ptr); list_temp->ctxt = NULL; } #ifdef INM_AIX chg_node = inm_alloc_change_node(NULL, INM_KM_NOSLEEP); if(!chg_node){ err("Failed to allocate change node"); *error = INM_ENOMEM; break; } INM_INIT_SEM(&chg_node->mutex); chg_node->mutext_initialized = 1; pgp = get_page_from_page_pool(0, 0, NULL); if(!pgp){ INM_DESTROY_SEM(&chg_node->mutex); inm_free_change_node(chg_node); err("Failed to allocate metadata page"); *error = INM_ENOMEM; break; } list_temp->chg_node = chg_node; list_temp->meta_page = pgp; #endif *user_buf += vol_length; i++; list_temp++; } if(*error || !vols) { free_volume_list(list, num_vols); return NULL; } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return list; } static_inline void init_tag_list(tag_info_t *tag_ptr, inm_s32_t num_tags) { inm_s32_t temp = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } while(temp < num_tags) { tag_ptr->tag_len = 0; tag_ptr++; temp++; } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } } static_inline tag_info_t * build_tag_list(inm_s32_t num_tags, void __INM_USER **user_buf, inm_s32_t *error) { inm_s32_t temp = 0; inm_u16_t tag_length = 0; inm_s32_t valid_tags = 0; tag_info_t *tag_ptr = 
NULL; tag_info_t *tag_list = NULL; *error = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } tag_list = (tag_info_t *)INM_KMALLOC((num_tags * sizeof(tag_info_t)), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_list) { err("Failed to allocated memory for tags"); *error = -ENOMEM; return NULL; } tag_ptr = tag_list; init_tag_list(tag_list, num_tags); while(temp < num_tags) { if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)*user_buf, sizeof(unsigned short))){ err("TAG Input Failed: Access violation while accessing %d \ element in tag list", temp); *error = -EFAULT; break; } if(INM_COPYIN(&tag_length, *user_buf, sizeof(unsigned short))) { err("TAG Input Failed: Access violation while accessing %d \ element in tag list", temp); *error = -EFAULT; break; } if((tag_length > TAG_MAX_LENGTH) || (tag_length <= 0)) { err("TAG Input Failed: Exceeded max limit size for each tag"); *error = -EFAULT; break; } *user_buf += sizeof(unsigned short); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)*user_buf, tag_length)){ err("TAG Input Failed: Access violation while accessing %d \ element in tag list", temp); *error = -EFAULT; break; } if(INM_COPYIN(tag_ptr->tag_name, *user_buf, tag_length)) { err("TAG Input Failed: Access violation while accessing %d \ element in tag list", temp); *error = -EFAULT; break; } tag_ptr->tag_len = tag_length; *user_buf += tag_length; temp++; tag_ptr++; valid_tags++; } if(*error || !valid_tags) { INM_KFREE(tag_list, (num_tags * sizeof(tag_info_t)), INM_KERNEL_HEAP); return NULL; } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return tag_list; } tag_guid_t * get_tag_from_guid(char *guid) { struct inm_list_head *ptr; tag_guid_t *tag_guid = NULL; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for(ptr = driver_ctx->tag_guid_list.next; ptr != &(driver_ctx->tag_guid_list); ptr = ptr->next, tag_guid = NULL) { tag_guid = inm_list_entry(ptr, tag_guid_t, tag_list); if(!strcmp(tag_guid->guid, guid)) break; } INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tag_guid; } void flt_cleanup_sync_tag(tag_guid_t *tag_guid) { if(!tag_guid) return; if(tag_guid->guid) INM_KFREE(tag_guid->guid, tag_guid->guid_len + 1, INM_KERNEL_HEAP); if(tag_guid->status) INM_KFREE(tag_guid->status, tag_guid->num_vols * sizeof(inm_s32_t), INM_KERNEL_HEAP); INM_DESTROY_WAITQUEUE_HEAD(&tag_guid->wq); INM_KFREE(tag_guid, sizeof(tag_guid_t), INM_KERNEL_HEAP); } int flt_process_tags(inm_s32_t num_vols, void __INM_USER **user_buf, inm_s32_t flags, tag_guid_t *tag_guid) { inm_u16_t num_tags = 0; tag_volinfo_t *vol_list = NULL; tag_info_t *tag_list = NULL; inm_s32_t error = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } /* Build volume list of target context info. Need to acquire tag_sem here when * the volume list is built. */ INM_DOWN(&driver_ctx->tag_sem); vol_list = build_volume_list(num_vols, user_buf, &error); if(error || !vol_list){ err("TAG Input Failed: Failed while building volume list"); goto release_sem; } /* Get total number of tags. */ if(INM_COPYIN(&num_tags, *user_buf, sizeof(unsigned short))) { err("TAG Input Failed: INM_COPYIN failed while accessing flags"); error = -EFAULT; goto release_sem; } if(num_tags <= 0) { err("TAG Input Failed: Number of tags can't be zero or negative"); goto release_sem; } *user_buf += sizeof(unsigned short); /* Now, build the tag list. 
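* Each tag arrives as an unsigned short length followed by the tag name; build_tag_list() validates every length before copying the name in from user space.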
*/ tag_list = build_tag_list(num_tags, user_buf, &error); if(error || !tag_list) { err("TAG Input Failed: Failed while building tag list"); goto release_sem; } if(tag_guid){ if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)*user_buf, num_vols * sizeof(inm_s32_t))){ err("TAG Input Failed: Access violation in getting guid status"); error = INM_EFAULT; free_tag_list(tag_list, num_tags); goto release_sem; } } error = issue_tags(num_vols, vol_list, num_tags, tag_list, flags, tag_guid); free_tag_list(tag_list, num_tags); release_sem: INM_UP(&driver_ctx->tag_sem); free_volume_list(vol_list, num_vols); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return error; } inm_s32_t is_flt_disabled(char *pname) { char *s = NULL; inm_s32_t status = 0; s = INM_KMEM_CACHE_ALLOC_PATH(names_cachep, INM_KM_SLEEP, INM_PATH_MAX, INM_KERNEL_HEAP); INM_BUG_ON(!s); strncpy_s(s, INM_PATH_MAX, pname, INM_PATH_MAX); strcat_s(&s[0], INM_PATH_MAX, "/VolumeFilteringDisabled"); read_value_from_file(s, &status); dbg("volume filter disabled flag = %d\n", status); INM_KMEM_CACHE_FREE_PATH(names_cachep, s,INM_KERNEL_HEAP); s = NULL; return ((status != 0) ? TRUE : FALSE); } inm_s32_t init_boottime_stacking(void) { return 0; } void load_bal_rr(target_context_t *ctx, inm_u32_t io_sz) { struct inm_list_head *ptr, *hd, *nextptr; mirror_vol_entry_t *vol_entry = NULL; mirror_vol_entry_t *init_vol_entry = ctx->tc_vol_entry; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("entered"); } INM_BUG_ON(!init_vol_entry); if(!init_vol_entry){ goto out; } if(io_sz){ UPDATE_ATIO_SEND(init_vol_entry, io_sz); ctx->tc_commited_changes++; ctx->tc_bytes_commited_changes += io_sz; } hd = &(init_vol_entry->next); inm_list_for_each_safe(ptr, nextptr, hd){ if(ptr == &(ctx->tc_dst_list)){ continue; } vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); if(vol_entry->vol_error){ vol_entry->vol_io_skiped++; continue; } ctx->tc_vol_entry = vol_entry; break; } out: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("exiting"); } return; } /* Function to discard dead entries from tc_dst_list and add only new path * entries to tc_dst_list */ static void mv_stale_entry_to_dead_list(target_context_t *tcp, struct inm_list_head *new_dst_listp, struct inm_list_head *delete_dst_list, struct inm_list_head *deref_list) { struct inm_list_head *ptr, *hd, *nextptr; struct inm_list_head *ptr_new, *hd_new, *nextptr_new; mirror_vol_entry_t *vol_entry = NULL; mirror_vol_entry_t *vol_entry_new = NULL; inm_s32_t mv_entry = 1; hd = &(tcp->tc_dst_list); inm_list_for_each_safe(ptr, nextptr, hd) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); hd_new = new_dst_listp; /* scan through destination list from user space (reconfiguration) * and isolate common (already existing in tc_dst_list) from it * to delete_dst_list */ inm_list_for_each_safe(ptr_new, nextptr_new, hd_new) { vol_entry_new = inm_container_of(ptr_new, mirror_vol_entry_t, next); if (!strncmp(vol_entry->tc_mirror_guid, vol_entry_new->tc_mirror_guid, INM_GUID_LEN_MAX)) { inm_list_del(ptr_new); inm_list_add_tail(ptr_new, delete_dst_list); vol_entry->vol_error = 0; mv_entry = 0; break; } } /* Uncommon entries are dead entries and are moved to dead list */ if (mv_entry) { inm_list_del(ptr); inm_list_add_tail(ptr, deref_list); } mv_entry = 1; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("tcp destination list: "); print_mirror_list(&(tcp->tc_dst_list)); info("delete destination list: "); 
print_mirror_list(delete_dst_list); info("input new destination list: "); print_mirror_list(new_dst_listp); info("deref list list: "); print_mirror_list(deref_list); } } static inm_s32_t inm_deref_all_vol_entry_list(struct inm_list_head *deref_list, target_context_t *tcp) { struct inm_list_head *ptr, *hd, *nextptr; mirror_vol_entry_t *vol_entry = NULL; inm_s32_t error = 0; hd = deref_list; inm_list_for_each_safe(ptr, nextptr, hd) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); while(INM_ATOMIC_READ(&(vol_entry->vol_ref)) > 1){ INM_DELAY(1 * INM_HZ); } } free_mirror_list(deref_list, 1); return error; } inm_s32_t ptio_cancel_send(target_context_t *tcp, inm_u64_t write_off, inm_u32_t write_len) { unsigned char cdbp[WRITE_CANCEL_CDB_LEN]; inm_s32_t ret = 0; dbg("PTIO cancel issued at <%llu, %u>",write_off, write_len); cdbp[0] = WRITE_CANCEL_CDB; cdbp[1] = 0x0; cdbp[2] = (write_off >> 56) & 0xFF; cdbp[3] = (write_off >> 48) & 0xFF; cdbp[4] = (write_off >> 40) & 0xFF; cdbp[5] = (write_off >> 32) & 0xFF; cdbp[6] = (write_off >> 24) & 0xFF; cdbp[7] = (write_off >> 16) & 0xFF; cdbp[8] = (write_off >> 8) & 0xFF; cdbp[9] = (write_off) & 0xFF; cdbp[10] = (write_len >> 24 ) & 0xFF; cdbp[11] = (write_len >> 16) & 0xFF; cdbp[12] = (write_len >> 8) & 0xFF; cdbp[13] = (write_len) & 0xFF; cdbp[14] = 0; cdbp[15] = 0; ret = inm_all_AT_cdb_send(tcp, cdbp, WRITE_CANCEL_CDB_LEN, 1, NULL, 0, 0); if(ret){ err("USCSI ioctl failed for cmd %x retval %u", WRITE_CANCEL_CDB_LEN, ret); goto out; } INM_ATOMIC_INC(&(tcp->tc_stats.tc_write_cancel)); out: return ret; } inm_iodone_t INM_MIRROR_IODONE(inm_pt_mirror_iodone, pt_bp, done, error) { inm_mirror_bufinfo_t *mbufinfo = (inm_mirror_bufinfo_t *) inm_container_of(pt_bp, inm_mirror_bufinfo_t, imb_pt_buf); target_context_t *tcp = mbufinfo->imb_privp; INM_MORE_IODONE_EXPECTED(pt_bp); INM_IMB_ERROR_SET(mbufinfo->imb_pt_err, error); INM_IMB_DONE_SET(mbufinfo->imb_pt_done, done); if(is_target_mirror_paused(tcp)){ goto ptio_done; } INM_INJECT_ERR(inject_ptio_err, error, pt_bp); if( INM_BUF_FAILED(pt_bp, error) ) { volume_lock(tcp); if( INM_BUF_RESID(pt_bp, done) == mbufinfo->imb_io_sz ){ mbufinfo->imb_flag |= INM_PTIO_FULL_FAILED; } mbufinfo->imb_flag |= INM_PTIO_CANCEL_PENDING; volume_unlock(tcp); } ptio_done: if(INM_ATOMIC_DEC_RET(&(mbufinfo->imb_done_cnt)) == INM_ALL_IOS_DONE){ inm_mirror_done(mbufinfo); } return INM_RET_IODONE; } inm_iodone_t INM_MIRROR_IODONE(inm_at_mirror_iodone, at_bp, done, error) { inm_mirror_atbuf *atbuf_wrap = (inm_mirror_atbuf *) inm_container_of(at_bp, inm_mirror_atbuf, imb_atbuf_buf); mirror_vol_entry_t *vol_entry = atbuf_wrap->imb_atbuf_vol_entry; inm_mirror_bufinfo_t *mbufinfo = atbuf_wrap->imb_atbuf_imbinfo; target_context_t *tcp = mbufinfo->imb_privp; INM_MORE_IODONE_EXPECTED(at_bp); if(is_target_mirror_paused(tcp)){ goto atio_done; } INM_INJECT_ERR(inject_atio_err, error, at_bp); INM_IMB_DONE_SET(atbuf_wrap->imb_atbuf_done, done); if( INM_BUF_FAILED(at_bp, error) ){ add_item_to_work_queue(&driver_ctx->wqueue, &(atbuf_wrap->imb_atbuf_wqe)); goto out; } atio_done: INM_DEREF_VOL_ENTRY(vol_entry, tcp); if(INM_ATOMIC_DEC_RET(&(mbufinfo->imb_done_cnt)) == INM_ALL_IOS_DONE){ inm_mirror_done(mbufinfo); } out: return INM_RET_IODONE; } static void inm_mirror_done(inm_mirror_bufinfo_t *mbufinfo) { target_context_t *tcp = mbufinfo->imb_privp; mirror_vol_entry_t *vol_entry = mbufinfo->imb_vol_entry; inm_mirror_atbuf *atbuf_wrap = NULL; struct inm_list_head *ptr = NULL, *nextptr = NULL; inm_u32_t failed_atbufs = 0; #ifndef INM_LINUX 
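	/*
	 * Non-Linux builds appear to use these locals for the
	 * INM_MIRROR_INFO_RETURN() call in the done: path below.
	 */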
host_dev_ctx_t *hdcp = tcp->tc_priv; inm_s32_t flag; #endif INM_BUG_ON(INM_ATOMIC_READ(&(mbufinfo->imb_done_cnt)) != 0); /* * No need to protect inm_flag as nobody else would be refering it * as both the IO has completed. */ if(mbufinfo->imb_flag & INM_PTIO_CANCEL_PENDING){ if(mbufinfo->imb_flag & INM_PTIO_FULL_FAILED){ inm_list_for_each_safe(ptr, nextptr, &mbufinfo->imb_atbuf_list) { atbuf_wrap = inm_list_entry(ptr, inm_mirror_atbuf, imb_atbuf_this); if(atbuf_wrap->imb_atbuf_flag & INM_ATBUF_FULL_FAILED){ failed_atbufs++; } } if(failed_atbufs == mbufinfo->imb_atbuf_cnt){ goto done; } } INM_BUG_ON(mbufinfo->imb_flag & INM_PTIO_CANCEL_SENT); add_item_to_work_queue(&driver_ctx->wqueue, &(mbufinfo->ptio_can_wqe)); goto out; } done: INM_PT_IODONE(mbufinfo); INM_MIRROR_INFO_RETURN(hdcp, mbufinfo, flag); put_tgt_ctxt(tcp); INM_ATOMIC_DEC(&tcp->tc_nr_in_flight_ios); INM_DEREF_VOL_ENTRY(vol_entry, tcp); out: return; } void inm_atio_retry(wqentry_t *wqe) { inm_mirror_atbuf *atbuf_wrap = (inm_mirror_atbuf *) (wqe->context); inm_mirror_bufinfo_t *mbufinfo = atbuf_wrap->imb_atbuf_imbinfo; mirror_vol_entry_t *first_vol_entry = atbuf_wrap->imb_atbuf_vol_entry; mirror_vol_entry_t *vol_entry = NULL; inm_buf_t *at_bp = &(atbuf_wrap->imb_atbuf_buf); target_context_t *tcp = mbufinfo->imb_privp; inm_s32_t err = 0; volume_lock(tcp); first_vol_entry->vol_error = 1; UPDATE_ATIO_FAILED(first_vol_entry, mbufinfo->imb_io_sz); INM_DEREF_VOL_ENTRY(first_vol_entry, tcp); if( INM_BUF_RESID(at_bp, atbuf_wrap->imb_atbuf_done) == atbuf_wrap->imb_atbuf_iosz ){ atbuf_wrap->imb_atbuf_flag |= INM_ATBUF_FULL_FAILED; } else { atbuf_wrap->imb_atbuf_flag |= INM_ATBUF_PARTIAL_FAILED; } if(mbufinfo->imb_flag & INM_PTIO_CANCEL_PENDING){ /* * No need to resend the ATIO as PTIO cancel anyway will happen. */ goto atio_done; } err = 1; vol_entry = inm_get_healthy_vol_entry(tcp); if(!vol_entry){ goto atio_done; } INM_REF_VOL_ENTRY(vol_entry); UPDATE_ATIO_SEND(vol_entry, atbuf_wrap->imb_atbuf_iosz); atbuf_wrap->imb_atbuf_flag &= ~(INM_ATBUF_FULL_FAILED | INM_ATBUF_PARTIAL_FAILED); volume_unlock(tcp); atbuf_wrap->imb_atbuf_vol_entry = vol_entry; atbuf_wrap = inm_freg_atbuf(atbuf_wrap); at_bp = &(atbuf_wrap->imb_atbuf_buf); INM_DEREF_VOL_ENTRY(vol_entry, tcp); inm_issue_atio(at_bp, vol_entry); out: return; atio_done: volume_unlock(tcp); if(INM_ATOMIC_DEC_RET(&(mbufinfo->imb_done_cnt)) == INM_ALL_IOS_DONE){ inm_mirror_done(mbufinfo); } if(err){ dbg("AT IO retry Buf: Offset = %lld, count = %u", (long long int)INM_BUF_SECTOR(at_bp), INM_BUF_COUNT(at_bp)); queue_worker_routine_for_set_volume_out_of_sync(tcp, ERROR_TO_REG_AT_PATHS_FAILURE, 0); } goto out; } void issue_ptio_cancel_cdb(wqentry_t *wqe) { inm_mirror_bufinfo_t *mbufinfo = (inm_mirror_bufinfo_t *)(wqe->context); target_context_t *tcp = mbufinfo->imb_privp; inm_u64_t cancel_io_offset = mbufinfo->imb_atbuf_absoff; INM_BUG_ON(INM_ATOMIC_READ(&(mbufinfo->imb_done_cnt)) != 0); inm_map_abs_off_ln(&(mbufinfo->imb_pt_buf), tcp, &cancel_io_offset); if(ptio_cancel_send(tcp, mbufinfo->imb_io_off, mbufinfo->imb_io_sz)){ queue_worker_routine_for_set_volume_out_of_sync(tcp, ERROR_TO_REG_PTIO_CANCEL_FAILED, 2); } /* * No need to protect inm_flag as nobody else would be refering it * as both the IO has completed. We would come here from inm_mirror_done() only. 
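 * Setting INM_PTIO_CANCEL_SENT and clearing INM_PTIO_CANCEL_PENDING before
 * re-entering inm_mirror_done() ensures the cancel CDB is issued at most
 * once per buffer.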
*/ mbufinfo->imb_flag |= INM_PTIO_CANCEL_SENT; mbufinfo->imb_flag &= ~INM_PTIO_CANCEL_PENDING; inm_mirror_done(mbufinfo); return; } mirror_vol_entry_t * get_cur_vol_entry(target_context_t *tcp, inm_u32_t io_sz) { mirror_vol_entry_t *vol_entry = NULL; if(tcp->tc_dev_type != FILTER_DEV_MIRROR_SETUP){ goto out; } if ((vol_entry = tcp->tc_vol_entry)) { INM_REF_VOL_ENTRY(vol_entry); } INM_BUG_ON(!vol_entry); load_bal_rr(tcp, io_sz); out: return vol_entry; } static mirror_vol_entry_t * inm_get_healthy_vol_entry(target_context_t *tcp) { struct inm_list_head *ptr, *nextptr; mirror_vol_entry_t *vol_entry = NULL; inm_list_for_each_safe(ptr, nextptr, &(tcp->tc_dst_list)){ vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); if(vol_entry->vol_error){ vol_entry = NULL; continue; } break; } return vol_entry; } int inm_save_mirror_bufinfo(target_context_t *tcp, inm_mirror_bufinfo_t **imbinfopp, inm_buf_t **org_bpp, mirror_vol_entry_t *vol_entry) { inm_buf_t *bp = *org_bpp; inm_buf_t *prev_at_buf = NULL; inm_buf_t *first_atbuf = NULL; host_dev_ctx_t *hdcp = NULL; inm_buf_t *newbp_hdp = NULL, *newbp_tailp = NULL; inm_mirror_bufinfo_t *newbp = NULL; int ret = INM_ENOMEM; inm_u32_t idx; inm_u32_t count = 0; inm_u32_t max_xfer_size = INM_MAX_XFER_SZ(vol_entry, bp); inm_u32_t no_atbufs = (INM_BUF_COUNT(bp) + max_xfer_size - 1) / max_xfer_size; inm_u64_t io_sz = 0; inm_list_head_t tmp_atbuf_list; inm_mirror_atbuf *atbuf_wrap = NULL; hdcp = tcp->tc_priv; do { #if (defined(IDEBUG) || defined(IDEBUG_BMAP)) dbg("write entry pt : off = %llu, sz = %d", INM_BUF_SECTOR(bp), INM_BUF_COUNT(bp)); #endif newbp = NULL; INM_INIT_LIST_HEAD(&tmp_atbuf_list); newbp = inm_get_imb_cached(hdcp); if(newbp){ goto out; } newbp = INM_KMEM_CACHE_ALLOC(driver_ctx->dc_host_info.mirror_bioinfo_cache, INM_KM_NOIO); if (!newbp){ err("Failed to allocate inm_mirror_bufinfo_t object"); goto error; } INM_MEM_ZERO(newbp, sizeof(inm_mirror_bufinfo_t)); INM_ATOMIC_SET(&(newbp->ptio_can_wqe.refcnt), 1); INM_INIT_LIST_HEAD(&(newbp->imb_atbuf_list)); newbp->ptio_can_wqe.context = newbp; newbp->ptio_can_wqe.work_func = issue_ptio_cancel_cdb; out: newbp->imb_atbuf_cnt = 0; INM_ATOMIC_SET(&(newbp->imb_done_cnt), 2); newbp->imb_org_bp = bp; first_atbuf = NULL; inm_bufoff_to_fldisk(bp, tcp, &(newbp->imb_atbuf_absoff)); inm_list_splice_init(&(newbp->imb_atbuf_list), &tmp_atbuf_list); do{ if(!inm_list_empty(&tmp_atbuf_list)){ atbuf_wrap = inm_list_entry(tmp_atbuf_list.next, inm_mirror_atbuf, imb_atbuf_this); if (atbuf_wrap) { inm_list_del(&atbuf_wrap->imb_atbuf_this); INM_INIT_LIST_HEAD(&(atbuf_wrap->imb_atbuf_this)); atbuf_wrap->imb_atbuf_flag = 0; goto populate_atbuf; } } if(!(atbuf_wrap = inm_alloc_atbuf_wrap(newbp))){ err("failed to allocate atbug_wrap"); goto free_newbp; } INM_BUG_ON(!(atbuf_wrap->imb_atbuf_imbinfo)); populate_atbuf: inm_list_add_tail(&(atbuf_wrap->imb_atbuf_this), &(newbp->imb_atbuf_list)); inm_prepare_atbuf(atbuf_wrap, bp, vol_entry, newbp->imb_atbuf_cnt); if(!first_atbuf){ first_atbuf = &(atbuf_wrap->imb_atbuf_buf); } newbp->imb_atbuf_cnt++; io_sz += INM_BUF_COUNT((&atbuf_wrap->imb_atbuf_buf)); }while(no_atbufs > newbp->imb_atbuf_cnt); inm_free_atbuf_list(&(tmp_atbuf_list)); get_tgt_ctxt(tcp); /* this ref is handled with bufinfo */ idx = inm_comp_io_bkt_idx(INM_BUF_COUNT(bp)); INM_ATOMIC_INC(&tcp->tc_stats.io_pat_writes[idx]); memcpy_s(&newbp->imb_pt_buf, sizeof(*bp), bp, sizeof(*bp)); newbp->imb_io_sz = INM_BUF_COUNT((&atbuf_wrap->imb_atbuf_buf)); newbp->imb_vol_entry = vol_entry; newbp->imb_io_off = INM_BUF_SECTOR(bp); 
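	/*
	 * Route the PT copy's completion through inm_pt_mirror_iodone(); this
	 * pairs with the imb_done_cnt of 2 set above, so inm_mirror_done()
	 * runs only once both the PT and AT sides have completed.
	 */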
INM_SET_ENDIO_FN((&newbp->imb_pt_buf), inm_pt_mirror_iodone); newbp->imb_flag = 0; newbp->imb_privp = tcp; INM_ATOMIC_INC(&tcp->tc_nr_bufs_pending); INM_ATOMIC_INC(&tcp->tc_nr_in_flight_ios); if (newbp_hdp) { INM_CHAIN_BUF((inm_buf_t *)newbp, newbp_tailp); INM_CHAIN_BUF(first_atbuf, prev_at_buf); INM_REF_VOL_ENTRY(vol_entry); } else { newbp_hdp = (inm_buf_t *)newbp; } prev_at_buf = &(atbuf_wrap->imb_atbuf_buf); count += newbp->imb_atbuf_cnt; newbp_tailp = (inm_buf_t *)newbp; bp = INM_GET_FWD_BUF(bp); } while (bp); *org_bpp = newbp_hdp; *imbinfopp = (inm_mirror_bufinfo_t *)newbp_hdp; INM_UPDATE_VOL_ENTRY_STAT(tcp, vol_entry, count, io_sz); ret = 0; goto exit; free_newbp: INM_BUG_ON(!inm_list_empty(&tmp_atbuf_list)); INM_KMEM_CACHE_FREE(driver_ctx->dc_host_info.mirror_bioinfo_cache, newbp); error: while (newbp_hdp) { INM_ATOMIC_DEC(&tcp->tc_nr_bufs_pending); INM_ATOMIC_DEC(&tcp->tc_nr_in_flight_ios); newbp = (inm_mirror_bufinfo_t *)newbp_hdp; newbp_hdp = INM_GET_FWD_BUF((&newbp->imb_pt_buf)); inm_free_atbuf_list(&(newbp->imb_atbuf_list)); put_tgt_ctxt(tcp); INM_KMEM_CACHE_FREE(driver_ctx->dc_host_info.mirror_bioinfo_cache, newbp); } queue_worker_routine_for_set_volume_out_of_sync(tcp, ERROR_TO_REG_FAILED_TO_ALLOC_BIOINFO, INM_ENOMEM); exit: inm_cleanup_mirror_bufinfo(hdcp); inm_free_atbuf_list(&(tmp_atbuf_list)); return (ret); } /* end of inm_save_mirror_bufinfo() */ static inm_mirror_atbuf * inm_freg_atbuf(inm_mirror_atbuf *atbuf_wrap) { mirror_vol_entry_t *vol_entry = atbuf_wrap->imb_atbuf_vol_entry; inm_mirror_bufinfo_t *mbufinfo = atbuf_wrap->imb_atbuf_imbinfo; target_context_t *tcp = (target_context_t *)mbufinfo->imb_privp; inm_buf_t *at_bp = &atbuf_wrap->imb_atbuf_buf; inm_u32_t max_xfer_size = INM_MAX_XFER_SZ(vol_entry, at_bp); inm_mirror_atbuf *new_atbuf_wrap = NULL; inm_list_head_t prev_head; inm_u32_t count = 0; inm_u32_t more = 0; inm_u32_t err = 0; inm_u64_t org_atbuf_blkno = mbufinfo->imb_atbuf_absoff; dbg("entered in inm_freg_atbuf"); /* No need to add the start device offset in blkno as they are added * already during save_mirror_bufinfo. 
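 * The absolute offset is therefore zeroed for the duration of the
 * inm_prepare_atbuf() calls and restored from org_atbuf_blkno at "out:".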
*/ mbufinfo->imb_atbuf_absoff = 0; if(max_xfer_size >= atbuf_wrap->imb_atbuf_iosz){ inm_prepare_atbuf(atbuf_wrap, &atbuf_wrap->imb_atbuf_buf, vol_entry, count); new_atbuf_wrap = atbuf_wrap; } else { INM_INIT_LIST_HEAD(&(prev_head)); do { new_atbuf_wrap = inm_alloc_atbuf_wrap(mbufinfo); if(!new_atbuf_wrap){ info("While AT IO retry memory allocation failed"); err = INM_ENOMEM; break; } inm_list_add_tail(&(new_atbuf_wrap->imb_atbuf_this), &prev_head); more = inm_prepare_atbuf(new_atbuf_wrap, at_bp, vol_entry, count); dbg("breaking at_pb %p of size %u in %dth buf of size %u", at_bp, INM_BUF_COUNT(at_bp), count, INM_BUF_COUNT((&atbuf_wrap->imb_atbuf_buf))); count++; } while(more); if(err){ goto out; } new_atbuf_wrap = inm_container_of(prev_head.next, inm_mirror_atbuf, imb_atbuf_this); volume_lock(tcp); inm_list_splice(&prev_head, &(atbuf_wrap->imb_atbuf_this)); inm_list_del(&(atbuf_wrap->imb_atbuf_this)); volume_unlock(tcp); INM_UNPIN(atbuf_wrap, sizeof(inm_mirror_atbuf)); INM_KFREE(atbuf_wrap, sizeof(inm_mirror_atbuf), INM_KERNEL_HEAP); atbuf_wrap = NULL; } out: mbufinfo->imb_atbuf_absoff = org_atbuf_blkno; while(err && !inm_list_empty(&prev_head)){ new_atbuf_wrap = inm_list_entry(prev_head.next, inm_mirror_atbuf, imb_atbuf_this); inm_list_del(&(new_atbuf_wrap->imb_atbuf_this)); INM_UNPIN(new_atbuf_wrap, sizeof(mirror_vol_entry_t)); INM_KFREE(new_atbuf_wrap, sizeof(mirror_vol_entry_t), INM_KERNEL_HEAP); new_atbuf_wrap = NULL; } dbg("exiting from inm_freg_atbuf with err %d, new_atbuf %p",err, new_atbuf_wrap); return new_atbuf_wrap; } static inm_mirror_atbuf * inm_alloc_atbuf_wrap(inm_mirror_bufinfo_t *mbufinfo) { inm_mirror_atbuf *atbuf_wrap = NULL; inm_s32_t err = 0; atbuf_wrap = (inm_mirror_atbuf *) INM_KMALLOC(sizeof(inm_mirror_atbuf), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!atbuf_wrap){ err = INM_ENOMEM; goto out; } if(INM_PIN(atbuf_wrap, sizeof(inm_mirror_atbuf))){ err = 2; goto out; } INM_MEM_ZERO(atbuf_wrap, sizeof(inm_mirror_atbuf)); INM_INIT_LIST_HEAD(&(atbuf_wrap->imb_atbuf_this)); INM_ATOMIC_SET((&atbuf_wrap->imb_atbuf_wqe.refcnt), 1); atbuf_wrap->imb_atbuf_wqe.context = atbuf_wrap; atbuf_wrap->imb_atbuf_imbinfo = mbufinfo; atbuf_wrap->imb_atbuf_wqe.work_func = inm_atio_retry; out: if(err){ if(atbuf_wrap){ INM_KFREE(atbuf_wrap, sizeof(inm_mirror_atbuf), INM_KERNEL_HEAP); } atbuf_wrap = NULL; } return atbuf_wrap; } void inm_free_atbuf_list(inm_list_head_t *atbuf_list) { inm_mirror_atbuf *atbuf_wrap = NULL; while(!inm_list_empty(atbuf_list)){ atbuf_wrap = NULL; atbuf_wrap = inm_list_entry(atbuf_list->next, inm_mirror_atbuf, imb_atbuf_this); INM_BUG_ON(!atbuf_wrap); inm_list_del(&atbuf_wrap->imb_atbuf_this); INM_UNPIN(atbuf_wrap, sizeof(inm_mirror_atbuf)); INM_KFREE(atbuf_wrap, sizeof(inm_mirror_atbuf), INM_KERNEL_HEAP); } } static void inm_map_abs_off_ln(inm_buf_t *bp, target_context_t *tcp, inm_u64_t *abs_off) { *abs_off += INM_BUF_SECTOR(bp); *abs_off += (tcp->tc_dev_startoff >> 9); } /* * function to start processing of tags for a given volume. 
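 * Tag processing is serialized under driver_ctx->tag_sem: the per-volume
 * context is built, the tags are issued, and the outcome is reflected in
 * tag_vol->vol_info->status.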
*/ int process_tag_volume(tag_info_t_v2 *tag_vol, tag_info_t *tag_list, int commit_pending) { tag_volinfo_t *tag_volinfop = NULL; int ret = 0; inm_s32_t error = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } INM_DOWN(&driver_ctx->tag_sem); tag_volinfop = build_volume_node_totag(tag_vol->vol_info, &error); if (error || !tag_volinfop) { dbg("TAG Input Failed: Failed while building volume context"); ret = error; if (!(tag_vol->vol_info->status & STATUS_TAG_NOT_PROTECTED)) { tag_vol->vol_info->status |= STATUS_TAG_NOT_ACCEPTED; } goto out; } ret = issue_tag_volume(tag_vol, tag_volinfop, tag_list, commit_pending); if(ret) { dbg("the issue tag volume failed for the volume"); if (!(tag_vol->vol_info->status & STATUS_TAG_NOT_PROTECTED) && !(tag_vol->vol_info->status & STATUS_TAG_WO_METADATA)) { tag_vol->vol_info->status |= STATUS_TAG_NOT_ACCEPTED; } } out: if(tag_volinfop) { if(tag_volinfop->ctxt) { put_tgt_ctxt(tag_volinfop->ctxt); tag_volinfop->ctxt = NULL; } INM_KFREE(tag_volinfop, sizeof(tag_volinfo_t), INM_KERNEL_HEAP); tag_volinfop = NULL; } INM_UP(&driver_ctx->tag_sem); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return ret; } /* * function to build tag list for set of tags. * this has to be moved in main function */ tag_info_t * build_tag_vol_list(tag_info_t_v2 *tag_vol, int *error) { tag_info_t *tag_list = NULL; int numtags = 0; unsigned short tmp_len; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } *error = 0; tag_list = (tag_info_t *)INM_KMALLOC((tag_vol->nr_tags * sizeof(tag_info_t)), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_list) { err("Failed to allocated memory for tags"); return NULL; } for(numtags = 0; numtags < tag_vol->nr_tags; numtags++) { tmp_len = tag_vol->tag_names[numtags].tag_len; /* don't proceed if tag_len coming from user is greater than 256 */ if (tmp_len > INM_GUID_LEN_MAX) { *error = -1; goto out; } tag_list[numtags].tag_len = tmp_len; if(memcpy_s(tag_list[numtags].tag_name, INM_GUID_LEN_MAX, tag_vol->tag_names[numtags].tag_name, tmp_len)) { *error = -1; goto out; } } out: if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return tag_list; } /* * function to issue tags for given volume */ inm_s32_t issue_tag_volume(tag_info_t_v2 *tag_vol, tag_volinfo_t *tag_volinfop, tag_info_t *tag_list, int commit_pending) { int ret = 0; TAG_COMMIT_STATUS *tag_status = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } if(tag_volinfop->ctxt && tag_volinfop->ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP && tag_volinfop->ctxt->tc_cur_wostate != ecWriteOrderStateData) { if (tag_volinfop->ctxt->tc_cur_wostate != ecWriteOrderStateData) tag_vol->vol_info->status |= STATUS_TAG_WO_METADATA; INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (driver_ctx->dc_tag_drain_notify_guid && !INM_MEM_CMP(driver_ctx->dc_cp_guid, driver_ctx->dc_tag_drain_notify_guid, GUID_LEN)) { tag_status = tag_volinfop->ctxt->tc_tag_commit_status; info("The disk %s is in non write order state", tag_volinfop->ctxt->tc_guid); } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); if (tag_status) set_tag_drain_notify_status(tag_volinfop->ctxt, TAG_STATUS_INSERTION_FAILED, DEVICE_STATUS_NON_WRITE_ORDER_STATE); dbg("the volume is not in Write Order State Data"); ret = INM_EAGAIN; goto out; } add_volume_tags(tag_vol, tag_volinfop, tag_list, commit_pending); out: if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return ret; } /* * function to add tag to given volume */ void add_volume_tags(tag_info_t_v2 
*tag_vol, tag_volinfo_t *tag_volinfop, tag_info_t *tag_info_listp, int commit_pending) { target_context_t *ctxt = tag_volinfop->ctxt; int index = 0; if (ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ inm_form_tag_cdb(ctxt, tag_info_listp, tag_vol->nr_tags); tag_vol->vol_info->status |= STATUS_TAG_NOT_ACCEPTED; return; } volume_lock(ctxt); /* * add tags only in metadata mode, to save the data pages. */ if(!add_tag_in_non_stream_mode(tag_volinfop, tag_info_listp, tag_vol->nr_tags, NULL, index, commit_pending, NULL)) { tag_vol->vol_info->status |= STATUS_TAG_ACCEPTED; if (commit_pending) driver_ctx->dc_cp |= INM_CP_TAG_COMMIT_PENDING; } else { if (tag_volinfop->ctxt->tc_cur_wostate != ecWriteOrderStateData) tag_vol->vol_info->status |= STATUS_TAG_WO_METADATA; else tag_vol->vol_info->status |= STATUS_TAG_NOT_ACCEPTED; } volume_unlock(ctxt); INM_WAKEUP_INTERRUPTIBLE(&ctxt->tc_waitq); } /* * function to build the volume context for the given * volume to issue tags */ tag_volinfo_t * build_volume_node_totag(volume_info_t *vol_info, inm_s32_t *error) { target_context_t *ctxt = NULL; tag_volinfo_t *tmp_vol; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } #ifdef INM_AIX change_node_t *chg_node = NULL; inm_page_t *pgp = NULL; #endif *error = 0; tmp_vol = (tag_volinfo_t *)INM_KMALLOC(sizeof(tag_volinfo_t), INM_KM_NOSLEEP, INM_KERNEL_HEAP); if(!tmp_vol) { err("TAG Input Failed: INM_KMALLOC failed for volumes"); return 0; } INM_MEM_ZERO(tmp_vol, sizeof(tag_volinfo_t)); if ( !(INM_ATOMIC_READ(&driver_ctx->is_iobarrier_on)) ) { ctxt = get_tgt_ctxt_from_uuid_nowait(vol_info->vol_name); } else { ctxt = get_tgt_ctxt_from_uuid_nowait_locked(vol_info->vol_name); } if(ctxt) { if(!is_target_filtering_disabled(ctxt)) { tmp_vol->ctxt = ctxt; } else { put_tgt_ctxt(ctxt); tmp_vol->ctxt = NULL; *error = -1; vol_info->status |= STATUS_TAG_NOT_PROTECTED; goto out; } } else { dbg("TAG Input Failed: can't issue tag to %s", vol_info->vol_name); *error = -1; vol_info->status |= STATUS_TAG_NOT_PROTECTED; goto out; } #ifdef INM_AIX chg_node = inm_alloc_change_node(NULL, INM_KM_NOSLEEP); if(!chg_node){ err("Failed to allocate change node"); *error = INM_ENOMEM; goto out; } INM_INIT_SEM(&chg_node->mutex); chg_node->mutext_initialized = 1; pgp = get_page_from_page_pool(0, 0, NULL); if(!pgp){ INM_DESTROY_SEM(&chg_node->mutex); inm_free_change_node(chg_node); err("Failed to allocate metadata page"); *error = INM_ENOMEM; goto out; } tmp_vol->chg_node = chg_node; tmp_vol->meta_page = pgp; #endif out: if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return tmp_vol; } void set_tag_drain_notify_status(target_context_t *ctxt, int tag_status, int dev_status) { dbg("tag_status = %d, dev_status = %d", tag_status, dev_status); INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (ctxt->tc_tag_commit_status) { ctxt->tc_tag_commit_status->Status = dev_status; ctxt->tc_tag_commit_status->TagStatus = tag_status; if (tag_status == TAG_STATUS_COMMITTED) INM_ATOMIC_DEC(&driver_ctx->dc_nr_tag_commit_status_pending_disks); else if (tag_status != TAG_STATUS_INSERTED) INM_ATOMIC_INC(&driver_ctx->dc_tag_commit_status_failed); if (TAG_STATUS_DROPPED == tag_status) { info("The failback tag is dropped for disk %s", ctxt->tc_guid); } wake_up_interruptible(&driver_ctx->dc_tag_commit_status_waitq); } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); } inm_s32_t modify_persistent_device_name(target_context_t *ctx, char *p_name) { if (strcpy_s(ctx->tc_pname, GUID_SIZE_IN_CHARS, p_name)) { err("strcpy failed for pname"); 
return INM_EFAULT; } return 0; } involflt-0.1.0/src/db_routines.c0000755000000000000000000002555514467303177015346 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : db_routines.c * * Description: This file contains data mode implementation of the * filter driver. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "utils.h" #include "errlog.h" #include "work_queue.h" #include "db_routines.h" #include "tunable_params.h" #include "filter_host.h" #include "telemetry-types.h" #include "telemetry.h" extern driver_context_t *driver_ctx; char *ErrorToRegErrorDescriptionsA[ERROR_TO_REG_MAX_ERROR + 1] = { "--%#x--", "Out of NonPagedPool for dirty blocks %#x", "Bit map read failed with Status %#x", "Bit map write failed with Status %#x", "Bit map open failed with Status %#x", "Bit map open failed and could not write changes %#x", "Out Of Bound IO on source volume with Status %#x", "The I/O is spanning across multiple partitions/volumes %#x", "See server message log for errors %#x", "Out of memory while allocating work queue item. Status %#x", "Write via IO control path - format/label change %#x", "Vendor specific CDB error Status:%#x", "Forced resync due to error injection Status:%#x", "AT IO error %#x", "AT device list is bad. 
Status:%#x", "Out of memory while allocating IO info structure %#x", "New path added to source device(s) %#x", "System crashed or not cleanly shutdown %#x", "PT IO cancel failed %#x", "Out of order issue detected on source %#x", "I/O is greater than 64MB in metadata mode %#x", "Bitmap device object not found %#x", "Failed to learn bitmap header blkmap %#x", "Failed to flush changes to bitmap file %#x", "Failed to protect root disk in initrd %#x" }; inm_s32_t set_volume_out_of_sync(target_context_t *vcptr, inm_u64_t out_of_sync_error_code, inm_s32_t status_to_log) { inm_s32_t _rc = 0; inm_s32_t flag = TRUE; TIME_STAMP_TAG_V2 time_stamp; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) return -EINVAL; if ((vcptr->tc_flags & VCF_IGNORE_BITMAP_CREATION) && (LINVOLFLT_ERR_BITMAP_FILE_CREATED == out_of_sync_error_code)) return _rc; if (INM_IN_INTERRUPT()) { _rc = queue_worker_routine_for_set_volume_out_of_sync(vcptr, out_of_sync_error_code, status_to_log); return _rc; } if (out_of_sync_error_code == LINVOLFLT_ERR_LOST_SYNC_SYSTEM_CRASHED) { out_of_sync_error_code = ERROR_TO_REG_UNCLEAN_SYS_SHUTDOWN; } err("Resync Required (%s -> %s): %llu, %d, %d, %llu", vcptr->tc_pname, vcptr->tc_guid, out_of_sync_error_code, status_to_log, vcptr->tc_flags & VCF_IGNORE_BITMAP_CREATION, vcptr->tc_out_of_sync_time_stamp); telemetry_set_dbs(&vcptr->tc_tel.tt_blend, DBS_DRIVER_RESYNC_REQUIRED); vcptr->tc_resync_required = TRUE; vcptr->tc_out_of_sync_err_code = out_of_sync_error_code; vcptr->tc_nr_out_of_sync++; get_time_stamp_tag(&time_stamp); vcptr->tc_out_of_sync_time_stamp = time_stamp.TimeInHundNanoSecondsFromJan1601; vcptr->tc_out_of_sync_err_status = status_to_log; vcptr->tc_hist.ths_nr_osyncs++; vcptr->tc_hist.ths_osync_err = out_of_sync_error_code; vcptr->tc_hist.ths_osync_ts = vcptr->tc_out_of_sync_time_stamp; INM_DO_DIV(vcptr->tc_hist.ths_osync_ts, HUNDREDS_OF_NANOSEC_IN_SECOND); set_int_vol_attr(vcptr, VolumeResyncRequired, flag); set_int_vol_attr(vcptr, VolumeOutOfSyncErrorCode, vcptr->tc_out_of_sync_err_code); set_int_vol_attr(vcptr, VolumeOutOfSyncErrorStatus, status_to_log); set_int_vol_attr(vcptr, VolumeOutOfSyncCount, vcptr->tc_nr_out_of_sync); set_longlong_vol_attr(vcptr, VolumeOutOfSyncTimestamp, vcptr->tc_out_of_sync_time_stamp); if (vcptr->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ volume_lock(vcptr); vcptr->tc_flags |= VCF_MIRRORING_PAUSED; volume_unlock(vcptr); INM_WAKEUP(&(((host_dev_ctx_t *)(vcptr->tc_priv))->resync_notify)); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return _rc; } void set_volume_out_of_sync_worker_routine(wqentry_t *wqe) { target_context_t *vcptr = NULL; inm_s64_t out_of_sync_error_code = INM_EINVAL; inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!wqe) { info("\n invalid work queue entry \n"); return; } vcptr = (target_context_t *) wqe->context; out_of_sync_error_code = wqe->extra1; status = wqe->extra2; put_work_queue_entry(wqe); set_volume_out_of_sync(vcptr, out_of_sync_error_code, status); put_tgt_ctxt(vcptr); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } inm_s32_t queue_worker_routine_for_set_volume_out_of_sync(target_context_t *vcptr, int64_t out_of_sync_error_code, inm_s32_t status) { wqentry_t *wqe = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) { return -EINVAL; } wqe = 
alloc_work_queue_entry(INM_KM_NOSLEEP); if (!wqe) { telemetry_set_dbs(&vcptr->tc_tel.tt_blend, DBS_DRIVER_RESYNC_REQUIRED); vcptr->tc_resync_required = TRUE; vcptr->tc_out_of_sync_err_code = out_of_sync_error_code; vcptr->tc_nr_out_of_sync++; info("failed to allocate work queue entry\n"); return -ENOMEM; } get_tgt_ctxt(vcptr); wqe->context = vcptr; wqe->extra1 = out_of_sync_error_code; wqe->extra2 = status; wqe->work_func = set_volume_out_of_sync_worker_routine; add_item_to_work_queue(&driver_ctx->wqueue , wqe); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; } inm_s32_t stop_filtering_device(target_context_t *vcptr, inm_s32_t lock_acquired, volume_bitmap_t **vbmap_ptr) { inm_s32_t status = 0; volume_bitmap_t *vbmap = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) { info("vcptr is null"); return -EINVAL; } if (!lock_acquired) volume_lock(vcptr); if(vcptr->tc_filtering_disable_required) vcptr->tc_flags |= VCF_FILTERING_STOPPED; get_time_stamp(&(vcptr->tc_tel.tt_stop_flt_time)); vbmap = vcptr->tc_bp->volume_bitmap; vcptr->tc_bp->volume_bitmap = NULL; set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateUnInitialized, FALSE, ecWOSChangeReasonUnInitialized); vcptr->tc_prev_mode = FLT_MODE_UNINITIALIZED; vcptr->tc_cur_mode = FLT_MODE_UNINITIALIZED; vcptr->tc_prev_wostate = ecWriteOrderStateUnInitialized; vcptr->tc_cur_wostate = ecWriteOrderStateUnInitialized; if (!lock_acquired) volume_unlock(vcptr); if (vbmap) { if (!lock_acquired) { close_bitmap_file(vbmap, TRUE); } else { if (vbmap_ptr) *vbmap_ptr = vbmap; } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } void orphandata_and_move_commit_pending_changenode_to_main_queue(target_context_t *vcptr) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) { info("vcptr is null"); return; } if (!vcptr->tc_pending_confirm) { dbg("pending_confirm is null "); return; } vcptr->tc_pending_confirm = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } void add_resync_required_flag(UDIRTY_BLOCK_V2 *udb, target_context_t *vcptr) { unsigned long sync_err = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) { info("vcptr is null"); return; } INM_DOWN(&vcptr->tc_resync_sem); if (vcptr->tc_resync_required) { vcptr->tc_resync_indicated = TRUE; vcptr->tc_nr_out_of_sync_indicated = vcptr->tc_nr_out_of_sync; udb->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_VOLUME_RESYNC_REQUIRED; udb->uHdr.Hdr.ulOutOfSyncCount = vcptr->tc_nr_out_of_sync; udb->uHdr.Hdr.liOutOfSyncTimeStamp = vcptr->tc_out_of_sync_time_stamp; udb->uHdr.Hdr.ulOutOfSyncErrorCode = vcptr->tc_out_of_sync_err_code; sync_err = vcptr->tc_out_of_sync_err_code; if (sync_err > (ERROR_TO_REG_MAX_ERROR)) { sync_err = ERROR_TO_REG_DESCRIPTION_IN_EVENT_LOG; } snprintf((char *)&udb->uHdr.Hdr.ErrorStringForResync[0], UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE, ErrorToRegErrorDescriptionsA[sync_err], vcptr->tc_out_of_sync_err_status); dbg("Resync error string:%s", (char *)&udb->uHdr.Hdr.ErrorStringForResync); } else { udb->uHdr.Hdr.ErrorStringForResync[0]='\0'; } INM_UP(&vcptr->tc_resync_sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } void reset_volume_out_of_sync(target_context_t *vcptr) { inm_s32_t nil=0; inm_s32_t null_str[]={0}; if(IS_DBG_ENABLED(inm_verbosity, 
(INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (vcptr && !vcptr->tc_resync_required) return; INM_DOWN(&vcptr->tc_resync_sem); if (vcptr->tc_nr_out_of_sync_indicated >= vcptr->tc_nr_out_of_sync) { /* No resync was set between resync indication and acking the resync action */ telemetry_clear_dbs(&vcptr->tc_tel.tt_blend, DBS_DRIVER_RESYNC_REQUIRED); vcptr->tc_resync_required = FALSE; vcptr->tc_nr_out_of_sync = 0; vcptr->tc_out_of_sync_time_stamp = 0; vcptr->tc_out_of_sync_err_code = 0; vcptr->tc_out_of_sync_err_status = 0; set_int_vol_attr(vcptr, VolumeResyncRequired, vcptr->tc_resync_required); set_int_vol_attr(vcptr, VolumeOutOfSyncErrorCode, vcptr->tc_out_of_sync_err_code); set_int_vol_attr(vcptr, VolumeOutOfSyncErrorStatus, vcptr->tc_out_of_sync_err_status); set_int_vol_attr(vcptr, VolumeOutOfSyncCount, vcptr->tc_nr_out_of_sync); set_string_vol_attr(vcptr, VolumeOutOfSyncErrorDescription, (char *)null_str); set_int_vol_attr(vcptr, VolumeOutOfSyncTimestamp, nil); } else { vcptr->tc_nr_out_of_sync -= vcptr->tc_nr_out_of_sync_indicated; set_int_vol_attr(vcptr, VolumeOutOfSyncCount, vcptr->tc_nr_out_of_sync); } vcptr->tc_resync_indicated = FALSE; INM_UP(&vcptr->tc_resync_sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } involflt-0.1.0/src/target-context.c0000755000000000000000000020622714467303177015776 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "utils.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "involflt_debug.h" #include "db_routines.h" #include "filter.h" #include "data-file-mode.h" #include "tunable_params.h" #include "file-io.h" #include "filter_host.h" #include "telemetry-types.h" #include "telemetry.h" extern driver_context_t *driver_ctx; extern inm_s32_t fabric_volume_init(target_context_t *, inm_dev_info_t *); extern inm_s32_t fabric_volume_deinit(target_context_t *ctx); extern inm_s32_t inm_validate_fabric_vol(target_context_t *, inm_dev_info_t *); extern void do_stop_filtering(target_context_t *); static inm_s32_t inm_deref_all_vol_entry_tcp(target_context_t *); const inm_u64_t inm_dbwait_notify_lat_bkts_in_usec[INM_LATENCY_DIST_BKT_CAPACITY] = { 10, 100, 1000, // Milli second 10000, // 10 Milli seconds 100000, 1000000, // 1 Second 10000000, // 10 Seconds 30000000, // 30 Seconds 60000000, // 1 Minute 0, }; const inm_u64_t inm_dbcommit_lat_bkts_in_usec[INM_LATENCY_DIST_BKT_CAPACITY] = { 80000, 150000, 200000, 250000, 325000, 400000, 500000, 1000000, 3000000, 0, }; const inm_u64_t inm_dbret_lat_bkts_in_usec[INM_LATENCY_DIST_BKT_CAPACITY] = { 10, 100, 1000, // Milli second 10000, // 10 Milli seconds 100000, 1000000, // 1 Second 10000000, // 10 Seconds 30000000, // 30 Seconds 60000000, // 1 Minute 600000000, // 10 Minutes 1200000000, // 20 Minutes 0, }; void init_latency_stats(inm_latency_stats_t *lat_stp, const inm_u64_t *bktsp); #ifdef INM_LINUX #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) inm_super_block_t *freeze_bdev(inm_block_device_t *); void thaw_bdev(inm_block_device_t *, inm_super_block_t *); #endif #endif void volume_lock_irqsave(target_context_t *ctx) { INM_SPIN_LOCK_IRQSAVE(&ctx->tc_lock, ctx->tc_lock_flag); } void volume_unlock_irqrestore(target_context_t *ctx) { INM_SPIN_UNLOCK_IRQRESTORE(&ctx->tc_lock, ctx->tc_lock_flag); } void volume_lock_bh(target_context_t *ctx) { INM_VOL_LOCK(&ctx->tc_lock, ctx->tc_lock_flag); } void volume_unlock_bh(target_context_t *ctx) { INM_VOL_UNLOCK(&ctx->tc_lock, ctx->tc_lock_flag); } void inm_tc_reserv_init(target_context_t *ctx, int vol_lock) { inm_u32_t tc_res_pages = 0, try = 0; inm_s32_t ret = -1, len; unsigned long lock_flag = 0; inm_u32_t data_pool_size = 0; char new_data_pool_size[NUM_CHARS_IN_LONGLONG + 1]; /* sysfs read must have read tc_reserve_pages */ if(!strncmp(ctx->tc_guid, "DUMMY_LUN_ZERO", strlen("DUMMY_LUN_ZERO"))){ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); driver_ctx->dc_flags |= DRV_DUMMY_LUN_CREATED; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); return; } if (vol_lock) volume_lock(ctx); tc_res_pages = ctx->tc_reserved_pages; ctx->tc_reserved_pages = 0; /* Add page reservations to this context */ ret = inm_tc_resv_add(ctx, tc_res_pages); if (vol_lock) volume_unlock(ctx); if (ret) { /* There is no enough memory in dc's unreserve pages for * target context */ retry: INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); data_pool_size = driver_ctx->tunable_params.data_pool_size; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); if ((data_pool_size - 
driver_ctx->data_flt_ctx.pages_allocated) < tc_res_pages) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); len = sprintf(new_data_pool_size,"%u", tc_res_pages + data_pool_size); ret = wrap_common_attr_store(DataPoolSize, new_data_pool_size, len); } else { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); } if (ret >= 0) { wrap_reorg_datapool(); if (vol_lock) volume_lock(ctx); ret = inm_tc_resv_add(ctx, tc_res_pages); if (vol_lock) volume_unlock(ctx); if (!ret) { recalc_data_file_mode_thres(); return; } else { if (try < 1) { try++; goto retry; } } } info("Insufficient data page pool for %s reservations.Continuing! ret=%d", ctx->tc_guid, ret); } recalc_data_file_mode_thres(); } static_inline void inm_tc_reserv_deinit(target_context_t *ctx) { inm_s32_t ret = -1; if ((ret = inm_tc_resv_del(ctx, ctx->tc_reserved_pages))) { /* * we could not release page reservations - not good */ info("Failed to release page reservations! ret:%d", ret); INM_BUG_ON(1); } recalc_data_file_mode_thres(); } target_context_t * target_context_ctr(void) { target_context_t *ctx; ctx = (target_context_t *) INM_KMALLOC(sizeof (*ctx), INM_KM_SLEEP, INM_PINNED_HEAP); if (ctx == NULL) { return NULL; } INM_MEM_ZERO(ctx, sizeof(*ctx)); return ctx; } void target_context_dtr(target_context_t *ctx) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p ",ctx); } INM_KFREE(ctx, sizeof(target_context_t), INM_PINNED_HEAP); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t tgt_ctx_common_init(target_context_t *ctx, inm_dev_extinfo_t *dev_info) { inm_device_t dtype; inm_s32_t err = -1; char *dst_scsi_id = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p volume:%s",ctx, dev_info->d_guid); } ctx->tc_data_log_dir = INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!ctx->tc_data_log_dir) { err = -ENOMEM; goto out_err; } ctx->tc_bp = INM_KMALLOC(sizeof(struct bitmap_info), INM_KM_SLEEP, INM_PINNED_HEAP); if (!ctx->tc_bp) { INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); err = -ENOMEM; goto out_err; } INM_MEM_ZERO(ctx->tc_bp, sizeof(struct bitmap_info)); ctx->tc_bp->bitmap_file_name[0]='\0'; fill_bitmap_filename_in_volume_context(ctx); err = validate_path_for_file_name(ctx->tc_bp->bitmap_file_name); if (err) { INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); INM_KFREE(ctx->tc_bp, sizeof(struct bitmap_info), INM_PINNED_HEAP); err("Cannot get bitmap file name"); goto out_err; } ctx->tc_mnt_pt = INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!ctx->tc_mnt_pt) { INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); INM_KFREE(ctx->tc_bp, sizeof(struct bitmap_info), INM_PINNED_HEAP); err = -ENOMEM; goto out_err; } ctx->tc_db_v2 = (UDIRTY_BLOCK_V2 *)INM_KMALLOC(sizeof(UDIRTY_BLOCK_V2), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!ctx->tc_db_v2) { INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); INM_KFREE(ctx->tc_bp, sizeof(struct bitmap_info), INM_PINNED_HEAP); INM_KFREE(ctx->tc_mnt_pt, INM_PATH_MAX, INM_KERNEL_HEAP); err = -ENOMEM; goto out_err; } INM_MEM_ZERO(ctx->tc_db_v2, sizeof(UDIRTY_BLOCK_V2)); /* By default, filtering is disabled, user space would send * START_FILTERING ioctl, which would enable filtering. 
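 * Until then, the filtering mode and write order state initialized below
 * remain FLT_MODE_UNINITIALIZED and ecWriteOrderStateUnInitialized.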
*/ ctx->tc_stats.st_mode_switch_time = INM_GET_CURR_TIME_IN_SEC; ctx->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; INM_INIT_SEM(&ctx->tc_sem); INM_INIT_SEM(&ctx->tc_resync_sem); ctx->tc_cur_mode = FLT_MODE_UNINITIALIZED; ctx->tc_prev_mode = FLT_MODE_UNINITIALIZED; ctx->tc_cur_wostate = ecWriteOrderStateUnInitialized; ctx->tc_prev_wostate = ecWriteOrderStateUnInitialized; ctx->tc_dev_state = DEVICE_STATE_ONLINE; INM_INIT_SPIN_LOCK(&(ctx->tc_lock)); INM_INIT_SPIN_LOCK(&ctx->tc_tunables_lock); INM_INIT_LIST_HEAD(&(ctx->tc_node_head)); INM_INIT_LIST_HEAD(&(ctx->tc_non_drainable_node_head)); ctx->tc_db_notify_thres = DEFAULT_DB_NOTIFY_THRESHOLD; ctx->tc_data_to_disk_limit = driver_ctx->tunable_params.data_to_disk_limit; INM_INIT_WAITQUEUE_HEAD(&ctx->tc_wq_in_flight_ios); INM_ATOMIC_SET(&(ctx->tc_nr_in_flight_ios), 0); INM_ATOMIC_SET(&(ctx->tc_nr_chain_bios_submitted), 0); INM_ATOMIC_SET(&(ctx->tc_nr_chain_bios_pending), 0); INM_ATOMIC_SET(&(ctx->tc_nr_completed_in_child_stack), 0); INM_ATOMIC_SET(&(ctx->tc_nr_completed_in_own_stack), 0); INM_ATOMIC_SET(&(ctx->tc_async_bufs_pending), 0); INM_ATOMIC_SET(&(ctx->tc_async_bufs_processed), 0); INM_ATOMIC_SET(&(ctx->tc_async_bufs_write_pending), 0); INM_ATOMIC_SET(&(ctx->tc_async_bufs_write_processed), 0); ctx->tc_lock_fn = volume_lock_irqsave; ctx->tc_unlock_fn = volume_unlock_irqrestore; init_latency_stats(&ctx->tc_dbret_latstat, inm_dbret_lat_bkts_in_usec); init_latency_stats(&ctx->tc_dbwait_notify_latstat, inm_dbwait_notify_lat_bkts_in_usec); init_latency_stats(&ctx->tc_dbcommit_latstat, inm_dbcommit_lat_bkts_in_usec); /* added the 2nd arg as ctx->tc_priv which holds the name is not yet * initialized */ if ((err = sysfs_init_volume(ctx, dev_info->d_pname))) { ctx->tc_reserved_pages = 0; err("sysfs_init_volume failed err = %d\n", err); goto ret1; } if ((err = init_data_file_flt_ctxt(ctx))) { ctx->tc_reserved_pages = 0; err("init_data_file_flt_ctxt failed err = %d\n", err); goto ret2; } if ((err = create_datafile_dir_name(ctx,(inm_dev_info_t *)dev_info))) { ctx->tc_reserved_pages = 0; err("create_proc_for_target failed w/ error = %d\n", err); goto ret3; } /* if the device is just added, create the sysfs entry for FilterDevType */ dtype = filter_dev_type_get(ctx->tc_pname); if (dtype != ctx->tc_dev_type) { if ((err = filter_dev_type_set(ctx, ctx->tc_dev_type))) { err("tgt_ctx_eommon_init: failed for %s to set filter_dev_type err = %d", ctx->tc_guid, err); goto ret3; } } /* Allocate tc'c reservations from dc's unreserved pages */ if (dev_info->d_type != FILTER_DEV_MIRROR_SETUP) inm_tc_reserv_init(ctx, 0); get_time_stamp(&(ctx->tc_tel.tt_create_time)); err = 0; goto out; ret3: free_data_file_flt_ctxt(ctx); ret2: put_tgt_ctxt(ctx); goto out; ret1: INM_DESTROY_WAITQUEUE_HEAD(&ctx->tc_waitq); INM_DESTROY_SPIN_LOCK(&ctx->tc_tunables_lock); INM_DESTROY_SPIN_LOCK(&(ctx->tc_lock)); INM_DESTROY_SEM(&ctx->tc_sem); INM_DESTROY_SEM(&ctx->tc_resync_sem); INM_KFREE(ctx->tc_mnt_pt, INM_PATH_MAX, INM_KERNEL_HEAP); ctx->tc_mnt_pt = NULL; INM_KFREE(ctx->tc_db_v2, sizeof(UDIRTY_BLOCK_V2), INM_KERNEL_HEAP); ctx->tc_db_v2 = NULL; INM_KFREE(ctx->tc_bp, sizeof(struct bitmap_info), INM_PINNED_HEAP); ctx->tc_bp = NULL; INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); ctx->tc_data_log_dir = NULL; out_err: remove_tc_from_dc(ctx); inm_free_host_dev_ctx(ctx->tc_priv); INM_DESTROY_SEM(&ctx->cdw_sem); target_context_dtr(ctx); INM_MODULE_PUT(); out: if(dst_scsi_id){ INM_KFREE(dst_scsi_id, INM_MAX_SCSI_ID_SIZE, INM_KERNEL_HEAP); dst_scsi_id = NULL; } 
if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return err; } /* * called when tgt_ctx_spec_init() failed or normal unstaking. * in either case calling free_data_file_flt_ctxt() here in this function is * a NOP because if tgt_ctx_spec_init() failed there is nothing this function * does since there won't be any messages queued. In the case of normal * stop we don't even get here as each message on the queue has a referece * on the target_context. */ void tgt_ctx_common_deinit(target_context_t *ctx) { unsigned long lock_flag = 0; ctx->tc_flags &= VCF_VOLUME_CREATING | VCF_VOLUME_DELETING | VCF_IN_NWO; if (ctx->tc_filtering_disable_required) ctx->tc_flags |= VCF_FILTERING_STOPPED; ctx->tc_cur_mode = FLT_MODE_UNINITIALIZED; ctx->tc_cur_wostate = ecWriteOrderStateUnInitialized; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p",ctx); } if(!strncmp(ctx->tc_guid, "DUMMY_LUN_ZERO", strlen("DUMMY_LUN_ZERO"))){ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); driver_ctx->dc_flags &= ~DRV_DUMMY_LUN_CREATED; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); }else { /* Release tc'c reservations to dc's unreserved pages */ inm_tc_reserv_deinit(ctx); } if(ctx->tc_datafile_dir_name) { INM_KFREE(ctx->tc_datafile_dir_name, INM_GUID_LEN_MAX, INM_KERNEL_HEAP); ctx->tc_datafile_dir_name = NULL; } if(ctx->tc_mnt_pt){ INM_KFREE(ctx->tc_mnt_pt, INM_PATH_MAX, INM_KERNEL_HEAP); ctx->tc_mnt_pt = NULL; } if(ctx->tc_db_v2){ INM_KFREE(ctx->tc_db_v2 , sizeof(UDIRTY_BLOCK_V2), INM_KERNEL_HEAP); ctx->tc_db_v2 = NULL; } if(ctx->tc_bp){ INM_KFREE(ctx->tc_bp, sizeof(struct bitmap_info), INM_PINNED_HEAP); ctx->tc_bp = NULL; } if(ctx->tc_data_log_dir){ INM_KFREE(ctx->tc_data_log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); ctx->tc_data_log_dir = NULL; } INM_DESTROY_WAITQUEUE_HEAD(&ctx->tc_wq_in_flight_ios); INM_DESTROY_WAITQUEUE_HEAD(&ctx->tc_waitq); INM_DESTROY_SPIN_LOCK(&ctx->tc_tunables_lock); INM_DESTROY_SPIN_LOCK(&(ctx->tc_lock)); INM_DESTROY_SEM(&ctx->tc_sem); INM_DESTROY_SEM(&ctx->tc_resync_sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } int tgt_ctx_spec_init(target_context_t *ctx, inm_dev_extinfo_t *dev_info) { inm_s32_t error = 1; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p volume:%s", ctx, dev_info->d_guid); } switch(ctx->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: if (stack_host_dev(ctx, dev_info)) { error = 1; goto out; } error = 0; break; case FILTER_DEV_FABRIC_LUN: if ((error = fabric_volume_init(ctx, (inm_dev_info_t*)dev_info))) { goto out; } error = 0; break; default: dbg("do_volume_stacking: bad dev type for filtering"); break; } out: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving error:%d", error); } return error; } void tgt_ctx_spec_deinit(target_context_t *ctx) { switch (ctx->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: if (ctx->tc_priv) { /* need completion to improve code to fit both the approaches */ INM_REL_DEV_RESOURCES(ctx); } break; case FILTER_DEV_FABRIC_LUN: fabric_volume_deinit(ctx); break; default: break; } } /** * check_for_tc_state * @tgt_ctx - target context object * @write_lock - flag to check whether the caller has taken read/write lock * * The caller will wait if the target is undergone for creation or deletion. * * Returns 1 if it has to wait. Otherwise 0. 
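 * When 1 is returned, tgt_list_sem was dropped and re-acquired while
 * waiting, so the caller must redo its target lookup: the context may
 * have been deleted in the meantime.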
*/ int check_for_tc_state(target_context_t *tgt_ctx, int write_lock) { struct create_delete_wait *cdw_item; int ret = 0; cdw_item = INM_KMALLOC(sizeof(struct create_delete_wait), INM_KM_NOIO, INM_KERNEL_HEAP); if(!cdw_item){ INM_DELAY(3 * INM_HZ); return 1; } if(tgt_ctx->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ INM_INIT_COMPLETION(&cdw_item->wait); INM_DOWN(&tgt_ctx->cdw_sem); inm_list_add_tail(&cdw_item->list, &tgt_ctx->cdw_list); INM_UP(&tgt_ctx->cdw_sem); if(write_lock) INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); else INM_UP_READ(&(driver_ctx->tgt_list_sem)); INM_WAIT_FOR_COMPLETION(&cdw_item->wait); if(write_lock) INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); else INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); INM_DESTROY_COMPLETION(&cdw_item->wait); ret = 1; } INM_KFREE(cdw_item, sizeof(struct create_delete_wait), INM_KERNEL_HEAP); return ret; } /** * wake_up_tc_state * @tgt_ctx - target context object * * This function will wake-up all the threads which are waiting * for the target context's creation or deletion. */ void wake_up_tc_state(target_context_t *tgt_ctx) { struct create_delete_wait *cdw_item; while(!inm_list_empty(&tgt_ctx->cdw_list)){ cdw_item = inm_list_entry(tgt_ctx->cdw_list.next, struct create_delete_wait, list); INM_DOWN(&tgt_ctx->cdw_sem); inm_list_del(&cdw_item->list); INM_UP(&tgt_ctx->cdw_sem); INM_COMPLETE(&cdw_item->wait); } } target_context_t * get_tgt_ctxt_from_uuid_locked(char *uuid) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; dev_t dev = 0; convert_path_to_dev(uuid, &dev); retry: for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { hdcp = (host_dev_ctx_t *) (tgt_ctxt->tc_priv); if (hdcp) { struct inm_list_head *hptr = NULL; host_dev_t *hdc_dev = NULL; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == dev) { dbg("uuid %s uuid_dev %u matching dev",uuid, dev); break; } hdc_dev = NULL; } if (hdc_dev) break; } } else if (strcmp(tgt_ctxt->tc_guid, uuid) == 0) { break; } } if(tgt_ctxt && check_for_tc_state(tgt_ctxt, 1)){ tgt_ctxt = NULL; goto retry; } return tgt_ctxt; } target_context_t * get_tgt_ctxt_from_scsiid(char *scsiid) { target_context_t *tgt_ctxt = NULL; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); tgt_ctxt = get_tgt_ctxt_from_scsiid_locked(scsiid); if (tgt_ctxt) { get_tgt_ctxt(tgt_ctxt); } INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tgt_ctxt; } target_context_t * get_tgt_ctxt_from_scsiid_locked(char *scsi_id) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; retry: for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { dbg("get_tgt_ctxt_from_scsiid_locked tgt:%s devinfo:%s", tgt_ctxt->tc_pname, scsi_id); if (strcmp(tgt_ctxt->tc_pname, scsi_id) == 0) { break; } } } if (tgt_ctxt && check_for_tc_state(tgt_ctxt, 1)) { tgt_ctxt = NULL; goto retry; } return tgt_ctxt; } target_context_t * get_tgt_ctxt_persisted_name_nowait_locked(char *persisted_name) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = 
inm_list_entry(ptr, target_context_t, tc_list); if (strcmp(tgt_ctxt->tc_pname, persisted_name) == 0) { break; } } if(tgt_ctxt){ if(tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ tgt_ctxt = NULL; } else { get_tgt_ctxt(tgt_ctxt); } } return tgt_ctxt; } target_context_t * get_tgt_ctxt_from_name_nowait_locked(char *id) { target_context_t *tgt_ctxt = NULL; tgt_ctxt = get_tgt_ctxt_from_uuid_nowait_locked(id); if (!tgt_ctxt) /* If persistent name recieved */ tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(id); return tgt_ctxt; } target_context_t * get_tgt_ctxt_from_name_nowait(char *id) { target_context_t *tgt_ctxt = NULL; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); tgt_ctxt = get_tgt_ctxt_from_name_nowait_locked(id); INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tgt_ctxt; } target_context_t * get_tgt_ctxt_from_uuid(char *uuid) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; inm_dev_t dev = 0; convert_path_to_dev(uuid, &dev); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); retry: for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { hdcp = (host_dev_ctx_t *) (tgt_ctxt->tc_priv); if (hdcp) { struct inm_list_head *hptr = NULL; host_dev_t *hdc_dev = NULL; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == dev) { dbg("uuid %s uuid_dev %u dev %u", uuid, dev, hdc_dev->hdc_dev); break; } hdc_dev = NULL; } if (hdc_dev) break; } } else if (strcmp(tgt_ctxt->tc_guid, uuid) == 0) { break; } } if(tgt_ctxt && check_for_tc_state(tgt_ctxt, 0)){ tgt_ctxt = NULL; goto retry; } if(tgt_ctxt) get_tgt_ctxt(tgt_ctxt); INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tgt_ctxt; } target_context_t *get_tgt_ctxt_from_uuid_nowait(char *uuid) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; dev_t dev = 0; convert_path_to_dev(uuid, &dev); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { hdcp = (host_dev_ctx_t *) (tgt_ctxt->tc_priv); if (hdcp) { struct inm_list_head *hptr = NULL; host_dev_t *hdc_dev = NULL; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == dev) { break; } hdc_dev = NULL; } if (hdc_dev) break; } } else if (strcmp(tgt_ctxt->tc_guid, uuid) == 0) { break; } } if(tgt_ctxt && (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING))) tgt_ctxt = NULL; if(tgt_ctxt) get_tgt_ctxt(tgt_ctxt); INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tgt_ctxt; } /* * caller of this function make sure to hold lock tgt_list_sem * before calling this function. 
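 * Unlike the waiting variants, a context that is being created or deleted
 * is skipped (NULL is returned) rather than waited on; a reference is
 * taken on any context that is returned.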
*/ target_context_t *get_tgt_ctxt_from_uuid_nowait_locked(char *uuid) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; dev_t dev = 0; convert_path_to_dev(uuid, &dev); for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { hdcp = (host_dev_ctx_t *) (tgt_ctxt->tc_priv); if (hdcp) { struct inm_list_head *hptr = NULL; host_dev_t *hdc_dev = NULL; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == dev) { break; } hdc_dev = NULL; } if (hdc_dev) break; } } else if (strcmp(tgt_ctxt->tc_guid, uuid) == 0) { break; } } if(tgt_ctxt && (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING))) tgt_ctxt = NULL; if(tgt_ctxt) get_tgt_ctxt(tgt_ctxt); return tgt_ctxt; } inm_s32_t set_tgt_ctxt_filtering_mode(target_context_t *tgt_ctxt, flt_mode new_flt_mode, inm_s32_t service_initiated) { register flt_mode curr_flt_mode = FLT_MODE_UNINITIALIZED; inm_u64_t last_time = 0, time_in_secs = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered cur mode:%d req mode:%d", tgt_ctxt->tc_cur_mode, new_flt_mode); } /* validate the inputs and store the volume */ if (!tgt_ctxt || new_flt_mode < 0 || service_initiated < 0) return -EINVAL; curr_flt_mode = tgt_ctxt->tc_cur_mode; /* save the present state * before modifying it */ if (new_flt_mode == curr_flt_mode) return 0; last_time = tgt_ctxt->tc_stats.st_mode_switch_time; tgt_ctxt->tc_stats.st_mode_switch_time = INM_GET_CURR_TIME_IN_SEC; /* The time (last_time) of switching to one of the mode is * greater than the time (current time) of switching to the other mode. * So the number of seconds that the driver spent in previous mode gets * negative. So in this scenario, set last_time to current time and the * the number of seconds that the driver spent in previous mode will * always be 0. */ if(last_time > tgt_ctxt->tc_stats.st_mode_switch_time) last_time = tgt_ctxt->tc_stats.st_mode_switch_time; time_in_secs = (tgt_ctxt->tc_stats.st_mode_switch_time - \ last_time); /* add the value to respective mode. 
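 * For example, leaving DATA mode at t=110s after entering it at t=100s
 * credits 10s to num_secs_in_flt_mode[FLT_MODE_DATA]; a backward clock
 * jump credits 0s, per the clamp above.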
*/ tgt_ctxt->tc_stats.num_secs_in_flt_mode[curr_flt_mode] += time_in_secs; switch (new_flt_mode) { case FLT_MODE_DATA: dbg("setting filtering mode as DATA-MODE "); curr_flt_mode = FLT_MODE_DATA; break; case FLT_MODE_METADATA: curr_flt_mode = FLT_MODE_METADATA; dbg("setting filtering mode as META-DATA-MODE \n"); if (service_initiated) { tgt_ctxt->tc_stats.num_change_metadata_flt_mode_on_user_req++; } break; default: dbg("unknown filtering mode \n"); return -EINVAL; break; } tgt_ctxt->tc_prev_mode = tgt_ctxt->tc_cur_mode; tgt_ctxt->tc_cur_mode = curr_flt_mode; tgt_ctxt->tc_stats.num_change_to_flt_mode[curr_flt_mode]++; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving cur mode:%d req mode:%d", tgt_ctxt->tc_cur_mode, new_flt_mode); } return 0; } inm_s32_t set_tgt_ctxt_wostate(target_context_t *tgt_ctxt, etWriteOrderState new_wostate, inm_s32_t service_initiated, etWOSChangeReason reason) { etWriteOrderState curr_wostate = ecWriteOrderStateUnInitialized; inm_u64_t last_time = 0, time_in_secs = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered cur write order state:%d req write order state:%d", tgt_ctxt->tc_cur_wostate, new_wostate); } if (!tgt_ctxt || new_wostate < 0 || service_initiated < 0) return INM_EINVAL; curr_wostate = tgt_ctxt->tc_cur_wostate; if (new_wostate == curr_wostate) return 0; last_time = tgt_ctxt->tc_stats.st_wostate_switch_time; tgt_ctxt->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; if(last_time > tgt_ctxt->tc_stats.st_wostate_switch_time) last_time = tgt_ctxt->tc_stats.st_wostate_switch_time; time_in_secs = (tgt_ctxt->tc_stats.st_wostate_switch_time - last_time); /* add the value to respective write order state. */ tgt_ctxt->tc_stats.num_secs_in_wostate[curr_wostate] += time_in_secs; if (curr_wostate == ecWriteOrderStateData) telemetry_nwo_stats_record(tgt_ctxt, curr_wostate, new_wostate, reason); switch (new_wostate) { case ecWriteOrderStateData: dbg("setting write order state as DATA "); curr_wostate = ecWriteOrderStateData; break; case ecWriteOrderStateMetadata: dbg("setting write order state as METADATA "); curr_wostate = ecWriteOrderStateMetadata; break; case ecWriteOrderStateBitmap: dbg("setting write order state as BITMAP "); curr_wostate = ecWriteOrderStateBitmap; break; case ecWriteOrderStateRawBitmap: dbg("setting write order state as RAW BITMAP"); curr_wostate = ecWriteOrderStateRawBitmap; break; case ecWriteOrderStateUnInitialized: dbg("setting write order state as UNINITIALIZED"); curr_wostate = ecWriteOrderStateUnInitialized; break; default: dbg("unknown write order state \n"); return INM_EINVAL; break; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (new_wostate == ecWriteOrderStateData) { tgt_ctxt->tc_flags &= ~VCF_IN_NWO; driver_ctx->total_prot_volumes_in_nwo--; } else if (tgt_ctxt->tc_cur_wostate == ecWriteOrderStateData) { tgt_ctxt->tc_flags |= VCF_IN_NWO; driver_ctx->total_prot_volumes_in_nwo++; end_cx_session(); } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); tgt_ctxt->tc_prev_wostate = tgt_ctxt->tc_cur_wostate; tgt_ctxt->tc_cur_wostate = curr_wostate; if (service_initiated) tgt_ctxt->tc_stats.num_change_to_wostate_user[curr_wostate]++; else tgt_ctxt->tc_stats.num_change_to_wostate[curr_wostate]++; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving cur write order state:%d req write order state:%d", tgt_ctxt->tc_cur_wostate, 
new_wostate); } return 0; } void set_malloc_fail_error(target_context_t *tgt_ctxt) { unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (tgt_ctxt) { /* ACQUIRE NOMEMORY_LOG_EVENT lock */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->log_lock, lock_flag); driver_ctx->stats.num_malloc_fails++; tgt_ctxt->tc_stats.num_malloc_fails++; /* RELEASE NOMEMORY_LOG_EVENT lock */ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->log_lock, lock_flag); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t can_switch_to_data_filtering_mode(target_context_t *tgt_ctxt) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!(tgt_ctxt->tc_optimize_performance & PERF_OPT_DATA_MODE_CAPTURE_WITH_BITMAP)) { if (!tgt_ctxt->tc_bp || !tgt_ctxt->tc_bp->volume_bitmap || (ecVBitmapStateReadCompleted != tgt_ctxt->tc_bp->volume_bitmap->eVBitmapState)) return FALSE; } /* * Can switch to data mode only if the number of data pages already * allocated to this target does not exceed its reserved page quota. */ if (tgt_ctxt->tc_stats.num_pages_allocated > tgt_ctxt->tc_reserved_pages) { dbg("Returning FALSE, as # of pages allocated (%#x) > tc_reserved_pages (%#x)\n", tgt_ctxt->tc_stats.num_pages_allocated, tgt_ctxt->tc_reserved_pages); return FALSE; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("Returning TRUE, as\nFree data pages = %#x\nData pages allocated = %#x\ntc_reserved_pages = %#x\n", driver_ctx->data_flt_ctx.pages_free, driver_ctx->data_flt_ctx.pages_allocated, tgt_ctxt->tc_reserved_pages); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return TRUE; /* to enable data mode filtering */ } inm_s32_t can_switch_to_data_wostate(target_context_t *tgt_ctxt) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if(!tgt_ctxt->tc_bp || !tgt_ctxt->tc_bp->volume_bitmap || (ecVBitmapStateReadCompleted != tgt_ctxt->tc_bp->volume_bitmap->eVBitmapState)) return FALSE; if(tgt_ctxt->tc_pending_md_changes) return FALSE; if(FLT_MODE_METADATA == tgt_ctxt->tc_cur_mode) return FALSE; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return TRUE; /* to enable write order state */ } void add_changes_to_pending_changes(target_context_t *ctx, etWriteOrderState wostate, inm_u32_t num_changes) { switch(wostate){ case ecWriteOrderStateData: ctx->tc_pending_wostate_data_changes += num_changes; break; case ecWriteOrderStateMetadata: ctx->tc_pending_wostate_md_changes += num_changes; break; case ecWriteOrderStateBitmap: ctx->tc_pending_wostate_bm_changes += num_changes; break; case ecWriteOrderStateRawBitmap: ctx->tc_pending_wostate_rbm_changes += num_changes; break; default: err("unknown write order state\n"); break; } } void subtract_changes_from_pending_changes(target_context_t *ctx, etWriteOrderState wostate, inm_u32_t num_changes) { switch(wostate){ case ecWriteOrderStateData: ctx->tc_pending_wostate_data_changes -= num_changes; break; case ecWriteOrderStateMetadata: ctx->tc_pending_wostate_md_changes -= num_changes; break; case ecWriteOrderStateBitmap: ctx->tc_pending_wostate_bm_changes -= num_changes; break; case ecWriteOrderStateRawBitmap: ctx->tc_pending_wostate_rbm_changes -= num_changes; break; default: err("unknown write order state\n"); break; } } inm_s32_t is_data_filtering_enabled_for_this_volume(target_context_t *vcptr) {
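/* Data filtering for a volume is gated three ways, as the checks below * show: the user-space service must advertise support for data * filtering, the global enable_data_filtering tunable must be set, and * the volume itself must not have data mode disabled * (VCF_DATA_MODE_DISABLED). Any one of these acts as a kill switch for * data mode on this volume. */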
if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) return FALSE; if(!driver_ctx->service_supports_data_filtering) return FALSE; if(!driver_ctx->tunable_params.enable_data_filtering) return FALSE; if (vcptr->tc_flags & VCF_DATA_MODE_DISABLED) return FALSE; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return TRUE; } void fs_freeze_volume(target_context_t *ctxt, struct inm_list_head *head) { host_dev_ctx_t *hdcp; inm_s32_t count, success; struct inm_list_head *hptr = NULL; host_dev_t *hdc_dev = NULL; vol_info_t *vinfo = NULL; if (ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) return; INM_INIT_LIST_HEAD(head); count = success = 0; hdcp = ctxt->tc_priv; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); count++; vinfo = (vol_info_t*)INM_KMALLOC(sizeof(vol_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!vinfo){ err("Failed to allocate the vol_info_t object"); break; } INM_MEM_ZERO(vinfo, sizeof(vol_info_t)); #ifdef INM_LINUX vinfo->bdev = inm_open_by_devnum(hdc_dev->hdc_dev, FMODE_READ | FMODE_WRITE); #endif if (IS_ERR(vinfo->bdev)) { info(" failed to open block device %s", ctxt->tc_guid); vinfo->bdev = NULL; INM_KFREE(vinfo, sizeof(vol_info_t), INM_KERNEL_HEAP); break; } inm_freeze_bdev(vinfo->bdev, vinfo->sb); success++; inm_list_add_tail(&vinfo->next, head); } if (count == success) { dbg("Freeze successful for device %s", ctxt->tc_guid); } else { while(!inm_list_empty(head)){ vinfo = inm_list_entry(head->next, vol_info_t, next); inm_list_del(&vinfo->next); inm_thaw_bdev(vinfo->bdev, vinfo->sb); INM_KFREE(vinfo, sizeof(vol_info_t), INM_KERNEL_HEAP); } if (success) { err("Freeze Failed for one or more paths of device %s", ctxt->tc_guid); } else { err("Freeze Failed for device %s", ctxt->tc_guid); } } } void thaw_volume(target_context_t *ctxt, struct inm_list_head *head) { vol_info_t *vinfo = NULL; if (ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) return; while(!inm_list_empty(head)){ vinfo = inm_list_entry(head->next, vol_info_t, next); inm_list_del(&vinfo->next); inm_thaw_bdev(vinfo->bdev, vinfo->sb); if (vinfo->sb) { dbg("Thaw successful for device %s for filesystem", ctxt->tc_guid); } else { dbg("Thaw successful for device %s", ctxt->tc_guid); } #ifdef INM_LINUX close_bdev(vinfo->bdev, FMODE_READ | FMODE_WRITE); #endif INM_KFREE(vinfo, sizeof(vol_info_t), INM_KERNEL_HEAP); } } void target_context_release(target_context_t *ctxt) { info("Target Context destroyed %d:%d device:%s", INM_GET_MAJOR(inm_dev_id_get(ctxt)),\ INM_GET_MINOR(inm_dev_id_get(ctxt)), ctxt->tc_guid); tgt_ctx_spec_deinit(ctxt); tgt_ctx_common_deinit(ctxt); remove_tc_from_dc(ctxt); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (ctxt->tc_flags & VCF_IN_NWO) driver_ctx->total_prot_volumes_in_nwo--; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); inm_free_host_dev_ctx(ctxt->tc_priv); INM_DESTROY_SEM(&ctxt->cdw_sem); if(ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ free_tc_global_at_lun(&(ctxt->tc_dst_list)); inm_deref_all_vol_entry_tcp(ctxt); } if (!inm_list_empty(&ctxt->tc_src_list)) { free_mirror_list(&ctxt->tc_src_list, 0); } if (!inm_list_empty(&ctxt->tc_dst_list)) { free_mirror_list(&ctxt->tc_dst_list, 1); } target_context_dtr(ctxt); INM_MODULE_PUT(); } /* * This function is called when a target has been fully initialized and is * now ready to be cleaned up due to * disk 
removal (lvm reconfig) for host volumes. * lun cleanup for fabric case. * * the caller expects that on return the target_context is removed. */ void tgt_ctx_force_soft_remove(target_context_t *ctx) { if (ctx) { target_forced_cleanup(ctx); /* Unregister sysfs interface which would do the cleanup. */ put_tgt_ctxt(ctx); } } /* Call this function whenever the driver needs to do forced cleanup of * target. Called during lvm removal or during uninstallation. */ void target_forced_cleanup(target_context_t *ctx) { info("\tRemoved dev-type: %d, dev-id: %d dev:%s", ctx->tc_dev_type, inm_dev_id_get(ctx), ctx->tc_guid); info("\tStop filtering the device..:%s", ctx->tc_guid); do_stop_filtering(ctx); /* Perform Data File Mode cleanup */ free_data_file_flt_ctxt(ctx); #ifdef INM_AIX while(1) { host_dev_ctx_t *hdcp = ctx->tc_priv; volume_lock(ctx); if(!hdcp->hdc_buf_head){ volume_unlock(ctx); break; } volume_unlock(ctx); INM_DELAY(3 * HZ); } #endif } void inm_do_clear_stats(target_context_t *tcp) { bitmap_info_t *bp = NULL; tgt_hist_stats_t *thsp = NULL; inm_u8_t idx = 0; INM_BUG_ON(!tcp); bp = tcp->tc_bp; thsp = &tcp->tc_hist; INM_DOWN(&tcp->tc_sem); INM_MEM_ZERO(tcp->tc_stats.num_change_to_flt_mode, sizeof(tcp->tc_stats.num_change_to_flt_mode)); INM_MEM_ZERO(tcp->tc_stats.num_secs_in_flt_mode, sizeof(tcp->tc_stats.num_secs_in_flt_mode)); volume_lock(tcp); for (idx = 0; idx < MAX_NR_IO_BUCKETS; idx++) { INM_ATOMIC_SET(&tcp->tc_stats.io_pat_reads[idx], 0); INM_ATOMIC_SET(&tcp->tc_stats.io_pat_writes[idx], 0); } INM_ATOMIC_SET(&tcp->tc_stats.tc_write_cancel, 0); thsp->ths_nr_clrdiffs = 0; thsp->ths_clrdiff_ts = 0; thsp->ths_nr_osyncs = 0; thsp->ths_osync_ts = 0; thsp->ths_osync_err = 0; thsp->ths_clrstats_ts = INM_GET_CURR_TIME_IN_SEC; tcp->tc_stats.st_mode_switch_time = thsp->ths_clrstats_ts; tcp->tc_stats.st_wostate_switch_time = thsp->ths_clrstats_ts; volume_unlock(tcp); bp->num_bitmap_open_errors = 0; bp->num_bitmap_clear_errors = 0; bp->num_bitmap_read_errors = 0; bp->num_bitmap_write_errors = 0; bp->num_changes_read_from_bitmap = 0; bp->num_byte_changes_read_from_bitmap = 0; bp->num_changes_written_to_bitmap = 0; bp->num_byte_changes_written_to_bitmap = 0; INM_UP(&tcp->tc_sem); } /* * inm_validate_tc_devattr() * @tcp : target context ptr * notes : Validate the device/lun attributes * Callers should hold appropriate locks */ int inm_validate_tc_devattr(target_context_t *tcp, inm_dev_info_t *dip) { inm_s32_t ret = -1; if (!tcp) { return -EINVAL; } switch (dip->d_type) { case FILTER_DEV_FABRIC_LUN: ret = inm_validate_fabric_vol(tcp, dip); break; case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: /* If the newly requested start filtering guid/pname mapping * does not match preexisting, log an error */ if (strcmp(tcp->tc_guid, dip->d_guid) != 0 || strcmp(tcp->tc_pname, dip->d_pname) !=0) ret = -EINVAL; else ret = 0; break; case FILTER_DEV_FABRIC_VSNAP: ret = 0; break; default: info("Unknown device type inm_dev_info_t = %d\n", dip->d_type); break; } return ret; } void do_clear_diffs(target_context_t *tgt_ctxt) { target_context_t *vcptr = NULL; struct inm_list_head chg_nodes_hd; struct inm_list_head *curp = NULL, *nxtp = NULL; volume_bitmap_t *vbmap = NULL; bitmap_info_t *bitmap = tgt_ctxt->tc_bp; INM_INIT_LIST_HEAD(&chg_nodes_hd); INM_DOWN(&tgt_ctxt->tc_sem); volume_lock(tgt_ctxt); set_tgt_ctxt_wostate(tgt_ctxt, ecWriteOrderStateUnInitialized, FALSE, ecWOSChangeReasonUnInitialized); tgt_ctxt->tc_cur_mode = FLT_MODE_UNINITIALIZED; tgt_ctxt->tc_prev_mode = FLT_MODE_UNINITIALIZED; 
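/* Reset both the current and the previous write-order state as well, * presumably so a later restart of filtering begins from a fully * uninitialized state. */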
tgt_ctxt->tc_cur_wostate = ecWriteOrderStateUnInitialized; tgt_ctxt->tc_prev_wostate = ecWriteOrderStateUnInitialized; /* A data corruption issue was seen when the data pages of the pending * change node (tc_pending_confirm, mainly used in perform_commit()) * were released before the node was committed, which happened on the * code paths that reset tc_pending_confirm to NULL. * * With this fix, the orphan-pages approach is replaced with orphan * change nodes; orphan nodes are not considered when updating the * target context's statistics. */ if (tgt_ctxt->tc_pending_confirm && !(tgt_ctxt->tc_pending_confirm->flags & CHANGE_NODE_ORPHANED)) { change_node_t *cnp = tgt_ctxt->tc_pending_confirm; cnp->flags |= CHANGE_NODE_ORPHANED; inm_list_del_init(&cnp->next); if (!inm_list_empty(&cnp->nwo_dmode_next)) { inm_list_del_init(&cnp->nwo_dmode_next); } deref_chg_node(cnp); } /* Before writing to bitmap, remove data mode node from * non write order list */ inm_list_for_each_safe(curp, nxtp, &tgt_ctxt->tc_nwo_dmode_list) { inm_list_del_init(curp); } list_change_head(&chg_nodes_hd, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_node_head); tgt_ctxt->tc_cur_node = NULL; tgt_ctxt->tc_pending_changes = 0; tgt_ctxt->tc_pending_md_changes = 0; tgt_ctxt->tc_bytes_pending_md_changes = 0; tgt_ctxt->tc_bytes_pending_changes = 0; tgt_ctxt->tc_pending_wostate_data_changes = 0; tgt_ctxt->tc_pending_wostate_md_changes = 0; tgt_ctxt->tc_pending_wostate_bm_changes = 0; tgt_ctxt->tc_pending_wostate_rbm_changes = 0; tgt_ctxt->tc_commited_changes = 0; tgt_ctxt->tc_bytes_commited_changes = 0; tgt_ctxt->tc_transaction_id = 0; telemetry_clear_dbs(&tgt_ctxt->tc_tel.tt_blend, DBS_DRIVER_RESYNC_REQUIRED); tgt_ctxt->tc_resync_required = 0; /* bitmap related data */ vcptr = tgt_ctxt; vbmap = bitmap->volume_bitmap; bitmap->volume_bitmap = NULL; bitmap->num_bitmap_open_errors = 0; tgt_ctxt->tc_hist.ths_clrdiff_ts = INM_GET_CURR_TIME_IN_SEC; tgt_ctxt->tc_hist.ths_nr_clrdiffs++; volume_unlock(tgt_ctxt); cleanup_change_nodes(&chg_nodes_hd, ecClearDiffs); volume_lock(tgt_ctxt); tgt_ctxt->tc_stats.st_mode_switch_time = INM_GET_CURR_TIME_IN_SEC; tgt_ctxt->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; volume_unlock(tgt_ctxt); if (vbmap) close_bitmap_file(vbmap, TRUE); set_unsignedlonglong_vol_attr(tgt_ctxt, VolumeRpoTimeStamp, tgt_ctxt->tc_hist.ths_clrdiff_ts*HUNDREDS_OF_NANOSEC_IN_SECOND); INM_UP(&tgt_ctxt->tc_sem); } inm_u32_t get_data_source(target_context_t *ctxt) { switch(ctxt->tc_cur_mode) { case FLT_MODE_DATA: return INVOLFLT_DATA_SOURCE_DATA; case FLT_MODE_METADATA: return INVOLFLT_DATA_SOURCE_META_DATA; default: return INVOLFLT_DATA_SOURCE_META_DATA; } } /* inm_deref_all_vol_entry_tcp() can only be called after stop filtering * has been issued */ static inm_s32_t inm_deref_all_vol_entry_tcp(target_context_t *tgt_ctxt) { struct inm_list_head *ptr, *hd, *nextptr; mirror_vol_entry_t *vol_entry = NULL; inm_s32_t error = 0; hd = &(tgt_ctxt->tc_dst_list); inm_list_for_each_safe(ptr, nextptr, hd) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); while(INM_ATOMIC_READ(&(vol_entry->vol_ref)) > 1){ INM_DELAY(1 * INM_HZ); } } free_mirror_list(&(tgt_ctxt->tc_dst_list), 1); free_mirror_list(&(tgt_ctxt->tc_src_list), 1); return error; } void init_latency_stats(inm_latency_stats_t *lat_stp, const inm_u64_t *bktsp) { inm_s32_t i; if (!lat_stp || !bktsp) { err("initializing latency buckets failed\n"); return; } INM_MEM_ZERO(lat_stp, sizeof(*lat_stp)); for (i=0; ((i < INM_LATENCY_DIST_BKT_CAPACITY)
&& (bktsp[i] > 0)); i++) { lat_stp->ls_bkts[i]=bktsp[i]; } lat_stp->ls_nr_avail_bkts = i; } void collect_latency_stats(inm_latency_stats_t *lat_stp, inm_u64_t time_in_usec) { inm_u32_t idx = 0; inm_u64_t max_nr_bkts = lat_stp->ls_nr_avail_bkts-1; if (time_in_usec > lat_stp->ls_bkts[max_nr_bkts]) { idx = max_nr_bkts; } else { for (idx = 0; idx < max_nr_bkts; idx++) { if (time_in_usec <= lat_stp->ls_bkts[idx]) { break; } } } lat_stp->ls_freq[idx]++; if (!lat_stp->ls_init_min_max) { lat_stp->ls_log_min = lat_stp->ls_log_max = time_in_usec; lat_stp->ls_init_min_max = 1; } if (lat_stp->ls_log_min > time_in_usec) { lat_stp->ls_log_min = time_in_usec; } if (lat_stp->ls_log_max < time_in_usec) { lat_stp->ls_log_max = time_in_usec; } idx = (lat_stp->ls_log_idx % INM_LATENCY_LOG_CAPACITY); lat_stp->ls_log_buf[idx] = time_in_usec; lat_stp->ls_log_idx++; } void retrieve_volume_latency_stats(target_context_t *tcp, VOLUME_LATENCY_STATS *vlsp) { inm_u32_t idx, o_idx; if (!tcp || !vlsp) { err("invalid buffers to copy latency data"); return; } memcpy_s(vlsp->s2dbret_bkts, sizeof(tcp->tc_dbret_latstat.ls_bkts), tcp->tc_dbret_latstat.ls_bkts, sizeof(tcp->tc_dbret_latstat.ls_bkts)); memcpy_s(vlsp->s2dbret_freq, sizeof(tcp->tc_dbret_latstat.ls_freq), tcp->tc_dbret_latstat.ls_freq, sizeof(tcp->tc_dbret_latstat.ls_freq)); vlsp->s2dbret_nr_avail_bkts = tcp->tc_dbret_latstat.ls_nr_avail_bkts; vlsp->s2dbret_log_min = tcp->tc_dbret_latstat.ls_log_min; vlsp->s2dbret_log_max = tcp->tc_dbret_latstat.ls_log_max; memcpy_s(vlsp->s2dbwait_notify_bkts, sizeof(tcp->tc_dbwait_notify_latstat.ls_bkts), tcp->tc_dbwait_notify_latstat.ls_bkts, sizeof(tcp->tc_dbwait_notify_latstat.ls_bkts)); memcpy_s(vlsp->s2dbwait_notify_freq, sizeof(tcp->tc_dbwait_notify_latstat.ls_freq), tcp->tc_dbwait_notify_latstat.ls_freq, sizeof(tcp->tc_dbwait_notify_latstat.ls_freq)); vlsp->s2dbwait_notify_nr_avail_bkts = tcp->tc_dbwait_notify_latstat.ls_nr_avail_bkts; vlsp->s2dbwait_notify_log_min = tcp->tc_dbwait_notify_latstat.ls_log_min; vlsp->s2dbwait_notify_log_max = tcp->tc_dbwait_notify_latstat.ls_log_max; memcpy_s(vlsp->s2dbcommit_bkts, sizeof(tcp->tc_dbcommit_latstat.ls_bkts), tcp->tc_dbcommit_latstat.ls_bkts, sizeof(tcp->tc_dbcommit_latstat.ls_bkts)); memcpy_s(vlsp->s2dbcommit_freq, sizeof(tcp->tc_dbcommit_latstat.ls_freq), tcp->tc_dbcommit_latstat.ls_freq, sizeof(tcp->tc_dbcommit_latstat.ls_freq)); vlsp->s2dbcommit_nr_avail_bkts = tcp->tc_dbcommit_latstat.ls_nr_avail_bkts; vlsp->s2dbcommit_log_min = tcp->tc_dbcommit_latstat.ls_log_min; vlsp->s2dbcommit_log_max = tcp->tc_dbcommit_latstat.ls_log_max; for (o_idx = 0; o_idx < INM_LATENCY_LOG_CAPACITY; o_idx++) { idx = (tcp->tc_dbret_latstat.ls_log_idx + INM_LATENCY_LOG_CAPACITY - 1) % INM_LATENCY_LOG_CAPACITY; vlsp->s2dbret_log_buf[o_idx] = tcp->tc_dbret_latstat.ls_log_buf[idx]; tcp->tc_dbret_latstat.ls_log_idx++; idx = (tcp->tc_dbwait_notify_latstat.ls_log_idx + INM_LATENCY_LOG_CAPACITY - 1) % INM_LATENCY_LOG_CAPACITY; vlsp->s2dbwait_notify_log_buf[o_idx] = tcp->tc_dbwait_notify_latstat.ls_log_buf[idx]; tcp->tc_dbwait_notify_latstat.ls_log_idx++; idx = (tcp->tc_dbcommit_latstat.ls_log_idx + INM_LATENCY_LOG_CAPACITY - 1) % INM_LATENCY_LOG_CAPACITY; vlsp->s2dbcommit_log_buf[o_idx] = tcp->tc_dbcommit_latstat.ls_log_buf[idx]; tcp->tc_dbcommit_latstat.ls_log_idx++; } } inm_u64_t get_rpo_timestamp(target_context_t *ctxt, inm_u32_t flag, change_node_t *pending_confirm) { inm_irqflag_t lock_flag = 0; struct inm_list_head *ptr; change_node_t *oldest_chg_node; /* TSO commit or rpo timestamp req. 
through ioctl */ if (inm_list_empty(&ctxt->tc_node_head)) { if (ctxt->tc_cur_wostate != ecWriteOrderStateBitmap && (ctxt->tc_cur_wostate != ecWriteOrderStateUnInitialized)) { /* Get the current driver time stamp */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); ctxt->tc_rpo_timestamp = driver_ctx->last_time_stamp; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); } /* otherwise just return the older rpo timestamp */ return ctxt->tc_rpo_timestamp; } ptr = ctxt->tc_node_head.next; oldest_chg_node = (change_node_t *)inm_list_entry(ptr, change_node_t, next); /* TSO file commit OR ADDITIONAL STATS IOCTL codepath */ if (ctxt->tc_tso_file == 1 || flag == IOCTL_INMAGE_GET_ADDITIONAL_VOLUME_STATS) { /* See if RPO timestamp can be set to the start TS of oldest chgnode */ if ((oldest_chg_node->wostate != ecWriteOrderStateBitmap) && (oldest_chg_node != ctxt->tc_pending_confirm)) { ctxt->tc_rpo_timestamp = oldest_chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; } return ctxt->tc_rpo_timestamp; } /* User data dirty block commit */ INM_BUG_ON(!pending_confirm); if (pending_confirm->wostate == ecWriteOrderStateData) { ptr = ctxt->tc_node_head.next->next; if (ptr && ptr != &ctxt->tc_node_head) { /* set RPO to start timestamp of the second from oldest iff * not in bitmap wostate otherwise set it to end timestamp of * currently drained change node (pending_confirm) */ oldest_chg_node = (change_node_t *)inm_list_entry(ptr, change_node_t, next); if (oldest_chg_node->wostate != ecWriteOrderStateBitmap) ctxt->tc_rpo_timestamp = oldest_chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; else ctxt->tc_rpo_timestamp = pending_confirm->changes.end_ts.TimeInHundNanoSecondsFromJan1601; } else { /* Last change node got drained then RPO is zero */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); ctxt->tc_rpo_timestamp = driver_ctx->last_time_stamp; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); } } else { /* pending confirm is not in wo state data i.e. 
performance changes * kicked in and the pending confirm node always has a manipulated * timestamp */ if (pending_confirm != oldest_chg_node && oldest_chg_node->wostate != ecWriteOrderStateBitmap) { ctxt->tc_rpo_timestamp = oldest_chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; } } return ctxt->tc_rpo_timestamp; } void end_cx_session() { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_session_t *disk_cx_sess; inm_list_head_t *ptr; if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED) || (vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) return; vm_cx_sess->vcs_flags |= VCS_CX_SESSION_ENDED; get_time_stamp(&(vm_cx_sess->vcs_end_ts)); for (ptr = driver_ctx->dc_disk_cx_sess_list.next; ptr != &(driver_ctx->dc_disk_cx_sess_list); ptr = ptr->next) { disk_cx_sess = inm_list_entry(ptr, disk_cx_session_t, dcs_list); if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED)) continue; disk_cx_sess->dcs_flags |= DCS_CX_SESSION_ENDED; disk_cx_sess->dcs_end_ts = vm_cx_sess->vcs_end_ts; } } void update_disk_churn_buckets(disk_cx_session_t *disk_cx_sess) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; inm_u32_t disk_churn_in_MB = (disk_cx_sess->dcs_tracked_bytes_per_second >> MEGABYTE_BIT_SHIFT); inm_u32_t disk_bucket_idx = disk_churn_in_MB / 5; if (disk_bucket_idx >= DEFAULT_NR_CHURN_BUCKETS) disk_bucket_idx = DEFAULT_NR_CHURN_BUCKETS - 1; disk_cx_sess->dcs_churn_buckets[disk_bucket_idx]++; if (disk_cx_sess->dcs_tracked_bytes_per_second >= vm_cx_sess->vcs_default_disk_peak_churn) { disk_cx_sess->dcs_excess_churn += (disk_cx_sess->dcs_tracked_bytes_per_second - vm_cx_sess->vcs_default_disk_peak_churn); get_time_stamp(&(disk_cx_sess->dcs_last_peak_churn_ts)); if (!disk_cx_sess->dcs_first_peak_churn_ts) disk_cx_sess->dcs_first_peak_churn_ts = disk_cx_sess->dcs_last_peak_churn_ts; if (disk_cx_sess->dcs_tracked_bytes_per_second > disk_cx_sess->dcs_max_peak_churn) disk_cx_sess->dcs_max_peak_churn = disk_cx_sess->dcs_tracked_bytes_per_second; } disk_cx_sess->dcs_tracked_bytes_per_second = 0; disk_cx_sess->dcs_base_secs_ts += HUNDREDS_OF_NANOSEC_IN_SECOND; } void update_vm_churn_buckets(vm_cx_session_t *vm_cx_sess) { inm_u32_t vm_churn_in_MB = (vm_cx_sess->vcs_tracked_bytes_per_second >> MEGABYTE_BIT_SHIFT); inm_u32_t vm_bucket_idx = vm_churn_in_MB / 10; if (vm_bucket_idx >= DEFAULT_NR_CHURN_BUCKETS) vm_bucket_idx = DEFAULT_NR_CHURN_BUCKETS - 1; vm_cx_sess->vcs_churn_buckets[vm_bucket_idx]++; if (vm_cx_sess->vcs_tracked_bytes_per_second >= vm_cx_sess->vcs_default_vm_peak_churn) { vm_cx_sess->vcs_excess_churn += (vm_cx_sess->vcs_tracked_bytes_per_second - vm_cx_sess->vcs_default_vm_peak_churn); get_time_stamp(&(vm_cx_sess->vcs_last_peak_churn_ts)); if (!vm_cx_sess->vcs_first_peak_churn_ts) vm_cx_sess->vcs_first_peak_churn_ts = vm_cx_sess->vcs_last_peak_churn_ts; if (vm_cx_sess->vcs_tracked_bytes_per_second > vm_cx_sess->vcs_max_peak_churn) vm_cx_sess->vcs_max_peak_churn = vm_cx_sess->vcs_tracked_bytes_per_second; } vm_cx_sess->vcs_tracked_bytes_per_second = 0; vm_cx_sess->vcs_base_secs_ts += HUNDREDS_OF_NANOSEC_IN_SECOND; } void add_disk_sess_to_dc(target_context_t *ctxt) { disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; inm_list_add_tail(&disk_cx_sess->dcs_list, &driver_ctx->dc_disk_cx_sess_list); } void remove_disk_sess_from_dc(target_context_t *ctxt) { disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); inm_list_del(&disk_cx_sess->dcs_list);
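/* Note: the disk CX session must be unlinked while holding * dc_vm_cx_session_lock, since dc_disk_cx_sess_list is walked under * the same lock elsewhere (e.g. in end_cx_session()). */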
INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void start_disk_cx_session(target_context_t *ctxt, vm_cx_session_t *vm_cx_sess, inm_u32_t nr_bytes) { disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; inm_u32_t churn; churn = ((ctxt->tc_bytes_pending_changes <= CX_SESSION_PENDING_BYTES_THRESHOLD) ? (ctxt->tc_bytes_pending_changes + nr_bytes - CX_SESSION_PENDING_BYTES_THRESHOLD) : nr_bytes); inm_list_del(&disk_cx_sess->dcs_list); INM_MEM_ZERO(disk_cx_sess, sizeof(disk_cx_session_t)); inm_list_add_tail(&disk_cx_sess->dcs_list, &driver_ctx->dc_disk_cx_sess_list); if (!vm_cx_sess->vcs_num_disk_cx_sess) { vm_cx_sess->vcs_tracked_bytes = ctxt->tc_bytes_pending_changes + nr_bytes; vm_cx_sess->vcs_tracked_bytes_per_second += churn; disk_cx_sess->dcs_start_ts = vm_cx_sess->vcs_start_ts; } else { vm_cx_sess->vcs_tracked_bytes += nr_bytes; get_time_stamp(&(disk_cx_sess->dcs_start_ts)); if (disk_cx_sess->dcs_start_ts - vm_cx_sess->vcs_base_secs_ts >= HUNDREDS_OF_NANOSEC_IN_SECOND) { update_disk_churn_buckets(disk_cx_sess); update_vm_churn_buckets(vm_cx_sess); } vm_cx_sess->vcs_tracked_bytes_per_second += nr_bytes; } vm_cx_sess->vcs_num_disk_cx_sess++; disk_cx_sess->dcs_flags |= DCS_CX_SESSION_STARTED; disk_cx_sess->dcs_base_secs_ts = vm_cx_sess->vcs_base_secs_ts; disk_cx_sess->dcs_tracked_bytes = ctxt->tc_bytes_pending_changes + nr_bytes; disk_cx_sess->dcs_tracked_bytes_per_second += churn; disk_cx_sess->dcs_nth_cx_session = vm_cx_sess->vcs_nth_cx_session; } void start_cx_session(target_context_t *ctxt, vm_cx_session_t *vm_cx_sess, inm_u32_t nr_bytes) { INM_MEM_ZERO(vm_cx_sess, sizeof(vm_cx_session_t)); vm_cx_sess->vcs_flags |= VCS_CX_SESSION_STARTED; get_time_stamp(&(vm_cx_sess->vcs_start_ts)); vm_cx_sess->vcs_base_secs_ts = vm_cx_sess->vcs_start_ts; vm_cx_sess->vcs_nth_cx_session = ++driver_ctx->dc_nth_cx_session; if (driver_ctx->dc_disk_level_supported_churn) { vm_cx_sess->vcs_default_disk_peak_churn = driver_ctx->dc_disk_level_supported_churn; vm_cx_sess->vcs_default_vm_peak_churn = driver_ctx->dc_vm_level_supported_churn; } else { vm_cx_sess->vcs_default_disk_peak_churn = DISK_LEVEL_SUPPORTED_CHURN; vm_cx_sess->vcs_default_vm_peak_churn = VM_LEVEL_SUPPORTED_CHURN; } start_disk_cx_session(ctxt, vm_cx_sess, nr_bytes); } disk_cx_stats_info_t *find_disk_stat_info(target_context_t *ctxt) { inm_list_head_t *disk_cx_stats_ptr; disk_cx_stats_info_t *disk_cx_stats_info; DEVICE_CXFAILURE_STATS *dev_cx_stats; /* Start the walk at the first entry; starting at the list head itself * would terminate the loop immediately and never find a match. */ for (disk_cx_stats_ptr = driver_ctx->dc_disk_cx_stats_list.next; disk_cx_stats_ptr != &driver_ctx->dc_disk_cx_stats_list; disk_cx_stats_ptr = disk_cx_stats_ptr->next) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_ptr, disk_cx_stats_info_t, dcsi_list); dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; if (!disk_cx_stats_info->dcsi_valid) { disk_cx_stats_info->dcsi_valid = 1; strcpy_s(dev_cx_stats->DeviceId.volume_guid, GUID_SIZE_IN_CHARS, ctxt->tc_pname); return disk_cx_stats_info; } if (!strcmp(ctxt->tc_pname, dev_cx_stats->DeviceId.volume_guid)) return disk_cx_stats_info; } return NULL; } void close_disk_cx_session(target_context_t *ctxt, int reason_code) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; disk_cx_stats_info_t *disk_cx_stats_info; DEVICE_CXFAILURE_STATS *disk_cx_stats; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED)) goto out; if
(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) { if (reason_code == CX_CLOSE_STOP_FILTERING_ISSUED || reason_code == CX_CLOSE_DISK_REMOVAL) { disk_cx_stats_info = disk_cx_sess->dcs_disk_cx_stats_info; if (!disk_cx_stats_info) { disk_cx_stats_info = find_disk_stat_info(ctxt); if (!disk_cx_stats_info) goto erase_disk_session; driver_ctx->dc_num_disk_cx_stats++; } disk_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; if (reason_code & CX_CLOSE_STOP_FILTERING_ISSUED) disk_cx_stats->ullFlags |= DISK_CXSTATUS_DISK_NOT_FILTERED; if (reason_code & CX_CLOSE_DISK_REMOVAL) disk_cx_stats->ullFlags |= DISK_CXSTATUS_DISK_REMOVED; } erase_disk_session: disk_cx_sess->dcs_flags = 0; vm_cx_sess->vcs_num_disk_cx_sess--; if (!vm_cx_sess->vcs_num_disk_cx_sess) INM_MEM_ZERO(vm_cx_sess, sizeof(vm_cx_session_t)); } out: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void update_disk_cx_session(disk_cx_session_t *disk_cx_sess, vm_cx_session_t *vm_cx_sess, inm_u32_t nr_bytes) { inm_u64_t cuur_ts; if (disk_cx_sess->dcs_base_secs_ts < vm_cx_sess->vcs_base_secs_ts) { update_disk_churn_buckets(disk_cx_sess); disk_cx_sess->dcs_base_secs_ts = vm_cx_sess->vcs_base_secs_ts; } get_time_stamp(&cuur_ts); if (cuur_ts - vm_cx_sess->vcs_base_secs_ts >= HUNDREDS_OF_NANOSEC_IN_SECOND) update_disk_churn_buckets(disk_cx_sess); disk_cx_sess->dcs_tracked_bytes += nr_bytes; disk_cx_sess->dcs_tracked_bytes_per_second += nr_bytes; } void update_vm_cx_session(vm_cx_session_t *vm_cx_sess, inm_u32_t nr_bytes) { inm_u64_t cuur_ts; get_time_stamp(&cuur_ts); if (cuur_ts - vm_cx_sess->vcs_base_secs_ts >= HUNDREDS_OF_NANOSEC_IN_SECOND) update_vm_churn_buckets(vm_cx_sess); vm_cx_sess->vcs_tracked_bytes += nr_bytes; vm_cx_sess->vcs_tracked_bytes_per_second += nr_bytes; } void update_cx_session(target_context_t *ctxt, inm_u32_t nr_bytes) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; inm_list_head_t *ptr; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (driver_ctx->total_prot_volumes_in_nwo) goto out; if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) && (vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) && (ctxt->tc_bytes_pending_changes + nr_bytes) > CX_SESSION_PENDING_BYTES_THRESHOLD) { for (ptr = driver_ctx->dc_disk_cx_sess_list.next; ptr != &driver_ctx->dc_disk_cx_sess_list; ptr = ptr->next) { disk_cx_session_t *dcs; dcs = inm_list_entry(ptr, disk_cx_session_t, dcs_list); if (!(dcs->dcs_flags & DCS_CX_SESSION_STARTED)) continue; dcs->dcs_flags = 0; vm_cx_sess->vcs_num_disk_cx_sess--; if (!vm_cx_sess->vcs_num_disk_cx_sess) INM_MEM_ZERO(vm_cx_sess, sizeof(vm_cx_session_t)); } } if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) goto out; if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED)) { if ((ctxt->tc_bytes_pending_changes + nr_bytes) > CX_SESSION_PENDING_BYTES_THRESHOLD) start_cx_session(ctxt, vm_cx_sess, nr_bytes); } else { if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED)) { if ((ctxt->tc_bytes_pending_changes + nr_bytes) > CX_SESSION_PENDING_BYTES_THRESHOLD) start_disk_cx_session(ctxt, vm_cx_sess, nr_bytes); else update_vm_cx_session(vm_cx_sess, nr_bytes); } else { update_disk_cx_session(disk_cx_sess, vm_cx_sess, nr_bytes); update_vm_cx_session(vm_cx_sess, nr_bytes); } } out: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void 
update_cx_session_with_committed_bytes(target_context_t *ctxt, inm_s32_t committed_bytes) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED && !(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) { vm_cx_sess->vcs_drained_bytes += committed_bytes; if (disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED && !(disk_cx_sess->dcs_flags & DCS_CX_SESSION_ENDED)) disk_cx_sess->dcs_drained_bytes += committed_bytes; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void update_cx_product_issue(int flag) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; int updated = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED) { if (!(vm_cx_sess->vcs_flags & flag)) { vm_cx_sess->vcs_flags |= (flag | VCS_CX_PRODUCT_ISSUE); updated = 1; } } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (!updated) return; switch (flag) { case VCS_CX_S2_EXIT: info("Drainer exited while CX session is in progress"); break; case VCS_CX_SVAGENT_EXIT: info("svagent exited while CX session is in progress"); break; case VCS_CX_TAG_FAILURE: info("Tag failed before CX session has ended"); break; case VCS_CX_UNSUPPORTED_BIO: dbg("Unsupported BIO detected"); break; } } void update_cx_with_tag_failure() { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED) { INM_BUG_ON(!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)); vm_cx_sess->vcs_num_consecutive_tag_failures++; if (vm_cx_sess->vcs_flags & VCS_CX_PRODUCT_ISSUE) goto out; if (vm_cx_sess->vcs_transaction_id) vm_cx_sess->vcs_transaction_id = ++driver_ctx->dc_transaction_id; if (vm_cx_sess->vcs_num_consecutive_tag_failures >= driver_ctx->dc_num_consecutive_tags_failed) wake_up_interruptible(&driver_ctx->dc_vm_cx_session_waitq); } out: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void update_cx_with_tag_success() { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) { vm_cx_sess->vcs_num_consecutive_tag_failures = 0; vm_cx_sess->vcs_transaction_id = 0; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void update_cx_with_s2_latency(target_context_t *ctxt) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_session_t *disk_cx_sess = &ctxt->tc_disk_cx_session; inm_u64_t curr_ts; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) { get_time_stamp(&curr_ts); if (ctxt->tc_s2_latency_base_ts && (curr_ts - ctxt->tc_s2_latency_base_ts) > disk_cx_sess->dcs_max_s2_latency) disk_cx_sess->dcs_max_s2_latency = curr_ts - ctxt->tc_s2_latency_base_ts; if (disk_cx_sess->dcs_max_s2_latency > vm_cx_sess->vcs_max_s2_latency) vm_cx_sess->vcs_max_s2_latency = disk_cx_sess->dcs_max_s2_latency; }
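/* The per-disk worst-case drain (s2) latency has now been folded into * the VM-level maximum; both values presumably feed the CX failure * statistics reported to user space. */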
INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void update_cx_with_time_jump(inm_u64_t cur_time, inm_u64_t prev_time) { vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; int time_jump_detected = 0; inm_u64_t jump_in_ns; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (cur_time > prev_time) { if (cur_time - prev_time > (driver_ctx->dc_max_fwd_timejump_ms * 10000ULL)) { time_jump_detected = 1; vm_cx_sess->vcs_flags |= VCS_CX_TIME_JUMP_FWD; jump_in_ns = cur_time - prev_time; } } else { if (prev_time - cur_time > (driver_ctx->dc_max_bwd_timejump_ms * 10000ULL)) { time_jump_detected = 1; vm_cx_sess->vcs_flags |= VCS_CX_TIME_JUMP_BWD; jump_in_ns = prev_time - cur_time; } } if (time_jump_detected) { vm_cx_sess->vcs_timejump_ts = prev_time; vm_cx_sess->vcs_max_jump_ms = (jump_in_ns / 10000ULL); if (vm_cx_sess->vcs_transaction_id) vm_cx_sess->vcs_transaction_id = ++driver_ctx->dc_transaction_id; wake_up_interruptible(&driver_ctx->dc_vm_cx_session_waitq); } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); } void reset_s2_latency_time() { target_context_t *ctxt; inm_list_head_t *ptr; INM_DOWN_READ(&driver_ctx->tgt_list_sem); for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, ctxt = NULL) { ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; volume_lock(ctxt); ctxt->tc_s2_latency_base_ts = 0; volume_unlock(ctxt); } INM_UP_READ(&driver_ctx->tgt_list_sem); } void volume_lock_all_close_cur_chg_node(void) { target_context_t *ctxt; inm_list_head_t *ptr; for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, ctxt = NULL) { ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; INM_BUG_ON(!inm_list_empty(&ctxt->tc_non_drainable_node_head)); volume_lock(ctxt); ctxt->tc_flags |= VCF_VOLUME_LOCKED; if ((ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && !inm_list_empty(&ctxt->tc_node_head) && ctxt->tc_cur_node) { do_perf_changes(ctxt, ctxt->tc_cur_node, IN_IOCTL_PATH); } ctxt->tc_cur_node = NULL; } } void volume_unlock_all(void) { target_context_t *ctxt; inm_list_head_t *ptr; /* Accessing the target contexts in reverse order to maintain the lock * hierarchy provided by volume_lock_all_close_cur_chg_node */ for (ptr = driver_ctx->tgt_list.prev; ptr != &(driver_ctx->tgt_list); ptr = ptr->prev, ctxt = NULL) { ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (ctxt->tc_flags & VCF_VOLUME_LOCKED) { ctxt->tc_flags &= ~VCF_VOLUME_LOCKED; volume_unlock(ctxt); } } } involflt-0.1.0/src/safecapismajor.h0000755000000000000000000002512514467303177016014 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details.
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef __SAFE_C_APIS_MAJOR_H__ #define __SAFE_C_APIS_MAJOR_H__ #ifdef __INM_KERNEL_DRIVERS__ #ifdef INM_LINUX #include #include #else #include #include #endif #else #include #include #include #include #endif /* __INM_KERNEL_DRIVERS__ */ /* Error Values */ #define INM_ERR_SUCCESS 0 /* Success */ #define INM_ERR_INVALID 1 /* Invalid Argument */ #define INM_ERR_OVERLAP 2 /* Overlap */ #define INM_ERR_UNTERM 3 /* Unterminated */ #define INM_ERR_NOSPC 4 /* No Space */ /* * RHEL 5 gcc compiler does not support inline functions with variable args. * As such, same function is copied as non inline variant in driver code as * a workaround. Any bug fixes here should be made there as well. */ #if !(defined(RHEL_MAJOR) && (RHEL_MAJOR == 5) && __INM_KERNEL_DRIVERS__) /* Make sure compiler does printf style format checking */ static inline int sprintf_s(char *buf, size_t bufsz, const char *fmt, ...) __attribute__ ((format(printf, 3, 4))); static inline int sprintf_s(char *buf, size_t bufsz, const char *fmt, ...) { int retval = -1; va_list args; if( buf && bufsz > 0 && fmt ) { va_start(args, fmt); retval = vsnprintf(buf, bufsz, fmt, args); /* If buffer not adequate, return error */ if( retval >= bufsz ) retval = -1; va_end(args); } if( retval == -1 ) { if( buf && bufsz ) *buf = '\0'; } return retval; } static inline int vsnprintf_s(char *buf, size_t bufsz, const char *fmt, va_list args) { int retval = -1; if( buf && bufsz > 0 && fmt ) { retval = vsnprintf(buf, bufsz, fmt, args); /* If buffer not adequate, return error */ if( retval >= bufsz ) retval = -1; } if( retval == -1 ) { if( buf && bufsz ) *buf = '\0'; } return retval; } #endif static inline int memcpy_s(void *dest, size_t d_count, const void *src, size_t s_count) { uint8_t *destp; const uint8_t *srcp; destp = (uint8_t *)dest; srcp = (const uint8_t *)src; /* nothing to copy*/ if (s_count == 0) { return INM_ERR_SUCCESS; } /* Validations: * 1. dest and src shouldn't be NULL * 2. d_count and s_count shouldn't be 0 * 3. s_count shouldn't be greater than d_count */ if (destp == NULL || d_count == 0 || srcp == NULL || s_count > d_count) { return INM_ERR_INVALID; } /* Check for overlap of dest and src */ if (((srcp > destp) && (srcp < (destp + d_count))) || ((destp > srcp) && (destp < (srcp + s_count)))) { return INM_ERR_OVERLAP; } /* Copy the data from src to dest */ do { *destp++ = *srcp++; } while (--s_count); return INM_ERR_SUCCESS; } static inline int strcat_s(char *str_dest, size_t d_count, const char *str_src) { int ret = INM_ERR_NOSPC; const char *overlap_pos, *overlap_buf; char *str_dest_orig = str_dest; const char *str_src_orig = str_src; /* Validations: * 1. str_dest & str_src shouldn't be NULL * 2. 
d_count shouldn't be 0 */ if (str_dest == NULL || d_count == 0 || str_src == NULL) return INM_ERR_INVALID; /* Find the end of str_dest */ while (*str_dest != '\0') { /* Check for overlap */ if (str_dest == str_src) { ret = INM_ERR_OVERLAP; goto out; } str_dest++; d_count--; if (d_count == 0) { ret = INM_ERR_UNTERM; goto out; } } if (str_dest_orig < str_src_orig) overlap_pos = str_src_orig; else overlap_pos = str_dest_orig; while (d_count > 0) { if (str_dest_orig < str_src_orig) overlap_buf = str_dest; else overlap_buf = str_src; /* Check for overlap */ if (overlap_buf == overlap_pos){ ret = INM_ERR_OVERLAP; goto out; } *str_dest = *str_src; if (*str_dest == '\0') return INM_ERR_SUCCESS; str_dest++; str_src++; d_count--; } out: *str_dest_orig = '\0'; return ret; } static inline int strncat_s(char *str_dest, size_t d_count, const char *str_src, size_t s_count) { int ret = INM_ERR_NOSPC; const char *overlap_pos, *overlap_buf; char *str_dest_orig = str_dest; const char *str_src_orig = str_src; /* Validations: * 1. str_dest & str_src shouldn't be NULL * 2. d_count shouldn't be 0 */ if (str_dest == NULL || d_count == 0 || str_src == NULL) return INM_ERR_INVALID; /* Find the end of str_dest */ while (*str_dest != '\0') { /* Check for overlap */ if (str_dest == str_src) { ret = INM_ERR_OVERLAP; goto out; } str_dest++; d_count--; if (d_count == 0){ ret = INM_ERR_UNTERM; goto out; } } if (str_dest_orig < str_src_orig) overlap_pos = str_src_orig; else overlap_pos = str_dest_orig; while (d_count > 0) { if (str_dest_orig < str_src_orig) overlap_buf = str_dest; else overlap_buf = str_src; /* Check for overlap */ if (overlap_buf == overlap_pos) { ret = INM_ERR_OVERLAP; goto out; } if (s_count == 0) { *str_dest = '\0'; return INM_ERR_SUCCESS; } *str_dest = *str_src; if (*str_dest == '\0') return INM_ERR_SUCCESS; str_dest++; str_src++; d_count--; s_count--; } out: *str_dest_orig = '\0'; return ret; } /* * API to safely copy a string from src to tgt. * INPUT: * tgt : pointer to the target buffer. * src : pointer to the string to be copied. * tgtmax: max size of the target buffer. * RETURN: * 0 : string copied successfully. * >0 : see failure cases defined above. */ static inline int strcpy_s(char *tgt, size_t tgtmax, const char *src) { const char *overlap_sensor; int ret; int src_high_mem = 0; char *tgt_orig = tgt; /* Validate argument */ if(tgt == NULL || tgtmax == 0) return INM_ERR_INVALID; if (src == NULL) { ret = INM_ERR_INVALID; goto out; } if (tgt < src) { overlap_sensor = src; src_high_mem = 1; } else { overlap_sensor = tgt; } do { if (!tgtmax) { ret = INM_ERR_NOSPC; goto out; } if (src_high_mem) { if(tgt == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } else { if (src == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } --tgtmax; } while((*tgt++ = *src++)); /*zero_out_remaining*/ while(tgtmax) { *tgt = '\0'; tgtmax--; tgt++; } return INM_ERR_SUCCESS; out: *tgt_orig = '\0'; return ret; } /* * API to safely copy up to len bytes from source to target. * INPUT: * tgt : pointer to the target buffer. * src : pointer to the string to be copied. * tgtmax: max size of the target buffer. * len : length in bytes to be copied. * RETURN: * 0 : string copied successfully. * >0 : see failure cases defined above.
* RUNTIME CONSTRAINTS: * If len is greater than or equal to tgtmax, then tgtmax must be * greater than strnlen(src, tgtmax). * */ static inline int strncpy_s(char *tgt, size_t tgtmax, const char *src, size_t len) { const char *overlap_sensor; int ret; int src_high_mem = 0; char *tgt_orig = tgt; /* validate argument */ if(tgt == NULL || tgtmax == 0) return INM_ERR_INVALID; if (src == NULL) { ret = INM_ERR_INVALID; goto out; } if (tgt < src) { overlap_sensor = src; src_high_mem = 1; } else { overlap_sensor = tgt; } do { if (!tgtmax) { ret = INM_ERR_NOSPC; goto out; } if (src_high_mem) { if(tgt == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } else { if (src == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } if(!len) { goto zero_out_remaining; } --tgtmax; --len; } while((*tgt++ = *src++)); zero_out_remaining: while(tgtmax) { *tgt = '\0'; tgtmax--; tgt++; } return INM_ERR_SUCCESS; out: *tgt_orig = '\0'; return ret; } #ifndef __INM_KERNEL_DRIVERS__ static inline int wcscat_s(wchar_t *str_dest, size_t d_count, const wchar_t *str_src) { int ret = INM_ERR_NOSPC; const wchar_t *overlap_pos, *overlap_buf; wchar_t *str_dest_orig = str_dest; const wchar_t *str_src_orig = str_src; /* Validations: * 1. str_dest & str_src shouldn't be NULL * 2. d_count shouldn't be 0 */ if (str_dest == NULL || d_count == 0 || str_src == NULL) return INM_ERR_INVALID; /* Find the end of str_dest */ while (*str_dest != L'\0') { /* Check for overlap */ if (str_dest == str_src) { ret = INM_ERR_OVERLAP; goto out; } str_dest++; d_count--; if (d_count == 0) { ret = INM_ERR_UNTERM; goto out; } } if (str_dest_orig < str_src_orig) overlap_pos = str_src_orig; else overlap_pos = str_dest_orig; while (d_count > 0) { if (str_dest_orig < str_src_orig) overlap_buf = str_dest; else overlap_buf = str_src; /* Check for overlap */ if (overlap_buf == overlap_pos) { ret = INM_ERR_OVERLAP; goto out; } *str_dest = *str_src; if (*str_dest == L'\0') return INM_ERR_SUCCESS; str_dest++; str_src++; d_count--; } out: *str_dest_orig = L'\0'; return ret; } /* * API to safely copy a wide-character string from src to tgt. * INPUT: * tgt : pointer to the target buffer. * src : pointer to the string to be copied. * tgtmax: max size of the target buffer. * RETURN: * 0 : string copied successfully. * >0 : see failure cases defined above.
* */ static inline int wcscpy_s(wchar_t *tgt, size_t tgtmax, const wchar_t *src) { const wchar_t *overlap_sensor; int ret; int src_high_mem = 0; wchar_t *tgt_orig = tgt; /* Validate argument */ if (tgt == NULL || tgtmax == 0) return INM_ERR_INVALID; if (src == NULL) { ret = INM_ERR_INVALID; goto out; } if (tgt < src) { overlap_sensor = src; src_high_mem = 1; } else { overlap_sensor = tgt; } do { if (!tgtmax) { ret = INM_ERR_NOSPC; goto out; } if (src_high_mem) { if(tgt == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } else { if (src == overlap_sensor) { ret = INM_ERR_OVERLAP; goto out; } } --tgtmax; } while(*tgt++ = *src++); /*zero_out_remaining*/ while(tgtmax) { *tgt = L'\0'; tgtmax--; tgt++; } return INM_ERR_SUCCESS; out: *tgt_orig = L'\0'; return ret; } #endif /* __INM_KERNEL_DRIVERS__ */ #endif /* __SAFE_C_APIS_MAJOR_H__ */ involflt-0.1.0/src/last_chance_writes.c0000755000000000000000000001351714467303177016665 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "work_queue.h" #include "utils.h" #include "filestream.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "VBitmap.h" #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "osdep.h" #include "db_routines.h" #include "errlog.h" #include "filestream_raw.h" #include "bitmap_api.h" #include "filter_host.h" extern driver_context_t *driver_ctx; void lcw_move_bitmap_to_raw_mode(target_context_t *tgt_ctxt) { inm_s32_t error = 0; inm_u64_t resync_error = 0; volume_bitmap_t *vbmap = NULL; bitmap_api_t *bapi = NULL; err("Switching bitmap file to rawio mode for %s", tgt_ctxt->tc_guid); inmage_flt_save_all_changes(tgt_ctxt, TRUE, INM_NO_OP); volume_lock(tgt_ctxt); if(tgt_ctxt->tc_bp->volume_bitmap) { get_volume_bitmap(tgt_ctxt->tc_bp->volume_bitmap); vbmap = tgt_ctxt->tc_bp->volume_bitmap; } volume_unlock(tgt_ctxt); if (!vbmap) { resync_error = ERROR_TO_REG_LEARN_PHYSICAL_IO_FAILURE; error = LINVOLFLT_ERR_DELETE_BITMAP_FILE_NO_NAME; goto out; } INM_DOWN(&vbmap->sem); if (vbmap->eVBitmapState != ecVBitmapStateClosed) { bapi = tgt_ctxt->tc_bp->volume_bitmap->bitmap_api; error = bitmap_api_switch_to_rawio_mode(bapi, &resync_error); } else { resync_error = ERROR_TO_REG_LEARN_PHYSICAL_IO_FAILURE; error = LINVOLFLT_ERR_BITMAP_FILE_CANT_OPEN; } INM_UP(&vbmap->sem); put_volume_bitmap(vbmap); out: if (resync_error || error) { set_volume_out_of_sync(tgt_ctxt, resync_error, error); flush_and_close_bitmap_file(tgt_ctxt); } } static void lcw_flush_volume_changes(target_context_t *tgt_ctxt) { volume_bitmap_t *vbmap = NULL; err("Flushing bitmap file in rawio mode for %s", tgt_ctxt->tc_guid); volume_lock(tgt_ctxt); 
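/* Take a reference on the volume bitmap while holding the volume lock * so it cannot be torn down while the switch to rawio mode is in * progress. */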
if(tgt_ctxt->tc_bp->volume_bitmap) { vbmap = tgt_ctxt->tc_bp->volume_bitmap; get_volume_bitmap(vbmap); } volume_unlock(tgt_ctxt); if (!vbmap) goto out; INM_BUG_ON(vbmap->eVBitmapState == ecVBitmapStateClosed); inmage_flt_save_all_changes(tgt_ctxt, TRUE, INM_NO_OP); if (tgt_ctxt->tc_resync_required) bitmap_api_set_volume_out_of_sync(vbmap->bitmap_api, tgt_ctxt->tc_out_of_sync_err_code, tgt_ctxt->tc_out_of_sync_err_status); volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_VOLUME_STACKED_PARTIALLY; volume_unlock(tgt_ctxt); flush_and_close_bitmap_file(tgt_ctxt); volume_lock(tgt_ctxt); tgt_ctxt->tc_flags &= ~VCF_VOLUME_STACKED_PARTIALLY; volume_unlock(tgt_ctxt); put_volume_bitmap(vbmap); out: return; } void lcw_flush_changes(void) { struct inm_list_head *ptr = NULL, *nextptr = NULL; target_context_t *tgt_ctxt = NULL; target_context_t *root = NULL; INM_DOWN_READ(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; /* keep the root device for the end */ if (isrootdev(tgt_ctxt)) { root = tgt_ctxt; continue; } if (tgt_ctxt->tc_bp->volume_bitmap) lcw_flush_volume_changes(tgt_ctxt); } if (root && root->tc_bp->volume_bitmap) lcw_flush_volume_changes(root); else INM_BUG_ON(1); /* This should never happen */ INM_UP_READ(&driver_ctx->tgt_list_sem); } static inm_s32_t lcw_map_bitmap_file_blocks(target_context_t *tgt_ctxt) { inm_s32_t error = 0; volume_bitmap_t *vbmap = NULL; fstream_raw_hdl_t *hdl = NULL; err("Mapping bitmap file for %s", tgt_ctxt->tc_guid); volume_lock(tgt_ctxt); if(tgt_ctxt->tc_bp->volume_bitmap) { get_volume_bitmap(tgt_ctxt->tc_bp->volume_bitmap); vbmap = tgt_ctxt->tc_bp->volume_bitmap; } volume_unlock(tgt_ctxt); if (!vbmap) { error = INM_ENOENT; goto out; } INM_DOWN(&vbmap->sem); error = bitmap_api_map_file_blocks(vbmap->bitmap_api, &hdl); if (!error) fstream_raw_close(hdl); INM_UP(&vbmap->sem); put_volume_bitmap(vbmap); out: return error; } inm_s32_t lcw_perform_bitmap_op(char *guid, enum LCW_OP op) { inm_s32_t error = 0; target_context_t *tgt_ctxt = NULL; tgt_ctxt = get_tgt_ctxt_from_uuid(guid); if (!tgt_ctxt) { error = -ENOENT; err("Guid not found - %s", guid); goto out; } err("Running op %d for disk %s", op, guid); switch (op) { case LCW_OP_BMAP_MAP_FILE: error = lcw_map_bitmap_file_blocks(tgt_ctxt); break; case LCW_OP_BMAP_SWITCH_RAWIO: lcw_move_bitmap_to_raw_mode(tgt_ctxt); break; case LCW_OP_BMAP_CLOSE: lcw_flush_volume_changes(tgt_ctxt); write_to_file("/proc/sys/vm/drop_caches", "1", strlen("1"), NULL); break; case LCW_OP_BMAP_OPEN: request_service_thread_to_open_bitmap(tgt_ctxt); break; default: err("Invalid opcode - %d", op); error = -EINVAL; break; } put_tgt_ctxt(tgt_ctxt); out: return error; } inm_s32_t lcw_map_file_blocks(char *name) { fstream_raw_hdl_t *hdl = NULL; inm_s32_t error = 0; error = fstream_raw_open(name, 0, 0, &hdl); if (error) fstream_raw_close(hdl); return error; } involflt-0.1.0/src/distro.h0000755000000000000000000000346714467303177014340 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _DISTRO_H #define _DISTRO_H /* SLES 11 */ #if (defined suse && DISTRO_VER == 11) #define SLES11 #if (PATCH_LEVEL == 3) #define SLES11SP3 #elif (PATCH_LEVEL == 4) #define SLES11SP4 #endif #endif /* SLES 12 */ #if (defined suse && DISTRO_VER == 12) #define SLES12 #if (PATCH_LEVEL == 1) #define SLES12SP1 #elif (PATCH_LEVEL == 2) #define SLES12SP2 #elif (PATCH_LEVEL == 3) #define SLES12SP3 #elif (PATCH_LEVEL == 4) #define SLES12SP4 #elif (PATCH_LEVEL == 5) #define SLES12SP5 #endif #endif #if (defined suse && DISTRO_VER == 15) #define SLES15 #if (PATCH_LEVEL == 1) #define SLES15SP1 #elif (PATCH_LEVEL == 2) #define SLES15SP2 #elif (PATCH_LEVEL == 3) #define SLES15SP3 #elif (PATCH_LEVEL == 4) #define SLES15SP4 #endif #endif /* RHEL 7 */ #if (defined redhat && DISTRO_VER == 8) #define RHEL8 #elif (defined redhat && DISTRO_VER == 7) #define RHEL7 #elif (defined redhat && DISTRO_VER == 6) #define RHEL6 #elif (defined redhat && DISTRO_VER == 5) #define RHEL5 #endif #ifndef RHEL5 #define INITRD_MODE #endif #endif involflt-0.1.0/src/uapi/0000755000000000000000000000000014467303177013604 5ustar rootrootinvolflt-0.1.0/src/uapi/ioctl_codes.h0000755000000000000000000001727414467303177016262 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include #ifndef INM_PATH_MAX #define INM_PATH_MAX PATH_MAX #endif #ifndef _IOCTL_H_ #define _IOCTL_H_ #define IOCTL_INMAGE_VOLUME_STACKING _IOW(FLT_IOCTL, VOLUME_STACKING_CMD, PROCESS_VOLUME_STACKING_INPUT) #define IOCTL_INMAGE_PROCESS_START_NOTIFY _IOW(FLT_IOCTL, START_NOTIFY_CMD, PROCESS_START_NOTIFY_INPUT) #define IOCTL_INMAGE_SERVICE_SHUTDOWN_NOTIFY _IOW(FLT_IOCTL, SHUTDOWN_NOTIFY_CMD, SHUTDOWN_NOTIFY_INPUT) #define IOCTL_INMAGE_STOP_FILTERING_DEVICE _IOW(FLT_IOCTL, STOP_FILTER_CMD, VOLUME_GUID) #define IOCTL_INMAGE_START_FILTERING_DEVICE _IOW(FLT_IOCTL, START_FILTER_CMD, VOLUME_GUID) #define IOCTL_INMAGE_START_FILTERING_DEVICE_V2 _IOW(FLT_IOCTL, START_FILTER_CMD_V2, inm_dev_info_compat_t) #define IOCTL_INMAGE_FREEZE_VOLUME _IOW(FLT_IOCTL, VOLUME_FREEZE_CMD, freeze_info_t) #define IOCTL_INMAGE_THAW_VOLUME _IOW(FLT_IOCTL, VOLUME_THAW_CMD, thaw_info_t) #define IOCTL_INMAGE_TAG_VOLUME_V2 _IOWR(FLT_IOCTL, TAG_CMD_V2, tag_info_t_v2) #define IOCTL_INMAGE_IOBARRIER_TAG_VOLUME _IOWR(FLT_IOCTL, TAG_CMD_V3, tag_info_t_v2) #define IOCTL_INMAGE_CREATE_BARRIER_ALL _IOWR(FLT_IOCTL, CREATE_BARRIER, flt_barrier_create_t) #define IOCTL_INMAGE_REMOVE_BARRIER_ALL _IOWR(FLT_IOCTL, REMOVE_BARRIER, flt_barrier_remove_t) #define IOCTL_INMAGE_TAG_COMMIT_V2 _IOWR(FLT_IOCTL, TAG_COMMIT_V2, flt_tag_commit_t) /* Mirror setup related ioctls */ #define IOCTL_INMAGE_START_MIRRORING_DEVICE _IOW(FLT_IOCTL, START_MIRROR_CMD, mirror_conf_info_t) #define IOCTL_INMAGE_STOP_MIRRORING_DEVICE _IOW(FLT_IOCTL, STOP_MIRROR_CMD, SCSI_ID) #define IOCTL_INMAGE_MIRROR_VOLUME_STACKING _IOW(FLT_IOCTL, MIRROR_VOLUME_STACKING_CMD, mirror_conf_info_t) #define IOCTL_INMAGE_MIRROR_EXCEPTION_NOTIFY _IOW(FLT_IOCTL, MIRROR_EXCEPTION_NOTIFY_CMD, inm_resync_notify_info_t) #define IOCTL_INMAGE_MIRROR_TEST_HEARTBEAT _IOW(FLT_IOCTL, MIRROR_TEST_HEARTBEAT_CMD, SCSI_ID) #define IOCTL_INMAGE_BLOCK_AT_LUN _IOW(FLT_IOCTL, BLOCK_AT_LUN_CMD, inm_at_lun_reconfig_t) #define IOCTL_INMAGE_GET_DIRTY_BLOCKS_TRANS_V2 _IOWR(FLT_IOCTL, GET_DB_CMD_V2, UDIRTY_BLOCK_V2) #define IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS _IOW(FLT_IOCTL, COMMIT_DB_CMD, COMMIT_TRANSACTION) #define IOCTL_INMAGE_SET_VOLUME_FLAGS _IOW(FLT_IOCTL, SET_VOL_FLAGS_CMD, VOLUME_FLAGS_INPUT) #define IOCTL_INMAGE_GET_VOLUME_FLAGS _IOR(FLT_IOCTL, GET_VOL_FLAGS_CMD, VOLUME_FLAGS_INPUT) #define IOCTL_INMAGE_WAIT_FOR_DB _IOW(FLT_IOCTL, WAIT_FOR_DB_CMD, WAIT_FOR_DB_NOTIFY) #define IOCTL_INMAGE_CLEAR_DIFFERENTIALS _IOW(FLT_IOCTL, CLEAR_DIFFS_CMD, VOLUME_GUID) #define IOCTL_INMAGE_GET_NANOSECOND_TIME _IOWR(FLT_IOCTL, GET_TIME_CMD, long long) #define IOCTL_INMAGE_UNSTACK_ALL _IO(FLT_IOCTL, UNSTACK_ALL_CMD) #define IOCTL_INMAGE_SYS_PRE_SHUTDOWN _IOW(FLT_IOCTL, SYS_PRE_SHUTDOWN_NOTIFY_CMD, void *) #define IOCTL_INMAGE_SYS_SHUTDOWN _IOW(FLT_IOCTL, SYS_SHUTDOWN_NOTIFY_CMD, SYS_SHUTDOWN_NOTIFY_INPUT) #define IOCTL_INMAGE_TAG_VOLUME _IOWR(FLT_IOCTL, TAG_CMD, unsigned long) #define IOCTL_INMAGE_SYNC_TAG_VOLUME _IOWR(FLT_IOCTL, SYNC_TAG_CMD, unsigned long) #define IOCTL_INMAGE_GET_TAG_VOLUME_STATUS _IOWR(FLT_IOCTL, SYNC_TAG_STATUS_CMD, unsigned long) #define IOCTL_INMAGE_WAKEUP_ALL_THREADS _IO(FLT_IOCTL, WAKEUP_THREADS_CMD) #define IOCTL_INMAGE_GET_DB_NOTIFY_THRESHOLD _IOWR(FLT_IOCTL, GET_DB_THRESHOLD_CMD, get_db_thres_t ) #define IOCTL_INMAGE_RESYNC_START_NOTIFICATION _IOWR(FLT_IOCTL, RESYNC_START_CMD, RESYNC_START ) #define IOCTL_INMAGE_RESYNC_END_NOTIFICATION _IOWR(FLT_IOCTL, RESYNC_END_CMD, RESYNC_END) #define IOCTL_INMAGE_GET_DRIVER_VERSION _IOWR(FLT_IOCTL, GET_DRIVER_VER_CMD, DRIVER_VERSION) #define 
IOCTL_INMAGE_SHELL_LOG _IOWR(FLT_IOCTL, GET_SHELL_LOG_CMD, char *) #define IOCTL_INMAGE_AT_LUN_CREATE _IOW(FLT_IOCTL, AT_LUN_CREATE_CMD, LUN_CREATE_INPUT) #define IOCTL_INMAGE_AT_LUN_DELETE _IOW(FLT_IOCTL, AT_LUN_DELETE_CMD, LUN_DELETE_INPUT) #define IOCTL_INMAGE_AT_LUN_LAST_WRITE_VI _IOWR(FLT_IOCTL, AT_LUN_LAST_WRITE_VI_CMD, AT_LUN_LAST_WRITE_VI) #define IOCTL_INMAGE_AT_LUN_LAST_HOST_IO_TIMESTAMP _IOWR(FLT_IOCTL, AT_LUN_LAST_HOST_IO_TIMESTAMP_CMD, AT_LUN_LAST_HOST_IO_TIMESTAMP) #define IOCTL_INMAGE_AT_LUN_QUERY _IOWR(FLT_IOCTL, AT_LUN_QUERY_CMD, LUN_QUERY_DATA) #define IOCTL_INMAGE_GET_GLOBAL_STATS _IO(FLT_IOCTL, GET_GLOBAL_STATS_CMD) #define IOCTL_INMAGE_GET_VOLUME_STATS _IO(FLT_IOCTL, GET_VOLUME_STATS_CMD) #define IOCTL_INMAGE_GET_PROTECTED_VOLUME_LIST _IO(FLT_IOCTL, GET_PROTECTED_VOLUME_LIST_CMD) #define IOCTL_INMAGE_GET_SET_ATTR _IOWR(FLT_IOCTL, GET_SET_ATTR_CMD, struct _inm_attribute *) #define IOCTL_INMAGE_GET_ADDITIONAL_VOLUME_STATS _IOWR(FLT_IOCTL, GET_ADDITIONAL_VOLUME_STATS_CMD, VOLUME_STATS_ADDITIONAL_INFO) #define IOCTL_INMAGE_GET_VOLUME_LATENCY_STATS _IOWR(FLT_IOCTL, GET_VOLUME_LATENCY_STATS_CMD, VOLUME_LATENCY_STATS) #define IOCTL_INMAGE_GET_VOLUME_BMAP_STATS _IOWR(FLT_IOCTL, GET_VOLUME_BMAP_STATS_CMD, VOLUME_BMAP_STATS) #define IOCTL_INMAGE_SET_INVOLFLT_VERBOSITY _IOWR(FLT_IOCTL, SET_INVOLFLT_VERBOSITY_CMD, inm_u32_t) #define IOCTL_INMAGE_GET_MONITORING_STATS _IOWR(FLT_IOCTL, GET_MONITORING_STATS_CMD, MONITORING_STATS) #define IOCTL_INMAGE_GET_BLK_MQ_STATUS _IOWR(FLT_IOCTL, GET_BLK_MQ_STATUS_CMD, int) #define IOCTL_INMAGE_GET_VOLUME_STATS_V2 _IOWR(FLT_IOCTL, GET_VOLUME_STATS_V2_CMD, TELEMETRY_VOL_STATS) #define IOCTL_INMAGE_REPLICATION_STATE _IOWR(FLT_IOCTL, REPLICATION_STATE, replication_state_t) #define IOCTL_INMAGE_NAME_MAPPING _IOWR(FLT_IOCTL, NAME_MAPPING, vol_name_map_t) #define IOCTL_INMAGE_LCW _IOR(FLT_IOCTL, LCW, lcw_op_t) #define IOCTL_INMAGE_WAIT_FOR_DB_V2 _IOW(FLT_IOCTL, WAIT_FOR_DB_CMD_V2, WAIT_FOR_DB_NOTIFY) #define IOCTL_INMAGE_INIT_DRIVER_FULLY _IOW(FLT_IOCTL, INIT_DRIVER_FULLY_CMD, int) #define IOCTL_INMAGE_COMMITDB_FAIL_TRANS _IOW(FLT_IOCTL, COMMITDB_FAIL_TRANS, COMMIT_DB_FAILURE_STATS) #define IOCTL_INMAGE_GET_CXSTATS_NOTIFY _IOWR(FLT_IOCTL, GET_CXSTATS_NOTIFY, VM_CXFAILURE_STATS) #define IOCTL_INMAGE_WAKEUP_GET_CXSTATS_NOTIFY_THREAD _IO(FLT_IOCTL, WAKEUP_GET_CXSTATS_NOTIFY_THREAD) #define IOCTL_INMAGE_TAG_DRAIN_NOTIFY _IOWR(FLT_IOCTL, TAG_DRAIN_NOTIFY, TAG_COMMIT_NOTIFY_OUTPUT) #define IOCTL_INMAGE_WAKEUP_TAG_DRAIN_NOTIFY_THREAD _IO(FLT_IOCTL, WAKEUP_TAG_DRAIN_NOTIFY_THREAD) #define IOCTL_INMAGE_MODIFY_PERSISTENT_DEVICE_NAME _IOW(FLT_IOCTL, MODIFY_PERSISTENT_DEVICE_NAME, MODIFY_PERSISTENT_DEVICE_NAME_INPUT) #define IOCTL_INMAGE_GET_DRAIN_STATE _IOWR(FLT_IOCTL, GET_DRAIN_STATE_CMD, GET_DISK_STATE_OUTPUT) #define IOCTL_INMAGE_SET_DRAIN_STATE _IOWR(FLT_IOCTL, SET_DRAIN_STATE_CMD, SET_DRAIN_STATE_OUTPUT) #endif /* _IOCTL_H_ */ involflt-0.1.0/src/uapi/inm_types.h0000755000000000000000000000327414467303177015775 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */
#ifndef INM_TYPES_H #define INM_TYPES_H
#ifdef __KERNEL__ #include <linux/types.h> #else #include <stdint.h> #endif #include <linux/version.h>
/* signed types */ typedef long long inm_sll64_t; typedef int64_t inm_s64_t; typedef int32_t inm_s32_t; typedef int16_t inm_s16_t; typedef int8_t inm_s8_t; typedef char inm_schar;
/* unsigned types */ typedef unsigned long long inm_ull64_t; typedef uint64_t inm_u64_t; typedef uint32_t inm_u32_t; typedef uint16_t inm_u16_t; typedef uint8_t inm_u8_t; typedef unsigned char inm_uchar;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) typedef void inm_iodone_t; #else typedef int inm_iodone_t; #endif
#endif /* INM_TYPES_H */ involflt-0.1.0/src/uapi/involflt.h0000755000000000000000000012747514467303177015625 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : involflt.h * * Description: Header shared between linux filter driver and user-space components.
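 *
 * As an illustrative sketch (not part of this header; error handling and
 * the usual <fcntl.h>/<sys/ioctl.h>/<stdio.h> includes are elided), a
 * user-space component talks to the driver by opening the control device
 * and issuing the ioctls defined below:
 *
 *     int fd = open(INMAGE_FILTER_DEVICE_NAME, O_RDWR);
 *     DRIVER_VERSION ver = {0};
 *     if (fd >= 0 && ioctl(fd, IOCTL_INMAGE_GET_DRIVER_VERSION, &ver) == 0)
 *         printf("involflt driver %hu.%hu.%hu.%hu\n",
 *             ver.ulDrMajorVersion, ver.ulDrMinorVersion,
 *             ver.ulDrMinorVersion2, ver.ulDrMinorVersion3);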
*/ #ifndef INVOLFLT_H #define INVOLFLT_H #include "inm_types.h" #include "ioctl_codes.h" #define INMAGE_FILTER_DEVICE_NAME "/dev/involflt" #define GUID_SIZE_IN_CHARS 128 #define MAX_INITIATOR_NAME_LEN 24 #define INM_GUID_LEN_MAX 256 #define INM_MAX_VOLUMES_IN_LIST 0xFF #define INM_MAX_SCSI_ID_SIZE 256 #define MAX_WWPN_SIZE 256 #define TAG_VOLUME_MAX_LENGTH 256 #define GUID_LEN 36 #define TAG_MAX_LENGTH 256 /* Version information stated with Driver Version 2.0.0.0 */ #define DRIVER_MAJOR_VERSION 0x02 #define DRIVER_MINOR_VERSION 0x03 #define DRIVER_MINOR_VERSION2 0x0f #define DRIVER_MINOR_VERSION3 0x3f /*freeze, thaw, tag volume return status */ #define STATUS_FREEZE_SUCCESS 0x0000 #define STATUS_FREEZE_FAILED 0x0001 #define STATUS_THAW_SUCCESS 0x0000 #define STATUS_THAW_FAILED 0x0001 #define STATUS_TAG_ACCEPTED 0x0000 #define STATUS_TAG_NOT_ACCEPTED 0x0001 #define STATUS_TAG_NOT_PROTECTED 0x0002 #define STATUS_TAG_WO_METADATA 0x0004 #define TAG_FS_FROZEN_IN_USERSPACE 0x0004 #define VACP_IOBARRIER_TIMEOUT 300 /* in ms */ #define VACP_TAG_COMMIT_TIMEOUT 300 /* in ms */ #define FS_FREEZE_TIMEOUT 60000 /* in ms */ #define VACP_APP_TAG_COMMIT_TIMEOUT 60000 /* in ms */ typedef enum { INM_TAG_FAILED = -1, /* ioctl returns -1 on failure*/ /* and sets the errno */ INM_TAG_SUCCESS = 0, /* ioctl returns 0 on success */ INM_TAG_PARTIAL = 1, /* indicate partial success */ } inm_tag_status_t; typedef enum { FLT_MODE_UNINITIALIZED = 0, FLT_MODE_METADATA, FLT_MODE_DATA } flt_mode; typedef enum _etWriterOrderState { ecWriteOrderStateUnInitialized = 0, ecWriteOrderStateBitmap = 1, ecWriteOrderStateMetadata = 2, ecWriteOrderStateData = 3, ecWriteOrderStateRawBitmap = 4 } etWriteOrderState, *petWriteOrderState; typedef enum inm_device { FILTER_DEV_FABRIC_LUN = 1, FILTER_DEV_HOST_VOLUME = 2, FILTER_DEV_FABRIC_VSNAP = 3, FILTER_DEV_FABRIC_RESYNC = 4, FILTER_DEV_MIRROR_SETUP = 5, } inm_device_t; typedef enum vendor { FILTER_DEV_UNKNOWN_VENDOR = 1, FILTER_DEV_NATIVE = 2, FILTER_DEV_DEVMAPPER = 3, FILTER_DEV_MPXIO = 4, FILTER_DEV_EMC = 5, FILTER_DEV_HDLM = 6, FILTER_DEV_DEVDID = 7, FILTER_DEV_DEVGLOBAL = 8, FILTER_DEV_VXDMP = 9, FILTER_DEV_SVM = 10, FILTER_DEV_VXVM = 11, FILTER_DEV_LVM = 12, FILTER_DEV_INMVOLPACK = 13, FILTER_DEV_INMVSNAP = 14, FILTER_DEV_CUSTOMVENDOR = 15, FILTER_DEV_ASM = 16, }inm_vendor_t; #define MIRROR_SETUP_PENDING_RESYNC_CLEARED_FLAG 0x0000000000000001 #define FULL_DISK_FLAG 0x0000000000000002 #define FULL_DISK_PARTITION_FLAG 0x0000000000000004 #define FULL_DISK_LABEL_VTOC 0x0000000000000008 #define FULL_DISK_LABEL_EFI 0x0000000000000010 #define INM_IS_DEVICE_MULTIPATH 0x0000000000000100 #define MIRROR_VOLUME_STACKING_FLAG 0x00008000 #define HOST_VOLUME_STACKING_FLAG 0x00010000 /* * This structure is defined for backward compatibility with older drivers * who accept the older structure without d_pname. This structure is only * used for defining the ioctl command values. 
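 * (The _IOW()/_IOWR() helpers encode sizeof() of their argument type into
 * the generated command number, so IOCTL_INMAGE_START_FILTERING_DEVICE_V2
 * (see ioctl_codes.h) must stay defined against this legacy layout for its
 * value to keep matching what older drivers expect.)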
*/ typedef struct inm_dev_info_compat { inm_device_t d_type; char d_guid[INM_GUID_LEN_MAX]; char d_mnt_pt[INM_PATH_MAX]; inm_u64_t d_nblks; inm_u32_t d_bsize; inm_u64_t d_flags; } inm_dev_info_compat_t; typedef struct inm_dev_info { inm_device_t d_type; char d_guid[INM_GUID_LEN_MAX]; char d_mnt_pt[INM_PATH_MAX]; inm_u64_t d_nblks; inm_u32_t d_bsize; inm_u64_t d_flags; char d_pname[INM_GUID_LEN_MAX]; } inm_dev_info_t; typedef enum eMirrorConfErrors { MIRROR_NO_ERROR = 0, SRC_LUN_INVALID, ATLUN_INVALID, DRV_MEM_ALLOC_ERR, DRV_MEM_COPYIN_ERR, DRV_MEM_COPYOUT_ERR, SRC_NAME_CHANGED_ERR, ATLUN_NAME_CHANGED_ERR, MIRROR_STACKING_ERR, RESYNC_CLEAR_ERROR, RESYNC_NOT_SET_ON_CLEAR_ERR, SRC_DEV_LIST_MISMATCH_ERR, ATLUN_DEV_LIST_MISMATCH_ERR, SRC_DEV_SCSI_ID_ERR, DST_DEV_SCSI_ID_ERR, MIRROR_NOT_SETUP, MIRROR_NOT_SUPPORTED } eMirrorConfErrors_t; typedef struct _mirror_conf_info { inm_u64_t d_flags; inm_u64_t d_nblks; inm_u64_t d_bsize; inm_u64_t startoff; #ifdef INM_AIX #ifdef __64BIT__ /* [volume name 1]..[volume name n] */ char *src_guid_list; #else int padding; /* [volume name 1]..[volume name n] */ char *src_guid_list; #endif #ifdef __64BIT__ /* [volume name 1]..[volume name n] */ char *dst_guid_list; #else int padding_2; /* [volume name 1]..[volume name n] */ char *dst_guid_list; #endif #else /* [volume name 1]..[volume name n] */ char *src_guid_list; /* [volume name 1]..[volume name n] */ char *dst_guid_list; #endif eMirrorConfErrors_t d_status; inm_u32_t nsources; inm_u32_t ndestinations; inm_device_t d_type; inm_vendor_t d_vendor; char src_scsi_id[INM_MAX_SCSI_ID_SIZE]; char dst_scsi_id[INM_MAX_SCSI_ID_SIZE]; /* AT LUN id */ char at_name[INM_GUID_LEN_MAX]; } mirror_conf_info_t; typedef enum _etBitOperation { ecBitOpNotDefined = 0, ecBitOpSet = 1, ecBitOpReset = 2, } etBitOperation; #define PROCESS_START_NOTIFY_INPUT_FLAGS_DATA_FILES_AWARE 0x0001 #define PROCESS_START_NOTIFY_INPUT_FLAGS_64BIT_PROCESS 0x0002 #define SHUTDOWN_NOTIFY_FLAGS_ENABLE_DATA_FILTERING 0x00000001 #define SHUTDOWN_NOTIFY_FLAGS_ENABLE_DATA_FILES 0x00000002 typedef struct { char volume_guid[GUID_SIZE_IN_CHARS]; } VOLUME_GUID; typedef struct { char scsi_id[INM_MAX_SCSI_ID_SIZE]; } SCSI_ID; typedef struct _SHUTDOWN_NOTIFY_INPUT { inm_u32_t ulFlags; } SHUTDOWN_NOTIFY_INPUT, *PSHUTDOWN_NOTIFY_INPUT; typedef SHUTDOWN_NOTIFY_INPUT SYS_SHUTDOWN_NOTIFY_INPUT; typedef struct _PROCESS_START_NOTIFY_INPUT { inm_u32_t ulFlags; } PROCESS_START_NOTIFY_INPUT, *PPROCESS_START_NOTIFY_INPUT; typedef struct _PROCESS_VOLUME_STACKING_INPUT { inm_u32_t ulFlags; } PROCESS_VOLUME_STACKING_INPUT, *PPROCESS_VOLUME_STACKING_INPUT; typedef struct _DISK_CHANGE { inm_u64_t ByteOffset; inm_u32_t Length; unsigned short usBufferIndex; unsigned short usNumberOfBuffers; } DISK_CHANGE, DiskChange, *PDISK_CHANGE; typedef struct _LUN_CREATE_INPUT { char uuid[GUID_SIZE_IN_CHARS+1]; inm_u64_t lunSize; inm_u64_t lunStartOff; inm_u32_t blockSize; inm_device_t lunType; } LUN_CREATE_INPUT, LunCreateData, *PLUN_CREATE_DATA; typedef struct _LUN_DELETE_DATA { char uuid[GUID_SIZE_IN_CHARS+1]; inm_device_t lunType; } LUN_DELETE_INPUT, LunDeleteData, *PLUN_DELETE_DATA; typedef struct _AT_LUN_LAST_WRITE_VI { char uuid[GUID_SIZE_IN_CHARS+1]; char initiator_name[MAX_INITIATOR_NAME_LEN]; inm_u64_t timestamp; /* Return timestamp at which ioctl was issued */ } AT_LUN_LAST_WRITE_VI,ATLunLastWriteVI , *PATLUN_LAST_WRITE_VI; typedef struct _WWPN_DATA { /* wwpn in FC (Max. 23 bytes ex: aa:bb:cc:dd:ee:ff:gg:hh) OR * iscsi iqn name (maximum length of 223 bytes * iscsi eui name (Max. 
20 bytes ex: eui.02004567A425678D) * Using: 256 bytes long string */ char wwpn[MAX_WWPN_SIZE]; } WWPN_DATA, WwpnData, *PWWPNDATA;
typedef struct _AT_LUN_LAST_HOST_IO_TIMESTAMP { char uuid[GUID_SIZE_IN_CHARS+1]; /* Input: AT lun name, Max size = GUID_SIZE_IN_CHARS+1 */ inm_u64_t timestamp; /* Output: Return timestamp of last successful IO done by the host */ inm_u32_t wwpn_count; /* Input: Number of host PI wwpns */ WwpnData wwpn_data[1]; /* Input: each terminated by '\0', MAX size = MAX_INITIATOR_NAME_LEN ??? */ } AT_LUN_LAST_HOST_IO_TIMESTAMP, ATLunLastHostIOTimeStamp, *PAT_LUN_LAST_HOST_IO_TIMESTAMP;
typedef struct _LUN_DATA { char uuid[GUID_SIZE_IN_CHARS+1]; inm_device_t lun_type; } LUN_DATA, LunData, *PLUNDATA;
typedef struct _LUN_QUERY_DATA { inm_u32_t count; inm_u32_t lun_count; LunData lun_data[1]; } LUN_QUERY_DATA, LunQueryData, *PLUN_QUERY_DATA;
/* Tag Structure
  _________________________________________________________________________
 |                   |              |                        |   Padding   |
 |    Tag Header     |   Tag Size   |       Tag Data         |   (4 Byte   |
 |__(4 / 8 Bytes)____|____(2 Bytes)_|__(Tag Size Bytes)______|__Alignment)_|

 Tag Size does not include the padding, but the length in the Tag Header is
 the total tag length including padding, i.e.
 Tag length in header = Tag Header size + 2-byte Tag Size + Tag Data length + Padding */
#define STREAM_REC_TYPE_START_OF_TAG_LIST 0x0001 #define STREAM_REC_TYPE_END_OF_TAG_LIST 0x0002 #define STREAM_REC_TYPE_TIME_STAMP_TAG 0x0003 #define STREAM_REC_TYPE_DATA_SOURCE 0x0004 #define STREAM_REC_TYPE_USER_DEFINED_TAG 0x0005 #define STREAM_REC_TYPE_PADDING 0x0006
#ifndef INVOFLT_STREAM_FUNCTIONS #define INVOFLT_STREAM_FUNCTIONS
typedef struct _STREAM_REC_HDR_4B { unsigned short usStreamRecType; unsigned char ucFlags; unsigned char ucLength; /* Length includes size of this header too. */ } STREAM_REC_HDR_4B, *PSTREAM_REC_HDR_4B;
typedef struct _STREAM_REC_HDR_8B { unsigned short usStreamRecType; unsigned char ucFlags; /* STREAM_REC_FLAGS_LENGTH_BIT bit is set for this record. */ unsigned char ucReserved; inm_u32_t ulLength; /* Length includes size of this header too. */ } STREAM_REC_HDR_8B, *PSTREAM_REC_HDR_8B;
#define FILL_STREAM_HEADER_4B(pHeader, Type, Len) \
{ \
((PSTREAM_REC_HDR_4B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_4B)pHeader)->ucFlags = 0; \
((PSTREAM_REC_HDR_4B)pHeader)->ucLength = Len; \
}
#define FILL_STREAM_HEADER_8B(pHeader, Type, Len) \
{ \
((PSTREAM_REC_HDR_8B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_8B)pHeader)->ucFlags = STREAM_REC_FLAGS_LENGTH_BIT; \
((PSTREAM_REC_HDR_8B)pHeader)->ucReserved = 0; \
((PSTREAM_REC_HDR_8B)pHeader)->ulLength = Len; \
}
#define STREAM_REC_FLAGS_LENGTH_BIT 0x01
#define GET_STREAM_LENGTH(pHeader) \
( (((PSTREAM_REC_HDR_4B)pHeader)->ucFlags & STREAM_REC_FLAGS_LENGTH_BIT) ? \
(((PSTREAM_REC_HDR_8B)pHeader)->ulLength) : \
(((PSTREAM_REC_HDR_4B)pHeader)->ucLength))
#endif
#define FILL_STREAM_HEADER(pHeader, Type, Len) \
{ \
if((inm_u32_t )Len > (inm_u32_t )0xFF) { \
((PSTREAM_REC_HDR_8B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_8B)pHeader)->ucFlags = STREAM_REC_FLAGS_LENGTH_BIT; \
((PSTREAM_REC_HDR_8B)pHeader)->ucReserved = 0; \
((PSTREAM_REC_HDR_8B)pHeader)->ulLength = Len; \
} else { \
((PSTREAM_REC_HDR_4B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_4B)pHeader)->ucFlags = 0; \
((PSTREAM_REC_HDR_4B)pHeader)->ucLength = (unsigned char )Len; \
} \
}
#define FILL_STREAM(pHeader, Type, Len, pData) \
{ \
if((inm_u32_t )Len > (inm_u32_t )0xFF) { \
((PSTREAM_REC_HDR_8B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_8B)pHeader)->ucFlags = STREAM_REC_FLAGS_LENGTH_BIT; \
((PSTREAM_REC_HDR_8B)pHeader)->ucReserved = 0; \
((PSTREAM_REC_HDR_8B)pHeader)->ulLength = Len; \
memcpy(((unsigned char *)pHeader) + sizeof(STREAM_REC_HDR_8B), pData, Len); \
} else { \
((PSTREAM_REC_HDR_4B)pHeader)->usStreamRecType = Type; \
((PSTREAM_REC_HDR_4B)pHeader)->ucFlags = 0; \
((PSTREAM_REC_HDR_4B)pHeader)->ucLength = (unsigned char )Len; \
memcpy(((unsigned char *)pHeader) + sizeof(STREAM_REC_HDR_4B), pData, Len); \
} \
}
#define STREAM_REC_SIZE(pHeader) \
( (((PSTREAM_REC_HDR_4B)pHeader)->ucFlags & STREAM_REC_FLAGS_LENGTH_BIT) ? \
(((PSTREAM_REC_HDR_8B)pHeader)->ulLength) : \
(((PSTREAM_REC_HDR_4B)pHeader)->ucLength))
#define STREAM_REC_TYPE(pHeader) ((pHeader->usStreamRecType & TAG_TYPE_MASK) >> 0x14)
#define STREAM_REC_ID(pHeader) (((PSTREAM_REC_HDR_4B)pHeader)->usStreamRecType)
#define STREAM_REC_HEADER_SIZE(pHeader) ( (((PSTREAM_REC_HDR_4B)pHeader)->ucFlags & STREAM_REC_FLAGS_LENGTH_BIT) ? sizeof(STREAM_REC_HDR_8B) : sizeof(STREAM_REC_HDR_4B) )
#define STREAM_REC_DATA_SIZE(pHeader) (STREAM_REC_SIZE(pHeader) - STREAM_REC_HEADER_SIZE(pHeader))
#define STREAM_REC_DATA(pHeader) ((unsigned char *)pHeader + STREAM_REC_HEADER_SIZE(pHeader))
typedef struct _TIME_STAMP_TAG { STREAM_REC_HDR_4B Header; inm_u32_t ulSequenceNumber; inm_u64_t TimeInHundNanoSecondsFromJan1601; } TIME_STAMP_TAG, *PTIME_STAMP_TAG;
typedef struct _TIME_STAMP_TAG_V2 { STREAM_REC_HDR_4B Header; STREAM_REC_HDR_4B Reserved; inm_u64_t ullSequenceNumber; inm_u64_t TimeInHundNanoSecondsFromJan1601; } TIME_STAMP_TAG_V2, *PTIME_STAMP_TAG_V2;
#define INVOLFLT_DATA_SOURCE_UNDEFINED 0x00 #define INVOLFLT_DATA_SOURCE_BITMAP 0x01 #define INVOLFLT_DATA_SOURCE_META_DATA 0x02 #define INVOLFLT_DATA_SOURCE_DATA 0x03
typedef struct _DATA_SOURCE_TAG { STREAM_REC_HDR_4B Header; inm_u32_t ulDataSource; } DATA_SOURCE_TAG, *PDATA_SOURCE_TAG;
typedef struct _VOLUME_STATS{ VOLUME_GUID guid; char *bufp; inm_u32_t buf_len; } VOLUME_STATS;
/* Light weight stats about Tags */ typedef struct _VOLUME_TAG_STATS { inm_u64_t TagsDropped; } VOLUME_TAG_STATS;
/* Light weight stats about write churn */ typedef struct _VOLUME_CHURN_STATS { inm_s64_t NumCommitedChangesInBytes; } VOLUME_CHURN_STATS;
/* User passes below flag indicating required Light weight stats */ typedef enum { GET_TAG_STATS = 1, GET_CHURN_STATS } ReqStatsType;
typedef struct _MONITORING_STATS{ VOLUME_GUID VolumeGuid; ReqStatsType ReqStat; union { VOLUME_TAG_STATS TagStats; VOLUME_CHURN_STATS ChurnStats; }; } MONITORING_STATS;
typedef struct _BLK_MQ_STATUS{ VOLUME_GUID VolumeGuid; int blk_mq_enabled; } BLK_MQ_STATUS;
typedef struct _GET_VOLUME_LIST { #ifdef INM_AIX #ifdef __64BIT__ char *bufp; #else int padding; char *bufp; #endif #else char
*bufp; #endif inm_u32_t buf_len; } GET_VOLUME_LIST; typedef struct _inm_attribute{ inm_u32_t type; inm_u32_t why; VOLUME_GUID guid; char *bufp; inm_u32_t buflen; }inm_attribute_t; #define MAX_DIRTY_CHANGES 256 #ifdef __sparcv9 #define MAX_DIRTY_CHANGES_V2 409 #else #define MAX_DIRTY_CHANGES_V2 204 #endif #define UDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE 0x00000001 #define UDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE 0x00000002 #define UDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE 0x00000004 #define UDIRTY_BLOCK_FLAG_DATA_FILE 0x00000008 #define UDIRTY_BLOCK_FLAG_SVD_STREAM 0x00000010 #define UDIRTY_BLOCK_FLAG_VOLUME_RESYNC_REQUIRED 0x80000000 #define UDIRTY_BLOCK_FLAG_TSO_FILE 0x00000020 #define COMMIT_TRANSACTION_FLAG_RESET_RESYNC_REQUIRED_FLAG 0x00000001 typedef struct _UDIRTY_BLOCK { #define UDIRTY_BLOCK_HEADER_SIZE 0x200 /* 512 Bytes */ #define UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE 0x80 /* 128 Bytes */ #define UDIRTY_BLOCK_TAGS_SIZE 0xE00 /* 7 * 512 Bytes */ /* UDIRTY_BLOCK_MAX_FILE_NAME is UDIRTY_BLOCK_TAGS_SIZE / sizeof(unsigned char ) - 1(for length field) */ #define UDIRTY_BLOCK_MAX_FILE_NAME 0x6FF /* (0xE00 /2) - 1 */ /* uHeader is .5 KB and uTags is 3.5 KB, uHeader + uTags = 4KB */ union { struct { inm_u64_t uliTransactionID; inm_u64_t ulicbChanges; inm_u32_t cChanges; inm_u32_t ulTotalChangesPending; inm_u64_t ulicbTotalChangesPending; inm_u32_t ulFlags; inm_u32_t ulSequenceIDforSplitIO; inm_u32_t ulBufferSize; unsigned short usMaxNumberOfBuffers; unsigned short usNumberOfBuffers; inm_u32_t ulcbChangesInStream; /* This is actually a pointer to memory and not an array of pointers. * It contains Changes in linear memorylocation. */ void **ppBufferArray; inm_u32_t ulcbBufferArraySize; /* resync flags */ unsigned long ulOutOfSyncCount; unsigned char ErrorStringForResync[UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE]; unsigned long ulOutOfSyncErrorCode; inm_u64_t liOutOfSyncTimeStamp; inm_u32_t ulPrevEndSequenceNumber; inm_u64_t ullPrevEndTimeStamp; inm_u32_t ulPrevSequenceIDforSplitIO; etWriteOrderState eWOState; } Hdr; unsigned char BufferReservedForHeader[UDIRTY_BLOCK_HEADER_SIZE]; } uHdr; /* Start of Markers */ union { struct { STREAM_REC_HDR_4B TagStartOfList; STREAM_REC_HDR_4B TagPadding; TIME_STAMP_TAG TagTimeStampOfFirstChange; TIME_STAMP_TAG TagTimeStampOfLastChange; DATA_SOURCE_TAG TagDataSource; STREAM_REC_HDR_4B TagEndOfList; } TagList; struct { unsigned short usLength; /* Filename length in bytes not including NULL */ unsigned char FileName[UDIRTY_BLOCK_MAX_FILE_NAME]; } DataFile; unsigned char BufferForTags[UDIRTY_BLOCK_TAGS_SIZE]; } uTagList; inm_sll64_t ChangeOffsetArray[MAX_DIRTY_CHANGES]; inm_u32_t ChangeLengthArray[MAX_DIRTY_CHANGES]; } UDIRTY_BLOCK, *PUDIRTY_BLOCK; typedef struct _UDIRTY_BLOCK_V2 { #define UDIRTY_BLOCK_HEADER_SIZE 0x200 /* 512 Bytes */ #define UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE 0x80 /* 128 Bytes */ #define UDIRTY_BLOCK_TAGS_SIZE 0xE00 /* 7 * 512 Bytes */ /* UDIRTY_BLOCK_MAX_FILE_NAME is UDIRTY_BLOCK_TAGS_SIZE / sizeof(unsigned char ) - 1(for length field) */ #define UDIRTY_BLOCK_MAX_FILE_NAME 0x6FF /* (0xE00 /2) - 1 */ /* uHeader is .5 KB and uTags is 3.5 KB, uHeader + uTags = 4KB */ union { struct { inm_u64_t uliTransactionID; inm_u64_t ulicbChanges; inm_u32_t cChanges; inm_u32_t ulTotalChangesPending; inm_u64_t ulicbTotalChangesPending; inm_u32_t ulFlags; inm_u32_t ulSequenceIDforSplitIO; inm_u32_t ulBufferSize; unsigned short usMaxNumberOfBuffers; unsigned short usNumberOfBuffers; inm_u32_t ulcbChangesInStream; /* This is actually a pointer to memory and not an array of 
pointers. * It contains Changes in linear memorylocation. */ void **ppBufferArray; inm_u32_t ulcbBufferArraySize; /* resync flags */ unsigned long ulOutOfSyncCount; unsigned char ErrorStringForResync[UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE]; unsigned long ulOutOfSyncErrorCode; inm_u64_t liOutOfSyncTimeStamp; inm_u64_t ullPrevEndSequenceNumber; inm_u64_t ullPrevEndTimeStamp; inm_u32_t ulPrevSequenceIDforSplitIO; etWriteOrderState eWOState; } Hdr; unsigned char BufferReservedForHeader[UDIRTY_BLOCK_HEADER_SIZE]; } uHdr; /* Start of Markers */ union { struct { STREAM_REC_HDR_4B TagStartOfList; STREAM_REC_HDR_4B TagPadding; TIME_STAMP_TAG_V2 TagTimeStampOfFirstChange; TIME_STAMP_TAG_V2 TagTimeStampOfLastChange; DATA_SOURCE_TAG TagDataSource; STREAM_REC_HDR_4B TagEndOfList; } TagList; struct { /* Filename length in bytes not including NULL */ unsigned short usLength; unsigned char FileName[UDIRTY_BLOCK_MAX_FILE_NAME]; } DataFile; unsigned char BufferForTags[UDIRTY_BLOCK_TAGS_SIZE]; } uTagList; inm_ull64_t ChangeOffsetArray[MAX_DIRTY_CHANGES_V2]; inm_u32_t ChangeLengthArray[MAX_DIRTY_CHANGES_V2]; inm_u32_t TimeDeltaArray[MAX_DIRTY_CHANGES_V2]; inm_u32_t SequenceNumberDeltaArray[MAX_DIRTY_CHANGES_V2]; } UDIRTY_BLOCK_V2, *PUDIRTY_BLOCK_V2; typedef struct _COMMIT_TRANSACTION { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u64_t ulTransactionID; inm_u32_t ulFlags; } COMMIT_TRANSACTION, *PCOMMIT_TRANSACTION; typedef struct _VOLUME_FLAGS_INPUT { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; // if eOperation is BitOpSet the bits in ulVolumeFlags will be set // if eOperation is BitOpReset the bits in ulVolumeFlags will be unset etBitOperation eOperation; inm_u32_t ulVolumeFlags; } VOLUME_FLAGS_INPUT, *PVOLUME_FLAGS_INPUT; typedef struct _VOLUME_FLAGS_OUTPUT { inm_u32_t ulVolumeFlags; } VOLUME_FLAGS_OUTPUT, *PVOLUME_FLAGS_OUTPUT; #define DRIVER_FLAG_DISABLE_DATA_FILES 0x00000001 #define DRIVER_FLAG_DISABLE_DATA_FILES_FOR_NEW_VOLUMES 0x00000002 #define DRIVER_FLAGS_VALID 0x00000003 typedef struct _DRIVER_FLAGS_INPUT { /* if eOperation is BitOpSet the bits in ulFlags will be set * if eOperation is BitOpReset the bits in ulFlags will be unset */ etBitOperation eOperation; inm_u32_t ulFlags; } DRIVER_FLAGS_INPUT, *PDRIVER_FLAGS_INPUT; typedef struct _DRIVER_FLAGS_OUTPUT { inm_u32_t ulFlags; } DRVER_FLAGS_OUTPUT, *PDRIVER_FLAGS_OUTPUT; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; /* Maximum time to wait in the kernel */ inm_s32_t Seconds; } WAIT_FOR_DB_NOTIFY; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u32_t threshold; } get_db_thres_t; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u64_t TimeInHundNanoSecondsFromJan1601; inm_u32_t ulSequenceNumber; inm_u32_t ulReserved; } RESYNC_START; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u64_t TimeInHundNanoSecondsFromJan1601; inm_u64_t ullSequenceNumber; } RESYNC_START_V2; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u64_t TimeInHundNanoSecondsFromJan1601; inm_u32_t ulSequenceNumber; inm_u32_t ulReserved; } RESYNC_END; typedef struct { unsigned char VolumeGUID[GUID_SIZE_IN_CHARS]; inm_u64_t TimeInHundNanoSecondsFromJan1601; inm_u64_t ullSequenceNumber; } RESYNC_END_V2; typedef struct _DRIVER_VERSION { unsigned short ulDrMajorVersion; unsigned short ulDrMinorVersion; unsigned short ulDrMinorVersion2; unsigned short ulDrMinorVersion3; unsigned short ulPrMajorVersion; unsigned short ulPrMinorVersion; unsigned short ulPrMinorVersion2; unsigned short ulPrBuildNumber; 
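/* The ulDr* members report the filter driver version (cf. the
 * DRIVER_*_VERSION constants above); the ulPr* members carry the product
 * version, presumably filled in from the agent/product build. */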
} DRIVER_VERSION, *PDRIVER_VERSION;
#define TAG_VOLUME_INPUT_FLAGS_ATOMIC_TO_VOLUME_GROUP 0x0001 #define TAG_FS_CONSISTENCY_REQUIRED 0x0002 #define TAG_ALL_PROTECTED_VOLUME_IOBARRIER 0x0004
/* Structure definition to freeze */ typedef struct volume_info { int flags; /* Fill it with 0s, will be used in future */ int status; /* Status of the volume */ char vol_name[TAG_VOLUME_MAX_LENGTH]; /* volume name */ } volume_info_t;
typedef struct freeze_info { int nr_vols; /* No. of volumes */ int timeout; /* timeout in terms of seconds */ volume_info_t *vol_info; /* array of volume_info_t object */ char tag_guid[GUID_LEN];/* one guid for the set of volumes */ } freeze_info_t;
/* Structure definition to thaw */ typedef struct thaw_info { int nr_vols; /* No. of volumes */ volume_info_t *vol_info; /* array of volume_info_t object */ char tag_guid[GUID_LEN]; } thaw_info_t;
typedef struct tag_names { unsigned short tag_len;/* tag length: header plus name */ char tag_name[TAG_MAX_LENGTH]; /* tag name */ } tag_names_t;
/* Structure definition to tag */ typedef struct tag_info { int flags;/* Fill it with 0s, will be used in future */ int nr_vols; int nr_tags; int timeout;/* time, in seconds, for which the dirty block carrying the tag is held back from draining */ char tag_guid[GUID_LEN];/* one guid for the set of volumes */ volume_info_t *vol_info; /* Array of volume_info_t object */ tag_names_t *tag_names; /* Array of tag names */ /* each of length TAG_MAX_LENGTH */ } tag_info_t_v2;
/* * IO Barriers for Crash Consistency */ typedef struct flt_barrier_create { char fbc_guid[GUID_LEN]; int fbc_timeout_ms; int fbc_flags; } flt_barrier_create_t;
typedef struct flt_barrier_remove { char fbr_guid[GUID_LEN]; int fbr_flags; } flt_barrier_remove_t;
typedef enum _TAG_COMMIT_STATUS_T { TAG_REVOKE = 0, TAG_COMMIT = 1 } TAG_COMMIT_STATUS_T;
typedef struct flt_tag_commit { char ftc_guid[GUID_LEN]; TAG_COMMIT_STATUS_T ftc_flags; } flt_tag_commit_t;
#define INM_VOL_NAME_MAP_GUID 0x1 #define INM_VOL_NAME_MAP_PNAME 0x2
typedef struct vol_pname{ int vnm_flags; char vnm_request[INM_GUID_LEN_MAX]; char vnm_response[INM_GUID_LEN_MAX]; } vol_name_map_t;
typedef struct _COMMIT_DB_FAILURE_STATS { VOLUME_GUID DeviceID; inm_u64_t ulTransactionID; inm_u64_t ullFlags; inm_u64_t ullErrorCode; } COMMIT_DB_FAILURE_STATS;
#define COMMITDB_NETWORK_FAILURE 0x00000001 #define DEFAULT_NR_CHURN_BUCKETS 11
typedef struct _DEVICE_CXFAILURE_STATS { VOLUME_GUID DeviceId; inm_u64_t ullFlags; inm_u64_t ChurnBucketsMBps[DEFAULT_NR_CHURN_BUCKETS]; inm_u64_t ExcessChurnBucketsMBps[DEFAULT_NR_CHURN_BUCKETS]; inm_u64_t CxStartTS; inm_u64_t ullMaxDiffChurnThroughputTS; inm_u64_t firstNwFailureTS; inm_u64_t lastNwFailureTS; inm_u64_t firstPeakChurnTS; inm_u64_t lastPeakChurnTS; inm_u64_t CxEndTS; inm_u64_t ullLastNWErrorCode; inm_u64_t ullMaximumPeakChurnInBytes; inm_u64_t ullDiffChurnThroughputInBytes; inm_u64_t ullMaxDiffChurnThroughputInBytes; inm_u64_t ullTotalNWErrors; inm_u64_t ullNumOfConsecutiveTagFailures; inm_u64_t ullTotalExcessChurnInBytes; inm_u64_t ullMaxS2LatencyInMS; } DEVICE_CXFAILURE_STATS, *PDEVICE_CXFAILURE_STATS;
/* Disk Level Flags */ #define DISK_CXSTATUS_NWFAILURE_FLAG 0x00000001 #define DISK_CXSTATUS_PEAKCHURN_FLAG 0x00000002 #define DISK_CXSTATUS_CHURNTHROUGHPUT_FLAG 0x00000004 #define DISK_CXSTATUS_EXCESS_CHURNBUCKET_FLAG 0x00000008 #define DISK_CXSTATUS_MAX_CHURNTHROUGHPUT_FLAG 0x00000010 #define DISK_CXSTATUS_DISK_NOT_FILTERED 0x00000020 #define DISK_CXSTATUS_DISK_REMOVED 0x00000040
typedef struct _GET_CXFAILURE_NOTIFY { inm_u64_t ullFlags; inm_u64_t ulTransactionID; inm_u64_t
ullMinConsecutiveTagFailures; inm_u64_t ullMaxVMChurnSupportedMBps; inm_u64_t ullMaxDiskChurnSupportedMBps; inm_u64_t ullMaximumTimeJumpFwdAcceptableInMs; inm_u64_t ullMaximumTimeJumpBwdAcceptableInMs; inm_u32_t ulNumberOfOutputDisks; inm_u32_t ulNumberOfProtectedDisks; VOLUME_GUID DeviceIdList[1]; } GET_CXFAILURE_NOTIFY, *PGET_CXFAILURE_NOTIFY; #define CXSTATUS_COMMIT_PREV_SESSION 0x00000001 typedef struct _VM_CXFAILURE_STATS { inm_u64_t ullFlags; inm_u64_t ulTransactionID; inm_u64_t ChurnBucketsMBps[DEFAULT_NR_CHURN_BUCKETS]; inm_u64_t ExcessChurnBucketsMBps[DEFAULT_NR_CHURN_BUCKETS]; inm_u64_t CxStartTS; inm_u64_t ullMaxChurnThroughputTS; inm_u64_t firstPeakChurnTS; inm_u64_t lastPeakChurnTS; inm_u64_t CxEndTS; inm_u64_t ullMaximumPeakChurnInBytes; inm_u64_t ullDiffChurnThroughputInBytes; inm_u64_t ullMaxDiffChurnThroughputInBytes; inm_u64_t ullTotalExcessChurnInBytes; inm_u64_t TimeJumpTS; inm_u64_t ullTimeJumpInMS; inm_u64_t ullNumOfConsecutiveTagFailures; inm_u64_t ullMaxS2LatencyInMS; inm_u32_t ullNumDisks; DEVICE_CXFAILURE_STATS DeviceCxStats[1]; } VM_CXFAILURE_STATS, *PVM_CXFAILURE_STATS; /* VM Level Flags */ #define VM_CXSTATUS_PEAKCHURN_FLAG 0x00000001 #define VM_CXSTATUS_CHURNTHROUGHPUT_FLAG 0x00000002 #define VM_CXSTATUS_TIMEJUMP_FWD_FLAG 0x00000004 #define VM_CXSTATUS_TIMEJUMP_BCKWD_FLAG 0x00000008 #define VM_CXSTATUS_EXCESS_CHURNBUCKETS_FLAG 0x00000010 #define VM_CXSTATUS_MAX_CHURNTHROUGHPUT_FLAG 0x00000020 /* IOCTL codes for involflt driver in linux. */ #define FLT_IOCTL 0xfe enum { STOP_FILTER_CMD = 0, START_FILTER_CMD, START_NOTIFY_CMD, SHUTDOWN_NOTIFY_CMD, GET_DB_CMD, COMMIT_DB_CMD, SET_VOL_FLAGS_CMD, GET_VOL_FLAGS_CMD, WAIT_FOR_DB_CMD, CLEAR_DIFFS_CMD, GET_TIME_CMD, UNSTACK_ALL_CMD, SYS_SHUTDOWN_NOTIFY_CMD, TAG_CMD, WAKEUP_THREADS_CMD, GET_DB_THRESHOLD_CMD, VOLUME_STACKING_CMD, RESYNC_START_CMD, RESYNC_END_CMD, GET_DRIVER_VER_CMD, GET_SHELL_LOG_CMD, AT_LUN_CREATE_CMD, AT_LUN_DELETE_CMD, AT_LUN_LAST_WRITE_VI_CMD, AT_LUN_QUERY_CMD, GET_GLOBAL_STATS_CMD, GET_VOLUME_STATS_CMD, GET_PROTECTED_VOLUME_LIST_CMD, GET_SET_ATTR_CMD, BOOTTIME_STACKING_CMD, VOLUME_UNSTACKING_CMD, START_FILTER_CMD_V2, SYNC_TAG_CMD, SYNC_TAG_STATUS_CMD, START_MIRROR_CMD, STOP_MIRROR_CMD, MIRROR_VOLUME_STACKING_CMD, MIRROR_EXCEPTION_NOTIFY_CMD, AT_LUN_LAST_HOST_IO_TIMESTAMP_CMD, GET_DMESG_CMD, BLOCK_AT_LUN_CMD, BLOCK_AT_LUN_ACCESS_CMD, MAX_XFER_SZ_CMD, GET_ADDITIONAL_VOLUME_STATS_CMD, GET_VOLUME_LATENCY_STATS_CMD, GET_VOLUME_BMAP_STATS_CMD, SET_INVOLFLT_VERBOSITY_CMD, MIRROR_TEST_HEARTBEAT_CMD, INIT_DRIVER_FULLY_CMD, VOLUME_FREEZE_CMD, TAG_CMD_V2, VOLUME_THAW_CMD, TAG_CMD_V3, CREATE_BARRIER, REMOVE_BARRIER, TAG_COMMIT_V2, SYS_PRE_SHUTDOWN_NOTIFY_CMD, GET_MONITORING_STATS_CMD, GET_BLK_MQ_STATUS_CMD, GET_VOLUME_STATS_V2_CMD, REPLICATION_STATE, NAME_MAPPING, COMMITDB_FAIL_TRANS, GET_CXSTATS_NOTIFY, WAKEUP_GET_CXSTATS_NOTIFY_THREAD, LCW, WAIT_FOR_DB_CMD_V2, TAG_DRAIN_NOTIFY, WAKEUP_TAG_DRAIN_NOTIFY_THREAD, MODIFY_PERSISTENT_DEVICE_NAME, GET_DRAIN_STATE_CMD, SET_DRAIN_STATE_CMD, GET_DB_CMD_V2 }; /* Error numbers to report out of sync */ #define ERROR_TO_REG_BITMAP_READ_ERROR 0x0002 #define ERROR_TO_REG_BITMAP_WRITE_ERROR 0x0003 #define ERROR_TO_REG_BITMAP_OPEN_ERROR 0x0004 #define ERROR_TO_REG_BITMAP_OPEN_FAIL_CHANGES_LOST (0x0005) #define ERROR_TO_REG_OUT_OF_BOUND_IO 0x0006 #define ERROR_TO_REG_INVALID_IO 0x0007 #define ERROR_TO_REG_DESCRIPTION_IN_EVENT_LOG 0x0008 #define ERROR_TO_REG_NO_MEM_FOR_WORK_QUEUE_ITEM 0x0009 #define ERROR_TO_REG_WRITE_TO_CNT_IOC_PATH 0x000a #define ERROR_TO_REG_VENDOR_CDB_ERR 
0x000b #define RESYNC_DUE_TO_ERR_INJECTION 0x000c #define ERROR_TO_REG_AT_PATHS_FAILURE 0x000d #define ERROR_TO_REG_BAD_AT_DEVICE_LIST 0x000e #define ERROR_TO_REG_FAILED_TO_ALLOC_BIOINFO 0x000f #define ERROR_TO_REG_NEW_SOURCE_PATH_ADDED 0x0010 #define ERROR_TO_REG_UNCLEAN_SYS_SHUTDOWN 0x0011 #define ERROR_TO_REG_PTIO_CANCEL_FAILED 0x0012 #define ERROR_TO_REG_OOD_ISSUE 0x0013 #define ERROR_TO_REG_IO_SIZE_64MB_METADATA 0x0014 #define ERROR_TO_REG_BITMAP_DEVOBJ_NOT_FOUND 0x0015 #define ERROR_TO_REG_LEARN_PHYSICAL_IO_FAILURE 0x0016 #define ERROR_TO_REG_PRESHUTDOWN_BITMAP_FLUSH_FAILURE 0x0017 #define ERROR_TO_REG_UNCLEAN_SYS_BOOT 0x0018 #define ERROR_TO_REG_UNSUPPORTED_IO 0x0019 #define ERROR_TO_REG_MAX_ERROR ERROR_TO_REG_UNCLEAN_SYS_BOOT #define EXTRA_PROTECTED_VOLUME 4096 * 10 #define INITIAL_BUFSZ_FOR_VOL_LIST 4096 * 500 /* Macros to denote clean/unclean shutdown */ #define UNCLEAN_SHUTDOWN 0 #define CLEAN_SHUTDOWN 1 #ifdef INM_LINUX #include #if (defined(redhat) && (DISTRO_VER==5) && (UPDATE>=4)) #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) typedef struct address_space_operations inm_address_space_operations_t; #else typedef struct address_space_operations_ext inm_address_space_operations_t; #endif #else #ifdef INM_RECUSIVE_ADSPC typedef struct address_space inm_address_space_operations_t; #else typedef struct address_space_operations inm_address_space_operations_t; #endif #endif #endif enum TagStatus { STATUS_PENDING = 1, /* Tag is pending */ STATUS_COMMITED, /* Tag is commited by drainer */ STATUS_DELETED, /* Tag is deleted due to stop filtering or clear diffs */ STATUS_DROPPED, /* Tag is dropped due to write in bitmap file */ STATUS_FAILURE, /* Some error occured while adding tag */ }; typedef struct _inm_resync_notify_info { inm_u64_t rsin_out_of_sync_count; inm_u64_t rsin_resync_err_code; inm_u64_t rsin_out_of_sync_time_stamp; inm_u64_t rsin_flag; inm_u64_t rsin_out_of_sync_err_status; inm_s32_t timeout_in_sec; char rsin_src_scsi_id[INM_MAX_SCSI_ID_SIZE]; char rsin_err_string_resync[UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE]; eMirrorConfErrors_t rstatus; } inm_resync_notify_info_t; #define INM_SET_RESYNC_REQ_FLAG 0x1 #define INM_RESET_RESYNC_REQ_FLAG 0x2 /* driver states */ #define INM_ALLOW_UNLOAD 0x00000001 #define INM_PREVENT_UNLOAD 0x00000002 #define INM_FAILED_UNLOAD 0x00000004 #define INM_ALL_FREE 255 /* Wait in seconds to drain outstanding IOs */ #define INM_WAIT_UNLOAD 10 #define ADD_AT_LUN_GLOBAL_LIST 1 #define DEL_AT_LUN_GLOBAL_LIST 2 typedef struct _VOLUME_STATS_ADDITIONAL_INFO { VOLUME_GUID VolumeGuid; inm_u64_t ullTotalChangesPending; inm_u64_t ullOldestChangeTimeStamp; inm_u64_t ullDriverCurrentTimeStamp; }VOLUME_STATS_ADDITIONAL_INFO, *PVOLUME_STATS_ADDITIONAL_INFO; /* structure for latency distribution */ #define INM_LATENCY_DIST_BKT_CAPACITY 12 #define INM_LATENCY_LOG_CAPACITY 12 typedef struct __VOLUME_LATENCY_STATS { VOLUME_GUID VolumeGuid; inm_u64_t s2dbret_bkts[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t s2dbret_freq[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t s2dbret_nr_avail_bkts; inm_u64_t s2dbret_log_buf[INM_LATENCY_LOG_CAPACITY]; inm_u64_t s2dbret_log_min; inm_u64_t s2dbret_log_max; inm_u64_t s2dbwait_notify_bkts[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t s2dbwait_notify_freq[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t s2dbwait_notify_nr_avail_bkts; inm_u64_t s2dbwait_notify_log_buf[INM_LATENCY_LOG_CAPACITY]; inm_u64_t s2dbwait_notify_log_min; inm_u64_t s2dbwait_notify_log_max; inm_u64_t s2dbcommit_bkts[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t 
s2dbcommit_freq[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t s2dbcommit_nr_avail_bkts; inm_u64_t s2dbcommit_log_buf[INM_LATENCY_LOG_CAPACITY]; inm_u64_t s2dbcommit_log_min; inm_u64_t s2dbcommit_log_max; } VOLUME_LATENCY_STATS; typedef struct __VOLUME_BMAP_STATS { VOLUME_GUID VolumeGuid; inm_u64_t bmap_gran; inm_u64_t bmap_data_sz; inm_u32_t nr_dbs; } VOLUME_BMAP_STATS; #define INM_DEBUG_ONLY 0x1 #define INM_IDEBUG 0x2 #define INM_IDEBUG_BMAP 0x4 #define INM_IDEBUG_MIRROR 0x8 #define INM_IDEBUG_MIRROR_IO 0x10 #define INM_IDEBUG_META 0x20 #define INM_IDEBUG_REF 0x40 #define INM_IDEBUG_IO 0x80 #define IS_DBG_ENABLED(verbosity, flag) ((verbosity & flag) == flag) #ifndef FLT_VERBOSITY #define FLT_VERBOSITY extern inm_u32_t inm_verbosity; #endif typedef struct _inm_user_max_xfer_sz{ inm_u32_t mxs_flag; char mxs_devname[INM_GUID_LEN_MAX]; inm_u32_t mxs_max_xfer_sz; }inm_user_max_xfer_sz_t; enum common_prams_idx { DataPoolSize, DefaultLogDirectory, FreeThresholdForFileWrite, CommonVolumeThresholdForFileWrite, DirtyBlockHighWaterMarkServiceNotStarted, DirtyBlockLowWaterMarkServiceRunning, DirtyBlockHighWaterMarkServiceRunning, DirtyBlockHighWaterMarkServiceShutdown, DirtyBlocksToPurgeWhenHighWaterMarkIsReached, MaximumBitmapBufferMemory, Bitmap512KGranularitySize, CommonVolumeDataFiltering, CommonVolumeDataFilteringForNewVolumes, CommonVolumeDataFiles, CommonVolumeDataFilesForNewVolumes, CommonVolumeDataToDiskLimitInMB, CommonVolumeDataNotifyLimit, SequenceNumber, MaxDataSizeForDataModeDirtyBlock, CommonVolumeResDataPoolSize, MaxDataPoolSize, CleanShutdown, MaxCoalescedMetaDataChangeSize, PercentChangeDataPoolSize, TimeReorgDataPoolSec, TimeReorgDataPoolFactor, VacpIObarrierTimeout, FsFreezeTimeout, VacpAppTagCommitTimeout }; enum volume_params_idx { VolumeFilteringDisabled, VolumeBitmapReadDisabled, VolumeBitmapWriteDisabled, VolumeDataFiltering, VolumeDataFiles, VolumeDataToDiskLimitInMB, VolumeDataNotifyLimitInKB, VolumeDataLogDirectory, VolumeBitmapGranularity, VolumeResyncRequired, VolumeOutOfSyncErrorCode, VolumeOutOfSyncErrorStatus, VolumeOutOfSyncCount, VolumeOutOfSyncTimestamp, VolumeOutOfSyncErrorDescription, VolumeFilterDevType, VolumeNblks, VolumeBsize, VolumeResDataPoolSize, VolumeMountPoint, VolumePrevEndTimeStamp, VolumePrevEndSequenceNumber, VolumePrevSequenceIDforSplitIO, VolumePTPath, VolumeATDirectRead, VolumeMirrorSourceList, VolumeMirrorDestinationList, VolumeMirrorDestinationScsiID, VolumeDiskFlags, VolumeIsDeviceMultipath, VolumeDeviceVendor, VolumeDevStartOff, VolumePTpathList, VolumePerfOptimization, VolumeMaxXferSz, VolumeRpoTimeStamp, VolumeDrainBlocked }; #define GET_ATTR 1 #define SET_ATTR 2 #define GET_SET_ATTR_BUF_LEN (0x2000) #define REPLICATION_STATE_DIFF_SYNC_THROTTLED 0x1 #define REPLICATION_STATES_SUPPORTED (REPLICATION_STATE_DIFF_SYNC_THROTTLED) typedef struct _REPLICATION_STATE{ VOLUME_GUID DeviceId; inm_u64_t ulFlags; inm_u64_t Timestamp; char Data[1]; }replication_state_t; typedef enum _svc_state { SERVICE_UNITIALIZED = 0x00, SERVICE_NOTSTARTED = 0x01, SERVICE_RUNNING = 0x02, SERVICE_SHUTDOWN = 0x03, MAX_SERVICE_STATES = 0x04, } svc_state_t; typedef enum _etDriverMode { UninitializedMode = 0, NoRebootMode, RebootMode } etDriverMode; #define VOLUME_STATS_DATA_MAJOR_VERSION 0x0003 #define VOLUME_STATS_DATA_MINOR_VERSION 0x0000 typedef struct _VOLUME_STATS_DATA { unsigned short usMajorVersion; unsigned short usMinorVersion; unsigned long ulVolumesReturned; unsigned long ulNonPagedMemoryLimitInMB; unsigned long LockedDataBlockCounter; unsigned long ulTotalVolumes; 
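/* eServiceState below takes the svc_state_t values defined above;
 * LastShutdownMarker presumably records the CLEAN_SHUTDOWN/UNCLEAN_SHUTDOWN
 * markers defined earlier in this header. */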
unsigned short ulNumProtectedDisk; svc_state_t eServiceState; etDriverMode eDiskFilterMode; char LastShutdownMarker; int PersistentRegistryCreated; unsigned long ulDriverFlags; long ulCommonBootCounter; unsigned long long ullDataPoolSizeAllocated; unsigned long long ullPersistedTimeStampAfterBoot; unsigned long long ullPersistedSequenceNumberAfterBoot; } VOLUME_STATS_DATA; typedef struct _LARGE_INTEGER { long long QuadPart; } LARGE_INTEGER; typedef struct _ULARGE_INTEGER { unsigned long long QuadPart; } ULARGE_INTEGER; typedef struct _VOLUME_STATS_V2 { char VolumeGUID[GUID_SIZE_IN_CHARS]; unsigned long long ullDataPoolSize; LARGE_INTEGER liDriverLoadTime; long long llTimeJumpDetectedTS; long long llTimeJumpedTS; LARGE_INTEGER liLastS2StartTime; LARGE_INTEGER liLastS2StopTime; LARGE_INTEGER liLastAgentStartTime; LARGE_INTEGER liLastAgentStopTime; LARGE_INTEGER liLastTagReq; LARGE_INTEGER liStopFilteringAllTimeStamp; /* per disk stats */ unsigned long long ullTotalTrackedBytes; ULARGE_INTEGER ulVolumeSize; unsigned long ulVolumeFlags; LARGE_INTEGER liVolumeContextCreationTS; LARGE_INTEGER liStartFilteringTimeStamp; LARGE_INTEGER liStartFilteringTimeStampByUser; LARGE_INTEGER liStopFilteringTimeStamp; LARGE_INTEGER liStopFilteringTimestampByUser; LARGE_INTEGER liClearDiffsTimeStamp; LARGE_INTEGER liCommitDBTimeStamp; LARGE_INTEGER liGetDBTimeStamp; } VOLUME_STATS_V2; typedef struct _TELEMETRY_VOL_STATS { VOLUME_STATS_DATA drv_stats; VOLUME_STATS_V2 vol_stats; } TELEMETRY_VOL_STATS; enum LCW_OP { LCW_OP_NONE, LCW_OP_BMAP_MAP_FILE, LCW_OP_BMAP_SWITCH_RAWIO, LCW_OP_BMAP_CLOSE, LCW_OP_BMAP_OPEN, LCW_OP_MAP_FILE, LCW_OP_MAX }; typedef struct lcw_op { enum LCW_OP lo_op; VOLUME_GUID lo_name; } lcw_op_t; typedef enum { TAG_STATUS_UNINITALIZED, TAG_STATUS_INPUT_VERIFIED, TAG_STATUS_TAG_REQUEST_NOT_RECEIVED, TAG_STATUS_INSERTED, TAG_STATUS_INSERTION_FAILED, TAG_STATUS_DROPPED, TAG_STATUS_COMMITTED, TAG_STATUS_UNKNOWN }TAG_DEVICE_COMMIT_STATUS; typedef enum { DEVICE_STATUS_SUCCESS = 0, DEVICE_STATUS_NON_WRITE_ORDER_STATE, DEVICE_STATUS_FILTERING_STOPPED, DEVICE_STATUS_REMOVED, DEVICE_STATUS_DISKID_CONFLICT, DEVICE_STATUS_NOT_FOUND, DEVICE_STATUS_DRAIN_BLOCK_FAILED, DEVICE_STATUS_DRAIN_ALREADY_BLOCKED, DEVICE_STATUS_UNKNOWN }DEVICE_STATUS; typedef struct _TAG_COMMIT_STATUS { VOLUME_GUID DeviceId; DEVICE_STATUS Status; TAG_DEVICE_COMMIT_STATUS TagStatus; inm_u64_t TagInsertionTime; inm_u64_t TagSequenceNumber; DEVICE_CXFAILURE_STATS DeviceCxStats; } TAG_COMMIT_STATUS, *PTAG_COMMIT_STATUS; #define TAG_COMMIT_NOTIFY_BLOCK_DRAIN_FLAG 0x00000001 typedef struct _TAG_COMMIT_NOTIFY_INPUT { char TagGUID[GUID_LEN]; inm_u64_t ulFlags; inm_u64_t ulNumDisks; VOLUME_GUID DeviceId[1]; } TAG_COMMIT_NOTIFY_INPUT, *PTAG_COMMIT_NOTIFY_INPUT; typedef struct _TAG_COMMIT_NOTIFY_OUTPUT{ char TagGUID[GUID_LEN]; inm_u64_t ulFlags; VM_CXFAILURE_STATS vmCxStatus; inm_u64_t ulNumDisks; TAG_COMMIT_STATUS TagStatus[1]; } TAG_COMMIT_NOTIFY_OUTPUT, *PTAG_COMMIT_NOTIFY_OUTPUT; typedef struct _SET_DRAIN_STATE_INPUT { inm_u64_t ulFlags; inm_u64_t ulNumDisks; VOLUME_GUID DeviceId[1]; } SET_DRAIN_STATE_INPUT, *PSET_DRAIN_STATE_INPUT; typedef enum { SET_DRAIN_STATUS_SUCCESS = 0, SET_DRAIN_STATUS_DEVICE_NOT_FOUND, SET_DRAIN_STATUS_PERSISTENCE_FAILED, SET_DRAIN_STATUS_UNKNOWN } ERROR_SET_DRAIN_STATUS; typedef struct _SET_DRAIN_STATUS { VOLUME_GUID DeviceId; inm_u64_t ulFlags; ERROR_SET_DRAIN_STATUS Status; inm_u64_t ulInternalError; } SET_DRAIN_STATUS, *PSET_DRAIN_STATUS; typedef struct _SET_DRAIN_STATE_OUTPUT { inm_u64_t ulFlags; inm_u64_t 
ulNumDisks; SET_DRAIN_STATUS diskStatus[1]; } SET_DRAIN_STATE_OUTPUT, *PSET_DRAIN_STATE_OUTPUT; typedef struct _GET_DISK_STATE_INPUT { inm_u64_t ulNumDisks; VOLUME_GUID DeviceId[1]; } GET_DISK_STATE_INPUT, *PGET_DISK_STATE_INPUT; typedef struct _DISK_STATE { VOLUME_GUID DeviceId; inm_u64_t Status; inm_u64_t ulFlags; } DISK_STATE; #define DISK_STATE_FILTERED 0x00000001 #define DISK_STATE_DRAIN_BLOCKED 0x00000002 typedef struct _GET_DISK_STATE_OUTPUT { inm_u64_t ulSupportedFlags; inm_u64_t ulNumDisks; DISK_STATE diskState[1]; } GET_DISK_STATE_OUTPUT, *PGET_DISK_STATE_OUTPUT; typedef struct _MODIFY_PERSISTENT_DEVICE_NAME_INPUT { VOLUME_GUID DevName; VOLUME_GUID OldPName; VOLUME_GUID NewPName; } MODIFY_PERSISTENT_DEVICE_NAME_INPUT, *PMODIFY_PERSISTENT_DEVICE_NAME_INPUT; #endif /* ifndef INVOLFLT_H */ involflt-0.1.0/src/VBitmap.c0000755000000000000000000021323414467303177014364 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "metadata-mode.h" #include "db_routines.h" #include "file-io.h" #include "tunable_params.h" #include "telemetry.h" extern driver_context_t *driver_ctx; #define update_target_context_stats(vcptr) \ do{ \ vcptr->tc_bp->num_changes_queued_for_writing += vcptr->tc_pending_changes; \ vcptr->tc_bp->num_of_times_bitmap_written++; \ vcptr->tc_bp->num_byte_changes_queued_for_writing += vcptr->tc_bytes_pending_changes; \ vcptr->tc_pending_changes = 0; \ vcptr->tc_pending_md_changes = 0; \ vcptr->tc_bytes_pending_md_changes = 0; \ vcptr->tc_bytes_pending_changes = 0; \ vcptr->tc_bytes_pending_changes = 0; \ \ vcptr->tc_pending_wostate_data_changes = 0; \ vcptr->tc_pending_wostate_md_changes = 0; \ vcptr->tc_pending_wostate_bm_changes = 0; \ vcptr->tc_pending_wostate_rbm_changes = 0; \ }while(0) volume_bitmap_t *open_bitmap_file(target_context_t *vcptr, inm_s32_t *status) { volume_bitmap_t *vol_bitmap = NULL; bitmap_api_t *bmap_api = NULL; inm_u64_t bitmap_granularity = 0; inm_u64_t bitmap_granularity_bmfile = 0; inm_s32_t vol_in_sync = TRUE; inm_s32_t inmage_open_status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) return NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered - target volume %p (%s)\n", vcptr, vcptr->tc_guid); } if (is_rootfs_ro()) { err("Root filesystem is RO. 
Retry bitmap open later.."); return NULL; } vol_bitmap = allocate_volume_bitmap(); if (!vol_bitmap) { err("allocation of vol_bitmap failed \n"); return NULL; } /* We are storing volume size in target context * during creation of target context */ bitmap_granularity_bmfile = get_bmfile_granularity(vcptr); *status = get_volume_bitmap_granularity(vcptr, &bitmap_granularity); if (bitmap_granularity_bmfile && bitmap_granularity_bmfile != bitmap_granularity && bitmap_granularity_bmfile % INM_SECTOR_SIZE == 0) { bitmap_granularity = bitmap_granularity_bmfile; info("Upgrade for %s: retaining older bitmap granularity %llu", vcptr->tc_bp->bitmap_file_name, bitmap_granularity_bmfile); } if (*status != 0 || !bitmap_granularity) goto cleanup_and_return_error; vol_bitmap->segment_cache_limit = MAX_BITMAP_SEGMENT_BUFFERS; bmap_api = bitmap_api_ctr(); if (bmap_api) { inmage_open_status = 0; vol_bitmap->bitmap_api = bmap_api; INM_DOWN(&vcptr->tc_sem); *status = bitmap_api_open(bmap_api, vcptr, bitmap_granularity, vcptr->tc_bp->bitmap_offset, inm_dev_size_get(vcptr), vcptr->tc_guid, vol_bitmap->segment_cache_limit, &inmage_open_status); INM_UP(&vcptr->tc_sem); if (*status == 0) { /* if success */ if (vcptr->tc_bp->num_bitmap_open_errors) log_bitmap_open_success_event(vcptr); if (inmage_open_status != 0) { // DS allocated, hdr may/not contain data dbg("bitmap data structures allocated err = %x\n", inmage_open_status); } } else { if (inmage_open_status != 0) { info("Error %x in opening bitmap file for volume %s\n", inmage_open_status, vcptr->tc_guid); /* Logging error into InMageFltLogError() */ } goto cleanup_and_return_error; } if (bmap_api->io_bitmap_header) { *status = is_volume_in_sync(bmap_api, &vol_in_sync, &inmage_open_status); if (*status != 0) { /* 0 means success */ err("bitmap file open failed"); goto cleanup_and_return_error; } if (vol_in_sync == FALSE) { /* volume is not in sync */ info("resync triggered"); set_volume_out_of_sync(vcptr, inmage_open_status, inmage_open_status); /* CDataFile::DeleteDataFilesInDirectory() */ bmap_api->volume_insync = TRUE; } } } else { err("failed to allocate bitmap_api"); goto cleanup_and_return_error; } if (bmap_api->fs) { /* unset ignore bitmap creation flag. 
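 * A file-backed bitmap was successfully opened (bmap_api->fs is set), so
 * bitmap file creation no longer needs to be suppressed for this volume.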
*/ volume_lock(vcptr); vcptr->tc_flags &= ~VCF_IGNORE_BITMAP_CREATION; volume_unlock(vcptr); vol_bitmap->eVBitmapState = ecVBitmapStateOpened; set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateBitmap, FALSE, ecWOSChangeReasonUnInitialized); } else { vol_bitmap->eVBitmapState = ecVBitmapStateClosed; set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateRawBitmap, FALSE, ecWOSChangeReasonUnInitialized); } vcptr->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; /* reference the volume context */ get_tgt_ctxt(vcptr); vol_bitmap->volume_context = vcptr; /* vol_bitmap->volume_GUID is only used as string in logging messages */ memcpy_s(vol_bitmap->volume_GUID, sizeof(vol_bitmap->volume_GUID), vcptr->tc_guid, sizeof(vcptr->tc_guid)); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return vol_bitmap; cleanup_and_return_error: if (vol_bitmap->bitmap_api) { bitmap_api_dtr(vol_bitmap->bitmap_api); vol_bitmap->bitmap_api = NULL; bmap_api = NULL; } if (vol_bitmap) { put_volume_bitmap(vol_bitmap); vol_bitmap = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return NULL; } /* CloseBitmapFile */ /** * FUNCTION NAME: close_bitmap_file * * DESCRIPTION : This function is a wrapper function that closes the bitmapfile * and deallocates some of the bitmap data structures * On success, it frees bitmap_api, volume bitmap ptrs * INPUT PARAMETERS : vbmap -> ptr to volume bitmap * clear_bitmap -> flag to indicate whether to clear bits * in bitmap or not * * * OUTPUT PARAMETERS : * NOTES * * return value : closes the bitmap file - for success * returns without successful operation- for invalid inputs * **/ void close_bitmap_file(volume_bitmap_t *vbmap, inm_s32_t clear_bitmap) { inm_s32_t wait_for_notification = FALSE; inm_s32_t set_bits_work_item_list_empty = FALSE; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vbmap) return; //-EINVAL; /* * If clear_bitmap is true, set the skip_writes flag so * that any bitmap writes queued turn to no-ops as we * will eventually clear all the bits. Since this is an * advisory flag and ONLY SET HERE, no need for locking. 
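 * A writer that races with this store at worst issues one more real
 * bitmap write, which the clear-all-bits pass below renders moot.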
*/ INM_DOWN(&vbmap->sem); if (clear_bitmap) vbmap->bitmap_skip_writes = 1; INM_UP(&vbmap->sem); do { if (wait_for_notification) { INM_WAIT_FOR_COMPLETION(&vbmap->set_bits_work_item_list_empty_notification); dbg("waiting on set_bits_work_item_list_empty_notification"); dbg("for volume (%s)", vbmap->volume_GUID); wait_for_notification = FALSE; } INM_DOWN(&vbmap->sem); INM_SPIN_LOCK_IRQSAVE(&vbmap->lock, lock_flag); set_bits_work_item_list_empty = inm_list_empty(&vbmap->set_bits_work_item_list); INM_SPIN_UNLOCK_IRQRESTORE(&vbmap->lock, lock_flag); if (set_bits_work_item_list_empty) { /* set the state of the bitmap to closed */ vbmap->eVBitmapState = ecVBitmapStateClosed; if (vbmap->bitmap_api) { inm_s32_t _rc = 0; if (clear_bitmap) { vbmap->bitmap_skip_writes = 0; INM_UP(&vbmap->sem); bitmap_api_clear_all_bits(vbmap->bitmap_api); INM_DOWN(&vbmap->sem); } INM_UP(&vbmap->sem); if (bitmap_api_close(vbmap->bitmap_api, &_rc)) { target_context_t *vcp = vbmap->volume_context; if (vcp && vcp->tc_pending_changes) { set_volume_out_of_sync(vcp, ERROR_TO_REG_BITMAP_OPEN_FAIL_CHANGES_LOST, 0); } } INM_DOWN(&vbmap->sem); bitmap_api_dtr(vbmap->bitmap_api); vbmap->bitmap_api = NULL; } if (vbmap->volume_context) { put_tgt_ctxt(vbmap->volume_context); vbmap->volume_context = NULL; } } else { vbmap->flags |= VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION; wait_for_notification = TRUE; } INM_UP(&vbmap->sem); } while (wait_for_notification); put_volume_bitmap(vbmap); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; }
/** * FUNCTION NAME: wake_service_thread * * DESCRIPTION : This function wakes up the service thread to open the * bitmap file, especially when the driver mode switches from * metadata mode to bitmap mode.
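 * Note: the dc_lock_acquired argument is accepted for call-site symmetry
 * but is not referenced by the current implementation.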
* * INPUT PARAMETERS : dc_lock_acquired - flag (TRUE/FALSE) indicating whether * the driver_ctx lock is already held * * OUTPUT PARAMETERS : none * NOTES * * return value : none * **/ void wake_service_thread(inm_s32_t dc_lock_acquired) { if(IS_DBG_ENABLED(inm_verbosity, ((INM_IDEBUG | INM_IDEBUG_META) | INM_IDEBUG_BMAP))){ info("entered"); } INM_ATOMIC_INC(&driver_ctx->service_thread.wakeup_event_raised); INM_WAKEUP_INTERRUPTIBLE(&driver_ctx->service_thread.wakeup_event); INM_COMPLETE(&driver_ctx->service_thread._new_event_completion); dbg("waking up service thread\n"); if(IS_DBG_ENABLED(inm_verbosity, ((INM_IDEBUG | INM_IDEBUG_META) | INM_IDEBUG_BMAP))){ info("leaving"); } return; }
void request_service_thread_to_open_bitmap(target_context_t *vcptr) { inm_s32_t wakeup_service_thread = FALSE; if(IS_DBG_ENABLED(inm_verbosity, ((INM_IDEBUG | INM_IDEBUG_META) | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr) return; /* acquire vcptr->lock : if it is not acquired in involflt_completion */ if(!(vcptr->tc_flags & VCF_OPEN_BITMAP_REQUESTED)) { vcptr->tc_flags |= VCF_OPEN_BITMAP_REQUESTED; wakeup_service_thread = TRUE; } /* release vcptr->lock */ if (wakeup_service_thread) wake_service_thread(FALSE); if(IS_DBG_ENABLED(inm_verbosity, ((INM_IDEBUG | INM_IDEBUG_META) | INM_IDEBUG_BMAP))){ info("leaving"); } return; }
void bitmap_write_worker_routine(wqentry_t *wqe) { bitmap_work_item_t *bmap_witem = NULL; volume_bitmap_t *vbmap = NULL; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } bmap_witem = (bitmap_work_item_t *) wqe->context; put_work_queue_entry(wqe); if (!bmap_witem) { info("bmap witem is null"); return; } vbmap = bmap_witem->volume_bitmap; if (!vbmap) { info("volume bitmap is null"); return; } get_volume_bitmap(vbmap); INM_DOWN(&vbmap->sem); if (!vbmap->bitmap_api || vbmap->eVBitmapState == ecVBitmapStateClosed) { info("failed state %s, bapi = %p, for volume %s", get_volume_bitmap_state_string(vbmap->eVBitmapState), vbmap->bitmap_api, vbmap->volume_GUID); if (bmap_witem->eBitmapWorkItem == ecBitmapWorkItemSetBits) { INM_SPIN_LOCK_IRQSAVE(&vbmap->lock, lock_flag); inm_list_del(&bmap_witem->list_entry); if ((vbmap->flags & VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION) && inm_list_empty(&vbmap->set_bits_work_item_list)) { vbmap->flags &= ~VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION; INM_COMPLETE(&vbmap->set_bits_work_item_list_empty_notification); } INM_SPIN_UNLOCK_IRQRESTORE(&vbmap->lock, lock_flag); } else { /* TODO: optimize - inm_list_del() is called from both the if and * the else branch; note that, unlike the branch above, this path * does not hold vbmap->lock around the delete. */ inm_list_del(&bmap_witem->list_entry); } volume_lock(vbmap->volume_context); vbmap->volume_context->tc_flags &= ~VCF_VOLUME_IN_BMAP_WRITE; volume_unlock(vbmap->volume_context); put_bitmap_work_item(bmap_witem); } else { bmap_witem->bit_runs.context1 = bmap_witem; bmap_witem->bit_runs.completion_callback = write_bitmap_completion_callback; INM_UP(&vbmap->sem); bitmap_api_setbits(vbmap->bitmap_api, &bmap_witem->bit_runs, vbmap); INM_DOWN(&vbmap->sem); } INM_UP(&vbmap->sem); put_volume_bitmap(vbmap); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; }
/** * FUNCTION NAME: get_volume_bitmap_granularity * * DESCRIPTION : This function returns the default bitmap granularity for the * volume, which is derived from the volume size.
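 * (The granularity persisted in a pre-existing bitmap file is handled by
 * the caller: see open_bitmap_file(), which retains the on-disk value when
 * it differs from this computed default.)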
* * INPUT PARAMETERS : vcptr - ptr to target context * bitmap_granularity - ptr to unsigned long * * * * OUTPUT PARAMETERS : bitmap_granularity - ptr to unsigned long * NOTES * * return value : 0 - for success * -EINVAL - for invalid inputs * **/ inm_s32_t get_volume_bitmap_granularity(target_context_t *vcptr, inm_u64_t *bitmap_granularity) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vcptr || !bitmap_granularity || !inm_dev_size_get(vcptr)) return -EINVAL; *bitmap_granularity = default_granularity_from_volume_size(inm_dev_size_get(vcptr)); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with status = %d", status); } return status; } volume_bitmap_t * allocate_volume_bitmap(void) { volume_bitmap_t *vol_bitmap = NULL; vol_bitmap = (volume_bitmap_t *)INM_KMALLOC(sizeof(volume_bitmap_t), INM_KM_SLEEP, INM_PINNED_HEAP); if (!vol_bitmap) return NULL; INM_MEM_ZERO(vol_bitmap, sizeof(volume_bitmap_t)); INM_ATOMIC_SET(&vol_bitmap->refcnt, 1); vol_bitmap->eVBitmapState = ecVBitmapStateUnInitialized; INM_INIT_LIST_HEAD(&vol_bitmap->list_entry); INM_INIT_LIST_HEAD(&vol_bitmap->work_item_list); INM_INIT_LIST_HEAD(&vol_bitmap->set_bits_work_item_list); INM_INIT_COMPLETION(&vol_bitmap->set_bits_work_item_list_empty_notification); INM_INIT_SEM(&vol_bitmap->sem); INM_INIT_SPIN_LOCK(&vol_bitmap->lock); inm_list_add_tail(&vol_bitmap->list_entry, &driver_ctx->dc_bmap_info.head_for_volume_bitmaps); driver_ctx->dc_bmap_info.num_volume_bitmaps++; return vol_bitmap; } void dealloc_volume_bitmap(volume_bitmap_t *vol_bitmap) { if(!vol_bitmap) { info("vol bitmap is null"); return; } inm_list_del(&vol_bitmap->list_entry); driver_ctx->dc_bmap_info.num_volume_bitmaps--; INM_DESTROY_COMPLETION(&vol_bitmap->set_bits_work_item_list_empty_notification); INM_DESTROY_SPIN_LOCK(&vol_bitmap->lock); INM_DESTROY_SEM(&vol_bitmap->sem); INM_KFREE(vol_bitmap, sizeof(*vol_bitmap), INM_PINNED_HEAP); vol_bitmap = NULL; } /* increments volume bitmap reference count */ void get_volume_bitmap(volume_bitmap_t *vol_bitmap) { INM_ATOMIC_INC(&vol_bitmap->refcnt); return; } /* decrements volume bitmap reference count */ void put_volume_bitmap(volume_bitmap_t *vol_bitmap) { if (INM_ATOMIC_DEC_AND_TEST(&vol_bitmap->refcnt)) dealloc_volume_bitmap(vol_bitmap); return; } bitmap_work_item_t *allocate_bitmap_work_item(inm_u32_t witem_type) { bitmap_work_item_t *bm_witem = NULL; bm_witem = (bitmap_work_item_t *) INM_KMEM_CACHE_ALLOC(driver_ctx->dc_bmap_info.bitmap_work_item_pool, INM_KM_SLEEP | INM_KM_NOWARN); if (!bm_witem) return NULL; INM_MEM_ZERO(bm_witem, sizeof(bitmap_work_item_t)); INM_ATOMIC_SET(&bm_witem->refcnt, 1); INM_INIT_LIST_HEAD(&bm_witem->list_entry); INM_INIT_LIST_HEAD(&bm_witem->bit_runs.meta_page_list); bm_witem->volume_bitmap = NULL; bm_witem->eBitmapWorkItem = ecBitmapWorkItemNotInitialized; if(WITEM_TYPE_BITMAP_WRITE != witem_type){ inm_page_t *pgp = NULL; pgp = get_page_from_page_pool(0, INM_KM_SLEEP, NULL); if(!pgp){ put_bitmap_work_item(bm_witem); return NULL; } inm_list_add_tail(&pgp->entry, &bm_witem->bit_runs.meta_page_list); bm_witem->bit_runs.runs = (disk_chg_t *)pgp->cur_pg; } return bm_witem; } /* frees the bitmap work item */ void cleanup_bitmap_work_item(bitmap_work_item_t *bm_witem) { inm_page_t *pgp; while(!inm_list_empty(&bm_witem->bit_runs.meta_page_list)){ pgp = inm_list_entry(bm_witem->bit_runs.meta_page_list.next, inm_page_t, entry); inm_list_del(&pgp->entry); inm_free_metapage(pgp); } 
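	/* all bit-run metadata pages are back on the page pool; release the
	 * work item itself to its slab cache */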
INM_KMEM_CACHE_FREE(driver_ctx->dc_bmap_info.bitmap_work_item_pool, bm_witem); return; } /* reference bitmap_work_item , increment refcnt */ void get_bitmap_work_item(bitmap_work_item_t *bitmap_work_item) { INM_ATOMIC_INC(&bitmap_work_item->refcnt); return; } /* dereference bitmap_work_item, decrement refcnt */ void put_bitmap_work_item(bitmap_work_item_t *bitmap_work_item) { if (INM_ATOMIC_DEC_AND_TEST(&bitmap_work_item->refcnt)) cleanup_bitmap_work_item(bitmap_work_item); return; } /** * FUNCTION NAME: wait_for_all_writes_to_complete * * DESCRIPTION : This function waits till all bitmap work items drained * in set_bitmap_work_item_list * * INPUT PARAMETERS : vbmap - ptr to volume bitmap **/ void wait_for_all_writes_to_complete(volume_bitmap_t *vbmap) { inm_s32_t set_bits_work_item_list_is_empty; inm_s32_t wait_for_notification = FALSE; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vbmap) { info("null vbmap"); return; } INM_DOWN(&vbmap->sem); INM_SPIN_LOCK_IRQSAVE(&vbmap->lock, lock_flag); set_bits_work_item_list_is_empty = inm_list_empty(&vbmap->set_bits_work_item_list); INM_SPIN_UNLOCK_IRQRESTORE(&vbmap->lock, lock_flag); if (!set_bits_work_item_list_is_empty) { dbg("set_bits_work_item_list is not empty"); vbmap->flags |= VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION; wait_for_notification = TRUE; } INM_UP(&vbmap->sem); if (wait_for_notification) { dbg("waiting on set_bits_work_item_list_empty_notification"); dbg("for volume %s", vbmap->volume_GUID); INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(&vbmap->set_bits_work_item_list_empty_notification); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } /** * FUNCTION NAME: queue_worker_routine_for_start_bitmap_read * * DESCRIPTION : This function queues work queue entry for bitmap read (start), * It is called by service thread. * * INPUT PARAMETERS : vbmap - ptr to volume bitmap **/ void queue_worker_routine_for_start_bitmap_read(volume_bitmap_t *vbmap) { wqentry_t *wqe = NULL; bitmap_work_item_t *bmap_witem = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!vbmap) { info("null vbmap"); return; } wqe = alloc_work_queue_entry(INM_KM_SLEEP); if (!wqe) { info("malloc failed : wqe"); return; } bmap_witem = allocate_bitmap_work_item(WITEM_TYPE_START_BITMAP_READ); if (!bmap_witem) { info("malloc failed : bmap_witem"); put_work_queue_entry(wqe); return; } get_volume_bitmap(vbmap); bmap_witem->volume_bitmap = vbmap; bmap_witem->eBitmapWorkItem = ecBitmapWorkItemStartRead; wqe->witem_type = WITEM_TYPE_START_BITMAP_READ; wqe->context = bmap_witem; wqe->work_func = start_bitmap_read_worker_routine; add_item_to_work_queue(&driver_ctx->wqueue, wqe); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } /** * FUNCTION NAME: start_bitmap_read_worker_routine * * DESCRIPTION : This function is a wrapper, which extracts bitmap work item * from work queue entry and calls low level bitmap read routines * It is called by worker thread. * INPUT PARAMETERS : wqe - ptr to work queue entry * NOTES * This function queues the read changes to device * specific dirty block context. After queuing changes it checks if changes * cross low water mark the read is paused. If the changes read are the last * set of changes the read state is set to completed. In this function we * have to unset the bits successfully read and send a next read. 
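 *
 * Read pipeline at a glance (illustrative summary of the routines in this
 * file, not a separate code path):
 *
 *   queue_worker_routine_for_start_bitmap_read(vbmap)
 *     -> start_bitmap_read_worker_routine(wqe)
 *          -> bitmap_api_get_first_runs(...)
 *               -> read_bitmap_completion_callback(bit_runs)
 *                    -> read_bitmap_completion(bmap_witem)
 *                         queues the changes, clears the bits read and,
 *                         if more runs remain, continue_bitmap_read()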
 **/
void start_bitmap_read_worker_routine(wqentry_t *wqe)
{
	bitmap_work_item_t *bmap_witem = NULL;
	volume_bitmap_t *vbmap = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!wqe) {
		info("null wqe");
		return;
	}

	bmap_witem = (bitmap_work_item_t *) wqe->context;
	put_work_queue_entry(wqe);
	if (!bmap_witem || !bmap_witem->volume_bitmap) {
		info("null bmap_witem, or null volume bitmap");
		return;
	}

	get_volume_bitmap(bmap_witem->volume_bitmap);
	vbmap = bmap_witem->volume_bitmap;

	INM_DOWN(&vbmap->sem);
	if ((vbmap->eVBitmapState == ecVBitmapStateReadStarted) &&
							vbmap->bitmap_api) {
		bmap_witem->bit_runs.context1 = bmap_witem;
		bmap_witem->bit_runs.completion_callback = read_bitmap_completion_callback;
		inm_list_add(&bmap_witem->list_entry, &vbmap->work_item_list);
		INM_UP(&vbmap->sem);
		bitmap_api_get_first_runs(vbmap->bitmap_api, &bmap_witem->bit_runs);
		INM_DOWN(&vbmap->sem);
		bmap_witem = NULL;
	}
	INM_UP(&vbmap->sem);

	put_volume_bitmap(vbmap);
	if (bmap_witem)
		put_bitmap_work_item(bmap_witem);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/*
 * Function   : read_bitmap_completion_callback
 * Parameters : bit_runs - indicates the changes that have been read
 *              from the bitmap
 * NOTES      :
 * This function is called after read completion.
 */
void read_bitmap_completion_callback(bitruns_t *bit_runs)
{
	wqentry_t *wqe = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!bit_runs) {
		info("null bit_runs");
		return;
	}

	/* Do not call completion directly here.
	 * This could lead to recursive calls and stack overflow.
	 */
	wqe = alloc_work_queue_entry(INM_KM_SLEEP);
	if (!wqe) {
		info("malloc failed : wqe ");
		return;
	}

	wqe->context = bit_runs->context1;
	wqe->work_func = read_bitmap_completion_worker_routine;
	add_item_to_work_queue(&driver_ctx->wqueue, wqe);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/*
 * Function   : read_bitmap_completion_worker_routine
 * Parameters : wqe - work queue entry whose context points to the bitmap
 *              work item holding the runs read from the bitmap
 * NOTES      :
 * This function calls the read_bitmap_completion function.
 */
void read_bitmap_completion_worker_routine(wqentry_t *wqe)
{
	bitmap_work_item_t *bmap_witem = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	bmap_witem = (bitmap_work_item_t *) wqe->context;
	put_work_queue_entry(wqe);
	read_bitmap_completion(bmap_witem);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

static inm_s32_t
split_change_into_chg_node_in_bitmap_wostate(target_context_t *ctx,
		struct inm_list_head *node_hd, write_metadata_t *wmd,
		etWriteOrderState wostate)
{
	inm_u64_t max_data_sz_per_chg_node =
		driver_ctx->tunable_params.max_data_size_per_non_data_mode_drty_blk;
	inm_u64_t remaining_length = wmd->length;
	struct inm_list_head split_chg_list_hd;
	inm_u64_t byte_offset = wmd->offset;
	inm_u64_t nr_splits = 0;
	change_node_t *chg_node = NULL;
	inm_u64_t chg_len = 0;
	disk_chg_t *chg = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	INM_INIT_LIST_HEAD(&split_chg_list_hd);

	while(remaining_length) {
		chg_node = inm_alloc_change_node(NULL, INM_KM_SLEEP);
		if (!chg_node) {
			info("change node is null");
			return -ENOMEM;
		}
		init_change_node(chg_node, 0, INM_KM_SLEEP, NULL);
		ref_chg_node(chg_node);
		chg_node->type = NODE_SRC_METADATA;
		chg_node->wostate = wostate;
		inm_list_add_tail(&chg_node->next, &split_chg_list_hd);
		nr_splits++;
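		/* carve off at most max_data_sz_per_chg_node bytes of this
		 * run into the freshly allocated node's metadata page */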
chg_len = min(max_data_sz_per_chg_node, remaining_length); chg = (disk_chg_t *)((char *)chg_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * chg_node->changes.change_idx)); chg->offset = byte_offset; chg->length = chg_len; chg->time_delta = 0; chg->seqno_delta = 0; chg_node->changes.change_idx++; chg_node->changes.bytes_changes += chg_len; chg_node->seq_id_for_split_io = nr_splits; chg_node->flags |= KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; byte_offset += chg_len; remaining_length -= chg_len; } if (!nr_splits) return 0; chg_node = inm_list_entry(split_chg_list_hd.next, change_node_t, next); chg_node->flags |= KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE; chg_node->flags &= ~KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; chg_node = inm_list_entry(split_chg_list_hd.prev, change_node_t, next); chg_node->flags |= KDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE; chg_node->flags &= ~KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; inm_list_splice_at_tail(&split_chg_list_hd, node_hd); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving nr splits = %llu\n", nr_splits); } return nr_splits; } static inm_s32_t add_metadata_bitmap_wostate(target_context_t *ctx, struct inm_list_head *node_hd, change_node_t **change_node, write_metadata_t *wmd) { inm_u32_t chg_sz = wmd->length; inm_u32_t max_data_sz_per_chg_node = driver_ctx->tunable_params.max_data_size_per_non_data_mode_drty_blk; inm_u32_t nr_splits = 0; disk_chg_t *chg = NULL; inm_u64_t avail_space = 0; change_node_t *tchg_node = *change_node; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (max_data_sz_per_chg_node < chg_sz) { *change_node = NULL; return split_change_into_chg_node_in_bitmap_wostate(ctx, node_hd, wmd, ecWriteOrderStateBitmap); } if (tchg_node && tchg_node->changes.change_idx < (MAX_CHANGE_INFOS_PER_PAGE)) { avail_space = max_data_sz_per_chg_node - tchg_node->changes.bytes_changes; if ((avail_space < wmd->length) || (tchg_node->changes.change_idx >= MAX_CHANGE_INFOS_PER_PAGE)) tchg_node = NULL; } if (!tchg_node) { tchg_node = inm_alloc_change_node(NULL, INM_KM_SLEEP); if (!tchg_node) { info("change node is null"); return -ENOMEM; } init_change_node(tchg_node, 0, INM_KM_SLEEP, NULL); ref_chg_node(tchg_node); tchg_node->type = NODE_SRC_METADATA; tchg_node->wostate = ecWriteOrderStateBitmap; inm_list_add_tail(&tchg_node->next, node_hd); *change_node = tchg_node; } chg = (disk_chg_t *)((char *)tchg_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * tchg_node->changes.change_idx)); chg->offset = wmd->offset; chg->length = wmd->length; /* the time deltas are updated in completion routine */ chg->time_delta = 0; chg->seqno_delta = 0; tchg_node->changes.change_idx++; tchg_node->changes.bytes_changes += wmd->length; nr_splits++; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return nr_splits; } /* * Function : ReadBitmapCompletion * Parameters : BitRuns - This indicates the changes that have been read * to bit map * NOTES : * This function is called after read completion. This function queues the read changes to device * specific dirty block context. After queuing changes it checks if changes * cross low water mark the read is paused. If the changes read are the last * set of changes the read state is set to completed. In this function we * have to unset the bits successfully read and send a next read. * * To avoid copying of the bitruns, we would use the previous read DB_WORK_ITEM for * clearing bits and allocate a new one for read if required. 
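 *
 * Worked example (illustrative numbers only): if a single 10MB run at
 * offset 0 comes back from the bitmap and
 * max_data_size_per_non_data_mode_drty_blk is 4MB,
 * add_metadata_bitmap_wostate() hands the run to
 * split_change_into_chg_node_in_bitmap_wostate(), which emits change
 * nodes of 4MB + 4MB + 2MB flagged START/PART/END of a split change,
 * all sharing a single time stamp and sequence number.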
* */ void read_bitmap_completion(bitmap_work_item_t *bmap_witem) { volume_bitmap_t *vbmap = NULL; target_context_t *vcptr = NULL; change_node_t *change_node = NULL; struct inm_list_head node_hd; /* change node list for bitmap change nodes */ inm_s32_t clear_bits_read = 0, cont_bitmap_read = 0; inm_u64_t vc_nr_bytes_changes_read_from_bitmap = 0; inm_s32_t vc_split_changes_returned = 0; inm_s32_t vc_total_changes_pending = 0; struct inm_list_head *ptr = NULL, *nextptr = NULL; inm_u32_t i = 0; inm_u32_t tdelta = 0, sdelta = 0; inm_u64_t time = 0, nr_seq = 0; unsigned char delay_bitmap_read = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_INIT_LIST_HEAD(&node_hd); if (!bmap_witem || !bmap_witem->volume_bitmap) { info("null bmap_witem or volume bitmap"); return; } get_volume_bitmap(bmap_witem->volume_bitmap); vbmap = bmap_witem->volume_bitmap; INM_DOWN(&vbmap->sem); /*check if bitmap is still in read state ?? */ if ((vbmap->eVBitmapState != ecVBitmapStateReadStarted) || !vbmap->bitmap_api || !vbmap->volume_context || (vbmap->volume_context->tc_flags & VCF_CV_FS_UNMOUNTED)) { /* bitmap is closed, or bitmap is moved to write state, */ /* Ignore this read */ inm_list_del(&bmap_witem->list_entry); put_bitmap_work_item(bmap_witem); info("ignoring bitmap read - state = %s, bapi = %p, vcptr = %p", get_volume_bitmap_state_string(vbmap->eVBitmapState), vbmap->bitmap_api, vbmap->volume_context); } else { vcptr = vbmap->volume_context; do { /*error processing*/ /* CASE 1: Error in reading bitmap * soln: set volume resync required flag */ if (bmap_witem->bit_runs.final_status && (bmap_witem->bit_runs.final_status != EAGAIN)) { vbmap->eVBitmapState = ecVBitmapStateReadError; vcptr->tc_bp->num_bitmap_read_errors++; set_volume_out_of_sync(vcptr, ERROR_TO_REG_BITMAP_READ_ERROR, bmap_witem->bit_runs.final_status); info("received error %x setting bitmap state of volume %s to ecVBitmap error", bmap_witem->bit_runs.final_status, vbmap->volume_GUID); break; } /* CASE 2: if changes are not added to volume context's * change list, then add here */ /* Read succeeded, check if any changes are returned */ if (!bmap_witem->bit_runs.final_status && !bmap_witem->bit_runs.nbr_runs) { vbmap->eVBitmapState = ecVBitmapStateReadCompleted; dbg("bitmap read of volume %s is completed", vbmap->volume_GUID); break; } for (i = 0; i < bmap_witem->bit_runs.nbr_runs; i++) { write_metadata_t wmd; wmd.offset = bmap_witem->bit_runs.runs[i].offset; wmd.length = bmap_witem->bit_runs.runs[i].length; vc_split_changes_returned = add_metadata_bitmap_wostate(vcptr, &node_hd, &change_node, &wmd); if (vc_split_changes_returned <= 0) break; vc_nr_bytes_changes_read_from_bitmap += bmap_witem->bit_runs.runs[i].length; vc_total_changes_pending += vc_split_changes_returned; } volume_lock(vcptr); inm_list_for_each_safe(ptr, nextptr, &node_hd) { unsigned short idx = 0; unsigned long lock_flag = 0; change_node = inm_list_entry(ptr, change_node_t, next); change_node->vcptr = vcptr; vcptr->tc_nr_cns++; change_node->transaction_id = 0; get_time_stamp_tag(&change_node->changes.start_ts); if (change_node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK) { /* maintaining single time stamp, and seq # for split ios */ if (change_node->seq_id_for_split_io == 1) { time = change_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; nr_seq = change_node->changes.start_ts.ullSequenceNumber; } else { change_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601 = time; change_node->changes.start_ts.ullSequenceNumber = 
nr_seq; } change_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 = time; change_node->changes.end_ts.ullSequenceNumber = nr_seq; } else { /* per-io time stamp changes , adds time stamp for each change*/ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); sdelta = (driver_ctx->last_time_stamp_seqno - change_node->changes.start_ts.ullSequenceNumber); driver_ctx->last_time_stamp_seqno += change_node->changes.change_idx; tdelta = driver_ctx->last_time_stamp - change_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); change_node->changes.end_ts.ullSequenceNumber = (change_node->changes.start_ts.ullSequenceNumber + sdelta + change_node->changes.change_idx); change_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 = (change_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601 + tdelta); for (idx = 0; idx < change_node->changes.change_idx; idx++) { disk_chg_t *dcp = (disk_chg_t *) ((char *)change_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * idx)); sdelta++; dcp->seqno_delta = sdelta; dcp->time_delta = tdelta; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ print_chg_info(change_node, idx); } } } vcptr->tc_pending_changes += change_node->changes.change_idx; vcptr->tc_pending_md_changes += change_node->changes.change_idx; vcptr->tc_bytes_pending_md_changes += change_node->changes.bytes_changes; vcptr->tc_bytes_pending_changes += change_node->changes.bytes_changes; vcptr->tc_cnode_pgs++; add_changes_to_pending_changes(vcptr, change_node->wostate, change_node->changes.change_idx); dbg("nr of changes in chgnode (%p) = %d nr bytes changes = %d\n", change_node, change_node->changes.change_idx, change_node->changes.bytes_changes); } if (vcptr->tc_cur_node && (vcptr->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO)) { change_node_t * chg_node_tp = vcptr->tc_cur_node; INM_BUG_ON(!chg_node_tp); if ((chg_node_tp->type == NODE_SRC_DATA || chg_node_tp->type == NODE_SRC_DATAFILE) && chg_node_tp->wostate != ecWriteOrderStateData) { close_change_node(chg_node_tp, IN_BMAP_READ_PATH); inm_list_add_tail(&chg_node_tp->nwo_dmode_next, &vcptr->tc_nwo_dmode_list); if (vcptr->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { dbg("Appending chg:%p to tgt_ctxt:%p next:%p prev:%p mode:%d", chg_node_tp, vcptr,chg_node_tp->nwo_dmode_next.next, chg_node_tp->nwo_dmode_next.prev, chg_node_tp->type); } } } /* insert node_head into tgt_ctxt->node_head */ inm_list_splice_at_tail(&node_hd, &vcptr->tc_node_head); vcptr->tc_bp->num_changes_read_from_bitmap += vc_total_changes_pending; vcptr->tc_bp->num_byte_changes_read_from_bitmap += vc_nr_bytes_changes_read_from_bitmap; vcptr->tc_bp->num_of_times_bitmap_read++; vcptr->tc_cur_node = NULL; delay_bitmap_read = vcptr->tc_flags & (VCF_VOLUME_IN_BMAP_WRITE | VCF_VOLUME_IN_GET_DB); volume_unlock(vcptr); /* if # of split changes are zero, then previous loop breaks */ if (i < bmap_witem->bit_runs.nbr_runs) { vbmap->eVBitmapState = ecVBitmapStateOpened; break; } clear_bits_read = TRUE; if (!bmap_witem->bit_runs.final_status) { /* this is the last read */ vbmap->eVBitmapState = ecVBitmapStateReadCompleted; dbg("bitmap read for volume %s is completed", vbmap->volume_GUID); } else if (driver_ctx->tunable_params.db_low_water_mark_while_service_running && (vcptr->tc_pending_changes >= driver_ctx->tunable_params.db_low_water_mark_while_service_running)) { /* if this is not a final read, and reached the lower water mark pause the reads. 
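			 * e.g. (illustrative): with
			 * db_low_water_mark_while_service_running = 1000 and
			 * tc_pending_changes already at 1500, the read pauses
			 * here and is resumed later through a
			 * WITEM_TYPE_CONTINUE_BITMAP_READ work item.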
*/ vbmap->eVBitmapState = ecVBitmapStateReadPaused; dbg("bitmap read for volume %s is paused", vbmap->volume_GUID); } else if (!delay_bitmap_read) cont_bitmap_read = TRUE; else { dbg("bitmap read for volume %s is paused for racing writes", vbmap->volume_GUID); vbmap->eVBitmapState = ecVBitmapStateReadPaused; } } while (0); if (cont_bitmap_read) { bitmap_work_item_t *bwi_for_continuing_read = NULL; /* continue reading bitmap */ bwi_for_continuing_read = allocate_bitmap_work_item(WITEM_TYPE_CONTINUE_BITMAP_READ); if (!bwi_for_continuing_read) { vbmap->eVBitmapState = ecVBitmapStateReadPaused; info("malloc failed : bitmap work item"); } else { bwi_for_continuing_read->eBitmapWorkItem = ecBitmapWorkItemContinueRead; get_volume_bitmap(vbmap); bwi_for_continuing_read->volume_bitmap = vbmap; continue_bitmap_read(bwi_for_continuing_read, (int)TRUE); } } if (clear_bits_read) { bmap_witem->eBitmapWorkItem = ecBitmapWorkItemClearBits; bmap_witem->bit_runs.completion_callback = write_bitmap_completion_callback; INM_UP(&vbmap->sem); bitmap_api_clearbits(vbmap->bitmap_api, &bmap_witem->bit_runs); INM_DOWN(&vbmap->sem); } else { inm_list_del(&bmap_witem->list_entry); put_bitmap_work_item(bmap_witem); } if (vcptr && (vbmap->eVBitmapState == ecVBitmapStateReadCompleted)) { /* check for switching to data filtering mode */ volume_lock(vcptr); if (is_data_filtering_enabled_for_this_volume(vcptr) && (driver_ctx->service_state == SERVICE_RUNNING) && can_switch_to_data_filtering_mode(vcptr)) { set_tgt_ctxt_filtering_mode(vcptr, FLT_MODE_DATA, TRUE); } else { set_tgt_ctxt_filtering_mode(vcptr, FLT_MODE_METADATA, FALSE); } if (is_data_filtering_enabled_for_this_volume(vcptr) && (driver_ctx->service_state == SERVICE_RUNNING) && can_switch_to_data_wostate(vcptr)){ /* switch to data write order state */ set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateData, FALSE, ecWOSChangeReasonUnInitialized); dbg("switched to data write order state\n"); } else if ((driver_ctx->service_state == SERVICE_RUNNING) && !vcptr->tc_pending_wostate_bm_changes && !vcptr->tc_pending_wostate_rbm_changes) { set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonMDChanges); dbg("switched to metadata write order state\n"); } volume_unlock(vcptr); } /* notify service to drain the changes read from bitmap */ if(should_wakeup_s2(vcptr)) INM_WAKEUP_INTERRUPTIBLE(&vcptr->tc_waitq); } INM_UP(&vbmap->sem); put_volume_bitmap(vbmap); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } void continue_bitmap_read(bitmap_work_item_t *bmap_witem, inm_s32_t mutex_acquired) { volume_bitmap_t *vbmap = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!bmap_witem) { info("null bmap_witem"); return; } vbmap = bmap_witem->volume_bitmap; if (!vbmap) { info("vbmap is null"); return; } get_volume_bitmap(vbmap); if (!mutex_acquired) INM_DOWN(&vbmap->sem); if ((vbmap->eVBitmapState != ecVBitmapStateReadStarted) || !vbmap->bitmap_api) put_bitmap_work_item(bmap_witem); else { bmap_witem->bit_runs.context1 = bmap_witem; bmap_witem->bit_runs.completion_callback = read_bitmap_completion_callback; inm_list_add(&bmap_witem->list_entry, &vbmap->work_item_list); INM_UP(&vbmap->sem); bitmap_api_get_next_runs(vbmap->bitmap_api, &bmap_witem->bit_runs); INM_DOWN(&vbmap->sem); } if (!mutex_acquired) INM_UP(&vbmap->sem); put_volume_bitmap(vbmap); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } void 
continue_bitmap_read_worker_routine(wqentry_t *wqe)
{
	bitmap_work_item_t *bmap_witem = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!wqe) {
		info("wqe is null");
		return;
	}

	bmap_witem = (bitmap_work_item_t *) wqe->context;
	put_work_queue_entry(wqe);
	if (!bmap_witem) {
		info("bmap witem is null");
		return;
	}

	continue_bitmap_read(bmap_witem, FALSE);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/*
 * Function   : write_bitmap_completion_callback
 * Parameters : bit_runs - indicates the changes that have been written
 *              to the bitmap
 */
void write_bitmap_completion_callback(bitruns_t *bit_runs)
{
	wqentry_t *wqe = NULL;
	bitmap_work_item_t *bmap_witem = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!bit_runs) {
		info("bit_runs is null");
		return;
	}

	bmap_witem = (bitmap_work_item_t *)bit_runs->context1;
	if (INM_IN_INTERRUPT()) {
		wqe = alloc_work_queue_entry(INM_KM_NOSLEEP);
		if (!wqe) {
			info("malloc failed: wqe");
			return;
		}
		wqe->context = bmap_witem;
		wqe->work_func = write_bitmap_completion_worker_routine;
		add_item_to_work_queue(&driver_ctx->wqueue, wqe);
	} else {
		write_bitmap_completion(bmap_witem);
	}

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/*
 * Function Name : write_bitmap_completion
 * Parameters    :
 *     bmap_witem : pointer to the bitmap work item whose bit_runs member
 *                  holds the write metadata/changes we requested to write
 *                  to the bitmap.
 */
void write_bitmap_completion(bitmap_work_item_t *bmap_witem)
{
	volume_bitmap_t *vbmap = NULL;
	target_context_t *vcptr = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!bmap_witem) {
		info("bmap_witem is null");
		return;
	}

	vbmap = bmap_witem->volume_bitmap;
	INM_DOWN(&vbmap->sem);
	vcptr = vbmap->volume_context;
	if (bmap_witem->eBitmapWorkItem == ecBitmapWorkItemSetBits) {
		volume_lock(vcptr);
		vcptr->tc_flags &= ~VCF_VOLUME_IN_BMAP_WRITE;
		volume_unlock(vcptr);
		if (bmap_witem->bit_runs.final_status) { /* != 0 */
			info("setbits witem failed with status %x for volume %s",
				bmap_witem->bit_runs.final_status,
				vbmap->volume_GUID);
			if ((vbmap->eVBitmapState != ecVBitmapStateClosed) && vcptr) {
				vcptr->tc_bp->num_bitmap_write_errors++;
				set_volume_out_of_sync(vcptr,
					ERROR_TO_REG_BITMAP_WRITE_ERROR,
					bmap_witem->bit_runs.final_status);
				info("setting volume out of sync due to write error");
			}
		} else {
			if ((vbmap->eVBitmapState != ecVBitmapStateClosed) && vcptr) {
				volume_lock(vcptr);
				vcptr->tc_bp->num_changes_queued_for_writing -= bmap_witem->changes;
				vcptr->tc_bp->num_changes_written_to_bitmap += bmap_witem->changes;
				vcptr->tc_bp->num_byte_changes_queued_for_writing -= bmap_witem->nr_bytes_changed_data;
				vcptr->tc_bp->num_byte_changes_written_to_bitmap += bmap_witem->nr_bytes_changed_data;
				volume_unlock(vcptr);
			}
		}
	} else {
		if (vcptr && !(vcptr->tc_flags & VCF_CV_FS_UNMOUNTED) &&
				bmap_witem->bit_runs.final_status) {
			vcptr->tc_bp->num_bitmap_clear_errors++;
		}
	}

	/* for all the cases, it is required to remove this entry from the list.
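	 * Setbits items are linked on vbmap->set_bits_work_item_list (guarded
	 * by vbmap->lock, with a completion to signal waiters once the list
	 * drains); clear-bits items sit on vbmap->work_item_list and need no
	 * such notification.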
	 */
	if (bmap_witem->eBitmapWorkItem == ecBitmapWorkItemSetBits) {
		unsigned long lock_flag = 0;

		INM_SPIN_LOCK_IRQSAVE(&vbmap->lock, lock_flag);
		inm_list_del(&bmap_witem->list_entry);
		if ((vbmap->flags & VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION) &&
				inm_list_empty(&vbmap->set_bits_work_item_list)) {
			vbmap->flags &= ~VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION;
			INM_COMPLETE(&vbmap->set_bits_work_item_list_empty_notification);
		}
		INM_SPIN_UNLOCK_IRQRESTORE(&vbmap->lock, lock_flag);
	} else {
		inm_list_del(&bmap_witem->list_entry);
	}
	INM_UP(&vbmap->sem);

	put_bitmap_work_item(bmap_witem);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/*
 * Function Name : write_bitmap_completion_worker_routine
 * Parameters    :
 *     wqep : pointer to work queue entry; its context member points to
 *            the bitmap work item whose bit_runs holds the write
 *            metadata/changes we requested to write to the bitmap.
 * NOTES :
 * This function removes queued write work items from the device-specific
 * dirty block context and dereferences the dirty block context.
 */
void write_bitmap_completion_worker_routine(wqentry_t *wqep)
{
	bitmap_work_item_t *bwip = NULL;

	bwip = (bitmap_work_item_t *) wqep->context;
	put_work_queue_entry(wqep);
	write_bitmap_completion(bwip);
}

void queue_worker_routine_for_continue_bitmap_read(volume_bitmap_t *vbmap)
{
	wqentry_t *wqe = NULL;
	bitmap_work_item_t *bmap_witem = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vbmap) {
		info("null vbmap");
		return;
	}

	wqe = alloc_work_queue_entry(INM_KM_SLEEP);
	if (!wqe) {
		info("malloc failed : wqe");
		return;
	}

	bmap_witem = allocate_bitmap_work_item(WITEM_TYPE_CONTINUE_BITMAP_READ);
	if (!bmap_witem) {
		info("malloc failed : bmap_witem");
		put_work_queue_entry(wqe);
		return;
	}

	bmap_witem->eBitmapWorkItem = ecBitmapWorkItemContinueRead;
	get_volume_bitmap(vbmap);
	bmap_witem->volume_bitmap = vbmap;

	wqe->witem_type = WITEM_TYPE_CONTINUE_BITMAP_READ;
	wqe->context = bmap_witem;
	wqe->work_func = continue_bitmap_read_worker_routine;
	add_item_to_work_queue(&driver_ctx->wqueue, wqe);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

const char *get_volume_bitmap_state_string(etVBitmapState bmap_state)
{
	switch (bmap_state) {
	case ecVBitmapStateUnInitialized:
		return "ecVBitmapStateUnInitialized";
	case ecVBitmapStateOpened:
		return "ecVBitmapStateOpened";
	case ecVBitmapStateReadStarted:
		return "ecVBitmapStateReadStarted";
	case ecVBitmapStateReadPaused:
		return "ecVBitmapStateReadPaused";
	case ecVBitmapStateReadCompleted:
		return "ecVBitmapStateReadCompleted";
	case ecVBitmapStateAddingChanges:
		return "ecVBitmapStateAddingChanges";
	case ecVBitmapStateClosed:
		return "ecVBitmapStateClosed";
	case ecVBitmapStateReadError:
		return "ecVBitmapStateReadError";
	case ecVBitmapStateInternalError:
		return "ecVBitmapStateInternalError";
	default:
		return "ecVBitmapStateUnknown";
	}
}

void close_bitmap_file_on_tgt_ctx_deletion(volume_bitmap_t *vbmap,
						target_context_t *vcptr)
{
	close_bitmap_file(vbmap, TRUE);
	vcptr->tc_bp->volume_bitmap = NULL;
}

void process_vcontext_work_items(wqentry_t *wqeptr)
{
	target_context_t *vcptr = NULL;
	volume_bitmap_t *vbmap = NULL;
	inm_s32_t ret = 0;	/* signed: holds negative error codes */
	struct inm_list_head change_node_list;
	wqentry_t *wqep;
	bitmap_work_item_t *bwip;
	struct inm_list_head *curp = NULL, *nxtp = NULL;
	change_node_t *cnp = NULL;
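	/* dispatcher for per-volume bitmap work: handles the OPEN_BITMAP,
	 * BITMAP_WRITE, START_BITMAP_READ, CONTINUE_BITMAP_READ and
	 * VOLUME_UNLOAD work item types queued against this target context */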
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!wqeptr) {
		info("invalid wqeptr");
		return;
	}

	vcptr = (target_context_t *)wqeptr->context;
	if (!vcptr) {
		info("invalid vcptr");
		return;
	}

	volume_lock(vcptr);
	if(vcptr->tc_bp->volume_bitmap) {
		get_volume_bitmap(vcptr->tc_bp->volume_bitmap);
		vbmap = vcptr->tc_bp->volume_bitmap;
	}
	volume_unlock(vcptr);

	if ((vbmap) && (vcptr->tc_cur_wostate == ecWriteOrderStateRawBitmap)) {
		INM_DOWN(&vbmap->sem);
		ret = move_rawbitmap_to_bmap(vcptr, FALSE);
		INM_UP(&vbmap->sem);
		if (vcptr->tc_prev_wostate == ecWriteOrderStateUnInitialized &&
								ret < 0) {
			info("move_rawbitmap_to_bmap operation failed \n");
		}
	}

	dbg("volume:%s pending changes = %lld work_item:%u\n",
		vcptr->tc_guid, vcptr->tc_pending_changes, wqeptr->witem_type);

	switch(wqeptr->witem_type) {
	case WITEM_TYPE_OPEN_BITMAP:
		if (vcptr->tc_flags & VCF_VOLUME_STACKED_PARTIALLY)
			return;
		if ((driver_ctx->sys_shutdown) &&
				(inm_dev_id_get(vcptr) == driver_ctx->root_dev))
			break;
		if (!vbmap) {
			inm_s32_t status = 0;

			if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
				info("opening bitmap for volume %s\n",
					vcptr->tc_guid);
			}
			vbmap = open_bitmap_file(vcptr, &status);
			volume_lock(vcptr);
			if (status || !vbmap) {
				info("open bitmap for volume:%s has failed = %d",
					vcptr->tc_guid, status);
				set_bitmap_open_error(vcptr, TRUE, status);
			} else {
				vcptr->tc_flags &= ~VCF_OPEN_BITMAP_REQUESTED;
				/* There is a lack of synchronisation between
				 * this function and do_stop_filtering, as
				 * follows:
				 *
				 * if do_stop_filtering runs to completion
				 * before vbmap is assigned to
				 * vcptr->tc_bp->volume_bitmap, no one is left
				 * responsible for closing the bitmap file.
				 *
				 * So if the target is undergoing deletion, do
				 * the do_stop_filtering task here itself.
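				 *
				 * Illustrative interleaving of the race (sketch):
				 *   T1 (here)          : open_bitmap_file() returns vbmap
				 *   T2 (stop filtering): runs fully, sees
				 *                        tc_bp->volume_bitmap == NULL,
				 *                        so closes nothing
				 *   T1 (here)          : vbmap would be orphaned; hence
				 *                        the VCF_VOLUME_DELETING check
				 *                        and close_bitmap_file_on_tgt_ctx_deletion()
				 *                        below.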
*/ if (vbmap) { if (vcptr->tc_flags & VCF_VOLUME_DELETING) { volume_unlock(vcptr); close_bitmap_file_on_tgt_ctx_deletion(vbmap, vcptr); vbmap = NULL; volume_lock(vcptr); } else { get_volume_bitmap(vbmap); vcptr->tc_bp->volume_bitmap = vbmap; } } } volume_unlock(vcptr); } break; case WITEM_TYPE_BITMAP_WRITE: if (vbmap) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("writing to bitmap for volume %s\n", vcptr->tc_guid); } if ((vcptr->tc_cur_wostate == ecWriteOrderStateRawBitmap) && (is_bmaphdr_loaded(vbmap) == FALSE)) { set_volume_out_of_sync(vcptr, ERROR_TO_REG_BITMAP_OPEN_FAIL_CHANGES_LOST, 0); break; } wqep = alloc_work_queue_entry(INM_KM_SLEEP); if(!wqep){ ret = INM_ENOMEM; break; } bwip = allocate_bitmap_work_item(WITEM_TYPE_BITMAP_WRITE); if(!bwip){ put_work_queue_entry(wqep); ret = INM_ENOMEM; break; } INM_DOWN(&vbmap->sem); volume_lock(vcptr); vbmap->eVBitmapState = ecVBitmapStateAddingChanges; if (inm_list_empty(&vcptr->tc_node_head)) { volume_unlock(vcptr); put_work_queue_entry(wqep); put_bitmap_work_item(bwip); goto busy_wait; } vcptr->tc_flags |= VCF_VOLUME_IN_BMAP_WRITE; /* Reset changes related stats here */ update_target_context_stats(vcptr); /* Before writing to bitmap, remove data mode node from * non write order list */ inm_list_for_each_safe(curp, nxtp, &vcptr->tc_nwo_dmode_list) { inm_list_del_init(curp); } vcptr->tc_cur_node = NULL; list_change_head(&change_node_list, &vcptr->tc_node_head); INM_INIT_LIST_HEAD(&vcptr->tc_node_head); if (vcptr->tc_pending_confirm && !(vcptr->tc_pending_confirm->flags & CHANGE_NODE_ORPHANED)) { cnp = vcptr->tc_pending_confirm; inm_list_del_init(&cnp->next); inm_list_add_tail(&cnp->next, &vcptr->tc_node_head); if ((vcptr->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && (cnp->type == NODE_SRC_DATA || cnp->type == NODE_SRC_DATAFILE) && (cnp->wostate != ecWriteOrderStateData)) { inm_list_add_tail(&cnp->nwo_dmode_next, &vcptr->tc_nwo_dmode_list); if (vcptr->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { dbg("Appending chg:%p to tgt_ctxt:%p next:%p prev:%p mode:%d", cnp, vcptr, cnp->nwo_dmode_next.next, cnp->nwo_dmode_next.prev, cnp->type); } } vcptr->tc_pending_changes = cnp->changes.change_idx; if (cnp->type == NODE_SRC_METADATA) { vcptr->tc_pending_md_changes = cnp->changes.change_idx; vcptr->tc_bytes_pending_md_changes = cnp->changes.bytes_changes; } vcptr->tc_bytes_pending_changes = cnp->changes.bytes_changes; add_changes_to_pending_changes(vcptr, cnp->wostate, cnp->changes.change_idx); vcptr->tc_bp->num_changes_queued_for_writing -= cnp->changes.change_idx; vcptr->tc_bp->num_byte_changes_queued_for_writing -= cnp->changes.bytes_changes; } if ((vcptr->tc_cur_wostate != ecWriteOrderStateBitmap) && (vcptr->tc_cur_wostate != ecWriteOrderStateRawBitmap)) { set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateBitmap, FALSE, ecWOSChangeReasonBitmapChanges); } volume_unlock(vcptr); ret = queue_worker_routine_for_bitmap_write(vcptr, wqeptr->extra1, vbmap, &change_node_list, wqep, bwip); busy_wait: volume_lock(vcptr); vcptr->tc_bp->bmap_busy_wait = FALSE; volume_unlock(vcptr); INM_UP(&vbmap->sem); if (ret < 0) { err("bmap write op failed for volume %s [err code = %d]", vcptr->tc_guid, ret); } } else { set_bitmap_open_fail_due_to_loss_of_changes(vcptr, FALSE); } break; case WITEM_TYPE_START_BITMAP_READ: /* reading bitmap in raw mode is not allowed */ if (vbmap && (vcptr->tc_cur_wostate != ecWriteOrderStateRawBitmap)) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("starting 
read from bitmap for volume %s\n", vcptr->tc_guid); } INM_DOWN(&vbmap->sem); if ((vbmap->eVBitmapState == ecVBitmapStateOpened) || (vbmap->eVBitmapState == ecVBitmapStateAddingChanges)){ vbmap->eVBitmapState = ecVBitmapStateReadStarted; queue_worker_routine_for_start_bitmap_read(vbmap); } INM_UP(&vbmap->sem); } else { err("volume is in raw bitmap write order state/null vbmap(vc-wostate = %d\n)", vcptr->tc_cur_wostate); } break; case WITEM_TYPE_CONTINUE_BITMAP_READ: if (vbmap) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("starting read from bitmap for volume %s\n", vcptr->tc_guid); } INM_DOWN(&vbmap->sem); if ((vbmap->eVBitmapState == ecVBitmapStateReadStarted) || (vbmap->eVBitmapState == ecVBitmapStateReadPaused)) { vbmap->eVBitmapState = ecVBitmapStateReadStarted; queue_worker_routine_for_continue_bitmap_read(vbmap); } INM_UP(&vbmap->sem); } else { err("null vol_bitmap for volume = %s\n", vcptr->tc_guid); } break; case WITEM_TYPE_VOLUME_UNLOAD: dbg("Processing VOLUME UNLOAD work item \n"); if (!inm_list_empty(&vcptr->tc_node_head)) { if (!vbmap) { inm_s32_t status = 0; vbmap = open_bitmap_file(vcptr, &status); } } if(vbmap){ wqep = alloc_work_queue_entry(INM_KM_SLEEP); if(!wqep){ ret = INM_ENOMEM; break; } bwip = allocate_bitmap_work_item(WITEM_TYPE_BITMAP_WRITE); if(!bwip){ put_work_queue_entry(wqep); ret = INM_ENOMEM; break; } volume_lock(vcptr); vcptr->tc_flags |= VCF_VOLUME_IN_BMAP_WRITE; vbmap->eVBitmapState = ecVBitmapStateAddingChanges; if (inm_list_empty(&vcptr->tc_node_head)) { volume_unlock(vcptr); put_work_queue_entry(wqep); put_bitmap_work_item(bwip); goto bitmap_close; } /* Reset changes related stats here */ update_target_context_stats(vcptr); /* Before writing to bitmap, remove data mode node from * non write order list */ inm_list_for_each_safe(curp, nxtp, &vcptr->tc_nwo_dmode_list) { inm_list_del_init(curp); } vcptr->tc_cur_node = NULL; list_change_head(&change_node_list, &vcptr->tc_node_head); INM_INIT_LIST_HEAD(&vcptr->tc_node_head); if (vcptr->tc_pending_confirm && !(vcptr->tc_pending_confirm->flags & CHANGE_NODE_ORPHANED)) { cnp = vcptr->tc_pending_confirm; inm_list_del_init(&cnp->next); inm_list_add_tail(&cnp->next, &vcptr->tc_node_head); if ((vcptr->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && (cnp->type == NODE_SRC_DATA || cnp->type == NODE_SRC_DATAFILE) && (cnp->wostate != ecWriteOrderStateData)) { inm_list_add_tail(&cnp->nwo_dmode_next, &vcptr->tc_nwo_dmode_list); if (vcptr->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { dbg("Appending chg:%p to tgt_ctxt:%p", cnp, vcptr); } } vcptr->tc_pending_changes = cnp->changes.change_idx; if (cnp->type == NODE_SRC_METADATA) { vcptr->tc_pending_md_changes = cnp->changes.change_idx; vcptr->tc_bytes_pending_md_changes = cnp->changes.bytes_changes; } vcptr->tc_bytes_pending_changes = cnp->changes.bytes_changes; add_changes_to_pending_changes(vcptr, cnp->wostate, cnp->changes.change_idx); vcptr->tc_bp->num_changes_queued_for_writing -= cnp->changes.change_idx; vcptr->tc_bp->num_byte_changes_queued_for_writing -= cnp->changes.bytes_changes; } if ((vcptr->tc_cur_wostate != ecWriteOrderStateBitmap) && (vcptr->tc_cur_wostate != ecWriteOrderStateRawBitmap)) { set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateBitmap, FALSE, ecWOSChangeReasonBitmapChanges); } volume_unlock(vcptr); ret = queue_worker_routine_for_bitmap_write(vcptr, 0, vbmap, &change_node_list, wqep, bwip); if (ret < 0) { err("bmap write op failed for volume %s [ err code = %d ]", vcptr->tc_guid, ret); } } 
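		/* with or without a bitmap write queued above, flush any
		 * remaining changes and close the bitmap file */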
bitmap_close:
		flush_and_close_bitmap_file(vcptr);
		break;

	default:
		dbg("Unknown work-item type - 0x%x \n", wqeptr->witem_type);
		break;
	}

	put_tgt_ctxt(vcptr);

	if (vbmap)
		put_volume_bitmap(vbmap);

	return;
}

inm_s32_t add_vc_workitem_to_list(inm_u32_t witem_type,
		target_context_t *vcptr, inm_u32_t extra1,
		inm_u8_t open_bitmap, struct inm_list_head *lhptr)
{
	inm_s32_t success = 1;
	wqentry_t *wqe = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (witem_type != WITEM_TYPE_OPEN_BITMAP) {
		if (!vcptr->tc_bp->volume_bitmap &&
				!(vcptr->tc_flags & VCF_OPEN_BITMAP_REQUESTED) &&
				open_bitmap) {
			dbg("Queuing open bitmap for volume = %p\n", vcptr);
			wqe = alloc_work_queue_entry(INM_KM_NOSLEEP);
			if (!wqe) {
				info("Failed to allocate work queue entry for bitmap open\n");
				return !success;
			}
			vcptr->tc_flags |= VCF_OPEN_BITMAP_REQUESTED;
			wqe->witem_type = WITEM_TYPE_OPEN_BITMAP;
			get_tgt_ctxt(vcptr);
			wqe->context = vcptr;
			wqe->extra1 = 0;
			inm_list_add_tail(&wqe->list_entry, lhptr);
			wqe = NULL;
		}
	}

	wqe = alloc_work_queue_entry(INM_KM_NOSLEEP);
	if (wqe == NULL)
		return !success;

	wqe->witem_type = witem_type;
	get_tgt_ctxt(vcptr);
	wqe->context = vcptr;
	wqe->extra1 = extra1;
	inm_list_add_tail(&wqe->list_entry, lhptr);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return success;
}

void fill_bitmap_filename_in_volume_context(target_context_t *vcptr)
{
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr)
		return;

	snprintf(vcptr->tc_bp->bitmap_file_name,
		sizeof(vcptr->tc_bp->bitmap_file_name), "%s/%s%s%s",
		INM_BMAP_DEFAULT_DIR_DEPRECATED, LOG_FILE_NAME_PREFIX,
		vcptr->tc_pname, LOG_FILE_NAME_SUFFIX);

	if (INM_BMAP_ALLOW_DEPRECATED(vcptr->tc_bp->bitmap_file_name)) {
		snprintf(vcptr->tc_bp->bitmap_dir_name,
			sizeof(vcptr->tc_bp->bitmap_dir_name), "%s",
			INM_BMAP_DEFAULT_DIR_DEPRECATED);
		info("Using deprecated file %s", vcptr->tc_bp->bitmap_file_name);
	} else {
		snprintf(vcptr->tc_bp->bitmap_dir_name,
			sizeof(vcptr->tc_bp->bitmap_dir_name), "%s/%s",
			PERSISTENT_DIR, vcptr->tc_pname);
		snprintf(vcptr->tc_bp->bitmap_file_name,
			sizeof(vcptr->tc_bp->bitmap_file_name), "%s/%s%s%s",
			vcptr->tc_bp->bitmap_dir_name, LOG_FILE_NAME_PREFIX,
			vcptr->tc_pname, LOG_FILE_NAME_SUFFIX);
	}

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("bitmap_file_name:%s", vcptr->tc_bp->bitmap_file_name);
	}
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
}

void set_bitmap_open_fail_due_to_loss_of_changes(target_context_t *vcptr,
						inm_s32_t lock_acquired)
{
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr) {
		return;
	}

	if (!lock_acquired)
		volume_lock(vcptr);

	/* As we are losing changes, it is required to set resync required
	 * and also to disable filtering temporarily for this volume
	 **/
	vcptr->tc_flags |= VCF_OPEN_BITMAP_FAILED;
	vcptr->tc_flags |= VCF_FILTERING_STOPPED;
	telemetry_set_dbs(&vcptr->tc_tel.tt_blend, DBS_FILTERING_STOPPED_BY_KERNEL);
	stop_filtering_device(vcptr, TRUE, NULL);
	dbg("stop filtering device .. functionality yet to implement ");

	if(!lock_acquired)
		volume_unlock(vcptr);

	set_volume_out_of_sync(vcptr, ERROR_TO_REG_BITMAP_OPEN_FAIL_CHANGES_LOST, 0);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

void set_bitmap_open_error(target_context_t *vcptr, inm_s32_t lock_acquired,
							inm_s32_t status)
{
	inm_s32_t fail_further_opens = FALSE;
	inm_s32_t out_of_sync = FALSE;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if(!vcptr)
		return;

	if (!lock_acquired)
		volume_lock(vcptr);

	vcptr->tc_bp->num_bitmap_open_errors++;
	if (vcptr->tc_bp->num_bitmap_open_errors >
			MAX_BITMAP_OPEN_ERRORS_TO_STOP_FILTERING)
		fail_further_opens = TRUE;

	if (fail_further_opens) {
		vcptr->tc_flags |= VCF_OPEN_BITMAP_FAILED;
		vcptr->tc_flags |= VCF_FILTERING_STOPPED;
		telemetry_set_dbs(&vcptr->tc_tel.tt_blend,
					DBS_FILTERING_STOPPED_BY_KERNEL);
		out_of_sync = TRUE;
		stop_filtering_device(vcptr, TRUE, NULL);
		info("bitmap open error on volume %s\n", vcptr->tc_guid);
	}

	if (!lock_acquired)
		volume_unlock(vcptr);

	if (out_of_sync)
		set_volume_out_of_sync(vcptr, ERROR_TO_REG_BITMAP_OPEN_ERROR,
								status);
	return;
}

inm_s32_t can_open_bitmap_file(target_context_t *vcptr, inm_s32_t lose_changes)
{
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr)
		return -EINVAL;

	if (vcptr->tc_flags & (VCF_FILTERING_STOPPED | VCF_OPEN_BITMAP_FAILED)) {
		return FALSE;
	}

	if (vcptr->tc_flags & VCF_VOLUME_STACKED_PARTIALLY) {
		return FALSE;
	}

	if (lose_changes || !vcptr->tc_bp->num_bitmap_open_errors) {
		return TRUE;
	}

	return FALSE;
}

/*
 * Function Name : queue_worker_routine_for_bitmap_write
 * Parameters    :
 *     vcptr : target context; referenced and stored in the work item.
 *     nr_dirty_changes : if this value is zero, all the dirty changes are
 *             copied. If this value is non-zero, at least
 *             nr_dirty_changes changes are copied.
 * Returns :
 *     0 on success in queuing the work item for write,
 *     a negative value on failure to queue the work item.
 * NOTES :
 *
 * This function removes changes from the volume context and queues a
 * set-bits work item for each dirty block.
 * These work items are later removed and processed by the work queue.
 */
inm_s32_t queue_worker_routine_for_bitmap_write(target_context_t *vcptr,
		inm_u64_t nr_dirty_changes, volume_bitmap_t *vbmap,
		struct inm_list_head *change_node_list, wqentry_t *wqep,
		bitmap_work_item_t *bwip)
{
	inm_u64_t nr_changes_copied = 0;
	unsigned long lock_flag = 0;
	inm_s32_t ret = -1;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	while(!inm_list_empty(change_node_list)) {
		change_node_t *change_node = NULL;
		inm_u32_t nr_chgs = 0;
		unsigned short rem;

		/* check whether it exceeded the limit */
		if (nr_dirty_changes && nr_changes_copied >= nr_dirty_changes)
			break;

		change_node = inm_list_entry(change_node_list->next,
						change_node_t, next);
		inm_list_del(&change_node->next);

		/* tag-change nodes should be dropped */
		if (change_node->type == NODE_SRC_TAGS) {
			telemetry_log_tag_history(change_node, vcptr,
				ecTagStatusDropped, ecBitmapWrite,
				ecMsgTagDropped);
			INM_ATOMIC_INC(&change_node->vcptr->tc_stats.num_tags_dropped);
			if (change_node->tag_guid) {
				change_node->tag_guid->status[change_node->tag_status_idx] =
								STATUS_DROPPED;
				INM_WAKEUP_INTERRUPTIBLE(&change_node->tag_guid->wq);
				change_node->tag_guid = NULL;
			}
			if (change_node->flags & CHANGE_NODE_FAILBACK_TAG) {
				info("The failback tag is dropped for disk %s, dirty block = %p",
					vcptr->tc_guid, change_node);
				set_tag_drain_notify_status(vcptr,
					TAG_STATUS_DROPPED,
					DEVICE_STATUS_NON_WRITE_ORDER_STATE);
			}
			commit_change_node(change_node);
		} else {
			if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
				info("removed change node %p from vcptr %p\n",
					change_node, vcptr);
			}
			rem = change_node->changes.change_idx;
			while (!inm_list_empty(&change_node->changes.md_pg_list)) {
				inm_page_t *pgp = NULL;

				nr_chgs = min((inm_u32_t)rem,
					(inm_u32_t)MAX_CHANGE_INFOS_PER_PAGE);
				pgp = inm_list_entry(change_node->changes.md_pg_list.next,
							inm_page_t, entry);
				inm_list_del(&pgp->entry);
				pgp->nr_chgs = nr_chgs;
				inm_list_add_tail(&pgp->entry,
					&bwip->bit_runs.meta_page_list);
				rem -= nr_chgs;
				if (!rem)
					break;
			}
			bwip->changes += change_node->changes.change_idx;
			bwip->bit_runs.nbr_runs += change_node->changes.change_idx;
			bwip->nr_bytes_changed_data += change_node->changes.bytes_changes;
			/* count toward the nr_dirty_changes cap */
			nr_changes_copied += change_node->changes.change_idx;
			commit_change_node(change_node);
		}
	}

	/* if the bitmap file is already opened, we can directly queue the
	 * work item to the worker queue; if it is not opened, insert it into
	 * the list so it is processed later.
	 */
	if(bwip->changes){
		bwip->eBitmapWorkItem = ecBitmapWorkItemSetBits;
		get_volume_bitmap(vbmap);
		bwip->volume_bitmap = vbmap;
		wqep->context = bwip;
		wqep->witem_type = WITEM_TYPE_BITMAP_WRITE;
		wqep->work_func = bitmap_write_worker_routine;
		INM_SPIN_LOCK_IRQSAVE(&vbmap->lock, lock_flag);
		inm_list_add(&bwip->list_entry, &vbmap->set_bits_work_item_list);
		INM_SPIN_UNLOCK_IRQRESTORE(&vbmap->lock, lock_flag);
		add_item_to_work_queue(&driver_ctx->wqueue, wqep);
	}else{
		put_work_queue_entry(wqep);
		put_bitmap_work_item(bwip);
	}
	ret = 0;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving with ret = %d", ret);
	}
	return ret;
}

/**
 * FUNCTION NAME: flush_and_close_bitmap_file
 *
 * DESCRIPTION : Flushes all the pending changes of the volume context, and
 * closes its bitmap.
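 *
 * Call-path note (from this file): the WITEM_TYPE_VOLUME_UNLOAD handler in
 * process_vcontext_work_items() lands here via its bitmap_close label, and
 * inmage_flt_save_all_changes() below does the actual flushing.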
 *
 * INPUT PARAMETERS : vcptr - ptr to target_context
 */
void flush_and_close_bitmap_file(target_context_t *vcptr)
{
	volume_bitmap_t *vbmap = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr) {
		info("null vcptr");
		return;
	}

	inmage_flt_save_all_changes(vcptr, TRUE, INM_NO_OP);

	volume_lock(vcptr);
	vbmap = vcptr->tc_bp->volume_bitmap;
	vcptr->tc_bp->volume_bitmap = NULL;
	volume_unlock(vcptr);

	if (vbmap)
		close_bitmap_file(vbmap, FALSE);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

int inmage_flt_save_all_changes(target_context_t *vcptr,
		inm_s32_t wait_required, inm_s32_t op_type)
{
	struct inm_list_head lh;
	wqentry_t *wqe = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr)
		return -EINVAL;

	if (vcptr->tc_flags & VCF_BITMAP_WRITE_DISABLED)
		return 0;

	INM_INIT_LIST_HEAD(&lh);

	volume_lock(vcptr);
	/* the device is being shut down soon; write all pending changes to
	 * the bitmap */
	if (!inm_list_empty(&vcptr->tc_node_head)) {
		inm_s32_t wi = WITEM_TYPE_UNINITIALIZED;
		inm_s32_t open_bmap = FALSE;

		switch (op_type) {
		case INM_NO_OP:
		case INM_STOP_FILTERING:
		case INM_SYSTEM_SHUTDOWN:
		case INM_UNSTACK:
			wi = WITEM_TYPE_BITMAP_WRITE;
			break;
		default:
			dbg("unknown operation type\n");
			break;
		}
		if (wi != WITEM_TYPE_UNINITIALIZED)
			add_vc_workitem_to_list(wi, vcptr, 0, open_bmap, &lh);
	}
	volume_unlock(vcptr);

	if(!inm_list_empty(&lh)) {
		struct inm_list_head *ptr = NULL, *nextptr = NULL;

		inm_list_for_each_safe(ptr, nextptr, &lh) {
			wqe = inm_list_entry(ptr, wqentry_t, list_entry);
			inm_list_del(&wqe->list_entry);
			process_vcontext_work_items(wqe);
			put_work_queue_entry(wqe);
		}
	}

	if (wait_required)
		wait_for_all_writes_to_complete(vcptr->tc_bp->volume_bitmap);

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return 0;
}

void log_bitmap_open_success_event(target_context_t *vcptr)
{
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (!vcptr || vcptr->tc_bp->num_bitmap_open_errors)
		return;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return;
}

/* move raw bitmap mode to bitmap mode
 * return value: +ve for last-chance changes that can fit into the bmap hdr
 *               -ve for failure
 *               0 for success
 */
int move_rawbitmap_to_bmap(target_context_t *vcptr, inm_s32_t force)
{
	bitmap_api_t *bapi = NULL;
	inm_s32_t _rc = -1;
	inm_s32_t status;

	if (driver_ctx->sys_shutdown)
		return -1;

	INM_BUG_ON(!vcptr->tc_bp->volume_bitmap->bitmap_api);
	bapi = vcptr->tc_bp->volume_bitmap->bitmap_api;

	/* check if we already loaded the header into memory;
	 * if loaded, flush it */
	_rc = bitmap_api_open_bitmap_stream(bapi, vcptr, &status);
	dbg("bmap(%s) open trial one status = %d\n", bapi->bitmap_filename, _rc);
	if (!_rc)
		goto success;

	if (bapi->bitmap_header.un.header.header_size) {
		inm_s32_t max_nr_lcwrites = 31*64;
		inm_s32_t rem = max_nr_lcwrites - vcptr->tc_pending_changes;

		if (rem < 0) {
			bapi->bitmap_header.un.header.changes_lost++;
		}
	} else {
		/* bitmap has not been opened yet */
		if (force && vcptr->tc_pending_changes > 0) {
			/* is service shutdown??
*/ /* this should never happen for file stored on root volume */ if (vcptr->tc_bp->bitmap_file_name[0] == '/') return _rc; _rc = bitmap_api_open_bitmap_stream(bapi, vcptr, &status); if (_rc) { info("bitmap_api_open_bitmap_stream failed = %d\n", _rc); } } } success: if (_rc == 0) { inm_s32_t vol_in_sync = TRUE; inm_s32_t ret = 0; INM_BUG_ON(!bapi->fs); if (bapi->fs) { vcptr->tc_bp->volume_bitmap->eVBitmapState = ecVBitmapStateOpened; set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateBitmap, FALSE, ecWOSChangeReasonUnInitialized); } bapi->bitmap_file_state = BITMAP_FILE_STATE_OPENED; /* bmap hdr is freshly loaded (first time) */ ret = is_volume_in_sync(bapi, &vol_in_sync, &status); if (vol_in_sync == FALSE) { /* volume is not in sync */ set_volume_out_of_sync(vcptr, status, status); /* CDataFile::DeleteDataFilesInDirectory() */ bapi->volume_insync = TRUE; } /* unset ignore bitmap creation flag. */ volume_lock(vcptr); vcptr->tc_flags &= ~VCF_IGNORE_BITMAP_CREATION; volume_unlock(vcptr); } return _rc; } involflt-0.1.0/src/filter_host.h0000755000000000000000000001322614467303177015350 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INM_FILTER_HOST_H #define _INM_FILTER_HOST_H #include "involflt.h" #include "involflt-common.h" typedef struct host_dev { struct inm_list_head hdc_dev_list; inm_dev_t hdc_dev; struct gendisk *hdc_disk_ptr; req_queue_info_t *hdc_req_q_ptr; struct kobject *hdc_disk_kobj_ptr; const struct block_device_operations *hdc_fops; } host_dev_t; typedef struct host_dev_context { struct inm_list_head hdc_dev_list_head; sector_t hdc_start_sect; sector_t hdc_end_sect; sector_t hdc_actual_end_sect; inm_mempool_t *hdc_bio_info_pool; inm_u64_t hdc_volume_size; /* to validate the write offset */ inm_u32_t hdc_bsize; inm_u64_t hdc_nblocks; inm_wait_queue_head_t resync_notify; } host_dev_ctx_t; typedef host_dev_ctx_t *host_dev_ctxp; #define INM_BIOSZ sizeof(dm_bio_info_t) #define INM_IOINFO_MPOOL_SZ (INM_PAGESIZE/sizeof(dm_bio_info_t)) /* filtering definitions */ #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) #ifndef queue_rq_fn typedef blk_status_t (queue_rq_fn)(struct blk_mq_hw_ctx *, const struct blk_mq_queue_data *); #endif blk_status_t inm_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd); dm_bio_info_t *inm_alloc_bioinfo(void); change_node_t *inm_alloc_chgnode(void); void inm_alloc_pools(void); int create_alloc_thread(void); void destroy_alloc_thread(void); void inm_free_bio_info(dm_bio_info_t *); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0) blk_qc_t flt_make_request_fn(struct request_queue *, struct bio *bio); #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) void flt_make_request_fn(struct request_queue *, struct bio *); #else inm_s32_t flt_make_request_fn(struct request_queue *, struct bio *); #endif #endif void flt_disk_obj_rel(struct kobject *); void flt_part_obj_rel(struct kobject *); void flt_queue_obj_rel(struct kobject *); void get_qinfo(req_queue_info_t *); void put_qinfo(req_queue_info_t *); void copy_bio_data_to_data_pages(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, struct inm_list_head *change_node_list); void reset_stable_pages_for_all_devs(void); void set_stable_pages_for_all_devs(void); req_queue_info_t* alloc_and_init_qinfo(inm_block_device_t *bdev, target_context_t *tgt_ctxt); void init_tc_kobj(inm_block_device_t *bdev, struct kobject **hdc_disk_kobj_ptr); void inm_bufoff_to_fldisk(inm_buf_t *bp, target_context_t *tcp, inm_u64_t *abs_off); inm_mirror_bufinfo_t * inm_get_imb_cached(host_dev_ctx_t *); void inm_cleanup_mirror_bufinfo(host_dev_ctx_t *hdcp); inm_s32_t inm_prepare_atbuf(inm_mirror_atbuf *, inm_buf_t *, mirror_vol_entry_t *, inm_u32_t); /* disk resize notification */ void unregister_disk_change_notification(target_context_t *ctx, host_dev_t *hdc_dev); void register_disk_change_notification(target_context_t *ctx, host_dev_t *hdc_dev); #define INM_ALL_IOS_DONE 0 #define INM_MIRROR_INFO_RETURN(hdcp, mbufinfo, flag) \ { \ inm_free_atbuf_list(&(mbufinfo->imb_atbuf_list)); \ INM_KMEM_CACHE_FREE(driver_ctx->dc_host_info.mirror_bioinfo_cache, mbufinfo); \ } void inm_issue_atio(inm_buf_t *at_bp, mirror_vol_entry_t *vol_entry); #define INM_MAX_XFER_SZ(vol_entry, bp) INM_BUF_COUNT(bp) #define INM_UPDATE_VOL_ENTRY_STAT(tcp, vol_entry, count, io_sz) #ifdef INM_DEBUG #define INM_IS_TEST_MIRROR(bio) (bio->bi_private == driver_ctx) #define INM_SET_TEST_MIRROR(bio) (bio->bi_private = driver_ctx) #else #define INM_IS_TEST_MIRROR(bio) 0 #define INM_SET_TEST_MIRROR(bio) #endif #define INIT_OSSPEC_DRV_CTX(driver_ctx) \ do{ \ INM_INIT_LIST_HEAD(&(driver_ctx->dc_at_lun.dc_at_lun_list)); \ 
INM_INIT_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn));\ }while(0) #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,10,0) || defined SET_INM_QUEUE_FLAG_STABLE_WRITE #define SET_STABLE_PAGES(q) blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q) #define CLEAR_STABLE_PAGES(q) blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q) #define TEST_STABLE_PAGES(q) blk_queue_stable_writes(q) #else #define SET_STABLE_PAGES(q) (INM_BDI_CAPABILITIES(q) |= BDI_CAP_STABLE_WRITES) #define CLEAR_STABLE_PAGES(q) (INM_BDI_CAPABILITIES(q) &= ~BDI_CAP_STABLE_WRITES) #define TEST_STABLE_PAGES(q) (INM_BDI_CAPABILITIES(q) & BDI_CAP_STABLE_WRITES) #endif #endif #ifndef INMAGE_PRODUCT_VERSION_MAJOR #define INMAGE_PRODUCT_VERSION_MAJOR 1 #endif #ifndef INMAGE_PRODUCT_VERSION_MINOR #define INMAGE_PRODUCT_VERSION_MINOR 0 #endif #ifndef INMAGE_PRODUCT_VERSION_BUILDNUM #define INMAGE_PRODUCT_VERSION_BUILDNUM 0 #endif #ifndef INMAGE_PRODUCT_VERSION_PRIVATE #define INMAGE_PRODUCT_VERSION_PRIVATE 1 #endif #endif /* _INM_FILTER_HOST_H*/ involflt-0.1.0/src/statechange.c0000755000000000000000000002474514467303177015317 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "filter_host.h" #include "statechange.h" extern driver_context_t *driver_ctx; #define SERVICE_STATE_CHAGNE_THREAD_TIMEOUT 1000 /* clock ticks */ inm_s32_t create_service_thread() { inm_pid_t pid; inm_s32_t err = 0; INM_INIT_COMPLETION(&driver_ctx->service_thread._completion); /* notificaions for new event */ INM_INIT_COMPLETION(&driver_ctx->service_thread._new_event_completion); driver_ctx->service_state = SERVICE_NOTSTARTED; INM_INIT_WAITQUEUE_HEAD(&driver_ctx->service_thread.wakeup_event); INM_INIT_WAITQUEUE_HEAD(&driver_ctx->service_thread.shutdown_event); INM_ATOMIC_SET(&driver_ctx->service_thread.wakeup_event_raised, 0); INM_ATOMIC_SET(&driver_ctx->service_thread.shutdown_event_raised, 0); #ifdef INM_LINUX pid = INM_KERNEL_THREAD(service_thread_task, service_state_change_thread, NULL, 0, "inmsvcd"); #else pid = INM_KERNEL_THREAD(service_state_change_thread, NULL, 0, "inmsvcd"); #endif if (pid >= 0) { info("kernel thread with pid = %d has created", pid); driver_ctx->service_thread.initialized = 1; } err = (driver_ctx->service_thread.initialized == 0) ? 
1 : 0; return err; } void destroy_service_thread() { if (driver_ctx->service_thread.initialized) { driver_ctx->flags |= DC_FLAGS_SERVICE_STATE_CHANGED; INM_ATOMIC_INC(&driver_ctx->service_thread.shutdown_event_raised); INM_WAKEUP(&driver_ctx->service_thread.shutdown_event); INM_COMPLETE(&driver_ctx->service_thread._new_event_completion); INM_WAIT_FOR_COMPLETION(&driver_ctx->service_thread._completion); INM_KTHREAD_STOP(service_thread_task); INM_DESTROY_COMPLETION(&driver_ctx->service_thread._completion); INM_DESTROY_COMPLETION(&driver_ctx->service_thread._new_event_completion); driver_ctx->service_thread.initialized = 0; } } inm_s32_t process_volume_state_change(target_context_t *tgt_ctxt, struct inm_list_head *lhptr, inm_s32_t statechanged) { inm_s32_t success = 1; inm_s32_t work_item_type = 0; inm_s32_t open_bitmap = FALSE; #define vcptr tgt_ctxt dbg("In process_volume_state_change() volume:%s\n",tgt_ctxt->tc_guid); if (tgt_ctxt->tc_dev_state == DEVICE_STATE_OFFLINE) { dbg("process_volume_state_change():: device marked OFFLINE \n"); INM_BUG_ON(1); success = add_vc_workitem_to_list(WITEM_TYPE_VOLUME_UNLOAD, tgt_ctxt, 0, 0, lhptr); dbg("Added target context work item ... \n"); remove_tc_from_dc(tgt_ctxt); return success; } #define __IS_FLAG_SET(_flag) (vcptr->tc_flags & (_flag)) if (__IS_FLAG_SET(VCF_FILTERING_STOPPED)) return success; /* CASE 1: bitmap file has to be opened */ if (__IS_FLAG_SET(VCF_OPEN_BITMAP_REQUESTED)) { success = add_vc_workitem_to_list(WITEM_TYPE_OPEN_BITMAP, vcptr, 0, FALSE, lhptr); if (!success) return success; } /* CASE 2: check for GUID */ if (__IS_FLAG_SET(VCF_GUID_OBTAINED)) { /* No GUID ?? then No bitmap */ return success; } #define _HIGH_WATER_MARK(_state) \ driver_ctx->tunable_params.db_high_water_marks[(_state)] #define _LOW_WATER_MARK \ driver_ctx->tunable_params.db_low_water_mark_while_service_running switch (driver_ctx->service_state) { case SERVICE_NOTSTARTED: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("service has not started \n"); } if (_HIGH_WATER_MARK(SERVICE_NOTSTARTED) && !__IS_FLAG_SET(VCF_BITMAP_WRITE_DISABLED) && (vcptr->tc_pending_changes >= _HIGH_WATER_MARK(SERVICE_NOTSTARTED))) { work_item_type = WITEM_TYPE_BITMAP_WRITE; open_bitmap = TRUE; } break; case SERVICE_RUNNING: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("SERVICE_RUNNING"); } /* RANGE 1: service state changed * LOW_WATER_MARK exists * change pending < LOW_WATER_MARK **/ if (statechanged && _LOW_WATER_MARK && !__IS_FLAG_SET(VCF_BITMAP_READ_DISABLED) && (vcptr->tc_pending_changes < _LOW_WATER_MARK)) { work_item_type = WITEM_TYPE_START_BITMAP_READ; open_bitmap = TRUE; break; } /* RANGE 2: HIGH_WATER_MARK exist * enabled BITMAP WRITE * Number of metadata changes >= HIGH_WATER_MARK changes **/ if (_HIGH_WATER_MARK(SERVICE_RUNNING) && !__IS_FLAG_SET(VCF_BITMAP_WRITE_DISABLED) && (vcptr->tc_pending_md_changes >= _HIGH_WATER_MARK(SERVICE_RUNNING))) { work_item_type = WITEM_TYPE_BITMAP_WRITE; open_bitmap = TRUE; break; } /* RANGE 3: LOW_WATER_MARK exist * enabled BITMAP READ * change pending < LOW_WATER_MARK **/ if (_LOW_WATER_MARK && !__IS_FLAG_SET(VCF_BITMAP_READ_DISABLED) && (vcptr->tc_pending_changes < _LOW_WATER_MARK)) { if (!vcptr->tc_bp->volume_bitmap) { work_item_type = WITEM_TYPE_START_BITMAP_READ; open_bitmap = TRUE; } else { switch (vcptr->tc_bp->volume_bitmap->eVBitmapState) { case ecVBitmapStateOpened: case ecVBitmapStateAddingChanges: work_item_type = WITEM_TYPE_START_BITMAP_READ; open_bitmap = FALSE; break; case 
ecVBitmapStateReadPaused: work_item_type = WITEM_TYPE_CONTINUE_BITMAP_READ; open_bitmap = FALSE; default: break; } } } break; case SERVICE_SHUTDOWN: /* service is shutdown, move all pending changes to bitmap */ if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("SERVICE_SHUTDOWN "); } if (statechanged) { if (vcptr->tc_cur_wostate != ecWriteOrderStateRawBitmap) set_tgt_ctxt_wostate(vcptr, ecWriteOrderStateBitmap, TRUE, eCWOSChangeReasonServiceShutdown); if (!__IS_FLAG_SET(VCF_BITMAP_WRITE_DISABLED)) { work_item_type = WITEM_TYPE_BITMAP_WRITE; open_bitmap = TRUE; } break; } if (_HIGH_WATER_MARK(SERVICE_SHUTDOWN) && !__IS_FLAG_SET(VCF_BITMAP_WRITE_DISABLED) && (vcptr->tc_pending_changes >= _HIGH_WATER_MARK(SERVICE_SHUTDOWN))) { work_item_type = WITEM_TYPE_BITMAP_WRITE; open_bitmap = TRUE; break; } break; default: INM_BUG_ON(0); break; } if (work_item_type != WITEM_TYPE_UNINITIALIZED) { success = add_vc_workitem_to_list(work_item_type, vcptr, 0, open_bitmap, lhptr); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("work item queued "); } } return success; } int service_state_change_thread(void *context) { long timeout_val = SERVICE_STATE_CHAGNE_THREAD_TIMEOUT; struct inm_list_head *ptr = NULL, *nextptr = NULL; target_context_t *tgt_ctxt = NULL; inm_s32_t service_state_changed = 0; struct inm_list_head workq_list_head; wqentry_t *wqeptr = NULL; inm_s32_t shutdown_event, wakeup_event; INM_DAEMONIZE("inmsvcd"); INM_INIT_LIST_HEAD(&workq_list_head); while (1) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("waiting for new event completion in service thread \n"); } INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(&driver_ctx->service_thread._new_event_completion); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("service thread received new event completion notification \n"); } wakeup_event = INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT( driver_ctx->service_thread.wakeup_event, INM_ATOMIC_READ(&driver_ctx->service_thread.wakeup_event_raised), timeout_val); shutdown_event = !wakeup_event && INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT( driver_ctx->service_thread.shutdown_event, INM_ATOMIC_READ(&driver_ctx->service_thread.shutdown_event_raised), timeout_val); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("looping in service thread\n"); } if (shutdown_event) { INM_ATOMIC_DEC(&driver_ctx->service_thread.shutdown_event_raised); info("shutdown event is received by service thread"); break; } if (!wakeup_event) { info(" wakeup event is not received "); continue; } INM_ATOMIC_DEC(&driver_ctx->service_thread.wakeup_event_raised); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("received wakeup event in service thread\n"); } if (driver_ctx->sys_shutdown) { wqentry_t *wqe = NULL; wqe = alloc_work_queue_entry(INM_KM_SLEEP); wqe->witem_type = WITEM_TYPE_SYSTEM_SHUTDOWN; add_item_to_work_queue(&driver_ctx->wqueue, wqe); } INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); service_state_changed = (driver_ctx->flags & DC_FLAGS_SERVICE_STATE_CHANGED); if (service_state_changed) { driver_ctx->flags &= ~DC_FLAGS_SERVICE_STATE_CHANGED; } inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if(tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ tgt_ctxt = NULL; continue; } volume_lock(tgt_ctxt); if (tgt_ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP) { dbg("call process_volume_state_change ctx %p\n", tgt_ctxt); 
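                        /*
                         * Work items produced by the call below are only
                         * queued on workq_list_head here; they are executed
                         * further down, after tgt_list_sem and the volume
                         * lock have been released.
                         */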
process_volume_state_change(tgt_ctxt, &workq_list_head, service_state_changed); } volume_unlock(tgt_ctxt); } INM_UP_WRITE(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, &workq_list_head) { wqeptr = inm_list_entry(ptr, wqentry_t, list_entry); inm_list_del(&wqeptr->list_entry); process_vcontext_work_items(wqeptr); put_work_queue_entry(wqeptr); } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("received shutdown event in service thread\n"); } INM_COMPLETE_AND_EXIT(&driver_ctx->service_thread._completion, 0); return 0; } involflt-0.1.0/src/change-node.c0000755000000000000000000022150614467303177015173 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : change-node.c * * Description: change node implementation. */ #include "involflt-common.h" #include "involflt.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "change-node.h" #include "tunable_params.h" #include "driver-context.h" #include "file-io.h" #include "utils.h" #include "involflt_debug.h" #include "metadata-mode.h" #include "db_routines.h" #include "telemetry-types.h" #include "telemetry.h" #include "verifier.h" extern void finalize_data_stream(change_node_t *); extern driver_context_t *driver_ctx; int print_dblk_filename(change_node_t *chg_node); void print_change_node_off_length(change_node_t *chg_node); inm_s32_t verify_change_node(change_node_t *chg_node); /* * change_node_drain_barrier_set * * Marks a change node as drain barrier. 
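 * A drain-barrier node is withheld from the drainer: the get_db path
 * (get_oldest_change_node*()) returns INM_EAGAIN for the volume until
 * commit_usertag() clears the barrier or revoke_usertag() drops the
 * node altogether.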
*/
static inline void change_node_drain_barrier_set(target_context_t *ctxt,
                                                change_node_t *chg_node)
{
        INM_BUG_ON(ctxt->tc_cur_wostate != ecWriteOrderStateData);
        chg_node->flags |= CHANGE_NODE_DRAIN_BARRIER;
}

/*
 * change_node_drain_barrier_clear
 *
 * Remove the barrier flag from the change node
 */
static inline void change_node_drain_barrier_clear(change_node_t *chg_node)
{
        INM_BUG_ON(!IS_CHANGE_NODE_DRAIN_BARRIER(chg_node));
        chg_node->flags &= ~CHANGE_NODE_DRAIN_BARRIER;
}

/*
 * commit_usertag
 *
 * Commit the tag for one volume by removing the drain barrier on the
 * tag node
 */
inm_s32_t commit_usertag(target_context_t *ctxt)
{
        inm_list_head_t *ptr = NULL;
        inm_list_head_t *next = NULL;
        change_node_t *chg_node = NULL;
        int commit = 0;

        volume_lock(ctxt);
        inm_list_for_each_safe(ptr, next, &ctxt->tc_node_head) {
                chg_node = inm_list_entry(ptr, change_node_t, next);
                if (chg_node->type == NODE_SRC_TAGS &&
                        IS_CHANGE_NODE_DRAIN_BARRIER(chg_node)) {
                        dbg("Commit CN %p", chg_node);
                        change_node_drain_barrier_clear(chg_node);
                        commit = 1;
                        break;
                }
        }
        volume_unlock(ctxt);

        if (!commit) {
                err("Could not find change node to commit");
                return -1;
        } else {
                dbg("%p committed", chg_node);
                return 0;
        }
}

/*
 * revoke_usertag
 *
 * Revoke the tag for one volume by removing the change node from the
 * db list
 */
void revoke_usertag(target_context_t *ctxt, int timedout)
{
        inm_list_head_t *ptr = NULL;
        inm_list_head_t *next = NULL;
        change_node_t *chg_node = NULL;
        change_node_t *revoke = NULL;

        volume_lock(ctxt);
        inm_list_for_each_safe(ptr, next, &ctxt->tc_node_head) {
                chg_node = inm_list_entry(ptr, change_node_t, next);
                if (chg_node->type == NODE_SRC_TAGS &&
                        IS_CHANGE_NODE_DRAIN_BARRIER(chg_node)) {
                        dbg("revoke tag %p", chg_node);
                        inm_list_del(&chg_node->next);
                        change_node_drain_barrier_clear(chg_node);
                        /*
                         * Since this is only supported on async tags,
                         * there is no need to update the tag status in
                         * tag_guid
                         */
                        revoke = chg_node;
                        break;
                }
        }
        volume_unlock(ctxt);

        if (revoke) {
                dbg("%p revoked", chg_node);
                telemetry_log_tag_history(revoke, ctxt, ecTagStatusRevoked,
                        timedout ?
ecRevokeTimeOut : ecRevokeCommitIOCTL, ecMsgTagRevoked); commit_change_node(chg_node); } else { err("Could not find change node to revoke"); } } inm_s32_t init_change_node(change_node_t *node, inm_s32_t from_pool, inm_s32_t flag, inm_wdata_t *wdatap) { inm_page_t *pgp = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } INM_ATOMIC_SET(&node->ref_cnt, 0); node->type = NODE_SRC_UNDEFINED; node->wostate = ecWriteOrderStateUnInitialized; node->flags = 0; node->transaction_id = 0; #ifdef INM_AIX if(!node->mutext_initialized) INM_INIT_SEM(&node->mutex); #else INM_INIT_SEM(&node->mutex); #endif INM_INIT_LIST_HEAD(&node->next); INM_INIT_LIST_HEAD(&node->nwo_dmode_next); INM_INIT_LIST_HEAD(&node->data_pg_head); node->cur_data_pg = NULL; node->cur_data_pg_off = -1; node->data_free = 0; node->mapped_address = 0; node->mapped_thread = NULL; node->data_file_name = NULL; node->data_file_size = 0; node->stream_len = 0; FILL_STREAM_HEADER(&node->changes.start_ts, STREAM_REC_TYPE_TIME_STAMP_TAG, sizeof(TIME_STAMP_TAG_V2)); FILL_STREAM_HEADER(&node->changes.end_ts, STREAM_REC_TYPE_TIME_STAMP_TAG, sizeof(TIME_STAMP_TAG_V2)); get_time_stamp_tag(&node->changes.start_ts); /* keeping end time stamp, same as the start time stamp */ node->changes.end_ts.ullSequenceNumber = node->changes.start_ts.ullSequenceNumber; node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 = node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; INM_INIT_LIST_HEAD(&node->changes.md_pg_list); pgp = get_page_from_page_pool(from_pool, flag, wdatap); if (!pgp) { err("Failed to get the metadata page from the pool"); return 0; } inm_list_add_tail(&pgp->entry, &node->changes.md_pg_list); node->changes.cur_md_pgp = pgp->cur_pg; node->changes.num_data_pgs = 0; node->changes.bytes_changes = 0; node->changes.change_idx = 0; node->vcptr = NULL; node->seq_id_for_split_io = 1; INM_ATOMIC_INC(&driver_ctx->stats.pending_chg_nodes); node->tag_guid = NULL; node->cn_hist = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return 1; } change_node_t * inm_alloc_change_node(inm_wdata_t *wdatap, unsigned flags) { change_node_t *node = NULL; #ifndef INM_AIX if(wdatap && wdatap->wd_chg_node){ node = wdatap->wd_chg_node; wdatap->wd_chg_node = (change_node_t *) node->next.next; node->next.next = NULL; }else{ node = INM_KMALLOC(sizeof(change_node_t), flags, INM_KERNEL_HEAP); if (node) INM_MEM_ZERO(node, sizeof(change_node_t)); } #else if(!wdatap || !wdatap->wd_chg_node){ node = INM_KMALLOC(sizeof(change_node_t), flags, INM_KERNEL_HEAP); if(node){ if(INM_PIN(node, sizeof(change_node_t))){ INM_KFREE(node, sizeof(change_node_t), INM_KERNEL_HEAP); node = NULL; }else node->mutext_initialized = 0; } }else{ node = wdatap->wd_chg_node; wdatap->wd_chg_node = NULL; } #endif return node; } void inm_free_change_node(change_node_t *node) { #ifndef INM_AIX #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if (node->flags & CHANGE_NODE_ALLOCED_FROM_POOL) { unsigned long lock_flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); INM_ATOMIC_INC(&driver_ctx->dc_nr_chdnodes_alloced); inm_list_add_tail(&node->next, &driver_ctx->dc_chdnodes_list); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } else INM_KFREE(node, sizeof(change_node_t), INM_KERNEL_HEAP); #else INM_KFREE(node, sizeof(change_node_t), INM_KERNEL_HEAP); #endif #else INM_UNPIN(node, sizeof(change_node_t)); INM_KFREE(node, sizeof(change_node_t), INM_KERNEL_HEAP); #endif } 
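
/*
 * Allocation/teardown sketch (illustrative only, mirroring the callers
 * below): inm_alloc_change_node() returns an uninitialized node (fresh
 * or recycled), so it must be paired with init_change_node() and
 * released with inm_free_change_node() on the failure path:
 *
 *      change_node_t *node = inm_alloc_change_node(wdatap, INM_KM_NOSLEEP);
 *      if (node && !init_change_node(node, 1, INM_KM_NOSLEEP, wdatap)) {
 *              inm_free_change_node(node);
 *              node = NULL;
 *      }
 */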
change_node_t *get_change_node_to_update(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, inm_tsdelta_t *ts_delta) { change_node_t *node = tgt_ctxt->tc_cur_node; change_node_t *recent_cnode = NULL; struct inm_list_head *ptr; int perf_changes = 1; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) int is_barrier_on = 0; #endif if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } /* * Check if the current change node is matching with the filter mode. * If not create a new change node and set its type corresponding to * the filter mode. However if the change_idx of the node is 0 it means * we have a freshly allocated node and it can used, no need for * allocation. At present we have no limit on the number of changes in * data-mode change node while the meta-data-mode node is limited by * MAX_CHANGE_INFOS_PER_PAGE. */ #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if ((INM_ATOMIC_READ(&driver_ctx->is_iobarrier_on)) ) { is_barrier_on = 1; perf_changes = 0; if(!(tgt_ctxt->tc_flags & VCF_IO_BARRIER_ON)) { node = NULL; tgt_ctxt->tc_flags |= VCF_IO_BARRIER_ON; tgt_ctxt->tc_cur_node = NULL; goto alloc_chg_node; } } #endif switch(tgt_ctxt->tc_cur_mode) { case FLT_MODE_DATA: if(node){ if(node->type != NODE_SRC_DATA) node = (!node->changes.change_idx) ? (node->type = NODE_SRC_DATA, node) : NULL; else if(node->wostate != tgt_ctxt->tc_cur_wostate) node = (!node->changes.change_idx) ? node : NULL; if(node){ inm_s32_t chg_sz = (sv_chg_sz + wdatap->wd_cplen); if((node->stream_len + sv_const_sz + chg_sz) > driver_ctx->tunable_params.max_data_sz_dm_cn) { tgt_ctxt->tc_cur_node = NULL; node = NULL; goto alloc_chg_node; } node->wostate = tgt_ctxt->tc_cur_wostate; } } break; default: if (!node) { break; } if (node->type == NODE_SRC_DATA) node = (!node->changes.change_idx) ? (node->type = NODE_SRC_METADATA, node) : NULL; else if(node->wostate != tgt_ctxt->tc_cur_wostate) node = (!node->changes.change_idx) ? node : NULL; if(!node) break; node->wostate = tgt_ctxt->tc_cur_wostate; if (node->changes.change_idx == (MAX_CHANGE_INFOS_PER_PAGE)) { node = NULL; } break; } if(!node) goto alloc_chg_node; /* deltas will bump up the global seq no. 
by one per io */ inm_get_ts_and_seqno_deltas(node, ts_delta); if (ts_delta->td_oflow) { tgt_ctxt->tc_cur_node = NULL; node = NULL; goto alloc_chg_node; } if (node && node->changes.change_idx > 0 && (node->changes.change_idx % (MAX_CHANGE_INFOS_PER_PAGE)) == 0) { /* md page is full, and allocate new one */ inm_page_t *pgp = NULL; pgp = get_page_from_page_pool(1, INM_KM_NOSLEEP, wdatap); if (!pgp) { return NULL; } inm_list_add_tail(&pgp->entry, &node->changes.md_pg_list); node->changes.cur_md_pgp = pgp->cur_pg; tgt_ctxt->tc_cnode_pgs++; } alloc_chg_node: /* There exist no matching change node to update or * current change node is filled up */ if (!node) { /* Now see if we can put the oldest data mode change node * without write order on a separate list */ if ((tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && !inm_list_empty(&tgt_ctxt->tc_node_head) && perf_changes == 1) { ptr = tgt_ctxt->tc_node_head.prev; recent_cnode = (change_node_t *)inm_list_entry(ptr, change_node_t, next); do_perf_changes(tgt_ctxt, recent_cnode, IN_IO_PATH); } node = inm_alloc_change_node(wdatap, INM_KM_NOSLEEP); if (!node) return NULL; if(!init_change_node(node, 1, INM_KM_NOSLEEP, wdatap)) { inm_free_change_node(node); return NULL; } /* change node belongs to either data mode or * meta data mode dirty block * but not combination of both the modes */ switch(tgt_ctxt->tc_cur_mode) { case FLT_MODE_DATA: node->type = NODE_SRC_DATA; break; default: node->type = NODE_SRC_METADATA; break; } node->wostate = tgt_ctxt->tc_cur_wostate; ref_chg_node(node); node->transaction_id = 0; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if (is_barrier_on) { inm_list_add_tail(&node->next, &tgt_ctxt->tc_non_drainable_node_head); } else { if(!inm_list_empty(&tgt_ctxt->tc_non_drainable_node_head)) { do_perf_changes_all(tgt_ctxt, IN_IO_PATH); tgt_ctxt->tc_flags &= ~VCF_IO_BARRIER_ON; inm_list_splice_at_tail(&tgt_ctxt->tc_non_drainable_node_head, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_non_drainable_node_head); } inm_list_add_tail(&node->next, &tgt_ctxt->tc_node_head); } #else inm_list_add_tail(&node->next, &tgt_ctxt->tc_node_head); #endif tgt_ctxt->tc_nr_cns++; tgt_ctxt->tc_cur_node = node; node->vcptr = tgt_ctxt; tgt_ctxt->tc_cnode_pgs++; inm_get_ts_and_seqno_deltas(node, ts_delta); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return node; } void do_perf_changes(target_context_t *tgt_ctxt, change_node_t *recent_cnode, int path) { if (recent_cnode && recent_cnode->type == NODE_SRC_DATA && recent_cnode->wostate != ecWriteOrderStateData && recent_cnode != tgt_ctxt->tc_pending_confirm && inm_list_empty(&recent_cnode->nwo_dmode_next)) { close_change_node(recent_cnode, path); inm_list_add_tail(&recent_cnode->nwo_dmode_next, &tgt_ctxt->tc_nwo_dmode_list); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending chg:%p to tgt_ctxt:%p next:%p prev:%p " "mdoe:%d", recent_cnode,tgt_ctxt, recent_cnode->nwo_dmode_next.next, recent_cnode->nwo_dmode_next.prev, recent_cnode->type); } } } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) void do_perf_changes_all(target_context_t *tgt_ctxt, int path) { inm_list_head_t *ptr = NULL, *nextptr = NULL; change_node_t *cnode = NULL; inm_list_for_each_safe(ptr, nextptr, &tgt_ctxt->tc_non_drainable_node_head) { cnode = inm_list_entry(ptr, change_node_t, next); if ((tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && 
tgt_ctxt->tc_cur_node != cnode) { do_perf_changes(tgt_ctxt, cnode, path); } } } void move_chg_nodes_to_drainable_queue(void) { inm_list_head_t *ptr = NULL, *nextptr = NULL; target_context_t *tgt_ctxt = NULL; INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); volume_lock(tgt_ctxt); if (tgt_ctxt->tc_flags & VCF_IO_BARRIER_ON) { tgt_ctxt->tc_flags &= ~VCF_IO_BARRIER_ON; if(!inm_list_empty(&tgt_ctxt->tc_non_drainable_node_head)) { do_perf_changes_all(tgt_ctxt, IN_IOCTL_PATH); inm_list_splice_at_tail(&tgt_ctxt->tc_non_drainable_node_head, &tgt_ctxt->tc_node_head); INM_INIT_LIST_HEAD(&tgt_ctxt->tc_non_drainable_node_head); } } volume_unlock(tgt_ctxt); } INM_UP_WRITE(&driver_ctx->tgt_list_sem); } #endif void close_change_node(change_node_t *chg_node, inm_u32_t path) { target_context_t *tgt_ctxt; inm_s32_t i, md_idx, count = 0; disk_chg_t *chg = NULL; inm_page_t *pgp = NULL; struct inm_list_head *ptr; if (!chg_node) return; tgt_ctxt = chg_node->vcptr; INM_BUG_ON(chg_node->flags & CHANGE_NODE_IN_NWO_CLOSED); INM_BUG_ON(!tgt_ctxt); INM_BUG_ON(!(tgt_ctxt->tc_optimize_performance & (PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO))); /* no need to close metadata change node until it gets drained */ if (path != IN_GET_DB_PATH && chg_node->type == NODE_SRC_METADATA) { return; } /* * Performance effort - * Always drain data mode change node if any except the active one * Otherwise metadata change node with tweaked start,end TS & deltas * As we are bumping up the start seq number of a metadata * change node while draining, it may clash with the active change * nodes start sequence number, so then we would see OOD issue * Solution - Close non write order data mode change always with * current start and end time with zero deltas * PS - Change node with write order is always closed, so this fn * should not be called for it. 
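 * In short (see the switch below): data-mode nodes are restamped with
 * the current time, metadata nodes additionally get their per-change
 * time/sequence deltas zeroed, and tag nodes need no adjustment.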
*/ if (chg_node->wostate == ecWriteOrderStateData) return; if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Closing chg:%p to tgt_ctxt:%p next:%p prev:%p mode:%d " "path:%d", chg_node, tgt_ctxt, chg_node->nwo_dmode_next.next, chg_node->nwo_dmode_next.prev, chg_node->type, path); } chg_node->flags |= CHANGE_NODE_IN_NWO_CLOSED; switch (chg_node->type) { case NODE_SRC_DATA: /* Apply global time stamps in non write order mode */ get_time_stamp_tag(&chg_node->changes.start_ts); memcpy_s(&chg_node->changes.end_ts, sizeof(TIME_STAMP_TAG_V2), &chg_node->changes.start_ts, sizeof(TIME_STAMP_TAG_V2)); break; case NODE_SRC_METADATA: /* set start and end time stamp to current timestamp */ get_time_stamp_tag(&chg_node->changes.start_ts); chg_node->changes.start_ts.ullSequenceNumber = tgt_ctxt->tc_PrevEndSequenceNumber + 1; /* split IOs in metadata mode no meaning and atomicity * could be * lost if partial split IOs make it to * bitmap */ if (chg_node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK) { chg_node->flags &= ~(KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK); chg_node->seq_id_for_split_io = 1; } memcpy_s(&chg_node->changes.end_ts, sizeof(TIME_STAMP_TAG_V2), &chg_node->changes.start_ts, sizeof(TIME_STAMP_TAG_V2)); /* Tweak the time stamp and seq deltas in all * changes */ __inm_list_for_each(ptr, &chg_node->changes.md_pg_list) { pgp = inm_list_entry(ptr, inm_page_t, entry); i = 0; while (i < MAX_CHANGE_INFOS_PER_PAGE) { md_idx = count % (MAX_CHANGE_INFOS_PER_PAGE); chg = (disk_chg_t *) ((char *)pgp->cur_pg + (sizeof(disk_chg_t) * md_idx)); chg->seqno_delta = 0; chg->time_delta = 0; i++; count++; if (count == chg_node->changes.change_idx) { break; } } } break; case NODE_SRC_TAGS: /* control may come here for tag tracked with page i * alloc failure */ break; default: err("Invalid change node type while closing a change " "node:%p", chg_node); INM_BUG_ON(1); break; } } /* Caller is responsible to hold the target context lock. 
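 * Returns the pending tc_pending_confirm node if a previous get_db was
 * never committed, NULL with *status = INM_EAGAIN if the oldest node is
 * a drain barrier, or else the oldest node with a transaction id
 * assigned and a reference taken on the target context.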
*/ change_node_t *get_oldest_change_node(target_context_t *tgt_ctxt, inm_s32_t *status) { struct inm_list_head *ptr; change_node_t *chg_node; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } tgt_ctxt->tc_tel.tt_getdb++; if(INM_UNLIKELY(tgt_ctxt->tc_pending_confirm || inm_list_empty(&tgt_ctxt->tc_node_head))) { if (INM_UNLIKELY(tgt_ctxt->tc_pending_confirm)) tgt_ctxt->tc_tel.tt_revertdb++; return tgt_ctxt->tc_pending_confirm; } ptr = tgt_ctxt->tc_node_head.next; chg_node = (change_node_t *)inm_list_entry(ptr, change_node_t, next); if (chg_node == tgt_ctxt->tc_cur_node) { tgt_ctxt->tc_cur_node = NULL; } if (chg_node) { if (IS_CHANGE_NODE_DRAIN_BARRIER(chg_node)) { dbg("Drain Barrier, returning EAGAIN"); tgt_ctxt->tc_flags |= VCF_DRAIN_BARRIER; *status = INM_EAGAIN; return NULL; } else { if (!chg_node->transaction_id) { chg_node->transaction_id = ++tgt_ctxt->tc_transaction_id; } tgt_ctxt->tc_pending_confirm = chg_node; tgt_ctxt->tc_flags &= ~VCF_DRAIN_BARRIER; get_tgt_ctxt(tgt_ctxt); } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return chg_node; } /** * FUNCTION NAME: get_oldest_change_node_pref_datamode * * DESCRIPTION : Get the data mode change first then metadata changes * Preferably get non write order data mode change or * get metadata change node by tweaking the time stamps * and sequence number * * return value : change_node_t * - for success * NULL - for failure **/ change_node_t * get_oldest_change_node_pref_datamode(target_context_t *tgt_ctxt, inm_s32_t *status) { struct inm_list_head *ptr; change_node_t *chg_node = NULL; struct inm_list_head *oldest_ptr = NULL; #if (defined(IDEBUG) || defined(IDEBUG_META)) info("entered"); #endif tgt_ctxt->tc_tel.tt_getdb++; /* * In case, change node was already mapped to user space and for some * reason agent did not commit, died then came back for get db, * give it tc_pending_confirm change node OR * if change node list is empty then tc_pending_confirm would be NULL * i.e. tso file case */ if (INM_UNLIKELY(tgt_ctxt->tc_pending_confirm || inm_list_empty(&tgt_ctxt->tc_node_head))) { if (INM_UNLIKELY(tgt_ctxt->tc_pending_confirm)) tgt_ctxt->tc_tel.tt_revertdb++; return tgt_ctxt->tc_pending_confirm; } /* Walk through real change node list to drain write order changes as * it is */ if (!inm_list_empty(&tgt_ctxt->tc_node_head)) { oldest_ptr = tgt_ctxt->tc_node_head.next; chg_node = (change_node_t *)inm_list_entry(oldest_ptr, change_node_t, next); if (chg_node) { if (chg_node->wostate == ecWriteOrderStateData) { if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Drain wo chgnode from tc_node_head " "chg_node:%p tc_cur_node:%p", chg_node, tgt_ctxt->tc_cur_node); } INM_BUG_ON(chg_node->type == NODE_SRC_METADATA); } else { if (chg_node->type == NODE_SRC_DATA || chg_node->type == NODE_SRC_DATAFILE || /*Tag page alloc failure*/ chg_node->type == NODE_SRC_TAGS) { if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Drain chg node from tc_node_head" " chg_node:%p mode:%d" " tc_cur_node:%p", chg_node, chg_node->type, tgt_ctxt->tc_cur_node); } } else { chg_node = NULL; } } } } /* * Currently we support drain barrier only for wostate=DATA * so there is no need to check nwo_list below for drain barrier * change node. We can check and return here. 
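 * (change_node_drain_barrier_set() insists on ecWriteOrderStateData, so
 * a barrier node can never surface on the nwo/metadata paths below.)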
*/ if (chg_node) { if (IS_CHANGE_NODE_DRAIN_BARRIER(chg_node)) { dbg("Drain Barrier, returning EAGAIN"); tgt_ctxt->tc_flags |= VCF_DRAIN_BARRIER; *status = INM_EAGAIN; return NULL; } } else { /* Haven't found the candidate for draining yet ? * Look up the nwo data mode list or metadata change nodes */ /* Walk though the non write order mode data mode change node * list */ if (!inm_list_empty(&tgt_ctxt->tc_nwo_dmode_list)) { ptr = tgt_ctxt->tc_nwo_dmode_list.next; chg_node = (change_node_t *)inm_list_entry(ptr, change_node_t, nwo_dmode_next); INM_BUG_ON(chg_node->type != NODE_SRC_DATA && chg_node->type != NODE_SRC_DATAFILE); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Draining chgnode from nwo list chg_node:" "%p next:%p prev:%p mode:%d", chg_node, chg_node->nwo_dmode_next.next, chg_node->nwo_dmode_next.prev, chg_node->type); } } if (!chg_node) { /* * Always fabricate transaction id, start_ts and end_ts * of a meta data change node as we are not modifying * them for non write order data mode change node list */ chg_node = (change_node_t *)inm_list_entry(oldest_ptr, change_node_t, next); if (!chg_node) { err("Empty change node list tc_mode:%d", tgt_ctxt->tc_cur_wostate); goto ret; } INM_BUG_ON(chg_node->type != NODE_SRC_METADATA); if (chg_node != tgt_ctxt->tc_cur_node && !(chg_node->flags & CHANGE_NODE_IN_NWO_CLOSED)) { close_change_node(chg_node, IN_GET_DB_PATH); } if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { dbg("Draining change node in chg_node:%p mode:" "%d delta ts:%llu seq:%llu" "tc_cur_node:%p", chg_node,chg_node->type, chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 - chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.end_ts.ullSequenceNumber - chg_node->changes.start_ts.ullSequenceNumber, tgt_ctxt->tc_cur_node); } } } INM_BUG_ON(!chg_node); if (chg_node == tgt_ctxt->tc_cur_node) { close_change_node(chg_node, IN_GET_DB_PATH); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) { INM_BUG_ON(!inm_list_empty(&chg_node->nwo_dmode_next)); if (chg_node->type == NODE_SRC_DATA && chg_node->wostate != ecWriteOrderStateData) { inm_list_add_tail(&chg_node->nwo_dmode_next, &tgt_ctxt->tc_nwo_dmode_list); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending tc_cur_node chg:%p to " "tgt_ctxt:%p next:%p prev:%p mode:%d", chg_node,tgt_ctxt, chg_node->nwo_dmode_next.next, chg_node->nwo_dmode_next.prev, chg_node->type); } } } tgt_ctxt->tc_cur_node = NULL; } tgt_ctxt->tc_pending_confirm = chg_node; tgt_ctxt->tc_flags &= ~VCF_DRAIN_BARRIER; get_tgt_ctxt(tgt_ctxt); if (!chg_node->transaction_id) { chg_node->transaction_id = ++tgt_ctxt->tc_transaction_id; } ret: #if (defined(IDEBUG) || defined(IDEBUG_META)) info("leaving ret:%p", chg_node); #endif return chg_node; } /* Assumes target context lock is held. 
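 * Allocates a fresh NODE_SRC_TAGS node for a user tag. With
 * commit_pending set, the node is also marked as a drain barrier so it
 * is withheld from the drainer until explicitly committed or revoked.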
*/ change_node_t *get_change_node_for_usertag(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, int commit_pending) { change_node_t *chg_node = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } if (commit_pending && tgt_ctxt->tc_cur_wostate != ecWriteOrderStateData) { err("Tagging without write order state data"); return NULL; } if (tgt_ctxt->tc_cur_node && (tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO)) { INM_BUG_ON(!inm_list_empty(&tgt_ctxt->tc_cur_node->nwo_dmode_next)); if (tgt_ctxt->tc_cur_node->type == NODE_SRC_DATA && tgt_ctxt->tc_cur_node->wostate != ecWriteOrderStateData) { close_change_node(tgt_ctxt->tc_cur_node, IN_IOCTL_PATH); inm_list_add_tail(&tgt_ctxt->tc_cur_node->nwo_dmode_next, &tgt_ctxt->tc_nwo_dmode_list); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending chg:%p to tgt_ctxt:%p next:%p " "prev:%p mode:%d", tgt_ctxt->tc_cur_node,tgt_ctxt, tgt_ctxt->tc_cur_node->nwo_dmode_next.next, tgt_ctxt->tc_cur_node->nwo_dmode_next.prev, tgt_ctxt->tc_cur_node->type); } } } chg_node = inm_alloc_change_node(wdatap, INM_KM_NOSLEEP); if (!chg_node) return NULL; if(!init_change_node(chg_node, 1, INM_KM_NOSLEEP, wdatap)) { inm_free_change_node(chg_node); return NULL; } if (commit_pending) { tgt_ctxt->tc_flags |= VCF_TAG_COMMIT_PENDING; change_node_drain_barrier_set(tgt_ctxt, chg_node); dbg("CN %p commit pending", chg_node); } chg_node->type = NODE_SRC_TAGS; chg_node->wostate = tgt_ctxt->tc_cur_wostate; ref_chg_node(chg_node); chg_node->transaction_id = 0; ++tgt_ctxt->tc_nr_cns; inm_list_add_tail(&chg_node->next, &tgt_ctxt->tc_node_head); chg_node->vcptr = tgt_ctxt; tgt_ctxt->tc_cnode_pgs++; /* Set cur_node in target context to NULL, so new changes would result * in allocation of new change node. 
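 * The tag then forms a clean boundary: writes arriving after it can
 * never be merged into the tagged node.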
*/ tgt_ctxt->tc_cur_node = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return chg_node; } void inm_free_metapage(inm_page_t *pgp) { #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if (pgp->flags & METAPAGE_ALLOCED_FROM_POOL) { unsigned long lock_flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); INM_ATOMIC_INC(&driver_ctx->dc_nr_metapages_alloced); inm_list_add_tail(&pgp->entry, &driver_ctx->page_pool); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } #else INM_UNPIN(pgp->cur_pg, INM_PAGESZ); INM_FREE_PAGE(pgp->cur_pg, INM_KERNEL_HEAP); pgp->cur_pg = NULL; INM_UNPIN(pgp, sizeof(inm_page_t)); INM_KFREE(pgp, sizeof(inm_page_t), INM_KERNEL_HEAP); #endif } void cleanup_change_node(change_node_t *chg_node) { struct inm_list_head *curp = NULL, *nxtp = NULL; inm_page_t *pgp = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } /* If change node on non write order data mode list, then * remove it from that list */ if (!inm_list_empty(&chg_node->nwo_dmode_next)) { inm_list_del_init(&chg_node->nwo_dmode_next); } chg_node->changes.cur_md_pgp = NULL; inm_list_for_each_safe(curp, nxtp, &chg_node->changes.md_pg_list) { inm_list_del(curp); pgp = inm_list_entry(curp, inm_page_t, entry); inm_free_metapage(pgp); chg_node->vcptr->tc_cnode_pgs--; } INM_DESTROY_SEM(&chg_node->mutex); inm_free_change_node(chg_node); INM_ATOMIC_DEC(&driver_ctx->stats.pending_chg_nodes); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } void cleanup_change_nodes(struct inm_list_head *hd, etTagStateTriggerReason reason) { struct inm_list_head *ptr, *cur; change_node_t *chg_node; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } for( ptr = hd->next; ptr != hd; ) { cur = ptr; ptr = ptr->next; inm_list_del(cur); chg_node = inm_list_entry(cur, change_node_t, next); if (chg_node->type == NODE_SRC_TAGS) telemetry_log_tag_history(chg_node, chg_node->vcptr, ecTagStatusDropped, reason, ecMsgTagDropped); if (chg_node->tag_guid) { chg_node->tag_guid->status[chg_node->tag_status_idx] = STATUS_DELETED; INM_WAKEUP_INTERRUPTIBLE(&chg_node->tag_guid->wq); chg_node->tag_guid = NULL; } commit_change_node(chg_node); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } void free_changenode_list(target_context_t *ctxt, etTagStateTriggerReason reason) { struct inm_list_head *clist = NULL, *ptr = NULL, *nextptr = NULL; change_node_t *cnode = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } clist = &ctxt->tc_node_head; ctxt->tc_pending_confirm = NULL; inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_node_head) { cnode = inm_list_entry(ptr, change_node_t, next); inm_list_del(ptr); switch(cnode->type) { case NODE_SRC_DATAFILE: case NODE_SRC_DATA: case NODE_SRC_METADATA: /* If change node on non write order data mode list, then * remove it from that list */ if (!inm_list_empty(&cnode->nwo_dmode_next)) { inm_list_del_init(&cnode->nwo_dmode_next); } INM_BUG_ON(ctxt->tc_pending_changes < (cnode->changes.change_idx)); ctxt->tc_pending_changes -= (cnode->changes.change_idx); if (cnode->type == NODE_SRC_METADATA) { ctxt->tc_pending_md_changes -= (cnode->changes.change_idx); ctxt->tc_bytes_pending_md_changes -= (cnode->changes.bytes_changes); } ctxt->tc_bytes_pending_changes -= cnode->changes.bytes_changes; subtract_changes_from_pending_changes(ctxt, 
cnode->wostate, cnode->changes.change_idx); dbg("queuing changenode cleanup worker routine for cnode %p\n", cnode); queue_changenode_cleanup_worker_routine(cnode, reason); break; case NODE_SRC_TAGS: queue_changenode_cleanup_worker_routine(cnode, reason); break; default: break; } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } void changenode_cleanup_routine(wqentry_t *wqe) { change_node_t *cnode = NULL; etTagStateTriggerReason reason; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!wqe) return; cnode = (change_node_t *)wqe->context; if (cnode->type == NODE_SRC_TAGS) { reason = (etTagStateTriggerReason)wqe->extra1; telemetry_log_tag_history(cnode, cnode->vcptr, ecTagStatusDropped, reason, ecMsgTagDropped); } put_work_queue_entry(wqe); commit_change_node(cnode); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t queue_changenode_cleanup_worker_routine(change_node_t *cnode, etTagStateTriggerReason reason) { wqentry_t *wqe = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } wqe = alloc_work_queue_entry(INM_KM_NOSLEEP); if (!wqe) return 1; wqe->witem_type = WITEM_TYPE_VOLUME_UNLOAD; wqe->context = cnode; wqe->work_func = changenode_cleanup_routine; wqe->extra1 = (inm_u32_t)reason; dbg("queuing work queue entry for changenode %p cleanup \n", cnode); add_item_to_work_queue(&driver_ctx->wqueue, wqe); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; } /* This function assumes target context lock is held. */ change_node_t *get_change_node_to_save_as_file(target_context_t *ctxt) { struct inm_list_head *ptr = ctxt->tc_node_head.prev; change_node_t *node = NULL; data_file_flt_t *flt_ctxt = &ctxt->tc_dfm; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } if(INM_ATOMIC_READ(&flt_ctxt->terminating)) return NULL; if (ctxt->tc_stats.dfm_bytes_to_disk >= ctxt->tc_data_to_disk_limit) { return NULL; } if (ptr) { /* This is to ignore the current change node. */ ptr = ptr->prev; if (!ptr) return NULL; } else { return NULL; } while(ptr != &ctxt->tc_node_head) { node = inm_list_entry(ptr, change_node_t, next); if ((node->type == NODE_SRC_DATA) && (0 == (node->flags & (CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE | CHANGE_NODE_FLAGS_ERROR_IN_DATA_WRITE | CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2)))) { node->flags |= CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE; ref_chg_node(node); return node; } ptr = ptr->prev; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return NULL; } inm_page_t * get_page_from_page_pool(inm_s32_t alloc_from_page_pool, inm_s32_t flag, inm_wdata_t *wdatap) { inm_page_t *pg = NULL; #ifndef INM_AIX unsigned long lock_flag = 0; #endif if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } #ifndef INM_AIX if(alloc_from_page_pool) { INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); if(inm_list_empty(&driver_ctx->page_pool)) { #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) INM_ATOMIC_INC(&driver_ctx->dc_nr_metapage_allocs_failed); #endif INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); goto alloc_fresh; } pg = inm_list_entry(driver_ctx->page_pool.next, inm_page_t, entry); inm_list_del(&pg->entry); /* Decrement the number of free change nodes in the list. 
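 * (dc_res_cnode_pgs is the count of reserved metadata pages remaining
 * in page_pool; one such page backs the change list of a change node.)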
*/ driver_ctx->dc_res_cnode_pgs--; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) INM_ATOMIC_DEC(&driver_ctx->dc_nr_metapages_alloced); INM_ATOMIC_INC(&driver_ctx->dc_nr_metapages_alloced_from_pool); wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); #endif INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return pg; } alloc_fresh: pg = (inm_page_t *) INM_KMALLOC(sizeof(inm_page_t), flag, INM_KERNEL_HEAP); if (pg) { pg->cur_pg = (unsigned long *) __INM_GET_FREE_PAGE(flag, INM_KERNEL_HEAP); if (!pg->cur_pg) { err("page alloc failed"); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); pg = NULL; } } else { err("inm_page_t alloc failed"); } #else if(!wdatap || !wdatap->wd_meta_page){ pg = (inm_page_t *)INM_KMALLOC(sizeof(inm_page_t), flag, INM_KERNEL_HEAP); if(!pg) goto out; if(INM_PIN(pg, sizeof(inm_page_t))){ INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); pg = NULL; goto out; } INM_MEM_ZERO(pg, sizeof(inm_page_t)); INM_INIT_LIST_HEAD(&pg->entry); pg->cur_pg = (unsigned long *) __INM_GET_FREE_PAGE(flag, INM_KERNEL_HEAP); if (!pg->cur_pg) { err("page alloc failed"); INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); pg = NULL; goto out; } if(INM_PIN(pg->cur_pg, INM_PAGESZ)){ INM_FREE_PAGE(pg->cur_pg, INM_KERNEL_HEAP); INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); pg = NULL; goto out; } }else{ pg = wdatap->wd_meta_page; INM_BUG_ON(!pg); wdatap->wd_meta_page = NULL; } out: #endif INM_BUG_ON(!pg); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving with fresh allocation"); } return pg; } void print_chg_info(change_node_t *cnp, unsigned short idx) { disk_chg_t *dcp = NULL; if (cnp->type == NODE_SRC_TAGS) return; idx = (idx % (MAX_CHANGE_INFOS_PER_PAGE)); dcp = (disk_chg_t *)((char *)cnp->changes.cur_md_pgp + (sizeof(disk_chg_t) * idx)); info("cnode = %p type = %d", cnp, cnp->type); info("start ts = %llu , start seq = %llu ", cnp->changes.start_ts.TimeInHundNanoSecondsFromJan1601, cnp->changes.start_ts.ullSequenceNumber); info("end ts = %llu , end seq = %llu", cnp->changes.end_ts.TimeInHundNanoSecondsFromJan1601, cnp->changes.end_ts.ullSequenceNumber); info("chg # = %u, off = %llu , len = %u , td = %u, sd = %u ", idx + 1, dcp->offset, dcp->length,dcp->time_delta, dcp->seqno_delta); info("=============================================================="); } void update_change_node(change_node_t *chg_node, struct _write_metadata_tag *wmd, inm_tsdelta_t *tdp) { inm_s32_t md_idx = chg_node->changes.change_idx % (MAX_CHANGE_INFOS_PER_PAGE); disk_chg_t *chg = (disk_chg_t *) ((char *)chg_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * md_idx)); /* common code for data and metadata mode */ chg->offset = wmd->offset; chg->length = wmd->length; chg->seqno_delta = tdp->td_seqno; chg->time_delta = tdp->td_time; /* last ts : time = start time + time delta of last change, seqno = start seq + seq nr delta of last change */ chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 = (chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601 + tdp->td_time); chg_node->changes.end_ts.ullSequenceNumber = (chg_node->changes.start_ts.ullSequenceNumber + tdp->td_seqno); chg_node->changes.bytes_changes += chg->length; chg_node->changes.change_idx++; } static_inline void copy_metadata_to_udirty(UDIRTY_BLOCK_V2 *udirty, change_node_t *chg_node) { inm_s32_t _num_chgs = 
chg_node->changes.change_idx; inm_s32_t _cur_chg = 0; disk_chg_t *_chg = NULL; while(_cur_chg < _num_chgs) { _chg = (disk_chg_t *)((char *)chg_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * _cur_chg)); udirty->ChangeOffsetArray[_cur_chg] = _chg->offset; udirty->ChangeLengthArray[_cur_chg] = _chg->length; udirty->TimeDeltaArray[_cur_chg] = _chg->time_delta; udirty->SequenceNumberDeltaArray[_cur_chg] = _chg->seqno_delta; _cur_chg++; } } int copy_chg_node_to_udirty(struct _target_context *ctxt, change_node_t *chg_node, UDIRTY_BLOCK_V2 *udirty, inm_devhandle_t *filp) { inm_s32_t bytes; inm_s32_t status = 0; inm_u64_t ts_in_usec; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } udirty->uHdr.Hdr.ulFlags = 0; udirty->uHdr.Hdr.uliTransactionID = chg_node->transaction_id; udirty->uHdr.Hdr.cChanges = chg_node->changes.change_idx; udirty->uHdr.Hdr.ulicbChanges = chg_node->changes.bytes_changes; udirty->uHdr.Hdr.ulSequenceIDforSplitIO = chg_node->seq_id_for_split_io; udirty->uHdr.Hdr.ulTotalChangesPending = ctxt->tc_pending_changes - udirty->uHdr.Hdr.cChanges; udirty->uHdr.Hdr.ulicbTotalChangesPending = (ctxt->tc_bytes_pending_changes - udirty->uHdr.Hdr.ulicbChanges); udirty->uHdr.Hdr.liOutOfSyncTimeStamp = 0; udirty->uHdr.Hdr.ulOutOfSyncErrorCode = 0; udirty->uHdr.Hdr.ulBufferSize = INM_PAGESZ; udirty->uHdr.Hdr.usNumberOfBuffers = chg_node->changes.num_data_pgs; ctxt->tc_tso_file = 0; INM_MEM_ZERO(udirty->uTagList.BufferForTags, UDIRTY_BLOCK_TAGS_SIZE); FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagStartOfList, STREAM_REC_TYPE_START_OF_TAG_LIST, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagPadding, STREAM_REC_TYPE_PADDING, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagEndOfList, STREAM_REC_TYPE_END_OF_TAG_LIST, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER(&udirty->uTagList.TagList.TagDataSource, STREAM_REC_TYPE_DATA_SOURCE, sizeof(DATA_SOURCE_TAG)); memcpy_s(&udirty->uTagList.TagList.TagTimeStampOfFirstChange, sizeof(TIME_STAMP_TAG_V2), &chg_node->changes.start_ts, sizeof(TIME_STAMP_TAG_V2)); memcpy_s(&udirty->uTagList.TagList.TagTimeStampOfLastChange, sizeof(TIME_STAMP_TAG_V2), &chg_node->changes.end_ts, sizeof(TIME_STAMP_TAG_V2)); if (chg_node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK) { if (chg_node->flags & KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE) { udirty->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE; } else if (chg_node->flags & KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE) { udirty->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; } else if (chg_node->flags & KDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE) { udirty->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE; } } ctxt->tc_CurrEndTimeStamp = chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601; ctxt->tc_CurrEndSequenceNumber = chg_node->changes.end_ts.ullSequenceNumber; ctxt->tc_CurrSequenceIDforSplitIO = chg_node->seq_id_for_split_io; udirty->uHdr.Hdr.ullPrevEndTimeStamp = ctxt->tc_PrevEndTimeStamp; udirty->uHdr.Hdr.ullPrevEndSequenceNumber = ctxt->tc_PrevEndSequenceNumber; udirty->uHdr.Hdr.ulPrevSequenceIDforSplitIO = ctxt->tc_PrevSequenceIDforSplitIO; /* Set the write order state */ udirty->uHdr.Hdr.eWOState = chg_node->wostate; /* validate dirty block time stamps */ if ((!(udirty->uHdr.Hdr.ullPrevEndTimeStamp <= udirty->uTagList.TagList.TagTimeStampOfFirstChange.TimeInHundNanoSecondsFromJan1601 && udirty->uHdr.Hdr.ullPrevEndSequenceNumber <= udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber && 
udirty->uTagList.TagList.TagTimeStampOfFirstChange.TimeInHundNanoSecondsFromJan1601 <= udirty->uTagList.TagList.TagTimeStampOfLastChange.TimeInHundNanoSecondsFromJan1601 && udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber <= udirty->uTagList.TagList.TagTimeStampOfLastChange.ullSequenceNumber) || verify_change_node(chg_node)) && (udirty->uHdr.Hdr.ullPrevEndTimeStamp != -1)) { err("*** Out of order differential file ***"); print_dblk_filename(chg_node); if (!ctxt->tc_resync_required) { queue_worker_routine_for_set_volume_out_of_sync(ctxt, ERROR_TO_REG_OOD_ISSUE, 1); } } /* data buffers are required to map only in data mode */ switch (chg_node->type) { case NODE_SRC_TAGS: if(0 == (chg_node->flags & CHANGE_NODE_TAG_IN_STREAM)) { udirty->uTagList.TagList.TagDataSource.ulDataSource = INVOLFLT_DATA_SOURCE_META_DATA; udirty->uHdr.Hdr.ppBufferArray = NULL; bytes = ((unsigned char *)&udirty->uTagList.TagList.TagEndOfList - (unsigned char *)udirty->uTagList.BufferForTags); memcpy_s(&udirty->uTagList.TagList.TagEndOfList, (UDIRTY_BLOCK_TAGS_SIZE - bytes), chg_node->changes.cur_md_pgp, (UDIRTY_BLOCK_TAGS_SIZE - bytes)); break; } /* Fall through to map tag in stream mode */ /* GCC 7 requires the following marker comment to not print a * warning fall through */ case NODE_SRC_DATA: if(0 == (chg_node->flags & CHANGE_NODE_DATA_STREAM_FINALIZED)) { finalize_data_stream(chg_node); } udirty->uTagList.TagList.TagDataSource.ulDataSource = INVOLFLT_DATA_SOURCE_DATA; udirty->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_SVD_STREAM; #ifdef INM_AIX do{ inm_u32_t stream_len = udirty->uHdr.Hdr.ulcbChangesInStream; udirty->uHdr.Hdr.ulcbChangesInStream = chg_node->stream_len; if(chg_node->stream_len > stream_len){ status = INM_EINVAL; goto out; } chg_node->mapped_address = udirty->uHdr.Hdr.ppBufferArray; }while(0); #else udirty->uHdr.Hdr.ulcbChangesInStream = chg_node->stream_len; #endif chg_node->flags |= CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2; ref_chg_node(chg_node); volume_unlock(ctxt); status = map_change_node_to_user(chg_node, filp); volume_lock(ctxt); if(status) { #ifndef INM_AIX udirty->uHdr.Hdr.ppBufferArray = (void **)NULL; #endif chg_node->flags &= ~CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2; } else { #ifndef INM_AIX udirty->uHdr.Hdr.ppBufferArray = (void **)chg_node->mapped_address; #endif } deref_chg_node(chg_node); break; case NODE_SRC_METADATA: copy_metadata_to_udirty(udirty, chg_node); udirty->uTagList.TagList.TagDataSource.ulDataSource = INVOLFLT_DATA_SOURCE_META_DATA; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("udirty block of type meta data change node \n"); } break; case NODE_SRC_DATAFILE: INM_BUG_ON(!chg_node->data_file_name); udirty->uTagList.TagList.TagDataSource.ulDataSource = INVOLFLT_DATA_SOURCE_DATA; udirty->uHdr.Hdr.ulFlags |= UDIRTY_BLOCK_FLAG_DATA_FILE; strcpy_s((char *)udirty->uTagList.DataFile.FileName, UDIRTY_BLOCK_MAX_FILE_NAME, chg_node->data_file_name); udirty->uTagList.DataFile.usLength = strlen((char *)udirty->uTagList.DataFile.FileName); break; default: dbg("unknown change node type \n"); break; } GET_TIME_STAMP_IN_USEC(ts_in_usec); if (ctxt->tc_dbwait_event_ts_in_usec) { collect_latency_stats(&ctxt->tc_dbwait_notify_latstat, (ts_in_usec - ctxt->tc_dbwait_event_ts_in_usec)); ctxt->tc_dbwait_event_ts_in_usec = 0; } if (chg_node->type) { inm_u64_t last_ts; last_ts = chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601; chg_node->dbret_ts_in_usec = ts_in_usec; INM_DO_DIV(last_ts, 10); collect_latency_stats(&ctxt->tc_dbret_latstat, (ts_in_usec - 
last_ts)); } if (ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DBLK_FILENAME) { print_dblk_filename(chg_node); } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)) { info("leaving"); } #ifdef INM_AIX out: #endif return status; } static_inline void copy_ts_to_udirty(target_context_t *ctxt, UDIRTY_BLOCK_V2 *udirty, inm_devhandle_t *filp) { udirty->uHdr.Hdr.ulFlags = UDIRTY_BLOCK_FLAG_TSO_FILE; udirty->uHdr.Hdr.uliTransactionID = ++ctxt->tc_transaction_id; udirty->uHdr.Hdr.cChanges = 0; udirty->uHdr.Hdr.ulicbChanges = 0; udirty->uHdr.Hdr.ulSequenceIDforSplitIO = 1; udirty->uHdr.Hdr.ulTotalChangesPending = 0; udirty->uHdr.Hdr.ulicbTotalChangesPending = 0; udirty->uHdr.Hdr.liOutOfSyncTimeStamp = 0; udirty->uHdr.Hdr.ulOutOfSyncErrorCode = 0; udirty->uHdr.Hdr.ppBufferArray = NULL; ctxt->tc_tso_file = 1; ctxt->tc_tso_trans_id = udirty->uHdr.Hdr.uliTransactionID; FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagStartOfList, STREAM_REC_TYPE_START_OF_TAG_LIST, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagPadding, STREAM_REC_TYPE_PADDING, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER_4B(&udirty->uTagList.TagList.TagEndOfList, STREAM_REC_TYPE_END_OF_TAG_LIST, sizeof(STREAM_REC_HDR_4B)); FILL_STREAM_HEADER(&udirty->uTagList.TagList.TagDataSource, STREAM_REC_TYPE_DATA_SOURCE, sizeof(DATA_SOURCE_TAG)); FILL_STREAM_HEADER(&udirty->uTagList.TagList.TagTimeStampOfFirstChange, STREAM_REC_TYPE_TIME_STAMP_TAG, sizeof(TIME_STAMP_TAG_V2)); FILL_STREAM_HEADER(&udirty->uTagList.TagList.TagTimeStampOfLastChange, STREAM_REC_TYPE_TIME_STAMP_TAG, sizeof(TIME_STAMP_TAG_V2)); udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber = 0; udirty->uTagList.TagList.TagTimeStampOfLastChange.ullSequenceNumber = 0; get_time_stamp_tag(&udirty->uTagList.TagList.TagTimeStampOfFirstChange); get_time_stamp_tag(&udirty->uTagList.TagList.TagTimeStampOfLastChange); ctxt->tc_CurrEndTimeStamp = udirty->uTagList.TagList.TagTimeStampOfLastChange.TimeInHundNanoSecondsFromJan1601; ctxt->tc_CurrEndSequenceNumber = udirty->uTagList.TagList.TagTimeStampOfLastChange.ullSequenceNumber; ctxt->tc_CurrSequenceIDforSplitIO = 1; udirty->uHdr.Hdr.ullPrevEndTimeStamp = ctxt->tc_PrevEndTimeStamp; udirty->uHdr.Hdr.ullPrevEndSequenceNumber = ctxt->tc_PrevEndSequenceNumber; udirty->uHdr.Hdr.ulPrevSequenceIDforSplitIO = ctxt->tc_PrevSequenceIDforSplitIO; /* validate timestamps of TSO file */ if (!(udirty->uHdr.Hdr.ullPrevEndTimeStamp <= udirty->uTagList.TagList.TagTimeStampOfFirstChange.TimeInHundNanoSecondsFromJan1601 && udirty->uHdr.Hdr.ullPrevEndSequenceNumber < udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber && udirty->uTagList.TagList.TagTimeStampOfFirstChange.TimeInHundNanoSecondsFromJan1601 <= udirty->uTagList.TagList.TagTimeStampOfLastChange.TimeInHundNanoSecondsFromJan1601 && udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber <= udirty->uTagList.TagList.TagTimeStampOfLastChange.ullSequenceNumber && udirty->uHdr.Hdr.ulPrevSequenceIDforSplitIO <= ctxt->tc_CurrSequenceIDforSplitIO) && (udirty->uHdr.Hdr.ullPrevEndTimeStamp != -1)) { err("*** Out of order differential tso file ***"); err("TSO File Previous TS:%llu Seq:%llu Start TS:%llu Seq:%llu End TS:%llu Seq:%llu", udirty->uHdr.Hdr.ullPrevEndTimeStamp, udirty->uHdr.Hdr.ullPrevEndSequenceNumber, udirty->uTagList.TagList.TagTimeStampOfFirstChange.TimeInHundNanoSecondsFromJan1601, udirty->uTagList.TagList.TagTimeStampOfFirstChange.ullSequenceNumber, 
		udirty->uTagList.TagList.TagTimeStampOfLastChange.TimeInHundNanoSecondsFromJan1601,
		udirty->uTagList.TagList.TagTimeStampOfLastChange.ullSequenceNumber);
		if (!ctxt->tc_resync_required) {
			queue_worker_routine_for_set_volume_out_of_sync(ctxt,
					ERROR_TO_REG_OOD_ISSUE, 2);
		}
	}

	/* Set the write order state. In case of a tso file, the write order
	 * state can be DATA if the bitmap read is complete. Otherwise use
	 * the current write order state.
	 */
	if(ctxt->tc_bp && ctxt->tc_bp->volume_bitmap &&
		(ecVBitmapStateReadCompleted ==
		 ctxt->tc_bp->volume_bitmap->eVBitmapState))
		udirty->uHdr.Hdr.eWOState = ecWriteOrderStateData;
	else
		udirty->uHdr.Hdr.eWOState = ctxt->tc_cur_wostate;

	switch(ctxt->tc_cur_mode) {
	case FLT_MODE_DATA:
		udirty->uTagList.TagList.TagDataSource.ulDataSource =
						INVOLFLT_DATA_SOURCE_DATA;
		break;
	case FLT_MODE_METADATA:
		udirty->uTagList.TagList.TagDataSource.ulDataSource =
						INVOLFLT_DATA_SOURCE_META_DATA;
		break;
	default:
		udirty->uTagList.TagList.TagDataSource.ulDataSource =
						INVOLFLT_DATA_SOURCE_UNDEFINED;
		dbg("unknown filtering mode %d\n", ctxt->tc_cur_mode);
		break;
	}
}

int fill_udirty_block(target_context_t *ctxt, UDIRTY_BLOCK_V2 *udirty,
						inm_devhandle_t *filp)
{
	int32_t witem_type = WITEM_TYPE_UNINITIALIZED;
	struct inm_list_head lh;
	change_node_t *chg_node = NULL;
	inm_s32_t status = 0;

	if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){
		info("entered");
	}

	INM_INIT_LIST_HEAD(&lh);
	volume_lock(ctxt);
	ctxt->tc_flags |= VCF_VOLUME_IN_GET_DB;

	if (ctxt->tc_optimize_performance &
			PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) {
		/* Preferably get a non write order data mode change, or
		 * get a metadata change node by tweaking the time stamps
		 * and sequence number
		 */
		chg_node = get_oldest_change_node_pref_datamode(ctxt, &status);
	} else {
		INM_BUG_ON(!inm_list_empty(&ctxt->tc_nwo_dmode_list));
		chg_node = get_oldest_change_node(ctxt, &status);
	}

	if (INM_LIKELY(chg_node)) {
		ref_chg_node(chg_node);
		status = copy_chg_node_to_udirty(ctxt, chg_node, udirty, filp);
		if (status) {
			deref_chg_node(chg_node);
		}
	} else {
		if (status)
			goto out;
		else {
			if (ctxt->tc_flags & VCF_IO_BARRIER_ON) {
				dbg("Drain barrier on: the pending list of "
					"change nodes is non-empty, returning EAGAIN");
				status = INM_EAGAIN;
				goto out;
			}
			/* No changes, but let's simply return the timestamp.
*/ copy_ts_to_udirty(ctxt, udirty, filp); } } INM_BUG_ON_TMP(ctxt); if(FLT_MODE_METADATA == ctxt->tc_cur_mode){ if (is_data_filtering_enabled_for_this_volume(ctxt) && (driver_ctx->service_state == SERVICE_RUNNING) && can_switch_to_data_filtering_mode(ctxt)){ /* switch to data filtering mode */ set_tgt_ctxt_filtering_mode(ctxt, FLT_MODE_DATA, FALSE); dbg("switched to data mode \n"); } } switch (ctxt->tc_cur_wostate) { case ecWriteOrderStateMetadata: if (is_data_filtering_enabled_for_this_volume(ctxt) && (driver_ctx->service_state == SERVICE_RUNNING) && can_switch_to_data_wostate(ctxt)){ /* switch to data write order state */ set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateData, FALSE, ecWOSChangeReasonUnInitialized); dbg("switched to data write order state\n"); } break; case ecWriteOrderStateData: case ecWriteOrderStateBitmap: if (ctxt->tc_bp->volume_bitmap && (ctxt->tc_pending_changes < driver_ctx->tunable_params.db_low_water_mark_while_service_running) && !(ctxt->tc_flags & VCF_VOLUME_IN_BMAP_WRITE)) { switch (ctxt->tc_bp->volume_bitmap->eVBitmapState) { case ecVBitmapStateReadPaused: witem_type = WITEM_TYPE_CONTINUE_BITMAP_READ; break; case ecVBitmapStateOpened: case ecVBitmapStateAddingChanges: witem_type = WITEM_TYPE_START_BITMAP_READ; break; default: witem_type = WITEM_TYPE_UNINITIALIZED; break; } if (witem_type) add_vc_workitem_to_list(witem_type, ctxt, 0, FALSE, &lh); } break; case ecWriteOrderStateRawBitmap: if (!(ctxt->tc_flags & VCF_VOLUME_IN_BMAP_WRITE)) add_vc_workitem_to_list(WITEM_TYPE_START_BITMAP_READ, ctxt, 0, FALSE, &lh); break; default: dbg("unknown mode\n"); break; } volume_unlock(ctxt); if (!inm_list_empty(&lh)) { wqentry_t *wqe = NULL; struct inm_list_head *ptr = NULL, *nextptr = NULL; inm_list_for_each_safe(ptr, nextptr, &lh) { wqe = inm_list_entry(ptr, wqentry_t, list_entry); inm_list_del(&wqe->list_entry); process_vcontext_work_items(wqe); put_work_queue_entry(wqe); } } add_resync_required_flag(udirty, ctxt); volume_lock(ctxt); out: ctxt->tc_flags &= ~VCF_VOLUME_IN_GET_DB; volume_unlock(ctxt); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return status; } void commit_change_node(change_node_t *chg_node) { target_context_t *tgt_ctxt = NULL; struct inm_list_head *curp = NULL, *nxtp = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } INM_DOWN(&chg_node->mutex); if(chg_node->flags & CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE) { chg_node->flags &= ~CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE; } tgt_ctxt = chg_node->vcptr; INM_BUG_ON_TMP(tgt_ctxt); INM_BUG_ON(chg_node->flags & CHANGE_NODE_COMMITTED); switch(chg_node->type) { case NODE_SRC_TAGS: case NODE_SRC_DATA: /* Check if this node is being cleaned up while the data pages are * mapped */ volume_lock(tgt_ctxt); if(INM_UNLIKELY(tgt_ctxt->tc_pending_confirm == chg_node)) { volume_unlock(tgt_ctxt); INM_UP(&chg_node->mutex); return; } else { volume_unlock(tgt_ctxt); /* current->mm could be NULL if this func. gets called from * flt_release() when drainer exits. As drainer is exiting, so unmap is * not applicable. 
Otherwise if current->mm is valid then we * must unmap the change node */ #ifndef INM_AIX if (INM_CURPROC == chg_node->mapped_thread && INM_PROC_ADDR) unmap_change_node(chg_node); #endif chg_node->mapped_address = 0; chg_node->mapped_thread = NULL; volume_lock(tgt_ctxt); tgt_ctxt->tc_stats.num_pages_allocated -= chg_node->changes.num_data_pgs; inm_rel_data_pages(tgt_ctxt, &chg_node->data_pg_head, chg_node->changes.num_data_pgs); INM_INIT_LIST_HEAD(&chg_node->data_pg_head); chg_node->changes.num_data_pgs = 0; volume_unlock(tgt_ctxt); } chg_node->changes.num_data_pgs = 0; break; case NODE_SRC_DATAFILE: INM_BUG_ON(!chg_node->data_file_name); if(0 != inm_unlink_datafile(tgt_ctxt, chg_node->data_file_name)) { err("Data File Mode: Unlink Failed For %s", chg_node->data_file_name); } else { dbg("Data File Mode: Unlink Succeeded For %s", chg_node->data_file_name); volume_lock(tgt_ctxt); tgt_ctxt->tc_stats.dfm_bytes_to_disk -= chg_node->data_file_size; INM_ATOMIC_DEC(&tgt_ctxt->tc_stats.num_dfm_files_pending); volume_unlock(tgt_ctxt); } INM_KFREE(chg_node->data_file_name, INM_PATH_MAX, INM_KERNEL_HEAP); chg_node->data_file_name = NULL; chg_node->data_file_size = 0; break; default: break; } /* early free, will help to reuse the mem quickly*/ inm_list_for_each_safe(curp, nxtp, &chg_node->changes.md_pg_list) { inm_page_t *pgp = inm_list_entry(curp, inm_page_t, entry); inm_list_del(curp); inm_free_metapage(pgp); tgt_ctxt->tc_cnode_pgs--; } chg_node->changes.cur_md_pgp = NULL; chg_node->flags |= CHANGE_NODE_COMMITTED; volume_lock(tgt_ctxt); tgt_ctxt->tc_nr_cns--; volume_unlock(tgt_ctxt); INM_UP(&chg_node->mutex); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("commit - chg_node:%p next:%p prev:%p mode:%d closed:%d", chg_node, chg_node->nwo_dmode_next.next, chg_node->nwo_dmode_next.prev, chg_node->type, (chg_node->flags & CHANGE_NODE_IN_NWO_CLOSED)); } deref_chg_node(chg_node); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_alloc_pools(); #else balance_page_pool(INM_KM_SLEEP, 0); #endif if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } } inm_s32_t perform_commit(target_context_t *ctxt, COMMIT_TRANSACTION *commit, inm_devhandle_t *filp) { inm_s32_t err = 0; change_node_t *chg_node = NULL; inm_u32_t is_tag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_BUG_ON(ctxt->tc_pending_confirm && ctxt->tc_tso_file); if (ctxt->tc_pending_confirm) { if (ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DBLK_INFO) { print_chg_info(ctxt->tc_pending_confirm, ctxt->tc_pending_confirm->changes.change_idx-1); } if (ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DBLK_CHANGES) { print_change_node_off_length(ctxt->tc_pending_confirm); } } if (commit->ulFlags & COMMIT_TRANSACTION_FLAG_RESET_RESYNC_REQUIRED_FLAG) { reset_volume_out_of_sync(ctxt); } volume_lock(ctxt); ctxt->tc_tel.tt_commitdb++; get_time_stamp(&(ctxt->tc_s2_latency_base_ts)); chg_node = ctxt->tc_pending_confirm; if (chg_node && chg_node->type) { inm_u64_t ts_in_usec; GET_TIME_STAMP_IN_USEC(ts_in_usec); ctxt->tc_tel.tt_commitdb_time = ts_in_usec * 10; collect_latency_stats(&ctxt->tc_dbcommit_latstat, (ts_in_usec - chg_node->dbret_ts_in_usec)); } if (chg_node && (commit->ulTransactionID == chg_node->transaction_id)) { ctxt->tc_pending_confirm = NULL; /* If change node on non write order data mode list, then * remove it from that list */ if (!inm_list_empty(&chg_node->nwo_dmode_next)) { inm_list_del_init(&chg_node->nwo_dmode_next); } if (chg_node->type 
== NODE_SRC_TAGS) { is_tag = 1; ref_chg_node(chg_node); } if (is_tag && ((chg_node->flags & CHANGE_NODE_FAILBACK_TAG) || (chg_node->flags & CHANGE_NODE_BLOCK_DRAIN_TAG))) { INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (ctxt->tc_tag_commit_status) { ctxt->tc_tag_commit_status->TagInsertionTime = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC(ctxt->tc_CurrEndTimeStamp); ctxt->tc_tag_commit_status->TagSequenceNumber = ctxt->tc_CurrEndSequenceNumber; info("The tag is drained for disk %s, dirty " "block = %p, TagInsertionTime = %llu, " "TagSequenceNumber = %llu", ctxt->tc_guid, chg_node, ctxt->tc_tag_commit_status->TagInsertionTime, ctxt->tc_tag_commit_status->TagSequenceNumber); } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); set_tag_drain_notify_status(ctxt, TAG_STATUS_COMMITTED, DEVICE_STATUS_SUCCESS); if (chg_node->flags & CHANGE_NODE_BLOCK_DRAIN_TAG) { ctxt->tc_flags |= VCF_DRAIN_BLOCKED; } } put_tgt_ctxt(ctxt); ctxt->tc_prev_transaction_id = chg_node->transaction_id; ctxt->tc_PrevEndTimeStamp = ctxt->tc_CurrEndTimeStamp; ctxt->tc_PrevEndSequenceNumber = ctxt->tc_CurrEndSequenceNumber; ctxt->tc_PrevSequenceIDforSplitIO = ctxt->tc_CurrSequenceIDforSplitIO; get_rpo_timestamp(ctxt, IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS, chg_node); if (chg_node->tag_guid) { chg_node->tag_guid->status[chg_node->tag_status_idx] = STATUS_COMMITED; INM_WAKEUP_INTERRUPTIBLE(&chg_node->tag_guid->wq); chg_node->tag_guid = NULL; } /* not an orphan node */ if ((chg_node->flags & CHANGE_NODE_ORPHANED) == 0) { INM_BUG_ON(ctxt->tc_pending_changes < (chg_node->changes.change_idx)); ctxt->tc_pending_changes -= (chg_node->changes.change_idx); if (chg_node->type == NODE_SRC_METADATA) { ctxt->tc_pending_md_changes -= (chg_node->changes.change_idx); ctxt->tc_bytes_pending_md_changes -= (chg_node->changes.bytes_changes); } ctxt->tc_bytes_pending_changes -= chg_node->changes.bytes_changes; ctxt->tc_commited_changes += (chg_node->changes.change_idx); ctxt->tc_bytes_commited_changes += chg_node->changes.bytes_changes; subtract_changes_from_pending_changes(ctxt, chg_node->wostate, chg_node->changes.change_idx); update_cx_session_with_committed_bytes(ctxt, chg_node->changes.bytes_changes); if (ctxt->tc_bytes_pending_changes <= CX_SESSION_PENDING_BYTES_THRESHOLD) { close_disk_cx_session(ctxt, CX_CLOSE_PENDING_BYTES_BELOW_THRESHOLD); } inm_list_del(&chg_node->next); deref_chg_node(chg_node); if (ecWriteOrderStateData != ctxt->tc_cur_wostate) { if (is_data_filtering_enabled_for_this_volume(ctxt) && (driver_ctx->service_state == SERVICE_RUNNING) && can_switch_to_data_wostate(ctxt)) { // switch to data write order state set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateData, FALSE, ecWOSChangeReasonUnInitialized); dbg("switched to data write order state"); } else if (ecWriteOrderStateBitmap == ctxt->tc_cur_wostate) { if (ctxt->tc_bp && ctxt->tc_bp->volume_bitmap && (ecVBitmapStateReadCompleted == ctxt->tc_bp->volume_bitmap->eVBitmapState)) { if ((driver_ctx->service_state == SERVICE_RUNNING) && !ctxt->tc_pending_wostate_bm_changes && !ctxt->tc_pending_wostate_rbm_changes) { set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonMDChanges); dbg("switched to metadata write order state\n"); } } } } } volume_unlock(ctxt); if (is_tag) { telemetry_log_tag_history(chg_node, ctxt, ecTagStatusTagCommitDBSuccess, ecNotApplicable, ecMsgTagCommitDBSuccess); ctxt->tc_tel.tt_prev_tag_ts = ctxt->tc_PrevEndTimeStamp; ctxt->tc_tel.tt_prev_tag_seqno = ctxt->tc_PrevEndSequenceNumber; deref_chg_node(chg_node); } 
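	/*
	 * Illustrative sketch (not part of the driver) of the commit
	 * handshake completed above: a user-space drainer echoes back the
	 * transaction ID it was handed in the dirty block. The file
	 * descriptor and ioctl wrapper below are assumptions for
	 * illustration; the structure fields and ioctl name are the ones
	 * used by this driver.
	 *
	 *     COMMIT_TRANSACTION commit = { 0 };
	 *     commit.ulTransactionID = udirty.uHdr.Hdr.uliTransactionID;
	 *     ioctl(ctl_fd, IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS, &commit);
	 *
	 * perform_commit() honors the commit only when this ID matches the
	 * pending change node's transaction_id (or the TSO transaction ID);
	 * any other ID fails with INM_EFAULT in the final else branch below.
	 */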
ctxt->tc_tel.tt_prev_ts = ctxt->tc_PrevEndTimeStamp; ctxt->tc_tel.tt_prev_seqno = ctxt->tc_PrevEndSequenceNumber; commit_change_node(chg_node); } else if (ctxt->tc_tso_file) { if (commit->ulTransactionID == ctxt->tc_tso_trans_id) { ctxt->tc_PrevEndTimeStamp = ctxt->tc_CurrEndTimeStamp; ctxt->tc_PrevEndSequenceNumber = ctxt->tc_CurrEndSequenceNumber; ctxt->tc_PrevSequenceIDforSplitIO = ctxt->tc_CurrSequenceIDforSplitIO; ctxt->tc_tel.tt_prev_ts = ctxt->tc_PrevEndTimeStamp; ctxt->tc_tel.tt_prev_seqno = ctxt->tc_PrevEndSequenceNumber; get_rpo_timestamp(ctxt, IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS, NULL); } volume_unlock(ctxt); } else { err = INM_EFAULT; ctxt->tc_tel.tt_commitdb_failed++; volume_unlock(ctxt); } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return err; } void balance_page_pool(inm_s32_t alloc_flag, int quit) { #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_s32_t threshold = 1024; #else inm_s32_t threshold = 256; #endif inm_page_t *pg = NULL; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } while (driver_ctx->dc_res_cnode_pgs < threshold) { pg = (inm_page_t *) INM_KMALLOC(sizeof(inm_page_t), alloc_flag, INM_KERNEL_HEAP); INM_BUG_ON(!pg); if(INM_PIN(pg, sizeof(inm_page_t))){ INM_KFREE(pg, sizeof(*pg), INM_KERNEL_HEAP); return; } INM_MEM_ZERO(pg, sizeof(inm_page_t)); pg->cur_pg = (unsigned long *)__INM_GET_FREE_PAGE(alloc_flag, INM_KERNEL_HEAP); if (!pg->cur_pg) { INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(*pg), INM_KERNEL_HEAP); return; } if(INM_PIN(pg->cur_pg, INM_PAGESZ)){ INM_FREE_PAGE(pg->cur_pg, INM_KERNEL_HEAP); INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(*pg), INM_KERNEL_HEAP); return; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); driver_ctx->dc_res_cnode_pgs ++; inm_list_add_tail(&pg->entry, &driver_ctx->page_pool); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) pg->flags = METAPAGE_ALLOCED_FROM_POOL; INM_ATOMIC_INC(&driver_ctx->dc_nr_metapages_alloced); #endif INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); if (quit) break; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } #ifndef INM_AIX int map_change_node_to_user(change_node_t *chg_node, inm_devhandle_t *idhp) { inm_addr_t addr = 0; void *saved_ptr; inm_s32_t len = 0; inm_s32_t status = 0, ret = 0; inm_u32_t off = 0; INM_DOWN(&chg_node->mutex); len = pages_to_bytes(chg_node->changes.num_data_pgs); if(!len) { status = -EINVAL; goto rel_mutex; } /* If change node mapped already, do nothing, previously mapped address * can be returned to drainer. 
 */
	if(chg_node->mapped_thread) {
		dbg("Returning already mapped address: 0x%lx",
						chg_node->mapped_address);
		goto rel_mutex;
	}

	if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){
		info("entered");
	}

	saved_ptr = idhp->private_data;
	INM_FILL_MMAP_PRIVATE_DATA(idhp, chg_node);
	ret = INM_DO_STREAM_MAP(idhp, off, len, addr, INM_MMAP_PROT_FLAGS,
							INM_MMAP_MAPFLAG);
	if (ret) {
		err("INM_DO_STREAM_MAP() failed with err = %d", ret);
	}
	idhp->private_data = saved_ptr;

	if (IS_ERR((void *)addr)) {
		err("Mapping Failed, err %ld \n", addr);
		status = -ENOMEM;
		goto rel_mutex;
	}

	dbg("Mapping to User Address: 0x%p, len: %d\n", (void *)addr, len);
	chg_node->mapped_thread = INM_CURPROC;
	chg_node->mapped_address = addr;

	if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){
		info("leaving");
	}

rel_mutex:
	INM_UP(&chg_node->mutex);
	return status;
}
#else
int map_change_node_to_user(change_node_t *chg_node, inm_devhandle_t *idhp)
{
	inm_s32_t len = chg_node->stream_len, to_copy, status = 0;
	struct inm_list_head *ptr, *hd;
	data_page_t *page;
	void *user_buf;

	INM_DOWN(&chg_node->mutex);
	user_buf = (void *)chg_node->mapped_address;
	hd = &(chg_node->data_pg_head);

	for(ptr = hd->next; ptr != hd; ptr = ptr->next) {
		page = inm_list_entry(ptr, data_page_t, next);
		to_copy = MIN(len, INM_PAGESZ);
		if(INM_COPYOUT(user_buf, page->page, to_copy)){
			err("copy to user failed in get_db");
			status = INM_EFAULT;
			goto rel_mutex;
		}
		user_buf += to_copy;
		len -= to_copy;
	}
	INM_BUG_ON(len);

rel_mutex:
	INM_UP(&chg_node->mutex);
	return status;
}
#endif

void print_change_node_off_length(change_node_t *chg_node)
{
	unsigned short index = 0, i;
	inm_s32_t md_idx;
	disk_chg_t *chg = NULL;

	if (!chg_node || (chg_node->type != NODE_SRC_DATA &&
			chg_node->type != NODE_SRC_DATAFILE &&
			chg_node->type != NODE_SRC_METADATA)) {
		return;
	}

	index = chg_node->changes.change_idx;
	info("Printing change info trans_id:%lld start seq:%llu change node:%d"
		" splcnt:%u", chg_node->transaction_id,
		chg_node->changes.start_ts.ullSequenceNumber, chg_node->type,
		chg_node->seq_id_for_split_io);
	index = index%(MAX_CHANGE_INFOS_PER_PAGE);
	if (chg_node->changes.cur_md_pgp) {
		for (i = 0; i < index; i++) {
			md_idx = i % (MAX_CHANGE_INFOS_PER_PAGE);
			chg = (disk_chg_t *)((char *)chg_node->changes.cur_md_pgp +
					(sizeof(disk_chg_t) * md_idx));
			info("Index:%u offset:%llu length:%u td:%u sd:%u\n", i,
				chg->offset, chg->length, chg->time_delta,
				chg->seqno_delta);
		}
	}
	info("Printing change info trans_id:%lld end seq:%llu\n",
		chg_node->transaction_id,
		chg_node->changes.end_ts.ullSequenceNumber);
}

int print_dblk_filename(change_node_t *chg_node)
{
	if (chg_node->flags & (KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE |
				KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE)) {
		if (ecWriteOrderStateData == chg_node->wostate) {
			info("%lld pre_completed_diff_type_%d_P%llu_%llu_S%llu_%llu_E%llu_%llu_WC%u.dat",
				chg_node->transaction_id, chg_node->type,
				chg_node->vcptr->tc_PrevEndTimeStamp,
				chg_node->vcptr->tc_PrevEndSequenceNumber,
				chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601,
				chg_node->changes.start_ts.ullSequenceNumber,
				chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601,
				chg_node->changes.end_ts.ullSequenceNumber,
				chg_node->seq_id_for_split_io);
		} else {
			info("%lld pre_completed_diff_type_%d_P%llu_%llu_S%llu_%llu_E%llu_%llu_MC%u.dat",
				chg_node->transaction_id, chg_node->type,
				chg_node->vcptr->tc_PrevEndTimeStamp,
				chg_node->vcptr->tc_PrevEndSequenceNumber,
				chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601,
				chg_node->changes.start_ts.ullSequenceNumber,
				chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601,
				chg_node->changes.end_ts.ullSequenceNumber,
				chg_node->seq_id_for_split_io);
		}
	} else {
		if
(ecWriteOrderStateData == chg_node->wostate) { info("%lld pre_completed_diff_type_%d_P%llu_%llu_S%llu_%llu_E%llu_%llu_WE%u.dat", chg_node->transaction_id, chg_node->type,chg_node->vcptr->tc_PrevEndTimeStamp, chg_node->vcptr->tc_PrevEndSequenceNumber, chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.start_ts.ullSequenceNumber, chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.end_ts.ullSequenceNumber, chg_node->seq_id_for_split_io); } else { info("%lld pre_completed_diff_type_%d_P%llu_%llu_S%llu_%llu_E%llu_%llu_ME%u.dat", chg_node->transaction_id, chg_node->type,chg_node->vcptr->tc_PrevEndTimeStamp, chg_node->vcptr->tc_PrevEndSequenceNumber, chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.start_ts.ullSequenceNumber, chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.end_ts.ullSequenceNumber, chg_node->seq_id_for_split_io); } } return 0; } inm_s32_t verify_change_node(change_node_t *chg_node) { target_context_t *tgt_ctxt; inm_s32_t i, md_idx; disk_chg_t *chg = NULL; inm_page_t *pgp = NULL; struct inm_list_head *ptr; if (!chg_node) return 0; tgt_ctxt = chg_node->vcptr; INM_BUG_ON(!tgt_ctxt); if (chg_node->type == NODE_SRC_TAGS) { return 0; } if (chg_node->changes.change_idx) { i = 0; pgp = NULL; __inm_list_for_each(ptr, &chg_node->changes.md_pg_list) { pgp = inm_list_entry(ptr, inm_page_t, entry); i += MAX_CHANGE_INFOS_PER_PAGE; if (i >= chg_node->changes.change_idx) break; } if (!pgp) { err("offset length page empty!"); return 1; } md_idx = ((chg_node->changes.change_idx-1) % (MAX_CHANGE_INFOS_PER_PAGE)); chg = (disk_chg_t *)((char *)pgp->cur_pg + (sizeof(disk_chg_t) * md_idx)); if (((chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601 + chg->time_delta) != chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601) || ((chg_node->changes.start_ts.ullSequenceNumber + chg->seqno_delta) != chg_node->changes.end_ts.ullSequenceNumber)) return 1; } return 0; } inm_s32_t change_node_unmap_page_buffer(void *map) { return 0; } char * change_node_map_dm_page_to_buffer(change_node_t *cnode) { char *buf = driver_ctx->dc_verifier_area; inm_u32_t bufsz = driver_ctx->tunable_params.max_data_sz_dm_cn; inm_s32_t len = cnode->stream_len; inm_s32_t to_copy = 0; inm_s32_t error = 0; struct inm_list_head *ptr = NULL; struct inm_list_head *hd = &(cnode->data_pg_head); data_page_t *page; char *src = NULL; if (!buf) { error = -ENOMEM; goto out; } if (len > bufsz) { error = -EFBIG; goto out; } for(ptr = hd->next; ptr != hd; ptr = ptr->next) { page = inm_list_entry(ptr, data_page_t, next); to_copy = MIN(len, INM_PAGESZ); INM_PAGE_MAP(src, page->page, KM_SOFTIRQ0); memcpy_s(buf, bufsz, src, to_copy); INM_PAGE_UNMAP(src, page->page, KM_SOFTIRQ0); buf += to_copy; bufsz -= to_copy; len -= to_copy; } INM_BUG_ON(len); out: if (error) return ERR_PTR(error); else return driver_ctx->dc_verifier_area; } inm_s32_t verify_change_node_file(change_node_t *cnode) { static inm_s32_t error_logged = 0; /* log once on failure */ inm_s32_t error = 0; inm_irqflag_t flag = 0; char *buf = NULL; if (!driver_ctx->dc_verifier_on) return 0; switch(cnode->type) { case NODE_SRC_DATA: INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_verifier_lock, flag); buf = change_node_map_dm_page_to_buffer(cnode); if (IS_ERR(buf)) { if (!error_logged) { err("Cannot map for verification"); error_logged = 1; } error = PTR_ERR(buf); } else { error_logged = 0; /* mapping successful */ error = inm_verify_change_node_data(buf, cnode->stream_len, 0); 
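			/*
			 * Verifier flow sketch: change_node_map_dm_page_to_buffer()
			 * above linearizes the change node's list of data pages
			 * into the single preallocated dc_verifier_area
			 * (stream_len bytes total, at most INM_PAGESZ copied per
			 * page), inm_verify_change_node_data() then checks the
			 * flat SVD stream, and the scratch buffer is handed back
			 * below. The whole sequence runs under dc_verifier_lock,
			 * which is why one shared verifier area is sufficient.
			 */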
change_node_unmap_page_buffer(buf); } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_verifier_lock, flag); break; default: error = 0; break; } return error; } involflt-0.1.0/src/driver-context.c0000755000000000000000000004636214467303177016005 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "inm_types.h" #define FLT_VERBOSITY inm_u32_t inm_verbosity; #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "file-io.h" #include "utils.h" #include "tunable_params.h" #include "filter_host.h" #include "filter.h" #include "telemetry.h" inm_s32_t data_mode = 1; driver_context_t *driver_ctx = NULL; inm_s32_t init_driver_context(void) { inm_s32_t r = 0; driver_ctx = (driver_context_t *)INM_KMALLOC(sizeof(*driver_ctx), INM_KM_SLEEP, INM_PINNED_HEAP); if(!driver_ctx) { err("Failed to initialize driver context err = %d", ENOMEM); return -ENOMEM; } INM_MEM_ZERO(driver_ctx, sizeof(*driver_ctx)); #if (defined(INM_DEBUG)) inm_verbosity |= (inm_u32_t)INM_DEBUG_ONLY; #endif #if (defined(IDEBUG)) inm_verbosity |= INM_IDEBUG; #endif #if defined(IDEBUG_BMAP) inm_verbosity |= INM_IDEBUG_BMAP; #endif #if defined(IDEBUG_MIRROR) inm_verbosity |= INM_IDEBUG_MIRROR; #endif #if defined(IDEBUG_MIRROR_IO) inm_verbosity |= INM_IDEBUG_MIRROR_IO; #endif #if defined(IDEBUG_META) inm_verbosity |= INM_IDEBUG_META; #endif #if defined(DEBUG_REF) inm_verbosity |= INM_IDEBUG_REF; #endif #if defined(IDEBUG_BMAP_REF) inm_verbosity |= INM_IDEBUG_IO; #endif /* Reserve 6.25% of system memory for data page pool */ driver_ctx->default_data_pool_size_mb = get_data_page_pool_mb(); if (!driver_ctx->default_data_pool_size_mb) { r = -ENOMEM; err("Memory is not enough for driver initialization err = %d", r); goto free_dc; } INM_INIT_SPIN_LOCK(&driver_ctx->tunables_lock); init_driver_tunable_params(); #ifndef INM_AIX driver_ctx->dc_host_info.bio_info_cache = INM_KMEM_CACHE_CREATE("global_bio_info_cache", INM_BIOSZ, 0, 0, NULL, NULL, INM_MAX_NR_GLOBAL_BUF_INFO_POOL, INM_MIN_NR_GLOBAL_BUF_INFO_POOL, INM_PINNED); if(!driver_ctx->dc_host_info.bio_info_cache){ INM_DESTROY_SPIN_LOCK(&driver_ctx->tunables_lock); r = INM_ENOMEM; err("INM_KMEM_CACHE_CREATE failed to create bio cache err = %d", r); goto free_dc; } #endif driver_ctx->dc_host_info.mirror_bioinfo_cache = INM_KMEM_CACHE_CREATE("global_mirror_bioinfo_cache", INM_MIRROR_BIOSZ, 0, 0, NULL, NULL, INM_MAX_NR_GLOBAL_MIRROR_BUF_INFO_POOL, INM_MIN_NR_GLOBAL_MIRROR_BUF_INFO_POOL, INM_PINNED); 
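	/*
	 * Worked example for the pool sizing done earlier in this function
	 * (6.25% of RAM, i.e. 1/16): a host with 16 GB of memory would get
	 * default_data_pool_size_mb of roughly 1024, a 1 GB data page pool.
	 * The figures are illustrative only; the actual value comes from
	 * get_data_page_pool_mb().
	 */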
if(!driver_ctx->dc_host_info.mirror_bioinfo_cache){ INM_DESTROY_SPIN_LOCK(&driver_ctx->tunables_lock); #ifndef INM_AIX INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.bio_info_cache); #endif r = INM_ENOMEM; err("INM_KMEM_CACHE_CREATE failed to create mirror bio cache " "err = %d", r); goto free_dc; } #ifdef INM_AIX driver_ctx->dc_host_info.data_file_node_cache = INM_KMEM_CACHE_CREATE("data_file_node_cache", sizeof(data_file_node_t), 0, 0, NULL, NULL, INM_MAX_NR_DATA_FILE_NODE_POOL, INM_MIN_NR_DATA_FILE_NODE_POOL, INM_PINNED); if(!driver_ctx->dc_host_info.data_file_node_cache){ INM_DESTROY_SPIN_LOCK(&driver_ctx->tunables_lock); r = -ENOMEM; err("INM_KMEM_CACHE_CREATE failed to create data_file_node_cache " "err = %d", r); goto free_bio_cache; } #endif INM_INIT_LIST_HEAD(&driver_ctx->tgt_list); INM_RW_SEM_INIT(&driver_ctx->tgt_list_sem); INIT_OSSPEC_DRV_CTX(driver_ctx); INM_INIT_LIST_HEAD(&driver_ctx->tag_guid_list); INM_RW_SEM_INIT(&driver_ctx->tag_guid_list_sem); driver_ctx->service_state = SERVICE_UNITIALIZED; INM_INIT_SPIN_LOCK(&driver_ctx->log_lock); /* initialize the stats structure */ INM_MEM_ZERO(&driver_ctx->stats, sizeof(dc_stats_t)); /* change node memory pool */ INM_INIT_LIST_HEAD(&driver_ctx->page_pool); INM_INIT_SPIN_LOCK(&driver_ctx->page_pool_lock); /* allocate bitmap work item pool */ /* allocate work queue entry pool */ r = alloc_cache_pools(); if (r) goto deinit_locks; /* initialize head_for_volume_bitmaps */ INM_INIT_LIST_HEAD(&driver_ctx->dc_bmap_info.head_for_volume_bitmaps); driver_ctx->dc_bmap_info.num_volume_bitmaps = 0; INM_ATOMIC_SET(&driver_ctx->involflt_refcnt,0); INM_INIT_SPIN_LOCK(&driver_ctx->clean_shutdown_lock); if (driver_ctx->tunable_params.data_pool_size) { r = init_data_flt_ctxt(&driver_ctx->data_flt_ctx); } else { r = -ENOMEM; } if(r) goto free_cache_pools; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ /* print_driver_context(driver_ctx); */ } INM_ATOMIC_SET(&(driver_ctx->stats.pending_chg_nodes), 0); INM_INIT_LIST_HEAD(&driver_ctx->dc_host_info.rq_list); INM_INIT_SPIN_LOCK(&driver_ctx->dc_host_info.rq_list_lock); INM_INIT_SPIN_LOCK(&driver_ctx->time_stamp_lock); driver_ctx->sys_shutdown = 0; INM_INIT_COMPLETION(&driver_ctx->shutdown_completion); INM_INIT_SEM(&driver_ctx->tag_sem); driver_ctx->sentinal_pid = 0; driver_ctx->sentinal_idhp = NULL; driver_ctx->svagent_pid = 0; driver_ctx->svagent_idhp = NULL; INM_RW_SEM_INIT(&driver_ctx->dc_inmaops_sem); INM_INIT_LIST_HEAD(&driver_ctx->dc_inma_ops_list); INM_INIT_LIST_HEAD(&driver_ctx->recursive_writes_meta_list); INM_INIT_SPIN_LOCK(&driver_ctx->recursive_writes_meta_list_lock); /* initialize work queue context */ r = init_work_queue(&driver_ctx->wqueue, NULL); if(r) goto free_cache_pools; /* Per IO stamp file */ snprintf(driver_ctx->driver_time_stamp, INM_PATH_MAX , "%s/%s/GlobalTimeStamp", PERSISTENT_DIR, COMMON_ATTR_NAME); /* Per IO stamp seqno file */ snprintf(driver_ctx->driver_time_stamp_seqno, INM_PATH_MAX , "%s/%s/SequenceNumber", PERSISTENT_DIR, COMMON_ATTR_NAME); /* Consistency Point */ INM_ATOMIC_SET(&driver_ctx->is_iobarrier_on, 0); INM_INIT_LIST_HEAD(&driver_ctx->freeze_vol_list); driver_ctx->dc_cp = INM_CP_NONE; INM_MEM_ZERO(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); INM_INIT_SEM(&driver_ctx->dc_cp_mutex); driver_ctx->dc_lcw_aops = NULL; driver_ctx->dc_lcw_rhdl = NULL; driver_ctx->dc_lcw_rflag = 0; driver_ctx->dc_root_disk = NULL; INM_INIT_SPIN_LOCK(&driver_ctx->dc_tel.dt_dbs_slock); telemetry_set_dbs(&driver_ctx->dc_tel.dt_blend, DBS_DRIVER_NOREBOOT_MODE); 
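	/*
	 * Error handling note: failures from this point unwind through the
	 * goto chain at the bottom of this function (cleanup_wqueue, then
	 * free_cache_pools, deinit_locks and finally free_dc), releasing
	 * resources in reverse order of acquisition; each label therefore
	 * undoes only what was initialized before its goto.
	 */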
INM_INIT_SPIN_LOCK(&driver_ctx->dc_vm_cx_session_lock); INM_INIT_WAITQUEUE_HEAD(&driver_ctx->dc_vm_cx_session_waitq); INM_INIT_LIST_HEAD(&driver_ctx->dc_disk_cx_stats_list); INM_INIT_LIST_HEAD(&driver_ctx->dc_disk_cx_sess_list); driver_ctx->dc_num_consecutive_tags_failed = VCS_NUM_CONSECTIVE_TAG_FAILURES_ALLOWED; driver_ctx->dc_max_fwd_timejump_ms = FORWARD_TIMEJUMP_ALLOWED; driver_ctx->dc_max_bwd_timejump_ms = BACKWARD_TIMEJUMP_ALLOWED; INM_INIT_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); INM_ATOMIC_SET(&driver_ctx->dc_nr_tag_commit_status_pending_disks, 0); INM_ATOMIC_SET(&driver_ctx->dc_tag_commit_status_failed, 0); INM_INIT_WAITQUEUE_HEAD(&driver_ctx->dc_tag_commit_status_waitq); /* initialize timer queue context */ r = init_work_queue(&driver_ctx->dc_tqueue, timer_worker); if(r) goto cleanup_wqueue; telemetry_check_time_jump(); get_time_stamp(&(driver_ctx->dc_tel.dt_drv_load_time)); driver_ctx->dc_verifier_on = 0; INM_INIT_SPIN_LOCK(&driver_ctx->dc_verifier_lock); driver_ctx->dc_verifier_area = NULL; return 0; cleanup_wqueue: cleanup_work_queue(&driver_ctx->wqueue); free_cache_pools: INM_DESTROY_SPIN_LOCK(&driver_ctx->clean_shutdown_lock); dealloc_cache_pools(); deinit_locks: INM_DESTROY_SPIN_LOCK(&driver_ctx->page_pool_lock); INM_DESTROY_SPIN_LOCK(&driver_ctx->log_lock); INM_RW_SEM_DESTROY(&driver_ctx->tgt_list_sem); INM_RW_SEM_DESTROY(&driver_ctx->tag_guid_list_sem); #ifdef INM_AIX INM_DESTROY_SEM(&driver_ctx->dc_mxs_sem); free_bio_cache: #endif #ifndef INM_AIX if(driver_ctx->dc_host_info.bio_info_cache){ INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.bio_info_cache); } #endif if(driver_ctx->dc_host_info.mirror_bioinfo_cache){ INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.mirror_bioinfo_cache); } #ifdef INM_AIX INM_DESTROY_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn)); INM_DESTROY_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_cdb_dev_list_lock)); INM_DESTROY_SPIN_LOCK(&driver_ctx->tgt_list_lock); INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.data_file_node_cache); #endif free_dc: INM_KFREE(driver_ctx, sizeof(*driver_ctx), INM_PINNED_HEAP); driver_ctx = NULL; return r; } void free_driver_context(void) { struct inm_list_head *lp = NULL, *np = NULL; INM_BUG_ON(!inm_list_empty(&driver_ctx->dc_disk_cx_sess_list)); while (!inm_list_empty(&driver_ctx->dc_disk_cx_stats_list)) { disk_cx_stats_info_t *disk_cx_stats_info; disk_cx_stats_info = inm_list_entry(driver_ctx->dc_disk_cx_stats_list.next, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); INM_KFREE(disk_cx_stats_info, sizeof(disk_cx_stats_info_t), INM_KERNEL_HEAP); } free_data_flt_ctxt(&driver_ctx->data_flt_ctx); cleanup_work_queue(&driver_ctx->dc_tqueue); cleanup_work_queue(&driver_ctx->wqueue); /* freeing inma_ops */ INM_DOWN_WRITE(&driver_ctx->dc_inmaops_sem); inm_list_for_each_safe(lp, np, &driver_ctx->dc_inma_ops_list) { inma_ops_t *t_inma_opsp = NULL; inm_list_del(lp); t_inma_opsp = (inma_ops_t *) inm_list_entry(lp, inma_ops_t, ia_list); inm_free_inma_ops(t_inma_opsp); } INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); /* free the bitmap work item, work queue entry pools */ dealloc_cache_pools(); /* Destroy tag guid list if exists */ INM_DOWN_WRITE(&driver_ctx->tag_guid_list_sem); while(!inm_list_empty(&driver_ctx->tag_guid_list)){ tag_guid_t *tag_guid = inm_list_entry(driver_ctx->tag_guid_list.next, tag_guid_t, tag_list); inm_list_del(&tag_guid->tag_list); INM_WAKEUP_INTERRUPTIBLE(&tag_guid->wq); flt_cleanup_sync_tag(tag_guid); } INM_UP_WRITE(&driver_ctx->tag_guid_list_sem); 
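	/*
	 * Teardown sketch of the pop-front idiom used for the lists above
	 * (disk cx stats, inma_ops, tag GUIDs); the generic shape, with
	 * illustrative names:
	 *
	 *     while (!inm_list_empty(&head)) {
	 *         item_t *it = inm_list_entry(head.next, item_t, link);
	 *         inm_list_del(&it->link);
	 *         free_item(it);
	 *     }
	 *
	 * Each entry is unlinked before it is freed, and waiters are woken
	 * first where needed, as the tag GUID loop does with its wait queue.
	 */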
INM_DESTROY_COMPLETION(&driver_ctx->shutdown_completion); INM_RW_SEM_DESTROY(&driver_ctx->dc_inmaops_sem); INM_DESTROY_SEM(&driver_ctx->tag_sem); INM_DESTROY_SPIN_LOCK(&driver_ctx->time_stamp_lock); INM_DESTROY_SPIN_LOCK(&driver_ctx->dc_host_info.rq_list_lock); INM_DESTROY_SPIN_LOCK(&driver_ctx->page_pool_lock); INM_DESTROY_SPIN_LOCK(&driver_ctx->log_lock); INM_RW_SEM_DESTROY(&driver_ctx->tgt_list_sem); #ifdef INM_AIX free_all_at_lun_entries(); free_all_mxs_entries(); INM_DESTROY_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn)); INM_DESTROY_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_cdb_dev_list_lock)); INM_DESTROY_SPIN_LOCK(&driver_ctx->tgt_list_lock); INM_DESTROY_SEM(&driver_ctx->dc_mxs_sem); INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.data_file_node_cache); #endif INM_RW_SEM_DESTROY(&driver_ctx->tag_guid_list_sem); INM_DESTROY_SPIN_LOCK(&driver_ctx->recursive_writes_meta_list_lock); if(driver_ctx) { inm_flush_clean_shutdown(CLEAN_SHUTDOWN); INM_DESTROY_SPIN_LOCK(&driver_ctx->clean_shutdown_lock); #ifndef INM_AIX INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.bio_info_cache); #endif INM_KMEM_CACHE_DESTROY(driver_ctx->dc_host_info.mirror_bioinfo_cache); INM_KFREE(driver_ctx, sizeof(*driver_ctx), INM_PINNED_HEAP); driver_ctx = NULL; } } void add_tc_to_dc(target_context_t *tc) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } tc->tc_hist.ths_start_flt_ts = INM_GET_CURR_TIME_IN_SEC; inm_list_add_tail(&tc->tc_list, &driver_ctx->tgt_list); driver_ctx->total_prot_volumes++; if(tc->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ driver_ctx->mirror_prot_volumes++; }else { driver_ctx->host_prot_volumes++; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } void remove_tc_from_dc(target_context_t *tc) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); inm_list_del(&tc->tc_list); driver_ctx->total_prot_volumes--; if(tc->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ driver_ctx->mirror_prot_volumes--; }else { driver_ctx->host_prot_volumes--; } wake_up_tc_state(tc); INM_UP_WRITE(&driver_ctx->tgt_list_sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } /* allocate global pools */ inm_s32_t alloc_cache_pools() { inm_s32_t err = 0; inm_s32_t threshold = 256, i = 0; inm_page_t *pg = NULL; while (i < threshold) { pg = (inm_page_t *) INM_KMALLOC(sizeof(inm_page_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!pg) { err = -ENOMEM; goto next; } INM_MEM_ZERO(pg, sizeof(inm_page_t)); if(INM_PIN(pg, sizeof(inm_page_t))){ err = -ENOMEM; INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); goto next; } pg->cur_pg = (unsigned long *)__INM_GET_FREE_PAGE(INM_KM_SLEEP, INM_KERNEL_HEAP); if (!pg->cur_pg) { err = -ENOMEM; INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); goto next; } if(INM_PIN(pg->cur_pg, INM_PAGESZ)){ err = -ENOMEM; INM_FREE_PAGE(pg->cur_pg, INM_KERNEL_HEAP); INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); goto next; } inm_list_add_tail(&pg->entry, &driver_ctx->page_pool); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) pg->flags = METAPAGE_ALLOCED_FROM_POOL; #endif i++; } driver_ctx->dc_res_cnode_pgs = threshold; driver_ctx->dc_bmap_info.bitmap_work_item_pool = INM_KMEM_CACHE_CREATE("BITMAP_WORK_ITEM_POOL", sizeof(bitmap_work_item_t), 0, INM_SLAB_HWCACHE_ALIGN, NULL, NULL, INM_MAX_NR_BITMAP_WORK_ITEM_POOL, 
				INM_MIN_NR_BITMAP_WORK_ITEM_POOL,
				INM_UNPINNED);
	if (IS_ERR(driver_ctx->dc_bmap_info.bitmap_work_item_pool)) {
		err = -ENOMEM;
		goto next;
	} else
		driver_ctx->flags |= DC_FLAGS_BITMAP_WORK_ITEM_POOL_INIT;

	driver_ctx->wq_entry_pool = INM_KMEM_CACHE_CREATE("WQ_ENTRY_POOL",
				sizeof(wqentry_t), 0, INM_SLAB_HWCACHE_ALIGN,
				NULL, NULL, INM_MAX_NR_WQ_ENTRY_POOL,
				INM_MIN_NR_WQ_ENTRY_POOL, INM_PINNED);
	if (IS_ERR(driver_ctx->wq_entry_pool))
		err = -ENOMEM;
	else
		driver_ctx->flags |= DC_FLAGS_WORKQUEUE_ENTRIES_POOL_INIT;

next:
	if (err)
		dealloc_cache_pools();

	return err;
}

inm_s32_t dealloc_cache_pools()
{
	inm_s32_t err = 0;
	struct inm_list_head *ptr = NULL, *nextptr = NULL;
	inm_page_t *pg = NULL;

	inm_list_for_each_safe(ptr, nextptr, &driver_ctx->page_pool) {
		pg = inm_list_entry(ptr, inm_page_t, entry);
		inm_list_del(&pg->entry);
		INM_UNPIN(pg->cur_pg, INM_PAGESZ);
		INM_FREE_PAGE(pg->cur_pg, INM_KERNEL_HEAP);
		INM_UNPIN(pg, sizeof(inm_page_t));
		INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP);
	}

	if (driver_ctx->flags & DC_FLAGS_BITMAP_WORK_ITEM_POOL_INIT) {
		INM_KMEM_CACHE_DESTROY(driver_ctx->dc_bmap_info.bitmap_work_item_pool);
	}
	if (driver_ctx->flags & DC_FLAGS_WORKQUEUE_ENTRIES_POOL_INIT) {
		INM_KMEM_CACHE_DESTROY(driver_ctx->wq_entry_pool);
	}
	driver_ctx->flags &= ~(DC_FLAGS_BITMAP_WORK_ITEM_POOL_INIT |
				DC_FLAGS_WORKQUEUE_ENTRIES_POOL_INIT);
	return err;
}

void service_shutdown_completion(void)
{
	driver_ctx->service_state = SERVICE_SHUTDOWN;
	info("service got shutdown");
	driver_ctx->flags |= DC_FLAGS_SERVICE_STATE_CHANGED;
	INM_ATOMIC_INC(&driver_ctx->service_thread.wakeup_event_raised);
	INM_WAKEUP_INTERRUPTIBLE(&driver_ctx->service_thread.wakeup_event);
	INM_COMPLETE(&driver_ctx->service_thread._new_event_completion);
}

void inm_svagent_exit(void)
{
	info("Service exiting pid = %d", driver_ctx->svagent_pid);
	get_time_stamp(&(driver_ctx->dc_tel.dt_svagent_stop_time));
	telemetry_set_dbs(&driver_ctx->dc_tel.dt_blend, DBS_SERVICE_STOPPED);
	driver_ctx->svagent_pid = 0;
	driver_ctx->svagent_idhp = NULL;
	service_shutdown_completion();
	update_cx_product_issue(VCS_CX_SVAGENT_EXIT);
}

void inm_s2_exit(void)
{
	target_context_t *tgt_ctxt = NULL;
	change_node_t *chg_node;
	struct inm_list_head *curp = NULL, *nxtp = NULL;

	dbg("Drainer exiting pid = %d", driver_ctx->sentinal_pid);
	get_time_stamp(&(driver_ctx->dc_tel.dt_s2_stop_time));
	telemetry_set_dbs(&driver_ctx->dc_tel.dt_blend, DBS_S2_STOPPED);
	start_notify_completion();
	update_cx_product_issue(VCS_CX_S2_EXIT);
	reset_s2_latency_time();

retry:
	INM_DOWN_READ(&driver_ctx->tgt_list_sem);
	inm_list_for_each_safe(curp, nxtp, &driver_ctx->tgt_list){
		tgt_ctxt = inm_list_entry(curp, target_context_t, tc_list);
		volume_lock(tgt_ctxt);
		chg_node = tgt_ctxt->tc_pending_confirm;
		if (chg_node) {
			chg_node->flags &= ~CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2;
			/* Reset the closed state of a change node in non write
			 * order mode: the drainer may get killed with the
			 * currently mapped metadata change node, and in the
			 * meantime a data mode change node becomes available
			 * before the drainer comes back up and calls getdb
			 * again; due to the perf changes we would then serve
			 * the data mode change node instead. Hence the
			 * currently closed metadata change node needs to reset
			 * its start and end time stamps during its next getdb
			 * call, otherwise we would see an OOD issue.
			 */
			if (tgt_ctxt->tc_optimize_performance &
				(PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) &&
				(chg_node->type == NODE_SRC_METADATA) &&
				(chg_node->flags & CHANGE_NODE_IN_NWO_CLOSED)) {
				chg_node->flags &= ~(CHANGE_NODE_IN_NWO_CLOSED);
				chg_node->transaction_id = 0;
				--tgt_ctxt->tc_transaction_id;
				if
(tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("drainer_exit: Reset metadata closed state chg_node:%p mode:%d" "delta ts:%llu seq:%llu tc_cur_node:%p", chg_node,chg_node->type, chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 - chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601, chg_node->changes.end_ts.ullSequenceNumber - chg_node->changes.start_ts.ullSequenceNumber, tgt_ctxt->tc_cur_node); } } } tgt_ctxt->tc_pending_confirm = NULL; volume_unlock(tgt_ctxt); if (chg_node) { if (chg_node->flags & CHANGE_NODE_ORPHANED) { if (chg_node->type == NODE_SRC_TAGS) telemetry_log_tag_history(chg_node, tgt_ctxt, ecTagStatusDropped, ecOrphan, ecMsgTagDropped); commit_change_node(chg_node); } else { INM_DOWN(&chg_node->mutex); chg_node->mapped_thread = NULL; #ifdef INM_AIX chg_node->mapped_address = 0; #else #ifdef INM_LINUX if(current->mm) #endif unmap_change_node(chg_node); #endif INM_UP(&chg_node->mutex); deref_chg_node(chg_node); } INM_UP_READ(&driver_ctx->tgt_list_sem); put_tgt_ctxt(tgt_ctxt); goto retry; } } INM_UP_READ(&driver_ctx->tgt_list_sem); /* de-initialize and free the associated resources * related to mapped dblk*/ driver_ctx->sentinal_pid = 0; driver_ctx->sentinal_idhp = NULL; } involflt-0.1.0/src/data-file-mode.c0000755000000000000000000005047514467303177015600 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : data-file-mode.c * * Description: Data File Mode support. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "utils.h" #include "tunable_params.h" #include "svdparse.h" #include "file-io.h" #include "osdep.h" extern driver_context_t *driver_ctx; extern void finalize_data_stream(change_node_t *); static_inline void set_next_file_thread(data_file_flt_t *flt_ctxt) { INM_BUG_ON(flt_ctxt->next_thr == NULL); if(flt_ctxt->next_thr->next.next == &(flt_ctxt->dfm_thr_hd)) { flt_ctxt->next_thr = DFM_THREAD_ENTRY(flt_ctxt->dfm_thr_hd.next); } else { flt_ctxt->next_thr = DFM_THREAD_ENTRY(flt_ctxt->next_thr->next.next); } } data_file_node_t * inm_alloc_data_file_node(inm_u32_t flags) { data_file_node_t *file_node = NULL; #ifndef INM_AIX file_node = INM_KMALLOC(sizeof(data_file_node_t), flags, INM_KERNEL_HEAP); #else file_node = INM_KMEM_CACHE_ALLOC( driver_ctx->dc_host_info.data_file_node_cache, flags); #endif return file_node; } void inm_free_data_file_node(data_file_node_t *file_node) { #ifndef INM_AIX INM_KFREE(file_node, sizeof(data_file_node_t), INM_KERNEL_HEAP); #else INM_KMEM_CACHE_FREE(driver_ctx->dc_host_info.data_file_node_cache, file_node); #endif } inm_s32_t queue_chg_node_to_file_thread(target_context_t *tgt_ctxt, change_node_t *node) { data_file_flt_t *flt_ctxt = &tgt_ctxt->tc_dfm; data_file_thread_t *thr = flt_ctxt->next_thr; data_file_node_t *file_node = NULL; unsigned long lock_flag = 0; file_node = inm_alloc_data_file_node(INM_KM_NOSLEEP); if(!file_node) return 0; INM_ATOMIC_INC(&thr->pending); tgt_ctxt->tc_stats.num_pgs_in_dfm_queue += node->changes.num_data_pgs; ref_chg_node(node); file_node->chg_node = node; INM_SPIN_LOCK_IRQSAVE(&thr->wq_list_lock, lock_flag); inm_list_add_tail(&file_node->next, &thr->wq_hd); INM_SPIN_UNLOCK_IRQRESTORE(&thr->wq_list_lock, lock_flag); set_next_file_thread(flt_ctxt); #ifdef INM_AIX INM_COMPLETE(&thr->compl); #else INM_UP(&thr->mutex); #endif return 1; } inm_s32_t inm_unlink_datafile(target_context_t *tgt_ctxt, char *data_file_name) { inm_s32_t error = 0; char *parent = NULL; if (!get_path_memory(&parent)) { err("Cannot alloc parent mem: %s", data_file_name); error = -ENOMEM; } else { sprintf_s(parent, INM_PATH_MAX, "%s/%s", tgt_ctxt->tc_data_log_dir, tgt_ctxt->tc_datafile_dir_name); dbg("Parent: %s", parent); error = inm_unlink(data_file_name, parent); free_path_memory(&parent); } return error; } static_inline char *generate_data_file_name(change_node_t *node, target_context_t *tgt_ctxt) { char *filename = NULL; char *log_dir = NULL, *fmt_strp = NULL; unsigned long lock_flag = 0; filename = (char*)INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!filename) return NULL; log_dir = (char *)INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!log_dir) { INM_KFREE(filename, INM_PATH_MAX, INM_KERNEL_HEAP); return NULL; } INM_SPIN_LOCK_IRQSAVE(&tgt_ctxt->tc_tunables_lock, lock_flag); if (strcpy_s(log_dir, INM_PATH_MAX, tgt_ctxt->tc_data_log_dir)) { INM_SPIN_UNLOCK_IRQRESTORE(&tgt_ctxt->tc_tunables_lock, lock_flag); INM_KFREE(filename, INM_PATH_MAX, INM_KERNEL_HEAP); INM_KFREE(log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); return NULL; } INM_SPIN_UNLOCK_IRQRESTORE(&tgt_ctxt->tc_tunables_lock, lock_flag); if(0 == (tgt_ctxt->tc_flags & VCF_DATAFILE_DIR_CREATED)) { 
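		/* First data file for this volume: create the data log
		 * directory and the per-volume
		 * <log_dir>/<tc_datafile_dir_name> subdirectory (mode 0700)
		 * once, then latch VCF_DATAFILE_DIR_CREATED under the volume
		 * lock so later writers skip the mkdir calls. */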
inm_mkdir(log_dir, 0700); snprintf(filename, 2048, "%s/%s", log_dir, tgt_ctxt->tc_datafile_dir_name); inm_mkdir(filename, 0700); volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_DATAFILE_DIR_CREATED; volume_unlock(tgt_ctxt); } if (node->flags & (KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE | KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE)) { if (ecWriteOrderStateData == node->wostate) fmt_strp = "%s/%s/pre_completed_diff_S%llu_%llu_E%llu_%llu_WC%llu.dat"; else fmt_strp = "%s/%s/pre_completed_diff_S%llu_%llu_E%llu_%llu_MC%llu.dat"; } else { if (ecWriteOrderStateData == node->wostate) fmt_strp = "%s/%s/pre_completed_diff_S%llu_%llu_E%llu_%llu_WE%llu.dat"; else fmt_strp = "%s/%s/pre_completed_diff_S%llu_%llu_E%llu_%llu_ME%llu.dat"; } snprintf(filename, 2048, fmt_strp, log_dir, tgt_ctxt->tc_datafile_dir_name, node->changes.start_ts.TimeInHundNanoSecondsFromJan1601, node->changes.start_ts.ullSequenceNumber, node->changes.end_ts.TimeInHundNanoSecondsFromJan1601, node->changes.end_ts.ullSequenceNumber, (inm_u64_t)node->seq_id_for_split_io); INM_KFREE(log_dir, INM_PATH_MAX, INM_KERNEL_HEAP); return filename; } #ifdef INM_AIX #include "filter_host.h" static_inline inm_s32_t write_changes_to_file(void *hdl, char *fname, change_node_t *node, inm_u64_t *file_offset) { data_page_t *data_pg = PG_ENTRY(node->data_pg_head.next); inm_s32_t length = get_strm_len(node); char *buffer = NULL; inm_s32_t success = 0; inm_u32_t bytes_written = 0; offset_t file_len; inm_rec_write_meta_t *rec_write_meta; int flag; *file_offset = 0; buffer = INM_KMALLOC_ALIGN(INM_PAGESZ, INM_KM_SLEEP, INM_PAGESHIFT, INM_KERNEL_HEAP); if(!buffer) return success; INM_BUG_ON(inm_list_empty(&node->data_pg_head)); rec_write_meta = INM_KMALLOC(sizeof(inm_rec_write_meta_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!rec_write_meta){ INM_KFREE(buffer, INM_PAGESZ, INM_KERNEL_HEAP); return success; } if(INM_PIN(rec_write_meta, sizeof(inm_rec_write_meta_t))){ INM_KFREE(buffer, INM_PAGESZ, INM_KERNEL_HEAP); INM_KFREE(rec_write_meta, sizeof(inm_rec_write_meta_t), INM_KERNEL_HEAP); return success; } INM_MEM_ZERO(rec_write_meta, sizeof(inm_rec_write_meta_t)); rec_write_meta->buf = buffer; rec_write_meta->len = INM_PAGESZ; rec_write_meta->pid = getpid(); INM_SPIN_LOCK(&driver_ctx->recursive_writes_meta_list_lock, flag); inm_list_add_tail(&rec_write_meta->list, &driver_ctx->recursive_writes_meta_list); INM_SPIN_UNLOCK(&driver_ctx->recursive_writes_meta_list_lock, flag); file_len = length; while(length > 0) { inm_s32_t len; char *src; INM_MEM_ZERO(buffer, INM_PAGESZ); len = MIN(INM_PAGESZ, length); INM_PAGE_MAP(src, data_pg->page, KM_USER0); if (memcpy_s(buffer, len, src, len)) { INM_PAGE_UNMAP(src, data_pg->page, KM_USER0); break; } INM_PAGE_UNMAP(src, data_pg->page, KM_USER0); rec_write_meta->initialised = 1; if(!flt_write_file(hdl, buffer, *file_offset, INM_PAGESZ, &bytes_written)){ break; } data_pg = PG_ENTRY(data_pg->next.next); length -= len; *file_offset += len; } INM_SPIN_LOCK(&driver_ctx->recursive_writes_meta_list_lock, flag); inm_list_del(&rec_write_meta->list); INM_SPIN_UNLOCK(&driver_ctx->recursive_writes_meta_list_lock, flag); if(length == 0){ struct file *file = hdl; VNOP_FTRUNC(file->f_vnode, file->f_flag, file_len, NULL, file->f_cred); success = 1; } INM_UNPIN(rec_write_meta, sizeof(inm_rec_write_meta_t)); INM_KFREE(rec_write_meta, sizeof(inm_rec_write_meta_t), INM_KERNEL_HEAP); INM_KFREE(buffer, INM_PAGESZ, INM_KERNEL_HEAP); return success; } #else static_inline inm_s32_t write_changes_to_file(void *hdl, char *fname, change_node_t *node, inm_u64_t 
*file_offset) { data_page_t *data_pg = PG_ENTRY(node->data_pg_head.next); inm_s32_t length = get_strm_len(node); char *buffer = NULL; inm_s32_t success = 0; inm_u32_t bytes_written = 0; *file_offset = 0; buffer = INM_KMALLOC(INM_PAGESZ, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!buffer) return success; INM_BUG_ON(inm_list_empty(&node->data_pg_head)); while(length > 0) { inm_s32_t len; char *src; len = MIN(INM_PAGESZ, length); INM_PAGE_MAP(src, data_pg->page, KM_USER0); if (memcpy_s(buffer, len, src, len)) { INM_PAGE_UNMAP(src, data_pg->page, KM_USER0); break; } INM_PAGE_UNMAP(src, data_pg->page, KM_USER0); if(!flt_write_file(hdl, buffer, *file_offset, len, &bytes_written)) break; data_pg = PG_ENTRY(data_pg->next.next); length -= len; *file_offset += len; } if(length == 0) success = 1; INM_KFREE(buffer, INM_PAGESZ, INM_KERNEL_HEAP); return success; } #endif static_inline void write_file_node(data_file_node_t *file_node, target_context_t *tgt_ctxt) { change_node_t *node = file_node->chg_node; char *fname = NULL; void *hdl = NULL; inm_u64_t file_sz = 0; inm_s32_t err = 0; INM_DOWN(&node->mutex); do { volume_lock(tgt_ctxt); if((0 == (node->flags & CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE)) || (node->flags & CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2)) { volume_unlock(tgt_ctxt); break; } if(tgt_ctxt->tc_stats.dfm_bytes_to_disk >= tgt_ctxt->tc_data_to_disk_limit) { volume_unlock(tgt_ctxt); break; } if(0 == (node->flags & CHANGE_NODE_DATA_STREAM_FINALIZED)) finalize_data_stream(node); volume_unlock(tgt_ctxt); fname = generate_data_file_name(file_node->chg_node, tgt_ctxt); if(!fname) { break; } #ifdef INM_AIX if(!flt_open_data_file(fname, (INM_RDWR | INM_CREAT | INM_DIRECT), &hdl)) { #else if(!flt_open_data_file(fname, (INM_RDWR | INM_CREAT), &hdl)) { #endif err = 1; break; } if(!write_changes_to_file(hdl, fname, node, &file_sz)) err = 1; #ifndef INM_AIX inm_restore_org_addr_space_ops(INM_HDL_TO_INODE(hdl)); #endif INM_CLOSE_FILE(hdl, (INM_RDWR | INM_CREAT)); hdl = NULL; if (!err) { volume_lock(tgt_ctxt); if (node->flags & CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2) { volume_unlock(tgt_ctxt); inm_unlink_datafile(tgt_ctxt, fname); break; } tgt_ctxt->tc_stats.num_pages_allocated -= node->changes.num_data_pgs; inm_rel_data_pages(tgt_ctxt, &node->data_pg_head, node->changes.num_data_pgs); INM_INIT_LIST_HEAD(&node->data_pg_head); node->changes.num_data_pgs = 0; node->type = NODE_SRC_DATAFILE; tgt_ctxt->tc_stats.dfm_bytes_to_disk += file_sz; node->data_file_name = fname; node->data_file_size = file_sz; INM_ATOMIC_INC(&tgt_ctxt->tc_stats.num_dfm_files); INM_ATOMIC_INC(&tgt_ctxt->tc_stats.num_dfm_files_pending); volume_unlock(tgt_ctxt); } else { inm_unlink_datafile(tgt_ctxt, fname); } } while(0); volume_lock(tgt_ctxt); node->flags &= ~CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE; tgt_ctxt->tc_stats.num_pgs_in_dfm_queue -= node->changes.num_data_pgs; volume_unlock(tgt_ctxt); if (err && fname) INM_KFREE(fname, INM_PATH_MAX, INM_KERNEL_HEAP); INM_UP(&node->mutex); deref_chg_node(node); file_node->chg_node = NULL; } static_inline void process_file_writer_op(data_file_thread_t * thr) { data_file_node_t *file_node = NULL; unsigned long lock_flag = 0; INM_SPIN_LOCK_IRQSAVE(&thr->wq_list_lock, lock_flag); INM_BUG_ON(inm_list_empty(&thr->wq_hd)); file_node = inm_list_entry(thr->wq_hd.next, data_file_node_t, next); inm_list_del(thr->wq_hd.next); INM_SPIN_UNLOCK_IRQRESTORE(&thr->wq_list_lock, lock_flag); write_file_node(file_node, thr->ctxt); if (file_node){ inm_free_data_file_node(file_node); } } #ifdef INM_AIX void 
data_file_writer_thread_func(int flag, void *args, int arg_len) { data_file_thread_t *thr = *(data_file_thread_t **)args; #else int data_file_writer_thread_func(void *args) { data_file_thread_t *thr = (data_file_thread_t *)args; #endif dbg("thr = %p", thr); INM_DAEMONIZE("fw%d", inm_dev_id_get((target_context_t*)thr->ctxt), thr->id); INM_SET_USER_NICE(20); INM_ATOMIC_INC(&thr->pending); dbg("Data File Writer Thread %d started", thr->id); INM_COMPLETE(&thr->ctxt->exit); for(;;) { #ifdef INM_AIX INM_WAIT_FOR_COMPLETION(&thr->compl); #else INM_DOWN(&thr->mutex); #endif if(!INM_ATOMIC_READ(&thr->pending)) break; process_file_writer_op(thr); if(INM_ATOMIC_DEC_AND_TEST(&thr->pending)) break; } dbg("Data File Writer Thread with id %d exiting", thr->id); /* signal the process waiting for completion. */ INM_COMPLETE(&thr->exit); return 0; } inm_s32_t is_data_files_enabled(target_context_t *tgt_ctxt) { if(!driver_ctx->service_supports_data_filtering) return 0; if(!driver_ctx->tunable_params.enable_data_file_mode) return 0; if(tgt_ctxt->tc_flags & VCF_DATA_FILES_DISABLED) return 0; return 1; } inm_s32_t init_data_file_flt_ctxt(target_context_t *tgt_ctxt) { data_file_thread_t *thr; inm_s32_t thr_created = 0, thr_to_create = 0; data_file_flt_t *flt_ctxt = &tgt_ctxt->tc_dfm; #ifdef INM_AIX char pname[30]; #endif if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p volume:%s",tgt_ctxt, tgt_ctxt->tc_guid); } INM_INIT_LIST_HEAD(&flt_ctxt->dfm_thr_hd); INM_INIT_SEM(&flt_ctxt->list_mutex); flt_ctxt->num_dfm_threads = 0; flt_ctxt->next_thr = NULL; INM_ATOMIC_SET(&flt_ctxt->terminating, 0); if (tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP || !is_data_files_enabled(tgt_ctxt)) { return 0; } thr_to_create = DEFAULT_NUMBER_OF_FILEWRITERS_PER_VOLUME; INM_DOWN(&flt_ctxt->list_mutex); /* Create threads */ while(thr_created != thr_to_create) { thr = (data_file_thread_t *)INM_KMALLOC(sizeof(data_file_thread_t), INM_KM_NOSLEEP, INM_PINNED_HEAP); if(!thr) { err("Failed to allocate memory for data_file_thread_t"); break; } thr->id = thr_created; dbg("DFM id %d", thr->id); INM_ATOMIC_SET(&thr->pending, 0); inm_list_add(&thr->next, &flt_ctxt->dfm_thr_hd); INM_INIT_COMPLETION(&thr->exit); INM_INIT_COMPLETION(&tgt_ctxt->exit); #ifdef INM_AIX INM_INIT_COMPLETION(&thr->compl); #else INM_INIT_SEM_LOCKED(&thr->mutex); #endif INM_INIT_SPIN_LOCK(&thr->wq_list_lock); INM_INIT_LIST_HEAD(&thr->wq_hd); get_tgt_ctxt(tgt_ctxt); thr->ctxt = (void *)tgt_ctxt; #ifdef INM_SOLARIS if(INM_KERNEL_THREAD(data_file_writer_thread_func, thr, 0, 0) == NULL) { #endif #ifdef INM_LINUX if(INM_KERNEL_THREAD(thr->thread_task, data_file_writer_thread_func, thr, 0, "fw%d", inm_dev_id_get(tgt_ctxt)) < 0) { #endif #ifdef INM_AIX sprintf(pname, "fw%d", inm_dev_id_get(tgt_ctxt)); info("process name = %s", pname); if(INM_KERNEL_THREAD(data_file_writer_thread_func, &thr, sizeof(thr), pname) < 0) { #endif inm_list_del(&thr->next); put_tgt_ctxt(tgt_ctxt); INM_KFREE(thr, sizeof(data_file_thread_t), INM_PINNED_HEAP); err("Failed to create data file writer thread"); break; } INM_WAIT_FOR_COMPLETION(&tgt_ctxt->exit); thr_created++; } INM_UP(&flt_ctxt->list_mutex); if (thr_created > 0) { flt_ctxt->next_thr = DFM_THREAD_ENTRY(flt_ctxt->dfm_thr_hd.next); flt_ctxt->num_dfm_threads = thr_created; return 0; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving ctx:%p volume:%s",tgt_ctxt, tgt_ctxt->tc_guid); } return 1; } void flush_work_items(target_context_t *tgt_ctxt, data_file_thread_t *thr) { unsigned 
long lock_flag;
	struct inm_list_head *ptr, *hd, *nextptr;
	data_file_node_t *file_node = NULL;

	INM_SPIN_LOCK_IRQSAVE(&thr->wq_list_lock, lock_flag);
	hd = &thr->wq_hd;
	inm_list_for_each_safe(ptr, nextptr, hd) {
		change_node_t *node = NULL;

		file_node = inm_list_entry(ptr, data_file_node_t, next);
		node = file_node->chg_node;
		inm_list_del(ptr);
		INM_ATOMIC_DEC(&thr->pending);
		INM_SPIN_LOCK_IRQSAVE(&tgt_ctxt->tc_lock,
						tgt_ctxt->tc_lock_flag);
		node->flags &= ~CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE;
		tgt_ctxt->tc_stats.num_pgs_in_dfm_queue -=
						node->changes.num_data_pgs;
		INM_SPIN_UNLOCK_IRQRESTORE(&tgt_ctxt->tc_lock,
						tgt_ctxt->tc_lock_flag);
		deref_chg_node(node);
		inm_free_data_file_node(file_node);
	}
	INM_SPIN_UNLOCK_IRQRESTORE(&thr->wq_list_lock, lock_flag);
}

void free_data_file_flt_ctxt(target_context_t *tgt_ctxt)
{
	struct inm_list_head *ptr, *hd, *nextptr;
	data_file_thread_t *thr;
	data_file_flt_t *flt_ctxt = &tgt_ctxt->tc_dfm;

	if (tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP ||
					!is_data_files_enabled(tgt_ctxt)) {
		return;
	}

	INM_DOWN(&flt_ctxt->list_mutex);
	hd = &(tgt_ctxt->tc_dfm.dfm_thr_hd);
	inm_list_for_each_safe(ptr, nextptr, hd) {
		thr = inm_list_entry(ptr, data_file_thread_t, next);
		flush_work_items(tgt_ctxt, thr);
		if(INM_ATOMIC_DEC_AND_TEST(&thr->pending))
#ifdef INM_AIX
			INM_COMPLETE(&thr->compl);
#else
			INM_UP(&thr->mutex);
#endif
		INM_WAIT_FOR_COMPLETION(&thr->exit);
		INM_KTHREAD_STOP(thr->thread_task);
		inm_list_del(&thr->next);
		put_tgt_ctxt((target_context_t *)thr->ctxt);
		INM_DESTROY_COMPLETION(&thr->exit);
		INM_KFREE(thr, sizeof(data_file_thread_t), INM_PINNED_HEAP);
	}
	INM_DESTROY_COMPLETION(&tgt_ctxt->exit);
	INM_UP(&flt_ctxt->list_mutex);
}

inm_s32_t should_write_to_datafile(target_context_t *tgt_ctxt)
{
	inm_s32_t num_pages_used;
	unsigned long lock_flag = 0;
	inm_u32_t is_vol_thres_hit = 0;
	inm_u32_t is_drv_thres_hit = 0;
	inm_u32_t drv_thres_pages = 0;
	inm_u32_t vol_pg_thres_df = 0;
	inm_u32_t data_pool_size = 0;
	inm_u32_t total_unres_pages = 0;

	if (tgt_ctxt->tc_flags & VCF_VOLUME_STACKED_PARTIALLY)
		return 0;

	/* calculate the volume threshold based on the volume's min data
	 * pool size */
	vol_pg_thres_df =
		(((driver_ctx->tunable_params.volume_percent_thres_for_filewrite) *
		  (tgt_ctxt->tc_reserved_pages)) / 100);

	if((driver_ctx->service_state != SERVICE_RUNNING) ||
				!is_data_files_enabled(tgt_ctxt)) {
		return 0;
	}

	num_pages_used = (tgt_ctxt->tc_stats.num_pages_allocated -
				tgt_ctxt->tc_stats.num_pgs_in_dfm_queue);
	if (num_pages_used > vol_pg_thres_df) {
		if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){
			dbg("Crossed volume data pages threshold for file write");
		}
		is_vol_thres_hit = 1;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	drv_thres_pages =
		driver_ctx->tunable_params.free_pages_thres_for_filewrite;
	data_pool_size = driver_ctx->tunable_params.data_pool_size;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);
	data_pool_size <<= (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);

	/* Ensure that both the driver threshold and the target context
	 * threshold are hit before starting to write into a data file.
	 * Fairness solution after considering various scenarios:
	 * DataPoolSize 64 MB  i) 1 volume ii) 4 volumes
	 * DataPoolSize 256 MB i) 1 volume ii) 4 or more volumes
	 * DataPoolSize X GB   i) 1 volume ii) 4 or more volumes
	 */
	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock,
							lock_flag);
	total_unres_pages = driver_ctx->data_flt_ctx.pages_allocated -
					driver_ctx->dc_cur_unres_pages;
	total_unres_pages = data_pool_size - total_unres_pages;
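	/*
	 * Worked example with illustrative numbers: a 256 MB pool becomes
	 * 65536 4 KB pages after the shift above. If
	 * free_pages_thres_for_filewrite is 6553 (about 10%) and a volume
	 * reserves 8192 pages with volume_percent_thres_for_filewrite = 50,
	 * file-mode writes start only once that volume holds more than 4096
	 * data pages outside the DFM queue AND the pool-wide unreserved free
	 * pages drop to 6553 or fewer; either threshold alone is not enough.
	 */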
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
								lock_flag);

	if (total_unres_pages <= drv_thres_pages) {
		if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){
			dbg("Reached driver cxt free data pages threshold for file write");
		}
		is_drv_thres_hit = 1;
	}

	/* Start writing to a data file only when both the per-volume and
	 * the driver-wide thresholds have been crossed.
	 */
	if (is_vol_thres_hit && is_drv_thres_hit) {
		return 1;
	}

	return 0;
}

inm_s32_t create_datafile_dir_name(target_context_t *ctxt,
						inm_dev_info_t *dev_info)
{
	char *temp = NULL;
	char *ptr, *path;
	inm_s32_t len, i;

	path = ptr = dev_info->d_guid;
	temp = (char *)INM_KMALLOC(INM_GUID_LEN_MAX, INM_KM_SLEEP,
							INM_KERNEL_HEAP);
	if(!temp) {
		err("datafile dir name creation for volume %s failed: No Memory",
									path);
		return 1;
	}

	if(strncmp("/dev/", path, 5) == 0)
		ptr += 5;

	if (strcpy_s(temp, INM_GUID_LEN_MAX, ptr)) {
		INM_KFREE(temp, INM_GUID_LEN_MAX, INM_KERNEL_HEAP);
		return 1;
	}

	len = strlen(temp);
	i = 0;
	while(i < len) {
		if(temp[i] == '/')
			temp[i] = '-';
		i++;
	}

	ctxt->tc_datafile_dir_name = temp;
	return 0;
}
involflt-0.1.0/src/telemetry-exception.h0000755000000000000000000000343014467303177017030 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */

/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#ifndef _TELE_EXC_H
#define _TELE_EXC_H

/* Limit the number of exceptions to limit telemetry data to PAGE_SIZE */
#define INM_EXCEPTION_MAX 5
#define INM_EXCEPTION_BUFSZ 2048

typedef enum _etException {
	excSuccess = 0,		/* Global Exception - MSB 0 */
	/* Per Volume Exception - MSB 1 */
	ecUnsupportedBIO = 2147483649,
	ecResync,
} etException;

typedef struct exception {
	inm_list_head_t e_list;
	char e_tag[INM_GUID_LEN_MAX];
	inm_u64_t e_first_time;
	inm_u64_t e_last_time;
	etException e_error;
	inm_u32_t e_count;
	inm_u64_t e_data;
} exception_t;

typedef struct exception_buf {
	inm_u32_t eb_refcnt;
	inm_u32_t eb_gen;
	char eb_buf[2040]; /* align to 2048 */
} exception_buf_t;

void telemetry_init_exception(void);
void telemetry_set_exception(char *, etException, inm_u64_t);
exception_buf_t *telemetry_get_exception(void);
void telemetry_put_exception(exception_buf_t *);

#endif
involflt-0.1.0/src/osdep.h0000755000000000000000000010133014467303177014132 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */

/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INM_OSDEP_H #define _INM_OSDEP_H #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "distro.h" #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0) #include #else #include #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0) #include #endif #include "involflt.h" #include "inm_mem.h" #include "inm_locks.h" #include "inm_utypes.h" #include "inm_list.h" #include "flt_bio.h" #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,13,0) #include "linux/blk-mq.h" #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) #define __GFP_WAIT __GFP_RECLAIM #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,11,0) #include #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0) typedef struct _lookup { struct path path; } inm_lookup_t; #define INM_HDL_TO_INODE(file) (file_inode(file)) #else typedef struct nameidata inm_lookup_t; #define INM_HDL_TO_INODE(hdlp) (((struct file *)hdlp)->f_dentry->d_inode) #endif typedef struct super_block inm_super_block_t; typedef struct block_device inm_block_device_t; typedef struct timer_list inm_timer_t; typedef unsigned long inm_irqflag_t; struct _change_node; struct _target_context; struct _bitmap_api_tag; struct _iobuffer_tag; struct _tag_info; struct _vol_info { struct inm_list_head next; inm_block_device_t *bdev; inm_super_block_t *sb; }; typedef struct _vol_info vol_info_t; struct _tag_volinfo { struct _target_context *ctxt; struct inm_list_head head; /* contains list of tag_vol_info_t */ }; typedef struct _tag_volinfo tag_volinfo_t; struct _mirror_vol_entry; /* filtering specific section */ typedef struct _dm_bio_info { #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) struct inm_list_head entry; #endif sector_t bi_sector; inm_u32_t bi_flags; inm_u32_t bi_size; #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,14,0) inm_u32_t bi_idx; inm_u32_t bi_bvec_done; #else unsigned short bi_idx; #endif bio_end_io_t *bi_end_io; void *tc; void *bi_private; struct _change_node *bi_chg_node; struct bio *new_bio; struct bio *orig_bio; struct bio *orig_bio_copy; inm_block_device_t *start_mirror_dev; inm_u32_t src_done; inm_s32_t src_error; inm_u32_t dst_done; inm_s32_t dst_error; inm_spinlock_t bio_info_lock; inm_u32_t dm_bio_flags; struct _mirror_vol_entry *dmbioinfo_vol_entry; } dm_bio_info_t; #define BINFO_FLAG_CHAIN 0x1 #define BINFO_ALLOCED_FROM_POOL 0x2 #define BIO_INFO_MPOOL_SIZE (PAGE_SIZE/sizeof(dm_bio_info_t)) #define BIO_BI_SIZE(bio) (((struct bio *)bio)->bi_size) #define BIO_BI_VCNT(bio) (((struct bio *)bio)->bi_vcnt) typedef struct _req_q_info { struct inm_list_head next; inm_atomic_t ref_cnt; inm_atomic_t vol_users; #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 8, 0) && !defined(SLES15SP3) make_request_fn *orig_make_req_fn; #endif struct kobj_type mod_kobj_type; struct kobj_type *orig_kobj_type; struct request_queue *q; int rqi_flags; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) struct blk_mq_ops *orig_mq_ops; struct blk_mq_ops mod_mq_ops; void *tc; #endif } req_queue_info_t; #define INM_STABLE_PAGES_FLAG_SET 0x1 typedef struct _req_q_info inm_stginfo_t; typedef struct file 
inm_devhandle_t; /* Structure definition to store volume info to freeze/thaw */ typedef struct freeze_vol_info { struct inm_list_head freeze_list_entry; char vol_name[TAG_VOLUME_MAX_LENGTH]; inm_block_device_t *bdev; inm_super_block_t *sb; } freeze_vol_info_t; #define INM_MODULE_PUT() module_put(THIS_MODULE) #define INM_TRY_MODULE_GET() try_module_get(THIS_MODULE) extern atomic_t inm_flt_memprint; void *inm_kmalloc(size_t size, int flags); void inm_kfree(size_t size,const void * objp); void *inm_kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags); void inm_kmem_cache_free(struct kmem_cache *cachep, void *objp); void *inm_mempool_alloc(mempool_t *pool, gfp_t gfp_mask); void inm_mempool_free(void *element, mempool_t *pool); void inm_vfree(const void *addr, unsigned long size); void *inm_vmalloc(unsigned long size); struct page *inm_alloc_page(gfp_t gfp_mask); void inm_free_page(unsigned long addr); unsigned long __inm_get_free_page(gfp_t gfp_mask); void __inm_free_page(struct page *page); void freeze_volumes(int, tag_volinfo_t *); void unfreeze_volumes(int, tag_volinfo_t *); void lock_volumes(int, tag_volinfo_t *); void unlock_volumes(int, tag_volinfo_t *); inm_s32_t is_rootfs_ro(void); inm_s32_t map_change_node_to_user(struct _change_node *, struct file *); inm_block_device_t *open_by_dev_path(char *, int); inm_block_device_t *open_by_dev_path_v2(char *, int); #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) #define close_bdev(bdev, mode) blkdev_put(bdev, mode); #else #define close_bdev(bdev, mode) blkdev_put(bdev); #endif inm_s32_t flt_release(struct inode *inode, struct file *filp); inm_s32_t iobuffer_sync_read_physical(struct _iobuffer_tag *iob, inm_s32_t force); inm_s32_t iobuffer_sync_flush_physical(struct _iobuffer_tag *iob); inm_u64_t get_bmfile_granularity(struct _target_context *vcptr); void inm_scst_unregister(struct _target_context *); int inm_path_lookup_parent(const char *, inm_lookup_t *); int inm_path_lookup(const char *, unsigned int, inm_lookup_t *); void inm_path_release(inm_lookup_t *); inm_s32_t inm_get_scsi_id(char *path); /* ioctl specific section */ #define __INM_USER __user #if ((LINUX_VERSION_CODE >= KERNEL_VERSION(5,0,0)) || \ (defined RHEL8 && RHEL_MINOR >= 1)) #define INM_ACCESS_OK(access, arg, size) \ access_ok(arg, size) #else #define INM_ACCESS_OK(access, arg, size) \ access_ok(access, arg, size) #endif #define INM_COPYIN(dst, src, len) copy_from_user(dst, src, len) #define INM_COPYOUT(dst, src, len) copy_to_user(dst, src, len) #define IOCTL_INMAGE_VOLUME_STACKING _IOW(FLT_IOCTL, VOLUME_STACKING_CMD, PROCESS_VOLUME_STACKING_INPUT) #define IOCTL_INMAGE_PROCESS_START_NOTIFY _IOW(FLT_IOCTL, START_NOTIFY_CMD, PROCESS_START_NOTIFY_INPUT) #define IOCTL_INMAGE_SERVICE_SHUTDOWN_NOTIFY _IOW(FLT_IOCTL, SHUTDOWN_NOTIFY_CMD, SHUTDOWN_NOTIFY_INPUT) #define IOCTL_INMAGE_STOP_FILTERING_DEVICE _IOW(FLT_IOCTL, STOP_FILTER_CMD, VOLUME_GUID) #define IOCTL_INMAGE_START_FILTERING_DEVICE _IOW(FLT_IOCTL, START_FILTER_CMD, VOLUME_GUID) #define IOCTL_INMAGE_START_FILTERING_DEVICE_V2 _IOW(FLT_IOCTL, START_FILTER_CMD_V2, inm_dev_info_compat_t) #define IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS _IOW(FLT_IOCTL, COMMIT_DB_CMD, COMMIT_TRANSACTION) #define IOCTL_INMAGE_SET_VOLUME_FLAGS _IOW(FLT_IOCTL, SET_VOL_FLAGS_CMD, VOLUME_FLAGS_INPUT) #define IOCTL_INMAGE_GET_VOLUME_FLAGS _IOR(FLT_IOCTL, GET_VOL_FLAGS_CMD, VOLUME_FLAGS_INPUT) #define IOCTL_INMAGE_WAIT_FOR_DB _IOW(FLT_IOCTL, WAIT_FOR_DB_CMD, WAIT_FOR_DB_NOTIFY) #define IOCTL_INMAGE_CLEAR_DIFFERENTIALS _IOW(FLT_IOCTL, CLEAR_DIFFS_CMD, 
VOLUME_GUID) #define IOCTL_INMAGE_GET_NANOSECOND_TIME _IOWR(FLT_IOCTL, GET_TIME_CMD, long long) #define IOCTL_INMAGE_UNSTACK_ALL _IO(FLT_IOCTL, UNSTACK_ALL_CMD) #define IOCTL_INMAGE_SYS_SHUTDOWN _IOW(FLT_IOCTL, SYS_SHUTDOWN_NOTIFY_CMD, SYS_SHUTDOWN_NOTIFY_INPUT) #define IOCTL_INMAGE_TAG_VOLUME _IOWR(FLT_IOCTL, TAG_CMD, unsigned long) #define IOCTL_INMAGE_SYNC_TAG_VOLUME _IOWR(FLT_IOCTL, SYNC_TAG_CMD, unsigned long) #define IOCTL_INMAGE_GET_TAG_VOLUME_STATUS _IOWR(FLT_IOCTL, SYNC_TAG_STATUS_CMD, unsigned long) #define IOCTL_INMAGE_WAKEUP_ALL_THREADS _IO(FLT_IOCTL, WAKEUP_THREADS_CMD) #define IOCTL_INMAGE_GET_DB_NOTIFY_THRESHOLD _IOWR(FLT_IOCTL, GET_DB_THRESHOLD_CMD, get_db_thres_t ) #define IOCTL_INMAGE_RESYNC_START_NOTIFICATION _IOWR(FLT_IOCTL, RESYNC_START_CMD, RESYNC_START ) #define IOCTL_INMAGE_RESYNC_END_NOTIFICATION _IOWR(FLT_IOCTL, RESYNC_END_CMD, RESYNC_END) #define IOCTL_INMAGE_GET_DRIVER_VERSION _IOWR(FLT_IOCTL, GET_DRIVER_VER_CMD, DRIVER_VERSION) #define IOCTL_INMAGE_SHELL_LOG _IOWR(FLT_IOCTL, GET_SHELL_LOG_CMD, char *) #define IOCTL_INMAGE_AT_LUN_CREATE _IOW(FLT_IOCTL, AT_LUN_CREATE_CMD, LUN_CREATE_INPUT) #define IOCTL_INMAGE_AT_LUN_DELETE _IOW(FLT_IOCTL, AT_LUN_DELETE_CMD, LUN_DELETE_INPUT) #define IOCTL_INMAGE_AT_LUN_LAST_WRITE_VI _IOWR(FLT_IOCTL, AT_LUN_LAST_WRITE_VI_CMD, \ AT_LUN_LAST_WRITE_VI) #define IOCTL_INMAGE_AT_LUN_LAST_HOST_IO_TIMESTAMP _IOWR(FLT_IOCTL, AT_LUN_LAST_HOST_IO_TIMESTAMP_CMD, \ AT_LUN_LAST_HOST_IO_TIMESTAMP) #define IOCTL_INMAGE_AT_LUN_QUERY _IOWR(FLT_IOCTL, AT_LUN_QUERY_CMD, LUN_QUERY_DATA) #define IOCTL_INMAGE_VOLUME_UNSTACKING _IOW(FLT_IOCTL, VOLUME_UNSTACKING_CMD, VOLUME_GUID) #define IOCTL_INMAGE_BOOTTIME_STACKING _IO(FLT_IOCTL, BOOTTIME_STACKING_CMD) /* target context specific sections */ struct _target_context *get_tgt_ctxt_from_bio(struct bio *); struct _target_context *get_tgt_ctxt_from_kobj(struct kobject *); /* proc specific sections */ typedef struct proc_dir_entry inm_proc_dir_entry; /* change node specific sections */ /* This structure holds information about a single data page. Linked list * of data pages is used in data mode filtering. Also, data pages are allocated * for disk changes and tags. */ typedef struct _data_page { /* To link this data page to the caller specified doubly linked list */ struct inm_list_head next; /* Virtual address of the data page is stored here. 
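	 * (Strictly speaking, this is the struct page pointer rather than a
	 * mapped virtual address; callers map it on demand, e.g. through the
	 * INM_PAGE_MAP()/INM_PAGE_UNMAP() wrappers defined later in this
	 * header.)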
	 */
	struct page *page;
} data_page_t;

inm_s32_t alloc_data_pages(struct inm_list_head *, inm_u32_t, inm_u32_t *,
									int);
void free_data_pages(struct inm_list_head *);
void delete_data_pages(inm_u32_t);

#define PG_ENTRY(ptr) (inm_list_entry(ptr, data_page_t, next))

typedef struct task_struct inm_task_struct;

/* data mode specific sections */
#define INM_SET_PAGE_RESERVED(page) SetPageReserved(page)
#define INM_CLEAR_PAGE_RESERVED(page) ClearPageReserved(page)
#define INM_COPY_BIO_DATA_TO_DATA_PAGES(iov, listhd) \
	copy_bio_data_to_data_pages((struct bio *)iov, listhd)
#define INM_COPY_IOVEC_DATA_TO_DATA_PAGES(len, iov, iov_count, listhd) \
	copy_iovec_data_to_data_pages(len, (struct iovec *)iov, \
			iov_count, listhd)

#define INM_GET_WMD(info, wmd) \
	wmd.offset = (info->bi_sector << 9); \
	wmd.length = (info->bi_size - INM_BUF_COUNT(bio))

typedef dm_bio_info_t inm_io_info_t;

#ifdef INM_RECUSIVE_ADSPC
struct inma_ops {
	struct inm_list_head ia_list;
	const inm_address_space_operations_t *ia_mapping;
};
#else
/* file-io specific sections */
struct inma_ops {
	struct inm_list_head ia_list;
	const inm_address_space_operations_t *ia_org_aopsp;
	inm_address_space_operations_t *ia_dup_aopsp;
};
#endif

typedef struct inma_ops inma_ops_t;

/* Debug APIs */
#define dbg(format, arg...) \
	if(IS_DBG_ENABLED(inm_verbosity, INM_DEBUG_ONLY)){ \
		printk(KERN_DEBUG "%s[%s:%d (DBG)]: " format "\n" , DRIVER_NAME , \
			__FUNCTION__, __LINE__, ## arg); \
	}

#define err(format, arg...) \
	printk(KERN_ERR "%s[%s:%d (ERR)]: " format "\n" , DRIVER_NAME, __FUNCTION__ , \
		__LINE__, ## arg)

#define vol_err(tcxt, format, arg...) \
	printk(KERN_ERR "%s[%s:%d (ERR)]: (%s:%s)" format "\n" , DRIVER_NAME, \
		__FUNCTION__ , __LINE__, ((target_context_t *)tcxt)->tc_pname, \
		((target_context_t *)tcxt)->tc_guid, ## arg);

#define info(format, arg...) \
	if(IS_DBG_ENABLED(inm_verbosity, (INM_DEBUG_ONLY | INM_IDEBUG | INM_IDEBUG_META | INM_IDEBUG_BMAP))){ \
		printk(KERN_ERR "%s[%s:%d (INFO)]: " format "\n" , DRIVER_NAME, __FUNCTION__, \
			__LINE__, ## arg); \
	}else { \
		printk(KERN_INFO "%s[%s:%d (INFO)]: " format "\n" , DRIVER_NAME, __FUNCTION__, \
			__LINE__, ## arg); \
	}

#define warn(format, arg...) \
	printk(KERN_WARNING "%s[%s:%d (WARN)]: " format "\n" , DRIVER_NAME, __FUNCTION__, \
		__LINE__, ## arg)

#define print_dbginfo(fmt, arg...)
\ printk(KERN_ERR fmt, ##arg) #ifdef INM_DEBUG #define INM_BUG_ON(EXPR) BUG_ON(EXPR) #else #define INM_BUG_ON(EXPR) \ ({ \ if (EXPR){ \ err("Involflt driver bug"); \ } \ }) #endif #define INM_LIKELY(v) likely(v) #define INM_UNLIKELY(v) unlikely(v) #if LINUX_VERSION_CODE < KERNEL_VERSION(5,8,10) typedef struct timespec inm_timespec; #else typedef struct timespec64 inm_timespec; #endif /* utils sections */ #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0) #define INM_GET_CURRENT_TIME(now) \ now = CURRENT_TIME #else #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 20, 0) #define INM_GET_CURRENT_TIME(now) \ now = current_kernel_time() #elif LINUX_VERSION_CODE < KERNEL_VERSION(5, 8, 10) #define INM_GET_CURRENT_TIME(now) getnstimeofday(&now) #else #define INM_GET_CURRENT_TIME(now) ktime_get_real_ts64(&now) #endif #endif #define INM_MSEC_PER_SEC MSEC_PER_SEC #define INM_MSECS_TO_JIFFIES(msecs) msecs_to_jiffies(msecs) #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0) #define INM_GET_CURR_TIME_IN_SEC ((CURRENT_TIME).tv_sec) #else #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 20, 0) #define INM_GET_CURR_TIME_IN_SEC (current_kernel_time().tv_sec) #else inm_s64_t inm_current_kernel_time_secs(void); #define INM_GET_CURR_TIME_IN_SEC inm_current_kernel_time_secs() #endif #endif #define INM_HZ (HZ) /* to handle freeze volume list*/ inm_s32_t freeze_root_dev(void); inm_s32_t process_freeze_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t process_thaw_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t convert_path_to_dev(const char *, inm_dev_t *); inm_s32_t get_dir(char *dir, inm_s32_t (*inm_entry_callback)(char *fname)); /* for single node crash consistency */ inm_s32_t process_iobarrier_tag_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t process_commit_revert_tag_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t process_remove_iobarrier_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t process_create_iobarrier_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t test_timer(inm_devhandle_t *idhp, void __INM_USER *arg); /* Synchronisation wrappers */ typedef wait_queue_head_t inm_wait_queue_head_t; #define INM_INIT_WAITQUEUE_HEAD(event) \ init_waitqueue_head(event) #define INM_WAKEUP(event) \ wake_up(event) #define INM_WAKEUP_INTERRUPTIBLE(event) \ wake_up_interruptible(event) #define INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(event, val, timeout) \ wait_event_interruptible_timeout(event, val, timeout) typedef struct completion inm_completion_t; #define INM_COMPLETE(event) \ complete(event) #define INM_INIT_COMPLETION(event) \ init_completion(event) #define INM_DESTROY_COMPLETION(compl) #define INM_COMPLETE_AND_EXIT(event, val) \ complete_and_exit(event, val) #define INM_WAIT_FOR_COMPLETION(event) \ wait_for_completion(event) #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,9) || \ (defined(redhat) && (DISTRO_VER==4) && (UPDATE>=3)) #define INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(event) \ wait_for_completion_interruptible(event) #else #define INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(event) \ INM_WAIT_FOR_COMPLETION(event) #endif #define INM_INIT_WORKER_CHILD(compl, condition) \ do { \ *condition = 1; \ INM_COMPLETE(compl); \ } while(0) #define INM_COMPLETE_CONDITION_LOCK(compl) #define INM_COMPLETE_CONDITION_UNLOCK(compl) #define __inm_wait_event_interruptible_timeout(inm_wq, inm_condition, inm_ret) \ do { \ DEFINE_WAIT(__inm_wait); \ \ for (;;) { \ prepare_to_wait(&inm_wq, &__inm_wait, TASK_INTERRUPTIBLE); \ if (inm_condition) \ break; \ 
if (!signal_pending(current)) { \ schedule_timeout(inm_ret); \ inm_ret = 0; \ break; \ } \ inm_ret = -ERESTARTSYS; \ break; \ } \ finish_wait(&inm_wq, &__inm_wait); \ } while (0) #define inm_wait_event_interruptible_timeout(inm_wq, inm_condition, inm_timeout) \ ({ \ long __inm_ret = inm_timeout; \ if (!(inm_condition)) \ __inm_wait_event_interruptible_timeout(inm_wq, inm_condition, __inm_ret); \ __inm_ret; \ }) #define __inm_wait_event_interruptible(inm_wq, inm_condition, inm_ret) \ do { \ DEFINE_WAIT(__inm_wait); \ \ for (;;) { \ prepare_to_wait(&inm_wq, &__inm_wait, TASK_INTERRUPTIBLE); \ if (inm_condition) \ break; \ if (!signal_pending(current)) { \ schedule(); \ break; \ } \ inm_ret = -ERESTARTSYS; \ break; \ } \ finish_wait(&inm_wq, &__inm_wait); \ } while (0) #define inm_wait_event_interruptible(inm_wq, inm_condition) \ ({ \ long __inm_ret = 0; \ if (!(inm_condition)) \ __inm_wait_event_interruptible(inm_wq, inm_condition, __inm_ret); \ __inm_ret; \ }) #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 5, 0) #define INM_KERNEL_THREAD(task, funcp, arg, len, name, ...) kernel_thread(funcp, arg, CLONE_KERNEL) #define INM_KTHREAD_STOP(task) #else extern struct task_struct *service_thread_task; #define INM_KERNEL_THREAD(task, funcp, arg, len, name, ...) \ ({ \ pid_t __pid = -1; \ task = kthread_run(funcp, arg, name, ## __VA_ARGS__); \ if (!IS_ERR(task)) \ __pid = task->pid; \ \ __pid; \ }) #define INM_KTHREAD_STOP(task) kthread_stop(task) #endif #define INM_ETIMEDOUT (-ETIMEDOUT) #define INM_ENOMEM (-ENOMEM) #define INM_ENOENT (-ENOENT) #define INM_EINVAL (-EINVAL) #define INM_EEXIST (-EEXIST) #define INM_EFAULT (-EFAULT) #define INM_EAGAIN (-EAGAIN) #define INM_EBUSY (-EBUSY) #define INM_EBADRQC (-EBADRQC) #define INM_EINTR (-EINTR) #define INM_EROFS (0) #define INM_ENOTSUP (-EOPNOTSUPP) #define INM_RDONLY (O_RDONLY) #define INM_RDWR (O_RDWR) #define INM_EXCL (O_EXCL) #define INM_SYNC (O_SYNC) #define INM_CREAT (O_CREAT) #define INM_SYNC (O_SYNC) #define INM_TRUNC (O_TRUNC) #define INM_LARGEFILE (O_LARGEFILE) #define INM_DIRECT (O_DIRECT) #define INM_DO_DIV(val1, val2) do_div(val1, val2) #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 5, 0) #define INM_DAEMONIZE(name, arg...) daemonize(name,##arg) #else #define INM_DAEMONIZE(name, arg...) 
#endif #define INM_SET_USER_NICE(val) set_user_nice(INM_CURPROC, -val) #define INM_IN_INTERRUPT() (in_interrupt() || irqs_disabled()) #define INM_MMAP_LOCK() INM_DOWN_WRITE(¤t->mm->mmap_sem) #define INM_MMAP_UNLOCK() INM_UP_WRITE(¤t->mm->mmap_sem) #define INM_MMAP_PROT_FLAGS (PROT_READ) #define INM_MMAP_MAPFLAG (MAP_SHARED) #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 5, 0) #define INM_DO_STREAM_MAP(dh, off, sz, addr, prot, mapflag) \ ({ \ inm_s32_t __ret = 0; \ INM_MMAP_LOCK(); \ addr = do_mmap_pgoff((struct file *)dh, off, sz, (prot), (mapflag), 0); \ INM_MMAP_UNLOCK(); \ __ret; \ }) #define INM_DO_STREAM_UNMAP(addr, len) \ ({ \ inm_s32_t __ret = 0; \ INM_MMAP_LOCK(); \ __ret = do_munmap(current->mm, addr, len); \ INM_MMAP_UNLOCK(); \ \ __ret; \ }) #else #define INM_DO_STREAM_MAP(dh, off, sz, addr, prot, mapflag) \ ({ \ inm_s32_t __ret = 0; \ addr = vm_mmap((struct file *)dh, addr, sz, (prot), (mapflag), off); \ __ret; \ }) #define INM_DO_STREAM_UNMAP(addr, len) vm_munmap(addr, len) #endif #define INM_FILL_MMAP_PRIVATE_DATA(idhp, cnp) \ do { \ idhp->private_data = cnp; \ } while(0); #define INM_PAGESZ (PAGE_SIZE) #define INM_PAGESHIFT (PAGE_SHIFT) #define INM_PAGEMASK (~(PAGE_SIZE-1)) #define INM_PAGEALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0) #define INM_KMAP_ATOMIC(page, idx) kmap_atomic(page) #define INM_KUNMAP_ATOMIC(vaddr, idx) kunmap_atomic(vaddr) #else #define INM_KMAP_ATOMIC(page, idx) kmap_atomic(page, idx) #define INM_KUNMAP_ATOMIC(vaddr, idx) kunmap_atomic(vaddr, idx) #endif #define INM_PAGE_MAP(vaddr, page, irqidx) (vaddr) = INM_KMAP_ATOMIC((page), (irqidx)) #define INM_PAGE_UNMAP(vaddr, page, irqidx) INM_KUNMAP_ATOMIC(vaddr, (irqidx)) #ifndef INM_PATH_MAX #define INM_PATH_MAX (PATH_MAX) #endif #define INM_NAME_MAX (NAME_MAX) typedef struct _page_wrapper { struct inm_list_head entry; unsigned long *cur_pg; inm_u32_t nr_chgs; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_u32_t flags; #endif } inm_page_t; #define METAPAGE_ALLOCED_FROM_POOL 0x01 /* * process related macros/constants */ #define INM_CURPROC current #define INM_PROC_ADDR (INM_CURPROC)->mm #define INM_CURPROC_PID (current->pid) #define INM_CURPROC_COMM (current->comm) #define INM_DELAY(ticks) set_current_state(TASK_INTERRUPTIBLE); \ schedule_timeout((ticks)) #define INM_PRAGMA_PUSH1 pack( push, 1 ) #define INM_PRAGMA POP pack( pop ) #define INM_REL_DEV_RESOURCES(ctx) inm_rel_dev_resources(ctx, ctx->tc_priv) #define INM_GET_MINOR(dev) MINOR((dev)) #define INM_GET_MAJOR(dev) MAJOR((dev)) struct host_dev_context; void inm_rel_dev_resources(struct _target_context *ctx, struct host_dev_context *hdcp); inm_s32_t inm_get_mirror_dev(struct _mirror_vol_entry *); void inm_free_mirror_dev(struct _mirror_vol_entry *); inm_dev_t inm_get_dev_t_from_path(const char *); struct _target_context; inm_dev_t inm_dev_id_get(struct _target_context *); inm_u64_t inm_dev_size_get(struct _target_context *); #define INM_S_IRWXUGO S_IRWXUGO #define INM_S_IRUGO S_IRUGO #define INM_S_READ NULL #define INM_S_WRITE NULL #define INM_MEM_ZERO(addr, size) memset(addr, 0, size) #define INM_MEM_CMP(addr_src, addr_tgt, size) memcmp(addr_src, addr_tgt, size) #define INM_SI_MEMINFO(infop) si_meminfo(infop) #define INM_CLOSE_FILE(hdlp, oflag) flt_close_file(hdlp) #define INM_WRITE_SCSI_TIMEOUT (60 * INM_HZ) #define INM_CNTL_SCSI_TIMEOUT (10 * INM_HZ) #define IS_DMA_FLAG(tcp, flag) inm_dma_flag(tcp, &flag) inm_s32_t validate_file(char *pathp, inm_s32_t *type); void 
inm_dma_flag(struct _target_context *tcp, inm_u32_t *flag); void print_AT_stat(struct _target_context *, char *page, inm_s32_t *len); inm_s32_t try_reactive_offline_AT_path(struct _target_context *, unsigned char *, inm_u32_t, inm_s32_t, unsigned char *, inm_u32_t, inm_u32_t); #define inm_claim_metadata_page(tgt_ctxt, chg_node, wdatap) int _inm_xm_mapin(struct _target_context *, void *, char **); #define inm_xm_mapin(tgt_ctxt, wdatap, map_addr) \ _inm_xm_mapin(tgt_ctxt, (void *)wdatap, map_addr) #define inm_xm_det(wdatap, map_addr) #define INM_SET_ENDIO_FN(bi, endio_fn) bi->bi_end_io = endio_fn #define INM_CHAIN_BUF(cur_bi, prev_bi) #define INM_GET_FWD_BUF(bp) NULL #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) /* * For kernels > 4.4, the error comes set as part of bio */ #define INM_IMB_ERROR_SET(var, error_value) #define INM_MIRROR_IODONE(fn, bp, done, error) fn(inm_buf_t *bp) #define INM_PT_IODONE(mbufinfo) \ { \ mbufinfo->imb_org_bp->bi_flags = mbufinfo->imb_pt_buf.bi_flags; \ INM_BUF_COUNT(mbufinfo->imb_org_bp) = INM_BUF_COUNT(&(mbufinfo->imb_pt_buf)); \ inm_bio_error(mbufinfo->imb_org_bp) = inm_bio_error(&(mbufinfo->imb_pt_buf)); \ mbufinfo->imb_org_bp->bi_end_io(mbufinfo->imb_org_bp); \ } #define INM_BUF_FAILED(bp, error) inm_bio_error(bp) #else /* Else: Kernel < 4.4 */ #define INM_IMB_ERROR_SET(var, error_value) var = error_value #define INM_MIRROR_IODONE(fn, bp, done, error) fn(inm_buf_t *bp, inm_s32_t error) #define INM_PT_IODONE(mbufinfo) \ { \ mbufinfo->imb_org_bp->bi_flags = mbufinfo->imb_pt_buf.bi_flags; \ INM_BUF_COUNT(mbufinfo->imb_org_bp) = INM_BUF_COUNT(&(mbufinfo->imb_pt_buf)); \ mbufinfo->imb_org_bp->bi_end_io(mbufinfo->imb_org_bp, mbufinfo->imb_pt_err); \ } #define INM_BUF_FAILED(bp, error) error #endif /* End: Kernel >= 4.4 */ /* Kernels > 2.6.24 */ #define INM_IMB_DONE_SET(var, done_value) #define INM_RET_IODONE #define INM_MORE_IODONE_EXPECTED(bp) #define INM_BUF_RESID(bi, done) INM_BUF_COUNT(bi) #else /* End: Kernel > 2.6.24 */ #define INM_MIRROR_IODONE(fn, bp, done, error) fn(inm_buf_t *bp, inm_u32_t done, inm_s32_t error) #define INM_IMB_ERROR_SET(var, error_value) var = error_value #define INM_IMB_DONE_SET(var, done_value) var = done_value #define INM_RET_IODONE 0 #define INM_PT_IODONE(mbufinfo) \ { \ mbufinfo->imb_org_bp->bi_flags = mbufinfo->imb_pt_buf.bi_flags; \ mbufinfo->imb_org_bp->bi_size = mbufinfo->imb_pt_buf.bi_size; \ mbufinfo->imb_org_bp->bi_end_io(mbufinfo->imb_org_bp, mbufinfo->imb_pt_done, mbufinfo->imb_pt_err); \ } #define INM_MORE_IODONE_EXPECTED(bp) \ { \ if(bp->bi_size){ \ return 1; \ } \ } #define INM_BUF_RESID(bi, done) (bi->bi_size - done) #define INM_BUF_FAILED(bp, error) error #define IS_ALIGNED(x, y) (!(x & (y - 1))) #endif struct block_device *inm_open_by_devnum(dev_t, unsigned); #define INM_SET_HDEV_MXS(hdcp, val) #define INM_GET_HDEV_MXS(hdc_dev) 0 void free_tc_global_at_lun(struct inm_list_head *dst_list); typedef struct _inm_at_lun_reconfig{ inm_u64_t flag; char atdev_name[INM_GUID_LEN_MAX]; }inm_at_lun_reconfig_t; typedef struct _dc_at_vol_entry { struct inm_list_head dc_at_this_entry; char dc_at_name[INM_GUID_LEN_MAX]; inm_block_device_t *dc_at_dev; } dc_at_vol_entry_t; inm_s32_t process_block_at_lun(inm_devhandle_t *handle, void * arg); void free_all_at_lun_entries(void); dc_at_vol_entry_t * find_dc_at_lun_entry(char *); void replace_sd_open(void); #define DRV_LOADED_PARTIALLY 0x1 #define DRV_LOADED_FULLY 0x2 #define TAG_COMMIT_NOT_PENDING 0 #define TAG_COMMIT_PENDING 1 inm_s32_t 
process_init_driver_fully(inm_devhandle_t *, void *); #define inm_ksleep(x) msleep_interruptible(x) typedef struct gendisk inm_disk_t; struct device *inm_get_parent_dev(struct gendisk *bd_disk); inm_s32_t inm_get_user_page(void __INM_USER *, struct page **); #ifdef INM_RECUSIVE_ADSPC #define INM_AOPS(mapping) (mapping) #else #define INM_AOPS(mapping) ((mapping)->a_ops) #endif #define INM_INODE_AOPS(inode) INM_AOPS((inode)->i_mapping) #if (defined(RHEL_MAJOR) && (RHEL_MAJOR == 5)) #ifndef kobj_to_disk #define kobj_to_disk(k) container_of(k, struct gendisk, kobj) #endif #endif inm_s32_t inm_register_reboot_notifier(int); void inm_blkdev_name(inm_bio_dev_t *bdev, char *name); inm_s32_t inm_blkdev_get(inm_bio_dev_t *bdev); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) /* * Multiple kernels changed q->backing_dev_info from structure to pointer * in same minor version (errata) so cant have kernel version based check. * * NOTE: The macros below are gcc compile time directives to choose the right * expression to access the structure based on the type of q->backing_dev_info * and will not work with other compilers. */ #define INM_BDI_IS_PTR(q) \ __builtin_types_compatible_p(typeof(q->backing_dev_info), \ struct backing_dev_info *) #define INM_BDI_PTR(q) \ __builtin_choose_expr(INM_BDI_IS_PTR(q), \ ((q)->backing_dev_info), (&((q)->backing_dev_info))) #define INM_BDI_CAPABILITIES(q) ((INM_BDI_PTR(q))->capabilities) #endif /* KERNEL_VERSION(3,9,0) */ #define INM_BDEVNAME_SIZE BDEVNAME_SIZE #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0) #define INM_PAGE_TO_VIRT(page) page_to_virt(page) #define INM_VIRT_ADDR_VALID(vaddr) virt_addr_valid(vaddr) #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) #ifndef pfn_to_virt #define pfn_to_virt(pfn) __va((pfn) << PAGE_SHIFT) #endif #ifndef page_to_virt #define page_to_virt(page) pfn_to_virt(page_to_pfn(page)) #endif #define INM_PAGE_TO_VIRT(page) page_to_virt(page) #define INM_VIRT_ADDR_VALID(vaddr) virt_addr_valid(vaddr) #else #define INM_VIRT_ADDR_VALID(vaddr) (1) #endif #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,10,0) #define set_fs(a) #define get_fs() \ ({ \ mm_segment_t __ret; \ __ret; \ }) #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,11,0) #define inm_freeze_bdev(__bdev, __sb) freeze_bdev(__bdev) #define inm_thaw_bdev(__bdev, __sb) thaw_bdev(__bdev) #else #define inm_freeze_bdev(__bdev, __sb) \ ({ \ int __ret = 0; \ __sb = freeze_bdev(__bdev); \ if (__sb && IS_ERR(__sb)) \ __ret = PTR_ERR(__sb); \ \ __ret; \ }) #define inm_thaw_bdev(__bdev, __sb) thaw_bdev(__bdev, __sb) #endif #endif /* _INM_OSDEP_H */ involflt-0.1.0/src/tunable_params.c0000755000000000000000000040750214467303177016022 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "metadata-mode.h" #include "statechange.h" #include "file-io.h" #include "tunable_params.h" #include "osdep.h" #include "filter.h" #include "filter_host.h" #include "db_routines.h" #include "verifier.h" extern driver_context_t *driver_ctx; extern char *ErrorToRegErrorDescriptionsA[]; #ifdef INM_SOLARIS extern pgcnt_t physinstalled; #endif void init_driver_tunable_params(void) { load_driver_params(); driver_ctx->dc_tel.dt_timestamp_in_persistent_store = driver_ctx->last_time_stamp; driver_ctx->dc_tel.dt_seqno_in_persistent_store = driver_ctx->last_time_stamp_seqno; driver_ctx->tunable_params.max_data_pages_per_target = DEFAULT_MAX_DATA_PAGES_PER_TARGET; driver_ctx->tunable_params.max_data_size_per_non_data_mode_drty_blk = DEFAULT_MAX_DATA_SIZE_PER_NON_DATA_MODE_DIRTY_BLOCK; driver_ctx->tunable_params.free_pages_thres_for_filewrite = (((driver_ctx->tunable_params.data_pool_size << (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT)) * (driver_ctx->tunable_params.free_percent_thres_for_filewrite)) / 100); if(!driver_ctx->clean_shutdown) driver_ctx->unclean_shutdown = 1; } /* Global data page pool memory allocation = 6.25% of system memory * Examples -- * For <= 1 GB System Memory - 64 MB * For 2 GB System Memory - 128 MB * For 4 GB System Memory - 256 MB * For 8 GB System Memory - 512 MB * For 16 GB System Memory - 1024 MB * For 32 GB System Memory - 2048 MB */ inm_u32_t get_data_page_pool_mb(void) { inm_u32_t total_ram_mb = 0, data_pool_mb = 0; inm_meminfo_t info; INM_SI_MEMINFO(&info); total_ram_mb = info.totalram >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); if (total_ram_mb < DEFAULT_DATA_POOL_SIZE_MB) return 0; /* 6.25 % of system memory is equal to 1/16th system memory */ data_pool_mb = total_ram_mb >> 4; #ifdef APPLIANCE_DRV if (total_ram_mb > INM_INCR_DPS_LIMIT_ON_APP) { data_pool_mb = total_ram_mb >> INM_DEFAULT_DPS_APPLIANCE_DRV; } dbg("total ram size is %u and DPS is %u",total_ram_mb, data_pool_mb); #endif if (data_pool_mb < DEFAULT_DATA_POOL_SIZE_MB) data_pool_mb = DEFAULT_DATA_POOL_SIZE_MB; return data_pool_mb; } /* * ============================COMMON ATTRIBUTES=========================================================== */ inm_s32_t write_common_attr(const char *file_name, void *buf, inm_s32_t len) { char *path = NULL; inm_u32_t wrote = 0, ret = 0; if(!get_path_memory(&path)) { err("Failed to allocate memory for path"); goto out; } snprintf(path, INM_PATH_MAX, "%s/%s/%s",PERSISTENT_DIR, COMMON_ATTR_NAME, file_name); dbg("Writing to file %s", path); if(!write_full_file(path, (void *)buf, len, &wrote)) { err("write to persistent store failed %s", path); goto free_path_buf; } ret = 1; free_path_buf: free_path_memory(&path); out: return ret; } inm_s32_t read_common_attr(char *fname, void **buf, inm_s32_t len, inm_s32_t *bytes_read) { inm_s32_t ret = 0; char *path = NULL; if(!get_path_memory(&path)) { err("Failed to allocate memory for path"); goto out; } snprintf(path, INM_PATH_MAX, "%s/%s/%s",PERSISTENT_DIR, COMMON_ATTR_NAME, fname); dbg("Reading from file %s", path); *buf = (void *)INM_KMALLOC(len, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!*buf) { err("Failed to allocate memory for buffer"); goto free_path_buf; } 
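	/* The buffer is zeroed below before read_full_file() fills it, so a
	 * stored attribute shorter than the buffer still comes back as a
	 * NUL-terminated string for the is_digit()/inm_atoi() parsing done
	 * by the callers of read_common_attr().
	 */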
INM_MEM_ZERO(*buf, len); if(!read_full_file(path, *buf, len, (inm_u32_t *)bytes_read)) goto free_buf; ret = 1; goto free_path_buf; free_buf: if(*buf) INM_KFREE(*buf, len, INM_KERNEL_HEAP); *buf = NULL; free_path_buf: free_path_memory(&path); out: return ret; } static ssize_t common_attr_show(struct attribute *attr, char *page) { struct common_attribute *common_attr; ssize_t ret = 0; common_attr = inm_container_of(attr, struct common_attribute, attr); if (common_attr->show) ret = common_attr->show(page); return ret; } static ssize_t common_attr_store(struct attribute *attr, const char *page, size_t len) { struct common_attribute *common_attr; ssize_t ret = len; common_attr = inm_container_of(attr, struct common_attribute, attr); if (common_attr->store) ret = common_attr->store(common_attr->file_name, page, len); if(ret < 0) return ret; else { return len; } } ssize_t pool_size_show(char *buf) { inm_u64_t mb = driver_ctx->tunable_params.data_pool_size; return snprintf(buf, INM_PAGESZ, "%lldMB\n", (unsigned long long)mb); } ssize_t pool_size_store(const char *file_name, const char *buf, size_t len) { inm_u32_t num_pages = 0, diff_pages = 0; inm_u32_t mem = 0; inm_u32_t orig_data_pool_size = 0; inm_u32_t max_data_pool_limit = 0; inm_meminfo_t meminfo; unsigned long lock_flag; inm_s32_t ret = 0; if (len > 6) { err("Invalid Data Pool Size Supplied: Very large value"); return -EINVAL; } if (!is_digit(buf, len)) { err("Data Pool Size supplied contains non-digit chars"); return -EINVAL; } mem = inm_atoi(buf); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); orig_data_pool_size = driver_ctx->tunable_params.data_pool_size; driver_ctx->tunable_params.data_pool_size = mem; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); if (mem < DEFAULT_DATA_POOL_SIZE_MB) { err("Invalid Data Pool Size! 
It cannot be less than %d MB.", DEFAULT_DATA_POOL_SIZE_MB);
		ret = INM_EINVAL;
		goto restore_old;
	}

	INM_SI_MEMINFO(&meminfo);
	max_data_pool_limit = ((meminfo.totalram *
		driver_ctx->tunable_params.max_data_pool_percent) / 100);
	num_pages = mem << (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);
	info("DataPagePool modification - num_pages:%u current alloc'd pages:%u",
		num_pages, driver_ctx->data_flt_ctx.pages_allocated);
	if (num_pages > max_data_pool_limit) {
		max_data_pool_limit >>= (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);
		info("DataPoolSize:%uMB cannot be greater than %u%%(%uMB) of system memory:%luMB",
			mem, driver_ctx->tunable_params.max_data_pool_percent,
			max_data_pool_limit,
			meminfo.totalram >> (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT));
		ret = INM_EINVAL;
		goto restore_old;
	}

	if(driver_ctx->data_flt_ctx.pages_allocated > num_pages) {
		diff_pages = driver_ctx->data_flt_ctx.pages_allocated - num_pages;
		INM_BUG_ON(diff_pages > driver_ctx->data_flt_ctx.pages_allocated);
		if (diff_pages > (driver_ctx->dc_cur_unres_pages)) {
			err("DataPoolSize can't be reduced below %uMB due to reservations",
				driver_ctx->dc_cur_res_pages >>
					(MEGABYTE_BIT_SHIFT - INM_PAGESHIFT));
			ret = INM_EINVAL;
			goto restore_old;
		}
		info("deleting %d pages", diff_pages);
		delete_data_pages(diff_pages);
		recalc_data_file_mode_thres();
	}

	if (!write_common_attr(file_name, (void *)buf, len)) {
		ret = INM_EINVAL;
		goto restore_old;
	}

	return len;

restore_old:
	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.data_pool_size = orig_data_pool_size;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);
	return ret;
}

ssize_t inm_time_reorg_data_pool_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.time_reorg_data_pool_sec;

	return snprintf(buf, INM_PAGESZ, "%u\n", pc);
}

ssize_t inm_time_reorg_data_pool_store(const char *file_name,
					const char *buf, size_t len)
{
	int time_sec = 0;
	inm_irqflag_t lock_flag;

	if (!is_digit(buf, len)) {
		err("Time Reorg Data Pool Sec supplied contains non-digit chars");
		return INM_EINVAL;
	}

	time_sec = inm_atoi(buf);
	if(time_sec < 0){
		err("Time for Reorg Data Pool Size supplied is not a positive number");
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.time_reorg_data_pool_sec = time_sec;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("TimeReorgDataPoolSec update failed to write file:%s.",
								file_name);
		return INM_EINVAL;
	}

	return len;
}

ssize_t inm_time_reorg_data_pool_factor_store(const char *file_name,
					const char *buf, size_t len)
{
	int factor = 0;
	inm_irqflag_t lock_flag;

	if (!is_digit(buf, len)) {
		err("Time Reorg Data Pool Factor has to be an integer");
		return INM_EINVAL;
	}

	factor = inm_atoi(buf);
	if(factor <= 0){
		err("Time Reorg Data Pool Factor supplied is not a positive number");
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.time_reorg_data_pool_factor = factor;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("TimeReorgDataPoolFactor update failed to write file:%s.",
								file_name);
		return INM_EINVAL;
	}

	return len;
}

ssize_t inm_time_reorg_data_pool_factor_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.time_reorg_data_pool_factor;

	return snprintf(buf, INM_PAGESZ, "%u\n", pc);
}

ssize_t inm_recio_show(char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%d\n",
			driver_ctx->tunable_params.enable_recio);
}

ssize_t inm_recio_store(const char *file_name, const char *buf, size_t len)
{
	inm_s32_t val = 0;

	if(!is_digit(buf, len)) {
		err("Invalid TrackRecursiveWrites value supplied: has non-digit chars");
		return -EINVAL;
	}

	val = inm_atoi(buf);
	if((val != 0) && (val != 1)) {
		err("Can't have anything other than 0 and 1 for TrackRecursiveWrites");
		return -EINVAL;
	}

	if(!write_common_attr(file_name, (void *)buf, len))
		return -EINVAL;
	else
		driver_ctx->tunable_params.enable_recio = val;

	if (driver_ctx->tunable_params.enable_recio) {
		info("Recursive IO tracking Enabled");
	} else {
		info("Recursive IO tracking Disabled");
	}

	return len;
}

ssize_t inm_recio_read(char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		driver_ctx->tunable_params.enable_recio = inm_atoi(buf);
		goto free_buf;
	}

set_default:
	driver_ctx->tunable_params.enable_recio = 0;

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	if (driver_ctx->tunable_params.enable_recio) {
		info("Recursive IO tracking Enabled");
	}

	return 0;
}

ssize_t inm_stable_pages_show(char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%d\n",
			driver_ctx->tunable_params.stable_pages);
}

ssize_t inm_stable_pages_store(const char *file_name, const char *buf,
								size_t len)
{
#ifndef INM_LINUX
	return -EINVAL;
#elif LINUX_VERSION_CODE < KERNEL_VERSION(3,9,0)
	return -EINVAL;
#else
	inm_s32_t val = 0;

	if(!is_digit(buf, len)) {
		err("Invalid StablePages value supplied: has non-digit chars");
		return -EINVAL;
	}

	val = inm_atoi(buf);
	if((val != 0) && (val != 1)) {
		err("Can't have anything other than 0 and 1 for StablePages");
		return -EINVAL;
	}

	if(!write_common_attr(file_name, (void *)buf, len))
		return -EINVAL;
	else
		driver_ctx->tunable_params.stable_pages = val;

	if (driver_ctx->tunable_params.stable_pages) {
		info("Stable Pages Enabled");
		set_stable_pages_for_all_devs();
	} else {
		info("Stable Pages Disabled");
		reset_stable_pages_for_all_devs();
	}

	return len;
#endif
}

ssize_t inm_stable_pages_read(char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		driver_ctx->tunable_params.stable_pages = inm_atoi(buf);
		if (driver_ctx->tunable_params.stable_pages) {
			info("Stable Pages Enabled");
			set_stable_pages_for_all_devs();
		} else {
			info("Stable Pages Disabled");
			reset_stable_pages_for_all_devs();
		}
		goto free_buf;
	}

set_default:
	driver_ctx->tunable_params.stable_pages = 0;

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	return 0;
}

ssize_t inm_chained_io_show(char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%d\n",
			driver_ctx->tunable_params.enable_chained_io);
}

ssize_t inm_chained_io_store(const char *file_name, const char *buf,
								size_t len)
{
	inm_s32_t val = 0;

	if(!is_digit(buf, len)) {
		err("Invalid EnableChainedIO value: has non-digit chars");
		return -EINVAL;
	}

	val = inm_atoi(buf);
	if((val != 0) && (val != 1)) {
		err("Can't have anything other than 0 and 1 for EnableChainedIO");
		return -EINVAL;
	}

	if(!write_common_attr(file_name, (void *)buf, len))
		return -EINVAL;

	INM_DOWN_READ(&driver_ctx->tgt_list_sem);
	info("Chained IO: %d", val);
	driver_ctx->tunable_params.enable_chained_io = val;
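	/* Note: the target-list semaphore is taken in read mode around this
	 * update, presumably so the toggle cannot race with a volume being
	 * stacked or unstacked while it samples enable_chained_io; the read
	 * path in inm_chained_io_read() takes the same lock before updating
	 * the flag.
	 */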
	INM_UP_READ(&driver_ctx->tgt_list_sem);

	return len;
}

ssize_t inm_chained_io_read(char *fname)
{
	/* buf_len tracks the allocation size so INM_KFREE accounting stays
	 * correct even after val is overwritten with the parsed value.
	 */
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	inm_s32_t val = 0;
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		val = inm_atoi(buf);
		INM_DOWN_READ(&driver_ctx->tgt_list_sem);
		driver_ctx->tunable_params.enable_chained_io = val;
		INM_UP_READ(&driver_ctx->tgt_list_sem);
		goto free_buf;
	}

set_default:
#if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)
	driver_ctx->tunable_params.enable_chained_io = 1;
#else
	driver_ctx->tunable_params.enable_chained_io = 0;
#endif

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	return 0;
}

ssize_t inm_vacp_iobarrier_timeout_read(char *fname)
{
	inm_s32_t bytes_read = 0;
	inm_s32_t buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		driver_ctx->tunable_params.vacp_iobarrier_timeout = inm_atoi(buf);
		goto free_buf;
	}

set_default:
	driver_ctx->tunable_params.vacp_iobarrier_timeout = VACP_IOBARRIER_TIMEOUT;

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	return 0;
}

ssize_t inm_vacp_iobarrier_timeout_store(const char *file_name,
					const char *buf, size_t len)
{
	int timeout = 0;
	inm_irqflag_t lock_flag;

	if (!is_digit(buf, len)) {
		err("VacpIObarrierTimeout has to be an integer");
		return INM_EINVAL;
	}

	timeout = inm_atoi(buf);
	if (timeout <= 0) {
		err("VacpIObarrierTimeout must be greater than 0");
		return INM_EINVAL;
	}

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("VacpIObarrierTimeout update failed to write file:%s.",
								file_name);
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.vacp_iobarrier_timeout = timeout;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	return len;
}

ssize_t inm_vacp_iobarrier_timeout_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.vacp_iobarrier_timeout;

	return snprintf(buf, INM_PAGESZ, "%u\n", pc);
}

ssize_t inm_fs_freeze_timeout_read(char *fname)
{
	inm_s32_t bytes_read = 0;
	inm_s32_t buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		driver_ctx->tunable_params.fs_freeze_timeout = inm_atoi(buf);
		goto free_buf;
	}

set_default:
	driver_ctx->tunable_params.fs_freeze_timeout = FS_FREEZE_TIMEOUT;

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	return 0;
}

ssize_t inm_fs_freeze_timeout_store(const char *file_name, const char *buf,
								size_t len)
{
	int timeout = 0;
	inm_irqflag_t lock_flag;

	if (!is_digit(buf, len)) {
		err("FsFreezeTimeout has to be an integer");
		return INM_EINVAL;
	}

	timeout = inm_atoi(buf);
	if (timeout <= 0) {
		err("FsFreezeTimeout must be greater than 0");
		return INM_EINVAL;
	}

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("FsFreezeTimeout update failed to write file:%s.", file_name);
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.fs_freeze_timeout = timeout;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	return len;
}

ssize_t inm_fs_freeze_timeout_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.fs_freeze_timeout;

	return
snprintf(buf, INM_PAGESZ, "%u\n", pc);
}

ssize_t inm_vacp_app_tag_commit_timeout_read(char *fname)
{
	inm_s32_t bytes_read = 0;
	inm_s32_t buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if(!is_digit(buf, bytes_read))
			goto set_default;
		driver_ctx->tunable_params.vacp_app_tag_commit_timeout =
								inm_atoi(buf);
		goto free_buf;
	}

set_default:
	driver_ctx->tunable_params.vacp_app_tag_commit_timeout =
						VACP_APP_TAG_COMMIT_TIMEOUT;

free_buf:
	if(buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);

	return 0;
}

ssize_t inm_vacp_app_tag_commit_timeout_store(const char *file_name,
					const char *buf, size_t len)
{
	int timeout = 0;
	inm_irqflag_t lock_flag;

	if (!is_digit(buf, len)) {
		err("VacpAppTagCommitTimeout has to be an integer");
		return INM_EINVAL;
	}

	timeout = inm_atoi(buf);
	if (timeout <= 0) {
		err("VacpAppTagCommitTimeout must be greater than 0");
		return INM_EINVAL;
	}

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("VacpAppTagCommitTimeout update failed to write file:%s.",
								file_name);
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.vacp_app_tag_commit_timeout = timeout;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	return len;
}

ssize_t inm_vacp_app_tag_commit_timeout_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.vacp_app_tag_commit_timeout;

	return snprintf(buf, INM_PAGESZ, "%u\n", pc);
}

ssize_t inm_percent_change_data_pool_size_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.percent_change_data_pool_size;

	return snprintf(buf, INM_PAGESZ, "%u%%\n", pc);
}

ssize_t inm_percent_change_data_pool_size_store(const char *file_name,
					const char *buf, size_t len)
{
	int percentage = 0;
	inm_meminfo_t meminfo;
	inm_u32_t nr_pages;
	inm_irqflag_t lock_flag;

	if (len > 2) {
		err("Invalid Percent Change Data Pool Size Supplied: Very large value");
		return INM_EINVAL;
	}

	if (!is_digit(buf, len)) {
		err("Percent Change Data Pool Size supplied contains non-digit chars");
		return INM_EINVAL;
	}

	percentage = inm_atoi(buf);
	if(percentage <= 0){
		err("Percent Change Data Pool Size supplied is not a positive number");
		return INM_EINVAL;
	}

	INM_SI_MEMINFO(&meminfo);
	nr_pages = ((meminfo.totalram * percentage) / 100);
	info("PercentChangeDataPoolSize modification - DataPoolSize:%u current alloc'd pages:%u",
		nr_pages, driver_ctx->data_flt_ctx.pages_allocated);

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
	driver_ctx->tunable_params.percent_change_data_pool_size = percentage;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("PercentChangeDataPoolSize update failed to write file:%s.",
								file_name);
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock,
								lock_flag);
	driver_ctx->data_flt_ctx.dp_nrpgs_slab = nr_pages;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
								lock_flag);

	return len;
}

ssize_t inm_maxdatapool_sz_show(char *buf)
{
	unsigned int pc = driver_ctx->tunable_params.max_data_pool_percent;

	return snprintf(buf, INM_PAGESZ, "%u%%\n", pc);
}

ssize_t inm_maxdatapool_sz_store(const char *file_name, const char *buf,
								size_t len)
{
	int percentage = 0;
	unsigned int max_data_pool_limit = 0;
	inm_meminfo_t meminfo;

	if (len > 2) {
		err("Invalid Max Data Pool Size Supplied: Very large value");
		return -EINVAL;
	}

	if (!is_digit(buf, len)) {
		err("Max Data Pool Size supplied contains non-digit chars");
		return -EINVAL;
	}

	percentage = inm_atoi(buf);
	INM_SI_MEMINFO(&meminfo);
	max_data_pool_limit = ((meminfo.totalram * percentage) / 100);
	info("MaxDataPagePoolSize modification - MaxDataPoolSize:%u current alloc'd pages:%u",
		max_data_pool_limit, driver_ctx->data_flt_ctx.pages_allocated);
	if(max_data_pool_limit < driver_ctx->data_flt_ctx.pages_allocated) {
		info("MaxDataPoolSize:%u pages cannot be less than the current alloc'd pages:%u",
			max_data_pool_limit,
			driver_ctx->data_flt_ctx.pages_allocated);
		return -EINVAL;
	}

	driver_ctx->tunable_params.max_data_pool_percent = percentage;

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("MaxDataPoolSize update failed to write file:%s.", file_name);
		return -EINVAL;
	}

	return len;
}

ssize_t inm_vol_respool_sz_show(char *buf)
{
	inm_u64_t mb = driver_ctx->tunable_params.volume_data_pool_size;

	return snprintf(buf, INM_PAGESZ, "%lldMB\n", (long long)mb);
}

ssize_t inm_vol_respool_sz_store(const char *file_name, const char *buf,
								size_t len)
{
	inm_s32_t num_pages = 0;
	inm_u64_t mem = 0;
	unsigned long lock_flag;

	/* Not supported */
	return -EINVAL;

	if (len > 6) {
		err("Invalid Data Pool Size Supplied: Very large value");
		return -EINVAL;
	}

	if (!is_digit(buf, len)) {
		err("Data Pool Size supplied contains non-digit chars");
		return -EINVAL;
	}

	mem = inm_atoi64(buf);
	num_pages = mem << (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);
	if(num_pages > driver_ctx->data_flt_ctx.pages_allocated){
		err("Per Volume Reserve Data Pool can not be bigger than Global Data Pool size");
		return INM_EINVAL;
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock,
								lock_flag);
	driver_ctx->dc_vol_data_pool_size = num_pages;
	driver_ctx->tunable_params.volume_data_pool_size =
			num_pages >> (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
								lock_flag);

	if (!write_common_attr(file_name, (void *)buf, len)) {
		err("VolumeResDataPoolSize update failed to write file:%s.",
								file_name);
		return -EINVAL;
	}

	return len;
}

ssize_t log_dir_show(char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%s\n",
			driver_ctx->tunable_params.data_file_log_dir);
}

ssize_t log_dir_store(const char *file_name, const char *buf, size_t len)
{
	unsigned long lock_flag = 0;

	if(!write_common_attr(file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
		if (strncpy_s(driver_ctx->tunable_params.data_file_log_dir,
				INM_PATH_MAX, buf, len)) {
			INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock,
								lock_flag);
			return -INM_EFAULT;
		}
		driver_ctx->tunable_params.data_file_log_dir[len] = '\0';
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock,
								lock_flag);
	}

	return len;
}

ssize_t free_thres_show(char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%d%%\n",
		driver_ctx->tunable_params.free_percent_thres_for_filewrite);
}

ssize_t free_thres_store(const char *file_name, const char *buf, size_t len)
{
	inm_s32_t thres = 0;

	if(len > 3) {
		err("Invalid Free threshold percent supplied");
		return -EINVAL;
	}

	if(!is_digit(buf, len)) {
		err("Invalid Free threshold percent supplied: has non-digit chars");
		return -EINVAL;
	}

	thres = inm_atoi(buf);
	if((thres < 0) || (thres > 100)) {
		err("Percent free page threshold can't be less than zero or greater than 100");
		return -EINVAL;
	}

	if(!write_common_attr(file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		driver_ctx->tunable_params.free_percent_thres_for_filewrite = thres;
		recalc_data_file_mode_thres();
	}

	return len;
}

ssize_t volume_thres_show(char *buf)
{
	return snprintf(buf,
INM_PAGESZ, "%d%%\n", driver_ctx->tunable_params.volume_percent_thres_for_filewrite); } ssize_t volume_thres_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(len > 3) { err("Invalid volume threshold percent supplied"); return -EINVAL; } if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if((thres < 0) || (thres > 100)) { err("Percent free page threshold can't be less than zero or greater than 100"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.volume_percent_thres_for_filewrite = thres; } return len; } ssize_t dbhwm_sns_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_high_water_marks[SERVICE_NOTSTARTED]); } ssize_t dbhwm_sns_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.db_high_water_marks[SERVICE_NOTSTARTED] = thres; } return len; } ssize_t dblwm_sr_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_low_water_mark_while_service_running); } ssize_t dblwm_sr_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.db_low_water_mark_while_service_running = thres; } return len; } ssize_t dbhwm_sr_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_high_water_marks[SERVICE_RUNNING]); } ssize_t dbhwm_sr_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.db_high_water_marks[SERVICE_RUNNING] = thres; } return len; } ssize_t dbhwm_ss_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_high_water_marks[SERVICE_SHUTDOWN]); } ssize_t dbhwm_ss_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.db_high_water_marks[SERVICE_SHUTDOWN] = thres; } return len; } ssize_t dbp_hwm_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_topurge_when_high_water_mark_is_reached); } ssize_t dbp_hwm_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.db_topurge_when_high_water_mark_is_reached = thres; } return len; } ssize_t max_bmapmem_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", 
driver_ctx->dc_bmap_info.max_bitmap_buffer_memory); } ssize_t max_bmapmem_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid MaximumBitmapBufferMemory value supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->dc_bmap_info.max_bitmap_buffer_memory = thres; } return len; } ssize_t bmap_512ksz_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->dc_bmap_info.bitmap_512K_granularity_size); } ssize_t bmap_512ksz_store(const char *file_name, const char *buf, size_t len) { inm_s32_t thres = 0; if(!is_digit(buf, len)) { err("Invalid Bitmap512KGranularitySize value supplied: has non-digit chars"); return -EINVAL; } thres = inm_atoi(buf); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->dc_bmap_info.bitmap_512K_granularity_size = thres; } return len; } ssize_t vdf_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.enable_data_filtering); } ssize_t vdf_store(const char *file_name, const char *buf, size_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid VolumeDataFiltering value supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Can't have anything other than 0 and 1 for VolumeDataFiltering"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.enable_data_filtering = val; } return len; } ssize_t vdf_newvol_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.enable_data_filtering_for_new_volumes); } ssize_t vdf_newvol_store(const char *file_name, const char *buf, size_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid VolumeDataFilteringForNewVolumes value supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Can't have anything other than 0 and 1 for VolumeDataFilteringForNewVolumes"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.enable_data_filtering_for_new_volumes = val; } return len; } ssize_t vol_dfm_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.enable_data_file_mode); } ssize_t vol_dfm_store(const char *file_name, const char *buf, size_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid VolumeDataFiles value supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Can't have anything other than 0 and 1 for VolumeDataFiles"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.enable_data_file_mode = val; } return len; } #ifdef IDEBUG_MIRROR_IO extern inm_s32_t inject_atio_err; extern inm_s32_t inject_ptio_err; extern inm_s32_t inject_vendorcdb_err; extern inm_s32_t clear_vol_entry_err; #endif ssize_t newvol_dfm_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.enable_data_file_mode_for_new_volumes); } ssize_t newvol_dfm_store(const char *file_name, const char *buf, size_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid VolumeDataFilesForNewVolumes value supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi(buf); #if (defined(IDEBUG_MIRROR_IO)) if(val == 1){ inject_atio_err = 1; } if(val == 2){ inject_ptio_err
= 1; dbg("enabled ptio error injection"); } if(val == 3){ inject_vendorcdb_err = 1; } if(val == 4){ clear_vol_entry_err = 1; } #endif if((val != 0) && (val != 1)) { err("Cant have anything other than 0 and 1 for VolumeDataFiltering"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.enable_data_file_mode_for_new_volumes = val; } return len; } ssize_t dfm_disk_limit_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%lld MB\n", (long long)(driver_ctx->tunable_params.data_to_disk_limit/MEGABYTES)); } ssize_t dfm_disk_limit_store(const char *file_name, const char *buf, size_t len) { inm_s64_t lmt = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { lmt = inm_atoi(buf); driver_ctx->tunable_params.data_to_disk_limit = (lmt * MEGABYTES); } return len; } ssize_t vol_dbnotify_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", driver_ctx->tunable_params.db_notify); } ssize_t vol_dbnotify_store(const char *file_name, const char *buf, size_t len) { info("Reset Global DB Notify limit to %u", driver_ctx->tunable_params.db_notify); driver_ctx->tunable_params.db_notify = driver_ctx->tunable_params.max_data_sz_dm_cn; return 0; } ssize_t inm_seqno_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%llu\n", (unsigned long long)driver_ctx->last_time_stamp_seqno); } ssize_t inm_seqno_store(const char *file_name, const char *buf, size_t len) { inm_u64_t val = 0; unsigned long lock_flag = 0; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi64(buf); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); if (val <= driver_ctx->last_time_stamp_seqno) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); info("new seqno val %llu is lessthan the current seqno %llu\n", val, driver_ctx->last_time_stamp_seqno); return -EINVAL; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); sprintf((char *)buf, "%llu", ((inm_u64_t) val)); if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); driver_ctx->last_time_stamp_seqno = val; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); } return len; } ssize_t inm_max_data_sz_dm_cn_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%u MB", ((inm_u32_t)driver_ctx->tunable_params.max_data_sz_dm_cn)/(MEGABYTES)); } ssize_t inm_max_data_sz_dm_cn_store(const char *file_name, const char *buf, size_t len) { inm_u64_t val = 0; struct inm_list_head *ptr = NULL, *nextptr = NULL; target_context_t *vcptr = NULL; if(!is_digit(buf, len)) { err("Invalid volume threshold percent supplied: has non-digit chars"); return -EINVAL; } val = inm_atoi64(buf); val *= (MEGABYTES); val = max(val, (inm_u64_t)MIN_DATA_SZ_PER_CHANGE_NODE); val = min(val, (inm_u64_t)MAX_DATA_SZ_PER_CHANGE_NODE); sprintf((char *)buf, "%llu", ((inm_u64_t) val)/(MEGABYTES)); if (driver_ctx->dc_verifier_on) { err("Cannot change data mode change node size with verifier on"); return -EPERM; } if(!write_common_attr(file_name, (void *)buf, len)) { return -EINVAL; } else { driver_ctx->tunable_params.max_data_sz_dm_cn = val; driver_ctx->tunable_params.db_notify = driver_ctx->tunable_params.max_data_sz_dm_cn; INM_DOWN_READ(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, 
&driver_ctx->tgt_list) { vcptr = inm_list_entry(ptr, target_context_t, tc_list); if (vcptr->tc_flags & VCF_VOLUME_DELETING) continue; info("Update DB Notify Threshold from %u to %u", vcptr->tc_db_notify_thres, driver_ctx->tunable_params.max_data_sz_dm_cn); vcptr->tc_db_notify_thres = driver_ctx->tunable_params.max_data_sz_dm_cn; } INM_UP_READ(&driver_ctx->tgt_list_sem); } return len; } ssize_t read_pool_size(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; inm_s32_t mem = 0; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) { err("Not a digit:"); goto set_default; } mem = inm_atoi(buf); if(mem < DEFAULT_DATA_POOL_SIZE_MB) mem = DEFAULT_DATA_POOL_SIZE_MB; driver_ctx->tunable_params.data_pool_size = mem; info("data pool size set to %dMB", driver_ctx->tunable_params.data_pool_size); goto free_buf; } set_default: driver_ctx->tunable_params.data_pool_size = driver_ctx->default_data_pool_size_mb; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_read_vol_respool_sz(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; inm_s32_t mem = 0; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) { err("Not a digit:"); goto set_default; } mem = inm_atoi(buf); if (mem >= 0 && mem <= driver_ctx->default_data_pool_size_mb ) { driver_ctx->tunable_params.volume_data_pool_size = mem; driver_ctx->dc_vol_data_pool_size = mem << (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT); goto free_buf; } } set_default: driver_ctx->tunable_params.volume_data_pool_size = DEFAULT_VOLUME_DATA_POOL_SIZE_MB; driver_ctx->dc_vol_data_pool_size = DEFAULT_VOLUME_DATA_POOL_SIZE_MB << (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_read_maxdatapool_sz(char *fname) { int bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; unsigned int percentage = 0; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(bytes_read > 2) { err("Invalid percentage value. 
Setting MaxDataPoolSize " "based on %d%%", DEFAULT_MAX_DATA_POOL_PERCENTAGE); goto set_default; } if(!is_digit(buf, bytes_read)) { err("Not a digit:"); goto set_default; } percentage = inm_atoi(buf); driver_ctx->tunable_params.max_data_pool_percent = percentage; goto free_buf; } set_default: driver_ctx->tunable_params.max_data_pool_percent = DEFAULT_MAX_DATA_POOL_PERCENTAGE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_log_dir(char *fname) { inm_s32_t bytes_read = 0, buf_len = INM_PATH_MAX; void *buf = NULL; unsigned long lock_flag = 0; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(((char *)buf)[bytes_read-1] == '\n') bytes_read--; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); if (memcpy_s(driver_ctx->tunable_params.data_file_log_dir, INM_PATH_MAX, buf, bytes_read)) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); dbg("memcpy_s failed to copy the datafile log directory path"); goto set_default; } driver_ctx->tunable_params.data_file_log_dir[bytes_read] = '\0'; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); goto free_buf; } set_default: INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); strcpy_s(driver_ctx->tunable_params.data_file_log_dir, INM_PATH_MAX, DEFAULT_VOLUME_DATALOG_DIR); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_free_thres(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { inm_s32_t sz = 0; if(!is_digit(buf, bytes_read)) goto set_default; sz = inm_atoi(buf); if(sz < 0 || sz > 100) goto set_default; driver_ctx->tunable_params.free_percent_thres_for_filewrite = sz; goto free_buf; } set_default: driver_ctx->tunable_params.free_percent_thres_for_filewrite = DEFAULT_FREE_THRESHOLD_FOR_FILEWRITE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_volume_thres(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { inm_s32_t sz = 0; if(!is_digit(buf, bytes_read)) goto set_default; sz = inm_atoi(buf); if(sz < 0 || sz > 100) goto set_default; driver_ctx->tunable_params.volume_percent_thres_for_filewrite = sz; goto free_buf; } set_default: driver_ctx->tunable_params.volume_percent_thres_for_filewrite = DEFAULT_VOLUME_THRESHOLD_FOR_FILEWRITE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dbhwm_sns(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.db_high_water_marks[SERVICE_NOTSTARTED] = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.db_high_water_marks[SERVICE_NOTSTARTED] = DEFAULT_DB_HIGH_WATERMARK_SERVICE_NOT_STARTED; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dbhwm_sr(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len,
&bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.db_high_water_marks[SERVICE_RUNNING] = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.db_high_water_marks[SERVICE_RUNNING] = DEFAULT_DB_HIGH_WATERMARK_SERVICE_RUNNING; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dblwm_sr(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.db_low_water_mark_while_service_running = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.db_low_water_mark_while_service_running = DEFAULT_DB_LOW_WATERMARK_SERVICE_RUNNING; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dbhwm_ss(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.db_high_water_marks[SERVICE_SHUTDOWN] = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.db_high_water_marks[SERVICE_SHUTDOWN]= DEFAULT_DB_HIGH_WATERMARK_SERVICE_SHUTDOWN; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dbp_hwm(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.db_topurge_when_high_water_mark_is_reached = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.db_topurge_when_high_water_mark_is_reached = DEFAULT_DB_TO_PURGE_HIGH_WATERMARK_REACHED; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_max_bmapmem(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->dc_bmap_info.max_bitmap_buffer_memory = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->dc_bmap_info.max_bitmap_buffer_memory = \ DEFAULT_MAXIMUM_BITMAP_BUFFER_MEMORY; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_bmap_512ksz(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->dc_bmap_info.bitmap_512K_granularity_size = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->dc_bmap_info.bitmap_512K_granularity_size = \ DEFAULT_BITMAP_512K_GRANULARITY_SIZE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_vdf(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) 
goto set_default; driver_ctx->tunable_params.enable_data_filtering = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.enable_data_filtering = 1; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_vdf_newvol(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.enable_data_filtering_for_new_volumes = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.enable_data_filtering_for_new_volumes = 1; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_vol_dfm(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.enable_data_file_mode = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.enable_data_file_mode = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_newvol_dfm(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.enable_data_file_mode_for_new_volumes = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.enable_data_file_mode_for_new_volumes = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dfm_disk_limit(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; inm_s64_t lmt = 0; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; lmt = inm_atoi(buf); driver_ctx->tunable_params.data_to_disk_limit = (lmt * MEGABYTES); goto free_buf; } set_default: driver_ctx->tunable_params.data_to_disk_limit = (DEFAULT_VOLUME_DATA_TO_DISK_LIMIT_IN_MB * MEGABYTES); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t read_dbnotify(char *fname) { if (driver_ctx->tunable_params.max_data_sz_dm_cn) driver_ctx->tunable_params.db_notify = driver_ctx->tunable_params.max_data_sz_dm_cn; else driver_ctx->tunable_params.db_notify = DEFAULT_DB_NOTIFY_THRESHOLD; return 0; } ssize_t inm_read_seqno(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; /* On first I/O after reading the data from file, it should flush */ driver_ctx->last_time_stamp_seqno = inm_atoi64(buf) + RELOAD_TIME_SEQNO_JUMP_COUNT; goto free_buf; } set_default: driver_ctx->last_time_stamp_seqno = DEFAULT_SEQNO; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); dbg("starting seq no = %llu\n", driver_ctx->last_time_stamp_seqno); return 0; } /* read timestamp from disk */ ssize_t inm_read_ts(void) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); void *buf = NULL; char *fname = "GlobalTimeStamp";
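/*
 * Note: like the persisted sequence number above (bumped by
 * RELOAD_TIME_SEQNO_JUMP_COUNT in inm_read_seqno()), the reloaded
 * timestamp below is pushed forward by 2 seconds so that
 * (timestamp, seqno) pairs issued after a driver reload never sort
 * below ones issued before it.
 */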
if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->last_time_stamp = inm_atoi64(buf) + 2 * HUNDREDS_OF_NANOSEC_IN_SECOND; dbg("persistent value for ts = %llu\n", driver_ctx->last_time_stamp); goto free_buf; } set_default: driver_ctx->last_time_stamp = DEFAULT_TIME_STAMP_VALUE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_read_max_data_sz_dm_cn(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.max_data_sz_dm_cn = inm_atoi64(buf) * (MEGABYTES); goto free_buf; } set_default: driver_ctx->tunable_params.max_data_sz_dm_cn = DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); driver_ctx->tunable_params.db_notify = driver_ctx->tunable_params.max_data_sz_dm_cn; return 0; } ssize_t inm_show_verifier(char *buf) { return snprintf(buf, INM_PAGESZ, "%u\n", driver_ctx->dc_verifier_on); } ssize_t inm_store_verifier(const char *file_name, const char *buf, size_t len) { inm_u32_t verifier_on = 0; inm_u32_t error = 0; if (!is_digit(buf, len)) { err("The supplied value of Verifier contains non-digit chars"); return -EINVAL; } verifier_on = inm_atoi(buf); if (verifier_on == driver_ctx->dc_verifier_on) goto out; if (verifier_on) error = inm_verify_alloc_area(driver_ctx->tunable_params.max_data_sz_dm_cn, 1); else inm_verify_free_area(); if (!error) { info("Verification Mode: %d", verifier_on); driver_ctx->dc_verifier_on = verifier_on; } if (!write_common_attr(file_name, (void *)buf, len)) { err("Verifier update failed to write file:%s.", file_name); err("Verifier = %d until next boot", verifier_on); } out: return len; } ssize_t inm_read_verifier(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); inm_u32_t verifier_on = 0; void *buf = NULL; inm_s32_t error = 0; driver_ctx->dc_verifier_on = 0; if(read_common_attr(fname, &buf, buf_len, &bytes_read) && is_digit(buf, bytes_read)) { verifier_on = inm_atoi(buf); if (verifier_on) { info("Verification Mode On"); error = inm_verify_alloc_area(driver_ctx->tunable_params.max_data_sz_dm_cn, 1); } if (error) err("Cannot turn on verification mode"); else driver_ctx->dc_verifier_on = verifier_on; } if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_clean_shutdown_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%u\n", driver_ctx->clean_shutdown); } ssize_t inm_clean_shutdown_store(const char *file_name, const char *buf, size_t len) { inm_u32_t clean_shutdown; if (!is_digit(buf, len)) { err("The supplied value of clean shutdown contains non-digit chars"); return -EINVAL; } clean_shutdown = inm_atoi(buf); driver_ctx->clean_shutdown = clean_shutdown; if (!write_common_attr(file_name, (void *)buf, len)) { err("CleanShutdown update failed to write file:%s.", file_name); return -EINVAL; } return len; } ssize_t inm_read_clean_shutdown(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->clean_shutdown = inm_atoi(buf); if (driver_ctx->clean_shutdown) {
info("Clean system shutdown"); } else { info("Unclean system shutdown"); } goto free_buf; } set_default: driver_ctx->clean_shutdown = CLEAN_SHUTDOWN; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_max_md_coalesce_show(char *buf) { return snprintf(buf, INM_PAGESZ, "%u (bytes)\n", driver_ctx->tunable_params.max_sz_md_coalesce); } ssize_t inm_max_md_coalesce_store(const char *file_name, const char *buf, size_t len) { inm_u32_t max_sz_md_coalesce; if (!is_digit(buf, len)) { err("The supplied value of max coalesce bytes contains non-digit chars"); return -EINVAL; } max_sz_md_coalesce = inm_atoi(buf); driver_ctx->tunable_params.max_sz_md_coalesce = max_sz_md_coalesce; if (!write_common_attr(file_name, (void *)buf, len)) { err("MaxCoalescedMetaDataChangeSize update failed to write file:%s.", file_name); return -EINVAL; } return len; } ssize_t inm_max_md_coalesce_read(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.max_sz_md_coalesce = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.max_sz_md_coalesce = DEFAULT_MAX_COALESCED_METADATA_CHANGE_SIZE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_time_reorg_data_pool_read(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.time_reorg_data_pool_sec = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.time_reorg_data_pool_sec = DEFAULT_REORG_THRSHLD_TIME_SEC; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_time_reorg_data_pool_factor_read(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.time_reorg_data_pool_factor = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.time_reorg_data_pool_factor = DEFAULT_REORG_THRSHLD_TIME_FACTOR; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } ssize_t inm_percent_change_data_pool_size_read(char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_common_attr(fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; driver_ctx->tunable_params.percent_change_data_pool_size = inm_atoi(buf); goto free_buf; } set_default: driver_ctx->tunable_params.percent_change_data_pool_size = DEFAULT_PERCENT_CHANGE_DATA_POOL_SIZE; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); return 0; } COMMON_ATTR(common_attr_DataPoolSize, "DataPoolSize", INM_S_IRWXUGO, &pool_size_show, &pool_size_store, &read_pool_size); COMMON_ATTR(common_attr_DefaultLogDirectory, "DefaultLogDirectory", INM_S_IRWXUGO,&log_dir_show, &log_dir_store, &read_log_dir); COMMON_ATTR(common_attr_FreeThresholdForFileWrite, "FreeThresholdForFileWrite", INM_S_IRWXUGO, &free_thres_show, &free_thres_store, 
&read_free_thres); COMMON_ATTR(common_attr_VolumeThresholdForFileWrite, "VolumeThresholdForFileWrite", INM_S_IRWXUGO, &volume_thres_show, &volume_thres_store, &read_volume_thres); COMMON_ATTR(common_attr_DirtyBlockHighWaterMarkServiceNotStarted, "DirtyBlockHighWaterMarkServiceNotStarted", INM_S_IRWXUGO, &dbhwm_sns_show, &dbhwm_sns_store, &read_dbhwm_sns); COMMON_ATTR(common_attr_DirtyBlockLowWaterMarkServiceRunning, "DirtyBlockLowWaterMarkServiceRunning", INM_S_IRWXUGO, &dblwm_sr_show, &dblwm_sr_store, &read_dblwm_sr); COMMON_ATTR(common_attr_DirtyBlockHighWaterMarkServiceRunning, "DirtyBlockHighWaterMarkServiceRunning", INM_S_IRWXUGO, &dbhwm_sr_show, &dbhwm_sr_store, &read_dbhwm_sr); COMMON_ATTR(common_attr_DirtyBlockHighWaterMarkServiceShutdown, "DirtyBlockHighWaterMarkServiceShutdown", INM_S_IRWXUGO, &dbhwm_ss_show, &dbhwm_ss_store, &read_dbhwm_ss); COMMON_ATTR(common_attr_DirtyBlocksToPurgeWhenHighWaterMarkIsReached, "DirtyBlocksToPurgeWhenHighWaterMarkIsReached", INM_S_IRWXUGO, &dbp_hwm_show, &dbp_hwm_store, &read_dbp_hwm); COMMON_ATTR(common_attr_MaximumBitmapBufferMemory, "MaximumBitmapBufferMemory", INM_S_IRWXUGO, &max_bmapmem_show, &max_bmapmem_store, &read_max_bmapmem); COMMON_ATTR(common_attr_Bitmap512KGranularitySize, "Bitmap512KGranularitySize", INM_S_IRWXUGO, &bmap_512ksz_show, &bmap_512ksz_store, &read_bmap_512ksz); COMMON_ATTR(common_attr_VolumeDataFiltering, "VolumeDataFiltering", INM_S_IRWXUGO, &vdf_show, &vdf_store, &read_vdf); COMMON_ATTR(common_attr_VolumeDataFilteringForNewVolumes, "VolumeDataFilteringForNewVolumes", INM_S_IRWXUGO, &vdf_newvol_show, &vdf_newvol_store, &read_vdf_newvol); COMMON_ATTR(common_attr_VolumeDataFiles, "VolumeDataFiles", INM_S_IRWXUGO, &vol_dfm_show, &vol_dfm_store, &read_vol_dfm); COMMON_ATTR(common_attr_VolumeDataFilesForNewVolumes, "VolumeDataFilesForNewVolumes", INM_S_IRWXUGO, &newvol_dfm_show, &newvol_dfm_store, &read_newvol_dfm); COMMON_ATTR(common_attr_VolumeDataToDiskLimitInMB, "VolumeDataToDiskLimitInMB", INM_S_IRWXUGO, &dfm_disk_limit_show, &dfm_disk_limit_store, &read_dfm_disk_limit); COMMON_ATTR(common_attr_VolumeDataNotifyLimit, "VolumeDataNotifyLimit", INM_S_IRWXUGO, &vol_dbnotify_show, &vol_dbnotify_store, &read_dbnotify); COMMON_ATTR(common_attr_SequenceNumber, "SequenceNumber", INM_S_IRWXUGO, &inm_seqno_show, inm_seqno_store, &inm_read_seqno); COMMON_ATTR(common_attr_MaxDataSizeForDataModeDirtyBlock, "MaxDataSizeForDataModeDirtyBlock", INM_S_IRWXUGO, &inm_max_data_sz_dm_cn_show, &inm_max_data_sz_dm_cn_store, &inm_read_max_data_sz_dm_cn); COMMON_ATTR(common_attr_VolumeResDataPoolSize, "VolumeResDataPoolSize", INM_S_IRWXUGO, &inm_vol_respool_sz_show, &inm_vol_respool_sz_store, &inm_read_vol_respool_sz); COMMON_ATTR(common_attr_MaxDataPoolSize, "MaxDataPoolSize", INM_S_IRWXUGO, &inm_maxdatapool_sz_show, &inm_maxdatapool_sz_store, &inm_read_maxdatapool_sz); COMMON_ATTR(common_attr_CleanShutdown, "CleanShutdown", INM_S_IRWXUGO, &inm_clean_shutdown_show, &inm_clean_shutdown_store, &inm_read_clean_shutdown); COMMON_ATTR(common_attr_MaxCoalescedMetaDataChangeSize, "MaxCoalescedMetaDataChangeSize", INM_S_IRWXUGO, &inm_max_md_coalesce_show, &inm_max_md_coalesce_store, &inm_max_md_coalesce_read); COMMON_ATTR(common_attr_PercentChangeDataPoolSize, "PercentChangeDataPoolSize", INM_S_IRWXUGO, &inm_percent_change_data_pool_size_show, &inm_percent_change_data_pool_size_store, &inm_percent_change_data_pool_size_read); COMMON_ATTR(common_attr_TimeReorgDataPoolSec, "TimeReorgDataPoolSec", INM_S_IRWXUGO, &inm_time_reorg_data_pool_show, 
&inm_time_reorg_data_pool_store, &inm_time_reorg_data_pool_read); COMMON_ATTR(common_attr_TimeReorgDataPoolFactor, "TimeReorgDataPoolFactor", INM_S_IRWXUGO, &inm_time_reorg_data_pool_factor_show, &inm_time_reorg_data_pool_factor_store, &inm_time_reorg_data_pool_factor_read); COMMON_ATTR(common_attr_VacpIObarrierTimeout, "VacpIObarrierTimeout", INM_S_IRWXUGO, &inm_vacp_iobarrier_timeout_show, &inm_vacp_iobarrier_timeout_store, &inm_vacp_iobarrier_timeout_read); COMMON_ATTR(common_attr_FsFreezeTimeout, "FsFreezeTimeout", INM_S_IRWXUGO, &inm_fs_freeze_timeout_show, &inm_fs_freeze_timeout_store, &inm_fs_freeze_timeout_read); COMMON_ATTR(common_attr_VacpAppTagCommitTimeout, "VacpAppTagCommitTimeout", INM_S_IRWXUGO, &inm_vacp_app_tag_commit_timeout_show, &inm_vacp_app_tag_commit_timeout_store, &inm_vacp_app_tag_commit_timeout_read); COMMON_ATTR(common_attr_TrackRecursiveWrites, "TrackRecursiveWrites", INM_S_IRWXUGO, &inm_recio_show, &inm_recio_store, &inm_recio_read); COMMON_ATTR(common_attr_StablePages, "StablePages", INM_S_IRWXUGO, &inm_stable_pages_show, &inm_stable_pages_store, &inm_stable_pages_read); COMMON_ATTR(common_attr_Verifier, "Verifier", INM_S_IRWXUGO, &inm_show_verifier, &inm_store_verifier, &inm_read_verifier); COMMON_ATTR(common_attr_ChainedIO, "ChainedIO", INM_S_IRWXUGO, &inm_chained_io_show, &inm_chained_io_store, &inm_chained_io_read); static struct attribute *sysfs_common_attrs[] = { &common_attr_DataPoolSize.attr, &common_attr_DefaultLogDirectory.attr, &common_attr_FreeThresholdForFileWrite.attr, &common_attr_VolumeThresholdForFileWrite.attr, &common_attr_DirtyBlockHighWaterMarkServiceNotStarted.attr, &common_attr_DirtyBlockLowWaterMarkServiceRunning.attr, &common_attr_DirtyBlockHighWaterMarkServiceRunning.attr, &common_attr_DirtyBlockHighWaterMarkServiceShutdown.attr, &common_attr_DirtyBlocksToPurgeWhenHighWaterMarkIsReached.attr, &common_attr_MaximumBitmapBufferMemory.attr, &common_attr_Bitmap512KGranularitySize.attr, &common_attr_VolumeDataFiltering.attr, &common_attr_VolumeDataFilteringForNewVolumes.attr, &common_attr_VolumeDataFiles.attr, &common_attr_VolumeDataFilesForNewVolumes.attr, &common_attr_VolumeDataToDiskLimitInMB.attr, &common_attr_VolumeDataNotifyLimit.attr, &common_attr_SequenceNumber.attr, &common_attr_MaxDataSizeForDataModeDirtyBlock.attr, &common_attr_VolumeResDataPoolSize.attr, &common_attr_MaxDataPoolSize.attr, &common_attr_CleanShutdown.attr, &common_attr_MaxCoalescedMetaDataChangeSize.attr, &common_attr_PercentChangeDataPoolSize.attr, &common_attr_TimeReorgDataPoolSec.attr, &common_attr_TimeReorgDataPoolFactor.attr, &common_attr_VacpIObarrierTimeout.attr, &common_attr_FsFreezeTimeout.attr, &common_attr_VacpAppTagCommitTimeout.attr, &common_attr_TrackRecursiveWrites.attr, &common_attr_StablePages.attr, &common_attr_Verifier.attr, &common_attr_ChainedIO.attr, NULL, }; void load_driver_params(void) { inm_s32_t num_attribs, temp; num_attribs = (sizeof(sysfs_common_attrs)/sizeof(struct attribute *)); num_attribs--; /* exclude the NULL terminator */ temp = 0; while(temp < num_attribs) { struct common_attribute *common_attr; struct attribute *attr = sysfs_common_attrs[temp]; common_attr = inm_container_of(attr, struct common_attribute, attr); if (common_attr->read) common_attr->read(common_attr->file_name); temp++; } inm_read_ts(); } inm_s32_t sysfs_involflt_init(void) { char *path = NULL; if(!get_path_memory(&path)) { err("Failed to get memory while creating persistent directory"); return 1; } /* Create /etc/vxagent */ strcpy_s(path, INM_PATH_MAX, "/etc/vxagent"); inm_mkdir(path, 0755);
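/*
 * The tunables themselves live as flat files named after their
 * attribute under this tree, e.g.
 * /etc/vxagent/involflt/common/DataPoolSize for the common set and
 * /etc/vxagent/involflt/<pname>/<attr> for per-volume ones (see
 * read_vol_attr() below); the mkdir calls here only seed the
 * directory skeleton.
 */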
/* Create persistent dir, /etc/vxagent/involflt */ strcpy_s(path, INM_PATH_MAX, PERSISTENT_DIR); inm_mkdir(path, 0755); /* Create /etc/vxagent/involflt/common */ snprintf(path, INM_PATH_MAX, "%s/%s", PERSISTENT_DIR, COMMON_ATTR_NAME); inm_mkdir(path, 0755); free_path_memory(&path); driver_ctx->dc_tel.dt_persistent_dir_created = 1; return 0; } inm_s32_t common_get_set_attribute_entry(struct _inm_attribute *attr) { inm_s32_t ret = 0; char *lbufp = NULL; inm_u32_t lbuflen = 0; lbuflen = INM_MAX(attr->buflen, INM_PAGESZ); lbufp = (char *) INM_KMALLOC(lbuflen, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!lbufp){ err("buffer allocation get_set ioctl failed\n"); ret = INM_ENOMEM; goto out; } INM_MEM_ZERO(lbufp, lbuflen); if (INM_COPYIN(lbufp, attr->bufp, attr->buflen)) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } if(attr->why == SET_ATTR){ ret = common_attr_store(sysfs_common_attrs[attr->type], lbufp, attr->buflen); if (ret < 0){ err("attribute store failed"); ret = -ret; goto out; } } else { ret = common_attr_show(sysfs_common_attrs[attr->type], lbufp); if (ret == 0){ dbg("it's unlikely, but get attribute read only 0 bytes"); } else if (ret > 0){ if (INM_COPYOUT(attr->bufp, lbufp, ret+1)) { err("copyout failed\n"); ret = INM_EFAULT; goto out; } } else { err("get attribute failed"); ret = -ret; goto out; } } ret = 0; out: if (lbufp) { INM_KFREE(lbufp, lbuflen, INM_KERNEL_HEAP); } return ret; } /* * ==============================================VOLUME ATTRIBUTES============================================== */ inm_s32_t read_vol_attr(target_context_t *ctxt, char *fname, void **buf, inm_s32_t len, inm_s32_t *bytes_read) { inm_s32_t ret = 0; char *path = NULL; if(!get_path_memory(&path)) { err("Failed to allocate memory for path"); return ret; } snprintf(path, INM_PATH_MAX, "%s/%s/%s", PERSISTENT_DIR, ctxt->tc_pname, fname); dbg("Reading from file %s", path); *buf = (void *)INM_KMALLOC(len, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!*buf) goto free_buf; dbg("Allocated buffer of len %d", len); INM_MEM_ZERO(*buf, len); if(!read_full_file(path, *buf, len, (inm_u32_t *)bytes_read)) { ret = 0; goto free_buf; } ret = 1; goto free_path_buf; free_buf: if(*buf) INM_KFREE(*buf, len, INM_KERNEL_HEAP); *buf = NULL; free_path_buf: if(path) free_path_memory(&path); path = NULL; return ret; } inm_s32_t vol_flt_disabled_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", (is_target_filtering_disabled(ctxt)?
1 : 0)); } inm_s32_t vol_flt_disabled_store(target_context_t *ctxt, char * file_name, const char *buf, inm_s32_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid value for volume filtering disabled = %s : has non-digit chars", buf); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Cant have anything other than 0 and 1 for VolumeFilteringDisabled"); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { volume_lock(ctxt); if(val) ctxt->tc_flags |= VCF_FILTERING_STOPPED; else ctxt->tc_flags &= ~VCF_FILTERING_STOPPED; volume_unlock(ctxt); } return len; } inm_s32_t vol_bmapread_disabled_show(target_context_t *temp, char *buf) { return 1; } inm_s32_t vol_bmapread_disabled_store(target_context_t *temp, char *file_name, const char *buf, inm_s32_t len) { return 0; } inm_s32_t vol_bmapwrite_disabled_show(target_context_t *temp, char *buf) { return 1; } inm_s32_t vol_bmapwrite_disabled_store(target_context_t *temp, char *file_name, const char *buf, inm_s32_t len) { return 1; } inm_s32_t vol_data_flt_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", ((ctxt->tc_flags & VCF_DATA_MODE_DISABLED)? 0 : 1)); } inm_s32_t vol_data_flt_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid value for volume data filtering %s : has non-digit chars", buf); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Cant have anything other than 0 and 1 for VolumeDataFiltering"); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { volume_lock(ctxt); if(val) { ctxt->tc_flags &= ~VCF_DATA_MODE_DISABLED; } else { if(ctxt->tc_cur_mode == FLT_MODE_DATA) set_tgt_ctxt_filtering_mode(ctxt, FLT_MODE_METADATA, FALSE); if(ctxt->tc_cur_wostate == ecWriteOrderStateData) set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonExplicitNonWO); ctxt->tc_flags |= VCF_DATA_MODE_DISABLED; } volume_unlock(ctxt); } return len; } inm_s32_t vol_data_files_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%d\n", ((ctxt->tc_flags & VCF_DATA_FILES_DISABLED)? 
0 : 1)); } inm_s32_t vol_data_files_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid value for volume data files %s has non-digit chars", buf); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Cant have anything other than 0 and 1 for VolumeDataFiles"); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { if(val) ctxt->tc_flags &= ~VCF_DATA_FILES_DISABLED; else ctxt->tc_flags |= VCF_DATA_FILES_DISABLED; } return len; } inm_s32_t vol_data_to_disk_limit_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%lld MB", (long long)(ctxt->tc_data_to_disk_limit/MEGABYTES)); } inm_s32_t vol_data_to_disk_limit_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { inm_s64_t lmt = 0; if(!is_digit(buf, len)) { err("Invalid value for volume data to disk limit %s : has non-digit chars", buf); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { lmt = inm_atoi(buf); ctxt->tc_data_to_disk_limit = (lmt * MEGABYTES); } return len; } inm_s32_t vol_data_notify_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%d", (ctxt->tc_db_notify_thres)); } inm_s32_t vol_data_notify_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { info("%s: Reset DB Notify limit to %u", ctxt->tc_guid, driver_ctx->tunable_params.db_notify); ctxt->tc_db_notify_thres = driver_ctx->tunable_params.db_notify; return 0; } inm_s32_t vol_log_dir_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%s", ctxt->tc_data_log_dir); } inm_s32_t vol_log_dir_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { unsigned long lock_flag = 0; if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { INM_SPIN_LOCK_IRQSAVE(&ctxt->tc_tunables_lock, lock_flag); if (strncpy_s(ctxt->tc_data_log_dir, len + 1, buf, len)) { INM_SPIN_UNLOCK_IRQRESTORE(&ctxt->tc_tunables_lock, lock_flag); return -INM_EFAULT; } ctxt->tc_data_log_dir[len] = '\0'; ctxt->tc_flags &= ~VCF_DATAFILE_DIR_CREATED; INM_SPIN_UNLOCK_IRQRESTORE(&ctxt->tc_tunables_lock, lock_flag); } return len; } inm_s32_t vol_bmap_gran_show(target_context_t *ctxt, char *buf) { return 1; } inm_s32_t vol_bmap_gran_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { return 1; } inm_s32_t vol_resync_req_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%d", ctxt->tc_resync_required); } inm_s32_t vol_resync_req_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { inm_s32_t val = 0; if(!is_digit(buf, len)) { err("Invalid value for volume resync required %s: has non-digit chars", buf); return -EINVAL; } val = inm_atoi(buf); if((val != 0) && (val != 1)) { err("Cant have anything other than 0 and 1 for VolumeResyncRequired"); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { ctxt->tc_resync_required = val; } return len; } inm_s32_t vol_osync_errcode_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%ld", ctxt->tc_out_of_sync_err_code); } inm_s32_t vol_osync_errcode_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { if(!is_digit(buf, len)) { err("Invalid value for volume out of sync err code: %s (expecting decimal value)", buf); return -EINVAL; } 
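/*
 * The store handlers in this file share one shape: validate the
 * string, persist it via write_vol_attr()/write_common_attr(), and
 * only then update the in-memory field, so a failed persist leaves
 * the running value untouched. A minimal sketch of the pattern
 * (tc_example is a hypothetical field, not part of this driver):
 *
 *	if (!is_digit(buf, len))
 *		return -EINVAL;
 *	if (!write_vol_attr(ctxt, file_name, (void *)buf, len))
 *		return -EINVAL;
 *	ctxt->tc_example = inm_atoi(buf);
 *	return len;
 */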
if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { ctxt->tc_out_of_sync_err_code = inm_atoi(buf); } return len; } inm_s32_t vol_osync_status_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%ld", ctxt->tc_out_of_sync_err_status); } inm_s32_t vol_osync_status_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { if(!is_digit(buf, len)) { err("Invalid value for volume out of sync err status %s\n", buf); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { ctxt->tc_out_of_sync_err_status = inm_atoi(buf); } return len; } inm_s32_t vol_osync_count_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%ld", ctxt->tc_nr_out_of_sync); } inm_s32_t vol_osync_count_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { if(!is_digit(buf, len)) { err("Invalid value for out of sync count %s: has non-digit chars", buf); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { ctxt->tc_nr_out_of_sync = inm_atoi(buf); } return len; } inm_s32_t vol_osync_ts_show(target_context_t *ctxt, char *buf) { return snprintf(buf, INM_PAGESZ, "%lld", (long long)ctxt->tc_out_of_sync_time_stamp); } inm_s32_t vol_osync_ts_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { if(!is_digit(buf, len)) { err("Invalid time stamp value %s: has non-digit chars", buf); return -EINVAL; } if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { ctxt->tc_out_of_sync_time_stamp = inm_atoi64(buf); } return len; } inm_s32_t vol_osync_desc_show(target_context_t *ctxt, char *buf) { if (ctxt->tc_out_of_sync_err_code < 5) { return snprintf(buf, INM_PAGESZ, "%s\n", ErrorToRegErrorDescriptionsA[ctxt->tc_out_of_sync_err_code]); } else { return snprintf(buf, INM_PAGESZ, "See system log (/var/log/messages) for error description\n"); } } inm_s32_t vol_osync_desc_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { return 0; } inm_s32_t inm_vol_reserv_show(target_context_t *ctxt, char *buf) { inm_u32_t mb = 0; mb = ctxt->tc_reserved_pages >> (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT); return snprintf(buf, INM_PAGESZ, "%u MB\n", mb); } inm_s32_t inm_vol_reserv_store(target_context_t *ctxt, char *file_name, const char *buf, inm_s32_t len) { inm_u32_t thres = 0, num_pages = 0, diff_pages; inm_u32_t add_pages = 0; inm_u32_t num_pages_available = 0; /* Not supported: reject updates; the code below is retained for reference only */ return -EINVAL; if (ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { err("Tunable is invalid for mirror setup"); return -EINVAL; } if (!is_digit(buf, len)) { err("Invalid value VolumeResDataPoolSize:%s has non-digit chars", buf); return -EINVAL; } thres = inm_atoi(buf); num_pages = thres << (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT); if (!num_pages) { err("Invalid value VolumeResDataPoolSize:%s cannot be zero", buf); return -EINVAL; } volume_lock(ctxt); if (num_pages > ctxt->tc_reserved_pages) { add_pages = 1; diff_pages = num_pages - ctxt->tc_reserved_pages; } else { add_pages = 0; diff_pages = ctxt->tc_reserved_pages - num_pages; } volume_unlock(ctxt); if (!diff_pages) { return len; } if (add_pages) { if (inm_tc_resv_add(ctxt, diff_pages)) { num_pages_available = driver_ctx->dc_cur_unres_pages; err("VolumeResDataPoolSize is not within limits.
Available:%uMB", (num_pages_available >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT))); return -EINVAL; } } else { if (inm_tc_resv_del(ctxt, diff_pages)) { num_pages_available = driver_ctx->dc_cur_unres_pages; err("VolumeResDataPoolSize is not within limits. Available:%uMB", (num_pages_available >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT))); return -EINVAL; } } recalc_data_file_mode_thres(); if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) { err("VolumeResDataPoolSize update failed to write file:%s.", file_name); return -EINVAL; } return len; } void read_vol_flt_disabled(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { goto set_default; } else { inm_s32_t disabled; if(!is_digit(buf, bytes_read)) goto set_default; disabled = inm_atoi(buf); if(disabled != 0 && disabled != 1) goto set_default; if(disabled != 0) { ctxt->tc_flags |= VCF_FILTERING_STOPPED; } else { ctxt->tc_flags &= ~VCF_FILTERING_STOPPED; } goto free_buf; } set_default: if (ctxt->tc_flags & VCF_VOLUME_STACKED_PARTIALLY) ctxt->tc_flags &= ~VCF_FILTERING_STOPPED; else ctxt->tc_flags |= VCF_FILTERING_STOPPED; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_bmapread_disabled(target_context_t *ctxt, char * fname) { return; } void read_bmapwrite_disabled(target_context_t *ctxt, char * fname) { return; } void read_data_flt_enabled(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; char tempbuf[2]; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { inm_s32_t enabled; if(!is_digit(buf, bytes_read)) goto set_default; enabled = inm_atoi(buf); if(enabled != 0 && enabled != 1) goto set_default; if(enabled) ctxt->tc_flags &= ~VCF_DATA_MODE_DISABLED; else ctxt->tc_flags |= VCF_DATA_MODE_DISABLED; goto free_buf; } set_default: if(driver_ctx->tunable_params.enable_data_filtering_for_new_volumes) tempbuf[0] = '1'; else tempbuf[0] = '0'; tempbuf[1] = '\0'; vol_data_flt_store(ctxt,fname, tempbuf, strlen(tempbuf)); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_data_files_enabled(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; char tempbuf[2]; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { inm_s32_t enabled; if(!is_digit(buf, bytes_read)) goto set_default; enabled = inm_atoi(buf); if(enabled != 0 && enabled != 1) goto set_default; if(enabled) ctxt->tc_flags &= ~VCF_DATA_FILES_DISABLED; else ctxt->tc_flags |= VCF_DATA_FILES_DISABLED; goto free_buf; } set_default: if(driver_ctx->tunable_params.enable_data_file_mode_for_new_volumes) tempbuf[0] = '1'; else tempbuf[0] = '0'; tempbuf[1] = '\0'; vol_data_files_store(ctxt,fname, tempbuf, strlen(tempbuf)); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_data_to_disk_limit(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; inm_s64_t lmt = 0; char tempbuf[NUM_CHARS_IN_INTEGER + 1]; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; lmt = inm_atoi(buf); ctxt->tc_data_to_disk_limit = (lmt * MEGABYTES); goto free_buf; } set_default: 
snprintf(tempbuf, NUM_CHARS_IN_INTEGER, "%d", (int)(driver_ctx->tunable_params.data_to_disk_limit/MEGABYTES)); vol_data_to_disk_limit_store(ctxt, fname, tempbuf, strlen(tempbuf)); free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_data_notify_limit(target_context_t *ctxt, char * fname) { ctxt->tc_db_notify_thres = driver_ctx->tunable_params.db_notify; } void read_data_log_dir(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = INM_PATH_MAX; void *buf = NULL; unsigned long lock_flag, lock_flag1 = 0; char *path_buf = NULL; if(!read_vol_attr(ctxt,fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(((char *)buf)[bytes_read-1] == '\n') bytes_read--; INM_SPIN_LOCK_IRQSAVE(&ctxt->tc_tunables_lock, lock_flag1); if (memcpy_s(ctxt->tc_data_log_dir, INM_PATH_MAX, buf, bytes_read)) { INM_SPIN_UNLOCK_IRQRESTORE(&ctxt->tc_tunables_lock, lock_flag1); dbg("memcpy_s failed to copy the datafile log directory path"); goto set_default; } ctxt->tc_data_log_dir[bytes_read] = '\0'; INM_SPIN_UNLOCK_IRQRESTORE(&ctxt->tc_tunables_lock, lock_flag1); goto free_buf; } set_default: get_path_memory(&path_buf); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); if(path_buf) strcpy_s(path_buf, INM_PATH_MAX, driver_ctx->tunable_params.data_file_log_dir); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); if(path_buf) { vol_log_dir_store(ctxt, fname, path_buf, strlen(path_buf)); free_path_memory(&path_buf); } free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_bmap_gran(target_context_t *ctxt, char * fname) { return; } void read_resync_req(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; ctxt->tc_resync_required = inm_atoi(buf); goto free_buf; } set_default: ctxt->tc_resync_required = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_osync_errcode(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; ctxt->tc_out_of_sync_err_code = inm_atoi(buf); ctxt->tc_hist.ths_osync_err = ctxt->tc_out_of_sync_err_code; goto free_buf; } set_default: ctxt->tc_out_of_sync_err_code = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_osync_status(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; ctxt->tc_out_of_sync_err_status = inm_atoi(buf); goto free_buf; } set_default: ctxt->tc_out_of_sync_err_status = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_osync_count(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; ctxt->tc_nr_out_of_sync = inm_atoi(buf);
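/* Mirror the persisted count into the volume history record, as
 * read_osync_errcode() and read_osync_ts() do for their fields. */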
ctxt->tc_hist.ths_nr_osyncs = ctxt->tc_nr_out_of_sync; goto free_buf; } set_default: ctxt->tc_nr_out_of_sync = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_osync_ts(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) goto set_default; ctxt->tc_out_of_sync_time_stamp= inm_atoi64(buf); ctxt->tc_hist.ths_osync_ts = ctxt->tc_out_of_sync_time_stamp; goto free_buf; } set_default: ctxt->tc_out_of_sync_time_stamp = 0; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_osync_desc(target_context_t *ctxt, char *fname) { return; } void inm_read_reserv_dpsize(target_context_t *ctxt, char *fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); inm_u32_t thres = 0; void *buf = NULL; if (ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { return; } if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); goto set_default; } else { if(!is_digit(buf, bytes_read)) { goto set_default; } thres = inm_atoi(buf); if ((thres >= 0) && (thres <= driver_ctx->tunable_params.data_pool_size)) { ctxt->tc_reserved_pages = thres << (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); } else { goto set_default; } goto free_buf; } set_default: ctxt->tc_reserved_pages = driver_ctx->dc_vol_data_pool_size; free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } inm_s32_t filter_dev_type_show(target_context_t *ctxt, char *buf) { switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: return snprintf(buf, INM_PAGESZ, "Host-Volume"); break; case FILTER_DEV_FABRIC_LUN: return snprintf(buf, INM_PAGESZ, "Fabric-Lun"); break; case FILTER_DEV_MIRROR_SETUP: return snprintf(buf, INM_PAGESZ, "Host-Mirror-Setup"); break; default: break; } return 0; } void read_filter_dev_type(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); } else { ctxt->tc_dev_type = 9999; /* invalid type */ if(is_digit(buf, bytes_read)) ctxt->tc_dev_type = (inm_device_t) inm_atoi(buf); } if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } void read_filter_dev_nblks(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); } else { inm_u64_t val; if(!is_digit(buf, bytes_read)) goto set_default; val = inm_atoi64(buf); switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_nblocks = val; break; case FILTER_DEV_FABRIC_LUN: ((target_volume_ctx_t *) (ctxt->tc_priv))->nblocks = val; break; default: break; } goto free_buf; } set_default: switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_nblocks = 0; break; default: break; } free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } inm_s32_t filter_dev_nblks_show(target_context_t *ctxt, char *buf) { switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: return snprintf(buf, INM_PAGESZ, "%lld", (long long)((ctxt->tc_priv != NULL) ? 
((host_dev_ctx_t *)ctxt->tc_priv)->hdc_nblocks : -1)); break; case FILTER_DEV_FABRIC_LUN: return sprintf(buf, "%lld", (long long)((ctxt->tc_priv != NULL) ? ((target_volume_ctx_t*)ctxt->tc_priv)->nblocks : -1)); break; default: break; } return 0; } inm_s32_t filter_dev_nblks_store(target_context_t *ctxt, char * file_name, const char *buf, inm_s32_t len) { inm_u64_t val = 0; if(!is_digit(buf, len)) { err("Invalid value for volume number of blocks = %s : has non-digit chars", buf); return -EINVAL; } val = inm_atoi64(buf); if(!write_vol_attr(ctxt, file_name, (void *)buf, len)) { return -EINVAL; } else { switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_nblocks = val; break; default: break; } } return len; } void read_filter_dev_bsize(target_context_t *ctxt, char * fname) { inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1); void *buf = NULL; if(!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) { dbg("read failed for %s", fname); } else { inm_u32_t val; if(!is_digit(buf, bytes_read)) goto set_default; val = inm_atoi(buf); switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_bsize = val; break; case FILTER_DEV_FABRIC_LUN: ((target_volume_ctx_t *) (ctxt->tc_priv))->bsize = val; break; default: break; } goto free_buf; } set_default: switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_bsize = 0; break; default: break; } free_buf: if(buf) INM_KFREE(buf, buf_len, INM_KERNEL_HEAP); } inm_s32_t filter_dev_bsize_show(target_context_t *ctxt, char *buf) { switch(ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: return snprintf(buf, INM_PAGESZ, "%d", ((ctxt->tc_priv != NULL) ? ((host_dev_ctx_t *)ctxt->tc_priv)->hdc_bsize : -1)); break; case FILTER_DEV_FABRIC_LUN: return sprintf(buf, "%d", ((ctxt->tc_priv != NULL) ? 
inm_s32_t inm_vol_pt_path_show(target_context_t *ctxt, char *buf)
{
	inm_u32_t len = 0;
	target_volume_ctx_t *tvcptr = NULL;

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		dbg("Volume tunable only for fabric setup");
		return 0;
	}

	tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);
	if (tvcptr) {
		len = snprintf(buf, INM_PAGESZ, "%s\n", tvcptr->pt_guid);
		err("In inm_vol_pt_path_show GUID %s", tvcptr->pt_guid);
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving inm_vol_pt_path_show :%s", buf);
	}
	return len;
}

inm_s32_t inm_vol_pt_path_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	target_volume_ctx_t *tvcptr = NULL;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("Entering inm_vol_pt_path_store :%s", buf);
	}

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		dbg("Volume tunable only for fabric setup");
		return 0;
	}

	/* reserve one byte for the NUL terminator appended below */
	if (len >= INM_GUID_LEN_MAX) {
		err("%s PT path is too long or bad", buf);
		return -EINVAL;
	}

	tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);
	if (tvcptr) {
		INM_MEM_ZERO(tvcptr->pt_guid, INM_GUID_LEN_MAX);
		if (strncpy_s(tvcptr->pt_guid, INM_GUID_LEN_MAX, buf, len))
			return -INM_EFAULT;
		tvcptr->pt_guid[len] = '\0';
		err("In inm_vol_pt_path_store GUID :%s:", tvcptr->pt_guid);
	}
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		err("VolumePTPath update failed to write file:%s.", file_name);
		return -EINVAL;
	}
	return len;
}

void inm_read_pt_path(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (INM_GUID_LEN_MAX);
	void *buf = NULL;
	inm_u32_t len = 0;
	target_volume_ctx_t *tvcptr = NULL;

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		dbg("Volume tunable only for fabric setup");
		return;
	}

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto free_buf;
	}

	len = strlen(buf);
	if (len >= INM_GUID_LEN_MAX) {
		err("%s PT path is too long or bad", (char *)buf);
		goto free_buf;
	}
	tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);
	if (tvcptr) {
		/* zero first so pt_guid stays NUL-terminated after the copy */
		INM_MEM_ZERO(tvcptr->pt_guid, INM_GUID_LEN_MAX);
		memcpy_s(tvcptr->pt_guid, INM_GUID_LEN_MAX, buf, len);
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("Leaving inm_read_pt_path :%s", (char *)buf);
	}

free_buf:
	if (buf) {
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	}
}

inm_s32_t inm_vol_at_direct_rd_show(target_context_t *ctxt, char *buf)
{
	inm_u32_t len = 0;
	target_volume_ctx_t *tvcptr = NULL;

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		err("Volume tunable only for fabric setup");
		return 0;
	}

	tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);
	if (tvcptr) {
		/* report the setting as 0/1 rather than the raw flag bit */
		len = snprintf(buf, INM_PAGESZ, "%d\n",
			!!(tvcptr->flags & TARGET_VOLUME_DIRECT_IO));
	}
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving inm_vol_at_direct_rd_show :%s", buf);
	}
	return len;
}

inm_s32_t inm_vol_at_direct_rd_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_s32_t val = 0;
	target_volume_ctx_t *tvcptr = NULL;
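
	/*
	 * ATDirectRead is a 0/1 tunable: the value is validated and
	 * persisted via write_vol_attr() before the in-memory
	 * TARGET_VOLUME_DIRECT_IO flag is updated under volume_lock().
	 */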
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entering inm_vol_at_direct_rd_store :%s", buf);
	}

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		dbg("Volume tunable only for fabric setup");
		return 0;
	}

	if (!is_digit(buf, len)) {
		err("Invalid value for ATDirectRead = %s : has non-digit chars", buf);
		return -EINVAL;
	}

	val = inm_atoi(buf);
	if ((val != 0) && (val != 1)) {
		err("Can't have anything other than 0 and 1 for VolumeATDirectRead");
		return -EINVAL;
	}

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		volume_lock(ctxt);
		tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);
		if (tvcptr) {
			if (val) {
				tvcptr->flags |= TARGET_VOLUME_DIRECT_IO;
			} else {
				tvcptr->flags &= ~TARGET_VOLUME_DIRECT_IO;
			}
		}
		volume_unlock(ctxt);
	}
	return len;
}

void inm_read_at_direct_rd(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;
	target_volume_ctx_t *tvcptr = (target_volume_ctx_t *)(ctxt->tc_priv);

	if (ctxt->tc_dev_type != FILTER_DEV_FABRIC_LUN) {
		dbg("Volume tunable only for fabric setup");
		return;
	}

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		inm_s32_t enabled;

		if (!is_digit(buf, bytes_read))
			goto set_default;
		enabled = inm_atoi(buf);
		if (tvcptr) {
			if (enabled != 0 && enabled != 1)
				goto set_default;
			if (enabled)
				tvcptr->flags |= TARGET_VOLUME_DIRECT_IO;
			else
				tvcptr->flags &= ~TARGET_VOLUME_DIRECT_IO;
		}
		goto free_buf;
	}

set_default:
	if (tvcptr) {
		tvcptr->flags |= TARGET_VOLUME_DIRECT_IO;
	}
free_buf:
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t filter_dev_bsize_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_u32_t val = 0;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume block size = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		switch (ctxt->tc_dev_type) {
		case FILTER_DEV_HOST_VOLUME:
		case FILTER_DEV_MIRROR_SETUP:
			((host_dev_ctx_t *)ctxt->tc_priv)->hdc_bsize = val;
			break;
		default:
			break;
		}
	}
	return len;
}

inm_s32_t vol_mount_point_show(target_context_t *ctxt, char *buf)
{
	if (ctxt->tc_dev_type != FILTER_DEV_HOST_VOLUME)
		return 0;

	return snprintf(buf, INM_PAGESZ, "%s", ctxt->tc_mnt_pt);
}

inm_s32_t vol_mount_point_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (ctxt->tc_dev_type != FILTER_DEV_HOST_VOLUME)
		return len;

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		switch (ctxt->tc_dev_type) {
		case FILTER_DEV_HOST_VOLUME:
			volume_lock(ctxt);
			if (strncpy_s(ctxt->tc_mnt_pt, len + 1, buf, len)) {
				volume_unlock(ctxt);
				return -INM_EFAULT;
			}
			ctxt->tc_mnt_pt[len] = '\0';
			volume_unlock(ctxt);
			break;
		default:
			break;
		}
	}
	return len;
}
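
/*
 * The mount point is cached in tc_mnt_pt and always NUL-terminated; the
 * load path below also strips a trailing newline left by the persistent
 * store file.
 */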
void read_vol_mount_point(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = INM_PATH_MAX;
	void *buf = NULL;

	if (ctxt->tc_dev_type != FILTER_DEV_HOST_VOLUME)
		goto set_default;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		/* guard against an empty file before stripping the newline */
		if (bytes_read && ((char *)buf)[bytes_read - 1] == '\n')
			bytes_read--;
		volume_lock(ctxt);
		if (memcpy_s(ctxt->tc_mnt_pt, INM_PATH_MAX, buf, bytes_read)) {
			volume_unlock(ctxt);
			goto set_default;
		}
		ctxt->tc_mnt_pt[bytes_read] = '\0';
		volume_unlock(ctxt);
		goto free_buf;
	}

set_default:
	ctxt->tc_mnt_pt[0] = '\0';
free_buf:
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t vol_prev_end_timestamp_show(target_context_t *ctxt, char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%llu",
			(unsigned long long)ctxt->tc_PrevEndTimeStamp);
}

inm_s32_t vol_prev_end_timestamp_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_u64_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume previous end timestamp = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi64(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		ctxt->tc_PrevEndTimeStamp = val;
	}
	return len;
}

void read_vol_prev_end_timestamp(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read))
			goto set_default;
		ctxt->tc_PrevEndTimeStamp = inm_atoi64(buf);
		goto free_buf;
	}

set_default:
	ctxt->tc_PrevEndTimeStamp = 0;
free_buf:
	if (driver_ctx->unclean_shutdown) {
		err("The system was not shut down cleanly");
		set_volume_out_of_sync(ctxt, ERROR_TO_REG_UNCLEAN_SYS_SHUTDOWN, 0);
		ctxt->tc_PrevEndTimeStamp = -1ULL;
	}
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t vol_prev_end_sequence_number_show(target_context_t *ctxt, char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%llu",
			(unsigned long long)ctxt->tc_PrevEndSequenceNumber);
}

inm_s32_t vol_prev_end_sequence_number_store(target_context_t *ctxt,
				char *file_name, const char *buf, inm_s32_t len)
{
	inm_u64_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume previous end sequence number = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi64(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		ctxt->tc_PrevEndSequenceNumber = val;
	}
	return len;
}

void read_vol_prev_end_sequence_number(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read))
			goto set_default;
		ctxt->tc_PrevEndSequenceNumber = inm_atoi64(buf);
		goto free_buf;
	}

set_default:
	ctxt->tc_PrevEndSequenceNumber = 0;
free_buf:
	if (driver_ctx->unclean_shutdown)
		ctxt->tc_PrevEndSequenceNumber = -1ULL;
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t vol_prev_sequence_id_for_split_io_show(target_context_t *ctxt,
				char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%u", ctxt->tc_PrevSequenceIDforSplitIO);
}

inm_s32_t vol_prev_sequence_id_for_split_io_store(target_context_t *ctxt,
				char *file_name, const char *buf, inm_s32_t len)
{
	inm_u32_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume previous continuation id = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		volume_lock(ctxt);
		ctxt->tc_PrevSequenceIDforSplitIO = val;
		volume_unlock(ctxt);
	}
	return len;
}

void read_vol_prev_sequence_id_for_split_io(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read))
			goto set_default;
		ctxt->tc_PrevSequenceIDforSplitIO = inm_atoi(buf);
		goto free_buf;
	}

set_default:
	ctxt->tc_PrevSequenceIDforSplitIO = 0;
free_buf:
	if (driver_ctx->unclean_shutdown)
		ctxt->tc_PrevSequenceIDforSplitIO = -1;
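	/*
	 * The -1 written above is an "unknown" sentinel: after an unclean
	 * shutdown the persisted split-IO sequence id can no longer be
	 * trusted.
	 */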
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t vol_mirror_source_list_show(target_context_t *ctxt, char *buf)
{
	struct inm_list_head *ptr = NULL, *nextptr = NULL;
	mirror_vol_entry_t *vol_entry;
	inm_s32_t buf_len = 0;
	char *sptr = buf;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entering");
	}

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
		return 0;

	INM_MEM_ZERO(buf, INM_PAGESZ);
	volume_lock(ctxt);
	inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_src_list) {
		vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next);
		/* +1 accounts for the newline appended per entry */
		if (buf_len + strlen(vol_entry->tc_mirror_guid) + 1 > INM_PAGESZ) {
			break;
		}
		snprintf(sptr, INM_PAGESZ - buf_len, "%s\n",
						vol_entry->tc_mirror_guid);
		sptr += strlen(vol_entry->tc_mirror_guid) + 1;
		buf_len += strlen(vol_entry->tc_mirror_guid) + 1;
	}
	volume_unlock(ctxt);

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving buf_len:%d", buf_len);
	}
	return buf_len;
}

inm_s32_t vol_mirror_source_list_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entered buf:%s", buf);
	}

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
		return len;

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving buf:%s len:%d", buf, len);
	}
	return len;
}

inm_s32_t prepare_volume_list(char *buf, struct inm_list_head *list_head,
				int keep_device_open)
{
	char *str, *str1, *str2;
	int len = 0;
	int err = 0;
	mirror_vol_entry_t *vol_entry;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entered buf:%s", buf);
	}

	str = buf;
	while (1) {
		if ((str1 = strchr(str, '/'))) {
			if ((str2 = strchr(str, ','))) {
				*str2 = '\0';
			}
			len = strlen(str1);
			vol_entry = INM_KMALLOC(sizeof(mirror_vol_entry_t),
					INM_KM_SLEEP, INM_KERNEL_HEAP);
			if (!vol_entry) {
				err = 1;
				break;
			}
			INM_MEM_ZERO(vol_entry, sizeof(mirror_vol_entry_t));
			if (strncpy_s(vol_entry->tc_mirror_guid,
					INM_GUID_LEN_MAX, str1, len)) {
				INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t),
						INM_KERNEL_HEAP);
				err = 1;
				break;
			}
			vol_entry->tc_mirror_guid[len] = '\0';
#ifdef INM_LINUX
			vol_entry->mirror_dev =
				open_by_dev_path(vol_entry->tc_mirror_guid, 1);
			/* a missing bd_disk means the open yielded no usable disk */
			if (!vol_entry->mirror_dev ||
					!vol_entry->mirror_dev->bd_disk) {
				err("Failed to open the volume:%s during boot time stacking",
						vol_entry->tc_mirror_guid);
				INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t),
						INM_KERNEL_HEAP);
				err = 1;
				break;
			} else {
				if (!keep_device_open) {
					close_bdev(vol_entry->mirror_dev,
							FMODE_WRITE);
				}
			}
#else
#ifdef INM_SOLARIS
			vol_entry->mirror_dev = (inm_block_device_t *)
				INM_KMALLOC(sizeof(inm_block_device_t),
					INM_KM_SLEEP, INM_KERNEL_HEAP);
			*vol_entry->mirror_dev =
				inm_get_dev_t_from_path(vol_entry->tc_mirror_guid);
			if (!*vol_entry->mirror_dev) {
				err("Failed to open the volume:%s during boot time stacking for"
					"mirror setup", vol_entry->tc_mirror_guid);
				INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t),
						INM_KERNEL_HEAP);
				err = 1;
				break;
			}
#endif
#endif
			dbg("prepare_volume_list(): Mirror Volume Path %s",
					vol_entry->tc_mirror_guid);
			inm_list_add_tail(&vol_entry->next, list_head);
			if (str2) {
				*str2 = ',';
				str = str2 + 1;
			} else {
				break;
			}
		} else {
			break;
		}
	}

	if (err) {
		free_mirror_list(list_head, keep_device_open);
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving buf:%s err:%d", buf, err);
	}
	return err;
}
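
/*
 * prepare_volume_list() parses a comma-separated list of device paths,
 * e.g. "/dev/sdb,/dev/sdc" (illustrative), opening each device and
 * queueing a mirror_vol_entry_t per path on list_head. On any failure the
 * partially built list is torn down via free_mirror_list().
 */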
void read_vol_mirror_source_list(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len;
	void *buf = NULL;
	struct inm_list_head src_mirror_list_head;
	int err = 0;

	/* boot-time loading of the mirror source list is disabled */
	if (1) {
		return;
	}

	if (!inm_list_empty(&ctxt->tc_src_list)) {
		return;
	}

	INM_INIT_LIST_HEAD(&src_mirror_list_head);
	buf_len = INM_MAX_VOLUMES_IN_LIST * INM_GUID_LEN_MAX;
	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		err = 1;
		goto error_case;
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entered %s", (char *)buf);
	}

	err = prepare_volume_list(buf, &src_mirror_list_head, 0);
	if (err) {
		free_mirror_list(&src_mirror_list_head, 0);
	} else {
		inm_list_splice_at_tail(&src_mirror_list_head,
						&ctxt->tc_src_list);
	}

error_case:
	if (err) {
		err("Failed to read source volumes of mirror setup");
	}
	if (buf) {
		INM_KFREE(buf, INM_MAX_VOLUMES_IN_LIST * INM_GUID_LEN_MAX,
						INM_KERNEL_HEAP);
	}
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving err:%d", err);
	}
}

inm_s32_t vol_mirror_destination_list_show(target_context_t *ctxt, char *buf)
{
	struct inm_list_head *ptr = NULL, *nextptr = NULL;
	mirror_vol_entry_t *vol_entry;
	inm_s32_t buf_len = 0;
	char *sptr = buf;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entering");
	}

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
		return 0;

	INM_MEM_ZERO(buf, INM_PAGESZ);
	volume_lock(ctxt);
	inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_dst_list) {
		vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next);
		if (buf_len + strlen(vol_entry->tc_mirror_guid) + 1 > INM_PAGESZ) {
			break;
		}
		snprintf(sptr, INM_PAGESZ - buf_len, "%s\n",
						vol_entry->tc_mirror_guid);
		sptr += strlen(vol_entry->tc_mirror_guid) + 1;
		buf_len += strlen(vol_entry->tc_mirror_guid) + 1;
	}
	volume_unlock(ctxt);

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving buf_len:%d", buf_len);
	}
	return buf_len;
}

inm_s32_t vol_mirror_destination_list_store(target_context_t *ctxt,
				char *file_name, const char *buf, inm_s32_t len)
{
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("entered buf:%s", buf);
	}

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
		return len;

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	}

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))) {
		info("leaving buf:%s len:%d", buf, len);
	}
	return len;
}

void read_vol_mirror_destination_list(target_context_t *ctxt, char *fname)
{
}

inm_s32_t vol_mirror_destination_scsi_show(target_context_t *ctxt, char *uuid)
{
	inm_s32_t len = 0;
	char *guid = NULL;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))) {
		info("entered volume:%s", ctxt->tc_pname);
	}

	guid = filter_guid_name_string_get(ctxt->tc_pname,
			"VolumeMirrorDestinationScsiID", INM_MAX_SCSI_ID_SIZE);
	if (!guid) {
		len = 0;
	} else {
		guid[INM_MAX_SCSI_ID_SIZE - 1] = '\0';
		len = strlen(guid);
		/*
		 * Copy the ID into the caller-supplied buffer; the original
		 * code only measured the string and never filled uuid, which
		 * is assumed here to be an oversight.
		 */
		memcpy_s(uuid, INM_MAX_SCSI_ID_SIZE, guid, len + 1);
	}
	return len;
}

inm_s32_t vol_PTpath_list_show(target_context_t *ctxt, char *bufp)
{
	inm_s32_t len = 0;
	struct inm_list_head *ptr, *nextptr;
	mirror_vol_entry_t *vol_entry = NULL;

#if (defined(IDEBUG) || defined(IDEBUG_BMAP))
	info("entered volume:%s", ctxt->tc_pname);
#endif

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP) {
		goto out;
	}

	volume_lock(ctxt);
	inm_list_for_each_safe(ptr, nextptr, &ctxt->tc_src_list) {
		vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next);
		/* never pass the GUID as the format string */
		len += snprintf((bufp + (len)), (INM_PAGESZ - len), "%s",
						vol_entry->tc_mirror_guid);
		len += snprintf((bufp + (len)), (INM_PAGESZ - len), "\n");
	}
	volume_unlock(ctxt);
out:
	return len;
}

inm_s32_t vol_mirror_destination_scsi_store(target_context_t *ctxt,
				char *file_name, const char *buf, inm_s32_t len)
{
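	/*
	 * Persists the mirror destination SCSI ID. Empty IDs and
	 * non-mirror-setup devices are treated as a successful no-op.
	 */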
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))) {
		info("entered volume:%s", ctxt->tc_pname);
	}

	if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
		return len;

	if (strlen(buf) == 0) {
		return len;
	}

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	}
	return len;
}

inm_u64_t filter_full_disk_flags_get(target_context_t *ctxt)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1);
	void *buf = NULL;
	inm_u64_t flags = 0;

	if (!read_vol_attr(ctxt, "VolumeDiskFlags", &buf, buf_len,
							&bytes_read)) {
		goto out;
	} else {
		if (!is_digit(buf, bytes_read)) {
			goto out;
		}
		/* 64-bit flags word; inm_atoi() would truncate it */
		flags = inm_atoi64(buf);
	}
out:
	/* read_vol_attr() allocates buf; release it on all paths */
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	return flags;
}

inm_s32_t vol_disk_flags_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (!is_digit(buf, len)) {
		err("Invalid value for volume disk flag %s : has non-digit chars", buf);
		return INM_EINVAL;
	}
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return INM_EINVAL;
	}
	return 0;
}

inm_s32_t vol_dev_multipath_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (!is_digit(buf, len)) {
		err("Invalid value for volume multipath flag %s : has non-digit chars", buf);
		return INM_EINVAL;
	}
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return INM_EINVAL;
	}
	return 0;
}

inm_s32_t vol_dev_vendor_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (!is_digit(buf, len)) {
		err("Invalid value for volume device vendor %s : has non-digit chars", buf);
		return INM_EINVAL;
	}
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return INM_EINVAL;
	}
	return 0;
}

inm_s32_t vol_dev_startoff_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	if (!is_digit(buf, len)) {
		err("Invalid value for volume device start offset %s : has non-digit chars", buf);
		return INM_EINVAL;
	}
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return INM_EINVAL;
	}
	return 0;
}

void vol_dev_startoff_read(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		inm_u64_t startoff;

		if (!is_digit(buf, bytes_read))
			goto free_buf;
		/* 64-bit offset; inm_atoi() would truncate it */
		startoff = inm_atoi64(buf);
		ctxt->tc_dev_startoff = startoff;
	}
	goto free_buf;

set_default:
	ctxt->tc_dev_startoff = 0;
free_buf:
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
}

inm_s32_t mirror_dst_id_get(target_context_t *ctxt, char *uuid)
{
	inm_s32_t err = 0;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))) {
		info("entered volume:%s", ctxt->tc_pname);
	}

	if (!vol_mirror_destination_scsi_show(ctxt, uuid)) {
		err = 1;
	}
	return err;
}

inm_s32_t mirror_dst_id_set(target_context_t *ctxt, char *uuid)
{
	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))) {
		info("entered volume:%s", ctxt->tc_pname);
	}

	if (strlen(uuid) == 0) {
		return 0;
	}

	if (!vol_mirror_destination_scsi_store(ctxt,
			"VolumeMirrorDestinationScsiID", (const char *)uuid,
			INM_MAX_SCSI_ID_SIZE)) {
		return 1;
	}
	return 0;
}
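
/*
 * VolumeMaxXferSz tracks the maximum per-IO transfer size (MXS) of a host
 * volume. The value lives in the first host_dev_t on hdc_dev_list_head and
 * is sampled and updated under volume_lock().
 */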
inm_s32_t vol_max_xfersz_show(target_context_t *ctxt, char *buf)
{
	host_dev_ctx_t *hdcp = ctxt->tc_priv;
	host_dev_t *hdc_dev = NULL;
	inm_u32_t mxs = 0;

	if (ctxt->tc_dev_type & FILTER_DEV_HOST_VOLUME) {
		volume_lock(ctxt);
		hdc_dev = inm_list_entry((hdcp->hdc_dev_list_head.next),
						host_dev_t, hdc_dev_list);
		mxs = INM_GET_HDEV_MXS(hdc_dev);
		volume_unlock(ctxt);
		/* report the value sampled under the lock */
		return snprintf(buf, INM_PAGESZ, "%u", mxs);
	}
	return 0;
}

inm_s32_t vol_max_xfersz_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_u32_t val;
	host_dev_ctx_t *hdcp = ctxt->tc_priv;
	host_dev_t *hdc_dev = NULL;

	if (!(ctxt->tc_dev_type & FILTER_DEV_HOST_VOLUME)) {
		return 0;
	}

	if (!is_digit(buf, len)) {
		err("Invalid value for volume max transfer size = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return INM_EINVAL;
	} else {
		volume_lock(ctxt);
		hdc_dev = inm_list_entry((hdcp->hdc_dev_list_head.next),
						host_dev_t, hdc_dev_list);
		INM_SET_HDEV_MXS(hdc_dev, val);
		volume_unlock(ctxt);
	}
	return len;
}

void vol_max_xfersz_read(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;
	host_dev_ctx_t *hdcp = ctxt->tc_priv;
	host_dev_t *hdc_dev = NULL;

	if (!(ctxt->tc_dev_type & FILTER_DEV_HOST_VOLUME)) {
		return;
	}

	hdc_dev = inm_list_entry((hdcp->hdc_dev_list_head.next),
					host_dev_t, hdc_dev_list);
	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read))
			goto set_default;
		INM_SET_HDEV_MXS(hdc_dev, (inm_atoi(buf)));
		goto free_buf;
	}

set_default:
	INM_SET_HDEV_MXS(hdc_dev, INM_DEFAULT_VOLUME_MXS);
free_buf:
	if (buf) {
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	}
}

inm_s32_t vol_perf_opt_show(target_context_t *ctxt, char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%u", ctxt->tc_optimize_performance);
}

inm_s32_t vol_perf_opt_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_u32_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume perf opt = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi(buf);
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		volume_lock(ctxt);
		ctxt->tc_optimize_performance =
				val | DEFAULT_PERFORMANCE_OPTMIZATION;
		volume_unlock(ctxt);
	}
	return len;
}

void vol_perf_opt_read(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_INTEGER + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read))
			goto set_default;
		ctxt->tc_optimize_performance = inm_atoi(buf);
		goto free_buf;
	}

set_default:
	ctxt->tc_optimize_performance = DEFAULT_PERFORMANCE_OPTMIZATION;
free_buf:
	if (buf) {
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	}
}
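
/*
 * The RPO timestamp is persisted across reboots; vol_rpo_ts_store() only
 * updates the in-memory copy once the write to the persistent store has
 * succeeded.
 */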
inm_s32_t vol_rpo_ts_show(target_context_t *ctxt, char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%llu",
			(unsigned long long)ctxt->tc_rpo_timestamp);
}

void vol_rpo_ts_read(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1);
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read)) {
			goto set_default;
		}
		ctxt->tc_rpo_timestamp = inm_atoi64(buf);
		dbg("Rpo value set to %llu", ctxt->tc_rpo_timestamp);
		goto free_buf;
	}

set_default:
	ctxt->tc_rpo_timestamp = 0;
	dbg("Rpo value set to default %llu", ctxt->tc_rpo_timestamp);
free_buf:
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	dbg("RPO timestamp from persistent store = %llu\n",
			ctxt->tc_rpo_timestamp);
}

inm_s32_t vol_rpo_ts_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_u64_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for RPO timestamp = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi64(buf);
	/* update the in-memory value only after a successful persist */
	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		return -EINVAL;
	} else {
		ctxt->tc_rpo_timestamp = val;
	}
	return len;
}

inm_s32_t vol_drain_blocked_show(target_context_t *ctxt, char *buf)
{
	return snprintf(buf, INM_PAGESZ, "%d",
			(ctxt->tc_flags & VCF_DRAIN_BLOCKED) ? 1 : 0);
}

inm_s32_t vol_drain_blocked_store(target_context_t *ctxt, char *file_name,
				const char *buf, inm_s32_t len)
{
	inm_s32_t val;

	if (!is_digit(buf, len)) {
		err("Invalid value for volume drain blocked = %s : has non-digit chars", buf);
		return -EINVAL;
	}
	val = inm_atoi(buf);
	if (val != 0 && val != 1) {
		err("Invalid value for volume drain block : %d", val);
		return -EINVAL;
	}

	if (!write_vol_attr(ctxt, file_name, (void *)buf, len)) {
		err("Failed to persist drain block to disk");
		return -EINVAL;
	} else {
		info("Drain block flag : %d\n", val);
		volume_lock(ctxt);
		if (val) {
			ctxt->tc_flags |= VCF_DRAIN_BLOCKED;
		} else {
			ctxt->tc_flags &= ~VCF_DRAIN_BLOCKED;
		}
		volume_unlock(ctxt);
	}
	return 0;
}

void vol_drain_blocked_read(target_context_t *ctxt, char *fname)
{
	inm_s32_t bytes_read = 0, buf_len = (NUM_CHARS_IN_LONGLONG + 1), val;
	void *buf = NULL;

	if (!read_vol_attr(ctxt, fname, &buf, buf_len, &bytes_read)) {
		dbg("read failed for %s", fname);
		goto set_default;
	} else {
		if (!is_digit(buf, bytes_read)) {
			goto set_default;
		}
		val = inm_atoi(buf);
		if (val == 0) {
			volume_lock(ctxt);
			ctxt->tc_flags &= ~VCF_DRAIN_BLOCKED;
			volume_unlock(ctxt);
			goto free_buf;
		} else if (val == 1) {
			volume_lock(ctxt);
			ctxt->tc_flags |= VCF_DRAIN_BLOCKED;
			volume_unlock(ctxt);
			goto free_buf;
		} else {
			dbg("Invalid value of val : %d\n", val);
			goto set_default;
		}
	}

set_default:
	volume_lock(ctxt);
	ctxt->tc_flags &= ~VCF_DRAIN_BLOCKED;
	volume_unlock(ctxt);
	dbg("Drain block set to default 0");
free_buf:
	if (buf)
		INM_KFREE(buf, buf_len, INM_KERNEL_HEAP);
	dbg("Drain block from persistent store = %d\n",
			(ctxt->tc_flags & VCF_DRAIN_BLOCKED) ?
1 : 0); } VOLUME_ATTR(vol_attr_VolumeFilteringDisabled, "VolumeFilteringDisabled", INM_S_IRWXUGO, &vol_flt_disabled_show, &vol_flt_disabled_store, &read_vol_flt_disabled); VOLUME_ATTR(vol_attr_VolumeBitmapReadDisabled, "VolumeBitmapReadDisabled", INM_S_IRWXUGO, &vol_bmapread_disabled_show, &vol_bmapread_disabled_store, &read_bmapread_disabled); VOLUME_ATTR(vol_attr_VolumeBitmapWriteDisabled, "VolumeBitmapWriteDisabled", INM_S_IRWXUGO, &vol_bmapwrite_disabled_show, &vol_bmapwrite_disabled_store, &read_bmapwrite_disabled); VOLUME_ATTR(vol_attr_VolumeDataFiltering, "VolumeDataFiltering", INM_S_IRWXUGO, &vol_data_flt_show, &vol_data_flt_store, &read_data_flt_enabled); VOLUME_ATTR(vol_attr_VolumeDataFiles, "VolumeDataFiles", INM_S_IRWXUGO, &vol_data_files_show, &vol_data_files_store, &read_data_files_enabled); VOLUME_ATTR(vol_attr_VolumeDataToDiskLimitInMB, "VolumeDataToDiskLimitInMB", INM_S_IRWXUGO, &vol_data_to_disk_limit_show, &vol_data_to_disk_limit_store, &read_data_to_disk_limit); VOLUME_ATTR(vol_attr_VolumeDataNotifyLimitInKB, "VolumeDataNotifyLimitInKB", INM_S_IRWXUGO, &vol_data_notify_show, &vol_data_notify_store, &read_data_notify_limit); VOLUME_ATTR(vol_attr_VolumeDataLogDirectory, "VolumeDataLogDirectory", INM_S_IRWXUGO, &vol_log_dir_show, &vol_log_dir_store, &read_data_log_dir); VOLUME_ATTR(vol_attr_VolumeBitmapGranularity, "VolumeBitmapGranularity", INM_S_IRWXUGO, &vol_bmap_gran_show, &vol_bmap_gran_store, &read_bmap_gran); VOLUME_ATTR(vol_attr_VolumeResyncRequired, "VolumeResyncRequired", INM_S_IRWXUGO, &vol_resync_req_show, &vol_resync_req_store, &read_resync_req); VOLUME_ATTR(vol_attr_VolumeOutOfSyncErrorCode, "VolumeOutOfSyncErrorCode", INM_S_IRWXUGO, &vol_osync_errcode_show, &vol_osync_errcode_store, &read_osync_errcode); VOLUME_ATTR(vol_attr_VolumeOutOfSyncErrorStatus, "VolumeOutOfSyncErrorStatus", INM_S_IRWXUGO, &vol_osync_status_show, &vol_osync_status_store, &read_osync_status); VOLUME_ATTR(vol_attr_VolumeOutOfSyncCount, "VolumeOutOfSyncCount", INM_S_IRWXUGO, &vol_osync_count_show, &vol_osync_count_store, &read_osync_count); VOLUME_ATTR(vol_attr_VolumeOutOfSyncTimeStamp, "VolumeOutOfSyncTimeStamp", INM_S_IRWXUGO, &vol_osync_ts_show, &vol_osync_ts_store, &read_osync_ts); VOLUME_ATTR(vol_attr_VolumeOutOfSyncErrorDescription, "VolumeOutOfSyncErrorDescription", INM_S_IRWXUGO, &vol_osync_desc_show, &vol_osync_desc_store, &read_osync_desc); VOLUME_ATTR(vol_attr_VolumeFilterDevType, "VolumeFilterDevType", INM_S_IRUGO, &filter_dev_type_show, NULL, &read_filter_dev_type); VOLUME_ATTR(vol_attr_VolumeNblks, "VolumeNblks", INM_S_IRWXUGO, &filter_dev_nblks_show, &filter_dev_nblks_store, &read_filter_dev_nblks); VOLUME_ATTR(vol_attr_VolumeBsize, "VolumeBsize", INM_S_IRWXUGO, &filter_dev_bsize_show, &filter_dev_bsize_store, &read_filter_dev_bsize); VOLUME_ATTR(vol_attr_VolumeResDataPoolSize, "VolumeResDataPoolSize", INM_S_IRWXUGO, &inm_vol_reserv_show, &inm_vol_reserv_store, &inm_read_reserv_dpsize); VOLUME_ATTR(vol_attr_VolumeMountPoint, "VolumeMountPoint", INM_S_IRWXUGO, &vol_mount_point_show, &vol_mount_point_store, &read_vol_mount_point); VOLUME_ATTR(vol_attr_VolumePrevEndTimeStamp, "VolumePrevEndTimeStamp", INM_S_IRWXUGO, &vol_prev_end_timestamp_show, &vol_prev_end_timestamp_store, &read_vol_prev_end_timestamp); VOLUME_ATTR(vol_attr_VolumePrevEndSequenceNumber, "VolumePrevEndSequenceNumber", INM_S_IRWXUGO, &vol_prev_end_sequence_number_show, &vol_prev_end_sequence_number_store, &read_vol_prev_end_sequence_number); VOLUME_ATTR(vol_attr_VolumePrevSequenceIDforSplitIO, 
"VolumePrevSequenceIDforSplitIO", INM_S_IRWXUGO, &vol_prev_sequence_id_for_split_io_show, &vol_prev_sequence_id_for_split_io_store, &read_vol_prev_sequence_id_for_split_io); VOLUME_ATTR(vol_attr_VolumePTPath, "VolumePTPath", INM_S_IRWXUGO, &inm_vol_pt_path_show, &inm_vol_pt_path_store, &inm_read_pt_path); VOLUME_ATTR(vol_attr_VolumeATDirectRead, "ATDirectRead", INM_S_IRWXUGO, &inm_vol_at_direct_rd_show, &inm_vol_at_direct_rd_store, &inm_read_at_direct_rd); VOLUME_ATTR(vol_attr_VolumeMirrorSourceList, "VolumeMirrorSourceList", INM_S_IRWXUGO, &vol_mirror_source_list_show, &vol_mirror_source_list_store, &read_vol_mirror_source_list); VOLUME_ATTR(vol_attr_VolumeMirrorDestinationList, "VolumeMirrorDestinationList", INM_S_IRWXUGO, &vol_mirror_destination_list_show, &vol_mirror_destination_list_store, &read_vol_mirror_destination_list); VOLUME_ATTR(vol_attr_VolumeMirrorDestinationScsiID, "VolumeMirrorDestinationScsiID", INM_S_IRWXUGO, &vol_mirror_destination_scsi_show, &vol_mirror_destination_scsi_store, NULL); VOLUME_ATTR(vol_attr_VolumeDiskFlags, "VolumeDiskFlags", INM_S_IRWXUGO, NULL, &vol_disk_flags_store, NULL); VOLUME_ATTR(vol_attr_VolumeIsDeviceMultipath, "VolumeIsDeviceMultipath", INM_S_IRWXUGO, NULL, &vol_dev_multipath_store, NULL); VOLUME_ATTR(vol_attr_VolumeDeviceVendor, "VolumeDeviceVendor", INM_S_IRWXUGO, NULL, &vol_dev_vendor_store, NULL); VOLUME_ATTR(vol_attr_VolumeDevStartOff, "VolumeDevStartOff", INM_S_IRWXUGO, NULL, &vol_dev_startoff_store, &vol_dev_startoff_read); VOLUME_ATTR(vol_attr_VolumePTpathList, "VolumePTpathList", INM_S_IRWXUGO, &vol_PTpath_list_show, NULL, NULL); VOLUME_ATTR(vol_attr_VolumePerfOptimization, "VolumePerfOptimization", INM_S_IRWXUGO, &vol_perf_opt_show, &vol_perf_opt_store, &vol_perf_opt_read); VOLUME_ATTR(vol_attr_VolumeMaxXferSz, "VolumeMaxXferSz", INM_S_IRWXUGO, &vol_max_xfersz_show, &vol_max_xfersz_store, &vol_max_xfersz_read); VOLUME_ATTR(vol_attr_VolumeRpoTimeStamp, "VolumeRpoTimeStamp", INM_S_IRWXUGO, &vol_rpo_ts_show, &vol_rpo_ts_store, &vol_rpo_ts_read); VOLUME_ATTR(vol_attr_VolumeDrainBlocked, "VolumeDrainBlocked", INM_S_IRWXUGO, &vol_drain_blocked_show, &vol_drain_blocked_store, &vol_drain_blocked_read); static struct attribute *sysfs_volume_attrs[] = { &vol_attr_VolumeFilteringDisabled.attr, &vol_attr_VolumeBitmapReadDisabled.attr, &vol_attr_VolumeBitmapWriteDisabled.attr, &vol_attr_VolumeDataFiltering.attr, &vol_attr_VolumeDataFiles.attr, &vol_attr_VolumeDataToDiskLimitInMB.attr, &vol_attr_VolumeDataNotifyLimitInKB.attr, &vol_attr_VolumeDataLogDirectory.attr, &vol_attr_VolumeBitmapGranularity.attr, &vol_attr_VolumeResyncRequired.attr, &vol_attr_VolumeOutOfSyncErrorCode.attr, &vol_attr_VolumeOutOfSyncErrorStatus.attr, &vol_attr_VolumeOutOfSyncCount.attr, &vol_attr_VolumeOutOfSyncTimeStamp.attr, &vol_attr_VolumeOutOfSyncErrorDescription.attr, &vol_attr_VolumeFilterDevType.attr, &vol_attr_VolumeNblks.attr, &vol_attr_VolumeBsize.attr, &vol_attr_VolumeResDataPoolSize.attr, &vol_attr_VolumeMountPoint.attr, &vol_attr_VolumePrevEndTimeStamp.attr, &vol_attr_VolumePrevEndSequenceNumber.attr, &vol_attr_VolumePrevSequenceIDforSplitIO.attr, &vol_attr_VolumePTPath.attr, &vol_attr_VolumeATDirectRead.attr, &vol_attr_VolumeMirrorSourceList.attr, &vol_attr_VolumeMirrorDestinationList.attr, &vol_attr_VolumeMirrorDestinationScsiID.attr, &vol_attr_VolumeDiskFlags.attr, &vol_attr_VolumeIsDeviceMultipath.attr, &vol_attr_VolumeDeviceVendor.attr, &vol_attr_VolumeDevStartOff.attr, &vol_attr_VolumePTpathList.attr, &vol_attr_VolumePerfOptimization.attr, 
	&vol_attr_VolumeMaxXferSz.attr,
	&vol_attr_VolumeRpoTimeStamp.attr,
	&vol_attr_VolumeDrainBlocked.attr,
	NULL,
};

int set_int_vol_attr(target_context_t *ctxt, enum volume_params_idx index,
				inm_s32_t val)
{
	char buf[(NUM_CHARS_IN_INTEGER + 1)];
	inm_s32_t copied;
	struct volume_attribute *volume_attr;
	struct attribute *attr = sysfs_volume_attrs[index];

	INM_MEM_ZERO(buf, NUM_CHARS_IN_INTEGER + 1);
	copied = snprintf(buf, NUM_CHARS_IN_INTEGER + 1, "%d", val);

	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->store)
		return volume_attr->store(ctxt, volume_attr->file_name, buf,
						copied);
	return INM_EFAULT;
}

void set_string_vol_attr(target_context_t *ctxt, enum volume_params_idx index,
				char *string)
{
	struct volume_attribute *volume_attr;
	struct attribute *attr = sysfs_volume_attrs[index];

	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->store)
		volume_attr->store(ctxt, volume_attr->file_name, string,
						strlen(string));
}

void set_longlong_vol_attr(target_context_t *ctxt, enum volume_params_idx index,
				inm_s64_t val)
{
	char buf[(NUM_CHARS_IN_LONGLONG + 1)];
	inm_s32_t copied;
	struct volume_attribute *volume_attr;
	struct attribute *attr = sysfs_volume_attrs[index];

	INM_MEM_ZERO(buf, NUM_CHARS_IN_LONGLONG + 1);
	copied = snprintf(buf, NUM_CHARS_IN_LONGLONG + 1, "%lld",
						(long long)val);

	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->store)
		volume_attr->store(ctxt, volume_attr->file_name, buf, copied);
}

void set_unsignedlonglong_vol_attr(target_context_t *ctxt,
				enum volume_params_idx index, inm_u64_t val)
{
	char buf[(NUM_CHARS_IN_LONGLONG + 1)];
	inm_s32_t copied;
	struct volume_attribute *volume_attr;
	struct attribute *attr = sysfs_volume_attrs[index];

	INM_MEM_ZERO(buf, NUM_CHARS_IN_LONGLONG + 1);
	copied = snprintf(buf, NUM_CHARS_IN_LONGLONG + 1, "%llu",
						(unsigned long long)val);

	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->store)
		volume_attr->store(ctxt, volume_attr->file_name, buf, copied);
}

static inm_s32_t vol_attr_show(void *vol_infop, struct attribute *attr,
				char *page)
{
	target_context_t *temp;
	struct volume_attribute *volume_attr;
	inm_s32_t ret = 0;

	temp = (target_context_t *)vol_infop;
	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->show)
		ret = volume_attr->show(temp, page);
	return ret;
}

static inm_s32_t vol_attr_store(void *vol_infop, struct attribute *attr,
				const char *page, inm_s32_t len)
{
	target_context_t *temp;
	struct volume_attribute *volume_attr;
	inm_s32_t ret = 0;

	temp = (target_context_t *)vol_infop;
	volume_attr = inm_container_of(attr, struct volume_attribute, attr);
	if (volume_attr->store)
		ret = volume_attr->store(temp, volume_attr->file_name, page,
						len);
	if (ret < 0)
		return ret;
	else
		return len;
}

static void volume_release(void *argp)
{
	target_context_t *ctxt = (target_context_t *)argp;

	target_context_release(ctxt);
}

void load_volume_params(target_context_t *ctxt)
{
	inm_s32_t num_attribs, temp;

	if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))) {
		info("entered ctx:%p volume:%s", ctxt, ctxt->tc_guid);
	}

	num_attribs = (sizeof(sysfs_volume_attrs) /
					sizeof(struct attribute *));
	num_attribs--;	/* don't count the NULL terminator */

	temp = 0;
	while (temp < num_attribs) {
		struct volume_attribute *volume_attr;
		struct attribute *attr = sysfs_volume_attrs[temp];

		volume_attr = inm_container_of(attr, struct volume_attribute,
						attr);
		if (volume_attr->read)
			volume_attr->read(ctxt, volume_attr->file_name);
		temp++;
	}
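
	/*
	 * Every attribute providing a ->read() callback has now seeded its
	 * in-memory value from the persistent store or fallen back to its
	 * default.
	 */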
if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving ctx:%p volume:%s",ctxt, ctxt->tc_guid); } } /* Modify persistent store based on persistent name */ int inm_write_guid_attr(char *pname, enum volume_params_idx index, inm_s32_t val) { char *path = NULL; inm_u32_t wrote = 0, ret = 0; char buf[(NUM_CHARS_IN_INTEGER + 1)]; inm_s32_t copied; struct volume_attribute *volume_attr; struct attribute *attr = sysfs_volume_attrs[index]; volume_attr = inm_container_of(attr, struct volume_attribute, attr); INM_MEM_ZERO(buf, NUM_CHARS_IN_INTEGER + 1); copied = snprintf(buf, (NUM_CHARS_IN_INTEGER) + 1, "%d", val); if(!get_path_memory(&path)) { err("write_guid_attr: Failed to allocated memory path"); return 1; } snprintf(path, INM_PATH_MAX, "%s/%s/%s", PERSISTENT_DIR, pname, volume_attr->file_name); dbg("Writing to file %s", path); if(!write_full_file(path, (void *)buf, copied, &wrote)) { err("write_guid_attr: write to persistent store failed %s", path); ret = 1; } else { ret = 0; } free_path_memory(&path); return ret; } inm_s32_t volume_get_set_attribute_entry(struct _inm_attribute *inm_attr) { inm_s32_t ret = 0; char *lbufp = NULL; inm_u32_t lbuflen = 0; target_context_t *ctxt = NULL; INM_DOWN_READ(&driver_ctx->tgt_list_sem); ctxt = get_tgt_ctxt_from_name_nowait_locked(inm_attr->guid.volume_guid); if (!ctxt){ INM_UP_READ(&driver_ctx->tgt_list_sem); dbg("%s is not stacked",inm_attr->guid.volume_guid); goto out; } INM_UP_READ(&driver_ctx->tgt_list_sem); lbuflen = INM_MAX(inm_attr->buflen, INM_PAGESZ); lbufp = (char *) INM_KMALLOC(lbuflen, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!lbufp){ err("buffer allocation get_set ioctl failed\n"); ret = INM_ENOMEM; goto out; } INM_MEM_ZERO(lbufp, lbuflen); if(inm_attr->why == SET_ATTR){ if (INM_COPYIN(lbufp, inm_attr->bufp, inm_attr->buflen)) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } ret = vol_attr_store((void *)ctxt, sysfs_volume_attrs[inm_attr->type], lbufp, inm_attr->buflen); if (ret < 0){ err("attributre store failed"); ret = -ret; goto out; } } else { ret = vol_attr_show((void *)ctxt, sysfs_volume_attrs[inm_attr->type], lbufp); if (ret == 0){ dbg("its unlikely but get attribute read 0 bytes only"); } else if (ret > 0){ if (INM_COPYOUT(inm_attr->bufp, lbufp, ret+1)) { err("copyout failed\n"); ret = INM_EFAULT; goto out; } } else { err("get attribute failed"); ret = -ret; goto out; } } ret = 0; out: if(ctxt) put_tgt_ctxt(ctxt); if (lbufp) { INM_KFREE(lbufp, lbuflen, INM_KERNEL_HEAP); } return ret; } inm_s32_t sysfs_init_volume(target_context_t *ctxt, char *pname) { char *path = NULL; inm_s32_t err = -1; #ifdef INM_AIX host_dev_ctx_t *hdcp = NULL; host_dev_t *hdc_dev = NULL; inm_s32_t mxs_vol = 0; #endif if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered ctx:%p volume:%s",ctxt, pname); } ctxt->release = volume_release; get_tgt_ctxt(ctxt); if (!get_path_memory(&path)) { put_tgt_ctxt(ctxt); err = -ENOMEM; err("Failed to get memory while creating persistent directory"); return err; } snprintf(path, INM_PATH_MAX, "%s/%s" , PERSISTENT_DIR, pname); /* * Upgrades for change in persistent names are done by creating symlinks to * existing persistent dir names. 
In case of a fresh protection, remove old * symlinks if any before creating persistent directory */ inm_unlink_symlink(path, PERSISTENT_DIR); inm_mkdir(path, 0755); free_path_memory(&path); #ifdef INM_AIX hdcp = ctxt->tc_priv; hdc_dev = inm_list_entry((hdcp->hdc_dev_list_head.next), host_dev_t, hdc_dev_list); if (mxs_vol = INM_GET_HDEV_MXS(hdc_dev)) { set_int_vol_attr(ctxt, VolumeMaxXferSz, mxs_vol); } #endif load_volume_params(ctxt); err = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving ctx:%p volume:%s ret:%d",ctxt, ctxt->tc_pname, err); } return err; } /* Wrapper to common_attr_store() */ ssize_t wrap_common_attr_store(inm_u32_t type, const char *page, size_t len) { struct attribute *attr; attr = sysfs_common_attrs[type]; return common_attr_store(attr, page, len); } inm_s32_t inm_is_upgrade_pname(char *actual, char *upgrade) { int retval = 0; void *af = NULL; void *uf = NULL; char *path = NULL; if (!get_path_memory(&path)) { retval = -ENOMEM; path = NULL; goto out; } snprintf(path, INM_PATH_MAX, "%s/%s" , PERSISTENT_DIR, actual); af = filp_open(path, O_RDONLY, 0777); if (IS_ERR(af)) { retval = PTR_ERR(af); err("Cannot open %s", path); goto out; } snprintf(path, INM_PATH_MAX, "%s/%s" , PERSISTENT_DIR, upgrade); uf = filp_open(path, O_RDONLY, 0777); if (IS_ERR(uf)) { retval = -EINVAL; err("Cannot open %s", path); filp_close(af, NULL); goto out; } dbg("af = %p, uf = %p", INM_HDL_TO_INODE(af), INM_HDL_TO_INODE(uf)); if (INM_HDL_TO_INODE(af) != INM_HDL_TO_INODE(uf)) retval = -EINVAL; filp_close(af, NULL); filp_close(uf, NULL); out: if (path) free_path_memory(&path); return retval; } involflt-0.1.0/src/verifier.c0000755000000000000000000001355014467303177014634 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "verifier.h" #ifdef __INM_KERNEL_DRIVERS__ extern driver_context_t *driver_ctx; /* Can be called to toggle the verifier on or on change of dm file size */ inm_s32_t inm_verify_alloc_area(inm_u32_t size, int toggle) { void *old = NULL; inm_s32_t error = 0; inm_irqflag_t flag = 0; /* Verifier is not on and verifier is not being toggled on */ if (!driver_ctx->dc_verifier_on && !toggle) return error; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_verifier_lock, flag); old = driver_ctx->dc_verifier_area; driver_ctx->dc_verifier_area = vmalloc(size); if (!driver_ctx->dc_verifier_area) { err("Cannot vmalloc %u", size); driver_ctx->dc_verifier_area = old; error = -ENOMEM; } else { if (old) vfree(old); } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_verifier_lock, flag); return error; } void inm_verify_free_area(void) { inm_irqflag_t flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_verifier_lock, flag); if (driver_ctx->dc_verifier_area) vfree(driver_ctx->dc_verifier_area); driver_ctx->dc_verifier_area = NULL; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_verifier_lock, flag); } #endif /* __INM_KERNEL_DRIVERS__ */ inm_s32_t inm_verify_change_node_data(char *buf, int bufsz, int verbose) { inm_s32_t error = -EBADF; char *cur = NULL; inm_u64_t offset = 0; inm_u32_t *tag = 0; SVD_PREFIX *prefix = 0; SVD_TIME_STAMP_V2 *time_stamp; long long chglen = 0; SVD_DIRTY_BLOCK_V2 *dblk; int dbcount = 1; cur = buf; /* Endian Tag */ verify_err(verbose, "Endian Tag: %c%c%c%c", cur[0], cur[1], cur[2], cur[3]); tag = (inm_u32_t *)cur; /*if ( *tag != inm_is_little_endian() ? SVD_TAG_LEFORMAT : SVD_TAG_BEFORMAT)*/ if ( *tag != SVD_TAG_LEFORMAT) goto err; cur += sizeof(SVD_TAG_LEFORMAT); offset += sizeof(SVD_TAG_LEFORMAT); /* SVD Header Tag */ prefix = (SVD_PREFIX *)cur; verify_err(verbose, "HDR -> Tag: %c%c%c%c Count: %d Flags: %d", cur[0], cur[1], cur[2], cur[3], prefix->count, prefix->Flags); if (prefix->tag != SVD_TAG_HEADER1) goto err; cur += sizeof(SVD_PREFIX); offset += sizeof(SVD_PREFIX); /* SVD Header - skip for now */ cur += sizeof(SVD_HEADER1); offset += sizeof(SVD_HEADER1); /* TOFC */ prefix = (SVD_PREFIX *)cur; verify_err(verbose, "TOFC -> Tag: %c%c%c%c Count: %d Flags: %d", cur[0], cur[1], cur[2], cur[3], prefix->count, prefix->Flags); if (prefix->tag != SVD_TAG_TIME_STAMP_OF_FIRST_CHANGE_V2) goto err; cur += sizeof(SVD_PREFIX); offset += sizeof(SVD_PREFIX); /* TOFC Time Stamp */ time_stamp = (SVD_TIME_STAMP_V2 *)cur; verify_err(verbose, "TOFC Time: %llu Seq No: %llu", time_stamp->TimeInHundNanoSecondsFromJan1601, time_stamp->ullSequenceNumber); cur += sizeof(SVD_TIME_STAMP_V2); offset += sizeof(SVD_TIME_STAMP_V2); /* LODC */ prefix = (SVD_PREFIX *)cur; verify_err(verbose, "LODC -> Tag: %c%c%c%c Count: %d Flags: %d", cur[0], cur[1], cur[2], cur[3], prefix->count, prefix->Flags); if (prefix->tag != SVD_TAG_LENGTH_OF_DRTD_CHANGES) goto err; cur += sizeof(SVD_PREFIX); offset += sizeof(SVD_PREFIX); /* Change Len */ chglen = *((long long *)cur); verify_err(verbose, "Change Len: %lld", chglen); cur += sizeof(chglen); offset += sizeof(chglen); /* Changes */ while (chglen) { /* Dirty Block Prefix */ prefix = (SVD_PREFIX *)cur; verify_err(verbose, "DB[%d] -> Tag: %c%c%c%c Count: %d Flags: %d", dbcount, cur[0], cur[1], cur[2], cur[3], prefix->count, prefix->Flags); if (prefix->tag != SVD_TAG_DIRTY_BLOCK_DATA_V2) goto err; cur += sizeof(SVD_PREFIX); offset += sizeof(SVD_PREFIX); /* Dirty Block */ dblk = (SVD_DIRTY_BLOCK_V2 *)cur; verify_err(verbose, "DB[%d] -> Len: %u Off: %llu TDelta: %u SDelta: %u", dbcount, 
dblk->Length, dblk->ByteOffset, dblk->uiTimeDelta, dblk->uiSequenceNumberDelta); cur += sizeof(SVD_DIRTY_BLOCK_V2); offset += sizeof(SVD_DIRTY_BLOCK_V2); /* Data */ if (dblk->Length < bufsz - offset) { cur += dblk->Length; offset += dblk->Length; } /* Next dirty Block */ chglen -= (sizeof(SVD_PREFIX) + sizeof(SVD_DIRTY_BLOCK_V2) + dblk->Length); dbcount++; if (offset > bufsz) { verify_err(verbose, "Exceeded buffer"); goto err; } } /* TLV2 */ prefix = (SVD_PREFIX *)cur; verify_err(verbose, "TLV2 -> Tag: %c%c%c%c Count: %d Flags: %d", cur[0], cur[1], cur[2], cur[3], prefix->count, prefix->Flags); if (prefix->tag != SVD_TAG_TIME_STAMP_OF_LAST_CHANGE_V2) goto err; cur += sizeof(SVD_PREFIX); offset += sizeof(SVD_PREFIX); /* TLV2 Time Stamp */ time_stamp = (SVD_TIME_STAMP_V2 *)cur; verify_err(verbose, "TLV2 Time: %llu Seq No: %llu", time_stamp->TimeInHundNanoSecondsFromJan1601, time_stamp->ullSequenceNumber); cur += sizeof(SVD_TIME_STAMP_V2); offset += sizeof(SVD_TIME_STAMP_V2); /* Success */ verify_err(verbose, "File is good"); error = 0; out: return error; err: if (!verbose) inm_verify_change_node_data(buf, bufsz, 1); else verify_err(verbose, "Bad data at offset %llu(%x)", offset, (inm_u32_t)offset); goto out; } involflt-0.1.0/src/verifier.h0000755000000000000000000000401514467303177014635 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _VERIFIER_H #define _VERIFIER_H #ifndef __INM_KERNEL_DRIVERS__ /* User Mode */ #include "verifier_user.h" #define verify_err(verbose, format, arg...) \ do { \ if (verbose) \ printf("INFO: %llu: " format "\n", offset, ## arg); \ } while(0) #else /* Kernel Mode */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "filter_host.h" #include "metadata-mode.h" #include "tunable_params.h" #include "svdparse.h" #include "db_routines.h" inm_s32_t inm_verify_alloc_area(inm_u32_t size, int toggle); void inm_verify_free_area(void); #define verify_err(verbose, format, arg...) (verbose ? 
err(format, ## arg) : 0) #endif /* Kernel Mode */ #include "svdparse.h" inm_s32_t inm_verify_change_node_data(char *buf, int bufsz, int verbose); #endif involflt-0.1.0/src/last_chance_writes.h0000755000000000000000000000203214467303177016660 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _LCW_H #define _LCW_H void lcw_move_bitmap_to_raw_mode(target_context_t *tgt_ctxt); void lcw_flush_changes(void); inm_s32_t lcw_perform_bitmap_op(char *guid, enum LCW_OP op); inm_s32_t lcw_map_file_blocks(char *name); #endif involflt-0.1.0/src/telemetry.c0000755000000000000000000011117714467303177015037 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "work_queue.h" #include "utils.h" #include "filestream.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "VBitmap.h" #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "osdep.h" #include "telemetry-types.h" #include "telemetry.h" #include "telemetry-exception.h" extern driver_context_t *driver_ctx; static inm_sem_t tel_mutex; /* Log one file at a time */ static inm_spinlock_t tel_slock; /* Record list manipulation */ static flt_timer_t tel_timer; /* Refresh files/throttle at interval */ wqentry_t *tel_wqe = NULL; /* Worker thread work item */ static inm_u64_t tel_event_seq = 1; /* Cur seq no. 
Indicate holes in logs */ static inm_u64_t tel_file_seq = 0; /* File seq number */ static char *tel_fname = NULL; /* Current file name */ typedef struct tel_rec { inm_list_head_t tr_head; /* Record list */ inm_u64_t tr_seq; /* Event Seq num */ inm_u32_t tr_len; /* Length of buffer used */ inm_s32_t tr_written; /* Bytes written in single iteration */ char tr_buf[0]; /* Log buffer */ } tel_rec_t; static inm_list_head_t tel_list; /* Record List */ static int tel_nrecs = 0; static int tel_init = 0; static int tel_shutdown = 0; /* Indicated telemetry shutdown */ #define TELEMETRY_FILE_NR_MAX 12 #define TELEMETRY_FILE_NAME_PREFIX "/var/log/involflt_telemetry_completed_" #define TELEMETRY_FILE_NAME_SUFFIX ".log" #define TELEMETRY_REC_SIZE PAGE_SIZE #define TELEMETRY_REC_BUF_SIZE (TELEMETRY_REC_SIZE - sizeof(tel_rec_t)) #define TELEMETRY_REC_NR_MAX_PER_VOL 16 /* 1 hour of recs per vol */ #define TELEMETRY_REC_CUR(l) \ (inm_list_entry(inm_list_last(l), tel_rec_t, tr_head)) #define TELEMETRY_REC_CUR_BUF(l) ((TELEMETRY_REC_CUR(l))->tr_buf) #define TELEMETRY_REC_CUR_LEN(l) ((TELEMETRY_REC_CUR(l))->tr_len) #define TELEMETRY_REC_CUR_WRITTEN(l) ((TELEMETRY_REC_CUR(l))->tr_written) #define TELEMETRY_REC_CUR_SEQ(l) ((TELEMETRY_REC_CUR(l))->tr_seq) #define TELEMETRY_REC_PREFIX "{\"Map\":{" #define TELEMETRY_REC_SUFFIX "}}\n" /* Log format is "{"Map":{"K1":"V1","K2":"V2",....,"KN":"VN"}}\n" */ #define TELEMETRY_LOG(l, fmt, arg...) \ do { \ if (inm_list_empty(l)) \ break; \ \ TELEMETRY_REC_CUR_WRITTEN(l) = \ sprintf_s(TELEMETRY_REC_CUR_BUF(l) + TELEMETRY_REC_CUR_LEN(l), \ TELEMETRY_REC_BUF_SIZE - TELEMETRY_REC_CUR_LEN(l), \ fmt, ## arg); \ \ dbg("Written %d bytes", TELEMETRY_REC_CUR_WRITTEN(l)); \ if (TELEMETRY_REC_CUR_WRITTEN(l) == -1) \ telemetry_rec_alloc(l); \ else \ TELEMETRY_REC_CUR_LEN(l) += TELEMETRY_REC_CUR_WRITTEN(l); \ \ } while (!inm_list_empty(l) && \ TELEMETRY_REC_CUR_LEN(l) == 0) /* New record added to the list */ #define TELEMETRY_LOG_PREFIX(l) TELEMETRY_LOG(l, "%s", TELEMETRY_REC_PREFIX) /* go back and write over comma after last key:value pair */ #define TELEMETRY_LOG_SUFFIX(l) \ do { \ if (inm_list_empty(l)) \ break; \ \ TELEMETRY_REC_CUR_LEN(l)--; \ TELEMETRY_LOG(l, "%s", TELEMETRY_REC_SUFFIX); \ } while(0) #define TELEMETRY_LOG_KV(l, k, v, f) TELEMETRY_LOG(l, "\"%s\":\""f"\",", k, v) #define TELEMETRY_LOG_ULL(l, k, v) TELEMETRY_LOG_KV(l, k, v, "%llu") #define TELEMETRY_LOG_UL(l, k, v) TELEMETRY_LOG_KV(l, k, v, "%lu") #define TELEMETRY_LOG_UINT(l, k, v) TELEMETRY_LOG_KV(l, k, v, "%u") #define TELEMETRY_LOG_INT(l, k, v) TELEMETRY_LOG_KV(l, k, v, "%d") #define TELEMETRY_LOG_STR(l, k, v) TELEMETRY_LOG_KV(l, k, v, "%s") #define TELEMETRY_LOG_WTIME(l, k, v) \ TELEMETRY_LOG_KV(l, k, ((v != 0) ? 
(v + TELEMETRY_WTIME_OFF) : v), "%llu") static void telemetry_refresh_timeout(wqentry_t *unused); static void telemetry_cleanup(void) { wqentry_t *tmp_wqe = NULL; char *tmp_fname = NULL; dbg("Free telemetry memory"); INM_SPIN_LOCK(&tel_slock); /* If the work queue entry is no longer queued with worker thread */ if (tel_wqe && tel_wqe->flags != WITEM_TYPE_TELEMETRY_FLUSH) { tmp_wqe = tel_wqe; tel_wqe = NULL; tmp_fname = tel_fname; tel_fname = NULL; } INM_SPIN_UNLOCK(&tel_slock); if (tmp_wqe) put_work_queue_entry(tmp_wqe); if (tmp_fname) free_path_memory(&tmp_fname); } static inline inm_u64_t telemetry_next_event_id(void) { inm_u64_t event_id = 0; INM_SPIN_LOCK(&tel_slock); event_id = tel_event_seq++; INM_SPIN_UNLOCK(&tel_slock); return event_id; } static void telemetry_log_rec_drop(inm_s32_t errno, inm_u64_t start, inm_u64_t end) { dbg("Telemetry: Dropped(%d) => %llu - %llu", errno, start, end); } static void telemetry_free_rec(tel_rec_t *rec) { INM_SPIN_LOCK(&tel_slock); tel_nrecs--; INM_SPIN_UNLOCK(&tel_slock); dbg("Free %p", rec); INM_FREE_PAGE(rec, INM_KERNEL_HEAP); } /* Free record list */ static void telemetry_free_rec_list(inm_list_head_t *tel_data) { inm_list_head_t *cur = NULL; inm_list_head_t *next = NULL; tel_rec_t *rec = NULL; inm_list_for_each_safe(cur, next, tel_data) { rec = inm_list_entry(cur, tel_rec_t, tr_head); inm_list_del_init(cur); telemetry_free_rec(rec); } } static tel_rec_t * __telemetry_rec_alloc(inm_u64_t event_id) { int alloc = 0; void *tel_buf = NULL; tel_rec_t *rec = NULL; static int mem_throttled = 0; INM_SPIN_LOCK(&tel_slock); if (tel_nrecs < (driver_ctx->total_prot_volumes * TELEMETRY_REC_NR_MAX_PER_VOL)) { tel_nrecs++; alloc = 1; if (mem_throttled) { info("Telemetry: memory unthrottled"); mem_throttled = 0; } } else { if (!mem_throttled) { info("Telemetry: memory throttled"); mem_throttled = 1; } } INM_SPIN_UNLOCK(&tel_slock); if (alloc) { tel_buf = (void *)__INM_GET_FREE_PAGE(INM_KM_SLEEP | INM_KM_NOIO, INM_KERNEL_HEAP); if (!tel_buf) { err("Error allocating telemetry buffer"); INM_SPIN_LOCK(&tel_slock); tel_nrecs--; INM_SPIN_UNLOCK(&tel_slock); } else { rec = (tel_rec_t *)tel_buf; INM_INIT_LIST_HEAD(&(rec->tr_head)); rec->tr_seq = event_id; rec->tr_len = 0; rec->tr_written = 0; } } return rec; } static inm_s32_t telemetry_rec_alloc(inm_list_head_t *rec_list) { inm_s32_t error = 0; tel_rec_t *rec = NULL; inm_u64_t event_id = 0; if (!tel_init) { err("Telemetry request before telemetry service initialised"); return -EPERM; } if (inm_list_empty(rec_list)) { event_id = telemetry_next_event_id(); } else { event_id = TELEMETRY_REC_CUR_SEQ(rec_list); } rec = __telemetry_rec_alloc(event_id); if (!rec) { telemetry_free_rec_list(rec_list); telemetry_log_rec_drop(error, event_id, event_id); error = -ENOMEM; goto out; } dbg("Allocate %p", rec); inm_list_add_tail(&rec->tr_head, rec_list); out: return error; } static void telemetry_queue_rec(inm_list_head_t *rec_list) { INM_SPIN_LOCK(&tel_slock); inm_list_splice_init(rec_list, &tel_list); INM_SPIN_UNLOCK(&tel_slock); } /* start the timer */ static void start_tel_timer(void) { if (!tel_shutdown) start_timer(&tel_timer, TELEMETRY_FILE_REFRESH_INTERVAL, telemetry_refresh_timeout); } /* * Check for throttling telemetry if files are not being drained */ static int telemetry_throttled(void) { void *temp_fhdl; static int tel_throttled = 0; if (tel_file_seq >= TELEMETRY_FILE_NR_MAX) { sprintf_s(tel_fname, INM_PATH_MAX, "%s%llu%s", TELEMETRY_FILE_NAME_PREFIX, (tel_file_seq - TELEMETRY_FILE_NR_MAX + 1), 
TELEMETRY_FILE_NAME_SUFFIX); dbg("Telemetry: Checking for old file: %s", tel_fname); if (flt_open_file(tel_fname, O_RDONLY, &temp_fhdl)) { flt_close_file(temp_fhdl); if (!tel_throttled) { info("Telemetry: Throttled"); } tel_throttled = 1; } else { if (tel_throttled) { info("Telemetry: Unthrottled"); } tel_throttled = 0; } } return tel_throttled; } /* * Check for throttle and get a new file handle */ static inm_s32_t telemetry_refresh_file(void **phdl) { inm_s32_t error = 0; void *hdl = NULL; INM_DOWN(&tel_mutex); if (!tel_fname) { INM_BUG_ON(!tel_fname); goto out; } if (telemetry_throttled()) { error = -EMFILE; goto out; } tel_file_seq++; sprintf_s(tel_fname, INM_PATH_MAX, "%s%llu%s", TELEMETRY_FILE_NAME_PREFIX, tel_file_seq, TELEMETRY_FILE_NAME_SUFFIX); dbg("Telemetry: New file = %s", tel_fname); if (!flt_open_file(tel_fname, O_RDWR | O_CREAT | O_TRUNC, &hdl)) { err("Telemetry: File open failed: %s", tel_fname); error = -EIO; tel_file_seq--; hdl = NULL; } out: INM_UP(&tel_mutex); *phdl = hdl; return error; } /* Take each record and write to the file handle */ static inm_s32_t telemetry_write_data(void *tel_hdl, inm_list_head_t *tel_data) { inm_s32_t error = 0; inm_list_head_t *cur = NULL; inm_list_head_t *next = NULL; tel_rec_t *rec = NULL; inm_s32_t write_succeeded = 0; inm_u64_t offset = 0; inm_u32_t written = 0; inm_list_for_each_safe(cur, next, tel_data) { rec = inm_list_entry(cur, tel_rec_t, tr_head); dbg("Telemetry: Logging event = %llu len = %d", rec->tr_seq, rec->tr_len); write_succeeded = flt_write_file(tel_hdl, rec->tr_buf, offset, rec->tr_len, &written); if (!write_succeeded || /* Failed */ written != rec->tr_len) { /* Partial write */ error = -EIO; break; } offset += written; } flt_close_file(tel_hdl); return error; } /* * Get a new file handle and write telemetry data to it */ static inm_s32_t telemetry_flush_data(inm_list_head_t *tel_data) { inm_s32_t error = 0; void *tel_fhdl = NULL; inm_u64_t start_seqno = 0; inm_u64_t end_seqno = 0; tel_rec_t *rec = NULL; if (inm_list_empty(tel_data)) goto out; error = telemetry_refresh_file(&tel_fhdl); if (!error) error = telemetry_write_data(tel_fhdl, tel_data); if (error) { if (error != -EMFILE) /* Not throttled */ err("Error writing telemetry log"); rec = inm_list_entry(inm_list_first(tel_data), tel_rec_t, tr_head); start_seqno = rec->tr_seq; rec = inm_list_entry(inm_list_last(tel_data), tel_rec_t, tr_head); end_seqno = rec->tr_seq; telemetry_log_rec_drop(error, start_seqno, end_seqno); } telemetry_free_rec_list(tel_data); out: if (tel_shutdown) telemetry_cleanup(); return error; } /* * Telemetry flush worker routine */ static void telemetry_flush_worker(wqentry_t *wqe) { inm_list_head_t temp; INM_INIT_LIST_HEAD(&temp); /* * Grab all pending records into temp * and mark the work item as free so further * worker thread telemetry offloads can be queued */ INM_SPIN_LOCK(&tel_slock); wqe->flags = WITEM_TYPE_UNINITIALIZED; inm_list_replace_init(&tel_list, &temp); INM_SPIN_UNLOCK(&tel_slock); telemetry_flush_data(&temp); } /* * Queue telemetry flush work item to worker thread */ static void telemetry_offload_flush_to_worker(void) { INM_SPIN_LOCK(&tel_slock); /* We only queue one telemetry flush work item */ if (tel_wqe && tel_wqe->flags != WITEM_TYPE_TELEMETRY_FLUSH) { tel_wqe->flags = WITEM_TYPE_TELEMETRY_FLUSH; tel_wqe->context = NULL; tel_wqe->work_func = telemetry_flush_worker; add_item_to_work_queue(&driver_ctx->wqueue, tel_wqe); } INM_SPIN_UNLOCK(&tel_slock); } /* * Timeout callback for telemetry timer. 
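 * The flush path above works as follows: the timer fires, this callback
 * queues the single flush work item via telemetry_offload_flush_to_worker(),
 * and the worker thread runs telemetry_flush_worker(), which splices all
 * pending records off tel_list and writes them out to the next numbered
 * telemetry file.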
Since barrier/freeze * timeout is handled by timer thread, we offload writing telemetry * data to worker thread. */ static void telemetry_refresh_timeout(wqentry_t *unused) { telemetry_offload_flush_to_worker(); telemetry_check_time_jump(); start_tel_timer(); } static inm_s32_t telemetry_log_end(inm_list_head_t *rec_list) { inm_s32_t error = 0; /* Telemetry should shutdown after all tags have been dropped/drained */ INM_BUG_ON(tel_shutdown); TELEMETRY_LOG_SUFFIX(rec_list); if (inm_list_empty(rec_list)) error = -ENOMEM; else telemetry_queue_rec(rec_list); return error; } static inm_s32_t telemetry_log_start(inm_list_head_t *rec_list) { inm_s32_t error = 0; inm_u64_t ltime = 0; INM_INIT_LIST_HEAD(rec_list); error = telemetry_rec_alloc(rec_list); if (error) goto out; /* * This time should be wrt unix epoch as it is * interpreted by event collector */ get_time_stamp(<ime); TELEMETRY_LOG_PREFIX(rec_list); TELEMETRY_LOG_ULL(rec_list, "EventRecId", TELEMETRY_REC_CUR_SEQ(rec_list)); TELEMETRY_LOG_ULL(rec_list, "SrcLoggerTime", ltime); out: return error; } void telemetry_log_drop_error(inm_s32_t error) { inm_u64_t event_id = 0; event_id = telemetry_next_event_id(); telemetry_log_rec_drop(error, event_id, event_id); } #if defined(TELEMETRY) inm_s32_t telemetry_log_tag_history(change_node_t *chg_node, target_context_t *ctxt, etTagStatus status, etTagStateTriggerReason reason, etMessageType msg) { inm_s32_t error = 0; tag_history_t *tag_hist = NULL; tag_telemetry_common_t *tag_common = NULL; inm_list_head_t rec_list; inm_list_head_t *recs = &rec_list; inm_u64_t state = 0; tgt_stats_t *stats = NULL; non_wo_stats_t *nwo = NULL; exception_buf_t *excbuf = NULL; if (!chg_node) { err("NULL change node"); INM_BUG_ON(!chg_node); goto out; } tag_hist = chg_node->cn_hist; if (!tag_hist) goto out; if (!ctxt) { telemetry_log_drop_error(-ENODEV); goto out; } tag_common = tag_hist->th_tag_common; error = telemetry_log_start(recs); if (error) goto out; excbuf = telemetry_get_exception(); TELEMETRY_LOG_STR(recs, "CustomJson", excbuf->eb_buf); telemetry_put_exception(excbuf); /* Tag Data */ state = telemetry_get_dbs(ctxt, status, reason); TELEMETRY_LOG_WTIME(recs, "TagRqstTime", tag_common->tc_req_time); TELEMETRY_LOG_UINT(recs, "TagType", tag_common->tc_type); TELEMETRY_LOG_STR(recs, "TagMarkerGUID", tag_common->tc_guid); TELEMETRY_LOG_UINT(recs, "NumOfTotalDisk", tag_common->tc_ndisks); TELEMETRY_LOG_UINT(recs, "NumOfProtectdDisk", tag_common->tc_ndisks_prot); TELEMETRY_LOG_UINT(recs, "NumOfTaggedDisk", tag_common->tc_ndisks_tagged); TELEMETRY_LOG_INT(recs, "IoctlCode", tag_common->tc_ioctl_cmd); TELEMETRY_LOG_INT(recs, "TagStatus", tag_hist->th_tag_status); TELEMETRY_LOG_INT(recs, "GlobalIoctlStatus", tag_common->tc_ioctl_status); TELEMETRY_LOG_ULL(recs, "PrevTagEndTS", ctxt->tc_tel.tt_prev_tag_ts); TELEMETRY_LOG_ULL(recs, "PrevTagEndSeq", ctxt->tc_tel.tt_prev_tag_seqno); TELEMETRY_LOG_ULL(recs, "PrevCommittedTS", ctxt->tc_tel.tt_prev_ts); TELEMETRY_LOG_ULL(recs, "PrevCommittedSeq", ctxt->tc_tel.tt_prev_seqno); TELEMETRY_LOG_ULL(recs, "TagEndTS", chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601); TELEMETRY_LOG_ULL(recs, "TagEndSeq", chg_node->changes.end_ts.ullSequenceNumber); TELEMETRY_LOG_WTIME(recs, "TagInsertTime", tag_hist->th_insert_time); TELEMETRY_LOG_WTIME(recs, "TagCompleteTime", 0ULL); /* commit/revoke */ TELEMETRY_LOG_WTIME(recs, "TimeJumpDetectedTS", driver_ctx->dc_tel.dt_time_jump_exp); TELEMETRY_LOG_WTIME(recs, "TimeJumpedTS", driver_ctx->dc_tel.dt_time_jump_cur); TELEMETRY_LOG_STR(recs, 
"DiskIdentity", ctxt->tc_pname); TELEMETRY_LOG_STR(recs, "DiskName", ctxt->tc_guid); TELEMETRY_LOG_ULL(recs, "DiskBlendedState", state); TELEMETRY_LOG_UINT(recs, "TagState", status); /* Previous tag replication stats */ stats = &tag_hist->th_prev_stats; TELEMETRY_LOG_WTIME(recs, "LastTagInsertTime", tag_hist->th_prev_tag_time); TELEMETRY_LOG_ULL(recs, "PendChgPrev", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnPrev", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountPrev", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountPrev", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountPrev", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataPrevInBytes", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1Prev", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2Prev", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3Prev", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4Prev", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5Prev", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailPrev", stats->ts_commitdb_failed); /* Current replication stats */ /* Update the prev stats to current stats */ stats = &tag_hist->th_cur_stats; TELEMETRY_LOG_ULL(recs, "PendChgOnInsert", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnCurr", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountCurr", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountCurr", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountCurr", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataCurr", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1Curr", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2Curr", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3Curr", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4Curr", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5Curr", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailCurr", stats->ts_commitdb_failed); /* Prev successful tag stats */ stats = &tag_hist->th_prev_succ_stats; /* If successful commit, generate current stats and give */ if (status == ecTagStatusTagCommitDBSuccess) telemetry_tag_stats_record(ctxt, stats); TELEMETRY_LOG_ULL(recs, "PendChgToCmp", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnToCmp", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountCmp", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountCmp", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountCmp", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataToCmp", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1ToCmp", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2ToCmp", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3ToCmp", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4ToCmp", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5ToCmp", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailToCmp", stats->ts_commitdb_failed); /* Generic info */ state = telemetry_get_wostate(ctxt); TELEMETRY_LOG_WTIME(recs, "LastSuccessInsertTime", tag_hist->th_prev_succ_tag_time); TELEMETRY_LOG_ULL(recs, "WoFlags", state); TELEMETRY_LOG_UL(recs, "CountWOSToMD", ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateMetadata]); TELEMETRY_LOG_UL(recs, "CountWOSToMDUser", ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateMetadata]); TELEMETRY_LOG_UL(recs, "CountWOSToBitmap", ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateBitmap]); TELEMETRY_LOG_UL(recs, "CountWOSToBitmapUser", 
ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateBitmap]); TELEMETRY_LOG_WTIME(recs, "LastWoTime", ctxt->tc_stats.st_wostate_switch_time * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_ULL(recs, "MDChangesPending", ctxt->tc_bytes_pending_md_changes); TELEMETRY_LOG_WTIME(recs, "DiffSyncThrottleStartTS", ctxt->tc_tel.tt_ds_throttle_start); TELEMETRY_LOG_WTIME(recs, "DiffSyncThrottleEndTS", ctxt->tc_tel.tt_ds_throttle_stop); TELEMETRY_LOG_WTIME(recs, "FirstGetDbTimeOnDrainBlk", 0ULL); TELEMETRY_LOG_ULL(recs, "GetDbLatencyCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "WaitDbLatencyCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "DispatchIrpCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "PageFileIoCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "NullFileObjCnt", 0ULL); /* Non WO State stats */ nwo = &ctxt->tc_tel.tt_nwo; state = telemetry_md_capture_reason(ctxt); TELEMETRY_LOG_UINT(recs, "NonWoSReason", nwo->nws_reason); TELEMETRY_LOG_ULL(recs, "MetaDataCaptureReason", state); TELEMETRY_LOG_ULL(recs, "DiskStatusFlagsNonWo", nwo->nws_blend); TELEMETRY_LOG_UINT(recs, "NonPagePoolAlloc", nwo->nws_np_alloc); TELEMETRY_LOG_WTIME(recs, "LastNPLimitHitTime", nwo->nws_np_limit_time); TELEMETRY_LOG_UINT(recs, "NonPagePoolAllocFail", nwo->nws_np_alloc_fail); TELEMETRY_LOG_UINT(recs, "MemAllocDevCntxt", nwo->nws_mem_alloc); TELEMETRY_LOG_UINT(recs, "MemResDevCntxt", nwo->nws_mem_reserved); TELEMETRY_LOG_UINT(recs, "MemFreeDevCntxt", nwo->nws_mem_free); TELEMETRY_LOG_UINT(recs, "GlobalFreeDbCount", nwo->nws_free_cn); TELEMETRY_LOG_UINT(recs, "GlobalLockedDbCount", nwo->nws_used_cn); TELEMETRY_LOG_UINT(recs, "MaxLockedDb", nwo->nws_max_used_cn); TELEMETRY_LOG_UINT(recs, "NewWoState", nwo->nws_new_state); TELEMETRY_LOG_UINT(recs, "OldStateDuration", nwo->nws_nwo_secs); TELEMETRY_LOG_WTIME(recs, "SysTimeAtWOSChange", nwo->nws_change_time); /* Generic info */ TELEMETRY_LOG_UINT(recs, "LastResyncError", ctxt->tc_hist.ths_osync_err); TELEMETRY_LOG_WTIME(recs, "LastResyncTime",ctxt->tc_hist.ths_osync_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "ResyncStartTime", ctxt->tc_tel.tt_resync_start); TELEMETRY_LOG_WTIME(recs, "ResyncEndTime", ctxt->tc_tel.tt_resync_end); TELEMETRY_LOG_WTIME(recs, "ClearDiffTime", ctxt->tc_hist.ths_clrdiff_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "StartFilterKrnlTime", ctxt->tc_hist.ths_start_flt_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "DevCntxtCreateTime", ctxt->tc_tel.tt_create_time); TELEMETRY_LOG_WTIME(recs, "LastDrainDbTime", ctxt->tc_tel.tt_getdb_time); TELEMETRY_LOG_WTIME(recs, "LastCommitDbTime", ctxt->tc_tel.tt_commitdb_time); TELEMETRY_LOG_WTIME(recs, "DrvLoadTime", driver_ctx->dc_tel.dt_drv_load_time); TELEMETRY_LOG_UINT(recs, "DevContextFlags", ctxt->tc_flags); TELEMETRY_LOG_WTIME(recs, "LastS2Start", driver_ctx->dc_tel.dt_s2_start_time); TELEMETRY_LOG_WTIME(recs, "LastS2Stop", driver_ctx->dc_tel.dt_s2_stop_time); TELEMETRY_LOG_WTIME(recs, "LastSVStart", driver_ctx->dc_tel.dt_svagent_start_time); TELEMETRY_LOG_WTIME(recs, "LastSVStop", driver_ctx->dc_tel.dt_svagent_stop_time); TELEMETRY_LOG_UINT(recs, "MemAllocFailCount", driver_ctx->stats.num_malloc_fails); TELEMETRY_LOG_UINT(recs, "MessageType", msg | TELEMETRY_LINUX_MSG_TYPE); error = telemetry_log_end(recs); out: if (tag_hist) telemetry_tag_history_free(tag_hist); return error; } inm_s32_t telemetry_log_tag_failure(target_context_t *ctxt, tag_telemetry_common_t *tag_common, inm_s32_t tag_error, etMessageType msg) { inm_s32_t error = 0; inm_list_head_t rec_list; inm_list_head_t *recs = 
&rec_list; inm_u64_t state = 0; tgt_stats_t *stats = NULL; non_wo_stats_t *nwo = NULL; inm_u64_t curtime = 0; exception_buf_t *excbuf = NULL; if (!tag_common) goto out; error = telemetry_log_start(recs); if (error) goto out; excbuf = telemetry_get_exception(); TELEMETRY_LOG_STR(recs, "CustomJson", excbuf->eb_buf); telemetry_put_exception(excbuf); get_time_stamp(&curtime); /* Tag Data */ state = telemetry_get_dbs(ctxt, ecTagStatusInsertFailure, ecNotApplicable); TELEMETRY_LOG_WTIME(recs, "TagRqstTime", tag_common->tc_req_time); TELEMETRY_LOG_UINT(recs, "TagType", tag_common->tc_type); TELEMETRY_LOG_STR(recs, "TagMarkerGUID", tag_common->tc_guid); TELEMETRY_LOG_UINT(recs, "NumOfTotalDisk", tag_common->tc_ndisks); TELEMETRY_LOG_UINT(recs, "NumOfProtectdDisk", tag_common->tc_ndisks_prot); TELEMETRY_LOG_UINT(recs, "NumOfTaggedDisk", tag_common->tc_ndisks_tagged); TELEMETRY_LOG_INT(recs, "IoctlCode", tag_common->tc_ioctl_cmd); TELEMETRY_LOG_INT(recs, "TagStatus", tag_error); TELEMETRY_LOG_INT(recs, "GlobalIoctlStatus", tag_common->tc_ioctl_status); TELEMETRY_LOG_ULL(recs, "PrevTagEndTS", ctxt->tc_tel.tt_prev_tag_ts); TELEMETRY_LOG_ULL(recs, "PrevTagEndSeq", ctxt->tc_tel.tt_prev_tag_seqno); TELEMETRY_LOG_WTIME(recs, "TagInsertTime", curtime); TELEMETRY_LOG_WTIME(recs, "TimeJumpDetectedTS", driver_ctx->dc_tel.dt_time_jump_exp); TELEMETRY_LOG_WTIME(recs, "TimeJumpedTS", driver_ctx->dc_tel.dt_time_jump_cur); TELEMETRY_LOG_STR(recs, "DiskIdentity", ctxt->tc_pname); TELEMETRY_LOG_STR(recs, "DiskName", ctxt->tc_guid); TELEMETRY_LOG_ULL(recs, "DiskBlendedState", state); TELEMETRY_LOG_ULL(recs, "TagState", (inm_u64_t)ecTagStatusInsertFailure); /* Previous tag replication stats */ stats = &ctxt->tc_tel.tt_prev_stats; TELEMETRY_LOG_WTIME(recs, "LastTagInsertTime", ctxt->tc_tel.tt_prev_tag_time); TELEMETRY_LOG_ULL(recs, "PendChgPrev", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnPrev", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountPrev", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountPrev", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountPrev", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataPrevInBytes", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1Prev", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2Prev", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3Prev", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4Prev", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5Prev", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailPrev", stats->ts_commitdb_failed); /* Current replication stats */ /* Update the prev stats to current stats */ ctxt->tc_tel.tt_prev_tag_time = curtime; telemetry_tag_stats_record(ctxt, stats); TELEMETRY_LOG_ULL(recs, "PendChgOnInsert", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnCurr", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountCurr", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountCurr", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountCurr", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataCurr", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1Curr", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2Curr", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3Curr", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4Curr", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5Curr", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailCurr", stats->ts_commitdb_failed); /* Prev successful tag stats */ stats = 
&ctxt->tc_tel.tt_prev_succ_stats; TELEMETRY_LOG_ULL(recs, "PendChgToCmp", stats->ts_pending); TELEMETRY_LOG_ULL(recs, "ChurnToCmp", stats->ts_tracked_bytes); TELEMETRY_LOG_ULL(recs, "DrainDBCountCmp", stats->ts_getdb); TELEMETRY_LOG_ULL(recs, "RevertDBCountCmp", stats->ts_revertdb); TELEMETRY_LOG_ULL(recs, "CommitDBCountCmp", stats->ts_commitdb); TELEMETRY_LOG_ULL(recs, "DrainDataToCmp", stats->ts_drained_bytes); TELEMETRY_LOG_ULL(recs, "NwLatBckt1ToCmp", stats->ts_nwlb1); TELEMETRY_LOG_ULL(recs, "NwLatBckt2ToCmp", stats->ts_nwlb2); TELEMETRY_LOG_ULL(recs, "NwLatBckt3ToCmp", stats->ts_nwlb3); TELEMETRY_LOG_ULL(recs, "NwLatBckt4ToCmp", stats->ts_nwlb4); TELEMETRY_LOG_ULL(recs, "NwLatBckt5ToCmp", stats->ts_nwlb5); TELEMETRY_LOG_ULL(recs, "CommitDbFailToCmp", stats->ts_commitdb_failed); /* Generic info */ state = telemetry_get_wostate(ctxt); TELEMETRY_LOG_WTIME(recs, "LastSuccessInsertTime", ctxt->tc_tel.tt_prev_succ_tag_time); TELEMETRY_LOG_ULL(recs, "WoFlags", state); TELEMETRY_LOG_UL(recs, "CountWOSToMD", ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateMetadata]); TELEMETRY_LOG_UL(recs, "CountWOSToMDUser", ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateMetadata]); TELEMETRY_LOG_UL(recs, "CountWOSToBitmap", ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateBitmap]); TELEMETRY_LOG_UL(recs, "CountWOSToBitmapUser", ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateBitmap]); TELEMETRY_LOG_WTIME(recs, "LastWoTime", ctxt->tc_stats.st_wostate_switch_time * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_ULL(recs, "MDChangesPending", ctxt->tc_bytes_pending_md_changes); TELEMETRY_LOG_WTIME(recs, "DiffSyncThrottleStartTS", ctxt->tc_tel.tt_ds_throttle_start); TELEMETRY_LOG_WTIME(recs, "DiffSyncThrottleEndTS", ctxt->tc_tel.tt_ds_throttle_stop); TELEMETRY_LOG_WTIME(recs, "FirstGetDbTimeOnDrainBlk", 0ULL); TELEMETRY_LOG_ULL(recs, "GetDbLatencyCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "WaitDbLatencyCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "DispatchIrpCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "PageFileIoCnt", 0ULL); TELEMETRY_LOG_ULL(recs, "NullFileObjCnt", 0ULL); /* Non WO State stats */ nwo = &ctxt->tc_tel.tt_nwo; state = telemetry_md_capture_reason(ctxt); TELEMETRY_LOG_UINT(recs, "NonWoSReason", nwo->nws_reason); TELEMETRY_LOG_ULL(recs, "MetaDataCaptureReason", state); TELEMETRY_LOG_ULL(recs, "DiskStatusFlagsNonWo", nwo->nws_blend); TELEMETRY_LOG_UINT(recs, "NonPagePoolAlloc", nwo->nws_np_alloc); TELEMETRY_LOG_WTIME(recs, "LastNPLimitHitTime", nwo->nws_np_limit_time); TELEMETRY_LOG_UINT(recs, "NonPagePoolAllocFail", nwo->nws_np_alloc_fail); TELEMETRY_LOG_UINT(recs, "MemAllocDevCntxt", nwo->nws_mem_alloc); TELEMETRY_LOG_UINT(recs, "MemResDevCntxt", nwo->nws_mem_reserved); TELEMETRY_LOG_UINT(recs, "MemFreeDevCntxt", nwo->nws_mem_free); TELEMETRY_LOG_UINT(recs, "GlobalFreeDbCount", nwo->nws_free_cn); TELEMETRY_LOG_UINT(recs, "GlobalLockedDbCount", nwo->nws_used_cn); TELEMETRY_LOG_UINT(recs, "MaxLockedDb", nwo->nws_max_used_cn); TELEMETRY_LOG_UINT(recs, "NewWoState", nwo->nws_new_state); TELEMETRY_LOG_UINT(recs, "OldStateDuration", nwo->nws_nwo_secs); TELEMETRY_LOG_WTIME(recs, "SysTimeAtWOSChange", nwo->nws_change_time); /* Generic info */ TELEMETRY_LOG_UINT(recs, "LastResyncError", ctxt->tc_hist.ths_osync_err); TELEMETRY_LOG_WTIME(recs, "LastResyncTime",ctxt->tc_hist.ths_osync_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "ResyncStartTime", ctxt->tc_tel.tt_resync_start); TELEMETRY_LOG_WTIME(recs, "ResyncEndTime", ctxt->tc_tel.tt_resync_end); TELEMETRY_LOG_WTIME(recs, 
"ClearDiffTime", ctxt->tc_hist.ths_clrdiff_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "StartFilterKrnlTime", ctxt->tc_hist.ths_start_flt_ts * HUNDREDS_OF_NANOSEC_IN_SECOND); TELEMETRY_LOG_WTIME(recs, "DevCntxtCreateTime", ctxt->tc_tel.tt_create_time); TELEMETRY_LOG_WTIME(recs, "LastDrainDbTime", ctxt->tc_tel.tt_getdb_time); TELEMETRY_LOG_WTIME(recs, "LastCommitDbTime", ctxt->tc_tel.tt_commitdb_time); TELEMETRY_LOG_WTIME(recs, "DrvLoadTime", driver_ctx->dc_tel.dt_drv_load_time); TELEMETRY_LOG_UINT(recs, "DevContextFlags", ctxt->tc_flags); TELEMETRY_LOG_WTIME(recs, "LastS2Start", driver_ctx->dc_tel.dt_s2_start_time); TELEMETRY_LOG_WTIME(recs, "LastS2Stop", driver_ctx->dc_tel.dt_s2_stop_time); TELEMETRY_LOG_WTIME(recs, "LastSVStart", driver_ctx->dc_tel.dt_svagent_start_time); TELEMETRY_LOG_WTIME(recs, "LastSVStop", driver_ctx->dc_tel.dt_svagent_stop_time); TELEMETRY_LOG_UINT(recs, "MemAllocFailCount", driver_ctx->stats.num_malloc_fails); TELEMETRY_LOG_UINT(recs, "MessageType", msg | TELEMETRY_LINUX_MSG_TYPE); error = telemetry_log_end(recs); out: return error; } inm_s32_t telemetry_log_ioctl_failure(tag_telemetry_common_t *tag_common, inm_s32_t tag_error, etMessageType msg) { inm_s32_t error = 0; inm_u64_t state = 0; inm_list_head_t rec_list; inm_list_head_t *recs = &rec_list; exception_buf_t *excbuf = NULL; if (!tag_common) goto out; error = telemetry_log_start(recs); if (error) goto out; excbuf = telemetry_get_exception(); TELEMETRY_LOG_STR(recs, "CustomJson", excbuf->eb_buf); telemetry_put_exception(excbuf); state = telemetry_get_dbs(NULL, ecTagStatusIOCTLFailure, ecNotApplicable); TELEMETRY_LOG_WTIME(recs, "TagRqstTime", tag_common->tc_req_time); TELEMETRY_LOG_UINT(recs, "TagType", tag_common->tc_type); TELEMETRY_LOG_STR(recs, "TagMarkerGUID", tag_common->tc_guid); TELEMETRY_LOG_UINT(recs, "NumOfTotalDisk", tag_common->tc_ndisks); TELEMETRY_LOG_UINT(recs, "NumOfProtectdDisk", tag_common->tc_ndisks_prot); TELEMETRY_LOG_UINT(recs, "NumOfTaggedDisk", tag_common->tc_ndisks_tagged); TELEMETRY_LOG_INT(recs, "IoctlCode", tag_common->tc_ioctl_cmd); TELEMETRY_LOG_INT(recs, "GlobalIoctlStatus", tag_error); TELEMETRY_LOG_WTIME(recs, "TimeJumpDetectedTS", driver_ctx->dc_tel.dt_time_jump_exp); TELEMETRY_LOG_WTIME(recs, "TimeJumpedTS", driver_ctx->dc_tel.dt_time_jump_cur); TELEMETRY_LOG_ULL(recs, "DiskBlendedState", state); TELEMETRY_LOG_ULL(recs, "TagState", (inm_u64_t)ecTagStatusIOCTLFailure); TELEMETRY_LOG_UINT(recs, "MessageType", msg | TELEMETRY_LINUX_MSG_TYPE); error = telemetry_log_end(recs); out: return error; } #ifdef INM_FLT_TEST /* Need some space for prefix and suffix */ char tbuf[TELEMETRY_REC_BUF_SIZE - 1024]; #define TELEMETRY_TEST_PAGES 3 /* * This test is intended to run without any protections * enabled and on driver load */ void telemetry_log_multi_page_test(void) { inm_s32_t error = 0; inm_list_head_t rec_list; inm_list_head_t *recs = &rec_list; char x = 'a'; int i = TELEMETRY_TEST_PAGES; driver_ctx->total_prot_volumes = 1; error = telemetry_log_start(recs); if (error) goto out; while (i--) { memset(tbuf, x++, sizeof(tbuf) - 1); tbuf[sizeof(tbuf) - 1] = '\0'; TELEMETRY_LOG_STR(recs, "KEY", tbuf); } error = telemetry_log_end(recs); driver_ctx->total_prot_volumes = 0; telemetry_offload_flush_to_worker(); out: return; } #endif void telemetry_shutdown(void) { if (!tel_init) { err("Telemetry not initialized"); INM_BUG_ON(!tel_init); return; } /* * dont allow any further timers */ tel_shutdown = 1; /* * Force a timeout which should queue all pending writes * to worker 
thread */ force_timeout(&tel_timer); /* * If timer was not started while force timeout was called, * explicitly queue all pending */ telemetry_offload_flush_to_worker(); } inm_s32_t telemetry_init(void) { int error = 0; INM_INIT_LIST_HEAD(&tel_list); INM_INIT_SEM(&tel_mutex); INM_INIT_SPIN_LOCK(&tel_slock); tel_shutdown = 0; tel_wqe = alloc_work_queue_entry(INM_KM_SLEEP); if (!tel_wqe) { err("Error allocating telemetry work queue entry"); error = -ENOMEM; goto out; } if (!get_path_memory(&tel_fname)) { err("Error allocating telemetry file name buffer"); put_work_queue_entry(tel_wqe); error = -ENOMEM; goto out; } start_tel_timer(); info("involflt telemetry initialised"); tel_init = 1; #ifdef INM_FLT_TEST telemetry_log_multi_page_test(); #endif out: return error; } #else inm_s32_t telemetry_log_tag_history(change_node_t *chg_node, target_context_t *ctxt, etTagStatus status, etTagStateTriggerReason reason, etMessageType msg) { return 0; } inm_s32_t telemetry_log_tag_failure(target_context_t *ctxt, tag_telemetry_common_t *tag_common, inm_s32_t tag_error, etMessageType msg) { return 0; } inm_s32_t telemetry_log_ioctl_failure(tag_telemetry_common_t *tag_common, inm_s32_t tag_error, etMessageType msg) { return 0; } void telemetry_shutdown(void) { return; } inm_s32_t telemetry_init(void) { return 0; } #endif involflt-0.1.0/src/ioctl.h0000755000000000000000000001276014467303177014132 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/ #ifndef _INM_IOCTL_H #define _INM_IOCTL_H #include "inm_utypes.h" #include "involflt.h" #include "osdep.h" #include "driver-context.h" /* IOCTL function declarations */ inm_s32_t process_volume_stacking_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_mirror_volume_stacking_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_start_notify_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_shutdown_notify_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_start_filtering_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_stop_filtering_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_start_mirroring_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_stop_mirroring_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_volume_unstacking_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_db_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_commit_db_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_time_ioctl(void __INM_USER *); inm_s32_t process_clear_diffs_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_set_volume_flags_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_volume_flags_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_wait_for_db_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_wait_for_db_v2_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_sys_shutdown_notify_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_sys_pre_shutdown_notify_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_tag_ioctl(inm_devhandle_t *, void __INM_USER *, inm_s32_t); inm_s32_t process_get_tag_status_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_wake_all_threads_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_db_threshold(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_resync_start_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_resync_end_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_driver_version_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_shell_log_ioctl(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_at_lun_create(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_at_lun_last_write_vi(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_at_lun_last_host_io_timestamp(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_at_lun_query(inm_devhandle_t *, void __INM_USER*); inm_s32_t process_at_lun_delete(inm_devhandle_t *, void __INM_USER *); inm_s32_t process_get_global_stats_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_volume_stats_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_volume_stats_v2_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_monitoring_stats_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_protected_volume_list_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_set_attr_ioctl(inm_devhandle_t *handle, void * arg); inm_u32_t process_boottime_stacking_ioctl(inm_devhandle_t *handle, void * arg); inm_u32_t process_mirror_exception_notify_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_dmesg(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_additional_volume_stats(inm_devhandle_t *handle, void * arg); inm_s32_t process_get_volume_latency_stats(inm_devhandle_t *handle, void * arg); inm_s32_t 
process_bitmap_stats_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_set_involflt_verbosity(inm_devhandle_t *handle, void *arg); inm_s32_t process_mirror_test_heartbeat(inm_devhandle_t *handle, void *arg); inm_s32_t process_tag_volume_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_get_blk_mq_status_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_replication_state_ioctl(inm_devhandle_t *handle, void * arg); inm_s32_t process_name_mapping_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_lcw_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg); inm_s32_t process_commitdb_fail_trans_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_get_cxstatus_notify_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_wakeup_get_cxstatus_notify_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_tag_drain_notify_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_wakeup_tag_drain_notify_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_modify_persistent_device_name(inm_devhandle_t *handle, void *arg); inm_s32_t process_get_drain_state_ioctl(inm_devhandle_t *handle, void *arg); inm_s32_t process_set_drain_state_ioctl(inm_devhandle_t *handle, void *arg); #endif /* _INM_FILTER_H */ involflt-0.1.0/src/inm_locks.h0000755000000000000000000001056414467303177015006 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INM_LOCKS_H #define _INM_LOCKS_H /* #include #include #include #include #include */ typedef struct mutex inm_mutex_t; typedef spinlock_t inm_spinlock_t; typedef rwlock_t inm_rw_spinlock_t; typedef struct rw_semaphore inm_rwsem_t; typedef struct semaphore inm_sem_t; typedef atomic_t inm_atomic_t; /* mutex lock apis */ #define INM_MUTEX_INIT(lock) mutex_init((struct mutex*)lock) #define INM_MUTEX_LOCK(lock) mutex_lock((struct mutex*)lock) #define INM_MUTEX_UNLOCK(lock) mutex_unlock((struct mutex*)lock) /* spin lock api */ #define INM_INIT_SPIN_LOCK(lock) spin_lock_init((spinlock_t*)lock) #define INM_DESTROY_SPIN_LOCK(lock) #define INM_SPIN_LOCK(lock) spin_lock((spinlock_t*)lock) #define INM_SPIN_UNLOCK(lock) spin_unlock((spinlock_t*)lock) #define INM_SPIN_LOCK_BH(lock) spin_lock_bh((spinlock_t*)lock) #define INM_SPIN_UNLOCK_BH(lock) spin_unlock_bh((spinlock_t*)lock) #define INM_SPIN_LOCK_IRQSAVE(lock, flag) \ spin_lock_irqsave((spinlock_t*)lock, flag) #define INM_SPIN_UNLOCK_IRQRESTORE(lock, flag) \ spin_unlock_irqrestore((spinlock_t*)lock, flag) #define INM_IS_SPINLOCK_HELD(lock_addr) spin_is_locked(lock_addr) #define INM_SPIN_LOCK_WRAPPER(lock, flag) INM_SPIN_LOCK(lock) #define INM_SPIN_UNLOCK_WRAPPER(lock, flag) INM_SPIN_UNLOCK(lock) /* semahore apis */ #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35) #define INM_INIT_SEM(sem) init_MUTEX((struct semaphore*)sem) #define INM_INIT_SEM_LOCKED(sem) init_MUTEX_LOCKED((struct semaphore*)sem) #else #define INM_INIT_SEM(sem) sema_init((struct semaphore*)sem, 1) #define INM_INIT_SEM_LOCKED(sem) sema_init((struct semaphore*)sem, 0) #endif #define INM_DOWN(sem) down((struct semaphore*)sem) #define INM_DOWN_INTERRUPTIBLE(sem) down_interruptible((struct semaphore*)sem) #define INM_DOWN_TRYLOCK(sem) down_trylock((struct semaphore*)sem) #define INM_UP(sem) up((struct semaphore*)sem) #define INM_DESTROY_SEM(sem) /* read write semahore apis */ #define INM_RW_SEM_INIT(rw_sem) init_rwsem((struct rw_semaphore*)rw_sem) #define INM_RW_SEM_DESTROY(rw_sem) #define INM_DOWN_READ(rw_sem) down_read((struct rw_semaphore*)rw_sem) #define INM_DOWN_READ_TRYLOCK(rw_sem) down_read_trylock((struct rw_semaphore*)rw_sem) #define INM_UP_READ(rw_sem) up_read((struct rw_semaphore*)rw_sem) #define INM_DOWN_WRITE(rw_sem) down_write((struct rw_semaphore*)rw_sem) #define INM_UP_WRITE(rw_sem) up_write((struct rw_semaphore*)rw_sem) #define INM_DOWNGRADE_WRITE(rw_sem) downgrade_write((struct rw_semaphore*)rw_sem) /* atomic operations */ #define INM_ATOMIC_SET(var, val) atomic_set((atomic_t*)var, val) #define INM_ATOMIC_INC(var) atomic_inc((atomic_t*)var) #define INM_ATOMIC_DEC(var) atomic_dec((atomic_t*)var) #define INM_ATOMIC_READ(var) atomic_read((atomic_t*)var) #define INM_ATOMIC_DEC_AND_TEST(var) atomic_dec_and_test((atomic_t*)var) #define INM_VOL_LOCK(lock, flag) \ if(irqs_disabled()) { \ INM_SPIN_LOCK_IRQSAVE(lock, flag); \ } else { \ INM_SPIN_LOCK_BH(lock); \ } #define INM_VOL_UNLOCK(lock, flag) \ if(irqs_disabled()) { \ INM_SPIN_UNLOCK_IRQRESTORE(lock, flag);\ } else { \ INM_SPIN_UNLOCK_BH(lock);\ } #define INM_DESTROY_WAITQUEUE_HEAD(eventp) #define INM_ATOMIC_DEC_RET(var) atomic_dec_return(var) #endif /* _INM_LOCKS_H */ involflt-0.1.0/src/md5.c0000755000000000000000000001731714467303177013513 0ustar rootroot/* * This code implements the MD5 message-digest algorithm. * The algorithm is due to Ron Rivest. This code was * written by Colin Plumb in 1993, no copyright is claimed. * This code is in the public domain; do with it what you wish. 
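 *
 * Illustrative usage, a minimal sketch (buf/buflen are hypothetical
 * placeholders; the MD5Init/MD5Update/MD5Final signatures are as
 * defined below in this file):
 *
 *	MD5Context ctx;
 *	unsigned char digest[16];
 *
 *	MD5Init(&ctx);
 *	MD5Update(&ctx, buf, buflen);
 *	MD5Final(digest, &ctx);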
* * Equivalent code is available from RSA Data Security, Inc. * This code has been tested against that, and is equivalent, * except that you don't need to include two pages of legalese * with every copy. * * To compute the message digest of a chunk of bytes, declare an * MD5Context structure, pass it to MD5Init, call MD5Update as * needed on buffers full of bytes, and then call MD5Final, which * will fill a supplied 16-byte array with the digest. */ /* Brutally hacked by John Walker back from ANSI C to K&R (no prototypes) to maintain the tradition that Netfone will compile with Sun's original "cc". */ #include "involflt-common.h" #include "md5.h" #ifdef sgi #define HIGHFIRST #endif #ifdef sun #define HIGHFIRST #endif #ifndef HIGHFIRST #define byteReverse(buf, len) /* Nothing */ #else /* * Note: this code is harmless on little-endian machines. */ void byteReverse(unsigned char *buf, unsigned longs) { uint32 t; do { t = (uint32) ((unsigned) buf[3] << 8 | buf[2]) << 16 | ((unsigned) buf[1] << 8 | buf[0]); *(uint32 *) buf = t; buf += 4; } while (--longs); } #endif /* * Start MD5 accumulation. Set bit count to 0 and buffer to mysterious * initialization constants. */ void MD5Init(MD5Context *ctx) { ctx->buf[0] = 0x67452301; ctx->buf[1] = 0xefcdab89; ctx->buf[2] = 0x98badcfe; ctx->buf[3] = 0x10325476; ctx->bits[0] = 0; ctx->bits[1] = 0; } /* * Update context to reflect the concatenation of another buffer full * of bytes. */ void MD5Update(MD5Context *ctx, unsigned char *buf, unsigned len) { uint32 t; /* Update bitcount */ t = ctx->bits[0]; if ((ctx->bits[0] = t + ((uint32) len << 3)) < t) ctx->bits[1]++; /* Carry from low to high */ ctx->bits[1] += len >> 29; t = (t >> 3) & 0x3f; /* Bytes already in shsInfo->data */ /* Handle any leading odd-sized chunks */ if (t) { unsigned char *p = (unsigned char *) ctx->in + t; t = 64 - t; if (len < t) { memcpy_s(p, len, buf, len); return; } memcpy_s(p, t, buf, t); byteReverse(ctx->in, 16); MD5Transform(ctx->buf, (uint32 *) ctx->in); buf += t; len -= t; } /* Process data in 64-byte chunks */ while (len >= 64) { memcpy_s(ctx->in, 64, buf, 64); byteReverse(ctx->in, 16); MD5Transform(ctx->buf, (uint32 *) ctx->in); buf += 64; len -= 64; } /* Handle any remaining bytes of data. */ memcpy_s(ctx->in, len, buf, len); } /* * Final wrapup - pad to 64-byte boundary with the bit pattern * 1 0* (64-bit count of bits processed, MSB-first) */ void MD5Final(unsigned char digest[16], struct MD5Context *ctx) { unsigned count; unsigned char *p; /* Compute number of bytes mod 64 */ count = (ctx->bits[0] >> 3) & 0x3F; /* Set the first char of padding to 0x80. 
This is safe since there is always at least one byte free */ p = ctx->in + count; *p++ = 0x80; /* Bytes of padding needed to make 64 bytes */ count = 64 - 1 - count; /* Pad out to 56 mod 64 */ if (count < 8) { /* Two lots of padding: Pad the first block to 64 bytes */ INM_MEM_ZERO(p, count); byteReverse(ctx->in, 16); MD5Transform(ctx->buf, (uint32 *) ctx->in); /* Now fill the next block with 56 bytes */ INM_MEM_ZERO(ctx->in, 56); } else { /* Pad block to 56 bytes */ INM_MEM_ZERO(p, count - 8); } byteReverse(ctx->in, 14); /* Append length in bits and transform */ ((uint32 *) ctx->in)[14] = ctx->bits[0]; ((uint32 *) ctx->in)[15] = ctx->bits[1]; MD5Transform(ctx->buf, (uint32 *) ctx->in); byteReverse((unsigned char *) ctx->buf, 4); memcpy_s(digest, 16, ctx->buf, 16); INM_MEM_ZERO(ctx, sizeof(*ctx)); /* In case it's sensitive */ } /* The four core functions - F1 is optimized somewhat */ /* #define F1(x, y, z) (x & y | ~x & z) */ #define F1(x, y, z) (z ^ (x & (y ^ z))) #define F2(x, y, z) F1(z, x, y) #define F3(x, y, z) (x ^ y ^ z) #define F4(x, y, z) (y ^ (x | ~z)) /* This is the central step in the MD5 algorithm. */ #define MD5STEP(f, w, x, y, z, data, s) \ ( w += f(x, y, z) + data, w = w<<s | w>>(32-s), w += x ) /* * The core of the MD5 algorithm, this alters an existing MD5 hash to * reflect the addition of 16 longwords of new data. MD5Update blocks * the data and converts bytes into longwords for this routine. */ void MD5Transform(uint32 buf[4], uint32 in[16]) { register uint32 a, b, c, d; a = buf[0]; b = buf[1]; c = buf[2]; d = buf[3]; MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7); MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12); MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17); MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22); MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7); MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12); MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17); MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22); MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7); MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12); MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17); MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22); MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7); MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12); MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17); MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22); MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5); MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9); MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14); MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20); MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5); MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9); MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14); MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20); MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5); MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9); MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14); MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20); MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5); MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9); MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14); MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20); MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4); MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11); MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16); MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23); MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4); MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11); MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16); MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23); MD5STEP(F3, a, b, c, d, in[13] +
0x289b7ec6, 4); MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11); MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16); MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23); MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4); MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11); MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16); MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23); MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6); MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10); MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15); MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21); MD5STEP(F4, a, b, c, d, in[12] + 0x655b59c3, 6); MD5STEP(F4, d, a, b, c, in[3] + 0x8f0ccc92, 10); MD5STEP(F4, c, d, a, b, in[10] + 0xffeff47d, 15); MD5STEP(F4, b, c, d, a, in[1] + 0x85845dd1, 21); MD5STEP(F4, a, b, c, d, in[8] + 0x6fa87e4f, 6); MD5STEP(F4, d, a, b, c, in[15] + 0xfe2ce6e0, 10); MD5STEP(F4, c, d, a, b, in[6] + 0xa3014314, 15); MD5STEP(F4, b, c, d, a, in[13] + 0x4e0811a1, 21); MD5STEP(F4, a, b, c, d, in[4] + 0xf7537e82, 6); MD5STEP(F4, d, a, b, c, in[11] + 0xbd3af235, 10); MD5STEP(F4, c, d, a, b, in[2] + 0x2ad7d2bb, 15); MD5STEP(F4, b, c, d, a, in[9] + 0xeb86d391, 21); buf[0] += a; buf[1] += b; buf[2] += c; buf[3] += d; } involflt-0.1.0/src/metadata-mode.c0000755000000000000000000004175014467303177015526 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "involflt_debug.h" #include "driver-context.h" #include "db_routines.h" #include "metadata-mode.h" #include "tunable_params.h" extern driver_context_t *driver_ctx; #define COALESCE_ARRAY_LENGTH 10 inm_u64_t coalesce_array[10]; void update_coalesce_array(inm_u64_t length) { inm_u16_t i = 0; /* Just for verification purpose, we keep track of top 10 Coalesced changes. 
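 * The array is scanned in order and the first slot holding a smaller
 * value is overwritten, so it is an approximate record of the largest
 * coalesced lengths rather than a strictly sorted top-10 list.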
* None of them should exceed DriverContext.MaxCoalescedMetaDataChangeSize */ for (i = 0; i < COALESCE_ARRAY_LENGTH; i++) { if (coalesce_array[i] < length) { coalesce_array[i] = length; break; } } return; } inm_u64_t coalesce_metadata_change(target_context_t *vcptr, write_metadata_t *wmd, inm_s32_t data_source, change_node_t *chg_node, inm_u32_t *add_length) { inm_u64_t ret = 1; inm_u16_t md_idx = 0; disk_chg_t *chg = NULL; inm_u32_t sub_offset = 0; if (data_source != NODE_SRC_METADATA || (chg_node && (chg_node->changes.change_idx == 0))) { /* * Coalescing applies only to metadata mode. * For coalescing, we need at least one change in the change node. */ return ret; } md_idx = chg_node->changes.change_idx % (MAX_CHANGE_INFOS_PER_PAGE); chg = (disk_chg_t *) ((char *)chg_node->changes.cur_md_pgp + (sizeof(disk_chg_t) * (md_idx-1))); *add_length = 0; /* Check for overlapping IOs: */ if ((chg->offset <= wmd->offset)) { if ((chg->offset + chg->length) >= wmd->offset) { if ((chg->offset + chg->length) < (wmd->offset + wmd->length)) { *add_length = ((wmd->offset + wmd->length) - (chg->offset + chg->length)); } ret = 0; } } else { if ((wmd->offset + wmd->length) >= chg->offset) { if ((wmd->offset + wmd->length) < (chg->offset + chg->length)) { *add_length = chg->offset - wmd->offset; sub_offset = chg->offset - wmd->offset; } else { *add_length = (wmd->offset+wmd->length)- (chg->offset+chg->length) + (chg->offset-wmd->offset); sub_offset = chg->offset - wmd->offset; } ret = 0; } } if (!ret && ((chg->length + *add_length) <= driver_ctx->tunable_params.max_sz_md_coalesce)) { chg->length += *add_length; chg->offset -= sub_offset; update_coalesce_array(chg->length); if (vcptr->tc_optimize_performance & PERF_OPT_DEBUG_COALESCED_CHANGES) { info("Adjacent entry: offset:%llu length:%u end offset:%llu td:%u sd:%u\n", chg->offset, chg->length,(chg->offset + chg->length), chg->time_delta, chg->seqno_delta); info("Incoming entry: offset:%llu length:%u end offset:%llu", wmd->offset, wmd->length, (wmd->offset + wmd->length)); info("Final entry : offset:%llu length:%u end offset:%llu tsd:%u seqd:%u\n", chg->offset, chg->length, (chg->offset + chg->length), chg->time_delta, chg->seqno_delta); } } else { ret = 1; } return ret; } inm_u32_t split_change_into_chg_node(target_context_t *vcptr, write_metadata_t *wmd, inm_s32_t data_source, struct inm_list_head *split_chg_list_hd, inm_wdata_t *wdatap) { unsigned long max_data_sz_per_chg_node = 0; inm_u32_t nr_splits = 0; change_node_t *chg_node = NULL; inm_tsdelta_t ts_delta; inm_u64_t time = 0, nr_seq = 0; unsigned long remaining_length = wmd->length; write_metadata_t wmd_local; if (data_source != NODE_SRC_DATA && data_source != NODE_SRC_METADATA) { dbg("Invalid mode in switch case %s:%i", __FILE__, __LINE__); return 0; } if (data_source == NODE_SRC_DATA) { /* MAX_DATA_SIZE_PER_DATA_MODE_CHANGE_NODE value needs to be * stored in tunable param.
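 *
 * Illustrative example, assuming a 4MB per-node cap: a 10MB write is
 * carved into change nodes of 4MB, 4MB and 2MB by the loop below; each
 * node's length is clipped to the cap and sector-aligned via
 * SECTOR_SIZE_MASK, and all split nodes share the first node's
 * timestamp and sequence number.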
*/ max_data_sz_per_chg_node = driver_ctx->tunable_params.max_data_sz_dm_cn - \ sv_chg_sz - sv_const_sz; } else { max_data_sz_per_chg_node = driver_ctx->tunable_params.\ max_data_size_per_non_data_mode_drty_blk; } INM_MEM_ZERO(&ts_delta, sizeof(ts_delta)); INM_INIT_LIST_HEAD(split_chg_list_hd); wmd_local.offset = wmd->offset; while(remaining_length) { chg_node = inm_alloc_change_node(wdatap, INM_KM_NOSLEEP); if (!chg_node) { info("change node is null"); return 0; } if(!init_change_node(chg_node, 1, INM_KM_NOSLEEP, wdatap)) { inm_free_change_node(chg_node); return 0; } ref_chg_node(chg_node); chg_node->type = data_source; chg_node->wostate = vcptr->tc_cur_wostate; chg_node->vcptr = vcptr; chg_node->transaction_id = 0; inm_list_add_tail(&chg_node->next, split_chg_list_hd); INM_BUG_ON(remaining_length & ~SECTOR_SIZE_MASK); wmd_local.length = min(max_data_sz_per_chg_node, remaining_length); wmd_local.length = wmd_local.length & SECTOR_SIZE_MASK; update_change_node(chg_node, &wmd_local, &ts_delta); chg_node->flags |= KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; /* copy the time stamp related info from the first change node * as it should * be the same for all split ones. */ if (!nr_splits) { time = chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; nr_seq = chg_node->changes.start_ts.ullSequenceNumber; } else { chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601 = time; chg_node->changes.start_ts.ullSequenceNumber = nr_seq; chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601 = time; chg_node->changes.end_ts.ullSequenceNumber = nr_seq; dbg("time and seq # for chg node = %p \n", chg_node); } nr_splits++; chg_node->seq_id_for_split_io = nr_splits; wmd_local.offset += wmd_local.length; remaining_length -= wmd_local.length; } if (!nr_splits) return 0; chg_node = inm_list_entry(split_chg_list_hd->next, change_node_t, next); chg_node->flags |= KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE; chg_node->flags &= ~KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; chg_node = inm_list_entry(split_chg_list_hd->prev, change_node_t, next); chg_node->flags |= KDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE; chg_node->flags &= ~KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE; return nr_splits; } inm_s32_t add_metadata(target_context_t *vcptr, change_node_t *chg_node, write_metadata_t *wmd, inm_s32_t data_source, inm_wdata_t *wdatap) { unsigned long max_data_sz_per_chg_node = driver_ctx->tunable_params.max_data_size_per_non_data_mode_drty_blk; inm_s64_t avail_space = 0; inm_u32_t nr_splits = 0; inm_u32_t add_length = 0; inm_tsdelta_t ts_delta; struct inm_list_head split_chg_list_hd; /* Allocate change nodes for split io */ if (max_data_sz_per_chg_node < wmd->length) { #ifdef INM_AIX dbg("I/O is greater than %d in metadata mode, actual I/O size is %d", max_data_sz_per_chg_node, wmd->length); queue_worker_routine_for_set_volume_out_of_sync(vcptr, ERROR_TO_REG_IO_SIZE_64MB_METADATA, 0); return 0; #endif if (vcptr->tc_cur_node && (vcptr->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO)) { INM_BUG_ON(!inm_list_empty(&vcptr->tc_cur_node->nwo_dmode_next)); if (vcptr->tc_cur_node->type == NODE_SRC_DATA && vcptr->tc_cur_node->wostate != ecWriteOrderStateData) { close_change_node(vcptr->tc_cur_node, IN_IO_PATH); inm_list_add_tail(&vcptr->tc_cur_node->nwo_dmode_next, &vcptr->tc_nwo_dmode_list); if (vcptr->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending chg:%p to tgt_ctxt:%p next:%p prev:%p mode:%d", vcptr->tc_cur_node,vcptr, vcptr->tc_cur_node->nwo_dmode_next.next, 
vcptr->tc_cur_node->nwo_dmode_next.prev, vcptr->tc_cur_node->type); } } } vcptr->tc_cur_node = NULL; nr_splits = split_change_into_chg_node(vcptr, wmd, data_source, &split_chg_list_hd, wdatap); if (nr_splits) { inm_list_splice_at_tail(&split_chg_list_hd, &vcptr->tc_node_head); vcptr->tc_pending_changes += nr_splits; vcptr->tc_pending_md_changes += nr_splits; vcptr->tc_bytes_pending_md_changes += wmd->length; INM_BUG_ON(vcptr->tc_pending_changes < 0 ); vcptr->tc_bytes_pending_changes += wmd->length; vcptr->tc_cnode_pgs += nr_splits; add_changes_to_pending_changes(vcptr, vcptr->tc_cur_wostate, nr_splits); } return nr_splits; } chg_node = get_change_node_to_update(vcptr, wdatap, &ts_delta); if (chg_node && chg_node->changes.change_idx < (MAX_CHANGE_INFOS_PER_PAGE)) avail_space = max_data_sz_per_chg_node - chg_node->changes.bytes_changes; if (avail_space < wmd->length) { vcptr->tc_cur_node = NULL; chg_node = get_change_node_to_update(vcptr, wdatap, &ts_delta); if (!chg_node) { info("change node is null"); return -ENOMEM; } } nr_splits++; /* Check if we can coalesce the metadata change with previous one */ if ((vcptr->tc_optimize_performance & PERF_OPT_METADATA_COALESCE) && !coalesce_metadata_change(vcptr, wmd, data_source, chg_node, &add_length)) { vcptr->tc_bytes_pending_changes += add_length; chg_node->changes.bytes_changes += add_length; vcptr->tc_bytes_pending_md_changes += add_length; } else { update_change_node(chg_node, wmd, &ts_delta); vcptr->tc_pending_changes++; if (chg_node->type == NODE_SRC_METADATA) { vcptr->tc_pending_md_changes++; vcptr->tc_bytes_pending_md_changes += wmd->length; } INM_BUG_ON(vcptr->tc_pending_changes < 0 ); vcptr->tc_bytes_pending_changes += wmd->length; add_changes_to_pending_changes(vcptr, chg_node->wostate, nr_splits); } return nr_splits; } inm_s32_t save_data_in_metadata_mode(target_context_t *tgt_ctxt, write_metadata_t *wmd, inm_wdata_t *wdatap) { change_node_t *change_node = NULL; inm_s32_t _rc = 0; /* check for valid inputs */ if (!tgt_ctxt || !wmd) return -EINVAL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered tcPages:%d dc_cur_res_pages:%d ", tgt_ctxt->tc_stats.num_pages_allocated, driver_ctx->dc_cur_res_pages); } #define vcptr tgt_ctxt /* initialize the target context mode to bitmap, if it is in uninitialized * state */ if (!tgt_ctxt->tc_cur_mode) { tgt_ctxt->tc_cur_mode = FLT_MODE_METADATA; tgt_ctxt->tc_stats.st_mode_switch_time = INM_GET_CURR_TIME_IN_SEC; } if (!tgt_ctxt->tc_cur_wostate) { tgt_ctxt->tc_cur_wostate = ecWriteOrderStateBitmap; tgt_ctxt->tc_stats.st_wostate_switch_time = INM_GET_CURR_TIME_IN_SEC; } if (add_metadata(tgt_ctxt, change_node, wmd, NODE_SRC_METADATA, wdatap) < 0) { err("Memory pool : out of memory %s\n", tgt_ctxt->tc_guid); free_changenode_list(tgt_ctxt, ecNonPagedPoolLimitHitMDMode); /* send notification to the service process */ /* No need to clear dbnotify event here, since the service * waits for 30 secs, the control returns with failure */ queue_worker_routine_for_set_volume_out_of_sync(vcptr, ERROR_TO_REG_OUT_OF_MEMORY_FOR_DIRTY_BLOCKS, -ENOMEM); err("Allocation of change node failed \n"); } /* check the target context counters against with WATERMARK thresholds * if it meets the requirements then wake up service thread */ /* * Switch to bitmap after evaluating following set of conditions * Condition:1 Valid HWM value is set * Condition:2 Total changes in metadata mode have crossed HWM * Condition:3 Service is not shutdown */ if 
((driver_ctx->tunable_params.db_high_water_marks[driver_ctx->service_state]) && (tgt_ctxt->tc_pending_md_changes >= (driver_ctx->tunable_params.db_high_water_marks[driver_ctx->service_state])) && (!driver_ctx->sys_shutdown)) { /*wakeup service thread*/ if (!(vcptr->tc_flags & VCF_VOLUME_STACKED_PARTIALLY) && !vcptr->tc_bp->bmap_busy_wait) { INM_ATOMIC_INC(&driver_ctx->service_thread.wakeup_event_raised); INM_WAKEUP_INTERRUPTIBLE(&driver_ctx->service_thread.wakeup_event); INM_COMPLETE(&driver_ctx->service_thread._new_event_completion); vcptr->tc_bp->bmap_busy_wait = TRUE; } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving tcPages:%d dc_cur_res_pages:%d ", tgt_ctxt->tc_stats.num_pages_allocated, driver_ctx->dc_cur_res_pages); } #undef vcptr return _rc; } inm_s32_t add_tag_in_non_stream_mode(tag_volinfo_t *tag_volinfop, tag_info_t *tag_buf, inm_s32_t num_tags, tag_guid_t *tag_guid, inm_s32_t index, int commit_pending, tag_history_t *hist) { target_context_t *ctxt = tag_volinfop->ctxt; inm_s32_t status = 0, padtaglen = 0, hdrlen = 0, idx = 0; change_node_t *chg_node = NULL; char *pg = NULL; inm_s32_t tag_idx = 0; tag_info_t *tag_ptr = tag_buf; #ifdef INM_AIX inm_wdata_t wdata; #endif TAG_COMMIT_STATUS *tag_status = NULL; /* Whether to freeze or not should be an option in future. No problem * in ensuring fs consistency also. So, the current approach also * seem to be fine. */ dbg("Issuing tag in non-stream mode"); dbg("Num inflight IOs while taking tag %d\n", INM_ATOMIC_READ(&ctxt->tc_nr_in_flight_ios)); #ifdef INM_AIX INM_MEM_ZERO(&wdata, sizeof(inm_wdata_t)); wdata.wd_chg_node = tag_volinfop->chg_node; wdata.wd_meta_page = tag_volinfop->meta_page; chg_node = get_change_node_for_usertag(ctxt, &wdata, commit_pending); tag_volinfop->chg_node = wdata.wd_chg_node; tag_volinfop->meta_page = wdata.wd_meta_page; #else chg_node = get_change_node_for_usertag(ctxt, NULL, commit_pending); #endif if(!chg_node) { status = -ENOMEM; err("Failed to get change node for adding tag"); goto unlock_exit; } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if (INM_ATOMIC_READ(&driver_ctx->is_iobarrier_on)) { memcpy_s(&chg_node->changes.start_ts, sizeof(TIME_STAMP_TAG_V2), &driver_ctx->dc_crash_tag_timestamps, sizeof(TIME_STAMP_TAG_V2)); memcpy_s(&chg_node->changes.end_ts, sizeof(TIME_STAMP_TAG_V2), &chg_node->changes.start_ts, sizeof(TIME_STAMP_TAG_V2)); } #endif INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (driver_ctx->dc_tag_drain_notify_guid && !INM_MEM_CMP(driver_ctx->dc_cp_guid, driver_ctx->dc_tag_drain_notify_guid, GUID_LEN)) { if (driver_ctx->dc_tag_commit_notify_flag & TAG_COMMIT_NOTIFY_BLOCK_DRAIN_FLAG) { chg_node->flags |= CHANGE_NODE_BLOCK_DRAIN_TAG; } else { chg_node->flags |= CHANGE_NODE_FAILBACK_TAG; } tag_status = ctxt->tc_tag_commit_status; } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); /* * set write order state to data here, because tag is not a change */ chg_node->wostate = ecWriteOrderStateData; pg = (char *)chg_node->changes.cur_md_pgp; while(tag_idx < num_tags) { padtaglen = ALIGN((tag_ptr->tag_len + sizeof(unsigned short)), sizeof(inm_u32_t)); hdrlen = (padtaglen + sizeof(STREAM_REC_HDR_4B)) < 0xFF ? 
sizeof(STREAM_REC_HDR_4B) : sizeof(STREAM_REC_HDR_8B); if((idx + hdrlen + padtaglen) > INM_PAGESZ) { err("Exceeded Maximum tag size per change node"); status = -ENOMEM; inm_list_del(&chg_node->next); deref_chg_node(chg_node); goto unlock_exit; } FILL_STREAM_HEADER((pg + idx), STREAM_REC_TYPE_USER_DEFINED_TAG, (hdrlen + padtaglen)); idx += hdrlen; *(unsigned short *)(pg + idx) = tag_ptr->tag_len; idx += sizeof(unsigned short); if (memcpy_s((pg + idx), tag_ptr->tag_len, tag_ptr->tag_name, tag_ptr->tag_len)) { status = INM_EFAULT; inm_list_del(&chg_node->next); deref_chg_node(chg_node); goto unlock_exit; } idx -= sizeof(unsigned short); idx += padtaglen; tag_idx++; tag_ptr++; } /* Append end of tag list stream. */ FILL_STREAM_HEADER_4B((pg + idx), STREAM_REC_TYPE_END_OF_TAG_LIST, sizeof(STREAM_REC_HDR_4B)); if(tag_guid){ tag_guid->status[index] = STATUS_PENDING; chg_node->tag_status_idx = index; } if (tag_status) { info("The failback tag is inserted for disk %s, dirty block = %p", ctxt->tc_guid, chg_node); set_tag_drain_notify_status(ctxt, TAG_STATUS_INSERTED, DEVICE_STATUS_SUCCESS); } chg_node->tag_guid = tag_guid; chg_node->cn_hist = hist; dbg("Tag Issued Successfully to volume %s", ctxt->tc_guid); goto out; unlock_exit: if(tag_guid) tag_guid->status[index] = STATUS_FAILURE; if (tag_status) set_tag_drain_notify_status(ctxt, TAG_STATUS_INSERTION_FAILED, DEVICE_STATUS_SUCCESS); out: return status; } involflt-0.1.0/src/filestream.c0000755000000000000000000002002314467303177015145 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include #include #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "file-io.h" #include "involflt_debug.h" #include "db_routines.h" #include "data-file-mode.h" #include "tunable_params.h" #include "errlog.h" #include "filestream_raw.h" extern driver_context_t *driver_ctx; fstream_t *fstream_ctr(void *ctx) { fstream_t *fs = NULL; fs = (fstream_t *)INM_KMALLOC(sizeof(*fs), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!fs) return NULL; INM_MEM_ZERO(fs, sizeof(*fs)); fs->context = ctx; INM_ATOMIC_SET(&fs->refcnt, 1); return fs; } void fstream_dtr(fstream_t *fs) { if(!fs) return; kfree(fs); fs = NULL; return; } fstream_t *fstream_get(fstream_t *fs) { INM_ATOMIC_INC(&fs->refcnt); return fs; } void fstream_put(fstream_t *fs) { if (INM_ATOMIC_DEC_AND_TEST(&fs->refcnt)) fstream_dtr(fs); } inm_s32_t fstream_open(fstream_t *fs, char *path, inm_s32_t flags, inm_s32_t mode) { struct file *fp = NULL; inm_s32_t _rc = 0; mm_segment_t _mfs; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } _mfs = get_fs(); set_fs(KERNEL_DS); INM_BUG_ON(!path); if (path[0] == '/') { //open the file with full path fp = filp_open(path, flags, mode); } if (!fp) { _rc = -1; goto out; } if (IS_ERR(fp)) { _rc = PTR_ERR(fp); set_fs(_mfs); return _rc; } if (!S_ISREG((INM_HDL_TO_INODE(fp))->i_mode)) { _rc = -EACCES; goto out; } #if LINUX_VERSION_CODE < KERNEL_VERSION(3,7,0) if (!fp->f_op->write || !fp->f_op->read) { _rc = -EIO; goto out; } #endif inm_prepare_tohandle_recursive_writes(INM_HDL_TO_INODE(fp)); fs->filp = fp; fs->inode = INM_HDL_TO_INODE(fp); _rc = 0; goto success; out: if (fp) filp_close(fp, current->files); success: set_fs(_mfs); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return _rc; } inm_s32_t fstream_close(fstream_t *fs) { struct file *fp = (struct file *)fs->filp; struct inode *_tinode = (struct inode *)fs->inode; inm_s32_t _rc = 0; mm_segment_t _mfs; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (fs->fs_raw_hdl) return fstream_raw_close(fs->fs_raw_hdl); if (!fp) return (1); fs->filp = NULL; fs->inode = NULL; _mfs = get_fs(); set_fs(KERNEL_DS); inm_restore_org_addr_space_ops(_tinode); filp_close(fp, NULL); set_fs(_mfs); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return _rc; } inm_s32_t fstream_get_fsize(fstream_t *fs) { struct file *fp = (struct file *)fs->filp; if (fs->fs_raw_hdl) return fstream_raw_get_fsize(fs->fs_raw_hdl); if (!fp) return -EINVAL; return (INM_HDL_TO_INODE(fp))->i_size; } inm_s32_t fstream_open_or_create(fstream_t *fs, char *path, inm_s32_t *file_created, inm_u32_t bmap_sz) { inm_s32_t ret = 0, c_ret = 0; inm_s32_t oflags = (O_RDWR | O_EXCL | O_LARGEFILE | O_NOATIME); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } ret = fstream_open(fs, path, oflags, 0644); if (ret) { // may be file does not exist, create it oflags |= O_CREAT; c_ret = fstream_open(fs, path, oflags, 0644); if (c_ret) { if (c_ret == -EEXIST) return ret; else return c_ret; } else { ret = c_ret; } *file_created = 1; } else { if (fstream_get_fsize(fs) < bmap_sz) { //bmap file corrupted *file_created = 
2;
		}
	}

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving");
	}
	return ret;
}

void fstream_enable_buffered_io(fstream_t *fs)
{
	fs->fs_flags |= FS_FLAGS_BUFIO;
}

void fstream_disable_buffered_io(fstream_t *fs)
{
	fs->fs_flags &= ~FS_FLAGS_BUFIO;
}

void fstream_sync_range(fstream_t *fs, inm_u64_t offset, inm_u32_t size)
{
	struct file *fp = (struct file *)fs->filp;

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,32)
	sync_page_range(fp->f_dentry->d_inode, fp->f_mapping, offset, size);
#elif LINUX_VERSION_CODE == KERNEL_VERSION(2,6,32)
	filemap_write_and_wait_range(fp->f_mapping, offset,
						offset + size - 1);
#else
	vfs_fsync_range(fp, offset, offset + size - 1, 0);
#endif
}

void fstream_sync(fstream_t *fs)
{
	inm_s32_t size = fstream_get_fsize(fs);

	if (size <= 0) {
		err("Invalid bitmap size (%d)", size);
		INM_BUG_ON(size < 0);
	} else {
		fstream_sync_range(fs, 0, (inm_u32_t)size);
	}
}

inm_s32_t fstream_write(fstream_t *fs, char *buffer, inm_u32_t size,
						inm_u64_t offset)
{
	struct file *fp = (struct file *)fs->filp;
	ssize_t nr = 0;
	mm_segment_t _mfs;
	loff_t pos = (loff_t) offset;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (fs->fs_raw_hdl)
		return fstream_raw_write(fs->fs_raw_hdl, buffer, size,
								offset);

	if (!fp)
		return 1;

	_mfs = get_fs();
	set_fs(KERNEL_DS);

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)
	nr = kernel_write(fp, buffer, (ssize_t)size, &pos);
#else
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,7,0)
	nr = vfs_write(fp, buffer, (ssize_t)size, &pos);
#else
	nr = fp->f_op->write(fp, buffer, (ssize_t)size, &pos);
#endif
#endif
	if (nr > 0 && !(fs->fs_flags & FS_FLAGS_BUFIO))
		fstream_sync_range(fs, offset, nr);

	set_fs(_mfs);

	if (nr != (ssize_t)size) {
		info("Write requested for %u bytes wrote %d bytes \n",
						size, (int)nr);
	}

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving with status = %d, num of bytes written"
			" into bmap = %d", (nr == size), (int)nr);
	}
	return nr == size ? 0 : nr;
}

inm_s32_t fstream_read(fstream_t *fs, char *buffer, inm_u32_t size,
						inm_u64_t offset)
{
	struct file *fp = (struct file *)fs->filp;
	ssize_t nr = 0;
	mm_segment_t _mfs;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (fs->fs_raw_hdl)
		return fstream_raw_read(fs->fs_raw_hdl, buffer, size,
								offset);

	if (!fp)
		return 1;

	_mfs = get_fs();
	set_fs(KERNEL_DS);

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)
	nr = kernel_read(fp, buffer, (ssize_t)size, &offset);
#else
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,7,0)
	nr = vfs_read(fp, buffer, (ssize_t)size, &offset);
#else
	nr = fp->f_op->read(fp, buffer, (ssize_t)size, &offset);
#endif
#endif
	if (nr < 0) {
		err("Unable to read the data 0x%x\n", size);
		set_fs(_mfs);
		return 1;
	}

	set_fs(_mfs);

	if (nr != (ssize_t)size) {
		info("Read requested for %u bytes read %d bytes \n",
						size, (int)nr);
	}

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving with status %d, nr bytes read from bmap = %d",
						(nr == size), (int)nr);
	}
	return nr == size ?
0 : 1;
}

inm_s32_t fstream_map_file_blocks(fstream_t *fs, inm_u64_t offset,
				inm_u32_t len, fstream_raw_hdl_t **hdl)
{
	return fstream_raw_open(fs->filp, offset, len, hdl);
}

void fstream_switch_to_raw_mode(fstream_t *fs, fstream_raw_hdl_t *raw_hdl)
{
	dbg("Switching filestream to raw mode");
	fstream_close(fs);
	fs->fs_raw_hdl = raw_hdl;
}
involflt-0.1.0/src/verifier_user.h0000755000000000000000000000225614467303177015700 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#ifndef _VUSER
#define _VUSER

#include
#include
#include
#include
#include
#include
#include

#define INM_LINUX

typedef int inm_s32_t;
typedef unsigned int inm_u32_t;
typedef long long inm_s64_t;
typedef unsigned long long inm_u64_t;

static inline int inm_is_little_endian(void)
{
	return 1;
}

#endif
involflt-0.1.0/src/filestream_raw.h0000755000000000000000000000521314467303177016027 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/ #ifndef _FSTREAM_RAW #define _FSTREAM_RAW #include "osdep.h" #include "flt_bio.h" typedef struct fr_block { inm_bio_dev_t *fb_disk; inm_u64_t fb_offset; } fr_block_t; #define FSRAW_BLK_PER_PAGE (PAGE_SIZE/sizeof(fr_block_t)) #define FSRAW_BLK_PER_PAGE_SHIFT 8 #define FSRAW_BLK_MAP_OFF(hdl,offset) /* Offset to block */ \ ((offset - hdl->frh_offset) >> hdl->frh_bshift) #define FSRAW_BLK_PAGE(hdl, offset) /* Offset to blkmap page */ \ (FSRAW_BLK_MAP_OFF(hdl, offset) >> FSRAW_BLK_PER_PAGE_SHIFT) #define FSRAW_BLK_IDX(hdl, offset) /* Offset to fr_block idx */ \ (FSRAW_BLK_MAP_OFF(hdl, offset) & ~(UINT_MAX << FSRAW_BLK_PER_PAGE_SHIFT)) typedef struct fstream_raw_hdl { inm_spinlock_t frh_slock; inm_u64_t frh_fsize; /* File Size */ inm_u32_t frh_bsize; /* Block Size */ inm_u32_t frh_bshift; /* Block Size Shift */ inm_u64_t frh_offset; /* Offset Mapped */ inm_u32_t frh_len; /* Length Mapped */ inm_u32_t frh_alen; /* Length aligned to block */ inm_u32_t frh_nblks; /* Num Blocks */ inm_u32_t frh_npages; /* Num Mapping Pages */ fr_block_t **frh_blocks; /* Mapping */ } fstream_raw_hdl_t; void fstream_raw_map_bio(struct bio *); inm_s32_t fstream_raw_open(char *, inm_u64_t, inm_u32_t, fstream_raw_hdl_t **); inm_s32_t fstream_raw_get_fsize(fstream_raw_hdl_t *); inm_s32_t fstream_raw_close(fstream_raw_hdl_t *); inm_s32_t fstream_raw_read(fstream_raw_hdl_t *, char *, inm_u32_t, inm_u64_t); inm_s32_t fstream_raw_write(fstream_raw_hdl_t *, char *, inm_u32_t, inm_u64_t); #endif involflt-0.1.0/src/telemetry-types.h0000755000000000000000000003333114467303177016201 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _TEL_TYPES_H #define _TEL_TYPES_H #define TELEMETRY_SEC_TO_100NSEC(tsec) (tsec * HUNDREDS_OF_NANOSEC_IN_SECOND) #define TELEMETRY_MSEC_TO_100NSEC(tsec) (tsec * 10000ULL) #define TELEMETRY_WTIME_OFF 116444736000000000ULL #define TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC(t) \ ((t != 0) ? 
((t) + TELEMETRY_WTIME_OFF) : (t))

#define TELEMETRY_FMT1601_TIMESTAMP_FROM_SEC(tsec) \
	TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC(TELEMETRY_SEC_TO_100NSEC(tsec))

#define TELEMETRY_FILE_REFRESH_INTERVAL 300000 /* in msecs = 5 mins */
#define TELEMETRY_ACCEPTABLE_TIME_JUMP 180000  /* in msecs = 3 mins */
#define TELEMETRY_ACCEPTABLE_TIME_JUMP_THRESHOLD \
	TELEMETRY_MSEC_TO_100NSEC(TELEMETRY_ACCEPTABLE_TIME_JUMP)

typedef enum _etTagStateTriggerReason {
	ecNotApplicable = 1,
	ecBitmapWrite,                /* Dropped due to bitmap write */
	ecFilteringStopped,           /* Dropped as stop filtering issued */
	ecClearDiffs,                 /* Dropped as clear diffs issued */
	ecNonPagedPoolLimitHitMDMode, /* Dropped as no metadata pages */
	ecChangesQueuedToTempQueue,   /* Not Used */
	ecRevokeTimeOut,              /* Tag revoked on timeout */
	ecRevokeCancelIrpRoutine,     /* Tag revoked on user request */
	ecRevokeCommitIOCTL,          /* Not Used */
	ecRevokeLocalCrash,           /* Not Used */
	ecRevokeDistrubutedCrashCleanupTag, /* Not Used */
	ecRevokeDistrubutedCrashInsertTag,  /* Not Used */
	ecRevokeDistrubutedCrashReleaseTag, /* Not Used */
	ecRevokeAppTagInsertIOCTL,    /* Not Used */
	ecSplitIOFailed,              /* LIN: splitting large io failed */
	ecOrphan,                     /* LIN: Orphan dropped on drainer exit */
} etTagStateTriggerReason;

typedef enum _etTagType {
	ecTagNone = 0,
	ecTagLocalCrash,
	ecTagDistributedCrash,
	ecTagLocalApp,
	ecTagDistributedApp,
} etTagType;

typedef enum _etTagStatus {
	ecTagStatusCommited = 0,         /* Successful state */
	ecTagStatusPending = 1,          /* Initialized */
	ecTagStatusDeleted = 2,          /* Deleted due to StopFlt or ClrDif */
	ecTagStatusDropped = 3,          /* Dropped due to Bitmap writes */
	ecTagStatusInvalidGUID = 4,      /* Invalid GUID */
	ecTagStatusFilteringStopped = 5, /* Device filtering is stopped */
	ecTagStatusUnattempted = 6,      /* Tag unattempted for any reason */
	ecTagStatusFailure = 7,          /* Any error e.g.
mem alloc failure */ ecTagStatusRevoked = 8, /* Multi phase revoke */ ecTagStatusInsertFailure = 9, /* Tag Insert Failure */ ecTagStatusInsertSuccess = 10, /* Tag Insert Success */ ecTagStatusIOCTLFailure = 11, /* Tag IOCTL failure */ ecTagStatusTagCommitDBSuccess = 12, /* Tag committed as part of drain */ ecTagStatusTagNonDataWOS = 13, ecTagStatusTagDataWOS = 14, ecTagStatusMaxEnum } etTagStatus; #define TELEMETRY_LINUX_MSG_TYPE 0x60000000 typedef enum _etMessageType { ecMsgUninitialized = 1, ecMsgCCInputBufferMismatch, ecMsgCCInvalidTagInputBuffer, ecMsgCompareExchangeTagStateFailure, ecMsgValidateAndBarrierFailure, ecMsgTagVolumeInSequenceFailure, ecMsgInputZeroTimeOut, ecMsgAllFilteredDiskFlagNotSet, ecMsgInFlightIO, ecMsgTagIrpCancelled, ecMsgInValidTagProtocolState, ecMsgTagInsertFailure = 15, ecMsgTagCommitDBSuccess, ecMsgTagRevoked, ecMsgTagDropped, ecMsgPreCheckFailure, ecMsgAppInputBufferMismatch, ecMsgAppInvalidTagInputBuffer, ecMsgAppUnexpectedFlags, ecMsgAppInvalidInputDiskNum, ecMsgAppOutputBufferTooSmall, ecMsgAppTagInfoMemAllocFailure, ecMsgAppDeviceNotFound, ecMsgStatusNoMemory, ecMsgStatusUnexpectedPrecheckFlags, } etMessageType; typedef enum _etWOSChangeReason { ecWOSChangeReasonUnInitialized = 0, eCWOSChangeReasonServiceShutdown = 1, ecWOSChangeReasonBitmapChanges = 2, ecWOSChangeReasonBitmapNotOpen = 3, ecWOSChangeReasonBitmapState = 4, ecWOSChangeReasonCaptureModeMD = 5, ecWOSChangeReasonMDChanges = 6, ecWOSChangeReasonDChanges = 7, ecWOSChangeReasonDontPageFault = 8, ecWOSChangeReasonPageFileMissed = 9, ecWOSChangeReasonExplicitNonWO = 10, ecWOSChangeReasonUnsupportedBIO = 11, } etWOSChangeReason; typedef enum _etTagProtocolPhase { HoldWrites = 1, InsertTag, ReleaseWrites, CommitTag } etTagProtocolPhase; /* Used by driver and target context to represent a blended state */ #define DBS_DIFF_SYNC_THROTTLE 0x0000000000000001 /* Disk */ #define DBS_SERVICE_STOPPED 0x0000000000000002 /* Driver */ #define DBS_S2_STOPPED 0x0000000000000004 /* Driver */ #define DBS_DRIVER_NOREBOOT_MODE 0x0000000000000008 /* Driver */ #define DBS_DRIVER_RESYNC_REQUIRED 0x0000000000000010 /* Disk */ #define DBS_FILTERING_STOPPED_BY_USER 0x0000000000000020 /* Disk */ #define DBS_FILTERING_STOPPED_BY_KERNEL 0x0000000000000040 /* Disk */ #define DBS_FILTERING_STOPPED 0x0000000000000080 /* Disk */ #define DBS_CLEAR_DIFFERENTIALS 0x0000000000000100 /* Disk */ #define DBS_BITMAP_WRITE 0x0000000000000200 /* Disk */ #define DBS_NPPOOL_LIMIT_HIT_MD_MODE 0x0000000000000400 /* NA */ #define DBS_MAX_NONPAGED_POOL_LIMIT_HIT 0x0000000000000800 /* NA */ #define DBS_LOW_MEMORY_CONDITION 0x0000000000001000 /* NA */ #define DBS_SPLIT_IO_FAILED 0x0000000000002000 /* NEW */ #define DBS_ORPHAN 0x0000000000004000 /* NEW */ // Reserved Fields #define DBS_TAG_REVOKE_TIMEOUT 0x0000000000010000 #define DBS_TAG_REVOKE_CANCELIRP 0x0000000000020000 #define DBS_TAG_REVOKE_COMMITIOCTL 0x0000000000040000 #define DBS_TAG_REVOKE_LOCALCC 0x0000000000080000 #define DBS_TAG_REVOKE_DCCLEANUPTAG 0x0000000000100000 #define DBS_TAG_REVOKE_DCINSERTIOCTL 0x0000000000200000 #define DBS_TAG_REVOKE_DCRELEASEIOCTL 0x0000000000400000 #define DBS_TAG_REVOKE_APPINSERTIOCTL 0x0000000000800000 #define TEL_FLAGS_SET_BY_DRIVER 0x0400000000000000 /* * Global driver telemetry data: DRIVER_TELEMETRY */ typedef struct driver_telemetry { inm_spinlock_t dt_dbs_slock; /* Disk Blended State lock */ inm_u64_t dt_blend; /* Blended State */ inm_u64_t dt_drv_load_time; /* Driver Load Time */ inm_u64_t dt_svagent_start_time; inm_u64_t dt_svagent_stop_time; 
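	/*
	 * The dt_*_time fields carry absolute timestamps, presumably in
	 * the driver's standard 100ns units (cf. TELEMETRY_SEC_TO_100NSEC
	 * above); the TELEMETRY_FMT1601_* macros rebase such values to the
	 * Windows FILETIME epoch by adding TELEMETRY_WTIME_OFF, the
	 * 1601->1970 delta expressed in 100ns units.
	 */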
	inm_u64_t dt_s2_start_time;
	inm_u64_t dt_s2_stop_time;
	inm_u64_t dt_last_tag_request_time; /* Last Crash* tag req time */
	int dt_persistent_dir_created;
	inm_u64_t dt_timestamp_in_persistent_store;
	inm_u64_t dt_seqno_in_persistent_store;
	inm_u64_t dt_unstack_all_time;
	inm_u64_t dt_time_jump_exp;         /* Exp time in case of jump */
	inm_u64_t dt_time_jump_cur;         /* Act time in case of jump */
} driver_telemetry_t;

/*
 * Stats when moving to non write order state: NON_WOSTATE_STATS
 */
typedef struct non_wo_stats {
	etWriteOrderState nws_old_state; /* Old State */
	etWriteOrderState nws_new_state; /* New State */
	inm_u64_t nws_change_time;    /* Transition time */
	inm_u64_t nws_meta_pending;   /* Metadata changes pending */
	inm_u64_t nws_bmap_pending;   /* Bitmap changes pending */
	inm_u64_t nws_data_pending;   /* Data changes pending */
	inm_u32_t nws_nwo_secs;       /* Time in old state */
	etWOSChangeReason nws_reason; /* Reason for changes */
	inm_u32_t nws_mem_alloc;      /* Pages allocated */
	inm_u32_t nws_mem_reserved;   /* Pages reserved */
	inm_u32_t nws_mem_free;       /* Remaining unreserved pool*/
	inm_u32_t nws_free_cn;        /* Free change nodes == 0 */
	inm_u32_t nws_used_cn;        /* Allocated change nodes */
	inm_u32_t nws_max_used_cn;    /* Max change nodes == 0 */
	inm_u64_t nws_blend;          /* Disk blended state */
	inm_u32_t nws_np_alloc;       /* NA */
	inm_u64_t nws_np_limit_time;  /* NA */
	inm_u32_t nws_np_alloc_fail;  /* NA */
} non_wo_stats_t;

/*
 * Replication stats snapshot at tag insert: TAG_DISK_REPLICATION_STATS
 */
typedef struct tgt_stats {
	inm_u64_t ts_pending;         /* Pending Changes */
	inm_u64_t ts_tracked_bytes;   /* Bytes tracked */
	inm_u64_t ts_getdb;           /* Dirty Blocks drained */
	inm_u64_t ts_drained_bytes;   /* Bytes drained */
	inm_u64_t ts_commitdb;        /* Dirty blocks committed */
	inm_u64_t ts_revertdb;        /* Dirty blocks reverted */
	inm_u64_t ts_nwlb1;           /* Network latency <=150ms */
	inm_u64_t ts_nwlb2;           /* Network latency <=250ms */
	inm_u64_t ts_nwlb3;           /* Network latency <=500ms */
	inm_u64_t ts_nwlb4;           /* Network latency <=1sec */
	inm_u64_t ts_nwlb5;           /* Network latency > 1sec */
	inm_u64_t ts_commitdb_failed; /* Commit DB failed */
} tgt_stats_t;

/*
 * Per disk telemetry data: DISK_TELEMETRY
 */
#define TELEMETRY_THROTTLE_IN_PROGRESS 0xffffffffffffffffULL

typedef struct target_telemetry {
	inm_u64_t tt_getdb;              /* Dirty blocks drained */
	inm_u64_t tt_commitdb;           /* Dirty blocks committed */
	inm_u64_t tt_commitdb_failed;    /* Dirty blocks commit fail */
	inm_u64_t tt_revertdb;           /* Resent pending confirm */
	inm_u64_t tt_user_stop_flt_time; /* Stop flt by user time */
	inm_u64_t tt_prev_tag_time;      /* Last succ/fail tag time */
	inm_u64_t tt_prev_succ_tag_time; /* Last success tag time */
	inm_u64_t tt_blend;              /* Disk blended state DBS_* */
	tgt_stats_t tt_prev_succ_stats;  /* Previous succ tag stats */
	tgt_stats_t tt_prev_stats;       /* Previous tag stats */
	non_wo_stats_t tt_nwo;           /* Non write order stats */
	inm_u64_t tt_resync_start;       /* Resync start time */
	inm_u64_t tt_resync_end;         /* Resync end time */
	inm_u64_t tt_getdb_time;         /* Last get_db time */
	inm_u64_t tt_commitdb_time;      /* Last commit_db time */
	inm_u64_t tt_create_time;        /* tgt_ctxt create time */
	inm_u64_t tt_prev_ts;            /* Prev committed cn ts */
	inm_u64_t tt_prev_seqno;         /* Prev committed cn seq no */
	inm_u64_t tt_prev_tag_ts;        /* Prev tag ts */
	inm_u64_t tt_prev_tag_seqno;     /* Prev tag seq num */
	inm_u64_t tt_ds_throttle_start;  /* Diff sync throttle start */
	inm_u64_t tt_ds_throttle_stop;   /* Diff sync throttle stop */
	inm_u64_t tt_stop_flt_time;      /* Stop flt time */
	inm_u64_t tt_start_flt_time_by_user; /* User
start filtering time */ } target_telemetry_t; /* * Global tag telemetry data: TAG_TELEMETRY_COMMON */ typedef struct tag_telemetry_common { atomic_t tc_refcnt; /* Ref Count */ inm_u16_t tc_ndisks; /* Num disks */ inm_u16_t tc_ndisks_prot; /* Num protected disks */ inm_u64_t tc_req_time; /* Tag ioctl time */ char tc_guid[GUID_SIZE_IN_CHARS + 1];/* Tag guid */ inm_s32_t tc_ioctl_cmd; /* Tag ioctl Called */ inm_s32_t tc_ioctl_status; /* Ioctl status */ inm_u32_t tc_ndisks_tagged; /* Number disks tagged */ etTagType tc_type; /* Tag type */ } tag_telemetry_common_t; /* * Per disk successful tag telemetry data: TAG_HISTORY */ typedef struct tag_history { tag_telemetry_common_t *th_tag_common; /* Common data */ inm_u64_t th_insert_time; /* Insert time */ inm_u64_t th_prev_tag_time; /* Prev tag time */ inm_u64_t th_prev_succ_tag_time; /* Prev succ tag time */ inm_s32_t th_tag_status; /* Per disk status */ inm_u64_t th_blend; /* Blended state */ inm_u64_t th_tag_state; /* Tag state */ inm_u64_t th_commit_time; /* Tag commit time */ inm_u64_t th_drainbarr_time; /* Drain barrier time */ tgt_stats_t th_prev_succ_stats; /* Prev succ tag stats */ tgt_stats_t th_prev_stats; /* Prev tag stats */ tgt_stats_t th_cur_stats; /* Current stats */ void *th_tgt_ctxt; } tag_history_t; #endif involflt-0.1.0/src/utils.c0000755000000000000000000006772414467303177014175 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "involflt_debug.h" #include "tunable_params.h" #include "filter_host.h" #include "file-io.h" #ifdef INM_LINUX #include "filter_lun.h" #endif #include "filestream_raw.h" extern driver_context_t *driver_ctx; extern inm_s32_t driver_state; inm_s32_t write_vol_attr(target_context_t * , const char *, void *, int); static inm_u32_t is_big_endian(void); #ifdef INM_DEBUG static void print_tag_struct(tag_info_t *, inm_s32_t); #endif void get_time_stamp(inm_u64_t *time_in_100nsec) { #ifdef INM_AIX struct timestruc_t now; #else inm_timespec now; #endif if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } *time_in_100nsec = 0; INM_GET_CURRENT_TIME(now); (*time_in_100nsec) += (now.tv_sec*HUNDREDS_OF_NANOSEC_IN_SECOND); (*time_in_100nsec) += (now.tv_nsec/100); INM_BUG_ON(!*time_in_100nsec); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } } void get_time_stamp_tag(TIME_STAMP_TAG_V2 *time_stamp) { #ifdef INM_AIX struct timestruc_t now; #else inm_timespec now; #endif inm_u64_t time_in_100nsec = 0; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_GET_CURRENT_TIME(now); time_in_100nsec += (now.tv_sec*HUNDREDS_OF_NANOSEC_IN_SECOND); time_in_100nsec += (now.tv_nsec/100); INM_BUG_ON(!time_in_100nsec); time_stamp->TimeInHundNanoSecondsFromJan1601 = time_in_100nsec; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); if(driver_ctx->last_time_stamp_seqno >= INMAGE_MAX_TS_SEQUENCE_NUMBER) { driver_ctx->last_time_stamp++; driver_ctx->last_time_stamp_seqno = 0; } else { driver_ctx->last_time_stamp_seqno++; } time_stamp->ullSequenceNumber = driver_ctx->last_time_stamp_seqno; if(time_stamp->TimeInHundNanoSecondsFromJan1601 <= driver_ctx->last_time_stamp) { time_stamp->TimeInHundNanoSecondsFromJan1601 = driver_ctx->last_time_stamp; } else { driver_ctx->last_time_stamp = time_stamp->TimeInHundNanoSecondsFromJan1601; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t validate_path_for_file_name(char *filename) { char *local_fname = NULL; char *cptr = NULL; char *next = NULL; char *tmp_str = NULL; inm_u32_t sz = 0; inm_s32_t err = 0; #if defined(__SunOS_5_9) || defined(__SunOS_5_8) inm_u32_t len = 0; char char_tmp; #endif if (!filename || !strlen(filename)) return -EINVAL; local_fname = (char *) INM_KMALLOC(MAX_LOG_PATHNAME, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!local_fname) { return -ENOMEM; } tmp_str = (char *) INM_KMALLOC(MAX_LOG_PATHNAME, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!tmp_str) { INM_KFREE(local_fname, MAX_LOG_PATHNAME, INM_KERNEL_HEAP); return -ENOMEM; } local_fname[0] = '\0'; if (strcpy_s(tmp_str, MAX_LOG_PATHNAME, "/")) { INM_KFREE(local_fname, MAX_LOG_PATHNAME, INM_KERNEL_HEAP); INM_KFREE(tmp_str, MAX_LOG_PATHNAME, INM_KERNEL_HEAP); return INM_EFAULT; } if (*filename == '/') { /* ensure that path doesn't have dev path */ if (strncmp(filename, "/dev/", strlen("/dev/")) == 0) return 0; if (strlen(filename) > MAX_LOG_PATHNAME) return -EINVAL; cptr = filename; } else { sz = 
strlen(DEFAULT_LOG_DIRECTORY_VALUE);
		sz++;	/* for the '/' separator appended below */
		sz += strlen(filename);
		if (sz > MAX_LOG_PATHNAME) {
			INM_KFREE(local_fname, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			INM_KFREE(tmp_str, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			return -EINVAL;
		}
		if (strcpy_s(local_fname, MAX_LOG_PATHNAME,
					DEFAULT_LOG_DIRECTORY_VALUE)) {
			INM_KFREE(local_fname, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			INM_KFREE(tmp_str, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			return INM_EFAULT;
		}
		if (strcat_s(local_fname, MAX_LOG_PATHNAME, "/") ||
		    strcat_s(local_fname, MAX_LOG_PATHNAME, filename)) {
			INM_KFREE(local_fname, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			INM_KFREE(tmp_str, MAX_LOG_PATHNAME,
							INM_KERNEL_HEAP);
			return INM_EFAULT;
		}
		cptr = local_fname;
	}

	while(cptr) {
		if (*cptr == '/') {
			cptr++;
			continue;
		}
		next = strchr(cptr, '/');
		if (!next)
			break;
#ifdef INM_LINUX
		strncat_s(tmp_str, MAX_LOG_PATHNAME, cptr, (next - cptr));
#endif
#ifdef INM_SOLARIS
#if defined(__SunOS_5_11) || defined(__SunOS_5_10)
		strncat_s(tmp_str, MAX_LOG_PATHNAME, cptr, (next - cptr));
#endif
#if defined(__SunOS_5_9) || defined(__SunOS_5_8)
		len = (next - cptr);
		char_tmp = cptr[len];
		cptr[len] = '\0';
		strcat_s(tmp_str, MAX_LOG_PATHNAME, cptr);
		cptr[len] = char_tmp;
#endif
#endif
#ifdef INM_AIX
		strncat_s(tmp_str, MAX_LOG_PATHNAME, cptr, (next - cptr));
#endif
		err = inm_mkdir(tmp_str, 0755);
		if (!(err == 0 || err == INM_EEXIST || err == INM_EROFS))
			INM_BUG_ON(1);
		strcat_s(tmp_str, MAX_LOG_PATHNAME, "/");
		cptr = next+1;
	}

	if (local_fname) {
		INM_KFREE(local_fname, MAX_LOG_PATHNAME, INM_KERNEL_HEAP);
		local_fname = NULL;
	}
	if (tmp_str) {
		INM_KFREE(tmp_str, MAX_LOG_PATHNAME, INM_KERNEL_HEAP);
		tmp_str = NULL;
	}
	return 0;
}

inm_s32_t validate_pname(char *pname)
{
	int error = 0;
	int i = 0;

	while (*pname && i < INM_GUID_LEN_MAX) {
		if (*pname == '/') {
			error = -EINVAL;
			break;
		}
		pname++;
		i++;
	}

	/* Make sure it's NULL terminated (breaking out on '/' above also
	 * leaves *pname non-zero and fails this check) */
	return *pname ?
-EINVAL : 0;
}

inm_s32_t get_volume_size(int64_t *vol_size, inm_s32_t *inmage_status)
{
	inm_s32_t status = 0;

	if (!vol_size || !inmage_status)
		return -EINVAL;

	*vol_size = (4096 * 4096);
	*inmage_status = 0;

	return status;
}

inm_s32_t inm_find_msb(inm_u64_t x)
{
	inm_s32_t r = 64;

	if (!x)
		return 0;
	if (!(x & 0xffffffff00000000ULL)) {
		x <<= 32;
		r -= 32;
	}
	if (!(x & 0xffff000000000000ULL)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff00000000000000ULL)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf000000000000000ULL)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc000000000000000ULL)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x8000000000000000ULL)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

/*
 * Compute the default bitmap granularity from the volume size: the size
 * is scaled down to GB units and compared against the 512GB tunable
 * below; volumes up to the threshold get 4K granularity, larger ones 16K.
 */
inm_s32_t default_granularity_from_volume_size(inm_u64_t volume_size)
{
	inm_u64_t _scale = 0;
	inm_s32_t _rc = 0;

	_scale = volume_size-1;
	_scale >>= 30;

	/* <= 512G - 4k granularity, otherwise 16K */
	if (driver_ctx->dc_bmap_info.bitmap_512K_granularity_size &&
		_scale >
		driver_ctx->dc_bmap_info.bitmap_512K_granularity_size) {
		_rc = SIXTEEN_K_SIZE;
	} else {
		_rc = FOUR_K_SIZE;
	}
	return _rc;
}

inm_ull64_t inm_atoull64(const char *name)
{
	inm_ull64_t val = 0;

	for (;; name++) {
		switch (*name) {
		case '0': case '1': case '2': case '3': case '4':
		case '5': case '6': case '7': case '8': case '9':
			val = 10*val+(*name-'0');
			break;
		default:
			return val;
		}
	}
}

inm_u64_t inm_atoi64(const char *name)
{
	inm_u64_t val = 0;

	for (;; name++) {
		switch (*name) {
		case '0': case '1': case '2': case '3': case '4':
		case '5': case '6': case '7': case '8': case '9':
			val = 10*val+(*name-'0');
			break;
		default:
			return val;
		}
	}
}

inm_u32_t inm_atoi(const char *name)
{
	inm_u32_t val = 0;

	for (;; name++) {
		switch (*name) {
		case '0': case '1': case '2': case '3': case '4':
		case '5': case '6': case '7': case '8': case '9':
			val = 10*val+(*name-'0');
			break;
		default:
			return val;
		}
	}
}

inm_s32_t is_digit(const char *buf, inm_s32_t len)
{
	inm_s32_t i = 0;

	if (buf[len-1] == '\n')
		len--;

	while(i < len) {
		if(!isdigit(buf[i])) {
			return 0;
		}
		i++;
	}
	return 1;
}

inm_s32_t get_path_memory(char **path)
{
	*path = (char *)INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP,
							INM_KERNEL_HEAP);
	if(!*path)
		return 0;

	(*path)[0] = '\0';
	return 1;
}

void free_path_memory(char **path)
{
	if(*path == NULL)
		return;

	INM_KFREE(*path, INM_PATH_MAX, INM_KERNEL_HEAP);
	*path = NULL;
}

inm_s32_t filter_guid_name_val_get(char *pname, char *fname)
{
	char *s = NULL;
	inm_s32_t value = 0;

	s = INM_KMEM_CACHE_ALLOC_PATH(names_cachep, INM_KM_SLEEP,
					INM_PATH_MAX, INM_KERNEL_HEAP);
	INM_BUG_ON(!s);

	strncpy_s(s, INM_PATH_MAX, pname, INM_PATH_MAX);
	strcat_s(&s[0], INM_PATH_MAX, "/");
	strcat_s(&s[0], INM_PATH_MAX, fname);

	read_value_from_file(s, &value);
	dbg("value from read for name %s val %d\n", s, value);

	INM_KMEM_CACHE_FREE_PATH(names_cachep, s, INM_KERNEL_HEAP);
	s = NULL;

	return ((inm_s32_t) value);
}

char * filter_guid_name_string_get(char *guid, char *name, inm_s32_t len)
{
	char *s = NULL;
	char *buf = NULL;

	s = INM_KMEM_CACHE_ALLOC_PATH(names_cachep, INM_KM_SLEEP,
					INM_PATH_MAX, INM_KERNEL_HEAP);
	INM_BUG_ON(!s);

	strncpy_s(s, INM_PATH_MAX, guid, INM_PATH_MAX);
	strcat_s(&s[0], INM_PATH_MAX, "/");
	strcat_s(&s[0], INM_PATH_MAX, name);

	buf = read_string_from_file(s, buf, len);
	dbg("value from read for name %s val %s\n", name, buf);

	INM_KMEM_CACHE_FREE_PATH(names_cachep, s, INM_KERNEL_HEAP);
	s = NULL;

	return buf;
}

int filter_ctx_name_val_set(target_context_t *ctxt, char *name,
						inm_s32_t value)
{
	inm_s32_t len = (NUM_CHARS_IN_INTEGER + 1);
	char
buf[(NUM_CHARS_IN_INTEGER + 1)];
	inm_s32_t copied;

	INM_MEM_ZERO(buf, NUM_CHARS_IN_INTEGER + 1);
	copied = snprintf(buf, NUM_CHARS_IN_INTEGER + 1, "%d", value);

	if(!write_vol_attr(ctxt, name, (void *)buf, len)) {
		return -EINVAL;
	}

	return 0;
}

inm_device_t filter_dev_type_get(char *pname)
{
	inm_s32_t status = 0;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered volume:%s",pname);
	}

	status = filter_guid_name_val_get(pname, "FilterDevType");

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("leaving volume:%s status:%d",pname, status);
	}
	return (inm_device_t) status;
}

int filter_dev_type_set(target_context_t *ctxt, inm_device_t val)
{
	return filter_ctx_name_val_set(ctxt, "FilterDevType", val);
}

int read_value_from_file(char *fname, inm_s32_t *val)
{
	inm_s32_t ret = 0;
	char *path = NULL, *buf = NULL;
	inm_u32_t len = 32, bytes_read = 0;

	if(!get_path_memory(&path)) {
		err("Failed to allocate memory for path");
		return -EINVAL;
	}

	snprintf(path, INM_PATH_MAX, "%s/%s", PERSISTENT_DIR, fname);
	dbg("Reading from file %s", path);

	buf = (void *)INM_KMALLOC(len, INM_KM_SLEEP, INM_KERNEL_HEAP);
	if(!buf)
		goto free_path_buf;

	dbg("Allocated buffer of len %d", len);
	INM_MEM_ZERO(buf, len);

	if(!read_full_file(path, buf, len, &bytes_read)) {
		ret = 0;
		goto free_buf;
	}

	*val = inm_atoi(buf);
	ret = 1;

free_buf:
	if(buf)
		INM_KFREE(buf, len, INM_KERNEL_HEAP);
	buf = NULL;

free_path_buf:
	if(path)
		free_path_memory(&path);
	path = NULL;

	return ret;
}

char * read_string_from_file(char *fname, char *buf, inm_s32_t len)
{
	char *path = NULL;
	int bytes_read = 0;

	buf = NULL;

	if (!get_path_memory(&path)) {
		err("Failed to allocate memory for path");
		return NULL;
	}

	snprintf(path, INM_PATH_MAX, "%s/%s", PERSISTENT_DIR, fname);
	dbg("Reading string from file %s", path);

	buf = (void *)INM_KMALLOC(len, INM_KM_SLEEP, INM_KERNEL_HEAP);
	if (!buf)
		goto free_path_buf;

	dbg("Allocated buffer of len %d", len);
	INM_MEM_ZERO(buf, len);

	if (!read_full_file(path, buf, len, &bytes_read)) {
		/* read failed - free the buffer so NULL is returned */
		INM_KFREE(buf, len, INM_KERNEL_HEAP);
		buf = NULL;
	}

free_path_buf:
	if (path)
		free_path_memory(&path);
	path = NULL;

	return buf;
}

/* This fn gets the ts delta, and seqno deltas, and their overflow
 * info
 */
void inm_get_ts_and_seqno_deltas(change_node_t *cnp, inm_tsdelta_t *tdp)
{
	inm_u64_t tdelta = 0, sdelta = 0;

	tdp->td_oflow = FALSE;

	/* do not compute deltas for new change nodes */
	if (cnp->changes.change_idx) {
		TIME_STAMP_TAG_V2 ts;

		get_time_stamp_tag(&ts);
		sdelta = ts.ullSequenceNumber -
				cnp->changes.start_ts.ullSequenceNumber;
		tdelta = ts.TimeInHundNanoSecondsFromJan1601 -
			cnp->changes.start_ts.TimeInHundNanoSecondsFromJan1601;
		/* check for overflow */
		if (tdelta >= 0xFFFFFFFE || sdelta >= 0xFFFFFFFE) {
			tdp->td_oflow = TRUE;
			sdelta = 0;
			tdelta = 0;
		}
	}

	tdp->td_time = (inm_u32_t) tdelta;
	tdp->td_seqno = (inm_u32_t) sdelta;

	if (cnp->vcptr->tc_cur_wostate != ecWriteOrderStateData) {
		tdp->td_oflow = FALSE;
		tdp->td_seqno = 0;
		tdp->td_time = 0;
	}
}

/* persistent store for timestamp and seq # */
/* called on every PERSISTENT_SEQNO_THRESHOLD diff */
void inm_flush_ts_and_seqno(wqentry_t *wqep)
{
	inm_flush_ts_and_seqno_to_file(FALSE);
}

/* This fn flushes timestamp and seqno to disk
 * called every 1 sec
 */
void inm_flush_ts_and_seqno_to_file(inm_u32_t force)
{
	inm_s32_t len = NUM_CHARS_IN_LONGLONG + 1, nr_bytes = 0;
	unsigned long lock_flag;
	static inm_u64_t prev_seqno = 0, prev_ts = 0;
	inm_u64_t cur_seqno = 0, cur_ts = 0;

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock,
lock_flag);
	if (!(driver_state & DRV_LOADED_FULLY)) {
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock,
							lock_flag);
		dbg("Driver is not initialized fully and so quitting without updating global timestamps");
		return;
	}
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock,
							lock_flag);

	/*
	 * Reject any requests from worker thread once
	 * system shutdown is in progress. The system shutdown
	 * ioctl then uses force=1 to force write ts/seqno
	 */
	if (driver_ctx->sys_shutdown && !force)
		return;

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag);
	cur_seqno = driver_ctx->last_time_stamp_seqno;
	cur_ts = driver_ctx->last_time_stamp;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag);

	/* Open handle for timestamp */
	if (!driver_ctx->driver_time_stamp_handle) {
		flt_open_data_file(driver_ctx->driver_time_stamp,
			(INM_RDWR | INM_CREAT | INM_TRUNC | INM_SYNC),
			&driver_ctx->driver_time_stamp_handle);
	}

	/* Open handle for seqno */
	if (!driver_ctx->driver_time_stamp_seqno_handle) {
		flt_open_data_file(driver_ctx->driver_time_stamp_seqno,
			(INM_RDWR | INM_CREAT | INM_TRUNC | INM_SYNC),
			&driver_ctx->driver_time_stamp_seqno_handle);
	}

	if (prev_seqno == cur_seqno ||
			!driver_ctx->driver_time_stamp_seqno_handle) {
		return;
	}

	/* flush seq no. */
	INM_MEM_ZERO(driver_ctx->driver_time_stamp_buf, len);
	nr_bytes = snprintf(driver_ctx->driver_time_stamp_buf, len, "%llu",
					(unsigned long long)cur_seqno);
	flt_write_file(driver_ctx->driver_time_stamp_seqno_handle,
			driver_ctx->driver_time_stamp_buf, 0, nr_bytes,
			NULL);
	prev_seqno = cur_seqno;

	if (prev_ts == cur_ts || !driver_ctx->driver_time_stamp_handle) {
		return;
	}

	/* flush time stamp */
	INM_MEM_ZERO(driver_ctx->driver_time_stamp_buf, len);
	nr_bytes = snprintf(driver_ctx->driver_time_stamp_buf, len, "%llu",
					(unsigned long long)cur_ts);
	flt_write_file(driver_ctx->driver_time_stamp_handle,
			driver_ctx->driver_time_stamp_buf, 0, nr_bytes,
			NULL);
}

void inm_close_ts_and_seqno_file(void)
{
	/* Release handles on timestamp & seqno files */
	if (driver_ctx->driver_time_stamp_seqno_handle) {
#ifndef INM_AIX
		inm_restore_org_addr_space_ops(
			INM_HDL_TO_INODE(driver_ctx->driver_time_stamp_seqno_handle));
#endif
		INM_CLOSE_FILE(driver_ctx->driver_time_stamp_seqno_handle,
			(INM_RDWR | INM_CREAT | INM_TRUNC | INM_SYNC));
		driver_ctx->driver_time_stamp_seqno_handle = NULL;
	}
	if (driver_ctx->driver_time_stamp_handle) {
#ifndef INM_AIX
		inm_restore_org_addr_space_ops(
			INM_HDL_TO_INODE(driver_ctx->driver_time_stamp_handle));
#endif
		INM_CLOSE_FILE(driver_ctx->driver_time_stamp_handle,
			(INM_RDWR | INM_CREAT | INM_TRUNC | INM_SYNC));
		driver_ctx->driver_time_stamp_handle = NULL;
	}
}

inm_s32_t inm_flush_clean_shutdown(inm_u32_t clean_shutdown)
{
	char *pathp = NULL, *bufp = NULL;
	inm_s32_t len = NUM_CHARS_IN_INTEGER + 1, nr_bytes = 0, err = 0;

	if (!get_path_memory(&pathp)) {
		err("Failed to allocate memory");
		goto exit;
	}

	bufp = (char *) INM_KMALLOC(len , INM_KM_SLEEP, INM_KERNEL_HEAP);
	if (!bufp) {
		err("Failed to allocate memory");
		goto exit;
	}
	INM_MEM_ZERO(bufp, len);

	snprintf(pathp, INM_PATH_MAX, "%s/%s/CleanShutdown",
					PERSISTENT_DIR,COMMON_ATTR_NAME);
	nr_bytes=snprintf(bufp, len, "%u", clean_shutdown);

	driver_ctx->clean_shutdown = clean_shutdown;
	err = write_full_file(pathp, (void *)bufp, nr_bytes, NULL);

exit:
	if (bufp)
		INM_KFREE(bufp, len, INM_KERNEL_HEAP);
	if (pathp)
		free_path_memory(&pathp);

	return err;
}

/* Computes the index into the IO size bucket array; IO sizes of
 * 1K/4K are the most frequent, so they are special-cased in this fn
 */
inm_u32_t inm_comp_io_bkt_idx(inm_u32_t io_sz)
{
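	/*
	 * Bucket layout (derived from the code below): sizes <= 2KB map
	 * directly by KB count (512B -> 0, 1KB -> 1, 2KB -> 2), 4KB takes
	 * the fast path to bucket 3, and any other size lands in a
	 * logarithmic bucket based on the most significant bit of its KB
	 * size, rounded up for non powers of two and capped at
	 * MAX_NR_IO_BUCKETS.
	 */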
	inm_s32_t nr_bit = 0;

	io_sz >>= 10;		/* convert bytes to KB */
	if (io_sz <= 2) {	/* io size <= 2k */
		nr_bit = io_sz;
	} else if (io_sz == 4) { /* io size is 4k */
		nr_bit = 3;
	} else {
		nr_bit = inm_find_msb((inm_u64_t) io_sz);
		if (io_sz & ~(1 << nr_bit)) {
			nr_bit++;
		}
		if (nr_bit > MAX_NR_IO_BUCKETS) {
			nr_bit = MAX_NR_IO_BUCKETS;
		}
	}
	return nr_bit;
}

inm_s32_t write_vol_attr(target_context_t * ctxt, const char *file_name,
					void *buf, inm_s32_t len)
{
	char *path = NULL;
	inm_s32_t wrote = 0, ret = 0;

	if (ctxt->tc_flags & VCF_VOLUME_STACKED_PARTIALLY)
		return -EROFS;

	if(!get_path_memory(&path)) {
		err("Failed to allocate memory for path");
		return -EINVAL;
	}

	snprintf(path, INM_PATH_MAX, "%s/%s/%s", PERSISTENT_DIR,
					ctxt->tc_pname, file_name);
	dbg("Writing to file %s", path);

	if(!write_full_file(path, (void *)buf, len, &wrote)) {
		if (!is_rootfs_ro())
			err("write to persistent store failed %s", path);
		ret = -EINVAL;
	} else {
		ret = 1;
	}

	free_path_memory(&path);

	return ret;
}

void inm_free_host_dev_ctx(struct host_dev_context *hdcp)
{
	struct inm_list_head *ptr = NULL,*nextptr = NULL;
	host_dev_t *hdc_dev = NULL;

	if (hdcp) {
		inm_list_for_each_safe(ptr, nextptr,
					&hdcp->hdc_dev_list_head) {
			inm_list_del(ptr);
			hdc_dev = inm_list_entry(ptr, host_dev_t,
							hdc_dev_list);
			INM_KFREE(hdc_dev, sizeof(host_dev_t),
							INM_KERNEL_HEAP);
		}
#ifdef INM_AIX
		INM_DESTROY_SPIN_LOCK(&hdcp->hdc_lock);
#endif
		INM_DESTROY_WAITQUEUE_HEAD(&hdcp->resync_notify);
		INM_KFREE(hdcp, sizeof(host_dev_ctx_t), INM_PINNED_HEAP);
		hdcp = NULL;
	}
}

inm_u32_t is_AT_blocked()
{
	inm_u32_t ret = 0;

	if (strncmp(INM_CURPROC_COMM, "inm_dmit", strlen("inm_dmit")) &&
		strncmp(INM_CURPROC_COMM, "vx.sh", strlen("vx.sh")) &&
		strncmp(INM_CURPROC_COMM, "scsi_id", strlen("scsi_id")) &&
		strncmp(INM_CURPROC_COMM, "inm_scsi_id",
						strlen("inm_scsi_id")) &&
		strncmp(INM_CURPROC_COMM, "appservice",
						strlen("appservice")) &&
		strncmp(INM_CURPROC_COMM, "s2", strlen("s2")) &&
		(!(driver_ctx->flags & DC_FLAGS_INVOLFLT_LOAD) ||
			(strcmp(INM_CURPROC_COMM, "modprobe") &&
			strcmp(INM_CURPROC_COMM, "insmod")))) {
		ret = 1;
	}
	return ret;
}

tag_info_t *
cnvt_tag_info2stream(tag_info_t * tag_info, inm_s32_t num_tags,
							inm_u32_t flag)
{
	tag_info_t *stag_info = NULL;
	unsigned short ltag_len = 0;
	unsigned long uuid_len = 0;
	unsigned char *taglenp = NULL;
	inm_u16_t i = 0;
	inm_u64_t processed_len;

	stag_info = (tag_info_t *)INM_KMALLOC(sizeof(tag_info_t) * num_tags,
					INM_KM_SLEEP | flag,
					INM_KERNEL_HEAP);
	if (!stag_info) {
		dbg("Failed to allocate tag info structure");
		goto out;
	}
	if (memcpy_s(stag_info, sizeof(tag_info_t) * num_tags, tag_info,
					sizeof(tag_info_t) * num_tags)) {
		INM_KFREE(stag_info, sizeof(tag_info_t) * num_tags,
							INM_KERNEL_HEAP);
		stag_info = NULL;
		goto out;
	}

	if (!is_big_endian()) {
		goto out;
	}

	for (i = 0; i < num_tags; i++) {
		ltag_len = stag_info[i].tag_len;
		taglenp = (unsigned char *)&(stag_info[i].tag_len);
		taglenp[1] = ((ltag_len >> 8) & 0xFF);
		taglenp[0] = (ltag_len & 0xFF);
		processed_len = 0;
		while (processed_len < (inm_u64_t)(tag_info[i].tag_len)) {
			if(*(((unsigned char *)(tag_info[i].tag_name)) +
						processed_len + 2)) {
				taglenp = (unsigned char *)
						(stag_info[i].tag_name) +
						processed_len;
				processed_len += *((unsigned long*)
						((tag_info[i].tag_name) +
						processed_len + 4));
			} else {
				taglenp = (unsigned char *)
						(stag_info[i].tag_name) +
						processed_len;
				processed_len += (*((unsigned char *)
						((tag_info[i].tag_name) +
						processed_len + 3)));
			}
			ltag_len = (unsigned short)
					(*((unsigned short*)taglenp));
			taglenp[1] = ((ltag_len >> 8) & 0xFF);
			taglenp[0] = ((ltag_len) & 0xFF);
			if(*(taglenp+2)){
				taglenp =
((unsigned char *)taglenp) + 4; uuid_len = (unsigned long)(*((unsigned long*)taglenp)); taglenp[3] = ((uuid_len >> 24) & 0xFF); taglenp[2] = ((uuid_len >> 16) & 0xFF); taglenp[1] = ((uuid_len >> 8) & 0xFF); taglenp[0] = (uuid_len & 0xFF); } } } out: #ifdef INM_DEBUG dbg("Printing old tag struct"); print_tag_struct(tag_info, num_tags); dbg("Printing new tag struct"); print_tag_struct(stag_info, num_tags); #endif return stag_info; } tag_info_t * cnvt_stream2tag_info(tag_info_t *stag_info, inm_s32_t num_tags) { tag_info_t *tag_info = NULL; unsigned short ltag_len = 0; unsigned char *taglenp = NULL; unsigned long uuid_len = 0; unsigned long *ltag_lenp = NULL; unsigned short *lstag_lenp = NULL; inm_u16_t i = 0; inm_u64_t processed_len; tag_info = (tag_info_t *)INM_KMALLOC(sizeof(tag_info_t) * num_tags, INM_KM_NOSLEEP, INM_KERNEL_HEAP); if (!tag_info){ goto out; } if (memcpy_s(tag_info, sizeof(tag_info_t) * num_tags, stag_info, sizeof(tag_info_t) * num_tags)) { INM_KFREE(tag_info, sizeof(tag_info_t) * num_tags, INM_KERNEL_HEAP); stag_info = NULL; goto out; } if(!is_big_endian()){ goto out; } for (i = 0; i < num_tags; i++){ taglenp = (unsigned char *)&(tag_info[i].tag_len); ltag_len = 0; ltag_len |= (((unsigned short)taglenp[1]) >> 8); ltag_len |= ((unsigned short)taglenp[0]); tag_info[i].tag_len = ltag_len; processed_len = 0; while (processed_len < tag_info[i].tag_len){ if(*(((unsigned char *)(tag_info[i].tag_name)) + processed_len + 2)){ taglenp = (unsigned char *)(tag_info[i].tag_name) + processed_len; processed_len += *((unsigned long*)((tag_info[i].tag_name) + 4)); } else { taglenp = (unsigned char *)(tag_info[i].tag_name) + processed_len; processed_len += (unsigned long)(*((unsigned char *)((tag_info[i].tag_name) + 3))); } ltag_len = 0; ltag_len |= (((unsigned short)taglenp[0]) >> 8); ltag_len |= ((unsigned short)taglenp[1]); lstag_lenp = (unsigned short *) taglenp; *lstag_lenp = ltag_len; if(*(taglenp+2)){ taglenp = ((unsigned char *)taglenp) + 4; uuid_len = 0; uuid_len |= (((unsigned long)taglenp[0]) >> 24); uuid_len |= (((unsigned long)taglenp[1]) >> 16); uuid_len |= (((unsigned long)taglenp[2]) >> 8); uuid_len |= (((unsigned long)taglenp[3])); ltag_lenp = (unsigned long *) taglenp; *ltag_lenp = uuid_len; } } } out: #ifdef INM_DEBUG dbg("Printing old tag struct"); print_tag_struct(stag_info, num_tags); dbg("Printing new tag struct"); print_tag_struct(tag_info, num_tags); #endif return tag_info; } #ifdef INM_DEBUG static void print_tag_struct(tag_info_t *tag_info, inm_s32_t num_tags) { inm_u32_t i = 0; if(!tag_info){ goto out; } for (i = 0 ; i < num_tags; i++){ dbg("Tag %u's tag len is %u", i, tag_info[i].tag_len); } out: return; } #endif inm_s32_t inm_form_tag_cdb(target_context_t *tcp, tag_info_t *tag_info, inm_s32_t num_tags) { inm_s32_t error = 0; unsigned char cmd[16]; inm_u32_t buflen = 0; inm_u32_t flag = INM_KM_SLEEP; tag_info_t *stag_info = NULL; if (!tcp){ error = 1; goto out; } IS_DMA_FLAG(tcp, flag); stag_info = cnvt_tag_info2stream(tag_info, num_tags, flag); buflen = num_tags * sizeof(tag_info_t); cmd[0] = VACP_CDB; cmd[1] = (buflen >> 24) & 0xFF; cmd[2] = (buflen >> 16) & 0xFF; cmd[3] = (buflen >> 8) & 0xFF; cmd[4] = (buflen) & 0xFF; cmd[5] = 0x0; cmd[6] = 0x0; cmd[7] = 0x0; cmd[8] = 0x0; cmd[9] = 0x0; cmd[10] = 0x0; cmd[11] = 0x0; cmd[12] = 0x0; cmd[13] = 0x0; cmd[14] = 0x0; cmd[15] = 0x0; error = inm_all_AT_cdb_send(tcp, cmd, VACP_CDB_LEN, 1, (unsigned char *)stag_info, buflen, 0); if (error){ INM_ATOMIC_INC(&(tcp->tc_stats.num_tags_dropped)); } out: dbg("exiting 
form_tag_cdb with %d", error); if (stag_info){ INM_KFREE(stag_info, sizeof(tag_info_t) * num_tags, INM_KERNEL_HEAP); } return 0; } inm_s32_t inm_heartbeat_cdb(target_context_t *tcp) { inm_s32_t error = 0; unsigned char cmd[16]; if (!tcp){ error = 1; goto out; } cmd[0] = HEARTBEAT_CDB; cmd[1] = 0x0; cmd[2] = 0x0; cmd[3] = 0x0; cmd[4] = 0x0; cmd[5] = 0x0; cmd[6] = 0x0; cmd[7] = 0x0; cmd[8] = 0x0; cmd[9] = 0x0; cmd[10] = 0x0; cmd[11] = 0x0; cmd[12] = 0x0; cmd[13] = 0x0; cmd[14] = 0x0; cmd[15] = 0x0; error = try_reactive_offline_AT_path(tcp, cmd, HEARTBEAT_CDB_LEN, 1, NULL, 0, 0); if(!error){ goto out; } error = inm_all_AT_cdb_send(tcp, cmd, HEARTBEAT_CDB_LEN, 1, NULL, 0, 0); out: dbg("exiting heart_beat_cdb with %d", error); return error; } static inm_u32_t is_big_endian(void) { unsigned short i = 1; char *c = (char *)&i; unsigned short j = (unsigned short)(*c); inm_u32_t ret = j?0:1; return ret; } inm_s32_t inm_erase_resync_info_from_persistent_store(char *pname) { inm_s32_t err = 0; char *parent = NULL, *fname = NULL; if(!get_path_memory(&fname)) { err("malloc failed"); err = INM_ENOMEM; return err; } if(!get_path_memory(&parent)) { free_path_memory(&fname); err("malloc failed"); err = INM_ENOMEM; return err; } snprintf(parent, INM_PATH_MAX, "%s/%s", PERSISTENT_DIR, pname); snprintf(fname, INM_PATH_MAX, "%s/VolumeResyncRequired", parent); inm_unlink(fname, parent); snprintf(fname, INM_PATH_MAX, "%s/VolumeOutOfSyncCount", parent); inm_unlink(fname, parent); snprintf(fname, INM_PATH_MAX, "%s/VolumeOutOfSyncErrorCode", parent); inm_unlink(fname, parent); snprintf(fname, INM_PATH_MAX, "%s/VolumeOutOfSyncErrorStatus", parent); inm_unlink(fname, parent); snprintf(fname, INM_PATH_MAX, "%s/VolumeOutOfSyncTimeStamp", parent); inm_unlink(fname, parent); free_path_memory(&fname); free_path_memory(&parent); return err; } void inm_get_tag_marker_guid(char *tag_buf, inm_u32_t tag_buf_len, char *guid, inm_u32_t guid_len) { STREAM_REC_HDR_4B *hdr = NULL; hdr = (STREAM_REC_HDR_4B *)tag_buf; if (hdr->ucFlags & STREAM_REC_FLAGS_LENGTH_BIT) { tag_buf += sizeof(STREAM_REC_HDR_8B); tag_buf_len -= sizeof(STREAM_REC_HDR_8B); } else { tag_buf += sizeof(STREAM_REC_HDR_4B); tag_buf_len -= sizeof(STREAM_REC_HDR_8B); } memcpy_s(guid, guid_len, tag_buf, tag_buf_len); } #if defined(RHEL_MAJOR) && (RHEL_MAJOR == 5) /* Make sure compiler does printf style format checking */ int sprintf_s(char *buf, size_t bufsz, const char *fmt, ...) \ __attribute__ ((format(printf, 3, 4))); int sprintf_s(char *buf, size_t bufsz, const char *fmt, ...) { int retval = -1; va_list args; if( buf && bufsz > 0 && fmt ) { va_start(args, fmt); retval = vsnprintf(buf, bufsz, fmt, args); /* If buffer not adequate, return error */ if( retval >= bufsz ) retval = -1; va_end(args); } if( retval == -1 ) { if( buf && bufsz ) *buf = '\0'; } return retval; } #endif involflt-0.1.0/src/bitmap_api.c0000755000000000000000000013312214467303177015124 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. 
* * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /********************************************************************* * File : bitmap_api.c * * Description: This file contains bitmap mode implementation of the * filter driver. * * Functions defined in this file are * bitmap_api_ctr * bitmap_api_dtr * initialize_bitmap_api * terminate_bitmap_api * bitmap_api_load_bitmap_header_from_filestream * bitmap_api_is_bitmap_closed * bitmap_api_close * bitmap_api_set_writesize_not_to_exceed_volumesize * bitmap_api_setbits * bitmap_api_clearbits * bitmap_api_get_first_runs * bitmap_api_get_next_runs * bitmap_api_clear_all_bits * move_rawio_changes_to_bitmap * bitmap_api_init_bitmap_file * bitmap_api_commit_bitmap_internal * bitmap_api_fast_zero_bitmap * bitmap_api_commit_header * bitmap_api_calculate_hdr_integrity_checksums * bitmap_api_read_and_verify_bitmap_header * bitmap_api_verify_header * bitmap_api_save_write_metadata_to_bitmap * bitmap_api_change_bitmap_mode_to_raw_io * bitmap_api_commit_bitmap * is_volume_in_sync * ************************************************************************/ #include "involflt-common.h" #include "involflt.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "errlog.h" #include "metadata-mode.h" #include "md5.h" extern driver_context_t *driver_ctx; bitmap_api_t *bitmap_api_ctr() { bitmap_api_t *bapi = NULL; bapi = (bitmap_api_t *)INM_KMALLOC(sizeof(bitmap_api_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!bapi) return NULL; INM_MEM_ZERO(bapi,sizeof(*bapi)); bapi->bitmap_file_state = BITMAP_FILE_STATE_UNINITIALIZED; INM_INIT_SEM(&bapi->sem); bapi->volume_insync = FALSE; bapi->err_causing_outofsync = 0; return bapi; } void bitmap_api_dtr(bitmap_api_t *bmap) { if (bmap) INM_KFREE(bmap, sizeof(bitmap_api_t), INM_KERNEL_HEAP); bmap = NULL; } inm_s32_t initialize_bitmap_api() { /* initialize async, iobuffer lookaside lists */ return iobuffer_initialize_memory_lookaside_list(); } inm_s32_t terminate_bitmap_api() { /* terminate async, iobuffer lookaside lists */ iobuffer_terminate_memory_lookaside_list(); return 0; } inm_s32_t bitmap_api_open(bitmap_api_t *bapi, target_context_t *vcptr, inm_u32_t granularity, inm_u32_t offset, inm_u64_t volume_size, char *volume_name, inm_u32_t segment_cache_limit, inm_s32_t *detailed_status) { inm_s32_t ret = 0; inm_s32_t status = 0; inm_u32_t max_bitmap_buffer_required = 0; fstream_segment_mapper_t *fssm = NULL; inm_u64_t _gran_vol_size = volume_size; inm_s32_t _dstatus = 0; char *bitmap_filename = vcptr->tc_bp->bitmap_file_name; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!bapi || !detailed_status || !bitmap_filename || !granularity || !volume_size) return 1; *detailed_status = 0; INM_DOWN(&bapi->sem); bapi->bitmap_granularity = granularity; bapi->bitmap_offset = offset; bapi->volume_size = volume_size; if (volume_name) { if (strncpy_s(bapi->volume_name, INM_NAME_MAX + 1, volume_name, INM_NAME_MAX)) { ret = INM_EFAULT; goto cleanup_and_return_failure; } } INM_DO_DIV(_gran_vol_size, granularity); bapi->nr_bits_in_bitmap = 
(inm_u32_t)(_gran_vol_size + 1); bapi->bitmap_size_in_bytes = ((bapi->nr_bits_in_bitmap + 7) / 8); if ((bapi->bitmap_size_in_bytes % BITMAP_FILE_SEGMENT_SIZE) != 0) bapi->bitmap_size_in_bytes += BITMAP_FILE_SEGMENT_SIZE - (bapi->bitmap_size_in_bytes % BITMAP_FILE_SEGMENT_SIZE); bapi->bitmap_size_in_bytes += LOG_HEADER_OFFSET; max_bitmap_buffer_required = min((segment_cache_limit * BITMAP_FILE_SEGMENT_SIZE), bapi->bitmap_size_in_bytes); bapi->segment_cache_limit = segment_cache_limit; if (strncpy_s(bapi->bitmap_filename, INM_NAME_MAX + 1, bitmap_filename, INM_NAME_MAX)) { ret = INM_EFAULT; goto cleanup_and_return_failure; } if ((driver_ctx->dc_bmap_info.current_bitmap_buffer_memory + max_bitmap_buffer_required) > driver_ctx->dc_bmap_info.max_bitmap_buffer_memory) { *detailed_status = LINVOLFLT_ERR_BITMAP_FILE_EXCEEDED_MEMORY_LIMIT; ret = -ENOMEM; goto cleanup_and_return_failure; } if (INM_MEM_CMP("/dev/", bitmap_filename, strlen("/dev/")) == 0) { //bitmap is a raw volume } else { bapi->bitmap_offset = 0; } bapi->fssm = fssm = fstream_segment_mapper_ctr(); if (!fssm) { info("error in fssm_ctr \n"); goto cleanup_and_return_failure; } fssm->bapi = bapi; ret = fstream_segment_mapper_attach(fssm, bapi, bapi->bitmap_offset + LOG_HEADER_OFFSET, (bapi->nr_bits_in_bitmap/8)+1, segment_cache_limit); if (ret) { info("fssm attach error = %d", ret); goto cleanup_and_return_failure; } bapi->sb = segmented_bitmap_ctr(fssm, bapi->nr_bits_in_bitmap); if (bapi->sb == NULL) { *detailed_status = LINVOLFLT_ERR_NO_MEMORY; status = -ENOMEM; info("sb ctr error = %d",ret); goto cleanup_and_return_failure; } driver_ctx->dc_bmap_info.current_bitmap_buffer_memory += max_bitmap_buffer_required; ret = bitmap_api_open_bitmap_stream(bapi, vcptr, &_dstatus); *detailed_status = _dstatus; if (ret && (bapi->bitmap_filename[0] == '/')) { if (is_rootfs_ro()) { info("root is read only file system : " "can't open/create bitmap files, " "so moving to raw bitmap mode.\n"); goto exit_fn; } else { info("root file system is full or missing directory " "hierarchy : so can't open/create bitmap " "files.\n"); goto cleanup_and_return_failure; } } exit_fn: INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; cleanup_and_return_failure: if (bapi->sb != NULL) { segmented_bitmap_put(bapi->sb); bapi->sb = NULL; } if (bapi->fssm != NULL) { fstream_segment_mapper_put(bapi->fssm); bapi->fssm = NULL; } if (bapi->fs != NULL) { fstream_close(bapi->fs); fstream_put(bapi->fs); bapi->fs = NULL; } if (bapi->io_bitmap_header != NULL) { iobuffer_put(bapi->io_bitmap_header); bapi->io_bitmap_header = NULL; } bapi->bitmap_file_state = BITMAP_FILE_STATE_UNINITIALIZED; INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving - cleaning the memory on error"); } return ret; } void bitmap_api_upgrade_header(bitmap_api_t *bapi) { inm_s32_t error = 0; inm_s32_t upgrade_error = 0; while (!error && bapi->bitmap_header.un.header.version != BITMAP_FILE_VERSION) { switch (bapi->bitmap_header.un.header.version) { case BITMAP_FILE_VERSION1: bapi->bitmap_header.un.header.resync_required = 0; bapi->bitmap_header.un.header.resync_errcode = 0; bapi->bitmap_header.un.header.resync_errstatus = 0; bapi->bitmap_header.un.header.version = BITMAP_FILE_VERSION2; bapi->bitmap_header.un.header.header_size = BITMAP_HDR2_SIZE; break; default: /* This should never happen as header is verified * before upgrade */ err("Invalid version - %x", bapi->bitmap_header.un.header.version); 
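			/* if INM_BUG_ON() does not halt here, the error
			 * set below ends the upgrade loop and skips the
			 * header commit that follows it */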
INM_BUG_ON(1); error = -EINVAL; } } if (!error) { bitmap_api_calculate_hdr_integrity_checksums(&bapi->bitmap_header); if (bitmap_api_verify_header(bapi, &bapi->bitmap_header)) { error = bitmap_api_commit_header(bapi, FALSE, &upgrade_error); if (error) { err("Cannot persist bitmap header - %x", upgrade_error); } else { info("Successfully upgraded bitmap to version " "0x%x", bapi->bitmap_header.un.header.version); } } else { err("Bitmap upgrade header verification failed"); } } } inm_s32_t bitmap_api_load_bitmap_header_from_filestream(bitmap_api_t *bapi, inm_s32_t *detailed_status, inm_s32_t was_created) { inm_s32_t ret = 0; iobuffer_t *iob; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } *detailed_status = 0; bapi->volume_insync = FALSE; bapi->err_causing_outofsync = 0; bapi->io_bitmap_header = iob = iobuffer_ctr(bapi, LOG_HEADER_SIZE, 0); if (!iob) { info("iob is null"); return -ENOMEM; } if (!was_created) { ret = iobuffer_sync_read(iob); if (ret) { *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CANT_READ; goto cleanup_and_return_failure; } if (memcpy_s(&bapi->bitmap_header, sizeof(bapi->bitmap_header), iob->buffer, sizeof(bapi->bitmap_header))) { ret = INM_EFAULT; goto cleanup_and_return_failure; } if (!bitmap_api_verify_header(bapi, &bapi->bitmap_header)) { *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_LOG_FIXED; info("Verify header failed, bitmap file is corrupted"); bapi->corrupt_bitmap = TRUE; } if (bapi->bitmap_header.un.header.recovery_state == BITMAP_LOG_RECOVERY_STATE_CLEAN_SHUTDOWN) { if ((bapi->bitmap_header.un.header.version >= BITMAP_FILE_VERSION2) && (bapi->bitmap_header.un.header.resync_required)) { *detailed_status = bapi->err_causing_outofsync = bapi->bitmap_header.un.header.resync_errcode; bapi->bitmap_header.un.header.resync_required = 0; bapi->bitmap_header.un.header.resync_errcode = 0; bapi->bitmap_header.un.header.resync_errstatus = 0; } else { dbg("previous shutdown was normal"); bapi->volume_insync = TRUE; } } else { *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_LOST_SYNC_SYSTEM_CRASHED; info("indicates unexpected previous shutdown"); } bapi->bitmap_header.un.header.boot_cycles++; if (bapi->bitmap_header.un.header.version != BITMAP_FILE_VERSION) bitmap_api_upgrade_header(bapi); return 0; } else { bapi->new_bitmap = 1; *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CREATED; } ret = bitmap_api_init_bitmap_file(bapi, detailed_status); if(ret) { *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CANT_INIT; goto cleanup_and_return_failure; } if (bapi->empyt_bitmap) { err("repaired empty bitmap file"); } else { if (bapi->corrupt_bitmap) { err("repaired corrupt bitmap file"); } else { err("created new bitmap file"); } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; cleanup_and_return_failure: if (bapi->io_bitmap_header != NULL) { err("error in loading bitmap header"); iobuffer_dtr(bapi->io_bitmap_header); bapi->io_bitmap_header = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_is_volume_insync(bitmap_api_t *bapi, inm_u8_t *volume_insync, inm_s32_t *out_of_sync_err_code) { inm_s32_t ret = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (volume_insync == NULL) return EINVAL; INM_DOWN(&bapi->sem); 
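/*
 * Note (summary of the surrounding logic, not new behavior): the
 * in-sync flag is only meaningful once the on-disk header has been
 * loaded, i.e. in the OPENED and RAWIO states; for UNINITIALIZED or
 * CLOSED bitmaps the switch below fails the query with EINVAL rather
 * than reporting a stale value.
 */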
switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: case BITMAP_FILE_STATE_RAWIO: *volume_insync = bapi->volume_insync; if (out_of_sync_err_code != NULL) *out_of_sync_err_code = bapi->err_causing_outofsync; break; default: ret = EINVAL; break; } INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving - volume in sync = %d", *volume_insync); } return ret; } inm_s32_t bitmap_api_is_bitmap_closed(bitmap_api_t *bapi) { inm_s32_t closed = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: case BITMAP_FILE_STATE_RAWIO: closed = 0; break; default: closed = 1; break; } INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving = %d", closed); } return closed; } inm_s32_t bitmap_api_close(bitmap_api_t *bapi, inm_s32_t *close_status) { inm_s32_t ret = 0; int clean_shutdown = 1; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: case BITMAP_FILE_STATE_RAWIO: dbg("close issued on %s in %d state\n", bapi->bitmap_filename, bapi->bitmap_file_state); ret = bitmap_api_commit_bitmap_internal(bapi, clean_shutdown, close_status); break; default: ret = EINVAL; break; } if (bapi->sb != NULL) { segmented_bitmap_put(bapi->sb); bapi->sb = NULL; } if (bapi->fssm != NULL) { fstream_segment_mapper_put(bapi->fssm); bapi->fssm = NULL; } if (bapi->fs != NULL) { fstream_close(bapi->fs); fstream_put(bapi->fs); bapi->fs = NULL; } if (bapi->io_bitmap_header != NULL) { iobuffer_put(bapi->io_bitmap_header); bapi->io_bitmap_header = NULL; } driver_ctx->dc_bmap_info.current_bitmap_buffer_memory -= min(bapi->segment_cache_limit * BITMAP_FILE_SEGMENT_SIZE, bapi->bitmap_size_in_bytes); bapi->bitmap_file_state = BITMAP_FILE_STATE_CLOSED; INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value %d", ret); } return ret; } void bitmap_api_getscaled_offsetandsize_from_diskchange(bitmap_api_t *bapi, disk_chg_t *dc, inm_u64_t *scaled_offset, inm_u64_t *scaled_size) { inm_u64_t _gran_scaled_offset = (inm_u64_t) dc->offset; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DO_DIV(_gran_scaled_offset, bapi->bitmap_granularity); /* round down offset */ *scaled_offset= _gran_scaled_offset * bapi->bitmap_granularity; /* 1st, calculate how much size grew from rounding down */ *scaled_size = dc->offset - *scaled_offset; /* 2nd, add in the actual size plus rounding factor */ *scaled_size += ((inm_u64_t)dc->length) + (bapi->bitmap_granularity - 1); /* 3rd, now scale it to the granularity */ INM_DO_DIV(*scaled_size, bapi->bitmap_granularity); INM_DO_DIV(*scaled_offset, bapi->bitmap_granularity); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving scaled offset = %llu , scaled size = %llu", *scaled_offset, *scaled_size); } } void bitmap_api_getscaled_offsetandsize_from_writemetadata(bitmap_api_t *bapi, write_metadata_t *wmd, inm_u64_t *scaled_offset, inm_u64_t *scaled_size) { inm_u64_t _gran_scaled_offset = (inm_u64_t) wmd->offset; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DO_DIV(_gran_scaled_offset, bapi->bitmap_granularity); /* round down offset */ *scaled_offset= _gran_scaled_offset * bapi->bitmap_granularity; /* 1st, calculate how 
much size grew from rounding down */ *scaled_size = ((inm_u64_t)wmd->offset) - *scaled_offset; /* 2nd, add in the actual size plus rounding factor */ *scaled_size += wmd->length + (bapi->bitmap_granularity - 1); /* 3rd, now scale it to the granularity */ INM_DO_DIV(*scaled_size, bapi->bitmap_granularity); INM_DO_DIV(*scaled_offset, bapi->bitmap_granularity); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t bitmap_api_set_writesize_not_to_exceed_volumesize(bitmap_api_t *bapi, disk_chg_t *dc) { inm_s32_t ret = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if ((dc->length + dc->offset) > bapi->volume_size) { if (dc->offset < bapi->volume_size) { dc->length = (inm_u32_t)(bapi->volume_size - dc->offset); ret = 0; } else { ret = EOF_BMAP; } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_setbits(bitmap_api_t *bapi, bitruns_t *bruns, volume_bitmap_t *vbmap) { inm_s32_t ret = 0; inm_u32_t current_run = 0; inm_u64_t scaled_offset = 0, scaled_size = 0; logheader_t *lh = NULL; /* log header */ bitmap_header_t *bh = NULL; inm_u32_t index = 0, rem = 0; struct inm_list_head *ptr; inm_page_t *pgp; inm_ull64_t rem_runs, nbr_runs; int skip_logged = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } bruns->nbr_runs_processed = 0; bh = &bapi->bitmap_header; lh = &bh->un.header; INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: rem_runs = bruns->nbr_runs; __inm_list_for_each(ptr, &bruns->meta_page_list){ pgp = inm_list_entry(ptr, inm_page_t, entry); bruns->runs = (disk_chg_t *)pgp->cur_pg; nbr_runs = pgp->nr_chgs; for(current_run = 0; current_run < nbr_runs; current_run++) { INM_DOWN(&vbmap->sem); if (!vbmap->bitmap_skip_writes) { INM_UP(&vbmap->sem); bitmap_api_getscaled_offsetandsize_from_diskchange(bapi, &bruns->runs[current_run], &scaled_offset, &scaled_size); if ((bruns->runs[current_run].offset < bapi->volume_size) && (bruns->runs[current_run].offset >= 0) && ((bruns->runs[current_run].offset + bruns->runs[current_run].length) <= bapi->volume_size)) { bruns->final_status = segmented_bitmap_set_bitrun(bapi->sb, (inm_u32_t)scaled_size, scaled_offset); ret = bruns->final_status; if (ret) goto out_err; bruns->nbr_runs_processed++; } } else { INM_UP(&vbmap->sem); if (!skip_logged) { err("Skipping bitmap writes"); skip_logged = 1; } /* treat it as successful write so downstream * processing is not affected */ bruns->final_status = 0; bruns->nbr_runs_processed++; } } rem_runs -= nbr_runs; if(!rem_runs) break; } out_err: break; case BITMAP_FILE_STATE_RAWIO: dbg("writing last chance changes \n"); rem_runs = bruns->nbr_runs; __inm_list_for_each(ptr, &bruns->meta_page_list){ pgp = inm_list_entry(ptr, inm_page_t, entry); bruns->runs = (disk_chg_t *)pgp->cur_pg; nbr_runs = pgp->nr_chgs; for(current_run = 0; current_run < nbr_runs; current_run++) { if (lh->last_chance_changes == (MAX_WRITE_GROUPS_IN_BITMAP_HEADER * MAX_CHANGES_IN_WRITE_GROUP)){ lh->changes_lost += bruns->nbr_runs - current_run; break; } bitmap_api_getscaled_offsetandsize_from_diskchange(bapi, &bruns->runs[current_run], &scaled_offset, &scaled_size); while (scaled_size > 0 && lh->last_chance_changes < (MAX_WRITE_GROUPS_IN_BITMAP_HEADER * MAX_CHANGES_IN_WRITE_GROUP)) { index = lh->last_chance_changes/MAX_CHANGES_IN_WRITE_GROUP; rem = lh->last_chance_changes % MAX_CHANGES_IN_WRITE_GROUP; 
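/*
 * Note: worked example for the encoding below (illustrative values
 * only). Each last-chance write is packed into one 64-bit
 * length_offset_pair: the run length in granularity units, capped at
 * 0xffff, goes in the top 16 bits and the scaled offset in the low 48
 * bits. With scaled_offset = 0x1000 and scaled_size = 0x20000, the
 * first iteration stores (0xffff << 48) | 0x1000, advances
 * scaled_offset to 0x10fff and leaves scaled_size = 0x10001, looping
 * until the run is consumed or the header runs out of change slots.
 */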
bh->change_groups[index].un.length_offset_pair[rem] = (min(scaled_size, (inm_u64_t)0xffff) << 48) | (scaled_offset & 0xffffffffffffULL); dbg("len off pair = %llu, off = %llu, len = %llu", bh->change_groups[index].un.length_offset_pair[rem], scaled_offset, scaled_size); scaled_offset += min(scaled_size, (inm_u64_t)0xffff); scaled_size -= min(scaled_size, (inm_u64_t)0xffff); lh->last_chance_changes++; } bruns->nbr_runs_processed++; dbg("bruns = %d\n", bruns->nbr_runs_processed); } info("# of last chance changes = %d", lh->last_chance_changes); rem_runs -= nbr_runs; if(!rem_runs) break; } bruns->final_status = 0; break; default: ret = -EBUSY; bruns->final_status = ret; break; } INM_UP(&bapi->sem); if (bruns->completion_callback) bruns->completion_callback(bruns); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_clearbits(bitmap_api_t *bapi, bitruns_t *bruns) { inm_s32_t ret = 0; inm_u32_t current_run; inm_u64_t scaled_offset, scaled_size; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } bruns->nbr_runs_processed = 0; INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: for(current_run = 0; current_run < bruns->nbr_runs; current_run++) { bitmap_api_getscaled_offsetandsize_from_diskchange(bapi, &bruns->runs[current_run], &scaled_offset, &scaled_size); if ((bruns->runs[current_run].offset < bapi->volume_size) && (bruns->runs[current_run].offset >= 0) && ((bruns->runs[current_run].offset + bruns->runs[current_run].length) <= bapi->volume_size)) { bruns->final_status = segmented_bitmap_clear_bitrun(bapi->sb, (inm_u32_t)scaled_size, scaled_offset); ret = bruns->final_status; if (ret) break; bruns->nbr_runs_processed++; } } if (bruns->nbr_runs_processed != 0) segmented_bitmap_sync_flush_all(bapi->sb); break; default: ret = -EBUSY; break; } INM_UP(&bapi->sem); bruns->final_status = ret; if (bruns->completion_callback) bruns->completion_callback(bruns); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_get_first_runs(bitmap_api_t *bapi, bitruns_t *bruns) { inm_s32_t ret = 0; inm_u32_t current_run = 0; bruns->nbr_runs = 0; bruns->nbr_runs_processed = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: ret = segmented_bitmap_get_first_bitrun(bapi->sb, (inm_u32_t *)&bruns->runs[current_run].length, &bruns->runs[current_run].offset); bruns->runs[current_run].length *= bapi->bitmap_granularity; bruns->runs[current_run].offset *= bapi->bitmap_granularity; if (ret == 0) ret = bitmap_api_set_writesize_not_to_exceed_volumesize(bapi, &bruns->runs[current_run]); if (ret == 0) { current_run++; bruns->nbr_runs_processed++; bruns->nbr_runs++; while(ret == 0 && current_run < MAX_KDIRTY_CHANGES) { ret = segmented_bitmap_get_next_bitrun(bapi->sb, (inm_u32_t *)&bruns->runs[current_run].length, &bruns->runs[current_run].offset); bruns->runs[current_run].length *= bapi->bitmap_granularity; bruns->runs[current_run].offset *= bapi->bitmap_granularity; if (ret == 0) ret = bitmap_api_set_writesize_not_to_exceed_volumesize(bapi, &bruns->runs[current_run]); if (ret == 0) { if ((current_run > 0) && ((bruns->runs[current_run - 1].offset + bruns->runs[current_run -1].length) == (bruns->runs[current_run].offset)) && bruns->runs[current_run-1].length 
< 0x1000000 && (bruns->runs[current_run-1].length < 0x1000000)) { /* * don't merge if already large size, * prevents int32 overflow */ bruns->runs[current_run-1].length += bruns->runs[current_run].length; } else { bruns->nbr_runs_processed++; bruns->nbr_runs++; current_run++; } } } } if (ret == 0 && current_run == MAX_KDIRTY_CHANGES) { ret = EAGAIN; } else if (ret == EOF_BMAP) { ret = 0; } break; default: ret = EBUSY; break; } INM_UP(&bapi->sem); bruns->final_status = ret; if (bruns->completion_callback) bruns->completion_callback(bruns); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_get_next_runs(bitmap_api_t *bapi, bitruns_t *bruns) { inm_s32_t ret = 0; inm_u32_t current_run = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } bruns->nbr_runs = 0; bruns->nbr_runs_processed = 0; INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: while(ret == 0 && current_run < MAX_KDIRTY_CHANGES) { ret = segmented_bitmap_get_next_bitrun(bapi->sb, (inm_u32_t *)&bruns->runs[current_run].length, &bruns->runs[current_run].offset); bruns->runs[current_run].length *= bapi->bitmap_granularity; bruns->runs[current_run].offset *= bapi->bitmap_granularity; if (ret == 0) ret = bitmap_api_set_writesize_not_to_exceed_volumesize(bapi, &bruns->runs[current_run]); if (ret == 0) { if ((current_run > 0) && ((bruns->runs[current_run - 1].offset + bruns->runs[current_run -1].length) == (bruns->runs[current_run].offset)) && bruns->runs[current_run-1].length < 0x1000000 && (bruns->runs[current_run-1].length < 0x1000000)) { /* don't merge if already large size, prevents int32 overflow */ bruns->runs[current_run-1].length += bruns->runs[current_run].length; } else { bruns->nbr_runs_processed++; bruns->nbr_runs++; current_run++; } } if (ret == 0 && current_run == MAX_KDIRTY_CHANGES) { ret = EAGAIN; } else if (ret == EOF_BMAP) { ret = 0; break; } } break; default: ret = EBUSY; break; } INM_UP(&bapi->sem); bruns->final_status = ret; if (bruns->completion_callback) bruns->completion_callback(bruns); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t bitmap_api_clear_all_bits(bitmap_api_t *bapi) { inm_s32_t ret = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: fstream_enable_buffered_io(bapi->fs); ret = segmented_bitmap_clear_all_bits(bapi->sb); segmented_bitmap_sync_flush_all(bapi->sb); fstream_disable_buffered_io(bapi->fs); fstream_sync(bapi->fs); break; default: ret = EBUSY; break; } INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t move_rawio_changes_to_bitmap(bitmap_api_t *bapi, inm_s32_t *inmage_open_status) { inm_u64_t i = 0, scaled_size = 0, write_size = 0; inm_u64_t size_offset_pair = 0, scaled_offset = 0, write_offset = 0, rounded_volume_size = 0; inm_s32_t status = 0; inm_u32_t max_nr_lcw = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } /* * bitmap has to be in opened state * this operation can't be performed in committed, raw io/closed state */ if (bapi->bitmap_file_state != BITMAP_FILE_STATE_OPENED) return -EINVAL; rounded_volume_size = ((inm_u64_t) bapi->volume_size + bapi->bitmap_granularity - 
1); INM_DO_DIV(rounded_volume_size, bapi->bitmap_granularity); rounded_volume_size *= bapi->bitmap_granularity; /* now sweep through and save any last chance changes into bitmap */ max_nr_lcw = min(bapi->bitmap_header.un.header.last_chance_changes, (inm_u32_t)(MAX_WRITE_GROUPS_IN_BITMAP_HEADER * MAX_CHANGES_IN_WRITE_GROUP)); info("%s: Last chance changes - %u", bapi->volume_name, max_nr_lcw); for (i = 0; i < max_nr_lcw; i++) { size_offset_pair = bapi->bitmap_header.change_groups[i / MAX_CHANGES_IN_WRITE_GROUP ].un.length_offset_pair[i%MAX_CHANGES_IN_WRITE_GROUP]; scaled_size = (unsigned long) (size_offset_pair >> 48); scaled_offset = size_offset_pair & 0xFFFFFFFFFFFFULL; dbg("read len off pair = %llu off = %llu len = %llu", size_offset_pair, scaled_offset, scaled_size); write_offset = scaled_offset * bapi->bitmap_granularity; write_size = scaled_size * bapi->bitmap_granularity; if (write_offset < bapi->volume_size) { if ((write_offset + write_size) > rounded_volume_size) { INM_BUG_ON(1); /* * as the granularity used to save changes in raw io mode is same as * bitmap granularity. this should not happen */ dbg("correcting write offset, write size"); scaled_size = (unsigned long) ((inm_u64_t) bapi->volume_size - write_offset); scaled_size = scaled_size + bapi->bitmap_granularity - 1; INM_DO_DIV(scaled_size, bapi->bitmap_granularity); dbg("corrected write size "); } /* In case of header blocks verification signature, size == 0 */ if (scaled_size) { status = segmented_bitmap_set_bitrun(bapi->sb, (inm_u32_t) scaled_size, scaled_offset); if (status) { *inmage_open_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CANT_APPLY_SHUTDOWN_CHANGES; bapi->volume_insync = FALSE; info("error writing raw io changes to bitmap"); return status; } } } else { info("discarding change "); } } if (bapi->bitmap_header.un.header.changes_lost) { *inmage_open_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_TOO_MANY_LAST_CHANCE; bapi->volume_insync = FALSE; info("lost changes"); } bapi->bitmap_header.un.header.last_chance_changes = 0; bapi->bitmap_header.un.header.changes_lost = 0; /* unless we shutdown clean, assume dirty */ bapi->bitmap_header.un.header.recovery_state = BITMAP_LOG_RECOVERY_STATE_DIRTY_SHUTDOWN; /* * We have to update the header even if there are no raw io changes. * We have to do this to save the header with new state indicating the bitmap is dirty. 
*/ bapi->bitmap_header.un.header.last_chance_changes = 0; status = bitmap_api_commit_header(bapi, FALSE, inmage_open_status); if (status) { info("error in updating bitmap header "); *inmage_open_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CANT_UPDATE_HEADER; bapi->volume_insync = FALSE; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } inm_s32_t bitmap_api_init_bitmap_file(bitmap_api_t *bapi, inm_s32_t *inmage_status) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } status = bitmap_api_fast_zero_bitmap(bapi); if (status != 0) return status; #define bmap_hdr bapi->bitmap_header.un.header bmap_hdr.endian = BITMAP_FILE_ENDIAN_FLAG; bmap_hdr.header_size = sizeof(logheader_t); bmap_hdr.version = BITMAP_FILE_VERSION; bmap_hdr.data_offset = LOG_HEADER_OFFSET; bmap_hdr.bitmap_offset = bapi->bitmap_offset; bmap_hdr.bitmap_size = bapi->nr_bits_in_bitmap; bmap_hdr.bitmap_granularity = bapi->bitmap_granularity; bmap_hdr.volume_size = bapi->volume_size; bmap_hdr.recovery_state = BITMAP_LOG_RECOVERY_STATE_DIRTY_SHUTDOWN; bmap_hdr.last_chance_changes = 0; bmap_hdr.boot_cycles = 0; bmap_hdr.changes_lost = 0; bmap_hdr.resync_required = 0; bmap_hdr.resync_errcode = 0; bmap_hdr.resync_errstatus = 0; status = bitmap_api_commit_header(bapi, FALSE, inmage_status); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; #undef bmap_hdr } inm_s32_t bitmap_api_commit_bitmap_internal(bitmap_api_t *bapi, int clean_shutdown, inm_s32_t *inmage_status) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } switch (bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: bapi->bitmap_header.un.header.recovery_state = clean_shutdown ? BITMAP_LOG_RECOVERY_STATE_CLEAN_SHUTDOWN : BITMAP_LOG_RECOVERY_STATE_DIRTY_SHUTDOWN; segmented_bitmap_sync_flush_all(bapi->sb); status = bitmap_api_commit_header(bapi, FALSE, inmage_status); if (status) { info("unable to write header"); status = LINVOLFLT_ERR_FINAL_HEADER_FS_WRITE_FAILED; } break; case BITMAP_FILE_STATE_RAWIO: if (!bapi->io_bitmap_header) { status = -EINVAL; break; } /* Flush all the changes with a dirty header. This way, we can * guarantee all the lcw have been written to the disk before * marking the header clean on the disk. 
*/ bapi->bitmap_header.un.header.recovery_state = BITMAP_LOG_RECOVERY_STATE_DIRTY_SHUTDOWN; status = bitmap_api_commit_header(bapi, TRUE, inmage_status); if (!status && clean_shutdown) { bapi->bitmap_header.un.header.recovery_state = BITMAP_LOG_RECOVERY_STATE_CLEAN_SHUTDOWN; status = bitmap_api_commit_header(bapi, FALSE, inmage_status); if (status) { info("unable to write header with raw io"); status = LINVOLFLT_ERR_FINAL_HEADER_FS_WRITE_FAILED; } } break; default: status = 1; break; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } inm_s32_t bitmap_api_fast_zero_bitmap(bitmap_api_t *bapi) { inm_s32_t status = 0; inm_u64_t i = 0; iobuffer_t *iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } iob = iobuffer_ctr(bapi, BITMAP_FILE_SEGMENT_SIZE, 0); if (!iob) { info("memory allocation failed for iobuffer %p\n", iob); return -ENOMEM; } iobuffer_set_fstream(iob, bapi->fs); fstream_enable_buffered_io(bapi->fs); for(i = bapi->bitmap_offset + LOG_HEADER_OFFSET; i < (bapi->bitmap_offset + bapi->bitmap_size_in_bytes); i += BITMAP_FILE_SEGMENT_SIZE) { iobuffer_set_foffset(iob, i); iobuffer_setdirty(iob); status = iobuffer_sync_flush(iob); if (status != 0) { break; } } fstream_disable_buffered_io(bapi->fs); fstream_sync(bapi->fs); iobuffer_put(iob); iob = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } inm_s32_t bitmap_api_verify_header_blocks(bitmap_api_t *bapi, bitmap_header_t *hdr) { int i = 0; inm_u64_t sig = 0; char *vname = NULL; char *csig = (char *)&sig; int matches = TRUE; /* The last three bytes of volume name are always unique */ vname = bapi->volume_name + (strlen(bapi->volume_name) - BITMAP_LCW_SIGNATURE_PREFIX_SZ); /* Volume Name as prefix */ for (i = 0; i < BITMAP_LCW_SIGNATURE_PREFIX_SZ; i++) csig[i] = vname[i]; sig |= BITMAP_LCW_SIGNATURE_SUFFIX; info("Signature: %llx", sig); for (i = 0; i < MAX_WRITE_GROUPS_IN_BITMAP_HEADER; i++) { if (hdr->change_groups[i].un.length_offset_pair[0] != sig) { err("Signature Mismatch. 
CG[%d] %llx != %llx", i, hdr->change_groups[i].un.length_offset_pair[0], sig); matches = FALSE; break; } } return matches; } void bitmap_api_clear_signed_header_blocks(bitmap_api_t *bapi) { int i = 0; for (i = 0; i < MAX_WRITE_GROUPS_IN_BITMAP_HEADER; i++) bapi->bitmap_header.change_groups[i].un.length_offset_pair[0] = 0; } void bitmap_api_sign_header_blocks(bitmap_api_t *bapi) { int i = 0; inm_u64_t sig = 0; char *vname = NULL; char *csig = (char *)&sig; /* The last three bytes of volume name are always unique */ vname = bapi->volume_name + (strlen(bapi->volume_name) - BITMAP_LCW_SIGNATURE_PREFIX_SZ); /* Volume Name as prefix */ for (i = 0; i < BITMAP_LCW_SIGNATURE_PREFIX_SZ; i++) csig[i] = vname[i]; sig |= BITMAP_LCW_SIGNATURE_SUFFIX; info("Signature: %llx", sig); for (i = 0; i < MAX_WRITE_GROUPS_IN_BITMAP_HEADER; i++) bapi->bitmap_header.change_groups[i].un.length_offset_pair[0] = sig; } inm_s32_t bitmap_api_commit_header(bitmap_api_t *bapi, inm_s32_t verify_existing_hdr_for_raw_io, inm_s32_t *inmage_status) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: bitmap_api_calculate_hdr_integrity_checksums(&bapi->bitmap_header); if (memcpy_s(bapi->io_bitmap_header->buffer, sizeof(bitmap_header_t), &bapi->bitmap_header, sizeof(bitmap_header_t))) { status = 1; break; } iobuffer_setdirty(bapi->io_bitmap_header); status = iobuffer_sync_flush(bapi->io_bitmap_header); if (status != 0) *inmage_status = LINVOLFLT_ERR_FINAL_HEADER_FS_WRITE_FAILED; break; case BITMAP_FILE_STATE_RAWIO: if (verify_existing_hdr_for_raw_io) status = bitmap_api_read_and_verify_bitmap_header(bapi, inmage_status); else status = 0; if (status == 0) { bitmap_api_calculate_hdr_integrity_checksums(&bapi->bitmap_header); if (memcpy_s(bapi->io_bitmap_header->buffer, sizeof(bitmap_header_t), &bapi->bitmap_header, sizeof(bitmap_header_t))) { status = 1; break; } iobuffer_setdirty(bapi->io_bitmap_header); status = iobuffer_sync_flush(bapi->io_bitmap_header); if (status != 0) *inmage_status = LINVOLFLT_ERR_FINAL_HEADER_DIRECT_WRITE_FAILED; } break; default: status = 1; break; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } void bitmap_api_calculate_hdr_integrity_checksums(bitmap_header_t *bhdr) { MD5Context ctx; /* calculate the checksum */ MD5Init(&ctx); MD5Update(&ctx, (unsigned char *)(&bhdr->un.header.endian), HEADER_CHECKSUM_DATA_SIZE); MD5Final(bhdr->un.header.validation_checksum, &ctx); return; } inm_s32_t bitmap_api_read_and_verify_bitmap_header(bitmap_api_t *bapi, inm_s32_t *inmage_status) { inm_s32_t status = 0; bitmap_header_t *hdriob = (bitmap_header_t *)bapi->io_bitmap_header->buffer; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } status = iobuffer_sync_read(bapi->io_bitmap_header); if (status == 0) { if (bitmap_api_verify_header(bapi, hdriob) && bitmap_api_verify_header_blocks(bapi, hdriob)) return 0; else *inmage_status = LINVOLFLT_ERR_FINAL_HEADER_VALIDATE_FAILED; } else { if (inmage_status) *inmage_status = LINVOLFLT_ERR_FINAL_HEADER_READ_FAILED; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } inm_s32_t bitmap_api_verify_header(bitmap_api_t *bapi, bitmap_header_t *bheader) { unsigned char actual_checksum[HEADER_CHECKSUM_SIZE] = {0}; MD5Context ctx; inm_s32_t _rc = 0; 
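/*
 * Note: summary of the validation below (derived from this function,
 * no additional checks assumed). The MD5 digest is recomputed over
 * HEADER_CHECKSUM_DATA_SIZE bytes starting at the endian field and
 * compared with the stored validation_checksum; in addition, the
 * endian flag, the version/header_size pairing (VERSION1 with
 * BITMAP_HDR1_SIZE, VERSION2 with BITMAP_HDR2_SIZE) and the geometry
 * fields (data/bitmap offsets, bit count, granularity, volume size)
 * must all match the in-memory bitmap state.
 */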
if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } /* recalculate the checksum */ MD5Init(&ctx); MD5Update(&ctx, (unsigned char *)(&(bheader->un.header.endian)), HEADER_CHECKSUM_DATA_SIZE); MD5Final(actual_checksum, &ctx); #define bhdr bheader->un.header _rc = ((bhdr.endian == BITMAP_FILE_ENDIAN_FLAG) && /* Hdr size should match version */ ((bhdr.version == BITMAP_FILE_VERSION1 && bhdr.header_size == BITMAP_HDR1_SIZE) || (bhdr.version == BITMAP_FILE_VERSION2 && bhdr.header_size == BITMAP_HDR2_SIZE)) && (bhdr.data_offset == LOG_HEADER_OFFSET) && (bhdr.bitmap_offset == bapi->bitmap_offset) && (bhdr.bitmap_size == bapi->nr_bits_in_bitmap) && (bhdr.bitmap_granularity == bapi->bitmap_granularity) && (bhdr.volume_size == bapi->volume_size) && (INM_MEM_CMP(bhdr.validation_checksum, actual_checksum, HEADER_CHECKSUM_SIZE) == 0)); if (!_rc) { info("Invalid Header"); info("validation of bmap hdr"); info("endian = %d", (bhdr.endian == BITMAP_FILE_ENDIAN_FLAG)); info("hdr sz = %d", bhdr.header_size); info("version = 0x%x", (bhdr.version)); info("data offset = %d", (bhdr.data_offset == LOG_HEADER_OFFSET)); info("bmap offset = %d", (bhdr.bitmap_offset == bapi->bitmap_offset)); info("nr bmap bits = %d", (bhdr.bitmap_size == bapi->nr_bits_in_bitmap)); info("granularity = %d", (bhdr.bitmap_granularity == bapi->bitmap_granularity)); info("vol size = %d", (bhdr.volume_size == bapi->volume_size)); info("checksum = %d", (INM_MEM_CMP(bhdr.validation_checksum, actual_checksum, HEADER_CHECKSUM_SIZE) == 0)); } #undef bhdr return _rc; } inm_s32_t bitmap_api_commit_bitmap(bitmap_api_t *bapi, int clean_shutdown, inm_s32_t *inmage_close_status) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DOWN(&bapi->sem); status = bitmap_api_commit_bitmap_internal(bapi, clean_shutdown, inmage_close_status); INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return status; } inm_s32_t is_volume_in_sync(bitmap_api_t *bapi, inm_s32_t *vol_in_sync, inm_s32_t *out_of_sync_err_code) { inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!bapi || !vol_in_sync || !out_of_sync_err_code) return -EINVAL; INM_DOWN(&bapi->sem); switch(bapi->bitmap_file_state) { case BITMAP_FILE_STATE_OPENED: case BITMAP_FILE_STATE_RAWIO: *vol_in_sync = bapi->volume_insync; *out_of_sync_err_code = bapi->err_causing_outofsync; dbg("volume in sync = %d, out of sync code = %x\n", bapi->volume_insync, bapi->err_causing_outofsync); break; default: status = -EINVAL; break; } INM_UP(&bapi->sem); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", status); } return status; } inm_u64_t bitmap_api_get_dat_bytes_in_bitmap(bitmap_api_t *bapi, bmap_bit_stats_t *bbsp) { inm_u64_t data_bytes = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (bbsp) { INM_MEM_ZERO(bbsp, sizeof(*bbsp)); bbsp->bbs_bmap_gran = bapi->bitmap_granularity; bbsp->bbs_max_nr_bits_in_chg = (1024 * 1024)/bapi->bitmap_granularity; } if (bapi && bapi->volume_size < (1024*KILOBYTES)) { return 0; } if (bapi->sb) { data_bytes = segmented_bitmap_get_number_of_bits_set(bapi->sb, bbsp); data_bytes *= bapi->bitmap_granularity; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %llu", data_bytes); } return data_bytes; } int
bitmap_api_open_bitmap_stream(bitmap_api_t *bapi, target_context_t *vcptr, inm_s32_t *detailed_status) { inm_s32_t file_created = 0; inm_s32_t ret = -1; inm_s32_t prev_state = BITMAP_FILE_STATE_UNINITIALIZED; bapi->fs = fstream_ctr(vcptr); if (!bapi->fs) { *detailed_status = LINVOLFLT_ERR_NO_MEMORY; ret = -ENOMEM; goto last; } prev_state = bapi->bitmap_file_state; ret = fstream_open_or_create(bapi->fs, bapi->bitmap_filename, &file_created, bapi->bitmap_size_in_bytes); if (ret) { *detailed_status = LINVOLFLT_ERR_BITMAP_FILE_CANT_OPEN; info("Error in opening bitmap file = %d\n", ret); goto last; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("bitmap file %s opened \n", bapi->bitmap_filename); } if (!bapi->io_bitmap_header) { /* hdr already loaded from physical blocks of volume */ bapi->bitmap_file_state = BITMAP_FILE_STATE_OPENED; ret = bitmap_api_load_bitmap_header_from_filestream(bapi, detailed_status, file_created); if (ret) { info("error in load bhdr = %d", ret); if (bapi->io_bitmap_header != NULL) { iobuffer_put(bapi->io_bitmap_header); bapi->io_bitmap_header = NULL; } goto last; } } else if (file_created) { ret = bitmap_api_fast_zero_bitmap(bapi); if (ret) goto last; /* recorded changes might have been lost here, set out of sync here */ *detailed_status = bapi->err_causing_outofsync = LINVOLFLT_ERR_BITMAP_FILE_CREATED; bapi->bitmap_file_state = BITMAP_FILE_STATE_OPENED; ret = bitmap_api_commit_header(bapi, FALSE, detailed_status); if (ret) { goto last; } } else { bapi->bitmap_file_state = BITMAP_FILE_STATE_OPENED; } ret = move_rawio_changes_to_bitmap(bapi, detailed_status); if (ret) goto last; ret = 0; return ret; last: bapi->bitmap_file_state = prev_state; if (bapi->fs != NULL) { fstream_close(bapi->fs); fstream_put(bapi->fs); bapi->fs = NULL; } return ret; } inm_s32_t is_bmaphdr_loaded(volume_bitmap_t *vbmap) { if (vbmap && vbmap->bitmap_api && vbmap->bitmap_api->io_bitmap_header) { return TRUE; } return FALSE; } inm_s32_t bitmap_api_map_file_blocks(bitmap_api_t *bapi, fstream_raw_hdl_t **hdl) { return fstream_raw_open(bapi->bitmap_filename, 0, sizeof(bitmap_header_t), hdl); } inm_s32_t bitmap_api_switch_to_rawio_mode(bitmap_api_t *bapi, inm_u64_t *resync_error) { inm_s32_t error = 0; int clean_shutdown = 1; fstream_raw_hdl_t *hdl = NULL; INM_DOWN(&bapi->sem); if (bapi->bitmap_file_state != BITMAP_FILE_STATE_OPENED) { *resync_error = ERROR_TO_REG_PRESHUTDOWN_BITMAP_FLUSH_FAILURE; error = LINVOLFLT_ERR_BITMAP_FILE_CANT_OPEN; goto out; } /* Add signatures to header blocks to verify the raw blocks */ bitmap_api_sign_header_blocks(bapi); error = bitmap_api_commit_bitmap_internal(bapi, !clean_shutdown, &error); bitmap_api_clear_signed_header_blocks(bapi); if (error) { *resync_error = ERROR_TO_REG_PRESHUTDOWN_BITMAP_FLUSH_FAILURE; goto out; } error = bitmap_api_map_file_blocks(bapi, &hdl); if (error) { *resync_error = ERROR_TO_REG_LEARN_PHYSICAL_IO_FAILURE; goto out; } bapi->bitmap_file_state = BITMAP_FILE_STATE_RAWIO; fstream_switch_to_raw_mode(bapi->fs, hdl); out: INM_UP(&bapi->sem); return error; } void bitmap_api_set_volume_out_of_sync(bitmap_api_t *bapi, inm_u64_t error_status, inm_u32_t error_code) { bapi->bitmap_header.un.header.resync_required = 1; bapi->bitmap_header.un.header.resync_errcode = error_code; bapi->bitmap_header.un.header.resync_errstatus = error_status; } involflt-0.1.0/src/emd.h0000755000000000000000000000526514467303177013577 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This 
program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _EMD_TARGET_H_ #define _EMD_TARGET_H_ /* This file should be in sync with dev_handlers/scst_utap.h file. */ typedef unsigned long long emd_handle_t; typedef struct emd_dev { struct list_head dev_list; int dev_id; emd_handle_t dev_handle; unsigned char dev_name[256]; }emd_dev_t; /* struct for Read/write data from emd driver */ typedef struct emd_io { unsigned long eio_rw; void *eio_iovp; unsigned int eio_iovcnt; unsigned int eio_len; emd_handle_t eio_dev_handle; unsigned long long eio_start; const char * eio_iname; }emd_io_t; typedef struct emd_dev_cap { inm_u32_t bsize; inm_u64_t nblocks; inm_u64_t startoff; }emd_dev_cap_t; typedef struct emd_dev_type { int (*exec)(unsigned char *, unsigned char *); // For vendor Commands. /* For attach() with given device name our filter driver will return target_context_t pointer. * Our us attach() will be similar to get_tgt_ctxt_from_uuid_nowait_fabric(). */ emd_handle_t (*attach)(const char *); void (*detach)(emd_handle_t); int (*get_capacity)(emd_handle_t, emd_dev_cap_t *); int (*prepare_write)(emd_handle_t,inm_u64_t, inm_u64_t); int (*exec_write)(emd_io_t *io); int (*exec_read)(emd_io_t *io); int (*exec_io_cancel)(emd_dev_t *, inm_u64_t, inm_u64_t); int (*exec_vacp_write)(emd_handle_t, char *, loff_t); const char* (*get_path)(emd_handle_t, int *); }emd_dev_type_t; int emd_unregister_virtual_device(int dev_id); int emd_register_virtual_device(char *name); int emd_register_virtual_dev_driver(emd_dev_type_t *dev_type); int emd_unregister_virtual_dev_driver(emd_dev_type_t *dev_type); #endif /* _EMD_TARGET_H_ */ involflt-0.1.0/src/telemetry.h0000755000000000000000000000507014467303177015036 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _TEL_H #define _TEL_H #include "target-context.h" #include "change-node.h" #include "telemetry-types.h" /* * Prototypes - telemetry types ops */ void telemetry_set_dbs(inm_u64_t *, inm_u64_t); void telemetry_clear_dbs(inm_u64_t *, inm_u64_t); inm_u64_t telemetry_get_dbs(target_context_t *, inm_s32_t, etTagStateTriggerReason); void telemetry_tag_stats_record(struct _target_context *, tgt_stats_t *); inm_u64_t telemetry_get_wostate(struct _target_context *tgt_ctxt); inm_u64_t telemetry_md_capture_reason(struct _target_context *); void telemetry_tag_common_put(tag_telemetry_common_t *); void telemetry_tag_common_get(tag_telemetry_common_t *); tag_telemetry_common_t *telemetry_tag_common_alloc(inm_s32_t); void telemetry_tag_history_free(tag_history_t *); tag_history_t *telemetry_tag_history_alloc(struct _target_context *, tag_telemetry_common_t *); void telemetry_tag_history_record(struct _target_context *, tag_history_t *); void telemetry_nwo_stats_record(target_context_t *, etWriteOrderState, etWriteOrderState, etWOSChangeReason); void telemetry_check_time_jump(void); /* * Prototypes - telemetry ops */ inm_s32_t telemetry_init(void); void telemetry_shutdown(void); inm_s32_t telemetry_log_tag_history(change_node_t *, target_context_t *, etTagStatus, etTagStateTriggerReason, etMessageType); inm_s32_t telemetry_log_tag_failure(target_context_t *,tag_telemetry_common_t *, inm_s32_t , etMessageType); inm_s32_t telemetry_log_ioctl_failure(tag_telemetry_common_t *, inm_s32_t, etMessageType); void telemetry_log_drop_error(inm_s32_t); #endif involflt-0.1.0/src/iobuffer.c0000755000000000000000000001717414467303177014630 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "utils.h" #include "tunable_params.h" extern driver_context_t *driver_ctx; iobuffer_t *iobuffer_ctr(bitmap_api_t *bapi, inm_u32_t buffer_size, inm_u32_t index) { iobuffer_t *iob; unsigned char *buffer; if(!buffer_size) return NULL; if (buffer_size == BITMAP_FILE_SEGMENT_SIZE) { buffer = INM_MEMPOOL_ALLOC(driver_ctx->dc_bmap_info.iob_data_pool, INM_KM_SLEEP); INM_BUG_ON(!buffer); } else { buffer = (unsigned char *)INM_VMALLOC((unsigned long)buffer_size, INM_KM_SLEEP, INM_KERNEL_HEAP); } if(!buffer) return NULL; INM_MEM_ZERO(buffer,buffer_size); iob = INM_MEMPOOL_ALLOC(driver_ctx->dc_bmap_info.iob_obj_pool, INM_KM_SLEEP); INM_BUG_ON(!iob); if(!iob) return iob; INM_MEM_ZERO(iob, sizeof(*iob)); iob->bapi = bapi; iob->buffer = buffer; iob->size = buffer_size; iob->dirty = 0; iob->starting_offset = 0; iob->fssm = bapi->fssm; iob->fssm_index = index; INM_INIT_LIST_HEAD(&iob->list_entry); INM_ATOMIC_SET(&iob->refcnt,1); INM_ATOMIC_SET(&iob->locked, 0); iob->starting_offset = bapi->bitmap_offset + 0; return iob; } void iobuffer_dtr(iobuffer_t *iob) { if(!iob) return; if (iob->size == BITMAP_FILE_SEGMENT_SIZE) INM_MEMPOOL_FREE(iob->buffer, driver_ctx->dc_bmap_info.iob_data_pool); else INM_VFREE((void *)iob->buffer, iob->size, INM_KERNEL_HEAP); iob->buffer = NULL; INM_MEMPOOL_FREE(iob, driver_ctx->dc_bmap_info.iob_obj_pool); iob = NULL; } iobuffer_t *iobuffer_get(iobuffer_t *iob) { INM_ATOMIC_INC(&iob->refcnt); return iob; } void iobuffer_put(iobuffer_t *iob) { if (!iob) return; if (INM_ATOMIC_DEC_AND_TEST(&iob->refcnt)) iobuffer_dtr(iob); } inm_s32_t iobuffer_sync_read(iobuffer_t *iob) { inm_s32_t ret = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (iob->dirty) return -EBUSY; if (INM_ATOMIC_READ(&iob->locked) > 1) return -EBUSY; ret = fstream_read(iob->bapi->fs, iob->buffer, iob->size, iob->starting_offset); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } inm_s32_t iobuffer_sync_flush(iobuffer_t *iob) { inm_s32_t ret = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!iob->dirty) return 0; if (INM_ATOMIC_READ(&iob->locked) > 0) return -EBUSY; ret = fstream_write(iob->bapi->fs, iob->buffer, iob->size, iob->starting_offset); if (!ret) iob->dirty = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", ret); } return ret; } void iobuffer_set_fstream(iobuffer_t *iob, fstream_t *fs) { iob->bapi->fs = fs; } void iobuffer_set_foffset(iobuffer_t *iob, inm_u64_t file_offset) { iob->starting_offset = file_offset; } void iobuffer_set_owner_index(iobuffer_t *iob, inm_u32_t owner_index) { iob->fssm_index = owner_index; } inm_u32_t iobuffer_get_owner_index(iobuffer_t *iob) { return iob->fssm_index; } inm_s32_t iobuffer_isdirty(iobuffer_t *iob) { return (int)iob->dirty; } inm_s32_t iobuffer_islocked(iobuffer_t *iob) { return (INM_ATOMIC_READ(&iob->locked) > 0); } void iobuffer_setdirty(iobuffer_t *iob) { iob->dirty = 1; } void iobuffer_lockbuffer(iobuffer_t *iob) { INM_ATOMIC_INC(&iob->locked); } void 
iobuffer_unlockbuffer(iobuffer_t *iob) { INM_ATOMIC_DEC(&iob->locked); } void iobuffer_learn_physical_iofilter(iobuffer_t *iob, struct bio *bio) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (!iob) { info("iob is null"); return; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } inm_s32_t iobuffer_initialize_memory_lookaside_list(void) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } driver_ctx->dc_bmap_info.iob_obj_cache = INM_KMEM_CACHE_CREATE("iob_obj_cache", sizeof(iobuffer_t), 0, INM_SLAB_HWCACHE_ALIGN, NULL, NULL, INM_MAX_NR_IO_BUFFER_POOL, INM_MIN_NR_IO_BUFFER_POOL, INM_UNPINNED); if (!driver_ctx->dc_bmap_info.iob_obj_cache) { err("INM_KMEM_CACHE_CREATE failed to create iob_obj_cache\n"); goto fail; } driver_ctx->dc_bmap_info.iob_data_cache = INM_KMEM_CACHE_CREATE("iob_data_cache", BITMAP_FILE_SEGMENT_SIZE, 0, INM_SLAB_HWCACHE_ALIGN, NULL, NULL, INM_MAX_NR_IO_BUFFER_DATA_POOL, INM_MIN_NR_IO_BUFFER_DATA_POOL, INM_UNPINNED); if (!driver_ctx->dc_bmap_info.iob_data_cache) { err("INM_KMEM_CACHE_CREATE failed to create iob_data_cache\n"); goto fail; } driver_ctx->dc_bmap_info.iob_obj_pool = INM_MEMPOOL_CREATE(\ PAGE_SIZE/sizeof(iobuffer_t), INM_MEMPOOL_ALLOC_SLAB, INM_MEMPOOL_FREE_SLAB, driver_ctx->dc_bmap_info.iob_obj_cache); if (!driver_ctx->dc_bmap_info.iob_obj_pool) { err("mem pool create failed for iob_obj_pool\n"); goto fail; } driver_ctx->dc_bmap_info.iob_data_pool = INM_MEMPOOL_CREATE(\ MAX_BITMAP_SEGMENT_BUFFERS, INM_MEMPOOL_ALLOC_SLAB, INM_MEMPOOL_FREE_SLAB, driver_ctx->dc_bmap_info.iob_data_cache); if (!driver_ctx->dc_bmap_info.iob_data_pool) { err("mem pool create failed for iob_data_pool\n"); goto fail; } return 0; fail: if (driver_ctx->dc_bmap_info.iob_obj_pool) { INM_MEMPOOL_DESTROY(driver_ctx->dc_bmap_info.iob_obj_pool); driver_ctx->dc_bmap_info.iob_obj_pool = NULL; } if (driver_ctx->dc_bmap_info.iob_data_cache) { INM_KMEM_CACHE_DESTROY(driver_ctx->dc_bmap_info.iob_data_cache); driver_ctx->dc_bmap_info.iob_data_cache = NULL; } if (driver_ctx->dc_bmap_info.iob_obj_cache) { INM_KMEM_CACHE_DESTROY(driver_ctx->dc_bmap_info.iob_obj_cache); driver_ctx->dc_bmap_info.iob_obj_cache = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return -ENOMEM; } void iobuffer_terminate_memory_lookaside_list(void) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (driver_ctx->dc_bmap_info.iob_obj_cache) { if (driver_ctx->dc_bmap_info.iob_obj_pool) INM_MEMPOOL_DESTROY(driver_ctx->dc_bmap_info.iob_obj_pool); INM_KMEM_CACHE_DESTROY(driver_ctx->dc_bmap_info.iob_obj_cache); driver_ctx->dc_bmap_info.iob_obj_cache = NULL; } if (driver_ctx->dc_bmap_info.iob_data_cache) { if (driver_ctx->dc_bmap_info.iob_data_pool) INM_MEMPOOL_DESTROY(driver_ctx->dc_bmap_info.iob_data_pool); INM_KMEM_CACHE_DESTROY(driver_ctx->dc_bmap_info.iob_data_cache); driver_ctx->dc_bmap_info.iob_data_cache = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } } involflt-0.1.0/src/filestream_segment_mapper.h0000755000000000000000000000525114467303177020246 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any 
later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_FILESTREAM_SEGMENT_MAPPER_H #define _INMAGE_FILESTREAM_SEGMENT_MAPPER_H #include "involflt-common.h" #define BITMAP_FILE_SEGMENT_SIZE (0x1000) #define MAX_BITMAP_SEGMENT_BUFFERS 0x41 /*65 segments*/ //bitmap operation #define BITMAP_OP_SETBITS 0 #define BITMAP_OP_CLEARBITS 1 #define BITMAP_OP_INVERTBITS 2 struct _bitmap_api_tag; /* typedef'ed to bitmap_api_t */ struct _volume_bitmap; typedef struct _fstream_segment_mapper_tag { struct _bitmap_api_tag *bapi; inm_u32_t cache_size; inm_atomic_t refcnt; /* index for buffer cache pages */ unsigned char **buffer_cache_index; struct inm_list_head segment_list; inm_u32_t nr_free_buffers; inm_u32_t nr_cache_hits; inm_u32_t nr_cache_miss; inm_u32_t segment_size; inm_u64_t starting_offset; }fstream_segment_mapper_t; fstream_segment_mapper_t *fstream_segment_mapper_ctr(void); void fstream_segment_mapper_dtr(fstream_segment_mapper_t *fssm); fstream_segment_mapper_t *fstream_segment_mapper_get(fstream_segment_mapper_t *fssm); void fstream_segment_mapper_put(fstream_segment_mapper_t *fssm); inm_s32_t fstream_segment_mapper_attach(fstream_segment_mapper_t *fssm, struct _bitmap_api_tag *bapi, inm_u64_t offset, inm_u64_t min_file_size, inm_u32_t segment_cache_limit); inm_s32_t fstream_segment_mapper_detach(fstream_segment_mapper_t *fssm); inm_s32_t fstream_segment_mapper_read_and_lock(fstream_segment_mapper_t *fssm, inm_u64_t offset, unsigned char **return_iobuf_ptr, inm_u32_t *return_seg_size); inm_s32_t fstream_segment_mapper_unlock_and_mark_dirty(fstream_segment_mapper_t * fssm, inm_u64_t offset); inm_s32_t fstream_segment_mapper_unlock(fstream_segment_mapper_t * fssm, inm_u64_t offset); inm_s32_t fstream_segment_mapper_flush(fstream_segment_mapper_t * fssm, inm_u64_t offset); inm_s32_t fstream_segment_mapper_sync_flush_all(fstream_segment_mapper_t *fssm); #endif /* _INMAGE_FILESTREAM_SEGMENT_MAPPER_H */ involflt-0.1.0/src/filter_host.c0000755000000000000000000030053114467303177015341 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "metadata-mode.h" #include "statechange.h" #include "tunable_params.h" #include "db_routines.h" #include "filter.h" #include "filter_lun.h" #include "ioctl.h" #include "filter_host.h" #include "osdep.h" #include "telemetry.h" #include "errlog.h" #include "filestream_raw.h" #include "verifier.h" #include "telemetry-exception.h" /* driver state */ inm_s32_t inm_mod_state; #ifdef IDEBUG_MIRROR_IO inm_s32_t inject_atio_err = 0; inm_s32_t inject_ptio_err = 0; inm_s32_t inject_vendorcdb_err = 0; inm_s32_t clear_vol_entry_err = 0; #endif static inm_s32_t block_sd_open(void); static void restore_sd_open(void); struct completion_chk_req { target_context_t *ctx; inm_completion_t comp; }; typedef struct completion_chk_req completion_chk_req_t; extern driver_context_t *driver_ctx; static void flt_orig_endio(struct bio *bio, inm_s32_t error); extern void involflt_completion(target_context_t *tgt_ctxt, write_metadata_t *wmd, void *bio, int lock_held); #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,10) extern inm_s32_t remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long , unsigned long , pgprot_t ); #define REMAP_PAGE remap_pfn_range #define PAGE_2_PFN_OR_PHYS(x) (page_to_pfn(x)) #else extern inm_s32_t remap_page_range(struct vm_area_struct *, unsigned long , unsigned long , unsigned long , pgprot_t ); #define REMAP_PAGE remap_page_range #define PAGE_2_PFN_OR_PHYS(x) (page_to_phys(x)) #endif typedef void (flt_part_release)(struct kobject *); typedef void (flt_disk_release)(struct kobject *); static flt_part_release *flt_part_release_fn = NULL; static flt_disk_release *flt_disk_release_fn = NULL; static struct kobj_type *disk_ktype_ptr = NULL; static struct kobj_type *part_ktype_ptr = NULL; void update_cur_dat_pg(change_node_t *, data_page_t *, int); data_page_t *get_cur_data_pg(change_node_t *node, inm_s32_t *offset); static void flt_end_io_chain(struct bio *bio, inm_s32_t error); inm_s32_t driver_state = DRV_LOADED_FULLY; req_queue_info_t *get_qinfo_from_kobj(struct kobject *kobj) { req_queue_info_t *req_q; struct kobj_type *dev_ktype = NULL; dev_ktype = kobj->ktype; if(!dev_ktype) return NULL; if(dev_ktype->release == flt_queue_obj_rel) req_q = container_of(dev_ktype, req_queue_info_t, mod_kobj_type); else req_q = NULL; return req_q; } void reset_stable_pages_for_all_devs(void) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) struct inm_list_head *entry = NULL; req_queue_info_t *qinfo = NULL; struct request_queue *q = NULL; inm_irqflag_t flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, flag); __inm_list_for_each(entry, &driver_ctx->dc_host_info.rq_list) { qinfo = inm_list_entry(entry, req_queue_info_t, next); q = qinfo->q; if (qinfo->rqi_flags & INM_STABLE_PAGES_FLAG_SET) { info("Setting stable pages off for %p", q); CLEAR_STABLE_PAGES(q); qinfo->rqi_flags &= ~INM_STABLE_PAGES_FLAG_SET; } } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, flag); #else return; #endif } void set_stable_pages_for_all_devs(void) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) struct inm_list_head *entry = NULL; req_queue_info_t *qinfo = NULL; struct 
request_queue *q = NULL; inm_irqflag_t flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, flag); __inm_list_for_each(entry, &driver_ctx->dc_host_info.rq_list) { qinfo = inm_list_entry(entry, req_queue_info_t, next); q = qinfo->q; if (!TEST_STABLE_PAGES(q)) { qinfo->rqi_flags |= INM_STABLE_PAGES_FLAG_SET; info("Setting stable pages on for %p", q); SET_STABLE_PAGES(q); } } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, flag); #else return; #endif } void add_qinfo_to_dc(req_queue_info_t *q_info) { inm_list_add_tail(&q_info->next, &driver_ctx->dc_host_info.rq_list); } void remove_qinfo_from_dc(req_queue_info_t *q_info) { inm_list_del(&q_info->next); } void get_qinfo(req_queue_info_t *q_info) { INM_ATOMIC_INC(&q_info->ref_cnt); } void inm_exchange_strategy(host_dev_ctx_t *hdcp) { unsigned long lock_flag = 0; struct inm_list_head *ptr = NULL; host_dev_t *hdc_dev = NULL; req_queue_info_t *q_info; __inm_list_for_each(ptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(ptr, host_dev_t, hdc_dev_list); q_info = hdc_dev->hdc_req_q_ptr; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); if(INM_ATOMIC_DEC_AND_TEST(&q_info->vol_users)){ remove_qinfo_from_dc(q_info); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) (void)xchg(&q_info->q->mq_ops, q_info->orig_mq_ops); #else (void)xchg(&q_info->q->make_request_fn, q_info->orig_make_req_fn); #endif (void)xchg(&q_info->q->kobj.ktype, q_info->orig_kobj_type); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) if (q_info->rqi_flags & INM_STABLE_PAGES_FLAG_SET) { CLEAR_STABLE_PAGES(q_info->q); q_info->rqi_flags &= ~INM_STABLE_PAGES_FLAG_SET; } #endif } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); } } void put_qinfo(req_queue_info_t *q_info) { if(INM_ATOMIC_DEC_AND_TEST(&q_info->ref_cnt)) { info("Destroying q_info"); kfree(q_info); } } void restore_disk_rel_ptrs(void) { if(flt_disk_release_fn && disk_ktype_ptr) disk_ktype_ptr->release = flt_disk_release_fn; if(flt_part_release_fn && part_ktype_ptr) part_ktype_ptr->release = flt_part_release_fn; } void init_tc_kobj(inm_block_device_t *bdev, struct kobject **hdc_disk_kobj_ptr) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,11,0) if (!bdev_is_partition(bdev)) { if(flt_disk_release_fn == NULL) { INM_BUG_ON(bdev_kobj(bdev)->ktype->release == NULL); flt_disk_release_fn = bdev_kobj(bdev)->ktype->release; disk_ktype_ptr = bdev_kobj(bdev)->ktype; } (void)xchg(&bdev_kobj(bdev)->ktype->release, &flt_disk_obj_rel); *hdc_disk_kobj_ptr = bdev_kobj(bdev); } else { if(flt_part_release_fn == NULL) { INM_BUG_ON(bdev_kobj(bdev)->ktype == NULL); flt_part_release_fn = bdev_kobj(bdev)->ktype->release; part_ktype_ptr = bdev_kobj(bdev)->ktype; } (void)xchg(&bdev_kobj(bdev)->ktype->release, &flt_part_obj_rel); *hdc_disk_kobj_ptr = bdev_kobj(bdev); } #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) if(bdev == bdev->bd_contains) { if(flt_disk_release_fn == NULL) { INM_BUG_ON(bdev->bd_disk->part0.__dev.kobj.ktype->release == NULL); flt_disk_release_fn = bdev->bd_disk->part0.__dev.kobj.ktype->release; disk_ktype_ptr = bdev->bd_disk->part0.__dev.kobj.ktype; } (void)xchg(&bdev->bd_disk->part0.__dev.kobj.ktype->release, &flt_disk_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_disk->part0.__dev.kobj; } else { if(flt_part_release_fn == NULL) { INM_BUG_ON(bdev->bd_part->__dev.kobj.ktype == NULL); flt_part_release_fn = bdev->bd_part->__dev.kobj.ktype->release; part_ktype_ptr = bdev->bd_part->__dev.kobj.ktype; } 
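/*
 * Note on the interception technique used here (description of the
 * surrounding code only): the original kobj_type release callback is
 * saved exactly once in flt_part_release_fn/flt_disk_release_fn and
 * then atomically swapped via xchg() for the driver's own hook, so the
 * driver is notified when a filtered disk or partition kobject is
 * finally released; restore_disk_rel_ptrs() reverses the swap.
 */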
(void)xchg(&bdev->bd_part->__dev.kobj.ktype->release, &flt_part_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_part->__dev.kobj; } #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) if(!bdev->bd_part) { if(flt_disk_release_fn == NULL) { INM_BUG_ON(bdev->bd_disk->dev.kobj.ktype->release == NULL); flt_disk_release_fn = bdev->bd_disk->dev.kobj.ktype->release; disk_ktype_ptr = bdev->bd_disk->dev.kobj.ktype; } (void)xchg(&bdev->bd_disk->dev.kobj.ktype->release, &flt_disk_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_disk->dev.kobj; } else { if(flt_part_release_fn == NULL) { INM_BUG_ON(bdev->bd_part->dev.kobj.ktype == NULL); flt_part_release_fn = bdev->bd_part->dev.kobj.ktype->release; part_ktype_ptr = bdev->bd_part->dev.kobj.ktype; } (void)xchg(&bdev->bd_part->dev.kobj.ktype->release, &flt_part_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_part->dev.kobj; } #else if(!bdev->bd_part) { if(flt_disk_release_fn == NULL) { INM_BUG_ON(bdev->bd_disk->kobj.kset->ktype->release == NULL); flt_disk_release_fn = bdev->bd_disk->kobj.kset->ktype->release; disk_ktype_ptr = bdev->bd_disk->kobj.kset->ktype; } (void)xchg(&bdev->bd_disk->kobj.kset->ktype->release, &flt_disk_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_disk->kobj; } else { if(flt_part_release_fn == NULL) { INM_BUG_ON(bdev->bd_part->kobj.ktype == NULL); flt_part_release_fn = bdev->bd_part->kobj.ktype->release; part_ktype_ptr = bdev->bd_part->kobj.ktype; } (void)xchg(&bdev->bd_part->kobj.ktype->release, &flt_part_obj_rel); *hdc_disk_kobj_ptr = &bdev->bd_part->kobj; } #endif #endif #endif } req_queue_info_t * alloc_and_init_qinfo(inm_block_device_t *bdev, target_context_t *ctx) { struct request_queue *q = bdev_get_queue(bdev); req_queue_info_t *new_q_info, *q_info; unsigned long lock_flag = 0; new_q_info = (req_queue_info_t *)INM_KMALLOC(sizeof(req_queue_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!new_q_info) return NULL; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); q_info = get_qinfo_from_kobj(&bdev->bd_disk->queue->kobj); if (q_info) { INM_KFREE(new_q_info, sizeof(req_queue_info_t), INM_KERNEL_HEAP); goto out; } q_info = new_q_info; INM_MEM_ZERO(q_info, sizeof(req_queue_info_t)); #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 8, 0) && !defined(SLES15SP3) q_info->orig_make_req_fn = q->make_request_fn; #endif q_info->q = q; if (q->kobj.ktype) { memcpy_s(&(q_info->mod_kobj_type), sizeof(struct kobj_type), q->kobj.ktype, sizeof(struct kobj_type)); } else { q_info->mod_kobj_type.release = NULL; q_info->mod_kobj_type.sysfs_ops = NULL; q_info->mod_kobj_type.default_attrs = NULL; } INM_ATOMIC_SET(&q_info->ref_cnt, 0); INM_ATOMIC_SET(&q_info->vol_users, 0); q_info->mod_kobj_type.release = flt_queue_obj_rel; q_info->orig_kobj_type = q->kobj.ktype; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) q_info->tc = ctx; q_info->orig_mq_ops = q->mq_ops; memcpy_s(&q_info->mod_mq_ops, sizeof(struct blk_mq_ops), q_info->orig_mq_ops, sizeof(struct blk_mq_ops)); q_info->mod_mq_ops.queue_rq = inm_queue_rq; (void)xchg(&q->mq_ops, &q_info->mod_mq_ops); #else /* now exchange pointers for make_request function and kobject type */ (void)xchg(&q->make_request_fn, &flt_make_request_fn); #endif (void)xchg(&q->kobj.ktype, &q_info->mod_kobj_type); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0) if (driver_ctx->tunable_params.stable_pages && !TEST_STABLE_PAGES(q)) { q_info->rqi_flags |= INM_STABLE_PAGES_FLAG_SET; SET_STABLE_PAGES(q); } #endif add_qinfo_to_dc(q_info); out: get_qinfo(q_info); 
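
/*
 * Note on the get_qinfo() above: the reference is taken while
 * rq_list_lock is still held, so a racing put_qinfo() cannot free the
 * q_info between the kobject lookup and the reference grab; the last
 * put_qinfo() kfree()s it. A hedged usage sketch follows, compiled
 * out; example_caller is hypothetical and not part of this driver:
 */
#if 0
static void example_caller(inm_block_device_t *bdev, target_context_t *ctx)
{
	req_queue_info_t *qi = alloc_and_init_qinfo(bdev, ctx);

	if (!qi)
		return;		/* allocation failed, nothing referenced */
	/* ... use qi->q under the protection of the reference ... */
	put_qinfo(qi);		/* drop it; the final put frees qi */
}
#endif
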
INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); return q_info; } void dump_bio(struct bio *bio) { inm_bvec_iter_t idx; struct bio_vec *bvec; dm_bio_info_t *info = bio->bi_private; int vcnt = 0; #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) inm_bvec_iter_t iter; struct bio_vec vec; #endif err("bio: %p", bio); if (bio->bi_end_io == flt_end_io_fn && info) { /* end io */ err("bio->bi_idx: %d", info->bi_idx); err("bio->bi_sector: %llu", (inm_u64_t)info->bi_sector); err("bio->bi_size: %u", info->bi_size); } else { /* make request */ err("bio->bi_idx: %d", INM_BUF_IDX(bio)); err("bio->bi_sector: %llu", (inm_u64_t)INM_BUF_SECTOR(bio)); err("bio->bi_size: %u", INM_BUF_COUNT(bio)); } err("bio->bi_bdev: %p", INM_BUF_BDEV(bio)); err("bio->bi_flags: %lu", (unsigned long)bio->bi_flags); err("bio->bi_rw: %lu", (unsigned long)inm_bio_rw(bio)); err("bio->bi_vcnt: %d", bio->bi_vcnt); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) bvec = &vec; if (bio->bi_end_io == flt_end_io_fn && info) { /* end io */ INM_BVEC_ITER_IDX(iter) = info->bi_idx; INM_BVEC_ITER_SECTOR(iter) = info->bi_sector; INM_BVEC_ITER_SZ(iter) = info->bi_size; } else { /* Make request */ INM_BVEC_ITER_IDX(iter) = INM_BUF_IDX(bio); INM_BVEC_ITER_SECTOR(iter) = INM_BUF_SECTOR(bio); INM_BVEC_ITER_SZ(iter) = INM_BUF_COUNT(bio); } INM_BVEC_ITER_BVDONE(iter) = INM_BVEC_ITER_BVDONE(INM_BUF_ITER(bio)); idx = iter; /* structure assignment */ __bio_for_each_segment(vec, bio, idx, iter) #else __bio_for_each_segment(bvec, bio, idx, info->bi_idx) #endif { err("bio->bv_page[%d]: %p", vcnt, bvec->bv_page); err("bio->bv_len[%d]: %u", vcnt, bvec->bv_len); err("bio->bv_offset[%d]: %u", vcnt, bvec->bv_offset); vcnt++; } } void inm_handle_bad_bio(target_context_t *tgt_ctxt, inm_buf_t *bio) { static int print_once = 0; telemetry_set_exception(tgt_ctxt->tc_guid, ecUnsupportedBIO, INM_BIO_RW_FLAGS(bio)); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_UNSUPPORTED_IO, -EOPNOTSUPP); /* dont flood the syslog */ if (print_once) return; dump_bio(bio); dump_stack(); print_once = 1; } static struct inm_list_head * copy_vec_to_data_pages(target_context_t *tgt_ctxt, struct bio_vec *bvec, inm_wdata_t *wdatap, struct inm_list_head *change_node_list, inm_s32_t *bytes_res_node) { data_page_t *pg; inm_s32_t pg_rem = 0, pg_offset = 0, seg_offset = 0, seg_rem = 0; inm_s32_t bytes_to_copy = 0; inm_s32_t to_copy = 0; char *src,*dst; change_node_t *node; struct bio *bio = (struct bio *) wdatap->wd_privp; inm_s32_t org_poffset = 0; static int print_once = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } pg_offset = 0; node = inm_list_entry(change_node_list, change_node_t, next); INM_BUG_ON(node == NULL); bytes_to_copy = wdatap->wd_cplen; if(bytes_to_copy == 0) goto out; pg = get_cur_data_pg(node, &pg_offset); pg_rem = (PAGE_SIZE - pg_offset); org_poffset = pg_offset; INM_BUG_ON(pg_rem <= 0); INM_BUG_ON(pg == NULL); INM_BUG_ON((void *)pg == (void *)&node->data_pg_head); dst = INM_KMAP_ATOMIC(pg->page, KM_SOFTIRQ1); dbg("Vec = %p", bvec); seg_offset = bvec->bv_offset; seg_rem = MIN(bvec->bv_len, bytes_to_copy); src = INM_KMAP_ATOMIC(bvec->bv_page, KM_SOFTIRQ0); while (seg_rem) { if (*bytes_res_node) { to_copy = MIN(seg_rem, pg_rem); to_copy = MIN(to_copy, *bytes_res_node); dbg("SPage = %p, SOffset = %d, DPage = %p, DOffset = %d copy = %d", src, seg_offset, dst, pg_offset, to_copy); memcpy_s((char *)(dst + pg_offset), to_copy, (char *)(src + seg_offset), to_copy); seg_rem -= to_copy; 
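
/*
 * Worked example of the clamping above, with illustrative numbers: if
 * seg_rem = 3072 bytes remain in the source bio_vec, pg_rem = 1024
 * bytes remain in the destination data page and *bytes_res_node = 8192
 * bytes are still reserved in the change node, then to_copy =
 * MIN(MIN(3072, 1024), 8192) = 1024. The copy is always bounded by the
 * tightest of the three limits; the !pg_rem branch below then maps the
 * next destination page before the loop continues.
 */
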
seg_offset += to_copy; bytes_to_copy -= to_copy; pg_rem -= to_copy; pg_offset += to_copy; *bytes_res_node -= to_copy; INM_BUG_ON(seg_rem < 0); INM_BUG_ON(pg_rem < 0); INM_BUG_ON(seg_offset > PAGE_SIZE); INM_BUG_ON(pg_offset > PAGE_SIZE); INM_BUG_ON(bytes_to_copy < 0); if (!bytes_to_copy) break; } if (!pg_rem || !*bytes_res_node) { INM_KUNMAP_ATOMIC(src, KM_SOFTIRQ0); INM_KUNMAP_ATOMIC(dst, KM_SOFTIRQ1); if (!pg_rem) { pg = get_next_data_page(pg->next.next, &pg_rem, &pg_offset, node); } else { /* update offsets for current page in change node structure */ update_cur_dat_pg(node, pg, pg_offset); INM_BUG_ON(!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK)); /* * Valid case for split io, node needs to be changed. Use next * change node from change_node_list if current one is full */ node = inm_list_entry(node->next.next, change_node_t, next); INM_BUG_ON(!node); INM_BUG_ON(!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK)); /* Reset destination page offset values */ pg = get_cur_data_pg(node, &pg_offset); pg_rem = (PAGE_SIZE - pg_offset); *bytes_res_node = node->data_free; } dst = INM_KMAP_ATOMIC(pg->page, KM_SOFTIRQ1); src = INM_KMAP_ATOMIC(bvec->bv_page, KM_SOFTIRQ0); } } INM_BUG_ON(bytes_to_copy < 0); INM_BUG_ON(seg_rem != 0); INM_KUNMAP_ATOMIC(src, KM_SOFTIRQ0); /* update offsets for current page in change node structure */ update_cur_dat_pg(node, pg, pg_offset); INM_KUNMAP_ATOMIC(dst, KM_SOFTIRQ1); /* Detect any under/overrun */ /* New offset should match old offset + len */ if (!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK) && ((((org_poffset + wdatap->wd_cplen) & (INM_PAGESZ - 1)) != pg_offset) || bytes_to_copy)) { if (!print_once) { err("Data copy error: Org: %d Len: %d New: %d " "First: %d Last: %d Remaining: %d", org_poffset, wdatap->wd_cplen, pg_offset, CHANGE_NODE_IS_FIRST_DATA_PAGE(node, pg), CHANGE_NODE_IS_LAST_DATA_PAGE(node, pg), bytes_to_copy); err("bytes_res_node: %d to_copy = %d seg_rem = %d " "seg_offset = %d pg_rem = %d", *bytes_res_node, to_copy, seg_rem, seg_offset, pg_rem); print_once = 1; } inm_handle_bad_bio(tgt_ctxt, bio); } out: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return &(node->next); } /* * BIO without vectors - zero the data */ void copy_no_vector_bio_data_to_data_pages(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, struct inm_list_head *change_node_list) { #ifdef RHEL5 INM_BUG_ON("Unsupported op on RHEL5"); return; #else inm_buf_t *bio = wdatap->wd_privp; struct bio_vec vec = {0}; inm_u32_t orglen = 0; inm_u32_t iolen = 0; change_node_t *node; inm_s32_t bytes_res_node = 0; dbg("Write Zeroes: %u", wdatap->wd_cplen); if (!(INM_IS_BIO_WOP(bio, INM_REQ_DISCARD) || INM_IS_BIO_WOP(bio, INM_REQ_WRITE_ZEROES))) { inm_handle_bad_bio(tgt_ctxt, bio); return; } if (bio->bi_vcnt != 0) { static int print_once = 0; if (!print_once) { print_once = 1; err("Write Zeroes: %u", wdatap->wd_cplen); dump_bio(bio); dump_stack(); } } vec.bv_page = ZERO_PAGE(0); vec.bv_offset = 0; orglen = iolen = wdatap->wd_cplen; node = inm_list_entry(change_node_list, change_node_t, next); bytes_res_node = node->data_free; while (iolen) { wdatap->wd_cplen = min(iolen, (inm_u32_t)PAGE_SIZE); vec.bv_len = wdatap->wd_cplen; change_node_list = copy_vec_to_data_pages(tgt_ctxt, &vec, wdatap, change_node_list, &bytes_res_node); iolen -= wdatap->wd_cplen; } wdatap->wd_cplen = orglen; #endif } /* * BIO with single vector */ void copy_single_vector_bio_data_to_data_pages(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, struct inm_list_head
*change_node_list) { inm_buf_t *bio = wdatap->wd_privp; dm_bio_info_t *info = bio->bi_private; struct bio_vec *bvec = NULL; inm_bvec_iter_t iter = INM_BVEC_ITER_INIT(); inm_u32_t iolen = 0; inm_u32_t orglen = 0; change_node_t *node; inm_s32_t bytes_res_node = 0; dbg("Write Same: %u", wdatap->wd_cplen); if (bio->bi_vcnt != 1 || !INM_IS_BIO_WOP(bio, INM_REQ_WRITE_SAME)) { err("Write Same: %u", wdatap->wd_cplen); inm_handle_bad_bio(tgt_ctxt, bio); return; } #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) INM_BVEC_ITER_IDX(iter) = info->bi_idx; #else iter = info->bi_idx; #endif bvec = bio_iovec_idx(bio, iter); orglen = iolen = wdatap->wd_cplen; wdatap->wd_cplen = bvec->bv_len; node = inm_list_entry(change_node_list, change_node_t, next); bytes_res_node = node->data_free; while (iolen) { change_node_list = copy_vec_to_data_pages(tgt_ctxt, bvec, wdatap, change_node_list, &bytes_res_node); iolen -= wdatap->wd_cplen; } wdatap->wd_cplen = orglen; } void copy_normal_bio_data_to_data_pages(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, struct inm_list_head *change_node_list) { struct bio_vec *bvec; data_page_t *pg; inm_s32_t pg_rem = 0, pg_offset = 0, seg_offset = 0, seg_rem = 0; inm_bvec_iter_t idx; inm_s32_t bytes_to_copy = 0, bytes_res_node = 0; inm_s32_t to_copy = 0; char *src,*dst; change_node_t *node; struct bio *bio = (struct bio *) wdatap->wd_privp; dm_bio_info_t *info = bio->bi_private; inm_s32_t org_poffset = 0; static int print_once = 0; #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) inm_bvec_iter_t iter; struct bio_vec vec; #endif if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } pg_offset = 0; node = inm_list_entry(change_node_list, change_node_t, next); INM_BUG_ON(node == NULL); bytes_to_copy = wdatap->wd_cplen; if(bytes_to_copy == 0) return; bytes_res_node = node->data_free; #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) /* * __bio_for_each_segment returns copy of vector * instead of vector ptr. 
So we get copy of vector * in vec with bvec pointing to it and use bvec * pointer to access members to have common code * cross all kernels */ bvec = &vec; INM_BVEC_ITER_IDX(iter) = info->bi_idx; INM_BVEC_ITER_SECTOR(iter) = info->bi_sector; INM_BVEC_ITER_SZ(iter) = info->bi_size; INM_BVEC_ITER_BVDONE(iter) = info->bi_bvec_done; idx = iter; /* structure assignment */ #else idx = info->bi_idx; #endif pg = get_cur_data_pg(node, &pg_offset); pg_rem = (PAGE_SIZE - pg_offset); org_poffset = pg_offset; INM_BUG_ON(pg_rem <= 0); INM_BUG_ON(pg == NULL); INM_BUG_ON((void *)pg == (void *)&node->data_pg_head); dst = INM_KMAP_ATOMIC(pg->page, KM_SOFTIRQ1); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) __bio_for_each_segment(vec, bio, idx, iter) { #else __bio_for_each_segment(bvec, bio, idx, info->bi_idx) { #endif dbg("Vec = %p", bvec); seg_offset = bvec->bv_offset; seg_rem = MIN(bvec->bv_len, bytes_to_copy); src = INM_KMAP_ATOMIC(bvec->bv_page, KM_SOFTIRQ0); while (seg_rem) { to_copy = MIN(seg_rem, pg_rem); to_copy = MIN(to_copy, bytes_res_node); dbg("SPage = %p, SOffset = %d, DPage = %p, DOffset = %d copy = %d", src, seg_offset, dst, pg_offset, to_copy); memcpy_s((char *)(dst + pg_offset), to_copy, (char *)(src + seg_offset), to_copy); seg_rem -= to_copy; seg_offset += to_copy; bytes_to_copy -= to_copy; pg_rem -= to_copy; pg_offset += to_copy; bytes_res_node -= to_copy; INM_BUG_ON(seg_rem < 0); INM_BUG_ON(pg_rem < 0); INM_BUG_ON(seg_offset > PAGE_SIZE); INM_BUG_ON(pg_offset > PAGE_SIZE); INM_BUG_ON(bytes_to_copy < 0); if (!bytes_to_copy) break; if (!pg_rem || !bytes_res_node) { INM_KUNMAP_ATOMIC(src, KM_SOFTIRQ0); INM_KUNMAP_ATOMIC(dst, KM_SOFTIRQ1); if (!pg_rem) { pg = get_next_data_page(pg->next.next, &pg_rem, &pg_offset, node); } else { /* update offsets for current page in change node structure */ update_cur_dat_pg(node, pg, pg_offset); INM_BUG_ON(!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK)); /* * Valid case for split io, node needs to be changed. 
Use next * change node from change_node_list if current one is full */ node = inm_list_entry(node->next.next, change_node_t, next); INM_BUG_ON(!node); INM_BUG_ON(!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK)); /* Reset destination page offset values */ pg = get_cur_data_pg(node, &pg_offset); pg_rem = (PAGE_SIZE - pg_offset); bytes_res_node = node->data_free; } dst = INM_KMAP_ATOMIC(pg->page, KM_SOFTIRQ1); src = INM_KMAP_ATOMIC(bvec->bv_page, KM_SOFTIRQ0); } } INM_BUG_ON(bytes_to_copy < 0); INM_BUG_ON(seg_rem != 0); INM_KUNMAP_ATOMIC(src, KM_SOFTIRQ0); if(bytes_to_copy == 0) break; } /* update offsets for current page in change node structure */ update_cur_dat_pg(node, pg, pg_offset); INM_KUNMAP_ATOMIC(dst, KM_SOFTIRQ1); /* Detect any under/overrun */ /* New offset should match old offset + len */ pg = get_cur_data_pg(node, &pg_offset); if (!(node->flags & KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK) && ((((org_poffset + wdatap->wd_cplen) & (INM_PAGESZ - 1)) != pg_offset) || bytes_to_copy)) { if (!print_once) { err("Data copy error: Org: %d Len: %d New: %d " "First: %d Last: %d Remaining: %d", org_poffset, wdatap->wd_cplen, pg_offset, CHANGE_NODE_IS_FIRST_DATA_PAGE(node, pg), CHANGE_NODE_IS_LAST_DATA_PAGE(node, pg), bytes_to_copy); err("bytes_res_node: %d to_copy = %d seg_rem = %d " "seg_offset = %d pg_rem = %d", bytes_res_node, to_copy, seg_rem, seg_offset, pg_rem); print_once=1; } inm_handle_bad_bio(tgt_ctxt, bio); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } void copy_bio_data_to_data_pages(target_context_t *tgt_ctxt, inm_wdata_t *wdatap, struct inm_list_head *change_node_list) { inm_buf_t *bio = wdatap->wd_privp; if (INM_IS_OFFLOAD_REQUEST_OP(bio)) { if (INM_IS_BIO_WOP(bio, INM_REQ_WRITE_SAME)) copy_single_vector_bio_data_to_data_pages(tgt_ctxt, wdatap, change_node_list); else copy_no_vector_bio_data_to_data_pages(tgt_ctxt, wdatap, change_node_list); } else { copy_normal_bio_data_to_data_pages(tgt_ctxt, wdatap, change_node_list); } } inline static int inm_bio_supported_in_data_mode(struct bio *bio) { if (!INM_IS_SUPPORTED_REQUEST_OP(bio)) { err("Unsupported IO Op: 0x%lx", (unsigned long)inm_bio_op(bio)); return 0; } if (INM_IS_OFFLOAD_REQUEST_OP(bio) && INM_BUF_COUNT(bio) >= DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE) { err("Large offload IO: 0x%lx:%u", (unsigned long)inm_bio_op(bio), INM_BUF_COUNT(bio)); return 0; } return 1; } static void flt_copy_bio(struct bio *bio) { dm_bio_info_t *bio_info = bio->bi_private; target_context_t *ctxt; host_dev_ctx_t *hdcp; write_metadata_t wmd; inm_wdata_t wdata = {0}; INM_BUG_ON(!bio_info); ctxt = (target_context_t *)bio_info->tc; INM_BUG_ON(!ctxt); hdcp = ctxt->tc_priv; if (INM_UNLIKELY(0 == (bio_info->bi_size - INM_BUF_COUNT(bio)))) { goto free_bio_info; } if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP) { INM_GET_WMD(bio_info, wmd); wdata.wd_privp = (void *)bio; wdata.wd_cplen = (bio_info->bi_size - INM_BUF_COUNT(bio)); wdata.wd_copy_wd_to_datapgs = copy_bio_data_to_data_pages; wdata.wd_chg_node = bio_info->bi_chg_node; wdata.wd_meta_page = NULL; if (INM_IS_OFFLOAD_REQUEST_OP(bio)) wdata.wd_flag |= INM_WD_WRITE_OFFLOAD; if (unlikely(!inm_bio_supported_in_data_mode(bio))) { dbg("switching to metadata mode \n"); set_tgt_ctxt_filtering_mode(ctxt, FLT_MODE_METADATA, FALSE); if (ecWriteOrderStateData == ctxt->tc_cur_wostate) { update_cx_product_issue(VCS_CX_UNSUPPORTED_BIO); set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonUnsupportedBIO); } } involflt_completion(ctxt, &wmd, &wdata, 
FALSE); bio_info->bi_chg_node = wdata.wd_chg_node; } free_bio_info: while (bio_info->bi_chg_node) { change_node_t *node = bio_info->bi_chg_node; bio_info->bi_chg_node = (change_node_t *) node->next.next; node->next.next = NULL; inm_free_change_node(node); } bio->bi_end_io = bio_info->bi_end_io; bio->bi_private = bio_info->bi_private; if (INM_BUF_COUNT(bio) == 0) { if (bio_info->orig_bio_copy) { INM_KFREE(bio_info->orig_bio_copy, sizeof(struct bio), INM_KERNEL_HEAP); } INM_DESTROY_SPIN_LOCK(&bio_info->bio_info_lock); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_free_bio_info(bio_info); #else INM_MEMPOOL_FREE(bio_info, hdcp->hdc_bio_info_pool); #endif put_tgt_ctxt(ctxt); INM_ATOMIC_DEC(&ctxt->tc_nr_in_flight_ios); } } struct bio *bio_to_complete[NR_CPUS]; /* * flt_end_io_parent - Called for a chain parent. The function returns the * bio pointer to bio_endio() in the caller to prevent stack overflow. */ static void flt_end_io_parent(struct bio *bio, target_context_t *ctxt, inm_s32_t error) { inm_irqflag_t flags; struct bio *on_child_stack = NULL; int cpuid = 0; INM_ATOMIC_DEC(&ctxt->tc_nr_chain_bios_pending); local_irq_save(flags); cpuid = smp_processor_id(); on_child_stack = bio_to_complete[cpuid]; if (on_child_stack) { INM_ATOMIC_INC(&ctxt->tc_nr_completed_in_child_stack); bio_to_complete[cpuid] = bio; dbg("PARENT(%d): %p -> %p", cpuid, on_child_stack, bio); local_irq_restore(flags); } else { dbg("PARENT(%d): %p", cpuid, bio); INM_ATOMIC_INC(&ctxt->tc_nr_completed_in_own_stack); local_irq_restore(flags); return flt_end_io_chain(bio, error); } } static void flt_end_io_chain(struct bio *bio, inm_s32_t error) { inm_irqflag_t flags; int cpuid = 0; struct bio *obio = NULL; local_irq_save(flags); cpuid = smp_processor_id(); do { dbg("CHILD(%d): %p", cpuid, bio); /* * In case we preempted another flt_end_io_chain() in execution * we may not complete the IO in the right order */ INM_BUG_ON((obio = bio_to_complete[cpuid])); /* * Add the bio to per cpu list so that parent endio * can determine if it is a recursion or not */ bio_to_complete[cpuid] = bio; flt_orig_endio(bio, error); /* * For a chain bio, the parent bio's end_io() == flt_end_io_fn() * is recursively called when flt_orig_endio() is called on child bio. * This can lead to a stack overflow in case of very large chains * where parent bio itself is a child to another bio. * To prevent stack overflow for large chains, the parent bio is * identified by BINFO_FLAG_CHAIN flag in its flt_end_io_fn() and * is handled by flt_end_io_parent() which will place the parent bio * in per-cpu list bio_to_complete after copying the change buffer, * for the child endio() to call flt_orig_endio() on the parent bio * and complete its processing in its stack and not let the stack grow. */ if (bio != bio_to_complete[cpuid]) { dbg("CHILD: Bio from parent(%d): %p -> %p", cpuid, bio, bio_to_complete[cpuid]); bio = bio_to_complete[cpuid]; } else { bio = NULL; /* No parent to endio() */ } bio_to_complete[cpuid] = obio; } while(bio); local_irq_restore(flags); dbg("Done"); } /* * This function is not used for RHEL 5 (<2.6.24) */ static void flt_end_io(struct bio *bio, inm_s32_t error) { dm_bio_info_t *bio_info = bio->bi_private; target_context_t *ctxt = (target_context_t *)bio_info->tc; int is_chain_bio = bio_info->dm_bio_flags & BINFO_FLAG_CHAIN; flt_copy_bio(bio); if (!driver_ctx->tunable_params.enable_chained_io) return flt_orig_endio(bio, error); /* * bio_endio() will reset BIO_CHAIN flag.
As such, we rely on our flag * set during make_request() when BIO_CHAIN flag can be checked for. */ if (is_chain_bio) return flt_end_io_parent(bio, ctxt, error); else return flt_end_io_chain(bio, error); } #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) static void flt_orig_endio(struct bio *bio, inm_s32_t error) { dbg("ENDIO: %p", bio); if (bio->bi_end_io) return bio->bi_end_io(bio); } void flt_end_io_fn(struct bio *bio) { if (!inm_bio_error(bio)) INM_BUG_ON(INM_BUF_COUNT(bio) != 0); flt_end_io(bio, inm_bio_error(bio)); } #elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) static void flt_orig_endio(struct bio *bio, inm_s32_t error) { if (bio->bi_end_io) return bio->bi_end_io(bio, error); } void flt_end_io_fn(struct bio *bio, inm_s32_t error) { if (!error) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) INM_BUG_ON(INM_BUF_COUNT(bio) != 0); #else INM_BUF_COUNT(bio) = 0; #endif } flt_end_io(bio, error); } #else static void flt_orig_endio(struct bio *bio, inm_s32_t error) { INM_BUG_ON(1); } /* No special IO handling required - take the default path */ inm_s32_t flt_end_io_fn(struct bio *bio, inm_u32_t done, inm_s32_t error) { flt_copy_bio(bio); if (bio->bi_end_io) return bio->bi_end_io(bio, done, error); return 0; } #endif static void inm_capture_in_metadata(target_context_t *ctx, struct bio *bio, dm_bio_info_t *bio_info) { write_metadata_t wmd; inm_wdata_t wdata = {0}; wmd.offset = (bio_info->bi_sector << 9); wmd.length = bio_info->bi_size; wdata.wd_chg_node = bio_info->bi_chg_node; wdata.wd_meta_page = NULL; volume_lock(ctx); dbg("switching to metadata mode \n"); set_tgt_ctxt_filtering_mode(ctx, FLT_MODE_METADATA, FALSE); if (ecWriteOrderStateData == ctx->tc_cur_wostate) { update_cx_product_issue(VCS_CX_UNSUPPORTED_BIO); set_tgt_ctxt_wostate(ctx, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonUnsupportedBIO); } involflt_completion(ctx, &wmd, &wdata, TRUE); volume_unlock(ctx); bio_info->bi_chg_node = wdata.wd_chg_node; while (bio_info->bi_chg_node) { change_node_t *chg_node = bio_info->bi_chg_node; bio_info->bi_chg_node = (change_node_t *) chg_node->next.next; chg_node->next.next = NULL; inm_free_change_node(chg_node); } } static_inline void flt_save_bio_info(target_context_t *ctx, dm_bio_info_t **bio_info, struct bio *bio) { host_dev_ctx_t *hdcp = ctx->tc_priv; change_node_t *chg_node; int full_disk; int alloced_from_pool = 0; full_disk = 0; chg_node = NULL; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) *bio_info = INM_KMALLOC(sizeof(dm_bio_info_t), GFP_ATOMIC | __GFP_NOWARN, INM_KERNEL_HEAP); if (!(*bio_info)) { *bio_info = inm_alloc_bioinfo(); alloced_from_pool = 1; } #else *bio_info = INM_MEMPOOL_ALLOC(hdcp->hdc_bio_info_pool, INM_KM_SLEEP); /* Ideally we should not panic here. This bug check is here to * understand the load on _bio_info_pool. This should be removed later * and switch mode???? 
*/ INM_BUG_ON(!(*bio_info)); #endif if (!(*bio_info)) { queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_FAILED_TO_ALLOC_BIOINFO, -ENOMEM); err("Mempool Alloc Failed"); return; } INM_MEM_ZERO(*bio_info, sizeof(dm_bio_info_t)); if (alloced_from_pool) (*bio_info)->dm_bio_flags |= BINFO_ALLOCED_FROM_POOL; volume_lock(ctx); full_disk = ctx->tc_flags & VCF_FULL_DEV; volume_unlock(ctx); if (full_disk) { (*bio_info)->bi_sector = INM_BUF_SECTOR(bio); } else { (*bio_info)->bi_sector = (INM_BUF_SECTOR(bio) - hdcp->hdc_start_sect); } (*bio_info)->bi_size = INM_BUF_COUNT(bio); (*bio_info)->bi_idx = INM_BUF_IDX(bio); #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) (*bio_info)->bi_bvec_done = INM_BVEC_ITER_BVDONE(INM_BUF_ITER(bio)); #endif (*bio_info)->bi_flags = bio->bi_flags; (*bio_info)->bi_end_io = bio->bi_end_io; (*bio_info)->bi_private = bio->bi_private; (*bio_info)->tc = ctx; (*bio_info)->orig_bio = bio; INM_INIT_SPIN_LOCK(&((*bio_info)->bio_info_lock)); if (ctx->tc_dev_type == FILTER_DEV_HOST_VOLUME) { unsigned long max_data_sz_per_chg_node; unsigned long remaining_length = (*bio_info)->bi_size; unsigned long length; max_data_sz_per_chg_node = driver_ctx->tunable_params.max_data_sz_dm_cn - \ sv_chg_sz - sv_const_sz; while (remaining_length) { alloced_from_pool = 0; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) chg_node = INM_KMALLOC(sizeof(change_node_t), GFP_ATOMIC | __GFP_NOWARN, INM_KERNEL_HEAP); if (!chg_node) { chg_node = inm_alloc_chgnode(); alloced_from_pool = 1; } #else chg_node = INM_KMALLOC(sizeof(change_node_t), INM_KM_NOIO, INM_KERNEL_HEAP); #endif if (!chg_node) { queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_OUT_OF_MEMORY_FOR_DIRTY_BLOCKS, -ENOMEM); err("Change node allocation failed"); goto out_err; } INM_MEM_ZERO(chg_node, sizeof(change_node_t)); if (alloced_from_pool) chg_node->flags = CHANGE_NODE_ALLOCED_FROM_POOL; chg_node->next.next = NULL; if ((*bio_info)->bi_chg_node) chg_node->next.next = (struct inm_list_head *) (*bio_info)->bi_chg_node; (*bio_info)->bi_chg_node = chg_node; length = min(max_data_sz_per_chg_node, remaining_length); remaining_length -= length; } } /* bio->bi_end_io == flt_end_io_fn is used an indicator that bi_private * belongs to our driver. Change the checks if the following two lines * or their order need to be changed. 
*/ bio->bi_private = (void *)*bio_info; bio->bi_end_io = flt_end_io_fn; if (INM_IS_CHAINED_BIO(bio)) { INM_ATOMIC_INC(&ctx->tc_nr_chain_bios_submitted); INM_ATOMIC_INC(&ctx->tc_nr_chain_bios_pending); dbg("CHAIN: %p", bio); (*bio_info)->dm_bio_flags |= BINFO_FLAG_CHAIN; } out: dbg("Write Entry Point: Offset %llu Length %d", (inm_u64_t)(INM_BUF_SECTOR(bio) * 512), INM_BUF_COUNT(bio)); return; out_err: if ((*bio_info)->orig_bio_copy) { INM_KFREE((*bio_info)->orig_bio_copy, sizeof(struct bio), INM_KERNEL_HEAP); } while ((*bio_info)->bi_chg_node) { chg_node = (*bio_info)->bi_chg_node; (*bio_info)->bi_chg_node = (change_node_t *) chg_node->next.next; chg_node->next.next = NULL; inm_free_change_node(chg_node); } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_free_bio_info(*bio_info); #else INM_MEMPOOL_FREE(*bio_info, hdcp->hdc_bio_info_pool); #endif *bio_info = NULL; goto out; } void get_root_disk(struct bio *bio) { target_context_t *ctx; INM_DOWN_READ(&driver_ctx->tgt_list_sem); ctx = get_tgt_ctxt_from_bio(bio); if (ctx) { volume_lock(ctx); if (!(ctx->tc_flags & (VCF_VOLUME_DELETING | VCF_VOLUME_CREATING)) && !(ctx->tc_flags & VCF_ROOT_DEV)) { info("Root Disk - %s (%s)", ctx->tc_guid, ctx->tc_pname); driver_ctx->dc_root_disk = ctx; ctx->tc_flags |= VCF_ROOT_DEV; } volume_unlock(ctx); } INM_UP_READ(&driver_ctx->tgt_list_sem); } /* chk whether is driver doing this IO */ int is_our_io(struct bio *biop) { struct bio_vec *bvecp = NULL; const inm_address_space_operations_t *a_opsp = NULL; inma_ops_t *t_inma_opsp = NULL; struct address_space *mapping = NULL; INM_BUG_ON(!biop); #if !(defined(RHEL_MAJOR) && (RHEL_MAJOR == 5)) if (!biop->bi_vcnt || INM_IS_OFFLOAD_REQUEST_OP(biop)) return FALSE; #endif bvecp = bio_iovec_idx(biop, INM_BUF_ITER(biop)); if(!bvecp || !bvecp->bv_page) return FALSE; if (!virt_addr_valid(bvecp) || !INM_VIRT_ADDR_VALID(INM_PAGE_TO_VIRT(bvecp->bv_page))) return FALSE; if (!bvecp->bv_page->mapping) return FALSE; if (!virt_addr_valid(bvecp->bv_page->mapping)) return FALSE; if (PageAnon(bvecp->bv_page)) return FALSE; mapping = bvecp->bv_page->mapping; if (unlikely(!driver_ctx->dc_root_disk) && virt_addr_valid(mapping->host) && virt_addr_valid(mapping->host->i_sb) && mapping->host->i_sb->s_dev == driver_ctx->root_dev && driver_state & DRV_LOADED_FULLY) get_root_disk(biop); #ifdef INM_RECUSIVE_ADSPC a_opsp = bvecp->bv_page->mapping; #else if (!bvecp->bv_page->mapping->a_ops) return FALSE; a_opsp = bvecp->bv_page->mapping->a_ops; #endif INM_DOWN_READ(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops(a_opsp, INM_DUP_ADDR_SPACE_OPS); INM_UP_READ(&driver_ctx->dc_inmaops_sem); if (t_inma_opsp) { #ifdef INM_RECUSIVE_ADSPC INM_BUG_ON(t_inma_opsp->ia_mapping != bvecp->bv_page->mapping); #endif dbg("Recursive write: Lookup = %p, Mapping = %p", t_inma_opsp, a_opsp); if (driver_ctx->dc_lcw_aops == t_inma_opsp) fstream_raw_map_bio(biop); /* If TrackRecursiveWrites is set, return FALSE */ return driver_ctx->tunable_params.enable_recio ? 
FALSE : TRUE; } return FALSE; } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) #define NR_MAX_ALLOCATIONS 1024 #define NR_MAX_FREED 128 void inm_free_bio_info(dm_bio_info_t *bio_info) { unsigned long lock_flag = 0; if (bio_info->dm_bio_flags & BINFO_ALLOCED_FROM_POOL) { INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); INM_ATOMIC_INC(&driver_ctx->dc_nr_bioinfo_alloced); inm_list_add_tail(&bio_info->entry, &driver_ctx->dc_bioinfo_list); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } else INM_KFREE(bio_info, sizeof(dm_bio_info_t), INM_KERNEL_HEAP); } void inm_free_bioinfo_pool(void) { dm_bio_info_t *info; while (!inm_list_empty(&driver_ctx->dc_bioinfo_list)) { info = inm_list_entry(driver_ctx->dc_bioinfo_list.next, dm_bio_info_t, entry); inm_list_del(&info->entry); INM_KFREE(info, sizeof(dm_bio_info_t), INM_KERNEL_HEAP); } } void inm_alloc_bioinfo_pool(void) { dm_bio_info_t *info; unsigned long lock_flag = 0; if (INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced) < NR_MAX_ALLOCATIONS) { info = INM_KMALLOC(sizeof(dm_bio_info_t), GFP_NOIO, INM_KERNEL_HEAP); if (!info) return; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); INM_ATOMIC_INC(&driver_ctx->dc_nr_bioinfo_alloced); inm_list_add_tail(&info->entry, &driver_ctx->dc_bioinfo_list); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } } void inm_free_chdnodes_pool(void) { change_node_t *chg_node; while (!inm_list_empty(&driver_ctx->dc_chdnodes_list)) { chg_node = inm_list_entry(driver_ctx->dc_chdnodes_list.next, change_node_t, next); inm_list_del(&chg_node->next); INM_KFREE(chg_node, sizeof(change_node_t), INM_KERNEL_HEAP); } } void inm_alloc_chdnodes_pool(void) { change_node_t *chg_node; unsigned long lock_flag = 0; if (INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced) < NR_MAX_ALLOCATIONS) { chg_node = INM_KMALLOC(sizeof(change_node_t), GFP_NOIO, INM_KERNEL_HEAP); if (!chg_node) return; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); INM_ATOMIC_INC(&driver_ctx->dc_nr_chdnodes_alloced); inm_list_add_tail(&chg_node->next, &driver_ctx->dc_chdnodes_list); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } } dm_bio_info_t *inm_alloc_bioinfo(void) { dm_bio_info_t *info = NULL; unsigned long lock_flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); if (inm_list_empty(&driver_ctx->dc_bioinfo_list)) { wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); INM_ATOMIC_INC(&driver_ctx->dc_nr_bioinfo_allocs_failed); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); goto out; } info = inm_list_entry(driver_ctx->dc_bioinfo_list.next, dm_bio_info_t, entry); inm_list_del(&info->entry); INM_ATOMIC_DEC(&driver_ctx->dc_nr_bioinfo_alloced); INM_ATOMIC_INC(&driver_ctx->dc_nr_bioinfo_alloced_from_pool); wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); out: return info; } change_node_t *inm_alloc_chgnode(void) { change_node_t *chg_node = NULL; unsigned long lock_flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); if (inm_list_empty(&driver_ctx->dc_chdnodes_list)) { wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); INM_ATOMIC_INC(&driver_ctx->dc_nr_chgnode_allocs_failed); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); goto out; } chg_node = inm_list_entry(driver_ctx->dc_chdnodes_list.next, change_node_t, next); inm_list_del(&chg_node->next); 
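
/*
 * The list handoff above happens under page_pool_lock. Overall this is
 * an atomic-allocation-with-fallback pattern: the I/O path first tries
 * a GFP_ATOMIC allocation and only draws from this pre-filled pool on
 * failure, so it never sleeps; the inmallocd thread refills the pool
 * from process context where blocking GFP_NOIO allocations are safe.
 * A hedged sketch of a caller (shape only, not a driver API contract):
 *
 *	node = INM_KMALLOC(sizeof(*node), GFP_ATOMIC | __GFP_NOWARN, heap);
 *	if (!node)
 *		node = inm_alloc_chgnode();
 */
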
INM_ATOMIC_DEC(&driver_ctx->dc_nr_chdnodes_alloced); INM_ATOMIC_INC(&driver_ctx->dc_nr_chgnodes_alloced_from_pool); wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); out: return chg_node; } void inm_alloc_pools(void) { unsigned long lock_flag = 0; int nr_bioinfos = 0; int nr_chgnodes = 0; int nr_metapages = 0; static int alloc_in_progress = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); if (alloc_in_progress) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); return; } alloc_in_progress = 1; if(INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced) < NR_MAX_ALLOCATIONS) nr_bioinfos = NR_MAX_ALLOCATIONS - INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced); if (INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced) < NR_MAX_ALLOCATIONS) nr_chgnodes = NR_MAX_ALLOCATIONS - INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced); if (INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced) < NR_MAX_ALLOCATIONS) nr_metapages = NR_MAX_ALLOCATIONS - INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); while (nr_bioinfos || nr_chgnodes || nr_metapages) { if (nr_bioinfos) { inm_alloc_bioinfo_pool(); nr_bioinfos--; } if (nr_chgnodes) { inm_alloc_chdnodes_pool(); nr_chgnodes--; } if (nr_metapages) { balance_page_pool(GFP_NOIO, 1); nr_metapages--; } } alloc_in_progress = 0; } void inm_balance_pools(void) { int nr_free = 0; unsigned long lock_flag = 0; dm_bio_info_t *info; change_node_t *chg_node; inm_page_t *pg; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->page_pool_lock, lock_flag); if (INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced) > NR_MAX_ALLOCATIONS) nr_free = INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced) - NR_MAX_ALLOCATIONS; if (nr_free > NR_MAX_FREED) nr_free = NR_MAX_FREED; while (nr_free) { info = inm_list_entry(driver_ctx->dc_bioinfo_list.next, dm_bio_info_t, entry); inm_list_del(&info->entry); INM_KFREE(info, sizeof(dm_bio_info_t), INM_KERNEL_HEAP); nr_free--; INM_ATOMIC_DEC(&driver_ctx->dc_nr_bioinfo_alloced); } nr_free = 0; if (INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced) > NR_MAX_ALLOCATIONS) nr_free = INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced) - NR_MAX_ALLOCATIONS; if (nr_free > NR_MAX_FREED) nr_free = NR_MAX_FREED; while (nr_free) { chg_node = inm_list_entry(driver_ctx->dc_chdnodes_list.next, change_node_t, next); inm_list_del(&chg_node->next); INM_KFREE(chg_node, sizeof(change_node_t), INM_KERNEL_HEAP); nr_free--; INM_ATOMIC_DEC(&driver_ctx->dc_nr_chdnodes_alloced); } nr_free = 0; if (INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced) > NR_MAX_ALLOCATIONS) nr_free = INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced) - NR_MAX_ALLOCATIONS; if (nr_free > NR_MAX_FREED) nr_free = NR_MAX_FREED; while (nr_free) { pg = inm_list_entry(driver_ctx->page_pool.next, inm_page_t, entry); inm_list_del(&pg->entry); INM_UNPIN(pg->cur_pg, INM_PAGESZ); INM_FREE_PAGE(pg->cur_pg, INM_KERNEL_HEAP); INM_UNPIN(pg, sizeof(inm_page_t)); INM_KFREE(pg, sizeof(inm_page_t), INM_KERNEL_HEAP); nr_free--; INM_ATOMIC_DEC(&driver_ctx->dc_nr_metapages_alloced); } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->page_pool_lock, lock_flag); } int inm_alloc_thread(void *args) { INM_COMPLETE(&driver_ctx->dc_alloc_thread_started); while (1) { inm_wait_event_interruptible_timeout(driver_ctx->dc_alloc_thread_waitq, ((INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced) < NR_MAX_ALLOCATIONS) || 
(INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced) < NR_MAX_ALLOCATIONS) || (INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced) < NR_MAX_ALLOCATIONS)), 60 * INM_HZ); if (INM_ATOMIC_READ(&driver_ctx->dc_alloc_thread_quit)) break; inm_alloc_pools(); INM_DELAY(INM_HZ/1000); inm_balance_pools(); } inm_free_bioinfo_pool(); inm_free_chdnodes_pool(); INM_COMPLETE(&driver_ctx->dc_alloc_thread_exit); info("inmallocd thread exited"); return 0; } int create_alloc_thread(void) { inm_pid_t pid; int err = 1; INM_INIT_COMPLETION(&driver_ctx->dc_alloc_thread_started); INM_INIT_WAITQUEUE_HEAD(&driver_ctx->dc_alloc_thread_waitq); INM_INIT_COMPLETION(&driver_ctx->dc_alloc_thread_exit); INM_ATOMIC_SET(&driver_ctx->dc_nr_bioinfo_allocs_failed, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_chgnode_allocs_failed, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_metapage_allocs_failed, 0); INM_ATOMIC_SET(&driver_ctx->dc_alloc_thread_quit, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_bioinfo_alloced, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_chdnodes_alloced, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_metapages_alloced, 256); INM_ATOMIC_SET(&driver_ctx->dc_nr_bioinfo_alloced_from_pool, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_chgnodes_alloced_from_pool, 0); INM_ATOMIC_SET(&driver_ctx->dc_nr_metapages_alloced_from_pool, 0); INM_INIT_LIST_HEAD(&driver_ctx->dc_bioinfo_list); INM_INIT_LIST_HEAD(&driver_ctx->dc_chdnodes_list); pid = INM_KERNEL_THREAD(driver_ctx->dc_alloc_thread_task, inm_alloc_thread, NULL, 0, "inmallocd"); if (pid >= 0) { err = 0; info("inmallocd thread with pid = %d has created", pid); INM_WAIT_FOR_COMPLETION(&driver_ctx->dc_alloc_thread_started); } return err; } void destroy_alloc_thread(void) { INM_ATOMIC_INC(&driver_ctx->dc_alloc_thread_quit); wake_up_interruptible(&driver_ctx->dc_alloc_thread_waitq); INM_WAIT_FOR_COMPLETION(&driver_ctx->dc_alloc_thread_exit); INM_KTHREAD_STOP(driver_ctx->dc_alloc_thread_task); } blk_status_t inm_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { req_queue_info_t *q_info = NULL; struct request *rq = bd->rq; struct request_queue *q = rq->q; unsigned long lock_flag = 0; queue_rq_fn *orig_queue_rq_fn = NULL; struct bio *bio; target_context_t *ctx; dm_bio_info_t *bio_info; inm_u32_t idx; sector_t end_sector; host_dev_ctx_t *hdcp; int is_resized = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); q_info = get_qinfo_from_kobj(&q->kobj); if(q_info){ orig_queue_rq_fn = q_info->orig_mq_ops->queue_rq; ctx = q_info->tc; get_tgt_ctxt(ctx); }else{ orig_queue_rq_fn = q->mq_ops->queue_rq; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); goto out_orig_queue_rq_fn; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); if (!rq->bio) goto out; bio = rq->bio; process_bio: /* Handling reads */ if (!inm_bio_is_write(bio) || !INM_BUF_COUNT(bio) || INM_IS_TEST_MIRROR(bio)) goto next_bio; /* Trim requests */ if (inm_bio_is_discard(bio) && !INM_DISCARD_ZEROES_DATA(q_info->q)) { dbg("Trim request: start sector = %llu, len = %d, vcnt = %d", (inm_u64_t)INM_BUF_SECTOR(bio), INM_BUF_COUNT(bio), bio->bi_vcnt); goto next_bio; } /* Recursive writes (IO generated by involflt specific thread) */ if (is_our_io(bio)) goto next_bio; if (ctx->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) { goto out; } if(is_target_filtering_disabled(ctx)) { dbg("mirror paused for scsi_id %s",ctx->tc_pname); goto out; } if (bio->bi_end_io == flt_end_io_fn) goto next_bio; hdcp = (host_dev_ctx_t *) ctx->tc_priv; end_sector = 
INM_BUF_SECTOR(bio) + ((INM_BUF_COUNT(bio) + 511) >> 9) - 1; if ((INM_BUF_SECTOR(bio) >= hdcp->hdc_start_sect) && (end_sector <= hdcp->hdc_end_sect)) goto get_reference; volume_lock(ctx); if (ctx->tc_flags & VCF_FULL_DEV) { /* hdc_actual_end_sect contains the latest size of disk * extracted from gendisk. if the IO is beyond the latest * size, we assume the disk is resized and mark for resync. */ if (end_sector > hdcp->hdc_actual_end_sect) { is_resized = 1; queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_INVALID_IO, -EINVAL); /* Update actual_end_sect to new size so no * further resyncs are queued */ hdcp->hdc_actual_end_sect = get_capacity(INM_BUF_DISK(bio)) - 1; err("%s: Resize: Expected: %llu, New: %llu", ctx->tc_guid, (inm_u64_t)hdcp->hdc_end_sect, (inm_u64_t)hdcp->hdc_actual_end_sect); } } else { if (((INM_BUF_SECTOR(bio) >= hdcp->hdc_start_sect) && (INM_BUF_SECTOR(bio) <= hdcp->hdc_end_sect) && (end_sector > hdcp->hdc_end_sect)) || /* Right Overlap */ ((INM_BUF_SECTOR(bio) < hdcp->hdc_start_sect) && (end_sector >= hdcp->hdc_start_sect) && (end_sector <= hdcp->hdc_end_sect)) ||/* left Overlap */ ((INM_BUF_SECTOR(bio) < hdcp->hdc_start_sect) && (end_sector > hdcp->hdc_end_sect)) || /* Super Set */ ((INM_BUF_SECTOR(bio) > hdcp->hdc_end_sect) && (INM_BUF_SECTOR(bio) <= hdcp->hdc_actual_end_sect))) { is_resized = 1; err("Unable to handle the spanning I/O across multiple " "partitions/volumes"); queue_worker_routine_for_set_volume_out_of_sync(ctx, ERROR_TO_REG_INVALID_IO, -EINVAL); } } volume_unlock(ctx); if (is_resized) goto out; get_reference: get_tgt_ctxt(ctx); if (inm_bio_is_discard(bio) && bio->bi_vcnt) { err("Discard bio with data is seen"); inm_handle_bad_bio(ctx, bio); put_tgt_ctxt(ctx); goto next_bio; } flt_save_bio_info(ctx, &bio_info, bio); if (!bio_info) { put_tgt_ctxt(ctx); goto next_bio; } idx = inm_comp_io_bkt_idx(INM_BUF_COUNT(bio)); INM_ATOMIC_INC(&ctx->tc_stats.io_pat_writes[idx]); INM_ATOMIC_INC(&ctx->tc_nr_in_flight_ios); if (!driver_ctx->tunable_params.enable_chained_io && INM_IS_CHAINED_BIO(bio)) { telemetry_set_exception(ctx->tc_guid, ecUnsupportedBIO, INM_BIO_RW_FLAGS(bio)); inm_capture_in_metadata(ctx, bio, bio_info); bio->bi_end_io = bio_info->bi_end_io; bio->bi_private = bio_info->bi_private; if (bio_info->orig_bio_copy) { INM_KFREE(bio_info->orig_bio_copy, sizeof(struct bio), INM_KERNEL_HEAP); } INM_DESTROY_SPIN_LOCK(&bio_info->bio_info_lock); inm_free_bio_info(bio_info); put_tgt_ctxt(ctx); INM_ATOMIC_DEC(&ctx->tc_nr_in_flight_ios); goto next_bio; } next_bio: if (bio == rq->biotail) goto out; bio = bio->bi_next; goto process_bio; out: put_tgt_ctxt(ctx); out_orig_queue_rq_fn: return orig_queue_rq_fn(hctx, bd); } #else int create_alloc_thread(void) { return 0; } void destroy_alloc_thread(void) { return; } #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0) blk_qc_t #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) void #else int #endif flt_make_request_fn(struct request_queue *q, struct bio *bio) { req_queue_info_t *q_info = NULL; target_context_t *ctx; dm_bio_info_t *bio_info; make_request_fn *orig_make_request_fn = NULL; unsigned long lock_flag = 0; mirror_vol_entry_t *vol_entry = NULL; struct request_queue *atbio_q = NULL; inm_mirror_atbuf *atbuf_wrap = NULL; inm_mirror_bufinfo_t *imbinfop = NULL; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); q_info = get_qinfo_from_kobj(&INM_BUF_DISK(bio)->queue->kobj); if(q_info){ INM_BUG_ON(!q_info->orig_make_req_fn); orig_make_request_fn = q_info->orig_make_req_fn; }else{ #if 
LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0) || defined SLES12SP4 || \ defined SLES12SP5 || defined SLES15 struct request_queue *q = bio->bi_disk->queue; #else struct request_queue *q = bdev_get_queue(bio->bi_bdev); #endif orig_make_request_fn = q->make_request_fn; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0) return orig_make_request_fn(q, bio); #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) orig_make_request_fn(q, bio); return; #else return orig_make_request_fn(q, bio); #endif } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_host_info.rq_list_lock, lock_flag); /* Handling reads */ if (!inm_bio_is_write(bio) || !INM_BUF_COUNT(bio) || INM_IS_TEST_MIRROR(bio)) goto map_and_exit; /* Trim requests */ if (inm_bio_is_discard(bio) && !INM_DISCARD_ZEROES_DATA(q_info->q)) { dbg("Trim request: start sector = %llu, len = %d, vcnt = %d", (inm_u64_t)INM_BUF_SECTOR(bio), INM_BUF_COUNT(bio), bio->bi_vcnt); goto map_and_exit; } /* Recursive writes (IO generated by involflt specific thread) */ if(is_our_io(bio)) goto map_and_exit; INM_DOWN_READ(&driver_ctx->tgt_list_sem); ctx = get_tgt_ctxt_from_bio(bio); if (ctx) { inm_u32_t idx = 0; if(is_target_filtering_disabled(ctx) || (ctx->tc_dev_type == FILTER_DEV_MIRROR_SETUP && is_target_mirror_paused(ctx))) { INM_UP_READ(&driver_ctx->tgt_list_sem); dbg("mirror paused for scsi_id %s", ctx->tc_pname); goto map_and_exit; } if(is_target_read_only(ctx)) { INM_UP_READ(&driver_ctx->tgt_list_sem); #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0) #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0) || defined SLES12SP4 || \ defined SLES12SP5 || defined SLES15 bio->bi_status = BLK_STS_IOERR; #else bio->bi_error = -EIO; #endif bio_endio(bio); return BLK_QC_T_NONE; #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) bio_endio(bio, -EIO); return; #else return -EIO; #endif } get_tgt_ctxt(ctx); INM_UP_READ(&driver_ctx->tgt_list_sem); if (inm_bio_is_discard(bio) && bio->bi_vcnt) { err("Discard bio with data is seen"); inm_handle_bad_bio(ctx, bio); put_tgt_ctxt(ctx); goto map_and_exit; } switch(ctx->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: flt_save_bio_info(ctx, &bio_info, bio); balance_page_pool(INM_KM_NOIO, 0); idx = inm_comp_io_bkt_idx(INM_BUF_COUNT(bio)); INM_ATOMIC_INC(&ctx->tc_stats.io_pat_writes[idx]); INM_ATOMIC_INC(&ctx->tc_nr_in_flight_ios); if (!driver_ctx->tunable_params.enable_chained_io && INM_IS_CHAINED_BIO(bio)) { host_dev_ctx_t *hdcp = ctx->tc_priv; telemetry_set_exception(ctx->tc_guid, ecUnsupportedBIO, INM_BIO_RW_FLAGS(bio)); inm_capture_in_metadata(ctx, bio, bio_info); bio->bi_end_io = bio_info->bi_end_io; bio->bi_private = bio_info->bi_private; if (bio_info->orig_bio_copy) { INM_KFREE(bio_info->orig_bio_copy, sizeof(struct bio), INM_KERNEL_HEAP); } INM_DESTROY_SPIN_LOCK(&bio_info->bio_info_lock); INM_MEMPOOL_FREE(bio_info, hdcp->hdc_bio_info_pool); put_tgt_ctxt(ctx); INM_ATOMIC_DEC(&ctx->tc_nr_in_flight_ios); goto map_and_exit; } break; case FILTER_DEV_MIRROR_SETUP: volume_lock(ctx); vol_entry = get_cur_vol_entry(ctx, INM_BUF_COUNT(bio)); volume_unlock(ctx); if (inm_save_mirror_bufinfo(ctx, &imbinfop, &bio, vol_entry)) { INM_DEREF_VOL_ENTRY(vol_entry, ctx); } else { INM_BUG_ON(!imbinfop); atbuf_wrap = inm_list_entry(imbinfop->imb_atbuf_list.next, inm_mirror_atbuf, imb_atbuf_this); atbio_q = bdev_get_queue(vol_entry->mirror_dev); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG_MIRROR_IO))) { info("Intial mirror config - atio bi_sector:%llu bi_size:%d rw:%d" "bi_rw:%d 
atbuf:%p", (inm_u64_t)(INM_BUF_SECTOR(&(atbuf_wrap->imb_atbuf_buf)) * 512), INM_BUF_COUNT(&(atbuf_wrap->imb_atbuf_buf)), (int)(inm_bio_is_write(&atbuf_wrap->imb_atbuf_buf)), (int)(inm_bio_rw(&atbuf_wrap->imb_atbuf_buf)), &(atbuf_wrap->imb_atbuf_buf)); } atbio_q->make_request_fn(atbio_q, &(atbuf_wrap->imb_atbuf_buf)); put_tgt_ctxt(ctx); } break; case FILTER_DEV_FABRIC_LUN: INM_BUG_ON(1); default: break; } } else { INM_UP_READ(&driver_ctx->tgt_list_sem); } map_and_exit: #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0) return orig_make_request_fn(q, bio); #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) orig_make_request_fn(q, bio); #else return orig_make_request_fn(q, bio); #endif } #endif static_inline void flt_disk_removal(struct kobject *kobj) { target_context_t *ctx = NULL; char *save_guid = NULL; if(!get_path_memory(&save_guid)) { err("Failed to get memory while removing the disk"); } dbg("entered"); if (driver_ctx->sys_shutdown) { dbg("disk removal ignored\n"); if (save_guid) free_path_memory(&save_guid); return; } down_read(&(driver_ctx->tgt_list_sem)); ctx = get_tgt_ctxt_from_kobj(kobj); if (ctx == NULL){ up_read(&(driver_ctx->tgt_list_sem)); return; } volume_lock(ctx); ctx->tc_flags |= VCF_VOLUME_DELETING; ctx->tc_filtering_disable_required = 0; close_disk_cx_session(ctx, CX_CLOSE_DISK_REMOVAL); set_tag_drain_notify_status(ctx, TAG_STATUS_DROPPED, DEVICE_STATUS_REMOVED); volume_unlock(ctx); if (driver_ctx->dc_root_disk == ctx) driver_ctx->dc_root_disk = NULL; up_read(&(driver_ctx->tgt_list_sem)); if (save_guid) { strncpy_s(save_guid, INM_PATH_MAX, ctx->tc_pname, PATH_MAX); } if (ctx->tc_bp->volume_bitmap) { wait_for_all_writes_to_complete(ctx->tc_bp->volume_bitmap); flush_and_close_bitmap_file(ctx); } tgt_ctx_force_soft_remove(ctx); put_tgt_ctxt(ctx); /* When volume goes offline and comes back up online, it should * be able to restart the filtering. 
To make sure that happens, * set VolumeFilteringDisabled to FALSE */ if (save_guid) { inm_write_guid_attr(save_guid, VolumeFilteringDisabled, 0); free_path_memory(&save_guid); } dbg("leaving"); } void flt_disk_obj_rel(struct kobject *kobj) { struct gendisk *disk = NULL; struct kobject *qkobj = NULL; struct device *dev = NULL; dbg("Disk Removal"); #if (defined(RHEL_MAJOR) && (RHEL_MAJOR == 5)) if (!kobj->parent || !kobj->parent->name) goto out; if (strcmp(kobj->parent->name, "block")) goto out; disk = kobj_to_disk(kobj); #else dev = kobj_to_dev(kobj); if (!dev->type || !dev->type->name) goto out; if (strcmp(dev->type->name, "disk")) goto out; disk = dev_to_disk(dev); #endif if (!disk->queue) goto out; qkobj = &disk->queue->kobj; dbg("%s: queue = %p, kobj = %p, qkobj = %p", disk->disk_name, disk->queue, kobj, qkobj); if (!get_qinfo_from_kobj(qkobj)) { info("%s: not protected", disk->disk_name); } else { info("%s: protected", disk->disk_name); flt_disk_removal(kobj); } out: INM_BUG_ON(NULL == flt_disk_release_fn); flt_disk_release_fn(kobj); } void flt_part_obj_rel(struct kobject *kobj) { dbg("Partition Removal:"); flt_disk_removal(kobj); INM_BUG_ON(NULL == flt_part_release_fn); flt_part_release_fn(kobj); } void flt_queue_obj_rel(struct kobject *kobj) { req_queue_info_t *req_q = NULL; req_q = get_qinfo_from_kobj(kobj); INM_BUG_ON(NULL == req_q); dbg("Calling Original queue release function"); if(req_q->orig_kobj_type->release) req_q->orig_kobj_type->release(kobj); } #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) static void completion_check_endio(struct bio *bio) #elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) static void completion_check_endio(struct bio *bio, inm_s32_t error) #else static inm_s32_t completion_check_endio(struct bio *bio, inm_u32_t done, inm_s32_t error) #endif { completion_chk_req_t *req; target_context_t *ctx; req = bio->bi_private; ctx = req->ctx; if(in_irq()) { dbg("Completion called in Interrupt context for %s", ctx->tc_guid); ctx->tc_lock_fn = volume_lock_irqsave; ctx->tc_unlock_fn = volume_unlock_irqrestore; } else if(in_softirq()){ dbg("Completion called in Soft Interrupt context for %s", ctx->tc_guid); ctx->tc_lock_fn = volume_lock_bh; ctx->tc_unlock_fn = volume_unlock_bh; } __free_page(bio->bi_io_vec[0].bv_page); bio->bi_io_vec[0].bv_page = NULL; bio_put(bio); INM_COMPLETE(&req->comp); #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,24) return 0; #endif } void check_completion_context(target_context_t *ctx, inm_block_device_t *bdev) { completion_chk_req_t req; struct bio *bio = NULL; req.ctx = ctx; INM_INIT_COMPLETION(&req.comp); bio = bio_alloc(INM_KM_SLEEP, 1); if(bio) { #if ((LINUX_VERSION_CODE < KERNEL_VERSION(5,12,0) && \ LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)) || \ defined SLES12SP4 || defined SLES12SP5 || defined SLES15) && \ (!defined SLES15SP4) bio->bi_disk = bdev->bd_disk; #else bio->bi_bdev = bdev; #endif bio->bi_io_vec[0].bv_page = INM_ALLOC_PAGE(INM_KM_SLEEP); if(!bio->bi_io_vec[0].bv_page) goto error_out; bio->bi_io_vec[0].bv_len = PAGE_SIZE; bio->bi_io_vec[0].bv_offset = 0; bio->bi_vcnt = 1; INM_BUF_ITER(bio) = INM_BVEC_ITER_INIT(); INM_BUF_COUNT(bio) = PAGE_SIZE; INM_BUF_SECTOR(bio) = 0; bio->bi_end_io = completion_check_endio; bio->bi_private = &req; #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,8,0) || defined SLES12SP3 bio->bi_opf = REQ_OP_READ; submit_bio(bio); #else submit_bio(READ, bio); #endif INM_WAIT_FOR_COMPLETION(&req.comp); } return; error_out: if(bio) { bio_put(bio); bio = NULL; } } target_context_t * flt_gendisk_to_tgt_ctxt(struct 
gendisk *disk) { target_context_t *tgt_ctx = NULL; struct kobject *kobj = NULL; #if defined(RHEL_MAJOR) && (RHEL_MAJOR == 5) kobj = &(disk->kobj); #else kobj = &((disk_to_dev(disk))->kobj); #endif INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); tgt_ctx = get_tgt_ctxt_from_kobj(kobj); INM_UP_READ(&(driver_ctx->tgt_list_sem)); return tgt_ctx; } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,13,0) int flt_revalidate_disk(struct gendisk *disk) { target_context_t *tgt_ctx = NULL; host_dev_ctx_t *hdcp = NULL; host_dev_t *hdc_dev = NULL; sector_t nr_sect = 0; int (*org_revalidate_disk) (struct gendisk *) = NULL; int error = 0; err("Revalidating disk"); tgt_ctx = flt_gendisk_to_tgt_ctxt(disk); if (!tgt_ctx) { INM_BUG_ON(!tgt_ctx); return -1; } hdcp = tgt_ctx->tc_priv; hdc_dev = inm_list_entry((hdcp->hdc_dev_list_head.next), host_dev_t, hdc_dev_list); /* * Call the original revalidate_disk() and * then check for changes in disk properties. * * unregister_disk_notification() can happen * on another CPU so check should be protected * by volume lock. However revalidation is IO bound * and cannot be called under volume spinlock * so assign it to temp func ptr and call the func * ptr outside the lock. */ volume_lock(tgt_ctx); if (hdc_dev->hdc_fops && hdc_dev->hdc_fops->revalidate_disk) org_revalidate_disk = hdc_dev->hdc_fops->revalidate_disk; volume_unlock(tgt_ctx); if (org_revalidate_disk) error = org_revalidate_disk(disk); nr_sect = get_capacity(disk); volume_lock(tgt_ctx); /* * get_capacity() returns no. of 512 byte sector * irrespective of physical sector size */ if (nr_sect != (hdcp->hdc_volume_size >> 9)) { err("%s capacity change ... marking for resync", tgt_ctx->tc_guid); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctx, ERROR_TO_REG_INVALID_IO, -EINVAL); /* update actual end sector so writes do not trigger another resync */ hdcp->hdc_actual_end_sect = nr_sect - 1; } volume_unlock(tgt_ctx); put_tgt_ctxt(tgt_ctx); /* return error code from original revalidate_disk() */ return error; } #endif void unregister_disk_change_notification(target_context_t *ctx, host_dev_t *hdc_dev) { struct gendisk *disk = hdc_dev->hdc_disk_ptr; const struct block_device_operations *flt_fops; if (!hdc_dev->hdc_fops || !disk || !disk->fops) { dbg("Unregister disk notification not required"); return; } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,13,0) if (disk->fops->revalidate_disk != flt_revalidate_disk) { INM_BUG_ON(disk->fops->revalidate_disk != flt_revalidate_disk); return; } #endif dbg("Unregistering for disk change notification"); flt_fops = disk->fops; volume_lock(ctx); disk->fops = hdc_dev->hdc_fops; hdc_dev->hdc_fops = NULL; volume_unlock(ctx); INM_KFREE(flt_fops, sizeof(*flt_fops), INM_KERNEL_HEAP); } void register_disk_change_notification(target_context_t *ctx, host_dev_t *hdc_dev) { struct gendisk *disk = hdc_dev->hdc_disk_ptr; struct block_device_operations *flt_fops = NULL; host_dev_ctx_t *hdcp = NULL; #if LINUX_VERSION_CODE < KERNEL_VERSION(5,13,0) if (disk->fops->revalidate_disk == flt_revalidate_disk) { INM_BUG_ON(disk->fops->revalidate_disk == flt_revalidate_disk); return; } #endif /* * If the allocation fails, we fall back to beyond range * check in make_request_fn to determine disk resize. 
*/ flt_fops = INM_KMALLOC(sizeof(*flt_fops), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!flt_fops) { err("Failed to allocate memory for registration"); return; } if (memcpy_s(flt_fops, sizeof(*flt_fops), disk->fops, sizeof(*(disk->fops)))) { INM_KFREE(flt_fops, sizeof(*flt_fops), INM_KERNEL_HEAP); return; } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,13,0) flt_fops->revalidate_disk = flt_revalidate_disk; #endif dbg("Registering for disk change notification"); volume_lock(ctx); hdc_dev->hdc_fops = disk->fops; disk->fops = flt_fops; volume_unlock(ctx); /* Check if the expected and actual sizes match */ hdcp = ctx->tc_priv; if (hdcp->hdc_end_sect != hdcp->hdc_actual_end_sect) { err("%s: Resize: Expected: %llu, New: %llu", ctx->tc_guid, (inm_u64_t)hdcp->hdc_end_sect, (inm_u64_t)hdcp->hdc_actual_end_sect); set_volume_out_of_sync(ctx, ERROR_TO_REG_INVALID_IO, -EBADF); } } int stack_host_dev(target_context_t *ctx, inm_dev_extinfo_t *dinfo) { inm_s32_t ret = 0; req_queue_info_t *q_info; inm_s32_t found = 0; inm_s32_t last_char_pos = 0; inm_block_device_t *bdev; host_dev_ctx_t *hdcp; target_context_t *tgt_ctx; struct inm_list_head *ptr1, *ptr2; host_dev_t *hdc_dev = NULL; mirror_vol_entry_t *vol_entry = NULL; hdcp = (host_dev_ctx_t *)ctx->tc_priv; bdev = open_by_dev_path(dinfo->d_guid, 0); /* open by device path */ if (!bdev) { dbg("Failed to convert dev path (%s) to bdev", dinfo->d_guid); ret = -ENODEV; return ret; } hdcp->hdc_bio_info_pool = INM_MEMPOOL_CREATE(BIO_INFO_MPOOL_SIZE, INM_MEMPOOL_ALLOC_SLAB, INM_MEMPOOL_FREE_SLAB, driver_ctx->dc_host_info.bio_info_cache); if (!hdcp->hdc_bio_info_pool) { err("INM_MEMPOOL_CREATE failed"); close_bdev(bdev, FMODE_READ); ret = -ENOMEM; return ret; } /* Check the completion context and reset the pointers accordingly. * if this function fails, we have the default safer ones. 
*/ INM_DOWN_READ(&driver_ctx->tgt_list_sem); retry: __inm_list_for_each(ptr1, &driver_ctx->tgt_list) { tgt_ctx = inm_list_entry(ptr1, target_context_t, tc_list); if(tgt_ctx == ctx) break; if (tgt_ctx->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctx->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { host_dev_ctx_t *hdcp_ptr = (host_dev_ctx_t *) tgt_ctx->tc_priv; hdc_dev = inm_list_entry(hdcp_ptr->hdc_dev_list_head.next, host_dev_t, hdc_dev_list); INM_BUG_ON(!hdc_dev); if (hdc_dev->hdc_disk_ptr == bdev->bd_disk) { if (tgt_ctx->tc_flags & VCF_VOLUME_CREATING) { if (check_for_tc_state(tgt_ctx, 0)) { tgt_ctx = NULL; goto retry; } } ctx->tc_lock_fn = tgt_ctx->tc_lock_fn; ctx->tc_unlock_fn = tgt_ctx->tc_unlock_fn; found = 1; break; } } } INM_UP_READ(&driver_ctx->tgt_list_sem); switch (dinfo->d_type) { case FILTER_DEV_HOST_VOLUME: q_info = alloc_and_init_qinfo(bdev, ctx); if (!q_info) { ret = -EINVAL; err("Failed to allocate and initialize q_info"); break; } hdc_dev = NULL; __inm_list_for_each(ptr2, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(ptr2, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == bdev->bd_inode->i_rdev) break; hdc_dev = NULL; } if (hdc_dev) { hdc_dev->hdc_req_q_ptr = q_info; INM_ATOMIC_INC(&q_info->vol_users); init_tc_kobj(bdev, &hdc_dev->hdc_disk_kobj_ptr); } else ret = -EINVAL; break; case FILTER_DEV_MIRROR_SETUP: __inm_list_for_each(ptr1, dinfo->src_list) { vol_entry = inm_list_entry(ptr1, mirror_vol_entry_t, next); __inm_list_for_each(ptr2, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(ptr2, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_dev == vol_entry->mirror_dev->bd_inode->i_rdev) break; hdc_dev = NULL; } if (hdc_dev) { q_info = alloc_and_init_qinfo(vol_entry->mirror_dev, ctx); if (!q_info) { ret = -EINVAL; err("Failed to allocate and initialize q_info during mirror setup"); break; } hdc_dev->hdc_req_q_ptr = q_info; INM_ATOMIC_INC(&q_info->vol_users); } else { err("Failed to find the hdcp device entry for volume:%s\n", vol_entry->tc_mirror_guid); ret = -EINVAL; break; } init_tc_kobj(vol_entry->mirror_dev, &hdc_dev->hdc_disk_kobj_ptr); } break; case FILTER_DEV_FABRIC_LUN: INM_BUG_ON(1); break; default: err("Invalid filtering device type\n"); INM_BUG_ON(1); ret = -EINVAL; } close_bdev(bdev, FMODE_READ); if (ret) { inm_rel_dev_resources(ctx, hdcp); return ret; } /* If the volume is marked for stop filtering, then use the size * provided by the user-space. Otherwise, use the persistent store * size */ if ((ctx->tc_flags & VCF_FILTERING_STOPPED) || (ctx->tc_flags & VCF_VOLUME_STACKED_PARTIALLY)){ host_dev_ctx_t *hdcp = ctx->tc_priv; hdcp->hdc_bsize = dinfo->d_bsize; hdcp->hdc_nblocks = dinfo->d_nblks; set_int_vol_attr(ctx, VolumeBsize, hdcp->hdc_bsize); set_unsignedlonglong_vol_attr(ctx, VolumeNblks, hdcp->hdc_nblocks); } /* get volume size */ hdcp->hdc_volume_size = hdcp->hdc_bsize * hdcp->hdc_nblocks; hdcp->hdc_end_sect = hdcp->hdc_start_sect + (hdcp->hdc_volume_size >> 9) - 1; volume_lock(ctx); /* set full disk flag */ last_char_pos = strlen(tgt_ctx->tc_guid)-1; /* last char */ if ( ! 
((tgt_ctx->tc_guid[last_char_pos] >= '0') &&
		(tgt_ctx->tc_guid[last_char_pos] <= '9')) ) {
		ctx->tc_flags |= VCF_FULL_DEV;
	}
	volume_unlock(ctx);

	if (ctx->tc_flags & VCF_FULL_DEV) {
		/*
		 * Register for disk size change notification
		 */
		register_disk_change_notification(ctx, hdc_dev);
	}

	return 0;
}

inm_s32_t get_root_info(void)
{
	struct file *f = NULL;

	f = filp_open("/", O_RDONLY, 0400);
	if (IS_ERR(f)) {
		err("Can't open /");
		return -1;
	}

#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0)
	driver_ctx->root_dev = INM_HDL_TO_INODE(f)->i_sb->s_dev;
#else
	driver_ctx->root_dev = f->f_dentry->d_sb->s_dev;
#endif

	filp_close(f, current->files);

	dbg("Root dev_t = %u,%u", MAJOR(driver_ctx->root_dev),
		MINOR(driver_ctx->root_dev));

	return 0;
}

inm_s32_t get_boot_dev_t(inm_dev_t *dev)
{
	struct file *f = NULL;

	f = filp_open("/boot", O_RDONLY, 0400);
	if (IS_ERR(f)) {
		err("Can't open /boot");
		return -1;
	}

#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0)
	*dev = INM_HDL_TO_INODE(f)->i_sb->s_dev;
#else
	*dev = f->f_dentry->d_sb->s_dev;
#endif

	filp_close(f, current->files);

	return 0;
}

/*
 * Indicates if disk is root disk
 * Matches the gendisk structure of the passed disk against that of the
 * root/boot partition
 */
inm_s32_t isrootdisk(target_context_t *vcptr)
{
	inm_block_device_t *rbdev = NULL;
	host_dev_ctx_t *hdcp = NULL;
	host_dev_t *hdc_dev = NULL;
	struct inm_list_head *hptr = NULL;
	inm_s32_t isroot = 0;
	inm_dev_t boot_dev = 0;

	do {
		if (vcptr->tc_dev_type != FILTER_DEV_HOST_VOLUME &&
			vcptr->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
			break;

		if (!driver_ctx->root_dev)
			break;

		rbdev = inm_open_by_devnum(driver_ctx->root_dev, FMODE_READ);
		if (IS_ERR(rbdev))
			break;

		dbg("Root gendisk name %p = %s", rbdev->bd_disk,
			rbdev->bd_disk->disk_name);

		hdcp = (host_dev_ctx_t *)vcptr->tc_priv;
		if (hdcp) {
			__inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) {
				hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list);
				dbg("Disk ptr = %p = %s", hdc_dev->hdc_disk_ptr,
					hdc_dev->hdc_disk_ptr->disk_name);
				if (hdc_dev->hdc_disk_ptr == rbdev->bd_disk) {
					dbg("Root Dev = %s", vcptr->tc_guid);
					isroot = 1;
					break;
				}
				hdc_dev = NULL;
			}
		}

		close_bdev(rbdev, FMODE_READ);

		if (isroot)
			break;

		if (get_boot_dev_t(&boot_dev))
			break;

		rbdev = inm_open_by_devnum(boot_dev, FMODE_READ);
		if (IS_ERR(rbdev))
			break;

		dbg("Boot gendisk name %p = %s", rbdev->bd_disk,
			rbdev->bd_disk->disk_name);

		hdcp = (host_dev_ctx_t *)vcptr->tc_priv;
		if (hdcp) {
			__inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) {
				hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list);
				if (hdc_dev->hdc_disk_ptr == rbdev->bd_disk) {
					dbg("Boot Dev = %s", vcptr->tc_guid);
					isroot = 1;
					break;
				}
				hdc_dev = NULL;
			}
		}

		close_bdev(rbdev, FMODE_READ);
	} while(0);

	return isroot;
}

/*
 * Indicates if volume/partition is root volume
 * Matches dev_t of passed volume against root device
 */
inm_s32_t isrootvol(target_context_t *vcptr)
{
	inm_dev_t vdev = 0;
	inm_s32_t isroot = 0;

	do {
		if (vcptr->tc_dev_type != FILTER_DEV_HOST_VOLUME &&
			vcptr->tc_dev_type != FILTER_DEV_MIRROR_SETUP)
			break;

		if (!driver_ctx->root_dev)
			break;

		if (!convert_path_to_dev(vcptr->tc_guid, &vdev)) {
			if (vdev == driver_ctx->root_dev) {
				dbg("Root Dev = %s", vcptr->tc_guid);
				isroot = 1;
			}
		}
	} while(0);

	return isroot;
}

/* Character interface functions exported by involflt DM target module. */
inm_s32_t flt_open(struct inode *inode, struct file *filp)
{
	/* Nothing much to do here.
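	 * private_data starts out NULL and is populated elsewhere: per the
	 * flt_flush comment below it is set non-NULL for the fd that issues
	 * PROCESS_START_NOTIFY, and flt_mmap expects it to carry the change
	 * node being mapped.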
*/ filp->private_data = NULL; return 0; } #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35) inm_s32_t flt_ioctl(struct inode *inode, struct file *filp, inm_u32_t cmd, unsigned long arg) #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) long flt_ioctl(struct file *filp, inm_u32_t cmd, unsigned long arg) #else inm_s32_t flt_ioctl(struct file *filp, inm_u32_t cmd, unsigned long arg) #endif #endif { inm_s32_t error = 0; /* if driver is getting unloaded then fail the IOCTLs */ if (inm_mod_state & INM_ALLOW_UNLOAD) { err("Driver is being unloaded now"); return INM_EINVAL; } switch(cmd) { case IOCTL_INMAGE_VOLUME_STACKING: error = process_volume_stacking_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_MIRROR_VOLUME_STACKING: error = process_mirror_volume_stacking_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_PROCESS_START_NOTIFY: error = process_start_notify_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_SERVICE_SHUTDOWN_NOTIFY: error = process_shutdown_notify_ioctl(filp, (void __user *) arg); break; case IOCTL_INMAGE_STOP_FILTERING_DEVICE: error = process_stop_filtering_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_START_FILTERING_DEVICE_V2: error = process_start_filtering_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_FREEZE_VOLUME: error = process_freeze_volume_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_THAW_VOLUME: error = process_thaw_volume_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_TAG_VOLUME_V2: error = process_tag_volume_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_TAG_COMMIT_V2: error = process_commit_revert_tag_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_CREATE_BARRIER_ALL: error = process_create_iobarrier_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_REMOVE_BARRIER_ALL: error = process_remove_iobarrier_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_IOBARRIER_TAG_VOLUME: error = process_iobarrier_tag_volume_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_START_MIRRORING_DEVICE: error = process_start_mirroring_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_STOP_MIRRORING_DEVICE: error = process_stop_mirroring_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_GET_DIRTY_BLOCKS_TRANS_V2: error = process_get_db_ioctl(filp, (void __user *) arg); break; case IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS: error = process_commit_db_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_GET_NANOSECOND_TIME: error = process_get_time_ioctl((void __user *)arg); break; case IOCTL_INMAGE_CLEAR_DIFFERENTIALS: error = process_clear_diffs_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_GET_VOLUME_FLAGS: error = process_get_volume_flags_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_SET_VOLUME_FLAGS: error = process_set_volume_flags_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_WAIT_FOR_DB: error = process_wait_for_db_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_WAIT_FOR_DB_V2: error = process_wait_for_db_v2_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_UNSTACK_ALL: do_unstack_all(); break; case IOCTL_INMAGE_SYS_SHUTDOWN: error = process_sys_shutdown_notify_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_SYS_PRE_SHUTDOWN: error = process_sys_pre_shutdown_notify_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_TAG_VOLUME: error = process_tag_ioctl(filp, (void __user *)arg, ASYNC_TAG); break; case IOCTL_INMAGE_SYNC_TAG_VOLUME: error = process_tag_ioctl(filp, (void 
__user *)arg, SYNC_TAG); break; case IOCTL_INMAGE_GET_TAG_VOLUME_STATUS: error = process_get_tag_status_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_WAKEUP_ALL_THREADS: error = process_wake_all_threads_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_GET_DB_NOTIFY_THRESHOLD: error = process_get_db_threshold(filp,(void __user *)arg); break; case IOCTL_INMAGE_RESYNC_START_NOTIFICATION: error = process_resync_start_ioctl(filp,(void __user *)arg); break; case IOCTL_INMAGE_RESYNC_END_NOTIFICATION: error = process_resync_end_ioctl(filp,(void __user *)arg); break; case IOCTL_INMAGE_GET_DRIVER_VERSION: error = process_get_driver_version_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_SHELL_LOG: error = process_shell_log_ioctl(filp, (void __user *) arg); break; case IOCTL_INMAGE_AT_LUN_CREATE: error = process_at_lun_create(filp, (void __user *)arg); break; case IOCTL_INMAGE_AT_LUN_DELETE: error = process_at_lun_delete(filp, (void __user *)arg); break; case IOCTL_INMAGE_AT_LUN_LAST_WRITE_VI: error = process_at_lun_last_write_vi(filp, (void __user *) arg); break; case IOCTL_INMAGE_AT_LUN_LAST_HOST_IO_TIMESTAMP: error = process_at_lun_last_host_io_timestamp(filp, (void __user *) arg); break; case IOCTL_INMAGE_AT_LUN_QUERY: error = process_at_lun_query(filp, (void __user*) arg); break; case IOCTL_INMAGE_GET_GLOBAL_STATS: error = process_get_global_stats_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_GLOBAL_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_VOLUME_STATS: error = process_get_volume_stats_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_VOLUME_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_VOLUME_STATS_V2: error = process_get_volume_stats_v2_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_VOLUME_STATS_V2 ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_MONITORING_STATS: error = process_get_monitoring_stats_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_MONITORING_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_PROTECTED_VOLUME_LIST: error = process_get_protected_volume_list_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_PROTECTED_VOLUME_LIST ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_SET_ATTR: error = process_get_set_attr_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_SET_ATTR ioctl err = %d\n", error); break; case IOCTL_INMAGE_BOOTTIME_STACKING: error = process_boottime_stacking_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_BOOTTIME_STACKING ioctl err = %d\n", error); break; case IOCTL_INMAGE_VOLUME_UNSTACKING: error = process_volume_unstacking_ioctl(filp, (void __user *)arg); break; case IOCTL_INMAGE_MIRROR_EXCEPTION_NOTIFY: error = process_mirror_exception_notify_ioctl(filp, (void *)arg); dbg("IOCTL_INMAGE_MIRROR_EXCEPTION_NOTIFY ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_ADDITIONAL_VOLUME_STATS: error = process_get_additional_volume_stats(filp, (void __user *) arg); dbg("IOCTL_INMAGE_GET_ADDITIONAL_VOLUME_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_VOLUME_LATENCY_STATS: error = process_get_volume_latency_stats(filp, (void __user *) arg); dbg("IOCTL_INMAGE_GET_VOLUME_LATENCY_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_VOLUME_BMAP_STATS: error = process_bitmap_stats_ioctl(filp, (void __user *)arg); dbg("IOCTL_INMAGE_GET_VOLUME_BMAP_STATS ioctl err = %d\n", error); break; case IOCTL_INMAGE_SET_INVOLFLT_VERBOSITY: error = process_set_involflt_verbosity(filp, (void __user *)arg); 
dbg("IOCTL_INMAGE_SET_INVOLFLT_VERBOSITY ioctl err = %d\n", error); break; case IOCTL_INMAGE_MIRROR_TEST_HEARTBEAT: error = process_mirror_test_heartbeat(filp, (void __user *)arg); dbg("IOCTL_INMAGE_MIRROR_TEST_HEARTBEAT ioctl err = %d\n", error); break; case IOCTL_INMAGE_BLOCK_AT_LUN: error = process_block_at_lun(filp, (void __user *)arg); dbg("IOCTL_INMAGE_BLOCK_AT_LUN ioctl err = %d\n", error); break; case IOCTL_INMAGE_GET_BLK_MQ_STATUS: error = process_get_blk_mq_status_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_BLK_MQ_STATUS ioctl err = %d", error); break; case IOCTL_INMAGE_REPLICATION_STATE: error = process_replication_state_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_REPLICATION_STATE ioctl err = %d", error); break; case IOCTL_INMAGE_NAME_MAPPING: error = process_name_mapping_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_NAME_MAPPING ioctl err = %d", error); break; case IOCTL_INMAGE_LCW: error = process_lcw_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_LCW ioctl err = %d", error); break; case IOCTL_INMAGE_INIT_DRIVER_FULLY: error = process_init_driver_fully(filp, (void *)arg); dbg("IOCTL_INMAGE_INIT_DRIVER_FULLY err = %d\n", error); break; case IOCTL_INMAGE_COMMITDB_FAIL_TRANS: error = process_commitdb_fail_trans_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_COMMITDB_FAIL_TRANS err = %d\n", error); break; case IOCTL_INMAGE_GET_CXSTATS_NOTIFY: error = process_get_cxstatus_notify_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_CXSTATS_NOTIFY err = %d\n", error); break; case IOCTL_INMAGE_WAKEUP_GET_CXSTATS_NOTIFY_THREAD: error = process_wakeup_get_cxstatus_notify_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_WAKEUP_GET_CXSTATS_NOTIFY_THREAD err = %d\n", error); break; case IOCTL_INMAGE_TAG_DRAIN_NOTIFY: error = process_tag_drain_notify_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_TAG_DRAIN_NOTIFY err = %d\n", error); break; case IOCTL_INMAGE_WAKEUP_TAG_DRAIN_NOTIFY_THREAD: error = process_wakeup_tag_drain_notify_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_WAKEUP_TAG_DRAIN_NOTIFY_THREAD err = %d\n", error); break; case IOCTL_INMAGE_MODIFY_PERSISTENT_DEVICE_NAME: error = process_modify_persistent_device_name(filp, (void __user*)arg); dbg("IOCTL_INMAGE_MODIFY_PERSISTENT_DEVICE_NAME err = %d\n", error); break; case IOCTL_INMAGE_GET_DRAIN_STATE: error = process_get_drain_state_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_GET_DRAIN_STATE err = %d\n", error); break; case IOCTL_INMAGE_SET_DRAIN_STATE: error = process_set_drain_state_ioctl(filp, (void __user*)arg); dbg("IOCTL_INMAGE_SET_DRAIN_STATE err = %d\n", error); break; default: err("Invalid ioctl command(%u) issued by pid %d process %s", cmd, current->pid, current->comm); error = INM_EINVAL; } return error; } #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,17) #if (suse && DISTRO_VER==10 && PATCH_LEVEL==2) int flt_flush(struct file *filp, fl_owner_t id) #else inm_s32_t flt_flush(struct file *filp) #endif { /* Need to distinguish between noraml close and drainer shutdown in which * we need to perform the cleanup. */ /* Perform cleanup due to drainer shutdown. private_data will be * set to non-null for fd which issues PROCESS_START_NOTIFY. 
	 */
	return 0;
}
#else
inm_s32_t flt_flush(struct file *filp, fl_owner_t id)
{
	return 0;
}
#endif

#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10)) && defined(CONFIG_HIGHMEM)
static struct page *
inm_nopage(struct vm_area_struct *vma, unsigned long address, int *type)
{
	struct inm_list_head *ptr, *hd;
	change_node_t *chg_node = vma->vm_private_data;
	data_page_t *page;
	unsigned long pgoff;

	pgoff = ((address - vma->vm_start) >> PAGE_CACHE_SHIFT) + vma->vm_pgoff;
	hd = &(chg_node->data_pg_head);
	for(ptr = hd->next; pgoff != 0; ptr = ptr->next, pgoff--);
	page = inm_list_entry(ptr, data_page_t, next);
	page_cache_get(page->page);
	return page->page;
}

struct vm_operations_struct inm_vm_ops = {
	.nopage = inm_nopage,
};
#endif

inm_s32_t flt_mmap(struct file *filp, struct vm_area_struct *vma)
{
	change_node_t *chg_node = filp->private_data;
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,10)) || \
	(LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10) && \
	!defined(CONFIG_HIGHMEM))
	struct inm_list_head *ptr, *hd;
	data_page_t *page;
	inm_s32_t vm_offset = 0;
#endif
	inm_s32_t status = 0;

	if(!chg_node)
		return -EINVAL;

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,10) && defined(CONFIG_HIGHMEM)
	vma->vm_ops = &inm_vm_ops;
	vma->vm_private_data = (void *)chg_node;
#else
	hd = &(chg_node->data_pg_head);
	for(ptr = hd->next; ptr != hd; ptr = ptr->next) {
		page = inm_list_entry(ptr, data_page_t, next);
		if(REMAP_PAGE(vma, (vma->vm_start + vm_offset),
			PAGE_2_PFN_OR_PHYS(page->page), PAGE_SIZE,
			PAGE_SHARED)) {
			err("remap failed");
			status = -ENOMEM;
			break;
		}
		vm_offset += PAGE_SIZE;
	}
#endif

	return status;
}

static void restore_sd_open()
{
	if(driver_ctx->dc_at_lun.dc_at_drv_info.status){
		free_all_at_lun_entries();
	}
	driver_ctx->dc_at_lun.dc_at_drv_info.status = 0;
	while (INM_ATOMIC_READ(&(driver_ctx->dc_at_lun.dc_at_drv_info.nr_in_flight_ops))) {
		INM_DELAY(3 * INM_HZ);
	}
}

struct file_operations flt_ops = {
	.owner = THIS_MODULE,
#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35)
	.ioctl = flt_ioctl,
#else
	.unlocked_ioctl = flt_ioctl,
#endif
	.open = flt_open,
	.release = flt_release,
	.flush = flt_flush,
	.mmap = flt_mmap,
};

inm_s32_t is_root_filesystem_is_rootfs(void)
{
	inm_s32_t ret = 0;
	struct file *f = NULL;

	f = filp_open("/", O_RDONLY, 0400);
	if(IS_ERR(f)){
		f = NULL;
		goto out;
	}
#if LINUX_VERSION_CODE < KERNEL_VERSION(3,19,0)
	if(!f->f_dentry){
		goto out;
	}
#endif
	if(!INM_HDL_TO_INODE(f) || !(INM_HDL_TO_INODE(f))->i_sb) {
		goto out;
	}
	if(!strcmp("rootfs", INM_HDL_TO_INODE(f)->i_sb->s_type->name)) {
		ret = 1;
	}

out:
	if(f){
		filp_close(f, current->files);
	}
	return ret;
}

#ifdef INITRD_MODE
static char *in_initrd = "no";
module_param(in_initrd, charp, 0000);
MODULE_PARM_DESC(in_initrd, "Set to \"yes\" when the driver is loaded from initrd");
#endif

inm_s32_t __init involflt_init(void)
{
	inm_s32_t r;
	unsigned long lock_flag = 0;

	info("Version - %u.%u.%u.%u", INMAGE_PRODUCT_VERSION_MAJOR,
		INMAGE_PRODUCT_VERSION_MINOR, INMAGE_PRODUCT_VERSION_PRIVATE,
		INMAGE_PRODUCT_VERSION_BUILDNUM);

#ifdef INITRD_MODE
	if (!strcmp(in_initrd, "yes") || !strcmp(in_initrd, "YES"))
		driver_state = DRV_LOADED_PARTIALLY;
#endif

	telemetry_init_exception();

	if((driver_state & DRV_LOADED_FULLY) && is_root_filesystem_is_rootfs()){
		info("The root filesystem is rootfs and so not loading the involflt driver");
		return INM_EINVAL;
	}

	atomic_set(&inm_flt_memprint,0);

	r = init_driver_context();
	if(r)
		return r;

	r = register_filter_target();
	if (r < 0) {
		err("Failed to register involflt target with the system");
		goto free_dc_and_exit;
	}

	/* initializes the freeze volume list */
	INM_INIT_LIST_HEAD(&driver_ctx->freeze_vol_list);

	r =
alloc_chrdev_region(&driver_ctx->flt_dev, 0, 1, "involflt");
	if( r < 0 ) {
		err("Failed to allocate character major number for involflt driver");
		goto free_dc_and_exit;
	}

	INM_MEM_ZERO(&driver_ctx->flt_cdev, sizeof(inm_cdev_t));
	cdev_init(&driver_ctx->flt_cdev, &flt_ops);
	driver_ctx->flt_cdev.owner = THIS_MODULE;
	driver_ctx->flt_cdev.ops = &flt_ops;

	r = cdev_add(&driver_ctx->flt_cdev, driver_ctx->flt_dev, 1);
	if( r < 0) {
		err("Failed in cdev_add for involflt");
		goto free_chrdev_region_exit;
	}

	r = initialize_bitmap_api();
	if (r < 0) {
		err("Failed to create iob mempools");
		goto free_chrdev_region_exit;
	}

	/* creation of service thread here */
	r = create_service_thread();
	if (r) {
		err("could not create service thread");
		goto free_chrdev_region_exit;
	}

	if (create_alloc_thread()) {
		err("could not create allocation thread");
		goto free_service_thread;
	}

	if(block_sd_open()){
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag);
		driver_ctx->dc_flags |= DRV_MIRROR_NOT_SUPPORT;
		info("Mirror capability is not supported by involflt driver");
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
	}

	if(driver_state & DRV_LOADED_FULLY) {
		sysfs_involflt_init();
		get_root_info();
	}

	driver_ctx->flags |= DC_FLAGS_INVOLFLT_LOAD;
	init_boottime_stacking();
	driver_ctx->flags &= ~DC_FLAGS_INVOLFLT_LOAD;

	if (driver_state & DRV_LOADED_FULLY) {
		if (driver_ctx->clean_shutdown)
			inm_flush_clean_shutdown(UNCLEAN_SHUTDOWN);
		telemetry_init();
		info("Successfully loaded involflt target module");
	} else {
		info("Successfully loaded involflt target module from initrd");
	}

	return 0;

free_service_thread:
	destroy_service_thread();
free_chrdev_region_exit:
	unregister_chrdev_region(driver_ctx->flt_dev, 1);
free_dc_and_exit:
	free_driver_context();
	err("Failed to load involflt target module");
	return r;
}

static inm_s32_t block_sd_open()
{
	VOLUME_GUID *guid = NULL;
	inm_block_device_t *bdev = NULL;
	struct gendisk *bd_disk = NULL;
	const struct block_device_operations *fops = NULL;
	struct scsi_device *sdp = NULL;
	inm_u32_t i = 0;
	inm_u32_t error = 0;

	dbg("entered");

	guid = (VOLUME_GUID *)INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP,
		INM_KERNEL_HEAP);
	if (!guid) {
		err("INM_KMALLOC failed to allocate memory for VOLUME_GUID");
		error = -ENOMEM;
		goto out;
	}

	for(i = 0; i < 26; i++) {
		snprintf(guid->volume_guid, GUID_SIZE_IN_CHARS-1, "/dev/sd%c", 'a' + i);
		guid->volume_guid[GUID_SIZE_IN_CHARS-1] = '\0';
		bdev = open_by_dev_path(guid->volume_guid, 0);
		if (bdev && !IS_ERR(bdev)){
			break;
		}
	}

	if (!bdev || IS_ERR(bdev)) {
		error = 1;
		goto out;
	}

	bd_disk = bdev->bd_disk;
	if (!bd_disk || !inm_get_parent_dev(bd_disk)) {
		error = 1;
		goto out;
	}

	sdp = to_scsi_device(inm_get_parent_dev(bd_disk));
	fops = bd_disk->fops;
	memcpy_s(&driver_ctx->dc_at_lun.dc_at_drv_info.mod_dev_ops,
		sizeof(struct block_device_operations), fops,
		sizeof(struct block_device_operations));
	driver_ctx->dc_at_lun.dc_at_drv_info.orig_drv_open = fops->open;
	driver_ctx->dc_at_lun.dc_at_drv_info.orig_dev_ops = fops;
	INM_ATOMIC_SET(&(driver_ctx->dc_at_lun.dc_at_drv_info.nr_in_flight_ops), 0);
	replace_sd_open();
	driver_ctx->dc_at_lun.dc_at_drv_info.status = 1;

out:
	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){
		dbg("leaving with err %d",error);
	}
	if (guid){
		INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP);
	}
	return error;
}

void __exit involflt_exit(void)
{
	inm_s32_t r;

	INM_DELAY(INM_WAIT_UNLOAD * INM_HZ);

	inm_register_reboot_notifier(FALSE);
	telemetry_shutdown();
	inm_verify_free_area();
	restore_disk_rel_ptrs();
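	/*
	 * Teardown broadly mirrors involflt_init() in reverse: drop the
	 * character device and its region, stop the worker threads, tear
	 * down the bitmap API, unregister the filter target and finally
	 * free the driver context.
	 */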
cdev_del(&driver_ctx->flt_cdev); unregister_chrdev_region(driver_ctx->flt_dev, 1); destroy_alloc_thread(); destroy_service_thread(); terminate_bitmap_api(); r = unregister_filter_target(); INM_BUG_ON(r < 0); restore_sd_open(); free_driver_context(); info("Successfully unloaded involflt target module"); } void inm_bufoff_to_fldisk(inm_buf_t *bp, target_context_t *tcp, inm_u64_t *abs_off) { *abs_off = 0; } inm_mirror_bufinfo_t * inm_get_imb_cached(host_dev_ctx_t *hdcp) { return NULL; } inm_s32_t inm_prepare_atbuf(inm_mirror_atbuf *atbuf_wrap, inm_buf_t *bp, mirror_vol_entry_t *vol_entry, inm_u32_t count) { inm_u32_t more = 0; memcpy_s((&(atbuf_wrap->imb_atbuf_buf)), sizeof(inm_buf_t), bp, sizeof(inm_buf_t)); #if LINUX_VERSION_CODE < KERNEL_VERSION(3,8,13) atbuf_wrap->imb_atbuf_buf.bi_destructor = NULL; #endif #if ((LINUX_VERSION_CODE < KERNEL_VERSION(5,12,0) && \ LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)) || \ defined SLES12SP4 || defined SLES12SP5 || defined SLES15) && \ (!defined SLES15SP4) atbuf_wrap->imb_atbuf_buf.bi_disk = vol_entry->mirror_dev->bd_disk; #else atbuf_wrap->imb_atbuf_buf.bi_bdev = vol_entry->mirror_dev; #endif atbuf_wrap->imb_atbuf_buf.bi_end_io = inm_at_mirror_iodone; atbuf_wrap->imb_atbuf_buf.bi_next = NULL; atbuf_wrap->imb_atbuf_vol_entry = vol_entry; atbuf_wrap->imb_atbuf_iosz = INM_BUF_COUNT(&(atbuf_wrap->imb_atbuf_buf)); INM_REF_VOL_ENTRY(vol_entry); INM_SET_TEST_MIRROR((&atbuf_wrap->imb_atbuf_buf)); return more; } void inm_issue_atio(inm_buf_t *at_bp, mirror_vol_entry_t *vol_entry) { struct request_queue *q = NULL; q = bdev_get_queue(vol_entry->mirror_dev); #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 8, 0) && !defined(SLES15SP3) q->make_request_fn(q, at_bp); #endif } void inm_cleanup_mirror_bufinfo(host_dev_ctx_t *hdcp) { return; } module_init(involflt_init); module_exit(involflt_exit); MODULE_AUTHOR("Microsoft Corporation"); MODULE_DESCRIPTION("Microsoft Filter Driver"); MODULE_LICENSE("GPL v2"); MODULE_VERSION(BLD_DATE " [ " BLD_TIME " ]"); involflt-0.1.0/src/bitmap_operations.c0000755000000000000000000005406514467303177016546 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : bitmap_operations.c * * Description: This file contains bitmap mode implementation of the * filter driver. 
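 *
 *		It provides three groups of primitives: raster-op style
 *		set/clear/invert of bit runs (ProcessBitRun and its
 *		wrappers), table-driven scanning for runs of set bits
 *		(GetNextBitmapBitRun) and population counting
 *		(find_number_of_bits_set).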
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "involflt_debug.h" #include "bitmap_operations.h" /* * all this code assumes bit ordering within a byte is like this: * 7 6 5 4 3 2 1 0 bit number within byte * |x x x x x x x x| where unsigned char=0x01 would have bit 0=1 and unsigned char=0x80 would have bit 7=1 * * This means bit 17(decimal) in a char[3] array would be here: * 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 * | 0 0 0 0 0 0 1 0| 0 0 0 0 0 0 0 0| 0 0 0 0 0 0 0 0| * | ch[2] | ch[1] | ch[0] | * * All processors do it this way, even if the WORD endian is big-endian, char arrays are not affected */ /* * [numberOfSetBits][bitOffsetInByte] * this table is for setting a nbr of bits in a byte at a specific bit offset */ const unsigned char bitSetTable[9][8] /* bit offset 0 1 2 3 4 5 6 7 */ = {{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 0 bits */ {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80}, /* 1 bits */ {0x03, 0x06, 0x0C, 0x18, 0x30, 0x60, 0xC0, 0x00}, /* 2 bits, note that 0x00 means invalid */ {0x07, 0x0E, 0x1C, 0x38, 0x70, 0xE0, 0x00, 0x00}, /* 3 bits */ {0x0F, 0x1E, 0x3C, 0x78, 0xF0, 0x00, 0x00, 0x00}, /* 4 bits */ {0x1F, 0x3E, 0x7C, 0xF8, 0x00, 0x00, 0x00, 0x00}, /* 5 bits */ {0x3F, 0x7E, 0xFC, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 6 bits */ {0x7F, 0xFE, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 7 bits */ {0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}}; /* 8 bits */ /* this table is used for searching for bit runs, it tells the bit offset of the first set bit */ const unsigned char bitSearchOffsetTable[256] = /* 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, */ { 0x00, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x1x */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x2x */ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x3x */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x4x */ 0x06, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x5x */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x6x */ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x7x */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x8x */ 0x07, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0x9x */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xAx */ 0x05, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xBx */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xCx */ 0x06, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xDx */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xEx */ 0x05, 0x00, 0x01, 0x00, 
0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, /* 0xFx */ 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00}; /* this table is used for searching for bit runs, it tells the number of bits that are contiguous from the lsb, use with shifting */ const unsigned char bitSearchCountTable[256] = /* 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, */ { 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0x00 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x05, /* 0x10 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0x20 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x06, /* 0x30 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0x40 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x05, /* 0x50 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0x60 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x07, /* 0x70 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0x80 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x05, /* 0x90 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0xa0 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x06, /* 0xb0 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0xc0 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x05, /* 0xd0 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x04, /* 0xe0 */ 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x03, 0x00, 0x01, 0x00, 0x02, 0x00, 0x01, 0x00, 0x08};/* 0xf0 */ /* * this table is used to mask the last byte in a buffer so we don't test bits we're not supposed to * index will be number of bits remaining as valid */ const unsigned char lastByteMaskTable[9] = {0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF}; /* * these are codes that are split into 4 bytes to control the bit manipulation * they merge an entry from bitSetTable into a byte in the desired operation */ #define OpValueSetBits (0xFF000000) #define OpValueClearBits (0xFFFF00FF) #define OpValueInvertBits (0x00FF00FF) #define MAX_BITRUN_LEN (0x400) //1024 /* Number of set bits for each value of a nibble; used for counting */ const unsigned char nibble_bit_count[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4}; inm_s32_t ProcessBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap, inm_u32_t bitsInRun, inm_u32_t bitOffset, inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged, inm_u32_t opValue) { inm_s32_t status; inm_u32_t bytesTouched; inm_u32_t bitBufferSize; inm_u32_t bitsInFirstByte; inm_u32_t bitOffsetInFirstByte; unsigned char * firstByteTouched; unsigned char ch; /* a byte to process */ unsigned char xor1Mask; /* first xor applied to ch */ /* xor applied to bitSetTable entry and then xored with ch */ unsigned char xor2Mask; /* final xor applied to ch */ unsigned char xor3Mask; /* and applied to bitSetTable entry and then ored 
with ch */
	unsigned char andMask;

	status = 0;

	/* we need to keep track of which bytes we change */
	/* so we can write the correct sectors to disk */
	bytesTouched = 0;
	firstByteTouched = bitBuffer;

	/* round up to nbr of bytes needed to contain the bitmap */
	bitBufferSize = (bitsInBitmap + 7) / 8;

	if (bitsInRun == 0) {
		/* don't do anything */
	} else if ((bitOffset >= bitsInBitmap) ||
		(bitsInRun > bitsInBitmap) ||
		/* check that we don't overflow this bitmap segment */
		((bitOffset + bitsInRun) > bitsInBitmap)) {
		status = EOF_BMAP;
	} else {
		/* set the operation values, these are somewhat like */
		/* rasterop values used in a bitblt */
		xor1Mask = (unsigned char)(opValue & 0xFF);
		xor2Mask = (unsigned char)((opValue >> 8) & 0xFF);
		xor3Mask = (unsigned char)((opValue >> 16) & 0xFF);
		andMask = (unsigned char)((opValue >> 24) & 0xFF);

		/* move the bitmap byte pointer up to the correct byte for the */
		/* first operation */
		bitBuffer += (bitOffset / 8);

		/* handle the first possibly partial byte */
		bytesTouched = 1;
		firstByteTouched = bitBuffer;
		/* one of 8 offsets, same as % 8 */
		bitOffsetInFirstByte = ((inm_u32_t)bitOffset & 0x7);
		bitsInFirstByte = min((min(bitsInRun, (inm_u32_t)8)),
			(8 - bitOffsetInFirstByte));

		/* this code allows doing set or clear or invert of bits */
		ch = *bitBuffer;
		ch ^= xor1Mask;
		ch ^= ( bitSetTable[bitsInFirstByte][bitOffsetInFirstByte]) ^ xor2Mask;
		ch |= ( bitSetTable[bitsInFirstByte][bitOffsetInFirstByte]) & andMask;
		ch ^= xor3Mask;
		*bitBuffer = ch;
		/* the byte is now transformed */

		bitsInRun -= bitsInFirstByte;
		bitBuffer++;

		/* handle the middle bytes, we have already checked bitmap */
		/* bounds, so we don't need to do it here */
		while (bitsInRun >= 8) {
			/* this code allows doing set or clear or invert of bits */
			ch = *bitBuffer;
			ch ^= xor1Mask;
			ch ^= 0xFF ^ xor2Mask;
			ch |= 0xFF & andMask;
			ch ^= xor3Mask;
			*bitBuffer = ch;
			/* the byte is now transformed */

			bitsInRun -= 8;
			bitBuffer++;
			bytesTouched++;
		}

		/* process possible last byte (possibly less than 8 bits) */
		if (bitsInRun > 0) {
			/* this code allows doing set or clear or invert of bits */
			ch = *bitBuffer;
			ch ^= xor1Mask;
			ch ^= (bitSetTable[bitsInRun][0]) ^ xor2Mask;
			ch |= (bitSetTable[bitsInRun][0]) & andMask;
			ch ^= xor3Mask;
			*bitBuffer = ch;
			/* the byte is now transformed */

			bytesTouched++;
			bitsInRun = 0;
		}
	}

	if (NULL != nbrBytesChanged) {
		/* this parameter is optional */
		*nbrBytesChanged = bytesTouched;
	}

	if (NULL != firstByteChanged) {
		/* this parameter is optional */
		*firstByteChanged = firstByteTouched;
	}

	return status;
}

inm_s32_t SetBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap,
	inm_u32_t bitsInRun, inm_u32_t bitOffset,
	inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged)
{
	return ProcessBitRun(bitBuffer, bitsInBitmap, bitsInRun, bitOffset,
		nbrBytesChanged, firstByteChanged, OpValueSetBits);
}

inm_s32_t ClearBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap,
	inm_u32_t bitsInRun, inm_u32_t bitOffset,
	inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged)
{
	return ProcessBitRun(bitBuffer, bitsInBitmap, bitsInRun, bitOffset,
		nbrBytesChanged, firstByteChanged, OpValueClearBits);
}

inm_s32_t InvertBitmapBitRun(unsigned char * bitBuffer, inm_u32_t bitsInBitmap,
	inm_u32_t bitsInRun, inm_u32_t bitOffset,
	inm_u32_t * nbrBytesChanged, unsigned char * *firstByteChanged)
{
	return ProcessBitRun(bitBuffer, bitsInBitmap, bitsInRun, bitOffset,
		nbrBytesChanged, firstByteChanged, OpValueInvertBits);
}

/* This function is the compute-intensive code for turning bitmaps back into
 * groups of bits. It has
a bunch of edge conditions:
 * 1) there could be no run of bits from the starting offset to the end of the bitmap
 * 2) the starting search offset can be on any bit offset of a byte
 * 3) the end of the bitmap can be at any bit offset in a byte, not just ends of bytes
 * 4) the run of bits can extend to any offset of the last byte in the buffer
 * 5) the starting byte could also be the ending byte in the bitmap (the run after #4)
 * 6) a run of bits can extend to the end of the bitmap (bounds terminate the run, not clear bit)
 * 7) the starting search offset may not be the start of the bit run
 * 8) the run may start and end on adjacent bytes
 * 9) the starting offset is past the end of the bitmap
 * 10) the size of the bitmap could be less than 8 bits
 */
inm_s32_t GetNextBitmapBitRun(
	unsigned char * bitBuffer,
	/* in BITS not bytes */
	inm_u32_t totalBitsInBitmap,
	/* in and out parameter, set search start and is updated, relative to
	 * start of bitBuffer */
	inm_u32_t * startingBitOffset,
	/* 0 means no run found, can be up to totalBitsInBitmap */
	inm_u32_t * bitsInRun,
	/* output bit offset relative to bitBuffer, meaningless value if
	 * *bitsInRun == 0 */
	inm_u32_t * bitOffset)
{
	inm_s32_t status;
	inm_u32_t bitsAvailableToSearch;
	inm_u32_t runOffset;
	inm_u32_t runLength;
	inm_u32_t bitsDiscardedOnFirstByteSearched;
	inm_u32_t bitsDiscardedOnFirstByteOfRun;
	inm_u32_t bitsInLastByte;
	unsigned char ch;
	inm_u32_t BitRunBreak = FALSE;

	runOffset = *startingBitOffset; /* the minimum offset it could be */
	runLength = 0;

	if ((totalBitsInBitmap % 8) == 0) {
		bitsInLastByte = 8;
	} else {
		bitsInLastByte = (totalBitsInBitmap % 8);
	}

	status = 0;

	/* already validated this will not underflow */
	bitsAvailableToSearch = totalBitsInBitmap - *startingBitOffset;

	/* throw away full bytes from buffer that are before starting offset */
	bitBuffer += (*startingBitOffset / 8);

	/* get the first byte of buffer, this contains the starting offset bit */
	ch = *bitBuffer++;

	/* get the bits before the starting offset in the first byte
	 * shifted away
	 * this offset is already included in the starting position of
	 * runOffset */
	bitsDiscardedOnFirstByteSearched = (inm_u32_t)(*startingBitOffset & 0x7);

	/* throw away bits before starting point */
	ch = ch >> bitsDiscardedOnFirstByteSearched;

	/* check for the first byte also being the last byte */
	if (bitsAvailableToSearch < bitsInLastByte) {
		/* only partial byte to search, mask off trailing unused bits */
		ch &= lastByteMaskTable[bitsInLastByte];
	}

	do {
		/* is this the last byte of buffer */
		if (bitsAvailableToSearch <= bitsInLastByte) {
			if (ch == 0) {
				/* no runs found */
				break;
			} else {
				/* found a run in last byte */
				/* get the offset of the first bit */
				bitsDiscardedOnFirstByteOfRun = bitSearchOffsetTable[ch];
				/* align the first set bit to lsb (little-endian) */
				ch >>= bitsDiscardedOnFirstByteOfRun;
				runOffset += bitsDiscardedOnFirstByteOfRun;
				/* get the number of bits in the run */
				runLength = bitSearchCountTable[ch];
				break;
			}
		}

		/* this is not the last byte and we need to find the first
		 * byte of run */
		if (ch == 0) {
			/* get aligned to byte boundary */
			bitsAvailableToSearch -= (8 - bitsDiscardedOnFirstByteSearched);
			runOffset += (8 - bitsDiscardedOnFirstByteSearched);

			/* scan for start of a run */
			ch = *bitBuffer++;
			while ((bitsAvailableToSearch > bitsInLastByte) && (ch == 0)) {
				ch = *bitBuffer++;
				/* this can't underflow if the above condition passes */
				bitsAvailableToSearch -= 8;
				runOffset += 8;
			}
		}

		/* we are either at the first byte of the run or the last byte
		 * of the buffer */
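		/*
		 * Worked example of the two table lookups used below: for
		 * ch = 0xB0 (1011 0000b), bitSearchOffsetTable[0xB0] = 4,
		 * so the run starts 4 bits in; after ch >>= 4 we have
		 * ch = 0x0B (1011b) and bitSearchCountTable[0x0B] = 2,
		 * i.e. a run of two contiguous set bits (bits 4-5) ending
		 * at the clear bit 6.
		 */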
if (bitsAvailableToSearch <= bitsInLastByte) { /* only partial byte to search, mask off trailing unused bits */ ch &= lastByteMaskTable[bitsInLastByte]; if (ch) { /* on last byte, run found * get the offset of the first bit */ bitsDiscardedOnFirstByteOfRun = bitSearchOffsetTable[ch]; /* align the first set bit to lsb (little-endian) */ ch >>= bitsDiscardedOnFirstByteOfRun; runOffset += bitsDiscardedOnFirstByteOfRun; /* get the number of bits in the run */ runLength = bitSearchCountTable[ch]; break; } else { /* on last byte of buffer, no run found */ break; } } /* we must be at the start of a run, and not the end of * the buffer * get the offset of the first bit */ bitsDiscardedOnFirstByteOfRun = bitSearchOffsetTable[ch]; /* align the first set bit to lsb (little-endian) */ ch >>= bitsDiscardedOnFirstByteOfRun; /* this will be the final runOffset position */ runOffset += bitsDiscardedOnFirstByteOfRun; /* get the number of bits of the run in this byte */ runLength = bitSearchCountTable[ch]; if ((bitsDiscardedOnFirstByteOfRun + runLength) < 8) { /* we must have found a run that doesn't continue to the */ /* next byte */ break; } /* the run might continue or not * should be byte aligned */ ch = *bitBuffer++; bitsAvailableToSearch -= 8; while ((bitsAvailableToSearch > bitsInLastByte) && (ch == 0xFF)){ /* Check whether the # bits in next byte can fit in the * current run length */ if ((runLength + 8) > MAX_BITRUN_LEN){ BitRunBreak = TRUE; break; } /* full bytes are part of run */ ch = *bitBuffer++; bitsAvailableToSearch -= 8; runLength += 8; } if (BitRunBreak == TRUE){ break; } /* we know we're either at the end of the run or the end * of the buffer */ if (bitsAvailableToSearch <= bitsInLastByte) { /* on last byte of buffer, mask off any bits that are * past the end of the bitmap * handle bitmaps of non multiple of 8 size */ ch &= lastByteMaskTable[bitsAvailableToSearch]; /* Check whether the # bits in next byte can fit in the * current run length */ if ((runLength + bitSearchCountTable[ch]) > MAX_BITRUN_LEN) { break; } /* get the number of bits starting at the lsb for this run */ runLength += bitSearchCountTable[ch]; break; } /* Check whether the # bits in next byte can fit in the * current run length */ if ((runLength + bitSearchCountTable[ch]) > MAX_BITRUN_LEN) { break; } /* run must end on this byte and this is not end of buffer * get the number of bits in the run */ runLength += bitSearchCountTable[ch]; } while (0); if (runLength == 0) { /* no bits past startingOffset */ *startingBitOffset = totalBitsInBitmap; } else { /* update for next run search */ *startingBitOffset = runOffset + runLength; } *bitsInRun = runLength; *bitOffset = runOffset; return status; } inm_u32_t find_number_of_bits_set(unsigned char *bit_buffer, inm_u32_t buffer_size_in_bits) { unsigned char *buffer; inm_u32_t byte_count, remainder_bits; inm_u32_t total_bits; unsigned char remainder_byte; if (!bit_buffer || buffer_size_in_bits == 0) return 0; buffer = bit_buffer; byte_count = (buffer_size_in_bits/8)+ ((buffer_size_in_bits%8)?1:0); total_bits = 0; remainder_bits = buffer_size_in_bits & 0x7; INM_BUG_ON(byte_count==0); while (byte_count!=0) { /* find bits correspond to top nibble */ total_bits += nibble_bit_count[*buffer >> 4]; /*find bits correspond to lower nibble of byte */ total_bits += nibble_bit_count[*buffer & 0xf]; --byte_count; if (!byte_count) { break; } buffer++; } remainder_byte = *buffer & lastByteMaskTable[remainder_bits]; total_bits += nibble_bit_count[remainder_byte >> 4]; total_bits += 
nibble_bit_count[remainder_byte & 0xf]; return total_bits; } #define DEFAULT_MAX_DATA_SIZE_PER_NON_DATA_MODE_DIRTY_BLOCK (64 * 1024 * 1024) void add_chg_to_db(bmap_bit_stats_t *bbsp, int cur_chg_len) { if (((bbsp->bbs_nr_chgs_in_curr_db + 1) > MAX_CHANGE_INFOS_PER_PAGE) || (bbsp->bbs_nr_dbs == 0) || (bbsp->bbs_curr_db_sz + cur_chg_len > DEFAULT_MAX_DATA_SIZE_PER_NON_DATA_MODE_DIRTY_BLOCK)) { bbsp->bbs_nr_dbs++; bbsp->bbs_nr_chgs_in_curr_db = 1; bbsp->bbs_curr_db_sz = cur_chg_len; } else { bbsp->bbs_nr_chgs_in_curr_db++; bbsp->bbs_curr_db_sz += cur_chg_len; } } void find_bmap_io_pat(char *buf, inm_u64_t bits_in_bmap, bmap_bit_stats_t *bbsp, int eobmap) { int nr_bits_in_word = 8; int nr_byt, nr_bit; int cnt = 0; char c; int cur_chg_len = 0; inm_u64_t len = bits_in_bmap/8; int rem_bits = bits_in_bmap - (len *8); if (rem_bits && !eobmap) { eobmap = 1; } if (bbsp->bbs_nr_prev_bits) { cnt = bbsp->bbs_nr_prev_bits; } for (nr_byt = 0; nr_byt < len; nr_byt++) { c = buf[nr_byt]; for (nr_bit = 0; nr_bit < nr_bits_in_word; nr_bit++) { if (c & (1 << nr_bit)) { cnt = (cnt > 0) ? cnt+1 : 1; if (cnt != bbsp->bbs_max_nr_bits_in_chg) { /* move to next change */ continue; } } if (cnt == 0) { continue; } cur_chg_len = (cnt * bbsp->bbs_bmap_gran); add_chg_to_db(bbsp, cur_chg_len); cnt = 0; } } c = buf[nr_byt]; for (nr_bit = 0; nr_bit < rem_bits; nr_bit++) { if (c & (1 << nr_bit)) { cnt = (cnt > 0) ? cnt+1 : 1; if (cnt != bbsp->bbs_max_nr_bits_in_chg) { /* move to next change */ continue; } } if (cnt == 0) { continue; } cur_chg_len = (cnt * bbsp->bbs_bmap_gran); add_chg_to_db(bbsp, cur_chg_len); cnt = 0; } cur_chg_len = 0; if (cnt) { bbsp->bbs_nr_prev_bits = cnt; cur_chg_len = cnt * bbsp->bbs_bmap_gran; } if (cnt && eobmap) { add_chg_to_db(bbsp, cur_chg_len); } } involflt-0.1.0/src/filestream.h0000755000000000000000000000377414467303177015170 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INMAGE_FILESTREAM_H #define _INMAGE_FILESTREAM_H #include "involflt-common.h" #include "filestream_raw.h" typedef struct _fstream_tag { void *filp; void *inode; inm_dev_t rdev; void *context; inm_atomic_t refcnt; inm_u32_t fs_flags; fstream_raw_hdl_t *fs_raw_hdl; }fstream_t; #define FS_FLAGS_BUFIO 1 fstream_t *fstream_ctr(void *ctx); void fstream_dtr(fstream_t *fs); void fstream_put(fstream_t *fs); inm_s32_t fstream_open(fstream_t *fs, char *path, inm_s32_t flags, inm_s32_t mode); inm_s32_t fstream_close(fstream_t *fs); inm_s32_t fstream_get_fsize(fstream_t *fs); inm_s32_t fstream_open_or_create(fstream_t *fs, char *path, inm_s32_t *file_created, inm_u32_t bmap_sz); inm_s32_t fstream_write(fstream_t *fs, char *buffer, inm_u32_t size, inm_u64_t offset); inm_s32_t fstream_read(fstream_t *fs, char *buffer, inm_u32_t size, inm_u64_t offset); void fstream_enable_buffered_io(fstream_t *fs); void fstream_disable_buffered_io(fstream_t *fs); void fstream_sync(fstream_t *fs); inm_s32_t fstream_map_file_blocks(fstream_t *, inm_u64_t , inm_u32_t, fstream_raw_hdl_t ** ); void fstream_switch_to_raw_mode(fstream_t *, fstream_raw_hdl_t *); #endif /* _INMAGE_FILESTREAM_H */ involflt-0.1.0/src/inm_mem.h0000755000000000000000000001007714467303177014450 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef _INM_MEM_H #define _INM_MEM_H #define INM_KM_NORETRY __GFP_NORETRY #define INM_KM_NOWARN __GFP_NOWARN #define INM_KM_HIGHMEM __GFP_HIGHMEM #define INM_KM_SLEEP GFP_KERNEL #define INM_KM_NOSLEEP GFP_ATOMIC #define INM_KM_NOIO GFP_NOIO #define INM_UMEM_SLEEP GFP_KERNEL #define INM_UMEM_NOSLEEP GFP_ATOMIC #define INM_SLAB_HWCACHE_ALIGN SLAB_HWCACHE_ALIGN #define CHECK_OVERFLOW(size)\ ({\ int ret;\ unsigned long long tmp = (unsigned long long)size;\ if(tmp < ((size_t) - 1)){\ ret = 0;\ }else {\ ret = -1;\ }\ ret;\ }) #define INM_KMALLOC(size, flag, heap)\ ({\ void *rptr = NULL;\ if(!CHECK_OVERFLOW(size)) {\ rptr = inm_kmalloc(size, flag);\ }\ rptr;\ }) #define INM_KFREE(ptr, size, heap) inm_kfree(size, ptr) static inline int INM_PIN(void *addr, size_t size) { return 0; } static inline int INM_UNPIN(void *addr, size_t size) { return 0; } #define INM_VMALLOC(size, flag, heap) \ ({\ void *rptr = NULL;\ if(!CHECK_OVERFLOW(size)) {\ rptr = inm_vmalloc(size);\ }\ rptr;\ }) #define INM_VFREE(ptr, size, heap) inm_vfree(ptr, size) #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) #define INM_KMEM_CACHE_CREATE(cache_name, obj_size, align, flags, ctor, dtor, nr_objs, min_nr, pinned) \ kmem_cache_create(cache_name, obj_size, align, flags, ctor) #else #define INM_KMEM_CACHE_CREATE(cache_name, obj_size, align, flags, ctor, dtor, nr_objs, min_nr, pinned) \ kmem_cache_create(cache_name, obj_size, align, flags, ctor, dtor) #endif #define INM_KMEM_CACHE_DESTROY(cachep) kmem_cache_destroy(cachep) #define INM_KMEM_CACHE_ALLOC(cachep, flags) inm_kmem_cache_alloc(cachep, flags) #define INM_KMEM_CACHE_ALLOC_PATH(cachep, flags, size, heap) \ INM_KMEM_CACHE_ALLOC(cachep, flags) #define INM_KMEM_CACHE_FREE(cachep, objp) inm_kmem_cache_free(cachep, objp) #define INM_KMEM_CACHE_FREE_PATH(cachep, objp, heap) \ INM_KMEM_CACHE_FREE(cachep, objp) #define INM_MEMPOOL_CREATE(min_nr, alloc_slab, free_slab, cachep) \ mempool_create(min_nr, alloc_slab, free_slab, cachep) #define INM_MEMPOOL_FREE(objp, poolp) inm_mempool_free(objp, poolp) #define INM_MEMPOOL_ALLOC(poolp, flag) inm_mempool_alloc(poolp, flag) #define INM_MEMPOOL_DESTROY(poolp) mempool_destroy(poolp) #define INM_ALLOC_PAGE(flag) inm_alloc_page(flag) #define __INM_FREE_PAGE(pagep) __inm_free_page(pagep) #define INM_ALLOC_MAPPABLE_PAGE(flag) inm_alloc_page(flag) #define INM_FREE_MAPPABLE_PAGE(page, heap) __inm_free_page(page) #define __INM_GET_FREE_PAGE(flag, heap) __inm_get_free_page(flag) /* This removes any protection we get from compiler against * passing pointers of other unexpected types. */ #define INM_FREE_PAGE(pagep, heap) inm_free_page((unsigned long)pagep) #define INM_MEMPOOL_ALLOC_SLAB mempool_alloc_slab #define INM_MEMPOOL_FREE_SLAB mempool_free_slab #define INM_MMAP_PGOFF(filp, addr, len, prot, flags, pgoff) \ do_mmap_pgoff(filp, addr, len, prot, flags, pgoff) typedef unsigned inm_kmalloc_flag; typedef struct kmem_cache inm_kmem_cache_t; typedef mempool_t inm_mempool_t; typedef unsigned long inm_kmem_cache_flags; typedef mempool_alloc_t inm_mempool_alloc_t; #endif /* end of _INM_MEM_H */ involflt-0.1.0/src/flt_bio.h0000755000000000000000000002354114467303177014445 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _FLT_BIO_H #define _FLT_BIO_H #include "distro.h" #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0) void flt_end_io_fn(struct bio *bio); #elif LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) void flt_end_io_fn(struct bio *bio, inm_s32_t error); #else inm_s32_t flt_end_io_fn(struct bio *bio, inm_u32_t done, inm_s32_t error); #endif typedef struct block_device inm_bio_dev_t; #if ((LINUX_VERSION_CODE < KERNEL_VERSION(5,12,0) && \ LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)) || \ defined SLES12SP4 || defined SLES12SP5 || \ defined SLES15) && (!defined SLES15SP4) #define INM_BUF_DISK(bio) ((bio)->bi_disk) #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,11,0) #define INM_BUF_BDEV(bio) ((bio)->bi_disk->part0) #else #define INM_BUF_BDEV(bio) (bdget_disk((bio)->bi_disk, 0)) #endif #else #define INM_BUF_DISK(bio) ((bio)->bi_bdev->bd_disk) #define INM_BUF_BDEV(bio) ((bio)->bi_bdev) #endif #define INM_BDEVNAME_PREFIX "/dev/" #if defined BIO_CHAIN || \ LINUX_VERSION_CODE >= KERNEL_VERSION(5,1,0) /* Mainline */ #define INM_IS_CHAINED_BIO(bio) bio_flagged(bio, BIO_CHAIN) #elif defined BIO_AUX_CHAIN /* RHEL 7 */ #define INM_IS_CHAINED_BIO(bio) bio_aux_flagged(bio, BIO_AUX_CHAIN) #else /* unsupported = always FALSE */ #define INM_IS_CHAINED_BIO(bio) (0) #endif #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,8,0)) || defined SLES12SP3 #define inm_bio_rw(bio) ((bio)->bi_opf) #define inm_bio_is_write(bio) (op_is_write(bio_op(bio))) #define inm_bio_is_discard(bio) (bio_op(bio) == REQ_OP_DISCARD) #else #define inm_bio_rw(bio) ((bio)->bi_rw) #define inm_bio_is_write(bio) (bio_rw(bio) == WRITE) #ifdef RHEL5 #define inm_bio_is_discard(bio) (0) #else #ifdef RHEL6 /* The newer kernels of RHEL6 has pulled the support for DISCARD and BIO_DISCARD * is defined in newer kernels but the driver is built against the base and so * added the same definition to compile and handle the DISCARD as well. 
*/ #define BIO_RQ_DISCARD (1 << 9) #else #define BIO_RQ_DISCARD REQ_DISCARD #endif #define inm_bio_is_discard(bio) ((bio)->bi_rw & BIO_RQ_DISCARD) #endif #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0) || \ defined SLES12SP4 || defined SLES12SP5 || defined SLES15 #define inm_bio_error(bio) ((bio)->bi_status) #else #define inm_bio_error(bio) ((bio)->bi_error) #endif #define INM_BUF_IOVEC(bio) ((bio)->bi_io_vec) #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) typedef struct bvec_iter inm_bvec_iter_t; /* * NOTE: INM_BVEC_ITER* macros follow kernel convention * and take iterators as arguments and not iterator pointers */ #define INM_BVEC_ITER_IDX(iter) ((iter).bi_idx) #define INM_BVEC_ITER_SECTOR(iter) ((iter).bi_sector) #define INM_BVEC_ITER_SZ(iter) ((iter).bi_size) #define INM_BVEC_ITER_BVDONE(iter) ((iter).bi_bvec_done) #if !defined RHEL8 && ((LINUX_VERSION_CODE >= KERNEL_VERSION(4,13,0) && \ LINUX_VERSION_CODE < KERNEL_VERSION(5,0,0)) || defined SLES12SP4) #define INM_BVEC_ITER_INIT() \ ((struct bvec_iter) { \ .bi_sector = 0, \ .bi_size = 0, \ .bi_idx = 0, \ .bi_bvec_done = 0, \ .bi_done = 0, \ }) #else #define INM_BVEC_ITER_INIT() \ ((struct bvec_iter) { \ .bi_sector = 0, \ .bi_size = 0, \ .bi_idx = 0, \ .bi_bvec_done = 0, \ }) #endif #define INM_BUF_ITER(bio) ((bio)->bi_iter) #define INM_BUF_SECTOR(bio) INM_BVEC_ITER_SECTOR(INM_BUF_ITER(bio)) #define INM_BUF_COUNT(bio) INM_BVEC_ITER_SZ(INM_BUF_ITER(bio)) #define INM_BUF_IDX(bio) INM_BVEC_ITER_IDX(INM_BUF_ITER(bio)) #define INM_BUF_OFFSET(bio) \ bvec_iter_offset(INM_BUF_IOVEC(bio), INM_BUF_ITER(bio)) #define bio_iovec_idx(bio, iter) __bvec_iter_bvec((bio)->bi_io_vec, iter) #else /* LINUX_VERSION_CODE < KERNEL_VERSION(3,19,0) */ typedef unsigned short inm_bvec_iter_t; #define INM_BVEC_ITER_INIT() 0 #define INM_BUF_ITER(bi) ((bi)->bi_idx) #define INM_BUF_SECTOR(bi) ((bi)->bi_sector) #define INM_BUF_COUNT(bi) ((bi)->bi_size) #define INM_BUF_IDX(bi) INM_BUF_ITER(bi) #define INM_BUF_OFFSET(bi) bio_offset(bi) #endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0) */ /* * Discard zeroes data */ #if (LINUX_VERSION_CODE < KERNEL_VERSION(4,12,0)) && \ !defined RHEL5 /* * For a long time, device used to advertise capability to zero data with * discard request and appications used to use the capability to zero out * blocks using discard command. For correct handling, we check for this * capability when REQ_DISCARD flag is set and zero data on completion. */ #if (LINUX_VERSION_CODE == KERNEL_VERSION(2,6,32)) && defined UEK #define INM_DISCARD_ZEROES_DATA(q) (0) #else #define INM_DISCARD_ZEROES_DATA(q) (q->limits.discard_zeroes_data) #endif #else /* New kernels remove the ambiguity of discard capability. Apps now call * discard to unmap blocks while REQ_OP_WRITE_ZEROES should be explicitly * used when zeroes are expected. The hardware driver may use DISCARD/WRITE_SAME * or any other mechanism to zero the block range but we are not bothered * at our layer. So always return false as its is only true with * REQ_OP_WRITE_ZEROES */ #define INM_DISCARD_ZEROES_DATA(q) (0) #endif /* * Offload requests */ #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,8,0)) || defined SLES12SP3 #define INM_REQ_WRITE REQ_OP_WRITE #define INM_REQ_WRITE_SAME REQ_OP_WRITE_SAME #define INM_REQ_DISCARD REQ_OP_DISCARD #ifdef REQ_OP_WRITE_ZEROES #define INM_REQ_WRITE_ZEROES REQ_OP_WRITE_ZEROES #else #define INM_REQ_WRITE_ZEROES 0 #endif #define inm_bio_op(bio) bio_op(bio) /* INM_REQ* flags can be defined as 0 == REQ_READ. 
So check the op is write */ #define INM_IS_BIO_WOP(bio, op) \ (inm_bio_is_write(bio) && (inm_bio_op(bio) == op)) inline static int INM_IS_OFFLOAD_REQUEST_OP(struct bio *bio) { switch (bio_op(bio)) { case INM_REQ_DISCARD: case INM_REQ_WRITE_SAME: #ifdef REQ_OP_WRITE_ZEROES case INM_REQ_WRITE_ZEROES: #endif return 1; default: return 0; } } inline static int INM_IS_SUPPORTED_REQUEST_OP(struct bio *bio) { switch (bio_op(bio)) { case INM_REQ_WRITE: case INM_REQ_DISCARD: case INM_REQ_WRITE_SAME: #ifdef REQ_OP_WRITE_ZEROES case INM_REQ_WRITE_ZEROES: #endif return 1; default: return 0; } } #else /* LINUX_VERSION_CODE < KERNEL_VERSION(4,8,0) */ #define INM_REQ_WRITE REQ_WRITE #ifdef RHEL5 #define INM_REQ_DISCARD 0 #define INM_REQ_WRITE_SAME 0 #define INM_REQ_WRITE_ZEROES 0 #else /* !RHEL5 */ #define INM_REQ_DISCARD BIO_RQ_DISCARD #define INM_REQ_WRITE_ZEROES 0 #ifdef REQ_WRITE_SAME #define INM_REQ_WRITE_SAME REQ_WRITE_SAME #else #define INM_REQ_WRITE_SAME 0 #endif #endif /* !RHEL5 */ #define __INM_SUPPORTED_OFFLOAD_REQUESTS \ (INM_REQ_WRITE_SAME | INM_REQ_DISCARD) /* RHEL 7.4+ & new kernels provide bio_op() to extract ops from other flags */ #ifdef bio_op #define inm_bio_op(bio) bio_op(bio) #define __INM_SUPPORTED_REQUESTS \ ((unsigned long) (INM_REQ_WRITE | __INM_SUPPORTED_OFFLOAD_REQUESTS)) #else /* bio_op */ /* * For legacy kernels, we support all ops since there is no easy way to extract * ops from other flags and we do not want to switch to md mode unnecessarily */ #define inm_bio_op(bio) (bio->bi_rw) #define __INM_SUPPORTED_REQUESTS UINT_MAX #endif /* bio_op */ /* INM_REQ* flags can be defined as 0 == REQ_READ. So check the op is write */ #define INM_IS_BIO_WOP(bio, op) \ (inm_bio_is_write(bio) && (inm_bio_op(bio) & op)) #define INM_IS_OFFLOAD_REQUEST_OP(bio) \ (inm_bio_op(bio) & __INM_SUPPORTED_OFFLOAD_REQUESTS) #define INM_IS_SUPPORTED_REQUEST_OP(bio) \ ((inm_bio_op(bio) & __INM_SUPPORTED_REQUESTS) == inm_bio_op(bio)) #endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4,8,0) */ /* * For unsupported kernels, break build so we are forced to verify * we are logging the right data. */ #if (LINUX_VERSION_CODE >= KERNEL_VERSION(5,16,0)) #define INM_BIO_RW_FLAGS(bio) (*bio = 0) #elif (LINUX_VERSION_CODE >= KERNEL_VERSION(4,8,0)) || defined SLES12SP3 #define INM_BIO_RW_FLAGS(bio) (inm_bio_rw(bio) | bio->bi_flags) #elif (LINUX_VERSION_CODE >= KERNEL_VERSION(4,3,0)) #define INM_BIO_RW_FLAGS(bio) (inm_bio_rw(bio) << 32 | bio->bi_flags) #else #define INM_BIO_RW_FLAGS(bio) (inm_bio_rw(bio) << 32 | bio->bi_flags << 32) #endif #endif involflt-0.1.0/src/data-file-mode.h0000755000000000000000000000451514467303177015577 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ /* * File : data-file-mode.h * * Description: Data File Mode support */ #ifndef LINVOLFLT_DATAFILE_MODE_H #define LINVOLFLT_DATAFILE_MODE_H #include "involflt-common.h" #include "involflt.h" #define DEFAULT_NUMBER_OF_FILEWRITERS_PER_VOLUME 1 struct _target_context; struct _change_node; typedef struct _data_file_thread { inm_s32_t id; inm_atomic_t pending; struct inm_list_head next; inm_completion_t exit; #ifdef INM_AIX inm_completion_t compl; #else inm_sem_t mutex; #endif struct _target_context *ctxt; inm_spinlock_t wq_list_lock; struct inm_list_head wq_hd; #ifdef INM_LINUX #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0) struct task_struct *thread_task; #endif #endif } data_file_thread_t; typedef struct _data_file_context { inm_s32_t num_dfm_threads; inm_atomic_t terminating; inm_sem_t list_mutex; struct inm_list_head dfm_thr_hd; data_file_thread_t *next_thr; } data_file_flt_t; typedef struct _data_file_node { struct inm_list_head next; void *chg_node; } data_file_node_t; #define DFM_THREAD_ENTRY(ptr) (inm_list_entry(ptr, data_file_thread_t, next)) inm_s32_t init_data_file_flt_ctxt(struct _target_context *); void free_data_file_flt_ctxt(struct _target_context *); inm_s32_t inm_unlink_datafile(struct _target_context *, char *); inm_s32_t should_write_to_datafile(struct _target_context *); inm_s32_t queue_chg_node_to_file_thread(struct _target_context *, struct _change_node *); inm_s32_t create_datafile_dir_name(struct _target_context *, struct inm_dev_info *); #endif involflt-0.1.0/src/utils.h0000755000000000000000000001105214467303177014161 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #ifndef LINVOLFLT_UTILS_H #define LINVOLFLT_UTILS_H #include "involflt-common.h" #define bytes_to_pages(bytes) (INM_PAGEALIGN(bytes) >> INM_PAGESHIFT) #define pages_to_bytes(pages) ((pages) << INM_PAGESHIFT) #define HUNDREDS_OF_NANOSEC_IN_SECOND 10000000LL #define INMAGE_MAX_TS_SEQUENCE_NUMBER 0xFFFFFFFFFFFFFFFEULL #define INM_IS_DSKPART(flag) (!(flag & FULL_DISK_FLAG)) #define INM_IS_FULLDISK(flags) (flags & (FULL_DISK_FLAG | FULL_DISK_PARTITION_FLAG)) #define INM_TGT_CTXT_LOCK 0x1 #define NO_SKIPS_AFTER_ERROR 100 #define inm_div_up(nr, sz) (((nr) + (sz) - 1) / (sz)) struct _target_context; struct _wqentry; struct inm_ts_delta { inm_u32_t td_time; inm_u32_t td_seqno; inm_u32_t td_oflow; /* indicates either td_time or td_seqno overflow */ char reserved[4]; }; typedef struct inm_ts_delta inm_tsdelta_t; static_inline void inm_list_splice_at_tail(struct inm_list_head *oldhead, struct inm_list_head *newhead) { struct inm_list_head *nhlast = newhead->prev; struct inm_list_head *ohfirst = oldhead->next; struct inm_list_head *ohlast = oldhead->prev; INM_BUG_ON(inm_list_empty(oldhead)); nhlast->next = ohfirst; ohfirst->prev = nhlast; ohlast->next = newhead; newhead->prev = ohlast; } static_inline void list_change_head(struct inm_list_head *newhead, struct inm_list_head *oldhead) { INM_INIT_LIST_HEAD(newhead); if(inm_list_empty(oldhead)) return; newhead->next = oldhead->next; newhead->prev = oldhead->prev; newhead->prev->next = newhead; newhead->next->prev = newhead; } struct host_dev_context; void get_time_stamp(inm_u64_t *); void get_time_stamp_tag(TIME_STAMP_TAG_V2 *); inm_s32_t validate_path_for_file_name(char *); inm_s32_t validate_pname(char *); inm_s32_t get_volume_size(int64_t *vol_size, inm_s32_t *inmage_status); inm_s32_t default_granularity_from_volume_size(inm_u64_t volume_size); char *convert_path(char *path_name); long inm_mkdir(char *dirname, inm_s32_t mode); inm_u32_t inm_atoi(const char *); inm_u64_t inm_atoi64(const char *); inm_ull64_t inm_atoull64(const char *); char *convert_str_to_path(char *str); void chg_psname_as_inmageproc(char *psname); inm_s32_t is_digit(const char *, int); inm_s32_t get_path_memory(char **); void free_path_memory(char **); inm_device_t filter_dev_type_get(char *); char * filter_guid_name_string_get(char *guid, char *name, inm_s32_t len); inm_s32_t filter_dev_type_set(struct _target_context *, inm_device_t ); inm_s32_t read_value_from_file(char *, inm_s32_t *); char * read_string_from_file(char *fname, char *buf, inm_s32_t len); void inm_flush_ts_and_seqno(struct _wqentry *wqep); inm_s32_t inm_flush_clean_shutdown(inm_u32_t); void inm_flush_ts_and_seqno_to_file(inm_u32_t force); void inm_close_ts_and_seqno_file(void); inm_u32_t inm_comp_io_bkt_idx(inm_u32_t); inm_s32_t write_vol_attr(struct _target_context * ctxt, const char *file_name, void *buf, inm_s32_t len); void inm_free_host_dev_ctx(struct host_dev_context *hdcp); inm_u32_t is_AT_blocked(void); struct _tag_info* cnvt_tag_info2stream(struct _tag_info *, inm_s32_t, inm_u32_t); struct _tag_info* cnvt_stream2tag_info(struct _tag_info *, inm_s32_t); inm_s32_t inm_all_AT_cdb_send(struct _target_context *, unsigned char *, inm_u32_t, inm_s32_t, unsigned char *, inm_u32_t, inm_u32_t); inm_s32_t inm_form_tag_cdb(struct _target_context *, tag_info_t *, inm_s32_t); inm_s32_t inm_heartbeat_cdb(struct _target_context *); inm_s32_t inm_erase_resync_info_from_persistent_store(char *); void inm_get_tag_marker_guid(char *, inm_u32_t, char *, inm_u32_t); #if defined(RHEL_MAJOR) && (RHEL_MAJOR == 5) int 
sprintf_s(char *buf, size_t bufsz, const char *fmt, ...); #endif #define GET_TIME_STAMP_IN_USEC(tsp) do { get_time_stamp(&tsp); INM_DO_DIV(tsp, 10);} while(0) #endif involflt-0.1.0/src/db_routines.h0000755000000000000000000000263314467303177015343 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : db_routines.h */ inm_s32_t set_volume_out_of_sync(target_context_t *vcptr, inm_u64_t out_of_sync_error_code, inm_s32_t status_to_log); void set_volume_out_of_sync_worker_routine(wqentry_t *wqe); inm_s32_t queue_worker_routine_for_set_volume_out_of_sync(target_context_t *vcptr, int64_t out_of_sync_error_code, inm_s32_t status); inm_s32_t stop_filtering_device(target_context_t *vcptr, inm_s32_t lock_acquired, volume_bitmap_t **vbmap_ptr); void add_resync_required_flag(UDIRTY_BLOCK_V2 *udb, target_context_t *vcptr); void reset_volume_out_of_sync(target_context_t *vcptr); involflt-0.1.0/src/bitmap_api.h0000755000000000000000000002527314467303177015140 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
 */

#ifndef _INMAGE_BITMAP_API_H
#define _INMAGE_BITMAP_API_H

#include "involflt-common.h"
#include "change-node.h"
#include "iobuffer.h"

/* these are flags used in the bitmap file header */
#define BITMAP_FILE_VERSION1 (0x00010004)
#define BITMAP_FILE_VERSION2 (0x0001000C)
#define BITMAP_FILE_VERSION BITMAP_FILE_VERSION2
#define BITMAP_FILE_ENDIAN_FLAG (0x000000FF)
#define MAX_WRITE_GROUPS_IN_BITMAP_HEADER (31)
#define MAX_CHANGES_IN_WRITE_GROUP (64)
#define DISK_SECTOR_SIZE (512)
#define HEADER_CHECKSUM_SIZE (16)
#define HEADER_CHECKSUM_DATA_SIZE (sizeof(bitmap_header_t) - \
						HEADER_CHECKSUM_SIZE)
#define LOG_HEADER_SIZE ((DISK_SECTOR_SIZE * \
		MAX_WRITE_GROUPS_IN_BITMAP_HEADER) + DISK_SECTOR_SIZE)
#define LOG_HEADER_OFFSET (LOG_HEADER_SIZE)

struct _volume_bitmap;
struct _target_context;

/** log header
 * +--------------------------------------------------------------------+
 * | validation_checksum | endian | header_size | version | data_offset |
 * +--------------------------------------------------------------------+
 * | bitmap_offset | bitmap_size | bitmap_granularity |
 * +--------------------------------------------------------------------+
 * | volume_size | recovery_state | last_chance_changes | boot_cycles |
 * +--------------------------------------------------------------------+
 * | changes_lost | resync_required | resync_errcode | resync_errstatus |
 * +--------------------------------------------------------------------+
 */
typedef struct _logheader_tag {
	inm_u8_t validation_checksum[HEADER_CHECKSUM_SIZE];
	inm_u32_t endian;
	inm_u32_t header_size;
	inm_u32_t version;
	inm_u32_t data_offset;
	inm_u64_t bitmap_offset;
	inm_u64_t bitmap_size;
	inm_u64_t bitmap_granularity;
	int64_t volume_size;
	inm_u32_t recovery_state;
	inm_u32_t last_chance_changes;
	inm_u32_t boot_cycles;
	inm_u32_t changes_lost;
	/* V2 */
	inm_u32_t resync_required;
	inm_u32_t resync_errcode;
	inm_u64_t resync_errstatus;
} logheader_t;

#define BITMAP_HDR1_SIZE offsetof(logheader_t, resync_required)
#define BITMAP_HDR2_SIZE sizeof(logheader_t)

/* recovery states */
#define BITMAP_LOG_RECOVERY_STATE_UNINITIALIZED 0
#define BITMAP_LOG_RECOVERY_STATE_CLEAN_SHUTDOWN 1
#define BITMAP_LOG_RECOVERY_STATE_DIRTY_SHUTDOWN 2
#define BITMAP_LOG_RECOVERY_STATE_LOST_SYNC 3

typedef struct _last_chance_changes_tag {
	union {
		inm_u64_t length_offset_pair[MAX_CHANGES_IN_WRITE_GROUP];
		inm_u8_t sector_fill[DISK_SECTOR_SIZE];
	}un;
} last_chance_changes_t;

/* Last two bytes are marked 00 to make len = 0 for length_offset_pair */
#define BITMAP_LCW_SIGNATURE_SUFFIX 0x57434C000000 /* LCW00 */
#define BITMAP_LCW_SIGNATURE_PREFIX_SZ 3 /* bytes */

/** Bitmap header
 * +----------------------------------+
 * | log header |
 * +----------------------------------+
 * | length and offset pair group 1 --+-----------+
 * +----------------------------------+ |
 * | length and offset pair group 2 --+------+ |
 * +----------------------------------+ | |
 * | ... | | |
 * +----------------------------------+ | |
 * | length and offset pair group 31--+--+ | |
 * +----------------------------------+ | | |
 * | | |
 * +------------------------------------+ | |
 * | +------------------------------------+ V
 * | | +---------------------------------------------------------------------+
 * | | | length_offset_pair1 | length_offset_pair2 | .. length_offset_pair64 |
 * | | +---------------------------------------------------------------------+
 * | |
 * | | +---------------------------------------------------------------------+
 * | + --->| length_offset_pair1 | length_offset_pair2 | ..
length_offset_pair64 | * | +---------------------------------------------------------------------+ * | ... * | ... * | ... * | +---------------------------------------------------------------------+ * +--->| length_offset_pair1 | length_offset_pair2 | .. length_offset_pair64 | * +---------------------------------------------------------------------+ */ typedef struct _bitmap_header_tag { union { logheader_t header; inm_u8_t sector_fill[DISK_SECTOR_SIZE]; }un; last_chance_changes_t change_groups[MAX_WRITE_GROUPS_IN_BITMAP_HEADER]; } bitmap_header_t; /* Bitrun data structure * * +----------------------------------+ * | final_status | nbr_runs_processed| * +----------------------------------+ * | context 1 | context 2 | * +----------------------------------+ * | completion_callback | * +----------------------------------+ * | # of runs (nbr_runs) | * +----------------------------------+ * | meta_page_list | * +----------------------------------+ +-------------------+ * | disk change 1 --------+-------->| offset | length | * +----------------------------------+ +-------------------+ * | disk change 2 --------+--+ * +----------------------------------+ | +-------------------+ * | ... | +----->| offset | length | * +----------------------------------+ +-------------------+ * | disk change N --------+--+ ... * +----------------------------------+ | +-------------------+ * +----->| offset | length | * +-------------------+ */ struct _disk_chg; typedef struct _bitruns_tag { inm_u32_t final_status; inm_u32_t nbr_runs_processed; void *context1; void *context2; void (*completion_callback)(struct _bitruns_tag *runs); inm_ull64_t nbr_runs; struct inm_list_head meta_page_list; disk_chg_t *runs; } bitruns_t; typedef struct _bitmap_api_tag { inm_u64_t volume_size; inm_u32_t bitmap_granularity; inm_u32_t bitmap_size_in_bytes; inm_u32_t bitmap_offset; inm_u32_t nr_bits_in_bitmap; inm_u32_t bitmap_file_state; inm_sem_t sem; segmented_bitmap_t *sb; fstream_t *fs; fstream_segment_mapper_t *fssm; inm_u8_t corrupt_bitmap; inm_u8_t empyt_bitmap; inm_u8_t new_bitmap; inm_u8_t volume_insync; inm_s32_t err_causing_outofsync; char bitmap_filename[INM_NAME_MAX + 1]; bitmap_header_t bitmap_header; iobuffer_t *io_bitmap_header; inm_u32_t segment_cache_limit; char volume_name[INM_NAME_MAX + 1]; inm_dev_t bmapdev; } bitmap_api_t; /* bitmap file state */ #define BITMAP_FILE_STATE_UNINITIALIZED 0 #define BITMAP_FILE_STATE_OPENED 1 #define BITMAP_FILE_STATE_RAWIO 2 #define BITMAP_FILE_STATE_CLOSED 3 /* bitmap api operations */ bitmap_api_t *bitmap_api_ctr(void); void bitmap_api_dtr(bitmap_api_t *bmap); inm_s32_t initialize_bitmap_api(void); inm_s32_t terminate_bitmap_api(void); inm_s32_t bitmap_api_open(bitmap_api_t *bapi, struct _target_context *vcptr, inm_u32_t granularity, inm_u32_t offset, inm_u64_t volume_size, char *volume_name, inm_u32_t segment_cache_limit, inm_s32_t *detailed_status); inm_s32_t bitmap_api_load_bitmap_header_from_filestream(bitmap_api_t *bapi, inm_s32_t *detailed_status, inm_s32_t was_created); inm_s32_t bitmap_api_is_volume_insync(bitmap_api_t *bapi, inm_u8_t *volume_in_sync, inm_s32_t *out_of_sync_err_code); inm_s32_t bitmap_api_is_bitmap_closed(bitmap_api_t *bapi); inm_s32_t bitmap_api_close(bitmap_api_t *bapi, inm_s32_t *close_status); inm_s32_t bitmap_api_setbits(bitmap_api_t *bapi, bitruns_t *bruns, struct _volume_bitmap *vbmap); inm_s32_t bitmap_api_clearbits(bitmap_api_t *bapi, bitruns_t *bruns); inm_s32_t bitmap_api_get_first_runs(bitmap_api_t *bapi, bitruns_t *bruns); inm_s32_t 
bitmap_api_get_next_runs(bitmap_api_t *bapi, bitruns_t *bruns); inm_s32_t bitmap_api_clear_all_bits(bitmap_api_t *bapi); inm_s32_t move_rawio_changes_to_bitmap(bitmap_api_t *bapi, inm_s32_t *inmage_open_status); inm_s32_t bitmap_api_verify_bitmap_header(bitmap_header_t *bh); inm_s32_t bitmap_api_init_bitmap_file(bitmap_api_t *bapi, inm_s32_t *inmage_status); inm_s32_t bitmap_api_commit_bitmap_internal(bitmap_api_t *, int, inm_s32_t *); inm_s32_t bitmap_api_fast_zero_bitmap(bitmap_api_t *bapi); inm_s32_t bitmap_api_commit_header(bitmap_api_t *bapi, inm_s32_t verify_existing_hdr_for_raw_io, inm_s32_t *inmage_status); void bitmap_api_calculate_hdr_integrity_checksums(bitmap_header_t *bhdr); inm_s32_t bitmap_api_read_and_verify_bitmap_header(bitmap_api_t *bapi, inm_s32_t *inmage_status); inm_s32_t bitmap_api_verify_header(bitmap_api_t *bapi, bitmap_header_t *bheader); inm_s32_t is_volume_in_sync(bitmap_api_t *bapi, inm_s32_t *vol_in_sync, inm_s32_t *out_of_sync_err_code); inm_s32_t bitmap_api_open_bitmap_stream(bitmap_api_t *bapi, struct _target_context *vcptr, inm_s32_t *detailed_status); inm_s32_t is_bmaphdr_loaded(struct _volume_bitmap *vbmap); struct bmap_bit_stats { unsigned long long bbs_bmap_gran; int bbs_nr_prev_bits; int bbs_max_nr_bits_in_chg; int bbs_nr_dbs; unsigned long long bbs_curr_db_sz; int bbs_nr_chgs_in_curr_db; }; typedef struct bmap_bit_stats bmap_bit_stats_t; inm_u64_t bitmap_api_get_dat_bytes_in_bitmap(bitmap_api_t *bapi, bmap_bit_stats_t *); inm_s32_t bitmap_api_map_file_blocks(bitmap_api_t *, fstream_raw_hdl_t **); inm_s32_t bitmap_api_switch_to_rawio_mode(bitmap_api_t *, inm_u64_t *); void bitmap_api_set_volume_out_of_sync(bitmap_api_t *, inm_u64_t, inm_u32_t); #endif /* _INMAGE_BITMAP_API_H */ involflt-0.1.0/src/change-node.h0000755000000000000000000002022714467303177015175 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
 */

/*
 * File : change-node.h
 */

#ifndef LINVOLFLT_CHANGE_NODE_H
#define LINVOLFLT_CHANGE_NODE_H

#include "utils.h"
#include "osdep.h"
#include "telemetry-types.h"

extern const inm_s32_t sv_chg_sz;
extern const inm_s32_t sv_const_sz;

struct _target_context;
struct _wqentry;
struct _write_metadata_tag;

#define KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE 0x00000001
#define KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE 0x00000002
#define KDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE 0x00000004
#define KDIRTY_BLOCK_FLAG_SPLIT_CHANGE_MASK \
	(KDIRTY_BLOCK_FLAG_START_OF_SPLIT_CHANGE | \
	 KDIRTY_BLOCK_FLAG_PART_OF_SPLIT_CHANGE | \
	 KDIRTY_BLOCK_FLAG_END_OF_SPLIT_CHANGE)

#define CHANGE_NODE_FLAGS_QUEUED_FOR_DATA_WRITE 0x00000008
#define CHANGE_NODE_FLAGS_ERROR_IN_DATA_WRITE 0x00000010
#define CHANGE_NODE_RESYNC_FLAG_SENT_TO_S2 0x00000020
#define CHANGE_NODE_DATA_STREAM_FINALIZED 0x00000040
#define CHANGE_NODE_DATA_PAGES_MAPPED_TO_S2 0x00000080
#define CHANGE_NODE_TAG_IN_STREAM 0x00000100
#define CHANGE_NODE_ORPHANED 0x00000200
#define CHANGE_NODE_COMMITTED 0x00000400
#define CHANGE_NODE_IN_NWO_CLOSED 0x00000800
#define CHANGE_NODE_DRAIN_BARRIER 0x00001000
#define CHANGE_NODE_FAILBACK_TAG 0x00002000
#define CHANGE_NODE_BLOCK_DRAIN_TAG 0x00004000
#define CHANGE_NODE_ALLOCED_FROM_POOL 0x00008000
#define KDIRTY_BLOCK_FLAG_PREPARED_FOR_USERMODE 0x80000000

#define MAX_KDIRTY_CHANGES (MAX_CHANGE_INFOS_PER_PAGE)

#define NOT_IN_IO_PATH 0x0001
#define IN_IO_PATH 0x0002
#define IN_IOCTL_PATH 0x0004
#define IN_GET_DB_PATH 0x0008
#define IN_BMAP_READ_PATH 0x0010

/* Change node types */
typedef enum {
	NODE_SRC_UNDEFINED = 0,
	NODE_SRC_DATA = 1,
	NODE_SRC_METADATA = 2,
	NODE_SRC_TAGS = 3,
	NODE_SRC_DATAFILE = 4,
}node_type_t;

/* disk_chg_t is maintained separately for metadata, since it is possible
 * to store duplicate entries in a single change node */
struct _disk_chg {
	inm_u64_t offset;
	inm_u32_t length;
	inm_u32_t seqno_delta;
	inm_u32_t time_delta;
};
typedef struct _disk_chg disk_chg_t;

typedef struct {
	TIME_STAMP_TAG_V2 start_ts;
	TIME_STAMP_TAG_V2 end_ts;
	struct inm_list_head md_pg_list; /* metadata page(inm_page_t) list */
	unsigned long *cur_md_pgp; /* curr meta data page */
	inm_s32_t num_data_pgs;
	inm_s32_t bytes_changes;
	unsigned short change_idx;
} disk_chg_head_t;

#define MAX_CHANGE_INFOS_PER_PAGE (INM_PAGESZ/sizeof(disk_chg_t))

struct _target_context;

struct _change_node {
	inm_atomic_t ref_cnt;
	node_type_t type;
	etWriteOrderState wostate;
	inm_s32_t flags;
	inm_s64_t transaction_id;
	inm_s32_t mutext_initialized;
	/* This mutex protects the data pages shared
	 * between data mode and data file mode.
	 */
	inm_sem_t mutex;
	struct inm_list_head next;
	struct inm_list_head nwo_dmode_next;
	struct inm_list_head data_pg_head;
	data_page_t *cur_data_pg;
	inm_s32_t cur_data_pg_off;
	inm_s32_t data_free;
	inm_addr_t mapped_address;
	inm_task_struct *mapped_thread;
	char *data_file_name;
	inm_s32_t data_file_size;
	inm_s32_t stream_len;
	disk_chg_head_t changes;
	struct _target_context *vcptr;
	/* for split change nodes */
	inm_u32_t seq_id_for_split_io;
	tag_guid_t *tag_guid;
	inm_s32_t tag_status_idx;
	inm_u64_t dbret_ts_in_usec;
	tag_history_t *cn_hist;
};
typedef struct _change_node change_node_t;

#define IS_CHANGE_NODE_DRAIN_BARRIER(c) \
		((c)->flags & CHANGE_NODE_DRAIN_BARRIER)

#define get_drtd_len(node) (((change_node_t *)node)->changes.bytes_changes + \
		(((change_node_t *)node)->changes.change_idx * sv_chg_sz))
#define get_strm_len(node) (sv_const_sz + get_drtd_len(node))

#define CHANGE_NODE_IS_FIRST_DATA_PAGE(node, pg) \
	((void *)PG_ENTRY(pg->next.prev) == (void *)&node->data_pg_head)
#define CHANGE_NODE_IS_LAST_DATA_PAGE(node, pg) \
	((void *)PG_ENTRY(pg->next.next) == (void *)&node->data_pg_head)

#ifndef INM_AIX
static_inline void unmap_change_node(change_node_t *chg_node)
{
	inm_s32_t _len = 0, ret = 0;

	if(!chg_node->mapped_address || !chg_node->changes.num_data_pgs)
		return;

	_len = pages_to_bytes(chg_node->changes.num_data_pgs);
	ret = INM_DO_STREAM_UNMAP(chg_node->mapped_address, _len);
	if (ret) {
		err("INM_DO_STREAM_UNMAP() failed w/ err = %d", ret);
	} else {
		dbg("Unmapped user address 0x%p len %d\n",
			(unsigned long *)chg_node->mapped_address, _len);
	}
}
#endif

void cleanup_change_node(change_node_t *);

static_inline void ref_chg_node(change_node_t *node)
{
	INM_ATOMIC_INC(&node->ref_cnt);
}

static_inline void deref_chg_node(change_node_t *node)
{
	if(INM_ATOMIC_DEC_AND_TEST(&node->ref_cnt)) {
		cleanup_change_node(node);
	}
}

struct inm_writedata;
change_node_t *get_oldest_change_node(struct _target_context *, inm_s32_t *);
change_node_t *get_oldest_datamode_change_node(struct _target_context *);
change_node_t *get_change_node_to_update(struct _target_context *,
				struct inm_writedata *, inm_tsdelta_t *);
void cleanup_change_nodes(struct inm_list_head *, etTagStateTriggerReason);
void free_changenode_list(struct _target_context *ctxt,
				etTagStateTriggerReason);
void changenode_cleanup_routine(struct _wqentry *wqe);
void change_node_cleanup_worker_routine(struct _wqentry *wqe);
inm_s32_t queue_changenode_cleanup_worker_routine(change_node_t *cnode,
				etTagStateTriggerReason);
inm_s32_t queue_worker_routine_for_change_node_cleanup(change_node_t *);
inm_s32_t init_change_node(change_node_t *, int, int, struct inm_writedata *);
change_node_t *get_change_node_to_save_as_file(struct _target_context *);
change_node_t *get_change_node_for_usertag(struct _target_context *,
				struct inm_writedata *, int commit_pending);
void commit_change_node(change_node_t *change_node);
inm_page_t *get_page_from_page_pool(int, int, struct inm_writedata *);
change_node_t *inm_alloc_change_node(struct inm_writedata *, unsigned);
void inm_free_change_node(change_node_t *);
void inm_free_metapage(inm_page_t *);
void update_change_node(change_node_t *chg_node,
		struct _write_metadata_tag *wmd, inm_tsdelta_t *tdp);
void inm_get_ts_and_seqno_deltas(change_node_t *, inm_tsdelta_t *);
void close_change_node(change_node_t *, inm_u32_t);
void print_chg_info(change_node_t *cnp, unsigned short idx);
inm_s32_t fill_udirty_block(struct _target_context *ctxt,
		UDIRTY_BLOCK_V2 *udirty, inm_devhandle_t *filp);
inm_s32_t perform_commit(struct _target_context
*ctxt, COMMIT_TRANSACTION *commit, inm_devhandle_t *filp);
inm_s32_t commit_usertag(struct _target_context *ctxt);
void revoke_usertag(struct _target_context *ctxt, int timedout);

static_inline data_page_t *
get_next_data_page(struct inm_list_head *node, inm_s32_t *pg_free,
			inm_s32_t *pg_offset, change_node_t *chg_node)
{
	if (node == &chg_node->data_pg_head)
		return NULL;

	*pg_free = INM_PAGESZ;
	*pg_offset = 0;
	return inm_list_entry(node, data_page_t, next);
}

inm_s32_t verify_change_node_file(change_node_t *cnode);
void do_perf_changes(struct _target_context *tgt_ctxt,
			change_node_t *recent_cnode, int path);

#if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)
void do_perf_changes_all(struct _target_context *tgt_ctxt, int path);
void move_chg_nodes_to_drainable_queue(void);
#endif

#endif
involflt-0.1.0/src/work_queue.c0000755000000000000000000005010714467303177015206 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */

/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#include "involflt.h"
#include "involflt-common.h"
#include "data-mode.h"
#include "utils.h"
#include "change-node.h"
#include "filestream.h"
#include "file-io.h"
#include "iobuffer.h"
#include "filestream_segment_mapper.h"
#include "segmented_bitmap.h"
#include "bitmap_api.h"
#include "VBitmap.h"
#include "work_queue.h"
#include "data-file-mode.h"
#include "target-context.h"
#include "driver-context.h"
#include "tunable_params.h"

#define WORKER_THREAD_TIMEOUT 1000 /* clock ticks */

extern driver_context_t *driver_ctx;
static inm_s32_t reorg_datapool(inm_u32_t);

#ifdef INM_LINUX
extern inm_s32_t driver_state;
#endif

int timer_worker(void *context)
{
	workq_t *work_q = &driver_ctx->dc_tqueue;
	long timeout_val = WORKER_THREAD_TIMEOUT;
	inm_irqflag_t lock_flag = 0;
	struct inm_list_head workq_list_head;
	struct inm_list_head *ptr = NULL, *nextptr = NULL;
	wqentry_t *wqeptr = NULL;
	inm_s32_t shutdown_event, wakeup_event;

	INM_DAEMONIZE("inmtmrd");

	timeout_val = INM_MSECS_TO_JIFFIES(INM_MSEC_PER_SEC);

	while (1) {
		dbg("Sleeping");
		INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(&work_q->new_event_completion);
		dbg("Awoken");

		INM_INIT_LIST_HEAD(&workq_list_head);

		wakeup_event = INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(work_q->wakeup_event,
				INM_ATOMIC_READ(&work_q->wakeup_event_raised),
				timeout_val);

		shutdown_event = (wakeup_event == 0) &&
			INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(work_q->shutdown_event,
				INM_ATOMIC_READ(&work_q->shutdown_event_raised),
				INM_MSECS_TO_JIFFIES(5));

		if (shutdown_event > 0) {
			dbg("Received shutdown event");
			INM_ATOMIC_DEC(&work_q->shutdown_event_raised);
			work_q->flags |= WQ_FLAGS_THREAD_SHUTDOWN;
			break;
		}

		if (wakeup_event) {
			dbg("Received wakeup for timer");
			INM_ATOMIC_DEC(&work_q->wakeup_event_raised);

			INM_SPIN_LOCK_IRQSAVE(&work_q->lock, lock_flag);
			inm_list_replace_init(&work_q->worker_queue_head,
						&workq_list_head);
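			/*
			 * Pending entries were detached onto the local list
			 * while holding the lock; they are drained below
			 * without it, so add_item_to_work_queue() is never
			 * blocked while timeout work items run.
			 */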
			INM_SPIN_UNLOCK_IRQRESTORE(&work_q->lock, lock_flag);

			inm_list_for_each_safe(ptr, nextptr, &workq_list_head) {
				wqeptr = inm_list_entry(ptr, wqentry_t, list_entry);
				if (wqeptr->flags & WITEM_TYPE_TIMEOUT) {
					inm_list_del_init(ptr);
					if (wqeptr && wqeptr->work_func)
						wqeptr->work_func(wqeptr);
					else
						dbg("timeout without work item");
				} else {
					err("unknown wqe type");
					INM_BUG_ON(!(wqeptr->flags & WITEM_TYPE_TIMEOUT));
				}
			}
		}
	}

	dbg("Timer thread dying");
	INM_COMPLETE_AND_EXIT(&work_q->worker_thread_completion, 0);
}

inm_s32_t init_work_queue(workq_t *work_q,
			  int (*worker_thread_function)(void *))
{
	inm_pid_t pid;
	inm_s32_t err = 0;
	struct task_struct *thread_task = NULL;

	if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
		info("entered");
	}

	if (work_q == NULL)
		return -ENOMEM;

	INM_MEM_ZERO(work_q, sizeof(*work_q));
	INM_INIT_SPIN_LOCK(&work_q->lock);
	INM_INIT_LIST_HEAD(&work_q->worker_queue_head);
	INM_INIT_WAITQUEUE_HEAD(&work_q->wakeup_event);
	INM_INIT_WAITQUEUE_HEAD(&work_q->shutdown_event);
	INM_ATOMIC_SET(&work_q->wakeup_event_raised, 0);
	INM_ATOMIC_SET(&work_q->shutdown_event_raised, 0);
	INM_INIT_COMPLETION(&work_q->worker_thread_completion);
	INM_INIT_COMPLETION(&work_q->new_event_completion);

	if (worker_thread_function == NULL)
		worker_thread_function = generic_worker_thread_function;
	work_q->worker_thread_routine = worker_thread_function;

#ifdef INM_LINUX
	pid = INM_KERNEL_THREAD(thread_task, worker_thread_function, work_q,
				sizeof(work_q), "inmwrkrd");
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0)
	work_q->task = thread_task;
#endif
#else
	pid = INM_KERNEL_THREAD(worker_thread_function, work_q,
				sizeof(work_q), "inmwrkrd");
#endif
	if (pid >= 0) {
		info("worker thread with pid = %d has been created", pid);
		work_q->worker_thread_initialized = 1;
		INM_COMPLETE(&work_q->new_event_completion);
	}

	err = work_q->worker_thread_initialized == 0 ?
pid : 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", err); } return err; } void cleanup_work_queue(workq_t *work_q) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (work_q == NULL || work_q->worker_thread_initialized == 0) return; INM_ATOMIC_INC(&work_q->shutdown_event_raised); INM_WAKEUP_INTERRUPTIBLE(&work_q->shutdown_event); INM_COMPLETE(&work_q->new_event_completion); INM_WAIT_FOR_COMPLETION(&work_q->worker_thread_completion); INM_KTHREAD_STOP(work_q->task); INM_DESTROY_COMPLETION(&work_q->worker_thread_completion); INM_DESTROY_COMPLETION(&work_q->new_event_completion); work_q->worker_thread_initialized = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return; } void init_work_queue_entry(wqentry_t *wqe) { INM_MEM_ZERO(wqe, sizeof(*wqe)); INM_ATOMIC_SET(&wqe->refcnt, 1); INM_INIT_LIST_HEAD(&wqe->list_entry); } wqentry_t *alloc_work_queue_entry(inm_u32_t gfpmask) { wqentry_t *wqe = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } wqe = (wqentry_t *)INM_KMEM_CACHE_ALLOC(driver_ctx->wq_entry_pool, gfpmask); if (!wqe) return NULL; init_work_queue_entry(wqe); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return wqe; } void cleanup_work_queue_entry(wqentry_t *wqe) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } INM_KMEM_CACHE_FREE(driver_ctx->wq_entry_pool, wqe); wqe = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return; } /*reference work queue entry - increment refcnt*/ void get_work_queue_entry(wqentry_t *wqe) { INM_ATOMIC_INC(&wqe->refcnt); return; } /*dereference work queue entry - decrement refcnt*/ void put_work_queue_entry(wqentry_t *wqe) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } if (INM_ATOMIC_DEC_AND_TEST(&wqe->refcnt)) cleanup_work_queue_entry(wqe); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return; } inm_s32_t add_item_to_work_queue(workq_t *work_q, wqentry_t *wq_entry) { inm_s32_t r = 0; unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } if (work_q == NULL || work_q->worker_thread_initialized == 0) return 1; INM_SPIN_LOCK_IRQSAVE(&work_q->lock, lock_flag); if (work_q->flags & WQ_FLAGS_THREAD_SHUTDOWN) { r = 1; } else { inm_list_add_tail(&wq_entry->list_entry, &work_q->worker_queue_head); work_q->flags |= WQ_FLAGS_THREAD_WAKEUP; INM_ATOMIC_INC(&work_q->wakeup_event_raised); INM_WAKEUP_INTERRUPTIBLE(&work_q->wakeup_event); INM_COMPLETE(&work_q->new_event_completion); } INM_SPIN_UNLOCK_IRQRESTORE(&work_q->lock, lock_flag); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving with ret value = %d", r); } return r; } int generic_worker_thread_function(void *context) { workq_t *work_q = &driver_ctx->wqueue; long timeout_val = WORKER_THREAD_TIMEOUT; struct inm_list_head *ptr = NULL, *nextptr = NULL; struct inm_list_head workq_list_head; wqentry_t *wqeptr = NULL; inm_s32_t shutdown_event, wakeup_event; inm_irqflag_t lock_flag = 0; inm_u64_t prev_ts_100nsec = 0, cur_ts_100nsec = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered"); } INM_DAEMONIZE("inmwrkrd"); timeout_val = INM_MSECS_TO_JIFFIES(INM_MSEC_PER_SEC); while (1) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("waiting for new event completion in worker thread \n"); } 
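		/*
		 * Two-level gate: new_event_completion is signalled once per
		 * queued item (and on shutdown), while the wait-event checks
		 * below tell a wakeup apart from a shutdown and bound the
		 * wait with a timeout.
		 */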
		INM_WAIT_FOR_COMPLETION(&work_q->new_event_completion);

		INM_INIT_LIST_HEAD(&workq_list_head);

		wakeup_event = INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(work_q->wakeup_event,
				INM_ATOMIC_READ(&work_q->wakeup_event_raised),
				timeout_val);

		shutdown_event = (wakeup_event == 0) &&
			INM_WAIT_EVENT_INTERRUPTIBLE_TIMEOUT(work_q->shutdown_event,
				INM_ATOMIC_READ(&work_q->shutdown_event_raised),
				timeout_val);

		if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
			dbg("worker thread wakeup_event %d shutdown_event %d \n",
				wakeup_event, shutdown_event);
		}

		if (shutdown_event > 0) {
			INM_ATOMIC_DEC(&work_q->shutdown_event_raised);
			work_q->flags |= WQ_FLAGS_THREAD_SHUTDOWN;
			break;
		}

		/* If reorg_datapool() were called only when there is no
		 * wakeup_event, reorg might not get a chance to run for a
		 * long time when many work items have to be processed.
		 */
		reorg_datapool(work_q->flags);
		work_q->flags &= ~WQ_FLAGS_REORG_DP_ALLOC;

		if (!wakeup_event) {
			inm_flush_ts_and_seqno_to_file(FALSE);
			INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag);
			if(!(driver_ctx->dc_flags & SYS_CLEAN_SHUTDOWN) &&
				!(driver_ctx->dc_flags & SYS_UNCLEAN_SHUTDOWN) &&
				driver_state & DRV_LOADED_FULLY) {
				INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
				if(inm_flush_clean_shutdown(UNCLEAN_SHUTDOWN))
					driver_ctx->dc_flags |= SYS_UNCLEAN_SHUTDOWN;
			}else
				INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);

			INM_COMPLETE(&work_q->new_event_completion);
#ifdef INM_AIX
			do{
				inm_s32_t flag;

				INM_SPIN_LOCK(&logger->log_buffer_lock, flag);
				if(logger->log_buffer_count >= LOG_THRESHOLD){
					INM_SPIN_UNLOCK(&logger->log_buffer_lock, flag);
					inm_flush_log_file();
					INM_SPIN_LOCK(&logger->log_buffer_lock, flag);
				}
				INM_SPIN_UNLOCK(&logger->log_buffer_lock, flag);
			}while(0);
#endif
			continue;
		}

		if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
			dbg("received wakeup event in worker thread\n");
			dbg("worker thread wakeup_event_raised %d ",
				work_q->wakeup_event_raised);
		}

		INM_ATOMIC_DEC(&work_q->wakeup_event_raised);

		INM_SPIN_LOCK_IRQSAVE(&work_q->lock, lock_flag);
		inm_list_for_each_safe(ptr, nextptr, &work_q->worker_queue_head) {
			wqeptr = inm_list_entry(ptr, wqentry_t, list_entry);
			inm_list_del_init(ptr);
			inm_list_add_tail(&wqeptr->list_entry,&workq_list_head);
		}
		INM_SPIN_UNLOCK_IRQRESTORE(&work_q->lock, lock_flag);

		get_time_stamp(&prev_ts_100nsec);
		inm_list_for_each_safe(ptr, nextptr, &workq_list_head) {
			wqeptr = inm_list_entry(ptr, wqentry_t, list_entry);
			inm_list_del(&wqeptr->list_entry);

			/* flush time stamp and seq no*/
			get_time_stamp(&cur_ts_100nsec);
			if ((cur_ts_100nsec - prev_ts_100nsec) >=
					HUNDREDS_OF_NANOSEC_IN_SECOND) {
				inm_flush_ts_and_seqno_to_file(FALSE);
				INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag);
				if (!(driver_ctx->dc_flags & SYS_CLEAN_SHUTDOWN) &&
					!(driver_ctx->dc_flags & SYS_UNCLEAN_SHUTDOWN) &&
					driver_state & DRV_LOADED_FULLY) {
					INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
					if(inm_flush_clean_shutdown(UNCLEAN_SHUTDOWN))
						driver_ctx->dc_flags |= SYS_UNCLEAN_SHUTDOWN;
				}else
					INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
				prev_ts_100nsec = cur_ts_100nsec;
			}

			if (wqeptr->witem_type == WITEM_TYPE_SYSTEM_SHUTDOWN) {
				put_work_queue_entry(wqeptr);
				INM_COMPLETE(&driver_ctx->shutdown_completion);
				info("received sys shutdown message\n");
				continue;
			}

			if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){
				dbg("processing wqe in workq");
				dbg("work queue entry = %p", wqeptr);
				dbg("work item type = %d, refcnt = %d",
					((bitmap_work_item_t *) wqeptr->context)->eBitmapWorkItem,
					INM_ATOMIC_READ(&wqeptr->refcnt));
			}

			if (wqeptr->work_func)
				wqeptr->work_func(wqeptr);
		}
	}

	info("received shutdown event in worker thread\n");
	inm_close_ts_and_seqno_file();
	INM_COMPLETE_AND_EXIT(&work_q->worker_thread_completion, 0);
	return 0;
}

#define SAMPLE_WINDOW 0xa /* 10 samples for the rate of consumption of pages */
#define INM_REORG_WAITING_TIME_SEC 2

static inm_s32_t reorg_datapool(inm_u32_t reorg_flag)
{
#ifdef INM_AIX
	struct timestruc_t now;
#else
	inm_timespec now;
#endif
	static inm_u64_t recored_time;
	static inm_u64_t prev_dcr_time;
	inm_u64_t thrshld_time = driver_ctx->tunable_params.time_reorg_data_pool_sec;
	inm_u32_t cur_prot_vols = 0;
	static inm_u32_t prev_prot_vols;
	static inm_u32_t last_free_pages;
	inm_u32_t least_free_pages = 0;
	inm_u32_t expect_free = 0;
	inm_u32_t cur_free_pages = 0;
	inm_u32_t cur_allocd_pages;
	inm_u32_t alloc_limit = 0;
	inm_u32_t slab_nrpgs = 0;
	inm_u32_t data_pool_size = 0;
	static inm_u32_t index;
	static inm_u32_t *rate_array;
	inm_u32_t diff_sec = 0;
	inm_u32_t reorg = 0;
	inm_s32_t nr_pages = 0;
	inm_u32_t nr_slabs = 0;
	inm_u32_t default_dp_pages =(DEFAULT_DATA_POOL_SIZE_MB <<
					(MEGABYTE_BIT_SHIFT - INM_PAGESHIFT));
	inm_irqflag_t lock_flag;
	inm_s32_t rate = 0;
	inm_s32_t i = 0;
	inm_u32_t alloc_always = (reorg_flag & WQ_FLAGS_REORG_DP_ALLOC);
	inm_s32_t ret = 0;
	inm_u32_t factor = 0;

	INM_GET_CURRENT_TIME(now);
	if(!recored_time){
		INM_DOWN_READ(&(driver_ctx->tgt_list_sem));
		cur_prot_vols = driver_ctx->host_prot_volumes;
		INM_UP_READ(&(driver_ctx->tgt_list_sem));
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag);
		if(driver_ctx->dc_flags & DRV_DUMMY_LUN_CREATED){
			cur_prot_vols--;
		}
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
		last_free_pages = driver_ctx->dc_cur_unres_pages;
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
		prev_prot_vols = cur_prot_vols;
		recored_time = now.tv_sec;
		index = 0;
		rate_array = INM_KMALLOC(sizeof(inm_u32_t) * SAMPLE_WINDOW,
					INM_KM_SLEEP, INM_KERNEL_HEAP);
		INM_MEM_ZERO(rate_array, sizeof(inm_u32_t) * SAMPLE_WINDOW);
		prev_dcr_time = now.tv_sec;
	}

	/* Right now dp_wait_time is not a tunable, so we can access it without
	 * taking data_pages_lock. If it ever becomes tunable, we will need to
	 * take the lock.
	 */
	diff_sec = now.tv_sec - recored_time;
	if(diff_sec >= thrshld_time || alloc_always){
		INM_DOWN_READ(&(driver_ctx->tgt_list_sem));
		cur_prot_vols = driver_ctx->host_prot_volumes;
		INM_UP_READ(&(driver_ctx->tgt_list_sem));
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag);
		if(driver_ctx->dc_flags & DRV_DUMMY_LUN_CREATED){
			cur_prot_vols--;
		}
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag);
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
		slab_nrpgs = driver_ctx->data_flt_ctx.dp_nrpgs_slab;
		cur_free_pages = driver_ctx->dc_cur_unres_pages;
		cur_allocd_pages = driver_ctx->data_flt_ctx.pages_allocated;
		least_free_pages = driver_ctx->data_flt_ctx.dp_least_free_pgs;
		last_free_pages += (driver_ctx->data_flt_ctx.dp_pages_alloc_free);
		driver_ctx->data_flt_ctx.dp_pages_alloc_free = 0;
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
		INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag);
		data_pool_size = driver_ctx->tunable_params.data_pool_size;
		factor = driver_ctx->tunable_params.time_reorg_data_pool_factor;
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag);
		data_pool_size <<= (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT);

		/* If the rate of consumption of free pages is negative, pages
		 * are being released, so we can treat the rate as zero;
		 * releasing extra free pages is handled separately. */
		if (last_free_pages > cur_free_pages) {
			rate = last_free_pages - cur_free_pages;
		} else {
			rate = 0;
		}
		if(diff_sec && rate){
			/* The following two lines compute:
			 * rate = (last_free_pages - cur_free_pages + diff_sec - 1) / diff_sec;
			 */
			rate += diff_sec - 1;
			rate /= diff_sec;
		}
		rate_array[index++] = rate;
		index = ((index) % SAMPLE_WINDOW);

		/*
		 * It is very unlikely that alloc_always is set while
		 * cur_prot_vols is zero. For now, no pages are allocated when
		 * cur_prot_vols is zero even if alloc_always is set.
		 */
		if(!cur_prot_vols){
			if(prev_prot_vols){
				nr_pages = cur_allocd_pages - default_dp_pages;
				if(nr_pages > 0){
					delete_data_pages(nr_pages);
				}
			}
			goto reorg_done;
		}

		if(!prev_prot_vols || alloc_always){
			if(cur_allocd_pages > data_pool_size){
				goto reorg_done;
			}
			nr_pages = data_pool_size - cur_allocd_pages;
			nr_pages = MIN(slab_nrpgs, nr_pages);
			dbg("reconfig to allocate %u pages, alloc_always is %u, prev_prot volumes %u",
				nr_pages, alloc_always, prev_prot_vols);
			ret = add_data_pages(nr_pages);
			goto reorg_done;
		}

		/* Below we average the rate over SAMPLE_WINDOW iterations.
		 * For the first SAMPLE_WINDOW iterations the rate is
		 * miscalculated, but little happens in the first
		 * 2 * SAMPLE_WINDOW seconds, whether the driver is loaded at
		 * boot time or on a system that is already up and running.
		 */
		rate = 0;
		for (i = 0; i < SAMPLE_WINDOW; i++){
			rate += rate_array[i];
		}
		rate = ((rate + SAMPLE_WINDOW - 1)/SAMPLE_WINDOW);

		/*
		 * Extrapolate free page consumption over twice thrshld_time:
		 * on a very busy system we do not wake up after exactly
		 * thrshld_time, and the overestimate is harmless on idle
		 * systems.
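		 * As an arithmetic illustration of the lines below: with an
		 * averaged rate of 100 pages/sec, thrshld_time of 10 sec and
		 * an unchanged protected-volume count, 100 * 2 * 10 = 2000
		 * pages are projected to be consumed, so expect_free becomes
		 * cur_free_pages - 2000.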
		 */
		expect_free = cur_prot_vols * rate * 2 * thrshld_time;
		expect_free /= prev_prot_vols;
		expect_free = cur_free_pages - expect_free;
		alloc_limit = MIN(slab_nrpgs, cur_allocd_pages);
		alloc_limit = (alloc_limit * MIN_FREE_PAGES_TO_ALLOC_SLAB_PERCENT) / 100;
		if(expect_free <= alloc_limit){
			if(!(cur_allocd_pages < data_pool_size)){
				goto reorg_done;
			}
			nr_pages = data_pool_size - cur_allocd_pages;
			nr_pages = MIN(slab_nrpgs, nr_pages);
			dbg("reconfig to allocate %u pages", nr_pages);
			ret = add_data_pages(nr_pages);
			reorg = 1;
		} else {
			if(now.tv_sec - prev_dcr_time > 2 * factor * thrshld_time){
				if(least_free_pages > ((slab_nrpgs *
					MIN_FREE_PAGES_TO_FREE_LAST_WHOLE_SLAB_PERCENT) /100)){
					dbg("reconfig to delete the pages cur_unres pgs %u, cur_res pgs %u",
						driver_ctx->dc_cur_unres_pages,
						driver_ctx->dc_cur_res_pages);
					nr_pages = nr_slabs;
					if((cur_allocd_pages - nr_pages) < default_dp_pages){
						nr_pages = cur_allocd_pages - default_dp_pages;
					}
					delete_data_pages(nr_pages);
					reorg = 1;
				}
				prev_dcr_time = now.tv_sec;
				INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
				driver_ctx->data_flt_ctx.dp_least_free_pgs =
						driver_ctx->dc_cur_unres_pages;
				INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
			}else {
				if(now.tv_sec - prev_dcr_time < factor * thrshld_time){
					goto reorg_done;
				}
				if(least_free_pages >= (slab_nrpgs * 2)){
					dbg("reconfig to delete the pages");
					nr_slabs = (least_free_pages/slab_nrpgs) - 1;
					nr_pages = nr_slabs * slab_nrpgs;
					if((cur_allocd_pages - nr_pages) < default_dp_pages){
						nr_pages = cur_allocd_pages - default_dp_pages;
					}
					delete_data_pages(nr_pages);
					prev_dcr_time = now.tv_sec;
					INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
					driver_ctx->data_flt_ctx.dp_least_free_pgs =
							driver_ctx->dc_cur_unres_pages;
					INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag);
					reorg = 1;
				}
			}
		}
reorg_done:
		prev_prot_vols = cur_prot_vols;
		INM_GET_CURRENT_TIME(now);
		recored_time = now.tv_sec;
		last_free_pages = cur_free_pages;
		if(reorg){
			recalc_data_file_mode_thres();
		}
	}
	return ret;
}

inm_s32_t wrap_reorg_datapool()
{
	inm_u32_t flag = 0;

	flag |= WQ_FLAGS_REORG_DP_ALLOC;
	return reorg_datapool(flag);
}
involflt-0.1.0/src/file-io.h0000755000000000000000000000455314467303177014353 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */

/* Copyright (C) 2022 Microsoft Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 */

#ifndef _INM_FILE_IO_H
#define _INM_FILE_IO_H

#include "involflt-common.h"
#include "involflt.h"

/* Code to handle recursive writes: writes self-initiated by the driver
 * itself shouldn't be captured by the driver again.
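 * The helpers below suggest the scheme (an inference from their names, not
 * documented here): the inode's address_space operations are swapped for a
 * duplicate copy (INM_DUP_ADDR_SPACE_OPS) around driver-initiated file I/O,
 * so such writes can be recognized and skipped, and the original ops
 * (INM_ORG_ADDR_SPACE_OPS) are restored afterwards.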
*/ enum { INM_ORG_ADDR_SPACE_OPS = 0, INM_DUP_ADDR_SPACE_OPS = 1, INM_MAX_ADDR_OPS = 2, /* array size */ }; inma_ops_t *inm_alloc_inma_ops(void); void inm_free_inma_ops(inma_ops_t *inma_opsp); inma_ops_t *inm_get_inmaops_from_aops(const inm_address_space_operations_t *a_opsp, inm_u32_t lookup_flag); inm_s32_t inm_prepare_tohandle_recursive_writes(struct inode *inodep); void inm_restore_org_addr_space_ops(struct inode *inodep); inm_s32_t flt_open_file(const char *, inm_u32_t, void **); inm_s32_t flt_read_file (void *, void *, inm_u64_t , inm_u32_t ,inm_u32_t *); inm_s32_t flt_write_file(void *, void *, inm_u64_t, inm_u32_t, inm_u32_t *); int32_t flt_seek_file(void *, long long, inm_s64_t *, inm_u32_t); void flt_close_file (void *); long flt_mkdir(const char *, int); long inm_unlink(const char *, char *); inm_s32_t inm_unlink_symlink(const char *, char *); long flt_rmdir(const char *); inm_s32_t flt_get_file_size(void *, loff_t*); inm_s32_t read_full_file(char *, char *, inm_u32_t , inm_u32_t *); inm_s32_t write_full_file(char *, void *, inm_s32_t , inm_u32_t *); inm_s32_t write_to_file(char *, void *, inm_s32_t , inm_u32_t *); inm_s32_t flt_open_data_file(const char *, inm_u32_t, void **); int file_exists(char *filename); struct dentry *inm_lookup_create(struct nameidata *nd, inm_s32_t is_dir); #endif /* _INM_FILE_IO_H */ involflt-0.1.0/src/tunable_params.h0000755000000000000000000001762014467303177016025 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
 */

#ifndef _INMAGE_TUNABLE_PARAMS_H
#define _INMAGE_TUNABLE_PARAMS_H

#include "involflt-common.h"

/* Memory quotas in percentage based on total data pool size */
#define DEFAULT_DATA_POOL_SIZE_MB 0x40 /* in MB, 64MB */
#define DEFAULT_MAX_DATA_POOL_PERCENTAGE 0x32 /* in %age (50%) */
#define DEFAULT_VOLUME_DATA_POOL_SIZE_MB \
	((2 * DEFAULT_MAX_DATA_SZ_PER_CHANGE_NODE) >> MEGABYTE_BIT_SHIFT)

#define DEFAULT_DB_HIGH_WATERMARK_SERVICE_NOT_STARTED 0x2000 /* 8K Changes */
#define DEFAULT_DB_LOW_WATERMARK_SERVICE_RUNNING 0x4000 /* 16K Changes */
#define DEFAULT_DB_HIGH_WATERMARK_SERVICE_RUNNING 0x10000 /* 64K Changes */
#define DEFAULT_DB_HIGH_WATERMARK_SERVICE_SHUTDOWN 0x0800 /* 2K Changes */
#define DEFAULT_DB_TO_PURGE_HIGH_WATERMARK_REACHED 0x2000 /* 8K */

#define DEFAULT_FREE_THRESHOLD_FOR_FILEWRITE 0x14
#define DEFAULT_VOLUME_THRESHOLD_FOR_FILEWRITE 0x28
#define DEFAULT_VOLUME_DATA_TO_DISK_LIMIT_IN_MB 0x100 /* 256MB */
#define DEFAULT_VOLUME_DATALOG_DIR "/usr/local/InMage/Vx/ApplicationData"
#define DEFAULT_MAX_DATA_PAGES_PER_TARGET 0x2000 /* 8192 pages */
#define DEFAULT_SEQNO 0x0
#define DEFAULT_TIME_STAMP_VALUE 0x0
#define PERSISTENT_SEQNO_THRESHOLD 0x400 /* 1024 */
#define RELOAD_TIME_SEQNO_JUMP_COUNT 0xF4240 /* 1 million */
#define INM_INCR_DPS_LIMIT_ON_APP 2048
#define INM_DEFAULT_DPS_APPLIANCE_DRV 2 /* To make 1/4 that is 25% of total size */
#define INM_DEFAULT_VOLUME_MXS 0x40000 /* Max transfer size of a device for AIX */

#define DEFAULT_MAX_DATA_SIZE_PER_NON_DATA_MODE_DIRTY_BLOCK (64 * 1024 * 1024) /* in bytes = 64MB */
#define DEFAULT_MAX_COALESCED_METADATA_CHANGE_SIZE 0x100000 /* 1MB */
#define DEFAULT_PERCENT_CHANGE_DATA_POOL_SIZE 0x5
#define DEFAULT_REORG_THRSHLD_TIME_SEC 0xa /* 10 sec */
#define DEFAULT_REORG_THRSHLD_TIME_FACTOR 0x3

/* bitmap related constants */
#define DEFAULT_MAXIMUM_BITMAP_BUFFER_MEMORY (0x100000 * 65) /* 65 MBytes */
#define DEFAULT_BITMAP_512K_GRANULARITY_SIZE (512)

/* The last bucket always has to be zero. */
#define VC_DEFAULT_IO_SIZE_BUCKET_0 0x200 /* 512 */
#define VC_DEFAULT_IO_SIZE_BUCKET_1 0x400 /* 1K */
#define VC_DEFAULT_IO_SIZE_BUCKET_2 0x800 /* 2K */
#define VC_DEFAULT_IO_SIZE_BUCKET_3 0x1000 /* 4K */
#define VC_DEFAULT_IO_SIZE_BUCKET_4 0x2000 /* 8K */
#define VC_DEFAULT_IO_SIZE_BUCKET_5 0x4000 /* 16K */
#define VC_DEFAULT_IO_SIZE_BUCKET_6 0x10000 /* 64K */
#define VC_DEFAULT_IO_SIZE_BUCKET_7 0x40000 /* 256K */
#define VC_DEFAULT_IO_SIZE_BUCKET_8 0x100000 /* 1M */
#define VC_DEFAULT_IO_SIZE_BUCKET_9 0x400000 /* 4M */
#define VC_DEFAULT_IO_SIZE_BUCKET_10 0x800000 /* 8M */
#define VC_DEFAULT_IO_SIZE_BUCKET_11 0x0 /* > 8M */

#define DEFAULT_LOG_DIRECTORY_VALUE "/root/InMageVolumeLogs"

void init_driver_tunable_params(void);
inm_u32_t get_data_page_pool_mb(void);

/* Be careful when modifying this.
   Ensure this directory gets created in sysfs_involflt_init() */
struct volume_attribute {
	struct attribute attr;
	inm_s32_t (*show)(target_context_t *, char *);
	inm_s32_t (*store)(target_context_t *, char *, const char *, inm_s32_t);
	char *file_name;
	void (*read)(target_context_t *, char *);
};

#define VOLUME_ATTR(_struct_name, _name, _mode, _show, _store, _read) \
struct volume_attribute _struct_name = { \
	.attr = { .name = _name, .mode = _mode}, \
	.show = (inm_s32_t (*)(target_context_t *, char *)) _show, \
	.store = (inm_s32_t (*)(target_context_t *, char *, const char *, inm_s32_t)) _store, \
	.file_name = _name, \
	.read = (void (*)(target_context_t *, char *)) _read, \
}

/* Sysfs definitions for common objects */
#define COMMON_ATTR_NAME "common"

struct common_attribute {
	struct attribute attr;
	inm_s32_t (*show)(char *);
	inm_s32_t (*store)(const char *, const char *, size_t);
	char *file_name;
	inm_s32_t (*read)(char *);
};

#define COMMON_ATTR(_struct_name, _name, _mode, _show, _store, _read) \
struct common_attribute _struct_name = { \
	.attr = { .name = _name, .mode = _mode}, \
	.show = (inm_s32_t(*)(char*)) _show, \
	.store = (inm_s32_t (*)(const char *, const char *, size_t)) _store, \
	.file_name = _name, \
	.read = (inm_s32_t (*)(char *)) _read, \
}

#ifndef _TARGET_VOLUME_CTX
#define _TARGET_VOLUME_CTX

#define TARGET_VOLUME_DIRECT_IO 0x00000001

typedef struct initiator_node {
	struct inm_list_head init_list;
	char *initiator_wwpn; /* Can be FC wwpn or iSCSI iqn name */
	inm_u64_t timestamp; /* Last IO timestamp */
} initiator_node_t;

typedef struct target_volume_ctx {
	target_context_t *vcptr;
	inm_u32_t bsize;
	inm_u64_t nblocks;
	inm_u32_t virt_id;
	inm_atomic_t remote_volume_refcnt;
	/* Keep track of the last write that the initiator has performed. */
	char initiator_name[MAX_INITIATOR_NAME_LEN];
	char pt_guid[INM_GUID_LEN_MAX];
	inm_u32_t flags;
	struct inm_list_head init_list; /* list of "initiator_list_t" */
} target_volume_ctx_t;
#endif /*_TARGET_VOLUME_CTX */

struct _inm_attribute;

inm_s32_t sysfs_involflt_init(void);
inm_s32_t sysfs_init_volume(target_context_t *, char *pname);
void load_driver_params(void);
void load_volume_params(target_context_t *ctxt);
void is_filtering_disabled_for_path(char *, int);
int set_int_vol_attr(target_context_t *, enum volume_params_idx , int);
void set_string_vol_attr(target_context_t *, enum volume_params_idx , char *);
void set_longlong_vol_attr(target_context_t *, enum volume_params_idx , inm_s64_t );
void set_unsignedlonglong_vol_attr(target_context_t *, enum volume_params_idx, inm_u64_t);
inm_s32_t read_value_from_file(char *, inm_s32_t *);
inm_s32_t write_vol_attr(target_context_t * ctxt, const char *file_name,
				void *buf, inm_s32_t len);
inm_s32_t inm_write_guid_attr(char *tc_guid, enum volume_params_idx index,
				inm_s32_t len);
inm_s32_t common_get_set_attribute_entry(struct _inm_attribute *);
inm_s32_t volume_get_set_attribute_entry(struct _inm_attribute *inm_attr);
inm_s32_t mirror_dst_id_get(target_context_t *ctxt, char *uuid);
inm_s32_t mirror_dst_id_set(target_context_t *ctxt, char *uuid);
inm_u64_t filter_full_disk_flags_get(target_context_t *ctxt);
ssize_t wrap_common_attr_store(inm_u32_t, const char *, size_t);
inm_s32_t inm_is_upgrade_pname(char *actual, char *upgrade);

/* Performance optimization levels & debugging info */
#define DEFAULT_PERFORMANCE_OPTMIZATION 0x00000007
#define PERF_OPT_DATA_MODE_CAPTURE_WITH_BITMAP 0x00000001
#define PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO 0x00000002
#define PERF_OPT_METADATA_COALESCE 0x00000004
#define
PERF_OPT_DEBUG_DBLK_FILENAME 0x00000008 #define PERF_OPT_DEBUG_DATA_DRAIN 0x00000010 #define PERF_OPT_DEBUG_DBLK_INFO 0x00000020 #define PERF_OPT_DEBUG_DBLK_CHANGES 0x00000040 #define PERF_OPT_DEBUG_COALESCED_CHANGES 0x00000080 #endif /* _INMAGE_TUNABLE_PARAMS_H */ involflt-0.1.0/src/ioctl.c0000755000000000000000000051416714467303177014145 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "utils.h" #include "tunable_params.h" #include "db_routines.h" #include "involflt_debug.h" #include "metadata-mode.h" #include "file-io.h" #include "filter_host.h" #include "filter.h" #include "errlog.h" #ifdef INM_LINUX #include "filter_lun.h" #endif #include "ioctl.h" #include "filestream_raw.h" #include "last_chance_writes.h" #include "telemetry.h" struct _inm_resync_notify_info; extern driver_context_t *driver_ctx; extern char *ErrorToRegErrorDescriptionsA[]; extern const inm_s32_t sv_const_sz; extern const inm_s32_t sv_chg_sz; extern inm_s32_t flt_process_tags(inm_s32_t num_vols, void __INM_USER **user_buf, inm_s32_t flags, tag_guid_t *); #ifdef INM_LINUX extern flt_timer_t cp_timer; void start_cp_timer(int timeout_ms, timeout_t callback); void inm_fvol_list_thaw_on_timeout(wqentry_t *not_used); inm_s32_t iobarrier_issue_tag_all_volume(tag_info_t *tag_list, int nr_tags, int commit_pending, tag_telemetry_common_t *); extern inm_s32_t driver_state; #endif static inm_u32_t inm_calc_len_required( struct inm_list_head *ptr); static inm_u32_t inm_wait_exception_ev(target_context_t *, inm_resync_notify_info_t *); static inm_s32_t inm_fill_resync_notify_info(target_context_t *tgt_ctxt, struct _inm_resync_notify_info *resync_info); static void print_AT_stat_common(target_context_t *tcp, char *page, inm_s32_t *len); inm_s32_t stop_filtering_volume(char *uuid, inm_devhandle_t *idhp, int dbs_flag); static inm_u32_t inm_wait_exception_ev(target_context_t *tgt_ctxt, inm_resync_notify_info_t *resync_info) { inm_u32_t ret = 0; host_dev_ctx_t *hdcp = NULL; dbg("entered"); hdcp = tgt_ctxt->tc_priv; INM_BUG_ON(!hdcp); if (hdcp) { if (tgt_ctxt->tc_resync_required) { inm_fill_resync_notify_info(tgt_ctxt, resync_info); goto out; } inm_wait_event_interruptible_timeout(hdcp->resync_notify, tgt_ctxt->tc_resync_required, resync_info->timeout_in_sec * INM_HZ); } else { ret = INM_EINVAL; resync_info->rstatus = MIRROR_STACKING_ERR; goto out; } if(tgt_ctxt->tc_resync_required){ 
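/* The interruptible wait above has returned: if resync was flagged in the meantime, report it to the caller; otherwise treat the wakeup as a timeout and, unless the mirror target is paused, issue inm_heartbeat_cdb() to keep the mirror session alive. */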
inm_fill_resync_notify_info(tgt_ctxt, resync_info); } else { if (!(is_target_mirror_paused(tgt_ctxt))){ inm_heartbeat_cdb(tgt_ctxt); } ret = INM_ETIMEDOUT; } out: dbg("leaving"); return ret; } static inm_s32_t inm_fill_resync_notify_info(target_context_t *tgt_ctxt, inm_resync_notify_info_t *resync_info) { inm_s32_t ret = 0; unsigned long sync_err = 0; if (!tgt_ctxt || !resync_info){ ret = INM_EINVAL; INM_BUG_ON(1); goto out; } volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_MIRRORING_PAUSED; volume_unlock(tgt_ctxt); resync_info->rsin_flag |= INM_SET_RESYNC_REQ_FLAG; resync_info->rsin_resync_err_code = tgt_ctxt->tc_out_of_sync_err_code; resync_info->rsin_out_of_sync_count = tgt_ctxt->tc_nr_out_of_sync; resync_info->rsin_out_of_sync_time_stamp = tgt_ctxt->tc_out_of_sync_time_stamp; resync_info->rsin_out_of_sync_err_status = tgt_ctxt->tc_out_of_sync_err_status; tgt_ctxt->tc_nr_out_of_sync_indicated = tgt_ctxt->tc_nr_out_of_sync; dbg("tcp TS %llu resy TS %llu tc_out_of_sync_err_code:%lu", tgt_ctxt->tc_out_of_sync_time_stamp, resync_info->rsin_out_of_sync_time_stamp, tgt_ctxt->tc_out_of_sync_err_code); sync_err = tgt_ctxt->tc_out_of_sync_err_code; if (sync_err > ERROR_TO_REG_MAX_ERROR) { sync_err = ERROR_TO_REG_DESCRIPTION_IN_EVENT_LOG; } snprintf(resync_info->rsin_err_string_resync, UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE, ErrorToRegErrorDescriptionsA[sync_err], tgt_ctxt->tc_out_of_sync_err_status); resync_info->rsin_err_string_resync[UDIRTY_BLOCK_MAX_ERROR_STRING_SIZE-1] = '\0'; out: return ret; } static inm_u32_t inm_calc_len_required( struct inm_list_head *ptr) { inm_u32_t len = 0; target_context_t *tgt_ctxt; for (; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if(tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ tgt_ctxt = NULL; continue; } len += strlen(tgt_ctxt->tc_guid) + 1; } return (len); } inm_s32_t process_start_notify_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(PROCESS_START_NOTIFY_INPUT))) { return -EFAULT; } /* Fail the ioctl, if start notify ioctl comes more than once */ if(driver_ctx->sentinal_idhp || driver_ctx->sentinal_pid) { return -EINVAL; } driver_ctx->sentinal_pid = INM_CURPROC_PID; driver_ctx->sentinal_idhp = idhp; get_time_stamp(&(driver_ctx->dc_tel.dt_s2_start_time)); telemetry_clear_dbs(&driver_ctx->dc_tel.dt_blend, DBS_S2_STOPPED); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_shutdown_notify_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { unsigned long lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(SHUTDOWN_NOTIFY_INPUT))) return -EFAULT; if(inm_flush_clean_shutdown(UNCLEAN_SHUTDOWN)){ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); driver_ctx->dc_flags |= SYS_UNCLEAN_SHUTDOWN; driver_ctx->unclean_shutdown = 0; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); } driver_ctx->service_state = SERVICE_RUNNING; driver_ctx->svagent_pid = INM_CURPROC_PID; driver_ctx->svagent_idhp = idhp; info("service has started = %d , process name = %s\n", INM_CURPROC_PID, INM_CURPROC_COMM); driver_ctx->flags |= DC_FLAGS_SERVICE_STATE_CHANGED; driver_ctx->service_supports_data_filtering = TRUE; INM_ATOMIC_INC(&driver_ctx->service_thread.wakeup_event_raised); 
INM_WAKEUP_INTERRUPTIBLE(&driver_ctx->service_thread.wakeup_event); INM_COMPLETE(&driver_ctx->service_thread._new_event_completion); get_time_stamp(&(driver_ctx->dc_tel.dt_svagent_start_time)); telemetry_clear_dbs(&driver_ctx->dc_tel.dt_blend, DBS_SERVICE_STOPPED); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return 0; } inm_s32_t process_volume_stacking_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_dev_extinfo_t *dev_infop = NULL; inm_s32_t err = 0; inm_device_t dtype; target_context_t *tgt_ctxt = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } telemetry_clear_dbs(&driver_ctx->dc_tel.dt_blend, DBS_DRIVER_NOREBOOT_MODE); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(inm_dev_info_t))) { err("Read access violation for inm_dev_info_t"); err = INM_EFAULT; goto out; } dev_infop = (inm_dev_extinfo_t *)INM_KMALLOC(sizeof(inm_dev_extinfo_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!dev_infop) { err("INM_KMALLOC failed to allocate memory for inm_dev_extinfo_t"); err = INM_EFAULT; goto out; } INM_MEM_ZERO(dev_infop, sizeof(inm_dev_extinfo_t)); if(INM_COPYIN((inm_dev_info_t *)dev_infop, arg, sizeof(inm_dev_info_t))) { err("INM_COPYIN failed"); err = INM_EFAULT; goto out; } dev_infop->d_guid[GUID_SIZE_IN_CHARS-1] = '\0'; dev_infop->d_pname[GUID_SIZE_IN_CHARS-1] = '\0'; dev_infop->d_mnt_pt[INM_PATH_MAX-1] = '\0'; switch (dev_infop->d_type) { case FILTER_DEV_FABRIC_LUN: case FILTER_DEV_HOST_VOLUME: if (strncpy_s(dev_infop->d_src_scsi_id, INM_GUID_LEN_MAX, dev_infop->d_pname, INM_GUID_LEN_MAX)) { err = INM_EFAULT; goto out; } dev_infop->d_src_scsi_id[INM_MAX_SCSI_ID_SIZE-1] = '\0'; dev_infop->d_dst_scsi_id[INM_MAX_SCSI_ID_SIZE-1] = '\0'; dev_infop->src_list = NULL; dev_infop->dst_list = NULL; dev_infop->d_startoff = 0; dev_infop->d_flags = HOST_VOLUME_STACKING_FLAG; break; case FILTER_DEV_MIRROR_SETUP: dev_infop->d_flags = MIRROR_VOLUME_STACKING_FLAG; break; default: err("invalid filter dev type:%d", dev_infop->d_type); err = -EINVAL; goto out; } if (driver_state & DRV_LOADED_FULLY) { if (is_flt_disabled(dev_infop->d_pname)) { info("Filtering not enabled for %s", dev_infop->d_pname); err = -EINVAL; } if (!err) { dtype = filter_dev_type_get(dev_infop->d_pname); if (dtype != FILTER_DEV_HOST_VOLUME && dtype != dev_infop->d_type) { err("Invalid dev type %d for %s", dtype, dev_infop->d_pname); err = -EINVAL; } } } /* * If the volume was partially stacked from initrd when it should not be, * let the volume be initialized fully before stopping filtering on it */ if (err) { tgt_ctxt = get_tgt_ctxt_from_uuid(dev_infop->d_guid); if (!tgt_ctxt) goto out; INM_BUG_ON(!(tgt_ctxt->tc_flags & VCF_VOLUME_STACKED_PARTIALLY)); } err = do_volume_stacking(dev_infop); if (tgt_ctxt) { err("%s not protected", tgt_ctxt->tc_guid); stop_filtering_volume(tgt_ctxt->tc_guid, idhp, DBS_FILTERING_STOPPED_BY_KERNEL); put_tgt_ctxt(tgt_ctxt); } out: if (err && dev_infop) err("Stacking failed for %s (%s) - %d", dev_infop->d_guid, dev_infop->d_pname, err); if (dev_infop) INM_KFREE(dev_infop, sizeof(*dev_infop), INM_KERNEL_HEAP); idhp->private_data = NULL; dbg("leaving"); return err; } inm_s32_t process_start_filtering_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_dev_extinfo_t *dev_infop = NULL; inm_s32_t err = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(inm_dev_info_t))) { err("Read access violation for inm_dev_info_t"); return INM_EFAULT; } dev_infop = (inm_dev_extinfo_t 
*)INM_KMALLOC(sizeof(inm_dev_extinfo_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!dev_infop) { err("INM_KMALLOC failed to allocate memory for inm_dev_extinfo_t"); return INM_ENOMEM; } INM_MEM_ZERO(dev_infop, sizeof(inm_dev_extinfo_t)); if(INM_COPYIN((inm_dev_info_t *)dev_infop, arg, sizeof(inm_dev_info_t))) { err("INM_COPYIN failed"); INM_KFREE(dev_infop, sizeof(*dev_infop), INM_KERNEL_HEAP); return INM_EFAULT; } dev_infop->d_guid[GUID_SIZE_IN_CHARS-1] = '\0'; dev_infop->d_pname[GUID_SIZE_IN_CHARS-1] = '\0'; if (strncpy_s(dev_infop->d_src_scsi_id, INM_GUID_LEN_MAX, dev_infop->d_pname, strlen(dev_infop->d_pname))) { INM_KFREE(dev_infop, sizeof(*dev_infop), INM_KERNEL_HEAP); return INM_EFAULT; } dev_infop->d_src_scsi_id[INM_GUID_LEN_MAX-1] = '\0'; dev_infop->d_mnt_pt[INM_PATH_MAX-1] = '\0'; dev_infop->d_dst_scsi_id[0] = '\0'; dev_infop->src_list = NULL; dev_infop->dst_list = NULL; dev_infop->d_startoff = 0; err = do_start_filtering(idhp, dev_infop); INM_KFREE(dev_infop, sizeof(*dev_infop), INM_KERNEL_HEAP); return err; } inm_s32_t process_start_mirroring_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { mirror_conf_info_t *mirror_infop = NULL; inm_s32_t err = 0; inm_irqflag_t lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(mirror_conf_info_t))) { err("Read access violation for mirror_conf_info_t"); return INM_EFAULT; } mirror_infop = (mirror_conf_info_t *)INM_KMALLOC(sizeof(mirror_conf_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!mirror_infop) { err("INM_KMALLOC failed to allocate memory for mirror_conf_info_t"); return INM_ENOMEM; } INM_MEM_ZERO(mirror_infop, sizeof(mirror_conf_info_t)); if (INM_COPYIN(mirror_infop, arg, sizeof(mirror_conf_info_t))) { err("INM_COPYIN failed"); INM_KFREE(mirror_infop, sizeof(mirror_conf_info_t), INM_KERNEL_HEAP); return INM_EFAULT; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); if(driver_ctx->dc_flags & DRV_MIRROR_NOT_SUPPORT){ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); err("Mirror is not supported as system didn't have any SCSI device at driver loading time"); mirror_infop->d_status = MIRROR_NOT_SUPPORTED; err = INM_ENOTSUP; goto out; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("Mirror setup information:"); info("d_type:%d d_flags:%llu d_nblks:%llu d_bsize:%llu startoff:%llu", mirror_infop->d_type, mirror_infop->d_flags, mirror_infop->d_nblks, mirror_infop->d_bsize, mirror_infop->startoff); } err = do_start_mirroring(idhp, mirror_infop); out: if (INM_COPYOUT(arg, mirror_infop, sizeof(mirror_conf_info_t))) { err("INM_COPYOUT failed"); err = INM_EFAULT; } INM_KFREE(mirror_infop, sizeof(*mirror_infop), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("leaving err:%d", err); } return err; } inm_s32_t process_mirror_volume_stacking_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { mirror_conf_info_t *mirror_infop = NULL; inm_s32_t err = 0; inm_dev_extinfo_t *dev_infop = NULL; struct inm_list_head src_mirror_list_head, dst_mirror_list_head; mirror_vol_entry_t *vol_entry = NULL; inm_irqflag_t lock_flag = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(mirror_conf_info_t))) { err("Read access violation for mirror_conf_info_t"); return INM_EFAULT; 
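/* What follows repeats the copy-in and validation pattern of process_start_mirroring_ioctl(), then builds and validates the source/destination volume lists before stacking the mirror device. */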
} mirror_infop = (mirror_conf_info_t *)INM_KMALLOC(sizeof(mirror_conf_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!mirror_infop) { err("INM_KMALLOC failed to allocate memory for mirror_conf_info_t"); return INM_ENOMEM; } INM_MEM_ZERO(mirror_infop, sizeof(mirror_conf_info_t)); if(INM_COPYIN(mirror_infop, arg, sizeof(mirror_conf_info_t))) { err("INM_COPYIN failed"); INM_KFREE(mirror_infop, sizeof(mirror_conf_info_t), INM_KERNEL_HEAP); return INM_EFAULT; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); if(driver_ctx->dc_flags & DRV_MIRROR_NOT_SUPPORT){ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); err("Mirror is not supported as system didn't have any SCSI device at driver loading time"); mirror_infop->d_status = MIRROR_NOT_SUPPORTED; err = INM_ENOTSUP; goto out; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("Mirror setup information:"); info("d_type:%d d_flags:%llu d_nblks:%llu d_bsize:%llu", mirror_infop->d_type, mirror_infop->d_flags, mirror_infop->d_nblks, mirror_infop->d_bsize); } INM_INIT_LIST_HEAD(&src_mirror_list_head); INM_INIT_LIST_HEAD(&dst_mirror_list_head); err = populate_volume_lists(&src_mirror_list_head, &dst_mirror_list_head, mirror_infop); if (err) { goto out; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("source volume:%u scsi_id:%s: list: ", mirror_infop->nsources, mirror_infop->src_scsi_id); print_mirror_list(&src_mirror_list_head); info("destination volume:%u scsi_id:%s: list: ", mirror_infop->ndestinations, mirror_infop->dst_scsi_id); print_mirror_list(&dst_mirror_list_head); } if (mirror_infop->src_scsi_id[0] == ' ' || mirror_infop->src_scsi_id[0] == '\0') { err("Empty source scsi id:%s:", mirror_infop->src_scsi_id); err = EINVAL; mirror_infop->d_status = SRC_DEV_SCSI_ID_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } if (mirror_infop->dst_scsi_id[0] == ' ' || mirror_infop->dst_scsi_id[0] == '\0') { err("Empty destination scsi id:%s:", mirror_infop->dst_scsi_id); err = EINVAL; mirror_infop->d_status = DST_DEV_SCSI_ID_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } /* This is mirroring code; a change to is_flt_disabled() should not affect it */ if (is_flt_disabled(mirror_infop->src_scsi_id)) { dbg("stop mirroring already issued on device with scsi id %s\n", mirror_infop->src_scsi_id); return -EINVAL; } vol_entry = inm_list_entry(src_mirror_list_head.next, mirror_vol_entry_t, next); INM_BUG_ON(!vol_entry); dev_infop = (inm_dev_extinfo_t *)INM_KMALLOC(sizeof(inm_dev_extinfo_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!dev_infop) { err("INM_KMALLOC failed to allocate memory for inm_dev_extinfo_t"); err = INM_ENOMEM; mirror_infop->d_status = DRV_MEM_ALLOC_ERR; free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 1); goto out; } INM_MEM_ZERO(dev_infop, sizeof(inm_dev_extinfo_t)); dev_infop->d_type = mirror_infop->d_type; dev_infop->d_startoff = 0; if (strncpy_s(dev_infop->d_guid, INM_GUID_LEN_MAX, vol_entry->tc_mirror_guid, strlen(vol_entry->tc_mirror_guid)) || strncpy_s(dev_infop->d_src_scsi_id, INM_MAX_SCSI_ID_SIZE, mirror_infop->src_scsi_id, INM_MAX_SCSI_ID_SIZE - 1) || strncpy_s(dev_infop->d_dst_scsi_id, INM_MAX_SCSI_ID_SIZE, mirror_infop->dst_scsi_id, INM_MAX_SCSI_ID_SIZE - 1)) { free_mirror_list(&src_mirror_list_head, 0); free_mirror_list(&dst_mirror_list_head, 
1); err = INM_EFAULT; goto out; } dev_infop->d_guid[INM_GUID_LEN_MAX-1] = '\0'; dev_infop->d_src_scsi_id[strlen(mirror_infop->src_scsi_id)] = '\0'; dev_infop->d_dst_scsi_id[strlen(mirror_infop->dst_scsi_id)] = '\0'; dev_infop->d_flags = mirror_infop->d_flags; dev_infop->d_flags |= MIRROR_VOLUME_STACKING_FLAG; dev_infop->d_nblks = mirror_infop->d_nblks; dev_infop->d_bsize = mirror_infop->d_bsize; dev_infop->src_list = &src_mirror_list_head; dev_infop->dst_list = &dst_mirror_list_head; dev_infop->d_startoff = mirror_infop->startoff; if (mirror_infop->d_type == FILTER_DEV_MIRROR_SETUP) { err = do_volume_stacking(dev_infop); mirror_infop->d_status = MIRROR_STACKING_ERR; } out: if (INM_COPYOUT(arg, mirror_infop, sizeof(mirror_conf_info_t))) { err("INM_COPYOUT failed"); err = INM_EFAULT; } if (mirror_infop) { INM_KFREE(mirror_infop, sizeof(*mirror_infop), INM_KERNEL_HEAP); } if (dev_infop) { INM_KFREE(dev_infop, sizeof(*dev_infop), INM_KERNEL_HEAP); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_MIRROR))){ info("leaving err:%d",err); } return err; } void process_stop_filtering_common(target_context_t *tgt_ctxt, inm_devhandle_t *idhp) { dbg("entered"); get_tgt_ctxt(tgt_ctxt); switch (tgt_ctxt->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_FABRIC_LUN: if (tgt_ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) { inm_scst_unregister(tgt_ctxt); } break; case FILTER_DEV_MIRROR_SETUP: break; default: err("Invalid dev type:%d",tgt_ctxt->tc_dev_type); } tgt_ctx_force_soft_remove(tgt_ctxt); inm_unlink(tgt_ctxt->tc_bp->bitmap_file_name, tgt_ctxt->tc_bp->bitmap_dir_name); put_tgt_ctxt(tgt_ctxt); if (idhp && idhp->private_data) { /* One extra dereference as we did extra reference in * start filtering */ put_tgt_ctxt(tgt_ctxt); } dbg("leaving"); } inm_s32_t stop_filtering_volume(char *uuid, inm_devhandle_t *idhp, int dbs_flag) { target_context_t *tgt_ctxt = NULL; #ifdef INM_AIX int flag; #endif INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); tgt_ctxt = get_tgt_ctxt_from_uuid_locked(uuid); if(!tgt_ctxt) { INM_UP_WRITE(&driver_ctx->tgt_list_sem); dbg("Failed to get target context from uuid"); return -ENODEV; } #ifdef INM_AIX INM_SPIN_LOCK(&driver_ctx->tgt_list_lock, flag); #endif volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_VOLUME_DELETING; tgt_ctxt->tc_filtering_disable_required = 1; get_time_stamp(&(tgt_ctxt->tc_tel.tt_user_stop_flt_time)); telemetry_set_dbs(&tgt_ctxt->tc_tel.tt_blend, dbs_flag); close_disk_cx_session(tgt_ctxt, CX_CLOSE_STOP_FILTERING_ISSUED); set_tag_drain_notify_status(tgt_ctxt, TAG_STATUS_DROPPED, DEVICE_STATUS_FILTERING_STOPPED); volume_unlock(tgt_ctxt); #ifdef INM_AIX INM_SPIN_UNLOCK(&driver_ctx->tgt_list_lock, flag); #endif if (driver_ctx->dc_root_disk == tgt_ctxt) driver_ctx->dc_root_disk = NULL; INM_UP_WRITE(&driver_ctx->tgt_list_sem); inm_erase_resync_info_from_persistent_store(tgt_ctxt->tc_pname); process_stop_filtering_common(tgt_ctxt, idhp); return 0; } inm_s32_t process_stop_filtering_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_s32_t error = 0; VOLUME_GUID *guid = NULL; dbg("entered"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(VOLUME_GUID))) { err("Read access violation for VOLUME_GUID"); return -EFAULT; } guid = (VOLUME_GUID *)INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!guid) { err("INM_KMALLOC failed to allocate memory for VOLUME_GUID"); return -ENOMEM; } if(INM_COPYIN(guid, arg, sizeof(VOLUME_GUID))) { err("INM_COPYIN failed"); INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); return -EFAULT; } 
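/* NUL-terminate the user-supplied GUID before the target lookup; the buffer contents are not otherwise trusted. */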
guid->volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; error = stop_filtering_volume((char *)&guid->volume_guid[0], idhp, DBS_FILTERING_STOPPED_BY_USER); INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); idhp->private_data = NULL; dbg("leaving"); return error; } inm_s32_t process_stop_mirroring_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *tgt_ctxt = NULL; SCSI_ID *scsi_id = NULL; inm_irqflag_t lock_flag = 0; dbg("entered"); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); if(driver_ctx->dc_flags & DRV_MIRROR_NOT_SUPPORT){ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); err("Mirror is not supported as system didn't have any SCSI device at driver loading time"); return INM_ENOTSUP; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); if (!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(SCSI_ID))) { err("Read access violation for SCSI_ID"); return -EFAULT; } scsi_id = (SCSI_ID *)INM_KMALLOC(sizeof(SCSI_ID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!scsi_id) { err("INM_KMALLOC failed to allocate memory for SCSI_ID"); return -ENOMEM; } if (INM_COPYIN(scsi_id, arg, sizeof(SCSI_ID))) { err("INM_COPYIN failed"); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); return -EFAULT; } scsi_id->scsi_id[INM_MAX_SCSI_ID_SIZE-1] = '\0'; INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); tgt_ctxt = get_tgt_ctxt_from_scsiid_locked((char *)&scsi_id->scsi_id[0]); if (!tgt_ctxt) { INM_UP_WRITE(&driver_ctx->tgt_list_sem); dbg("Failed to get target context from scsi id:%s", scsi_id->scsi_id); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); return 0; } volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_VOLUME_DELETING; tgt_ctxt->tc_filtering_disable_required = 1; volume_unlock(tgt_ctxt); if (driver_ctx->dc_root_disk == tgt_ctxt) driver_ctx->dc_root_disk = NULL; INM_UP_WRITE(&driver_ctx->tgt_list_sem); inm_erase_resync_info_from_persistent_store(tgt_ctxt->tc_pname); process_stop_filtering_common(tgt_ctxt, idhp); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); idhp->private_data = NULL; dbg("leaving"); return 0; } inm_s32_t process_volume_unstacking_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *tgt_ctxt = NULL; VOLUME_GUID *guid = NULL; #ifdef INM_AIX int flag; #endif dbg("entered"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(VOLUME_GUID))) { err("Read access violation for VOLUME_GUID"); return -EFAULT; } guid = (VOLUME_GUID *)INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!guid) { err("INM_KMALLOC failed to allocate memory for VOLUME_GUID"); return -ENOMEM; } if(INM_COPYIN(guid, arg, sizeof(VOLUME_GUID))) { err("INM_COPYIN failed"); INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); return -EFAULT; } guid->volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; INM_DOWN_WRITE(&driver_ctx->tgt_list_sem); tgt_ctxt = get_tgt_ctxt_from_uuid_locked((char *)&guid->volume_guid[0]); if(!tgt_ctxt) { INM_UP_WRITE(&driver_ctx->tgt_list_sem); dbg("Failed to get target context from uuid"); INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); return -EINVAL; } #ifdef INM_AIX INM_SPIN_LOCK(&driver_ctx->tgt_list_lock, flag); #endif volume_lock(tgt_ctxt); tgt_ctxt->tc_flags |= VCF_VOLUME_DELETING; tgt_ctxt->tc_filtering_disable_required = 1; get_time_stamp(&(tgt_ctxt->tc_tel.tt_user_stop_flt_time)); telemetry_set_dbs(&tgt_ctxt->tc_tel.tt_blend, DBS_FILTERING_STOPPED_BY_USER); close_disk_cx_session(tgt_ctxt, CX_CLOSE_STOP_FILTERING_ISSUED); volume_unlock(tgt_ctxt); #ifdef INM_AIX 
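/* Matches the INM_SPIN_LOCK taken above: on AIX the target list spinlock is held across the volume flag updates. */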
INM_SPIN_UNLOCK(&driver_ctx->tgt_list_lock, flag); #endif if (driver_ctx->dc_root_disk == tgt_ctxt) driver_ctx->dc_root_disk = NULL; INM_UP_WRITE(&driver_ctx->tgt_list_sem); inm_erase_resync_info_from_persistent_store(tgt_ctxt->tc_pname); if (tgt_ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) { inm_scst_unregister(tgt_ctxt); } get_tgt_ctxt(tgt_ctxt); tgt_ctx_force_soft_remove(tgt_ctxt); inm_unlink(tgt_ctxt->tc_bp->bitmap_file_name, tgt_ctxt->tc_bp->bitmap_dir_name); put_tgt_ctxt(tgt_ctxt); if (idhp->private_data) put_tgt_ctxt(tgt_ctxt); INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); idhp->private_data = NULL; dbg("leaving"); return 0; } inm_s32_t process_get_db_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *ctxt = (target_context_t *)idhp->private_data; UDIRTY_BLOCK_V2 *user_db = NULL; inm_s32_t status = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!ctxt) { err("Get_Db_trans ioctl called with NULL file private"); return -EINVAL; } if(is_target_filtering_disabled(ctxt)) { dbg("Get_Db_trans ioctl failed as filtering is not enabled"); return INM_EBUSY; } if(ctxt->tc_flags & VCF_DRAIN_BLOCKED) { return -EFAULT; } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(UDIRTY_BLOCK_V2))) return -EFAULT; user_db = ctxt->tc_db_v2; INM_BUG_ON(!user_db); if(!user_db) { err("Dirty block buffer tc_db_v2 is not allocated"); return -ENOMEM; } if(INM_COPYIN(user_db, arg, sizeof(UDIRTY_BLOCK_V2))) { err("Copy from user failed in get_db"); return -EFAULT; } get_tgt_ctxt(ctxt); volume_lock(ctxt); get_time_stamp(&ctxt->tc_tel.tt_getdb_time); update_cx_with_s2_latency(ctxt); if (ctxt->tc_tel.tt_ds_throttle_stop == TELEMETRY_THROTTLE_IN_PROGRESS) { get_time_stamp(&ctxt->tc_tel.tt_ds_throttle_stop); telemetry_clear_dbs(&ctxt->tc_tel.tt_blend, DBS_DIFF_SYNC_THROTTLE); } volume_unlock(ctxt); status = fill_udirty_block(ctxt, user_db, idhp); put_tgt_ctxt(ctxt); if(INM_COPYOUT(arg, user_db, sizeof(UDIRTY_BLOCK_V2))) { err("copy to user failed in get_db"); return -EFAULT; } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) inm_alloc_pools(); #else balance_page_pool(INM_KM_SLEEP, 0); #endif if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return status; } inm_s32_t process_commit_db_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_s32_t err = 0; target_context_t *ctxt = (target_context_t *)idhp->private_data; COMMIT_TRANSACTION *commit_db; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!ctxt) { err("Commit DB failed as file private is NULL"); return -EINVAL; } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(COMMIT_TRANSACTION))) return -EFAULT; commit_db = (COMMIT_TRANSACTION*)INM_KMALLOC(sizeof(COMMIT_TRANSACTION), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!commit_db) { err("Failed to allocate memory for Commit DB"); return -ENOMEM; } if(INM_COPYIN(commit_db, arg, sizeof(COMMIT_TRANSACTION))) { err("copy from user failed in commit db ioctl"); INM_KFREE(commit_db, sizeof(COMMIT_TRANSACTION), INM_KERNEL_HEAP); return -EFAULT; } get_tgt_ctxt(ctxt); err = perform_commit(ctxt, commit_db, idhp); put_tgt_ctxt(ctxt); INM_KFREE(commit_db, sizeof(COMMIT_TRANSACTION), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return err; } inm_s32_t process_get_time_ioctl(void __INM_USER *arg) { inm_u64_t ts = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)arg, sizeof(long long))) { err("write 
access verification failed"); return -EFAULT; } get_time_stamp(&ts); if(INM_COPYOUT(arg, &ts, sizeof(long long))) { err("copy to user failed"); return -EFAULT; } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_clear_diffs_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { char *uuid = NULL; target_context_t *tgt_ctxt = NULL; dbg("clear diffs issued"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(VOLUME_GUID))) return -EFAULT; uuid = (char *)INM_KMALLOC(GUID_SIZE_IN_CHARS, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!uuid) return -ENOMEM; if(INM_COPYIN(uuid, arg, GUID_SIZE_IN_CHARS)) { INM_KFREE(uuid, GUID_SIZE_IN_CHARS, INM_KERNEL_HEAP); return -EFAULT; } tgt_ctxt = get_tgt_ctxt_from_uuid_nowait(uuid); if(!tgt_ctxt) { /* It is possible that clear-diffs arrives before stacking * has completed; in that case the operation is considered * successful. */ INM_KFREE(uuid, GUID_SIZE_IN_CHARS, INM_KERNEL_HEAP); return 0; } do_clear_diffs(tgt_ctxt); put_tgt_ctxt(tgt_ctxt); INM_KFREE(uuid, GUID_SIZE_IN_CHARS, INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_set_volume_flags_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *ctxt; VOLUME_FLAGS_INPUT *flagip; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(VOLUME_FLAGS_OUTPUT))) return -EFAULT; flagip = (VOLUME_FLAGS_INPUT *)INM_KMALLOC(sizeof(VOLUME_FLAGS_INPUT), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!flagip) return -ENOMEM; if(INM_COPYIN(flagip, arg, sizeof(VOLUME_FLAGS_INPUT))) { INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); return -EFAULT; } ctxt = get_tgt_ctxt_from_uuid_nowait((char *)flagip->VolumeGUID); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); return -EINVAL; } volume_lock(ctxt); if(flagip->eOperation == ecBitOpSet) { if(flagip->ulVolumeFlags & VCF_READ_ONLY) ctxt->tc_flags |= VCF_READ_ONLY; } else { if(flagip->ulVolumeFlags & VCF_READ_ONLY) { if(ctxt->tc_flags & VCF_READ_ONLY) ctxt->tc_flags &= ~VCF_READ_ONLY; else ctxt->tc_flags |= VCF_READ_ONLY; } } volume_unlock(ctxt); put_tgt_ctxt(ctxt); INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_get_volume_flags_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *ctxt; VOLUME_FLAGS_INPUT *flagip; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)arg, sizeof(VOLUME_FLAGS_OUTPUT))) return -EFAULT; flagip = (VOLUME_FLAGS_INPUT *)INM_KMALLOC(sizeof(VOLUME_FLAGS_INPUT), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!flagip) return -ENOMEM; /* Copy in the request so the GUID used for the lookup below is initialized */ if(INM_COPYIN(flagip, arg, sizeof(VOLUME_FLAGS_INPUT))) { INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); return -EFAULT; } ctxt = get_tgt_ctxt_from_uuid_nowait((char *)flagip->VolumeGUID); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); return -EINVAL; } volume_lock(ctxt); flagip->ulVolumeFlags = ctxt->tc_flags; volume_unlock(ctxt); if(INM_COPYOUT(arg, flagip, sizeof(VOLUME_FLAGS_INPUT))) { put_tgt_ctxt(ctxt); INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); return -EFAULT; } put_tgt_ctxt(ctxt); INM_KFREE(flagip, sizeof(VOLUME_FLAGS_INPUT), INM_KERNEL_HEAP); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t wait_for_db(target_context_t *ctxt, inm_s32_t timeout) { inm_s64_t timeout_err = 0; 
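/* Under the volume lock, decide whether the drainer must sleep: if dirty-block changes are already pending there is nothing to wait for; otherwise record the wait timestamp and sleep interruptibly for up to the caller-supplied timeout. */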
inm_s32_t need_to_wait = 1; volume_lock(ctxt); if(!should_wait_for_db(ctxt)) need_to_wait = 0; else GET_TIME_STAMP_IN_USEC(ctxt->tc_dbwait_event_ts_in_usec); volume_unlock(ctxt); if(need_to_wait) { inm_wait_event_interruptible_timeout(ctxt->tc_waitq, should_wakeup_s2(ctxt), (timeout * INM_HZ)); volume_lock(ctxt); if (!ctxt->tc_pending_changes) { timeout_err = (inm_s32_t)INM_ETIMEDOUT; ctxt->tc_dbwait_event_ts_in_usec = 0; } volume_unlock(ctxt); } return (inm_s32_t)timeout_err; } inm_s32_t process_wait_for_db_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *ctxt; WAIT_FOR_DB_NOTIFY *notify; inm_s32_t timeout_err = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(WAIT_FOR_DB_NOTIFY))) return -EFAULT; notify = (WAIT_FOR_DB_NOTIFY *)INM_KMALLOC(sizeof(WAIT_FOR_DB_NOTIFY), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!notify) return -ENOMEM; if(INM_COPYIN(notify, arg, sizeof(WAIT_FOR_DB_NOTIFY))) { INM_KFREE(notify, sizeof(WAIT_FOR_DB_NOTIFY), INM_KERNEL_HEAP); return -EFAULT; } notify->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; dbg("wait db on volume = %s", &notify->VolumeGUID[0]); ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&notify->VolumeGUID[0]); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(notify, sizeof(WAIT_FOR_DB_NOTIFY), INM_KERNEL_HEAP); INM_DELAY(60*INM_HZ); return -EINVAL; } timeout_err = wait_for_db(ctxt, notify->Seconds); get_time_stamp(&(ctxt->tc_s2_latency_base_ts)); INM_KFREE(notify, sizeof(WAIT_FOR_DB_NOTIFY), INM_KERNEL_HEAP); put_tgt_ctxt(ctxt); return timeout_err; } inm_s32_t process_wait_for_db_v2_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *ctxt = NULL; WAIT_FOR_DB_NOTIFY *notify = NULL; inm_s32_t error = 0; if (!INM_ACCESS_OK(VERIFY_READ, arg, sizeof(WAIT_FOR_DB_NOTIFY))) { error = -EFAULT; goto err; } notify = INM_KMALLOC(sizeof(WAIT_FOR_DB_NOTIFY), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!notify) { error = -ENOMEM; goto err; } if (INM_COPYIN(notify, arg, sizeof(WAIT_FOR_DB_NOTIFY))) { error = -EFAULT; goto err; } notify->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = (target_context_t *)idhp->private_data; if(!ctxt) { err("Wait_DB_V2 ioctl called without file private"); error = -EINVAL; goto err; } get_tgt_ctxt(ctxt); if(is_target_filtering_disabled(ctxt)) { err("Wait_DB_V2 ioctl filtering disabled"); error = INM_EBUSY; goto err; } if (strcmp(ctxt->tc_guid, notify->VolumeGUID)) { err("Wait_DB_V2 ioctl GUID does not match the file private target"); error = -EINVAL; goto err; } dbg("wait db on volume = %s", ctxt->tc_guid); error = wait_for_db(ctxt, notify->Seconds); get_time_stamp(&(ctxt->tc_s2_latency_base_ts)); out: if (ctxt) put_tgt_ctxt(ctxt); if (notify) INM_KFREE(notify, sizeof(WAIT_FOR_DB_NOTIFY), INM_KERNEL_HEAP); return error; err: INM_DELAY(60*INM_HZ); goto out; } static inm_s32_t shutdown_volume(target_context_t *vcptr, inm_s32_t freeze_root) { struct inm_list_head head; dbg("Shutting down %s", vcptr->tc_guid); set_unsignedlonglong_vol_attr(vcptr, VolumePrevEndTimeStamp, vcptr->tc_PrevEndTimeStamp); set_unsignedlonglong_vol_attr(vcptr, VolumePrevEndSequenceNumber, vcptr->tc_PrevEndSequenceNumber); set_unsignedlonglong_vol_attr(vcptr, VolumePrevSequenceIDforSplitIO, vcptr->tc_PrevSequenceIDforSplitIO); set_unsignedlonglong_vol_attr(vcptr, VolumeRpoTimeStamp, vcptr->tc_rpo_timestamp); fs_freeze_volume(vcptr, &head); thaw_volume(vcptr, &head); wait_for_all_writes_to_complete(vcptr->tc_bp->volume_bitmap); if (freeze_root) { /* * freeze_root flag is 
only set when shutting down root volume/disk * in the end. Flush all the cached data which may be caused by all * the bitmap writes for other volumes and private file writes by * freezing the root */ freeze_root_dev(); /* * Write all pending dirty blocks to bitmap and wait for bitmap * updation to complete */ inmage_flt_save_all_changes(vcptr, TRUE, INM_NO_OP); /* * Flush all telemetry logs generated by tags dropped while * converting pending writes to bitmap in the step above */ telemetry_shutdown(); /* * The bitmap and telemetry writes may have caused additional writes. * Freeze the root again to flush any of those pending changes. */ freeze_root_dev(); /* * Write all pending dirty blocks to bitmap and wait for bitmap * updation to complete */ inmage_flt_save_all_changes(vcptr, TRUE, INM_NO_OP); /* * The bitmap writes may have caused additional * writes to metadata. Freeze the root again to flush any of * those pending changes. */ freeze_root_dev(); } volume_lock(vcptr); vcptr->tc_flags |= VCF_VOLUME_FROZEN_SYS_SHUTDOWN; volume_unlock(vcptr); /* Write all pending dirty blocks to bitmap and close it */ inmage_flt_save_all_changes(vcptr, TRUE, INM_NO_OP); lcw_move_bitmap_to_raw_mode(vcptr); return 0; } inm_s32_t process_sys_shutdown_notify_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { struct inm_list_head *ptr = NULL, *nextptr = NULL; target_context_t *vcptr = NULL; unsigned long lock_flag = 0; inm_s32_t error = 0; target_context_t *root = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered\n"); } if (!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(SYS_SHUTDOWN_NOTIFY_INPUT))) return -EFAULT; err("System shutdown notified to the inm driver"); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) move_chg_nodes_to_drainable_queue(); #endif driver_ctx->sys_shutdown = DC_FLAGS_SYSTEM_SHUTDOWN; INM_ATOMIC_INC(&driver_ctx->service_thread.wakeup_event_raised); INM_WAKEUP_INTERRUPTIBLE(&driver_ctx->service_thread.wakeup_event); INM_COMPLETE(&driver_ctx->service_thread._new_event_completion); INM_WAIT_FOR_COMPLETION_INTERRUPTIBLE(&driver_ctx->shutdown_completion); retry: INM_DOWN_READ(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) { vcptr = inm_list_entry(ptr, target_context_t, tc_list); if(vcptr->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING | VCF_VOLUME_FROZEN_SYS_SHUTDOWN)){ vcptr = NULL; continue; } /* keep the root device for the end */ if (isrootdev(vcptr)) { root = vcptr; vcptr = NULL; continue; } INM_BUG_ON_TMP(vcptr); if (vcptr->tc_bp->volume_bitmap) { get_tgt_ctxt(vcptr); INM_UP_READ(&driver_ctx->tgt_list_sem); shutdown_volume(vcptr, FALSE); put_tgt_ctxt(vcptr); goto retry; } } INM_UP_READ(&driver_ctx->tgt_list_sem); inm_flush_ts_and_seqno_to_file(TRUE); inm_close_ts_and_seqno_file(); /* * If root device is not found, mark it for resync */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); driver_ctx->dc_flags |= SYS_CLEAN_SHUTDOWN; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); if (!inm_flush_clean_shutdown(CLEAN_SHUTDOWN)) error = -EIO; #ifdef INM_AIX inm_flush_log_file(); #endif if (root) shutdown_volume(root, TRUE); inm_register_reboot_notifier(TRUE); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return error; } inm_s32_t process_sys_pre_shutdown_notify_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ 
info("entered\n"); } err("System PreShutdown"); /* * Since we cant hold dc_cp_mutex as timeout function * itself can require it, we keep force timing out any * active CP until we reach a no active CP state */ do { dbg("Killing CP timers"); INM_DOWN(&driver_ctx->dc_cp_mutex); if (driver_ctx->dc_cp != INM_CP_NONE && driver_ctx->dc_cp != INM_CP_SHUTDOWN ) { force_timeout(&cp_timer); INM_UP(&driver_ctx->dc_cp_mutex); inm_ksleep(INM_HZ); } else { break; } } while(1); /* Prevent further CP */ driver_ctx->dc_cp = INM_CP_SHUTDOWN; INM_UP(&driver_ctx->dc_cp_mutex); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving"); } return 0; } inm_s32_t process_lcw_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_s32_t error = 0; lcw_op_t *op = NULL; if (!INM_ACCESS_OK(VERIFY_READ, arg, sizeof(lcw_op_t))) { error = -EFAULT; goto out; } op = (lcw_op_t *)INM_KMALLOC(sizeof(lcw_op_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!op){ err("op allocation failed"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(op, arg, sizeof(lcw_op_t))) { err("copyin failed"); error = -EFAULT; goto out; } op->lo_name.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; if (op->lo_op == LCW_OP_MAP_FILE) error = lcw_map_file_blocks(op->lo_name.volume_guid); else error = lcw_perform_bitmap_op(op->lo_name.volume_guid, op->lo_op); out: if (op) INM_KFREE(op, sizeof(lcw_op_t), INM_KERNEL_HEAP); return error; } /* Format: * ,), * ,, * ,, ........ * , , * , * , , * , */ inm_s32_t process_tag_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf, inm_s32_t sync_tag) { inm_s32_t flags = 0; inm_u16_t num_vols = 0, i; inm_s32_t error = 0; tag_guid_t *tag_guid = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } if(sync_tag){ tag_guid = (tag_guid_t *)INM_KMALLOC(sizeof(tag_guid_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_guid){ err("TAG Input Failed: Allocation of tag_guid_t object"); error = INM_ENOMEM; goto just_exit; } INM_MEM_ZERO(tag_guid, sizeof(tag_guid_t)); INM_INIT_WAITQUEUE_HEAD(&tag_guid->wq); if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, sizeof(unsigned short))){ err("TAG Input Failed: Access violation in getting guid length"); error = INM_EFAULT; goto just_exit; } if(INM_COPYIN(&tag_guid->guid_len, user_buf, sizeof(unsigned short))){ err("TAG Input Failed: INM_COPYIN failed while accessing guid length"); error = INM_EFAULT; goto just_exit; } user_buf += sizeof(unsigned short); if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, tag_guid->guid_len)){ err("TAG Input Failed: Access violation in getting guid"); error = INM_EFAULT; goto just_exit; } tag_guid->guid = (char *)INM_KMALLOC(tag_guid->guid_len + 1, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_guid->guid){ err("TAG Input Failed: Allocation of memory for guid"); error = INM_ENOMEM; goto just_exit; } if(INM_COPYIN(tag_guid->guid, user_buf, tag_guid->guid_len)){ err("TAG Input Failed: INM_COPYIN failed while accessing guid"); error = INM_EFAULT; goto just_exit; } user_buf += tag_guid->guid_len; tag_guid->guid[tag_guid->guid_len] = '\0'; } if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, (sizeof(inm_u32_t) + sizeof(unsigned short)))) { err("TAG Input Failed: Access violation in getting Flags and \ Total Number of volumes"); error = -EFAULT; goto just_exit; } if(INM_COPYIN(&flags, user_buf, sizeof(inm_u32_t))) { err("TAG Input Failed: INM_COPYIN failed while accessing flags"); error = -EFAULT; goto just_exit; } /* Get total number of volumes in the input stream. 
*/ user_buf += sizeof(inm_u32_t); if(INM_COPYIN(&num_vols, user_buf, sizeof(unsigned short))) { err("TAG Input Failed: INM_COPYIN failed while accessing number of volumes"); error = -EFAULT; goto just_exit; } if(num_vols == 0) { err("TAG Input Failed: Number of volumes can't be zero"); error = -EINVAL; goto just_exit; } dbg("TAG: No of volumes: %d", num_vols); user_buf += sizeof(unsigned short); if(sync_tag){ tag_guid->num_vols = num_vols; tag_guid->status = INM_KMALLOC(num_vols * sizeof(inm_s32_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_guid->status){ err("TAG Input Failed: Allocation of memory for status of tag"); error = INM_ENOMEM; goto just_exit; } error = flt_process_tags(num_vols, &user_buf, flags, tag_guid); if(error) goto just_exit; if(INM_COPYOUT(user_buf, tag_guid->status, num_vols * sizeof(inm_s32_t))) { err("copy to user failed for tag status"); error = INM_EFAULT; goto just_exit; } for(i = 0; i < num_vols; i++){ if(tag_guid->status[i] == STATUS_PENDING){ INM_DOWN_WRITE(&(driver_ctx->tag_guid_list_sem)); inm_list_add_tail(&tag_guid->tag_list, &driver_ctx->tag_guid_list); INM_UP_WRITE(&(driver_ctx->tag_guid_list_sem)); return error; } } goto just_exit; } else error = flt_process_tags(num_vols, &user_buf, flags, NULL); return error; just_exit: if(sync_tag) flt_cleanup_sync_tag(tag_guid); return error; } inm_s32_t process_get_tag_status_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf) { inm_u16_t guid_len; char *guid = NULL; inm_s32_t status = 0, error, need_to_wait = 1, i; tag_guid_t *tag_guid; unsigned short seconds; if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, sizeof(unsigned short))){ err("TAG STATUS Input Failed: Access violation in getting guid length"); error = INM_EFAULT; goto out_err; } if(INM_COPYIN(&guid_len, user_buf, sizeof(unsigned short))){ err("TAG STATUS Input Failed: INM_COPYIN failed while accessing guid length"); error = INM_EFAULT; goto out_err; } user_buf += sizeof(unsigned short); if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, guid_len)){ err("TAG STATUS Input Failed: Access violation in getting guid"); error = INM_EFAULT; goto out_err; } guid = (char *)INM_KMALLOC(guid_len + 1, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!guid){ err("TAG STATUS Input Failed: Allocation of memory for guid"); error = INM_ENOMEM; goto out_err; } if(INM_COPYIN(guid, user_buf, guid_len)){ err("TAG STATUS Input Failed: INM_COPYIN failed while accessing guid"); error = INM_EFAULT; goto out_err; } guid[guid_len] = '\0'; user_buf += guid_len; if(!INM_ACCESS_OK(VERIFY_READ , (void __INM_USER *)user_buf, sizeof(unsigned short))){ err("TAG STATUS Input Failed: Access violation in getting seconds"); error = INM_EFAULT; goto out_err; } if(INM_COPYIN(&seconds, user_buf, sizeof(unsigned short))){ err("TAG STATUS Input Failed: INM_COPYIN failed while accessing seconds"); error = INM_EFAULT; goto out_err; } info("Timeout = %u\n", seconds); user_buf += sizeof(unsigned short); retry: INM_DOWN_READ(&(driver_ctx->tag_guid_list_sem)); tag_guid = get_tag_from_guid(guid); if(!tag_guid){ INM_UP_READ(&(driver_ctx->tag_guid_list_sem)); error = INM_EINVAL; err("There is no matching synchronous tag"); goto out_err; } for(i = 0; i < tag_guid->num_vols; i++){ if(tag_guid->status[i] == STATUS_PENDING){ if(need_to_wait){ INM_UP_READ(&(driver_ctx->tag_guid_list_sem)); goto wait; } status = STATUS_PENDING; break; } } INM_UP_READ(&(driver_ctx->tag_guid_list_sem)); if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)user_buf, tag_guid->num_vols * sizeof(inm_s32_t))){ err("TAG STATUS 
Input Failed: Access violation in getting guid status"); error = INM_EFAULT; goto out_err; } if(INM_COPYOUT(user_buf, tag_guid->status, tag_guid->num_vols * sizeof(inm_s32_t))) { err("TAG STATUS Output Failed: copy to user failed for tag status"); error = INM_EFAULT; goto out_err; } if(status != STATUS_PENDING){ INM_DOWN_WRITE(&(driver_ctx->tag_guid_list_sem)); inm_list_del(&tag_guid->tag_list); INM_UP_WRITE(&(driver_ctx->tag_guid_list_sem)); flt_cleanup_sync_tag(tag_guid); } error = 0; goto out_err; wait: if(seconds) inm_wait_event_interruptible_timeout(tag_guid->wq, 0, (seconds * INM_HZ)); else inm_wait_event_interruptible(tag_guid->wq, 0); need_to_wait = 0; goto retry; out_err: if(guid) INM_KFREE(guid, guid_len + 1, INM_KERNEL_HEAP); return error; } inm_s32_t process_wake_all_threads_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); INM_WAKEUP_INTERRUPTIBLE(&tgt_ctxt->tc_waitq); } INM_UP_READ(&(driver_ctx->tgt_list_sem)); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_get_db_threshold(inm_devhandle_t *idhp, void __INM_USER *user_buf) { get_db_thres_t *thr; target_context_t *ctxt; if(!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __INM_USER *)user_buf, sizeof(get_db_thres_t))) { err("Access violation for get_db_thres_t buffer"); return -EFAULT; } thr = (get_db_thres_t *)INM_KMALLOC(sizeof(get_db_thres_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!thr) return -ENOMEM; if(INM_COPYIN(thr, user_buf, sizeof(get_db_thres_t))) { INM_KFREE(thr, sizeof(get_db_thres_t), INM_KERNEL_HEAP); return -EFAULT; } thr->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&thr->VolumeGUID[0]); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(thr, sizeof(get_db_thres_t), INM_KERNEL_HEAP); return -EINVAL; } thr->threshold = ctxt->tc_db_notify_thres; if(INM_COPYOUT(user_buf, thr, sizeof(get_db_thres_t))) { err("copy to user failed in get_db"); put_tgt_ctxt(ctxt); INM_KFREE(thr, sizeof(get_db_thres_t), INM_KERNEL_HEAP); return -EFAULT; } put_tgt_ctxt(ctxt); INM_KFREE(thr, sizeof(get_db_thres_t), INM_KERNEL_HEAP); return 0; } inm_s32_t process_resync_start_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf) { RESYNC_START_V2 *resync_start; TIME_STAMP_TAG_V2 ts; target_context_t *ctxt = NULL; if(!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __INM_USER *)user_buf, sizeof(RESYNC_START_V2))) { err("Access violation for RESYNC_START buffer"); return -EFAULT; } resync_start = (RESYNC_START_V2 *)INM_KMALLOC(sizeof(RESYNC_START_V2), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!resync_start) return -ENOMEM; if(INM_COPYIN(resync_start, user_buf, sizeof(RESYNC_START_V2))) { INM_KFREE(resync_start, sizeof(RESYNC_START_V2), INM_KERNEL_HEAP); return -EFAULT; } resync_start->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&resync_start->VolumeGUID[0]); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(resync_start, sizeof(RESYNC_START_V2), INM_KERNEL_HEAP); return -EINVAL; } volume_lock(ctxt); if (ctxt->tc_cur_node && ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) { 
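/* Draining prefers non-write-order data-mode changes: close the currently open change node, if any, and queue it on the NWO drain list before stamping the resync-start time below. */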
INM_BUG_ON(!inm_list_empty(&ctxt->tc_cur_node->nwo_dmode_next)); if (ctxt->tc_cur_node->type == NODE_SRC_DATA && ctxt->tc_cur_node->wostate != ecWriteOrderStateData) { close_change_node(ctxt->tc_cur_node, IN_IOCTL_PATH); inm_list_add_tail(&ctxt->tc_cur_node->nwo_dmode_next, &ctxt->tc_nwo_dmode_list); if (ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending chg:%p to ctxt:%p next:%p prev:%p mode:%d", ctxt->tc_cur_node, ctxt, ctxt->tc_cur_node->nwo_dmode_next.next, ctxt->tc_cur_node->nwo_dmode_next.prev, ctxt->tc_cur_node->type); } } } /* set cur_node to NULL. */ ctxt->tc_cur_node = NULL; /* get timestamp. */ get_time_stamp_tag(&ts); resync_start->TimeInHundNanoSecondsFromJan1601 = ts.TimeInHundNanoSecondsFromJan1601; resync_start->ullSequenceNumber = ts.ullSequenceNumber; ctxt->tc_tel.tt_resync_start = ts.TimeInHundNanoSecondsFromJan1601; volume_unlock(ctxt); if(INM_COPYOUT(user_buf, resync_start, sizeof(RESYNC_START_V2))) { err("copy to user failed in resync_start"); put_tgt_ctxt(ctxt); INM_KFREE(resync_start, sizeof(RESYNC_START_V2), INM_KERNEL_HEAP); return -EFAULT; } put_tgt_ctxt(ctxt); INM_KFREE(resync_start, sizeof(RESYNC_START_V2), INM_KERNEL_HEAP); return 0; } inm_s32_t process_resync_end_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf) { RESYNC_END_V2 *resync_end; TIME_STAMP_TAG_V2 ts; target_context_t *ctxt = NULL; if(!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __INM_USER *)user_buf, sizeof(RESYNC_END_V2))) { err("Access violation for RESYNC_END buffer"); return -EFAULT; } resync_end = (RESYNC_END_V2 *)INM_KMALLOC(sizeof(RESYNC_END_V2), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!resync_end) return -ENOMEM; if(INM_COPYIN(resync_end, user_buf, sizeof(RESYNC_END_V2))) { INM_KFREE(resync_end, sizeof(RESYNC_END_V2), INM_KERNEL_HEAP); return -EFAULT; } resync_end->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&resync_end->VolumeGUID[0]); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(resync_end, sizeof(RESYNC_END_V2), INM_KERNEL_HEAP); return -EINVAL; } volume_lock(ctxt); if (ctxt->tc_cur_node && (ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO)) { INM_BUG_ON(!inm_list_empty(&ctxt->tc_cur_node->nwo_dmode_next)); if (ctxt->tc_cur_node->type == NODE_SRC_DATA && ctxt->tc_cur_node->wostate != ecWriteOrderStateData) { close_change_node(ctxt->tc_cur_node, IN_IOCTL_PATH); inm_list_add_tail(&ctxt->tc_cur_node->nwo_dmode_next, &ctxt->tc_nwo_dmode_list); if (ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("Appending chg:%p to ctxt:%p next:%p prev:%p mode:%d", ctxt->tc_cur_node, ctxt, ctxt->tc_cur_node->nwo_dmode_next.next, ctxt->tc_cur_node->nwo_dmode_next.prev, ctxt->tc_cur_node->type); } } } /* set cur_node to NULL. */ ctxt->tc_cur_node = NULL; /* get timestamp. 
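The timestamp tag carries both the 100ns-since-1601 timestamp and the global sequence number; together they mark the resync-end boundary returned to user space.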
*/ get_time_stamp_tag(&ts); resync_end->TimeInHundNanoSecondsFromJan1601 = ts.TimeInHundNanoSecondsFromJan1601; resync_end->ullSequenceNumber = ts.ullSequenceNumber; ctxt->tc_tel.tt_resync_end = ts.TimeInHundNanoSecondsFromJan1601; volume_unlock(ctxt); if(INM_COPYOUT(user_buf, resync_end, sizeof(RESYNC_END_V2))) { err("copy to user failed in resync_end"); put_tgt_ctxt(ctxt); INM_KFREE(resync_end, sizeof(RESYNC_END_V2), INM_KERNEL_HEAP); return -EFAULT; } put_tgt_ctxt(ctxt); INM_KFREE(resync_end, sizeof(RESYNC_END_V2), INM_KERNEL_HEAP); return 0; } inm_s32_t process_get_driver_version_ioctl(inm_devhandle_t *idhp, void __INM_USER *user_buf) { DRIVER_VERSION *version; if(!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __INM_USER *)user_buf, sizeof(DRIVER_VERSION))) { err("Access violation for DRIVER_VERSION buffer"); return -EFAULT; } version = (DRIVER_VERSION *)INM_KMALLOC(sizeof(DRIVER_VERSION), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!version) return -ENOMEM; if(INM_COPYIN(version, user_buf, sizeof(DRIVER_VERSION))) { INM_KFREE(version, sizeof(DRIVER_VERSION), INM_KERNEL_HEAP); return -EFAULT; } version->ulDrMajorVersion = DRIVER_MAJOR_VERSION; version->ulDrMinorVersion = DRIVER_MINOR_VERSION; version->ulDrMinorVersion2 = DRIVER_MINOR_VERSION2; version->ulDrMinorVersion3 = DRIVER_MINOR_VERSION3; version->ulPrMajorVersion = INMAGE_PRODUCT_VERSION_MAJOR; version->ulPrMinorVersion = INMAGE_PRODUCT_VERSION_MINOR; version->ulPrMinorVersion2 = INMAGE_PRODUCT_VERSION_PRIVATE; version->ulPrBuildNumber = INMAGE_PRODUCT_VERSION_BUILDNUM; if(INM_COPYOUT(user_buf, version, sizeof(DRIVER_VERSION))) { err("copy to user failed in get_driver_version"); INM_KFREE(version, sizeof(DRIVER_VERSION), INM_KERNEL_HEAP); return -EFAULT; } INM_KFREE(version, sizeof(DRIVER_VERSION), INM_KERNEL_HEAP); return 0; } inm_s32_t process_shell_log_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered"); } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(SHUTDOWN_NOTIFY_INPUT))) return -EFAULT; dbg("%s\n", (char *)arg); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("leaving"); } return 0; } inm_s32_t process_get_global_stats_ioctl(inm_devhandle_t *handle, void *arg) { inm_u32_t len = 0; inm_u64_t mb_allocated, mb_free; inm_u64_t mb_unres, mb_total_res; inm_s32_t ret = 0; char guid[sizeof(driver_ctx->dc_cp_guid) + 1]; char *page; char *strp; vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; int idx; /* notes: use some sort of structure that includes the buf len. * It's ok not to take any lock while accessing the fields of the * driver context since we are only a reader here; at most we could * get slightly inaccurate info, but it won't cause any kernel panic. * In future, if we see really weird behaviour, we'll take the proper * locks. */ mb_allocated = driver_ctx->data_flt_ctx.pages_allocated >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); mb_free = driver_ctx->data_flt_ctx.pages_free >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); mb_unres = driver_ctx->dc_cur_unres_pages >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); mb_total_res = driver_ctx->dc_cur_res_pages >> (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); page = INM_KMALLOC(INM_PAGESZ, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!page){ ret = -ENOMEM; goto out; } len += snprintf(page+len, (INM_PAGESZ - len), "\n"); len += snprintf(page+len, (INM_PAGESZ - len), "Common Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. 
Pending Change Nodes : %d\n", INM_ATOMIC_READ(&driver_ctx->stats.pending_chg_nodes)); len += snprintf(page+len, (INM_PAGESZ - len), "Service state : %u (pid = %u)\n", (driver_ctx->service_state), driver_ctx->svagent_pid); len += snprintf(page+len, (INM_PAGESZ - len), "[Note: 0.uninit, 1.not started, 2.running 3. shutdown]\n"); len += snprintf(page+len, (INM_PAGESZ - len), "sentinal state : %s (pid = %u)\n\n", (driver_ctx->sentinal_pid)? "RUNNING" : "STOPPED", driver_ctx->sentinal_pid); len += snprintf(page+len, (INM_PAGESZ - len), "No of protected volumes : %d\n", driver_ctx->total_prot_volumes); len += snprintf(page+len, (INM_PAGESZ - len), "DRIVER BUILD TIME : %s (%s)\n", BLD_DATE, BLD_TIME); len += snprintf(page+len, (INM_PAGESZ - len), "Reserved change-node pages : %u Pages\n", (driver_ctx->dc_res_cnode_pgs)); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) len += snprintf(page+len, (INM_PAGESZ - len), "Reserved BIOInfo/failed : %d/%d\n", INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced), INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_allocs_failed)); len += snprintf(page+len, (INM_PAGESZ - len), "Reserved changeNodes/failed : %d/%d\n", INM_ATOMIC_READ(&driver_ctx->dc_nr_chdnodes_alloced), INM_ATOMIC_READ(&driver_ctx->dc_nr_chgnode_allocs_failed)); len += snprintf(page+len, (INM_PAGESZ - len), "Reserved meta pages/failed : %d/%d\n", INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced), INM_ATOMIC_READ(&driver_ctx->dc_nr_metapage_allocs_failed)); len += snprintf(page+len, (INM_PAGESZ - len), "Allocated BIOInfo/Change nodes/Meta pages from Pool : %d/%d/%d\n", INM_ATOMIC_READ(&driver_ctx->dc_nr_bioinfo_alloced_from_pool), INM_ATOMIC_READ(&driver_ctx->dc_nr_chgnodes_alloced_from_pool), INM_ATOMIC_READ(&driver_ctx->dc_nr_metapages_alloced_from_pool)); #endif len += snprintf(page+len, (INM_PAGESZ - len), "\nData Mode Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "---------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "Data Pool Size Allocated : %llu MB(%u pages)\n", mb_allocated, driver_ctx->data_flt_ctx.pages_allocated); if (mb_free){ len += snprintf(page+len, (INM_PAGESZ - len), "Data Pool Size Free : %llu MB(%u pages) \n", mb_free, driver_ctx->data_flt_ctx.pages_free); } else { len += snprintf(page+len, (INM_PAGESZ - len), "Data Pool Size Free : %ld bytes(%d pages) \n", (driver_ctx->data_flt_ctx.pages_free * INM_PAGESZ), driver_ctx->data_flt_ctx.pages_free); } len += snprintf(page+len, (INM_PAGESZ - len), "Total Reserved Pages : %llu MB(%u pages) \n", mb_total_res, driver_ctx->dc_cur_res_pages); len += snprintf(page+len, (INM_PAGESZ - len), "Unreserved Pages : %llu MB(%u pages) \n", mb_unres, driver_ctx->dc_cur_unres_pages); len += snprintf(page+len, (INM_PAGESZ - len), "\nData File Mode Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "----------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "Data File Mode Directory : %s \n", (driver_ctx->tunable_params.data_file_log_dir)); len += snprintf(page+len, (INM_PAGESZ - len), "Data File Mode - Disk Limit : %lld MB\n", driver_ctx->tunable_params.data_to_disk_limit/MEGABYTES); len += snprintf(page+len, (INM_PAGESZ - len), "Data File Mode Enabled : %s \n", (driver_ctx->tunable_params.enable_data_file_mode ? 
"Yes" : "No")); len += snprintf(page+len, (INM_PAGESZ - len), "Free Pages Threshold : %d \n\n", driver_ctx->tunable_params.free_pages_thres_for_filewrite); #ifdef INM_AIX len += snprintf(page+len, (INM_PAGESZ - len), "Bitmap Work Item Pool Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "---------------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of available objects : %u\n", driver_ctx->dc_bmap_info.bitmap_work_item_pool->pi_reserved); len += snprintf(page+len, (INM_PAGESZ - len), "No. of used objects : %u\n", driver_ctx->dc_bmap_info.bitmap_work_item_pool->pi_allocd); len += snprintf(page+len, (INM_PAGESZ - len), "Max / Min limit : %u/%u\n\n", driver_ctx->dc_bmap_info.bitmap_work_item_pool->pi_max_nr, driver_ctx->dc_bmap_info.bitmap_work_item_pool->pi_min_nr); len += snprintf(page+len, (INM_PAGESZ - len), "Work Queue Entry Pool Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "---------------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of available objects : %u\n", driver_ctx->wq_entry_pool->pi_reserved); len += snprintf(page+len, (INM_PAGESZ - len), "No. of used objects : %u\n", driver_ctx->wq_entry_pool->pi_allocd); len += snprintf(page+len, (INM_PAGESZ - len), "Max / Min limit : %u/%u\n\n", driver_ctx->wq_entry_pool->pi_max_nr, driver_ctx->wq_entry_pool->pi_min_nr); len += snprintf(page+len, (INM_PAGESZ - len), "I/O Buffer Pool Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "---------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of available objects : %u\n", driver_ctx->dc_bmap_info.iob_obj_pool->pi_reserved); len += snprintf(page+len, (INM_PAGESZ - len), "No. of used objects : %u\n", driver_ctx->dc_bmap_info.iob_obj_pool->pi_allocd); len += snprintf(page+len, (INM_PAGESZ - len), "Max / Min limit : %u/%u\n\n", driver_ctx->dc_bmap_info.iob_obj_pool->pi_max_nr, driver_ctx->dc_bmap_info.iob_obj_pool->pi_min_nr); len += snprintf(page+len, (INM_PAGESZ - len), "I/O Buffer Data Pool Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "--------------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of available objects : %u\n", driver_ctx->dc_bmap_info.iob_data_pool->pi_reserved); len += snprintf(page+len, (INM_PAGESZ - len), "No. of used objects : %u\n", driver_ctx->dc_bmap_info.iob_data_pool->pi_allocd); len += snprintf(page+len, (INM_PAGESZ - len), "Max / Min limit : %u/%u\n\n", driver_ctx->dc_bmap_info.iob_data_pool->pi_max_nr, driver_ctx->dc_bmap_info.iob_data_pool->pi_min_nr); len += snprintf(page+len, (INM_PAGESZ - len), "Data File Node Pool Info:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "-------------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of available objects : %u\n", driver_ctx->dc_host_info.data_file_node_cache->pi_reserved); len += snprintf(page+len, (INM_PAGESZ - len), "No. 
of used objects : %u\n", driver_ctx->dc_host_info.data_file_node_cache->pi_allocd); len += snprintf(page+len, (INM_PAGESZ - len), "Max / Min limit : %u/%u\n\n", driver_ctx->dc_host_info.data_file_node_cache->pi_max_nr, driver_ctx->dc_host_info.data_file_node_cache->pi_min_nr); #endif #ifdef INM_LINUX len += snprintf(page+len, (INM_PAGESZ - len), "Total memory usage : %lu KB \n\n", (unsigned long) (atomic_read(&inm_flt_memprint)/1024)); #endif INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); len += snprintf(page+len, (INM_PAGESZ - len), "CX Session Details:\n"); len += snprintf(page+len, (INM_PAGESZ - len), "-------------------\n"); len += snprintf(page+len, (INM_PAGESZ - len), "No. of disks in Non Write Order : %u\n", driver_ctx->total_prot_volumes_in_nwo); if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED)) { strp = "Session Not Started"; len += snprintf(page+len, (INM_PAGESZ - len), "State : %s\n", strp); goto unlock_session_lock; } if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) strp = "Session Ended"; else strp = "Session Started"; len += snprintf(page+len, (INM_PAGESZ - len), "State : %s", strp); if (vm_cx_sess->vcs_flags & VCS_CX_S2_EXIT) len += snprintf(page+len, (INM_PAGESZ - len), ", Drainer Exited"); if (vm_cx_sess->vcs_flags & VCS_CX_SVAGENT_EXIT) len += snprintf(page+len, (INM_PAGESZ - len), ", Service exited"); len += snprintf(page+len, (INM_PAGESZ - len), "\nSession Number : %u\n", vm_cx_sess->vcs_nth_cx_session); len += snprintf(page+len, (INM_PAGESZ - len), "Transaction ID : %llu\n", vm_cx_sess->vcs_transaction_id); len += snprintf(page+len, (INM_PAGESZ - len), "No. of Disks in session : %llu\n", vm_cx_sess->vcs_num_disk_cx_sess); len += snprintf(page+len, (INM_PAGESZ - len), "CX Start / End Time : %llu/%llu\n", vm_cx_sess->vcs_start_ts, vm_cx_sess->vcs_end_ts); #ifdef INM_DEBUG len += snprintf(page+len, (INM_PAGESZ - len), "Base Time for 1 sec interval : %llu\n", vm_cx_sess->vcs_base_secs_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Tracked bytes in 1 sec : %llu\n", vm_cx_sess->vcs_tracked_bytes_per_second); #endif len += snprintf(page+len, (INM_PAGESZ - len), "Tracked / Drained bytes : %llu / %llu\n", vm_cx_sess->vcs_tracked_bytes, vm_cx_sess->vcs_drained_bytes); len += snprintf(page+len, (INM_PAGESZ - len), "Churn Buckets : "); for (idx = 0; idx < DEFAULT_NR_CHURN_BUCKETS; idx++) { len += snprintf(page+len, (INM_PAGESZ - len), "%llu ", vm_cx_sess->vcs_churn_buckets[idx]); } len += snprintf(page+len, (INM_PAGESZ - len), "\nDefault Disk / VM Peak Churn : %llu / %llu\n", vm_cx_sess->vcs_default_disk_peak_churn, vm_cx_sess->vcs_default_vm_peak_churn); len += snprintf(page+len, (INM_PAGESZ - len), "Max Peak / Excess Churn : %llu / %llu\n", vm_cx_sess->vcs_max_peak_churn, vm_cx_sess->vcs_excess_churn); len += snprintf(page+len, (INM_PAGESZ - len), "First / Last Peak Churn TS : %llu / %llu\n", vm_cx_sess->vcs_first_peak_churn_ts, vm_cx_sess->vcs_last_peak_churn_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Consecutive Tag Failures : %llu\n", vm_cx_sess->vcs_num_consecutive_tag_failures); len += snprintf(page+len, (INM_PAGESZ - len), "Time Jump at TS : %llu\n", vm_cx_sess->vcs_timejump_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Time Jump in msec : %llu\n", vm_cx_sess->vcs_max_jump_ms); len += snprintf(page+len, (INM_PAGESZ - len), "FWD / BWD Time Jump TS : %llu / %llu\n", driver_ctx->dc_max_fwd_timejump_ms, driver_ctx->dc_max_bwd_timejump_ms); len += snprintf(page+len, (INM_PAGESZ - len), "Drainer 
latency : %llu\n", vm_cx_sess->vcs_max_s2_latency); unlock_session_lock: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); memcpy_s(guid, sizeof(guid), driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); guid[sizeof(guid) - 1] = '\0'; len += snprintf(page+len, (INM_PAGESZ - len), "CP: %d (%s)\n", driver_ctx->dc_cp, guid); if (INM_COPYOUT(arg, page, INM_PAGESZ)){ err("copyout failed"); ret = -EFAULT; goto out; } out: if (page){ INM_KFREE(page, INM_PAGESZ, INM_KERNEL_HEAP); } return (ret); } inm_s32_t process_get_monitoring_stats_ioctl(inm_devhandle_t *handle, void * arg) { inm_s32_t ret = 0; MONITORING_STATS *vol_lw_statsp = NULL; target_context_t *ctxt = NULL; if (!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __user*)arg, sizeof(MONITORING_STATS))) { err( "Read Access Violation for GET_MONITORING_STATS\n"); ret = -EFAULT; return ret; } vol_lw_statsp = (MONITORING_STATS*) INM_KMALLOC(sizeof(MONITORING_STATS), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!vol_lw_statsp) { ret = -ENOMEM; err("INM_KMALLOC failed\n"); return ret; } INM_MEM_ZERO(vol_lw_statsp, sizeof(MONITORING_STATS)); if (INM_COPYIN(vol_lw_statsp, arg, sizeof(MONITORING_STATS))) { err("INM_COPYIN failed"); ret = -EFAULT; goto ERR_EXIT; } vol_lw_statsp->VolumeGuid.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&vol_lw_statsp->VolumeGuid.volume_guid[0]); if (!ctxt) { err("Failed to get target context for uuid %s", vol_lw_statsp->VolumeGuid.volume_guid); ret = -EINVAL; goto ERR_EXIT; } switch(vol_lw_statsp->ReqStat) { case GET_TAG_STATS: vol_lw_statsp->TagStats.TagsDropped = INM_ATOMIC_READ(&ctxt->tc_stats.num_tags_dropped); break; case GET_CHURN_STATS: vol_lw_statsp->ChurnStats.NumCommitedChangesInBytes = ctxt->tc_bytes_commited_changes; break; default: err("ReqStat %d not supported", vol_lw_statsp->ReqStat); ret = -EINVAL; put_tgt_ctxt(ctxt); goto ERR_EXIT; } put_tgt_ctxt(ctxt); if (INM_COPYOUT((void*)arg, vol_lw_statsp, sizeof(MONITORING_STATS))) { err("INM_COPYOUT failed"); ret = -EFAULT; } ERR_EXIT: INM_KFREE(vol_lw_statsp, sizeof(MONITORING_STATS), INM_KERNEL_HEAP); return ret; } inm_s32_t process_get_volume_stats_ioctl(inm_devhandle_t *handle, void * arg) { inm_s32_t ret = 0; inm_s32_t len = 0; inm_s32_t idx = 0; char *page = NULL; char *strp, *strp_2; VOLUME_STATS *vol_stat = NULL; target_context_t *ctxt = NULL; bitmap_info_t *bitmap = NULL; disk_cx_session_t *disk_cx_sess; vol_stat = (VOLUME_STATS *) INM_KMALLOC(sizeof(VOLUME_STATS), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!vol_stat){ dbg("vol_stat allocation volume_stats failed\n"); ret = INM_ENOMEM; goto out; } if (INM_COPYIN(vol_stat, arg, sizeof(VOLUME_STATS))){ err("copyin failed"); ret = -EFAULT; goto out; } vol_stat->guid.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; INM_DOWN_READ(&driver_ctx->tgt_list_sem); ctxt = get_tgt_ctxt_from_name_nowait_locked(vol_stat->guid.volume_guid); if (!ctxt){ INM_UP_READ(&driver_ctx->tgt_list_sem); dbg("%s device is not stacked",vol_stat->guid.volume_guid); goto out; } INM_UP_READ(&driver_ctx->tgt_list_sem); bitmap = ctxt->tc_bp; page = INM_KMALLOC(INM_PAGESZ, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!page){ dbg("page allocation volume_stats failed"); ret = -ENOMEM; goto out; } INM_MEM_ZERO(page, INM_PAGESZ); len += snprintf(page+len, (INM_PAGESZ - len), "\nPersistent Name : "); len += snprintf(page+len, (INM_PAGESZ - len), "%s", ctxt->tc_pname); len += snprintf(page+len, (INM_PAGESZ - len), "\nVolume State : "); if (ctxt->tc_dev_type == 
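/*
 * Both branches of the comparison below format a human-readable state
 * string: mirror-setup targets report whether mirroring is paused,
 * while all other targets report whether filtering is enabled.
 */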
FILTER_DEV_MIRROR_SETUP){ if (is_target_mirror_paused(ctxt)){ strp = "Mirroring Paused"; }else { strp = "Mirroring Enabled"; } }else { if (is_target_filtering_disabled(ctxt)) { strp = "Filtering Disabled "; } else { strp = "Filtering Enabled "; } } len += snprintf(page+len, (INM_PAGESZ - len), "%s", strp); if (is_target_read_only(ctxt)) { strp = ", Read-Only "; } else { strp = ", Read-Write "; } len += snprintf(page+len, (INM_PAGESZ - len), "%s", strp); if (ctxt->tc_resync_required) { len += snprintf(page+len, (INM_PAGESZ - len), ", Resync Required "); } len += snprintf(page+len, (INM_PAGESZ - len), "\n"); if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP){ len += snprintf(page+len, (INM_PAGESZ - len), "Filtering Mode/Write Order State : "); if (ctxt->tc_cur_mode == FLT_MODE_DATA) { strp = "Data"; } else if (ctxt->tc_cur_mode == FLT_MODE_METADATA) { strp = "MetaData"; } else { strp = "Uninitialized/"; } if (ctxt->tc_cur_wostate == ecWriteOrderStateData) strp_2 = "Data\n"; else if (ctxt->tc_cur_wostate == ecWriteOrderStateMetadata) strp_2 = "MetaData\n"; else if (ctxt->tc_cur_wostate == ecWriteOrderStateBitmap) strp_2 = "Bitmap\n"; else if (ctxt->tc_cur_wostate == ecWriteOrderStateRawBitmap) strp_2 = "Raw Bitmap\n"; else strp_2 = "Uninitialized\n"; len += snprintf(page+len, (INM_PAGESZ - len), "%s/%s", strp, strp_2); len += snprintf(page+len, (INM_PAGESZ - len), "Time spent in Curr mode/state (sec) : %llu/%llu\n", (INM_GET_CURR_TIME_IN_SEC - ctxt->tc_stats.st_mode_switch_time), (INM_GET_CURR_TIME_IN_SEC - ctxt->tc_stats.st_wostate_switch_time)); len += snprintf(page+len, (INM_PAGESZ - len), "Writes (bytes) : %lld\n", ctxt->tc_bytes_tracked); len += snprintf(page+len, (INM_PAGESZ - len), "Pending Changes/bytes : %lld/%lld\n", ctxt->tc_pending_changes, ctxt->tc_bytes_pending_changes); len += snprintf(page+len, (INM_PAGESZ - len), "Pending Changes/bytes in metadata : %lld/%lld\n", ctxt->tc_pending_md_changes, ctxt->tc_bytes_pending_md_changes); len += snprintf(page+len, (INM_PAGESZ - len), "Committed Changes/bytes : %lld/%lld\n", ctxt->tc_commited_changes, ctxt->tc_bytes_commited_changes); len += snprintf(page+len, (INM_PAGESZ - len), "Chain Bio Submitted/Pending : %d/%d\n", INM_ATOMIC_READ(&ctxt->tc_nr_chain_bios_submitted), INM_ATOMIC_READ(&ctxt->tc_nr_chain_bios_pending)); len += snprintf(page+len, (INM_PAGESZ - len), "Chain Bio in child/Own stack : %d/%d\n", INM_ATOMIC_READ(&ctxt->tc_nr_completed_in_child_stack), INM_ATOMIC_READ(&ctxt->tc_nr_completed_in_own_stack)); if (ctxt->tc_dev_type == FILTER_DEV_FABRIC_LUN) { len += snprintf(page+len, (INM_PAGESZ - len), "Total Write IOs received : %llu (%llu bytes)\n", ctxt->tc_stats.tc_write_io_rcvd, ctxt->tc_stats.tc_write_io_rcvd_bytes); len += snprintf(page+len, (INM_PAGESZ - len), "Total Write IO Cancels received : %d (%llu bytes)\n", INM_ATOMIC_READ(&(ctxt->tc_stats.tc_write_cancel)), ctxt->tc_stats.tc_write_cancel_rcvd_bytes); } len += snprintf(page+len, (INM_PAGESZ - len), "Data Files Created/Pending : %d/%d\n", INM_ATOMIC_READ(&ctxt->tc_stats.num_dfm_files), INM_ATOMIC_READ(&ctxt->tc_stats.num_dfm_files_pending)); len += snprintf(page+len, (INM_PAGESZ - len), "Data File Disk Space (Alloc/Used) : %lld/%lld\n", ctxt->tc_data_to_disk_limit, ctxt->tc_stats.dfm_bytes_to_disk); len += snprintf(page+len, (INM_PAGESZ - len), "Bitmap Changes Queued/bytes : %llu/%llu\n", bitmap->num_changes_queued_for_writing, bitmap->num_byte_changes_queued_for_writing); len += snprintf(page+len, (INM_PAGESZ - len), "Bitmap Changes Read/bytes : %llu/%llu (%llu times)\n", 
bitmap->num_changes_read_from_bitmap, bitmap->num_byte_changes_read_from_bitmap, bitmap->num_of_times_bitmap_read); len += snprintf(page+len, (INM_PAGESZ - len), "Bitmap Changes Written/bytes : %llu/%llu (%llu times)\n", bitmap->num_changes_written_to_bitmap, bitmap->num_byte_changes_written_to_bitmap, bitmap->num_of_times_bitmap_written); #ifdef INM_AIX len += snprintf(page+len, (INM_PAGESZ - len), "Async bufs submitted/processed : %u, %u/%u, %u\n", INM_ATOMIC_READ(&ctxt->tc_async_bufs_pending), INM_ATOMIC_READ(&ctxt->tc_async_bufs_write_pending), INM_ATOMIC_READ(&ctxt->tc_async_bufs_processed), INM_ATOMIC_READ(&ctxt->tc_async_bufs_write_processed)); len += snprintf(page+len, (INM_PAGESZ - len), "No. of requests queued to thread : %lu\n", ctxt->tc_nr_requests_queued); len += snprintf(page+len, (INM_PAGESZ - len), "No. of calls to ddwrite : %lu\n", ctxt->tc_nr_ddwrites_called); len += snprintf(page+len, (INM_PAGESZ - len), "Request has both read & write bufs : %d\n", INM_ATOMIC_READ(&ctxt->tc_mixedbufs)); len += snprintf(page+len, (INM_PAGESZ - len), "First Read / Write : %u / %u\n", INM_ATOMIC_READ(&ctxt->tc_read_buf_first), INM_ATOMIC_READ(&ctxt->tc_write_buf_first)); len += snprintf(page+len, (INM_PAGESZ - len), "No. of bufs submitted/processed : %u/%u\n", INM_ATOMIC_READ(&ctxt->tc_nr_bufs_pending), INM_ATOMIC_READ(&ctxt->tc_nr_bufs_processed)); len += snprintf(page+len, (INM_PAGESZ - len), "No. of bufs queued/processed : %lu/%lu\n", ctxt->tc_nr_bufs_queued_to_thread, ctxt->tc_nr_bufs_processed_by_thread); len += snprintf(page+len, (INM_PAGESZ - len), "No. of queued bufs submitted : %lu\n", ctxt->tc_nr_processed_queued_bufs); len += snprintf(page+len, (INM_PAGESZ - len), "More done set/more bufs submitted : %d/%d\n", ctxt->tc_more_done_set, ctxt->tc_nr_bufs_submitted_gr_than_one); len += snprintf(page+len, (INM_PAGESZ - len), "Meta split I/Os exception : %lu\n", ctxt->tc_nr_spilt_io_data_mode); len += snprintf(page+len, (INM_PAGESZ - len), "No. of xm_mapin failures : %lu\n", ctxt->tc_nr_xm_mapin_failures); #endif len += snprintf(page+len, (INM_PAGESZ - len), "Pending Changes in each state : " "Data = %lld | Meta = %lld | Bitmap = %lld\n", ctxt->tc_pending_wostate_data_changes, ctxt->tc_pending_wostate_md_changes, ctxt->tc_pending_wostate_bm_changes); len += snprintf(page+len, (INM_PAGESZ - len), "Pages Allocated : %d\n", ctxt->tc_stats.num_pages_allocated); len += snprintf(page+len, (INM_PAGESZ - len), "No. of Pages Reserved : %u\n", ctxt->tc_reserved_pages); len += snprintf(page+len, (INM_PAGESZ - len), "No. of change-node pages : %lld\n", ctxt->tc_cnode_pgs); len += snprintf(page+len, (INM_PAGESZ - len), "Threshold for DF in pages : %u \n", (driver_ctx->tunable_params.volume_percent_thres_for_filewrite* ctxt->tc_reserved_pages)/100); len += snprintf(page+len, (INM_PAGESZ - len), "Pages in DF Queue : %d\n", ctxt->tc_stats.num_pgs_in_dfm_queue); len += snprintf(page+len, (INM_PAGESZ - len), "Changes Lost : %s\n", (ctxt->tc_resync_required) ? 
"Yes" : "No"); len += snprintf(page+len, (INM_PAGESZ - len), "DB Notify Threshold : %d\n", ctxt->tc_db_notify_thres); } else { len += snprintf(page+len, (INM_PAGESZ - len), "Writes/bytes sent to PS : %lld/%lld\n", ctxt->tc_commited_changes, ctxt->tc_bytes_commited_changes); } len += snprintf(page+len, (INM_PAGESZ - len), "Tags Dropped : %d\n", INM_ATOMIC_READ(&ctxt->tc_stats.num_tags_dropped)); len += snprintf(page+len, (INM_PAGESZ - len), "Metadata transition due to delay Data Pool allocation : %d\n", INM_ATOMIC_READ(&ctxt->tc_stats.metadata_trans_due_to_delay_alloc)); if (ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP){ len += snprintf(page+len, (INM_PAGESZ - len), "Total Mode Transistions : " "Data = %ld | Meta = %ld\n", ctxt->tc_stats.num_change_to_flt_mode[FLT_MODE_DATA], ctxt->tc_stats.num_change_to_flt_mode[FLT_MODE_METADATA]); len += snprintf(page+len, (INM_PAGESZ - len), "Total Time spent in each Mode(sec) : " "Data = %ld | Meta = %ld\n", ctxt->tc_stats.num_secs_in_flt_mode[FLT_MODE_DATA], ctxt->tc_stats.num_secs_in_flt_mode[FLT_MODE_METADATA]); len += snprintf(page+len, (INM_PAGESZ - len), "Total State Transistions : " "Data = %ld | Meta = %ld | Bitmap = %ld\n", ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateData] + ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateData], ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateMetadata] + ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateMetadata], ctxt->tc_stats.num_change_to_wostate[ecWriteOrderStateBitmap] + ctxt->tc_stats.num_change_to_wostate_user[ecWriteOrderStateBitmap]); len += snprintf(page+len, (INM_PAGESZ - len), "Total Time spent in each State(sec) : " "Data = %ld | Meta = %ld | Bitmap = %ld\n", ctxt->tc_stats.num_secs_in_wostate[ecWriteOrderStateData], ctxt->tc_stats.num_secs_in_wostate[ecWriteOrderStateMetadata], ctxt->tc_stats.num_secs_in_wostate[ecWriteOrderStateBitmap]); } else{ len += snprintf(page+len, (INM_PAGESZ - len), "Times PT Write I/O Cancelled : %d\n", INM_ATOMIC_READ(&ctxt->tc_stats.tc_write_cancel)); } len += snprintf(page+len, (INM_PAGESZ - len), "IO Pattern\n"); len += snprintf(page+len, (INM_PAGESZ - len), "OP 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1M 2M 4M 8M 8M+\n"); len += snprintf(page+len, (INM_PAGESZ - len), "W "); for (idx = 0; idx < MAX_NR_IO_BUCKETS; idx++) { len += snprintf(page+len, (INM_PAGESZ - len), "%d ", INM_ATOMIC_READ(&ctxt->tc_stats.io_pat_writes[idx])); } len += snprintf(page+len, (INM_PAGESZ - len), "\n"); len += snprintf(page+len, (INM_PAGESZ - len), "\n"); print_AT_stat_common(ctxt, page, &len); len += snprintf(page+len, (INM_PAGESZ - len), "\n"); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); disk_cx_sess = &ctxt->tc_disk_cx_session; if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED)) { len += snprintf(page+len, (INM_PAGESZ - len), "CX State : Session Not Started\n"); goto unlock_session_lock; } if (disk_cx_sess->dcs_flags & DCS_CX_SESSION_ENDED) strp = "Session Ended"; else strp = "Session Started"; len += snprintf(page+len, (INM_PAGESZ - len), "CX State : %s\n", strp); len += snprintf(page+len, (INM_PAGESZ - len), "Session Number : %llu\n", disk_cx_sess->dcs_nth_cx_session); len += snprintf(page+len, (INM_PAGESZ - len), "CX Start / End Time : %llu/%llu\n", disk_cx_sess->dcs_start_ts, disk_cx_sess->dcs_end_ts); #ifdef INM_DEBUG len += snprintf(page+len, (INM_PAGESZ - len), "Base Time for 1 sec interval : %llu\n", disk_cx_sess->dcs_base_secs_ts); len += snprintf(page+len, (INM_PAGESZ - len), 
"Tracked bytes in 1 sec : %llu\n", disk_cx_sess->dcs_tracked_bytes_per_second); #endif len += snprintf(page+len, (INM_PAGESZ - len), "Tracked / Drained bytes : %llu / %llu\n", disk_cx_sess->dcs_tracked_bytes, disk_cx_sess->dcs_drained_bytes); len += snprintf(page+len, (INM_PAGESZ - len), "Churn Buckets : "); for (idx = 0; idx < DEFAULT_NR_CHURN_BUCKETS; idx++) { len += snprintf(page+len, (INM_PAGESZ - len), "%llu ", disk_cx_sess->dcs_churn_buckets[idx]); } len += snprintf(page+len, (INM_PAGESZ - len), "\nMax Peak / Excess Churn : %llu / %llu\n", disk_cx_sess->dcs_max_peak_churn, disk_cx_sess->dcs_excess_churn); len += snprintf(page+len, (INM_PAGESZ - len), "First / Last Peak Churn TS : %llu / %llu\n", disk_cx_sess->dcs_first_peak_churn_ts, disk_cx_sess->dcs_last_peak_churn_ts); len += snprintf(page+len, (INM_PAGESZ - len), "First / Last NW failure TS : %llu / %llu\n", disk_cx_sess->dcs_first_nw_failure_ts, disk_cx_sess->dcs_last_nw_failure_ts); len += snprintf(page+len, (INM_PAGESZ - len), "No. of NW failires : %llu\n", disk_cx_sess->dcs_nr_nw_failures); len += snprintf(page+len, (INM_PAGESZ - len), "S2 latency : %llu\n", disk_cx_sess->dcs_max_s2_latency); unlock_session_lock: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); len += snprintf(page+len, (INM_PAGESZ - len), "History \n"); len += snprintf(page+len, (INM_PAGESZ - len), "Start filtering issued at (sec) : %llu\n", ctxt->tc_hist.ths_start_flt_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Clear stats issued at (sec) : %llu\n", ctxt->tc_hist.ths_clrstats_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Clear diffs issued : %u\n", ctxt->tc_hist.ths_nr_clrdiffs); len += snprintf(page+len, (INM_PAGESZ - len), "Last Clear diffs issued at (sec) : %llu\n", ctxt->tc_hist.ths_clrdiff_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Times Resync marked : %u\n", ctxt->tc_hist.ths_nr_osyncs); len += snprintf(page+len, (INM_PAGESZ - len), "Last Resync marked at(sec) : %llu\n", ctxt->tc_hist.ths_osync_ts); len += snprintf(page+len, (INM_PAGESZ - len), "Last Resync Error : %u\n", ctxt->tc_hist.ths_osync_err); len += snprintf(page+len, (INM_PAGESZ - len), "\n"); if (INM_COPYOUT(vol_stat->bufp, page, MIN(vol_stat->buf_len, INM_PAGESZ))) { err("copyout failed"); ret = INM_EFAULT; goto out; } out: if(ctxt) put_tgt_ctxt(ctxt); if (page){ INM_KFREE(page, INM_PAGESZ, INM_KERNEL_HEAP); } if (vol_stat){ INM_KFREE(vol_stat, sizeof(VOLUME_STATS), INM_KERNEL_HEAP); } return ret; } inm_s32_t process_get_volume_stats_v2_ioctl(inm_devhandle_t *handle, void * arg) { TELEMETRY_VOL_STATS *telemetry_vol_stats = NULL; int ret = 0; target_context_t *tgt_ctxt = NULL; VOLUME_STATS_DATA *drv_statsp = NULL; VOLUME_STATS_V2 *vol_statsp = NULL; telemetry_vol_stats = (TELEMETRY_VOL_STATS *) INM_KMALLOC( sizeof(TELEMETRY_VOL_STATS), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!telemetry_vol_stats) { err("Failed to allocated telemetry_vol_stats\n"); ret = INM_ENOMEM; goto out; } if (INM_COPYIN(telemetry_vol_stats, arg, sizeof(TELEMETRY_VOL_STATS))) { err("Failed to copy into telemetry_vol_stats"); ret = -EFAULT; goto out; } drv_statsp = &(telemetry_vol_stats->drv_stats); vol_statsp = &(telemetry_vol_stats->vol_stats); vol_statsp->VolumeGUID[GUID_SIZE_IN_CHARS-1] = '\0'; tgt_ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&vol_statsp->VolumeGUID[0]); if (!tgt_ctxt) { err("Failed to get target context for uuid %s", vol_statsp->VolumeGUID); ret = -ENODEV; goto out; } drv_statsp->usMajorVersion = 
VOLUME_STATS_DATA_MAJOR_VERSION; drv_statsp->usMinorVersion = VOLUME_STATS_DATA_MINOR_VERSION; /* Fills up only one volume stats per call */ drv_statsp->ulVolumesReturned = 1; drv_statsp->ulNonPagedMemoryLimitInMB = 0; drv_statsp->LockedDataBlockCounter = 0; drv_statsp->ulTotalVolumes = driver_ctx->total_prot_volumes; drv_statsp->ulNumProtectedDisk = driver_ctx->total_prot_volumes; drv_statsp->eServiceState = driver_ctx->service_state; if (driver_ctx->dc_tel.dt_blend & DBS_DRIVER_NOREBOOT_MODE) drv_statsp->eDiskFilterMode = NoRebootMode; else drv_statsp->eDiskFilterMode = RebootMode; drv_statsp->LastShutdownMarker = driver_ctx->clean_shutdown; drv_statsp->PersistentRegistryCreated = driver_ctx->dc_tel.dt_persistent_dir_created; drv_statsp->ulDriverFlags = driver_ctx->dc_flags; drv_statsp->ulCommonBootCounter = 0; drv_statsp->ullDataPoolSizeAllocated = (driver_ctx->data_flt_ctx.pages_allocated * PAGE_SIZE); drv_statsp->ullPersistedTimeStampAfterBoot = driver_ctx->dc_tel.dt_timestamp_in_persistent_store; drv_statsp->ullPersistedSequenceNumberAfterBoot = driver_ctx->dc_tel.dt_seqno_in_persistent_store; vol_statsp->ullDataPoolSize = (driver_ctx->tunable_params.data_pool_size << MEGABYTE_BIT_SHIFT); vol_statsp->liDriverLoadTime.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_drv_load_time); vol_statsp->llTimeJumpDetectedTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_time_jump_exp); vol_statsp->llTimeJumpedTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_time_jump_cur); vol_statsp->liLastS2StartTime.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_s2_start_time); vol_statsp->liLastS2StopTime.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_s2_stop_time); vol_statsp->liLastAgentStartTime.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_svagent_start_time); vol_statsp->liLastAgentStopTime.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_svagent_stop_time); vol_statsp->liLastTagReq.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_last_tag_request_time); vol_statsp->liStopFilteringAllTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( driver_ctx->dc_tel.dt_unstack_all_time); vol_statsp->ullTotalTrackedBytes = tgt_ctxt->tc_bytes_commited_changes; vol_statsp->ulVolumeFlags = tgt_ctxt->tc_flags; vol_statsp->ulVolumeSize.QuadPart = inm_dev_size_get(tgt_ctxt); vol_statsp->liVolumeContextCreationTS.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_create_time); vol_statsp->liStartFilteringTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_SEC( tgt_ctxt->tc_hist.ths_start_flt_ts); vol_statsp->liStartFilteringTimeStampByUser.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_start_flt_time_by_user); vol_statsp->liStopFilteringTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_stop_flt_time); vol_statsp->liStopFilteringTimestampByUser.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_user_stop_flt_time); vol_statsp->liClearDiffsTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_SEC( tgt_ctxt->tc_hist.ths_clrdiff_ts); vol_statsp->liCommitDBTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_commitdb_time); vol_statsp->liGetDBTimeStamp.QuadPart = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( tgt_ctxt->tc_tel.tt_getdb_time); if (INM_COPYOUT(arg, telemetry_vol_stats, 
sizeof(TELEMETRY_VOL_STATS))) { err("Failed to copyout from telemetry_vol_stats\n"); ret = INM_EFAULT; goto out; } out: if (tgt_ctxt) { put_tgt_ctxt(tgt_ctxt); } if (telemetry_vol_stats) { INM_KFREE(telemetry_vol_stats, sizeof(TELEMETRY_VOL_STATS), INM_KERNEL_HEAP); } return ret; } inm_s32_t process_get_protected_volume_list_ioctl(inm_devhandle_t *handle, void * arg) { GET_VOLUME_LIST *vol_list = NULL; inm_s32_t ret = 0; inm_u32_t len = 0; inm_schar *lbufp = NULL; target_context_t *tgt_ctxt = NULL; struct inm_list_head *ptr = NULL, *nextptr = NULL; inm_u32_t usr_buf_len = 0; inm_u32_t unit_guid_len = 0; inm_u32_t lbuf_len = INM_GUID_LEN_MAX + 2; vol_list = (GET_VOLUME_LIST *) INM_KMALLOC(sizeof(GET_VOLUME_LIST), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!vol_list){ err("vol_list allocation volume_list ioctl failed\n"); ret = INM_ENOMEM; goto out; } if (INM_COPYIN(vol_list, (GET_VOLUME_LIST *) arg, sizeof(GET_VOLUME_LIST))) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } if(vol_list->buf_len == 0){ err("allocate some space in user space"); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); len = inm_calc_len_required(driver_ctx->tgt_list.next); INM_UP_READ(&(driver_ctx->tgt_list_sem)); ret = INM_EAGAIN; goto out; } lbufp = (char *)INM_KMALLOC(lbuf_len, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!lbufp){ err("buffer allocation volume_list ioctl failed\n"); ret = INM_ENOMEM; goto out; } usr_buf_len = vol_list->buf_len; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if(tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ tgt_ctxt = NULL; continue; } unit_guid_len = snprintf(lbufp, lbuf_len, "%s\n", tgt_ctxt->tc_guid); if (vol_list->buf_len < (len + unit_guid_len)) { err("insufficient mem allocated by user"); ret = INM_EAGAIN; len += inm_calc_len_required(ptr); break; } if (INM_COPYOUT(vol_list->bufp + len, lbufp, unit_guid_len)) { err("copyout failed\n"); ret = INM_EFAULT; break; } len += unit_guid_len; } INM_UP_READ(&(driver_ctx->tgt_list_sem)); if ( (len+1) > vol_list->buf_len){ ret = INM_EAGAIN; } if (INM_COPYOUT(vol_list->bufp + len, "\0", sizeof("\0"))) { err("copyout failed\n"); ret = INM_EFAULT; } out: if (ret == INM_EAGAIN){ vol_list->buf_len = len + EXTRA_PROTECTED_VOLUME; vol_list->bufp = NULL; if (INM_COPYOUT((GET_VOLUME_LIST *) arg, vol_list, sizeof (GET_VOLUME_LIST))) { err("copyout failed\n"); ret = INM_EFAULT; } } if(lbufp){ INM_KFREE(lbufp, lbuf_len, INM_KERNEL_HEAP); } if(vol_list){ INM_KFREE(vol_list, sizeof(GET_VOLUME_LIST), INM_KERNEL_HEAP); } return (ret); } inm_s32_t process_get_set_attr_ioctl(inm_devhandle_t *handle, void __INM_USER *arg) { inm_attribute_t *attr = NULL; inm_s32_t ret = 0; if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(inm_attribute_t))){ err("Access ok failed"); return -EFAULT; } attr = (inm_attribute_t *) INM_KMALLOC(sizeof(inm_attribute_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!attr){ err("attr allocation get_set ioctl failed\n"); ret = INM_ENOMEM; goto out; } INM_MEM_ZERO(attr, sizeof(*attr)); if (INM_COPYIN(attr, arg, sizeof(inm_attribute_t))) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } INM_BUG_ON((attr->why != SET_ATTR) && (attr->why != GET_ATTR)); if(!strcmp(attr->guid.volume_guid, "common")){ ret = common_get_set_attribute_entry(attr); } else { ret = volume_get_set_attribute_entry(attr); } out: if (attr){ INM_KFREE(attr, sizeof(inm_attribute_t), INM_KERNEL_HEAP); attr = NULL; } return ret; } inm_u32_t 
process_boottime_stacking_ioctl(inm_devhandle_t *handle, void * arg) { if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } dbg("boottime stacking during ioctl"); init_boottime_stacking(); if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return 0; } inm_u32_t process_mirror_exception_notify_ioctl(inm_devhandle_t *handle, void * arg) { char *src_scsi_id = NULL; target_context_t *tgt_ctxt = NULL; inm_resync_notify_info_t *resync_info = NULL; host_dev_ctx_t *hdcp = NULL; inm_irqflag_t lock_flag = 0; inm_u32_t ret = 0; dbg("entered"); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); if(driver_ctx->dc_flags & DRV_MIRROR_NOT_SUPPORT){ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); err("Mirror is not supported as the system didn't have any SCSI device at driver load time"); return INM_ENOTSUP; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); resync_info = (inm_resync_notify_info_t *) INM_KMALLOC(sizeof(*resync_info), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!resync_info){ err("resync_info allocation failed in mirror exception notify ioctl\n"); ret = INM_ENOMEM; goto out; } INM_MEM_ZERO(resync_info, sizeof(*resync_info)); if (INM_COPYIN(resync_info, (inm_resync_notify_info_t *) arg, sizeof(*resync_info))) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } src_scsi_id = resync_info->rsin_src_scsi_id; if (!strcmp(src_scsi_id, "")){ err("NULL scsi id sent to resync notification ioctl"); ret = INM_EINVAL; resync_info->rstatus = SRC_DEV_SCSI_ID_ERR; goto out; } INM_DOWN_READ(&driver_ctx->tgt_list_sem); tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(src_scsi_id); INM_UP_READ(&driver_ctx->tgt_list_sem); if (!tgt_ctxt) { dbg("Volume is not filtering"); resync_info->rstatus = MIRROR_NOT_SETUP; ret = INM_EINVAL; goto out; } hdcp = (host_dev_ctx_t *)(tgt_ctxt->tc_priv); if (resync_info->rsin_flag & INM_RESET_RESYNC_REQ_FLAG) { reset_volume_out_of_sync(tgt_ctxt); resync_info->rsin_flag &= ~INM_RESET_RESYNC_REQ_FLAG; } if ((ret = inm_wait_exception_ev(tgt_ctxt, resync_info))){ goto out; } if (INM_COPYOUT((inm_resync_notify_info_t *) arg, resync_info, sizeof(*resync_info))) { err("copyout failed\n"); ret = INM_EFAULT; } out: dbg("exiting"); if(resync_info){ INM_KFREE(resync_info, sizeof(*resync_info), INM_KERNEL_HEAP); } if (tgt_ctxt){ put_tgt_ctxt(tgt_ctxt); } return ret; } inm_s32_t process_get_dmesg(inm_devhandle_t *handle, void * arg) { inm_s32_t ret = 0; #ifdef INM_AIX inm_flush_log_file(); #endif return ret; } inm_s32_t process_mirror_test_heartbeat(inm_devhandle_t *idhp, void __INM_USER *arg) { target_context_t *tgt_ctxt = NULL; SCSI_ID *scsi_id = NULL; inm_irqflag_t lock_flag = 0; inm_s32_t ret = 0; dbg("entered"); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, lock_flag); if(driver_ctx->dc_flags & DRV_MIRROR_NOT_SUPPORT){ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); err("Mirror is not supported as the system didn't have any SCSI device at driver load time"); return INM_ENOTSUP; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, lock_flag); if (!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(SCSI_ID))) { err("Read access violation for SCSI_ID"); return -EFAULT; } scsi_id = (SCSI_ID *)INM_KMALLOC(sizeof(SCSI_ID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!scsi_id) { err("INM_KMALLOC failed to allocate memory for SCSI_ID"); return -ENOMEM; } if (INM_COPYIN(scsi_id, arg, sizeof(SCSI_ID))) { err("INM_COPYIN failed"); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); 
return -EFAULT; } scsi_id->scsi_id[INM_MAX_SCSI_ID_SIZE-1] = '\0'; INM_DOWN_READ(&driver_ctx->tgt_list_sem); tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked((char *)&scsi_id->scsi_id[0]); if (!tgt_ctxt) { INM_UP_READ(&driver_ctx->tgt_list_sem); dbg("Failed to get target context from scsi id:%s", scsi_id->scsi_id); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); return 0; } INM_UP_READ(&driver_ctx->tgt_list_sem); ret = inm_heartbeat_cdb(tgt_ctxt); INM_KFREE(scsi_id, sizeof(SCSI_ID), INM_KERNEL_HEAP); idhp->private_data = NULL; put_tgt_ctxt(tgt_ctxt); dbg("leaving"); return ret; } static void print_AT_stat_common(target_context_t *tcp, char *page, inm_s32_t *len) { mirror_vol_entry_t *vol_entry = NULL; struct inm_list_head *ptr, *hd, *nextptr; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("entered with tcp %p, page %p, len %p len %d",tcp, page, len, *len); } if(tcp->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ (*len) += snprintf((page+(*len)), (INM_PAGESZ - (*len)), "AT name, #IOs issued, #successful IOs, #bytes written, path status, #refs\n"); volume_lock(tcp); hd = &(tcp->tc_dst_list); inm_list_for_each_safe(ptr, nextptr, hd){ vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); (*len) += snprintf((page+(*len)), (INM_PAGESZ - (*len)), "%s, %llu, %llu, %llu, %s %d\n", vol_entry->tc_mirror_guid, vol_entry->vol_io_issued, vol_entry->vol_io_succeeded, vol_entry->vol_byte_written, vol_entry->vol_error?"offline":"online", INM_ATOMIC_READ(&(vol_entry->vol_ref))); } volume_unlock(tcp); } if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ dbg("exiting"); } } inm_s32_t process_get_additional_volume_stats(inm_devhandle_t *handle, void *arg) { inm_s32_t ret = -1; VOLUME_STATS_ADDITIONAL_INFO *vsa_infop = NULL; target_context_t *ctxt = NULL; inm_irqflag_t lock_flag = 0; if ( !INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(VOLUME_STATS_ADDITIONAL_INFO))) { err( " Read Access Violation for GET_ADDITIONAL_VOLUME_STATS\n"); ret = -EFAULT; return (ret); } vsa_infop = (VOLUME_STATS_ADDITIONAL_INFO *) INM_KMALLOC(sizeof(VOLUME_STATS_ADDITIONAL_INFO), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!vsa_infop) { ret = -ENOMEM; err("INM_KMALLOC() failed for additional stats structure\n"); return (ret); } INM_MEM_ZERO(vsa_infop, sizeof(VOLUME_STATS_ADDITIONAL_INFO)); if ( INM_COPYIN( vsa_infop, arg, sizeof( VOLUME_STATS_ADDITIONAL_INFO ))) { err("INM_COPYIN failed"); INM_KFREE(vsa_infop, sizeof( VOLUME_STATS_ADDITIONAL_INFO ), INM_KERNEL_HEAP); ret = -EFAULT; return (ret); } vsa_infop->VolumeGuid.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&vsa_infop->VolumeGuid.volume_guid[0]); if (!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(vsa_infop, sizeof( VOLUME_STATS_ADDITIONAL_INFO ), INM_KERNEL_HEAP); ret = -EINVAL; return (ret); } /* collect the in-core pending changes and set appropriate rpo timestamp */ volume_lock(ctxt); vsa_infop->ullTotalChangesPending = ctxt->tc_bytes_pending_changes; vsa_infop->ullOldestChangeTimeStamp = get_rpo_timestamp(ctxt, IOCTL_INMAGE_GET_ADDITIONAL_VOLUME_STATS, NULL); volume_unlock(ctxt); /* Get the current driver time stamp */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->time_stamp_lock, lock_flag); vsa_infop->ullDriverCurrentTimeStamp = driver_ctx->last_time_stamp; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->time_stamp_lock, lock_flag); /* RPO can't be negative */ if (vsa_infop->ullDriverCurrentTimeStamp < vsa_infop->ullOldestChangeTimeStamp) { vsa_infop->ullDriverCurrentTimeStamp = vsa_infop->ullOldestChangeTimeStamp; 
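/*
 * The consumer derives RPO as (driver current timestamp - oldest change
 * timestamp), so the clamp above keeps that difference from going
 * negative when the oldest pending change is stamped ahead of the
 * cached driver time. An illustrative consumer-side derivation (assumes
 * both fields carry the same time units; the helper is hypothetical):
 */
#if 0
static inline inm_u64_t rpo_from_stats(VOLUME_STATS_ADDITIONAL_INFO *s)
{
	/* never negative thanks to the clamp above */
	return s->ullDriverCurrentTimeStamp - s->ullOldestChangeTimeStamp;
}
#endif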
} /* Add outstanding bitmap changes to the total pending changes */ if (ctxt->tc_bp && ctxt->tc_bp->volume_bitmap && ctxt->tc_bp->volume_bitmap->bitmap_api) { bitmap_api_t *bapi = ctxt->tc_bp->volume_bitmap->bitmap_api; INM_DOWN(&bapi->sem); vsa_infop->ullTotalChangesPending += bitmap_api_get_dat_bytes_in_bitmap(bapi, NULL); INM_UP(&bapi->sem); } put_tgt_ctxt(ctxt); ret = 0; if ( INM_COPYOUT( (void*) arg, vsa_infop, sizeof(VOLUME_STATS_ADDITIONAL_INFO))) { err("INM_COPYOUT failed"); INM_KFREE(vsa_infop, sizeof( VOLUME_STATS_ADDITIONAL_INFO ), INM_KERNEL_HEAP); ret = -EFAULT; return (ret); } INM_KFREE(vsa_infop, sizeof( VOLUME_STATS_ADDITIONAL_INFO ), INM_KERNEL_HEAP); return (ret); } inm_s32_t process_get_volume_latency_stats(inm_devhandle_t *handle, void *arg) { inm_s32_t ret = -1; VOLUME_LATENCY_STATS *vol_latstatsp = NULL; target_context_t *ctxt = NULL; if ( !INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(VOLUME_LATENCY_STATS))) { err( " Read Access Violation for GET_VOLUME_LATENCY_STATS\n"); ret = -EFAULT; return (ret); } vol_latstatsp = (VOLUME_LATENCY_STATS *) INM_KMALLOC(sizeof(*vol_latstatsp), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!vol_latstatsp) { ret = -ENOMEM; err("INM_KMALLOC() failed for volume latency stats structure\n"); return (ret); } INM_MEM_ZERO(vol_latstatsp, sizeof(VOLUME_LATENCY_STATS)); if ( INM_COPYIN(vol_latstatsp, arg, sizeof(VOLUME_LATENCY_STATS))) { err("INM_COPYIN failed"); INM_KFREE(vol_latstatsp, sizeof(VOLUME_LATENCY_STATS), INM_KERNEL_HEAP); ret = -EFAULT; return (ret); } vol_latstatsp->VolumeGuid.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; ctxt = get_tgt_ctxt_from_uuid_nowait((char *)&vol_latstatsp->VolumeGuid.volume_guid[0]); if(!ctxt) { dbg("Failed to get target context from uuid"); INM_KFREE(vol_latstatsp, sizeof( VOLUME_LATENCY_STATS ), INM_KERNEL_HEAP); ret = -EINVAL; return (ret); } volume_lock(ctxt); retrieve_volume_latency_stats(ctxt, vol_latstatsp); volume_unlock(ctxt); put_tgt_ctxt(ctxt); ret = 0; if ( INM_COPYOUT( (void*) arg, vol_latstatsp, sizeof(VOLUME_LATENCY_STATS))) { err("INM_COPYOUT failed"); INM_KFREE(vol_latstatsp, sizeof( VOLUME_LATENCY_STATS ), INM_KERNEL_HEAP); ret = -EFAULT; return (ret); } INM_KFREE(vol_latstatsp, sizeof( VOLUME_LATENCY_STATS ), INM_KERNEL_HEAP); return (ret); } inm_s32_t process_bitmap_stats_ioctl(inm_devhandle_t *handle, void *arg) { VOLUME_BMAP_STATS *vbstatsp; VOLUME_GUID *vguidp = NULL; bmap_bit_stats_t *bbsp = NULL; inm_s32_t ret = INM_EFAULT; volume_bitmap_t *vbmap = NULL; target_context_t *tcp = NULL; vbstatsp = (VOLUME_BMAP_STATS *) INM_KMALLOC(sizeof(*vbstatsp), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!vbstatsp) { err("INM_KMALLOC() failed\n"); return -ENOMEM; } INM_MEM_ZERO(vbstatsp, sizeof(*vbstatsp)); if (INM_COPYIN(vbstatsp, arg, sizeof(*vbstatsp))) { err("copyin failed\n"); INM_KFREE(vbstatsp, sizeof(*vbstatsp), INM_KERNEL_HEAP); ret = INM_EFAULT; return (ret); } tcp = get_tgt_ctxt_from_uuid_nowait(vbstatsp->VolumeGuid.volume_guid); if(!tcp) { err("no target context for %s\n", vbstatsp->VolumeGuid.volume_guid); INM_KFREE(vbstatsp, sizeof(*vbstatsp), INM_KERNEL_HEAP); ret = INM_EFAULT; return ret; } bbsp = (bmap_bit_stats_t *) INM_KMALLOC(sizeof(*bbsp), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!bbsp) { err("INM_KMALLOC() failed\n"); put_tgt_ctxt(tcp); INM_KFREE(vbstatsp, sizeof(*vbstatsp), INM_KERNEL_HEAP); return -ENOMEM; } if (tcp->tc_bp) { vbmap = tcp->tc_bp->volume_bitmap; if (vbmap && vbmap->bitmap_api) { vbstatsp->bmap_data_sz = bitmap_api_get_dat_bytes_in_bitmap(vbmap->bitmap_api, bbsp); vbstatsp->nr_dbs = 
(inm_u32_t)bbsp->bbs_nr_dbs; vbstatsp->bmap_gran = bbsp->bbs_bmap_gran; } } info("Volume Name : %s \n", vbstatsp->VolumeGuid.volume_guid); info("bitmap gran : %lld \n", bbsp->bbs_bmap_gran); info("bitmap dblks: %d\n", bbsp->bbs_nr_dbs); INM_KFREE(bbsp, sizeof(*bbsp), INM_KERNEL_HEAP); put_tgt_ctxt(tcp); ret = 0; if ( INM_COPYOUT( (void*) arg, vbstatsp, sizeof(VOLUME_BMAP_STATS))) { err("INM_COPYOUT failed"); INM_KFREE(vbstatsp, sizeof( VOLUME_BMAP_STATS ), INM_KERNEL_HEAP); ret = -EFAULT; return (ret); } INM_KFREE(vbstatsp, sizeof(*vbstatsp), INM_KERNEL_HEAP); return (ret); } inm_s32_t process_set_involflt_verbosity(inm_devhandle_t *handle, void *arg) { inm_u32_t ioc_verbose = 0; inm_s32_t ret = 0; if (INM_COPYIN(&ioc_verbose, arg, sizeof(inm_u32_t))) { err("copyin failed\n"); ret = INM_EFAULT; return (ret); } if (ioc_verbose < 1) { inm_verbosity = 0; goto out; } inm_verbosity |= INM_DEBUG_ONLY; if (ioc_verbose < 2) { goto out; } inm_verbosity |= INM_IDEBUG; if (ioc_verbose < 3) { goto out; } inm_verbosity |= INM_IDEBUG_META; if (ioc_verbose < 4) { goto out; } inm_verbosity |= INM_IDEBUG_MIRROR; if (ioc_verbose < 5) { goto out; } inm_verbosity |= INM_IDEBUG_MIRROR_IO; if (ioc_verbose < 6) { goto out; } inm_verbosity |= INM_IDEBUG_REF; if (ioc_verbose < 7) { goto out; } inm_verbosity |= INM_IDEBUG_IO; out: return ret; } inm_s32_t process_tag_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { tag_info_t_v2 *tag_vol = NULL; int ret = 0; int numvol = 0; int no_of_vol_tags_done = 0; inm_s32_t error = 0; tag_info_t *tag_list = NULL; int commit_pending = TAG_COMMIT_PENDING; inm_u32_t vacp_app_tag_commit_timeout = 0; unsigned long lock_flag = 0; int set_tag_guid = 0; dbg("entered process_tag_volume_ioctl"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(tag_info_t_v2))) { err("Read access violation for tag_info_t_v2"); ret = -EFAULT; goto out; } tag_vol = (tag_info_t_v2 *)INM_KMALLOC(sizeof(tag_info_t_v2), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_vol) { err("INM_KMALLOC failed to allocate memory for tag_info_t_v2"); ret = -ENOMEM; goto out; } if(INM_COPYIN(tag_vol, arg, sizeof(tag_info_t_v2))) { err("INM_COPYIN failed"); ret = -EFAULT; goto out_err; } if(tag_vol->nr_tags <= 0) { err("Tag Input Failed: number of tags can't be zero or negative"); ret = -EINVAL; goto out_err; } if(tag_vol->nr_vols <= 0 && !(tag_vol->flags & TAG_ALL_PROTECTED_VOLUME_IOBARRIER)) { err("Tag Input Failed: Number of volumes can't be zero or negative"); ret = -EINVAL; goto out_err; } arg = tag_vol->tag_names; tag_vol->tag_names = NULL; tag_vol->tag_names = (tag_names_t *)INM_KMALLOC(tag_vol->nr_tags * sizeof(tag_names_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_vol->tag_names) { err("INM_KMALLOC failed to allocate memory for tag_names_t"); ret = -EFAULT; goto out_err; } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, tag_vol->nr_tags * sizeof(tag_names_t))) { err("Read access violation for tag_names_t"); ret = -EFAULT; goto out_err_vol; } if(INM_COPYIN(tag_vol->tag_names, arg, tag_vol->nr_tags * sizeof(tag_names_t))) { err("INM_COPYIN failed"); ret = -EFAULT; goto out_err_vol; } /* now build the tag list which will be used for the given set of volumes */ tag_list = build_tag_vol_list(tag_vol, &error); if(error || !tag_list) { err("build tag volume list failed for the volume"); ret = error; goto out_err_vol; } arg = tag_vol->vol_info; tag_vol->vol_info = NULL; INM_DOWN(&driver_ctx->dc_cp_mutex); if (driver_ctx->dc_cp != INM_CP_NONE) { /* Active CP */ if (INM_MEM_CMP(driver_ctx->dc_cp_guid, 
tag_vol->tag_guid, sizeof(driver_ctx->dc_cp_guid))) { err("GUID mismatch"); ret = -EINVAL; goto out_unlock; } if (driver_ctx->dc_cp & INM_CP_TAG_COMMIT_PENDING) { err("Already Tagged"); ret = -EINVAL; goto out_unlock; } } INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (driver_ctx->dc_tag_drain_notify_guid && driver_ctx->dc_cp_guid[0] == '\0' && !INM_MEM_CMP(tag_vol->tag_guid, driver_ctx->dc_tag_drain_notify_guid, GUID_LEN)) { set_tag_guid = 1; memcpy_s(&driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid), tag_vol->tag_guid, sizeof(tag_vol->tag_guid)); } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); if ((tag_vol->flags & TAG_ALL_PROTECTED_VOLUME_IOBARRIER) == TAG_ALL_PROTECTED_VOLUME_IOBARRIER) { #ifdef INM_LINUX if (driver_ctx->dc_cp == INM_CP_CRASH_ACTIVE) { dbg("issuing tag"); ret = iobarrier_issue_tag_all_volume(tag_list, tag_vol->nr_tags, commit_pending, NULL); if(ret) { dbg("Failed to tag all the volume\n"); } else update_cx_with_tag_success(); } else { err("Barrier not created"); dbg("cp state = %d", driver_ctx->dc_cp); ret = -EINVAL; } #else err("Crash consistency not supported on non-Linux platforms"); ret = -EINVAL; #endif goto out_unlock; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); vacp_app_tag_commit_timeout = driver_ctx->tunable_params.vacp_app_tag_commit_timeout; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); /* * If we need to start the timer, check for timeout sanity */ if (driver_ctx->dc_cp == INM_CP_NONE) { if (tag_vol->timeout <= 0 || tag_vol->timeout > vacp_app_tag_commit_timeout) { err("Tag Input Failed: Invalid timeout"); ret = -EINVAL; goto out_unlock; } } /* alloc a buffer and reuse it to store the volume info for a set of volumes */ tag_vol->vol_info = (volume_info_t *)INM_KMALLOC( sizeof(volume_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_vol->vol_info) { err("INM_KMALLOC failed to allocate memory for volume_info_t"); ret = -EFAULT; goto out_unlock; } for(numvol = 0; numvol < tag_vol->nr_vols; numvol++) { /* mem set the buffer before using it */ INM_MEM_ZERO(tag_vol->vol_info, sizeof(volume_info_t)); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(*tag_vol->vol_info))) { err("Read access violation for volume_info_t"); ret = -EFAULT; break; } if(INM_COPYIN(tag_vol->vol_info, arg, sizeof(*tag_vol->vol_info))) { err("INM_COPYIN failed"); ret = -EFAULT; break; } /* process the tag volume list */ tag_vol->vol_info->vol_name[TAG_VOLUME_MAX_LENGTH - 1] = '\0'; ret = process_tag_volume(tag_vol, tag_list, commit_pending); if(ret) { dbg("Failed to tag the volume\n"); } else { no_of_vol_tags_done++; } if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)arg, sizeof(*tag_vol->vol_info))) { err("write access verification failed"); ret = INM_EFAULT; break; } if(INM_COPYOUT(arg, tag_vol->vol_info, sizeof(*tag_vol->vol_info))) { err("copy to user failed for freeze volume status"); ret = INM_EFAULT; break; } arg += sizeof(*tag_vol->vol_info); } if (no_of_vol_tags_done) { if(no_of_vol_tags_done == tag_vol->nr_vols) { dbg("Tagged all volumes"); ret = INM_TAG_SUCCESS; } else { dbg("Volumes partially tagged"); ret = INM_TAG_PARTIAL; } #ifdef INM_LINUX /* * If no kernel fs freeze, start timer to * revoke tags on timeout */ if (driver_ctx->dc_cp == INM_CP_TAG_COMMIT_PENDING) { set_tag_guid = 0; memcpy_s(&driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid), tag_vol->tag_guid, sizeof(tag_vol->tag_guid)); start_cp_timer(tag_vol->timeout, inm_fvol_list_thaw_on_timeout); } #endif } else { INM_BUG_ON(ret >= 0); } 
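/*
 * Status summary for the loop above: INM_TAG_SUCCESS when every volume
 * in the request was tagged, INM_TAG_PARTIAL when only a subset was,
 * and a negative error when none were (the INM_BUG_ON asserts that a
 * zero count never pairs with a success code). An illustrative
 * caller-side check (user-space framing is hypothetical):
 */
#if 0
switch (ret) {
case INM_TAG_SUCCESS:	/* all nr_vols tagged */
	break;
case INM_TAG_PARTIAL:	/* per-volume status is in each volume_info_t */
	break;
default:		/* ret < 0: tagging failed outright */
	break;
}
#endif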
dbg("no_of_vol_tags_done [%d], no of volumes [%d]", no_of_vol_tags_done,tag_vol->nr_vols); out_unlock: INM_UP(&driver_ctx->dc_cp_mutex); out: if (set_tag_guid) INM_MEM_ZERO(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); if(tag_list) { INM_KFREE(tag_list, tag_vol->nr_tags * sizeof(tag_info_t), INM_KERNEL_HEAP); tag_list = NULL; } if(tag_vol) { if(tag_vol->vol_info) { INM_KFREE(tag_vol->vol_info, sizeof(volume_info_t), INM_KERNEL_HEAP); tag_vol->vol_info = NULL; } if(tag_vol->tag_names) { INM_KFREE(tag_vol->tag_names, tag_vol->nr_tags * sizeof(tag_names_t), INM_KERNEL_HEAP); tag_vol->tag_names = NULL; } INM_KFREE(tag_vol, sizeof(tag_info_t_v2), INM_KERNEL_HEAP); tag_vol = NULL; } dbg("leaving process_tag_volume_ioctl"); return ret; out_err_vol: tag_vol->vol_info = NULL; goto out; out_err: tag_vol->vol_info = NULL; tag_vol->tag_names = NULL; goto out; } inm_s32_t process_get_blk_mq_status_ioctl(inm_devhandle_t *handle, void *arg) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 10, 0) inm_s32_t ret = 0; inm_block_device_t *bdev = NULL; struct request_queue *q = NULL; BLK_MQ_STATUS *blk_mq_status = NULL; if (!INM_ACCESS_OK(VERIFY_READ | VERIFY_WRITE, (void __user*)arg, sizeof(BLK_MQ_STATUS))) { err( "Access Violation for GET_BLK_MQ_STATUS"); ret = -EFAULT; return ret; } blk_mq_status = (BLK_MQ_STATUS*) INM_KMALLOC(sizeof(BLK_MQ_STATUS), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!blk_mq_status) { ret = -ENOMEM; err("INM_KMALLOC failed to allocate blk_mq_status"); return ret; } INM_MEM_ZERO(blk_mq_status, sizeof(BLK_MQ_STATUS)); if (INM_COPYIN(blk_mq_status, arg, sizeof(BLK_MQ_STATUS))) { err("INM_COPYIN failed"); ret = -EFAULT; goto ERR_EXIT; } blk_mq_status->VolumeGuid.volume_guid[GUID_SIZE_IN_CHARS-1] = '\0'; blk_mq_status->blk_mq_enabled = 0; dbg("Device path: %s", blk_mq_status->VolumeGuid.volume_guid); bdev = open_by_dev_path(blk_mq_status->VolumeGuid.volume_guid, 0); if (!bdev) { dbg("Failed to convert dev path (%s) to bdev", blk_mq_status->VolumeGuid.volume_guid); ret = -ENODEV; goto ERR_EXIT; } q = bdev_get_queue(bdev); if (q->mq_ops != NULL) { blk_mq_status->blk_mq_enabled = 1; } if (INM_COPYOUT(arg, blk_mq_status, sizeof(BLK_MQ_STATUS))) { err("copyout failed"); ret = INM_EFAULT; } ERR_EXIT: if (bdev != NULL) close_bdev(bdev, FMODE_READ); if (blk_mq_status != NULL) INM_KFREE(blk_mq_status, sizeof(BLK_MQ_STATUS), INM_KERNEL_HEAP); return ret; #else return INM_ENOTSUP; #endif } inm_s32_t process_replication_state_ioctl(inm_devhandle_t *handle, void *arg) { inm_s32_t error = 0; replication_state_t *rep = NULL; target_context_t *ctxt = NULL; if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(replication_state_t))) { err( "Access Violation for replication_state_t"); error = -EFAULT; goto out; } rep = (replication_state_t *)INM_KMALLOC(sizeof(replication_state_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!rep) { error = -ENOMEM; err("INM_KMALLOC failed to allocate replication_state_t"); goto out; } if (INM_COPYIN(rep, arg, sizeof(replication_state_t))) { err("copyin failed for replication_state_t"); error = -EFAULT; goto out; } if (!(rep->ulFlags & REPLICATION_STATES_SUPPORTED)) { err("Unsupported flag %llu", rep->ulFlags); error = -EINVAL; goto out; } ctxt = get_tgt_ctxt_from_uuid(rep->DeviceId.volume_guid); if (!ctxt) { err("Cannot find %s to set DS throttle", rep->DeviceId.volume_guid); error = -EFAULT; goto out; } volume_lock(ctxt); if (!ctxt->tc_tel.tt_ds_throttle_start || (ctxt->tc_tel.tt_ds_throttle_stop != TELEMETRY_THROTTLE_IN_PROGRESS)) { 
telemetry_set_dbs(&ctxt->tc_tel.tt_blend, DBS_DIFF_SYNC_THROTTLE); get_time_stamp(&ctxt->tc_tel.tt_ds_throttle_start); ctxt->tc_tel.tt_ds_throttle_stop = TELEMETRY_THROTTLE_IN_PROGRESS; } volume_unlock(ctxt); put_tgt_ctxt(ctxt); out: if (rep) INM_KFREE(rep, sizeof(replication_state_t), INM_KERNEL_HEAP); return error; } inm_s32_t process_name_mapping_ioctl(inm_devhandle_t *handle, void *arg) { inm_s32_t error = 0; vol_name_map_t *vnmap= NULL; target_context_t *ctxt = NULL; if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(vol_name_map_t))) { err( "Access Violation for vol_name_map_t"); error = -EFAULT; goto out; } vnmap = (vol_name_map_t *)INM_KMALLOC(sizeof(vol_name_map_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!vnmap) { error = -ENOMEM; err("INM_KMALLOC failed to allocate vol_name_map_t"); goto out; } if (INM_COPYIN(vnmap, arg, sizeof(vol_name_map_t))) { err("copyin failed for vol_name_map_t"); error = -EFAULT; goto out; } if (!(vnmap->vnm_flags & INM_VOL_NAME_MAP_GUID) && !(vnmap->vnm_flags & INM_VOL_NAME_MAP_PNAME)) { err("Request flag not set"); error = -EINVAL; goto out; } vnmap->vnm_request[sizeof(vnmap->vnm_request) - 1] = '\0'; if (vnmap->vnm_flags & INM_VOL_NAME_MAP_GUID) ctxt = get_tgt_ctxt_from_uuid_nowait(vnmap->vnm_request); else ctxt = get_tgt_ctxt_from_name_nowait(vnmap->vnm_request); if (!ctxt) { err("Cannot find %s for name mapping", vnmap->vnm_request); error = -ENODEV; goto out; } if (vnmap->vnm_flags & INM_VOL_NAME_MAP_GUID) strcpy_s(vnmap->vnm_response, sizeof(vnmap->vnm_response), ctxt->tc_pname); else strcpy_s(vnmap->vnm_response, sizeof(vnmap->vnm_response), ctxt->tc_guid); put_tgt_ctxt(ctxt); if (INM_COPYOUT(arg, vnmap, sizeof(vol_name_map_t))) { err("copyout failed"); error = INM_EFAULT; } out: if (vnmap) INM_KFREE(vnmap, sizeof(vol_name_map_t), INM_KERNEL_HEAP); return error; } inm_s32_t process_commitdb_fail_trans_ioctl(inm_devhandle_t *idhp, void *arg) { COMMIT_DB_FAILURE_STATS *cdf_stats = NULL; target_context_t *ctxt = (target_context_t *)idhp->private_data; vm_cx_session_t *vm_cx_sess; disk_cx_session_t *disk_cx_sess; inm_s32_t error = 0; if (!ctxt) { err("commitdb_fail_trans ioctl is called with file private as NULL"); error = INM_EINVAL; goto out; } if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(COMMIT_DB_FAILURE_STATS))) { err( "Access Violation for COMMIT_DB_FAILURE_STATS"); error = INM_EFAULT; goto out; } cdf_stats = INM_KMALLOC(sizeof(COMMIT_DB_FAILURE_STATS), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!cdf_stats) { err("failed to allocate COMMIT_DB_FAILURE_STATS"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(cdf_stats, arg, sizeof(COMMIT_DB_FAILURE_STATS))) { err("copyin failed for COMMIT_DB_FAILURE_STATS"); error = INM_EFAULT; goto out; } if(is_target_filtering_disabled(ctxt)) { dbg("commitdb_fail_trans ioctl failed as filtering is not enabled for" " %s", cdf_stats->DeviceID.volume_guid); error = INM_EBUSY; goto out; } get_tgt_ctxt(ctxt); vm_cx_sess = &driver_ctx->dc_vm_cx_session; disk_cx_sess = &ctxt->tc_disk_cx_session; volume_lock(ctxt); ctxt->tc_s2_latency_base_ts = 0; volume_unlock(ctxt); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (vm_cx_sess->vcs_flags & VCS_CX_SESSION_STARTED && !(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)&& disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) { change_node_t *chg_node = ctxt->tc_pending_confirm; if (chg_node && (chg_node->transaction_id == cdf_stats->ulTransactionID)) { if (cdf_stats->ullFlags & COMMITDB_NETWORK_FAILURE) { 
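/*
 * Network-failure bookkeeping for the disk CX session: bump the failure
 * count, remember the latest error code and timestamp, and pin the
 * first-failure timestamp only on the initial occurrence.
 */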
disk_cx_sess->dcs_nr_nw_failures++; disk_cx_sess->dcs_last_nw_failure_error_code = cdf_stats->ullErrorCode; get_time_stamp(&(disk_cx_sess->dcs_last_nw_failure_ts)); if (!disk_cx_sess->dcs_first_nw_failure_ts) disk_cx_sess->dcs_first_nw_failure_ts = disk_cx_sess->dcs_last_nw_failure_ts; } } } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); put_tgt_ctxt(ctxt); out: if (cdf_stats) INM_KFREE(cdf_stats, sizeof(COMMIT_DB_FAILURE_STATS), INM_KERNEL_HEAP); return error; } inm_s32_t validate_output_disk_buffer(void *device_list_arg, inm_u32_t num_protected_disks, inm_u32_t num_output_disks, inm_u32_t *num_output_disks_occupied, inm_list_head_t *disk_cx_stats_list) { inm_s32_t error = 0; disk_cx_stats_info_t *disk_cx_stats_info = NULL; DEVICE_CXFAILURE_STATS *dev_cx_stats; VOLUME_GUID *guid = NULL; target_context_t *tgt_ctxt = NULL; inm_list_head_t *ptr; inm_list_head_t *disk_cx_stats_ptr; int idx; int found; int num_out_disks = 0; for (idx = 1; idx <= num_output_disks; idx++) { disk_cx_stats_info = INM_KMALLOC(sizeof(disk_cx_stats_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!disk_cx_stats_info) { err("Failed to allocate disk_cx_stats_info_t"); error = INM_ENOMEM; goto out; } INM_MEM_ZERO(disk_cx_stats_info, sizeof(disk_cx_stats_info_t)); inm_list_add_tail(&disk_cx_stats_info->dcsi_list, disk_cx_stats_list); } guid = INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!guid) { err("Failed to allocate VOLUME_GUID"); error = INM_ENOMEM; goto out; } while (num_protected_disks) { if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)device_list_arg, sizeof(VOLUME_GUID))) { err( "Access Violation for VOLUME_GUID"); error = INM_EFAULT; goto out; } if (INM_COPYIN(guid, device_list_arg, sizeof(VOLUME_GUID))) { err("copyin failed for VOLUME_GUID"); error = INM_EFAULT; goto out; } disk_cx_stats_info = inm_list_entry(disk_cx_stats_list->prev, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); disk_cx_stats_info->dcsi_valid = 1; dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; memcpy_s(&dev_cx_stats->DeviceId, sizeof(VOLUME_GUID), guid, sizeof(VOLUME_GUID)); dev_cx_stats->ullFlags |= DISK_CXSTATUS_DISK_NOT_FILTERED; inm_list_add(&disk_cx_stats_info->dcsi_list, disk_cx_stats_list); num_out_disks++; device_list_arg += sizeof(VOLUME_GUID); num_protected_disks--; } INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; found = 0; for (disk_cx_stats_ptr = disk_cx_stats_list->next; disk_cx_stats_ptr != disk_cx_stats_list; disk_cx_stats_ptr = disk_cx_stats_ptr->next) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_ptr, disk_cx_stats_info_t, dcsi_list); dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; if (!disk_cx_stats_info->dcsi_valid) break; if (!strcmp(tgt_ctxt->tc_pname, dev_cx_stats->DeviceId.volume_guid)) { dev_cx_stats->ullFlags &= ~DISK_CXSTATUS_DISK_NOT_FILTERED; found = 1; break; } } if (found) continue; if (num_out_disks == num_output_disks) { INM_UP_READ(&(driver_ctx->tgt_list_sem)); error = INM_EAGAIN; goto out; } disk_cx_stats_info = inm_list_entry(disk_cx_stats_list->prev, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); disk_cx_stats_info->dcsi_valid = 1; dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; 
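/*
 * No pre-populated entry matched this protected target, so a free slot
 * was pulled from the tail of the list; it is marked valid and stamped
 * with the target's persistent name below before being linked back in.
 */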
		strcpy_s(dev_cx_stats->DeviceId.volume_guid,
				GUID_SIZE_IN_CHARS, tgt_ctxt->tc_pname);
		inm_list_add(&disk_cx_stats_info->dcsi_list,
						disk_cx_stats_list);
		num_out_disks++;
	}
	INM_UP_READ(&(driver_ctx->tgt_list_sem));

	*num_output_disks_occupied = num_out_disks;

out:
	if (guid)
		INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP);

	return error;
}

inm_s32_t process_get_cxstatus_notify_ioctl(inm_devhandle_t *handle,
								void *arg)
{
	GET_CXFAILURE_NOTIFY *get_cx_notify = NULL;
	VM_CXFAILURE_STATS *vm_cx_stats = NULL;
	void *device_list_arg;
	void *disk_cx_stats_arg;
	inm_list_head_t disk_cx_stats_list;
	inm_list_head_t *ptr;
	inm_list_head_t *disk_cx_stats_ptr;
	inm_s32_t error = 0;
	vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session;
	disk_cx_session_t *disk_cx_sess;
	inm_u32_t num_output_disks;
	inm_u32_t num_output_disks_occupied = 0;
	disk_cx_stats_info_t *disk_cx_stats_info = NULL;
	DEVICE_CXFAILURE_STATS *dev_cx_stats;
	inm_u64_t flags;
	target_context_t *tgt_ctxt;
	int found;
	int ret;

	INM_INIT_LIST_HEAD(&disk_cx_stats_list);

	if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg,
					sizeof(GET_CXFAILURE_NOTIFY))) {
		err("Access Violation for GET_CXFAILURE_NOTIFY");
		error = INM_EFAULT;
		goto out;
	}

	get_cx_notify = INM_KMALLOC((sizeof(GET_CXFAILURE_NOTIFY) -
					sizeof(VOLUME_GUID)), INM_KM_SLEEP,
					INM_KERNEL_HEAP);
	if (!get_cx_notify) {
		err("failed to allocate GET_CXFAILURE_NOTIFY");
		error = INM_ENOMEM;
		goto out;
	}

	if (INM_COPYIN(get_cx_notify, arg, (sizeof(GET_CXFAILURE_NOTIFY) -
						sizeof(VOLUME_GUID)))) {
		err("copyin failed for GET_CXFAILURE_NOTIFY");
		error = INM_EFAULT;
		goto out;
	}

	if (!get_cx_notify->ulNumberOfProtectedDisks) {
		err("GET_CXFAILURE_NOTIFY: Number of protected disks can't be zero");
		error = INM_EINVAL;
		goto out;
	}

	if (!get_cx_notify->ulNumberOfOutputDisks) {
		err("GET_CXFAILURE_NOTIFY: Number of output disks can't be zero");
		error = INM_EINVAL;
		goto out;
	}

	if (get_cx_notify->ulNumberOfOutputDisks <
				get_cx_notify->ulNumberOfProtectedDisks) {
		err("GET_CXFAILURE_NOTIFY: Number of output disks can't be less than"
			" number of protected disks");
		error = INM_EINVAL;
		goto out;
	}

	vm_cx_stats = INM_KMALLOC((sizeof(VM_CXFAILURE_STATS) -
				sizeof(DEVICE_CXFAILURE_STATS)),
				INM_KM_SLEEP, INM_KERNEL_HEAP);
	if (!vm_cx_stats) {
		err("failed to allocate VM_CXFAILURE_STATS");
		error = INM_ENOMEM;
		goto out;
	}

	INM_MEM_ZERO(vm_cx_stats, sizeof(VM_CXFAILURE_STATS) -
					sizeof(DEVICE_CXFAILURE_STATS));

	device_list_arg = arg + sizeof(GET_CXFAILURE_NOTIFY) -
						sizeof(VOLUME_GUID);
	num_output_disks = get_cx_notify->ulNumberOfOutputDisks;
	error = validate_output_disk_buffer(device_list_arg,
				get_cx_notify->ulNumberOfProtectedDisks,
				get_cx_notify->ulNumberOfOutputDisks,
				&num_output_disks_occupied,
				&disk_cx_stats_list);
	if (error)
		goto out;

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock,
				driver_ctx->dc_vm_cx_session_lock_flag);
	inm_list_replace_init(&disk_cx_stats_list,
				&driver_ctx->dc_disk_cx_stats_list);
	driver_ctx->dc_num_disk_cx_stats = num_output_disks_occupied;
	driver_ctx->dc_num_consecutive_tags_failed =
				get_cx_notify->ullMinConsecutiveTagFailures;
	driver_ctx->dc_disk_level_supported_churn =
		get_cx_notify->ullMaxDiskChurnSupportedMBps <<
						MEGABYTE_BIT_SHIFT;
	driver_ctx->dc_vm_level_supported_churn =
		get_cx_notify->ullMaxVMChurnSupportedMBps <<
						MEGABYTE_BIT_SHIFT;
	driver_ctx->dc_max_fwd_timejump_ms =
		get_cx_notify->ullMaximumTimeJumpFwdAcceptableInMs;
	driver_ctx->dc_max_bwd_timejump_ms =
		get_cx_notify->ullMaximumTimeJumpBwdAcceptableInMs;

	if (get_cx_notify->ullFlags & CXSTATUS_COMMIT_PREV_SESSION &&
		get_cx_notify->ulTransactionID &&
get_cx_notify->ulTransactionID == vm_cx_sess->vcs_transaction_id) { vm_cx_sess->vcs_transaction_id = 0; if (!(vm_cx_sess->vcs_flags & VCS_CX_PRODUCT_ISSUE) && (get_cx_notify->ullMinConsecutiveTagFailures <= vm_cx_sess->vcs_num_consecutive_tag_failures)) vm_cx_sess->vcs_num_consecutive_tag_failures = 0; vm_cx_sess->vcs_timejump_ts = 0; vm_cx_sess->vcs_flags &= ~(VCS_CX_TIME_JUMP_FWD | VCS_CX_TIME_JUMP_BWD); } while(1) { if (driver_ctx->dc_wokeup_monitor_thread) { driver_ctx->dc_wokeup_monitor_thread = 0; break; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); ret = inm_wait_event_interruptible_timeout( driver_ctx->dc_vm_cx_session_waitq, should_wakeup_monitor_thread(vm_cx_sess, get_cx_notify), 60 * INM_HZ); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (ret || should_wakeup_monitor_thread(vm_cx_sess, get_cx_notify)) { driver_ctx->dc_wokeup_monitor_thread = 0; break; } } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); arg += sizeof(GET_CXFAILURE_NOTIFY) + ((get_cx_notify->ulNumberOfProtectedDisks - 1) * sizeof(VOLUME_GUID)); disk_cx_stats_arg = arg + sizeof(VM_CXFAILURE_STATS) - sizeof(DEVICE_CXFAILURE_STATS); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); inm_list_replace_init(&driver_ctx->dc_disk_cx_stats_list, &disk_cx_stats_list); num_output_disks_occupied = driver_ctx->dc_num_disk_cx_stats; if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) || (vm_cx_sess->vcs_flags & VCS_CX_PRODUCT_ISSUE) || (get_cx_notify->ullMinConsecutiveTagFailures > vm_cx_sess->vcs_num_consecutive_tag_failures)) goto update_timejump; for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; disk_cx_sess = &tgt_ctxt->tc_disk_cx_session; found = 0; for (disk_cx_stats_ptr = disk_cx_stats_list.next; disk_cx_stats_ptr != &disk_cx_stats_list; disk_cx_stats_ptr = disk_cx_stats_ptr->next) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_ptr, disk_cx_stats_info_t, dcsi_list); if (!disk_cx_stats_info->dcsi_valid) continue; dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; if (!strncmp(tgt_ctxt->tc_pname, dev_cx_stats->DeviceId.volume_guid, GUID_SIZE_IN_CHARS)) { found = 1; dev_cx_stats->ullFlags &= ~DISK_CXSTATUS_DISK_NOT_FILTERED; break; } } if (found) goto update_disk_cx_session; if (num_output_disks_occupied == num_output_disks) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); INM_UP_READ(&(driver_ctx->tgt_list_sem)); error = INM_EAGAIN; goto out; } disk_cx_stats_info = inm_list_entry(disk_cx_stats_list.prev, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); disk_cx_stats_info->dcsi_valid = 1; dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; strcpy_s(dev_cx_stats->DeviceId.volume_guid, GUID_SIZE_IN_CHARS, tgt_ctxt->tc_pname); inm_list_add(&disk_cx_stats_info->dcsi_list, &disk_cx_stats_list); num_output_disks_occupied++; update_disk_cx_session: disk_cx_sess->dcs_disk_cx_stats_info = disk_cx_stats_info; flags = 0; if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) || !(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) continue; if (disk_cx_sess->dcs_nr_nw_failures) flags |= 
DISK_CXSTATUS_NWFAILURE_FLAG; else if (disk_cx_sess->dcs_max_peak_churn) flags |= DISK_CXSTATUS_PEAKCHURN_FLAG; else if (disk_cx_sess->dcs_tracked_bytes > disk_cx_sess->dcs_drained_bytes) { flags |= DISK_CXSTATUS_CHURNTHROUGHPUT_FLAG; dev_cx_stats->ullDiffChurnThroughputInBytes = (disk_cx_sess->dcs_tracked_bytes - disk_cx_sess->dcs_drained_bytes); } dev_cx_stats->ullFlags |= flags; dev_cx_stats->firstNwFailureTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_first_nw_failure_ts); dev_cx_stats->lastNwFailureTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_last_nw_failure_ts); dev_cx_stats->ullTotalNWErrors = disk_cx_sess->dcs_nr_nw_failures; dev_cx_stats->ullLastNWErrorCode = disk_cx_sess->dcs_last_nw_failure_error_code; dev_cx_stats->firstPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_first_peak_churn_ts); dev_cx_stats->lastPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_last_peak_churn_ts); dev_cx_stats->ullTotalExcessChurnInBytes = disk_cx_sess->dcs_excess_churn; memcpy_s(dev_cx_stats->ChurnBucketsMBps, sizeof(dev_cx_stats->ChurnBucketsMBps), disk_cx_sess->dcs_churn_buckets, sizeof(disk_cx_sess->dcs_churn_buckets)); dev_cx_stats->ullMaximumPeakChurnInBytes = disk_cx_sess->dcs_max_peak_churn; dev_cx_stats->ullMaxS2LatencyInMS = (disk_cx_sess->dcs_max_s2_latency / 10000ULL); dev_cx_stats->CxStartTS = disk_cx_sess->dcs_start_ts; dev_cx_stats->CxEndTS = vm_cx_sess->vcs_end_ts; } /* Update VM CX session */ flags = 0; if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) goto update_timejump; if (vm_cx_sess->vcs_max_peak_churn) flags |= VM_CXSTATUS_PEAKCHURN_FLAG; else if (vm_cx_sess->vcs_tracked_bytes - vm_cx_sess->vcs_drained_bytes) { flags |= VM_CXSTATUS_CHURNTHROUGHPUT_FLAG; vm_cx_stats->ullDiffChurnThroughputInBytes = (vm_cx_sess->vcs_tracked_bytes - vm_cx_sess->vcs_drained_bytes); } vm_cx_stats->ullFlags |= flags; vm_cx_stats->firstPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_first_peak_churn_ts); vm_cx_stats->lastPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_last_peak_churn_ts); vm_cx_stats->ullTotalExcessChurnInBytes = vm_cx_sess->vcs_excess_churn; memcpy_s(vm_cx_stats->ChurnBucketsMBps, sizeof(vm_cx_stats->ChurnBucketsMBps), vm_cx_sess->vcs_churn_buckets, sizeof(vm_cx_sess->vcs_churn_buckets)); vm_cx_stats->ullMaximumPeakChurnInBytes = vm_cx_sess->vcs_max_peak_churn; vm_cx_stats->ullMaxS2LatencyInMS = (vm_cx_sess->vcs_max_s2_latency / 10000ULL); vm_cx_stats->CxStartTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_start_ts); vm_cx_stats->CxEndTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_end_ts); if (!vm_cx_sess->vcs_transaction_id) vm_cx_sess->vcs_transaction_id = ++driver_ctx->dc_transaction_id; vm_cx_stats->ulTransactionID = vm_cx_sess->vcs_transaction_id; vm_cx_stats->ullNumOfConsecutiveTagFailures = vm_cx_sess->vcs_num_consecutive_tag_failures; vm_cx_stats->ullNumDisks = num_output_disks_occupied; update_timejump: if (!vm_cx_sess->vcs_transaction_id) vm_cx_sess->vcs_transaction_id = ++driver_ctx->dc_transaction_id; vm_cx_stats->ulTransactionID = vm_cx_sess->vcs_transaction_id; vm_cx_stats->TimeJumpTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_timejump_ts); vm_cx_stats->ullTimeJumpInMS = vm_cx_sess->vcs_max_jump_ms; if (vm_cx_sess->vcs_flags & VCS_CX_TIME_JUMP_FWD) vm_cx_stats->ullFlags |= VM_CXSTATUS_TIMEJUMP_FWD_FLAG; if (vm_cx_sess->vcs_flags & VCS_CX_TIME_JUMP_BWD) vm_cx_stats->ullFlags |= 
VM_CXSTATUS_TIMEJUMP_BCKWD_FLAG; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); INM_UP_READ(&(driver_ctx->tgt_list_sem)); while (!inm_list_empty(&disk_cx_stats_list)) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_list.next, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); if (!INM_ACCESS_OK(VERIFY_WRITE, (void __user*)disk_cx_stats_arg, sizeof(DEVICE_CXFAILURE_STATS))) { err("Access Violation for DEVICE_CXFAILURE_STATS"); INM_KFREE(disk_cx_stats_info, sizeof(disk_cx_stats_info_t), INM_KERNEL_HEAP); error = INM_EFAULT; goto out; } if (INM_COPYOUT(disk_cx_stats_arg, &disk_cx_stats_info->dcsi_dev_cx_stats, sizeof(DEVICE_CXFAILURE_STATS))) { err("copyout failed for DEVICE_CXFAILURE_STATS"); INM_KFREE(disk_cx_stats_info, sizeof(disk_cx_stats_info_t), INM_KERNEL_HEAP); error = INM_EFAULT; goto out; } INM_KFREE(disk_cx_stats_info, sizeof(disk_cx_stats_info_t), INM_KERNEL_HEAP); disk_cx_stats_arg += sizeof(DEVICE_CXFAILURE_STATS); } if (!INM_ACCESS_OK(VERIFY_WRITE, (void __user*)arg, sizeof(VM_CXFAILURE_STATS))) { err( "Access Violation for VM_CXFAILURE_STATS"); error = INM_EFAULT; goto out; } if (INM_COPYOUT(arg, vm_cx_stats, (sizeof(VM_CXFAILURE_STATS) - sizeof(DEVICE_CXFAILURE_STATS)))) { err("copyout failed for VM_CXFAILURE_STATS"); error = INM_EFAULT; goto out; } out: while (!inm_list_empty(&disk_cx_stats_list)) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_list.next, disk_cx_stats_info_t, dcsi_list); inm_list_del(&disk_cx_stats_info->dcsi_list); INM_KFREE(disk_cx_stats_info, sizeof(disk_cx_stats_info_t), INM_KERNEL_HEAP); } if (get_cx_notify) INM_KFREE(get_cx_notify, (sizeof(GET_CXFAILURE_NOTIFY) - sizeof(VOLUME_GUID)), INM_KERNEL_HEAP); if (vm_cx_stats) INM_KFREE(vm_cx_stats, (sizeof(VM_CXFAILURE_STATS) - sizeof(DEVICE_CXFAILURE_STATS)), INM_KERNEL_HEAP); return error; } inm_s32_t process_wakeup_get_cxstatus_notify_ioctl(inm_devhandle_t *handle, void *arg) { INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); wake_up_interruptible(&driver_ctx->dc_vm_cx_session_waitq); driver_ctx->dc_wokeup_monitor_thread = 1; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); return 0; } inm_s32_t process_tag_drain_notify_ioctl(inm_devhandle_t *handle, void *arg) { TAG_COMMIT_NOTIFY_INPUT *tag_drain_notify_input = NULL; TAG_COMMIT_NOTIFY_OUTPUT *tag_drain_notify_output = NULL; VOLUME_GUID *guid = NULL; TAG_COMMIT_STATUS *tag_commit_status; int out_size = 0; int idx; void *device_list_arg; vm_cx_session_t *vm_cx_sess = &driver_ctx->dc_vm_cx_session; inm_list_head_t *ptr; target_context_t *tgt_ctxt; disk_cx_session_t *disk_cx_sess; inm_list_head_t *disk_cx_stats_ptr; disk_cx_stats_info_t *disk_cx_stats_info; DEVICE_CXFAILURE_STATS *dev_cx_stats; VM_CXFAILURE_STATS *vm_cx_stats; inm_u64_t flags; inm_s32_t error = 0; int found; static int tag_drain_notify_thread_in_progress = 0; info("Tag drain notify thread arrived"); if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(TAG_COMMIT_NOTIFY_INPUT))) { err( "Access Violation for TAG_DRAIN_INPUT"); error = INM_EFAULT; goto out; } tag_drain_notify_input = INM_KMALLOC((sizeof(TAG_COMMIT_NOTIFY_INPUT) - sizeof(VOLUME_GUID)), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!tag_drain_notify_input) { err("failed to allocate TAG_COMMIT_NOTIFY_INPUT"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(tag_drain_notify_input, arg, (sizeof(TAG_COMMIT_NOTIFY_INPUT) - 
sizeof(VOLUME_GUID)))) { err("copyin failed for TAG_COMMIT_NOTIFY_INPUT"); error = INM_EFAULT; goto out; } if (!tag_drain_notify_input->ulNumDisks) { err("TAG_COMMIT_NOTIFY_INPUT: Number of protected disks from user can't be zero"); error = INM_EFAULT; goto out; } driver_ctx->dc_tag_commit_notify_flag = tag_drain_notify_input->ulFlags; out_size = sizeof(TAG_COMMIT_NOTIFY_OUTPUT) - sizeof(TAG_COMMIT_STATUS) + tag_drain_notify_input->ulNumDisks * sizeof(TAG_COMMIT_STATUS); tag_drain_notify_output = INM_KMALLOC(out_size, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!tag_drain_notify_output) { err("failed to allocate TAG_COMMIT_NOTIFY_OUTPUT"); error = INM_ENOMEM; goto out; } INM_MEM_ZERO(tag_drain_notify_output, out_size); memcpy_s(tag_drain_notify_output->TagGUID, GUID_LEN, tag_drain_notify_input->TagGUID, GUID_LEN); tag_drain_notify_output->ulNumDisks = tag_drain_notify_input->ulNumDisks; device_list_arg = arg + sizeof(TAG_COMMIT_NOTIFY_INPUT) - sizeof(VOLUME_GUID); guid = INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!guid) { err("Failed to allocate VOLUME_GUID"); error = INM_ENOMEM; goto out; } tag_commit_status = tag_drain_notify_output->TagStatus; for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) { if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)device_list_arg, sizeof(VOLUME_GUID))) { err( "Access Violation for VOLUME_GUID"); error = INM_EFAULT; goto out; } if (INM_COPYIN(guid, device_list_arg, sizeof(VOLUME_GUID))) { err("copyin failed for VOLUME_GUID"); error = INM_EFAULT; goto out; } memcpy_s(&tag_commit_status[idx].DeviceId, sizeof(VOLUME_GUID), guid, sizeof(VOLUME_GUID)); info("input pname = %s", tag_commit_status[idx].DeviceId.volume_guid); tag_commit_status[idx].Status = DEVICE_STATUS_UNKNOWN; tag_commit_status[idx].TagStatus = TAG_STATUS_UNINITALIZED; device_list_arg += sizeof(VOLUME_GUID); } INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (tag_drain_notify_thread_in_progress) { INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); info("One thread is already in progress, so quitting"); error = INM_EBUSY; goto out; } tag_drain_notify_thread_in_progress = 1; driver_ctx->dc_tag_drain_notify_guid = tag_drain_notify_output->TagGUID; info("input tag guid = %.36s", driver_ctx->dc_tag_drain_notify_guid); INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); info("Number of disks = %llu and protected number of disks = %d", tag_drain_notify_input->ulNumDisks, driver_ctx->total_prot_volumes); if (tag_drain_notify_input->ulNumDisks > driver_ctx->total_prot_volumes) { INM_UP_READ(&(driver_ctx->tgt_list_sem)); err("TAG_COMMIT_NOTIFY_INPUT: Number of protected disks from user (%llu) can't" " be greater than the actual number protected disks (%d) at driver", tag_drain_notify_input->ulNumDisks, driver_ctx->total_prot_volumes); error = INM_EFAULT; goto out; } if (driver_ctx->dc_tag_commit_notify_flag & TAG_COMMIT_NOTIFY_BLOCK_DRAIN_FLAG) { int ret = 0; for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) { char *uuid = &tag_commit_status[idx].DeviceId.volume_guid[0]; tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid); if (!tgt_ctxt) { err("The disk %s is not protected", uuid); INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); tag_commit_status[idx].Status = DEVICE_STATUS_NOT_FOUND; INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); error = -ENODEV; ret = 1; break; } volume_lock(tgt_ctxt); if (tgt_ctxt->tc_flags & VCF_DRAIN_BLOCKED) { err("Draining is already blocked for uuid : %s\n", uuid); 
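				/* Publish the already-blocked state in the
				 * per-disk status entry; tag_commit_status
				 * updates are serialized by the
				 * dc_tag_commit_status lock.
				 */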
				INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status);
				tag_commit_status[idx].Status =
					DEVICE_STATUS_DRAIN_ALREADY_BLOCKED;
				INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status);
				error = INM_EEXIST;
				ret = 1;
			}
			volume_unlock(tgt_ctxt);
			put_tgt_ctxt(tgt_ctxt);
			if (ret) {
				break;
			}
		}
		if (ret) {
			INM_UP_READ(&(driver_ctx->tgt_list_sem));
			goto update_tag_drain_notify_output;
		}
	}

	found = 1;
	for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) {
		char *uuid = &tag_commit_status[idx].DeviceId.volume_guid[0];

		tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid);
		if (!tgt_ctxt) {
			info("The disk %s is not protected", uuid);
			found = 0;
			INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status);
			tag_commit_status[idx].Status = DEVICE_STATUS_NOT_FOUND;
			tag_commit_status[idx].TagStatus = TAG_STATUS_UNKNOWN;
			INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status);
			continue;
		}

		INM_ATOMIC_INC(&driver_ctx->dc_nr_tag_commit_status_pending_disks);
		INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status);
		tag_commit_status[idx].Status = DEVICE_STATUS_SUCCESS;
		tag_commit_status[idx].TagStatus = TAG_STATUS_UNINITALIZED;
		tgt_ctxt->tc_tag_commit_status = &tag_commit_status[idx];
		INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status);
		put_tgt_ctxt(tgt_ctxt);
	}
	INM_UP_READ(&(driver_ctx->tgt_list_sem));

	INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status);
	while (found) {
		int ret = 0;

		if (driver_ctx->dc_wokeup_tag_drain_notify_thread) {
			driver_ctx->dc_wokeup_tag_drain_notify_thread = 0;
			error = INM_EINTR;
			break;
		}

		INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status);
		ret = inm_wait_event_interruptible_timeout(
			driver_ctx->dc_tag_commit_status_waitq,
			!INM_ATOMIC_READ(&driver_ctx->dc_nr_tag_commit_status_pending_disks) ||
			INM_ATOMIC_READ(&driver_ctx->dc_tag_commit_status_failed),
			60 * INM_HZ);
		INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status);
		if (ret ||
			!INM_ATOMIC_READ(&driver_ctx->dc_nr_tag_commit_status_pending_disks) ||
			INM_ATOMIC_READ(&driver_ctx->dc_tag_commit_status_failed)) {
			info("The tag drain notify waiting is over");
			break;
		}
	}
	INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status);

	INM_DOWN_READ(&(driver_ctx->tgt_list_sem));
	if (driver_ctx->dc_tag_commit_notify_flag &
					TAG_COMMIT_NOTIFY_BLOCK_DRAIN_FLAG) {
		int ret = 0;

		for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) {
			char *uuid = &tag_commit_status[idx].DeviceId.volume_guid[0];

			tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid);
			if (!tgt_ctxt) {
				err("The disk %s is not protected", uuid);
				error = -ENODEV;
				ret = 1;
				break;
			}
			volume_lock(tgt_ctxt);
			if (!(tgt_ctxt->tc_flags & VCF_DRAIN_BLOCKED)) {
				err("Drain block failed for uuid : %s\n", uuid);
				error = INM_EFAULT;
				ret = 1;
			}
			volume_unlock(tgt_ctxt);
			put_tgt_ctxt(tgt_ctxt);
			if (ret) {
				break;
			}
		}
		if (ret) {
unblock_drain:
			info("Unblocking drain for all disks\n");
			for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) {
				char *uuid = &tag_commit_status[idx].DeviceId.volume_guid[0];

				tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid);
				if (!tgt_ctxt) {
					err("The disk %s is not protected", uuid);
					error = -ENODEV;
					continue;
				}
				info("Unblocking drain for disk : %s\n", uuid);
				set_int_vol_attr(tgt_ctxt, VolumeDrainBlocked, 0);
				put_tgt_ctxt(tgt_ctxt);
			}
		} else {
			for (idx = 0; idx < tag_drain_notify_input->ulNumDisks; idx++) {
				char *uuid = &tag_commit_status[idx].DeviceId.volume_guid[0];

				tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid);
				if (!tgt_ctxt) {
					err("The disk %s is not protected", uuid);
					error = -ENODEV;
					goto unblock_drain;
				}
				info("Persist drain block for disk: %s\n", uuid);
				if (set_int_vol_attr(tgt_ctxt,
VolumeDrainBlocked, 1)) { err("Persist drain block failed for disk :%s\n", uuid); tag_commit_status[idx].Status = DEVICE_STATUS_DRAIN_BLOCK_FAILED; put_tgt_ctxt(tgt_ctxt); error = INM_EFAULT; goto unblock_drain; } put_tgt_ctxt(tgt_ctxt); } } } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED) || (vm_cx_sess->vcs_flags & VCS_CX_PRODUCT_ISSUE)) goto update_tag_drain_notify_output; for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { int found = 0; tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) continue; disk_cx_sess = &tgt_ctxt->tc_disk_cx_session; if (!(disk_cx_sess->dcs_flags & DCS_CX_SESSION_STARTED) || !(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) continue; for (disk_cx_stats_ptr = driver_ctx->dc_disk_cx_stats_list.next; disk_cx_stats_ptr != &driver_ctx->dc_disk_cx_stats_list; disk_cx_stats_ptr = disk_cx_stats_ptr->next) { disk_cx_stats_info = inm_list_entry(disk_cx_stats_ptr, disk_cx_stats_info_t, dcsi_list); if (!disk_cx_stats_info->dcsi_valid) continue; dev_cx_stats = &disk_cx_stats_info->dcsi_dev_cx_stats; if (!strncmp(tgt_ctxt->tc_pname, dev_cx_stats->DeviceId.volume_guid, GUID_SIZE_IN_CHARS)) { found = 1; break; } } if (!found) continue; INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (!tgt_ctxt->tc_tag_commit_status) { INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); continue; } dev_cx_stats = &tgt_ctxt->tc_tag_commit_status->DeviceCxStats; INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); flags = 0; if (disk_cx_sess->dcs_nr_nw_failures) flags |= DISK_CXSTATUS_NWFAILURE_FLAG; else if (disk_cx_sess->dcs_max_peak_churn) flags |= DISK_CXSTATUS_PEAKCHURN_FLAG; else if (disk_cx_sess->dcs_tracked_bytes > disk_cx_sess->dcs_drained_bytes) { flags |= DISK_CXSTATUS_CHURNTHROUGHPUT_FLAG; dev_cx_stats->ullDiffChurnThroughputInBytes = (disk_cx_sess->dcs_tracked_bytes - disk_cx_sess->dcs_drained_bytes); } dev_cx_stats->ullFlags |= flags; dev_cx_stats->firstNwFailureTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_first_nw_failure_ts); dev_cx_stats->lastNwFailureTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_last_nw_failure_ts); dev_cx_stats->ullTotalNWErrors = disk_cx_sess->dcs_nr_nw_failures; dev_cx_stats->ullLastNWErrorCode = disk_cx_sess->dcs_last_nw_failure_error_code; dev_cx_stats->firstPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_first_peak_churn_ts); dev_cx_stats->lastPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( disk_cx_sess->dcs_last_peak_churn_ts); dev_cx_stats->ullTotalExcessChurnInBytes = disk_cx_sess->dcs_excess_churn; memcpy_s(dev_cx_stats->ChurnBucketsMBps, sizeof(dev_cx_stats->ChurnBucketsMBps), disk_cx_sess->dcs_churn_buckets, sizeof(disk_cx_sess->dcs_churn_buckets)); dev_cx_stats->ullMaximumPeakChurnInBytes = disk_cx_sess->dcs_max_peak_churn; dev_cx_stats->ullMaxS2LatencyInMS = (disk_cx_sess->dcs_max_s2_latency / 10000ULL); dev_cx_stats->CxStartTS = disk_cx_sess->dcs_start_ts; dev_cx_stats->CxEndTS = vm_cx_sess->vcs_end_ts; } /* Update VM CX session */ flags = 0; if (!(vm_cx_sess->vcs_flags & VCS_CX_SESSION_ENDED)) goto update_tag_drain_notify_output; vm_cx_stats = &tag_drain_notify_output->vmCxStatus; if (vm_cx_sess->vcs_max_peak_churn) flags |= VM_CXSTATUS_PEAKCHURN_FLAG; else if (vm_cx_sess->vcs_tracked_bytes - vm_cx_sess->vcs_drained_bytes) { flags |= 
VM_CXSTATUS_CHURNTHROUGHPUT_FLAG; vm_cx_stats->ullDiffChurnThroughputInBytes = (vm_cx_sess->vcs_tracked_bytes - vm_cx_sess->vcs_drained_bytes); } vm_cx_stats->ullFlags |= flags; vm_cx_stats->firstPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_first_peak_churn_ts); vm_cx_stats->lastPeakChurnTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_last_peak_churn_ts); vm_cx_stats->ullTotalExcessChurnInBytes = vm_cx_sess->vcs_excess_churn; memcpy_s(vm_cx_stats->ChurnBucketsMBps, sizeof(vm_cx_stats->ChurnBucketsMBps), vm_cx_sess->vcs_churn_buckets, sizeof(vm_cx_sess->vcs_churn_buckets)); vm_cx_stats->ullMaximumPeakChurnInBytes = vm_cx_sess->vcs_max_peak_churn; vm_cx_stats->ullMaxS2LatencyInMS = (vm_cx_sess->vcs_max_s2_latency / 10000ULL); vm_cx_stats->CxStartTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_start_ts); vm_cx_stats->CxEndTS = TELEMETRY_FMT1601_TIMESTAMP_FROM_100NSEC( vm_cx_sess->vcs_end_ts); vm_cx_stats->ullNumOfConsecutiveTagFailures = 0; vm_cx_stats->ullNumDisks = 0; update_tag_drain_notify_output: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_vm_cx_session_lock, driver_ctx->dc_vm_cx_session_lock_flag); INM_UP_READ(&(driver_ctx->tgt_list_sem)); if (!INM_ACCESS_OK(VERIFY_WRITE, (void __user*)device_list_arg, out_size)) { err( "Access Violation for TAG_COMMIT_NOTIFY_OUTPUT"); error = INM_EFAULT; goto out; } if (INM_COPYOUT(device_list_arg, tag_drain_notify_output, out_size)) { err("copyout failed for TAG_COMMIT_NOTIFY_OUTPUT"); error = INM_EFAULT; goto out; } out: INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); driver_ctx->dc_tag_drain_notify_guid = NULL; tag_drain_notify_thread_in_progress = 0; driver_ctx->dc_tag_commit_notify_flag = 0; INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next, tgt_ctxt = NULL) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); tgt_ctxt->tc_tag_commit_status = NULL; INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); } INM_UP_READ(&(driver_ctx->tgt_list_sem)); INM_ATOMIC_SET(&driver_ctx->dc_nr_tag_commit_status_pending_disks, 0); INM_ATOMIC_SET(&driver_ctx->dc_tag_commit_status_failed, 0); if (guid) INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); if (tag_drain_notify_output) INM_KFREE(tag_drain_notify_output, out_size, INM_KERNEL_HEAP); if (tag_drain_notify_input) INM_KFREE(tag_drain_notify_input, (sizeof(TAG_COMMIT_NOTIFY_INPUT) - sizeof(VOLUME_GUID)), INM_KERNEL_HEAP); info("Tag drain notify thread is quitting with error = %d", error); return error; } inm_s32_t process_wakeup_tag_drain_notify_ioctl(inm_devhandle_t *handle, void *arg) { info("Waking up the tag drain notify thread"); INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); wake_up_interruptible(&driver_ctx->dc_tag_commit_status_waitq); driver_ctx->dc_wokeup_tag_drain_notify_thread = 1; INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); return 0; } inm_s32_t process_modify_persistent_device_name(inm_devhandle_t *handle, void *arg) { MODIFY_PERSISTENT_DEVICE_NAME_INPUT *modify_pname_input = NULL; target_context_t *tgt_ctxt = NULL; inm_s32_t error = 0; char *old_path = NULL, *new_path = NULL; if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(MODIFY_PERSISTENT_DEVICE_NAME_INPUT))) { err( "Access Violation for MODIFY_PERSISTENT_DEVICE_NAME"); error = INM_EFAULT; goto out; } modify_pname_input = INM_KMALLOC(sizeof(MODIFY_PERSISTENT_DEVICE_NAME_INPUT), 
INM_KM_SLEEP, INM_KERNEL_HEAP); if (!modify_pname_input) { err("failed to allocate MODIFY_PERSISTENT_DEVICE_NAME_INPUT"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(modify_pname_input, arg, sizeof(MODIFY_PERSISTENT_DEVICE_NAME_INPUT))) { err("copyin failed for MODIFY_PERSISTENT_DEVICE_NAME_INPUT"); error = INM_EFAULT; goto out; } info("Modifying persistent device name source disk : %s, old pname : %s, new pname : %s\n", modify_pname_input->DevName.volume_guid, modify_pname_input->OldPName.volume_guid, modify_pname_input->NewPName.volume_guid); tgt_ctxt = get_tgt_ctxt_from_uuid_nowait( modify_pname_input->DevName.volume_guid); if (!tgt_ctxt) { err("The disk %s is not protected", modify_pname_input->DevName.volume_guid); error = -ENODEV; goto out; } if (strncmp(tgt_ctxt->tc_pname, modify_pname_input->OldPName.volume_guid, GUID_SIZE_IN_CHARS)) { err("Device : %s, Expected pname : %s, Received pname :%s\n", modify_pname_input->DevName.volume_guid, tgt_ctxt->tc_pname, modify_pname_input->OldPName.volume_guid); error = -EINVAL; goto out; } old_path = (char *)INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!old_path){ err("Allocation of memory failed for old_path.\n"); error = INM_ENOMEM; goto out; } new_path = (char *)INM_KMALLOC(INM_PATH_MAX, INM_KM_SLEEP, INM_KERNEL_HEAP); if(!new_path){ err("Allocation of memory failed for new_path.\n"); error = INM_ENOMEM; goto out; } snprintf(old_path, INM_PATH_MAX, "%s", tgt_ctxt->tc_pname); snprintf(new_path, INM_PATH_MAX, "%s", modify_pname_input->NewPName.volume_guid); if (inm_is_upgrade_pname(old_path, new_path)) { error = INM_EFAULT; goto out; } snprintf(old_path, INM_PATH_MAX, "%s/%s%s%s", tgt_ctxt->tc_pname, LOG_FILE_NAME_PREFIX, tgt_ctxt->tc_pname, LOG_FILE_NAME_SUFFIX); snprintf(new_path, INM_PATH_MAX, "%s/%s%s%s", tgt_ctxt->tc_pname, LOG_FILE_NAME_PREFIX, modify_pname_input->NewPName.volume_guid, LOG_FILE_NAME_SUFFIX); if (inm_is_upgrade_pname(old_path, new_path)) { error = INM_EFAULT; goto out; } error = modify_persistent_device_name(tgt_ctxt, modify_pname_input->NewPName.volume_guid); out: if(tgt_ctxt) { put_tgt_ctxt(tgt_ctxt); } if (new_path) { INM_KFREE(new_path, INM_PATH_MAX, INM_KERNEL_HEAP); } if (old_path) { INM_KFREE(old_path, INM_PATH_MAX, INM_KERNEL_HEAP); } if (modify_pname_input) { INM_KFREE(modify_pname_input, sizeof(MODIFY_PERSISTENT_DEVICE_NAME_INPUT), INM_KERNEL_HEAP); } dbg("modify persistent device name is exiting with error = %d", error); return error; } inm_s32_t process_get_drain_state_ioctl(inm_devhandle_t *handle, void *arg) { GET_DISK_STATE_INPUT *drain_state_input = NULL; GET_DISK_STATE_OUTPUT *drain_state_output = NULL; VOLUME_GUID *guid = NULL; int out_size = 0; int idx; void *device_list_arg; target_context_t *tgt_ctxt; char *uuid; inm_s32_t error = 0; dbg("Get Drain state thread arrived"); if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(GET_DISK_STATE_INPUT))) { err("Access Violation for GET_DISK_STATE_INPUT"); error = INM_EFAULT; goto out; } drain_state_input = INM_KMALLOC((sizeof(GET_DISK_STATE_INPUT) - sizeof(VOLUME_GUID)), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!drain_state_input) { err("failed to allocate GET_DISK_STATE_INPUT"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(drain_state_input, arg, (sizeof(GET_DISK_STATE_INPUT) - sizeof(VOLUME_GUID)))) { err("copyin failed for GET_DISK_STATE_INPUT"); error = INM_EFAULT; goto out; } if (!drain_state_input->ulNumDisks) { err("GET_DISK_STATE_INPUT: Number of protected disks from user can't be zero"); error = INM_EFAULT; goto out; } out_size 
= sizeof(GET_DISK_STATE_OUTPUT) - sizeof(DISK_STATE) + drain_state_input->ulNumDisks * sizeof(DISK_STATE); drain_state_output = INM_KMALLOC(out_size, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!drain_state_output) { err("failed to allocate GET_DISK_STATE_OUTPUT"); error = INM_ENOMEM; goto out; } INM_MEM_ZERO(drain_state_output, out_size); drain_state_output->ulNumDisks = drain_state_input->ulNumDisks; device_list_arg = arg + sizeof(GET_DISK_STATE_INPUT) - sizeof(VOLUME_GUID); guid = INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!guid) { err("Failed to allocate VOLUME_GUID"); error = INM_ENOMEM; goto out; } INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for (idx = 0; idx < drain_state_input->ulNumDisks; idx++) { if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)device_list_arg, sizeof(VOLUME_GUID))) { err( "Access Violation for VOLUME_GUID"); error = INM_EFAULT; break; } if (INM_COPYIN(guid, device_list_arg, sizeof(VOLUME_GUID))) { err("copyin failed for VOLUME_GUID"); error = INM_EFAULT; break; } memcpy_s(&drain_state_output->diskState[idx].DeviceId, sizeof(VOLUME_GUID), guid, sizeof(VOLUME_GUID)); uuid = drain_state_output->diskState[idx].DeviceId.volume_guid; info("input pname : %s", uuid); tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid); if (!tgt_ctxt) { err("The disk %s is not protected", uuid); error = -ENODEV; break; } if (tgt_ctxt->tc_flags & VCF_DRAIN_BLOCKED) { info("Draining blocked for uuid : %s.\n", uuid); drain_state_output->diskState[idx].ulFlags |= DISK_STATE_DRAIN_BLOCKED; } else { info("Draining is not blocked for uuid : %s\n", uuid); drain_state_output->diskState[idx].ulFlags |= DISK_STATE_FILTERED; } put_tgt_ctxt(tgt_ctxt); device_list_arg += sizeof(VOLUME_GUID); } INM_UP_READ(&(driver_ctx->tgt_list_sem)); if (!INM_ACCESS_OK(VERIFY_WRITE, (void __user*)device_list_arg, out_size)) { err("Access Violation for GET_DISK_STATE_OUTPUT"); error = INM_EFAULT; goto out; } if (INM_COPYOUT(device_list_arg, drain_state_output, out_size)) { err("copyout failed for GET_DISK_STATE_OUTPUT"); error = INM_EFAULT; goto out; } out: if (guid) INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); if (drain_state_output) INM_KFREE(drain_state_output, out_size, INM_KERNEL_HEAP); if (drain_state_input) INM_KFREE(drain_state_input, (sizeof(GET_DISK_STATE_INPUT) - sizeof(VOLUME_GUID)), INM_KERNEL_HEAP); dbg("Get Drain state thread is quitting with error = %d", error); return error; } inm_s32_t process_set_drain_state_ioctl(inm_devhandle_t *handle, void *arg) { SET_DRAIN_STATE_INPUT *drain_state_input = NULL; SET_DRAIN_STATE_OUTPUT *drain_state_output = NULL; VOLUME_GUID *guid = NULL; int idx; int out_size = 0; void *device_list_arg; target_context_t *tgt_ctxt; char *uuid; inm_s32_t error = 0; int ret; dbg("Set Drain state thread arrived"); if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)arg, sizeof(SET_DRAIN_STATE_INPUT))) { err( "Access Violation for SET_DRAIN_STATE_INPUT"); error = INM_EFAULT; goto out; } drain_state_input = INM_KMALLOC((sizeof(SET_DRAIN_STATE_INPUT) - sizeof(VOLUME_GUID)), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!drain_state_input) { err("failed to allocate SET_DRAIN_STATE_INPUT"); error = INM_ENOMEM; goto out; } if (INM_COPYIN(drain_state_input, arg, (sizeof(SET_DRAIN_STATE_INPUT) - sizeof(VOLUME_GUID)))) { err("copyin failed for SET_DRAIN_STATE_INPUT"); error = INM_EFAULT; goto out; } if (!drain_state_input->ulNumDisks) { err("SET_DRAIN_STATE_INPUT: Number of protected disks from user can't be zero"); error = INM_EFAULT; goto out; } out_size = 
sizeof(SET_DRAIN_STATE_OUTPUT) - sizeof(DISK_STATE) + drain_state_input->ulNumDisks * sizeof(DISK_STATE); drain_state_output = INM_KMALLOC(out_size, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!drain_state_output) { err("failed to allocate SET_DRAIN_STATE_OUTPUT"); error = INM_ENOMEM; goto out; } INM_MEM_ZERO(drain_state_output, out_size); drain_state_output->ulNumDisks = drain_state_input->ulNumDisks; for (idx = 0; idx < drain_state_input->ulNumDisks; idx++) { drain_state_output->diskStatus[idx].Status = SET_DRAIN_STATUS_UNKNOWN; } device_list_arg = arg + sizeof(SET_DRAIN_STATE_INPUT) - sizeof(VOLUME_GUID); guid = INM_KMALLOC(sizeof(VOLUME_GUID), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!guid) { err("Failed to allocate VOLUME_GUID"); error = INM_ENOMEM; goto out; } INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); for (idx = 0; idx < drain_state_input->ulNumDisks; idx++) { if (!INM_ACCESS_OK(VERIFY_READ, (void __user*)device_list_arg, sizeof(VOLUME_GUID))) { err( "Access Violation for VOLUME_GUID"); error = INM_EFAULT; break; } if (INM_COPYIN(guid, device_list_arg, sizeof(VOLUME_GUID))) { err("copyin failed for VOLUME_GUID"); error = INM_EFAULT; break; } memcpy_s(&drain_state_output->diskStatus[idx].DeviceId, sizeof(VOLUME_GUID), guid, sizeof(VOLUME_GUID)); uuid = guid->volume_guid; info("input pname : %s", uuid); tgt_ctxt = get_tgt_ctxt_persisted_name_nowait_locked(uuid); if (!tgt_ctxt) { err("The disk %s is not protected", uuid); error = -ENODEV; drain_state_output->diskStatus[idx].Status = SET_DRAIN_STATUS_DEVICE_NOT_FOUND; break; } info("Unblocking drain for disk : %s\n", uuid); ret = set_int_vol_attr(tgt_ctxt, VolumeDrainBlocked, 0); if (ret) { err ("Unblocking drain failed for %s\n", uuid); error = INM_EFAULT; drain_state_output->diskStatus[idx].Status = SET_DRAIN_STATUS_PERSISTENCE_FAILED; drain_state_output->diskStatus[idx].ulInternalError = ret; } else { drain_state_output->diskStatus[idx].Status = SET_DRAIN_STATUS_SUCCESS; } put_tgt_ctxt(tgt_ctxt); device_list_arg += sizeof(VOLUME_GUID); } INM_UP_READ(&(driver_ctx->tgt_list_sem)); if (!INM_ACCESS_OK(VERIFY_WRITE, (void __user*)device_list_arg, out_size)) { err("Access Violation for GET_DISK_STATE_OUTPUT"); error = INM_EFAULT; goto out; } if (INM_COPYOUT(device_list_arg, drain_state_output, out_size)) { err("copyout failed for GET_DISK_STATE_OUTPUT"); error = INM_EFAULT; goto out; } out: if (guid) INM_KFREE(guid, sizeof(VOLUME_GUID), INM_KERNEL_HEAP); if (drain_state_input) INM_KFREE(drain_state_input, (sizeof(GET_DISK_STATE_INPUT) - sizeof(VOLUME_GUID)), INM_KERNEL_HEAP); dbg("Set Drain state thread is quitting with error = %d", error); return error; } involflt-0.1.0/src/osdep.c0000755000000000000000000025412014467303177014133 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "involflt-common.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "metadata-mode.h" #include "statechange.h" #include "db_routines.h" #include "filter_host.h" #include "filter_lun.h" #include "errlog.h" #include #include #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) #include #endif #include #if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) && \ LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35)) #include #endif #include #include "ioctl.h" #include "tunable_params.h" #include "telemetry-types.h" #include "telemetry.h" #include "last_chance_writes.h" #include "flt_bio.h" #include "distro.h" #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,9,0) || defined SLES12 || \ defined SLES15 #if LINUX_VERSION_CODE < KERNEL_VERSION(4,10,0) || defined SLES12 || \ defined SLES15 #include #endif #endif extern driver_context_t *driver_ctx; atomic_t inm_flt_memprint; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) static int inm_sd_open(struct block_device *bdev, fmode_t mode); #else static int inm_sd_open(struct inode *inode, struct file *filp); #endif static inm_s32_t inm_one_AT_cdb_send(inm_block_device_t *, unsigned char *, inm_u32_t, inm_s32_t, unsigned char *, inm_u32_t); static dc_at_vol_entry_t *alloc_dc_vol_entry(void); static void free_dc_vol_entry(dc_at_vol_entry_t *at_vol_entry); #ifdef IDEBUG_MIRROR_IO extern inm_s32_t inject_atio_err; extern inm_s32_t inject_ptio_err; extern inm_s32_t inject_vendorcdb_err; extern inm_s32_t clear_vol_entry_err; #endif #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) static inm_s32_t queue_request_scsi(inm_block_device_t *, unsigned char *, inm_u32_t , inm_s32_t , unsigned char *, inm_u32_t ); #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(4,11,0) static inm_s32_t process_sense_info(char *sense); #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 5, 0) struct task_struct *service_thread_task; #endif #ifdef INM_LINUX #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30) inm_super_block_t *freeze_bdev(inm_block_device_t *); void thaw_bdev(inm_block_device_t *, inm_super_block_t *); #endif #endif flt_timer_t cp_timer; void inm_fvol_list_thaw_on_timeout(wqentry_t *not_used); inm_s32_t iobarrier_issue_tag_all_volume(tag_info_t *tag_list, int nr_tags, int commit_pending, tag_telemetry_common_t *); inm_s32_t iobarrier_add_volume_tags(tag_volinfo_t *tag_volinfop, tag_info_t *tag_info_listp, int nr_tags, int commit_pending, tag_telemetry_common_t *); #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,20,0) inm_s64_t inm_current_kernel_time_secs(void) { inm_timespec ts; INM_GET_CURRENT_TIME(ts); return ts.tv_sec; } #endif inline int _inm_xm_mapin(struct _target_context *tgt_ctxt, void *wdatap, char **map_addr) { return 0; } void freeze_volumes(inm_s32_t vols, tag_volinfo_t *vol_list) { inm_s32_t num_vols = 0; while (num_vols < vols) { if (vol_list->ctxt) { fs_freeze_volume(vol_list->ctxt, &vol_list->head); } num_vols++; vol_list++; } } void unfreeze_volumes(inm_s32_t vols, tag_volinfo_t *vol_list) { inm_s32_t num_vols = 0; while (num_vols < vols) { if (vol_list->ctxt) { thaw_volume(vol_list->ctxt, &vol_list->head); } num_vols++; vol_list++; } } void lock_volumes(inm_s32_t vols, tag_volinfo_t *vol_list) { inm_s32_t num_vols = 0; while(num_vols < 
vols) { if(vol_list->ctxt) { INM_SPIN_LOCK_IRQSAVE(&vol_list->ctxt->tc_lock, vol_list->ctxt->tc_lock_flag); } num_vols++; vol_list++; } } void unlock_volumes(inm_s32_t vols, tag_volinfo_t *vol_list) { while(vols > 0) { vols--; if(vol_list[vols].ctxt) { INM_SPIN_UNLOCK_IRQRESTORE(&vol_list[vols].ctxt->tc_lock, vol_list[vols].ctxt->tc_lock_flag); } } } inm_s32_t is_rootfs_ro(void) { int retval = 0; inm_block_device_t *bdevp = NULL; /* check whether root file system is in read only mode */ bdevp = inm_open_by_devnum(driver_ctx->root_dev, FMODE_READ); if (!IS_ERR(bdevp)) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,11,0) inm_super_block_t *sbp = bdevp->bd_super; #else inm_super_block_t *sbp = get_super(bdevp); #endif if (sbp) { #define INM_FS_RDONLY 1 if (sbp->s_flags & INM_FS_RDONLY) { dbg("root is read only file system \n"); retval = 1; } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,11,0) drop_super(sbp); #endif } close_bdev(bdevp, FMODE_READ); } return retval; } inm_u64_t get_bmfile_granularity(target_context_t *vcptr) { char *buffer = NULL; inm_u32_t read = 0; void *hdl = NULL; inm_u64_t ret = 0; logheader_t *hdr = NULL; if (!vcptr || !vcptr->tc_bp || !vcptr->tc_bp->bitmap_file_name) { dbg("Unable to get bitmap granularity from bitmap file"); ret = 0; return ret; } if (!flt_open_file(vcptr->tc_bp->bitmap_file_name, O_RDONLY, &hdl)) { dbg("Unable to open bitmap granularity from bitmap file"); return ret; } buffer = (char *)INM_KMALLOC(INM_SECTOR_SIZE, INM_KM_SLEEP, INM_KERNEL_HEAP); if (!buffer) { dbg("Unable to allocate memory while getting bitmap file " "granularity from file"); goto close_return; } if (flt_read_file(hdl, buffer, 0, INM_SECTOR_SIZE, (inm_s32_t *) &read)) { hdr = (logheader_t*)buffer; ret = hdr->bitmap_granularity; } flt_close_file(hdl); if (buffer) { INM_KFREE(buffer, INM_SECTOR_SIZE, INM_KERNEL_HEAP); buffer=NULL; } dbg("Bitmap granularity : %llu",ret); return ret; close_return: flt_close_file(hdl); return ret; } inm_s32_t dev_validate(inm_dev_extinfo_t *dev_info, host_dev_ctx_t **hdcp) { inm_s32_t ret = 0; struct block_device *bdev; struct inm_list_head *ptr = NULL,*nextptr = NULL; mirror_vol_entry_t *vol_entry; host_dev_t *hdc_dev = NULL; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("dev_validate: entered"); } *hdcp = (host_dev_ctx_t *) INM_KMALLOC(sizeof(host_dev_ctx_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!(*hdcp)) return 1; INM_MEM_ZERO(*hdcp, sizeof(host_dev_ctx_t)); INM_INIT_LIST_HEAD(&((*hdcp)->hdc_dev_list_head)); INM_INIT_WAITQUEUE_HEAD(&((*hdcp)->resync_notify)); switch (dev_info->d_type) { case FILTER_DEV_HOST_VOLUME: bdev = open_by_dev_path(dev_info->d_guid, 0); /* open by device path */ if (bdev) { hdc_dev = (host_dev_t*)INM_KMALLOC(sizeof(host_dev_t), INM_KM_SLEEP, INM_KERNEL_HEAP); INM_MEM_ZERO(hdc_dev, sizeof(host_dev_t)); if (hdc_dev) { hdc_dev->hdc_dev = bdev->bd_inode->i_rdev; hdc_dev->hdc_disk_ptr = bdev->bd_disk; inm_list_add_tail(&hdc_dev->hdc_dev_list, &((*hdcp)->hdc_dev_list_head)); } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,11,0) if (bdev->bd_part) { (*hdcp)->hdc_start_sect = bdev->bd_part->start_sect; (*hdcp)->hdc_actual_end_sect = ((bdev->bd_part->start_sect + bdev->bd_part->nr_sects) - 1); } else { (*hdcp)->hdc_start_sect = 0; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) (*hdcp)->hdc_end_sect = bdev->bd_disk->part0.nr_sects; #else (*hdcp)->hdc_actual_end_sect = bdev->bd_disk->capacity - 1; #endif } #else (*hdcp)->hdc_start_sect = get_start_sect(bdev); (*hdcp)->hdc_actual_end_sect = (*hdcp)->hdc_start_sect + 
get_capacity(bdev->bd_disk) - 1; #endif close_bdev(bdev, FMODE_READ); return ret; } ret = INM_EINVAL; err("dev_validate: Failed to open the device by path"); inm_free_host_dev_ctx(*hdcp); break; case FILTER_DEV_MIRROR_SETUP: bdev = open_by_dev_path(dev_info->d_guid, 0); /* open by device path */ if (bdev) { inm_list_for_each_safe(ptr, nextptr, dev_info->src_list) { vol_entry = inm_list_entry(ptr, mirror_vol_entry_t, next); if (vol_entry->mirror_dev) { if (!(*hdcp)->hdc_end_sect) { #if LINUX_VERSION_CODE < KERNEL_VERSION(5,11,0) if (bdev->bd_part) { (*hdcp)->hdc_start_sect = vol_entry->mirror_dev->bd_part->start_sect; (*hdcp)->hdc_actual_end_sect = ((vol_entry->mirror_dev->bd_part->start_sect + vol_entry->mirror_dev->bd_part->nr_sects) - 1); } else { (*hdcp)->hdc_start_sect = 0; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) (*hdcp)->hdc_actual_end_sect = vol_entry->mirror_dev->bd_disk->part0.nr_sects; #else (*hdcp)->hdc_actual_end_sect = vol_entry->mirror_dev->bd_disk->capacity - 1; #endif } #else (*hdcp)->hdc_start_sect = get_start_sect(vol_entry->mirror_dev); (*hdcp)->hdc_end_sect = get_capacity(vol_entry->mirror_dev->bd_disk); #endif } hdc_dev = (host_dev_t*)INM_KMALLOC(sizeof(host_dev_t), INM_KM_SLEEP, INM_KERNEL_HEAP); INM_MEM_ZERO(hdc_dev, sizeof(host_dev_t)); if (hdc_dev) { hdc_dev->hdc_dev = vol_entry->mirror_dev->bd_inode->i_rdev; hdc_dev->hdc_disk_ptr = vol_entry->mirror_dev->bd_disk; inm_list_add_tail(&hdc_dev->hdc_dev_list, &((*hdcp)->hdc_dev_list_head)); } } else { ret = INM_EINVAL; err("dev_validate: Failed to open the device by path"); break; } } close_bdev(bdev, FMODE_READ); } else { ret = INM_EINVAL; err("dev_validate: Failed to open the device by path"); } if (!ret) { return ret; } err("dev_validate: Failed to open the device by path"); inm_free_host_dev_ctx(*hdcp); break; case FILTER_DEV_FABRIC_LUN: inm_free_host_dev_ctx(*hdcp); *hdcp = NULL; return ret; default: ret = INM_EINVAL; inm_free_host_dev_ctx(*hdcp); } return ret; } inm_s32_t flt_release(struct inode *inode, struct file *filp) { target_context_t *tgt_ctxt = NULL; if (driver_ctx->svagent_idhp == filp) { inm_svagent_exit(); } else if (driver_ctx->sentinal_idhp == filp) { inm_s2_exit(); } else { tgt_ctxt = (target_context_t *)filp->private_data; if (tgt_ctxt) { put_tgt_ctxt(tgt_ctxt); filp->private_data = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("\nthread %u (%s) killed\n", current->pid, current->comm); } } return 0; } target_context_t *get_tgt_ctxt_from_kobj(struct kobject *kobj) { struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; struct inm_list_head *hptr; host_dev_t *hdc_dev = NULL; retry: for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if(check_for_tc_state(tgt_ctxt, 0)){ tgt_ctxt = NULL; goto retry; } /* * This case in not valid for MIRROR setup as there could be multiple * protected device name to a LUN and one of the name is deleting then * it would be invalid to stop the mirroring on all other devices. * Even more disaster is that a stop filtering process will be done on * mirror device. 
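	 * Hence only FILTER_DEV_HOST_VOLUME contexts are matched against
	 * the disk kobject below.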
*/ if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME) { hdcp = (host_dev_ctx_t *) tgt_ctxt->tc_priv; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (kobj == hdc_dev->hdc_disk_kobj_ptr) { break; } hdc_dev = NULL; } if (hdc_dev) { get_tgt_ctxt(tgt_ctxt); break; } } tgt_ctxt = NULL; } return tgt_ctxt; } /* tgt_list_sem need to be held by the caller. */ target_context_t *get_tgt_ctxt_from_bio(struct bio *bio) { struct inm_list_head *ptr, *hptr; target_context_t *tgt_ctxt = NULL; host_dev_ctx_t *hdcp; host_dev_t *hdc_dev = NULL; sector_t end_sector; int found = 0; for(ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type == FILTER_DEV_HOST_VOLUME || tgt_ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP) { hdcp = (host_dev_ctx_t *) tgt_ctxt->tc_priv; __inm_list_for_each(hptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(hptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_disk_ptr == INM_BUF_DISK(bio)) break; hdc_dev = NULL; } if (hdc_dev && (hdc_dev->hdc_disk_ptr == INM_BUF_DISK(bio))) { end_sector = INM_BUF_SECTOR(bio) + ((INM_BUF_COUNT(bio) + 511) >> 9) - 1; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG_MIRROR_IO)){ info("get_tgt %s (start:%llu end:%llu) " "hdc_start:%llu hdc_end:%llu rw:%d bi_rw:%d", tgt_ctxt->tc_guid, (long long unsigned int)INM_BUF_SECTOR(bio), (long long unsigned int)end_sector, (long long unsigned int)hdcp->hdc_start_sect, (long long unsigned int)hdcp->hdc_end_sect, (int)(inm_bio_is_write(bio)), (int)(inm_bio_rw(bio))); } if ((INM_BUF_SECTOR(bio) >= hdcp->hdc_start_sect) && (end_sector <= hdcp->hdc_end_sect)) { found = 1; } if (tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)) { tgt_ctxt = NULL; if (found) break; else continue; } if (found) break; if (tgt_ctxt->tc_flags & VCF_FULL_DEV) { /* hdc_actual_end_sect contains the latest size of disk * extracted from gendisk. if the IO is beyond the latest * size, we assume the disk is resized and mark for resync. */ volume_lock(tgt_ctxt); if (end_sector > hdcp->hdc_actual_end_sect) { queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_INVALID_IO, -EINVAL); /* Update actual_end_sect to new size so no * further resyncs are queued */ hdcp->hdc_actual_end_sect = get_capacity(hdc_dev->hdc_disk_ptr) - 1; err("%s: Resize: Expected: %llu, New: %llu", tgt_ctxt->tc_guid, (inm_u64_t)hdcp->hdc_end_sect, (inm_u64_t)hdcp->hdc_actual_end_sect); } volume_unlock(tgt_ctxt); } else { if (((INM_BUF_SECTOR(bio) >= hdcp->hdc_start_sect) && (INM_BUF_SECTOR(bio) <= hdcp->hdc_end_sect) && (end_sector > hdcp->hdc_end_sect)) || /* Right Overlap */ ((INM_BUF_SECTOR(bio) < hdcp->hdc_start_sect) && (end_sector >= hdcp->hdc_start_sect) && (end_sector <= hdcp->hdc_end_sect)) ||/* left Overlap */ ((INM_BUF_SECTOR(bio) < hdcp->hdc_start_sect) && (end_sector > hdcp->hdc_end_sect)) || /* Super Set */ ((INM_BUF_SECTOR(bio) > hdcp->hdc_end_sect) && (INM_BUF_SECTOR(bio) <= hdcp->hdc_actual_end_sect))) { err("Unable to handle the spanning I/O across multiple " "partitions/volumes"); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_INVALID_IO, -EINVAL); } } } } tgt_ctxt = NULL; } return tgt_ctxt; } /* * Convert a device path to a dev_t. 
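 * Resolves the path via inm_path_lookup() and returns the device number
 * taken from the resolved inode; non-block-device paths fail with
 * -ENOTBLK. Illustrative use (the path below is only an example):
 *
 *	inm_dev_t dev;
 *
 *	if (!convert_path_to_dev("/dev/sdb", &dev))
 *		dbg("resolved dev_t %u", dev);
 *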
*/ inm_s32_t convert_path_to_dev(const char *path, inm_dev_t *dev) { inm_s32_t r = 0; inm_lookup_t nd; struct inode *inode = NULL; if ((r = inm_path_lookup(path, LOOKUP_FOLLOW, &nd))) return r; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) inode = nd.path.dentry->d_inode; #else inode = nd.dentry->d_inode; #endif if (!inode) { r = -ENOENT; goto out; } if (!S_ISBLK(inode->i_mode)) { r = -ENOTBLK; goto out; } *dev = inode->i_rdev; dbg("dev %s i_ino %lu i_rdev %u",path, inode->i_ino, inode->i_rdev); out: inm_path_release(&nd); return r; } inm_block_device_t * open_by_dev_path_v2(char *path, int mode) { inm_dev_t dev = 0; inm_block_device_t *bdev; if(!path) return NULL; if(convert_path_to_dev((const char *)path, &dev)) return NULL; bdev = inm_open_by_devnum(dev, mode); if(IS_ERR(bdev)) return NULL; return bdev; } /* returns bdev ptr using path */ inm_block_device_t * open_by_dev_path(char *path, int mode) { inm_dev_t dev = 0; inm_block_device_t *bdev; if(!path) return NULL; if(convert_path_to_dev((const char *)path, &dev)) return NULL; bdev = inm_open_by_devnum(dev, mode == 0 ? FMODE_READ : FMODE_WRITE); if(IS_ERR(bdev)) return NULL; return bdev; } /* Generic api to allocate pages. It is the responsibility of the caller to * acquire relevant locks to protect the head from getting corrupted due to * parallel access to the list. This function returns success even if it has * allocated one page and could not allocate more. It is the responsibility of * the callers to check for actual_nr_pages and release those if not sufficient. */ inm_s32_t alloc_data_pages(struct inm_list_head *head, inm_u32_t nr_pages, inm_u32_t *actual_nr_pages, inm_s32_t flags) { data_page_t *page = NULL; /* Do basic checks on the requested number of pages. */ *actual_nr_pages = 0; while (*actual_nr_pages < nr_pages ) { page = (data_page_t *)INM_KMALLOC(sizeof(*page), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!page) break; page->page = INM_ALLOC_MAPPABLE_PAGE(flags); if(!page->page) break; INM_SET_PAGE_RESERVED(page->page); inm_list_add_tail(&page->next, head); (*actual_nr_pages)++; } if((*actual_nr_pages) == 0) return 0; info("Data Mode Init: Allocated pages %d Page size %ld", *actual_nr_pages, INM_PAGESZ); return 1; } void free_data_pages(struct inm_list_head *head) { struct inm_list_head *ptr; data_page_t *entry; inm_s32_t num_pages = 0; if(head == NULL) return; for(ptr = head->next; ptr != head;) { entry = inm_list_entry(ptr, data_page_t, next); ptr = ptr->next; inm_list_del(&entry->next); INM_CLEAR_PAGE_RESERVED(entry->page); INM_FREE_MAPPABLE_PAGE(entry->page, INM_KERNEL_HEAP); INM_KFREE(entry, sizeof(data_page_t), INM_KERNEL_HEAP); num_pages++; } info("Data Mode Unint: Freed Data Pages: %d\n", num_pages); } void delete_data_pages(inm_u32_t num_pages) { struct inm_list_head *ptr,*hd,*nextptr; unsigned long lock_flag = 0; data_page_t *entry; if(!num_pages){ return; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); hd = &(driver_ctx->data_flt_ctx.data_pages_head); /* * Check to see if num_pages can be reclaimed from * dc's unreserve pages */ if (num_pages > driver_ctx->dc_cur_unres_pages) { INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); return; } inm_list_for_each_safe(ptr, nextptr, hd) { entry = inm_list_entry(ptr, data_page_t, next); inm_list_del(ptr); INM_CLEAR_PAGE_RESERVED(entry->page); __INM_FREE_PAGE(entry->page); INM_KFREE(entry, sizeof(data_page_t), INM_KERNEL_HEAP); driver_ctx->data_flt_ctx.pages_free--; driver_ctx->data_flt_ctx.pages_allocated--; 
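		/* The reclaimed page came out of the unreserved pool, so
		 * shrink the unreserved and alloc/free counters with it.
		 */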
driver_ctx->dc_cur_unres_pages--; driver_ctx->data_flt_ctx.dp_pages_alloc_free--; num_pages--; if (!num_pages) break; } if(driver_ctx->data_flt_ctx.dp_least_free_pgs > num_pages){ driver_ctx->data_flt_ctx.dp_least_free_pgs -= num_pages; } else { driver_ctx->data_flt_ctx.dp_least_free_pgs = 0; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); } inm_dev_t inm_dev_id_get(target_context_t *ctx) { host_dev_ctx_t *hdcp; inm_block_device_t *bdev; inm_dev_t devid = 0; host_dev_t *hdc_dev = NULL; switch(ctx->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: hdcp = ctx->tc_priv; /* * If the target_context is completely initialized use hdcp else * open the dev. */ if (hdcp) { INM_BUG_ON(!(&hdcp->hdc_dev_list_head)); hdc_dev = inm_list_entry(hdcp->hdc_dev_list_head.next, host_dev_t, hdc_dev_list); return hdc_dev->hdc_dev; } else { /* crash in debug build if target context is without devt */ INM_BUG_ON(1); bdev = open_by_dev_path(ctx->tc_guid, 0); if (bdev) { devid = bdev->bd_inode->i_rdev; close_bdev(bdev, FMODE_READ); return devid; } else { return -ENODEV; } } break; case FILTER_DEV_FABRIC_LUN: /* * since MAJOR/MINOR macros are used on the ret value, it does not seem * appropriate to return virtid of the LUN. */ return 0; break; default: break; } return 0; } inm_s32_t inm_get_mirror_dev(mirror_vol_entry_t *vol_entry) { inm_s32_t ret = 1; if(!vol_entry) goto out; if((vol_entry->vol_flags & INM_AT_LUN) && !find_dc_at_lun_entry(vol_entry->tc_mirror_guid)){ info("%s AT lun is not masked, failing the mirroring IOCTL", vol_entry->tc_mirror_guid); ret = -ENXIO; goto out; } vol_entry->mirror_dev = open_by_dev_path(vol_entry->tc_mirror_guid, 1); if (!vol_entry->mirror_dev || !vol_entry->mirror_dev->bd_disk) { err("Failed to open the volume:%s mirror_dev:%p", vol_entry->tc_mirror_guid, vol_entry->mirror_dev); vol_entry->mirror_dev = NULL; goto out; } ret = 0; out: return ret; } void inm_free_mirror_dev(mirror_vol_entry_t *vol_entry) { if (vol_entry->mirror_dev) { close_bdev(vol_entry->mirror_dev, FMODE_WRITE); vol_entry->mirror_dev = NULL; } } inm_dev_t inm_get_dev_t_from_path(const char *pathp) { inm_dev_t rdev = 0; inm_block_device_t *bdevp = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered path:%s", pathp); } bdevp = open_by_dev_path((char *)pathp, 0); if (bdevp) { rdev = bdevp->bd_inode->i_rdev; close_bdev(bdevp, FMODE_READ); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving path:%s rdev:%d", pathp,rdev); } return (rdev); } inm_u64_t inm_dev_size_get(target_context_t *ctx) { host_dev_ctx_t *hdcp; target_volume_ctx_t *tvcptr; switch(ctx->tc_dev_type) { case FILTER_DEV_HOST_VOLUME: case FILTER_DEV_MIRROR_SETUP: hdcp = ctx->tc_priv; return (inm_u64_t) (hdcp->hdc_volume_size); break; case FILTER_DEV_FABRIC_LUN: tvcptr = ctx->tc_priv; return (inm_u64_t) ((tvcptr->nblocks) * (tvcptr->bsize)); break; default: break; } return 0; } void inm_scst_unregister(target_context_t *tgt_ctxt) { target_volume_ctx_t *vtgtctx_ptr = tgt_ctxt->tc_priv; emd_unregister_virtual_device(vtgtctx_ptr->virt_id); vtgtctx_ptr->vcptr = NULL; } int inm_path_lookup_parent(const char *name, inm_lookup_t *nd) { #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35) return path_lookup(name, LOOKUP_PARENT, nd); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,2,0) return kern_path(name, LOOKUP_PARENT, &nd->path); #else return kern_path_parent(name, nd); #endif #endif } int inm_path_lookup(const char *name, unsigned int 
flags, inm_lookup_t *nd) { #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35) return path_lookup(name, flags | LOOKUP_FOLLOW, nd); #else return kern_path(name, flags | LOOKUP_FOLLOW, &nd->path); #endif } void inm_path_release(inm_lookup_t *nd) { #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,35) path_put(&nd->path); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) dput(nd->path.dentry); if (nd->path.mnt) { nd->path.mnt->mnt_expiry_mark = 0; mntput_no_expire(nd->path.mnt); } #else path_release(nd); #endif #endif } void replace_sd_open(void) { driver_ctx->dc_at_lun.dc_at_drv_info.mod_dev_ops.open = inm_sd_open; } #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) static int inm_sd_open(struct block_device *bdev, fmode_t mode) #else static int inm_sd_open(struct inode *inode, struct file *filp) #endif { inm_s32_t err = 0; struct gendisk *disk = NULL; struct scsi_device *sdp = NULL; if(is_AT_blocked()){ err = -EACCES; goto out; } INM_ATOMIC_INC(&(driver_ctx->dc_at_lun.dc_at_drv_info.nr_in_flight_ops)); #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) err = driver_ctx->dc_at_lun.dc_at_drv_info.orig_drv_open(bdev, mode); #else err = driver_ctx->dc_at_lun.dc_at_drv_info.orig_drv_open(inode, filp); #endif if(!err) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) disk = bdev->bd_disk; #else disk = inode->i_bdev->bd_disk; #endif /* For pseudo devices like emc powerpath may not populate * gendisk->driverfs_dev structure, so we can exclude such * devices from being checked */ if (disk && inm_get_parent_dev(disk)) { sdp = to_scsi_device(inm_get_parent_dev(disk)); dbg("InMage util to open AT VendorID:%s", (sdp->vendor)?(sdp->vendor):("NULL")); INM_BUG_ON(strncmp(sdp->vendor, "InMage ", strlen("InMage "))); } } INM_ATOMIC_DEC(&(driver_ctx->dc_at_lun.dc_at_drv_info.nr_in_flight_ops)); out: return err; } /* validate the file and return its type */ inm_s32_t validate_file(char *pathp, inm_s32_t *type) { inm_s32_t r = 0; inm_lookup_t nd; struct inode *inode = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered fname:%s", pathp); } if ((r = inm_path_lookup(pathp, LOOKUP_FOLLOW, &nd))) return r; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) inode = nd.path.dentry->d_inode; #else inode = nd.dentry->d_inode; #endif if (!inode) { r = -ENOENT; goto out; } *type = inode->i_mode; out: inm_path_release(&nd); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving fname:%s ret:%d", pathp, r); } return r; } void inm_rel_dev_resources(target_context_t *ctx, host_dev_ctx_t *hdcp) { struct inm_list_head *ptr = NULL; host_dev_t *hdc_dev = NULL; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("entered - releasing device resources"); } __inm_list_for_each(ptr, &hdcp->hdc_dev_list_head) { hdc_dev = inm_list_entry(ptr, host_dev_t, hdc_dev_list); if (hdc_dev->hdc_fops) unregister_disk_change_notification(ctx, hdc_dev); if (hdc_dev->hdc_req_q_ptr) put_qinfo(hdc_dev->hdc_req_q_ptr); hdc_dev->hdc_req_q_ptr = NULL; } if (hdcp->hdc_bio_info_pool) { INM_MEMPOOL_DESTROY(hdcp->hdc_bio_info_pool); hdcp->hdc_bio_info_pool = NULL; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("leaving - done releasing device resources"); } } #if defined(IDEBUG_MIRROR_IO) extern inm_s32_t inject_vendorcdb_err; #endif inm_s32_t inm_all_AT_cdb_send(target_context_t *tcp, unsigned char *cmd, inm_u32_t cmdlen, inm_s32_t rw, unsigned char *buf, inm_u32_t buflen, inm_u32_t flag) { mirror_vol_entry_t *vol_entry = NULL; mirror_vol_entry_t 
*prev_vol_entry = NULL; inm_block_device_t *bdev = NULL; struct inm_list_head *ptr, *nextptr; inm_s32_t error = 0; volume_lock(tcp); prev_vol_entry = tcp->tc_vol_entry; INM_REF_VOL_ENTRY(prev_vol_entry); volume_unlock(tcp); bdev = prev_vol_entry->mirror_dev; error = inm_one_AT_cdb_send(bdev, cmd, cmdlen, rw, buf, buflen); #if (defined(INJECT_ERR)) error = 1; #endif if (!error){ goto out; } volume_lock(tcp); restart: inm_list_for_each_safe(ptr, nextptr, &tcp->tc_dst_list) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); if (vol_entry->vol_error) { vol_entry = NULL; continue; } INM_REF_VOL_ENTRY(vol_entry); volume_unlock(tcp); bdev = vol_entry->mirror_dev; if(prev_vol_entry){ prev_vol_entry->vol_error = 1; INM_DEREF_VOL_ENTRY(prev_vol_entry, tcp); prev_vol_entry = NULL; } error = inm_one_AT_cdb_send(bdev, cmd, cmdlen, rw, buf, buflen); #if defined(IDEBUG_MIRROR_IO) if (inject_vendorcdb_err) { error = 1; inject_vendorcdb_err = 0; } #endif volume_lock(tcp); if (!error) { tcp->tc_vol_entry = vol_entry; break; } vol_entry->vol_error = 1; prev_vol_entry = vol_entry; vol_entry = NULL; goto restart; } volume_unlock(tcp); out: if(prev_vol_entry){ INM_DEREF_VOL_ENTRY(prev_vol_entry, tcp); } if(vol_entry){ INM_DEREF_VOL_ENTRY(vol_entry, tcp); } #if (defined(INJECT_ERR)) error = 1; #endif return error; } #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,11,0) static inm_s32_t inm_one_AT_cdb_send(inm_block_device_t *bdev, unsigned char *cmd, inm_u32_t cmdlen, inm_s32_t rw, unsigned char *buf, inm_u32_t buflen) { return 0; } #else static inm_s32_t inm_one_AT_cdb_send(inm_block_device_t *bdev, unsigned char *cmd, inm_u32_t cmdlen, inm_s32_t rw, unsigned char *buf, inm_u32_t buflen) { struct gendisk *bd_disk = NULL; struct request *rq = NULL; struct request_queue *q = NULL; char sense[SCSI_SENSE_BUFFERSIZE]; inm_s32_t error = 0; if (!bdev){ error = 2; goto out; } bd_disk = bdev->bd_disk; if(!bd_disk){ error = 3; goto out; } if(buflen){ #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) rq = NULL; error = queue_request_scsi(bdev, cmd, cmdlen, rw, buf, buflen); goto out; #endif } q = bd_disk->queue; if(!q){ error = 4; goto out; } rq = blk_get_request(q, rw, __GFP_WAIT); if(!rq){ error = 7; goto out; } rq->cmd_len = cmdlen; memcpy_s(rq->cmd, cmdlen, cmd, cmdlen); #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,32) rq->data = buf; rq->data_len = buflen; #endif memset(sense, 0, sizeof(sense)); rq->sense_len = 0; rq->sense = sense; #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,18) rq->cmd_type |= REQ_TYPE_BLOCK_PC; #else rq->flags |= REQ_BLOCK_PC | REQ_SPECIAL; #endif if (buflen){ rq->timeout = INM_WRITE_SCSI_TIMEOUT; } else { rq->timeout = INM_CNTL_SCSI_TIMEOUT; } #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,13) if (buflen && blk_rq_map_kern(q, rq, buf, buflen, __GFP_WAIT)){ goto out; } #endif #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) blk_execute_rq(q, bd_disk, rq); #else blk_execute_rq(q, bd_disk, rq, 0); #endif error = rq->errors; if( error ){ dbg("error in blk_execute_rq error %d",error); if(rq->sense_len){ process_sense_info(rq->sense); } else { info("no sense available"); } } out: dbg("exiting send cmd %c with %d", cmd[0], error); if (rq){ blk_put_request(rq); } return error; } #endif #if LINUX_VERSION_CODE < KERNEL_VERSION(4,11,0) static inm_s32_t process_sense_info(char *sense) { /* SLES9 2.6.5 does support following function, structures */ #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,16) #if defined(RHEL_MAJOR) && (RHEL_MAJOR == 7) /* * RH7 has kabi bump for scsi_normalize_sense from 7.0 * and 7.2 
because of which driver does not load. */ return 0; #else struct scsi_sense_hdr sshdr; if (!scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr)) { info("CDB sense header's fields are"); info("response_code 0x%x, sense_key 0x%x, asc 0x%x, \ ascq 0x%x, byte4 0x%x, byte5 0x%x, byte6 0x%x, \ additional_length 0x%x", sshdr.response_code, sshdr.sense_key, sshdr.asc, sshdr.ascq, sshdr.byte4, sshdr.byte5, sshdr.byte6, sshdr.additional_length); } else { info("failed to send MODE SELECT, no sense available"); } #endif /* RH7 */ #endif return 0; } #endif inm_s32_t try_reactive_offline_AT_path(target_context_t *tcp, unsigned char *cmd, inm_u32_t cmdlen, inm_s32_t rw, unsigned char *buf, inm_u32_t buflen, inm_u32_t flag) { mirror_vol_entry_t *vol_entry = NULL; inm_block_device_t *bdev = NULL; struct inm_list_head *ptr, *nextptr; inm_s32_t error = 1; inm_s32_t ret = 0; restart: volume_lock(tcp); inm_list_for_each_safe(ptr, nextptr, &(tcp->tc_dst_list)) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); if (!vol_entry->vol_error || (vol_entry->vol_state & INM_VOL_ENTRY_TRY_ONLINE)) { vol_entry = NULL; continue; } INM_REF_VOL_ENTRY(vol_entry); vol_entry->vol_state |= INM_VOL_ENTRY_TRY_ONLINE; volume_unlock(tcp); bdev = vol_entry->mirror_dev; ret = inm_one_AT_cdb_send(bdev, cmd, cmdlen, rw, buf, buflen); #if (defined(IDEBUG_MIRROR_IO)) if(clear_vol_entry_err){ ret = 0; } #endif if (!ret) { vol_entry->vol_error = 0; error = 0; } INM_DEREF_VOL_ENTRY(vol_entry, tcp); goto restart; } inm_list_for_each_safe(ptr, nextptr, &(tcp->tc_dst_list)) { vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); vol_entry->vol_state &= ~INM_VOL_ENTRY_TRY_ONLINE; } volume_unlock(tcp); #if (defined(IDEBUG_MIRROR_IO)) clear_vol_entry_err = 0; #endif #if (defined(INJECT_ERR)) error = 1; #endif return error; } #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) static inm_s32_t queue_request_scsi(inm_block_device_t *bdev, unsigned char *cmd, inm_u32_t cmdlen, inm_s32_t rw, unsigned char *buf, inm_u32_t buflen) { struct gendisk *bd_disk = NULL; char sense[SCSI_SENSE_BUFFERSIZE]; struct scsi_device *sdp = NULL; struct scsi_request *scsi_rq = NULL; inm_s32_t error = 0; bd_disk = bdev->bd_disk; if(!bd_disk || !bd_disk->driverfs_dev){ goto out; } sdp = to_scsi_device(bd_disk->driverfs_dev); if(!sdp){ goto out; } scsi_rq = scsi_allocate_request(sdp, GFP_KERNEL); if(!scsi_rq){ goto out; } scsi_rq->sr_data_direction = DMA_TO_DEVICE; scsi_wait_req(scsi_rq, cmd, buf, buflen, INM_WRITE_SCSI_TIMEOUT, 1); error = scsi_rq->sr_result; out: if (scsi_rq){ scsi_release_request(scsi_rq); } scsi_rq = NULL; return error; } #endif void inm_dma_flag(target_context_t *tcp, inm_u32_t *flag) { inm_block_device_t *bdev = NULL; struct gendisk *bd_disk = NULL; struct scsi_device *sdp = NULL; struct Scsi_Host *shost = NULL; *flag = 0; if (!tcp) { goto out; } bdev = (tcp->tc_vol_entry->mirror_dev); if (!bdev) { goto out; } bd_disk = bdev->bd_disk; if (!bd_disk || !inm_get_parent_dev(bd_disk)) { goto out; } sdp = to_scsi_device(inm_get_parent_dev(bd_disk)); if(!sdp){ goto out; } shost = sdp->host; if(!shost){ goto out; } #if LINUX_VERSION_CODE < KERNEL_VERSION(5,13,0) *flag = ((shost->unchecked_isa_dma) ? 
GFP_DMA : GFP_KERNEL); #else *flag = GFP_KERNEL; #endif out: return; } void print_AT_stat(target_context_t *tcp, char *page, inm_s32_t *len) { mirror_vol_entry_t *vol_entry = NULL; struct inm_list_head *ptr, *hd, *nextptr; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("entered with tcp %p, page %p, len %p len %d",tcp, page, len, *len); } if(tcp->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ (*len) += sprintf((page+(*len)), "AT name, #IO issued, #successful \ IOs, #No of Byte written, Status, no of ref \n"); volume_lock(tcp); hd = &(tcp->tc_dst_list); inm_list_for_each_safe(ptr, nextptr, hd){ vol_entry = inm_container_of(ptr, mirror_vol_entry_t, next); (*len) += sprintf((page+(*len)), "%s, %llu, %llu, %llu, %s,\ %u\n", vol_entry->tc_mirror_guid, vol_entry->vol_io_issued, vol_entry->vol_io_succeeded, vol_entry->vol_byte_written, vol_entry->vol_error?"offline":"online", INM_ATOMIC_READ(&(vol_entry->vol_ref))); } volume_unlock(tcp); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("exiting"); } } struct block_device * inm_open_by_devnum(dev_t dev, unsigned mode) { #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,35) return blkdev_get_by_dev(dev, mode, NULL); #else return open_by_devnum(dev, mode); #endif } void free_tc_global_at_lun(struct inm_list_head *dst_list) { struct inm_list_head local_at_list; struct inm_list_head *cur = NULL; mirror_vol_entry_t *vol_entry = NULL; dc_at_vol_entry_t *dc_vol_entry = NULL; dbg("enteded in free_tc_global_at_lun"); INM_INIT_LIST_HEAD(&(local_at_list)); INM_SPIN_LOCK(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn)); for (cur = dst_list->next; !(cur == dst_list); cur = cur->next){ vol_entry = inm_container_of(cur, mirror_vol_entry_t, next); dc_vol_entry = find_dc_at_lun_entry(vol_entry->tc_mirror_guid); if(!dc_vol_entry) continue; inm_list_del(&dc_vol_entry->dc_at_this_entry); inm_list_add(&(dc_vol_entry->dc_at_this_entry), &local_at_list); } INM_SPIN_UNLOCK(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn)); while (!inm_list_empty(&local_at_list)) { dc_vol_entry = inm_container_of(local_at_list.next, dc_at_vol_entry_t, dc_at_this_entry); inm_list_del(&(dc_vol_entry->dc_at_this_entry)); free_dc_vol_entry(dc_vol_entry); } dbg("exiting from free_tc_global_at_lun"); } inm_s32_t process_block_at_lun(inm_devhandle_t *handle, void * arg) { inm_s32_t ret = 0; inm_s32_t err = 0; dc_at_vol_entry_t *dc_vol_entry = NULL; dc_at_vol_entry_t *chk_dc_vol_entry = NULL; inm_at_lun_reconfig_t *at_lun_reconf = NULL; struct gendisk *disk = NULL; struct scsi_device *sdp = NULL; at_lun_reconf = (inm_at_lun_reconfig_t *) INM_KMALLOC(sizeof(inm_at_lun_reconfig_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!at_lun_reconf) { err = 1; goto out; } INM_MEM_ZERO(at_lun_reconf, sizeof(inm_at_lun_reconfig_t)); if (INM_COPYIN(at_lun_reconf, (inm_at_lun_reconfig_t *) arg, sizeof(inm_at_lun_reconfig_t))) { err("copyin failed\n"); ret = INM_EFAULT; goto out; } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("%s process_block_at_lun for lun name %s", (at_lun_reconf->flag & ADD_AT_LUN_GLOBAL_LIST)? 
"add":"del", at_lun_reconf->atdev_name); } if (at_lun_reconf->flag & ADD_AT_LUN_GLOBAL_LIST) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("Add dc_at_blocking_entry ioctl for for %s", at_lun_reconf->atdev_name); INM_BUG_ON(at_lun_reconf->flag & DEL_AT_LUN_GLOBAL_LIST); } dc_vol_entry = alloc_dc_vol_entry(); if (!dc_vol_entry) { err = INM_ENOMEM; goto out; } if (!(dc_vol_entry->dc_at_dev=open_by_dev_path(at_lun_reconf->atdev_name, FMODE_WRITE))) { err("Fail to open AT LUN device %s", at_lun_reconf->atdev_name); err = INM_EINVAL; goto out; } disk = dc_vol_entry->dc_at_dev->bd_disk; /* For pseudo devices like emc powerpath may not populate * gendisk->driverfs_dev structure, so we can exclude such * devices from being checked */ if (disk && inm_get_parent_dev(disk)) { sdp = to_scsi_device(inm_get_parent_dev(disk)); /* Not a InMage AT Lun ? */ if (strncmp(sdp->vendor, "InMage ", strlen("InMage "))) { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("Entry is not an InMage AT device %s vendor:[%s]", at_lun_reconf->atdev_name, (sdp->vendor)?(sdp->vendor):"NULL"); } err = INM_EINVAL; goto out; } } else { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ info("Entry without driverfs_dev is not an InMage AT \ device %s", at_lun_reconf->atdev_name); } err = INM_EINVAL; goto out; } if (dc_vol_entry->dc_at_dev->bd_disk->fops == &(driver_ctx->dc_at_lun.dc_at_drv_info.mod_dev_ops)) { /* Could come here for disk partition*/ dbg("Device %s already masked ", at_lun_reconf->atdev_name); close_bdev(dc_vol_entry->dc_at_dev, FMODE_WRITE); dc_vol_entry->dc_at_dev = NULL; INM_KFREE(dc_vol_entry, sizeof(dc_at_vol_entry_t), INM_KERNEL_HEAP); err = INM_EEXIST; goto out1; } strcpy_s(dc_vol_entry->dc_at_name, INM_GUID_LEN_MAX, at_lun_reconf->atdev_name); dc_vol_entry->dc_at_dev->bd_disk->fops = &(driver_ctx->dc_at_lun.dc_at_drv_info.mod_dev_ops); INM_SPIN_LOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); chk_dc_vol_entry = find_dc_at_lun_entry(at_lun_reconf->atdev_name); if (chk_dc_vol_entry) { INM_BUG_ON(strcmp(chk_dc_vol_entry->dc_at_name, at_lun_reconf->atdev_name)); INM_SPIN_UNLOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); if (IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_BMAP))){ dbg("Entry already exist for device %s", at_lun_reconf->atdev_name); } close_bdev(dc_vol_entry->dc_at_dev, FMODE_WRITE); dc_vol_entry->dc_at_dev = NULL; INM_KFREE(dc_vol_entry, sizeof(dc_at_vol_entry_t), INM_KERNEL_HEAP); err = INM_EEXIST; goto out1; } inm_list_add_tail(&dc_vol_entry->dc_at_this_entry, &(driver_ctx->dc_at_lun.dc_at_lun_list)); INM_SPIN_UNLOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); } else { INM_BUG_ON(!(at_lun_reconf->flag & DEL_AT_LUN_GLOBAL_LIST)); dbg("deleting dc_at_blocking_entry ioctl for for %s", at_lun_reconf->atdev_name); INM_BUG_ON(at_lun_reconf->flag & ADD_AT_LUN_GLOBAL_LIST); INM_SPIN_LOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); dc_vol_entry = find_dc_at_lun_entry(at_lun_reconf->atdev_name); if (!dc_vol_entry) { INM_SPIN_UNLOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); info("lun is not blocked but still got unblocked \ called, device name %s", at_lun_reconf->atdev_name); goto out; } inm_list_del(&dc_vol_entry->dc_at_this_entry); INM_SPIN_UNLOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), flag); free_dc_vol_entry(dc_vol_entry); dc_vol_entry=NULL; } out: if (err && dc_vol_entry) { free_dc_vol_entry(dc_vol_entry); } out1: if 
(at_lun_reconf) { INM_KFREE(at_lun_reconf, sizeof(inm_at_lun_reconfig_t), INM_KERNEL_HEAP); } dbg("exiting from process_block_at_lun"); return err; } void free_all_at_lun_entries() { inm_list_head_t *ptr = NULL, *nextptr = NULL; inm_list_head_t llist; dc_at_vol_entry_t *at_vol_entry = NULL; dbg("entered free_all_at_lun_entries"); INM_INIT_LIST_HEAD(&llist); INM_SPIN_LOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), lock_flag); inm_list_replace_init(&(driver_ctx->dc_at_lun.dc_at_lun_list), &llist); INM_SPIN_UNLOCK_WRAPPER(&(driver_ctx->dc_at_lun.dc_at_lun_list_spn), lock_flag); inm_list_for_each_safe(ptr, nextptr, &llist) { at_vol_entry = inm_container_of(ptr, dc_at_vol_entry_t, dc_at_this_entry); free_dc_vol_entry(at_vol_entry); } dbg("exiting from free_all_at_lun_entries"); return; } static void free_dc_vol_entry(dc_at_vol_entry_t *at_vol_entry) { if (at_vol_entry->dc_at_dev) { if (at_vol_entry->dc_at_dev->bd_disk && (at_vol_entry->dc_at_dev->bd_disk->fops == &(driver_ctx->dc_at_lun.dc_at_drv_info.mod_dev_ops))) { at_vol_entry->dc_at_dev->bd_disk->fops = driver_ctx->dc_at_lun.dc_at_drv_info.orig_dev_ops; } close_bdev(at_vol_entry->dc_at_dev, FMODE_READ); at_vol_entry->dc_at_dev = NULL; } INM_KFREE(at_vol_entry, sizeof(dc_at_vol_entry_t), INM_KERNEL_HEAP); at_vol_entry = NULL; } static dc_at_vol_entry_t* alloc_dc_vol_entry() { dc_at_vol_entry_t *at_vol_entry = NULL; at_vol_entry = (dc_at_vol_entry_t *) INM_KMALLOC(sizeof(dc_at_vol_entry_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!at_vol_entry){ goto out; } INM_MEM_ZERO(at_vol_entry, sizeof(dc_at_vol_entry_t)); INM_INIT_LIST_HEAD(&at_vol_entry->dc_at_this_entry); out: return at_vol_entry; } /* *This func should be called with dc_at_lun_list_spn lock held. */ dc_at_vol_entry_t * find_dc_at_lun_entry(char *devname) { inm_list_head_t *ptr = NULL, *nextptr = NULL; dc_at_vol_entry_t *at_vol_entry = NULL; inm_list_for_each_safe(ptr, nextptr, &(driver_ctx->dc_at_lun.dc_at_lun_list)) { at_vol_entry = inm_container_of(ptr, dc_at_vol_entry_t, dc_at_this_entry); if(!strcmp(at_vol_entry->dc_at_name, devname)){ break; } at_vol_entry = NULL; } return at_vol_entry; } void *inm_kmalloc(size_t size, int flags) { void *ptr = NULL; ptr = kmalloc(size, flags); if(ptr) { atomic_add(size, &inm_flt_memprint); } return ptr; } void inm_kfree(size_t size,const void * objp) { if(objp) { atomic_sub(size, &inm_flt_memprint); } kfree(objp); } void *inm_kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) { void *ptr = NULL; ptr = kmem_cache_alloc(cachep,flags); #ifdef CONFIG_SLAB #if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)) if(ptr) { #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,13)) atomic_add(cachep->size, &inm_flt_memprint); #else atomic_add(cachep->buffer_size, &inm_flt_memprint); #endif } #endif #endif return ptr; } void inm_kmem_cache_free(struct kmem_cache *cachep, void *objp) { #ifdef CONFIG_SLAB #if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)) if(objp) { #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,13)) atomic_sub(cachep->size, &inm_flt_memprint); #else atomic_sub(cachep->buffer_size, &inm_flt_memprint); #endif } #endif #endif kmem_cache_free(cachep, objp); } void *inm_mempool_alloc(inm_mempool_t *pool, gfp_t gfp_mask) { void *ptr = mempool_alloc(pool, gfp_mask); #ifdef CONFIG_SLAB #if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)) struct kmem_cache *cachep; if(ptr) { if (pool->pool_data) { cachep = pool->pool_data; #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,13)) atomic_add(cachep->size, &inm_flt_memprint); #else 
atomic_add(cachep->buffer_size, &inm_flt_memprint); #endif } } #endif #endif return ptr; } void inm_mempool_free(void *element, inm_mempool_t *pool) { #ifdef CONFIG_SLAB #if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)) struct kmem_cache *cachep = NULL; if(element){ if(pool->pool_data) { cachep = pool->pool_data; #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,8,13)) atomic_sub(cachep->size, &inm_flt_memprint); #else atomic_sub(cachep->buffer_size, &inm_flt_memprint); #endif } } #endif #endif mempool_free(element, pool); } void *inm_vmalloc(unsigned long size) { void *ptr = NULL; ptr = vmalloc(size); if(ptr) { atomic_add(size, &inm_flt_memprint); } return ptr; } void inm_vfree(const void *addr, unsigned long size) { if(addr) { atomic_sub(size, &inm_flt_memprint); } vfree(addr); } struct page *inm_alloc_page(gfp_t gfp_mask) { struct page *ptr = NULL; ptr = alloc_page(gfp_mask); if(ptr) { atomic_add(PAGE_SIZE, &inm_flt_memprint); } return ptr; } void __inm_free_page(struct page *page) { if(page) { atomic_sub(PAGE_SIZE, &inm_flt_memprint); } __free_page(page); } void inm_free_page(unsigned long addr) { if(addr) { atomic_sub(PAGE_SIZE, &inm_flt_memprint); } free_page((unsigned long)addr); } unsigned long __inm_get_free_page(gfp_t gfp_mask) { unsigned long addr = (unsigned long)NULL; addr = __get_free_page(gfp_mask); if(addr) { atomic_add(PAGE_SIZE, &inm_flt_memprint); } return addr; } #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)) void post_timeout_task_to_wthread(inm_timer_t *timer) { flt_timer_t *flt_timer = inm_container_of(timer, flt_timer_t, ft_timer); add_item_to_work_queue(&driver_ctx->dc_tqueue, &flt_timer->ft_task); } #else void post_timeout_task_to_wthread(unsigned long wqtask) { add_item_to_work_queue(&driver_ctx->dc_tqueue, (wqentry_t *)wqtask); } #endif inm_s32_t force_timeout(flt_timer_t *timer) { if (timer->ft_task.flags == WITEM_TYPE_TIMEOUT) { dbg("Force timeout"); mod_timer(&timer->ft_timer, jiffies); return 0; } else { return -EINVAL; } } inm_s32_t end_timer(flt_timer_t *timer) { if (timer->ft_task.flags == WITEM_TYPE_TIMEOUT) { dbg("Shutting down the timer"); del_timer_sync(&timer->ft_timer); timer->ft_task.flags = WITEM_TYPE_UNINITIALIZED; return 0; } else { return -EINVAL; } } void start_timer(flt_timer_t *timer, int timeout_ms, timeout_t callback) { init_work_queue_entry(&timer->ft_task); timer->ft_task.flags = WITEM_TYPE_TIMEOUT; timer->ft_task.work_func = callback; timer->ft_task.context = NULL; dbg("Starting cp timer with %d ms timeout at %lu", timeout_ms, jiffies); #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0)) timer_setup(&timer->ft_timer, post_timeout_task_to_wthread, 0); #else init_timer(&timer->ft_timer); timer->ft_timer.function = post_timeout_task_to_wthread; timer->ft_timer.data = (unsigned long)&timer->ft_task; #endif timer->ft_timer.expires = jiffies + INM_MSECS_TO_JIFFIES(timeout_ms); add_timer(&timer->ft_timer); } inm_s32_t end_cp_timer(void) { if (driver_ctx->dc_cp != INM_CP_NONE) { INM_BUG_ON(driver_ctx->dc_cp != INM_CP_NONE); return -EINVAL; } INM_MEM_ZERO(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); return end_timer(&cp_timer); } void start_cp_timer(int timeout_ms, timeout_t callback) { dbg("Starting timer -> %d ms", timeout_ms); start_timer(&cp_timer, timeout_ms, callback); } /* * COMMIT/REVOKE TAGS */ /* * commit_tags_v2 * * Function to commit/revoke tags for multiple volumes. 
The volumes are * fetched tgt_list and checked for pending comit * */ inm_s32_t commit_tags_v2(char *tag_guid, TAG_COMMIT_STATUS_T commit, int timedout) { inm_s32_t error = 0; inm_list_head_t *cur = NULL; inm_list_head_t *next= NULL; target_context_t *tgt_ctxt = NULL; int tag_committed = 0; INM_DOWN(&driver_ctx->dc_cp_mutex); dbg("Commit tag: flag = %d and timeout = %d", commit, timedout); if (INM_MEM_CMP(driver_ctx->dc_cp_guid, tag_guid, /* Tag Matches */ sizeof(driver_ctx->dc_cp_guid))) { err("Tag guid mismatch"); error = -EINVAL; goto out; } /* * We check for absolute value and not bit to confirm * volume has been unquiesced before commiting the tag * Also prevents deadlock with create and remove barrier */ if (driver_ctx->dc_cp != INM_CP_TAG_COMMIT_PENDING) { dbg("no commit pending"); error = -EINVAL; goto out; } error = INM_TAG_SUCCESS; INM_DOWN_READ(&(driver_ctx->tgt_list_sem)); inm_list_for_each_safe(cur, next, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(cur, target_context_t, tc_list); tag_committed = 0; if (is_target_tag_commit_pending(tgt_ctxt)) { if (commit == TAG_COMMIT) { dbg("Committing tag"); /* if COMMIT fails, mark return a partial */ if (commit_usertag(tgt_ctxt)) error = INM_TAG_PARTIAL; else tag_committed = 1; } else { dbg("Revoking tag"); revoke_usertag(tgt_ctxt, timedout); } tgt_ctxt->tc_flags &= ~VCF_TAG_COMMIT_PENDING; if (tag_committed || /* Committed */ /* Changes */ should_wakeup_s2_ignore_drain_barrier(tgt_ctxt)) INM_WAKEUP_INTERRUPTIBLE(&tgt_ctxt->tc_waitq); } } INM_UP_READ(&(driver_ctx->tgt_list_sem)); driver_ctx->dc_cp &= ~INM_CP_TAG_COMMIT_PENDING; INM_BUG_ON(driver_ctx->dc_cp != INM_CP_NONE); dbg("New cp state = %d", driver_ctx->dc_cp); INM_MEM_ZERO(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); if (!timedout) { if (end_cp_timer()) err("Cannot stop timer"); } out: INM_UP(&driver_ctx->dc_cp_mutex); return error ; } /* * Actual function to freeze a given volume for a given timeout value. * Input: freeze_vol structure. * Output: 0 if succeded. 
* */ static inm_s32_t process_freeze_volume(freeze_info_t *freeze_vol) { inm_list_head_t *ptr = NULL, *nextptr = NULL; freeze_vol_info_t *freeze_ele = NULL; freeze_vol_info_t *freeze_vinfo = NULL; int ret = 0; dbg ("entered process_freeze_volume"); dbg("Freezing %s", freeze_vol->vol_info->vol_name); /* check if the volume is already frozen */ /* iterate over the freeze link list */ inm_list_for_each_safe(ptr, nextptr, &driver_ctx->freeze_vol_list) { freeze_ele = inm_list_entry(ptr, freeze_vol_info_t, freeze_list_entry); if(freeze_ele) { if(!strcmp(freeze_ele->vol_name, freeze_vol->vol_info->vol_name)) { dbg ("the volume [%s] is alreday frozen", freeze_ele->vol_name); ret = -1; goto out; } } freeze_ele = NULL; } freeze_vinfo = (freeze_vol_info_t *) INM_KMALLOC (sizeof (freeze_vol_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!freeze_vinfo) { err ("Failed to allocate the freeze_vol_info_t object"); ret = -1; goto out; } INM_MEM_ZERO (freeze_vinfo, sizeof (freeze_vol_info_t)); strcpy_s (freeze_vinfo->vol_name, TAG_VOLUME_MAX_LENGTH, freeze_vol->vol_info->vol_name); /* open by device path */ freeze_vinfo->bdev = open_by_dev_path_v2 (freeze_vinfo->vol_name, FMODE_READ | FMODE_WRITE); if (!(freeze_vinfo->bdev)) { info ("failed to open block device %s", freeze_vinfo->vol_name); if (freeze_vinfo) { INM_KFREE (freeze_vinfo, sizeof (freeze_vol_info_t), INM_KERNEL_HEAP); freeze_vinfo = NULL; } ret = -1; goto out; } if (inm_freeze_bdev(freeze_vinfo->bdev, freeze_vinfo->sb)) { info (" failed to freeze block device %s", freeze_vinfo->vol_name); close_bdev (freeze_vinfo->bdev, FMODE_READ | FMODE_WRITE); freeze_vinfo->bdev = NULL; freeze_vinfo->sb = NULL; if (freeze_vinfo) { INM_KFREE (freeze_vinfo, sizeof (freeze_vol_info_t), INM_KERNEL_HEAP); freeze_vinfo = NULL; } ret = -1; goto out; } /* insert the node inside the freeze volume list */ inm_list_add_tail (&freeze_vinfo->freeze_list_entry, &driver_ctx->freeze_vol_list); ret = 0; out: if (ret) { freeze_vol->vol_info->status |= STATUS_FREEZE_FAILED; } else { freeze_vol->vol_info->status |= STATUS_FREEZE_SUCCESS; } dbg ("leaving process_freeze_volume"); return ret; } /* * IOCTL function to freeze a set of given volume for a given timeout value. * Input: handle, arg * Output: 0 if all succeded. 
* */ inm_s32_t process_freeze_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { freeze_info_t *freeze_vol = NULL; int ret = 0; int numvol = 0; int no_of_vol_freeze_done = 0; inm_u32_t fs_freeze_timeout = 0; unsigned long lock_flag = 0; dbg("entered process_freeze_volume_ioctl"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(freeze_info_t))) { err("Read access violation for freeze_info_t"); ret = -EFAULT; goto out; } freeze_vol = (freeze_info_t *)INM_KMALLOC(sizeof(freeze_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!freeze_vol) { err("INM_KMALLOC failed to allocate memory for freeze_info_t"); ret = -ENOMEM; goto out; } if(INM_COPYIN(freeze_vol, arg, sizeof(freeze_info_t))) { err("INM_COPYIN failed"); ret = -EFAULT; goto out_err; } if(freeze_vol->nr_vols <= 0) { err("Freeze Input Failed: Number of volumes can't be zero or \ negative"); ret = -EINVAL; goto out_err; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); fs_freeze_timeout = driver_ctx->tunable_params.fs_freeze_timeout; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); if(freeze_vol->timeout <= 0 || freeze_vol->timeout > fs_freeze_timeout) { err("Freeze Input Failed: Invalid timeout value"); ret = -EINVAL; goto out_err; } arg = freeze_vol->vol_info; freeze_vol->vol_info = NULL; /* allocate a buffer and reuse to store the volume info for a set of volumes */ freeze_vol->vol_info = (volume_info_t *)INM_KMALLOC( sizeof(*freeze_vol->vol_info), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!freeze_vol->vol_info) { err("INM_KMALLOC failed to allocate memory for volume_info_t"); ret = -ENOMEM; goto out; } INM_DOWN (&(driver_ctx->dc_cp_mutex)); if (driver_ctx->dc_cp != INM_CP_APP_ACTIVE && driver_ctx->dc_cp != INM_CP_NONE) { dbg("CP active -> %d", driver_ctx->dc_cp); ret = -EAGAIN; goto out_unlock; } /* If some fs were frozen earlier, match the guid */ if (driver_ctx->dc_cp != INM_CP_NONE) { INM_BUG_ON(driver_ctx->dc_cp != INM_CP_APP_ACTIVE); if (INM_MEM_CMP(driver_ctx->dc_cp_guid, freeze_vol->tag_guid, sizeof(driver_ctx->dc_cp_guid))) { err("GUID mismatch"); ret = -EINVAL; goto out_unlock; } } /* iterate over the given volume list */ for ( numvol = 0; numvol < freeze_vol->nr_vols; numvol++) { /* mem set the buffer before using it */ INM_MEM_ZERO(freeze_vol->vol_info, sizeof(volume_info_t)); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(*freeze_vol->vol_info))) { err("Read access violation for freeze_info_t"); ret = -EFAULT; break; } if(INM_COPYIN(freeze_vol->vol_info, arg, sizeof(*freeze_vol->vol_info))) { err("INM_COPYIN failed"); ret = -EFAULT; break; } /* process the freeze volume list */ freeze_vol->vol_info->vol_name[TAG_VOLUME_MAX_LENGTH - 1] = '\0'; ret = process_freeze_volume(freeze_vol); if(ret) { dbg("Failed to freeze the volume %s\n", freeze_vol->vol_info->vol_name); } else { no_of_vol_freeze_done++; } if(INM_COPYOUT(arg, freeze_vol->vol_info, sizeof(*freeze_vol->vol_info))) { err("copy to user failed for freeze volume status"); ret = INM_EFAULT; break; } arg += sizeof(*freeze_vol->vol_info); } if (no_of_vol_freeze_done == freeze_vol->nr_vols) { ret = 0; } else { if (!ret) ret = -1; } if (no_of_vol_freeze_done) { /* If first freeze ioctl, copy the guid and start timer */ if (driver_ctx->dc_cp == INM_CP_NONE) { driver_ctx->dc_cp |= INM_CP_APP_ACTIVE; dbg("First freeze"); dbg("New cp state = %d", driver_ctx->dc_cp); memcpy_s(&driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid), freeze_vol->tag_guid, sizeof(freeze_vol->tag_guid)); start_cp_timer(freeze_vol->timeout, 
inm_fvol_list_thaw_on_timeout); } else { INM_BUG_ON(driver_ctx->dc_cp != INM_CP_APP_ACTIVE); dbg("Already marked for APP_CP"); } } out_unlock: INM_UP (&(driver_ctx->dc_cp_mutex)); out: if(freeze_vol) { if (freeze_vol->vol_info) { INM_KFREE(freeze_vol->vol_info, sizeof(volume_info_t), INM_KERNEL_HEAP); freeze_vol->vol_info = NULL; } INM_KFREE(freeze_vol, sizeof(freeze_info_t), INM_KERNEL_HEAP); freeze_vol = NULL; } dbg("leaving process_freeze_volume_ioctl"); return ret; out_err: freeze_vol->vol_info = NULL; goto out; } /* * The actual function which does thaw on a given volume. * Input: thaw_info_t * Output: 0 if success. */ static inm_s32_t process_thaw_volume(thaw_info_t *thaw_vol) { inm_list_head_t *ptr = NULL, *nextptr = NULL; freeze_vol_info_t *freeze_ele = NULL; int ret; dbg("entered process_thaw_volume"); dbg("Thawing %s", thaw_vol->vol_info->vol_name); /* iterate over the freeze link list */ inm_list_for_each_safe(ptr, nextptr, &driver_ctx->freeze_vol_list) { freeze_ele = inm_list_entry(ptr, freeze_vol_info_t, freeze_list_entry); if(freeze_ele && (!strcmp(freeze_ele->vol_name, thaw_vol->vol_info->vol_name))) { dbg("the volume to thaw is [%s]\n", thaw_vol->vol_info->vol_name); /* * found element inside the freeze link list * delete entry from freeze link list */ inm_list_del(&freeze_ele->freeze_list_entry); /* * thaw the bdev, not checking the return value * because in older kernel version return type is void */ inm_thaw_bdev(freeze_ele->bdev, freeze_ele->sb); close_bdev(freeze_ele->bdev, FMODE_READ | FMODE_WRITE); freeze_ele->bdev = NULL; freeze_ele->sb = NULL; INM_KFREE(freeze_ele, sizeof(freeze_vol_info_t), INM_KERNEL_HEAP); freeze_ele = NULL; ptr = NULL; nextptr = NULL; ret = 0; goto out; } freeze_ele = NULL; } ret = -1; out: if (ret) { thaw_vol->vol_info->status |= STATUS_THAW_FAILED; } else { thaw_vol->vol_info->status |= STATUS_THAW_SUCCESS; } if (inm_list_empty(&driver_ctx->freeze_vol_list)) { dbg("All volume thawed"); driver_ctx->dc_cp &= ~INM_CP_APP_ACTIVE; dbg("New cp state = %d", driver_ctx->dc_cp); if (driver_ctx->dc_cp == INM_CP_NONE) end_cp_timer(); } dbg("leaving process_thaw_volume"); return ret; } /* * IOCTL function which does thaw on set of given volumes * Input: handle, arg * Ouput: 0 if all succeded * */ inm_s32_t process_thaw_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { thaw_info_t *thaw_vol = NULL; int ret = 0; int numvol = 0; int no_vol_thaw_done = 0; dbg("entered process_unfreeze_volume_ioctl"); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(thaw_info_t))) { err("Read access violation for thaw_info_t"); ret = -EFAULT; goto out; } thaw_vol = (thaw_info_t *)INM_KMALLOC(sizeof(thaw_info_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!thaw_vol) { err("INM_KMALLOC failed to allocate memory for thaw_info_t"); ret = -ENOMEM; goto out; } if(INM_COPYIN(thaw_vol, arg, sizeof(thaw_info_t))) { err("INM_COPYIN failed"); ret = -EFAULT; goto out_err; } if(thaw_vol->nr_vols <= 0) { err("Thaw Input Failed: Number of volumes can't be zero or \ negative"); ret = -EINVAL; goto out_err; } arg = thaw_vol->vol_info; thaw_vol->vol_info = NULL; /* allocate a buffer and reuse to store volume info for a set of * volumes */ thaw_vol->vol_info = (volume_info_t *)INM_KMALLOC( sizeof(volume_info_t), INM_KM_SLEEP, INM_KERNE:_HEAP); if(!thaw_vol->vol_info) { err("INM_KMALLOC failed to allocate memory for volume_info_t"); ret = -ENOMEM; goto out; } /* take the lock */ INM_DOWN(&(driver_ctx->dc_cp_mutex)); if (!(driver_ctx->dc_cp & INM_CP_APP_ACTIVE)) { err("Thaw without 
freeze"); ret = -EINVAL; goto out_unlock; } if (INM_MEM_CMP(driver_ctx->dc_cp_guid, thaw_vol->tag_guid, sizeof(driver_ctx->dc_cp_guid))) { err("Invalid thaw guid"); ret = -EINVAL; goto out_unlock; } for (numvol = 0; numvol < thaw_vol->nr_vols; numvol++) { /* mem set the buffer before using it */ INM_MEM_ZERO(thaw_vol->vol_info, sizeof(volume_info_t)); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(*thaw_vol->vol_info))) { err("Read access violation for volume_info_t"); ret = -EFAULT; goto out_unlock; } if(INM_COPYIN(thaw_vol->vol_info, arg, sizeof(*thaw_vol->vol_info))) { err("INM_COPYIN failed"); ret = -EFAULT; goto out_unlock; } /* process the freeze volume list */ thaw_vol->vol_info->vol_name[TAG_VOLUME_MAX_LENGTH - 1] = '\0'; ret = process_thaw_volume(thaw_vol); if(ret) { dbg("Fail to Thaw the volume %s\n", thaw_vol->vol_info->vol_name); } else { no_vol_thaw_done++; } if(INM_COPYOUT(arg, thaw_vol->vol_info, sizeof(*thaw_vol->vol_info))) { err("copy to user failed for thaw volume status"); ret = INM_EFAULT; goto out_unlock; } arg += sizeof(*thaw_vol->vol_info); } if (no_vol_thaw_done == thaw_vol->nr_vols) { ret = 0; } else { ret = -1; } out_unlock: /* release the lock */ INM_UP(&(driver_ctx->dc_cp_mutex)); out: if(thaw_vol) { if (thaw_vol->vol_info) { INM_KFREE(thaw_vol->vol_info, sizeof(volume_info_t), INM_KERNEL_HEAP); thaw_vol->vol_info = NULL; } INM_KFREE(thaw_vol, sizeof(thaw_info_t), INM_KERNEL_HEAP); thaw_vol = NULL; } dbg("leaving process_unfreeze_volume_ioctl"); return ret; out_err: thaw_vol->vol_info = NULL; goto out; } /* * Function monitors the freeze volume list and thaw if someone timedout. */ void inm_fvol_list_thaw_on_timeout(wqentry_t *not_used) { inm_list_head_t *ptr = NULL, *nextptr = NULL; freeze_vol_info_t *freeze_ele = NULL; err("Starting timeout procedure at %lu", jiffies); /* take the lock*/ INM_DOWN(&(driver_ctx->dc_cp_mutex)); /* iterate over the global freeze link list*/ inm_list_for_each_safe(ptr, nextptr, &driver_ctx->freeze_vol_list){ freeze_ele = inm_list_entry(ptr, freeze_vol_info_t, freeze_list_entry); if(freeze_ele){ dbg("thaw the volume [%s]\n", freeze_ele->vol_name); /* thaw the volume using bdev and sb present in link * list */ inm_thaw_bdev(freeze_ele->bdev, freeze_ele->sb); close_bdev(freeze_ele->bdev, FMODE_READ | FMODE_WRITE); freeze_ele->bdev = NULL; freeze_ele->sb = NULL; inm_list_del(&freeze_ele->freeze_list_entry); INM_KFREE(freeze_ele, sizeof(freeze_vol_info_t), INM_KERNEL_HEAP); freeze_ele = NULL; } } driver_ctx->dc_cp &= ~INM_CP_APP_ACTIVE; /* release the lock */ INM_UP(&(driver_ctx->dc_cp_mutex)); commit_tags_v2(driver_ctx->dc_cp_guid, TAG_REVOKE, 1); dbg("leaving inm_fvol_list_thaw_on_timeout"); return; } #ifndef INITRD_MODE inm_s32_t process_init_driver_fully(inm_devhandle_t *handle, void * arg) { inm_irqflag_t flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, flag); driver_ctx->dc_flags |= DC_FLAGS_REBOOT_MODE; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, flag); return 0; } #else extern inm_s32_t driver_state; extern inm_s32_t get_root_info(void); inm_s32_t process_init_driver_fully(inm_devhandle_t *handle, void * arg) { inm_s32_t state; inm_irqflag_t flag = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, flag); state = driver_state; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, flag); if (!(state & DRV_LOADED_PARTIALLY)) return 0; /* Read the common tunables */ init_driver_tunable_params(); sysfs_involflt_init(); get_root_info(); 
INM_SPIN_LOCK_IRQSAVE(&driver_ctx->clean_shutdown_lock, flag); driver_state &= ~DRV_LOADED_PARTIALLY; driver_state |= DRV_LOADED_FULLY; driver_ctx->dc_flags |= DC_FLAGS_REBOOT_MODE; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->clean_shutdown_lock, flag); if(driver_ctx->clean_shutdown) inm_flush_clean_shutdown(UNCLEAN_SHUTDOWN); telemetry_init(); info("Initialized the involflt module successfully"); return 0; } #endif /* * function to add tag to given volume * when iobarrier is on */ inm_s32_t iobarrier_add_volume_tags(tag_volinfo_t *tag_volinfop, tag_info_t *tag_info_listp, int nr_tags, int commit_pending, tag_telemetry_common_t *tag_common) { int index = 0; int ret = -1; target_context_t *ctxt = tag_volinfop->ctxt; tag_history_t *tag_hist = NULL; if (ctxt->tc_dev_type == FILTER_DEV_MIRROR_SETUP){ inm_form_tag_cdb(ctxt, tag_info_listp, nr_tags); goto out; } if (tag_common) tag_hist = telemetry_tag_history_alloc(ctxt, tag_common); volume_lock(ctxt); /* * add tags only in metadata mode, to save the data pages. */ ret = add_tag_in_non_stream_mode(tag_volinfop, tag_info_listp, nr_tags, NULL, index, commit_pending, tag_hist); if (!ret){ if (commit_pending) { driver_ctx->dc_cp |= INM_CP_TAG_COMMIT_PENDING; dbg("New cp state = %d", driver_ctx->dc_cp); } if (tag_common) { if (tag_hist) telemetry_tag_history_record(ctxt, tag_hist); else telemetry_log_drop_error(-ENOMEM); } } volume_unlock(ctxt); if (ret && tag_hist) telemetry_tag_history_free(tag_hist); INM_WAKEUP_INTERRUPTIBLE(&ctxt->tc_waitq); out: return ret; } /* * function to issue tags for all protected volume * when iobarrier is on i.e. already holding lock tgt_list_sem. */ inm_s32_t iobarrier_issue_tag_all_volume(tag_info_t *tag_list, int nr_tags, int commit_pending, tag_telemetry_common_t *tag_common) { int ret = 0; struct inm_list_head *ptr; target_context_t *tgt_ctxt = NULL; tag_volinfo_t *tag_volinfop = NULL; int vols_tagged = 0; inm_s32_t error = 0; int tag_not_issued = 0; TAG_COMMIT_STATUS *tag_status = NULL; dbg("entered iobarrier_issue_tag_all_volume"); tag_volinfop = (tag_volinfo_t *)INM_KMALLOC(sizeof(tag_volinfo_t), INM_KM_NOSLEEP, INM_KERNEL_HEAP); if(!tag_volinfop) { err("TAG Input Failed: INM_KMALLOC failed for tag_volinfo_t"); return -ENOMEM; } INM_MEM_ZERO(tag_volinfop, sizeof(tag_volinfo_t)); for (ptr = driver_ctx->tgt_list.next; ptr != &(driver_ctx->tgt_list); ptr = ptr->next) { tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list); if(tgt_ctxt) { if(tgt_ctxt->tc_flags & (VCF_VOLUME_CREATING | VCF_VOLUME_DELETING)){ tgt_ctxt = NULL; continue; } get_tgt_ctxt(tgt_ctxt); if(tgt_ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP && tgt_ctxt->tc_cur_wostate != ecWriteOrderStateData) { dbg("the volume is not in Write Order State Data"); /* Non WO state takes precedence over other errors */ tag_not_issued = 1; error = -EPERM; ret = -EPERM; INM_SPIN_LOCK(&driver_ctx->dc_tag_commit_status); if (driver_ctx->dc_tag_drain_notify_guid && !INM_MEM_CMP(driver_ctx->dc_cp_guid, driver_ctx->dc_tag_drain_notify_guid, GUID_LEN)) { tag_status = tgt_ctxt->tc_tag_commit_status; info("The disk %s is in non write order \ state", tgt_ctxt->tc_guid); } INM_SPIN_UNLOCK(&driver_ctx->dc_tag_commit_status); if (tag_status) set_tag_drain_notify_status(tgt_ctxt, TAG_STATUS_INSERTION_FAILED, DEVICE_STATUS_NON_WRITE_ORDER_STATE); goto tag_fail; } if(is_target_filtering_disabled(tgt_ctxt)) { if (!error) error = -ENODEV; ret = -ENODEV; goto tag_fail; } tag_volinfop->ctxt = tgt_ctxt; ret = iobarrier_add_volume_tags(tag_volinfop, tag_list, nr_tags, commit_pending, 
tag_common); if (ret) { dbg("failed to issue tag"); if (!error) error = ret; goto tag_fail; } else { vols_tagged++; if (tag_common) tag_common->tc_ndisks_tagged = vols_tagged; } tag_fail: if (ret) { if (tag_common) { tag_common->tc_ioctl_status = (vols_tagged ? INM_TAG_PARTIAL : INM_TAG_FAILED); telemetry_log_tag_failure(tgt_ctxt, tag_common, ret, ecMsgTagInsertFailure); } ret = 0; } put_tgt_ctxt(tgt_ctxt); tgt_ctxt = NULL; tag_volinfop->ctxt = NULL; } } if (tag_not_issued) update_cx_with_tag_failure(); if (vols_tagged) { if (error) { dbg("Partial tag"); ret = INM_TAG_PARTIAL; } else { ret = INM_TAG_SUCCESS; } } else { ret = error; } if (tag_common) tag_common->tc_ioctl_status = ret; if (tag_volinfop) { if (tgt_ctxt) { put_tgt_ctxt(tgt_ctxt); tgt_ctxt = NULL; tag_volinfop->ctxt = NULL; } INM_KFREE(tag_volinfop, sizeof(tag_volinfo_t), INM_KERNEL_HEAP); tag_volinfop = NULL; } dbg("leaving iobarrier_issue_tag_all_volume"); return ret; } /* * process ioctl to: * --Create IO barrier. * --Issue TAG for all or protected volumes those are in data mode. * each volume will have all the tag of tag list. * --remove IO barrier. */ inm_s32_t process_iobarrier_tag_volume_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { tag_info_t_v2 *tag_vol = NULL; int ret = 0; int numvol = 0; int no_of_vol_tags_done = 0; inm_s32_t error = 0; tag_info_t *tag_list = NULL; tag_telemetry_common_t *tag_common = NULL; etMessageType msg = ecMsgUninitialized; int tag_failed = 0; dbg("entered process_iobarrier_tag_volume_ioctl"); tag_common = telemetry_tag_common_alloc(IOCTL_INMAGE_IOBARRIER_TAG_VOLUME); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(tag_info_t_v2))) { err("Read access violation for tag_info_t_v2"); ret = -EFAULT; msg = ecMsgCCInputBufferMismatch; goto out; } tag_vol = (tag_info_t_v2 *)INM_KMALLOC(sizeof(tag_info_t_v2), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_vol) { err("INM_KMALLOC failed to allocate memory for tag_info_t_v2"); ret = -ENOMEM; msg = ecMsgCCInputBufferMismatch; goto out; } if(INM_COPYIN(tag_vol, arg, sizeof(tag_info_t_v2))) { err("INM_COPYIN failed"); ret = -EFAULT; msg = ecMsgCCInputBufferMismatch; goto out_err; } if(tag_vol->nr_tags <= 0) { err("Tag Input Failed: number of tags can't be zero or negative"); ret = -EINVAL; msg = ecMsgCCInvalidTagInputBuffer; goto out_err; } if(tag_vol->timeout < 0) { err("Tag Input Failed: timeout of tags can't be negative"); ret = -EINVAL; msg = ecMsgCCInvalidTagInputBuffer; goto out_err; } arg = tag_vol->tag_names; tag_vol->tag_names = NULL; tag_vol->tag_names = (tag_names_t *)INM_KMALLOC(tag_vol->nr_tags * sizeof(tag_names_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if(!tag_vol->tag_names) { err("INM_KMALLOC failed to allocate memory for tag_names_t"); ret = -ENOMEM; msg = ecMsgCCInputBufferMismatch; goto out_err; } if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, tag_vol->nr_tags * sizeof(tag_names_t))) { err("Read access violation for tag_names_t"); ret = -EFAULT; msg = ecMsgCCInputBufferMismatch; goto out_err_vol; } if(INM_COPYIN(tag_vol->tag_names, arg, tag_vol->nr_tags * sizeof(tag_names_t))) { err("INM_COPYIN failed"); ret = -EFAULT; msg = ecMsgCCInputBufferMismatch; goto out_err_vol; } if (tag_common) { inm_get_tag_marker_guid(tag_vol->tag_names[0].tag_name, tag_vol->tag_names[0].tag_len, tag_common->tc_guid, sizeof(tag_common->tc_guid)); tag_common->tc_guid[sizeof(tag_common->tc_guid) - 1] = '\0'; dbg("Tag Marker GUID = %s", tag_common->tc_guid); } /* now build the tag list which will be use for set of given volumes */ tag_list = 
build_tag_vol_list(tag_vol, &error); if(error || !tag_list) { err("build tag volume list failed for the volume"); ret = error; msg = ecMsgCCInvalidTagInputBuffer; goto out_err_vol; } arg = tag_vol->vol_info; tag_vol->vol_info = NULL; /* * first create IO Barrier, followed by issue a tag and than * remove io barrier at last before returning to this ioctl call */ INM_DOWN(&driver_ctx->dc_cp_mutex); if (driver_ctx->dc_cp != INM_CP_NONE) { err("Consistency Point already active"); ret = -EAGAIN; msg = ecMsgCompareExchangeTagStateFailure; goto unlock_cp_mutex; } dbg("creating io barrier\n"); INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) volume_lock_all_close_cur_chg_node(); #endif INM_ATOMIC_SET(&driver_ctx->is_iobarrier_on, 1); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) get_time_stamp_tag(&driver_ctx->dc_crash_tag_timestamps); volume_unlock_all(); #endif dbg("created io barrier\n"); if (tag_common) { tag_common->tc_ndisks = driver_ctx->total_prot_volumes; tag_common->tc_ndisks_prot = driver_ctx->total_prot_volumes; } /* get list of all the protected volumes if the below falg is set */ if ((tag_vol->flags & TAG_ALL_PROTECTED_VOLUME_IOBARRIER) == TAG_ALL_PROTECTED_VOLUME_IOBARRIER) { dbg("issuing tag to all volumes"); memcpy_s(&driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid), tag_vol->tag_guid, sizeof(tag_vol->tag_guid)); ret = iobarrier_issue_tag_all_volume(tag_list, tag_vol->nr_tags, TAG_COMMIT_NOT_PENDING, tag_common); INM_MEM_ZERO(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); if(ret) { dbg("Failed to tag all the volume\n"); msg = ecMsgTagVolumeInSequenceFailure; } else update_cx_with_tag_success(); goto remove_io_barrier; } if(tag_vol->nr_vols <= 0) { err("Tag Input Failed: Number of volumes can't be zero or \ negative"); ret = -EINVAL; goto remove_io_barrier; } /* alloc a buffer and reuse it to store the volume info for a set of volumes */ tag_vol->vol_info = (volume_info_t *)INM_KMALLOC( sizeof(volume_info_t), INM_KM_NOSLEEP, INM_KERNEL_HEAP); if(!tag_vol->vol_info) { err("INM_KMALLOC failed to allocate memory for volume_info_t"); ret = -EFAULT; goto remove_io_barrier; } for(numvol = 0; numvol < tag_vol->nr_vols; numvol++) { /* mem set the buffer before using it */ INM_MEM_ZERO(tag_vol->vol_info, sizeof(volume_info_t)); if(!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(*tag_vol->vol_info))) { err("Read access violation for volume_info_t"); ret = -EFAULT; break; } if(INM_COPYIN(tag_vol->vol_info, arg, sizeof(*tag_vol->vol_info))) { err("INM_COPYIN failed"); ret = -EFAULT; break; } /* process the tag volume list */ tag_vol->vol_info->vol_name[TAG_VOLUME_MAX_LENGTH - 1] = '\0'; ret = process_tag_volume(tag_vol, tag_list, TAG_COMMIT_NOT_PENDING); if(ret) { if (ret == INM_EAGAIN) tag_failed = 1; dbg("Failed to tag the volume\n"); } else { no_of_vol_tags_done++; } if(!INM_ACCESS_OK(VERIFY_WRITE, (void __INM_USER *)arg, sizeof(*tag_vol->vol_info))) { err("write access verification failed"); ret = INM_EFAULT; break; } if(INM_COPYOUT(arg, tag_vol->vol_info, sizeof(*tag_vol->vol_info))) { err("copy to user failed for freeze volume status"); ret = INM_EFAULT; break; } arg += sizeof(*tag_vol->vol_info); } if (tag_failed) update_cx_with_tag_failure(); if (no_of_vol_tags_done) { if(no_of_vol_tags_done == tag_vol->nr_vols) { dbg("Tagged all volumes"); update_cx_with_tag_success(); ret = INM_TAG_SUCCESS; } else { err("Volumes partially tagged"); ret = INM_TAG_PARTIAL; } } else { 
/* else ret should have errno set */ err("Cannot tag any volume"); } dbg("no_of_vol_tags_done [%d], no of volumes [%d]", no_of_vol_tags_done,tag_vol->nr_vols); remove_io_barrier: /* * to handle single node crash consistency * remove io barrier at last before returning to this ioctl call */ dbg("removing io barrier\n"); INM_ATOMIC_SET(&driver_ctx->is_iobarrier_on, 0); INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) move_chg_nodes_to_drainable_queue(); #endif dbg("removed io barrier"); unlock_cp_mutex: INM_UP(&driver_ctx->dc_cp_mutex); out: if(tag_list) { INM_KFREE(tag_list, tag_vol->nr_tags * sizeof(tag_info_t), INM_KERNEL_HEAP); tag_list = NULL; } if(tag_vol) { if(tag_vol->vol_info) { INM_KFREE(tag_vol->vol_info, sizeof(volume_info_t), INM_KERNEL_HEAP); tag_vol->vol_info = NULL; } if(tag_vol->tag_names) { INM_KFREE(tag_vol->tag_names, tag_vol->nr_tags * sizeof(tag_names_t), INM_KERNEL_HEAP); tag_vol->tag_names = NULL; } INM_KFREE(tag_vol, sizeof(tag_info_t_v2), INM_KERNEL_HEAP); tag_vol = NULL; } if (tag_common) { if (ret < 0) telemetry_log_ioctl_failure(tag_common, ret, msg); telemetry_tag_common_put(tag_common); } dbg("leaving process_tag_volume_ioctl"); return ret; out_err_vol: tag_vol->vol_info = NULL; goto out; out_err: tag_vol->vol_info = NULL; tag_vol->tag_names = NULL; goto out; } /* * BARRIER */ /* * remove_io_barrier_all * * Checks if barrier is on and verifies the GUID to check if * barrier created by same context and then remove barrier * on all volumes by unlocking tgt_sem_list */ inm_s32_t remove_io_barrier_all(char *tag_guid, inm_s32_t tag_guid_len) { inm_s32_t error = 0; INM_DOWN(&driver_ctx->dc_cp_mutex); dbg("removing io barrier\n"); if (driver_ctx->dc_cp & INM_CP_CRASH_ACTIVE) { dbg("crash consistency on"); /* * Match the GUID of the request with that of create */ if (!INM_MEM_CMP(driver_ctx->dc_cp_guid, tag_guid, sizeof(driver_ctx->dc_cp_guid))) { dbg("Guid matched, removing barrier"); INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); INM_ATOMIC_SET(&driver_ctx->is_iobarrier_on, 0); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) move_chg_nodes_to_drainable_queue(); #endif driver_ctx->dc_cp &= ~INM_CP_CRASH_ACTIVE; dbg("New cp state = %d", driver_ctx->dc_cp); if (driver_ctx->dc_cp == INM_CP_NONE) end_cp_timer(); } else { err("Invalid remove barrier guid"); error = -EINVAL; } } else { err("Barrier not active"); error = -EINVAL; } INM_UP(&driver_ctx->dc_cp_mutex); return error; } /* * barrier_all_timeout * * Rollback barrier on timeout */ void barrier_all_timeout(wqentry_t *not_used) { err("Starting timeout procedure at %lu", jiffies); /* take the lock*/ remove_io_barrier_all(driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid)); commit_tags_v2(driver_ctx->dc_cp_guid, TAG_REVOKE, 1); return; } /* * create_io_barrier_all * * Checks if no other CP is on. 
Copies GUID for new txn * and creates barrier by taking a write lock on tgt_list_sem */ inm_s32_t create_io_barrier_all(char *tag_guid, inm_s32_t tag_guid_len, int timeout_ms) { inm_s32_t error = 0; target_context_t *tgt_ctxt = NULL; inm_list_head_t *cur = NULL; inm_list_head_t *next= NULL; inm_u32_t vacp_iobarrier_timeout = 0; unsigned long lock_flag = 0; dbg("creating io barrier"); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); vacp_iobarrier_timeout = driver_ctx->tunable_params.vacp_iobarrier_timeout; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); if (timeout_ms == 0 || timeout_ms > vacp_iobarrier_timeout) return -EINVAL; INM_DOWN(&driver_ctx->dc_cp_mutex); if (driver_ctx->dc_cp == INM_CP_NONE) { /* Stop all IO */ INM_DOWN_WRITE(&(driver_ctx->tgt_list_sem)); if (!driver_ctx->total_prot_volumes) { dbg("No protected volumes"); error = -ENODEV; goto out_err; } inm_list_for_each_safe(cur, next, &driver_ctx->tgt_list) { tgt_ctxt = inm_list_entry(cur, target_context_t, tc_list); if (tgt_ctxt->tc_dev_type != FILTER_DEV_MIRROR_SETUP && tgt_ctxt->tc_cur_wostate != ecWriteOrderStateData) { dbg("Volume not in write order state"); update_cx_with_tag_failure(); error = -EPERM; goto out_err; } } #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) volume_lock_all_close_cur_chg_node(); #endif INM_ATOMIC_SET(&driver_ctx->is_iobarrier_on, 1); #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) get_time_stamp_tag(&driver_ctx->dc_crash_tag_timestamps); volume_unlock_all(); #endif memcpy_s(&driver_ctx->dc_cp_guid, sizeof(driver_ctx->dc_cp_guid), tag_guid, tag_guid_len); dbg("Num inflight IOs while taking barrier %d\n", INM_ATOMIC_READ(&tgt_ctxt->tc_nr_in_flight_ios)); driver_ctx->dc_cp = INM_CP_CRASH_ACTIVE; start_cp_timer(timeout_ms, barrier_all_timeout); dbg("created io barrier\n"); dbg("New cp state = %d", driver_ctx->dc_cp); } else { err("Barrier already present"); error = -EAGAIN; } out: INM_UP(&driver_ctx->dc_cp_mutex); return error; out_err: INM_UP_WRITE(&(driver_ctx->tgt_list_sem)); goto out; } inm_s32_t process_create_iobarrier_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { flt_barrier_create_t *bcreate = NULL; inm_s32_t error = 0; dbg("Create Barrier IOCTL"); if( !INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(flt_barrier_create_t)) ) { err("Read access violation for flt_barrier_create_t"); error = -EFAULT; goto out; } bcreate = (flt_barrier_create_t *) INM_KMALLOC(sizeof(flt_barrier_create_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if( !bcreate ) { err("INM_KMALLOC failed to allocate memory for \ flt_barrier_create_t"); error = -ENOMEM; goto out; } if( INM_COPYIN(bcreate, arg, sizeof(flt_barrier_create_t)) ) { err("INM_COPYIN failed"); error = -EFAULT; goto out; } error = create_io_barrier_all(bcreate->fbc_guid, sizeof(bcreate->fbc_guid), bcreate->fbc_timeout_ms); if( error ) err("Create barrier failed - %d", error); out: if( bcreate ) INM_KFREE(bcreate, sizeof(flt_barrier_create_t), INM_KERNEL_HEAP); return error; } inm_s32_t process_remove_iobarrier_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { flt_barrier_remove_t *bremove = NULL; inm_s32_t error = 0; dbg("Remove Barrier IOCTL"); if( !INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(flt_barrier_remove_t)) ) { err("Read access violation for flt_barrier_remove_t"); error = -EFAULT; goto out; } bremove = (flt_barrier_remove_t *) INM_KMALLOC(sizeof(flt_barrier_remove_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if( !bremove ) { err("INM_KMALLOC failed to 
allocate memory for \ flt_barrier_remove_t"); error = -ENOMEM; goto out; } if( INM_COPYIN(bremove, arg, sizeof(flt_barrier_remove_t)) ) { err("INM_COPYIN failed"); error = -EFAULT; goto out; } error = remove_io_barrier_all(bremove->fbr_guid, sizeof(bremove->fbr_guid)); if( error ) err("Remove barrier failed - %d", error); out: if( bremove ) INM_KFREE(bremove, sizeof(flt_barrier_remove_t), INM_KERNEL_HEAP); return error; } /* * process_commit_revert_tag_ioctl * * process ioctl to commit/revert tag issued earlier * */ inm_s32_t process_commit_revert_tag_ioctl(inm_devhandle_t *idhp, void __INM_USER *arg) { inm_s32_t error = 0; flt_tag_commit_t *commit = NULL; dbg("Commit Tag IOCTL"); if (!INM_ACCESS_OK(VERIFY_READ, (void __INM_USER *)arg, sizeof(*commit))) { err("Read access violation for commit"); error = -EFAULT; goto out; } commit = INM_KMALLOC(sizeof(*commit), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!commit) { error = -ENOMEM; goto out; } if (INM_COPYIN(commit, arg, sizeof(*commit))) { err("copyin failed\n"); error = -EFAULT; goto out; } error = commit_tags_v2(commit->ftc_guid, commit->ftc_flags, 0); if( error ) err("Commit/Revoke (%d) tags failed - %d", commit->ftc_flags, error); out: if (commit) INM_KFREE(commit, sizeof(*commit), INM_KERNEL_HEAP); return error; } inm_s32_t freeze_root_dev(void) { inm_s32_t error = 0; inm_block_device_t *rbdev; inm_super_block_t *rsb; dbg("Freezing root"); if (!driver_ctx->root_dev) return -ENODEV; rbdev = inm_open_by_devnum(driver_ctx->root_dev, FMODE_READ); if (!IS_ERR(rbdev)) { error = inm_freeze_bdev(rbdev, rsb); if (!error) inm_thaw_bdev(rbdev, rsb); close_bdev(rbdev, FMODE_READ); } else { error = PTR_ERR(rbdev); } return error; } struct device * inm_get_parent_dev(struct gendisk *bd_disk) { if (!bd_disk) return NULL; #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,8,0)) || defined SLES12SP3 return (disk_to_dev(bd_disk))->parent; #else return bd_disk->driverfs_dev; #endif } inm_s32_t inm_reboot_handler(struct notifier_block *nblock, unsigned long code_unused, void *unused) { err("Got reboot notification"); INM_KFREE(nblock, sizeof(struct notifier_block), INM_KERNEL_HEAP); lcw_flush_changes(); return 0; } inm_s32_t __inm_register_reboot_notifier(struct notifier_block **nb) { struct notifier_block *nblock = NULL; inm_s32_t error = 0; nblock = INM_KMALLOC(sizeof(struct notifier_block), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!nblock) { error = -ENOMEM; err("Cannot registered for reboot notification"); } else { nblock->notifier_call = inm_reboot_handler; nblock->next = NULL; nblock->priority = 10; *nb = nblock; error = register_reboot_notifier(nblock); info("Registered for reboot notification"); } return error; } inm_s32_t __inm_unregister_reboot_notifier(struct notifier_block **nb) { struct notifier_block *nblock = *nb; *nb = NULL; info("Unregistered reboot notification"); return unregister_reboot_notifier(nblock); } inm_s32_t inm_register_reboot_notifier(int reboot_notify) { static struct notifier_block *nblock = NULL; /* we only support single reboot notification */ if (reboot_notify) { if (nblock) return 0; return __inm_register_reboot_notifier(&nblock); } else { if (!nblock) return 0; return __inm_unregister_reboot_notifier(&nblock); } } void log_console(const char *fmt, ...) 
{ char buf[256]; va_list args; if (fmt) { va_start(args, fmt); vsnprintf(buf, sizeof(buf), fmt, args); va_end(args); buf[sizeof(buf) - 1] = '\0'; /* write only the formatted bytes, not the whole buffer */ write_to_file("/dev/console", buf, strlen(buf), NULL); } } void inm_blkdev_name(inm_bio_dev_t *bdev, char *name) { bdevname(bdev, name); } inm_s32_t inm_blkdev_get(inm_bio_dev_t *bdev) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 10, 0) return (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, FMODE_READ | FMODE_WRITE, NULL)) ? 1 : 0); #elif (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,38)) return blkdev_get(bdev, FMODE_READ | FMODE_WRITE, NULL); #elif (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,19)) return blkdev_get(bdev, FMODE_READ | FMODE_WRITE); #else return blkdev_get(bdev, FMODE_READ | FMODE_WRITE, 0); #endif } involflt-0.1.0/src/involflt_debug_routines.c0000755000000000000000000001652314467303177017753 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include #include "involflt_debug.h" #include "driver-context.h" void print_driver_context(driver_context_t *dc) { struct inm_list_head *iter = NULL; target_context_t *tc_entry = NULL; data_page_t *entry = NULL; if (dc) { print_dbginfo("Device information \n"); print_dbginfo("Service State = %d \n", dc->service_state); print_dbginfo("# of protected volumes = %d \n", dc->total_prot_volumes); /* printing all the target contexts in driver context */ for(iter = dc->tgt_list.next; iter != &(dc->tgt_list); iter = iter->next) { tc_entry = inm_list_entry(iter, target_context_t, tc_list); print_target_context(tc_entry); } print_dbginfo("flt dev info\n"); print_dbginfo("flt cdev info\n"); print_dbginfo("Module owner name = %s\n", dc->flt_cdev.owner->name); print_dbginfo("pages info \n"); print_dbginfo("pages allocated = %d\n", dc->data_flt_ctx.pages_allocated); print_dbginfo("pages free = %d\n", dc->data_flt_ctx.pages_free); for (iter = dc->data_flt_ctx.data_pages_head.next; iter != &(dc->data_flt_ctx.data_pages_head);) { entry = inm_list_entry (iter, data_page_t, next); iter = iter->next; print_dbginfo("page address = 0x%p\n", entry); } } }
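/*
 * A minimal debugging sketch for the dump routines in this file
 * (hypothetical call site, not present in the driver): after
 * reproducing a mode-switch problem, dump the whole driver state with
 *
 *     print_driver_context(driver_ctx);
 *
 * Note that print_driver_context() walks dc->tgt_list without taking
 * any lock, so it is only safe at points where volumes cannot appear
 * or disappear concurrently.
 */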
void print_target_context(target_context_t *tgt_ctxt) { print_dbginfo("entered \n"); print_dbginfo("--------------------------------------------"); if (!tgt_ctxt) { print_dbginfo("\nInvalid target context\n"); return; } print_dbginfo("\nTarget Information \n"); print_dbginfo("TARGET = %p\n", tgt_ctxt); print_dbginfo("Major = %d\n" "Minor = %d\n", MAJOR(inm_dev_id_get(tgt_ctxt)), MINOR(inm_dev_id_get(tgt_ctxt))); print_dbginfo("DM device specific information \n"); if ( tgt_ctxt->tc_flags & VCF_FILTERING_STOPPED ) print_dbginfo("Filtering status= STOPPED\n"); if (tgt_ctxt->tc_flags & VCF_READ_ONLY) print_dbginfo("Filtering status= READ ONLY\n"); print_dbginfo("Filtering mode = "); switch (tgt_ctxt->tc_cur_mode) { case FLT_MODE_DATA: print_dbginfo("DATA MODE(%x)\n", (int32_t)tgt_ctxt->tc_cur_mode); break; case FLT_MODE_METADATA: print_dbginfo("META DATA MODE(%x)\n", (int32_t)tgt_ctxt->tc_cur_mode); break; default: print_dbginfo("UNKNOWN (%x)\n", (int32_t)tgt_ctxt->tc_cur_mode); break; } print_dbginfo("Pending changes = %d\n", (int32_t)tgt_ctxt->tc_pending_changes); print_dbginfo("pending changes (bytes) = %d\n", (int32_t)tgt_ctxt->tc_bytes_pending_changes); print_dbginfo("Transaction id = %d\n", (int32_t)tgt_ctxt->tc_transaction_id); if (tgt_ctxt->tc_cur_node) { print_dbginfo("Current change node information \n"); print_change_node(tgt_ctxt->tc_cur_node); } if ( tgt_ctxt->tc_pending_confirm) { print_dbginfo("Pending Transaction/dirty blk information \n\n"); print_change_node(tgt_ctxt->tc_pending_confirm); } print_dbginfo("STATISTICS\n"); print_dbginfo("# of malloc fails = %u\n", tgt_ctxt->tc_stats.num_malloc_fails); } void print_change_node(change_node_t *change_node) { struct inm_list_head *ptr; struct inm_list_head *head = &change_node->data_pg_head; data_page_t *entry; inm_s32_t count = 0; if (!change_node) print_dbginfo("Invalid change node \n"); else { print_dbginfo("NODE = "); switch (change_node->type) { case NODE_SRC_DATA: print_dbginfo("DATA\n"); break; case NODE_SRC_METADATA: print_dbginfo("META DATA\n"); break; case NODE_SRC_TAGS: print_dbginfo("TAGS\n"); break; case NODE_SRC_DATAFILE: print_dbginfo("DATA FILE\n"); break; default: print_dbginfo("UNKNOWN\n"); } print_dbginfo("transaction id = %d\n", (int32_t)change_node->transaction_id); print_dbginfo("mapped address = %x\n", (int32_t)change_node->mapped_address); print_disk_change_head(&change_node->changes); if (head) { print_dbginfo("pages info \n"); count = 0; for (ptr = head->next; ptr != head;) { entry = inm_list_entry (ptr, data_page_t, next); ptr = ptr->next; print_dbginfo("page address = 0x%p\n", entry); count++; } } print_dbginfo("total # pages = %d\n", count); } } void print_disk_change_head(disk_chg_head_t *disk_change_hd) { inm_s32_t _index = 0; if (!disk_change_hd) print_dbginfo("Invalid disk changes \n"); else { print_dbginfo("disk change head = %p\n", disk_change_hd); print_dbginfo("First time stamp = %lld\n", disk_change_hd->start_ts.TimeInHundNanoSecondsFromJan1601); print_dbginfo("Last time stamp = %lld\n", disk_change_hd->end_ts.TimeInHundNanoSecondsFromJan1601); print_dbginfo("# changes (bytes)= %d \n", (int32_t)disk_change_hd->bytes_changes); if (!disk_change_hd->change_idx) { print_dbginfo("Empty\n"); } print_dbginfo("DISK CHANGE INFORMATION\n"); print_dbginfo("# of disk changes = %d \n", disk_change_hd->change_idx); for (_index = 0; _index < disk_change_hd->change_idx && _index < MAX_CHANGE_INFOS_PER_PAGE-1; _index++) { print_dbginfo("DISK CHANGE # = %d\n", _index); print_disk_change((disk_chg_t *) (&disk_change_hd->cur_md_pgp)[_index]); /* if one wants to display the page addresses then need to have * global buff list access ... i.e. global context info * buf_idx is the index where the change starts from **/ } } }
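/*
 * Worked example for print_disk_change() below: a 4KB write recorded
 * at byte offset 0x10000 (hypothetical values) prints as
 *
 *     offset = 10000
 *     length = 4096
 *
 * because the offset is emitted with %x and the length with %d.
 */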
void print_disk_change(disk_chg_t *disk_change) { if (disk_change) { print_dbginfo("offset = %x\n", (int32_t)disk_change->offset); print_dbginfo("length = %d\n", (int32_t)disk_change->length); } else print_dbginfo("Invalid disk change\n"); } void print_bio(struct bio * _bio) { if (_bio) { print_dbginfo("OFFSET = %lld\n", (long long)INM_BUF_SECTOR(_bio)); print_dbginfo("LEN = %x\n", INM_BUF_COUNT(_bio)); } } void print_dm_bio_info(dm_bio_info_t *dm_bio_info) { if (dm_bio_info){ print_dbginfo("DM BIO INFO\n"); print_dbginfo("SECTOR = %x\n", (int32_t)dm_bio_info->bi_sector); print_dbginfo("SIZE = %d\n", (int32_t)dm_bio_info->bi_size); print_dbginfo("INDEX = %x\n", (int32_t)dm_bio_info->bi_idx); print_dbginfo("FLAGS = %x\n", (int32_t)dm_bio_info->bi_flags); } else print_dbginfo("Invalid dm_bio_info variable\n"); } involflt-0.1.0/src/svdparse.h0000755000000000000000000001257614467303177014654 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /// /// @file svdparse.h /// /// Define interface to SV Delta files. Delta files are a list of chunks. /// Each chunk begins with a four character tag, /// a length, and then that many data bytes. /// #ifndef SVDPARSE__H #define SVDPARSE__H /* *********************************************************************** * THIS FILE SHOULD BE REMOVED IN FUTURE. THIS FILE SHOULD BE COMMON * FOR ALL FILTER DRIVERS IRRESPECTIVE OF OS. 
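*
* Worked example for the FOURCC tags defined below: INMAGE_MAKEFOURCC()
* packs four ASCII bytes little-endian, so (arithmetic shown for
* illustration)
*
*   INMAGE_MAKEFOURCC('S', 'V', 'D', '1')
*     = 0x53 | (0x56 << 8) | (0x44 << 16) | (0x31 << 24)
*     = 0x31445653
*
* which reads back as "SVD1" when the delta file is viewed as raw bytes.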
* *********************************************************************** */ #define BYTEMASK(ch) ((ch) & (0xFF)) #define INMAGE_MAKEFOURCC(ch0, ch1, ch2, ch3) \ (BYTEMASK(ch0) | (BYTEMASK(ch1) << 8) | \ (BYTEMASK(ch2) << 16) | (BYTEMASK(ch3) << 24 )) #define SVD_TAG_HEADER1 INMAGE_MAKEFOURCC( 'S', 'V', 'D', '1' ) #define SVD_TAG_DIRTY_BLOCKS INMAGE_MAKEFOURCC( 'D', 'I', 'R', 'T' ) #define SVD_TAG_DIRTY_BLOCK_DATA INMAGE_MAKEFOURCC( 'D', 'R', 'T', 'D' ) #define SVD_TAG_DIRTY_BLOCK_DATA_V2 INMAGE_MAKEFOURCC( 'D', 'D', 'V', '2' ) #define SVD_TAG_SENTINEL_HEADER INMAGE_MAKEFOURCC( 'S', 'E', 'N', 'T' ) #define SVD_TAG_SENTINEL_DIRT INMAGE_MAKEFOURCC( 'S', 'D', 'R', 'T' ) #define SVD_TAG_SYNC_HASH_COMPARE_DATA INMAGE_MAKEFOURCC( 'S', 'H', 'C', 'D' ) #define SVD_TAG_SYNC_DATA INMAGE_MAKEFOURCC( 'S', 'D', 'A', 'T' ) #define SVD_TAG_SYNC_DATA_NEEDED_INFO INMAGE_MAKEFOURCC( 'S', 'D', 'N', 'I' ) #define SVD_TAG_TIME_STAMP_OF_FIRST_CHANGE INMAGE_MAKEFOURCC( 'T', 'S', 'F', 'C' ) #define SVD_TAG_TIME_STAMP_OF_FIRST_CHANGE_V2 INMAGE_MAKEFOURCC( 'T', 'F', 'V', '2' ) #define SVD_TAG_TIME_STAMP_OF_LAST_CHANGE INMAGE_MAKEFOURCC( 'T', 'S', 'L', 'C' ) #define SVD_TAG_TIME_STAMP_OF_LAST_CHANGE_V2 INMAGE_MAKEFOURCC( 'T', 'L', 'V', '2' ) #define SVD_TAG_LENGTH_OF_DRTD_CHANGES INMAGE_MAKEFOURCC( 'L', 'O', 'D', 'C' ) #define SVD_TAG_USER INMAGE_MAKEFOURCC( 'U', 'S', 'E', 'R' ) #define SVD_TAG_BEFORMAT INMAGE_MAKEFOURCC('D','R','T','B') #define SVD_TAG_LEFORMAT INMAGE_MAKEFOURCC('D','R','T','L') typedef struct tagGUID { inm_u32_t Data1; unsigned short Data2; unsigned short Data3; unsigned char Data4[8]; } SV_GUID; #ifdef INM_LINUX #pragma pack( push, 1 ) #else #ifdef INM_SOLARIS #pragma INM_PRAGMA_PUSH1 #else #pragma pack(1) #endif #endif typedef struct { inm_u32_t tag; inm_u32_t count; inm_u32_t Flags; }SVD_PREFIX; typedef struct { unsigned char MD5Checksum[16]; /* MD5 checksum of all data that follows this field */ SV_GUID SVId; /* Unique ID assigned by amethyst */ SV_GUID OriginHost; /* Unique origin host id */ SV_GUID OriginVolumeGroup; /* Unique origin vol group id */ SV_GUID OriginVolume; /* Unique origin vol id */ SV_GUID DestHost; /* Unique dest host id */ SV_GUID DestVolumeGroup; /* Unique dest vol group id */ SV_GUID DestVolume; /* Unique dest vol id */ } SVD_HEADER1; typedef struct { inm_s64_t Length; inm_s64_t ByteOffset; } SVD_DIRTY_BLOCK; typedef struct { inm_u32_t Length; inm_u64_t ByteOffset; inm_u32_t uiSequenceNumberDelta; inm_u32_t uiTimeDelta; } SVD_DIRTY_BLOCK_V2; typedef struct { SVD_DIRTY_BLOCK DirtyBlock; inm_u64_t DataFileOffset; }SVD_DIRTY_BLOCK_INFO; typedef struct { inm_s64_t Length; inm_s64_t ByteOffset; unsigned char MD5Checksum[16]; /* MD5 checksum of all data that follows this field */ } SVD_BLOCK_CHECKSUM; /* Doesn't exist anymore. Just an array of SVD_DIRTY_BLOCK followed by Length data bytes. */ typedef struct { inm_s64_t BlockCount; }SVD_DIRTY_BLOCK_DATA; struct SVD_TIME_STAMP_HEADER { unsigned short usStreamRecType; unsigned char ucFlags; unsigned char ucLength; }; typedef struct { struct SVD_TIME_STAMP_HEADER Header; inm_u32_t ulSequenceNumber; inm_u64_t TimeInHundNanoSecondsFromJan1601; }SVD_TIME_STAMP; typedef struct { struct SVD_TIME_STAMP_HEADER Header; inm_u64_t ullSequenceNumber; inm_u64_t TimeInHundNanoSecondsFromJan1601; }SVD_TIME_STAMP_V2; /* Raw sentinel dirt file header. 
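A byte-layout sketch, read straight off the packed struct below (an
illustrative reading of the struct, not a documented format guarantee):
[tagSentinelHeader][dwSize][dwMajorVersion][dwMinorVersion][ullVolumeCapacity][dwPageSize][tagSentinelDirty][dwDirtSize],
with the dirty-bitmap payload following the header.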
Two blocks: SENT and SDRT */ struct SENTINEL_DIRTYFILE_HEADER { inm_u32_t tagSentinelHeader; inm_u32_t dwSize; inm_u32_t dwMajorVersion, dwMinorVersion; inm_u64_t ullVolumeCapacity; inm_u32_t dwPageSize; inm_u32_t tagSentinelDirty; inm_u32_t dwDirtSize; }; #ifdef INM_LINUX #pragma pack( pop ) #else #ifdef INM_SOLARIS #pragma INM_PRAGMA_POP #else #pragma pack() #endif #endif #endif /* SVDPARSE__H */ involflt-0.1.0/src/target-context.h0000755000000000000000000005434514467303177016005 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef LINVOLFLT_TARGET_CONTEXT_H #define LINVOLFLT_TARGET_CONTEXT_H #include "involflt-common.h" #include "filter.h" #include "telemetry-types.h" typedef enum _devstate_t { DEVICE_STATE_INITIALIZED = 0, DEVICE_STATE_ONLINE = 1, DEVICE_STATE_OFFLINE = 2, DEVICE_STATE_SHUTDOWN = 3, } devstate_t; typedef struct __target_statistics { inm_s32_t num_malloc_fails; /* handles memomory allocation failures */ inm_u32_t num_pages_allocated; /* tracks # of pages allocated for current target */ inm_u32_t num_pgs_in_dfm_queue; /* Number of pages queued to data file thread. 
*/ inm_u64_t dfm_bytes_to_disk; inm_atomic_t num_dfm_files; inm_atomic_t num_dfm_files_pending; inm_atomic_t num_tags_dropped; inm_atomic_t metadata_trans_due_to_delay_alloc; #define MAX_FLT_MODES 3 /* filtering statistics */ /* counter for # of times switched to each mode */ long num_change_to_flt_mode[MAX_FLT_MODES]; /* counter for time(in secs) spent in each flt mode */ long num_secs_in_flt_mode[MAX_FLT_MODES]; inm_s64_t st_mode_switch_time; #define MAX_WOSTATE_MODES 5 /* counter for # of times switched to each write order state */ long num_change_to_wostate[MAX_WOSTATE_MODES]; /* counter for # of times switched to each write order state on user request */ long num_change_to_wostate_user[MAX_WOSTATE_MODES]; /* counter for time(in secs) spent in each write order state */ long num_secs_in_wostate[MAX_WOSTATE_MODES]; inm_s64_t st_wostate_switch_time; /* meta data statistics */ long num_change_metadata_flt_mode_on_user_req; #define MAX_NR_IO_BUCKETS 16 inm_atomic_t io_pat_reads[MAX_NR_IO_BUCKETS]; /* read io pattern */ inm_atomic_t io_pat_writes[MAX_NR_IO_BUCKETS]; /* write io pattern */ inm_u64_t tc_write_io_rcvd; inm_u64_t tc_write_io_rcvd_bytes; inm_atomic_t tc_write_cancel; inm_u64_t tc_write_cancel_rcvd_bytes; } tc_stats_t; struct tgt_hist_stats { inm_u64_t ths_start_flt_ts; /* start filtering time */ inm_u64_t ths_clrdiff_ts; /* time of last clear diffs */ inm_u32_t ths_nr_clrdiffs; /* # times clear diffs issued */ inm_u32_t ths_reserved; inm_u32_t ths_nr_osyncs; /* # times resync marked */ inm_u32_t ths_osync_err; /* last osync err code */ inm_u64_t ths_osync_ts; /* last osync time */ inm_u64_t ths_clrstats_ts; /* last time clear stats issued */ }; typedef struct tgt_hist_stats tgt_hist_stats_t; /* bitmap related declarations */ typedef struct bitmap_info { char bitmap_file_name[INM_NAME_MAX + 1]; /* UNICODE string */ unsigned long bitmap_granularity; unsigned long bitmap_offset; volume_bitmap_t *volume_bitmap; inm_u32_t num_bitmap_open_errors; inm_u32_t num_bitmap_clear_errors; inm_u32_t num_bitmap_read_errors; inm_u32_t num_bitmap_write_errors; inm_u64_t num_changes_queued_for_writing; inm_u64_t num_byte_changes_queued_for_writing; inm_u64_t num_of_times_bitmap_written; inm_u64_t num_changes_read_from_bitmap; inm_u64_t num_byte_changes_read_from_bitmap; inm_u64_t num_of_times_bitmap_read; inm_u64_t num_changes_written_to_bitmap; inm_u64_t num_byte_changes_written_to_bitmap; inm_u64_t nr_bytes_in_bmap; inm_u32_t bmap_busy_wait; char bitmap_dir_name[INM_NAME_MAX + 1]; /* UNICODE string */ } bitmap_info_t; struct create_delete_wait { struct completion wait; struct inm_list_head list; }; typedef struct _mirror_vol_entry { struct inm_list_head next; char tc_mirror_guid[INM_GUID_LEN_MAX]; inm_block_device_t *mirror_dev; inm_u64_t vol_error; inm_u64_t vol_count; inm_u64_t vol_byte_written; inm_u64_t vol_io_issued; inm_u64_t vol_io_succeeded; inm_u64_t vol_io_skiped; inm_u64_t vol_flags; inm_s32_t vol_state; inm_atomic_t vol_ref; void *vol_private; } mirror_vol_entry_t; #define INM_VOL_ENTRY_ALIVE 0x1 #define INM_VOL_ENTRY_DEAD 0x2 #define INM_VOL_ENTRY_FREED 0x4 #define INM_VOL_ENTRY_TRY_ONLINE 0x8 #define INM_PT_LUN 0x1 #define INM_AT_LUN 0x2 /* INM_BUG_ON(1) there because we never drop the last ref on the volume entry; instead we call free_mirror_list to close the device and free the entry in target_context_release(). */
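/* Hitting that INM_BUG_ON() therefore indicates a reference-count underflow: some path dropped more references on the volume entry than it acquired. */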
#define INM_DEREF_VOL_ENTRY(vol_entry, tcp) \ { \ if(tcp->tc_dev_type == FILTER_DEV_MIRROR_SETUP && INM_ATOMIC_DEC_AND_TEST(&(vol_entry->vol_ref))){ \ dbg("deleting vol_entry for %s",vol_entry->tc_mirror_guid); \ INM_BUG_ON(1); \ volume_lock(tcp); \ inm_list_del(&(vol_entry->next)); \ volume_unlock(tcp); \ INM_KFREE(vol_entry, sizeof(mirror_vol_entry_t), INM_KERNEL_HEAP); \ vol_entry = NULL; \ } \ } #define INM_REF_VOL_ENTRY(vol_entry) INM_ATOMIC_INC(&(vol_entry->vol_ref)) /* structure for latency distribution */ #define INM_LATENCY_DIST_BKT_CAPACITY 12 #define INM_LATENCY_LOG_CAPACITY 12 struct inm_latency_stats { inm_u64_t ls_bkts[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t ls_freq[INM_LATENCY_DIST_BKT_CAPACITY]; inm_u32_t ls_freq_used_bkt; inm_u32_t ls_nr_avail_bkts; inm_u64_t ls_init_min_max; inm_u64_t ls_log_buf[INM_LATENCY_LOG_CAPACITY]; inm_u64_t ls_log_min; inm_u64_t ls_log_max; inm_u32_t ls_log_idx; }; typedef struct inm_latency_stats inm_latency_stats_t; /* Maintains Disk level CX session */ typedef struct _disk_cx_session { inm_u64_t dcs_flags; /* Flags */ inm_u64_t dcs_start_ts; /* CX session start time */ inm_u64_t dcs_end_ts; /* CX session end time */ inm_u64_t dcs_base_secs_ts; /* This is the base time to calculate 1s intervals */ inm_u64_t dcs_tracked_bytes; /* Bytes tracked */ inm_u64_t dcs_drained_bytes; /* Bytes drained */ inm_u64_t dcs_tracked_bytes_per_second; /* Bytes tracked every second */ inm_u64_t dcs_churn_buckets[DEFAULT_NR_CHURN_BUCKETS]; /* Churn buckets */ inm_u64_t dcs_first_nw_failure_ts; /* First network failure time in this CX session */ inm_u64_t dcs_last_nw_failure_ts; /* Last network failure time in this CX session */ inm_u64_t dcs_last_nw_failure_error_code; /* Error code for last network failure */ inm_u64_t dcs_nr_nw_failures; /* Number of network failures */ inm_u64_t dcs_max_peak_churn; /* Max peak churn */ inm_u64_t dcs_first_peak_churn_ts; /* Time of first peak churn */ inm_u64_t dcs_last_peak_churn_ts; /* Time of last peak churn */ inm_u64_t dcs_excess_churn; /* Excess churn on top of peak churn */ inm_u64_t dcs_max_s2_latency; /* S2 latency */ inm_u64_t dcs_nth_cx_session; /* CX session number */ inm_list_head_t dcs_list; disk_cx_stats_info_t *dcs_disk_cx_stats_info; } disk_cx_session_t; #define DCS_CX_SESSION_STARTED 0x01 #define DCS_CX_SESSION_ENDED 0x02 /* Reasons for closing CX session at disk level */ enum { CX_CLOSE_PENDING_BYTES_BELOW_THRESHOLD = 1, CX_CLOSE_STOP_FILTERING_ISSUED, CX_CLOSE_DISK_REMOVAL, }; /* There will be one instance of this structure per involflt target. A new * instance of this structure is created during the stacking operation. It holds * target specific information. */ typedef struct _target_context { struct inm_list_head tc_list; /* links all targets, head in driver-context */ inm_u32_t tc_flags; /* Indicates if the volume is read-only etc. */ inm_sem_t tc_sem; inm_u32_t refcnt; /* Using reference counting infrastructure provided by sysfs interface. */ void (*release)(void *); flt_mode dummy_tc_cur_mode; /* Current filtering mode for this target. 
*/ flt_mode dummy_tc_prev_mode; /* previous filtering mode for this target */ devstate_t dummy_tc_dev_state; inm_spinlock_t tc_lock; /* lock to protect change list */ inm_spinlock_t tc_tunables_lock; /* spin lock function pointers */ void (*tc_lock_fn)(struct _target_context *); void (*tc_unlock_fn)(struct _target_context *); unsigned long tc_lock_flag; change_node_t *tc_cur_node; struct inm_list_head tc_node_head; struct inm_list_head tc_non_drainable_node_head; struct inm_list_head tc_nwo_dmode_list; /*link all non write order data changes */ inm_wait_queue_head_t tc_waitq; /* Data File Mode Support */ data_file_flt_t tc_dfm; inm_u32_t tc_nr_cns; inm_s64_t tc_bytes_tracked; inm_s64_t tc_pending_changes; inm_s64_t tc_pending_md_changes; inm_s64_t tc_bytes_pending_md_changes; inm_s64_t tc_pending_wostate_data_changes; inm_s64_t tc_pending_wostate_md_changes; inm_s64_t tc_pending_wostate_bm_changes; inm_s64_t tc_pending_wostate_rbm_changes; inm_s64_t tc_bytes_pending_changes; inm_s64_t tc_bytes_coalesced_changes; inm_s64_t tc_bytes_overlap_changes; inm_s64_t tc_cnode_pgs; inm_s64_t tc_commited_changes; inm_s64_t tc_bytes_commited_changes; inm_s64_t tc_transaction_id; inm_s64_t tc_prev_transaction_id; change_node_t *tc_pending_confirm; inm_u32_t tc_db_notify_thres; inm_u64_t tc_data_to_disk_limit; char *tc_data_log_dir; tc_stats_t tc_stats; /* statistics info */ char *tc_datafile_dir_name; /* Reserved pages from data pool */ inm_u32_t tc_reserved_pages; // sync flags inm_s32_t tc_resync_required; inm_s32_t tc_resync_indicated; /* ResyncRequired flag sent to user mode * process */ unsigned long tc_nr_out_of_sync; unsigned long tc_out_of_sync_err_code; inm_u64_t tc_out_of_sync_time_stamp; unsigned long tc_out_of_sync_err_status; unsigned long tc_nr_out_of_sync_indicated; inm_device_t tc_dev_type; void *tc_priv; /* points to host_dev_ctx/fabric_dev_ctx */ bitmap_info_t *tc_bp; /* non-NULL if bitmap is enabled (default) */ /* ideally guid should be moved to device specific data structure, but * sysfs assumes this to be in here, so leaving it in here for now. */ char tc_guid[INM_GUID_LEN_MAX]; tgt_hist_stats_t tc_hist; flt_mode tc_cur_mode; /* Current filtering mode for this target. 
*/ flt_mode tc_prev_mode; /* previous filtering mode for this target */ etWriteOrderState tc_cur_wostate; etWriteOrderState tc_prev_wostate; devstate_t tc_dev_state; inm_completion_t exit; struct inm_list_head cdw_list; inm_sem_t cdw_sem; /* Lock to protect the above list */ /* mirror setup data structures */ struct inm_list_head tc_src_list; struct inm_list_head tc_dst_list; mirror_vol_entry_t *tc_vol_entry; char *tc_mnt_pt; inm_s32_t tc_filtering_disable_required; inm_u64_t tc_CurrEndSequenceNumber; inm_u64_t tc_CurrEndTimeStamp; inm_u32_t tc_CurrSequenceIDforSplitIO; inm_u64_t tc_PrevEndSequenceNumber; inm_u64_t tc_PrevEndTimeStamp; inm_u32_t tc_PrevSequenceIDforSplitIO; inm_u64_t tc_rpo_timestamp; inm_s32_t tc_tso_file; inm_s64_t tc_tso_trans_id; char tc_pname[INM_GUID_LEN_MAX]; inm_u64_t tc_dev_startoff; inm_atomic_t tc_async_bufs_pending; inm_atomic_t tc_async_bufs_processed; inm_atomic_t tc_async_bufs_write_pending; inm_atomic_t tc_async_bufs_write_processed; inm_u64_t tc_nr_requests_queued; inm_u64_t tc_nr_bufs_queued_to_thread; inm_u64_t tc_nr_bufs_processed_by_thread; inm_u64_t tc_nr_processed_queued_bufs; inm_u64_t tc_nr_ddwrites_called; inm_atomic_t tc_nr_bufs_pending; inm_atomic_t tc_nr_bufs_processed; inm_atomic_t tc_mixedbufs; inm_atomic_t tc_read_buf_first; inm_atomic_t tc_write_buf_first; int tc_more_done_set; int tc_nr_bufs_submitted_gr_than_one; inm_u64_t tc_nr_spilt_io_data_mode; inm_u64_t tc_nr_xm_mapin_failures; inm_wait_queue_head_t tc_wq_in_flight_ios; inm_atomic_t tc_nr_in_flight_ios; UDIRTY_BLOCK_V2 *tc_db_v2; inm_u64_t tc_dbwait_event_ts_in_usec; inm_latency_stats_t tc_dbwait_notify_latstat; inm_latency_stats_t tc_dbret_latstat; inm_latency_stats_t tc_dbcommit_latstat; inm_u32_t tc_optimize_performance; target_telemetry_t tc_tel; inm_sem_t tc_resync_sem; disk_cx_session_t tc_disk_cx_session; inm_u64_t tc_s2_latency_base_ts; TAG_COMMIT_STATUS *tc_tag_commit_status; inm_atomic_t tc_nr_chain_bios_submitted; inm_atomic_t tc_nr_chain_bios_pending; inm_atomic_t tc_nr_completed_in_child_stack; inm_atomic_t tc_nr_completed_in_own_stack; } target_context_t; #define volume_lock(ctx) \ do { \ INM_BUG_ON(((target_context_t *)ctx)->tc_lock_fn == NULL); \ ((target_context_t *)ctx)->tc_lock_fn((target_context_t *)ctx); \ } while (0) #define volume_unlock(ctx) \ do { \ INM_BUG_ON(((target_context_t *)ctx)->tc_unlock_fn == NULL); \ ((target_context_t *)ctx)->tc_unlock_fn((target_context_t *)ctx); \ } while (0) /* flags */ #define VCF_FILTERING_STOPPED 0x00000001 #define VCF_READ_ONLY 0x00000002 #define VCF_DATA_MODE_DISABLED 0x00000004 #define VCF_DATA_FILES_DISABLED 0x00000008 #define VCF_OPEN_BITMAP_FAILED 0x00000010 #define VCF_VOLUME_TO_BE_FROZEN 0x00000020 #define VCF_VOLUME_IN_GET_DB 0x00000040 #define VCF_VOLUME_IN_BMAP_WRITE 0x00000080 #define VCF_VOLUME_INITRD_STACKED 0x00000100 #define VCF_VOLUME_BOOTTIME_STACKED 0x00000200 #define VCF_VOLUME_LETTER_OBTAINED 0x00000400 #define VCF_CV_FS_UNMOUNTED 0x00000800 #define VCF_OPEN_BITMAP_REQUESTED 0x00001000 #define VCF_BITMAP_READ_DISABLED 0x00002000 #define VCF_BITMAP_WRITE_DISABLED 0x00004000 #define VCF_VOLUME_DELETING 0x00008000 #define VCF_VOLUME_CREATING 0x00010000 #define VCF_FULL_DEV 0x00020000 #define VCF_FULL_DEV_PARTITION 0x00040000 #define VCF_VOLUME_LOCKED 0x00080000 #define VCF_IGNORE_BITMAP_CREATION 0x00100000 #define VCF_DATAFILE_DIR_CREATED 0x00200000 #define VCF_VOLUME_FROZEN_SYS_SHUTDOWN 0x00800000 #define VCF_MIRRORING_PAUSED 0x01000000 #define VCF_VOLUME_STACKED_PARTIALLY 0x02000000 #define VCF_TAG_COMMIT_PENDING 0x04000000 #define VCF_ROOT_DEV 0x08000000 #define VCF_DRAIN_BARRIER 0x10000000 #define VCF_IN_NWO 0x20000000 #define VCF_DRAIN_BLOCKED 0x40000000 #define VCF_IO_BARRIER_ON 0x80000000
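/*
 * Each VCF_* value is a single-bit mask kept in tc_flags. A minimal
 * sketch of the usual test/set pattern (hypothetical snippet; real
 * call sites use the is_target_*() macros defined below):
 *
 *     volume_lock(ctxt);
 *     ctxt->tc_flags |= VCF_DRAIN_BARRIER;
 *     volume_unlock(ctxt);
 *
 *     if (is_target_drain_barrier_on(ctxt))
 *             ...
 */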
#define MAX_BITMAP_OPEN_ERRORS_TO_STOP_FILTERING 0x0040 /* exponential back off 1, 2, 4, 8, 16, 32, 64, 128 */ #define MAX_DELAY_FOR_BITMAP_FILE_OPEN_IN_SECONDS 300 // 5 * 60 Sec = 5 Minutes #define MIN_DELAY_FOR_BIMTAP_FILE_OPEN_IN_SECONDS 1 // 1 second //miscellaneous constants //op codes passed to the parameters save-all function enum { INM_NO_OP = 0, INM_STOP_FILTERING = 1, INM_SYSTEM_SHUTDOWN = 2, INM_UNSTACK = 3, }; #define is_target_read_only(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_READ_ONLY) #define is_target_filtering_disabled(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_FILTERING_STOPPED) #define is_target_being_frozen(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_VOLUME_TO_BE_FROZEN) #define is_target_mirror_paused(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_MIRRORING_PAUSED) #define is_target_tag_commit_pending(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_TAG_COMMIT_PENDING) #define is_target_drain_barrier_on(ctx) (((target_context_t *)ctx)->tc_flags & \ VCF_DRAIN_BARRIER) #define is_target_enabled_for_data_filtering(ctx) \ (!(((target_context_t *) ctx)->tc_flags & VCF_DATA_MODE_DISABLED)) /* ( !((target_context_t *) ctx)->tc_flags & VCF_DATA_MODE_DISABLED) || \ * (driver_context->service_supports_data_filtering) || \ * (driver_context->enable_data_filtering)) ? 1 : 0 **/ #define __should_wakeup_s2(ctx) \ (((((target_context_t *)ctx)->tc_bytes_pending_changes >= \ ((target_context_t *)ctx)->tc_db_notify_thres) || \ (((((target_context_t *)ctx)->tc_pending_changes - \ ((target_context_t *)ctx)->tc_pending_md_changes) > 0) && \ (((target_context_t *)ctx)->tc_pending_md_changes > 0)) || \ ((((target_context_t *)ctx)->tc_pending_md_changes >= \ MAX_CHANGE_INFOS_PER_PAGE))) ? 1 : 0) #define should_wakeup_s2(ctx) \ (!is_target_drain_barrier_on(ctx) && \ __should_wakeup_s2(ctx) ? 1 : 0) #define should_wakeup_s2_ignore_drain_barrier(ctx) __should_wakeup_s2(ctx) #define should_wait_for_db(ctx) \ (!is_target_drain_barrier_on(ctx) && \ ((((target_context_t *)ctx)->tc_bytes_pending_changes >= \ ((target_context_t *)ctx)->tc_db_notify_thres) || \ (((((target_context_t *)ctx)->tc_pending_changes - \ ((target_context_t *)ctx)->tc_pending_md_changes) > 0) && \ (((target_context_t *)ctx)->tc_pending_md_changes > 0)) || \ ((((target_context_t *)ctx)->tc_pending_md_changes >= \ MAX_CHANGE_INFOS_PER_PAGE))) ? 0 : 1) #define should_wakeup_monitor_thread(vm_cx_sess, get_cx_notify) \ (((!(vm_cx_sess->vcs_flags & VCS_CX_PRODUCT_ISSUE) && \ (get_cx_notify->ullMinConsecutiveTagFailures <= \ vm_cx_sess->vcs_num_consecutive_tag_failures)) || \ vm_cx_sess->vcs_timejump_ts) ? 1 : 0)
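/*
 * Worked example for should_wait_for_db() above (hypothetical numbers):
 * with tc_db_notify_thres = 4MB, tc_bytes_pending_changes = 1MB and no
 * metadata changes pending, all three clauses are false and the macro
 * returns 1, so the drainer keeps waiting. Once data and metadata
 * changes are pending at the same time, the middle clause fires and
 * the macro returns 0, letting the dirty block be handed out before
 * the byte threshold is reached.
 */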
target_context_t *target_context_ctr(void); void target_context_dtr(target_context_t *); inm_s32_t tgt_ctx_spec_init(target_context_t *, inm_dev_extinfo_t *); inm_s32_t tgt_ctx_common_init(target_context_t *, struct inm_dev_extinfo *); void tgt_ctx_common_deinit(target_context_t *); void tgt_ctx_spec_deinit(target_context_t *); int check_for_tc_state(target_context_t *, int); void wake_up_tc_state(target_context_t *); target_context_t *get_tgt_ctxt_from_uuid_locked(char *); target_context_t *get_tgt_ctxt_from_scsiid_locked(char *); target_context_t *get_tgt_ctxt_from_scsiid(char *); target_context_t *get_tgt_ctxt_from_uuid(char *); target_context_t *get_tgt_ctxt_from_uuid_nowait(char *); target_context_t *get_tgt_ctxt_from_uuid_nowait_locked(char *); target_context_t *get_tgt_ctxt_from_name_nowait(char *id); target_context_t *get_tgt_ctxt_from_name_nowait_locked(char *id); void target_context_release(target_context_t *); inm_s32_t stack_host_dev(target_context_t *ctx, inm_dev_extinfo_t *dinfo); target_context_t * get_tgt_ctxt_persisted_name_nowait_locked(char *); static_inline void get_tgt_ctxt(target_context_t *ctxt) { INM_ATOMIC_INC(&(ctxt)->refcnt); } static_inline void put_tgt_ctxt(target_context_t *ctxt) { if (INM_ATOMIC_DEC_AND_TEST(&(ctxt->refcnt))) { dbg("put_tgt_ctxt- target context ref count:%d", ctxt->refcnt); if(ctxt->release){ ctxt->release(ctxt); } } } inm_s32_t can_switch_to_data_filtering_mode(target_context_t *); inm_s32_t can_switch_to_data_wostate(target_context_t *); inm_s32_t set_tgt_ctxt_filtering_mode(target_context_t *tgt_ctxt, flt_mode filtering_mode, inm_s32_t service_initiated); inm_s32_t set_tgt_ctxt_wostate(target_context_t *, etWriteOrderState, inm_s32_t, etWOSChangeReason); void set_malloc_fail_error(target_context_t * tgt_ctxt); void volume_lock_irqsave(target_context_t *); void volume_unlock_irqrestore(target_context_t *); void volume_lock_bh(target_context_t *); void volume_unlock_bh(target_context_t *); inm_s32_t is_data_filtering_enabled_for_this_volume(target_context_t *vcptr); void fs_freeze_volume(target_context_t *, struct inm_list_head *head); void thaw_volume(target_context_t *, struct inm_list_head *head); inm_s32_t inm_dev_guid_cmp(target_context_t *, char *); inm_dev_t inm_dev_id_get(target_context_t *); inm_u64_t inm_dev_size_get(target_context_t *); void tgt_ctx_force_soft_remove(target_context_t *); void target_forced_cleanup(target_context_t *); inm_s32_t filter_guid_name_val_get(char *, char *); inm_s32_t filter_ctx_name_val_set(target_context_t *, char *, int); void inm_do_clear_stats(target_context_t *tcp); inm_s32_t inm_validate_tc_devattr(target_context_t *tcp, inm_dev_info_t *dip); char *get_volume_override(target_context_t *vcptr); void start_notify_completion(void); inm_u32_t get_data_source(target_context_t *); void do_clear_diffs(target_context_t *tgt_ctxt); void add_changes_to_pending_changes(target_context_t *, etWriteOrderState, inm_u32_t); void subtract_changes_from_pending_changes(target_context_t *, etWriteOrderState, inm_u32_t); void collect_latency_stats(inm_latency_stats_t *lat_stp, inm_u64_t time_in_usec); void retrieve_volume_latency_stats(target_context_t *, VOLUME_LATENCY_STATS *); inm_u64_t get_rpo_timestamp(target_context_t *ctxt, inm_u32_t flag, struct _change_node *pending_confirm); void inm_tc_reserv_init(target_context_t *ctx, int vol_lock); void update_cx_session(target_context_t *ctxt, inm_u32_t nr_bytes); void end_cx_session(void); void 
update_cx_with_tag_failure(void); void update_cx_with_tag_success(void); void update_cx_with_s2_latency(target_context_t *ctxt); void update_cx_with_time_jump(inm_u64_t cur_time, inm_u64_t prev_time); void close_disk_cx_session(target_context_t *ctxt, int reason_code); void update_cx_session_with_committed_bytes(target_context_t *ctxt, inm_s32_t committed_bytes); void update_cx_product_issue(int flag); void reset_s2_latency_time(void); void add_disk_sess_to_dc(target_context_t *ctxt); void remove_disk_sess_from_dc(target_context_t *ctxt); void volume_lock_all_close_cur_chg_node(void); void volume_unlock_all(void); #endif involflt-0.1.0/src/iobuffer.h0000755000000000000000000000443214467303177014626 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_IOBUFFER_H #define _INMAGE_IOBUFFER_H #include "involflt-common.h" struct _fstream_segment_mapper_tag; struct _volume_bitmap; typedef struct _iobuffer_tag { struct inm_list_head list_entry; inm_u32_t size; unsigned char *buffer; inm_u8_t dirty; inm_atomic_t locked; inm_u64_t starting_offset; struct _bitmap_api_tag *bapi; struct _fstream_segment_mapper_tag *fssm; inm_u32_t fssm_index; inm_atomic_t refcnt; inm_kmem_cache_t *iob_obj_cache; /*object Lookasidelist*/ inm_kmem_cache_t *iob_data_cache; /*data Lookaside list*/ void *allocation_ptr; }iobuffer_t; iobuffer_t *iobuffer_ctr(struct _bitmap_api_tag *bapi, inm_u32_t buffer_size, inm_u32_t index); void iobuffer_dtr(iobuffer_t *iob); iobuffer_t *iobuffer_get(iobuffer_t *iob); void iobuffer_put(iobuffer_t *iob); inm_s32_t iobuffer_sync_read(iobuffer_t *iob); inm_s32_t iobuffer_sync_flush(iobuffer_t *iob); void iobuffer_set_fstream(iobuffer_t *iob, fstream_t *fs); void iobuffer_set_foffset(iobuffer_t *iob, inm_u64_t file_offset); void iobuffer_set_owner_index(iobuffer_t *iob, inm_u32_t owner_index); inm_u32_t iobuffer_get_owner_index(iobuffer_t *iob); inm_s32_t iobuffer_isdirty(iobuffer_t *iob); inm_s32_t iobuffer_islocked(iobuffer_t *iob); void iobuffer_setdirty(iobuffer_t *iob); void iobuffer_lockbuffer(iobuffer_t *iob); void iobuffer_unlockbuffer(iobuffer_t *iob); inm_s32_t iobuffer_initialize_memory_lookaside_list(void); void iobuffer_terminate_memory_lookaside_list(void); #endif /* _INMAGE_IOBUFFER_H */ involflt-0.1.0/src/telemetry-exception.c0000755000000000000000000001361614467303177017032 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "work_queue.h" #include "utils.h" #include "filestream.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "VBitmap.h" #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "osdep.h" #include "telemetry-types.h" #include "telemetry-exception.h" #include "telemetry.h" #include "telemetry-exception.h" static inm_spinlock_t exception_lock; static inm_list_head_t exception_list; static inm_u32_t exception_gen = 0; exception_buf_t exception_enomem = { .eb_refcnt = 1, .eb_gen = 0, .eb_buf = "{\\\"Exception\\\":[{\\\"Seq\\\":\\\"0\\\",\\\"Err\\\":\\\"ENOMEM\\\"}]}" }; exception_buf_t exception_none = { .eb_refcnt = 1, .eb_gen = 0, .eb_buf = "{\\\"Exception\\\":[{\\\"Seq\\\":\\\"0\\\",\\\"Err\\\":\\\"0\\\"}]}" }; static void telemetry_reset_exception(exception_t *exception) { INM_MEM_ZERO(exception->e_tag, sizeof(exception->e_tag)); exception->e_first_time = 0; exception->e_last_time = 0; exception->e_error = excSuccess; exception->e_count = 0; exception->e_data = 0; } void telemetry_init_exception(void) { exception_t *exception = NULL; int i = 0; INM_INIT_LIST_HEAD(&exception_list); INM_INIT_SPIN_LOCK(&exception_lock); for (i = 0; i < INM_EXCEPTION_MAX; i++) { exception = INM_KMALLOC(sizeof(exception_t), INM_KM_SLEEP, INM_KERNEL_HEAP ); if (!exception) { err("Cannot allocate memory for exceptions data"); continue; } telemetry_reset_exception(exception); INM_INIT_LIST_HEAD(&exception->e_list); inm_list_add(&exception->e_list, &exception_list); } } void telemetry_set_exception(char *tag, etException error, inm_u64_t data) { exception_t *exception = NULL; inm_irqflag_t flag = 0; struct inm_list_head *cur = NULL; char *default_tag = "NA"; if (!tag) tag = default_tag; if (inm_list_empty(&exception_list)) return; INM_SPIN_LOCK_IRQSAVE(&exception_lock, flag); __inm_list_for_each(cur, &exception_list) { exception = inm_list_entry(cur, exception_t, e_list); if (exception->e_error == error && !strcmp(tag, exception->e_tag)) { get_time_stamp(&exception->e_last_time); break; } } /* If no matching exception, reuse the oldest exception */ if (cur == &exception_list) { cur = cur->prev; exception = inm_list_entry(cur, exception_t, e_list); telemetry_reset_exception(exception); get_time_stamp(&exception->e_first_time); exception->e_last_time = exception->e_first_time; } exception->e_error = error; exception->e_data = data; exception->e_count++; strcpy_s(exception->e_tag, sizeof(exception->e_tag), tag); /* Maintain LRU of exceptions */ inm_list_del_init(&exception->e_list); inm_list_add(&exception->e_list, &exception_list); exception_gen++; INM_SPIN_UNLOCK_IRQRESTORE(&exception_lock, flag); } void telemetry_put_exception(exception_buf_t *buf) { if (INM_ATOMIC_DEC_AND_TEST(&buf->eb_refcnt)) { if (buf == &exception_none || buf == &exception_enomem) { err("Trying to put default exception"); INM_BUG_ON(buf == &exception_none || buf == &exception_enomem); } else { INM_KFREE(buf, sizeof(exception_buf_t), 
INM_KERNEL_HEAP); } } } exception_buf_t * telemetry_get_exception(void) { static exception_buf_t *buf = NULL; exception_t *exception = NULL; inm_irqflag_t flag = 0; struct inm_list_head *cur = NULL; int offset = 0; int count = 0; exception_buf_t *ret = NULL; INM_SPIN_LOCK_IRQSAVE(&exception_lock, flag); if (buf) { if (buf->eb_gen == exception_gen) { ret = buf; goto out; } else { telemetry_put_exception(buf); buf = NULL; } } INM_BUG_ON(buf); if (exception_gen) /* Exceptions */ buf = INM_KMALLOC(sizeof(exception_buf_t), INM_KM_NOSLEEP | INM_KM_NOIO, INM_KERNEL_HEAP); if (buf) { ret = buf; INM_ATOMIC_SET(&buf->eb_refcnt, 1); buf->eb_gen = exception_gen; /* Generate exception string to be pushed to telemetry */ offset += sprintf_s(buf->eb_buf, INM_EXCEPTION_BUFSZ - offset, "{\\\"Exception\\\":["); __inm_list_for_each(cur, &exception_list) { exception = inm_list_entry(cur, exception_t, e_list); if (!exception->e_error) break; offset += sprintf_s(buf->eb_buf + offset, INM_EXCEPTION_BUFSZ - offset, "{\\\"Seq\\\":\\\"%d\\\",\\\"Err\\\":\\\"%d\\\"," "\\\"Cnt\\\":\\\"%u\\\",\\\"Data\\\":\\\"%llu\\\"," "\\\"First\\\":\\\"%llu\\\",\\\"Last\\\":\\\"%llu\\\"," "\\\"Tag\\\":\\\"%s\\\"}", exception_gen - count, exception->e_error, exception->e_count, exception->e_data, exception->e_first_time, exception->e_last_time, exception->e_tag); count++; } sprintf_s(buf->eb_buf + offset, INM_EXCEPTION_BUFSZ - offset, "]}"); } else { if (exception_gen) ret = &exception_enomem; else ret = &exception_none; } out: INM_ATOMIC_INC(&ret->eb_refcnt); INM_SPIN_UNLOCK_IRQRESTORE(&exception_lock, flag); return ret; } involflt-0.1.0/src/involflt_debug.h0000755000000000000000000000223314467303177016025 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt-common.h" #include "change-node.h" #include "target-context.h" #define IDEBUG_METADATA 1 #ifndef LINVOLFLT_DEBUG_H #define LINVOLFLT_DEBUG_H #define DEBUG_LEVEL 1 void print_target_context(target_context_t *); void print_disk_change_head(disk_chg_head_t *); void print_change_node(change_node_t *); void print_disk_change(disk_chg_t *); #endif involflt-0.1.0/src/Makefile0000755000000000000000000001723114467303177014315 0ustar rootroot# SPDX-License-Identifier: GPL-2.0-only # Copyright (C) 2022 Microsoft Corporation # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. ifeq (, $(WORKING_DIR)) WORKING_DIR=${shell pwd} endif BLD_DIR=${WORKING_DIR}/ EXTRA_CFLAGS += $(CFLAGS) -Wall -Wstrict-prototypes -DINM_LINUX -D__INM_KERNEL_DRIVERS__ ifeq (, $(VERSION_MAJOR)) VERSION_MAJOR:=1 endif ifeq (, $(VERSION_MINOR)) VERSION_MINOR:=0 endif ifeq (, $(VERSION_BUILDNUM)) VERSION_BUILDNUM:=0 endif ifeq (, $(VERSION_PRIVATE)) VERSION_PRIVATE:=1 endif EXTRA_CFLAGS += -DINMAGE_PRODUCT_VERSION_MAJOR=${VERSION_MAJOR} EXTRA_CFLAGS += -DINMAGE_PRODUCT_VERSION_MINOR=${VERSION_MINOR} EXTRA_CFLAGS += -DINMAGE_PRODUCT_VERSION_BUILDNUM=${VERSION_BUILDNUM} EXTRA_CFLAGS += -DINMAGE_PRODUCT_VERSION_PRIVATE=${VERSION_PRIVATE} ifeq (, $(BLD_DATE)) # same as __DATE__ BLD_DATE=""$(shell date +"%b %d %Y")"" endif ifeq (, $(BLD_TIME)) # same as __TIME__ BLD_TIME=""$(shell date +"%H:%M:%S")"" endif EXTRA_CFLAGS += -DBLD_DATE="\"${BLD_DATE}\"" -DBLD_TIME="\"${BLD_TIME}\"" # To handle distro and/or update/service pack/patch sepecific issues VENDOR:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH; \ if [ -f /etc/redhat-release ]; then echo "redhat"; \ elif [ -f /etc/SuSE-release ] ; then echo "suse"; \ elif [ -f /etc/os-release ] && grep -q 'SLES' /etc/os-release ; then echo "suse-os" ; \ elif [ -f /etc/lsb-release ] && grep -q 'Ubuntu' /etc/lsb-release ; then echo "ubuntu" ; \ elif [ -f /etc/debian_version ]; then echo "debian"; \ else echo "OS_UNKNOWN"; fi) EXTRA_CFLAGS += -D${VENDOR} ifeq (yes, $(noerror)) EXTRA_CFLAGS += -Wno-error endif ifeq ($(findstring suse, $(VENDOR)), suse) ifeq ($(VENDOR), suse) DISTRO_VER:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ grep VERSION /etc/SuSE-release | cut -d" " -f3 | \ cut -d"." -f 1) ifeq ($(PATCH_LEVEL), ) PATCH_LEVEL:=$(shell grep PATCHLEVEL /etc/SuSE-release | cut -d" " -f3) endif else DISTRO_VER:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ grep VERSION= /etc/os-release | cut -d"\"" -f 2 | \ cut -d"-" -f 1) PATCH_LEVEL:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ grep -o -- -SP[0-9] /etc/os-release | cut -d"P" -f 2) VENDOR:=suse endif ifeq ($(PATCH_LEVEL), ) PATCH_LEVEL:=0 endif EXTRA_CFLAGS += -DDISTRO_VER=${DISTRO_VER} -DPATCH_LEVEL=${PATCH_LEVEL} ifeq ($(DISTRO_VER), 12) EXTRA_CFLAGS += -mindirect-branch=thunk-inline -mindirect-branch-register endif endif ifeq ($(findstring redhat, $(VENDOR)), redhat) IS_CENTOS:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH; cat /etc/redhat-release | grep -i '^centos' | cut -d" " -f 1 | tr A-Z a-z) ifeq ($(findstring centos, $(IS_CENTOS)), centos) VER_STR=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ cat /etc/redhat-release | sed 's/Server //' | cut -d" " -f 4) else VER_STR=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ cat /etc/redhat-release | sed 's/Server //' | cut -d" " -f 6) endif DISTRO_VER:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ echo ${VER_STR} | cut -d"." -f 1) ifeq ($(DISTRO_VER), ) DISTRO_VER:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH; \ uname -r | cut -d"e" -f 2 | cut -d"." 
-f 1 | cut -d"l" -f 2) ifeq ($(DISTRO_VER), ) DISTRO_VER:=4 endif endif ifeq ($(DISTRO_VER), 4) UPDATE:=`if [ \`grep Update /etc/redhat-release | wc -l\` -le 1 ]; \ then echo 0; else cat /etc/redhat-release | cut -d" " -f 10 | \ tr -d ")"; fi` else ifneq ($(VER_STR), $(DISTRO_VER)) UPDATE:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ echo ${VER_STR} | cut -d" " -f 7 | \ cut -d"." -f 2) endif endif ifeq ($(UPDATE), ) UPDATE:=0 endif STATUS:=$(shell export PATH=/bin:/sbin:/usr/bin:/usr/sbin:$PATH;\ grep XenServer /etc/redhat-release) ifneq ($(STATUS), ) DISTRO_VER:=0 endif EXTRA_CFLAGS += -DDISTRO_VER=${DISTRO_VER} -DUPDATE=${UPDATE} ifeq ($(findstring 4.18.0-425, $(KDIR)), 4.18.0-425) ifeq ($(findstring el8, $(KDIR)), el8) EXTRA_CFLAGS += -DSET_INM_QUEUE_FLAG_STABLE_WRITE endif endif ifeq ($(findstring uek, $(KDIR)), uek) EXTRA_CFLAGS += -DUEK endif ifeq ($(findstring 2.6.32-100.28.5.el6.x86_64, $(KDIR)), 2.6.32-100.28.5.el6.x86_64) EXTRA_CFLAGS += -DUEK endif endif ifeq (yes, $(debug)) ifeq ($(findstring suse, $(VENDOR)), suse) ifeq ($(DISTRO_VER), 11) ifeq ($(PATCH_LEVEL), 2) EXTRA_CFLAGS += -DINM_DEBUG -DIDEBUG_MIRROR -g3 -gdwarf-2 -O2 endif endif else EXTRA_CFLAGS += -DINM_DEBUG -DIDEBUG_MIRROR -g3 -gdwarf-2 -O0 endif else ifeq (all, $(debug)) ifeq ($(findstring suse, $(VENDOR)), suse) ifeq ($(DISTRO_VER), 11) ifeq ($(PATCH_LEVEL), 2) EXTRA_CFLAGS += -DINM_DEBUG -DIDEBUG -DIDEBUG_META -DIDEBUG_MIRROR -DIDEBUG_MIRROR_IO -g3 -gdwarf-2 -O2 endif endif else EXTRA_CFLAGS += -DINM_DEBUG -DIDEBUG -DIDEBUG_META -DIDEBUG_MIRROR -DIDEBUG_MIRROR_IO -g3 -gdwarf-2 -O0 endif else EXTRA_CFLAGS += -O3 -g3 endif endif ifeq (yes, $(err)) EXTRA_CFLAGS += -DINJECT_ERR endif ifeq (yes, $(fabric)) EXTRA_CFLAGS += -DAPPLIANCE_DRV endif ifeq (yes, $(TELEMETRY)) EXTRA_CFLAGS += -DTELEMETRY endif EXTRA_CFLAGS += -I${BLD_DIR} OBJ_MODULE=involflt.o ifneq (yes, $(dummy)) INVOLFLT_OBJS = verifier.o \ osdep.o \ last_chance_writes.o \ filestream_raw.o \ tunable_params.o \ ioctl.o \ bitmap_api.o \ bitmap_operations.o \ change-node.o \ data-file-mode.o \ data-mode.o \ db_routines.o \ driver-context.o \ filestream.o \ filestream_segment_mapper.o \ filter.o \ filter_host.o \ involflt_debug_routines.o \ iobuffer.o \ md5.o \ metadata-mode.o \ segmented_bitmap.o \ statechange.o \ target-context.o \ utils.o \ work_queue.o \ VBitmap.o \ file-io.o \ telemetry-types.o \ telemetry-exception.o \ telemetry.o else ifeq (yes, $(dummy)) INVOLFLT_OBJS = verifier.o \ osdep.o \ dummy/bitmap_api.o \ bitmap-mode.o \ dummy/bitmap_operations.o \ change-node.o \ dummy/data-file-mode.o \ data-mode.o \ dummy/db_routines.o \ driver-context.o \ filestream.o \ dummy/filestream_segment_mapper.o \ filter.o \ filter_host.o \ involflt_debug_routines.o \ dummy/iobuffer.o \ ioctl.o \ md5.o \ metadata-mode.o \ proc.o \ dummy/segmented_bitmap.o \ dummy/statechange.o \ target-context.o \ tunable_params.o \ utils.o \ dummy/work_queue.o \ dummy/VBitmap.o \ file-io.o \ sysfs_common_attributes.o \ sysfs_volume_attributes.o endif endif ifeq (yes, $(fabric)) INVOLFLT_OBJS += filter_lun.o else INVOLFLT_OBJS += dummy_filter_lun.o endif obj-m += ${OBJ_MODULE} involflt-objs := ${INVOLFLT_OBJS} all: @echo "BLD_DATE: ${BLD_DATE}" @echo "BLD_TIME: ${BLD_TIME}" $(MAKE) -C ${KDIR} M=${BLD_DIR} modules #cleaning all the build files clean: $(MAKE) -C ${KDIR} M=${WORKING_DIR} clean involflt-0.1.0/src/VBitmap.h0000755000000000000000000001521514467303177014370 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * 
* This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_VOLUME_BITMAP_H_ #define _INMAGE_VOLUME_BITMAP_H_ #ifdef LCW_TEST #pragma message "LCW Test Mode" #define INM_BMAP_DEFAULT_DIR_DEPRECATED "/mnt/tmp/x" /* For last chance writes testing, allow creating file in custom location */ #define INM_BMAP_ALLOW_DEPRECATED(fname) (TRUE) #else #define INM_BMAP_DEFAULT_DIR_DEPRECATED "/root" /* Allow deprecated path for existing protections */ #define INM_BMAP_ALLOW_DEPRECATED(fname) (file_exists(fname)) #endif #define VOLUME_BITMAP_FLAGS_WAITING_FOR_SETBITS_WORKITEM_LIST_EMPTY_NOTIFICATION 0x0001 #define VOLUME_BITMAP_FLAGS_HAS_VOLUME_LETTER 0x0002 #define VOLUME_LETTER_IN_CHARS 2 #define STATUS_MORE_PROCESSING_REQUIRED 0xfff1 #define EOF_BMAP (1) #include "bitmap_api.h" #include "inm_utypes.h" typedef enum _etVBitmapState { ecVBitmapStateUnInitialized = 0, /* Set to this state as soon as the bitmap is opened. */ ecVBitmapStateOpened = 1, /* Set to this state when worker routine is queued for first read. */ ecVBitmapStateReadStarted = 2, ecVBitmapStateReadPaused = 3, ecVBitmapStateAddingChanges = 4, ecVBitmapStateReadCompleted = 5, ecVBitmapStateClosed = 6, ecVBitmapStateReadError = 7, ecVBitmapStateInternalError = 8 } etVBitmapState; struct _target_context; struct _wqentry; /* This structure is used for bitmap mode */ typedef struct _volume_bitmap { struct inm_list_head list_entry; /* link all bitmaps to dc */ inm_atomic_t refcnt; inm_u64_t flags; inm_u64_t reserved; etVBitmapState eVBitmapState; struct _target_context *volume_context; inm_sem_t sem; struct inm_list_head work_item_list; inm_spinlock_t lock; struct inm_list_head set_bits_work_item_list; inm_completion_t set_bits_work_item_list_empty_notification; char volume_GUID[GUID_SIZE_IN_CHARS + 1]; char volume_letter[VOLUME_LETTER_IN_CHARS + 1]; inm_u64_t segment_cache_limit; bitmap_api_t *bitmap_api; inm_u32_t bitmap_skip_writes; } volume_bitmap_t; typedef enum _etBitmapWorkItem { ecBitmapWorkItemNotInitialized = 0, ecBitmapWorkItemStartRead, ecBitmapWorkItemContinueRead, ecBitmapWorkItemClearBits, ecBitmapWorkItemSetBits, } etBitmapWorkItem; typedef struct _bitmap_work_item { struct inm_list_head list_entry; inm_atomic_t refcnt; inm_ull64_t changes; inm_u64_t nr_bytes_changed_data; etBitmapWorkItem eBitmapWorkItem; volume_bitmap_t *volume_bitmap; bitruns_t bit_runs; } bitmap_work_item_t; void queue_worker_routine_for_start_bitmap_read(volume_bitmap_t *volume_bitmap); void queue_worker_routine_for_continue_bitmap_read(volume_bitmap_t *volume_bitmap); void continue_bitmap_read(bitmap_work_item_t *bmap_witem, inm_s32_t mutex_acquired); void request_service_thread_to_open_bitmap(struct _target_context *vcptr); volume_bitmap_t *open_bitmap_file(struct _target_context *vcptr, inm_s32_t *status); void close_bitmap_file(volume_bitmap_t *volume_bitmap, inm_s32_t clear_bitmap); void 
wait_for_all_writes_to_complete(volume_bitmap_t *volume_bitmap); inm_s32_t get_volume_bitmap_granularity(struct _target_context *vcptr, inm_u64_t*bitmap_granularity); void queue_worker_routine_for_continue_bitmap_read(volume_bitmap_t *vbmap); void continue_bitmap_read_worker_routine(struct _wqentry *wqe); void write_bitmap_completion(bitmap_work_item_t *bmap_witem); void bitmap_write_worker_routine(struct _wqentry *wqe); void start_bitmap_read_worker_routine(struct _wqentry *wqe); void write_bitmap_completion_callback(bitruns_t *bitruns); void write_bitmap_completion_worker_routine(struct _wqentry *wqep); void read_bitmap_completion_callback(bitruns_t *bit_runs); void read_bitmap_completion_worker_routine(struct _wqentry *wqe); void read_bitmap_completion(bitmap_work_item_t *bmap_witem); volume_bitmap_t *allocate_volume_bitmap(void); void get_volume_bitmap(volume_bitmap_t *volume_bitmap); void put_volume_bitmap(volume_bitmap_t *volume_bitmap); /* function prototypes for bitmap work items */ bitmap_work_item_t * allocate_bitmap_work_item(inm_u32_t); void cleanup_work_item(bitmap_work_item_t *bm_witem); void get_bitmap_work_item(bitmap_work_item_t *); void put_bitmap_work_item(bitmap_work_item_t *); const char * get_volume_bitmap_state_string(etVBitmapState); inm_s32_t add_metadata_in_change_node(struct inm_list_head *node_hd, change_node_t change_node, inm_u64_t chg_offset, inm_u32_t chg_length); inm_s32_t can_open_bitmap_file(struct _target_context *vcptr, inm_s32_t lose_changes); void set_bitmap_open_fail_due_to_loss_of_changes(struct _target_context *vcptr, inm_s32_t lock_acquired); void set_bitmap_open_error(struct _target_context *vcptr, inm_s32_t lock_acquired, inm_s32_t status); struct _wqentry; inm_s32_t queue_worker_routine_for_bitmap_write(struct _target_context *, inm_u64_t, volume_bitmap_t *, struct inm_list_head *, struct _wqentry *, bitmap_work_item_t *); void log_bitmap_open_success_event(struct _target_context *vcptr); inm_s32_t inmage_flt_save_all_changes(struct _target_context *vcptr, inm_s32_t wait_required, inm_s32_t op_type); void flush_and_close_bitmap_file(struct _target_context *vcptr); void fill_bitmap_filename_in_volume_context(struct _target_context *vcptr); struct _target_context *get_tc_using_dev(inm_dev_t rdev); inm_s32_t move_rawbitmap_to_bmap(struct _target_context *vcptr, inm_s32_t force); void process_vcontext_work_items(struct _wqentry *wqeptr); inm_s32_t add_vc_workitem_to_list(inm_u32_t witem_type, struct _target_context *vcptr, inm_u32_t extra1, inm_u8_t open_bitmap, struct inm_list_head *lhptr); #endif /* _INMAGE_VOLUME_BITMAP_H_ */ involflt-0.1.0/src/dummy_filter_lun.c0000755000000000000000000001076614467303177016405 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
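*
* Note: this file is built only when the module is compiled without
* fabric=yes (see the filter_lun.o/dummy_filter_lun.o selection in
* src/Makefile); it provides no-op stand-ins for the SCST entry points
* so the driver links on kernels without SCST support.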
*/ #include "involflt.h" #include "involflt-common.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "metadata-mode.h" #include "statechange.h" #include "driver-context.h" #include "filter_lun.h" #include #include #include #if LINUX_VERSION_CODE < KERNEL_VERSION(5,11,0) && !defined SET_INM_QUEUE_FLAG_STABLE_WRITE #include #endif #include "file-io.h" #include "target-context.h" struct scst_cmd { void *not_used; }; struct scst_dev_type { void *not_used; }; struct scst_proc_data { void *not_used; }; /* dummy file to resolve the scst defined symbols */ /* scst_single_seq_open */ inm_s32_t scst_single_seq_open(struct inode *inode, struct file *file) { return -EINVAL; } /* scst_set_cmd_error */ void scst_set_cmd_error_status(struct scst_cmd *cmd, inm_s32_t status) { return; } /* __scst_register_virtual_dev_driver */ inm_s32_t __scst_register_virtual_dev_driver(struct scst_dev_type *dev_type, const char *version) { return -EINVAL; } /* scst_set_busy */ void scst_set_busy(struct scst_cmd *cmd) { return; } /* scst_sbc_generic_parse */ inm_s32_t scst_sbc_generic_parse(struct scst_cmd *cmd, inm_s32_t (*get_block_shift)(struct scst_cmd *cmd)) { return -EINVAL; } /* scst_register_virtual_device */ inm_s32_t scst_register_virtual_device(struct scst_dev_type *dev_handler, const char *dev_name) { return -EINVAL; } /* scst_unregister_virtual_device */ void scst_unregister_virtual_device(inm_s32_t id, unsigned int flag) { return; } /* scst_unregister_virtual_dev_driver */ void scst_unregister_virtual_dev_driver(struct scst_dev_type *dev_type) { return; } /* scst_set_resp_data_len */ void scst_set_resp_data_len(struct scst_cmd *cmd, inm_s32_t resp_data_len) { return; } /* scst_create_proc_entry */ struct proc_dir_entry *scst_create_proc_entry(struct proc_dir_entry * root, const char *name, struct scst_proc_data *pdata) { return NULL; } void scst_set_cmd_error(struct scst_cmd *cmd, inm_s32_t key, inm_s32_t asc, inm_s32_t ascq) { return; } inm_s32_t register_filter_target() { return 0; } inm_s32_t unregister_filter_target() { return 0; } inm_s32_t get_lun_query_data(inm_u32_t i, inm_u32_t *ip, LunData *ldp) { return 0; } inm_s32_t fabric_volume_init(target_context_t ctx, inm_dev_info_t *dev_info) { return 0; } inm_s32_t filter_lun_delete(char *s) { return 0; } inm_s32_t get_at_lun_last_write_vi(char* uuid, char *initiator_name) { return 0; } inm_s32_t get_at_lun_last_host_io_timestamp(AT_LUN_LAST_HOST_IO_TIMESTAMP *timestamp) { return 0; } inm_s32_t filter_lun_create(char* uuid, inm_u64_t nblks, inm_u32_t bsize, inm_u64_t startoff) { return 0; } inm_s32_t fabric_volume_deinit(target_context_t *ctx) { return 0; } void copy_iovec_data_to_data_pages(inm_wdata_t *wdatap, struct inm_list_head *listhdp) { return; } int inm_validate_fabric_vol(target_context_t *tcp, const inm_dev_info_t *dip) { return (0); } inm_s32_t process_at_lun_create(struct file *filp, void __user *arg) { return(0); } inm_s32_t process_at_lun_last_write_vi(struct file *filp, void __user *arg) { return(0); } inm_s32_t process_at_lun_last_host_io_timestamp(struct file *filp, void __user *arg) { return(0); } inm_s32_t process_at_lun_query(struct file *filp, void __user *arg) { return(0); } inm_s32_t process_at_lun_delete(struct file *filp, void __user 
*arg) { return(0); } int emd_unregister_virtual_device(int dev_id) { return -1; } involflt-0.1.0/src/metadata-mode.h0000755000000000000000000000311614467303177015525 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef LINVOLFLT_METADATA_MODE_H #define LINVOLFLT_METADATA_MODE_H typedef struct _write_metadata_tag { inm_u64_t offset; inm_u32_t length; }write_metadata_t; inm_u32_t split_change_into_chg_node(target_context_t *vcptr, write_metadata_t *wmd, inm_s32_t data_source, struct inm_list_head *split_chg_list_hd, inm_wdata_t *wdatap); inm_s32_t add_metadata(target_context_t *vcptr, struct _change_node *chg_node, write_metadata_t *wmd, inm_s32_t data_source, inm_wdata_t *wdatap); inm_s32_t save_data_in_metadata_mode(target_context_t *, write_metadata_t *, inm_wdata_t *); inm_s32_t add_tag_in_non_stream_mode(tag_volinfo_t *, tag_info_t *, int, tag_guid_t *, inm_s32_t, int commit_pending, tag_history_t *); #endif involflt-0.1.0/src/data-mode.c0000755000000000000000000012537414467303177014664 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ /* * File : data-mode.c * * Description: This file contains data mode implementation of the * filter driver. 
*/ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "utils.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "filter_host.h" #include "metadata-mode.h" #include "tunable_params.h" #include "svdparse.h" #include "db_routines.h" const inm_s32_t sv_hdr_sz = (sizeof(SVD_PREFIX) + sizeof(SVD_HEADER1)); const inm_s32_t sv_ts_sz = (sizeof(SVD_PREFIX) + sizeof(SVD_TIME_STAMP_V2)); const inm_s32_t sv_drtd_sz = (sizeof(SVD_PREFIX) + sizeof(inm_u64_t)); const inm_s32_t sv_chg_sz = (sizeof(SVD_PREFIX) + sizeof(SVD_DIRTY_BLOCK_V2)); const inm_s32_t sv_pref_sz = (3*sizeof(SVD_PREFIX) + sizeof(SVD_HEADER1) + sizeof(SVD_TIME_STAMP_V2) + sizeof(inm_u64_t)) + sizeof(inm_u32_t); const inm_s32_t sv_const_sz = (4*sizeof(SVD_PREFIX) + sizeof(SVD_HEADER1) + (2*sizeof(SVD_TIME_STAMP_V2)) + sizeof(inm_u64_t) + sizeof(inm_u32_t)); /* From data mode perspective, a node is new if it has no data pages */ #define is_new_data_node(node) (inm_list_empty(&node->data_pg_head)? 1 : 0) static inm_s32_t should_wakeup_worker_alloc(inm_u32_t); static inm_s32_t is_metadata_due_to_delay_alloc(target_context_t *tcp); static inm_s32_t set_all_data_mode(void); extern driver_context_t *driver_ctx; extern void copy_iovec_data_to_data_pages(inm_wdata_t *, inm_list_head_t *); /* A framework abstracted in data mode filtering which tells whether to * pre-allocate data pages or allocate pages on demand. */ inm_s32_t dynamic_alloc = 0; data_page_t *get_cur_data_pg(change_node_t *node, inm_s32_t *offset) { INM_BUG_ON(node->cur_data_pg == NULL); if( node->cur_data_pg_off == INM_PAGESZ ) { *offset = 0; node->cur_data_pg = PG_ENTRY(node->cur_data_pg->next.next); return node->cur_data_pg; } else { *offset = node->cur_data_pg_off; return node->cur_data_pg; } } void update_cur_dat_pg(change_node_t *node, data_page_t *pg, inm_s32_t offset) { INM_BUG_ON(pg == NULL); if(offset == INM_PAGESZ) { node->cur_data_pg = PG_ENTRY(pg->next.next); node->cur_data_pg_off = 0; } else { node->cur_data_pg = pg; node->cur_data_pg_off = offset; } } static_inline void copy_data_to_data_pages( void *addr, inm_s32_t len, change_node_t *node) { data_page_t *pg; inm_s32_t pg_offset, pg_free, to_copy, buf_offset = 0, rem = len ; char *buf = (char *)addr, *dst; inm_s32_t org_pg_offset; #ifdef TEST_VERIFIER #define CORRUPTION_EVERY_SECS 120ULL #define CORRUPTION_EVERY_100NSECS (CORRUPTION_EVERY_SECS * 10000000) #define CORRUPTION_OFF_BY 1 static unsigned int files = 1; inm_u64_t now; #endif pg = get_cur_data_pg(node, &pg_offset); pg_free = (INM_PAGESZ - pg_offset); org_pg_offset = pg_offset; INM_BUG_ON((pg == NULL) || (pg_offset >= INM_PAGESZ) || (pg_free <= 0)); #ifdef TEST_VERIFIER if (node->type == NODE_SRC_DATA) { get_time_stamp(&now); if ((now - driver_ctx->dc_tel.dt_drv_load_time) > (files * CORRUPTION_EVERY_100NSECS)) { files++; /* Make sure we conly corrupt our buffers */ if (CHANGE_NODE_IS_FIRST_DATA_PAGE(node, pg)) { /* First page - Overflow */ pg_offset += CORRUPTION_OFF_BY; err("Corrupting Page 0: offset = %d", node->stream_len); } else if (CHANGE_NODE_IS_LAST_DATA_PAGE(node, pg)) { /* Last page - Underrun */ pg_offset -= CORRUPTION_OFF_BY; err("Corrupting Page N: offset = %d", node->stream_len); } else { /* Random corruption */ if (now & 1) pg_offset += CORRUPTION_OFF_BY; else 
pg_offset -= CORRUPTION_OFF_BY;
			err("Corrupting DB: offset = %d", node->stream_len);
		}
	}
}
#endif

	while(1) {
		to_copy = MIN(pg_free, rem);
		INM_PAGE_MAP(dst, pg->page, KM_SOFTIRQ0);
		memcpy_s((char *)(dst + pg_offset), to_copy,
				(char *)(buf + buf_offset), to_copy);
		INM_PAGE_UNMAP(dst, pg->page, KM_SOFTIRQ0);
		pg_free -= to_copy;
		rem -= to_copy;
		buf_offset += to_copy;
		pg_offset += to_copy;

		if(pg_free == 0) {
			pg = PG_ENTRY(pg->next.next);
			pg_offset = 0;
			pg_free = INM_PAGESZ;
			if(((void *)pg == (void *)&node->data_pg_head)) {
				/* Detect any under/overrun */
				/* New offset should match old offset + len */
				if (((org_pg_offset + len) & (INM_PAGESZ - 1)) != pg_offset) {
					err("Data copy error: Org: %d Len: %d "
						"New: %d Last: 1",
						org_pg_offset, len, pg_offset);
				}
				/* This is possible when TOLC is copied. So just return. */
				return;
			}
		}

		if(rem == 0)
			break;
	}

	update_cur_dat_pg(node, pg, pg_offset);

	/* Detect any under/overrun */
	/* New offset should match old offset + len */
	pg = get_cur_data_pg(node, &pg_offset);
	if (((org_pg_offset + len) & (INM_PAGESZ - 1)) != pg_offset) {
		err("Data copy error: Org: %d Len: %d New: %d First: %d "
			"Last: %d", org_pg_offset, len, pg_offset,
			CHANGE_NODE_IS_FIRST_DATA_PAGE(node, pg),
			CHANGE_NODE_IS_LAST_DATA_PAGE(node, pg));
	}
}

/*
 * Return 0 when the request overflows the tc's reservation, i.e.
 * *overflow_pages must come from dc's unreserved pool
 * Return 1 when the request can be satisfied entirely from the
 * tc's reservation
 */
static_inline int inm_tc_resv_overflow(target_context_t *tgt_ctxt,
	inm_s32_t num_pages, inm_u32_t *overflow_pages)
{
	inm_u32_t tc_allocated_pages = tgt_ctxt->tc_stats.num_pages_allocated;
	inm_u32_t tc_res_pages = tgt_ctxt->tc_reserved_pages;

	*overflow_pages = 0;

	/*
	 * Calculate overflow_pages
	 * case 1: Complete allocation can be done from tc's reserved pages
	 * case 2: Allocation is split into tc's reserved and dc's un-reserved area
	 * case 3: Complete allocation needs to be done from dc's unreserved pages
	 */
	if (tc_allocated_pages < tc_res_pages) {
		if ((tc_allocated_pages + num_pages) <= tc_res_pages) {
			/* case 1 */
			return 1;
		} else {
			/* case 2 */
			*overflow_pages = (tc_allocated_pages + num_pages) -
						tc_res_pages;
		}
	} else {
		/* case 3 */
		*overflow_pages = num_pages;
	}

	return 0;
}

void inm_tc_resv_fill(void)
{
	struct inm_list_head *ptr, *nextptr;
	target_context_t *tgt_ctxt;

	ptr = NULL;
	nextptr = NULL;
	tgt_ctxt = NULL;

	/*
	 * Allocate page reservations to target contexts that didn't get
	 * pages earlier
	 */
	INM_DOWN_READ(&driver_ctx->tgt_list_sem);
	inm_list_for_each_safe(ptr, nextptr, &driver_ctx->tgt_list) {
		tgt_ctxt = inm_list_entry(ptr, target_context_t, tc_list);

		/* If the target is undergoing creation or deletion, no
		 * reservation is required for it here.
		 */
		if (tgt_ctxt->tc_flags & (VCF_VOLUME_DELETING | VCF_VOLUME_CREATING))
			continue;

		volume_lock(tgt_ctxt);

		/* Make an attempt to reserve pages if target does not have
		 * reservations
		 */
		if (tgt_ctxt->tc_reserved_pages) {
			volume_unlock(tgt_ctxt);
			continue;
		}

		/* Not enough pages?
		 */
		if (inm_tc_resv_add(tgt_ctxt, driver_ctx->dc_vol_data_pool_size)) {
			volume_unlock(tgt_ctxt);
			break;
		}

		volume_unlock(tgt_ctxt);
	} /* inm_list_for_each_safe */
	INM_UP_READ(&driver_ctx->tgt_list_sem);
}

int inm_tc_resv_add(target_context_t *tgt_ctxt, inm_u32_t num_pages)
{
	inm_u32_t tc_allocated_pages = tgt_ctxt->tc_stats.num_pages_allocated;
	inm_u32_t num_unres_pages = 0;
	inm_u32_t updated_tc_reserved_pages;
	unsigned long lock_flag = 0;

	if (!num_pages)
		return 1;

	updated_tc_reserved_pages = tgt_ctxt->tc_reserved_pages + num_pages;

	/*
	 * Account currently tc's allocated pages and then evaluate the number
	 * of pages required from dc's unreserved pages
	 * case 1: tc's allocated pages < tc's current reservation
	 * case 2: tc's allocated pages > tc's current reservation and
	 *         tc's allocated pages < tc's new reservation
	 * case 3: tc's allocated pages > tc's new reservation
	 */
	if (tc_allocated_pages <= tgt_ctxt->tc_reserved_pages) {
		/* case 1 */
		num_unres_pages = num_pages;
	} else {
		if (tc_allocated_pages < updated_tc_reserved_pages) {
			/* case 2 */
			num_unres_pages = updated_tc_reserved_pages -
						tc_allocated_pages;
		} else {
			/* case 3 */
			num_unres_pages = 0;
		}
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock,
				lock_flag);

	/* Do we have enough unreserved pages? */
	if (driver_ctx->dc_cur_unres_pages < num_unres_pages) {
		INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
				lock_flag);
		return 1;
	}

	tgt_ctxt->tc_reserved_pages = updated_tc_reserved_pages;
	driver_ctx->dc_cur_unres_pages -= num_unres_pages;
	driver_ctx->dc_cur_res_pages += num_pages;
	if(driver_ctx->data_flt_ctx.dp_least_free_pgs > num_pages){
		driver_ctx->data_flt_ctx.dp_least_free_pgs -= num_pages;
	} else {
		driver_ctx->data_flt_ctx.dp_least_free_pgs = 0;
	}
	driver_ctx->data_flt_ctx.dp_pages_alloc_free -= num_pages;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
				lock_flag);

	return 0;
}

int inm_tc_resv_del(target_context_t *tgt_ctxt, inm_u32_t num_pages)
{
	inm_u32_t tc_allocated_pages = tgt_ctxt->tc_stats.num_pages_allocated;
	inm_u32_t num_unres_pages = 0;
	inm_u32_t updated_tc_reserve_pages;
	unsigned long lock_flag = 0;

	/* On target deinit path, we may have num_pages zero when the
	 * target ctx didn't get any reservations
	 */
	if (!num_pages)
		return 0;

	if (num_pages > tgt_ctxt->tc_reserved_pages)
		return 1;

	updated_tc_reserve_pages = tgt_ctxt->tc_reserved_pages - num_pages;

	/*
	 * Account currently tc's allocated pages and then evaluate the number
	 * of pages returned to dc's unreserved pages
	 * case 1: tc's allocated pages < tc's new reservation
	 * case 2: tc's allocated pages < tc's current reservation and
	 *         tc's allocated pages > tc's new reservation
	 * case 3: tc's allocated pages > tc's current reservation
	 */
	if (tc_allocated_pages <= updated_tc_reserve_pages) {
		/* case 1 */
		num_unres_pages = num_pages;
	} else {
		if (tc_allocated_pages < tgt_ctxt->tc_reserved_pages) {
			/* case 2 */
			num_unres_pages = tgt_ctxt->tc_reserved_pages -
						tc_allocated_pages;
		} else {
			/* case 3 */
			num_unres_pages = 0;
		}
	}

	INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock,
				lock_flag);
	tgt_ctxt->tc_reserved_pages = updated_tc_reserve_pages;
	driver_ctx->dc_cur_unres_pages += num_unres_pages;
	driver_ctx->dc_cur_res_pages -= num_pages;
	INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock,
				lock_flag);
	INM_BUG_ON(driver_ctx->dc_cur_unres_pages >
			(driver_ctx->data_flt_ctx.pages_allocated -
			 driver_ctx->dc_cur_res_pages));

	/* fill reservations for tc's with empty reservations */
	inm_tc_resv_fill();

	return 0;
}
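/*
 * Worked example of the reservation arithmetic above (illustrative
 * numbers, not from the original source): a target holds
 * tc_reserved_pages = 256 with tc_stats.num_pages_allocated = 200 and
 * gives up num_pages = 100 via inm_tc_resv_del(). The new reservation
 * is 156; since 156 < 200 < 256 this is case 2, so only 256 - 200 = 56
 * pages return to dc_cur_unres_pages while dc_cur_res_pages still drops
 * by the full 100. The remaining 44 pages stay accounted to the target
 * until it frees them through inm_rel_data_pages().
 */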
inm_s32_t init_data_flt_ctxt(data_flt_t *data_ctxt) { inm_u32_t num_pages = 0; #ifndef INM_AIX inm_u32_t gfp_mask = INM_KM_SLEEP|INM_KM_NORETRY; #else inm_u32_t gfp_mask = 0; #endif inm_u32_t total_ram_pgs = 0; inm_meminfo_t info; INM_SI_MEMINFO(&info); total_ram_pgs = info.totalram; #ifndef INM_DEBUG gfp_mask |= INM_KM_NOWARN; #endif INM_INIT_LIST_HEAD(&data_ctxt->data_pages_head); INM_INIT_SPIN_LOCK(&data_ctxt->data_pages_lock); data_ctxt->pages_allocated = 0; data_ctxt->pages_free = 0; #ifdef CONFIG_HIGHMEM gfp_mask |= INM_KM_HIGHMEM; #endif num_pages = DEFAULT_DATA_POOL_SIZE_MB; num_pages <<= (MEGABYTE_BIT_SHIFT - INM_PAGESHIFT); if(driver_ctx->tunable_params.enable_data_filtering) { if(!alloc_data_pages(&data_ctxt->data_pages_head, num_pages, &data_ctxt->pages_allocated, gfp_mask)) { if(data_ctxt->pages_allocated) { free_data_pages(&data_ctxt->data_pages_head); data_ctxt->pages_allocated = 0; } err("Not enough data pages available for filtering"); return -ENOMEM; } data_ctxt->pages_free = data_ctxt->pages_allocated; driver_ctx->dc_cur_unres_pages = data_ctxt->pages_allocated; driver_ctx->data_flt_ctx.dp_pages_alloc_free = data_ctxt->pages_allocated; INM_BUG_ON(!driver_ctx->tunable_params.percent_change_data_pool_size); data_ctxt->dp_nrpgs_slab = (total_ram_pgs * driver_ctx->tunable_params.percent_change_data_pool_size) / 100; data_ctxt->dp_least_free_pgs = driver_ctx->dc_cur_unres_pages; INM_BUG_ON(driver_ctx->dc_cur_unres_pages > (driver_ctx->data_flt_ctx.pages_allocated - driver_ctx->dc_cur_res_pages)); recalc_data_file_mode_thres(); } return 0; } void free_data_flt_ctxt(data_flt_t *data_ctxt) { free_data_pages(&data_ctxt->data_pages_head); data_ctxt->pages_allocated = 0; data_ctxt->pages_free = 0; } int add_data_pages(inm_u32_t num_pages) { struct inm_list_head pg_head; inm_u32_t pgs_alloced = 0; unsigned long lock_flag = 0; #ifndef INM_AIX inm_u32_t gfp_mask = INM_KM_SLEEP|INM_KM_NORETRY; #else inm_u32_t gfp_mask = 0; #endif #ifndef INM_DEBUG gfp_mask |= INM_KM_NOWARN; #endif #ifdef CONFIG_HIGHMEM gfp_mask |= INM_KM_HIGHMEM; #endif if(!num_pages){ return 0; } INM_INIT_LIST_HEAD(&pg_head); alloc_data_pages(&pg_head, num_pages, &pgs_alloced, gfp_mask); if (pgs_alloced != num_pages) { free_data_pages(&pg_head); return 1; } /* Update data page pool */ INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); inm_list_splice_at_tail(&pg_head, &(driver_ctx->data_flt_ctx.data_pages_head)); driver_ctx->data_flt_ctx.pages_free += pgs_alloced; driver_ctx->data_flt_ctx.pages_allocated += pgs_alloced; driver_ctx->dc_cur_unres_pages += pgs_alloced; driver_ctx->data_flt_ctx.dp_pages_alloc_free += pgs_alloced; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); INM_BUG_ON(driver_ctx->dc_cur_unres_pages > (driver_ctx->data_flt_ctx.pages_allocated - driver_ctx->dc_cur_res_pages)); /* fill reservations for tc's with empty reservations */ inm_tc_resv_fill(); set_all_data_mode(); return 0; } static inm_s32_t should_wakeup_worker_alloc(inm_u32_t overflow_pages) { inm_u32_t alloc_limit = 0, temp = 0; inm_s32_t ret = 0; alloc_limit = MIN(driver_ctx->data_flt_ctx.dp_nrpgs_slab, driver_ctx->data_flt_ctx.pages_allocated); alloc_limit = (alloc_limit * MIN_FREE_PAGES_TO_ALLOC_SLAB_PERCENT) / 100; if (driver_ctx->dc_cur_unres_pages > overflow_pages) { temp = driver_ctx->dc_cur_unres_pages - overflow_pages; } if(alloc_limit > temp){ ret = 1; } return ret; } int get_data_pages(target_context_t *tgt_ctxt, struct inm_list_head *head, inm_s32_t num_pages) { struct 
inm_list_head *ptr,*hd,*new, lhead; inm_s32_t pages_allocated = 0; inm_s32_t r = 0; unsigned long lock_flag = 0; inm_u32_t overflow_pages = 0; inm_s32_t wakeup = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered tcPages:%d freePages:%d ReqPages:%d dc_cur_unres_pages:%d", tgt_ctxt->tc_stats.num_pages_allocated, driver_ctx->data_flt_ctx.pages_free, num_pages, driver_ctx->dc_cur_unres_pages); } INM_BUG_ON(!num_pages); INM_BUG_ON(!driver_ctx); hd = &(driver_ctx->data_flt_ctx.data_pages_head); INM_INIT_LIST_HEAD(&lhead); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); /* * Check if request can be satisfied from tc's reserve pool otherwise * check if overflow pages (pages beyond tc's reservation) * can be allocated dc's unreserved pool */ if (inm_tc_resv_overflow(tgt_ctxt, num_pages, &overflow_pages)) { goto allocate; } wakeup = should_wakeup_worker_alloc(overflow_pages); if (overflow_pages <= driver_ctx->dc_cur_unres_pages) { goto allocate; } else { goto unlock_return; } allocate: for(ptr = hd->next; ptr != hd;) { new = ptr; ptr = ptr->next; inm_list_del(new); inm_list_add_tail(new, &lhead); pages_allocated++; driver_ctx->data_flt_ctx.pages_free--; if(pages_allocated == num_pages) break; } driver_ctx->dc_cur_unres_pages -= overflow_pages; if(driver_ctx->data_flt_ctx.dp_least_free_pgs > overflow_pages){ driver_ctx->data_flt_ctx.dp_least_free_pgs -= overflow_pages; } else { driver_ctx->data_flt_ctx.dp_least_free_pgs = 0; } if (pages_allocated != num_pages) { /* * The memory counters dont match due to some bug. Attempt to restore * sanity to memory counters. The (num_pages - pages_allocated) pages * will be lost forever. */ INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); inm_rel_data_pages(tgt_ctxt, &lhead, pages_allocated); /* We are here as we ran out of data pool */ wakeup = 1; /* return error */ r = 0; INM_BUG_ON(pages_allocated != num_pages); goto out; } else { inm_list_splice_at_tail(&lhead, head); r = 1; } unlock_return: INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); out: if(wakeup){ INM_SPIN_LOCK_IRQSAVE(&(driver_ctx->wqueue.lock), lock_flag); driver_ctx->wqueue.flags |= WQ_FLAGS_REORG_DP_ALLOC; INM_ATOMIC_INC(&(driver_ctx->wqueue.wakeup_event_raised)); INM_WAKEUP_INTERRUPTIBLE(&(driver_ctx->wqueue.wakeup_event)); INM_COMPLETE(&(driver_ctx->wqueue.new_event_completion)); INM_SPIN_UNLOCK_IRQRESTORE(&(driver_ctx->wqueue.lock), lock_flag); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving tcPages:%d freePages:%d ReqPages:%d dc_cur_unres_pages:%d", tgt_ctxt->tc_stats.num_pages_allocated, driver_ctx->data_flt_ctx.pages_free, num_pages, driver_ctx->dc_cur_unres_pages); } if (!r) set_malloc_fail_error(tgt_ctxt); return r; } /* * inm_rel_data_pages() * @tcp : target_context_t ptr variable * @lhp : struct inm_list_head ptr variable * @nrpgs : # of pages in the list * * notes : adds pages from input list to the global list (driver context) * Caller must decrement tc_stats.num_pages_allocated before * calling this function */ int inm_rel_data_pages(target_context_t *tcp, struct inm_list_head *lhp, inm_u32_t nrpgs) { struct inm_list_head *ptr = NULL,*nextptr = NULL; inm_u32_t nr_oflow_pgs = 0; unsigned long lock_flag = 0; inm_s32_t ret = -1; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("entered"); } inm_tc_resv_overflow(tcp, nrpgs, &nr_oflow_pgs); INM_BUG_ON(!driver_ctx); 
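	/*
	 * Return every page on lhp to the global free list under the pool
	 * lock; only the pages that exceeded this target's reservation
	 * (nr_oflow_pgs, computed above) are credited back to the
	 * unreserved pool.
	 */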
INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); inm_list_for_each_safe(ptr, nextptr, lhp) { inm_list_del(ptr); inm_list_add_tail(ptr, &driver_ctx->data_flt_ctx.data_pages_head); driver_ctx->data_flt_ctx.pages_free++; } if (nr_oflow_pgs) { driver_ctx->dc_cur_unres_pages += nr_oflow_pgs; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); INM_BUG_ON(driver_ctx->dc_cur_unres_pages > (driver_ctx->data_flt_ctx.pages_allocated - driver_ctx->dc_cur_res_pages)); ret = 0; if(IS_DBG_ENABLED(inm_verbosity, INM_IDEBUG)){ info("leaving"); } return ret; } /* * this function would allocate pages from page head passed to the function * returns number of pages allocated * There should not be any failure in this function as pages are preallocated */ int inm_get_pages_for_node(change_node_t *chg_node, struct inm_list_head *pg_head, inm_s32_t len) { inm_s32_t bytes_req, new, chg_sz; inm_u32_t num_pages; inm_u32_t pg_cnt; struct inm_list_head *headp, *next_headp; struct inm_list_head headp_local; new = 0; num_pages = 0; pg_cnt = 0; chg_sz = sv_chg_sz + len; headp = NULL; next_headp = NULL; if(chg_sz > chg_node->data_free) bytes_req = (chg_sz - chg_node->data_free); else bytes_req = 0; if (is_new_data_node(chg_node)) { bytes_req += sv_const_sz; new = 1; } INM_INIT_LIST_HEAD(&headp_local); num_pages = bytes_to_pages(bytes_req); pg_cnt = num_pages; /* Allocate pages from pg_head */ inm_list_for_each_safe(headp, next_headp, pg_head) { inm_list_del(headp); inm_list_add_tail(headp, &headp_local); pg_cnt--; if (!pg_cnt) break; } if (bytes_req) { inm_list_splice_at_tail(&headp_local, &(chg_node->data_pg_head)); chg_node->data_free += (num_pages * INM_PAGESZ); chg_node->changes.num_data_pgs += num_pages; } if (new) { chg_node->cur_data_pg = PG_ENTRY(chg_node->data_pg_head.next); chg_node->cur_data_pg_off = sv_pref_sz; chg_node->data_free -= sv_const_sz; } return num_pages; } inm_u32_t inm_split_change_in_data_mode(target_context_t *tgt_ctxt, write_metadata_t *wmd, inm_wdata_t *wdatap) { change_node_t *chg_node = NULL; SVD_DIRTY_BLOCK_V2 dblock; struct inm_list_head *headp, *next_headp; inm_u32_t nr_splits; struct inm_list_head pg_head; disk_chg_t *disk_chg_ptr = NULL; inm_s32_t num_pages, total_num_pages, i; struct inm_list_head split_chg_node_list; static const SVD_PREFIX db_prefix = {SVD_TAG_DIRTY_BLOCK_DATA_V2, 1, 0}; int perf_changes = 1; int is_barrier_on = 0; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered %u", wmd->length); } num_pages = 0; total_num_pages = 0; #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if ((INM_ATOMIC_READ(&driver_ctx->is_iobarrier_on)) ) { is_barrier_on = 1; if(!(tgt_ctxt->tc_flags & VCF_IO_BARRIER_ON)) { tgt_ctxt->tc_flags |= VCF_IO_BARRIER_ON; } perf_changes = 0; } #endif if (tgt_ctxt->tc_cur_node && (tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) && perf_changes) { INM_BUG_ON(!inm_list_empty(&tgt_ctxt->tc_cur_node->nwo_dmode_next)); do_perf_changes(tgt_ctxt, tgt_ctxt->tc_cur_node, IN_IO_PATH); } tgt_ctxt->tc_cur_node = NULL; nr_splits = split_change_into_chg_node(tgt_ctxt, wmd, NODE_SRC_DATA, &split_chg_node_list, wdatap); /* * on error, release half baked change node entries and set volume as out * of sync */ if (INM_UNLIKELY(nr_splits <= 0)) { err("Failed to get current change node"); free_changenode_list(tgt_ctxt, ecSplitIOFailed); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_OUT_OF_MEMORY_FOR_DIRTY_BLOCKS, 
-ENOMEM);
		return 0;
	}

	/* Calculate total bytes of memory required to store current change
	 * in split nodes
	 */
	inm_list_for_each_safe(headp, next_headp, &split_chg_node_list) {
		chg_node = inm_list_entry(headp, change_node_t, next);
		disk_chg_ptr = (disk_chg_t *)chg_node->changes.cur_md_pgp;
		/* change nodes are new hence we need to account
		 * sv_chg_sz + sv_const_sz
		 */
		total_num_pages += bytes_to_pages(disk_chg_ptr->length +
						sv_chg_sz + sv_const_sz);
	}

	INM_INIT_LIST_HEAD(&pg_head);

	/* Allocate data pages in one go */
	if (!get_data_pages(tgt_ctxt, &pg_head, total_num_pages)) {
		goto error;
	}

	/* allocate data pages to split io change nodes */
	inm_list_for_each_safe(headp, next_headp, &split_chg_node_list) {
		chg_node = inm_list_entry(headp, change_node_t, next);
		INM_BUG_ON(chg_node->changes.cur_md_pgp == NULL);
		disk_chg_ptr = (disk_chg_t *)chg_node->changes.cur_md_pgp;

		/* Allocate data pages for the current change node */
		num_pages = inm_get_pages_for_node(chg_node, &pg_head,
						disk_chg_ptr->length);
		if (!num_pages) {
			/* should not be here */
			INM_BUG_ON(1);
			goto error;
		}

		/* It is safe to copy data to split io change nodes. We have
		 * already updated offsets of data pages, but that really does
		 * not matter in case of roll back
		 */
		dblock.Length = disk_chg_ptr->length;
		dblock.ByteOffset = disk_chg_ptr->offset;
		dblock.uiSequenceNumberDelta = disk_chg_ptr->seqno_delta;
		dblock.uiTimeDelta = disk_chg_ptr->time_delta;
		copy_data_to_data_pages((void *)&db_prefix, sizeof(SVD_PREFIX),
						chg_node);
		copy_data_to_data_pages((void *)&dblock,
						sizeof(SVD_DIRTY_BLOCK_V2), chg_node);

		/* update change node's information for data page copy */
		chg_node->data_free -= (sv_chg_sz);
		chg_node->stream_len += (sv_chg_sz);

		/* Align the data_free to length for copy_bio function
		 * This is a split IO path and copy_biodata_to_data_pages is
		 * shared by non-split and split IO paths. disk_chg_ptr.length
		 * is a sector (512 bytes) aligned value. For the split IO path,
		 * we don't want to change the copy_bio.. functions at this stage
		 * and can rely on the change node's data_free member to only
		 * copy sector aligned bytes.
*/ chg_node->data_free = disk_chg_ptr->length; } /* On successful allocation, make copy of bio pages and necessary updates * to the target context and change nodes, otherwise roll back */ (*wdatap->wd_copy_wd_to_datapgs)(tgt_ctxt,wdatap, split_chg_node_list.next); if (is_barrier_on) { goto add_change_node_to_list; } /* Update change node variables -- data_free and stream_len */ inm_list_for_each_safe(headp, next_headp, &split_chg_node_list) { chg_node = inm_list_entry(headp, change_node_t, next); disk_chg_ptr = (disk_chg_t*)chg_node->changes.cur_md_pgp; /* update chnage nodes information for data page copy */ chg_node->data_free -= (disk_chg_ptr->length); chg_node->stream_len += (disk_chg_ptr->length); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DRAIN_PREF_DATA_MODE_CHANGES_IN_NWO) { if (chg_node->wostate != ecWriteOrderStateData && inm_list_empty(&chg_node->nwo_dmode_next)) { chg_node->flags |= CHANGE_NODE_IN_NWO_CLOSED; inm_list_add_tail(&chg_node->nwo_dmode_next, &tgt_ctxt->tc_nwo_dmode_list); if (tgt_ctxt->tc_optimize_performance & PERF_OPT_DEBUG_DATA_DRAIN) { info("SplitIO Appending chg:%p tgt_ctxt:%p next:%p prev:%p mode:%d", chg_node,tgt_ctxt, chg_node->nwo_dmode_next.next, chg_node->nwo_dmode_next.prev, chg_node->type); } } } } add_change_node_to_list: /* * Append the split IO change nodes in to target head list of change nodes */ #if defined(SLES15SP3) || LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0) if (is_barrier_on) { inm_list_splice_at_tail(&split_chg_node_list, &chg_node->vcptr->tc_non_drainable_node_head); } else { if(!inm_list_empty(&chg_node->vcptr->tc_non_drainable_node_head)) { do_perf_changes_all(chg_node->vcptr, IN_IO_PATH); chg_node->vcptr->tc_flags &= ~VCF_IO_BARRIER_ON; inm_list_splice_at_tail(&chg_node->vcptr->tc_non_drainable_node_head, &chg_node->vcptr->tc_node_head); INM_INIT_LIST_HEAD(&chg_node->vcptr->tc_non_drainable_node_head); } inm_list_splice_at_tail(&split_chg_node_list, &chg_node->vcptr->tc_node_head); } #else inm_list_splice_at_tail(&split_chg_node_list, &chg_node->vcptr->tc_node_head); #endif /* Update total number of data pages allocated */ tgt_ctxt->tc_stats.num_pages_allocated += total_num_pages; /* Updates to target context information variables */ tgt_ctxt->tc_pending_changes += nr_splits; tgt_ctxt->tc_cnode_pgs += nr_splits; INM_BUG_ON(tgt_ctxt->tc_pending_changes < 0 ); tgt_ctxt->tc_bytes_pending_changes += wmd->length; add_changes_to_pending_changes(tgt_ctxt, tgt_ctxt->tc_cur_wostate, nr_splits); /* Queue change node(s) to file thread if required */ for (i=0; (itc_cur_wostate) set_tgt_ctxt_wostate(tgt_ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonDChanges); /* mark them as metadata change nodes */ inm_list_for_each_safe(headp, next_headp, &split_chg_node_list) { chg_node = inm_list_entry(headp, change_node_t, next); chg_node->type = NODE_SRC_METADATA; /* If change node on non write order data mode list, then * remove it from that list */ if (!inm_list_empty(&chg_node->nwo_dmode_next)) { inm_list_del_init(&chg_node->nwo_dmode_next); } chg_node->wostate = tgt_ctxt->tc_cur_wostate; } /* * Append the split IO change nodes in to target head list of change nodes */ inm_list_splice_at_tail(&split_chg_node_list, &chg_node->vcptr->tc_node_head); /* Updates to target context information variables */ tgt_ctxt->tc_pending_changes += nr_splits; tgt_ctxt->tc_cnode_pgs += nr_splits; tgt_ctxt->tc_bytes_pending_changes += wmd->length; tgt_ctxt->tc_pending_md_changes += nr_splits; tgt_ctxt->tc_bytes_pending_md_changes += wmd->length; 
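/*
 * Rollback accounting: the nodes above were re-tagged as
 * NODE_SRC_METADATA, so the same changes are also counted against the
 * pending metadata totals before being added to the per-wostate
 * pending changes below.
 */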
add_changes_to_pending_changes(tgt_ctxt, tgt_ctxt->tc_cur_wostate, nr_splits); return 0; } void save_data_in_data_mode_normal(target_context_t *tgt_ctxt, write_metadata_t *wmd, inm_wdata_t *wdatap) { inm_u32_t num_pages = 0; struct inm_list_head pg_head; change_node_t *chg_node = NULL; inm_s32_t bytes_req, chg_sz, new = 0; SVD_DIRTY_BLOCK_V2 dblock; static const SVD_PREFIX db_prefix = {SVD_TAG_DIRTY_BLOCK_DATA_V2, 1, 0}; inm_tsdelta_t ts_delta; char *map_addr; inm_s32_t ret; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } ret = inm_xm_mapin(tgt_ctxt, wdatap, &map_addr); if(ret){ dbg("switching to metadata mode \n"); set_tgt_ctxt_filtering_mode(tgt_ctxt, FLT_MODE_METADATA, FALSE); if(ecWriteOrderStateData == tgt_ctxt->tc_cur_wostate) set_tgt_ctxt_wostate(tgt_ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonUnInitialized); if (save_data_in_metadata_mode(tgt_ctxt, wmd, wdatap)) err("save_data_in_metadata_mode failed"); return; } dblock.ByteOffset = wmd->offset; dblock.Length = wmd->length; chg_sz = (sv_chg_sz + dblock.Length); chg_node = get_change_node_to_update(tgt_ctxt, wdatap, &ts_delta); if(!chg_node) { err("Failed to get current change node"); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_OUT_OF_MEMORY_FOR_DIRTY_BLOCKS, -ENOMEM); inm_xm_det(wdatap, map_addr); return; } if(chg_sz > chg_node->data_free) bytes_req = (chg_sz - chg_node->data_free); else bytes_req = 0; if(is_new_data_node(chg_node)) { bytes_req += sv_const_sz; new = 1; } INM_INIT_LIST_HEAD(&pg_head); num_pages = bytes_to_pages(bytes_req); if (bytes_req && !get_data_pages(tgt_ctxt, &pg_head, num_pages)) { inm_xm_det(wdatap, map_addr); inm_claim_metadata_page(tgt_ctxt, chg_node, wdatap); dbg("switching to metadata mode \n"); set_tgt_ctxt_filtering_mode(tgt_ctxt, FLT_MODE_METADATA, FALSE); if(ecWriteOrderStateData == tgt_ctxt->tc_cur_wostate) set_tgt_ctxt_wostate(tgt_ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonDChanges); if (save_data_in_metadata_mode(tgt_ctxt, wmd, wdatap)) { err("save_data_in_metadata_mode failed"); } return; } if(bytes_req) { inm_list_splice_at_tail(&pg_head, &(chg_node->data_pg_head)); chg_node->data_free += (num_pages * INM_PAGESZ); tgt_ctxt->tc_stats.num_pages_allocated += num_pages; chg_node->changes.num_data_pgs += num_pages; } if(new) { chg_node->cur_data_pg = PG_ENTRY(chg_node->data_pg_head.next); chg_node->cur_data_pg_off = sv_pref_sz; chg_node->data_free -= sv_const_sz; } if (chg_node->wostate != ecWriteOrderStateData) { ts_delta.td_time = 0; ts_delta.td_seqno = 0; } dblock.uiTimeDelta = ts_delta.td_time; dblock.uiSequenceNumberDelta = ts_delta.td_seqno; update_change_node(chg_node, wmd, &ts_delta); copy_data_to_data_pages((void *)&db_prefix, sizeof(SVD_PREFIX), chg_node); copy_data_to_data_pages((void *)&dblock, sizeof(SVD_DIRTY_BLOCK_V2), chg_node); #ifdef INM_AIX inm_copy_buf_data_to_datapgs(wdatap, &chg_node->next, map_addr); #else INM_BUG_ON(!wdatap->wd_copy_wd_to_datapgs); (*wdatap->wd_copy_wd_to_datapgs)(tgt_ctxt, wdatap, &chg_node->next); #endif inm_xm_det(wdatap, map_addr); chg_node->data_free -= chg_sz; chg_node->stream_len += (sv_chg_sz + dblock.Length); tgt_ctxt->tc_pending_changes++; INM_BUG_ON(tgt_ctxt->tc_pending_changes < 0 ); tgt_ctxt->tc_bytes_pending_changes += dblock.Length; add_changes_to_pending_changes(tgt_ctxt, chg_node->wostate, 1); if(should_write_to_datafile(tgt_ctxt)) { chg_node = get_change_node_to_save_as_file(tgt_ctxt); if(chg_node) { queue_chg_node_to_file_thread(tgt_ctxt, chg_node); 
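/*
 * Drop the reference obtained via get_change_node_to_save_as_file();
 * the file thread is assumed to keep its own reference on the queued
 * node (an assumption, not stated in the original source).
 */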
deref_chg_node(chg_node); } } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } void save_data_in_data_mode(target_context_t *tgt_ctxt, write_metadata_t *wmd, inm_wdata_t *wdatap) { inm_u32_t nr_changeNodes; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered %u", wmd->length); } /* split the IO larger than MAX_DATA_SIZE_PER_DATA_MODE_CHANGE_NODE */ if ( (sv_const_sz + sv_chg_sz + wmd->length) > driver_ctx->tunable_params.max_data_sz_dm_cn) { #ifdef INM_AIX dbg("switching to metadata mode due to split I/O of length %d\n", wmd->length); set_tgt_ctxt_filtering_mode(tgt_ctxt, FLT_MODE_METADATA, FALSE); if(ecWriteOrderStateData == tgt_ctxt->tc_cur_wostate) set_tgt_ctxt_wostate(tgt_ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonUnInitialized); if (save_data_in_metadata_mode(tgt_ctxt, wmd, wdatap)) err("save_data_in_metadata_mode failed"); tgt_ctxt->tc_nr_spilt_io_data_mode++; #else if (wdatap->wd_flag & INM_WD_WRITE_OFFLOAD) { err("Offload write greater than change node size %llu:%u", wmd->offset, wmd->length); queue_worker_routine_for_set_volume_out_of_sync(tgt_ctxt, ERROR_TO_REG_UNSUPPORTED_IO, -EOPNOTSUPP); INM_BUG_ON(wdatap->wd_flag & INM_WD_WRITE_OFFLOAD); } else { nr_changeNodes = inm_split_change_in_data_mode(tgt_ctxt, wmd, wdatap); if (!nr_changeNodes) err("Memory allocation failure\n"); } #endif } else { save_data_in_data_mode_normal(tgt_ctxt, wmd, wdatap); } is_metadata_due_to_delay_alloc(tgt_ctxt); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } /* This gets called when driver detects that drainer is going down. */ void data_mode_cleanup_for_s2_exit() { if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } } void finalize_data_stream(change_node_t *node) { SVD_PREFIX prefix; SVD_HEADER1 hdr; SVD_TIME_STAMP_V2 svd_ts; inm_u64_t drtd_chgs; inm_u32_t endian_tag = 0; INM_BUG_ON(inm_list_empty(&node->data_pg_head)); prefix.Flags = 0; prefix.count = 1; memcpy_s(&svd_ts, sizeof(STREAM_REC_HDR_4B), (void *)&node->changes.start_ts, sizeof(STREAM_REC_HDR_4B)); /* Write TOLC. */ prefix.tag = SVD_TAG_TIME_STAMP_OF_LAST_CHANGE_V2; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), node); svd_ts.ullSequenceNumber = node->changes.end_ts.ullSequenceNumber; svd_ts.TimeInHundNanoSecondsFromJan1601 = node->changes.end_ts.TimeInHundNanoSecondsFromJan1601; copy_data_to_data_pages(&svd_ts, sizeof(SVD_TIME_STAMP_V2), node); node->stream_len += sv_ts_sz; /* Reset the pointers to start of the data page. 
*/ update_cur_dat_pg(node, PG_ENTRY(node->data_pg_head.next), 0); /*Write Endian string */ if (inm_is_little_endian()) { endian_tag = SVD_TAG_LEFORMAT; } else { endian_tag = SVD_TAG_BEFORMAT; } copy_data_to_data_pages(&endian_tag, sizeof(endian_tag), node); node->stream_len += sizeof(endian_tag); /* Write SVD_HEADER1 */ prefix.tag = SVD_TAG_HEADER1; INM_MEM_ZERO(&hdr, sizeof(SVD_HEADER1)); copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), node); copy_data_to_data_pages(&hdr, sizeof(SVD_HEADER1), node); node->stream_len += sv_hdr_sz; /* Write TOFC */ prefix.tag = SVD_TAG_TIME_STAMP_OF_FIRST_CHANGE_V2; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), node); svd_ts.ullSequenceNumber = node->changes.start_ts.ullSequenceNumber; svd_ts.TimeInHundNanoSecondsFromJan1601 = node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; copy_data_to_data_pages(&svd_ts, sizeof(SVD_TIME_STAMP_V2), node); node->stream_len += sv_ts_sz; /* Write DRTD changes */ prefix.tag = SVD_TAG_LENGTH_OF_DRTD_CHANGES; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), node); drtd_chgs = get_drtd_len(node); copy_data_to_data_pages(&drtd_chgs, sizeof(inm_u64_t), node); node->stream_len += sv_drtd_sz; node->flags |= CHANGE_NODE_DATA_STREAM_FINALIZED; if (verify_change_node_file(node)) err("File bad on finalize"); } void recalc_data_file_mode_thres(void) { unsigned long lock_flag; inm_u32_t free_pages_thres = 0; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); free_pages_thres = (((driver_ctx->data_flt_ctx.pages_allocated - driver_ctx->dc_cur_res_pages)* (driver_ctx->tunable_params.free_percent_thres_for_filewrite)) / 100); INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); driver_ctx->tunable_params.free_pages_thres_for_filewrite = free_pages_thres; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); } inm_s32_t add_tag_in_stream_mode(tag_volinfo_t *tag_volinfop, tag_info_t *tag_buf, inm_s32_t num_tags, tag_guid_t *tag_guid, inm_s32_t index) { target_context_t *ctxt = tag_volinfop->ctxt; change_node_t *chg_node = NULL; inm_s32_t bytes_req = 0; inm_u32_t num_pages = 0; struct inm_list_head pg_head; SVD_PREFIX prefix; SVD_HEADER1 hdr; SVD_TIME_STAMP_V2 svd_ts; inm_s32_t idx = 0; inm_u32_t endian_tag = 0, status = 1; #ifdef INM_AIX inm_wdata_t wdata; #endif dbg("Issuing tag in stream mode"); prefix.Flags = 0; prefix.count = 1; if(tag_buf->tag_len == 0){ if(tag_guid) tag_guid->status[index] = STATUS_FAILURE; return 1; } INM_INIT_LIST_HEAD(&pg_head); bytes_req += (sizeof(endian_tag) + sv_hdr_sz + (2 * sv_ts_sz)); while(idx < num_tags) { bytes_req += (sizeof(SVD_PREFIX) + tag_buf->tag_len); idx++; } idx = 0; num_pages = bytes_to_pages(bytes_req); if(!get_data_pages(ctxt, &pg_head, num_pages)) { set_tgt_ctxt_filtering_mode(ctxt, FLT_MODE_METADATA, FALSE); if(ecWriteOrderStateData == ctxt->tc_cur_wostate) set_tgt_ctxt_wostate(ctxt, ecWriteOrderStateMetadata, FALSE, ecWOSChangeReasonUnInitialized); is_metadata_due_to_delay_alloc(ctxt); status = 0; goto out_err; } #ifdef INM_AIX INM_MEM_ZERO(&wdata, sizeof(inm_wdata_t)); wdata.wd_chg_node = tag_volinfop->chg_node; wdata.wd_meta_page = tag_volinfop->meta_page; chg_node = get_change_node_for_usertag(ctxt, &wdata, TAG_COMMIT_NOT_PENDING); tag_volinfop->chg_node = wdata.wd_chg_node; tag_volinfop->meta_page = wdata.wd_meta_page; #else chg_node = get_change_node_for_usertag(ctxt, NULL, TAG_COMMIT_NOT_PENDING); #endif if(!chg_node) { 
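/* Change node allocation failed: return the pre-allocated tag data
 * pages to the global pool before reporting the failure.
 */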
inm_rel_data_pages(ctxt, &pg_head, num_pages); err("Failed to get change node for adding tag"); status = 0; goto out_err; } inm_list_splice_at_tail(&pg_head, &(chg_node->data_pg_head)); chg_node->data_free += (num_pages * INM_PAGESZ); ctxt->tc_stats.num_pages_allocated += num_pages; chg_node->changes.num_data_pgs += num_pages; update_cur_dat_pg(chg_node, PG_ENTRY(chg_node->data_pg_head.next), 0); /* Write Endian string */ if (inm_is_little_endian()) { endian_tag = SVD_TAG_LEFORMAT; } else { endian_tag = SVD_TAG_BEFORMAT; } copy_data_to_data_pages(&endian_tag, sizeof(endian_tag), chg_node); chg_node->stream_len += sizeof(endian_tag); /* Copy SVD_HEADER1 */ prefix.tag = SVD_TAG_HEADER1; INM_MEM_ZERO(&hdr, sizeof(SVD_HEADER1)); copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), chg_node); copy_data_to_data_pages(&hdr, sizeof(SVD_HEADER1), chg_node); chg_node->stream_len += sv_hdr_sz; /* Copy TOFC */ prefix.tag = SVD_TAG_TIME_STAMP_OF_FIRST_CHANGE_V2; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), chg_node); memcpy_s(&svd_ts.Header, sizeof(STREAM_REC_HDR_4B), (void *)&chg_node->changes.start_ts.Header, sizeof(STREAM_REC_HDR_4B)); svd_ts.TimeInHundNanoSecondsFromJan1601 = chg_node->changes.start_ts.TimeInHundNanoSecondsFromJan1601; svd_ts.ullSequenceNumber = chg_node->changes.start_ts.ullSequenceNumber; copy_data_to_data_pages(&svd_ts, sizeof(SVD_TIME_STAMP_V2), chg_node); chg_node->stream_len += sv_ts_sz; while(idx < num_tags) { /* Copy Tag */ prefix.tag = SVD_TAG_USER; prefix.Flags = tag_buf->tag_len; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), chg_node); copy_data_to_data_pages(&tag_buf->tag_name[0], tag_buf->tag_len, chg_node); chg_node->stream_len += (sizeof(SVD_PREFIX) + tag_buf->tag_len); tag_buf++; idx++; } /* Copy TOLC */ prefix.tag = SVD_TAG_TIME_STAMP_OF_LAST_CHANGE_V2; copy_data_to_data_pages(&prefix, sizeof(SVD_PREFIX), chg_node); svd_ts.TimeInHundNanoSecondsFromJan1601 = chg_node->changes.end_ts.TimeInHundNanoSecondsFromJan1601; svd_ts.ullSequenceNumber = chg_node->changes.end_ts.ullSequenceNumber; copy_data_to_data_pages(&svd_ts, sizeof(SVD_TIME_STAMP_V2), chg_node); chg_node->stream_len += sv_ts_sz; chg_node->flags |= CHANGE_NODE_DATA_STREAM_FINALIZED; chg_node->flags |= CHANGE_NODE_TAG_IN_STREAM; if(tag_guid){ tag_guid->status[index] = STATUS_PENDING; chg_node->tag_status_idx = index; } chg_node->tag_guid = tag_guid; dbg("Tag Issued Successfully to volume %s", ctxt->tc_guid); goto out; out_err: if(tag_guid) tag_guid->status[index] = STATUS_FAILURE; out: return status; } static inm_s32_t is_metadata_due_to_delay_alloc(target_context_t *tcp) { inm_u32_t data_pool_size = 0; unsigned long lock_flag; inm_s32_t ret = 0; if(tcp->tc_cur_mode != FLT_MODE_METADATA){ goto out; } INM_SPIN_LOCK_IRQSAVE(&driver_ctx->tunables_lock, lock_flag); data_pool_size = driver_ctx->tunable_params.data_pool_size; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->tunables_lock, lock_flag); data_pool_size <<= (MEGABYTE_BIT_SHIFT-INM_PAGESHIFT); INM_SPIN_LOCK_IRQSAVE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); if(driver_ctx->data_flt_ctx.pages_allocated < data_pool_size){ ret = 1; } INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->data_flt_ctx.data_pages_lock, lock_flag); out: if(ret){ INM_ATOMIC_INC(&(tcp->tc_stats.metadata_trans_due_to_delay_alloc)); } return ret; } static inm_s32_t set_all_data_mode(void) { inm_list_head_t *ptr = NULL, *nextptr = NULL; target_context_t *tcp = NULL; inm_s32_t ret = 0; INM_DOWN_READ(&driver_ctx->tgt_list_sem); inm_list_for_each_safe(ptr, nextptr, 
&driver_ctx->tgt_list) { tcp = inm_list_entry(ptr, target_context_t, tc_list); if(tcp && tcp->tc_cur_mode == FLT_MODE_METADATA){ set_tgt_ctxt_filtering_mode(tcp, FLT_MODE_DATA, FALSE); } tcp = NULL; } INM_UP_READ(&driver_ctx->tgt_list_sem); return ret; } involflt-0.1.0/src/file-io.c0000755000000000000000000007353214467303177014353 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "involflt.h" #include "involflt-common.h" #include "data-mode.h" #include "change-node.h" #include "filestream.h" #include "iobuffer.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "bitmap_api.h" #include "VBitmap.h" #include "work_queue.h" #include "data-file-mode.h" #include "target-context.h" #include "driver-context.h" #include "file-io.h" #include #include #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) #include #include #endif extern driver_context_t *driver_ctx; #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,13) #define RHEL_OLD #endif inm_s32_t flt_open_file (const char *fname, inm_u32_t mode, void **hdl) { struct file *fp = NULL; mm_segment_t fs; inm_s32_t err = 1; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered fname:%s", fname); } INM_BUG_ON (fname == NULL); *hdl = NULL; fs = get_fs (); set_fs (KERNEL_DS); fp = filp_open (fname, mode, 0644); if (IS_ERR (fp)) { dbg ("Not able to open %s", fname); err = 0; goto filp_open_failed; } #if LINUX_VERSION_CODE < KERNEL_VERSION(3,7,0) if ((!fp->f_op->write) || !(fp->f_op->read)) { dbg ("No write found for the fild id %p", fp); err = 0; filp_close (fp, NULL); goto filp_open_failed; } #endif *hdl = (void *) fp; set_fs (fs); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return 1; filp_open_failed: set_fs (fs); if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return err; } inm_s32_t flt_read_file (void *hdl, void *buffer, inm_u64_t offset, inm_u32_t length, inm_u32_t *bytes_read) { struct file *fp; ssize_t read; mm_segment_t fs; INM_BUG_ON ((hdl == NULL) || (buffer == NULL)); fp = (struct file *) hdl; if (bytes_read != NULL) *bytes_read = 0; fs = get_fs (); set_fs (KERNEL_DS); #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0) read = kernel_read(fp, (char *) buffer, length, (loff_t *) & offset); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,7,0) read = vfs_read(fp, (char *) buffer, length, (loff_t *) & offset); #else read = fp->f_op->read (fp, (char *) buffer, length, (loff_t *) & offset); #endif #endif set_fs (fs); if (bytes_read != NULL) *bytes_read = 0; if (read <= 0) { return 0; } if (bytes_read != NULL) *bytes_read = read; return 1; } inm_s32_t flt_write_file (void *hdl, void *buffer, inm_u64_t offset, inm_u32_t length, inm_u32_t *bytes_written) { struct file *fp; 
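/*
 * The write below is dispatched on kernel version, mirroring
 * flt_read_file() above: kernel_write() on kernels >= 4.14,
 * vfs_write() on >= 3.7, and the raw fp->f_op->write() hook on
 * anything older.
 */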
ssize_t NrWritten; mm_segment_t fs; INM_BUG_ON ((hdl == NULL) || (buffer == NULL)); fp = (struct file *) hdl; fs = get_fs (); set_fs (KERNEL_DS); fp->f_pos = offset; #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,14,0) NrWritten = kernel_write(fp, (char *) buffer, length, (loff_t *) & offset); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,7,0) NrWritten = vfs_write(fp, (char *) buffer, length, (loff_t *) & offset); #else NrWritten = fp->f_op->write (fp, (char *) buffer, length, (loff_t *) & offset); #endif #endif fp->f_pos = offset; set_fs (fs); if (bytes_written != NULL) *bytes_written = 0; if (NrWritten <= 0) { dbg("write failed with error 0x%x", (int32_t) NrWritten); return 0; } if (bytes_written != NULL) *bytes_written = NrWritten; return 1; } inm_s32_t flt_seek_file (void *hdl, inm_s64_t offset, inm_s64_t * newoffset, inm_u32_t seekpos) { struct file *fp; loff_t FinalOffset; mm_segment_t fs; INM_BUG_ON (hdl == NULL); fp = (struct file *) hdl; fs = get_fs (); set_fs (KERNEL_DS); if (fp->f_op && fp->f_op->llseek) { FinalOffset = fp->f_op->llseek (fp, (loff_t) offset, seekpos); } else { FinalOffset = default_llseek (fp, (loff_t) offset, seekpos); } set_fs (fs); if (FinalOffset < 0) { dbg ("llSsek to offset[%lld] from [%d] in file-id[%p] failed.", offset, seekpos, fp); return 0; } *newoffset = (int64_t) FinalOffset; return 1; } inm_s32_t flt_get_file_size (void *hdl, loff_t *fs_size) { struct file *fp; loff_t size; struct address_space *mapping; struct inode *inode; INM_BUG_ON ((hdl == NULL) || (fs_size == NULL)); fp = (struct file *) hdl; mapping = fp->f_mapping; inode = mapping->host; size = i_size_read (inode); *fs_size = (inm_u64_t)size; return 1; } void flt_close_file (void *hdl) { INM_BUG_ON (hdl == NULL); filp_close ((struct file *) hdl, NULL); } inm_s32_t read_full_file(char *filename, char *buffer, inm_u32_t length, inm_u32_t *bytes_read) { inm_u32_t read = 0; void *hdl = NULL; loff_t size = 0; inm_s32_t ret = 1; INM_MEM_ZERO(buffer, length); if (!flt_open_file (filename, O_RDONLY, &hdl)) { dbg("involflt: Opening file %s failed.", filename); return 0; } if(!flt_get_file_size(hdl, &size)) { dbg("flt_get_file_size failed"); ret = 0; goto close_return; } if(length < size) { dbg("Insufficient buffer specified in read_full_file"); ret = 0; goto close_return; } do { if (!flt_read_file(hdl, buffer, 0, length, (inm_s32_t *) &read)) { dbg("flt_read_file failed for %s", filename); ret = 0; goto close_return; } *bytes_read += read; } while (0); close_return: flt_close_file(hdl); return ret; } int32_t __write_to_file(char *filename, void *buffer, inm_s32_t length, inm_u32_t * bytes_written, int oflag) { void *hdl = NULL; int32_t success = 1; if (1 != flt_open_file(filename, O_RDWR | O_CREAT | O_SYNC | oflag, &hdl)) { dbg("involflt: Opening file %s failed.", filename); return 0; } do { if (!flt_write_file(hdl, buffer, 0, length, (inm_s32_t *) bytes_written)) { dbg("involflt: file write failed"); success = 0; break; } } while (0); flt_close_file (hdl); return success; } int32_t write_full_file(char *filename, void *buffer, inm_s32_t length, inm_u32_t * bytes_written) { return __write_to_file(filename, buffer, length, bytes_written, O_TRUNC); } int32_t write_to_file(char *filename, void *buffer, inm_s32_t len, inm_u32_t *written) { return __write_to_file(filename, buffer, len, written, 0); } int file_exists(char *name) { void *hdl = NULL; if (flt_open_file(name, O_RDONLY, &hdl)) { flt_close_file(hdl); return 1; } return 0; } #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0) #if 
LINUX_VERSION_CODE > KERNEL_VERSION(2,6,15) static struct dentry * inm_cached_lookup(struct dentry *base, struct qstr *qstr, struct nameidata *nameidata) { struct dentry *dir = d_lookup(base, qstr); if (!dir) dir = d_lookup(base, qstr); if (dir && dir->d_op && dir->d_op->d_revalidate) { if (!dir->d_op->d_revalidate(dir, nameidata) && !d_invalidate(dir)) { dput(dir); dir = NULL; } } return dir; } static struct dentry * inm__lookup_hash(struct qstr *qstr, struct dentry *base_dir, struct nameidata *nameidata) { struct dentry *dir; struct inode *vfs_inode; inm_s32_t error; vfs_inode = base_dir->d_inode; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) error = inode_permission(vfs_inode, MAY_EXEC); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) error = vfs_permission(nameidata, MAY_EXEC); #else error = permission(vfs_inode, MAY_EXEC, nameidata); #endif #endif dir= ERR_PTR(error); if (error) goto out; if (base_dir->d_op && base_dir->d_op->d_hash) { #if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,35) error = base_dir->d_op->d_hash(base_dir, qstr); #else error = base_dir->d_op->d_hash(base_dir, base_dir->d_inode, qstr); #endif dir = ERR_PTR(error); if (error < 0) goto out; } dir = inm_cached_lookup(base_dir, qstr, nameidata); if (!dir) { struct dentry *new_dir = d_alloc(base_dir, qstr); dir = ERR_PTR(-ENOMEM); if (!new_dir) goto out; dir = vfs_inode->i_op->lookup(vfs_inode, new_dir, nameidata); if (!dir) dir = new_dir; else dput(new_dir); } out: return dir; } static struct dentry *inm_lookup_hash(struct nameidata *nameidata) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) return inm__lookup_hash(&nameidata->last, nameidata->path.dentry, nameidata); #else return inm__lookup_hash(&nameidata->last, nameidata->dentry, nameidata); #endif } #endif struct dentry * inm_lookup_create(struct nameidata *nameidata, inm_s32_t is_dentry) { struct dentry *dir = ERR_PTR(-EEXIST); #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,15) #ifdef CONFIG_DEBUG_LOCK_ALLOC #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock_nested(&nameidata->path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); #else mutex_lock_nested(&nameidata->dentry->d_inode->i_mutex, I_MUTEX_PARENT); #endif #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock(&nameidata->path.dentry->d_inode->i_mutex); #else mutex_lock(&nameidata->dentry->d_inode->i_mutex); #endif #endif if (nameidata->last_type != LAST_NORM) goto fail; nameidata->flags &= ~LOOKUP_PARENT; nameidata->flags |= LOOKUP_CREATE; #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 8, 13) nameidata->intent.open.flags = O_EXCL; #endif dir = inm_lookup_hash(nameidata); #else INM_DOWN(&nameidata->dentry->d_inode->i_sem); if (nameidata->last_type != LAST_NORM) goto fail; nameidata->flags &= ~LOOKUP_PARENT; #ifdef CONFIG_KGDB dir = lookup_hash(nameidata); #else dir = lookup_hash(&nameidata->last, nameidata->dentry); #endif #endif if (IS_ERR(dir)) goto fail; if (!is_dentry && nameidata->last.name[nameidata->last.len] && !dir->d_inode) goto enoent; return dir; enoent: dput(dir); dir = ERR_PTR(-ENOENT); fail: return dir; } #endif long inm_mkdir(char *dir_name, inm_s32_t mode) { #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,15) inm_s32_t err = 0; char *tmp; tmp = dir_name; do { struct dentry *dir; inm_lookup_t nameidata; #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 2, 0) err = inm_path_lookup_parent(tmp, &nameidata); if (err) { err("Error in path_lookup\n"); break; } dir = inm_lookup_create(&nameidata, 1); #else dir = kern_path_create(AT_FDCWD, tmp, &nameidata.path, LOOKUP_DIRECTORY); #endif err 
= PTR_ERR(dir); if (!IS_ERR(dir)) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) if (!IS_POSIXACL(nameidata.path.dentry->d_inode)) #else if (!IS_POSIXACL(nameidata.dentry->d_inode)) #endif mode &= ~current->fs->umask; dbg("Coming in inm_mkdir befor vfs_mkdir\n"); #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,12,0) err = vfs_mkdir(mnt_user_ns(nameidata.path.mnt), nameidata.path.dentry->d_inode, dir, mode); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) err = vfs_mkdir(nameidata.path.dentry->d_inode, dir, mode); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) err = vfs_mkdir(nameidata.path.dentry->d_inode, dir, nameidata.path.mnt, mode); #else #if (defined suse && DISTRO_VER==10 && PATCH_LEVEL>=2) || \ LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) err = vfs_mkdir(nameidata.dentry->d_inode, dir, nameidata.mnt, mode); #else err = vfs_mkdir(nameidata.dentry->d_inode, dir, mode); #endif #endif #endif #endif dbg("Coming in inm_mkdir after vfs_mkdir\n"); #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 8, 13) dput(dir); #else done_path_create(&nameidata.path, dir); #endif } #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 8, 13) #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 2, 0) if(IS_ERR(dir)) { break; } #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_unlock(&nameidata.path.dentry->d_inode->i_mutex); #else mutex_unlock(&nameidata.dentry->d_inode->i_mutex); #endif inm_path_release(&nameidata); #endif } while(0); return err; #else inm_s32_t err = 0; struct dentry *dir; struct nameidata nameidata; err = inm_path_lookup_parent(dir_name, &nameidata); if (err) return err; dir = inm_lookup_create(&nameidata, 1); err = PTR_ERR(dir); if (!IS_ERR(dir)) { err = vfs_mkdir(nameidata.dentry->d_inode, dir, mode); dput(dir); } INM_UP(&nameidata.dentry->d_inode->i_sem); path_release(&nameidata); return err; #endif } #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,13,0) long __inm_unlink(const char * pathname, char *parent_path) { inm_s32_t error = 0; char *name = (char *)pathname; struct inode *parent_inode = NULL; struct file *parent_hdl = NULL; struct path path; struct dentry *dentry = NULL; struct inode *deleg = NULL; int retry = 0; dbg("Unlink called on %s, parent = %s", pathname, parent_path); parent_hdl = filp_open(parent_path, O_DIRECTORY, 0); if (IS_ERR(parent_hdl)) error = PTR_ERR(parent_hdl); if (!error) { parent_inode = INM_HDL_TO_INODE(parent_hdl); do { deleg = NULL; retry = 0; error = kern_path(name, 0, &path); if (!error) { dentry = path.dentry; #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0) inode_lock_nested(parent_inode, I_MUTEX_PARENT); #else mutex_lock_nested(&parent_inode->i_mutex, I_MUTEX_PARENT); #endif #if LINUX_VERSION_CODE >= KERNEL_VERSION(5,12,0) error = vfs_unlink(mnt_user_ns(path.mnt), parent_inode, dentry, &deleg); #else error = vfs_unlink(parent_inode, dentry, &deleg); #endif if (error && !deleg) err("vfs_unlink failed with error %d", error); #if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0) inode_unlock(parent_inode); #else mutex_unlock(&parent_inode->i_mutex); #endif path_put(&path); } else { dbg("Path lookup failed: %d", error); } if (deleg) { error = break_deleg_wait(&deleg); if (error) err("Cannot break delegation %p," " error %d", deleg, error); else retry = 1; } } while(retry); /* cant use deleg as deleg=NULL in break_deleg_wait()*/ filp_close(parent_hdl, NULL); } else { dbg("Parent path %s open failed: %d", parent_path, error); } return error; } #elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) long __inm_unlink(const char * pathname, char *unused) { inm_s32_t 
error = 0; char *name = (char *)pathname; struct path parent_path, path; struct dentry *dentry = NULL; struct inode *deleg = NULL; int retry = 0; dbg("Unlink called on %s", pathname); error = kern_path(name, LOOKUP_DIRECTORY | LOOKUP_PARENT, &parent_path); if (!error) { do { deleg = NULL; retry = 0; error = kern_path(name, 0, &path); if (!error) { dentry = path.dentry; mutex_lock_nested(&parent_path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); error = vfs_unlink(parent_path.dentry->d_inode, dentry, &deleg); if (error && !deleg) err("vfs_unlink failed with error %d\n", error); mutex_unlock(&parent_path.dentry->d_inode->i_mutex); path_put(&path); } else { dbg("Path lookup failed: %d", error); } if (deleg) { error = break_deleg_wait(&deleg); if (error) err("Cannot break delegation %p," " error %d", deleg, error); else retry = 1; } } while(retry); /* can't use deleg here as break_deleg_wait() sets deleg = NULL */ path_put(&parent_path); } else { dbg("Parent path lookup failed: %d", error); } return error; } #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) long __inm_unlink(const char * pathname, char *unused) { inm_s32_t error = 0; char *name = (char *)pathname; struct path parent_path, path; struct dentry *dentry; error = kern_path(name, LOOKUP_DIRECTORY | LOOKUP_PARENT, &parent_path); if (error) { dbg("Parent path lookup failed"); goto out; } error = kern_path(name, 0, &path); if (error) { dbg("Path lookup failed"); goto out_1; } dentry = path.dentry; mutex_lock_nested(&parent_path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); error = vfs_unlink(parent_path.dentry->d_inode, dentry); if (error) err("vfs_unlink failed with error = %d\n", error); mutex_unlock(&parent_path.dentry->d_inode->i_mutex); path_put(&path); out_1: path_put(&parent_path); out: return error; } #else long __inm_unlink(const char *path, char *unused) { #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,15) inm_s32_t err = 0; char *pathname; struct dentry *dir; struct nameidata nameidata; struct inode *vfs_inode = NULL; pathname = (char *) path; err = inm_path_lookup_parent(pathname, &nameidata); if (err) goto exit; err = -EISDIR; if (nameidata.last_type != LAST_NORM) goto exit1; #ifdef CONFIG_DEBUG_LOCK_ALLOC #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock_nested(&nameidata.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); #else mutex_lock_nested(&nameidata.dentry->d_inode->i_mutex, I_MUTEX_PARENT); #endif #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock(&nameidata.path.dentry->d_inode->i_mutex); #else mutex_lock(&nameidata.dentry->d_inode->i_mutex); #endif #endif dir = inm_lookup_hash(&nameidata); err = PTR_ERR(dir); if (!IS_ERR(dir)) { if (nameidata.last.name[nameidata.last.len]) goto slashes; vfs_inode = dir->d_inode; if (vfs_inode) INM_ATOMIC_INC(&vfs_inode->i_count); #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) err = vfs_unlink(nameidata.path.dentry->d_inode, dir); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) err = vfs_unlink(nameidata.path.dentry->d_inode, dir, nameidata.path.mnt); #else #if (defined suse && DISTRO_VER==10 && PATCH_LEVEL>=2) || \ LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) err = vfs_unlink(nameidata.dentry->d_inode, dir, nameidata.mnt); #else err = vfs_unlink(nameidata.dentry->d_inode, dir); #endif #endif #endif exit2: dput(dir); } #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_unlock(&nameidata.path.dentry->d_inode->i_mutex); #else mutex_unlock(&nameidata.dentry->d_inode->i_mutex); #endif if (vfs_inode) iput(vfs_inode); exit1: inm_path_release(&nameidata); exit:
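/*
 * The two delegation-aware __inm_unlink() variants above share the same
 * retry idiom: if vfs_unlink() finds an outstanding NFS delegation it
 * hands the delegated inode back, the caller waits for the delegation
 * to be broken, and the unlink is retried. A condensed, minimal sketch
 * of that idiom, assuming a 4.5 to 5.11 kernel (i.e. vfs_unlink()
 * without the user-namespace argument) and <linux/fs.h>/<linux/namei.h>;
 * inm_unlink_idiom() is a hypothetical name, everything else is the
 * stock VFS API:
 *
 *   static int inm_unlink_idiom(struct inode *parent, const char *name)
 *   {
 *           struct inode *deleg;
 *           struct path path;
 *           int error, retry;
 *
 *           do {
 *                   retry = 0;
 *                   deleg = NULL;
 *                   error = kern_path(name, 0, &path);
 *                   if (error)
 *                           break;
 *                   inode_lock_nested(parent, I_MUTEX_PARENT);
 *                   error = vfs_unlink(parent, path.dentry, &deleg);
 *                   inode_unlock(parent);
 *                   path_put(&path);
 *                   if (deleg) {
 *                           // wait for the lease holder, then retry once
 *                           error = break_deleg_wait(&deleg);
 *                           if (!error)
 *                                   retry = 1;
 *                   }
 *           } while (retry);
 *
 *           return error;
 *   }
 */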
return err; slashes: err = !dir->d_inode ? -ENOENT : S_ISDIR(dir->d_inode->i_mode) ? -EISDIR : -ENOTDIR; goto exit2; #else inm_s32_t err = 0; char * pathname; struct dentry *dir; struct nameidata nameidata; struct inode *vfs_inode = NULL; pathname = (char *) path; err = inm_path_lookup_parent(pathname, &nameidata); if (err) goto exit; err = -EISDIR; if (nameidata.last_type != LAST_NORM) goto exit1; INM_DOWN(&nameidata.dentry->d_inode->i_sem); #ifdef CONFIG_KGDB dir = lookup_hash(&nameidata); #else dir = lookup_hash(&nameidata.last, nameidata.dentry); #endif err = PTR_ERR(dir); if (!IS_ERR(dir)) { if (nameidata.last.name[nameidata.last.len]) goto slashes; vfs_inode = dir->d_inode; if (vfs_inode) INM_ATOMIC_INC(&vfs_inode->i_count); err = vfs_unlink(nameidata.dentry->d_inode, dir); exit2: dput(dir); } INM_UP(&nameidata.dentry->d_inode->i_sem); if (vfs_inode) iput(vfs_inode); exit1: path_release(&nameidata); exit: return err; slashes: err = !dir->d_inode ? -ENOENT : S_ISDIR(dir->d_inode->i_mode) ? -EISDIR : -ENOTDIR; goto exit2; #endif } #endif #endif long inm_unlink(const char *pathname, char *parent) { if (file_exists((char *)pathname)) return __inm_unlink(pathname, parent); return 0; } inm_s32_t inm_unlink_symlink(const char * pathname, char *parent_path) { struct file *filp = NULL; int unlink = 0; filp = filp_open(pathname, O_RDONLY | O_NOFOLLOW, 0777); if (IS_ERR(filp)) { /* O_NOFOLLOW gives ELOOP for slink */ if (PTR_ERR(filp) == -ELOOP) { dbg("%s is a symlink", pathname); unlink = 1; } else { dbg("Cannot open %s", pathname); } } else { dbg("%s is not a symlink", pathname); filp_close(filp, NULL); } if (unlink) { dbg("Deleting existing symlink %s", pathname); inm_unlink(pathname, parent_path); } return 0; } #if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0) long inm_rmdir(const char *name) { #if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,15) inm_s32_t err = 0; char * pathname; struct dentry *dir; struct nameidata nameidata; pathname = (char *) name; err = inm_path_lookup_parent(pathname, &nameidata); if (err) goto exit; switch(nameidata.last_type) { case LAST_DOTDOT: err = -ENOTEMPTY; goto exit1; case LAST_DOT: err = -EINVAL; goto exit1; case LAST_ROOT: err = -EBUSY; goto exit1; } #ifdef CONFIG_DEBUG_LOCK_ALLOC #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock_nested(&nameidata.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT); #else mutex_lock_nested(&nameidata.dentry->d_inode->i_mutex, I_MUTEX_PARENT); #endif #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_lock(&nameidata.path.dentry->d_inode->i_mutex); #else mutex_lock(&nameidata.dentry->d_inode->i_mutex); #endif #endif dir = inm_lookup_hash(&nameidata); err = PTR_ERR(dir); if (!IS_ERR(dir)) { #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,30) err = vfs_rmdir(nameidata.path.dentry->d_inode, dir); #else #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) err = vfs_rmdir(nameidata.path.dentry->d_inode, dir, nameidata.path.mnt); #else #if (defined suse && DISTRO_VER==10 && PATCH_LEVEL>=2) || \ LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24) err = vfs_rmdir(nameidata.dentry->d_inode, dir, nameidata.mnt); #else err = vfs_rmdir(nameidata.dentry->d_inode, dir); #endif #endif #endif dput(dir); } #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,27) mutex_unlock(&nameidata.path.dentry->d_inode->i_mutex); #else mutex_unlock(&nameidata.dentry->d_inode->i_mutex); #endif exit1: inm_path_release(&nameidata); exit: return err; #else inm_s32_t err = 0; char * pathname; struct dentry *dir; struct nameidata nameidata; pathname = (char *)name; err = 
inm_path_lookup_parent(pathname, &nameidata); if (err) goto exit; switch(nameidata.last_type) { case LAST_DOTDOT: err = -ENOTEMPTY; goto exit1; case LAST_DOT: err = -EINVAL; goto exit1; case LAST_ROOT: err = -EBUSY; goto exit1; } INM_DOWN(&nameidata.dentry->d_inode->i_sem); #ifdef CONFIG_KGDB dir = lookup_hash(&nameidata); #else dir = lookup_hash(&nameidata.last, nameidata.dentry); #endif err = PTR_ERR(dir); if (!IS_ERR(dir)) { err = vfs_rmdir(nameidata.dentry->d_inode, dir); dput(dir); } INM_UP(&nameidata.dentry->d_inode->i_sem); exit1: path_release(&nameidata); exit: return err; #endif } #endif void remove_slashes(char *vol_name, char *target) { char * ptr; strcpy_s(target, strlen(vol_name) + 1, vol_name); ptr = target; while (*ptr) { if (*ptr == '/') *ptr = '_'; ptr++; } } #ifdef INM_RECUSIVE_ADSPC inma_ops_t *inm_alloc_inma_ops(void) { inma_ops_t *inma_opsp = NULL; inma_opsp = (inma_ops_t *) INM_KMALLOC(sizeof(inma_ops_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (inma_opsp) inma_opsp->ia_mapping = NULL; return inma_opsp; } void inm_free_inma_ops(inma_ops_t *inma_opsp) { if (inma_opsp) kfree(inma_opsp); } inma_ops_t * inm_get_inmaops_from_aops(const inm_address_space_operations_t *mapping, inm_u32_t unused) { struct inm_list_head *lp = NULL, *np = NULL; inma_ops_t *t_inma_opsp = NULL; INM_BUG_ON(!mapping); inm_list_for_each_safe(lp, np, &driver_ctx->dc_inma_ops_list) { t_inma_opsp = inm_list_entry(lp, inma_ops_t, ia_list); if (mapping == t_inma_opsp->ia_mapping) break; t_inma_opsp = NULL; } return t_inma_opsp; } int inm_prepare_tohandle_recursive_writes(struct inode *inodep) { const inm_address_space_operations_t *mapping= NULL; inma_ops_t *t_inma_opsp = NULL; inm_s32_t ret = -1; if (!inodep || !inodep->i_mapping) { ret = -EINVAL; goto exit; } dbg("%p -> %p", inodep, inodep->i_mapping); mapping = (inm_address_space_operations_t *)inodep->i_mapping; INM_BUG_ON(!mapping); /* new node is required */ INM_DOWN_WRITE(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops(mapping, INM_ORG_ADDR_SPACE_OPS); if (t_inma_opsp) { /* Multiple open files should not have same mapping */ INM_BUG_ON(t_inma_opsp); ret = -EEXIST; goto exit_locked; } t_inma_opsp = inm_alloc_inma_ops(); if (!t_inma_opsp) { ret = -ENOMEM; goto exit_locked; } t_inma_opsp->ia_mapping = mapping; dbg("Add recursive object: Lookup = %p, Mapping = %p", t_inma_opsp, mapping); inm_list_add_tail(&t_inma_opsp->ia_list, &driver_ctx->dc_inma_ops_list); ret = 0; exit_locked: INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); exit: return ret; } void inm_restore_org_addr_space_ops(struct inode *inodep) { inma_ops_t *t_inma_opsp = NULL; INM_DOWN_WRITE(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops(inodep->i_mapping, INM_ORG_ADDR_SPACE_OPS); inm_list_del(&t_inma_opsp->ia_list); INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); dbg("Delete recursive object: Lookup = %p, mapping = %p", t_inma_opsp, t_inma_opsp->ia_mapping); inm_free_inma_ops(t_inma_opsp); } #else /* allocates memory and returns inm_ops_t pointer */ inma_ops_t *inm_alloc_inma_ops(void) { inma_ops_t *inma_opsp = NULL; inm_address_space_operations_t *a_opsp = NULL; a_opsp = (inm_address_space_operations_t *) INM_KMALLOC(sizeof(inm_address_space_operations_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!a_opsp) { goto exit; } inma_opsp = (inma_ops_t *) INM_KMALLOC(sizeof(inma_ops_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!inma_opsp) { kfree(a_opsp); goto exit; } inma_opsp->ia_org_aopsp = NULL; inma_opsp->ia_dup_aopsp = a_opsp; exit: return inma_opsp; } /* frees memory 
associated with inma_opsp ptr */ void inm_free_inma_ops(inma_ops_t *inma_opsp) { if (inma_opsp) { if (inma_opsp->ia_dup_aopsp) { kfree(inma_opsp->ia_dup_aopsp); } kfree(inma_opsp); } } /* walks through the dc_inma_ops_list; if a_opsp matches * ia_org_aopsp/ia_dup_aopsp based on lookup_flag then it returns * the inma_ops_t ptr, otherwise returns NULL. * callers should acquire dc_inmaops_sem before calling this fn */ inma_ops_t *inm_get_inmaops_from_aops(const inm_address_space_operations_t *a_opsp, inm_u32_t lookup_flag) { struct inm_list_head *lp = NULL, *np = NULL; inma_ops_t *t_inma_opsp = NULL; INM_BUG_ON(!a_opsp); INM_BUG_ON(lookup_flag >= INM_MAX_ADDR_OPS); inm_list_for_each_safe(lp, np, &driver_ctx->dc_inma_ops_list) { t_inma_opsp = (inma_ops_t *) inm_list_entry(lp, inma_ops_t, ia_list); if (lookup_flag == INM_DUP_ADDR_SPACE_OPS) { if (a_opsp == t_inma_opsp->ia_dup_aopsp) { break; } } else if (a_opsp == t_inma_opsp->ia_org_aopsp) { break; } t_inma_opsp = NULL; } return t_inma_opsp; } /* this function replaces the file's a_ops with a duplicated a_ops ptr in two * steps: if a duplicate a_ops already exists in the dc_inma_ops_list it is * reused, otherwise a new inma_ops structure is allocated and added to the * dc_inma_ops_list. * * once the appropriate duplicate a_ops exists in the global list, the file's * a_ops ptr is replaced with the duplicate a_ops ptr */ int inm_prepare_tohandle_recursive_writes(struct inode *inodep) { const inm_address_space_operations_t *a_opsp = NULL; inma_ops_t *t_inma_opsp = NULL; inm_s32_t ret = -1; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } if (!inodep || !inodep->i_mapping) { ret = -EINVAL; goto exit; } a_opsp = (inm_address_space_operations_t *)inodep->i_mapping->a_ops; INM_BUG_ON(!a_opsp); INM_DOWN_READ(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops(a_opsp, INM_ORG_ADDR_SPACE_OPS); INM_UP_READ(&driver_ctx->dc_inmaops_sem); if (t_inma_opsp) { goto xchange; } /* new node is required */ INM_DOWN_WRITE(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops(a_opsp, INM_ORG_ADDR_SPACE_OPS); if (t_inma_opsp) { /* somebody else added the new node */ INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); goto xchange; } t_inma_opsp = inm_alloc_inma_ops(); if (!t_inma_opsp) { INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); ret = -ENOMEM; goto exit; } t_inma_opsp->ia_org_aopsp = a_opsp; memcpy_s(t_inma_opsp->ia_dup_aopsp, sizeof(*a_opsp), a_opsp, sizeof(*a_opsp)); inm_list_add_tail(&t_inma_opsp->ia_list, &driver_ctx->dc_inma_ops_list); INM_UP_WRITE(&driver_ctx->dc_inmaops_sem); xchange: (void)xchg(&inodep->i_mapping->a_ops, t_inma_opsp->ia_dup_aopsp); dbg("DAOPS = %p, OAOPS = %p", t_inma_opsp->ia_dup_aopsp, t_inma_opsp->ia_org_aopsp); ret = 0; exit: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return ret; } /* this function restores the original address space operations ptr that was * replaced by inm_prepare_tohandle_recursive_writes() fn */ void inm_restore_org_addr_space_ops(struct inode *inodep) { inma_ops_t *t_inma_opsp = NULL; INM_DOWN_READ(&driver_ctx->dc_inmaops_sem); t_inma_opsp = inm_get_inmaops_from_aops( (inm_address_space_operations_t *)inodep->i_mapping->a_ops, INM_DUP_ADDR_SPACE_OPS); INM_UP_READ(&driver_ctx->dc_inmaops_sem); if (t_inma_opsp) { (void)xchg(&inodep->i_mapping->a_ops, t_inma_opsp->ia_org_aopsp); } } #endif /* wrapper around flt_open_file, this is required only for data files */ inm_s32_t flt_open_data_file (const char *fnamep, inm_u32_t mode, void
**hdlpp) { void *t_hdlp = NULL; mm_segment_t fs; struct inode *inodep = NULL; inm_s32_t ret = 1; if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("entered"); } ret = flt_open_file(fnamep, mode, &t_hdlp); if (ret == 0) { goto exit; } fs = get_fs (); set_fs (KERNEL_DS); inodep = INM_HDL_TO_INODE((struct file *) t_hdlp); inm_prepare_tohandle_recursive_writes(inodep); *hdlpp = t_hdlp; set_fs (fs); exit: if(IS_DBG_ENABLED(inm_verbosity, (INM_IDEBUG | INM_IDEBUG_META))){ info("leaving"); } return ret; } involflt-0.1.0/src/verifier_user.c0000755000000000000000000000271114467303177015667 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "verifier.h" void verify_file(char *fname) { int fd = -1; unsigned long long size = 0; char *buf = NULL; if (!fname) exit(-1); fd = open(fname, O_RDONLY); if (fd < 0) { perror("Open"); exit(-1); } size = lseek(fd, 0, SEEK_END); printf("File Size: %llu\n", size); buf = malloc(size); if (!buf) { perror("Malloc"); exit(-1); } if (lseek(fd, 0, SEEK_SET) != 0) { perror("Seek SET"); exit(-1); } if (size != read(fd, buf, size)) { perror("Read"); exit(-1); } if (inm_verify_change_node_data(buf, size, 1)) { printf("Verification failed"); exit(-1); } } int main(int argc, char *argv[]) { verify_file(argv[1]); } involflt-0.1.0/src/telemetry-types.c0000755000000000000000000002237114467303177016176 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "work_queue.h" #include "utils.h" #include "filestream.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "VBitmap.h" #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "osdep.h" #include "telemetry-types.h" #include "telemetry.h" extern driver_context_t *driver_ctx; #define TELEMETRY_DEFAULT_TAG_GUID "TAG_GUID_UNINITIALIZED" void telemetry_set_dbs(inm_u64_t *state, inm_u64_t flag) { inm_irqflag_t lock_flag; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); *state |= flag; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); } void telemetry_clear_dbs(inm_u64_t *state, inm_u64_t flag) { inm_irqflag_t lock_flag; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); *state &= ~flag; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); } inm_u64_t telemetry_get_dbs(target_context_t *tgt_ctxt, inm_s32_t tag_status, etTagStateTriggerReason reason) { inm_u64_t dbs = 0; inm_irqflag_t lock_flag; INM_SPIN_LOCK_IRQSAVE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); dbs = driver_ctx->dc_tel.dt_blend; if (tgt_ctxt) dbs |= tgt_ctxt->tc_tel.tt_blend; INM_SPIN_UNLOCK_IRQRESTORE(&driver_ctx->dc_tel.dt_dbs_slock, lock_flag); if (tgt_ctxt && tag_status == ecTagStatusDropped) { switch (reason) { case ecBitmapWrite: dbs |= DBS_BITMAP_WRITE; break; case ecFilteringStopped: dbs |= DBS_FILTERING_STOPPED; break; case ecClearDiffs: dbs |= DBS_CLEAR_DIFFERENTIALS; break; case ecNonPagedPoolLimitHitMDMode: dbs |= DBS_NPPOOL_LIMIT_HIT_MD_MODE; break; case ecSplitIOFailed: dbs |= DBS_SPLIT_IO_FAILED; break; case ecOrphan: dbs |= DBS_ORPHAN; break; default: err("Invalid reason %d for dropped tag", reason); break; } } return (dbs | TEL_FLAGS_SET_BY_DRIVER); } void telemetry_tag_stats_record(target_context_t *tgt_ctxt, tgt_stats_t *stats) { stats->ts_pending = tgt_ctxt->tc_bytes_pending_changes; stats->ts_tracked_bytes = tgt_ctxt->tc_bytes_tracked; stats->ts_drained_bytes = tgt_ctxt->tc_bytes_commited_changes; stats->ts_getdb = tgt_ctxt->tc_tel.tt_getdb; stats->ts_commitdb = tgt_ctxt->tc_tel.tt_commitdb; stats->ts_revertdb = tgt_ctxt->tc_tel.tt_revertdb; stats->ts_commitdb_failed = tgt_ctxt->tc_tel.tt_commitdb_failed; stats->ts_nwlb1 = tgt_ctxt->tc_dbcommit_latstat.ls_freq[0]; stats->ts_nwlb2 = tgt_ctxt->tc_dbcommit_latstat.ls_freq[1]; stats->ts_nwlb3 = tgt_ctxt->tc_dbcommit_latstat.ls_freq[2] + tgt_ctxt->tc_dbcommit_latstat.ls_freq[3]; stats->ts_nwlb4 = tgt_ctxt->tc_dbcommit_latstat.ls_freq[4] + tgt_ctxt->tc_dbcommit_latstat.ls_freq[5] + tgt_ctxt->tc_dbcommit_latstat.ls_freq[6]; stats->ts_nwlb5 = tgt_ctxt->tc_dbcommit_latstat.ls_freq[7] + tgt_ctxt->tc_dbcommit_latstat.ls_freq[8]; } inm_u64_t telemetry_get_wostate(target_context_t *tgt_ctxt) { inm_u64_t wo_state = 0; wo_state |= tgt_ctxt->tc_prev_wostate; wo_state = wo_state << 2; wo_state |= tgt_ctxt->tc_cur_wostate; wo_state = wo_state << 2; wo_state |= tgt_ctxt->tc_cur_mode; wo_state |= TEL_FLAGS_SET_BY_DRIVER; return wo_state; } inm_u64_t telemetry_md_capture_reason(target_context_t *tgt_ctxt) { return TEL_FLAGS_SET_BY_DRIVER; } void telemetry_tag_common_put(tag_telemetry_common_t *tag_common) { if (INM_ATOMIC_DEC_AND_TEST(&tag_common->tc_refcnt)) INM_KFREE(tag_common, sizeof(tag_telemetry_common_t), INM_KERNEL_HEAP); } void telemetry_tag_common_get(tag_telemetry_common_t *tag_common) { INM_ATOMIC_INC(&tag_common->tc_refcnt); 
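/*
 * Lifetime sketch for tag_telemetry_common_t, based on the get/put pair
 * above and the allocator below; the caller shown here is hypothetical.
 * telemetry_tag_common_alloc() returns the object with one reference
 * already held, every additional owner takes its own reference, and the
 * final telemetry_tag_common_put() frees the object via
 * INM_ATOMIC_DEC_AND_TEST():
 *
 *   tag_telemetry_common_t *tc;
 *
 *   tc = telemetry_tag_common_alloc(IOCTL_INMAGE_IOBARRIER_TAG_VOLUME);
 *   if (tc) {
 *           telemetry_tag_common_get(tc);  // reference for the tag history
 *           // ... attach tc to a tag_history_t and issue the tag ...
 *           telemetry_tag_common_put(tc);  // tag history owner is done
 *           telemetry_tag_common_put(tc);  // allocator reference; frees tc
 *   }
 */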
} tag_telemetry_common_t * telemetry_tag_common_alloc(inm_s32_t ioctl_cmd) { tag_telemetry_common_t *tag_common = NULL; get_time_stamp(&(driver_ctx->dc_tel.dt_last_tag_request_time)); tag_common = INM_KMALLOC(sizeof(tag_telemetry_common_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!tag_common) goto out; INM_MEM_ZERO(tag_common, sizeof(tag_telemetry_common_t)); switch (ioctl_cmd) { case IOCTL_INMAGE_IOBARRIER_TAG_VOLUME: tag_common->tc_type = ecTagLocalCrash; sprintf_s(tag_common->tc_guid, sizeof(tag_common->tc_guid), "%s", TELEMETRY_DEFAULT_TAG_GUID); break; default: INM_BUG_ON(ioctl_cmd); INM_KFREE(tag_common, sizeof(tag_telemetry_common_t), INM_KERNEL_HEAP); tag_common = NULL; break; } if (tag_common) { tag_common->tc_ioctl_cmd = ioctl_cmd; tag_common->tc_req_time = driver_ctx->dc_tel.dt_last_tag_request_time; INM_ATOMIC_SET(&tag_common->tc_refcnt, 0); telemetry_tag_common_get(tag_common); } out: return tag_common; } void telemetry_tag_history_free(tag_history_t *tag_hist) { target_context_t *tgt_ctxt = (target_context_t *)tag_hist->th_tgt_ctxt; telemetry_tag_common_put(tag_hist->th_tag_common); INM_KFREE(tag_hist, sizeof(tag_history_t), INM_KERNEL_HEAP); put_tgt_ctxt(tgt_ctxt); } tag_history_t * telemetry_tag_history_alloc(target_context_t *tgt_ctxt, tag_telemetry_common_t *tag_common) { tag_history_t *tag_hist = NULL; tag_hist = INM_KMALLOC(sizeof(tag_history_t), INM_KM_SLEEP, INM_KERNEL_HEAP); if (tag_hist) { INM_MEM_ZERO(tag_hist, sizeof(tag_history_t)); get_tgt_ctxt(tgt_ctxt); telemetry_tag_common_get(tag_common); tag_hist->th_tag_common = tag_common; tag_hist->th_tgt_ctxt = tgt_ctxt; } return tag_hist; } void telemetry_tag_history_record(target_context_t *tgt_ctxt, tag_history_t *tag_hist) { get_time_stamp(&tag_hist->th_insert_time); tag_hist->th_prev_tag_time = tgt_ctxt->tc_tel.tt_prev_tag_time; tag_hist->th_prev_succ_tag_time = tgt_ctxt->tc_tel.tt_prev_succ_tag_time; tag_hist->th_tag_status = 0; tag_hist->th_blend = telemetry_get_dbs(tgt_ctxt, ecTagStatusMaxEnum, ecNotApplicable); tag_hist->th_tag_state = ecTagStatusInsertSuccess; tag_hist->th_commit_time = 0; /* Not implemented */ tag_hist->th_drainbarr_time = 0; /* Not implemented */ tag_hist->th_prev_succ_stats = tgt_ctxt->tc_tel.tt_prev_succ_stats; tag_hist->th_prev_stats = tgt_ctxt->tc_tel.tt_prev_stats; telemetry_tag_stats_record(tgt_ctxt, &tag_hist->th_cur_stats); /* update target telemetry last and last succ ts/stats */ tgt_ctxt->tc_tel.tt_prev_tag_time = tag_hist->th_tag_common->tc_req_time; tgt_ctxt->tc_tel.tt_prev_stats = tag_hist->th_cur_stats; tgt_ctxt->tc_tel.tt_prev_succ_tag_time = tag_hist->th_tag_common->tc_req_time; tgt_ctxt->tc_tel.tt_prev_succ_stats = tag_hist->th_cur_stats; return; } void telemetry_nwo_stats_record(target_context_t *tgt_ctxt, etWriteOrderState cur_state, etWriteOrderState new_state, etWOSChangeReason reason) { non_wo_stats_t *nwo = &tgt_ctxt->tc_tel.tt_nwo; nwo->nws_old_state = cur_state; nwo->nws_new_state = new_state; nwo->nws_change_time = tgt_ctxt->tc_stats.st_wostate_switch_time * HUNDREDS_OF_NANOSEC_IN_SECOND; /* Following pending changes should be in bytes */ nwo->nws_meta_pending = tgt_ctxt->tc_pending_md_changes; nwo->nws_bmap_pending = tgt_ctxt->tc_bp->num_changes_queued_for_writing; nwo->nws_data_pending = tgt_ctxt->tc_pending_changes; nwo->nws_nwo_secs = tgt_ctxt->tc_stats.num_secs_in_wostate[cur_state]; nwo->nws_reason = reason; nwo->nws_mem_alloc = tgt_ctxt->tc_stats.num_pages_allocated * PAGE_SIZE; nwo->nws_mem_reserved = tgt_ctxt->tc_reserved_pages * PAGE_SIZE; nwo->nws_mem_free = 
driver_ctx->dc_cur_unres_pages * PAGE_SIZE; nwo->nws_free_cn = 0; nwo->nws_used_cn = tgt_ctxt->tc_nr_cns; nwo->nws_max_used_cn = 0; nwo->nws_blend = telemetry_get_dbs(tgt_ctxt, ecTagStatusMaxEnum, ecNotApplicable); nwo->nws_np_alloc = 0; nwo->nws_np_limit_time = 0; nwo->nws_np_alloc_fail = 0; } void telemetry_check_time_jump(void) { static inm_u64_t prev_time = 0; inm_u64_t cur_time = 0; inm_u64_t diff_time = 0; get_time_stamp(&cur_time); if (prev_time) { /* Expected time */ prev_time += TELEMETRY_MSEC_TO_100NSEC( TELEMETRY_FILE_REFRESH_INTERVAL); diff_time = (cur_time > prev_time) ? cur_time - prev_time : prev_time - cur_time; update_cx_with_time_jump(cur_time, prev_time); if (diff_time > TELEMETRY_ACCEPTABLE_TIME_JUMP_THRESHOLD) { driver_ctx->dc_tel.dt_time_jump_exp = prev_time; driver_ctx->dc_tel.dt_time_jump_cur = cur_time; } } prev_time = cur_time; } involflt-0.1.0/src/statechange.h0000755000000000000000000000211114467303177015303 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_STATECHANGE_H_ #define _INMAGE_STATECHANGE_H_ #include "involflt-common.h" #include "target-context.h" inm_s32_t create_service_thread(void); void destroy_service_thread(void); int service_state_change_thread(void *context); #endif /* _INMAGE_STATECHANGE_H_ */ involflt-0.1.0/src/filestream_raw.c0000755000000000000000000003445214467303177016031 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
*/ #include "involflt.h" #include "work_queue.h" #include "utils.h" #include "filestream.h" #include "filestream_segment_mapper.h" #include "segmented_bitmap.h" #include "VBitmap.h" #include "change-node.h" #include "data-file-mode.h" #include "target-context.h" #include "data-mode.h" #include "driver-context.h" #include "file-io.h" #include "osdep.h" #include "db_routines.h" #include "errlog.h" #include "filestream_raw.h" extern driver_context_t *driver_ctx; static void fstream_raw_print_map(char *file, fstream_raw_hdl_t *hdl) { int i = 0; int j = 0; inm_u32_t nblks = 0; inm_u64_t doffset = 0; inm_u64_t foffset = 0; inm_u32_t len = 0; void *disk = 0; char diskname[INM_BDEVNAME_SIZE]; if (!hdl) { dump_stack(); return; } nblks = hdl->frh_nblks; info("HDL0: %s", file); info("HDL1: fsize = %llu, offset = %llu, len = %u, alen = %u", hdl->frh_fsize, hdl->frh_offset, hdl->frh_len, hdl->frh_alen); info("HDL2: nbsize = %u, bshift = %u, blks = %u, npgs = %u", hdl->frh_bsize, hdl->frh_bshift, hdl->frh_nblks, hdl->frh_npages); info("HDL3: Blkmap"); for (i = 0; i < hdl->frh_npages; i++) { info("page[%d] = %p", i, hdl->frh_blocks[i]); info("%16s %16s %16s %16s", "foffset","disk", "sector", "length"); for (j = 0; j < FSRAW_BLK_PER_PAGE && nblks; nblks--, j++) { if (doffset != 0) { /* previous valid block */ /* If the block is contiguous with previous block */ if (((hdl->frh_blocks[i][j]).fb_disk == disk && (doffset + len) == (hdl->frh_blocks[i][j]).fb_offset)) { len += hdl->frh_bsize; } else { inm_blkdev_name(disk, diskname); info("%16llu %16s %16llu %16u", foffset, diskname, doffset >> INM_SECTOR_SHIFT, len); foffset += len; doffset = 0; } } if (doffset == 0) { disk = (hdl->frh_blocks[i][j]).fb_disk; doffset = (hdl->frh_blocks[i][j]).fb_offset; len = hdl->frh_bsize; } if (nblks == 1) { inm_blkdev_name(disk, diskname); info("%16llu %16s %16llu %16u", foffset, diskname, doffset >> INM_SECTOR_SHIFT, len); } } } } static void fstream_raw_revert_recursive_detection(void *filp) { int rflag = 0; rflag = driver_ctx->dc_lcw_rflag; driver_ctx->dc_lcw_rflag = 0; driver_ctx->dc_lcw_aops = NULL; driver_ctx->dc_lcw_rhdl = NULL; inm_restore_org_addr_space_ops(INM_HDL_TO_INODE(filp)); if (rflag) inm_restore_org_addr_space_ops(INM_HDL_TO_INODE(filp)); } /* * Recursive writes logic is optimized to share same duplicate aops * across multiple files which cannot be used to distinguish raw mapping * writes from writes to other files. As a workaround, we prepare the * file twice for recursive writes which gives the inode a distinct * aops/mapping and allows us to distinguish from writes to other files. */ static inm_s32_t fstream_raw_prepare_for_recusive_detection(void *filp, fstream_raw_hdl_t *hdl) { inm_s32_t error = 0; inma_ops_t *aops = NULL; int rflag = 0; /* * If file is not already prepped for recursive writes, do it the first * time to get the mapping/aops shared with other files */ aops = inm_get_inmaops_from_aops(INM_INODE_AOPS(INM_HDL_TO_INODE(filp)), INM_DUP_ADDR_SPACE_OPS); if (!aops) { error = inm_prepare_tohandle_recursive_writes(INM_HDL_TO_INODE(filp)); if (error) { err("Recursive IO (1) handling failed"); goto out; } rflag = 1; } /* * Override the shared mapping/aops with a new one distinct from others. 
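 *
 * With the second prepare in place, the mapping's a_ops pointer is
 * unique driver-wide, so the write path can identify raw-mapping
 * learning writes with a plain pointer comparison against
 * driver_ctx->dc_lcw_aops (cached just below). A minimal sketch,
 * assuming the non-INM_RECUSIVE_ADSPC build where ia_dup_aopsp holds
 * the substituted a_ops; inm_is_lcw_write() is a hypothetical helper:
 *
 *   static int inm_is_lcw_write(struct inode *inodep)
 *   {
 *           inma_ops_t *aops = driver_ctx->dc_lcw_aops;
 *
 *           return aops &&
 *                  inodep->i_mapping->a_ops == aops->ia_dup_aopsp;
 *   }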
*/ error = inm_prepare_tohandle_recursive_writes(INM_HDL_TO_INODE(filp)); if (error) { err("Recursive IO (2) handling failed"); if (rflag) inm_restore_org_addr_space_ops(INM_HDL_TO_INODE(filp)); goto out; } driver_ctx->dc_lcw_aops = inm_get_inmaops_from_aops( INM_INODE_AOPS(INM_HDL_TO_INODE(filp)), INM_DUP_ADDR_SPACE_OPS); driver_ctx->dc_lcw_rhdl = hdl; driver_ctx->dc_lcw_rflag = rflag; out: return error; } static void fstream_raw_map_file_blocks(fstream_raw_hdl_t *hdl, inm_bio_dev_t *disk, inm_u64_t foffset, inm_u64_t doffset, inm_u32_t len) { fr_block_t *map = NULL; inm_u32_t page = 0; inm_u32_t block = 0; inm_irqflag_t flag = 0; dbg("disk = %p, f_offset = %llu, d_offset = %llu, len = %u", disk, foffset, doffset, len); while (len) { page = FSRAW_BLK_PAGE(hdl, foffset); block = FSRAW_BLK_IDX(hdl, foffset); INM_BUG_ON(!(hdl->frh_blocks[page])); INM_SPIN_LOCK_IRQSAVE(&(hdl->frh_slock), flag); map = &(hdl->frh_blocks[page][block]); if (map->fb_disk) { err("Mapping already mapped block"); hdl->frh_nblks = 0; /* drop the lock before bailing out */ INM_SPIN_UNLOCK_IRQRESTORE(&(hdl->frh_slock), flag); break; } if (hdl->frh_nblks == (hdl->frh_alen >> hdl->frh_bshift)) { err("Mapping more blocks than expected"); hdl->frh_nblks = 0; /* drop the lock before bailing out */ INM_SPIN_UNLOCK_IRQRESTORE(&(hdl->frh_slock), flag); break; } map->fb_disk = disk; map->fb_offset = doffset; hdl->frh_nblks++; INM_SPIN_UNLOCK_IRQRESTORE(&(hdl->frh_slock), flag); foffset += hdl->frh_bsize; doffset += hdl->frh_bsize; len -= min(len, hdl->frh_bsize); } } void fstream_raw_map_bio(inm_buf_t *bio) { inm_u64_t foffset = 0; inm_u32_t len = 0; fstream_raw_hdl_t *hdl = driver_ctx->dc_lcw_rhdl; inm_bio_dev_t *bdev; dbg("Hdl = %p", hdl); if (hdl->frh_bsize < PAGE_SIZE) { /* Write should be fs block aligned and < PAGE_SIZE */ if (!IS_ALIGNED(INM_BUF_OFFSET(bio), hdl->frh_bsize) || !IS_ALIGNED(INM_BUF_COUNT(bio), hdl->frh_bsize) || INM_BUF_COUNT(bio) > PAGE_SIZE) { err("LCW Learn: bsize = %u, io size = %u, page offset = %lu", hdl->frh_bsize, INM_BUF_COUNT(bio), (long unsigned int)INM_BUF_OFFSET(bio)); hdl->frh_nblks = 0; return; } foffset = hdl->frh_nblks * hdl->frh_bsize; /* * Since FS writes are of size == PAGE_SIZE, multiple bio may be * generated if fs bsize < PAGE_SIZE and some of the blocks may have * been mapped. Align the file offset to PAGE_SIZE on the lower side */ if (foffset) foffset = ALIGN((foffset - (PAGE_SIZE - 1)), PAGE_SIZE); dbg("foffset(1) = %llu", foffset); /* * The vector page maps PAGE_SIZE and vector offset == offset from * PAGE_SIZE aligned file offset.
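 *
 * Worked example (the concrete numbers are illustrative only): with
 * frh_bsize = 1024 and PAGE_SIZE = 4096, suppose five blocks (5120
 * bytes) are already mapped when the bio for the second 1 KB block of
 * the second file page arrives. Then foffset(1) =
 * ALIGN(5120 - 4095, 4096) = 4096, and adding the vector offset of
 * 1024 gives foffset(2) = 5120, i.e. exactly the next unmapped block.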
*/ foffset += INM_BUF_OFFSET(bio); dbg("foffset(2) = %llu", foffset); len = INM_BUF_COUNT(bio); dbg("len = %d", len); } else { /* Write should be upto fs bsize and should start at vec pg offset 0 */ if ((INM_BUF_COUNT(bio) > hdl->frh_bsize) || INM_BUF_OFFSET(bio)) { err("LCW Learn: bsize = %u, io size = %u, page offset = %lu", hdl->frh_bsize, INM_BUF_COUNT(bio), (long unsigned int)INM_BUF_OFFSET(bio)); hdl->frh_nblks = 0; return; } foffset = hdl->frh_nblks * hdl->frh_bsize; len = hdl->frh_bsize; } bdev = INM_BUF_BDEV(bio); if (!bdev) return; if (inm_blkdev_get(bdev)) return; fstream_raw_map_file_blocks(hdl, bdev, foffset, INM_BUF_SECTOR(bio) << INM_SECTOR_SHIFT, len); } static void fstream_raw_hdl_free(fstream_raw_hdl_t *hdl) { int i = 0; if (!hdl) { dump_stack(); return; } for (i = 0; i < hdl->frh_npages; i++) { dbg("Free Page %p", hdl->frh_blocks[i]); inm_free_page((unsigned long)hdl->frh_blocks[i]); } i = sizeof(fstream_raw_hdl_t) + (sizeof(fr_block_t *) * hdl->frh_npages); dbg("Free HDL: %d", i); INM_KFREE(hdl, i, INM_KERNEL_HEAP); } static inm_s32_t fstream_raw_hdl_alloc(void *filp, inm_u64_t offset, inm_u32_t len, fstream_raw_hdl_t **rhdl) { int error = 0; fstream_raw_hdl_t *hdl = NULL; int nblks = 0; int npages = 0; loff_t fsize = 0; fr_block_t *page = NULL; if (!flt_get_file_size(filp, &fsize)) { error = -ENOENT; goto out; } if (offset) { error = -EINVAL; goto out; } if (!len) { /* If no len, map entire file */ if ((fsize - offset) > UINT_MAX) { /* len == uint */ error = -EINVAL; goto out; } len = (inm_u32_t)(fsize - offset); } if ((offset + len) > fsize) { error = -EINVAL; goto out; } nblks = ALIGN(len, (INM_HDL_TO_INODE(filp))->i_sb->s_blocksize); nblks = nblks >> (INM_HDL_TO_INODE(filp))->i_sb->s_blocksize_bits; npages = ((nblks - 1) >> FSRAW_BLK_PER_PAGE_SHIFT) + 1; dbg("Expected nblks = %u, npages = %u", nblks, npages); hdl = INM_KMALLOC(sizeof(fstream_raw_hdl_t) + (sizeof(fr_block_t *) * npages), INM_KM_SLEEP, INM_KERNEL_HEAP); if (!hdl) { error = -ENOMEM; goto out; } INM_MEM_ZERO(hdl, sizeof(fstream_raw_hdl_t) + (sizeof(fr_block_t *) * npages)); INM_INIT_SPIN_LOCK(&hdl->frh_slock); hdl->frh_fsize = fsize; hdl->frh_offset = offset; hdl->frh_len = len; hdl->frh_alen = ALIGN(len, (INM_HDL_TO_INODE(filp))->i_sb->s_blocksize); hdl->frh_bsize = (INM_HDL_TO_INODE(filp))->i_sb->s_blocksize; hdl->frh_bshift = (INM_HDL_TO_INODE(filp))->i_sb->s_blocksize_bits; hdl->frh_nblks = 0; /* access the block map as 2D array */ hdl->frh_blocks = (fr_block_t **)((char *)hdl + sizeof(fstream_raw_hdl_t)); for (hdl->frh_npages = 0; hdl->frh_npages < npages; hdl->frh_npages++) { page = (fr_block_t *)__inm_get_free_page(INM_KM_SLEEP); if (!page) break; dbg("Page[%u] = %p", hdl->frh_npages, page); INM_MEM_ZERO(page, PAGE_SIZE); hdl->frh_blocks[hdl->frh_npages] = page; } if (hdl->frh_npages != npages) { error = -ENOMEM; fstream_raw_hdl_free(hdl); hdl = NULL; } else { *rhdl = hdl; } out: return error; } /* * This API is not multithread safe and calls should be serialized */ inm_s32_t fstream_raw_open(char *file, inm_u64_t offset, inm_u32_t len, fstream_raw_hdl_t **raw_hdl) { int error = 0; void *filp = NULL; char *buf = NULL; inm_u32_t iosize = 0; inm_u32_t iodone = 0; fstream_raw_hdl_t *hdl = NULL; dbg("Mapping %s -> %llu:%d", file, offset, len); if (!flt_open_file(file, INM_RDWR | INM_SYNC, &filp)) { err("Cannot open %s", file); filp = NULL; error = -ENOENT; goto out; } error = fstream_raw_hdl_alloc(filp, offset, len, &hdl); if (error) { err("Cannot alloc raw handle - %d", error); goto out; } len = hdl->frh_len;
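/*
 * Sizing sketch for the block map built by fstream_raw_hdl_alloc()
 * above; the concrete values here are assumptions for illustration.
 * On a filesystem with 4 KB blocks (s_blocksize_bits = 12), a 16 MB
 * file needs nblks = 4096 map entries. If sizeof(fr_block_t) is
 * 16 bytes (fb_disk pointer + fb_offset), one page holds 256 entries
 * (FSRAW_BLK_PER_PAGE_SHIFT = 8), so npages = ((4096 - 1) >> 8) + 1
 * = 16 map pages. A file offset is then resolved with two shifts and
 * a mask, which is presumably what the FSRAW_BLK_PAGE()/FSRAW_BLK_IDX()
 * macros compute:
 *
 *   blk  = offset >> hdl->frh_bshift;        // file block number
 *   page = blk >> FSRAW_BLK_PER_PAGE_SHIFT;  // index into frh_blocks[]
 *   idx  = blk & (FSRAW_BLK_PER_PAGE - 1);   // slot within that page
 *   map  = &hdl->frh_blocks[page][idx];      // yields {fb_disk, fb_offset}
 */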
buf = (char *)__inm_get_free_page(INM_KM_SLEEP); if (!buf) { error = -ENOMEM; goto out; } error = fstream_raw_prepare_for_recusive_detection(filp, hdl); if (error) { err("Recursive IO handling failed"); goto out; } while (len) { iosize = min(len, (inm_u32_t)PAGE_SIZE); if (!flt_read_file(filp, buf, offset, iosize, &iodone) || iodone != iosize) { err("Read Failed: %llu:%u:%u", offset, iosize, iodone); error = -EIO; break; } if (!flt_write_file(filp, buf, offset, iosize, &iodone) || iodone != iosize) { err("Write Failed: %llu:%u:%u", offset, iosize, iodone); error = -EIO; break; } /* * For fs bsize >= PAGE_SIZE except last partial block io, * move to next fs block */ if (hdl->frh_bsize >= PAGE_SIZE && iosize == PAGE_SIZE) iosize = hdl->frh_bsize; offset += iosize; len -= iosize; } fstream_raw_revert_recursive_detection(filp); /* If all blocks could not be mapped */ if (hdl->frh_nblks != (hdl->frh_alen >> hdl->frh_bshift)) { err("Mapped: Actual = %d Expected = %d", hdl->frh_nblks, (hdl->frh_alen >> hdl->frh_bshift)); error = -EBADF; } out: if (!error) { *raw_hdl = hdl; fstream_raw_print_map(file, hdl); } else { if (hdl) fstream_raw_close(hdl); err("Cannot map %s", file); } if (buf) inm_free_page((unsigned long)buf); if (filp) flt_close_file(filp); return error; } inm_s32_t fstream_raw_get_fsize(fstream_raw_hdl_t *hdl) { return hdl->frh_fsize; } inm_s32_t fstream_raw_close(fstream_raw_hdl_t *hdl) { fstream_raw_hdl_free(hdl); return 0; } inm_s32_t fstream_raw_perform_block_io(inm_bio_dev_t *disk, char *buf, inm_u64_t offset, inm_u32_t len, inm_u32_t write) { static void *filp = NULL; static inm_bio_dev_t *prev_disk = NULL; static char diskname[INM_BDEVNAME_SIZE]; inm_u32_t iodone = 0; if (disk != prev_disk && filp) { flt_close_file(filp); filp = NULL; prev_disk = NULL; } if (!filp) { /* bound the copy by the buffer size, not INM_PATH_MAX */ snprintf(diskname, sizeof(diskname), "%s", INM_BDEVNAME_PREFIX); inm_blkdev_name(disk, diskname + strlen(INM_BDEVNAME_PREFIX)); if (!flt_open_file(diskname, O_RDWR | O_SYNC, &filp)) { err("Cannot open file %s", diskname); filp = NULL; return -EIO; } prev_disk = disk; } info("%s: %s [%llu:%u] %p", write ? "WRITE" : "READ", diskname, offset, len, buf); if (write) flt_write_file(filp, buf, offset, len, &iodone); else flt_read_file(filp, buf, offset, len, &iodone); return (iodone == len) ? 0 : -EIO; } inm_s32_t fstream_raw_io(fstream_raw_hdl_t *hdl, char *buf, inm_u32_t len, inm_u64_t offset, int write) { inm_s32_t error = 0; int iosize = 0; inm_u64_t doffset = 0; int page = 0; int block = 0; inm_bio_dev_t *disk = NULL; if (offset < hdl->frh_offset || (offset + len) > (hdl->frh_offset + hdl->frh_len)) return -EINVAL; while (len) { page = FSRAW_BLK_PAGE(hdl, offset); block = FSRAW_BLK_IDX(hdl, offset); disk = (hdl->frh_blocks[page][block]).fb_disk; doffset = (hdl->frh_blocks[page][block]).fb_offset; iosize = min(len, hdl->frh_bsize); dbg("FSRAW %s: disk = %p, off = %llu, len = %u, page = %d, block = %u, " "doffset = %llu, iosize = %u", write ? "WRITE" : "READ", disk, offset, len, page, block, doffset, iosize); error = fstream_raw_perform_block_io(disk, buf, doffset, iosize, write); if (error) break; buf += iosize; offset += iosize; len -= iosize; } if (error) err("Raw IO failed with error - %d", error); return error ?
1 : 0; } inm_s32_t fstream_raw_read(fstream_raw_hdl_t *hdl, char *buf, inm_u32_t len, inm_u64_t offset) { return fstream_raw_io(hdl, buf, len, offset, 0); } inm_s32_t fstream_raw_write(fstream_raw_hdl_t *hdl, char *buf, inm_u32_t len, inm_u64_t offset) { return fstream_raw_io(hdl, buf, len, offset, 1); } involflt-0.1.0/src/errlog.h0000755000000000000000000002315014467303177014315 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _INMAGE_ERRLOG_H #define _INMAGE_ERRLOG_H /* * MessageId: LINVOLFLT_ERR_NO_NPAGED_POOL_FOR_DIRTYBLOCKS * * MessageText: * * Not enough memory was available to store changes to volume %2 (GUID = %3). This usually indicates * a shortage of non-paged pool memory. */ #define LINVOLFLT_ERR_NO_NPAGED_POOL_FOR_DIRTYBLOCKS ((inm_u32_t)0xE1120001) /* * MessageId: LINVOLFLT_ERR_VOLUME_WRITE_PAST_EOV * * MessageText: * * A write attempt past the end of the volume was detected on volume %2 (GUID = %3). This may * indicate the volume has been dynamically grown. */ #define LINVOLFLT_ERR_VOLUME_WRITE_PAST_EOV ((inm_u32_t)0xA1120002) /* * MessageId: LINVOLFLT_ERR_NO_MEMORY * * MessageText: * * Not enough memory was available to perform an operation; as a result, replication on * volume %2 (GUID = %3) has failed. */ #define LINVOLFLT_ERR_NO_MEMORY ((inm_u32_t)0xE112000B) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CANT_OPEN * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be opened. */ #define LINVOLFLT_ERR_BITMAP_FILE_CANT_OPEN ((inm_u32_t)0xE112000C) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CANT_UPDATE_HEADER * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be written to. */ #define LINVOLFLT_ERR_BITMAP_FILE_CANT_UPDATE_HEADER ((inm_u32_t)0xE112000D) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CANT_READ * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be read. Check for * disk errors on the device. */ #define LINVOLFLT_ERR_BITMAP_FILE_CANT_READ ((inm_u32_t)0xE112000E) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_LOG_DAMAGED * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) is damaged and could not * be automatically repaired. */ #define LINVOLFLT_ERR_BITMAP_FILE_LOG_DAMAGED ((inm_u32_t)0xE112000F) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CANT_APPLY_SHUTDOWN_CHANGES * * MessageText: * * Changes on volume %2 (GUID = %3) that occurred at the previous system shutdown could not be merged with * current changes. */ #define LINVOLFLT_ERR_BITMAP_FILE_CANT_APPLY_SHUTDOWN_CHANGES ((inm_u32_t)0xE1120010) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CREATED * * MessageText: * * A new file used to store change information for volume %2 (GUID = %3) is created.
*/ #define LINVOLFLT_ERR_BITMAP_FILE_CREATED ((inm_u32_t)0x61120011) /* * MessageId: LINVOLFLT_ERR_LOST_SYNC_SYSTEM_CRASHED * * MessageText: * * The system crashed or experienced a non-controlled shutdown. Replication sync on * volume %2 (GUID = %3) will need to be reestablished. */ #define LINVOLFLT_ERR_LOST_SYNC_SYSTEM_CRASHED ((inm_u32_t)0xA1120012) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_NAME_ERROR * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be opened * because of a naming problem. */ #define LINVOLFLT_ERR_BITMAP_FILE_NAME_ERROR ((inm_u32_t)0xE1120013) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_WRITE_ERROR * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be written to. */ #define LINVOLFLT_ERR_BITMAP_FILE_WRITE_ERROR ((inm_u32_t)0xE1120014) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_CANT_INIT * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) could not be initialized. */ #define LINVOLFLT_ERR_BITMAP_FILE_CANT_INIT ((inm_u32_t)0xE1120015) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_LOG_FIXED * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) was repaired, but the volume * needs to be resynchronized. */ #define LINVOLFLT_ERR_BITMAP_FILE_LOG_FIXED ((inm_u32_t)0xA1120016) /* * MessageId: LINVOLFLT_ERR_TOO_MANY_LAST_CHANCE * * MessageText: * * The file used to store change information for volume %2 (GUID = %3) did not have sufficient reserved * space to store all changes at system shutdown. */ #define LINVOLFLT_ERR_TOO_MANY_LAST_CHANCE ((inm_u32_t)0xE1120017) /* * MessageId: LINVOLFLT_ERR_IN_SYNC * * MessageText: * * The replication on volume %2 (GUID = %3) has resumed correctly. */ #define LINVOLFLT_ERR_IN_SYNC ((inm_u32_t)0x61120018) /* * MessageId: LINVOLFLT_ERR_FINAL_HEADER_VALIDATE_FAILED * * MessageText: * * The final write to store change information for volume %2 (GUID = %3) failed header validation. */ #define LINVOLFLT_ERR_FINAL_HEADER_VALIDATE_FAILED ((inm_u32_t)0xE1120019) /* * MessageId: LINVOLFLT_ERR_FINAL_HEADER_DIRECT_WRITE_FAILED * * MessageText: * * The final direct write to store change information for volume %2 (GUID = %3) failed. */ #define LINVOLFLT_ERR_FINAL_HEADER_DIRECT_WRITE_FAILED ((inm_u32_t)0xE1120020) /* * MessageId: LINVOLFLT_ERR_FINAL_HEADER_FS_WRITE_FAILED * * MessageText: * * The final file system write to store change information for volume %2 (GUID = %3) failed. */ #define LINVOLFLT_ERR_FINAL_HEADER_FS_WRITE_FAILED ((inm_u32_t)0xE1120021) /* * MessageId: LINVOLFLT_ERR_FINAL_HEADER_READ_FAILED * * MessageText: * * The final direct read to store change information for volume %2 (GUID = %3) failed. */ #define LINVOLFLT_ERR_FINAL_HEADER_READ_FAILED ((inm_u32_t)0xE1120022) /* * MessageId: LINVOLFLT_ERR_DELETE_BITMAP_FILE_NO_NAME * * MessageText: * * Deleting the file used to store changes on volume %2 (GUID = %3) failed because the filename was not set. */ #define LINVOLFLT_ERR_DELETE_BITMAP_FILE_NO_NAME ((inm_u32_t)0xE1120023) /* * MessageId: LINVOLFLT_ERR_BITMAP_FILE_EXCEEDED_MEMORY_LIMIT * * MessageText: * * The memory limit for storing changes on volume %2 (GUID = %3) has been exceeded. Increase the memory limit or increase the * change granularity using appropriate registry entries.
*/ #define LINVOLFLT_ERR_BITMAP_FILE_EXCEEDED_MEMORY_LIMIT ((inm_u32_t)0xE1120024) /* * MessageId: LINVOLFLT_ERR_VOLUME_SIZE_SEARCH_FAILED * * MessageText: * * The driver was unable to determine the correct size of volume %2 (GUID = %3) using a last sector search. */ #define LINVOLFLT_ERR_VOLUME_SIZE_SEARCH_FAILED ((inm_u32_t)0xE1120025) /* * MessageId: LINVOLFLT_ERR_VOLUME_GET_LENGTH_INFO_FAILED * * MessageText: * * The driver was unable to determine the correct size of volume %2 (GUID = %3) using IOCTL_DISK_GET_LENGTH_INFO. */ #define LINVOLFLT_ERR_VOLUME_GET_LENGTH_INFO_FAILED ((inm_u32_t)0xE1120026) /* * MessageId: LINVOLFLT_ERR_TOO_MANY_EVENT_LOG_EVENTS * * MessageText: * * The driver has written too many events to the system event log recently. Events will be discarded for * the next time interval. */ #define LINVOLFLT_ERR_TOO_MANY_EVENT_LOG_EVENTS ((inm_u32_t)0xE1120027) /* * MessageId: LINVOLFLT_WARNING_FIRST_FAILURE_TO_OPEN_BITMAP * * MessageText: * * The driver failed to open the bitmap file for volume %2 (GUID = %3) on its first attempt. */ #define LINVOLFLT_WARNING_FIRST_FAILURE_TO_OPEN_BITMAP ((inm_u32_t)0xA1120028) /* * MessageId: LINVOLFLT_SUCCESS_OPEN_BITMAP_AFTER_RETRY * * MessageText: * * The driver succeeded in opening the bitmap file for volume %2 (GUID = %3) after %4 retries. * The time interval between the first failure and success to open the bitmap is %5 seconds. */ #define LINVOLFLT_SUCCESS_OPEN_BITMAP_AFTER_RETRY ((inm_u32_t)0x21120029) /* * MessageId: LINVOLFLT_INFO_OPEN_BITMAP_CALLED_PRIOR_TO_OBTAINING_GUID * * MessageText: * * The driver is opening the bitmap for a volume before the symbolic link with its GUID is created. */ #define LINVOLFLT_INFO_OPEN_BITMAP_CALLED_PRIOR_TO_OBTAINING_GUID ((inm_u32_t)0x61120030) /* * MessageId: LINVOLFLT_ERR_FAILED_TO_ALLOCATE_DATA_POOL * * MessageText: * * The driver has failed to allocate the memory required for the data pool. */ #define LINVOLFLT_ERR_FAILED_TO_ALLOCATE_DATA_POOL ((inm_u32_t)0xE1120031) /* * MessageId: LINVOLFLT_THREAD_SHUTDOWN_IN_PROGRESS * * MessageText: * * Thread shutdown is in progress. */ #define LINVOLFLT_THREAD_SHUTDOWN_IN_PROGRESS ((inm_u32_t)0xE1120032) /* * MessageId: LINVOLFLT_DATA_FILE_OPEN_FAILED * * MessageText: * * Data file open failed for volume %2 (GUID = %3) with status %4 */ #define LINVOLFLT_DATA_FILE_OPEN_FAILED ((inm_u32_t)0xE1120033) /* * MessageId: LINVOLFLT_WRITE_TO_DATA_FILE_FAILED * * MessageText: * * Write to data file %4 for volume %2 (GUID = %3) failed with status %5 */ #define LINVOLFLT_WRITE_TO_DATA_FILE_FAILED ((inm_u32_t)0xE1120034) /* * MessageId: LINVOLFLT_WARNING_STATUS_NO_MEMORY * * MessageText: * * Allocation of memory of size %4 type %2 failed in file %3 line %5 */ #define LINVOLFLT_WARNING_STATUS_NO_MEMORY ((inm_u32_t)0xA1120035) /* * MessageId: LINVOLFLT_DELETE_FILE_FAILED * * MessageText: * * Deletion of data file for volume %2 (GUID = %3) failed with error %4 */ #define LINVOLFLT_DELETE_FILE_FAILED ((inm_u32_t)0xA1120036) #endif /* _INMAGE_ERRLOG_H */ involflt-0.1.0/src/inm_utypes.h0000755000000000000000000000363414467303177015224 0ustar rootroot/* SPDX-License-Identifier: GPL-2.0-only */ /* Copyright (C) 2022 Microsoft Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version.
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef INM_UTYPES_H #define INM_UTYPES_H #include #include #include #include #include "inm_types.h" #include "inm_list.h" typedef struct cdev inm_cdev_t; typedef dev_t inm_dev_t; typedef pid_t inm_pid_t; typedef unsigned long inm_addr_t; typedef struct sysinfo inm_sysinfo_t; typedef struct file inm_filehandle_t; typedef struct sysinfo inm_meminfo_t; typedef struct bio inm_buf_t; struct drv_open{ const struct block_device_operations *orig_dev_ops; struct block_device_operations mod_dev_ops; #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32) int (*orig_drv_open)(struct block_device *bdev, fmode_t mode); #else int (*orig_drv_open)(struct inode *inode, struct file *filp); #endif inm_atomic_t nr_in_flight_ops; int status; }; typedef struct drv_open inm_dev_open_t; typedef struct _inm_dc_at_lun_info{ inm_spinlock_t dc_at_lun_list_spn; struct inm_list_head dc_at_lun_list; inm_dev_open_t dc_at_drv_info; inm_u32_t dc_at_lun_info_flag; }inm_dc_at_lun_info_t; #endif /* INM_UTYPES_H */ involflt-0.1.0/LICENSE.txt0000644000000000000000000004313314467303177013706 0ustar rootroot GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. 
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. 
(Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. 
Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. 
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License. involflt-0.1.0/README.md0000644000000000000000000000764114467303177013346 0ustar rootroot# Azure Site Recovery Disk Filter Driver for Linux (ASRDFD) * [Introduction](#introduction) * [Licensing](#licensing) * [Design](#design) * [Contributing](#contributing) * [Roadmap](#roadmap) * [Telemetry](#telemetry) * [Trademarks](#trademarks) ## Introduction Azure Site Recovery Disk Filter Driver for Linux (ASRDFD) is a disk filter driver that captures any changes to a disk. It can be installed as a kernel module in Linux and is currently used to implement the changed block tracking functionality used in Microsoft's Azure Site Recovery (ASR) product. The ASR product uses this module to achieve disaster recovery; the public documentation for ASR is available at https://learn.microsoft.com/en-us/azure/site-recovery/. The data tracked/captured by this module can be drained to user-space and used to replicate locally or to a remote site. The eventual goal is to upstream the driver to the Linux code tree. ## Licensing This project is licensed under [GPL-v2 only](./LICENSE.txt). ## Design The design is captured at [Design](doc/design.md). ## Contributing This file describes the steps to use the driver. For more information about how to contribute code to this project, please check the [CONTRIBUTING.md](CONTRIBUTING.md) file in this repository. ### How to build
``` bash
cd src
make -f involflt.mak KDIR=/lib/modules/<kernel version>/build
```
- To compile the driver with telemetry support, compile with the "TELEMETRY=yes" option. - For SLES12/15, the service pack level of the kernel for which the driver is being built has to be passed with the option PATCH_LEVEL=<service pack level>. For example, if the driver is being built for a kernel released for SLES12 SP3, "PATCH_LEVEL=3" has to be passed as an argument to the above make command, as the service pack is 3. - The build will generate the driver module file "involflt.ko".
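For instance, a build with telemetry enabled for a hypothetical SLES12 SP3 kernel might look like the following; the kernel version string is illustrative only, so substitute the version of the kernel you are actually building for.
``` bash
cd src
# Example only: 4.4.73-5-default stands in for your target kernel version.
make -f involflt.mak \
    KDIR=/lib/modules/4.4.73-5-default/build \
    TELEMETRY=yes \
    PATCH_LEVEL=3
```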
### How to install Use the below command to load the driver module.
``` bash
insmod involflt.ko
```
### How to test All the IOCTLs needed to write a test utility are defined at [IOCTLs](doc/userspace-api/ioctl.md). ## Roadmap As new Linux kernels are released, we intend to keep updating the driver code to ensure that the driver works with the latest kernel releases. By open-sourcing this code, we can enable Linux distro vendors to support the driver functionality as soon as a new kernel version is released. Our eventual goal is to work towards contributing this driver to the upstream Linux code tree. ## Telemetry Data Collection. The software may collect information about you and your use of the software and send it to Microsoft. Microsoft may use this information to provide services and improve our products and services. You may turn off the telemetry as described in the repository. There are also some features in the software that may enable you and Microsoft to collect data from users of your applications. If you use these features, you must comply with applicable law, including providing appropriate notices to users of your applications together with a copy of Microsoft’s privacy statement. Our privacy statement is located at https://go.microsoft.com/fwlink/?LinkID=824704. You can learn more about data collection and use in the help documentation and our privacy statement. Your use of the software operates as your consent to these practices. ### Instructions to turn off telemetry The telemetry is disabled by default. Please refer to the section [How to build](#how-to-build) if telemetry needs to be enabled. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies. involflt-0.1.0/CONTRIBUTING.md0000644000000000000000000000564214467303177014313 0ustar rootrootThis project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com. When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. ## Reporting Issues Please open a new issue to describe the problem and suggestions to address the problem. You may also open an issue for asking questions and seeking help from the project community. **_NOTE:_** If your issue is related to a security vulnerability, please follow the guidelines mentioned in the [Security](SECURITY.md) file. ## Branching Guidelines We have 3 tags associated with the code: 1.
Main - this is where all the active development happens. New PRs are merged from the feature branches to the **main** branch. 2. Release - this branch tracks the release candidate. Based on the release content, the **release** branch is cut off from the **main** branch and is subjected to additional testing. 3. Stable - the **release** branch is promoted to a stable tag after testing completes. Please use the **stable** tag for the most stable version of the project. ## Contributing to Source Code Here is a set of guidelines for contributing code to the project: 1. Please create a separate feature branch forking off from the main branch with your code changes. 2. Please follow the Linux kernel coding style for any code changes - https://www.kernel.org/doc/html/v4.10/process/coding-style.html **_DISCLAIMER:_** We are working towards aligning the existing code repository to the coding style mentioned above. Therefore, you may observe a discrepancy between the existing coding style and the Linux kernel coding style. Please be assured that we are actively working on fixing this discrepancy in order to make it ready for upstream. However, we expect all new code changes to adhere to the coding style mentioned above. 3. Please submit the PR describing the fix and the tests that were run to validate the fix. The maintainers will get in touch with you regarding your PR. If you don't receive a response within a reasonable time, please feel free to email the maintainers. ## Maintainers The Microsoft Azure Site Recovery Disk Filter Driver for Linux maintainer list is available at [GitHub teams](https://github.com/orgs/microsoft/teams/asrsourceteam/members). involflt-0.1.0/SUPPORT.md0000644000000000000000000000206214467303177013555 0ustar rootroot# How to file issues and get help This project uses GitHub Issues to track bugs and feature requests. Please search the existing issues before filing new issues to avoid duplicates. For new issues, file your bug or feature request as a new Issue. For help and questions about using this project, please again use the GitHub Issues for this project to ask questions. # Microsoft Support Policy This sample code is not supported under any Microsoft standard support program or service. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts or sample code, be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages. involflt-0.1.0/CODE_OF_CONDUCT.md0000644000000000000000000000067414467303177014655 0ustar rootroot# Microsoft Open Source Code of Conduct This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
Resources: - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/) - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns involflt-0.1.0/doc/0000755000000000000000000000000014467303177012624 5ustar rootrootinvolflt-0.1.0/doc/userspace-api/0000755000000000000000000000000014467303177015365 5ustar rootrootinvolflt-0.1.0/doc/userspace-api/ioctl.md0000644000000000000000000000720714467303177017027 0ustar rootrootThis driver module is well tested for the distros/kernels mentioned at https://learn.microsoft.com/en-us/azure/site-recovery/azure-to-azure-support-matrix#linux. The driver exposes the following interfaces as IOCTLs, which can be used to write test code. The interfaces are as follows (a usage sketch follows the list): - IOCTL_INMAGE_START_FILTERING_DEVICE_V2
This IOCTL is used to start filtering the writes to a disk. It saves the actual make_request_fn and replaces it with the driver's own routine to intercept the I/Os. - IOCTL_INMAGE_STOP_FILTERING_DEVICE
This IOCTL is used to stop filtering the writes to a disk. It restores the actual make_request_fn and unplugs the driver from the I/O stack for this disk. - IOCTL_INMAGE_UNSTACK_ALL
This IOCTL stops the filtering on all disks, unlike the previous IOCTL, which works on a single disk only. - IOCTL_INMAGE_GET_DIRTY_BLOCKS_TRANS
This IOCTL will drain the data from the driver for a disk so that it can be applied to a local or remote disk to replicate the source disk. - IOCTL_INMAGE_WAIT_FOR_DB_V2
This IOCTL helps in waiting on the driver side for data to drain. If the above IOCTL gets called frequently, only very small chunks of data would be drained each time. So, if the user thread waits in the kernel using this IOCTL, the driver will wake up the thread whenever data is available for draining. - IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS
Once the data is drained using the IOCTL IOCTL_INMAGE_GET_DIRTY_BLOCKS_TRANS, the user thread can apply the data to another disk. Afterwards, the user thread can inform the driver through this IOCTL that the data is no longer needed and can be thrown away. Otherwise, as part of draining, the driver will drain the same data again and again. - IOCTL_INMAGE_CLEAR_DIFFERENTIALS
This IOCTL throws away all the data captured on the driver side. - IOCTL_INMAGE_PROCESS_START_NOTIFY
The driver notes down the PID of this thread, and this thread shouldn't close the file descriptor of the driver handle for as long as the drainer threads exist. That way, when this thread closes the file descriptor, the driver can safely deal with the last drained data for which the IOCTL IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS was not received. - IOCTL_INMAGE_WAKEUP_ALL_THREADS
This IOCTL wakes up all the drainer threads waiting in the kernel on the driver side. - IOCTL_INMAGE_FREEZE_VOLUME
Freezes one or more file-systems. - IOCTL_INMAGE_THAW_VOLUME
Thaws the frozen file-system. - IOCTL_INMAGE_TAG_VOLUME_V2
Together with the above two IOCTLs, this IOCTL introduces a bookmark so that the data can be recovered up to that bookmark. - IOCTL_INMAGE_TAG_COMMIT_V2
This IOCTL allows or drops the tag based on the outcome of the above 3 IOCTLs. - IOCTL_INMAGE_CREATE_BARRIER_ALL
Once this IOCTL is issued, the interception routine will not allow any writes to go down the I/O stack. This creates a barrier across all the disks so that a bookmark can be issued using the IOCTL IOCTL_INMAGE_TAG_VOLUME_V2 to create a crash-consistent recovery point. - IOCTL_INMAGE_REMOVE_BARRIER_ALL
This IOCTL removes the barrier created using the previous IOCTL. - IOCTL_INMAGE_IOBARRIER_TAG_VOLUME
This IOCTL creates the barrier, issues the bookmark, and removes the barrier for all filtered disks in a system. **_NOTE:_** The last 7 IOCTLs can be used to achieve a crash/file-system consistent image of the disk(s) of the system. - IOCTL_INMAGE_GET_PROTECTED_VOLUME_LIST
Lists all the disks filtered using this driver. - IOCTL_INMAGE_GET_GLOBAL_STATS
Lists the statistics maintained at the driver level. - IOCTL_INMAGE_GET_VOLUME_STATS
Lists the statistics for each filtered disk.
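Below is a minimal sketch of a drainer loop built on the IOCTLs above. It is not taken from the driver sources: the opaque argument buffer is an illustrative assumption, and the real request codes and argument structures come from the driver's user-space API headers (assumed here to be "involflt.h").
``` c
/* Hypothetical drainer skeleton -- illustration only. The argument
 * buffer is an assumption; the actual IOCTL argument structures are
 * defined by the driver's UAPI headers. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include "involflt.h"        /* assumed header defining the IOCTL codes */

static void drain_loop(int fd, void *db_buf)
{
    for (;;) {
        /* Sleep in the kernel until the driver has data to drain. */
        if (ioctl(fd, IOCTL_INMAGE_WAIT_FOR_DB_V2, db_buf) < 0)
            break;
        /* Fetch the next set of dirty blocks (data or metadata). */
        if (ioctl(fd, IOCTL_INMAGE_GET_DIRTY_BLOCKS_TRANS, db_buf) < 0)
            break;
        /* ... apply the drained changes to the target disk here ... */
        /* Tell the driver this data was applied and can be discarded;
         * otherwise the same data would be drained again and again. */
        if (ioctl(fd, IOCTL_INMAGE_COMMIT_DIRTY_BLOCKS_TRANS, db_buf) < 0)
            break;
    }
}
```
In a real drainer, fd would be a descriptor for the driver's control device and db_buf would point at the dirty-block structure those headers define for these calls.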
involflt-0.1.0/doc/design.md0000644000000000000000000001105214467303177014416 0ustar rootroot# Design The interception of the I/O starts by replacing the actual make_request_fn pointer in the request_queue for a disk; this allows the driver to filter the writes to the disk based on the bio submitted for WRITE. The actual make_request_fn pointer of a disk is saved by the driver and is used to unplug from the I/O stack at any later point of time to stop intercepting the I/Os for the disk. The core functionality of this driver is to capture the data or payload within the I/O vectors of a bio, and this capturing is performed once the write hits the disk, that is, in the path of completion of the write in the interrupt context, where the bio_endio callback is invoked for each submitted bio. In order to achieve this, the intercept routine of the driver does the following:

1. Once a bio comes to the intercept routine of the driver, the driver checks if the bio is submitted for READ or WRITE. If it is a READ, it invokes the original make_request_fn to complete the READ. Otherwise, it moves to the next step.
2. It saves the bi_endio and bi_private fields of the bio and points them to the driver's versions, so that the completion ends in the driver's routine and the driver gets the callback once the write completes. It then invokes the original make_request_fn to send the write request down the I/O stack for completion.
3. On completion of the write, the driver captures the data in the bio into driver-specific buffers if available; otherwise, it captures the starting write offset on the disk and the length of the write, so that this information can be used to read the data from the disk at a later point of time.
4. Once the driver captures the data inside the bio, it restores the original bi_endio and bi_private fields of the bio and invokes bi_endio to return control to the owner of this bio.

The submission of a write request flow will look like

    submit_bio
        |
        v
    make_request_fn
        |
        v
    flt_make_request_fn (driver's intercept routine)
        |----------------------> Saves the bi_endio and bi_private fields of bio
        |                        Replaces with driver's routine inm_bio_endio
        v
    make_request_fn (Original saved routine)

The completion of the same will look like

    inm_bio_endio (driver's completion routine)
        |----------------------> Captures the data in bio and restores the
        |                        bi_end_io and bi_private fields
        v
    bi_end_io

Starting with the 5.8 kernel, make_request_fn is not populated for all the disks, and in the 5.9 kernel this pointer is completely removed. The driver has therefore started replacing the queue_rq function pointer in the blk_mq_ops of the request_queue. This routine intercepts a request instead of a bio, then loops over the bios in this request and performs the above operations. Note that this module captures the data in the completion path of each write, where the data associated with the write has already been written to the disk (or the write has failed). This could be done in the driver's interception routine as well, but that needs extra handling, as the write may fail while, in the meantime, user-space drains and uses the captured data. The whole workflow would then need a mechanism to undo the captured data, which adds challenges, as does handling the partial completion of a write to the disk. So the capturing is handled in the completion path. This driver module captures the data in two modes:

- Data Mode: The driver captures the offset and size of the write involved by referring to the bio object received in the completion routine. This mode also captures the data/payload associated with the write by referring to the bio vectors. This mode mandates that the driver reserve some memory, as the driver can't just allocate all the memory it needs, which could impact the production system. For this purpose, the driver allocates 6.25% of RAM. Once this reserved memory is exhausted, the driver moves to the metadata mode of capturing.
- Metadata Mode: The driver captures only the offset and size of the write involved. When the drainer drains this kind of data, it has to read the data from the disk using the offsets and sizes of the writes captured so far in this mode. The driver maintains a bitmap file of granularity 4k or 16k depending on the size of the disk. The driver will start writing to the bitmap file once the number of writes captured in metadata mode goes beyond a certain limit, and then discards the captured writes. This way, the driver restricts the over-utilization of the production system's memory. This is referred to as Bitmap Mode.
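To make the interception pattern concrete, here is a simplified sketch of the hook described above. This is not the driver's actual code: flt_ctx_t, flt_ctx_alloc(), flt_capture_write() and orig_make_request_fn are illustrative placeholders, and locking, reference counting and failed-write handling are omitted. It assumes a pre-5.8 kernel where make_request_fn can still be replaced.
``` c
/* Simplified sketch of the intercept/completion pattern -- not the
 * driver's actual code. flt_ctx_t, flt_ctx_alloc(), flt_capture_write()
 * and orig_make_request_fn are illustrative placeholders. */
#include <linux/bio.h>
#include <linux/blkdev.h>

static void inm_bio_endio_sketch(struct bio *bio)
{
    flt_ctx_t *ctx = bio->bi_private;      /* context saved at submission */

    /* The write has hit the disk (or failed): capture the payload now. */
    flt_capture_write(ctx, bio);

    /* Restore the owner's fields and hand the completion back. */
    bio->bi_end_io  = ctx->orig_end_io;
    bio->bi_private = ctx->orig_private;
    bio->bi_end_io(bio);
}

static blk_qc_t flt_make_request_sketch(struct request_queue *q,
                                        struct bio *bio)
{
    if (bio_data_dir(bio) == WRITE) {
        flt_ctx_t *ctx = flt_ctx_alloc(bio);

        /* Step 2: save the completion fields and point them at us. */
        ctx->orig_end_io  = bio->bi_end_io;
        ctx->orig_private = bio->bi_private;
        bio->bi_end_io    = inm_bio_endio_sketch;
        bio->bi_private   = ctx;
    }
    /* READs (and the now-hooked WRITEs) go down the original path. */
    return orig_make_request_fn(q, bio);
}
```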
involflt-0.1.0/LICENSES/0000755000000000000000000000000014467303177013264 5ustar rootrootinvolflt-0.1.0/LICENSES/exceptions/0000755000000000000000000000000014467303177015445 5ustar rootrootinvolflt-0.1.0/LICENSES/exceptions/Linux-syscall-note0000644000000000000000000000235214467303177021104 0ustar rootrootSPDX-Exception-Identifier: Linux-syscall-note SPDX-URL: https://spdx.org/licenses/Linux-syscall-note.html SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+, GPL-2.0-only, GPL-2.0-or-later Usage-Guide: This exception is used together with one of the above SPDX-Licenses to mark user space API (uapi) header files so they can be included into non GPL compliant user space application code. To use this exception add it with the keyword WITH to one of the identifiers in the SPDX-Licenses tag: SPDX-License-Identifier: <SPDX-License> WITH Linux-syscall-note License-Text: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". Also note that the GPL below is copyrighted by the Free Software Foundation, but the instance of code that it refers to (the Linux kernel) is copyrighted by me and others who actually wrote it. Also note that the only valid version of the GPL as far as the kernel is concerned is _this_ particular version of the license (ie v2, not v2.2 or v3.x or whatever), unless explicitly otherwise stated. Linus Torvalds