Analysis Commands

Core commands for processing and analyzing methylation data.


all_analysis

Run the full analysis pipeline. Which methods are run depends on ``window_based``.

.. note::
   ``case_data`` and ``ctr_data`` must each be a list of samples, where each sample contains one of the following column formats:

   - ``["chromosome", "position_start", "coverage", "methylation_percentage"]``
   - ``["chromosome", "position_start", "positive_methylation_count", "negative_methylation_count"]``
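
The two accepted input formats carry equivalent information. The sketch below shows one way a count-based row could be converted into the coverage/percentage format. It assumes that ``positive_methylation_count`` is the number of methylated reads, that ``coverage`` is the sum of both counts, and that ``methylation_percentage`` is on a 0-100 scale. ``to_coverage_format`` is a hypothetical helper, not part of DiffMethylTools.

```python
def to_coverage_format(row: dict) -> dict:
    """Convert a count-based row to the coverage/percentage format.

    Assumptions (not confirmed by the docs): coverage is the sum of the
    positive and negative counts, and methylation_percentage is 0-100.
    """
    if "coverage" in row and "methylation_percentage" in row:
        return dict(row)  # already in the coverage/percentage format
    pos = row["positive_methylation_count"]
    neg = row["negative_methylation_count"]
    coverage = pos + neg
    pct = 100.0 * pos / coverage if coverage > 0 else float("nan")
    # Keep any extra columns (e.g. position_end, strand), drop the raw counts.
    out = {k: v for k, v in row.items()
           if k not in ("positive_methylation_count", "negative_methylation_count")}
    out["coverage"] = coverage
    out["methylation_percentage"] = pct
    return out


row = {"chromosome": "chr1", "position_start": 10468,
       "positive_methylation_count": 3, "negative_methylation_count": 1}
print(to_coverage_format(row))
# → {'chromosome': 'chr1', 'position_start': 10468, 'coverage': 4, 'methylation_percentage': 75.0}
```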

.. note::
   If ``window_based`` is True, the following methods are run:

   - ``merge_tables``
   - ``window_based``
   - ``generate_q_values``
   - ``filters``
   - ``map_win_2_pos``

   If ``window_based`` is False, the following methods are run:

   - ``merge_tables``
   - ``position_based``
   - ``generate_q_values``
   - ``filters``
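
The step selection described in the note can be summarized as a small function. This sketch only returns method names; ``pipeline_steps`` is a hypothetical helper, and treating the list order as the execution order is an assumption based on the order in which the note lists the methods.

```python
def pipeline_steps(window_based: bool = False) -> list:
    """Return the method names all_analysis runs (order assumed from the note)."""
    if window_based:
        return ["merge_tables", "window_based", "generate_q_values",
                "filters", "map_win_2_pos"]
    return ["merge_tables", "position_based", "generate_q_values", "filters"]


for flag in (False, True):
    print(f"window_based={flag}: {' -> '.join(pipeline_steps(flag))}")
```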

| Argument | Default | Description |
|---|---|---|
| ``--case_data`` | *Required* | Case data (list of files) |
| ``--ctr_data`` | *Required* | Control data (list of files) |
| ``--ref_folder`` | ``None`` | --- |
| ``--window_based`` | ``False`` | Run window-based analysis instead of position-based analysis, defaults to False |
| ``--min_cov_individual`` | ``10`` | Minimum coverage filter (individual), defaults to 10 |
| ``--min_cov_group`` | ``15`` | Minimum coverage filter (group), defaults to 15 |
| ``--filter_samples_ratio`` | ``0.6`` | Minimum sample ratio filter. Used with ``min_cov_group``, defaults to 0.6 |
| ``--meth_group_threshold`` | ``0.2`` | Methylation group threshold. Used with ``min_cov_group``, defaults to 0.2 |
| ``--cov_percentile`` | ``100.0`` | Maximum coverage filter (percentile of sample coverage). Ranges from 0.0 to 100.0, defaults to 100.0 |
| ``--min_samp_ctr`` | ``2`` | Minimum samples in control, defaults to 2 |
| ``--min_samp_case`` | ``2`` | Minimum samples in case, defaults to 2 |
| ``--max_q_value`` | ``0.05`` | Maximum q-value filter, defaults to 0.05 |
| ``--abs_min_diff`` | ``0.0`` | Minimum absolute difference filter, defaults to 0.0 |
| ``--features`` | ``None`` | --- |

merge_tables

Merge case and control data tables.

.. note::
   ``case_data`` and ``ctr_data`` must each be a list of samples, where each sample contains one of the following column formats:

   - ``["chromosome", "position_start", "coverage", "methylation_percentage"]``
   - ``["chromosome", "position_start", "positive_methylation_count", "negative_methylation_count"]``

   Optional ``"position_end"`` and ``"strand"`` columns may also be included; if present, they are taken into account when joining the tables.
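
One way to picture the join is an outer join of per-sample tables on the position columns. In this sketch each sample is a list of row dictionaries already in the coverage/percentage format. The per-sample column suffixes, the outer-join behaviour, and using ``position_end``/``strand`` as keys only when every row provides them are assumptions; ``join_keys`` and ``merge_samples`` are hypothetical helpers.

```python
BASE_KEYS = ("chromosome", "position_start")
OPTIONAL_KEYS = ("position_end", "strand")


def join_keys(samples: dict) -> tuple:
    """Use position_end/strand as join keys only if every row has them (assumption)."""
    keys = list(BASE_KEYS)
    for col in OPTIONAL_KEYS:
        if all(col in row for rows in samples.values() for row in rows):
            keys.append(col)
    return tuple(keys)


def merge_samples(samples: dict) -> list:
    """Outer-join per-sample rows into one wide table keyed on position columns."""
    keys = join_keys(samples)
    merged = {}
    for name, rows in samples.items():
        for row in rows:
            key = tuple(row[k] for k in keys)
            out = merged.setdefault(key, dict(zip(keys, key)))
            out[f"coverage_{name}"] = row["coverage"]
            out[f"methylation_percentage_{name}"] = row["methylation_percentage"]
    # Positions missing from a sample get None for that sample's columns.
    columns = [f"{col}_{name}" for name in samples
               for col in ("coverage", "methylation_percentage")]
    result = []
    for key in sorted(merged):
        row = merged[key]
        for col in columns:
            row.setdefault(col, None)
        result.append(row)
    return result


samples = {
    "case1": [{"chromosome": "chr1", "position_start": 100,
               "coverage": 12, "methylation_percentage": 80.0}],
    "ctr1": [{"chromosome": "chr1", "position_start": 100,
              "coverage": 20, "methylation_percentage": 40.0},
             {"chromosome": "chr1", "position_start": 150,
              "coverage": 15, "methylation_percentage": 10.0}],
}
for merged_row in merge_samples(samples):
    print(merged_row)
```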

| Argument | Default | Description |
|---|---|---|
| ``--case_data`` | *Required* | The case data to be merged. |
| ``--ctr_data`` | *Required* | The control data to be merged. |
| ``--min_cov_individual`` | ``10`` | Minimum coverage filter (individual), defaults to 10 |
| ``--min_cov_group`` | ``15`` | Minimum coverage filter (group), defaults to 15 |
| ``--filter_samples_ratio`` | ``0.6`` | Minimum sample ratio filter. Used with ``min_cov_group``, defaults to 0.6 |
| ``--meth_group_threshold`` | ``0.2`` | Methylation group threshold. Used with ``min_cov_group``, defaults to 0.2 |
| ``--cov_percentile`` | ``100.0`` | Maximum coverage filter (percentile of sample coverage). Ranges from 0.0 to 100.0, defaults to 100.0 |
| ``--min_samp_ctr`` | ``2`` | Minimum samples in control, defaults to 2 |
| ``--min_samp_case`` | ``2`` | Minimum samples in case, defaults to 2 |
| ``--rerun`` | ``False`` | Rerun the analysis. If False, load the previous output. Defaults to False. |
| ``--small_mean`` | ``1`` | --- |
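
The sketch below illustrates how the per-sample coverage filters could be applied. It assumes that a site passes when its coverage is at least ``min_cov_individual`` and at most the ``cov_percentile``-th percentile of that sample's coverage. The nearest-rank percentile used here is a stand-in and may differ from the interpolation DiffMethylTools actually uses, and the group-level options (``min_cov_group``, ``filter_samples_ratio``, ``meth_group_threshold``) are not modelled. Both helpers are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence, with pct in 0-100."""
    ordered = sorted(values)
    if pct <= 0:
        return ordered[0]
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[rank - 1]


def coverage_mask(coverages, min_cov_individual=10, cov_percentile=100.0):
    """Flag sites of one sample that pass both coverage filters (assumed semantics)."""
    upper = nearest_rank_percentile(coverages, cov_percentile)
    return [min_cov_individual <= c <= upper for c in coverages]


sample_coverage = [5, 12, 30, 400]
print(coverage_mask(sample_coverage))                       # → [False, True, True, True]
print(coverage_mask(sample_coverage, cov_percentile=75.0))  # → [False, True, True, False]
```

Under these assumptions, the default ``cov_percentile`` of 100.0 sets the upper bound to the sample's maximum coverage, so only the minimum-coverage filter removes sites.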

position_based

Perform position-based DML (differentially methylated loci) detection. Two methods are available: one based on the gamma function, and one based on the limma R package.

.. note::
   ``data`` must contain the following column format:

   - ``["chromosome", "position_start", "methylation_percentage*"]``
   - There must be one ``methylation_percentage`` column for each sample to test.

| Argument | Default | Description |
|---|---|---|
| ``--data`` | ``None`` | Input data. Not necessary if the pipeline is in use, defaults to None |
| ``--method`` | ``limma`` | Position-based method to use, options are ["gamma", "limma"], defaults to "limma" |
| ``--features`` | ``None`` | Features .csv file used with the "limma" method. Its index holds the names of the samples in ``data`` and its columns hold the features, defaults to None |
| ``--test_factor`` | ``Group`` | Test factor used with the "limma" method, defaults to "Group" |
| ``--processes`` | ``12`` | Number of CPU processes, defaults to 12 |
| ``--model`` | ``eBayes`` | Limma model to use, options are ["eBayes", "treat"], defaults to "eBayes" |
| ``--min_std`` | ``0.1`` | Minimum standard deviation filter, defaults to 0.1 |
| ``--fill_na`` | ``True`` | Fill NA values with the row's group average, defaults to True |
| ``--rerun`` | ``False`` | Rerun the analysis. If False, load the previous output. Defaults to False. |
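
A minimal sketch of the ``fill_na`` and ``min_std`` options for a single position. It assumes that missing values are replaced by the mean of the non-missing values in the same group (case or control), and that the filter uses the population standard deviation across all samples of the row. The helper names are hypothetical, and the statistical test itself (gamma or limma) is not shown.

```python
import statistics
from typing import List, Optional


def fill_group_na(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace None with the mean of the non-missing values in the same group."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average; leave the group as-is
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]


def prepare_row(case, ctr, fill_na=True, min_std=0.1):
    """Return (case, ctr) for one position, or None if it fails the min_std filter."""
    if fill_na:
        case, ctr = fill_group_na(case), fill_group_na(ctr)
    observed = [v for v in case + ctr if v is not None]
    if len(observed) < 2 or statistics.pstdev(observed) < min_std:
        return None  # too few values, or too little variation to test
    return case, ctr


print(prepare_row([80.0, None, 70.0], [20.0, 30.0]))  # → ([80.0, 75.0, 70.0], [20.0, 30.0])
print(prepare_row([50.0, 50.0], [50.0]))             # → None (no variation)
```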

process_regions

No description available.

| Argument | Default | Description |
|---|---|---|
| ``--region_file`` | ``None`` | --- |
| ``--ref_folder`` | ``None`` | --- |
| ``--annotation_file`` | ``CpG_gencode_annotation.bed`` | --- |
| ``--gene_bed_file`` | ``gencode.v42.chr_patch_hapl_scaff.annotation.genes.bed`` | --- |
| ``--ccre_file`` | ``encodeCcreCombined.bed`` | --- |

generate_DMR

Generate Differentially Methylated Regions (DMRs) by clustering DMLs.

.. note::
   Required columns for ``significant_position_data``:

   - ``["chromosome", "position_start", "diff"]``

   Required columns for ``position_data``:

   - ``["chromosome", "position_start", "diff"]``

| Argument | Default | Description |
|---|---|---|
| ``--significant_position_data`` | ``None`` | Significant position data. Not necessary if the pipeline is in use, defaults to None |
| ``--position_data`` | ``None`` | All position data. Not necessary if the pipeline is in use, defaults to None |
| ``--min_pos`` | ``3`` | Minimum number of positions per DMR, defaults to 3 |
| ``--neutral_change_limit`` | ``7.5`` | Neutral change limit, defaults to 7.5 |
| ``--neutral_perc`` | ``30`` | Neutral percentage, defaults to 30 |
| ``--opposite_perc`` | ``10`` | Opposite percentage, defaults to 10 |
| ``--significant_position_pipeline`` | ``auto`` | The significant position-based or window-based results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "position", "window"], defaults to "auto" |
| ``--rerun`` | ``False`` | Rerun the analysis. If False, load the previous output. Defaults to False. |
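
As an illustration of clustering DMLs into regions, the sketch below sorts the significant positions and groups them into runs on the same chromosome whose neighbours are at most ``max_gap`` bases apart and whose ``diff`` values share a sign, then keeps runs with at least ``min_pos`` positions. ``max_gap`` and the sign rule are hypothetical choices, not documented DiffMethylTools parameters, and the ``neutral_change_limit``/``neutral_perc``/``opposite_perc`` checks against ``position_data`` are not modelled.

```python
def cluster_dmls(positions, min_pos=3, max_gap=1000):
    """Group significant positions into candidate regions.

    positions: iterable of (chromosome, position_start, diff) tuples.
    max_gap is a hypothetical parameter, not a DiffMethylTools option.
    """
    regions = []

    def flush(cluster):
        # Keep only clusters with enough positions.
        if len(cluster) >= min_pos:
            regions.append({
                "chromosome": cluster[0][0],
                "region_start": cluster[0][1],
                "region_end": cluster[-1][1],
                "num_positions": len(cluster),
                "mean_diff": sum(p[2] for p in cluster) / len(cluster),
            })

    current = []
    for pos in sorted(positions):
        if current:
            prev = current[-1]
            new_chrom = pos[0] != prev[0]
            too_far = pos[1] - prev[1] > max_gap
            sign_flip = (pos[2] > 0) != (prev[2] > 0)
            if new_chrom or too_far or sign_flip:
                flush(current)
                current = []
        current.append(pos)
    flush(current)
    return regions


dmls = [("chr1", 100, 0.25), ("chr1", 150, 0.5), ("chr1", 180, 0.75),
        ("chr1", 5000, 0.25), ("chr1", 5050, -0.5)]
print(cluster_dmls(dmls))
# → [{'chromosome': 'chr1', 'region_start': 100, 'region_end': 180, 'num_positions': 3, 'mean_diff': 0.5}]
```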

filters

Filter positions by q-value and minimum difference.

.. note::
   Required columns for ``data``:

   - ``["q-value", "diff"]``

| Argument | Default | Description |
|---|---|---|
| ``--data`` | ``None`` | Input data. Not necessary if the pipeline is in use, defaults to None |
| ``--max_q_value`` | ``0.05`` | Maximum q-value filter, defaults to 0.05 |
| ``--abs_min_diff`` | ``0.1`` | Minimum absolute difference filter, defaults to 0.1 |
| ``--position_or_window`` | ``auto`` | The position-based or window-based results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "position", "window"], defaults to "auto" |
| ``--rerun`` | ``False`` | Rerun the analysis. If False, load the previous output. Defaults to False. |
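
The filter can be sketched as follows, assuming inclusive bounds (q-value at most ``max_q_value`` and absolute ``diff`` at least ``abs_min_diff``) and that ``diff`` is on the same scale as ``abs_min_diff``. ``apply_filters`` is a hypothetical helper.

```python
def apply_filters(rows, max_q_value=0.05, abs_min_diff=0.1):
    """Keep rows whose q-value and absolute diff pass both thresholds (inclusive bounds assumed)."""
    return [r for r in rows
            if r["q-value"] <= max_q_value and abs(r["diff"]) >= abs_min_diff]


results = [
    {"q-value": 0.01, "diff": 0.30},   # kept
    {"q-value": 0.20, "diff": 0.50},   # dropped: q-value too high
    {"q-value": 0.01, "diff": -0.15},  # kept: the absolute difference is used
    {"q-value": 0.04, "diff": 0.05},   # dropped: difference too small
]
print(apply_filters(results))
```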

map_win_2_pos

Map windows to positions.

.. note::
   Required columns for ``window_data``:

   - ``["chromosome", "region_start", "region_end"]``

   Required columns for ``position_data``:

   - ``["chromosome", "position_start", "avg_case", "avg_ctr"]``

| Argument | Default | Description |
|---|---|---|
| ``--window_data`` | ``None`` | Window data to map with. Not necessary if the pipeline is in use, defaults to None |
| ``--position_data`` | ``None`` | Position data to map the windows to. Not necessary if the pipeline is in use, defaults to None |
| ``--processes`` | ``12`` | Number of CPU processes, defaults to 12 |
| ``--sub_window_size`` | ``100`` | Sub-window size used for finer-grained difference filtering, defaults to 100 |
| ``--sub_window_step`` | ``100`` | Sub-window step size, defaults to 100 |
| ``--sub_window_min_diff`` | ``0`` | Sub-window minimum difference, defaults to 0 |
| ``--pipeline_window_result`` | ``auto`` | The function results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "filters", "generate_q_values", "window_based"], defaults to "auto" |
| ``--rerun`` | ``False`` | Rerun the analysis. If False, load the previous output. Defaults to False. |
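
A minimal sketch of mapping one window to positions with sub-window filtering. It assumes inclusive window bounds, half-open sub-windows ``[start, start + sub_window_size)`` advanced by ``sub_window_step`` from ``region_start``, and that a sub-window is kept when the absolute difference between the mean ``avg_case`` and the mean ``avg_ctr`` of its positions is at least ``sub_window_min_diff``. These semantics and the helper names ``sub_windows`` and ``map_window_to_positions`` are assumptions.

```python
def sub_windows(start, end, size=100, step=100):
    """Yield half-open (sub_start, sub_end) intervals whose starts lie in [start, end]."""
    s = start
    while s <= end:
        yield s, s + size
        s += step


def map_window_to_positions(window, positions, size=100, step=100, min_diff=0.0):
    """Return positions inside the window whose sub-window passes min_diff."""
    chrom = window["chromosome"]
    start, end = window["region_start"], window["region_end"]
    inside = [p for p in positions
              if p["chromosome"] == chrom and start <= p["position_start"] <= end]
    kept = {}
    for s, e in sub_windows(start, end, size, step):
        sub = [p for p in inside if s <= p["position_start"] < e]
        if not sub:
            continue
        mean_case = sum(p["avg_case"] for p in sub) / len(sub)
        mean_ctr = sum(p["avg_ctr"] for p in sub) / len(sub)
        if abs(mean_case - mean_ctr) >= min_diff:
            for p in sub:
                kept[p["position_start"]] = p  # dict dedupes overlapping sub-windows
    return [kept[k] for k in sorted(kept)]


window = {"chromosome": "chr1", "region_start": 1000, "region_end": 1199}
positions = [
    {"chromosome": "chr1", "position_start": 1010, "avg_case": 0.8, "avg_ctr": 0.2},
    {"chromosome": "chr1", "position_start": 1050, "avg_case": 0.7, "avg_ctr": 0.3},
    {"chromosome": "chr1", "position_start": 1150, "avg_case": 0.5, "avg_ctr": 0.5},
    {"chromosome": "chr1", "position_start": 1300, "avg_case": 0.9, "avg_ctr": 0.1},
    {"chromosome": "chr2", "position_start": 1010, "avg_case": 0.9, "avg_ctr": 0.1},
]
print([p["position_start"] for p in map_window_to_positions(window, positions)])
print([p["position_start"] for p in map_window_to_positions(window, positions, min_diff=0.2)])
```

Under these assumptions, the default ``sub_window_min_diff`` of 0 keeps every position inside the window, so sub-window filtering only has an effect when a positive threshold is set.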