Analysis Commands
Core commands for processing and analyzing methylation data.
all_analysis
Run all analysis methods.
.. note::
case_data and ctr_data must each be a list of samples, where every sample uses one of the following column formats:
- ["chromosome", "position_start", "coverage", "methylation_percentage"]
- ["chromosome", "position_start", "positive_methylation_count", "negative_methylation_count"]
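The two column formats carry the same information if the positive and negative counts add up to the coverage. The sketch below converts a count-format row into a coverage/percentage row. It is illustrative only: `CoverageRow` and `counts_to_coverage_row` are hypothetical names, and it assumes that the positive count is the number of methylated reads and that methylation_percentage is on a 0-100 scale, neither of which is confirmed here.

```python
from typing import NamedTuple


class CoverageRow(NamedTuple):
    """One row in the coverage/percentage column format."""
    chromosome: str
    position_start: int
    coverage: int
    methylation_percentage: float


def counts_to_coverage_row(chromosome, position_start, positive_count, negative_count):
    """Convert a count-format row into a coverage/percentage row.

    Assumptions (not confirmed by the tool's documentation):
    - coverage is the sum of the positive and negative counts
    - the percentage is expressed on a 0-100 scale
    """
    coverage = positive_count + negative_count
    if coverage == 0:
        raise ValueError("a position with zero reads has no defined methylation percentage")
    percentage = 100.0 * positive_count / coverage
    return CoverageRow(chromosome, position_start, coverage, percentage)


# Example: 18 methylated and 6 unmethylated reads at one position.
row = counts_to_coverage_row("chr1", 10468, 18, 6)
print(row)
```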
.. note::
If window_based is True, the following methods are run:
- merge_tables
- window_based
- generate_q_values
- filters
- map_win_2_pos
If window_based is False, the following methods are run:
- merge_tables
- position_based
- generate_q_values
- filters
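The two method lists differ only in the detection step and in the final window-to-position mapping. A minimal sketch of that selection logic follows. `planned_steps` is a hypothetical helper, not part of the tool, and it assumes the methods run in the order listed.

```python
def planned_steps(window_based: bool) -> list[str]:
    """Return the method names all_analysis is documented to run.

    Assumption: the listed order is also the execution order.
    """
    steps = ["merge_tables"]
    # The detection step depends on the analysis mode.
    steps.append("window_based" if window_based else "position_based")
    steps += ["generate_q_values", "filters"]
    # Only the window-based mode maps windows back to positions.
    if window_based:
        steps.append("map_win_2_pos")
    return steps


print(planned_steps(True))
print(planned_steps(False))
```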
| Argument | Default | Description |
|---|---|---|
| --case_data | *Required* | Case data (list of files) |
| --ctr_data | *Required* | Control data (list of files) |
| --ref_folder | None | --- |
| --window_based | False | Window-based analysis, defaults to False |
| --min_cov_individual | 10 | Minimum coverage filter (individual), defaults to 10 |
| --min_cov_group | 15 | Minimum coverage filter (group), defaults to 15 |
| --filter_samples_ratio | 0.6 | Minimum sample ratio filter. Used with min_cov_group, defaults to 0.6 |
| --meth_group_threshold | 0.2 | Methylation group threshold. Used with min_cov_group, defaults to 0.2 |
| --cov_percentile | 100.0 | Maximum coverage filter (percentile of sample coverage). Ranges from 0.0-100.0, defaults to 100.0 |
| --min_samp_ctr | 2 | Minimum samples in control, defaults to 2 |
| --min_samp_case | 2 | Minimum samples in case, defaults to 2 |
| --max_q_value | 0.05 | Maximum q-value filter, defaults to 0.05 |
| --abs_min_diff | 0.0 | Minimum absolute difference filter, defaults to 0.0 |
| --features | None | --- |
merge_tables
Merge case and control data tables.
.. note::
case_data and ctr_data must each be a list of samples, where every sample uses one of the following column formats:
- ["chromosome", "position_start", "coverage", "methylation_percentage"]
- ["chromosome", "position_start", "positive_methylation_count", "negative_methylation_count"]
- Additionally, "position_end" and "strand" columns can be used. These will be considered during the join process.
| Argument | Default | Description |
|---|---|---|
| --case_data | *Required* | The case data to be merged. |
| --ctr_data | *Required* | The control data to be merged. |
| --min_cov_individual | 10 | Minimum coverage filter (individual), defaults to 10 |
| --min_cov_group | 15 | Minimum coverage filter (group), defaults to 15 |
| --filter_samples_ratio | 0.6 | Minimum sample ratio filter. Used with min_cov_group, defaults to 0.6 |
| --meth_group_threshold | 0.2 | Methylation group threshold. Used with min_cov_group, defaults to 0.2 |
| --cov_percentile | 100.0 | Maximum coverage filter (percentile of sample coverage). Ranges from 0.0-100.0, defaults to 100.0 |
| --min_samp_ctr | 2 | Minimum samples in control, defaults to 2 |
| --min_samp_case | 2 | Minimum samples in case, defaults to 2 |
| --rerun | False | Rerun the analysis. If False, load previous output. Defaults to False. |
| --small_mean | 1 | --- |
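As a rough illustration of how the individual minimum-coverage filter and the percentile-based maximum-coverage filter could interact, the sketch below keeps a position only when its coverage lies between min_cov_individual and the cov_percentile cap. This is not the tool's implementation. `coverage_cap` and `keep_position` are hypothetical helpers, the nearest-rank percentile definition is an assumption (the tool may interpolate), and both bounds are assumed to be inclusive.

```python
import math


def coverage_cap(coverages, cov_percentile):
    """Nearest-rank percentile of one sample's coverage values.

    Assumption: nearest-rank definition; the real tool may interpolate.
    """
    if not coverages:
        raise ValueError("no coverage values supplied")
    ordered = sorted(coverages)
    rank = max(1, math.ceil(cov_percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]


def keep_position(coverage, min_cov_individual, cap):
    """Keep a position whose coverage lies within both bounds (assumed inclusive)."""
    return min_cov_individual <= coverage <= cap


coverages = [4, 12, 15, 20, 35, 400]

# With the default percentile of 100.0 the cap is the maximum, so nothing is cut from the top.
cap = coverage_cap(coverages, 100.0)
kept = [c for c in coverages if keep_position(c, 10, cap)]
print(cap, kept)
```

Lowering the percentile in this sketch drops the extreme high-coverage position, which is the usual motivation for a maximum coverage filter.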
position_based
Perform position-based detection of differentially methylated loci (DMLs), using either the gamma function or the limma R package.
.. note::
data must contain the following column format:
- ["chromosome", "position_start", "methylation_percentage*"]
- There must be a methylation_percentage column for each sample to test.
| Argument | Default | Description |
|---|---|---|
| --data | None | Input data. Not necessary if the pipeline is in use, defaults to None |
| --method | limma | Position-based method to use, options are ["gamma", "limma"], defaults to "limma" |
| --features | None | Features .csv file used with the "limma" method; its index holds the names of the samples in data and its columns hold the features, defaults to None |
| --test_factor | Group | Test factor used with the "limma" method, defaults to "Group" |
| --processes | 12 | Number of CPU processes, defaults to 12 |
| --model | eBayes | Limma model to use, options are ["eBayes", "treat"], defaults to "eBayes" |
| --min_std | 0.1 | Minimum standard deviation filter, defaults to 0.1 |
| --fill_na | True | Fill NA values with row group average, defaults to True |
| --rerun | False | Rerun the analysis. If False, load previous output. Defaults to False. |
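The fill_na option is described as filling NA values with the row group average. One plausible reading, sketched below, is that a missing value at a position is replaced by the mean of the non-missing values from samples in the same group at that position. `fill_na_with_group_mean`, the sample names, and the group labels are hypothetical; the tool's actual behavior may differ.

```python
from statistics import mean


def fill_na_with_group_mean(row, groups):
    """Fill missing values in one row with the mean of the sample's group.

    row:    dict mapping sample name -> methylation percentage, or None if missing
    groups: dict mapping sample name -> group label (for example "case" or "ctr")

    Assumption: "row group average" means the mean over non-missing samples
    of the same group at the same position.
    """
    observed = {}
    for sample, value in row.items():
        if value is not None:
            observed.setdefault(groups[sample], []).append(value)

    filled = {}
    for sample, value in row.items():
        group = groups[sample]
        # A value stays missing when its whole group is missing at this position.
        if value is None and group in observed:
            value = mean(observed[group])
        filled[sample] = value
    return filled


groups = {"case_1": "case", "case_2": "case", "case_3": "case", "ctr_1": "ctr", "ctr_2": "ctr"}
row = {"case_1": 80.0, "case_2": None, "case_3": 60.0, "ctr_1": 20.0, "ctr_2": None}
filled = fill_na_with_group_mean(row, groups)
print(filled)
```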
process_regions
No description available.
| Argument | Default | Description |
|---|---|---|
| --region_file | None | --- |
| --ref_folder | None | --- |
| --annotation_file | CpG_gencode_annotation.bed | --- |
| --gene_bed_file | gencode.v42.chr_patch_hapl_scaff.annotation.genes.bed | --- |
| --ccre_file | encodeCcreCombined.bed | --- |
generate_DMR
Generate Differentially Methylated Regions (DMRs) by clustering DMLs.
.. note::
Required columns for significant_position_data:
- ["chromosome", "position_start", "diff"]
Required columns for position_data:
- ["chromosome", "position_start", "diff"]
| Argument | Default | Description |
|---|---|---|
| --significant_position_data | None | Significant position data. Not necessary if the pipeline is in use, defaults to None |
| --position_data | None | All position data. Not necessary if the pipeline is in use, defaults to None |
| --min_pos | 3 | Minimum positions, defaults to 3 |
| --neutral_change_limit | 7.5 | Neutral change limit, defaults to 7.5 |
| --neutral_perc | 30 | Neutral percentage, defaults to 30 |
| --opposite_perc | 10 | Opposite percentage, defaults to 10 |
| --significant_position_pipeline | auto | The significant position-based or window-based results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "position", "window"], defaults to "auto" |
| --rerun | False | Rerun the analysis. If False, load previous output. Defaults to False. |
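To give a feel for what clustering DMLs into regions can look like, here is a deliberately simplified gap-based sketch. It is not the algorithm generate_DMR uses. It ignores neutral_change_limit, neutral_perc, and opposite_perc, whose exact semantics are not described here, and `cluster_positions` and its `max_gap` parameter are hypothetical. Only min_pos is taken from the table above.

```python
def cluster_positions(positions, max_gap=500, min_pos=3):
    """Group nearby significant positions into candidate regions.

    positions: list of (chromosome, position_start, diff), sorted by
               chromosome and then by position_start.
    max_gap:   hypothetical maximum distance between neighbouring positions.
    min_pos:   minimum number of positions a region must contain.

    Returns a list of (chromosome, region_start, region_end, n_positions).
    """
    regions = []
    current = []

    def flush():
        # Emit the current run only if it has enough positions.
        if len(current) >= min_pos:
            regions.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos, diff in positions:
        if current:
            prev_chrom, prev_pos, prev_diff = current[-1]
            same_run = (
                chrom == prev_chrom
                and pos - prev_pos <= max_gap
                and (diff > 0) == (prev_diff > 0)  # same direction of change
            )
            if not same_run:
                flush()
                current = []
        current.append((chrom, pos, diff))
    flush()
    return regions


positions = [
    ("chr1", 100, 20.0),
    ("chr1", 150, 25.0),
    ("chr1", 300, 18.0),
    ("chr1", 5000, 22.0),   # too far from the previous position
    ("chr2", 120, -30.0),
    ("chr2", 180, -15.0),   # only two positions on chr2, below min_pos
]
regions = cluster_positions(positions)
print(regions)
```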
filters
Filter positions by q-value and minimum difference.
.. note::
Required columns for data:
- ["q-value", "diff"]
| Argument | Default | Description |
|---|---|---|
| --data | None | Input data. Not necessary if the pipeline is in use, defaults to None |
| --max_q_value | 0.05 | Maximum q-value filter, defaults to 0.05 |
| --abs_min_diff | 0.1 | Absolute minimum difference filter, defaults to 0.10 |
| --position_or_window | auto | The position-based or window-based results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "position", "window"], defaults to "auto" |
| --rerun | False | Rerun the analysis. If False, load previous output. Defaults to False. |
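The filtering rule can be sketched in a few lines: keep a row when its q-value is at most max_q_value and the absolute value of its diff is at least abs_min_diff. `passes_filters` is a hypothetical helper, and the inclusive comparisons and the scale of diff are assumptions rather than documented behavior.

```python
def passes_filters(row, max_q_value=0.05, abs_min_diff=0.1):
    """Return True when a row satisfies both thresholds.

    Assumptions: both comparisons are inclusive, and "diff" is on the same
    scale as abs_min_diff.
    """
    return row["q-value"] <= max_q_value and abs(row["diff"]) >= abs_min_diff


rows = [
    {"q-value": 0.01, "diff": 0.30},   # kept
    {"q-value": 0.20, "diff": 0.45},   # q-value too high
    {"q-value": 0.03, "diff": -0.25},  # kept: the sign of diff does not matter
    {"q-value": 0.04, "diff": 0.02},   # difference too small
]
significant = [r for r in rows if passes_filters(r)]
print(significant)
```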
map_win_2_pos
Map windows to positions.
.. note::
Required columns for window_data:
- ["chromosome", "region_start", "region_end"]
Required columns for position_data:
- ["chromosome", "position_start", "avg_case", "avg_ctr"]
| Argument | Default | Description |
|---|---|---|
| --window_data | None | Window data to map with. Not necessary if the pipeline is in use, defaults to None |
| --position_data | None | Position data to map the windows to. Not necessary if the pipeline is in use, defaults to None |
| --processes | 12 | Number of CPU processes, defaults to 12 |
| --sub_window_size | 100 | Sub-window size for deeper difference filtering, defaults to 100 |
| --sub_window_step | 100 | Sub-window step size, defaults to 100 |
| --sub_window_min_diff | 0 | Sub-window minimum difference, defaults to 0 |
| --pipeline_window_result | auto | The function results to use as input if DiffMethylTools is pipelined and no data is provided. Options are ["auto", "filters", "generate_q_values", "window_based"], defaults to "auto" |
| --rerun | False | Rerun the analysis. If False, load previous output. Defaults to False. |
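At its core, mapping windows to positions is an interval lookup: for each window, find the positions on the same chromosome whose position_start falls between region_start and region_end. The sketch below shows that lookup with a sorted list and binary search. It leaves out the sub-window filtering and the avg_case and avg_ctr columns. `map_windows_to_positions` is a hypothetical helper, and treating region_end as inclusive is an assumption.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def map_windows_to_positions(windows, positions):
    """Map each window to the positions it contains.

    windows:   iterable of (chromosome, region_start, region_end)
    positions: iterable of (chromosome, position_start)

    Assumption: both window bounds are inclusive.
    """
    # Index position starts per chromosome so each window lookup is a binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in positions:
        by_chrom[chrom].append(pos)
    for starts in by_chrom.values():
        starts.sort()

    mapping = {}
    for chrom, start, end in windows:
        starts = by_chrom.get(chrom, [])
        lo = bisect_left(starts, start)
        hi = bisect_right(starts, end)
        mapping[(chrom, start, end)] = starts[lo:hi]
    return mapping


windows = [("chr1", 1000, 2000), ("chr2", 500, 1500)]
positions = [
    ("chr1", 950), ("chr1", 1000), ("chr1", 1500),
    ("chr1", 2000), ("chr1", 2001), ("chr2", 1600),
]
mapping = map_windows_to_positions(windows, positions)
print(mapping)
```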