Quick Start Tutorial
DiffMethylTools accepts both standard methylation input formats and custom column layouts. You can either select a standard format with --input_format or define the column indices manually.
1. Core Analysis (Generate DML/DMR)
You can run the entire core analysis pipeline using the all_analysis command.
Option A: BED Format with Methylation Percentage
Use this if your data is in BED format. DiffMethylTools automatically interprets the chromosome, position, coverage, and methylation percentage columns.
Supported via: --input_format BED
Expected BED-like format (example):
chr1 10468 10469 5mC 743 + 10468 10469 0,0,0 52 95.21
chr1 10470 10471 5mC 850 + 10470 10471 0,0,0 58 94.26
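To make the column layout concrete, here is a minimal Python sketch that splits the first example row on whitespace and pulls out the four fields the tool needs. The 0-based indices (0, 1, 9, 10) are taken from the custom-format example for BED-style percentages later in this tutorial. The names MethylSite and parse_bed_row are hypothetical and are not part of DiffMethylTools.

```python
from typing import NamedTuple


class MethylSite(NamedTuple):
    chromosome: str
    start: int
    coverage: int
    percent_methylated: float


def parse_bed_row(line: str) -> MethylSite:
    """Split one BED-like row and keep the fields at 0-based indices 0, 1, 9, 10."""
    fields = line.split()  # splits on any whitespace (tabs or spaces)
    return MethylSite(
        chromosome=fields[0],
        start=int(fields[1]),
        coverage=int(fields[9]),
        percent_methylated=float(fields[10]),
    )


# First example row from the tutorial.
row = "chr1 10468 10469 5mC 743 + 10468 10469 0,0,0 52 95.21"
site = parse_bed_row(row)
print(site)
```

If a file of yours parses cleanly with these indices, --input_format BED is likely the right choice. Otherwise, the custom column options in Option C may be a better fit.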
Run Analysis:
DiffMethylTools all_analysis \
--case_data_file case1.bed case2.bed \
--ctr_data_file ctr1.bed ctr2.bed \
--input_format BED \
--ref_folder hg38
Option B: Bismark CpG Report Format (CR)
Use this if your data is from Bismark. DiffMethylTools automatically detects chromosome, CpG position, methylated counts, and unmethylated counts.
Supported via: --input_format CR
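Expected Bismark CpG report format (tab-separated, one cytosine per line): chromosome, position, strand, methylated count, unmethylated count, C-context, trinucleotide context.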
Run Analysis:
DiffMethylTools all_analysis \
--case_data_file case1_CpG_report.txt case2_CpG_report.txt \
--ctr_data_file ctr1_CpG_report.txt ctr2_CpG_report.txt \
--input_format CR \
--ref_folder hg38
Option C: Flexible / Custom Input Format
If your input files do not follow the standard BED or Bismark CpG report formats, you can specify the column indices manually (0-based). For both case and control files, you must define the field separator, the chromosome column, the start position column, and one of the following pairs: the methylation percentage and coverage columns, or the methylated and unmethylated count columns.
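The two accepted column pairs carry the same information: coverage is the sum of the methylated and unmethylated counts, and the methylation percentage is the methylated share of that coverage. The sketch below shows the arithmetic with made-up numbers. It is illustrative only and makes no claim about how DiffMethylTools converts values internally; both function names are hypothetical.

```python
def counts_to_percentage(methylated: int, unmethylated: int) -> tuple[int, float]:
    """Return (coverage, methylation percentage) from read counts."""
    coverage = methylated + unmethylated
    if coverage == 0:
        raise ValueError("site has no coverage")
    return coverage, 100.0 * methylated / coverage


def percentage_to_counts(coverage: int, percentage: float) -> tuple[int, int]:
    """Approximate (methylated, unmethylated) counts; rounding may lose precision."""
    methylated = round(coverage * percentage / 100.0)
    return methylated, coverage - methylated


# Illustrative values, not taken from real data.
coverage, pct = counts_to_percentage(methylated=45, unmethylated=5)
print(coverage, pct)                        # 50 90.0
print(percentage_to_counts(coverage, pct))  # (45, 5)
```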
Example 1: Custom format with CpG report counts
DiffMethylTools all_analysis \
--case_data_file case1.txt case2.txt \
--ctr_data_file ctr1.txt ctr2.txt \
--case_data_chromosome_column_index 0 \
--ctr_data_chromosome_column_index 0 \
--case_data_position_start_column_index 1 \
--ctr_data_position_start_column_index 1 \
--case_data_positive_methylation_count_column_index 3 \
--case_data_negative_methylation_count_column_index 4 \
--ctr_data_positive_methylation_count_column_index 3 \
--ctr_data_negative_methylation_count_column_index 4 \
--case_data_separator '\t' \
--ctr_data_separator '\t' \
--ref_folder hg38
Example 2: Custom format with BED-style percentages
DiffMethylTools all_analysis \
--case_data_file case1.bed case2.bed \
--ctr_data_file ctr1.bed ctr2.bed \
--case_data_separator '\t' \
--ctr_data_separator '\t' \
--case_data_chromosome_column_index 0 \
--ctr_data_chromosome_column_index 0 \
--case_data_position_start_column_index 1 \
--ctr_data_position_start_column_index 1 \
--case_data_methylation_percentage_column_index 10 \
--case_data_coverage_column_index 9 \
--ctr_data_methylation_percentage_column_index 10 \
--ctr_data_coverage_column_index 9 \
--ref_folder hg38
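Because every custom-format option appears once with a case_data prefix and once with a ctr_data prefix, the command gets long and the two sides can drift apart. One way to keep them in sync is to assemble the argument list in a script. The sketch below only builds and prints the arguments for Example 2. The flag names are copied from the examples above, while build_args and the settings dictionary are hypothetical helpers, not part of DiffMethylTools.

```python
import shlex


def build_args(case_files, ctr_files, column_settings, ref_folder):
    """Assemble an all_analysis argument list, mirroring each setting for case and ctr."""
    args = ["DiffMethylTools", "all_analysis"]
    args += ["--case_data_file", *case_files]
    args += ["--ctr_data_file", *ctr_files]
    for name, value in column_settings.items():
        for group in ("case", "ctr"):
            args += [f"--{group}_data_{name}", str(value)]
    args += ["--ref_folder", ref_folder]
    return args


settings = {
    "separator": "\\t",  # literal backslash-t, matching '\t' in the shell examples
    "chromosome_column_index": 0,
    "position_start_column_index": 1,
    "methylation_percentage_column_index": 10,
    "coverage_column_index": 9,
}
args = build_args(["case1.bed", "case2.bed"], ["ctr1.bed", "ctr2.bed"], settings, "hg38")
print(shlex.join(args))  # shell-quoted command line, ready to copy or log
```

Passing args to subprocess.run(args, check=True) would then execute the command, assuming DiffMethylTools is installed and on your PATH.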
Understanding the Output
Running all_analysis will generate two folders in your working directory:
plot/
This folder is initially empty. The plotting commands fill it with the figures generated by DiffMethylTools (e.g., volcano plots, methylation curves, pie charts).
data/
This folder contains the required input and intermediate files used during the analysis. Below is a description of each file:
- merge_tables.csv: Merged and coverage-filtered methylation data for both case and control samples.
- position_based.csv: Methylation data summarized at the individual CpG or position level, including q-value and difference information.
- filters.csv: Filtered data based on methylation difference and statistical significance (q-value).
- generate_DMR_0.csv: Differentially methylated regions (DMRs), aggregated from position-level data.
- generate_DMR_1.csv: Differentially methylated loci (DMLs) that do not fall within any DMR (isolated differential sites).
- generate_DMR_2.csv: DMLs located within identified DMRs.
- map_positions_to_genes_genes.csv: Mapping of methylation regions to annotated gene features.
- map_positions_to_genes_CCRE.csv: Mapping of methylation regions to candidate cis-regulatory elements (CCREs).
- map_win_2_pos.csv: Maps DMRs to all underlying CpG positions (includes both DML and non-DML positions).
- state.yaml: YAML configuration file that tracks tool state, parameters, and progress.
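Before moving on to plotting, it can help to confirm that the run produced every file listed above. The following sketch checks a data/ folder against that list and prints the header row of position_based.csv. The file names come from the list above. The helper names are hypothetical, and the column names are not documented in this tutorial, so the sketch only reports whatever header it finds.

```python
import csv
from pathlib import Path

EXPECTED_FILES = [
    "merge_tables.csv",
    "position_based.csv",
    "filters.csv",
    "generate_DMR_0.csv",
    "generate_DMR_1.csv",
    "generate_DMR_2.csv",
    "map_positions_to_genes_genes.csv",
    "map_positions_to_genes_CCRE.csv",
    "map_win_2_pos.csv",
    "state.yaml",
]


def missing_outputs(data_dir):
    """Return the expected file names that are absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


def csv_header(path):
    """Return the first row of a comma-separated file (assumed to be the header)."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


if __name__ == "__main__":
    missing = missing_outputs("data")
    print("missing:", missing or "none")
    if "position_based.csv" not in missing:
        print(csv_header("data/position_based.csv"))
```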
2. Visualizing Results
Once your data is processed, you can easily generate comprehensive plots using the files in the data/ folder.
Generate All Standard Plots
To generate volcano plots, Manhattan plots, upstream clustering, and gene region mappings all at once:
DiffMethylTools all_plots \
--data_file data/position_based.csv --data_has_header \
--window_data_file data/generate_DMR_0.csv --window_data_has_header \
--data_separator ',' \
--window_data_separator ',' \
--gene_file data/map_positions_to_genes_genes.csv --gene_has_header \
--ccre_file data/map_positions_to_genes_CCRE.csv --ccre_has_header \
--ref_folder hg38
Annotation Pie Charts
Region-based DMR annotation pie chart:
DiffMethylTools match_region_annotation \
--regions_df_file data/generate_DMR_0.csv \
--regions_df_has_header \
--ref_folder hg38
Feature-based DMR annotation pie chart:
DiffMethylTools match_region_annotation \
--regions_df_file data/generate_DMR_0.csv \
--regions_df_has_header \
--annotation_or_region annotation \
--ref_folder hg38
Plot Specific Methylation Curves
To plot the DMRs within a specific genomic window (e.g., chr1 between positions 3,664,000 and 3,668,000):
DiffMethylTools plot_methylation_curve \
--region_data_file data/generate_DMR_0.csv --region_data_has_header \
--position_data_file data/position_based.csv --position_data_has_header \
--chr_filter chr1 \
--start_filter 3664000 \
--end_filter 3668000 \
--ref_folder hg38
(Note: omitting the --chr_filter, --start_filter, and --end_filter options generates plots for all DMRs.)