Abstract: | Cell-free DNA (cfDNA) fragmentation patterns contain important molecular information linked to tissues of origin. We explored the possibility of using fragmentation patterns to predict cytosine-phosphate-guanine (CpG) methylation of cfDNA, obviating the use of bisulfite treatment and associated risks of DNA degradation. This study investigated the cfDNA cleavage profile surrounding a CpG (i.e., within an 11-nucleotide [nt] window) to analyze cfDNA methylation. The cfDNA cleavage proportion across positions within the window appeared nonrandom and exhibited correlation with methylation status. The mean cleavage proportion was ∼twofold higher at the cytosine of methylated CpGs than unmethylated ones in healthy controls. In contrast, the mean cleavage proportion rapidly decreased at the 1-nt position immediately preceding methylated CpGs. Such differential cleavages resulted in a characteristic change in relative presentations of CGN and NCG motifs at 5′ ends, where N represented any nucleotide. CGN/NCG motif ratios were correlated with methylation levels at tissue-specific methylated CpGs (e.g., placenta or liver) (Pearson’s absolute r > 0.86). cfDNA cleavage profiles were thus informative for cfDNA methylation and tissue-of-origin analyses. Using CG-containing end motifs, we achieved an area under a receiver operating characteristic curve (AUC) of 0.98 in differentiating patients with and without hepatocellular carcinoma and enhanced the positive predictive value of nasopharyngeal carcinoma screening (from 19.6 to 26.8%). Furthermore, we elucidated the feasibility of using cfDNA cleavage patterns to deduce CpG methylation at single CpG resolution using a deep learning algorithm and achieved an AUC of 0.93. 
FRAGmentomics-based Methylation Analysis (FRAGMA) presents many possibilities for noninvasive prenatal, cancer, and organ transplantation assessment.
Fragmentation patterns of cell-free DNA (cfDNA) molecules contain a wealth of molecular information related to their tissues of origin (1). For instance, compared with the background DNA molecules that are mainly derived from the hematopoietic system (2, 3), size shortening of fetal and tumoral DNA molecules occurs in the plasma DNA of pregnant women and cancer patients, respectively (4–6). In addition, a series of 10-bp periodicities were present in fetal and tumoral DNA molecules below 146 bp, with a relative reduction in the major peak at 166 bp (1). Such characteristic size profiles suggest that the fragmentation of cfDNA may be associated with nucleosome structures (5, 7). Many important characteristics pertaining to cfDNA fragmentation have been unveiled recently, such as nucleosome footprints (8, 9), fragment end motifs (10), preferred ends (7, 11), and jagged ends (12), which are examples of fragmentomic markers (1).
cfDNA fragmentomics is an emergent and actively pursued area, with wide-ranging biological and clinical implications. It has been reported that the use of fragmentation patterns of cfDNA could inform the expression status of genes (13, 14). Using mouse models, DNA nucleases (e.g., DNASE1L3) were found to play important roles in the generation of plasma DNA molecules (15, 16). Fragmentomic features, such as cfDNA end motifs and jagged ends, were further demonstrated to be useful for monitoring DNA nuclease activities, providing biomarkers for autoimmune diseases (e.g., systemic lupus erythematosus) (17, 18). In addition, the deficiencies of nuclease activities in a mouse model resulted in altered DNA methylation profiles of plasma DNA molecules (19). 
However, how cfDNA fragmentation patterns interplay with DNA methylation in human individuals under different pathophysiological conditions, such as pregnancy and oncogenesis, and in healthy patients without nuclease deficiency, is unknown. It is also not known whether fragmentomic features can be used to deduce cfDNA methylation status.
A widely employed way to assess DNA methylation is through bisulfite sequencing (20). A key limitation of this approach is the severe degradation of DNA molecules caused by the bisulfite treatment (21), which greatly increases the sampling variation when analyzing rare target molecules (e.g., tumoral cfDNA at early stages of cancer). Many efforts have been made toward overcoming this issue. For example, Vaisvila et al. developed enzymatic methyl sequencing, in which DNA molecules were treated using tet methylcytosine dioxygenase 2 and T4 phage β-glucosyltransferase, followed by the apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) treatment. Cytosine conversion based on enzymatic processes was reported to be much less destructive (22). Recently, researchers developed approaches making use of third-generation sequencing technologies such as single-molecule real-time sequencing (Pacific Biosciences) (23) and nanopore sequencing (24) to analyze cytosine-phosphate-guanine (CpG) methylation patterns in native DNA molecules, theoretically overcoming the above-mentioned limitation. However, compared with second-generation sequencing (also called next-generation sequencing [NGS]) technologies, the throughput of third-generation sequencing technologies is generally lower and the sequencing cost per nucleotide (nt) is much higher, thus restricting their immediate application in clinical settings. Here, we explore the feasibility of enabling the assessment of DNA methylation using fragmentomic characteristics of cfDNA molecules deduced from NGS results without the use of bisulfite or enzymatic treatment. 
If successful, such an approach could leverage the high throughput of NGS while obviating the use of chemical/enzymatic conversion and could potentially be readily integrated into currently used NGS-based platforms for cfDNA analysis.
In this study, we utilize the fragmentation patterns proximal to a CpG site for deducing its methylation status. The fragmentation pattern is depicted by the frequency of cfDNA fragment ends at each position within a certain nt range relative to a CpG of interest, termed a cleavage profile. Such a cleavage profile varies according to the methylation status of the CpG site of interest, providing the basis for methylation analysis by using fragmentomic features. We further correlated two types of end motifs (CGN and NCG, where N represents any of A, C, G, or T), which result from differential cutting in the measurement window, with DNA methylation, in an attempt to construct a simplified approach for methylation analysis. Modeling CpG methylation using cfDNA fragmentation may facilitate noninvasive prenatal testing, cancer detection, and tissue-of-origin analysis. Furthermore, we explore the feasibility of using deep learning to deduce the methylation status at single CpG resolution through the cleavage profile. We refer to this FRAGmentomics-based Methylation Analysis as FRAGMA in this study.
Schematic for FRAGMA of cfDNA molecules. cfDNA molecules were sequenced by massively parallel sequencing and aligned to the human reference genome. The cleavage proportion within an 11-nt window (the cleavage measurement window) was used to measure the cutting preference of cfDNA molecules. The patterns of cleavage proportion within a window (the cleavage profile) depended on the methylation status of one or more CpG sites associated with that window. For example, a methylated CpG site might confer a higher probability of cfDNA cutting at the cytosine in the CpG context, but an unmethylated site might not. 
Such methylation-dependent differential fragmentation within a cleavage measurement window resulted in a change in the CGN/NCG motif ratio. Thus, the CGN/NCG motif ratio provided a simplified readout of CpG methylation, allowing tissue-of-origin analysis of cfDNA and cancer detection. Furthermore, the large number of cleavage profiles derived from cfDNA molecules might provide an opportunity to train a deep learning model for methylation prediction at single CpG resolution. |
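The two quantities described above, the cleavage profile over an 11-nt window and the CGN/NCG motif ratio, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that the cleavage proportion at a position is the fraction of fragments covering that position whose 5′ end falls exactly there, that the window spans 5 nt on either side of the CpG cytosine, and that only Watson-strand fragments are considered. The function names (`cleavage_profile`, `cgn_ncg_ratio`) and the toy coordinates and motifs are hypothetical and chosen purely for illustration.

```python
def cleavage_profile(fragments, cpg_pos, flank=5):
    """Return cleavage proportions for positions cpg_pos-flank .. cpg_pos+flank.

    fragments: list of (start, end) half-open intervals for Watson-strand
               fragments, so `start` is the 5' end position (assumption).
    cpg_pos:   coordinate of the cytosine of the CpG of interest.
    Assumed definition: ends at a position / fragments covering that position.
    """
    profile = []
    for pos in range(cpg_pos - flank, cpg_pos + flank + 1):
        ends_here = sum(1 for start, _ in fragments if start == pos)
        covering = sum(1 for start, end in fragments if start <= pos < end)
        profile.append(ends_here / covering if covering else 0.0)
    return profile


def cgn_ncg_ratio(end_motifs):
    """Ratio of 5' 3-mer end motifs of the form CGN to those of the form NCG."""
    cgn = sum(1 for m in end_motifs if m[:2] == "CG")   # cut right before the C
    ncg = sum(1 for m in end_motifs if m[1:3] == "CG")  # cut 1 nt upstream of the C
    return cgn / ncg if ncg else float("nan")


# Toy data (made up for illustration): five fragments near a CpG at position 100.
toy_fragments = [(100, 266), (100, 250), (99, 260), (95, 261), (90, 255)]
toy_profile = cleavage_profile(toy_fragments, cpg_pos=100)

# Toy 5' end motifs (made up): three CGN, two NCG, one unrelated motif.
toy_motifs = ["CGA", "CGT", "ACG", "CGG", "TCG", "AAT"]
toy_ratio = cgn_ncg_ratio(toy_motifs)
```

In this sketch, a higher value at the center of `toy_profile` relative to the position immediately before it corresponds to preferential cutting at the CpG cytosine, which is the pattern the text associates with methylated CpGs. The same preference appears as a larger CGN/NCG ratio when end motifs are pooled across many sites.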