rascal (relative to absolute copy number scaling)

Shiny app for scaling relative copy number data from shallow whole genome sequencing of cancer samples to absolute values and estimating the tumour ploidy and cellularity of the samples.

Several research groups at CRUK CI are using shallow whole genome sequencing as a relatively inexpensive method for obtaining copy number profiles for tumour samples, particularly as libraries from several samples can be multiplexed in a single lane of sequencing.

We principally use QDNAseq to sum reads that align within genomic windows, or bins, typically 30 kb in size, and to correct for GC content and mappability. The resulting values are relative to the average copy number within the sample, given the GC content and mappability of each bin. These relative copy numbers are smoothed and segmented, and they provide useful insight into genomic abnormalities in cancers.

For some research projects it is desirable to obtain absolute copy numbers. Normally this would require deeper whole genome sequencing from which allele fractions of germline SNPs can help determine the clonal architecture of a tumour sample. In the absence of such information, and noting the significant increase in cost for deeper sequencing, we can attempt to fit the relative copy number profiles to absolute copy numbers by evaluating various estimates of ploidy and cellularity.

The approach used in this application is based on concepts introduced in the ACE package developed by Bauke Ylstra's group at Amsterdam UMC. The mathematics underpinning this approach assume a single dominant clone; estimating ploidy and cellularity for heterogeneous tumour samples may prove difficult with this method.
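
The scaling between relative and absolute copy numbers can be written compactly under the single-clone assumption. The sketch below is illustrative only: it assumes that normal cells are diploid and that ploidy is the average absolute copy number of the tumour, and the function names are hypothetical rather than taken from the application's code.

```python
def absolute_to_relative(absolute_cn, ploidy, cellularity):
    """Relative copy number expected for a given absolute copy number."""
    normal_part = 2.0 * (1.0 - cellularity)            # assumption: normal cells are diploid
    average = cellularity * ploidy + normal_part       # average copies per cell in the mixture
    return (cellularity * absolute_cn + normal_part) / average


def relative_to_absolute(relative_cn, ploidy, cellularity):
    """Invert absolute_to_relative for a chosen ploidy and cellularity."""
    normal_part = 2.0 * (1.0 - cellularity)
    average = cellularity * ploidy + normal_part
    return (relative_cn * average - normal_part) / cellularity


# A diploid tumour at 50% cellularity: relative values of 0.75, 1.0 and 1.25
# scale to absolute copy numbers of 1, 2 and 3.
for relative in (0.75, 1.0, 1.25):
    print(relative, relative_to_absolute(relative, ploidy=2.0, cellularity=0.5))
```

A good choice of ploidy and cellularity is one for which most segments scale to values close to whole numbers.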

This application was created using the R Shiny web application framework. It was developed by Matt Eldridge in the Bioinformatics Core in collaboration with James Brenton's laboratory at the Cancer Research UK Cambridge Institute.

User guide


Main page

Upload a tab-delimited, CSV or R data object file (.rds) containing a copy number table (or data frame in the case of an .rds file) by clicking the Browse button on the main page. The following columns are expected:

  • sample (optional)
  • chromosome
  • start
  • end
  • copy_number (optional)
  • segmented

If there is no sample column, a single unnamed sample is assumed.

Each row in the table should correspond to a bin (or window) or a wider continuous copy number segment following segmentation. Values should be relative copy numbers that have not been log2-transformed. Segmented copy number values are required as these are used in fitting to absolute copy numbers. Copy number values for individual bins are optional but can be helpful in assessing how well the segmentation performed and showing the level of noise in the data.
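
As a way of checking a file before uploading it, the following sketch reads a tab-delimited table and verifies the expected columns. It is not the application's own loader: the function names are hypothetical, treating "NA" as a missing value is an assumption, and rejecting negative values is only a heuristic for spotting log2-transformed data.

```python
import csv
import io

REQUIRED_COLUMNS = ("chromosome", "start", "end", "segmented")


def parse_number(value):
    """Return a float, or None for a missing value (absent, empty or NA)."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


def read_copy_number_table(text, delimiter="\t"):
    """Parse a copy number table and check the columns described above."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    rows = []
    for record in reader:
        segmented = parse_number(record["segmented"])
        # Heuristic: relative copy numbers cannot be negative, log2 ratios can.
        if segmented is not None and segmented < 0:
            raise ValueError("negative value found; data may be log2-transformed")
        rows.append({
            "sample": record.get("sample") or "",   # empty name if no sample column
            "chromosome": record["chromosome"],
            "start": int(record["start"]),
            "end": int(record["end"]),
            "copy_number": parse_number(record.get("copy_number")),  # optional column
            "segmented": segmented,
        })
    return rows


example = "chromosome\tstart\tend\tsegmented\n1\t1\t30000\t0.98\n1\t30001\t60000\t1.52\n"
print(read_copy_number_table(example))
```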

Alternatively, upload an R data object file (.rds) containing a QDNAseqCopyNumbers object obtained by processing shallow whole genome sequencing data with QDNAseq.

Select a sample to view from the drop-down list.

Click on a chromosome in the whole genome copy number plot (left-hand side) to display the copy number for that chromosome on the right-hand side. Zoom in to a specific region on a chromosome by clicking and dragging to select the region in the chromosome copy number plot; double-click to zoom out again and view the whole chromosome.

Hover over a location to display the copy number, log2 ratio, fitted absolute copy number and tumour DNA fraction at this locus.

The tumour fraction is the fraction of tumour DNA at that location given the cellularity and absolute copy number. For example, a sample with cellularity 0.5 (50% tumour and 50% normal) would have a tumour fraction of 0.5 if the absolute copy number at that position in both the tumour and the normal is 2, or a fraction of 0.6 if the absolute copy number in the tumour is 3.
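
The worked example above corresponds to the following calculation. This is a minimal sketch in which the function name is hypothetical and the normal copy number is assumed to be 2 unless stated otherwise.

```python
def tumour_fraction(absolute_cn, cellularity, normal_cn=2.0):
    """Fraction of DNA at a locus that comes from tumour cells."""
    tumour_dna = cellularity * absolute_cn
    normal_dna = (1.0 - cellularity) * normal_cn
    total = tumour_dna + normal_dna
    if total == 0:
        return 0.0  # no DNA at this locus from either component
    return tumour_dna / total


print(tumour_fraction(2, 0.5))  # tumour and normal both at copy number 2
print(tumour_fraction(3, 0.5))  # tumour gained one copy
```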

Select a ploidy and cellularity using the selectors at the top of the main page or by clicking on a point within the distance heatmap. The distance heatmap shows how well different choices of ploidy and cellularity scale the relative copy number data to whole numbers on the absolute copy number scale.

Best fit solutions are displayed as points in the heatmap and listed in the table below the heatmap. Select a solution to update the currently selected ploidy and cellularity.
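
The idea behind the heatmap can be sketched as a grid search: scale the segmented values for each ploidy and cellularity pair and measure how far the results lie from whole numbers. The code below is an illustration under the diploid-normal assumption, using a weighted mean absolute difference; the function names, grid ranges and step sizes are assumptions, and the application's own search and filtering of solutions may differ.

```python
def scale(relative_cn, ploidy, cellularity):
    """Scale a relative copy number to an absolute value (normal cells assumed diploid)."""
    normal_part = 2.0 * (1.0 - cellularity)
    return (relative_cn * (cellularity * ploidy + normal_part) - normal_part) / cellularity


def distance(relative, weights, ploidy, cellularity):
    """Weighted mean absolute difference between scaled values and the nearest whole numbers."""
    total = 0.0
    for value, weight in zip(relative, weights):
        absolute = scale(value, ploidy, cellularity)
        total += weight * abs(absolute - round(absolute))
    return total / sum(weights)


def distance_grid(relative, weights, ploidies, cellularities):
    """Distance for every (ploidy, cellularity) pair on the grid."""
    return {(p, c): distance(relative, weights, p, c)
            for p in ploidies for c in cellularities}


def rank_solutions(grid, top=5):
    """Grid points ordered by increasing distance."""
    return sorted(grid.items(), key=lambda item: item[1])[:top]


# Illustrative data: three segments with absolute copy numbers 2, 3 and 4,
# simulated at ploidy 2.75 and cellularity 0.6.
weights = [2.0, 1.0, 1.0]
relative = [(0.6 * a + 0.8) / 2.45 for a in (2, 3, 4)]
ploidies = [i / 100 for i in range(150, 501, 5)]
cellularities = [i / 100 for i in range(10, 101, 5)]
grid = distance_grid(relative, weights, ploidies, cellularities)
print(rank_solutions(grid, top=3))
```

Several grid points can fit almost equally well, which is why supporting data are useful when choosing between solutions.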

Specify the distance function (mean absolute difference or root mean square difference) from the drop-down list. This is applied to segmented copy number values with the following options for which values to use:

  • segments — relative copy number values for each segment weighted by the size of the segment
  • maxima — relative copy number values for each peak in the segmented copy number density plot, each given equal weight; the selected number of the most frequently observed relative copy number states (maxima) are used
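
The two distance functions and the two weighting schemes can be illustrated as follows. This sketch takes values that have already been scaled to the absolute copy number scale; the function names and example numbers are hypothetical, and identifying the maxima in the density plot is not shown.

```python
import math


def mean_absolute_difference(absolute, weights):
    """Weighted mean of the absolute distances to the nearest whole numbers."""
    total = sum(w * abs(a - round(a)) for a, w in zip(absolute, weights))
    return total / sum(weights)


def root_mean_square_difference(absolute, weights):
    """Weighted root mean square of the distances to the nearest whole numbers."""
    total = sum(w * (a - round(a)) ** 2 for a, w in zip(absolute, weights))
    return math.sqrt(total / sum(weights))


scaled = [2.0, 3.1, 0.8]          # scaled copy number values
segment_sizes = [8.0, 1.0, 1.0]   # "segments" option: weight by segment size
equal_weights = [1.0, 1.0, 1.0]   # "maxima" option: each peak has equal weight

print(mean_absolute_difference(scaled, segment_sizes))
print(mean_absolute_difference(scaled, equal_weights))
print(root_mean_square_difference(scaled, equal_weights))
```

The root mean square difference penalises a few large deviations more heavily than the mean absolute difference does.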

The selected ploidy and cellularity can be stored in a cache by clicking on the Store button. Click on the Restore button to select the ploidy and cellularity currently stored in the cache.

The segmented copy number data for the current sample can be saved using the Save button. The saved data include both the relative copy numbers and the absolute values scaled for the currently selected ploidy and cellularity.

The copy number plots can be saved as PDF image files using the PDF buttons.

Ploidy/cellularity cache page

The cached ploidies and cellularities for each sample are displayed on the Ploidy/cellularity cache page. Cached ploidies and cellularities can be saved as a CSV file by clicking on the Save button. Previously saved (or otherwise determined) ploidies and cellularities can be loaded from a tab-delimited or CSV file by clicking the Browse button.

Genes page

A set of genes and their locations can be loaded on the Genes page.

Genes are displayed on the chromosome copy number plot as vertical bars.

Selecting a gene from the table on this page or in the drop-down on the main page will display the copy number plot for the chromosome on which the gene is located. The tumour fraction for the selected gene is also displayed alongside each of the best fit solutions to help decide which solution is most consistent with other supporting data, e.g. the allele fraction for a homozygous variant in that gene from digital PCR or amplicon sequencing. The tumour fraction is only shown where a single absolute copy number is fitted across the entire length of the gene.

Settings page

Various display settings can be adjusted on the Settings page.

Settings are grouped into the following panels: Copy number plots, PDF export, Absolute copy number steps, Filtering options for best fit solutions, Copy number density plot, Distance heat map.

© University of Cambridge