Manhattan plot

A Manhattan plot is a type of scatter plot, commonly used in genome-wide association studies (GWAS), to display data with a large number of non-zero-amplitude data points. It takes its name from its resemblance to the Manhattan skyline. In GWAS Manhattan plots, genomic coordinates are displayed along the X-axis, and the negative logarithm of the association P-value for each single nucleotide polymorphism (SNP) is displayed on the Y-axis. Therefore, the stronger the association for a SNP, the larger its Y-axis value.
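As a minimal sketch of the Y-axis transform described above, the following Python snippet converts P-values into negative base-10 logarithms. The gene names and P-values are hypothetical and used only for illustration; this is not the code used by the website.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p) for a P-value in the interval (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


# Hypothetical P-values, for illustration only.
p_values = {"GENE_A": 0.5, "GENE_B": 0.01, "GENE_C": 1e-8}

# Smaller P-values give taller points (or lines) on the plot.
heights = {gene: neg_log10(p) for gene, p in p_values.items()}
```

A smaller P-value always maps to a larger height, which is why strongly associated markers stand out like tall buildings.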

To improve the usability and data transfer speed of our web service and to serve gene-targeting research as well as possible, we made several modifications to the original Manhattan plot:

→ For gene-targeting research, we plot the Manhattan plot for genes rather than for SNPs, which means the P-values are calculated per gene between two groups of samples.

→ To improve the transfer speed, we use lines rather than the dots of the original Manhattan plot.

→ We expanded the usage of the Manhattan plot from mRNA expression values to copy number variations.

→ For more options, you can plot the Manhattan plot by chromosome as well as for the whole genome.

→ Besides the regular Manhattan plot, we provide another option that shows more information: the Directional Manhattan plot. If the median value of a gene in group 1 is smaller than that in group 2, the corresponding line points down from the baseline (normally the zero line).
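The direction rule of the Directional Manhattan plot can be sketched in Python as follows. This is an illustrative example based on one reading of the rule above, not the website's actual implementation; the function name `directional_height` and the sample values are hypothetical.

```python
from statistics import median


def directional_height(height, group1_values, group2_values):
    """Return a signed line height for one gene.

    `height` is the unsigned Y value already computed from the P-value.
    The line points down (negative) when the median of group 1 is
    smaller than the median of group 2; otherwise it points up.
    """
    if median(group1_values) < median(group2_values):
        return -height
    return height


# Hypothetical values for a single gene in two groups.
down = directional_height(20.0, [1.6, 1.8, 1.7], [2.0, 2.1, 1.9])
up = directional_height(20.0, [2.4, 2.6, 2.5], [2.0, 2.1, 1.9])
```

Here `down` is negative because the group 1 median (1.7) is below the group 2 median (2.0), while `up` stays positive.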

There are three ways to reach the Manhattan plot page:

→ Through the Navigation bar on the Home page, select “Manhattan plot” under “Data Analysis”;

→ Go to the “Data Analysis” page, then to the “Data visualization” area, and select “Manhattan plot”;

→ Through the link in the “Link area” on the Home page, go to the “Data Analysis” page, then to the “Data visualization” area, and select “Manhattan plot”.

The “Manhattan plot” page has five areas:

→ Navigation bar: You can switch to other pages through this navigation bar.

→ Setting area: You can specify genes, cancer types, data types, cutoff values and other parameter details here.

→ Plotting area: The Manhattan plot will be plotted in this area.

→ Figure Downloading and DIY area: You can download the Manhattan plot in a chosen format and size. You can also customize the line colors and other settings through the option buttons in this area.

→ Link area: Necessary links are available for you to switch to other pages or websites.

Note: Quick help is available by hovering your mouse over the small question marks beside certain options on this page.

1. It reminds you which kind of plot you are working on.

2. You can select mRNA expression or copy number variation. Originally, the Manhattan plot was used to visualize genome-wide mRNA expression relationships, but we have expanded it to visualize copy number variation, methylation, and so on.

3. In the TCGA/GDC dataset, non-malignant and tumor samples are not always both available for every cancer type, and the available sample types vary by data type even for the same cancer type. For example, for acute myeloid leukemia (LAML), no non-malignant samples are available for mRNA expression values, but both non-malignant and tumor samples are available for copy number variation data. Markers are added before the cancer names to show which sample types are available for each cancer type.

Note: Gene copy numbers in non-malignant samples are expected to be 2, but for various reasons the measured values differ slightly from 2. To allow a good comparison, all non-malignant samples are combined into a super-control. You can choose it by selecting “[all-non-malignant]” at the bottom of the cancer list.

⚠: without non-malignant, which means only tumor samples of this cancer type are available for the data type specified in (4) and (5).

❌: not available, which means neither tumor samples nor non-malignant samples of this cancer type are available for the data type specified in (4) and (5).

4. You can specify the first group here by selecting the cancer type from the drop-down list and the sample type by selecting one of the radio buttons.

5. You can specify the second group here by selecting the cancer type from the drop-down list and the sample type by selecting one of the radio buttons.

Note: The t-test requires samples from two different groups. If you select the same cancer type for the first and second groups, please make sure their sample types are different. Otherwise, an error message will be displayed in the plotting area and no Manhattan plot will be created.

6. You can specify a chromosome of interest here. You can also choose to create a Manhattan plot for chromosomes 1-22. However, because of the large amount of data to transfer and the t-test calculations involved (more than 20,000 genes and dozens or hundreds of samples), this may take several minutes, so please be patient!

7. You can enter gene symbols of interest here. They will then be highlighted in the Manhattan plot in different colors to make them easier to compare. To enter more than one gene symbol, separate the symbols with a comma and a space. Only HUGO (Human Genome Organization) symbols are accepted. For example: EGFR, KRAS, TP63….

Note: Gene symbols are not case-sensitive. For example, kRAS, kras, KRas, and KRAS are all treated as the same gene.
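The input rules for gene symbols above (comma-separated, case-insensitive) can be illustrated with a short Python sketch. The function name `parse_gene_symbols` is hypothetical, and upper-casing is only an assumed way of comparing symbols regardless of case; the website may handle this differently.

```python
def parse_gene_symbols(text):
    """Split a comma-separated gene list and normalize the case.

    Symbols are upper-cased so that kRAS, kras, KRas and KRAS all map
    to the same key. Duplicates are dropped, keeping the first
    occurrence, so each gene is highlighted only once.
    """
    symbols = []
    for token in text.split(","):
        symbol = token.strip().upper()
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


genes = parse_gene_symbols("EGFR, kras, KRas, TP63")
```

With this input, `genes` contains three entries, because the two spellings of KRAS collapse into one.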

8. In the Manhattan plot, the color of each line is determined by the mean or median values of the corresponding gene in the two specified groups. You can choose 'mean' or 'median' from this drop-down list.

9. There are two other options here:

→ Directional Manhattan: If you select this option, a directional Manhattan plot will be created in the plotting area. In some cases, beyond knowing that the samples of the two groups differ, it is also useful to know which group has the larger values. Therefore, we compare the median values of the samples in each group. If the median value of a gene in the first group is smaller than that in the second group, the corresponding line points down from the baseline.

→ Log2: You can apply a log2 transformation by checking this option. The transformation is applied to the data before the Manhattan plot is created (for mRNA expression values, it is log2; for CNV (copy number variation) values, it is log2(CNV/2)).
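The two log2 transformations mentioned for the Log2 option can be written as a small Python sketch. The function name and the data type labels "mrna" and "cnv" are hypothetical, and how the website treats zero or negative values is not described here, so this sketch simply lets them raise an error.

```python
import math


def log2_transform(value, data_type):
    """Apply the transformation described for the Log2 option.

    For copy number values the result is log2(CNV / 2), so a normal
    copy number of 2 maps to 0. For mRNA expression values it is a
    plain log2. Non-positive inputs raise ValueError from math.log2.
    """
    if data_type == "cnv":
        return math.log2(value / 2)
    if data_type == "mrna":
        return math.log2(value)
    raise ValueError("unknown data type: " + data_type)


normal_copy = log2_transform(2, "cnv")    # two copies, no change
gained_copy = log2_transform(4, "cnv")    # four copies, a gain
expression = log2_transform(8, "mrna")
```

Dividing by 2 first centers the copy number data on zero, so gains become positive and losses become negative.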

10. You can enter a P-value cutoff to see how many genes differ significantly on each chromosome arm. A line at -10log10(cutoff) will be plotted to show the cutoff on the Manhattan plot. For the Directional Manhattan plot, two lines will be plotted, at -10log10(cutoff) and 10log10(cutoff).

After setting all the necessary options, click the “GO” button at the bottom of this area, and the Manhattan plot will be created in the plotting area. Because of the large data size and the time needed for the t-test, it may take seconds or minutes to run the t-test and create the Manhattan plot. The processing time varies with your internet connection speed and the configuration of your computer.
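To make the cutoff line positions concrete, here is a small Python sketch that evaluates -10log10(cutoff) as written above. The function name `cutoff_lines` is hypothetical, and the sketch only reproduces the stated formula, not the website's plotting code.

```python
import math


def cutoff_lines(cutoff, directional=False):
    """Return the Y positions of the cutoff line(s).

    A regular plot gets one line at -10*log10(cutoff). A directional
    plot gets a second, mirrored line at 10*log10(cutoff).
    """
    if not 0.0 < cutoff <= 1.0:
        raise ValueError("cutoff must be in (0, 1]")
    level = -10 * math.log10(cutoff)
    if directional:
        return [level, -level]
    return [level]


regular = cutoff_lines(0.05)
mirrored = cutoff_lines(0.05, directional=True)
```

For a cutoff of 0.05, the regular line sits at about 13.01, and the directional plot adds a mirrored line at about -13.01.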

Manhattan plot figures will be shown in this area.

A toolbar will show up at the top right of this plotting area when a Manhattan plot is created.

1. Save as Image: Click it to switch to an image-saving page, then right-click to save the image. You can also specify the image format and size by selecting the options in the Figure Downloading and DIY area.

2. Data table: If you want to download the sample data as a table, click this button. A table containing all the data will then show up in the plotting area. You can select and copy the whole table or any part of it into a Word or Excel file by selecting it and right-clicking as you usually do. You can scroll down to see the information for other samples. You can also click the “close” button at the bottom left of this page to close the table and go back to the default page with the plotting area.

For your convenience, the gene symbol and other details of each gene will show up when you hover your mouse over the corresponding line.

For example: in the above figure, after hovering the mouse over a line, a tooltip showed up like this:

From left to right are the gene symbol and the p-value of this gene in the corresponding two groups. Therefore, in this example, the gene symbol is FUDC3B, and the p-value of its copy number variations in lung adenocarcinoma and lung squamous cell carcinoma is 1.36e-95, which means it has significantly different copy number variations in these two groups.

You can specify the image format (png or jpg) and the size/dimensions of the image to download.

You can modify colors and Y-Limits of this figure.

Example 1: Plotting a Manhattan plot of copy numbers on chromosome 3 in lung adenocarcinoma tumor vs non-malignant samples, with 0.05 as the p-value cutoff.

From the figure above, we can see that the copy number variations of all the genes on the P-arm differ significantly between lung adenocarcinoma tumor samples and non-malignant samples. In contrast, only 60.43% of the genes on the Q-arm are significantly different.
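A per-arm percentage like the one quoted above can be computed with a sketch such as the following. The p-values are made up for illustration, the function name is hypothetical, and treating "p < cutoff" (rather than "p <= cutoff") as significant is an assumption.

```python
def percent_significant(p_values, cutoff=0.05):
    """Return the percentage of P-values below the cutoff."""
    if not p_values:
        return 0.0
    significant = sum(1 for p in p_values if p < cutoff)
    return 100.0 * significant / len(p_values)


# Hypothetical per-gene P-values grouped by chromosome arm.
arm_p_values = {
    "p": [1e-6, 0.001, 0.02],
    "q": [0.001, 0.2, 0.03, 0.5],
}
summary = {arm: percent_significant(ps) for arm, ps in arm_p_values.items()}
```

In this made-up data, every gene on the P-arm passes the cutoff, while only half of the genes on the Q-arm do.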

After checking the Directional Manhattan box, the figure will turn out to be:

The lines of all the P-arm genes point down, which means the lung adenocarcinoma tumor samples have a smaller median copy number variation than the corresponding non-malignant samples. From the corresponding Mountain plot, we can see that there is a mild deletion in the P-arm, which confirms the results we observed above.

Example 2: Plotting a Manhattan plot of copy number variations on chromosome 3 of lung adenocarcinoma tumor samples vs lung squamous cell carcinoma tumor samples, with SOX2 and TP63 highlighted.

We can see that two lines are highlighted in a different color. When we hover the mouse over one of them, the corresponding information shows up.

You may want to know which group has the higher median copy number variation, and you may not like the default colors. In that case, you can select the “Directional Manhattan” option and use the Figure DIY options to change the colors. The Setting area and the final plot are shown in the following picture.

  • manhattan.txt
  • Last modified: 2019/07/06 14:20
  • by tongyifan