In IPA (Ingenuity Pathway Analysis), each pathway result is reported together with a p-value. The p-value is a metric used to evaluate whether the observed data is statistically significant, that is, how likely a result at least as extreme as the observed one would be under random conditions.
General Workflow
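IPA reports its pathway p-values from a right-tailed Fisher's exact test on the overlap between the input genes and each pathway. The same question, whether a pathway result could have arisen by chance, can also be answered with a permutation-based approach, which proceeds through the following steps: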
- Data Preprocessing: Input raw data, such as gene expression data or protein expression data, for pathway analysis.
- Pathway Selection: Choose the biological pathways of interest for analysis. These may include signaling pathways or metabolic pathways related to specific diseases, for example.
- Pathway Scoring: Calculate scores for the genes or proteins involved in the selected pathway. This may involve differential expression analysis for gene expression data or analysis of expression variations for protein data, for example.
- Permutation Test: Generate random datasets by shuffling the scores across genes or proteins. Each random dataset keeps the overall distribution of the input scores but breaks any link between a score and a specific gene or protein.
- P-value Calculation: Compute the pathway score for each random dataset to build a random (null) distribution of scores. Then, locate the observed score within that distribution and calculate the p-value. The p-value is the probability of obtaining a score at least as extreme as the observed one under random conditions.
A smaller p-value indicates that a score as extreme as the observed one is less likely to arise under random conditions. Generally, a p-value below 0.05 is considered statistically significant, and stricter analyses use 0.01. In such cases, the observed scores are unlikely to be explained by chance alone, which suggests, but does not prove, a biologically meaningful association.
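The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not IPA's implementation: the function name `pathway_permutation_pvalue`, the dictionary input format, and the choice of the mean score as the pathway statistic are all assumptions made for the example. Drawing a random gene set of the same size is used here as the permutation step, which is equivalent to shuffling scores across genes.

```python
import random

def pathway_permutation_pvalue(scores, pathway_genes, n_permutations=1000, seed=0):
    """Estimate a p-value for the mean score of a pathway's genes.

    scores: dict mapping gene name -> score (hypothetical input format).
    pathway_genes: names of the genes that belong to the pathway.
    """
    rng = random.Random(seed)
    all_genes = list(scores)
    members = [g for g in pathway_genes if g in scores]
    if not members:
        raise ValueError("no pathway genes found in the scored data")
    k = len(members)

    # Observed pathway score: mean score of the pathway's genes.
    observed = sum(scores[g] for g in members) / k

    # Null distribution: mean score of k genes drawn at random.
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_genes, k)
        random_score = sum(scores[g] for g in sample) / k
        if random_score >= observed:
            hits += 1

    # Proportion of random gene sets scoring at least as high as observed.
    return observed, hits / n_permutations


# Toy data: 100 genes whose score equals their index.
toy_scores = {f"g{i}": float(i) for i in range(100)}
observed, p_value = pathway_permutation_pvalue(toy_scores, ["g97", "g98", "g99"])
print(observed, p_value)
```

With only 1000 permutations the smallest non-zero p-value that can be reported is 0.001, so the number of permutations limits the resolution of the result.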
How is Pathway Scoring done?
Pathway scoring assigns each gene in a pathway a numerical value that reflects its importance or contribution.
As a specific example, let’s consider three genes (A, B, C) associated with a pathway, with the following expression levels:
- Gene A expression level: 10
- Gene B expression level: 5
- Gene C expression level: 8
In this case, we can calculate scores for the genes based on their expression levels, using, for example, a 10-point scale. Since Gene A has the highest expression level, it would receive 10 points. Gene B, with the lowest expression level of the three, would receive 5 points. Gene C, with a fairly high expression level, would receive 8 points.
Next, we normalize these scores, for example by dividing each one by the maximum score (10) so that they fall in a range of 0 to 1. In this case, Gene A would be 1.0, Gene B would be 0.5, and Gene C would be 0.8.
By following these steps, we can assign scores to genes within the pathway, allowing for the evaluation of gene importance and their roles within the pathway.
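The worked example above can be reproduced with a short Python sketch. The helper name `normalize_by_max` is hypothetical, and dividing by the maximum is just one possible normalization; it is the one that yields the values quoted in the example.

```python
def normalize_by_max(expression):
    """Scale each gene's expression level by the largest level in the set."""
    top = max(expression.values())
    return {gene: level / top for gene, level in expression.items()}


expression = {"A": 10, "B": 5, "C": 8}
scores = normalize_by_max(expression)
print(scores)  # {'A': 1.0, 'B': 0.5, 'C': 0.8}
```

Note that min-max scaling, which maps the lowest value to 0, would give different numbers (1.0, 0.0, and 0.6), so the choice of normalization should be stated alongside any scores.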
What is a Permutation Test?
In a permutation test, the data is randomly shuffled many times, and the statistic of interest is recalculated after each shuffle. Because shuffling destroys any real association in the data, the recalculated values show what results would be expected under random conditions.
For example, when evaluating whether gene expression data for Gene A and Gene B are associated with Pathway X, a permutation test can be performed using the following steps:
- Data Preprocessing: Extract gene expression data for Gene A and Gene B from patient data.
- Gene Scoring: Assign scores to Gene A and Gene B based on their expression data. The scoring method can be based on gene expression levels or importance, for example.
- Permutation Test Preparation: Prepare the score data for Gene A and Gene B for the permutation test.
- Perform Permutation: Randomly shuffle the score data for Gene A and Gene B and recalculate the statistic of interest. This shows what result would be obtained if any apparent relationship between the scores of Gene A and Gene B were due to chance.
- Repeat Permutation: Repeat the permutations multiple times to generate random datasets. Typically, thousands of permutations are performed.
Through the permutation test, it is possible to evaluate whether there is a statistically significant association between the scores for Gene A and Gene B and Pathway X. This enables a statistical verification of whether specific genes are involved in specific pathways.
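The shuffling steps can be sketched as follows. This is an illustrative example under stated assumptions: each gene is assumed to have one score per patient, the statistic is the absolute difference between the two mean scores, and the function name `permuted_abs_differences` and the sample values are made up for the example.

```python
import random

def permuted_abs_differences(scores_a, scores_b, n_permutations=1000, seed=0):
    """Build a null distribution of |mean(A) - mean(B)| by shuffling labels."""
    rng = random.Random(seed)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    n_b = len(pooled) - n_a

    null = []
    for _ in range(n_permutations):
        # Shuffle the pooled scores, then split them back into two groups
        # of the original sizes, ignoring which gene each score came from.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / n_b
        null.append(abs(mean_a - mean_b))
    return null


# Hypothetical per-patient scores for the two genes.
gene_a = [0.9, 1.0, 0.8, 0.95, 0.85]
gene_b = [0.5, 0.4, 0.6, 0.45, 0.55]
null_distribution = permuted_abs_differences(gene_a, gene_b)
print(len(null_distribution))  # 1000
```

Fixing the seed makes the random shuffles reproducible, which is useful when results need to be rerun and compared.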
How are p-values calculated?
- After performing the permutation test, evaluate the position of the original data within the generated random dataset.
- Calculate statistical measures of the original data (e.g., absolute difference in scores between Gene A and Gene B, correlation coefficient, etc.) for evaluation.
- Calculate the proportion of random datasets from the permutation test that yield statistical measures greater than or equal to those of the original data.
- This proportion becomes the p-value, which estimates the probability of obtaining a statistic at least as extreme as the original one in a random state.
Let’s consider the evaluation of the absolute difference in scores between Gene A and Gene B as an example.
- In the permutation test, randomly shuffle the score data for Gene A and Gene B.
- Calculate the absolute difference in scores between Gene A and Gene B in the original data.
- Count how many of the random datasets generated by the permutation test yield an absolute difference in scores greater than or equal to the one in the original data.
- Divide this count by the number of permutations and calculate the proportion. This becomes the p-value.
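The counting and dividing steps can be written as a small Python function. The name `permutation_p_value` and the toy numbers are hypothetical. The optional `add_one` flag is an addition not described in the text: it applies the common (count + 1) / (n + 1) correction, which prevents a reported p-value of exactly zero.

```python
def permutation_p_value(observed, null_statistics, add_one=False):
    """Proportion of permuted statistics at least as large as the observed one."""
    count = sum(1 for value in null_statistics if value >= observed)
    n = len(null_statistics)
    if add_one:
        # Counts the observed data as one of the possible arrangements.
        return (count + 1) / (n + 1)
    return count / n


# Toy null distribution of absolute differences from 10 permutations.
null = [0.1, 0.4, 0.2, 0.9, 0.3, 0.05, 0.6, 0.25, 0.15, 0.35]
p = permutation_p_value(0.5, null)
print(p)  # 0.2 (2 of the 10 permuted values are >= 0.5)
```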
The p-value estimates the probability of obtaining a result at least as extreme as the original data in a random state. A smaller p-value means such a result is less likely under random conditions. In statistical hypothesis testing, if the p-value is smaller than a pre-defined significance level (usually 0.05 or 0.01), the result is considered statistically significant.
This is just one example of how p-values can be calculated. It turns the permutation test results into a single number that indicates how unusual the original data would be in a random state.
Here’s an example visual representation: