Table of Contents
1. Data input #
Example files are provided in the example directory:
test.bed
, test.bim
, test.fam
, and test_simu_pheno.txt
.
Note: The example dataset is intentionally small and should only be used for testing program functionality. To obtain stable and reliable results, analyses should be performed on large-scale datasets.
PLINK Binary File #
PLINK binary files consist of three components:
*.bed
(binary genotype file): Stores genotype data in a compact binary format.*.bim
(variant information file): Contains SNP identifiers, chromosome positions, and allele information.*.fam
(sample information file): Includes individual IDs, family structure, sex, and phenotype data.
In fastGxE, missing genotype values for the focus SNP are imputed using the mean genotype. However, it is recommended to preprocess missing genotypes with Beagle or other imputation software for improved accuracy.
Data File Format #
The data file should contain individual IDs, confounding covariates, interacting environmental covariates, and phenotypic values. It must follow the structure below:
- Header Line: A header row is required.
- Delimiter: Data should be separated by blanks or tabs.
- First Column: The first column must be individual IDs.
- Matching Individuals:
- Only individuals present in both the PLINK binary file and the data file will be used in the analysis.
- Individuals missing in either the PLINK binary file or the data file will be automatically removed by fastGxE during the analysis.
An example data file
iid | pheno | E1 | E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 | E11 | E12 | E13 | E14 | E15 | E16 | E17 | E18 | E19 | E20 | E21 | E22 | E23 | E24 | E25 | E26 | E27 | E28 | E29 | E30 | E31 | E32 | E33 | E34 | E35 | E36 | E37 | E38 | E39 | E40 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1110005 | -0.75286 | 0.21092 | 0.648552 | 0.468537 | 0.285131 | -0.15092 | -0.1555 | -0.56958 | 0.07954 | -0.1891 | 0.338504 | -0.07503 | 0.744465 | -1.2262 | -0.77347 | -0.9495 | 0.204434 | 0.287995 | -0.36906 | 0.023615 | 0.100999 | 0.913586 | 0.281957 | -0.26894 | 1.076054 | 0.631956 | -0.25934 | -0.12063 | 0.293751 | -0.14226 | 0.269535 | -0.69097 | 0.12688 | 1.172146 | -0.77402 | -1.75724 | -2.34637 | -0.57211 | 1.166077 | -0.4581 | -0.55331 |
1110006 | -0.62861 | 0.000148 | -0.12916 | -1.77284 | -0.31648 | 0.461651 | 0.281532 | -0.74519 | -1.43904 | -0.94778 | -0.1699 | 0.276132 | 0.292023 | -0.52048 | 0.963399 | 0.311522 | -1.27339 | 0.302758 | -0.91646 | 1.055273 | 0.500981 | -0.87986 | -0.48948 | 0.78681 | 1.010843 | -1.76933 | 1.697091 | -0.32637 | -0.83483 | -1.82932 | -1.17107 | 0.733051 | -0.82921 | -0.22705 | 0.509752 | -0.06756 | 1.728553 | -0.03009 | -1.43085 | 0.59895 | 0.836599 |
1110008 | 1.132841 | -1.14961 | -0.29059 | 0.446468 | -0.15161 | 0.222543 | -0.33517 | 0.81958 | 0.502829 | -1.87353 | -0.82221 | 0.332078 | 1.77179 | -0.12034 | -0.81934 | 0.500914 | 0.265973 | -0.81532 | 0.794444 | 1.517311 | -1.1751 | 0.228761 | -0.07199 | 0.446747 | 0.091397 | -1.82258 | 0.542994 | -0.16897 | 0.352514 | -1.63777 | -0.10023 | 1.466236 | 0.746707 | 0.59923 | 0.515903 | 0.473775 | 1.697436 | 1.165627 | 0.500921 | 1.237719 | -0.10859 |
1110009 | 0.773396 | -0.90416 | -0.00513 | -0.045 | -0.90065 | -0.46537 | 0.752511 | 0.355544 | 0.359747 | 1.585845 | -0.38118 | -0.50373 | 0.253581 | -0.44434 | -0.5287 | -0.15618 | 1.483534 | 1.320471 | 0.593645 | 0.506708 | -0.65951 | 0.033098 | -1.52481 | 1.100987 | 0.33127 | -0.45424 | -0.18036 | 1.700584 | 1.22 | -0.86434 | -0.15836 | 0.332595 | 1.73738 | 1.11772 | -0.92361 | -0.39209 | -3.61622 | 0.128307 | -0.20065 | 0.291977 | -1.28622 |
1110011 | 0.209078 | -0.87784 | -2.24717 | -0.33057 | -0.87823 | 1.68817 | 0.193299 | 0.753634 | -0.22589 | -2.1622 | 1.294862 | 0.898332 | -0.33214 | -1.23525 | -1.06617 | -1.29512 | 1.44334 | -1.03666 | -0.44052 | -1.22453 | -0.03294 | 1.713709 | -0.53588 | -0.11923 | -0.68673 | 0.378322 | -1.08003 | 0.897898 | -2.00473 | 1.621039 | -1.26384 | 0.352975 | -0.45193 | -0.36424 | -0.07237 | -0.4594 | 0.971471 | -0.84527 | 0.207248 | 2.920991 | -0.81485 |
1110023 | -0.13773 | 0.99387 | 0.577044 | 0.074119 | -0.34685 | -1.34186 | 0.63063 | 0.828125 | 1.127962 | 0.679971 | 0.242218 | -0.06751 | -0.13105 | -1.01052 | -0.86894 | -1.46575 | 0.20377 | -0.93781 | -1.36779 | 0.476924 | -1.15502 | -0.0056 | 0.40834 | -1.19064 | -0.12594 | -0.13059 | -0.50763 | 0.904232 | -1.18535 | -0.87793 | -1.05899 | -0.01426 | 1.457778 | 1.412801 | 0.00192 | 0.074879 | -0.10291 | 1.164542 | 0.203062 | 0.478138 | 1.223259 |
1110033 | -1.16732 | 0.14276 | -0.4078 | 0.215731 | 1.581048 | 0.077753 | -0.12957 | -0.91409 | -1.17939 | -0.74846 | 1.230236 | -0.41687 | 0.228574 | 0.667259 | 1.042392 | 0.16277 | -0.22133 | 1.092244 | -0.20791 | 0.527073 | 0.03231 | -0.46105 | 0.084871 | -1.07208 | 0.936474 | 0.117742 | 0.472216 | 0.086622 | 0.072974 | 0.02184 | -0.47137 | 0.089335 | -1.54968 | -0.10841 | 1.040897 | -0.19092 | -0.95165 | 0.342614 | 0.174287 | 0.104902 | 0.20719 |
1110035 | 1.52892 | 0.380602 | -0.23611 | -0.25621 | 1.251567 | -1.07136 | 1.600939 | -1.5843 | 0.54906 | 1.703418 | -1.6009 | 0.126096 | -0.13079 | -0.30247 | -1.79334 | 0.608061 | -1.37453 | -0.59196 | -1.2016 | 0.451544 | 0.170108 | -1.18104 | 0.517898 | -0.84502 | 1.49383 | 0.509155 | -0.28951 | 0.18437 | -0.27631 | 0.032461 | 2.121288 | 1.155691 | -0.84489 | 0.037005 | -0.19401 | -0.7891 | -1.29008 | -0.49291 | -0.82919 | -0.94171 | 0.200087 |
1110043 | -1.63005 | 1.625271 | 1.558561 | -0.36283 | -1.31468 | -0.38357 | 0.820452 | 1.067546 | 1.625511 | 1.289687 | -1.14444 | 1.275622 | 1.990491 | -0.69936 | -0.08353 | -0.63966 | 2.483601 | -1.80873 | 0.815655 | 0.134742 | 1.330531 | -0.32461 | -0.33861 | 0.008995 | -0.55535 | 1.039129 | -0.01007 | -0.38419 | 0.107745 | -2.12606 | -0.17434 | 1.990626 | 4.504256 | 1.592914 | 0.327584 | -0.29862 | -1.24944 | 1.32345 | -0.89109 | 1.247789 | 2.851869 |
1110044 | 1.27608 | -1.35986 | -2.07641 | -1.65412 | 0.267007 | 0.046312 | -1.33002 | -0.13738 | -0.05922 | -0.651 | -0.0998 | -0.3337 | 0.188556 | -0.72331 | -0.84339 | 0.091237 | -0.23129 | 0.493803 | -1.79401 | -0.75501 | -0.30332 | -0.32378 | 0.658686 | 1.003795 | -0.26546 | 1.366845 | 1.35157 | -0.17636 | 0.769366 | 0.844615 | -1.59522 | 0.466167 | -0.83507 | -0.00921 | -0.15057 | 0.169063 | 0.677308 | 0.109467 | 0.440098 | 0.611541 | -1.35669 |
1110063 | -0.40119 | 1.028213 | 0.746369 | 0.000715 | 0.037617 | 0.65775 | 0.234105 | 0.660881 | -0.09425 | -0.22022 | 0.299793 | -0.31184 | -1.48914 | -0.82789 | -0.88923 | -1.36938 | 0.712816 | 0.124924 | -0.44722 | -1.54906 | 2.592713 | -0.25346 | -1.10651 | 0.757067 | -0.87045 | -1.89933 | 0.686628 | -0.25726 | -1.73575 | -0.51228 | 1.107235 | 1.016678 | -0.60243 | 0.017886 | -2.17325 | -0.19513 | -0.41595 | -1.96924 | -1.52661 | -0.19715 | -0.65824 |
Associated options in fastGxE
**--missing-data**NA Na na NAN NaN nan -NAN -NaN -nan N/A n/a
Any symbol listed above will be treated as a missing value. Symbols are separated by spaces, and users can add custom missing values to the list. If a variable is included in the model and contains a missing value in this field, the entire row will be discarded.
2. Calculate the Genomic Relationship Matrix (GRM) #
Single-Step Calculation #
For standard datasets, compute the GRM in a single step:
fastgxe --make-grm --bfile test --code-type 2 --out test
Parallel Computation for Large Datasets (e.g., Biobank Data) #
For large-scale datasets, partition the GRM into 100 parts, compute them in parallel, and then merge the results:
fastgxe --make-grm --bfile test --code-type 2 --npart 100 1 --out test
fastgxe --make-grm --bfile test --code-type 2 --npart 100 2 --out test
...
fastgxe --make-grm --bfile test --code-type 2 --npart 100 100 --out test
Once all parts are computed, merge them into a single GRM file:
fastgxe --process-grm --merge --grm test.agrm --npart 100
Sample Clustering Based on Relatedness #
To group samples such that any pair with an estimated relatedness greater than 0.05 belongs to the same subgroup:
fastgxe --process-grm --group --grm test.agrm --cut-value 0.05 --out test.agrm
3. Genome-Wide GxE Analysis #
Using the previously generated sparse GRM, along with a PLINK binary file and a data file, perform a GxE analysis as follows:
fastgxe --test-gxe --grm test --bfile test --data test_simu_pheno.txt --trait pheno \
--env-int E1:E40 --out test_gxe
- The column
pheno
represents the trait values. - Columns from
E1
toE40
are considered interacting environmental factors.
Parallel Computation for Large Datasets #
For large datasets, partition the SNPs into 10 parts and compute them in parallel. The results can then be merged using a custom Python or R script.
fastgxe --test-gxe --split-task 10 1 --grm test --bfile test --data test_simu_pheno.txt --trait pheno \
--env-int E1:E40 --out test_gxe
fastgxe --test-gxe --split-task 10 2 --grm test --bfile test --data test_simu_pheno.txt --trait pheno \
--env-int E1:E40 --out test_gxe
...
fastgxe --test-gxe --split-task 10 10 --grm test --bfile test --data test_simu_pheno.txt --trait pheno \
--env-int E1:E40 --out test_gxe
Once all tasks are completed, the results can be merged using a custom Python or R script.
4. GxE Heritability Estimation Using the Method of Moments #
To obtain an unbiased estimate of GxE heritability, use the method of moments:
Note: we need large sample size to get stable results. The example data have too small size.
fastgxe --mom --bfile test --data test_simu_pheno.txt --trait pheno \
--env-int E1:E40 --out test_mom
5. Identifying Environmental Factors Driving GxE #
Once GxE genomic loci are identified, the environmental factors driving the GxE effect of the lead SNP can be determined using mmSuSiE. mmSuSiE is written in python can be found at https://github.com/chaoning/mmsusie
First reformat the genomic relationship matrix with fastgxe
# this step will produce the
fastgxe --process-grm --reformat --grm test.agrm --out-fmt 1 --out test.agrm
from mmsusie import MMSuSiE
model = MMSuSiE()
# Define input files
bed_file = "test" # PLINK binary genotype file prefix (.bed/.bim/.fam)
grm_file = "test" # GRM (genetic relationship matrix) file prefix
pheno_file = "test_simu_pheno.txt" # Phenotype file with environmental covariates and trait
# Define a list of interacting environmental variables: E1, E2, ..., E40
env_int = [f"E{i}" for i in range(1, 41)]
# Perform LD pruning to identify leading SNPs for GxE signals
assoc_file = "test_gxe.res" # Association results from fastGxE or similar method
df_leading = model.ld_pure(assoc_file, bed_file, ld_r2=0.1, snp="SNP", p="p_gxe", p_cutoff=5e-8)
print(df_leading) # Print the pruned set of lead SNPs
# Specify parameters for mmSuSiE fine-mapping
snp_id = "rs550011" # Lead SNP to analyze
trait = "pheno" # Target phenotype column name
varcom_file = "test_gxe.var" # Variance components file estimated from fastGxE --test-gxe
out_file = "test_mmsusie" # Output file prefix for mmSuSiE results
# Run the mmSuSiE model to identify environmental factors driving the GxE effect
res_dct = model.run(pheno_file, trait, env_int, grm_file, bed_file, snp_id, varcom_file, out_file)
# Print the credible sets (CS) identified by mmSuSiE (index start from 0)
print(res_dct['cs'])
# Extract posterior inclusion probabilities (PIPs)
df_pip = res_dct["pip"]
print(df_pip)
# Display environmental factors with PIP > 0.5
print(df_pip[df_pip["pip"] > 0.5])