Last updated: 2025-01-07
Checks: 6 1
Knit directory:
Genomic-Selection-for-Drought-Tolerance-Using-Genome-Wide-SNPs-in-Casava/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20221020)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5b6a406. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/allchrAR08.txt
Untracked files:
Untracked: ETA_1_ScaleBayesA.dat
Untracked: ETA_1_parBayesB.dat
Untracked: analysis/figure/
Untracked: analysis/mixed_models.Rmd
Untracked: data/Articles/
Untracked: data/Phenotyping2.xlsx
Untracked: data/Tabela auxiliar.xlsx
Untracked: data/data.rar
Untracked: mu.dat
Untracked: output/BLUPS2.csv
Untracked: output/BLUPS_GEBV_GETG_all clones.csv
Untracked: output/BLUPS_density_med.png
Untracked: output/BLUPS_density_med_row_col.png
Untracked: output/BLUPS_drgBLUPs__boxplot_med.png
Untracked: output/BLUPS_drgBLUPs__boxplot_med_row_col.png
Untracked: output/BLUPS_drgBLUPs_density_med.png
Untracked: output/BLUPS_drgBLUPs_med.png
Untracked: output/BLUPS_par_mmer.Rdata
Untracked: output/BLUPS_row_col.csv
Untracked: output/BLUPS_row_col_random.csv
Untracked: output/BLUPS_x_BLUPS_row_col_boxplot_med.png
Untracked: output/Density_residual_row_col.tiff
Untracked: output/Figuras_article.rar
Untracked: output/GEBVS_BayesA.RDS
Untracked: output/GEBVS_BayesB.RDS
Untracked: output/GEBVS_DOM.RDS
Untracked: output/GEBVS_G_BLUP.RDS
Untracked: output/GEBVS_G_BLUP_row_col.RDS
Untracked: output/GEBVS_G_BLUP_row_col_random.RDS
Untracked: output/GEBVS_RF.RDS
Untracked: output/GEBVS_RKHS.RDS
Untracked: output/GEBVS_RR_BLUP.RDS
Untracked: output/GEBV_BLUP.csv
Untracked: output/GEBV_DOMxBLUPS.tiff
Untracked: output/GEBV_GBLUPxBLUPS.tiff
Untracked: output/GEBVxGETGV.tiff
Untracked: output/GETGVxBLUPS.tiff
Untracked: output/G_matrix.rds
Untracked: output/H2.csv
Untracked: output/H2_row_col.csv
Untracked: output/H2_row_col_random.csv
Untracked: output/Heatmap.tiff
Untracked: output/MSPE_all_methods.tiff
Untracked: output/Residuals_vs_fitted_row_col.tiff
Untracked: output/accuracy_all_methods.tiff
Untracked: output/accuracy_all_methods_points.tiff
Untracked: output/cor_GEBVxBLUPS.tiff
Untracked: output/cor_GEBVxBLUPS_all_methods.tiff
Untracked: output/correlation_blups.tiff
Untracked: output/density_blups.tiff
Untracked: output/dierencial_selecao.csv
Untracked: output/dierencial_selecao.xlsx
Untracked: output/drgBLUP.csv
Untracked: output/fitted_residual_data_row_col_random.csv
Untracked: output/fitted_values_row_col_random.csv
Untracked: output/indice_selection.tiff
Untracked: output/indice_selection3.tiff
Untracked: output/indice_selection_GEBV.tiff
Untracked: output/indice_selection_GEBV_GETGV.tiff
Untracked: output/indice_selection_GETGV.tiff
Untracked: output/kappa.tiff
Untracked: output/kappa2.tiff
Untracked: output/mean_pheno.csv
Untracked: output/medias_semestre.xlsx
Untracked: output/pesos_BLUPS_all clones.csv
Untracked: output/pesos_GEBVS_all clones.csv
Untracked: output/pesos_GETGVS_all clones.csv
Untracked: output/pheno_mean_sd.csv
Untracked: output/residuos_row_col_random.csv
Untracked: output/result_sommer.RData
Untracked: output/result_sommer_row_col.RData
Untracked: output/result_sommer_row_col_random.RDS
Untracked: output/results_MSPE.csv
Untracked: output/results_accuracy.csv
Untracked: output/results_cv_BayesA.RDS
Untracked: output/results_cv_BayesB.RDS
Untracked: output/results_cv_DOM.RDS
Untracked: output/results_cv_G_BLUP.RDS
Untracked: output/results_cv_G_BLUP_row_col.RDS
Untracked: output/results_cv_G_BLUP_row_col_random.RDS
Untracked: output/results_cv_RF.RDS
Untracked: output/results_cv_RKHS.RDS
Untracked: output/results_cv_RR_BLUP.RDS
Untracked: output/results_h2_GBLUP (1).rds
Untracked: output/results_h2_GBLUP.csv
Untracked: output/results_h2_GBLUP.rds
Untracked: output/results_h2_GBLUP_D.rds
Untracked: output/results_kappa.csv
Untracked: output/results_kappa2.csv
Untracked: output/results_values-selection.csv
Untracked: output/results_values_kappa.csv
Untracked: output/results_values_rescale_selection.csv
Untracked: output/results_values_selection.csv
Untracked: output/rstudio-export.zip
Untracked: output/teste_LRT.csv
Untracked: output/teste_LRT_row_col.csv
Untracked: output/teste_LRT_row_col_random.csv
Untracked: output/varcomp.tiff
Untracked: output/varcomp2.csv
Untracked: output/varcomp_row_col.csv
Untracked: output/varcomp_row_col.tiff
Untracked: output/varcomp_row_col_random.csv
Untracked: output/varcomp_row_col_random.tiff
Untracked: varE.dat
Unstaged changes:
Modified: README.md
Modified: analysis/GWS.Rmd
Modified: analysis/_site.yml
Modified: analysis/index.Rmd
Modified: analysis/license.Rmd
Modified: analysis/phenotype.Rmd
Modified: data/SNPs.rds
Deleted: data/convet_haplo_diplo.txt
Modified: data/pheno_clean.csv
Deleted: output/BLUPS.RDS
Modified: output/BLUPS.csv
Deleted: output/BLUPS_Multi.csv
Modified: output/BLUPS_par.Rdata
Deleted: output/BLUPS_par_ind.Rdata
Deleted: output/H2_Multi.csv
Deleted: output/accuracy.png
Deleted: output/accuracy2.png
Deleted: output/accuracy_multi.png
Deleted: output/blups_med.png
Deleted: output/herdabilidade.csv
Modified: output/media_pheno.csv
Deleted: output/media_pheno_Multi.csv
Deleted: output/resultMM.Rdata
Deleted: output/result_G_BLUP.csv
Deleted: output/results_cv.csv
Modified: output/varcomp.csv
Deleted: output/varcomp_multi.csv
Staged changes:
New: Artigo GS tolerancia a seca_revised.rar
New: Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 manuscript.docx
New: Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 manuscript_revised.docx
New: Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 supplement.docx
New: Artigo GS tolerancia a seca_revised/Figure 1.tiff
New: Artigo GS tolerancia a seca_revised/Figure 2.tiff
New: Artigo GS tolerancia a seca_revised/Figure 3.tiff
New: Artigo GS tolerancia a seca_revised/Figure 4.tiff
New: Artigo GS tolerancia a seca_revised/GWS.Rmd
New: Artigo GS tolerancia a seca_revised/phenotype.Rmd
New: Artigo GS tolerancia a seca_revised/response to reviewers.docx
New: Seleção Genômica deficit hidrico.pptx
New: data/Article.docx
New: data/Article2.docx
New: data/Costa et al_FPS_2024 manuscript eng.docx
New: data/Costa et al_FPS_2024 manuscript eng_v1.docx
New: data/Costa et al_FPS_2024 manuscript eng_v2.docx
New: data/Costa et al_FPS_2024 manuscript eng_v3.docx
New: data/Costa et al_FPS_2024 manuscript.docx
New: data/Costa et al_FPS_2024 supplement.docx
New: data/Frontiers_Template.docx
Deleted: data/Genomic Selection for Drought Tolerance Using Genome-Wide SNPs in Maize.pdf
New: data/Supplementary_Material.docx
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/phenotype.Rmd
) and HTML
(docs/phenotype.html
) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | c64a991 | Weverton Gomes | 2023-10-27 | add phenotype html |
Rmd | 286b492 | Weverton Gomes | 2023-10-27 | Update Scripts and README |
Rmd | 90dc112 | WevertonGomesCosta | 2022-11-17 | Update |
Rmd | 6cc4d23 | WevertonGomesCosta | 2022-11-17 | Update |
html | 6cc4d23 | WevertonGomesCosta | 2022-11-17 | Update |
html | d930880 | WevertonGomesCosta | 2022-11-11 | Update |
Rmd | 5988c27 | WevertonGomesCosta | 2022-11-11 | Update |
html | 5988c27 | WevertonGomesCosta | 2022-11-11 | Update |
Rmd | bf7b1d3 | WevertonGomesCosta | 2022-11-11 | Update |
html | bf7b1d3 | WevertonGomesCosta | 2022-11-11 | Update |
Load Libraries
library(kableExtra)
library(tidyverse)
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("ComplexHeatmap")
require(ComplexHeatmap)
library(data.table)
library(readxl)
library(metan)
library(DataExplorer)
library(ggthemes)
library(GGally)
theme_set(theme_bw())
Let’s import the phenotypic dataset, excluding the traits without information and the traits Local (redundant with Year) and Treatment (only one observation).
pheno <- read_excel("data/Phenotyping2.xlsx", na = "NA") %>%
select_if(~ !all(is.na(.))) %>% # Deleting traits without information
select(-c("Local", "Tratamento"))
We will perform some manipulations to adjust our database and to facilitate the visualization of the exploratory analysis.
First, let’s convert the traits that are character into factors. Then we will convert the traits that refer to the grades to integers and then into factors. After that, let’s create the trait ANo.Bloco for nesting in the model to obtain the BLUPs.
Introductory analysis of the entire dataset
introduce(pheno) %>%
t() %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
rows | 2336 |
columns | 29 |
discrete_columns | 6 |
continuous_columns | 23 |
all_missing_columns | 0 |
total_missing_values | 16875 |
complete_rows | 440 |
total_observations | 67744 |
memory_usage | 534240 |
We don’t have any columns that have all of the missing observations, but we do have a lot of missing values in every dataset. Some manipulations should be performed to improve the quality of the data.
Let’s produce a heatmap to check the clone amount each year. I’m going to create another dataset with the Year and Clone count. Then I will create the objects corresponding to the clones and years array. Finally, I created the matrix that represents the presence and absence of the clone in the year.
pheno2 <- pheno %>%
count(Ano, Clone)
genmat <- model.matrix( ~ -1 + Clone, data = pheno2)
envmat <- model.matrix( ~ -1 + Ano, data = pheno2)
genenvmat <- t(envmat) %*% genmat
genenvmat_ch <- ifelse(genenvmat == 1, "Present", "Abscent")
Heatmap(
genenvmat_ch,
col = c("white", "tomato"),
show_column_names = F,
heatmap_legend_param = list(title = ""),
column_title = "Genotypes",
row_title = "Environments"
)
Version | Author | Date |
---|---|---|
bf7b1d3 | WevertonGomesCosta | 2022-11-11 |
From the heatmap, it is clear that the year 2016 has very few observations. So, we must eliminate it.
Just for reference, let’s re-view the clone heatmap by year.
pheno2 <- pheno %>%
count(Ano, Clone)
genmat = model.matrix(~ -1 + Clone, data = pheno2)
envmat = model.matrix(~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat
genenvmat_ch = ifelse(genenvmat == 1, "Present", "Abscent")
Heatmap(
genenvmat_ch,
col = c("white", "tomato"),
show_column_names = F,
heatmap_legend_param = list(title = ""),
column_title = "Genotypes",
row_title = "Environments"
)
Version | Author | Date |
---|---|---|
bf7b1d3 | WevertonGomesCosta | 2022-11-11 |
We can check how many clones we have in common between the years and also note that the years differ in the number of clones evaluated:
Table 1 - Year of evaluation and number of genotypes evaluated
pheno2 <- pheno %>%
count(Ano, Clone)
genmat = model.matrix(~ -1 + Clone, data = pheno2)
envmat = model.matrix(~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat
genenvmat %*% t(genenvmat) %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
Ano2017 | Ano2018 | Ano2019 | Ano2020 | |
---|---|---|---|---|
Ano2017 | 165 | 42 | 22 | 14 |
Ano2018 | 42 | 138 | 39 | 16 |
Ano2019 | 22 | 39 | 133 | 29 |
Ano2020 | 14 | 16 | 29 | 138 |
The year 2020 has a lower number of clones in common, however, we will keep it for the analysis.
Here, it is possible to observe that our dataset has clones that were evaluated in just one year. Let’s visualize this, to see how many clones were evaluated according to the number of years.
pheno %>%
count(Ano, Clone) %>%
count(Clone) %>%
count(n) %>%
kbl(
escape = F,
align = 'c',
col.names = c("N of Environment", "N of genotypes")
) %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
Storing counts in `nn`, as `n` already present in input
ℹ Use `name = "new_name"` to pick a new name.
N of Environment | N of genotypes |
---|---|
1 | 350 |
2 | 72 |
3 | 20 |
4 | 5 |
Only 5 clones were evaluated in all years, this will possibly decrease our model accuracy.
Another factor that reduces the accuracy, and therefore adopting mixed models via REML in the analysis is the most suitable for obtaining BLUPs.
Inicialmente, iremos verificar as estatísticas descritivas para cada coluna.
summary(pheno) %>%
t() %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
Clone | BGM-0164: 16 | BGM-0170: 16 | BGM-0396: 16 | BGM-1243: 16 | BGM-1267: 16 | BGM-0134: 12 | (Other) :2204 |
Ano | 2017:660 | 2018:552 | 2019:532 | 2020:552 | NA | NA | NA |
Bloco | 1:574 | 2:574 | 3:574 | 4:574 | NA | NA | NA |
row | 11 : 121 | 3 : 120 | 10 : 120 | 2 : 119 | 6 : 118 | 4 : 117 | (Other):1581 |
col | 6 : 131 | 4 : 130 | 8 : 130 | 2 : 128 | 5 : 128 | 9 : 128 | (Other):1521 |
N_Roots | Min. : 0.125 | 1st Qu.: 2.333 | Median : 4.000 | Mean : 4.293 | 3rd Qu.: 6.000 | Max. :15.667 | NA’s :322 |
FRY | Min. : 0.116 | 1st Qu.: 1.857 | Median : 3.750 | Mean : 4.946 | 3rd Qu.: 6.857 | Max. :22.200 | NA’s :335 |
ShY | Min. : 0.694 | 1st Qu.: 6.944 | Median :11.167 | Mean :14.228 | 3rd Qu.:18.714 | Max. :61.167 | NA’s :211 |
DMC | Min. :11.98 | 1st Qu.:25.00 | Median :28.80 | Mean :29.06 | 3rd Qu.:32.78 | Max. :48.34 | NA’s :968 |
StY | Min. :0.0154 | 1st Qu.:0.5494 | Median :1.2068 | Mean :1.5164 | 3rd Qu.:2.1092 | Max. :8.8669 | NA’s :978 |
Plant.Height | Min. :0.3600 | 1st Qu.:0.9633 | Median :1.1600 | Mean :1.1919 | 3rd Qu.:1.3908 | Max. :3.0333 | NA’s :220 |
HI | Min. : 1.574 | 1st Qu.:15.726 | Median :23.345 | Mean :24.556 | 3rd Qu.:31.884 | Max. :71.967 | NA’s :341 |
StC | Min. : 7.326 | 1st Qu.:20.350 | Median :24.147 | Mean :24.419 | 3rd Qu.:28.160 | Max. :43.686 | NA’s :968 |
PltArc | Min. :1.000 | 1st Qu.:1.000 | Median :2.000 | Mean :1.996 | 3rd Qu.:3.000 | Max. :5.000 | NA’s :678 |
Leaf.Ret | Min. :0.3333 | 1st Qu.:1.0000 | Median :1.3333 | Mean :1.7796 | 3rd Qu.:2.0000 | Max. :5.0000 | NA’s :216 |
Root.Le | Min. : 7.00 | 1st Qu.:19.00 | Median :23.00 | Mean :23.21 | 3rd Qu.:27.00 | Max. :47.33 | NA’s :339 |
Root.Di | Min. : 6.12 | 1st Qu.:23.36 | Median :28.17 | Mean :28.88 | 3rd Qu.:33.50 | Max. :63.30 | NA’s :339 |
Stem.D | Min. :1.013 | 1st Qu.:1.859 | Median :2.101 | Mean :2.112 | 3rd Qu.:2.362 | Max. :4.373 | NA’s :224 |
Mite | Min. :1.000 | 1st Qu.:3.000 | Median :3.667 | Mean :3.475 | 3rd Qu.:4.000 | Max. :5.000 | NA’s :222 |
Incidence_Mites | Min. :1 | 1st Qu.:1 | Median :1 | Mean :1 | 3rd Qu.:1 | Max. :1 | NA’s :674 |
Nstem.Plant | Min. :1.000 | 1st Qu.:1.333 | Median :2.000 | Mean :2.131 | 3rd Qu.:2.667 | Max. :6.667 | NA’s :850 |
Stand6MAP | Min. :1.000 | 1st Qu.:4.000 | Median :6.000 | Mean :5.202 | 3rd Qu.:7.000 | Max. :8.000 | NA’s :849 |
Branching | Min. :0.0000 | 1st Qu.:0.0000 | Median :0.3333 | Mean :0.5911 | 3rd Qu.:1.0000 | Max. :4.0000 | NA’s :673 |
Staygreen | Min. :1.000 | 1st Qu.:1.000 | Median :1.000 | Mean :1.292 | 3rd Qu.:2.000 | Max. :3.000 | NA’s :669 |
Vigor | Mode:logical | TRUE:1075 | NA’s:1221 | NA | NA | NA | NA |
Flowering | Min. :0.0000 | 1st Qu.:0.0000 | Median :0.0000 | Mean :0.0074 | 3rd Qu.:0.0000 | Max. :1.0000 | NA’s :673 |
Canopy_Lenght | Min. : 10.00 | 1st Qu.: 54.33 | Median : 72.00 | Mean : 73.35 | 3rd Qu.: 91.67 | Max. :240.17 | NA’s :1458 |
Canopy_Width | Min. : 7.50 | 1st Qu.: 56.33 | Median : 75.58 | Mean : 76.68 | 3rd Qu.: 96.00 | Max. :213.33 | NA’s :1458 |
Leaf_Lenght | Min. : 3.750 | 1st Qu.: 8.417 | Median :11.250 | Mean :11.676 | 3rd Qu.:14.200 | Max. :32.667 | NA’s :1459 |
We have a high missing value ratio for Vigor, Leaf_Lenght, Canopy_Width and Canopy_Lenght, I’ll exclude those too. Mite Incidence and Flowering have little information for some levels and many NA’s, we will also exclude these traits.
pheno <- pheno %>%
select(-c(
Incidence_Mites,
Vigor,
Flowering,
Leaf_Lenght,
Canopy_Width,
Canopy_Lenght
))
Let’s just look at the missing values now, to check the proportions.
Version | Author | Date |
---|---|---|
bf7b1d3 | WevertonGomesCosta | 2022-11-11 |
StC, DMC and StY still do not have good quality, but we will maintain these traits, as they are important to us.
Let’s check the distribution of traits by year now and let’s look at the histograms of the quantitative traits:
Version | Author | Date |
---|---|---|
bf7b1d3 | WevertonGomesCosta | 2022-11-11 |
For Branching, Leaf.Ret, Mite, PltArc, Stand6MAP and Staygreen don’t have normal distribution. To get the BLUPs we will have to remove that traits from the database.
First, let’s check the amount of missing values for each clone by year. We are filtering the Clones with the average bigger than 2 missing values by year.
pheno2 <- pheno %>%
select(-Bloco) %>%
group_by(Clone, Ano) %>%
summarise_all(.funs = list(~ sum(is.na(.)))) %>%
ungroup() %>%
select_numeric_cols() %>%
mutate(mean = rowMeans(.), Clone.Ano = unique(interaction(pheno$Clone, pheno$Ano))) %>%
filter(mean > 2) %>%
droplevels()
nlevels(pheno2$Clone.Ano) %>%
kbl(
escape = F,
align = 'c',
col.names = c("N of genotypes")
) %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
N of genotypes |
---|
40 |
76 clones presented many missing values, i.e., they were evaluated in less than two blocks by year. Therefore, they should be excluded from future analyses, according to the year.
Let’s evaluate the descriptive statistics of the combination between clone and year for the traits.
ge_details(pheno, Ano, Clone, resp = everything()) %>%
t() %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
Parameters | Mean | SE | SD | CV | Min | Max | MinENV | MaxENV | MinGEN | MaxGEN |
N_Roots | 4.29 | 0.06 | 2.51 | 58.56 | 0.12 (BGM-1031 in 2017) | 15.67 (2012-107-002 in 2019) | 2018 (1.62) | 2019 (5.76) | BGM-0411 (0.33) | 2012-107-002 (11.33) |
FRY | 4.95 | 0.09 | 4.04 | 81.79 | 0.12 (BGM-0886 in 2017) | 22.2 (BGM-1267 in 2018) | 2017 (2.75) | 2020 (6.52) | BGM-1488 (0.34) | IAC-14 (14.07) |
ShY | 14.23 | 0.22 | 10.16 | 71.45 | 0.69 (BGM-0996 in 2017) | 61.17 (BGM-2124 in 2020) | 2017 (8.47) | 2020 (25.87) | BGM-0048 (1.52) | BGM-2124 (54.33) |
DMC | 29.06 | 0.17 | 6.1 | 21 | 11.98 (BGM-0626 in 2020) | 48.34 (BGM-1015 in 2020) | 2020 (26.21) | 2018 (35.38) | BGM-0626 (14.98) | BGM-1015 (45.02) |
StY | 1.52 | 0.04 | 1.27 | 84.01 | 0.02 (BGM-0340 in 2019) | 8.87 (BGM-0396 in 2018) | 2019 (1.32) | 2018 (1.84) | BGM-0089 (0.06) | BGM-1023 (4.76) |
Plant.Height | 1.19 | 0.01 | 0.33 | 27.43 | 0.36 (BGM-0426 in 2020) | 3.03 (BR-11-24-156 in 2020) | 2017 (1) | 2019 (1.48) | Jatobá (0.58) | BGM-1200 (1.91) |
HI | 24.56 | 0.27 | 11.89 | 48.42 | 1.57 (BGM-1159 in 2017) | 71.97 (BGM-1315 in 2018) | 2020 (18.61) | 2018 (32.36) | BGM-0961 (2.5) | Mata_Fome_Branca (52.78) |
StC | 24.42 | 0.17 | 6.11 | 25.05 | 7.33 (BGM-0626 in 2020) | 43.69 (BGM-1015 in 2020) | 2020 (21.56) | 2018 (30.78) | BGM-0626 (10.33) | BGM-1015 (40.37) |
Root.Le | 23.21 | 0.13 | 5.8 | 24.99 | 7 (BGM-1574 in 2017) | 47.33 (BGM-0396 in 2018) | 2017 (19.72) | 2019 (27.14) | Jatobá (8.5) | BGM-1956 (35.5) |
Root.Di | 28.88 | 0.18 | 7.9 | 27.35 | 6.12 (BGM-2142 in 2019) | 63.3 (BRS Mulatinha in 2018) | 2017 (24.49) | 2018 (34.74) | BGM-0089 (12.14) | BGM-1956 (48.91) |
Stem.D | 2.11 | 0.01 | 0.38 | 17.9 | 1.01 (BGM-0592 in 2018) | 4.37 (BRS Tapioqueira in 2020) | 2018 (2.02) | 2017 (2.16) | BGM-0048 (1.25) | BGM-1523 (2.93) |
Nstem.Plant | 2.13 | 0.03 | 0.95 | 44.53 | 1 (BGM-0036 in 2018) | 6.67 (BGM-0714 in 2019) | 2018 (1.44) | 2019 (2.71) | BGM-0066 (1) | BGM-0451 (4.44) |
Apparently we no longer have a genotype that could harm our analysis. Now we must evaluate the clone-only descriptive statistics for the traits. Some traits presented hight cv, StY and FRY, but we will go continue.
desc_stat(pheno,
by = Ano,
na.rm = TRUE,
stats = "cv") %>%
na.omit() %>%
arrange(desc(cv)) %>%
pivot_wider(names_from = "Ano", values_from = "cv") %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
variable | 2018 | 2020 | 2019 | 2017 |
---|---|---|---|---|
StY | 91.8742 | 81.2220 | 72.3243 | NA |
FRY | 86.0073 | 68.4455 | 70.0326 | 66.8379 |
ShY | 69.0625 | 42.7475 | 47.4901 | 45.4973 |
N_Roots | 63.2913 | 47.0839 | 47.2357 | 48.8708 |
HI | 41.5442 | 43.3204 | 41.8572 | 48.2942 |
Nstem.Plant | 33.3849 | 37.1507 | 37.3740 | NA |
Plant.Height | 28.0409 | 24.0927 | 19.1125 | 21.0398 |
StC | 15.6937 | 27.3600 | 16.5407 | NA |
Root.Le | 27.1625 | 18.9730 | 21.1089 | 21.5823 |
Root.Di | 23.3943 | 20.5858 | 24.3155 | 21.0039 |
DMC | 13.6086 | 22.5059 | 13.8212 | NA |
Stem.D | 22.3941 | 17.5042 | 15.7037 | 16.7774 |
Again, some traits were not computed for the year 2017, so we have to eliminate that year when performing the analysis for these traits.
What draws attention in this table are the high cv for some traits, especially: HI, Nstem.Plant, N_Roots, ShY, StY and FRY.
This may be due to the presence of outliers, let’s inspect the entire dataset to assess whether there are outliers:
inspect(pheno %>%
select_if(~ !is.factor(.)), verbose = FALSE) %>%
arrange(desc(Outlier)) %>%
kbl(escape = F, align = 'c') %>%
kable_classic(
"hover",
full_width = F,
position = "center",
fixed_thead = T
)
Variable | Class | Missing | Levels | Valid_n | Min | Median | Max | Outlier | Text |
---|---|---|---|---|---|---|---|---|---|
ShY | numeric | Yes |
|
2085 | 0.69 | 11.17 | 61.17 | 89 | NA |
FRY | numeric | Yes |
|
1961 | 0.12 | 3.75 | 22.20 | 71 | NA |
StY | numeric | Yes |
|
1318 | 0.02 | 1.21 | 8.87 | 49 | NA |
Root.Di | numeric | Yes |
|
1957 | 6.12 | 28.17 | 63.30 | 32 | NA |
Nstem.Plant | numeric | Yes |
|
1446 | 1.00 | 2.00 | 6.67 | 30 | NA |
Plant.Height | numeric | Yes |
|
2076 | 0.36 | 1.16 | 3.03 | 23 | NA |
Stem.D | numeric | Yes |
|
2072 | 1.01 | 2.10 | 4.37 | 22 | NA |
Root.Le | numeric | Yes |
|
1957 | 7.00 | 23.00 | 47.33 | 19 | NA |
N_Roots | numeric | Yes |
|
1974 | 0.12 | 4.00 | 15.67 | 16 | NA |
HI | numeric | Yes |
|
1955 | 1.57 | 23.35 | 71.97 | 15 | NA |
DMC | numeric | Yes |
|
1328 | 11.98 | 28.80 | 48.34 | 6 | NA |
StC | numeric | Yes |
|
1328 | 7.33 | 24.15 | 43.69 | 6 | NA |
Confirming what was described before, most traits with high cv have many outliers and therefore we will exclude them in the loop to obtain the blups.
Now let’s just perform a general inspection of the data to finish the manipulations.
StC with DMC and StY with FRY show high correlation.
Furthermore most of the traits apparently show normal distribution of phenotypic data. So let’s save the clean data and move on to getting the blups.
write.csv(pheno,
"data/pheno_clean.csv",
row.names = F,
quote = F)
pheno_mean_sd <- desc_stat(pheno, stats = c("mean, min, max, cv"))
write.csv(pheno_mean_sd,
"output/pheno_mean_sd.csv",
row.names = F,
quote = F)
Proceed with Genotype-environment analysis using mixed-effect models to obtain BLUPs (Best Linear Unbiased Predictors) and further interpret the results to identify superior genotypes.
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8 LC_CTYPE=Portuguese_Brazil.utf8
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.utf8
time zone: America/Sao_Paulo
tzcode source: internal
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GGally_2.1.2 ggthemes_4.2.4 DataExplorer_0.8.2
[4] metan_1.18.0 readxl_1.4.3 data.table_1.14.8
[7] ComplexHeatmap_2.16.0 lubridate_1.9.2 forcats_1.0.0
[10] stringr_1.5.0 dplyr_1.1.2 purrr_1.0.2
[13] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[16] ggplot2_3.4.3 tidyverse_2.0.0 kableExtra_1.3.4
loaded via a namespace (and not attached):
[1] gridExtra_2.3 rlang_1.1.1 magrittr_2.0.3
[4] clue_0.3-64 GetoptLong_1.0.5 git2r_0.32.0
[7] matrixStats_1.0.0 compiler_4.3.1 png_0.1-8
[10] systemfonts_1.0.4 vctrs_0.6.3 rvest_1.0.3
[13] pkgconfig_2.0.3 shape_1.4.6 crayon_1.5.2
[16] fastmap_1.1.1 labeling_0.4.2 utf8_1.2.3
[19] promises_1.2.1 rmarkdown_2.24 tzdb_0.4.0
[22] nloptr_2.0.3 xfun_0.40 cachem_1.0.8
[25] jsonlite_1.8.7 highr_0.10 later_1.3.1
[28] reshape_0.8.9 tweenr_2.0.2 parallel_4.3.1
[31] cluster_2.1.4 R6_2.5.1 bslib_0.5.1
[34] stringi_1.7.12 RColorBrewer_1.1-3 boot_1.3-28.1
[37] numDeriv_2016.8-1.1 jquerylib_0.1.4 cellranger_1.1.0
[40] Rcpp_1.0.11 iterators_1.0.14 knitr_1.43
[43] IRanges_2.34.1 igraph_1.5.1 Matrix_1.6-1
[46] httpuv_1.6.11 splines_4.3.1 timechange_0.2.0
[49] tidyselect_1.2.0 rstudioapi_0.15.0 yaml_2.3.7
[52] doParallel_1.0.17 codetools_0.2-19 lmerTest_3.1-3
[55] lattice_0.21-8 plyr_1.8.8 withr_2.5.2
[58] evaluate_0.22 polyclip_1.10-4 xml2_1.3.5
[61] circlize_0.4.15 pillar_1.9.0 whisker_0.4.1
[64] foreach_1.5.2 stats4_4.3.1 generics_0.1.3
[67] rprojroot_2.0.3 mathjaxr_1.6-0 S4Vectors_0.38.1
[70] hms_1.1.3 munsell_0.5.0 scales_1.2.1
[73] minqa_1.2.6 glue_1.6.2 tools_4.3.1
[76] lme4_1.1-34 webshot_0.5.5 fs_1.6.3
[79] colorspace_2.1-0 networkD3_0.4 nlme_3.1-163
[82] patchwork_1.1.3 ggforce_0.4.1 cli_3.6.1
[85] workflowr_1.7.1 fansi_1.0.4 viridisLite_0.4.2
[88] svglite_2.1.1 gtable_0.3.4 sass_0.4.7
[91] digest_0.6.33 BiocGenerics_0.46.0 ggrepel_0.9.3
[94] htmlwidgets_1.6.2 rjson_0.2.21 farver_2.1.1
[97] htmltools_0.5.6 lifecycle_1.0.3 httr_1.4.7
[100] GlobalOptions_0.1.2 MASS_7.3-60
Weverton Gomes da Costa, Pós-Doutorando, Embrapa Mandioca e Fruticultura, wevertonufv@gmail.com↩︎