Last updated: 2025-01-07

Checks: 6 1

Knit directory: Genomic-Selection-for-Drought-Tolerance-Using-Genome-Wide-SNPs-in-Casava/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20221020) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 5b6a406. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/allchrAR08.txt

Untracked files:
    Untracked:  ETA_1_ScaleBayesA.dat
    Untracked:  ETA_1_parBayesB.dat
    Untracked:  analysis/figure/
    Untracked:  analysis/mixed_models.Rmd
    Untracked:  data/Articles/
    Untracked:  data/Phenotyping2.xlsx
    Untracked:  data/Tabela auxiliar.xlsx
    Untracked:  data/data.rar
    Untracked:  mu.dat
    Untracked:  output/BLUPS2.csv
    Untracked:  output/BLUPS_GEBV_GETG_all clones.csv
    Untracked:  output/BLUPS_density_med.png
    Untracked:  output/BLUPS_density_med_row_col.png
    Untracked:  output/BLUPS_drgBLUPs__boxplot_med.png
    Untracked:  output/BLUPS_drgBLUPs__boxplot_med_row_col.png
    Untracked:  output/BLUPS_drgBLUPs_density_med.png
    Untracked:  output/BLUPS_drgBLUPs_med.png
    Untracked:  output/BLUPS_par_mmer.Rdata
    Untracked:  output/BLUPS_row_col.csv
    Untracked:  output/BLUPS_row_col_random.csv
    Untracked:  output/BLUPS_x_BLUPS_row_col_boxplot_med.png
    Untracked:  output/Density_residual_row_col.tiff
    Untracked:  output/Figuras_article.rar
    Untracked:  output/GEBVS_BayesA.RDS
    Untracked:  output/GEBVS_BayesB.RDS
    Untracked:  output/GEBVS_DOM.RDS
    Untracked:  output/GEBVS_G_BLUP.RDS
    Untracked:  output/GEBVS_G_BLUP_row_col.RDS
    Untracked:  output/GEBVS_G_BLUP_row_col_random.RDS
    Untracked:  output/GEBVS_RF.RDS
    Untracked:  output/GEBVS_RKHS.RDS
    Untracked:  output/GEBVS_RR_BLUP.RDS
    Untracked:  output/GEBV_BLUP.csv
    Untracked:  output/GEBV_DOMxBLUPS.tiff
    Untracked:  output/GEBV_GBLUPxBLUPS.tiff
    Untracked:  output/GEBVxGETGV.tiff
    Untracked:  output/GETGVxBLUPS.tiff
    Untracked:  output/G_matrix.rds
    Untracked:  output/H2.csv
    Untracked:  output/H2_row_col.csv
    Untracked:  output/H2_row_col_random.csv
    Untracked:  output/Heatmap.tiff
    Untracked:  output/MSPE_all_methods.tiff
    Untracked:  output/Residuals_vs_fitted_row_col.tiff
    Untracked:  output/accuracy_all_methods.tiff
    Untracked:  output/accuracy_all_methods_points.tiff
    Untracked:  output/cor_GEBVxBLUPS.tiff
    Untracked:  output/cor_GEBVxBLUPS_all_methods.tiff
    Untracked:  output/correlation_blups.tiff
    Untracked:  output/density_blups.tiff
    Untracked:  output/dierencial_selecao.csv
    Untracked:  output/dierencial_selecao.xlsx
    Untracked:  output/drgBLUP.csv
    Untracked:  output/fitted_residual_data_row_col_random.csv
    Untracked:  output/fitted_values_row_col_random.csv
    Untracked:  output/indice_selection.tiff
    Untracked:  output/indice_selection3.tiff
    Untracked:  output/indice_selection_GEBV.tiff
    Untracked:  output/indice_selection_GEBV_GETGV.tiff
    Untracked:  output/indice_selection_GETGV.tiff
    Untracked:  output/kappa.tiff
    Untracked:  output/kappa2.tiff
    Untracked:  output/mean_pheno.csv
    Untracked:  output/medias_semestre.xlsx
    Untracked:  output/pesos_BLUPS_all clones.csv
    Untracked:  output/pesos_GEBVS_all clones.csv
    Untracked:  output/pesos_GETGVS_all clones.csv
    Untracked:  output/pheno_mean_sd.csv
    Untracked:  output/residuos_row_col_random.csv
    Untracked:  output/result_sommer.RData
    Untracked:  output/result_sommer_row_col.RData
    Untracked:  output/result_sommer_row_col_random.RDS
    Untracked:  output/results_MSPE.csv
    Untracked:  output/results_accuracy.csv
    Untracked:  output/results_cv_BayesA.RDS
    Untracked:  output/results_cv_BayesB.RDS
    Untracked:  output/results_cv_DOM.RDS
    Untracked:  output/results_cv_G_BLUP.RDS
    Untracked:  output/results_cv_G_BLUP_row_col.RDS
    Untracked:  output/results_cv_G_BLUP_row_col_random.RDS
    Untracked:  output/results_cv_RF.RDS
    Untracked:  output/results_cv_RKHS.RDS
    Untracked:  output/results_cv_RR_BLUP.RDS
    Untracked:  output/results_h2_GBLUP (1).rds
    Untracked:  output/results_h2_GBLUP.csv
    Untracked:  output/results_h2_GBLUP.rds
    Untracked:  output/results_h2_GBLUP_D.rds
    Untracked:  output/results_kappa.csv
    Untracked:  output/results_kappa2.csv
    Untracked:  output/results_values-selection.csv
    Untracked:  output/results_values_kappa.csv
    Untracked:  output/results_values_rescale_selection.csv
    Untracked:  output/results_values_selection.csv
    Untracked:  output/rstudio-export.zip
    Untracked:  output/teste_LRT.csv
    Untracked:  output/teste_LRT_row_col.csv
    Untracked:  output/teste_LRT_row_col_random.csv
    Untracked:  output/varcomp.tiff
    Untracked:  output/varcomp2.csv
    Untracked:  output/varcomp_row_col.csv
    Untracked:  output/varcomp_row_col.tiff
    Untracked:  output/varcomp_row_col_random.csv
    Untracked:  output/varcomp_row_col_random.tiff
    Untracked:  varE.dat

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/GWS.Rmd
    Modified:   analysis/_site.yml
    Modified:   analysis/index.Rmd
    Modified:   analysis/license.Rmd
    Modified:   analysis/phenotype.Rmd
    Modified:   data/SNPs.rds
    Deleted:    data/convet_haplo_diplo.txt
    Modified:   data/pheno_clean.csv
    Deleted:    output/BLUPS.RDS
    Modified:   output/BLUPS.csv
    Deleted:    output/BLUPS_Multi.csv
    Modified:   output/BLUPS_par.Rdata
    Deleted:    output/BLUPS_par_ind.Rdata
    Deleted:    output/H2_Multi.csv
    Deleted:    output/accuracy.png
    Deleted:    output/accuracy2.png
    Deleted:    output/accuracy_multi.png
    Deleted:    output/blups_med.png
    Deleted:    output/herdabilidade.csv
    Modified:   output/media_pheno.csv
    Deleted:    output/media_pheno_Multi.csv
    Deleted:    output/resultMM.Rdata
    Deleted:    output/result_G_BLUP.csv
    Deleted:    output/results_cv.csv
    Modified:   output/varcomp.csv
    Deleted:    output/varcomp_multi.csv

Staged changes:
    New:        Artigo GS tolerancia a seca_revised.rar
    New:        Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 manuscript.docx
    New:        Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 manuscript_revised.docx
    New:        Artigo GS tolerancia a seca_revised/Costa et al_FPS_2024 supplement.docx
    New:        Artigo GS tolerancia a seca_revised/Figure 1.tiff
    New:        Artigo GS tolerancia a seca_revised/Figure 2.tiff
    New:        Artigo GS tolerancia a seca_revised/Figure 3.tiff
    New:        Artigo GS tolerancia a seca_revised/Figure 4.tiff
    New:        Artigo GS tolerancia a seca_revised/GWS.Rmd
    New:        Artigo GS tolerancia a seca_revised/phenotype.Rmd
    New:        Artigo GS tolerancia a seca_revised/response to reviewers.docx
    New:        Seleção Genômica deficit hidrico.pptx
    New:        data/Article.docx
    New:        data/Article2.docx
    New:        data/Costa et al_FPS_2024 manuscript eng.docx
    New:        data/Costa et al_FPS_2024 manuscript eng_v1.docx
    New:        data/Costa et al_FPS_2024 manuscript eng_v2.docx
    New:        data/Costa et al_FPS_2024 manuscript eng_v3.docx
    New:        data/Costa et al_FPS_2024 manuscript.docx
    New:        data/Costa et al_FPS_2024 supplement.docx
    New:        data/Frontiers_Template.docx
    Deleted:    data/Genomic Selection for Drought Tolerance Using Genome-Wide SNPs in Maize.pdf
    New:        data/Supplementary_Material.docx

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/phenotype.Rmd) and HTML (docs/phenotype.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
html c64a991 Weverton Gomes 2023-10-27 add phenotype html
Rmd 286b492 Weverton Gomes 2023-10-27 Update Scripts and README
Rmd 90dc112 WevertonGomesCosta 2022-11-17 Update
Rmd 6cc4d23 WevertonGomesCosta 2022-11-17 Update
html 6cc4d23 WevertonGomesCosta 2022-11-17 Update
html d930880 WevertonGomesCosta 2022-11-11 Update
Rmd 5988c27 WevertonGomesCosta 2022-11-11 Update
html 5988c27 WevertonGomesCosta 2022-11-11 Update
Rmd bf7b1d3 WevertonGomesCosta 2022-11-11 Update
html bf7b1d3 WevertonGomesCosta 2022-11-11 Update

Libraries

Load Libraries

library(kableExtra)
library(tidyverse)
#if (!require("BiocManager", quietly = TRUE))
#    install.packages("BiocManager")
#BiocManager::install("ComplexHeatmap")
require(ComplexHeatmap)
library(data.table)
library(readxl)
library(metan)
library(DataExplorer)
library(ggthemes)
library(GGally)
theme_set(theme_bw())

Data import and manipulation

Let’s import the phenotypic dataset, excluding the traits without information and the traits Local (redundant with Year) and Treatment (only one observation).

pheno <- read_excel("data/Phenotyping2.xlsx", na = "NA") %>%
  select_if(~ !all(is.na(.))) %>%  # Deleting traits without information
  select(-c("Local", "Tratamento"))

We will perform some manipulations to adjust our database and to facilitate the visualization of the exploratory analysis.

First, let’s convert the traits that are character into factors. Then we will convert the traits that refer to the grades to integers and then into factors. After that, let’s create the trait ANo.Bloco for nesting in the model to obtain the BLUPs.

pheno <- pheno %>%
  mutate(
    Clone = as.factor(Clone),
    Ano = as.factor(Ano),
    Bloco = as.factor(Bloco),
    row = as.factor(row),
    col = as.factor(col)
  )   # Convert Clone, Ano, row, col and Bloco in factors

Exploratory Data Analysis

Introductory analysis of the entire dataset

introduce(pheno) %>%
  t() %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
rows 2336
columns 29
discrete_columns 6
continuous_columns 23
all_missing_columns 0
total_missing_values 16875
complete_rows 440
total_observations 67744
memory_usage 534240

We don’t have any columns that have all of the missing observations, but we do have a lot of missing values in every dataset. Some manipulations should be performed to improve the quality of the data.

Year Analysis

Let’s produce a heatmap to check the clone amount each year. I’m going to create another dataset with the Year and Clone count. Then I will create the objects corresponding to the clones and years array. Finally, I created the matrix that represents the presence and absence of the clone in the year.

pheno2 <- pheno %>%
  count(Ano, Clone)

genmat <- model.matrix( ~ -1 + Clone, data = pheno2)
envmat <- model.matrix( ~ -1 + Ano, data = pheno2)
genenvmat <- t(envmat) %*% genmat
genenvmat_ch <- ifelse(genenvmat == 1, "Present", "Abscent")

Heatmap(
  genenvmat_ch,
  col = c("white", "tomato"),
  show_column_names = F,
  heatmap_legend_param = list(title = ""),
  column_title = "Genotypes",
  row_title = "Environments"
)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

From the heatmap, it is clear that the year 2016 has very few observations. So, we must eliminate it.

pheno <- pheno %>%
  filter(Ano != 2016) %>%
  droplevels()

Just for reference, let’s re-view the clone heatmap by year.

pheno2 <- pheno %>%
  count(Ano, Clone)

genmat = model.matrix(~ -1 + Clone, data = pheno2)
envmat = model.matrix(~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat
genenvmat_ch = ifelse(genenvmat == 1, "Present", "Abscent")

Heatmap(
  genenvmat_ch,
  col = c("white", "tomato"),
  show_column_names = F,
  heatmap_legend_param = list(title = ""),
  column_title = "Genotypes",
  row_title = "Environments"
)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

We can check how many clones we have in common between the years and also note that the years differ in the number of clones evaluated:

Table 1 - Year of evaluation and number of genotypes evaluated

pheno2 <- pheno %>%
  count(Ano, Clone)

genmat = model.matrix(~ -1 + Clone, data = pheno2)
envmat = model.matrix(~ -1 + Ano, data = pheno2)
genenvmat = t(envmat) %*% genmat

genenvmat %*% t(genenvmat) %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Ano2017 Ano2018 Ano2019 Ano2020
Ano2017 165 42 22 14
Ano2018 42 138 39 16
Ano2019 22 39 133 29
Ano2020 14 16 29 138
rm(pheno2, genmat, envmat, genenvmat, genenvmat_ch)

The year 2020 has a lower number of clones in common, however, we will keep it for the analysis.

Here, it is possible to observe that our dataset has clones that were evaluated in just one year. Let’s visualize this, to see how many clones were evaluated according to the number of years.

pheno %>%
  count(Ano, Clone) %>%
  count(Clone) %>%
  count(n) %>%
  kbl(
    escape = F,
    align = 'c',
    col.names = c("N of Environment", "N of genotypes")
  ) %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Storing counts in `nn`, as `n` already present in input
ℹ Use `name = "new_name"` to pick a new name.
N of Environment N of genotypes
1 350
2 72
3 20
4 5

Only 5 clones were evaluated in all years, this will possibly decrease our model accuracy.

Another factor that reduces the accuracy, and therefore adopting mixed models via REML in the analysis is the most suitable for obtaining BLUPs.

Analysis of traits

Inicialmente, iremos verificar as estatísticas descritivas para cada coluna.

summary(pheno) %>%
  t() %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Clone BGM-0164: 16 BGM-0170: 16 BGM-0396: 16 BGM-1243: 16 BGM-1267: 16 BGM-0134: 12 (Other) :2204
Ano 2017:660 2018:552 2019:532 2020:552 NA NA NA
Bloco 1:574 2:574 3:574 4:574 NA NA NA
row 11 : 121 3 : 120 10 : 120 2 : 119 6 : 118 4 : 117 (Other):1581
col 6 : 131 4 : 130 8 : 130 2 : 128 5 : 128 9 : 128 (Other):1521
N_Roots Min. : 0.125 1st Qu.: 2.333 Median : 4.000 Mean : 4.293 3rd Qu.: 6.000 Max. :15.667 NA’s :322
FRY Min. : 0.116 1st Qu.: 1.857 Median : 3.750 Mean : 4.946 3rd Qu.: 6.857 Max. :22.200 NA’s :335
ShY Min. : 0.694 1st Qu.: 6.944 Median :11.167 Mean :14.228 3rd Qu.:18.714 Max. :61.167 NA’s :211
DMC Min. :11.98 1st Qu.:25.00 Median :28.80 Mean :29.06 3rd Qu.:32.78 Max. :48.34 NA’s :968
StY Min. :0.0154 1st Qu.:0.5494 Median :1.2068 Mean :1.5164 3rd Qu.:2.1092 Max. :8.8669 NA’s :978
Plant.Height Min. :0.3600 1st Qu.:0.9633 Median :1.1600 Mean :1.1919 3rd Qu.:1.3908 Max. :3.0333 NA’s :220
HI Min. : 1.574 1st Qu.:15.726 Median :23.345 Mean :24.556 3rd Qu.:31.884 Max. :71.967 NA’s :341
StC Min. : 7.326 1st Qu.:20.350 Median :24.147 Mean :24.419 3rd Qu.:28.160 Max. :43.686 NA’s :968
PltArc Min. :1.000 1st Qu.:1.000 Median :2.000 Mean :1.996 3rd Qu.:3.000 Max. :5.000 NA’s :678
Leaf.Ret Min. :0.3333 1st Qu.:1.0000 Median :1.3333 Mean :1.7796 3rd Qu.:2.0000 Max. :5.0000 NA’s :216
Root.Le Min. : 7.00 1st Qu.:19.00 Median :23.00 Mean :23.21 3rd Qu.:27.00 Max. :47.33 NA’s :339
Root.Di Min. : 6.12 1st Qu.:23.36 Median :28.17 Mean :28.88 3rd Qu.:33.50 Max. :63.30 NA’s :339
Stem.D Min. :1.013 1st Qu.:1.859 Median :2.101 Mean :2.112 3rd Qu.:2.362 Max. :4.373 NA’s :224
Mite Min. :1.000 1st Qu.:3.000 Median :3.667 Mean :3.475 3rd Qu.:4.000 Max. :5.000 NA’s :222
Incidence_Mites Min. :1 1st Qu.:1 Median :1 Mean :1 3rd Qu.:1 Max. :1 NA’s :674
Nstem.Plant Min. :1.000 1st Qu.:1.333 Median :2.000 Mean :2.131 3rd Qu.:2.667 Max. :6.667 NA’s :850
Stand6MAP Min. :1.000 1st Qu.:4.000 Median :6.000 Mean :5.202 3rd Qu.:7.000 Max. :8.000 NA’s :849
Branching Min. :0.0000 1st Qu.:0.0000 Median :0.3333 Mean :0.5911 3rd Qu.:1.0000 Max. :4.0000 NA’s :673
Staygreen Min. :1.000 1st Qu.:1.000 Median :1.000 Mean :1.292 3rd Qu.:2.000 Max. :3.000 NA’s :669
Vigor Mode:logical TRUE:1075 NA’s:1221 NA NA NA NA
Flowering Min. :0.0000 1st Qu.:0.0000 Median :0.0000 Mean :0.0074 3rd Qu.:0.0000 Max. :1.0000 NA’s :673
Canopy_Lenght Min. : 10.00 1st Qu.: 54.33 Median : 72.00 Mean : 73.35 3rd Qu.: 91.67 Max. :240.17 NA’s :1458
Canopy_Width Min. : 7.50 1st Qu.: 56.33 Median : 75.58 Mean : 76.68 3rd Qu.: 96.00 Max. :213.33 NA’s :1458
Leaf_Lenght Min. : 3.750 1st Qu.: 8.417 Median :11.250 Mean :11.676 3rd Qu.:14.200 Max. :32.667 NA’s :1459

We have a high missing value ratio for Vigor, Leaf_Lenght, Canopy_Width and Canopy_Lenght, I’ll exclude those too. Mite Incidence and Flowering have little information for some levels and many NA’s, we will also exclude these traits.

pheno <- pheno %>%
  select(-c(
    Incidence_Mites,
    Vigor,
    Flowering,
    Leaf_Lenght,
    Canopy_Width,
    Canopy_Lenght
  ))

Let’s just look at the missing values now, to check the proportions.

plot_missing(pheno)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

StC, DMC and StY still do not have good quality, but we will maintain these traits, as they are important to us.

Let’s check the distribution of traits by year now and let’s look at the histograms of the quantitative traits:

plot_histogram(pheno, ncol = 6)

Version Author Date
bf7b1d3 WevertonGomesCosta 2022-11-11

For Branching, Leaf.Ret, Mite, PltArc, Stand6MAP and Staygreen don’t have normal distribution. To get the BLUPs we will have to remove that traits from the database.

pheno <- pheno %>%
  select(-c(Branching, Leaf.Ret, Mite, PltArc, Stand6MAP, Staygreen))

Analisys of Clone

First, let’s check the amount of missing values for each clone by year. We are filtering the Clones with the average bigger than 2 missing values by year.

pheno2 <- pheno %>%
  select(-Bloco) %>%
  group_by(Clone, Ano) %>%
  summarise_all(.funs = list(~ sum(is.na(.)))) %>%
  ungroup() %>%
  select_numeric_cols() %>%
  mutate(mean = rowMeans(.), Clone.Ano = unique(interaction(pheno$Clone, pheno$Ano))) %>%
  filter(mean > 2) %>%
  droplevels()

nlevels(pheno2$Clone.Ano) %>%
  kbl(
    escape = F,
    align = 'c',
    col.names = c("N of genotypes")
  ) %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
N of genotypes
40
rm(pheno2)

76 clones presented many missing values, i.e., they were evaluated in less than two blocks by year. Therefore, they should be excluded from future analyses, according to the year.

Let’s evaluate the descriptive statistics of the combination between clone and year for the traits.

ge_details(pheno, Ano, Clone, resp = everything()) %>%
  t() %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Parameters Mean SE SD CV Min Max MinENV MaxENV MinGEN MaxGEN
N_Roots 4.29 0.06 2.51 58.56 0.12 (BGM-1031 in 2017) 15.67 (2012-107-002 in 2019) 2018 (1.62) 2019 (5.76) BGM-0411 (0.33) 2012-107-002 (11.33)
FRY 4.95 0.09 4.04 81.79 0.12 (BGM-0886 in 2017) 22.2 (BGM-1267 in 2018) 2017 (2.75) 2020 (6.52) BGM-1488 (0.34) IAC-14 (14.07)
ShY 14.23 0.22 10.16 71.45 0.69 (BGM-0996 in 2017) 61.17 (BGM-2124 in 2020) 2017 (8.47) 2020 (25.87) BGM-0048 (1.52) BGM-2124 (54.33)
DMC 29.06 0.17 6.1 21 11.98 (BGM-0626 in 2020) 48.34 (BGM-1015 in 2020) 2020 (26.21) 2018 (35.38) BGM-0626 (14.98) BGM-1015 (45.02)
StY 1.52 0.04 1.27 84.01 0.02 (BGM-0340 in 2019) 8.87 (BGM-0396 in 2018) 2019 (1.32) 2018 (1.84) BGM-0089 (0.06) BGM-1023 (4.76)
Plant.Height 1.19 0.01 0.33 27.43 0.36 (BGM-0426 in 2020) 3.03 (BR-11-24-156 in 2020) 2017 (1) 2019 (1.48) Jatobá (0.58) BGM-1200 (1.91)
HI 24.56 0.27 11.89 48.42 1.57 (BGM-1159 in 2017) 71.97 (BGM-1315 in 2018) 2020 (18.61) 2018 (32.36) BGM-0961 (2.5) Mata_Fome_Branca (52.78)
StC 24.42 0.17 6.11 25.05 7.33 (BGM-0626 in 2020) 43.69 (BGM-1015 in 2020) 2020 (21.56) 2018 (30.78) BGM-0626 (10.33) BGM-1015 (40.37)
Root.Le 23.21 0.13 5.8 24.99 7 (BGM-1574 in 2017) 47.33 (BGM-0396 in 2018) 2017 (19.72) 2019 (27.14) Jatobá (8.5) BGM-1956 (35.5)
Root.Di 28.88 0.18 7.9 27.35 6.12 (BGM-2142 in 2019) 63.3 (BRS Mulatinha in 2018) 2017 (24.49) 2018 (34.74) BGM-0089 (12.14) BGM-1956 (48.91)
Stem.D 2.11 0.01 0.38 17.9 1.01 (BGM-0592 in 2018) 4.37 (BRS Tapioqueira in 2020) 2018 (2.02) 2017 (2.16) BGM-0048 (1.25) BGM-1523 (2.93)
Nstem.Plant 2.13 0.03 0.95 44.53 1 (BGM-0036 in 2018) 6.67 (BGM-0714 in 2019) 2018 (1.44) 2019 (2.71) BGM-0066 (1) BGM-0451 (4.44)

Apparently we no longer have a genotype that could harm our analysis. Now we must evaluate the clone-only descriptive statistics for the traits. Some traits presented hight cv, StY and FRY, but we will go continue.

desc_stat(pheno,
          by = Ano,
          na.rm = TRUE,
          stats = "cv") %>%
  na.omit() %>%
  arrange(desc(cv)) %>%
  pivot_wider(names_from = "Ano", values_from = "cv") %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
variable 2018 2020 2019 2017
StY 91.8742 81.2220 72.3243 NA
FRY 86.0073 68.4455 70.0326 66.8379
ShY 69.0625 42.7475 47.4901 45.4973
N_Roots 63.2913 47.0839 47.2357 48.8708
HI 41.5442 43.3204 41.8572 48.2942
Nstem.Plant 33.3849 37.1507 37.3740 NA
Plant.Height 28.0409 24.0927 19.1125 21.0398
StC 15.6937 27.3600 16.5407 NA
Root.Le 27.1625 18.9730 21.1089 21.5823
Root.Di 23.3943 20.5858 24.3155 21.0039
DMC 13.6086 22.5059 13.8212 NA
Stem.D 22.3941 17.5042 15.7037 16.7774

Again, some traits were not computed for the year 2017, so we have to eliminate that year when performing the analysis for these traits.

What draws attention in this table are the high cv for some traits, especially: HI, Nstem.Plant, N_Roots, ShY, StY and FRY.

This may be due to the presence of outliers, let’s inspect the entire dataset to assess whether there are outliers:

inspect(pheno %>%
          select_if(~ !is.factor(.)), verbose = FALSE) %>%
  arrange(desc(Outlier)) %>%
  kbl(escape = F, align = 'c') %>%
  kable_classic(
    "hover",
    full_width = F,
    position = "center",
    fixed_thead = T
  )
Variable Class Missing Levels Valid_n Min Median Max Outlier Text
ShY numeric Yes
2085 0.69 11.17 61.17 89 NA
FRY numeric Yes
1961 0.12 3.75 22.20 71 NA
StY numeric Yes
1318 0.02 1.21 8.87 49 NA
Root.Di numeric Yes
1957 6.12 28.17 63.30 32 NA
Nstem.Plant numeric Yes
1446 1.00 2.00 6.67 30 NA
Plant.Height numeric Yes
2076 0.36 1.16 3.03 23 NA
Stem.D numeric Yes
2072 1.01 2.10 4.37 22 NA
Root.Le numeric Yes
1957 7.00 23.00 47.33 19 NA
N_Roots numeric Yes
1974 0.12 4.00 15.67 16 NA
HI numeric Yes
1955 1.57 23.35 71.97 15 NA
DMC numeric Yes
1328 11.98 28.80 48.34 6 NA
StC numeric Yes
1328 7.33 24.15 43.69 6 NA

Confirming what was described before, most traits with high cv have many outliers and therefore we will exclude them in the loop to obtain the blups.

General Inspection

Now let’s just perform a general inspection of the data to finish the manipulations.

corr_plot(pheno, col.by = Ano)

StC with DMC and StY with FRY show high correlation.

Furthermore most of the traits apparently show normal distribution of phenotypic data. So let’s save the clean data and move on to getting the blups.

write.csv(pheno,
          "data/pheno_clean.csv",
          row.names = F,
          quote = F)

pheno_mean_sd <- desc_stat(pheno, stats = c("mean, min, max, cv"))

write.csv(pheno_mean_sd,
          "output/pheno_mean_sd.csv",
          row.names = F,
          quote = F)

Proceed with Genotype-environment analysis using mixed-effect models to obtain BLUPs (Best Linear Unbiased Predictors) and further interpret the results to identify superior genotypes.


sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8   
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.utf8    

time zone: America/Sao_Paulo
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] GGally_2.1.2          ggthemes_4.2.4        DataExplorer_0.8.2   
 [4] metan_1.18.0          readxl_1.4.3          data.table_1.14.8    
 [7] ComplexHeatmap_2.16.0 lubridate_1.9.2       forcats_1.0.0        
[10] stringr_1.5.0         dplyr_1.1.2           purrr_1.0.2          
[13] readr_2.1.4           tidyr_1.3.0           tibble_3.2.1         
[16] ggplot2_3.4.3         tidyverse_2.0.0       kableExtra_1.3.4     

loaded via a namespace (and not attached):
  [1] gridExtra_2.3       rlang_1.1.1         magrittr_2.0.3     
  [4] clue_0.3-64         GetoptLong_1.0.5    git2r_0.32.0       
  [7] matrixStats_1.0.0   compiler_4.3.1      png_0.1-8          
 [10] systemfonts_1.0.4   vctrs_0.6.3         rvest_1.0.3        
 [13] pkgconfig_2.0.3     shape_1.4.6         crayon_1.5.2       
 [16] fastmap_1.1.1       labeling_0.4.2      utf8_1.2.3         
 [19] promises_1.2.1      rmarkdown_2.24      tzdb_0.4.0         
 [22] nloptr_2.0.3        xfun_0.40           cachem_1.0.8       
 [25] jsonlite_1.8.7      highr_0.10          later_1.3.1        
 [28] reshape_0.8.9       tweenr_2.0.2        parallel_4.3.1     
 [31] cluster_2.1.4       R6_2.5.1            bslib_0.5.1        
 [34] stringi_1.7.12      RColorBrewer_1.1-3  boot_1.3-28.1      
 [37] numDeriv_2016.8-1.1 jquerylib_0.1.4     cellranger_1.1.0   
 [40] Rcpp_1.0.11         iterators_1.0.14    knitr_1.43         
 [43] IRanges_2.34.1      igraph_1.5.1        Matrix_1.6-1       
 [46] httpuv_1.6.11       splines_4.3.1       timechange_0.2.0   
 [49] tidyselect_1.2.0    rstudioapi_0.15.0   yaml_2.3.7         
 [52] doParallel_1.0.17   codetools_0.2-19    lmerTest_3.1-3     
 [55] lattice_0.21-8      plyr_1.8.8          withr_2.5.2        
 [58] evaluate_0.22       polyclip_1.10-4     xml2_1.3.5         
 [61] circlize_0.4.15     pillar_1.9.0        whisker_0.4.1      
 [64] foreach_1.5.2       stats4_4.3.1        generics_0.1.3     
 [67] rprojroot_2.0.3     mathjaxr_1.6-0      S4Vectors_0.38.1   
 [70] hms_1.1.3           munsell_0.5.0       scales_1.2.1       
 [73] minqa_1.2.6         glue_1.6.2          tools_4.3.1        
 [76] lme4_1.1-34         webshot_0.5.5       fs_1.6.3           
 [79] colorspace_2.1-0    networkD3_0.4       nlme_3.1-163       
 [82] patchwork_1.1.3     ggforce_0.4.1       cli_3.6.1          
 [85] workflowr_1.7.1     fansi_1.0.4         viridisLite_0.4.2  
 [88] svglite_2.1.1       gtable_0.3.4        sass_0.4.7         
 [91] digest_0.6.33       BiocGenerics_0.46.0 ggrepel_0.9.3      
 [94] htmlwidgets_1.6.2   rjson_0.2.21        farver_2.1.1       
 [97] htmltools_0.5.6     lifecycle_1.0.3     httr_1.4.7         
[100] GlobalOptions_0.1.2 MASS_7.3-60        

  1. Weverton Gomes da Costa, Pós-Doutorando, Embrapa Mandioca e Fruticultura, ↩︎