Smooth normative brain mapping of 3-dimensional morphometry imaging data using skew-normal regression
Author
Marco Palma
Published
January 31, 2025
This is the code for the paper “Smooth normative brain mapping of 3-dimensional morphometry imaging data using skew-normal regression” by Palma et al., available as a preprint on arxiv.
We import the mask file. Then, we take the MDT (minimal deformation template, the common template for the TBM images), we apply the mask on it and drop the empty dimensions of the array (i.e., those containing all zeros).
Code
mask <- here::here("smoothed_mask_125.nii.gz") %>%readNIfTI(fname = .)fn = here::here("ADNI_ICBM9P_mni_4step_MDT.img")dimscan <-c(220,220,220)img_template <-array(readBin(fn, what ="int", n =prod(dimscan), size =2, signed =FALSE), dim = dimscan)template_resized <-mask_img(img_template, mask, allow.NA =TRUE) %>%dropEmptyImageDimensions(., value =0, threshold =0, reorient =FALSE) dims_mask <-dim(template_resized)voxel_active <-which(as.vector(template_resized) !=0)LVA <-length(voxel_active)
The dimensions for the images are now 151, 189, 145 and the number of voxels within the mask is 2019941.
Loading the TBM images
Now it is time to load the TBM images (we recommend to do it on a cluster): you can run the script import_masked_images.R to apply the mask on each image and return a vector with TBM values for the voxels within the mask. Alternatively, you can run the script project_TBM_fast.R to return the coefficients of a basis representation (e.g. B-splines) for each image. Please note: you need to specify the path to the mask and TBM images, and for the project_TBM_fast.R you need to provide as input a design matrix with the basis functions (which you can build using this function on my package neurofundata).
Create training set
We will fit our models on a subset of the imaging dataset. First, we open the ADNIMERGE dataset and we select the 817 subjects for which we have the TBM images.
We build a regular grid of voxels within the 3D box containing the mask.
Code
grid_dist <-8grid_list <-list(unique(c(seq(1, dims_mask[1], by = grid_dist), dims_mask[1])),unique(c(seq(1, dims_mask[2], by = grid_dist), dims_mask[2])),unique(c(seq(1, dims_mask[3], by = grid_dist), dims_mask[3])))allgrid <-expand.grid(x =seq_len(dims_mask[1]), y =seq_len(dims_mask[2]), z =seq_len(dims_mask[3]),KEEP.OUT.ATTRS =FALSE)knot_loc <-which(allgrid$x %in% grid_list[[1]] & allgrid$y %in% grid_list[[2]] & allgrid$z %in% grid_list[[3]])allgrid$knots <-0allgrid$knots[knot_loc] <-1
The spacing between consecutive knots is prespecified to be equal to 8 mm in each dimension.
Workflow of the analysis
We are now ready to perform the skew-normal modelling at the voxelwise level. In the diagramme below, (V) means that the content of the block applies to all \(V\) voxels in the mask, while (V*) refers to steps applied only on the \(V^*\) voxels in the grid.
First, we import the TBM vectors and we select the voxels in the grid for the subjects in the training set, as well as the corresponding covariates.
At those voxels, we fit the skew-normal regression using age and gender (and their interaction) as covariates. Please note: the code below fits sequentially a skew-normal regression model for each voxel on the grid, therefore it will take a few minutes. The output of the regressions is a set of three parameters (location, scale, skewness) in CP (centered parameterisation, see SN package) at each knot on the grid.
We selected two voxels which show respectively the lowest and the highest sample standard deviation (see file RawData.Rmd for additional exploratory analysis)
To obtain a 3D version of radial basis functions, we have built a 3D version of the plane function from the package FRK. Although the structure of the following function is very close to the corresponding function on the package, please reuse it with caution.
Code
setClass("plane3D",contains="manifold")plane3D <-function(measure=Euclid_dist(dim=3L)) {stopifnot(dimensions(measure)==3L)new("plane3D",measure=measure)}## Constructor for planemeasure <-function(dist,dim) {## Basic checksif(!is.function(dist))stop("dist needs to be a function that accepts dim arguments")if(!(is.numeric(dim) |is.integer(dim)))stop("dim needs to be an integer, generally 1L, 2L or 3L") dim =as.integer(dim) # coerce to integer if numericnew("measure",dist=dist,dim=dim) # init. object}Euclid_dist <-function(dim=2L) {## Euclidean distancestopifnot(is.integer(dim)) # dimension needs to be integernew("measure", # init. measure object with the distR functiondist=function(x1,x2) distR(x1,x2), dim=dim)}setMethod("initialize",signature="plane3D",function(.Object,measure=Euclid_dist(dim=3L)) { .Object@type <-"plane3D"# set type .Object@measure <- measure # set measurecallNextMethod(.Object)}).GRBF_wrapper <-function(manifold,mu,std) {.check_basis_function_args(manifold,mu,std)function(s) {stopifnot(ncol(s) ==dimensions(manifold)) # Internal checking dist_sq <-distance(manifold,s,mu)^2exp(-0.5* dist_sq/(std^2) ) }}.MQ_wrapper <-function(manifold,mu,std) {.check_basis_function_args(manifold,mu,std)function(s) {stopifnot(ncol(s) ==dimensions(manifold)) # Internal checking dist_sq <-distance(manifold,s,mu)^21/sqrt(1+ (dist_sq*std)^2) ###(dist_sq^2 + std^2)^1.03 }}.check_basis_function_args <-function(manifold,loc,scale) {if(!is.matrix(loc))stop("Basis functions need to be evaluated using locations inside a matrix")if(!(dimensions(manifold) ==ncol(loc)))stop("Incorrect number of columns for the argument loc")if(!is.numeric(scale))stop("The scale parmaeter needs to be numeric")if(!(scale >0))stop("The scale parameter needs to be greater than zero")}local_basis <-function(manifold=sphere(), # default manifold is sphereloc=matrix(c(1,0),nrow=1), # one centroid at (1,0)scale=1, # std = 1, and Gaussian RBFtype=c("bisquare","Gaussian","exp","Matern32", "multiquadric")) {## Basic checksif(!is.matrix(loc)) stop("loc needs to be a matrix")if(!(dimensions(manifold) ==ncol(loc))) stop("number of columns in loc needs to be the same as the number of manifold dimensions")if(!(length(scale) ==nrow(loc))) stop("need to have as many scale parameters as centroids") type <-match.arg(type) n <-nrow(loc) # n is the number of centroidscolnames(loc) <-c(outer("loc",1:ncol(loc), # label dimensions as loc1,loc2,...,locnFUN = paste0)) fn <- pars <-list() # initialise lists of functions and parametersfor (i in1:n) {## Assign the functions as determined by userif(type=="bisquare") { fn[[i]] <-.bisquare_wrapper(manifold,matrix(loc[i,],nrow=1),scale[i]) } elseif (type=="Gaussian") { fn[[i]] <-.GRBF_wrapper(manifold,matrix(loc[i,],nrow=1),scale[i]) } elseif (type=="exp") { fn[[i]] <-.exp_wrapper(manifold,matrix(loc[i,],nrow=1),scale[i]) } elseif (type=="Matern32") { fn[[i]] <-.Matern32_wrapper(manifold,matrix(loc[i,],nrow=1),scale[i]) } elseif (type=="multiquadric") { fn[[i]] <-.MQ_wrapper(manifold,matrix(loc[i,],nrow=1),scale[i]) }## Save the parameters to the parameter list (loc and scale) pars[[i]] <-list(loc =matrix(loc[i,],nrow=1), scale=scale[i]) }## Create a data frame which summarises info about the functions. Set resolution = 1 df <-data.frame(loc,scale,res=1)## Create new basis function, using the manifold, n, functions, parameters list, and data frame. this_basis <-Basis(manifold = manifold, n = n, fn = fn, pars = pars, df = df)return(this_basis)}
Now we create the local basis functions centered at the knots of the grid which fall within the mask. The scale parameter is set to be smaller than the distance between adjacent knots. The design matrix with the radial basis functions is saved.
Code
basis_locs <-data.frame(allgrid_CN) %>%filter(mask*knots >0) %>% dplyr::select(x,y,z) %>%as.matrix() G <-local_basis(manifold =plane3D(),loc=basis_locs,scale=rep(grid_dist*2/3,nrow(basis_locs)),type="Gaussian")coords<-filter(allgrid_CN, mask>0) %>% dplyr::select(x,y,z) %>%as.matrix(.)fun1 <-function(xx, maxdist){update_progress(pb) ii <- FRK::distR(coords, matrix(xx, nrow =1)) %>% magrittr::is_less_than(maxdist) %>%which(.) ji <-which(apply(basis_locs, 1, function(x) identical(x, xx))) %>%rep(., length(ii)) xi <- coords[ii, ] %>%##row index of xx in basis_locs G@fn[[unique(ji)]](.) %>%as.vector(.) return(list(i=ii, p =length(ii) , x = xi)) #j=ji}pb <-progress_estimated(nrow(basis_locs))system.time( radial_design_mat <-apply(basis_locs, 1, fun1, maxdist = grid_dist*2) %>>% {sparseMatrix(x=unlist(lapply(.,"[[","x")), i=unlist(lapply(.,"[[","i")),p=c(0,cumsum(sapply(.,function(x){x$p}))),dims=c(nrow(coords), length(.)),index1=TRUE)})save(radial_design_mat, file =paste0("RBF_", grid_dist, "mm.RData"))
Parameter functions
Then, we use glmnet to get an intercept, scale and skewness parameter for all the voxels outside the grid, thus covering the entire images. In other words, we are interpolating to obtain parameter functions across the rest of the brain images.
We report here the code to obtain the parameter functions also in the “full” case where all voxels are used (not only the ones in the grid). We do not run the code here as it takes several hours. The computation is run in parallel using the package parabar.
Code
fit2 <-function(col1, data = ADNI_covariates){ {library(dplyr); library(magrittr); library(sn)}model.matrix(~ AGEminus70*PTGENDER, data = data) %>%sn.mple(x = .,y = col1,opt.method="nlminb", penalty=NULL) %>% magrittr::extract2("cp")}library(parallel)library(parabar)n.cores <-detectCores() -2print("Start fitskew.")backend <-start_backend(cores = n.cores, cluster_type ="psock", backend_type ="async")set_option("progress_track", TRUE)configure_bar(type ="modern", format ="[:bar] :percent") # format = "> completed :current out of :total tasks [:percent] [:elapsed]")export(backend, c("tbm_data_FBM_knots", "ADNI_covariates"), environment())peek(backend)fitskew_time_FULL <-system.time( params_gridmat_parabar <-par_apply(backend, tbm_data_FBM_knots[], 2, fit2) %>%t(.) %>%data.frame())# fitskew_time_FULLstop_backend(backend)
We can plot the association of the covariates with the mean of the TBM value at each voxel. In particular, we can look at the intercept (that can be interpreted as the mean for the reference category, that is, a 70-year-old female CN subject.
The figure below shows the sex main effect in the normative population (i.e. the average difference in TBM values between CN males and CN females at the same age).
The figure below shows the age-sex interaction in the normative population (i.e. the change in TBM values corresponding to a 1-year age increase for CN males).
Let us now look at the left-hand side of the workflow.
For each TBM image, we can now compute the z-values on the voxels on the grid and then interpolate on the rest of the images. Then, we can compute indices of abnormality for each image. This can be done for each image (in the training as well as the test set, either cognitively normal or with diseases) with the script produce_zmaps_covariates.R (we recommend to do it on a cluster). Then the indices are joined in a single table.
Indices of abnormality
We compute a lot of indices: the mean of the z-values, their median, their range, their variance etc… You could also take an extreme block (say, the 99.99-th percentile) and compute the mean of that. You would expect larger extremes to be associated with departure from the normative population, possibly linked with some cognitive impairment.
The figures below show boxplots of one specific index of deviation by diagnosis. MCI and AD groups show a higher median with respect to the CN group. The score is also higher for MCI+AD on average across ADAS13 values and in the age range considered.