Background

In Part 1, we learned about pretraining for the lasso, as it applies to “input grouped” data – data where each observation belongs to one of \(k\) distinct groups (e.g. cancer classes). With simulated data, we saw how to fit pretrained models in this context using the R package ptLasso.

Now, we will show how to do pretraining for three different settings:

  1. data with a multinomial response
  2. multi-response data, and
  3. time series data.

For all of these examples, we’ll use the ptLasso package - so let’s load it now.

require(ptLasso)

In all cases, our modeling pipeline will look like this:

# Data: features X, response y,
# and "useCase" (targetGroups, multiresponse or timeSeries)

# Fit a model:
fit = ptLasso(X, y, use.case = "useCase")
# ...optionally using cross-validation for parameter selection:
fit = cv.ptLasso(X, y, use.case = "useCase")

# Visualize the model:
plot(fit)

# And make predictions:
predict(fit, Xtest)

# And make predictions:
coef(fit, Xtest)

Multinomial response data

  1. Fit an overall model:

      Fit a multinomial model using a group lasso penalty on the coefficients \(\beta\).

      The group lasso penalty forces all responses to have the same support, but allows different coefficient values.

  1. Fit group-specific models:

      Train individual one-vs-rest models using the penalty factor and offset defined by the overall model (as in the input grouped setting).

      This step allows us to discover features that are specific to each class.

Example with simulated data

First, simulate data:

set.seed(1234)

n = 900  # number of observations
p = 100   # number of features
k = 3    # number of classes
class.sizes = rep(n/k, k) # size of each class
ncommon = 15  # number of shared features
nindiv = 10    # number of individual features
base.seq = seq(-.5, .5, length.out = ncommon + 1)
shift.common = matrix(
  c(sample(base.seq, ncommon), sample(base.seq, ncommon), sample(base.seq, ncommon)),
  nrow = 3
)
base.seq = seq(-.1, .1, length.out = nindiv + 1)
shift.indiv = matrix(
  c(sample(base.seq, nindiv), sample(base.seq, nindiv), sample(base.seq, nindiv)),
  nrow = 3
)
x     = matrix(rnorm(n * p), n, p)
xtest = matrix(rnorm(n * p), n, p)
y = ytest = c(sapply(1:length(class.sizes), function(i) rep(i, class.sizes[i])))

# Simulate our data.
# Shift x according to our coefficients, which differ by class.
start = ncommon + 1
for (i in 1:k) {
  end = start + nindiv - 1
  
  for(j in 1:ncommon){
    x[y == i, j] = x[y == i, j] + shift.common[i, j]
    xtest[ytest == i, j] = xtest[ytest == i, j] + shift.common[i, j]
  }
  jj = 1
  for(j in start:end){
    x[y == i, j] = x[y == i, j] + shift.indiv[i, jj]
    xtest[ytest == i, j] = xtest[ytest == i, j] + shift.indiv[i, jj]
    jj = jj + 1
  }
  
  start = end + 1
}

Model fitting and visualization

Now we are ready to model.

By default, ptLasso uses type.measure = "deviance", but for ease of interpretability, we use type.measure = "class" (the misclassification rate).

ptLasso uses cv.glmnet to fit \(7\) models:

  • the overall model (one multinomial model trained using all \(3\) groups),
  • the 3 pretrained models (a one-vs-rest model for each group) and
  • the 3 individual models (a one-vs-rest model for each group).
fit = ptLasso(x = x, y = y, 
              use.case = "targetGroups", 
              type.measure = "class",
              alpha = 0.5)

predict(fit, xtest, ytest = ytest)
## 
## Call:  
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.5 
## 
## Performance (Misclassification error):
## 
##            overall   mean group_1 group_2 group_3
## Overall     0.3322                               
## Pretrain    0.3367 0.2267  0.1956  0.2633  0.2211
## Individual  0.3356 0.2337  0.2011  0.2633  0.2367
## 
## Support size:
##                                          
## Overall    28                            
## Pretrain   70 (16 common + 54 individual)
## Individual 50

Now, our performance metrics include:

  • the “overall” misclassification rate, computed by treating this as a multinomial problem, and
  • the “mean” and group-specific misclassification rates, computed by treating this as a set of one-vs-rest problems.

We can plot each of the models fitted:

plot(fit)

And extract coefficients:

coefs = coef(fit)

Choosing \(\alpha\) with cross-validation

To choose the pretraining hyperparameter \(\alpha\), we can use cross-validation. Plotting and predicting are done using the same syntax as before.

set.seed(2345)
cvfit = cv.ptLasso(x = x, y = y, 
              use.case = "targetGroups", 
              type.measure = "class")

plot(cvfit)

predict(cvfit, xtest, ytest = ytest)
## 
## Call:  
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.9 
## 
## Performance (Misclassification error):
## 
##            overall   mean group_1 group_2 group_3
## Overall     0.3344                               
## Pretrain    0.3367 0.2341  0.2067  0.2589  0.2367
## Individual  0.3489 0.2348  0.2089  0.2611  0.2344
## 
## Support size:
##                                          
## Overall    34                            
## Pretrain   33 (14 common + 19 individual)
## Individual 76

Multi-response data

  1. Fit an overall model:

      Fit a multi-response model using a group lasso penalty on the coefficients \(\beta\).

  1. Fit response-specific models:

      Train individual models for each response, using the penalty factor and offset defined by the overall model.

Example with simulated data

We start by simulating data with two Gaussian responses. The first 5 features are predictive for both responses; there is an extra set of 5 coefficients that are predictive for each individual response.

set.seed(1234)

# Define constants
n = 1000         # Total number of samples
ntrain = 650     # Number of training samples
p = 500          # Number of features
sigma = 2        # Standard deviation of noise
     

# Generate covariate matrix
x = matrix(rnorm(n * p), n, p)

# Define coefficients for responses 1 and 2
beta1 = c(rep(1, 5), rep(0.5, 5), rep(0, p - 10))
beta2 = c(rep(1, 5), rep(0, 5), rep(0.5, 5), rep(0, p - 15))

mu = cbind(x %*% beta1, x %*% beta2)
y  = cbind(mu[, 1] + sigma * rnorm(n), 
           mu[, 2] + sigma * rnorm(n))

# Split into train and test
xtest = x[-(1:ntrain), ]
ytest = y[-(1:ntrain), ]

x = x[1:ntrain, ]
y = y[1:ntrain, ]

cat("Correlation between two tasks:", cor(y[, 1], y[, 2]))
## Correlation between two tasks: 0.5119585

Model fitting and visualization

fit = ptLasso(x = x, y = y, 
              use.case = "multiresponse",
              alpha = 0.5)

predict(fit, xtest, ytest = ytest)
## 
## Call:  
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.5 
## 
## Performance (Mean squared error):
## 
##            allResponses  mean response_1 response_2
## Overall           9.217 4.608      4.092      5.125
## Pretrain          9.072 4.536      4.132      4.940
## Individual        9.324 4.662      4.168      5.157
## 
## Support size:
##                                         
## Overall    57                           
## Pretrain   23 (20 common + 3 individual)
## Individual 75
plot(fit)

Choosing \(\alpha\) with cross-validation

cvfit = cv.ptLasso(x = x, y = y, 
                   use.case = "multiresponse")

predict(cvfit, xtest, ytest = ytest)
## 
## Call:  
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.3 
## 
## Performance (Mean squared error):
## 
##            allResponses  mean response_1 response_2
## Overall           9.217 4.608      4.092      5.125
## Pretrain          9.006 4.503      4.149      4.857
## Individual        9.324 4.662      4.168      5.157
## 
## Support size:
##                                         
## Overall    57                           
## Pretrain   22 (20 common + 2 individual)
## Individual 75
plot(cvfit)

Multi-response data with time ordered columns

  1. Fit an initial model:

      Fit a lasso penalized model for the first time point.

  1. Fit the next time-specific model:

      Train a model for time 2 using the penalty factor and offset defined by previous model.

  1. Continue:

      Train a model for time 3 using the offset from time 2 and the union of supports from times 1 and 2.

      Train a model for time 4 using the offset from time 3 and the union of supports from times 1, 2 and 3.

      etc.

Example with simulated data

We will again start by simulating data; here we imagine data with two time points. These two time points have a shared support, with different coefficient values.

set.seed(1234)

# Define constants
n = 600          # Total number of samples
ntrain = 300     # Number of training samples
p = 100          # Number of features
sigma = 2        # Standard deviation of noise

# Generate covariate matrix
x = matrix(rnorm(n * p), n, p)

# Define coefficients for time points 1 and 2
beta1 = c(rnorm(10), rep(0, p - 10))  # Coefs at time 1
beta2 = 0.5 * beta1 + c(rep(0, 10), runif(5, min = 0, max = 2), rep(0, p-15))

# Generate response variables for times 1 and 2
y = cbind(
  x %*% beta1 + sigma * rnorm(n),
  x %*% beta2 + sigma * rnorm(n)
)

# Split data into training and testing sets
xtest = x[-(1:ntrain), ]  # Test covariates
ytest = y[-(1:ntrain), ]  # Test response

x = x[1:ntrain, ]  # Train covariates
y = y[1:ntrain, ]  # Train response

Model fitting and visualization

Modeling is just as before, only now we specify use.case = "timeSeries".

Notice that our performance metrics no longer include an “Overall” model – we imagine that the alternative here is to just treat each time point separately. For Gaussian responses, one could alternatively try use.case = "multiresponse". This ignores the time ordering of the responses, but is otherwise a valid approach to consider.

fit = ptLasso(x = x, y = y, 
              use.case = "timeSeries",
              alpha = 0.5)

plot(fit)

predict(fit, xtest, ytest = ytest)
## 
## Call:  
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.5 
## 
## Performance (Mean squared error):
## 
##             mean response_1 response_2
## Pretrain   4.706      4.554      4.859
## Individual 4.413      4.554      4.272
## 
## Support size:
##                                         
## Pretrain   35 (6 common + 29 individual)
## Individual 50

Choosing \(\alpha\) with cross-validation

set.seed(2345)
cvfit = cv.ptLasso(x = x, y = y, 
                   use.case = "timeSeries")

plot(cvfit)

predict(cvfit, xtest, ytest = ytest)
## 
## Call:  
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest) 
## 
## 
## 
## alpha =  0.1 
## 
## Performance (Mean squared error):
## 
##             mean response_1 response_2
## Pretrain   4.343      4.573      4.112
## Individual 4.502      4.573      4.431
## 
## Support size:
##                                         
## Pretrain   37 (8 common + 29 individual)
## Individual 33

Wrap-up

Now we know how to do pretraining for all of the settings covered natively by ptLasso.

If you want to see more example use cases of pretraining – including conditional average treatment effect estimation and multi-response data with different outcome types – check out the ptLasso vignette!