In Part 1, we learned about pretraining for the lasso, as it applies
to “input grouped” data – data where each observation belongs to one of
\(k\) distinct groups (e.g. cancer
classes). With simulated data, we saw how to fit pretrained models in
this context using the R package ptLasso.
Now, we will show how to do pretraining for three different settings:
For all of these examples, we’ll use the ptLasso package
- so let’s load it now.
require(ptLasso)
In all cases, our modeling pipeline will look like this:
# Data: features X, response y,
# and "useCase" (targetGroups, multiresponse or timeSeries)
# Fit a model:
fit = ptLasso(X, y, use.case = "useCase")
# ...optionally using cross-validation for parameter selection:
fit = cv.ptLasso(X, y, use.case = "useCase")
# Visualize the model:
plot(fit)
# And make predictions:
predict(fit, Xtest)
# And make predictions:
coef(fit, Xtest)
Multinomial response data has a categorical response: each response is in one of three or more categories. For example, observations may be from people with cancer, and our goal can be to predict the type of cancer.
The pretraining philosophy for modeling data with a multinomial response is analogous to that for input grouped data.
We wish to learn across all classes and learn a specific model for each class.
We again use a two-stage training strategy.
Fit a multinomial model using a group lasso penalty on the coefficients \(\beta\).
The group lasso penalty forces all responses to have the same support, but allows different coefficient values.
Train individual one-vs-rest models using the penalty factor and offset defined by the overall model (as in the input grouped setting).
This step allows us to discover features that are specific to each class.
First, simulate data:
set.seed(1234)
n = 900 # number of observations
p = 100 # number of features
k = 3 # number of classes
class.sizes = rep(n/k, k) # size of each class
ncommon = 15 # number of shared features
nindiv = 10 # number of individual features
base.seq = seq(-.5, .5, length.out = ncommon + 1)
shift.common = matrix(
c(sample(base.seq, ncommon), sample(base.seq, ncommon), sample(base.seq, ncommon)),
nrow = 3
)
base.seq = seq(-.1, .1, length.out = nindiv + 1)
shift.indiv = matrix(
c(sample(base.seq, nindiv), sample(base.seq, nindiv), sample(base.seq, nindiv)),
nrow = 3
)
x = matrix(rnorm(n * p), n, p)
xtest = matrix(rnorm(n * p), n, p)
y = ytest = c(sapply(1:length(class.sizes), function(i) rep(i, class.sizes[i])))
# Simulate our data.
# Shift x according to our coefficients, which differ by class.
start = ncommon + 1
for (i in 1:k) {
end = start + nindiv - 1
for(j in 1:ncommon){
x[y == i, j] = x[y == i, j] + shift.common[i, j]
xtest[ytest == i, j] = xtest[ytest == i, j] + shift.common[i, j]
}
jj = 1
for(j in start:end){
x[y == i, j] = x[y == i, j] + shift.indiv[i, jj]
xtest[ytest == i, j] = xtest[ytest == i, j] + shift.indiv[i, jj]
jj = jj + 1
}
start = end + 1
}
Now we are ready to model.
By default, ptLasso uses type.measure = "deviance", but
for ease of interpretability, we use type.measure = "class"
(the misclassification rate).
ptLasso uses cv.glmnet to fit \(7\) models:
fit = ptLasso(x = x, y = y,
use.case = "targetGroups",
type.measure = "class",
alpha = 0.5)
predict(fit, xtest, ytest = ytest)
##
## Call:
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.5
##
## Performance (Misclassification error):
##
## overall mean group_1 group_2 group_3
## Overall 0.3322
## Pretrain 0.3367 0.2267 0.1956 0.2633 0.2211
## Individual 0.3356 0.2337 0.2011 0.2633 0.2367
##
## Support size:
##
## Overall 28
## Pretrain 70 (16 common + 54 individual)
## Individual 50
Now, our performance metrics include:
We can plot each of the models fitted:
plot(fit)
And extract coefficients:
coefs = coef(fit)
To choose the pretraining hyperparameter \(\alpha\), we can use cross-validation. Plotting and predicting are done using the same syntax as before.
set.seed(2345)
cvfit = cv.ptLasso(x = x, y = y,
use.case = "targetGroups",
type.measure = "class")
plot(cvfit)
predict(cvfit, xtest, ytest = ytest)
##
## Call:
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.9
##
## Performance (Misclassification error):
##
## overall mean group_1 group_2 group_3
## Overall 0.3344
## Pretrain 0.3367 0.2341 0.2067 0.2589 0.2367
## Individual 0.3489 0.2348 0.2089 0.2611 0.2344
##
## Support size:
##
## Overall 34
## Pretrain 33 (14 common + 19 individual)
## Individual 76
Multi-response data has multiple responses for each observation. Data like this may come from a survey, where each person answered multiple questions. In this case, the people would be the observations and their answers would be the responses.
ptLasso supports the special case where all
responses are continuous (Gaussian).
Pretraining for multi-response data is quite similar to the multinomial setting.
We again wish to learn across all classes and learn a specific model for each class, and we use a two-stage training strategy.
Fit a multi-response model using a group lasso penalty on the coefficients \(\beta\).
Train individual models for each response, using the penalty factor and offset defined by the overall model.
ptLasso. Note that we only
support data with multiple Gaussian responses.We start by simulating data with two Gaussian responses. The first 5 features are predictive for both responses; there is an extra set of 5 coefficients that are predictive for each individual response.
set.seed(1234)
# Define constants
n = 1000 # Total number of samples
ntrain = 650 # Number of training samples
p = 500 # Number of features
sigma = 2 # Standard deviation of noise
# Generate covariate matrix
x = matrix(rnorm(n * p), n, p)
# Define coefficients for responses 1 and 2
beta1 = c(rep(1, 5), rep(0.5, 5), rep(0, p - 10))
beta2 = c(rep(1, 5), rep(0, 5), rep(0.5, 5), rep(0, p - 15))
mu = cbind(x %*% beta1, x %*% beta2)
y = cbind(mu[, 1] + sigma * rnorm(n),
mu[, 2] + sigma * rnorm(n))
# Split into train and test
xtest = x[-(1:ntrain), ]
ytest = y[-(1:ntrain), ]
x = x[1:ntrain, ]
y = y[1:ntrain, ]
cat("Correlation between two tasks:", cor(y[, 1], y[, 2]))
## Correlation between two tasks: 0.5119585
fit = ptLasso(x = x, y = y,
use.case = "multiresponse",
alpha = 0.5)
predict(fit, xtest, ytest = ytest)
##
## Call:
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.5
##
## Performance (Mean squared error):
##
## allResponses mean response_1 response_2
## Overall 9.217 4.608 4.092 5.125
## Pretrain 9.072 4.536 4.132 4.940
## Individual 9.324 4.662 4.168 5.157
##
## Support size:
##
## Overall 57
## Pretrain 23 (20 common + 3 individual)
## Individual 75
plot(fit)
cvfit = cv.ptLasso(x = x, y = y,
use.case = "multiresponse")
predict(cvfit, xtest, ytest = ytest)
##
## Call:
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.3
##
## Performance (Mean squared error):
##
## allResponses mean response_1 response_2
## Overall 9.217 4.608 4.092 5.125
## Pretrain 9.006 4.503 4.149 4.857
## Individual 9.324 4.662 4.168 5.157
##
## Support size:
##
## Overall 57
## Pretrain 22 (20 common + 2 individual)
## Individual 75
plot(cvfit)
Here, we again have multiple responses for each observation. Now, however, these responses are measurements of the same outcome at different moments in time. For example, we may observe patients across many hospital visits, and measure the same biomarker at each visit. In this case, the time ordered biomarker measurements would be the responses.
ptLasso has built-in support for data like this –
we’ll call it time series data – where all responses are either Gaussian
or binomial.
We wish to leverage signal across time points and fit a model specifically tuned for each time point.
To do pretraining with time series data, we:
Fit a lasso penalized model for the first time point.
Train a model for time 2 using the penalty factor and offset defined by previous model.
Train a model for time 3 using the offset from time 2 and the union of supports from times 1 and 2.
Train a model for time 4 using the offset from time 3 and the union of supports from times 1, 2 and 3.
etc.
We will again start by simulating data; here we imagine data with two time points. These two time points have a shared support, with different coefficient values.
set.seed(1234)
# Define constants
n = 600 # Total number of samples
ntrain = 300 # Number of training samples
p = 100 # Number of features
sigma = 2 # Standard deviation of noise
# Generate covariate matrix
x = matrix(rnorm(n * p), n, p)
# Define coefficients for time points 1 and 2
beta1 = c(rnorm(10), rep(0, p - 10)) # Coefs at time 1
beta2 = 0.5 * beta1 + c(rep(0, 10), runif(5, min = 0, max = 2), rep(0, p-15))
# Generate response variables for times 1 and 2
y = cbind(
x %*% beta1 + sigma * rnorm(n),
x %*% beta2 + sigma * rnorm(n)
)
# Split data into training and testing sets
xtest = x[-(1:ntrain), ] # Test covariates
ytest = y[-(1:ntrain), ] # Test response
x = x[1:ntrain, ] # Train covariates
y = y[1:ntrain, ] # Train response
Modeling is just as before, only now we specify
use.case = "timeSeries".
Notice that our performance metrics no longer include an “Overall”
model – we imagine that the alternative here is to just treat each time
point separately. For Gaussian responses, one could alternatively try
use.case = "multiresponse". This ignores the time ordering
of the responses, but is otherwise a valid approach to consider.
fit = ptLasso(x = x, y = y,
use.case = "timeSeries",
alpha = 0.5)
plot(fit)
predict(fit, xtest, ytest = ytest)
##
## Call:
## predict.ptLasso(object = fit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.5
##
## Performance (Mean squared error):
##
## mean response_1 response_2
## Pretrain 4.706 4.554 4.859
## Individual 4.413 4.554 4.272
##
## Support size:
##
## Pretrain 35 (6 common + 29 individual)
## Individual 50
set.seed(2345)
cvfit = cv.ptLasso(x = x, y = y,
use.case = "timeSeries")
plot(cvfit)
predict(cvfit, xtest, ytest = ytest)
##
## Call:
## predict.cv.ptLasso(object = cvfit, xtest = xtest, ytest = ytest)
##
##
##
## alpha = 0.1
##
## Performance (Mean squared error):
##
## mean response_1 response_2
## Pretrain 4.343 4.573 4.112
## Individual 4.502 4.573 4.431
##
## Support size:
##
## Pretrain 37 (8 common + 29 individual)
## Individual 33
Now we know how to do pretraining for all of the settings covered
natively by ptLasso.
If you want to see more example use cases of pretraining – including
conditional average treatment effect estimation and multi-response data
with different outcome types – check out the ptLasso
vignette!