Deep latent variable models for generating knockoffs

Ying Liu, Cheng Zheng

Research output: Contribution to journal › Article

Abstract

Selective inference is an emerging field in big data analytics that aims to conduct variable selection and provide statistical inference at the same time. Among the various selective inference frameworks, the model-X framework offers the most flexible tool for equipping almost any machine learning method with false discovery rate (FDR) controlled variable selection. This paper provides a practical and flexible approach to generating knockoffs: we propose fitting a latent variable model. Under general conditions, the knockoffs can be generated by approximate inference of a latent variable that captures all the correlation among the predictors. We propose an algorithm based on recent advances in stochastic variational inference to approximately reconstruct the distribution of the data via the latent variables. We demonstrate that the proposed method can achieve FDR control and better power than existing knockoff generation methods in various simulated settings and in a real data example of finding mutations associated with drug resistance in human immunodeficiency virus type 1 patients.
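
The idea that knockoffs can be drawn through a latent variable can be illustrated with a minimal sketch. It assumes a toy one-factor linear Gaussian model with known loadings `w` and noise scales `s`, where the posterior of the latent variable is available in closed form. These names, the model, and the helper functions are hypothetical and chosen only for illustration; the paper itself fits a deep latent variable model with stochastic variational inference, which this sketch does not implement.

```python
import math
import random


def sample_latent_knockoff(x, w, s, rng):
    """Draw one knockoff copy of x under an assumed one-factor Gaussian model.

    Toy model (an assumption, not the paper's deep model):
        Z ~ N(0, 1),  X_j | Z ~ N(w[j] * Z, s[j]**2), independent across j.
    """
    # Exact Gaussian posterior of Z given the observed x.
    precision = 1.0 + sum(wj * wj / (sj * sj) for wj, sj in zip(w, s))
    mean = sum(wj * xj / (sj * sj) for wj, xj, sj in zip(w, x, s)) / precision
    z = rng.gauss(mean, math.sqrt(1.0 / precision))
    # Given z the coordinates are independent, so a fresh draw from
    # p(X | Z = z) serves as the knockoff copy.
    return [rng.gauss(wj * z, sj) for wj, sj in zip(w, s)]


def simulate(n, w, s, seed=0):
    """Simulate n rows from the toy model and one knockoff row for each."""
    rng = random.Random(seed)
    data, knockoffs = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = [rng.gauss(wj * z, sj) for wj, sj in zip(w, s)]
        data.append(x)
        knockoffs.append(sample_latent_knockoff(x, w, s, rng))
    return data, knockoffs


def covariance(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


if __name__ == "__main__":
    w = [1.0, -0.8, 0.5]
    s = [0.5, 0.7, 1.0]
    X, Xk = simulate(20000, w, s)
    col = lambda rows, j: [r[j] for r in rows]
    # Under the toy model all three values should be close to w[0] * w[1].
    print(covariance(col(X, 0), col(X, 1)))    # original vs original
    print(covariance(col(Xk, 0), col(Xk, 1)))  # knockoff vs knockoff
    print(covariance(col(X, 0), col(Xk, 1)))   # original vs knockoff
```

In this toy setting the predictors are independent given the latent variable, so the original and knockoff rows are conditionally independent draws from the same distribution. That is why swapping a coordinate between them leaves the joint distribution unchanged.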

Original language: English (US)
Article number: e260
Journal: Stat
Volume: 8
Issue number: 1
State: Published - 2019

Keywords

  • FDR control
  • deep generative model
  • latent variable model
  • model-X knockoff

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
