
Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics (Record no. 1946)

MARC details
042 ## - AUTHENTICATION CODE
Authentication code dc
100 10 - MAIN ENTRY--PERSONAL NAME
Personal name Crook, Oliver M.
Relator term author
9 (RLIN) 1934
245 00 - TITLE STATEMENT
Title Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2019-12-12.
500 ## - GENERAL NOTE
General note /pmc/articles/PMC7614016/
500 ## - GENERAL NOTE
General note /pubmed/31829970
520 ## - SUMMARY, ETC.
Summary, etc. The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel
540 ## - TERMS GOVERNING USE AND REPRODUCTION NOTE
Terms governing use and reproduction This work is licensed under the Creative Commons Attribution 4.0 Public License: https://creativecommons.org/licenses/by/4.0/
546 ## - LANGUAGE NOTE
Language note en
690 ## - LOCAL SUBJECT ADDED ENTRY--TOPICAL TERM (OCLC, RLIN)
Topical term or geographic name as entry element Article
655 7# - INDEX TERM--GENRE/FORM
Genre/form data or focus term Text
Source of term local
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Gatto, Laurent
Relator term author
9 (RLIN) 1935
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Kirk, Paul D.W.
Relator term author
9 (RLIN) 1936
786 0# - DATA SOURCE ENTRY
Note Stat Appl Genet Mol Biol
856 41 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://dx.doi.org/10.1515/sagmb-2018-0065
Public note Connect to this object online.
