The University of Texas M. D. Anderson Cancer Center

TCGA-Assembler: Open-Source Software for Retrieving and Processing TCGA Data

Yitan Zhu, Peng Qiu, Yuan Ji*

Please cite using Zhu, Qiu and Ji (2014), Nature Methods. 11(6):599-600. doi:10.1038/nmeth.2956

    For more information, contact Yitan Zhu or Yuan Ji

TCGA-Assembler is an open-source, freely available tool that automatically downloads, assembles, and processes public data from The Cancer Genome Atlas (TCGA). It facilitates downstream analysis by relieving investigators of the burden of data preparation. TCGA-Assembler includes two modules. Module A acquires public TCGA data from the TCGA Data Coordinating Center and assembles the individual data files into locally stored data tables. Module B performs various manipulations on the data tables to prepare them for downstream analysis.
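To make the idea of assembling individual data files into one data table concrete, here is a minimal sketch in Python. It is not TCGA-Assembler's code (the package itself is written in R), and it makes several assumptions: each per-sample file is a two-column, tab-delimited list of feature and value; the function name assemble_table is hypothetical; and the sample IDs and values are made up for illustration.

```python
import csv
import io


def assemble_table(sample_files):
    """Merge per-sample (feature, value) TSV texts into one table.

    sample_files maps a sample ID to the text of its tab-delimited file.
    Returns (header, rows): header is ["feature", sample1, ...] and each
    row is a feature name followed by one value per sample ("NA" if the
    sample has no measurement for that feature).
    """
    samples = sorted(sample_files)
    values = {}  # feature -> {sample ID: value}
    for sample in samples:
        reader = csv.reader(io.StringIO(sample_files[sample]), delimiter="\t")
        for record in reader:
            if len(record) < 2:
                continue  # skip blank or malformed lines
            feature, value = record[0], record[1]
            values.setdefault(feature, {})[sample] = value
    header = ["feature"] + samples
    rows = [
        [feature] + [values[feature].get(s, "NA") for s in samples]
        for feature in sorted(values)
    ]
    return header, rows


# Hypothetical per-sample files; IDs and values are illustrative only.
files = {
    "sample_B": "TP53\t2.5\nEGFR\t0.7\n",
    "sample_A": "TP53\t1.1\nBRCA1\t3.0\n",
}
header, rows = assemble_table(files)
```

The result has one row per feature and one column per sample, with "NA" where a sample lacks a measurement. That is the general shape of a combined table that later processing steps can work on.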

The official TCGA-Assembler webpage has moved to

Recent News

  • Version 1.0.3 release - 07/09/2014

    • Version 1.0.3 is available through the new TCGA-Assembler official webpage at
    • TCGA-Assembler can now retrieve and process all microarray gene expression data from more than 10 cancer types in TCGA. Please use the newly updated "DownloadRNASeqData()" function and "ProcessRNASeqData()" function to acquire and process microarray data. The user manual has also been updated to reflect this change.
    • TCGA-Assembler can now retrieve not only the clinical information of patients, but also the biospecimen information of patient samples. A new function "DownloadBiospecimenData()" has been added to Module A for retrieving the biospecimen information of samples.
    • Fixed a user-reported bug in the "DownloadRPPAData()" function that may be triggered by additional columns in the RPPA antibody annotation file. Typical antibody annotation files include only three columns, but the annotation files of a few cancer types have more.
    • Updated the directory traverse result file in the package. It is now DirectoryTraverseResult_Jul-08-2014 and includes the URLs of all TCGA data files, accurate as of July 8th, 2014. With the updated file, you can download the most current TCGA data files. You can also update the file yourself using the "TraverseAllDirectories()" function.
    • Downloading version 1.0.3 requires a simple registration that collects basic information about TCGA-Assembler users.
  • Version 1.0.2 release - 03/05/2014

  • Bug report - 03/01/2014:

    • TCGA has just changed its naming rule for clinical data. This affects the "DownloadClinicalData()" function in our package. We are in contact with the TCGA team and are working to release a minor update soon.
  • Version 1.0.1 release - 02/17/2014

    • TCGA-Assembler is programmed in R, which must be installed to use TCGA-Assembler.
      (Download Software)  (Quick Start Guide)   (Full Manual)  

    • TCGA-Assembler Acquires and Processes Large Amounts of TCGA Data.
      Below is a summary of the public TCGA data that TCGA-Assembler can acquire and process. Entries are the numbers of patient samples measured by different assay platforms and the numbers of patients with de-identified clinical information (accurate as of August 2013). The numbers will gradually increase as new data are produced.

Updated: 04/25/2014 Copyright © 2014 All rights reserved.