Title: | Training Datasets for iC10 Package |
---|---|
Description: | Training datasets for iC10, which implements the classifier described in the paper 'Genome-driven integrated classification of breast cancer validated in over 7,500 samples' (Ali HR et al., Genome Biology 2014). iC10 uses copy number and/or expression data from breast cancer samples, trains a pamr classifier (Tibshirani et al.) on the available features and predicts the iC10 group. Genomic annotation for the training dataset has been obtained from Mark Dunning's illuminaHumanv3.db package. |
Authors: | Oscar M Rueda and Jose Antonio Seoane Fernandez |
Maintainer: | Oscar M. Rueda |
License: | GPL-3 |
Version: | 2.0.1 |
Built: | 2025-02-24 05:33:01 UTC |
Source: | https://github.com/cran/iC10TrainingData |
The DESCRIPTION file:
Package: | iC10TrainingData |
---|---|
Type: | Package |
Title: | Training Datasets for iC10 Package |
Version: | 2.0.1 |
Date: | 2024-07-16 |
Author: | Oscar M Rueda and Jose Antonio Seoane Fernandez |
Maintainer: | Oscar M. Rueda |
Description: | Training datasets for iC10, which implements the classifier described in the paper 'Genome-driven integrated classification of breast cancer validated in over 7,500 samples' (Ali HR et al., Genome Biology 2014). iC10 uses copy number and/or expression data from breast cancer samples, trains a pamr classifier (Tibshirani et al.) on the available features and predicts the iC10 group. Genomic annotation for the training dataset has been obtained from Mark Dunning's illuminaHumanv3.db package. |
License: | GPL-3 |
Packaged: | 2024-07-16 07:15:09 UTC; oscar |
NeedsCompilation: | no |
Date/Publication: | 2024-07-16 08:00:02 UTC |
Depends: | R (>= 3.5.0) |
Repository: | https://rueda-lab.r-universe.dev |
RemoteUrl: | https://github.com/cran/iC10TrainingData |
RemoteRef: | HEAD |
RemoteSha: | 1e41b8cc1497e8183f2f07ae0f3aca4522734642 |
Index of help topics:
Topic | Description |
---|---|
IntClustMemb | Class Membership for the training set |
Map.All | Probe mapping of the complete set of features of the training set |
Map.CN | Probe mapping of the copy number features of the training set |
Map.Exp | Probe mapping of the expression features of the training set |
iC10TrainingData-package | Training Datasets for iC10 Package |
train.CN | Copy number data for the training set |
train.Exp | Expression data for the training set |
Training datasets for iC10, which implements the classifier described in the METABRIC paper 'The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups' (Curtis et al., Nature 2012). iC10 uses copy number and/or expression data from breast cancer samples, trains a pamr classifier (Tibshirani et al.) on the available features and predicts the iC10 group.
Oscar M Rueda and Jose Antonio Seoane Fernandez
Maintainer: Oscar M. Rueda
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
Tibshirani et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 2002; 99(10):6567-6572.
iC10
data(train.CN)
data(train.Exp)
iC10 assignment for the METABRIC training dataset (997 samples).
data(IntClustMemb)
The format is:
 Factor w/ 10 levels "1","2","3","4",..: 2 9 3 3 8 6 7 7 7 3 ...
 - attr(*, "names")= chr [1:997] "MB.0135" "MB.0167" "MB.0136" "MB.3403" ...
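As a minimal sketch (assuming iC10TrainingData is installed), the group sizes can be tabulated and the sample names compared with the columns of the training matrices. The documentation does not state that the three objects share one sample order, so the code reports the comparison instead of assuming it.

```r
library(iC10TrainingData)

data(IntClustMemb)
data(train.CN)
data(train.Exp)

# Samples per integrative cluster; a level with no samples would show a count of 0
counts <- table(IntClustMemb)
print(counts)

# Do the membership names match the sample columns of both training matrices?
same_cn  <- identical(names(IntClustMemb), colnames(train.CN))
same_exp <- identical(names(IntClustMemb), colnames(train.Exp))
print(c(CN = same_cn, Exp = same_exp))
```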
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(IntClustMemb)
barplot(table(IntClustMemb))
Probe mapping of the complete set of features of the training set
data(Map.All)
A data frame with 714 observations on the following variables:
Probe_ID
a character vector with the Illumina probe IDs that flank the features
Gene_symbol
a factor with the HUGO gene symbols
Ensembl_ID
a factor with the Ensembl IDs
Cytoband
a factor with the cytobands (on hg18)
Genomic_location_hg18
a factor with the genomic locations on hg18
chromosome_name_hg18
a numeric vector with the chromosome on hg18
start_position_hg18
a numeric vector with the start position on hg18
end_position_hg18
a numeric vector with the end position on hg18
Synonyms_0
a character vector with the gene name synonyms of the feature
Gene.Chosen
a character vector (YES or NO) specifying the probe chosen for gene-based selection
Genomic_location_hg19
a factor with the genomic locations on hg19
chromosome_name_hg19
a numeric vector with the chromosome on hg19
start_position_hg19
a numeric vector with the start position on hg19
end_position_hg19
a numeric vector with the end position on hg19
chromosome_name_hg38
a numeric vector with the chromosome on hg38
start_position_hg38
a numeric vector with the start position on hg38
end_position_hg38
a numeric vector with the end position on hg38
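The Gene.Chosen flag and the coordinate columns can be used directly for filtering. The sketch below (assuming the package is installed) keeps the probes flagged "YES" and then lists the features on chromosome 17 in hg19 coordinates; chromosome 17 is an arbitrary example.

```r
library(iC10TrainingData)

data(Map.All)

# Probes flagged for gene-based selection; %in% treats NA as "not chosen"
chosen <- Map.All[Map.All$Gene.Chosen %in% "YES",
                  c("Probe_ID", "Gene_symbol", "Cytoband")]
nrow(chosen)
head(chosen)

# Features on chromosome 17 (hg19); which() drops rows with a missing chromosome
chr17 <- Map.All[which(Map.All$chromosome_name_hg19 == 17),
                 c("Probe_ID", "Gene_symbol",
                   "start_position_hg19", "end_position_hg19")]
head(chr17[order(chr17$start_position_hg19), ])
```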
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(Map.All)
head(Map.All)
Probe mapping of the copy number features of the training set.
data(Map.CN)
A data frame with 38 observations on the following variables:
Probe_ID
a character vector with the Illumina probe IDs that flank the features
Gene_symbol
a factor with the HUGO gene symbols
Ensembl_ID
a factor with the Ensembl IDs
Cytoband
a factor with the cytobands (on hg18)
Genomic_location_hg18
a factor with the genomic locations on hg18
chromosome_name_hg18
a numeric vector with the chromosome on hg18
start_position_hg18
a numeric vector with the start position on hg18
end_position_hg18
a numeric vector with the end position on hg18
Genomic_location_hg19
a factor with the genomic locations on hg19
chromosome_name_hg19
a numeric vector with the chromosome on hg19
start_position_hg19
a numeric vector with the start position on hg19
end_position_hg19
a numeric vector with the end position on hg19
chromosome_name_hg38
a numeric vector with the chromosome on hg38
start_position_hg38
a numeric vector with the start position on hg38
end_position_hg38
a numeric vector with the end position on hg38
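To view the copy number features in a genome browser, the hg19 columns can be rearranged into a BED-style table. This is a sketch under two assumptions not stated in the documentation: that the start positions are 1-based (BED starts are 0-based, hence the subtraction) and that prefixing "chr" to the numeric chromosome gives a valid name; a sex chromosome coded as a number would need relabelling. The output file name in the commented line is hypothetical.

```r
library(iC10TrainingData)

data(Map.CN)

# Keep only features with a complete hg19 position
ok <- with(Map.CN, !is.na(chromosome_name_hg19) &
                   !is.na(start_position_hg19) &
                   !is.na(end_position_hg19))
cn <- Map.CN[ok, ]

# BED intervals are 0-based and half-open; assumes the map's starts are 1-based
bed <- data.frame(
  chrom = paste0("chr", cn$chromosome_name_hg19),
  start = cn$start_position_hg19 - 1,
  end   = cn$end_position_hg19,
  name  = as.character(cn$Probe_ID),
  stringsAsFactors = FALSE
)
head(bed)

# Hypothetical output file:
# write.table(bed, "iC10_CN_features_hg19.bed", sep = "\t", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```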
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(Map.CN)
head(Map.CN)
Probe mapping of the expression features of the training set
data(Map.Exp)
A data frame with 711 observations on the following variables:
Probe_ID
a character vector with the Illumina probe IDs that flank the features
Gene_symbol
a factor with the HUGO gene symbols
Ensembl_ID
a factor with the Ensembl IDs
Cytoband
a factor with the cytobands (on hg18)
Genomic_location_hg18
a factor with the genomic locations on hg18
chromosome_name_hg18
a numeric vector with the chromosome on hg18
start_position_hg18
a numeric vector with the start position on hg18
end_position_hg18
a numeric vector with the end position on hg18
Synonyms_0
a character vector with the gene name synonyms of the feature
Gene.Chosen
a character vector (YES or NO) specifying the probe chosen for gene-based selection
Genomic_location_hg19
a factor with the genomic locations on hg19
chromosome_name_hg19
a numeric vector with the chromosome on hg19
start_position_hg19
a numeric vector with the start position on hg19
end_position_hg19
a numeric vector with the end position on hg19
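Because a gene may be listed under an alias, a lookup can check both Gene_symbol and Synonyms_0. find_gene() below is a hypothetical helper, not part of the package; it assumes the synonyms of a feature are stored as delimited text in a single string, which is why it matches the query as a whole token rather than as a substring. ERBB2 is only an example query.

```r
library(iC10TrainingData)

data(Map.Exp)

# Hypothetical helper: rows whose symbol equals `gene` or whose synonyms contain it
# as a whole token (so "ERBB2" does not match "ERBB2IP")
find_gene <- function(map, gene) {
  sym <- as.character(map$Gene_symbol)
  syn <- as.character(map$Synonyms_0)
  pattern <- paste0("(^|[^A-Za-z0-9])", gene, "([^A-Za-z0-9]|$)")
  hit <- (!is.na(sym) & sym == gene) | (!is.na(syn) & grepl(pattern, syn))
  map[hit, c("Probe_ID", "Gene_symbol", "Cytoband", "Gene.Chosen")]
}

find_gene(Map.Exp, "ERBB2")
```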
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(Map.Exp)
head(Map.Exp)
Copy number data for the training set
data(train.CN)
A matrix with 714 rows and 997 columns. Rows are features and columns are training samples.
Each row corresponds to one copy number feature for all samples in the training set. The matrix includes all features in the classifier; depending on the data available and the type of matching (gene or probe), only some of them will be used.
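Conceptually, when new data provide only some of the features, the training matrix is restricted to the shared ones. The sketch below assumes the rows of train.CN are named by feature ID and uses the first 100 of those names as a hypothetical stand-in for the features present in a user's own data set; it is not the matching code used by iC10 itself.

```r
library(iC10TrainingData)

data(train.CN)

# Hypothetical: feature IDs available in a user's own copy number data.
# The first 100 training feature names stand in for them here.
my_features <- head(rownames(train.CN), 100)

# Keep only the training features that the new data also provide
common <- intersect(rownames(train.CN), my_features)
train_sub <- train.CN[common, , drop = FALSE]
dim(train_sub)
```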
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(train.CN)
summary(train.CN)
Expression data for the training set.
data(train.Exp)
A matrix with 714 rows and 997 columns. Rows are features and columns are training samples.
Each row corresponds to one expression feature for all samples in the training set. The matrix includes all features in the classifier; depending on the data available and the type of matching (gene or probe), only some of them will be used.
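The pamr classifier (nearest shrunken centroids, Tibshirani et al. 2002) starts from the mean profile of each class and shrinks these class centroids toward the overall centroid. The sketch below computes only the unshrunken per-group centroids of train.Exp for the ten iC10 groups. It is an illustration, not the training code of iC10 or pamr, and it aligns labels to columns by sample name when column names are present.

```r
library(iC10TrainingData)

data(train.Exp)
data(IntClustMemb)

# Align cluster labels to matrix columns by sample name when names are available
memb <- if (!is.null(colnames(train.Exp))) IntClustMemb[colnames(train.Exp)] else IntClustMemb

# One column per iC10 group: mean of each feature over that group's samples
# (which() skips samples whose label is missing after alignment)
centroids <- sapply(levels(memb), function(k) {
  rowMeans(train.Exp[, which(memb == k), drop = FALSE], na.rm = TRUE)
})
dim(centroids)
```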
Curtis et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012; 486:346-352.
data(train.Exp)
summary(train.Exp)