R - Aggregate entries in a table by a subset of column ID characters


I am working on a gene expression dataset using R. I am new to coding, so please forgive me if I do not describe the problem in adequate detail.

My dataset looks like this:

    geneid        sample1    sample2
    slc26a5-001   7          8
    slc26a5-002   1          2
    homer2-001    6          5
    slc26a5-200   8          10

The gene name is the first part of the ID (slc26a5), and the transcript number is denoted by the suffix (-001). I need to find a way to collapse all of the different transcript IDs and sum the respective rows at the same time, so that the output looks like the following:

    geneid        sample1    sample2
    slc26a5       16         20
    homer2        6          5

The aggregate function should work for summing rows based on the gene ID. I am stuck because I cannot figure out how to refer to the first part of the gene ID inside the aggregate function.

Does anyone know how to do this?

Thanks for the help!

The main thing is to remove the tail part of the geneid column so the IDs are standardized for grouping. That can be done with sub(), as below. After that, it's a pretty standard aggregation. With aggregate(), we can do the following:

    aggregate(df[-1], list(geneid = sub("-.*", "", df$geneid)), sum)
    #    geneid sample1 sample2
    # 1  homer2       6       5
    # 2 slc26a5      16      20
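To see what sub() contributes, it can be run on the IDs by themselves. The pattern `"-.*"` matches the first hyphen and everything after it, so each transcript ID reduces to its gene name. The sketch below uses a plain character vector of the example IDs; the names `ids`, `gene` and `gene_strict` are only for illustration. The second pattern, `"-[0-9]+$"`, is an assumption worth considering if some gene names themselves contain a hyphen, since it removes only a trailing numeric suffix.

```r
# Transcript IDs from the example data
ids <- c("slc26a5-001", "slc26a5-002", "homer2-001", "slc26a5-200")

# "-.*" drops the first hyphen and everything after it
gene <- sub("-.*", "", ids)
# gene is c("slc26a5", "slc26a5", "homer2", "slc26a5")

# Stricter alternative (assumption): drop only a trailing "-<digits>" suffix,
# so a hypothetical name such as "abc-1-001" keeps its inner hyphen
gene_strict <- sub("-[0-9]+$", "", c(ids, "abc-1-001"))
# gene_strict is c("slc26a5", "slc26a5", "homer2", "slc26a5", "abc-1")
```

For the IDs in the question both patterns give the same grouping key; they only differ when a gene name contains a hyphen of its own.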

We could also use rowsum() and avoid converting the data unnecessarily.

    rowsum(df[-1], sub("-.*", "", df$geneid))
    #         sample1 sample2
    # homer2        6       5
    # slc26a5      16      20
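rowsum() returns the gene names as row names rather than as a column. If the result should match the layout requested in the question, with a `geneid` column, the row names can be moved into a column afterwards. A sketch, rebuilding the example data with `data.frame()` and storing `geneid` as character (an assumption; the dput() data stores it as a factor, which sub() also accepts). The names `sums` and `out` are only for illustration.

```r
df <- data.frame(
  geneid  = c("slc26a5-001", "slc26a5-002", "homer2-001", "slc26a5-200"),
  sample1 = c(7L, 1L, 6L, 8L),
  sample2 = c(8L, 2L, 5L, 10L),
  stringsAsFactors = FALSE
)

# Grouped sums; the group labels become the row names
sums <- rowsum(df[-1], sub("-.*", "", df$geneid))

# Move the row names into a geneid column; row.names = NULL resets them to 1, 2, ...
out <- data.frame(geneid = rownames(sums), sums, row.names = NULL,
                  stringsAsFactors = FALSE)
out
```

Passing `row.names = NULL` explicitly stops `data.frame()` from copying the gene names back in as row names.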

data:

    df <- structure(list(geneid = structure(c(2L, 3L, 1L, 4L), .Label = c("homer2-001",
    "slc26a5-001", "slc26a5-002", "slc26a5-200"), class = "factor"),
        sample1 = c(7L, 1L, 6L, 8L), sample2 = c(8L, 2L, 5L, 10L)), .Names = c("geneid",
    "sample1", "sample2"), class = "data.frame", row.names = c(NA, -4L))
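Since the question asks how to refer to the first part of the ID inside aggregate(), another option is to store the shortened ID in its own column first and then use the formula interface of aggregate(). A sketch with the same values as the data above, written with `data.frame()` so it is self-contained; the helper column name `gene` and the result name `res` are hypothetical.

```r
# Example data (same values as the dput() output above)
df <- data.frame(
  geneid  = c("slc26a5-001", "slc26a5-002", "homer2-001", "slc26a5-200"),
  sample1 = c(7L, 1L, 6L, 8L),
  sample2 = c(8L, 2L, 5L, 10L)
)

# Add a gene column holding the ID without its transcript suffix
df$gene <- sub("-.*", "", df$geneid)

# Sum both sample columns within each gene
res <- aggregate(cbind(sample1, sample2) ~ gene, data = df, FUN = sum)
res
```

The formula names the columns to sum explicitly, so the original `geneid` column is simply ignored rather than having to be dropped with `df[-1]`.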
