R - How to perform pairwise division based on row grouping


I have a data frame constructed as follows:

    df <- structure(list(
        celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
                               6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L),
                             .Label = c("bcells", "dendriticcells", "macrophages",
                                        "monocytes", "nkcells", "neutrophils",
                                        "stemcells", "stromalcells", "abtcells",
                                        "gdtcells"),
                             class = "factor"),
        sample = c("sp id control", "sp id treated", "sp id control", "sp id treated",
                   "sp id control", "sp id treated", "sp id control", "sp id treated",
                   "sp id control", "sp id treated", "sp id control", "sp id treated",
                   "sp id control", "sp id treated", "sp id control", "sp id treated",
                   "sp id control", "sp id treated", "sp id control", "sp id treated"),
        `mean(score)` = c(0.160953535029424, 0.155743474395545,
                          0.104788051104575, 0.125247035158472,
                          -0.159665650045289, -0.134662049979712,
                          0.196249441751866, 0.212256889027029,
                          0.0532668251890109, 0.0738264693971133,
                          0.151828478029596, 0.159941552142933,
                          -0.14128323638966, -0.120556640790534,
                          0.196518649474078, 0.185264282171863,
                          0.0654641151966543, 0.0837989059507186,
                          0.145111577618456, 0.145448549866796)),
        .Names = c("celltype", "sample", "mean(score)"),
        row.names = c(7L, 8L, 17L, 18L, 27L, 28L, 37L, 38L, 47L, 48L,
                      57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L, 97L, 98L),
        class = "data.frame")

It looks like this:

    > df
             celltype        sample mean(score)
    7          bcells sp id control  0.16095354
    8          bcells sp id treated  0.15574347
    17 dendriticcells sp id control  0.10478805
    18 dendriticcells sp id treated  0.12524704
    27    macrophages sp id control -0.15966565
    28    macrophages sp id treated -0.13466205
    37      monocytes sp id control  0.19624944
    38      monocytes sp id treated  0.21225689
    47        nkcells sp id control  0.05326683
    48        nkcells sp id treated  0.07382647
    57    neutrophils sp id control  0.15182848
    58    neutrophils sp id treated  0.15994155
    67      stemcells sp id control -0.14128324
    68      stemcells sp id treated -0.12055664
    77   stromalcells sp id control  0.19651865
    78   stromalcells sp id treated  0.18526428
    87       abtcells sp id control  0.06546412
    88       abtcells sp id treated  0.08379891
    97       gdtcells sp id control  0.14511158
    98       gdtcells sp id treated  0.14544855

What I want is to compute, within each cell type, the treated score divided by the control score.

I worked an example out in Excel; the right-hand column is what I'm after. For example, for bcells: 0.15574347 / 0.16095354 ≈ 0.967.
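As a quick check of the bcells figure, here is the same division in R using the full-precision scores from the dput() above:

```r
# Full-precision bcells scores copied from the dput() in the question
bcells_control <- 0.160953535029424
bcells_treated <- 0.155743474395545

# Treated divided by control, as in the right-hand column of the spreadsheet
bcells_ratio <- bcells_treated / bcells_control
print(round(bcells_ratio, 4))
# [1] 0.9676
```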


In the end, I'd like the data frame to look like this:

    celltype            sample          pairwise division
    bcells              sp id treated   0.967630031
    dendriticcells      sp id treated   1.195241574
    macrophages         sp id treated   0.843400255
    monocytes           sp id treated   1.081566841
    nkcells             sp id treated   1.385974647
    neutrophils         sp id treated   1.053435786
    stemcells           sp id treated   0.853297563
    stromalcells        sp id treated   0.942731303
    abtcells            sp id treated   1.280073915
    gdtcells            sp id treated   1.002322158

How can I achieve this in R?

If you spread the data to wide form, it's pretty trivial:

    library(tidyr)
    library(dplyr)

    df %>%
        spread(sample, `mean(score)`) %>%
        mutate(pairwise_division = `sp id treated` / `sp id control`)

    ##          celltype sp id control sp id treated pairwise_division
    ## 1          bcells    0.16095354    0.15574347         0.9676300
    ## 2  dendriticcells    0.10478805    0.12524704         1.1952416
    ## 3     macrophages   -0.15966565   -0.13466205         0.8434003
    ## 4       monocytes    0.19624944    0.21225689         1.0815668
    ## 5         nkcells    0.05326683    0.07382647         1.3859746
    ## 6     neutrophils    0.15182848    0.15994155         1.0534358
    ## 7       stemcells   -0.14128324   -0.12055664         0.8532976
    ## 8    stromalcells    0.19651865    0.18526428         0.9427313
    ## 9        abtcells    0.06546412    0.08379891         1.2800739
    ## 10       gdtcells    0.14511158    0.14544855         1.0023222
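For comparison, here is a base-R sketch of the same spread-then-divide step using stats::reshape() in place of tidyr::spread(). It is a sketch under the assumption that each cell type has exactly one control and one treated row; the data frame is rebuilt with data.frame() so the block runs on its own. reshape() names the new columns by pasting the value column and the sample label, so they still need `[[ ]]` access.

```r
# Rebuild the question's data with data.frame() (values copied from the dput())
celltypes <- c("bcells", "dendriticcells", "macrophages", "monocytes", "nkcells",
               "neutrophils", "stemcells", "stromalcells", "abtcells", "gdtcells")
df <- data.frame(
  celltype = factor(rep(celltypes, each = 2), levels = celltypes),
  sample = rep(c("sp id control", "sp id treated"), times = length(celltypes)),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472,
                    -0.159665650045289, -0.134662049979712,
                    0.196249441751866, 0.212256889027029,
                    0.0532668251890109, 0.0738264693971133,
                    0.151828478029596, 0.159941552142933,
                    -0.14128323638966, -0.120556640790534,
                    0.196518649474078, 0.185264282171863,
                    0.0654641151966543, 0.0837989059507186,
                    0.145111577618456, 0.145448549866796),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Long to wide: one row per cell type, one score column per sample label
wide <- reshape(df, idvar = "celltype", timevar = "sample", direction = "wide")

# reshape() builds column names such as "mean(score).sp id treated"
wide$pairwise_division <-
  wide[["mean(score).sp id treated"]] / wide[["mean(score).sp id control"]]

print(wide, row.names = FALSE)
```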

Note: you should fix your column names so you don't have to use backticks so often.
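One way to do that, sketched in base R: rename `mean(score)` to a syntactic name and strip the shared "sp id " prefix from the sample labels. The new names `mean_score`, `control` and `treated` are illustrative choices, not anything from the original data. The ratio below pairs rows by position, which is only valid because both halves list the cell types in the same order, so that is checked first.

```r
# Rebuild the question's data with data.frame() (values copied from the dput())
celltypes <- c("bcells", "dendriticcells", "macrophages", "monocytes", "nkcells",
               "neutrophils", "stemcells", "stromalcells", "abtcells", "gdtcells")
df <- data.frame(
  celltype = factor(rep(celltypes, each = 2), levels = celltypes),
  sample = rep(c("sp id control", "sp id treated"), times = length(celltypes)),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472,
                    -0.159665650045289, -0.134662049979712,
                    0.196249441751866, 0.212256889027029,
                    0.0532668251890109, 0.0738264693971133,
                    0.151828478029596, 0.159941552142933,
                    -0.14128323638966, -0.120556640790534,
                    0.196518649474078, 0.185264282171863,
                    0.0654641151966543, 0.0837989059507186,
                    0.145111577618456, 0.145448549866796),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Illustrative renaming: syntactic column name, shorter sample labels
names(df)[names(df) == "mean(score)"] <- "mean_score"
df$sample <- sub("^sp id ", "", df$sample)   # "sp id control" -> "control"

is_trt <- df$sample == "treated"
is_ctl <- df$sample == "control"

# Positional pairing is safe only if both halves list cell types identically
stopifnot(identical(df$celltype[is_trt], df$celltype[is_ctl]))

# No backticks needed any more
ratio <- df$mean_score[is_trt] / df$mean_score[is_ctl]
print(data.frame(celltype = df$celltype[is_trt], pairwise_division = ratio))
```

With the same renaming, the tidyr pipeline above would read `spread(sample, mean_score) %>% mutate(pairwise_division = treated / control)`.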

To get precisely the desired result, gather back to long form, filter to the treated rows, and select the desired columns:

    df %>%
        spread(sample, `mean(score)`) %>%
        mutate(pairwise_division = `sp id treated` / `sp id control`) %>%
        gather(sample, `mean(score)`, starts_with('sp')) %>%
        filter(sample == 'sp id treated') %>%
        select(celltype, sample, pairwise_division)

    ##          celltype        sample pairwise_division
    ## 1          bcells sp id treated         0.9676300
    ## 2  dendriticcells sp id treated         1.1952416
    ## 3     macrophages sp id treated         0.8434003
    ## 4       monocytes sp id treated         1.0815668
    ## 5         nkcells sp id treated         1.3859746
    ## 6     neutrophils sp id treated         1.0534358
    ## 7       stemcells sp id treated         0.8532976
    ## 8    stromalcells sp id treated         0.9427313
    ## 9        abtcells sp id treated         1.2800739
    ## 10       gdtcells sp id treated         1.0023222
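The same three-column result can also be sketched in base R without reshaping at all: split the long table into its control and treated rows, pair them with merge() on `celltype`, and divide. This assumes one control and one treated row per cell type. merge() does not promise factor-level row order, so the result is reordered explicitly; the suffixes `.treated`/`.control` are illustrative choices.

```r
# Rebuild the question's data with data.frame() (values copied from the dput())
celltypes <- c("bcells", "dendriticcells", "macrophages", "monocytes", "nkcells",
               "neutrophils", "stemcells", "stromalcells", "abtcells", "gdtcells")
df <- data.frame(
  celltype = factor(rep(celltypes, each = 2), levels = celltypes),
  sample = rep(c("sp id control", "sp id treated"), times = length(celltypes)),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472,
                    -0.159665650045289, -0.134662049979712,
                    0.196249441751866, 0.212256889027029,
                    0.0532668251890109, 0.0738264693971133,
                    0.151828478029596, 0.159941552142933,
                    -0.14128323638966, -0.120556640790534,
                    0.196518649474078, 0.185264282171863,
                    0.0654641151966543, 0.0837989059507186,
                    0.145111577618456, 0.145448549866796),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Control and treated halves of the long table
ctrl <- df[df$sample == "sp id control", c("celltype", "mean(score)")]
trt  <- df[df$sample == "sp id treated", c("celltype", "sample", "mean(score)")]

# Pair rows by cell type; the shared score column gets one suffix per side
m <- merge(trt, ctrl, by = "celltype", suffixes = c(".treated", ".control"))
m$pairwise_division <- m[["mean(score).treated"]] / m[["mean(score).control"]]

# Restore the original cell-type order and keep only the requested columns
ord <- order(factor(m$celltype, levels = levels(df$celltype)))
result <- m[ord, c("celltype", "sample", "pairwise_division")]
rownames(result) <- NULL
print(result)
```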

Equivalent versions are possible in base R or data.table, if you prefer. Or take a more direct route with aggregate():

    # Sorting by sample puts "sp id control" before "sp id treated" within each
    # cell type, so x[1] is the control score and x[2] the treated score
    aggregate(cbind(pairwise_division = `mean(score)`) ~ celltype,
              df[order(df$celltype, df$sample), ],
              FUN = function(x){x[2]/x[1]})

    ##          celltype pairwise_division
    ## 1          bcells         0.9676300
    ## 2  dendriticcells         1.1952416
    ## 3     macrophages         0.8434003
    ## 4       monocytes         1.0815668
    ## 5         nkcells         1.3859746
    ## 6     neutrophils         1.0534358
    ## 7       stemcells         0.8532976
    ## 8    stromalcells         0.9427313
    ## 9        abtcells         1.2800739
    ## 10       gdtcells         1.0023222
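The aggregate() version picks the two scores by position, so it depends on the control label sorting before the treated label. A label-based variant avoids that dependency; here is a minimal base-R sketch using split() and vapply(). The helper name `treated_over_control` and its default label arguments are hypothetical, and it assumes exactly one row per label within each cell type (it stops otherwise).

```r
# Rebuild the question's data with data.frame() (values copied from the dput())
celltypes <- c("bcells", "dendriticcells", "macrophages", "monocytes", "nkcells",
               "neutrophils", "stemcells", "stromalcells", "abtcells", "gdtcells")
df <- data.frame(
  celltype = factor(rep(celltypes, each = 2), levels = celltypes),
  sample = rep(c("sp id control", "sp id treated"), times = length(celltypes)),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472,
                    -0.159665650045289, -0.134662049979712,
                    0.196249441751866, 0.212256889027029,
                    0.0532668251890109, 0.0738264693971133,
                    0.151828478029596, 0.159941552142933,
                    -0.14128323638966, -0.120556640790534,
                    0.196518649474078, 0.185264282171863,
                    0.0654641151966543, 0.0837989059507186,
                    0.145111577618456, 0.145448549866796),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: per cell type, divide the treated score by the control
# score, selecting each value by its sample label rather than its row position
treated_over_control <- function(d,
                                 treated = "sp id treated",
                                 control = "sp id control") {
  ratios <- vapply(split(d, d$celltype), function(g) {
    t_val <- g[["mean(score)"]][g$sample == treated]
    c_val <- g[["mean(score)"]][g$sample == control]
    # Fail loudly unless there is exactly one row of each kind
    stopifnot(length(t_val) == 1L, length(c_val) == 1L)
    t_val / c_val
  }, numeric(1))
  data.frame(celltype = names(ratios), pairwise_division = unname(ratios))
}

result <- treated_over_control(df)
print(result)
```

Because the values are looked up by label, shuffling or reversing the rows of `df` gives the same result.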
