R - How to perform pairwise division based on row grouping
I have a data frame created in the following way:
df <- structure(list(
  celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L),
    .Label = c("bcells", "dendriticcells", "macrophages", "monocytes", "nkcells", "neutrophils", "stemcells", "stromalcells", "abtcells", "gdtcells"),
    class = "factor"),
  sample = c("sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated", "sp id control", "sp id treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545, 0.104788051104575, 0.125247035158472, -0.159665650045289, -0.134662049979712, 0.196249441751866, 0.212256889027029, 0.0532668251890109, 0.0738264693971133, 0.151828478029596, 0.159941552142933, -0.14128323638966, -0.120556640790534, 0.196518649474078, 0.185264282171863, 0.0654641151966543, 0.0837989059507186, 0.145111577618456, 0.145448549866796)),
  .Names = c("celltype", "sample", "mean(score)"),
  row.names = c(7L, 8L, 17L, 18L, 27L, 28L, 37L, 38L, 47L, 48L, 57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L, 97L, 98L),
  class = "data.frame")
It looks like this:
> df
         celltype        sample mean(score)
7          bcells sp id control  0.16095354
8          bcells sp id treated  0.15574347
17 dendriticcells sp id control  0.10478805
18 dendriticcells sp id treated  0.12524704
27    macrophages sp id control -0.15966565
28    macrophages sp id treated -0.13466205
37      monocytes sp id control  0.19624944
38      monocytes sp id treated  0.21225689
47        nkcells sp id control  0.05326683
48        nkcells sp id treated  0.07382647
57    neutrophils sp id control  0.15182848
58    neutrophils sp id treated  0.15994155
67      stemcells sp id control -0.14128324
68      stemcells sp id treated -0.12055664
77   stromalcells sp id control  0.19651865
78   stromalcells sp id treated  0.18526428
87       abtcells sp id control  0.06546412
88       abtcells sp id treated  0.08379891
97       gdtcells sp id control  0.14511158
98       gdtcells sp id treated  0.14544855
What I want to do is divide the score of the treated sample by the score of the control sample within each cell type grouping.
As an example, for bcells the result is 0.15574 / 0.16095 ≈ 0.9676.
At the end of the day I'd like a data frame that looks like this:
celltype        sample         pairwise division
bcells          sp id treated  0.967630031
dendriticcells  sp id treated  1.195241574
macrophages     sp id treated  0.843400255
monocytes       sp id treated  1.081566841
nkcells         sp id treated  1.385974647
neutrophils     sp id treated  1.053435786
stemcells       sp id treated  0.853297563
stromalcells    sp id treated  0.942731303
abtcells        sp id treated  1.280073915
gdtcells        sp id treated  1.002322158
How can I achieve this in R?
If you spread the data to wide form, it's pretty trivial:
library(tidyr)
library(dplyr)

df %>%
  spread(sample, `mean(score)`) %>%
  mutate(pairwise_division = `sp id treated` / `sp id control`)
##          celltype sp id control sp id treated pairwise_division
## 1          bcells    0.16095354    0.15574347         0.9676300
## 2  dendriticcells    0.10478805    0.12524704         1.1952416
## 3     macrophages   -0.15966565   -0.13466205         0.8434003
## 4       monocytes    0.19624944    0.21225689         1.0815668
## 5         nkcells    0.05326683    0.07382647         1.3859746
## 6     neutrophils    0.15182848    0.15994155         1.0534358
## 7       stemcells   -0.14128324   -0.12055664         0.8532976
## 8    stromalcells    0.19651865    0.18526428         0.9427313
## 9        abtcells    0.06546412    0.08379891         1.2800739
## 10       gdtcells    0.14511158    0.14544855         1.0023222
Note that you should fix the column names so you don't have to use backticks so often.
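As a rough illustration of that cleanup, the sketch below renames the awkward column and replaces the spaces in the sample labels using base R only. The new names (`mean_score`, `sp_id_control`, `sp_id_treated`) are hypothetical choices, and the data frame is a two-cell-type stand-in for the one in the question.

```r
# Stand-in for the question's data (first two cell types only).
df <- data.frame(
  celltype = factor(c("bcells", "bcells", "dendriticcells", "dendriticcells")),
  sample = c("sp id control", "sp id treated", "sp id control", "sp id treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,       # keep the non-syntactic name "mean(score)" as-is
  stringsAsFactors = FALSE
)

# Rename the score column to a syntactic name (hypothetical name: mean_score).
names(df)[names(df) == "mean(score)"] <- "mean_score"

# Replace spaces in the sample labels so they can become column names
# that need no backticks after reshaping to wide form.
df$sample <- gsub(" ", "_", df$sample, fixed = TRUE)
```

After this, a wide-form reshape would produce columns named `sp_id_control` and `sp_id_treated`, which can be used without backticks.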
To get precisely the desired result, gather back to long form, filter to the treated rows, and select the desired columns:
df %>%
  spread(sample, `mean(score)`) %>%
  mutate(pairwise_division = `sp id treated` / `sp id control`) %>%
  gather(sample, `mean(score)`, starts_with('sp')) %>%
  filter(sample == 'sp id treated') %>%
  select(celltype, sample, pairwise_division)
##          celltype        sample pairwise_division
## 1          bcells sp id treated         0.9676300
## 2  dendriticcells sp id treated         1.1952416
## 3     macrophages sp id treated         0.8434003
## 4       monocytes sp id treated         1.0815668
## 5         nkcells sp id treated         1.3859746
## 6     neutrophils sp id treated         1.0534358
## 7       stemcells sp id treated         0.8532976
## 8    stromalcells sp id treated         0.9427313
## 9        abtcells sp id treated         1.2800739
## 10       gdtcells sp id treated         1.0023222
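The same table can probably be reached without reshaping at all. The sketch below is an alternative, not the approach above: it divides each score by its group's control score and keeps only the treated rows. It assumes every cell type has exactly one control row, and it uses a two-cell-type stand-in for the question's data.

```r
library(dplyr)

# Stand-in for the question's data (first two cell types only).
df <- data.frame(
  celltype = factor(c("bcells", "bcells", "dendriticcells", "dendriticcells")),
  sample = c("sp id control", "sp id treated", "sp id control", "sp id treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(celltype) %>%
  # Divide every score in the group by that group's control score
  # (assumes exactly one control row per cell type).
  mutate(pairwise_division =
           `mean(score)` / `mean(score)`[sample == "sp id control"]) %>%
  ungroup() %>%
  filter(sample == "sp id treated") %>%
  select(celltype, sample, pairwise_division)
```

Because the control row is picked by its label rather than by position, this version does not depend on the order of the rows.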
Equivalent versions are possible in base R and data.table, if you prefer. Or take a more direct route:
aggregate(cbind(pairwise_division = `mean(score)`) ~ celltype,
          df[order(df$celltype, df$sample), ],
          FUN = function(x){x[2]/x[1]})
##          celltype pairwise_division
## 1          bcells         0.9676300
## 2  dendriticcells         1.1952416
## 3     macrophages         0.8434003
## 4       monocytes         1.0815668
## 5         nkcells         1.3859746
## 6     neutrophils         1.0534358
## 7       stemcells         0.8532976
## 8    stromalcells         0.9427313
## 9        abtcells         1.2800739
## 10       gdtcells         1.0023222
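The aggregate() call relies on the control row sorting before the treated row within each cell type. One possible base R variant that matches rows by label instead is sketched below, using merge(); the helper column names `control_score` and `treated_score` are hypothetical, and the data frame is a two-cell-type stand-in for the one in the question.

```r
# Stand-in for the question's data (first two cell types only).
df <- data.frame(
  celltype = factor(c("bcells", "bcells", "dendriticcells", "dendriticcells")),
  sample = c("sp id control", "sp id treated", "sp id control", "sp id treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split into control and treated rows, keeping only the columns needed.
control <- df[df$sample == "sp id control", c("celltype", "mean(score)")]
treated <- df[df$sample == "sp id treated", c("celltype", "sample", "mean(score)")]
names(control)[2] <- "control_score"   # hypothetical helper name
names(treated)[3] <- "treated_score"   # hypothetical helper name

# Join the two halves on cell type, then divide treated by control.
paired <- merge(treated, control, by = "celltype")
paired$pairwise_division <- paired$treated_score / paired$control_score
result <- paired[, c("celltype", "sample", "pairwise_division")]
```

This keeps the `sample` column in the output, which the aggregate() version drops.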