R - dplyr::mutate: new column = difference between two comma-delimited list columns
This example works:
library(dplyr)

df <- data.frame(c0=c(1, 2), c1=c("a,b,c", "d,e,f"), c2=c("b,c", "d,e"))
df
#   c0    c1  c2
# 1  1 a,b,c b,c
# 2  2 d,e,f d,e

# add column d as the difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
#   c0    c1  c2 d
# 1  1 a,b,c b,c a
# 2  2 d,e,f d,e f
This is what I expected above: d is assigned the difference between these two lists of characters (they are sorted).
However, if I introduce more than one different character, it no longer works:
df <- data.frame(c0=c(1, 2), c1=c("a,b,c", "d,e,f,g"), c2=c("b,c", "d,e"))
df
#   c0      c1  c2
# 1  1   a,b,c b,c
# 2  2 d,e,f,g d,e

# add column d as the difference between c1 and c2
df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
# Error: wrong result size (3), expected 2 or 1
What I wanted there is:
  c0      c1  c2   d
1  1   a,b,c b,c   a
2  2 d,e,f,g d,e f,g
I've tried adding paste() around setdiff, but that didn't help. In the end I want to be able to use tidyr::separate to split out the d column into new rows like:
  c0      c1  c2 d
1  1   a,b,c b,c a
2  2 d,e,f,g d,e f
3  2 d,e,f,g d,e g
What am I doing wrong with setdiff above?
Thanks,
Tim
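You get the error because at row 2 you have more than one element, which cannot fit into a single cell. Without rowwise(), mutate() also evaluates the expression over the whole column at once, so setdiff returns three values for a two-row data frame. One way is to use rowwise, wrap the result in a list so that it can fit, and afterwards use unnest from tidyr to expand the list-type column: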
library(dplyr)
library(tidyr)

df %>%
    rowwise() %>%
    # wrap the per-row difference in a list so that it fits into one cell
    mutate(d=list(setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))) %>%
    # expand the list column d into one row per element
    unnest(d)
# source: local data frame [3 x 4]
#      c0      c1     c2     d
#   <dbl>  <fctr> <fctr> <chr>
# 1     1   a,b,c    b,c     a
# 2     2 d,e,f,g    d,e     f
# 3     2 d,e,f,g    d,e     g
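As an alternative sketch (not part of the answer above), the per-row difference can also be computed without rowwise() by pairing the split vectors with mapply() and collapsing each result back into a comma-delimited string, which gives the d column the question asked for first. tidyr::separate_rows() can then split that column into one row per element. The helper name row_diff is hypothetical, and stringsAsFactors = FALSE is assumed so that c1 and c2 are plain character columns.

```r
library(dplyr)
library(tidyr)

# Assumption: character columns rather than factors.
df <- data.frame(c0 = c(1, 2),
                 c1 = c("a,b,c", "d,e,f,g"),
                 c2 = c("b,c", "d,e"),
                 stringsAsFactors = FALSE)

# row_diff is a hypothetical helper: for each pair of comma-delimited
# strings it returns the elements of x missing from y, joined by commas.
row_diff <- function(x, y) {
  mapply(function(a, b) paste(setdiff(a, b), collapse = ","),
         strsplit(x, ",", fixed = TRUE),
         strsplit(y, ",", fixed = TRUE),
         USE.NAMES = FALSE)
}

# One comma-delimited difference per row.
res <- df %>% mutate(d = row_diff(c1, c2))

# One row per element of d.
long <- res %>% separate_rows(d, sep = ",")
```

Here res keeps two rows with the differences joined by commas, and long holds the expanded version. If a row has no difference at all, paste() yields an empty string for it, so that case may need separate handling.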