r - dplyr::mutate: new column = difference between two comma-delimited list columns


This example works:

    library(dplyr)

    df <- data.frame(c0=c(1, 2), c1=c("a,b,c", "d,e,f"), c2=c("b,c", "d,e"))
    df
    #   c0    c1  c2
    # 1  1 a,b,c b,c
    # 2  2 d,e,f d,e

    # add column d as the difference between c1 and c2
    df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
    #   c0    c1  c2 d
    # 1  1 a,b,c b,c a
    # 2  2 d,e,f d,e f

I get what I expected above: d is assigned the difference between these two lists of characters (they are already sorted).

However, if I introduce more than one differing character, it no longer works:

    df <- data.frame(c0=c(1, 2), c1=c("a,b,c", "d,e,f,g"), c2=c("b,c", "d,e"))
    df
    #   c0      c1  c2
    # 1  1   a,b,c b,c
    # 2  2 d,e,f,g d,e

    # add column d as the difference between c1 and c2
    df %>% mutate(d=setdiff(unlist(strsplit(as.character(c1), ",")), unlist(strsplit(as.character(c2), ","))))
    Error: wrong result size (3), expected 2 or 1

What I wanted there is:

      c0      c1  c2   d
    1  1   a,b,c b,c   a
    2  2 d,e,f,g d,e f,g

I've tried adding paste() around the setdiff, but that didn't help. In the end, I want to be able to use tidyr::separate to split the d column out into new rows, like:

      c0      c1  c2 d
    1  1   a,b,c b,c a
    2  2 d,e,f,g d,e f
    3  2 d,e,f,g d,e g

What am I doing wrong with the setdiff above?

Thanks,

Tim

You get the error because at row 2 you have more than one element, which cannot fit into a single cell. One way is to use rowwise and wrap the result in a list so that it can fit, and after that use unnest from tidyr to expand the list-type column:

    library(dplyr)
    library(tidyr)

    df %>%
        rowwise() %>%
        # wrap each row's setdiff result in a list so it fits in one cell
        mutate(d=list(setdiff(unlist(strsplit(as.character(c1), ",")),
                              unlist(strsplit(as.character(c2), ","))))) %>%
        # expand the list column d into one row per element
        unnest(d)

    # Source: local data frame [3 x 4]
    #
    #      c0      c1     c2     d
    #   <dbl>  <fctr> <fctr> <chr>
    # 1     1   a,b,c    b,c     a
    # 2     2 d,e,f,g    d,e     f
    # 3     2 d,e,f,g    d,e     g
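
To see why the original call fails, it may help to run the same expression outside of mutate. The following is only a base R sketch on the data from the question, not part of the original answer. The names whole_column, per_row and long are made up for illustration, and stringsAsFactors = FALSE is assumed so that as.character() is not needed.

```r
# Same data as in the question, kept as character columns (assumption).
df <- data.frame(c0 = c(1, 2),
                 c1 = c("a,b,c", "d,e,f,g"),
                 c2 = c("b,c", "d,e"),
                 stringsAsFactors = FALSE)

# What the original expression computes: unlist() flattens the split
# values of every row into one vector, so setdiff() runs once over the
# whole column instead of once per row.
whole_column <- setdiff(unlist(strsplit(df$c1, ",")),
                        unlist(strsplit(df$c2, ",")))
whole_column  # "a" "f" "g": three values for a two-row data frame

# Per-row alternative: Map() pairs the split lists element by element.
per_row <- Map(setdiff, strsplit(df$c1, ","), strsplit(df$c2, ","))

# Collapse each row's result into one comma-delimited string ("a", "f,g").
df$d <- vapply(per_row, paste, character(1), collapse = ",")

# Or repeat each row once per remaining element to get the long layout.
long <- df[rep(seq_len(nrow(df)), lengths(per_row)), c("c0", "c1", "c2")]
long$d <- unlist(per_row)
```

The three-element whole_column is where the "wrong result size (3)" comes from. The first example only appeared to work because the column-wide difference happened to have exactly two elements, one per row.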
