R - Can I replace NAs when joining two data frames with dplyr?


I would like to join two data frames. Some of the column names overlap, and there are NA entries in one of the data frame's overlapping columns. Here is a simplified example:

df1 <- data.frame(fruit = c('apples','oranges','bananas','grapes'),
                  var1 = c(1,2,3,4),
                  var2 = c(3,NA,6,NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c('oranges','grapes'),
                  var2 = c(5,6),
                  var3 = c(7,8),
                  stringsAsFactors = FALSE)

Can I use the dplyr join functions to join these data frames and automatically prioritize the non-NA entry, so that the "var2" column has no NA entries in the joined data frame? As it is now, if I call left_join, it keeps the NA entries, and if I call full_join, it duplicates the rows.
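The behaviour described above is what one would expect if the joins were called without a `by` argument, although the question does not say so; that reading is an assumption. In that case dplyr joins on every shared column, here both fruit and var2, so an NA in df1 never matches the 5 or 6 in df2. A minimal sketch (the names joined_default and joined_by_fruit are made up for illustration):

```r
library(dplyr)

df1 <- data.frame(fruit = c('apples','oranges','bananas','grapes'),
                  var1 = c(1,2,3,4),
                  var2 = c(3,NA,6,NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(fruit = c('oranges','grapes'),
                  var2 = c(5,6),
                  var3 = c(7,8),
                  stringsAsFactors = FALSE)

# No `by`: dplyr joins on all shared columns (fruit AND var2),
# so no rows match and full_join stacks rows from both inputs
joined_default <- suppressMessages(full_join(df1, df2))
nrow(joined_default)

# Joining on fruit only keeps one row per fruit, with the
# overlapping column split into var2.x and var2.y
joined_by_fruit <- full_join(df1, df2, by = "fruit")
names(joined_by_fruit)
```

Joining on fruit alone gives one row per fruit, which is the shape the coalesce approach works on.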

coalesce might be what you need. It fills the NAs in the first vector with the values from the second vector at the corresponding positions:

library(dplyr)

df1 %>%
  left_join(df2, by = "fruit") %>%
  mutate(var2 = coalesce(var2.x, var2.y)) %>%
  select(-var2.x, -var2.y)

#     fruit var1 var3 var2
# 1  apples    1   NA    3
# 2 oranges    2    7    5
# 3 bananas    3   NA    6
# 4  grapes    4    8    6
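To see what coalesce does on its own, here is a minimal sketch on two plain vectors. The names x, y and result are made up for illustration; the values mirror var2.x and var2.y after the join above.

```r
library(dplyr)

# Hypothetical stand-ins for var2.x and var2.y after the left join
x <- c(3, NA, 6, NA)
y <- c(NA, 5, NA, 6)

# Position by position: keep x unless it is NA, otherwise fall back to y
result <- coalesce(x, y)
result
# [1] 3 5 6 6
```

Argument order matters: where both vectors have a value, the first one wins.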

Or use data.table, which does the replacement in place:

library(data.table)

setDT(df1)[setDT(df2), on = "fruit", `:=`(var2 = i.var2, var3 = i.var3)]
df1

#      fruit var1 var2 var3
# 1:  apples    1    3   NA
# 2: oranges    2    5    7
# 3: bananas    3    6   NA
# 4:  grapes    4    6    8
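Note that the update join above overwrites var2 for every matching fruit, not only where it is NA; it gives the desired result here because var2 is NA in df1 for both matched rows. If existing values should be kept, one possible variant is to wrap the assignment in data.table's fcoalesce (available in data.table 1.12.4 and later). The sketch below uses hypothetical tables dt1 and dt2, with the oranges row of dt1 changed to a non-NA value to show the difference:

```r
library(data.table)

# Hypothetical data: oranges already has var2 = 2 in dt1
dt1 <- data.table(fruit = c('apples','oranges','bananas','grapes'),
                  var1 = c(1,2,3,4),
                  var2 = c(3,2,6,NA))
dt2 <- data.table(fruit = c('oranges','grapes'),
                  var2 = c(5,6),
                  var3 = c(7,8))

# In j, var2 is dt1's column and i.var2 is dt2's column;
# fcoalesce keeps dt1's value unless it is NA
dt1[dt2, on = "fruit", `:=`(var2 = fcoalesce(var2, i.var2), var3 = i.var3)]
dt1
# oranges keeps its existing 2; grapes is filled with 6
```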
