r - Can I replace NAs when joining two data frames with dplyr?
I want to join two data frames. Some of the column names overlap, and there are NA entries in one of the data frame's overlapping columns. Here is a simplified example:
    df1 <- data.frame(fruit = c('apples', 'oranges', 'bananas', 'grapes'),
                      var1 = c(1, 2, 3, 4),
                      var2 = c(3, NA, 6, NA),
                      stringsAsFactors = FALSE)
    df2 <- data.frame(fruit = c('oranges', 'grapes'),
                      var2 = c(5, 6),
                      var3 = c(7, 8),
                      stringsAsFactors = FALSE)
Can I use the dplyr join functions to join these data frames and automatically prioritize the non-NA entry, so that the "var2" column has no NA entries in the joined data frame? As it is now, if I call left_join, it keeps the NA entries, and if I call full_join, it duplicates the rows.
coalesce might be what you need. It fills the NAs in the first vector with the values from the second vector at the corresponding positions:
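    library(dplyr)

    df1 %>%
        left_join(df2, by = "fruit") %>%                # overlapping var2 becomes var2.x / var2.y
        mutate(var2 = coalesce(var2.x, var2.y)) %>%     # take var2.x, fall back to var2.y where NA
        select(-var2.x, -var2.y)                        # drop the two helper columns

    #     fruit var1 var3 var2
    # 1  apples    1   NA    3
    # 2 oranges    2    7    5
    # 3 bananas    3   NA    6
    # 4  grapes    4    8    6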
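To see what coalesce does on its own, here is a minimal sketch on two plain vectors. The names x, y and filled are made up for illustration; the values mirror what var2.x and var2.y hold after the left_join above.

```r
library(dplyr)

# x plays the role of var2.x (from df1), y the role of var2.y (from df2)
x <- c(3, NA, 6, NA)
y <- c(NA, 5, NA, 6)

# For each position, keep x unless it is NA, in which case use y
filled <- coalesce(x, y)
filled
# [1] 3 5 6 6
```

A position stays NA only if it is NA in both vectors, so this removes every NA here because each gap in x has a value in y.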
Or use data.table, which does the replacement in place:
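    library(data.table)

    # Update join: for rows of df1 that match df2 on fruit,
    # set var2 and var3 from df2's columns (prefixed with i.)
    setDT(df1)[setDT(df2), on = "fruit", `:=` (var2 = i.var2, var3 = i.var3)]
    df1

    #      fruit var1 var2 var3
    # 1:  apples    1    3   NA
    # 2: oranges    2    5    7
    # 3: bananas    3    6   NA
    # 4:  grapes    4    6    8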
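Note that the update join above overwrites var2 for every matching fruit, whether or not the existing value is NA. If you only want to fill the gaps, one possible variant is sketched below. It assumes a data.table version that provides fcoalesce (1.12.4 or later, as far as I know), and dt1 and dt2 are hypothetical copies of the question's data, so the original data frames are left untouched.

```r
library(data.table)

# Copies of the question's data (dt1 / dt2 are illustrative names)
dt1 <- data.table(fruit = c('apples', 'oranges', 'bananas', 'grapes'),
                  var1 = c(1, 2, 3, 4),
                  var2 = c(3, NA, 6, NA))
dt2 <- data.table(fruit = c('oranges', 'grapes'),
                  var2 = c(5, 6),
                  var3 = c(7, 8))

# In the join, var2 refers to dt1's column and i.var2 to dt2's.
# fcoalesce keeps dt1's value and only falls back to dt2's where it is NA.
dt1[dt2, on = "fruit", `:=` (var2 = fcoalesce(var2, i.var2), var3 = i.var3)]

# dt1$var2 is now 3, 5, 6, 6 and dt1$var3 is NA, 7, NA, 8
```

With this particular data the result is the same as the plain update join, because every fruit in dt2 has an NA in dt1's var2. The two would differ only if dt1 already held a non-NA value for a matching fruit.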