scala - Extract substring based on regex to use in RDD.filter


I am trying to filter out rows of a text file whose second column value begins with one of the words in a list.

I have a list such as:

val mylist = List("inter", "intra")

If I have a row like:

cricket inter-house 

inter is in the list, so that row should be filtered out by the RDD.filter operation. I am using the following regex:

`[a-zA-Z0-9]+`

I tried using `"""[a-zA-Z0-9]+""".r` to extract the substring, but the result is a non-empty iterator.

My question is: how do I access the above result in the filter operation?

You need to construct a regular expression like `".* inter.*".r`, since `"""[a-zA-Z0-9]+"""` matches any word. Here is a working example, hope it helps:

val mylist = List("inter", "intra")
val textrdd = sc.parallelize(List("cricket inter-house", "cricket int-house",
                                  "aaa bbb", "cricket intra-house"))

// Map over the list to dynamically construct the regular expressions and check
// whether each one matches the text, then use reduce to make sure none of the
// patterns is present in the text. Call collect() to see the result, or take(5)
// if you only want to see the first 5 results.
(textrdd.filter(text => mylist.map(word => !(".* " + word + ".*").r
                       .pattern.matcher(text).matches).reduce(_&&_)).collect())

// res1: Array[String] = Array(cricket int-house, aaa bbb)
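For the approach the question was attempting (extracting the substring first and then checking it against the list), a minimal sketch might look like the following. The names `prefixes`, `wordRegex`, and `keepRow` are hypothetical and not from the original post. It assumes columns are separated by whitespace, and it runs on a plain Scala `List` so it can be tried without a Spark context. Note that it compares the whole leading alphanumeric run against the list, so `interhouse` would be kept while `inter-house` is dropped.

```scala
// Hypothetical names for illustration: prefixes, wordRegex, keepRow.
val prefixes = Set("inter", "intra")
val wordRegex = """[a-zA-Z0-9]+""".r

// Returns true when the row should be kept, i.e. when the first alphanumeric
// run of the second column is NOT one of the listed words.
def keepRow(line: String): Boolean = {
  val columns = line.trim.split("\\s+") // assumption: whitespace-separated columns
  if (columns.length < 2) true          // no second column, nothing to compare
  else
    // findFirstIn returns an Option[String] rather than an iterator;
    // forall is true for None, so rows with no alphanumeric run are kept.
    wordRegex.findFirstIn(columns(1)).forall(word => !prefixes.contains(word))
}

val rows = List("cricket inter-house", "cricket int-house",
                "aaa bbb", "cricket intra-house")
val kept = rows.filter(keepRow)
```

Since `filter` on an RDD also takes a `String => Boolean` function, the same predicate should be usable as `textrdd.filter(keepRow)`, though that has not been run against a cluster here.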
