spark not contain



1. You can negate a predicate using either not or !, so all that's left is to add another condition:


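import org.apache.spark.sql.functions.not
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

// keep referrers containing "www.mydomain." unless they also match the pattern to exclude
// ("www.mydomain.test" is a hypothetical placeholder; substitute your own)
df.where($"referrer".contains("www.mydomain.") &&
  not($"referrer".contains("www.mydomain.test")))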
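A self-contained sketch of the same idea, showing that not(...) and the Column's unary ! operator express the same negation. The sample referrers and the excluded pattern "www.mydomain.test" are hypothetical, and col(...) is used instead of $"..." so the helper functions do not depend on implicits.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, not}

object NegationExample {
  // Negation with functions.not(...).
  // "www.mydomain.test" is a hypothetical pattern standing in for what you want to exclude.
  def withNot(df: DataFrame): DataFrame =
    df.where(col("referrer").contains("www.mydomain.") &&
      not(col("referrer").contains("www.mydomain.test")))

  // Same predicate, negated with the Column's unary ! operator instead of not().
  def withBang(df: DataFrame): DataFrame =
    df.where(col("referrer").contains("www.mydomain.") &&
      !col("referrer").contains("www.mydomain.test"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("negation").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(
      "http://www.mydomain.com/page",
      "http://www.mydomain.test/page",
      "http://www.other.com/"
    ).toDF("referrer")

    withNot(df).show(truncate = false)
    withBang(df).show(truncate = false)
    spark.stop()
  }
}
```

Both calls keep only the first referrer; which form to use is a matter of taste.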

2. Or apply the negated condition in a separate filter (a second where call).
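A minimal sketch of the separate-filter variant, assuming the same hypothetical exclusion pattern "www.mydomain.test": the positive condition and the negated one go into two chained where calls. Chained filters are combined with AND, and Spark's optimizer usually merges them into a single filter, so this should select the same rows as the single && predicate.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, not}

object SeparateFilterExample {
  // First filter: keep referrers that mention "www.mydomain.".
  // Second filter: drop those matching the hypothetical exclusion pattern.
  def keepMyDomain(df: DataFrame): DataFrame =
    df.where(col("referrer").contains("www.mydomain."))
      .where(not(col("referrer").contains("www.mydomain.test")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("separate-filter").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(
      "http://www.mydomain.com/page",
      "http://www.mydomain.test/page",
      "http://www.other.com/"
    ).toDF("referrer")

    keepMyDomain(df).show(truncate = false)
    spark.stop()
  }
}
```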



Alternatively, you may use a regex. See the Scala reference on regular expressions (scala.util.matching.Regex) for how to use them, and look up guidance on writing a proper regex for URLs.

Thus in your case you will have something like:

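import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

val regex = "PUT_YOUR_REGEX_HERE".r // something like (https?|ftp)://[^\s]* should work
// findFirstIn expects a String, not a Column, so run the check inside a (null-safe) UDF
val matchesRegex = udf((s: String) => s != null && regex.findFirstIn(s).isDefined)
val filteredDf = unfilteredDf.filter(matchesRegex($"referrer"))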
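If you do not need Scala's Regex class specifically, Column.rlike applies a Java regex directly in Spark, matching anywhere in the value, so no UDF is required. A sketch under that assumption; the pattern is the URL-like one suggested above with balanced parentheses, and the sample data is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RlikeFilterExample {
  // URL-like pattern with balanced parentheses. The backslash is doubled
  // because this is an ordinary (non-triple-quoted) string literal.
  val urlPattern: String = "(https?|ftp)://[^\\s]*"

  // rlike keeps rows whose value contains a match; null referrers
  // evaluate to null and are dropped by filter.
  def keepUrls(df: DataFrame): DataFrame =
    df.filter(col("referrer").rlike(urlPattern))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("rlike-filter").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(
      "https://www.mydomain.com/a",
      "not a url",
      "ftp://files.example.org/x"
    ).toDF("referrer")

    keepUrls(df).show(truncate = false)
    spark.stop()
  }
}
```

To express "does not match" instead, negate it the same way as before: not(col("referrer").rlike(urlPattern)).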

