Cannot grow BufferHolder by size because the size after growing exceeds size limitation
A problem I ran into with Spark today: running agg directly on a large dataframe overflowed the aggregation buffer with the error above. The workaround is to manually append a new column to the dataframe, do a first agg grouped on that column, and then a final agg on top of the partial results; the buffer no longer overflows. The snippet from the post, cleaned up with the imports it needs:

```python
import random

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

def get_rand(i):
    # Random "salt" value used to split one large group into many small ones.
    return random.randint(1, 10000)

randUdf = udf(get_rand, IntegerType())
# getP = ud...  (the original snippet is cut off here)
```
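Since the original code is truncated, here is a minimal sketch of the full two-stage ("salted") aggregation the post describes. The dataframe `df`, its columns `key` and `value`, and the use of `sum` as the aggregate are all assumptions for illustration; the post does not say which aggregation triggered the error, and some aggregates (e.g. collect_list) need an extra merge step in stage two.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.functions import sum as spark_sum
from pyspark.sql.types import IntegerType
import random

def get_rand(i):
    return random.randint(1, 10000)

randUdf = udf(get_rand, IntegerType())

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: a large dataframe with columns `key` and `value`.
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

# Step 1: append a random "salt" column so the rows of one key are
# spread across up to 10000 smaller sub-groups.
salted = df.withColumn("salt", randUdf(col("key")))

# Step 2: pre-aggregate per (key, salt); each sub-group is small,
# so its aggregation buffer stays under the size limit.
partial = salted.groupBy("key", "salt") \
                .agg(spark_sum("value").alias("partial_sum"))

# Step 3: aggregate the partial results per key for the final answer.
result = partial.groupBy("key") \
                .agg(spark_sum("partial_sum").alias("total"))
result.show()
```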