Stripping punctuation, whitespace, and stop words from titles with an open-source NLP package (HanLP):

        <dependency>
            <groupId>com.hankcs</groupId>
            <artifactId>hanlp</artifactId>
            <version>portable-1.7.8</version>
        </dependency>

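Usage:
 HanLP.segment(text)
                    .stream()
                    .map(t -> HanLP.convertToSimplifiedChinese(t.word)) // normalize Traditional to Simplified Chinese
                    .filter(NormalizeUtils::checkStringContainChinese) // token must contain Chinese characters
                    .filter(w -> !CoreStopWordDictionary.contains(w) && !StringUtils.isEmpty(w)) // drop stop words and empty tokens
                    .filter(w -> !w.matches(
                            "[\\pP\\p{Punct}]+")) // drop tokens made up entirely of punctuation
                    .collect(Collectors.joining(""));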
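
The pipeline above relies on NormalizeUtils::checkStringContainChinese, whose implementation is not shown in this post. As a rough sketch only, and assuming the helper simply checks for at least one Han character, it could look like the following. The class name NormalizeUtilsSketch is hypothetical, chosen to avoid suggesting this is the original code.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the NormalizeUtils helper used in the pipeline;
// the real implementation is not shown in the post.
final class NormalizeUtilsSketch {
    // Matches any single character belonging to the Han script.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    private NormalizeUtilsSketch() {}

    // Returns true if the string contains at least one Chinese (Han) character.
    public static boolean checkStringContainChinese(String s) {
        return s != null && HAN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(checkStringContainChinese("中文"));      // true
        System.out.println(checkStringContainChinese("-----"));     // false
        System.out.println(checkStringContainChinese("HanLP分词")); // true
    }
}
```

Because this filter already discards every token without a Chinese character, pure-punctuation tokens are removed here before the later punctuation filter runs.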

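// Required imports (StringUtils comes from whichever utility library the project uses):
// import com.hankcs.hanlp.HanLP;
// import com.hankcs.hanlp.dictionary.stopword.CoreStopWordDictionary;
// import java.util.stream.Collectors;

// Input text
String title = "中文-----";
// Strip stop words and punctuation; the result is a String
title = HanLP.segment(title)
        .stream()
        .map(t -> HanLP.convertToSimplifiedChinese(t.word))
        .filter(w -> !CoreStopWordDictionary.contains(w) && !StringUtils.isEmpty(w))
        .filter(w -> !w.matches("[\\pP\\p{Punct}]+")) // drop tokens made up entirely of punctuation
        .collect(Collectors.joining(""));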
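
One detail worth checking in the punctuation filter: String.matches tests the whole token, so a character class without a quantifier only rejects tokens that are exactly one punctuation character long. The sketch below uses only the JDK and a hand-written token list standing in for segmenter output (an assumption, since how HanLP splits a run such as "-----" depends on its dictionary). The names PunctuationFilterDemo, PUNCT_RUN, and joinNonPunctuation are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PunctuationFilterDemo {
    // One or more punctuation characters: \pP is the Unicode punctuation category,
    // \p{Punct} is the ASCII punctuation class (which also covers symbols such as + and $).
    static final String PUNCT_RUN = "[\\pP\\p{Punct}]+";

    private PunctuationFilterDemo() {}

    // Joins all tokens that are not made up entirely of punctuation.
    public static String joinNonPunctuation(List<String> tokens) {
        return tokens.stream()
                .filter(w -> !w.matches(PUNCT_RUN))
                .collect(Collectors.joining(""));
    }

    public static void main(String[] args) {
        // Hand-written tokens, not real HanLP output.
        List<String> tokens = Arrays.asList("中文", "-----", "标题", "。");
        System.out.println(joinNonPunctuation(tokens));          // 中文标题
        System.out.println("-----".matches("[\\pP\\p{Punct}]")); // false: the class matches one character only
    }
}
```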

 
