Scheduled Baidu Page Submission for a Halo Blog
Preface
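- After finally getting the blog up and running and writing a few posts I thought were somewhat interesting, nobody reads them! The posts were never submitted to a search engine, so nobody can find them. The Next theme does offer a Baidu auto-submission setting, but Baidu's indexing platform no longer provides that push service, so the theme setting is of little use now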
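Baidu's indexing platform offers the three submission methods listed below. API submission is the fastest, so this post uses Java to push Halo blog posts to Baidu
- API submission
- sitemap submission
- manual submission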
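Halo provides an API for listing posts, so the idea is simple: a Java scheduled thread pool fetches all post links from the Halo API at a fixed interval, then calls the Baidu submission API to submit them
- Baidu restricts frequent resubmission of old links. Repeatedly submitting old links lowers the quota and may even cost you access to API push, while regularly submitting new post links may raise the quota somewhat. A simple cache is therefore needed to filter out links that have already been submitted
- Google already crawls links well from the sitemap alone, so separate submission is not required, but Google also recommends actively submitting links through the indexing API. For details, see "Scheduled Google Page Submission for a Halo Blog"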
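The cache described above can be sketched with nothing more than a HashSet. This is a minimal illustration, not the project's actual class: LinkCache and filterNew are hypothetical names, and the example.com links are placeholders. It relies on Set.add returning false for an element that is already present.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: remembers links it has seen and returns only new ones. */
class LinkCache {

    private final Set<String> seen = new HashSet<>();

    /** Returns the links not seen before and records them as seen. */
    List<String> filterNew(List<String> links) {
        // Set.add returns false when the element is already in the set,
        // so the filter keeps only links that were not cached yet
        return links.stream().filter(seen::add).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LinkCache cache = new LinkCache();
        // First round: both links are new
        System.out.println(cache.filterNew(
            List.of("https://example.com/a", "https://example.com/b")));
        // Second round: only the link ending in /c is new
        System.out.println(cache.filterNew(
            List.of("https://example.com/b", "https://example.com/c")));
    }
}
```

Note that a cache like this lives only in memory, so after a restart every link counts as new again and is submitted once more.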
Project Setup
Create a Gradle project with the following build file
    plugins {
        id 'java'
        id 'application'
    }

    group 'xyz.demoli'
    version '1.0-SNAPSHOT'
    sourceCompatibility = 11
    mainClassName = "xyz.demoli.Main"

    repositories {
        mavenCentral()
    }

    application {
        applicationDefaultJvmArgs = ['-Duser.timezone=GMT+8']
    }

    dependencies {
        testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
        testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
        // https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp
        implementation group: 'com.squareup.okhttp3', name: 'okhttp', version: '4.9.3'
        implementation 'com.google.code.gson:gson:2.9.0'
        // https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api
        implementation group: 'org.apache.logging.log4j', name: 'log4j-api', version: '2.14.1'
        // https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core
        implementation group: 'org.apache.logging.log4j', name: 'log4j-core', version: '2.14.1'
        // https://mvnrepository.com/artifact/org.projectlombok/lombok
        compileOnly group: 'org.projectlombok', name: 'lombok', version: '1.18.22'
        annotationProcessor group: 'org.projectlombok', name: 'lombok', version: '1.18.22'
    }

    test {
        useJUnitPlatform()
    }
- The annotationProcessor entry for lombok ensures that Lombok annotations are processed correctly in the Gradle project
- The applicationDefaultJvmArgs setting works around log timestamps not being in the UTC+8 time zone once the service is deployed in a container
The configuration file, config.properties, is as follows:
    prefix=https://blog.demoli.xyz
    postAPI=%s/api/content/posts?api_access_key=%s&page=%d
    apiAccessKey=***
    baiduUrl=http://data.zz.baidu.com/urls?
    token=***
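To make the placeholders in these settings concrete, here is a small sketch of how the two request URLs are assembled from them. It mirrors the String.format call and the string concatenation used by the classes in this post. UrlTemplates and its method names are hypothetical, and KEY and TOKEN stand in for the real secrets.

```java
/** Hypothetical helper showing how the configured templates expand into URLs. */
class UrlTemplates {

    /** Fills the postAPI template: site prefix, access key, zero-based page index. */
    static String postsUrl(String postApi, String prefix, String apiAccessKey, int page) {
        return String.format(postApi, prefix, apiAccessKey, page);
    }

    /** Appends the site and token query parameters to the Baidu endpoint. */
    static String submitUrl(String baiduUrl, String site, String token) {
        return baiduUrl + "site=" + site + "&token=" + token;
    }

    public static void main(String[] args) {
        String prefix = "https://blog.demoli.xyz";
        // prints https://blog.demoli.xyz/api/content/posts?api_access_key=KEY&page=0
        System.out.println(
            postsUrl("%s/api/content/posts?api_access_key=%s&page=%d", prefix, "KEY", 0));
        // prints http://data.zz.baidu.com/urls?site=https://blog.demoli.xyz&token=TOKEN
        System.out.println(submitUrl("http://data.zz.baidu.com/urls?", prefix, "TOKEN"));
    }
}
```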
The logging configuration file is as follows:
The whole project has only two core classes
PostScrap
    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Properties;
    import java.util.Set;
    import java.util.stream.Collectors;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    /**
     * Fetches post links through the Halo API
     */
    public class PostScrap {

        private static String postAPI;
        private static String apiAccessKey;
        private static String prefix;
        // Cache of links that have already been returned
        private static final Set<String> links = new HashSet<>();

        // Note: string values in a properties file must not be quoted
        static {
            try (InputStream stream = PostScrap.class.getResourceAsStream("/config.properties")) {
                Properties properties = new Properties();
                properties.load(stream);
                apiAccessKey = properties.getProperty("apiAccessKey");
                prefix = properties.getProperty("prefix");
                postAPI = properties.getProperty("postAPI");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /**
         * Requests all post links
         *
         * @return the links that have not been returned before
         */
        public static List<String> getPosts() {
            List<String> res = new ArrayList<>();
            OkHttpClient client = new OkHttpClient();
            Request initialRequest = new Request.Builder().get()
                .url(String.format(postAPI, prefix, apiAccessKey, 0))
                .build();
            try (Response response = client.newCall(initialRequest).execute()) {
                res = handlePage(response, client);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return res;
        }

        /**
         * Handles pagination
         *
         * @param initialResponse response for page 0, which also reports the page count
         * @param client          HTTP client reused for the remaining pages
         * @return the links that have not been returned before
         * @throws IOException if the first response body cannot be read
         */
        private static List<String> handlePage(Response initialResponse, OkHttpClient client)
            throws IOException {
            JsonObject jsonObject = new Gson().fromJson(initialResponse.body().string(), JsonObject.class);
            JsonArray array = jsonObject.get("data").getAsJsonObject().get("content").getAsJsonArray();
            int pages = jsonObject.get("data").getAsJsonObject().get("pages").getAsInt();
            // Convert the JsonArray to a List
            List<String> posts = new ArrayList<>();
            for (JsonElement element : array) {
                posts.add(element.getAsJsonObject().get("fullPath").getAsString());
            }
            // Fetch the remaining pages
            for (int i = 1; i < pages; i++) {
                Request request = new Request.Builder().get()
                    .url(String.format(postAPI, prefix, apiAccessKey, i))
                    .build();
                try (Response response = client.newCall(request).execute()) {
                    jsonObject = new Gson().fromJson(response.body().string(), JsonObject.class);
                    array = jsonObject.get("data").getAsJsonObject().get("content").getAsJsonArray();
                    for (JsonElement element : array) {
                        posts.add(element.getAsJsonObject().get("fullPath").getAsString());
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            // Filter through the cache: Set.add returns false for links already seen
            return posts.stream().map(content -> prefix + content).filter(links::add)
                .collect(Collectors.toList());
        }
    }

BaiduSubmitter
    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Optional;
    import java.util.Properties;
    import lombok.extern.log4j.Log4j2;
    import okhttp3.MediaType;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.RequestBody;
    import okhttp3.Response;

    /**
     * Submits links to Baidu
     */
    @Log4j2
    public class BaiduSubmitter {

        private static String submitUrl;

        static {
            try (InputStream stream = PostScrap.class.getResourceAsStream("/config.properties")) {
                Properties properties = new Properties();
                properties.load(stream);
                String baiduUrl = properties.getProperty("baiduUrl");
                String site = properties.getProperty("prefix");
                String token = properties.getProperty("token");
                submitUrl = baiduUrl + "site=" + site + "&token=" + token;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /**
         * Submits the new links
         */
        public static void submit() {
            OkHttpClient client = new OkHttpClient();
            // Join the new links into one newline-separated request body
            Optional<String> urlStrings = PostScrap.getPosts().stream()
                .reduce((out, url) -> out + "\n" + url);
            if (urlStrings.isEmpty()) {
                log.info("No new posts");
                return;
            }
            RequestBody body = RequestBody.create(MediaType.get("text/plain"), urlStrings.get());
            Request request = new Request.Builder().post(body).url(submitUrl)
                .header("Content-Type", "text/plain")
                .build();
            try (Response response = client.newCall(request).execute()) {
                JsonObject jsonObject = new Gson().fromJson(response.body().string(), JsonObject.class);
                if (jsonObject.get("error") != null) {
                    log.error("Submission failed: {}", jsonObject.get("message").getAsString());
                    // An error response has no success/remain fields, so stop here
                    return;
                }
                log.info("Submitted {} links, remaining quota: {}, links:",
                    jsonObject.get("success").getAsInt(), jsonObject.get("remain").getAsInt());
                log.info(urlStrings.get());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
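The request that BaiduSubmitter sends is a plain-text POST whose body holds one link per line. As a dependency-free sketch of that shape, the example below builds the same kind of request with the JDK's built-in java.net.http API (Java 11+) instead of OkHttp, and does not send it. BaiduRequestSketch, its method names, the example.com links and the TOKEN value are all illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

/** Hypothetical sketch: builds (but does not send) a newline-separated link submission. */
class BaiduRequestSketch {

    /** Joins the links into the body format used above: one URL per line. */
    static String body(List<String> urls) {
        return String.join("\n", urls);
    }

    /** Builds a text/plain POST request carrying the joined links. */
    static HttpRequest build(String submitUrl, List<String> urls) {
        return HttpRequest.newBuilder(URI.create(submitUrl))
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(body(urls)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build(
            "http://data.zz.baidu.com/urls?site=https://example.com&token=TOKEN",
            List.of("https://example.com/a", "https://example.com/b"));
        // Inspect the request without sending it
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse(""));
    }
}
```

String.join gives the same body as the reduce call in submit() whenever the list is non-empty. The empty case still needs its own check, because an empty list joins to an empty string.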
Main
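    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class Main {

        public static void main(String[] args) {
            // Run immediately, then again 12 hours after each run finishes
            Executors.newScheduledThreadPool(1)
                .scheduleWithFixedDelay(BaiduSubmitter::submit, 0, 12, TimeUnit.HOURS);
        }
    }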
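One property of scheduleWithFixedDelay matters here: if a run of the task throws an exception, all later runs are suppressed, so a single unexpected failure would silently stop the submissions. A possible safeguard, sketched below, is to wrap the task so that runtime exceptions are reported rather than propagated. SafeScheduling and guarded are hypothetical names, and the 50 ms delay exists only to keep the demo short.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: keeps a periodic task alive even when a run fails. */
class SafeScheduling {

    /** Wraps a task so a runtime exception is reported instead of cancelling the schedule. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("scheduled task failed: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        // The task fails on every run; without the wrapper it would run only once
        pool.scheduleWithFixedDelay(guarded(() -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated failure");
        }), 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        pool.shutdownNow();
        System.out.println("runs = " + runs.get());
    }
}
```

In this project the equivalent call would pass guarded(BaiduSubmitter::submit) to the scheduler.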
Deployment
- In the project root, run
    gradle build -x test
- Copy build/distributions/BaiduSubmitter-1.0-SNAPSHOT.tar to a server with a Java runtime installed, then unpack it and start the service
    tar xf BaiduSubmitter-1.0-SNAPSHOT.tar
    cd BaiduSubmitter-1.0-SNAPSHOT
    nohup bin/BaiduSubmitter > nohup.out &
- View the log with
    tail -f nohup.out
Addendum
- I am a die-hard fan of Docker containers, because containers keep the host environment "clean", so here is how to deploy the service with Docker as well
First copy the package produced by the build, build/distributions/BaiduSubmitter-1.0-SNAPSHOT.tar, to the server, unpack and rename it, and create a Dockerfile
    tar xf BaiduSubmitter-1.0-SNAPSHOT.tar
    mkdir -p blogSubmitter/baiduSubmitter
    mv BaiduSubmitter-1.0-SNAPSHOT blogSubmitter/baiduSubmitter/baidu
    cd blogSubmitter/baiduSubmitter
    touch Dockerfile
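The Dockerfile is as follows:
    FROM openjdk:11
    COPY . /submitter
    WORKDIR /submitter
    # Change the time zone
    RUN rm -rf /etc/localtime
    RUN ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    # Run in the foreground: exec-form CMD uses no shell, so "nohup" and "&" do not apply
    CMD ["baidu/bin/BaiduSubmitter"]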
Create a yaml configuration file and use Docker Compose to build the service
    cd blogSubmitter
    touch submitter.yaml
    version: '3.1'
    services:
      blog-baidu-submitter:
        build: ./baiduSubmitter
        container_name: blogBaiduSubmitter
        restart: unless-stopped
- Create the service by running
    docker-compose -f submitter.yaml up -d
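Notes
- If the source code changes, the image has to be rebuilt, and the previous image must be deleted first (there should be a better way, and this is still to be improved, for example by mounting the package through a volume)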