Background
Data images need to be generated in batches on the back end and saved as snapshots.
Approach
data -> HTML file -> image
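The first step of that pipeline (data to HTML file) can be sketched with the JDK alone. This is only an illustration, not the code used in this post: the class name `SnapshotHtmlWriter`, the table layout, and the sample rows are assumptions. The wrapper id `snapshotImageHtmlBody` is the one the screenshot code later in this post queries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns key/value data into a small HTML file.
class SnapshotHtmlWriter {

    // Escape the characters that would otherwise break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build the page; everything to be captured sits inside the wrapper div.
    static String render(Map<String, String> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html><html><head><meta charset=\"UTF-8\"></head><body>");
        sb.append("<div id=\"snapshotImageHtmlBody\"><table>");
        for (Map.Entry<String, String> e : rows.entrySet()) {
            sb.append("<tr><td>").append(escape(e.getKey())).append("</td><td>")
              .append(escape(e.getValue())).append("</td></tr>");
        }
        sb.append("</table></div></body></html>");
        return sb.toString();
    }

    // Write the rendered page as UTF-8 and return the target path.
    static Path writeTo(Path target, Map<String, String> rows) throws IOException {
        return Files.write(target, render(rows).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("orders", "128");   // sample data, made up for the demo
        rows.put("revenue", "<n/a>");
        Path file = writeTo(Files.createTempFile("snapshot", ".html"), rows);
        System.out.println(file.toUri()); // this file URL is what a renderer would load
    }
}
```

In a real project a template engine or jsoup would probably replace the `StringBuilder`; the point is only that the renderer receives a plain local file URL.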
Solution
I first tried cssbox and phantomjs, but the generated images had styling problems, so I gave up on both.
- cssbox
Maven dependency: net.sf.cssbox:cssbox:4.17
@Test
public void testCssBox() throws Exception {
ImageRenderer render = new ImageRenderer();
System.out.println("start...");
FileInputStream originalIn = new FileInputStream("/Users/shao/Downloads/snapshot.html");
File changedFile = new File("/Users/shao/xxx" + File.separator + "yyy.html");
FileUtils.copyInputStreamToFile(originalIn, changedFile);
String url = changedFile.toURI().toURL().toString();
// try-with-resources closes the output stream even if rendering throws
try (FileOutputStream out = new FileOutputStream(new File("/Users/shao/xxx" + File.separator + "yyy.png"))) {
render.renderURL(url, out, ImageRenderer.Type.PNG);
}
System.out.println("OK");
}
- phantomjs
private static String BLANK = " ";
private static String binPath = "/Users/shao/Downloads/phantomjs-2.1.1-macosx/bin/phantomjs";// path to the phantomjs binary
private static String jsPath = "/Users/shao/Downloads/phantomjs-2.1.1-macosx/examples/rasterize.js";// path to the rasterize.js script
@Test
public void testWebkit() throws Exception {
FileInputStream originalIn = new FileInputStream("/Users/shao/xxx/snapshot.html");
// Modify the HTML file as needed; it is left unchanged here
File changedFile = new File("/Users/xxx" + File.separator + "yyy.html");
FileUtils.copyInputStreamToFile(originalIn, changedFile);
String url = changedFile.toURI().toURL().toString();
String imagePath = "/Users/shao/xxx" + File.separator + "yyy.png";
// Run the external program with Java's Runtime and Process classes
Process process = Runtime.getRuntime().exec(cmd(imagePath, url));
InputStream inputStream = process.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String tmp;
// Drain the output until phantomjs closes it; closing the reader inside the loop would make the next readLine() fail
while ((tmp = reader.readLine()) != null) {
System.out.println(tmp);
}
close(process, reader);
System.out.println("success");
}
// Build the command line
public static String cmd(String imagePath, String url) {
return binPath + BLANK + jsPath + BLANK + url + BLANK + imagePath;
}
// Close the reader and terminate the process
public static void close(Process process, BufferedReader bufferedReader) throws IOException {
if (bufferedReader != null) {
bufferedReader.close();
}
if (process != null) {
process.destroy();
}
}
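As a side note on launching phantomjs, a space-joined command string breaks as soon as a path contains a space. A minimal sketch with `ProcessBuilder`, which takes one list element per argument, could look like the following. The class name `PhantomRunner` is an assumption made for illustration; the argument order (binary, script, URL, output image) follows the `cmd` method above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the phantomjs invocation.
class PhantomRunner {

    // One list element per argument, so paths containing spaces stay intact.
    static List<String> command(String binPath, String jsPath, String url, String imagePath) {
        return Arrays.asList(binPath, jsPath, url, imagePath);
    }

    // Runs the command, echoes its output, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout so only one pipe has to be drained
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // The loop ends when the process closes its output.
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            return process.waitFor();
        } finally {
            process.destroy(); // no effect if the process has already exited
        }
    }
}
```

A non-zero exit code from `run` is the cheapest signal that rendering failed, which the original version never checked.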
Using the Chrome driver reproduced the styles perfectly, so this is the approach that worked.
- Chrome driver
http://chromedriver.storage.googleapis.com/index.html
- jar dependencies (Maven coordinates)
org.jsoup:jsoup:1.6.2
org.seleniumhq.selenium:selenium-java:3.4.0
ru.yandex.qatools.ashot:ashot:1.5.4
net.coobird:thumbnailator:0.4.12
- Java code
// Timeout for WebDriverWait; the Selenium 3.x constructor takes this value in seconds, not milliseconds
private static final int WEB_DRIVER_WAIT_SECONDS = 200;
private static final int SCROLL_TIMEOUT = 200;
private static final int SCROLL_BAR_WIDTH = 20;
private static final String PIC_FORMAT = "PNG";
private static String getImage(File htmlFile) throws IOException {
// 1. URL of the local HTML file; the file itself can be generated or edited with jsoup
String url = htmlFile.toURI().toURL().toString();
// 2. Create the ChromeDriver
WebDriver driver = getWebDriver(url);
try {
// 3. Wait for the page to finish loading
new WebDriverWait(driver, WEB_DRIVER_WAIT_SECONDS).until(dri -> ((JavascriptExecutor) dri)
.executeScript("return document.readyState").equals("complete"));
// 4. Set the browser window size; SCROLL_BAR_WIDTH is the width of the scroll bar captured by the scrolling screenshot, which Thumbnails crops off later
JavascriptExecutor jsExecutor = (JavascriptExecutor) driver;
int width = Integer.parseInt(String.valueOf(jsExecutor.executeScript("return document.getElementById('snapshotImageHtmlBody').clientWidth")));
int height = Integer.parseInt(String.valueOf(jsExecutor.executeScript("return document.getElementById('snapshotImageHtmlBody').clientHeight")));
driver.manage().window().setSize(new Dimension(width + SCROLL_BAR_WIDTH, height));
// 5. Take the scrolling screenshot
Screenshot screenshot = getScreenshot(driver);
// 6. Write the image to a stream; it can also be written straight to a file
ByteArrayOutputStream bos = new ByteArrayOutputStream();
Thumbnails.of(screenshot.getImage())
.sourceRegion(0, 0, width, height)
.size(width, height)
.outputFormat(PIC_FORMAT)
.toOutputStream(bos);
Thumbnails.of(screenshot.getImage())
.sourceRegion(0, 0, width, height)
.size(width, height)
.toFile(new File("/Users/shao/xxx" + File.separator + "yyy.png"));
// 7. Convert to a Base64 string for storage in the database
return Base64.getEncoder().encodeToString(bos.toByteArray());
} finally {
// 8. Always quit the ChromeDriver, even if a step above throws, so no Chrome process is left behind
driver.quit();
}
}
/**
* Take a scrolling screenshot
*/
private static Screenshot getScreenshot(WebDriver driver) {
// Pick the strategy once instead of shooting twice: the local environment uses the Retina strategy with a device pixel ratio of 2
ShootingStrategy strategy = EnvironmentHelper.isLocal()
? ShootingStrategies.viewportRetina(SCROLL_TIMEOUT, 0, 0, 2)
: ShootingStrategies.viewportPasting(SCROLL_TIMEOUT);
return new AShot()
.shootingStrategy(strategy)
.takeScreenshot(driver);
}
/**
* Build the ChromeDriver
*/
private static WebDriver getWebDriver(String url) {
System.setProperty("webdriver.chrome.driver", TempFileUtils.getChromeDriverPath());
ChromeOptions options = new ChromeOptions();
// open Browser in maximized mode
options.addArguments("start-maximized");
// disabling infobars
options.addArguments("disable-infobars");
// disabling extensions
options.addArguments("--disable-extensions");
// applicable to windows os only
options.addArguments("--disable-gpu");
// overcome limited resource problems
options.addArguments("--disable-dev-shm-usage");
// Bypass OS security model
options.addArguments("--no-sandbox");
// Run in headless mode; on Linux there is no need to launch a real browser window
options.addArguments("--headless");
//This will force Chrome to use the /tmp directory instead.
// This may slow down the execution though since disk will be used instead of memory.
//options.addArguments("--disable-dev-shm-usage");
WebDriver driver = new ChromeDriver(options);
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
driver.manage().window().maximize();
driver.get(url);
return driver;
}
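For reference, steps 6 and 7 of getImage (crop off the scroll-bar strip, then Base64-encode the PNG) can also be sketched with the JDK alone. This is an alternative illustration, not the Thumbnailator code used above; the class name `SnapshotEncoder` and the clamping of the crop region to the image bounds are assumptions.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical JDK-only counterpart to the Thumbnails calls.
class SnapshotEncoder {

    // Keep the top-left width x height region (dropping the scroll-bar strip on the right)
    // and return it as a Base64-encoded PNG.
    static String cropToBase64Png(BufferedImage fullShot, int width, int height) {
        int w = Math.min(width, fullShot.getWidth());   // clamp so getSubimage cannot go out of bounds
        int h = Math.min(height, fullShot.getHeight());
        BufferedImage cropped = fullShot.getSubimage(0, 0, w, h);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            ImageIO.write(cropped, "PNG", bos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Decode a stored snapshot back into an image, e.g. to verify what went into the database.
    static BufferedImage decode(String base64) {
        try {
            return ImageIO.read(new ByteArrayInputStream(Base64.getDecoder().decode(base64)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike `Thumbnails.size(...)`, this sketch never rescales, so on a Retina capture the crop region would have to be given in device pixels.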
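- Along the way I ran into all kinds of problems: driver version issues, environment issues (Docker deployment has its own pitfalls), and code issues.
Here is a brief list; I will add details later when I have time.
If you have questions, feel free to leave a comment.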
ERROR org.openqa.selenium.os.OsProcess - org.apache.commons.exec.ExecuteException: Execution failed (Exit value: -559038737. Caused by java.io.IOException: Cannot run program "/home/appuser/driver/chromedriver-linux" (in directory "."): error=2, No such file or directory)
ERROR - test error
org.openqa.selenium.WebDriverException: Timed out waiting for driver server to start.
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
System info: host: 'ants-daily', ip: '10.7.5.222', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-693.17.1.el7.x86_64', java.version: '1.8.0_252'
Driver info: driver.version: ChromeDriver
at org.openqa.selenium.remote.service.DriverService.waitUntilAvailable(DriverService.java:192)
at org.openqa.selenium.remote.service.DriverService.start(DriverService.java:178)
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:79)
ERROR - test error
org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/lib/chromium/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
System info: host: 'ants-daily', ip: '10.7.5.222', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-693.17.1.el7.x86_64', java.version: '1.8.0_252'
Driver info: driver.version: ChromeDriver
remote stacktrace:
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
org.openqa.selenium.WebDriverException: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=83.0.4103.116)
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
System info: host: 'ants-daily', ip: '10.7.5.222', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-693.17.1.el7.x86_64', java.version: '1.8.0_252'
Driver info: org.openqa.selenium.chrome.ChromeDriver
Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 83.0.4103.116, chrome: {chromedriverVersion: 83.0.4103.116 (8f0c18b4dca9..., userDataDir: /tmp/.org.chromium.Chromium...}, goog:chromeOptions: {debuggerAddress: localhost:43842}, javascriptEnabled: true, networkConnectionEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, proxy: Proxy(), setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:virtualAuthenticators: true}
Session ID: 4327692b474f3a18bcac43d95b7b6d33
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException(W3
[appuser@ants-daily tmp]$ df -h /dev/shm/
Filesystem Size Used Avail Use% Mounted on
tmpfs 1.8G 0 1.8G 0% /dev/shm
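The first error in the list above (Cannot run program ... error=2, No such file or directory) surfaces only after Selenium has already tried to start the driver. A small preflight check before setting `webdriver.chrome.driver` fails earlier and with a clearer message. This is a sketch under assumptions: the class name `DriverPreflight` and its messages are made up for illustration, and it only checks that the file exists and is executable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical check to run before System.setProperty("webdriver.chrome.driver", ...).
class DriverPreflight {

    // Returns an empty Optional when the binary looks usable, otherwise a description of the problem.
    static Optional<String> check(String driverPath) {
        Path p = Paths.get(driverPath);
        if (!Files.isRegularFile(p)) {
            return Optional.of("driver not found: " + p);
        }
        if (!Files.isExecutable(p)) {
            return Optional.of("driver is not executable: " + p);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The path is taken from the error log above.
        check("/home/appuser/driver/chromedriver-linux")
                .ifPresent(problem -> System.err.println("preflight failed: " + problem));
    }
}
```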
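- Unresolved issues
The chromedriver code only works on a single thread; running it on multiple threads fails. For now, jobs are distributed through an MQ, and several machines each generate images on a single thread.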
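Within one JVM, the same "one render at a time" rule can be sketched with a single-thread executor. This does not fix the multi-threading failure; it only serializes the work, much like the MQ setup does across machines. The class name `SerialRenderer` is an assumption, and the render function passed in stands for something like the getImage method above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical wrapper: callers may submit from any thread, renders run strictly one after another.
class SerialRenderer<T, R> implements AutoCloseable {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<T, R> renderOne;

    SerialRenderer(Function<T, R> renderOne) {
        this.renderOne = renderOne;
    }

    // Queue all tasks on the single worker thread and collect the results in submission order.
    List<R> renderAll(List<T> tasks) {
        List<Future<R>> futures = new ArrayList<>();
        for (T task : tasks) {
            Callable<R> job = () -> renderOne.apply(task);
            futures.add(worker.submit(job));
        }
        List<R> results = new ArrayList<>();
        for (Future<R> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a render", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("render failed", e.getCause());
            }
        }
        return results;
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Throughput stays that of one thread per JVM, so scaling out still means more processes or machines, as in the MQ approach.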