一、问题:Windows 下 flume采集配置TAILDIR的时候,会报如下错误:
agent.sources.seqGenSrc.type = TAILDIR agent.sources.seqGenSrc.positionFile = .\\taildir_mongodb_position.json agent.sources.seqGenSrc.filegroups = seqGenSrc agent.sources.seqGenSrc.filegroups.seqGenSrc = D:\\bigdata-tax-crawler-python\\results\\jiangsu.log agent.sources.seqGenSrc.fileHeader = false
错误日志:
java.lang.UnsupportedOperationException: View 'unix' not available at sun.nio.fs.AbstractFileSystemProvider.readAttributes(AbstractFileSystemProvider.java:91) at java.nio.file.Files.readAttributes(Files.java:1964) at java.nio.file.Files.getAttribute(Files.java:1869) at org.apache.flume.source.taildir.ReliableTaildirEventReader.getInode(ReliableTaildirEventReader.java:284) at org.apache.flume.source.taildir.ReliableTaildirEventReader.updateTailFiles(ReliableTaildirEventReader.java:248) at org.apache.flume.source.taildir.ReliableTaildirEventReader.(ReliableTaildirEventReader.java:93) at org.apache.flume.source.taildir.ReliableTaildirEventReader. (ReliableTaildirEventReader.java:49) at org.apache.flume.source.taildir.ReliableTaildirEventReader$Builder.build(ReliableTaildirEventReader.java:355) at org.apache.flume.source.taildir.TaildirSource.start(TaildirSource.java:105) at org.apache.flume.source.PollableSourceRunner.start(PollableSourceRunner.java:71) at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
二、问题原因:flume的源码(此处以1.9版本为例)中ReliableTaildirEventReader.java获取inode时通过inode = (long) Files.getAttribute(file.toPath(), "unix:ino");进行获取,该方法只支持unix系统,无法支持windows操作系统,故而报错。
TaildirSource动态监听文件变化的技术基础就是获取文件的inode,建立inode和文件之间的一一对应关系,利用RandomAccessFile去读取文件,并将inode和读取的位置以及文件位置保存成json文件进行持久化,以便后续的继续跟踪。inode是linux文件的概念,而获取inode是在ReliableTaildirEventReader的getInode方法里,在这个方法里是不支持unix操作系统的。TaildirSource的思想是获取一个文件的标识(linux里inode可以作为文件的标识使用,当系统读取文件时,其实就是根据文件路径转换成对应的inode值来做的操作)并记录对应的文件路径,windows中是有file id这种类似于inode的存在的,file id是跟文件系统有关的, 在FAT系统中,如果修改的名字长于旧名字,file id可能会发生改变,但是在NTFS系统中,在删除之前file id都是稳定的。如果是windows系统 并且文件系统是ntfs,那么我们就使用file id去获取文件作为inode。
本文作者:张永清 转载请注明来源于博客园:https://www.cnblogs.com/laoqing/p/12836826.html
1.9中flume-taildir-source 中ReliableTaildirEventReader.java中的完整源码如下(ReliableTaildirEventReader.java中的284行的方法只能运行于unix操作系统):
1 /* 2 * Licensed to the Apache Software Foundation (ASF) under one 3 * or more contributor license agreements. See the NOTICE file 4 * distributed with this work for additional information 5 * regarding copyright ownership. The ASF licenses this file 6 * to you under the Apache License, Version 2.0 (the 7 * "License"); you may not use this file except in compliance 8 * with the License. You may obtain a copy of the License at 9 * 10 * http://www.apache.org/licenses/LICENSE-2.0 11 * 12 * Unless required by applicable law or agreed to in writing, 13 * software distributed under the License is distributed on an 14 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 * KIND, either express or implied. See the License for the 16 * specific language governing permissions and limitations 17 * under the License. 18 */ 19 20 package org.apache.flume.source.taildir; 21 22 import com.google.common.annotations.VisibleForTesting; 23 import com.google.common.base.Preconditions; 24 import com.google.common.collect.Lists; 25 import com.google.common.collect.Maps; 26 import com.google.common.collect.Table; 27 import com.google.gson.stream.JsonReader; 28 import org.apache.flume.Event; 29 import org.apache.flume.FlumeException; 30 import org.apache.flume.annotations.InterfaceAudience; 31 import org.apache.flume.annotations.InterfaceStability; 32 import org.apache.flume.client.avro.ReliableEventReader; 33 import org.apache.flume.source.taildir.util.WinFileUtil; 34 import org.slf4j.Logger; 35 import org.slf4j.LoggerFactory; 36 37 import java.io.File; 38 import java.io.FileNotFoundException; 39 import java.io.FileReader; 40 import java.io.IOException; 41 import java.nio.file.Files; 42 import java.nio.file.NoSuchFileException; 43 import java.util.Arrays; 44 import java.util.List; 45 import java.util.Map; 46 import java.util.Map.Entry; 47 48 @InterfaceAudience.Private 49 @InterfaceStability.Evolving 50 public class ReliableTaildirEventReader implements ReliableEventReader { 51 private static final Logger logger = LoggerFactory.getLogger(ReliableTaildirEventReader.class); 52 53 private final ListtaildirCache; 54 private final Table headerTable; 55 56 private TailFile currentFile = null; 57 private Map tailFiles = Maps.newHashMap(); 58 private long updateTime; 59 private boolean addByteOffset; 60 private boolean cachePatternMatching; 61 private boolean committed = true; 62 private final boolean annotateFileName; 63 private final String fileNameHeader; 64 public static final String OS_NAME = System.getProperty("os.name").toLowerCase(); 65 /** 66 * Create a ReliableTaildirEventReader to watch the given directory. 67 */ 68 private ReliableTaildirEventReader(Map filePaths, 69 Table headerTable, String positionFilePath, 70 boolean skipToEnd, boolean addByteOffset, boolean cachePatternMatching, 71 boolean annotateFileName, String fileNameHeader) throws IOException { 72 // Sanity checks 73 Preconditions.checkNotNull(filePaths); 74 Preconditions.checkNotNull(positionFilePath); 75 76 if (logger.isDebugEnabled()) { 77 logger.debug("Initializing {} with directory={}, metaDir={}", 78 new Object[] { ReliableTaildirEventReader.class.getSimpleName(), filePaths }); 79 } 80 81 List taildirCache = Lists.newArrayList(); 82 for (Entry e : filePaths.entrySet()) { 83 taildirCache.add(new TaildirMatcher(e.getKey(), e.getValue(), cachePatternMatching)); 84 } 85 logger.info("taildirCache: " + taildirCache.toString()); 86 logger.info("headerTable: " + headerTable.toString()); 87 88 this.taildirCache = taildirCache; 89 this.headerTable = headerTable; 90 this.addByteOffset = addByteOffset; 91 this.cachePatternMatching = cachePatternMatching; 92 this.annotateFileName = annotateFileName; 93 this.fileNameHeader = fileNameHeader; 94 updateTailFiles(skipToEnd); 95 96 logger.info("Updating position from position file: " + positionFilePath); 97 loadPositionFile(positionFilePath); 98 } 99 100 /** 101 * Load a position file which has the last read position of each file. 102 * If the position file exists, update tailFiles mapping. 103 */ 104 public void loadPositionFile(String filePath) { 105 Long inode, pos; 106 String path; 107 FileReader fr = null; 108 JsonReader jr = null; 109 try { 110 fr = new FileReader(filePath); 111 jr = new JsonReader(fr); 112 jr.beginArray(); 113 while (jr.hasNext()) { 114 inode = null; 115 pos = null; 116 path = null; 117 jr.beginObject(); 118 while (jr.hasNext()) { 119 switch (jr.nextName()) { 120 case "inode": 121 inode = jr.nextLong(); 122 break; 123 case "pos": 124 pos = jr.nextLong(); 125 break; 126 case "file": 127 path = jr.nextString(); 128 break; 129 } 130 } 131 jr.endObject(); 132 133 for (Object v : Arrays.asList(inode, pos, path)) { 134 Preconditions.checkNotNull(v, "Detected missing value in position file. " 135 + "inode: " + inode + ", pos: " + pos + ", path: " + path); 136 } 137 TailFile tf = tailFiles.get(inode); 138 if (tf != null && tf.updatePos(path, inode, pos)) { 139 tailFiles.put(inode, tf); 140 } else { 141 logger.info("Missing file: " + path + ", inode: " + inode + ", pos: " + pos); 142 } 143 } 144 jr.endArray(); 145 } catch (FileNotFoundException e) { 146 logger.info("File not found: " + filePath + ", not updating position"); 147 } catch (IOException e) { 148 logger.error("Failed loading positionFile: " + filePath, e); 149 } finally { 150 try { 151 if (fr != null) fr.close(); 152 if (jr != null) jr.close(); 153 } catch (IOException e) { 154 logger.error("Error: " + e.getMessage(), e); 155 } 156 } 157 } 158 159 public Map getTailFiles() { 160 return tailFiles; 161 } 162 163 public void setCurrentFile(TailFile currentFile) { 164 this.currentFile = currentFile; 165 } 166 167 @Override 168 public Event readEvent() throws IOException { 169 List events = readEvents(1); 170 if (events.isEmpty()) { 171 return null; 172 } 173 return events.get(0); 174 } 175 176 @Override 177 public List readEvents(int numEvents) throws IOException { 178 return readEvents(numEvents, false); 179 } 180 181 @VisibleForTesting 182 public List readEvents(TailFile tf, int numEvents) throws IOException { 183 setCurrentFile(tf); 184 return readEvents(numEvents, true); 185 } 186 187 public List readEvents(int numEvents, boolean backoffWithoutNL) 188 throws IOException { 189 if (!committed) { 190 if (currentFile == null) { 191 throw new IllegalStateException("current file does not exist. " + currentFile.getPath()); 192 } 193 logger.info("Last read was never committed - resetting position"); 194 long lastPos = currentFile.getPos(); 195 currentFile.updateFilePos(lastPos); 196 } 197 List events = currentFile.readEvents(numEvents, backoffWithoutNL, addByteOffset); 198 if (events.isEmpty()) { 199 return events; 200 } 201 202 Map headers = currentFile.getHeaders(); 203 if (annotateFileName || (headers != null && !headers.isEmpty())) { 204 for (Event event : events) { 205 if (headers != null && !headers.isEmpty()) { 206 event.getHeaders().putAll(headers); 207 } 208 if (annotateFileName) { 209 event.getHeaders().put(fileNameHeader, currentFile.getPath()); 210 } 211 } 212 } 213 committed = false; 214 return events; 215 } 216 217 @Override 218 public void close() throws IOException { 219 for (TailFile tf : tailFiles.values()) { 220 if (tf.getRaf() != null) tf.getRaf().close(); 221 } 222 } 223 224 /** Commit the last lines which were read. */ 225 @Override 226 public void commit() throws IOException { 227 if (!committed && currentFile != null) { 228 long pos = currentFile.getLineReadPos(); 229 currentFile.setPos(pos); 230 currentFile.setLastUpdated(updateTime); 231 committed = true; 232 } 233 } 234 235 /** 236 * Update tailFiles mapping if a new file is created or appends are detected 237 * to the existing file. 238 */ 239 public List updateTailFiles(boolean skipToEnd) throws IOException { 240 updateTime = System.currentTimeMillis(); 241 List updatedInodes = Lists.newArrayList(); 242 243 for (TaildirMatcher taildir : taildirCache) { 244 Map headers = headerTable.row(taildir.getFileGroup()); 245 246 for (File f : taildir.getMatchingFiles()) { 247 long inode; 248 try { 249 inode = getInode(f); 250 } catch (NoSuchFileException e) { 251 logger.info("File has been deleted in the meantime: " + e.getMessage()); 252 continue; 253 } 254 TailFile tf = tailFiles.get(inode); 255 if (tf == null || !tf.getPath().equals(f.getAbsolutePath())) { 256 long startPos = skipToEnd ? f.length() : 0; 257 tf = openFile(f, headers, inode, startPos); 258 } else { 259 boolean updated = tf.getLastUpdated() < f.lastModified() || tf.getPos() != f.length(); 260 if (updated) { 261 if (tf.getRaf() == null) { 262 tf = openFile(f, headers, inode, tf.getPos()); 263 } 264 if (f.length() < tf.getPos()) { 265 logger.info("Pos " + tf.getPos() + " is larger than file size! " 266 + "Restarting from pos 0, file: " + tf.getPath() + ", inode: " + inode); 267 tf.updatePos(tf.getPath(), inode, 0); 268 } 269 } 270 tf.setNeedTail(updated); 271 } 272 tailFiles.put(inode, tf); 273 updatedInodes.add(inode); 274 } 275 } 276 return updatedInodes; 277 } 278 279 public List updateTailFiles() throws IOException { 280 return updateTailFiles(false); 281 } 282 283 284 private long getInode(File file) throws IOException { 285 long inode = (long) Files.getAttribute(file.toPath(), "unix:ino"); 286 return inode; 287 } 288 289 private TailFile openFile(File file, Map headers, long inode, long pos) { 290 try { 291 logger.info("Opening file: " + file + ", inode: " + inode + ", pos: " + pos); 292 return new TailFile(file, headers, inode, pos); 293 } catch (IOException e) { 294 throw new FlumeException("Failed opening file: " + file, e); 295 } 296 } 297 298 /** 299 * Special builder class for ReliableTaildirEventReader 300 */ 301 public static class Builder { 302 private Map filePaths; 303 private Table headerTable; 304 private String positionFilePath; 305 private boolean skipToEnd; 306 private boolean addByteOffset; 307 private boolean cachePatternMatching; 308 private Boolean annotateFileName = 309 TaildirSourceConfigurationConstants.DEFAULT_FILE_HEADER; 310 private String fileNameHeader = 311 TaildirSourceConfigurationConstants.DEFAULT_FILENAME_HEADER_KEY; 312 313 public Builder filePaths(Map filePaths) { 314 this.filePaths = filePaths; 315 return this; 316 } 317 318 public Builder headerTable(Table headerTable) { 319 this.headerTable = headerTable; 320 return this; 321 } 322 323 public Builder positionFilePath(String positionFilePath) { 324 this.positionFilePath = positionFilePath; 325 return this; 326 } 327 328 public Builder skipToEnd(boolean skipToEnd) { 329 this.skipToEnd = skipToEnd; 330 return this; 331 } 332 333 public Builder addByteOffset(boolean addByteOffset) { 334 this.addByteOffset = addByteOffset; 335 return this; 336 } 337 338 public Builder cachePatternMatching(boolean cachePatternMatching) { 339 this.cachePatternMatching = cachePatternMatching; 340 return this; 341 } 342 343 public Builder annotateFileName(boolean annotateFileName) { 344 this.annotateFileName = annotateFileName; 345 return this; 346 } 347 348 public Builder fileNameHeader(String fileNameHeader) { 349 this.fileNameHeader = fileNameHeader; 350 return this; 351 } 352 353 public ReliableTaildirEventReader build() throws IOException { 354 return new ReliableTaildirEventReader(filePaths, headerTable, positionFilePath, skipToEnd, 355 addByteOffset, cachePatternMatching, 356 annotateFileName, fileNameHeader); 357 } 358 } 359 360 }
三、问题解决方式(windows如何支持tail和taildir):
1、增加tail 命令支持,windows中并没有tail 命令。可以通过链接: https://files.cnblogs.com/files/laoqing/tail.zip 下载tail 命令放到windows32 目录下。
2、修改源码来支持taildir。
在flume的flume-taildir-source工程中引入如下依赖:
本文作者:张永清 转载请注明来源于博客园:https://www.cnblogs.com/laoqing/p/12836826.html
net.java.dev.jna jna 4.2.2 net.java.dev.jna jna-platform 4.2.2
1)、新增Kernel32.java
/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * KIND, either express or implied. See the License for the * specific language governing permissions and limitations * under the License. */ package org.apache.flume.source.taildir.util; import java.util.ArrayList; import java.util.Arrays; import java.util.HashMap; import java.util.List; import java.util.Map; import com.sun.jna.Library; import com.sun.jna.Native; import com.sun.jna.Structure; import com.sun.jna.platform.win32.WinBase.FILETIME; import com.sun.jna.platform.win32.WinDef.DWORD; import com.sun.jna.platform.win32.WinNT.HANDLE; import com.sun.jna.win32.StdCallLibrary; import com.sun.jna.win32.W32APIFunctionMapper; import com.sun.jna.win32.W32APITypeMapper; /** * Created by zhangyongqing on 2020-05-06. */ public interface Kernel32 extends StdCallLibrary { final static Map WIN32API_OPTIONS = new HashMap() { private static final long serialVersionUID = 1L; { put(Library.OPTION_FUNCTION_MAPPER, W32APIFunctionMapper.UNICODE); put(Library.OPTION_TYPE_MAPPER, W32APITypeMapper.UNICODE); } }; Kernel32 INSTANCE = (Kernel32) Native.loadLibrary("Kernel32", Kernel32.class, WIN32API_OPTIONS); int GetLastError(); class BY_HANDLE_FILE_INFORMATION extends Structure { public DWORD dwFileAttributes; public FILETIME ftCreationTime; public FILETIME ftLastAccessTime; public FILETIME ftLastWriteTime; public DWORD dwVolumeSerialNumber; public DWORD nFileSizeHigh; public DWORD nFileSizeLow; public DWORD nNumberOfLinks; public DWORD nFileIndexHigh; public DWORD nFileIndexLow; public static class ByReference extends BY_HANDLE_FILE_INFORMATION implements Structure.ByReference { } ; public static class ByValue extends BY_HANDLE_FILE_INFORMATION implements Structure.ByValue { } @Override protected List getFieldOrder() { List fields = new ArrayList(); fields.addAll(Arrays.asList(new String[]{"dwFileAttributes", "ftCreationTime", "ftLastAccessTime", "ftLastWriteTime", "dwVolumeSerialNumber", "nFileSizeHigh", "nFileSizeLow", "nNumberOfLinks", "nFileIndexHigh", "nFileIndexLow"})); return fields; } ; } ; boolean GetFileInformationByHandle(HANDLE hFile, BY_HANDLE_FILE_INFORMATION lpFileInformation); }
2)、新增WinFileUtil.java
/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * KIND, either express or implied. See the License for the * specific language governing permissions and limitations * under the License. */ package org.apache.flume.source.taildir.util; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import com.sun.jna.platform.win32.Kernel32; import com.sun.jna.platform.win32.WinBase; import com.sun.jna.platform.win32.WinNT.HANDLE; import java.io.File; import java.nio.file.Files; /** * Created by zhangyongqing on 2020-05-06. */ public class WinFileUtil { public static WinFileUtil getWinFile(){ return new WinFileUtil(); } private static Logger logger = LoggerFactory.getLogger(WinFileUtil.class); public static String getFileId(String filepath) { final int FILE_SHARE_READ = (0x00000001); final int OPEN_EXISTING = (3); final int GENERIC_READ = (0x80000000); final int FILE_ATTRIBUTE_ARCHIVE = (0x20); WinBase.SECURITY_ATTRIBUTES attr = null; org.apache.flume.source.taildir.util.Kernel32.BY_HANDLE_FILE_INFORMATION lpFileInformation = new org.apache.flume.source.taildir.util.Kernel32.BY_HANDLE_FILE_INFORMATION(); HANDLE hFile = null; hFile = Kernel32.INSTANCE.CreateFile(filepath, 0, FILE_SHARE_READ, attr, OPEN_EXISTING, FILE_ATTRIBUTE_ARCHIVE, null); String ret = "0"; if (Kernel32.INSTANCE.GetLastError() == 0) { org.apache.flume.source.taildir.util.Kernel32.INSTANCE .GetFileInformationByHandle(hFile, lpFileInformation); ret = lpFileInformation.dwVolumeSerialNumber.toString() + lpFileInformation.nFileIndexLow.toString(); Kernel32.INSTANCE.CloseHandle(hFile); if (Kernel32.INSTANCE.GetLastError() == 0) { logger.debug("inode:" + ret); return ret; } else { logger.error("close file:{} cause exception", filepath); throw new RuntimeException("close file:" + filepath+" cause Exception"); } } else { if (hFile != null) { Kernel32.INSTANCE.CloseHandle(hFile); } logger.error("open file:{} cause Exception", filepath); throw new RuntimeException("open file :" + filepath+" cause Exception"); } } }
3)、修改ReliableTaildirEventReader.java 中的private long getInode(File file) throws IOException 方法,替换为如下代码
private long getInode(File file) throws IOException { long inode; if (OS_NAME.contains("windows")) { inode = Long.parseLong(WinFileUtil.getFileId(file.toPath().toString())); } else { inode = (long) Files.getAttribute(file.toPath(), "unix:ino"); } return inode; }
4)、重新编译打包flume-taildir-source工程,将生成的flume-taildir-source-1.9.0.jar包替换到flume的lib目录中,并且将jna-platform-4.2.2.jar和jna-4.2.2.jar 拷贝到flume的lib目录中。重新启动taildir 采集,问题得以解决。