Optimize file system operations

I shouldn't have to pinvoke to get the best out of the local and remote filesystems.

Here are notes on various optimizations .NET could be performing, pooled from various forums.

/* 
* On Windows 7 / Windows Server 2008 R2 and later, you can use FindFirstFileEx with FindExInfoBasic; 
* the main speedup comes from omitting the short file name on NTFS file systems where short names are enabled.

* Set the additional flags parameter to 2 (defined in MSDN as FIND_FIRST_EX_LARGE_FETCH), with this setting conditioned on 
* the Windows version being 6.1 or later, that is (Win32majorversion > 6) or (Win32majorversion == 6 and Win32minorversion >= 1).
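A version gate like this is easy to get wrong when the major and minor numbers are tested independently: (major >= 6) and (minor >= 1) would reject version 10.0, whose minor number is 0. Below is a minimal sketch of the comparison, written in Python only to keep it short and OS-independent. The function name is hypothetical and not part of any Windows or .NET API; the flag value 2 is the one quoted in the note.

```python
# Hypothetical helper for illustration; not part of any Windows or .NET API.
FIND_FIRST_EX_LARGE_FETCH = 2  # flag value as quoted in the note


def large_fetch_flags(major: int, minor: int) -> int:
    """Return the additional-flags value to pass to FindFirstFileEx.

    The flag is requested only on version 6.1 (Windows 7 / Server 2008 R2)
    or later. Tuples compare element by element, so (10, 0) >= (6, 1) holds
    even though the minor number 0 is smaller than 1.
    """
    return FIND_FIRST_EX_LARGE_FETCH if (major, minor) >= (6, 1) else 0
```

Comparing the pair as a unit means the check keeps working when the major version increases and the minor version resets to zero.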

IShellFolder::EnumObjects is faster, but SHGetDataFromIDList does not return as much information as FindFirstFile/FindNextFile, which GetFiles uses. If your result set is large and you need a lot of information about every file, you may get faster results by querying the whole set of file information. However, if your result set is small compared to the number of files in the directory, or you only care about a few attributes of each file, you can call IShellFolder::EnumObjects to get the files' PIDLs first and then use SHGetDataFromIDList/IShellFolder::GetAttributesOf to get the attributes.

You can also use FIND_FIRST_EX_LARGE_FETCH/FIND_FIRST_EX_CASE_SENSITIVE to speed up FindFirstFileEx/FindNextFile

IShellFolder::EnumObjects may fail prematurely on some file servers because of buggy drivers that do not work well with it. Windows XP uses FindFirstFile/FindNextFile in Windows Explorer.

* http://social.microsoft.com/Forums/en-HK/Offtopic/thread/db8067ba-9a4a-4a83-a40f-1b30c7dadfc4 
* I have a directory on a remote computer that contains over a million files. 
* The network latency is high enough that attempting to iterate through the files one by one with Directory.GetFiles() 
* and delete them with File.Delete() is just hopelessly slow (I gave up after 15 minutes). 
* However I do know that the MS SMB protocol for performing remote directory file operations 
* does support passing a wildcard to the SMB_COM_DELETE command (http://msdn.microsoft.com/en-us/library/dd327689.aspx), 
* and I'm pretty sure if I were to issue this command directly, I'd be able to delete all the files in question fairly quickly. 
* But I don't see a way this is possible using the .NET framework as it is, and indeed I'm not even sure you can do it via 
* the Windows API. The only function that looks like it could offer this is SHFileOperation but not only did this take nearly 
* a minute to execute on a sample directory with only 2000 files but it failed to actually delete any of the files!

BTW, some timings on a smaller test directory (under .NET 4.0):

new DirectoryInfo(@"\\remotemachine\share\").GetFiles().Length // takes about 44 seconds 
new DirectoryInfo(@"\\remotemachine\share\").EnumerateFiles().Count() // takes about 22 seconds

IntPtr handle = FindFirstFileEx(@"\\remotemachine\share\*", FINDEX_INFO_LEVELS.FindExInfoBasic, out find_data, FINDEX_SEARCH_OPS.FindExSearchNameMatch, IntPtr.Zero, FIND_FIRST_EX_LARGE_FETCH); 
if (handle != new IntPtr(-1)) // INVALID_HANDLE_VALUE: no match or an error 
{ 
    do ++count; while (FindNextFile(handle, out find_data)); 
    FindClose(handle); // release the search handle 
}

This takes about 3 seconds (about 17 seconds without FIND_FIRST_EX_LARGE_FETCH).

I'd noticed that del \\remotemachine\share\*.* from the command line works moderately quickly (though still too slowly, really). Initially I thought it might be passing wildcards to the SMB delete command, but ProcMon showed that it just uses FindFirstFileEx/FindNextFile with the FIND_FIRST_EX_LARGE_FETCH flag.

*/

/* http://stackoverflow.com/questions/7430959/how-to-make-createfile-as-fast-as-possible 
CreateFile in kernel32.dll has some extra overhead compared to the native system call NtCreateFile in ntdll.dll, which is the function CreateFile ultimately calls to ask the kernel to open the file. If you need to open a large number of files, NtOpenFile will be more efficient because it avoids the special cases and path translation that Win32 performs, which wouldn't apply to a bunch of files in a directory anyway.

NTSYSAPI NTSTATUS NTAPI NtOpenFile(OUT HANDLE *FileHandle, IN ACCESS_MASK DesiredAccess, IN OBJECT_ATTRIBUTES *ObjectAttributes, OUT IO_STATUS_BLOCK *IoStatusBlock, IN ULONG ShareAccess, IN ULONG OpenOptions);

HANDLE Handle;
OBJECT_ATTRIBUTES Oa = {0};
UNICODE_STRING Name_U;
IO_STATUS_BLOCK IoSb;

RtlInitUnicodeString(&Name_U, Name);
Oa.Length = sizeof Oa;
Oa.ObjectName = &Name_U;
Oa.Attributes = CaseInsensitive ? OBJ_CASE_INSENSITIVE : 0;
Oa.RootDirectory = ParentDirectoryHandle;

Status = NtOpenFile(&Handle, FILE_READ_DATA, &Oa, &IoSb, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, FILE_SEQUENTIAL_ONLY);

Main downside: this API is not supported by Microsoft for use in user mode. That said, the equivalent function is documented for kernel mode use and hasn't changed since the first release of Windows NT in 1993.

NtOpenFile also allows you to open a file relative to an existing directory handle (ParentDirectoryHandle in the example) which should cut down on some of the filesystem overhead in locating the directory.

In the end, NTFS may just be too slow in handling directories with large numbers of files as Carey Gregory said.

*/

/* http://stackoverflow.com/questions/4203573/finding-a-set-of-file-names-quickly-on-ntfs-volumes-ideally-via-its-mft

The best way to solve your problem seems to be by using the Windows Change Journal.

Problem: If it is not enabled for a volume, or the volume is not NTFS, you need a fallback (or you can enable the Change Journal if the volume is NTFS). You also need administrator rights to access the Change Journal.

You get the files by using FSCTL_ENUM_USN_DATA and DeviceIoControl with LowUsn=0. This directly accesses the MFT and writes all filenames into the supplied buffer. Because it accesses the MFT sequentially, it is faster than the FindFirstFile API.

* *

If you set StartUSN to zero as described this gives you all files on the volume in a fast way (And it is really fast). If you want changes you have to set StartUSN to a higher number. Then you get the changed files since that USN. – UrOni Nov 22 '10 at 17:56 
Sorry. It is FSCTL_ENUM_USN_DATA and not FSCTL_QUERY_USN_JOURNAL - my bad. – UrOni Nov 22 '10 at 18:06 
Ah, then the "journal" actually does more than just journalling it seems (contrary to OS X's function which only tells you of changes while listening).

I don't think you need the Change Journal enabled to use FSCTL_ENUM_USN_DATA. There's a separate ioctl for change tracking, FSCTL_READ_USN_JOURNAL, which is probably more similar to the OSX journal you've used before, although the NTFS one is more like a closed-circuit security tape: your process doesn't have to be running when the change occurs as long as you query the journal before it wraps around and gets overwritten.

Here is the link to the MSDN documentation on FSCTL_ENUM_USN_DATA: msdn.microsoft.com/en-us/library/aa364563%28VS.85%29.aspx 
Have a look at these links: microsoft.com/msj/0999/journal/journal.aspx and technet.microsoft.com/en-us/library/bb742450.aspx. This two-part series, "Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained", helped me keep my sanity when implementing change journal functionality.

*/

Original link: http://visualstudio.uservoice.com/forums/121579-visual-studio/suggestions/2385336-optimize-file-system-operations
