Reading Text Files in Java

Contents

  • File
  • FileReader
  • InputStreamReader
  • BufferedReader (the solution is here)
    • Fixing garbled Chinese text
  • FileInputStream

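I wanted to read characters from a text file (.txt) in Java, but I was not very familiar with Java's file APIs, so I started digging through the official documentation and worked out how to read a single line, or the entire contents, of a file.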

File

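File represents a directory or a file.
Instances of the File class are immutable; that is, once created, the abstract pathname represented by a File object never changes. Below are some of the File class's methods: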

public File(String pathname)
public File(String parent, String child)
public File(File parent, String child)
public File(URI uri)

public String getName() 
public String getParent()
public String getPath()
public URL toURL()
public boolean canRead()
public boolean canWrite()
public boolean exists()
public boolean isDirectory()
public boolean isFile()
public boolean isHidden()
public long lastModified()
public long length()
public boolean createNewFile()
public boolean delete()
public void deleteOnExit()
....

The File class itself provides no methods for input or output; it only represents a file or directory on the computer.
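To make the point concrete, here is a minimal sketch that uses only the query methods listed above. The class name FileInfoDemo, the helper describe, and the file name no-such-file.txt are made up for illustration; nothing here reads or writes file contents.

```java
import java.io.File;

public class FileInfoDemo {
    // Builds a one-line description of a path using only File's query methods.
    static String describe(File f) {
        if (!f.exists()) {
            return f.getName() + ": does not exist";
        }
        if (f.isDirectory()) {
            return f.getName() + ": directory";
        }
        return f.getName() + ": file, " + f.length() + " bytes, readable=" + f.canRead();
    }

    public static void main(String[] args) {
        // The system temp directory is used so the sketch does not depend on a fixed path.
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(describe(tmpDir));
        // Hypothetical file name that is not expected to exist.
        System.out.println(describe(new File(tmpDir, "no-such-file.txt")));
    }
}
```

Note that describe never opens the file: File only answers questions about the path, which is why a Reader or InputStream is still needed to get at the contents.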

FileReader

FileReader extends InputStreamReader. In the class file (JDK 1.8) I found only three new constructors:
public FileReader(String fileName)
public FileReader(File file)
public FileReader(FileDescriptor fd)
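The documentation says FileReader is for reading character files, delivering the file as a stream of characters, but I still saw no methods for actually reading input. So next I looked at its parent class to see whether it has the methods I need.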

InputStreamReader

InputStreamReader extends the abstract class Reader. Below are all of InputStreamReader's public methods:

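  // every constructor takes an InputStream
  public InputStreamReader(InputStream in) 
  public InputStreamReader(InputStream in, String charsetName) // create the reader with the named charset
  public InputStreamReader(InputStream in, Charset cs)
  public InputStreamReader(InputStream in, CharsetDecoder dec)
  
  public String getEncoding() // name of the character encoding in use
  public int read() // reads a single character and returns it as an int; returns -1 at end of stream
  public int read(char cbuf[], int offset, int length) // reads up to length characters into cbuf, starting at offset
  public boolean ready() // true if the input buffer is not empty, or bytes can be read from the underlying stream
  public void close()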

At last, a read() method. Now I knew how to read characters from a text file:

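import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // try-with-resources closes the reader even if read() throws
        try (FileReader fileReader = new FileReader(fileName)) {
            char[] chars = new char[10];

            // try reading the first few characters (at most 9); n is the count actually read
            int n = fileReader.read(chars, 0, 9);

            // print only the characters that were read (n is -1 for an empty file)
            for (int i = 0; i < n; i++) {
                System.out.print(chars[i]);
            }
        }
    }
}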

Contents of the file:
[Image 1: contents of English.txt]
Output:

insult ��

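The Chinese text is garbled; I'll ignore that for now.
But whether it is FileReader or InputStreamReader, the class documentation lists only two methods for reading data:
public int read()
public int read(char[] cbuf, int offset, int length)
Such bare-bones methods clearly did not meet my needs, and then I found BufferedReader.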
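For comparison, this is roughly what reading a whole source looks like with nothing but read(char[], int, int). It is only a sketch: the class name ReadLoopDemo and the helper readAll are invented here, and a StringReader stands in for the FileReader so the example runs without the file on my desktop.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadLoopDemo {
    // Drains a Reader using only read(char[], int, int).
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4];          // deliberately tiny, to force several iterations
        int n;
        // read() returns the number of characters stored in buf, or -1 at end of stream
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);          // only the first n characters are valid
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StringReader replaces new FileReader(fileName) for portability.
        try (Reader reader = new StringReader("insult\nharsh\n")) {
            System.out.print(readAll(reader));
        }
    }
}
```

It works, but the caller has to manage the buffer, track the returned count, and split lines by hand, which is the inconvenience that led me to look further.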

BufferedReader (the solution is here)

Below is the BufferedReader documentation (JDK 1.8):

Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.
The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream.
It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file.
Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.

Below are all of BufferedReader's public methods:

public BufferedReader(Reader in, int sz)
public BufferedReader(Reader in)
public int read()
public int read(char cbuf[], int off, int len)
public String readLine()
public long skip(long n)
public boolean ready()
public boolean markSupported()
public void mark(int readAheadLimit)
public void reset()
public void close()
public Stream<String> lines()

The documentation says that FileReader's read methods are relatively inefficient, and it also gives the remedy: wrap the FileReader in a BufferedReader. So I changed my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // closing the BufferedReader also closes the FileReader it wraps
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            // readLine() returns the next line without its line terminator
            System.out.println(bufferedReader.readLine());
        }
    }
}

Output:

insult ����

That feels much better. To read all of the data in the text file, this is what I did:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // closing the BufferedReader also closes the FileReader it wraps
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            String line = bufferedReader.readLine();

            // readLine() returns null once the end of the file is reached
            while (line != null) {
                System.out.println(line);
                line = bufferedReader.readLine();
            }
        }
    }
}

Output:

insult ����
harsh �����ġ��̶���
intimidate ����
compromise��Э
executionִ��
novel �����С˵
engage����������
revenue-generating ����-���� ����
sweat ����
ownership ����Ȩ
synchronized ͬ��
asynchronized �첽
employee ְ��
hint ���� ���� ��ʾ
indication ָʾ
denote ָ������������ʾ
portion ����
offset ƫ����
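The read-everything loop can also be written with the lines() method from the BufferedReader method list. The following is a sketch under a few assumptions: the class name LinesDemo and the helper readLines are mine, and a StringReader replaces the FileReader so the example does not depend on my file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class LinesDemo {
    // Collects every line of the source into a list via BufferedReader.lines().
    static List<String> readLines(Reader source) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            // lines() is lazy, so the stream must be consumed before the reader is closed
            return br.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: StringReader stands in for new FileReader(fileName).
        for (String line : readLines(new StringReader("insult\nharsh\nintimidate"))) {
            System.out.println(line);
        }
    }
}
```

Either form reads the same data; lines() is mainly convenient when the lines are going to be filtered or mapped with the stream API afterwards.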

Fixing garbled Chinese text

While going through the documentation I noticed that InputStreamReader has a public String getEncoding() method. JDK 1.8 describes it as follows:

Returns the name of the character encoding being used by this stream.
If the encoding has an historical name then that name is returned; otherwise the encoding's canonical name is returned.
If this instance was created with the InputStreamReader(InputStream, String) constructor then the returned name, being unique for the encoding, may differ from the name passed to the constructor. This method will return null if the stream has been closed.
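The two less obvious points in that description (the returned name may differ from the one passed in, and it becomes null after close) can be checked with a small sketch. The class name EncodingNameDemo and its helper are invented for this purpose, and an empty ByteArrayInputStream replaces a real file. The exact name printed for "UTF-8" depends on the JDK, so no output is claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class EncodingNameDemo {
    // Returns { name reported while the reader is open, name reported after close() }.
    static String[] namesBeforeAndAfterClose(String charsetName) throws IOException {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), charsetName);
        String whileOpen = reader.getEncoding();
        reader.close();
        String afterClose = reader.getEncoding(); // documented to be null once closed
        return new String[] { whileOpen, afterClose };
    }

    public static void main(String[] args) throws IOException {
        String[] names = namesBeforeAndAfterClose("UTF-8");
        System.out.println("passed in:   UTF-8");
        System.out.println("while open:  " + names[0]);
        System.out.println("after close: " + names[1]);
    }
}
```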

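Note that this method returns the character encoding used by the stream, not the encoding of the file itself.
I then called this method, and the console reported the character set as UTF8: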

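import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {

        String fileName = "C:\\Users\\lin\\Desktop\\English.txt";

        // the FileReader is declared separately so getEncoding() can be called on it;
        // both resources are closed automatically, in reverse order
        try (FileReader fileReader = new FileReader(fileName);
             BufferedReader bufferedReader = new BufferedReader(fileReader)) {

            // encoding the stream decodes with, not the encoding the file was saved in
            System.out.println("Charset: " + fileReader.getEncoding());

            String line = bufferedReader.readLine();

            while (line != null) {
                System.out.println(line);
                line = bufferedReader.readLine();
            }
        }
    }
}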

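As I recall, the default encoding of Notepad on Windows 10 is ANSI, that is, the system's legacy code page (typically GBK on a Chinese-language system). So I re-saved English.txt as UTF-8, and the console then displayed the Chinese text correctly.
[Image 2: screenshot from the original post]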
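Re-saving the file is one fix. Another, which I have not used in the listings here, is to leave the file alone and tell the reader which charset to decode with, using the InputStreamReader(InputStream, Charset) constructor listed earlier. The sketch below is hedged accordingly: the class name CharsetReadDemo and the helper firstLine are invented, the sample text is a made-up stand-in for a line of English.txt, and an in-memory byte array replaces the file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReadDemo {
    // Decodes the stream with an explicitly chosen charset and returns its first line.
    static String firstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample line; the Chinese characters are written as Unicode escapes
        // so the source file compiles regardless of its own encoding.
        byte[] utf8Bytes = "insult \u4fae\u8fb1".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text comes back intact.
        System.out.println(firstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));
        // Mismatched charset: the same bytes decode to garbage, as in the output above.
        System.out.println(firstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1));
    }
}
```

For a real file the stream would be new FileInputStream(fileName), and, assuming the file really was saved in GBK, the charset would be Charset.forName("GBK").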

FileInputStream

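Everything covered so far is Java API for reading character streams.
FileInputStream is a byte input stream; it reads a file as a stream of bytes.
FileInputStream extends the abstract class InputStream.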

A FileInputStream obtains input bytes from a file in a file system. What files are available depends on the host environment.
FileInputStream is meant for reading streams of raw bytes such as image data. For reading streams of characters, consider using FileReader.
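To show what "raw bytes" means in practice, here is a small sketch that prints bytes as hex instead of decoding them into characters. The class name RawBytesDemo and the helper toHex are invented, and a ByteArrayInputStream stands in for new FileInputStream(fileName), since both are InputStreams and are read the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawBytesDemo {
    // Reads every byte of the stream and renders it as two-digit hex.
    static String toHex(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        // InputStream.read() returns a value in 0..255, or -1 at end of stream
        while ((b = in.read()) != -1) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // "A" followed by U+4E2D, encoded as UTF-8: one byte, then three bytes.
        byte[] data = "A\u4e2d".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(toHex(in)); // prints "41 e4 b8 ad"
        }
    }
}
```

A byte stream hands back those four bytes as they are; turning them into two characters is the job of a Reader with the right charset.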


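Reading has its counterpart in writing: every InputStream or Reader has a corresponding OutputStream or Writer. The latter work much like the former, so I won't go over them here.
Also, Java I/O looks so complicated because it uses the decorator pattern, whose purpose is to offer more flexibility than inheritance for extending functionality without breaking existing code; in other words, closed for modification, open for extension.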
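As an illustration of that idea, the sketch below adds a new behavior by wrapping a Reader rather than changing any existing class. UpperCaseReader is a hypothetical decorator written for this example (it is not part of the JDK); it extends java.io.FilterReader, the JDK's base class for Reader decorators, and is itself wrapped by BufferedReader. A StringReader stands in for the file.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class DecoratorDemo {
    // Hypothetical decorator: upper-cases whatever the wrapped Reader produces.
    static class UpperCaseReader extends FilterReader {
        UpperCaseReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : Character.toUpperCase((char) c);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            int n = super.read(cbuf, off, len);
            // n is -1 at end of stream, in which case the loop body never runs
            for (int i = off; i < off + n; i++) {
                cbuf[i] = Character.toUpperCase(cbuf[i]);
            }
            return n;
        }
    }

    // Stacks two decorators: buffering on top of upper-casing on top of the source.
    static String firstLineUpper(Reader source) throws IOException {
        try (BufferedReader br = new BufferedReader(new UpperCaseReader(source))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLineUpper(new StringReader("insult\nharsh"))); // prints "INSULT"
    }
}
```

Neither StringReader nor BufferedReader had to be modified; each layer only knows that it wraps some Reader, which is what makes the layers freely combinable.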
