Reading and writing CSV files with PySpark

Reading a CSV file

from pyspark.sql import SparkSession

# SparkSession is the modern entry point; SQLContext is deprecated
spark = SparkSession.builder.getOrCreate()
df = spark.read.format('csv')\
          .option('delimiter', '\t')\
          .load('/path/to/file.csv')\
          .toDF('col1', 'col2', 'col3')  # assign column names

Writing a CSV file

# Note: save() writes a directory of part files, not a single CSV file
df.write.format('csv')\
        .option('header', 'true')\
        .save('/path/to/file1.csv')

Supported options

  • path: path to the CSV file; glob patterns are supported;
  • header: whether the first line is a header row; defaults to false;
  • delimiter: field delimiter; defaults to ',';
  • quote: quote character; defaults to '"';
  • mode: parsing mode. Supported options:
    1. PERMISSIVE: nulls are inserted for missing tokens and extra tokens are ignored.
    2. DROPMALFORMED: drops lines which have fewer or more tokens than expected.
    3. FAILFAST: aborts with a RuntimeException if encounters any malformed line.
