## [pdf] Debugging PySpark【Spark Summit East 2017】

【PDF Giveaway】A curated collection of Spark & Hadoop Summit slide PDFs - Blog - Yunqi Community - Alibaba Cloud https://yq.aliyun.com/articles/72207?spm=5176.100239.blogcont71098.13.Kt7Srt

// Download link
【Spark Summit East 2017】Debugging PySpark


//p13
● Error messages reported to the console*
● Log messages reported to the console*
● Log messages on the workers - access through the Spark Web UI or Spark History Server :) (sketch below)
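
A minimal sketch of that split, assuming a cluster deployment: output produced on the driver goes to your console, while output produced inside a task ends up in the executor's logs.

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

print("hello from the driver")  # appears in your console

def shout(x):
    # On a cluster this lands in the executor's stdout log,
    # viewable in the Spark Web UI or the Spark History Server.
    print("hello from a worker, got %d" % x)
    return x

sc.parallelize(range(4)).map(shout).collect()
```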

//p16
● Use yarn logs to get logs after log aggregation has run (example below)
● Or set up the Spark history server
● Or yarn.nodemanager.delete.debug-delay-sec :)
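
A minimal sketch of the yarn logs route; sc.applicationId assumes you are running on YARN, and the application id shown in the comment is made up.

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
print(sc.applicationId)  # e.g. application_1486000000000_0042

# Once the application has finished and log aggregation has run:
#   yarn logs -applicationId application_1486000000000_0042
```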

//p17
Most of the time it tells you things you already know
● Or don’t need to know
● You can dynamically control the log level with sc.setLogLevel (example below)
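
A minimal sketch; sc.setLogLevel accepts the standard Log4J levels (ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN).

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sc.setLogLevel("ERROR")  # hush the INFO chatter while exploring
# ... run the noisy part of the job ...
sc.setLogLevel("INFO")   # turn the detail back on for debugging
```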

//p25


(Slide screenshot: "Ok maybe the web UI is easier?")

//p28
Regardless of language
● Can be difficult to determine which element failed
● Stack trace sometimes helps (it did this time)
● take(1) + count() are your friends - but a lot of work :( (sketch below)
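
A minimal sketch of that trick; the input data and the failing parse step here are made up.

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdd = sc.parallelize(["1", "2", "not a number", "4"])

parsed = rdd.map(lambda line: int(line))

print(parsed.take(1))  # cheap: evaluates just enough to return one element
parsed.count()         # evaluates every element, so the bad record's traceback surfaces
```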

//p30
● spark-testing-base is on pip now for your happy test adventures (example below)
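
spark-testing-base wraps the SparkContext setup/teardown boilerplate for you; as a point of reference, here is a hand-rolled version of that kind of test in plain unittest (test data and names are made up).

```python
import unittest

from pyspark import SparkContext

class WordCountTest(unittest.TestCase):
    def setUp(self):
        # spark-testing-base's test cases handle this (and context reuse) for you
        self.sc = SparkContext("local[2]", "word-count-test")

    def tearDown(self):
        self.sc.stop()

    def test_word_count(self):
        counts = dict(
            self.sc.parallelize(["a b", "a"])
                .flatMap(str.split)
                .map(lambda w: (w, 1))
                .reduceByKey(lambda a, b: a + b)
                .collect()
        )
        self.assertEqual(counts, {"a": 2, "b": 1})

if __name__ == "__main__":
    unittest.main()
```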

//p31
Adding your own logging:
● Java users use Log4J & friends
● Python users: use the standard logging library (or even print!) - sketch below
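
A minimal sketch of logging from inside a task; logging is configured inside the function because driver-side configuration does not travel with the pickled closure.

```python
import logging

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

def transform(x):
    logging.basicConfig(level=logging.INFO)  # no-op if handlers already exist
    logging.info("processing %r", x)         # lands in that executor's stderr log
    return x * 2

sc.parallelize(range(4)).map(transform).collect()
```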
