Spark图计算实战:在pyspark环境下使用GraphFrames库
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from graphframes import GraphFrame
sc = SparkContext()
spark = SparkSession(sc)
# Vertics DataFrame
vertics = spark.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 37),
("d", "David", 29),
("e", "Esther", 32),
("f", "Fanny", 38),
("g", "Gabby", 60)
], ["id", "name", "age"])
vertics.show()
# Edges DataFrame
edges = spark.createDataFrame([
("a", "b", "friend"),
("b", "c", "follow"),
("c", "b", "follow"),
("f", "c", "follow"),
("e", "f", "follow"),
("e", "d", "friend"),
("d", "a", "friend"),
("a", "e", "friend"),
("g", "e", "follow")
], ["src", "dst", "relationship"])
edges.show()
# Create a GraphFrame
graph = GraphFrame(vertics, edges)
在执行graph初始化语句时,报错信息如下
pyspark: Py4JJavaError: An error occurred while calling o138.loadClass.: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI
graphframes环境报错:
在终端执行
pyspark --packages graphframes:graphframes:0.8.0-spark3.0-s_2.12 --jars graphframes-0.8.0-spark3.0-s_2.12.jar
发现报错原因是缺少graphframes-0.8.0-spark3.0-s_2.12.jar
,因此需要到官网下载
官网链接:graphframes
下载之后,由于终端启动pyspark路径定位在user路径下,如果要在终端启动pyspark,需要将jar包放在图片路径上
对于jupyter notebook,只需要声明SparkContext环境后,添加jar包所在路径即可,对jar包位置没有要求
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from graphframes import GraphFrame
sc = SparkContext()
spark = SparkSession(sc)
# 添加jar包路径,修复bug
sc.addPyFile("../envs/project/lib/python3.8/site-packages/pyspark/jars/graphframes-0.8.0-spark3.0-s_2.12.jar")
# Vertics DataFrame
vertics = spark.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 37),
("d", "David", 29),
("e", "Esther", 32),
("f", "Fanny", 38),
("g", "Gabby", 60)
], ["id", "name", "age"])
vertics.show()
# Edges DataFrame
edges = spark.createDataFrame([
("a", "b", "friend"),
("b", "c", "follow"),
("c", "b", "follow"),
("f", "c", "follow"),
("e", "f", "follow"),
("e", "d", "friend"),
("d", "a", "friend"),
("a", "e", "friend"),
("g", "e", "follow")
], ["src", "dst", "relationship"])
edges.show()
# Create a GraphFrame
graph = GraphFrame(vertics, edges)
Spark raphFrames图计算API:https://blog.csdn.net/weixin_45839604/article/details/117751806
基于pyspark图计算的算法实例:
https://blog.csdn.net/weixin_39198406/article/details/104940179
基于SparkGraph的社交关系图谱实战
https://blog.51cto.com/u_11200224/5275069
graphframes环境报错:
https://blog.csdn.net/m0_37754282/article/details/110086095
https://blog.csdn.net/qq_42166929/article/details/105983616