
graphframe

Posted on 2020-06-17

Counting common friends without GraphFrames (pseudocode)

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField,IntegerType

sparkSession = SparkSession.builder.enableHiveSupport().master('local').getOrCreate()

# Each tuple is one edge (src, dst)
Edgelist = [(1,2),(1,3),(1,4),(2,3),(2,4)]  # (xx,xx): further edges elided in the original
graphData = sparkSession.sparkContext.parallelize(Edgelist)

# The same edge list is loaded twice: once as (A, B) and once as (B, C).
# Both columns hold integer vertex ids, so both fields are IntegerType.
graphSchemaAB = StructType([StructField('A',IntegerType(),nullable=False),StructField('B',IntegerType(),nullable=False)])
abDF = sparkSession.createDataFrame(graphData,graphSchemaAB)
graphSchemaBC = StructType([StructField('B',IntegerType(),nullable=False),StructField('C',IntegerType(),nullable=False)])
bcDF = sparkSession.createDataFrame(graphData,graphSchemaBC)

abDF.show()

# Joining on the column name keeps a single B column in the result
joinDF = abDF.join(bcDF,'B')
joinDF.show()
# For each pair (A, C), count the intermediate vertices B, then keep the rows for A = 1
joinDF.drop('B').groupBy('A','C').count().filter('A = 1').show()
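The join-and-count step above can be cross-checked without Spark. The following is only a minimal plain-Python sketch of the same idea, assuming the five concrete edges from the edge list and leaving out the elided `(xx,xx)` entry; the helper name `two_hop_counts` is hypothetical and not part of any library.

```python
from collections import Counter

# The concrete edges from the example; the elided (xx,xx) entry is left out.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]

def two_hop_counts(edges):
    """Count, for each pair (a, c), the vertices b with edges a->b and b->c."""
    counts = Counter()
    for a, b in edges:
        for b2, c in edges:
            if b == b2:  # same role as the join on column B
                counts[(a, c)] += 1
    return counts

counts = two_hop_counts(edges)
# Same role as filter('A = 1'): keep only the pairs that start at vertex 1.
from_one = {pair: n for pair, n in counts.items() if pair[0] == 1}
print(from_one)
```

Like the DataFrame version, this follows edge direction, so it counts directed two-step paths rather than friends shared in an undirected sense.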

graphframe

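from graphframes import GraphFrame

# vertices must be a DataFrame with an 'id' column; it is not defined in these notes
edges = sparkSession.createDataFrame([('xx','xx','friend'),('xx','xx','friend')],['src','dst','relationship'])
g = GraphFrame(vertices,edges)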

DSL

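Edge: the basic unit of a pattern, written (a)-[e]->(b)
Union of edges: a pattern joins several edges with semicolons, e.g. (a)-[e]->(b); (b)-[e2]->(c)
Names serve two purposes:
identify common elements among edges (the b shared by both edges above is the same vertex)
identify names of columns in the result DataFrame
Anonymous edges and vertices: leave a name out, as in (a)-[]->(b), when that element is not needed in the result
Negation: prefix an edge with ! to require that it be absent, e.g. (a)-[]->(b); !(b)-[]->(a)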
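To make the pieces of the DSL concrete, here is a toy parser for motif strings. It is only an illustrative sketch: `parse_motif` and `EDGE_TERM` are hypothetical names, the parser handles edge terms only, and it is not how GraphFrames itself processes patterns.

```python
import re

# One edge term: optional "!" for negation, then (src)-[edge]->(dst).
# Any of the three names may be empty, which makes that element anonymous.
EDGE_TERM = re.compile(r"^(!?)\((\w*)\)-\[(\w*)\]->\((\w*)\)$")

def parse_motif(pattern):
    """Split a motif string into (negated, src, edge, dst) tuples."""
    terms = []
    for raw in pattern.split(";"):  # a pattern is a union of edge terms
        match = EDGE_TERM.match(raw.strip())
        if match is None:
            raise ValueError("not an edge term: %r" % raw)
        bang, src, edge, dst = match.groups()
        terms.append((bang == "!", src, edge, dst))
    return terms

# Two named edges sharing the vertex name b
two_hop = parse_motif("(a)-[e]->(b); (b)-[e2]->(c)")
# Anonymous edges plus a negated term: a points to b, but b does not point back
one_way = parse_motif("(a)-[]->(b); !(b)-[]->(a)")
print(two_hop)
print(one_way)
```

With a real GraphFrame `g`, the same strings would be passed to `g.find(...)`, and the named elements become columns of the returned DataFrame.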

triangles
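A minimal sketch of triangle counting in plain Python, assuming the edges are treated as undirected and contain no self-loops. GraphFrames offers `g.triangleCount()` for this; the code below does not use it, and the helper name `find_triangles` is hypothetical. The edge list reuses the five concrete edges from the first example.

```python
def find_triangles(edges):
    """Return the set of triangles, each as a frozenset of three vertices."""
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any neighbour shared by both endpoints closes a triangle.
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles

edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
triangles = find_triangles(edges)
print(len(triangles))
```

Storing each triangle as a frozenset means it is counted once, no matter which of its edges discovers it.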

Author: Jay Kay
Link: http://jaykay233.github.io/2020/06/17/graphframe/
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!