首页 > 代码库 > python3+spark2.1+kafka0.8+sparkStreaming
python3+spark2.1+kafka0.8+sparkStreaming
python代码:
import time from pyspark import SparkContext from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils from operator import add sc = SparkContext(master="local[1]",appName="PythonSparkStreamingRokidDtSnCount") ssc = StreamingContext(sc, 2) zkQuorum = ‘localhost:2181‘ topic = {‘rokid‘:1} groupid = "test-consumer-group" lines = KafkaUtils.createStream(ssc, zkQuorum, groupid, topic) lines1 = lines.flatMap(lambda x: x.split("\n")) valuestr = lines1.map(lambda x: x.value.decode()) valuedict = valuestr.map(lambda x:eval(x)) message = valuedict.map(lambda x: x["message"]) rdd2 = message.map(lambda x: (time.strftime("%Y-%m-%d",time.localtime(float(x.split("\u0001")[0].split("\u0002")[1])/1000))+"|"+x.split("\u0001")[1].split("\u0002")[1],1)).map(lambda x: (x[0],x[1])) rdd3 = rdd2.reduceByKey(add) rdd3.saveAsTextFiles("/tmp/wordcount") rdd3.pprint() ssc.start() ssc.awaitTermination()
执行SparkStreaming:
spark/bin/spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar ReadFromKafkaStreaming.py
其中spark-streaming-kafka-0.98-assembly_2.11-2.1.0.jar从以下网站下载
http://search.maven.org
作为入门参考。
python3+spark2.1+kafka0.8+sparkStreaming
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。