I am working in Spark and picking up Scala along the way. I have a question about the RDD API and how the various base RDDs are implemented. Specifically, I ran the following code in spark-shell:
scala> val gspeech_path = "/home/myuser/gettysburg.txt"
gspeech_path: String = /home/myuser/gettysburg.txt

scala> val lines = sc.textFile(gspeech_path)
lines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:29

scala> val pairs = lines.map(x => (x.split(" ")(0), x))
pairs: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[8] at map at <console>:3

scala> val temps: Seq[(String, Seq[Double])] = Seq(("sp", Seq(68, 70, 75)), ("tr", Seq(87, 83, 88, 84, 88)), ("en", Seq(52, 55, 58, 57.5)), ("er", Seq(90, 91.3, 88, 91)))
temps: Seq[(String, Seq[Double])] = List((sp,List(68.0, 70.0, 75.0)), (tr,List(87.0, 83.0, 88.0, 84.0, 88.0)), (en,List(52.0, 55.0, 58.0, 57.5)), (er,List(90.0, 91.3, 88.0, 91.0)))

scala> var temps_rdd0 = sc.parallelize(temps)
temps_rdd0: org.apache.spark.rdd.RDD[(String, Seq[Double])] = ParallelCollectionRDD[9] at parallelize at <console>:29
I wanted to investigate a bit more, so I looked in the API for MapPartitionsRDD and ParallelCollectionRDD, expecting them to be subclasses of the base RDD in org.apache.spark.rdd. However, I couldn't find these classes when I searched the Spark Scala API (Scaladocs).
I was able to find them in the Java docs but not the Scala docs at spark.apache.org. I know that Scala and Java are two languages that can intermingle, and that Spark is written in Java. However, I would appreciate clarification on the exact relationship as it pertains to RDDs. Is it the case that I have an abstract Scala RDD reference whose underlying implementation is a concrete Java MapPartitionsRDD, as suggested by the shell response:
# Scala abstract RDD = concrete Java MapPartitionsRDD
org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7]
?
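To make the question concrete, here is a minimal sketch (assuming the same spark-shell session as above) that contrasts the declared type of the reference with the class actually instantiated at runtime:

// The declared (static) type is the abstract RDD[String]; getClass reveals
// the concrete class that was actually instantiated behind it.
val lines: org.apache.spark.rdd.RDD[String] = sc.textFile(gspeech_path)
println(lines.getClass.getName)   // e.g. org.apache.spark.rdd.MapPartitionsRDD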
Thanks in advance for any help/explanation.
As @archeg pointed out in a comment above, these classes are indeed Scala classes and can be found in the source at org.apache.spark.rdd.MapPartitionsRDD.
What caused the confusion is that I couldn't find MapPartitionsRDD when I did a search in the Spark Scala API (Scaladoc). The class is marked private[spark] in the Spark source, so it does not show up in the published Scaladoc, even though it does appear in the generated Java docs.
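For anyone else who runs into this, here is a minimal sketch (assuming a running spark-shell, with the file path and variable names purely illustrative) showing that both concrete classes live in the Scala package org.apache.spark.rdd and are subclasses of the abstract RDD:

import org.apache.spark.rdd.RDD

// Both RDDs are built through the public RDD API; the concrete classes
// are implementation details of that API.
val lines = sc.textFile("/home/myuser/gettysburg.txt")
val temps_rdd = sc.parallelize(Seq(("sp", Seq(68.0, 70.0, 75.0))))

println(lines.getClass.getName)      // org.apache.spark.rdd.MapPartitionsRDD
println(temps_rdd.getClass.getName)  // org.apache.spark.rdd.ParallelCollectionRDD

// Both concrete classes are subtypes of the abstract org.apache.spark.rdd.RDD,
// which is all the static type RDD[T] exposes to user code.
println(classOf[RDD[_]].isAssignableFrom(lines.getClass))      // true
println(classOf[RDD[_]].isAssignableFrom(temps_rdd.getClass))  // true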