Apache Spark - Java RDD vs Scala RDD


I am working in Spark and picking up Scala along the way. I have a question about the RDD API and how the various base RDDs are implemented. Specifically, I ran the following code in spark-shell:

```scala
scala> val gspeech_path = "/home/myuser/gettysburg.txt"
gspeech_path: String = /home/myuser/gettysburg.txt

scala> val lines = sc.textFile(gspeech_path)
lines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:29

scala> val pairs = lines.map(x => (x.split(" ")(0), x))
pairs: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[8] at map at <console>:3

scala> val temps: Seq[(String, Seq[Double])] = Seq(("sp", Seq(68, 70, 75)),
                                                   ("tr", Seq(87, 83, 88, 84, 88)),
                                                   ("en", Seq(52, 55, 58, 57.5)),
                                                   ("er", Seq(90, 91.3, 88, 91)))
temps: Seq[(String, Seq[Double])] = List((sp,List(68.0, 70.0, 75.0)),
  (tr,List(87.0, 83.0, 88.0, 84.0, 88.0)), (en,List(52.0, 55.0, 58.0,
  57.5)), (er,List(90.0, 91.3, 88.0, 91.0)))

scala> var temps_rdd0 = sc.parallelize(temps)
temps_rdd0: org.apache.spark.rdd.RDD[(String, Seq[Double])] = ParallelCollectionRDD[9] at parallelize at <console>:29
```

I wanted to investigate a bit more, so I looked for `MapPartitionsRDD` and `ParallelCollectionRDD` in the API, expecting them to be subclasses of the base `RDD` in `org.apache.spark.rdd`. However, I couldn't find these classes when I searched the Spark Scala API (Scaladocs).

I was able to find them in the Java docs but not in the Scala docs at spark.apache.org. I know the two languages can intermingle, Spark being written in Java (or so I assumed). However, I would appreciate clarification on the exact relationship as it pertains to RDDs. Is it the case that I have an abstract Scala RDD reference to an underlying Java RDD implementation, as per the response:

```scala
# Scala abstract RDD = concrete Java MapPartitionsRDD
org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7]
```

?

Thanks in advance for any help/explanation.
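The static-type-vs-runtime-class split the REPL output shows can be sketched in plain Scala. The snippet below is an illustrative mock, not Spark's actual code: `MiniRDD`, `MapMiniRDD`, and `ParallelMiniRDD` are hypothetical names standing in for `RDD`, `MapPartitionsRDD`, and `ParallelCollectionRDD`.

```scala
// Minimal sketch (NOT Spark's real code): an abstract base type whose
// transformations return concrete subclasses, mirroring how RDD.map
// hands back a MapPartitionsRDD declared as RDD[U].
abstract class MiniRDD[T](val data: Seq[T]) {
  // Static return type: the abstract MiniRDD.
  // Runtime object: always a concrete subclass.
  def map[U](f: T => U): MiniRDD[U] = new MapMiniRDD(data.map(f))
  def collect(): Seq[T] = data
}

// Concrete subclass, analogous to MapPartitionsRDD.
class MapMiniRDD[T](data: Seq[T]) extends MiniRDD[T](data)

// Concrete subclass, analogous to ParallelCollectionRDD.
class ParallelMiniRDD[T](data: Seq[T]) extends MiniRDD[T](data)

object Demo extends App {
  val base: MiniRDD[Int] = new ParallelMiniRDD(Seq(1, 2, 3))
  val mapped: MiniRDD[Int] = base.map(_ * 2) // static type: MiniRDD[Int]
  println(mapped.getClass.getSimpleName)     // runtime class: MapMiniRDD
  println(mapped.collect())                  // List(2, 4, 6)
}
```

In this sketch, as in spark-shell, the declared type printed for a value is the abstract base class, while the tag after the `=` names the concrete class actually constructed.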

As @Archeg pointed out in a comment above, these classes are indeed Scala classes and can be found at `org.apache.spark.rdd.MapPartitionsRDD`:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala

What caused the confusion is that I couldn't find `MapPartitionsRDD` when I searched the Spark Scala API (Scaladoc): classes like `MapPartitionsRDD` are declared `private[spark]`, so they are omitted from the published Scaladoc even though they exist in the Scala source.
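The same pattern is visible in the plain Scala collections library, which may make the spark-shell output less surprising: the declared type is an abstract interface (`Seq`), while `getClass` reveals the concrete implementation class that was actually built.

```scala
object TypeVsClass extends App {
  // Declared (static) type: the abstract Seq interface.
  val xs: Seq[Int] = Seq(1, 2, 3)

  // Runtime class: a concrete implementation (List's cons cell),
  // analogous to RDD[String] = MapPartitionsRDD[7] in the REPL output.
  // On recent Scala versions this prints something like
  // scala.collection.immutable.$colon$colon
  println(xs.getClass.getName)
}
```

Just as `Seq(...)` hands back a concrete `List` typed as `Seq`, `sc.textFile(...)` hands back a concrete `MapPartitionsRDD` typed as `RDD[String]`; the concrete class simply isn't part of the public Scaladoc surface.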

