apache spark - Run a geospatial query in memory using Scala -


is there way run in scala geospatial query, given set of lat/lon coordinates, find nearest distance? query needs run in memory possibly.

the set of values 1 million lon/lat coordinates. trying in spark solution have found magellan cannot make work spark 1.6 , scala 2.11 trying customized solution.

example of query: given 1 point in wgs84 coordinates , 1 million set of wsg84 coords, want nearest 15 coords in radius of 1 mile.

here library rtree implemetation can used indexing of geo data in scala: https://github.com/davidmoten/rtree

just select bounding box rectangle(s) point center of circle given radius (distance in case) , filter points distance cut out false positives in corners of bounding boxes , sort results calculated distance take required nearest 15.

you can use ‘haversine’ formula check distance condition between points (see description here http://www.movable-type.co.uk/scripts/latlong.html):

import java.lang.math._ import com.github.davidmoten.rtree.geometry.{point, rectangle} import com.github.davidmoten.rtree.geometry.geometries._  def distance(p1: point, p2: point): double = {   val radlon1 = toradians(p1.x)   val radlat1 = toradians(p1.y)   val radlon2 = toradians(p2.x)   val radlat2 = toradians(p2.y)   val x = sin((radlon2 - radlon1) * 0.5)   val y = sin((radlat2 - radlat1) * 0.5)   val = y * y + cos(radlat1) * cos(radlat2) * x * x   atan2(sqrt(a), sqrt(1 - a)) * 12756274 // earth diameter in meters } 

for calculation of bounding boxes use following function:

def boundingrectangles(c: point, r: double): list[rectangle] = {   val radlon = toradians(c.x)   val radlat = toradians(c.y)   val raddist = r / 6378137 // earth radius in meters   val lat1 = todegrees(radlat - raddist)   val lat2 = todegrees(radlat + raddist)   if (lat1 > -90 && lat2 < 90) {     val deltalon = asin(sin(raddist) / cos(radlat))     val lon1 = todegrees(radlon - deltalon)     val lon2 = todegrees(radlon + deltalon)     if (lon1 < -180) rectangle(-180, lat1, lon2, lat2) :: rectangle(lon1 + 360, lat1, 180, lat2) :: nil     else if (lon2 > 180) rectangle(-180, lat1, lon2 - 360, lat2) :: rectangle(lon1, lat1, 180, lat2) :: nil     else rectangle(lon1, lat1, lon2, lat2) :: nil   } else rectangle(-180, max(lat1, -90), 180, min(lat2, 90)) :: nil } 

list of rectangles required case when circle crossed date change meridian, because rtree doesn't support wrapping of geo-coordinates on earth, split rectangles on 2 date change meridian.

formula , description here http://janmatuschek.de/latitudelongitudeboundingcoordinates#longitude


Comments