querydsl - Elasticsearch: how to match documents for which the field tokens are a subset of the query tokens


I have a keyword/key-phrase field that is tokenized using the standard analyzer. I want the field to match if the search phrase contains all of the field's tokens.

For example, if the field value is "veni, vidi, vici", the search phrase "ceaser veni,vidi,vici" should match, but the search phrase "veni, vidi" should not.

I also need "vidi, veni, vici" (weird!) to match: the positions and ordering of the terms are not important. I don't think a phrase match quite works for me.

I could use a bool query with the "minimum_should_match" parameter for this specific example, but I don't want minimum_should_match to be a ratio/number of tokens in the search phrase.
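To pin down the rule being asked for: a document matches when every token of its field also appears among the tokens of the search phrase, and extra search tokens or a different order do not matter. The sketch below checks exactly that with a set comparison. The tokenizer is a rough stand-in for the standard analyzer (lowercase, split on anything that is not an ASCII letter or digit); that simplification and both helper names are assumptions for illustration only.

```python
import re


def rough_standard_tokens(text: str) -> set[str]:
    """Rough stand-in for the standard analyzer: lowercase the text and
    split on any run of characters that are not ASCII letters or digits.
    (The real analyzer follows Unicode text segmentation rules.)"""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def field_matches(field_value: str, search_phrase: str) -> bool:
    """True when every field token occurs in the search phrase;
    order, positions and extra search tokens are ignored."""
    return rough_standard_tokens(field_value) <= rough_standard_tokens(search_phrase)


field = "veni, vidi, vici"
print(field_matches(field, "ceaser veni,vidi,vici"))  # True
print(field_matches(field, "veni, vidi"))             # False
print(field_matches(field, "vidi, veni, vici"))       # True
```

This also shows why a fixed minimum_should_match does not fit: the number of tokens that must match depends on each document's field, not on the search phrase.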

A pure ES solution would go like this. You need 2 requests.

1) First, pass the user query through the Analyze API to get the search tokens:

curl -XGET 'localhost:9200/_analyze' -d '{
  "analyzer": "standard",
  "text": "ceaser veni,vidi,vici"
}'

You will get 4 tokens: ceaser, veni, vidi and vici. Pass these tokens as an array to the next search request.
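On the client side, the tokens can be read from the "tokens" array of the Analyze API response, where each entry carries a "token" string. Below is a minimal sketch; the helper name is hypothetical, and the sample response is hand-written in the shape of an _analyze response for illustration, not captured from a real cluster.

```python
import json


def tokens_from_analyze_response(body: str) -> list[str]:
    """Return the token strings of an _analyze response, in order."""
    return [entry["token"] for entry in json.loads(body)["tokens"]]


# Hand-written sample in the shape of an _analyze response for
# "ceaser veni,vidi,vici" (illustrative only).
sample_response = """
{"tokens": [
  {"token": "ceaser", "start_offset": 0,  "end_offset": 6,  "type": "<ALPHANUM>", "position": 0},
  {"token": "veni",   "start_offset": 7,  "end_offset": 11, "type": "<ALPHANUM>", "position": 1},
  {"token": "vidi",   "start_offset": 12, "end_offset": 16, "type": "<ALPHANUM>", "position": 2},
  {"token": "vici",   "start_offset": 17, "end_offset": 21, "type": "<ALPHANUM>", "position": 3}
]}
"""

print(tokens_from_analyze_response(sample_response))
# ['ceaser', 'veni', 'vidi', 'vici']
```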

2) Then search for documents whose tokens are a subset of the search tokens:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match": {
                  "title": "ceaser veni,vidi,vici"
                }
              }
            },
            {
              "script": {
                "script": "return search_tokens.containsAll(doc['title'].values);",
                "params": {
                  "search_tokens": ["ceaser", "veni", "vidi", "vici"]
                }
              }
            }
          ]
        }
      }
    }
  }
}
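Since the body of the second request changes with every user query, it is convenient to build it in the client from the tokens returned by the first request. The following is a minimal sketch, assuming the same ES 1.x/2.x request shape as above (the `filtered` query was removed in later versions); the function name `subset_search_body` is hypothetical, and it only builds the body without sending it. The field name is interpolated into the script string, so it should be a trusted constant, and depending on the version inline scripts may also have to be enabled on the cluster.

```python
import json


def subset_search_body(field: str, user_query: str, tokens: list[str]) -> dict:
    """Build the filtered/bool request body: a match clause to narrow the
    candidates, plus a script clause that keeps a document only if all of
    its field tokens are among the search tokens."""
    script = f"return search_tokens.containsAll(doc['{field}'].values);"
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"query": {"match": {field: user_query}}},
                            {
                                "script": {
                                    "script": script,
                                    "params": {"search_tokens": list(tokens)},
                                }
                            },
                        ]
                    }
                }
            }
        }
    }


body = subset_search_body("title", "ceaser veni,vidi,vici",
                          ["ceaser", "veni", "vidi", "vici"])
print(json.dumps(body, indent=2))
```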

Here, the job of the first match query inside the filter is to narrow down the documents on which the script has to run. The containsAll method checks whether the document's tokens are a subset of the search tokens. This will be slow with the current setup. One big improvement would be to store the tokens as an array in their own field, so that doc['title'].values can be replaced with that field in the script.
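The division of labour between the two clauses can be simulated locally on pre-tokenized documents. In the sketch below, clause 1 mimics a match query with the default OR operator (keep documents sharing at least one token with the query) and clause 2 mimics the containsAll script. The in-memory documents and the function name are made up for illustration. One side effect is worth noting: containsAll on an empty collection is true, so without the match clause a document with no title tokens would also pass the script.

```python
def run_filters(docs: dict[str, list[str]], search_tokens: list[str]) -> list[str]:
    """Return the ids of documents passing both clauses, sorted."""
    query = set(search_tokens)
    # Clause 1 (like `match` with the OR operator): keep documents that
    # share at least one token with the search tokens.
    candidates = {doc_id: toks for doc_id, toks in docs.items() if query & set(toks)}
    # Clause 2 (like the containsAll script): keep a candidate only if
    # every one of its tokens is among the search tokens.
    return sorted(doc_id for doc_id, toks in candidates.items() if set(toks) <= query)


docs = {
    "a": ["veni", "vidi", "vici"],
    "b": ["veni", "vidi", "vici", "ceaser", "rome"],  # has an extra token
    "c": ["alea", "iacta", "est"],                    # no overlap at all
    "d": ["vici"],
    "e": [],                                          # empty field
}
print(run_filters(docs, ["ceaser", "veni", "vidi", "vici"]))  # ['a', 'd']
```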

Hope this helps!

