SimplyE 2.0 / SIMPLY-2748

Nondeterministic Elasticsearch behavior can duplicate and omit books from search results

    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • None
    • Server - Core
    • None

      This problem got worse recently on NYPL's circulation manager, possibly because we scaled up the Elasticsearch server and added nodes.

      Each node in an Elasticsearch cluster has a slightly different index, so it may order results slightly differently given the same query. This causes problems when the request for page N of results goes to node A and the request for page N+1 goes to node B. A book can be repeated (node A puts it at the end of page N, node B puts it at the beginning of page N+1). When that happens, some other book may be omitted (node A would have put it at the beginning of page N+1, but node B puts it at the end of page N).
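
The duplicate-and-omit effect described above can be reproduced with a toy simulation. This is only a sketch: the two rankings are made up, and the only assumption is that the nodes disagree on the order of two adjacent books that straddle a page boundary.

```python
def page(ranking, page_number, size):
    """Return one page (1-indexed) of a ranked list of books."""
    start = (page_number - 1) * size
    return ranking[start:start + size]


# Hypothetical rankings for the same query on two nodes.
# The nodes disagree only on the relative order of C and D.
node_a = ["A", "B", "C", "D", "E", "F"]
node_b = ["A", "B", "D", "C", "E", "F"]

size = 3
# Page 1 is served by node A, page 2 by node B.
seen = page(node_a, 1, size) + page(node_b, 2, size)

duplicated = sorted({book for book in seen if seen.count(book) > 1})
omitted = sorted(set(node_a) - set(seen))
```

Here C is shown twice (end of page 1 on node A, start of page 2 on node B) and D never appears at all.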

      When we use Elasticsearch to generate feeds, we avoid this problem using keyset pagination. That works for (e.g.) alphabetical order because all nodes agree on what alphabetical order is. It doesn't work for relevance because nodes don't quite agree on what "relevance" means.
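
For reference, keyset pagination in Elasticsearch is normally expressed with a deterministic sort plus search_after. The sketch below only builds the request body; the field names sort_title and work_id are hypothetical placeholders, not necessarily the fields the circulation manager uses.

```python
def keyset_query(query, size, last_sort_values=None):
    """Build a search body that pages by sort key instead of by offset.

    `last_sort_values` is the `sort` array of the last hit on the
    previous page, or None for the first page.
    """
    body = {
        "query": query,
        "size": size,
        # Hypothetical fields: a title sort plus a unique tiebreaker, so
        # that every node agrees on the total order.
        "sort": [{"sort_title": "asc"}, {"work_id": "asc"}],
    }
    if last_sort_values is not None:
        body["search_after"] = list(last_sort_values)
    return body


first_page = keyset_query({"match_all": {}}, 50)
next_page = keyset_query({"match_all": {}}, 50, ["moby dick", 1234])
```

This only works when the sort values are the same on every node, which is why it does not carry over to relevance scores.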

      The simplest way to mitigate this problem is to increase the search result page size: say, to 50 or 100. We've discussed this before, but I don't see a record of the discussion, and the circulation manager search page size is still ten. I don't know what the right number is, but ten is way too small: ten means SimplyE makes multiple HTTP requests every time someone runs a search, to fill up some internal buffer.

      I don't know how big the buffer is, and its size may differ between platforms. But at the very least, the circulation manager should serve enough search results that if you're using SimplyE and you run a search and don't scroll down, SimplyE only makes one HTTP request. I can make this change easily on the server side, but I need to know from the mobile team what number will work.

      Increasing the page size will reduce the chances of this problem happening, both because there will be fewer joins between pages, and because fewer users will scroll down far enough to trigger another page fetch.
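
To put rough numbers on the "fewer joins" point: the number of page boundaries a user crosses depends only on how many results they view and on the page size. The figure of 100 viewed results below is an arbitrary illustration, not a measurement.

```python
import math


def page_joins(results_viewed, page_size):
    """Number of page boundaries crossed while viewing `results_viewed` results."""
    return max(math.ceil(results_viewed / page_size) - 1, 0)


joins_at_10 = page_joins(100, 10)    # current page size
joins_at_50 = page_joins(100, 50)
joins_at_100 = page_joins(100, 100)
```

Each join is a place where a duplicate or omission can occur, so going from a page size of 10 to 50 cuts the opportunities within the first 100 results from nine to one.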

      This is speculation on my part, but by propagating node information through the search URL, we might be able to use the Elasticsearch search preference parameter to make it more likely that the page N+1 search goes to the same node as the page N search. This is a lot more work than changing the page size (which we should also do), but if it works it will 100% solve the problem.
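
One possible shape for this, as an untested sketch: mint an opaque token on the first page, carry it in the next-page URL, and pass it as the preference parameter of the Elasticsearch search request. The URL parameter name pref, the example.org URLs, and the helper names are all hypothetical. As far as I know, a custom preference string must not start with an underscore, which a UUID hex string never does.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlparse


def preference_token(request_url):
    """Reuse the token from the incoming URL, or mint one for a new search."""
    params = parse_qs(urlparse(request_url).query)
    return params.get("pref", [None])[0] or uuid.uuid4().hex


def next_page_url(base_url, query, page, token):
    """Build the next-page link, carrying the token forward."""
    return base_url + "?" + urlencode({"q": query, "page": page, "pref": token})


def es_params(token, page, size):
    """Query-string parameters for the Elasticsearch _search request."""
    return {"preference": token, "from": (page - 1) * size, "size": size}


token = preference_token("https://example.org/search?q=cats")
link = next_page_url("https://example.org/search", "cats", 2, token)
```

Because every page of one search session sends the same preference value, Elasticsearch should route those requests to the same shard copies as long as the cluster state does not change in between.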

      On the client side, SimplyE can be changed to filter duplicates out of the search results it displays, but this won't bring back the book that was omitted, so I think it's better to focus on server-side solutions.
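
For illustration only, client-side filtering amounts to something like the sketch below. It is written in Python for brevity rather than in the languages the mobile clients use, and the id key is a hypothetical stand-in for whatever identifier the feed entries carry.

```python
def dedupe_pages(pages, key="id"):
    """Flatten pages of results, keeping only the first copy of each book."""
    seen = set()
    unique = []
    for page in pages:
        for book in page:
            if book[key] not in seen:
                seen.add(book[key])
                unique.append(book)
    return unique


# Two pages served by different nodes: C is repeated and D is missing.
page_1 = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
page_2 = [{"id": "C"}, {"id": "E"}, {"id": "F"}]
shown = [book["id"] for book in dedupe_pages([page_1, page_2])]
```

The repeated C is dropped, but D is still absent from what the user sees, which is the limitation noted above.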

            Assignee: Unassigned
            Reporter: Leonard Richardson (leonardrichardson) [X] (Inactive)