  1. SimplyE 2.0
  2. SIMPLY-3858

Crash in genre classification code


    • Type: Bug
    • Resolution: Done
    • Priority: High
    • 4.0.2cm
    • Server - Core
    • None
    • SIMPLY S15 July 20 - Aug 3, SIMPLY S16 August 3 - 17, SIMPLY S19 Sep 14 - Sep 28, SIMPLY S20 Sep 28 - Oct 12, SIMPLY Sprint Oct 13 - Oct 26, SIMPLY S22 Oct 26 - Nov 9, SIMPLY S23 Nov 10 - 24, SIMPLY S24 Nov 24 - December 7, SIMPLY S25 December 8 - 23, SIMPLY S26 Dec 22 - Jan 5, SIMPLY Sprint 1 Jan 5 - 19, SIMPLY Sprint 2 Jan 19 - Feb 2, SIMPLY Sprint 3 Feb 2 - 16, SIMPLY S13 June 22 - July 6, SIMPLY S14 July 6 - 20

      While investigating a problem with NYPL's Axis 360 collection, I discovered a crash that can happen when determining which genres a book belongs to.

      Here's the stack trace:

      Traceback (most recent call last):
        File "/Users/leonardr/simplified/circulation/core/monitor.py", line 180, in run
          new_timestamp = self.run_once(progress)
        File "/Users/leonardr/simplified/circulation/core/monitor.py", line 270, in run_once
          self.catch_up_from(start, cutoff, progress)
        File "/Users/leonardr/simplified/circulation/api/axis.py", line 601, in catch_up_from
          self.process_book(bibliographic, circulation)
        File "/Users/leonardr/simplified/circulation/api/axis.py", line 609, in process_book
          bibliographic, circulation
        File "/Users/leonardr/simplified/circulation/api/axis.py", line 486, in update_book
          bibliographic.apply(edition, self.collection, replace=policy)
        File "/Users/leonardr/simplified/circulation/core/metadata_layer.py", line 1922, in apply
          self.circulation.apply(_db, collection, replace)
        File "/Users/leonardr/simplified/circulation/core/metadata_layer.py", line 1300, in apply
          work, work_changed = pool.calculate_work()
        File "/Users/leonardr/simplified/circulation/core/model/licensing.py", line 1085, in calculate_work
          work.calculate_presentation(exclude_search=exclude_search)
        File "/Users/leonardr/simplified/circulation/core/model/work.py", line 952, in calculate_presentation
          default_audience=default_audience
        File "/Users/leonardr/simplified/circulation/core/model/work.py", line 1338, in assign_genres
          default_audience=default_audience)
        File "/Users/leonardr/simplified/circulation/core/classifier/__init__.py", line 1244, in classify
          genres = self.genres(fiction)
        File "/Users/leonardr/simplified/circulation/core/classifier/__init__.py", line 1440, in genres
          genres = self.consolidate_genre_weights(genres)
        File "/Users/leonardr/simplified/circulation/core/classifier/__init__.py", line 1494, in consolidate_genre_weights
          for parent, (child, weight) in sorted(list(heaviest_child.items())):
      TypeError: '<' not supported between instances of 'GenreData' and 'GenreData'
      

      I don't have the classifier system in my mental cache, so I'm not sure what's going on, but this must be an extraordinarily rare occurrence given how many books are run through this system. It happens when there's more than one item in "heaviest_child", but I don't know how that might come about. Maybe it happens when a library licenses the same book from multiple vendors and the vendors disagree on how the book should be classified?
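The failing line can be reproduced in isolation. The sketch below is not the project's code: `GenreData` here is a stand-in class that assumes only a `name` attribute, and `sort_heaviest` is a hypothetical helper. It shows that sorting the dictionary's items works with one entry but raises the same TypeError with two, because Python 3 does not define `<` for objects that lack ordering methods. Sorting on an explicit key is one possible way to avoid the comparison.

```python
class GenreData:
    """Stand-in for the real class; only a `name` attribute is assumed."""

    def __init__(self, name):
        self.name = name


def sort_heaviest(heaviest_child):
    """Hypothetical helper: order (parent, (child, weight)) items by parent name."""
    return sorted(heaviest_child.items(), key=lambda item: item[0].name)


fantasy = GenreData("Fantasy")
romance = GenreData("Romance")
epic = GenreData("Epic Fantasy")
historical = GenreData("Historical Romance")

one_item = {fantasy: (epic, 10)}
two_items = {romance: (historical, 5), fantasy: (epic, 10)}

# A single item never needs a comparison, so the original call succeeds.
sorted(list(one_item.items()))

# Two items force a GenreData < GenreData comparison, which Python 3 rejects.
try:
    sorted(list(two_items.items()))
    raised = False
except TypeError:
    raised = True

# Sorting on the parent's name never compares GenreData objects directly.
ordered = sort_heaviest(two_items)
```

Whether ordering by name is the right behavior for consolidate_genre_weights is an open question; the sketch only shows that an explicit sort key sidesteps the crash.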

            michaelbenowitz Michael Benowitz