  1. SimplyE 2.0
  2. SIMPLY-1174

During Axis 360 ingest, I received the following e...


    • S1 SIMPLY Dec 26 - Jan 8

      During Axis 360 ingest, I received the following error:

      {... "message": "Error running monitor Axis 360 Circulation Monitor for collection XYZ Public Library - Axis 360: xmlParseCharRef: invalid xmlChar value 31, line 1, column 1051978 (line 1)", "filename": "scripts.py"}
      

      Multiple runs of the axis_monitor script resulted in the same error, but always with a different column location.

      After investigation, I found that the entity string

      
      

      was contained in the XML document downloaded from Baker & Taylor. Each downloaded file contained exactly one occurrence of the entity (I captured the document during each cron run for an hour).

      For a temporary fix, I added the following line to the api/axis.py script to "remove" the string, after line 304 (v2.2.8; line 445 as of commit f61b90b):

      content = content.replace('', '')
      

      However, I could have replaced line 304 with the following for succinctness:

      content = availability.content.replace('', '')
      

      Question: Do we have code that handles invalid Unicode/XML characters (or bad data in general)? I don't know nearly enough about regexes, XML, or Unicode, but I realize my temporary solution should be more general (say, something that would replace any control-character entity reference or invalid character other than the CR and LF characters). If there is no such code, the following reference might be helpful to someone who understands more than I do:

      https://chase-seibert.github.io/blog/2011/05/20/stripping-control-characters-in-python.html
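A more general filter along the lines asked about above might look like the following. This is only a minimal sketch, not existing project code: the function name `strip_invalid_xml` and its helpers are hypothetical, it assumes `content` has already been decoded to text (Python 3 `str`), and it takes the set of legal characters from the XML 1.0 `Char` production. It drops numeric character references (decimal or hex) that point at an illegal character, then drops any raw illegal characters, while keeping TAB, LF, and CR.

```python
import re

# Numeric character references, decimal (e.g. value 31) or hexadecimal.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Raw characters XML 1.0 forbids: control characters other than TAB, LF, CR,
# lone surrogates, and U+FFFE / U+FFFF.
_INVALID_RAW = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def _is_valid_xml_char(codepoint):
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


def strip_invalid_xml(text):
    """Remove illegal character references and illegal raw characters (hypothetical helper)."""

    def _replace_ref(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep references to legal characters untouched; drop the rest.
        return match.group(0) if _is_valid_xml_char(codepoint) else ""

    text = _CHAR_REF.sub(_replace_ref, text)
    return _INVALID_RAW.sub("", text)
```

Under those assumptions, the temporary fix would become something like `content = strip_invalid_xml(content)` before the document is handed to the XML parser. One caveat: the regex also rewrites text inside CDATA sections, where a character reference is literal text rather than markup, so this may be too aggressive for some feeds.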

      --Robert

      Reporter: Robert Williams
      E-mail: williams@amigos.org

            leonardrichardson Leonard Richardson [X] (Inactive)
            RobertWilliams Robert Williams [X] (Inactive)