SimplyE 2.0 / SIMPLY-384

Receiving HTML 404 pages as images via the Overdrive API


      It appears that HTML 404 error pages are sometimes passed back via the OverdriveBibliographicCoverageProvider as though they were images, resulting in this error:

      2016-11-01 13:47:50,109:Overdrive Bibliographic Coverage Provider:WARNING:coverage.py:Error applying metadata to edition 318917: cannot identify image file <cStringIO.StringI object at 0x7f0c580d0470>
      Traceback (most recent call last):
        File "/home/ec2-user/metadata/core/coverage.py", line 575, in _set_metadata
          edition, replace=metadata_replacement_policy,
        File "/home/ec2-user/metadata/core/metadata_layer.py", line 1454, in apply
          self.mirror_link(edition, data_source, link, link_obj, replace)
        File "/home/ec2-user/metadata/core/metadata_layer.py", line 512, in mirror_link
          max_age=max_age,
        File "/home/ec2-user/metadata/core/model.py", line 6933, in get
          representation.update_image_size()
        File "/home/ec2-user/metadata/core/model.py", line 7017, in update_image_size
          image = self.as_image()
        File "/home/ec2-user/metadata/core/model.py", line 7273, in as_image
          return Image.open(fh)
        File "/home/ec2-user/metadata/env/local/lib/python2.7/site-packages/PIL/Image.py", line 2295, in open
          % (filename if filename else fp))
      IOError: cannot identify image file <cStringIO.StringI object at 0x7f0c580d0470>
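One way to avoid handing non-image bytes to PIL's Image.open is to check the leading bytes ("magic numbers") of the fetched content before trusting the declared media type. The sketch below is written for Python 3 and assumes a hypothetical helper, sniff_image_type, that is not part of the metadata wrangler's code; the signatures it lists are the standard JPEG, PNG and GIF file headers.

```python
from typing import Optional

# Standard leading "magic" bytes for common cover-image formats.
# This short table is an illustrative assumption, not the list the
# metadata wrangler actually uses.
_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the media type implied by the first bytes of `data`,
    or None if they match no known image signature (hypothetical helper)."""
    for magic, media_type in _SIGNATURES:
        if data.startswith(magic):
            return media_type
    return None


def content_matches_declared_type(data: bytes, declared: str) -> bool:
    """True only if the bytes really are an image of the declared type."""
    return sniff_image_type(data) == declared
```

For a body beginning with b'\n<html xmlns=...', sniff_image_type returns None, so such content could be flagged before it ever reaches Image.open and raises IOError.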
      

      Further investigation demonstrated that while the Representation had a clean_media_type of 'image/jpeg', its content was actually an HTML 404 error page, and its status_code was 404:

      (Pdb) self.clean_media_type
      'image/jpeg'
      (Pdb) fh
      <cStringIO.StringI object at 0x1114e8f10>
      (Pdb) Image.open(fh)
      *** IOError: cannot identify image file <cStringIO.StringI object at 0x1114e8f10>
      (Pdb) fh.readlines()
      ['\n', '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n', '        <title>The page is not found</title>\n', '        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n', '        <style type="text/css">\n', '            /*<![CDATA[*/\n', '            body {\n', '                background-color: #fff;\n', '                color: #000;\n', '                font-size: 0.9em;\n', '                font-family: sans-serif,helvetica;\n', '                margin: 0;\n', '                padding: 0;\n', '            }\n', '            :link {\n', '                color: #c00;\n', '            }\n', '            :visited {\n', '                color: #c00;\n', '            }\n', '            a:hover {\n', '                color: #f50;\n', '            }\n', '            h1 {\n', '                text-align: center;\n', '                margin: 0;\n', '                padding: 0.6em 2em 0.4em;\n', '                background-color: #294172;\n', '                color: #fff;\n', '                font-weight: normal;\n', '                font-size: 1.75em;\n', '                border-bottom: 2px solid #000;\n', '            }\n', '            h1 strong {\n', '                font-weight: bold;\n', '                font-size: 1.5em;\n', '            }\n', '            h2 {\n', '                text-align: center;\n', '                background-color: #3C6EB4;\n', '                font-size: 1.1em;\n', '                font-weight: bold;\n', '                color: #fff;\n', '                margin: 0;\n', '                padding: 0.5em;\n', '                border-bottom: 2px solid #294172;\n', '            }\n', '            h3 {\n', '                text-align: center;\n', '                background-color: #ff0000;\n', '                padding: 0.5em;\n', '                color: #fff;\n', '            }\n', '            hr {\n', '                display: none;\n', '            }\n', '            .content {\n', '                padding: 1em 5em;\n', '            }\n', '            .alert {\n', '                border: 2px solid #000;\n', '            }\n', '\n', '            img {\n', '                border: 2px solid #fff;\n', '                padding: 2px;\n', '                margin: 2px;\n', '            }\n', '            a:hover img {\n', '                border: 2px solid #294172;\n', '            }\n', '            .logos {\n', '                margin: 1em;\n', '                text-align: center;\n', '            }\n', '            /*]]>*/\n', '        </style>\n', '    </head>\n', '\n', '    <body>\n', '        <h1><strong>404 Error</strong></h1>\n', '\n', '        <div class="content">\n', '\n', '            <h3>The page you are looking for is not found.</h3>\n', '\n', '        </div>\n', '    </body>\n', '</html>\n']
      (Pdb) self.status_code
      404
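The Pdb session shows a status_code of 404 even though clean_media_type was 'image/jpeg', so checking the status code before calling as_image() would have caught this case. A minimal Python 3 sketch, assuming a hypothetical function image_fetch_problem that takes the three values inspected above (it is not part of the Representation API); the HTML check is an extra assumption covering servers that return an HTML page with a 2xx status and an image media type.

```python
from typing import Optional


def image_fetch_problem(status_code: int, media_type: str,
                        content: bytes) -> Optional[str]:
    """Return a reason why a fetched 'image' should not be processed,
    or None if it looks safe to pass to an image library.

    All names here are hypothetical illustrations.
    """
    # A non-2xx status means the body is an error page, whatever the
    # declared media type claims.
    if not 200 <= status_code < 300:
        return "HTTP status %d; body is not the requested image" % status_code
    # The declared media type must at least claim to be an image.
    if not media_type.startswith("image/"):
        return "media type %r is not an image type" % media_type
    # Catch HTML/XML served under an image media type: skip leading
    # whitespace (the page in this issue starts with '\n') and look
    # at the opening tag.
    head = content.lstrip()[:64].lower()
    if head.startswith((b"<html", b"<!doctype html", b"<?xml")):
        return "declared %s but content looks like HTML/XML" % media_type
    return None
```

With the values from the session above (404, 'image/jpeg' and the HTML body), the function reports the HTTP status, so a caller could skip update_image_size() rather than letting Image.open raise IOError.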
      

      Issue is synchronized with a GitHub issue
      Repository Name: metadata_wrangler
      Issue Number: 104

            Leonard Richardson (Inactive)
            Github Sync
            Votes: 0
            Watchers: 3
