-
Bug
-
Resolution: Done
-
Lowest
-
None
-
None
It appears that HTML 404 error pages are sometimes passed back via the OverdriveBibliographicCoverageProvider as though they were images, resulting in this error:
2016-11-01 13:47:50,109:Overdrive Bibliographic Coverage Provider:WARNING:coverage.py:Error applying metadata to edition 318917: cannot identify image file <cStringIO.StringI object at 0x7f0c580d0470>
Traceback (most recent call last):
  File "/home/ec2-user/metadata/core/coverage.py", line 575, in _set_metadata
    edition, replace=metadata_replacement_policy,
  File "/home/ec2-user/metadata/core/metadata_layer.py", line 1454, in apply
    self.mirror_link(edition, data_source, link, link_obj, replace)
  File "/home/ec2-user/metadata/core/metadata_layer.py", line 512, in mirror_link
    max_age=max_age,
  File "/home/ec2-user/metadata/core/model.py", line 6933, in get
    representation.update_image_size()
  File "/home/ec2-user/metadata/core/model.py", line 7017, in update_image_size
    image = self.as_image()
  File "/home/ec2-user/metadata/core/model.py", line 7273, in as_image
    return Image.open(fh)
  File "/home/ec2-user/metadata/env/local/lib/python2.7/site-packages/PIL/Image.py", line 2295, in open
    % (filename if filename else fp))
IOError: cannot identify image file <cStringIO.StringI object at 0x7f0c580d0470>
Further investigation showed that while the Representation had a clean_media_type of 'image/jpeg', the content of the Representation was an HTML 404 error page (as its status_code confirms):
(Pdb) self.clean_media_type
'image/jpeg'
(Pdb) fh
<cStringIO.StringI object at 0x1114e8f10>
(Pdb) Image.open(fh)
*** IOError: cannot identify image file <cStringIO.StringI object at 0x1114e8f10>
(Pdb) fh.readlines()
['\n', '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n', ' <title>The page is not found</title>\n', ' <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n', ' <style type="text/css">\n', ' /*<![CDATA[*/\n', ' body {\n', ' background-color: #fff;\n', ' color: #000;\n', ' font-size: 0.9em;\n', ' font-family: sans-serif,helvetica;\n', ' margin: 0;\n', ' padding: 0;\n', ' }\n', ' :link {\n', ' color: #c00;\n', ' }\n', ' :visited {\n', ' color: #c00;\n', ' }\n', ' a:hover {\n', ' color: #f50;\n', ' }\n', ' h1 {\n', ' text-align: center;\n', ' margin: 0;\n', ' padding: 0.6em 2em 0.4em;\n', ' background-color: #294172;\n', ' color: #fff;\n', ' font-weight: normal;\n', ' font-size: 1.75em;\n', ' border-bottom: 2px solid #000;\n', ' }\n', ' h1 strong {\n', ' font-weight: bold;\n', ' font-size: 1.5em;\n', ' }\n', ' h2 {\n', ' text-align: center;\n', ' background-color: #3C6EB4;\n', ' font-size: 1.1em;\n', ' font-weight: bold;\n', ' color: #fff;\n', ' margin: 0;\n', ' padding: 0.5em;\n', ' border-bottom: 2px solid #294172;\n', ' }\n', ' h3 {\n', ' text-align: center;\n', ' background-color: #ff0000;\n', ' padding: 0.5em;\n', ' color: #fff;\n', ' }\n', ' hr {\n', ' display: none;\n', ' }\n', ' .content {\n', ' padding: 1em 5em;\n', ' }\n', ' .alert {\n', ' border: 2px solid #000;\n', ' }\n', '\n', ' img {\n', ' border: 2px solid #fff;\n', ' padding: 2px;\n', ' margin: 2px;\n', ' }\n', ' a:hover img {\n', ' border: 2px solid #294172;\n', ' }\n', ' .logos {\n', ' margin: 1em;\n', ' text-align: center;\n', ' }\n', ' /*]]>*/\n', ' </style>\n', ' </head>\n', '\n', ' <body>\n', ' <h1><strong>404 Error</strong></h1>\n', '\n', ' <div class="content">\n', '\n', ' <h3>The page you are looking for is not found.</h3>\n', '\n', ' </div>\n', ' </body>\n', '</html>\n']
(Pdb) self.status_code
404
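One possible guard, sketched here as a suggestion rather than as the fix that was applied: before handing the content to Image.open, check the status code and confirm that the bytes begin with a signature matching the declared media type. The helper is_usable_image and the IMAGE_SIGNATURES table are hypothetical names made up for this illustration and are not part of the metadata_wrangler code. The sketch is written for Python 3 and uses only the standard library.

```python
# Hypothetical guard; none of these names come from the metadata_wrangler code.
# Leading "magic" bytes for a few common image formats.
IMAGE_SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def is_usable_image(status_code, media_type, content):
    """Return True only if the response plausibly holds image bytes."""
    # An error response can carry an HTML body under any declared media type.
    if status_code is None or not 200 <= status_code < 300:
        return False
    signatures = IMAGE_SIGNATURES.get(media_type)
    if signatures is None:
        # Unknown media type: this sketch refuses rather than guessing.
        return False
    # bytes.startswith accepts a tuple of candidate prefixes.
    return content.startswith(signatures)


# The start of the 404 page from the pdb session above, as bytes.
html_404 = b'\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n'

print(is_usable_image(404, "image/jpeg", html_404))  # False
print(is_usable_image(200, "image/jpeg", html_404))  # False
print(is_usable_image(200, "image/jpeg", b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # True
```

The second call shows why checking the status code alone may not be enough: a server could return an HTML error page with a 200 status, so the signature check catches that case as well.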
┆Issue is synchronized with a GitHub issue
┆Repository Name: metadata_wrangler
┆Issue Number: 104