502 error printout breaks console interface #116
Comments
There is no such thing as "special characters". If you're dealing with text, your software should know the encoding it is in and handle that properly. Just assuming it's a single-byte encoding is a bad idea, especially since the Flickr API documentation pretty much screams that everything is UTF-8. Ignoring character encoding will always turn around to bite you.
AFAIK the library doesn't
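To make the encoding point concrete, here is a minimal sketch (not taken from the library) of how the same bytes read correctly when decoded as UTF-8 and turn into mojibake when treated as a single-byte encoding. The sample text is borrowed from the folder name used later in this thread; it is not an actual Flickr response.

```python
# Illustrative sample only, not a real API payload.
payload = "Várias Pics".encode("utf-8")

# Decoding with the encoding the data is actually in gives the original text.
as_utf8 = payload.decode("utf-8")

# Assuming a single-byte encoding "works" but silently produces garbage.
as_latin1 = payload.decode("latin-1")

print(len(payload))  # 12 bytes for 11 characters, because "á" takes two bytes
print(as_utf8)
print(as_latin1)
```

The single-byte decode never raises an error, which is why this kind of bug tends to go unnoticed until the text is displayed.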
Hey @sybrenstuvel, I did some further digging, and I still think there is a logging problem in the repo code, specifically in this line. First, let me say that I did not feed non-UTF text into the API interface. The issue is in the response payload from Flickr. With that in mind, I believe the possible error lies not within logging, but in "urllib_parse.unquote". Let me explain with a fun experiment:
Cheers!
Please don't screenshot your code. Just use Markdown to format it properly. That will allow me to copy-paste whatever you did and try it myself, instead of having to type everything myself. Your use of the
Hey @sybrenstuvel, yeah, you are right. That issue is specific to Python 2. However, my original screenshot was taken while running Python 3.5. I was doing a quick test on my laptop on my way out when I submitted the last post, so the issue is still there; I just did not reproduce the right one. I dug a bit further and tried to replicate the issue. The problem is about displaying this page. However, when I tried to google the specific HTML code for this page in order to load the webpage again, I failed to find any. And because the server issues are rare, I cannot replicate it by sending requests. Still, I looked into it, and I believe the code breaks when it displays Korean text. So I downloaded the HTML source of the official Korea Tourism website to get some Korean byte strings. Now we can successfully locate the issue:
import logging
from urllib import parse as urllib_parse
# the following line should freeze your console or Python interpreter; if not, let me know
logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
Some info I hope it helps...
$ python3.4
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> from urllib import parse as urllib_parse
>>> logging.error(urllib_parse.unquote('\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80</a></li>\n'))
ERROR:root:무���거�</a></li>
>>> print('still here')
still here
>>>
hope it helps
Why are you unquoting a string that clearly isn't URL-encoded at all?
Hey @sybrenstuvel, I got confused about that part too. If the error is generated at this line, then using the unquote function is a mistake. The function should parse a URL string, not request text.
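For reference, a small sketch of what is going on with the test string above, assuming Python 3. The literal is a `str` of Latin-1 code points rather than bytes, and it contains no percent-escapes, so `unquote` has nothing to decode. The variable names are only for illustration.

```python
from urllib.parse import quote, unquote

# In Python 3 this literal is a str whose characters are code points 0xEB, 0xAC, ...
raw = '\xeb\xac\xb4\xeb\x8b\xa8\xec\x88\x98\xec\xa7\x91\xea\xb1\xb0\xeb\xb6\x80'

# There are no %XX escapes, so unquote() returns the string unchanged.
unchanged = unquote(raw)

# Reinterpreting the code points as bytes and decoding as UTF-8 recovers the Korean text.
recovered = raw.encode('latin-1').decode('utf-8')

# This is the kind of input unquote() is meant for: a percent-encoded string.
encoded = quote(recovered)
round_trip = unquote(encoded)

print(unchanged == raw)
print(encoded)
print(round_trip == recovered)
```

So the garbled log line comes from the string already being mis-decoded before it reaches `unquote`, not from `unquote` itself.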
@newpro just trying to help out. Would you mind going back to the beginning? I have a wild guess that the console/shell might not have the appropriate locale settings and may be getting confused!
# I've used this setting to allow support for international characters in
# folders and file names
export LC_ALL=en_US.utf8
export LANG=en_US.utf8
$ echo $LANG
en_US.UTF-8
$ find . -type d
.
./Test Photo Library/Várias Pics
$ LANG=en_US find . -type d
.
./Test Photo Library/V??rias Pics
$
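The same locale guess can be checked from inside Python. This is a rough sketch using only the standard library; the helper names `describe_console_encoding` and `can_print` are hypothetical and not part of any library discussed here.

```python
import locale
import sys


def describe_console_encoding():
    """Report the encodings Python will use when writing text to the console."""
    return {
        "stdout": sys.stdout.encoding,
        "preferred": locale.getpreferredencoding(False),
    }


def can_print(text, encoding):
    """Return True if `text` can be represented in `encoding` without errors."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


print(describe_console_encoding())
print(can_print("Várias Pics", "utf-8"))
print(can_print("Várias Pics", "ascii"))
```

If the reported encodings are not UTF-8, non-ASCII text in log output is a likely suspect for display problems.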
Hey @sybrenstuvel
Thanks so much for the repo! It really saves me a lot of time in computer vision research.
The Flickr server sometimes returns a 502 error, even though it is very rare. My current strategy is to catch the error and do an exponential backoff, waiting for the Flickr server to recover. The strategy works very well; however, in some cases the library prints the 502 error message payload, which is the 502 web page, and this breaks the console, most likely because special characters send the program into memory space that should not be accessed. The program still seems to be running and collecting data, but it cannot print further messages to monitor progress. I attached screenshots of the symptoms for reference.
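The catch-and-back-off strategy described above could look roughly like the following. This is a sketch under assumptions: `TransientServerError` is a hypothetical stand-in for whatever exception the client raises on a 502, and `call_with_backoff` is an illustrative helper, not part of the library.

```python
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for the error raised when the server answers 502."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays on transient errors."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.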
If I may recommend a fix, the easy way would be to disable printing of the payload, or to specifically filter out the special characters that cause the console to break.
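As a sketch of the filtering idea, assuming the payload is available as `str` or `bytes` before it is logged: escape everything that is not printable ASCII and cap the length. The name `summarize_payload` is hypothetical, and this is not how the library currently behaves.

```python
import logging


def summarize_payload(payload, limit=200):
    """Return a console-safe, truncated rendering of a response payload."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8", errors="replace")
    # Turn non-ASCII characters into \uXXXX style escapes.
    safe = payload.encode("ascii", errors="backslashreplace").decode("ascii")
    # Turn control characters (newlines, escape sequences) into visible escapes.
    safe = "".join(ch if ch.isprintable() else repr(ch)[1:-1] for ch in safe)
    if len(safe) > limit:
        safe = safe[:limit] + "... (truncated)"
    return safe


logging.error("Server error, payload: %s", summarize_payload("<html>\x1b[2J 502 \ubb34</html>\n"))
```

Escaping rather than dropping characters keeps the log line useful for debugging while preventing terminal control sequences from reaching the console.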
Thanks again!
Head of the message:
Tail of the message: