Replies: 2 comments
-
From the URL spec: https://url.spec.whatwg.org/#percent-encoded-bytes, emphasis added.
You said:
Therefore, you're deliberately sending data against spec and getting an unexpected result. You need to percent encode the characters that must be encoded.
-
@davidism thank you for your answer. These are not requests we send ourselves but rather requests we receive in the wild. Maybe I was not clear enough about this in my description. I absolutely agree there shouldn't be any requests with code points greater than U+007E (~) without percent-encoding; however, there are, and as the recipient one can't really influence them.
-
Hi,
first off, thank you for your effort and work on werkzeug and flask; they are really great.
In some of our projects where we use flask we regularly see `UnicodeDecodeError` when accessing request parameters with valid latin1 codes (with `request.args.get`). This happens when using WSGI servers other than werkzeug's development server.

As an example, the following minimal request will trigger the exception because of the `á` when gunicorn is used:

Received bytes:
The request received and passed by the WSGI server sets a valid latin1 encoded str in the environment.
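For context, a minimal sketch of what "a valid latin1 encoded str" means here: under the WSGI convention (PEP 3333), environ values are native strings in which each code point stands for one raw byte. The parameter name `q` below is an assumption for illustration only and does not come from the original request.

```python
# Raw query bytes as they might arrive on the wire (parameter name "q" is hypothetical).
raw = b"q=\xe1"

# WSGI-style native string: every byte becomes one code point in U+0000..U+00FF.
environ_value = raw.decode("latin1")

# The original bytes can be recovered losslessly by encoding back to latin1.
recovered = environ_value.encode("latin1")

print(repr(environ_value))
print(recovered == raw)
```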
Without handling this directly in werkzeug, the only solutions I've found are to
globally catch UnicodeDecodeError or to catch the exception in a try block
(with a wrapper function). Both basically work but are not necessarily what you want.
Therefore I'd like to discuss whether another solution is possible within werkzeug.
I'm happy about any recommendation or hint and will gladly create a pull request for a solution.
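The wrapper-function workaround mentioned above could look roughly like the following. This is only a sketch: `get_arg_or_default` and `broken_args` are hypothetical names, and the stand-in below simulates the decoding failure instead of using a real request object.

```python
def get_arg_or_default(args_factory, key, default=None):
    """Return args_factory().get(key, default), or default if decoding fails."""
    try:
        return args_factory().get(key, default)
    except UnicodeDecodeError:
        return default


def broken_args():
    # Stand-in for a request whose query string is not valid UTF-8:
    # decoding the lone byte 0xE1 raises UnicodeDecodeError.
    return {"q": b"\xe1".decode("utf-8")}


print(repr(get_arg_or_default(broken_args, "q", default="")))
print(repr(get_arg_or_default(lambda: {"q": "ok"}, "q")))
```

In a flask view the factory would presumably be something like `lambda: request.args`, so that the access which triggers decoding happens inside the try block.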
Steps to reproduce
To reproduce simply create a venv and install the required packages:
For running the commands it is assumed that the venv is activated.
As most libraries and tools automatically percent-encode characters > 128 or escape certain chars, a raw request is used:
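A raw request of this kind can be sketched with a plain socket. This is not the original `raw_request.py`; the helper names, the parameter name `q`, and the host and port are assumptions chosen to match the gunicorn command used below.

```python
import socket


def build_raw_request(host: str, port: int, query: bytes) -> bytes:
    """Build a GET request whose query string is sent as-is, without percent-encoding."""
    request_line = b"GET /?" + query + b" HTTP/1.1\r\n"
    headers = f"Host: {host}:{port}\r\nConnection: close\r\n\r\n".encode("ascii")
    return request_line + headers


def send_raw_request(host: str, port: int, payload: bytes) -> bytes:
    """Send the payload and return everything the server writes back."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage against a locally running server (hypothetical parameter name "q"):
#   payload = build_raw_request("127.0.0.1", 5000, b"q=\xe1")
#   print(send_raw_request("127.0.0.1", 5000, payload).decode("latin1"))
```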
Server and WSGI app - example was taken from https://werkzeug.palletsprojects.com/en/3.0.x/tutorial/
Start gunicorn:
gunicorn 'test:application' --bind 127.0.0.1:5000
If you run the `python raw_request.py` script against gunicorn you will see the UnicodeDecodeError.

With the same run of `python raw_request.py` against the development server started with

everything is fine.
Details
To find out the difference in parameter handling between werkzeug's and other WSGI servers,
I tracked the path of the parameters from reception to access. As an example of another
server I've used gunicorn.
werkzeug's development server
0.) bytes are received and converted to string by http/server: `str(self.raw_requestline, 'iso-8859-1')`
1.) string to bytes to string: `encode('utf-8').decode("latin1")` in werkzeug/wsgi.py
2.) string to bytes: `encode("latin1")` in werkzeug/wrappers/request.py (Request init)
3.) bytes to string: `decode("utf-8")` in werkzeug/sansio/request.py args

All operations applied to a received parameter would be:
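As a rough sketch, the development-server chain can be replayed step by step on the single byte for `á`. The variable names are illustrative, and each step mirrors only the conversion listed above, not the actual werkzeug code.

```python
raw = b"\xe1"  # latin1 byte for "á", sent without percent-encoding

step0 = str(raw, "iso-8859-1")                   # bytes -> str, U+00E1
step1 = step0.encode("utf-8").decode("latin1")   # extra round trip: two code points
step2 = step1.encode("latin1")                   # str -> bytes, now valid UTF-8
step3 = step2.decode("utf-8")                    # bytes -> str, back to U+00E1

print(repr(step1), step2, repr(step3))
```

The extra round trip in step 1 is what turns the lone latin1 byte into a valid two-byte UTF-8 sequence, so the final decode succeeds.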
gunicorn
0.) `str(b, 'latin1')` received by the respective worker (gunicorn/http/message.py parse_request_line -> urlsplit in werkzeug/utils)
2.) string to bytes: `encode("latin1")` in werkzeug/wrappers/request.py (Request init)
3.) bytes to string: `decode("utf-8")` in werkzeug/sansio/request.py args

All operations applied to a received parameter would be:
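The same replay for the gunicorn path, again only as a sketch with illustrative variable names, shows where the exception comes from:

```python
raw = b"\xe1"  # latin1 byte for "á", sent without percent-encoding

step0 = str(raw, "latin1")       # bytes -> str, U+00E1
step2 = step0.encode("latin1")   # str -> bytes, still the lone byte 0xE1

error = None
try:
    step3 = step2.decode("utf-8")  # 0xE1 alone is not a complete UTF-8 sequence
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)
```

Without the intermediate round trip the bytes reaching the final decode are the original latin1 bytes, which are not valid UTF-8.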
In 0.) the request is received and initially decoded as latin1/iso-8859-1. 2.) and 3.) are done in werkzeug regardless of the WSGI server used.
The difference is in 1.), the `encode('utf-8').decode("latin1")`. This is only done when the development server is used.

Possible fixes
Setting the error handler of decode to `ignore` "fixes" the issue but might remove valid codes.

Another possibility would be to do the `encode('utf-8').decode("latin1")` in `werkzeug/wrappers/request.py` or `werkzeug/sansio/request.py` instead of werkzeug/wsgi.py; then handling would be the same for the development server as well as for any other WSGI server.

If you need any further information or have questions please let me know.
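To illustrate the two options under "Possible fixes" on the single-character case, here is a small sketch. It only replays the string conversions and makes no claim about how werkzeug would implement either option.

```python
value = "\xe1"  # "á" as a latin1-decoded str, as passed on by gunicorn

# Option 1: errors="ignore" avoids the exception but silently drops the byte.
ignored = value.encode("latin1").decode("utf-8", errors="ignore")

# Option 2: apply the extra round trip first, then the usual encode/decode steps.
danced = value.encode("utf-8").decode("latin1")
restored = danced.encode("latin1").decode("utf-8")

print(repr(ignored))
print(repr(restored))
```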
Thank you for taking time!