Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose new path used to compute new url in RequestRedirect #3000

Open
tobias-urdin opened this issue Nov 11, 2024 · 4 comments · May be fixed by #3001
Open

Expose new path used to compute new url in RequestRedirect #3000

tobias-urdin opened this issue Nov 11, 2024 · 4 comments · May be fixed by #3001

Comments

@tobias-urdin
Copy link

The RequestRedirect exception is raised with new_url set to
the computed URL that should be redirected to, but we don't
expose the path that was used to compute the URL in the exception
so depending code needs to urlparse() the URL even though
werkzeug already has the path and can pass it along in the exception.

Referring to this example code:

    def resolve_endpoint(self, environ, path, method):
        """Resolve endpoint from the url map.

        :environ Environment: The environment.
        :path str: The request path.
        :method str: The request method.
        :returns str: The endpoint from the url map.
        """

        urls = url_map.bind_to_environ(environ)
        try:
            endpoint, _ = urls.match(path, method=method)
            return endpoint
        except routing.RequestRedirect as e:
            r = urlparse(e.new_url)
            return resolve_endpoint(environ, r.path, method)
            # FIXME: this could be e.new_path if RequestRedirect passed path
            # return resolve_endpoint(environ, e.new_path, method)
        except (wexc.NotFound, wexc.MethodNotAllowed):
            pass

        return None
tobias-urdin added a commit to tobias-urdin/werkzeug that referenced this issue Nov 11, 2024
When we raise the RequestRedirect exception
we have already computed the new url that
should be redirected to in the exception
(new_url) but we don't pass the the
new path that we used to compute the url.

This causes another layer of redirection
in depending code that needs to urlparse()
the url to get the path we already have
the data for.

This adds new_path to the RequestRedirect
exception and populates it with the path
used when computing new_url.

Fixes: pallets#3000
tobias-urdin added a commit to tobias-urdin/werkzeug that referenced this issue Nov 11, 2024
When we raise the RequestRedirect exception
we have already computed the new url that
should be redirected to in the exception
(new_url) but we don't pass the the
new path that we used to compute the url.

This causes another layer of redirection
in depending code that needs to urlparse()
the url to get the path we already have
the data for.

This adds new_path to the RequestRedirect
exception and populates it with the path
used when computing new_url.

Fixes: pallets#3000
@tobias-urdin tobias-urdin linked a pull request Nov 11, 2024 that will close this issue
@davidism
Copy link
Member

davidism commented Nov 11, 2024

In what situation do you need this?

@tobias-urdin
Copy link
Author

In this case it's a WSGI middleware that uses werkzeg's routing module for matching routes to know when to apply a specific logic. The module does a great job to match URLs but we need to handle cases where URLs contain a trailing slash or double slashes in the middle of the URL.

Example: Lets say we want to match /some/cool/url we also need to match //some/cool/url, /some/cool/url/ and /some//cool/url – works great in the current form but we could optimize away the extra urlparse() per request if werkzeug exposed the URL path it wants to redirect to and not only giving us the full URL in the RequestRedirect exception.

@davidism
Copy link
Member

davidism commented Nov 12, 2024

You could disable merge_slashes, strict_slashes, and redirect_defaults if you don't want to cause redirects and just want them to match the endpoint. You'll presumably end up with the redirects anyway once you route in the app, which would result in this logic running multiple times (the first when you match without redirects and apply it, the the second after you redirect then match and apply again). Also note that redirects can happen for other reasons, such as the rule itself being a redirect, and redirects can go to more places than the current host or subdomain. So it's not clear this would be usable in all cases.

If you're calling match in middleware, then calling it again in the actual app to do routing, then presumably the performance of a single urlsplit call is insignificant. But if it is significant, you can also do much simpler fast string operations to get the path: path = url.partition("/")[2].partition("?")[0].

@tobias-urdin
Copy link
Author

This is how the Map and Rule is setup, with this and the resolve_endpoint() above we can have it return cool as endpoint for all the URLs in my previous comment, and if it returns None we just pass the request directly to the application without intercepting it.

url_rules = [
    routing.Rule('/some/cool/url', methods=['POST'], endpoint='cool'),
]
url_map = routing.Map(
    rules=url_rules,
    strict_slashes=False,
    merge_slashes=True,
    redirect_defaults=False,
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants