Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Jinja2 autoescaping #1921

Merged

Conversation

haxtibal
Copy link
Contributor

@haxtibal haxtibal commented Jul 18, 2024

This enables Jinja2 autoescaping everywhere, except for the diff screen which relies on pre generated HTML. The MR is not meant to be merged as-is.

It fixes the issues observed in #1920. Most integration tests pass. Tests fail for UI forms, presumably because they call html.escape manually and now are escaped twice.

Jinja2 recommneds autoescaping:

In future versions of Jinja we might enable autoescaping by default for security reasons.
As such you are encouraged to explicitly configure autoescaping now instead of relying on the default.

Templates access way more functions and variables than just enumerate_all_fields(). All need to be escaped. This does it automatically without risk of forgetting some places.

This MR changes some formerly explicit escaping:

  • enumerate_all_fields_escaped was the html.escape() variant of enumerate_all_fields. It's only evaluated in Jinja2 templates, therefore completely covered by autoescaping now. This also eliminates the need for get_text_value_escaped, which was only used by enumerate_all_fields_escaped.
  • RequirementFormField.field_escaped_value was the html.escape() variant of field_unescaped_value. field_escaped_value is only evaluated in Jinja2 templates, therefore completely covered by autoescaping now.
  • SearchScreenViewObject.search_value was explictly html.escape()ed. search_value is only read by Jinja2 templates, therefore completely covered by autoescaping now.
  • SourceFileViewHTMLGenerator changes some pygments output lines. pygements output is escaped, but we still have to escape our own changes. Changed html.escape to markupsafe.escape because that's what Jinja2 uses. It's better to use the same escaping function everywhere.
  • TextToHtmlWriter escaped plain text from multiline statements. We still need to do that. Changed html.escape to markupsafe.escape for same reason as above.

Considerations about what to escape

  • render_*_link make strings used as URL in href="xyz" . It's derived from filenames and UIDs. UIDs are restricted by a grammar regex. But Unix files can have weird names like document"<injected>.sdoc. So generated links should be URL encoded. HTML encoding is then not strictly necessary (wouldn't hurt either; could be considered as safety net). Suggestion: Do proper URL encoding in another PR; for now autoescaping prevents at least injection attacks.

TODO:

  • investigate failed ruff checks: Unrelated, new or changed ruff rule PLR1714
  • investigate failed integration tests (works locally): unrelated, we need either old filecheck 0.0.24 or fixed filecheck > 1.0.0
  • investigate failed end2end tests
  • try to replace some | safe with Markup() in code
  • reconsider if autoescaping could be used for the diff screen
  • Templates often render a variable that contains the output of calling a nested Template.render() programmatically. This output can be considered safe, yet we have to wrap it in Markup. Is there a way to automatically flag Template.render() as safe? Should already be that way: "Jinja functions (macros, super, self.BLOCKNAME) always return template data that is marked as safe."

Copy link
Collaborator

@stanislaw stanislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable. Please proceed with the Jinja2 autoescape approach 👍

@haxtibal haxtibal force-pushed the tdmg/escape_singleline_html branch from 6039713 to 3757368 Compare July 20, 2024 11:46
@haxtibal
Copy link
Contributor Author

Integration tests fail because the recently released filecheck 1.0.0 doesn't support --dump-input. This is a regression unrelated to this PR.

@stanislaw
Copy link
Collaborator

Integration tests fail because the recently released filecheck 1.0.0 AntonLydike/filecheck#19. This is a regression unrelated to this PR.

Sorry for the inconvenience.

We are in the process of transferring the filecheck project to a group of developers who want to actively maintain and develop their own version of filecheck instead of FileCheck.py that was previously developed by me. We were too quick to release a 1.0.0, and it doesn't have the dump-input implemented yet.

I have reported this here: AntonLydike/filecheck#19 (comment).

@stanislaw stanislaw added this to the 2024-Q3 milestone Jul 20, 2024
@haxtibal haxtibal force-pushed the tdmg/escape_singleline_html branch from d1a7be8 to 9b71ab0 Compare July 20, 2024 22:21
@haxtibal
Copy link
Contributor Author

The latest commit "Mark all template.render() output as Markup" is to be discussed: We have to flag every programmatically generated HTML string as "safe", otherwise autoescaping would wrongly process them. This notably applies to all template.render() calls. I hoped Jinja2 would automatically mark the render output as safe, but seemingly not. So I tried to centralize Markup wrapping a bit. The result is a systematic but big diff. Would it be ok for you?

I asked Jinja2 folks whether such a feature could be implemented, see pallets/jinja#2003.

@haxtibal haxtibal force-pushed the tdmg/escape_singleline_html branch from 9b71ab0 to 92f45e1 Compare July 21, 2024 09:00
@stanislaw
Copy link
Collaborator

stanislaw commented Jul 21, 2024

The latest commit "Mark all template.render() output as Markup" is to be discussed: We have to flag every programmatically generated HTML string as "safe", otherwise autoescaping would wrongly process them. This notably applies to all template.render() calls. I hoped Jinja2 would automatically mark the render output as safe, but seemingly not. So I tried to centralize Markup wrapping a bit. The result is a systematic but big diff. Would it be ok for you?

I like the idea of render_template_as_markup because it also reduces the boilerplate everywhere.

I commented two times, asking about the possible redundancy of Markup / autoescape=True. I would like you to confirm that both mechanisms have to be used by us (Markup and autospace=True).

I asked Jinja2 folks whether such a feature could be implemented, see pallets/jinja#2003.

I could not understand what you are referring to in your example there:

The Template knows inner is created with autoescape=True. Couldn't it mark the result of render automatically as safe?

How can it make the result of render automatically if the result is a simple str object?

>>> inner = Template('<b>{{ first_name }}</b> {{ last_name }}', autoescape=True).render(first_name='John', last_name="Doe")
>>> print(type(inner))
<class 'str'>

While reviewing your PR, I considered an alternative approach that could be possible instead of doing the escaping with Jinja.

I recently encountered several implicit conversions with Jinja, when an object is implicitly called __str__ and so something incorrect is rendered to Jinja without me or the integration tests noticing this magic stuff.

An alternative could be that the whole StrictDoc uses a wrapper of a str class, e.g. sdoc_str, that would disable the __str__ method with an explicit assert. Instead the safe version of a string would be implemented as Markup(self) and only this method would be called in each and every Jinja template.

With this approach, the diff could be even bigger but in return we would get a very good control over the strings that we really want to use everywhere. It is in the same direction like the code of field_escaped_value/field_unescaped_value that you are removing now but this escaping stuff would be inside the sdoc_str.

I have confidence that this approach could work just fine but I don't have enough data to compare the advantages/disadvantages of using Jinja's escaping magic vs packing that explicitly in sdoc_str.

The sdoc_str approach would fit the current philosophy of SDoc of doing as many things explicit as possible. The Jinja autoescaping stuff is a different approach but also a capable one.

Another argument for maintaining sdoc_str would be that the escaping business is really core to what StrictDoc does, so keeping this function at the core of all string types could make sense.

And finally there was another source for me thinking in the direction of custom string wrappers. StrictDoc has many cases where we want to deal with a non-empty string and we never want an empty string. Python doesnot support this out of the box, so having a class such as sdoc_ne_str or something like that would simplify code in many places. Maybe it is not too bad of an idea to combine the "non-emptiness" and "markup safety" in one or in a family of str-based wrappers.

EDITED:

These are only thoughts. Let's move forward with Jinja's autoescape, safe, Markup. The alternative approach could work but let's keep it aside.

@haxtibal haxtibal force-pushed the tdmg/escape_singleline_html branch 2 times, most recently from f9de42d to 175b8a8 Compare July 23, 2024 09:02
@haxtibal haxtibal force-pushed the tdmg/escape_singleline_html branch from 175b8a8 to e198268 Compare July 23, 2024 21:05
@haxtibal
Copy link
Contributor Author

EDITED:

These are only thoughts. Let's move forward with Jinja's autoescape, safe, Markup. The alternative approach could work but let's keep it aside.

Sorry for not commenting on your alternative approach yet. It sounds interesting. I was quite focused here and wanted to get autoescaping in a workable state first. Will comment soon and suggest to hop back to #1920 for discussion to not interleave the autoescaping approach with the sdoc_str approach too much.

@stanislaw stanislaw changed the title RFC: Jinja2 autoescape demo Use Jinja2 autoescaping Jul 24, 2024
@stanislaw stanislaw merged commit fd2e364 into strictdoc-project:main Jul 24, 2024
23 checks passed
@stanislaw
Copy link
Collaborator

Thanks a lot for contributing this!

haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Jul 26, 2024
SourceFileViewHTMLGenerator escaping code had multiple issues:

- When marking each line from range_start_pragma_processor as safe, we
  wrapped not only strings but also whole tuples in Markup().
- For tuple elements we didn't differentiate whether they from pygments
  (already escaped) or from source file (not yet escaped). Only the
  later must be explicitly escaped.

Relates to strictdoc-project#1921.
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Jul 26, 2024
SourceFileViewHTMLGenerator escaping code had multiple issues:

- When marking each line from range_start_pragma_processor as safe, we
  wrapped not only strings but also whole tuples in Markup().
- For tuple elements we didn't differentiate whether they from pygments
  (already escaped) or from source file (not yet escaped). Only the
  later must be explicitly escaped.

Some changes are there to make mypy happy.

Relates to strictdoc-project#1921.
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Jul 26, 2024
SourceFileViewHTMLGenerator escaping code had multiple issues:

- When marking each line from range_start_pragma_processor as safe, we
  wrapped not only strings but also whole tuples in Markup().
- For tuple elements we didn't differentiate whether they from pygments
  (already escaped) or from source file (not yet escaped). Only the
  later must be explicitly escaped.

Some changes are there to make mypy happy.

Relates to strictdoc-project#1921.
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Jul 27, 2024
SourceFileViewHTMLGenerator escaping code had multiple issues:

- When marking each line from range_start_pragma_processor as safe, we
  wrapped not only strings but also whole tuples in Markup().
- For tuple elements we didn't differentiate whether they from pygments
  (already escaped) or from source file (not yet escaped). Only the
  later must be explicitly escaped.

Some changes are there to make mypy happy.

Relates to strictdoc-project#1921.
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Jul 28, 2024
SourceFileViewHTMLGenerator escaping code had multiple issues:

- When marking each line from range_start_pragma_processor as safe, we
  wrapped not only strings but also whole tuples in Markup().
- For tuple elements we didn't differentiate whether they from pygments
  (already escaped) or from source file (not yet escaped). Only the
  later must be explicitly escaped.

Some changes are there to make mypy happy.

Relates to strictdoc-project#1921.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants