Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend resource_files to general Traversables #2946

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

LecrisUT
Copy link
Contributor

@LecrisUT LecrisUT commented May 20, 2024

Exploring "Approach 1" in #2945.

Basically trying to support MultiplexedPath + entry-points

Immediate consequence, external plugins can now fully expand schema verification 🎉. Maybe even more.

Soft depends-on: #2812
Depends-on: #2959


Pull Request Checklist

  • implement the feature
  • write the documentation
  • extend the test coverage
  • mention the version
  • include a release note

@LecrisUT LecrisUT force-pushed the feat/plugin_schemas branch 4 times, most recently from 7e155fa to 5c79195 Compare May 20, 2024 19:24
Copy link
Contributor Author

@LecrisUT LecrisUT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this on tmt-cmake by creating a schemas and to my surprise, it actually did what it was intending to do 👀

tmt/utils.py Outdated Show resolved Hide resolved
tmt/utils.py Outdated Show resolved Hide resolved
tmt/utils.py Outdated Show resolved Hide resolved
tmt/utils.py Outdated Show resolved Hide resolved
@LecrisUT LecrisUT changed the title [WIP] Allow general Traversable Allow general Traversable May 20, 2024
@LecrisUT LecrisUT marked this pull request as ready for review May 20, 2024 19:35
@LecrisUT LecrisUT force-pushed the feat/plugin_schemas branch 2 times, most recently from 29e25f4 to 9357658 Compare May 20, 2024 20:31
@LecrisUT LecrisUT changed the title Allow general Traversable Extend resource_files general Traversable May 20, 2024
@LecrisUT LecrisUT changed the title Extend resource_files general Traversable Extend resource_files to general Traversables May 20, 2024
@LecrisUT LecrisUT force-pushed the feat/plugin_schemas branch 6 times, most recently from 64f66db to dffa489 Compare May 24, 2024 17:07
@LecrisUT
Copy link
Contributor Author

Fixed the typing issues, but there is still one left:

found 0 vulnerabilities
/home/runner/work/tmt/tmt/tmt/steps/report/html.py
  /home/runner/work/tmt/tmt/tmt/steps/report/html.py:12:1 - error: Type of "HTML_TEMPLATE_PATH" is unknown (reportUnknownVariableType)
  /home/runner/work/tmt/tmt/tmt/steps/report/html.py:103:17 - error: Argument type is unknown
    Argument corresponds to parameter "template_filepath" in function "render_template_file" (reportUnknownArgumentType)
/home/runner/work/tmt/tmt/tmt/templates/__init__.py
  /home/runner/work/tmt/tmt/tmt/templates/__init__.py:83:9 - error: Type of "templates_dir" is unknown (reportUnknownVariableType)
  /home/runner/work/tmt/tmt/tmt/templates/__init__.py:84:36 - error: Argument type is unknown
    Argument corresponds to parameter "root_dir" in function "_get_templates" (reportUnknownArgumentType)
4 errors, 0 warnings, 0 informations

Dunno why that one is failing. I've tried with higher pyright version, __future__.annotations, etc. nothing seems to make it work :/

@LecrisUT LecrisUT force-pushed the feat/plugin_schemas branch 2 times, most recently from b3b82d1 to 3fa7e15 Compare May 28, 2024 12:21
@happz happz added this to the 1.35 milestone Jun 5, 2024
@@ -532,7 +532,7 @@ def prepare_scripts(self, guest: "tmt.steps.provision.Guest") -> None:
""" Prepare additional scripts for testing """
# Install all scripts on guest
for script in self.scripts:
source = SCRIPTS_SRC_DIR / script.path.name
source = Path(str(SCRIPTS_SRC_DIR / script.path.name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need some help to understand the various effects of these changes, bear with me, please.

IIUIC, SCRIPTS_SRC_DIR is no longer Path instance, but MultiplexedPath ("If the final path is a file, go ahead and convert it to a regular Path, otherwise keep it as a traversable in order to properly support MultiplexedPaths." - and SCRIPTS_SRC_DIR is definitely not pointing to a file. What effect does this multiplexed property have in loops like this one? tmt tries to construct a path from this now MultiplexedPath and a script file name, do I understand it correctly there might be multiple actual directories "hidden" behind the SCRIPTS_SRC_DIR? Shouldn't we somehow iterate over them to find the script? Maybe it's not present in all of those directories. Would it also make sense to support the addition of the actual scripts? Because without it, having an extra path lets me override known scripts, but I can't add new ones.

Copy link
Contributor Author

@LecrisUT LecrisUT Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the beauty of how MultiplexedPath works, it does all of that for you. I'm on the phone now so I can't make the directory tree, but let's consider we have a mp = MultiplexedPath(A, B).

  • if A / c and B / c both exist, then mp / c is a MultiplexedPath
  • if only one of A / c or B / c exists, mp / c is converted to the regular Path where it exists
  • if neither A / c nor B / c exists, mp / c is converted randomly to a regular Path to one of them (first one)

Now if mp / c points to a file than it will always be a Path, because MultiplexedPath errors if components are not directories. If there are multiple files, than it is illdefined by the definition of Traversable, but it failsafes to the first file.

Design wise we must ask plugin designers to play nice and don't create filestructures that can clash, usually by appropriately prefixing their classes, templates, schemas, etc.

Reference: cpython implementation: https://github.com/python/cpython/blob/main/Lib/importlib/resources/readers.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, need to bounce something passed you @happz. I can basically change this part to:

            source = SCRIPTS_SRC_DIR / script.path.name
            assert isinstance(source, pathlib.Path)

Or better yet add a proper check that it is a file and not a directory. But the issue is that we would need to convert pathlib.Path to tmt.utils.Path (tmt._compat.pathlib.Path). It's quite tricky because it can't be done at the SCRIPTS_SRC_DIR because that part can be a MultiplexedPath, and it has to be done at the endpoint. What interface do you think would be good to translate a Traversable/pathlib.Path to tmt.utils.Path?

What is puzzling me is the relative_to and in what context is it needed to override the original behavior there?

if filepath.exists():
return filepath
if filepath.is_file():
return Path(str(filepath))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen the str() call before, is it needed? Can we create Path() from, hmm, what exactly is filepath? I'd like to avoid having str() everywhere for the conversion, if it's needed and filepath type is one that would be commonly converted to Path, we could add this conversion to Path class itself.

Copy link
Contributor Author

@LecrisUT LecrisUT Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually just to apease the mypy. is_file should technically be TypeGuard[Path]. One issue of course is pathlib.Path vs tmt.utils.Path

Can we create Path() from, hmm, what exactly is filepath

Yes, but there was also a mypy complaint about pathlib.Path being used for tmt.utils.Path. maybe the constructor for the latter needs to be specialized a bit more

See:

Path(importlib.resources.files(package)) / path  # type: ignore[arg-type]

@martinhoyer
Copy link
Collaborator

/packit build

tmt/utils/__init__.py Outdated Show resolved Hide resolved
@thrix thrix self-requested a review August 1, 2024 12:56
Copy link
Collaborator

@thrix thrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -11,6 +11,10 @@ standard location under ``tmt/steps`` , from all directories
provided in the ``TMT_PLUGINS`` environment variable and from
``tmt.plugin`` entry point.

.. versionadded:: 1.35
You can use ``tmt.resources`` entry point to inject resource
files to be used for tmt, e.g. schemas or templates.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LecrisUT would it be possible to add some more documentation about this feature maybe? I.e. how to exactly use it for my plugin.

Also, according to some other comments, maybe some clashing is possible, so would it be possible to describe the best practices to prevent that?

Copy link
Contributor Author

@LecrisUT LecrisUT Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a section to it, wdyt? Feels like a plugin 101 with a step-by-step would be more appropriate, but that's a much bigger task. (This section is very hidden under the Code section...)

@martinhoyer
Copy link
Collaborator

martinhoyer commented Aug 1, 2024

I wonder why raising tmt.GeneralError is not handled correctly Command 'TMT_DEBUG=weird tmt plan show' (Expected 2, got 1)

something in https://github.com/LecrisUT/tmt/blob/9d58d710718689193b5e1b45cb78fa123fc8d298/tmt/utils/__init__.py#L7207 I guess


if not logger:
# Make sure there is a logger to report entry-point failures
logger = tmt.log.Logger.get_bootstrap_logger()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinhoyer one possible clue to the new test failures is this addition. Maybe it makes it fail much earlier and the exit code is not adjusted properly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@happz any ideas?

Copy link
Contributor Author

@LecrisUT LecrisUT Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have an approach. First try...except the Logger.create here:

tmt/tmt/log.py

Lines 856 to 859 in 63b1035

actual_logger = Logger._normalize_logger(logging.getLogger('_tmt_bootstrap'))
cls._bootstrap_logger = Logger.create(actual_logger=actual_logger)
cls._bootstrap_logger.add_console_handler()

If it fails, than pass a high verbose/debug level to the options. Second, invert the priority of env_var vs argument here:

tmt/tmt/log.py

Lines 626 to 638 in 63b1035

debug_level_from_global_envvar = _debug_level_from_global_envvar()
if debug_level_from_global_envvar not in (None, 0):
self.debug_level = debug_level_from_global_envvar
else:
debug_level_from_option = cast(Optional[int], actual_kwargs.get('debug', None))
if debug_level_from_option is None or debug_level_from_option == 0:
pass
else:
self.debug_level = debug_level_from_option

(this is the main thing that unblocks this issue). Then in the except block output the caught exception and continue.

Third, make sure to use envvar or equivalent. This might negate the need for the second point

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why would the creation of bootstrap logger fail? And if it does, shouldn't we rather stop & fix it ASAP?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stinks, I'm trying to find some answers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fail safely? I mean, clearly an invalid debug level was passed to tmt, shouldn't it fail then? If I try to debug the import process, ignoring the error would give me silence & proceed with no useful info emitted.

But the TMT_DEBUG will be caught later on regardless. Wrapping all imports looks weird to me. Let me make a commit with what I have in mind, because "fail safely" is not exactly true, stay tuned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fail safely? I mean, clearly an invalid debug level was passed to tmt, shouldn't it fail then? If I try to debug the import process, ignoring the error would give me silence & proceed with no useful info emitted.

But the TMT_DEBUG will be caught later on regardless. Wrapping all imports looks weird to me.

I can say the same about investing so much effort into not failing early :)

Here's a sample of moving TMT_DEBUG to click's envvar:

$ TMT_DEBUG=weird tmt plans show 
Usage: tmt [OPTIONS] COMMAND [ARGS]...
Try 'tmt --help' for help.

Error: Invalid value for '-d' / '--debug': 'weird' is not a valid integer range.

Didn't even needed to make a special check for it, clicks's count=true took care of it 🙂

I'm not sure... It does feel simple, I am only worried about the can of worms the verbosity and debug logging is in general, with all its inheritance of verbosity between commands and subcommand, being able to increase/decrease their own verbosity (tmt -vvvv run -vvv prepare -vv & co.). A small change like this may break logging behavior easily, I would rather do this kind of change in a standalone patch...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues about making it a standalone patch, see 0e47e56. I think it's way less invasive then you are afraid of, it (should) only cover the top-level option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, the commit makes it even less clear to me. The option now specifies envvar, fine, no problem with that. But os.getenv is still used, because the bootstrap logger now has its own small special parsing of debug level, so we still do check the validity of the value, it just happens in two places now, and I still don't see the eventual benefit: why all these changes instead of simply failing early? :) tmt was fed with an invalid debug level, why not just fail the first time we see it?

Given the changes, the necessary code now needs to cover also the mere import of tmt package, since even that can spawn a logger and hit the invalid debug level, that makes absolute sense, so why not cover this vector with the tested exception handling we already have, and quit as soon as we hit an error? I don't see the use case for ignoring the debug level, running all imports, with all possible side-effects they may have, and failing after that, telling me I get no debug input from import time because I made a mistake - why not telling me before doing the import dance? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why all these changes instead of simply failing early?

Because it would become a whack-a-mole game of making each import be guarded by a try...except.

But os.getenv is still used, because the bootstrap logger now has its own small special parsing of debug level, so we still do check the validity of the value, it just happens in two places now

Indeed, but do we even care if TMT_DEBUG is invalid for the bootstrap_logger. The only reason we are parsing the env-var is to enable the user to increase the logging early on as well. If it's an invalid value, we shouldn't care and let the later parts deal with it eventually. My reasoning for this is the very nature of the bootsrap logger:

    def get_bootstrap_logger(cls) -> 'Logger':
        """
        Create a logger designed for tmt startup time.

        .. warning::

            This logger has a **very** limited use case span, i.e.
            before tmt can digest its command-line options and create a
            proper logger. This happens inside :py:func:`tmt.cli.main`
            function, but there are some actions taken by tmt code
            before this function is called by Click, actions that need
            to emit logging messages. Using it anywhere outside of this
            brief time in tmt's runtime should be ruled out.
        """

So my reading of this is that it should be a safe logger (read its creation should not fail) that we can use outside of the cli and it should not interfere with any exceptions caught at later points.

so why not cover this vector with the tested exception handling we already have, and quit as soon as we hit an error

Partly I see an issue that it might mask other errors, e.g. in /tests/core/env, if the exit code is 1 then we have an "Unexpected failure" due to a misbehaving module, but if we catch all imports in tmt/__init__.py snippet, this error might be overlooked (ignore the rlAssertGrep there).


Tangent, maybe the exit codes should be revised. If we are hitting any of these uncaught Unexpected failures, than python interpreter would exit with exit code 1. Maybe we should reserve that for assert-like errors instead of There was a fail or warn identified, but no error.

@LecrisUT LecrisUT force-pushed the feat/plugin_schemas branch 3 times, most recently from 9950e38 to 04001ca Compare August 5, 2024 10:51
@martinhoyer
Copy link
Collaborator

martinhoyer commented Aug 6, 2024

/packit build

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Aug 6, 2024

@psss
Copy link
Collaborator

psss commented Aug 7, 2024

Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=2303229

Ok, moving to 1.36.

@psss psss modified the milestones: 1.35, 1.36 Aug 7, 2024
@LecrisUT
Copy link
Contributor Author

LecrisUT commented Aug 8, 2024

@martinhoyer
Copy link
Collaborator

Let's add it to next milestone once https://bugzilla.redhat.com/show_bug.cgi?id=2303229 is resolved.

@martinhoyer martinhoyer removed this from the 1.36 milestone Sep 3, 2024
@LecrisUT
Copy link
Contributor Author

LecrisUT commented Sep 17, 2024

Here is the tracking s.fp.o PR: https://src.fedoraproject.org/rpms/python-importlib-resources/pull-request/2 and its dependencies. Finished

Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
- `importlib.metadata.entry_points`: Backport selectable entry_points
- `importlib.readers.MultiplexedPath`: Before python3.12, it did not work properly on navigating folders, only on file end-points

Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Also make the bootstrap_logger failsafe

Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Comment on lines +202 to +209
# TODO: Check if PackageLoader would work instead
# Note: the issue here is that jinja passes the paths through `os.fspath` which breaks the
# MultiplexedPath. This can be resolved by using `as_files` to create a context and copy
# all files to a real data folder, or jinja could learn to support generic Traversable
if isinstance(DEFAULT_TEMPLATE_DIR, MultiplexedPath):
phase.warn("Jinja template extension for report/junit/templates is not supported yet.")
environment.loader = FileSystemLoader(
searchpath=tmt.utils.resource_files(DEFAULT_TEMPLATE_DIR))
searchpath=str(DEFAULT_TEMPLATE_DIR))
Copy link
Contributor Author

@LecrisUT LecrisUT Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp, this is a nuanced place where this can break. Basically jinja does not support MultiplexedPath (opened an issue: pallets/jinja#2035). I've looked a bit at the implementation and it seems it could support quite easily by just using pathlib/Traversable interface.

Alternatively this can be solved on the tmt side by using the importlib.resources.as_file context which would copy all files under the package path to a native Path structure. It is really not ideal and would need more refactoring of how to work with tmt.utils.resource_files.

For now this could be a very niche breakage, so maybe not worth supporting right now. Probably better to make it error here, but I'm not sure how to do that. Just raise tmt.utils.ReportError?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci | full test Pull request is ready for the full test execution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants