Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: support the nbands autoset for monomer jy case #97

Merged
merged 4 commits into from
Oct 7, 2024

Conversation

kirk0830
Copy link
Owner

@kirk0830 kirk0830 commented Oct 7, 2024

In ABACUS-ORBGEN, the introduction of spill_guess atomic requires an implicit dft calculation on monomer (single atom). Most parameters can be shared, such as ecutwfc, ecutrho, etc, but the nbands, lmaxmax should be set with much care.

The lmaxmax is the maximal angular momentum (among all atom types, but in present orbital generation design this is not considered). For PW, each planewave can have components with angular momentum from 0 to infinity, so an explicit lmaxmax is needed to determine the range of information that extract from PW calculation. So in PW case, the lmaxmax should be the maximal of all orbitals required to generate.

For isolated atom, partially according to Hund's rule, it is possible to get a rough estimation of number of bands that can include significant component with lmaxmax angular momentum, which is done by a function called cal_nbands_fill_lmax function. So now the nbands and lmaxmax both can have their values.

However, for jy, the logic is somewhat inverted. Instead of inferring lmaxmax from zeta_notation that user defined what orbital wants to generate, now one should always give a lmaxmax value for each reference structure to calculate because lmaxmax is required in jy basis preparation. Then in the range of bands calculated for each structure, a range of bands will be used to generate orbitals. These orbitals (or say radial functions) will have different angular momentums and should get good initial guess from atomic calculation. In the "onion" way to generate orbitals, radial 1 may refer to band 1st, 2nd and 3rd, then with the same angluar momentum the radial 2 will refer to band other than 1, 2 and 3. The number of bands to calculate in monomer case should avoid the case in which all the bands are depleted.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new JSON configuration file for computational parameters, enhancing user customization.
    • Added a validation function to ensure user settings are correct before execution.
  • Bug Fixes

    • Improved error handling and assertions in various functions to prevent runtime errors and clarify debugging information.
  • Documentation

    • Updated comments and change logs to better explain recent modifications and their purposes.
  • Refactor

    • Enhanced code structure for improved clarity and maintainability.
  • Configuration Changes

    • Modified existing JSON configuration to reflect updated basis type settings.

Copy link

coderabbitai bot commented Oct 7, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The pull request introduces modifications to several files, enhancing the functionality of input reading and validation processes. Key changes include the addition of a default fitting basis in the read_siab_plaintext function, improvements in path handling for pseudo_dir, and the introduction of a new validation function in read_input.py. Additionally, error handling and assertions are refined in api.py, along with expanded unit tests. New and modified JSON configuration files are also included to support these enhancements.

Changes

File Change Summary
SIAB/io/read_input.py - Added default key-value pair "fit_basis": "pw" in read_siab_plaintext.
- Updated read function to handle pseudo_dir paths starting with ~.
- Introduced _validate_param function for validating user settings.
SIAB/spillage/api.py - Enhanced error handling in _coef_guess and _nzeta_infer functions.
- Added assertions in _orb_matrices to ensure data integrity.
- Expanded unit tests for new error messages and assertions.
examples/jy.json - Introduced a new JSON configuration file with parameters for a computational environment, including MPI command, pseudopotential file details, and optimization settings.
examples/pw.json - Modified key from "basis_type": "jy" to "fit_basis": "pw" in the JSON configuration file.

Possibly related PRs

🐰 In the meadow, where the bunnies play,
New changes hop in, brightening the day!
With paths that expand, and settings so neat,
Our code now dances, a rhythmic beat.
Validation's here, to catch any fuss,
In our joyful burrow, we trust and we trust! 🌼✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (6)
examples/jy.json (5)

1-4: Clarify the purpose of the "environment" field and consider parameterizing the MPI command.

The "environment" field is currently empty. Please clarify its intended use or remove it if it's not needed.

Additionally, consider making the number of MPI processes a configurable parameter rather than hardcoding it to 8. This would allow for easier adjustment based on the available computational resources.


6-11: Consider using relative paths for pseudopotentials and document parameter choices.

The pseudopotential directory is currently specified as an absolute path, which may not be portable across different systems. Consider using a relative path or an environment variable to make it more flexible.

Additionally, it would be helpful to add comments explaining the chosen values for ecutwfc, bessel_nao_smooth, bessel_nao_rcut, and smearing_sigma. This documentation would aid users in understanding the rationale behind these selections and how they might need to be adjusted for different scenarios.


13-18: LGTM! Consider adding documentation for less obvious parameters.

The optimization settings align well with the PR objectives, particularly the use of "jy" as the fitting basis and "atomic" for the spill_guess. The "reduced" jY_type is also consistent with the description.

To improve clarity, consider adding comments or documentation for the less obvious parameters such as max_steps and nthreads_rcut. This would help users understand their impact on the optimization process.


20-35: LGTM! Consider documenting the rationale for chosen parameters.

The reference systems (dimer and trimer) are well-defined with appropriate parameters. The increasing number of bands from dimer to trimer and the consistent lmaxmax value of 2 align well with the PR objectives.

To enhance understanding and maintainability:

  1. Consider adding comments explaining the rationale behind the chosen nbands values for each system.
  2. Document why these specific bond length ranges were selected for each system.

This additional context would be valuable for future users or modifications of this configuration.


36-55: LGTM! Clarify the meaning and impact of orb_ref values.

The orbital configurations are well-structured and align with the PR objectives, particularly the "onion" approach mentioned. The progression in nbands_ref values (4, 10, 20) across configurations suggests an increasing level of complexity or accuracy.

To improve clarity:

  1. Consider adding comments to explain the significance of the different orb_ref values (none, 0, 1) and how they relate to the "onion" approach for orbital generation.
  2. Document how the nbands_ref values were chosen and their impact on the orbital generation process.

This additional information would help users understand the orbital generation strategy and its implications.

SIAB/spillage/api.py (1)

696-698: Improve clarity of the error message

The error message in the assertion can be rephrased for better understanding. Consider rewording it to clearly convey that the requested number of bands exceeds the calculated bands.

Apply this diff to improve the error message:

-        f"ERROR: number of bands for orbgen is larger than calculated: {nband} > {wfc.shape[1]}"
+        f"ERROR: The requested number of bands ({nband}) exceeds the number of calculated bands ({wfc.shape[1]})."
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e08ab10 and 1c2c83d.

📒 Files selected for processing (4)
  • SIAB/io/read_input.py (5 hunks)
  • SIAB/spillage/api.py (2 hunks)
  • examples/jy.json (1 hunks)
  • examples/pw.json (1 hunks)
🧰 Additional context used
🪛 Ruff
SIAB/io/read_input.py

584-584: Loop control variable iorb not used within loop body

(B007)

🔇 Additional comments (5)
examples/pw.json (2)

15-15: Significant change in basis type - please clarify

The change from "basis_type": "jy" to "fit_basis": "pw" is a substantial modification that could significantly impact the simulation setup. This switch from jy (presumably Jastrow factor) to pw (plane wave) basis might affect various aspects of the calculation.

Could you please clarify:

  1. Is this change intentional and aligned with the PR objectives?
  2. How does this relate to the stated goal of supporting nbands autoset for the monomer jy case?
  3. What are the implications of this change on other parameters in the configuration?

Additionally, it would be helpful to run the following script to check for any other occurrences of 'basis_type' or 'fit_basis' in the codebase:

#!/bin/bash
# Search for 'basis_type' and 'fit_basis' in all files
echo "Occurrences of 'basis_type':"
rg --type json 'basis_type'
echo "Occurrences of 'fit_basis':"
rg --type json 'fit_basis'

This will help ensure consistency across the project and identify any potential areas that might need updating due to this change.


Line range hint 1-62: Verify configuration compatibility with new basis type

While the main change is in the basis type, it's crucial to ensure that the rest of the configuration remains compatible with the new "pw" (plane wave) basis. Please review the following aspects:

  1. Are the ecutwfc, bessel_nao_smooth, and bessel_nao_rcut parameters still relevant and correctly set for a plane wave basis?
  2. Does the spill_guess parameter need any adjustment given the change in basis type?
  3. Are the reference_systems and orbitals configurations still appropriate for a plane wave calculation?

To assist in this verification, you might want to run a script to check for any parameters that might need adjustment:

This will help identify any parameters that might need reconsideration given the change in basis type.

✅ Verification successful

Investigate usage of jY_type parameter

The jY_type parameter appears in the configuration and might be specific to the previous basis type. It's important to verify whether this parameter is still relevant and compatible with the new "pw" basis.

To facilitate this, please run the following script to identify any references to jY_type in the codebase:


Configuration Compatibility Confirmed

The jY_type parameter is utilized across multiple parts of the codebase and is not exclusive to the "jy" basis. Its presence in the examples/pw.json is appropriate and compatible with the new "pw" (plane wave) basis.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# List all keys in the JSON file
echo "All keys in the JSON file:"
jq 'keys' examples/pw.json

# Check for any keys that might be specific to jy basis
echo "Keys potentially specific to jy basis:"
jq 'keys[] | select(. | test("jy|jastrow|basis_type"; "i"))' examples/pw.json

Length of output: 615


Script:

#!/bin/bash
# Search for references to 'jY_type' in the codebase
rg 'jY_type' .

Length of output: 948

examples/jy.json (1)

1-56: Overall, the configuration is well-structured and aligns with PR objectives.

The jy.json configuration file provides a comprehensive set of parameters for the jy case, including environment settings, optimization parameters, reference systems, and orbital configurations. The structure and values align well with the PR objectives, particularly in supporting the nbands autoset for monomer jy cases.

To further enhance this configuration:

  1. Consider adding a top-level comment explaining the purpose and usage of this configuration file.
  2. Where possible, use relative paths or environment variables for better portability.
  3. Add inline comments or documentation for less obvious parameters to improve maintainability.

These improvements will make the configuration more robust and easier for other developers to understand and modify in the future.

SIAB/io/read_input.py (2)

16-16: Initialize 'result' with default 'fit_basis'

Initializing the result dictionary with {"fit_basis": "pw"} ensures that a default fitting basis is set, which enhances the robustness of the read_siab_plaintext function.


613-614: Validate parameters before parsing

Calling _validate_param(user_settings) at the beginning of the parse function ensures that the user settings are validated before proceeding. This enhancement improves the robustness of the parsing process.

SIAB/spillage/api.py Show resolved Hide resolved
SIAB/io/read_input.py Outdated Show resolved Hide resolved
SIAB/io/read_input.py Outdated Show resolved Hide resolved
SIAB/io/read_input.py Show resolved Hide resolved
SIAB/io/read_input.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
SIAB/io/read_input.py (4)

16-16: LGTM! Consider adding a comment for clarity.

The addition of a default "fit_basis" key with the value "pw" is a good improvement. It provides a default configuration that can be overridden if needed.

Consider adding a brief comment explaining the purpose of this default value, such as:

# Set default fitting basis to plane wave (pw)
result = {"fit_basis": "pw"}

51-59: LGTM! Consider simplifying the path handling.

The improved handling of the pseudo_dir path is excellent. It correctly manages paths starting with "~" and ensures all paths are absolute.

You can simplify this code using os.path.expanduser and os.path.abspath together:

pseudo_dir = os.path.abspath(os.path.expanduser(result["pseudo_dir"]))
print(f"NOTE: Redirecting `pseudo_dir` to {pseudo_dir}", flush=True)
result["pseudo_dir"] = pseudo_dir

This approach handles both "~" paths and relative paths in one step, reducing the code complexity.


574-612: Good addition of input validation. Consider some improvements.

The new _validate_param function is a valuable addition that will help catch configuration errors early.

Consider the following improvements:

  1. Use custom exceptions instead of assertions for better error handling in production:
class InvalidShapeError(ValueError):
    pass

class InsufficientNbandsError(ValueError):
    pass

# Then in the function:
if not isinstance(s, (str, int)):
    raise InvalidShapeError(f"Shape {s} is not a valid shape")

if orb["nbands_ref"] > user_settings["reference_systems"][shape2index[s]]["nbands"]:
    raise InsufficientNbandsError(f"nbands_ref for orbital {iorb} is larger than the number of bands set for reference system `{s}`")
  1. Consider returning a boolean instead of using assertions, allowing the caller to decide how to handle validation failures.

  2. Add type hints to improve code readability and enable static type checking:

def _validate_param(user_settings: dict) -> bool:
    # ... function body ...
    return True  # if all validations pass

These changes will make the function more robust and easier to use in different contexts.

🧰 Tools
🪛 Ruff

588-588: Loop control variable iorb not used within loop body

(B007)


301-310: LGTM! Consider using integer division for nbands_shapemax.

The addition of specific handling for the 'jy' case in fit_basis is a good improvement. It provides more flexibility in setting the number of bands for the monomer case.

In the calculation of nbands_shapemax, consider using integer division (//) instead of regular division (/) to ensure the result is always an integer:

nbands_shapemax = max([i["nbands"] // natom_from_shape(j["shape"]) + 20
                       for i, j in zip(result, refsys)])

This will prevent any potential issues with floating-point numbers being used for the number of bands.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1c2c83d and 94ffb6c.

📒 Files selected for processing (1)
  • SIAB/io/read_input.py (5 hunks)
🧰 Additional context used
🪛 Ruff
SIAB/io/read_input.py

588-588: Loop control variable iorb not used within loop body

(B007)

🔇 Additional comments (2)
SIAB/io/read_input.py (2)

617-618: LGTM! Good addition of input validation.

Adding the call to _validate_param at the beginning of the parse function is an excellent improvement. This ensures that input parameters are validated before any processing occurs, which can prevent errors and improve the robustness of the code.


286-299: Excellent addition of explanatory comments!

The new block of comments provides valuable context about the changes introduced in ABACUS-ORBGEN v3.0, particularly regarding the inference of nzeta and the handling of the monomer case. These comments significantly improve the code's maintainability and readability.

The detailed explanation of the different approaches for setting the number of bands in the monomer case is particularly helpful for understanding the code's behavior.

SIAB/io/read_input.py Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
SIAB/spillage/api.py (2)

389-399: Improved error handling, but minor enhancement possible

The additions to the error handling are good improvements. The detailed error message will help developers understand and debug potential issues more easily.

However, to fully address the previous review comment about potential IndexError, consider modifying line 389 as follows:

-assert len(ib) > 0, "ERROR: no band index found"
+assert ib and len(ib[0]) > 0, "ERROR: no band index found"

This change ensures that ib is not empty before accessing ib[0], fully addressing the potential IndexError.

🧰 Tools
🪛 Ruff

390-390: Ambiguous variable name: l

(E741)


390-390: Consider clarifying variable name 'l'

While 'l' is commonly used to represent angular momentum in physics and chemistry calculations, it can be ambiguous in certain contexts or fonts. To improve code readability, consider using a more descriptive name such as angular_momentum or adding a comment to clarify its meaning.

-for l, (nz, nz0) in enumerate(zip(nzeta, excluded))]
+for l, (nz, nz0) in enumerate(zip(nzeta, excluded))]  # l represents angular momentum
🧰 Tools
🪛 Ruff

390-390: Ambiguous variable name: l

(E741)

SIAB/io/read_input.py (2)

279-305: Approved: Enhanced handling of nbands for monomer case

The new implementation for setting nbands in the monomer case is a significant improvement. It uses a mixed approach, calculating both nbands_lmax and nbands_shapemax to determine the appropriate number of bands. The distinction between "pw" and "jy" fit basis types is well-handled, and the informative print statements are helpful for understanding the autoset process.

One minor suggestion for improvement:

Consider extracting the calculation of nbands_shapemax into a separate function for better readability and potential reuse. For example:

def calculate_nbands_shapemax(result, refsys):
    return max([i["nbands"]//natom_from_shape(j["shape"]) + 20
                for i, j in zip(result, refsys)])

# Then in the main function:
nbands_shapemax = calculate_nbands_shapemax(result, refsys)

This would make the main function slightly cleaner and easier to read.


569-607: Approved: Good addition of input parameter validation

The new _validate_param function is a valuable addition that enhances the robustness of the code by validating input parameters. It checks for the validity of shapes assigned to orbitals and ensures that the number of bands set for reference systems is sufficient for fitting orbitals.

Suggestion for improvement:

Consider using custom exceptions instead of assertions for better error handling and user feedback. For example:

class InvalidShapeError(ValueError):
    pass

class InsufficientBandsError(ValueError):
    pass

def _validate_param(user_settings: dict):
    # ... existing code ...
    for s in shape:
        if not isinstance(s, (str, int)):
            raise InvalidShapeError(f"Shape {s} is not a valid shape")
        if isinstance(s, str) and s not in shape2index:
            raise InvalidShapeError(f"Shape {s} is not found in reference systems")
    
    # ... existing code ...
    if orb["nbands_ref"] > user_settings["reference_systems"][shape2index[s]]["nbands"]:
        raise InsufficientBandsError(f"ERROR: `nbands_ref` for orbital {iorb} is larger than the number of bands set for reference system `{s}`")
    
    # ... rest of the function ...

This approach would provide more informative error messages and allow for more flexible error handling in the calling code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 94ffb6c and 4448fc4.

📒 Files selected for processing (2)
  • SIAB/io/read_input.py (6 hunks)
  • SIAB/spillage/api.py (2 hunks)
🧰 Additional context used
📓 Learnings (1)
SIAB/io/read_input.py (2)
Learnt from: kirk0830
PR: kirk0830/ABACUS-ORBGEN#97
File: SIAB/io/read_input.py:504-0
Timestamp: 2024-10-07T14:16:14.290Z
Learning: In the `skip_ppread` function, returning `False` when the 'element' key is missing in `user_settings` is intentional, as it indicates that the pseudopotential read-in cannot be skipped.
Learnt from: kirk0830
PR: kirk0830/ABACUS-ORBGEN#97
File: SIAB/io/read_input.py:504-508
Timestamp: 2024-10-07T13:44:50.084Z
Learning: In the `skip_ppread` function in `SIAB/io/read_input.py`, returning `False` when the 'element' key is missing in `user_settings` is intentional, as it indicates that the pseudopotential read-in cannot be skipped.
🪛 Ruff
SIAB/spillage/api.py

390-390: Ambiguous variable name: l

(E741)

🔇 Additional comments (5)
SIAB/spillage/api.py (1)

697-699: Good addition for input validation

The new assertion is a valuable addition to the _nzeta_infer function. It ensures that the number of bands for orbital generation doesn't exceed the number of calculated bands, preventing potential runtime errors. The error message is clear and informative, which will help developers quickly identify and resolve issues.

SIAB/io/read_input.py (4)

16-16: Approved: Good addition of a default value

The addition of a default value for "fit_basis" is a good practice. It provides a fallback option if not specified in the input file, improving the robustness of the function.


52-54: Approved: Improved path handling for pseudo_dir

The new implementation for handling the pseudo_dir path is more robust and versatile. By using os.path.abspath(os.path.expanduser(result["pseudo_dir"])), it correctly handles paths starting with a tilde (~) and ensures the path is absolute. This change resolves the previous limitation where the tilde symbol couldn't be understood by os.path.abspath.


612-613: Approved: Good addition of input validation

The addition of the _validate_param(user_settings) call at the beginning of the parse function is a good practice. It ensures that the input parameters are validated early in the process, which can help catch potential issues before they cause problems in later stages of the code. This improves the overall robustness and reliability of the parsing process.


Line range hint 1-698: Overall: Solid improvements to input handling and validation

This review has covered several significant improvements to the SIAB/io/read_input.py file:

  1. Enhanced path handling for pseudo_dir using os.path.abspath and os.path.expanduser.
  2. Improved calculation of nbands for the monomer case in the abacus_settings function.
  3. Addition of a new _validate_param function for input parameter validation.
  4. Integration of the validation step in the parse function.

These changes collectively enhance the robustness, flexibility, and error-handling capabilities of the input processing. The code is now better equipped to handle various input scenarios and provide more informative feedback in case of issues.

While some minor suggestions for improvement were made (such as function extraction and custom exception handling), the overall changes are well-implemented and significantly improve the quality of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant