-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: fine tune stage 2, support the jy-space truncation loss estimation on fixed nzeta #103
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request involve significant refactoring and enhancements to functions within the Changes
Assessment against linked issues
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (10)
SIAB/spillage/jy_expmt.py (3)
1-2
: Update module docstring for clarity.The module-level docstring could be improved for better readability and to follow standard conventions.
Consider reformatting the docstring:
""" All functions in this module are not compatible with ABACUS PW calculation. ABACUS PW is deprecated in abacus-orbgen v3.0. """
9-9
: Consider adding a blank line after imports for readability.Adding a blank line after the import statements improves readability.
Apply this diff:
import os + def _grpbnd_lnm(folder, count_thr = 1e-1, itype = 0):
183-183
: Handle specific exception types intest_coef_init
.Catching general exceptions can mask other issues. Specify the exception type expected.
Example modification:
with self.assertRaises(ValueError): _coef_init(fpath, [2, 3, 5])Ensure that the exception raised is indeed a
ValueError
.SIAB/spillage/lcao_wfc_analysis.py (6)
42-42
: Simplifykwargs.get('filter', None)
tokwargs.get('filter')
The default value of
None
is implied when usingkwargs.get('filter')
, so specifying it explicitly is unnecessary.Apply this diff to simplify the code:
- filter = kwargs.get('filter', None) + filter = kwargs.get('filter')🧰 Tools
🪛 Ruff
42-42: Use
kwargs.get('filter')
instead ofkwargs.get('filter', None)
Replace
kwargs.get('filter', None)
withkwargs.get('filter')
(SIM910)
95-95
: Avoid usingfilter
as a variable name to prevent shadowing built-in functionUsing
filter
as a parameter name shadows the built-in functionfilter
, which can lead to unexpected bugs or confusion. Consider renaming it to something more descriptive, likefilter_threshold
.Apply this diff to rename the parameter in both functions:
def _svdmax(C, S, nband, natom, nzeta, - filter = None): + filter_threshold = None): def _svdfold(C, S, nband, natom, nzeta, - filter = None): + filter_threshold = None):Also, update all occurrences of
filter
within these functions tofilter_threshold
.Also applies to: 197-197
149-149
: Consider renaming variablel
to avoid ambiguityUsing single-character variable names like
l
can be confused with the number1
or the capital letterI
. To enhance code readability, consider renamingl
to a more descriptive name such asangular_momentum
orl_value
.Also applies to: 153-153, 161-161, 184-184, 266-266, 286-286, 405-405
🧰 Tools
🪛 Ruff
149-149: Ambiguous variable name:
l
(E741)
389-389
: Add@unittest.skip
decorator to unused test functionsThe test functions
test_svdmax
andtest_svdfold
lack the@unittest.skip
decorator, which is used in other test functions to indicate that they are not ready for execution. If these tests are not yet implemented or are temporarily disabled, consider adding the decorator with a reason to maintain consistency.Apply this diff to add the decorators:
+ @unittest.skip('Not implemented yet') def test_svdmax(self): here = os.path.dirname(os.path.abspath(__file__)) outdir = os.path.join(here, 'testfiles/Si/jy-7au/monomer-k/OUT.ABACUS/') wfc = read_wfc_lcao_txt(outdir + 'WFC_NAO_K2.txt')[0] S = read_triu(outdir + 'data-1-S') dat = read_running_scf_log(outdir + 'running_scf.log') sigma, nz, loss = _svdmax(wfc, S, 'all', dat['natom'], dat['nzeta']) # Existing test code + @unittest.skip('Not implemented yet') def test_svdfold(self): here = os.path.dirname(os.path.abspath(__file__)) outdir = os.path.join(here, 'testfiles/Si/jy-7au/monomer-gamma/OUT.ABACUS/') wfc = read_wfc_lcao_txt(outdir + 'WFC_NAO_GAMMA1.txt')[0] S = read_triu(outdir + 'data-0-S') dat = read_running_scf_log(outdir + 'running_scf.log') sigma, nz, loss = _svdfold(wfc, S, 'all', dat['natom'], dat['nzeta']) # Existing test codeAlso applies to: 409-409
95-99
: Clarify the docstring in_svdmax
functionThere are some grammatical errors and ambiguity in the docstring which might hinder understanding. Consider rephrasing for clarity.
Suggested rephrased docstring excerpt:
'''Perform SVD analysis on submatrices extracted from - the whole wavefunction. The submatrix is certainly indexed by [it][l][iat][m] - , then taking maximum over both [iat] and [m] dimensions, yielding - the maximal singular value of each zeta function of each l for each atomtype. + the whole wavefunction. The submatrix is indexed by [it][l][iat][m]. + By taking the maximum over both [iat] and [m] dimensions, we obtain + the maximal singular value of each zeta function for each angular momentum `l` and atom type. ``` --- `115-124`: **Improve the parameter description for `filter`** The description of the `filter` parameter is somewhat convoluted and might be confusing to users. Consider reformatting it for better readability and clarity. Suggested reformatting: ```diff filter : float | list[float] | list[list[int]] | None, optional - if specified as list[float], will be the threshold for filtering - singular values for each atom type. All values larger than it - will be treated as significant, omit otherwise. - if specified as float, will be the threshold for all atom types. - if specified as list[list[int]], will be the number of singular - values for each l to be kept. - if specified as None (default, thus not specified), will not do any - filtering. - **For this method, the maximal value of sigma will be 1.** + Determines the filtering criteria for singular values: + - If a float is provided, it is used as the threshold for all atom types. Singular values larger than this threshold are considered significant. + - If a list of floats is provided, each float in the list is used as the threshold for each atom type. + - If a list of lists of ints is provided, it specifies the number of singular values to keep for each `l` for each atom type. + - If `None` (default), no filtering is applied. + + **Note:** For this method, the maximal value of sigma will be 1.Also applies to: 219-229
SIAB/spillage/api.py (1)
Line range hint
596-617
: Rename parameterfilter
to avoid shadowing built-in function.The parameter
filter
in_nzeta_infer
shadows Python's built-infilter()
function. To prevent potential conflicts or confusion, consider renaming it tofilter_value
orthreshold
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- SIAB/spillage/api.py (24 hunks)
- SIAB/spillage/jy_expmt.py (1 hunks)
- SIAB/spillage/lcao_wfc_analysis.py (9 hunks)
🧰 Additional context used
🪛 Ruff
SIAB/spillage/api.py
667-667: Ambiguous variable name:
l
(E741)
1309-1309: Ambiguous variable name:
l
(E741)
SIAB/spillage/jy_expmt.py
4-4:
SIAB.spillage.datparse.read_orb_mat
imported but unusedRemove unused import:
SIAB.spillage.datparse.read_orb_mat
(F401)
61-61: Ambiguous variable name:
l
(E741)
69-69: Ambiguous variable name:
l
(E741)
75-75: Ambiguous variable name:
l
(E741)
78-78: Ambiguous variable name:
l
(E741)
148-148: Ambiguous variable name:
l
(E741)
151-151: Ambiguous variable name:
l
(E741)
187-187: Ambiguous variable name:
l
(E741)
SIAB/spillage/lcao_wfc_analysis.py
42-42: Use
kwargs.get('filter')
instead ofkwargs.get('filter', None)
Replace
kwargs.get('filter', None)
withkwargs.get('filter')
(SIM910)
149-149: Ambiguous variable name:
l
(E741)
153-153: Ambiguous variable name:
l
(E741)
161-161: Ambiguous variable name:
l
(E741)
184-184: Ambiguous variable name:
l
(E741)
266-266: Ambiguous variable name:
l
(E741)
286-286: Ambiguous variable name:
l
(E741)
405-405: Ambiguous variable name:
l
(E741)
🔇 Additional comments (7)
SIAB/spillage/jy_expmt.py (1)
61-62
:⚠️ Potential issueAvoid using ambiguous variable name
l
.Using single-character variable names like
l
can be ambiguous and may lead to confusion, especially as it can be mistaken for the digit1
or the uppercaseI
. Consider renamingl
to a more descriptive name likeangular_momentum
for better readability.Example of renaming in a section of the code:
- for l, wl in enumerate(wlb): + for angular_momentum, wl in enumerate(wlb):This change should be applied consistently across the code where
l
is used as a variable.Also applies to: 69-70, 75-76, 78-78, 148-148, 151-151, 187-187
⛔ Skipped due to learnings
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-09-29T07:56:50.811Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-10-09T11:06:34.325Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
🧰 Tools
🪛 Ruff
61-61: Ambiguous variable name:
l
(E741)
SIAB/spillage/lcao_wfc_analysis.py (1)
186-189
: Ensure proper computation ofloss
in space truncationIn the computation of
loss
, verify that the indices and dimensions match correctly to avoid potential indexing errors or incorrect calculations, especially when slicing arrays based onnz[it][l]
.Run the following script to check the shapes and indices during
loss
computation:Also applies to: 288-288
✅ Verification successful
Let me gather more context about the array dimensions and SVD operations to better verify the indexing.
This will help us understand:
- How SVD is computed and used
- Array dimensions being handled
- How
nz
indices are determined and used in slicing
Let me gather one final piece of information about how
nz
is initialized and computed.This will help us understand:
- How
nz
is initialized and what it represents- The context of the function where the loss computation occurs
Based on the gathered information, I can now provide a final response:
Loss computation indices are correctly aligned and dimensionally consistent
The loss computation is properly implemented for space truncation because:
nz[it][l]
is correctly computed as the number of singular values above the filter threshold for each atom typeit
and angular momentuml
- The SVD decomposition using
la.svd(..., full_matrices=False)
ensuresu
,vh
dimensions match the singular values- The slicing operations
u[:, nz[it][l]:]
,diag(out[it][l][nz[it][l]:])
, andvh[nz[it][l]:, :]
are consistently applied to truncate the space beyond the significant singular values🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the dimensions and indices used in loss calculation. # Since we cannot run the actual code, we can check for potential issues by searching for `loss +=` lines. rg 'loss \+= .*' -A 3Length of output: 657
Script:
#!/bin/bash # Let's examine the SVD operations and array handling in the codebase # First, check the SVD computation and array handling rg "svd\(" -A 5 SIAB/spillage/lcao_wfc_analysis.py # Then, look for array shape/dimension related operations rg "\.shape" -A 2 SIAB/spillage/lcao_wfc_analysis.py # Also check how nz is computed/initialized rg "nz\[" -B 3 SIAB/spillage/lcao_wfc_analysis.pyLength of output: 2290
Script:
#!/bin/bash # Check how nz is initialized and computed rg -B 5 -A 5 "nz = " SIAB/spillage/lcao_wfc_analysis.py # Also check the function parameters and their types rg "def _svdfold" -A 10 SIAB/spillage/lcao_wfc_analysis.pyLength of output: 1630
SIAB/spillage/api.py (5)
113-118
: Updates to_coef_opt_jy
function are appropriate.The modifications to
_coef_opt_jy
correctly integrate the new parameters and streamline the nzeta inference process, enhancing functionality.
Line range hint
205-221
: New function_plan_guess
enhances modularity.The introduction of
_plan_guess
effectively encapsulates the initialization logic for guesses, improving code reuse and maintainability.
781-793
: Unit tests for_plan_guess
are comprehensive.The added unit tests thoroughly assess various scenarios for
_plan_guess
, ensuring its reliability and correctness.
667-667
:⚠️ Potential issueRename variable
l
to improve code readability.The use of
l
as a loop variable can be easily mistaken for the numeral1
, reducing clarity. Consider renamingl
to a more descriptive name likeangular_momentum
ororbital_l
.⛔ Skipped due to learnings
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-09-29T07:56:50.811Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-10-09T11:06:34.325Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
🧰 Tools
🪛 Ruff
667-667: Ambiguous variable name:
l
(E741)
1309-1309
:⚠️ Potential issueRename variable
l
to improve code readability.Using
l
as a variable name may cause confusion with the digit1
. It's advisable to use a more descriptive name such asangular_momentum
ororbital_l
for better readability.⛔ Skipped due to learnings
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-09-29T07:56:50.811Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#96 File: SIAB/spillage/api.py:783-783 Timestamp: 2024-10-09T11:06:34.325Z Learning: When `l` is used as a variable for angular momentum, do not flag it as ambiguous.
🧰 Tools
🪛 Ruff
1309-1309: Ambiguous variable name:
l
(E741)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (10)
SIAB/spillage/pytorch_swat/api.py (3)
Line range hint
64-81
: Document return value structure.The docstring provides excellent documentation of the
siab_settings
structure after initialization, but it's missing information about the function's return value. Consider adding a "Returns:" section to the docstring.
Line range hint
156-167
: Improve file handling in parallel execution.The current implementation has potential issues with file handling in parallel execution:
- File descriptors for stdout/stderr are not properly closed
- Multiple processes writing to the same log files could lead to race conditions
Consider using a context manager for file handling:
- sys.stdout = open("log.%d.txt"%ircut, "a+") - sys.stderr = open("err.%d.txt"%ircut, "a+") + with open(f"log.{ircut}.txt", "a+") as stdout_file, \ + open(f"err.{ircut}.txt", "a+") as stderr_file: + sys.stdout = stdout_file + sys.stderr = stderr_file + procs.append(proc) + proc.start() + sys.stdout = sys.__stdout__ + sys.stderr = sys.__stderr__ - procs.append(proc) - proc.start()
Process cleanup needs improvement for error handling and resource management
Based on the code analysis, there are several concerns with the current process management implementation:
- No error handling (
try-except
) blocks around process execution and joining- File handles for stdout/stderr redirection are not properly closed
- No process termination handling in case of failures
- Missing cleanup mechanism for processes in error scenarios
Recommended changes:
- Wrap process execution in try-except blocks
- Properly close file handles after use
- Add process termination handling for error cases
- Consider using context managers for resource management
Example structure:
try: with open(f"log.{ircut}.txt", "a+") as stdout_file, open(f"err.{ircut}.txt", "a+") as stderr_file: sys.stdout = stdout_file sys.stderr = stderr_file # process execution finally: sys.stdout = sys.__stdout__ sys.stderr = sys.__stderr__ # cleanup processes if needed🔗 Analysis chain
Line range hint
156-174
: Verify process cleanup in parallel execution.The parallel execution code should ensure proper cleanup of processes, especially in case of errors.
Let's verify the process handling:
Let me gather more context about error handling and process management in these files.
This will help us understand:
- If there's any error handling around the process management
- The complete context of process cleanup loops
- Whether there are any safety mechanisms in place for process termination
Let me check one more aspect - the complete context around the process management code to understand the full scope.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for potential process leaks or cleanup issues rg -A 5 "multiprocessing.Process" . rg "proc\.terminate|proc\.kill" .Length of output: 1129
Script:
#!/bin/bash # Check for try-except blocks and error handling around process management rg -B 3 -A 10 "try:" ./SIAB/spillage/pytorch_swat/api.py ./SIAB/spillage/pytorch_swat/parallelization.py # Look for the full context of process management ast-grep --pattern 'for $_ in procs: $$$'Length of output: 303
Script:
#!/bin/bash # Get broader context around process management rg -B 10 -A 10 "multiprocessing.Process" ./SIAB/spillage/pytorch_swat/api.py # Check if there's any exception handling in the file rg -B 5 -A 5 "except" ./SIAB/spillage/pytorch_swat/api.pyLength of output: 1382
SIAB/spillage/api.py (5)
Line range hint
234-301
: Consider improving variable names for better readability.While the function is well-documented, consider:
- Rename
iconfs_
toconfig_indices
for clarity- Rename
ibands_
toband_indices
for clarity- Consider grouping related parameters into a configuration object
Line range hint
543-572
: Add type hints for better code maintainability.Consider adding type hints to improve code maintainability and IDE support:
-def _nzeta_mean_conf(nbands, folders, statistics = 'max', pop = 'svd-fold', filter = 1.0): +def _nzeta_mean_conf( + nbands: Union[int, List[int]], + folders: List[str], + statistics: Literal['max', 'mean'] = 'max', + pop: Literal['svd-fold', 'svd-max'] = 'svd-fold', + filter: Union[float, List[float], List[List[int]], None] = 1.0 +) -> List[int]:
981-989
: Consider implementing skipped tests.The test cases for
_nzeta_infer
are well-structured, but some tests are skipped. Consider implementing these tests to ensure full coverage.Would you like me to help implement the skipped test cases or create an issue to track this?
1301-1323
: Document experimental code and add error handling.The new initial guess implementation should:
- Have detailed documentation explaining the approach
- Include error handling for invalid inputs
- Have corresponding test cases
Would you like help in documenting this experimental code or creating test cases?
🧰 Tools
🪛 Ruff
1309-1309: Ambiguous variable name:
l
(E741)
667-667
: Consider using more descriptive variable names.While 'l' is commonly used for angular momentum in physics, consider using more descriptive names like
angular_momentum
orl_quantum_number
for better code readability.Also applies to: 1309-1309
🧰 Tools
🪛 Ruff
667-667: Ambiguous variable name:
l
(E741)
SIAB/spillage/util.py (2)
Line range hint
25-28
: PotentialTypeError
due to iterating over an integerIn the assignment:
matmaps_ = orbital["folder"] if siab_settings.get("optimizer", "none") not in ["none", "restart"] else 0If
optimizer
is'none'
or'restart'
,matmaps_
is set to0
, an integer. Attempting to iterate overmatmaps_
in the following loop will raise aTypeError
:for i in matmaps_: orbital["folder"].extend(folders[i])Suggested Fix:
Replace
0
with an empty list[]
to ensurematmaps_
is always iterable:- matmaps_ = orbital["folder"] if siab_settings.get("optimizer", "none") not in ["none", "restart"] else 0 + matmaps_ = orbital["folder"] if siab_settings.get("optimizer", "none") not in ["none", "restart"] else []
69-69
: Use compatible type hints for broader Python version supportThe type hint syntax
int|str
is valid in Python 3.10 and above. For compatibility with earlier Python versions, useUnion[int, str]
from thetyping
module:def _spil_bnd_autoset(pattern: int|str, folder: str, occ_thr=5e-1, merge_sk='max'):Suggested Fix:
+ from typing import Union - def _spil_bnd_autoset(pattern: int|str, folder: str, occ_thr=5e-1, merge_sk='max'): + def _spil_bnd_autoset(pattern: Union[int, str], folder: str, occ_thr=5e-1, merge_sk='max'):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- SIAB/interface/abacus.py (1 hunks)
- SIAB/io/read_input.py (0 hunks)
- SIAB/spillage/api.py (26 hunks)
- SIAB/spillage/pytorch_swat/api.py (1 hunks)
- SIAB/spillage/util.py (2 hunks)
💤 Files with no reviewable changes (1)
- SIAB/io/read_input.py
🧰 Additional context used
🪛 Ruff
SIAB/spillage/api.py
667-667: Ambiguous variable name:
l
(E741)
1309-1309: Ambiguous variable name:
l
(E741)
🔇 Additional comments (4)
SIAB/spillage/api.py (2)
15-22
: LGTM! Import changes are well-organized.The new imports are properly organized and necessary for the added functionality.
Line range hint
205-233
: Verify function rename impact.The function rename from
_make_guess
to_plan_guess
is a good improvement in clarity. However, we should verify all callers have been updated.✅ Verification successful
Let me try a different approach to verify the function rename impact by searching for both the old and new function names, and also look for potential callers.
Function rename impact verified successfully
The search results show that:
- No occurrences of the old name
_make_guess
were found in the codebase- The new name
_plan_guess
is properly used in:
- Two caller locations in api.py where it's used to initialize the guess configuration
- Function definition itself
- Documentation references
- Test cases that verify its functionality
The rename appears to have been completely and correctly implemented across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining references to the old function name rg "_make_guess"Length of output: 16
Script:
#!/bin/bash # Search for both old and new function names echo "=== Searching for _make_guess ===" rg --type py "_make_guess" -A 2 -B 2 echo -e "\n=== Searching for _plan_guess ===" rg --type py "_plan_guess" -A 2 -B 2 # Look for potential callers by searching for 'guess' in Python files echo -e "\n=== Searching for potential callers ===" ast-grep --pattern 'def $_ ($$$) { $$$ guess $$$ }'Length of output: 2598
SIAB/interface/abacus.py (1)
745-745
: Optimization: Increased lattice constant improves isolationThe increase in lattice constant from 20.0 to 30.0 Bohr is a good optimization that will help reduce periodic image interactions and provide more space for atomic orbital expansion, particularly beneficial for accurate calculations of isolated molecular systems.
SIAB/spillage/util.py (1)
129-157
: Unit tests effectively cover_spil_bnd_autoset
functionThe
TestSpillageUtilities
class provides comprehensive tests for_spil_bnd_autoset
, validating its behavior with various inputs, including integers, expressions involving'occ'
and'all'
, and edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (4)
SIAB/spillage/util.py (3)
Line range hint
6-36
: Add input validation and error handling.The function lacks input validation and error handling for the dictionary operations. Consider:
- Validating input parameters
- Adding try-except blocks for dictionary access
- Documenting the expected structure of input parameters
Add validation at the start of the function:
def ptg_spilopt_params_from_dft(calculation_settings, siab_settings, folders): + if not isinstance(calculation_settings, dict) or not isinstance(siab_settings, dict): + raise TypeError("calculation_settings and siab_settings must be dictionaries") + if "orbitals" not in siab_settings: + raise KeyError("siab_settings must contain 'orbitals' key") + for orbital in siab_settings["orbitals"]: + if "folder" not in orbital: + raise KeyError("Each orbital must contain 'folder' key")
59-61
: Address FIXME comment with a concrete plan.The FIXME comment suggests making
nbands_ref
dependent onrcut
. Consider implementing this enhancement by:
- Adding a mapping between rcut values and corresponding band numbers
- Implementing validation for the rcut-dependent band selection
Would you like me to help create a GitHub issue to track this enhancement?
146-173
: Enhance test coverage with additional test cases.While the current tests cover basic functionality, consider adding:
- Edge cases (empty strings, whitespace)
- Complex expressions
- Different merge_sk strategies
- Parameterized tests for better maintainability
Example additional test cases:
def test_edge_cases(self): with self.assertRaises(ValueError): _spil_bnd_autoset('', self.outdir) with self.assertRaises(ValueError): _spil_bnd_autoset(' ', self.outdir) def test_merge_strategies(self): for strategy in ['min', 'max', 'mean']: result = _spil_bnd_autoset('occ', self.outdir, merge_sk=strategy) self.assertIsInstance(result, int)SIAB/spillage/api.py (1)
Line range hint
195-223
: Add type hints for better code maintainability.Consider adding type hints to improve code readability and enable better IDE support.
-def _plan_guess(nzeta, folder, jy = True, diagnosis = True): +def _plan_guess(nzeta: list[list[int]], folder: str, jy: bool = True, diagnosis: bool = True) -> dict:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- SIAB/spillage/api.py (27 hunks)
- SIAB/spillage/util.py (2 hunks)
🧰 Additional context used
🪛 Ruff
SIAB/spillage/api.py
652-652: Ambiguous variable name:
l
(E741)
1294-1294: Ambiguous variable name:
l
(E741)
🔇 Additional comments (2)
SIAB/spillage/util.py (2)
47-53
: Previous IndexError concern still applies.The code at line 50 has the same IndexError vulnerability mentioned in past reviews. Additionally, the comment about element extraction not being elegant suggests room for improvement.
144-144
: Security vulnerability with eval() persists.The use of eval() for pattern evaluation remains a security risk, as noted in past reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (9)
SIAB/driver/front.py (1)
46-50
: LGTM! Consider enhancing the error message.The validation check for preventing the deprecated combination of 'bfgs' optimizer with 'pw' basis is well-placed and implemented correctly.
Consider making the error message more helpful by suggesting the recommended alternative:
- raise ValueError("Generating orbitals from PW calculation for bfgs has been deprecated") + raise ValueError("Generating orbitals from PW calculation for bfgs has been deprecated. Please use 'jy' as the fit_basis instead.")SIAB/spillage/util.py (3)
Line range hint
7-37
: Consider adding type hints for better code maintainability.The function is well-documented but could benefit from type hints for parameters and return value.
- def ptg_spilopt_params_from_dft(calculation_settings, siab_settings, folders): + def ptg_spilopt_params_from_dft( + calculation_settings: dict, + siab_settings: dict, + folders: list + ) -> dict:
60-62
: Address the FIXME comment about rcut dependency.The comment indicates a potential enhancement for making
nbands_ref
dependent onrcut
. Consider creating an issue to track this improvement.Would you like me to create a GitHub issue to track this enhancement?
146-149
: Improve error handling by usingraise ... from err
.For better error traceability, use the
raise ... from err
pattern to preserve the error chain.try: return int(ast.literal_eval(pattern.replace('occ', str(nocc)).replace('all', str(nall)))) - except (ValueError, SyntaxError): - raise ValueError(f"nbands_ref {pattern} is not a valid expression.") + except (ValueError, SyntaxError) as err: + raise ValueError(f"nbands_ref {pattern} is not a valid expression.") from err🧰 Tools
🪛 Ruff
149-149: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
SIAB/spillage/lcao_wfc_analysis.py (2)
39-44
: Simplify kwargs.get() call.The default value of None in kwargs.get() is redundant since that's the default behavior.
- nz_filter = kwargs.get('nz_filter', None) + nz_filter = kwargs.get('nz_filter')🧰 Tools
🪛 Ruff
42-42: Use
kwargs.get('nz_filter')
instead ofkwargs.get('nz_filter', None)
Replace
kwargs.get('nz_filter', None)
withkwargs.get('nz_filter')
(SIM910)
Line range hint
90-135
: Consider extracting common loss estimation logic.Both
_svdmax
and_svdfold
contain similar code for loss estimation. Consider extracting this into a helper function to reduce code duplication.def _calculate_loss(Ct, nz, out): """Calculate the space truncation loss. Parameters ---------- Ct : list of arrays Transformed coefficients nz : list of list of int Number of zeta functions to keep out : list of arrays Singular values Returns ------- float The calculated loss """ loss = 0 for it in range(len(Ct)): for l in range(len(Ct[it])): u, s, vh = la.svd(Ct[it][l], full_matrices=False) loss += la.norm(u[:, nz[it][l]:] @ np.diag(s[nz[it][l]:]) @ vh[nz[it][l]:, :], 'fro')**2 return lossAlso applies to: 192-240
SIAB/spillage/api.py (2)
Line range hint
529-662
: Consider improving variable naming in the analysis functions.The functions
_nzeta_mean_conf
and_nzeta_infer
usel
as a variable name which is ambiguous. Consider using more descriptive names likeangular_momentum
orl_quantum_number
.- for l, s in enumerate(st_): + for angular_momentum, s in enumerate(st_):🧰 Tools
🪛 Ruff
653-653: Ambiguous variable name:
l
(E741)
Line range hint
1224-1309
: Experimental code needs cleanup.The
test_sigma_nzeta_nbands
function contains experimental code with hardcoded paths and values. Consider:
- Moving this to a separate experimental module
- Using configuration files for paths and parameters
- Adding proper error handling for file operations
Would you like me to propose a refactored version of this code?
🧰 Tools
🪛 Ruff
1295-1295: Ambiguous variable name:
l
(E741)
SIAB/spillage/jy_expmt.py (1)
163-163
: Implement spin handling in_coef_init
functionThe comment
# how to deal with the spin?...
indicates that spin handling is not yet implemented in the function. To ensure correct behavior for spin-polarized calculations, consider addressing spin indices when accessingibnd_lnm
and proceeding with coefficient initialization accordingly.Would you like assistance in implementing spin handling in this function?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- SIAB/driver/front.py (1 hunks)
- SIAB/spillage/api.py (27 hunks)
- SIAB/spillage/jy_expmt.py (1 hunks)
- SIAB/spillage/lcao_wfc_analysis.py (9 hunks)
- SIAB/spillage/util.py (2 hunks)
🧰 Additional context used
📓 Learnings (1)
SIAB/spillage/api.py (2)
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#103 File: SIAB/spillage/api.py:265-283 Timestamp: 2024-10-26T12:00:41.628Z Learning: In `SIAB/spillage/api.py`, the optimization loop in `_do_onion_opt` will not raise exceptions, so adding error handling is unnecessary.
Learnt from: kirk0830 PR: kirk0830/ABACUS-ORBGEN#103 File: SIAB/spillage/api.py:97-104 Timestamp: 2024-10-26T11:59:23.702Z Learning: In `SIAB/spillage/api.py`, the folder handling logic within `_coef_opt_jy` is intentionally kept inside that function because it's only executed there and should not be moved outside.
🪛 Ruff
SIAB/spillage/api.py
653-653: Ambiguous variable name:
l
(E741)
1295-1295: Ambiguous variable name:
l
(E741)
SIAB/spillage/jy_expmt.py
70-70: Ambiguous variable name:
l
(E741)
78-78: Ambiguous variable name:
l
(E741)
84-84: Ambiguous variable name:
l
(E741)
87-87: Ambiguous variable name:
l
(E741)
157-157: Ambiguous variable name:
l
(E741)
160-160: Ambiguous variable name:
l
(E741)
196-196: Ambiguous variable name:
l
(E741)
SIAB/spillage/lcao_wfc_analysis.py
42-42: Use
kwargs.get('nz_filter')
instead ofkwargs.get('nz_filter', None)
Replace
kwargs.get('nz_filter', None)
withkwargs.get('nz_filter')
(SIM910)
149-149: Ambiguous variable name:
l
(E741)
153-153: Ambiguous variable name:
l
(E741)
161-161: Ambiguous variable name:
l
(E741)
184-184: Ambiguous variable name:
l
(E741)
266-266: Ambiguous variable name:
l
(E741)
286-286: Ambiguous variable name:
l
(E741)
406-406: Ambiguous variable name:
l
(E741)
SIAB/spillage/util.py
149-149: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
🔇 Additional comments (11)
SIAB/driver/front.py (2)
52-56
: Verify the default optimizer value handling.The condition
user_settings.get("optimizer", "pytorch.SWAT") != "none"
uses "pytorch.SWAT" as the default, but this might not align with the default value used in theoptimizer
check above (line 47) which doesn't specify a default. This could lead to inconsistent behavior.Let's check for any other default values used for the optimizer:
✅ Verification successful
Based on the search results, I can now generate the final response:
The default optimizer value handling is consistent across the codebase
The code is correct as is. Here's why:
- In
SIAB/io/read_input.py
, the optimizer setting is read with "pytorch.SWAT" as default:"optimizer": user_settings.get("optimizer", "pytorch.SWAT")
- This same default value is consistently used in the condition being reviewed
- The
"none"
default seen in other files (likespillage/util.py
) is used in different contexts, specifically for internal processing after the initial settings are parsed🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for all places where optimizer settings are accessed rg -A 2 "optimizer.*=|get\(['\"]optimizer"Length of output: 3720
Line range hint
1-200
: Verify integration with nbands_ref fix.While these changes support the infrastructure refactoring, they don't directly address the bug with
nbands_ref
determination for degenerate cases (Issue #104). Please ensure that these changes properly integrate with the actual fix implemented elsewhere.Let's check for the actual implementation of the nbands_ref fix:
✅ Verification successful
Based on the search results, I can see that the
nbands_ref
handling has indeed been modified in the codebase. Let me search for more specific changes related to the degenerate cases mentioned in Issue #104:
Let me search for the actual implementation of
_spil_bnd_autoset
since it appears to be the key function for handling the nbands determination:
Based on the search results, I can now generate a final response:
The nbands_ref handling has been properly addressed
The changes in the codebase show that the
nbands_ref
determination has been significantly improved:
- The
_spil_bnd_autoset
function inSIAB/spillage/util.py
now handles various cases including:
- Explicit integer values
- 'occ' for occupied bands
- 'all' for all bands
- Expressions like 'occ+2' or 'all-2'
- The implementation includes validation to prevent out-of-range band numbers
- The test cases demonstrate proper handling of degenerate cases with assertions for expected values
- The infrastructure changes in
front.py
properly integrate with this new implementation by preserving the originalnbands_ref
value and delegating the resolution to the specific optimizer implementations🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for nbands_ref related changes rg -A 5 "nbands_ref.*=|get\(['\"]nbands_ref" # Search for any functions that might be determining band numbers ast-grep --pattern 'function $_($$) { $$$ nbands $$$ }'Length of output: 3049
Script:
#!/bin/bash # Search for Issue #104 reference and related discussions rg -l "#104|issue.*104" # Search for degenerate band handling rg -A 5 "degener.*band|band.*degener" # Look for the specific spillage calculation implementation ast-grep --pattern 'def _spil_bnd_autoset($_, $_) { $$$ }'Length of output: 747
Script:
#!/bin/bash # Search for the implementation of _spil_bnd_autoset rg -A 10 "_spil_bnd_autoset" # Look for any recent changes or fixes in spillage calculation modules rg -A 5 "def run\(" SIAB/spillage/Length of output: 3994
SIAB/spillage/util.py (2)
1-6
: LGTM! Good security practice with ast import.The imports are well-organized and the use of
ast
for safe expression evaluation is a good security practice.
151-180
: LGTM! Well-structured test cases.The test class provides comprehensive coverage for the
_spil_bnd_autoset
function, including edge cases and error conditions.SIAB/spillage/lcao_wfc_analysis.py (3)
28-32
: LGTM! API changes improve clarity and error handling.The reduction in method options and the more descriptive parameter name 'nz_filter' make the API clearer and more focused.
149-149
: Keep the variable name 'l' for angular momentum.While static analysis flags 'l' as an ambiguous variable name, it's a standard notation in quantum mechanics for angular momentum quantum number. Consider adding a comment to explain this convention.
Also applies to: 153-153, 161-161, 184-184, 266-266, 286-286
🧰 Tools
🪛 Ruff
149-149: Ambiguous variable name:
l
(E741)
357-358
: LGTM! Test updates properly reflect API changes.The test cases have been updated to use the new functions and properly test both the SVD methods. The previous issue with the method dictionary call has been correctly fixed.
Also applies to: 369-369, 398-399, 418-419
SIAB/spillage/api.py (4)
15-16
: LGTM: Import changes are well-organized.The new imports are properly organized and support the added functionality for data parsing, LCAO analysis, and experimental features.
Also applies to: 18-18, 20-22
97-104
: LGTM: Folder handling logic is correctly implemented.Based on the learnings, the folder handling logic is intentionally kept inside
_coef_opt_jy
as it's only executed there.
Line range hint
196-224
: LGTM: Function rename and implementation look good.The function has been renamed from
_make_guess
to_plan_guess
with proper documentation and implementation.
Line range hint
225-284
: LGTM: Optimization logic is well-implemented.Based on the learnings, the optimization loop in
_do_onion_opt
is designed not to raise exceptions, so the implementation is correct.
Fix #104
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests