-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up the code #15
Conversation
for more information, see https://pre-commit.ci
BTW, here's how opennem handles retries (via
|
Hi Matthew,
Thanks for all of this! Looks great!
I'm taking some time off at the moment, but will make some time to look at
this in early June.
Abi
…On Tue, 14 May 2024, 6:26 pm Matthew Davis, ***@***.***> wrote:
BTW, here's how opennem handles retries (via requests.session retry
configuration):
https://github.com/opennem/opennem/blob/4a16a99744a4582080b4711a105c6f070b875960/opennem/utils/http.py#L106-L124
http = requests.Session()
http.headers.update({"User-Agent": USER_AGENT})
adapter_timeout = TimeoutHTTPAdapter()
http.mount("https://", adapter_timeout)
http.mount("http://", adapter_timeout)
adapter_retry = HTTPAdapter(max_retries=retry_strategy)
http.mount("https://", adapter_retry)
http.mount("http://", adapter_retry)
—
Reply to this email directly, view it on GitHub
<#15 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHEMJQNRDS6CCLRGOXICC5TZCHDETAVCNFSM6AAAAABHVPPR4WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBZGU4DCNBRG4>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #15 +/- ##
==========================================
- Coverage 81.98% 81.81% -0.17%
==========================================
Files 2 2
Lines 161 165 +4
Branches 26 28 +2
==========================================
+ Hits 132 135 +3
- Misses 26 27 +1
Partials 3 3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hi @mdavis-xyz, can you please sync your fork? I've updated the workflow so CI doesn't fail for codecov. Looks like local tests are passing. As for changes:
I will have to get around to adding some of these improvements to NEMSEER. Thanks very much for your contribution! I'll add you to the acknowledgements. Abi |
requests.session
to re-use the same TLS+HTTP handshake/session (x2 speedup)http
forhttps
. (Change base url from HTTP to HTTPS #14 )requests
can throw exceptions prior to returning an HTTPS status, which will not be caught by the existing logic. (e.g. if your wifi cuts out). Also, the actual zip file download is not retried. Perhaps it would be better to just callresp.raise_for_status()
and configure retries within therequests.session
like this.@cache
decorator can't handle mutable dicts). I'm a bit perplexed by all the header stuff that was there. I've never needed to manually specify any header before when webscraping nemweb. What's the purpose of the headers? (e.g. I thinkrequests
adds most of them for you automatically, e.g.Host
) I left the referrer header in, although I don't think it's needed.poetry.lock
because the CICD and local tests were failing if I didn't do that.One more thing we could do (which I haven't done yet) is to not redownload a zip if we've already downloaded it. I see the library deletes zips when decompressing. I think it should leave them there. So when we download, check if there's an existing file (perhaps check the file size matches expectation). If so, skip the download. That's faster for the user, and results in less server costs for AEMO. Since the folder is called a "cache", I think re-use is also a more intuitive behaviour for the user to expect.
Another thing we could do to speed up the table size estimation is to avoid doing HEAD to get file sizes, and instead grab them from the
<div>
s in the HTML. (Probably a x20 speedup)