Releases: stan-dev/loo
loo v2.8.0
- make E_loo Pareto-k diagnostic more robust by @avehtari in #251
- update psis paper reference by @avehtari in #252
- update PSIS references in vignettes by @jgabry in #254
- fix loo_moment_match p_loo computation by @avehtari in #257
- fix loo_moment_matching NaN issue by @avehtari in #259
- catch Stan log_prob exceptions inside moment matching by @avehtari in #262
- Fix E_loo_khat error when posterior::pareto_khat returns NA by @jgabry in #264
- update psis ref + some minor typo fixes by @avehtari in #266
- update PSIS ref + link to Nabiximols study for Jacobian correction by @avehtari in #267
- Fix issue with pareto_khat output no longer being a list by @n-kall in #269
- fix equations in loo-glossary by @avehtari in #268
New Contributors
Full Changelog: v2.7.0...v2.8.0
loo v2.7.0
Major changes
- New sample size specific diagnostic threshold for Pareto k.
The pre-2022 version of the PSIS paper recommended diagnostic thresholds of
k < 0.5 "good"
0.5 <= k < 0.7 "ok"
0.7 <= k < 1 "bad"
k >= 1 "very bad"
The 2022 revision of the PSIS paper now recommends
k < min(1 - 1/log10(S), 0.7) "good"
min(1 - 1/log10(S), 0.7) <= k < 1 "bad"
k >= 1 "very bad"
where S is the sample size.
There is now one fewer diagnostic threshold ("ok" has been removed), and the most important threshold now depends on the sample size S. With sample sizes 100, 320, 1000, 2200, 10000 the sample size specific part 1 - 1/log10(S) corresponds to thresholds of 0.5, 0.6, 0.67, 0.7, 0.75.
Even if the sample size grows, the bias in the PSIS estimate dominates if 0.7 <= k < 1, and thus the diagnostic threshold for good is capped at 0.7 (if k > 1, the mean does not exist and bias is not a valid measure).
The new recommended thresholds are based on more careful bias-variance analysis of PSIS based on truncated Pareto sums theory. For those who use the Stan default 4000 posterior draws, the 0.7 threshold will be roughly the same, but there will be fewer warnings as there will be no diagnostic message for 0.5 <= k < 0.7.
Those who use smaller sample sizes may see diagnostic messages with a threshold less than 0.7, and they can simply increase the sample size to about 2200 to get the threshold to 0.7.
- No more warnings if the r_eff argument is not provided, and the default is now r_eff = 1. The summary print output showing MCSE and ESS now shows diagnostic information on the range of r_eff. The change was made to reduce unnecessary warnings. The use of r_eff does not change the expected value of elpd_loo, p_loo, and Pareto k, and is needed only to estimate MCSE and ESS. Thus it is better to show the diagnostic information about r_eff only when MCSE and ESS values are shown.
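The sample size specific threshold for Pareto k described above can be reproduced in a few lines of base R. This is only a minimal sketch: the helper name k_threshold_sketch is hypothetical and is not part of the loo API.

```r
# Hypothetical helper (not part of the loo API): the "good" threshold
# from the 2022 PSIS paper revision, min(1 - 1/log10(S), 0.7).
k_threshold_sketch <- function(S) {
  pmin(1 - 1 / log10(S), 0.7)
}

# Sample sizes listed in the release notes
S <- c(100, 320, 1000, 2200, 10000)

# Sample size specific part only, before the 0.7 cap is applied
uncapped <- round(1 - 1 / log10(S), 2)
print(uncapped)

# Threshold after the cap: values above 0.7 are pulled down to 0.7
capped <- round(k_threshold_sketch(S), 2)
print(capped)
```

With the Stan default of 4000 posterior draws, the uncapped value is above 0.7, so the cap applies and the threshold is 0.7.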
Other changes
- Make Pareto k Inf if it is NA by @topipa in #224
- Fix bug in E_loo() when type is variance by @jgabry in #22
- E_loo() now allows type="sd" by @jgabry in #226
- include cc-by 4.0 license for documentation by @jgabry in #216
- Add order statistic warning by @yannmclatchie in #230
- pointwise() convenience function for extracting pointwise estimates by @jgabry in #241
- use new k threshold by @avehtari in #235
- simplify mcse_elpd using log-normal approximation by @avehtari in #246
- show NA for n_eff/ESS if k > k_threshold by @avehtari in #248
- improved E_loo() Pareto-k diagnostics by @avehtari in #247
- Doc improvement in loo_subsample.R by @avehtari in #238
by @avehtari in #238 - Fix typo and deprecations in LFO vignette by @jgabry in #244
- update array syntax in vignettes by @jgabry in #229
- Fix unbalanced knitr backticks by @jgabry in #232
- Register internal S3 methods by @jgabry in #239
- Avoid R cmd check NOTEs about some internal functions by @jgabry in #240
- fix R cmd check note due to importance_sampling roxygen template by @jgabry in #233
- fix R cmd check notes by @jgabry in #242
New Contributors
- @yannmclatchie made their first contribution in #230
Full Changelog: v2.6.0...v2.7.0
loo v2.6.0
New features
- New loo_predictive_metric() function for computing estimates of leave-one-out predictive metrics: mean absolute error, mean squared error and root mean squared error for continuous predictions, and accuracy and balanced accuracy for binary classification. (#202, @LeeviLindgren)
- New functions crps(), scrps(), loo_crps(), and loo_scrps() for computing the (scaled) continuous ranked probability score. (#203, @LeeviLindgren)
- New vignette "Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models." This is a demonstration of the mixture estimators proposed by Silva and Zanella (2022). (#210)
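For readers unfamiliar with the predictive metrics named above, their definitions can be sketched in base R. The sketch does not call loo_predictive_metric() and leaves out the leave-one-out weighting entirely. All function names are hypothetical, the data are made-up toy values, and a 0.5 classification threshold is assumed.

```r
# Hypothetical helpers that illustrate the metric definitions only.
# They take plain point predictions; no LOO weighting is applied.
mae_sketch  <- function(y, pred) mean(abs(y - pred))
mse_sketch  <- function(y, pred) mean((y - pred)^2)
rmse_sketch <- function(y, pred) sqrt(mse_sketch(y, pred))

# Binary classification: y in {0, 1}, prob = predicted P(y = 1).
# Assumption: a prediction counts as class 1 when prob > 0.5.
acc_sketch <- function(y, prob) mean((prob > 0.5) == (y == 1))
balanced_acc_sketch <- function(y, prob) {
  yhat <- as.integer(prob > 0.5)
  tpr <- mean(yhat[y == 1] == 1)  # true positive rate
  tnr <- mean(yhat[y == 0] == 0)  # true negative rate
  (tpr + tnr) / 2
}

# Toy continuous data (made up for illustration)
y_cont <- c(1.0, 2.0, 3.0, 4.0)
p_cont <- c(1.5, 2.0, 2.0, 5.0)
print(mae_sketch(y_cont, p_cont))
print(rmse_sketch(y_cont, p_cont))

# Toy binary data (made up for illustration)
y_bin <- c(1, 1, 1, 0)
p_bin <- c(0.9, 0.8, 0.3, 0.2)
print(acc_sketch(y_bin, p_bin))
print(balanced_acc_sketch(y_bin, p_bin))
```

Balanced accuracy averages the per-class rates, so it differs from plain accuracy when the classes are unevenly represented, as in the toy binary data.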
Bug fixes
- Minor fix to model names displayed by loo_model_weights() to make them consistent with loo_compare(). (#217)
loo v2.5.1
- Fix R CMD check error on M1 Mac as requested by CRAN
loo v2.5.0
Improvements
- New Frequently Asked Questions page on the package website. (#143)
- Speed improvement from simplifying the normalization when fitting the generalized Pareto distribution. (#187, @sethaxen)
- Added parallel likelihood computation to speed up loo_subsample() when using posterior approximations. (#171, @kdubovikov)
- Switch unit tests from Travis to GitHub Actions. (#164)
Bug fixes
- Fixed a bug causing the normalizing constant of the PSIS (log) weights not to get updated when performing moment matching with save_psis = TRUE. (#166, @fweber144)
loo v2.4.0
Bug fixes
- Fixed a bug in relative_eff.function() that caused an error on Windows when using multiple cores. (#152)
- Fixed a potential numerical issue in loo_moment_match() with split=TRUE. (#153)
- Fixed potential integer overflow with loo_moment_match(). (#155, @ecmerkle)
- Fixed relative_eff() when used with a posterior::draws_array. (#161, @rok-cesnovar)
New features
- New generic function elpd() (and methods for matrices and arrays) for computing expected log predictive density of new data or log predictive density of observed data. A new vignette demonstrates using this function when doing K-fold CV with rstan. (#159, @bnicenboim)
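The quantity behind the elpd() entry above can be sketched in base R for a log-likelihood matrix with posterior draws in rows and observations in columns. This is an illustration of the idea rather than the package's code. The names log_mean_exp and elpd_sketch are hypothetical and the matrix values are made up.

```r
# Numerically stable log(mean(exp(x)))
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Hypothetical stand-in for the idea behind elpd() on a matrix:
# average the likelihood over draws for each observation (on the
# log scale), then sum the pointwise values.
elpd_sketch <- function(log_lik) {
  pointwise <- apply(log_lik, 2, log_mean_exp)
  list(pointwise = pointwise, elpd = sum(pointwise))
}

# Toy log-likelihood matrix: 3 draws x 2 observations (made-up values).
# matrix() fills by column, so column 1 holds 0.2, 0.4, 0.6.
ll <- matrix(log(c(0.2, 0.4, 0.6,
                   0.1, 0.1, 0.1)), nrow = 3)
res <- elpd_sketch(ll)
print(res$pointwise)
print(res$elpd)
```

In a K-fold setting the matrix would hold log-likelihood values of held-out observations evaluated under the model fit without them.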
loo v2.3.1
- Fix a bug in loo_moment_match() that prevented ... arguments from being used correctly. (#149)
loo v2.3.0
- Added Topi Paananen (@topipa) and Paul Bürkner (@paul-buerkner) as coauthors.
- New function loo_moment_match() (and new vignette), which can be used to update a loo object when Pareto k estimates are large. (#130)
- The log weights provided by the importance sampling functions psis(), tis(), and sis() no longer have the largest log ratio subtracted from them when returned to the user. This should be less confusing for anyone using the weights() method to make an importance sampler. (#112, #146)
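To see why subtracting the largest log ratio is only a matter of presentation, the following base R sketch normalizes a set of log weights with and without that shift. It does not use the loo weights() method. The helper names and the numbers are made up for illustration.

```r
# Numerically stable log(sum(exp(x)))
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Turn unnormalized log weights into weights that sum to 1
normalize_log_weights <- function(lw) exp(lw - log_sum_exp(lw))

lw <- c(-1050, -1051, -1053)   # made-up unnormalized log weights
w  <- normalize_log_weights(lw)

# Subtracting the largest value shifts every log weight by the same
# constant, so the normalized weights are unchanged.
w_shifted <- normalize_log_weights(lw - max(lw))
print(all.equal(w, w_shifted))

# Self-normalized importance sampling estimate of E[h]
h <- c(2, 4, 6)                # made-up function values, one per draw
est <- sum(w * h)
print(est)
```

The shift matters only when the unnormalized values themselves are of interest, which is the case the release note is concerned with.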
loo v2.2.0
See release notes below or at mc-stan.org/loo/news.
(GitHub issue/PR number in parentheses)
- Added Mans Magnusson (@MansMeg) as a coauthor.
- New functions loo_subsample() and loo_approximate_posterior() (and new vignette) for doing PSIS-LOO with large data. (#113)
- Added support for standard importance sampling and truncated importance sampling (functions sis() and tis()). (#125)
- compare() now throws a deprecation warning suggesting loo_compare(). (#93)
- A smaller threshold is used when checking the uniqueness of tail values. (#124)
- For WAIC, warnings are only thrown when running waic() and not when printing a waic object. (#117, @mcol)
- Use markdown syntax in roxygen documentation wherever possible. (#108)
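The difference between standard and truncated importance sampling mentioned above can be illustrated in base R. The sketch assumes the common truncation rule of capping the raw ratios at sqrt(S) times their mean. Whether tis() uses exactly this rule is an assumption here, and sis_sketch and tis_sketch are hypothetical names, not the package functions.

```r
# Standard importance sampling: the log ratios are used as they are
sis_sketch <- function(log_ratios) log_ratios

# Truncated importance sampling, assuming the rule
# "cap raw ratios at sqrt(S) * mean(ratio)", computed on the log scale.
tis_sketch <- function(log_ratios) {
  S <- length(log_ratios)
  m <- max(log_ratios)
  log_mean_ratio <- m + log(mean(exp(log_ratios - m)))
  cap <- log_mean_ratio + 0.5 * log(S)
  pmin(log_ratios, cap)
}

lr <- c(0, 0, 0, 10)   # made-up log ratios with one extreme draw
lw_sis <- sis_sketch(lr)
lw_tis <- tis_sketch(lr)
print(lw_sis)
print(lw_tis)
```

Only the extreme draw is affected by the truncation; the other log ratios fall below the cap and pass through unchanged.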
loo v2.1.0
See release notes below or at mc-stan.org/loo/news.
Installation
Install from CRAN:
install.packages("loo")
Install from GitHub:
devtools::install_github("stan-dev/loo", ref = "v2.1.0")
Release notes
- New function loo_compare() for model comparison that will eventually replace the existing compare() function. (#93)
- New vignette on LOO for non-factorizable joint Gaussian models. (#75)
- New vignette on "leave-future-out" cross-validation for time series models. (#90)
- New glossary page (use help("loo-glossary")) with definitions of key terms. (#81)
- New se_diff column in model comparison results. (#78)
- Improved stability of psis() when log_ratios are very small. (#74)
- Allow r_eff=NA to suppress warning when specifying r_eff is not applicable (i.e., draws not from MCMC). (#72)
- Update effective sample size calculations to match RStan's version. (#85)
- Naming of k-fold helper functions now matches scikit-learn. (#96)
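As a rough illustration of what an se_diff column summarizes, the base R sketch below forms a standard error for an elpd difference from pointwise contributions. The formula sqrt(N) * sd(diff) is stated here as an assumption about the usual approach, not as the package's exact code. The function name and the numbers are made up.

```r
# Hypothetical helper: difference in elpd between two models and a
# standard error built from the pointwise differences.
# Assumed formula: se_diff = sqrt(N) * sd(pointwise differences).
se_diff_sketch <- function(pointwise_a, pointwise_b) {
  d <- pointwise_a - pointwise_b
  c(elpd_diff = sum(d), se_diff = sqrt(length(d)) * sd(d))
}

# Made-up pointwise elpd values for two models on 4 observations
pw_a <- c(-1.0, -1.2, -0.8, -1.1)
pw_b <- c(-1.3, -1.1, -1.0, -1.6)
res_diff <- se_diff_sketch(pw_a, pw_b)
print(res_diff)
```

Working with the pointwise differences, rather than combining the two models' separate standard errors, accounts for the fact that both models are evaluated on the same observations.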