Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DEA functions and plotting to ADRIA.analysis #896

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

Rosejoycrocker
Copy link
Collaborator

@Rosejoycrocker Rosejoycrocker commented Oct 28, 2024

Adds function to perform output-oriented (default) Data Envelopment Analysis (DEA) given inputs X and output metrics Y. DEA is used to measure the performance of entities (scenarios), where inputs are converted to outputs via some process. Each scenario's "efficiency score" is calculated relative to an "efficiency frontier", a region representing scenarios for which outputs
cannot be further increased by changing inputs (scenario settings). Scenarios on the frontier serve as "benchmarks" or "peers", associated with best practice restoration scenarios. Scenarios with efficiencies not equal to 1 can be improved to be more efficient.

Allows usage with generic input functions, such as functions representing cost as a function of intervention scenario.

Flagging @ConnectedSystems as we discussed this previously

Working example is:

using Revise, Infiltrator
using ADRIA
using DataFrames, YAXArrays
using GLMakie, GeoMakie, GraphMakie
using Distributions

dom = ADRIA.load_domain( "domain path", "45")

scens = ADRIA.sample(dom, 2^10)
rs = ADRIA.run_scenarios(dom, scens, "45")

n_scens = size(scens,1)

# Get cost of deploying corals in each scenario, randomised cost example
cost = YAXArray(collect(range(100,stop=1000000, length=n_scens)) .+ rand(Uniform(1000, 2000), n_scens))

# Get mean coral cover and shelter volume for each scenario
s_tac = dropdims(
    mean(ADRIA.metrics.scenario_total_cover(rs); dims=:timesteps); dims=:timesteps
)
s_sv = dropdims(
   mean(ADRIA.metrics.scenario_shelter_volume(rs); dims=:timesteps); dims=:timesteps
    )

# Do output oriented DEA analysis seeking to maximise cover and shelter volume for minimum
# deployment cost.
DEA_scens = ADRIA.economics.data_envelopment_analysis(cost, Array(s_tac), Array(s_sv))
dea_fig = ADRIA.viz.data_envelopment_analysis(rs, DEA_scens)

example_dea_fig

Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 84.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 51.77%. Comparing base (ad39467) to head (5a339e3).

Files with missing lines Patch % Lines
src/analysis/data_envelopment.jl 60.00% 6 Missing ⚠️
src/viz/viz.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #896      +/-   ##
==========================================
+ Coverage   51.44%   51.77%   +0.33%     
==========================================
  Files          72       74       +2     
  Lines        4782     4832      +50     
==========================================
+ Hits         2460     2502      +42     
- Misses       2322     2330       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Rosejoycrocker Rosejoycrocker force-pushed the add-dea-analysis branch 2 times, most recently from 43b52f3 to f1e5897 Compare October 28, 2024 21:16
… be input

Change default DEA model to `deabigdata` (deals with large data sets more efficiently.
Rename file and remove StableRNGs import as not needed
… clarity

Fix typo

Add preliminary version of coral deployment cost function

Add DataEnvelopmentAnalysis and BasicInterpolators to ADRIA package

Add `economics.jl" to analysis.jl imports


Add cost data for interpolators

Add basic DEA functions for ADRIA

Add first draft of CAD cost function

Make `economics` a module
Include 3 different returns to scale in outputs to allow returns to scale ratios to be calculated and plotted
…ost/coral and no of deployment years

Add setting scenarios with zero years of deployment to zero to avoid div by zero error
Now plots three plots, the efficiency frontier and data cloud, the technical efficiency and the scale efficiency.
Also add docs for viz functions.
…rence to counterfactuals for each intervention scenario
Fix typos, remove cost function from input to `data_envelopment_analysis`
Copy link
Collaborator

@Zapiano Zapiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't add any new test. I think it would be a good idea to add at least some basic tests, at least the "happy case" both for the metrics and for the viz function added.

@@ -0,0 +1,169 @@
module economics
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. Maybe economics is not a good name for this module. We have other analysis "tools" that are also related to economics (like pareto).
  2. I'm not sure this needs to be in its own module. Since we already have analysis module, why not just rename this file to data_envelopment.jl or something and include that in analysis module? (this is a real question, do you see a reason why this should be on a separate module? to me it looks like this is too small to deserve its own module, at least at this stage).

Comment on lines 35 to 38
function DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64},
FDH_eff::Vector{Float64}, CRS_peers::DEA.DEAPeers, VRS_peers::DEA.DEAPeers,
FDH_peers::DEA.DEAPeers, X::Matrix{Float64}, Y::Vector{Float64}
)::DEAResult
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64},
FDH_eff::Vector{Float64}, CRS_peers::DEA.DEAPeers, VRS_peers::DEA.DEAPeers,
FDH_peers::DEA.DEAPeers, X::Matrix{Float64}, Y::Vector{Float64}
)::DEAResult
function DEAResult(
CRS_eff::Vector{Float64},
VRS_eff::Vector{Float64},
FDH_eff::Vector{Float64},
CRS_peers::DEA.DEAPeers,
VRS_peers::DEA.DEAPeers,
FDH_peers::DEA.DEAPeers,
X::Matrix{Float64},
Y::Vector{Float64}
)::DEAResult

end

"""
DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64}, FDH_eff::Vector{Float64},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a single line and update (it doesn't match the actual function signature)

Comment on lines 39 to 46
return DEAResult(1 ./ CRS_eff,
1 ./ VRS_eff,
1 ./ FDH_eff,
CRS_peers,
VRS_peers,
FDH_peers,
X,
Y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return DEAResult(1 ./ CRS_eff,
1 ./ VRS_eff,
1 ./ FDH_eff,
CRS_peers,
VRS_peers,
FDH_peers,
X,
Y)
return DEAResult(
1 ./ CRS_eff,
1 ./ VRS_eff,
1 ./ FDH_eff,
CRS_peers,
VRS_peers,
FDH_peers,
X,
Y
)

end

"""
data_envelopment_analysis(X::YAXArray, Y::YAXArray; orient::Symbol=:Output,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, make this inline and update (doesn't match the function signature)

scale_efficiency = DEA_output.crs_vals ./ DEA_output.vrs_vals

# Plot efficiency frontier and data cloud
axa = Axis(g[1, 1]; xlabel=metrics_x_lab, ylabel=metrics_y_lab, axis_opts...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axa? Maybe ax_a? Not only to avoid confusion but also because it seems that we use snake case everywhere in this project.


# Plot efficiency frontier and data cloud
axa = Axis(g[1, 1]; xlabel=metrics_x_lab, ylabel=metrics_y_lab, axis_opts...)
data = scatter!(axa, Y[:, 1], Y[:, 2]; color=data_color)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I my opinion calling a scatter plot data is not very good. Not only because data can mean almost anything but because this is a figure object (right?). Since you are calling this "data cloud" it could be data_cloud, for example..?

frontier_color = get(opts, :frontier_color, :red)
data_color = get(opts, :data_color, :black)
frontier_name = get(opts, :frontier_name, "Best practice frontier")
data_name = get(opts, :data_name, "Scenario data cloud")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what's the difference between a data cloud and a scatter plot?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was probably tired and being picky when I wrote this, you can just ignore this comment.

Legend(g[1, 2], [frontier, data], [frontier_name, data_name])

# Plot the scale efficiency (ratio of efficiencies assuming CRS vs. assuming VRS)
axb = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
axb = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)
ax_b = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)

)

# Plot the technical efficiency (inverse VRS efficiencies)
axc = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
axc = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)
ax_c = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)

@Rosejoycrocker
Copy link
Collaborator Author

This PR doesn't add any new test. I think it would be a good idea to add at least some basic tests, at least the "happy case" both for the metrics and for the viz function added.

I've added the plotting and calculation functions to analysis.jl tests

- fix function signatures and doc strings
- remove constructor function as unnecessary
- fix example in doc string
- change Y type to AbstractArray
- Allow Y to be Vector or Matrix in inner function'
- Formatting
- Formatting
- adjust variable names
- fix function signature and doc string
- remove cf_difference_scenario for future PR
@Rosejoycrocker
Copy link
Collaborator Author

I think I've addressed your comments @Zapiano, let me know if you have others

Copy link
Collaborator

@Zapiano Zapiano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's almost ready, just a few docstring missing and a refactor suggestion :)

Comment on lines 59 to 63
function ADRIA.viz.data_envelopment_analysis(
rs::ResultSet, DEA_output::DEAResult;
axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(), fig_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function ADRIA.viz.data_envelopment_analysis(
rs::ResultSet, DEA_output::DEAResult;
axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(), fig_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)
function ADRIA.viz.data_envelopment_analysis(
rs::ResultSet,
DEA_output::DEAResult;
axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
fig_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)

using ADRIA.analysis: DEAResult

"""
ADRIA.viz.data_envelopment_analysis(rs::ResultSet, DEA_output::DEAResult;axis_opts=Dict(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring doesn't match the function signature (all opts params types are wrong) and it should be one method per line (this is how we do in the resto of this project at least).

Comment on lines 73 to 76
function ADRIA.viz.data_envelopment_analysis!(g::Union{GridLayout,GridPosition},
rs::ResultSet, DEA_output::DEAResult; axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function ADRIA.viz.data_envelopment_analysis!(g::Union{GridLayout,GridPosition},
rs::ResultSet, DEA_output::DEAResult; axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)
function ADRIA.viz.data_envelopment_analysis!(
g::Union{GridLayout,GridPosition},
rs::ResultSet,
DEA_output::DEAResult;
axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)

Comment on lines 81 to 83
function ADRIA.viz.data_envelopment_analysis!(g::Union{GridLayout,GridPosition},
DEA_output::DEAResult; axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function ADRIA.viz.data_envelopment_analysis!(g::Union{GridLayout,GridPosition},
DEA_output::DEAResult; axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE())
function ADRIA.viz.data_envelopment_analysis!(
g::Union{GridLayout,GridPosition},
DEA_output::DEAResult; axis_opts::OPT_TYPE=DEFAULT_OPT_TYPE(),
opts::OPT_TYPE=DEFAULT_OPT_TYPE()
)

scale_efficiency = DEA_output.crs_vals ./ DEA_output.vrs_vals

# Plot efficiency frontier and data cloud
ax_a = Axis(g[1, 1]; xlabel=metrics_x_lab, ylabel=metrics_y_lab, axis_opts...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a way of making this more interesting, doing the same thing in less lines and extracting part of this code to another function:

If I understand, the reason why you need to save the result of these two following scatter! to variables is to render the Legend. You could also render the Legend in a different way:

marker_colors = [frontier_color, data_color]
marker_els = [MarkerElement(;color=m_color) for m_color in marker_colors]
Legend(g[1,2], marker_els, [frontier_name, data_name]; framevisible=false)

You can even put these three lines inside a separate function and pass opts to it so you don't need to extract these colors and names here.

And if you do that, you don't need to save those scatter! results to variables. Then, you can extract these lines that you repeat three times to an external function:

    # Plot the scale efficiency (ratio of efficiencies assuming CRS vs. assuming VRS)
    ax_b = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)
    scatter!(ax_b, scale_efficiency; color=data_color)
    scatter!(
        ax_b,
        best_practice_scens,
        scale_efficiency[best_practice_scens];
        color=frontier_color
    )

And, again, you can pass opts to this function directly and extract the relevant variables inside it so you don't need to extract them here).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still about the legend: I've done something similar here:

function _render_legend(

And a small digression, hope you don't mind:

That makes me think we should have some standardized way of doing this the function I'm suggesting to you and the function in this link are essentially the same except that one generates a Legend with LineElements and the other generates a Legend with MarkerElements - and that's something we could address using multiple dispatch probably.

All that to say: maybe not do what I just said in this PR, but if you just extract this code that generates a legend (the one I suggested in the previous comment) to a function called render_marker_legend, for example, in the future we can make this better and have just a render_legend that can be used for different types of elements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add render_marker_legend to data_envelopment.jlbut I'm guessing this will eventually be moved to 'viz.jl' when multiple dispatch is incorporated, so it can be used by both yourscenario.jlfunctions and thedata_envelopment.jl` functions?

data_envelopment_analysis(rs::ResultSet, input_function::Function, metrics...; orient::Symbol=:Output, dea_model::Function=DEA.deabigdata)::DEAResult
data_envelopment_analysis(rs::ResultSet, Y::YAXArray, input_function::Function; orient::Symbol=:Output, dea_model::Function=DEA.deabigdata)::DEAResult
data_envelopment_analysis(X::YAXArray, metrics...; orient::Symbol=:Output, dea_model::Function=DEA.deabigdata)::DEAResult
data_envelopment_analysis(X::YAXArray, Y::YAXArray; orient::Symbol=:Output, dea_model::Function=DEA.deabigdata)::DEAResult
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't match the function signature.

@Rosejoycrocker
Copy link
Collaborator Author

I think I've addressed your comments @Zapiano, let me know if there is anything else :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants