Add DEA functions and plotting to ADRIA.analysis #896

Rosejoycrocker · 2024-10-28T06:48:14Z

Adds a function to perform Data Envelopment Analysis (DEA), output-oriented by default, given inputs X and output metrics Y. DEA measures the performance of entities (here, scenarios) in which inputs are converted to outputs via some process. Each scenario's "efficiency score" is calculated relative to an "efficiency frontier", a region representing scenarios for which outputs cannot be further increased by changing inputs (scenario settings). Scenarios on the frontier serve as "benchmarks" or "peers", associated with best-practice restoration scenarios. Scenarios with efficiencies not equal to 1 can be improved to be more efficient.

Allows usage with generic input functions, such as functions representing cost as a function of intervention scenario.
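
For readers new to DEA, the efficiency score described above can be written as a linear program. The block below is the standard textbook output-oriented envelopment form under constant returns to scale (CRS), not something taken from this PR's code; the symbols ($n$ scenarios, $m$ inputs $x_{ij}$, $s$ outputs $y_{rj}$, weights $\lambda_j$, expansion factor $\phi$) are notation assumed here for illustration. For the scenario under evaluation, indexed $o$:

```latex
\begin{aligned}
\max_{\phi,\,\lambda}\quad & \phi \\
\text{s.t.}\quad
  & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m \\
  & \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, && r = 1,\dots,s \\
  & \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

The variable returns to scale (VRS) model adds the constraint $\sum_j \lambda_j = 1$, and the free disposal hull (FDH) model further restricts each $\lambda_j$ to $\{0, 1\}$. The optimal $\phi^*$ is at least 1, so reporting $1/\phi^*$ gives a score in $(0, 1]$, where 1 means the scenario lies on the frontier. Scenarios with $\lambda_j^* > 0$ are the peers of scenario $o$.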

Flagging @ConnectedSystems as we discussed this previously

Working example is:

using Revise, Infiltrator
using ADRIA
using DataFrames, YAXArrays
using GLMakie, GeoMakie, GraphMakie
using Distributions

dom = ADRIA.load_domain("domain path", "45")

scens = ADRIA.sample(dom, 2^10)
rs = ADRIA.run_scenarios(dom, scens, "45")

n_scens = size(scens, 1)

# Get cost of deploying corals in each scenario, randomised cost example
cost = YAXArray(collect(range(100; stop=1_000_000, length=n_scens)) .+ rand(Uniform(1000, 2000), n_scens))

# Get mean coral cover and shelter volume for each scenario
s_tac = dropdims(
    mean(ADRIA.metrics.scenario_total_cover(rs); dims=:timesteps); dims=:timesteps
)
s_sv = dropdims(
    mean(ADRIA.metrics.scenario_shelter_volume(rs); dims=:timesteps); dims=:timesteps
)

# Run an output-oriented DEA, seeking to maximise cover and shelter volume for minimum
# deployment cost.
DEA_scens = ADRIA.economics.data_envelopment_analysis(cost, Array(s_tac), Array(s_sv))
dea_fig = ADRIA.viz.data_envelopment_analysis(rs, DEA_scens)
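
To make the efficiency scores concrete without needing a domain or a solver, here is a minimal, self-contained sketch of output-oriented FDH efficiency on toy data. It is an illustration only, not ADRIA's implementation: `fdh_output_efficiency` is a hypothetical helper, the data are made up, and all outputs are assumed to be strictly positive.

```julia
# Output-oriented FDH efficiency on toy data (illustrative sketch, not ADRIA's code).
# X is n x m (inputs), Y is n x s (outputs); each row is one scenario.
# Assumes all outputs are strictly positive.
function fdh_output_efficiency(X::AbstractMatrix{<:Real}, Y::AbstractMatrix{<:Real})
    n = size(X, 1)
    scores = zeros(Float64, n)
    for o in 1:n
        phi = 1.0  # every scenario can be compared with itself, so phi >= 1
        for j in 1:n
            # Scenario j is a valid comparator only if it uses no more of any input
            all(X[j, :] .<= X[o, :]) || continue
            # Largest factor by which every output of o could be scaled and still be matched by j
            phi = max(phi, minimum(Y[j, :] ./ Y[o, :]))
        end
        scores[o] = 1.0 / phi  # report the inverse so that scores lie in (0, 1]
    end
    return scores
end

X = reshape([1.0, 2.0, 3.0, 3.0], :, 1)   # one input, e.g. deployment cost
Y = [1.0 1.0; 2.0 2.0; 2.0 3.0; 1.0 1.5]  # two outputs, e.g. cover and shelter volume
eff = fdh_output_efficiency(X, Y)
```

In this toy data the fourth scenario uses as much input as the third but produces half of each output, so it scores 0.5, while the other three lie on the FDH frontier and score 1.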

codecov · 2024-10-28T07:12:22Z

Codecov Report

Attention: Patch coverage is 84.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 51.77%. Comparing base (ad39467) to head (5a339e3).

Files with missing lines	Patch %	Lines
src/analysis/data_envelopment.jl	60.00%	6 Missing ⚠️
src/viz/viz.jl	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #896      +/-   ##
==========================================
+ Coverage   51.44%   51.77%   +0.33%     
==========================================
  Files          72       74       +2     
  Lines        4782     4832      +50     
==========================================
+ Hits         2460     2502      +42     
- Misses       2322     2330       +8


… be input Change default DEA model to `deabigdata` (deals with large data sets more efficiently). Rename file and remove StableRNGs import as not needed

… clarity Fix typo Add preliminary version of coral deployment cost function Add DataEnvelopmentAnalysis and BasicInterpolators to ADRIA package Add `economics.jl` to analysis.jl imports Add cost data for interpolators Add basic DEA functions for ADRIA Add first draft of CAD cost function Make `economics` a module

…ost function)

Include 3 different returns to scale in outputs to allow returns to scale ratios to be calculated and plotted

…ost/coral and no. of deployment years Set scenarios with zero years of deployment to zero to avoid a division-by-zero error

Now draws three plots: the efficiency frontier and data cloud, the technical efficiency, and the scale efficiency. Also adds docs for the viz functions.
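
As a small illustration of the scale efficiency panel: the viz code in this PR computes it as `DEA_output.crs_vals ./ DEA_output.vrs_vals`. The sketch below applies the same ratio to made-up scores; the numbers and the `at_optimal_scale` name are assumptions for illustration only.

```julia
# Made-up efficiency scores in (0, 1] for three scenarios (illustrative values only)
crs_vals = [1.0, 0.6, 0.45]  # scores assuming constant returns to scale
vrs_vals = [1.0, 0.8, 0.9]   # scores assuming variable returns to scale

# Scale efficiency is the ratio of the CRS score to the VRS score.
# Values below 1 indicate a scenario that is not operating at its most productive scale.
scale_efficiency = crs_vals ./ vrs_vals

# Scenarios at (approximately) optimal scale have a ratio of 1
at_optimal_scale = isapprox.(scale_efficiency, 1.0)
```

Because the CRS frontier envelops the VRS frontier, the CRS score never exceeds the VRS score, so this ratio stays at or below 1.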

…rence to counterfactuals for each intervention scenario

Fix typos, remove cost function from input to `data_envelopment_analysis`

Zapiano

This PR doesn't add any new tests. I think it would be a good idea to add some basic tests, at least the "happy case" for both the metrics and the viz function added.

Zapiano · 2024-10-31T05:04:20Z

src/analysis/economics.jl

@@ -0,0 +1,169 @@
+module economics


Two comments:

Maybe economics is not a good name for this module. We have other analysis "tools" that are also related to economics (like pareto).

I'm not sure this needs to be in its own module. Since we already have the analysis module, why not just rename this file to data_envelopment.jl or something similar and include it in the analysis module? (This is a real question: do you see a reason why this should be in a separate module? To me it looks too small to deserve its own module, at least at this stage.)

Zapiano · 2024-10-31T05:12:09Z

src/analysis/economics.jl

+function DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64},
+    FDH_eff::Vector{Float64}, CRS_peers::DEA.DEAPeers, VRS_peers::DEA.DEAPeers,
+    FDH_peers::DEA.DEAPeers, X::Matrix{Float64}, Y::Vector{Float64}
+)::DEAResult


Suggested change

-function DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64},
-    FDH_eff::Vector{Float64}, CRS_peers::DEA.DEAPeers, VRS_peers::DEA.DEAPeers,
-    FDH_peers::DEA.DEAPeers, X::Matrix{Float64}, Y::Vector{Float64}
-)::DEAResult
+function DEAResult(
+    CRS_eff::Vector{Float64},
+    VRS_eff::Vector{Float64},
+    FDH_eff::Vector{Float64},
+    CRS_peers::DEA.DEAPeers,
+    VRS_peers::DEA.DEAPeers,
+    FDH_peers::DEA.DEAPeers,
+    X::Matrix{Float64},
+    Y::Vector{Float64}
+)::DEAResult

Zapiano · 2024-10-31T05:13:10Z

src/analysis/economics.jl

+end
+
+"""
+    DEAResult(CRS_eff::Vector{Float64}, VRS_eff::Vector{Float64}, FDH_eff::Vector{Float64},


Make this a single line and update (it doesn't match the actual function signature)

Zapiano · 2024-10-31T05:13:50Z

src/analysis/economics.jl

+    return DEAResult(1 ./ CRS_eff,
+        1 ./ VRS_eff,
+        1 ./ FDH_eff,
+        CRS_peers,
+        VRS_peers,
+        FDH_peers,
+        X,
+        Y)


Suggested change

-    return DEAResult(1 ./ CRS_eff,
-        1 ./ VRS_eff,
-        1 ./ FDH_eff,
-        CRS_peers,
-        VRS_peers,
-        FDH_peers,
-        X,
-        Y)
+    return DEAResult(
+        1 ./ CRS_eff,
+        1 ./ VRS_eff,
+        1 ./ FDH_eff,
+        CRS_peers,
+        VRS_peers,
+        FDH_peers,
+        X,
+        Y
+    )

Zapiano · 2024-10-31T05:14:50Z

src/analysis/economics.jl

+end
+
+"""
+    data_envelopment_analysis(X::YAXArray, Y::YAXArray; orient::Symbol=:Output,


Please, make this inline and update (doesn't match the function signature)

Zapiano · 2024-10-31T06:51:40Z

ext/AvizExt/viz/economics.jl

+    scale_efficiency = DEA_output.crs_vals ./ DEA_output.vrs_vals
+
+    # Plot efficiency frontier and data cloud
+    axa = Axis(g[1, 1]; xlabel=metrics_x_lab, ylabel=metrics_y_lab, axis_opts...)


axa? Maybe ax_a? Not only to avoid confusion but also because it seems that we use snake case everywhere in this project.

Zapiano · 2024-10-31T06:58:03Z

ext/AvizExt/viz/economics.jl

+
+    # Plot efficiency frontier and data cloud
+    axa = Axis(g[1, 1]; xlabel=metrics_x_lab, ylabel=metrics_y_lab, axis_opts...)
+    data = scatter!(axa, Y[:, 1], Y[:, 2]; color=data_color)


So, in my opinion calling a scatter plot `data` is not very good. Not only because `data` can mean almost anything, but also because this is a figure object (right?). Since you are calling this a "data cloud", it could be `data_cloud`, for example?

Zapiano · 2024-10-31T06:58:21Z

ext/AvizExt/viz/economics.jl

+    frontier_color = get(opts, :frontier_color, :red)
+    data_color = get(opts, :data_color, :black)
+    frontier_name = get(opts, :frontier_name, "Best practice frontier")
+    data_name = get(opts, :data_name, "Scenario data cloud")


So, what's the difference between a data cloud and a scatter plot?

Sorry, what do you mean here?

Sorry, I was probably tired and being picky when I wrote this, you can just ignore this comment.

Zapiano · 2024-10-31T07:02:42Z

ext/AvizExt/viz/economics.jl

+    Legend(g[1, 2], [frontier, data], [frontier_name, data_name])
+
+    # Plot the scale efficiency (ratio of efficiencies assuming CRS vs. assuming VRS)
+    axb = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)


Suggested change

-    axb = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)
+    ax_b = Axis(g[2, 1]; title="Scale efficiency", ylabel=scale_eff_y_lab, axis_opts...)

Zapiano · 2024-10-31T07:03:03Z

ext/AvizExt/viz/economics.jl

+    )
+
+    # Plot the technical efficiency (inverse VRS efficiencies)
+    axc = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)


Suggested change

-    axc = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)
+    ax_c = Axis(g[3, 1]; title="Technical efficiency", ylabel=tech_eff_y_lab, axis_opts...)

Rosejoycrocker · 2024-11-06T01:52:03Z

This PR doesn't add any new tests. I think it would be a good idea to add some basic tests, at least the "happy case" for both the metrics and the viz function added.

I've added tests for the plotting and calculation functions to the analysis.jl tests.

- fix function signatures and doc strings
- remove constructor function as unnecessary
- fix example in doc string
- change Y type to AbstractArray
- Allow Y to be Vector or Matrix in inner function
- Formatting

- Formatting
- adjust variable names
- fix function signature and doc string

- remove cf_difference_scenario for future PR

Rosejoycrocker · 2024-11-06T06:05:34Z

I think I've addressed your comments @Zapiano, let me know if you have others

Zapiano

It's almost ready, just a few docstrings missing and a refactor suggestion :)

Zapiano · 2024-11-11T23:02:48Z