Implement/Document Causal Workflow #9

Open · 6 of 9 tasks

cthoyt opened this issue Oct 24, 2023 · 1 comment
cthoyt commented Oct 24, 2023

What does it mean for a task to be "complete"?

  1. Code has been implemented, a pull request has been made, all CI tests pass, and it has been merged
  2. Documentation covers both the main functions used in the high-level workflow and, more fully, what the step does as a whole (written as part of the Python package documentation).
    • How do you know if the documentation is good? Consider if you started using NumPy but had never seen it before: how much would you rely on the documentation? It needs to give high-level context (at the module level), specific documentation of all inputs and outputs (in docstrings), and demonstrations of specific usage (i.e., with real data that has real context); a sketch of this style follows this list
    • All documentation should be written for a person who does not understand the methodology (or care to understand it) but wants to be oriented: when they should use each part, how they should use it (give demonstrations), and how they should interpret what comes out
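
As a rough sketch of the target style (the function name, signature, and module docstring here are hypothetical, not part of Eliater's actual API), each workflow function would pair module-level context, a complete docstring, and a demonstration:

```python
"""High-level estimation step of the causal workflow.

This module ties together graph preprocessing, estimand identification,
and effect estimation from observational data.
"""

import pandas as pd


def estimate_effect(graph, treatment: str, outcome: str, data: pd.DataFrame) -> float:
    """Estimate the average causal effect of ``treatment`` on ``outcome``.

    :param graph: A mixed graph (e.g., an ADMG) encoding the assumed biology,
        such as a signaling network curated from the literature.
    :param treatment: The name of the node being intervened on.
    :param outcome: The name of the node whose response is of interest.
    :param data: Observational measurements, one column per node in ``graph``.
    :return: A point estimate of the average causal effect.

    Example (with a hypothetical curated graph and measurement table):

    >>> estimate_effect(signaling_graph, "EGFR", "ERK", measurements)  # doctest: +SKIP
    """
    raise NotImplementedError
```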

Case Studies

All case studies need to have full, working implementations with minimal (or ideally no) data massaging outside of the workflow itself.

Each needs to include context (how the network was made, what choices went into it, what biology it models) as well as a guide on how to interpret the results.

Each needs a comparison of using parts of the causal workflow versus not using them. How do you interpret these results? Are they "better", and what is the metric for that? This should answer the reader's question: why should I bother with this (i.e., Eliater) instead of directly using y0, ananke, or similar?
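
For concreteness, a minimal sketch of what such a comparison could look like, using simulated data where the ground truth is known (the simulation and the use of plain OLS are illustrative assumptions, not Eliater's actual workflow):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate a confounded system U -> T, U -> Y, T -> Y.
rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)                        # confounder
t = u + rng.normal(size=n)                    # treatment influenced by U
y = 1.0 * t + 2.0 * u + rng.normal(size=n)    # outcome; true causal effect is 1.0
data = pd.DataFrame({"T": t, "Y": y, "U": u})

# Without the workflow: naively regress Y on T, ignoring confounding.
naive = sm.OLS(data["Y"], sm.add_constant(data[["T"]])).fit()

# With the workflow: the identification step dictates adjusting for U.
adjusted = sm.OLS(data["Y"], sm.add_constant(data[["T", "U"]])).fit()

# The metric: closeness to the known true effect (1.0).
print(f"naive:    {naive.params['T']:.2f}")     # biased upward (~2.0)
print(f"adjusted: {adjusted.params['T']:.2f}")  # ~1.0
```

Here the metric is recovery of a known simulated effect; for the real case studies, the analogous argument would have to rest on biological plausibility or held-out experimental evidence.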


cthoyt commented Oct 24, 2023

Another issue we discussed: y0 does not apply backdoor/frontdoor adjustment when generating an estimand, which can produce different results from ananke's estimation. This might be true, but we need to test it empirically, since it isn't always the case (and we lose nothing by marking fewer nodes as latent).
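
One way this could be tested empirically is on simulated data where two candidate estimands can be computed by hand and compared against a known truth (this sketch computes backdoor adjustment and IPW manually rather than calling y0 or ananke, whose exact calls for this check would still need to be worked out):

```python
import numpy as np
import pandas as pd

# Simulate Z -> T -> Y with Z -> Y; true average effect of T on Y is 1.0.
rng = np.random.default_rng(0)
n = 50_000
z = rng.binomial(1, 0.5, size=n)
t = rng.binomial(1, 0.2 + 0.5 * z)
y = 1.0 * t + 0.8 * z + rng.normal(size=n)
df = pd.DataFrame({"Z": z, "T": t, "Y": y})

# Estimand 1: backdoor adjustment, sum_z P(z) (E[Y|T=1,z] - E[Y|T=0,z]).
backdoor = sum(
    (df["Z"] == zv).mean()
    * (
        df.loc[(df["T"] == 1) & (df["Z"] == zv), "Y"].mean()
        - df.loc[(df["T"] == 0) & (df["Z"] == zv), "Y"].mean()
    )
    for zv in (0, 1)
)

# Estimand 2: inverse-probability weighting with the propensity P(T=1|Z).
p = df.groupby("Z")["T"].transform("mean")
ipw = (df["Y"] * df["T"] / p).mean() - (df["Y"] * (1 - df["T"]) / (1 - p)).mean()

# If both estimands are valid for this graph, both should land near 1.0;
# a material gap between the two tools' outputs would flag an estimand problem.
print(f"backdoor: {backdoor:.3f}  ipw: {ipw:.3f}")
```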
