forked from Nonprofit-Open-Data-Collective/propublica-api
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.rmd
276 lines (167 loc) · 7.88 KB
/
index.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
---
title: "ProPublic Nonprofit Explorer API"
output:
bookdown::html_document2:
df_print: paged
theme: readable
highlight: tango
self_contained: false
number_sections: false
toc: yes
toc_float: no
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message=F, warning=F, fig.width = 10)
```
## Overview
Demo functions for working with [ProPublica's Nonprofit Explorer](https://projects.propublica.org/nonprofits/).
*Use this database to view summaries of 3 million tax returns from tax-exempt organizations and see financial details such as their executive compensation and revenue and expenses. You can browse IRS data released since 2013 and access more than 14 million tax filing documents going back as far as 2001.*
**From their site:**
Nonprofit Explorer includes summary data for nonprofit tax returns and full Form 990 documents, in both PDF and digital formats.
The summary data contains information processed by the IRS during the 2012-2018 calendar years; this generally consists of filings for the 2011-2017 fiscal years, but may include older records. This data release includes only a subset of what can be found in the full Form 990s.
In addition to the raw summary data, we link to PDFs and digital copies of full Form 990 documents wherever possible. This consists of separate releases by the IRS of Form 990 documents processed by the agency, which we update regularly.
We also link to copies of audits nonprofit organizations that spent $750,000 or more in Federal grant money in a single fiscal year since 2016. These audits are copied from the Federal Audit Clearinghouse.
**Which Organizations Are Here?**
Every organization that has been recognized as tax exempt by the IRS has to file Form 990 every year, unless they make less than $200,000 in revenue and have less than $500,000 in assets, in which case they have to file form 990-EZ. Organizations making less than $50,000 don’t have to file either form but do have to let the IRS they’re still in business via a Form 990N "e-Postcard."
Nonprofit Explorer has organizations claiming tax exemption in each of the 27 subsections of the 501(c) section of the tax code, and which have filed a Form 990, Form 990EZ or Form 990PF. Taxable trusts and private foundations that are required to file a form 990PF are also included. Small organizations filing a Form 990N "e-Postcard" are not included in this data.
**Types of Nonprofits**
There are 27 nonprofit designations based on the numbered subsections of section 501(c) of the tax code.
**How to Research Tax-Exempt Organizations**
We've created a guide for investigating nonprofits for those just getting started as well as for seasoned pros.
**API**
The data powering this website is available programmatically, via an API. Read the API documentation »
**Get the Data**
For those interested in acquiring the original data from the source, here’s where our data comes from:
* Raw filing data. Includes EINs and summary financials as structured data.
* Exempt Organization profiles. Includes organization names, addresses, etc. You can merge this with the raw filing data using EIN numbers.
* Form 990 documents requested and processed by Public.Resource.Org and ProPublica. We post bulk downloads of these documents at the Internet Archive.
* Form 990 documents as XML files. Includes complete filing data (financial details, names of officers, tax schedules, etc.) in machine-readable format. Only available for electronically filed documents.
* Audits. PDFs of single or program-specific audits for nonprofit organizations that spent $750,000 or more in Federal grant money in a single fiscal year. Available for 2016 and later.
## API Capability
There is basic meta-data available for each organization (name, EIN, tax year, total revenue, etc.).
Information for each 990 depends on the type of filing - paper versions or e-files.
The API has some capability to access digitized information, as well as download PDFs of paper filers.
The generic API call will return a list with five tables:
* **organization**: nonprofit org attributes
* **filings_with_data**: basic data from tax years where 990 efile data available
* **filings_without_data**: tax years where PDFs are available, but no digitized data
* **data_source**: original sources of data available through the API
* **api_version**: version of NP Explorer API (currently v2)
## Demo
Grab a table of meta-data available on all returns for one org:
```{r}
library( RCurl )
library( jsonlite )
library( dplyr )
library( pander )
# API CALL
# https://projects.propublica.org/nonprofits/api/v2/organizations/010165097.json
ein <- "010165097"
URL <- paste0( "https://projects.propublica.org/nonprofits/api/v2/organizations/", ein, ".json" )
# THIS BROKE WITH CHANGES TO THE SITE
# results.json <- getURL( URL, ssl.verifypeer = FALSE )
#
# r2 <- gsub( "\n", "", results.json )
#
# api.tables <- fromJSON( r2 )
api.tables <- fromJSON( URL )
names( api.tables )
```
Here are examples of the types of data available in each of 5 tables returned by the API:
```{r}
head( api.tables$organization ) %>% pander()
head( api.tables$filings_with_data ) %>% pander()
head( api.tables$filings_without_data ) %>% pander()
api.tables$data_source %>% pander()
api.tables$api_version
```
## Batch PDF Download
Available PDFs will be listed in the **filings_without_data** table.
Preview PDFs: [ [ON GITHUB](https://github.com/Nonprofit-Open-Data-Collective/propublica-api/tree/main/IRS990_PDFs) ] [ [Download](https://github.com/Nonprofit-Open-Data-Collective/propublica-api/raw/main/IRS990_PDFs.zip) ]
For the organizations with paper filings available you can download PDFs using the following loop:
```{r}
# BATCH DOWNLOAD PDFs
dir.create( "IRS990_PDFs" )
setwd( "./IRS990_PDFs" )
pdfs <- fromJSON( r2 )$filings_without_data
if( nrow(pdfs) > 0 )
{
for( i in 1:nrow(pdfs) )
{
file.name <- paste0( "EIN_", ein , "_YEAR_", pdfs$tax_prd_yr[i], ".pdf" )
download.file( url=pdfs$pdf_url[i],
destfile=file.name,
mode="wb" )
}
}
dir()
```
## Draft Function
```{r}
download_pdfs <- function( ein )
{
URL <- paste0( "https://projects.propublica.org/nonprofits/api/v2/organizations/", ein, ".json" )
results.json <- getURL( URL, ssl.verifypeer = FALSE )
r2 <- gsub( "\n", "", results.json )
d <- fromJSON( r2 )
pdfs <- d$filings_without_data
if( nrow(pdfs) > 0 )
{
for( i in 1:nrow(pdfs) )
{
file.name <- paste0( "EIN_", ein , "_YEAR_", pdfs$tax_prd_yr[i], ".pdf" )
download.file( url=pdfs$pdf_url[i],
destfile=file.name,
mode="wb" )
}
}
return( dir() )
}
```
## Full API Documentation
The full documentation of all API functionality can be found at [Nonprofit Explorer API](https://projects.propublica.org/nonprofits/api).
<br>
<br>
<br>
<hr>
<br>
<center>
![](images/nodc-icon.png) <i><h3>Nonprofit Open Data Collection</h3></i>
*Code available on the [NODC GitHub Project](https://github.com/Nonprofit-Open-Data-Collective/propublica-api)*
</center>
<br>
<hr>
<br>
<br>
<br>
<br>
```{css, echo=F}
body{
font-family:system-ui,-apple-system,"Segoe UI",Roboto,Helvetica,Arial,sans-serif;
font-size:calc(1.5em + 0.25vw);
font-weight:300;line-height:1.65;
-webkit-font-smoothing:antialiased;
-moz-osx-font-smoothing:grayscale;
margin-left:20%;
margin-right:20%}
h1, h3, h4 { color: #995c00; }
h2 { margin-top:120px }
.footer {
background-color:#726e6e;
height:340px;
color:white;
padding: 20px 3px 20px 3px;
margin:0px;
line-height: normal;
}
.footer a{ color:orange; text-decoration:bold !important; }
table{
border-spacing:1px;
margin-top:80px;
margin-bottom:100px !important;
margin-left: auto;
margin-right: auto;
align:center}
td{ padding: 6px 10px 6px 10px }
th{ text-align: left; }
```