To enable a more automated approach to gathering information about companies, `company_dns` was created. This release enables the synthesis of data from the SEC EDGAR repository and Wikipedia. A Medium article entitled "A case for API based open company firmographics" is available that discusses the process and motivation behind the creation of this service.
The V3.0.0 release of `company_dns` is a significant update to the service. The primary changes are:
- Shift from Flask to Starlette with Uvicorn.
- Automated monthly container builds, from the main branch of the repository, using GitHub Actions.
- Simplification of all aspects of the service, including the code structure, a simpler Docker setup, and a more streamlined service control script.
- Vastly improved embedded help, with a query console to test queries.

We were motivated to make these changes to make the service easier to improve, maintain, and use.
The install and setup process differs for users and developers; instructions for both are provided below.
New in V3.0.0 are automated Docker builds that provide a fresh image on a monthly basis. There are three reasons for this:
- Gets the latest information from EDGAR, so that when the service is queried the user can access the latest quarterly and yearly filings.
- As code progresses and is checked into `main`, users automatically get the latest improvements and fixes.
- Creates images for both x86 and ARM architectures.
The image can be pulled using `docker pull ghcr.io/miha42-github/company_dns/company_dns:latest`. With the image pulled, you can run it in the foreground with `docker run -m 1G -p 8000:8000 ghcr.io/miha42-github/company_dns/company_dns:latest`, or in the background with `docker run -d -m 1G -p 8000:8000 ghcr.io/miha42-github/company_dns/company_dns:latest`. GitHub's container registry is used to store the images, and more information on this package can be found at company_dns/company_dns.
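Once a container is started, a quick way to confirm it is answering requests is to list running containers and request the embedded help page (described in more detail below). The inspect line simply reports which architecture of the multi-architecture image Docker selected for your host:

```shell
# Optionally confirm which architecture Docker selected for your host
docker image inspect --format '{{.Os}}/{{.Architecture}}' ghcr.io/miha42-github/company_dns/company_dns:latest

# Confirm the container is running and the service answers on port 8000
docker ps
curl -i http://localhost:8000/help
```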
Assuming you have set up access to GitHub and have a Linux or macOS system of some kind, you'll need to get the repository.
- Create a directory that will contain the code: `mkdir ~/dev`
- Enter the directory: `cd ~/dev/`
- Clone the repository: `git clone git@github.com:miha42-github/company_dns.git`
Since the Docker build process takes care of data cache creation, Python requirements installation, and other items, getting `company_dns` running is relatively straightforward. To simplify the process further, the `svc_ctl.sh` script is provided.
`svc_ctl.sh` automates building, running, and observing logs for `company_dns`, removing many manual steps. To start the system, follow these steps:
- Assuming you've cloned the code into `~/dev/company_dns`, run `cd ~/dev/company_dns`.
- Run `./svc_ctl.sh build` to build the image.
- Run `./svc_ctl.sh foreground` to run the image in the foreground, or `./svc_ctl.sh start` to run the image in the background.
If the service is running in the background, you can run `./svc_ctl.sh tail` to watch the container logs. Finally, stopping the service when it is running in the background can be achieved with `./svc_ctl.sh stop`.
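Putting these steps together, a typical session (assuming the repository was cloned to `~/dev/company_dns`) looks like the following:

```shell
cd ~/dev/company_dns

# Build the image and start the service in the background
./svc_ctl.sh build
./svc_ctl.sh start

# Watch the logs, then stop the service when finished
./svc_ctl.sh tail
./svc_ctl.sh stop
```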
Usage for the service control script follows:

```
NAME:
    ./svc_ctl.sh <sub-command>

DESCRIPTION:
    Control functions to run the company_dns

COMMANDS:
    help start stop build foreground tail

    help - call up this help text
    start - start the service using docker-compose
    stop - stop the docker service
    build - build the docker images for the server
    foreground - run the server in the foreground to watch for output
    tail - tail the logs for a server running in the background
```
Depending upon your intention for getting the code, it could be run in a Python virtual environment or in a vanilla file system. Regardless, the steps below can be followed to get the service up and running.
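If you choose the virtual environment route, a standard setup using Python's built-in `venv` module might look like the following (the paths assume the `~/dev` layout used above):

```shell
# Create and activate a virtual environment alongside the cloned repository
cd ~/dev/company_dns
python3 -m venv .venv
source .venv/bin/activate
```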
Before you get started, it is important to install all prerequisites and then create the cache database.
- Enter the directory with the service bits (assuming you're using `~/dev`): `cd ~/dev/company_dns/company_dns`
- Install all prerequisites: `pip3 install -r ./requirements.txt`
- Create the database cache: `python3 ./makedb.py`
If everything above completed successfully, then `company_dns` can be run via `python3 ./company_dns.py`, which will run the service in the foreground.
Regardless of the approach taken to run `company_dns`, checking to see that it is operating is important. A quick way to check service availability when running on localhost is to follow this link: http://localhost:8000/help. If this is successful, the embedded help will display (see the screenshot below), describing available endpoints, examples with `curl`, and some helpful links to the company_dns GitHub repository. Additionally, new in V3.0.0 is the query console, which can be used to test key functions of the system.
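The endpoints can also be exercised from the command line; the examples below assume a locally running service on port 8000 and use the same paths as the hosted instance listed just below:

```shell
# Merged firmographics for a company, searched by name
curl http://localhost:8000/V3.0/global/company/merged/firmographics/IBM

# Standard industry code search by description
curl http://localhost:8000/V3.0/na/sic/description/oil
```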
A live system is available for Mediumroast efforts and for anyone to try out; relevant links are below.
- Embedded help and query console - https://company-dns.mediumroast.io/help
- Company search for IBM - https://company-dns.mediumroast.io/V3.0/global/company/merged/firmographics/IBM
- Standard industry code search for Oil - https://www.mediumroast.io/company_dns/V3.0/na/sic/description/oil
If you encounter a problem with `company_dns`, please first review the existing open issues, and if you find a match, add a comment with any detail you deem relevant. If you're unable to find an issue that matches the behavior you're seeing, please open a new issue.
We try to keep high-level todos and improvements in the list below; as we begin to work on an item, we will create a corresponding issue, link to it, progress it, and close it. However, if there is a change in design, a major improvement, and so on, something may fall off the list. If something isn't on the list, please create a new issue and we will evaluate it. We'll let you know if we pick up your request and begin working on it.
Here are the things that are likely to be worked on, but without any strict deadline:
- Determine if it is feasible to talk to the Companies House API for gathering data from the UK
- Initial feasibility has been checked, but the value of the data is still being evaluated
- Research other pools of public data which can serve to enrich the existing data
- There are additional data pools including NAICS and UK SIC codes which could be added. Additional Industry Code data sources by country are likely a first target to add. The deeper question is how to merge these data sources for a kind of universal classification.
- Evaluate if financial data can be added from EDGAR, Wikipedia and Companies House
- Provide instructions/details for running on a Pi or Arm based system
- Since one of the target docker images is for ARM, the next logical step is to provide instructions for running on a Pi.
Run on a Raspberry Pi: To be reauthored
Since this code falls under a liberal Apache-V2 license, it is provided as is, without warranty or guarantee of support. Feel free to fork the code, but please provide attribution to the authors.
- PyEdgar - used to interface with the SEC's EDGAR repository
- SQLite - enables the utilities and the RESTful service to quickly and expressively query for the appropriate company data
- Starlette - used to create the RESTful service
- Uvicorn - used to run the RESTful service
- GeoPy with ArcGIS - enables proper address formatting and reporting of lat-long pairs for companies
- wptools - provides access to MediaWiki data for company search