Locate, summarize, and visualize pockets of social activity in meatspace.
This app detects geotagged Twitter activity that stands out above typical background levels and tweets about it.
Example account: https://twitter.com/happening_sf
- Install Poetry as described in the official Poetry documentation
- Install requirements:
poetry install
If running the app on Heroku (see below), .env is not needed, but it may still be convenient to fill in the environment variables.
- Copy the (hidden) .env_template.ini file to .env
- Edit .env to include your credentials (don't commit this file)
- If running the app on Heroku, you can easily provision a database for your app by installing the Postgres add-on (see below).
- Your database credentials will automatically be added to your app's Config Vars.
- If not running the app on Heroku, you'll need to set up your own database.
- Add your database credentials to .env
- Run the application:
poetry run python thisishappening/app.py
- Alternatively, activate the virtual environment that poetry created by running
poetry shell
and then run the script:
python thisishappening/app.py
- Deactivate the virtual environment by running
deactivate
These instructions use the Heroku CLI
- Fork this repo on GitHub and ensure you have a branch called main
- Create a new app on Heroku:
heroku create my-app-name
- Install add-ons for:
- Papertrail
heroku addons:create papertrail -a my-app-name
- Postgres
heroku addons:create heroku-postgresql -a my-app-name
- Create a new token:
heroku authorizations:create -d "my cool token description"
- Add the token to your GitHub repo's Secrets under the name HEROKU_API_KEY
- Add your Heroku app's name to the GitHub repo's Secrets under the name HEROKU_APP_NAME (or however it is configured in .github/workflows/deploy.yaml)
- Configure the application by adding environment variables as Config Vars
- Commit and push to your GitHub repo's main branch
- This can be done by committing a change, merging a PR, or just running
git commit -m "empty commit" --allow-empty
- This will use GitHub Actions to build the app using Docker and deploy to Heroku
- View the logs via the Heroku CLI or on Papertrail
- Tweets
- Use a density-based metric for event detection, rather than aggregating within pre-defined boundaries (map tiles) and tracking statistics for each tile
- Define activity thresholds using intuitive, human-readable values
- Prevent a single prolific user from easily triggering an event by decreasing the weight of their tweets
- Deduplicate tokens within each tweet
- Reduce weight for tweets with specific longitude and latitude (e.g., "canonical" city locations that get assigned to Instagram photo posts)
- Detect and ignore spam tweets, e.g., job postings, apartment listings
- Queries
- Provide access to the tweets associated with each event
- Write query to keep most recent N days of data and run in main loop
- Maintain maximum recent_tweets table row count and run in main loop
- Clustering
- When an event is found, run a clustering algorithm (DBSCAN) on all recent tweets to determine the full set of event tweets
- Define cluster neighborhood limits using intuitive, human-readable values (e.g., kilometers)
- When an event is found, tweet an image of a map with the location/heat map
- Set my tweet's location to the event latitude and longitude
- Exclude my own tweets from the search (put myself in the list of users to ignore)
- Plot the pulse of a neighborhood over time: count of tweets by hour
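The clustering step listed above (DBSCAN over recent tweets, with the neighborhood limit given in kilometers) can be illustrated with a small pure-Python sketch. This is not the app's actual code: the function names `haversine_km` and `dbscan`, the parameter names, and the sample coordinates are all assumptions made for illustration, and a real deployment would more likely rely on a library implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise.

    Hypothetical helper: eps_km is the neighborhood radius in kilometers and
    min_samples counts the point itself.
    """
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query; fine for a small window of recent tweets.
        return [j for j, p in enumerate(points)
                if haversine_km(points[i], p) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels


# Three tweets close together in San Francisco and one across the bay.
tweets = [
    (37.7749, -122.4194),
    (37.7751, -122.4190),
    (37.7747, -122.4199),
    (37.8044, -122.2712),
]
labels = dbscan(tweets, eps_km=0.5, min_samples=3)
```

Expressing `eps_km` directly in kilometers keeps the threshold readable; the conversion to angular distance happens inside the haversine function instead of leaking into the configuration.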
- Many tweets show up at the canonical city location, especially due to Swarm's "posted a photo" feature
- Using tiles / artificial region boundaries required more workarounds and convoluted solutions than originally expected. Some downsides:
- Potentially splits events across regions
- Keeping the running statistics requires storing many rows in a database table; this wouldn't be an issue if I weren't operating on a shoestring budget, since I could then run my own database
- It's not uncommon to get a false alarm due to one user posting many tweets in a short time period
- To add a location to the bot's tweets, enable it under: Settings and privacy -> Privacy and safety -> Location information -> Add location information to your Tweets
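The false alarms from a single prolific user noted above, and the related ideas of decreasing the weight of that user's tweets and deduplicating tokens within each tweet, can be sketched as follows. The 1/k weighting scheme and the names `tweet_weights` and `dedupe_tokens` are assumptions chosen for illustration; the README does not specify which formula the app uses.

```python
from collections import defaultdict


def dedupe_tokens(text):
    """Lowercase, split on whitespace, and keep the first occurrence of each token."""
    return list(dict.fromkeys(text.lower().split()))


def tweet_weights(user_ids):
    """Weight the k-th tweet from the same user as 1/k (hypothetical scheme).

    user_ids is the sequence of authors of the recent tweets, in arrival order.
    """
    seen = defaultdict(int)
    weights = []
    for user_id in user_ids:
        seen[user_id] += 1
        weights.append(1.0 / seen[user_id])
    return weights


# One user posts three times, another posts once.
weights = tweet_weights(["alice", "alice", "bob", "alice"])
activity = sum(weights)  # weighted activity instead of a raw count of 4
tokens = dedupe_tokens("Go Giants go giants GO")
```

With this kind of weighting, a burst from one account adds progressively less to the activity level, so an event threshold is harder to cross without tweets from several distinct users.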
Copyright (c) 2020 Matt Mollison. Licensed under the MIT license.