by weLaika
Sputnik is a website crawler written in Elixir.
It crawls a website by following all internal links and reports the HTTP status code of every page it finds.
With the --query flag you can pass one or more CSS selectors and get a report of how many elements match each of them across the crawled pages.
Sputnik can be built with:
mix deps.get
mix escript.build
Sputnik takes the URL to crawl and the following optional flags:
- query: a valid CSS selector, or several separated by commas, to look for on every crawled page (the flag can be repeated)
- connections: max number of concurrent HTTP connections (default is 10)
sputnik [--query <Q> --query <Q1> ...] [--connections <N>] <url>
Running
./sputnik "http://spawnfest.github.io" --query "div" --query "a" --query "h1,h2,h3,h4,h5,h6" --connections 10
produces the following output:
#################### Pages ####################
Pages found: 19
status_code 200: 12
status_code 301: 7
#################### Queries ####################
## query `a` ##
327 result(s)
Min 18 result(s) per page
Max 57 result(s) per page
## query `div` ##
347 result(s)
Min 13 result(s) per page
Max 53 result(s) per page
## query `h1,h2,h3,h4,h5,h6` ##
95 result(s)
Min 0 result(s) per page
Max 31 result(s) per page
It also opens the browser with a page showing the results.
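The Pages section of the text output above prints one "status_code <code>: <count>" line per status, so it can be post-processed with standard tools. The following is a minimal sketch, assuming the report keeps exactly the layout of the sample and has been saved to a file. The count_status helper and the report.txt file name are hypothetical and not part of Sputnik.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sputnik): print how many pages returned
# the given status code, assuming lines such as "status_code 200: 12".
# $1 = status code, $2 = file containing the saved report
count_status() {
  awk -v code="$1" '
    $1 == "status_code" && $2 == (code ":") { n = $3 }
    END { print n + 0 }   # prints 0 when the code never appears
  ' "$2"
}

# Pages section copied from the sample output above.
cat > report.txt <<'EOF'
#################### Pages ####################
Pages found: 19
status_code 200: 12
status_code 301: 7
EOF

count_status 200 report.txt   # prints 12
count_status 404 report.txt   # prints 0
```

A check like this could, for example, make a script fail when count_status 404 is greater than zero.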
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/sputnik.
To run tests:
$ mix test --cover
To run credo:
$ mix credo
To generate the documentation:
$ mix docs && open doc/index.html
Bump the version in mix.exs, commit and push, then run mix hex.publish.
Please read https://hex.pm/docs/publish for help.