Nosey Parker is a command-line tool that finds secrets and sensitive information in textual data. It is useful both for offensive and defensive security testing.
Key features:
- It supports scanning files, directories, and the entire history of Git repositories
- It uses regular expression matching with a set of 88 patterns chosen for high signal-to-noise based on experience and feedback from offensive security engagements
- It groups matches together that share the same secret, further emphasizing signal over noise
- It is fast: it can scan at hundreds of megabytes per second on a single core, and is able to scan 100GB of Linux kernel source history in less than 2 minutes on an older MacBook Pro
This open-source version of Nosey Parker is a reimplementation of the internal version in use at Praetorian. The internal version has additional capabilities for false positive suppression and an alternative machine learning-based detection engine. Read more in blog posts here and here.
1. (On x86_64) Install the Hyperscan library and headers for your system
On macOS using Homebrew:
brew install hyperscan pkg-config
On Ubuntu 22.04:
apt install libhyperscan-dev pkg-config
1. (On non-x86_64) Build Vectorscan from source
You will need several dependencies, including cmake
, boost
, ragel
, and pkg-config
.
Download and extract the source for the 5.4.8 release of Vectorscan:
wget https://github.com/VectorCamp/vectorscan/archive/refs/tags/vectorscan/5.4.8.tar.gz && tar xfz 5.4.8.tar.gz
Build with cmake:
cd vectorscan-vectorscan-5.4.8 && cmake -B build -DFAT_RUNTIME=OFF -DCMAKE_BUILD_TYPE=Release . && cmake --build build
Set the HYPERSCAN_ROOT
environment variable so that Nosey Parker builds against your from-source build of Vectorscan:
export HYPERSCAN_ROOT="$PWD/build"
Note: The Nosey Parker Dockerfile
builds Vectorscan from source and links against that.
2. Install the Rust toolchain
Recommended approach: install from https://rustup.rs
3. Build using Cargo
cargo build --release
This will produce a binary at target/release/noseyparker
.
A prebuilt Docker image is available for the latest release for x86_64:
docker pull ghcr.io/praetorian-inc/noseyparker:latest
A prebuilt Docker image is available for the most recent commit for x86_64:
docker pull ghcr.io/praetorian-inc/noseyparker:edge
For other architectures (e.g., ARM) you will need to build the Docker image yourself:
docker build -t noseyparker .
Run the Docker image with a mounted volume:
docker run -v "$PWD":/opt/ noseyparker
Note: The Docker image runs noticeably slower than a native binary, particularly on macOS.
Most Nosey Parker commands use a datastore.
This is a special directory that Nosey Parker uses to record its findings and maintain its internal state.
A datastore will be implicitly created by the scan
command if needed.
You can also create a datastore explicitly using the datastore init -d PATH
command.
Nosey Parker has built-in support for scanning files, recursively scanning directories, and scanning the entire history of Git repositories.
For example, if you have a Git clone of CPython locally at cpython.git
, you can scan its entire history with the scan
command.
Nosey Parker will create a new datastore at np.cpython
and saves its findings there.
$ noseyparker scan --datastore np.cpython cpython.git
Found 28.30 GiB from 18 plain files and 427,712 blobs from 1 Git repos [00:00:04]
Scanning content ████████████████████ 100% 28.30 GiB/28.30 GiB [00:00:53]
Scanned 28.30 GiB from 427,730 blobs in 54 seconds (538.46 MiB/s); 4,904/4,904 new matches
Rule Distinct Groups Total Matches
───────────────────────────────────────────────────────────
PEM-Encoded Private Key 1,076 1,192
Generic Secret 331 478
netrc Credentials 42 3,201
Generic API Key 2 31
md5crypt Hash 1 2
Run the `report` command next to show finding details.
You can specify multiple inputs to scan at once in any combination of the supported input types (files, directories, and Git repos).
Nosey Parker prints out a summary of its findings when it finishes scanning. You can also run this step separately:
$ noseyparker summarize --datastore np.cpython
Rule Distinct Groups Total Matches
───────────────────────────────────────────────────────────
PEM-Encoded Private Key 1,076 1,192
Generic Secret 331 478
netrc Credentials 42 3,201
Generic API Key 2 31
md5crypt Hash 1 2
Additional output formats are supported, including JSON and JSON lines, via the --format=FORMAT
option.
To see details of Nosey Parker's findings, use the report
command.
This prints out a text-based report designed for human consumption:
$ noseyparker report --datastore np.cpython
Finding 1/1452: Generic API Key
Match: QTP4LAknlFml0NuPAbCdtvH4KQaokiQE
Showing 3/29 occurrences:
Occurrence 1:
Git repo: clones/cpython.git
Blob: 04144ceb957f550327637878dd99bb4734282d07
Lines: 70:61-70:100
e buildbottest
notifications:
email: false
webhooks:
urls:
- https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
on_success: change
on_failure: always
irc:
channels:
# This is set to a secure vari
Occurrence 2:
Git repo: clones/cpython.git
Blob: 0e24bae141ae2b48b23ef479a5398089847200b3
Lines: 174:61-174:100
j4 -uall,-cpu"
notifications:
email: false
webhooks:
urls:
- https://python.zulipchat.com/api/v1/external/travis?api_key=QTP4LAknlFml0NuPAbCdtvH4KQaokiQE&stream=core%2Ftest+runs
on_success: change
on_failure: always
irc:
channels:
# This is set to a secure vari
...
(Note: the findings above are synthetic, invalid secrets.)
Additional output formats are supported, including JSON and JSON lines, via the --format=FORMAT
option.
To list URLs for repositories belonging to GitHub users or organizations, use the github repos list
command.
This command uses the GitHub REST API to enumerate repositories belonging to one or more users or organizations.
For example:
$ noseyparker github repos list --user octocat
https://github.com/octocat/Hello-World.git
https://github.com/octocat/Spoon-Knife.git
https://github.com/octocat/boysenberry-repo-1.git
https://github.com/octocat/git-consortium.git
https://github.com/octocat/hello-worId.git
https://github.com/octocat/linguist.git
https://github.com/octocat/octocat.github.io.git
https://github.com/octocat/test-repo1.git
An optional GitHub Personal Access Token can be provided via the GITHUB_TOKEN
environment variable.
Providing an access token gives a higher API rate limit and may make additional repositories accessible to you.
Additional output formats are supported, including JSON and JSON lines, via the --format=FORMAT
option.
Running the noseyparker
binary without arguments prints top-level help and exits.
You can get abbreviated help for a particular command by running noseyparker COMMAND -h
.
More detailed help is available with the help
command.
For example:
$ noseyparker scan -h
Scan content for secrets
Usage: noseyparker scan [OPTIONS] --datastore <PATH> <INPUT>...
Arguments:
<INPUT>... Paths of inputs to scan
Options:
-d, --datastore <PATH> Use the specified datastore path [env: NP_DATASTORE=]
-j, --jobs <N> The number of parallel scanning jobs [default: 12]
-r, --rules <PATH> Path of custom rules to use
-h, --help Print help information (use `--help` for more detail)
Content Discovery Options:
--max-file-size <MEGABYTES> Do not scan files larger than the specified size [default: 100]
-i, --ignore <FILE> Path of a custom ignore rules file to use
Global Options:
-v, --verbose... Enable verbose output
--color <MODE> Enable or disable colored output [default: auto] [possible values: auto, never, always]
--progress <MODE> Enable or disable progress bars [default: auto] [possible values: auto, never, always]
Contributions are welcome, particularly new regex rules. Developing new regex rules is detailed in a separate document.
If you are considering making significant code changes, please open an issue first to start discussion.
Nosey Parker is licensed under the Apache License, Version 2.0.
Any contribution intentionally submitted for inclusion in Nosey Parker by you, as defined in the Apache 2.0 license, shall be licensed as above, without any additional terms or conditions.