forked from bitfield/script

Making it easy to write shell-like scripts in Go

14bits/go-script

 
 

What is script?

script is a Go library for doing the kind of tasks that shell scripts are good at: reading files, executing subprocesses, counting lines, matching strings, and so on.

Why shouldn't it be as easy to write system administration programs in Go as it is in a typical shell? script aims to make it just that easy.

Shell scripts often compose a sequence of operations on a stream of data (a pipeline). This is how script works, too.

How do I import it?

import "github.com/bitfield/script"

What can I do with it?

Let's see a simple example. Suppose you want to read the contents of a file as a string:

contents, err := script.File("test.txt").String()

That looks straightforward enough, but suppose you now want to count the lines in that file.

numLines, err := script.File("test.txt").CountLines()

For something a bit more challenging, let's try counting the number of lines in the file which match the string "Error":

numErrors, err := script.File("test.txt").Match("Error").CountLines()

But what if, instead of reading a specific file, we want to simply pipe input into this program, and have it output only matching lines (like grep)?

script.Stdin().Match("Error").Stdout()

That was almost too easy! So let's pass in a list of files on the command line, and have our program read them all in sequence and output the matching lines:

script.Args().Concat().Match("Error").Stdout()

Maybe we're only interested in the first 10 matches. No problem:

script.Args().Concat().Match("Error").First(10).Stdout()

What's that? You want to append that output to a file instead of printing it to the terminal? You've got some attitude, mister.

script.Args().Concat().Match("Error").First(10).AppendFile("/var/log/errors.txt")

How does it work?

Those chained function calls look a bit weird. What's going on there?

One of the neat things about the Unix shell, and its many imitators, is the way you can compose operations into a pipeline:

cat test.txt | grep Error | wc -l

The output from each stage of the pipeline feeds into the next, and you can think of each stage as a filter which passes on only certain parts of its input to its output.

By comparison, writing shell-like scripts in raw Go is much less convenient, because everything you do returns a different data type, and you must (or at least should) check errors following every operation.

In scripts for system administration we often want to compose different operations like this in a quick and convenient way. If an error occurs somewhere along the pipeline, we would like to check this just once at the end, rather than after every operation.

Everything is a pipe

The script library allows us to do this because everything is a pipe (specifically, a script.Pipe). To create a pipe, start with a source like File():

var p *script.Pipe
p = script.File("test.txt")

You might expect File() to return an error if there is a problem opening the file, but it doesn't. We will want to call a chain of methods on the result of File(), and it's inconvenient to do that if it also returns an error.

Instead, you can check the error status of the pipe at any time by calling its Error() method:

p = script.File("test.txt")
if p.Error() != nil {
    log.Fatalf("oh no: %v", p.Error())
}

What use is a pipe?

Now, what can you do with this pipe? You can call a method on it:

var q *script.Pipe
q = p.Match("Error")

Note that the result of calling a method on a pipe is another pipe. You can do this in one step, for convenience:

var q *script.Pipe
q = script.File("test.txt").Match("Error")

Handling errors

Woah, woah! Just a minute! What if there was an error opening the file in the first place? Won't Match blow up if it tries to read from a non-existent file?

No, it won't. As soon as an error status is set on a pipe, all operations on the pipe become no-ops. Any operation which would normally return a new pipe just returns the old pipe unchanged. So you can run as long a pipeline as you want to, and if an error occurs at any stage, nothing will crash, and you can check the error status of the pipe at the end.

(Seasoned Gophers will recognise this as the errWriter pattern described by Rob Pike in the blog post Errors are values.)

Getting output

A pipe is useless if we can't get some output from it. To do this, you can use a sink, such as String():

result, err := q.String()
if err != nil {
    log.Fatalf("oh no: %v", err)
}
fmt.Println(result)

Errors

Note that sinks return an error value in addition to the data. This is the same value you would get by calling p.Error(). If the pipe had an error in any operation along the pipeline, the pipe's error status will be set, and a sink operation which gets output will return the zero value, plus the error.

numLines, err := script.File("doesnt_exist.txt").CountLines()
fmt.Println(numLines)
// Output: 0
if err != nil {
    log.Fatal(err)
}
// Output: open doesnt_exist.txt: no such file or directory

CountLines() is another useful sink, which simply returns the number of lines read from the pipe.

Closing pipes

If you've dealt with files in Go before, you'll know that you need to close the file once you've finished with it. Otherwise, the program will retain what's called a file handle (the kernel data structure which represents an open file). There is a limit to the total number of open file handles for a given program, and for the system as a whole, so a program which leaks file handles will eventually crash, and will waste resources in the meantime.

Files aren't the only things which need to be closed after reading: so do network connections, HTTP response bodies, and so on.

How does script handle this? Simple. The data source associated with a pipe will be automatically closed once it is read completely. Therefore, calling any sink method which reads the pipe to completion (such as String()) will close its data source. The only case in which you need to call Close() on a pipe is when you don't read from it, or you don't read it to completion.

If the pipe was created from something that doesn't need to be closed, such as a string, then calling Close() simply does nothing.

This is implemented using a type called ReadAutoCloser, which takes an io.Reader and wraps it so that:

  1. it is always safe to close (if it's not a closable resource, it will be wrapped in an ioutil.NopCloser to make it one), and
  2. it is closed automatically once read to completion (specifically, once the Read() call on it returns io.EOF).

It is your responsibility to close a pipe if you do not read it to completion.

Why not just use shell?

It's a fair question. Shell scripts and one-liners are perfectly adequate for building one-off tasks, initialization scripts, and the kind of 'glue code' that holds the internet together. I speak as someone who's spent at least thirty years doing this for a living. But in many ways they're not ideal for important, non-trivial programs:

  • Trying to build portable shell scripts is a nightmare. The exact syntax and options of Unix commands vary from one distribution to another. Although in theory POSIX is a workable common subset of functionality, in practice it's usually precisely the non-POSIX behaviour that you need.

  • Shell scripts are hard to test (though test frameworks have been written, and if you're seriously putting mission-critical shell scripts into production, you should be using them, or reconsidering your technology choices).

  • Shell scripts don't scale. Because there are very limited facilities for logic and abstraction, and because any successful program tends to grow remorselessly over time, shell scripts can become an unreadable mess of special cases and spaghetti code. We've all seen it, if not, indeed, done it.

  • Shell syntax is awkward: quoting, whitespace, and brackets can require a lot of fiddling to get right, and so many characters are magic to the shell (*, ?, > and so on) that this can lead to subtle bugs. Scripts can work fine for years until you suddenly encounter a file whose name contains whitespace, and then everything breaks horribly.

  • Deploying shell scripts obviously requires at least a (sizable) shell binary in addition to the source code, but it usually also requires an unknown and variable number of extra userland programs (cut, grep, head, and friends). If you're building container images, for example, you effectively need to include a whole Unix distribution with your program, which runs to hundreds of megabytes, and is not at all in the spirit of containers.

To be fair to the shell, this kind of thing is not what it was ever intended for. Shell is an interactive job control tool for launching programs, connecting programs together, and to a limited extent, manipulating text. It's not for building portable, scalable, reliable, and elegant programs. That's what Go is for.

Go has a superb testing framework built right into the standard library. It has a superb standard library, and thousands of high-quality third-party packages for just about any functionality you can imagine. It is compiled, so it's fast, and statically typed, so it's reliable. It's efficient and memory-safe. Go programs can be distributed as a single binary. Go scales to enormous projects (Kubernetes, for example).

The script library is implemented entirely in Go, and does not require any userland programs (or any other dependencies) to be present. Thus you can build your script program as a container image containing a single (very small) binary, which is quick to build, quick to upload, quick to deploy, quick to run, and economical with resources.

If you've ever struggled to get a shell script containing a simple if statement to work (and who hasn't?), then the script library is dedicated to you.

A real-world example

Let's use script to write a program which system administrators might actually need. One thing I often find myself doing is counting the most frequent visitors to a website over a given period of time. Given an Apache log in the Combined Log Format like this:

212.205.21.11 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/" "Mozilla/5.0 (Linux; Android 8.0.0; FIG-LX1 Build/HUAWEIFIG-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.156 Mobile Safari/537.36"

we would like to extract the visitor's IP address (the first column in the logfile), and count the number of times this IP address occurs in the file. Finally, we might like to list the top 10 visitors by frequency. In a shell script we might do something like:

cut -d' ' -f 1 access.log |sort |uniq -c |sort -rn |head

There's a lot going on there, and it's pleasing to find that the equivalent script program is quite brief:

package main

import (
	"github.com/bitfield/script"
)

func main() {
	script.Stdin().Column(1).Freq().First(10).Stdout()
}

(Thanks to Lucas Bremgartner for suggesting this example. You can find the complete program, along with a sample logfile, in the examples/visitors/ directory.)

Quick start: Unix equivalents

If you're already familiar with shell scripting and the Unix toolset, here is a rough guide to the equivalent script operation for each listed Unix command.

Unix / shell          script equivalent
(any program name)    Exec()
>                     WriteFile()
>>                    AppendFile()
$*                    Args()
cat                   File() / Concat()
cut                   Column()
echo                  Echo()
grep                  Match() / MatchRegexp()
grep -v               Reject() / RejectRegexp()
head                  First()
sed                   Replace() / ReplaceRegexp()
uniq -c               Freq()
wc -l                 CountLines()

Sources, filters, and sinks

script provides three types of pipe operations: sources, filters, and sinks.

  1. Sources create pipes from input in some way (for example, File() opens a file).
  2. Filters read from a pipe and filter the data in some way (for example Match() passes on only lines which contain a given string).
  3. Sinks get the output from a pipeline in some useful form (for example String() returns the contents of the pipe as a string), along with any error status.
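The three roles above can be sketched with plain standard-library functions over io.Reader. The names echoSource, matchFilter, and stringSink are hypothetical, chosen only to mirror Echo(), Match(), and String(); unlike script, this sketch buffers everything in memory and drops scanner errors for brevity:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// echoSource plays the role of a source: it creates a stream from a string.
func echoSource(s string) io.Reader {
	return strings.NewReader(s)
}

// matchFilter plays the role of a filter: it reads lines from r and
// passes on only those containing substr. Scanner errors are ignored
// here to keep the sketch short.
func matchFilter(r io.Reader, substr string) io.Reader {
	var out strings.Builder
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), substr) {
			out.WriteString(scanner.Text() + "\n")
		}
	}
	return strings.NewReader(out.String())
}

// stringSink plays the role of a sink: it drains the stream and
// returns its contents as a string, plus any read error.
func stringSink(r io.Reader) (string, error) {
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	src := echoSource("Error: one\nfine\nError: two\n")
	out, err := stringSink(matchFilter(src, "Error"))
	fmt.Printf("%q %v\n", out, err) // "Error: one\nError: two\n" <nil>
}
```

Written this way the calls nest inside out, which is exactly the awkwardness that method chaining on a pipe avoids.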

Let's look at the source, filter, and sink options that script provides.

Sources

These are operations which create a pipe.

Args

Args() creates a pipe containing the program's command-line arguments, one per line.

p := script.Args()
output, err := p.String()
fmt.Println(output)
// Output: command-line arguments

Echo

Echo() creates a pipe containing a given string:

p := script.Echo("Hello, world!")
output, err := p.String()
fmt.Println(output)
// Output: Hello, world!

Exec

Exec() runs a given command and creates a pipe containing its combined output (stdout and stderr). If there was an error running the command, the pipe's error status will be set.

p := script.Exec("echo hello")
output, err := p.String()
fmt.Println(output)
// Output: hello

Note that Exec() can also be used as a filter, in which case the given command will read from the pipe as its standard input.

Exit status

If the command returns a non-zero exit status, the pipe's error status will be set to the string "exit status X", where X is the integer exit status.

p := script.Exec("ls doesntexist")
output, err := p.String()
fmt.Println(err)
// Output: exit status 1

For convenience, you can get this value directly as an integer by calling ExitStatus() on the pipe:

p := script.Exec("ls doesntexist")
var exit int = p.ExitStatus()
fmt.Println(exit)
// Output: 1

The value of ExitStatus() will be zero unless the pipe's error status matches the string "exit status X", where X is a non-zero integer.

Error output

Even in the event of a non-zero exit status, the command's output will still be available in the pipe. This is often helpful for debugging. However, because String() is a no-op if the pipe's error status is set, if you want output you will need to reset the error status before calling String():

p := script.Exec("man bogus")
p.SetError(nil)
output, err := p.String()
fmt.Println(output)
// Output: No manual entry for bogus

File

File() creates a pipe that reads from a file.

p := script.File("test.txt")
output, err := p.String()
fmt.Println(output)
// Output: contents of file

Stdin

Stdin() creates a pipe which reads from the program's standard input.

p := script.Stdin()
output, err := p.String()
fmt.Println(output)
// Output: [contents of standard input]

Filters

Filters are operations on an existing pipe that also return a pipe, allowing you to chain filters indefinitely.

Column

Column() reads input tabulated by whitespace, and outputs only the Nth column of each input line (like Unix cut). Lines containing fewer than N columns will be ignored.

For example, given this input:

  PID   TT  STAT      TIME COMMAND
    1   ??  Ss   873:17.62 /sbin/launchd
   50   ??  Ss    13:18.13 /usr/libexec/UserEventAgent (System)
   51   ??  Ss    22:56.75 /usr/sbin/syslogd

and this program:

script.Stdin().Column(1).Stdout()

this will be the output:

PID
1
50
51
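If you're curious how whitespace-delimited column extraction can be done in plain Go, here is a small sketch using strings.Fields. The column helper is a hypothetical name for this example, and it assumes (as the description above says) that columns are numbered from 1 and that short lines are skipped:

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical helper: it returns the nth (1-based)
// whitespace-separated field of line. The boolean is false if the
// line has fewer than n fields.
func column(line string, n int) (string, bool) {
	fields := strings.Fields(line)
	if n < 1 || n > len(fields) {
		return "", false
	}
	return fields[n-1], true
}

func main() {
	input := []string{
		"  PID   TT  STAT      TIME COMMAND",
		"    1   ??  Ss   873:17.62 /sbin/launchd",
		"   50   ??  Ss    13:18.13 /usr/libexec/UserEventAgent (System)",
		"   51   ??  Ss    22:56.75 /usr/sbin/syslogd",
	}
	for _, line := range input {
		if field, ok := column(line, 1); ok {
			fmt.Println(field)
		}
	}
	// Prints PID, 1, 50, and 51, one per line.
}
```

Because strings.Fields splits on any run of whitespace, the leading padding in the ps-style input doesn't produce empty columns.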

Concat

Concat() reads a list of filenames from the pipe, one per line, and creates a pipe which concatenates the contents of those files. For example, if you have files a, b, and c:

output, err := script.Echo("a\nb\nc\n").Concat().String()
fmt.Println(output)
// Output: contents of a, followed by contents of b, followed
// by contents of c

This makes it convenient to write programs which take a list of input files on the command line, for example:

func main() {
	script.Args().Concat().Stdout()
}

The list of files could also come from a file:

// Read all files in filelist.txt
p := script.File("filelist.txt").Concat()

...or from the output of a command:

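// Print all config files to the terminal.
// (ls prints bare filenames, so this assumes the program is run
// from within /var/app/config/.)
script.Exec("ls /var/app/config/").Concat().Stdout()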

Each input file will be closed once it has been fully read.

EachLine

EachLine() lets you create custom filters. You provide a function, and it will be called once for each line of input. If you want to produce output, your function can write to a supplied strings.Builder. The return value from EachLine is a pipe containing your output.

p := script.File("test.txt")
q := p.EachLine(func(line string, out *strings.Builder) {
	out.WriteString("> " + line + "\n")
})
output, err := q.String()
fmt.Println(output)

Exec

Exec() runs a given command, which will read from the pipe as its standard input, and returns a pipe containing the command's combined output (stdout and stderr). If there was an error running the command, the pipe's error status will be set.

Apart from connecting the pipe to the command's standard input, the behaviour of an Exec() filter is the same as that of an Exec() source.

// `cat` copies its standard input to its standard output.
p := script.Echo("hello world").Exec("cat")
output, err := p.String()
fmt.Println(output)
// Output: hello world

First

First() reads its input and passes on the first N lines of it (like Unix head):

script.Stdin().First(10).Stdout()

Freq

Freq() counts the frequencies of input lines, and outputs only the unique lines in the input, each prefixed with a count of its frequency, in descending order of frequency (that is, most frequent lines first). Lines with the same frequency will be sorted alphabetically. For example, given this input:

banana
apple
orange
apple
banana

and a program like:

script.Stdin().Freq().Stdout()

the output will be:

2 apple
2 banana
1 orange

This mirrors a common pattern in shell scripts for finding the most frequently occurring lines in a file:

sort testdata/freq.input.txt |uniq -c |sort -rn

Freq()'s behaviour is like the combination of Unix sort, uniq -c, and sort -rn used here. You can use Freq() in combination with First() to get, for example, the ten most common lines in a file:

script.Stdin().Freq().First(10).Stdout()

Like uniq -c, Freq() left-pads its count values if necessary to make them easier to read:

10 apple
 4 banana
 2 orange
 1 kumquat
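The behaviour described above (count, sort by descending frequency, break ties alphabetically, left-pad the counts) can be sketched in plain Go like this. The freq function is a hypothetical illustration working on a slice of lines, not the library's implementation, and it assumes the padding width is that of the largest count:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// freq is a hypothetical helper: it returns one "count line" entry per
// unique input line, most frequent first, ties in alphabetical order,
// with counts left-padded to the width of the largest count.
func freq(lines []string) []string {
	counts := map[string]int{}
	for _, line := range lines {
		counts[line]++
	}
	unique := make([]string, 0, len(counts))
	maxCount := 0
	for line, count := range counts {
		unique = append(unique, line)
		if count > maxCount {
			maxCount = count
		}
	}
	sort.Slice(unique, func(i, j int) bool {
		ci, cj := counts[unique[i]], counts[unique[j]]
		if ci != cj {
			return ci > cj // higher counts first
		}
		return unique[i] < unique[j] // then alphabetical
	})
	width := len(strconv.Itoa(maxCount))
	out := make([]string, 0, len(unique))
	for _, line := range unique {
		out = append(out, fmt.Sprintf("%*d %s", width, counts[line], line))
	}
	return out
}

func main() {
	for _, entry := range freq([]string{"banana", "apple", "orange", "apple", "banana"}) {
		fmt.Println(entry)
	}
	// Prints:
	// 2 apple
	// 2 banana
	// 1 orange
}
```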

Join

Join() reads its input and replaces newlines with spaces, preserving a terminating newline if there is one.

p := script.Echo("hello\nworld\n").Join()
output, err := p.String()
fmt.Println(output)
// Output: hello world\n

Match

Match() returns a pipe containing only the input lines which match the supplied string:

p := script.File("test.txt").Match("Error")

MatchRegexp

MatchRegexp() is like Match(), but takes a compiled regular expression instead of a string.

p := script.File("test.txt").MatchRegexp(regexp.MustCompile(`E.*r`))

Reject

Reject() is the inverse of Match(). Its pipe produces only lines which don't contain the given string:

p := script.File("test.txt").Match("Error").Reject("false alarm")

RejectRegexp

RejectRegexp() is like Reject(), but takes a compiled regular expression instead of a string.

p := script.File("test.txt").Match("Error").RejectRegexp(regexp.MustCompile(`false|bogus`))

Replace

Replace() returns a pipe which filters its input by replacing all occurrences of one string with another, like Unix sed:

p := script.File("test.txt").Replace("old", "new")

ReplaceRegexp

ReplaceRegexp() returns a pipe which filters its input by replacing all matches of a compiled regular expression with a supplied replacement string, like Unix sed:

p := script.File("test.txt").ReplaceRegexp(regexp.MustCompile("Gol[a-z]{1}ng"), "Go")
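Since ReplaceRegexp() takes an ordinary compiled expression from the standard regexp package, you can try a pattern out on its own before putting it in a pipeline. This sketch assumes the replacement behaves like the standard library's ReplaceAllString; the replaceGolang helper and the sample sentences are made up for the example:

```go
package main

import (
	"fmt"
	"regexp"
)

// golang matches "Gol", then exactly one lowercase letter, then "ng",
// so it matches "Golang" but not "Gopher".
var golang = regexp.MustCompile("Gol[a-z]{1}ng")

// replaceGolang is a hypothetical helper that replaces every match of
// the pattern in s with "Go".
func replaceGolang(s string) string {
	return golang.ReplaceAllString(s, "Go")
}

func main() {
	fmt.Println(replaceGolang("Golang is fun, and Golang is fast"))
	// Go is fun, and Go is fast
	fmt.Println(replaceGolang("Gophers are welcome"))
	// Gophers are welcome
}
```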

Sinks

Sinks are operations which return some data from a pipe, ending the pipeline.

AppendFile

AppendFile() is like WriteFile(), but appends to the destination file instead of overwriting it. It returns the number of bytes written, or an error:

var wrote int
wrote, err := script.Echo("Got this far!").AppendFile("logfile.txt")

Bytes

Bytes() returns the contents of the pipe as a slice of byte, plus an error:

var data []byte
data, err := script.File("test.bin").Bytes()

CountLines

CountLines(), as the name suggests, counts lines in its input, and returns the number of lines as an integer, plus an error:

var numLines int
numLines, err := script.File("test.txt").CountLines()

Read

Read() behaves just like the standard Read() method on any io.Reader:

p := script.File("test.txt")
buf := make([]byte, 256)
n, err := p.Read(buf)

Because a Pipe is an io.Reader, you can use it anywhere you would use a file, network connection, and so on. You can pass it to ioutil.ReadAll, io.Copy, json.NewDecoder, and anything else which takes an io.Reader.

Unlike most sinks, Read() does not read the whole contents of the pipe (unless the supplied buffer is big enough to hold them).

Stdout

Stdout() writes the contents of the pipe to the program's standard output. It returns the number of bytes written, or an error:

p := script.Echo("hello world")
wrote, err := p.Stdout()

In conjunction with Stdin(), Stdout() is useful for writing programs which filter input. For example, here is a program which simply copies its input to its output, like cat:

func main() {
	script.Stdin().Stdout()
}

To filter only lines matching a string:

func main() {
	script.Stdin().Match("hello").Stdout()
}

String

String() returns the contents of the pipe as a string, plus an error:

contents, err := script.File("test.txt").String()

Note that String(), like all sinks, consumes the complete output of the pipe, which closes the input reader automatically. Therefore, calling String() (or any other sink method) again on the same pipe will return an error:

p := script.File("test.txt")
_, _ = p.String()
_, err := p.String()
fmt.Println(err)
// Output: read test.txt: file already closed

WriteFile

WriteFile() writes the contents of the pipe to a named file. It returns the number of bytes written, or an error:

var wrote int
wrote, err := script.File("source.txt").WriteFile("destination.txt")

Examples

Since script is designed to help you write system administration programs, a few simple examples of such programs are included in the examples directory:

  • cat (copies stdin to stdout)
  • cat 2 (takes a list of files on the command line and concatenates their contents to stdout)
  • grep
  • head
  • echo
  • visitors

More examples would be welcome!

If you use script for real work (or, for that matter, real play), I'm always very interested to hear about it. Drop me a line to john@bitfieldconsulting.com and tell me how you're using script and what you think of it!

How can I contribute?

See the contributor's guide for some helpful tips.
