A safe Go interface to liblzma, the "xz" compression library.
This is a Go package for compression in the xz / lzma format. It provides a safer way to call liblzma without the fear of type-safety issues, and it reduces the complexity of lzma compression options to a set of common use cases.
Install the liblzma-dev package from your favorite package manager. For Debian/Ubuntu, for instance:
sudo apt install liblzma-dev
then:
go get -u github.com/christoofar/safexz
At runtime you must ensure that liblzma.so (UNIX) or liblzma.dll (MSYS2) can be found by your operating system's library loader.
If you're on a Linux distro that has systemd, don't worry: you already have liblzma.
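If you want a quick sanity check that liblzma.so is present, a small sketch like the one below can help. The directory list is my assumption about common Linux layouts, not something safexz uses; the real search path is decided by the dynamic loader (ld.so.conf, LD_LIBRARY_PATH, rpath), so a miss here does not prove the library is unreachable.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// candidateDirs is an ASSUMED list of common library directories on Linux.
// The dynamic loader may search other locations as well.
var candidateDirs = []string{
	"/lib",
	"/lib64",
	"/usr/lib",
	"/usr/lib64",
	"/usr/local/lib",
	"/lib/x86_64-linux-gnu",
	"/usr/lib/x86_64-linux-gnu",
}

// findLibrary returns every path in dirs whose file name matches pattern,
// for example "liblzma.so*".
func findLibrary(dirs []string, pattern string) []string {
	var found []string
	for _, dir := range dirs {
		matches, err := filepath.Glob(filepath.Join(dir, pattern))
		if err != nil {
			continue // Glob only fails on a malformed pattern
		}
		found = append(found, matches...)
	}
	sort.Strings(found)
	return found
}

func main() {
	libs := findLibrary(candidateDirs, "liblzma.so*")
	if len(libs) == 0 {
		fmt.Println("liblzma.so not found in the usual places")
		return
	}
	for _, lib := range libs {
		fmt.Println("found", lib)
	}
}
```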
This package uses CGo. By default, that means your own project must not (at least directly) disable CGo. If you're willing to stick around, my next GitHub project will demonstrate how you can set CGO_ENABLED=0 but still vendor CGo support back into your program, which is a technique you can use to make your program hot-swappable between the musl C standard library and the GNU C library. I use this technique to make tiny CGo-supporting containers!
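If you are unsure whether a given binary was built with CGo enabled, the Go toolchain (1.18 and later) records the CGO_ENABLED setting in the binary's build info. The sketch below reads it back; the helper names settingsMap and describeCgo are mine, purely for illustration, and are not part of safexz.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// settingsMap flattens the embedded build settings into a map.
// (Hypothetical helper, not part of safexz.)
func settingsMap(info *debug.BuildInfo) map[string]string {
	m := make(map[string]string, len(info.Settings))
	for _, s := range info.Settings {
		m[s.Key] = s.Value
	}
	return m
}

// describeCgo turns the recorded CGO_ENABLED value into a readable message.
func describeCgo(settings map[string]string) string {
	switch settings["CGO_ENABLED"] {
	case "1":
		return "cgo enabled"
	case "0":
		return "cgo disabled"
	default:
		return "cgo setting not recorded"
	}
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	fmt.Println(describeCgo(settingsMap(info)))
}
```

The same information is available from the command line with go version -m on a built binary.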
The easiest way to get this working on Windows is to use MSYS2. You will be compiling your Go program with CGo using the GCC compiler that is in MinGW64 (which comes with MSYS2).
After installing MSYS2, open a MinGW64 command prompt and issue:
pacman -S base-devel git mingw-w64-x86_64-gcc mingw-w64-x86_64-lzma
Open a MinGW64 command prompt, run nano /etc/profile, and add the appropriate paths to the Go compiler at the bottom of this file. In my case I did this:
export GOPATH="/c/Users/chris/go"
export PATH="/go/bin:${PATH}:${GOPATH}/bin"
The /go/bin folder should point to the folder that holds your go.exe compiler. To make this easier, I symlinked the default Go for Windows install folder to /go like this:
ln -s /c/Program\ Files/Go /go
Restart your machine so all the paths can be found. Open a MinGW64 command prompt and try go version to make sure the Go compiler is reachable.
If you are using Visual Studio Code as your IDE, it can help to make MSYS2 your default terminal program instead of PowerShell. Go to File -> Preferences -> Settings and incorporate this into your settings.json:
"terminal.integrated.profiles.windows": {
    "MSYS2": {
        "path": "C:\\msys64\\usr\\bin\\bash.exe",
        "args": ["--login", "-i"],
        "env": {
            "MSYSTEM": "MINGW64",
            "CHERE_INVOKING": "1"
        }
    }
},
"terminal.integrated.defaultProfile.windows": "MSYS2"
Restart VS Code. Now you should be able to compile a Go program that uses this library. Incorporate it into your Go project:
go get -u github.com/christoofar/safexz
When you build a program that uses safexz with CGo on Windows, you don't need to install MSYS2 on the target machine.
Compile your program this way:
go build -ldflags "-extldflags \"-static\""
This will statically link the lzma dependency from MSYS2 into your binary.
You don't need Windows to make a Windows binary if you are developing in CGo. To do this, you need the MinGW64 GCC compiler:
sudo apt-get install gcc-multilib
sudo apt-get install gcc-mingw-w64
Then, to build, pass along the environment settings that select the MinGW compiler:
GOOS=windows GOARCH=amd64 CGO_ENABLED=1 CC=x86_64-w64-mingw32-gcc CXX=x86_64-w64-mingw32-g++ go build -ldflags "-extldflags \"-static\""
// Compressing a file
safexz.CompressFile("data/mydata.dat", "data/mydata.xz")
// Compressing a string
compressedString := safexz.CompressString("Hello World!")
// Direct-reading a compressed `xz` archive into a decompressed slice of bytes
myPicture := safexz.DecompressFileToMemory("images/monalisa.png.xz")
// Raising the compression level and asking for all cores to be
// engaged in compression
myPictureBytes := safexz.CompressFileToMemory("images/monalisa.png", safexz.CompressionFullPowerBetter)
// Forwarding an uncompressed data stream into a compressed writer stream,
// compressing as it goes
safexz.CompressStream(networkLogSource, compressedStreamWriter) // XZWriter is the writer
// Forwarding an XZ-compressed stream into an io.Writer stream,
// decompressing as it goes
safexz.DecompressStream(compressedStream, streamToWriteTo) // XZReader is the reader
// Wrapping an existing reader behind XZReader, which will decompress the xz/lzma stream
// inside
myXZreader := safexz.NewReader(myCompressedDataReader) // type XZReader
// Wrapping an existing writer behind XZWriter, so the underlying writer sees and will
// save or send a valid `xz` data stream
myXZWriter := safexz.NewWriter(file) // type XZWriter
// Compressing a stream with XZWriter while reading its contents
resp, err := http.Get("https://media.istockphoto.com/id/1453319272/photo/columbus-ohio-usa-skyline-on-the-scioto-river.jpg?s=2048x2048&w=is&k=20&c=tgQ4HAX-dX7A1XTanxHMrkFOg5Fpa2kW87m96JKLcUM=")
if err != nil {
t.Error("Error downloading image:", err)
}
defer resp.Body.Close()
// Compress the image
f, err := os.Create("test.jpg.xz")
if err != nil {
t.Error("Error creating compressed file:", err)
}
defer f.Close()
compressedImageWriter := safexz.NewWriter(f)
_, err = io.Copy(compressedImageWriter, resp.Body)
if err != nil {
t.Error("Error compressing image:", err)
}
compressedImageWriter.Close()
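The example above follows the usual Go pattern of layering a compressing writer over a destination, copying into it, and closing it so the tail of the stream gets flushed. Here is a self-contained sketch of that same pattern that runs without liblzma. It uses compress/gzip from the standard library as a stand-in, so it shows the shape of the technique, not the safexz API; whether the safexz writer's Close returns an error is an assumption I have not verified.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// compressTo copies src into a compressing writer layered over dst.
// gzip.NewWriter is a STAND-IN for a writer such as the one returned by
// safexz.NewWriter in the example above.
func compressTo(dst io.Writer, src io.Reader) (err error) {
	zw := gzip.NewWriter(dst)
	// Close flushes buffered data and writes the stream footer, so its
	// error is checked rather than dropped.
	defer func() {
		if cerr := zw.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = io.Copy(zw, src)
	return err
}

// roundTrip compresses s, decompresses the result, and returns it.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	if err := compressTo(&buf, strings.NewReader(s)); err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	out, err := roundTrip("Hello World!")
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out)
}
```

Closing inside a deferred function keeps the flush on every return path, while still reporting a failed Close when the copy itself succeeded.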
Full API documentation, with even more functions than shown here, can be found at go.dev. Please review the Go test cases, as they cover over 90% of the code.
In late March 2024, CVE-2024-3094 was issued against the liblzma compression library for a supply chain attack. That episode began with the attacker gaining a maintainer role on the FOSS project in a two-year campaign to inject a highly complicated and stealthy backdoor into the software by slipping their own pre-compiled, ready-to-be-linked .o file into the build stream.
Since then, systemd and OpenSSH, the two primary projects the xz backdoor exploited to gain access to sshd, have made code changes so that liblzma.so is no longer linked in directly.
I'm not a firm believer that dlopen() is much more of a cure than it is a quick excuse to ignore your supply chain. And still: dynamic linking in software is fraught with other problems, one of which is security.
I am pinning my version of lzma to post-backdoor versions as Lasse Collin refactors out the work of Jia Tan. Similarly, there have been recent commits by Sam James (@thesamesam) done in the spirit of simplifying maintenance of liblzma so that the build chain is more compatible with the expectations of modern C developers. The software is healing, the backdoor in xz is dead, and by the time you've found this project, new versions of liblzma will be ready for production use.
Rather than expose the call-chain dependencies of lzma directly to you, a simpler Go interface is provided for your integration projects. The interface breaks your calls down into simple one-way chan transfers that stream data to lzma.
Direct file and []byte I/O functions are provided as syntactic sugar, so that you can avoid standing up a stream monitor and input feeder.
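To give a feel for what an "input feeder" over a one-way channel looks like, here is a minimal sketch using only the standard library. The names feed and collectString are hypothetical and chosen for illustration; this is not how safexz is implemented internally, just one plausible shape for the idea.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// feed reads r in chunks of at most chunkSize bytes and sends each chunk on
// a receive-only channel, which is closed at end of input. A read error
// other than io.EOF is delivered on the second channel.
func feed(r io.Reader, chunkSize int) (<-chan []byte, <-chan error) {
	if chunkSize <= 0 {
		chunkSize = 4096 // fall back to a sane default
	}
	chunks := make(chan []byte)
	errc := make(chan error, 1)
	go func() {
		defer close(chunks)
		defer close(errc) // deferred last, so it runs before close(chunks)
		for {
			buf := make([]byte, chunkSize) // fresh buffer: the receiver owns it
			n, err := r.Read(buf)
			if n > 0 {
				chunks <- buf[:n]
			}
			if err == io.EOF {
				return
			}
			if err != nil {
				errc <- err
				return
			}
		}
	}()
	return chunks, errc
}

// collectString drains a feeder over s and reports the rejoined data and
// the number of chunks that crossed the channel.
func collectString(s string, chunkSize int) (string, int, error) {
	chunks, errc := feed(strings.NewReader(s), chunkSize)
	var sb strings.Builder
	count := 0
	for c := range chunks {
		sb.Write(c)
		count++
	}
	return sb.String(), count, <-errc
}

func main() {
	data, n, err := collectString("abcdefghij", 4)
	fmt.Println(data, n, err)
}
```

The consumer only ever sees a receive-only channel of byte slices, which is the property that keeps the two sides decoupled.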
This project's goal is to abide by a chaos/containment theory: unless you wrote the C code yourself, tread carefully.
To make it harder for someone who has compromised a C library to gain control over your own Go program that uses that library, safexz:
- Hides C call access and data structures as internal packages
- No call is made to a C library without boxing it inside 2 layers of goroutines
- All communication of data to and from the C library's functions must take place from a separate goroutine and data may only pass via channels using packed types (fields without pointers) wherever possible.
- Go pointers into lzma are not possible
- No one using this library ever has to manage liblzma's internal state, nor will liblzma be able to "see" the data types of your own code, except for the []byte stream of data to be compressed/decompressed.
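The containment rules above can be sketched in plain Go. This is an illustration under stated assumptions, not the safexz internals: callBoxed and reverseInPlace are hypothetical names, and an ordinary Go function stands in for the C call. Note that recover only catches Go panics; a crash inside real C code would still take down the process.

```go
package main

import "fmt"

// result is a plain value handed back over a channel.
type result struct {
	data []byte
	err  error
}

// callBoxed runs fn on a private copy of input, two goroutines away from
// the caller. fn stands in for a call into a C library.
func callBoxed(fn func([]byte) []byte, input []byte) ([]byte, error) {
	in := make(chan []byte, 1)
	out := make(chan result, 1)

	// Outer layer: supervises the worker and relays its result.
	go func() {
		inner := make(chan result, 1)
		// Inner layer: the only goroutine that ever touches fn.
		go func() {
			defer func() {
				if r := recover(); r != nil {
					inner <- result{err: fmt.Errorf("boxed call panicked: %v", r)}
				}
			}()
			inner <- result{data: fn(<-in)}
		}()
		out <- <-inner
	}()

	// Hand over a copy, never the caller's own slice.
	in <- append([]byte(nil), input...)
	res := <-out
	return res.data, res.err
}

// reverseInPlace is a stand-in "foreign" function that mutates its argument.
func reverseInPlace(b []byte) []byte {
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return b
}

func main() {
	input := []byte("hello")
	out, err := callBoxed(reverseInPlace, input)
	fmt.Println(string(out), string(input), err)
}
```

Because the worker only receives a copy, the stand-in function can scribble over its argument without the caller's slice changing.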
More detail about this technique is over here.
This work is based on the direct liblzma library stubs published by @jamespfennell under the MIT license. liblzma is the published work of Lasse Collin and many others and is published under the 0BSD software license.
safexz is the published work of Christopher Sawyer and is made available under the BSD 3-Clause license.