feature request: generating JSON snapshots in compressed format (like gzip) #598

Open
yarden opened this issue Aug 2, 2019 · 2 comments

@yarden
Contributor

yarden commented Aug 2, 2019

This is a feature request, but hopefully one that shouldn't be too hard to implement and that could help several users.

Kappa can generate snapshots in JSON format, which is convenient to parse, but JSON is a verbose format. For those of us working with simulations that generate thousands or tens of thousands of snapshots, the files are unnecessarily large. It'd be great if KaSim had a -compress option that wrote the JSON in a compressed format. For example, for a typical snapshot JSON file I work with, plain gzip compression makes the file about 10x smaller. Just about every programming language has libraries for reading gzip files directly, so there's rarely any need to uncompress them first. This would save a lot of disk space.
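To check the ratio on your own data, something like the following Python sketch should do. The snapshot here is a synthetic stand-in: the field names are made up for illustration and are not KaSim's actual JSON schema, so the ratio it prints says nothing about real snapshots.

```python
import gzip
import json

# Synthetic stand-in for a snapshot. The field names are hypothetical and
# are NOT KaSim's actual JSON schema.
snapshot = {
    "time": 1.0,
    "agents": [
        {"id": i, "type": "A", "sites": {"x": {"state": "u", "bond": None}}}
        for i in range(2000)
    ],
}

# Serialize to JSON bytes, then gzip them in memory.
raw = json.dumps(snapshot).encode("utf-8")
packed = gzip.compress(raw)

ratio = len(raw) / len(packed)
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes, ratio: {ratio:.1f}x")
```

Swapping the synthetic dictionary for the bytes of a real snapshot file gives the figure that matters for your own simulations.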

Thanks,
Yarden

@hmedina
Collaborator

hmedina commented Aug 3, 2019

I would add that native Kappa is already roughly 10x smaller than JSON, so the biggest gains would come from compressing native Kappa itself. For a random snapshot I picked, the JSON representation is 144 KB and the native Kappa is 17.6 KB; gzip-compressed, they are 2.33 KB and 1.58 KB respectively.
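For reference, the ratios implied by those four sizes can be worked out as below. This assumes the two compressed sizes are listed in the same order as the uncompressed ones (JSON first, then native Kappa).

```python
# Sizes in KB, as quoted above. Assumption: 2.33 KB is the gzipped JSON
# and 1.58 KB is the gzipped native Kappa.
json_kb, kappa_kb = 144.0, 17.6
json_gz_kb, kappa_gz_kb = 2.33, 1.58

ratios = {
    "JSON -> native Kappa": json_kb / kappa_kb,
    "JSON -> gzipped JSON": json_kb / json_gz_kb,
    "native Kappa -> gzipped Kappa": kappa_kb / kappa_gz_kb,
    "JSON -> gzipped Kappa": json_kb / kappa_gz_kb,
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}x")
# JSON -> native Kappa: 8.2x
# JSON -> gzipped JSON: 61.8x
# native Kappa -> gzipped Kappa: 11.1x
# JSON -> gzipped Kappa: 91.1x
```

So for this one snapshot, once gzip is applied the two formats end up within a factor of about 1.5 of each other.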

How about using the file-name detector to specify the compression? E.g.

%mod: [T] > 1 do $SNAPSHOT "snap.ka" ;      // produces native kappa
%mod: [T] > 1 do $SNAPSHOT "snap.ka.gz" ;   // produces gzip compressed native kappa
%mod: [T] > 1 do $SNAPSHOT "snap.json" ;    // produces standard json
%mod: [T] > 1 do $SNAPSHOT "snap.json.gz" ; // produces gzip compressed json

I would say "snap.gz" should default to native Kappa.
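The detection rule proposed here could be sketched as follows. This is only an illustration in Python, not KaSim's code: the function name `snapshot_format` is hypothetical, and treating every non-JSON name as native Kappa is an assumption made to keep the sketch short.

```python
from pathlib import PurePath
from typing import Tuple


def snapshot_format(filename: str) -> Tuple[str, bool]:
    """Return (format, gzip_compressed) for a snapshot file name.

    Hypothetical helper illustrating the proposal; not part of KaSim.
    """
    suffixes = [s.lower() for s in PurePath(filename).suffixes]

    # A trailing ".gz" requests compression; strip it before looking
    # at the format extension underneath.
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    if compressed:
        suffixes = suffixes[:-1]

    # Assumption: anything that is not ".json" falls back to native Kappa,
    # which also makes a bare "snap.gz" default to Kappa as suggested.
    fmt = "json" if suffixes and suffixes[-1] == ".json" else "kappa"
    return fmt, compressed


for name in ["snap.ka", "snap.ka.gz", "snap.json", "snap.json.gz", "snap.gz"]:
    print(name, snapshot_format(name))
```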

And if the devs have the spare time, the compression settings could be exposed as user-defined parameters, e.g.:

%def: "gzip_compression" [fast] | [better]
%def: "gzip_rsyncable" [true] | [false]

@yarden
Contributor Author

yarden commented Aug 4, 2019

Having it go by the extension of the snapshot filename is a great idea! @pirbo also suggested that in another conversation we had. It'd be perfect to just specify .json.gz or .ka.gz.

Since plain gzip compression works so well for this format, I'm personally not looking to specify any compression parameters. I'd prefer whatever's the most portable option. All the languages I work with have built-in or readily available gzip libraries that let you read .gz files on the fly (and then there's zcat...). I don't know what the situation is on Windows, but I assume most of those libraries are cross-platform (Python's gzip certainly should be).
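As an illustration of reading on the fly, here is a minimal Python sketch using the standard-library gzip and json modules. It first writes a tiny placeholder file so it runs on its own; the file content is made up and is not KaSim's actual snapshot schema.

```python
import gzip
import json
import os
import tempfile

# Write a small gzip-compressed JSON file so the example is self-contained.
# The content is a placeholder, not KaSim's actual snapshot schema.
path = os.path.join(tempfile.mkdtemp(), "snap.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    json.dump({"time": 1.0, "agents": []}, fh)

# Read it back directly: gzip.open decompresses the stream as it is read,
# so no uncompressed copy is written to disk.
with gzip.open(path, "rt", encoding="utf-8") as fh:
    snapshot = json.load(fh)

print(snapshot["time"])
```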

Labels
Development

No branches or pull requests

2 participants