Releases: qri-io/dataset
v0.3.0
v0.3.0 (2021-05-04)
This release of the dataset package includes one major change, adding a Stats component, as well as a few minor changes and a bunch of bug fixes that are listed below.
Stats component
Introducing the stats component, a top-level component that provides the mechanics to quickly generate stats using probabilistic structures. Unlike previous iterations of calculating stats, it is not bound by size or time limitations. We calculate and store different kinds of stats based on the content of the given column or field. The types supported as of this release are numeric, boolean, and string. We've moved the qri/stats package into dataset under the name dsstats.
Take a look at our spec for details on how stats are calculated.
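As a rough illustration of per-type stats gathered in a single pass, here is a minimal sketch. It is not the dsstats API: the columnStats type and the kindOf and summarize functions are hypothetical names invented for this example, and it uses exact counters rather than the probabilistic structures the real component relies on.

```go
package main

import (
	"fmt"
	"math"
)

// columnStats is a hypothetical summary type for this sketch; it is not the
// dsstats API.
type columnStats struct {
	Kind                 string  // "numeric", "boolean", or "string"
	Count                int     // values that matched Kind
	Min, Max, Sum        float64 // numeric columns only
	TrueCount            int     // boolean columns only
	MinLength, MaxLength int     // string columns only, measured in bytes
}

// kindOf maps a Go value to one of the three stat kinds, or "" if unsupported.
func kindOf(v interface{}) string {
	switch v.(type) {
	case float64:
		return "numeric"
	case bool:
		return "boolean"
	case string:
		return "string"
	}
	return ""
}

// summarize makes one pass over a column. The first supported value decides
// the column kind; values of any other kind are skipped.
func summarize(col []interface{}) columnStats {
	s := columnStats{Min: math.Inf(1), Max: math.Inf(-1), MinLength: math.MaxInt32}
	for _, v := range col {
		kind := kindOf(v)
		if kind == "" {
			continue // unsupported type, including nil
		}
		if s.Kind == "" {
			s.Kind = kind
		}
		if kind != s.Kind {
			continue
		}
		s.Count++
		switch x := v.(type) {
		case float64:
			s.Sum += x
			s.Min = math.Min(s.Min, x)
			s.Max = math.Max(s.Max, x)
		case bool:
			if x {
				s.TrueCount++
			}
		case string:
			if len(x) < s.MinLength {
				s.MinLength = len(x)
			}
			if len(x) > s.MaxLength {
				s.MaxLength = len(x)
			}
		}
	}
	return s
}

func main() {
	nums := summarize([]interface{}{3.0, 1.0, 8.0})
	fmt.Println(nums.Kind, nums.Count, nums.Min, nums.Max, nums.Sum/float64(nums.Count))
	flags := summarize([]interface{}{true, false, true})
	fmt.Println(flags.Kind, flags.Count, flags.TrueCount)
	words := summarize([]interface{}{"a", "abc", 5.0})
	fmt.Println(words.Kind, words.Count, words.MinLength, words.MaxLength)
}
```

Because each value is folded into a fixed-size summary, memory use does not grow with the number of rows, which is the property that lets a streaming approach avoid size limits.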
Bug Fixes
- dataset.BodyFile: if no dataset exists, return nil (84c88eb)
- dataset: DropTransients drops peername field (656948d)
- dsgen: fix flag for number of rows in dsgen CLI (8a042f1)
- meta: marshalling to json object should not modify private meta field (5a55038)
- meta,structure: serializing to JSON includes path value (cfd5aca)
- preview: rename CreatePreview -> Preview, don't consume input dataset files (b7a9395)
- stats: avoid nil ptr panic (#242) (556268c)
- stats: limiting top-k frequencies to 200 (#239) (74e6f19)
- transform: Assign() overwrites Steps field (ce73c09)
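The top-k frequency limit mentioned in the stats fix above can be pictured with a small sketch. This is an illustration only: the freq type and topK function are hypothetical, the input is an exact count map rather than whatever structure dsstats uses internally, and the tie-breaking rule (alphabetical by value) is an assumption made so the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// freq pairs a value with how often it was seen. Hypothetical type for this sketch.
type freq struct {
	Value string
	Count int
}

// topK returns at most k entries from counts, most frequent first.
// Ties are broken alphabetically by value so the result is deterministic.
func topK(counts map[string]int, k int) []freq {
	out := make([]freq, 0, len(counts))
	for v, c := range counts {
		out = append(out, freq{Value: v, Count: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	if k < 0 {
		k = 0
	}
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	counts := map[string]int{"a": 3, "b": 1, "c": 3, "d": 2}
	// The release notes cite a cap of 200; 2 is used here to keep the demo short.
	fmt.Println(topK(counts, 2))
}
```

Capping the list keeps the stored stats bounded even for columns with a very large number of distinct values.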
Features
- commit: add RunID field to Commit struct (ecaf655)
- preview: CreatePreview takes a dataset.Dataset and returns a truncated version (1fae175)
- dataset: add ID field to dataset.Dataset (ceb9ee1)
- detect.Structure: move structure detection function down from qri (2330b0f)
- dsio.ReadAll: add ReadAll, ReadAllObject, ReadAllArray functions (80263b4)
- dsstats: move stats package from qri core, rename to dsstats (e5257e0)
- dstest: Add Readme support (d480331)
- dstest: add CompareGoldenDatasetAndUpdate convenience function (577ff3f)
- dstest: add Template function (f588dde)
- dstest: configurable CompareDatasets, Golden File Functions (1019334)
- ShallowCompare,PathMap: add utility methods for comparing components (80c9f61)
- SigningBytes: new SigningBytes includes all components (1b5ddf1)
- stats: add Assign method, stats component tests (23fb3fd)
- stats: add stats component (4e9ca61)
- stats: use 'sa' as kind prefix, marshal stats to/from JSON (5235164)
- transform: add Syntaxes field to Transform struct (8a30d20)
- type: utility to check type presence for columns in tabular (#244) (9b4fc79)
BREAKING CHANGES
- dataset: older versions of qri that attempt to verify the signature of datasets with a non-empty ID string field will error.
- removed Compare* functions, use dstest.Compare instead
chore(release): release v0.2.0
v0.2.0 (2020-06-29)
A minor release that introduces a number of small fixes, an overhauled gen package based on new tabular type detection, and a small change with some very noticeable performance improvements when using a dsio.CSVReader.
Bug Fixes
- detect: Don't treat strings starting with 't','f','n' as the wrong type (1eb7656)
- detect: Iterate type counts in a deterministic manner (6427bdd)
- dsfs.getDepth: fix algorithm & add tests (f67cfd6)
- dsio: json decoder emits int64 instead of int (8a8404c)
- dsio: remove stub schema function for CSV & XLSX formats (94a15a5)
- dsutil: use a context cancel instead of not loading viz (53231a0)
- entryreader: json over batch size properly unmarshals now (#227) (71e64eb)
- NewJSONPrettyWriter: now writer correctly writes object values when indenting (2d2e247)
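To illustrate the class of problem behind the detect fix for strings starting with 't', 'f', or 'n', here is a minimal sketch of whole-string type detection. The detectType function is hypothetical and is not the detect package API; the actual cause of the original bug is not described in these notes. The sketch only shows that matching complete literals, rather than a leading character, keeps words like "table" or "nope" classified as strings.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// detectType is a hypothetical helper for this sketch. It classifies a raw
// field by comparing the whole string, never just its first character.
func detectType(s string) string {
	switch s {
	case "true", "false":
		return "boolean"
	case "null":
		return "null"
	}
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return "integer"
	}
	// strconv.ParseFloat also accepts "NaN" and "Inf" spellings; reject those
	// so that words such as "nan" or "inf" stay strings.
	if f, err := strconv.ParseFloat(s, 64); err == nil && !math.IsNaN(f) && !math.IsInf(f, 0) {
		return "number"
	}
	return "string"
}

func main() {
	for _, s := range []string{"true", "table", "false", "fish", "null", "nope", "42", "4.5"} {
		fmt.Printf("%q -> %s\n", s, detectType(s))
	}
}
```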
Features
- detect: detect tabular schemas from go types (cdaceda)
- dsgen: add dsgen command for generating datasets, overhaul gen pkg (bf363af)
- readme: Readme component for datasets (c2db273)
- structure: add RequiresTabularSchema method (9f24359)
- tabular: package tabular defines tools for tabular datasets (9fec0a3)
- transform: add InlineScript method, matching readme (8929f14)
Performance Improvements
v0.1.4
v0.1.2
v0.1.1
v0.1.0
This is the first proper release of dataset. In preparation for Go 1.13, in which go.mod files and Go modules are the primary way to handle Go dependencies, we are going to do an official release of all our modules. This will be version v0.1.0 of dataset.
The change log is huge here because we haven't been properly cutting releases until now. From here forward, that changes! Yay! Progress!