
Releases: qri-io/dataset

v0.3.0

04 May 19:52
3f4b696

v0.3.0 (2021-05-04)

This release of the dataset package includes one major change, the new Stats component, along with several minor changes and the bug fixes listed below.

Stats component

Introducing the stats component, a top-level component that quickly generates stats using probabilistic structures. Unlike previous approaches to calculating stats, it is not bound by size or time limitations. We calculate and store different kinds of stats depending on the content of each column or field. As of this release, the supported types are numeric, boolean, and string. We've also moved the qri/stats package into dataset under the name dsstats.

Take a look at our spec for details on how stats are calculated.
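To make "probabilistic structures" a little more concrete, here is a minimal sketch of one such structure, a count-min sketch, which estimates value frequencies for a string column in fixed memory. This is an illustration only: the notes above do not say which structures dsstats uses, and the countMin type and its methods are hypothetical names, not part of the dataset API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// countMin is a hypothetical, minimal count-min sketch. It is NOT the
// dsstats implementation; it only illustrates how a probabilistic
// structure can track frequencies without storing every distinct value.
type countMin struct {
	depth, width uint32
	rows         [][]uint64
}

// newCountMin allocates depth rows of width counters each.
// Memory use is fixed at depth*width counters regardless of input size.
func newCountMin(depth, width uint32) *countMin {
	rows := make([][]uint64, depth)
	for i := range rows {
		rows[i] = make([]uint64, width)
	}
	return &countMin{depth: depth, width: width, rows: rows}
}

// index picks the counter for value in row i. It derives one position per
// row from the two 32-bit halves of a single 64-bit FNV-1a hash.
func (c *countMin) index(value string, i uint32) uint32 {
	h := fnv.New64a()
	h.Write([]byte(value))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	return (h1 + i*h2) % c.width
}

// Add records one occurrence of value by incrementing one counter per row.
func (c *countMin) Add(value string) {
	for i := uint32(0); i < c.depth; i++ {
		c.rows[i][c.index(value, i)]++
	}
}

// Estimate returns the smallest counter seen for value across all rows.
// Collisions can only inflate counters, so the result is never lower
// than the true count, but it may be higher.
func (c *countMin) Estimate(value string) uint64 {
	min := c.rows[0][c.index(value, 0)]
	for i := uint32(1); i < c.depth; i++ {
		if n := c.rows[i][c.index(value, i)]; n < min {
			min = n
		}
	}
	return min
}

func main() {
	cm := newCountMin(4, 1024)
	for _, v := range []string{"a", "b", "a", "c", "a"} {
		cm.Add(v)
	}
	fmt.Println("estimated count of a:", cm.Estimate("a"))
}
```

The trade-off is the one described above: memory stays constant however large the body is, at the cost of estimates that may overcount.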

Bug Fixes

  • dataset.BodyFile: if no dataset exists, return nil (84c88eb)
  • dataset: DropTransients drops peername field (656948d)
  • dsgen: fix flag for number of rows in dsgen CLI (8a042f1)
  • meta: marshalling to json object should not modify private meta field (5a55038)
  • meta,structure: serializing to JSON includes path value (cfd5aca)
  • preview: rename CreatePreview -> Preview, don't consume input dataset files (b7a9395)
  • stats: avoid nil ptr panic (#242) (556268c)
  • stats: limiting top-k frequencies to 200 (#239) (74e6f19)
  • transform: Assign() overwrites Steps field (ce73c09)

Features

  • commit: add RunID field to Commit struct (ecaf655)
  • preview: CreatePreview takes a dataset.Dataset and returns a truncated version (1fae175)
  • dataset: add ID field to dataset.Dataset (ceb9ee1)
  • detect.Structure: move structure detection function down from qri (2330b0f)
  • dsio.ReadAll: add ReadAll, ReadAllObject, ReadAllArray functions (80263b4)
  • dsstats: move stats package from qri core, rename to dsstats (e5257e0)
  • dstest: Add Readme support (d480331)
  • dstest: add CompareGoldenDatasetAndUpdate convenience function (577ff3f)
  • dstest: add Template function (f588dde)
  • dstest: configurable CompareDatasets, Golden File Functions (1019334)
  • ShallowCompare,PathMap: add utility methods for comparing components (80c9f61)
  • SigningBytes: new SigningBytes includes all components (1b5ddf1)
  • stats: add Assign method, stats component tests (23fb3fd)
  • stats: add stats component (4e9ca61)
  • stats: use 'sa' as kind prefix, marshal stats to/from JSON (5235164)
  • transform: add Syntaxes field to Transform struct (8a30d20)
  • type: utility to check type presence for columns in tabular (#244) (9b4fc79)

BREAKING CHANGES

  • dataset: older versions of qri that attempt to verify the signature of datasets with a
    non-empty ID string field will error.
  • removed Compare* functions, use dstest.Compare instead

chore(release): release v0.2.0

29 Jun 19:13
@b5 b5
d12a66b

v0.2.0 (2020-06-29)

A minor release that introduces a number of small fixes, an overhauled gen package based on new tabular type detection, and a small change that brings very noticeable performance improvements when using a dsio.CSVReader.

Bug Fixes

  • detect: Don't treat strings starting with 't','f','n' as the wrong type (1eb7656)
  • detect: Iterate type counts in a deterministic manner (6427bdd)
  • dsfs.getDepth: fix algorithm & add tests (f67cfd6)
  • dsio: json decoder emits int64 instead of int (8a8404c)
  • dsio: remove stub schema function for CSV & XLSX formats (94a15a5)
  • dsutil: use a context cancel instead of not loading viz (53231a0)
  • entryreader: json over batch size properly unmarshals now (#227) (71e64eb)
  • NewJSONPrettyWriter: now writer correctly writes object values when indenting (2d2e247)

Features

  • detect: detect tabular schemas from go types (cdaceda)
  • dsgen: add dsgen command for generating datasets, overhaul gen pkg (bf363af)
  • readme: Readme component for datasets (c2db273)
  • structure: add RequiresTabularSchema method (9f24359)
  • tabular: package tabular defines tools for tabular datasets (9fec0a3)
  • transform: add InlineScript method, matching readme (8929f14)

Performance Improvements

  • csv: increase read buffer size for csv reader (#225) (a8aa566)

v0.1.4

04 Sep 16:13
@b5 b5
96e767b

This patch release includes small fixes for the dsio.JSON format reader, strict validation error returns in dsfs.SaveDataset, and a method for dropping derived values from a Dataset and its components.

v0.1.2

10 Jun 23:56
@b5 b5
62844fd

Quick patch release that adds a utility function to dsviz templates: isType.

v0.1.1

03 Jun 21:35
@b5 b5
43edc2f

Due to a circular module dependency, we've moved github.com/qri-io/dsdiff into github.com/qri-io/dataset/dsdiff.

v0.1.0

03 Jun 20:09
@b5 b5
91b64ea

This is the first proper release of dataset. In preparation for Go 1.13, in which go.mod files and Go modules are the primary way to handle Go dependencies, we are doing an official release of all our modules. This is version v0.1.0 of dataset.

The change log is huge here because we haven't been properly cutting releases until now. From here forward, that changes! Yay! Progress!