Commit c474f4b

Update terminology to clarify metadata schema and structure
1 parent 71fe5c3 commit c474f4b

1 file changed (+4 -4 lines)

docs/croissant-spec-draft.md

Lines changed: 4 additions & 4 deletions
@@ -67,13 +67,13 @@ Croissant is designed to be modular and extensible. One such extension is the Cr
 
 ## Terminology
 
-**Dataset**: A collection of data points or items reflecting the results of such activities as measuring, reporting, collecting, analyzing, or observing.
+**Dataset**: A collection of data points or items reflecting the results from activities such as measurement, reporting, analysis, or collection.
 
-**Croissant dataset**: A dataset that comes with a description in the Croissant format. Note that the Croissant description of a dataset does not generally contain the actual data of the dataset (with the exception of small examples or enumerations). The data itself is contained in separate files, referenced by the Croissant dataset description.
+**Croissant Dataset**: A dataset accompanied by a Croissant description, which is a metadata schema defining its structure, file organization and field properties. Note that the Croissant description does not generally contain the actual data of the dataset (with the exception of small examples or enumerations). The data itself is contained in separate files, referenced by the Croissant dataset description.
 
-**Data Record**: A granular part of a dataset, such as an image, text, or archive file.
+**Data Record**: A granular part of a dataset, such as an image, text, or archive file. Data Records are described by `FileObject` and `FileSet` types.
 
-**RecordSet**: A set of structured data records obtained from one or more data sources (typically a file or set of files), such as a collection of images, text files, or all the rows in a table.
+**RecordSet**: A set of structured data records obtained from one or more Data Records. It represents a coherent subset of the dataset with defined properties. The properties are described using the `Field` type.
 
 ## Format Example
 
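The revised definitions describe a chain of references. A Croissant description lists Data Records as `FileObject` or `FileSet` entries, and each `RecordSet` declares `Field`s whose sources point back at those files. The data itself stays in the referenced files. Below is a minimal Python sketch of that chain, under stated assumptions: the JSON-LD is a trimmed, hypothetical description loosely modeled on Croissant 1.0 property names (`distribution`, `recordSet`, `field`, `source`, `fileObject`, `extract`). The dataset, file, and column names are invented, and `field_sources` is a hypothetical helper, not part of any Croissant tooling. It also omits properties a real description needs, such as `@context`.

```python
from typing import Any

# Hypothetical, trimmed Croissant-style description (names are invented).
# A real description also needs "@context", "conformsTo", license, etc.
# Note that it holds no rows: the data lives in the referenced file.
DESCRIPTION: dict[str, Any] = {
    "@type": "sc:Dataset",
    "name": "movie_ratings",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "ratings.csv",
            "contentUrl": "https://example.com/ratings.csv",
            "encodingFormat": "text/csv",
        },
    ],
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "ratings",
            "field": [
                {
                    "@type": "cr:Field",
                    "@id": "ratings/user_id",
                    "dataType": "sc:Integer",
                    "source": {
                        "fileObject": {"@id": "ratings.csv"},
                        "extract": {"column": "user_id"},
                    },
                },
                {
                    "@type": "cr:Field",
                    "@id": "ratings/rating",
                    "dataType": "sc:Float",
                    "source": {
                        "fileObject": {"@id": "ratings.csv"},
                        "extract": {"column": "rating"},
                    },
                },
            ],
        },
    ],
}


def field_sources(description: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Map each RecordSet @id to (Field @id, referenced file @id) pairs.

    Hypothetical helper: raises ValueError if a Field points at a file that
    is not declared as a FileObject or FileSet in the distribution.
    """
    # Collect the ids of every declared Data Record (FileObject / FileSet).
    declared = {
        item["@id"]
        for item in description.get("distribution", [])
        if item.get("@type") in ("cr:FileObject", "cr:FileSet")
    }
    result: dict[str, list[tuple[str, str]]] = {}
    for record_set in description.get("recordSet", []):
        pairs = []
        for field in record_set.get("field", []):
            source = field.get("source", {})
            # A Field's source may reference either a FileObject or a FileSet.
            ref = source.get("fileObject") or source.get("fileSet") or {}
            file_id = ref.get("@id")
            if file_id not in declared:
                raise ValueError(f"{field['@id']} references undeclared file {file_id!r}")
            pairs.append((field["@id"], file_id))
        result[record_set["@id"]] = pairs
    return result


print(field_sources(DESCRIPTION))
# {'ratings': [('ratings/user_id', 'ratings.csv'), ('ratings/rating', 'ratings.csv')]}
```

Running it lists each RecordSet with the file every Field reads from. A Field that points at an undeclared file raises `ValueError`, which is one simple way to flag a description whose field properties and file organization disagree.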