KBART 5.3.1: data format

5.3.1.1 Content providers should provide metadata formatted as tab-delimited values.


This is a generic format that minimizes the effort involved in receiving and
loading the data, and reduces the likelihood of errors being introduced during
exchange. Tab-delimited formats are preferable to comma-separated formats
because commas appear regularly within the distributed data. Although such
commas can be escaped or quoted, doing so leaves more room for error than
using a tab-delimited format. Tab-delimited files can be exported easily from
all commonly used spreadsheet programs.
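
As an illustration of the point about commas, the following minimal Python sketch writes records as tab-delimited text using the standard csv module. The column names publication_title and publisher_name and the sample record are assumptions made for the example only; they are not taken from this section.

```python
import csv
import io


def to_tab_delimited(rows, fieldnames):
    """Serialize a list of dicts as tab-delimited text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record: the title contains commas, which need no escaping
# because the tab character, not the comma, separates the fields.
rows = [
    {
        "publication_title": "Journal of Arts, Letters, and Sciences",
        "publisher_name": "Example Press",
    }
]
text = to_tab_delimited(rows, ["publication_title", "publisher_name"])
print(text)
```

Because the delimiter is a tab, the title is written exactly as supplied, with no surrounding quotation marks.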


5.3.1.2 The file should be entitled “[ProviderName]_AllTitles_[YYYY-MM-DD].txt”.

For example, JSTOR_AllTitles_2008-12-01.txt.


5.3.1.3 The provider name should be the web domain at which your data is hosted (but
without the punctuation).


For example, jstor or ebscohost. This ensures that your data is clearly
distinguished from data provided by others with similar package names. The
same file name format should be used for every metadata file deposited.


5.3.1.4 Separate files should be produced for each package of content that the provider
offers.


Each file should be named as customers would expect to see the package
labelled in the knowledge base, using the syntax
“[ProviderName]_[CollectionName]_[YYYY-MM-DD].txt”. For example,
JSTOR_Arts&SciencesV_2008-12-01.txt. Providers and recipients can agree in
advance how best to present complex collection names.
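
The naming syntax in 5.3.1.2 to 5.3.1.4 can be sketched as a small Python helper. The function name kbart_file_name is hypothetical. Stripping all non-alphanumeric characters from the provider name and removing whitespace from the collection name are assumptions based on the examples given here, not rules stated by the recommendation.

```python
import datetime
import re


def kbart_file_name(provider, collection, date):
    """Build "[ProviderName]_[CollectionName]_[YYYY-MM-DD].txt".

    Assumption: punctuation is stripped from the provider name and
    whitespace is removed from the collection name.
    """
    provider_part = re.sub(r"[^A-Za-z0-9]", "", provider)
    collection_part = re.sub(r"\s+", "", collection)
    return f"{provider_part}_{collection_part}_{date.isoformat()}.txt"


release = datetime.date(2008, 12, 1)
print(kbart_file_name("JSTOR", "AllTitles", release))
print(kbart_file_name("JSTOR", "Arts & Sciences V", release))
```

The two calls reproduce the example file names given in 5.3.1.2 and 5.3.1.4.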


5.3.1.5 All metadata should be provided as plain text.

If metadata is provided in a format that does support additional style or
formatting, it should be presented without those enhancements. Data should
not include colors, typefaces, italics, or other markup.


5.3.1.6 Text should be encoded as UTF-8.

UTF-8, an encoding of the Unicode character set, is well supported and
encompasses the writing systems of many languages. It is also a common output
option for programs such as Microsoft Excel.
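
A recipient may want to confirm the encoding before loading a file. The following Python sketch checks whether a sequence of bytes decodes as UTF-8. The function name is_valid_utf8 and the sample title are illustrative assumptions.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


title = "Zeitschrift für Soziologie"
print(is_valid_utf8(title.encode("utf-8")))
# The same title saved as Latin-1 is not valid UTF-8.
print(is_valid_utf8(title.encode("latin-1")))
```

A check of this kind detects files saved in a legacy single-byte encoding only when they contain non-ASCII characters, since pure ASCII text is also valid UTF-8.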


5.3.1.7 One publication should be given in each line of the file, with a column for each
field given in Section 5.3.2, Data Fields.


5.3.1.8 Data should be provided with column headers (see Section 5.3.2) and without a
blank row between the column header and the first row of content.
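
The layout rules in 5.3.1.7 and 5.3.1.8 lend themselves to a simple automated check. The Python sketch below is one possible approach, not a prescribed validator. The function name check_layout is hypothetical, and the expected column headers are passed in by the caller rather than taken from Section 5.3.2.

```python
def check_layout(text, expected_headers):
    """Return a list of layout problems found in tab-delimited text."""
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a trailing newline

    # The first line must be the header row.
    if not lines or lines[0].split("\t") != expected_headers:
        problems.append("line 1: header row missing or not as expected")

    # No blank row is allowed between the header and the first record.
    if len(lines) > 1 and lines[1].strip() == "":
        problems.append("line 2: blank row between header and first record")

    # Every record must have one column per header field.
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "":
            continue
        if len(line.split("\t")) != len(expected_headers):
            problems.append(
                f"line {number}: expected {len(expected_headers)} columns"
            )
    return problems


good = "title\tissn\nExample Journal\t1234-5678\n"
bad = "title\tissn\n\nExample Journal\t1234-5678\textra\n"
print(check_layout(good, ["title", "issn"]))
print(check_layout(bad, ["title", "issn"]))
```

The headers title and issn are placeholders chosen to keep the example short.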


5.3.1.9 A title should be listed twice if there is a coverage gap of 12 months or
more, with only the coverage fields differing between the two entries.


Greater granularity in reporting data coverage gaps is desirable, and should
be agreed with the link resolver supplier if it can be supported.
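
The 12-month rule can be sketched in Python as follows. This is an illustration under stated assumptions: the gap is measured in whole calendar months between the end of one coverage range and the start of the next, ranges separated by a shorter gap are merged into a single entry, and the coverage fields are assumed to be named date_first_issue_online and date_last_issue_online. The function names are hypothetical.

```python
import datetime


def months_between(earlier, later):
    """Whole calendar months from one date to another (days ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def coverage_rows(record, ranges, min_gap_months=12):
    """Return one row per coverage range separated by a qualifying gap."""
    merged = []
    for start, end in sorted(ranges):
        if merged and months_between(merged[-1][1], start) < min_gap_months:
            # Gap too short to report: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    rows = []
    for start, end in merged:
        row = dict(record)  # all other fields stay identical
        row["date_first_issue_online"] = start.isoformat()
        row["date_last_issue_online"] = end.isoformat()
        rows.append(row)
    return rows


record = {"publication_title": "Example Journal"}
ranges = [
    (datetime.date(1990, 1, 1), datetime.date(1995, 12, 31)),
    (datetime.date(1998, 1, 1), datetime.date(2005, 12, 31)),
]
for row in coverage_rows(record, ranges):
    print(row)
```

Lowering min_gap_months is one way to model the finer granularity mentioned above, where provider and link resolver supplier have agreed to it.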


5.3.1.10 All rows should be consistent in terms of format.

For example, ISSN should always be expressed as nine characters with a hyphen
separator, and date fields should always be in the format described in
Section 5.3.2.
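
The ISSN example can be enforced with a small normalizing function. The Python sketch below checks only the nine-character shape (four digits, a hyphen, three digits, and a final digit or X) and does not verify the check digit. The function name normalize_issn is hypothetical.

```python
import re

# Four digits, a hyphen, three digits, then a digit or X (nine characters).
ISSN_PATTERN = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def normalize_issn(raw):
    """Return the ISSN in NNNN-NNNC form, or raise ValueError."""
    compact = raw.strip().upper().replace("-", "")
    candidate = f"{compact[:4]}-{compact[4:]}"
    if not ISSN_PATTERN.fullmatch(candidate):
        raise ValueError(f"not an ISSN: {raw!r}")
    return candidate


print(normalize_issn("03785955"))
print(normalize_issn(" 2049-363x "))
```

Applying the same function to every row before export gives a consistent representation regardless of how the identifiers were stored.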


5.3.1.11 The metadata file should be supplied in alphabetical order by title to ensure
ease of checking and import by knowledge base developers.
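
Alphabetical ordering can be applied just before the file is written. The Python sketch below sorts case-insensitively on a field assumed to be named publication_title. The recommendation does not state a collation rule, so the case-insensitive comparison is an assumption, as are the sample titles.

```python
def sort_by_title(rows, title_field="publication_title"):
    """Return the rows ordered alphabetically by title, ignoring case."""
    return sorted(rows, key=lambda row: row[title_field].casefold())


rows_unsorted = [
    {"publication_title": "Zoology Today"},
    {"publication_title": "acta Botanica"},
    {"publication_title": "Botany Letters"},
]
for row in sort_by_title(rows_unsorted):
    print(row["publication_title"])
```

Titles with leading articles or accented characters may need a locale-aware comparison, which providers and recipients would have to agree on separately.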