CIB merge technical documentation (EN): CSV

4. Data supply

4.1. CSV

General
Notes for the usage of separation and special characters inside the CSV file
Notes on UTF-8 encoded data CSVs
Single CSV file
Multi CSV file

General

The CSV files are one or more text files containing the names of the input fields and the corresponding values. The abbreviation CSV stands for "Comma Separated Values".

The first line of the control file contains the so-called control record, which consists of field names separated by ";". The control record can contain any number of field names, limited only by the available working memory.

Each subsequent line contains exactly one data record. A data record contains the text modules or data to be inserted, also separated by ";", in the order of the field names. The number of entries in the data record must match the number of field names in the control record.
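The record layout described above can be checked before a merge run. The following is a minimal Python sketch, not part of CIB merge: the function name read_records and the sample field names are hypothetical, and the sketch assumes the default separator ";" and double quotation marks as the quoting character.

```python
import csv
import io


def read_records(text, separator=";"):
    """Parse a control record plus data records and check the field counts.

    Hypothetical helper for illustration only; it mirrors the rule that every
    data record must have as many entries as the first line has field names.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=separator, quotechar='"')
    rows = [row for row in reader if row]  # ignore completely empty lines
    if not rows:
        raise ValueError("the file contains no control record")

    field_names = rows[0]
    records = []
    for number, row in enumerate(rows[1:], start=1):
        if len(row) != len(field_names):
            raise ValueError(
                f"data record {number} has {len(row)} entries, "
                f"expected {len(field_names)}"
            )
        # Map each value to its field name, in the order of the first line.
        records.append(dict(zip(field_names, row)))
    return field_names, records


sample = "FirstName;LastName\nAda;Lovelace\nAlan;Turing\n"
names, records = read_records(sample)
print(names)                   # ['FirstName', 'LastName']
print(records[1]["LastName"])  # Turing
```

A data record with too few or too many entries raises a ValueError in this sketch. How CIB merge itself reports such a mismatch is not covered here.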

CIB merge can process a single CSV file or multi-CSV files.

Usage with CIB merge:

The parameter -d<Dataset source> sets the CSV file for CIB merge, see chapter Parameter –d.

Notes for the usage of separation and special characters inside the CSV file

If a text passage to be inserted contains a semicolon, a tab character or a quotation mark, the entire text passage must be enclosed in quotation marks. Quotation marks inside the text passage must then be doubled. For example, to insert the company name Laundry "Weißer Riese" into a raw text, the entry in the control file must look like this: ;"Laundry ""Weißer Riese""";
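The quoting rule can be applied mechanically when data records are generated by a program. The following Python sketch is an illustration under stated assumptions: the helper names quote_field and build_record are hypothetical, and the separator is assumed to be the default ";".

```python
def quote_field(value, separator=";"):
    """Quote a value if it contains the separator, a tab or a quotation mark.

    Hypothetical helper: wraps the whole value in quotation marks and
    doubles every quotation mark inside it.
    """
    if any(ch in value for ch in (separator, "\t", '"')):
        return '"' + value.replace('"', '""') + '"'
    return value


def build_record(values, separator=";"):
    """Join already-quoted values into one data record line."""
    return separator.join(quote_field(v, separator) for v in values)


print(build_record(["42", 'Laundry "Weißer Riese"', "Munich"]))
# 42;"Laundry ""Weißer Riese""";Munich
```

The sketch uses a hand-written function rather than Python's csv.writer, because the default quoting mode of csv.writer does not quote a tab character when the separator is ";".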

CIB merge can also use the -T parameter to apply a different separator than ";" to the CSV files, see chapter Parameter -T.

The dataset can also be provided in a separate control file, which can be set in CIB merge using the parameter -h, see chapter Parameter -h.

Notes on UTF-8 encoded data CSVs

To merge UTF-8 encoded data files correctly with CIB merge, the following steps are necessary:

1. Add the following parameter to the CIB merge par file:
-putf-8
This parameter indicates that the data files are encoded in UTF-8 format.

2. Remove the "byte order mark" (BOM) from the data files.

Because the parameter described in step 1 makes CIB merge interpret all characters in the data file according to the UTF-8 character set, a BOM is interpreted as well. This results in error messages during processing. For this reason, all BOMs must be removed from the data files used.
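Removing the BOM can be scripted as a preparation step. The following Python sketch is one possible approach and not a CIB merge feature; the function name strip_utf8_bom is hypothetical. It works on raw bytes, so the rest of the file is left unchanged.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False if the file
    did not start with one (the file is then left untouched).
    """
    file = Path(path)
    data = file.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without the three BOM bytes EF BB BF.
    file.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling strip_utf8_bom on each data file before the merge run is sufficient. Calling it a second time on the same file returns False and changes nothing.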

Single CSV file

Description

In a single CSV file, the input fields are assigned directly to their values. A value is accessed in the document module directly by its field name.

Syntax	Example
Headline	FieldName1;FieldName2
1. DataLine	Value11; Value12
…	...
n. DataLine	ValueN1; ValueN2

Multi CSV file

Description:

A multi CSV file can be used to manage several CSV files. It contains the names of all CSV files to be loaded in the current merge process. Using the fields in the header of the multi CSV file, each CSV file is assigned an alias name, which can then be used to access these CSV files in the document.

Usage with CIB merge

For a multi CSV data supply, in addition to the parameter -d with the multi CSV file, the parameter -c must also be set, see chapter Parameter -c.

Syntax	Example
Header with alias names	Table1; Table2
All involved CSV file names	Tab1.csv; Tab2.csv

Syntax
Tab1.csv	Tab2.csv
CSVName1; CSVName2	CSVName1; CSVName2
Value11; Value12	Value11; Value12
ValueN1; ValueN2	ValueN1; ValueN2
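To illustrate how the alias names relate to the individual CSV files, the following Python sketch reads a multi CSV file and loads every referenced file under its alias. It only mimics the structure described here and makes no claim about how CIB merge resolves the files internally. The function name load_multi_csv is hypothetical. The sketch assumes that file names are relative to the folder of the multi CSV file, that spaces around alias and file names are ignored, and that all files are UTF-8 encoded without a BOM.

```python
import csv
from pathlib import Path


def load_multi_csv(multi_csv_path, separator=";"):
    """Return a dict that maps each alias name to the rows of its CSV file.

    Hypothetical helper: line 1 of the multi CSV file holds the alias
    names, line 2 holds the CSV file names in the same order.
    """
    multi_path = Path(multi_csv_path)
    with multi_path.open(newline="", encoding="utf-8") as handle:
        rows = [row for row in csv.reader(handle, delimiter=separator) if row]
    if len(rows) < 2:
        raise ValueError("expected a header line and a line of file names")

    # Assumption: surrounding spaces are not part of the names.
    aliases = [alias.strip() for alias in rows[0]]
    file_names = [name.strip() for name in rows[1]]
    if len(aliases) != len(file_names):
        raise ValueError("number of alias names and file names differs")

    tables = {}
    for alias, file_name in zip(aliases, file_names):
        # Assumption: paths are relative to the multi CSV file's folder.
        csv_path = multi_path.parent / file_name
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # Each row becomes a dict keyed by the file's own field names.
            tables[alias] = list(csv.DictReader(handle, delimiter=separator))
    return tables
```

With the example files above, load_multi_csv("multi.csv")["Table1"] would return the data records of Tab1.csv, each as a dictionary keyed by the field names of that file. The name multi.csv is a placeholder.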

Advantages over XML:

Simple format
Simple 1-n relationship
Smaller file size