This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit a2ca858

Victor Lee authored and committed Oct 15, 2023
DOC-1839-loading-two-warnings
1 parent ac9b724 commit a2ca858

1 file changed, +16 -11 lines changed


modules/ddl-and-loading/pages/creating-a-loading-job.adoc

@@ -12,8 +12,10 @@ A line should contain only data values and separators, without extra whitespace.
 From a tabular view, each line of data is a row, and each row consists of a series of column values.
 
 Loading data is a two-step process.
-First, a loading job is defined.
-Next, the job is executed with a `RUN LOADING JOB` statement.
+
+. First, a loading job is defined with a `CREATE LOADING JOB` statement.
+. Next, the job is executed with a `RUN LOADING JOB` statement.
+
 These two statements, and the components of the loading job, are detailed below.
 
 The structure of a loading job will be presented hierarchically, top-down:
@@ -24,9 +26,8 @@ The structure of a loading job will be presented hierarchically, top-down:
 * `LOAD` statements, which can have several clauses
 
 [NOTE]
-====
-*All blank spaces are meaningful in string fields in CSV and JSON*. Either pre-process your data files to remove extra spaces, or use GSQL's token processing functions `gsql_trim`, `gsql_ltrim`, and `gsql_rtrim` (<<_token_functions>>).
-====
+*All blank spaces are meaningful in string fields in CSV and JSON*.
+Either pre-process your data files to remove extra spaces, or use GSQL's token processing functions `gsql_trim`, `gsql_ltrim`, and `gsql_rtrim` (<<_token_functions>>).
 
 [NOTE]
 User privileges for running loading jobs are treated as separate from privileges regarding reading and writing data to vertices and edges.
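Review note: the whitespace note in this hunk names pre-processing the data files as one of two options. A minimal sketch of such a pre-processing step in Python, assuming a comma-separated input; `trim_csv_fields` is a hypothetical helper name for illustration and is not part of GSQL or any TigerGraph tooling:

```python
import csv
import io


def trim_csv_fields(text, delimiter=","):
    """Return CSV text with leading/trailing blanks stripped from every field."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Strip each field so stray spaces do not become part of string values.
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


raw = "1, Alice ,  NYC\n2,Bob ,LA \n"  # made-up sample data
cleaned = trim_csv_fields(raw)
```

This strips every field unconditionally, so it is only appropriate when no column relies on intentional padding; otherwise the per-token functions named in the note are the more selective choice.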
@@ -49,9 +50,7 @@ Among its several duties, the RESTPP component manages loading jobs. There can b
 Furthermore, if the TigerGraph graph is distributed (partitioned) across multiple machine nodes, each machine's RESTPP-LOADER(s) can be put into action. Each RESTPP-LOADER only reads local input data files, but the resulting graph data can be stored on any machine in the cluster.
 
 [NOTE]
-====
 To maximize loading performance in a cluster, use at least two loaders per machine, and assign each loader approximately the same amount of data.
-====
 
 A concurrent-capable loading job can logically be separated into parts according to each file variable. When a concurrent-capable loading job is compiled, a xref:tigergraph-server:API:built-in-endpoints.adoc#_run_a_loading_job[RESTPP endpoint] is generated for the loading job, which you can call to load data into your graph as an alternative to `RUN LOADING JOB`.
 
@@ -70,11 +69,13 @@ Each statement in the block, including the last one, should end with a semicolon
 [source,gsql]
 ----
 CREATE LOADING JOB job_name FOR GRAPH Graph_Name {
-[zero or more DEFINE statements;]
-[zero or more LOAD statements;] | [zero or more DELETE statements;] <1>
+[zero or more DEFINE statements;] <1>
+[zero or more LOAD statements;] | [zero or more DELETE statements;] <2>
 }
 ----
-<1> A loading job may contain either `LOAD` or `DELETE` statements but not both.
+
+<1> While one loading job may define multiple data sources (files), keep the number below 100 for best performance.
+<2> A loading job may contain either `LOAD` or `DELETE` statements but not both.
 A loading job that includes both will be rejected when the `CREATE` statement is executed.
 
 === Loading data to global vertices and edges
@@ -132,6 +133,10 @@ The `DEFINE FILENAME` statement defines a filename variable.
 The variable can then be used later in the `JOB` block by a `LOAD` statement to identify its data source.
 Every concurrent loading job must have at least one `DEFINE FILENAME` statement.
 
+[NOTE]
+Having more than 100 file or folder sources will degrade performance.
+Consider either consolidating sources or splitting your work into separate loading jobs.
+
 [source,ebnf]
 ----
 DEFINE FILENAME filevar ["=" filepath_string ];
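Review note: the new note in this hunk recommends splitting work into separate loading jobs when there are too many sources. A rough sketch of how a list of source paths might be grouped before the jobs are written, assuming a cap of 99 sources per job to stay under the limit in both wordings used by this commit; `split_sources` and the file names are hypothetical:

```python
def split_sources(paths, max_per_job=99):
    """Group source paths into batches of at most max_per_job entries."""
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    return [paths[i:i + max_per_job] for i in range(0, len(paths), max_per_job)]


# Hypothetical file names, used only to exercise the helper.
paths = [f"/data/part_{n:03d}.csv" for n in range(250)]
batches = split_sources(paths)
batch_sizes = [len(batch) for batch in batches]
```

Each batch would then back one loading job with its own set of `DEFINE FILENAME` statements.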
@@ -256,7 +261,7 @@ A basic principle in the GSQL Loader is cumulative loading. Cumulative loading m
 . Complex type: Depends on the field type or element type. Any invalid field (in `UDT`), element (in `LIST` or `SET`), key or value (in `MAP`) causes rejection.
 
 * *New data objects:* If a valid data object has a new ID value, then the data object is added to the graph store. Any attributes which are missing are assigned the default value for that data type or for that attribute.
-* *Overwriting existing data objects*: If a valid data object has a ID value for an existing object, then the new object overwrites the existing data object, with the following clarifications and exceptions:
+* *Overwriting existing data objects*: If a valid data object has an ID value for an existing object, then the new object overwrites the existing data object, with the following clarifications and exceptions:
 
 . The attribute values of the new object overwrite the attribute values of the existing data object.
 . *Missing tokens*: If a token is missing from the input line so that the generated attribute is missing, then that attribute retains its previous value.
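Review note: the cumulative-loading rules visible in this hunk can be modeled with a small Python sketch. It is a toy model, not the GSQL Loader's implementation, and it covers only the rules shown here (new IDs, overwriting, missing tokens); the list of exceptions continues beyond the hunk. `load_record` is a hypothetical name, and `None` stands in for a missing token:

```python
def load_record(store, record_id, attrs, defaults):
    """Apply one valid input record to `store` under the rules shown above.

    `attrs` maps attribute name -> value; None (or an absent key) models a
    missing token. `defaults` maps attribute name -> default value.
    """
    if record_id not in store:
        # New data object: missing attributes receive their default value.
        store[record_id] = {
            name: attrs[name] if attrs.get(name) is not None else default
            for name, default in defaults.items()
        }
    else:
        # Existing object: supplied values overwrite, missing tokens keep the old value.
        for name, value in attrs.items():
            if value is not None:
                store[record_id][name] = value
    return store


defaults = {"name": "", "age": 0}
store = {}
load_record(store, "v1", {"name": "Ann", "age": None}, defaults)
first = dict(store["v1"])
load_record(store, "v1", {"name": None, "age": 31}, defaults)
second = dict(store["v1"])
```

Here `first` shows the default filling in for the missing `age`, and `second` shows `name` surviving a later line that omits it.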
