Description
Overview.
The following configuration fetches JSON data from a series of HTTP responses.
It should output nine entries, but the current parser outputs only three.
in:
  type: http
  url: http://express.heartrails.com/api/json
  params:
    - {name: method, value: getStations}
    - {name: x, value: 135.0}
    - {name: y, value: "35"}
  cursor:
    request_parameter_cursor_name: name
    response_parameter_cursor_json_path: '$.response.station[0].next'
  parser:
    type: json
    root: '/response/station'
    flatten_json_array: true
out: {type: stdout}
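The cursor settings above can be read as a simple pagination loop: take the value at `$.response.station[0].next` from each response and send it back as the `name` parameter of the next request. The following is a minimal Python sketch of that loop, not the plugin's actual code. The `fetch` function and the `PAGES` table are hypothetical stand-ins for real HTTP calls, the bodies are trimmed copies of two responses from this report, and the stop condition (no page or no cursor) is an assumption.

```python
import json

# Hypothetical canned pages keyed by cursor value (None = first request).
# Bodies are trimmed to the fields the cursor logic needs.
PAGES = {
    None: '{"response":{"station":['
          '{"name":"日本へそ公園","next":"黒田庄"},'
          '{"name":"比延","next":"日本へそ公園"},'
          '{"name":"黒田庄","next":"本黒田"}]}}',
    "黒田庄": '{"response":{"station":[{"name":"黒田庄","next":"本黒田"}]}}',
}


def fetch(cursor):
    """Hypothetical fetcher: return the body for a cursor, or None if there is no page."""
    return PAGES.get(cursor)


def fetch_all_pages():
    """Follow the cursor and collect every response body as a separate document."""
    bodies = []
    cursor = None
    while True:
        body = fetch(cursor)
        if body is None:  # assumed stop condition: no further page
            break
        bodies.append(body)
        # Equivalent of the JSON path '$.response.station[0].next'
        cursor = json.loads(body)["response"]["station"][0].get("next")
        if cursor is None:  # assumed stop condition: no cursor in the response
            break
    return bodies


if __name__ == "__main__":
    print(len(fetch_all_pages()), "response bodies collected")
```

The point of the sketch is that each iteration yields its own complete JSON document, so a downstream parser receives several documents rather than one.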
Environment
- Embulk: v0.11.0
- embulk-input-http: forked version at commit 16a4adf94e60caddc0ce590e16332a7111043a4b
- embulk-parser-json: 0.4.0
The reason.
The results are split across multiple HTTP responses. Each response is an independent JSON document, much like a separate file.
However, they are delivered through a single FileInputInputStream that contains multiple input streams.
embulk-parser-json parses only the first input stream, so it outputs only three entries.
embulk-parser-jsonpath has the same issue.
For most plugins, the TransactionalFileInput has only one file (input stream), but the Embulk specification also supports multiple files (input streams).
In the latter case, the current implementation reads only the first file (input stream).
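The difference between reading only the first input stream and reading all of them can be illustrated with a small simulation. This is a Python sketch with in-memory streams, not Embulk's Java API; the function names are hypothetical and the station data is trimmed from the responses in this report. In Embulk itself, the equivalent of the loop in `parse_all` would presumably be advancing through the files of the FileInputInputStream (for example with its `nextFile()` method) until none remain.

```python
import io
import json


def make_streams():
    """Build two in-memory streams, one per HTTP response (trimmed sample data)."""
    first = {"response": {"station": [
        {"name": "日本へそ公園"}, {"name": "比延"}, {"name": "黒田庄"}]}}
    second = {"response": {"station": [{"name": "黒田庄"}]}}
    return [io.BytesIO(json.dumps(doc).encode("utf-8")) for doc in (first, second)]


def parse_first_only(streams):
    """Mimics the reported behavior: only the first stream is parsed."""
    return json.load(streams[0])["response"]["station"]


def parse_all(streams):
    """Parses every stream as its own JSON document and concatenates the records."""
    records = []
    for stream in streams:
        records.extend(json.load(stream)["response"]["station"])
    return records


if __name__ == "__main__":
    print(len(parse_first_only(make_streams())), "records from the first stream only")
    print(len(parse_all(make_streams())), "records from all streams")
```

With these two sample streams, the first function sees three records and the second sees four, which mirrors the gap between the actual and expected output described above.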
Execution results.
embulk-input-http issued six GET requests.
2023-08-24 09:29:13.973 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35"
2023-08-24 09:29:15.686 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E9%BB%92%E7%94%B0%E5%BA%84"
2023-08-24 09:29:15.754 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E6%9C%AC%E9%BB%92%E7%94%B0"
2023-08-24 09:29:15.799 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%88%B9%E7%94%BA%E5%8F%A3"
2023-08-24 09:29:15.840 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E4%B9%85%E4%B8%8B%E6%9D%91"
2023-08-24 09:29:15.986 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%B0%B7%E5%B7%9D"
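The `name` values in these URLs are percent-encoded UTF-8, so it is not obvious at a glance that each one is the cursor taken from the previous response. A small Python sketch using only the standard library can decode them; the `cursor_values` helper is a hypothetical name, and the URLs are copied from the log lines above.

```python
from urllib.parse import parse_qs, urlsplit

# URLs copied from the log lines above (requests 2 to 6, which carry a cursor).
LOGGED_URLS = [
    "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E9%BB%92%E7%94%B0%E5%BA%84",
    "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E6%9C%AC%E9%BB%92%E7%94%B0",
    "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%88%B9%E7%94%BA%E5%8F%A3",
    "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E4%B9%85%E4%B8%8B%E6%9D%91",
    "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%B0%B7%E5%B7%9D",
]


def cursor_values(urls):
    """Return the decoded 'name' query parameter of each URL."""
    return [parse_qs(urlsplit(url).query)["name"][0] for url in urls]


if __name__ == "__main__":
    for value in cursor_values(LOGGED_URLS):
        print(value)
```

The first decoded value is 黒田庄, which matches `station[0].next` in the first response shown in the curl output.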
Simulate with the curl command.
% curl -Lv 'http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35'
* Trying 35.75.165.181:80...
* Connected to express.heartrails.com (35.75.165.181) port 80 (#0)
> GET /api/json?method=getStations&x=135.0&y=35 HTTP/1.1
> Host: express.heartrails.com
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 24 Aug 2023 00:38:05 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 572
< Connection: keep-alive
< Server: nginx
< Expires: Thu, 01 Dec 1994 16:00:00 GMT
< Pragma: no-cache
< X-Runtime: 1
< ETag: "952bd603b2f475e0e56ae31927adb679"
< Cache-Control: private, max-age=0, must-revalidate
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: *
<
{"response":{"station":[{"name":"日本へそ公園","prefecture":"兵庫県","line":"JR加古川線","x":134.997633,"y":35.002069,"postal":"6770039","distance":"320m","prev":"比延","next":"黒田庄"},{"name":"比延","prefecture":"兵庫県","line":"JR加古川線","x":134.995733,"y":34.988773,"postal":"6770033","distance":"1310m","prev":"新西脇","next":"日本へそ公園"},{"name":"黒田庄","prefecture":"兵庫県","line":"JR加古川線","x":134.992522,"y":35.022689,"postal":"6790313","distance":"2620m","prev":"日本へそ公園","next":"本黒田"}]}}
* Connection #0 to host express.heartrails.com left intact
{
  "response": {
    "station": [
      {
        "name": "日本へそ公園",
        "prefecture": "兵庫県",
        "line": "JR加古川線",
        "x": 134.997633,
        "y": 35.002069,
        "postal": "6770039",
        "distance": "320m",
        "prev": "比延",
        "next": "黒田庄"
      },
      {
        "name": "比延",
        "prefecture": "兵庫県",
        "line": "JR加古川線",
        "x": 134.995733,
        "y": 34.988773,
        "postal": "6770033",
        "distance": "1310m",
        "prev": "新西脇",
        "next": "日本へそ公園"
      },
      {
        "name": "黒田庄",
        "prefecture": "兵庫県",
        "line": "JR加古川線",
        "x": 134.992522,
        "y": 35.022689,
        "postal": "6790313",
        "distance": "2620m",
        "prev": "日本へそ公園",
        "next": "本黒田"
      }
    ]
  }
}
% curl -Lv 'http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E9%BB%92%E7%94%B0%E5%BA%84'
* Trying 35.75.165.181:80...
* Connected to express.heartrails.com (35.75.165.181) port 80 (#0)
> GET /api/json?method=getStations&x=135.0&y=35&name=%E9%BB%92%E7%94%B0%E5%BA%84 HTTP/1.1
> Host: express.heartrails.com
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 24 Aug 2023 00:39:18 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 192
< Connection: keep-alive
< Server: nginx
< Expires: Thu, 01 Dec 1994 16:00:00 GMT
< Pragma: no-cache
< X-Runtime: 1
< ETag: "3be1f77accfba140aa48670d77eb6e97"
< Cache-Control: private, max-age=0, must-revalidate
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: *
<
{"response":{"station":[{"name":"黒田庄","prefecture":"兵庫県","line":"JR加古川線","x":134.992522,"y":35.022689,"postal":"6790313","prev":"日本へそ公園","next":"本黒田"}]}}
* Connection #0 to host express.heartrails.com left intact
{
  "response": {
    "station": [
      {
        "name": "黒田庄",
        "prefecture": "兵庫県",
        "line": "JR加古川線",
        "x": 134.992522,
        "y": 35.022689,
        "postal": "6790313",
        "prev": "日本へそ公園",
        "next": "本黒田"
      }
    ]
  }
}
....
Example output reproducing the issue
2023-08-24 18:00:12.915 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /Users/user/.embulk/lib/m2/repository
2023-08-24 18:00:12.918 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /Users/user/.embulk/lib/gems
2023-08-24 18:00:12.918 +0900 [INFO] (main): gem_path is set empty.
2023-08-24 18:00:12.918 +0900 [DEBUG] (main): Embulk system property "default_guess_plugin" is set to: "gzip,bzip2,json,csv"
2023-08-24 18:00:13.049 +0900 [INFO] (main): Started Embulk v0.11.0
2023-08-24 18:00:14.752 +0900 [INFO] (0001:transaction): Gem's home and path are set by system configs "gem_home": "/Users/user/.embulk/lib/gems", "gem_path": ""
2023-08-24 18:00:15.364 +0900 [INFO] (0001:transaction): Loaded JRuby runtime 9.4.2.0
2023-08-24 18:00:15.395 +0900 [INFO] (0001:transaction): Loaded plugin embulk/input/http from a load path
2023-08-24 18:00:15.487 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-stdout
2023-08-24 18:00:15.538 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-json
2023-08-24 18:00:15.687 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=16 / output tasks 8 = input tasks 1 * 8
2023-08-24 18:00:15.724 +0900 [INFO] (0001:transaction): {done: 0 / 1, running: 0}
2023-08-24 18:00:15.860 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35"
2023-08-24 18:00:15.985 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E9%BB%92%E7%94%B0%E5%BA%84"
2023-08-24 18:00:16.007 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E6%9C%AC%E9%BB%92%E7%94%B0"
2023-08-24 18:00:16.029 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%88%B9%E7%94%BA%E5%8F%A3"
2023-08-24 18:00:16.050 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E4%B9%85%E4%B8%8B%E6%9D%91"
2023-08-24 18:00:16.085 +0900 [INFO] (0015:task-0000): GET "http://express.heartrails.com/api/json?method=getStations&x=135.0&y=35&name=%E8%B0%B7%E5%B7%9D"
{"prefecture":"兵庫県","distance":"320m","line":"JR加古川線","next":"黒田庄","prev":"比延","x":134.997633,"y":35.002069,"postal":"6770039","name":"日本へそ公園"}
{"prefecture":"兵庫県","distance":"1310m","line":"JR加古川線","next":"日本へそ公園","prev":"新西脇","x":134.995733,"y":34.988773,"postal":"6770033","name":"比延"}
{"prefecture":"兵庫県","distance":"2620m","line":"JR加古川線","next":"本黒田","prev":"日本へそ公園","x":134.992522,"y":35.022689,"postal":"6790313","name":"黒田庄"}
2023-08-24 18:00:16.162 +0900 [INFO] (0001:transaction): {done: 1 / 1, running: 0}
2023-08-24 18:00:16.167 +0900 [INFO] (main): Committed.
2023-08-24 18:00:16.167 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}