
Commit c12d5e7

Improve data tools info (#72)
* update references and clean up some examples
* fix internal links
* link to docs when referencing sys fns
* capitalise sys fns in example
* unindent code with no output
* clean up ⎕MAP section
1 parent 519eb92 commit c12d5e7


1 file changed: +164 -46 lines changed


docs/Data.md

Lines changed: 164 additions & 46 deletions
@@ -1,6 +1,8 @@
11
# Data Input/Output
22
Although we have [`⎕IO`](http://help.dyalog.com/latest/#Language/System%20Functions/io.htm), "IO" in APL can still refer to input/output.
33

4+
This page refers to APL tools for reading and writing data to and from files, databases, and the internet. If you are already familiar with Python, R or .NET, you can use one of the [external language bridges](./Interfaces.md) to bring data from files into APL through one of these languages. However, in many cases it is simpler and faster to use one of the following tools.
5+
46
## Hello, World!
57
If you have seen any kind of computer programming before, you are probably aware of a famous program called ["Hello, World!"](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).
68

@@ -97,7 +99,7 @@ My great string 'which has some quoted text'
9799
```
98100

99101
!!! Note
100-
The user command `]Repr` can generate APL expressions which produce most arrays. In some sense, it is like an inverse to **execute** `⍎`. There is also a utility function `⎕SE.Dyalog.Utils.repObj` which can be used in code, but we do not recommend using it in applications; use the primitives to test the properties of arrays, as explained in [the sections on error handling](../Errors/#who-needs-to-know).
102+
The user command `]Repr` can generate APL expressions which produce most arrays. In some sense, it is like an inverse to **execute** `⍎`. There is also a utility function `⎕SE.Dyalog.Utils.repObj` which can be used in code, but we do not recommend using it in applications; use the primitives to test the properties of arrays, as explained in [the sections on error handling](./error-handling-and-debugging.md#who-needs-to-know).
101103

102104
### Convenient text output
103105
Once upon a time, APL was considered an incredible, revolutionary tool for scientists, artists and business people alike to be able to get work done using computers. In a time before spreadsheet software was so ubiquitous, APL terminals offered a way to quickly and easily process data, produce reports and format them for printing.
@@ -124,9 +126,9 @@ Take a look at the [Chapter F of Mastering Dyalog APL](https://www.dyalog.com/up
124126

125127
1.
126128

127-
In Dyalog version 18.0, `1200⌶` (*twelve hundred eye beam*) can convert date times into human readable formats according to some specification. For example:
129+
In Dyalog version 18.0, the experimental [`1200⌶`](https://help.dyalog.com/latest/#Language/I%20Beam%20Functions/Format%20Datetime.htm) (*twelve hundred eye beam*) function can format date-times as human-readable text according to a format specification. For example:
128130

129-
<pre><code class="language-APL"> 'Dddd Mmmm Doo YYYY'(1200⌶)1⎕dt⊂3↑⎕ts
131+
<pre><code class="language-APL"> 'Dddd Mmmm Doo YYYY'(1200⌶)1⎕DT⊂3↑⎕TS
130132
┌──────────────────────────┐
131133
│Wednesday August 12th 2020│
132134
└──────────────────────────┘</code></pre>
@@ -143,25 +145,93 @@ Take a look at the [Chapter F of Mastering Dyalog APL](https://www.dyalog.com/up
143145
│Wednesday August 12th 2020│
144146
└──────────────────────────┘</code></pre>
145147

146-
## Native Files
148+
## Importing code and data while developing
149+
The experimental `]Get` user command can be used in the interactive IDE to obtain code and data from the internet or local file system in various formats. For example:
150+
151+
- APL code from files, folders and online repositories like GitHub
152+
- Workspaces and text source shipped with the interpreter, for example dfns and HttpCommand
153+
- Text data including plain text, CSV, XML and JSON
154+
155+
`]Get` is a development tool intended as a one-stop utility for quickly bringing resources into the workspace while programming. Do not use it at run time, as exact results can vary. Instead, use precisely documented features like [`⎕JSON`](#json), [`⎕CSV`](#csv), [`⎕XML`](#xml), and [`⎕FIX`](./Code.md#fix) in combination with loading tools like [`⎕NGET`](#text-files), [`HttpCommand`](#downloading-data-from-the-internet), [`⎕SE.Link.Import`](./Code.md#link), etc.
156+
157+
Enter `]Get -?` into the interactive session to see more information.
158+
159+
## Downloading data from the internet
160+
[:fontawesome-brands-dyalog: HttpCommand User Guide](https://dyalog.github.io/HttpCommand/)
161+
162+
**HttpCommand** is a utility for making requests to interact with web services. The HttpCommand class is built on top of the [**Conga**](https://docs.dyalog.com/latest/Conga%20User%20Guide.pdf) framework for TCP/IP communications.
163+
164+
Load HttpCommand into the active workspace.
165+
166+
```APL
167+
]Get HttpCommand
168+
#.HttpCommand
169+
```
170+
171+
Make an HTTP GET request to receive plain text data.
172+
173+
```APL
174+
(HttpCommand.Get 'https://catfact.ninja/fact').Data
175+
{"fact":"Cats have about 130,000 hairs per square inch (20,155 hairs per square centimeter).","length":83}
176+
```
177+
178+
The GetJSON method automatically converts JSON payloads to APL namespaces. Remember to specify the HTTP method (`'GET'` in the following example).
179+
180+
```APL
181+
(HttpCommand.GetJSON 'GET' 'https://catfact.ninja/fact').Data.fact
182+
There are approximately 60,000 hairs per square inch on the back of a cat and about 120,000 per square inch on its underside.
183+
```
184+
185+
The result of a call to an HttpCommand method is a namespace including information about the request and its response.
186+
187+
```APL
188+
r←HttpCommand.Get 'https://catfact.ninja/fact'
189+
r.(HttpStatus HttpMessage)
190+
┌───┬──┐
191+
│200│OK│
192+
└───┴──┘
193+
```
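Because the result namespace carries the status, a caller can check `HttpStatus` before decoding the payload. The following is a minimal sketch, not a built-in HttpCommand feature: the function name `catFact` is hypothetical, and treating every status other than 200 as a failure is a simplifying assumption. It parses the response text with `0⎕JSON` instead of calling `GetJSON`.

```APL
catFact←{200≠⍵.HttpStatus:'Request failed: ',⍕⍵.HttpStatus ⋄ (0⎕JSON ⍵.Data).fact} ⍝ hypothetical helper
catFact HttpCommand.Get 'https://catfact.ninja/fact'
```

The guard returns a short message for any non-200 status; otherwise the JSON text is converted to a namespace and its `fact` member is returned.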
194+
195+
Using `HttpCommand` together with [`⎕FIX`](./Code.md#fix) is one way to download APL code from the internet and define it in the workspace.
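A minimal sketch of that approach, assuming a hypothetical URL that serves the plain-text source of a traditional function named `Double`. `⎕FIX` accepts a vector of character vectors, one per source line, so the response text is split on line-feed and carriage-return characters first. Partitioning with `⊆` also drops empty lines.

```APL
r←HttpCommand.Get 'https://example.com/Double.aplf' ⍝ hypothetical URL serving plain-text APL source
lines←(~r.Data∊⎕UCS 10 13)⊆r.Data                   ⍝ split the text on LF and CR characters
⎕FIX lines                                          ⍝ define the function in the workspace
```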
196+
197+
## Native files
147198
The term "Native Files" refers to any type of file on a hard disk. These can be text or media files, or even executable files. Usually we are interested in various kinds of text files.
148199

200+
### Text files
201+
[:material-web: Read Text File `⎕NGET` documentation](https://help.dyalog.com/latest/#Language/System%20Functions/nget.htm)
202+
[:material-web: Write Text File `⎕NPUT` documentation](https://help.dyalog.com/latest/#Language/System%20Functions/nput.htm)
203+
204+
Generally, the [`⎕N...`](#binary-files-and-other-arbitrary-file-types) family of system functions is for reading and writing *native files*. `⎕NGET` and `⎕NPUT` are convenient for reading and writing whole text files without having to tie and untie them.
205+
206+
```APL
207+
(⊂words)⎕NPUT'data/words.txt' ⍝ Write words to a Unicode text file
208+
(content encoding newline)←⎕NGET'data/words.txt' ⍝ Read the text together with its encoding and line endings
209+
words←⊃⎕NGET'data/words.txt' 1 ⍝ Read the file as a vector of lines
210+
```
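A short round trip, assuming a writable `data` folder already exists and using a made-up list of words. A `1` after the file name lets `⎕NPUT` replace an existing file, and the `1` flag to `⎕NGET` returns one character vector per line.

```APL
fruit←'apple' 'banana' 'cherry'       ⍝ made-up sample data
(⊂fruit)⎕NPUT'data/fruit.txt' 1      ⍝ 1: replace the file if it already exists
lines←⊃⎕NGET'data/fruit.txt' 1       ⍝ 1: return one character vector per line
```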
211+
149212
### ⎕CSV
150-
Comma separated values are a very common and convenient . While we encourage you to [read the documentation](https://help.dyalog.com/latest/#Language/System Functions/csv.htm) for a full description, here is an overview of features of `⎕CSV`:
213+
[:material-web: Comma Separated Values documentation](https://help.dyalog.com/latest/#Language/System%20Functions/csv.htm)
214+
[:material-video: Parsing content from text files using `⎕CSV`](https://www.youtube.com/watch?v=AHoiROI15BA)
215+
216+
The Comma Separated Values system function `⎕CSV` reads tabular data from .csv files into APL matrices, and writes matrices back to files. Here are some features of `⎕CSV`:
151217

152218
- Read data from and write data to files directly
153219
```APL
154-
data ← ⎕CSV '/path/to/file.csv'
220+
data ← ⎕CSV '/path/to/file.csv' ⍝ Read from file.csv
221+
data ⎕CSV '/path/to/file.csv' ⍝ Write to file.csv
155222
```
156223
- Separate the header (first row) from the rest of the data
157224
```APL
158225
(data header) ← ⎕CSV '/path/to/file.csv' ⍬ ⍬ 1
159226
```
160-
- Treat specific columns of input as numeric or text, depending on the options provided.
161-
The `4` here indicates to convert numeric values if possible, else keep the value as text.
227+
- Import specific columns as numbers or characters, depending on the options provided.
228+
162229
```APL
163230
numeric_if_possible ← ⎕CSV '/path/to/file.csv' ⍬ 4
164231
```
232+
233+
The `4` in this example means: convert each value to a number if possible, otherwise keep it as text.
234+
165235
- Use a separator other than commas with the "Separator" variant option, for example tabs (`⎕UCS 9`) for Tab Separated Values (.tsv).
166236
```APL
167237
tsv ← ⎕CSV⍠'Separator' (⎕UCS 9)⊢'/path/to/file.tsv'
@@ -205,15 +275,47 @@ Comma separated values are a very common and convenient . While we encourage you
205275
*[CSV]: Comma Separated Values
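`⎕CSV` can also parse records that are already in the workspace. A minimal sketch, assuming the documented form in which the source is a vector of character vectors (one per record) and the encoding is left empty (`⍬`); the records are made up. It combines the column type `4` and the header flag described above.

```APL
records←'name,age' 'David,42' 'Sandra,42' ⍝ made-up sample records
(data header)←⎕CSV records ⍬ 4 1         ⍝ 4: numbers where possible; 1: first record is the header
```

Here `header` should hold the column names and `data` a 2-row matrix whose second column is numeric.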
206276

207277
### ⎕JSON
208-
JSON is not only a convenient way to represent nested data structures, but also a convenient data representation for the modern web since it is natively handled by JavaScript. `⎕JSON` converts between APL arrays, including namespaces and text vector representations of JSON.
278+
[:material-web: JSON Convert `⎕JSON` documentation](https://help.dyalog.com/latest/#Language/System%20Functions/json.htm)
279+
[:material-video: `⎕JSON` Table Support](https://dyalogprod.gos.dyalog.com/video-library/watch/?v=UHJHqCdUs8w)
209280

210-
```APL
211-
'ns'⎕NS⍬
212-
ns.var←1 2 3
213-
ns.char←'abc'
214-
⎕JSON ns
215-
{"char":"abc","var":[1,2,3]}
216-
```
281+
`⎕JSON` translates JavaScript Object Notation (JSON) text to and from APL arrays.
282+
283+
- JSON arrays (lists) can be represented as APL vectors.
284+
285+
```APL
286+
1⎕JSON (1 2 3)'ABCD'
287+
[[1,2,3],"ABCD"]
288+
```
289+
290+
- Objects can be represented as APL namespaces.
291+
292+
```APL
293+
0⎕JSON '{"name":"David", "age": 42}'
294+
#.[JSON object]
295+
```
296+
297+
- Both can be represented as a matrix of depth, name, value and type columns somewhat similar to that used by [`⎕XML`](#xml).
298+
299+
```APL
300+
0 (⎕JSON ⎕OPT'Format' 'M')'[{"name":"David", "age": 42}, {"name": "Sandra", "age": 42}]'
301+
┌─┬────┬──────┬─┐
302+
│0│ │ │2│
303+
├─┼────┼──────┼─┤
304+
│1│ │ │1│
305+
├─┼────┼──────┼─┤
306+
│2│name│David │4│
307+
├─┼────┼──────┼─┤
308+
│2│age │42 │3│
309+
├─┼────┼──────┼─┤
310+
│1│ │ │1│
311+
├─┼────┼──────┼─┤
312+
│2│name│Sandra│4│
313+
├─┼────┼──────┼─┤
314+
│2│age │42 │3│
315+
└─┴────┴──────┴─┘
316+
```
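Members of an imported object are ordinary namespace members, so they can be read and modified with dot syntax and then exported again with `1⎕JSON`. A minimal sketch using the same sample object as above:

```APL
      person←0⎕JSON'{"name":"David", "age": 42}'
      person.name
David
      person.age+1
43
      person.age←43
      json←1⎕JSON person ⍝ JSON text for the modified namespace
```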
317+
318+
JSON is a convenient way to represent nested data structures, and it is also a common data format on the web because JavaScript handles it natively.
217319

218320
A JSON object in Dyalog uses dot-syntax to access members. Some JSON object keys are invalid APL names, so Dyalog works around this using special characters:
219321
```APL
@@ -242,21 +344,28 @@ Using `⎕JSON`, we can also [display error information in a human-readable form
242344
*[JSON]: JavaScript Object Notation
243345

244346
### ⎕XML
245-
XML is a format that has fallen out of favour in recent years, but is still useful to be able to import and export it easily when you need to.
347+
[:material-web: XML Convert `⎕XML` documentation](https://help.dyalog.com/latest/#Language/System%20Functions/xml.htm)
246348

247-
*[XML]: Extensible Markup Language
248-
249-
### Text Files
250-
Generally the `⎕N...` family of system functions are for reading and writing *native files* as described in the documentation. `⎕NGET` and `⎕NPUT` are useful for reading and writing text files without having to tie and untie them.
349+
`⎕XML` converts between XML character vectors and a nested matrix with columns for node depth, tag name, value, attribute name/value pairs and a markup description.
251350

252351
```APL
253-
(⊂words)⎕NPUT'data/words.txt' ⍝ Write words to a unicode text file
254-
(content encoding newline)←⎕NGET'data/words.txt' ⍝ Read words from a unicode text file
255-
words←(⎕UCS newline)((~∊⍨)⊆⊢)content ⍝ Split words on each new line
352+
⎕XML'<name born="1920">Ken</name><name born="1925">Jean</name>'
353+
┌─┬────┬────┬───────────┬─┐
354+
│0│name│Ken │┌────┬────┐│5│
355+
│ │ │ ││born│1920││ │
356+
│ │ │ │└────┴────┘│ │
357+
├─┼────┼────┼───────────┼─┤
358+
│0│name│Jean│┌────┬────┐│5│
359+
│ │ │ ││born│1925││ │
360+
│ │ │ │└────┴────┘│ │
361+
└─┴────┴────┴───────────┴─┘
256362
```
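Each row of the `⎕XML` result describes one node, so ordinary indexing can pull out whole columns. A short sketch using the same XML as above (boxed display, as elsewhere on this page); it assumes each entry in the fourth column is the name/value matrix shown there.

```APL
      x←⎕XML'<name born="1920">Ken</name><name born="1925">Jean</name>'
      x[;3]           ⍝ element values
┌───┬────┐
│Ken│Jean│
└───┴────┘
      {⍵[1;2]}¨x[;4]  ⍝ value of the first attribute of each element
┌────┬────┐
│1920│1925│
└────┴────┘
```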
257363

258-
### ⎕N...
259-
This is a quick summary. For more details see [the Native Files cheat sheet](https://docs.dyalog.com/latest/CheatSheet%20-%20Native%20Files.pdf) and [system functions and variables A-Z](https://help.dyalog.com/latest/index.htm#Language/System%20Functions/Summary%20Tables/System%20Functions%20and%20Variables%20ColWise.htm) in the online documentation.
364+
*[XML]: Extensible Markup Language
365+
366+
### Binary files and other arbitrary file types
367+
[:fontawesome-solid-file-pdf: Native Files Cheat Sheet](https://docs.dyalog.com/latest/CheatSheet%20-%20Native%20Files.pdf)
368+
[:material-web: System Functions Categorised](https://help.dyalog.com/latest/#Language/System%20Functions/Summary%20Tables/System%20Functions%20Categorised.htm)
260369

261370
In the chapter on selecting from arrays there was [an example of reading a text file](../loops-and-recursion/#word-problems) using `⎕NGET`. Before Dyalog version 15.0, reading text files required a couple of extra steps. Some `⎕N...` native file functions are general and can be used to read and write any type of file. As a simple example, here we tie the file **words.txt**, read the data and store it in a variable, and finally untie the file.
262371

@@ -273,12 +382,21 @@ In the chapter on selecting from arrays there was [an example of reading a text
273382
```
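The same family of functions also handles binary data. A minimal sketch, assuming a hypothetical file `/tmp/bytes.bin` that does not exist yet: `⎕NCREATE` makes a new file, `⎕NAPPEND` writes values as a given `⎕DR` type, and `⎕NREAD` reads them back given a type, a count and a byte offset.

```APL
      tn←'/tmp/bytes.bin'⎕NCREATE 0 ⍝ hypothetical path; 0 asks for the next free tie number
      (1 2 3 ¯1)⎕NAPPEND tn 83      ⍝ append four 1-byte signed integers (⎕DR type 83)
      ⎕NSIZE tn                     ⍝ file size in bytes
4
      ⎕NREAD tn 83 4 0              ⍝ read 4 items of type 83, starting at byte 0
1 2 3 ¯1
      ⎕NUNTIE tn
```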
274383

275384
### ⎕MAP
276-
The memory mapping function `⎕MAP` associates a file on disk with an APL array in the workspace. This is useful if you are working with data that cannot fit inside the available workspace memory. One approach might be to read the data in chunks and process one chunk at a time (for example, see the "Records" variant option for `⎕CSV`). Another approach is to use `⎕MAP`.
385+
[:material-web: Map File `⎕MAP` documentation](https://help.dyalog.com/latest/index.htm#Language/System%20Functions/map.htm)
277386

278-
## Component files
279-
If it is only APL systems that need to store data, the most convenient and efficient way to store that data is in APL **component files**.
387+
The memory mapping function `⎕MAP` allows you to treat a file on disk as if it were a variable in the workspace. This is useful when working with data that does not fit in the available workspace memory. Instead of reading the data in chunks and processing one chunk at a time (for example, with the "Records" variant option for [`⎕CSV`](#csv)), you can map the file and work with it like an ordinary array.
388+
389+
```APL
390+
text ← 80 ¯1 ⎕MAP '/path/to/file.txt'
391+
```
392+
393+
You must specify the element type as a [Data Representation `⎕DR`](http://help.dyalog.com/latest/#Language/System%20Functions/Data%20Representation%20Monadic.htm) code that matches how the data is stored in the file.
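If you are unsure which type code to use, `⎕DR` reports the representation of a sample array. The codes below assume the Unicode edition with the default `⎕FR` of 645. Type 80 maps each byte to one character, so non-ASCII text in a UTF-8 file is not decoded; `⎕NGET` is the better choice for such files.

```APL
      ⎕DR 'abc'   ⍝ 1-byte characters
80
      ⎕DR 1 2 3   ⍝ 1-byte integers
83
      ⎕DR 1000    ⍝ 2-byte integers
163
      ⎕DR 100000  ⍝ 4-byte integers
323
      ⎕DR 1.5     ⍝ 8-byte floating point
645
```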
280394

281-
Here we will briefly look at the basic usage of component files. A full treatment of component files is provided in [Chapter N of Mastering Dyalog APL](https://www.dyalog.com/uploads/documents/MasteringDyalogAPL.pdf#page=557) and more information can be found in the [component file documentation](http://help.dyalog.com/latest/#Language/APL Component Files/Component Files.htm).
395+
## APL Component files
396+
[:fontawesome-solid-file-pdf: Chapter N of Mastering Dyalog APL](https://www.dyalog.com/uploads/documents/MasteringDyalogAPL.pdf#page=557)
397+
[:material-web: Component File documentation](https://help.dyalog.com/latest/#Language/APL%20Component%20Files/Component%20Files.htm)
398+
399+
If it is only APL systems that need to store data, the most convenient and efficient way to store that data is in APL **component files**.
282400

283401
System functions that deal with component files begin `⎕F...`.
284402

@@ -288,23 +406,23 @@ In Dyalog, component files have the extension **.dcf** (Dyalog Component File) a
288406
A component file may be exclusively tied (`⎕FTIE`) or have a shared tie (`⎕FSTIE`). With an exclusive tie, no other process may access the file.
289407

290408
```APL
291-
tn←'cfile'⎕FCREATE 0 ⍝ The file is exclusively tied
292-
⎕FUNTIE tn ⍝ The file is untied, it can now be used by other applications and processes
409+
tn←'cfile'⎕FCREATE 0 ⍝ The file is exclusively tied
410+
⎕FUNTIE tn ⍝ The file is untied; other applications and processes can now use it
293411
```
294412

295413
The next time we want to use this file, we can use `⎕FTIE` instead of `⎕FCREATE`. The right argument to these functions specifies a tie number (which can be different each time the file is tied), but with a right argument of `0` the next available tie number is used (component file tie numbers start at 1).
296414

297415
```APL
298-
tn←'cfile'⎕FTIE 0 ⍝ The file on disk is cfile.dcf, but this extension is assumed if not specified
416+
tn←'cfile'⎕FTIE 0 ⍝ The file on disk is cfile.dcf, but this extension is assumed if not specified
299417
```
300418

301419
The structure of a component file is analogous to a nested vector of arrays. We add new values by appending them to the end of a file.
302420

303421
```APL
304-
(3 3⍴⍳9)⎕FAPPEND tn
305-
(↑'Dave' 'Sam' 'Ellie' 'Saif')⎕FAPPEND tn
306-
nested←2 2⍴'this' 0 'that' (1 2 3)
307-
nested ⎕FAPPEND tn
422+
(3 3⍴⍳9)⎕FAPPEND tn
423+
(↑'Dave' 'Sam' 'Ellie' 'Saif')⎕FAPPEND tn
424+
nested←2 2⍴'this' 0 'that' (1 2 3)
425+
nested ⎕FAPPEND tn
308426
```
309427

310428
Each array stored in a component file (a *component*) is referred to by its index in the file (its *component number*), starting from 1 (not affected by `⎕IO`).
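Components are read back by number with `⎕FREAD`. A short sketch continuing the session above (tie number `tn`, three components appended to a new file, default `⎕IO` of 1). `⎕FSIZE` returns, among other values, the first component number and the number that the next appended component will get.

```APL
      ⎕FREAD tn 1               ⍝ read component 1
1 2 3
4 5 6
7 8 9
      2↑⎕FSIZE tn               ⍝ first component number, next component number
1 4
      (⌽3 3⍴⍳9)⎕FREPLACE tn 1   ⍝ overwrite component 1 with a new array
      ⎕FUNTIE tn
```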
@@ -366,19 +484,19 @@ If you are working on a system through which multiple users need to access the s
366484

367485
Multi-user access can mean manual access by actual human users, or automated access by separate computers or processes.
368486

369-
## Downloading data from the internet
370-
The **HttpCommand** class is built on top of the [**Conga**](https://docs.dyalog.com/latest/Conga%20User%20Guide.pdf) framework for TCP/IP communications. At the most basic level, it can be used to perform HTTP requests to retrieve data from servers.
487+
## SQL Databases
488+
[:fontawesome-solid-file-pdf: SQL Interface Guide](https://docs.dyalog.com/latest/SQL%20Interface%20Guide.pdf)
489+
490+
**SQAPL** ships with Dyalog and can be used out-of-the-box provided that a database is installed and a corresponding ODBC data source has been set up.
371491

372492
```APL
373-
]Get HttpCommand
374-
#.HttpCommand
375-
⍴(#.HttpCommand.Get 'https://google.com').Data
376-
14107
493+
'SQA'⎕CY'sqapl'
494+
SQA.Connect cid odbc_datasource_name sql_password sql_user
495+
SQA.Do cid 'USE my_database'
496+
SQA.Do cid 'SELECT * FROM my_table'
377497
```
378498

379-
Using `HttpCommand` with [`⎕FIX`](../Code/#fix) is a way to download APL code from the internet.
380-
381-
For more information, see [the online documentation for HttpCommand](https://dyalog.github.io/HttpCommand). Alternatively, there is documentation within the comments of the code for the HttpCommand class; simply use `)ed HttpCommand` or press <kbd>Shift+Enter</kbd> with the text cursor on the name in the session.
499+
Freely available ODBC drivers, such as the [MySQL ODBC Connector](https://dev.mysql.com/downloads/connector/odbc/) or the [MariaDB ODBC Connector](https://mariadb.com/kb/en/mariadb-connector-odbc/), are sufficient for most use cases. If you cannot find one that works with your particular hardware and software, Dyalog resells [Progress DataDirect ODBC drivers](http://www.datadirect.com/products/datadirect-connect/odbc-drivers), but these require a different, separately licensed version of SQAPL. Contact Dyalog sales if you need to use Progress DataDirect ODBC drivers.
382500

383501
## Problem set 13
384502
