
Commit 2d07d48

Split README into focused documentation files
The README was ~1370 lines mixing tutorial, reference, and configuration docs. Split the detailed content into focused files under docs/ while keeping the README as a concise entry point with links.
1 parent 845f34c commit 2d07d48


8 files changed (+1128 / -1117 lines)


README.md

Lines changed: 10 additions & 1117 deletions

docs/configuration.md

Lines changed: 436 additions & 0 deletions

docs/import.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Import

## Default import options

Every index supports a `default_import_options` setting that specifies the options used by default when importing:

```ruby
class PostsIndex < Chewy::Index
  index_scope Post.includes(:tags)
  default_import_options batch_size: 100, bulk_size: 10.megabytes, refresh: false

  field :name
  field :tags, value: -> { tags.map(&:name) }
end
```

See [import.rb](../lib/chewy/index/import.rb) for available options.
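
Defaults only matter when a call does not override them. Below is a minimal plain-Ruby sketch of that precedence. It assumes that options passed to `import` win over `default_import_options`; check `import.rb` to confirm this for your Chewy version. `DEFAULT_IMPORT_OPTIONS` and `effective_import_options` are hypothetical names. `10.megabytes` is written out in bytes because the sketch does not load ActiveSupport.

```ruby
# Hypothetical index-level defaults, mirroring the example above.
# 10.megabytes is spelled out because plain Ruby has no ActiveSupport.
DEFAULT_IMPORT_OPTIONS = {
  batch_size: 100,
  bulk_size: 10 * 1024 * 1024,
  refresh: false
}.freeze

# Assumption: per-call options take precedence, so they are merged on top
# of the defaults. Hash#merge returns a new Hash; the defaults stay intact.
def effective_import_options(call_options = {})
  DEFAULT_IMPORT_OPTIONS.merge(call_options)
end

# A one-off import with bigger batches keeps bulk_size and refresh as defaults.
options = effective_import_options(batch_size: 500)
```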

## Raw import

Raw import is another way to speed up imports. This feature is only available with the ActiveRecord adapter. Very often, instantiating ActiveRecord models is what consumes most of the CPU and RAM. Precious time is wasted converting, say, timestamps from strings and then serializing them back to strings. Chewy can instead operate directly on raw hashes of data obtained from the database. All you need to provide is a way to convert each hash into a lightweight object that mimics the behaviour of a normal ActiveRecord object.

```ruby
class LightweightProduct
  def initialize(attributes)
    @attributes = attributes
  end

  # Depending on the database, `created_at` might
  # be in different formats. In PostgreSQL, for example,
  # you might see the following format:
  #   "2016-03-22 16:23:22"
  #
  # Because Elasticsearch expects a different format,
  # one might do something like the following, just to avoid
  # an unnecessary String -> DateTime -> String conversion:
  #
  #   "2016-03-22 16:23:22" -> "2016-03-22T16:23:22Z"
  def created_at
    @attributes['created_at'].tr(' ', 'T') << 'Z'
  end
end

class ProductsIndex < Chewy::Index
  index_scope Product
  default_import_options raw_import: ->(hash) {
    LightweightProduct.new(hash)
  }

  # Elasticsearch's date type is `date`; a bare second argument
  # would be read as another field name.
  field :created_at, type: 'date'
end
```

You can also pass the `:raw_import` option to the `import` method explicitly.
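
`LightweightProduct` hand-writes one reader per field. Below is a minimal plain-Ruby sketch of a more generic wrapper that answers any column key of the raw row as a method. It assumes that indexed fields are read by calling methods named after them, as `field :created_at` does. `LightweightRecord` is a hypothetical name, not part of Chewy.

```ruby
# Hypothetical generic stand-in for an ActiveRecord object built from a raw
# database row (a Hash with String keys, as in the example above).
class LightweightRecord
  def initialize(attributes)
    @attributes = attributes
  end

  # Expose each column as a reader method, e.g. record.name -> row['name'].
  def method_missing(name, *args)
    key = name.to_s
    return @attributes[key] if args.empty? && @attributes.key?(key)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end

  # Same conversion as LightweightProduct#created_at: skips parsing the
  # timestamp into a Time object only to serialize it again.
  def created_at
    @attributes['created_at'].tr(' ', 'T') << 'Z'
  end
end

row = { 'id' => 1, 'name' => 'Chair', 'created_at' => '2016-03-22 16:23:22' }
record = LightweightRecord.new(row)
record.name       # => "Chair"
record.created_at # => "2016-03-22T16:23:22Z"
```

In an index, it would be wired up the same way as above: `default_import_options raw_import: ->(hash) { LightweightRecord.new(hash) }`. Unknown columns still raise `NoMethodError`, so typos surface instead of silently indexing `nil`.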

## Index creation during import

By default, when you run an import, Chewy checks whether the index exists and creates it if it is absent.
You can turn this off to reduce the number of requests sent to Elasticsearch.
To do so, set the `skip_index_creation_on_import` parameter to `true` in your `config/chewy.yml`.
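
Below is a small in-memory model of the flow described above, showing why skipping the check saves requests. `FakeCluster` and `import_docs` are hypothetical stand-ins, not Chewy or Elasticsearch APIs. Each method call on the cluster counts as one request.

```ruby
# Hypothetical in-memory stand-in for an Elasticsearch cluster that counts
# the requests it receives.
class FakeCluster
  attr_reader :requests

  def initialize
    @indexes = {}
    @requests = 0
  end

  def index_exists?(name)
    @requests += 1
    @indexes.key?(name)
  end

  def create_index(name)
    @requests += 1
    @indexes[name] = []
  end

  def bulk(name, docs)
    @requests += 1
    @indexes.fetch(name).concat(docs) # KeyError if the index is missing
  end
end

# Sketch of the import flow: unless skipping is configured, spend one request
# checking for the index and create it when it is absent, then send the bulk.
def import_docs(cluster, index_name, docs, skip_index_creation_on_import: false)
  unless skip_index_creation_on_import
    cluster.create_index(index_name) unless cluster.index_exists?(index_name)
  end
  cluster.bulk(index_name, docs)
end
```

Every import pays for the existence check unless it is skipped. In this model, importing into a missing index with skipping enabled fails with `KeyError`, so make sure the index already exists (for example, created by a reset) before turning the check off.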

## Skip record fields during import

You can use `ignore_blank: true` to skip fields whose values return `true` for `.blank?`:

```ruby
class CountriesIndex < Chewy::Index
  index_scope Country

  field :id
  field :cities, ignore_blank: true do
    field :id
    field :name
    field :surname, ignore_blank: true
    field :description
  end
end
```

### Default values for different types

By default, `ignore_blank` is `false` for every field type except `geo_point`.
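
Below is a flat, plain-Ruby sketch of what `ignore_blank` does to a document. `blank_value?` only approximates ActiveSupport's `blank?` (it uses `strip` instead of ActiveSupport's Unicode-aware whitespace check), and `build_document` is a hypothetical helper. A field is omitted only when its own `ignore_blank` flag is set and its value is blank. Other blank values are kept as-is.

```ruby
# Approximation of ActiveSupport's Object#blank? for plain Ruby: nil, false,
# whitespace-only strings, and empty collections count as blank.
def blank_value?(value)
  case value
  when nil, false then true
  when String then value.strip.empty?
  when Array, Hash then value.empty?
  else false
  end
end

# Hypothetical helper: builds a document from [name, value, ignore_blank]
# triples, dropping a field only if ignore_blank is set and its value is blank.
def build_document(fields)
  fields.each_with_object({}) do |(name, value, ignore_blank), doc|
    next if ignore_blank && blank_value?(value)

    doc[name] = value
  end
end

city = build_document([
  [:id, 1, false],
  [:name, 'Paris', false],
  [:surname, '', true],     # blank and ignore_blank: dropped
  [:description, nil, false] # blank but not ignored: kept as nil
])
# city == { id: 1, name: 'Paris', description: nil }
```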

## Journaling

You can record all actions performed on your indexes in a separate journal index in Elasticsearch.
Whenever you create, update, or destroy documents, the action is saved to this special index.
If you act on a batch of documents (e.g. during an index reset), the action is saved as a single record that includes the primary keys of every affected document.
A typical journal record looks like this:

```json
{
  "action": "index",
  "object_id": [1, 2, 3],
  "index_name": "...",
  "created_at": "<timestamp>"
}
```
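
Below is a sketch that builds one such record for a batch of three documents, making the "one record per batch" behaviour concrete. `journal_record` is a hypothetical helper. Formatting `created_at` as ISO 8601 is an assumption, since the record above only shows a placeholder.

```ruby
require 'json'
require 'time'

# Hypothetical helper producing a record shaped like the JSON above: one
# record per batch action, listing the primary keys of every affected doc.
def journal_record(action, index_name, ids, now: Time.now.utc)
  {
    'action' => action,
    'object_id' => ids,
    'index_name' => index_name,
    'created_at' => now.iso8601 # assumption: ISO 8601 timestamps
  }
end

record = journal_record('index', 'cities', [1, 2, 3],
                        now: Time.utc(2016, 3, 22, 16, 23, 22))
puts JSON.pretty_generate(record)
```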

This feature is turned off by default.
You can turn it on by setting the `journal` option to `true` in `config/chewy.yml`.

You can also pass this option when importing an index:

```ruby
CityIndex.import journal: true
```

Or set it as a default import option for an index:

```ruby
class CityIndex < Chewy::Index
  index_scope City
  default_import_options journal: true
end
```

You may be wondering why you need it. The answer is simple: to avoid losing data.

Imagine that you reset your index with zero downtime (into a separate index)
while someone keeps frequently updating the data (in the old index).
All of these actions are written to the journal index, so you'll be
able to apply them after the reset using the `Chewy::Journal` interface.
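
Below is a toy model of that scenario. It assumes journal records shaped like the JSON above, with an `action` of either `index` or `delete`; the exact action names Chewy writes are an assumption here. Hashes stand in for the database and for the freshly built index. `apply_journal` is a hypothetical helper, not the `Chewy::Journal` API.

```ruby
# Hypothetical replay: re-read each journaled id from the database and
# write it into the new index, or remove it if the action was a delete.
def apply_journal(index, journal, database)
  journal.each do |record|
    record['object_id'].each do |id|
      case record['action']
      when 'index' then index[id] = database.fetch(id)
      when 'delete' then index.delete(id)
      end
    end
  end
  index
end

database = { 1 => 'Paris', 2 => 'Berlin' }
new_index = database.dup # snapshot taken when the reset started

# Writes that happened during the reset, recorded in the journal.
database[3] = 'Rome'
database[1] = 'Paris, France'
database.delete(2)
journal = [
  { 'action' => 'index', 'object_id' => [3, 1] },
  { 'action' => 'delete', 'object_id' => [2] }
]

apply_journal(new_index, journal, database)
# new_index == { 1 => 'Paris, France', 3 => 'Rome' }
```

Without the replay, `new_index` would still contain the stale `'Paris'` and the deleted `'Berlin'`, which is exactly the data loss the journal guards against.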

When enabled, the journal can grow very large, so consider setting up a cron job
that cleans it periodically using the [`chewy:journal:clean` rake task](rake_tasks.md#chewyjournal).
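
Below is a plain-Ruby model of what a periodic cleanup does to the journal, under the assumption that cleaning means dropping records created before a cutoff time. `clean_journal` is a hypothetical helper; see the rake task documentation for the task's actual arguments.

```ruby
require 'time'

# Hypothetical cleanup: drop journal records created before the cutoff,
# keeping everything at or after it. `created_at` is assumed to be ISO 8601.
def clean_journal(records, older_than:)
  records.reject { |record| Time.iso8601(record['created_at']) < older_than }
end

records = [
  { 'action' => 'index', 'object_id' => [1], 'created_at' => '2016-03-20T00:00:00Z' },
  { 'action' => 'index', 'object_id' => [2], 'created_at' => '2016-03-23T00:00:00Z' }
]
kept = clean_journal(records, older_than: Time.utc(2016, 3, 22))
# kept contains only the record for object_id [2]
```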
