Commit 8a6b5f4

[TASK] update README and documentation in Settings.yaml
1 parent 6397454 commit 8a6b5f4

2 files changed

Lines changed: 224 additions & 3 deletions

Configuration/Settings.yaml

Lines changed: 3 additions & 0 deletions
@@ -2,10 +2,13 @@ Flowpack:
22
  ElasticSearch:
33
    ContentRepositoryAdaptor:
44

5+
      # API. name of the ElasticSearch index to use
56
      indexName: typo3cr
67

8+
      # we use batch indexing, in order to reduce the number of HTTP requests while indexing
79
      indexingBatchSize: 100
810

11+
      # configuration of the ElasticSearch logfile
912
      log:
1013
        backendOptions:
1114
          fileBackend:

README.md

Lines changed: 221 additions & 3 deletions
@@ -1,11 +1,229 @@
11
# TYPO3 Neos ElasticSearch Adapter
22

3+
Created by Sebastian Kurfürst; contributions by Karsten Dambekalns and Robert Lemke.
4+
35
This project connects the TYPO3 Content Repository (TYPO3CR) to ElasticSearch, enabling two
46
main functionalities:
57

6-
* Full-Text Indexing of Pages and other Documents (of course including the full content)
78
* finding Nodes in TypoScript / Eel by arbitrary queries
9+
* Full-Text Indexing of Pages and other Documents (of course including the full content)
10+
11+
12+
## Building up the Index
13+
14+
The node index is updated on the fly, but during development you will often need to rebuild it manually.
15+
16+
In case of a mapping update, you need to reindex all nodes. Don't worry about doing that in production;
17+
the system transparently creates a new index, fills it completely, and when everything worked,
18+
changes the index alias.
19+
20+
21+
```
22+
./flow nodeindex:build
23+
24+
# if during development, you only want to index a few nodes, you can use "limit"
25+
./flow nodeindex:build --limit 20
26+
27+
# in order to remove old, non-used indices, you should use this command from time to time:
28+
./flow nodeindex:cleanup
29+
```
30+
31+
32+
## Doing Arbitrary Queries
33+
34+
We'll first show how to do arbitrary ElasticSearch Queries in TypoScript. This is a more powerful
35+
alternative to FlowQuery. In the long run, we might be able to integrate this API back into FlowQuery,
36+
but for now it works well as-is.
37+
38+
Generally, ElasticSearch queries are done using the `ElasticSearch` Eel helper. In case you want
39+
to retrieve a *list of nodes*, you'll generally do:
40+
```
41+
nodes = ${ElasticSearch.query(site)....execute()}
42+
```
43+
44+
In case you just want to retrieve a *single node*, the form of a query is as follows:
45+
```
46+
nodes = ${q(ElasticSearch.query(site)....execute()).get(0)}
47+
```
48+
49+
All queries search underneath a certain subnode. In case you want to search "globally", you will
50+
search underneath the current site node (like in the example above).
51+
52+
Furthermore, the following operators are supported:
53+
54+
* `nodeType("Your.Node:Type")`
55+
* `exactMatch(key, value)`; supports simple types: `exactMatch('tag', 'foo')`, or node references: `exactMatch('author', authorNode)`
56+
* `sortAsc('propertyName')` and `sortDesc('propertyName')` -- can also be used multiple times, e.g. `sortAsc('tag').sortDesc('date')` will first sort by tag ascending, and then by date descending.
57+
* `limit(5)` -- only return five results. If not specified, ElasticSearch's default limit of 10 applies.
58+
* `from(5)` -- return the results starting from the 6th one
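
To make the ordering and paging semantics of `sortAsc`/`sortDesc`, `from` and `limit` concrete, here is a minimal Python sketch that emulates them on an in-memory list. It is an illustration only, not how the package or ElasticSearch implements them; the `posts` records and the `sort_then_page` function are hypothetical names invented for this example.

```python
from operator import itemgetter

# Hypothetical records standing in for indexed nodes (not real package data).
posts = [
    {"title": "a", "tag": "news", "date": "2014-01-01"},
    {"title": "b", "tag": "blog", "date": "2014-01-02"},
    {"title": "c", "tag": "news", "date": "2014-01-03"},
    {"title": "d", "tag": "blog", "date": "2014-01-04"},
]

DEFAULT_LIMIT = 10  # the ElasticSearch default mentioned in the list above


def sort_then_page(records, sort_keys, from_=0, limit=DEFAULT_LIMIT):
    """Sort by several keys in priority order, then slice like from()/limit().

    sort_keys is a list of (property_name, descending) tuples, highest
    priority first, mirroring a chain such as sortAsc('tag').sortDesc('date').
    """
    ordered = list(records)
    # Python's sort is stable, so applying the lowest-priority key first and
    # the highest-priority key last produces the combined ordering.
    for key, descending in reversed(sort_keys):
        ordered.sort(key=itemgetter(key), reverse=descending)
    # from_ is a zero-based offset: from_=5 starts at the 6th result.
    return ordered[from_:from_ + limit]


# Equivalent in spirit to sortAsc('tag').sortDesc('date')
keys = [("tag", False), ("date", True)]
all_titles = [p["title"] for p in sort_then_page(posts, keys)]
page_titles = [p["title"] for p in sort_then_page(posts, keys, from_=1, limit=2)]
```

Because `from` is an offset rather than a page number, a caller paging through results would pass `from_ = page_index * limit`.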
59+
60+
Furthermore, there is a more low-level operator which can be used to add arbitrary ElasticSearch filters:
61+
62+
* `queryFilter("filterType", {option1: "value1"})`
63+
64+
In order to debug the query more easily, the following operation is helpful:
65+
66+
* `log()` -- logs the full query on execution to the ElasticSearch log (i.e. `Data/Logs/ElasticSearch.log`)
67+
68+
### Example Queries
69+
70+
#### Finding all pages which are tagged in a special way and rendering them in an overview
71+
72+
Use Case: On a "Tag Overview" page, you want to show all pages tagged in a certain way.
73+
74+
Setup: You have two node types in a blog called `Acme.Blog:Post` and `Acme.Blog:Tag`, both
75+
inheriting from `TYPO3.Neos:Document`. The `Post` node type has a property `tags` which is
76+
of type `references`, pointing to `Tag` documents.
77+
78+
TypoScript setup:
79+
80+
```
81+
# for "Tag" documents, replace the main content area.
82+
prototype(TYPO3.Neos:PrimaryContent).acmeBlogTag {
83+
    condition = ${q(node).is('[instanceof Acme.Blog:Tag]')}
83+
    type = 'Acme.Blog:TagPage'
85+
}
86+
87+
# The "TagPage"
88+
prototype(Acme.Blog:TagPage) < prototype(TYPO3.TypoScript:Collection) {
89+
    collection = ${ElasticSearch.query(site).nodeType('Acme.Blog:Post').exactMatch('tags', node).sortDesc('creationDate').execute()}
89+
    itemName = 'node'
90+
    itemRenderer = Acme.Blog:SingleTag
92+
}
93+
prototype(Acme.Blog:SingleTag) < prototype(TYPO3.Neos:Template) {
94+
...
95+
}
96+
```
97+
98+
99+
## Fulltext Search / Indexing
100+
101+
When searching in a fulltext index, we want to show Pages, or, generally speaking, everything
102+
which is a `Document` node. However, the main content of a certain `Document` is often not stored
103+
in the node itself, but inside its (`Content`) child nodes.
104+
105+
This is why we need some special functionality for indexing, which *adds the content of the inner
106+
nodes* to the `Document` node they belong to, in the fields `__fulltext` and
107+
`__fulltextParts`.
108+
109+
Furthermore, we want a fulltext match inside a headline, for example, to be treated as *more important* than
110+
a match inside the normal body text. That's why the `Document` node not only contains one field with
111+
all the texts, but multiple "buckets" where text is added to: One field which contains everything
112+
deemed as "very important" (`__fulltext.h1`), one which is "less important" (`__fulltext.h2`),
113+
and finally one for the plain text (`__fulltext.text`). All of these fields add themselves to the
114+
ElasticSearch `_all` field, and are configured with different `boost` values.
115+
116+
In order to search this index, you can just search inside the `_all` field with an additional limitation
117+
of `__typeAndSupertypes` containing `TYPO3.Neos:Document`.
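
As a rough illustration of such a search, the following Python sketch builds a request body that matches a term in `_all` and restricts results to `Document` nodes. The exact query shape is an assumption: it uses the `filtered` query syntax of ElasticSearch 1.x (later versions replaced it with a `bool` query carrying a `filter` clause), and it assumes `__typeAndSupertypes` is indexed so that an exact `term` filter matches. The function name `build_fulltext_query` is hypothetical and not part of this package.

```python
import json


def build_fulltext_query(search_term):
    """Build a request body: match on _all, limited to Document nodes.

    Assumption: ElasticSearch 1.x "filtered" query syntax; the package
    itself does not ship this query.
    """
    return {
        "query": {
            "filtered": {
                # full-text part: searches the boosted fulltext buckets via _all
                "query": {"match": {"_all": search_term}},
                # restriction: only nodes whose type or supertypes include Document
                "filter": {
                    "term": {"__typeAndSupertypes": "TYPO3.Neos:Document"}
                },
            }
        }
    }


body = build_fulltext_query("neos")

if __name__ == "__main__":
    # Print the JSON that would be sent to the index's _search endpoint.
    print(json.dumps(body, indent=2))
```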
118+
119+
**Currently, this package does not contain a plugin for searching, though we might provide one later on.**
120+
121+
122+
## Advanced: Configuration of Indexing
123+
124+
**Normally, this does not need to be touched, as this package supports all TYPO3 Neos data types natively.**
125+
126+
Indexing of properties is configured at two places. The defaults per-data-type are configured
127+
inside `Flowpack.ElasticSearch.ContentRepositoryAdaptor.defaultConfigurationPerType` of `Settings.yaml`.
128+
Furthermore, this can be overridden using the `properties.[....].elasticSearch` path inside
129+
`NodeTypes.yaml`.
130+
131+
This configuration contains two parts:
132+
133+
* Underneath `mapping`, the ElasticSearch property mapping can be defined.
134+
* Underneath `indexing`, an Eel expression which preprocesses the value before indexing has to be
135+
specified. It has access to the current `value` and the current `node`.
136+
137+
Example (from the default configuration):
138+
```
139+
# Settings.yaml
140+
Flowpack:
141+
  ElasticSearch:
142+
    ContentRepositoryAdaptor:
143+
      defaultConfigurationPerType:
144+
145+
        # strings should, by default, not be included in the _all field; and
146+
        # indexing should just use their simple value.
147+
        string:
148+
          mapping:
149+
            type: string
150+
            include_in_all: false
151+
          indexing: '${value}'
152+
```
153+
154+
```
155+
# NodeTypes.yaml
156+
'TYPO3.Neos:Timable':
157+
  properties:
158+
    '_hiddenBeforeDateTime':
159+
      elasticSearch:
160+
161+
        # a date should be mapped differently, and in this case we want to use a date format which
162+
        # ElasticSearch understands
163+
        mapping:
164+
          type: date
165+
          include_in_all: false
166+
          format: 'date_time_no_millis'
167+
        indexing: '${(node.hiddenBeforeDateTime ? node.hiddenBeforeDateTime.format("Y-m-d\TH:i:s") + "Z" : null)}'
168+
```
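
To show what the `indexing` expression above produces, here is a small Python sketch of the same transformation: a date is rendered without milliseconds and suffixed with `Z`, and an unset value becomes null. The function name `to_es_date` is hypothetical. Note that, like the Eel expression, it appends `Z` without converting the time zone, so it assumes the value is already in UTC.

```python
from datetime import datetime


def to_es_date(value):
    """Mirror the Eel expression: "Y-m-d\\TH:i:s" plus "Z", or None if unset."""
    if value is None:
        return None
    # No time zone conversion happens here; the "Z" suffix is appended as-is.
    return value.strftime("%Y-%m-%dT%H:%M:%S") + "Z"


formatted = to_es_date(datetime(2014, 3, 1, 12, 30, 0))
missing = to_es_date(None)
```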
169+
170+
There are a few indexing helpers inside the `ElasticSearch` namespace which are usable inside the
171+
`indexing` expression. In most cases, you don't need to touch this, but they were needed to build up
172+
the standard indexing configuration:
173+
174+
* `ElasticSearch.buildAllPathPrefixes`: for a path such as `foo/bar/baz`, builds up a list of path
175+
prefixes, e.g. `['foo', 'foo/bar', 'foo/bar/baz']`.
176+
* `ElasticSearch.extractNodeTypeNamesAndSupertypes(NodeType)`: extracts a list of node type names for
177+
the passed node type and all of its supertypes
178+
* `ElasticSearch.convertArrayOfNodesToArrayOfNodeIdentifiers(array $nodes)`: convert the given nodes to
179+
their node identifiers.
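
The behaviour described for `buildAllPathPrefixes` can be sketched in Python as follows. This is an illustrative reimplementation based only on the example above, not the package's actual code; the function name is hypothetical, and dropping empty segments (for instance from a leading slash) is an assumption of this sketch.

```python
def build_all_path_prefixes(path):
    """Return every prefix of a slash-separated path, shortest first."""
    # Assumption: empty segments (e.g. from a leading "/") are ignored.
    segments = [segment for segment in path.split("/") if segment]
    return ["/".join(segments[:count]) for count in range(1, len(segments) + 1)]


prefixes = build_all_path_prefixes("foo/bar/baz")
```

Indexing all prefixes of a node path is what makes it possible to find every node underneath a given starting node with a single exact match on that starting path.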
180+
181+
182+
183+
## Advanced: Fulltext Indexing
184+
185+
In order to enable fulltext indexing, every `Document` node must be configured as a *fulltext root*. Thus,
186+
the following is configured in the default configuration:
187+
188+
```
189+
'TYPO3.Neos:Document':
190+
  elasticSearch:
190+
    fulltext:
191+
      isRoot: true
193+
```
194+
195+
A *fulltext root* contains all the *content* of its non-document children, such that when one searches
196+
inside these texts, the document itself is returned as result.
197+
198+
How the fulltext of a node property should be extracted is configured
199+
in `NodeTypes.yaml` at `properties.[propertyName].elasticSearch.fulltextExtractor`.
200+
201+
An example:
202+
203+
```
204+
'TYPO3.Neos.NodeTypes:Text':
205+
  properties:
205+
    'text':
206+
      elasticSearch:
207+
        fulltextExtractor: '${ElasticSearch.fulltext.extractHtmlTags(value)}'
208+
209+
'My.Blog:Post':
210+
  properties:
211+
    title:
212+
      elasticSearch:
213+
        fulltextExtractor: "${ElasticSearch.fulltext.extractInto('h1', value)}"
215+
```
216+
217+
218+
## Fulltext Searching / Search Plugin
219+
220+
There is currently no fulltext search plugin included, though we might add one later on.
221+
222+
223+
## Debugging
8224

9-
TODO Sebastian: this needs improved documentation!
225+
In order to understand what's going on, the following commands are helpful:
10226

11-
If you need this package, ping Sebastian Kurfuerst or Karsten Dambekalns and bug them to update documentation :-)
227+
* use `./flow nodeindex:showMapping` to show the currently defined ElasticSearch Mapping
228+
* use the `.log()` statement inside queries to dump them to the ElasticSearch Log
229+
* the logfile `Data/Logs/ElasticSearch.log` contains loads of helpful information.
