|
1 | 1 | # TYPO3 Neos ElasticSearch Adapter |
2 | 2 |
|
| 3 | +Created by Sebastian Kurfürst; contributions by Karsten Dambekalns and Robert Lemke. |
| 4 | + |
3 | 5 | This project connects the TYPO3 Content Repository (TYPO3CR) to ElasticSearch, enabling two
4 | 6 | main functionalities: |
5 | 7 |
|
6 | | -* Full-Text Indexing of Pages and other Documents (of course including the full content) |
7 | 8 | * finding Nodes in TypoScript / Eel by arbitrary queries |
| 9 | +* Full-Text Indexing of Pages and other Documents (of course including the full content) |
| 10 | + |
| 11 | + |
| 12 | +## Building up the Index |
| 13 | + |
| 14 | +The node index is updated on the fly, but during development you need to update it frequently. |
| 15 | + |
| 16 | +In case of a mapping update, you need to reindex all nodes. You can safely do that in production;
| 17 | +the system transparently creates a new index, fills it completely, and, once everything has worked,
| 18 | +switches the index alias to the new index.
| 19 | + |
| 20 | + |
| 21 | +``` |
| 22 | +./flow nodeindex:build |
| 23 | +
|
| 24 | + # if during development, you only want to index a few nodes, you can use "limit" |
| 25 | +./flow nodeindex:build --limit 20 |
| 26 | +
|
| 27 | + # to remove old, unused indices, run this command from time to time:
| 28 | +./flow nodeindex:cleanup |
| 29 | +``` |
| 30 | + |
| 31 | + |
| 32 | +## Doing Arbitrary Queries |
| 33 | + |
| 34 | +We'll first show how to do arbitrary ElasticSearch Queries in TypoScript. This is a more powerful |
| 35 | +alternative to FlowQuery. In the long run, we might be able to integrate this API back into FlowQuery, |
| 36 | +but for now it works well as-is. |
| 37 | + |
| 38 | +Generally, ElasticSearch queries are done using the `ElasticSearch` Eel helper. In case you want |
| 39 | +to retrieve a *list of nodes*, you'll generally do:
| 40 | +``` |
| 41 | +nodes = ${ElasticSearch.query(site)....execute()} |
| 42 | +``` |
| 43 | + |
| 44 | +In case you just want to retrieve a *single node*, the form of a query is as follows: |
| 45 | +``` |
| 46 | +node = ${q(ElasticSearch.query(site)....execute()).get(0)}
| 47 | +``` |
| 48 | + |
| 49 | +All queries search underneath a certain subnode. In case you want to search "globally", you will |
| 50 | +search underneath the current site node (like in the example above). |
| 51 | + |
| 52 | +Furthermore, the following operators are supported: |
| 53 | + |
| 54 | +* `nodeType("Your.Node:Type")` |
| 55 | +* `exactMatch(key, value)`; supports simple types: `exactMatch('tag', 'foo')`, or node references: `exactMatch('author', authorNode)` |
| 56 | +* `sortAsc('propertyName')` and `sortDesc('propertyName')` -- can also be used multiple times, e.g. `sortAsc('tag').sortDesc('date')` will first sort by tag ascending, and then by date descending.
| 57 | +* `limit(5)` -- only return five results. If not specified, ElasticSearch's default limit of 10 applies.
| 58 | +* `from(5)` -- return the results starting from the 6th one |
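To make the interplay of `from` and `limit` concrete, here is a minimal Python sketch of the paging semantics. It is not adapter code: the `page` function and the sample hit list are hypothetical and only mimic how a zero-based offset and a result limit select a window of hits.

```python
def page(hits, offset=0, limit=10):
    """Return the window of hits selected by a zero-based offset and a limit.

    The default limit of 10 mirrors the ElasticSearch default mentioned above.
    """
    return hits[offset:offset + limit]


# 20 hypothetical, already sorted hits: post-1 ... post-20
hits = [f"post-{i}" for i in range(1, 21)]

# from(5).limit(5) skips the first five hits and returns the 6th to 10th.
print(page(hits, offset=5, limit=5))
```

Combining both operators this way is the usual basis for paginated result lists.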
| 59 | + |
| 60 | +Furthermore, there is a more low-level operator which can be used to add arbitrary ElasticSearch filters: |
| 61 | + |
| 62 | +* `queryFilter("filterType", {option1: "value1"})` |
| 63 | + |
| 64 | +In order to debug the query more easily, the following operation is helpful: |
| 65 | + |
| 66 | +* `log()` -- logs the full query on execution to the ElasticSearch log (i.e. `Data/Logs/ElasticSearch.log`)
| 67 | + |
| 68 | +### Example Queries |
| 69 | + |
| 70 | +#### Finding all pages which are tagged in a special way and rendering them in an overview |
| 71 | + |
| 72 | +Use Case: On a "Tag Overview" page, you want to show all pages that are tagged in a certain way.
| 73 | + |
| 74 | +Setup: You have two node types in a blog called `Acme.Blog:Post` and `Acme.Blog:Tag`, both |
| 75 | +inheriting from `TYPO3.Neos:Document`. The `Post` node type has a property `tags` which is |
| 76 | +of type `references`, pointing to `Tag` documents. |
| 77 | + |
| 78 | +TypoScript setup: |
| 79 | + |
| 80 | +``` |
| 81 | + # for "Tag" documents, replace the main content area.
| 82 | +prototype(TYPO3.Neos:PrimaryContent).acmeBlogTag {
| 83 | +  condition = ${q(node).is('[instanceof Acme.Blog:Tag]')}
| 84 | +  type = 'Acme.Blog:TagPage'
| 85 | +}
| 86 | +
|
| 87 | + # The "TagPage"
| 88 | +prototype(Acme.Blog:TagPage) < prototype(TYPO3.TypoScript:Collection) {
| 89 | +  collection = ${ElasticSearch.query(site).nodeType('Acme.Blog:Post').exactMatch('tags', node).sortDesc('creationDate').execute()}
| 90 | +  itemName = 'node'
| 91 | +  itemRenderer = Acme.Blog:SingleTag
| 92 | +}
| 93 | +prototype(Acme.Blog:SingleTag) < prototype(TYPO3.Neos:Template) {
| 94 | +  ...
| 95 | +}
| 96 | +``` |
| 97 | + |
| 98 | + |
| 99 | +## Fulltext Search / Indexing |
| 100 | + |
| 101 | +When searching in a fulltext index, we want to show Pages, or, generally speaking, everything |
| 102 | +which is a `Document` node. However, the main content of a certain `Document` is often not stored |
| 103 | +in the node itself, but inside its (`Content`) child nodes. |
| 104 | + |
| 105 | +This is why we need some special functionality for indexing, which *adds the content of the inner
| 106 | +nodes* to the `Document` node they belong to, in two fields called `__fulltext` and
| 107 | +`__fulltextParts`.
| 108 | + |
| 109 | +Furthermore, a fulltext match inside a headline, for example, should count as *more important* than
| 110 | +a match inside the normal body text. That's why the `Document` node contains not just one field with
| 111 | +all the texts, but multiple "buckets" to which text is added: one field which contains everything
| 112 | +deemed as "very important" (`__fulltext.h1`), one which is "less important" (`__fulltext.h2`), |
| 113 | +and finally one for the plain text (`__fulltext.text`). All of these fields add themselves to the |
| 114 | +ElasticSearch `_all` field, and are configured with different `boost` values. |
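The effect of the boosted buckets can be illustrated with a toy Python sketch. It is not how ElasticSearch computes relevance: the boost values in `BOOSTS` and the `toy_score` function are hypothetical, and the sketch only shows why a hit in `h1` outranks the same hit in `text`.

```python
# Hypothetical boost values; the text above does not state the real ones.
BOOSTS = {"h1": 20.0, "h2": 12.0, "text": 1.0}


def toy_score(fulltext, term):
    """Sum the boosted number of exact word matches over all buckets."""
    term = term.lower()
    return sum(
        BOOSTS[bucket] * text.lower().split().count(term)
        for bucket, text in fulltext.items()
    )


# Two hypothetical documents: one matches in the headline, one in the body.
headline_hit = {"h1": "Neos search", "h2": "", "text": "An introduction"}
body_hit = {"h1": "Welcome", "h2": "", "text": "All about search in Neos"}

print(toy_score(headline_hit, "search") > toy_score(body_hit, "search"))
```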
| 115 | + |
| 116 | +In order to search this index, you can just search inside the `_all` field with an additional limitation |
| 117 | +of `__typeAndSupertypes` containing `TYPO3.Neos:Document`. |
| 118 | + |
| 119 | +**Currently, this package does not contain a plugin for searching, though we might provide one later on.**
| 120 | + |
| 121 | + |
| 122 | +## Advanced: Configuration of Indexing |
| 123 | + |
| 124 | +**Normally, this does not need to be touched, as this package supports all TYPO3 Neos data types natively.** |
| 125 | + |
| 126 | +Indexing of properties is configured at two places. The defaults per-data-type are configured |
| 127 | +inside `Flowpack.ElasticSearch.ContentRepositoryAdaptor.defaultConfigurationPerType` of `Settings.yaml`. |
| 128 | +Furthermore, this can be overridden using the `properties.[....].elasticSearch` path inside |
| 129 | +`NodeTypes.yaml`. |
| 130 | + |
| 131 | +This configuration contains two parts: |
| 132 | + |
| 133 | +* Underneath `mapping`, the ElasticSearch property mapping can be defined. |
| 134 | +* Underneath `indexing`, an Eel expression which preprocesses the value before indexing has to be |
| 135 | + specified. It has access to the current `value` and the current `node`. |
| 136 | + |
| 137 | +Example (from the default configuration): |
| 138 | +``` |
| 139 | + # Settings.yaml
| 140 | +Flowpack:
| 141 | +  ElasticSearch:
| 142 | +    ContentRepositoryAdaptor:
| 143 | +      defaultConfigurationPerType:
| 144 | +
| 145 | +        # strings should, by default, not be included in the _all field; and
| 146 | +        # indexing should just use their simple value.
| 147 | +        string:
| 148 | +          mapping:
| 149 | +            type: string
| 150 | +            include_in_all: false
| 151 | +          indexing: '${value}'
| 152 | +``` |
| 153 | + |
| 154 | +``` |
| 155 | + # NodeTypes.yaml
| 156 | +'TYPO3.Neos:Timable':
| 157 | +  properties:
| 158 | +    '_hiddenBeforeDateTime':
| 159 | +      elasticSearch:
| 160 | +
|
| 161 | +        # a date should be mapped differently, and in this case we want to use a date format which
| 162 | +        # ElasticSearch understands
| 163 | +        mapping:
| 164 | +          type: date
| 165 | +          include_in_all: false
| 166 | +          format: 'date_time_no_millis'
| 167 | +        indexing: '${(node.hiddenBeforeDateTime ? node.hiddenBeforeDateTime.format("Y-m-d\TH:i:s") + "Z" : null)}'
| 168 | +``` |
| 169 | + |
| 170 | +There are a few indexing helpers inside the `ElasticSearch` namespace which are usable inside the |
| 171 | +`indexing` expression. In most cases, you don't need to touch this, but they were needed to build up |
| 172 | +the standard indexing configuration: |
| 173 | + |
| 174 | +* `ElasticSearch.buildAllPathPrefixes`: for a path such as `foo/bar/baz`, builds up a list of path |
| 175 | + prefixes, e.g. `['foo', 'foo/bar', 'foo/bar/baz']`. |
| 176 | +* `ElasticSearch.extractNodeTypeNamesAndSupertypes(NodeType)`: extracts a list of node type names for
| 177 | +  the passed node type and all of its supertypes.
| 178 | +* `ElasticSearch.convertArrayOfNodesToArrayOfNodeIdentifiers(array $nodes)`: converts the given nodes to
| 179 | + their node identifiers. |
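As an illustration of what `ElasticSearch.buildAllPathPrefixes` produces, here is a minimal Python sketch of the same idea. It is not the helper's actual implementation; the function name `build_all_path_prefixes` is made up for this example, and it only covers relative paths like the one shown above.

```python
def build_all_path_prefixes(path):
    """Build every prefix of a slash-separated path, shortest first."""
    segments = path.split("/")
    return ["/".join(segments[:i]) for i in range(1, len(segments) + 1)]


print(build_all_path_prefixes("foo/bar/baz"))
# → ['foo', 'foo/bar', 'foo/bar/baz']
```

Indexing all prefixes of a node path is what makes "search underneath a certain node" a simple exact match on one of the prefixes.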
| 180 | + |
| 181 | + |
| 182 | + |
| 183 | +## Advanced: Fulltext Indexing |
| 184 | + |
| 185 | +In order to enable fulltext indexing, every `Document` node must be configured as a *fulltext root*. Thus,
| 186 | +the following is configured in the default configuration: |
| 187 | + |
| 188 | +``` |
| 189 | +'TYPO3.Neos:Document':
| 190 | +  elasticSearch:
| 191 | +    fulltext:
| 192 | +      isRoot: true
| 193 | +``` |
| 194 | + |
| 195 | +A *fulltext root* contains all the *content* of its non-document children, such that when one searches |
| 196 | +inside these texts, the document itself is returned as result. |
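The following Python sketch illustrates the fulltext-root idea on a toy node tree. It is not the package's indexer; the `Node` class and `collect_fulltext` function are hypothetical, and the sketch assumes that nested documents are skipped because they act as their own fulltext roots.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_document: bool = False
    text: str = ""
    children: list = field(default_factory=list)


def collect_fulltext(root):
    """Gather the text of a node and of all its non-document descendants."""
    parts = [root.text] if root.text else []
    for child in root.children:
        if child.is_document:
            continue  # assumption: a nested document is its own fulltext root
        parts.extend(collect_fulltext(child))
    return parts


page = Node("page", is_document=True, children=[
    Node("headline", text="Welcome"),
    Node("column", children=[Node("text", text="Hello world")]),
    Node("subpage", is_document=True, children=[Node("text", text="Other page")]),
])

print(collect_fulltext(page))
# → ['Welcome', 'Hello world']
```

A search for "Hello" would then match the text collected on `page`, so the document itself is returned rather than the inner content node.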
| 197 | + |
| 198 | +How the fulltext of a node property should be extracted is configured
| 199 | +in `NodeTypes.yaml` at `properties.[propertyName].elasticSearch.fulltextExtractor`.
| 200 | + |
| 201 | +An example: |
| 202 | + |
| 203 | +``` |
| 204 | +'TYPO3.Neos.NodeTypes:Text':
| 205 | +  properties:
| 206 | +    'text':
| 207 | +      elasticSearch:
| 208 | +        fulltextExtractor: '${ElasticSearch.fulltext.extractHtmlTags(value)}'
| 209 | +
|
| 210 | +'My.Blog:Post':
| 211 | +  properties:
| 212 | +    title:
| 213 | +      elasticSearch:
| 214 | +        fulltextExtractor: "${ElasticSearch.fulltext.extractInto('h1', value)}"
| 215 | +``` |
| 216 | + |
| 217 | + |
| 218 | +## Fulltext Searching / Search Plugin |
| 219 | + |
| 220 | +There is currently no fulltext search plugin included, though we might add one later on.
| 221 | + |
| 222 | + |
| 223 | +## Debugging |
8 | 224 |
|
9 | | -TODO Sebastian: this needs improved documentation! |
| 225 | +In order to understand what's going on, the following commands are helpful: |
10 | 226 |
|
11 | | -If you need this package, ping Sebastian Kurfuerst or Karsten Dambekalns and bug them to update documentation :-) |
| 227 | +* use `./flow nodeindex:showMapping` to show the currently defined ElasticSearch Mapping |
| 228 | +* use the `.log()` statement inside queries to dump them to the ElasticSearch Log |
| 229 | +* the logfile `Data/Logs/ElasticSearch.log` contains loads of helpful information. |