Tuesday, March 28, 2017

Elasticsearch learning log



Elasticsearch
    1. Usages:
      1. autocomplete suggestions
      2. collecting log or transaction data that you want to analyze and mine for trends, statistics, summaries, or anomalies
      3. a price-alerting platform that lets price-savvy customers specify rules (e.g. notify me if an item’s price drops below a threshold)
      4. analytics/business-intelligence needs where you want to quickly investigate, analyze, visualize, and ask ad-hoc questions
    1. Make sure that you don’t reuse the same cluster name in different environments
    2. A Node is identified by a name, which by default is a random Universally Unique IDentifier (UUID) assigned to the node at startup. You can set any node name you want if you don’t want the default.
    3. An Index is a collection of documents that have somewhat similar characteristics.
    4. Within an index, you can define one or more Types.
    5. A Document is a basic unit of information that can be indexed.
    6. Shards & Replicas: you may change the number of replicas dynamically at any time, but you cannot change the number of shards after the index is created (see the example below)
      1. Default: 5 primary shards & 1 replica (i.e. two copies of every shard)
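      2. A minimal sketch of setting these at index creation time and changing replicas later (index name and counts are made up for illustration):
         curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
         { "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }'
         curl -XPUT 'localhost:9200/my_index/_settings?pretty' -H 'Content-Type: application/json' -d'
         { "index": { "number_of_replicas": 2 } }'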
    1. Pass
    1. curl -XGET 'localhost:9200/_cat/indices?v&pretty'
    2. curl -XGET 'localhost:9200/_cat/nodes?v&pretty'
    3. curl -XGET 'localhost:9200/_cat/health?v&pretty'
    4. curl -XDELETE 'localhost:9200/customer?pretty'
    5. <REST Verb> /<Index>/<Type>/<ID>
    6. curl -XPUT 'localhost:9200/customer/external/1?pretty' -H 'Content-Type: application/json' -d' { "name": "John Doe" } '
      1. PUT with an existing ID replaces (updates) the document; with a new ID it inserts it
      2. Use the POST verb instead of PUT when you don’t specify an ID; Elasticsearch generates one for you
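      3. For example, letting Elasticsearch generate the ID (same index/type as above):
         curl -XPOST 'localhost:9200/customer/external?pretty' -H 'Content-Type: application/json' -d' { "name": "Jane Doe" } '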
    7. curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -H 'Content-Type: application/json' -d'{"script" : "ctx._source.age += 5"}'
    8. DELETE /customer/external/2?pretty
    9. Batch Processing
      1. POST /customer/external/_bulk?pretty
      2. {"index":{"_id":"1"}}
      3. {"name": "John Doe" }
      4. {"index":{"_id":"2"}}
      5. {"name": "Jane Doe" }
      6. {"update":{"_id":"1"}}
      7. {"doc": { "name": "John Doe becomes Jane Doe" } }
      8. {"delete":{"_id":"2”}}
      9. The Bulk API does not fail as a whole when one of its actions fails; the response reports the status of every action (check the top-level "errors" flag and each item’s status), so you can check whether a specific action failed or not.
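      10. For example, sending a file of newline-delimited actions like the ones above (the file name is just a placeholder):
          curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -H 'Content-Type: application/json' --data-binary "@requests.json"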
    1. The request body method allows you to be more expressive and also to define your searches in a more readable JSON format.
    2. GET /bank/_search
       {
         "query": { "match_all": {} },
         "sort": { "balance": { "order": "desc" } },
         "_source": ["account_number", "balance"],
         "from": 10,
         "size": 10
       }
       1. _source limits which fields of each document are returned
    3. Query:
    4. "bool": {
           "should": [
             { "match": { "address": "mill" } },
             { "match": { "address": "lane" } }
           ]
          1. should -> or
          2. must -> and
          3. must_not -> all false
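          4. A fuller sketch combining clauses (fields follow the bank example above):
             GET /bank/_search
             {
               "query": {
                 "bool": {
                   "must": [ { "match": { "age": "40" } } ],
                   "must_not": [ { "match": { "state": "ID" } } ]
                 }
               }
             }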
    5.     "filter": {
             "range": {
               "balance": {
                 "gte": 20000,
                 "lte": 30000
               }
             }
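       In context, the filter sits inside a bool query, e.g.:
       GET /bank/_search
       {
         "query": {
           "bool": {
             "must": { "match_all": {} },
             "filter": {
               "range": {
                 "balance": { "gte": 20000, "lte": 30000 }
               }
             }
           }
         }
       }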
      1. Haven’t had time to read this part yet
    1. Reindex supports the full query DSL (to select which documents to copy)
    2. Cross-cluster: reindex from a remote cluster
    3. Scripting: transform documents while reindexing
    4. Fetch the status of all running reindex requests
      1. curl -XGET 'localhost:9200/_tasks?detailed=true&actions=*reindex&pretty'
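      2. A minimal reindex sketch (index names and the query are made up for illustration):
         curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
         {
           "source": { "index": "old_index", "query": { "term": { "user": "kimchy" } } },
           "dest":   { "index": "new_index" }
         }'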
    1. To pre-process documents before indexing, you define a pipeline that specifies a series of processors (see the example after this list). Available processors include:
    2. Append Processor
    3. Convert Processor
    4. Date Processor
    5. Date Index Name Processor
    6. Fail Processor
    7. Foreach Processor
    8. Grok Processor
    9. Gsub Processor
    10. Join Processor
    11. JSON Processor
    12. KV Processor
    13. Lowercase Processor
    14. Remove Processor
    15. Rename Processor
    16. Script Processor
    17. Set Processor
    18. Split Processor
    19. Sort Processor
    20. Trim Processor
    21. Uppercase Processor
    22. Dot Expander Processor
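    23. A minimal sketch of defining a pipeline and indexing through it (pipeline name and field are made up):
        curl -XPUT 'localhost:9200/_ingest/pipeline/my-pipeline?pretty' -H 'Content-Type: application/json' -d'
        {
          "description": "lowercase the name field",
          "processors": [ { "lowercase": { "field": "name" } } ]
        }'
        curl -XPUT 'localhost:9200/customer/external/3?pipeline=my-pipeline&pretty' -H 'Content-Type: application/json' -d' { "name": "JOHN DOE" } '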
    1. Keeping the search context alive
    2. POST  /_search/scroll
      {
         "scroll" : "1m",
         "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAA..."
      }
    3. POST /twitter/tweet/_search?scroll=1m
      1. Filtered / routing / mapping
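    4. Search contexts cost memory, so it is a good idea to clear the scroll when you are done (scroll_id as returned by the searches above):
       curl -XDELETE 'localhost:9200/_search/scroll' -H 'Content-Type: application/json' -d'
       { "scroll_id" : ["DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAA..."] }'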
    1. Don’t return large result sets, use scroll APIs
    2. Avoid large documents: http.max_content_length is set to 100MB by default, and Lucene still has a hard limit of about 2GB.
      1. Wanting to make books searchable doesn’t necessarily mean that a document should consist of a whole book; chapters or paragraphs may be better units.
    3. Avoid sparsity
      1. Avoid putting unrelated data in the same index
      2. Even if you really need to put different kinds of documents in the same index, normalize the document structures so they share as many fields as possible
      3. Avoid types: having multiple types with different fields in a single index will also cause sparsity problems
      4. Norms can be disabled on a field if producing relevance scores for it is not necessary (see the mapping sketch below)
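      5. A minimal mapping sketch disabling norms (index, type, and field names are made up):
         PUT my_index/_mapping/my_type
         {
           "properties": {
             "title": { "type": "text", "norms": false }
           }
         }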
    1. A document is stored in an index and has a type and an id. A document is a JSON object; the original JSON that is indexed is stored in the _source field.
    2. A mapping is like a schema definition in a relational database. The mapping also allows you to define (amongst other things) how the value of a field should be analyzed, and it carries a number of index-wide settings. Fields with the same name in different types in the same index must have the same mapping.
    3. An index is like a table in a relational database. It has a mapping which defines the fields in the index, which may be grouped into multiple types.
    4. Each primary shard can have zero or more replicas. A replica is a copy of the primary shard and has two purposes: increase failover and increase performance. A replica is never allocated on the same node as its primary shard.
    5. A shard is a single Lucene instance. You never need to refer to shards directly.
    6. A term is an exact value that is indexed in Elasticsearch. The terms foo, Foo, FOO are NOT equivalent. Terms can be searched for using term queries.
    7. Analysis is the process of converting full text to terms. These terms are what is actually stored in the index. A full text query (not a term query) for FoO:bAR will also be analyzed to the terms foo,bar and will thus match the terms stored in the index.
    8. Text (or full text) is ordinary unstructured text, such as this paragraph.
    9. Each document is stored in a single primary shard. The primary shard is chosen by hashing the routing value, which is derived from the document’s id or, if present, its parent document’s id (to ensure parent and child are stored on the same shard). The routing value can be overridden by specifying a routing value at index time, or a routing field in the mapping (see the example below).
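    10. For example, overriding routing at index time (index/type/field values are made up):
        curl -XPUT 'localhost:9200/twitter/tweet/1?routing=kimchy&pretty' -H 'Content-Type: application/json' -d' { "user": "kimchy", "message": "trying out routing" } '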
    1. Slow Log
    1. Fields with the same name in different mapping types in the same index must have the same mapping.
    2. Dynamic mapping rules can guess field types, or you can define them yourself with explicit mappings (see the example below)
    3. Existing type and field mappings cannot be updated; changing them generally means creating a new index and reindexing
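    4. A minimal sketch of creating an index with an explicit mapping (all names are made up):
       curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
       {
         "mappings": {
           "my_type": {
             "properties": {
               "title":   { "type": "text" },
               "created": { "type": "date" }
             }
           }
         }
       }'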
    1. token_count is really an integer field, to count the number of tokens in a string
    2. Array support does not require a dedicated type
    3. object for single JSON objects
    4. nested for arrays of JSON objects
    5. geo_point for lat/lon points
    6. geo_shape for complex shapes like polygons
    7. ip for IPv4 and IPv6 addresses
    8. completion to provide auto-complete suggestions
    9. murmur3 to compute hashes of values at index-time and store them in the index
    10. the mapper-attachments plugin which supports indexing attachments like Microsoft Office formats, Open Document formats, ePub, HTML, etc. into an attachment datatype.
    11. Percolator type: Accepts queries from the query-dsl
    1. The _all field concatenates the values of all of the other fields into one big string, which is then analyzed and indexed but not stored (storing can be enabled with "store": true)
    2. All values are treated as strings
    3. The _all field takes fields’ boosts into account
    4. copy_to parameter allows the creation of multiple custom _all fields
    5.        "first_name": {
               "type":    "text",
               "copy_to": "full_name"
             },
             "last_name": {
               "type":    "text",
               "copy_to": "full_name"
             },
    1. Stores queries instead of documents (the percolator)
    1. "properties": {
                 "age":  { "type": "integer" },
                 "name": {
                   "properties": {
                     "first": { "type": "text" },
                     "last":  { "type": "text" }
                   }
                  }
                }
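    2. A document matching this mapping might look like (values are made up):
       { "age": 25, "name": { "first": "John", "last": "Smith" } }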
    1. all values in the array must be of the same datatype.
    2. null values are either replaced by the configured null_value or skipped entirely. An empty array [] is treated as a missing field — a field with no values.
    3. Treated as a set of data, without order
    1. Use the nested query to query them (see the sketch below)
    2. Indexing a document with 100 nested fields actually indexes 101 documents
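    3. A minimal nested query sketch (index and field names are made up):
       GET /my_index/_search
       {
         "query": {
           "nested": {
             "path": "user",
             "query": {
               "bool": {
                 "must": [
                   { "match": { "user.first": "John" } },
                   { "match": { "user.last":  "Smith" } }
                 ]
               }
             }
           }
         }
       }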
    1. Meta fields reference, wrong url
    1. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space:
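    2. A mapping sketch with doc values disabled on one field (field name is made up):
       "properties": {
         "session_id": { "type": "keyword", "doc_values": false }
       }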
    1. The _default_ mapping configures the base mapping to be used for new mapping types
    2. Disable automatic creation of new types: PUT index-name/_settings { "index.mapper.dynamic": false }
    3. numeric detection (which is disabled by default)
    1. How dynamically added fields get mapped to a datatype; dynamic templates let you customise this (see the sketch below)
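    2. A minimal dynamic_templates sketch mapping new string fields to keyword (all names are made up):
       PUT my_index
       {
         "mappings": {
           "my_type": {
             "dynamic_templates": [
               {
                 "strings_as_keywords": {
                   "match_mapping_type": "string",
                   "mapping": { "type": "keyword" }
                 }
               }
             ]
           }
         }
       }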
    1. The mapper-size plugin provides the _size meta field which, when enabled, indexes the size in bytes of the original _source field.
    1. Keyword fields are only searchable by their exact value.
    1. By default, coercion attempts to clean up dirty values to fit the datatype of a field (e.g. the string "5" is coerced to the integer 5); it can be disabled per field (see the sketch below)
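    2. A mapping sketch turning coercion off for one field (field name is made up):
       "properties": {
         "number_of_bytes": { "type": "integer", "coerce": false }
       }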
    1. "properties": {
             "city": {
               "type": "text",
               "fields": {
                 "raw": {
                   "type":  "keyword", "analyzer": "english"
                 }
               }
             }
    2. "sort": {
         "city.raw": "asc"
       },
    1. Multi-fields also let the same value be indexed in different ways, to support search on it in different ways
    1. Analysis is the process of converting text, like the body of an email, into tokens or terms which are added to the inverted index for searching
    2. This same analysis process is applied to the query string at search time
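    3. You can see what an analyzer produces with the _analyze API, e.g.:
       curl -XGET 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d'
       { "analyzer": "standard", "text": "The QUICK Brown Foxes." }'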
    1. Term vectors: returns information and statistics on the terms in the fields of a particular document (see the example below)
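    2. For example (following the twitter example above):
       curl -XGET 'localhost:9200/twitter/tweet/1/_termvectors?pretty'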
