Elastic Search
Introduction
User Defined Function (ingest processor) for Elastic Search
STATUS: … EXPERIMENTAL …
The ElasticSearch ingest plugin is very new.
And yes it is similar to https://www.elastic.co/guide/en/elasticsearch/reference/master/user-agent-processor.html
Getting the UDF
You can get the prebuilt ingest plugin from maven central for
Installing the plugin
You only need to install it into your Elastic Search once
On Elastic Search 7.x:
bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-7.29.0.zip
On Elastic Search 8.x
bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-8-7.29.0.zip
Usage
This plugin is intended to be used in an ingest pipeline
.
You have to specify the name of the input field
and the place where
the possible configuration flags are:
Name | Mandatory/Optional | Description | Default | Example |
---|---|---|---|---|
field_to_header_mapping | M | The mapping from the input field name to the original request header name of this field | - | "field_to_header_mapping" : { "ua": "User-Agent" } |
- | "useragent" | |||
target_field | M | The name of the output structure that will be filled with the parse results | "user_agent" | "parsed_ua" |
fieldNames | O | A list of Yauaa fieldnames that are desired. When specified the system will limit processing to what is needed to get these. This means faster and less memory used. | All possible fields | [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor" ] |
cacheSize | O | The number of entries in the LRU cache of the parser | 10000 | 100 |
preheat | O | How many testcases are put through the parser at startup to warmup the JVM | 0 | 1000 |
extraRules | O | A yaml expression that is a set of extra rules and testcases. | - | "config:\n- matcher:\n extract:\n - '"'"'FirstProductName : 1 :agent.(1)product.(1)name'"'"'\n" |
Example usage
Basic pipeline
Create a pipeline that just extracts everything using the default settings:
curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_basic' -d '
{
"description": "A pipeline to do whatever",
"processors": [
{
"yauaa" : {
"field_to_header_mapping" : {
"useragent": "User-Agent",
"uach_platform": "Sec-CH-UA-Platform",
"uach_platform_version": "Sec-CH-UA-Platform-Version"
},
"target_field" : "parsed"
}
}
]
}
'
Common pipeline
In this example a pipeline is created that only gets the fields that are actually desired.
curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_some' -d '
{
"description": "A pipeline to do whatever",
"processors": [
{
"yauaa" : {
"field_to_header_mapping" : {
"useragent": "User-Agent",
"uach_platform": "Sec-CH-UA-Platform",
"uach_platform_version": "Sec-CH-UA-Platform-Version"
},
"target_field" : "parsed",
"fieldNames" : [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor", "FirstProductName" ],
}
}
]
}
'
Advanced pipeline
In this example a pipeline is created that includes an example of a custom rule. The hardest part is making the yaml (with quotes, newlines and the needed indentation) encode correctly inside a JSon structure.
curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_full' -d '
{
"description": "A pipeline to do whatever",
"processors": [
{
"yauaa" : {
"field_to_header_mapping" : {
"useragent": "User-Agent",
"uach_platform": "Sec-CH-UA-Platform",
"uach_platform_version": "Sec-CH-UA-Platform-Version"
},
"target_field" : "parsed",
"fieldNames" : [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor", "FirstProductName" ],
"cacheSize" : 10,
"preheat" : 10,
"extraRules" : "config:\n- matcher:\n extract:\n - '"'"'FirstProductName : 1 :agent.(1)product.(1)name'"'"'\n"
}
}
]
}
'
Put record
I put a record in ElasticSearch using the above mentioned Advanced pipeline
curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/my-index/my-type/1?pipeline=yauaa-test-pipeline_full' -d '
{
"useragent" : "Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36"
}
'
which returns
{"_index":"my-index","_type":"my-type","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
then I retrieve the record from elasticsearch and the additional parse results are now part of the indexed record.
curl -s -H 'Content-Type: application/json' -X GET 'localhost:9200/my-index/my-type/1' | python -m json.tool
results in
{
"_id": "1",
"_index": "my-index",
"_primary_term": 1,
"_seq_no": 0,
"_source": {
"parsed": {
"AgentName": "Chrome",
"AgentNameVersionMajor": "Chrome 53",
"AgentVersion": "53.0.2785.124",
"AgentVersionMajor": "53",
"DeviceBrand": "Google",
"DeviceClass": "Phone",
"DeviceName": "Google Nexus 6",
"FirstProductName": "Mozilla"
},
"useragent": "Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36"
},
"_type": "my-type",
"_version": 1,
"found": true
}
NOTES for developers
The ElasticSearch testing tools are quick to complain about jar classloading issues: “jar hell”.
To make it possible to test this in IntelliJ you’ll need to set a custom property
- Help –> Edit Custom properties
- Make sure there is a line with
idea.no.launcher=true
- Restart IntelliJ