Elasticsearch

Introduction

User Defined Function (ingest processor) for Elasticsearch

STATUS: … EXPERIMENTAL …

The Elasticsearch ingest plugin is very new.

And yes, it is similar to the built-in user agent processor: https://www.elastic.co/guide/en/elasticsearch/reference/master/user-agent-processor.html

Getting the UDF

You can get the prebuilt ingest plugin from Maven Central; separate builds exist for Elasticsearch 7.x and 8.x.

Installing the plugin

You only need to install the plugin into your Elasticsearch installation once.

On Elasticsearch 7.x:

bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-7.29.0.zip

On Elasticsearch 8.x:

bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-8-7.29.0.zip

Usage

This plugin is intended to be used in an ingest pipeline.

You have to specify the input field(s) and the target field for the parse results. The available configuration flags are:

| Name | Mandatory (M) / Optional (O) | Description | Default | Example |
|---|---|---|---|---|
| field_to_header_mapping | M | The mapping from the input field name to the original request header name of this field | - | "field_to_header_mapping" : { "ua": "User-Agent" } |
| field (deprecated) | M | The name of the input field that contains the UserAgent string | - | "useragent" |
| target_field | M | The name of the output structure that will be filled with the parse results | "user_agent" | "parsed_ua" |
| fieldNames | O | A list of the Yauaa field names that are desired. When specified, the system limits processing to what is needed to get these fields, which is faster and uses less memory. | All possible fields | [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor" ] |
| cacheSize | O | The number of entries in the LRU cache of the parser | 10000 | 100 |
| preheat | O | How many test cases are put through the parser at startup to warm up the JVM | 0 | 1000 |
| extraRules | O | A YAML expression that is a set of extra rules and test cases | - | "config:\n- matcher:\n    extract:\n      - '"'"'FirstProductName     : 1 :agent.(1)product.(1)name'"'"'\n" |

Example usage

Basic pipeline

Create a pipeline that just extracts everything using the default settings:

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_basic' -d '
{
  "description": "A pipeline to do whatever",
  "processors": [
    {
      "yauaa" : {
        "field_to_header_mapping" : {
            "useragent":                "User-Agent",
            "uach_platform":            "Sec-CH-UA-Platform",
            "uach_platform_version":    "Sec-CH-UA-Platform-Version"
        },
        "target_field"  : "parsed"
      }
    }
  ]
}
'
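The same pipeline can also be registered from a script, which avoids shell quoting around the JSON body. The following is a minimal sketch using only the Python standard library; the helper name build_pipeline_request and the base URL http://localhost:9200 are illustrative assumptions, not part of the plugin.

```python
import json
import urllib.request


def build_pipeline_request(name, yauaa_config, base_url="http://localhost:9200"):
    """Build (but do not send) the PUT request that registers an ingest pipeline.

    Hypothetical helper: the function name and the default base_url are
    assumptions made for this example.
    """
    body = {
        "description": "A pipeline to do whatever",
        "processors": [{"yauaa": yauaa_config}],
    }
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_pipeline_request(
    "yauaa-test-pipeline_basic",
    {
        "field_to_header_mapping": {
            "useragent": "User-Agent",
            "uach_platform": "Sec-CH-UA-Platform",
            "uach_platform_version": "Sec-CH-UA-Platform-Version",
        },
        "target_field": "parsed",
    },
)

# To actually send it to a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only constructed here; sending it is left commented out so the sketch can be read and run without a cluster.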

Common pipeline

In this example a pipeline is created that only gets the fields that are actually desired.

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_some' -d '
{
  "description": "A pipeline to do whatever",
  "processors": [
    {
      "yauaa" : {
        "field_to_header_mapping" : {
            "useragent":                "User-Agent",
            "uach_platform":            "Sec-CH-UA-Platform",
            "uach_platform_version":    "Sec-CH-UA-Platform-Version"
        },
        "target_field"  : "parsed",
        "fieldNames"    : [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor", "FirstProductName" ]
      }
    }
  ]
}
'

Advanced pipeline

In this example a pipeline is created that includes a custom rule. The hardest part is encoding the YAML (with its quotes, newlines and required indentation) correctly inside a JSON structure.

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_full' -d '
{
  "description": "A pipeline to do whatever",
  "processors": [
    {
      "yauaa" : {
        "field_to_header_mapping" : {
            "useragent":                "User-Agent",
            "uach_platform":            "Sec-CH-UA-Platform",
            "uach_platform_version":    "Sec-CH-UA-Platform-Version"
        },
        "target_field"  : "parsed",
        "fieldNames"    : [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor", "FirstProductName" ],
        "cacheSize" : 10,
        "preheat"   : 10,
        "extraRules" : "config:\n- matcher:\n    extract:\n      - '"'"'FirstProductName     : 1 :agent.(1)product.(1)name'"'"'\n"
      }
    }
  ]
}
'
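One way to sidestep the escaping problem is to write the rule as ordinary multi-line YAML and let a JSON library do the encoding. The following is a minimal sketch in Python using the same rule as above; the variable names and the reduced set of options are illustrative assumptions.

```python
import json

# The extra rule written as plain YAML, with its real newlines and indentation.
extra_rules = """\
config:
- matcher:
    extract:
      - 'FirstProductName     : 1 :agent.(1)product.(1)name'
"""

processor = {
    "yauaa": {
        "field_to_header_mapping": {"useragent": "User-Agent"},
        "target_field": "parsed",
        "fieldNames": ["DeviceClass", "FirstProductName"],
        "extraRules": extra_rules,
    }
}

# json.dumps turns every newline inside the YAML into the two characters \n
# and leaves the single quotes untouched, so no manual escaping is needed.
pipeline_json = json.dumps(
    {"description": "A pipeline to do whatever", "processors": [processor]},
    indent=2,
)
print(pipeline_json)
```

The printed JSON can be saved to a file (for example pipeline.json, a name chosen here for illustration) and sent with curl --data-binary @pipeline.json, which also removes the need for the '"'"' shell quoting trick.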

Put record

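I put a record into Elasticsearch using the Advanced pipeline defined above (the typed my-index/my-type/1 path is accepted on 7.x; on 8.x, where mapping types were removed, use my-index/_doc/1 instead):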

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/my-index/my-type/1?pipeline=yauaa-test-pipeline_full' -d '
{
  "useragent" : "Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36"
}
'

which returns

{"_index":"my-index","_type":"my-type","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Then I retrieve the record from Elasticsearch; the additional parse results are now part of the indexed record.

curl -s -H 'Content-Type: application/json' -X GET 'localhost:9200/my-index/my-type/1' | python -m json.tool

results in

{
    "_id": "1",
    "_index": "my-index",
    "_primary_term": 1,
    "_seq_no": 0,
    "_source": {
        "parsed": {
            "AgentName": "Chrome",
            "AgentNameVersionMajor": "Chrome 53",
            "AgentVersion": "53.0.2785.124",
            "AgentVersionMajor": "53",
            "DeviceBrand": "Google",
            "DeviceClass": "Phone",
            "DeviceName": "Google Nexus 6",
            "FirstProductName": "Mozilla"
        },
        "useragent": "Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36"
    },
    "_type": "my-type",
    "_version": 1,
    "found": true
}
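To try a pipeline without indexing a document, Elasticsearch also offers the ingest simulate API (POST _ingest/pipeline/<name>/_simulate). The sketch below only builds such a request with the Python standard library; the helper name build_simulate_request and the base URL are illustrative assumptions, and the exact response shape should be checked against your Elasticsearch version.

```python
import json
import urllib.request


def build_simulate_request(pipeline, source, base_url="http://localhost:9200"):
    """Build (but do not send) a simulate request for a single document.

    Hypothetical helper: the function name and the default base_url are
    assumptions made for this example.
    """
    body = {"docs": [{"_source": source}]}
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


simulate_request = build_simulate_request(
    "yauaa-test-pipeline_full",
    {
        "useragent": "Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/53.0.2785.124 Mobile Safari/537.36"
    },
)

# To actually send it to a running cluster:
# with urllib.request.urlopen(simulate_request) as response:
#     print(json.dumps(json.loads(response.read()), indent=4))
```

Because nothing is stored, this is a convenient way to check changes to fieldNames or extraRules before applying the pipeline to real data.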

NOTES for developers

The Elasticsearch testing tools are quick to complain about jar classloading issues: “jar hell”.

To make it possible to test this in IntelliJ you’ll need to set a custom property:

  1. Help -> Edit Custom Properties
  2. Make sure there is a line with idea.no.launcher=true
  3. Restart IntelliJ

See also https://stackoverflow.com/questions/51045201/using-the-elasticsearch-test-framework-in-intellij-how-to-resolve-the-jar-hell/51045272