Apache Hive
Introduction
This is a User Defined Function for Apache Hive
Getting the UDF
You can get the prebuilt UDF from maven central (yauaa-hive-7.29.0-udf.jar).
NOTE: You MUST use the -udf.jar
: yauaa-hive-7.29.0-udf.jar
If you use a maven based project simply add this dependency
<dependency>
<groupId>nl.basjes.parse.useragent</groupId>
<artifactId>yauaa-hive</artifactId>
<classifier>udf</classifier>
<version>7.29.0</version>
</dependency>
Building
Simply install the normal build tools for a Java project (i.e. maven and jdk) and then simply do:
mvn clean package
Example usage
First the jar file must be ‘known’ Either by doing
ADD JAR hdfs:///yauaa-hive-7.29.0-udf.jar
or by defining it as a permanent function
CREATE FUNCTION ParseUserAgent
AS 'nl.basjes.parse.useragent.hive.ParseUserAgent'
USING JAR 'hdfs:///yauaa-hive-7.29.0-udf.jar';
If you are using a recent version of Hive you can also do
CREATE FUNCTION ParseUserAgent
AS 'nl.basjes.parse.useragent.hive.ParseUserAgent'
USING JAR 'ivy://nl.basjes.parse.useragent:yauaa-hive:7.29.0?classifier=udf';
or installing it locally with the Hive Server
TODO: Document installation
Verify if it has been installed
DESCRIBE FUNCTION ParseUserAgent;
+-----------------------------------------------------------------------+
| tab_name |
+-----------------------------------------------------------------------+
| ParseUserAgent(str) - Parses the UserAgent into all possible pieces. |
+-----------------------------------------------------------------------+
> DESCRIBE FUNCTION EXTENDED ParseUserAgent;
+---------------------------------------------------------------------------+
| tab_name |
+---------------------------------------------------------------------------+
| parseuseragent(str) - Parses the UserAgent into all possible pieces. |
| Synonyms: default.parseuseragent |
| Example: |
| > SELECT ParseUserAgent(useragent).DeviceClass, |
| ParseUserAgent(useragent).OperatingsystemNameVersion, |
| ParseUserAgent(useragent).AgentNameVersionMajor |
| FROM clickLogs; |
| +---------------+-----------------------------+------------------------+ |
| | deviceclass | operatingsystemnameversion | agentnameversionmajor | |
| +---------------+-----------------------------+------------------------+ |
| | Phone | Android 6.0 | Chrome 46 | |
| | Tablet | Android 5.1 | Chrome 40 | |
| | Desktop | Linux ?? | Chrome 59 | |
| | Game Console | Windows 10.0 | Edge 13 | |
| +---------------+-----------------------------+------------------------+ |
| |
+---------------------------------------------------------------------------+
Basic test it works (trimmed the output here)
> SELECT ParseUserAgent('Mozilla/5.0 (Linux\; Android 6.0\; Nexus 6 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36');
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| { "deviceclass":"Phone",
"devicename":"Google Nexus 6",
"devicebrand":"Google",
...
"operatingsystemnameversion":"Android 6.0",
...
"layoutenginenameversion":"Blink 46.0",
...
"agentclass":"Browser",
...
"agentnameversion":"Chrome 46.0.2490.76",
...
} |
+----------------------------------------------------+
1 row selected (5.682 seconds)
Usage example:
CREATE TABLE useragents (useragent STRING COMMENT 'The useragent string');
INSERT INTO TABLE useragents VALUES ('Mozilla/5.0 (Linux\; Android 6.0\; Nexus 6 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36');
INSERT INTO TABLE useragents VALUES ('Mozilla/5.0 (Linux\; Android 5.1\; Nexus 10 Build/LMY47D) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Safari/537.36');
INSERT INTO TABLE useragents VALUES ('Mozilla/5.0 (X11\; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36');
INSERT INTO TABLE useragents VALUES ('Mozilla/5.0 (Windows NT 10.0\; Win64\; x64\; Xbox\; Xbox One) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10553');
SELECT ParseUserAgent(useragent).DeviceClass, ParseUserAgent(useragent).OperatingsystemNameVersion, ParseUserAgent(useragent).AgentName, ParseUserAgent(useragent).AgentNameVersionMajor from useragents;
+---------------+-----------------------------+------------+------------------------+
| deviceclass | operatingsystemnameversion | agentname | agentnameversionmajor |
+---------------+-----------------------------+------------+------------------------+
| Phone | Android 6.0 | Chrome | Chrome 46 |
| Tablet | Android 5.1 | Chrome | Chrome 40 |
| Desktop | Linux ?? | Chrome | Chrome 59 |
| Game Console | Windows 10.0 | Edge | Edge 13 |
+---------------+-----------------------------+------------+------------------------+
ClientHints example
If you pass only a single parameter it is assumed to be the “User-Agent”.
If you want to make use of the support for the User-Agent Client Hints
you must call the function from your SQL with a list of header name
and value
. The header names must be the same as what a browser would send to the webserver (see: Specification).
Example:
SELECT 'CLIENTHINTS',
parsedUseragentAllFields.DeviceClass AS deviceClass,
parsedUseragentAllFields.AgentNameVersionMajor AS agentNameVersionMajor,
parsedUseragentAllFields.OperatingSystemNameVersion AS operatingSystemNameVersion
FROM (
SELECT ParseUserAgent(
'user-Agent', useragent,
'sec-CH-UA-Platform', chPlatform,
'sec-CH-UA-Platform-Version', chPlatformVersion
) AS parsedUseragentAllFields
FROM clickLogs
) ParsedSubSelect;