Apache Pig
DEPRECATED
Apache Pig is no longer widely used, so this UDF was dropped in Yauaa 7. Version 6.12 is the last released version that still includes the Apache Pig UDF.
Introduction
This is a User Defined Function (UDF) for Apache Pig that parses user agent strings using Yauaa.
Getting the UDF
You can get the prebuilt UDF from Maven Central.
If you use a Maven-based project, simply add this dependency:
<dependency>
  <groupId>nl.basjes.parse.useragent</groupId>
  <artifactId>yauaa-pig</artifactId>
  <classifier>udf</classifier>
  <version>6.12</version>
</dependency>
Example usage
-- Import the UDF jar file so this script can use it
REGISTER ../target/*-udf.jar;
------------------------------------------------------------------------
-- Define a more readable name for the UDF and pass optional parameters
-- First parameter is ALWAYS the cache size (as a text string!)
-- The parameters after that are the requested fields.
----------
-- If you simply want 'everything'
-- DEFINE ParseUserAgent nl.basjes.parse.useragent.pig.ParseUserAgent;
----------
-- If you just want to set the cache
-- DEFINE ParseUserAgent nl.basjes.parse.useragent.pig.ParseUserAgent('10000');
----------
-- If you want to set the cache and only retrieve the specified fields
DEFINE ParseUserAgent nl.basjes.parse.useragent.pig.ParseUserAgent('10000', 'DeviceClass', 'DeviceBrand');

rawData =
    LOAD 'testcases.txt'
    USING PigStorage()
    AS ( useragent: chararray );

UaData =
    FOREACH rawData
    GENERATE useragent,
        -- Do NOT specify a type for this field as the UDF provides the definitions
        ParseUserAgent(useragent) AS parsedAgent;
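Because the UDF supplies the schema of `parsedAgent` itself, it can help to inspect that schema before using the individual fields. The lines below are a minimal sketch of how the example script might be continued, using only Pig's standard `DESCRIBE`, `DUMP` and `STORE` operators. The exact layout that gets printed depends on the UDF and on the fields requested in the `DEFINE`, so no specific output is shown here; the output directory `parsed_useragents` is a hypothetical name.

```pig
-- Print the schema of UaData, including the field definitions the UDF provided for parsedAgent
DESCRIBE UaData;

-- Print the parsed records to the console (only sensible for a small input such as testcases.txt)
DUMP UaData;

-- Alternatively write the results to disk as tab separated text.
-- NOTE: 'parsed_useragents' is a hypothetical output directory; pick your own.
-- STORE UaData INTO 'parsed_useragents' USING PigStorage('\t');
```

Running `DESCRIBE` first is a cheap way to confirm the field names before referring to them in later `FOREACH ... GENERATE` statements.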