This is a User Defined Function (UDF) for Apache Flink.
You can get the prebuilt UDF from Maven Central.
If you use a Maven-based project, simply add this dependency to your project:
<dependency>
  <groupId>nl.basjes.parse.useragent</groupId>
  <artifactId>yauaa-flink</artifactId>
  <version>7.20.0</version>
</dependency>
Assume you have a DataSet or DataStream with your records. In most cases I see (clickstream data), these records (in this example the class is called “TestRecord”) contain the useragent string in a field, and the parsed results must be stored in additional fields of those same records.
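The mapper examples assume a record class along these lines. This is a hypothetical sketch: the field names are taken from the mapper code, while the constructors and toString() are illustrative additions. For Flink to treat it as a POJO, the class must be public, have a public no-argument constructor, and expose its fields publicly (or through getters and setters).

```java
// Hypothetical record type matching the field names used in the mapper examples.
// Flink treats it as a POJO: public class, public no-arg constructor, public fields.
public class TestRecord {
    // Input: the raw User-Agent header value from the clickstream.
    public String useragent;

    // Outputs: filled in by the @YauaaField-annotated setters of the mapper.
    public String deviceClass;
    public String agentNameVersion;
    public String operatingSystemNameVersion;

    // Flink's POJO serializer requires a public no-argument constructor.
    public TestRecord() {
    }

    // Convenience constructor (illustrative only).
    public TestRecord(String useragent) {
        this.useragent = useragent;
    }

    @Override
    public String toString() {
        return "TestRecord{useragent='" + useragent
            + "', deviceClass='" + deviceClass
            + "', agentNameVersion='" + agentNameVersion
            + "', operatingSystemNameVersion='" + operatingSystemNameVersion + "'}";
    }
}
```

The output fields start out as null and are only filled in once the record has passed through the mapper.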
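Now you must do two things:
1. Determine the useragent string by implementing getUserAgentString, which extracts it from your record.
2. Put the parsed values back into the record by writing one setter per field you need, annotated with @YauaaField and the name of that field (for example @YauaaField("DeviceClass")).
Note that the names of the setters are not important; the system looks at the annotation.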
.map(new UserAgentAnalysisMapper<TestRecord>(15000) { // Setting the cacheSize
    @Override
    public String getUserAgentString(TestRecord record) {
        return record.useragent;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("DeviceClass")
    public void setDC(TestRecord record, String value) {
        record.deviceClass = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("AgentNameVersion")
    public void setANV(TestRecord record, String value) {
        record.agentNameVersion = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("OperatingSystemNameVersion")
    public void setOSNV(TestRecord record, String value) {
        record.operatingSystemNameVersion = value;
    }
})
If you also want to use the User-Agent Client Hints request headers, the only difference with the “Only User-Agent” implementation is that getRequestHeaders is overridden instead of getUserAgentString.
The resulting map should have the original request header names as keys and the actual header values as the values.
To illustrate:
Map<String, String> requestHeaders = new TreeMap<>();
requestHeaders.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36");
requestHeaders.put("Sec-Ch-Ua", "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"");
requestHeaders.put("Sec-Ch-Ua-Arch", "\"x86\"");
requestHeaders.put("Sec-Ch-Ua-Full-Version-List", "\" Not A;Brand\";v=\"99.0.0.0\", \"Chromium\";v=\"100.0.4896.75\", \"Google Chrome\";v=\"100.0.4896.75\"");
requestHeaders.put("Sec-Ch-Ua-Mobile", "?0");
requestHeaders.put("Sec-Ch-Ua-Model", "\"\"");
requestHeaders.put("Sec-Ch-Ua-Platform", "\"Windows\"");
requestHeaders.put("Sec-Ch-Ua-Platform-Version", "\"0.1.0\"");
requestHeaders.put("Sec-Ch-Ua-Wow64", "?0");
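In the mapper example that follows, element.getHeaders() is expected to return such a map. A minimal sketch, assuming the record keeps each raw header value in its own field; the class name ClientHintsRecord and its fields are hypothetical stand-ins for whatever your TestRecord actually stores. Headers that were absent from the request are simply left out of the map rather than passed as null values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record holding the raw request headers it was built from.
public class ClientHintsRecord {
    public String userAgent;              // raw "User-Agent" value
    public String secChUa;                // raw "Sec-Ch-Ua" value
    public String secChUaPlatform;        // raw "Sec-Ch-Ua-Platform" value
    public String secChUaPlatformVersion; // raw "Sec-Ch-Ua-Platform-Version" value

    public ClientHintsRecord() {
    }

    // Original header names as keys, raw header values (quotes included) as values.
    public Map<String, String> getHeaders() {
        Map<String, String> headers = new TreeMap<>();
        putIfPresent(headers, "User-Agent", userAgent);
        putIfPresent(headers, "Sec-Ch-Ua", secChUa);
        putIfPresent(headers, "Sec-Ch-Ua-Platform", secChUaPlatform);
        putIfPresent(headers, "Sec-Ch-Ua-Platform-Version", secChUaPlatformVersion);
        return headers;
    }

    // Skip headers that were not present in the original request.
    private static void putIfPresent(Map<String, String> headers, String name, String value) {
        if (value != null) {
            headers.put(name, value);
        }
    }
}
```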
Using the analyzer would then look something like this:
.map(new UserAgentAnalysisMapper<TestRecord>(15000) { // Setting the cacheSize
    @Override
    public Map<String, String> getRequestHeaders(TestRecord element) {
        return element.getHeaders();
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("DeviceClass")
    public void setDC(TestRecord record, String value) {
        record.deviceClass = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("AgentNameVersion")
    public void setANV(TestRecord record, String value) {
        record.agentNameVersion = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("OperatingSystemNameVersion")
    public void setOSNV(TestRecord record, String value) {
        record.operatingSystemNameVersion = value;
    }
})
An anonymous inner class in Java is never public.
If you define the mapper inline as an anonymous inner class, as shown above, the system will try to make its methods accessible by calling .setAccessible(true). There are situations in which this will fail (for example, a SecurityManager can block it). If you run into such a scenario, simply do not define the mapper inline as an anonymous class; define it as a named (public) class instead.
So the earlier example will look something like this:
import java.util.Map;

import nl.basjes.parse.useragent.annotate.YauaaField;
import nl.basjes.parse.useragent.flink.UserAgentAnalysisMapper;

public class MyUserAgentAnalysisMapper extends UserAgentAnalysisMapper<TestRecord> {
    @Override
    public Map<String, String> getRequestHeaders(TestRecord element) {
        return element.getHeaders();
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("DeviceClass")
    public void setDC(TestRecord record, String value) {
        record.deviceClass = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("AgentNameVersion")
    public void setANV(TestRecord record, String value) {
        record.agentNameVersion = value;
    }

    @SuppressWarnings("unused") // Called via the annotation
    @YauaaField("OperatingSystemNameVersion")
    public void setOSNV(TestRecord record, String value) {
        record.operatingSystemNameVersion = value;
    }
}
Then, in the topology, simply do this:
.map(new MyUserAgentAnalysisMapper())
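Why the named public class avoids the setAccessible(true) call can be reproduced with plain Java reflection. The sketch below is a hypothetical, simplified stand-in for the annotation handling, not Yauaa's actual code: @ResultField plays the role of @YauaaField, and Mapper plays the role of UserAgentAnalysisMapper. It finds setters only by their annotation (so the method names do not matter) and calls setAccessible(true) only when the concrete mapper class is not public, which is always the case for an anonymous class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;

// Hypothetical illustration of annotation-driven setters; NOT Yauaa's implementation.
public class AnnotatedSetterDemo {

    // Stand-in for @YauaaField: marks a setter and names the result it receives.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface ResultField {
        String value();
    }

    public static class Record {
        public String deviceClass;
    }

    // Stand-in for the mapper base class.
    public abstract static class Mapper {
        public final void apply(Record record, Map<String, String> results) {
            boolean classIsPublic = Modifier.isPublic(getClass().getModifiers());
            for (Method method : getClass().getMethods()) {
                ResultField field = method.getAnnotation(ResultField.class);
                if (field == null) {
                    continue; // Not a result setter; the method name is never inspected.
                }
                if (!classIsPublic) {
                    // A non-public class (e.g. an anonymous one) in another package would
                    // fail the reflective access check; this suppresses that check.
                    // A SecurityManager or module access rules can make this call throw.
                    method.setAccessible(true);
                }
                try {
                    method.invoke(this, record, results.get(field.value()));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot call " + method, e);
                }
            }
        }
    }

    // Named public class: no setAccessible(true) is needed.
    public static class NamedMapper extends Mapper {
        @ResultField("DeviceClass")
        public void anyNameWorks(Record record, String value) {
            record.deviceClass = value;
        }
    }

    // Anonymous class: never public, so the setAccessible(true) path is taken.
    public static Mapper anonymousMapper() {
        return new Mapper() {
            @ResultField("DeviceClass")
            public void setDC(Record record, String value) {
                record.deviceClass = value;
            }
        };
    }
}
```

In this single-file demo both mappers work, because everything lives in one package. The setAccessible(true) branch is the one that can be refused in a restricted environment, and a named public class never reaches it.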