Selector Framework
A generic data selection framework based on the JMS selector specification
A White Paper
Author: Jawaid Hakim
Published: January 2006
Abstract
Applications frequently require the capability to select data based on
flexible selection criteria. Usually the selection specification is custom
built for each application. There is clear value in a flexible selection
framework that is easy to use and has low performance impact on the
application. An excellent example of a flexible selection specification is the
selector specification in Java Message Service (JMS).
CodeStreet has leveraged the JMS selector specification to develop a
generic selection framework that can be used to select arbitrary application
data on the client side. The framework is called CodeStreet API for Data
Selection (CADS). This framework is used in production-quality systems at
several top-tier Wall Street banks.
Message-oriented middleware (MOM) has gained considerable currency as the preferred way to build highly decoupled and scalable systems. Many messaging protocols and their associated development frameworks exist in the marketplace. Some popular protocols and frameworks are TIBCO/Rendezvous[1], JMS[2], and SOAP[3].
JMS defines the concept of selectors. A selector is a conditional expression used by JMS clients to filter messages. A selector matches a message if the selector evaluates to true when the message's header fields and properties are substituted for their corresponding identifiers in the selector[4].
Client applications can optionally specify a selector when they create a message consumer for a specific destination[5]. If a selector is specified for a message consumer, then all messages are filtered through the specified selector before being delivered to the consumer.
For example, suppose an application receives trade messages on a specific JMS queue and each trade message has two long properties - Quantity and Price. The following selector could be used to select messages where Quantity is greater than 100 and Price is less than 20:
String selector = "Quantity > 100 AND Price < 20";
Applying selectors on the server (daemon) is the most efficient way to filter messages. However, most messaging systems and communication protocols do not provide built-in support for selectors.
An example of such a transport is TIBCO/RV. A TIBCO/RV client specifies interest in messages through subjects[6]. The RV daemon delivers all messages on the specified subject(s) to the client. Unlike JMS, TIBCO/RV clients cannot install a selector on the daemon to filter messages.
In fact, even in the case of JMS, if message selection is desired based on message content, rather than on header fields and properties, then selectors must be applied on the client side.
This white paper describes CADS, a general purpose and highly efficient framework for implementing conditional expression evaluation functionality in applications. The framework described here is a complete implementation of the JMS selector specification 1.1.
A message selector is a String whose syntax is based on a subset of the SQL92[7] conditional expression syntax.
The order of evaluation of a selector is from left to right within a precedence level. Parentheses can be used to change the evaluation order.
Predefined selector literals and operator names are shown here in uppercase; however, they are case-insensitive.
A selector can contain:
- A string literal is enclosed in single quotes.
- An exact numeric literal is a numeric value without a decimal point, such as 57, -91, and +62; numbers in the range of Java long are supported.
- An approximate numeric literal is a numeric value in scientific notation, such as 7E3 or -57.9E2, or a numeric value with a decimal point, such as 7. and -95.6; numbers in the range of Java double are supported.
- The boolean literals TRUE and FALSE.
- An identifier is an unlimited-length character sequence that must begin with a Java identifier start character; all following characters must be Java identifier part characters.
- Identifiers cannot be the names NULL, TRUE, FALSE, NOT, IN, BETWEEN, AND, OR, IS, or ESCAPE.
- Identifiers are either header field references or property references. If a property that does not exist is referenced, its value is NULL.
- No type conversions are supported. For example, suppose a message contains a String property Address. If Address is used in an arithmetic sub-expression, then the sub-expression evaluates to FALSE.
- Identifiers are case sensitive.
- Message header fields are restricted to JMSDeliveryMode, JMSPriority, JMSMessageID, JMSTimestamp, JMSCorrelationID, and JMSType. JMSMessageID and JMSType values may be null and if so are treated as a NULL value.
- A selector is a conditional expression; a selector that evaluates to true matches; a selector that evaluates to false or unknown does not match.
- Arithmetic expressions are composed of themselves, arithmetic operators, identifiers with numeric values, and numeric literals.
- Conditional expressions are composed of themselves, comparison operators, logical operations, identifiers with boolean values, and boolean literals.
- Only like type values can be compared. If comparison on non-like type values is attempted, the value of the operation is FALSE. If either of the type values evaluates to NULL, the value of the expression is UNKNOWN.
- String and Boolean comparison is restricted to = and <>.
- +, - (unary)
- *, / (multiplication and division)
- +, - (addition and subtraction)
- Age BETWEEN 15 AND 19 is equivalent to Age >= 15 AND Age <= 19.
- Age NOT BETWEEN 15 AND 19 is equivalent to Age < 15 OR Age > 19.
- Country in (US,
- Phone LIKE '718%' is true for '718' or '718123' and false for '719'.
- Name LIKE 'j_n' is true for 'jon' or 'jan' and false for 'john'.
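The LIKE examples above can be checked with a small, self-contained sketch. It assumes the usual SQL92 reading of the two wildcards (% matches any sequence of characters, including an empty one, and _ matches exactly one character) and translates a pattern into a java.util.regex expression. The class name LikeSketch and its methods are hypothetical and are not part of CADS; the ESCAPE clause is deliberately left out.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of LIKE matching; not the CADS implementation. */
final class LikeSketch {

    /** Translates a LIKE pattern (without ESCAPE support) into a regular expression. */
    static Pattern compile(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                regex.append(".*");   // any sequence of characters, possibly empty
            } else if (c == '_') {
                regex.append('.');    // exactly one character
            } else {
                // Everything else is matched literally
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if the whole value matches the LIKE pattern. */
    static boolean like(String value, String likePattern) {
        return compile(likePattern).matcher(value).matches();
    }
}
```

With this sketch, LikeSketch.like("718123", "718%") and LikeSketch.like("jon", "j_n") are true, while LikeSketch.like("719", "718%") and LikeSketch.like("john", "j_n") are false. A real evaluator would compile the pattern once, when the selector is parsed, rather than on every match.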
As noted above, header fields and property values may be NULL. The evaluation of selector expressions containing NULL values is defined by SQL92 NULL semantics. A brief description is provided here.
SQL treats a NULL value as unknown. Comparison or arithmetic with an unknown value always yields an unknown value. The IS NULL and IS NOT NULL operators convert an unknown header field or property into the respective TRUE and FALSE values.
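The three-valued logic described above can be sketched in a few lines of Java. The sketch assumes the standard SQL92 truth tables (for example, UNKNOWN AND FALSE is FALSE, and UNKNOWN OR TRUE is TRUE). The names NullSemanticsSketch and Tri are hypothetical; they are not the CADS Result type.

```java
/** Hypothetical illustration of SQL92 NULL semantics; not CADS code. */
final class NullSemanticsSketch {

    enum Tri {
        TRUE, FALSE, UNKNOWN;

        Tri and(Tri other) {
            if (this == FALSE || other == FALSE) return FALSE;     // FALSE dominates AND
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return TRUE;
        }

        Tri or(Tri other) {
            if (this == TRUE || other == TRUE) return TRUE;        // TRUE dominates OR
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return FALSE;
        }

        Tri not() {
            if (this == UNKNOWN) return UNKNOWN;                   // NOT UNKNOWN stays UNKNOWN
            return this == TRUE ? FALSE : TRUE;
        }
    }

    /** Comparison with a NULL operand yields UNKNOWN. */
    static Tri greaterThan(Long left, Long right) {
        if (left == null || right == null) return Tri.UNKNOWN;
        return left > right ? Tri.TRUE : Tri.FALSE;
    }

    /** IS NULL always yields TRUE or FALSE, never UNKNOWN. */
    static Tri isNull(Object value) {
        return value == null ? Tri.TRUE : Tri.FALSE;
    }

    /** A selector matches only when it evaluates to TRUE. */
    static boolean matches(Tri result) {
        return result == Tri.TRUE;
    }
}
```

For a message without a Quantity property, greaterThan(null, 100L) is UNKNOWN, so the selector Quantity > 100 does not match; Quantity IS NULL, in contrast, evaluates to TRUE for the same message.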
With the selector syntax background under our belts, it is time to take a look at the CADS API. The API can be divided into two groups: the API for creating selectors and the API for evaluating selectors.
A selector within CADS is created as follows:
try
{
    // Create selector
    String selector = "Quantity > 100 AND Price < 20";
    ISelector isel = Selector.valueOf(selector);
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
Behind the scenes, the specified selector expression is parsed using a high-speed JavaCC[8] parser, and an in-memory parse tree is built for efficient evaluation.
Once a Selector instance has been successfully created, there are two options available for evaluation of the selector. This section describes the two options in detail.
In the first option, the application asks the selector for the list of identifiers that were encountered during the parse. The application then creates a Map with values for each identifier. The application then evaluates the selector using the Map:
// Uses java.util.HashMap, java.util.Iterator, and java.util.Map
try
{
    // Create selector
    String selector = "Quantity > 100 AND Price < 20";
    ISelector isel = Selector.valueOf(selector);
    // Get identifiers encountered during the parse
    Map identifiers = isel.getIdentifiers();
    // Specify values for identifiers
    Map values = new HashMap();
    for (Iterator iter = identifiers.values().iterator(); iter.hasNext(); )
    {
        // Set value for identifier
        Identifier id = (Identifier) iter.next();
        String idName = id.getIdentifier();
        values.put(idName, getValueForId(idName));
    }
    // Evaluate
    Result result = isel.eval(values);
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
Note that the method getValueForId() is application code.
The second option is based on the notion of value provider classes. The idea is to allow applications to register a value provider with the selector. During evaluation of the selector, the selector invokes callbacks on the registered provider to get the value of the desired identifier. This option is more efficient because the value provider is invoked only for values that are actually required to evaluate the expression. For example, if the selector expression consists of two sub-expressions connected by an OR and the left-hand side sub-expression evaluates to TRUE, then the right-hand side sub-expression is not evaluated.
The following code fragment shows how a value provider is used. The actual implementation of IValueProvider that is used will depend on the application. The framework contains built-in value provider classes for both JMS and TIBCO/RV.
try
{
    // Create selector
    String selector = "Quantity > 100 AND Price < 20";
    ISelector isel = Selector.valueOf(selector);
    // Get appropriate value provider
    IValueProvider vp = ...;
    // Get correlation data (or null) required by value provider
    Object corr = ...;
    // Evaluate
    Result result = isel.eval(vp, corr);
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
Applications are free to implement their own value providers. Implementing a value provider is as simple as coding a class that implements the IValueProvider interface.
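To make the callback idea concrete, the following sketch shows why a provider-based evaluation may look up fewer values than a pre-populated Map. This paper does not list the methods of the CADS value provider interface, so the sketch uses a hypothetical stand-in interface named ValueSource, together with a hand-coded evaluation of Quantity > 100 OR Price < 20. For brevity it treats a missing or non-numeric value as "no match" instead of modelling the UNKNOWN result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of lazy value lookup; not CADS code. */
final class ValueProviderSketch {

    /** Hypothetical callback interface; NOT the CADS value provider signature. */
    interface ValueSource {
        Object valueOf(String identifier);
    }

    /** Map-backed source that records every identifier it is asked for. */
    static final class RecordingSource implements ValueSource {
        private final Map<String, Object> values;
        final List<String> requested = new ArrayList<>();

        RecordingSource(Map<String, Object> values) {
            this.values = values;
        }

        @Override
        public Object valueOf(String identifier) {
            requested.add(identifier);
            return values.get(identifier);
        }
    }

    /** Hand-coded evaluation of: Quantity > 100 OR Price < 20. */
    static boolean quantityOrPrice(ValueSource source) {
        Object quantity = source.valueOf("Quantity");
        if (quantity instanceof Long && ((Long) quantity) > 100) {
            return true; // left side is TRUE, so Price is never requested
        }
        Object price = source.valueOf("Price");
        return price instanceof Long && ((Long) price) < 20;
    }
}
```

When Quantity is 150, the requested list of the RecordingSource contains only "Quantity" after evaluation; when Quantity is 50, both "Quantity" and "Price" are requested. This matters most when extracting a value is expensive, for example when it requires decoding part of a message body.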
In order to realize the full potential of CADS it is important to keep the following in mind: selector evaluation is much faster than selector creation. The following code shows how not to use CADS:
try
{
    for (int i = 0; i < numMsgs; ++i)
    {
        // Create selector (inefficient: repeated on every iteration)
        String selector = "Quantity > 100 AND Price < 20";
        ISelector isel = Selector.valueOf(selector);
        IValueProvider vp = ...;
        Object corr = ...;
        // Evaluate
        Result result = isel.eval(vp, corr);
    }
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
What is wrong with the code shown above? The selector is created inside the loop, even though the selector does not change between iterations. The code shown above works but is highly inefficient.
A better usage is the following:
try
{
    // Create selector once, outside the loop
    String selector = "Quantity > 100 AND Price < 20";
    ISelector isel = Selector.valueOf(selector);
    for (int i = 0; i < numMsgs; ++i)
    {
        IValueProvider vp = ...;
        Object corr = ...;
        // Evaluate
        Result result = isel.eval(vp, corr);
    }
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
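An application that works with more than one selector string can apply the same create-once rule by caching parsed selectors, keyed by their expression text. The following is a minimal sketch of that idea, not a CADS feature: the class SelectorCacheSketch is hypothetical, and its parse() method is a trivial stand-in that understands only expressions of the form "Quantity > N" and counts how often it is called.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical parse-once cache; the parser is a stand-in, not the CADS parser. */
final class SelectorCacheSketch {
    private final ConcurrentMap<String, Predicate<Long>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger parseCalls = new AtomicInteger();

    /** Returns the parsed form of the expression, parsing it only on first use. */
    Predicate<Long> selectorFor(String expression) {
        return cache.computeIfAbsent(expression, this::parse);
    }

    /** Number of times the (expensive) parse step has run. */
    int parseCalls() {
        return parseCalls.get();
    }

    /** Stand-in for an expensive parse; supports only "Quantity > N". */
    private Predicate<Long> parse(String expression) {
        parseCalls.incrementAndGet();
        String prefix = "Quantity > ";
        if (!expression.startsWith(prefix)) {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        long threshold = Long.parseLong(expression.substring(prefix.length()).trim());
        return quantity -> quantity > threshold;
    }
}
```

Calling selectorFor("Quantity > 100") for each of 1,000 messages runs the parse step once; every later call returns the cached object. Sharing cached objects in this way is only safe if the parsed selector is immutable or otherwise thread-safe.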
The JMS specification restricts selector identifiers to header fields and property values. However, applications frequently require the ability to filter messages based on their content. One option for applications is to publish relevant message fields as properties in order to allow selection. This requires changes to the application code.
To allow applications to filter messages based on content, the framework provides the following extension to the JMS specification: identifier names can contain . (dot). This extension can be used by value provider classes to extract values of message content fields.
CADS value provider classes for JMS and TIBCO/RV use the dot notation to provide content-based selection. These two classes treat any identifier prefixed with a . as a message content field.
For example, the following code selects JMS messages based on the value of a content field called Quantity:
try
{
    // Create selector
    String selector = ".Quantity > 100";
    ISelector isel = Selector.valueOf(selector);
    // Get the JMS message for filtering
    Message msg = ...;
    // Get appropriate value provider
    IValueProvider vp = new com.codestreet.selector.jms.ValueProvider(msg);
    // Get correlation data (or null) required by value provider
    Object corr = null;
    // Evaluate
    Result result = isel.eval(vp, corr);
    if (result == Result.UNDEFINED)
    {
        // UNDEFINED
    }
    else if (result == Result.TRUE)
    {
        // TRUE
    }
    else
    {
        // FALSE
    }
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
Some JMS vendors allow nested messages within messages. TIBCO/JMS is an example of such an implementation.
Suppose a JMS message contains a nested message named Trade. The Trade message in turn contains a field named Quantity. The diagram below shows the conceptual structure of this message:
The following code uses the value provider class for JMS to select JMS messages based on the value of Quantity:
try
{
    // Create selector
    String selector = ".Trade.Quantity > 100";
    ISelector isel = Selector.valueOf(selector);
    // Get the JMS message for filtering
    Message msg = ...;
    // Get appropriate value provider
    IValueProvider vp = new com.codestreet.selector.jms.ValueProvider(msg);
    // Get correlation data (or null) required by value provider
    Object corr = null;
    // Evaluate
    Result result = isel.eval(vp, corr);
    if (result == Result.UNDEFINED)
    {
        // UNDEFINED
    }
    else if (result == Result.TRUE)
    {
        // TRUE
    }
    else
    {
        // FALSE
    }
}
catch (InvalidSelectorException ex)
{
    // Handle invalid selector syntax
}
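How a value provider might resolve a dotted identifier such as .Trade.Quantity can be sketched with nested maps standing in for nested messages. This is an assumption made for illustration only: the class DottedLookupSketch is hypothetical, and the CADS value providers operate on real JMS and TIBCO/RV messages rather than on java.util.Map.

```java
import java.util.Map;

/** Hypothetical illustration of dotted-identifier lookup; not CADS code. */
final class DottedLookupSketch {

    /**
     * Resolves an identifier such as ".Trade.Quantity" against nested maps.
     * Returns null (treated as NULL by a selector) if any step is missing.
     */
    static Object resolve(Map<String, Object> message, String identifier) {
        if (!identifier.startsWith(".")) {
            return null; // only content fields are handled in this sketch
        }
        Object current = message;
        for (String part : identifier.substring(1).split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a non-nested value
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

If the outer map holds a nested map under the key Trade, and that map holds 250 under the key Quantity, then resolve(message, ".Trade.Quantity") returns 250, while resolve(message, ".Trade.Price") returns null.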
The framework makes heavy use of the immutable[9] pattern. All classes in the framework are immutable. As a result, no internal or external synchronization is required. Instances of Selector may be shared freely between threads.
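The reasoning behind this guarantee can be illustrated with a small immutable class. The sketch below is not CADS code: GreaterThan is a hypothetical parse-tree node whose state is fixed at construction, and countMatches() evaluates one shared instance from several threads by means of a parallel stream, without any locking.

```java
import java.util.stream.LongStream;

/** Hypothetical illustration of an immutable, freely shareable node; not CADS code. */
final class ImmutableNodeSketch {

    /** Immutable comparison node: all fields are final and set in the constructor. */
    static final class GreaterThan {
        private final String identifier;
        private final long threshold;

        GreaterThan(String identifier, long threshold) {
            this.identifier = identifier;
            this.threshold = threshold;
        }

        String identifier() {
            return identifier;
        }

        /** Pure function of its argument and the node's fixed state. */
        boolean matches(long value) {
            return value > threshold;
        }
    }

    /** Evaluates one shared node from multiple threads without synchronization. */
    static long countMatches(GreaterThan node, long upperExclusive) {
        return LongStream.range(0, upperExclusive)
                .parallel()
                .filter(node::matches)
                .count();
    }
}
```

Because matches() reads only final fields and writes nothing, the result of countMatches() does not depend on how the work is divided among threads.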
CADS is a high performance framework with low memory usage. Once a selector has been parsed and the in-memory representation has been built, evaluating it is extremely efficient.
For example, consider the following selector:
String selector = "JMSPriority >= 0 AND Quantity > 100 AND MessageName in ('Login', 'Logout', 'Query', "
    + "'CreateOrder', 'CreateWatchlist', 'Trade', 'OrderAction', 'Orderbook')";
This selector can be evaluated by CADS, using the JMS value provider, at a rate of about 200,000 evaluations per second on a low-end PC running Windows 2000 with 512 MB of RAM.
As a comparison, the leading open source selector implementation, in JBOSS[10], is roughly four times slower.
CADS is also available as a .NET framework. The .NET framework has been written from the ground up in C#. Any .NET/CLS-compliant language can use the framework.
CADS is a complete implementation, along with useful extensions, of the JMS selector specification. It can be used in any context (JMS, TIBCO/RV, etc.) where conditional expression evaluation is required.
Jawaid Hakim CTO jawaid.hakim@codestreet.com
[2] See http://java.sun.com/products/jms for details.
[3] See http://www.w3.org/TR/SOAP/ for details.
[4] A JMS message consists of three distinct sections: header fields, properties, and application fields. Header fields are set by the JMS daemon, e.g. JMSMessageID. Properties are set by the application or the specific JMS implementation, e.g. JMS_TIBCO_MSG_EXT. Application fields are set by the application.
[5] A JMS destination is either a topic (one-to-many) or a queue (one-to-one).
[6] A TIBCO/RV message send subject is analogous to a JMS destination.
[7] See X/OPEN CAE Specification Data Management: Structured Query Language (SQL), Version 2, ISBN: 1-85912-151-9 March 1996.
[8] See http://www.experimentalstuff.com/Technologies/JavaCC/ for details.
[9] The easiest way to make a class thread-safe is to make it immutable. Since an immutable object's state is set during construction and cannot be changed afterwards, the object can be freely shared between threads without synchronization. See Effective Java by Joshua Bloch for more details, ISBN: 0-201-31005-8.
[10] See http://www.jboss.org