Friday, April 17, 2015

Dealing with mongoDB restrictions on field names

Unstructured databases are awesome in many ways, especially when your application requires storage with a flexible schema. However, even the most flexible database needs a way to reference the different parts of the information in its data model.

In mongoDB, this system of references is inspired by JSON and uses a dot notation where the field identifiers are separated by dots. For example, the following query uses the dot notation to find all documents in the collection inventory where the value of the field producer is an embedded document that contains a field company with the value 'ABC123':

db.inventory.find( { 'producer.company': 'ABC123' } )

Since the dot (.) is a recognized operator in mongoDB, there are restrictions on its use, especially as part of field names:
"Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)."
The proposed alternative to deal with this restriction is to substitute the reserved characters with their Unicode full width equivalents:
"In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “＄”) and U+FF0E (i.e. “．”)."
This is not a big deal until you realize that the restriction also applies to your mapped Java classes. If you are using the latest version of the Java mongoDB Driver, you will be relying on a Document to represent a key-value map that can be saved to your mongoDB database. To illustrate the problem, let's have a look at this class, and in particular at the constructor Document(Map<String,Object> map), which provides a simple way to create a new document in the database from a Java Map. If you create a map with a key containing a mongoDB reserved character, such as the dot (.), then any attempt to insert your map in the database will fail with an exception.
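
Before looking at the full solution, the substitution itself can be illustrated with a few lines of plain Java. This is only a minimal sketch: the class name FieldNameEscaper and its two methods are hypothetical, and it uses String.replace instead of the regular expressions shown later in this post.

```java
// Hypothetical helper, not part of the mongoDB driver.
final class FieldNameEscaper {

  private FieldNameEscaper() { }

  // Replace the reserved '$' and '.' with U+FF04 and U+FF0E.
  static String escape(final String name) {
    return name.replace('$', '\uff04').replace('.', '\uff0e');
  }

  // Reverse mapping, used when the key is read back from the database.
  static String unescape(final String name) {
    return name.replace('\uff04', '$').replace('\uff0e', '.');
  }

}
```

Note that the round trip is only lossless when the original key does not already contain the full width characters, since unescape cannot tell them apart from substituted ones.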

Taking into account the recommendation of the mongoDB team, the safest way to proceed is to implement a Map that replaces the reserved characters in its keys and to use it to insert new records in the database. For example, the following class wraps a Hashtable whose keys are compatible with the mongoDB restrictions on field names:
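import static com.google.common.collect.Maps.newHashMap;
import static java.util.Collections.unmodifiableMap;
import java.util.Hashtable;
import java.util.Map;

public class MongoDBSafeMap<K extends MongoDBSafeKey, V> implements Map<K, V> {

  private Map<K, V> __map = new Hashtable<>();

  // flatten the safe keys to plain strings, ready to build a Document
  public Map<String, Object> toMap() {
    final Map<String, Object> map = newHashMap();
    for (final Map.Entry<K, V> entry : __map.entrySet()) {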
      map.put(entry.getKey().getKey(), entry.getValue());
    }
    return unmodifiableMap(map);
  }

  // override map operations

}
And the class implementing the map key uses regular expressions to replace the reserved characters in the user-provided key with the Unicode full width equivalents:
import static com.google.common.base.Preconditions.checkArgument;
import static java.util.regex.Matcher.quoteReplacement;
import static java.util.regex.Pattern.compile;
import static java.util.regex.Pattern.quote;
import static org.apache.commons.lang.StringEscapeUtils.unescapeJava;
import static org.apache.commons.lang.StringUtils.isNotBlank;
import static org.apache.commons.lang.StringUtils.trimToNull;
import java.util.regex.Pattern;

public class MongoDBSafeKey {

  private static final Pattern DOLLAR_PATTERN = compile(quote("$"));
  private static final Pattern DOT_PATTERN = compile(quote("."));

  private static final String DOLLAR_REPLACEMENT = quoteReplacement("\\") + "uff04";
  private static final String DOT_REPLACEMENT = quoteReplacement("\\") + "uff0e";

  private static final Pattern UDOLLAR_PATTERN = compile(quote("\uff04"));
  private static final Pattern UDOT_PATTERN = compile(quote("\uff0e"));

  private static final String UDOLLAR_REPLACEMENT = quoteReplacement("$");
  private static final String UDOT_REPLACEMENT = quoteReplacement(".");

  private String key;

  public static MongoDBSafeKey escapeMapKey(final String name) {
    final MongoDBSafeKey instance = new MongoDBSafeKey();
    instance.setKey(escapeFieldName(name));
    return instance;
  }

  public static String escapeFieldName(final String name) {
    String name2 = null;
    checkArgument(isNotBlank(name2 = trimToNull(name)), "Uninitialized or invalid field name");  
    String escaped = DOLLAR_PATTERN.matcher(name2).replaceAll(DOLLAR_REPLACEMENT);
    escaped = DOT_PATTERN.matcher(escaped).replaceAll(DOT_REPLACEMENT);
    return unescapeJava(escaped);
  }

  // class implementation

}
This class uses a method parameter validation strategy that I explained in a previous post.
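
The listing above elides the rest of the class. Since its instances are used as keys of a Hashtable, they presumably need value-based equals and hashCode implementations, and the U-prefixed patterns suggest a reverse mapping. The following is a minimal sketch of those parts under that assumption. The class SafeKeySketch and its method names are hypothetical, and it relies on the JDK only (String.replace instead of regular expressions, manual checks instead of Guava and Commons Lang).

```java
import java.util.Objects;

// Hypothetical stand-in for the elided part of MongoDBSafeKey.
final class SafeKeySketch {

  private final String key; // always stored in escaped form

  private SafeKeySketch(final String key) {
    this.key = key;
  }

  static SafeKeySketch of(final String name) {
    Objects.requireNonNull(name, "Uninitialized field name");
    final String trimmed = name.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalArgumentException("Invalid field name");
    }
    return new SafeKeySketch(trimmed.replace('$', '\uff04').replace('.', '\uff0e'));
  }

  String getKey() {
    return key;
  }

  // Restore the reserved characters, e.g. before returning the key to a client.
  String unescapedKey() {
    return key.replace('\uff04', '$').replace('\uff0e', '.');
  }

  // Two keys are equal when their escaped forms are equal.
  @Override
  public boolean equals(final Object other) {
    return other instanceof SafeKeySketch && key.equals(((SafeKeySketch) other).key);
  }

  @Override
  public int hashCode() {
    return key.hashCode();
  }

}
```

With equals and hashCode defined on the escaped form, SafeKeySketch.of("a.b") and SafeKeySketch.of(" a.b ") address the same map entry.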

A final step is needed to bind the new map to your binding classes. In my case, I'm using the Jackson JSON processor Java library to marshal/unmarshal Java objects exchanged with client applications. A key serializer and a key deserializer allow me to use the new map with Jackson:
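import java.io.IOException;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

public class MongoDBSafeKeySerializer extends StdSerializer<MongoDBSafeKey> {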

  public MongoDBSafeKeySerializer() {
    super(MongoDBSafeKey.class);
  }

  @Override
  public void serialize(final MongoDBSafeKey value, final JsonGenerator jgen, final SerializerProvider provider) throws IOException, JsonProcessingException {
    if (value == null) {
      jgen.writeNull();
    } else {
      jgen.writeFieldName(value.getKey());
    }
  }

}

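import static MongoDBSafeKey.escapeMapKey;
import java.io.IOException;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.KeyDeserializer;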

public class MongoDBSafeKeyDeserializer extends KeyDeserializer { 

  @Override
  public Object deserializeKey(final String key, final DeserializationContext ctxt) throws IOException, JsonProcessingException {  
    return escapeMapKey(key);
  }

}
Register these classes with Jackson before processing any message coming from your clients and you're done:
import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.core.Version;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

public final class MongoDBJsonMapper {

  public static final ObjectMapper JSON_MAPPER = new ObjectMapper();
  static {
    // apply general configuration: each call to setSerializationInclusion
    // replaces the previous setting, so a single inclusion rule is configured
    JSON_MAPPER.setSerializationInclusion(Include.NON_DEFAULT);
    // register external serializers/deserializers  
    final SimpleModule simpleModule = new SimpleModule("MyModule", new Version(1, 0, 0, null, "com.github.etorres.codexposed", "mymodule"));
    simpleModule.addKeySerializer(MongoDBSafeKey.class, new MongoDBSafeKeySerializer());
    simpleModule.addKeyDeserializer(MongoDBSafeKey.class, new MongoDBSafeKeyDeserializer());
    JSON_MAPPER.registerModule(simpleModule);
  }

}
Now you can use the mongoDB-safe map in your code:
// create a new map escaping the key
final MongoDBSafeMap<MongoDBSafeKey, String> safeMap = new MongoDBSafeMap<>();
safeMap.put(escapeMapKey("$this.is.an.invalid...key."), "Hello World!");

// serialize to JSON
final String payload = JSON_MAPPER.writeValueAsString(safeMap);

// deserialize from JSON
@SuppressWarnings("unchecked")
final MongoDBSafeMap<MongoDBSafeKey, String> safeMap2 = JSON_MAPPER.readValue(payload, MongoDBSafeMap.class);

// create a mongoDB document
final Document doc = new Document(safeMap.toMap());
An additional advantage of this approach is that you can use the new map in your REST services and clients. First, create a provider to inject your object mapper:
import static MongoDBJsonMapper.JSON_MAPPER;
import javax.ws.rs.ext.ContextResolver;
import javax.ws.rs.ext.Provider;
import com.fasterxml.jackson.databind.ObjectMapper;

@Provider
public class MapperProvider implements ContextResolver<ObjectMapper> {

  @Override
  public ObjectMapper getContext(final Class<?> type) {
    return JSON_MAPPER;
  }

}
Finally, register the provider with your JAX-RS implementation. For example, in the case of Jersey:
import static com.google.common.collect.Sets.newHashSet;
import java.util.Set;
import javax.ws.rs.core.Application;
import org.glassfish.jersey.jackson.JacksonFeature;

public class MyApplication extends Application {

  final Set<Class<?>> classes = newHashSet();
  final Set<Object> instances = newHashSet();

  public MyApplication() {
    // create your resources here
    // instances.add(new MyResource());
    // register additional JAX-RS providers
    classes.add(MapperProvider.class);
    classes.add(JacksonFeature.class);
  }

  // class implementation

}

Conclusions

In this post, I offer a solution to the problem of storing user-provided keys with mongoDB reserved characters. The solution works with the JSON processing library Jackson and can be easily adapted to work with a REST API.

A complete working example is available on my GitHub account under the Apache License, version 2.0. Convenient test cases are provided with JUnit and Maven.

Wednesday, April 15, 2015

Validating method parameters in Java

Validating input arguments is a tedious task. Regardless of the programming language used, method parameters must be checked to ensure that the allowed values are used. In some cases, this process includes trimming additional spaces from string parameters before validation rules can be applied to the input.

This post describes a validation strategy for method parameters that uses utility methods from the Java API, the Apache Commons Lang API and the Google Guava library.

The strategy consists of creating a defensive copy of a method's input parameters, simplifying the input whenever possible and improving code security by removing unnecessary update operations from collections that don't need to support mutation. Then, validation rules are applied to the copies, interrupting the execution of the method when an invalid value is found and providing default values to optional parameters. An example method is described below to illustrate this strategy.

The example method receives two parameters. The first parameter is always required and the validation rules can vary from one case to another. When the method fails to validate the value passed in the required parameter, one of two unchecked exceptions is thrown: a NullPointerException to indicate that a null value was passed to the method, or an IllegalArgumentException to indicate other invalid values. In every case, a custom error message is provided with the exception. The second parameter is optional and a default value is used when none is provided.

Three types of parameters are covered using similar validation rules and producing the same errors described above. First, a method receiving String parameters is presented. The second method receives List parameters, and finally, a third method is presented that receives Map parameters. Several variations of the methods are described covering common usage scenarios for each type.

String parameters


The first method receives a required parameter that accepts empty values and a second optional parameter where null and empty values are replaced by a default value:
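
A minimal sketch of such a method is shown here. It is written with JDK classes only so that it compiles without additional libraries: Objects.requireNonNull stands in for Guava's checkNotNull, Optional.ofNullable for fromNullable, String.trim for the Commons Lang trim, and a small local helper for trimToNull. The class name StringParams, the method describe and the value of DEFAULT_VALUE are assumptions made for this illustration.

```java
import java.util.Objects;
import java.util.Optional;

final class StringParams {

  static final String DEFAULT_VALUE = "default"; // hypothetical default

  // 'optional' accepts null; the original annotates it with @Nullable,
  // which is omitted here because it is not part of the JDK.
  static String describe(final String required, final String optional) {
    // fail fast on null, then canonicalize by trimming (empty is allowed)
    final String required2 = Objects.requireNonNull(required, "Uninitialized required parameter").trim();
    // null, empty and blank values are replaced by the default
    final String optional2 = Optional.ofNullable(trimToNull(optional)).orElse(DEFAULT_VALUE);
    return required2 + ":" + optional2;
  }

  // JDK-only equivalent of the Commons Lang trimToNull
  private static String trimToNull(final String value) {
    if (value == null) {
      return null;
    }
    final String trimmed = value.trim();
    return trimmed.isEmpty() ? null : trimmed;
  }

}
```

The method body only concatenates the two validated copies, which is enough to observe the effect of the canonicalization and of the default value.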


Method parameters are declared final, which is a good practice to prevent your methods from reassigning the parameters they receive. Keep in mind that final only protects the reference: to preserve the values of the attributes of a mutable object passed to your method, work on a defensive copy.

In addition, the optional parameter is marked with the @Nullable annotation to make clear that this parameter accepts null values. This will help other developers to quickly understand the values accepted by your method, but it also serves as a hint for code analyzers like FindBugs. Include the following import in your class to get access to the @Nullable annotation:

import javax.annotation.Nullable;

Also, include the following imports in your class in order to get access to the utility methods:

import static com.google.common.base.Optional.fromNullable;
import static com.google.common.base.Preconditions.checkNotNull;
import static org.apache.commons.lang3.StringUtils.trim;
import static org.apache.commons.lang3.StringUtils.trimToNull;

The methods trim and trimToNull remove whitespace from both ends of a string. The main difference is in how empty strings are treated: while trim preserves empty strings, trimToNull replaces them with null.

A copy of the required parameter is created with checkNotNull, which also checks that the value is not null. The second parameter of this method is the error message provided with the NullPointerException when a null value is entered.

In the case of the optional parameter, the method fromNullable is used to create a non-null reference and if a valid value is present, a copy is created. Otherwise, the specified default is used.

In this strategy, the methods trim and trimToNull serve to reduce the input to its simplest equivalent form, which is commonly referred to as the canonical form. Canonicalization occurs before validation, allowing methods to operate on the canonicalized, validated version of the parameters.

When the empty string is not an allowed value of the required parameter, adding the following check will throw an IllegalArgumentException if the provided value is empty. The second parameter of the method checkArgument is the error message of the IllegalArgumentException:

checkArgument(!required2.isEmpty(), "Empty string is not allowed");

Add the following import to your class to get access to this method:

import static com.google.common.base.Preconditions.checkArgument;


List parameters


The same strategy can be extended to validate parameters with other commonly used types, in particular with generic lists:
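
A minimal sketch for lists, again restricted to the JDK, could look as follows. Collections.unmodifiableList over a fresh ArrayList plays the role of an immutable copy here (Guava's ImmutableList.copyOf would be the alternative discussed in this post), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class ListParams {

  // Required list: null is rejected, the result is an unmodifiable defensive copy.
  static <T> List<T> requiredCopy(final List<T> required) {
    Objects.requireNonNull(required, "Uninitialized list");
    return Collections.unmodifiableList(new ArrayList<T>(required));
  }

  // Optional list: null is replaced by an empty list.
  static <T> List<T> optionalCopy(final List<T> optional) {
    if (optional == null) {
      return Collections.emptyList();
    }
    return Collections.unmodifiableList(new ArrayList<T>(optional));
  }

}
```

Because the copy is made before wrapping, later changes to the caller's list are not visible through the returned list.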


The same method checkNotNull can be used to check that the value of the input list is not null. However, depending on the needs of your application, there are several options that you can use to create a copy of the list.

While Java provides methods to make your collections unmodifiable, you can also use Guava's immutable collections to create lists that cannot be changed after they are created. Otherwise, if you need to make changes to your list, then you can rely on the standard Java List.

However, always consider using immutable collections (Java or Guava's implementation) because they are safer than the mutable ones, avoiding synchronization problems and keeping memory usage at a minimum.

Include the following additional condition if you need to check that your required list is not empty:

checkArgument(!required2.isEmpty(), "Empty list is not allowed");


Map parameters


Validating generic map parameters is quite similar to validating lists:


The same options exist to create copies of the maps. As in the case of the list, the required map can be validated to reject empty values:

checkArgument(!required2.isEmpty(), "Empty map is not allowed");
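
Putting the null check, the defensive copy and the empty check together, a JDK-only sketch for a required map might look like this. The names MapParams and requiredNonEmptyCopy are hypothetical, and the explicit if statement stands in for Guava's checkArgument.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class MapParams {

  // Null raises NullPointerException, an empty map raises IllegalArgumentException.
  static <K, V> Map<K, V> requiredNonEmptyCopy(final Map<K, V> required) {
    Objects.requireNonNull(required, "Uninitialized map");
    // LinkedHashMap keeps the iteration order of the source map
    final Map<K, V> required2 = Collections.unmodifiableMap(new LinkedHashMap<K, V>(required));
    if (required2.isEmpty()) {
      throw new IllegalArgumentException("Empty map is not allowed");
    }
    return required2;
  }

}
```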


Conclusions


Although validating method arguments is a common need of all applications, there is no simple way to check whether a parameter passed to a method is valid, and most solutions require significant effort to write validation tests. In this post, a strategy is explored that combines utility methods from the Java API, the Apache Commons Lang API and the Google Guava library. The provided examples serve to illustrate common use cases where the strategy can help to write better code, which is my objective with this blog.

A complete working example is available on my GitHub account under the Apache License, version 2.0. Convenient test cases are provided with JUnit and Maven.