Friday, April 17, 2015

Dealing with mongoDB restrictions on field names

Unstructured databases are awesome in many ways, especially when your application requires storage with a flexible schema. However, even the most flexible database needs a way to reference the different parts of the information stored in the database model.

In mongoDB, which stores JSON-like documents, fields are referenced using dot notation, where the field identifiers are separated by dots. For example, the following query uses dot notation to search all documents in the collection inventory where the value of the field producer is an embedded document that contains a field company with the value 'ABC123':

db.inventory.find( { 'producer.company': 'ABC123' } )
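To make the dot notation concrete, here is a minimal plain-Java sketch (no mongoDB involved) that resolves a dotted path such as producer.company by walking nested maps, one segment per dot. The DotPath class is hypothetical, and it ignores arrays and everything else the real query engine supports.

```java
import java.util.Map;

// Hypothetical helper (not part of mongoDB or its driver): resolves a dotted path
// against nested maps, moving one level of embedding per path segment.
public final class DotPath {

  private DotPath() {
  }

  public static Object resolve(final Map<String, ?> document, final String path) {
    Object current = document;
    // every dot in the path moves one level down into an embedded document;
    // arrays and positional operators are not handled in this sketch
    for (final String segment : path.split("\\.", -1)) {
      if (!(current instanceof Map)) {
        return null; // the path goes deeper than the document does
      }
      current = ((Map<?, ?>) current).get(segment);
    }
    return current;
  }
}
```

Note that a top-level key literally named producer.company can never be reached this way, because the path is always split at the dot. That ambiguity is one way to see why field names must not contain dots.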

Since the dot (.) has a special meaning in mongoDB, there are restrictions on its use, especially as part of field names:
"Field names cannot contain dots (i.e. .) or null characters, and they must not start with a dollar sign (i.e. $)."
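The quoted rule is simple enough to check before a document ever reaches the driver. The sketch below encodes only that sentence; the FieldNames class is hypothetical, and the server may enforce further rules that it does not cover.

```java
// Hypothetical validator that applies the quoted rule literally: no dots, no null
// characters, and no dollar sign at the start of the name.
public final class FieldNames {

  private FieldNames() {
  }

  public static boolean isValid(final String name) {
    return name != null
        && name.indexOf('.') < 0   // dots are reserved for dot notation
        && name.indexOf('\0') < 0  // null characters are not allowed
        && !name.startsWith("$");  // a leading dollar sign is reserved for operators
  }
}
```

For instance, price$ passes, since the rule only forbids a leading dollar sign, while producer.company and $price are rejected.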
The workaround suggested by the mongoDB documentation is to substitute the reserved characters with their Unicode full width equivalents:
"In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “＄”) and U+FF0E (i.e. “．”)."
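A minimal sketch of this substitution in plain Java, assuming String.replace is enough (no regular expressions). The FullWidthKeys class is hypothetical; it also includes the reverse mapping, which is needed when keys are read back from the database.

```java
// Hypothetical helper: swaps the reserved characters for their full width
// equivalents (U+FF04 and U+FF0E) and back.
public final class FullWidthKeys {

  private static final char DOLLAR = '$';
  private static final char DOT = '.';
  private static final char FULLWIDTH_DOLLAR = '\uff04';
  private static final char FULLWIDTH_DOT = '\uff0e';

  private FullWidthKeys() {
  }

  // user-provided key -> key that is safe to store as a field name
  public static String escape(final String key) {
    return key.replace(DOLLAR, FULLWIDTH_DOLLAR).replace(DOT, FULLWIDTH_DOT);
  }

  // stored field name -> original user-provided key
  public static String unescape(final String key) {
    return key.replace(FULLWIDTH_DOLLAR, DOLLAR).replace(FULLWIDTH_DOT, DOT);
  }
}
```

One caveat of this scheme: unescape() cannot tell an escaped key from a user key that already contained U+FF04 or U+FF0E, so the round trip is only lossless for keys without full width characters.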
Substituting the reserved characters is not a big deal, until you realize that the restriction also applies to your mapped Java classes. If you are using the latest version of the mongoDB Java Driver, you rely on the Document class to represent a key-value map that can be saved to your mongoDB database. To illustrate the problem, have a look at the Document class, and in particular at its constructor Document(Map<String,Object> map), which provides a simple way to create a new document from a Java Map. If you create a map with a key containing a mongoDB reserved character, such as the dot (.), any attempt to insert the resulting document into the database will fail with an exception.
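When the map handed to Document(Map<String,Object>) contains nested maps, for example to represent embedded documents, it can help to find every offending key before the insert fails. The following sketch is a hypothetical pre-flight check (KeyScanner is not part of the driver), assuming the rule quoted above should hold at every nesting level; it reports paths with / as separator to avoid confusion with dot notation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check: collects every key in a (possibly nested) map that
// breaks the quoted field name rule, before the map is turned into a Document.
public final class KeyScanner {

  private KeyScanner() {
  }

  public static List<String> invalidKeys(final Map<String, ?> map) {
    final List<String> invalid = new ArrayList<>();
    collect(map, "", invalid);
    return invalid;
  }

  private static void collect(final Map<?, ?> map, final String prefix, final List<String> invalid) {
    for (final Map.Entry<?, ?> entry : map.entrySet()) {
      final String key = String.valueOf(entry.getKey());
      // same three conditions as the quoted rule: dots, null characters, leading $
      if (key.indexOf('.') >= 0 || key.indexOf('\0') >= 0 || key.startsWith("$")) {
        invalid.add(prefix + key);
      }
      // descend into nested maps (embedded documents)
      if (entry.getValue() instanceof Map) {
        collect((Map<?, ?>) entry.getValue(), prefix + key + "/", invalid);
      }
    }
  }
}
```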

Following the recommendation of the mongoDB team, the safest way to proceed is to implement a Map that replaces the reserved characters in its keys, and to use it to insert new records into the database. For example, the following class wraps a Hashtable whose key type is compatible with mongoDB field name restrictions:
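import static com.google.common.collect.Maps.newHashMap;
import static java.util.Collections.unmodifiableMap;

import java.util.Hashtable;
import java.util.Map;

public class MongoDBSafeMap<K extends MongoDBSafeKey, V> implements Map<K, V> {

  // backing store; note that Hashtable rejects null keys and null values
  private final Map<K, V> __map = new Hashtable<>();

  // converts the escaped keys to the Map<String, Object> expected by Document(Map)
  public Map<String, Object> toMap() {
    final Map<String, Object> map = newHashMap();
    for (final Map.Entry<K, V> entry : __map.entrySet()) {
      map.put(entry.getKey().getKey(), entry.getValue());
    }
    return unmodifiableMap(map);
  }

  // override map operations (size(), get(), put(), entrySet(), ...) by delegating to __map

}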
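The map operations are elided above. As a point of comparison, here is a hypothetical JDK-only variant (EscapingMap is not the author's class) that keeps plain String keys and escapes them inside put(), get(), containsKey() and remove(), delegating everything else to a HashMap through AbstractMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical variant: callers use ordinary String keys, and the map stores
// them with '$' and '.' replaced by their full width equivalents.
public class EscapingMap<V> extends AbstractMap<String, V> {

  private final Map<String, V> delegate = new HashMap<>();

  // same substitution suggested by the mongoDB documentation: U+FF04 and U+FF0E
  static String escape(final String key) {
    return key.replace('$', '\uff04').replace('.', '\uff0e');
  }

  @Override
  public V put(final String key, final V value) {
    return delegate.put(escape(key), value);
  }

  // lookups escape the key too, so get("a.b") finds the entry stored by put("a.b", ...)
  @Override
  public V get(final Object key) {
    return key instanceof String ? delegate.get(escape((String) key)) : null;
  }

  @Override
  public boolean containsKey(final Object key) {
    return key instanceof String && delegate.containsKey(escape((String) key));
  }

  @Override
  public V remove(final Object key) {
    return key instanceof String ? delegate.remove(escape((String) key)) : null;
  }

  // iteration exposes the escaped keys, which is what the database should see
  @Override
  public Set<Map.Entry<String, V>> entrySet() {
    return delegate.entrySet();
  }
}
```

The trade-off: a dedicated key type such as MongoDBSafeKey lets the compiler guarantee that only escaped keys enter the map, while this variant hides the escaping from callers. Since an EscapingMap<Object> is already a Map<String, Object>, it could be passed to new Document(map) without a toMap() step.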
The class implementing the map key uses regular expressions to replace the reserved characters in the user-provided key with their Unicode full width equivalents:
import static com.google.common.base.Preconditions.checkArgument;
import static java.util.regex.Matcher.quoteReplacement;
import static java.util.regex.Pattern.compile;
import static java.util.regex.Pattern.quote;
import static org.apache.commons.lang.StringUtils.isNotBlank;
import static org.apache.commons.lang.StringUtils.trimToNull;

import java.util.regex.Pattern;

public class MongoDBSafeKey {

  // reserved characters in user-provided keys
  private static final Pattern DOLLAR_PATTERN = compile(quote("$"));
  private static final Pattern DOT_PATTERN = compile(quote("."));

  // Unicode full width equivalents: U+FF04 and U+FF0E
  private static final String DOLLAR_REPLACEMENT = quoteReplacement("\uff04");
  private static final String DOT_REPLACEMENT = quoteReplacement("\uff0e");

  // patterns and replacements used to restore the original characters
  private static final Pattern UDOLLAR_PATTERN = compile(quote("\uff04"));
  private static final Pattern UDOT_PATTERN = compile(quote("\uff0e"));

  private static final String UDOLLAR_REPLACEMENT = quoteReplacement("$");
  private static final String UDOT_REPLACEMENT = quoteReplacement(".");

  private String key;

  public static MongoDBSafeKey escapeMapKey(final String name) {
    final MongoDBSafeKey instance = new MongoDBSafeKey();
    instance.setKey(escapeFieldName(name));
    return instance;
  }

  public static String escapeFieldName(final String name) {
    final String trimmed = trimToNull(name);
    checkArgument(isNotBlank(trimmed), "Uninitialized or invalid field name");
    // the full width characters are used directly as replacements, so no second
    // unescaping pass is needed (such a pass would also rewrite backslash sequences
    // contained in the user-provided key)
    final String escaped = DOLLAR_PATTERN.matcher(trimmed).replaceAll(DOLLAR_REPLACEMENT);
    return DOT_PATTERN.matcher(escaped).replaceAll(DOT_REPLACEMENT);
  }

  // class implementation: getKey()/setKey(), plus equals() and hashCode(), which are
  // required to use instances as map keys

}
This class uses a method parameter validation strategy that I explained in a previous post.
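The remaining class implementation is elided. The sketch below is a hypothetical, JDK-only stand-in (SafeKeySketch) for two pieces such a key class plausibly needs: equals() and hashCode(), so that a freshly escaped key finds an existing entry in a Hashtable, and the reverse substitution suggested by the UDOLLAR/UDOT patterns, for keys read back from the database.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a key class like MongoDBSafeKey: value semantics for use
// as a map key, plus the reverse substitution for keys read back from the database.
public final class SafeKeySketch {

  private static final Pattern DOLLAR = Pattern.compile(Pattern.quote("$"));
  private static final Pattern DOT = Pattern.compile(Pattern.quote("."));
  private static final Pattern UDOLLAR = Pattern.compile(Pattern.quote("\uff04"));
  private static final Pattern UDOT = Pattern.compile(Pattern.quote("\uff0e"));

  private final String key;

  private SafeKeySketch(final String key) {
    this.key = key;
  }

  // '$' -> U+FF04 and '.' -> U+FF0E
  public static SafeKeySketch escape(final String name) {
    final String escaped = DOLLAR.matcher(name).replaceAll(Matcher.quoteReplacement("\uff04"));
    return new SafeKeySketch(DOT.matcher(escaped).replaceAll(Matcher.quoteReplacement("\uff0e")));
  }

  // restores the original characters, e.g. after reading a document back
  public String unescape() {
    final String restored = UDOLLAR.matcher(key).replaceAll(Matcher.quoteReplacement("$"));
    return UDOT.matcher(restored).replaceAll(Matcher.quoteReplacement("."));
  }

  public String getKey() {
    return key;
  }

  // two keys are equal when their escaped forms are equal
  @Override
  public boolean equals(final Object other) {
    return other instanceof SafeKeySketch && key.equals(((SafeKeySketch) other).key);
  }

  @Override
  public int hashCode() {
    return key.hashCode();
  }

  @Override
  public String toString() {
    return key;
  }
}
```

Without equals() and hashCode(), two keys escaped from the same name would be distinct map keys, and every lookup with a newly escaped key would miss.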

A final step is needed to integrate the new map with your data-binding classes. In my case, I'm using the Jackson JSON processor to marshal/unmarshal the Java objects exchanged with client applications. A key serializer and a key deserializer allow me to use the new map with Jackson:
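import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;

public class MongoDBSafeKeySerializer extends StdSerializer<MongoDBSafeKey> {

  public MongoDBSafeKeySerializer() {
    super(MongoDBSafeKey.class);
  }

  // writes the escaped key as the JSON field name; MongoDBSafeMap is backed by a
  // Hashtable, which rejects null keys, so value is never null here (writing a null
  // where a field name is expected would not produce valid JSON anyway)
  @Override
  public void serialize(final MongoDBSafeKey value, final JsonGenerator jgen, final SerializerProvider provider) throws IOException {
    jgen.writeFieldName(value.getKey());
  }

}

import java.io.IOException;

import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.KeyDeserializer;

public class MongoDBSafeKeyDeserializer extends KeyDeserializer {

  // escapes every incoming JSON field name before it becomes a map key
  @Override
  public Object deserializeKey(final String key, final DeserializationContext ctxt) throws IOException {
    return MongoDBSafeKey.escapeMapKey(key);
  }

}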
Register these classes with Jackson before processing any message coming from your clients and you're done:
import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.core.Version;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

public final class MongoDBJsonMapper {

  public static final ObjectMapper JSON_MAPPER = new ObjectMapper();
  static {
    // apply general configuration (only the last setSerializationInclusion() call takes
    // effect, so a preceding NON_NULL call would simply be overridden)
    JSON_MAPPER.setSerializationInclusion(Include.NON_DEFAULT);
    // register external serializers/deserializers
    final SimpleModule simpleModule = new SimpleModule("MyModule", new Version(1, 0, 0, null, "com.github.etorres.codexposed", "mymodule"));
    simpleModule.addKeySerializer(MongoDBSafeKey.class, new MongoDBSafeKeySerializer());
    simpleModule.addKeyDeserializer(MongoDBSafeKey.class, new MongoDBSafeKeyDeserializer());
    JSON_MAPPER.registerModule(simpleModule);
  }

  private MongoDBJsonMapper() {
    // static holder, not meant to be instantiated
  }

}
Now you can use the mongoDB-safe map in your code:
// create a new map escaping the key
final MongoDBSafeMap<MongoDBSafeKey, String> safeMap = new MongoDBSafeMap<>();
safeMap.put(MongoDBSafeKey.escapeMapKey("$this.is.an.invalid...key."), "Hello World!");

// serialize to JSON (writeValueAsString() and readValue() may throw an IOException)
final String payload = MongoDBJsonMapper.JSON_MAPPER.writeValueAsString(safeMap);

// deserialize from JSON
@SuppressWarnings("unchecked")
final MongoDBSafeMap<MongoDBSafeKey, String> safeMap2 = MongoDBJsonMapper.JSON_MAPPER.readValue(payload, MongoDBSafeMap.class);

// create a mongoDB document (org.bson.Document)
final Document doc = new Document(safeMap.toMap());
An additional advantage of this approach is that you can use the new map in your REST services and clients. First, create a provider to inject your object mapper:
import javax.ws.rs.ext.ContextResolver;
import javax.ws.rs.ext.Provider;
import com.fasterxml.jackson.databind.ObjectMapper;

// makes the mongoDB-aware object mapper available to the JAX-RS JSON provider
@Provider
public class MapperProvider implements ContextResolver<ObjectMapper> {

  @Override
  public ObjectMapper getContext(final Class<?> type) {
    return MongoDBJsonMapper.JSON_MAPPER;
  }

}
Finally, register the provider with your JAX-RS implementation. For example, in the case of Jersey:
import static com.google.common.collect.Sets.newHashSet;

import java.util.Set;
import javax.ws.rs.core.Application;
import org.glassfish.jersey.jackson.JacksonFeature;

public class MyApplication extends Application {

  private final Set<Class<?>> classes = newHashSet();
  private final Set<Object> instances = newHashSet();

  public MyApplication() {
    // create your resources here
    // instances.add(new MyResource());
    // register additional JAX-RS providers
    classes.add(MapperProvider.class);
    classes.add(JacksonFeature.class);
  }

  // expose the registered provider classes and resource instances to the JAX-RS runtime
  @Override
  public Set<Class<?>> getClasses() {
    return classes;
  }

  @Override
  public Set<Object> getSingletons() {
    return instances;
  }

}

Conclusions

In this post, I offer a solution to the problem of storing user-provided keys with mongoDB reserved characters. The solution works with the JSON processing library Jackson and can be easily adapted to work with a REST API.

A complete working example is available on my GitHub account under the Apache License, version 2.0. Convenient test cases are provided with JUnit and Maven.
