Monday, 4 March 2013

Indexing Multiple Document Types with the Elasticsearch CouchDB River Plugin

I've been using CouchDB with Elasticsearch a lot, and one of the problems we had to solve was how to index multiple document types stored in a single CouchDB database. An additional wrinkle is that our model has a two-level hierarchy, which ends up as a parent-child relationship in Elasticsearch. To get the best out of Elasticsearch in this case, we had to modify the river plugin configuration.

So for example we have two documents like this:

{
   "_id" : "1",
   "type" : "parent",
   "child_ids" : [ "2" ]
}

and


{
   "_id" : "2",
   "type" : "child",
   "parent_id" : "1"
}

The first step is to get Elasticsearch to recognise these as two different types of documents. This can be achieved using the script filter function in the Elasticsearch CouchDB river plugin like this:

{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "db" : "example",
        "script" : "ctx._type = ctx.doc.type"
    },
    "index" : {
        "index" : "example"
    }
}
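For reference, the river is registered by PUTting this JSON to the river's _meta document in Elasticsearch. The Python sketch below only builds that request and does not send it; the river name "example" and the Elasticsearch address localhost:9200 are assumptions for illustration, so adjust them to your setup.

```python
import json
import urllib.request

# Assumptions: Elasticsearch on localhost:9200, river named "example".
ES_URL = "http://localhost:9200"
RIVER_NAME = "example"

river_config = {
    "type": "couchdb",
    "couchdb": {
        "host": "localhost",
        "port": 5984,
        "db": "example",
        # Copy the CouchDB "type" field into the Elasticsearch mapping type.
        "script": "ctx._type = ctx.doc.type",
    },
    "index": {"index": "example"},
}


def build_river_request(es_url, river_name, config):
    """Build (but do not send) the PUT request that registers the river."""
    url = f"{es_url}/_river/{river_name}/_meta"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_river_request(ES_URL, RIVER_NAME, river_config)
    print(req.get_method(), req.full_url)
    # To actually register the river against a running node:
    # urllib.request.urlopen(req)
```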

This simple script takes the type field from the original CouchDB document and uses it to set the mapping type in Elasticsearch. To add the parent/child information, change the script to this:

"script" : "ctx._type = ctx.doc.type; if (ctx._type == 'child') ctx._parent = ctx.doc.parent_id"
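To make the routing decision explicit, here is the same logic the script applies, re-expressed as a small Python function. It is only an illustration (the river runs the script above, not this code), and it assumes parent_id holds a single parent ID string, since that value is what becomes the child's _parent.

```python
def route(doc):
    """Mimic the river script: choose the ES type and, for children, the parent.

    Returns (es_type, parent_id_or_None). Illustration only; the river
    itself evaluates the script string, not this function.
    """
    es_type = doc["type"]
    # Only child documents get a parent; parents are indexed without one.
    parent = doc["parent_id"] if es_type == "child" else None
    return es_type, parent


parent_doc = {"_id": "1", "type": "parent", "child_ids": ["2"]}
child_doc = {"_id": "2", "type": "child", "parent_id": "1"}

print(route(parent_doc))  # ('parent', None)
print(route(child_doc))   # ('child', '1')
```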

Now Elasticsearch has all the information it needs to support multiple document types and parent/child mappings.
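One thing the script cannot do is declare the relationship in the mapping itself. In the Elasticsearch releases of this era (0.90/1.x), a type whose documents carry a parent needs a _parent mapping pointing at the parent type, and Elasticsearch rejects a parent ID for a type without one; later versions replaced _parent with a join field. The sketch below builds (without sending) the put-mapping request for that, assuming the index and type names from this example; the helper name is hypothetical. The mapping should be in place before the river indexes any child documents.

```python
import json
import urllib.request


def build_parent_mapping_request(es_url, index, child_type, parent_type):
    """Build (but do not send) a put-mapping request declaring the parent type.

    Hypothetical helper; targets the pre-6.x "_parent" mapping field.
    """
    url = f"{es_url}/{index}/{child_type}/_mapping"
    mapping = {child_type: {"_parent": {"type": parent_type}}}
    return urllib.request.Request(
        url,
        data=json.dumps(mapping).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_parent_mapping_request("http://localhost:9200", "example", "child", "parent")
print(req.get_method(), req.full_url)
```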

One downside of this approach is that documents in CouchDB must always carry type information. That isn't the case if you simply use HTTP DELETE to remove a document, because CouchDB then retains nothing but the ID and revision. Instead, you must use the bulk document API to mark documents as deleted while retaining the type information. So to delete the child document above, you would do the following:


POST /example/_bulk_docs HTTP/1.1
Content-Type: application/json

{
  "docs" : [{
        "_id" : "2",
        "_rev" : "rev",
        "_deleted" : true,
        "type" : "child"
  }]
}

This marks the document as deleted while preserving its type information. Replace "rev" with the document's current revision.
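If you delete documents from code rather than by hand, building these tombstones can be automated. The sketch below is my own variation, not something the river requires: the bulk_delete_body helper is hypothetical, and besides type it also copies parent_id when present, on the assumption that the river script (which reads ctx.doc.parent_id for children) may need it to route the delete to the right parent. Remove "parent_id" from keep to match the request above exactly. The revision "1-abc" is a made-up value.

```python
import json


def bulk_delete_body(docs, keep=("type", "parent_id")):
    """Build a CouchDB _bulk_docs body that deletes docs but keeps some fields.

    Each tombstone gets _id, _rev and "_deleted": true, plus any of the
    fields named in `keep` that the original document actually has.
    """
    tombstones = []
    for d in docs:
        t = {"_id": d["_id"], "_rev": d["_rev"], "_deleted": True}
        for field in keep:
            if field in d:
                t[field] = d[field]
        tombstones.append(t)
    return json.dumps({"docs": tombstones})


child = {"_id": "2", "_rev": "1-abc", "type": "child", "parent_id": "1", "title": "x"}
print(bulk_delete_body([child]))
```

The resulting string is the body to POST to /example/_bulk_docs with a Content-Type of application/json.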

Wednesday, 4 April 2012

Securing Public Facing REST APIs

I've been looking at how to secure public-facing REST APIs lately, and it's been an interesting journey. This article on designing secure REST APIs neatly encapsulates every step I went through (modulo the peyote). While the article says "without OAuth", reading through the comments and the updates it looks like the conclusion is "use two-legged OAuth".
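For a feel of what "two-legged" means in practice: the client signs each request with a consumer secret it shares with the server, and no user access token is involved. The Python sketch below assumes OAuth 1.0a with HMAC-SHA1 signatures as specified in RFC 5849; the helper names, consumer key and secret are hypothetical, and the caller is assumed to pass an already normalized base URI (lowercase scheme and host, no query string). A server-side OAuth library would normally do all of this for you.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(s):
    # RFC 5849 percent-encoding: only unreserved characters stay unescaped.
    return quote(str(s), safe="~")


def signature_base_string(method, base_uri, params):
    """Build the RFC 5849 signature base string.

    `params` is a list of (name, value) pairs: query/form parameters plus
    the oauth_* protocol parameters, excluding oauth_signature itself.
    """
    encoded = sorted((_enc(k), _enc(v)) for k, v in params)
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), _enc(base_uri), _enc(normalized)])


def sign(method, base_uri, params, consumer_secret, token_secret=""):
    # Two-legged: there is no access token, so the token secret is empty.
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}"
    base = signature_base_string(method, base_uri, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def verify(method, base_uri, params, signature, consumer_secret):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign(method, base_uri, params, consumer_secret)
    return hmac.compare_digest(expected, signature)


params = [
    ("oauth_consumer_key", "client-123"),
    ("oauth_nonce", "abc"),
    ("oauth_signature_method", "HMAC-SHA1"),
    ("oauth_timestamp", "1333526400"),
    ("oauth_version", "1.0"),
    ("page", "2"),
]
sig = sign("GET", "https://api.example.com/items", params, "s3cret")
print(verify("GET", "https://api.example.com/items", params, sig, "s3cret"))
```

A real server would also reject stale oauth_timestamp values and replayed oauth_nonce values before trusting the signature.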

Since I mostly use Java these days, I looked around for a Java library to support OAuth on the server side. It seems everyone has one these days: Jersey, Spring Security, etc. A good summary of the available libraries is on Stack Overflow.