Tuesday, July 10, 2007

The Great LDAP-Database Divide

It is a frequent requirement in enterprise application development that a system integrate with an external security store for authentication/authorization -- typically an LDAP system or derivative.

It is also almost always a requirement to leverage some kind of database for persistent storage of information.

But what happens when the information in one store must frequently be related to the information in the other? For example, what if your reports call for data that lives partly in LDAP and partly in the database? This is the stuff religious wars are fought over. Use LDAP for everything! Use databases for everything! And the ever-famous Let's Just Copy All The Data Into the Database and Have it in Both Places!

None of these is very useful in practice, because you don't always get to choose your environment or your requirements. To find a good solution, you are going to need to work with a distributed, multi-type data store. Assuming that you don't want corrupted data across the stores, this means an external transaction server and transaction-aware services. It also means ensuring that your client understands the issue at hand -- and the costs of requiring your application to work in this manner.

The first thing you'll probably want to do is hide the fact that there are disparate data stores at all. The last thing you want is application developers trying to manage the connectivity and mechanics of working across LDAP and databases. Ideally, you want them working with a unified API that consists of business methods and that's all.
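To make the "unified API" idea concrete, here is a minimal sketch in Java. Every name in it (StaffReports, UserDirectory, OrderStore and the in-memory classes) is made up for illustration and is not from any real project; a real UserDirectory would wrap an LDAP client and a real OrderStore would wrap a JDBC data source. The point is only that application code sees one business method and never learns that two stores sit behind it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical business-level API: callers see business methods only.
interface StaffReports {
    // Returns lines such as "Alice Smith: order-17" for one user.
    List<String> ordersWithNames(String userId);
}

// Hypothetical store contracts. Real implementations would wrap an
// LDAP client and a JDBC data source respectively.
interface UserDirectory { String displayName(String userId); }
interface OrderStore { List<String> orderIds(String userId); }

// The only class that knows two stores exist.
final class CompositeStaffReports implements StaffReports {
    private final UserDirectory directory;
    private final OrderStore orders;

    CompositeStaffReports(UserDirectory directory, OrderStore orders) {
        this.directory = directory;
        this.orders = orders;
    }

    @Override
    public List<String> ordersWithNames(String userId) {
        String name = directory.displayName(userId);   // "LDAP" side
        List<String> lines = new ArrayList<>();
        for (String orderId : orders.orderIds(userId)) { // "database" side
            lines.add(name + ": " + orderId);
        }
        return lines;
    }
}

// In-memory stand-ins so the sketch runs without a directory or database.
final class InMemoryDirectory implements UserDirectory {
    private final Map<String, String> names = new HashMap<>();
    void put(String userId, String name) { names.put(userId, name); }
    @Override public String displayName(String userId) {
        // Falls back to the raw id when the user is unknown.
        return names.getOrDefault(userId, userId);
    }
}

final class InMemoryOrders implements OrderStore {
    private final Map<String, List<String>> byUser = new HashMap<>();
    void add(String userId, String orderId) {
        byUser.computeIfAbsent(userId, k -> new ArrayList<>()).add(orderId);
    }
    @Override public List<String> orderIds(String userId) {
        return byUser.getOrDefault(userId, new ArrayList<>());
    }
}
```

Swapping the in-memory classes for real LDAP and JDBC implementations would not change anything for code that depends on StaffReports, which is the whole reason for the facade.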

As an example of this, one of the Java-based implementations I was involved with used a high-level interface to expose the business methods. The implementation was dependency-injected (via Spring beans) and was composed of two data sources -- an LDAP data source and a database data source, each wrapped in transactionally-aware JTA containers. Annotations specified in the interface signal how the transactional nature of the method should be treated.
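The annotation-driven arrangement described above can be imitated in plain Java to show the moving parts. This is not the Spring or JTA code from that project: @InTransaction, TxResource, StagedStore and TxProxy are all hypothetical stand-ins, and a JDK dynamic proxy plays the role the container would normally play. The sketch also commits the two resources one after the other, which is weaker than the two-phase commit a real transaction manager performs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker standing in for a real transactional annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface InTransaction {}

// Hypothetical resource contract; a real one would be XA-capable.
interface TxResource {
    void begin();
    void commit();
    void rollback();
}

// Toy store: writes are staged until commit and discarded on rollback.
final class StagedStore implements TxResource {
    final List<String> committed = new ArrayList<>();
    private final List<String> staged = new ArrayList<>();
    void write(String value) { staged.add(value); }
    @Override public void begin() { staged.clear(); }
    @Override public void commit() { committed.addAll(staged); staged.clear(); }
    @Override public void rollback() { staged.clear(); }
}

// The business interface carries the transactional hint.
interface AccountService {
    @InTransaction
    void register(String userId, String profile);
}

final class DefaultAccountService implements AccountService {
    private final StagedStore ldap;
    private final StagedStore database;

    DefaultAccountService(StagedStore ldap, StagedStore database) {
        this.ldap = ldap;
        this.database = database;
    }

    @Override
    public void register(String userId, String profile) {
        ldap.write(userId);
        if (profile == null) {
            // Fails after the "LDAP" write, so the proxy must undo it.
            throw new IllegalArgumentException("profile is required");
        }
        database.write(userId + ":" + profile);
    }
}

// Wraps a service so annotated methods begin/commit/rollback all resources.
final class TxProxy {
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> api, T target, TxResource... resources) {
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api},
            (proxy, method, args) -> {
                boolean tx = method.isAnnotationPresent(InTransaction.class);
                if (tx) for (TxResource r : resources) r.begin();
                try {
                    Object result = method.invoke(target, args);
                    // Sequential commit only; not a two-phase commit.
                    if (tx) for (TxResource r : resources) r.commit();
                    return result;
                } catch (InvocationTargetException e) {
                    if (tx) for (TxResource r : resources) r.rollback();
                    throw e.getCause();
                }
            });
    }
}
```

A caller would obtain the service with TxProxy.wrap(AccountService.class, new DefaultAccountService(ldap, db), ldap, db) and then simply call register. If the call throws partway through, neither store keeps the partial write.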

But all of this is, to me at least, a Herculean effort, and almost never worth it. The performance of such applications often suffers, the level of complexity goes up significantly, and the costs of developing and maintaining such a solution go up with it.

So far the only way I've found to get around this is to limit the relational usage of the data -- to force scenarios where I can load user and permission data from LDAP during the authentication process and then never touch it again, and to resist scenarios where reporting includes user data in the result set. But this is not always possible -- again, you don't always get to choose your environment or requirements.
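The load-once approach can be sketched as follows. The names DirectoryLookup and UserSession are hypothetical, and the lookup is a plain interface rather than a real LDAP call; the sketch only shows the shape of the idea, which is to copy the permissions into an immutable session object at login and answer every later check from that copy.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical lookup; a real implementation would query the directory.
interface DirectoryLookup {
    Set<String> permissionsFor(String userId);
}

// Immutable snapshot of what LDAP said about the user at login time.
final class UserSession {
    private final String userId;
    private final Set<String> permissions;

    private UserSession(String userId, Set<String> permissions) {
        this.userId = userId;
        this.permissions = permissions;
    }

    // The only place the directory is consulted.
    static UserSession login(String userId, DirectoryLookup directory) {
        Set<String> copy = new HashSet<>(directory.permissionsFor(userId));
        return new UserSession(userId, Collections.unmodifiableSet(copy));
    }

    String userId() { return userId; }

    // Answered from the snapshot; never goes back to the directory.
    boolean can(String permission) { return permissions.contains(permission); }
}
```

The trade-off is staleness: a permission revoked in LDAP stays in effect until the session ends, so this only fits applications that can tolerate that window.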

I'd be very interested to hear the experiences of other architects -- how have you bridged (or avoided) the LDAP/database divide? Have you developed best practices for this issue?

1 comment:

Anonymous said...

http://blogs.sun.com/treydrake/entry/ldap_vs_relational_database