Building a Blog Redux - Setting Up Lucene.Net For Search (Part 7)

Thursday, September 27, 2012

This is the seventh post in a series of posts about how I went about building my blogging application.

  1. Building a Blog Redux - Why Torture Myself (Part 1)
  2. Building a Blog Redux - The tools for the trade (Part 2)
  3. Building a Blog Redux - Entity Framework Code First (Part 3)
  4. Building a Blog Redux - Web fonts with @font-face and CSS3 (Part 4)
  5. Building a Blog Redux - Goodreads feed using Backbone.js (Part 5)
  6. Building a Blog Redux - Mapping View Models to Entities Using AutoMapper (Part 6)

Search is a significant part of website usability. When someone comes to my website looking for something and doesn't see the tag they want in the tag list, I want them to be able to search my blog posts for matching words. I would rather not have users leave my site for Google, search for a term there, and end up on another website. Users should be able to enter a term or phrase on my website and get the most relevant results back to review.

That being said, the latest functionality added to my blog site is search. I chose Lucene.Net mainly because it is the big open source player in search for ASP.Net applications. The features of the framework that I am using are the indexing of content from my blog posts and the querying of those indexes to return relevant information quickly. It is a direct, function-for-function port of the original Java version of Lucene.

Lucene.Net is an indexing and search framework that can run on many different platforms, but most commonly it is used to add search to websites. A few years ago it was essentially a dead project; however, some dedicated programmers revived the project and entered it into the Apache Foundation, where it was in incubator status until this past August 15th, when it was voted out of the incubator.

Be aware that now that the project has graduated from incubator status, the project site has moved, and it took me some time to actually find it. At least at the time of this writing, it didn't even come up in a Google search. It was on Twitter that I found the current project link.

What I would like to do in this post is show you how I have implemented Lucene.Net in an ASP.Net MVC web application. This article is mainly going to focus on setting up the components in an MVC web application using StructureMap for Inversion of Control. Later on, I will have a post on building the index and another post on querying the index.

Getting Started

First off, I should say that the documentation on the project site for Lucene.Net is a bit lacking, but you can get a pretty good introduction from the CodeClimber web site, starting with this article. Furthermore, since that article talks about how Lucene.Net was implemented in the Subtext blog engine project, I went to their project site and downloaded the Subtext source code. I tried not to copy any code directly from the Subtext source, but if you compare my code to theirs you will see a lot of similarities. I think the way the Subtext code is implemented is pretty good, and it is pretty much the most common way to set up the framework. For the most part, all the samples I have seen either refer to the same articles I am referring to, or give examples very similar to the code I have set up. The point is that I am standing on other developers' shoulders with this code, mainly taking their ideas and modifying them to fit my needs.

Set Up

Lucene.Net is up on NuGet, so it is easy enough to install as a package. Just install the package via NuGet, and you are ready to start writing code.

Since I am using StructureMap to manage all of my dependencies, I need to explicitly add the Lucene.Net components to the registry, because I want a singleton instance of each component so that only one of each exists while the application is running. Typically, I have StructureMap scan all my objects and then use StructureMap's "Convention over Configuration" pattern to register them all; however, this is a unique situation where that pattern does not work. I need to explicitly tell StructureMap that these components must run as singletons; that is, there can only be one instance of each object at any given time.

Regarding how I have set up StructureMap in general, I mentioned this in an earlier post, but if you want to see a good example of how I have StructureMap set up, see Elijah Manor's post.

Here is my StructureMap registry code:

    using StructureMap.Configuration.DSL;

    public class AviBlogRegistry : Registry
    {
        #region Constructors and Destructors

        public AviBlogRegistry()
        {
            // Register everything that follows the default naming convention (IFoo -> Foo).
            Scan(
                x =>
                    {
                        x.TheCallingAssembly();
                        x.Assembly("AviBlog.Core");
                        x.WithDefaultConventions();
                    });

            // Register the search and indexing services as singletons.
            For<ISearchEngineService>().Singleton().Use(
                () => new SearchEngineService(SingletonDirectory.Instance, SingletonAnalyzer.Instance));
            For<ISearchIndexService>().Singleton().Use<SearchIndexService>();
        }

        #endregion
    }

As I stated, I have the scan feature, which registers all my objects, but right below that I explicitly register two services so that the application uses one single instance of the SearchEngineService object and one single instance of the SearchIndexService object. These are my custom objects that wrap the Lucene.Net objects, which also need to be, and are, set up as singletons. I am also telling StructureMap that when the SearchEngineService is instantiated, it should use the singleton versions of the Lucene.Net Directory and Analyzer objects.

I am using the "Multithreaded Singleton Pattern" to instantiate these objects as singletons. The Singleton pattern is used when you want to accomplish the following:

  • You want an object to have only one instance.
  • You want an object to have one global entry point. In my case I am loading the objects up on the Application Start event.
  • You want the single-instance object to be "thread-safe". That is, in a multithreaded environment, the object should be safely created without the chance that another thread could accidentally create a second instance at the same time.
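
For comparison, the three goals above can also be met with the Lazy<T> type that ships with .NET 4. This is not the double-checked locking approach used in this post; it is only a minimal sketch, and SharedResource is a hypothetical stand-in class rather than anything from the blog's code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive, shared resource.
public sealed class SharedResource
{
    // Counts constructor runs so the demo can confirm only one happened.
    private static int constructionCount;

    // Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication:
    // one thread runs the factory and the others wait for its result.
    private static readonly Lazy<SharedResource> lazy =
        new Lazy<SharedResource>(() => new SharedResource());

    private SharedResource()
    {
        Interlocked.Increment(ref constructionCount);
    }

    // The single global entry point.
    public static SharedResource Instance
    {
        get { return lazy.Value; }
    }

    public static int ConstructionCount
    {
        get { return constructionCount; }
    }
}

public static class Program
{
    public static void Main()
    {
        // Hit the entry point from many threads at once.
        var seen = new SharedResource[100];
        Parallel.For(0, seen.Length, i => seen[i] = SharedResource.Instance);

        bool allSame = Array.TrueForAll(seen, r => ReferenceEquals(r, seen[0]));
        Console.WriteLine("Same instance everywhere: " + allSame);
        Console.WriteLine("Constructor runs: " + SharedResource.ConstructionCount);
    }
}
```

The trade-off is that Lazy<T> hides the locking, while the hand-written version makes it explicit and lets you add extra conditions inside the lock.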

I would have liked to have StructureMap create the singletons for the Directory and Analyzer, but I think that because I am still inside the AviBlogRegistry class, the objects are not registered yet and so cannot be resolved. I am getting null objects from StructureMap at this point. Perhaps if I go back and create a separate StructureMap registry class for these two objects, and have it execute before this registry, I can once again have every object in the application instantiated by StructureMap. I might try that later and update this post.
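
As a rough, untested sketch of what a separate registry might look like: the lambdas passed to Use are the same form used in AviBlogRegistry, and they should only run when the type is first resolved, not while the registry is being built. Everything specific here is an assumption: LuceneRegistry is a hypothetical name, the SearchEngineService constructor signature is guessed from how it is called above, StandardAnalyzer with LUCENE_30 is just a plausible analyzer choice, and HostingEnvironment.MapPath is swapped in for HttpContext.Current.Server.MapPath so that no current request is required.

```csharp
using System.Web.Hosting;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Store;
using StructureMap.Configuration.DSL;
using Version = Lucene.Net.Util.Version;

// Stand-ins for the blog's own types; the constructor signature is assumed.
public interface ISearchEngineService { }

public class SearchEngineService : ISearchEngineService
{
    public SearchEngineService(Directory directory, Analyzer analyzer) { }
}

// Hypothetical registry; not part of the original project.
public class LuceneRegistry : Registry
{
    public LuceneRegistry()
    {
        // Deferred: the index folder is opened on first resolve.
        For<Directory>().Singleton().Use(
            () => FSDirectory.Open(
                new System.IO.DirectoryInfo(
                    HostingEnvironment.MapPath("~/folder/subfolder/"))));

        // Assumed analyzer; the real project may use a different one.
        For<Analyzer>().Singleton().Use(
            () => new StandardAnalyzer(Version.LUCENE_30));

        // StructureMap supplies the two singletons above through the constructor.
        For<ISearchEngineService>().Singleton().Use<SearchEngineService>();
    }
}
```

If this works, the hand-written singleton classes would no longer be needed, since the container would own the lifetime of both objects.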

For now, to create singleton instances of the Directory and Analyzer Lucene.Net objects, I have created some singleton classes. Here is the Lucene.Net Directory singleton:

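    using System.IO;
    using System.Web;
    using Lucene.Net.Store;
    // Both System.IO and Lucene.Net.Store define a Directory type; use Lucene's.
    using Directory = Lucene.Net.Store.Directory;

    public sealed class SingletonDirectory
    {
        // volatile so that every thread sees the assignment as soon as it happens
        private static volatile Directory instance;

        private static readonly object syncRoot = new object();

        private SingletonDirectory()
        {
        }

        public static Directory Instance
        {
            get
            {
                // First check avoids taking the lock once the instance exists.
                if (instance == null)
                {
                    lock (syncRoot)
                    {
                        // Second check: another thread may have created it while we waited.
                        // Without a current HttpContext nothing is created and null is returned.
                        if (instance == null && HttpContext.Current != null)
                            instance =
                                FSDirectory.Open(
                                    new DirectoryInfo(HttpContext.Current.Server.MapPath("~/folder/subfolder/")));
                    }
                }
                return instance;
            }
        }
    }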

The Directory object represents the place where the index is stored, which is what the application writes the index to and runs search queries against. In this case I am using the FSDirectory flavor of Directory, which is the version that writes the index out to files on disk. There is also MMapDirectory, which uses memory-mapped files for reading, and RAMDirectory, which reads and writes the index in memory and, from what I can see, is mainly used for unit testing. FSDirectory.Open takes a DirectoryInfo object which specifies where you want the index files to be saved.
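
Because all of these classes share the Directory base type, code written against Directory should not care which one it receives. Here is a small sketch of that idea using RAMDirectory, the way a unit test might. It assumes the Lucene.Net 3.0.x API (where IndexWriter and IndexReader are disposable); DirectoryDemo, the field name, and the choice of StandardAnalyzer are all made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;
using Version = Lucene.Net.Util.Version;

public static class DirectoryDemo
{
    // Works against any Directory implementation: writes one document and
    // returns how many documents the index then contains.
    public static int IndexOneDocument(Directory directory)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // Disposing the writer commits the pending document.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("title", "Setting Up Lucene.Net For Search",
                              Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        // Open a read-only reader on the same storage and count the documents.
        using (var reader = IndexReader.Open(directory, true))
        {
            return reader.NumDocs();
        }
    }

    public static void Main()
    {
        // In-memory storage: nothing touches the disk.
        Directory inMemory = new RAMDirectory();
        Console.WriteLine(IndexOneDocument(inMemory));
    }
}
```

Passing the result of FSDirectory.Open to the same method would exercise the on-disk version without changing any other code.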

Conclusion

I'll stop here for now. In my next post in the series, I'll talk about how to read all the blog records from the database and create an index for searching. As always, you can check out the code on my GitHub account.
