Wednesday, October 31, 2012

Solr 4.0 and the BaseTokenFilterFactory

At work, we're upgrading from an ancient version of Lucene to the shiny new Solr 4.0. Unfortunately, the documentation on the Lucene wiki hasn't quite caught up with the most recent version of the software. I would fix omissions like this as I found them, but the wiki does not seem to accept public edits.

Solr 3.6 provided three base classes for creating custom analyzers: BaseCharFilterFactory, BaseTokenizerFactory, and BaseTokenFilterFactory. These classes lived in the documented package, org.apache.solr.analysis, through Solr 4.0 ALPHA, but as of the BETA release they have been moved and renamed.

In Solr 4.0, the new classes are CharFilterFactory, TokenizerFactory, and TokenFilterFactory. They can be found in the org.apache.lucene.analysis.util package, which is part of Lucene's analyzers-common project.
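To make the renames concrete, here is a minimal sketch of a helper that rewrites references to the old class names in a piece of source text. The FactoryRenames class and its migrate method are hypothetical names invented for this illustration; they are not part of Solr or Lucene. It only handles the package and class names listed above and does not attempt any other API changes a custom factory might need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr or Lucene): rewrites references to the
// Solr 3.6 base factory classes so they point at their Solr 4.0 replacements.
final class FactoryRenames {
    private static final String OLD_PACKAGE = "org.apache.solr.analysis.";
    private static final String NEW_PACKAGE = "org.apache.lucene.analysis.util.";

    // Old simple class name -> new simple class name, as listed in the post.
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("BaseCharFilterFactory", "CharFilterFactory");
        RENAMES.put("BaseTokenizerFactory", "TokenizerFactory");
        RENAMES.put("BaseTokenFilterFactory", "TokenFilterFactory");
    }

    // Rewrites fully qualified references first, then any remaining simple names
    // (for example in an "extends" clause).
    static String migrate(String source) {
        String result = source;
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            result = result.replace(OLD_PACKAGE + e.getKey(), NEW_PACKAGE + e.getValue());
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

For example, passing the line "import org.apache.solr.analysis.BaseTokenFilterFactory;" to FactoryRenames.migrate should give back "import org.apache.lucene.analysis.util.TokenFilterFactory;". Plain string replacement is crude, so treat the output as a starting point and review it by hand.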

Handy links:
Thanks to comment 7 for pointing this out.