Sunday, November 11, 2012

Custom Solr token filter factories with arguments

Many of the token filter factories, tokenizer factories, and char filter factories that come bundled with Solr accept parameters from schema.xml. The documentation for writing your own filters and tokenizers doesn't explain how to access these parameters, but it's easy to figure out by inspecting the source code of one of the bundled factories.

MappingCharFilterFactory takes the path to a mapping file as a parameter. Its Javadoc shows the declaration to put in schema.xml:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
In the source code for MappingCharFilterFactory, the parameter is read with the following line:
mapping = args.get("mapping"); 
Looking through the class hierarchy shows that all token filter factories, tokenizer factories, and char filter factories descend from AbstractAnalysisFactory, which declares args as a protected field. To access the parameters passed from schema.xml, all you have to do is read the args map. The same map is also available through the getArgs() method.
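Here is a minimal sketch of the same pattern in a custom factory. To keep it compilable without Solr on the classpath, AnalysisFactoryStandIn is a hypothetical stand-in for AbstractAnalysisFactory, assuming the 4.0-era API in which the schema.xml attributes are handed to an init(Map<String,String>) method. MinLengthFilterFactory and its minLength attribute are likewise made up for illustration. A real factory would extend the Solr base class and also implement create(). Later Lucene/Solr releases moved argument handling into the factory constructor, so check the API of the version you build against.

```java
import java.util.HashMap;
import java.util.Map;

// Demo: simulate Solr passing the attributes of
//   <filter class="..." minLength="3"/>
// to the factory as a map of strings.
class FactoryArgsDemo {
    public static void main(String[] argv) {
        Map<String, String> schemaAttrs = new HashMap<>();
        schemaAttrs.put("minLength", "3");

        MinLengthFilterFactory factory = new MinLengthFilterFactory();
        factory.init(schemaAttrs);

        System.out.println(factory.getMinLength());             // prints 3
        System.out.println(factory.getArgs().get("minLength")); // prints 3
    }
}

// Hypothetical stand-in for AbstractAnalysisFactory (NOT the real Solr class),
// modeled on the 4.0-era API where attributes arrive via init(Map).
abstract class AnalysisFactoryStandIn {
    // Like the base class described above, keep the raw attributes in a protected field.
    protected Map<String, String> args;

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public Map<String, String> getArgs() {
        return args;
    }
}

// Hypothetical factory that reads an optional integer attribute "minLength".
class MinLengthFilterFactory extends AnalysisFactoryStandIn {
    private int minLength = 1; // default used when the attribute is absent

    @Override
    public void init(Map<String, String> args) {
        super.init(args); // store the map so getArgs() and subclasses can see it
        String value = args.get("minLength"); // same lookup style as args.get("mapping")
        if (value != null) {
            // Attribute values are plain strings; parse them yourself.
            minLength = Integer.parseInt(value.trim());
        }
    }

    public int getMinLength() {
        return minLength;
    }
}
```

Declared in schema.xml as <filter class="com.example.MinLengthFilterFactory" minLength="3"/> (a hypothetical package), the minLength attribute would arrive in args under the key "minLength". Parsing it once in init() and storing the result in a field keeps the string handling out of create(), which may be called many times.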