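1. Adding Paoding-analysis as the Chinese word segmentation module
This is very simple: only one source file needs to be changed.
Source file (source files are underlined from here on): src\net\sf\regain\RegainToolKit.java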
import net.paoding.analysis.analyzer.PaodingAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;
...
public static Analyzer createAnalyzer(String analyzerType,
    String[] stopWordList, String[] exclusionList, String[] untokenizedFieldNames)
    throws RegainException
{
  ...
  // Map the configured analyzer type to the analyzer class to use.
  if (analyzerType.equalsIgnoreCase("english")) {
    analyzerClassName = StandardAnalyzer.class.getName();
  } else if (analyzerType.equalsIgnoreCase("german")) {
    analyzerClassName = GermanAnalyzer.class.getName();
  } else if (analyzerType.equalsIgnoreCase("chinese")) {
    analyzerClassName = ChineseAnalyzer.class.getName(); // Added by ping.
  } else if (analyzerType.equalsIgnoreCase("paoding")) {
    analyzerClassName = PaodingAnalyzer.class.getName(); // Added by ping.
  }
  ...
}
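The dispatch above can also be written as a lookup table, which makes it easier to see which type string selects which analyzer. The following is only a minimal sketch: the class AnalyzerTypeResolver and the method resolveClassName are hypothetical names, not part of regain, and returning null for an unknown type is an assumption, since the excerpt does not show what regain does in that case. The class names are kept as plain strings so that the sketch compiles without Lucene or Paoding on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of regain. */
final class AnalyzerTypeResolver {

    // Analyzer type (lower case) -> fully qualified analyzer class name.
    private static final Map<String, String> CLASS_NAMES = new LinkedHashMap<>();
    static {
        CLASS_NAMES.put("english", "org.apache.lucene.analysis.standard.StandardAnalyzer");
        CLASS_NAMES.put("german", "org.apache.lucene.analysis.de.GermanAnalyzer");
        CLASS_NAMES.put("chinese", "org.apache.lucene.analysis.cn.ChineseAnalyzer");
        CLASS_NAMES.put("paoding", "net.paoding.analysis.analyzer.PaodingAnalyzer");
    }

    /**
     * Returns the analyzer class name for the given type, ignoring case.
     * Returns null for an unknown type (an assumption made for this sketch).
     */
    static String resolveClassName(String analyzerType) {
        if (analyzerType == null) {
            return null;
        }
        return CLASS_NAMES.get(analyzerType.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(resolveClassName("Paoding"));
    }
}
```

With a table like this, supporting another analyzer would mean adding one entry instead of another else-if branch.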
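The source change touches only the one file above, but a few more changes are needed before the project builds completely and runs successfully.
The main ones are:
1. Modify build.xml, the Ant build configuration file.
2. Copy paoding-analysis.jar into the lib directory.
build.xml is modified as follows:
[An excerpt of the modified section; the additions are in bold and marked with an "Add by ping." comment]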
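Once the jar has been copied, a quick way to confirm that it is actually visible at runtime is to try loading the analyzer class by name. This is a minimal sketch, assuming it is run with the same classpath as regain; ClasspathCheck and isOnClasspath are hypothetical names introduced here for illustration.

```java
/** Hypothetical classpath check for illustration only; not part of regain. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or something it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "net.paoding.analysis.analyzer.PaodingAnalyzer";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

If the class is reported as not found, the jar is probably not in the lib directory that the build and the runtime use.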

<target name="runtime-desktop" depends="prepare-once, runtime-desktop-fast">
  <echo message="Creating the jars ..." />
  <fileset id="desktop-common-jars" dir="build/included-lib-classes/common">
    <include name="org/apache/lucene/**"/>
    <include name="org/apache/log4j/**"/>
    <include name="org/apache/regexp/**"/>
    <!-- Add by ping. -->
<include [...]