Installing Solr on CentOS

1. Install the JDK

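Download the installer (any other download method works too). Because Oracle requires accepting its license first, the package cannot be fetched with a plain wget. Open the JDK download page, tick Accept License Agreement under Java SE Development Kit 8u161, click jdk-8u161-linux-x64.rpm, and copy the actual file URL from the download page. Mine was, for example, http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.rpm?AuthParam=1519538436_f233fa0ab4a9cba466bec47d360db37a, which I fetched with wget from the /down directory. The saved file keeps the query string in its name, so rename it afterwards:

mv 'jdk-8u161-linux-x64.rpm?AuthParam=1519538436_f233fa0ab4a9cba466bec47d360db37a' jdk-8u161-linux-x64.rpm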
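
The rename can also be skipped by giving wget the output name up front with -O. The sketch below is not part of the original steps: the helper name strip_query is hypothetical, and the AuthParam value in the URL expires quickly, so substitute the URL you copied yourself.

```shell
#!/bin/sh
# Hypothetical helper: drop everything from the first "?" onward,
# e.g. the "?AuthParam=..." suffix that wget keeps in the saved file name.
strip_query() {
    printf '%s\n' "${1%%\?*}"
}

# Derive the clean file name from the download URL.
url='http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.rpm?AuthParam=1519538436_f233fa0ab4a9cba466bec47d360db37a'
file=$(strip_query "$(basename "$url")")
echo "$file"

# With the clean name in hand, wget can save under it directly (uncomment to run):
# wget -O "/down/$file" "$url"
```

Quoting the URL matters here, since both "?" and "&" are special to the shell.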

Install it:

rpm -ivh jdk-8u161-linux-x64.rpm

Set the system environment variables by appending the following to /etc/profile:

JAVA_HOME=/usr/java/jdk1.8.0_161
JRE_HOME=$JAVA_HOME/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export PATH JAVA_HOME CLASSPATH
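
Because these lines are appended to /etc/profile, rerunning the step would duplicate them. A minimal sketch of an idempotent append follows; the helper name append_once is hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F disables regex).
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on the target host (uncomment to run as root):
# append_once /etc/profile 'JAVA_HOME=/usr/java/jdk1.8.0_161'
# append_once /etc/profile 'export PATH JAVA_HOME CLASSPATH'
```

Single quotes keep $JAVA_HOME and similar references unexpanded, so they are written to the file literally.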

Apply the configuration and check the result:

source /etc/profile
java -version

2. Install Solr

Before installing, check that rng-tools and lsof are present (if a package is installed, the command prints its name and version):

rpm -qa |grep rng-tools
rpm -qa |grep lsof

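If they are missing and the machine has network access, install them with the commands below; otherwise download the packages and install them manually.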

yum install rng-tools
yum install lsof
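
The check and the install can be combined so that only missing packages are installed. This is a sketch under the assumption that rpm -q returns a non-zero status for a package that is not installed; the function name missing_packages is hypothetical.

```shell
#!/bin/sh
# Print each package name that "rpm -q" does not report as installed.
missing_packages() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Usage on the target host (uncomment; needs root and network access):
# missing=$(missing_packages rng-tools lsof)
# [ -n "$missing" ] && yum install -y $missing
```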

Configure rng-tools:

echo 'EXTRAOPTIONS="--rng-device /dev/urandom"' >/etc/sysconfig/rngd
service rngd start
chkconfig rngd on

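Create a solr directory under /down and download the Solr and Apache Tomcat packages into it. If the Tomcat URL is not found, open https://tomcat.apache.org/download-80.cgi in a browser to see which versions are available and adjust the version number. Both archives are then extracted with tar -zxvf.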

# Download the Solr package
wget http://mirror.bit.edu.cn/apache/lucene/solr/6.6.2/solr-6.6.2.tgz
# Download the Apache Tomcat package (if the page is not found, check https://tomcat.apache.org/download-80.cgi for the available versions and adjust the version number)
wget http://mirror.bit.edu.cn/apache/tomcat/tomcat-8/v8.0.50/bin/apache-tomcat-8.0.50.tar.gz
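
Mirrors usually carry only current releases, so the two URLs above may stop working. As a fallback sketch, the helpers below build the equivalent URLs on archive.apache.org. That the archive host keeps these versions under this layout is an assumption of this example, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed fallback host for releases that mirrors no longer carry.
ARCHIVE=https://archive.apache.org/dist

# Build the Solr download URL for a version such as 6.6.2.
solr_url() {
    printf '%s/lucene/solr/%s/solr-%s.tgz\n' "$ARCHIVE" "$1" "$1"
}

# Build the Tomcat download URL; the major version (8 for 8.0.50)
# selects the tomcat-N directory.
tomcat_url() {
    major=${1%%.*}
    printf '%s/tomcat/tomcat-%s/v%s/bin/apache-tomcat-%s.tar.gz\n' "$ARCHIVE" "$major" "$1" "$1"
}

solr_url 6.6.2
tomcat_url 8.0.50

# Download both (uncomment to run):
# wget "$(solr_url 6.6.2)" "$(tomcat_url 8.0.50)"
```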

3. Configure Solr

Extract the Solr and Apache Tomcat archives just downloaded with tar -zxvf:

tar -zxvf solr-6.6.2.tgz
tar -zxvf apache-tomcat-8.0.50.tar.gz

From inside the solr-6.6.2 directory, copy the dataimporthandler jars:

cp dist/solr-dataimporthandler-* server/solr-webapp/webapp/WEB-INF/lib/

In /down/solr, download the mysql-connector-java-5.1.45 package:

wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz

Extract mysql-connector-java-5.1.45.tar.gz:

tar -zxvf mysql-connector-java-5.1.45.tar.gz

Copy the MySQL connector jar:

cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar solr-6.6.2/server/solr-webapp/webapp/WEB-INF/lib/

In /down/solr, download the ikanalyzer jar:

wget https://github.com/zxiaofan/ik-analyzer-solr6/releases/download/6.6.0/ikanalyzer-6.6.0.jar

Extract ikanalyzer:

jar -xvf ikanalyzer-6.6.0.jar

Copy ikanalyzer:

cp ikanalyzer-6.6.* solr-6.6.2/server/solr-webapp/webapp/WEB-INF/lib/

In /down/solr, download pinyin4j:

wget https://github.com/zxiaofan/ik-analyzer-solr6/releases/download/6.6.0/pinyin4j_IKconfig.zip

Unpack it and copy the files:

unzip pinyin4j_IKconfig.zip -d pinyin4j
cp pinyin4j/pinyin*.jar solr-6.6.2/server/solr-webapp/webapp/WEB-INF/lib/
mkdir -p solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes
cp pinyin4j/ext.dic solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes/
cp pinyin4j/IKAnalyzer.cfg.xml solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes/
cp pinyin4j/stopword.dic solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes/
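
After this many copy steps it is easy to miss a file. The sketch below reports which expected file-name patterns have no match in a directory; the helper name missing_files and the pattern list in the usage comment are assumptions of this example, not part of the original steps.

```shell
#!/bin/sh
# Hypothetical helper: print each glob pattern that matches no file in a directory.
# Arguments: directory, then one or more quoted patterns.
missing_files() {
    dir=$1
    shift
    for pattern in "$@"; do
        found=no
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for f in "$dir"/$pattern; do
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$pattern"
    done
}

# Usage from /down/solr (uncomment to run):
# lib=solr-6.6.2/server/solr-webapp/webapp/WEB-INF/lib
# missing_files "$lib" 'solr-dataimporthandler-*' 'mysql-connector-java-*-bin.jar' 'ikanalyzer-*.jar' 'pinyin*.jar'
# missing_files solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes ext.dic IKAnalyzer.cfg.xml stopword.dic
```

An empty result means every pattern matched at least one file.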

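Edit solr-6.6.2/server/solr-webapp/webapp/WEB-INF/classes/IKAnalyzer.cfg.xml and add the following:

<!-- Dictionary hot-reload schedule [initial delay, interval] (positive integers, in minutes) -->
<entry key="dic_updateMin">1,1</entry>

<!-- Disable the built-in main dictionary main2012.dic (default false) -->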
<!--<entry key="dicInner_disable">true</entry> -->

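Set the time zone by changing SOLR_TIMEZONE in solr-6.6.2/bin/solr.in.sh to SOLR_TIMEZONE="UTC+8".

4. Create and configure a core

From the solr-6.6.2 directory, create a core named goods. Solr must already be running for this (start it with bin/solr start -force, as in step 5). If the instance port cannot be determined automatically, add -p with the port number. Running Solr as root is risky, so either switch to another account or add -force: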

bin/solr create -c goods -p 8983 -force

Edit solrconfig.xml, data-config.xml, and managed-schema under server/solr/goods/conf, creating any file that does not exist.

# Add to solrconfig.xml (before </config>)
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
      <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

# Add to data-config.xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource
      name="dbSource"
      type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://DB_HOST:PORT/DB_NAME"
      batchSize="-1"
      user="USERNAME"
      password="PASSWORD"
      readOnly="true"
      autoCommit="true"
      netTimeoutForStreamingResults="0"
      />
  <document>
      <!-- address-detail contains a hyphen, so MySQL needs it backtick-quoted; otherwise it is parsed as address minus detail -->
      <entity name="goods" dataSource="dbSource" onError="skip" pk="id" query="select id,`address-detail`,create_time from lab_model_address"
          deltaImportQuery="select id,`address-detail`,create_time from lab_model_address where id = '${dih.delta.id}'"
          deltaQuery="select id,`address-detail`,create_time from lab_model_address where create_time &gt; unix_timestamp('${dataimporter.last_index_time}')">
          <field column="id" name="id" />
          <field column="address-detail" name="address-detail" />
          <field column="create_time" name="create_time" />
      </entity>
  </document>
</dataConfig>

# Add to managed-schema (after <field name="id" ... />)
<field name="address-detail" type="text_ik" indexed="true" stored="true"/>
  <fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="0">
      <analyzer type="index">
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory"/>
          <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
          <filter class="com.shentong.search.analyzers.PinyinTransformTokenFilterFactory" minTermLenght="2" />
          <filter class="com.shentong.search.analyzers.PinyinNGramTokenFilterFactory" minGram="1" maxGram="20" />
      </analyzer>
      <analyzer type="query">
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory"/>
          <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
          <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
  </fieldType>
  <fieldType name="text_ik" class="solr.TextField">
      <analyzer type="index" useSmart="false" isMaxWordLength="false" >
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory"/>
          <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
      <analyzer type="query" useSmart="true" isMaxWordLength="true" >
          <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory"/>
          <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
  </fieldType>
  <field name="title_ik" type="text_ik" indexed="true" required="true" stored="true"/>
  <copyField source="address-detail" dest="title_ik"/>
  <field name="create_time" type="int" indexed="true" stored="true"/>
  <field name="pinyin" type="text_pinyin" indexed="true" stored="true"/>
  <copyField source="address-detail" dest="pinyin"/>

5. Start, stop, and restart Solr

bin/solr stop -all
bin/solr start -force
bin/solr stop -all; bin/solr start -force

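6. Import data

Open http://IP:8983/ in a browser and check under Core Admin that the new core (goods) exists. Pick the core in Core Selector, choose Dataimport, set Command to full-import, pick sensible values for Start and Rows, and click Execute to run the import.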
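
The same import can be triggered over HTTP through the /dataimport handler registered in solrconfig.xml, which is convenient on a host without a browser. The sketch below only builds the URL; the helper name dih_url is hypothetical, and localhost is an assumed host.

```shell
#!/bin/sh
# Hypothetical helper: build the DataImportHandler URL for a core.
# Arguments: host, port, core name, DIH command (full-import, delta-import, status).
dih_url() {
    printf 'http://%s:%s/solr/%s/dataimport?command=%s&wt=json\n' "$1" "$2" "$3" "$4"
}

dih_url localhost 8983 goods full-import

# On the Solr host (uncomment to run):
# curl "$(dih_url localhost 8983 goods full-import)&clean=true&commit=true"
# curl "$(dih_url localhost 8983 goods status)"
```

The status command reports whether an import is still running, which helps before moving on to the verification steps.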

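7. Verify the import: continuing from the previous step, choose Query and click Execute Query to see the results.

8. Verify tokenization: continuing from the previous step, choose Analysis, enter some text, select the field type text_ik, and inspect the resulting tokens. (Fields that should be tokenized must be declared with type text_ik in managed-schema.)

9. Add a scheduled import job (apache-solr-dataimporthandler-.jar can be downloaded from my GitHub)

Put apache-solr-dataimporthandler-.jar into solr-6.6.2/server/solr-webapp/webapp/WEB-INF/lib/. Then add the following to solr-6.6.2/server/solr-webapp/webapp/WEB-INF/web.xml, just before </web-app>: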

<listener>
 <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>

Create dataimport.properties in solr-6.6.2/server/solr/conf (create the directory if it does not exist) with the following content:
#################################################
#                                               #
#       dataimport scheduler properties         #
#                                               #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
syncCores=goods,goods-test

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8983

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL
#  incremental (delta) import
params=/dataimport?command=delta-import&clean=false&commit=true&optimize=false&wt=json&indent=true&entity=goods&verbose=false&debug=false

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=20

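#  interval between full index rebuilds, in minutes; the default of 7200 is 5 days (one day would be 1440)
#  empty, 0, or commented out: never rebuild the index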
reBuildIndexInterval=7200

#  URL params for the index rebuild
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true&optimize=true&wt=json&indent=true&entity=goods&verbose=false&debug=false

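#  start time for the rebuild interval; the first actual run happens at reBuildIndexBeginTime + reBuildIndexInterval*60*1000 (milliseconds)
#  two formats: 2012-04-11 03:10:00 or 03:10:00; the latter is completed with the date on which the service started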
reBuildIndexBeginTime=09:00:00
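
The intervals in this file are given in minutes, while the first-run formula works in milliseconds. A small sketch of the conversions, useful when picking values for interval and reBuildIndexInterval; the function names are hypothetical.

```shell
#!/bin/sh
# Convert a scheduler interval given in minutes.
minutes_to_ms()   { echo $(( $1 * 60 * 1000 )); }
# Integer division; there are 1440 minutes in a day.
minutes_to_days() { echo $(( $1 / 1440 )); }

minutes_to_ms 7200     # prints 432000000
minutes_to_days 7200   # prints 5
minutes_to_days 1440   # prints 1
```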

10. Restart Solr