Upgrading Sitecore 6.6 to Sitecore 8 – SolrCloud (Part 1 of 2) – Sitecore and Solr Cloud Integration

Many thanks for staying patient with me on this post. I have finally been able to find some time to write this article, and I am excited to share what I achieved in designing and implementing a production-level SolrCloud configuration for one of my customers.

A lot has been said about Sitecore’s support for SolrCloud. Sitecore has indicated experimental support for SolrCloud starting with v8.2. Although experimental, this is a strong indication that Sitecore is moving towards official support. To make it less worrying for you, I was able to get SolrCloud configured and running successfully with Sitecore XP 8.0 Update-4 in a production environment for my client. I would also like to credit Sitecore Support, who helped determine workarounds for one or two issues encountered along the way.

This article attempts to describe my experience with the setup and configuration of SolrCloud on a simulated Azure environment.

In the first part of this series, I will outline the checklist of prerequisite items needed for the SolrCloud setup on Azure VMs using an Infrastructure-as-a-Service (IaaS) approach. The next part of the article goes on to explain how to set up an internal load balancer, which acts as the distribution point for Solr HTTP queries across the cluster of Azure VMs that have Solr installed.

What is SolrCloud?

SolrCloud is a cluster of multiple Solr instances that provides automatic high availability and disaster recovery (HADR) and can scale to large volumes of content. It provides redundancy for your index storage to avoid a single point of failure. With its automatic HADR features, customers do not have to worry about manually promoting another member to primary should the previous primary instance suffer a power outage, a network partition, server downtime and so on. For those who are unfamiliar with SolrCloud, I strongly suggest reading the following articles:

  1. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
  2. https://support.lucidworks.com/hc/en-us/articles/201298317-What-is-SolrCloud-And-how-does-it-compare-to-master-slave-

In my illustration, I will demonstrate a setup of three Solr member instances, which is the minimum for an automatic HADR setup.

Note*: A Solr member instance is a Solr Server that participates in a quorum or election process managed by Zookeeper as part of the automatic HADR mechanisms.

What are the prerequisite downloads?

To get an Azure VM set up with a single running Solr member instance, the following items are required, to be set up in this order:

  1. Windows Server 2012 R2 VM servers
  2. Separate drive partitions for the Solr installation files and for the log files, to prevent resource contention between the writing of rolling log files and the indexing process. – E:/ for Solr and Zookeeper log files, F:/ for the Solr installation
  3. Java Server JRE (http://www.oracle.com/technetwork/java/javase/downloads/index.html) – jdk1.8.0_60
  4. Apache Zookeeper (https://zookeeper.apache.org) – Zookeeper-3.4.6
  5. Apache Solr (http://lucene.apache.org/solr) – Solr 5.3.0

Technical Architecture

  1. 3 Solr services, one hosted on each of 3 individual Azure VMs.
  2. 3 Zookeeper services hosted on the same 3 Azure VMs, one per VM. Because Zookeeper services consume fairly minimal resources, they can safely be hosted on the same VMs.
  3. SolrCloud collection setup with a single shard and a replication factor of 3 for minimal automatic HADR.
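
To see why three members is the smallest useful ensemble: Zookeeper needs a strict majority of its members to be up before it will serve requests. The arithmetic can be sketched in Python. The function names here are my own, for illustration only, and are not part of Solr or Zookeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a Zookeeper ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many members can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, majority needed, and failures tolerated.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With two members the ensemble still cannot lose any server, so three is the first size that survives a single VM failure.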

Installation and Configuration Steps

  1. Download Zookeeper release 3.4.6 (stable) from http://www.eu.apache.org/dist/zookeeper/. Zookeeper is a high-performance coordination service that manages the Solr configuration for your cluster in a single managed location. For more info on the installation and configuration steps, visit http://zookeeper.apache.org/doc/r3.4.6/
  2. Extract Zookeeper using a utility such as WinRAR or 7-Zip. Make sure to extract it to a drive separate from your system drive (C:/). [Screenshot: solr-zookeeper-on-dedicated-drive]
  3. Edit zoo.cfg (copy zoo_sample.cfg to zoo.cfg if it does not exist yet) and adjust the necessary parameters, for example the Zookeeper client port, the peer connection port (2888) and the leader election port (3888). The zoo.cfg file lives in E:\zookeeper-3.4.6\conf.
  4. In zoo.cfg, find the dataDir value and set it to the Zookeeper data folder (E:/zookeeper-3.4.6/data). Create the folder if it doesn’t exist. Note that you need to use forward slashes (/) to separate path segments in this case.
  5. Append a line for each server to the zoo.cfg file. Note in my case I’ve chosen to use IP addresses – 10.0.0.4, 10.0.0.5 and 10.0.0.6 for the three VMs to be created – these will be the statically assigned IPs to the three servers.
    1. server.1=10.0.0.4:2888:3888
    2. server.2=10.0.0.5:2888:3888
    3. server.3=10.0.0.6:2888:3888

  6. Recommendation: It is a good idea to store log files on a separate drive for optimum performance. The Zookeeper documentation recommends a dedicated transaction log directory to achieve lower latency on updates. By default, transaction logs are put in the same directory as the data snapshots and the myid file. The dataLogDir parameter specifies a different directory to use for the transaction logs.
  7. Recommendation: It is also a good idea to get a full understanding of what Zookeeper does and how to set it up; refer to http://zookeeper.apache.org/doc/r3.4.6/zookeeperStarted.html#ch_GettingStarted
  8. Repeat steps 1 to 5 on the other servers that will host Zookeeper.
  9. Download Solr 5.3.x from http://lucene.apache.org/solr/downloads.html.
  10. Extract the Solr installation zip package to the same drive where Zookeeper is installed. This is where your physical Solr cores (indexes) will live.
  11. Repeat steps 9 and 10 on the other servers that will host Solr.
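
Since the same zoo.cfg has to end up on all three servers, it can help to generate it rather than edit it by hand. Below is a minimal Python sketch of that idea. The function name is hypothetical, and the tickTime, initLimit and syncLimit values are assumptions taken from the defaults in zoo_sample.cfg, so adjust them to your environment.

```python
def render_zoo_cfg(server_ips, data_dir, data_log_dir=None,
                   client_port=2181, peer_port=2888, election_port=3888):
    """Return zoo.cfg text for an ensemble made of the given server IPs."""
    lines = [
        "tickTime=2000",   # assumed: default from zoo_sample.cfg
        "initLimit=10",    # assumed: default from zoo_sample.cfg
        "syncLimit=5",     # assumed: default from zoo_sample.cfg
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    if data_log_dir:
        # Optional dedicated transaction log directory (see step 6).
        lines.append(f"dataLogDir={data_log_dir}")
    # One server.N line per ensemble member, numbered from 1.
    for server_id, ip in enumerate(server_ips, start=1):
        lines.append(f"server.{server_id}={ip}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"


print(render_zoo_cfg(["10.0.0.4", "10.0.0.5", "10.0.0.6"],
                     "E:/zookeeper-3.4.6/data"))
```

Keep in mind that each server also needs a myid file in its dataDir containing only its own number from the server.N lines (1, 2 or 3 here); the sketch does not create that file.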

Running a Solr Member Instance

Once the installation steps are performed, you are almost ready to run your Solr instance member.

As a reminder:

Note 1*: A Solr member instance is a Solr Server that participates in a quorum or election process managed by Zookeeper as part of the automatic HADR mechanisms.

Note 2*: You can only run a Solr member instance that is part of a ZK quorum while the Zookeeper ensemble itself has a quorum, which for three Zookeeper services means at least two of them are running.

Below are the steps to run a Solr Member Instance:

  1. Make sure Zookeeper is running by starting zkServer.cmd before starting Solr. In this case, the Zookeeper service runs on the same server as Solr.
  2. Below is an example startup command for a Solr service member in SolrCloud mode that is part of a Zookeeper quorum:

set SOLRDIR=solr-5.3.0\bin

START %Z_TIP%%SOLRDIR%\solr.cmd start -p 8983 -f -c -z "10.0.0.4:2181,10.0.0.5:2182,10.0.0.6:2183" -noprompt

3. Once started, this Solr member instance runs as part of the SolrCloud cluster coordinated by the Zookeeper ensemble.

4. Repeat steps 1-2 for the remaining two Solr member instances.
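
Before starting Solr on a VM, it is handy to confirm that Zookeeper is actually answering. One way, sketched below in Python, is to send Zookeeper's "ruok" four-letter command to the client port and check for the "imok" reply. The function name is my own, and the default port of 2181 is an assumption; use whatever clientPort you set in zoo.cfg.

```python
import socket


def zookeeper_is_ok(host, port=2181, timeout=3.0):
    """Send the 'ruok' four-letter command; True if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = conn.recv(16)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return reply == b"imok"
```

For example, calling zookeeper_is_ok("10.0.0.4") from any of the VMs returns True only when the Zookeeper process on that server is up and responding. It does not prove that the ensemble as a whole has a quorum.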

Once all Solr member instances are up and running, you will be able to browse to the SolrCloud administration dashboard.

[Screenshot: SolrCloud_Capture]

If you have reached this step, you have successfully set up a vanilla SolrCloud installation.

Schema xml Generation

Sitecore requires that Solr’s schema.xml be generated to include dynamic fields that map to Sitecore’s system fields for proper indexing to take place on Solr. In order to do this, a few steps are required.

You will need to modify the original schema.xml file to prepare it for Sitecore’s Solr Schema Generator (available under Sitecore’s Control Panel) to generate the correct and final schema.xml to be loaded within your cores. For detailed steps to perform this, refer to https://kb.sitecore.net/articles/227897.

Once the schema.xml is generated, and if you plan to use a single language for your Sitecore platform, you may replace the following in the final schema.xml.

From:

<dynamicField name="*_t_ar" type="text_ar" indexed="true" stored="true" />
<dynamicField name="*_t_bg" type="text_bg" indexed="true" stored="true" />
<dynamicField name="*_t_ca" type="text_ca" indexed="true" stored="true" />
<dynamicField name="*_t_cz" type="text_cz" indexed="true" stored="true" />
<dynamicField name="*_t_da" type="text_da" indexed="true" stored="true" />
<dynamicField name="*_t_de" type="text_de" indexed="true" stored="true" />
<dynamicField name="*_t_el" type="text_el" indexed="true" stored="true" />
<dynamicField name="*_t_es" type="text_es" indexed="true" stored="true" />
<dynamicField name="*_t_eu" type="text_eu" indexed="true" stored="true" />
<dynamicField name="*_t_fa" type="text_fa" indexed="true" stored="true" />
<dynamicField name="*_t_fi" type="text_fi" indexed="true" stored="true" />
<dynamicField name="*_t_fr" type="text_fr" indexed="true" stored="true" />
<dynamicField name="*_t_ga" type="text_ga" indexed="true" stored="true" />
<dynamicField name="*_t_gl" type="text_gl" indexed="true" stored="true" />
<dynamicField name="*_t_hi" type="text_hi" indexed="true" stored="true" />
<dynamicField name="*_t_hu" type="text_hu" indexed="true" stored="true" />
<dynamicField name="*_t_hy" type="text_hy" indexed="true" stored="true" />
<dynamicField name="*_t_id" type="text_id" indexed="true" stored="true" />
<dynamicField name="*_t_it" type="text_it" indexed="true" stored="true" />
<dynamicField name="*_t_ja" type="text_ja" indexed="true" stored="true" />
<dynamicField name="*_t_lv" type="text_lv" indexed="true" stored="true" />
<dynamicField name="*_t_nl" type="text_nl" indexed="true" stored="true" />
<dynamicField name="*_t_no" type="text_no" indexed="true" stored="true" />
<dynamicField name="*_t_pt" type="text_pt" indexed="true" stored="true" />
<dynamicField name="*_t_ro" type="text_ro" indexed="true" stored="true" />
<dynamicField name="*_t_ru" type="text_ru" indexed="true" stored="true" />
<dynamicField name="*_t_sv" type="text_sv" indexed="true" stored="true" />
<dynamicField name="*_t_th" type="text_th" indexed="true" stored="true" />
<dynamicField name="*_t_tr" type="text_tr" indexed="true" stored="true" />

To:

<dynamicField name="*_t_ar" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_bg" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_ca" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_cz" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_da" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_de" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_el" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_es" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_eu" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_fa" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_fi" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_fr" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_ga" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_gl" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_hi" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_hu" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_hy" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_id" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_it" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_ja" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_lv" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_nl" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_no" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_pt" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_ro" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_ru" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_sv" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_th" type="text_general" indexed="true" stored="true" />
<dynamicField name="*_t_tr" type="text_general" indexed="true" stored="true" />
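
Making 29 edits by hand is error-prone, so here is a small Python sketch of the same replacement done with a regular expression. It is only an illustration under the assumption that the generated lines look exactly like the ones listed above; the function name is hypothetical, and you should back up schema.xml before applying anything like this to the real file.

```python
import re

# Matches type="text_xx" only inside dynamicField elements named "*_t_xx".
_LANG_FIELD = re.compile(
    r'(<dynamicField\s+name="\*_t_[a-z]{2}"\s+type=")text_[a-z]{2}(")'
)


def use_text_general(schema_xml: str) -> str:
    """Point every per-language *_t_xx dynamic field at text_general."""
    return _LANG_FIELD.sub(r"\g<1>text_general\g<2>", schema_xml)


sample = '<dynamicField name="*_t_de" type="text_de" indexed="true" stored="true" />'
print(use_text_general(sample))
```

The pattern deliberately leaves the fieldType definitions and any other dynamic fields untouched, so only the field-to-type mapping changes.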

The Sitecore-generated schema.xml must replace the original schema.xml in the configuration folder. This configuration will be used as the shared configuration to be uploaded to Zookeeper. Details on uploading the configuration to Zookeeper are discussed in the “Upload Configuration to Zookeeper” section below.

Solr and Sitecore Integration

Next, you will need to integrate Solr with Sitecore. This involves installing a valid Sitecore Solr package, which contains a set of Sitecore and Solr .NET assemblies and configuration files to be placed within your Sitecore website root. The package can be downloaded from Sitecore’s official documentation website.

Before you proceed to install the package, I strongly suggest taking a backup of your Sitecore website using Sitecore Instance Manager (SIM). This covers cases where you accidentally miss or skip a step during the installation, or make a configuration change that you later forget about. Follow the necessary steps to enable the Solr index configuration files and disable the Lucene config files. There is also a step to modify Global.asax so that the Sitecore application can initialize the Solr index and content search configuration on application initialization. To avoid repeating existing instructions, this article does not describe the detailed steps to integrate Solr with Sitecore, but you must complete the integration before proceeding to the next part of the series.

For more info on the steps to integrate Solr with Sitecore, you may refer to the community article, https://sitecore-community.github.io/docs/search/solr/Configuring-Solr-for-use-with-Sitecore-8/

Identify Solr Cores

Identify the Solr cores which are required to be setup. For Sitecore XP 8.0 Update-4, the following cores are required:

  1. sitecore_master_index
  2. sitecore_web_index
  3. sitecore_core_index
  4. sitecore_suggested_test_index
  5. sitecore_testing_index
  6. fxm core (sitecore_fxm_master and sitecore_fxm_web will read from this single core. In SolrCloud, a core is referred to as a collection.)
  7. sitecore_list_index
  8. social_messages_master
  9. social_messages_web
  10. sitecore_analytics_index
  11. sitecore_marketing_asset_index_master
  12. sitecore_marketing_asset_index_web

Upload Configuration to Zookeeper

Unlike standalone Solr, which keeps a configuration per index, a SolrCloud collection reads its configuration from a single, shared, managed location. This requires uploading the configuration to ZooKeeper, where it is then linked to each collection created across the Solr members of the cluster.

Tip: It is recommended to use a single managed configuration for all of your collections.

On one of your Solr VMs, perform the following in order:

  1. Make a copy of the basic_configs folder in the E:\solr-5.3.0\server\solr\configsets folder and rename the new folder to sitecore_config.
  2. Copy the generated schema.xml into the sitecore_config\conf folder, replacing the original schema.xml.
  3. CD to E:\solr-5.3.0\server\scripts\cloud-scripts\
  4. Run zkcli.cmd to upload the configuration to Zookeeper. Point -confdir at the conf folder that directly contains solrconfig.xml and schema.xml:
    zkcli.cmd -zkhost localhost:2181 -cmd upconfig -confdir E:\solr-5.3.0\server\solr\configsets\sitecore_config\conf -confname sitecore_common_config
  5. Once step 4 is completed, the configuration is uploaded to Zookeeper and shared with the Solr members of the cluster.

Create the Collections

In SolrCloud, a collection is the logical index, and it is made up of one or more physical cores. For example, sitecore_master_index is a logical collection made up of the following physical cores:

  1. sitecore_master_index_shard1_replica1
  2. sitecore_master_index_shard1_replica2
  3. sitecore_master_index_shard1_replica3
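
The naming pattern in the list above can be expressed as a short Python sketch, which is useful when you want to know which folders to expect on disk. Note that Solr assigns these names itself; the function below is hypothetical and only mirrors the pattern seen in this setup.

```python
def replica_core_names(collection, shards=1, replication_factor=3):
    """List the core names expected for a collection, following the
    <collection>_shard<N>_replica<M> pattern shown above."""
    return [
        f"{collection}_shard{s}_replica{r}"
        for s in range(1, shards + 1)
        for r in range(1, replication_factor + 1)
    ]


print(replica_core_names("sitecore_master_index"))
```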

Once all Solr members are up and running, the next step is to create all 12 logical collections in SolrCloud. From a command line, perform the following in order:

cd D:\Solr-5.3.0-Instance\bin
.\solr.cmd create_collection -c sitecore_core_index -d sitecore_config -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_master_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_web_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_analytics_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_marketing_asset_index_master -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_marketing_asset_index_web -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_testing_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_suggested_test_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c fxm -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c sitecore_list_index -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c social_messages_master -n sitecore_common_config -shards 1 -replicationFactor 3
.\solr.cmd create_collection -c social_messages_web -n sitecore_common_config -shards 1 -replicationFactor 3

The first command above creates the collection and uploads the configuration specified in the sitecore_config directory to a zookeeper directory called sitecore_common_config.

The commands after the first utilise the ZooKeeper configuration uploaded in the first command. All collections share this same configuration.

Each create_collection command creates the physical index folders across the three Solr member instances. The command issues an HTTP request to the Solr server to execute the collection creation and ensure that all the appropriate index folders and configuration information are created. A success message is displayed once the command completes successfully.
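
For the curious, the kind of HTTP request involved can be approximated with the Solr Collections API CREATE action. The Python sketch below only builds the URL and does not send it. The function name and the base URL are assumptions for illustration, and this is not necessarily the exact request that solr.cmd issues.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=3):
    """Build a Collections API CREATE URL for one collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


print(create_collection_url("http://10.0.0.4:8983/solr",
                            "sitecore_web_index",
                            "sitecore_common_config"))
```

Opening such a URL in a browser on one of the Solr VMs is an alternative to the command line when a single collection needs to be created or recreated.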

In summary, you should by now have a fully working SolrCloud service. To validate that the SolrCloud collections are attached to Sitecore, log in to your Sitecore CMS as an administrator and rebuild each index. For example, when you test the rebuild of the master index, monitor the physical index folder for index files that are recreated with the latest timestamps. Repeat this for the other indexes. This validates a successful and fully working SolrCloud installation, configuration and setup.
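
Rather than eyeballing timestamps in Explorer, the check can be scripted. Below is a minimal Python sketch, with a hypothetical function name, that lists the files under an index folder modified at or after a given moment.

```python
import os


def files_modified_since(index_dir, since_epoch):
    """Return paths under index_dir whose modification time is at or
    after since_epoch (seconds since the epoch), sorted by path."""
    recent = []
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) >= since_epoch:
                recent.append(path)
    return sorted(recent)
```

A possible way to use it: note time.time() just before starting the rebuild in Sitecore, then call the function with the physical index folder of one replica (the exact folder location depends on where your Solr home lives, so treat any path you pass in as specific to your installation). An empty list after the rebuild suggests Sitecore is not writing to that collection.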

In the next part of this article, I will demonstrate how you can use an automated approach to spin up your Solr Cloud service using a Windows Scheduler Task.
