Monthly Archives: August 2013

A First Exploration Of SolrCloud

Post by: NGData

SolrCloud has recently been in the news and was merged into Solr trunk, so it was high time to have a fresh look at it.The SolrCloud wiki page gives various examples but left a few things unclear for me. The examples only show Solr instances which host one core/shard, and it doesn’t go deep on the relation between cores, collections and shards, or how to manage configurations.

In this blog, we will have a look at an example where we host multiple shards per instance, and explain some details along the way.

The setup we are going to create is shown in this diagram.

solrcloud sample setup

SolrCloud terminology

In SolrCloud you can have multiple collections. Collections can be divided into partitions, these partitions are called slices. Each slice can exist in multiple copies, these copies of the same slice are called shards. So the word shard has a bit of a confusing meaning in SolrCloud, it is rather a replica than a partition. One of the shards within a slice is the leader, though this is not fixed: any shard can become the leader through a leader-election process.

Each shard is a physical index, so one shard corresponds to one Solr core.

If you look at the SolrCloud wiki page, you won’t find the word slice [anymore]. It seems like the idea is to hide the use of this word, though once you start looking a bit deeper you will encounter it anyway so it’s good to know about it. It’s also good to know that the words shard and slice are often used in ambiguous ways, switching one for the other (even in the sources). Once you know this, things become more comprehensible. An interesting quote in this regard: “removing that ambiguity by introducing another term seemed to add more perceived complexity”. In this article I’ll use the words slice and shard as defined above, so that we can distinguish the two concepts.

In SolrCloud, the Solr configuration files like schema.xml and solrconfig.xml are stored in ZooKeeper. You can upload multiple configurations to ZooKeeper, each collection can be associated with one configuration. The Solr instances hence don’t need the configuration files to be on the file system, they will read them from ZooKeeper.

Running ZooKeeper

Let’s start by launching a ZooKeeper instance. While Solr allows to run an embedded ZooKeeper instance, I find that this rather complicates things. ZooKeeper is responsible for storing coordination and configuration information for the cluster, and should be highly available. By running it separately, we can start and stop Solr instances without having to think about which one(s) embed ZooKeeper.

For the purpose of this article, you can get it running like this:

  • download ZooKeeper 3.3. Don’t take version 3.4, as it’s not recommended for production yet.
  • extract the download
  • copy conf/zoo_sample.cfg to conf/zoo.cfg
  • mkdir /tmp/zookeeper (path can be changed in zoo.cfg)
  • ./bin/ start

And that’s it, you have ZooKeeper running.

Setting up Solr instance directories

We are going to run two Solr instances, thus we’ll need two Solr instance directories.

Let’s create two directories like this:

mkdir -p ~/solrcloudtest/sc_server_1
mkdir -p ~/solrcloudtest/sc_server_2

Create the file ~/solrcloudtest/sc_server_1/solr.xml containing:

<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort=”8501”>

Create the file ~/solrcloudtest/sc_server_2/solr.xml containing (note the different hostPort value):

<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort=”8502”>

We need to specify the hostPort attribute since Solr can’t detect the port, it falls back to the default 8983 when not specified.

This is all we need: the actual core configuration will be uploaded to ZooKeeper in the next section.

Creating a Solr configuration in ZooKeeper

As explained before, the Solr configuration needs to be available in ZooKeeper rather than on the file system.

Currently, you can upload a configuration directory from the file system to ZooKeeper as part of the Solr startup. It is also possible to run ZkController’s main method for this purpose (SOLR-2805), but as there’s no script to launch it, the easiest way right now to upload a configuration is by starting Solr:

export SOLR_HOME=/path/to/solr-trunk/solr

# important: move into one the instance directory!
# (otherwise Solr will start up with defaults and create a core etc.)
cd sc_server_1

java \
 -Djetty.port=8501 \
 -Djetty.home=$SOLR_HOME/example/ \
 -Dsolr.solr.home=. \
 -Dbootstrap_confdir=$SOLR_HOME/example/solr/conf \
 -Dcollection.configName=config1 \
 -DzkHost=localhost:2181 \
 -jar $SOLR_HOME/example/start.jar

Now that the configuration is uploaded, you can stop this Solr instance again (ctrl+c).

We can now check in ZooKeeper that the configuration has been uploaded, and that no collections have been created yet.

For this, go to the ZooKeeper directory, and run ./bin/, and do the following commands:

[zk: localhost:2181(CONNECTED) 1] ls /configs

[zk: localhost:2181(CONNECTED) 2] ls /collections

You could repeat this process to upload more configurations.

If you would like to change a configuration later on, you essentially have to upload it again in the same way. The various Solr cores that make use of that configuration won’t be reloaded automatically however (SOLR-3071).

Starting the Solr servers

All SolrCloud magic is activated by specifying the zkHost parameter. Without this parameter, you run Solr ‘classic’, with the parameter, you run SolrCloud. If you look into the source code, you will see that this parameter causes the creation of a ZkController, and at various places checks of the kind ‘zkController != null’ are done to change behavior when in cloud mode.

Open two shells, and start the two Solr instances:

export SOLR_HOME=/path/to/solr-trunk/solr
cd sc_server_1
java \
 -Djetty.port=8501 \
 -Djetty.home=$SOLR_HOME/example/ \
 -Dsolr.solr.home=. \
 -DzkHost=localhost:2181 \
 -jar $SOLR_HOME/example/start.jar

and (note: different instance dir & jetty port)

export SOLR_HOME=/path/to/solr-trunk/solr
cd sc_server_2
java \
 -Djetty.port=8502 \
 -Djetty.home=$SOLR_HOME/example/ \
 -Dsolr.solr.home=. \
 -DzkHost=localhost:2181 \
 -jar $SOLR_HOME/example/start.jar

Note that now, we don’t have to specify the boostrap_confdir and collection.configName properties anymore (though that last one can still be useful as default sometimes, but not with the way we will create collections & shards below).

We have neither added the -Dnumshards parameter, which you might have encountered elsewhere. When you manually assign cores to shards as we will do below, I don’t think it serves any purpose.

So the situation now is that we have two Solr instances running, both with 0 cores.

Define the cores, collections, slices, shards

We are now going to create cores, and assign each core to a specific collection and slice. It is not necessary to define collections & shards anywhere, they are implicit by the fact that there are cores that refer them.

In our example, the collection is called ‘collectionOne’ and the slices are called ‘slice1′ and ‘slice2′.

Let’s start with creating a core on the first server:

curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=core_collectionOne_slice1_shard1&collection=collectionOne&shard=slice1&collection.configName=config1'

(in the URL above, and the solr.xml snippet below, the word ‘shard’ is used for ‘slice’)

If you have a look now at sc_server_1/solr.xml, you will see the core was added:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="10000" hostPort="8501"

AFAIU the information in ZooKeeper takes precedence, so the attributes collection and shard on the core above serve more as documentation, or they are of course also relevant if you would create cores by listing them in solr.xml rather than using the cores-admin API. Actually listing them in solr.xml might be simpler than doing a bunch of API calls, but there is currently one limitation: you can’t specify the configName this way.

In ZooKeeper, you can verify this collection is associated with the config1 configuration:

[zk: localhost:2181(CONNECTED) 5] get /collections/collectionOne

And you can also get an overview of all collections, slices and shards like this:

[zk: localhost:2181(CONNECTED) 0] get /clusterstate.json
  "collectionOne": {
    "slice1": {
      "fietsbel:8501_solr_core_collectionOne_slice1_shard1": {

(the somewhat strange “shard_id”:”slice1″ is just a back-pointer from the shard to the slice to which it belongs)

Now let’s create the remaining cores: one more on server 1, and two on server 2 (notice the different port numbers to which we send these requests).

curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=core_collectionOne_slice2_shard1&collection=collectionOne&shard=slice2&collection.configName=config1'

curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=core_collectionOne_slice2_shard2&collection=collectionOne&shard=slice2&collection.configName=config1'

curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=core_collectionOne_slice1_shard2&collection=collectionOne&shard=slice1&collection.configName=config1'

Let’s have a look in ZooKeeper at the current state of the clusterstate.json:

[zk: localhost:2181(CONNECTED) 0] get /clusterstate.json
  "collectionOne": {
    "slice1": {
      "fietsbel:8501_solr_core_collectionOne_slice1_shard1": {
      "fietsbel:8502_solr_core_collectionOne_slice1_shard2": {
    "slice2": {
      "fietsbel:8502_solr_core_collectionOne_slice2_shard1": {
      "fietsbel:8501_solr_core_collectionOne_slice2_shard2": {

We see we have:

  • one collection named collectionOne
  • two slices named slice1 and slice2
  • each slice has two shards. Within each slice, one shard is the leader (see “leader”:”true”), the other(s) are replicas. Of each slice, one shard is hosted in each Solr instance.

Adding some documents

Now let’s try our setup works by adding some documents:

cd $SOLR_HOME/example/exampledocs
java -Durl=http://localhost:8501/solr/core_collectionOne_slice1_shard1/update -jar post.jar *.xml

We sent the request to one specific core, but you could have picked any other core and the end result would be the same. The request will be forwarded automatically to the leader shard of the appropriate slice. The slice is selected based on the hash of the id of the document.

Use the admin stats page to see documents got added to all cores:





In my case, the cores for slice1 got 16 documents, those for slice2, 12. Unlike the traditional Solr replication, with SolrCloud updates are sent directly to the replica’s.


Let’s just query all documents. Again, we send the request to one particular shard:


You will see the numFound=”28”, the sum of 16 and 12.

What happens internally is that when you sent a request to a core, when in SolrCloud mode, Solr will look up what collection the core is associated with, and do a distributed query across all slices (it will pick one shard for each slice).

The SolrCloud wiki page gives the suggestion that you can use the collection name in the URL (like /solr/collection1/select). In our example, this would then be /solr/collectionOne/select. This is however not the case, but rather a particularity of that example. As long as you don’t host more than one slice and shard of the same collection in one Solr server, it can make sense to use such a core naming strategy.

Starting from scratch

When playing around with this stuff, you might want to start from scratch sometimes. In such case, don’t forget you have to remove data in three places: (1) the state stored in ZooKeeper (2) the cores defined in solr.xml and (3) the instance directories of these cores.

When writing the first draft of this article, I was using just one Solr instance and tried to have all the 4 cores (including replica’s) in one Solr instance. Turns out there was a bug that prevents this from working correctly (SOLR-3108).

Managing slices & shards

Once you have defined a collection, you can not (or rather should not) add new slices to it, since documents won’t be automatically moved to the new slice to fit with the hash-based partitioning (SOLR-2595).

Adding more replica shards should be no problem though. While above we have used a very explicit way of assigning each core to a particular slice, you can actually leave that parameter off and Solr will automatically assign it to some slice within the collection. (I guess here the -Dnumshards parameter kicks in to decide whether the new core should be a slice or a shard)

How about removing replicas? It can be done, but manually. You have to unload the core and remove the related state in ZooKeeper. This is an area that will be improved upon later. (SOLR-3080)

Another interesting thing to note is that when your run in SolrCloud mode, all cores will automatically take part in the cloud thing. If you add a core without specifying the collection, a collection named after that core will be created. You can’t mix ‘classic’ cores and ‘cloud’ cores in one Solr instance.


In this article we have barely touched the surface of everything SolrCloud is: there’s the update log for durability and recovery, the sync’ing between replica’s, the details of distributed querying and updating, the special _version_ field to help with some of the these points, the coordination (election of overseer & shard leaders), … Much interesting stuff to explore!

As becomes clear from this article, SolrCloud isn’t as easy to use yet as ElasticSearch. It still needs polishing and there’s more manual work involved in setting it up. To some extent this has its advantages, as long as it’s clear what you can expect from the system, and what you have to take care of yourself. Anyway, it’s great to see that the Solr developers were able to catch up with the cloud world.


Copy from:

Leave a comment

Posted by on August 14, 2013 in Application Server, Solr


Install Apache Solr and Tomcat under Windows

Solr logo

Installing and getting Solr up and running is straight forward, as long as you follow these simple steps. I ran into problems myself trying to get Solr starting up under Tomcat, even after reading several posts explaining how to, which is why I wrote this post, explaining all the necessary steps. Feel free to make a comment, if you have any questions or run into problems.

Pre Requirements:

1. Download Java SDK version 7 or above from:

2. Download Apache Tomcat from Apache Software Foundation;

3. Download the latest version of “Solr”:

Installation and Configuration:

– Open, navigate to \example\solr and extract the directories and files to c:\solr\, so it contains; bin, collection1, README.txt, solr.xml and zoo.cfg

– Again in navigate to \example\lib and extract all the jar-files into c:\Tomcat\lib\. Repeat the same procedure with the jar-files in \examples\lib\ext.

– In navigate to \example\webapps and extract solr.war to c:\Tomcat\webapps\. If the file is called solr-versionNumber.war, rename it to solr.war

3. In c:\Tomcat\conf\Catalina\localhost created a file called “solr.xml” with following content, where you specify solr docBase “c:/Tomcat/webapps/solr.war” and enviroment value “/Tomcat/webapps/”.

<?xml version="1.0" encoding="UTF-8"?>
<context docBase="c:/Tomcat/webapps/solr.war" debug="0" crossContext="true">
    <environment name="solr/home" type="java.lang.String" value="/Tomcat/webapps/" override="true"></environment>

4. Open up c:\Tomcat\webapps\solr\WEB-INF\web.xml and uncomment the following and replace “/put/your/solr/home/here” with your solr location


During these steps you might have to stop and start your Apache Tomcat to be able to write to the files and reload the changes. To stop/start Apache Tomcat, click the Apache icon in the task bar and click “Stop service / Start service”.

Restart Apache Tomcat Windows

You should now be able to navigate to http://localhost:8080/solr/ and get the Solr admin site, presuming you installed Apache Tomcat on port 8080 (as per default)

Solr 4.4.0 admin page

If you run into problems, you can view the Catalina log located in c:\Tomcat\logs\

*** Notation If you got error 404 and 503, please go to

1. C:\Tomcat\webapps and delete directory named [solr], wait until new directory is called [solr] generate, go to C:\tomcat\webapps\solr\WEB-INF and edit file web.xml in section [Solr Home]

2. Go to Apache Tomcat Properties (Taskbar Notification -> right click on Apache Tomcat -> Configure) , in Java Tab; add [-Dsolr.solr.home=C:\solr] in the bottom line

Or in (/usr/share/tomcat/bin/

JAVA_OPTS="-Dsolr.solr.home=/etc/tomcat6/solr -Djava.awt.headless=true -server -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+DisableExplicitGC"

Copy from:

Leave a comment

Posted by on August 13, 2013 in Application Server, Solr


Quick and easy setup of Mercurial server on Windows

by Gilles QUERRET

Installing Mercurial on a Linux server is a matter of minutes (depending on your Linux distro). But I had to install it on a Windows server, and there are subtle pitfalls you’re likely to encounter when doing that. This post is by no mean a full and detailed installation process for every configuration, it will just focus on a simple install.

Step 1 : HTTPD install
HTTPD is one of the widely used HTTP server. The Windows version can be downloaded fromhere. I’m using the 2.2.17 Win32 Binary including OpenSSL 0.9.8o. Choosing a typical install will use C:\Program Files (x86)\Apache Software Foundation\Apache2.2 as the default directory. The only specific step during install is this screen :

Just fill with appropriate values depending on your network.

Step 2 : Python install
Python is needed by Mercurial. You always have to use the same Python version as the one used by Mercurial. The current Mercurial version (1.7.5) is still using 2.6, so download this version from Python website. Even if you’re running Win x64, download the x86 version as I encountered some bugs with the 64 bits package (no idea why). Use the default settings for everything (especially installation directory).

Step 3 : Mercurial install
Mercurial compiled for Windows can be downloaded from here. Never use TortoiseHg to setup a Mercurial server, it won’t work correctly (and easily). Always download *.win32-py2.6.exe packages, they’re easier to setup.
Setup should detect your Python version and so compile Mercurial with Python 2.6.

Step 4 : Create directory structure
This test installation will use 3 differents directories to host repositories. We will use subfolders of C:\Repositories , named /Production, /QA and /Dev
We will keep them empty for now, just create this structure.

Step 5 : HTTPD setup
Create a new file called hgweb.cgi in C:\Program Files (x86)\Apache Software Foundation\Apache2.2\cgi-bin with this content :

#!C:/Python26/python.exe -u

# See also
# Path to repo or hgweb config to serve (see ‘hg help hgweb’)
config = “C:/Program Files (x86)/Apache Software Foundation/Apache2.2/cgi-bin/hgweb.config”

# Uncomment and adjust if Mercurial is not installed system-wide:
# import sys; sys.path.insert(0, “SomeDirectory”)

# Uncomment to send python tracebacks to the browser if an error occurs:
import cgitb; cgitb.enable()

from mercurial import demandimport; demandimport.enable()
from mercurial.hgweb import hgweb, wsgicgi
application = hgweb(config)

Create a new file called hgweb.config in C:\Program Files (x86)\Apache Software Foundation\Apache2.2\cgi-bin with this content :
/Production = C:/Repositories/Production/*
/QA = C:/Repositories/QA/*
/Dev = C:/Repositories/Dev/*
allow_push = *
style = monoblue
contact =
push_ssl = false

 Add this line in C:\Program Files (x86)\Apache Software Foundation\Apache2.2\conf\httpd.conf in the <IfModule alias_module=””> section :


ScriptAlias /hg “C:/Program Files (x86)/Apache Software Foundation/Apache2.2/cgi-bin/hgweb.cgi

This line allows HTTPD to map the /hg query directly to Mercurial (instead of calling /cgi-bin/hgweb.cgi).
Don’t forget to restart HTTPD.

Step 6 : Test !
Open your web browser at http://localhost/hg and voilà ! You should have this page :


Step 7 : Enjoy
You can now import/clone your repositories


Copy From:

Leave a comment

Posted by on August 12, 2013 in Control Version, Mercurial


How to keep footers at the bottom of the page


When an HTML page contains a small amount of content, the footer can sometimes sit halfway up the page leaving a blank space underneath. This can look bad, particularly on a large screen. Web designers are often asked to push footers down to the bottom of the viewport, but it’s not immediately obvious how this can be done.

A diagram describing the footer problem and the ideal solution

When I first ditched tables for pure CSS layouts I tried to make the footer stay at the bottom but I just couldn’t do it. Now, after a few years of practice I have finally figured out a neat way to do it. My method uses 100% valid CSS and it works in all standards compliant browsers. It also fails gracefully with older browsers so it’s safe to use on any website.


See it in action: View my demo with the footer at the bottom

The main features

  • Works in all modern, standards compliant browsers

    Compatible browsers: Firefox (Mac & PC), Safari (Mac & PC), Internet Explorer 7, 6 & 5.5, Opera and Netscape 8

  • Fails gracefully on older browsers

    Older non-standards compliant browsers position the footer under the content as per normal. We can’t help it if people are using an out of date browser, all we can do is reward people who have upgraded by giving them a better browsing experience through progressive enhancement.

  • Longer content pushes the footer off the page

    On long pages with lots of content the footer is pushed off the visible page to the very bottom. Just like a normal website, it will come into view when you scroll all the way down. This means that the footer isn’t always taking up precious reading space.

  • 100% valid CSS with no hacks

    The CSS used in this demo is 100% valid and contains no hacks.

  • No JavaScript

    JavaScript is not necessary because it works with pure CSS.

  • iPhone compatible

    This method also works on the iPhone and iPod Touch in the mobile Safari browser.

  • Free to download

    Simply save the source code of my demo page and use it however you like.

There is only one limitation

You must set the height of the footer div to something other than auto. Choose any height you like, but make sure the value is specified in pixels or ems within your CSS. This is not a big limitation, but it is essential for this method to work correctly.

If you have a lot of text in your footer then it’s also a good idea to give the text a bit more room at the bottom by making your footer a bit deeper. This is to cater for people who have their browser set to a larger text size by default. Another way to solve the same problem is to set the height of the footer in em units; this will ensure that the footer grows in size along with the text. If you only have images in your footer than there’s nothing to worry about – just set your footer height to a pixel value and away you go.

So how does it work?

It’s actually not that complicated. There are two parts to it – the HTML and the CSS.

The HTML div structure

<div id="container">
   <div id="header"></div>
   <div id="body"></div>
   <div id="footer"></div>

There are only four divs required for this to work. The first is a container div that surrounds everything. Inside that are three more divs; a header, a body and a footer. That’s it, all the magic happens in the CSS.


body {
#container {
#header {
#body {
   padding-bottom:60px;   /* Height of the footer */
#footer {
   height:60px;   /* Height of the footer */

And one simple CSS rule for IE 6 and IE 5.5:

#container {

The html and body tags

The html and body tags must be set to height:100%; this allows us to set a percentage height on our container div later. I have also removed the margins and padding on the body tag so there are no spaces around the parameter of the page.

The container div

The container div has a min-height:100%; this will ensure it stays the full height of the screen even if there is hardly any content. Many older browsers don’t support the min-height property, there are ways around it with JavaScript and other methods but that is out of scope for this article. The container div is also set to position:relative; this allows us to absolutely position elements inside it later.

The header div

There is nothing unusual with the header. Make it whatever colour and size you like.

The body div

The body is quite normal too. The only important thing is it must have a bottom padding that is equal to (or slightly larger than) the height of the footer. You can also use a bottom border if you prefer but a margin won’t work.

The footer div

The footer has a set height in pixels (or ems). The div is absolutely positioned bottom:0; this moves it to the bottom of the container div. When there is little content on the page the container div is exactly the height of the browser viewport (because of the min-height:100%;) and the footer sits neatly at the bottom of the screen. When there is more than a page of content the container div becomes larger and extends down below the bottom of the viewport – the footer is still positioned at the bottom of the container div but this time you need to scroll down to the end of the page to see it. The footer is also set to width:100%; so it stretches across the whole page.

The IE 6 & IE 5.5 CSS

Older versions of Internet Explorer don’t understand the min-height property but lucky for us the normal height property behaves exactly the same way in these old Microsoft browsers, that is, it will stretch to 100% height of the viewport but if the content is longer it will stretch even further. We simply expose this 100% height rule to Internet Explorer only by using IE conditional comments. View the source on the demo to see how this is done.


So there you have it… A simple, valid way to make the footer get down! I hope you find it useful.

Copy from:

Leave a comment

Posted by on August 2, 2013 in CS Style