Hortonworks Brings Indexing and Search to HDP 2.2

Bet you’ve heard about the new Hortonworks Data Platform 2.2, right? If you haven’t, I have some good news for you: it comes preloaded with Apache Solr and Banana, the Kibana port, which is great for all of us.

For years now, companies like Splunk have made big bucks selling logging systems on a pay-per-gigabyte basis. Don’t get me wrong, Splunk is a great tool. The thing is that a custom, open source data platform has the potential to become so much more. There is also Elasticsearch, the project Kibana originally comes from.

Edit, Oct 18th 2014: You might also want to have a look at how Solr measures up to Elasticsearch. They are competitors, and HDP 2.2 makes that comparison very real.

Previously you could get much the same experience as is now being introduced in HDP through Lucidworks SiLK.

Below I’ll show a couple of screenshots from the sandboxed version, following the Hortonworks Apache Solr tutorial. You can, by the way, find a thorough walk-through of most of the technologies on the Hortonworks tutorial page.

Get the sandbox and open it in VMware, VirtualBox or Parallels. In Parallels I had to do some conversion first (which will take you 10-15 minutes):

# Convert the OVA to a VMX and import it into Parallels
ovftool --lax HDP_2.2_Preview_VMware.ova HDP_2.2_Preview_VMware.vmx
prl_convert HDP_2.2_Preview_VMware.vmx --allow-no-os
Log on with SSH as advised in the VM. Start Apache Ambari and Apache Solr from the user’s home directory:

    ./start_ambari.sh
    ./start_solr.sh
    

Additionally, I modified start_solr.sh and added export SOLR_HOME=/opt/solr/solr/hdp, which the Banana setup later on relies on.
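For reference, the top of my modified start_solr.sh ends up looking roughly like this (a sketch only; the actual start commands are left exactly as shipped in the sandbox):

#!/bin/bash
# Let Banana and the deployment steps below find the Solr home directory
export SOLR_HOME=/opt/solr/solr/hdp
# ... the original start commands from the sandbox script follow unchanged ...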

The Solr home directory on HDP is /opt/solr/solr/hdp. If you now browse to e.g. http://<ip>:8983/solr/ you’ll find the Solr administrative GUI, which should look like the screenshot below.

Solr administrative GUI on HDP 2.2
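If you’d rather verify from the shell, Solr’s CoreAdmin API gives a quick sanity check that the server is up (replace <ip> with your sandbox address; an empty status list simply means no cores have been added yet):

# List the cores Solr currently knows about
curl "http://<ip>:8983/solr/admin/cores?action=STATUS"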

There are some example cores in /opt/solr/solr/hdp/solr/ (a core is roughly comparable to a Splunk sourcetype with its fields). The one I chose to use is hdp1. I needed to customize it a bit, resulting in the following fields (see the Hortonworks Solr tutorial for more background on fields):
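To give an idea of what that customization involves: the field definitions live in the core’s conf/schema.xml and take roughly the form below. The names here are placeholders for illustration, not my actual field list, and each type must match a fieldType defined elsewhere in the same schema.

<!-- conf/schema.xml (excerpt) - placeholder field definitions -->
<field name="id"        type="string"       indexed="true" stored="true" required="true"/>
<field name="timestamp" type="date"         indexed="true" stored="true"/>
<field name="message"   type="text_general" indexed="true" stored="true"/>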

When you’ve finished up, add the core through the web GUI:
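If you prefer the command line, the same can be done through Solr’s CoreAdmin API (this assumes the customized instance directory under /opt/solr/solr/hdp/solr/ is still named hdp1):

# Register the hdp1 instance directory as a core named hdp1
curl "http://<ip>:8983/solr/admin/cores?action=CREATE&name=hdp1&instanceDir=hdp1"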

Now you can add a couple of entries/documents:
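Again, the shell works just as well as the GUI here: documents can be posted straight to the core’s update handler. The field names below are placeholders; use whatever you defined in the schema.

# Index two test documents and commit immediately
curl -X POST -H 'Content-Type: application/json' \
  'http://<ip>:8983/solr/hdp1/update?commit=true' \
  --data-binary '[{"id": "1", "message": "first test entry"},
                  {"id": "2", "message": "second test entry"}]'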

Which should leave you with:

Now that you’ve got a core set up and some data in it, we’re ready to deploy Banana for searching and visualizing the indexed data.

I’m going to keep this short. Although the Hortonworks changelog stated that Banana should be included, it wasn’t in my sandbox image. To get it up and running on Solr, run the following:

# Banana is built with Apache Ant; fetch the source and register its Jetty context with Solr
yum -y install ant
cd /usr/local/src
git clone https://github.com/LucidWorks/banana.git
cd banana
cp jetty-contexts/banana-context.xml /opt/solr/solr/hdp/contexts/

The next step is to compile a web application archive (.war) to integrate Banana with Solr. This is pretty straightforward; you basically just need to change the IP given in src/config.js, e.g. like this (from localhost to wherever you run it):

solr: "http://<ip>:8983/solr/",
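If you want to script that edit, something along these lines should do it, assuming the shipped default still points at localhost (double-check src/config.js before relying on it):

# Point Banana's Solr endpoint at the sandbox instead of localhost
sed -i 's|http://localhost:8983/solr/|http://<ip>:8983/solr/|' src/config.js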

You are now ready to build and import the war:

ant
# wait for the build to finish, then copy the resulting .war into Solr's webapps directory
cp build/banana-*.war /opt/solr/solr/hdp/webapps/banana.war

You will now need to stop and restart Solr; if anything goes wrong, the PID can be found in /var/run/solr.pid. That should really be it. You may now go to the Banana interface at http://<host>:8983/banana/
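In case it isn’t obvious, that restart is just the two scripts from the home directory run back to back:

~/stop_solr.sh
~/start_solr.sh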

Summary

Feature-wise, HDP 2.2 still seems a little rough around the edges. I couldn’t find Kafka running by default under Ambari, but there is an RPM in the repository. Kafka is also referenced as implemented in Jira, so an update will probably be out soon.

The integration of Solr is of course more than welcome, and the Banana setup I had to do myself was trivial. Anyway, there you have it.

By the way, below is the new Ambari interface.

Ambari 1.7

Tommy

Tommy is an analyst and incident handler with more than seven years of experience from government and private industry. He holds an M.Sc. in Digital Forensics and a B.Tech. in Information Security.