Hortonworks Brings Indexing and Search to HDP2.2
Bet you’ve heard about the new Hortonworks Data Platform 2.2, right? If you haven’t, I have some good news for you – it comes preloaded with Apache Solr and Banana, the Kibana port, which is great for all of us.
For years now companies like Splunk have made big bucks selling you logging systems on a pay-per-gigabyte basis. And don’t get me wrong, Splunk is a great tool. The thing is that a custom, open source data platform has the potential of becoming so much more. There is also Elasticsearch, which is where Kibana originated.
Edit Oct 18th 2014: You might want to have a look at how Solr measures up to Elasticsearch as well. They are competitors, and HDP 2.2 made it real.
Previously you would have turned to Lucidworks SiLK for the same experience that HDP now introduces.
Below I’ll show a couple of screenshots from the sandboxed version, running on the Apache Solr tutorial of Hortonworks. By the way, you can find a thorough walk-through of most of the technologies on the Hortonworks tutorial page.
Get the sandbox and open it in VMware, VirtualBox or Parallels. In Parallels I had to do some conversion (which will take you 10-15 minutes):
ovftool --lax HDP_2.2_Preview_VMware.ova HDP_2.2_Preview_VMware.vmx
prl_convert HDP_2.2_Preview_VMware.vmx --allow-no-os
Log on with SSH as advised in the VM. Start Apache Ambari and Apache Solr from the home directory of the user:
Additionally I modified start_solr.sh and added export SOLR_HOME=/opt/solr/solr/hdp as a courtesy to Banana.
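A minimal sketch of that modification, for those who would rather script it than edit the file by hand. The helper name ensure_solr_home is made up for illustration, and it assumes that the first line of start_solr.sh is a shebang, so the export is placed on the second line, before anything that launches Solr.

```shell
#!/bin/sh
# Hypothetical helper: add "export SOLR_HOME=..." to a start script
# unless the script already exports SOLR_HOME.
# Assumption: line 1 of the script is a shebang, so the export goes on line 2.
ensure_solr_home() {
    script="$1"                            # e.g. ~/start_solr.sh
    solr_home="${2:-/opt/solr/solr/hdp}"   # default taken from the post

    # Nothing to do if the export is already there.
    if grep -q '^export SOLR_HOME=' "$script"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    {
        head -n 1 "$script"                             # keep the first line
        printf 'export SOLR_HOME=%s\n' "$solr_home"     # new second line
        tail -n +2 "$script"                            # rest of the script
    } > "$tmp" && cat "$tmp" > "$script"   # cat keeps the file permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (not run here):
#   ensure_solr_home ~/start_solr.sh
```

Running it twice leaves the file unchanged the second time, so it is safe to keep in a provisioning script.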
The home directory of Solr on HDP is /opt/solr/solr/hdp. If you now browse to e.g. http://<ip>:8983/solr/ you’ll find the Solr administrative GUI. This should look like the following.
There are some examples of a core in /opt/solr/solr/hdp/solr/ (a core may be compared to a Splunk sourcetype with fields). The one I chose to use is the hdp1 one. I needed to customize it a bit, resulting in the following fields (see the Hortonworks Solr tutorial for more background on fields):
When you’ve finished up, add the core through the web GUI:
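If you prefer the command line over the GUI, Solr's CoreAdmin API offers a CREATE action that should do the same job. The sketch below only builds and calls that URL. The function names are made up, and the instanceDir you pass in is an assumption about where the hdp1 core lives on your sandbox, so adjust it to your layout.

```shell
#!/bin/sh
# Hypothetical helpers around Solr's CoreAdmin CREATE action.
# Assumption: Solr listens on port 8983, as in the URL used in this post.

# Print the CoreAdmin URL for creating a core.
core_create_url() {
    host="$1"           # IP or hostname of the sandbox
    core="$2"           # core name, e.g. hdp1
    instance_dir="$3"   # directory holding the core's conf/ (assumed path)
    printf 'http://%s:8983/solr/admin/cores?action=CREATE&name=%s&instanceDir=%s\n' \
        "$host" "$core" "$instance_dir"
}

# Call the URL with curl; Solr answers with a status document.
create_core() {
    curl -sS "$(core_create_url "$1" "$2" "$3")"
}

# Usage (not run here, needs a running Solr):
#   create_core 192.168.1.10 hdp1 /opt/solr/solr/hdp/solr/hdp1
```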
Now you can add a couple of entries/documents:
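One way to do this from the shell is to post JSON to the core's update handler with curl. This is a sketch under assumptions: the function names are made up, and the example documents carry only an id field, since the fields you actually send depend on how you customized the schema.

```shell
#!/bin/sh
# Hypothetical helpers for posting JSON documents to a Solr core.
# Assumption: Solr listens on port 8983 and the core accepts JSON on /update.

# Print the update URL for a core; commit=true makes the documents
# searchable right away.
update_url() {
    printf 'http://%s:8983/solr/%s/update?commit=true\n' "$1" "$2"
}

# Post a JSON array of documents to the core.
post_docs() {
    host="$1"
    core="$2"
    json="$3"
    curl -sS -H 'Content-Type: application/json' \
        --data-binary "$json" "$(update_url "$host" "$core")"
}

# Usage (not run here, needs a running Solr; the documents are made up):
#   post_docs 192.168.1.10 hdp1 '[{"id":"1"},{"id":"2"}]'
```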
Which should leave you with:
Now that you’ve got a core set up, and some data in it – we are ready to deploy Banana for searching and organizing the indexes.
I’m going to keep this short. Although the changelog of Hortonworks stated that Banana should have been included, it wasn’t in my sandbox image. To get it up and running on Solr, run the following:
yum -y install ant
cd /usr/local/src
git clone https://github.com/LucidWorks/banana.git
cd banana
cp jetty-contexts/banana-context.xml /opt/solr/solr/hdp/contexts/
The next step is to compile a web application archive (.war) for integrating Banana with Solr. This is pretty straightforward, and you will basically need to change the IP given in src/config.js, e.g. like this (from localhost to whatever you run it on):
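A minimal sketch of that edit done with sed instead of a text editor. The helper name set_banana_host is made up, and it assumes the Solr address in src/config.js is written as a URL of the form http://localhost:<port>/..., so check the file before and after running it.

```shell
#!/bin/sh
# Hypothetical helper: point Banana's config at a given host.
# Assumption: the config file holds a URL such as http://localhost:8983/solr/
set_banana_host() {
    config="$1"   # e.g. src/config.js
    host="$2"     # IP or hostname that Solr runs on

    tmp=$(mktemp) || return 1
    # Rewrite "//localhost:" to "//<host>:" and leave the port and path alone.
    sed "s#//localhost:#//${host}:#g" "$config" > "$tmp" && cat "$tmp" > "$config"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (not run here):
#   set_banana_host src/config.js 192.168.1.10
```

Writing to a temporary file first avoids relying on sed -i, which behaves differently between GNU and BSD sed.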
You are now ready to build and import the war:
ant
wait
cp build/banana-*.war /opt/solr/solr/hdp/webapps/banana.war
You will now need to stop Solr with ~/stop_solr.sh and start it again. The pid can be found in /var/run/solr.pid if anything goes wrong. That should really be it. You may now go to the Banana interface at
Feature-wise, HDP 2.2 still seems a little rough around the edges. I couldn’t find Kafka running by default in Ambari, but there is an RPM in the repo. Kafka is also referenced as implemented in Jira, so an update will probably be out soon.
The integration of Solr is of course more than welcome, and the Banana setup that I had to do myself was trivial. Anyways, there you’ve got it.
Below is the new Ambari interface by the way.