Using the Elastic Stack to Aid Your GSA Replacement Planning

Elastic Stack for GSA log analysis

The Elastic Stack provides insights on how your users are searching.

Log analysis is the first step to planning your GSA replacement.

Why use Elastic Stack (ELK) in GSA replacement planning?

We have written previously about uses for the Elastic Stack. Often, the first step in deciding where to go is to look at where you have been. If you have watched any of our webinars on replacement strategies for your Google Search Appliance (GSA), you'll know that we recommend analyzing your current GSA usage to determine how you are actually using it. Knowing this will help you examine your existing use case and understand what your replacement technology needs to do. Looking at the logs can provide an understanding of essential metrics:

  • Your query volume
    • At Peak (Queries Per Second [QPS])
    • Per month (Queries Per Month [QPM])
  • Which queries return zero results
  • Filter usage
  • Where users are performing their searches (potentially)
  • Which results are being clicked (if you are tracking clicks)
  • Query intent
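
Once the logs are indexed in Elasticsearch (as described in the rest of this post), several of the metrics above fall out of simple aggregations. As a minimal sketch, assuming the default logstash-* index name and the @timestamp field produced by the pipeline below, queries per month is a single date_histogram (the option is named interval on the Elasticsearch releases current at the time of writing; newer releases call it calendar_interval):

$ curl -s 'localhost:9200/logstash-*/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "queries_per_month": {
      "date_histogram": { "field": "@timestamp", "interval": "month" }
    }
  }
}'

Peak QPS can be read from the same data by narrowing the time range to a busy window and switching the interval to "second".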

The process is to download the logs from the GSA and then import them into a tool for analysis. While these logs are tab-delimited, Excel is not the best tool because the log contains 'packed' fields that need to be parsed to be useful. Given our experience with the Elastic Stack, our preference is to use ELK (Elasticsearch, Logstash, and Kibana).

ELK is becoming the de facto analytics platform. You can spin up a stack on Google Cloud Platform in minutes, or, if you are a little savvy, run it locally on your PC. MC+A does offer consulting services for ELK and Elastic; contact our Professional Services team if you want help setting up an environment of your own.

Since it's high season for companies replacing their Google Search Appliances before they expire, we thought we would share this guide to loading and analyzing those logs.

Get your logs into ELK

Step 1 – Installing the ELK Stack

First, we need to install the ELK stack. We recommend one of two approaches:

  1. Install ELK locally
  2. Spin up an Elastic cloud instance on Google Cloud Platform

Option 1 – Installing ELK Locally
If you have Homebrew, you can install Elasticsearch, Logstash, and Kibana from the command line:


$ brew install elasticsearch
$ brew install kibana
$ brew install logstash
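
Once installed, you can start the services with Homebrew and confirm that the default ports are answering (Elasticsearch on 9200, Kibana on 5601). A quick sketch, assuming the formulae set up background services in the usual way:

$ brew services start elasticsearch
$ brew services start kibana
$ curl -s 'http://localhost:9200/'   # should return cluster name and version info
$ open 'http://localhost:5601/'      # opens the Kibana UI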

Option 2 – Spin up an Elastic Cloud Instance

If you go to www.elastic.co and sign up, you can get a 14-day trial instance.

Step 2 – Input Logs into Elastic

With an instance or cluster of ELK available, we can start to push log data into it. These logs come from the GSA (note: the GSA only stores the last 90 days of logs on the appliance). You can export logs for an individual collection or export all of them. We recommend exporting all logs, since we can filter on the site parameter later.

Place all of the logs into a directory and note the location. Logstash, which will feed the logs into Elasticsearch, only works with a fully qualified path.
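
For example (the directory and file names below are placeholders; use whatever names your GSA export produced), the only requirement is that the absolute path matches the path setting in the Logstash configuration that follows:

$ mkdir -p /Users/user/gsa/logs
$ cp ~/Downloads/*.log /Users/user/gsa/logs/
$ ls /Users/user/gsa/logs/*.log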

Configuring Logstash

Logstash has a set of plugins for different types of inputs, outputs, and filters, which can read, parse, and filter data according to your needs. These plugins are configured in a configuration file (.conf extension).

Logstash Configuration file

A Logstash config file has a separate section for each type of plugin you want to add to the event processing pipeline. Each section contains the configuration options for one or more plugins. If you specify multiple filters, they are applied in the order in which they appear in the configuration file. For more information about the configuration file structure, see the Logstash documentation.
For our case, the configuration file we use to parse the GSA logs looks like this:


input {
  file {
    # Point this at the exported GSA logs, with an absolute path
    path => "/Users/user/gsa/logs/*.log"
    # Read the files from the beginning ("end" to take live data)
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}

filter {
  grok {
    # Split the GSA log line format into fields
    match => { "message" => "%{URIHOST}!%{COMMONAPACHELOG} %{NUMBER:count} %{NUMBER:servingtime}" }
    # match => { "message" => "%{COMMONAPACHELOG}" }
  }
  kv {
    # Parse the query string of the request into individual parameters
    source => "request"
    target => "params"
    field_split => "? &"
  }
  date {
    # Match the log's date to @timestamp in Kibana
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    # Locale used to parse the month names (English here)
    locale => "en"
  }

  ruby {
    code => "
      event.set('servingtime', event.get('servingtime').to_f)
      event.set('response', event.get('response').to_i)
      event.set('bytes', event.get('bytes').to_i)
      event.set('count', event.get('count').to_i)

      # Only rewrite the query if the q parameter is present
      if event.get('[params][q]')
        event.set('[params][q]', event.get('[params][q]').gsub('+', ' '))

        # Split inmeta: clauses out of the query
        if event.get('[params][q]').to_s.include? 'inmeta:'
          str = event.get('[params][q]').to_s
          str = str.gsub('%2520', ' ').gsub('%3D', '=').gsub('%2528', '(').gsub('%252D', '-').gsub('%2529', ')').gsub('%252C', ',')
          str = str[7 + str.index('inmeta:')..-1].split('inmeta:')
          # Store the inmeta values in a new parameter
          event.set('[params][q_inmeta]', str)
        end
      end

      # Keyword split
      # Only if 'dnavs' exists
      if event.get('[params][dnavs]')
        # If 'dnavs' includes 'inmeta:' and it is not the first token
        if event.get('[params][dnavs]').to_s.include? 'inmeta:' and event.get('[params][dnavs]').to_s.index('inmeta:') != 0
          event.set('[params][d_keyword]', event.get('[params][dnavs]').to_s[0..event.get('[params][dnavs]').to_s.index('inmeta:') - 1])
        elsif !(event.get('[params][dnavs]').to_s.include? 'inmeta:')
          event.set('[params][d_keyword]', event.get('[params][dnavs]'))
        end
      end
    "
  }

  # Update the GeoIP database path to where you have installed it, or remove this block
  geoip {
    source => "clientip"
    target => "geoip"
    database => "/Users/user/geo/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}

output {
  # Update this to point at your Elasticsearch cluster
  elasticsearch { hosts => "localhost" }
  # Uncomment to also print events to stdout, useful for debugging
  # stdout { codec => "json" }
}

We can see that the file contains three sections: input, filter, and output.

Input section: Using the file plugin, we tell Logstash where the files we want to parse are located (the path setting: path => "/Users/user/gsa/logs/*.log").

For more information about the other configuration options, see the Logstash file input plugin documentation.

Filter section: In this section, we parse the logs into fields that will be useful for future dashboards in Kibana. The plugins used here are (a quick spot-check example follows this list):

  • grok, used to parse an entire log entry, splitting it into fields such as the client IP, serving time, bytes, etc., with the search query kept as a single string.
  • kv, used to parse the search query string into separate parameters.
  • date, used to match the log's date with the @timestamp field in Kibana.
  • ruby, used to convert some fields into more useful types for later visualizations in Kibana, and to split some of the parameters produced by the kv plugin (keyword, searches, inmeta:, etc.).
  • geoip, used to derive locations from the client IPs. Using the geoip plugin requires some additional steps, which we will detail in a separate blog post; the GeoLite2 database it relies on is available from MaxMind.
  • mutate, in this case used to complement the geoip plugin.
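
To spot-check what these filters produce, you can pull back a single parsed document once data is flowing. A minimal sketch, assuming the default logstash-* index and that the kv plugin has populated params.q:

$ curl -s 'localhost:9200/logstash-*/_search?pretty&size=1&q=_exists_:params.q'

The returned document should show the grok fields (count, servingtime, response, etc.) alongside the params object containing the parsed query parameters.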

Output section: We use the elasticsearch plugin to store the parsed logs in Elasticsearch, so that we can explore them through the Kibana web interface.
For more information about this plugin and other output plugins, see the Logstash documentation.
Initializing the Pipeline
Now that we have all the fundamentals in place, we can start Logstash and send the data:

$ /opt/logstash/bin/logstash -f /home/me/gsa.conf

After that, the shell will print output showing that the logs are being parsed and sent to Elasticsearch. We can see the parsed logs in Kibana, at http://localhost:5601/ by default.
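
If Logstash refuses to start, or nothing shows up in Kibana, it can help to validate the configuration file and confirm that documents actually reached Elasticsearch. A sketch, assuming a recent Logstash release (older versions use -t / --configtest instead of the flag shown here):

$ /opt/logstash/bin/logstash -f /home/me/gsa.conf --config.test_and_exit
$ curl -s 'localhost:9200/logstash-*/_count?pretty'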

Visualization of GSA logs in Kibana

Now that our data is indexed in Elasticsearch, we can look at the Kibana interface to get some useful analytics.

If we follow all the steps above, Kibana will be up and running at http://localhost:5601/. When we first open that link, we will probably see no results.

This is because the logs we are working with run from June to September, while by default Kibana shows data with an @timestamp range of the last 15 minutes.

So, to see our logs, we need to expand the time range to June through September by clicking the time picker (Last 15 minutes) and then the Absolute section. In this case, our time range is from June 9th to September 10th.
If we click "Go", we will now be able to see our data as a histogram.

You can explore the loaded data, make sure that every field you need was parsed correctly, and then start creating visualizations and drawing useful conclusions.
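
For example, zero-result searches are simply documents where the count field parsed by grok is 0, and the most frequent offending queries can be pulled with a terms aggregation. A minimal sketch, assuming the field names from the configuration above; depending on your Elasticsearch version and mapping, the aggregation field may need to be params.q.keyword, params.q.raw, or simply params.q:

$ curl -s 'localhost:9200/logstash-*/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "term": { "count": 0 } },
  "aggs": {
    "top_zero_result_queries": {
      "terms": { "field": "params.q.keyword", "size": 20 }
    }
  }
}'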

Visualizations

The Visualize page is used to create new visualizations based on a new interactive search, a saved search, or an existing saved visualization. Kibana allows you to create the following visualization types:

  • Area chart
  • Data table
  • Line chart
  • Markdown widget
  • Metric
  • Pie chart
  • Tile map
  • Vertical bar chart

These visualizations can be saved, used individually, or combined into dashboards. We've made this easier for you: request our starter kit, which includes dozens of foundational visualizations.

Dashboards

The Dashboard page is used to display a collection of saved visualizations, organized into groups under a dashboard name.
We’ve created three dashboards that are included in our starter kit.
