Product Catalog Search with a Google Search Appliance
December 26th, 2008
Two weeks ago MC+A did a webinar on using a Google Search Appliance (GSA) to index product catalog data. The demonstration was developed to showcase how effectively Google search technology can be added to a public-facing website search. Jason Wasserman from boats.net joined us on the call. I recommend visiting their site to see a great example of the technology in production use (and for all of your OEM Yamaha and Honda marine parts). This post details how we built the example site for the demonstration.
Content Acquisition
Product catalog information poses some challenges because skus vary greatly from product line to product line. Many times there are separator characters such as ‘-’ or ‘_’ in the sku name. By default, when the search engine indexes these documents, the skus are tokenized, so a single sku becomes multiple keywords. Additionally, meta data values become significant when a user would like to filter based on structured information.
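To make the tokenization problem concrete, here is a minimal sketch in Python. It is not the GSA's actual tokenizer; it only assumes a generic tokenizer that splits on anything other than letters and digits. The sku value and the function names are hypothetical.

```python
import re

def naive_tokenize(text):
    """Split on runs of non-alphanumeric characters, as a generic tokenizer might."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]

def sku_variants(sku):
    """Return the raw sku plus a separator-free form that could be indexed as extra keywords."""
    collapsed = re.sub(r"[^A-Za-z0-9]+", "", sku).lower()
    return sorted({sku.lower(), collapsed})

sku = "6E5-45371-01"  # hypothetical sku, for illustration only
print(naive_tokenize(sku))  # ['6e5', '45371', '01']
print(sku_variants(sku))    # ['6e5-45371-01', '6e54537101']
```

A single sku turns into three unrelated keywords here, which is why exposing the whole sku as its own meta data value (or as an extra keyword) tends to help exact-match searches.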
The GSA has two effective means of acquiring this content. First, the Google web crawler navigates your site hierarchy and finds all of the documents, much like the Google.com crawlers do. Second, the GSA comes with a database crawler that queries the database directly, indexes the result set, and provides a direct link to each document. Each approach has its benefits:
Web Crawler
- The Web Crawler acts similarly to the public crawlers. Improvements made for it will positively affect your public search ranking.
- Utilizing the Web Crawler requires no direct access to the database by the GSA.
- Search engine optimizations performed to improve the GSA search quality will also positively affect your public search results.
- Meta data can be added via meta tags in the content (e.g. <meta name="price" content="12.95">)
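As an illustration of the last point, the following Python sketch renders product attributes as meta tags that a page template could emit. The function name and the sample fields are assumptions made for this example, not part of OS Commerce or the GSA.

```python
from html import escape

def meta_tags(fields):
    """Render one <meta> tag per field, escaping names and values for HTML."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(
            escape(str(name), quote=True), escape(str(value), quote=True)
        )
        for name, value in fields.items()
    )

# Hypothetical product attributes, for illustration only.
product = {"price": "12.95", "manufacturer": "Matrox"}
print(meta_tags(product))
```

Escaping matters because product names and descriptions often contain quotes or ampersands that would otherwise break the tag.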
Database Crawler
- Very fast reindexing. (Jason mentioned his 1.5 million documents get indexed in 20 minutes.)
- The content that is indexed is limited to the query result and very specific to the product content. We say that it has limited ‘noise’ or interference.
- You can add meta data per column of the result set. This does not have to be included in your public content.
- Customization of how the content is indexed via the dbstylesheet.
For the purpose of the demonstration we chose the db crawler. We set up a commerce website at http://catalog.mcplusa-dev.com. It is a fresh install of OS Commerce. By default, OS Commerce uses product keys in the url, which is a less effective way to reference a document because the url says nothing about the product. For example, http://catalog.mcplusa-dev.com/product_info.php?products_id=1 would be better if it were http://catalog.mcplusa-dev.com/graphics/Matrox_G200_MMS. There are url rewriters for most standard commerce products that help you achieve this.
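The descriptive url form can be sketched in a few lines of Python. This is only an illustration of how such a url might be derived from a category and a product name; a real url rewriter for OS Commerce also has to map the friendly url back to the product id on the server side, which is not shown. The function names are hypothetical.

```python
import re

BASE_URL = "http://catalog.mcplusa-dev.com"

def slugify(text):
    """Replace runs of non-alphanumeric characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

def friendly_url(category, product_name):
    """Build a descriptive url from a category and a product name."""
    return "{}/{}/{}".format(BASE_URL, slugify(category).lower(), slugify(product_name))

print(friendly_url("Graphics", "Matrox G200 MMS"))
# http://catalog.mcplusa-dev.com/graphics/Matrox_G200_MMS
```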
Step 1: Setting up the Database Crawler
For the purposes of the webinar we set up a simple query based on the schema of OS Commerce. We created a Google Search Appliance data source with the following settings:
- Database Type: MySQL
- Host name: the IP address of the server
- Port: 3306
- DatabaseName/UserName/Password: Specific to the db
- Crawl Query:
select products_description.products_name, products.*, manufacturers.manufacturers_name, products_description.products_description, concat('http://catalog.mcplusa-dev.com/product_info.php?products_id=', cast(products.products_id as char)) as url
from products
inner join products_description on products_description.products_id = products.products_id
inner join manufacturers on manufacturers.manufacturers_id = products.manufacturers_id
where products_description.language_id = 1
- DB StyleSheet: dbstylesheet_catalog_testxsl. This XSL was written by John Gregory a couple of years ago. It's better than the built-in stylesheet because it changes the title to the first column returned from your query and it makes each column a meta data field.
- Serve URL Field (Radio Checked): “Url”. Url is a computed column in the result set and links directly to the product page.
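To see what the crawl query returns before pointing the GSA at it, the join and the computed url column can be tried against a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than MySQL, so it swaps MySQL's concat() for SQLite's || operator. The tables are cut down to the few columns the query needs and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for the OS Commerce tables; only the columns used below.
conn.executescript("""
CREATE TABLE products (products_id INTEGER PRIMARY KEY, manufacturers_id INTEGER, products_price REAL);
CREATE TABLE products_description (products_id INTEGER, language_id INTEGER,
                                   products_name TEXT, products_description TEXT);
CREATE TABLE manufacturers (manufacturers_id INTEGER PRIMARY KEY, manufacturers_name TEXT);
INSERT INTO manufacturers VALUES (1, 'Matrox');
INSERT INTO products VALUES (1, 1, 299.99);
INSERT INTO products_description VALUES (1, 1, 'Matrox G200 MMS', 'Sample description');
INSERT INTO products_description VALUES (1, 2, 'Matrox G200 MMS (de)', 'Beispiel');
""")

# SQLite version of the crawl query: '||' replaces MySQL's concat().
CRAWL_QUERY = """
SELECT pd.products_name, p.products_id, m.manufacturers_name,
       'http://catalog.mcplusa-dev.com/product_info.php?products_id='
           || CAST(p.products_id AS TEXT) AS url
FROM products p
INNER JOIN products_description pd ON pd.products_id = p.products_id
INNER JOIN manufacturers m ON m.manufacturers_id = p.manufacturers_id
WHERE pd.language_id = 1
"""

def crawl_rows(connection):
    """Run the crawl query and return each row as a column-name -> value dict."""
    cursor = connection.execute(CRAWL_QUERY)
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]

rows = crawl_rows(conn)
for r in rows:
    print(r["url"], r["products_name"])
```

Only the language_id = 1 description survives the filter, so each product yields exactly one row and one serve url.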
Step 2: Setting Up the Collection
Setting up the collection was pretty straightforward. We set the url pattern to be: http://catalog.mcplusa-dev.com
Content Serving
We created a separate GSA frontend for this demonstration and added several toolkits to it. The first one we added was parametric navigation. This toolkit allows you to filter your result set based on meta data. Second, we added the open source Search As You Type toolkit. Following the instructions in each toolkit, we were able to build the front end in about 20 minutes (and it probably shows).
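The idea behind parametric navigation can be sketched independently of the toolkit: count the values of a meta data field across the results, then narrow the results to the value the user picks. The Python below is not the toolkit's code; the result records and function names are hypothetical.

```python
from collections import Counter

def facet_counts(results, field):
    """Count how many results carry each value of a meta data field."""
    return Counter(r[field] for r in results if field in r)

def filter_results(results, **required):
    """Keep only results whose meta data matches every required field value."""
    return [r for r in results if all(r.get(k) == v for k, v in required.items())]

# Hypothetical search results with one meta data field each.
results = [
    {"title": "Matrox G200 MMS", "manufacturers_name": "Matrox"},
    {"title": "Matrox G400 32MB", "manufacturers_name": "Matrox"},
    {"title": "Microsoft IntelliMouse Pro", "manufacturers_name": "Microsoft"},
]

print(facet_counts(results, "manufacturers_name"))
print([r["title"] for r in filter_results(results, manufacturers_name="Matrox")])
```

Because the db stylesheet turns every column into a meta data field, any column in the crawl query can serve as a filter of this kind.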
Conclusion
We’ve demonstrated the steps necessary to effectively implement the Google Search Appliance for a product catalog. Customers of ours who have used these steps have seen conversion rates increase threefold using the search compared to those who have not. Feel free to contact us should you have any additional questions.






