MC+A Stream

Our Blog and News Stream

Product Catalog Search with a Google Search Appliance

December 26th, 2008

Two weeks ago MC+A did a webinar on using a Google Search Appliance (GSA) to index product catalog data.  The demonstration was developed to showcase how effectively Google search technology can be added to a public-facing website search.  Jason Wasserman from boats.net joined us on the call.  I recommend visiting their site to see a great example of the technology in production use (and for all of your OEM Yamaha and Honda marine parts).  This post details how we built the example site for the demonstration.

Content Acquisition

Product catalog information poses some challenges because SKUs vary greatly from product line to product line.  Many SKUs contain separator characters such as ‘-’ or ‘_’.  By default, when the search engine indexes these documents, each SKU is tokenized on those separators, so a single SKU becomes multiple keywords.  Additionally, metadata values become significant when a user would like to filter based on structured information.
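To make the tokenization problem concrete, here is a minimal sketch in Python.  It is not the GSA's actual tokenizer; it only assumes a simple tokenizer that splits on anything that is not a letter or digit.  The SKU value is a made-up example.

```python
import re


def naive_tokens(text):
    """Split text on any run of characters that are not letters or digits.

    This is an assumed, simplified tokenizer used only for illustration.
    """
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]


# Hypothetical SKU containing '-' separators.
sku = "6E5-45501-00"
tokens = naive_tokens(sku)
print(tokens)  # ['6E5', '45501', '00']
```

A search for the full SKU then has to match three separate keywords rather than one exact value, which is why SKU handling deserves attention up front.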

The GSA has two effective means of acquiring this content.  First, there is the Google web crawler, which navigates your site hierarchy and finds all of the documents, much as the Google.com crawlers do.  Second, the GSA comes with a database crawler that queries the database directly, indexes the result set, and provides a direct link to each document.  Each approach has its benefits.  Specifically:

Web Crawler

  • The Web Crawler acts similarly to the public crawlers.  Improvements made for it will positively affect your public search ranking.
  • Utilizing the Web Crawler requires no direct access to the database by the GSA.
  • Search engine optimizations performed to improve the GSA search quality will also positively affect your public search results.
  • Metadata can be added via a meta tag anywhere in the content (e.g. <meta name="price" content="12.95">).
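As a rough illustration of how a crawler can pick up such tags, the sketch below uses Python's standard html.parser module to collect name/content pairs from a page.  The sample page and the "manufacturer" tag are assumptions made for the example; this is not how the GSA itself is implemented.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        # Ignore meta tags without both attributes (e.g. charset declarations).
        if name is not None and content is not None:
            self.meta[name] = content


def extract_meta(html):
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


# Hypothetical product page used only for this example.
PAGE = """<html><head>
<title>Matrox G200 MMS</title>
<meta name="price" content="12.95">
<meta name="manufacturer" content="Matrox">
</head><body>Product page</body></html>"""

page_meta = extract_meta(PAGE)
print(page_meta)
```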

Database Crawler

  • Very fast reindexing.  (Jason mentioned his 1.5 million documents get indexed in 20 minutes.)
  • The content that is indexed is limited to the query result and very specific to the product content.   We say that it has limited ‘noise’ or interference.
  • You can add meta data per column of the result set.  This does not have to be included in your public content.
  • You can customize how the content is indexed via the database stylesheet.

For the purpose of the demonstration we chose the database crawler.  We set up a commerce website at http://catalog.mcplusa-dev.com.  It is a fresh install of OS Commerce.  By default, OS Commerce uses product keys in the URL.  This is a less effective way to reference a document because the URL says nothing about the product.  For example, http://catalog.mcplusa-dev.com/product_info.php?products_id=1 would be better if it were http://catalog.mcplusa-dev.com/graphics/Matrox_G200_MMS.  There are URL rewriters for most standard commerce products that help you achieve this.

Here's the out of the box installation of OS Commerce


Step 1: Setting up the Database Crawler

For the purposes of the webinar we set up a simple query based on the schema of OS Commerce.  We created a Google Search Appliance data source with the following settings:

  1. Database Type: MySQL
  2. Host name: the IP address of the server
  3. Port: 3306
  4. DatabaseName/UserName/Password: Specific to the db
  5. Crawl Query:

    select products_description.products_name, products.*, manufacturers.manufacturers_name, products_description.products_description, concat('http://catalog.mcplusa-dev.com/product_info.php?products_id=', cast(products.products_id as char)) as url
    from products
    inner join products_description on products_description.products_id = products.products_id
    inner join manufacturers on manufacturers.manufacturers_id = products.manufacturers_id
    where products_description.language_id = 1

  6. DB StyleSheet: dbstylesheet_catalog_testxsl  This XSL was written by John Gregory a couple of years ago.  It's better than the onboard stylesheet because it changes the title to the first column returned from your query and it makes each column a metadata field.
  7. Serve URL Field (Radio Checked): “Url”  Url is a computed column in the result set and links directly to the product page.

Basic configuration of the Database Crawler
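If you want to check the shape of the crawl query's result set before pointing the appliance at your database, a small stand-in can help.  The sketch below uses Python's built-in sqlite3 module instead of MySQL, so SQLite's || operator replaces concat(), and the tables are trimmed to a few columns.  The table contents and the price are made-up sample data, not the real OS Commerce catalog.

```python
import sqlite3

BASE_URL = "http://catalog.mcplusa-dev.com/product_info.php?products_id="

# In-memory stand-in for the OS Commerce schema (only a few columns, sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table products (
    products_id integer primary key,
    manufacturers_id integer,
    products_price real
);
create table products_description (
    products_id integer,
    language_id integer,
    products_name text,
    products_description text
);
create table manufacturers (
    manufacturers_id integer primary key,
    manufacturers_name text
);
insert into manufacturers values (1, 'Matrox');
insert into products values (1, 1, 299.99);
insert into products_description values (1, 1, 'Matrox G200 MMS', 'Sample description');
insert into products_description values (1, 2, 'Matrox G200 MMS', 'Beispielbeschreibung');
""")

# Same joins and language filter as the crawl query; the url column is computed.
CRAWL_QUERY = """
select pd.products_name, m.manufacturers_name, p.products_price,
       ? || cast(p.products_id as text) as url
from products p
inner join products_description pd on pd.products_id = p.products_id
inner join manufacturers m on m.manufacturers_id = p.manufacturers_id
where pd.language_id = 1
"""

rows = conn.execute(CRAWL_QUERY, (BASE_URL,)).fetchall()
for row in rows:
    print(row)
```

Each row is one document to index: the first column becomes the title, and the computed url column is what the Serve URL Field points to.  The language filter keeps the second-language description from producing a duplicate document.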

Step 2: Setting Up the Collection

The setup of the collection was pretty straightforward.  We set the URL pattern to be: http://catalog.mcplusa-dev.com

Content Serving

We created a separate GSA front end for this demonstration and added several toolkits to it.  The first one we added was parametric navigation.  This toolkit allows you to filter your result set based on metadata.  Second, we added the open source Search As You Type toolkit.  Following the instructions in each toolkit, we were able to build the front end in about 20 minutes (and it probably shows :) ).
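Filtering on metadata ultimately comes down to the search request sent to the appliance.  The sketch below is not the parametric navigation toolkit's code; it only illustrates, under our own assumptions, how a request restricted to one metadata value could be built with the GSA's requiredfields parameter.  The host name, collection name, and front end name are hypothetical, and values containing special characters may need extra encoding that this sketch does not handle.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(host, query, collection, frontend, filters=None):
    """Build a GSA search URL, optionally restricted by metadata filters.

    filters maps a metadata name to the value a result must have.
    Multiple filters are joined with '.', which the appliance treats as AND.
    """
    params = {
        "q": query,
        "site": collection,          # collection to search
        "client": frontend,          # front end to use
        "proxystylesheet": frontend,  # render results with the front end's XSLT
        "output": "xml_no_dtd",
    }
    if filters:
        params["requiredfields"] = ".".join(
            f"{name}:{value}" for name, value in filters.items()
        )
    return f"http://{host}/search?{urlencode(params)}"


# Hypothetical appliance host, collection, and front end names.
url = build_search_url(
    "gsa.example.com",
    "graphics card",
    "catalog",
    "catalog_frontend",
    filters={"manufacturers_name": "Matrox"},
)
params = parse_qs(urlsplit(url).query)
print(url)
print(params["requiredfields"])
```

Because the database stylesheet turns every column into a metadata field, column names such as manufacturers_name become the natural filter names.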

Search Results with Parametric and Search As You Type


Conclusion

We’ve demonstrated the steps necessary to effectively implement the Google Search Appliance for a product catalog.  Customers of ours who have used these steps have seen conversion rates increase threefold among visitors who use the search compared to those who do not.  Feel free to contact us should you have any additional questions.

MC+A Launches Online Store

December 20th, 2008

MC+A is proud to announce the launch of its online store. The store will allow for streamlined purchasing of our hardware and software solutions.

MC+A President Michael Cizmar welcomed the store’s launch. “We are extremely excited about the store coming online. It should allow for a streamlined purchasing experience for our clients and expand our reach in the marketplace.”

You can visit our store at: http://stop.mcplusa.com

Combating SPAM – Real Life Example

October 31st, 2008

Earlier this week, a client asked me to investigate a spam issue; they were concerned about where the emails were originating.  At first glance it appeared that the emails were coming from their own email address.  After examining the message header, it was clear that they were not.  I foolishly sent an email out to demonstrate what was happening.  Unknowingly, I added myself to the list and inadvertently sent a message to thousands of people.

This post is meant to help spread information about the cause to the people affected by it.  People facing similar issues can use the same techniques.

The Problem

  • Spam emails were being generated from and to info@worldswidedomains.com
  • Replying to this address caused your name to be added to the list server, and an email went out to everyone who had previously been added to the list.
  • Most people on the list were added manually without their knowledge.

Resolution

FIRST
Find out who the domain is registered to by going to: http://whois.domaintools.com/worldswidedomainname.com (you can substitute other domains for worldswidedomainname.com).  This produced the following information.

Registrant:
Alex Shafts
504 LEONARD AV
Las Vegas, NV 89106
US
Domain name: WORLDSWIDEDOMAINNAME.COM

Administrative Contact:
Shafts, Alex  
504 LEONARD AV
Las Vegas, NV 89106
US
702.5431469
Technical Contact:
Shafts, Alex  
504 LEONARD AV
Las Vegas, NV 89106
US
702.5431469
Registration Service Provider:
Ecommerce, Inc., 
800-861-9394

http://ecommerce.com

UNLIMITED Storage Space, 3 TERRABYTES of Monthly Transfer & up-to 16
domains, starting at $3.95!

LIFETIME FREE DOMAIN REGISTRATION + FREE FEATURES INCLUDED. ONLY AT
ECOMMERCE.COM

Registrar of Record: TUCOWS, INC.
Record last updated on 24-Oct-2008.
Record expires on 25-Oct-2009.
Record created on 25-Oct-2008.

Registrar Domain Name Help Center:

http://domainhelp.tucows.com

Domain servers in listed order:
NS16.IXWEBHOSTING.COM
NS15.IXWEBHOSTING.COM

Domain status: clientHold
clientTransferProhibited
clientUpdateProhibited
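When you have several of these records to compare, it helps to pull out just the fields that point to who hosts the domain.  The following is a minimal sketch that assumes the record layout shown above (other registrars format WHOIS output differently), run against an excerpt of that record.

```python
import re


def summarize_whois(record):
    """Extract domain, registrar, name servers, and dates from a WHOIS record.

    Assumes the layout of the record quoted in this post; WHOIS output is not
    standardized, so other registrars may need different rules.
    """
    summary = {"domain": None, "registrar": None, "name_servers": [], "dates": {}}
    in_servers = False
    for raw in record.splitlines():
        line = raw.strip()
        if in_servers:
            if line:
                summary["name_servers"].append(line.lower())
            else:
                in_servers = False  # a blank line ends the name server list
            continue
        lowered = line.lower()
        if lowered.startswith("domain name:"):
            summary["domain"] = line.split(":", 1)[1].strip().lower()
        elif lowered.startswith("registrar of record:"):
            summary["registrar"] = line.split(":", 1)[1].strip()
        elif lowered == "domain servers in listed order:":
            in_servers = True
        else:
            match = re.match(r"Record (last updated|expires|created) on (\S+?)\.?$", line)
            if match:
                summary["dates"][match.group(1)] = match.group(2)
    return summary


# Excerpt of the record shown above.
RECORD = """Domain name: WORLDSWIDEDOMAINNAME.COM

Registrar of Record: TUCOWS, INC.
Record last updated on 24-Oct-2008.
Record expires on 25-Oct-2009.
Record created on 25-Oct-2008.

Domain servers in listed order:
NS16.IXWEBHOSTING.COM
NS15.IXWEBHOSTING.COM

Domain status: clientHold
"""

whois_summary = summarize_whois(RECORD)
print(whois_summary)
```

The name servers are the useful lead here: they identify the hosting company to contact.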

SECOND
Next I looked up the MX record.  The MX record is a type of domain record that tells anyone sending you an email which mail server to deliver it to.  On most computers there is a command called nslookup.  Open a command prompt and type nslookup.  Next, type ‘set type=MX’ so that you’ll look up the MX record.  Then type in the domain you are looking for.

Using the Set Command with an MX record.
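If you need to repeat this lookup for several domains, the same nslookup query can be scripted.  The sketch below runs nslookup non-interactively with -type=MX and parses the "mail exchanger" lines.  The two sample outputs use example.com and are illustrative only; they mimic the line formats we have seen on Windows and on Unix-like systems, and your resolver's output may differ.

```python
import re
import subprocess

MX_LINE = re.compile(r"mail exchanger = (?:(\d+)\s+)?(\S+)")
PREFERENCE = re.compile(r"MX preference = (\d+)")


def parse_mx(output):
    """Return (preference, host) pairs found in nslookup output, lowest preference first."""
    records = []
    for line in output.splitlines():
        match = MX_LINE.search(line)
        if not match:
            continue
        preference = match.group(1)
        if preference is None:
            # Windows prints the preference before the mail exchanger.
            pref_match = PREFERENCE.search(line)
            preference = pref_match.group(1) if pref_match else None
        host = match.group(2).rstrip(".")
        records.append((int(preference) if preference is not None else None, host))
    return sorted(records, key=lambda r: (r[0] is None, r[0] if r[0] is not None else 0, r[1]))


def lookup_mx(domain):
    """Run nslookup for the domain's MX records (requires nslookup on the PATH)."""
    result = subprocess.run(
        ["nslookup", "-type=MX", domain],
        capture_output=True, text=True, check=False,
    )
    return parse_mx(result.stdout)


# Illustrative output only, not captured from a real lookup.
WINDOWS_SAMPLE = "example.com\tMX preference = 10, mail exchanger = mail.example.com"
UNIX_SAMPLE = "example.com\tmail exchanger = 20 backup.example.com.\nexample.com\tmail exchanger = 10 mail.example.com."

print(parse_mx(WINDOWS_SAMPLE))
print(parse_mx(UNIX_SAMPLE))
```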

Based on this and the emails I received, I contacted ixwebhosting.com.  They have assured me that the domain was suspended.

The question remains…who is “Shafts, Alex” and is this the mail address we should send to:

504 LEONARD AV
Las Vegas, NV 89106
US
702.5431469


Case Study – JumpStart to Findability

October 27th, 2008

Last year we assisted our customer Legal Services of Northern California with our JumpStart service. It has been nearly a year and I am happy to see how much they’ve come along. They’ve recently launched a website showcasing their project.

Their Findability Project is the public face of a special technology search project undertaken by Legal Services of Northern California (LSNC), a legal services program assisting low-income clients in 23 counties in the upper third of California. Made possible by a grant from the LSC Technology Initiative Grant Program, the goal of the project is to implement a Google Enterprise Search Appliance as the principal building block of a modest but secure, well structured, fully web accessible knowledge-content system. If you have some time please read up on their initiative. If you are interested in how we can help you please contact us.

Quick Google Search Appliance Tip: Using ADSI Edit as a Powerful tool for LDAP configuration

October 17th, 2008

Many of our customers find it difficult to figure out which LDAP query to run in their Google Search Appliance (GSA) or Google Mini (Mini) LDAP setup wizard.  When configuring LDAP on the GSA, a common issue that the user may run into is finding the correct information for the “Distinguished Name” (DN) field.

Our solution to this problem is a handy command that can be executed on the server that provides Active Directory.

In order to quickly find the DN, open a command prompt and enter the command “adsiedit.msc”.

After you execute this command, navigate to the account for which you are seeking the distinguished name (this is typically your own account).

Navigating ADSI Edit

Right-click on that account and select “properties”.

Then find the “Distinguished Name” category, double-click, and you will have the answer you seek.
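Once you have copied the DN, it can be handy to break it into its parts to double-check the search base you enter in the LDAP setup.  Here is a minimal sketch that splits a DN on unescaped commas; the DN shown is a made-up example, and the sketch does not cover every escaping rule a DN can use.

```python
import re


def split_dn(dn):
    """Split a distinguished name into (attribute, value) pairs.

    Commas escaped with a backslash are kept inside the value.
    """
    pairs = []
    for part in re.split(r"(?<!\\),", dn):
        attribute, _, value = part.strip().partition("=")
        pairs.append((attribute.upper(), value))
    return pairs


def domain_from_dn(dn):
    """Join the DC components of a DN into a DNS-style domain name."""
    return ".".join(value for attribute, value in split_dn(dn) if attribute == "DC")


# Hypothetical DN, as it might be copied from ADSI Edit.
dn = "CN=Jane Doe,OU=Staff,DC=example,DC=com"
print(split_dn(dn))
print(domain_from_dn(dn))
```

Dropping the leading CN component gives the container (here OU=Staff,DC=example,DC=com), which is often what a search base field expects.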

Google Chrome Polishing

September 4th, 2008

Earlier this week, Google announced their new browser Chrome.  Our clients have been asking us what impact this has on their enterprise and the reasons behind the new browser.

Scott McCloud, who is a heavyweight in the comic book industry, drew up an explanation in this online comic created for Google. It is the best place to start investigating.  We’ll update more as we test out the tool (this blog entry is being written with Chrome).

New Google Search Appliance GB-1001 release

August 6th, 2008

We just found out that there is a new release for the Google Search Appliance.  It includes many of the rumored features that we’ve been hearing were in the works.  The short list is below.  I have yet to actually take a look at it, but we’ll be reaching out to our customers with the details as we learn more.

<copied from Google>

End User Features

Personalized Search Experience: Allow administrators to adjust search results for different user groups, based on department or function.

Alerts: Employees can subscribe to email alerts for topics and documents of interest, choosing an hourly, daily, or weekly schedule.

Spellchecker in Six New Languages: French, Italian, German, Spanish, Portuguese, and Dutch.

Enterprise Content

Languages: Restrict search results to any of 27 auto-detected languages including administrative functions in five new languages (Czech, English-UK, Portuguese-Portugal, Turkish, Vietnamese); contextual spell checking for all end users in Portuguese, French, Italian, German, Spanish, and Dutch; and query expansion for all end users in Dutch.

Security and Access Control

Kerberos Support: Provide native support for Kerberos, enabling a silent authentication experience for end-users.

Metadata Biasing: Administrators can bias results based on metadata (in addition to biasing based on source, URL or date).

Advanced Reporting: View and export daily and hourly result sets, top queries, special feature usage, and more. Report for every query, including reports on which queries receive no clicks by a user and how often users are clicking on sponsored links in comparison to organic search results or OneBox modules.

Administration and Customization

Localized Administration: Administer your Google Search Appliance around the globe in 27 different languages. Full administration is now supported in Basque, Catalan, Chinese (simplified), Chinese (traditional), Czech, Danish, Dutch, English (US), English (UK), Finnish, French, Galician, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese (Brazilian), Portuguese (Portugal), Russian, Spanish, Swedish, Turkish, and Vietnamese.

Michael Cizmar Interviewed by Tech Target

July 14th, 2008

In a recent Tech Target interview, MC+A’s president, Michael Cizmar, outlined his opinions on the growing popularity of cloud computing and how it will affect both IT solutions companies and enterprise end-users. 

While a strong advocate for innovation, Michael believes that even in a “cloudy” world, there will still be a great need for solution companies who can add value by efficiently implementing new technologies across the enterprise.

According to Michael, those who prove their expertise by developing applications that make the cloud more efficient and productive will continue to play a vital role in the technology world.  In addition, enterprise users win as talented developers are incented to showcase their abilities to an otherwise difficult-to-reach audience.

As Michael notes, “If you have an application, and put it on the Google marketplace, you’re immediately exposed to 500,000 customers.”  With this type of opportunity for developers, and the resulting benefits to enterprise users, cloud computing – and innovative companies like MC+A – may be in the process of creating the perfect storm. 

Read the full interview here: 

http://channelmarker.blogs.techtarget.com/2008/06/12/the-impending-cloud/

Google Enterprise releases 5.0.4

July 14th, 2008

Google Enterprise has released the long-awaited 5.0.4 release.  Most notable in this release are support for Microsoft Office 2007 documents and the fix for the connector manager auth prompt bug.

Existing customers can go to:

https://support.google.com/enterprise/doc/gsa/00/update_index_page.html

to download the new system and software versions.  I’ll update this thread with any discoveries from our upgrades.  Please share your findings as well.

Update:

We’ve attempted to update two GSAs.  The first, last week, went fine.  The one we attempted twice today had a checksum error:

Downloading URL http://support.google.com/enterprise/updates/XXXXXXX

Percent completed: 100%
MD5 Checksum = 74f78e89c09d4be143c3b1d5ca1a4fb7

** Unpacking downloaded file **

ERROR: Downloaded file is not properly signed or corrupted. Please try again one more time.
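If you have a copy of the update file and a checksum you trust, you can rule out a corrupted download yourself before retrying.  This is a minimal sketch using Python's hashlib; the file below is a temporary stand-in containing "hello", not a real update package, and a matching MD5 only shows the bytes arrived intact.  It says nothing about whether the file is properly signed.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Stand-in file for the demo; a real check would point at the downloaded update.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

demo_ok = matches_checksum(demo_path, "5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
print(demo_ok)  # True
```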

MC+A's Michael Cizmar Joins Advisory Board of OpenPipeline Project

May 8th, 2008

Tuesday, Dieselpoint announced the release of a new pipeline architecture as open source software. MC+A’s president, Michael Cizmar, will be serving on the advisory board.

Thursday, May 08, 2008, (Worldwide Headquarters – Chicago, IL)

Today, MC+A announces that Michael Cizmar, president of MC+A, will be serving on the advisory board of Dieselpoint’s OpenPipeline. Currently, search, text analytics and connector vendors find themselves constantly reengineering their software to integrate with closed, proprietary systems. OpenPipeline works out-of-the-box with Dieselpoint Search, but can be used with any search engine.

“Since 2002, we have been working with our clients to implement search technologies. We see a tremendous potential for our clients because OpenPipeline simplifies many of the complexities of the UIMA,” said Michael Cizmar, President of MC+A.

Both broader and simpler than IBM’s UIMA (Unstructured Information Management Architecture), OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules to distribute documents across a network. It is fully functional out of the box and includes an installer, a job scheduler, file scanner and crawlers, doc filters, and point and click interface with drag and drop module installation. Document processing can be centralized or parallelized as needed. The transport mechanism is simple, web-services XML over HTTP. RSS/Atom feeds are also possible. The development philosophy behind OpenPipeline stresses simple, elegant design, and massive scalability. Minimal external dependencies and straightforward plug-in implementation ensure that the learning curve is low. OpenPipeline Beta is available for free download via the Apache License 2.0.

Current Advisory Board members include individuals from enterprise search, text analytics and connector firms and consultancies. More information can be found at www.openpipeline.org. MC+A is dedicated to solving business challenges by enabling organizations to derive maximum value from their business intelligence assets, with expertise in portal development, management and architecture, and supporting technologies.

About MC+A

MC+A provides solutions around managing business intelligence. We focus on helping our clients build, find and share those assets securely. Additionally, MC+A provides managed services implementing enterprise 2.0 initiatives.
