Why Field Mapping?
The Google Search Appliance was effectively schemaless: it required no mapping of content source fields and metadata to an internal schema. That design is in sharp contrast to nearly every other search index. Across the many migrations we’ve performed in the past year, we’ve seen a recurring pattern: customers have difficulty adjusting to this difference.
Isn’t our new system schemaless?
Not true. Even Google’s Bigtable is not truly “schemaless,” although Google’s own product documentation describes it that way. Modern indexes let you define field types and field templates, which assign a type to any field whose name matches a pattern. Because of this, a document does not have to contain every field defined in the index in order to be indexed, and new fields can be added dynamically without purging the index and reindexing. This capability is a de facto standard among modern insight engines. Still, to be most successful, you should take on the work of defining your schema.
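To make field types and templates concrete, here is a minimal Python sketch of how a schema with explicit fields plus pattern-based templates might type an incoming document. The field names, types, and suffix patterns (“*_s”, “*_txt”, “*_dt”) are hypothetical, loosely modeled on the suffix-style dynamic fields that some engines, such as Solr, offer; every real engine has its own configuration syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical explicit field definitions: field name -> field type.
EXPLICIT_FIELDS = {
    "title": "text",
    "author": "string",
    "created": "date",
}

# Hypothetical dynamic templates, checked in order: name pattern -> field type.
DYNAMIC_TEMPLATES = [
    ("*_dt", "date"),
    ("*_s", "string"),
    ("*_txt", "text"),
]


def resolve_type(field_name):
    """Return the type for a field, or None if no definition or template matches."""
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_TEMPLATES:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


def check_document(doc):
    """Split a document's fields into typed fields and fields the schema rejects.

    A document may omit any defined field; only the fields it contains are checked.
    """
    typed, rejected = {}, []
    for name in doc:
        field_type = resolve_type(name)
        if field_type is None:
            rejected.append(name)
        else:
            typed[name] = field_type
    return typed, rejected
```

For example, check_document({"title": "Q3 plan", "region_s": "EMEA", "legacy_flag": True}) types “title” as text and “region_s” as string via its template, and rejects “legacy_flag”. The missing “author” and “created” fields cause no error, which mirrors the point above: defined fields are optional per document, and new “*_s” fields need no schema change.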
What do we have to define?
We typically start with the user interface and work backward to the data. With this approach, you start with the search wireframe and identify the data elements that drive, or are expected to drive, the interface. Typical data points include facet fields, search result attributes, and autocomplete suggestions. Once these are defined, we map each one to the type of field it requires in the index. For example, facet fields typically need to be “stored” or carry some similar attribute.
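The wireframe-first approach can be sketched as a lookup from each UI role to the index attributes it implies. This is a hypothetical illustration: the role names come from the paragraph above, but the attribute names (“indexed”, “stored”, “suggest”) and which role needs which attribute are assumptions, since each engine names and combines these differently.

```python
# Hypothetical mapping from a UI role on the wireframe to the index attributes it needs.
# Attribute names are illustrative only; check your engine's documentation for real ones.
ROLE_ATTRIBUTES = {
    "facet": {"indexed", "stored"},          # facets need "stored" or a similar attribute
    "result": {"stored"},                    # value is displayed in the result list
    "autocomplete": {"indexed", "suggest"},  # value feeds type-ahead suggestions
}


def derive_field_attributes(wireframe):
    """Turn {field: [ui roles]} into {field: sorted list of index attributes}.

    Raises KeyError for a role that ROLE_ATTRIBUTES does not define.
    """
    attributes = {}
    for field, roles in wireframe.items():
        needed = set()
        for role in roles:
            needed |= ROLE_ATTRIBUTES[role]
        attributes[field] = sorted(needed)
    return attributes


if __name__ == "__main__":
    # Hypothetical wireframe: which UI elements each schema field drives.
    wireframe = {
        "author": ["facet", "result"],
        "title": ["result", "autocomplete"],
        "category": ["facet"],
    }
    for field, attrs in derive_field_attributes(wireframe).items():
        print(field, attrs)
```

Taking the union of attributes per field means a field that serves several UI elements gets everything each of them needs, and an unknown role fails loudly instead of silently producing an under-specified field.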
How do we assemble the various sources?
Ideally, you don’t. As every data scientist knows, part of the extract-transform-load (ETL) process is normalizing your data into standard fields. Normalization, in this case, means moving the “owa_Author” field you get from SharePoint and the “createdby” field you get from Confluence into a single defined schema field such as “author.” We have created a field mapping spreadsheet that you can use on your project; from it, we run a script that generates the necessary field definitions.
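A spreadsheet-driven mapping of this kind might look like the following minimal Python sketch. The CSV columns (“source”, “source_field”, “target_field”, “target_type”) and the generic definition format are assumptions for illustration, not the layout of our actual spreadsheet or any engine’s schema API; the “owa_Author” and “createdby” rows come from the example above.

```python
import csv
import io

# Hypothetical export of a field mapping spreadsheet; the column names are assumptions.
MAPPING_CSV = """source,source_field,target_field,target_type
sharepoint,owa_Author,author,string
confluence,createdby,author,string
sharepoint,Title,title,text
confluence,title,title,text
"""


def load_mapping(csv_text):
    """Return {(source, source_field): target_field} and {target_field: target_type}."""
    field_map, field_types = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        field_map[(row["source"], row["source_field"])] = row["target_field"]
        # Every source field mapped to the same target must agree on its type.
        previous = field_types.setdefault(row["target_field"], row["target_type"])
        if previous != row["target_type"]:
            raise ValueError(f"conflicting types for {row['target_field']}")
    return field_map, field_types


def field_definitions(field_types):
    """Generate one generic field definition per schema field, sorted by name."""
    return [{"name": name, "type": ftype} for name, ftype in sorted(field_types.items())]


def normalize(source, record, field_map):
    """Rename a source record's fields into schema fields; unmapped fields are dropped."""
    out = {}
    for name, value in record.items():
        target = field_map.get((source, name))
        if target is not None:
            out[target] = value
    return out
```

With this mapping, normalize("sharepoint", {"owa_Author": "J. Smith"}, field_map) and normalize("confluence", {"createdby": "J. Smith"}, field_map) both yield {"author": "J. Smith"}. Dropping unmapped fields is one possible design choice; a real pipeline might instead log them so the spreadsheet can be extended.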
Once the data is in the correct field, we then standardize the values themselves into a consistent form. This step is vital because, without it, “insight engine” and “Insight Engine” appear as two entirely different facet values.
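One way to reach a standard form is to compare values by a normalized key and then pick a single display spelling per key. The sketch below is one possible approach under stated assumptions: it treats values as equal after Unicode NFKC normalization, whitespace cleanup, and case folding, and the optional “canonical” override table is hypothetical.

```python
import re
import unicodedata
from collections import Counter


def clean(value):
    """Unicode-normalize, trim, and collapse internal whitespace."""
    value = unicodedata.normalize("NFKC", value)
    return re.sub(r"\s+", " ", value.strip())


def facet_key(value):
    """Comparison key: the cleaned value, case-folded."""
    return clean(value).casefold()


def standardize(values, canonical=None):
    """Map every raw value to one display form per facet key.

    canonical: optional {key: display form} overrides (a hypothetical curated list).
    Without an override, the most frequent cleaned spelling for a key wins;
    ties go to the spelling seen first.
    """
    canonical = canonical or {}
    spellings = {}
    for raw in values:
        spellings.setdefault(facet_key(raw), Counter())[clean(raw)] += 1
    display = {
        key: canonical.get(key, counts.most_common(1)[0][0])
        for key, counts in spellings.items()
    }
    return [display[facet_key(raw)] for raw in values]
```

For instance, standardize(["insight engine", "Insight Engine", " Insight  Engine "]) collapses all three to “Insight Engine”, so the facet shows one value instead of two or three. Letting a curated override win over frequency keeps an editor in control of the display form when the most common spelling is not the preferred one.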
Can I get a copy of your template?
You can request the spreadsheet by filling out the form on this page. You’ll be sent a link to a Google Sheets spreadsheet, which you can copy or export. Keep that link handy, as we will update the spreadsheet over time based on the feedback we receive.