Semantic Search Strengthened with Innovative Linguistic Analysis

July 16, 2013

At Search Engine Journal we came across a recent article on two important topics in today’s search arena: “The Difference Between Semantic Search and Semantic Web.” The post defines each term and delves into the distinctions between them.

Pulling from Cambridge Semantics, the article asserts that the Semantic Web is a set of technologies that store and query information, usually numbers and dates. Textual data is not typically stored in large quantities.

We thought their simple explanation of semantic search was a good starting point for those learning about the technology:

Semantic Search is the process of typing something into a search engine and getting more results than just those that feature the exact keyword you typed into the search box. Semantic search will take into account the context and meaning of your search terms. It’s about understanding the assumptions that the searcher is making when typing in that search query.

We also appreciate that the article frames semantic search as a concept that is not new but is currently gaining traction. Essentially, semantic search mirrors the process people use when reading: text is analyzed and context is built up so that a rich understanding can emerge. Many innovative technologies are growing out of this concept. For example, solutions from Expert System offer precise analytics built on the company’s core semantic search technologies, whose linguistic analysis capabilities enhance how data is extracted and applied through a natural language interface.
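The core idea can be pictured with a toy sketch (illustrative only; the documents and the hand-built synonym table below are invented for this example, and real semantic engines such as Expert System’s derive such relationships through linguistic analysis rather than a lookup table). A purely lexical search matches only exact keywords, while a semantic layer also considers related terms:

```python
# Toy illustration: keyword search vs. a (very) simplified semantic expansion.

documents = [
    "How to fix a flat bicycle tire",
    "Best laptops for students",
    "Repairing a punctured bike wheel",
]

# Hand-built stand-in for the contextual knowledge a semantic engine infers.
synonyms = {
    "bicycle": {"bicycle", "bike"},
    "fix": {"fix", "repair", "repairing"},
    "tire": {"tire", "wheel"},
}

def keyword_search(query, docs):
    """Return docs containing every exact query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.lower().split() for t in terms)]

def semantic_search(query, docs):
    """Return docs where each query term, or a related term, appears."""
    expanded = [synonyms.get(t, {t}) for t in query.lower().split()]
    return [d for d in docs
            if all(any(s in d.lower().split() for s in group)
                   for group in expanded)]

print(keyword_search("fix bicycle tire", documents))   # exact match only
print(semantic_search("fix bicycle tire", documents))  # also finds the "bike wheel" doc
```

The second search returns the document about repairing a punctured bike wheel even though it shares no keywords with the query, which is the behavior the article describes.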

Megan Feil, July 16, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Big Data Might Not Be for All Businesses

July 16, 2013

Data professional and Connexica blogger Richard Lewis is beginning to suspect that small and medium companies are being swindled. In “Is Big Data All That?” he concludes that only large corporations really benefit from the trend, and that for everyone else it is a bunch of ballyhoo. Interesting assertion. Lewis explains:

“I find all of the current hype around big data and the promotion of technologies that require dozens of servers, armies of techies and even then professors of programming to get at the data a bit of a scam. At the end of the day it’s about making money, customer satisfaction and self-improvement so it’s not size that matters it’s being able to zero in on the facts… finding the information that matters. I am concentrating on finding out the facts and masking out the noise. With all this ‘big data’ we are creating a maelstrom where it’s increasingly difficult to see the facts through the mist. Big data is for the likes of Coca cola for McDonald’s for Walmart. For everyone else big data is noise. Concentrate on getting the most out of what we have before seeking solace from social chatter.”

Lewis is not the first to question the push for all companies to embrace big data. Surely there are some smaller entities for which the technology is helpful, but perhaps more for which it is a waste of resources. Are businesses harming themselves by jumping on the bandwagon without a clear idea of what they expect to gain? Does data extracted from social media really hold valuable insights for every organization?

Maybe companies that have not yet taken the big data plunge would do well to step back, and closely consider their unique needs before going forward.

Cynthia Murrell, July 16, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Norconex Offers Open Source HTTP Crawler

July 16, 2013

Most commercial enterprise search vendors offer their own HTTP crawler, and several open source options exist as well. One new entry in the field stands out, though, for its odd blend of web and enterprise search functionality. In the post “Norconex Gives Back to Open-Source,” Norconex describes its crawler and associated libraries:

“The Norconex HTTP Collector is an HTTP Crawler meant to give the greatest flexibility possible for developers and integrators. It makes it easy for Java developers to add custom features, so no one will get stuck again when dealing with odd requirements, difficult websites, or close-source crawler limitations. . . . The HTTP collector can be used stand-alone or embedded as a library in your own software.

“Norconex may release other collectors for various data sources in the future. In the meantime, we have encapsulated the document parsing process and sending of parsed data to your target search engine or repository into two separate libraries. We are releasing them as Norconex Importer and Norconex Committer.”

Norconex tells us that they focused on simple configuration, as well as on providing features not found in some existing crawlers. The enterprise search firm was founded in 2007 and is based in Ottawa, Canada.

Cynthia Murrell, July 16, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Topology Configuration for SharePoint 2013

July 16, 2013

We’ve found an interesting how-to on topology configuration. The Team Blog of MCS @ Middle East and Africa posts, “Configuring SharePoint 2013 Search Topology.” Blogger and Microsoft employee Issa Ayyash begins:

“When creating the Search Service application, a default topology will be built automatically where all Search components are assigned to the server which is running the Central Administration, in multi servers farm scenario you need to change this topology, the only available way currently is through PowerShell which will provide more flexibility in configuring the topology, (you can NOT modify the topology through UI like you used to do with SharePoint 2010)”

Yes, that change could be frustrating if one didn’t get the memo. For a rundown of the differences between SharePoint 2010 and SharePoint 2013, click here.

Ayyash goes on to guide us through an example five-server setup, complete with a helpful diagram, a screenshot, and snippets of code. The model dedicates three servers to query processing and two as application servers. This post is a concise and informative resource for anyone who may be called upon to configure a SharePoint 2013 search topology.

Cynthia Murrell, July 16, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Solr 4 Webinar from LucidWorks

July 16, 2013

Make plans to attend the Solr 4 webinar this Thursday, hosted by the experts at LucidWorks through their open resource portal SearchHub. Read all the details of the upcoming event in the LucidWorks release, “WEBINAR: Scaling Through Partitioning and Shard Splitting in Solr 4.”

The details state:

“Over time, even the best designed Solr cluster will reach a point where individual shards are too large to maintain query performance. In this Webinar, you’ll learn about new features in Solr to help manage large-scale clusters. Specifically, we’ll cover data partitioning and shard splitting. Partitioning helps you organize subsets of data based on data contained in your documents, such as a date or customer ID. We’ll see how to use custom hashing to route documents to specific shards during indexing. Shard splitting allows you to split a large shard into 2 smaller shards to increase parallelism during query execution.”
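The mechanics behind this can be pictured with a simple hash-range model (a conceptual sketch only, not Solr’s implementation: SolrCloud routes documents with MurmurHash3 over a 32-bit hash ring, and CRC-32 below is just a convenient stand-in). Each shard owns a contiguous slice of the hash space, a document’s ID hashes into exactly one slice, and splitting a shard halves its range:

```python
import zlib

# Conceptual sketch of hash-range document routing and shard splitting.
# Each shard owns a contiguous range of a 32-bit hash space; splitting a
# shard divides its range into two halves.

SPACE = 2**32

def make_shards(n):
    """Divide the hash space into n contiguous ranges."""
    step = SPACE // n
    return [(i * step, (i + 1) * step - 1) for i in range(n)]

def route(doc_id, shards):
    """Return the index of the shard whose range covers hash(doc_id)."""
    h = zlib.crc32(doc_id.encode()) % SPACE
    for i, (lo, hi) in enumerate(shards):
        if lo <= h <= hi:
            return i
    raise ValueError("hash outside every shard range")

def split_shard(shards, i):
    """Split shard i into two shards, each covering half its range."""
    lo, hi = shards[i]
    mid = (lo + hi) // 2
    return shards[:i] + [(lo, mid), (mid + 1, hi)] + shards[i + 1:]

shards = make_shards(2)
docs = [f"doc-{n}" for n in range(1000)]
before = [route(d, shards) for d in docs]

shards = split_shard(shards, 0)          # shard 0 becomes two smaller shards
after = [route(d, shards) for d in docs]

# Documents that lived on the untouched shard stay put; only the split
# shard's documents are redistributed between its two new halves.
```

The key property, which the webinar’s shard-splitting feature relies on, is that documents on unsplit shards never move; only the oversized shard’s documents are redistributed.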

Attendees will come away with real-world examples and applications for making Solr 4 production-ready. The featured presenter is Timothy Potter, senior Big Data architect at Dachis Group and a true expert in the field. Register today for the free webinar.

Emily Rae Aldridge, July 16, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Setting Up Endeca Provisioning

July 15, 2013

Need to know how to set up Endeca’s Provisioning? David Sowerby at 3sixty-analytics helpfully outlines the setup in “Endeca Provisioning – ‘Self Service’.” (The article notes that users looking for information on installing Provisioning should see this entry.) He writes:

“I mentioned in an earlier post that Oracle had indicated the future direction of Endeca by including a feature described as ‘Provisioning’. In practice, this is the facility to load a single spreadsheet, perform some simple but useful data cleansing, and then let Endeca generate an application for you.

“Although it currently only loads a single spreadsheet per application, it is a great way to see some of Endeca’s capability first hand, and also to act a hands on introduction to building an application from your data.”

The write-up describes four main steps: finding a spreadsheet of data to use, loading the spreadsheet, experimenting with the data, and, naturally, bragging about the results. Sowerby walks us through the loading process, complete with screenshots, then links to an Oracle video that covers working with the data and creating an application. Together, the article and video make a valuable resource for anyone looking into Endeca Provisioning.

Cynthia Murrell, July 15, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Talend Snatches Up Local Data

July 15, 2013

From Dataversity we read about another analytics partnership, this one between Talend and Loqate. “Talend and Loqate Partner to Increase the Value of Strategic Information” describes the two companies: Talend is a global open source software player, and Loqate handles international address verification. The impetus for the relationship is the data quality demands of enterprise organizations.

The article quotes Fabrice Bonan, co-founder and chief technical officer at Talend:

“Talend’s commitment to our customers is to let them realize the full value of their information assets. Our partnership with Loqate provides a crucial piece of this puzzle: the verification, enrichment and geocoding of addresses that ensure the accuracy and reliability of location data. These added capabilities help our customers increase the efficiency of critical business processes such as logistics or customer acquisition and retention.”

As local data becomes increasingly relevant to a mobile-charged society, it is no wonder Talend and Loqate have buddied up. As previous partnerships have shown, it is important for companies like Loqate to have clean, accurate data ready and available before anything else takes place. We will keep watching to see how this progresses.

Megan Feil, July 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

The Many Projects of SumoBrain Solutions

July 15, 2013

Oftentimes we come across companies so interesting that they deserve a write-up of their own, news or no current news. We stumbled upon SumoBrain Solutions and were intrigued enough to delve into the company’s About Us page.

SumoBrain Solutions participated in last year’s PE2E (Patent End-to-End) pilot project, developing a solution for the US Patent and Trademark Office that was integrated into the non-search architecture.

Also of note is the technology the company built to support complex search optimization for a chemical industry client. We learned:

“A matrix query is particularly powerful when a user desires a dataset based on 2 or more dimensions. For example, rather than asking for all patents where the assignee is IBM, Intel, or Microsoft, the user might request all datasets where the assignee is IBM, Intel, or Microsoft, by year for the last 20 years. Other use cases include mapping term lists against assignees or … The conventional way of running such queries was to run each permutation – and depending on the number of dimensions and the number of possibilities in each dimension, this approach can quickly become intractable. Our innovative approaches to this challenge and many other complex search problems have given us the most powerful and scalable capabilities on the market.”
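The combinatorial point is easy to see in a sketch (illustrative only; the query strings and the third dimension below are stand-ins, not SumoBrain’s actual syntax or data). Running every permutation means one query per cell of the matrix, and adding a dimension multiplies the count:

```python
from itertools import product

# Illustration of why per-permutation querying becomes intractable: one
# query per (assignee, year) cell -- the "conventional way" -- versus a
# single matrix query covering both dimensions at once.

assignees = ["IBM", "Intel", "Microsoft"]
years = list(range(1994, 2014))  # the last 20 years, as of 2013

# Conventional approach: one query per permutation of the dimensions.
per_cell_queries = [
    f'assignee:"{a}" AND year:{y}' for a, y in product(assignees, years)
]
print(len(per_cell_queries))  # 60 separate queries for a small 3 x 20 matrix

# A third dimension (a hypothetical five-term technology list) multiplies
# the count again: 3 x 20 x 5 cells.
terms = ["semiconductor", "memory", "processor", "storage", "network"]
three_dim = list(product(assignees, years, terms))
print(len(three_dim))
```

Even this modest example produces 60 queries, and a third five-item dimension pushes it to 300; with longer term lists the per-permutation approach quickly becomes intractable, which is the problem the matrix query is built to avoid.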

The company’s projects don’t stop there. It is also working with Harvard Business School on a study of historical and geographical trends in US patenting activity.

Megan Feil, July 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

IDC Report on Lucene Revolution

July 15, 2013

Lucene Revolution 2013 was a huge success, bringing together a variety of brilliant minds focused on realizing the potential of Apache Lucene/Solr. LucidWorks is the main sponsor of the recurring event, one of the world’s largest devoted to open source search technology. Now a notable endorsement has come in from the venerable research firm IDC. Read more about its report in the release, “IDC Report on Lucene Solr Revolution 2013.”

The release has this to say about Lucene Revolution 2013:

“The conference produced an amazing set of videos and slides presenting the best thinking on search technology. IDC analyst David Schubmehl recently wrote a report discussing why the Lucene/Solr Revolution Conference 2013 is an excellent example of the increasing maturation and acceptance of open source search technologies within organizations. In this report you will find several examples and case studies of companies and organizations using Lucene/Solr.”

Interested readers can register for free to read the full IDC report. This is just one more way that LucidWorks continues their support of the open source search developer community. European followers should stay tuned for the upcoming Lucene Revolution EU event happening in Dublin in November.

Emily Rae Aldridge, July 15, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Big Data Startup Parade Begins with These 14 Companies

July 14, 2013

Business Insider posted an article titled “14 Big Data Startups You’re Going to be Hearing A Lot More About” on June 4, 2013. The article explores the big data companies teetering on the edge of wild success and fame. The companies named include WibiData, Hadapt, Sqrrl, Precog, Datameer, HStreaming, Alpine Data Labs, and Kontagent. The article claims:

“Google, Facebook, Amazon and other web giants have harnessed big data to solve some of their biggest tech challenges. Now many of these engineers are setting out on their own with startups. Some are focused on analytics. Some are working on in-memory databases, which do all their work on data stored in memory instead of hard drives. Others are casting their lot with NoSQL, a new kind of database that spreads processing and storage across multiple servers and storage systems.”

For example, DataGravity, founded in 2012, headquartered in Nashua, New Hampshire, and led by storage veteran Paula Long, makes big data more affordable by embedding the technology into storage systems. The implications these startups pose for IBM SPSS, SAS, Palantir, and Digital Reasoning are as yet unclear. VCs certainly seem optimistic, with almost all of the startups mentioned raking in millions of dollars from various backers.

Chelsea Kerwin, July 14, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
