ThoughtSpot: Another Search Vendor with Aspirations of Big Bucks

July 4, 2014

One of the two or three readers of this blog reported a new and revolutionary search and Big Data vendor called ThoughtSpot. I navigated to the site and enjoyed the wolf / dog image. The headline is:

Your business is fast and data hungry.

I really liked the wolf / dog. I found that the various links kept pointing to the wolf / dog. I am no longer fast or data hungry. I am outta here. Maybe a reader will let me know when the Web site is working again. The company has captured $30 million in funding according to VentureBeat. I assume the Web site will be fattened in the days ahead. This should be easy. According to Google Maps, ThoughtSpot is very near the In-N-Out Burger in Redwood City. Presumably the Google-like search for Big Data will be the next double double cheeseburger. My dogs like In-N-Out Burgers. Neither is fast nor data hungry.

Stephen E Arnold, July 4, 2014

Users Prefer Search Apps Over Search Engines

July 3, 2014

Search engines are seeing a drop in ad revenue because, instead of opening a Web browser and hitting a search engine to find information, users are turning to search apps. TechCrunch reports on the shift in the article “We Search More On Apps, Less On Google Now.” Google dropped from an 82.8 percent share of search engine ad revenue to a mere 65.7 percent. The US mobile ad market, however, spiked to over $17.73 billion, way more than Google brought in over the past two years for search.

Users are sticking to niche apps to find the information they need. It makes sense, given that aggregated results are more in tune with what we want than sifting through irrelevant search results. Nielsen ran a consumer report that found users are spending 34 hours a month on mobile phones versus 27 hours at their desktops. Their search wants have also shifted:

“According to the eMarketer report, we’re really big on local search. Yelp is leading the pack here in terms of ad-revenue growth. Predictions for the local business search company are at 136 percent, or $119 million in mobile ad revenue this year. While that’s a drop in the bucket compared to the spend for Google, Yahoo or Bing, it’s a telling shift in consumer behavior. Revenues are expected to triple by 2016 for Yelp. Meanwhile, Google revenue is expected to drop to 64.2 percent of the overall market by then.”

Google is not going bankrupt. The company is still making money and still growing; it is just not dominating the entire search market. Users are getting smarter about the way they search as well as finding different ways to get their information. The old search browser might soon be a thing of the past.

Whitney Grace, July 03, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

NoSQL Has a Weakness. Just Tell No One.

July 2, 2014

I read “The Rise (and Fall?) of NoSQL.” The write up seems to take a stance somewhat different from that adopted by enterprise search vendors. With search getting more difficult to sell for big bucks, findability folks are reinventing themselves as Big Data mavens. Examples range from the Fast Search clones to tagging outfits. (Sorry, no names this morning. Search and content processing vendors with chunks of venture firm cash do not need any more fireworks today.)

Is Big Data the white knight that will allow those venture funded companies to deliver a huge payday? I don’t know, but I keep my nest egg in less risky places.

Here’s the segment I noted:

It’s quite simple: analytics tooling for NoSQL databases is almost non-existent. Apps stuff a lot of data into these databases, but legacy analytics tooling based on relational technology can’t make any sense of it (because it’s not uniform, tabular data). So what usually happens is that companies extract, transform, normalize, and flatten their NoSQL data into an RDBMS, where they can slice and dice data and build reports. The cost and pain of this process, together with the fact that NoSQL databases aren’t fully self-contained (using them requires using their “competition” for analytics!) is the biggest threat to the possible dominance of NoSQL databases.
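The flatten-and-load routine the quoted passage describes is easy to picture in a few lines. Here is a minimal sketch, assuming a hypothetical order document and table; the field names are made up, and the point is only that nested NoSQL records get squashed into rows before a relational reporting tool can touch them.

```python
# Minimal sketch of flattening a nested NoSQL-style document into SQL rows.
# The document, table, and field names are hypothetical.
import json
import sqlite3

doc = json.loads("""
{"order_id": "A-1001",
 "customer": {"name": "Acme Corp", "region": "Northeast"},
 "items": [{"sku": "X-1", "qty": 2}, {"sku": "X-9", "qty": 1}]}
""")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id TEXT, customer TEXT, region TEXT, sku TEXT, qty INTEGER)")

# Flatten: one row per line item, with the parent fields repeated on every row.
rows = [(doc["order_id"], doc["customer"]["name"], doc["customer"]["region"],
         item["sku"], item["qty"])
        for item in doc["items"]]
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)", rows)

# Once flattened, ordinary SQL reporting works.
for region, total in conn.execute("SELECT region, SUM(qty) FROM order_items GROUP BY region"):
    print(region, total)
```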

My take on this searchification of Big Data boils down to one phrase: scrambling for revenues. Perhaps some of the money pumped into crazy marketing schemes might be directed at creating something that works. Systems that dip into a barrel of trail mix return a snack that cannot replace a square meal.

Stephen E Arnold, July 2, 2014

Elasticsearch: A Platform for Third Party Revenue

July 2, 2014

Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of Qbox, is using the darling of the open source search world to craft a robust findability business.

I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and Qbox are on to what diamond miners call a “pipe.”

In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:

We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.

Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.

Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:

When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.
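For context, part of the appeal of a hosted cluster is that a developer only has to speak Elasticsearch’s REST interface. A minimal sketch, assuming a placeholder cluster URL and index name (not Qbox’s actual endpoints), looks roughly like this:

```python
# Hypothetical sketch of indexing and querying an Elasticsearch cluster
# over its REST API; the host, index, and document are placeholders.
import requests

ES = "http://localhost:9200"   # a hosted cluster URL would go here
INDEX = "articles"

# Index a document and make it searchable immediately.
requests.put(f"{ES}/{INDEX}/_doc/1?refresh=true",
             json={"title": "Elasticsearch as a platform", "body": "open source search"})

# Run a full-text match query using the standard query DSL.
resp = requests.get(f"{ES}/{INDEX}/_search",
                    json={"query": {"match": {"body": "search"}}})
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"]["title"])
```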

When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from TripleHop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.

Here’s Mr. Brandon’s moment of insight:

During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?

You can read the full interview at http://bit.ly/1mADZ29. Information about Qbox is at www.qbox.com.

Stephen E Arnold, July 2, 2014

Using Search For Environment Protection

July 2, 2014

Environment protection organizations are always asking for support, and most of the time that translates into money. Paying a few dollars might make you feel good for a short time, but what if you could donate hundreds of dollars instead of your pocket change? How? Simply by clicking your mouse. The Ecosia team has taken the economics of search and applied it to a new type of search engine. The Ecosia search engine generates income whenever people use it. Eighty percent of the income from searches is used to plant trees in Brazil.

Technology and nature are pitted against each other in the collective consciousness, but Ecosia pairs them together. Ecosia recently updated its search experience, according to the blog post “Search And You Shall Receive.” The update includes images, maps, videos, and latest news. Ecosia pulls its results from many places, including Google, so you can still search through Google results and plant a tree at the same time.

There are also other cool updates:

“What else? Get results tailored to the past day, week or month with the addition of chronological filters on the search results page. We were listening, too, when you asked about data privacy at Ecosia – so check out our updated Privacy Policy. Plus, donate for free when you shop just about anywhere on the web with the all-new EcoLinks browser extension.”

There are other search engines that use a similar model, such as GoodSearch.com. Startups with a charitable goal never get enough attention. We encourage you to spread the word about Ecosia and plant a tree.

Whitney Grace, July 2, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

SharePoint through Rose Colored Glasses

July 1, 2014

SharePoint is definitely a powerful and ubiquitous enterprise tool. However, it is not always efficient and is definitely not easy to use – at least that is what the majority of users would argue. Every now and then, though, an article wants to paint a “best case scenario” picture of SharePoint. The harmon.ie article “‘Seek and Ye Shall Find:’ Making the Most of SharePoint Search” does just that.

After a lengthy discussion of some helpful SharePoint 2013 highlights, the article sums up the argument:

“A lot of the new functionality in SharePoint 2013 is provided by the previously separate add-on ‘FAST Search’ (developed by a company Microsoft bought in 2008). Until SharePoint 2010, this was a separate product, but Microsoft has now integrated it fully into core SharePoint functionality. With the exciting new developments of Office 365 and the cloud, we expect search to become even more powerful and user friendly in coming years. All of which is good news for the most important SharePoint audience of all – end users.”

But in order to get to that level of usability, most organizations will have to work through Microsoft’s “easy” tips and tricks for customization. We say “easy” because for most people this will be anything but easy. But for many organizations the investment in staffing and time is worth it for the end result. SharePoint is big and powerful, but in order to control this beast many organizations will have to sacrifice ease of use.

Emily Rae Aldridge, July 01, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

AeroText: A New Breakthrough in Entity Extraction

June 30, 2014

I returned from a brief visit to Europe to an email asking about Rocket Software’s breakthrough technology AeroText. I poked around in my archive and found a handful of nuggets about the General Electric Laboratories’ technology that migrated to Martin Marietta, then to Lockheed Martin, and finally in 2008 to the low profile Rocket Software, an IBM partner.

When did the text extraction software emerge? Is Rocket Software AeroText a “new kid on the block”? The short answer is that AeroText is pushing 30, maybe 35 years young.

Digging into My Archive of Search Info

As far as my archive goes, it looks as though the roots of AeroText are anchored in the 1980s. Yep, that works out to an innovation about the same age as the long in the tooth ISYS Search system, now owned by Lexmark. Over the years, the AeroText “product” has evolved, often in response to US government funding opportunities. The precursor to AeroText was an academic exercise at General Electric. Keep in mind that GE makes jet engines, so GE at one time had a keen interest in anything its aerospace customers in the US government thought was a hot tamale.


The AeroText interface circa mid 2000. On the left is the extraction window. On the right is the document window. From “Information Extraction Tools: Deciphering Human Language,” IT Pro, November-December 2004, page 28.

The GE project, according to my notes, appeared as NLToolset, although my files contained references to other names such as Shogun. GE’s team of academics and “real” employees developed a bundle of tools for its aerospace activities and in response to Tipster. (As a side note, in 2001, there were a number of Tipster related documents in the www.firstgov.gov system. But the new www.usa.gov index does not include that information. You will have to do your own searching to unearth these text processing jump start documents.)

The aerospace connection is important because the Department of Defense in the 1980s was trying to standardize on markup for documents. Part of this effort was processing content like technical manuals and various types of unstructured content to figure out who was named, what part was what, and what people, places, events, and things were mentioned in digital content. The utility of NLToolset type software was for cost reduction associated with documents and the intelligence value of processed information.

The need for a markup system that worked without 100 percent human indexing was important. GE got with the program and appears to have assigned some then-young folks to the project. The government speak for this type of content processing involves terms like “message understanding” or MU, “entity extraction,” and “relationship mapping.” The outputs of an NLToolset system were intended for use in other software subsystems that could count, process, and perform other operations on the tagged content. Today, this class of software would be packaged under a broad term like “text mining.” GE exited the business, which ended up in the hands of Martin Marietta. When the technology landed at Martin Marietta, the suite of tools was used in what was called, in the late 1980s and early 1990s, the Louella Parsing System. When Lockheed and Martin Marietta merged to form the giant Lockheed Martin, Louella was renamed AeroText.

Over the years, the AeroText system competed with LingPipe, SRA’s NetOwl, and Inxight’s tools. In the heyday of natural language processing, there were dozens and dozens of universities and start-ups competing for Federal funding. I have mentioned in other articles the importance of the US government in jump starting the craziness in search and content processing.

In 2005, I recall that Lockheed Martin released AeroText 5.1 for Linux, but I have lost track of the open source versions of the system. The point is that AeroText is not particularly new, and as far as I know, the last major upgrade took place in 2007 before Lockheed Martin sold the property to Rocket Software. At the time of the sale, AeroText incorporated a number of subsystems, including a useful time plotting feature. A user could see tagged events on a timeline, a function long associated with the original version of i2’s Analyst’s Notebook. A US government buyer can obtain AeroText via the GSA because Lockheed Martin seems to be a reseller of the technology. Before the sale to Rocket, Lockheed Martin followed SAIC’s push into Australia. Lockheed signed up NetMap Analytics to handle Australia’s appetite for US government accepted systems.

AeroText Functionality

What does AeroText purport to do that caused the person who contacted me to see a 1980s technology as the best thing since sliced bread?

AeroText is an extraction tool; that is, it has capabilities to identify and tag entities at somewhere between 50 percent and 80 percent accuracy. (See NIST 2007 Automatic Content Extraction Evaluation Official Results for more detail.)

The AeroText approach uses knowledgebases, rules, and patterns to identify and tag pre-specified types of information. AeroText references patterns and templates, both of which assume the licensee knows beforehand what is needed and what will happen to processed content.
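To make that approach concrete, here is a toy sketch of rule-plus-knowledgebase extraction in the spirit described above. It is not AeroText’s code; the gazetteer, the pattern, and the labels are illustrative only.

```python
# Toy sketch of knowledgebase- and rule-driven entity tagging.
# The gazetteer entries, regex rule, and labels are made up for illustration.
import re

gazetteer = {"Lockheed Martin": "ORGANIZATION", "General Electric": "ORGANIZATION"}
person_pattern = re.compile(r"\b(?:Mr\.|Ms\.|Dr\.) [A-Z][a-z]+\b")

def extract_entities(text):
    entities = []
    # Knowledgebase lookup: exact matches against a curated list of names.
    for name, label in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            entities.append((match.group(), label))
    # Pattern rule: an honorific followed by a capitalized surname.
    for match in person_pattern.finditer(text):
        entities.append((match.group(), "PERSON"))
    return entities

sample = "Mr. Brandon noted that Lockheed Martin acquired the tool from General Electric."
print(extract_entities(sample))
```

The sketch also shows why such systems need prior knowledge: anything not in the gazetteer or matched by a rule simply goes untagged.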

In my view, the licensee has to know what he or she is looking for in order to find it. This is a problem captured in the famous snippet, “You don’t know what you don’t know,” and the “unknown unknowns” variation popularized by Donald Rumsfeld. Obviously, without prior knowledge, the utility of an AeroText-type system has to be matched to mission requirements. AeroText pounded the drum for the semantic Web revolution. One of AeroText’s key functions was its ability to perform the type of markup the Department of Defense required of its XML. The US DoD used a variant called DAML, or DARPA Agent Markup Language. Natural language processing, Louella, and AeroText collected the dust of SPARQL, unifying logic, RDF, OWL, ontologies, and other semantic baggage as the system evolved through time.

Also, staff (headcount) and on-going services are required to keep a Louella/AeroText-type system generating relevant and usable outputs. AeroText can find entities, figure out relationships like person to person and person to organization, and tag events like a merger or an arrest “event.” In one briefing about AeroText I attended, I recall that the presenter emphasized that AeroText did not require training. (The subtext for those in the know was that Autonomy required training to deliver actionable outputs.) The presenter did not dwell on the need for manual fiddling with AeroText’s knowledgebases, and I did not raise this issue.


Duck Duck Go Reimagined

June 30, 2014

Duck Duck Go has launched a sleek redesigned web presence, complete with a flashy “What’s New” page to go over the highlights. Duck Duck Go is gaining traction with users who are interested in secure search, so there will be great interest in what the team is bringing to the table.

Their overview says:

“DuckDuckGo is a search engine driven by community – you’re on the team! We’re not just servers and an algorithm. We’re so much more. Real Privacy. We Don’t Track You. Smarter Search. Get Answers Quicker. Less Clutter. Fewer Ads and Reduced Spam.”

Of course details are provided for those who want to seek them out. But as Google gets bigger and bigger, some users are looking for smart search that allows them to remain an anonymous face in the crowd, and that is Duck Duck Go’s specialty. It may not quite be a David and Goliath situation as the giant does not look like it is going down anytime soon, but Duck Duck Go is on the rise and worth keeping an eye on. But do keep in mind that DDG is a metasearch system, so its weakness is that it has to rely on others’ search indexes.

Emily Rae Aldridge, June 30, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

X1 Search 8 Moves to Unified Search

June 30, 2014

With the move to more data across a wider variety of repositories (SharePoint, OneDrive, Dropbox, and more), the need to search across platforms is becoming more urgent. X1 has responded to the need by introducing X1 Search 8. Details are covered in the CollabShow article, “X1 Search 8—Unified Search for SharePoint, Desktop, Mail and More…”

The article begins:

“X1 has been analyzing the needs of the information worker and consumer in this space for over a decade. With their analyses, they have identified the need for fast retrieval and an intuitive, simple interface and powerful filtering across all of the repositories that a user uses and values. When you’re searching for information, you don’t want to have to go to a dozen different places across a variety of user interfaces. You’re likely to give up and end up spending hours duplicating effort or emailing someone else and wasting their time because you couldn’t find the email or document you were looking for and that you know you’ve seen somewhere.”

X1 is using familiar language – unified search, everything search, etc. And while it is perhaps trendy, it is not exactly original. The term “unified” is also used by Attivio, BA Insight, and Sinequa. Keep an eye out to see whether this trend turns into the norm in search. It stands to reason that all enterprise search has to be unified because of the natural direction of the technology. Time will tell.

Emily Rae Aldridge, June 30, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Google and Disappearing Locations in Satellite Imagery

June 29, 2014

I am okay with information disappearing. Whether it is a pointer or the actual content, information is fluid. When doing routine updates of my information about enterprise search vendors, I come across file-not-found errors or documents that are different from the ones I previously accessed. Some content has vaporized, the target page displaying a blank white screen. A recent example of this is information about the AeroText entity extraction system now owned by an outfit called Rocket.

I read with some interest “Erasing Your Home from Google Maps Is Way Easier Than You Think.” As satellite imagery for public access creeps toward higher resolution, certain locations require blurring. The article explains how you can “blur” your property in a Google Maps’ image. I learned:

The process is relatively simple. First go to Google Maps and enter your home address (or the address of whatever you want blurred). Enter “street view” mode by dragging the little man on the right side of the screen to the spot you want blurred. Once there, hit the “Report a problem” button on the lower-right corner of the screen. It will pull up a page where you can specify whatever image you want to have blurred.

The write up explains how a criminal can use online imagery. The list is incomplete, but it may create more awareness of the consequences of not knowing what one does not know.

How is this relevant to search? Well, if it is incorrect, altered, or not there, it is tough to make certain types of informed decisions. Ignorance can be bliss as long as those who are ignorant are not making certain types of decisions that require precise, current, and accurate information.

Stephen E Arnold, June 29, 2014
