A Glimpse of Enterprise Search in 24 Months
February 3, 2015
The enterprise search sector faces one of its most critical periods in the next 24 months. The open source “commodity” search threat has moved into the mainstream. The value-added indexing boomlet has helped make suggestions, point-and-click queries, and facets standard features. Prices for traditional search systems are all over the place. Proprietary technology vendors offer useful solutions for a few hundred dollars. The gap between the huge license fees of the early 2000s and today’s low prices is, in theory, closed by the vendors’ consulting and engineering services revenue.
But the grim reality is that most systems today include some type of information access tool. Google’s advertiser-energized model and Microsoft’s attempts to provide information to a Bing user before he or she knows he or she wants it both suggest that the human query is slowly being eased out of the system.
I would suggest you read “Replacing Middle Management with APIs.” The article focuses on examples that at first glance seem far removed from locating the name and address of a customer. That view would be one-dimensional. The article suggests that another significant wave of disintermediation will take place. Instead of marginalizing the research librarian, next generation software will have an impact on middle management.
Humans, instead of performing decision-making functions, become “cogs in a giant automated dispatching machine.” The example applies to an Uber-type operation, but the concept can easily apply to many intermediating tasks.
Here’s the passage I highlighted in yellow this morning:
What’s bizarre here is that these lines of code directly control real humans. The Uber API dispatches a human to drive from point A to point B. And the 99designs Tasks API dispatches a human to convert an image into a vector logo (black, white and color). Humans are on the verge of becoming literal cogs in a machine, completely anonymized behind an API. And the companies that control those APIs have strong incentives to drive down the cost of executing those API methods.
What does this have to do with enterprise search?
I see several possible points of intersection:
First, software can eliminate the much-reviled guessing game of finding the keywords that unlock the index. The next generation search system presents information to the user. The user becomes an Uber driver, executing the tasks assigned by the machine. Need a name and address? The next generation system identifies the need, fetches the information, and injects it into a workflow that still requires a human to perform a function.
Second, the traditional information retrieval vendors will have to find the time, money, and expertise to overhaul their keyword systems. Cosmetics just will not be enough to deal with the threat of what the author calls application programming interfaces. The disintermediation will not be limited to middle managers. The next wave of work casualties will be companies that sell old school information access systems. The disintermediation of companies anchored in the past will have significant influence over the success of search vendors marketing aggressively 24×7.
Third, users in the Gen X, Millennial, and Gen Y demographics have been conditioned to rely on smart software. Need a pizza? The Apple and Google mapping services deliver, in a manner of speaking. Keywords are just not ideal on a mobile device.
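A minimal Python sketch of the push model in the first point: the system notices that a task needs a name and address, looks them up, and attaches them before the human ever types a keyword. The `Task` type, the in-memory directory, and the matching rule are all hypothetical stand-ins for whatever workflow engine and customer database an organization actually runs.

```python
from dataclasses import dataclass, field

# Hypothetical customer directory standing in for a CRM or database.
CUSTOMER_DIRECTORY = {
    "acme": {"name": "Acme Corp.", "address": "123 Example St., Anytown"},
}


@dataclass
class Task:
    description: str
    customer_id: str
    context: dict = field(default_factory=dict)


def detect_need(task: Task) -> bool:
    # Toy rule: a shipping task needs a name and address it does not yet have.
    return "ship" in task.description.lower() and "address" not in task.context


def fetch_contact(customer_id: str) -> dict:
    # Returns an empty dict when the customer is unknown.
    return dict(CUSTOMER_DIRECTORY.get(customer_id, {}))


def enrich(task: Task) -> Task:
    """Inject the needed facts into the task before a human sees it."""
    if detect_need(task):
        task.context.update(fetch_contact(task.customer_id))
    return task


task = enrich(Task("Ship replacement part", "acme"))
print(task.context["address"])  # → 123 Example St., Anytown
```

The point of the sketch is where `enrich` sits: it runs before the task reaches the worker, so the query is absorbed into the workflow rather than typed by a person.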
The article states:
And I suspect these software layers will only get thicker. Entrepreneurial software developers will find ways to tie these APIs together, delivering products that combine several “human” APIs. Someone could use Mechanical Turk’s API to automate sales prospect research, plug that data into 99designs Tasks’ API to prepare customized infographics for the prospect sent via email. Or someone could use Redfin’s API to automatically purchase houses, and send a Zirtual [sic] assistant instructions via email on how to project-manage a renovation, flipping the house completely programmatically. These “real-world APIs” allow complex programs (or an AI in the spooky storyline here), to affect and control things in the real-world. It does seem apropos that we invest in AI safety now. As the software layer gets thicker, the gap between Below the API jobs and Above the API jobs widens. And economic incentives will push Above the API engineers to automate the jobs Below the API: self-driving cars and drone delivery are certainly on the way.
My view is that this API shift is well underway. I document a number of systems that automatically collect, analyze, and output actionable information to humans and to other systems. For more information about next generation information access solutions, check out CyberOSINT, my most recent monograph about information access.
For enterprise search vendors dependent on keywords and hyperbolic marketing, APIs may be one of the most serious challenges the sector has yet faced.
Stephen E Arnold, February 3, 2015
Apache Solr Search NoSQL Search Shines Solo
February 3, 2015
Apache Solr is an open source enterprise search engine that is used for relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo” explores why its lesser-known use as a NoSQL store might explode in 2015.
At the beginning of 2014, most Solr deployments used it the old-fashioned way, but by 2015 fifty percent of the pipeline was using it as a first class data store. Companies are moving their old file intranets to the enterprise cloud. They want the upgraded system to be searchable, and they are relying on Solr to get the job done.
Search is more complex than basic NoSQL storage and needs something more robust to handle the new data streams. Solr adds that extra level of performance, so users have access to all of their data with nothing missing.
“‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward.’ We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”
It looks like anything is possible for LucidWorks in the coming year.
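When Solr serves as a first class data store, applications typically both search it and fetch individual records by key. A minimal Python sketch of the two request URLs, assuming a local Solr on its default port 8983 and a hypothetical collection named `intranet_docs`; `/select` is Solr’s standard query handler and `/get` is its real-time get handler.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # default Solr port; adjust for your install
COLLECTION = "intranet_docs"              # hypothetical collection name


def search_url(query: str, rows: int = 10) -> str:
    """Full-text query through the standard /select handler."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def get_by_id_url(doc_id: str) -> str:
    """Key-value style lookup through the real-time get handler (/get)."""
    return f"{SOLR_BASE}/{COLLECTION}/get?{urlencode({'id': doc_id})}"


print(search_url("title:budget"))
print(get_by_id_url("doc-42"))
# To execute either request: json.load(urllib.request.urlopen(url))
```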
Whitney Grace, February 03, 2015
Sponsored by ArnoldIT.com, developer of Augmentext
Advanced Analytics Are More Important Than We Think
February 3, 2015
Alexander Linden, one of Gartner’s research directors, made some astute observations about advanced analytics and data science technologies. Linden shared his insights with First Post in the article, “Why Should CIOs Consider Advanced Analytics?”
Chief information officers are handling more data and relying on advanced analytics to manage it. The data is critical for gaining market insights, generating more sales, and retaining customers. The old business software cannot handle the overload anymore.
What is astounding is that many companies believe they are already using advanced analytics when in fact they are still relying on descriptive methods that can be improved. Advanced analytics are not an upgraded version of normal, descriptive analytics. They add problem-solving tools such as predictive and prescriptive analytics.
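To make the descriptive, predictive, and prescriptive distinction concrete, here is a toy Python sketch. The sales figures, the candidate actions, and their lifts and costs are invented for illustration; a simple least-squares trend line stands in for whatever predictive model a real team would use.

```python
# Hypothetical monthly sales figures, for illustration only.
sales = [100.0, 110.0, 120.0, 130.0]
n = len(sales)

# Descriptive: summarize what already happened.
average = sum(sales) / n

# Predictive: fit a least-squares line y = intercept + slope * month,
# then extrapolate one month ahead.
months = list(range(n))
month_mean = sum(months) / n
slope = sum((x - month_mean) * (y - average) for x, y in zip(months, sales)) / sum(
    (x - month_mean) ** 2 for x in months
)
intercept = average - slope * month_mean
forecast = intercept + slope * n

# Prescriptive: pick the action with the best expected net result.
# Each hypothetical action is (multiplier on forecast sales, cost).
actions = {"do nothing": (1.00, 0.0), "discount": (1.10, 8.0), "ad campaign": (1.20, 20.0)}
best = max(actions, key=lambda a: forecast * actions[a][0] - actions[a][1])

print(average, forecast, best)  # → 115.0 140.0 ad campaign
```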
Gartner also flings out some really big numbers:
“One of Gartner’s new predictions says that through 2017, the number of citizen data scientists will grow five times faster than the number of highly skilled data scientists.”
This is akin to there being more people able to code and create applications than skilled engineers with college degrees. The data analytics community will take on a do-it-yourself mentality, but Gartner stresses that backyard advanced analytics will not cut it. Companies need to continue to rely on skilled data scientists to interpret the data and share it across the business units.
Whitney Grace, February 03, 2015
Big Data is About More than Just Size
February 3, 2015
Although Big Data has been around for some time, and some might say discussions of it are getting old, the truth is that Big Data is beginning to move into more mainstream uses. ZDNet covers the topic in its article, “Getting Big Data Right is About More than the Size of Your Database.”
The article begins:
“As a term becomes more familiar, it also gets misused and it’s easy to think of big data as a system that makes a data warehouse achievable even for a small business. That misses the point, though. Big data isn’t big because you have lots of it. It’s big because it covers lots of areas in which you can find insights that your regular data set – however large – doesn’t cover.”
The article goes on to highlight Microsoft Delve as one sign of Big Data making an appearance in most mainstream architectures. Office 365 is using Delve and other components to move beyond out-of-the-box functionality and help users make the most of their installations. Stephen E. Arnold is a longtime leader in search and has devoted a good bit of time to Microsoft SharePoint, in both its former and current iterations. He gives some attention to Delve as well as other Office 365 topics on his SharePoint feed at ArnoldIT.com. Keep your eyes open to see if his findings might have an impact on your organization’s installation.
Emily Rae Aldridge, February 03, 2015
LucidWorks (Really?) Defines, Redefines Startup
February 2, 2015
I received one of those off-the-wall LinkedIn requests. Years ago the original LucidWorks (Really?) was a client of my advisory services. Marc Krellenstein, who left the company in an interesting, mysterious, and wave-generating founder escape, mentioned me to another LucidWorks (Really?) employee. (Note: Dr. Krellenstein is now the senior vice president of technology development at Decision Resources.)
In the beginning, there was the dream of becoming the next RedHat of the enterprise search world.
Flash forward through two presidents and a legion of leaders to the departure of Paul Doscher, once involved with Exalead and Jaspersoft. Eric Gries left his CEO role after the first Lucene Revolution Conference. Yep, revolution. A new platoon of Horse Artillery arrived. I lost interest in the outfit.
Then the company morphed into a vendor who sold consulting that actually worked, often a rarity in the world of information access.
About halfway through the almost eight year journey, Lucid Imagination morphed into LucidWorks (Really?). The company flip-flopped from a consulting firm selling Lucene/Solr engineering into a Big Data company. The move was sparked by the company’s inability to generate a payback on the $40 million in venture capital pumped into the company since it opened for business in 2007.
Now the company has an off-kilter logo in two shades of red and a lower case “w.” Marketing genius illuminates this substantive typographical maneuver. My goodness, the shift from blue to red is something I would associate with Dr. Einstein’s analysis of Brownian motion or Dr. Jon Kleinberg’s CLEVER algorithm or Dr. Jeffrey Dean’s work on Google Chubby.
The way I do math reveals that LucidWorks (Really?) is a seven year old company. The burn rate works out to about $6 million a year in venture funding plus whatever revenues the company has been able to generate on its 84 month journey. When LucidWorks (Really?) with Krellenstein on board set up shop, Bill Cowher resigned as head coach of the Pittsburgh Steelers and started his journey to seemingly low-key Time Warner pitchman. Also in 2007, the Indianapolis Colts beat the Chicago Bears to win the Super Bowl. The first episode of Mad Men ran on a US cable channel. The number one song in 2007 was Beyoncé’s “Irreplaceable.” Is this the tune Elasticsearch plays as it wins clients from LucidWorks (Really?)?
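A back-of-the-envelope check of the burn rate, assuming the $40 million in venture capital cited above was spent evenly over 84 months and ignoring revenue (both simplifications):

```latex
\text{annual burn} \approx \frac{\$40\ \text{million}}{7\ \text{years}} \approx \$5.7\ \text{million per year}
\qquad
\text{monthly burn} \approx \frac{\$40\ \text{million}}{84\ \text{months}} \approx \$0.48\ \text{million per month}
```

That squares with the “about $6 million” figure; any revenue the company spent on top of the venture money would push the true spending rate higher.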
Now to the LinkedIn email:
A LucidWorks (Really?) employee wanted me to know that he was previously employed by Raritan, a connector and consulting company specializing in “federated search.” This person wanted to be my LinkedIn “amigo,” “BFF,” “Robin,” or who knows what else.
I pointed out that I did not want to be a LinkedIn friend with an outfit that may be the object of considerable attention from Granite Ventures, Shasta Ventures, Walden International, and In-Q-Tel, an outfit known for investments based on the US government’s curiosity, not payback.
My former Raritan federated search expert read my “no” and sent me this message:
Fair enough – we are after all a startup for chrissakes! I just published a blog on our Lucidworks site -( lower case ‘w’ please dude! that was from our Marketing Guys) called The Well Tempered Search Application – Prelude. Fusion 1.1 has a lot of gaps to fill – I have trying to help our whizz kids realize that this is somewhat wheel-reinvention … I would be interested in your thoughts on my blog/rant because you are one of my heroes: a real dyed in the wool crusty curmudgeon if you will (that is meant as a compliment!)
Okay, I took away a couple of factoids from this email: Cursing is a Sillycon Valley convention. I live in rural Kentucky where there are Baptists and others who get frisky when curse words are tossed around the Speedy Mart. Another factoid is that LucidWorks (Really?) is a startup. But now to the big deal at LucidWorks (Really?): Lucidworks with a lower case “w.” I had to reach for my blood pressure medicine. A lower case “w”. Oy vay. LucidWorks (Really?) has hit upon a significant and brilliant move. A. Lower. Case. W. I have to take a couple of deep breaths.
I pointed out that a seven year old company is not a startup as much as the marketing “guys” want it to be. I then learned this from my correspondent:
Point taken what I meant was that we are still VC funded. We have undergone a lot of transformation in the last year so your criticisms are totally valid say up to 2013, but we are working hard to redress these as we speak. So stay tuned sir, hope that we can make a convert but to be clear, I am NOT a sales or marketing guy thank you very much. But whatever the case, I share your cynicism in general – I have been doing this for about 15 years now – so I have seen hype cycles like Big Data come and go – FWIW our earlier claims for Big Data were BS but the re-tooling that we are doing now will hopefully change your mind somewhat. [emphasis added]
Fascinating is the phrase “still VC funded.” In my mind this raises the question, “After seven years of trying to generate revenue, when will LucidWorks (Really?) start to fund itself, pay back its stakeholders, and generate sufficient surplus to invest in research to deal with the demons of Big Data?”
Maybe LucidWorks (Really?) should update its information in stories like this: “Trouble at LucidWorks: Lawsuits, Lost Deals, & Layoffs Plague the Search Startup Despite Funding.” Isn’t the Big Data drum becoming noise? See, for example, “The Promise of Big Data Still Looms, but Execution Lags.”
Looking back over seven years, LucidWorks (Really?) has an intriguing pattern of hiring people, engaging in litigation, getting more venture funding, and repositioning itself. How many repackagers of Lucene/Solr does the world’s appetite demand?
Based on my monograph about open source search, the winner among keyword search solutions is Elasticsearch. In venture funding, staff stability, and developer support, Elasticsearch wins this game.
LucidWorks (Really?) will have to do more than tell me that it is not a startup after telling me it is a startup, flip-flopping its value proposition, making substantive changes like the use of a lower case “w”, and asking me to give the company a hunting license for my LinkedIn contacts.
In short, as the revenue pressure mounts, I look forward to more amusing antics. I particularly like the slang phrase “We are after all a startup for chrissakes!”
No, dear LucidWorks (Really?), you are not a startup and you are not a player in the next generation information access market. If I were more like my old Halliburton/Booz Allen self, I would try to sell a briefing to your venture funding outfits. Now it is not my problem.
Enjoy your meetings to review your lower case “w” quarterly revenues. And, please, do not tell me that you cannot afford my CyberOSINT: Next Generation Information Access study. That’s okay. I cannot afford a McLaren P1. No one cares, including me. I prefer products that work, really.
Stephen E Arnold, February 2, 2015
HP Autonomy and High Customer Experience Hopes
February 2, 2015
I read “Hewlett Packard Packs Its Bags for New Flagship Office in the City at 1 Aldermanbury Square.” Reorganizations and real estate are somewhat disruptive in my experience. The write-up contained a quote to note:
HP’s UK & Ireland chief of staff, Susan Bowen, said: “1 Aldermanbury Square provides the perfect location and environment for our team to create a first class European customer experience centre.” “With outstanding transport links and the capacity to host strategic customer briefings, our new offices will also serve as Hewlett Packard’s flagship London hub,” she said.
HP seems to be confident that, despite the legal hassles and the billion dollar write-down for the purchase of Autonomy, Mike Lynch’s IDOL and DRE will keep selling.
I do like the “customer experience center” because Autonomy IDOL and DRE are essentially math centric. How does a customer experience math? Classroom, chalkboard, exams?
Stephen E Arnold, February 2, 2015
A Microsoft Azure How to PHP Search
February 2, 2015
Microsoft Azure is a cloud computing platform and infrastructure that has a variety of functions. If you want to hook up Microsoft Azure Search to your PHP Web site and are at a loss about what to do, then you need to check out this MSDN blog by Nick J Trogh. Simply titled Nick’s Blog, Trogh writes about “all things technical about the Microsoft platform.” He recently posted a guide about how to integrate Azure Search service into a PHP Web site and take advantage of advanced search techniques.
Trogh does not complicate the installation process and includes screenshots for easy reference. He ends with two last pieces of advice:
“In this article we’ve gone through adding search as a service using Azure Search to your PHP website. In a matter of minutes you can get started and provide your users with a complex search functionality. And as your site gets more traffic, you can easily scale out your search service. Make sure to get started with the Azure Search service and also try out the other application, data and infrastructure services in the Microsoft Azure platform. You can get started for free on Azure or activate your MSDN Azure benefits.”
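Trogh’s walkthrough uses PHP; for readers who want to see the shape of the underlying call, here is a rough Python sketch of a query against the Azure Search REST endpoint (`GET /indexes/{index}/docs` with an `api-key` header). The service name, index name, key, and API version are placeholders, not values from the blog; check the current Azure documentation for the version string your service accepts.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVICE = "my-search-service"  # hypothetical service name
INDEX = "products"             # hypothetical index name
API_KEY = "<query-key>"        # placeholder; never hard-code a real key
API_VERSION = "<api-version>"  # placeholder; use a version your service supports


def build_search_request(text: str) -> Request:
    """Build (but do not send) a search request for the given text."""
    params = urlencode({"search": text, "api-version": API_VERSION})
    url = f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?{params}"
    # Azure Search authenticates REST calls with an "api-key" header.
    return Request(url, headers={"api-key": API_KEY, "Accept": "application/json"})


req = build_search_request("blue widgets")
# To run the query: json.load(urllib.request.urlopen(req))["value"]
```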
Azure is turning out to be a decent cloud service and is much more favored than Windows 8. It is rare for Microsoft fans to be justified in their praise for Windows.
Whitney Grace, February 02, 2015
Recommind Gets a New Senior VP
February 2, 2015
MarketWired published a press release announcing a change in management: “Recommind Appoints Steve Kennedy As Senior Vice President Of Field Operations.” Kennedy previously served as EMC’s Worldwide President of Data Protection and Availability Software, where he was in charge of ediscovery, data protection software, and archiving. He is also the former Senior Vice President of Worldwide Sales for Zantaz, where he increased sales by 900 percent.
He can now add Recommind to his résumé. As senior vice president, he will be responsible for customer experience, expanding Recommind’s cloud services, and growing the on-demand hosted service and information governance.
“‘There isn’t a better time to join Recommind with business analytics, including search, end-to-end ediscovery and information governance software, in such high demand among law firms, enterprises and government organizations,’ said Kennedy. ‘I look forward to helping Recommind continue its impressive revenue growth and customer engagement breakthroughs.’”
Recommind continues to be a leader in information intelligence, and hiring Kennedy is a sign that the company is predicting more future growth.
Whitney Grace, February 02, 2015
Recorded Future: Google and Cyber OSINT
February 2, 2015
I find the complaints about Google’s inability to handle time amusing. On the surface, Google seems to demote, ignore, or just not understand the concept of time. For the vast majority of Google service users, Google is no substitute for the users’ investment of time and effort into dating items. But for the wide, wide Google audience, ads, not time, are more important.
Does Google really get an F in time? The answer is, “Nope.”
In CyberOSINT: Next Generation Information Access I explain that Google’s time sense is well developed and of considerable importance to next generation solutions the company hopes to offer. Why the crawfishing? Well, Apple could just buy Google and make the bitter taste of the Apple Board of Directors’ experience a thing of the past.
Now to temporal matters in the here and now.
CyberOSINT relies on automated collection, analysis, and report generation. In order to make sense of data and information crunched by an NGIA system, time is a really key metatag item. To figure out time, a system has to understand:
- The date and time stamp
- Versioning (previous, current, and future document, data items, and fact iterations)
- Times and dates contained in a structured data table
- Times and dates embedded in content objects themselves; for example, a reference to “last week” or, in some cases, optical character recognition of the date on a surveillance tape image.
For the average query, this type of time detail is overkill. For an NGIA system, however, the “time and date” of an event requires disambiguation, determination and tagging of specific time types, and then capturing the date and time data with markers for document or data versions.
A simplification of Recorded Future’s handling of unstructured data. The system can also handle structured data and a range of other data management content types. Image copyright Recorded Future 2014.
Sounds like a lot of computational and technical work.
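A toy Python sketch of the tagging steps listed above: it records the document timestamp, finds explicit ISO-format dates in the text, resolves a couple of relative phrases such as “last week” against the publication date, and stamps every tag with a version marker. This is an illustration under simple assumptions, not Recorded Future’s method; real systems handle far more formats, languages, and ambiguity.

```python
import re
from datetime import date, timedelta

# Hypothetical relative-phrase table; production systems need far richer rules.
RELATIVE = {"yesterday": timedelta(days=1), "last week": timedelta(weeks=1)}
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def tag_times(text: str, published: date, version: int) -> list[dict]:
    """Tag each time reference with its type, resolved date, and document version."""
    tags = [{"type": "timestamp", "text": published.isoformat(), "date": published}]
    for m in ISO_DATE.finditer(text):
        year, month, day = map(int, m.groups())
        tags.append({"type": "explicit", "text": m.group(0), "date": date(year, month, day)})
    lowered = text.lower()
    for phrase, offset in RELATIVE.items():
        if phrase in lowered:
            # Simplification: "last week" is anchored exactly 7 days before publication.
            tags.append({"type": "relative", "text": phrase, "date": published - offset})
    for tag in tags:
        tag["version"] = version  # marker tying each time fact to a document version
    return tags


tags = tag_times("Shipment cleared on 2015-01-20; the audit began last week.",
                 published=date(2015, 2, 2), version=3)
```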
In CyberOSINT, I describe Google’s and In-Q-Tel’s investments in Recorded Future, one of the data forward NGIA companies. Recorded Future has wizards who developed the Spotfire system, which is now part of the Tibco service. There are Xooglers like Jason Hines. There are assorted wizards from Sweden and other countries most US high school students cannot locate on a map, and assorted veterans of high technology startups.
An NGIA system delivers actionable information to a human or to another system. Alternatively, a licensee can build and integrate new solutions on top of the Recorded Future technology. One of the company’s key inventions is numerical recipes that deal effectively with the notion of “time.” Recorded Future uses the name “Tempora” as shorthand for the advanced technology that makes time along with predictive algorithms part of the Recorded Future solution.
Which Auto Classification Method Works?
February 1, 2015
Unfortunately, A. Lancichinetti et al. do not deliver a Consumer Reports type analysis in “High Reproducibility and High Accuracy Method for Automated Topic Classification.” The paper does raise some issues that keyword search vendors with add-on categorization do not often explain to licensees. In a nutshell, classification often chugs along at 65 to 80 percent accuracy. Close enough for horseshoes. Gustav Dirichlet is here. A couple of thoughts:
- Dirichlet died in 1859. Yep, another of the math guys from the past.
- The method is described as “state of the art.”
- Bounded content sets of sci tech information yield more accurate classification.
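For readers who want to check a vendor’s accuracy claims, here is a minimal Python sketch of two measures: accuracy against hand-assigned labels, and run-to-run reproducibility. Using the Rand index for reproducibility is an assumption made here for illustration, not necessarily the measure the paper uses; it compares groupings rather than topic numbers, because a topic model may number the same topics differently on each run.

```python
from itertools import combinations


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of documents whose predicted category matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def rand_agreement(run_a: list[int], run_b: list[int]) -> float:
    """Rand index: share of document pairs that two runs treat the same way
    (grouped together in both, or apart in both). Ignores topic renumbering."""
    pairs = list(combinations(range(len(run_a)), 2))
    same = sum((run_a[i] == run_a[j]) == (run_b[i] == run_b[j]) for i, j in pairs)
    return same / len(pairs)


# Hypothetical labels for five documents.
gold = ["sports", "sports", "finance", "finance", "tech"]
predicted = ["sports", "finance", "finance", "finance", "tech"]
print(accuracy(predicted, gold))                         # → 0.8
print(rand_agreement([0, 0, 1, 1, 2], [1, 1, 0, 0, 2]))  # → 1.0 (same grouping, new ids)
```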
How do these methods work on WhatsApp messages in gang slang?
Stephen E Arnold, February 1, 2015