In Defense of MarkLogic

December 13, 2013

Many people would like to know exactly what went wrong with HealthCare.gov, and The New York Times obliged with a lengthy post-Thanksgiving article on the subject. However, former MarkLogic CEO Dave Kellogg takes issue with the amount of blame the story places on his former company. (The references to MarkLogic are on page seven of the Times piece.) His response can be found in a post at his Kellblog: “The Pillorying of MarkLogic: Why Selling Disruptive Technology to the Government is Hard and Risky.” Whose dog is in this fight? Not ours, for sure, but this is an interesting exchange to watch from the sidelines.

Here’s what the Times asserts:

“Some of the companies building the system opposed an early decision by the Medicare agency to use database software from a company called MarkLogic, which handles data differently from systems by companies like IBM and Oracle. Some suggest that its unfamiliar nature slowed their work. By mid-November, more than six weeks after the rollout, the MarkLogic database — essentially the website’s virtual filing cabinet and index — continued to perform below expectations, according to one person who works in the command center.”

However, the database firm was not operating in a vacuum; the Times piece acknowledges that MarkLogic was but one of many vendors that complained about inadequate computing power, data-center instability, and integration failures that were out of their hands. That does not keep the article from singling out MarkLogic. Scapegoat much?

Kellogg lists problems the site has had: unrealistic timelines, the refusal to go through a Beta stage, a lack of oversight, insufficient testing, and late change requests. As he notes, these problems are common on large projects (especially, I would add, on those organized by someone inexperienced in IT, as this one apparently was). Though he had left MarkLogic by the time this project was underway, Kellogg has some very educated guesses about what went wrong.

He writes:

“To me, guessing from a distance, it seems pretty obvious what happened.

*Someone who didn’t understand how hard it was to build ordered up a website of very high complexity with totally unrealistic timeframes.

*A bunch of integrators (and vendors) who wanted their share of the $630M put in bids, probably convincing themselves in each part of the system that if things went very well that they could maybe make the deadlines or, if not, maybe cut some scope. (Remember you don’t win a $50M bid by saying ‘the project is crazy and the timeframe unrealistic.’)

*Everybody probably did their best but knew deep down that the project was failing.

*Everyone was afraid to admit that the project was failing because nobody likes to deliver bad news, and it seems that there was no one central coordinator whose job it was to do so.”

So, the problem lies at the intersection of human nature and bureaucracy; no surprise there. Kellogg goes on to observe that this is the reason most organizations have shifted from huge, unwieldy projects to agile methodologies. Government process, though, is hardly set up to take advantage of the latest methodologies. Will it ever be?

Kellogg discusses why he thinks MarkLogic is being thrown under the bus: Instead of exploring the serious flaws within our government’s procurement process, some folks involved would rather point to ways MarkLogic is different from stodgy but familiar systems like Oracle’s and IBM’s. “Non-standard,” they call it, and say that is the root of the problem. With that attitude, I’m surprised administrators were convinced to run one of those new-fangled websites at all. Wouldn’t a phone line do?

Cynthia Murrell, December 13, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Yahoo Snaps Up NLP Startup SkyPhrase

December 13, 2013

There is more NLP excitement at Yahoo, we learn from TechCrunch‘s piece, “Yahoo Acquires Natural Language Processing Company SkyPhrase to Help Drive Intent Identification.” Writer Darrell Etherington reports that SkyPhrase will be integrated into Yahoo’s office in New York.

The article observes:

“Back in October, we covered SkyPhrase, and noted specifically that its NLP tech could be used to advance fantasy sports, which is of course an area where Yahoo excels and has a considerable investment already. The company has created an app that makes it easy for fantasy football players to search through stats and find only those relevant to making picks and monitoring their team, which would be very handy integrated directly into Yahoo’s fantasy sports products.

“[…] In October, the entrepreneur and cognitive scientist said that what he really hoped to accomplish with the company was to make NLP tech useful to as much of the world as possible via tailoring it to specific verticals in a way that’s easy for everyday users to access, and to make it easier for third-party partners to build NLP-powered interfaces for their own products, data and services.”

Sounds great! Unfortunately, laments Etherington, Yahoo is more likely to task its new acquisition with improving Yahoo’s products than with spreading the wealth of their third-party-friendly NLP. He notes that Yahoo has been focusing on mobile functionality, and that SkyPhrase’s tech can help with that.

Launched in 2011, SkyPhrase has built its algorithms around research performed at Rensselaer Polytechnic Institute by Cassimatis and some of his grad students. The startup received funding from investment firm Breakout Labs, which invests in breakthrough advances. Let us hope that Yahoo’s rulership does not dim SkyPhrase’s unique potential.

Cynthia Murrell, December 13, 2013

E-Retailers Guide Ranks EasyAsk Semantic Search Leader in E-Commerce Technology

December 13, 2013

“EasyAsk Ranked Among Top Four Providers of E-Commerce Technology,” an article at Virtual-Strategy Magazine, recognizes the achievements of EasyAsk, the natural language search company. EasyAsk was recently named one of the top 4 vendors (out of 1,000) in driving e-commerce sales by the E-Retailers Guide. Craig Bassin, CEO of EasyAsk, expressed no surprise at this, since reports show that 43% of visitors to a given website will head straight for the search box.

Bassin expanded on his company’s progress:

“‘EasyAsk is poised to capture a significant share of the growing spend on e-commerce technology,’ said Bassin. ‘EasyAsk eCommerce Edition delivers amazing value to our clients. EasyAsk is embedded within Infor Storefront and has out-of-the-box integrations with the leading e-commerce platforms, such as IBM Websphere Commerce, Magento, Hybris and Netsuite. Our customers consistently tell us we help them turn shoppers into buyers.’ Gartner Inc. estimates that retailers spent about $3 billion on e-commerce technology in 2012.”

Semantic search has become unavoidably important, with Google and Microsoft adopting their own offerings in the last two years. But EasyAsk stands out as offering “natural language search for e-commerce enterprise, on-premise and cloud platforms.” Their work in raising online revenue by allowing users to search in plain English and receive specific and relevant results has made them a leader in the field.

Chelsea Kerwin, December 13, 2013

Semantria and Diffbot: Clever Way to Forge a Tie Up

December 12, 2013

Short honk. I came across an interesting marketing concept in “Diffbot and Semantria Join to Find and Parse the Important Text on the ‘Net (Exclusive).”

Semantria (a company that offers sentiment analysis as a service) participated in a hackathon in San Francisco. The article explains:

To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.

Semantria sponsored a prize for the best app. Diffbot won:

A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.

I like the “sponsor”, “winner” and “team up” approach. The payoff, according to the article, is “While Semantria and Diffbot technologies continue to be available separately, they can now be used together.”

Sentiment analysis is one of the search submarkets that caught fire and then, based on the churning at some firms like Attensity, may be losing some momentum. Marketing innovation may be a goal for other firms offering this functionality in 2014.

Stephen E Arnold, December 12, 2013

SharePoint Faces Challenging Future

December 12, 2013

Anytime a company is the leader in a particular area, the challenge is to hold that position. In many ways it is a lot more fun to be the up-and-comer than to be the behemoth trying to hold on to the lion’s share of the market. SharePoint is in this very position. ComputerWorld brings the news in their article, “Why Microsoft SharePoint Faces a Challenging Future.”

The article begins:

“Many enterprises use and like SharePoint. Microsoft likes it, too, because it’s one of the company’s fastest-growing product lines. But making enterprises support separate cloud and on-premises versions and telling SharePoint app developers not to work in C# and ASP.NET may make for a rocky relationship as time goes by.”

SharePoint is going to constantly battle threats to its supremacy. Stephen E. Arnold, a longtime leader in search and the brains behind ArnoldIT, often covers the comings and goings of SharePoint. He finds that although most enterprises prefer customization and add-ons to their SharePoint infrastructure, it doesn’t appear that SharePoint will lose its number 1 spot anytime soon.

Emily Rae Aldridge, December 12, 2013

Elasticsearch in a Box Through Vagrant for the Holidays

December 12, 2013

The JavaWorld article titled “Elasticsearch in a Box” explores the possibilities of using Elasticsearch as a platform. There are different options, but Elasticsearch-in-a-box through Vagrant is the subject of this article. The base box is 64-bit Ubuntu 12.04 with Oracle’s Java 7 and Elasticsearch version 0.90.7. It is free, and all you need to begin is Vagrant and VirtualBox installed. The article explains:

“Elasticsearch-in-a-box is a freely available Vagrant base box. What that means is that you can quickly fire up and tear down an Elasticsearch environment with simple commands like vagrant up and vagrant destroy…First, you need to add and initialize the Elasticsearch-in-a-box template. Go ahead and create a directory, like /projects/esinabox, change directories into it and execute this command: This command will create a Vagrant definition named esinabox from the downloaded template:

vagrant box add esinabox https://s3.amazonaws.com/coffers/esinabox.box

These steps will account for downloading Elasticsearch-in-a-box template.

A search present just in time for the holidays. Following this, you need only create a Vagrantfile, which lets you customize the environment. Once you have finished, you can start Elasticsearch-in-a-box running locally on your machine. From there, executing queries and “tearing down instances” should be no trouble. The template was built with Veewee.
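As a rough illustration of the workflow described above, the sketch below lists the Vagrant commands in order and prints them in dry-run mode. The `vagrant box add` line and the `vagrant up` and `vagrant destroy` commands come from the quoted article; using `vagrant init esinabox` to create the Vagrantfile is an assumption about the step the article describes, and the helper names (`setup_commands`, `teardown_commands`, `run`) are hypothetical.

```python
import subprocess

BOX_NAME = "esinabox"
BOX_URL = "https://s3.amazonaws.com/coffers/esinabox.box"  # URL quoted in the article


def setup_commands(box_name=BOX_NAME, box_url=BOX_URL):
    """Commands to download the box, create a Vagrantfile, and boot the VM."""
    return [
        ["vagrant", "box", "add", box_name, box_url],  # from the article
        ["vagrant", "init", box_name],  # assumed step: writes a Vagrantfile
        ["vagrant", "up"],
    ]


def teardown_commands():
    """Command to remove the VM when finished (Vagrant asks for confirmation)."""
    return [["vagrant", "destroy"]]


def run(commands, workdir=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)
    return lines


if __name__ == "__main__":
    run(setup_commands())
    run(teardown_commands())
```

With the default `dry_run=True` nothing is executed, so the sketch is safe to run on a machine without Vagrant. Passing `dry_run=False` from inside a project directory such as the article's /projects/esinabox would hand each command to Vagrant in turn.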

Chelsea Kerwin, December 12, 2013

How to Take Advantage of Local Market Opportunities

December 12, 2013

The article titled “Three Steps for Crushing Multi-Location Search” on Search Engine Land offers tips on “local market opportunity” for multi-location businesses that want local coverage in every area they serve. The first tip is to know your local market coverage: identify the areas where you might be missing out, compile search volume data and average order value, and do some fancy mathematical footwork to see where you stand to gain the most from first-page coverage on search engines. The second tip is to optimize your business listings.

The article states:

“Beef up your listings with as much data as you can provide — directions, payments accepted, localized description, categories, images, local coupons, photos, social network links and links to individual store pages can really make your listing stand out. I call it good data fidelity. This data — when accurate, current and consistent across locations — helps search engines deliver optimum results to user queries. And search engines live or die by delivering a good user experience through accurate results.”

The third and final suggestion is to keep the bulk and manual feeds for local maps through Google Plus, Bing Business and Yahoo up to date and accurate. All of this is sound advice, but it was surprising to see that the author left out a major tip: buy Google ads.
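The “fancy mathematical footwork” in the first tip is not spelled out in this write-up. One plausible version, offered only as a sketch, multiplies monthly search volume by an assumed click-through rate, an assumed conversion rate, and the average order value, then ranks markets by the result. The rates, the market names, and the figures below are all hypothetical; only search volume and average order value are named in the article.

```python
def monthly_opportunity(searches, avg_order_value,
                        click_rate=0.10, conversion_rate=0.02):
    """Estimated monthly revenue from first-page coverage in one market.

    click_rate and conversion_rate are illustrative assumptions,
    not figures from the article.
    """
    return searches * click_rate * conversion_rate * avg_order_value


# Hypothetical markets: (monthly local searches, average order value in dollars)
markets = {
    "Market A": (12000, 85.0),
    "Market B": (8000, 140.0),
}

# Rank markets from largest to smallest estimated opportunity.
ranked = sorted(
    markets,
    key=lambda name: monthly_opportunity(*markets[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(monthly_opportunity(*markets[name]), 2))
```

In this made-up data the market with fewer searches ranks first because its average order value is higher, which is the kind of result the first tip is meant to surface.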

Chelsea Kerwin, December 12, 2013

Innoz Brings the Internet to Those Without Smartphones

December 12, 2013

The article titled “India Startup Touted as World’s Largest Offline Search Engine” on ZDNet provides a glimpse into the offline world of SMS-based internet. Innoz, an Indian company called “the offline Google,” aspires to connect people without smartphones or iPads to the Internet. It works through texting, which any phone can do now.

The article explains:

“Innoz’s flagship product, SmartSMS, provides a specific answer of up to 480 characters to an SMS query within seconds. There’s an option to retrieve more information on the query if needed. Users text their query to 55444 and the software searches for an answer. The query can be on any topic–from what to wear for a job interview to who a particular actor is dating. Innoz works in partnership with Wikipedia, knowledge engine WolframAlpha, and other Internet resources to provide answers.”

Co-founder Deepak Ravindran believes that his company is capable of connecting the cut-off people of India. The majority of the text queries come from smaller cities, in a variety of Indian dialects and languages. Giving these less connected people the ability to access the internet without a smartphone has made Innoz a “top 5 startup” in Forbes magazine’s ranking and perhaps even has Google looking over its shoulder.
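The 480-character answer limit and the option to retrieve more, both mentioned in the quote above, suggest some form of paging. How Innoz actually does this is not described; the sketch below simply assumes a long answer is split at word boundaries into parts of at most 480 characters, using Python's standard `textwrap` module. The function name `paginate_answer` is hypothetical.

```python
import textwrap

SMS_ANSWER_LIMIT = 480  # per-answer cap given in the quoted description


def paginate_answer(answer, limit=SMS_ANSWER_LIMIT):
    """Split an answer into parts no longer than `limit` characters.

    Breaks fall on whitespace where possible, so the first part can be
    sent right away and later parts only if the user asks for more.
    """
    return textwrap.wrap(answer, width=limit)


if __name__ == "__main__":
    long_answer = "word " * 300  # stand-in for a long search result
    parts = paginate_answer(long_answer)
    print(len(parts), [len(p) for p in parts])
```

An empty answer yields an empty list, so a caller would need its own fallback message for queries with no result.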

Chelsea Kerwin, December 12, 2013

Quote to Note: NLP and Recipes for Success and Failure

December 11, 2013

I read “Natural Language Processing in the Kitchen.” The post was particularly relevant because I had worked through “The Main Trick in Machine Learning.” The essay does an excellent job of explaining coefficients (what I call, for ease of recall, “thresholds”). The idea is that machine learning requires a human to make certain judgments. Autonomy IDOL uses Bayesian methods and the company has for many years urged licensees to “train” the IDOL system. Not only that, successful Bayesian systems, like a young child, have to be prodded or retrained. How much and how often depends on the child. For Bayesian-like systems, the “how often” and “how much” varies by the licensees’ content contexts.

Now back to the Los Angeles Times’ excellent article about indexing and classifying a small set of recipes. Here’s the quote to note:

Computers can really only do so much.

When one jots down the programming and tuning work required to index recipes, keep in mind “The Main Trick in Machine Learning.” There are three important lessons I draw from the boundary between these two write-ups:

  1. Smart software requires programming and fiddling. At the present time (December 2013), this reality is as it has been for the last 50 years, maybe more.
  2. The humans fiddling with or setting up the content processing system have to be pretty darned clever. The notion of “user friendliness” is strongly disabused by these two articles. Flashy graphics and marketers’ cooing are not going to cut the mustard or the sirloin steak.
  3. The properly set up system with filtered information processed without some human intervention hits 98 percent accuracy. The main point is that relevance is a result of humans, software, and consistent, on point content.

How many enterprise search and content processing vendors explain that a failure to put appropriate resources toward the search or content processing implementation guarantees some interesting issues? Among them: systems will routinely deliver results that are not germane to the user’s query.

The root of dissatisfaction with incumbent search and retrieval systems is not the systems themselves. In my opinion, most are quite similar, differing only in relatively minor details. (For examples of the similarity, review the reports at Xenky’s Vendor Profiles page.)

How many vendors have been excoriated because their customers failed to provide the cash, time, and support necessary to deliver a high-performance system? My hunch is that the vendors are held responsible for failures that are predestined by licensees’ desire to get the best deal possible and believe that magic just happens without the difficult, human-centric work that is absolutely essential for success.

Stephen E Arnold, December 11, 2013

Palantir: What Is the Main Business of the Company?

December 11, 2013

I read about Palantir and its successful funding campaign in “Palantir’s Latest Round Valuing It at $9B Swells to $107.8M in New Funding.” Compared to the funding for ordinary search and content processing companies, Palantir is obviously able to attract investors better than most of the other companies that make sense out of data.

If you run a query for “Palantir” on Beyond Search, you will get links to articles about the company’s previous funding and to a couple of stories about the company’s interaction with IBM i2 related to an allegation about Palantir’s business methods.

http://www.louisianalottery.com/assets/images/games/scratchoffs/LA406.gif

Image from the Louisiana Lottery.

I find Palantir interesting for three reasons.

First, it is able to generate significant buzz in police and intelligence entities in a number of countries. Based on what I have heard at conferences, the Palantir visualizations knock the socks off highly placed officials who want killer graphics in their personal slide presentations.

Second, the company has been nosing into certain financial markets. The idea is that the Palantir methods will give some of the investment outfits a better way to figure out what’s going up and what’s going down. The visuals are good, I have heard, but the Palantir analytics are perceived, if my sources are accurate, as better than those from companies like IBM SPSS, Digital Reasoning, Recorded Future, and similar analytics firms.

Third, the company may have moved into a new business sector. The firm’s success in fund raising begs the question, “Is Palantir becoming a vehicle to raise more and more cash?”

Palantir is worth monitoring. The visualizations and the math are not really a secret sauce. The magic ingredient at Palantir may be its ability to sell its upside to investors. Is Palantir introducing a new approach to search and content processing? The main business of the company could be raising more and more money.

Stephen E Arnold, December 11, 2013
