More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use the outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov test. Has this nonparametric test been applied to the analytics system which marketers presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
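For the curious, a two-sample Kolmogorov-Smirnov test is only a few lines of code. The sketch below uses SciPy with made-up data; it simply asks whether a system’s output distribution matches a reference distribution, which is the sort of sanity check I never see in a vendor deck.

```python
# Minimal illustration of the two-sample Kolmogorov-Smirnov test.
# The data are synthetic; in practice you would compare an analytics
# system's output distribution against a reference distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1_000)   # assumed baseline
observed = rng.normal(loc=0.3, scale=1.2, size=1_000)    # system output

statistic, p_value = ks_2samp(reference, observed)
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.4f}")
# A small p-value suggests the two samples were not drawn from the same
# distribution, i.e., the system's output deviates from the baseline.
```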

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about which rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.

Why, then, would a person with zero knowledge of numerical recipes, of the oddball outputs particular types of algorithms produce, and with little or no experience with probability methods use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots the particular method or suite of methods generates. (Firms which do focus on explaining their methods, constraints, and considerations to users include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30-somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues make many people experts in quite difficult systems and methods. In the mid-1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30-somethings who explain the advantages of the analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media—the music is loud and getting louder. With so many firms jumping on the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is as easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes the challenge clear.

Stephen E Arnold, June 18, 2012

Findability and Design: How Sizzle Distracts from Understanding

May 9, 2012

I have been watching the Disneyfication of search. A results list is just not exciting unless there are dozens of links, images, videos, and graphs to help me find the answer to my research question. As far as I know, Palantir and several other analytics companies have built their businesses on outputting flashy graphics which I often have a tough time figuring out. My view is that looks are more important than substance in many organizations.

I read “Designers Are Not a Panacea.” I agree with the basic premise of the write up. Here’s a passage I tucked into my reference file:

Rather than granting designers full control over the product, remember that they need to play nice and integrate with several other aspects of your business. You need to remember that you are building a business not a pretty app. A designer co-founder could help (as could a sales co-founder), but does not offer any guarantees that you will make good business decisions, regardless of how “beautiful” an experience your application offers (not to say that adding more engineers does). Visual aesthetics are rarely enough. Getting a product into the hands of potential customers is important.

The write up leaves an important question unanswered: “Why is the pursuit of visual flashiness now so important?”

I have several hypotheses, and I don’t think that some of these have been explored in sufficient detail by the private equity firms pumping money into graphic-centric search and content processing companies. Here goes, and feel free to use the comments section of this blog if you disagree:

First, insecurity. I think that many professionals are not sure of their product or service, not sure of their expertise, and not sure of their own “aura of competence.” Hiding behind visually thrilling graphs distracts the audience to some degree. The behavior of listeners almost guarantees that really basic questions about sample size and statistical recipes used to output the visual will not be asked.

Second, misdirection. I think that humans like to look at pictures and then do the “thinking fast, thinking slow” thing and jump to conclusions for social or psychological reasons. The notion of an in depth discussion is something I have watched get kicked into the gutter in some recent meetings. The intellectual effort required to think about a problem is just not present. A visual makes it easy for the speaker to mislead intentionally and for the listener to be misled.

Third, indifference. In a recent meeting, several presenters put up slides which had zero to do with the topic at hand. The speaker pointed to the visual and made a totally unrelated comment or observation. No one in the audience cared. I don’t think most people were listening. Fiddling with smart phones or playing with iPads has replaced listening and old fashioned note taking. The speaker did not care either. I think the presentation was prepared by some corporate team and the presenter was trying to smile and get through the briefing.

What does design have to do with search? Looking at the “new” interfaces for Google and Microsoft Web search, I noted that neither service was making fundamental changes. In fact, Google seemed to be moving to the old Excite and Yahoo approach with three columns and a bewildering number of hot links. Microsoft, on the other hand, was emulating Google’s interface circa 2006 and 2007.

Visualization systems and methods have made significant contributions to engineering and certain types of mathematics. However, for other fields, visualization has become lipstick designed to distract, obfuscate, or distort information.

In US government briefings, visual sizzle is often more important than the content presented. I have seen the same disturbing trend at analytics and search conferences. Without accountability from colleagues and employers, design is going to convert search and findability into a walk through Disneyland. The walk is fun, but I don’t think an amusement park shares much with the nitty gritty of day to day revenue generation from software and services.

Stephen E Arnold, May 9, 2012

Sponsored by IKANOW

IBM Buys Vivisimo Allegedly for Its Big Data Prowess

April 25, 2012

Big data. Wow. That’s an angle only a public relations person with a degree in 20th century American literature could craft. Vivisimo is many things, but a big data system? News to me for sure.

IBM has been a strong consumer and integrator of open source search solutions. Watson, the game show winner, used Lucene with IBM wrapper software to keep the folks in Jeopardy post production on their toes.


A screen shot of the Vivisimo Velocity system displaying search results for the RAND organization. Notice the folders in the left hand panel. The interface reveals Vivisimo’s roots in traditional search and retrieval. The federating function operates behind the scenes. The newest versions of Velocity permit a user to annotate a search hit so the system will boost it in subsequent queries if the comment is positive. A negative rating on a result suppresses that result.
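Vivisimo has not published Velocity’s scoring internals, so the following is only a toy sketch of how annotation-driven boosting and suppression might work in general; the class, field names, and weights are invented for illustration.

```python
# Toy sketch of annotation-driven result boosting/suppression.
# Not Vivisimo's implementation; names and weights are invented.
from dataclasses import dataclass, field

@dataclass
class Hit:
    doc_id: str
    base_score: float
    annotations: list = field(default_factory=list)  # "positive" or "negative"

def adjusted_score(hit: Hit, boost: float = 0.2, penalty: float = 0.5) -> float:
    score = hit.base_score
    for note in hit.annotations:
        if note == "positive":
            score *= (1.0 + boost)      # boost in subsequent queries
        elif note == "negative":
            score *= (1.0 - penalty)    # suppress the result
    return score

hits = [Hit("doc-1", 0.8, ["positive"]), Hit("doc-2", 0.9, ["negative"])]
for h in sorted(hits, key=adjusted_score, reverse=True):
    print(h.doc_id, round(adjusted_score(h), 3))
```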

I learned that IBM allegedly purchased Vivisimo, a company which I have covered in my various monographs about search and content processing. Forbes ran a story which was at odds with my understanding of what the Vivisimo technology actually does. Here’s the Forbes’ title: “IBM To Buy Vivisimo; Expands Bet On Big Data Analytics.” Notice the phrase “big data analytics.”

Why do I point out the “big data” buzzword? The reasons include:

  • Vivisimo has a clustering method which takes search results and groups them, placing similar results identified by the method in “folders”
  • Vivisimo has a federating method which, like Bright Planet’s and Deep Web Technologies’, takes a user’s query and sends the query to two or more indexing systems, retrieves the results, and displays them to the user
  • Vivisimo has a clever de-duplication method which collapses duplicate hits so the results list presents a single item. This matters when a news story appears on multiple Web sites. (A generic sketch of the idea appears after this list.)
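De-duplication recipes vary by vendor. The sketch below shows one generic approach, fingerprinting normalized text so near-identical copies of a story collapse to a single hit; it is an illustration of the idea, not Vivisimo’s method.

```python
# Generic illustration of result de-duplication by content fingerprinting.
# Not Vivisimo's method; a common, simplified approach is shown instead.
import hashlib
import re

def fingerprint(text: str) -> str:
    # Normalize case, punctuation, and whitespace before hashing so that
    # trivially different copies of the same story map to one fingerprint.
    normalized = re.sub(r"[^a-z0-9 ]", "", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def dedupe(results: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for r in results:
        fp = fingerprint(r["body"])
        if fp not in seen:
            seen.add(fp)
            unique.append(r)
    return unique

results = [
    {"url": "siteA.example/story", "body": "IBM to buy Vivisimo."},
    {"url": "siteB.example/story", "body": "IBM to buy  Vivisimo!"},  # duplicate copy
]
print([r["url"] for r in dedupe(results)])  # only one item survives
```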

According to the write up in Forbes, a “real” news outfit:

IBM this morning said it has agreed to acquire Vivisimo, a Pittsburgh-based provider of big data access and analysis tools.

Okay, but in Beyond Search we have documented the trajectory Vivisimo has followed in its sales and marketing efforts since the company opened for business in 2000. In fact, the Wikipedia write up about Vivisimo says this:

Vivisimo is a privately held enterprise search software company in Pittsburgh that develops and sells software products to improve search on the web and in enterprises. The focus of Vivisimo’s research thus far has been the concept of clustering search results based on topic: for example, dividing the results of a search for “cell” into groups like “biology,” “battery,” and “prison.” This process allows users to intuitively narrow their search results to a particular category or browse through related fields of information, and seeks to avoid the “overload” problem of sorting through too many results.


SAS Gets More Visual

March 31, 2012

Inxight (now owned by BusinessObjects, part of the SAP empire) is history at SAS, or almost history. Now the company is moving in a different direction.

Jaikumar Vijayan writes about a new visual analytics application recently unveiled by SAS in his article “SAS Promises Pervasive BI with New Tool.” Einstein is believed to have once said “computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.” We noted this passage from Mr. Vijayan’s write up:

Unlike many purely server-based enterprise analytics technologies, Visual Analytics gives business users a full range of data discovery, data visualization and querying capabilities from desktop and mobile client devices, the company said.

The initial version of the new tool allows iPad users to view reports and download information to their devices. Future versions will support other mobile devices as well, SAS added. The quote is actually a good description of the concept that underlies Visual Analytics. The process uses analytic reasoning to detect specific information in massive amounts of data. For example, a clothing manufacturer might use it to determine current trends in ladies’ fashions. The results are presented in charts and graphs to the users, who can fine-tune the parameters until their specific queries are answered.

SAS is known for its statistical functionality, its programming language, and its need for SAS-savvy cow pokes to ride herd on the bits and bytes. Will SAS be able to react to the trend toward the consumerization of business intelligence?

While the technology is impressive, SAS may be a little late to the game. Palantir and Digital Reasoning have already introduced applications that offer clients powerful visual analytics capabilities. Time will tell if SAS is able to catch up to some competitors’ approaches. We are interested in Digital Reasoning, Ikanow, and Quid.

Stephen E Arnold, March 31, 2012

Sponsored by Pandia.com

Connotate Acquires Fetch Technologies

March 27, 2012

I know, “Who? Bought what?”

Connotate is a data fusion company which uses software bots (agents) to harvest information. Fetch Technologies, founded more than a decade ago, processes structured data. The deal comes on the heels of some executive ballroom dancing. Connotate snagged a new CEO, Keith Cooper, according to New Jersey Tech Week. Fetch also uses agent technology.

Founded in 1999, Fetch Technologies enables organizations to extract, aggregate and use real-time information from Web sites. Fetch’s artificial intelligence-based technology allows precise data extraction from any Web site, including the so-called Deep Web, and transforms that data into a uniform format that can be integrated into any analytics or business intelligence software.

The company’s technology originated at the University of Southern California’s Information Sciences Institute. Fetch’s founders developed the core artificial intelligence algorithms behind the Fetch Agent Platform while they were faculty members in Computer Science at USC. Fetch’s artificial intelligence solutions were further refined through years of research funded by the Defense Advanced Research Projects Agency (DARPA), the National Science Foundation (NSF), the U.S. Air Force, and other U.S. Government agencies.

The Connotate news release said:

“Fetch is very excited to combine our information extraction, integration, and data analytics solution with Connotate’s monitoring, collection and analysis solution,” said Ryan Mullholland, Fetch’s former CEO and now President of Connotate. “Our similar product and business development histories, but differing go-to-market strategies creates an extraordinary opportunity to fast-track the creation of world-class proprietary ‘big data’ collection and management solutions.”

Okay, standard stuff. But here’s the paragraph that caught my attention:

“Big data, social media and cloud-based computing are major drivers of complexity for business operations in the 21st century,” said Keith Cooper, CEO of Connotate. “Connotate and Fetch are the only two companies to apply machine learning to web data extraction and can now take the best of both solutions to create a best-of-breed application that delivers inherent business value and real-time intelligence to companies of all sizes.”

I am not comfortable with the assertion of “only two companies to apply machine learning to Web data extraction.” In our coverage of the business intelligence and text mining market in Inteltrax.com, we have written about many companies which have applied such technologies and generated more market traction. Examples range from Digital Reasoning to Palantir, among others.

The deal is going to deliver on a “unified vision.” That may be true; however, saying and doing are two different tasks. As I write this, unification is the focus of activities from big dogs like Autonomy, now part of Hewlett Packard, to companies which have lower profiles than Connotate or Fetch.

We think that the pressure open source business intelligence and open source search are exerting will increase. With giants like IBM (Cognos, i2 Group, SPSS) and Oracle working to protect their revenues, more mergers like the Connotate-Fetch tie up are inevitable. You can read a July 14, 2010, interview with Xoogler Mike Horowitz of Fetch Technologies at this link.

Will the combined companies rock the agent and data fusion market? We hope so.

Stephen E Arnold, March 27, 2012

Sponsored by Pandia.com

Lexmark: Under Its Own Nose

March 20, 2012

I read “Lexmark Acquires Isys Search Software and Nolij.” (Nolij: knowledge, get it?) In 2008, Hewlett Packard acquired Lexington-based Exstream Software. HP paid $350 million for the company, leaving Lexmark wondering what its arch printing enemy was doing. Now, more than three years later, Lexmark is lurching through acquisitions.

On March 7, 2012, I reported that Lexmark purchased Brainware, a search, eDiscovery, and back office system vendor. Brainware caught my attention because its finding method was based in part on trigram technology. I recall seeing patents on the method which were filed in 1999. I have a special report on Brainware if anyone is interested. Brainware has a rich history. Its technology stretches back to SER Solutions (see US6772164). SER was once part of SER Systems AG. The current owners bought the search technology and generated revenue from its back office capabilities, not the “pure” search technology. However, Brainware’s associative memory technology struck me as interesting because it partially addressed the limitations of trigram indexes. Brainware became part of Lexmark’s Perceptive Software unit.
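For readers who have not run into trigram matching, here is a minimal, generic sketch of character-trigram similarity. It shows only the basic idea; Brainware’s patented associative memory methods go well beyond this.

```python
# Generic character-trigram similarity, the basic idea behind trigram
# matching. An illustration only, not Brainware's patented method.
def trigrams(text: str) -> set[str]:
    padded = f"  {text.lower()}  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)  # Jaccard overlap of trigram sets

# Trigram matching tolerates typos and OCR noise, one reason it suits
# back office document processing.
print(round(similarity("invoice number", "invoise number"), 2))
```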

Now, a mere two weeks later, Lexmark snags another search and retrieval company. Isys Search was started by Iain Davies in 1988. Mr. Davies was an author and an independent consultant in IBM mainframe fourth generation languages. His vision was to provide an easy-to-use search system. When I visited with him in 2009, I learned that Isys had more than 12,000 licensees worldwide. However, in the US, Isys never got the revenue traction which Autonomy achieved. Even Endeca, which was roughly one-tenth the size of Autonomy, was larger than Isys. The company began licensing its connectors to third parties a couple of years ago, and I did not get too many requests for analyses of the company’s technology. Like Endeca, the system processes content and generates a list of entities and other “facets” which can help a user locate additional information for certain types of queries.

Now Lexmark, which allowed Exstream to go to HP, has purchased two companies with technology which is respectively 24 and 12 years old. I am okay with this approach to obtaining search and retrieval functionality, but I do wonder what Lexmark is going to do to leverage these technologies now that HP has Autonomy and Oracle has Endeca. Microsoft is moving forward with Fast Search and a boat load of third party search solutions from certified Microsoft partners. IBM does the Lucene Watson thing, and every math major from New York to San Francisco is jumping into the big data search and analytics sector.

Here’s a screen shot of the Isys Version 8 interface, which has been updated, I have heard. You can see its principal features. I have an analysis of this system as well.


What will Lexmark do with two search vendors?

Here’s the news release lingo:

“Our recent acquisitions enable Lexmark to offer customers a differentiated, integrated system of solutions that are unique, cost effective, and deliver a rapid return on investment,” said Paul Rooke, Lexmark’s chairman and CEO. “The methodical shift in our focus and investments has strengthened our managed print services offerings and added new content and process technologies, positioning Lexmark as a key solutions provider to businesses large and small.”

Perceptive Software is now in the search and content processing business. However, unlike Exstream, these two companies do not have a repository and cross media publishing capability. I think it is unlikely that Lexmark/Perceptive will be able to shoehorn either of these two systems’ technology into its printers. Printers make money because of ink sales, not because of the next generation technology that some companies think will make smart printers more useful. Neither Brainware nor Isys has technology which meshes with the big data and Hadoop craziness now swirling around.

True, Lexmark can invest in both companies, but the cash required to update code from 1988 and methods from 1999 might stretch the Lexmark pocketbook. Lexmark has been a dog paddler since the financial crisis of 2008.


Source: Google Finance

Here’s the Lane Report’s take on the deal:

Lexmark’s recent acquisitions have advanced its “capture/manage/access” strategy, enabling the company to intelligently capture content from hardcopy and electronic documents through a range of devices including the company’s award-winning smart multifunction products and mobile devices, while also managing and processing content through its enterprise content management and business process management technologies. These technologies, when combined with Lexmark’s managed print services capabilities, give the company the unique ability to help customers save time and money by managing their printing and imaging infrastructure while providing complementary and high value, end-to-end content and process management solutions.

I have a different view:

First, a more fleet footed Lexmark would have snagged the Exstream company. It was close to home, generating revenue, and packaged a solution. Exstream was not a box of Lego blocks. What Perceptive now has is an assembly job, not a product which can go head to head against Hewlett Packard. Maybe Lexmark will find a new market in Oracle installations, but Lexmark is a printer company, not a data management company.

Second, technology is moving quickly. Neither Brainware nor Isys has the components which allow the company to process content and output the type of results one gets from Digital Reasoning or Palantir. Innovative Ikanow is leagues ahead of both Brainware and Isys.

Third, neither Brainware nor Isys is open source centric. Based on my research and our forthcoming information services about open source technology, neither company is in that game. Because growth is exploding in the open source sector, how will Lexmark recover its modest expenditures for these two companies?

I think there may be more lift in the analytics sector than the search sector, but I live in Harrod’s Creek, not the intellectual capital of Kentucky where Lexmark is located.

Worth watching.

Stephen E Arnold, March 20, 2012

Sponsored by Pandia.com

Prediction Data Joins the Fight

January 12, 2012

It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are working on developing prediction models that can be applied to terror prevention. The article mentions Palantir: “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” Recorded Future is another such company, but it relies on “creating a ‘temporal index’, a big data/semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling are Sense Networks, Digital Reasoning, BlueKai, and Primal. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various ‘application’ providers.” Using data to predict the future seems a little far-fetched, but the technology is still new and not totally understood. Everyone does need to join the fight against terrorism, but exactly how data prediction fits in remains to be seen.

April Holmes, January 12, 2012

Sponsored by Pandia.com

Predictions on Big Data Miss the Real Big Trend

December 18, 2011

Athena, the goddess of wisdom, does not spend much time in Harrod’s Creek, Kentucky. I don’t think she’s ever visited. However, I know that she is not hanging out at some of the “real journalists’” haunts. I zipped through “Big Data in 2012: Five Predictions”. These are lists which are often assembled over a lunchtime chat or a meeting with quite a few editorial issues on the agenda. At year’s end, the prediction lunch was a popular activity when I worked in New York City, which is different in mental zip from rural Kentucky.

The write up churns through some ideas that are evident when one skims blog posts or looks at the conference programs for “big data.” For example—are you sitting down?—the write up asserts: “Increased understanding of and demand for visualization.” There you go. I don’t know about you, but when I sit in on “intelligence” briefings in the government or business environment, I have been enjoying the sticky tarts of visualization for years. Nah, decades. Now visualization is a trend? Helpful, right?

Let me identify one trend which is, in my opinion, an actual big deal. Navigate to “The Maximal Information Coefficient.” You will see a link and a good summary of a statistical method which allows a person to process “big data” in order to determine if there are gems within. More important, the potential gems pop out of a list of correlations. Why is this important? Without MIC methods, the only way to “know” what may be useful within big data was to run the process. If you remember guys like Kolmogorov, the “we have to do it because it is already as small as it can be” issue is an annoying time consumer. To access the original paper, you will need to go to the AAAS and pay money.

The abstract for “Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, December 16, 2011, is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

Stating a very interesting although admittedly complex numerical recipe in a simple way is difficult, but I think this paragraph from “The Maximal Information Coefficient” does a very good job:

The authors [Reshef et al] go on showing that that the MIC (which is based on “gridding” the correlation space at different resolutions, finding the grid partitioning with the largest mutual information at each resolution, normalizing the mutual information values, and choosing the maximum value among all considered resolutions as the MIC) fulfills this requirement, and works well when applied to several real world datasets. There is a MINE Website with more information and code on this algorithm, and a blog entry by Michael Mitzenmacher which might also link to more information on the paper in the future.
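To make the gridding idea concrete, here is a heavily simplified sketch in Python: it computes normalized mutual information on a few fixed equal-width grids and keeps the maximum. The real MIC searches over many grid partitionings and uses the authors’ normalization, so treat this as a conceptual illustration only.

```python
# Heavily simplified illustration of the MIC gridding idea: compute
# normalized mutual information on a few fixed grids and keep the maximum.
# The real MIC searches over many grid partitionings; this is a sketch only.
import numpy as np

def grid_mutual_information(x, y, bins):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (px[:, None] * py[None, :])[nz]))
    return mi / np.log2(bins)   # MIC-style normalization for a bins-by-bins grid

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.cos(4 * x) + rng.normal(0, 0.1, 500)   # nonlinear association

score = max(grid_mutual_information(x, y, b) for b in (4, 8, 16))
print(round(score, 3))   # high score despite near-zero linear correlation
```

Even this crude version gives a respectable score to the nonlinear relationship, which a simple correlation coefficient would miss.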

Another take on the MIC innovation appears in “Maximal Information Coefficient Teases Out Multiple Vast Data Sets”. Worth reading as well.

Forbes will definitely catch up with this trend in a few years. For now, methods such as MIC point the way to making “big data” a more practical part of decision making. Yep, a trend. Why? There’s a lot of talk about “big data,” but most organizations lack the expertise and the computational know-how to perform meaningful analyses. Similar methods are available from Digital Reasoning and the Google love child Recorded Future. Palantir is more into the make-pictures world of analytics. For me, MIC and related methods are not just a trend; they are the harbinger of processes which make big data useful, not a public relations, marketing, or PowerPoint chunk of baloney. Honk.

Stephen E Arnold, December 18, 2011

Sponsored by Pandia.com, a company located where high school graduates actually can do math.

Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012

November 10, 2011

I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.

I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.

Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.

I think it was the association with the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say “free”?

A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.

The Silver Bullets, Garlic Balls, and Eyes of Newts

First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.

  1. Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101, or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
  2. Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind, but A9 gives me a headache when I use A9 to locate a book for my trusty e-reader.)
  3. Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
  4. Connectors (One cannot search unless one can acquire content. I like the Palantir approach, which triggered some push back, but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
  5. Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
  6. Federation (I like the approach of Deep Web Technologies and, for the record, the company does not position its method as a magical solution, but some federating vendors do, so I will mention this concept. Think mash up and data fusion too)
  7. Natural language processing (My candidate for NLP wonder worker is Oracle, which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
  8. Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
  9. Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
  10. Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, it was also cool to drag objects around. Now Inxight is part of Business Objects, which is part of SAP, an interesting company occupied with reinventing itself while ignoring TREX, a search engine)
  11. Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related?)
  12. Sentiment analysis (the winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
  13. Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My, how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30-plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
  14. XML (There are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not anymore, pal.)


A Coming Dust Up between Oracle and MarkLogic?

November 7, 2011

Is XML the solution to enterprise data management woes? Is XML a better silver bullet than taxonomy management? Will Oracle sit on the sidelines or joust with MarkLogic?

Last week, an outfit named AtomicPR sent me a flurry of news releases. I wrote to a chipper Atomic person, mentioning that I sell coverage and that I thought the three news releases looked a lot like spam to me. No answer, of course.

A couple of years ago, we did some work for MarkLogic, a company focused on Extensible Markup Language or XML. I suppose that means AtomicPR can nuke me with marketing fluff. At age 67, getting nuked is not my idea of fun via email or just by aches and pains.

Since August 2011, MarkLogic has been “messaging” me. The recent 2011 news releases explained that MarkLogic was hooking XML to the buzz word “big data.” I am not exactly sure what “big data” means, but that is neither here nor there.

In September 2011, I learned that MarkLogic had morphed into a search vendor. I was surprised. Maybe amazed is a more appropriate word. See Information Today’s interview with Ken Bado, formerly an Autodesk employee. (Autodesk makes “proven 3D software that accelerates better design.” Autodesk was Carol Bartz’s employer when it was an engineering and architectural design software company. I have a difficult time keeping up with information management firms’ positioning statements. I refer to this as “fancy dancing” or “floundering” even though an azure chip consultant insists I really should use the word “foundering”. I love it when azure chip consultants and self appointed experts input advice to my free blog.)

In a joust between Oracle and MarkLogic, which combatant will be on the wrong end of the pointy stick thing? When marketing goes off the rails, the horse could be killed. Is that one reason senior executives exit the field of battle? Is that one reason veterinarians haunt medieval re-enactments?

Trade Magazine Explains the New MarkLogic

I thought about MarkLogic when I read “MarkLogic Ties Its Database to Hadoop for Big Data Support.” The PCWorld story stated:

MarkLogic 5, which became generally available on Tuesday, includes a Hadoop connector that will allow customers to “aggregate data inside MarkLogic for richer analytics, while maintaining the advantages of MarkLogic indexes for performance and accuracy,” the company said.

A connector is a software widget that allows one system to access the information in another system. I know this is a vastly simplified explanation. Earlier this year, Palantir and i2 Group (now part of IBM) got into an interesting legal squabble over connectors. I believe I made the point in a private briefing that “connectors are a new battleground.” The MarkLogic story in PCWorld indicated that MarkLogic is chummy with Hadoop via connectors. I don’t think MarkLogic codes its own connectors. My recollection is that ISYS Search Software licenses some connectors to MarkLogic, but that deal may have gone south by now. And MarkLogic is a privately held company funded, I believe, by Lehman Brothers, Sequoia Capital, and Tenaya Capital. I am not sure “open source” and these financial wizards are truly harmonized, but again I could be wrong, living in rural Kentucky and wasting my time in retirement writing blog posts.
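Connector internals differ from vendor to vendor, and I have no visibility into MarkLogic’s or ISYS’s code. The sketch below shows only the generic shape of the idea: a small adapter that pulls records from a source system and hands them over in a normalized form. All names are invented for illustration.

```python
# Generic sketch of a connector: pull records from a source system and
# hand them to a target system in a normalized form. Names are invented;
# this is not MarkLogic's or anyone else's actual connector API.
from typing import Iterator, Protocol

class SourceSystem(Protocol):
    def fetch_raw(self) -> Iterator[dict]: ...

class Connector:
    def __init__(self, source: SourceSystem):
        self.source = source

    def records(self) -> Iterator[dict]:
        for raw in self.source.fetch_raw():
            # Normalize whatever the source emits into a common shape.
            yield {"id": raw.get("id"), "body": raw.get("text", ""),
                   "source": raw.get("origin", "unknown")}

class ToyFileSource:
    def fetch_raw(self) -> Iterator[dict]:
        yield {"id": "1", "text": "hello", "origin": "toy"}

for rec in Connector(ToyFileSource()).records():
    print(rec)
```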


