VVVVV and Big Data

February 7, 2015

Somewhere along the line a marketer cooked up volume, variety, and velocity to describe Big Data. Well, VVV is good but now we have VVVVV. Want to know more about “value” and “veracity”? Navigate to “2 More Big Data V’s—Value and Veracity.” The new Vs are slippery. How does one demonstrate value? The write up does not nail down the concept. There are MBA type references to ROI, use cases, and brand. Not much numerical evidence or a credible analytic foundation is presented. Isn’t “value” a matter of perception? Numbers may not be needed.

Veracity is also a bit mushy. What about Brian Williams and his oft repeated “conflation”? What about marketing collateral for software vendors in search of a sale?

I typed 25 and moved on. Neither a big number nor much in the way of big data.

Stephen E Arnold, February 7, 2015

Attivio: New, New, New after $70 Million and Seven Years

February 7, 2015

With new senior managers and a hunt on for a new director of financial services, Attivio is definitely trying to shake ‘em up. I received some public relations spam about the most recent version of the Attivio system. The approach combines open source software with home brew code, an increasingly popular way to sell licenses, consulting, and services. To top it off, Attivio is an outfit that has the “best company culture” and Dave Schubmehl’s IDC report about Attivio with my name on it available for free. This was a $3,500 item on Amazon earlier this year. Now. Free.

Attivio’s February 3, 2015, news release explains that Attivio is in the enterprise search business. You can read the presser at this link. Not too long ago, Attivio was asserting that it was the solution to some business intelligence woes. I suppose search and business intelligence are related, but “real” intelligence requires more than keyword search and a report capability.

The release explains that Attivio is (I find this fascinating) “reinventing Big Data Search and Dexterity.” Not bad for open source, home brew, and Fast Search & Transfer flavoring. Search and dexterity. Definitely a Google AdWords keeper.

Attivio’s presser says:

Attivio 4.3 delivers new functionality and improvements that make it dramatically easier to build, deploy, and manage contextually relevant applications that drive revolutionary insight. Companies with structured and unstructured data in disparate silos can now quickly gain immediate access to all information with universal contextual enrichment, all delivered from Attivio’s agile enterprise platform.

I like “revolutionary insight.” Keep in mind that Attivio was formed by former Fast Search & Transfer executives in 2007 and has ingested, according to Crunchbase, $71.1 million in seven years. That works out to roughly $10 million per year to do various technical things and sell products and services to generate money.
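The per-year figure is simple division. A minimal sketch, using only the two numbers quoted above (the variable names are mine, not Crunchbase's):

```python
# Figures as quoted in the post: Crunchbase total and the seven-year span.
total_funding_musd = 71.1  # millions of US dollars
years = 7

# Average funding consumed per year over the period.
per_year_musd = total_funding_musd / years
print(f"Average funding per year: ${per_year_musd:.1f} million")
```

The average lands a shade above the round $10 million. It says nothing about how the money was actually spent in any given year.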

More significant to me than money that may be difficult or impossible to repay with a hefty uptick is that in seven years, Attivio has released four versions of its flagship software. With open source providing a chunk of functionality, it strikes me that Attivio may be lagging behind the development curve of some other companies in the content processing sector. But with advisors like Dave Schubmehl and his colleagues, the pace of innovation is likely to be explained as just wonderful. At Cambridge University, one researcher pointed out that work done in 2014 is essentially part of ancient history. There is perhaps a difference between Cambridge in the UK and Cambridge in Massachusetts.

What does Attivio 4.3 offer as “key features”? Here’s what the news release offers:

  • ASAP: Attivio Search Application Platform – a simple, intuitive user interface for non-technical users building search-based applications;
  • SAIL: Search Analytics Interactive Layer – offers more robust functionality and an enhanced user experience;
  • Advanced Entity Extraction: New machine-learning based entity extraction module enriches content with higher accuracy and improved disambiguation, enabling deeper discovery and providing a smart alternative to managing entity dictionaries;
  • Simplified Management: Empowers business users to handle documents and manage settings in a code-free environment;
  • Composite Documents: Unique ability to search across document fragments optimized to deliver sub-second response times;
  • New Designer Tools: Simplifies Attivio management through Visual Workflow and Component Editors, enables all users to design and build custom processing logic in an integrated UI.

There are a couple of important features that are available in other vendors’ systems; for example, geographic functions, automated real-time content collection, automated content analytics, and automated outputs to a range of devices, humans, or other systems.

ASAP and SAIL are catchy acronyms, but I find them less than satisfying. The entity extraction function is interesting but there is no detail about how it works in languages that do not use Roman based character sets, how the system deals with variants, and how the system maps one version of an entity to another in content that is either static imagery or video.

I am not sure what a composite document is. If a document contains images and videos, what does the system do with these content objects? If the document is an XML representation, what’s the time penalty to convert content objects to well formed XML? With interfaces becoming the new black, Attivio is closing the gap with the Endeca interface toolkit. Endeca dates from the late 1990s and has blazed a trail through the same marketing jungle that Attivio is now retracing.

For more information about Attivio, visit the company’s Web site at www.attivio.com. The company will be better equipped to explain virtual, enterprise search, big data, and the company’s financial posture than I.

Stephen E Arnold, February 7, 2015

Digital Imaging: An IDC Estimate Misses the Mark

February 6, 2015

Digital imaging is getting bigger. I came across this interesting factoid in LensVid:

Predicting the future of the camera market proved challenging in the past – IDC (the American market research, analysis and advisory firm) failed to predict what will happen to the mirrorless camera market. In 2012 they concluded that in 2014 we will see no less than 13 million mirrorless cameras sold worldwide. Only 3 million mirrorless cameras were actually sold…
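How far off was the forecast? A quick sketch using only the two figures in the LensVid quote (the variable names are mine):

```python
# Unit figures from the LensVid quote, in millions of mirrorless cameras.
predicted_musd_units = 13  # IDC's 2012 forecast for 2014
actual_musd_units = 3      # units reportedly sold

# How many times larger the forecast was than the reported result.
overshoot_factor = predicted_musd_units / actual_musd_units

# Forecast error expressed as a percentage of the reported result.
error_pct = (predicted_musd_units - actual_musd_units) / actual_musd_units * 100

print(f"Forecast was {overshoot_factor:.1f}x the reported sales")
print(f"Error relative to reported sales: {error_pct:.0f}%")
```

Measured against what was reportedly sold, the estimate overshot by more than a factor of four.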

For the full run down on digital photography in 2014, navigate to “LensVid Exclusive: What Happened to the Photography Industry in 2014?”

Next time someone tosses mid tier consulting firm predictions your way, perhaps you should let them pass on by.

Stephen E Arnold, February 6, 2015

Graphic Pits dtSearch against Lucene

February 6, 2015

An oddball TechWars graphic suggests that Lucene is making life difficult for vendors of proprietary search systems. In the site’s head-to-head “dtSearch vs Lucene” comparison, the open source solution seems to handily trounce dtSearch. Of course, for us, Lucene means Elasticsearch. For those unfamiliar with TechWars, here’s the site’s description of what it does:

Data-driven: TechWars shows objective data gathered from the web to help you make the right decision when choosing technology for your projects.

Up-to-date: TechWars scans the web to catch the latest trends, so you can sit back and relax while we keep you updated.

Professional: TechWars is built for professionals, by professionals. Let’s build the best tech comparison tool together!

Community: TechWars serves the developer community by opening case studies for discussion. We are always open to requests and feedback via Facebook and Twitter.

The graphic compares dtSearch and Lucene in several areas. We’re told that 196 TechWars users use Lucene, versus just 15 who use dtSearch. Under the “which companies use it?” heading, sixteen companies (several high-profile) are listed for Lucene, but “no companies found” for dtSearch. Um, it seems like a pretty shallow dataset they’re tapping into there. The site does use Google data for one comparison: a graph that shows how very many more folks have searched for information on Lucene than on dtSearch. At a glance, Lucene would seem to be coming out ahead.

Cynthia Murrell, February 06, 2015

Sponsored by ArnoldIT.com, developer of Augmentext

Attensity Partners with Moreover at LexisNexis

February 6, 2015

Attensity has cut a deal with LexisNexis for sentiment analysis. Well, technically with the content delivery firm Moreover Technologies, which is now part of LexisNexis. We learn of the deal from a press release at PRNewswire, “Attensity Announces Strategic Alliance with Moreover; Extends Lead Over Competitors for Most Comprehensive Sentiment Analytics on the Market.” Our question: is LexisNexis profitable? The write-up tells us:

“Moreover (recently acquired by LexisNexis) provides Attensity direct access to its media management data through the Moreover API, enabling brands to aggregate vast amounts of web resources for real-time sentiment analytics to anticipate industry and consumer shifts. Attensity has incorporated Moreover’s Metabase of news sources into its Pipeline and recently updated Attensity Q solutions, as well as made it available for deep dive business intelligence analysis in Attensity Analyze.”

We’re informed that Attensity can now boast over 550 million data sources—websites, forums, social media, and the like. Headquartered in Palo Alto, California, Attensity is at the fore of the natural language processing and sentiment analysis fields. Rooted in their development of tools that serve the intelligence community, the company now provides real-time discovery solutions to Global 1000 companies.

Moreover began aggregating global news sources in 1998, and has built their award-winning enterprise data distribution and analytical tools on that early foundation. Headquartered in Reston, Virginia, the company was acquired by LexisNexis in October 2014.

LexisNexis’ specialty is workflow solutions for the legal, risk management, corporate, government, law enforcement, accounting, and academic markets. The company traces its roots to the U.K.’s Butterworth publishing house, founded way back in 1818. It is now headquartered in Albany, New York.

Cynthia Murrell, February 06, 2015

IBM Watson Offers Demos

February 6, 2015

One of Vivisimo’s founders, Jerome Pesenti, seems to be the voice of IBM Watson. Vivisimo was a metasearch system with hit clustering. The company went through several management arabesques and was sold to IBM in 2012. Vivisimo pitched its system as a federated search engine. The configuration method, as I recall, required Jerome level input. In one installation, I learned that the Vivisimo system hit a wall when 250,000 documents were processed. There were work arounds, but these too required humans who knew the ins and outs of Vivisimo.

I recall that prior to the sale of Vivisimo to IBM, Vivisimo shifted to a government consulting services focus. Many search vendors in the heyday of the buy outs followed this path. License fees were not generating the cash the spreadsheet jockeys funding outfits like Endeca, Exalead, and Vivisimo envisioned. No problem. Some organizations wanted proprietary content processing systems and figured that it was time to sell out. The Big Dog of sell outs was Hewlett Packard’s $11 billion purchase of Autonomy. Vivisimo fetched about $20 million, or one year’s projected revenue, according to a stockholder familiar with the deal.

Fast forward two or three years and Vivisimo is now Watson. Oh, Vivisimo is also a Big Data solution, not a metasearch engine. I assume the index limits have been addressed. I am thinking about IBM Watson for two reasons:

  1. IBM is going through a staff reduction. I assume this action was determined by querying the super smart Watson system.
  2. I read “Five New Services Expand IBM Watson Capabilities to Images, Speech, and More,” an IBM in house marketing article.

To my surprise there was a significant shift in Watson marketing; to wit, there are now links to demos of IBM’s text to speech service, image recognition service, relationship analysis service, and something called tradeoff analytics. Now demos are helpful. So is the Watson “great video” about concept insights.

I ran the suggested query for “quantum physics.” Remember I used to work at Halliburton Nuclear Services. Here’s what I saw:

image

I noticed that each of the experts in the human resources database uses the word “quantum” to describe his or her background.

I then ran a query for “tamarind,” one of the ingredients in a barbeque sauce created by Watson during its recipe phase. Here’s what I saw:

image

There is no recipe, nor is there an IBM person listing the barbeque recipe as his or her work. I was surprised. No tamarind wizard in the data set.

I asked myself, “Can’t I do this with Elasticsearch?” The answer my mind generated was, “No. No. No. You silly oaf. Watson uses Lucene but it is much, much more.”

How confident are the Watson workers who have dodged IBM layoffs?

What happens if Watson with Vivisimo, iPhrase, WebFountain, and assorted Almaden semantic goodies is aced by Hewlett Packard Autonomy or, heaven forbid, Amazon?

Will Dr. Pesenti be able to build a business that is orders of magnitude larger than Vivisimo’s revenue?

Interesting stuff. Not CyberOSINT level work, but interesting. I wonder why the i2 and related technologies are not pushed more aggressively. i2 works. (Note: I was a consultant to i2 prior to IBM’s purchase of the company.)

Stephen E Arnold, February 6, 2015

Deconstructing the Glass Deconstruction

February 6, 2015

You know about Google Glass. The glasshole thing.

I read “Broken Glass” or “Why Glass Broke.” You may be able to locate this deconstruction of Google Glass at this link. If you have to pay or the link is dead, don’t complain to me, gentle reader. Cast your aspersions elsewhere.

image

Google sunGlass. Handy.

The write up appears in the Style section of the New York Times. I assume that the subject (Glass) does not appear to be a business story. The write up contains 12 “I” statements. These refer to the author’s “being there,” but not in the Jerzy Kosinski sense. There are anecdotes about the happenstance of Google X Labs. Well, the company creating Glass is Google. There is an intriguing fact: The super secret headquarters of Google X Labs is or was 1489 Charleston Avenue, which does not appear in my instance of Google Earth. Perhaps the address is “Charleston Road”?

The write up provides an insight into Google’s technology management processes; for example:

At the time [2011], unknown to anyone outside X, an impassioned split was forming between X engineers about the most basic functions of Google Glass. One faction argued that it should be worn all day, like a “fashionable device,” while others thought it should be worn only for specific utilitarian functions. Still, nearly everyone at X was in agreement that the current prototype was just that: a prototype, with major kinks to be worked out. There was one notable dissenter. Mr. Brin knew Google Glass wasn’t a finished product and that it needed work, but he wanted that to take place in public, not in a top-secret lab. Mr. Brin argued that X should release Glass to consumers and use their feedback to iterate and improve the design.

I want to credit the New York Times’ Style editor for including information about the alleged Brin Rosenberg interaction. A factoid or two may have slipped to the cutting room floor. Anyone know anything about an alleged suicide attempt?

Glass is, of course, not dead. For style lovers, Glass will live on in the history of head mounted computers with an ever so brief battery life. But, as a fashion forward person said, “This was the first time that people talked about wearable technology.”

Did you know that, Jaron Lanier?

Stephen E Arnold, February 5, 2015

WikiGalaxy: Interactive Visualization

February 5, 2015

Short honk: A visualization of some Wikipedia articles is available at this link.

image

The visualization includes a search box. It is helpful. I did not understand the dots of light that flew across the display. The display held my attention for a short period of time.

Stephen E Arnold, February 5, 2015

Enterprise Search: Mapless and Lost?

February 5, 2015

One of the content challenges traditional enterprise search trips over is geographic functions. When an employee looks for content, the implicit assumption is that keywords will locate a list of documents in which the information may be located. The user then scans the results list—whether in Google style laundry lists or in the graphic display popularized by Grokker and Kartoo which have gone dark. (Quick aside: Both of these outfits reflect the influence of French information retrieval wizards. I think of these as emulators of Datops “balls” displays.)

image

A results list displayed by the Grokker system. The idea is that the user explores the circular areas. These contain links to content germane to the user’s keyword query.

The Kartoo interface displays sources connected to related sources. Once again the user clicks and goes through the scan, open, read, extract, and analyze process.

In a broad view, both of these visualizations are maps of information. Do today’s users want these types of hard to understand maps?

In CyberOSINT I explore the role of “maps” (or, more properly, geographic intelligence (geoint), geo-tagging, and geographic outputs) from automatically collected and analyzed data.

The idea is that a next generation information access system recognizes geographic data and displays those data in maps. Think in terms of overlays on the eye popping maps available from commercial imagery vendors.

What do these outputs look like? Let me draw one example from the discussion in CyberOSINT about this important approach to enterprise related information. Keep in mind that an NGIA can process any information made available to the system; for example, enterprise accounting systems or databased content along with text documents.

In response to either a task, a routine update when new information becomes available, or a request generated by a user with a mobile device, the output looks like this on a laptop:

image

Source: ClearTerra, 2014

With the approach that ClearTerra offers, a person looking for information about customers, prospects, or other types of data that carry geo-codes sees the results on a dynamic map. The map can be displayed on the user’s device; for example, a mobile phone. In some implementations, the map is a dynamic PDF file which displays locations of items of interest as the item of interest moves. Think of a person driving a delivery truck or an RFID tagged package.
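To make the idea of geo-coded records feeding a map concrete, here is a minimal sketch. It is not ClearTerra's method, which the post does not describe. The records, field names, and the to_geojson helper are hypothetical, and GeoJSON is used only as a widely supported stand-in for whatever format a mapping layer consumes.

```python
import json

# Hypothetical geo-coded records; names and coordinates are made up for illustration.
records = [
    {"name": "Customer A", "kind": "customer", "lat": 38.25, "lon": -85.76},
    {"name": "Delivery truck 7", "kind": "vehicle", "lat": 39.10, "lon": -84.51},
    {"name": "Prospect B", "kind": "prospect", "lat": None, "lon": None},  # no geo-code
]


def to_geojson(rows):
    """Convert rows that carry a geo-code into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        if row["lat"] is None or row["lon"] is None:
            continue  # a record without a geo-code cannot be placed on a map
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {"name": row["name"], "kind": row["kind"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(records)
print(json.dumps(collection, indent=2))
```

A moving item, such as the delivery truck, would amount to regenerating the collection each time a new position arrives. The record with no geo-code is dropped, which is the practical limit of any map-based output.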

SkyMall Files for Bankruptcy, Blames Smartphone

February 5, 2015

The point, counterpoint articles on Ars Technica titled “SkyMall, Killed By the Smartphone” explore the end of SkyMall. It seems inevitable that the usage of personal electronic devices on airplanes would push SkyMall out of the picture. Who ever looked through one of those catalogues if they had anything better to do? The article states,

“Blame the FAA’s relaxation of its ban on the use of personal electronic devices by airline passengers. Could it ever have ended any other way?… SkyMall worked because it had a captive audience with nothing else to look at; now that we can keep browsing or playing Cwazy Cupcakes how could it compete? Perhaps the more surprising thing—to us, at any rate—was the fact that until now, the power of boredom evidently made a decent business model.”

While there is something to be said for the bizarre items and strange model poses in the SkyMall catalogue, it seems incontrovertible that they didn’t sell anything anyone actually needed. While this outcome of bankruptcy may be specific to the “boredom business model” of SkyMall, it might be making publishers nervous in other arenas as well. If everyone has a smartphone on which they can read, check emails, and play games, they won’t need a magazine to flip through.

Chelsea Kerwin, February 05, 2015
