Chiliad Offline: A Precursor for Other BI Outfits

October 13, 2014

According to PacerMonitor, Chiliad, Inc. filed for bankruptcy on August 6, 2014. As you may recall, the company was a Washington, DC area analytics firm founded by Christine Maxwell of McKinley Group and Magellan fame. (Magellan became part of Excite, which also faded away.)

About two years ago, Beyond Search wrote about Chiliad and its big rocks. Also, in 2012, the company named Craig Norris as chief executive officer. Mr. Norris (an industry leader according to Reuters) had been the CEO of Attensity, a sentiment analysis outfit, which has experienced its share of strong headwinds. In the news release about his appointment, he said:

“I am excited to be joining Chiliad at an important stage in its growth. What makes or breaks an analytics company is the quality and usability of its core technology. Chiliad’s offering has proven its ability to extract critical findings from data at massive scale for both Government and Commercial customers. I am eager to see us gain recognition for our technology leadership.”

The news release included assertions by Patrick Gross (Chairman of the Chiliad board of directors) that I have encountered many times in the last five years; to wit:

“Chiliad has already solved two very challenging problems. The first is the ability to rapidly search data collections at greater scale than any other offering in the market. The second is to allow search formulation and analysis in natural language. This means that no longer is an elite class of analysts required in order to generate meaningful results, thus reducing the personnel training and skills shortages that plague alternative solutions and put timely discovery at risk. The explosion of ‘Big Data’ is real and valuable findings are buried in vast collections for both enterprises and governments. Chiliad has the opportunity to integrate its innovative, massively scalable solutions with emerging open source software to build customized solutions for the largest-scale clients.”

Businessweek described the company in this way:

Chiliad, Inc. provides data analysis solutions for various clouds, agencies, departments, and other stovepipes. The company offers Discovery/Alert, a platform that enables investigators, business analysts, and knowledge workers to securely reach, find, analyze, and continuously stay on top of big data—whether structured or unstructured, and classified or unclassified. Its software solutions include Iterative Discovery cycle that allows analysts and researchers to reach various content silos, find what matters, analyze it to find meaning from the information relationships presented and continuously monitor changes; and Architecture, a virtual consolidated data center that enables multidimensional analysis and ranking. It serves government/intelligence, law enforcement, healthcare, pharmaceutical, insurance, and other markets. Chiliad, Inc. was founded in 1998 and is headquartered in Herndon, Virginia.

I have highlighted the buzzwords that were designed to generate sales leads and revenue. I can only assume that the verbiage and the Attensity management touch fell short of the mark. How many of the “analytics” and “business intelligence” companies will follow Chiliad’s path? Good question but I keep asking it.

Stephen E Arnold, October 12, 2014

The Problem with Data Silos for Companies and Consumers

October 13, 2014

The article titled “What If Your Data Worked Together” on The Woopra Blog makes a plea for normalized and federated information. It also stomps data silos in the dirt for causing frustration in both customers and employees. The call for efficiency in this article does laud certain companies for organizing relevant data, with the example that follows:

“I had ordered a bed from (Overstock.com) and called them a few days later to ask a question about the delivery. The woman who answered my call didn’t ask me for a single piece of information, just “How can I help you?”. She already knew exactly who I was and what I had ordered…she told me that their system automatically gave her my profile based on my phone number.”

This particular example resonated with me, especially after dealing with certain cable companies who seem to keep all of their data in lockboxes and throw away the keys. The article went on to suggest that data silos hurt companies as much as customers by segmenting data and making it more difficult to understand the entire story of a particular use case. The article ends with a promise that it will follow up with more information on data harmony, and we can only hope that someone out there is listening.

Chelsea Kerwin, October 13, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Portals are Back, for All Devices through CMS

October 13, 2014

The article on Business Zone titled “Enterprise Portals: Your Company’s Internal Business Card” offers some tips on the perfect portal. That’s right, portals are back! The question is, will mobile users rely on them? The article suggests that the typically slow and overloaded portals need not be the rule for all portals. Due to their association with difficulty, many companies fail to spend the appropriate time and resources building clear and intuitive portals. The article states:

“Once the structure of the portal has been created, regular updating is key to avoiding clutter. In particular, platforms that have grown over time with little control or management, tend to get overcrowded with information quickly. It is essential to provide a positive user experience which is why regular audits and updates to the content are vital. Businesses should also make sure to have gatekeepers in place that control the amount and nature of content that is uploaded.”

Stressing the importance of relevance, the article puts forth the notion that a well-crafted enterprise portal can act as a “virtual colleague.” Using a Content Management System (CMS) can help corporations support portal use from a variety of devices. The automatic distribution of content to all devices could conceivably be an excellent step in streamlining the output of up-to-date information.

Chelsea Kerwin, October 13, 2014


New IBM Redbook: IBM Watson Enterprise Search and Analytics

October 12, 2014

The Redbook is free. You can download it from this IBM link for now. The full title is “IBM Watson Content Analytics. Discovering Actionable Insight from Your Content.”

The Redbook weighs in with 598 pages of Watson goodness. If you follow the IBM content analytics products, you may know that the previous version was known as IBM Content Analytics with Enterprise Search (ICAwES).

The Redbook presents some philosophical content. IBM has a tradition to uphold. In addition, the Redbook provides information about facets (yep, good old metadata), some mathy features that make analytics analytical, and sentiment analysis.

ICAwES does not operate as an island. The sprawling system can hook into IBM’s semi automatic classification system, Cognos, and interface tools.

Is ICAwES an “enterprise search” system? I would say, “Sure is.” You will have to work through the Redbook and draw your own conclusions. You will also want to identify the Watson component. Watson is Lucene with IBM scripts and wrappers, but IBM has far more colorful lingo for describing the system. After all, IBM Watson is supposed to generate $1 billion in a snappy manner. If IBM’s plan bears revenue fruit, in five or six years, Watson will be a $10 billion per year business. That’s quite a goal, considering Autonomy required 13 years to push into $800 million in revenue territory and IBM has been offering information retrieval systems since the days of STAIRS.

The new information in the July 2014 edition of the Redbook adds a chapter containing some carefully selected case studies. There is a new chapter called “Enterprise Search” to which I will return in a moment. Also, the many authors of the Redbook have added to the discussion of Cognos, one of IBM’s business intelligence systems. Finally, the Redbook provides some helpful suggestions for “customizing and extending the content analytics miner.”

I urge you to work through this volume because it provides a useful yardstick for measuring the IBM Watson marketing and public relations explanations against the reality, limitations, and complexity of the IBM Content Analytics system. Is the Redbook describing a product or a collection of components that an IBM implementation team will use to craft a customized solution?

The chapter on Enterprise Search begins on page 445 and continues to page 486. The solution is a two part affair. On one hand, processed content will output data about the entities, word frequencies, and similar metrics in the corpus and updates to the corpus. On the other hand, ICAwES is a search and retrieval system. Many vendors take this approach today; however, certain types of content cannot be comprehensively processed by the system. Examples include video content, engineering drawings, digital imagery, and certain types of ephemeral content such as text messages sent via an ad hoc Bluetooth mesh network. One can code up a fix, but that is likely to be more hassle than many licensees will tolerate.
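The two part approach described above can be illustrated with a toy example. This is a minimal sketch, not anything taken from the Redbook or from ICAwES: the function names (tokenize, build_index, search) and the sample documents are assumptions made up for illustration. One pass over the text yields both the corpus metrics (term frequencies) and the inverted index used for key word retrieval.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return (term_counts, inverted_index) for a dict of doc_id -> text."""
    term_counts = Counter()          # part one: metrics about the corpus
    inverted = defaultdict(set)      # part two: term -> ids of matching docs
    for doc_id, text in docs.items():
        for token in tokenize(text):
            term_counts[token] += 1
            inverted[token].add(doc_id)
    return term_counts, inverted


def search(inverted, query):
    """Return sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(inverted.get(t, set()) for t in terms))
    return sorted(hits)


# Hypothetical sample corpus, text only.
docs = {
    "d1": "IBM Watson content analytics",
    "d2": "Enterprise search with content analytics",
    "d3": "Video and engineering drawings",
}
term_counts, inverted = build_index(docs)
```

Note that the sketch only handles text. A video file or an engineering drawing would contribute nothing to either structure unless some other process first turned it into text, which is the limitation the paragraph above points to.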

The Redbook shows some ready-to-use interfaces. These can, of course, be modified. The sample in the screenshot below looks quite a bit like the original Fulcrum Technologies’ presentation of information processed by the system. A more modern implementation would be Amazon’s recent JSON centric system for content.

image

ICAwES Redbook, Copyright IBM 2014.

The illustration shows a record viewed by tags; for example, categories. Items can be tallied in a chart that provides a summary of how many content objects share a particular index term. The illustration shows the ICAwES identifying terms in a user’s query, identifying entities like IBM Lotus Domino, and other features associated with Autonomy IDOL or Endeca style systems. Both of these date from the late 1990s, so IBM is not pushing too far from the dirt path carved out of the findability woods by former leaders in enterprise search.

IBM provides information needed to implement query expansion. Yes, a dictionary lurks within the system, and an interface is provided so the licensee can be like Noah Webster. The system is rules based, and a specialist is needed to create or edit rules. As you may know, rules based systems suffer from several drawbacks. Rules have to be maintained, subject matter experts or programmers are usually required to make the proper judgments, and rules can drift out of phase with the users’ queries unless the system is monitored with above average rigor. Like Autonomy IDOL, skimp on monitoring and tuning, and the system can generate some interesting results.
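To make the maintenance burden concrete, here is a minimal sketch of dictionary driven query expansion. It is not IBM’s implementation; the SYNONYMS table, the function names, and the Boolean output format are all assumptions chosen for illustration. Each dictionary entry acts as a hand written rule, and a query term with no entry passes through unexpanded.

```python
# Hypothetical, hand-maintained expansion dictionary (the "rules").
SYNONYMS = {
    "laptop": ["notebook"],
    "ibm": ["international business machines"],
}


def expand_query(query, synonyms):
    """Return one list of alternatives per query term."""
    groups = []
    for term in query.lower().split():
        # A term without a rule is kept as-is; nothing is inferred.
        groups.append([term] + synonyms.get(term, []))
    return groups


def to_boolean(groups):
    """Render the expanded groups as a simple AND-of-ORs query string."""
    parts = []
    for alternatives in groups:
        # Quote multi-word alternatives so they read as phrases.
        rendered = [f'"{a}"' if " " in a else a for a in alternatives]
        if len(rendered) > 1:
            parts.append("(" + " OR ".join(rendered) + ")")
        else:
            parts.append(rendered[0])
    return " AND ".join(parts)


expanded = to_boolean(expand_query("IBM laptop", SYNONYMS))
```

If users start typing “ultrabook” and nobody adds a rule for it, the expansion silently does nothing for that term. That is the drift between rules and queries that requires ongoing monitoring.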

The provided user interface looks like this:

image

ICAwES Redbook, Copyright IBM 2014.

With many users wanting a “big red button” to simplify information access, this interface brings forward the high density displays associated with TeraText and similar legacy systems. The density seems to include hints of Attivio and BA Insight user interfaces as well. There are many choices available to the user. However, without special training, it is unlikely that a marketing professional using ICAwES will be able to make full use of query trees, category trees, and the numerous icons that appear in four different locations. I can hear the user now, “I want this system to be just like Google. I want to type in three words and scan the results.”

Net net. If you are working in an organization that favors IBM solutions, this system is likely to be what senior management licenses. Keep in mind that ICAwES will require the ministrations of IBM professional services, probably additional headcount, and on-going work to keep the system delivering useful results to users and decision makers.

The system delivers key word search, rich indexing, and basic metrics about the content. IBM offers more robust analytic tools in its SPSS product line. For more comprehensive text analysis, take a look at IBM i2 and Cybertap solutions if your organization has appropriate credentials for these somewhat more sophisticated information access and analysis systems.

After working through the Redbook, I had one question, “Where’s Watson?”

Stephen E Arnold, October 12, 2014

SRCH2: Security and Speed

October 12, 2014

Oracle’s Secure Enterprise Search offered advanced security. Perfect Search stressed its speed. SES has been marginalized. That particular security pitch did not work. Perfect Search also has faded from the scene.

Perhaps pitching both security and speed will yield more together than as separate features.

SRCH2 asserts that it is four times faster than open source search engines. None of the open source search engines is a speed demon. Speed boosts require additional work on the specific subsystem introducing the latency for a particular deployment.

SRCH2’s “Real Time Computer Requires Faster Search” makes a case for the optimization built in to SRCH2’s system. The article states:

SRCH2 offers the world’s fastest search engine. Why is speed so important? After all, the human eye can’t detect the difference between a 10-millisecond and 50-millisecond response time.

Some data backing this assertion would be helpful. In a direct comparison of Lucid Works’ technology with ElasticSearch’s technology, the ArnoldIT team found that one was faster in indexing and the other was faster in query processing. Both could be improved with focused optimization. Perhaps SRCH2 will share some of their data which backs up the “four times faster” claim? (I am not at liberty to release the performance data a client requested my team compile from live tests on my test corpus.)

SRCH2 also published “SRCH2 Introduces Access Control Lists to Improve Search Security.” The article states:

SRCH2 took the approach of providing native support of access control to set restrictions on search results. With SRCH2’s ACL feature, developers can restrict user permissions to access either certain records in an index, or specific attributes within a record or set of records.

The approach is useful. However, it is less robust than the Oracle approach which implemented a wider range of features provided by specialized Oracle subsystems.
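The two kinds of restriction mentioned in the SRCH2 passage, on whole records and on specific attributes within a record, can be sketched as a post-query filter. This is an illustration only and does not use or describe SRCH2’s actual API; the record layout (allowed_roles, field_acl), the role names, and the function filter_results are assumptions invented for the example.

```python
def filter_results(records, user_roles):
    """Apply record-level, then attribute-level, access checks."""
    visible = []
    for record in records:
        # Record-level ACL: drop the record unless the user holds
        # at least one of the roles allowed to see it.
        if not user_roles & record["allowed_roles"]:
            continue
        fields = {}
        for name, value in record["fields"].items():
            # Attribute-level ACL: a field with no entry in field_acl
            # is visible to anyone who can see the record.
            required = record.get("field_acl", {}).get(name)
            if required is None or user_roles & required:
                fields[name] = value
        visible.append({"id": record["id"], "fields": fields})
    return visible


# Hypothetical search hits with ACL metadata attached.
records = [
    {
        "id": 1,
        "allowed_roles": {"hr", "legal"},
        "fields": {"name": "Ann", "salary": 100},
        "field_acl": {"salary": {"hr"}},
    },
    {
        "id": 2,
        "allowed_roles": {"engineering"},
        "fields": {"title": "Spec"},
    },
]
```

With this data, a user holding only the “legal” role sees record 1 without the salary field, while a user holding “hr” sees the full record. A production system would more likely apply such checks inside the engine at query time rather than after the fact.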

Will the combination of security and speed pay off for SRCH2? Good question. I do not have an answer.

Stephen E Arnold, October 11, 2014

V.I. Arnold and His Math Teaching Ideas

October 11, 2014

Short honk: My relative, Vladimir Igorevich Arnold, worked with some fairly smart people; for example, Kolmogorov. If you want to get a sense of his ideas about math teaching, you may find “On Teaching Mathematics” interesting. Like most of my family’s work, one can improve by applying more effort. Demanding group, probably just like yours, gentle reader.

Stephen E Arnold, October 11, 2014

Whither Convera? Still Around

October 10, 2014

A happy quack to the reader who forwarded me a link to the biographical information for Gerald Burnand. I learned from this information page that Convera lives on as Ntent. The page reports:

Ntent. Privately Held; Search Technology, Semantics, Advertising and Marketing company. Previously was Vertical Search Works, born from the merger of Convera and Firstlight ERA (a UK company).

For those of you who want that old time goodness that was Excalibur Technologies, ConQuest, and Convera, navigate to www.ntent.com. A profile of Convera is available on the Xenky profiles page.

Stephen E Arnold, October 11, 2014

Amazon Learns from XML Adventurers

October 10, 2014

I recall learning a couple of years ago that Amazon was a great place to store big files. Some of the XML data management systems embraced the low prices and pushed forward with cloud versions of their services.

When I read “Amazon’s DynamoDB Gets Hugely Expanded Free Tier And Native JSON Support,” I formed some preliminary thoughts. The trigger was this passage in the write up:

many new NoSQL and relational databases (including Microsoft’s DocumentDB service) now use JSON-style document models. DynamoDB also allowed you to store these documents, but developers couldn’t directly work with the information stored in them. That’s changing today. With this update, developers can now use the AWS SDKs for Java, .NET, Ruby and JavaScript to easily map their JSON data to DynamoDB’s own data types. That turns DynamoDB in a fully-featured document store and is going to make life easier for many developers on the platform.
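For readers curious what mapping JSON data to DynamoDB’s own data types involves, here is a rough sketch. It does not call the AWS SDKs, which perform this mapping for the developer; the helper name to_attribute_value and the sample document are assumptions for illustration. The type descriptors it emits (S, N, BOOL, NULL, L, M) are the ones DynamoDB uses in its low-level attribute value format, and numbers are carried as strings. Sets, binary values, and edge cases in number formatting are ignored here.

```python
import json


def to_attribute_value(value):
    """Convert a parsed JSON value to a DynamoDB-style typed value."""
    if value is None:
        return {"NULL": True}
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported JSON value: {value!r}")


# Hypothetical JSON document.
doc = json.loads(
    '{"title": "Redbook", "pages": 598, "free": true, '
    '"tags": ["search", "analytics"]}'
)
item = {key: to_attribute_value(val) for key, val in doc.items()}
```

The point of the announcement quoted above is that developers no longer have to write this kind of plumbing themselves.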

Is JSON better than XML? Is JSON easier to use than XML? Is JSON development faster than XML? Ask an XML rock star and the answer is probably, “You crazy.” I can hear the guitar riff from Joe Walsh now.

Ask a 20 year old in a university programming class, and the answer may be different. I asked the 20 something sitting in my office about XML and he snorted: “Old school, dude.” I hire only people with respect for their elders, of course.

Here are the thoughts that flashed through my 70 year old brain:

  1. Is Amazon getting ready to make a push for the customers of Oracle, MarkLogic, and other “real” database systems capable of handling XML?
  2. Will Amazon just slash prices, take the business, and make the 20 year old in my office a customer for life just because Amazon is “new school”?
  3. Will Amazon’s developer love provide the JSON fan with development tools, dashboards, features, and functions that push clunky methods like proprietary XQuery messages into a reliquary?

No answers… yet.

Stephen E Arnold, October 10, 2014


The AIIM Enterprise Search Study 2014

October 10, 2014

I worked through the 34 page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, a then well known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. Because I was an officer at a company considered in the 1980s to be one of the leaders in online and semi automated indexing methods, Ms. Steiger and I had a great deal to discuss.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory of a microfilm association leading to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels the challenges facing the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at three findings from the 30 plus page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US, one just discards the 10 percent with 10 or fewer employees, and there is apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool. There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one on one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or specialized tools like those used for inventory, the engineering department with AutoCAD, or customer support. So, the search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution. But the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics have to remain in silos, which makes providing one system to handle them all a problem. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system.

Far better to use a text oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc. We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.

Finding: 47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only are the technical problems stretching the wizards at Google, the cost of generating meaningful “unified” search results is a tough nut to crack for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, for some vendors, their marketing efforts are making it more difficult for them to close deals and causing a broader push back against solutions that are known by the prospects to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.


Robot Writers Flood the Web

October 10, 2014

If you are reading this, it is likely that you look to the Internet for bits of news that inform your opinion on trends, technology, news stories, and the like. And most would assume that those stories and articles are crafted by humans who have an interest and experience in the field, just as this one is. But alas, we would all be wrong to believe that assumption. Robot writers are a growing proportion of the field. Read the details in the Contently article, “Does Your Brand Newsroom Need a Robot Writer?”

The article begins:

“If you’ve spent any time reading on the web the past week, odds are you’ve read something written by a robot—and you didn’t even realize it. Robot writers are algorithms that collect and analyze data and then turn them into readable narratives. Many news sites like the Los Angeles Times and Forbes are already using them. Even Wikipedia has articles that weren’t written by humans.”

It is not surprising that automation has invaded the world of writing, but the jury is still out as to whether the quality is acceptable. But this also poses a question about cultural expectations regarding the quality of writing, particularly on Web outlets. See if you can spot the difference between articles crafted by human experts versus those written by a robot.

Emily Rae Aldridge, October 10, 2014

