IDC: Knowledge Management and Knowledge Quotients

June 2, 2015

IDC tried to sell some of my work on Amazon without my permission. Much lawyering ensued, and IDC removed the $3,500, eight page, heavily edited report about Attivio. I suppose that is a form of my knowledge management expertise. But $3,500 for eight pages stripped of my caveats about Attivio? Goodness gracious. $3,500 for eight pages on Amazon, a company I describe as a digital Walmart.

I then wrote a humorous (to me) analysis of an IDC report about something called a knowledge quotient. You can read that Swiftian write up at this link: http://arnoldit.com/wordpress/honk/ . I write a column about knowledge management, and I found the notion of the KQ intellectually one of the lighter, almost diaphanous, IDC information molecules.

Am I too harsh? No, because now there is more evidence for my tough love approach to IDC and its KQ content marketing jingoism.

Navigate to “Where to for Knowledge Management in 2015: IDM Reader Survey.” The survey may or may not be spot on. Some of the data undermine the IDC KQ argument and raise important questions about those who would “manage knowledge.” Also, I had to read the title a couple of times to figure out what IDC’s expert was trying to communicate. The “where to for” is particularly clumsy to me.

I noted this passage:

“The challenge is for staff being able to find the time to contribute and leverage the knowledge/information repositories and having technology systems that are intuitive putting the right information that their fingertips, instead of having to wade through the sea of information spam.”

Ah, ha. KM is about search.

Wait. Not so fast. I highlighted this statement:

Technology is making it easier to integrate systems and connect across traditional boundaries, and social media has boosted people’s expectations for interaction and feedback. The result is that collaboration across the extended value chain is becoming the new normal.

Yikes. A revelation. KM is about social collaboration.

No, no. Another speed bump. I marked this insight too:

“There is also a fair gap between knowledge of the theoretical and knowledge of how things actually work. It is easy to say we should assign metadata to information to increase its discovery but if that metadata should really be more of a folksonomy, some systems and approaches are far too restrictive to enable this. Semantics is also a big issue.”

Finally. KM is about indexing and semantics. Yes, the info I needed.

Wrong again. I circled this brilliant gem:

“Knowledge management has probably lost it momentum as the so-called measurement tools are really measuring best practice which in turn is an average. Perhaps the approach should be along the lines of “Communities of Process” where there is a common objective but various degrees and level of participation but collectively provide a knowledge pool,” he [survey participant] observed.

The write up continues along this rocky road of generalizations and buzzwords.

The survey data make three things clear to me:

  • The knowledge quotient jargon is essentially a scoop of sales Jello, Jack Benny’s long suffering sponsor
  • Knowledge is so broad a concept that IDC’s attempt to clarify it gave me the giggles
  • Workers know that knowledge has value, so workers protect it with silos.

I assume that experts cooked up the knowledge quotient notion. The pros running the survey reported data which suggest that knowledge management is a bit of a challenge.

Perhaps IDC experts will coordinate their messaging in the future? In my opinion, two slabs of spam do not transmogrify into prime rib.

Little wonder IDC is unable to manage its own contracts: one of its officers (Dave Schubmehl) resold my research on Amazon without my permission at $3,500 per eight pages, edited to remove the considerations my team concluded Attivio warranted. Then an IDC research unit provides data which strike me as turning the silly KQ thing into search engine optimization corn husks.

Is IDC able to manage its own knowledge processes using its own theories and data? Perhaps IDC should drop down a level and focus on basic business processes? Yet IDC’s silos appear before me, gentle reader, and the silos are built from hefty portions of a mystery substance. Could it be consulting spam, to use IDC’s own terminology?

Stephen E Arnold, June 2, 2015

 

Prepare To Update Your Cassandra

June 2, 2015

It is time for an update to Apache’s headlining open source database software! The San Diego Times let us know that “DataStax Enterprise 4.7 Released,” and the release has a slew of updates set to make open source search enthusiasts drool. DataStax is a company built around the open source Apache Cassandra software. The company specializes in enterprise applications for search and analytics.

The newest release of DataStax Enterprise 4.7 includes several updates to improve a user’s enterprise experience:

“…includes a production-certified version of Cassandra 2.1, and it adds enhanced enterprise search, analytics, security, in-memory, and database monitoring capabilities. These include a new certified version of Apache Solr and Live Indexing, a new DSE feature that makes data immediately available for search by leveraging Cassandra’s native ability to run across multiple data centers.”

The update also includes DataStax’s OpCenter 5.2 for enhanced security and encryption.  It can be used to store encryption keys on servers and to manage admin security.

The enhanced search capabilities are the real bragging points: fault-tolerant search operations (used to customize responses to failed searches), intelligent search query routing (queries are routed to the fastest machines in a cluster for the quickest response times), and extended search analytics (using Solr search syntax and Apache Spark, search and analytics tasks can run simultaneously).
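The query routing idea is simple to sketch. Here is a toy Python illustration of latency-based routing; the node names and timings are invented, and real DataStax routing is, of course, far more sophisticated than picking the lowest recent average.

```python
# Toy sketch of "intelligent search query routing": send the query to the
# replica with the lowest mean recent latency. All values are invented.
from statistics import mean

latency_log = {
    "node-a": [12.0, 15.0, 11.0],   # milliseconds for recent queries
    "node-b": [40.0, 38.0, 42.0],
    "node-c": [20.0, 18.0, 22.0],
}

def fastest_node(log):
    """Return the node with the lowest mean recent latency."""
    return min(log, key=lambda node: mean(log[node]))

print(fastest_node(latency_log))  # node-a
```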

DataStax Enterprise 4.7 improves enterprise search applications.  It will probably pull in users trying to improve their big data plans.  Has DataStax considered how its enterprise platform could be used for the cloud or on mobile computing?

Whitney Grace, June 2, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Semantic Search Failure Rate: 50% and There Is Another Watson Search System

June 1, 2015

The challenge of creating a semantic search system is a mini Mt. Everest during an avalanche. One of the highest profile semantic search systems was Siderean Software’s. The company went quiet several years ago. I thought about Siderean when I followed up on a suggestion made by one of the stalwarts who read Beyond Search.

That reader sent me a link to a list of search systems. The list appeared on AI3. I could not determine when the list was compiled. To check the sticking power of the companies/organizations on the list, we looked up each vendor.

The results were interesting. Half of the listed companies were no longer in the search business.

Here’s the full list and the Beyond Search researcher’s annotations:

  • Antidot Finder Suite: Commercial vendor
  • BAAGZ: Not available
  • Beagle++: Not available
  • BuddyFinder (CORDER): Search buddyspace and Jabber
  • CognitionSearch: Emphasis on monitoring
  • ConWeaver: Customer support
  • DOAPspace: Search not a focus of the site
  • EntityCube: Displays a page with a handful of ideographs
  • Falcons: Search system from Nanjing University
  • Ferret: Open source search library
  • Flamenco: A Marti Hearst search interface framework
  • HyperTwitter: Does not search current Twitter stream
  • LARQ: Redirects to Apache Jena, an open source Java framework for building Semantic Web and Linked Data applications
  • Lucene: Apache Lucene Core
  • Lucene-skos: Deprecated; points visitor to Lucene
  • LuMriX: Medical search
  • Lupedia: 404 error
  • OntoFrame: Redirect due to 404 error
  • Ontogator: Link to generic view based RDF search engine
  • OntoSearch: 404 error
  • Opossum: Page content not related to search
  • Picky: Search engine in Ruby script
  • Searchy: A metasearch engine performing a semantic translation into RDF; page updated in 2006
  • Semantic Search: 404 error
  • Semplore: 404 error
  • SemSearch: Keyword based semantic search; link points to defunct Google Code service
  • Sindice: 404 error
  • SIREn: 404 error
  • SnakeT: Page renders; service 404s
  • Swangler: Displays SemWebCentral.org; last update 2005
  • Swoogle: Search over 10,000 ontologies
  • SWSE: 404 error
  • TrueKnowledge: 404 error
  • Watson: Not IBM; searches semantic documents
  • Zebra: General purpose open source structured text indexing and retrieval engine
  • ZoomInfo: Commercial people search system

The most interesting entry in the list is the Watson system which seems to be operating as part of an educational institution.

Here’s what the Open.ac.uk Watson looks like:

image

IBM’s attorneys may want to see who owns what rights to the name “Watson.” Were IBM not busy with its Watson cookbook, this errant Watson might already have been investigated, eh, Sherlock?

Stephen E Arnold, June 1, 2015

Amazon and Elasticsearch

May 29, 2015

If you are curious about the utility of Elastic’s technology, you will find “Indexing Common Crawl Metadata on Amazon EMR Using Cascading and Elasticsearch” a useful article to review. The main idea is that Amazon made Elasticsearch do some circus tricks. The write up explains the approach, provides code snippets, and includes a couple of nifty graphics which help those zany Zonies figure out the implications of the data crunched. The takeaway is that Elasticsearch did something useful with content stored in everyone’s favorite magic wand, Hadoop. Why didn’t Amazon use LucidWorks (Really?)? Hmm. Good question.
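The article walks through doing this at scale with Cascading on EMR. As a much smaller illustration of the underlying idea, here is a toy inverted index over a few crawl metadata records; the URLs and titles are invented, and Elasticsearch builds a far richer version of this structure.

```python
# A toy inverted index over crawl metadata: map each term to the set of
# URLs whose title contains it. Records are invented for illustration.
from collections import defaultdict

records = [
    {"url": "http://example.com/a", "title": "open source search engines"},
    {"url": "http://example.com/b", "title": "hadoop cluster sizing"},
    {"url": "http://example.com/c", "title": "search relevance tuning"},
]

index = defaultdict(set)
for rec in records:
    for term in rec["title"].split():
        index[term].add(rec["url"])

print(sorted(index["search"]))
# ['http://example.com/a', 'http://example.com/c']
```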

Stephen E Arnold, May 29, 2015

Medical Tagging: No Slam Dunk

May 28, 2015

The taxonomy/ontology/indexing professionals have a challenge. I am not sure many of the companies pitching better, faster, cheaper—no, strike that—better automated indexing of medical information will want to be too vocal about a flubbed layup.

Navigate to “Coalition for ICD 10 Responds to AMA.” It seems that indexing even a relatively closed corpus is a sticky ball of goo. The issue is the coding scheme required by everyone who wants to get reimbursed and retain certification.

The write up quotes a person who is supposed to be in the know:

“We’d see 13,000 diagnosis codes balloon into 68,000 – a five-fold increase.” [Dr. Robert Wah of the AMA]

The idea is that the controlled terms are becoming obese, weighty, and frankly sufficiently numerous to require legions of subject matter experts and software a heck of a lot more functional than Watson to apply them “correctly.” I will let you select the definition of “correctly” which matches your viewpoint from this list of Beyond Search possibilities:

  • Health care administrators: Get paid
  • Physicians: Avoid scrutiny from any entity or boss
  • Insurance companies: Pay the least possible amount yet have an opportunity for machine assisted claim identification for subrogation
  • Patients: Oh, I forgot. The patients are of lesser importance.

You, gentle reader, are free to insert your own definition.
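Dr. Wah’s five-fold expansion is easy to illustrate in code: one legacy ICD-9 code can fan out to several more specific ICD-10 candidates, which is precisely what makes “correct” assignment hard. The abbreviated crosswalk below is for illustration only; real GEM mapping files are far larger, and the specific code pairings shown should not be relied upon.

```python
# Illustrative (not authoritative) ICD-9 to ICD-10 fan-out. One coarse
# legacy code can map to several more specific candidate codes.
ICD9_TO_ICD10 = {
    "250.00": ["E11.9"],                             # diabetes, uncomplicated
    "812.20": ["S42.201A", "S42.202A", "S42.209A"],  # humerus fracture variants
}

def candidates(icd9_code):
    """Return the possible ICD-10 codes for an ICD-9 code, if any."""
    return ICD9_TO_ICD10.get(icd9_code, [])

print(len(candidates("812.20")))  # 3
```

The coder (human or algorithmic) still has to pick the one candidate the chart supports, which is where the subject matter experts earn their pay.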

I circled this statement as mildly interesting:

As to whether ICD-10 will improve care, it would seem obvious that more precise data should lead to better identification of potential quality problems and assessment of provider performance. There are multiple provisions in current law that alter Medicare payments for providers with excess patient complications. Unfortunately, the ICD-9 codes available to identify complications are woefully inadequate. If a patient experiences a complication from a graft or device, there is no way to specify the type of graft or device nor the kind of problem that occurred. How can we as a nation assess hospital outcomes, pay fairly, ensure accurate performance reports, and embrace value-based care if our coded data doesn’t provide such basic information? Doesn’t the public have a right to know this kind of information?

Maybe. In my opinion, the public may rank below patients in the priorities of some health care delivery outfits, professionals, and advisers.

Indexing is necessary. Are the codes the ones needed? In an automatic indexing system, what’s more important: [a] Generating revenue for the vendor; [b] Reducing costs to the customer of the automated tagging system; [c] Making the indexing look okay and good enough?

Stephen E Arnold, May 28, 2015

Connotate Reveals There Are One Billion Web Sites

May 16, 2015

I did not know there were one billion Web sites. Here’s the Web page on Connotate’s Web site which puts me in the know:

image

Source: www.connotate.com

The figure has been bandied about by Internet Live Stats, Business Insider, and the Daily Mail. The number was hit in late 2014 and confirmed by “the inventor of the Internet.” I noted that no one asked Google, an outfit which has a reasonable log file of its crawling activities. Doesn’t Google “know” a number? If the GOOG does, it is not talking, or maybe the company is not returning phone calls from people asking, “How many Web sites make up the Internet?”

I navigated to Internet Live Stats on May 16, 2015, and noted this item of information:

image

I don’t want to rain on the parade, but the number is 900 million and apparently growing. The “number,” it seems, can vary. Internet Live Stats says:

We do expect, however, to exceed 1 billion websites again sometime in 2015 and to stabilize the count above this historic milestone in 2016.

So what? Frankly the one billion number is irrelevant to me. What is relevant is that a company using what I suppose is a sketchy number as a way to capture business is a good example of the marketing used by search and content processing vendors.

I know that generating organic, sustainable revenue from search and content processing, information access, and indexing software is very difficult.

The number of Web sites does not mean much, if anything. In an interview with BrightPlanet, I learned that savvy customers narrow the focus of their content acquisition and analysis. Less, it seems to me, may be more. Also, Darpa’s MEMEX project is designed to figure out the width, depth, and breadth of the Dark Net. Is it larger or smaller than the Clear Net?

I prefer value propositions and “marketing hooks” that do not equate size with importance or trigger the fear of not knowing what’s out there. But if it works, it is definitely okay in today’s pressurized sales environment.

There are a billion crazy search and content marketing assertions. Wait. Make that two billion.

Stephen E Arnold, May 16, 2015

HP Idol and Hadoop: Search, Analytics, and Big Data for You

May 16, 2015

I was clicking through links related to Autonomy IDOL. One of the links which I noted was to a YouTube video labeled “HP IDOL for Hadoop: Create a Smarter Data Lake.” Hadoop has become a synonym for making sense of Big Data. I am not sure what Big Data are, but I assume I will know when my eight gigabyte USB key cannot accept another file. Big Data? Doesn’t it depend on one’s point of view?

What is fascinating about the HP Idol video is that it carries a posting date of October 2014, which is in the period when HP was ramping up its anti-Autonomy legal activities. The video, I assumed before watching, would break from the Autonomy marketing assertions and move in a bold, new direction.

The video contained some remarkable assertions. Please, watch the video yourself because I may have missed some howlers as I was chuckling and writing on my old school notepad with a decidedly old fashioned pencil. Hey, these tools work, which is more than I can say for some of the software we examined last week.

Here’s what I noted with the accompanying screenshot so you can locate the frame in the YouTube video to double check my observation with the reality of the video.

First, there is the statement that in an organization 88 percent of its information is “unanalyzed.” The source is a 2012 study from Forrsights Strategy Spotlight: Business Intelligence and Big Data. Forrester, another mid tier consulting firm, produces these reports for its customers. Okay, a couple of years old research. Maybe it is valid? Maybe not? My thought was that HP may be a company which did not examine the data to which it had access about Autonomy before it wrote a check for billions of dollars. I assume HP has rectified any glitch along this line. HP’s litigation with Autonomy and the billions in write down for the deal underscore the problem with unanalyzed data. Alas, no reference was made to this case example in the HP video.

Second, Hadoop, a variant of Google’s MapReduce technology, is presented as a way to reap the benefits of cost efficiency and scalability. These are generally desirable attributes of Hadoop and other data management systems. The hitch, in my opinion, is that it is a collection of projects. These have been developed via the open source / commercial model. Hadoop works well for certain types of problems. Extract, transform, and load works reasonably well once the Hadoop installation is set up, properly resourced, and the Java code debugged so it works. Hadoop requires some degree of technical sophistication; otherwise, the system can be slow, stuffed with duplicates, and a bit like a Rube Goldberg machine. But the Hadoop references in the video are not a demonstration. I noted this “explanation.”
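Hadoop’s programming model is easy to miniaturize, even if operating a real cluster is not. Here is the classic map/reduce word count in plain Python; Hadoop industrializes this same pattern across many machines and adds the operational complexity noted above. The sample documents are invented.

```python
# Miniature map/reduce word count. A Hadoop mapper emits (key, 1) pairs per
# input split; reducers sum the pairs by key. Here both steps run in-process.
from collections import Counter
from itertools import chain

docs = ["big data changes everything", "big data lake"]

def mapper(doc):
    # Map step: one (word, 1) pair per token.
    return [(word, 1) for word in doc.split()]

def reducer(pairs):
    # Reduce step: sum counts for each key.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

counts = reducer(chain.from_iterable(map(mapper, docs)))
print(counts["big"], counts["data"])  # 2 2
```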

image

Third, HP jumps from the Hadoop segment to “what if” questions. I liked the “democratize Big Data” because “Big Data Changes everything.” Okay, but the solution is Idol for Hadoop. The HP approach is to create a “smarter data lake.” Hmmm. Hadoop to Idol to data lake for the purpose of advanced analytics, machine learning functions, and enterprise level security. That sounds quite a bit like Autonomy’s value proposition before it was purchased from Dr. Lynch and company. In fact, Autonomy’s connectors permitted the system to ingest disparate types of data as I recall.

Fourth, the next logical discontinuity is the shift from Hadoop to something called “contextual search.” A Gartner report is presented which states with Douglas MacArthur-like confidence:

HP Idol. A leader in the 2014 Gartner Magic Quadrant for Contextual Search.

What the heck is contextual search in a Hadoop system accessed by Autonomy Idol? The answer is SEARCH. Yep, a concept that has been difficult to implement for 20, maybe 30 years. Search is so difficult to sell that Dr. Lynch generated revenues by acquiring companies and applying his neuro-linguistic methods to these firms’ software. I learned:

The sophistication and extensibility of HP Autonomy’s Intelligent Data Operating Layer (Idol) offering enable it to tackle the most demanding use cases, such as fraud detection and search within large video libraries and feeds.

Yo, video. I thought Autonomy acquired video centric companies and the video content resided within specialized storage systems using quite specific indexing and information access features. Has HP cracked the problem of storing video in Hadoop so that a licensee can perform fraud detection and search within video libraries? My experience with large video libraries is that certain video, like surveillance footage, is pretty tough to process with accuracy. Humans, even academic trainees, can be placed in front of a video monitor and told, “Watch this stream. Note anomalies.” Not exciting but necessary because processing large volumes of video remains what I would describe as “a bit of a challenge, grasshopper.” Why is Google adding wild and crazy banners, overlays, and required metadata inputs? Maybe because automated processing and magical deep linking are out of reach? Perhaps HP has improved or overhauled Autonomy’s video analysis functions, and the Gartner analyst is reporting a major technical leap forward. Identifying a muzzle flash is different from recognizing a face in a flow of subway patrons captured on a surveillance camera, is it not?

image

I have heard some pre HP Autonomy sales pitches, but I can’t recall hearing that Idol can crunch flows of video content unless one uses the quite specialized system Autonomy acquired. Well, I have been wrong before, and I am certainly not qualified to be an analyst like the ones Gartner relies upon. I learned that HP Idol has a comprehensive list of data connectors. I think I would use the word “library,” but why niggle?

Fifth, the video jumps to a presentation of a “content hub.” The idea is that HP Idol provides visual programming tools. I assume an HP Idol customer will point and click to create queries. The queries will deliver outputs from the Hadoop data management system and the content which embodies the data lake. The user can also run a query and see a list of documents, which strikes me as exactly what many users no longer want to do to locate information. One can search effectively when one knows what one is looking for and that the needed information is actually in the index. The use case appears to be health care, and the video concludes with a reminder that one can perform advanced analytics. There is a different point of view available in this ParAccel white paper.

I understand the strengths and weaknesses of videos. I have been doing some home brew videos since I retired. But HP is presenting assertions about Autonomy’s technology which seem to be out of step with my understanding of what Idol, the digital reasoning engine, and Autonomy’s acquired video technology can actually do.

The point is that HP seems to be out marketing Autonomy’s marketing. The assertions and logical leaps in the HP Idol Hadoop video stretch the boundaries of my credulity. I find this interesting because HP is alleging that Autonomy used similar verbal polishing to convince HP to write a multi billion dollar check for a search vendor which had grown via acquisitions over a period of 15 years.

Stephen E Arnold, May 16, 2015

Semantic Search: The View from a Taxonomy Consultant

May 9, 2015

My team and I are working on a new project. With our Overflight system, we have an archive of memorable and not so memorable factoids about search and content processing. One of the goslings who was actually working yesterday asked me, “Do you recall this presentation?”

The presentation was “Implementing Semantic Search in the Enterprise,” created in 2009, which works out to six years ago. I did not recall the presentation. But the title evoked an image in my mind like this:

image

I asked, “How is this germane to our present project?”

The reply the gosling quacked was, “Semantic search means taxonomy.” The gosling enjoined me to examine this impressive looking diagram:

image

Okay.

I don’t want a document. I don’t want formatted content. I don’t want unformatted content. I want on point results I can use. To illustrate the gap between dumping a document on my lap and presenting something useful, look at this visualization from Geofeedia:

image

The idea is that a person can draw a shape on a map, see the real time content flowing via mobile devices, and look at a particular object. There are search tools and other utilities. The user of this Geofeedia technology examines information in a manner that does not produce a document to read. Sure, a user can read a tweet, but the focus is on understanding information, regardless of type, in a particular context in real time. There is a classification system operating in the plumbing of this system, but the key point is the functionality, not the fact that a consulting firm specializing in taxonomies is making a taxonomy the Alpha and the Omega of an information access system.

The deck starts with the premise that semantic search pivots on a taxonomy. The idea is that a “categorization scheme” makes it possible to index a document even though the words in the document may not be the words in the taxonomy.

image
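That premise, indexing a document under terms it never uses, can be sketched with a simple synonym ring. The taxonomy below is invented and tiny; real categorization systems layer statistics and linguistics on top of such lists.

```python
# Sketch of categorization when document words differ from taxonomy labels:
# a synonym ring maps free text tokens to controlled terms. Terms invented.
TAXONOMY = {
    "automobile": {"car", "sedan", "vehicle"},
    "aircraft": {"plane", "jet", "airliner"},
}

def categorize(text):
    """Return the controlled terms whose synonyms appear in the text."""
    words = set(text.lower().split())
    return sorted(label for label, syns in TAXONOMY.items() if words & syns)

print(categorize("The jet taxied past a parked sedan"))
# ['aircraft', 'automobile']
```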

For me, the slide deck’s argument was off kilter. The mixing up of a term list and semantic search is the evidence of a Rube Goldberg approach to a quite important task: Accessing needed information in a useful, actionable way. Frankly, I think that dumping buzzwords into slide decks creates more confusion when focus and accuracy are essential.

At lunch the goslings and I flipped through the PowerPoint deck which is available via LinkedIn Slideshare. You may have to register to view the PowerPoint deck. I am never clear about what is viewable, what’s downloadable, and what’s on Slideshare. LinkedIn has its real estate, publishing, and personnel businesses to which to attend, so search and retrieval is obviously not a priority. The entire experience was superficially amusing but on a more profound level quite disturbing. No wonder enterprise search implementations careen into a swamp of cost overruns and angry users.

Now creating taxonomies, or what I call controlled term lists, can be a darned exciting process. If one goes the human route, there are discussions about what term maps to what word or phrase. Think buzz group, discussion group, and online collaboration. What terms go with what other terms? In the good old days, these term lists were crafted by subject matter and indexing specialists. For example, the guts of the ABI/INFORM classification coding terms originated in the 1981-1982 period and were the product of more than 14 individuals, one advisor (the now deceased Betty Eddison), and the begrudging assistance of the Courier Journal’s information technology department, which performed analyses of the index terms and key words in the ABI/INFORM database. The classification system was reasonably sound, and it was licensed by the Royal Bank of Canada, IBM, and some other savvy outfits for their own indexing projects.

As you might know, investing two years in human and some machine inputs was an expensive proposition. It was the initial step in the reindexing of the ABI/INFORM database, which at the time was one of the go to sources of high value business and management information culled from more than 800 publications worldwide.

The only problem I have with the slide deck’s making a taxonomy a key concept is that one cannot craft a taxonomy without knowing what one is indexing. For example, you have a flow of content through and into an organization. In a business engaged in the manufacture of laboratory equipment, there will be a wide range of information. There will be unstructured information like Word documents prepared by wild eyed marketing associates. There will be legal documents artfully copied and pasted together from boiler plate. There will be images of the products themselves. There will be databases containing the names of customers, prospects, suppliers, and consultants. There will be information that employees download from the Internet or tote into the organization on a storage device.

The key concept of a taxonomy has to be anchored in reality, not an external term list like those which used to be provided by Oracle  for certain vertical markets. In short, the time and cost of processing these items of information so that confidentiality is not breached is likely to make the organization’s accountant sit up and take notice.

Today many vendors assert that their systems can intelligently, automatically, and rapidly develop a taxonomy for an organization. I suggest you read the fine print. Even the whizziest taxonomy generator is going to require some baby sitting. To get a sense of what is required, track down an experienced licensee of the Autonomy IDOL system. There is a training period which requires a cohesive corpus of representative source material. Sorry, no images or videos accepted but the existing image and video metadata can be processed. Once the system is trained, then it is run against a test set of content. The results are examined by a human who knows what he or she is doing, and then the system is tuned. After the smart system runs for a few days, the human inspects and calibrates. The idea is that as content flows through the system  and periodic tweaks are made, the system becomes smarter. In reality, indexing drift creeps in. In effect, the smart software never strays too far from the human subject matter experts riding herd on algorithms.
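“Indexing drift” sounds abstract, but it is measurable. A crude way to watch for it is to compare machine-assigned terms against a human-judged gold set on a schedule; when the mismatch rate climbs, it is time for the subject matter experts to recalibrate. The documents and terms below are invented.

```python
# Crude indexing-drift check: the fraction of documents whose machine-assigned
# term set no longer matches a human gold standard. Data invented.
gold = {"doc1": {"finance"}, "doc2": {"energy"}, "doc3": {"finance"}}
machine = {"doc1": {"finance"}, "doc2": {"marketing"}, "doc3": {"finance"}}

drift = sum(1 for doc in gold if machine[doc] != gold[doc]) / len(gold)
print(round(drift, 2))  # 0.33
```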

The problem exists even when there is a relatively stable core of technical terminology. The content of a lab gear manufacturer is many times greater than the problem of a company focusing on a specific branch of engineering, science, technology, or medicine. Indexing Halliburton nuclear energy information is trivial when compared to indexing more generalized business content like that found in ABI/INFORM or the typical services organization today.

I agree that a controlled term list is important. One cannot easily resolve entities unless there is a combination of automated processes and look up lists. An example is figuring out if a reference to I.B.M., Big Blue, or Armonk is a reference to the much loved marketers of Watson. Now handle a transliterated name like Anwar al-Awlaki and its variants. This type of indexing is quite important. Get it wrong and one cannot find information germane to a query. When one is investigating aliases used by bad actors, an error can become a bad day for some folks.
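The look up list approach I mention can be sketched in a few lines. The alias table below is invented and tiny; production entity resolution adds fuzzy matching and transliteration logic on top of such tables.

```python
# Minimal list-based entity resolution: normalize known aliases to a
# canonical name, falling back to the raw mention. Table is illustrative.
ALIASES = {
    "i.b.m.": "IBM",
    "big blue": "IBM",
    "armonk": "IBM",
    "anwar al-awlaki": "Anwar al-Awlaki",
    "anwar al-aulaqi": "Anwar al-Awlaki",  # transliteration variant
}

def resolve(mention):
    """Map a raw mention to its canonical entity name, if known."""
    return ALIASES.get(mention.strip().lower(), mention)

print(resolve("Big Blue"))  # IBM
```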

The remainder of the slide deck rides the taxonomy pony into the sunset. When one looks at information created 72 months ago, it is easy for me to understand why enterprise search and content processing has become an “oh, my goodness” problem in many organizations. I think that a mid sized company would grind to a halt if it needed a controlled vocabulary which matched today’s content flows.

My take away from the slide deck is easy to summarize: Putting the cart before the horse won’t get enterprise search where it must go to retain credibility and deliver utility.

Stephen E Arnold, May 9, 2015

Lightcrest Cloud Nine: Does Nirvana Come from Commodity Plumbing?

May 7, 2015

Lightcrest seems to want to be a major player in the enterprise search market. Recently the company’s senior management has posted links to LinkedIn enterprise search discussion groups. The president is Zach Fierstadt, and I wanted to read some of his other contributions to the search and content processing discussions I follow.

The Metaphors Used to Sell Search in the Cloud

I read “Cloud Nine Is a Private Cloud.” To me, Cloud Nine evokes a somewhat imprecise connotation; specifically, “heaven” and “a utopia of pleasure.” The notion of a utopia of pleasure makes me uncomfortable because promising wondrous outcomes from jargonized technology often comes to no good end.

Definitions

The Urban Dictionary’s word cloud  for Cloud Nine exacerbates my discomfort:

image

How do pleasure and technology link in hosted search services? Here’s a definition of pleasure from Google.

image

I noted that the definition flags use “for entertainment rather than business,” as in “pleasure boats.” I immediately think of Caligula’s Lake Nemi ships, the Gary Hart vessel Monkey Business, and the Xoogler’s death by heroin yacht Escape. Let me say that I am not calmed by how my mind relates metaphors of pleasure to information access.

Assertions

Now let’s look at the article “Cloud Nine Is a Private Cloud,” which is at this link, http://www.lightcrest.com/blog/2015/04/cloud-nine-is-a-private-cloud/. The author is Zach Fierstadt, who asserts:

Most public cloud providers are not tuned to provide you with full-stack support, including things like DevOps services and caching best-practices. This cost haunts CTOs in the form of sprawling staff requirements, whereby operational staff required to support a 24x7x365 operation grows as the infrastructure on the public cloud grows.

None of these references evoke any pleasure. I noodled over the reference to “DevOps,” which is a neologism. Like much jargon, the word “DevOps” blurs the distinction between two perfectly useful terms: Developers and Operations.

Hosting companies in general and Lightcrest in particular can, as I understand it, make a DevOp’s life into a digital utopia. Mr. Fierstadt writes:

The growth of private and hybrid cloud solutions is indicative of CIOs and CTOs realizing the economic benefits and performance optimizations associated with sophisticated cloud orchestration layered on top of single-tenant hardware. As your workloads and storage requirements grow, make sure your costs don’t blow your budget – and be sure to consider long-term alternatives that allow you to focus on your core business initiatives, and not on cloud operations or cloud economics.

Now this sounds pretty darned good. I like the parental tone and rhetoric of “make sure” and the attendant sentence structure as well. When I was in college, I knew one student who thought any polysyllabic stream of nonsense was the stuff of his Technicolor dreams. For me, references to “sophisticated,” “optimizations,” “workloads,” “costs,” “core business initiatives,” and the like are a substitute for facts, thought provoking commentary, and useful information. Lightcrest offers my hungry mind thin gruel.

Lightcrest’s Alleged Expertise

I did some poking around on the Lightcrest Web site and learned that when the verbiage is parsed, the company does a couple of things. These are:

  • Hosting
  • Consulting.

Before I could see the sun through the psychedelic cloud of marketing silliness, I learned that Lightcrest has expertise in the following search and content processing systems. You can find the list at this link. Lightcrest, the Cloud Nine technology operation, can provide “expertise” for:

  • Document management search
  • eCommerce search
  • Intranet search
  • Web indexing.

When it comes to expertise, which means skill or knowledge in a particular field, Lightcrest makes other search-centric outfits look a bit like also-rans. Please, check out this collection of systems which the Cloud Nine organization can make bark, sit, roll over, and fetch the newspaper:

  • Attivio enterprise search
  • Autonomy and Verity. (I thought that Hewlett Packard had moved Autonomy to the cloud and repositioned it as something other than enterprise search. I am confused.)
  • Custom indexers and support. (What is a custom indexer? Does Lightcrest have proprietary crawling, parsing, and querying technology? Isn’t that important? Doesn’t an outfit with gargantuan expertise have a fact sheet about these functions?)
  • Endeca search and business intelligence. (Isn’t Oracle the owner of Endeca? Why is Endeca separate from Oracle? What happened to Endeca as an eCommerce search system? I must be senile.)
  • LucidWorks (Really?)
  • Microsoft Fast ESP (Enterprise Search Platform) and FDS 4.x. (which I thought was shorthand for Fire Dynamics Simulator. Shows how little search expertise I have.)
  • Oracle Enterprise Search (Is this Secure Enterprise Search, Oracle Text, or functionality from InQuira, TripleHop, or RightNow? No matter. Expertise is easy to say, but I think it might be slightly more difficult to deliver.)
  • Solr, Lucene, Nutch, Mahout, and Hadoop. (Are Mahout and Hadoop software delivering functions other than enterprise information retrieval?)
  • Sphinx and MySQL full text searching.

Some Considerations

Frankly, I have grave doubts about this organization’s expertise in these areas, for several reasons:

First, the oddball mix of search systems pairs apples with quite old oranges. The square pegs are not in the square spaces. Round pegs sit precariously in the gaps designed for squares.

Also, the logic of this list of search engines escapes me. I thought Mahout was software for machine learning and data mining, not information retrieval. How does one support and host software that is difficult to obtain from the owners of its intellectual property, such as Fast ESP or Verity?

The reference to “custom indexers” is interesting. Is Lightcrest able to index the Deep Web like BrightPlanet, or like Recorded Future with its monitoring of Tor exit nodes? I wonder whether Lightcrest has comparable technical horsepower for this type of work. Based on my experience with BrightPlanet and Recorded Future, I would suggest that Lightcrest is nosing into quite rarified territory without setting forth credentials which give me confidence in the company’s ability to deliver. What exactly are “custom indexers”? Am I able to apply these to a list of Tor sites and cross tabulate retrieved data with targeted clear Web crawls?
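The phrase invites a concrete question: what does any indexer, custom or otherwise, minimally do? It maintains an inverted index mapping each term to the documents that contain it, then intersects posting lists at query time. A minimal sketch in Python follows; the class, documents, and method names are my own illustrative inventions, not anything Lightcrest has documented:

```python
from collections import defaultdict

class InvertedIndex:
    """Minimal inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc ids}
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        """Tokenize naively on whitespace and record postings."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def query(self, *terms):
        """AND query: return ids of documents containing every term."""
        sets = [self.postings[t.lower()] for t in terms]
        return set.intersection(*sets) if sets else set()

idx = InvertedIndex()
idx.add("d1", "private cloud hosting")
idx.add("d2", "enterprise search in the cloud")
print(sorted(idx.query("cloud")))           # ['d1', 'd2']
print(sorted(idx.query("cloud", "search"))) # ['d2']
```

A production indexer adds tokenization rules, ranking, and incremental updates on top of this core, which is presumably where any genuinely “custom” work would live.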

In my opinion and without evidence, facts, and concrete examples, the Lightcrest assertions are search engine optimization outputs.

The CEO as a Thought Leader

At least in the LinkedIn enterprise search “space,” Zach Fierstadt has attracted modest attention with his one-sentence, link-only posts. Mr. Fierstadt wrote a non-search article in 2003 titled “10G Matures” for Computerworld. He has a brief profile or “entry” on Google Plus, ZoomInfo, Stocktwits, and a number of other social media sites. He made this statement in 2010:

“Look, there are a lot of search solutions out there; but few cut the mustard when it comes to delivering sub-second performance at a reasonable price point. Lucene/Solr is the only platform that gives us the economy of scale needed to provide enterprise-grade search within our hosting model. By leveraging our expertise in deploying search within the enterprise, Lightcrest will be able to provide search solutions to smaller and mid-sized businesses that currently find proprietary platforms to be cost prohibitive.”

What’s up with Lightcrest? Lightcrest walks gently, almost as if the company were weightless and massless. Maybe content marketing or just social media shotgunning? The company’s blog archives reveal marketing activities in September 2013 and then gaps in the content flow until January 2014, September 2014, December 2014, and the recent efflorescence of marketing-oriented posts.

Bottom Line: Mass or Massless

Net net: Lightcrest may answer the question, “Is light a particle or a wave?” From what I understand about this company, there is mostly hand waving.

Stephen E Arnold, May 7, 2015

Cerebrant Discovery Platform from Content Analyst

May 6, 2015

A new content analysis platform boasts the ability to find “non-obvious” relationships within unstructured data, we learn from a write-up hosted at PRWeb, “Content Analyst Announces Cerebrant, a Revolutionary SaaS Discovery Platform to Provide Rapid Insight into Big Content.” The press release explains what makes Cerebrant special:

“Users can identify and select disparate collections of public and premium unstructured content such as scientific research papers, industry reports, syndicated research, news, Wikipedia and other internal and external repositories.

“Unlike alternative solutions, Cerebrant is not dependent upon Boolean search strings, exhaustive taxonomies, or word libraries since it leverages the power of the company’s proprietary Latent Semantic Indexing (LSI)-based learning engine. Users simply take a selection of text ranging from a short phrase, sentence, paragraph, or entire document and Cerebrant identifies and ranks the most conceptually related documents, articles and terms across the selected content sets ranging from tens of thousands to millions of text items.”
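Content Analyst’s engine is proprietary, but the LSI technique the release names is well documented: factor a term-document matrix with a truncated SVD, fold a query into the resulting low-rank “concept” space, and rank documents by cosine similarity there. A toy sketch with NumPy follows; the corpus, the rank k, and all names are invented for illustration, and this shows the generic method, not Cerebrant’s implementation:

```python
import numpy as np

# Toy corpus: four short "documents."
docs = ["search engine index", "cloud search hosting",
        "machine learning engine", "cloud hosting costs"]
terms = sorted({w for d in docs for w in d.split()})

# Term-document matrix A: rows are terms, columns are documents.
A = np.array([[d.split().count(t) for d in docs] for t in terms], float)

# Truncated SVD keeps k "concept" dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
doc_vecs = (np.diag(sk) @ Vtk).T  # each row: a document in concept space

def rank(query):
    """Return document indices, best conceptual match first."""
    q = np.array([query.split().count(t) for t in terms], float)
    q_vec = q @ Uk  # fold the query into the same concept space
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    return sims.argsort()[::-1]

print([docs[i] for i in rank("cloud hosting")])
```

Because matching happens in the reduced concept space rather than on raw terms, documents can rank highly without sharing any query words, which is the “non-obvious relationships” claim in practice.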

We’re told that Cerebrant is based on the company’s prominent CAAT machine learning engine. The write-up also notes that the platform is cloud-based, making it easy to implement and use. Content Analyst launched in 2004, and is based in Reston, Virginia, near Washington, DC. They also happen to be hiring, in case anyone here is interested.

Cynthia Murrell, May 6, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

