Yebol: A Goner, Folks

August 11, 2015

I received a couple of messages about Yebol. The brand name referenced a human and semantic search engine which disappeared in the 2009-2010 time period. The system has been associated with Hong Feng Yin. The buzzwords associated with the system were meme theory and optimization, clustering and classification, etc. I am not sure what has triggered references to the system, but my file data shows that this is a system that anticipated Qwant.com. After a PR and marketing push in 2009, the Yebol shout became muted. The comments and links to Xavier Lur’s write up are a joint in time. Even Wikipedia knows this cat’s nine lives have been exhausted.

Stephen E Arnold, August 11, 2015

Advice for Smart SEO Choices

August 11, 2015

We’ve come across a well-penned article about the intersection of language and search engine optimization by The SEO Guy. Self-proclaimed word-aficionado Ben Kemp helps website writers use their words wisely in, “Language, Linguistics, Semantics, & Search.” He begins by discrediting the practice of keyword stuffing, noting that search-ranking algorithms are more sophisticated than some give them credit for. He writes:

“Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. In the context of using related words or variations within your website, it is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases. By using synonyms, and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition and more accurately describe your topic or theme and at the same time, increase the range of word associations your website will rank for.”

Kemp goes on to lament the dumbing down of English-language education around the world, blaming the trend for a dearth of deft wordsmiths online. Besides recommending that his readers open a thesaurus now and then, he also advises them to make sure they spell words correctly, not because algorithms can’t figure out what they meant to say (they can), but because misspelled words look unprofessional. He even supplies a handy list of the most often misspelled words.

The development of more and more refined search algorithms, it seems, presents the opportunity for websites to craft better copy. See the article for more of Kemp’s language, and SEO, guidance.

Cynthia Murrell, August 11, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Flawed Search As a Tactic

August 10, 2015

I read “Why Facebook’s Video Theft Problem Can’t Last.” My initial reaction was, “Sure it can.” The main point of the write up struck me as:

But then popular YouTuber Hank Green leveled a number of allegations at Facebook’s video team, including a charge of rampant copyright infringement from Facebook users who are uploading videos from YouTube and other platforms without creators’ consent. Facebook has responded that it has measures in place to address copyright infringement, including allowing users to report stolen content and suspending accounts guilty of repeated violations.

I noted this statement:

For Facebook, video represents an irresistible new business opportunity. Early experiments with running natively inside the News Feed showed that it kept users on the site longer — and kept them from clicking external links that took them to YouTube and elsewhere.

Money and irresistible are words which flow.

The gem appears deep in the write up:

Facebook hasn’t made it easy for creators like Green to find instances of copyright infringement — there’s no way to filter Facebook searches for videos. And even if the stolen videos can be found, creators must fill out multiple forms, meaning it could be several days (and countless views) before a stolen video is taken down.

I find it interesting that search and retrieval may not do the trick. Then the bureaucratic process adds a deft touch.

I will file this item in my follow up folder. I know I can search my system for text files. Search which does not allow one to find information may be a tactic which serves other purposes. Is flawed search a business advantage? If one cannot find something, does that mean the “something” is not there?

Stephen E Arnold, August 10, 2015

The Girl with the Advert Tattoo

August 10, 2015

It looks like real publishing companies are now into tattoos or, at least, into leveraging ink’s growing popularity. The Verge reports, “The Desperate Book Industry and ‘Tatvertising’ are a Perfect, Tragic Match.” Reporter Kaitlyn Tiffany tells us that Hachette Australia put out the call for a model willing to be tattooed and photographed as part of a promotion for the next Stieg Larsson book, “The Girl in the Spider’s Web.” Tiffany likens the effort to a practice, widely considered predatory, that was common just after the turn of the millennium: websites paying those desperate for cash to have ads tattooed on them (sometimes on their faces!). But, hey, at least those people were paid good money; apparently the reward for this scheme was meant to be the tattoo itself. The article elaborates:

“But why the [heck] does it need to be a real tattoo? When reached for comment, a representative from Razor & JOY, the advertising agency in charge of the campaign, told me, ‘The character of Lisbeth doesn’t do things in half measures — and so we wanted our marketing to capture this passion.’ The representative also explained that the compensation for the woman who is cast would be something… less than monetary: ‘This campaign is an opportunity to give a truly passionate fan a free tattoo that is unique to a strong literary character.’ And a new type of degrading, unpaid labor in the publishing industry was born.”

I’m not sure I’d personally consider this scheme “predatory,” but apparently Tiffany was not alone in her outrage. I visited the link she supplies in her article, and was greeted with a take-back notice; it reads, in part, “The campaign was conceived with good intentions …  but some people have been offended. As this was never our intention, we have listened and we have decided we will not continue with the tattoo element of the campaign.” At least the company was wise enough to make a change in response to criticism. I wonder, though, what they will come up with next.

Cynthia Murrell, August 10, 2015

A Call for More Friendly Enterprise Search Results

August 10, 2015

An idea from ClearBox Consulting would bring enterprise search results in line with today’s online searches. The company’s blog asserts, “Enterprise Search? We Need Some Answers on a Card.” Writer Sam Marshall likes the way Google now succinctly presents key information about a user’s query in a “card” at the top of the results page, ahead of the old-school list of relevant links. For example, he writes:

“Imagine you want to know the time of the next train between two cities. When you type this into Google, the first hit isn’t a link to a site but a card like the one below. It not only gives the times but also useful additional information: a map, trip duration, and tabs for walking, driving, and cycling. Enterprise search isn’t like this. The same query on an intranet gives the equivalent of a link to a PDF containing the timetable for the whole region. It’s like saying ‘here’s the book, look it up yourself’. This is not only a poor user experience for the employee, but a direct cost to the employer in wasted time. I’d like to see enterprise search move away from results pages of links to providing pages of answers too, and cards are a powerful way of doing this.”

Marshall emphasizes some advantages of the card approach: the most important information is right there, separated from related but irrelevant data; cards work better on mobile devices; and cards are user-friendly. Besides, he notes, since this format is now popular with sites from Facebook to Twitter, users are becoming familiar with them.

The card concept could be enhanced, Marshall continues, by personalizing results to the individual, tapping into employee profiles or even GPS data. For more information, see the article; it uses a hypothetical query about paternity leave to illustrate its point well. Though enterprise search is not exactly known for living on the cutting edge of technology, developers would be foolish not to incorporate this (or a similar) efficient format.

Cynthia Murrell, August 10, 2015

IBM Spends to Make Watson Healthier, Hopefully Quickly

August 7, 2015

I noted the article “IBM Adds Medical Images to Watson, Buying Merge Healthcare for $1 Billion.” The company is in the content management business. Medical images are pretty much of a hassle whether in the good old fashioned film form or in digital versions. On the few occasions I have looked at murky gray or odd duck enhanced color images, I marveled at how a professional could make sense of the data displayed. Did this explanation trigger thoughts of IBM FileNet?

The image processing technology available from specialist firms permitting satellite or surveillance image analysis is a piece of cake compared to the medical imaging examples I reviewed. From my point of view, the nifty stuff available to an analyst looking at the movement of men and equipment was easier to figure out.

Merge delivers a range of image and content management services to health care outfits. The systems can work with on premises systems and park data in the cloud in a way that keeps the compliance folks happy.

According to the write up:

When IBM set up its Watson health business in April, it began with a couple of smaller medical data acquisitions and industry partnerships with Apple, Johnson & Johnson and Medtronic. Last week, IBM announced a partnership with CVS Health, the large pharmacy chain, to develop data-driven services to help people with chronic ailments like diabetes and heart disease better manage their health.

Now Watson is plopping down $1 billion to get a more substantive, image centric, and, dare I say it, more traditional business.

The idea I learned:

“We’re bringing Watson and analytics to the largest data set in health care — images,” John Kelly, IBM’s senior vice president of research who oversees the Watson business, said in an interview.

The idea, as I understand the management speak, is that Watson will be able to perform image analysis, thus allowing IBM to convert Watson into a significant revenue generator. IBM does need all the help it can get. The company has just achieved a milestone of sorts; IBM’s revenue has declined for 13 consecutive quarters.

My view is that the integration of the Merge systems with the evolving Watson “solution” will be expensive, slow, and frustrating to those given the job of making image analysis better, faster, and cheaper.

My hunch is that the time and cost required to integrate Watson and Merge will be an issue in six or nine months. Once the “integration” is complete, the costs of adding new features and functions to keep pace with regulations and advances in diagnosis and treatment will create a 21st century version of FileNet. (FileNet, as you, gentle reader, know, was a 2006 acquisition. At the time, nine years ago, IBM said that the FileNet technology would

“advance its Information on Demand initiative, IBM’s strategy for pursuing the growing market opportunity around helping clients capture insights from their information so it can be used as a strategic asset. FileNet is a leading provider of business process and content management solutions that help companies simplify critical and everyday decision making processes and give organizations a competitive advantage.”

FileNet was an imaging technology for financial institutions and a search system which allowed a person with access to the system to locate a check or other scanned document.)

And FileNet today? Well, like many IBM acquisitions it is still chugging along, just part of the services oriented architecture at Big Blue. Why, one might ask, was the FileNet technology not applicable to health care? I will leave you to ponder the answer.

I want to be optimistic about the upside of this Merge acquisition for the companies involved and for the health care professionals who will work with the Watsonized system. I assume that IBM will put on a happy face about Watson’s image analysis capabilities. I, however, want to see the system in action and have some hard data, not M&A fluff, about the functionality and accuracy of the merged systems.

At this moment, I think Watson and other senior IBM managers are looking for a way to make a lemon grove from Watson. Nothing makes bankers and deal makers happier than a big, out of the blue acquisition.

Now the job is to find a way to sell enough lemons to pay for the maintenance and improvement of the lemon grove. I assume Watson has an answer to ongoing costs for maintenance and enhancements, bug finding and stomping, and the PR such activities trigger. Yep, costs and revenue. Boring but important to IBM’s stakeholders.

Stephen E Arnold, August 7, 2015

Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call the “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015

Rocket AeroText Search: Stretching the Access Concept

August 6, 2015

I did a quick check on AeroText search. I assume that even the most jejune enterprise search expert is familiar with this system. What I noticed is that AeroText now moves beyond search into six separate functions. These reminded me of Fast Search & Transfer’s approach in the 2006-2007, pre-implosion period.

The six functions, which you can read about and request a demo of, are at this link. These are:

  1. Folio Views. The idea is that basic search and retrieval are provided by Rocket.
  2. Folio Builder. The idea is that information can be organized into folders for research purposes.
  3. Folio Publisher. A commercial publishing company can package its information and sell it in digital form.
  4. Folio Integrator. This is a software development kit.
  5. NXT Enterprise Server. This is the enterprise centric content processing and search system.
  6. NXT Professional Publishing Server. This is a “suite for storing, assembling, securing, and distributing content” which includes search.

If you have a copy of one of the first three editions of the Enterprise Search Report I wrote between 2003 and 2006, you will be able to check out the similarities. I present some of the Fast Search nomenclature in this 2012 article.

I find the marketing and positioning of Autonomy and Fast Search interesting. These companies’ themes are as fresh today as they were years ago.

Stephen E Arnold, August 6, 2015

Hey Google Doubters, Burn This into Your Memory

August 6, 2015

It has been speculated that Google would lose its ad profits as mobile search begins to dominate the search market, but Quartz tells a different story in the article, “Mobile Isn’t Ruining Google’s Search Business After All.” Google’s revenue continues to grow, especially with YouTube, but search remains its main earner.

According to the second-quarter earnings, Google earned $12.4 billion from Google Web sites, a $1.5 billion increase from last year. Google continues to grow on average $1.6 billion per quarter. Maintaining continuous growth shows that Google is weathering the shift to mobile search. Here is some other news: the mobile search revolution is now, not in the future.

“That is, if mobile really was going to squeeze Google’s search advertising business, we probably would have already seen it start by now. Smartphone penetration keeps deepening—with 75% saturation in the US market, according to comScore. And for many top media properties, half of the total audience only visits on mobile, according to a recent comScore report on mobile media consumption.”

There are new actions that could either impede or help Google search, such as deep linking between apps and the Web and predictive information services, but these are still brand new and their full effect has not been determined.

Google refuses to be left behind in the mobile search market and stands to be a main competitor for years to come.

Whitney Grace, August 6, 2015

 

Google: Technical Debt Has Implications for Some AI Cheerleaders

August 5, 2015

If you are interested in smart software, you may want to read “Machine Learning: the High Interest Credit Card of Technical Debt.” I like the credit card analogy. It combines big costs with what some folks see as a something-for-nothing feature of the modern world.

The write up is important because it makes clear the future cost of using certain machine learning methods. The paper helps explain why search and content processing companies often burn more cash than they have available.

The paper identifies specific cost points which most MBAs happily ignore or downplay in post mortems of failed search and content processing companies. The whiz kids, both boys and girls, rationalize their failure to deal with shifting boundaries, “dark dependencies,” expensive spaghetti, and the tendency of smart software to sort of drift off center.

There is a fix. It is just darned expensive, like credit card debt when the clueless consumer covers only the interest.

Applying the Google paper to search and content processing vendors, the only positive financial outcome is to sell the dog before it dies. Shift the search and content problem “credit card debt” to some other firm.

Perhaps that helps explain the Lexmark financial challenge and the dismay at Hewlett Packard as the reality of Autonomy dawned on those quick to spend billions.

Worth reading. Well done, Googlers.

Stephen E Arnold, August 5, 2015
