
Magic May Not Come From Pre-Made Taxonomies

June 17, 2015

There are hundreds of companies that advertise they can improve information access, retrieval, and accuracy for enterprise search by selling prefabricated taxonomies.  These taxonomies are industry specific and built with a one-size-fits-all approach rather than individualized for each enterprise search client.  It turns out that prefabricated taxonomies are not guaranteed to help enterprise search; in fact, they might be a waste of money.  The APQC Blog posted “Make Enterprise Search Magical Without Money,” which uses an infographic to explain how organizations can improve their enterprise search without spending a cent.

APQC found that “best-practice organizations don’t have significantly better search technology.  Instead, they meet employees’ search needs with superior processes and approaches to content management.”

How can it be done?

The three steps are quite simple:

  1. Build taxonomies that reflect how people actually think and work; this can be done with focus groups and by periodically reviewing taxonomies and metadata. This contributes to better and more effective content management.
  2. Use scope, metadata, and manual curation to ensure search returns the most relevant results; constantly review the taxonomies for ways to improve and monitor how users actually search.
  3. Clear out outdated, irrelevant, and duplicate content that’s cluttering up your search results; keep taxonomies updated so they continue to deliver accurate results.
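The third step is the most mechanical of the three, and it can be sketched as a small content-audit script. The document fields, the hashing approach, and the one-year staleness threshold below are illustrative assumptions, not anything APQC prescribes:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from hashlib import sha256

@dataclass
class Doc:
    doc_id: str
    body: str
    modified: datetime

def audit(docs, stale_after_days=365):
    """Flag duplicate and outdated documents for editorial review."""
    seen = {}                      # body hash -> first doc_id seen
    duplicates, stale = [], []
    cutoff = datetime.now() - timedelta(days=stale_after_days)
    for d in docs:
        digest = sha256(d.body.encode()).hexdigest()
        if digest in seen:
            # Exact duplicate of an earlier document
            duplicates.append((d.doc_id, seen[digest]))
        else:
            seen[digest] = d.doc_id
        if d.modified < cutoff:
            stale.append(d.doc_id)
    return duplicates, stale
```

A real repository would need fuzzier duplicate detection than an exact hash, but even this crude pass surfaces the clutter that drags down result quality.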

These are really simple editing steps, but the main problem organizations might have is actually implementing the steps.  Will they assign the taxonomy creation task to the IT department or information professionals?  Who will be responsible for setting up focus groups and monitoring usage?  Yes, it is easy to do, but it takes a lot of time.

Whitney Grace, June 17, 2015

Sponsored by, publisher of the CyberOSINT monograph


Apple: Search Not Important. The Mom Approach Is.

June 16, 2015

Imagine the difficulty of pitching a six-figure information retrieval service today. On one hand, there are suitable open source alternatives for search. On the other hand, there are users who prefer to do the 7-11 thing; that is, find a convenient source of good enough stuff.

That’s the Apple approach, which in my opinion, denigrates the importance of proactive online information access. Take what’s provided. Want a peach iced tea? The 7-11 has lemon iced tea. Take what’s available. Good enough.

Navigate to “Apple News Curation Will Have Human Editors and That Will Raise Important Questions.” I am not too concerned about publishers. That crowd lost my interest decades ago. What does interest me is that the article does not put the notion of humans making choices in the context of customers just accepting what an Apple, Facebook, or Google provides.

I did highlight this passage:

Curating lists of apps is one thing, but having Apple employees curate the news flow could be more controversial depending on how the company ultimately works with select publishers and surfaces content from its preferred sources.

Nor does the article connect the loss of interest in looking for information, and of the ability to figure out what is accurate or more nearly accurate, with the erosion of the research function. Search vendors take another whack on the snoot.

It is tough to sell search when the customers prefer to drop by a convenience store and let that experience dictate the set of things from which one chooses.

Stephen E Arnold, June 16, 2015

Instagram Promises Improved Search

June 15, 2015

Frustrated with the abysmal search functionality at Instagram? Rejoice, for Wired tells us that, soon, “Better Search Will Transform How You Use Instagram.” Instagram’s cofounder Mike Krieger admitted that it is currently difficult for users to discover many photos that would interest them, but also asserted the company knows it must do better. Why, then, wasn’t search a priority earlier in the company’s history, and why are they talking about this now? Writer Julia Greenberg informs us:

“All that could soon change, given that Instagram has Facebook on its team. The social media titan, which acquired Instagram in 2012, is targeting Google itself as it develops a robust search system to make both its own platform and the whole web searchable through its own app. But while Facebook users post links, status updates, news, opinions, and photos, Instagram is almost completely visual. That means Instagram needs to teach its search engine to see. Krieger said his team has worked on a project to better understand how to automate sight. ‘Computer vision and machine learning have really started to take off, but for most people the whole idea of what is a computer seeing when it’s looking at an image is relatively obscure,’ Krieger said.”

Ah, prodded by their Facebook overlords; makes sense. Instagram isn’t ready to hand the site over to algorithms entirely, though. Their human editorial team still works to help users find the best images. Apparently, they feel humans are more qualified to choose photos with the most emotional impact (go figure). Krieger sees Instagram developing into a “storytelling” destination, the place users go to connect with world events through images: “the real-time view into the world,” as Krieger puts it. We agree that implementing an effective search system should help toward that goal.

Cynthia Murrell, June 15, 2015

Sponsored by, publisher of the CyberOSINT monograph

Solcara Is The Best!  Ra Ra Ra!

June 15, 2015

Thomson Reuters is a world-renowned news syndicator, but the company also has its own line of search software called Solcara Federated Search, also known as Solcara SolSearch.  In a cheerleading press release, Q-resolve highlights Solcara’s features and benefits: “Solcara Legal Search, Federated Search And Know How.”  Solcara allows users to search multiple information resources, including intranets, databases, Knowledge Management, and library and document management systems.  It returns accurate results according to the inputted search terms or keywords.  In other words, it acts like an RSS feed combined with Google.

Solcara also has a search product specially designed for those in the legal profession and the press release uses a smooth reading product description to sell it:

“Solcara legal Search is as easy to use as your favorite search engine. With just one search you can reference internal documents and approved legal information resources simultaneously without the need for large scale content indexing, downloading or restructuring. What’s more, you can rely on up-to-date content because all searches are carried out in real time.”

The press release also mentions some other tools, case studies, and references the semantic Web.  While Solcara does sound like a good product and comes from a reliable news aggregator like Thomson Reuters, the description and organization of the press release make it hard to understand all the features and who the target consumer group is.  Do they want to sell to the legal profession and only that group, or do they want to demonstrate how Solcara can be adapted to all industries that digest huge amounts of information?  The importance of advertising is focusing the potential buyer’s attention.  This one jumps all over the place.

Whitney Grace, June 15, 2015
Sponsored by, publisher of the CyberOSINT monograph

More Semantic Search and Search Engine Optimization Chatter

June 10, 2015

I read “Understanding Semantic Search.” I had high hopes. The notion of Semantic Search as set forth by Tim Bray, Ramanathan Guha, and some other wizards years ago continues to intrigue me. The challenge has been to deliver high value outputs that generate sufficient revenue to pay for the plumbing, storage, and development good ideas can require.

I spent considerable time exploring one of the better known semantic search systems before the company turned off the lights and locked its doors. Siderean Software offered its Seamark system which could munch on triples and output some quite remarkable results. I am not sure why the company was not able to generate more revenue.

The company emphasized “discovery searching.” Vivisimo later imitated Siderean’s user input feature. The idea is that if a document required an additional key word, the system accepted the user input and added the term to the index. Siderean was one of the first search vendors to suggest that “graph search” or relationships would allow users to pinpoint content processed by the system. In the 2006-2007 period, Siderean indexed Oracle text content as a demonstration. (At the time, Oracle had the original Artificial Linguistics’ technology, the Oracle Text function, Triple Hop, and PL/SQL queries. Not surprisingly, Oracle did not show the search acquisition appetite the company demonstrated a few years later when Oracle bought Endeca’s ageing technology, the RightNow Netherlands-originated technology, or the shotgun marriage search vendor InQuira.)
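As a rough illustration of the triple-based “graph search” idea Siderean promoted (this is not Seamark’s actual API; the triples and query pattern are invented examples), matching relationship patterns over subject-predicate-object triples looks like this:

```python
# Tiny in-memory triple store illustrating relationship ("graph") search.
# All subjects, predicates, and objects here are invented for the sketch.
triples = [
    ("doc1", "mentions", "Oracle"),
    ("doc1", "written_by", "analyst_a"),
    ("doc2", "mentions", "Oracle"),
    ("doc2", "written_by", "analyst_b"),
]

def match(pattern):
    """Return triples matching a (subject, predicate, object) pattern.
    None acts as a wildcard in any position."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Graph-style query: which documents mention Oracle?
docs = [s for s, _, _ in match((None, "mentions", "Oracle"))]
```

Siderean’s user-input feature maps onto the same structure: accepting a user’s extra keyword is just appending another triple to the store, which is why updating the index on the fly was cheap.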

I also invested some time on behalf of a client in the semantic inventions of Dr. Ramanathan Guha. This work was summarized in Google Version 2.0, now out of print. Love those print publishers, folks.

Dr. Guha applied the features of the Semantic Web to plumbing which, if fully implemented, would have allowed Google to build a universal database of knowledge, serve up snippets from a special semantic server, and perform a number of useful functions. This work was done by Dr. Guha when he was at IBM Almaden and at Google. My analysis of Dr. Guha’s work suggests that Google has more semantic plumbing than most observers of the search giant notice. The reason, I concluded, was that semantic technology works behind the scenes. Dragging the user into OWL, RDF, and other semantic nuances does not pay off as well as embedding certain semantic functions behind the scenes.

In the “Understanding Semantic Search” write up, I learned that my understanding of semantic search is pretty much a wild and crazy collection of half truths. Let me illustrate what the article presents as the “understanding” function for addled geese like me.

  • Searches have a context
  • Results can be local or national
  • Entities are important; for example, the White House is different from a white house

So far, none of this helps me understand semantic search as embodied in the W3C standards nor in the implementation of companies like Siderean or the Google-Guha patent documents from 2007 forward.

The write up makes a leap from context to the question, “Are key words still important?”

From that question, the article informs me that I need to utilize schema markup. These are additional bits of code behind the page which provide information to crawlers and other software about the content the user sees on a rendering device.
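For readers who have not met this kind of markup, a schema.org JSON-LD block of the sort crawlers read can be generated like this. The property values below are placeholders of my own, not anything the write up supplies:

```python
import json

# Minimal schema.org Article markup; the author name and date are
# placeholder values, not data from the write up.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Understanding Semantic Search",
    "datePublished": "2015-06-10",
    "author": {"@type": "Person", "name": "Jane Doe"},
}

# Wrap the JSON-LD in the script tag a page embeds for crawlers.
snippet = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(article, indent=2)
)
print(snippet)
```

That is the extent of the machinery: structured assertions about the page, invisible to the human reader, consumed by the crawler.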

And that’s it.

So let’s recap. I learned that context is important via illustrations which show Google using different methods to localize or personalize content. The write up does not enumerate the different methods which use browser histories, geolocation, and other signals. The write up then urges me to use additional mark up.

I think I will stick with my understanding of semantics. My work with Siderean and my research for an investment bank provided a richer base of knowledge about the real world applications of semantic technology. Technology, I wish to point out, which can be computationally demanding unless one has sufficient resources to perform the work.

What is happening in this “Understanding Semantic Search” article is an attempt to generate business for search engine optimization experts. Key word stuffing and doorway pages no longer work very well. In fact, SEO itself is a problem because it undermines precision and recall. Spoofing relevance is not my idea of a useful activity.

For those looking to semantics to deliver Google traffic, you might want to invest the time and effort in creating content which pulls users to you.

Stephen E Arnold, June 9, 2015

Online Shopping Is Too Hard

June 10, 2015

Online shopping is supposed to drive physical stores out of business, but that might not be the case if online shopping is too difficult.  The Ragtrader article “Why They Abandon” explains that 45 percent of Australian consumers will not make an online purchase if they experience Web site difficulties.  The consumers, instead, are returning to physical stores to make the purchase.  The article mentions that 44 percent believe that traditional shopping is quicker if they know what to look for and 43 percent prefer in-store service.

The research comes from a Rackspace survey to determine shopping habits in New Zealand and Australia.  The survey also asked participants what other problems they experienced shopping online:

“42 percent said that there were too many pop-up advertisements, 34 percent said that online service is not the same as in-store and 28 percent said it was too time consuming to narrow down options available.”

These are understandable issues.  People don’t want to be hounded to purchase other products when they have a specific item in mind, and thousands of options are overwhelming to search through.  A digital wall is also daunting for people who prefer interpersonal relationships when they shop.  The survey may pinpoint online shopping weaknesses, but it also helps online stores determine the best ways to improve.

“ ‘This survey shows that not enough retailers are leveraging powerful and available site search and navigation solutions that give consumers a rewarding shopping experience.’ ”

People shop online for convenience, variety, lower prices, and deals.  Search is vital for consumers to narrow down their needs, but if they can’t navigate a Web site then search proves as useless as an expired coupon.


Whitney Grace, June 10, 2015
Sponsored by, publisher of the CyberOSINT monograph

Free Version of InetSoft Style Scope Agile Edition Available

June 10, 2015

The article titled “InetSoft Launches Style Scope Agile Edition for Dashboarding and Visual Analytics” on PRWeb tells of a free version of InetSoft’s visual analysis application. Business users will gain access to an interactive dashboard with an easy-to-use drag and drop sensibility. The article offers more details about the launch:

“Advanced visualization types ideal for multi-dimensional charting and point-and-click controls like selection lists and ranger sliders give greater abilities for data exploration and performance monitoring than a simple spreadsheet offers. Any dashboard or analysis can be privately shared with others using just a browser or a mobile device, setting the application apart from other free BI tools… Setting up the software will be straightforward for anyone with power spreadsheet skills or basic knowledge of their database.”

Drawbacks to the free version are mentioned, such as being limited to two concurrent users. Of course, the free version is meant to “showcase” the company’s technology, according to CMO Mark Flaherty. There is a demo available to check out the features of the free application. InetSoft has been working since 1996 to bring users intuitive solutions to business problems. This free version is specifically targeted at smaller businesses that might be unable to afford the full application.

Chelsea Kerwin, June 10, 2015

Sponsored by, publisher of the CyberOSINT monograph

Data From eBay Illustrates Pricing Quirk

June 8, 2015

The next time you go to sell or buy an item, pay attention to the message the price is sending. Discover reports that “The Last Two Digits of a Price Signal Your Desperation to Sell.” Researchers at UC Berkeley’s business school recently analyzed data from eBay, tracking original prices and speed of negotiations. Writer Joshua Gans shares a chart from the report, and explains:

“The chart shows that when the posted initial price is of a round number (the red dots), like $1,000, the average counteroffer is much lower than if it is a non-round number (the blue circles), like $1,079. For example, the graph suggests that you can actually end up with a higher counteroffer if you list $998 rather than $1,000. In other words, you are better off initially asking for a lower price if price was all you cared about. [Researchers] Backus et al postulate that what is going on here is ‘cheap talk’ – that is, an easy-to-make statement that may be true or untrue with no consequences for dishonesty – and not an otherwise reliable signal. There are some sellers who don’t just care about price and, absent any other way of signaling that to buyers, they set their price at a round number. Alternatively, you can think that the more patient sellers are using non-round numbers to signal their toughness. Either way, the last two digits of the price is cheap talk.”

Gans notes that prices ending in “99” are apparently so common that eBay buyers treat them the same as those ending in round numbers. The team performed a similar analysis on real estate sales data and found the same pattern: properties priced at round numbers sell faster. According to the write-up, real estate agents are familiar with the tendency and advise clients accordingly. Now you, too, can send and receive signals through prices’ last two digits.
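The round-number signal can be sketched as a trivial classifier. The cutoffs below (multiples of $100, plus the ubiquitous 99-endings Gans mentions) are my reading of the study’s categories, not its formal definitions:

```python
def price_signal(price: int) -> str:
    """Classify a listing price by its last two digits, mirroring the
    round-vs-non-round split in the eBay analysis. The cutoffs here are
    an assumption, not the study's exact definitions."""
    last_two = price % 100
    if last_two in (0, 99):
        return "flexible"  # cheap talk: invites lower counteroffers
    return "firm"          # precise price signals a tougher seller
```

By this rule, $1,000 and $1,099 read as flexible, while $998 and $1,079 read as firm, matching the pattern in the chart.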

Cynthia Murrell, June 8, 2015

Sponsored by, publisher of the CyberOSINT monograph

Your Mobile Is the Search Interface: In App or In Ept Revisited

June 7, 2015

I wrote an article for Information Today about the shift from having control of a search to being controlled by a search. The idea is that with an in app search function, the convenience makes the user in ept. With limited choices and the elimination of user defined filtering, the in app search converts searchers into puppets. The string puller is the ubiquitous and convenient search system.

I read “How Google Is Taking Search Outside the Box.” Nifty title but the opposite, in my opinion, is what’s “appening.” Search is now within the mobile device.

The write up asserts:

But search is still the heart of Google, even though the division that once went by that name is now called “Knowledge.” This reflects an evolution of Google search from something that pointed users to relevant websites to an all-knowing digital oracle that often provides answers to questions instantly (or sooner!) from a vast corpus of information called the Knowledge Graph. The whole ball of wax is threatened by the fact that the I/O of billions of users is now centered on mobile devices.

Google is indeed threatened with a search revenue problem. One way to address providing information to a user is to move the interaction inside the Google “walled garden.” In Google Version 2.0, now out of print, I explained that the walled garden allows Google to control the messages, information, and search results. The user consumes what Google delivers. Convenience for many is more important than taking control of the information provided.

For me, it is very, very difficult to run queries on a mobile device. The size and the keyboards present a problem. The results are difficult to manipulate. When I run queries from my desktop, I am able to move content, save it, output it, and process it using tools which are not available on a mobile device.

I do not accept mobile outputs as accurate. Last week I was in Prague. My mobile device connected in Frankfurt, Germany, and the system required two days to figure out that I was working in Prague. The few stabs I took at getting maps for Prague required lots of thumb typing.

The combination of half-cooked software, high-latency mobile systems for content delivery, and the paramount need to display unwanted information was evident.

The fix, for me, was to take my laptop computer, locate a “neutral” Wi-Fi connection, and take control of the queries.

Those who mindlessly consume what an in app experience delivers are racing toward ineptness. Sorry. Consuming information without considering what’s presented, ensuring that the output is one that meets the needs of the user, and performing the filtering function oneself are pretty dangerous behaviors.

Google is about revenue. The logic of the math club is not an approach designed to help out users. The problem is that consumers of information are not able to think about the objectivity, accuracy, or the relevance of the information.

Not good.

Stephen E Arnold, June 7, 2015

The Public Living Room

June 6, 2015

While much of the information that libraries offer is available via the Internet, many of their services are not.  A 2013 Gallup survey showed that over 90 percent of Americans feel that libraries are important to their communities.  The recent recession, however, forced local governments to cut library funding by 38 percent and the federal government by 19 percent.  Some library users see the “public living room” (a place to read, access computers, research, play games, etc.) as a last bastion for old technology and printed material.

Alternet’s article, “Why Libraries Matter More Than Ever In The Age Of Google,” highlights a new book by John Palfrey called BiblioTech that discusses how libraries can maintain their relevance and importance in communities.  Palfrey’s biggest argument is that humans are creating huge amounts of data, which is controlled by big and small tech companies.  These companies are controlling what information is available for consumption, while libraries offer people the ability to access any type of information free of charge.

Palfrey offers other reasons to continue using libraries: print and ink archives are more reliable than digital, physical communal space is important for communities and education, and librarians are vital components.

“These arguments, however, rely too heavily on the humans-are-better-than-technology rationale where “better” is measured by technological rather than humanistic standards. If librarians have a higher success rate than Amazon’s algorithm at recommending books, this might not be true forever. Does that mean we won’t need librarians at some point? No, the dilemma of disappearing libraries is not just about efficiency, it’s also about values. Librarians recommend books because they are part of a community and want to start a discussion among the people they see around them—to solve the world’s problems, but also just to have a conversation, because people want to be near each other. The faster technology improves and surpasses human capability, the more obvious it becomes that being human is not merely about being capable, it’s about relating to other humans.”

Palfrey’s views are described as ideological, and in many ways they are.  Politicians cut funding because they view libraries as archaic institutions and are blind to the inequity in information access. Libraries indeed need a serious overhaul, but, contrary to what the article suggests, it is not simply a matter of updating the buildings and collections.  It runs more along the lines of teaching people the importance of information and of free information access.

Whitney Grace, June 7, 2015

Sponsored by, publisher of the CyberOSINT monograph
