Search and Security: Old Wine Rediscovered

July 20, 2011

There is nothing like the surprise on a user’s face when an indiscriminate content crawl lets a person read confidential health or employment information. Overenthusiastic “search experts” often learn the hard way that conducting a thorough content audit *before* indexing content on an Intranet is a really good idea.

Computerworld’s new article “Security Manager’s Journal: The perils of enterprise search” offers insight into the dangers of sloppy search parameters, or what we call old wine rediscovered.

The author does a good job of addressing the security concerns that can pop up if an enterprise search is not well thought out.

 

If security concerns aren’t addressed, this is what you can expect: The IT team does some research, makes a choice, deploys the infrastructure and begins pointing it to data repositories. Before you know it, someone conducts a search with a term like “M&A” and turns up a sensitive document naming a company that’s being considered for acquisition, or a search for the word “salary” reveals an employee salary list that was saved in an inappropriate directory. In other words, people will be able to find all manner of documents that they shouldn’t have access to.

 

Thurman cites the “rule of least privilege,” the principle that information should be available only to those who need to know it. In enterprise search, that means queries should return only information that is relevant to the search and that the user is allowed to see.
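For readers who want the mechanics, the fix vendors peddle is usually called security trimming: check every hit against the user’s entitlements before anything renders. Here is a minimal Python sketch of the idea; the function, repository, and group names are our invention, not anything from the Computerworld piece.

    # Minimal security-trimming sketch: every hit is checked against the
    # querying user's group memberships before display. All names here are
    # hypothetical; real engines enforce this at index or query time.
    def trim_results(raw_hits, user_groups, acl_lookup):
        """Keep only the hits the user is entitled to see."""
        visible = []
        for hit in raw_hits:
            allowed = acl_lookup(hit["doc_id"]) or set()  # groups cleared to view
            if allowed & user_groups:                     # any overlap grants access
                visible.append(hit)
        return visible

    # Toy usage: the salary list stays invisible to an engineer.
    acls = {"salary-list.xls": {"hr"}, "newsletter.doc": {"all-staff"}}
    hits = [{"doc_id": "salary-list.xls"}, {"doc_id": "newsletter.doc"}]
    print(trim_results(hits, {"engineering", "all-staff"}, acls.get))
    # -> [{'doc_id': 'newsletter.doc'}]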

All in all, a rather informative if redundant read that outlines a few security options and ideas.

What we find interesting is that such write-ups have to be commissioned again and again. Not much sophistication in enterprise search land, we fear.

Stephen E Arnold, July 20, 2011

Sponsored by ArticleOnePartners.com, the source for patent research

Search Engine Optimization Thrashing

July 20, 2011

The addled goose is not into clicks. The goose is near retirement. The content he posts is primarily an aide-mémoire. The topics he covers are of interest to a few believers in precision and recall, not 20-something and faux consultant court jesters.

Adam Audette’s article on searchengineland.com, “Weighing In-House vs. Agency SEO Enterprise Search Strategies,” is a two-handed grasp at finding something that will work for the SEO crowd. Audette makes valid arguments for utilizing in-house SEOs as well as agencies.

The primary weakness of the in-house SEO role is that of myopia. Not in the sense of a lack of imagination, but in a pervading nearsightedness that’s almost inescapable. The in-house is so deeply immersed in her industry, her company, and her sites, that she can’t see the forest for the trees. Even worse, she becomes out of touch with where the industry is trending.

 

This conundrum is where an outside agency is beneficial. An agency’s workload is often much more diversified, and agencies have large pools of people and resources to contact and brainstorm with should the need arise; they are also the ones responsible for SEO on a daily basis.

Audette’s solution to the problem is a “dream team” built of in-house, agency, and consultant staff working together in harmony. I’m not entirely sure that it will play out that way.

Our view is that SEO can chew up a lot of cash for iffy results in the post-Panda world. In order to control costs, organizations clinging to the icons of the SEO faith will want to do the work in the cubes of the organization’s marketing department. For outfits flush with bucks to pump into third-party experts’ pockets, performance, not promises, is going to be needed.

Can the SEO industry survive and thrive? Absolutely. P.T. Barnum had it nailed.

Stephen E Arnold, July 20, 2011

Sponsored by Stephen E Arnold, author of The New Landscape of Enterprise Search


New Site from PolySpot

July 19, 2011

PolySpot Blog announces a new site in “PolySpot Information At Work sur le portail de l’Intelligence économique.” (“PolySpot Information At Work on the economic intelligence portal.”) If you don’t read French, you can run the article through Google Translate.

Or, you can just check out the site itself, PolySpot Information At Work. The intro page explains that it is based on four modules: one that extracts raw data; a structuring and semantic enrichment module; an indexed search service; and an administration module. The main applications of the site are listed as:

  • Company-wide (transverse) search
  • Business-oriented search
  • Information management
  • Intelligence and regulatory framework
  • Aid for the production of editorial content
  • Web site / Extranet
  • Search services included in a third-party application

Worth a look-see for those interested in business intelligence.
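To make the four-module description concrete, here is a hypothetical Python sketch of such a pipeline. The stage names mirror the modules listed above; nothing here reflects PolySpot’s actual code, and the administration module is left as a comment.

    # Hypothetical four-stage pipeline mirroring the modules PolySpot lists:
    # extraction -> enrichment -> indexing (an admin module would sit on top).
    def extract(source):
        """Pull raw records out of a source system."""
        return [{"text": doc} for doc in source]

    def enrich(records):
        """Add structure and semantic metadata to each record."""
        for rec in records:
            rec["tokens"] = rec["text"].lower().split()
        return records

    def index(records):
        """Build a toy inverted index for the search service."""
        inverted = {}
        for i, rec in enumerate(records):
            for tok in rec["tokens"]:
                inverted.setdefault(tok, set()).add(i)
        return inverted

    docs = ["Quarterly sales report", "Regulatory framework update"]
    idx = index(enrich(extract(docs)))
    print(idx["regulatory"])  # -> {1}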

PolySpot has been providing search and information access software to businesses since 2001. The company prides itself on its innovative, modular approach to meeting its clients’ information needs.

Cynthia Murrell, July 19, 2011

Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search.

Oracle, Sun Burned, and Solr Exposure

July 19, 2011

Frankly, we wondered when Oracle would move off the dime in faceted search. “Faceted search,” in my lingo, means showing users categories. You can fancy up the explanation, but a person looking for a subject may hit a dead end; the “facet” angle displays links to possibly related content. If you want to educate me, use the comments section for this blog, please.
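For the curious, facets are a first-class feature of Solr, the engine Lucid Imagination backs. A minimal sketch of a faceted query against Solr’s standard HTTP search API follows; the core name “docs” and the field “category” are placeholders we made up.

    import requests

    # Ask a local Solr instance for hits plus per-category counts (facets),
    # which a UI can render as links to related content.
    resp = requests.get(
        "http://localhost:8983/solr/docs/select",
        params={"q": "acquisition", "facet": "true",
                "facet.field": "category", "wt": "json"},
    ).json()

    # Solr returns facets as a flat [value, count, value, count, ...] list.
    flat = resp["facet_counts"]["facet_fields"]["category"]
    for value, count in zip(flat[::2], flat[1::2]):
        print(value, count)  # each pair becomes a clickable category link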

We are always looking for a solution to our clients’ Oracle “findability” woes. It’s not just relevance. Think performance. Query and snack is the operative mode for at least one of our technical baby geese. Well, Oracle is a bit of a red herring. The company is not looking for a solution to SES 11g functionality. Lucid Imagination, a company offering enterprise-grade search solutions, is.

The question is a compelling one: “Need a better way to perform targeted searches of an enterprise database?” Lucid Imagination suggests a solution in “Solr Searching with RDBMS.”

Writer Altan Khendup observes that pattern searches within a range of criteria are often addressed with standardized reports or custom-written relational queries. These tools, however, can be inadequate and high-maintenance. He suggests using the open-source application Solr, instead.

Khendup describes a job he performed for a large company with an existing customer relationship management (CRM) application. Users had no efficient way to pull specific information from the large database. Solr provided the key to bypassing the complex custom solution that would otherwise have been required. Check out the article for details.
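Khendup’s pattern, as we read it, boils down to pulling rows out of the relational store and pushing them into a Solr index. A bare-bones sketch, with an invented table and a local Solr core standing in for the real thing:

    import json
    import sqlite3
    import requests

    # Pull rows from a relational source (sqlite3 stands in for the CRM
    # database) and post them to a Solr core's JSON update handler.
    # Table, core, and field names are invented for this sketch.
    conn = sqlite3.connect("crm.db")
    rows = conn.execute("SELECT id, customer, notes FROM cases").fetchall()
    docs = [{"id": str(r[0]), "customer_s": r[1], "notes_t": r[2]} for r in rows]

    requests.post(
        "http://localhost:8983/solr/crm/update?commit=true",
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
    )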

Khendup concludes:

“With these new capabilities, answers to key questions can be found in seconds. Data can be mined quickly, efficiently and flexibly without a lot of specialized training for business users. Additionally, the indexes could be managed in such a way that additional data could be added to increase the scope of analysis, or subsets of data could be indexed and searched for specific business reasons such as service outages or legal reasons.”

We wonder: will the Oracle SES team embrace or resist this option? Right now, not likely. Going forward, *if* the open source litigation arena smiles on Dolphin Way, maybe.

Cynthia Murrell, July 19, 2011

Sponsored by—believe it or not—Quasar Capital Advisors, a next generation financial services and analytics firm. Check the company out: www.quasarca.com.

Exclusive Interview with Margie Hlava, Access Innovations

July 19, 2011

Access Innovations has been a leader in the indexing, thesaurus, and value-added content processing space for more than 30 years. Margie Hlava’s company has worked for most of the major commercial database publishers, the US government, and a number of professional societies.


See www.accessinn.com for more information about MAI and the firm’s other products and services.

When I worked at the database unit of the Courier-Journal & Louisville Times, we relied on Access Innovations for a number of services, including thesaurus guidance. Her firm’s MAI system and its supporting products deliver what most of the newly-minted “discovery” systems need: indexing that is accurate, consistent, and makes it easy for a user to find the information needed to answer a research or consumer-level question.

What few realize is that the value of the systems and methods developed by the taxonomy experts at Access Innovations lies in standards. Specifically, the Access Innovations approach generates an ANSI-standard term list. Without getting bogged down in details, an ANSI-compliant controlled term list embodies logical consistency and adherence to strict technical requirements. See the Z39.19 ANSI/NISO standard. Most of the 20-somethings hacking away at indexing fall far short of the quality of the Access Innovations implementations. Quality? Not in my book. Give me the Access Innovations (Data Harmony) approach.

Care to argue? I think you need to read the full interview with Margie Hlava in the ArnoldIT.com Search Wizards Speak series. Then we can interact enthusiastically.

On a rare visit to Louisville, Kentucky, on July 15, 2011, I was able to talk with Ms. Hlava about the explosion of interest in high-quality content tagging, the New Age word for indexing. Our conversation ranged from the roots of indexing to the future of systems which will be available from Access Innovations in the next few months.

Let me highlight three points from our conversation, interview, and enthusiastic discussion. (How often do I, in rural Kentucky, get to interact with one of the, if not the, leading figures in taxonomy development and smart, automated indexing? Answer: Not often enough.)

First, I asked how her firm fits into the landscape of search and retrieval.

She said:

I have always been fascinated with logic, and applying it to search algorithms was a perfect match for my intellectual interests. When people have an information need, I believe there are three levels to the resources which will satisfy them. First, the person may just need a fact checked. For this they can use an encyclopedia, a dictionary, etc. Second, the person needs what I call “discovery.” There is no simple factual answer, and one needs to be created or inferred. This often leads to a research project, and it is certainly the beginning point for research. Third, the person needs updating: what has happened since I last gathered all the information available? Ninety-five percent of search is either number one or number two. These three levels are critical to answering the user’s questions properly and determining what kind of search will support their needs. Our focus is to change search to found.

Second, I probed why indexing is such a hot topic.

She said:

Indexing, which I define as the tagging of records with controlled vocabularies, is not new. Indexing has been around since before Cutter and Dewey. My hunch is that librarians in Ephesus put tags on scrolls thousands of years ago. What is different is that it is now widely recognized that search is better with the addition of controlled vocabularies. The use of classification systems, subject headings, thesauri and authority files certainly has been around for a long time. When we were just searching the abstract or a summary, the need was not as great because those content objects are often tightly written. The hard sciences went online first and STM [scientific, technical, medical] content is more likely to use the same terms worldwide for the same things. The coming online of social sciences, business information, popular literature and especially full text has made search overwhelming, inaccurate, and frustrating. I know that you have reported that more than half the users of an enterprise search system are dissatisfied with that system. I hear complaints about people struggling with Bing and Google.
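An aside from the back office: rule-based tagging of the sort Ms. Hlava describes amounts, at its crudest, to matching document text against a controlled term list, synonyms included. Here is a toy Python sketch with an invented two-term vocabulary; real ANSI/NISO Z39.19 thesauri add hierarchy, related terms, and editorial rules this ignores.

    # Toy controlled-vocabulary tagger: map free text onto preferred terms.
    # The vocabulary is invented, and the crude substring matching below is
    # exactly what production systems replace with tokenized, rule-based logic.
    VOCAB = {
        "myocardial infarction": ["heart attack", "myocardial infarction"],
        "employee stock ownership plan": ["esop", "stock ownership plan"],
    }

    def tag(text):
        text = text.lower()
        return sorted(
            preferred
            for preferred, variants in VOCAB.items()
            if any(v in text for v in variants)
        )

    print(tag("The ESOP covered workers after the CEO's heart attack."))
    # -> ['employee stock ownership plan', 'myocardial infarction']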

Third, I queried her about her firm’s approach, which I know to be anchored in personal service and obsessive attention to detail to ensure the client’s system delivers exactly what the client wants and needs.

She said:

The data processed by our systems are flexible and free to move. The data are portable. The format is flexible. The interfaces are tailored to the content via the DTD for the client’s data. We do not need to do special programming. Our clients can use our system and perform virtually all of the metadata tasks themselves through our systems’ administrative module. The user interface is intuitive. Of course, we would do the work for a client as well. We developed the software for our own needs, and that includes needing to be up, running, and in production on a new project very quickly. Access Innovations does not get paid for down time. So our staff are trained. The application can be set up, fine-tuned, and deployed in production mode in two weeks or less. Some installations can take a bit longer. But as soon as we have a DTD, we can have the XML application up in two hours. We can create a taxonomy really quickly as well. So the benefits are: fast, flexible, accurate, high quality, and fun!

You will want to read the complete interview with Ms. Hlava. Skip the pretend experts in indexing and taxonomy. The interview answers the question, “Where’s the beef in the taxonomy burger?”

Answer: http://www.arnoldit.com/search-wizards-speak/access-innovations.html

Stephen E Arnold, July 19, 2011

It pains me to say it, but this is a freebie.

Lucid Imagination Suggests FAST Is Slowing Down

July 19, 2011

“Not So FAST,” quips Lucid Imagination. Unenthused about Microsoft’s treatment of its enterprise search acquisition, FAST, writer David M. Fishman reports on the latest development:

It seems that Microsoft is taking another step towards absorbing the technology once known as FAST Search and Transfer. Sadly, rather than absorb the expertise into the company, they’ve apparently furloughed the folks responsible for sales and marketing of the technologies known as FSIS and FSIA.

If this is true, the SharePoint centric search vendors are going to have a banner year.

Combined with the revelation back in February that Microsoft would limit the use of FAST to Windows users, this represents a poor showing by the software giant. We noted a post in another blog that seemed to suggest that Fast had crashed or was in the process of crashing. We don’t know what’s what with Fast. We find it fascinating that an open source search company is pointing to something that may reverberate through the enterprise search sector. Microsoft paid $1.2 billion for the Fast property in 2008. Now, 36 months later, chatter about clipping an oak tree while speeding on Highway 1? Yikes. I thought Stephen E Arnold, who pays me to write these items, suggested that Microsoft should have purchased Exalead. Well, Exalead, now owned by Dassault Systèmes, may be poised to make some sales calls on Fast licensees. But every search vendor on the planet may have the same idea if this explosive notion is valid.

Do we think Microsoft will be feeling warm and happy about the Lucid Imagination blog post? No clue. We are far from the action in rural Kentucky and darned happy about that. Hey, while we’re on the subject: check out our SharePoint coverage at www.sharepointsemantics.com.

Cynthia Murrell, July 19, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search


Belgium Google Dust Up

July 18, 2011

Short honk: The goose was incorrect. The goose believed that Google would not manually intervene in search results. The goose is shattered. Navigate to “After Copiepresse ‘Boycott,’ Google Restores Search of News Sites.” If the story is accurate, there seems to be an allegation that Google had imposed a “so-called boycott” of the Copiepresse newspapers. I thought Google was an algorithm baby. Now it seems that humans do shape search results. Implications? Lots. What about sites affected by Panda? Algorithm or manual intervention? What about relevance? Algorithm or human? What about big advertisers’ position in results sets? Algorithm or human?

Stephen E Arnold, July 18, 2011

Freebie

Breaking Relevance: The TrackMeNot Method

July 18, 2011

Okay, with ProQuest, Cambridge Scientific, and Dialog about to jump into the statistical fog of relevance, I feel pretty glum. Most old-school searchers prefer to type in explicit commands; for example:

b 15
ss cc=77? AND cc=76?? AND esop

(For the uninitiated: in Dialog syntax, “b 15” begins a search in file 15, “ss” runs a select-steps query, and “?” is a truncation wildcard.) When the new “fuzzified” version of the commercial search system arrives for ProQuest, Cambridge Scientific, and Dialog-type users, good luck with that. In the old commercial systems, the brute force Boolean approach returned consistent results, search in and search out. Take it to the bank.

Change is afoot, so queries will return somewhat unpredictable results depending on what pointers get jiggled in an index update.

If we shift to the free Web search engines, the notion of relevance is based on lots of “signals”. A signal is something that allows the search system to disambiguate or add context to an action. If you are running around an airport, the mobile search wizards want to look at your search history and hook those signals to your wandering GPS input. The result is search done for you.
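As a toy illustration of what signal mixing means (everything below is invented), a location signal can flip how an ambiguous query gets interpreted:

    # Toy signal mixing: an ambiguous query plus a location signal yields a
    # disambiguated intent. Entirely invented; real systems weigh dozens of
    # signals (history, GPS, time of day) with learned models.
    INTENTS = {
        ("gate", "airport"): "departure gate lookup",
        ("gate", "hardware store"): "fence gate, aisle 12",
    }

    def interpret(query, location_signal):
        return INTENTS.get((query, location_signal), "generic web results")

    print(interpret("gate", "airport"))  # -> departure gate lookup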

Why is relevance lousy? Well, search engine optimization is to blame. The focus on selling targeted ads is a contributor. And there are some interesting software tools that aim to confuse certain traffic analysis systems. So far, no one wants to confuse the ProQuest, Cambridge Scientific, and Dialog-type systems, but the Web search world is like catnip.

One of our readers alerted us to TrackMeNot, obfuscation software designed to defeat certain types of usage tracking. Here’s what the developers say:

TrackMeNot runs in Firefox as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and Bing. It hides users’ actual search trails in a cloud of ‘ghost’ queries, significantly increasing the difficulty of aggregating such data into accurate or identifying user profiles. To better simulate user behavior TrackMeNot uses a dynamic query mechanism to ‘evolve’ each client (uniquely) over time, parsing the results of its searches for ‘logical’ future query terms with which to replace those already used.
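For the mechanically minded, here is a rough Python sketch of the “evolving ghost query” idea the developers describe. Do not mistake it for TrackMeNot’s actual code; the real tool is a Firefox extension, and our fetch and parse steps are stubs.

    import random
    import time

    # Rough sketch of the evolving ghost-query loop: issue a random query,
    # harvest plausible terms from the results, and use them to seed the
    # next query. fetch_result_text() stands in for a real search request.
    seed_terms = ["weather", "recipes", "football", "gardening"]

    def fetch_result_text(query):
        """Stand-in for issuing the query and grabbing result snippets."""
        return f"{query} tips news best {random.choice(seed_terms)} guide"

    def ghost_query_loop(rounds=3):
        pool = list(seed_terms)
        for _ in range(rounds):
            query = random.choice(pool)
            snippets = fetch_result_text(query)
            # 'Evolve' the pool with terms parsed from the results.
            pool.extend(w for w in snippets.split() if len(w) > 4)
            print("ghost query:", query)
            time.sleep(1)  # low-priority background cadence

    ghost_query_loop()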

If you want to cover your search clicks, give it a whirl. Obfuscation methods, if used by lots of people, may have an adverse impact on relevance, particularly when personalization is enabled. Lucky me.

Stephen E Arnold, July 18, 2011

Sponsored by Pandia.com (www.pandia.com), publishers of The New Landscape of Enterprise Search.
