PolySpot Wins over OSEO with Enterprise Search

October 28, 2011

Paris-based PolySpot’s reliability, combined with its innovative technology, has paid off. The news release “OSEO Opts for a New Search Engine with PolySpot” details many of the specifics that made PolySpot stand out among the competition.

First, let’s look at the issues that prompted OSEO to make the switch. OSEO had a Java-based directory in addition to a search engine supplied with its open source content management system.

OSEO’s former service was characterized by the following:

Indexing of data was restricted to the intranet and the search engine picked up too much ‘noise’. The users, unable to locate required information quickly, were no longer satisfied with the existing search engine which offered basic functionality.

Frédéric Vincent, Information System and Quality Assurance Manager, champions the decision to use PolySpot Enterprise Search.

The functionalities that comprise an intuitive user interface make PolySpot’s Search stand out: users can now customize their internal search tool, see added-value tags related to their queries in a tag cloud, and access search without leaving their other applications.

We think it may be a prudent step to check out PolySpot’s solutions at www.polyspot.com.

Megan Feil, October 28, 2011

Sponsored by Pandia.com

Facebook and Semantic Search

October 27, 2011

Stories about Facebook search surface, then disappear. For years we have wondered why Facebook resists indexing the urls posted by its members. Our view is that for the Facebook crowd, this curated subset of Web pages would be a useful reference resource. With Facebook metadata, the collection could become quite interesting in a number of dimensions.

That has not happened yet, but the ongoing war between Web giants Facebook and Google does not seem to be stopping at social media.

Last spring, Facebook was beavering away at a semantic search engine based on the company’s Open Graph system and the metadata it collects on every user. Few companies have the ability to build a semantic search engine, but at Facebook’s scale (over 400 million users), the company has the ability to create something huge. We learn more in AllFacebook’s article, “Facebook Seeks To Build the Semantic Search Engine”:

There are a number of standards that have been created in the past, as some developers have pointed out, microformats being the most widely accepted. However, the reduction of friction for implementation means that Facebook has a better shot at more quickly collecting the data. The race is on for building the semantic web, now that developers and website owners have the tools to implement this immediately.
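The low friction is worth a quick illustration. Below is a minimal sketch (ours, not AllFacebook’s) of the mechanics: a site owner adds a handful of Open Graph meta tags, and any crawler can recover structured metadata with a few lines of parsing code. The sample page and URLs are hypothetical.

```python
# Extract og:* properties from a page's <meta> tags. The sample page is
# a made-up example; og:title, og:type, og:url, and og:image are the
# basic properties defined by the Open Graph protocol.
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><head>
  <meta property="og:title" content="Beyond Search" />
  <meta property="og:type" content="website" />
  <meta property="og:url" content="http://example.com/" />
  <meta property="og:image" content="http://example.com/goose.png" />
</head><body>...</body></html>
"""

class OpenGraphParser(HTMLParser):
    """Collect Open Graph properties from meta tags."""
    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:"):
            self.properties[prop] = attrs.get("content", "")

parser = OpenGraphParser()
parser.feed(SAMPLE_PAGE)
print(parser.properties)
# {'og:title': 'Beyond Search', 'og:type': 'website', ...}
```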

The source document appeared in April 2011, and here we are in the run-up to Turkey Day with no semantic search system in sight. Now we are wondering: has Facebook concluded that search is yesterday’s business, or is the company struggling with the implementation of semantic technology in a social space?

We will keep watching.

Andrea Hayden, October 27, 2011

Sponsored by Pandia.com

Microsoft on Semantic Search

October 25, 2011

We were interested to learn that semantic search is alive and kicking. A helping hand may be needed, but semantic search is not on life support.

Microsoft is making baby steps toward more user-friendly services, particularly in the realm of semantic search. The MSDN Library offers information and assistance for developers using Microsoft products and services. While browsing the site, I came across one reference article that is particularly useful.

“Semantic Search (SQL Server)” is a write-up which is still in its “preview” stage, so it is short and has a few empty links, but it provides quite a bit of insight and examples that are very useful for someone attempting to integrate Statistical Semantic Search into SQL Server databases. This process, we learn, extracts and indexes statistically relevant key phrases and uses these phrases to identify and index documents that are similar or related. A user queries these semantic indexes by using Transact-SQL rowset functions.

The document tells us:

Semantic search builds upon the existing full-text search feature in SQL Server, but enables new scenarios that extend beyond keyword searches. While full-text search lets you query the words in a document, semantic search lets you query the meaning of the document. Solutions that are now possible include automatic tag extraction, related content discovery, and hierarchical navigation across similar content. For example, you can query the index of key phrases to build the taxonomy for an organization, or for a corpus of documents.
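To make the rowset function idea concrete, here is a minimal sketch of how one might query these semantic indexes from Python via pyodbc. The SEMANTICKEYPHRASETABLE and SEMANTICSIMILARITYTABLE functions come from the preview documentation; the DocStore database, Documents table, Content column, and connection string are our hypothetical stand-ins.

```python
# A sketch of querying SQL Server's semantic indexes from Python via
# pyodbc. Statistical Semantic Search assumes a full-text index created
# WITH STATISTICAL_SEMANTICS on the indexed column; the schema here is
# hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;DATABASE=DocStore;"
    "Trusted_Connection=yes;"
)
cursor = conn.cursor()

# "Automatic tag extraction": the top key phrases for document 1.
cursor.execute("""
    SELECT TOP 10 keyphrase, score
    FROM SEMANTICKEYPHRASETABLE(Documents, Content, 1)
    ORDER BY score DESC
""")
for keyphrase, score in cursor.fetchall():
    print(f"{keyphrase}: {score:.3f}")

# "Related content discovery": documents statistically similar to document 1.
cursor.execute("""
    SELECT TOP 5 matched_document_key, score
    FROM SEMANTICSIMILARITYTABLE(Documents, Content, 1)
    ORDER BY score DESC
""")
for doc_key, score in cursor.fetchall():
    print(f"document {doc_key}: {score:.3f}")
```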

The article goes on to explain various features of semantic search, such as finding key phrases in a document, finding similar or related documents, and even finding the key phrases that make documents similar or related. Add in coverage of storage, installation, and indexing, and we have a good “how-to” move for Microsoft. With Powerset, Fast Search, and Cognition Technologies, Microsoft should be one of the aces in semantic search.

Andrea Hayden, October 25, 2011

Sponsored by Pandia.com

Google and the Perils of Posting

October 21, 2011

I don’t want to make a big deal out of a simple human mistake from a button click. I just had eye surgery, and it is a miracle that I can [a] find my keyboard and [b] make any function on my computers work.

However, I did notice this item this morning and wanted to snag it before it magically disappeared due to mysterious computer gremlins. The item in question is “Last Week I Accidentally Posted”, via Google Plus at this url. I apologize for the notation style, but Google Plus posts come with the weird use of the “+” sign, which is a killer when running queries on some search systems. Also, there is no title, which means this is more of a James Joyce type of writing than a standard news article or even a blog post from the addled goose in Harrod’s Creek.

To get some context, you can read my original commentary in “Google Amazon Dust Bunnies.” My focus in that write-up is squarely on the battle between Google and Amazon, which I think is a more serious confrontation than the unemployed English teachers, aging hippies turned consultants, and the failed yet smarmy Web masters who have reinvented themselves as “search experts” think.

Believe me, Google versus Amazon is going to be interesting. If my research is on the money, the problems between Google and Amazon will escalate to, and may surpass, the tension that exists between Google and Oracle, Google and Apple, and Google and Viacom. (Well, Viacom may be different because that is a personal and business spat, not just big companies trying to grab the entire supply of apple pies in the cafeteria.)

In the Dust Bunnies write up, I focused on the management context of the information in the original post and the subsequent news stories. In this write up, I want to comment on four aspects of this second post about why Google and Amazon are both so good, so important, and so often misunderstood. If you want me to talk about the writer of these Google Plus essays, stop reading. The individual’s name which appears on the source documents is irrelevant.

1. Altering or Idealizing What Really Happened

I had a college professor, Dr. Philip Crane, who told us in history class in 1963, “When Stalin wanted to change history, he ordered history textbooks to be rewritten.” I don’t know if the anecdote is true or not. Dr. Crane went on to become a US congressman, and you know how reliable those folks’ public statements are. What we have in the original document and this apologia is a rewriting of history. I find this interesting because the author could use other methods to make the content disappear. My question: “Why not?” And, “Why revisit what was a pretty sophomoric tirade involving a couple of big companies?”

2. Suppressing Content with New Content

One of the quirks of modern indexing systems such as Baidu, Jike, and Yandex is that once content is in the index, it can persist. As more content on a particular topic accretes “around” an anchor document, the document becomes more findable. What I find interesting is that despite the removal of the original post, the secondary post continues to “hook” to discussions of that original post. In fact, the snippet I quoted in “Dust Bunnies” comes from a secondary source. I have noted, and adapted to, “good stuff” disappearing as a primary document. The only evidence of a document’s existence is then the secondary references. As these expand, the original item becomes more visible and more difficult to suppress. In short, the author of the apologia is ensuring the findability of the gaffe. Fascinating to me.

3. Amazon: A Problem for Google

Read more

Q-Sensei: Multi-Dimensional Information Management

October 6, 2011

I found the MarketWatch story or news release “Frost & Sullivan Recognizes Q-Sensei’s Innovative Enterprise Search Platform for Providing Relevant Search Results across Information Sources” a buzzword bonanza. The system seems more versatile than Autonomy’s, Exalead’s, and Apache Lucene combined, if I believe the story or news release. I am confident some of the azure chip crowd and the former librarians laboring away as search experts will gobble the hook and its plastic worm. Geese eat bread crumbs and trash, by the way.

Before getting to the meat of the story or news release, I noted this sub head:

Q-Sensei Leverages Its Proprietary Search and Indexing Engine to Offer High-Performance, Multi-Dimensional Information Management Capabilities to Deliver Quick ROI for Enterprises.

The story or news release explains that the system:

  • Won an award for innovation
  • Provides a “holistic, multi-dimensional, real time view of enterprise data.” (Repetition of the word does not help my understanding, however.)
  • Offers unified access to structured, semi-structured, and unstructured data
  • Provides a simple interface
  • Offers the user an ability to collate data available in varied formats across different resources.

I will be talking about the meanings of “real time” and some of the weaknesses these systems hide under a pile of marketing brochures. I find the notion of “data dimensions” interesting, but I am not sure what that means. One of the challenges many systems have is proper time identification. A file stamp is one time, but when the document was written to a storage device is another. There is also the interesting challenge of a document changed offline and then, a week later, written to the device monitored by a system. Presumably Q-Sensei can handle these different time issues.
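To see why time identification is slippery, consider this small sketch (our illustration, not Q-Sensei’s code) of the several candidate timestamps a single file already carries:

```python
# A single file exposes multiple timestamps, and none of them is
# necessarily when the document was actually written.
import os
import time

def candidate_times(path):
    st = os.stat(path)
    return {
        "modified (mtime)": time.ctime(st.st_mtime),   # last content change
        "metadata change / created (ctime)": time.ctime(st.st_ctime),
        "accessed (atime)": time.ctime(st.st_atime),
    }

for label, stamp in candidate_times(__file__).items():
    print(f"{label}: {stamp}")

# An authoring date embedded in the document itself (e.g., Office
# metadata) may disagree with all three, and a file edited offline and
# copied to a monitored share a week later carries yet another
# "arrival" time.
```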

The write up also tosses in the MBA tattoo, ROI. Search is being embedded, morphed, and marginalized. I am not sure how one calculates ROI in an organization today, particularly if the company is losing money or disappointing investors who have watched their cash disappear. I suppose there is a negative ROI, but that is not mentioned in the story or news release.

If you want more information about this “easy to implement” system, navigate to the firm’s Web site. You can get more lingo like “haystacks”, multi-dimensional, and ROI. Q-Sensei’s Search and Presentation Engine is protected by U.S. Patents 7,080,059 and 7,680,777. More information can be found at www.qsensei.com.

Stephen E Arnold, October 6, 2011

Sponsored by Pandia.com

Observations about Content Shaping

October 3, 2011

Writer’s Note: Stephen E Arnold can be surprising. He asked me to review the text of his keynote speech at the ISS World Americas October 2011 conference, which is described as “America’s premier intelligence gathering and high technology criminal investigation conference.” Mr. Arnold has moved from government work to a life of semi-retirement in Harrod’s Creek. I am one of the 20-somethings against whom he rails in his Web log posts and columns. Nevertheless, he continues to rely on my editorial skills, and I have to admit I find his approach to topics interesting and thought-provoking. He asked me to summarize his keynote, which I attempted to do. If you have questions about the issues he addresses, he has asked me to invite you to write him at seaky2000 at yahoo dot com. Prepare to find a different approach to the content mechanisms he touches upon. (Yes, you can believe this write-up.) If you want to register, point your browser at www.issworldtraining.com.— Andrea Hayden

The manipulation of research results is not a topic that is new to the era of the Internet. Individuals have manipulated information in record keeping and research for ages. People want to (and can) affect how and what information is presented. Information can also be manipulated not just by people, but by the accidents of numerical recipes.

However, even though this is not a new issue, information manipulation in this age is much more frequent than many believe, and the information we are trying to gather is much more accessible. I want to answer the question, “What do information analysts need to know about this interesting variant of disinformation?”

The volume of data in a digital environment means that algorithms or numerical recipes process content in digital form. The search and content processing vendors can acquire as much or as little content as the system administrator wishes.


In addition to this, most people don’t know that all of the leading search engines specify what content to acquire, how much content to process, and when to look for new content. This is where search engine optimization comes in. Boosting a ranking in a search result is believed to be an important factor for many projects, businesses, and agencies.
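For readers who want a concrete picture, here is an illustrative, entirely hypothetical crawl policy showing the three knobs just mentioned: what to acquire, how much to process, and when to revisit. Real engines expose analogous settings to their administrators.

```python
# A toy crawl policy, not any vendor's actual configuration format.
from dataclasses import dataclass, field

@dataclass
class CrawlPolicy:
    allowed_domains: list = field(default_factory=list)  # what to acquire
    max_documents: int = 100_000                          # how much to process
    recrawl_interval_hours: int = 24                      # when to look again

    def permits(self, url: str, documents_seen: int) -> bool:
        """Is this url in scope, and is there budget left to process it?"""
        in_scope = any(domain in url for domain in self.allowed_domains)
        return in_scope and documents_seen < self.max_documents

policy = CrawlPolicy(allowed_domains=["example.com"], max_documents=500)
print(policy.permits("http://example.com/report.html", documents_seen=0))   # True
print(policy.permits("http://elsewhere.net/page.html", documents_seen=0))   # False
```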

Intelligence professionals should realize that conforming to the Webmaster guidelines set forth by Web indexing services results in a grade much like the scoring of an essay against a set rubric. Documents that conform to these guidelines earn a higher search result ranking. This works because most researchers rely on the relevance ranking to provide the starting point for research. Well-written content which conforms to the guidelines will then frame the research on what is or is not important. Such content can be shaped in a number of ways.
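The rubric analogy can be made concrete with a toy scorer (our illustration, not any engine’s actual algorithm): each guideline a document satisfies adds to its grade, and better-conforming documents rank higher.

```python
# A toy guideline rubric; the criteria are simplified examples, not a
# real engine's ranking factors.
def rubric_score(doc: dict) -> int:
    rubric = [
        ("has a descriptive title", bool(doc.get("title"))),
        ("title under 70 characters", len(doc.get("title", "")) <= 70),
        ("has a meta description", bool(doc.get("description"))),
        ("body over 300 words", len(doc.get("body", "").split()) >= 300),
    ]
    return sum(1 for _, satisfied in rubric if satisfied)

page = {"title": "Content Shaping", "description": "Notes on shaping.",
        "body": "word " * 400}
print(rubric_score(page))  # 4: this page conforms on every toy criterion
```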

Read more

Vivisimo and Its High Score Search Plays

September 27, 2011

Vivisimo, the leader in information optimization, announced this week that Forrester Research, an independent research firm, gave the company high scores in its report titled “Market Overview: Enterprise Search.” According to a PR Newswire article, “Vivisimo Earns Excellent Scores in All But Two Categories in Evaluation by Independent Research Firm Focused on Enterprise Search Market”:

Vivisimo’s Velocity 8.0 platform was evaluated among 11 other competing vendor products, with testing focusing on 10 distinct criteria, including: mobile support; federation model; indexing and connectivity; social and collaborative features; management and analysis; security; semantics/text analytics; interface flexibility; relevance model; and platform readiness. Scores for each criterion ranged from top to bottom: excellent; very good; good; fair; and poor. Following the conclusion of the evaluations, Vivisimo was the only vendor to receive excellent scores in all but two judging criteria.

I’m glad to hear that Vivisimo delivers a high-quality product. However, it would have been nice if the article stated what the two weaknesses were so that I could be aware of what the company still needs to improve upon. Our publisher’s new study—The New Landscape of Enterprise Search—identifies some “considerations” one may want to know about when selecting an enterprise search system. Also, don’t confuse The New Landscape with the azure chip consulting firm’s study, which helpfully and quite originally uses the word “landscape” as well. Imitation is a form of flattery. Unimaginative, yes. Sincerest, no. Give everyone an “A” for effort.

Jasmine Ashton, September 27, 2011

Yahoo Ups Image Search: Is It Too Little Too Late Again?

September 20, 2011

Yahoo did make its mail service a bit more responsive. That’s a plus because Yahoo mail has been disappointing to our publisher Stephen E Arnold for a year. He complains about it when his T-Mobile wireless broadband connection hangs while Yahoo’s servers are on a break.

And image search? We’re confused about Flickr. And in a much-needed effort to stay in the game, Yahoo has increased its image search functions. Search Engine Watch profiles the newest upgrade to Yahoo in, “Yahoo Launches Enhanced Image Search.”

Yahoo has announced a new image search that matches recent enhancements to Google and Bing. Yahoo’s new image presentation also allows for easy searching of galleries, a connection to your friends’ Facebook images, and easy navigation of full-sized images.

It boils down to whether anyone cares, and we are not sure that they do. Yahoo’s indexing was innovative in the beginning and set the company apart, encouraging use by the librarian set, who appreciated a more structured layout. Now Yahoo is relegated to a position of keeping up, mainly with Bing and Google. While the image features might be highly innovative, we are not sure that Yahoo still has the clout to pull in users to explore those features, or even stumble upon them.

Emily Rae Aldridge, September 20, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

MaxxCAT Claws into the Search Appliance Market

September 19, 2011

A news item, “Strengthen Your Enterprise Search with a Highly Optimized Search Appliance from MaxxCAT,” focuses on MaxxCAT, a competitor to the Google and Thunderstone search appliances. We learned:

No matter what size or style of business that you have, MaxxCAT can offer a solution for your enterprise search that is easily scalable as your data storage and recovery needs grow. In most cases, upgrading is as simple as adding another appliance to your rack mount.

According to the MaxxCAT website, the Pittsburgh-based company was founded in 2007, making it a relative newcomer to the world of enterprise search. The company claims to provide the highest performing search solutions at the lowest price points in the industry. The story asserts:

MaxxCAT’s product line has grown from a single high performance machine, the original SB-250, to a comprehensive line of high availability, high performance, low cost solutions that address some of the leading edge requirements of today’s search industry.

The prices are certainly competitive, and the line-up of offered products is diverse. While not yet the top player in the market, MaxxCAT could certainly move up the ranks as efficient but affordable search solutions are sought in the current sobering economic climate. Beyond Search has a soft spot in its heart for Pittsburgh-based firms. Our founder, Stephen E Arnold, attended Duquesne University while indexing Latin sermons, and then he and his partners sold The Point (Top 5% of the Internet) to Lycos more than 15 years ago. Hilly place, however.

Emily Rae Aldridge, September 19, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Smartlogic Buys SchemaLogic: Consolidation Underway

September 15, 2011

Mergers have captured the attention of the media, and for good reason. Deals which fuse two companies create new opportunities and can disrupt certain market sectors. For example, Hewlett Packard’s purchase of Autonomy has bulldozed the search landscape. Now Smartlogic has acquired SchemaLogic and is poised to have the same effect on the world of taxonomies, controlled vocabularies, and the hot business sector described as “tagging” or “metadata.”

As you know, Smartlogic has emerged as one of the leaders in content tagging, metadata, indexing, ontologies, and associated services. The company’s tag line is that its systems and methods deliver content intelligence solutions. Smartlogic supports the Google search technology, open source search solutions such as Solr, and Microsoft SharePoint and Microsoft Fast Search. Smartlogic’s customers include UBS, Yell.com, Autodesk, the McClatchy Company, and many others.

With the acquisition of SchemaLogic, Smartlogic aims to become one of the leading companies, if not the leading company, in the white-hot semantic content processing market. The addition of SchemaServer to the platform adds incremental functionality and extends solutions for customers. The merger adds more clients to Smartlogic’s current list of Fortune 1000 and global enterprise customers and confirms the company as the leading provider of Content Intelligence software. Jeremy Bentley told Beyond Search:

Smartlogic has a reputation for providing innovative Content Intelligence solutions alongside an impeccable delivery record. We look forward to providing Grade A support to our new clients, and to broadening the appeal of Semaphore.

SchemaLogic was founded in 2003 by Breanna Anderson (CTO); Andrei Ovchinnikov (a Russian martial arts expert with a love of taxonomy and an advisory board member); and Trevor Traina (chairman and entrepreneur; he sold the Compare.Net comparison shopping company to Microsoft in 1999). SchemaLogic launched its first product in November 2003. The company’s flagship product is SchemaServer. The executive lineup has changed since the company’s founding, but the focus on indexing and management of controlled term lists has remained.

A company can use the SchemaLogic products to undertake master metadata management for content destined for a search and retrieval system or a text analytics / business intelligence system. However, unlike fully automated tagging systems, SchemaLogic products can make use of available controlled term lists, knowledge bases, and dictionaries. The system includes an administrative interface and index management tools which permit the licensee to edit or link certain concepts. The idea is that SchemaServer (and MetaPoint, which is the SharePoint variant) provides a centralized repository which other enterprise applications can use as a source of key words and phrases. When properly resourced and configured, the SchemaLogic approach eliminates the Balkanization and inconsistency of indexing which is a characteristic of many organizations’ content processing systems.
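The architectural idea can be sketched in a few lines. The following is a hedged illustration of a central vocabulary lookup, not SchemaServer’s actual interface: two applications tag the same document with different free-form terms, and both end up with the same preferred terms.

```python
# A toy controlled-vocabulary service; the term mappings and the
# normalize_tags helper are hypothetical illustrations.
PREFERRED_TERMS = {
    # variant term -> preferred controlled-vocabulary term
    "search engine optimization": "SEO",
    "seo": "SEO",
    "meta data": "metadata",
    "metadata": "metadata",
}

def normalize_tags(raw_tags):
    """Map free-form tags onto the controlled vocabulary; keep the rest."""
    return sorted({PREFERRED_TERMS.get(tag.lower(), tag) for tag in raw_tags})

# Two applications tag the same document differently...
print(normalize_tags(["Meta Data", "SEO"]))
print(normalize_tags(["metadata", "Search Engine Optimization"]))
# ...but both end up with ['SEO', 'metadata']: no Balkanized indexing.
```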

Early in the company’s history, SchemaLogic focused on SharePoint. The firm added support for Linux and Unix. Today, when I think of SchemaLogic, I associate the company with Microsoft SharePoint. The MetaPoint system works when one wants to improve the quality of SharePoint metadata. But can the system be used for eDiscovery and applications where compliance guidelines require consistent application of terminology? Time will tell, particularly as the market for taxonomy systems continues to soften.

Three observations are warranted:

First, not since Business Objects’ acquisition of Inxight has a content processing deal had the potential to disrupt an essential and increasingly important market sector.

Second, with the combined client list and the complementary approach to semantic technology, Smartlogic is poised to move forward rapidly with value added content processing services. Work flow is one area where I expect to see significant market interest.

Third, smaller firms will now find that size does matter, particularly when offering products and services to Fortune 1000 firms.

Our view is that there will be further content centric mergers and investments in the run up to 2012. Attrition is becoming a feature of the search and content processing sector.

Stephen E Arnold, September 15, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
