A Search Library for Python

December 6, 2013

Python is one of the many programming languages available. Programmers rely on existing libraries and open source code to help them create new projects. Bitbucket points our attention to the “Whoosh-Python Search Library,” which appears to be a powerful open source solution to your search woes.

The article states:

“Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. Programmers can use it to easily add search functionality to their applications and websites. Every part of how Whoosh works can be extended or replaced to meet your needs exactly.”

What can Whoosh do? It offers fielded indexing, fast indexing and retrieval, a powerful query language, a pluggable scoring algorithm, a Pythonic API, and what the project bills as the only production-quality pure Python spell-checker. Whoosh was built for situations where the programmer needs to avoid building native libraries, wants a research platform, needs one deeply integrated search solution, or wants an easy-to-use interface.
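For readers wondering what “fielded indexing” means in practice, here is a minimal sketch in plain Python. It is not Whoosh's API, and the names FieldedIndex, add_document, and search are hypothetical, invented only for illustration. The idea is that each term is stored together with the field it came from, so a query can be limited to, say, titles only.

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keyed by (field, term). Hypothetical; not Whoosh's API."""

    def __init__(self):
        # Maps (field name, lowercased term) -> set of document ids.
        self.postings = defaultdict(set)

    def add_document(self, doc_id, **fields):
        # Split each field on whitespace and record which document used each term.
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return ids of documents whose `field` contains every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = [self.postings.get((field, term), set()) for term in terms]
        return set.intersection(*results)


index = FieldedIndex()
index.add_document(1, title="Whoosh search library", body="pure Python full-text indexing")
index.add_document(2, title="Houdini animation", body="search inside animation software")

title_hits = index.search("title", "search")  # only document 1 has "search" in its title
body_hits = index.search("body", "search")    # only document 2 has "search" in its body
```

A real library adds tokenization, scoring, and on-disk storage on top of this basic structure; the sketch only shows why the same word can match in one field and not in another.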

Whoosh started out as a search solution for proprietary software. Matt Chaput designed it for Side Effects Software Inc.’s animation software Houdini. Side Effects Software allowed Chaput to release the library to the open source community and many Python programmers probably consider it an early Christmas gift.

Whitney Grace, December 06, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Search Innovations: Revisiting the Past

December 5, 2013

I printed out an article from ReadWrite Web five or six years ago. The story was “Top 17 Search Innovations Outside of Google.” I suggest that anyone tracking Yahoo’s decision to jump back into search or struggling with the dearth of Web search options may want to read this article. I think the list, prepared in May 2007, is a useful reminder of the lack of progress in search.

Let me highlight five of the innovations. These are “breakthroughs” that various search vendors and satraps have explained as the “next big thing.” Well, maybe.

  1. Personalization. The idea that the user does not see a list of results that are believed to be objective and relevant to the query is fascinating. When vendors filter information, vendors control the information agenda. Quite an innovation. I thought something similar happened in other spheres of interest years ago.
  2. Algorithm improvement. I like the idea that search has broken free of the algorithms that have been in use since the early days of SDC, SMART, and STAIRS. If the “improvement” erodes precision and recall, is that a good thing? If “improvement” means computational efficiency to reduce costs, is that a better thing?
  3. Parametric search. Yep, structured query language queries. What’s new? The fact that fewer professionals want to hassle with figuring out a query is fresher than the method itself.
  4. Semantic search. Does a user understand the upside and downside of semantic search? Do marketers? Oh, yeah.
  5. Results visualization. Hollywood style outputs have helped Palantir raise lots of money. Does a user know what a visualization “means”? Not too often.
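Since item 2 leans on precision and recall, a small worked example may help. This is a minimal sketch in Python; the document ids and the two result lists are made up for illustration and do not come from any real search system. Precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are truly relevant to the query.
relevant_docs = {"d1", "d2", "d3", "d4"}
baseline = ["d1", "d2", "d3", "d9"]        # three of four results are relevant
"improved" and None                        # placeholder-free: see next line
personalized = ["d1", "d7", "d8", "d9"]    # one of four results is relevant

baseline_scores = precision_recall(baseline, relevant_docs)
personalized_scores = precision_recall(personalized, relevant_docs)
```

With these made-up lists, the baseline scores 0.75 on both measures and the filtered list scores 0.25 on both, which is the kind of erosion the question in item 2 is getting at.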

The point is that the ReadWrite list makes clear that no significant progress in search has been made in the last five or six years. Am I missing progress?

To get some details about the dead end for search and content processing, check out the vendor case studies at www.xenky.com/vendor-profiles. The similarity among systems, features, and methods is interesting.

Stephen E Arnold, December 5, 2013

TEMIS Gets Another Client

December 5, 2013

Good news for TEMIS, everyone! According to The Sacramento Bee article “OECD Chooses TEMIS To Semantically Structure Its Knowledge And Information Management Process,” TEMIS has a new and very big client. The OECD, the Organization for Economic Cooperation and Development, has selected TEMIS’s semantic content enrichment solution, Luxid. The OECD has started a new project called the Knowledge and Information Management (KIM) Program to create a framework for managing and delivering information as well as improving its accessibility and presentation. The OECD collects and analyzes data for its thirty-four member governments and over one hundred countries to help them sustain economic growth, boost employment, and raise the standard of living. The KIM Program will be a portal for the organization’s information and will hopefully improve findability and search.

What will TEMIS do? The article explains:

“In this context, the OECD has chosen TEMIS’s flagship Luxid® Content Enrichment Platform to address all Semantic Enrichment stages of the KIM framework. Luxid® will help OECD to consistently enrich document metadata in alignment with its taxonomies and ontologies, providing a genuinely semantic integration layer across heterogeneous document storage and content management components. This semantic layer will both enable new search and browsing methods and improved relevance and accuracy of search results, as well as progressively build an integrated map of OECD knowledge.”

Glad to see that enriched search and findability are not dead yet. Metadata still has its place, folks. How else will the big data people be able to find their new insights if metadata is not used?

Whitney Grace, December 05, 2013


Tips to Customize Your Bing Start Search on Windows 8.1

December 5, 2013

The story on LifeHacker titled “How to Configure or Disable Bing Web Search in Windows 8.1” explains a step-by-step process to shut down or adjust Bing Search. The article responds to some complaints about Windows 8.1 search being slow and frustrating. The latest version of Windows is set up so that any search from the Start screen will yield web results.

The article explains:

“You can either turn off Bing search completely, or simply tweak settings like whether to give personalized results using your location or turning off safe search. To do any of these things, here’s where to go: Open the charms menu (place your cursor in the top right or bottom right corner) and select “Settings.” Click “Change PC settings.” Click “Search and apps.” Click “Search” in the side bar if it’s not already selected. Disable or change any options you choose.”

Some may find the search option useful and time-saving, but for others the web search option is unnecessary. Depending on the connection speed, it might be a very frustrating option. For a more thorough tutorial laying out all of the options for customizing your Start search, read “How to Customize or Disable Search with Bing in Windows 8.1.”

Chelsea Kerwin, December 05, 2013


Yahoo and Search: Innovation or PR?

December 4, 2013

I read “You Are the Query: Yahoo’s Bold Quest to Reinvent Search.” The write up explains that “search” is important to Yahoo. The buzzwords personalization and categorization make an appearance. There is no definition of “search.” So the story suggests that the new direction may be a “feed”, a stream of information. The passage I noted is:

So what is Yahoo building? To wit, the company is working on a new “personalization platform,” according to the LinkedIn profile of one Yahoo senior director. Cris Luiz Pierry, the director who headed up Yahoo’s now-shuttered Flipboard clone Livestand, writes that he is heading up a “stealth project,” and that he is “building the best content discovery and recommendation engine on the Web, across all of our regions.” Pierry also has an in-the-weeds search background, with experience in core Web search, ranking algorithms, and e-commerce software — which may come in handy when dealing with monetization.

A stealth search project. Didn’t Fulcrum Technologies operate in this way between 1983 and its run up to a much needed initial public offering in the early 1990s? Wasn’t the newcomer SRCH2 in stealth mode earlier in 2013?

The hook to the new approach may be nestled within this comment in the article:

That search experience would likely be layered on top of another company’s Web crawler, like Microsoft’s Bing, which took over those operations for Yahoo in 2010, as part of a 10-year deal. (More on that later.) Beginning in 2008.

Indexing the Web is an expensive proposition. No commercial publisher can afford it. Google is able to pull it off via its Yahoo-inspired ad model. Yandex is struggling to find monetization methods that allow it to keep its indexes fresh. But other Web indexers have had to cut back on coverage. Exalead’s Web index is thin gruel. Blekko has lost its usefulness for me. In fact, looking for information is now more difficult than it has been for a number of years.

Another interesting comment in the article jumped off the screen for me; to wit:

“We firmly believe that the Search Product of tomorrow will not be anything alike [sic] the product that we are used to today,” says the job description for the search architect. The posting also name-checks Search Direct, Yahoo’s version of Google Instant, as the “first step” in changing the landscape of search. After testing out a few queries on Yahoo’s home page, the feature, which looks up queries without requiring the user to hit “search,” looks to be dormant.

The write up concludes with this speculative paragraph:

Some theories: The company could be planning a Bing exit strategy for 2015 or earlier, and look to partner with another Web crawler, aka Google. Some reports have said Mayer has been cozying up to her former company on that front. Or Yahoo could be rebuilding its own core search capabilities, though that’s the unlikeliest of scenarios because that would be a nightmare for the company’s margins. Or Yahoo could even be beefing up its team just enough to gain more authority within the Bing partnership, in case it wanted to advise Bing on what to do on the back end.

What I find interesting is that the term “search” is not really defined in this write up or in most of the information I see that addresses findability. I am not sure what “search” means for Yahoo. The company has a history of listing sites by categories. Then the company indexed Web sites. Then the company used other vendors’ results. What’s next? I am not sure.

Observations? I have a few:

First, anyone looking for specific information has a tough job on their hands today. In a conversation with two experts in information retrieval, both mentioned that finding historical information via Web search systems was getting more difficult.

Second, queries run by different researchers return different results. The notion of comparative searching is tricky.

Third, with library funding shrinking, access to commercial databases is dwindling. For example, in Kentucky, patrons cannot locate a company news release from the 1980s using public library services.

The article about Yahoo is less about search and more about public relations. Is Yahoo or any vendor able to do something “new” in search? Without defining the term “search,” does it matter to the current generation of experts?

Personally I don’t want to influence a query. I want to locate information that is germane to a query that I craft and submit to an information retrieval system. Then I want to review results lists for relevant content and I want to read that information, analyze the high value information, synthesize it, and move on about my business.

I want to control the query. I don’t want personalization, feeds, or predictive analytics clouding the process. Does “search” mean thinking or taking what a company wants to provide to advance its own agenda?

Stephen E Arnold, December 4, 2013

Bing Continues Making Changes to Shopping Search Experience

December 4, 2013

An article on Search Engine Watch titled “Bing Sunsets Shopping Search, Integrates Directly Into Web Results” offers some insights into Bing’s attempts to improve its shopping experience. Bing announced in August that it was working to improve shopping results and, more recently, that it is retiring the “dedicated shopping experience” in favor of a user intent model.

The article explains:

“Using Bing Snapshot technology, certain search queries will return snapshots of various products in the right side column. Clicking on these products will produce a different result set of vendor sites that sell that particular product. Those results will also contain a carousel of similar products or models directly under the search box. Reviews, product specs are also included as snapshot information in the sidebar, as are prices from various vendors who purchase Bing ads.”

Bing is working to gain on Amazon, the company to beat worldwide when it comes to online shopping. Bing’s user intent plan is shaped around logical connections between queries and product comparisons. Bing is trying to move away from keywords and toward understanding what the user really wants. Integrating shopping results into the main search experience is meant to make product searches more efficient.

Chelsea Kerwin, December 04, 2013


ThisPlusThat for Smarter Searches

December 2, 2013

Leave it to an astrophysicist to make search smarter. One of the fellows over at the Insight Data Science Fellows Program, Christopher Moody, describes how his search engine uses word vectors to produce more accurate search results in “ThisPlusThat.me: a Search Engine that Lets You ‘Add’ Words as Vectors.” The scientist says he was inspired by the possibilities presented by Google’s new vectoring algorithm, word2vec. He explains:

“What [Google] doesn’t do is understand the relationships between words and understand the similarities or dissimilarities. That’s where ThisPlusThat.me comes in–a search site I built to experiment with the word2vec algorithm recently released by Google. word2vec allows you to add and subtract concepts as if they were vectors, and get out sensible, and interesting results. I applied it to the Wikipedia corpus, and in doing so, tried creating an interactive search site that would allow users to put word2vec through its paces.”

Moody supplies several examples of his project in action. The first and most elementary: querying “King – Man + Woman” leads to “Queen.” Since the algorithm was trained using Wikipedia’s vast collection of data, Moody explains, it has “a pretty good grasp of not only common words like ‘smart’ or ‘American’ but also loads of human concepts and real world objects, allowing us to manipulate proper nouns.” You can try it for yourself at ThisPlusThat.me.
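The “King – Man + Woman” arithmetic is easier to see with numbers. Here is a minimal sketch in plain Python. The three-dimensional vectors below are hand-made for illustration (roughly: royalty, maleness, femaleness); real word2vec vectors are learned from text and have far more dimensions. The analogy function is a hypothetical helper, not Moody's code or Google's.

```python
import math

# Hand-made toy vectors (royalty, maleness, femaleness); illustrative only.
VECTORS = {
    "king":   [0.9, 0.9, 0.1],
    "queen":  [0.9, 0.1, 0.9],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.2],
    "apple":  [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(positive, negative, vectors=VECTORS):
    """Add the positive words, subtract the negative ones, return the nearest other word."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    # The query words themselves are excluded from the candidates.
    exclude = set(positive) | set(negative)
    candidates = (w for w in vectors if w not in exclude)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy(positive=["king", "woman"], negative=["man"])
```

With these toy values, the combined vector lands essentially on top of “queen,” so that is the word returned. On a real model the match is only approximate, which is why the nearest neighbor by cosine similarity is used rather than an exact lookup.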

Moody explains how he approached word2vec’s huge vector table using Hadoop’s Map functions. To speed computation, he tried a number of tools: NumPy, Cython, Numba, and Numexpr. Near the end of the article, Moody shares links to his code and notebook experiments. The write-up is worth a look for anyone interested in the development of natural language algorithms.

Cynthia Murrell, December 02, 2013


Never Forget a Radio Station Again

November 29, 2013

Last Christmas I was ready to annihilate my regular radio stations, because they kept playing the same carol mix over and over again. There was not one new song introduced within a twenty-four hour period. Looking for some relief, I surfed the FM waves in hopes of finding a new station. My efforts were rewarded with a station I had never heard before and I was filled with new musical glee. While I never found the station again, Michael Robertson can help me avoid Wham!’s “Last Christmas” by “Introducing the World’s First Radio Search Engine.” Robertson recently launched his beta version of RadioSearchEngine.com.

The article explains:

“There are other directories of A-Z lists of radio stations, but this is the first search engine where any song or artist can be located on stations playing from anywhere in the world. A universal web player for the first time connects to and plays nearly every station offering immediate audio satisfaction and unprecedented user control.

The search engine updates in real-time, so users will be able to track a song and instantly play it. The search engine indexes all the songs every three-five minutes for an instantaneous searchable music. Robertson’s creation also makes recommendations to the user based on the song selection, allows users to skip songs, and view popularity rankings.”

Before finishing the article, I was about to say that YouTube is just as easy, but the ability to fast forward, skip songs, and add new content is the search engine’s major selling point. Robertson might have just launched the newest music trend.

Whitney Grace, November 29, 2013


HP Autonomy: The Beat Goes On and On and On

November 27, 2013

I read “HP’s Meg Whitman Ordered to Face Autonomy Charges.” Hard on the heels of Hewlett Packard’s quarterly results, the company has to explain to one disgruntled shareholder why the Autonomy deal went south.

The write up states:

In the latest $1 billion (£647m) lawsuit, HP shareholders accused HP’s management team of ignoring warnings before it bought Autonomy for $11.3 billion (£7.3bn) in 2011 and that the company’s financial numbers had been exaggerated. It is also claimed that HP tried to get out of the deal before it closed. The company later took a nearly $9 billion write-down largely connected with the purchase.

The deal put a burr under some digital cowpokes’ saddles. HP paid $11 billion for Autonomy. At the time of the deal, Autonomy was an $800 to $900 million a year company. Some months after the deal closed, the canny HP management took an $8 billion write down on the Autonomy deal.

According to the Tech Week Europe article:

The investors allege that HP’s management was negligent because of the $8.8 billion (£5.7bn) write-down on the deal HP announced in November 2012. HP officials blamed ‘accounting irregularities’ by Autonomy executives in the months leading up to the deal. The investors allege that the resulting drop in HP’s stock price effectively wiped billions of dollars from the company’s market value. The FBI are said to be investigating the allegations, as is the UK’s Serious Fraud Office (SFO).

In the meantime, the HP deal has not generated the big time payoff that someone at HP assumed would result from the deal. HP, like many other search vendor buyers, seems to be learning that:

  1. Search is an expensive business to fund. Those marketing, research, and support costs are brutal. Most of the failed search vendors ran into financial trouble despite the ministrations of different CEOs. Maybe Autonomy was managed better? Interesting question.
  2. Search, by itself, is not a compelling product or service to many potential customers. As a result, search is no longer search. Search embraces dozens of functions from text mining to the ubiquitous and fuzzy Big Data. HP is now trying to market lots of search related products and services. My hunch is that this is a bigger job than trying to sell $11 billion worth of key word search licenses.
  3. Companies that are not really software centric do not understand the oddities of the enterprise search sector. My view is that MBAs at outfits like HP assume that their Swiss Army knife budgeting and managing skills are going to “fix up” an outfit like Autonomy. Billions will flow as a result of the MBA approach. Who needs a PhD with an aptitude for math to run a mere search company? HP is coming to grips with its own shortcomings in the vision and motivation departments of Autonomy.

An ironic twist to the tale is that HP licensed the hugely complex, expensive, and cumbersome Verity system. With the purchase of Autonomy, HP became the owner of Verity’s technology. The six figure license deal for Verity is now free when viewed one way. On the other hand, that Verity technology cost HP billions of dollars.

And what about the founder of Autonomy? Dr. Michael Lynch has set up an investment company called Invoke Capital. The company took an interest in Darktrace, a security firm. Dr. Lynch, according to the Financial Times,

…is also a defendant in a suit by HP’s shareholders relating to the acquisition. A court in San Francisco this month gave HP a deadline of January to complete an internal audit, a decision welcomed by Mr Lynch.

The year 2014 may hold more fodder for business school case studies about Hewlett Packard and Autonomy. I am eager.

Stephen E Arnold, November 27, 2013

Coveo Explains: Complex Enterprise Search Delivered in a Day

November 27, 2013

The subtitle is the keeper, however: “No, I’m not insane.” The insane person is Wim Nijmeijer or Nicky Singh. Interesting semantic connection to either entity I believe. I learned this “insanity” stuff in a candidate chunk of possible PR ersatz http://goo.gl/ogVgIe. Since the publication of the New York Times’ story about Vocus and its PR spam, I have started a collection of search vendor messaging that may be a trifle light in the protein department.

Here’s the passage I noted:

Today Coveo announced that it will lead a session at Search Solutions 2013 on Wednesday, November 27 in London, UK.

No problem except that Coveo itself announces that its staff will explain the nuances behind “No, I’m not insane.” A third party “voice” might help.

There were some supporting “facts”. Here’s an example of a fact:

The reality is that many enterprise search implementations are far from simple, and often match the complexity of the systems they need to interface with. Coveo understands the complexity and challenges of enterprise search. Our revolutionary Search & Relevance Technology securely connects with all of an organization’s systems, and harnesses big, fragmented data from any combination of cloud, social and on-premise systems — without complex integrations.

Okay. Okay, well, “facts” may be too strong a word. I think the “revolutionary” and the “all” are going to be tough for me to accept. In a large organization, figuring out what not to make available can be time consuming in my experience. Toss in the information that will cause the company to feel a bit of heat, and you have some heavy lifting.

For instance, is “all” possible in today’s regulated environment? What about employee medical records, documents related to secret contracts and research work, salary information, clinical trial data, information related to a legal matter, and “any combination of cloud, social, and on premises systems”? Insane? Okay.

Well, maybe Coveo can deliver?

My observations:

  1. On a conference call with an enterprise search vendor, I pointed out that marketing enterprise solutions has changed. Hyperbole and cheerleading have replaced the more mundane information that answers such questions as, “Will this system work?” There continues to be skepticism in some circles about the claims of search vendors.
  2. Sending messages about oneself is interesting, but even Paris Hilton and Lady Gaga employ publicists. Sure, Lady Gaga uses a drone dress to get media coverage, but she doesn’t issue a news release that says, “No, I’m not insane.”
  3. Enterprise search groups on LinkedIn are struggling with the question, “Why do vendors get fired?” The reason goes back to the days of Verity. That company charted the course that many vendors wittingly or unwittingly followed; that is, promise absolutely anything to get the job. The legacy of Verity’s mind boggling complexity is marketing assertions that enterprise search works and can be up and running in a day.

Not even Google can make that eight hour assertion stick for the new Google Search Appliance with 100 percent confidence in my experience.

Anyway, by the time you read this, the lecture “No, I’m not insane” by a Coveo expert will be over. I suppose I can catch the summary in the Guardian. Stop the presses.

Stephen E Arnold, November 27, 2013
