Business Structures Revealed through New Analysis Technique

April 7, 2013

Now here is an interesting implication of social-graph analysis in business. The MIT Technology Review reports, “Social Networks Reveal Structure (And Weaknesses) of Business.” We’ve known for some time that, through the analysis of connections, social networks can reveal even more about us than is obvious to most users. Now, researchers at Israel’s Ben Gurion University used this concept to derive an impressive amount of information about businesses. The article reveals that the team begins:

“. . . by using a search engine to find the Facebook pages of a number of individuals who work for a specific company.

“Using these individuals as seeds, they then begin crawling the social networks, sometimes jumping from one network to another, looking for other individuals at the same company. These in turn become seeds to find more employees and so on.

“They end up with a basic network of links between employees within the company. It’s then that the fun begins.

“Using standard measures of connectedness, Fire and co then identified people in positions of leadership and by adding in details such as location, mined from the Facebook pages, they reconstructed the international structure of these organisations. They also used community detection algorithms to reconstruct the organisational structure of the company.”

Wow. The researchers used their method on several “well known hi-tech companies” and found startling details. For example, they found a cluster of comparatively disconnected folks at a large organization, and discerned they belonged to an acquired startup that had yet to be well integrated into the company. This sort of information can be used by companies to monitor themselves, but it could also be used by potential investors (for good or ill for the business, I suppose, depending on what turned up).
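For readers who want a feel for the mechanics, here is a minimal sketch of the kind of analysis described, not the researchers' actual code. It assumes a made-up list of employee links, uses degree centrality as one "standard measure of connectedness," and uses connected components as a crude stand-in for a real community detection algorithm. All names and data are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (person, person) links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other people each person is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def connected_components(graph):
    """Group people into clusters that have no links to one another.

    This is a simplification: real community detection also splits
    weakly connected groups, which this function does not attempt.
    """
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nbr in sorted(graph[node]):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(members))
    return components


# Hypothetical links: four people in the main company, two in an
# acquired team that has no ties to the rest.
edges = [("ana", "ben"), ("ana", "cho"), ("ana", "dev"),
         ("ben", "cho"), ("eli", "fay")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
likely_leader = max(centrality, key=centrality.get)
clusters = connected_components(graph)
```

In this toy data the most connected person surfaces as the likely leader, and the isolated pair shows up as its own cluster, much like the unintegrated startup in the write-up.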

More ominously, competitors could use the information to their advantage. Now that this technology is in the news, many companies will want to prevent such details from emerging, but how? Researcher Michael Fire advises them to “enforce strict policies which control the use of social media by their employees.” Immediately, I might add. And, I suspect that whatever was previously considered a “strict policy” must become even more strict in order to avoid exposure from this technique.

Won’t employees be thrilled?

Cynthia Murrell, April 07, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

NLP: Do Not Look at Results. Look at Pictures

April 6, 2013

One of my two or three readers sent me a link to a LinkedIn post in the Information Access and Search Professionals section of the job hunting and consultant networking service. LinkedIn owns Slideshare (a hosting service for those who are comfortable communicating with presentations) and Pulse (an information aggregation service which plays the role of a selective dissemination of information service via a jazzy interface).

The posting which the reader wanted me to read was “How Natural Language Processing Will Change E Commerce Search Forever.” Now that is a bold statement. Most of the search systems we have tested feature facets, prediction, personalization, hit boosting for specials and deals, and near real time inventory updating.

The company posting the information put a version of the LinkedIn information on the Web at Inbenta.

The point of the information is to suggest that Inbenta can deliver more functionality which is backed by what is called “search to buy conversions.” In today’s economy, that’s catnip to many ecommerce site owners who—I presume—use Endeca, Exalead, SLI, and EasyAsk, among others.

I am okay with a vendor like Inbenta or any of the analytics hustlers asserting that one type of cheese is better than another. In France alone, there are more than 200 varieties and each has a “best”. When it comes to search, there is no easy way to do a tasting unless I can get my hands on the fungible Chevrotin.

Search, like cheese, has to be experienced, not talked about. A happy nibble to Alpes gourmet at http://www.alpesgourmet.com/fromage-savoie-vercors/1008.php

In the case of this Inbenta demonstration, I am enjoined to look at two sets of results from the Grainger.com site. The problem is I cannot read the screenshots. I am not able to determine if the present Grainger.com site is the one used for the “before” and “after” examples.

Next I am asked to look at queries from PCMall.com. Again, I could not read the screenshots. The write up says:

Again, the actual details of the search results are not important; just pay attention that both are very different. But in both cases, wasn’t what we searched basically the same thing? Why are the results so different?

The same approach was used to demonstrate that Amazon’s ecommerce search is doing some interesting things. Amazon is working on search at this time, and I think the company realizes that its system for ecommerce and for the hosted service leaves something out of the cookie recipe.

My view is that if a vendor wants to call attention to differences, perhaps these simple guidelines would eliminate the confusion and frustration I experience when I try to figure out what is going on, what is good and bad, and how the outputs differ:

First, provide a link to each of the systems so I can run the queries and look at the results myself. I did not buy into the Watson Jeopardy promotion because in television, magic takes place in some editing studios. Screenshots which I can neither read nor replicate open the door to similar suspicions.

Second, to communicate the “fix” I need more than an empty data table. A list of options does not help me. We continue to struggle with systems which describe a “to be” future yet cannot deliver a “here and now” result. I had a long and winding call with an analytics vendor in Nashville, Tennessee which followed a similar, abstract path in explaining what the company’s technology does. If one cannot show functionality, I don’t have time to listen to science fiction.

Third, the listing of high profile sites is useful for search engine optimization, but not for making crystal clear the whys and wherefores of a content processing system. Specific information is needed, please.

To wrap up, let me quote from the Inbenta essay:

By applying these techniques on e-commerce website search, we have accomplished the following results in the first few weeks.

  • Increase in conversion ratio: +1.73%
  • Increase average purchase value: +11%

Okay, interesting numbers. What is the factual foundation of them? What method was used to calculate the deltas? What was the historical base of the specific sites in the sample?
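To show why the method matters, here is a small sketch of two common ways to compute such a delta. The baseline and follow-up figures are hypothetical and chosen only for illustration; nothing here comes from Inbenta's data.

```python
def absolute_delta(before, after):
    """Difference in the raw rate, i.e. percentage points when rates are fractions."""
    return after - before


def relative_delta(before, after):
    """Change expressed as a fraction of the starting rate."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before


# Hypothetical site: 2.00% of searches convert before the change.
before = 0.0200

# Reading "+1.73%" as percentage points means the rate rose to 3.73%,
# which is a relative jump of roughly 86 percent.
after_if_points = before + 0.0173
relative_if_points = relative_delta(before, after_if_points)

# Reading "+1.73%" as a relative gain means the rate rose only to about 2.03%.
after_if_relative = before * (1 + 0.0173)
absolute_if_relative = absolute_delta(before, after_if_relative)
```

The same "+1.73%" can describe a near doubling of conversions or a barely visible nudge, which is why the baseline and the formula need to be stated.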

In a world in which vendors and their pet consultants jump forward with predictions, assertions, and announcements of breakthroughs—some simple facts can be quite helpful. I am okay with self promotion but when asking me to see comparisons, I have to be able to run the queries myself. Without that important step, I am skeptical just as I was with the sci-fi fancies of the folks who put marketing before substance.

Stephen E Arnold, April 6, 2013

Sponsored by Augmentext

Newest Version of MongoDB Includes Text Search

April 6, 2013

Some welcome enhancements to MongoDB are included in the open-source database’s latest release, we learn from “MongoDB 2.4 Can Now Search Text,” posted at the H Open. The ability to search text indexes has been one of the most requested features, and the indexing supports 14 languages (or no language at all). The write-up supplies this handy link to a discussion of techniques for creating and searching text indexes.

The post describes a second feature of MongoDB 2.4, the hashed index and sharding:

“Hash-based sharding allows data and CPU load to be spread well between distributed database nodes in a simple to implement way. The developers recommend it for cases of randomly accessed documents or unpredictable access patterns. New Geospatial indexes with support for GeoJSON and spherical geometry allow for 2dsphere indexing; this, in turn, offers better spherical queries and can store points, lines and polygons.”
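The idea behind hash-based sharding can be sketched in a few lines. This is an illustration of the general technique, not MongoDB's implementation: the function names are invented, and the simple modulo assignment is an assumption made to keep the example short.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a shard key to a shard number using a stable hash.

    Hashing scatters neighboring keys (1, 2, 3, ...) so that no single
    node receives every new or frequently read document.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(key, num_shards) for key in keys)


# Hypothetical workload: 1,000 sequential document ids over 4 shards.
load = distribute(range(1000), 4)
```

With sequential ids, a range-based scheme would send the newest documents to one node; the hashed assignment spreads them across all four, which is the benefit the quoted passage describes for unpredictable access patterns.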

There is also a new modular authentication system, though its availability is limited so far. The project has also: added support for fixed sized arrays in documents; optimized counting performance in the execution engine; and added a working set size analyzer. See the article for more details, or see the release notes, which include upgrade instructions. The newest version can be downloaded here.

Cynthia Murrell, April 06, 2013

LinkedIn Focuses on Search

April 5, 2013

LinkedIn wants to make search easier for its members. The Computerworld article “LinkedIn Sharpens Search Engine Feature” gives all of the details about the revamped search system. With this new system, LinkedIn wants its members to be able to find information more easily on its site. LinkedIn’s initial goal was to provide a place for professionals to post their career bios and interact with their peers and colleagues. However, LinkedIn has grown and now serves a much larger audience. Companies as well as various groups have set up pages. In addition, there is a job section as well as a section where individuals and publishers can share or post comments and provide links to articles. LinkedIn’s search engine handled 5.7 billion queries last year alone, so the new search features will definitely reach a large audience. Johnathan Podemsky, a LinkedIn product manager, shared the following:

“Now, all you need to do is type what you’re looking for into the search box and you’ll see a comprehensive page of results that pulls content from all across LinkedIn including people, jobs, groups and companies.”

In addition to segmenting their results, users will also enjoy auto-complete and suggested search capabilities to help them fine-tune their query terms. The search engine will also keep a log of members’ search queries in order to help deliver better results. It is important to note that these changes will only be applied to the main site and not the mobile application. Regardless, these new search features will definitely improve LinkedIn’s search capabilities for users. It seems that LinkedIn is paying attention to the needs of its users and takes search very seriously. Users want good results, but they also want a user-friendly and efficient search system. Looks like LinkedIn is on the right track.

April Holmes, April 05, 2013

Game Over Mode: Consumers and Searching

April 4, 2013

This morning I read “As Web Search Goes Mobile, Competitors Chip at Google’s Lead.” (Keep in mind that when the link goes dead you will need the paper edition of the story on pages A 1 and A 4 of the April 4, 2013 issue or a for-fee password to the New York Times’s online service.)

The main point is that mobile is surging. For many reasons, mobile search does not work the way desktop search and Web surfing worked when Backrub was bubbling toward Google. The article identifies the geolocation trend where coordinates coupled with some data about user behavior can deliver a place to buy coffee.

The article then says:

No longer do consumers want to search the Web like the index of a book — finding links at which a particular keyword appears. They expect new kinds of customized search, like that on topical sites such as Yelp, TripAdvisor or Amazon, which are chipping away at Google’s hold. Google and its competitors are trying to develop the knowledge and comprehension to answer specific queries, not just point users in the right direction.

The story then points out that there are 30 trillion Web addresses, which is definitely quite a few places to index content. Searching a massive index with 2.5 words just does not work for “consumers.”

The story identifies social systems which put a person closer to someone or some information from someone which answers the user’s question. The wrap up to the article quotes a Google “fellow” who correctly states a Google truism:

“Most people have this very strong Google habit,” he said. “I go there every day and it gives me information I want, so it’s a self-reinforcing cycle. Not anyone can come in and just do those things.”

So what exactly is happening in consumer search? Outfits like Amazon and LinkedIn look like they are growing and presumably taking traffic from Google. On the other hand, Google seems confident that its market share and its remarkable diversity of ways to present information to users is in pretty good shape. Is this a chess-type draw, a paradox, or an analysis which makes search almost impossible to discuss without getting lost in clicks, segments, traffic, and user behavior data?

My view is that search has become a word which is acceptable in some circles and the equivalent of a curse word in others. Consumers want answers to questions, and according to some experts, answers to questions the user does not yet know she has formulated. Vendors want revenue. Advertisers want people to buy their products and services. Teens want whatever teens want. Each tiny grouping of online users which can be labeled has search needs.

The problem is that figuring out exactly what the “need” is in a specific context is a field where further research and innovation are needed.

Ingersoll Says the Solution is Search

April 4, 2013

For companies tackling big problems related to large sets of data, Grant Ingersoll has the solution – search. At the recent GigaOm Structure: Data Conference, Ingersoll, CTO of LucidWorks, recommends that organizations take another look at search solutions. GigaOm covers the details in their story, “How Search Can Solve Big Data Problems.”

The article begins:

“There are many solutions for figuring out how to parse large amounts of data, but LucidWorks CTO Grant Ingersoll has a suggestion: use search. At GigaOM’s Structure:Data conference in New York City Thursday, Ingersoll laid out his case for why search is a big part of dealing with databases and indexes. ‘Search should be a critical part of your architecture,’ he told attendees. It is a system building block for any large problem you’re trying to solve that requires a ranked set of results. And it doesn’t have to be just text search, it can be for any type of search, he said.”

Ingersoll goes on to assert that search has changed dramatically and quickly. For those organizations that have not updated their search solution in several years, there are more options on the market that are likely to serve their purposes more effectively. LucidWorks, Ingersoll’s company, is a longstanding name in the field, and yet has undergone dramatic changes even in the last few years. If your organization is exploring options for more effective search and Big Data management, LucidWorks is worth a serious look.

Emily Rae Aldridge, April 4, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Advice for Scalable Search from Parse

April 4, 2013

Ah, the excitement of scaling. The ParseBlog gives developers some practical advice in “Implementing Scalable Search on a NoSQL Backend.” As the makers of the popular cloud platform used by such conspicuous clients as Cisco, Ferrari, and the Food Network, Parse should know what they’re talking about, particularly when it comes to working with their product.

Engineer Brad Kittenbrink emphasizes that simple search algorithms, perfectly good for quickly getting a prototype up and running, can lead to seriously bogged-down performance later. He writes:

“The key to making searches run efficiently is to minimize the number of documents that have to be examined when executing each query by using an index. To do that you need to keep in mind what kinds of queries you want to support when designing how to organize your data. The more structured and limited these queries are, the easier this will be. . . .

“To organize your data model to support efficient searching, you’ll need to know a bit about how our systems are operating behind the abstraction. You’ll need to build your data model in a way that it’s easy for us to build an index for the data you want to be searchable.”
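As a rough illustration of the principle Kittenbrink describes, here is a generic inverted index sketch. It is not Parse's API or data model; the function names and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real systems also strip punctuation."""
    return text.lower().split()


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query word.

    Only the posting sets for the query words are touched, so the
    number of documents examined does not grow with the whole
    collection the way a full scan does.
    """
    words = tokenize(query)
    if not words:
        return set()
    # Start from the rarest word so the working set stays small.
    postings = sorted((index.get(word, set()) for word in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical documents keyed by id.
docs = {
    1: "Scalable search on NoSQL",
    2: "Search index design",
    3: "NoSQL data model",
}
index = build_index(docs)
matches = search(index, "nosql search")
```

The trade-off is the one the quote points to: the query becomes cheap because the data was organized ahead of time around the kinds of questions the application will ask.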

The post notes that Parse has implemented some new features to make searches more efficient, and goes on to give a couple of examples, including some sample code. Launched in 2011, the company is located in San Francisco. And, by the way, they are hiring.

Cynthia Murrell, April 04, 2013

GIF Images Get Easy Button Thanks to Google

April 3, 2013

Is there anything Google isn’t affiliated with these days? I didn’t think so. Wired reports in its article “Google Image Search: Now With More GIF Action” that the information powerhouse is now turning its sights on the Graphics Interchange Format (GIF).

“On Tuesday, Google announced via Google+ that Image Search now has an “Animated” filter. That means that if you’re only searching for animated magic, you need never be bothered with a still image again. Finally that search for Jennifer Lawrence GIFs from the Academy Awards just got a whole lot easier.”

GIFs have been around since 1987 and have become the go-to for short animations on the Web. The feature is still being worked out, but for now, when you search for an image in Google Images, you can select the drop-down menu in the Search Tools category and simply click on “Animated.”

It doesn’t seem like a significant change to the Google lineup, but the addition does take a consumer-first approach. If Google is the only place you can filter your content to find the exact information you want, well, Google then becomes the go-to.

Leslie Radcliff, April 3, 2013

Medical Search Engine Identifies Rare Diseases

April 3, 2013

Specialized search engines are often used to locate subject-specific professional or academic information in databases. Certain professions are used to shying away from the open web for fear of retrieving poor quality information. However, a new project is proving that quality medical information can be retrieved from the open web. Read more about FindZebra.com in the article, “New Medical Search Engine Quickly IDs Rare Diseases.”

The article states:

“In medical school, students are taught to concentrate on more common diseases, not ‘zebras’–slang for a surprising diagnosis. Now, the zebras have taken to the web at FindZebra.com, a new search engine for medical professionals which navigates the web quickly to identify rare and genetic diseases. Researchers . . . sought out to assess how well web search engines, such as Google, work for diagnostic queries, and what contributes to web research success or failure. The results determined that FindZebra outperformed Google Search. The authors concluded that a specialized search engine can improve online diagnostic quality without a loss of ease of use that popular search engines possess.”

It seems that quality results can be retrieved easily. This is the ultimate aim of search: quick, effective, and easy. LucidWorks aims to achieve the same goal in the much more difficult environment of enterprise search. Their expertise combines solid open source infrastructure, built on Lucene/Solr, with award-winning customer support.

Emily Rae Aldridge, April 3, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Specialized Search Engine Helps Diagnose Rare Diseases

April 3, 2013

A recent piece from the MIT Technology Review that examines “The Rare Disease Search Engine That Outperforms Google” compares apples with oranges. The real takeaway is much bigger than a swipe at Google—that technical innovation is being used to help humanity.

Rare diseases are notoriously difficult to diagnose, and medical professionals have been using an Internet search engine, usually Google, to help with the process for years. Of course, Google was not designed for that use, so researchers have created a tailor-made engine to streamline this difficult but essential task. The article informs us:

“Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine dedicated to the diagnosis of rare diseases called FindZebra, a name based on the common medical slang [“zebra”] for a rare disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.”

Is this supposed to be a surprise? Google does ads, not rare diseases. Ah well, the important thing is that doctors have a powerful new tool to help folks with diseases that stoutly defy accurate identification. How did the team from the Technical University of Denmark do it? The write-up goes on to say:

“The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. . . . They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.”

Though the zebra engine is still an in-progress research project, the team has made it publicly available at www.findzebra.com. Medical professionals can already use the innovation to help patients who might otherwise be doomed to years of painful frustration. Hooray, progress!

Cynthia Murrell, April 03, 2013
