Natural Search: SEO Boffin Changes His Spots

January 2, 2009

The crackle of gunfire echoed through the hollow this morning. I am not sure if my neighbors are celebrating the new year or just getting some squirrel for a midday burgoo. As I scanned the goodies in my newsreader, I learned about a type of search that had eluded me. I want to capture this notion before it dribbles off my slippery memory. MediaPost reported in “Search Insider: The Inside Line on Search Marketing” that 2009 is ripe for “natural search”. The phrase appears in Rob Garner’s “Measuring Natural Search Marketing Success” here. The notion (I think) is that content helps a Web site come up in a results list. I had to sit down and preen my feathers. I was so excited by this insight I was ruffled. For me the most important comment was:

For starters, think of an investment in natural search as a protection for what you are currently getting from natural search engines across the board. Good natural search advice costs are a drop in the bucket compared to returns from natural search, and the risk of doing harm only once can far exceed your costs, and even do irreparable damage. I see clients with returns coming from natural search at over one half-billion to one billion dollars a year or more, and one simple slip could cost millions.

I must admit that I have to interpolate to ferret out the meaning of this passage. What I concluded (your mileage may differ) is that if you don’t have content, you may not appear in a Google, Microsoft, or Yahoo results list.

What happened to the phrase “organic search”? I thought it evoked a digital Euell Gibbons moving from Web site to Web site, planting content seeds. “Natural search” has for me a murkier connotation. I think of Tom’s toothpaste, the Natural Products Association, and Mold Cleaner Molderizer.

My hunch is that Google’s tweaks to its PageRank algorithm place a heavy load on the shoulders of the SEO consultants. I have heard that some of the higher profile firms (which I will not name) are charging five-figure fees and delivering spotty results. As a result, the SEO mavens are looking for a less risky way to get a Web site to appear in the Google rankings.

Mr. Garner is one of the first in 2009 to suggest that original content offering useful information to a site visitor is an “insurance policy”. I don’t agree. Content is the life support system of a Web site. You buy insurance for your automobile and home.

Stephen Arnold, January 1, 2009

Duplicates and Deduplication

December 29, 2008

In 1962, I was in Dr. Daphne Swartz’s Biology 103 class. I still don’t recall how I ended up amidst the future doctors and pharmacists, but there I was sitting next to my nemesis Camille Berg. She and I competed to get the top grades in every class we shared. I recall that Miss Berg knew that there were five variations of twinning: three dizygotic and two monozygotic. I had just turned 17 and knew about the Doublemint Twins. I had some catching up to do.

Duplicates continue to appear in data just as the five types of twins did in Bio 103. I find it amusing to hear and read about software that performs deduplication; that is, the machine process of determining which item is identical to another. The simplest type of deduplication is to take a list of numbers and eliminate any that are identical. You probably encountered this type of task in your first programming class. Life gets trickier when the values are expressed in different ways; for example, a mixed list with binary, hexadecimal, and real numbers, plus a few more interesting variants tossed in for good measure. Deduplication becomes a bit more complicated.
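
To make that messier case concrete, here is a minimal sketch in Python. It is my illustration, not anything from a product; the “0b” and “0x” prefixes are my assumption about how the binary and hexadecimal values happen to be written.

    # A minimal sketch of deduplicating a mixed list of numbers written in
    # different notations. Assumption: binary values carry a "0b" prefix and
    # hexadecimal values a "0x" prefix.
    def normalize(value: str) -> float:
        """Convert a binary, hexadecimal, or decimal string to a float."""
        v = value.strip().lower()
        if v.startswith("0b"):
            return float(int(v, 2))
        if v.startswith("0x"):
            return float(int(v, 16))
        return float(v)

    raw = ["0b1010", "0xA", "10.0", "10", "3.14"]
    unique_values = sorted({normalize(v) for v in raw})
    print(unique_values)  # [3.14, 10.0]: four spellings of ten collapse to one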

At the other end of the scale, consider the challenge of examining two collections of electronic mail seized from a person of interest’s computers. There is the email from her laptop. And there is the email that resides on her desktop computer. Your job is to determine which emails are identical, prepare a single deduplicated list of those emails, generate a file of emails and attachments, and place the merged and deduplicated list on a system that will be used for eDiscovery.

Here are some of the challenges that you will face once you answer this question, “What’s a duplicate?” You have two allegedly identical emails and their attachments. One email is dated January 2, 2008; the other is dated January 3, 2008. You examine each email and find that the only difference between the two is a single slide in one of the attached PowerPoint decks. Which conclusion do you draw:

  1. The two emails are not identical, so include both emails and both attachments
  2. The earlier email is the accurate one, so exclude the later email
  3. The later email is the accurate one, so exclude the earlier email.

Now consider that you have 10 million emails to process. We have to go back to our definition of a duplicate and apply the rules for that duplicate to the collection of emails. If we get this wrong, there could be legal consequences. A system developer who simply lets a mathematical process decide that one record differs from another may find that approach too crude for the problem in the context of eDiscovery. Math helps, but it is not likely to be able to handle the onerous task of determining near matches and the reasoning required to determine which email is “the” email.
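
To show where the math runs out, here is a minimal sketch in Python. It is my illustration, not anyone’s eDiscovery product, and it assumes each email arrives as a dictionary with “from”, “subject”, “body”, and “attachments” (attachment contents as bytes). Exact duplicates collapse on a hash; near matches, such as decks differing by one slide, are only flagged so a human can apply the rules.

    # My illustration of separating exact duplicates from near duplicates.
    import hashlib
    from difflib import SequenceMatcher

    def fingerprint(email: dict) -> str:
        """Hash normalized headers plus attachment bytes; identical emails match."""
        h = hashlib.sha256()
        h.update(email["from"].lower().encode())
        h.update(email["subject"].strip().lower().encode())
        for blob in email["attachments"]:
            h.update(hashlib.sha256(blob).digest())
        return h.hexdigest()

    def near_duplicate(a: dict, b: dict, threshold: float = 0.95) -> bool:
        """Flag emails whose bodies are almost, but not exactly, the same."""
        return SequenceMatcher(None, a["body"], b["body"]).ratio() >= threshold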


Which is Jill? Which is Jane? Parents keep both. Does data work like this? Source: http://celebritybabies.typepad.com/photos/uncategorized/2008/04/02/natalie_grant_twins.jpg

Here’s another situation. You are merging two files of credit card transactions. You have data from an IBM DB2 system and you have data from an Oracle system. The company wants to transform these data, deduplicate them, normalize them, and merge them to produce one master “clean” data table. No, you can’t Google for an offshore service bureau; you have to perform this task yourself. In my experience, the job is going to be tricky. Let me give you one example. You identify two records which agree in field names and data for a single row in Table A and Table B. But you notice that the telephone number varies by a single digit. Which is the correct telephone number? You do a quick spot check and find that half of the entries from Table B have this variant, or you can flip the analysis around and say that half of the entries in Table A vary from Table B. How do you determine which records are duplicates?
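
Here is a minimal sketch in Python of the kind of field-by-field comparison involved. The field names and rows are hypothetical; the point is that rows agreeing on everything except the telephone number should be routed to review rather than silently deduplicated.

    # My illustration: compare two candidate rows field by field.
    FIELDS = ["name", "card_number", "amount", "phone"]

    def differing_fields(row_a: dict, row_b: dict) -> list:
        """Return the fields on which the two rows disagree."""
        return [f for f in FIELDS if row_a.get(f) != row_b.get(f)]

    a = {"name": "J. Smith", "card_number": "4111-0001", "amount": "19.95", "phone": "502-555-0143"}
    b = {"name": "J. Smith", "card_number": "4111-0001", "amount": "19.95", "phone": "502-555-0148"}

    diffs = differing_fields(a, b)
    if not diffs:
        print("exact duplicate")
    elif diffs == ["phone"]:
        print("probable duplicate: telephone number differs, route to review")
    else:
        print("distinct records:", diffs)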


Moore’s Law: Not Enough for Google

December 29, 2008

I made good progress on my Google and Publishing report for Infonortics over the last three days. I sat down this morning and riffed through my Google technical document collection to find a number. The number is interesting because it appears in a Google patent document and provides a rough estimate of the links that Google would have to process when it runs its loopy text generation system. Here’s the number as it is expressed in the Google patent document:

50 million million billion links

Google’s engineers included an exclamation point in US7231393. The number is pretty big even by Googley standards. And who cares? Few pay much attention to Google’s PhD-like technical documents. Google is a search company that sells advertising, and until the forthcoming book about Google’s other business interests comes out, I don’t think many people realize that Moore’s law is not going to help Google when it processes lots of links–50 million million billion give or take a few million million.
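
For scale, here is my back-of-the-envelope arithmetic, assuming the phrase parses as 50 times 10^6 times 10^6 times 10^9:

    # My reading of the figure, not Google's: "50 million million billion".
    links = 50 * 10**6 * 10**6 * 10**9
    print(f"{links:e} links")  # prints 5.000000e+22 links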

When I scanned “Sustaining Moore’s Law – 10 Years of the CPU” by Vincent Chang here, I realized that Google has little choice but to use fast CPUs and math together. In fact, the faster and more capable the CPU, the more math Google can use. Can you name another company worrying about Kolmogorov’s procedures?

Take a look at Mr. Chang’s article. The graph shows that the number of transistors appears to keep doubling. The problem is that information keeps growing, and the type of analysis Google wants to do with various probabilistic methods is growing even faster.

The idea that building more data centers allows Google to do more is only half the story. The other half is math. Competitors who focus on building data centers, therefore, may be addressing only part of the job when trying to catch up with Google. Leapfrogging Google seems difficult if my understanding of the issue is correct.

Getting Doored by Search

December 28, 2008

Have you been in Manhattan and watched a bike messenger surprised by a car door opening? The bike messenger loses these battles, which typically destroy the front wheel of the bike. When this occurs, the messenger has been doored. You can experience a similar surprise with enterprise search.


What happens when you get doored. Source: http://citynoise.org/author/ken_rosatio

The first situation is one that will be increasingly common in 2009. As the economy tanks, litigation is likely to increase. This means that you will need to provide information as part of the legal discovery process. You will get doored if you try to use your existing search system for this function. No go. You will need specialized systems and you will have to be able to provide assurance that spoliation will not occur. “Spoliation” refers to altering or destroying evidence, such as changing an email. Autonomy offers a solution, to cite one example.

The second situation occurs when you implement one of the social systems; for example, a Web log or a wiki. You will find that most enterprise search systems lack filters to handle the content in blogs. Some vendors–for example, Blossom Search–can index Web log content. Exalead has a connector to index information within Blogger.com and other systems. However, your search system may lack the connector. You will be doored because you will have to code or buy a connector. Ouch.
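
To give a sense of what “code a connector” means, here is a minimal sketch in Python. It is my illustration, not any vendor’s connector; the feed URL is hypothetical, and index_document is a stand-in for whatever ingestion call your search system actually exposes.

    # My illustration of a bare-bones Web log connector: fetch an RSS 2.0 feed
    # and hand each post to an indexing callback.
    import urllib.request
    import xml.etree.ElementTree as ET

    def fetch_posts(feed_url: str):
        """Yield (title, link, description) tuples from a simple RSS 2.0 feed."""
        with urllib.request.urlopen(feed_url) as response:
            tree = ET.parse(response)
        for item in tree.iter("item"):
            yield (
                item.findtext("title", default=""),
                item.findtext("link", default=""),
                item.findtext("description", default=""),
            )

    def index_document(title: str, link: str, body: str) -> None:
        print(f"indexing {link}: {title[:40]}")  # stand-in for the real indexer

    for title, link, body in fetch_posts("https://example.com/blog/rss.xml"):
        index_document(title, link, body)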

The third situation arises when you need to make email searchable from a mobile device. To pull this off, you need to find a way to preserve security, prevent a user from deleting mail from her desktop or the mail server, and deliver results without latency. When you try this trick with most enterprise search systems, you will be doored. The fix is to tap a vendor like Coveo and use that company’s email search system.

There’s a small consulting outfit prancing around like a holiday elf saying, “Search is simple. Search is easy. Search is transparent.” Like elves, this assertion is a weird mix of silliness, fairy dust, and ignorance. If this outfit helps you deal with a “simple” search, prepare to get doored. It may not be the search system; it may be your colleagues.

Stephen Arnold, December 28, 2008

Google Translation Nudges Forward

December 27, 2008

I recall a chipper 20-something telling me what she learned in her first class in engineering; to wit, “Patent applications are not products.” As a trophy generation member, flush with entitlement, she is generally correct, but patent applications are not accidental. They are instrumental. If you are working on translation software, you may want to check out Google’s December 25, 2008, “Machine Translation for Query Expansion.” You can find this document by searching the wonderful USPTO system for US20080319962. Once you have that document in front of you, you will learn that Google asserts that it can snag a query, generate synonyms from its statistical machine translation system, and pull back a collection. There are some other methods in the patent application. When I read it, my thought was, “Run a query in English, get back documents in other languages that match the query, and punch the Google Translate button and see the source document in English.” Your interpretation may vary. I was amused that the document appeared on December 25, 2008, when most of the US government was on holiday. I guess the USPTO is working hard to win the favor of the incoming administration.
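
Here is my reading of that flow as a minimal sketch in Python. The translate function is a hypothetical stub standing in for a statistical machine translation system; none of this is Google’s code, and the query and translations are made-up examples.

    # My illustration of query expansion through round-trip translation.
    def translate(text: str, source: str, target: str) -> list:
        """Hypothetical stand-in for a statistical MT system."""
        stub = {
            ("cheap flights", "en", "es"): ["vuelos baratos", "vuelos económicos"],
            ("vuelos baratos", "es", "en"): ["cheap flights", "inexpensive flights"],
            ("vuelos económicos", "es", "en"): ["economical flights", "budget flights"],
        }
        return stub.get((text, source, target), [])

    def expand_query(query: str) -> set:
        expanded = {query}
        for hypothesis in translate(query, "en", "es"):
            # Round-trip each translation to harvest synonyms in the source language.
            expanded.update(translate(hypothesis, "es", "en"))
        return expanded

    print(expand_query("cheap flights"))
    # {'cheap flights', 'inexpensive flights', 'economical flights', 'budget flights'}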

Stephen Arnold, December 27, 2008

The Future of EasyAsk: Depends on Progress

December 18, 2008

EasyAsk is a search system that works quite well. You can read EasyAsk Facts here. The company is now a unit of Progress Software. Progress began with a core of original code and over the years has acquired a number of companies. I think of the firm as a boutique, which is not what the Progress public relations people want me to keep in my tiny goose brain. I saw a news item about Progress Software’s most recent financial report. You can read a summary of the numbers here. If you want more detail, navigate to Google Finance here. The story is simple: earnings are down to $8.5 million from $15.8 million in the fourth quarter of 2007. With the economic climate in deep chill mode, Progress will have to retool its sales and marketing. If the downdraft continues, the company will have to make some tough decisions about which of its many products to hook up to life support. EasyAsk, like other search systems, is a complicated beastie. Search systems gobble up money, and the sales cycle is often long even when the MBAs are running at full throttle. When the MBAs are home worrying about their mortgage payments, the search business is likely to suffer. One warning sign: EasyAsk was not mentioned in the news release I read. This goose is accustomed to watching the weather for signs of a storm. My thought is that one might be building and heading the EasyAsk way. What’s your take? No PR people need reply, thanks.

Stephen Arnold, December 2008

Leximancer Satmetrix Tie Up

December 18, 2008

Leximancer has partnered with Satmetrix so that Satmetrix can utilize Leximancer’s Customer Insight Portal. Satmetrix provides software applications and consulting services to improve customer loyalty. Using “intuitive concept discovery” — semantic analysis — Leximancer derives insights about customer attitudes. Leximancer will provide customer analytics and unstructured text mining for Satmetrix’s Net Promoter, which automatically sifts and categorizes data from blogs, Web sites, social media, e-mails, service notes and survey feedback to increase companies’ customer loyalty, retention and growth. The focus on analyzing positive and negative trends in text entries from customers is key to speed and response for customer service-oriented companies. Satmetrix serves a wide range of markets, including telecommunications firms like Verizon and business services like Careerbuilder.

Jessica Bratcher, December 17, 2008

SharePoint: ChooseChicago

December 18, 2008

I scanned the MSDN Web log postings and saw this headline: “SharePoint Web Sites in Government.” My first reaction was that the author Jamesbr had compiled a list of public-facing Web sites running on Microsoft’s fascinating SharePoint content management, collaboration, search, and Swiss Army Knife software. No joy. Mr. Jamesbr pointed to another person’s list, which was a trifle thin. You can check out this official WSS tally here. Don’t let the WSS fool you. The sites are SharePoint, and there are 432 of them as of December 16, 2008. I navigated to the featured site, ChooseChicago.com. My broadband connection was having a bad hair day. It took 10 seconds for the base page to render, and I had to hit the escape key after 30 seconds to stop the page from trying to locate a missing resource. Sigh. Because this was a featured site that impressed Jamesbr, I did some exploring. First, I navigated to the ChooseChicago.com site and saw this on December 16, 2008:

chicago splash

The search box is located at the top right-hand corner of the page and also at the bottom right-hand corner. But the search system was a tad sluggish. After entering my query “Chinese”, the system cranked for 20 seconds before returning the results list:

chicago result list
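
For what it is worth, here is a minimal sketch in Python of the informal stopwatch test. It times only the base page response, not a full browser render with images and scripts, so it understates what a visitor actually waits through.

    # My illustration of timing a base page fetch.
    import time
    import urllib.request

    def fetch_seconds(url: str, timeout: float = 30.0) -> float:
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        return time.monotonic() - start

    print(f"{fetch_seconds('http://www.choosechicago.com'):.1f} seconds")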


Expert System’s Luca Scagliarini

December 18, 2008

ArnoldIT.com’s Search Wizards Speak series has landed another exclusive. Hard on the heels of the interview with Autonomy’s chief operating officer comes a conversation with Luca Scagliarini, one of the senior executives at Expert System in Modena, Italy, who explains the company’s technology and strategy for 2009. Mr. Scagliarini is a technologist’s technologist and a recognized leader in next generation search systems. The company’s COGITO technology has cut a wide swath through European markets and is now available in North America. Mr. Scagliarini told ArnoldIT.com’s Beyond Search:

A major mobile handheld manufacturer uses our technology to address the issue of supporting new users in learning how to use the device. The objective was to reduce the return rate of the device AND to reduce the customer support costs. This natural language-based solution leverages our semantic technology to provide their customers with a simple and effective tool to answer questions and how-to queries with consistency and high precision. As of today the system has answered, in only 5 months, more than 4 million questions with more than 87% precision.

Search is no longer key word matching and long lists of results. Mr. Scagliarini said:

To deliver an effective question and answer system that works on more than a small set of FAQ, it is very important to have a deep understanding of the text. This is possible only through deep semantic analysis. We have several implementations of our natural language Q&A product recently renamed COGITO Answer. In the next 12 months, we will be investing to expand our footprint worldwide–especially in the U.S. and in the Persian Gulf region to replicate our European success there. In the U.S, we are now supporting customer service operations with natural language Q&A for a government unit of the Department of the Interior and we are one of only 5 semantic partners actively promoted by Oracle.

You can read the complete interview with Mr. Scagliarini on the ArnoldIT.com Web site or you can click here. More information about the company and its technology may be found on the firm’s Web site http://www.expertsystem.net or click here.

Semantic Search Laid Bare

December 17, 2008

Yahoo’s Search Blog here has an interesting interview with Dr. Rudi Studer. The focus is semantic search technologies, which are all the rage in enterprise search and Web search circles. Dr. Studer, according to Yahoo:

is no stranger to the world of semantic search. A full professor in Applied Informatics at University of Karlsruhe, Dr. Studer is also director of the Karlsruhe Service Research Institute, an interdisciplinary center designed to spur new concepts and technologies for a services-based economy. His areas of research include ontology management, semantic web services, and knowledge management. He has been a past president of the Semantic Web Science Association and has served as Editor-in-Chief of the journal Web Semantics.

If you are interested in semantics, you will want to read and save the full text of this interview. I want to highlight three points that caught my attention and then–in my goosely manner–offer several observations.

First, Dr. Studer suggests that “lightweight semantic technologies” have a role to play. He said:

In the context of combining Web 2.0 and Semantic Web technologies, we see that the Web is the central point. In terms of short term impact, Web 2.0 has clearly passed the Semantic Web, but in the long run there is a lot that Semantic Web technologies can contribute. We see especially promising advancements in developing and deploying lightweight semantic approaches.

The key idea is lightweight, not giant semantic engines grinding in a lights-out data center.

Second, Dr. Studer asserts:

Once search engines index Semantic Web data, the benefits will be even more obvious and immediate to the end user. Yahoo!’s SearchMonkey is a good example of this. In turn, if there is a benefit for the end user, content providers will make their data available using Semantic Web standards.

The idea is that in this chicken-and-egg problem, it will be the Web page creators’ job to make use of semantic tags.
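
To make “semantic tags” concrete, here is a minimal sketch in Python of what such markup boils down to: subject, predicate, object triples that a crawler could index alongside the visible page. The URL and property names are illustrative, not taken from any real site or standard vocabulary binding.

    # My illustration of the triples behind Semantic Web markup.
    triples = [
        ("http://example.com/page", "dc:title", "Deep Dish Pizza Guide"),
        ("http://example.com/page", "dc:creator", "Jane Author"),
        ("http://example.com/page", "dcterms:subject", "Chicago restaurants"),
    ]
    for subject, predicate, obj in triples:
        print(f'<{subject}> {predicate} "{obj}" .')  # Turtle-style statement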

Finally, Dr. Studer identifies tools as an issue. He said:

One problem in the early days was that the tool support was not as mature as for other technologies. This has changed over the years as we now have stable tooling infrastructure available. This also becomes apparent when looking at the at this year’s Semantic Web Challenge. Another aspect is the complexity of some of the technologies. For example, understanding the foundation of languages such as OWL (being based on Description Logics) is not trivial. At the same time, doing useful stuff does not require being an expert in Logics – many things can already be done exploiting only a small subset of all the language features.

I am no semantic expert. I have watched several semantic centric initiatives enter the world and–somewhat sadly–watched them die. Against this background, let me offer three observations:

  1. Semantic technology is plumbing and like plumbing, semantic technology should be kept out of sight. I want to use plumbing in a user friendly, problem free setting. Beyond that, I don’t want to know anything about plumbing. Lightweight or heavyweight, I think some other users may feel the same way. Do I look at inverted indexes? Do you?
  2. The notion of putting the burden on Web page or content creators is a great idea, but it won’t work. When I analyzed the five Programmable Search Engine inventions by Ramanathan Guha as part of an analysis for the late, great Bear Stearns, it was clear that Google’s clever Dr. Guha assumed most content would not be tagged in a useful way. Sure, if content was properly tagged, Google could ingest that information. But the core of the PSE invention was Google’s method for taking the semantic bull by the horns. If Dr. Guha’s method works, then Google will become the semantic Web because it will do the tagging work that most people cannot or will not do.
  3. The tools are getting better, but I don’t think users want to use tools. Users want life to be easy, and figuring out how to create appropriate tags, inserting them, and conforming to “standards” such as they are is no fun. The tools will thrill developers and leave most people cold. Check out the tools section at a hardware store. What do you see? Hobbyists and tinkerers and maybe a few professionals who grab what they need and head out. Semantic tools will be like hardware: of interest to a few.

In my opinion, the Google – Guha approach is the one to watch. The semantic Web is gaining traction, but it is in its infancy. If Google jump starts the process by saying, “We will do it for you”, then Google will “own” the semantic Web. Then what? The professional semantic Web folks will grouse, but the GOOG will ignore the howls of protest. Why do you think the GOOG hired Dr. Guha from IBM Almaden? Why did the GOOG create an environment for Dr. Guha to write five patent applications, file them on the same day, and have the USPTO publish five documents on the same day in February 2007? No accident tell you I.

Stephen Arnold, December 17, 2008

