New Search and Old Boundaries

April 28, 2010

Yesterday in my talk at a conference I pointed out that for many people, the Facebook environment will cultivate new species of information retrieval. Understandably the audience listened politely and converted my observations into traditional information retrieval methods. Several of the people with whom I spoke pointed out that the Facebook information was findable only with a programmatic query via the Facebook application programming interfaces or by taking a Facebook feed and processing it. The idea that “search” now spans silos, includes structured and unstructured data, and delivers actionable results describes what some organizations want. There are challenges, of course. These include:

Mandated silos of information; for example, in certain situations, mash ups and desiloization are prohibited for legal or practical reasons
The costs of shifting from inefficient, expensive methods to more informed methods; for example, the costs of data transformation can be onerous. I have talked with individuals who point out that data transformation can consume significant sums of money and these expenditures are often inadequately budgeted. One result is a slow down or cut back on the behind-the-scenes preparatory work
Business processes have sometimes emerged based on convention, user behavior or because the system was refined over time. When “data” are meshed with such a business process, the marriage is a less-than-happy one. Data centric thinking can be blunted when juxtaposed to certain traditional business processes and methods.

In short, the new world can be envisioned, based on speculation, or assembled from fragmentary reports from the field. I can imagine the intrepid 16th century navigators understanding why innovators have to push forward into a new and unknown world. One reminder is the assertion that an estimated 358 million personal data records have been leaked since 2005.

The Guardian article “Facebook Privacy Hole ‘Lets You See Where Strangers Plan to Go’” provides an example of one challenge. The point of the write up is that the Facebook social network has a “privacy hole”. The Guardian says:

Some people report that they are able to see the public “events” that Facebook users have said they will attend – even if the person is not a “friend” on the social network…The implications of being able to find out the movements of any of the 400m people on Facebook are potentially wide-ranging – although the flaw does not seem to apply to every user, or every event. Yee says that the simplest way to prevent your name appearing in such lists is to put “not attending” against any event you are invited to.

As the Facebook approach to finding information captures users, the barriers between new types of information and the uses to which those information objects can be put come down. In a social space, the issue is personal privacy. In an organizational space, the issue is the security of information assets.

As young people enter the workforce, these folks bring a comfort level with Facebook type of systems markedly different from mine. I think organizations are largely unable to control effectively what some employees do with online services. Telework, mobile devices, and smart phones present a management and information challenge.

The lowering of information barriers and the efforts to dissolve silos further reduce an organization’s control of information and the knowledge of the uses to which that information may be put.

Let’s step back.

First, ineffective search and content processing systems exist, so organizations need ways to address the costs and inefficiencies of incumbent systems. Web services and fresh approaches to indexing content seem to be solutions to findability problems in some situations.

Second, employees—particularly those comfortable with pervasive connectivity and social methods of obtaining information—do what works for them. These methods are not necessarily controllable or known to some employers. An employee can use a personal smart phone to ask “friends” a question. After all, what are friends for?

Third, vendors want to describe their systems using words and phrases that connote ways to solve findability problems. Talking about merged data and collaboration may be what’s needed to close a deal.

When these three ingredients are mixed, the result is a security and information control challenge that is only partially understood.

Is it possible to deliver a next generation information experience and minimize the risks from such a system? Sure, but there will be surprises along the route. Whether it is Mr. Zuckerberg’s schedule or insights into the Web browsing habits of government employees, there will be unexpected and important insights about these systems. The ability to use a search interface to obtain reports is increasing. Are the privacy and security controls lagging behind?

Stephen E Arnold, April 28, 2010

Unsponsored post.

Written by Stephen E. Arnold · Filed Under Business strategy, News, Search, Social, Text analytics, Text processing | Comments Off on New Search and Old Boundaries

SAS and Social Media

April 28, 2010

The social media bandwagon rolls on. I read “SAS aims to Make a Splash in Social Media Analytics” and realized that even large firms cannot ignore the shift to Facebook’s impact. True, there are many social media companies, but Facebook has emerged as the go-to service, threatening to eclipse even Twitter. The story says:

SAS says its technology can identify influencers within social networks, quantify their impact and from that forecast the future volume of social media conversations. The ultimate aim is to predict what impact these conversations will have on a business so companies can allocate relevant resources, create “what-if” scenarios and correlate key marketing metrics like brand preference, web traffic, online campaign effectiveness and media mix.

IBM SPSS will be quick to respond. Statistics could become even more fun.

Stephen E Arnold, April 28, 2010

Unsponsored post.

Written by Stephen E. Arnold · Filed Under Business strategy, News, Text analytics, Text processing | 2 Comments

Endeca and a New Marketing Angle

April 25, 2010

I was reading “SAP Spend Performance Management: A New Supply Risk and Spend Analysis Option”. SAP is a company whose actions I watch. The story in Search SAP confused me but it included a reference to the search vendor Endeca that surprised me. Here’s the passage that caught my attention. I have added the bold face for the part that I noted:

Stand-out analytics providers in the spend visibility arena with unique capabilities and/or analytics experience include BIQ, Endeca, Rosslyn Analytics and SAP. At the time of publication, SAP appears furthest along (among these) from an analytics-driven supply risk management solution perspective, while Endeca offers the most flexible capability for examining and integrating disparate data sources in a new type of spend visibility/supply risk mashup approach.

I found the notion of Endeca’s embracing “spend visibility/supply risk mashup approach” fascinating. I am not sure what it means, however. The confusion may be a result of the SAP angle, but it suggests that Endeca is moving in a direction of which I was unaware.

Stephen E Arnold, April 25, 2010

Written by Stephen E. Arnold · Filed Under Business strategy, News, Text analytics | Comments Off on Endeca and a New Marketing Angle

Cuil Founder Lands Another Google Invention

April 22, 2010

I have been reluctant to beat up on the alleged weaknesses of the Cuil.com system for one good reason. Dr. Anna Patterson is a very sharp computer scientist. She developed a quite ingenious system called Xift which she sold to the AltaVista.com crowd. After more engineering and family work, she joined Google and invented some fascinating technology which I discuss in Google Version 2.0. Even though she and her equally smart companion founded Cuil.com, the Patterson impact on Google continues. One example is the April 20, 2010 patent granted for her invention “Information Retrieval System for Archiving Multiple Document Versions.” You can read in my studies The Google Legacy and Google Version 2.0 about the importance of this technique to some Google “time” centric processes. A moment’s reflection will reveal that this ability to traverse deltas has some interesting applications. There are other benefits as well, but the invention is meritorious in my opinion and worth reading in US 7,702,618. Here’s the fine Google/lawyer explanation in the patent’s abstract:

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Index data for multiple versions or instances of documents is also maintained. Each document instance is associated with a date range and relevance data derived from the document for the date range.
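The abstract's last two sentences can be pictured with a small sketch. This is not Google's or Dr. Patterson's implementation, only a toy illustration of keeping index data for several dated versions of one document. The class names, the hand-supplied phrase list, and the sample documents are all assumptions made for the example, and the relevance data the abstract mentions is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentInstance:
    """One version of a document, valid for a date range (hypothetical type)."""
    doc_id: str
    start: date
    end: date
    text: str


class VersionedPhraseIndex:
    """Toy index mapping a phrase to the document instances that contain it."""

    def __init__(self, phrases):
        # The phrase list is supplied by hand; identifying phrases
        # automatically, as the abstract describes, is not attempted here.
        self.phrases = [p.lower() for p in phrases]
        self.postings = defaultdict(list)

    def add(self, instance):
        """Index one dated version under every known phrase it contains."""
        text = instance.text.lower()
        for phrase in self.phrases:
            if phrase in text:
                self.postings[phrase].append(instance)

    def search(self, phrase, on_date):
        """Return ids of documents whose version valid on on_date has the phrase."""
        return [
            inst.doc_id
            for inst in self.postings.get(phrase.lower(), [])
            if inst.start <= on_date <= inst.end
        ]


# Two made-up versions of the same page, each with its own date range.
index = VersionedPhraseIndex(["enterprise search", "operational intelligence"])
index.add(DocumentInstance("page-1", date(2010, 1, 1), date(2010, 3, 31),
                           "A primer on enterprise search."))
index.add(DocumentInstance("page-1", date(2010, 4, 1), date(2010, 12, 31),
                           "A primer on operational intelligence."))

february_hits = index.search("enterprise search", date(2010, 2, 15))
may_hits = index.search("enterprise search", date(2010, 5, 1))
print(february_hits, may_hits)
```

The same query returns the page in February and nothing in May, because the phrase appears only in the earlier version. That date-scoped lookup is the part of the idea the sketch is meant to show.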

Dr. Patterson has tallied more than a half dozen inventions for the Google. I pay attention to her work and I discount much of the criticism aimed at her most recent activities. In my experience, the systems reveal significant insights into the trajectory of search. Care to disagree? Just bring some facts and your list of inventions and your record of innovation in search. Dr. Patterson may find the dust up amusing. I will.

Stephen E Arnold, April 22, 2010

Unsponsored post. Dr. Patterson let me pet one of her dogs once. Does that count as a payoff?

Written by Stephen E. Arnold · Filed Under Google, News, Online (general), Technology, Text analytics, Text processing | Comments Off on Cuil Founder Lands Another Google Invention

The Seven Forms of Mass Media

April 21, 2010

Last evening on a pleasant boat ride on the Adriatic, a number of young computer scientists to be were asking about my Google lecture. A few challenged me, but most seemed to agree with my assertion that Google has a large number of balls in the air. A talented juggler, of course, can deal with five or six balls. The average juggler may struggle to keep two or three in sync.

One of the students shifted the subject to search and “findability.” As you know, I floated the idea that search and content processing is morphing into operational intelligence, preferably real-time operational intelligence, not the somewhat stuffy method of banging two or three words into a search box and taking the most likely hit as the answer.

The question put to me was, “Search has not kept up with printed text, which has been around since the 1500s, maybe earlier. What are we going to do about mobile media?”

The idea is that we still have a difficult time locating the precise segment of text or datum. With mobile devices placing restraints on interface, fostering new types of content like short text messages, and producing an increasing flow of pictures and video, finding is harder not easier.

I remembered reading “Cell Phones: The Seventh Mass Media” and had a copy of this document on my laptop. I did not give much weight to the assertion that mobile devices were a mass medium, but I thought the insight had relevance. Mobile information comes with some interesting characteristics. These include:

The potential for metadata derived from the user’s mobile number, location, call history, etc
The index terms in content, if the system can parse information objects or unwrap text in an image or video such as converting an image to ASCII and then indexing the name of a restaurant or other message in an object
Contextual information, if available, related to content, identified entities, recipients of messages, etc.
Log file processing for any other cues about the user, recipient(s), and information objects.

What this line of thinking indicates is that a shift to mobile devices has the potential for increasing the amount of metadata about information objects. A “tweet”, for instance, may be brief, but one could, given the right processing system, impart considerable richness to the information object in the form of metadata of one sort or another.
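To make the point concrete, here is a minimal sketch of how a short message might pick up metadata from the four sources listed above. Everything in it is an assumption for illustration: the field names, the sender and location values, and the hand-made entity list stand in for what a real carrier or processing system would supply.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InformationObject:
    """A short message plus whatever metadata can be attached to it."""
    text: str
    metadata: dict = field(default_factory=dict)


def enrich(text, sender, location, sent_at, known_entities):
    """Attach device, content, and context metadata to a short message."""
    obj = InformationObject(text)
    lowered = text.lower()
    # Metadata derived from the device and the network (hypothetical fields).
    obj.metadata["sender"] = sender
    obj.metadata["location"] = location
    obj.metadata["sent_at"] = sent_at.isoformat()
    # Index terms parsed from the content itself.
    obj.metadata["index_terms"] = sorted(set(re.findall(r"[a-z0-9']+", lowered)))
    # Contextual information: entities from a hand-made list, plus recipients.
    obj.metadata["entities"] = [e for e in known_entities if e.lower() in lowered]
    obj.metadata["mentions"] = re.findall(r"@(\w+)", text)
    return obj


# Made-up message and made-up context values.
tweet = enrich(
    "Lunch at Cafe Adriatic with @maria",
    sender="+386-555-0100",
    location=(45.51, 13.59),
    sent_at=datetime(2010, 4, 21, 19, 30, tzinfo=timezone.utc),
    known_entities=["Cafe Adriatic", "Ephesus"],
)
print(tweet.metadata)
```

A six word message ends up carrying a sender, a place, a time, a recipient, and a named entity. Log file processing, the fourth item in the list, would add further fields in the same way.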

The previous six forms of media—[I] print (books, magazines, and newspapers), [II] recordings; [III] cinema; [IV] radio; [V] television; and [VI] Internet—fit neatly under the umbrella of [VII] mobile. The idea is mobile embraces the other six. This type of reasoning is quite useful because it gathers some disparate items and adds some handles and knobs to the otherwise unwieldy assortment in the collection.

In the write up referenced above, I found this passage interesting: “Mobile is as different from the Internet as TV is from the radio.”

The challenge that is kicked to the side of the information highway is, “How does one find needed information in this seventh mass media?” Not very well in my experience. In fact, finding and accessing information is clumsy for textual information. After 500 years, the basic approach of hunting, Easter egg style, has been facilitated by information retrieval systems. But I think most people who look for information can point out some obvious deficiencies. For example, most retrieval systems ignore content in various languages. Real time information is more of a marketing ploy than a useful means of figuring out the pulse count for a particular concept. A comprehensive search remains a job for a specialist who would be recognized by an archivist who worked in Ephesus’ library 2500 years ago.

Are you able to locate this video on Ustream or any other video search system? I could not, but I know the video exists. Here is a screen capture. Finding mobile content can be next to impossible in my opinion.

When I toss in the radio and other rich media content, finding and accessing pose enormous challenges to a researcher and a casual user alike. In my keynote speech on April 15, 2010, I referenced some Google patent documents. The clutch of disclosures provide some evidence that Google wants to apply smart software to the editorial job of creating personalized rich media program guides. The approach strikes me as an extension of other personalization approaches, and I am not convinced that explicit personalization is a method that will crack the problem of finding information in the seventh medium or any other for that matter.

Here’s my reasoning:

Search and retrieval methods for text don’t solve problems. The more information processed means longer results lists and an increase in the work required to figure out where the answer is.
Smart systems like Google’s or the Cuil Cpedia project are in their infancy. An expert may find fault with smart software that is actually quite stupid from the informed user’s point of view.
Making use of context is a challenging problem for research scientists but asking one’s “friends” may be the simplest, most economical, and widely used method. Facebook’s utility as a finding system or Twitter’s vibrating mesh may be the killer app for finding content from mobile devices.
As impressive as Google’s achievements have been in the last 11 years, the approach remains largely a modernization of search systems from the 1970s. A new direction may be needed.

The bright young PhDs have the job of figuring out if mobile is indeed the seventh medium. The group with which I was talking or similar engineers elsewhere have the job of cracking the findability problem for the seventh medium. My hope is that on the road to solving the problem of the new seventh medium’s search challenge, a solution to finding information in the other six is discovered as well.

The interest in my use of the phrase “operational intelligence” tells me one thing. Search is a devalued and somewhat tired bit of jargon. Unfortunately substituting operational intelligence for the word search does not address the problem of delivering the right information when it is needed in a form that the user can easily apprehend and use.

There’s work to be done. A lot of work in my opinion.

Stephen E Arnold, April 20, 2010

No sponsor for this post, gentle reader.

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Mobile, Rich media, Semantic, Technology, Text analytics, Text processing | 4 Comments

Explaining Artificial Intelligence to Everyone

April 18, 2010

Science Daily ran a story on April 1, 2010. I was not sure if this story was a joke or whether it was serious. I will let you decide. The title was “Grand Unified Theory of AI: New Approach Unites Two Prevailing but Often Opposed Strains in Artificial-Intelligence Research.” The write up explains the Math Club approach; that is, the use of numerical methods, which are now popular. The article describes the rules based approach, which requires a human to write the rules. The core of the story is a pitch for the “Church system”. Science Daily explains:

“With probabilistic reasoning, you get all that structure for free,” Goodman says. A Church program that has never encountered a flightless bird might, initially, set the probability that any bird can fly at 99.99 percent. But as it learns more about cassowaries — and penguins, and caged and broken-winged robins — it revises its probabilities accordingly. Ultimately, the probabilities represent all the conceptual distinctions that early AI researchers would have had to code by hand. But the system learns those distinctions itself, over time — much the way humans learn new concepts and revise old ones. “What’s brilliant about this is that it allows you to build a cognitive model in a fantastically much more straightforward and transparent way than you could do before,” says Nick Chater, a professor of cognitive and decision sciences at University College London. “You can imagine all the things that a human knows, and trying to list those would just be an endless task, and it might even be an infinite task. But the magic trick is saying, ‘No, no, just tell me a few things,’ and then the brain — or in this case the Church system, hopefully somewhat analogous to the way the mind does it — can churn out, using its probabilistic calculation, all the consequences and inferences. And also, when you give the system new information, it can figure out the consequences of that.”
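The bird example in the passage can be worked through with ordinary probability bookkeeping. The sketch below is not Church, which is its own probabilistic programming language. It is a plain Beta-Bernoulli update, swapped in to show what "revises its probabilities accordingly" can mean in practice. The prior pseudo-counts and the 50 flightless observations are invented numbers.

```python
from dataclasses import dataclass


@dataclass
class FlightBelief:
    """Belief that a bird can fly, stored as two pseudo-counts (Beta prior)."""
    flies: float     # weight of evidence for "this bird can fly"
    grounded: float  # weight of evidence for "this bird cannot fly"

    @property
    def probability(self):
        """Current estimate of the chance that a bird can fly."""
        return self.flies / (self.flies + self.grounded)

    def observe(self, can_fly):
        """Update the belief with one observed bird."""
        if can_fly:
            self.flies += 1
        else:
            self.grounded += 1


# Assumed prior chosen so the starting estimate is 99.99 percent.
belief = FlightBelief(flies=9999.0, grounded=1.0)
prior = belief.probability

# Assumed evidence: 50 encounters with cassowaries, penguins, caged robins.
for can_fly in [False] * 50:
    belief.observe(can_fly)
posterior = belief.probability

print(prior, posterior)
```

The estimate drifts down from the prior as flightless birds accumulate, and nobody has to write a rule that says "penguins do not fly." How far it moves depends entirely on how heavily the prior is weighted, which is a modeling choice and not something the article specifies.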

We talked about this write up at lunch and decided that we would invite readers to read the article and draw a conclusion about a “unified theory of artificial intelligence.”

Stephen E Arnold, April 19, 2010

A freebie.

Written by Stephen E. Arnold · Filed Under News, Semantic, Technology, Text analytics, Text processing | 1 Comment

A-Life NLP Renew Medical Automation Deal

April 17, 2010

A-Life Medical, Inc., a leading provider of computer-assisted coding (CAC) products and services to the healthcare industry, announced today the renewal of an extensive contract with Associated Billing Services, Inc. The announcement is titled “Associated Billing Services Renews Extensive Agreement with A-Life Medical.” LifeCode®, the computerized coding and workflow management product that leverages A-Life’s proprietary and patented technology, appears to be the source of the cost savings and efficiencies key to a successful business.

According to Associated Billing Services’ vice president, Matthew Frick:

“We have built a long-standing relationship with A-Life based on the benefits of the company’s patented NLP technology. Its accuracy rate and ability to appropriately code quickly, seamlessly and efficiently, has helped us to significantly reduce turnaround time, labor costs and accounts receivable days of services outstanding.”

Using NLP technology, A-Life deciphers electronically transcribed patient encounters, received via the Internet through its data center, and then codes them appropriately for reimbursement purposes.

Melody K. Smith, April 17, 2010

Note: Post was not sponsored.

Written by Stephen E. Arnold · Filed Under Enterprise, News, Semantic, Technology, Text analytics, Text processing | Comments Off on A-Life NLP Renew Medical Automation Deal

Blogs May Be Training Input for AI Systems

April 16, 2010

The Montréal Gazette ran an interesting story, “‘Mundane’ Blogs Could Help Train Artificial-Intelligence Computers: Researcher.” I think of blogs as marketing vehicles, not instructional material. That goes to show how little I know. For me, the key passage in the write up was:

For Andrew Gordon, there’s no such thing as a boring blog — even if it chronicles making breakfast or walking to work. A research scientist at the University of Southern California’s Institute for Creative Technologies, he’s heading a new project with the ambitious aim of archiving every English-language blog entry posted online — a million of them a day — in hopes of using this vast database to teach artificial-intelligence computers about real life. “People write about the mundane aspects of their daily life, and for me, personally, I find it incredibly interesting,” he says.

This line of research falls within what has been called “a formalization of common sense.”

Stephen E Arnold, April 16, 2010

No one paid for this post.

Written by Stephen E. Arnold · Filed Under News, Search, Text analytics, Text processing | Comments Off on Blogs May Be Training Input for AI Systems

Google and Disruption: Will It Work Tomorrow?

April 15, 2010

Editor’s Note: The text in this article is derived from the notes prepared for Stephen E Arnold’s keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.

Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.

I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”

My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy”, which is still timely and available from Infonortics Ltd. in Tetbury, Glos.

The three ideas were:

First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third what companies like Excite, Lycos, Microsoft, and Yahoo were paying.

Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.

Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.

Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.

In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Google, Rich media, Technology, Text analytics, Text processing | 1 Comment

Operational Intelligence, the New Enterprise Search

April 14, 2010

Worlds are colliding. Business intelligence, search, analytics, and business process are hurtling toward one another. No collider is needed. The impetus comes from managers who are struggling to keep their firms above water. Make no mistake about it. The economic climate may be improving based on government data and the self serving reports from global financial powerhouses. But just look at the number of empty buildings, the fraying infrastructure, and the desperation in the eyes of most employees in North America.

For those lucky enough to be thriving in a world gone mad for sending ads to individuals, life may be good. For people who are in more traditional jobs, the notion of finding information is an everyday struggle. Without the right information at the moment it is needed, organizations can make costly mistakes. These are not errors of judgment like magazine publishers who see the iPad as the font of new revenue or the dew eyed MBA looking for a job with a third string consulting firm. Nope. These visages reflect the person who cannot explain to a customer why an order was lost or an automobile was delivered with a faulty electronic gizmo. In fact, I see the effects of downsizing, the need to squeeze extra money from every transaction, and crazy decisions made by committees everywhere I look, regardless of the country.

What’s the answer? According to a sponsored white paper from the consulting outfit IDC, Teradata has the fix. Now you may not think that even bigger piles of data will help your business. I admit that I don’t believe the premise either. You can get the story in “Real-Time Operational Intelligence Gains Momentum in Europe: Teradata-sponsored business survey shows adoption details for ‘Active Data Warehousing’” and make up your own mind. Big data means big costs in my experience.

What I liked about this write up was the phrase “real time operational intelligence”. True, the acronym RTOI is a bit clumsy, but I think the phrase points to an important shift in search and content processing. RTOI delivers what many of the people with whom I speak perceive enterprise search delivering. The idea is that the information in an organization is available when needed to help people answer questions and make decisions. Hopefully the decision makers did well in school and have a modicum of common sense.

After thinking about this phrase and the acronym RTOI, I had several thoughts:

Vendors of enterprise search may want to make this phrase their own. It is a heck of a lot more compelling than “putting information at your fingertips” or “dashboard”
Search, in this phrase’s embrace, becomes an enabler. Search becomes like butter in a recipe. Without the ingredient the dish does not work. Many vendors of search see themselves as the fish, vegetables, and spices in the meal. RTOI makes search an essential but supporting ingredient.
The conceptual outcome of RTOI may be consolidation of what now are marketed as separate systems. For RTOI to work, an organization needs an integrated approach. Data are not enough. The various features and functions of analytics, retrieval, report generation, and business processes must be woven together into one coherent, affordable system.

Is RTOI the future? I am willing to float a tentative, “Yes.” Fragmented information centric systems are now a cost and resource challenge for many organizations. The time is ripe for a new approach. Maybe it will be fueled by open source software like Lucene? Maybe it will be the use of a system like Google’s? Maybe it will be a roll up following the trajectory of Autonomy or OpenText.

The status quo is not delivering and change may be coming. Teradata may not be the winner, but it has contributed a useful catch phrase in my opinion. The phrase “enterprise search” could be put to rest which would be a step forward in my opinion.

Stephen E Arnold, April 14, 2010

Unsponsored post.

Written by Stephen E. Arnold · Filed Under Business intelligence, Enterprise, News, Search, Technology, Text analytics | 2 Comments

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
