Oracle Rolls Out Text Index Strategy

March 7, 2013

Oracle’s support of locally partitioned indexes has created a need for users to be able to split those indexes and rebuild them in a timely manner. How do you rebuild an index without making your application unavailable for the entire time?

Prsync’s look into the maintenance disadvantages and Oracle’s subsequent problem solving in “Partition Maintenance and Oracle Text Indexes” gives us a look at something new: the “Without Validation” and “Split Partition” features. These options offer a way to rebuild indexes without first checking every row.

“That solves the problem, but it’s rather heavy-handed. So instead we need to institute some kind of “change management”. There are doubtless several ways to achieve this, but I’ve done it by creating triggers which monitor any updates or inserts on the base table, and copy them to a temporary “staging” table. These transactions can then be copied back to the main table after the partition split or merge is complete, and the index sync’d in the normal way.”

So now there is a solution. But because the “Without Validation” option skips the check of every partition key value that confirms each row is going to the correct partition, extra care is needed when using it.
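The staging idea in the quoted passage can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the prsync author's trigger code and not Oracle syntax: the class `StagedTable` and its methods are hypothetical names, and the sketch assumes the maintenance step swaps in a copy of the data taken when maintenance began, so any write made in the meantime would be lost unless it is captured and replayed.

```python
class StagedTable:
    """Toy stand-in for a base table plus a staging table (hypothetical names)."""

    def __init__(self):
        self.rows = {}         # main table: key -> value
        self.staging = []      # captured (key, value) writes, in arrival order
        self._snapshot = None  # copy of the data being rebuilt, if any

    def write(self, key, value):
        """Insert or update a row; while maintenance runs, also copy it to staging."""
        self.rows[key] = value
        if self._snapshot is not None:
            self.staging.append((key, value))

    def begin_maintenance(self):
        """Assumption: maintenance works on a copy taken at this moment."""
        self._snapshot = dict(self.rows)

    def end_maintenance(self):
        """Swap in the rebuilt copy, then replay the staged writes in order."""
        self.rows = self._snapshot  # concurrent writes are missing at this point
        self._snapshot = None
        replayed = 0
        for key, value in self.staging:
            self.rows[key] = value
            replayed += 1
        self.staging.clear()
        return replayed


table = StagedTable()
table.write(1, "first document")
table.begin_maintenance()
table.write(2, "arrived during the split")
table.write(1, "first document, revised")
replayed = table.end_maintenance()
```

The sketch covers inserts and updates only, matching the quote; deletes would need their own handling.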

It’s a long-needed saving grace that will save time and ultimately money by getting apps back up and running more efficiently, but there is no substitute for attention to detail. For a more in-depth look at the process, we suggest heading over to prsync.

Leslie Radcliff,  March 07, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Come Here, Watson. I Want a Cusp of Commercialization

February 28, 2013

For a moment, I thought I was reading a sitcom script. You judge for yourself. Navigate to “And Now, from IBM, It’s Chef Watson.” If you have an environmentally unfriendly version of the New York Times, you can find the script—sorry, real news story—on page B1 of the February 28, 2013, edition.

Let me highlight several phrases and sentences which I found amusing and somewhat troubling for those trying to convince people to license next generation search systems. Keep in mind that the point of the story is Watson, IBM’s next generation Jeopardy winning search system. The peripatetic Watson has done education, insurance, and cancer cracking. Now, Watson and its formidable technical amalgamation of open source and proprietary code is prepping for the Food Network.

IBM’s Watson is hunting for revenues and finding publicity. Can a $100 billion entity find money in search, content processing, and analytics with a silicon Watson? Someday perhaps.

Here are the items I noted, highlighted in dark red and bold to make the words easy to spot:

First, this phrase, “…tries to expand its [IBM’s] artificial intelligence technology and turn Watson into something that actually makes commercial sense.” Reading this statement in the context of Hewlett Packard’s interesting commercial activities related to the write down of the spectacular $11 billion purchase of Autonomy is rife with irony, probably unintentional too.

Second, I found the phrase “on the cusp of commercialization.” Interesting. The Jeopardy show aired in early 2011. A “cusp,” according to one of the online dictionaries, is “A transitional point or time, as between two astrological signs.” Yep, I believe in astrology.

Exclusive Interview: Tom Reamy KAPS Group

February 27, 2013

Another Palantir Push: But Little Hard Financial Data. Why Not?

February 23, 2013

I was reading about the TED Conference’s yo-yo presentation. My eye drifted across an expanse of cellulose and landed on “The Humane Way to Crack Terrorists.” (This link will go dead so be aware that you may have to pay to read the item online.) The subtitle was one of those News Corp. Google things: “Big data may make enhanced interrogation obsolete.” The source? Some minor blog from America’s hinterland, Silicon Valley? Nope. The Wall Street Journal, February 23, 2013, page C 12.

What’s the subject, really? The answer, in my opinion, is Palantir. If you monitor the flagship, traditional media, Palantir has a solid track record of getting written about in print magazines. I suppose that the folks who have pumped about $150 million into the “big data” company read those magazines and the Wall Street Journal type publications each day. I know I do, and I am an addled goose in rural Kentucky, the high tech nerve center of the new industrial revolution. After February 28, 2013, I am not sure about the economy, however.

Here’s the passage I noted:

There’s a tellingly brief passage in “The Finish: The Killing of Osama bin Laden” by Mark Bowden. “The hunt for bin Laden and others eventually drew on an unfathomably rich database,” he writes. “Sifting through it required software capable of ranging deep and fast and with keen discernment—a problem the government itself proved less effective at solving than were teams of young software engineers in Silicon Valley. A startup called Palantir, for instance, came up with a program that elegantly accomplished what TIA [Terrorism Information Awareness program, set up in 2002] had set out to do.” When I met the chief executive and co-founder of Palantir, Alex Karp, recently, he was straightforward: “It is my personal belief that flawless data integration at any kind of scale, with a rigorous access control model, allows analysts to perform operations that are only intrusive on the data. They are not intrusive on human beings.” Obviously, Palantir doesn’t comment on classified work. But its technological phalanx—processing countless leads, from flight manifests to tapped phone calls, into one resource for people to interpret—is known to have been key in locating bin Laden. The company, founded in 2004, has large contracts across the intelligence community and is enterprise-wide at the FBI. Its first client was the CIA.

Nifty stuff. Palantir has high profile clients like intelligence and law enforcement outfits. But where is a hedge fund or a consumer products company? Allegedly the fancy math technology can work wonders. The implication is that outfits like Digital Reasoning, Recorded Future, and even Tibco are not in Palantir’s league. Oh, really? What about outfits like IBM and Oracle and SAS? Nah. Palantir seems to be where the good stuff happens in the context of this Wall Street Journal article.

In my view, the write up triggered several notes on my ubiquitous 4×6 paper note cards, just like the ones I used in high school debate competitions:

First, what about that legal dust up with i2 Group? Here’s a link to refresh one’s memory. I recall that there was also some disagreement, a few real media stories, and then a settlement regarding sector leader i2. Note: I did some work years ago for this outfit, which is now owned by IBM. Oh, and after the settlement, silence. Just what was that legal dispute about anyway? The Wall Street Journal story does not touch on that obviously trivial issue related to the legal matter. Why not? The space in the newspaper was probably needed to cover the yo-yo guy.

Second, can software emulate the motion picture approach to reality? In my experience, numerical recipes can be useful, but they can also provide some points which are subject to contention. A recent example is the gentleman’s disagreement about an electric vehicle. Data, analyses, and interpretations—muddled. Not like the motion pictures’ tidiness and quite final end point. “The end” solves a lot of fictional problems. Life is less clear, a lot less clear in my experience.

Third, how is Palantir doing as a business? After all, the story ran in the Wall Street Journal, which is about business. I appreciate the references to a motion picture, but I am curious about how Palantir is doing on its march to generate a billion or more in revenues. At some point, the investors are going to look at the money pumped into Palantir, the time spent developing the magical technology which warrants metaphorical juxtaposition to Hollywood outputs, and the profitability of the company’s sales. Why doesn’t the Wall Street Journal do the business thing? Revenue, commercial customers, and case studies which do not flaunt words which Bing and Google love to consume in their indexing systems?

It is Saturday, and I suppose there are lots of 20 somethings working at 0900 Eastern as I write this. They will fill the gap. I will have to wait. I wonder if the predictive algorithms from Palantir can tell me how long before hard facts become available?

One final question: If this Palantir type of system worked, why aren’t the firms in this Palantir-type software sector dominating in financial services, marketing, and consumer products? I wonder if the reason is that fancy math generates high expectations and then creates some situations in which reality does not work just like a cinema thriller?

Stephen E Arnold, February 23, 2013

Oracle Text Workaround for Stop Words List

February 22, 2013

We’ve come across a discussion about Oracle Text at StackOverflow, “Oracle Text Search Doesn’t Work on Some Words.” Essentially, some words cannot initially be indexed, and the fix is to go in and remove those words from the stop words list. Interesting.

The question-and-answer site for programmers received this query:

“I am using Oracle’s Text Search for my project. I created a ctxsys.context index on my column and inserted one entry ‘Would you like some wine???’. I executed the query ‘select guid, text, score(10) from triplet where contains (text, ‘Would’, 10) > 0’; it gave me no results. Querying ‘you’ and ‘some’ also return zero results. Only ‘like’ and ‘wine’ matches the record. Does Oracle consider you, would, some as stop words?? How can I let Oracle match these words? Thank you.”

The top response reveals:

“I found that the query’s output is perfect according to the stop word lists that is in the oracle.

those words can be found in the ctxsys package, and you could query for the stoplist and the stop words using “SELECT * FROM CTX_STOPLISTS; SELECT * FROM ctx_stopwords;” and yes, the oracle consider ‘you’, ‘would’ in your query as stop words.”

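The solution: first make sure your user can run the stoplist maintenance procedures, which is what the command “GRANT EXECUTE ON CTXSYS.CTX_DDL to you” does, then call the desired CTX_DDL procedure to remove the offending stop words. The grant by itself removes nothing. See the link for an example.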
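To see why the query for “Would” came back empty, here is a minimal toy model in Python of how a stoplist affects a text index. It is a sketch for illustration only, not Oracle Text’s implementation: the three-word `STOPLIST` is a hypothetical subset, and `tokenize`, `build_index`, and `search` are names made up for this example.

```python
import re

# Hypothetical stoplist: a tiny subset chosen for illustration only.
STOPLIST = {"would", "you", "some"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs, stoplist):
    """Map each non-stop-word token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token in stoplist:
                continue  # stop words never reach the index
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents whose indexed tokens include the term."""
    return sorted(index.get(term.lower(), set()))


docs = {1: "Would you like some wine???"}

# Index built with the stoplist as-is: 'would' is dropped at indexing time.
default_index = build_index(docs, STOPLIST)

# Index rebuilt after removing 'would' from the stoplist.
custom_index = build_index(docs, STOPLIST - {"would"})
```

In this toy model, `search(default_index, "Would")` finds nothing while `search(default_index, "wine")` finds the document, which mirrors the behavior described in the question. Note that the sketch rebuilds the index after the stoplist changes, because words skipped at indexing time are simply not there to be found.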

Cynthia Murrell, February 22, 2013

IBM and Price Cuts: Is Watson a Factor?

February 17, 2013

I read “IBM Cuts Price of Watson Based Power Servers.” I have no clue if the story is correct, half current, or incorrect. What’s important is that CIOL.com thought the notion of a Watson related price cut newsworthy.

The Power7 based servers were hot stuff several years ago. CPU performance is no longer the gating factor as it was in the days of STAIRS III. Input/output, memory subsystems, and various types of latency make a system fast or not. Heck, careless programming can make Google’s zippy boxes howl with pain when their innards suffer a computational cramp.

The write up asserts:

IBM will roll out eight new Power Systems for entry level starting at $5,947. The new systems include Power Express 710, 720, 730 and 740 family of products…. IBM will also introduce two new PowerLinux Systems – 7R1 and 7R2 – optimized for IBM InfoSphere BigInsights and InfoSphere Streams big data analytics software. The company will also introduce two new Power Systems – 750 and 760 – for midsized and large enterprises.

The hot item in the story in my opinion is this reference:

The new systems are based on IBM’s Watson system and are powered by its Power7+ microprocessor technology. These will enable users to build and deploy infrastructure for private and hybrid clouds, as per a release.

The write up includes the now obligatory baloney about the big data, cloud and caching tactics for performance.

If the story is incorrect, no big deal. Any publicity is good, even for a dog movie like “Heaven’s Gate” and its expensive roller skates. If the story is half correct, why is Watson making an appearance in juxtaposition to “entry level”? Is the vaunted Jeopardy winning technology not generating sufficient revenue to pay back the development time and the sunk marketing costs? If the story is correct, I am interested in the fact that high end information technology has to be bundled at lower prices.

Years ago, I was told by an informed person that IBM knew what it was doing when it came to search and information retrieval. Maybe the company will come to dominate the enterprise market for big data, analytics, and smarter search. On the other hand, hasn’t IBM travelled this road before, and yet the journey continues?

Stay tuned to Jeopardy or monitor the cancer related news stream. Watson is with us along with a Power7 chip which may be experiencing some symptoms of rheumatism.

Stephen E Arnold, February 17, 2013

Sinequa France: Update 2013

February 14, 2013

My research team was winnowing our archive of information about European search vendors. Since Martin White’s article for eContent in 2011, a number of changes have swept through the search and content processing sector. Some changes were significant; for example, HP’s stunning acquisition of Autonomy. Others were more modest; for example, the steady progress of such companies as Sinequa and Spotter, among others.

The European technical grip on search is getting stronger. Google is the dominant player in Web search. But in enterprise content processing, some European firms are moving more rapidly than their North American or Pacific Rim counterparts.

The Sinequa tag cloud. See http://www.sinequa.com/en/page/solutions/category-1.aspx

One interesting example is Sinequa, based in Paris. The company, like other French technology firms, has a staff of capable engineers and managers. However, unlike some other companies, Sinequa has continued to establish a track record as a company innovating in technology and capturing some important accounts; for example, Siemens, the German industrial powerhouse.

Sinequa’s approach is to emphasize that enterprise search has moved to unified information access. A number of companies make similar claims. Sinequa has established that its technology can deliver the type of one-stop access to structured and unstructured content that almost every vendor claims to deliver. You can get a useful overview of the architecture of the Sinequa platform at http://www.sinequa.com/en/page/product/product.aspx.

A relatively recent addition to the Sinequa.com Web site is a set of case analysis videos. I find case examples extremely useful. The presentation of this type of information in rich media format makes it easier for me to get a sense of the value of the solution a vendor delivers. I found the Mercer video particularly interesting. You can find these testimonials at http://www.sinequa.com/en/page/clients/clients-video.aspx.

The trajectory of European search, content processing, and analytics vendors is difficult to plot in today’s uncertain economic climate. Sinequa warrants a close look for organizations seeking an integrated approach to their content assets. For more information about Sinequa’s current activities, tap into the firm’s blog at http://blog.sinequa.com/

Stephen E Arnold, February 14, 2013

Sponsored by EMRxNow, the information service which tracks automated indexing of electronic medical records

Change Comes to Attensity

February 14, 2013

Just as the demand for analytics is ascending, Attensity makes a management change. We learn the company recently named J. Kirsten Bay their head honcho in “Attensity Names New President/CEO,” posted at Destination CRM. The press release stresses the new CEO’s considerable credentials:

“Bay brings to Attensity nearly 20 years of strategic process and organizational policy experience derived from the information management, finance, and consumer product industries. She is an expert in advising both the public and private sector on the development of econometric policy models. Most recently, as vice president of commercial business with iSIGHT Partners, Bay provided strategic counsel to Fortune 500 companies on managing intelligence requirements and implementing customer and development programs to integrate intelligence into decision programs.”

The company’s flagship product Attensity Pipeline collects and semantically annotates data from social media and other online sources. From there, it passes to Attensity Analyze for text analytics and customer engagement suggestions.

Headquartered in Palo Alto, California, folks at Attensity pride themselves on the accuracy of their analytic engines and their intuitive reports. Rooted in their development of tools that serve the intelligence community, the company now provides semantic solutions to many Global 2000 companies and government agencies.

Cynthia Murrell, February 14, 2013

From Jeopardy to Cancer Treatment: An IBM Story

February 10, 2013

I read “IBM Supercomputer Watson to Help in Cancer Treatment.” I am burned out on the assertions of search, content processing, and analytics vendors. The algorithms predict, deliver actionable information, and answer tough questions. Okay, I will just believe these statements. Most of the folks with whom I interact either believe these statements or do not really care.

Watson, as you may know, takes open source goodness, layers on a knowledge base, and wraps the confection in layers of smart software. I am simplifying, but the reality is irrelevant given the marketing need.

Here’s the passage I noted:

A year ago, a team at Memorial Sloan-Kettering started working with an IBM and a WellPoint team to train Watson to help doctors choose therapies for breast and lung cancer patients. They continue to share their knowledge and expertise in oncology and information technology, beginning with hundreds of lung cancers, the aim being to help Watson learn as much as possible about cancer care and how oncologists use medical data, as well as their experiences in personalized cancer therapies. During this period, doctors and technology experts have spent thousands of hours helping Watson learn how to process, analyze and interpret the meaning of sophisticated clinical data using natural language processing; the aim being to achieve better health care quality and efficiency.

There you go. For the dozens of companies working to create next generation information retrieval systems which are affordable, actually work, and can be deployed without legions of engineers—game over. IBM Watson has won the search battle. Now for the optimists who continue to pump money into decade old search companies which have modest revenue growth, kiss those bucks goodbye. For the PhD students working on the revolutionary system which promises to transform findability, get a job at Kentucky Fried Chicken. And Google? Well, IBM knows your limits so stick to selling ads.

IBM is doing it all:

Manoj Saxena, IBM General Manager, Watson Solutions, said:

“IBM’s work with WellPoint and Memorial Sloan-Kettering Cancer Center represents a landmark collaboration in how technology and evidence based medicine can transform the way in which health care is practiced. These breakthrough capabilities bring forward the first in a series of Watson-based technologies, which exemplifies the value of applying big data and analytics and cognitive computing to tackle the industry’s most pressing challenges.”

How different is Watson from the HP Autonomy, Recommind, or even the DR LINK technology? Well, maybe the open source angle is the same. But IBM needs to do more than make assertions and buy analytics companies as the company recycles open source technology in my opinion. I thought IBM was a consulting firm? Here I am wrong again. Watson probably “knew” that after hours of training, tuning, and talking. But in the back of my mind, I ask, “What if those training data are inapplicable to the problem at hand? What if the journal articles are fiddled by tenure seekers or even pharmaceutical outfits or institutions trying to maximize insurance payouts or careless record keeping by medical staff?” Nah, irrelevant questions. IBM has this smart system nailed. Search solved. What’s next, IBM?

Stephen E Arnold, February 10, 2013

Salesforce Acquires Entropy Soft

February 7, 2013

Connectors are important. If a system cannot acquire content from a source, the fanciest text processing system cannot do its work. Years ago Oracle acquired Stellent, which had snapped up Outside In. Now Salesforce has followed in Oracle’s footsteps with its acquisition of Entropy Soft. (I assume the story “Salesforce.com Has Acquired French Startup EntropySoft” is accurate.) Entropy Soft is not a start up in my opinion. The company was set up in 2005 or so. According to the write up, Entropy Soft had about $3.5 million in funding. Details are limited. The question which the deal raises is, “What services will Salesforce introduce now that it has acquired software to acquire diverse enterprise content?”

A list of Entropy Soft connectors is no longer available online. According to my files, Entropy Soft has more than 40 connectors. These include:

  • Microsoft SharePoint
  • IBM Lotus Quickplace
  • Hummingbird DM
  • Alfresco
  • FileNet
  • Interwoven
  • EMC Documentum

Update: A list of Oracle connectors is at http://www.oracle.com/technetwork/search/oses/ses-connectors-178226.pdf

Stephen E Arnold, February 7, 2013
