New Tool Integrates with Text Analytics

March 21, 2013

Language and analytics are coming together to start a new trend. According to the DestinationCRM.com article “New SDL Machine Translation Tool Integrates with Text Analytics,” SDL has announced that its machine translation tool can now be integrated with text analytics solutions. SDL BeGlobal can translate both structured and unstructured information across more than 80 different language combinations. The translated information is then analyzed using text analytics solutions, giving users the ability to access global customer insights as well as important business trends. Jean-Francois Damais, Deputy Managing Director of loyalty global clients solutions at Ipsos, had the following to say regarding SDL BeGlobal:

“With the growth in global business and the accessibility of online information, we now have a much greater need to access and analyze data from multiple languages. As a company focused on innovation and dedicated to our clients’ successes, we deployed SDL BeGlobal machine translation to further improve our research insights and bring new value to our customers.”

SDL BeGlobal has already caught on in the text analytics industry, and several well-known companies have jumped on the bandwagon. Raytheon BBN Technologies currently uses the technology for broadcast and Web content monitoring, and Expert System uses it for semantic intelligence. Language and analytics are two things not normally thought of together, but it seems SDL BeGlobal has a good thing going. Only time will tell whether this new friendship between language and analytics will stand the test of time.

April Holmes, March 21, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Semantria Adds Value to Unstructured Data With Sentiment Analysis

March 19, 2013

We are constantly on the lookout for movers and shakers in the area of text analysis and sentiment analysis. So I was intrigued when I recently came across the Web site of Semantria, a company claiming its software makes text and sentiment analysis fast and easy. With claims of simplified costs and high-value data capture, I had to research further.

The company was founded in 2011 as a software-as-a-service and services company specializing in cloud-based text and sentiment analysis. The team boasts a foundation from text analytics provider Lexalytics, software development firm Postindustria, and demand generation consultancy DemandGen.

The company page describes how its software can give insight into unstructured content:

“Semantria’s API helps organizations to extract meaning from large amounts of unstructured text. The value of the content can only be accessed if you see the trends and sentiments that are hidden within. Add sophisticated text analytics and sentiment analysis to your application: turn your unstructured content into actionable data.”

The Semantria API is powered by the Lexalytics Salience 5 analytics engine and is fully REST compliant. A processing demo is available at https://semantria.com/demo. We think it is well worth a look.

Andrea Hayden, March 19, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Oracle Rolls Out Text Index Strategy

March 7, 2013

Oracle’s support of locally partitioned indexes has created a need for users to be able to split those indexes and rebuild them in a timely manner. How do you rebuild an index without making your application unavailable for the entire time?

Prsync’s look into the maintenance disadvantages, and Oracle’s subsequent problem solving, in “Partition Maintenance and Oracle Text Indexes” gives us a look at something new: the “Without Validation” and “Split Partition” features. These options offer a way to rebuild indexes without first checking each row.

“That solves the problem, but it’s rather heavy-handed. So instead we need to institute some kind of “change management”. There are doubtless several ways to achieve this, but I’ve done it by creating triggers which monitor any updates or inserts on the base table, and copy them to a temporary “staging” table. These transactions can then be copied back to the main table after the partition split or merge is complete, and the index sync’d in the normal way.”

So there is a solution. But because Without Validation avoids the need for the system to check every partition key value to make sure each row is going to the correct partition, extra care is needed when using the feature.
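For readers who want to see the shape of the technique, here is a minimal sketch using generic Oracle partition maintenance syntax; the sales table and partition names are hypothetical, and this is not necessarily the article’s exact recipe:

-- Create an empty staging table matching the partitioned table's structure.
CREATE TABLE sales_stage AS
  SELECT * FROM sales WHERE 1 = 0;

-- (Load or stage rows in sales_stage here.)

-- Swap the staged rows in for the partition in one dictionary operation.
-- WITHOUT VALIDATION tells Oracle to skip checking that every row's
-- partition key actually belongs in p_2013_q1: fast, but the rows must
-- already be correct, which is why the extra care mentioned above matters.
ALTER TABLE sales
  EXCHANGE PARTITION p_2013_q1 WITH TABLE sales_stage
  INCLUDING INDEXES WITHOUT VALIDATION;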

It’s a long-needed saving grace that will save time and, ultimately, money by getting applications back up and running more efficiently, but there is no substitute for attention to detail. For a more in-depth look at the process, we suggest heading over to Prsync.

Leslie Radcliff,  March 07, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Come Here, Watson. I Want a Cusp of Commercialization

February 28, 2013

For a moment, I thought I was reading a sitcom script. You judge for yourself. Navigate to “And Now, from IBM, It’s Chef Watson.” If you have an environmentally unfriendly version of the New York Times, you can find the script—sorry, real news story—on page B1 of the February 28, 2013, edition.

Let me highlight several phrases and sentences which I found amusing and somewhat troubling for those trying to convince people to license next generation search systems. Keep in mind that the point of the story is Watson, IBM’s next generation, Jeopardy-winning search system. The peripatetic Watson has done education, insurance, and cancer cracking. Now Watson, with its formidable technical amalgamation of open source and proprietary code, is prepping for the Food Network.

IBM’s Watson is hunting for revenues and finding publicity. Can a $100 billion entity find money in search, content processing, and analytics with a silicon Watson? Someday, perhaps.

Here are the items I noted, highlighted in dark red and bold to make the words easy to spot:

First, this phrase: “…tries to expand its [IBM’s] artificial intelligence technology and turn Watson into something that actually makes commercial sense.” Reading this statement in the context of Hewlett Packard’s interesting commercial activities related to the write-down of the spectacular $11 billion purchase of Autonomy is rife with irony, probably unintentional, too.

Second, I found the phrase “on the cusp of commercialization.” Interesting. The Jeopardy show aired in early 2011. A “cusp,” according to one of the online dictionaries, is “a transitional point or time, as between two astrological signs.” Yep, I believe in astrology.


Exclusive Interview: Tom Reamy, KAPS Group

February 27, 2013


Another Palantir Push: But Little Hard Financial Data. Why Not?

February 23, 2013

I was reading about the TED Conference’s yo-yo presentation. My eye drifted across an expanse of cellulose and landed on “The Humane Way to Crack Terrorists.” (This link will go dead, so be aware that you may have to pay to read the item online.) The subtitle was one of those News Corp. Google things: “Big data may make enhanced interrogation obsolete.” The source? Some minor blog from America’s hinterland, Silicon Valley? Nope. The Wall Street Journal, February 23, 2013, page C12.

What’s the subject – really? The answer, in my opinion: Palantir. If you monitor the flagship, traditional media, Palantir has a solid track record of getting written about in print magazines. I suppose that the folks who have pumped about $150 million into the “big data” company read those magazines and the Wall Street Journal-type publications each day. I know I do, and I am an addled goose in rural Kentucky, the high tech nerve center of the new industrial revolution. After February 28, 2013, I am not sure about the economy, however.

Here’s the passage I noted:

There’s a tellingly brief passage in “The Finish: The Killing of Osama bin Laden” by Mark Bowden. “The hunt for bin Laden and others eventually drew on an unfathomably rich database,” he writes. “Sifting through it required software capable of ranging deep and fast and with keen discernment—a problem the government itself proved less effective at solving than were teams of young software engineers in Silicon Valley. A startup called Palantir, for instance, came up with a program that elegantly accomplished what TIA [Terrorism Information Awareness program, set up in 2002] had set out to do.” When I met the chief executive and co-founder of Palantir, Alex Karp, recently, he was straightforward: “It is my personal belief that flawless data integration at any kind of scale, with a rigorous access control model, allows analysts to perform operations that are only intrusive on the data. They are not intrusive on human beings.” Obviously, Palantir doesn’t comment on classified work. But its technological phalanx—processing countless leads, from flight manifests to tapped phone calls, into one resource for people to interpret—is known to have been key in locating bin Laden. The company, founded in 2004, has large contracts across the intelligence community and is enterprise-wide at the FBI. Its first client was the CIA.

Nifty stuff. Palantir has high profile clients like intelligence and law enforcement outfits. But where is a hedge fund or a consumer products company? Allegedly the fancy math technology can work wonders. The implication is that outfits like Digital Reasoning, Recorded Future, and even Tibco are not in Palantir’s league. Oh, really? What about outfits like IBM and Oracle and SAS? Nah. Palantir seems to be where the good stuff happens in the context of this Wall Street Journal article.

In my view, the write up triggered several notes on my ubiquitous 4×6 paper note cards, just like the ones I used in high school debate competitions:

First, what about that legal dust-up with i2 Group? Here’s a link to refresh one’s memory. I recall that there was also some disagreement, a few real media stories, and then a settlement regarding sector leader i2. Note: I did some work years ago for this outfit, which is now owned by IBM. Oh, and after the settlement, silence. Just what was that legal dispute about anyway? The Wall Street Journal story does not touch on that obviously trivial issue related to the legal matter. Why not? The space in the newspaper was probably needed to cover the yo-yo guy.

Second, can software emulate the motion picture approach to reality? In my experience, numerical recipes can be useful, but they can also provide some points which are subject to contention. A recent example is the gentleman’s disagreement about an electric vehicle. Data, analyses, and interpretations—muddled. Not like the motion pictures’ tidiness and quite final end point. “The end” solves a lot of fictional problems. Life is less clear, a lot less clear in my experience.

Third, how is Palantir doing as a business? After all, the story ran in the Wall Street Journal, which is about business. I appreciate the references to a motion picture, but I am curious about how Palantir is doing on its march to generate a billion or more in revenues. At some point, the investors are going to look at the money pumped into Palantir, the time spent developing the magical technology which warrants metaphorical juxtaposition to Hollywood outputs, and the profitability of the company’s sales. Why doesn’t the Wall Street Journal do the business thing? Revenue, commercial customers, and case studies which do not flaunt words which Bing and Google love to consume in their indexing systems?

It is Saturday, and I suppose there are lots of 20-somethings working at 0900 Eastern as I write this. They will fill the gap. I will have to wait. I wonder if the predictive algorithms from Palantir can tell me how long before hard facts become available?

One final question: If this Palantir type of system worked, why aren’t the firms in this Palantir-type software sector dominating in financial services, marketing, and consumer products? I wonder if the reason is that fancy math generates high expectations and then creates some situations in which reality does not work just like a cinema thriller?

Stephen E Arnold, February 23, 2013

Oracle Text Workaround for Stop Words List

February 22, 2013

We’ve come across a discussion about Oracle Text at StackOverflow, “Oracle Text Search Doesn’t Work on Some Words.” Essentially, some words cannot initially be indexed, and the fix is to go in and remove those words from the stop words list. Interesting.

The question-and-answer site for programmers received this query:

“I am using Oracle’s Text Search for my project. I created a ctxsys.context index on my column and inserted one entry ‘Would you like some wine???’. I executed the query ‘select guid, text, score(10) from triplet where contains (text, ‘Would’, 10) > 0’; it gave me no results. Querying ‘you’ and ‘some’ also returns zero results. Only ‘like’ and ‘wine’ match the record. Does Oracle consider you, would, some as stop words?? How can I let Oracle match these words? Thank you.”

The top response reveals:

“I found that the query’s output is perfect according to the stop word lists that is in the oracle.

those words can be found in the ctxsys package, and you could query for the stoplist and the stop words using “SELECT * FROM CTX_STOPLISTS; SELECT * FROM ctx_stopwords;” and yes, the oracle consider ‘you’, ‘would’ in your query as stop words.”

The solution: remove the offending stop words. First run “GRANT EXECUTE ON CTXSYS.CTX_DDL to you,” then call the desired procedure. See the link for an example.
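As a minimal sketch of what that looks like in practice (the index name triplet_idx and stoplist name my_stoplist are hypothetical; the CTX_DDL calls and views are standard Oracle Text), one approach is to define a custom stoplist that omits the offending words and rebuild the index against it:

-- See which stoplists and stop words are currently defined.
SELECT * FROM CTX_STOPLISTS;
SELECT * FROM CTX_STOPWORDS;

-- Build a custom stoplist containing only the words you still want
-- excluded; 'would', 'you', and 'some' are simply left out of it.
BEGIN
  CTX_DDL.CREATE_STOPLIST('my_stoplist', 'BASIC_STOPLIST');
  CTX_DDL.ADD_STOPWORD('my_stoplist', 'the');
END;
/

-- Recreate the index against the new stoplist so the formerly
-- stopped words become searchable.
DROP INDEX triplet_idx;
CREATE INDEX triplet_idx ON triplet(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('stoplist my_stoplist');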

Cynthia Murrell, February 22, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

IBM and Price Cuts: Is Watson a Factor?

February 17, 2013

I read “IBM Cuts Price of Watson Based Power Servers.” I have no clue if the story is correct, half correct, or incorrect. What’s important is that CIOL.com thought the notion of a Watson-related price cut newsworthy.

The Power7-based servers were hot stuff several years ago. CPU performance is no longer the gating factor it was in the days of STAIRS III. Input/output, memory subsystems, and various types of latency make a system fast or not. Heck, careless programming can make Google’s zippy boxes howl with pain when their innards suffer a computational cramp.

The write up asserts:

IBM will roll out eight new Power Systems for entry level starting at $5,947. The new systems include Power Express 710, 720, 730 and 740 family of products…. IBM will also introduce two new PowerLinux Systems – 7R1 and 7R2 – optimized for IBM InfoSphere BigInsights and InfoSphere Streams big data analytics software. The company will also introduce two new Power Systems – 750 and 760 – for midsized and large enterprises.

The hot item in the story, in my opinion, is this reference:

The new systems are based on IBM’s Watson system and are powered by its Power7+ microprocessor technology. These will enable users to build and deploy infrastructure for private and hybrid clouds, as per a release.

The write up includes the now obligatory baloney about big data, cloud, and caching tactics for performance.

If the story is incorrect, no big deal. Any publicity is good, even for a dog movie like “Heaven’s Gate” and its expensive roller skates. If the story is half correct, why is Watson making an appearance in juxtaposition to “entry level”? Is the vaunted Jeopardy-winning technology not generating sufficient revenue to pay back the development time and the sunk marketing costs? If the story is correct, I am interested in the fact that high end information technology has to be bundled at lower prices.

Years ago, I was told by an informed person that IBM knew what it was doing when it came to search and information retrieval. Maybe the company will come to dominate the enterprise market for big data, analytics, and smarter search. On the other hand, hasn’t IBM travelled this road before? And yet the journey continues.

Stay tuned to Jeopardy or monitor the cancer-related news stream. Watson is with us, along with a Power7 chip which may be experiencing some symptoms of rheumatism.

Stephen E Arnold, February 17, 2013

Sinequa France: Update 2013

February 14, 2013

My research team was winnowing our archive of information about European search vendors. Since Martin White’s article for eContent in 2011, a number of changes have swept through the search and content processing sector. Some changes were significant; for example, HP’s stunning acquisition of Autonomy. Others were more modest; for example, the steady progress of such companies as Sinequa and Spotter, among others.

The European technical grip on search is getting stronger. Google is the dominant player in Web search. But in enterprise content processing, some European firms are moving more rapidly than their North American or Pacific Rim counterparts.

The Sinequa tag cloud. See http://www.sinequa.com/en/page/solutions/category-1.aspx

One interesting example is Sinequa, based in Paris. The company, like other French technology firms, has a staff of capable engineers and managers. However, unlike some other companies, Sinequa has continued to establish a track record as a company innovating in technology and capturing some important accounts; for example, Siemens, the German industrial powerhouse.

Sinequa’s approach is to emphasize that enterprise search has moved to unified information access. A number of companies make similar claims. Sinequa has established that its technology can deliver the type of one-stop access to structured and unstructured content that almost every vendor claims to deliver. You can get a useful overview of the architecture of the Sinequa platform at http://www.sinequa.com/en/page/product/product.aspx.

A relatively recent addition to the Sinequa.com Web site is a set of case analysis videos. I find case examples extremely useful. The presentation of this type of information in rich media format makes it easier for me to get a sense of the value of the solution a vendor delivers. I found the Mercer video particularly interesting. You can find these testimonials at http://www.sinequa.com/en/page/clients/clients-video.aspx.

The trajectory of European search, content processing, and analytics vendors is difficult to plot in today’s uncertain economic climate. Sinequa warrants a close look for organizations seeking an integrated approach to their content assets. For more information about Sinequa’s current activities, tap into the firm’s blog at http://blog.sinequa.com/.

Stephen E Arnold, February 14, 2013

Sponsored by EMRxNow, the information service which tracks automated indexing of electronic medical records

Change Comes to Attensity

February 14, 2013

Just as the demand for analytics is ascending, Attensity makes a management change. We learn the company recently named J. Kirsten Bay their head honcho in “Attensity Names New President/CEO,” posted at Destination CRM. The press release stresses the new CEO’s considerable credentials:

“Bay brings to Attensity nearly 20 years of strategic process and organizational policy experience derived from the information management, finance, and consumer product industries. She is an expert in advising both the public and private sector on the development of econometric policy models. Most recently, as vice president of commercial business with iSIGHT Partners, Bay provided strategic counsel to Fortune 500 companies on managing intelligence requirements and implementing customer and development programs to integrate intelligence into decision programs.”

The company’s flagship product, Attensity Pipeline, collects and semantically annotates data from social media and other online sources. From there, the data passes to Attensity Analyze for text analytics and customer engagement suggestions.

Headquartered in Palo Alto, California, the folks at Attensity pride themselves on the accuracy of their analytic engines and their intuitive reports. Rooted in its development of tools that serve the intelligence community, the company now provides semantic solutions to many Global 2000 companies and government agencies.

Cynthia Murrell, February 14, 2013

Sponsored by ArnoldIT.com, developer of Augmentext
