Oracle Text Circa 2005, Still Relevant?

October 8, 2011

Oracle Text, formerly known as interMedia Text, is Oracle’s enterprise software that uses SQL to search, index, and analyze information stored in the database, in files, and on the web.
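
The phrase "uses SQL" is literal: an administrator builds a CONTEXT index on a text column, and applications query it with CONTAINS(). Below is a minimal sketch using the cx_Oracle driver; the table, column, and connection details are hypothetical, though the CONTAINS/SCORE idiom is standard Oracle Text:

# Minimal Oracle Text query sketch via cx_Oracle.
# Assumes a CONTEXT index already exists on docs.doc_text, e.g.:
#   CREATE INDEX doc_text_idx ON docs (doc_text)
#     INDEXTYPE IS CTXSYS.CONTEXT;
# Table, column, and connection names are hypothetical.
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "localhost/XE")
cur = conn.cursor()

# CONTAINS() performs the full-text match; SCORE(1) returns the
# relevance score for the CONTAINS clause labeled 1.
cur.execute(
    """SELECT doc_id, SCORE(1)
         FROM docs
        WHERE CONTAINS(doc_text, :term, 1) > 0
        ORDER BY SCORE(1) DESC""",
    term="text mining")
for doc_id, score in cur:
    print(doc_id, score)

cur.close()
conn.close()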

Oracle's Web site provides a plethora of links to technical information related to this product.

The most basic is the FAQ section which, for those folks who may not know the intricacies of what Oracle means by 'Document Services,' expands on topics such as how this feature simplifies application development.

For the seasoned user, they provide a section on Oracle Text in Oracle Database 11g which links to a PDF describing all the new features. However, both the PDF and the accompanying white paper date from 2007.

In terms of the actual technical content, Oracle does provide comprehensive version-by-version overviews, from the previous Oracle Text 10g all the way back to interMedia Text 8.1.5. By prefacing the page with a disclaimer that these summaries are not for the newbie, Oracle has definitely considered its audience.

Despite the fact that new versions are released, Oracle recognizes the relevance of maintaining information about previous versions—not everyone wants to keep up with the Joneses of text mining. The XML Features write-up from the version 9 release is still ready and available for reading.

A draw of the site for those in this field is the section devoted to selected papers and presentations from Oracle. They offer quite a few, ranging from PowerPoint decks on Text Mining with Oracle to a how-to guide for search enabling a Web site and even a primer on Text Retrieval Quality.

Additionally, they link to multiple customer presentations and case studies, such as the ones on Motorola and the World Bank. The main drawback is that these are mostly from the early years of the last decade. Updates would be much appreciated from a company that Forbes reported as "Heading to $40 with Strength in Software."

Megan Feil October 8, 2011

 

Inteltrax: Top Stories, September 26 to September 30

October 3, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how some of the biggest names in the business are underwhelming us lately and need to do better.

One such story was “Microstrategy Not the King of Cloud BI,” http://inteltrax.com/?p=2471 which discussed how one very fine business intelligence operation is failing at becoming something it is not—a cloud BI hotspot.

Similarly, the story “Teradata Doesn’t Have the Power to be Dominant” http://inteltrax.com/?p=2435 showcases how another smart analytics firm is stretching itself too thin by trying to become everything to everyone instead of focusing on the core things it does exceedingly well.

Our feature story, "Pentaho and SizeUp Lean Toward Free Analytics," http://inteltrax.com/?p=2616 was rich with successes for analytic software providers, but it also cataloged how customers working exclusively with freeware programs from top BI names may come to regret the choice.

We’re keeping our eye on the top names in business intelligence and data analytics and playing watchdog in the process. Even excellent companies make mistakes and we’ll be here to warn consumers and slap said companies on the wrists with cutting commentary and insight every day.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, October 3, 2011

Sponsored by Pandia.com

Observations about Content Shaping

October 3, 2011

Writer's Note: Stephen E Arnold can be surprising. He asked me to review the text of his keynote speech at the ISS World Americas October 2011 conference, which is described as "America's premier intelligence gathering and high technology criminal investigation conference." Mr. Arnold has moved from government work to a life of semi-retirement in Harrod's Creek. I am one of the 20-somethings against whom he rails in his Web log posts and columns. Nevertheless, he continues to rely on my editorial skills, and I have to admit I find his approach to topics interesting and thought provoking. He asked me to summarize his keynote, which I attempted to do. If you have questions about the issues he addresses, he has asked me to invite you to write him at seaky2000 at yahoo dot com. Prepare to find a different approach to the content mechanisms he touches upon. (Yes, you can believe this write-up.) If you want to register, point your browser at www.issworldtraining.com.— Andrea Hayden

The manipulation of research results is not new to the era of the Internet. Individuals have manipulated information in record keeping and research for ages. People want to (and can) affect how and what information is presented. Information can also be manipulated not just by people, but by the accidents of numerical recipes.

However, even though this is not a new issue, information manipulation in this age is far more frequent than many believe, and the information we are trying to gather is far more accessible. I want to answer the question, "What do information analysts need to know about this interesting variant of disinformation?"

The volume of data in a digital environment means that algorithms, or numerical recipes, must do the work of processing content in digital form. The search and content processing vendors can acquire as much or as little content as the system administrator wishes.


In addition, most people don't know that all of the leading search engines specify what content to acquire, how much content to process, and when to look for new content. This is where search engine optimization comes in. Boosting a ranking in a search result is believed to be an important factor for many projects, businesses, and agencies.
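
Those three acquisition decisions (what to fetch, how much to process, and when to revisit) are, in practice, knobs a system administrator sets. A hypothetical configuration sketch; the keys and values are illustrative, not any vendor's actual settings:

# Hypothetical crawl configuration illustrating the three knobs a
# search system exposes. All names and values are illustrative only.
CRAWL_CONFIG = {
    "seeds": ["http://www.example.com/"],                # what content to acquire
    "include_patterns": [r"^http://www\.example\.com/news/"],
    "exclude_patterns": [r"\.pdf$"],
    "max_documents": 100_000,                            # how much content to process
    "max_depth": 4,
    "recrawl_interval_hours": 24,                        # when to look for new content
}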

Intelligence professionals should realize that conforming to the Webmaster guidelines set forth by Web indexing services will result in a grade much like the scoring of an essay against a set rubric. Documents should conform to these guidelines to earn a higher search result ranking. This works because most researchers rely on the relevance ranking to provide the starting point for research. Well-written content which conforms to the guidelines will then frame the research on what is or is not important. Such content can be shaped in a number of ways.
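
The essay-rubric analogy can be made concrete with a toy scorer that awards points for guideline-style signals. This is only an illustration of the analogy; no real engine ranks this way:

# Toy rubric scorer: each guideline-style check contributes points,
# mimicking how guideline-conformant pages earn better rankings.
# Purely illustrative; not any engine's actual ranking function.
def rubric_score(doc):
    score = 0
    if doc.get("title"):                          # descriptive title present
        score += 2
    if len(doc.get("description", "")) >= 50:     # meta description filled in
        score += 1
    if doc.get("headings"):                       # structured with headings
        score += 1
    if len(doc.get("body", "").split()) >= 300:   # substantive body text
        score += 2
    return score

page = {"title": "Text Mining Basics", "description": "A short primer...",
        "headings": ["Overview"], "body": "word " * 400}
print(rubric_score(page))  # higher score ~ better guideline conformance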

Read more

Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering information required by organizations and produced within the organization by its staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system, based on our research results, is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available, but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization, employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead; the other vendors' systems were not able to match this performance in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.
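
For readers wondering how such latency figures are documented, one common approach is a freshness probe: publish a uniquely tagged test document and poll the index until it turns up. A minimal sketch; the publish and search callables stand in for whatever repository and search APIs an organization actually runs:

# Minimal freshness probe: write a uniquely tagged test document,
# then poll the search index until it becomes findable. The elapsed
# time is the end-to-end index latency. `publish` and `search` are
# hypothetical stand-ins for the real repository and search APIs.
import time
import uuid

def measure_index_latency(publish, search, poll_seconds=30):
    tag = f"latency-probe-{uuid.uuid4()}"
    publish(title=tag, body=f"probe document {tag}")
    start = time.time()
    while not search(tag):           # poll until the probe is findable
        time.sleep(poll_seconds)
    return time.time() - start       # seconds from publish to searchable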


Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a "hard problem," so "easy problems" are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization "goes outside" for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common formats such as Lotus Notes. However, when one of the targeted sources such as a commercial news service or a third-party research firm makes a change, the content acquisition system cannot acquire content until the connectors are "fixed". No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.
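
The fragility follows from how connectors are built: each one hard-codes assumptions about a source's format, so a source-side change breaks acquisition until the connector is repaired. A schematic sketch with hypothetical names:

# Schematic connector interface: fetch raw content from a source and
# parse it into indexable records. All names are hypothetical.
from abc import ABC, abstractmethod
import urllib.request

class Connector(ABC):
    """Acquires content from one source and normalizes it for indexing."""

    @abstractmethod
    def fetch(self) -> bytes:
        """Pull raw content from the source system."""

    @abstractmethod
    def parse(self, raw: bytes) -> list:
        """Turn raw content into indexable records."""

class NewsFeedConnector(Connector):
    """Hypothetical connector for a line-oriented headline feed."""

    def __init__(self, url):
        self.url = url

    def fetch(self) -> bytes:
        return urllib.request.urlopen(self.url).read()

    def parse(self, raw: bytes) -> list:
        # Assumes one headline per line. If the publisher changes the
        # format, this assumption silently breaks and content stops
        # flowing until the connector is "fixed".
        return [{"title": line}
                for line in raw.decode("utf-8").splitlines() if line]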

Read more

Inteltrax: Top Stories, September 12 to September 16

September 19, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically in the world of big data and business intelligence.

Our flagship story this week was the feature, "Solving Big Data's Problems Stirs Controversy," http://inteltrax.com/?p=2522 that gave a deeper look at how quickly our online data is piling up and whether all the talk of harnessing its real power is just that: talk.

Another big data tale, "Big Data Skeptics Still Lingering" http://inteltrax.com/?p=2350 illuminated how much in its infancy the analytics industry really is and had a little fun at the expense of ourselves and other industry insiders.

Finally, we took another look at the growing world of online data with, “Data Analytics Needs More Specialization, not Less,” http://inteltrax.com/?p=2357 and discovered niches might just be the solution to all the analytic nightmares out there.

The theme for this week seems to have been the mounting concern over data buildup. We can't stop it; we know that. And, thankfully, it looks like we'll be able to do some fascinating stuff with it—though not everyone agrees. You can bet, as innovations and setbacks happen along this road, we'll be watching it closely.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, September 19, 2011

Oracle Data Mining Update

September 5, 2011

The new Oracle Data Mining release is generating buzz, including a piece by James Taylor entitled "First Look – Oracle Data Mining Update." Oracle Data Mining (ODM) is an in-database data mining and predictive analytics engine, which allows for the building of predictive models. Taylor highlights the features added in the latest version:

The fundamental architecture has not changed, of course. ODM remains a “database-out” solution surfaced through SQL and PL-SQL APIs and executing in the database. It has the 12 algorithms and 50+ statistical functions I discussed before and model building and scoring are both done in-database. Oracle Text functions are integrated to allow text mining algorithms to take advantage of them. Additionally, because ODM mines star schema data it can handle an unlimited number of input attributes, transactional data and unstructured data such as CLOBs, tables or views.
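
Taylor's "database-out" phrase means the model is applied with ordinary SQL rather than by exporting data to a separate engine. A hedged sketch of what scoring looks like via cx_Oracle; the CHURN_MODEL classifier and customers table are hypothetical, while PREDICTION and PREDICTION_PROBABILITY are Oracle's SQL interface to in-database models:

# Sketch of in-database scoring with Oracle Data Mining: a
# hypothetical classifier named CHURN_MODEL is applied with the SQL
# PREDICTION functions, so no data leaves the database.
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "localhost/XE")
cur = conn.cursor()

cur.execute(
    """SELECT cust_id,
              PREDICTION(churn_model USING *)             AS predicted_class,
              PREDICTION_PROBABILITY(churn_model USING *) AS confidence
         FROM customers""")
for cust_id, predicted, confidence in cur.fetchmany(10):
    print(cust_id, predicted, round(confidence, 3))

conn.close()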

The ability of ODM to build and execute analytic models completely in-database is a real plus in the market. The software would be a good candidate for anyone interested in using predictive analytics to take advantage of their operational data.

Emily Rae Aldridge, September 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Statisticians Weigh In on Big Data

September 5, 2011

The Joint Statistical Meetings, the largest assembly of data scientists in North America, provided fertile ground this summer for a survey by Revolution Analytics on the state of Big Data technologies. Revolution Analytics presents the results in “97 Percent of Data Scientists Say ‘Big Data’ Technology Solutions Need Improvement.”

As the headline suggests, the vast majority of these experts crave improvement in the field:

The survey revealed nearly 97 percent of data scientists believe big data technology solutions need improvement and the top three obstacles data scientists foresee when running analytics on Big Data are: complexity of big data solutions; difficulty of applying valid statistical models to the data; and having limited insight into the meaning of the data.

Results also show a lack of consensus on the definition of “Big Data.” Is the threshold a terabyte? Petabyte? Or does it vary by the job? No accepted standard exists.

Survey-takers were asked about their future use of existing analytics platforms: SPSS, SAS, R, S+, and MATLAB. Most respondents expected to increase use of only one of these, the open source R project (a.k.a. GNU S).

Revolution Analytics bases their data management software and services on the R project. The company also sponsors Inside-R.org, a resource for the R project community. I’d have to see the survey to know whether the emphasis they found on R was skewed, but let’s give them the benefit of the doubt for now.

Cynthia Murrell, September 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Attensity: to Tweet or Not to Tweet, That Is The Question

September 1, 2011

Social media seems to be the solution to everyone's problems these days. Even if your tweets don't actually solve any issues, at least you can get something off your chest. Contact Center Solutions Community reported on how companies can take back the upper hand in the article "Customer Service Trends: Monitoring and Responding to Social Media Conversations."

While consumers see their status updates as mere complaints or topics of conversation amongst Facebook friends, Attensity sees them as unstructured data from which it can help other companies extract insights and eventually act based on the analysis.

The article taught us the following about the inner workings of Attensity's Analyze and Respond solutions:

This is done through text analytics capable of feats like analyzing the entire Twitter “fire hose” (fed into the Attensity system as an API) in real time. Analyze 6, Attensity’s latest release, includes a feature called ‘hot spotting,’ which identifies trending conversations as they’re happening, tracks “normal” volume, and alerts companies when that volume goes hot or cold.
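
The "hot spotting" mechanic (track normal volume, alert when it deviates) can be sketched as a rolling baseline with a deviation threshold. This illustrates the idea only; it is not Attensity's implementation:

# Toy "hot spotting": keep a rolling baseline of per-interval message
# volume for a topic and flag intervals that run unusually hot or
# cold. Illustrative only; not Attensity's actual algorithm.
from collections import deque

def hot_spot(volumes, window=24, threshold=2.0):
    baseline = deque(maxlen=window)
    for count in volumes:
        if len(baseline) == baseline.maxlen:
            mean = sum(baseline) / len(baseline)
            if count > threshold * mean:
                yield ("hot", count, mean)
            elif count < mean / threshold:
                yield ("cold", count, mean)
        baseline.append(count)

# Hourly mention counts for a topic; the spike at the end goes "hot".
counts = [100] * 30 + [420]
print(list(hot_spot(counts)))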

What happens when negative tweets about the company that is trying to prevent complaints on social media start infiltrating the "fire hose"? Our view is that the "fire hose" is looking more and more like a stream that only a handful of companies can make available and process.

Maybe Nathan Wehner knows?

Megan Feil, September 1, 2011

Sponsored by Pandia.com

Inteltrax: Top Stories, August 8 to August 12, 2011

August 15, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically how the legal world is impacted by data analytics.

One of our most popular entries this week was "Legal Marketplace Filled with Analytic Options," a quick look at all the data mining tools available to lawyers.

Another hot topic was our article, "Zettaset and Others Cashing in on Forensics," proving forensic science has undoubtedly been aided by predictive analytics in ways CSI could only dream of.

In addition, our story, "Facial Recognition a Boon for Facebook and a Threat for SSNs," detailed how legal tools, like facial recognition software, can backfire, causing a serious breach in security.

For the most part, we feel the legal world is aided in amazing ways by big data management systems. From the courtroom to the police station, people are utilizing these tools. But with any strong advance in technology, there is always a risk of misuse. We’ll be following these trends and others to watch this fascinating corner of the industry unfold.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, August 15, 2011

Sponsored by Digital Reasoning, developers of the next-generation analytics platform, Synthesys.

Are Text Analytics Companies Learning the Silicon Valley Way?

August 11, 2011

Seth Grimes, founding chair for the Text Analytics Summit, interviewed three experts in order to find out what it is that Silicon Valley and the world of text analytics have in common. The full interview, “What Can Text Analytics and Silicon Valley Learn From Each Other?” can be found at Text Analytics News.

Grimes reports, “Business markets are global, yet the Bay Area stands out as a source and consumer of innovative technologies and in particular, as a pace-setter for the online and social worlds. With the Text Analytics Summit coming to San Jose, I reached out to a few west-coasters who are making Valley text analytics news: Nitin Indurkhya, principal research scientist at eBay Research Labs; YY Lee, COO of FirstRain; and Michael Osofsky, co-founder and chief innovation officer at NetBase.”

Osofsky explains the balance between precision and recall in text analytics, and urges Silicon Valley to understand that time and energy should be devoted to experimenting to find a balance between the two principles. On the other hand, Silicon Valley’s fast and exciting nature could be a good influence on the text analytics world. Software can be launched, edited, and evolved quickly and risks can be taken. Absorbing a bit of that mentality could enable text analytics to be a little more innovative and adventurous.
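
For readers outside the field: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that get returned. A tiny worked example shows why tuning one tends to cost the other:

# Precision vs. recall on a toy retrieval result.
retrieved = {"d1", "d2", "d3", "d4"}         # what the system returned
relevant  = {"d1", "d2", "d7", "d8", "d9"}   # what it should have returned

hits = retrieved & relevant                  # relevant documents returned
precision = len(hits) / len(retrieved)       # 2/4 = 0.50
recall    = len(hits) / len(relevant)        # 2/5 = 0.40
print(precision, recall)

# Returning more documents usually raises recall but lowers precision,
# and vice versa -- the balance Osofsky describes.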

Indurkhya encourages the text analytics world to adopt the Valley principle of "fail often and fail quickly." In this way, he explains, innovation happens and failure does not bog down the overall momentum.

Lee encourages text analytics companies to focus separately on each of three equally important components: 1) input, 2) internal process, and 3) presentation. Each falls broadly under text analytics, and yet Lee stresses each must be treated independently during development.

Grimes concludes with his own collective thoughts on the three interviews.

The key takeaways that I see in these responses involve problem and product focus, agility, and the desirability of pulling and integrating information from multiple sources with the application of a variety of analytical techniques, in order to achieve technical and business goals. There’s no “Do X, Y, and Z” formula here, but there is definitely a sense of the rewards that are possible if text analytics is done right.

Out-of-the-box thinking is beneficial in any business arena, but especially those known more for rigidity than innovation.

Emily Rae Aldridge, August 11, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search. And our own Stephen E Arnold is speaking at the November 2011 event.

The Text Analytics Summit has been a staple of the text analytics community for the past 7 years. To help this community grow, the Text Analytics Summit is finally coming to the west coast to foster new networking opportunities, promote more healthy knowledge sharing, and create strong, long-lasting business relationships. Text Analytics is essential for maximizing the customer experience, effectively monitoring the social media world, conducting first-class data analysis and research, and improving the business decision making process. Attend the summit to discover how to unlock the power of text analytics to leverage new and profitable business opportunities. Whether you’re interested in taking advantage of social media analytics, customer experience management, sentiment analysis, or Voice of the Customer, Text Analytics Summit West is the only place to get the inside information that you need to stay ahead of the competition and profit from text mining. For more information, click here.

 
