The Alleged Received Wisdom about Predictive Coding

June 19, 2012

Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have pages A-1 and A-10 of the June 18, 2012, newspaper. A variant of the story appears online as “Why Hire a Lawyer? Computers Are Cheaper.”

Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.

The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do at a higher cost and sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see, that was more than a couple of weeks ago that basic discovery functions became available.

The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.

Which grows faster: the costs of a legal matter or the costs of a legal matter that requires automation plus trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger.

The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.

My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.

First, the driver for most content processing is related to two quite human needs. One, the costs of coping with large volumes of information are high and going up fast. Two, the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor kicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?
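Under the hood, “predictive coding” is essentially supervised document classification: attorneys label a small seed set of documents as responsive or not, and the system scores the remaining corpus to prioritize review. Here is a minimal sketch of that idea, a toy naive Bayes scorer on hypothetical documents; it is not any vendor’s actual method, and real e-discovery systems layer sampling protocols and validation on top:

```python
import math
from collections import Counter

def train(labeled_docs):
    """labeled_docs: list of (text, label) pairs; label is 'responsive' or 'not'."""
    counts = {"responsive": Counter(), "not": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts[label].update(text.lower().split())
    return counts, doc_counts

def score(text, counts, doc_counts):
    """Log-odds that a document is responsive, with add-one smoothing."""
    vocab = set(counts["responsive"]) | set(counts["not"])
    total = {c: sum(counts[c].values()) for c in counts}
    log_odds = math.log(doc_counts["responsive"] / doc_counts["not"])
    for word in text.lower().split():
        p_r = (counts["responsive"][word] + 1) / (total["responsive"] + len(vocab))
        p_n = (counts["not"][word] + 1) / (total["not"] + len(vocab))
        log_odds += math.log(p_r / p_n)
    return log_odds  # > 0 suggests responsive; route to attorney review

# Hypothetical seed set labeled by attorneys.
seed = [
    ("contract breach damages invoice", "responsive"),
    ("merger due diligence breach", "responsive"),
    ("lunch menu friday picnic", "not"),
    ("holiday party rsvp", "not"),
]
model = train(seed)
print(score("invoice for breach damages", *model) > 0)  # True
```

The point of the sketch is the economics: once the seed set is labeled, scoring a million documents costs compute, not billable hours.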

Read more

More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use the outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov test. Has this non-parametric test been applied to the analytics system which marketers have presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
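For the curious, the two-sample Kolmogorov-Smirnov statistic is simply the largest vertical gap between two empirical cumulative distribution functions. A bare-bones sketch on toy data follows; a real analysis would use a vetted implementation such as scipy.stats.ks_2samp, which also supplies the p-value:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    distance between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sample, x):
        # Fraction of the sample at or below x.
        return sum(1 for v in sample if v <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Identical samples give 0; completely separated samples give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))     # 0.0
print(ks_statistic([1, 2, 3], [10, 20, 30]))  # 1.0
```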

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about which rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus post graduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.
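For readers to whom “Monte Carlo” means only a casino: these are methods that estimate a quantity by repeated random sampling, and neutron transport codes used in reactor analysis rely on them. The classic toy illustration is estimating pi, nothing nuclear about it:

```python
import random

def estimate_pi(trials, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the quarter circle, times 4."""
    rng = random.Random(seed)  # seeded for reproducibility
    hits = sum(1 for _ in range(trials)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4 * hits / trials

print(estimate_pi(100_000))  # roughly 3.14
```

The catch, and the article’s point, is that interpreting such estimates takes training: the error shrinks only with the square root of the number of trials, and someone has to know that.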

Why then would a person with zero knowledge of numerical recipes, oddball outputs from particular types of algorithms, and little or no experience with probability methods use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots are generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make it clear to users about methods, constraints, and considerations include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30 somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. These reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues makes many people expert in quite difficult systems and methods. In the mid 1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30 somethings who explain the advantages of analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media—the music is loud and getting louder. With so many firms jumping on the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is as easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes the challenge clear.

Stephen E Arnold, June 18, 2012

Inteltrax: Top Stories, June 11 to June 15

June 18, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how governments and the voting public are utilizing big data.

In “Government Leads Way in Big Data Training” we discovered the private sector lagging behind the government in terms of user education.

Our story, “U.S. Agencies Analytics Underused” showed that even though we have all that training, some agencies still need more to fully utilize this digital power.

“Cultural Opinion Predicted by Analytics” used the Eurovision song contest to show us the power of people using analytics and gave us a nugget of thought as to how this could be used in government elections.

While sometimes the outcomes contradict one another, there’s no denying that big data analytics plays a huge part in governments around the world. Expect this trend to only grow as its popularity catches fire.

Follow the Inteltrax news stream by visiting www.inteltrax.com


Patrick Roland, Editor, Inteltrax.

June 18, 2012

Inteltrax: Top Stories, June 4 to June 8

June 11, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how financial markets are being influenced and affected by big data analytics.

In “Venture Capitalists Invest in Cloud Based API Provider” we explore how tons of financial investments, namely in the cloud, are changing the game of big data.

In “UK Financial Industry Benefiting from Analytics” we discovered how England is attempting to avoid Eurozone financial catastrophe with analytics.

Finally, our feature, “Quantitative Financial Analytics is a Serious Weapon” dove headlong into this new buzzword and its impact on financial markets and the vendors supplying software.

With global markets plummeting or rising in equally shaky motions, analytics looks to be a potential stabilizing force. We’ll keep watching to see what kind of aid it can be.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax.

June 11, 2012

HP Autonomy: The Big Data Arabesque

June 5, 2012

Hewlett Packard has big plans for Autonomy. HP paid $10 billion for the search and content processing company last year. HP faces a number of challenges in its printer and ink business. The personal computer business is okay, but HP is without a strong revenue stream from mobile devices.

“HP Rolls Out Hadoop AppSystem Stack” provided some interesting information about Autonomy and big data. The write up focuses on the big data trend. In order to make sense out of large volumes of information, HP wants to build management software and integrate the “Vertica column oriented distributed database and the Autonomy Intelligent Data Operating Layer (IDOL) 10 stack.” The article reports:

On the Autonomy front, HP has announced the capability to put the IDOL 10 engine, which supports over 1,000 file types and connects to over 400 different kinds of data repositories, onto each node in a Hadoop cluster. So you can MapReduce the data and let Autonomy make use of it. For instance, you can use it to feed the Optimost Clickstream Analytics module for the Autonomy software, which also uses the Vertica data store for some parts of the data stream. HP is also rolling out its Vertica 6 data store, and the big new feature is the ability to run the open source R statistical analysis programming language in parallel on the nodes where Vertica is storing data in columnar format. More details on the new Vertica release were not available at press time, but Miller says that the idea is to provider connectors between Vertica, Hadoop, and Autonomy so all of the different platforms can share information.

HP’s idea blends a hot trend, HP’s range of hardware, HP’s system management software, a database, and Autonomy IDOL. In order to make this ensemble play in tune, HP will offer professional services.

InfoWorld’s “HP Extends Autonomy’s Big Data Chops to Hadoop Cloud” added some additional insight. I learned that former Autonomy boss Michael Lynch will leave HP “along with Autonomy’s entire original management team and 20 percent of its staff.”

The story then explained that Autonomy, which combines with Vertica:

can now be embedded in Hadoop nodes. From there, users can combine Idol’s 500-plus functions — including automatic categorization, clustering, and hyperlinking — to scour various sources of structured and unstructured data to glean deeper meanings and trends. Sources run the gamut, too, from structured data such as purchase history, services issues, and inventory records to unstructured Twitter streams, and even audio files. IDOL includes 400 connectors, which companies can use to get at external data.

Autonomy moved beyond search many years ago. This current transformation of Autonomy makes marketing sense. I am interested in monitoring this big data approach. IBM had a similar idea when it presented the Vivisimo clustering and deduplication system as a “big data” system. The challenge will be applying text-centric technology to ensembles which generate insights from “big data.”

Will the shift earn back the purchase price of $10 billion and have enough horsepower to pull HP into robust top line growth? Big data and analytics have promise but I don’t know of any single analytics company that has multi-billion dollar product lines. Big data is a hot button, but does it hard wire into the pocketbooks of chief financial officers?

Stephen E Arnold, June 5, 2012

Sponsored by IKANOW

The New Lexi-Portal Version 4 Offers More Options

June 5, 2012

Leximancer just introduced Lexi-Portal Version 4 to the market. This new service gives users access to the full range of Leximancer’s text analytic capabilities. Market researchers will find that the portal provides fast analysis of qualitative surveys, spreadsheets, and verbatim data.

Leximancer’s technology is proven with customers all around the globe. They’re providing new and innovative ways for businesses to benefit in a no strings attached way. Basically, you have options on how to utilize the Lexi-Portal.

There are several aspects of the portal that make it unique, such as the fact that it is an ‘on demand’ service. This means you don’t have to subscribe every month; instead you are charged for the actual amount of usage, on either a time used or a per service basis. A convenience of the pay as you go approach is that the Lexi-Portal will retain your company’s information for up to two months even if your usage drops for a month.

About Leximancer:

“Leximancer is an Australian company that has been providing leading-edge text analytics technology for almost 10 years.”

“The technology was created following 7 years research and development at the University of Queensland by Dr Andrew Smith. Andrew’s physics and cognitive science background, in conjunction with his working IT application experience, enabled him to envisage and develop an innovative solution to the growing need to readily determine meaning from unstructured, qualitative, textual data.”

You can view sample outputs at the Leximancer Chart Gallery, such as the interview dashboard below:

[Image: interview dashboard from the Leximancer Chart Gallery]

Jennifer Shockley, June 5, 2012

Sponsored by PolySpot

Inteltrax: Top Stories, May 28 to June 1

June 4, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, what is hot and trending in big data these days.

The first answer came from our story, “Dashboard Data Analytics Hot,” which showcased the many ways increased usability is boosting big data’s popularity.

Also, “The Next Great Data Gold Mine” looked a little deeper into what we already know: social media is going to be huge for analytics.

Finally, “Analytic Healthcare Contests Boom” showed that many of the health field’s biggest problems are being solved by analytic contests.

The rapidly evolving world of big data is always in flux. What’s hot today might be cold next week. But know that we’ll be taking the industry’s temperature every day to stay atop all the exciting changes.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax.

June 4, 2012

Lexalytics Uses Text Analytics to Find the Most Popular Superhero

May 31, 2012

The LexaBlog recently posted some interesting information about popular superheroes in the article “The Avengers: Most Popular Superhero?”

According to the article, writer Seth Redmore analyzed 330,000 tweets regarding the new Avengers superhero movie by sending out query topics on the main characters as well as the actors playing them.

Redmore breaks the information down for us with several charts showing the most to least popular characters, the most to least popular themes, and the most to least popular hash-tags.

When discussing his process, Redmore states:

“This actually does a good job of showing why I wanted to create query topics for the superheroes.  Many of their names come out looking more like themes than like proper “names”. Many of these themes aren’t particularly useful, so, I excluded a bunch of them when I was doing other sorts of analysis. Next, I decided to see what themes were most commonly associated with each of said superheroes. As I said before, I pulled out things like “watching avengers” when I was doing this analysis, as it adds nothing in terms of what people were associating with this character/actor.”

How will this aid your business? Send us your ideas via the comments section of this blog.

Jasmine Ashton, May 31, 2012

Sponsored by PolySpot

Inteltrax: Top Stories, May 21 to May 25

May 28, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the latest happenings with some of big data’s biggest names.

Our story, “Data Analytics Expert Points to the Crux of Big Data Issues,” looked at the CEOs of Revolution Analytics and Digital Reasoning, catching up with their latest moves.

“EMC Provides a Lot of Analytic Good,” shows all the positive ways in which EMC is moving the analytic game ahead.

Meanwhile, “MicroTech Wins Military Intelligence Contract” shows this up-and-coming firm making a name for itself in defense.

There are a million different directions that analytics are moving in at any given moment, but we’ll be providing snapshots of the scene, just like this, every day. Be sure to tune in.

Follow the Inteltrax news stream by visiting www.inteltrax.com


Patrick Roland, Editor, Inteltrax.

May 28, 2012

Inteltrax: Top Stories, May 14 to May 18

May 21, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how unstructured data is shaping the way vendors operate.

In “A Mountain of Unstructured Data,” the problem of collecting tweets, posts, pictures, videos, and more, and making analytic sense of them, is laid out.

“Unstructured Data Investment on the Horizon” shows how many companies are investing in solving their own unstructured data crises.

Finally, “Another Analytics Partnership is Born” showed companies joining forces to tackle this massive problem.

We’ve talked about unstructured data before, but we keep returning to the well because it’s such a massive concern for companies. Thankfully, those problems are being solved and we’re monitoring it every step of the way.

Follow the Inteltrax news stream by visiting www.inteltrax.com


Patrick Roland, Editor, Inteltrax.

May 21, 2012
