Prediction Data Joins the Fight

January 12, 2012

It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are developing prediction models that can be applied to terror prevention. The article mentions Palantir, noting that “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” Recorded Future is another player, though it relies on “creating a ‘temporal index’, a big data/ semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling include Sense Networks, Digital Reasoning, BlueKai, and Primal. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various “application” providers.” Using data to predict the future seems a little far-fetched, but the technology is still new and not fully understood. Everyone needs to join the fight against terrorism; exactly how data prediction fits in remains to be seen.
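Recorded Future has not published the details of its approach, but a toy sketch can suggest what a “temporal index” might look like in its simplest form: documents indexed by the dates they mention, so an analyst can ask what is expected around a given point in time. Everything below is a hypothetical illustration (the ISO-date regular expression and the function name are mine), not a description of the company’s technology.

```python
# Hypothetical sketch of a "temporal index": map dates mentioned in
# documents to the documents that mention them. Real systems resolve
# far messier date expressions ("next Tuesday", "Q3 2012") and attach
# extracted events, not whole documents.
import re
from collections import defaultdict
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # assume ISO-style dates only

def build_temporal_index(documents):
    """Index document ids by each date found in their text."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for y, m, d in DATE_RE.findall(text):
            index[date(int(y), int(m), int(d))].append(doc_id)
    return index

docs = {
    "doc1": "Summit scheduled for 2012-03-15 in Geneva.",
    "doc2": "Exercise planned for 2012-03-15, with a follow-up on 2012-04-01.",
}
index = build_temporal_index(docs)
print(index[date(2012, 3, 15)])  # ['doc1', 'doc2']
```

The interesting (and hard) part, of course, is layering prediction models on top of such an index, which is where the vendors named above claim to differentiate themselves.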

April Holmes, January 12, 2012

Sponsored by Pandia.com

Big Data in 2012: Reliable Open-Source Software Required

January 11, 2012

Enthusiasm and optimism abound that Big Data, as a concept, is the next big thing. We are almost ready to board the Big Data bulldozer. The hoopla surrounding Big Data has not died down in 2012. Instead, the concept continues to shape the environment of data processing and analysis.

As businesses become aware that the Big Data trend is here to stay, publishers are looking for reliable support. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project offers much in the way of dealing with unstructured data and is setting the pace for consolidation as well as personalization. I came across an interesting article, “State of the World IT: Big Data, An Offer That is Formed” (the original article is in French, but http://translate.google.com works well for this gosling). We learn:

Gaining recognition in the market in 2011, Hadoop has also attracted the top names in the IT industry, who have put this framework at the heart of their large-volume data processing offerings. One reason is mainly cost, as James Markarian, executive vice president and technical director of Informatica, reminded us, confirming that the framework ‘helped to change the economic model of Big Data.’ He added that flexibility… was also a criterion for adoption.

It is clear that the flood of data will only continue to grow by the minute. New generations of search, publishing, and consolidation tools will continue to emerge. I recommend staying informed about the products and the specific capabilities of each. However, Big Data which has been filtered may pose some interesting problems; for example, will the outputs match the pre-filtered reality? Will predictive methods work when some data are no longer in the stream? So far the cheerleading uses chants from an older, pre-filtering era. Is this a good thing or a no-thing?

Andrea Hayden, January 11, 2012

Sponsored by Pandia.com

Temis, Spammy PR, and Quite Silly Assertions

January 11, 2012

I am working on a project related to semantics. The idea, according to that almost always reliable Wikipedia resource, is:

the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata.

Years ago I studied at Duquesne University, a fascinating blend of Jesuit obsession, basketball, and phenomenological existentialism. If you are not familiar with this darned exciting branch of philosophy, you can dig into Psychology from an Empirical Standpoint by Franz Brentano, grind through Carl Stumpf’s The Psychological Origins of Space Perception, or just grab the Classic Comic Book from your local baseball card dealer. (My hunch is that many public relations professionals feel more comfortable with the Classic approach, not the primary texts of philosophers who focus on how ephemera and baloney affect one’s perception of the reality one’s actions create.)

But my personal touchstone is Edmund Husserl’s body of work. To get the scoop on Lebenswelt (a universe of what is self-evident), you will want to skip the early work and go directly to The Crisis of European Sciences and Transcendental Phenomenology. For sure, PR spam is what I would call self-evident: it exists, was created by a human (possibly unaware that actions define reality), and aims to achieve an outcome hooked to the individual’s identity.

Why mention the crisis of European thought? Well, I received “American Society for Microbiology Teams Up With TEMIS to Strengthen Access to Content” in this morning’s email (January 10, 2012). I noted that the document was attributed to an individual identified as Martine Fallon. I asked to be removed from the spam email list that dumps silly news releases about Temis into my system. I considered that Martine Fallon may be a ruse like Betty Crocker. Real or fictional, I am certain she or one of her colleagues, probably schooled in an esoteric discipline such as modern dance, agronomy, or public relations, is familiar with the philosophical musings of Jean Genet.

You can get a copy of Born to Lose at this link.

I recall M. Genet’s observation:

I recognize in thieves, traitors and murderers, in the ruthless and the cunning, a deep beauty – a sunken beauty.

Temis, a European company in the dicey semantic game, surely appreciates the delicious irony of describing a license deal as “teaming up.” The notion of strengthening access to content is another semantic bon mot. The problem is that the argument does not satisfy my existential quest for factual information; for example, look at the words and bound phrases in bold:

Temis, the leading provider of Semantic Content Enrichment solutions for the Enterprise, today announced it has signed a license and services agreement with the American Society for Microbiology (ASM), the oldest and largest life science membership organization in the world.

Do tell. Leading? Semantic content enrichment. What’s that?

The “leading” word is interesting but it lacks the substance of verifiable fact. Well, there’s more to the news story and the Temis pitch. Temis speaks for its client, asserting:

To serve its 40,000 members better, ASM is completely revamping its online content offering, and aggregating at a new site all of its authoritative content, including ASM’s journal titles dating back to 1916, a rapidly expanding image library, 240 book titles, its news magazine Microbe, and eventually abstracts of meetings and educational publications.

I navigated to the ASM Web site, did some poking around, and learned that ASM is rolling in dough. You can verify the outfit’s financial status at this page. The numbers and charts show that ASM has increasing assets, which is good. However, the chart below suggests that since 2008, revenue has been heading south.

[Chart: American Society for Microbiology revenue trend, 2008 onward]

Source: http://www.faqs.org/tax-exempt/DC/American-Society-For-Microbiology.html

In my limited experience in rural Kentucky, not-for-profits embrace technology for one of three reasons. Let me list them and see if we can figure out which one motivates the estimable American Society for Microbiology.


Connotate Embraces Big Data

January 10, 2012

The Internet is an environment where unregulated data is being created at rapid rates. It has become far too much for company staff to keep track of. Therefore, software that collects and organizes Big Data is becoming a hot commodity for enterprises all over the world.

According to the recent news release “Staffing and the Volume of Information are the Primary Big Data Challenges,” Connotate, Inc., a provider of solutions that help organizations monitor and collect data and content from the Web, has announced the results of its Big Data Attitudes and Perceptions Survey.

Connotate CEO Tom Meyer said:

Our research shows that Big Data goes beyond technology and is an HR challenge for corporate America. While it is important that organizations devote resources to Big Data, employees must be freed from the information fire hose so they can concentrate only on the information that is relevant to their tasks. Connotate’s Agent Community data extraction and monitoring tools are a proven force multiplier, enabling companies to drastically reduce the amount of personnel needed to run and achieve significant ROI from Big Data projects.

The Connotate survey suggests that companies are finding it too time-consuming and impractical for their staff to sort through Big Data. Companies focused on data fusion are responding to the explosion in social content. Clients demand; vendors respond.

Jasmine Ashton, January 10, 2012

Sponsored by Pandia.com

Digital Reasoning Connects with TeraDact

January 4, 2012

Big data analytics specialist Digital Reasoning has been a regular topic of discussion here at Beyond Search, most recently for securing Series B funding for a big data intelligence push.

Now, we would like to share an exciting new development in the quest to solve the big data problem in the news release “Digital Reasoning and TeraDact Partner to Automatically Remove Sensitive Information from Big Data.”

According to the article, TeraDact Solutions, a software tools and data integration solutions provider, has integrated its TeraDactor Information Identification and Presentation capabilities with Synthesys Cloud, a software-as-a-service data analytics solution.

The news story states:

In conjunction with Synthesys, TeraDactor can automatically assist in appropriately classifying information not recognized by the original data provider. TeraDactor allows participants to push and pull information without waiting for the declassification process, assuring that formerly classified documents may be released without unintended leakages.
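The news release does not explain how the identification actually works, so here is only a toy, pattern-based redactor to convey the general shape of the task; the regular expressions and function name are illustrative assumptions, not TeraDact’s technology, which presumably goes far beyond simple pattern matching.

```python
# Toy redaction sketch: mask strings that match obviously sensitive
# patterns before a document is released. Illustrative only.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # e-mail addresses
]

def redact(text, mask="[REDACTED]"):
    """Replace every match of a sensitive pattern with a mask token."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(mask, text)
    return text

print(redact("Contact analyst@example.com, SSN 123-45-6789, re: the release."))
# Contact [REDACTED], SSN [REDACTED], re: the release.
```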

The innovative technology that TeraDact Solutions brings to Digital Reasoning’s table demonstrates the power of Synthesys as a cloud-based data analytics tool in building the next generation of Big Data analytic solutions. Kudos to the surging Digital Reasoning organization.

Jasmine Ashton, January 4, 2012

Sponsored by Pandia.com

Digital Reasoning: A New Generation of Big Data Tools

December 31, 2011

I read “Tool Detects Patterns Hidden in Vast Data Sets.” The Broad Institute’s Web site reported that a group of researchers in the US and Israel “have developed a tool that can tackle large data sets in a way that no other software program can.”

What seems exciting to me is that the mathematical procedure, which involves creating a space and grids into which certain discerned patterns are placed, provides a fascinating potential enhancement for companies like ours, Digital Reasoning. Our proprietary methods have performed similar associative analytics in order to reduce the uncertainty associated with processing large flows of data and distilling meaningful relationships from them. Someday computers and associated systems will be able to cope with exabytes of data from the Internet of Things. Today, the Broad Institute validates the next-generation numerical methods that its researchers, Digital Reasoning’s engineers, and a handful of other organizations have been exploring.

The technical information about the method, which is called MIC, shorthand for Maximal Information Coefficient, is available to members of the AAAS. To get a copy of the original paper and its mathematical exegesis you will want the full bibliographic information:

“Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, 16 December 2011, Volume 334, Number 6062, pages 1518-1524.

The core of the authors’ work is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
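To make the intuition concrete, here is a minimal sketch of the grid-and-mutual-information idea behind MIC. It is not the authors’ published algorithm, which searches grid partitions far more cleverly and bounds the grid size as a function of sample size; this version simply bins the two variables at a few resolutions, computes the mutual information of each grid, normalizes it, and keeps the maximum.

```python
# Simplified MIC-style score: maximum normalized mutual information over
# small equi-width grids. Illustrative only; not the MINE/MIC algorithm.
import numpy as np

def grid_mutual_information(x, y, nx, ny):
    """Mutual information (bits) of x and y binned on an nx-by-ny grid."""
    joint, _, _ = np.histogram2d(x, y, bins=[nx, ny])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def simple_mic(x, y, max_bins=8):
    """Crude MIC-like measure: max of MI / log2(min grid dimension)."""
    best = 0.0
    for nx in range(2, max_bins + 1):
        for ny in range(2, max_bins + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print(simple_mic(x, x ** 2 + 0.05 * rng.normal(size=1000)))  # high: real nonlinear association
print(simple_mic(x, rng.normal(size=1000)))                  # low: no association
```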

Digital Reasoning’s application of similar mathematical methods underpins our entity-oriented analytics. You can read more about our methods in our description of Synthesys, a platform for performing automated understanding of the meaning of Big Data in real time.

The significance of this paper is that it shines a spotlight on the increasing importance of research into applications of next-generation numerical methods. Public discussion of methods like MIC will serve to accelerate innovation and the diffusion of knowledge. At Digital Reasoning we see this as further evidence of the potential of algorithmic, unaided approaches like ours to achieve true “automated understanding” of all forms of text regardless of volume, velocity, or variety. As we shift to IPv6, the “Internet of Things” will dramatically increase the flows of real-time data. With automobiles and consumer devices transmitting data continuously or on demand, the digital methods of five or ten years ago fall short.

Three other consequences of MIC-style innovations will accrue:

First, at Digital Reasoning, we will be able to enhance our existing methods with the new insights, forming partnerships and investing in research to apply these demonstrations to real-world problems. The confidence shown by Silver Lake partners’ investment in Digital Reasoning has provided us with the capital to extend our commercial system quickly and in new directions such as financial services, health care, legal, and other verticals.

Second, we see the MIC method fueling additional research into methods that make Big Data more accessible and useful; that is, consumerizing some applications without requiring custom solutions. Big Data will eventually be part of a standard information process, not something discussed as “new” and “unusual.”

Third, greater awareness of the contribution of mathematics will, I believe, stimulate young men and women to make mathematics and statistics a career. With more talent entering the workforce, the pace of innovation and integration will accelerate. That’s good for many companies, not just Digital Reasoning.

Kudos to the MIC team. What’s next?

Tim Estes, December 31, 2011

Sponsored by Pandia.com

Text Analytics Offers Summit, Podcast

December 28, 2011

Social media has exploded, with billions and billions of pieces of content hitting the Web every day; making sense of it all can be overwhelming. That’s why Text Analytics News is sponsoring the first-ever two-day conference dedicated solely to social media analytics: the Social Media Analytics Summit, in San Francisco next April 17–18. The conference description informs us:

The Social Media Analytics Summit offers unmatchable networking and knowledge sharing opportunities for social media and analytics professionals. The Summit will be a true forum for vendors, end users, and consultants alike to develop long-lasting business relationships. The conference agenda has been designed based on meticulous research with industry experts and is well-rounded with presentations, panels, case studies, and workshops, giving you deep insights into the social analytics industry from many angles.

Social media analytics is only going to keep growing; learning about this key field now is a wise investment. Here’s an inside tip: register with discount code BSEARCHSMA to save $150.

Text Analytics News’ Chief Editor Ezra Steinberg also has a podcast available called the “Social Media Analytics Perspectives Panel,” which he recently recorded with professionals from Social Media Today, Radian6, J.D. Power & Associates, and Beyond The Arc. The podcast explores:

  • Effective ways to leverage social media information to gain a competitive advantage
  • The cutting edge of social media analytics and sentiment analysis technology
  • How to make business sense out of the flood of user-generated content across social media channels

Two sources of important information from Text Analytics News. Here’s hoping you can take advantage of both.

Cynthia Murrell, December 28, 2011

Sponsored by Pandia.com

Big Data Analytics and Sense Making with Synthesys

December 19, 2011

Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.

There’s a great deal of talk about “big data” today. Walk into an AT&T store and you may see statistics such as users sending over 3 billion text messages a day or posting over 250 million tweets. Compare that with the 100 million or fewer tweets a day of a year or two ago, and it is daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that the AWS Cloud has hundreds of thousands of companies and customers running on it at some level. Finally, companies like Digital Reasoning are working on the next generation of the Cloud – automated understanding – which goes beyond a focus on infrastructure to sense-making of the data that sits in hosted or private clouds.

While most of the attention has been on infrastructure like virtualization and hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for the Cloud, which is making sense of data to solve information overload. Without next-generation analytics and supporting technology, it is essentially impossible to:

  • Analyze a flow of data from multiple sensors deployed in a factory
  • Process mobile traffic at a telephone company
  • Make sense of unstructured and structured information flowing through an email system
  • Identify key entities and their importance in a stream of financial news and transaction data.

These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.

Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps understanding much like a human child does – creating associations and building them into more complex relationships.
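As a rough illustration of that bootstrapping idea (and emphatically not Digital Reasoning’s proprietary method), a toy co-occurrence counter shows how associations can emerge from nothing more than how words appear near one another; the function name and window size here are my own choices.

```python
# Toy association building: count which words co-occur within a small
# context window. Real systems weight, normalize, and layer these
# associations into richer relationships; this shows only the first step.
from collections import Counter, defaultdict

def build_associations(sentences, window=3):
    """Count co-occurrences of word pairs within a sliding window."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for other in context:
                pair_counts[word][other] += 1
    return pair_counts

assoc = build_associations([
    "the analyst flagged the suspicious wire transfer",
    "the bank reviewed the wire transfer for fraud",
])
print(assoc["wire"].most_common(3))  # e.g. [('the', 2), ('transfer', 2), ...]
```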

In 2009, we migrated onto Hadoop and began taking on the problem of managing very large-scale unstructured data, moving the industry beyond counting things that are well structured and toward figuring out exactly what the data you are measuring actually means.

Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”

We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, the goal is to answer the questions of “who”, “what”, “where”, and “when”.

Digital Reasoning initially set its focus on the complex task of making sense out of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be used in the commercial sector to create great value from the mountains of unstructured data that sit in the Enterprise and stream in from the Web.

Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank trying to make financial trades. This matters for companies doing electronic discovery. This matters for health care organizations that need help organizing medical records and guarding against fraud.

We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.

To learn more about Digital Reasoning, navigate to our Web site and download our white paper.

Tim Estes, December 19, 2011

Sponsored by Pandia.com

Inteltrax: Top Stories, December 12 to December 16

December 19, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the issue of change in the analytic world—for the better, for the worse and everything in between.

One example of change came from our story “Data Mining Changing Scientific Thought,” which shows how the way scientists think is being streamlined by analytics.

On the other hand, “ManTech has Uphill Climb with Intelligence Analytics” shows that not all change looks promising, such as one company’s new focus on intelligence.

And some change, well, we’re just not sure how it’ll pan out, like with the story “Predicting the Ponies is Just Unstructured Data” which exposes how the gambling industry could be changed by analytic tools. For the better or worse is up for debate.

Change, in any aspect of life, is inevitable. However, the world of big data analytics seems more susceptible than most. And we couldn’t be happier, as we watch the unexpected turns these changes bring to the industry every day.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.

IBM Redbooks Reveals Content Analytics

December 16, 2011

IBM Redbooks has put out some juicy reading for the azure chip consultants wanting to get smart quickly with IBM Content Analytics Version 2.2: Discovering Actionable Insight from Your Content. The sixteen chapters of this book take the reader from an overview of IBM content analytics, through understanding the details, to troubleshooting tips. The above link provides an abstract of the book, as well as links to download it as a PDF, view in HTML/Java, or order a hardcopy.

We learned from the write up:

The target audience of this book is decision makers, business users, and IT architects and specialists who want to understand and use their enterprise content to improve and enhance their business operations. It is also intended as a technical guide for use with the online information center to configure and perform content analysis with Content Analytics.

The product description notes a couple of specifics. For example, creating custom annotators with the LanguageWare Resource Workbench is covered. So is using the IBM Content Assessment to weed out superfluous data.

The content is, of course, slanted toward working with IBM solutions. However, there is also some more general information included. This is a good place to go to get a better handle on content management.

Cynthia Murrell, December 16, 2011

Sponsored by Pandia.com
