<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beyond Search &#187; Text analytics</title>
	<atom:link href="http://arnoldit.com/wordpress/category/text-processing/text-analytics-text-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://arnoldit.com/wordpress</link>
	<description>by Stephen E. Arnold</description>
	<lastBuildDate>Sun, 12 Feb 2012 13:49:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Politicians Try to Surf on Social Media</title>
		<link>http://arnoldit.com/wordpress/2012/02/12/politicians-try-to-surf-on-social-media/</link>
		<comments>http://arnoldit.com/wordpress/2012/02/12/politicians-try-to-surf-on-social-media/#comments</comments>
		<pubDate>Sun, 12 Feb 2012 05:07:41 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Social]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23194</guid>
		<description><![CDATA[Is this a new type of polling or is it social trolling? Attensity’s blog reports, “Politico Uses Attensity to Analyze SOPA Sentiment.” Attensity took on Politico’s challenge to mine social media for attitudes on the Stop Online Piracy Act. It turns out that people who spend a lot of time online skew heavily against the [...]]]></description>
			<content:encoded><![CDATA[<p>Is this a new type of polling or is it social trolling? Attensity’s blog reports, “<a href="http://blog.attensity.com/2012/01/19/politico-uses-attensity-to-analyze-sopa-sentiment/">Politico Uses Attensity to Analyze SOPA Sentiment</a>.” Attensity took on Politico’s challenge to mine social media for attitudes on the <a href="http://thomas.loc.gov/cgi-bin/query/z?c112:H.R.3261:">Stop Online Piracy Act</a>. It turns out that people who spend a lot of time online skew heavily against the law. Go figure.</p>
<p>Author James Purchase writes:</p>
<blockquote><p>If I had to directly summarize this analysis, I would say that the SOPA-opposition is significantly more organized and vocal in using Social Media to make their point. Whether or not the social media outcry affects the outcome of the legislation remains to be seen.</p></blockquote>
<p>Perhaps, though I hope the uproar against the law has reached the ears of even the most tech-averse legislators. They have interns, right? Some are awkward too. Wipeout!</p>
<p>Cynthia Murrell, February 12, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/02/12/politicians-try-to-surf-on-social-media/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linguamatics Embraces Informatics</title>
		<link>http://arnoldit.com/wordpress/2012/02/09/linguamatics-embraces-informatics/</link>
		<comments>http://arnoldit.com/wordpress/2012/02/09/linguamatics-embraces-informatics/#comments</comments>
		<pubDate>Thu, 09 Feb 2012 05:03:46 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=22352</guid>
		<description><![CDATA[Fierce Biotech IT announces, “EU Program Backs Linguamatics and ChemAxon’s Informatics Work.” The European Union’s Eurostars Program grants research and development funding to small and medium companies. The project being funded is, according to the companies, the first interactive text-mining system specifically for chemistry research. Writer Ryan McBride elaborates: The companies say that pharma and [...]]]></description>
			<content:encoded><![CDATA[<p>Fierce Biotech IT announces, “<a href="http://www.fiercebiotechit.com/story/eu-program-backs-linguamatics-and-chemaxons-informatics-work/2011-12-19">EU Program Backs Linguamatics and ChemAxon’s Informatics Work</a>.” The <a href="http://europa.eu/index_en.htm">European Union</a>’s <a href="http://www.eurostars-eureka.eu/home.do">Eurostars Program</a> grants research and development funding to small and medium companies.</p>
<p>The project being funded is, according to the companies, the first interactive text-mining system specifically for chemistry research. Writer Ryan McBride elaborates:</p>
<blockquote><p>The companies say that pharma and biotech outfits are expected to be the main customers for the technology. With this tool, ChemAxon and Linguamatics want drug companies or other users to be able to do chemical evaluations, hunt for new chemicals, get structure visualizations in searches and ‘explore image to structure conversion,’ according to the companies&#8217; press release.</p></blockquote>
<p>More personalized medical research is expected to be one application of the system. That sounds promising.</p>
<p><a href="http://www.chemaxon.com/">ChemAxon</a> serves the biotechnology and pharmaceutical fields worldwide, providing chemical software development platforms as well as desktop applications.</p>
<p><a href="http://www.linguamatics.com/">Linguamatics</a> bases its data management solutions on natural language processing technology. <a href="http://www.linguamatics.com/welcome/software/I2E.html">I2E</a> is the company’s flagship text mining software, also available in the cloud as <a href="http://www.linguamatics.com/welcome/software/I2E_OnDemand_Online_Cloud_SaaS_Text_Mining.html">I2E OnDemand</a>.</p>
<p>Cynthia Murrell, February 9, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/02/09/linguamatics-embraces-informatics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inteltrax: Top Stories, January 30 to February 3, 2012</title>
		<link>http://arnoldit.com/wordpress/2012/02/06/inteltrax-top-stories-january-30-to-february-3-2012/</link>
		<comments>http://arnoldit.com/wordpress/2012/02/06/inteltrax-top-stories-january-30-to-february-3-2012/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 16:21:11 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Reference tool]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23266</guid>
		<description><![CDATA[Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how governments are embracing and utilizing big data analytics, especially during this early stage in the 2012 political cycle. We got a good overall look at the issue from the story, “Government Healthcare and Analytics Make [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.inteltrax.com" target="_blank">Inteltrax</a>, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how governments are embracing and utilizing big data analytics, especially during this early stage in the 2012 political cycle.</p>
<p>The story “<a href="http://inteltrax.com/?p=3368" target="_blank">Government Healthcare and Analytics Make a Good Team</a>” gave us a good overall look at the issue, showing how, as the title implies, this pairing is making some impressive waves.</p>
<p>Another story, “<a href="http://inteltrax.com/?p=3391" target="_blank">Social Media and Politics Share Big Data Love</a>,” showed us how Ron Paul and others have used social media to get a better take on the issues.</p>
<p>Finally, the most promising of our stories, “<a href="http://inteltrax.com/?p=3556" target="_blank">Government Grows Into Big Data Workhorse</a>,” shows how governments around the globe could kick-start a big data revolution.</p>
<p>Analytics and big data are growing by leaps and bounds, and it seems that government can be their best friend, and often tries to be. We’re going to keep chronicling this partnership, because we sense big things on the horizon.</p>
<p>Follow the Inteltrax news stream by visiting <a href="http://www.inteltrax.com">www.inteltrax.com</a>.</p>
<p>Patrick Roland, Editor, Inteltrax, February 6, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search" target="_blank">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/02/06/inteltrax-top-stories-january-30-to-february-3-2012/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>Semantic Wranglers to Tame Media Content</title>
		<link>http://arnoldit.com/wordpress/2012/02/06/semantic-wranglers-to-tame-media-content/</link>
		<comments>http://arnoldit.com/wordpress/2012/02/06/semantic-wranglers-to-tame-media-content/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 05:12:24 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23150</guid>
		<description><![CDATA[When the prolificacy of the mediascape overwhelms, semantic technology comes to the rescue. So declares ReadWriteWeb in “Semantic Tech the Key to Finding Meaning in the Media.” Writer Chris Lamb maintains that today’s deluge of information has made attention span the prize, and delivering relevancy the key. Strategies have included tapping readers’ social [...]]]></description>
			<content:encoded><![CDATA[<p>When the prolificacy of the mediascape overwhelms, semantic technology comes to the rescue. So declares ReadWriteWeb in “<a href="http://www.readwriteweb.com/archives/semantic_tech_the_key_to_finding_meaning_in_the_me.php">Semantic Tech the Key to Finding Meaning in the Media</a>.” Writer Chris Lamb maintains that today’s deluge of information has made attention span the prize, and delivering relevancy the key. Strategies have included tapping readers’ social graphs, profiles, and preferences to filter news content. Lamb writes:</p>
<blockquote><p>These current approaches are doomed. With respect to social graph curation, people have different roles at different times. On the weekend, a reader might be interested in arts, entertainment and sports news based on friends and family. During the week, this same person may be interested in business news based on recommendations from trading partners in the capital markets. How do readers seamlessly reconcile this?</p></blockquote>
<p>Lamb doesn’t have the answer, but he says he knows which technologies will underlie the eventual solutions: tagging, semantic extraction, disambiguation, and linked data structures (including cloud data). See the write-up for more on the reasoning behind each.</p>
<p>Semantic technology can perform useful functions. Rich media pose some special challenges. Among them are the issues of data volume and available processing power, latency, and variability in indexable content. What about a silent movie? What about a program featuring interviews with people who have a substance abuse problem and who speak colloquially, with a mumble?</p>
<p>Cynthia Murrell, February 6, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/02/06/semantic-wranglers-to-tame-media-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Craig Norris Leaves Attensity</title>
		<link>http://arnoldit.com/wordpress/2012/02/02/craig-norris-leaves-attensity/</link>
		<comments>http://arnoldit.com/wordpress/2012/02/02/craig-norris-leaves-attensity/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 05:03:48 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23104</guid>
		<description><![CDATA[Chiliad has issued the press release, “New CEO Begins Duties at CHILIAD in Herndon, VA.” Craig Norris is leaving Attensity to head that company. Attensity, owned by Aeris Capital, is positioned as a global natural language analytics company. Chiliad seems to be its direct competitor. Interesting. Chiliad Chairman Patrick Gross noted a couple of challenges [...]]]></description>
			<content:encoded><![CDATA[<p>Chiliad has issued the press release, “<a href="http://www.chiliad.com/AltHomePage.php">New CEO Begins Duties at CHILIAD in Herndon, VA</a>.” Craig Norris is leaving <a href="http://www.attensity.com/home/">Attensity</a> to head that company. Attensity, owned by <a href="http://www.aeris-capital.com/">Aeris Capital</a>, is positioned as a global natural language analytics company. <a href="http://www.chiliad.com/index1.php">Chiliad</a> seems to be its direct competitor. Interesting.</p>
<p>Chiliad Chairman Patrick Gross noted a couple of challenges his company’s new CEO has already tackled:</p>
<blockquote><p>The first is the ability to rapidly search data collections at greater scale than any other offering in the market. The second is to allow search formulation and analysis in natural language. This means that no longer is an elite class of analysts required in order to generate meaningful results, thus reducing the personnel training and skills shortages that plague alternative solutions and put timely discovery at risk. The explosion of ‘Big Data’ is real and valuable findings are buried in vast collections for both enterprises and governments. Chiliad has the opportunity to integrate its innovative, massively scalable solutions with emerging open source software to build customized solutions for the largest-scale clients.</p></blockquote>
<p>It will be interesting to see how the market reacts to this shift.</p>
<p>Cynthia Murrell, February 2, 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/02/02/craig-norris-leaves-attensity/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Heat in SharePoint Semantics: January 20 &#8211; January 27</title>
		<link>http://arnoldit.com/wordpress/2012/01/31/the-heat-in-sharepoint-semantics-january-20-january-27/</link>
		<comments>http://arnoldit.com/wordpress/2012/01/31/the-heat-in-sharepoint-semantics-january-20-january-27/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 05:59:36 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Enterprise]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Text analytics]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23061</guid>
		<description><![CDATA[As always, SharePoint Semantics has delivered many posts that are vitally important to SharePoint end users and search enthusiasts alike. The first post that I would like to share with you is entitled “SharePoint Joel Lists Seven Actions to Take Before Calling Microsoft Support.” This post shares helpful hints on how to solve your [...]]]></description>
			<content:encoded><![CDATA[<p>As always, SharePoint Semantics has delivered many posts that are vitally important to SharePoint end users and search enthusiasts alike.</p>
<p>The first post that I would like to share with you is entitled “<a href="http://sharepointsemantics.com/2012/01/sharepoint-joel-lists-seven-actions-to-take-before-calling-microsoft-support/" target="_blank">SharePoint Joel Lists Seven Actions to Take Before Calling Microsoft Support</a>.” This post shares helpful hints on how to solve your SharePoint issues on your own before having to involve Microsoft.</p>
<p>Writer Ken Toth summarizes the key points:</p>
<blockquote><p>“The seven things you should do are:</p>
<ol>
<li>Review the Service Pack and Cumulative Update Level</li>
<li>Reboot / Recycle</li>
<li>Eliminate Third-Party Add-ons as the Issue</li>
<li>Engineers Escalate / Partner / Awareness (maybe you could solve the problem in-house if you asked engineering)</li>
<li>Isolate the Issue</li>
<li>Code Issue</li>
<li>Reach Out to the Community (Twitter and/or Newsgroups).”</li>
</ol></blockquote>
<p>Many organizations use <a href="http://office.microsoft.com/en-us/sharepoint-server-help/create-a-wiki-HA010226177.aspx" target="_blank">wikis</a> to gather and share ideas on SharePoint quickly and efficiently. The post “<a href="http://sharepointsemantics.com/2012/01/build-the-best-microsoft-sharepoint-wiki-you-can-build/" target="_blank">Build the Best Microsoft SharePoint Wiki You Can Build</a>” describes the virtues of a SharePoint wiki and offers tips on making one work effectively for your business.</p>
<p>Toth states:</p>
<blockquote><p>“To be useful, the wiki must be easy to navigate and provide all of the resources the SharePoint end user needs linked into the wiki Home page. In this way the wiki can be a one-stop shop for information about every task team members need to accomplish. Contributions are limited in order to make sure the information is accurate.”</p></blockquote>
<p>Another noteworthy post from this week is “<a href="http://sharepointsemantics.com/2012/01/excellent-resources-on-end-user-issues-for-those-new-to-sharepoint/" target="_blank">Excellent Resources on End User Issues for Those New to SharePoint</a>,” which points beginners with no previous SharePoint experience, working in small to medium-sized implementations, to resources that can help.</p>
<p>After sharing three helpful resources for SharePoint end users, Toth notes:</p>
<blockquote><p>“The three resources above can be quite useful for beginning users of SharePoint in smaller deployments, but if you have frustrated end users in an enterprise deployment, look to <a href="http://www.smartlogic.com/" target="_blank">Smartlogic</a>. The Semaphore Content Intelligence Platform provides a comprehensive solution to frustrating out of the box SharePoint search and navigation.”</p></blockquote>
<p>As always, while these articles provide helpful tips for overcoming the lack of out-of-the-box help that SharePoint provides, it is important that users recognize the web application platform’s limitations and use other products, such as Smartlogic’s <a href="http://www.smartlogic.com/home/products/semaphore-solutions/sharepoint-integration-pack/sharepoint" target="_blank">Semaphore Content Intelligence Platform</a>. Smartlogic fills in the gaps by using semantic technology to deliver information quickly and in context.</p>
<p>Jasmine Ashton, January 31, 2012</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/01/31/the-heat-in-sharepoint-semantics-january-20-january-27/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Search Only Goes So Far</title>
		<link>http://arnoldit.com/wordpress/2012/01/30/search-only-goes-so-far/</link>
		<comments>http://arnoldit.com/wordpress/2012/01/30/search-only-goes-so-far/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 05:04:50 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Digital Reasoning]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=23019</guid>
		<description><![CDATA[Infocentric Research surveyor Stephan Schillerwein, who presented his findings at the Online Information Conference, released some alarming statistics about enterprise search in his report “The Digital Workplace.” Among the points that jumped out at me: 40 percent of employees use the wrong information when conducting enterprise searches, and 63 percent “make critical decisions without [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.infocentricresearch.com/">Infocentric Research</a> surveyor Stephan Schillerwein, who presented his findings at the <a href="http://www.online-information.co.uk/">Online Information Conference</a>, released some alarming statistics about enterprise search in his report “<a href="http://www.infocentricresearch.com/Research/Publications/The-Digital-Workplace.aspx">The Digital Workplace</a>.” Among the points that jumped out at me: 40 percent of employees use the wrong information when conducting enterprise searches, and 63 percent “make critical decisions without being informed,” which results in a 25 percent loss in information work productivity.</p>
<p>According to the Pandia Search Engine News article “<a href="http://www.pandia.com/sew/4372-problems-for-enterprise-search.html">Huge Problems for Search In the Enterprise</a>,” Schillerwein believes there are a few reasons why enterprise search is problematic: users don’t account for the fact that enterprise search is different from Web search, they have unrealistic expectations, and there is a clear lack of content. The Pandia article reports that Schillerwein suggests a solution based on several elements, such as consistent coverage of information flows for processes, bringing together the worlds of structured and unstructured information, and adding context. I agree, because this ability to combine structured and unstructured data while maintaining context is key to our approach. However, add the crowded jumble of tweets, social media, and other data on employees’ smart devices, and the problems with enterprise search could continue to spiral downward, to the point where “finding a needle in a haystack” could be easier than doing an enterprise search.</p>
<p>These observations triggered several questions and thoughts.</p>
<p>First, there are a number of companies offering enterprise information solutions. Many are focused on the older approach of key word queries. There are business intelligence systems which provide “find-ability” tools along with a range of useful analytic features. Although search is not the focal point of these solutions, they do provide useful visualizations and statistics on content. The problem is that most organizations are confused about what is needed and what must be done to maximize the value of systems which go beyond key word retrieval. This confusion is likely to play a far larger role in enterprise search challenges than many market analysts want to acknowledge. Rather than clearing up that confusion, many solutions today seem to be making information access more confusing and problematic, not clearer and more trouble free.</p>
<p>Second, the challenge may be more directly related to figuring out which specific business process needs which information. Without a clear understanding of the user’s requirements, it may be difficult to deploy a system that delivers higher user satisfaction. If this hypothesis is correct, perhaps more vendors should adopt the approach we have taken at <a href="http://www.digitalreasoning.com/">Digital Reasoning</a>. We make an extra effort to understand what the user requires and then invest time and resources in hooking appropriate information and data into the system. No solution can deliver the right fact-based answers if the required information is not within the data store and available to the algorithms which make sense of what is otherwise noise. We think that many problems with user acceptance originate with a misunderstanding or sidestepping of user requirements and of the fundamental task of getting the necessary information into the system.</p>
<p>Third, the terminology used to describe information retrieval and access is becoming devalued. At Digital Reasoning, we work to explain succinctly and without jargon how our next-generation system can facilitate better decision making for financial, health, intelligence, and other professional markets. We have complex numerical recipes and sophisticated systems and methods. Our focus, however, is on what the system does for a user. We have been fortunate to receive support from a range of clients from government and industry as well as the investment community for our next-generation approach. We think our strength is our focus on the customer’s need and not only our unique predictive algorithms and cloud-based solution.</p>
<p>To learn more about Digital Reasoning and our products, navigate to <a href="http://www.digitalreasoning.com/">www.digitalreasoning.com</a>.</p>
<p>Dave Danielson, <a href="http://www.digitalreasoning.com" target="_blank">Digital Reasoning</a>, January 30, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/01/30/search-only-goes-so-far/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Prediction Data Joins the Fight</title>
		<link>http://arnoldit.com/wordpress/2012/01/12/prediction-data-joins-the-fight/</link>
		<comments>http://arnoldit.com/wordpress/2012/01/12/prediction-data-joins-the-fight/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 05:04:08 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Government]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=22644</guid>
		<description><![CDATA[It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are developing prediction models that can be applied to terror prevention. The article mentions the company Palantir, noting that “they emphasize development of prediction models as applied [...]]]></description>
			<content:encoded><![CDATA[<p>It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article <a href="http://socialgraphpaper.blogspot.com/2012/01/prediction-data-as-api-in-2012.html">“Prediction Data As An API in 2012,”</a> some companies are developing prediction models that can be applied to terror prevention. The article mentions the company <a href="http://www.palantirtech.com/">Palantir</a>, noting that “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” <a href="https://www.recordedfuture.com/">Recorded Future</a> is another such company, but it relies on “creating a &#8216;temporal index&#8217;, a big data/ semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling are <a href="http://sensenetworks.com/">Sense Networks</a>, <a href="http://www.digitalreasoning.com/">Digital Reasoning</a>, <a href="http://www.bluekai.com/">BlueKai</a>, and <a href="http://www.primalgame.com/">Primal</a>. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various &#8220;application&#8221; providers.” Using data to predict the future seems a little far-fetched, but the technology is still new and not fully understood. Everyone does need to join the fight against terrorism, but exactly how data prediction fits in remains to be seen.</p>
<p>April Holmes, January 12, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search" target="_blank">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/01/12/prediction-data-joins-the-fight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Big Data in 2012: Reliable Open-Source Software Required</title>
		<link>http://arnoldit.com/wordpress/2012/01/11/big-data-in-2012-reliable-open-source-software-required/</link>
		<comments>http://arnoldit.com/wordpress/2012/01/11/big-data-in-2012-reliable-open-source-software-required/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 05:09:39 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Government]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=22616</guid>
		<description><![CDATA[There is enthusiasm and optimism that Big Data as a concept is the next big thing. We are almost ready to board the Big Data bulldozer. The hoopla surrounding Big Data has not died down in 2012. Instead, the concept continues to shape the processing and analysis environment. As businesses become aware that the Big Data [...]]]></description>
			<content:encoded><![CDATA[<p>There is enthusiasm and optimism that Big Data as a concept is the next big thing. We are almost ready to board the Big Data bulldozer. The hoopla surrounding Big Data has not died down in 2012. Instead, the concept continues to shape the processing and analysis environment.</p>
<p>As businesses become aware that the Big Data trend is here to stay, publishers are looking for reliable support. The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The project offers much in the way of dealing with unstructured data and is setting the pace for consolidation as well as personalization. I came across an interesting article, “<a href="http://www.lemagit.fr/article/donnees-hadoop-big-data/10136/1/etat-monde-big-data-une-offre-qui-forme/">State of the IT World: Big Data, an Offering Takes Shape</a>” (the original article is in French, but <a href="http://translate.google.com/">http://translate.google.com</a> works well for this gosling). We learn:</p>
<blockquote><p>As a sign of market recognition in 2011, Hadoop also attracted the top names in the IT industry, who put this framework at the heart of their large-volume data processing offerings. One reason: above all, cost, as <a href="http://translate.googleusercontent.com/translate_c?act=url&amp;hl=en&amp;ie=UTF8&amp;prev=_t&amp;rurl=translate.google.com&amp;sl=auto&amp;tl=en&amp;twu=1&amp;u=http://www.lemagit.fr/article/apache-donnees-gestion-hadoop-framework/9984/1/hadoop-engouement-pour-une-technologie-qui-doit-encore-evoluer/&amp;usg=ALkJrhj3RNLomEZl3TTbprNyDrQUC6U5fg">James Markarian reminded us</a>. The executive vice president and technical director of Informatica confirmed that the framework ‘helped to change the economic model of Big Data,’ adding that flexibility… was a criterion for adoption.</p></blockquote>
<p>It is clear that the volume of data will only continue to grow by the minute. New generations of search, publishing, and consolidation products will continue to emerge. I recommend staying informed about the products and the specific capabilities of each. However, filtered Big Data may pose some interesting problems; for example, will the outputs match the pre-filtered reality? Will predictive methods work when some data are no longer in the stream? So far the cheerleaders are using chants from an older, pre-filtering era. Is this a good thing or a no-thing?</p>
<p>Andrea Hayden, January 11, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search" target="_blank">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/01/11/big-data-in-2012-reliable-open-source-software-required/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Temis, Spammy PR, and Quite Silly Assertions</title>
		<link>http://arnoldit.com/wordpress/2012/01/11/temis-spammy-pr-and-quite-silly-assertions/</link>
		<comments>http://arnoldit.com/wordpress/2012/01/11/temis-spammy-pr-and-quite-silly-assertions/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 05:01:18 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Feature]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Text analytics]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=22630</guid>
		<description><![CDATA[I am working on a project related to semantics. Semantics, according to that almost always reliable Wikipedia resource, is: the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata. Years ago I studied at Duquesne University, a fascinating [...]]]></description>
			<content:encoded><![CDATA[<p>I am working on a project related to semantics. Semantics, according to that almost always reliable <a href="http://en.wikipedia.org/wiki/Semantics" target="_blank">Wikipedia</a> resource, is:</p>
<blockquote><p>the study of <a href="http://en.wikipedia.org/wiki/Meaning_(linguistic)">meaning</a>. It focuses on the relation between <em>signifiers</em>, such as <a href="http://en.wikipedia.org/wiki/Word">words</a>, <a href="http://en.wikipedia.org/wiki/Phrase">phrases</a>, <a href="http://en.wikipedia.org/wiki/Sign">signs</a> and <a href="http://en.wikipedia.org/wiki/Symbol">symbols</a>, and what they stand for, their <a href="http://en.wikipedia.org/wiki/Denotation">denotata</a>.</p></blockquote>
<p>Years ago I studied at Duquesne University, a fascinating blend of Catholic obsession, basketball, and phenomenological existentialism. If you are not familiar with this darned exciting branch of philosophy, you can dig into <em>Psychology from an Empirical Standpoint</em> by <a href="http://plato.stanford.edu/entries/brentano/" target="_blank">Franz Brentano</a> or grind through <a href="http://plato.stanford.edu/entries/stumpf/" target="_blank">Carl Stumpf’s</a> <em>The Psychological Origins of Space Perception</em>, or just grab the Classic Comic Book from your local baseball card dealer. (My hunch is that many public relations professionals feel more comfortable with the Classic approach than with the primary texts of philosophers who focus on how ephemera and baloney affect one’s perception of the reality that one’s actions create.)</p>
<p>But my personal touchstone is <a href="http://plato.stanford.edu/entries/husserl/" target="_blank">Edmund Husserl’s</a> body of work. To get the scoop on <em>Lebenswelt</em> (a universe of what is self-evident), you will want to skip the early work and go directly to <em><a href="http://goo.gl/rfLMM" target="_blank">The Crisis of European Sciences and Transcendental Phenomenology</a></em>. For sure, PR spam is what I would call self-evident: it exists, and it was created by a human (possibly unaware that actions define reality) to achieve an outcome that is hooked to the individual&#8217;s identity.</p>
<p>Why mention the crisis of European thought? Well, I received “<a href="http://tagline.temis.com/" target="_blank">American Society for Microbiology Teams Up With TEMIS to Strengthen Access to Content</a>” in this morning’s email (January 10, 2012). I noted that the document was attributed to an individual identified as Martine Fallon. I had asked to be removed from the spam email list that dumps silly news releases about Temis into my system. I considered the possibility that Martine Fallon may be a ruse like <a href="http://chnm.gmu.edu/sidelights/who-was-betty-crocker/" target="_blank">Betty Crocker</a>. Real or fictional, she or one of her colleagues, probably schooled in an esoteric discipline such as modern dance, agronomy, or public relations, is surely familiar with the philosophical musings of <a href="http://www.leninimports.com/jean_genet.html" target="_blank">Jean Genet</a>.</p>
<p><img src="http://ecx.images-amazon.com/images/I/51A56F093YL._SL500_AA300_.jpg" alt="" width="181" height="181" /></p>
<p><span style="color: #800000; font-size: x-small;">You can get a copy of <em>Born to Lose</em> at </span><a href="http://goo.gl/dfsqc" target="_blank"><span style="color: #800000; font-size: x-small;">this link</span></a><span style="color: #800000; font-size: x-small;">.</span></p>
<p>I recall M. Genet’s observation:</p>
<blockquote><p><span style="color: #800000; font-size: x-small;">I recognize in thieves, traitors and murderers, in the ruthless and the cunning, a deep beauty &#8211; a sunken beauty.</span></p></blockquote>
<p><a href="http://www.temis.com" target="_blank">Temis</a>, a European company in the dicey semantic game, surely appreciates the delicious irony of describing a license deal as “teaming up.” The notion of strengthening access to content is another semantic <em>bon mot</em>. The problem is that the argument does not satisfy my existential quest for factual information; for example, look at the words and bound phrases in bold:</p>
<blockquote><p>Temis, the <strong>leading provider</strong> of <strong>Semantic Content Enrichment</strong> <strong>solutions</strong> for the Enterprise, today announced it has signed a license and services agreement with the American Society for Microbiology (ASM), the oldest and largest life science membership organization in the world.</p></blockquote>
<p>Do tell. Leading? Semantic content enrichment? What&#8217;s that?</p>
<p>What about outfits like <a href="http://www.accessinn.com" target="_blank">Access Innovations</a>, <a href="http://www.conceptsearching.com" target="_blank">Concept Searching</a>, <a href="http://www.expertsystem.net" target="_blank">Expert System SA</a>, <a href="http://www.smartlogic.com" target="_blank">Smartlogic</a>, and more than 75 other firms in the semantic space? The word “leading” is interesting, but it lacks the substance of verifiable fact. Well, there’s more to the news story and the Temis pitch. Temis speaks for its client, asserting:</p>
<blockquote><p>To serve its 40,000 members better, ASM is completely revamping its online content offering, and aggregating at a new site all of its authoritative content, including ASM’s journal titles dating back to 1916, a rapidly expanding image library, 240 book titles, its news magazine <em>Microbe</em>, and eventually abstracts of meetings and educational publications.</p></blockquote>
<p>I navigated to the ASM Web site, did some poking around, and learned that ASM is rolling in dough. You can verify the outfit’s financial status on the tax-exempt profile cited as the source of the chart below. The numbers and charts show that ASM’s assets are increasing, which is good. However, this chart suggests that revenue has been heading south since 2008.</p>
<p><a href="http://arnoldit.com/wordpress/wp-content/uploads/2012/01/image1.png"><img style="display: inline; border: 0px;" title="image" src="http://arnoldit.com/wordpress/wp-content/uploads/2012/01/image_thumb1.png" alt="image" width="244" height="198" border="0" /></a></p>
<p><span style="color: #800000; font-size: x-small;">Source: </span><a href="http://www.faqs.org/tax-exempt/DC/American-Society-For-Microbiology.html"><span style="color: #800000; font-size: x-small;">http://www.faqs.org/tax-exempt/DC/American-Society-For-Microbiology.html</span></a></p>
<p>In my limited experience in rural Kentucky, not-for-profits embrace technology for one of three reasons. Let me list them and see if we can figure out what motivates the estimable American Society for Microbiology.</p>
<p><span id="more-22630"></span></p>
<ol>
<li><strong>Cost reduction.</strong> Professional associations are not usually in growth mode. Health is a hot area, but the green eyeshades running the accounting programs are looking for jobs to chop. Automated indexing is one of those teen fantasies about silver-bullet solutions that sound good in a meeting but can prove a bit of a challenge in the real world. Not even the medical vocabularies are immune to the disease of language drift and neologisms, issues that marketers and PR professionals often ignore.</li>
<li><strong>Declining traffic.</strong> There are folks inspired by taxonomy fire drills, boot camps, and triage sessions who assert, “Better indexing will boost traffic.” Sorry. Life does not work that way. Annoyed users may become less annoyed if the indexing changes deliver on-point content. In my experience, more than a fix to one system’s indexing is needed to remedy the often lousy usage of an increasingly expensive enterprise or Web site indexing system. Once again, it is often easier to focus on one component of a far larger problem than to tackle the cause of poor usage. Perhaps management, resources, and technical expertise are the issue? Again, most sales-oriented organizations ignore the facts documented in <a href="http://www.galatea.com" target="_blank"><em>Successful Enterprise Search Management</em></a>.</li>
<li><strong>Remediation of a lack of planning and management actions</strong>. I am 67, worked at a couple of reasonably respectable management consulting firms, and have had to investigate vendor compliance with statements of work in search and content processing. What I have learned is that, in practice, shortcuts are preferred to hard work, truisms to facts, and hope to the blunt edge of reality. Without effective management, do we end up with search disasters? Perhaps.</li>
</ol>
<p>Why focus on Temis? I previously asked the firm’s public relations expert, who seems more inclined to spam than to research, to stop sending me meaningless, spammy news releases. My request was ignored. Nifty. What fascinated me is that Temis had asked me to facilitate an introduction to the president of a $1.2 billion company. I did this and moved on. I assumed that, in keeping with French cultural norms, I would be rewarded with <em>entrecote</em>. Wrong. My reward has been spam.</p>
<p>Does this illustrate how Temis perceives equity? Does the spirit of M. Genet apply? My recommendation? Check out the semantic products from the Temis competitors. I quite like <a href="http://www.expertsystem.net" target="_blank">Expert System SA</a> in Bologna, Italy, and <a href="http://www.bitext.com">Bitext</a> in Madrid, Spain. Great food, interesting culture, and&#8211;<em>nota bene</em>&#8211; no spam. One has to get the semantics correct. No spam from Italy. No spam from Spain. Hmmm. Is there a cultural message, perhaps?</p>
<p>In my opinion, PR spam connotes what could be characterized as <strong>desperation marketing</strong>. The phenomenon itself defines the act and its progenitor, does it not? And what about those semantics? As M. Genet allegedly said:</p>
<blockquote><p><span style="color: #800000; font-size: x-small;">To achieve harmony in bad taste is the height of elegance.</span></p></blockquote>
<p><a href="http://www.arnoldit.com/sitemap.html" target="_blank">Stephen E. Arnold</a>, January 11, 2012</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search" target="_blank">Pandia.com</a>, publisher of <em>The New Landscape of Enterprise Search</em>, which does not include an analysis of Temis and the firm’s technologies, asserted to come from “the leading provider of semantic content enrichment solutions for the enterprise.” I just don’t believe this, but the outfit is good at spam.</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2012/01/11/temis-spammy-pr-and-quite-silly-assertions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

