Library of Congress Vows to Archive All Tweets

August 28, 2012

Andrew Phelps of Nieman Journalism Lab recently reported on a huge undertaking by the Library of Congress in the article “The Plan to Archive Every Tweet in the Library of Congress? Definitely Still Happening.”

According to the article, back in 2010 the Library of Congress announced its plan to preserve every public tweet for future generations. Little did it know at the time that the volume would reach 400 million public tweets a day, a number that continues to grow. However, when Canada.com recently reported that the “LOC is quietly backing out of the commitment,” an LOC spokesperson replied that the project is very much still happening.

Library spokesperson Jennifer Gavin said:

“The process of how to serve it out to researchers is still being worked out, but we’re getting a lot of closer,” Gavin told me. “I couldn’t give you a date specific of when we’ll be ready to make the announcement…We began receiving the material, portions of it, last year. We got that system down. Now we’re getting it almost daily. And of course, as I think is obvious to anyone who follows Twitter, it has ended up being a very large amount of material.”

Since the project is definitely under way, the real challenge is how this unstructured data will be organized and made searchable. I’m interested to see what they figure out.
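
How one might make a flood of tweets searchable is, of course, an open question for the Library. As a purely illustrative sketch, here is a tiny inverted index in Python; the toy tweets and the simple AND-style search are invented for the example and say nothing about the Library’s actual architecture.

from collections import defaultdict

# Toy tweets; the real archive receives hundreds of millions per day.
tweets = [
    {"id": 1, "text": "The Library of Congress archives tweets"},
    {"id": 2, "text": "Big data makes search hard"},
    {"id": 3, "text": "Archives need search tools"},
]

def tokenize(text):
    # Lowercase and split on whitespace; a real system would normalize far more.
    return text.lower().split()

# Inverted index: term -> set of tweet ids containing that term.
index = defaultdict(set)
for tweet in tweets:
    for term in tokenize(tweet["text"]):
        index[term].add(tweet["id"])

def search(query):
    # Return ids of tweets containing every query term (simple AND search).
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(search("archives search"))  # prints {3}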

Jasmine Ashton, August 28, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

More on Exalead Please

August 27, 2012

On the Q2 Dassault conference call, there was a brief mention of one of our favorite companies, Exalead. Seeking Alpha serves up the conversation in “Dassault Systemes’ CEO Discusses Q2 2012 Results (Afternoon Call) - Earnings Call Transcript.”

We have long been interested in Exalead, and applauded Dassault’s decision to purchase the business and invest resources in expanding it, rather than simply licensing its technology. So, how have things been going? When Dassault president and CEO Bernard Charles was asked about any general plans to provide a lifecycle management solution, he noted in part:

“. . . there is one thing we are doing in a completely different way maybe you have heard about it so I want to connect this to that point. We are now providing extremely innovative spare part management systems which are based on completely revolutionary platform using EXALEAD which has proven to provide amazing results that are very different from traditional implementation of spare parts systems potential available or proximities to talk about it. Jay? Next question?”

Wait, next question? But we want to know more! Oh, well. Not much discussion about Exalead, I’m afraid. Perhaps next quarter.

Exalead was founded in 2000 and purchased by engineering powerhouse Dassault in 2010. Exalead’s CloudView platform is uniquely capable of seamlessly integrating structured and unstructured data. We find their approach to be stable, offering platform flexibility, mobile search, and mash-ups. Oh, and their solutions are more affordable than much of the competition.

Cynthia Murrell, August 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Atigeo Releases Medical Research Search Application

August 27, 2012

A big data analytics company, Atigeo, is making strides in the health care field with a new application that searches the federal database to produce more relevant medical data search results.

Atigeo has launched PubMed Explorer, which allows users to search the National Institutes of Health’s PubMed database and presents the results of medical studies, based on context, in a graphical display.

A story on eWeek.com, “Atigeo Launches Big Data Semantic Search Tool Using NIH PubMed,” tells us more about the product. We learn that the product uses Atigeo’s xPatterns big data semantic search platform in the cloud to fine-tune search results as it learns users’ search patterns. This makes for quicker medical research. The article states:

“‘Our goal is to provide medical researchers with the appropriate tools to shorten research cycles, enable breakthroughs and ultimately improve our health,’ Michael Sandoval, chairman and CEO of Atigeo, said in a statement.

PubMed Explorer acts as a domain expert in which an algorithm extracts relevant terms from research studies or clinical EHRs and generates a graph of connections between the documents and discovered data, Burgess explained.”

Over time, the tool can learn the context of searches, and the way a query relates to the data can change. A demonstration is available here. These capabilities can help reduce errors in the health care field and facilitate better and faster research.
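
To give a rough sense of the “graph of connections” idea mentioned in the quote, the following sketch links documents that share extracted terms. The abstracts, stopword list, and naive term extractor are invented for illustration; this is not a description of Atigeo’s xPatterns implementation.

from collections import defaultdict
from itertools import combinations

# Invented abstracts standing in for PubMed records.
documents = {
    "doc1": "statin therapy reduces cholesterol and cardiovascular risk",
    "doc2": "cholesterol levels respond to statin dosage in trials",
    "doc3": "exercise lowers cardiovascular risk in older adults",
}

STOPWORDS = {"and", "in", "to", "the", "of"}

def extract_terms(text):
    # Naive term extraction: lowercase tokens minus stopwords.
    return {t for t in text.lower().split() if t not in STOPWORDS}

terms_by_doc = {doc_id: extract_terms(text) for doc_id, text in documents.items()}

# Edge weight = number of shared terms between two documents.
edges = defaultdict(int)
for (a, terms_a), (b, terms_b) in combinations(terms_by_doc.items(), 2):
    shared = terms_a & terms_b
    if shared:
        edges[(a, b)] = len(shared)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight} shared term(s)")
# doc1 -- doc2: 2 shared term(s)
# doc1 -- doc3: 2 shared term(s)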

Andrea Hayden, August 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Inmagic Releases Presto with Web Publishing Capabilities

August 27, 2012

A new product is available from Inmagic that will enable many advanced Web-publishing capabilities for current DB/Text users.

The product, Presto for DB/Text, was created by the company to work with its current full-featured, Web-based library management system and will enable new Web-publishing abilities while still allowing textbases to be created and maintained in DB/Text. According to a post on the Inmagic Inc. blog titled “Announcing ‘Presto for DB/Text’,” capabilities include the ability to easily search across all textbases at once and display the results in one view, as well as integration of social features. We learn:

“Presto for DB/Text has been designed for customers that require advanced web-publishing capabilities without the need for custom programming, which is often necessary when using WebPublisher PRO.  Presto for DB/Text does not replace WebPublisher PRO, however — WebPublisher PRO will continue to be enhanced and supported.  Presto for DB/Text just gives DB/Text customers an additional option for publishing information to the Internet or their intranet.”

Additional (albeit optional and at a cost to the customer) features include SharePoint integration, federated search, and the ability to add and create native Presto databases/content types. We are interested to see more from the company and are excited about these available features.

Andrea Hayden, August 27, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Linguamatics and the US FDA

August 24, 2012

Linguamatics recently announced that the FDA’s Center for Drug Evaluation and Research (CDER) is set to use its I2E platform, the company’s interactive data mining and extraction software, across CDER’s laboratory research relating to drug safety.

The write-up “Linguamatics’ I2E Text Mining Platform Chosen by FDA” provides more details about why the text mining company was selected by the CDER:

“I2E’s NLP-based querying capabilities, coupled with its scalability and flexibility, mean it is ideally suited to answering many challenging, high value questions in life sciences and healthcare by unlocking knowledge buried in the scientific literature and other textual information. Rather than just retrieving documents, I2E can rapidly identify, extract, synthesize and analyze specific, relevant facts and relationships, such as those between genes and diseases or compounds and side effects. Customers include nine of the top ten global pharmaceutical companies.”

What’s great about the I2E platform is that, unlike other text mining systems, it gives businesses full control over what information is extracted, how queries are defined, and what form the output takes. With it, users can obtain information quickly, even from large documents.
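
To make the “facts and relationships” idea concrete, here is a minimal, hypothetical sketch of pattern-based relationship extraction. The gene and disease lists, the sentences, and the single regular-expression pattern are all invented for illustration; Linguamatics’ actual NLP pipeline is far more sophisticated.

import re

# Invented entity lists; a real system would draw on curated ontologies.
GENES = ["BRCA1", "TP53"]
DISEASES = ["breast cancer", "lung cancer"]

sentences = [
    "Mutations in BRCA1 are associated with breast cancer.",
    "TP53 has been linked to lung cancer in several studies.",
]

# Loose pattern: <gene> ... "associated with" or "linked to" ... <disease>
pattern = re.compile(
    r"(?P<gene>\b(?:%s)\b).*?(?:associated with|linked to)\s+(?P<disease>%s)"
    % ("|".join(GENES), "|".join(DISEASES)),
    re.IGNORECASE,
)

for sentence in sentences:
    match = pattern.search(sentence)
    if match:
        print(match.group("gene"), "->", match.group("disease"))
# BRCA1 -> breast cancer
# TP53 -> lung cancer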

Again, another company from the science sector has opted to use Linguamatics’ I2E platform. CDER joins Pfizer, Selventa, AstraZeneca, and others from the company’s roster of prestigious clients. Linguamatics has truly evolved from being a small player to being the industry leader in NLP-based text mining within just a few years. We’re excited to see what the company will become two to three years from now.

Lauren Llamanzares, August 24, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Not All Subscribe to Big Data Hype

August 21, 2012

Anyone who’s been paying attention knows that big data is everyone’s next big thing, right? Not so fast. Computerworld enlightens us, declaring that “Most Firms Have No Big Data Plans, Survey Finds.” Writer Lucas Mearian points to the latest iteration of an annual survey from market-research outfit TheInfoPro. Each year, its Technology Heat Index Survey polls hundreds of IT pros about their plans. This year’s results show that 56% of the 255 respondents see no big data analytics in their futures. The write up reports:

“Survey respondents with no plans to roll out Hadoop or other big data analytics software said doing so requires a specific business case, and in most instances they didn’t see a need for it, according to Marco Coulter, managing director of TheInfoPro’s Cloud Computing Practice. . . .

“Coulter said those companies rolling out big data analytics tend to be in the financial services and healthcare arena, where great amounts of data can be boiled down to reveal trends and best practices.”

You mean people are actually examining their needs before jumping on a bandwagon? Imagine that.

The article shares several more survey findings; check it out for full details. One example: server virtualization was (again) found to be the leading driver of capacity growth, and 67% of those surveyed revealed that 80%-100% of their production servers connect to a Fibre Channel storage area network (SAN). The survey also found a leap in organizations planning to deploy solid state drive (SSD) technology, from just 7% last year to 37% this time around.

Not surprisingly, many respondents are having to make do with tightening budgets. Some things are nigh universal, I suppose.

Cynthia Murrell, August 21, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

XML Exhausting, Possibly Too Complex to Last

August 19, 2012

A post on DevXtra Editors’ Blog, “Is XML Too Big? Does Anyone Care?,” poses an interesting sentiment on the size and possibilities of XML.

XML, or the Extensible Markup Language, is too big and can be quite complex depending on the size and purpose of the documents. Syntactic analysis of XML documents is time-consuming and difficult, not only for the people completing the task but also for the CPU. The World Wide Web Consortium says that XML “is a simple, very flexible text format.”

The blog post disagrees, stating:

“[…]it’s actually more difficult to parse a large document than to create one. If an XML document is damaged or malformed, software can become very confused, and often, even trivial errors or corruption in the XML document can stop processing. Working with schema extensions can be difficult, and older documents written using DTDs (Document Type Definitions) and Document Object Models (DOMs) can be incomprehensible.”

We think the better question is: “Will people care about XML in two years?” Currently, XML is crucial for exchanging data and documents, but will the complexity of the format make it an untenable solution? Validation is difficult when it requires such extensive resources. A simplified system is surely, hopefully, on the way.
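
The fragility the post describes is easy to demonstrate. Below is a minimal sketch using Python’s standard library; the sample document is invented, and the point is simply that a single malformed tag halts parsing of the entire document.

import xml.etree.ElementTree as ET

# A trivially malformed document: the closing tag does not match.
malformed = "<catalog><book><title>XML in Practice</titel></book></catalog>"

try:
    root = ET.fromstring(malformed)
    print("Parsed", len(root), "child element(s)")
except ET.ParseError as err:
    # Even a one-character error stops processing of the whole document.
    print("Parse failed:", err)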

Andrea Hayden, August 19, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Expert System Client Wins Web Site Award

August 18, 2012

In another “we won a prize” announcement from a search and content processing vendor, Expert System boasts, “Expert System Customer Telecom Italia Recognized for Top Website.” Telecom Italia’s site, which uses Expert System’s Cogito semantic technology, was named the top corporate site by KWD Webranking in its Europe 500 annual survey.

Naturally, Expert System takes the opportunity to highlight the newest Cogito features that helped Telecom Italia build a great site. The write up lists:

  • “Did you mean?”: Cogito’s ability to understand the meaning of words facilitates greater access to information, even in the case of ambiguous requests. This feature suggests alternative formulations for search queries that contain errors or misspellings (see the generic sketch after this list).
  • Categorization: Expert System developed a custom taxonomy to categorize the Telecom Italia knowledge base, which enables more effective search and navigation of site content.
  • Multilanguage results: In addition to search results in Italian, the search engine broadens results by including a separate set of results in English for each query.
  • Results filtering by file type: Users can choose to refine results by the type of content they’re looking for, such as web pages, videos, or PDFs.
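
The “Did you mean?” behavior in the first bullet can be approximated, in a very generic way, with an edit-distance check against a vocabulary of known terms. The sketch below uses Python’s difflib and an invented vocabulary; it illustrates the idea only and is not Cogito’s semantic approach.

import difflib

# Invented vocabulary; a real site would draw on its own index and taxonomy.
known_terms = ["broadband", "roaming", "invoice", "fiber", "mobile"]

def did_you_mean(query_term, vocabulary, cutoff=0.7):
    # Suggest the closest known term, or None if nothing is similar enough.
    matches = difflib.get_close_matches(query_term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("braodband", known_terms))  # broadband
print(did_you_mean("qwerty", known_terms))     # None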

All valuable features, to be sure. We find this crowing about prizes to be an interesting approach to marketing. Effective? Not sure.

Based in Modena, Italy, Expert System has satellite offices in Europe and the US. Business and government organizations in several fields use their solutions for data management, collaboration, and customer relationship management.

Cynthia Murrell, August 18, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Algebraix Vaunts Speedy Results

August 16, 2012

It looks like Algebraix is calling out competitor Revelytix, which is specifically mentioned as having been trounced in Sys-Con Media’s “Algebraix Data Announces Record-Breaking Semantic Benchmark Performance.” Algebraix boasts that, for 80 percent of queries, its benchmark test outperformed Revelytix’s best published results twelvefold. Yep, a dozen times faster. That does seem pretty fast.

The benchmark test was performed on Algebraix’s SPARQL Server RDF database using an Amazon Cloud EC2 Large hardware configuration. This setup is identical, the write up states, to the one used in the multivendor SP2Bench performance comparisons that Revelytix had published. The press release crows:

“Furthermore, Algebraix Data’s SPARQL Server is the only database to have executed all of SP2Bench queries, including all six of the queries that were not successfully executed within the Revelytix guidelines by other Resource Description Framework (RDF) databases. . . .

“‘The outstanding SPARQL Server performance is a direct result of the algebraic techniques enabled by our patented Algebraix technology,’ said Chris Piedmonte, co-founder and CTO of Algebraix Data.”
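
For readers unfamiliar with the query language being benchmarked, here is a small, self-contained SPARQL example using the rdflib Python library. The tiny in-memory graph and the query are invented for illustration; they are not drawn from the SP2Bench suite.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

# Build a tiny in-memory RDF graph of invented publication data.
g = Graph()
g.add((EX.article1, RDF.type, EX.Article))
g.add((EX.article1, EX.creator, Literal("A. Author")))
g.add((EX.article2, RDF.type, EX.Article))
g.add((EX.article2, EX.creator, Literal("B. Writer")))

# A simple SPARQL query: list each article with its creator.
query = """
PREFIX ex: <http://example.org/>
SELECT ?article ?creator
WHERE {
    ?article a ex:Article ;
             ex:creator ?creator .
}
"""

for row in g.query(query):
    print(row.article, row.creator)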

Algebraix Data is headquartered in Austin, TX. In 2004, the company was founded on the vision of real-time access to data, structured and unstructured, in a distributed, collaborative, and dynamic environment. Their technology has garnered seven US patents.

Revelytix boasts that their community-based knoodl.com is currently the most widely used ontology editing tool. The company was formed in 2006, and is based in Sparks, MD.

Cynthia Murrell, August 16, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Autonomy Big Data Solutions Highlighted

August 14, 2012

HP has put forth a new write up about HP Autonomy and Big Data, “Autonomy IDOL Big Data Solutions.” In our opinion, the pre-buy-out Autonomy had more marketing flair. Oh, well.

The article lists a couple of solutions based on HP’s Converged Cloud and Autonomy IDOL 10. The description elaborates:

“*IDOL Powered Hadoop: New capabilities for leveraging IDOL technology within Hadoop deployments.

“*Autonomy Optimost Clickstream Analytics: Groundbreaking solution that provides marketers with a single, consistent view of visits, conversions, and customer engagement across all channels.

“Together, these solutions enable businesses to discover new trends, opportunities, and risks, and accelerate revenue growth by understanding and acting on web clickstream, sentiment, and transactional data.”

Next, the write up lists the primary customer benefits of each solution. For IDOL-powered Hadoop, for example, it notes that the IDOL engine can be embedded in each Hadoop node, and that IDOL’s 400 connectors enable the combination of Hadoop data with other enterprise and external data.

Autonomy Optimost lets marketers perform complex queries on complete datasets and in real time. Users can also blend clickstream data with human information and application data. The application is integrated with the Autonomy Promote suite.

Autonomy, originally founded in 1996, was snatched up by HP in 2011. They take pride in building tools that efficiently extract meaning from unwieldy tangles of unstructured data. The technology grew from research originally performed at Cambridge University.

Cynthia Murrell, August 14, 2012

Sponsored by ArnoldIT.com, developer of Augmentext
