Hakia Helps Pick Stock Winners
January 13, 2011
SENSEnews.com by Hakia uses advanced semantic algorithms to analyze news and social media coverage to determine which stocks are overvalued or undervalued. In this way, the service can offer a “best-stock-picks” portfolio for the market.
Does it work? Here’s what we noted:
Over a 6-month period, the SENSEnews stock indicator has consistently produced higher returns than those of the DJIA and S&P 500 indices.
If this sounds tempting, the service is currently available to a limited number of participants for a monthly subscription and will be available to enterprises later this year.
Alice Wasielewski, January 11, 2011
Search and the Responsive Web
January 13, 2011
I hate the term UX, shorthand for user experience. “Responsive Web Design: What It Is and How To Use It” introduced me to a new term, “responsive Web design.” I like it. The article explains what responsive Web design is. The passage I noted was:
We should rather start a new era today: creating websites that are future-ready right now. Understanding how to make a design responsive to the user doesn’t require too much learning, and it can definitely be a lot less stressful and more productive than learning how to design and code properly for every single device available. Responsive Web design and the techniques discussed above are not the final answer to the ever-changing mobile world. Responsive Web design is a mere concept that when implemented correctly can improve the user experience, but not completely solve it for every user, device and platform.
The article includes a number of excellent examples and some of those very useful, ready to edit code snippets that the goslings and I love.
What can search vendors learn from this write up? In my opinion, vendors can learn how to break out of the search box. Times and user needs have changed. It’s not experience. It is responsiveness.
Stephen E Arnold, January 13, 2011
Freebie
Old-New Method from BackType
January 13, 2011
There is considerable interest in real time and big data. One question we hear is, “How does the infrastructure deliver the throughput?” Answering this question can be difficult. We found “Secrets of BackType’s Data Engineers” quite useful. The tips and approach may not be right for some organizations, but the information about software and “plumbing” is a quick introduction to one company’s approach.
For me the most striking segment of the write up was:
They experimented with writing out the data to a Cassandra cluster, but ran into performance issues. What they ended up creating instead was a system they call ElephantDB. It takes all the data from a batch job, splits it up into shards, each of which is written out to disk as BerkeleyDB-format files. After that they fire up an ElephantDB cluster to serve the shards. Unlike many traditional databases, it’s read-only, so to update data served from the batch layer you create a new set of shards.
So that’s how the heavy processing is done, but what about instant updates? The speed layer exists to compensate for the high latency of the batch layer. It is completely transient and because the batch layer is constantly running it only needs to worry about new data. The speed layer can often make aggressive trade-offs for performance because the batch layer will later extract deep insights and run tougher computations. It takes the data that came in after the last batch processing job and applies fast running algorithms. Because the Hadoop processing is run once or twice a day, the fast layer only has to keep track of a few hours of data to produce its results. The smaller volume makes it easy to use database technologies like MySQL, Tokyo Tyrant and Cassandra in the speed layer.
Crawlers put new data on Gearman queues and workers process and write to a database. When the API is called, a thin layer of code queries both the speed layer database and the batch ElephantDB system, and merges the information from both to produce the final output that’s shown to the outside world.
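The passage is dense, so here is a minimal sketch in Python of the batch/speed pattern it describes: a read-only batch view rebuilt periodically, a transient speed layer that tracks only data newer than the last batch run, and a query path that merges the two. This is an illustration of the general idea, not BackType’s code; the function names and data are hypothetical.

# Minimal sketch (hypothetical names and data) of a batch layer plus speed layer.
from collections import defaultdict

def run_batch_job(all_events):
    """Heavy, infrequent job: recompute counts over the full history.
    The result is treated as read-only, analogous to a set of ElephantDB shards."""
    view = defaultdict(int)
    for url, mentions in all_events:
        view[url] += mentions
    return dict(view)  # frozen batch view

class SpeedLayer:
    """Fast, transient layer that tracks only events newer than the last batch run."""
    def __init__(self):
        self.recent = defaultdict(int)

    def ingest(self, url, mentions):
        self.recent[url] += mentions

    def reset(self):
        # When a fresh batch view is published, the old recent data is dropped.
        self.recent.clear()

def query(url, batch_view, speed_layer):
    """API call: merge the batch view with the speed layer to produce the final answer."""
    return batch_view.get(url, 0) + speed_layer.recent.get(url, 0)

# Usage: a batch run over history, then live updates layered on top.
history = [("example.com/a", 10), ("example.com/b", 3), ("example.com/a", 2)]
batch_view = run_batch_job(history)

speed = SpeedLayer()
speed.ingest("example.com/a", 1)  # arrived after the batch job

print(query("example.com/a", batch_view, speed))  # 13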
The combination of time-proven methods with some of the newer engineering ideas is quite suggestive. A mix of methods can provide the building blocks for a reliable, high-performance system. Useful article.
Stephen E Arnold, January 13, 2011
Freebie
2007 Semantic Search Info Still Relevant
January 13, 2011
Short honk. We had a long call today (January 12, 2011) about semantic search. In the course of the call, I mentioned a 2007 presentation by Jon Atle Gulla, a professor at the Norwegian University of Science and Technology. I did some poking around and found the link to the presentation. Quite useful in 2007 and still germane today. The presentation puts into context some of the work that must be done to deploy an effective semantic technology system in an organization. The slide deck is on Slideshare at this link. Registration may be required to access the file.
Stephen E Arnold, January 13, 2011
Freebie
Finding Needles in OmniFind XML Haystacks
January 13, 2011
Yes, you can use IBM DB2 for XML searches. “CI Can . . . Search XML in OmniFind V1R2” gives examples of the XML search in IBM DB2 OmniFind V1R2. To sum up: “XML search allows a search to be scoped to a specific element or attribute, rather than the entire document. In addition, the search syntax allows comparing an element or attribute value to a numeric, ISO date or ISO dateTime value during the search.” It also supports XML namespaces on element and attribute names. No need to break out the metal detector, the needles will pop right out of those haystacks.
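To make the idea of scoped XML search concrete, here is a minimal sketch in plain Python (standard library only). It shows the concept of restricting a query to a named element and comparing its value to a number or an ISO date; it is a conceptual analogue, not OmniFind’s query syntax, and the document, element names, and helper function are hypothetical.

# Hypothetical illustration of scoped XML search; not OmniFind syntax.
import xml.etree.ElementTree as ET
from datetime import date

doc = """
<invoice>
  <customer>Acme Corp</customer>
  <total>1250.00</total>
  <issued>2011-01-05</issued>
</invoice>
"""

root = ET.fromstring(doc)

def element_matches(root, tag, predicate):
    """Return True if any <tag> element's text satisfies the predicate."""
    return any(predicate(el.text) for el in root.iter(tag) if el.text)

# Scope the search to <total> and compare numerically.
over_1000 = element_matches(root, "total", lambda t: float(t) > 1000)

# Scope the search to <issued> and compare against an ISO date.
recent = element_matches(root, "issued",
                         lambda t: date.fromisoformat(t) >= date(2011, 1, 1))

print(over_1000, recent)  # True True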
Alice Wasielewski, January 11, 2011
kCura Relativity 6.9
January 12, 2011
kCura has been gaining traction in the eDiscovery market over the last 12 months. When we last updated our eDiscovery content processing report, kCura was showing some moxie. We learned this week that kCura is providing a pre-release version of Relativity 6.9, the company’s core product. The new version of Relativity adds some important enhancements, including:
- Better optical character recognition integration
- An updated deployment system
- Speed ups in hit highlighting
- Modifications to the workflow engine
These features complement the analytics, clustering, and batch processing features. If you are not familiar with the eDiscovery space and this company, here’s how kCura explains its positioning:
kCura are the developers of the e-discovery software Relativity. Relativity is a web-based application servicing the analysis, review and production stages of the EDRM. kCura helps corporations and law firms with e-discovery challenges by installing Relativity on-premises, as well as providing hosted on-demand solutions through a global network of partners in Asia, Australia, Europe, and North America.
You can get some basic information from the firm’s Web site, www.kcura.com. The company, like other legal-oriented vendors, maintains a low profile. The company is based in Chicago, Illinois.
Stephen E Arnold, January 12, 2011
Freebie
IBM OmniFind Search Documentation
January 12, 2011
We fielded a call about the architecture of IBM’s portal and OmniFind search technologies. The new OmniFind may have hit the market, but the IBM online documentation carries the date of January 2008. We still think that anyone integrating various IBM bits and pieces for an Intranet will want to check out the “old” documentation.
Here are the types of information available and a hot link to each set of Web pages. When we last checked (January 11, 2011), the information was online and did not require an IBM account or password to access.
- General information.
- Integration points. Important information.
- Installation procedure. Not exhaustive but useful.
- Specific integration of the IBM portal and search systems.
- Some additional IBM resources.
Can you install various IBM components without reading the documentation? Sure, but you will have some backtracking to do in order to figure out dependencies. Even though the documents pointed to date from 2008, the approach and method are useful.
Stephen E Arnold, January 12, 2011
Freebie
SAP Embedded Search
January 12, 2011
On a call today (January 10, 2011), we fielded a question about TREX, the plumbing for SAP’s bespoke search and retrieval service. The question had to do with what SAP calls search. Well, we have had good luck locating information about TREX in the current SAP environment by searching for the string “embedded search.” For queries about TREX or earlier versions of R/3, you may have more luck with the phrase “search engine service.” Years ago, I did a chapter in a book about TREX. My recollection is that information in 2006 and 2007 was difficult to find as well.
If you want to know about TREX and its various incarnations, you might consider these links:
- Basic description of TREX
- Discussion of classification functions
- Connecting a local search with an SAP search hub
We continue to hear chatter about third party solutions that “snap in” to SAP R/3 environments. We also hear comments about SAP’s on again, off again interest in buying a search vendor. Our suggestion: stick with SAP’s engine. A good alternative may become available.
Stephen E Arnold, January 12, 2011
Freebie
Google Compound Documents
January 12, 2011
Short honk: Just when you thought you could search the content in a word processing document, life becomes more exciting. “You Can Play Videos in Google Docs Now” reported that “Google has introduced support for video playback.” The write up said:
Google’s YouTube remains the destination of choice for anyone wanting to share a video with the world. But you can host and share videos on other Google products, Picasa expanded support, also via the YouTube player, last year and Docs now followed.
No word about one’s ability to search for a word or phrase within an embedded video in a Google Document. This feature is available from Exalead in Paris, however. See this Exalead Labs description.
Stephen E Arnold, January 12, 2011
Freebie
NLP and MedlEE
January 12, 2011
NLP International Corp. and its MedlEE product have popped on and off our radar several times in the last two years. NLP offers natural language processing technology and services tailored to the needs of the health care market. MedlEE has roots that reach back to Columbia University. The firm says:
With its unique deployment model NLP International makes this world class solution available through our MedlEE Portal. The MedlEE Portal is a SaaS offering that has applications ranging from quality to semantic search and retrieval to computer assisted coding and meaningful use… The MedlEE Natural Language Processing engine codifies standard text documents for data extraction, thereby enabling Discrete Reportable Transcription (DRT). MedlEE was developed over the course of 20 years by Columbia University in New York and is a powerful patented NLP engine that automates analytics, reporting and alerting for various reports for billing and for meeting various requirements.
There appears to be interest in the use of NLP and semantic technology in various health care applications. StenTel Corp., one of NLP’s MedlEE partners, says:
MedlEE NLP turns unstructured, dictated medical narratives into easily retrievable accurate data to support multiple health care systems in the hospital to enhance patient safety, quality assurance, diagnosis and prognosis support, billing and reimbursement administration. The physician is not required to change work habits.
What we found interesting is that NLP International has gone “silent”. Is an acquisition or other deal in the works? Is there another reason for the low profile?
We will monitor the situation.
Stephen E Arnold, January 12, 2011
Freebie