RegEx for Those Who Hunger for Regular Expressions
August 13, 2011
Here’s a useful description for those who need to feed their inner geek. How-To Geek describes “How To Use Basic Regular Expressions to Search Better and Save Time.” First, the Geek defines regular expressions:
Regular expressions are statements formatted in a very specific way and that can stand for many different results. Also known as “regex” or “regexp,” they are primarily used in search and file naming functions. One regex can be used like a formula to create a number of different possible outputs, all of which are searched for. Alternatively, you can specify how a group of files should be named by specifying a regex, and your software can incrementally move to the next intended output.
It’s important to note that regular expressions won’t work unless your software understands them.
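To make the "one regex, many results" idea concrete, here is a minimal sketch in Python, one of many tools that understand regular expressions. The file names and the pattern are our own invention for illustration; they do not come from the How-To Geek article.

```python
import re

# Hypothetical file names, invented for this illustration.
filenames = ["report_2010.pdf", "report_2011.pdf", "notes.txt", "report_final.pdf"]

# One pattern stands for many possible names:
# "report_", then exactly four digits, then ".pdf".
pattern = re.compile(r"report_\d{4}\.pdf")

# Keep only the names that match the whole pattern.
matches = [name for name in filenames if pattern.fullmatch(name)]
print(matches)  # ['report_2010.pdf', 'report_2011.pdf']
```

The single pattern picks out every year-stamped report and skips the rest, which is the time saving the article is talking about.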
The article gives clear and specific instructions and examples on getting started. It also provides links to three other resources you can turn to when you’re ready to move beyond the basics.
Cynthia Murrell, August 13, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Open Source Tensions on the Rise
August 12, 2011
Commercial companies and open source strike sparks in Switzerland. The H Open reveals, “Swiss Proprietary Companies Block Government Open Source Release.” The article explains,
In 2007, the Swiss federal court began development of its own internal document management system, OpenJustitia, designed to make it more efficient to search through court decisions. In 2009, the court’s IT department announced it would release the system as open source under the GPLv3. This summer, it was expected that OpenJustitia would be released to allow other courts to make use of it.
The plan has hit a snag, however. Software companies, such as Abraxas, Delta Logic, Weblaw, and Eurospider, have requested the release be delayed. They claim that the government is interfering in the market, essentially becoming a competitor to their businesses.
For its part, the government notes that all these companies have equal free access to this code and can use it in their own applications.
The Swiss Parliament and the Public Accounts Committee, which oversees the federal court, will decide where to go from here.
Cynthia Murrell, August 12, 2011
ReVerb: The Whole Language Movement
August 12, 2011
ReVerb, a new search method, presents an optimistic future for search engines and intelligence levels. If this is what Web search engines will look like in ten years, ReVerb’s creators should hope that the whole language movement doesn’t make a comeback in schools. The program automatically identifies and extracts binary relationships from English sentences, but it requires users to input an “argument” and a “predicate,” which means users must know the basic parts of a sentence.
ReVerb was created by the University of Washington’s Turing Center as part of the KnowItAll project, and 15 million ReVerb extractions are currently available for academic use. The program has blown similar ones out of the water.
The paper, entitled “Identifying Relations for Open Information Extraction,” asserts the following:
“[ReVerb] more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE-pos. More than 30% of ReVerb’s extractions are at precision 0.8 or higher— compared to virtually none for earlier systems.”
The creators are confident that ReVerb will be useful for queries where target relations cannot be specified in advance and speed is important. Currently, there is a demo available.
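For readers wondering what an “argument” and “predicate” query over binary relationships might look like, here is a toy sketch in Python. It is not ReVerb’s code or its data: the `Extraction` type, the `query` function, and the three sample relationships are hypothetical and exist only to show the shape of the idea.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    """A binary relationship: (argument, predicate, argument)."""
    arg1: str
    predicate: str
    arg2: str


# Hypothetical sample extractions, invented for illustration (not ReVerb output).
extractions = [
    Extraction("Edison", "invented", "the phonograph"),
    Extraction("Bell", "invented", "the telephone"),
    Extraction("Edison", "was born in", "Ohio"),
]


def query(arg1: str, predicate: str) -> list[str]:
    """Return the second arguments of extractions matching arg1 and predicate."""
    return [
        e.arg2
        for e in extractions
        if e.arg1.lower() == arg1.lower() and e.predicate.lower() == predicate.lower()
    ]


print(query("Edison", "invented"))  # ['the phonograph']
```

The point of the sketch is only that the user has to supply both pieces, the subject and the verb phrase, before anything comes back.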
Is this the next big thing in search or another public relations push? Will this generate sympathetic vibrations within the Google?
Megan Feil, August 11, 2011
Free Program Removes DRM Controls from PDFs
August 12, 2011
We’ve found a tool that is, perhaps, a bit concerning. Softpedia presents, for free, PDF Drm Removal 1.4.2.0. The developer of the software is listed as Removedrmfromepub.com. The product description reads,
PDF Drm Removal is a professional and reliable application designed to remove DRM protections from PDF files with no quality loss. Just removes the PDF files drm header, no change on the files. Read the PDF on any supported devices!
Interesting and somewhat concerning. We understand there’s controversy over Digital Rights Management controls; some say that they stifle innovation or violate private property rights. Others say the technology unnecessarily locks documents into a format that is bound to become obsolete someday.
However, we necessarily sympathize with publishers and writers, like Stephen E. Arnold, who rely on PDF security to safeguard documents. How else will they protect their work in the digital age?
Cynthia Murrell, August 11, 2011
A Google Plus Index Transition?
August 12, 2011
A few weeks after the Google Plus launch, Senior VP at Google Vic Gundotra is addressing feedback and criticism of the new social platform. Google Plus itself is a transition for Google, a move away from its trademark keyword indexing and toward trendy social tagging. In “Google Plus Is Being Changed This Week Based on User Feedback,” Google’s next move is discussed.
You may think Google could sit back and watch the Google Plus network grow, but that would be a mistake. The search company has realized it can’t just watch what happens, it needs to respond to users quickly in order to keep them happy and the network growing. While the general view of Google Plus is a positive one, there’s also a lot of criticism and user feedback of which Google is about to tackle.
Google is no doubt remembering failed ventures like Buzz and Wave while striving to make Google Plus a lasting service. Another possible motivation is worth considering. Does Google see the end of the era of indexing? With social media placing more and more importance on social meaning within a given context, perhaps tagging is becoming more relevant than keyword indexing. If this is indeed the case, Google no doubt hopes to ensure its dominance for the next generation through Google Plus.
Emily Rae Aldridge, August 11, 2011
Protected: Going Retro: Add MySpace Features to SharePoint
August 12, 2011
Search and Patent Research
August 11, 2011
We noted an article in PatentPoints.com called “Free USPTO Patent Searches Best Left to Professionals.” The article makes a point that is often ignored by those who believe their ability to search a free online Web index equips them to handle more complex types of online queries. The article reflects the viewpoint of professionals trained to handle such commercial services as Lexis and Westlaw. The write up points to a service which we have found to be quite useful when investigating patents or intellectual property. The article asserts:
We understand that the majority of inventors seeking a patent are not covered under the large umbrella of million dollar investors and mega-companies. That is why we recommend Article One to our readers. Article One Partners provide low-cost, fast patent searches for their clients. Unlike traditional patent search companies, AOP utilize crowd-sourcing to guarantee their clients a worldwide, thorough search.
An example of the escalating “heat” in the intellectual property kitchen appears in “Patent Wars Heat Up as Google Courts InterDigital.” The story asserts that Google “recently held talks about buying InterDigital,” a company which produces wireless technology. The Forbes article emphasizes that “InterDigital is crucial for Google’s lagging patent portfolio.” The most telling observation in the write up, in my opinion, was:
Although Google indicates that these lawsuits act as a threat to innovation, Google is really looking to acquire more patents to give itself for strategic fire power since it currently has just about 730 U.S. patents to its name compared to Apple’s 4,000 and Microsoft’s 18,000.
What’s clear is that a patent arms race is now underway. Instead of MBA thinking, some companies are likely to be influenced by Herman Kahn’s and Evan Jones’s On Thermonuclear War.
My view is that many professionals overestimate their ability to conduct online research. In a focus group conducted with young librarians, three surprising insights emerged from a one-hour discussion of online searching expertise.
First, some of the people trained as librarians often avoid commercial services due to cost and complexity, preferring to use the Web services available without charge. One example given was Google’s patent search service. One person in the focus group pointed out that some librarians may not be aware that the corpus does not include applications and other matter available from commercial services.
Second, there was a perception that an increasing number of reference desk visitors and library patrons believed that they could locate the information via a service like Bing or Exalead’s excellent Web search, which I use for certain types of professional content. The problem is that casual searchers lack expertise in certain complex content domains like chemistry or patents. Furthermore, there was, according to the focus group, no awareness of services such as Derwent or Questel, both of high value for certain types of patent research.
Third, even among the focus group members, there was only basic knowledge about the conventions and coding used for patent searches. The numbering methods, the challenge of figures, and the importance of claims were not covered in depth in certain information service training programs. This is an issue with the library training programs themselves.
The bottom line is that when an important patent research project surfaces, it is a good idea to turn to professionals. We have been impressed with the Article One Partners’ approach to patent research and litigation support. That is one path you may want to consider.
Stephen E Arnold, August 11, 2011
Freebie
The App Approach: A Dead End?
August 11, 2011
From the amazing statements department. Flash. Venture Beat’s “Nokia Exec: Android and iPhone Focus on the App Is ‘Outdated’” caught my attention. For this write up, let’s assume the fellow is dead wrong. I am okay with headlines written for Bing and Google indexing subsystems. I am also okay with wild and crazy statements from cash strapped azure chip consultants, search vendors worried about making the next payment on the CEO’s company car, and unemployed English majors explaining that they are really social media experts. In an economic depression, words are worthless. When one has nothing to lose, is the approach “Go for broke”?
The assertion reported by Venture Beat’s Mobile Beat online publication was quite interesting. First, Nokia is not hitting any financial home runs. Say what you will about Apple, the outfit has a nifty balance sheet. Even the Google, which is a giant ad system, is able to “give away” a mobile operating system and make big waves. One example is the factoid that hundreds of thousands of Android-based devices are sold every time I check the weather in Harrod’s Creek.
A happy quack to http://zekjevets.blogspot.com/2010/02/alternative-racism.html
Here’s the statement that snagged me. (This is a longer quote than we normally use, but I want to get the context right. Please, navigate to the Venture Beat original for the full story. Also, note that I have put in bold face the items upon which I wish to comment.)
Nokia’s future phones will merge the latest Microsoft Windows Phone software based on the Mango update (which Weber said has had great reviews) with Nokia’s hardware, which he said boasts reliability and phone call quality. Weber cited state-of-the-art imaging technology and battery performance as areas Nokia phones would excel in. Weber also said Nokia may beat competitors on pricing, thanks to the company’s significant global reach, which gives it economies of scale. Moreover, Weber said the company will launch its superphone portfolio with a focus on U.S. market, because he said winning in the U.S. market is what it takes to win globally. He also confirmed that Nokia will back the launch with the company’s largest marketing effort to date, though wouldn’t go into specifics. Weber called Android and the iOS phone platforms “outdated.” While Apple’s iPhone, and its underlying iOS operating system, set the standard for a modern user interface with “pinch and zoom,” Weber conceded, it also forces people to download multiple applications which they then have to navigate between. There’s a lot of touching involved as you press icons or buttons to activate application features. Android essentially “commoditized” this approach, Weber said.
Whew. Let me do an addled goose style run down.
- Reliability and call quality. In my experience the phone is only part of the reliability and call quality equation. There are networks involved. I have worked throughout the world, and reliability and call quality have more to do with where I am than with the handset. In the Arctic Circle my Treo 650 worked like a champ. In the hollow near my pond, I can’t get a coherent squawk from my BlackBerry. So how’s Nokia going to fix this? Nokia can’t. Baloney.
- Imaging and battery performance. Whoa, horsey. Putting a better camera in a phone is a question of economics and technical tradeoffs. The battery issue is a big deal. As crazy as Research in Motion’s present management set up is, the company does have good battery technology as does Apple. Nokia? Better get that pony aimed at the battery corral is my advice.
Quote to Note: Newspapers and Android Tablets
August 11, 2011
I read two different news items which informed me that Apple has stopped the sale of Samsung tablets in countries far from Kentucky. Ominous if accurate. Then I came upon “Newspaper Eyes Creating Subsidized Samsung Tablet.” The idea seemed a bit of a reach, but, hey, those newspaper managers are sharp business professionals. I am not sure how many of my neighbors would put down their jug of firewater to browse a tablet, but one never knows until one tries. What caught my attention was a quote to note, allegedly made by “one insider”, a great source for sure. Here the quote perches:
“If it turns out to be a failure, it will be a fantastically interesting failure.” Another source commented: “I would be shocked if it was successful.”
There it is.
Stephen E Arnold, August 11, 2011
Are Text Analytics Companies Learning the Silicon Valley Way?
August 11, 2011
Seth Grimes, founding chair for the Text Analytics Summit, interviewed three experts in order to find out what it is that Silicon Valley and the world of text analytics have in common. The full interview, “What Can Text Analytics and Silicon Valley Learn From Each Other?” can be found at Text Analytics News.
Grimes reports, “Business markets are global, yet the Bay Area stands out as a source and consumer of innovative technologies and in particular, as a pace-setter for the online and social worlds. With the Text Analytics Summit coming to San Jose, I reached out to a few west-coasters who are making Valley text analytics news: Nitin Indurkhya, principal research scientist at eBay Research Labs; YY Lee, COO of FirstRain; and Michael Osofsky, co-founder and chief innovation officer at NetBase.”
Osofsky explains the balance between precision and recall in text analytics, and urges Silicon Valley to understand that time and energy should be devoted to experimenting to find a balance between the two principles. On the other hand, Silicon Valley’s fast and exciting nature could be a good influence on the text analytics world. Software can be launched, edited, and evolved quickly and risks can be taken. Absorbing a bit of that mentality could enable text analytics to be a little more innovative and adventurous.
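For readers new to the two terms, here is a minimal sketch in Python of the trade-off Osofsky describes. The document IDs and the two result sets are hypothetical numbers we made up; the formulas are the standard definitions (precision is the share of returned items that are correct, recall is the share of correct items that were returned).

```python
def precision_recall(predicted: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted items."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, invented for illustration.
relevant = {1, 2, 3, 4}           # what a perfect system would return
cautious = {1, 2}                 # few results, all of them correct
aggressive = {1, 2, 3, 5, 6, 7}   # more hits, but more noise as well

print(precision_recall(cautious, relevant))    # (1.0, 0.5)
print(precision_recall(aggressive, relevant))  # (0.5, 0.75)
```

The cautious system is never wrong but misses half of what matters; the aggressive one finds more but buries it in noise. Finding the balance between the two is the experimenting Osofsky is urging.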
Indurkhya encourages the text analytics world to adopt the Valley principle of “fail often and fail quickly.” In this way, he explains, innovation happens and failure does not bog down the overall momentum.
Lee encourages text analytics companies to focus separately on each of three equally important components: 1) input, 2) internal process, and 3) presentation. Each of the three falls broadly under the heading of text analytics, and yet Lee stresses that each must be treated independently during development.
Grimes concludes with his own collective thoughts on the three interviews.
The key takeaways that I see in these responses involve problem and product focus, agility, and the desirability of pulling and integrating information from multiple sources with the application of a variety of analytical techniques, in order to achieve technical and business goals. There’s no “Do X, Y, and Z” formula here, but there is definitely a sense of the rewards that are possible if text analytics is done right.
Out-of-the-box thinking is beneficial in any business arena, but especially those known more for rigidity than innovation.
Emily Rae Aldridge, August 11, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search. And our own Stephen E Arnold is speaking at this year’s November 2011 event.
The Text Analytics Summit has been a staple of the text analytics community for the past 7 years. To help this community grow, the Text Analytics Summit is finally coming to the west coast to foster new networking opportunities, promote more healthy knowledge sharing, and create strong, long-lasting business relationships. Text Analytics is essential for maximizing the customer experience, effectively monitoring the social media world, conducting first-class data analysis and research, and improving the business decision making process. Attend the summit to discover how to unlock the power of text analytics to leverage new and profitable business opportunities. Whether you’re interested in taking advantage of social media analytics, customer experience management, sentiment analysis, or Voice of the Customer, Text Analytics Summit West is the only place to get the inside information that you need to stay ahead of the competition and profit from text mining. For more information, click here.