Heresy: Google Books Less Useful Than Some Perceive

October 31, 2015

I love Google. Well, technically Google is now the Alphabet Google thing. You know the firm. Loon balloons, drones with solar cells, a lab working to kill death (poetic, right?).

I read a paper which some Alphabet Google thing lovers may perceive as heretical. In 14th century Spain going against the cultural flow could be hazardous to an author’s health. Here in Internetland, complaining about a Google service may not raise an eyebrow. Mention an academic Google service to a college sophomore, and the individual may panic and seek restoration with a 30 minute dip in the Facebook ocean.

I read “Is Google Books Leading Researchers Astray?” which is a sort of summary of information captured in academic-speak in a PLOS article with the spider friendly title “Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution” by several collaborating scholars.

The point of the two write ups is that using the text in Google Books for linguistic and other types of academic text analysis may not deliver the results one might hope for. The reason is that the text of the collection included in Google Books does not reflect what’s popular.

I circled this comment about the editorial policy of Google Books:

For Shore [a university English professor], then, the real issues begin with Google’s little-understood archival methodology. Its approach, he noted, has been to simply “scan it all.” Because of this, “It’s often difficult to figure out what you’re reading, where it came from, who published it.”

Editorial policy and Google? Do these concepts go together like ham and eggs, peanut butter and jelly, and Lucy and Desi?

Net net: If you hanker for a way to study cultural evolution, Google Books may not be the Grand Central Station for your research.

Again I ask, “Does Google have an editorial policy for any of its products except certain advertisers and entries on its alleged black list of forbidden words and phrases?”

Stephen E Arnold, October 31, 2015

Ravel, Harvard, and Indigestion for Lexis and Westlaw

October 31, 2015

If you are a lucky online maven with free Lexis and Westlaw access, you do not want to waste your time reading “Harvard Law School Launches “Free the Law” Project with Ravel Law To Digitize US Case Law, Provide Free Access.”

But if you pay hard cash to run queries on certain court documents, you may want to pay attention to the Ravel-Harvard plan to provide access to US case law.

Ravel wants to catch the attention of the big guns at Reed Elsevier and Thomson Reuters. I assume the executives at these companies are on top of the Ravel plan to unravel their money machines.

According to the Harvard write up:

Harvard Law School’s collection comprises 40,000 books containing approximately forty million pages of court decisions, including original materials from cases that predate the U.S. Constitution. It is the most comprehensive and authoritative database of American law and cases available anywhere except for the Library of Congress, containing binding judicial decisions from the federal government and each of the fifty states, from the founding of each respective jurisdiction. The Harvard Law School Library—the largest academic law library in the world—has been collecting these decisions over the past two hundred years.

Where there is legal information and the two leading for-fee legal online services, my hunch is that there will be some legal eagles taking flight.

According to Techdirt:

Harvard “owns” the resulting data (assuming what’s ownable), and while there are some initial restrictions that Ravel can put on the corpus of data, that goes away entirely after eight years, and can end earlier if Ravel “does not meet its obligations.” Beyond that, Harvard is making everything available to non-profits and researchers anyway. Ravel is apparently looking to make some money by providing advanced tools for sifting through the database, even if the content itself will be freely available.

What will the professional publishing outfits do to preserve their market? I can think of several actions. Sure, litigation is one route. But taking Harvard to court might generate some bad vibes. Perhaps Reed Elsevier and Thomson Reuters will finally bite the bullet, merge, and then buy out Ravel? We have Walgreen Boots, why not LexisWestlaw? Is that a scary Halloween thought? Let the Department of Justice unravel that deal. Don’t lawyers enjoy that sort of challenge?

Stephen E Arnold, October 31, 2015

X1 Asserts That It Sets the Standard for Enterprise Search

October 30, 2015

I read “A Series of Firsts: How X1 Sets the Standard for the New Enterprise Search Market.” I am interested in firsts. I like a series of firsts even more.

According to the write up the first first is:

X1 is ready for the IT reality of always-on, virtual, cloud, and hybrid environments and business mobility.  This is evidenced by two “firsts” that X1 is proud to announce.  First, X1 is the first search application with an app publicly available in an Enterprise Mobility Management (EMM) app store.

The second first is:

The second “first” that X1 is proud of is the listing of X1 Rapid Discovery in the Amazon AWS Marketplace.  Again, this is no small feat – this is the first enterprise-grade search and eDiscovery application to be available in the AWS Marketplace.

The article points out:

X1 will continue to provide solutions that work in the infrastructures that organizations utilize today.  The traditional approach to search will not work, but with X1, companies will have the flexibility to deploy into any environment and give users a powerful search experience on any device.  That is a powerful productivity tool – and businesses require worker productivity the same way humans require oxygen.  It is a new enterprise search market out there and X1 is uniquely positioned to lead the charge.

According to Crunchbase, X1 was founded in 2003 and has received $12.2 million in three rounds from several investors. If you search for the company, be aware that hits may point to the X1 platform from Xfinity, which is an interactive TV system. There is a company offering the X1 orgasmatron. I also spotted links to the Bell aircraft labeled X-1.

Vendors of search and content processing systems may want to attend to making certain that a query for “X1 search” returns links to their brand. I have noted that a number of search vendors have lost control of their “name”; for example, Thunderstone and Smartlogic, to name two.

To locate X1, the search and discovery outfit, navigate to www. Web searches can offer some first hand discovery excitement to the uninformed.

Stephen E Arnold, October 30, 2015

Google Hacks to Make You Grin

October 30, 2015

Google is run by a bunch of geeks who entertain themselves using the high tech toys at their fingertips. Beyond the insertion of Douglas Adams references in search results, there are other Google hacks that the tech geeks developed to make themselves and you smile. Digital Spy tracked down “Eleven Google Secrets That Will Change The Way You Search, From Playing Pac-Man To Lego Street View.”

“Day after day you hammer out search after search, overlooking not only the hidden gems lurking beneath the surface, but the very thing that makes Google such an anomaly amongst the world’s biggest companies – its sense of humor. Here are a few thinks you might not have known you can do in Google.”

Google can do numerous things when you type a few simple commands into the search bar. Try typing: “askew” or “tilt,” “do a barrel roll,” and “Zerg rush.” Google is also a time machine and can take you back to the 1998 Google interface, or you can spend hours playing Pac-Man on an uploaded Google Doodle from May 2010. The yellow stick figure on Google Street View also likes to play dress-up when he visits certain places.

But our absolute favorite is the six degrees of Kevin Bacon calculator. It is based on the old Internet meme that everyone in Hollywood is connected to Kevin Bacon in fewer than six degrees. Type in a famous person and “bacon number” to find out how close their careers are.
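
For the curious, a Bacon number is just the length of the shortest chain of shared credits between a performer and Kevin Bacon, which a breadth-first search can find. The sketch below is only an illustration and not how Google computes its answer: the function name `bacon_number` and the tiny co-star table of placeholder actors are invented for the example.

```python
from collections import deque

# Invented co-star table for illustration only; "Actor A" etc. are placeholders,
# not real filmography data.
COSTARS = {
    "Kevin Bacon": ["Actor A", "Actor B"],
    "Actor A": ["Kevin Bacon", "Actor C"],
    "Actor B": ["Kevin Bacon"],
    "Actor C": ["Actor A", "Actor D"],
    "Actor D": ["Actor C"],
    "Actor E": [],  # no shared credits with anyone in the table
}


def bacon_number(costars, start, target="Kevin Bacon"):
    """Return the fewest co-star hops from start to target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, hops = queue.popleft()
        for other in costars.get(actor, []):
            if other == target:
                return hops + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return None


print(bacon_number(COSTARS, "Actor D"))
```

Breadth-first search visits performers in order of distance, so the first time it reaches the target it has found the shortest chain. A performer with no path to the target gets `None` rather than a number.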

Little hacks and fun games like this show the human side to the Google empire.  What will they think of next?  However, it would be nice if Google added some practical functions, such as a time and date feature.

Whitney Grace, October 30, 2015

Sponsored by, publisher of the CyberOSINT monograph

Latest Global Internet Report Available

October 30, 2015

The Internet Society has made available its “Global Internet Report 2015,” just the second in its series. World-wide champions of a free and open Internet, the society examines mobile Internet usage patterns around the globe. The report’s Introduction explains:

“We focus this year’s report on the mobile Internet for two reasons. First, as with mobile telephony, the mobile Internet does not just liberate us from the constraints of a wired connection, but it offers hundreds of millions around the world their only, or primary, means of accessing the Internet. Second, the mobile Internet does not just extend the reach of the Internet as used on fixed connections, but it offers new functionality in combination with new portable access devices.”

It continues with this important warning:

“The nature of the Internet should remain collaborative and inclusive, regardless of changing means of access. In particular, the mobile Internet should remain open, to enable the permission-less innovation that has driven the continuous growth and evolution of the Internet to date, including the emergence of the mobile Internet itself.”

Through the report’s landing page, you can navigate to the Introduction cited above, the report’s Executive Summary, and Section 2: Trends and Growth. There is even an interactive mobile Internet timeline. Scroll to the bottom to download the full report, in PDF, Kindle, or ePub formats. The download is free, but those interested can donate to the organization.

Cynthia Murrell, October 30, 2015


Is Google Imitating IBM?

October 29, 2015

I find the stampede to machine learning, artificial intelligence, and smart software somewhat underwhelming. Making software and systems less like their mainframe antecedents has been an objective of programmers for decades.

What’s changed?

The need to come up with a buzzword, zippy notion, or catchy phrase for software that leaves the command line behind along with the dependence on the user to know what he or she wants.

I chuckled along with the PCWorld write up “Google Says It’s ‘Rethinking Everything’ around Machine Learning.” I noted this statement:

it’s not hard to imagine where it might turn up. He [Google wizard] mentioned machine learning in the context of mobile, for example, where machine learning could determine if a user is at work, at home or in their car, so that their phone can deliver information accordingly.

Yep, information along with ads. Eliminating expensive humans in the process may also be helpful. But I think the elephant in the write up is the need for the Alphabet Google thing to generate high margin revenue from advertising.

The article included this statement which may explain the “uncertain future” some Google wizards perceive:

Not all was rosy, though. Aggregate paid clicks—the number of times Google made money when a user clicked on an advertisement—increased 23 percent from a year earlier, but the amount of money Google received for each click dropped by 11 percent, continuing a downward trend from the past several quarters.
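
A quick back-of-the-envelope check puts those two percentages together. The sketch assumes, as a simplification not stated in the article, that paid click revenue is roughly the number of paid clicks multiplied by the average amount received per click. The function name is made up for the illustration.

```python
def combined_change(click_growth, per_click_change):
    """Combine two relative changes that multiply together."""
    return (1 + click_growth) * (1 + per_click_change) - 1


# Figures from the quoted passage: paid clicks up 23 percent,
# money received per click down 11 percent.
implied = combined_change(0.23, -0.11)
print(f"Implied change in paid click revenue: {implied:+.1%}")
```

Under that assumption, 1.23 times 0.89 is about 1.095, so revenue from paid clicks would still have grown by a bit more than nine percent even though each click paid less. Volume is doing the work that price is not.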

But what about Amazon, IBM, Microsoft, and other firms racing forward with smart software? My hunch is that the company which generates the most revenue from smart software will be the winner. The hypothetical examples, whether the practical Google stuff or the absolutely out there examples from outfits like IBM, will define artificial intelligence, cognitive computing, or whatever jargon is slapped on a development direction discernible even to the slower Natty Bumppos in the digital forest.

Stephen E Arnold, October 29, 2015

Datafari Ventures into the Enterprise Search Jungle

October 29, 2015

A less-than-enthusiastic reader called our attention to Datafari, a new explorer of the enterprise search jungle. The software uses Solr and contains “the heart of a CMS.” The Datafari Web site explains:

A CMS allows for organizing collaboration within a company. But it is never monolithic, and only a federated search engine can find the data wherever they are.

Datafari, Version 2.0 is explained in a video at this link. The system permits key word search and offers a point-and-click sidebar to facilitate exploration of the content.


A user can save a particular document to a Favorites folder. The system administrator can view log file data in a graphical format. Hit boosting is available as well.

A live demonstration is available at this link. When I visited the site, it appeared that I needed to load my own content into the system. I decided against taking this step.

If you are looking for an enterprise search system that can double as a content management system, Datafari may be for you. The company is located in France, so a trip for training could be an added bonus.

Stephen E Arnold, October 29, 2015

CSI Search Informatics Are Actually Real

October 29, 2015

CSI might stand for a popular TV franchise, but it also stands for “compound structured identification,” as explained in “Bioinformaticians Make The Most Efficient Search Engine For Molecular Structures Available Online.” Sebastian Böcker and his team at the Friedrich Schiller University are researching metabolites, chemical compounds that determine an organism’s metabolism. Metabolites are used to gauge information about the condition of living cells.

While this is amazing science there are some drawbacks:

“This process is highly complex and seldom leads to conclusive results. However, the work of scientists all over the world who are engaged in this kind of fundamental research has now been made much easier: The bioinformatics team led by Prof. Böcker in Jena, together with their collaborators from the Aalto-University in Espoo, Finland, have developed a search engine that significantly simplifies the identification of molecular structures of metabolites.”

The new search works like a regular search engine, but instead of using keywords it searches through molecular structure databases containing information and structural formulae of metabolites. The new search will reduce the time needed to identify compound structures, which also cuts costs. The hope is that the new search will further research into metabolites and help researchers spend more time working on possible breakthroughs.

Whitney Grace, October 29, 2015


The PurePower Geared Turbofan Little Engine That Could

October 29, 2015

The article on Bloomberg Business titled “The Little Gear That Could Reshape the Jet Engine” conveys the 30 year history of Pratt & Whitney’s new PurePower Geared Turbofan aircraft engines. These are impressive machines: they burn less fuel, pollute less, and produce 75% less noise. But thirty years in the making? The article explains,

“In Pratt’s case, it required the cooperation of hundreds of engineers across the company, a $10 billion investment commitment from management, and, above all, the buy-in of aircraft makers and airlines, which had to be convinced that the engine would be both safe and durable. “It’s the antithesis of a Silicon Valley innovation,” says Alan Epstein, a retired MIT professor who is the company’s vice president for technology and the environment. “The Silicon Valley guys seem to have the attention span of 3-year-olds.””

It is difficult to imagine what, if anything, “Silicon Valley guys” might develop if they spent three decades researching, collaborating, and testing a single project. Even more so because the planned obsolescence of their typical products seems to speed up every year. In the case of this engine, the article suggests that the time spent has positives and negatives for the company: certain opportunities with big clients were lost along the way, but the dedicated effort also attracted new clients.

Chelsea Kerwin, October 29, 2015



Watson Weakly: IBM Case Manager Includes Watson

October 28, 2015

I read “IBM Case manager Provides Tailored Content Management.” For those of you not keeping track of IBM’s product line, may I share with you that IBM Case Manager is a wrapper perched on top of FileNet?

Remember FileNet?

In 2006, yep, nine short years ago, IBM purchased FileNet for $1.6 billion. I literally stumbled upon FileNet when I was doing some work for one of the financial outfits who once found me mildly amusing. FileNet was founded in 1982. The company scanned checks and other documents, stored the images on optical discs, and made the contents searchable—sort of.

The hardware was pure 1982: Big machines, big scanners, and lots of humans doing tasks. Over time, FileNet updated its human dependent system to become more automated. FileNet was a proprietary set up and required lots of engineers, programmers, and specialists to set up the system and keep it humming along at 2 am when most back office operations were performed in the 1980s.

Enter IBM.

FileNet is still available. But IBM has created applications which are designed to make the system more saleable in today’s market. The IBM Case Manager includes FileNet and workflow, collaboration, and compliance tools. You can now run FileNet from a mobile device. When I first stubbed my toe on a giant scanning system, folks were using nifty green screens. Progress.

The 1980s are gone. IBM now delivers a case manager. Keep in mind, gentle reader, that case management is a solution keenly desired by many in law enforcement and certain intelligence disciplines. The US government continues to search for a case management system that meets its various units’ requirements. I would suggest that some of the products available as commercial off the shelf software do not do the job. But let’s focus on what the article reveals about IBM Case Manager.

The article points out that IBM Case Manager includes these components:

  1. A unified interface. Always good for a busy user.
  2. A data capture and parsing module.
  3. Information life cycle tools. The idea is that one can comply with Federal regulations and make darned sure information has a “kill on” date.
  4. A content manager which “provides features for capture, workflow, collaboration, and compliance on both mobile and desktop [devices].”
  5. SoftLayer which makes IBM Case Manager a cloud application. But licensees can install the system on premises or use a hybrid approach which can be exciting for engineers and investigators.

But the big news in the article is contained in this passage, which I circled in dollar bill green:

Analytics, which is a set of packages that includes IBM Watson, which can glean insight from business content, present that insight in the right context, and identify patterns and trends.

I did not know that IBM Case Manager included Watson. My understanding was that Watson was the new IBM; therefore, Watson includes IBM Case Manager.

Perhaps this is a minor point, but since we are dealing with technology from the 1980s, open source code, and wrappers which add a range of user experience features, I think getting the horse and cart lined up correctly can be helpful at times.

Another remarkable revelation in the article is that IBM Case Manager is for “enterprise of all sizes.” There you go. The local Harrod’s Creek, Kentucky, car wash and grocery can use IBM Case Manager with Watson to help the proprietors deal with their information demands.

May I suggest that FileNet, regardless of its name, is appropriate for outfits like banks, hospitals, and meaty government agencies.

I also learned:

IBM Case Manager can be used to monitor social media sites to get a reading on public sentiment on a given subject or brand. Case Manager can also provide collaboration with social media platforms.

I have updated my Watson files and noted that IBM Case Manager includes Watson. Or is it the other way around?

Stephen E Arnold, October 28, 2015
