On the Prevalence of Open Source
November 11, 2015
Who would have thought, two decades ago, that open source code was going to dominate the software field? Vallified’s Philip O’Toole meditates on “The Strange Economics of Open-Source Software.” Though the industry gives so much away for free, it’s doing quite well for itself.
O’Toole notes that closed-source software is still in wide use, largely in banks, in embedded devices, and in underpinning services. Also, many organizations are still attached to their Microsoft and Oracle products. But the tide has been turning; he writes:
“The increasing dominance of open-source software seems particularly true with respect to infrastructure software. While security software has often been open-source through necessity — no-one would trust it otherwise — infrastructure is becoming the dominant category of open-source. Look at databases — MySQL, MongoDB, RethinkDB, CouchDB, InfluxDB (of which I am part of the development team), or cockroachdb. Is there anyone today that would even consider developing a new closed-source database? Or take search technology — elasticsearch, Solr, and bleve — all open-source. And Linux is so obvious, it is almost pointless to mention it. If you want to create a closed-source infrastructure solution, you better have an enormously compelling story, or be delivering it as part of a bigger package such as a software appliance.”
It has gotten to the point where developers may hesitate to work on a closed-source project because it will do nothing for their reputation. Where do the profits come from, you may ask? Why, in the sale of services, of course. It’s all part of today’s cloud-based reality.
Cynthia Murrell, November 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Another Semantic Search Play
November 6, 2015
The University of Washington has been search central for a number of years. Some interesting methods have emerged. From Jeff Dean to Alon Halevy, the UW crowd has been having an impact.
Now another outfit with ties to UW wants to make waves with a semantic search engine. Navigate to “Artificial-Intelligence Institute Launches Free Science Search Engine.” The wizard behind the system is Dr. Oren Etzioni. The money comes from Paul Allen, a co-founder of Microsoft.
Dr. Etzioni has been tending vines in the search vineyard for many years. His semantic approach is described this way:
But a search engine unveiled on 2 November by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, is working towards providing something different for its users: an understanding of a paper’s content. “We’re trying to get deep into the papers and be fast and clean and usable,” says Oren Etzioni, chief executive officer of AI2.
Sound familiar? Understanding what a sci-tech paper means.
According to the write up:
Semantic Scholar offers a few innovative features, including picking out the most important keywords and phrases from the text without relying on an author or publisher to key them in. “It’s surprisingly difficult for a system to do this,” says Etzioni. The search engine uses similar ‘machine reading’ techniques to determine which papers are overviews of a topic. The system can also identify which of a paper’s cited references were truly influential, rather than being included incidentally for background or as a comparison.
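Etzioni says picking out key phrases without human input is “surprisingly difficult,” and a toy baseline hints at why: the simplest approach, scoring candidate terms by raw frequency against a stoplist, surfaces whatever word happens to repeat, not necessarily what matters. The function, stoplist, and sample abstract below are illustrative assumptions, not AI2’s actual method.

```python
import re
from collections import Counter

# Minimal stoplist for illustration; a real extractor would use a far larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "we", "this"}

def top_keywords(text, n=3):
    """Score single-word candidates by raw frequency, skipping stopwords.

    A naive baseline, not Semantic Scholar's method; it shows why raw
    frequency alone is a weak signal for the 'most important' phrases.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

# Hypothetical abstract, invented for this example.
abstract = ("We present a neural model for citation analysis. "
            "The model ranks citations by influence, and citation "
            "influence is estimated from citation context.")
print(top_keywords(abstract))  # 'citation' ranks first simply by repetition
```

The repeated word wins regardless of its actual importance, which is why production systems lean on machine-learned signals rather than counts alone.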
Does anyone remember Gene Garfield? I did not think so. There is a nod to Expert System, an outfit which has been slogging away at semantic technology in an often baffling suite of software since 1989. (Yep, that works out to more than a quarter of a century.) Hey, few doubt that semantic hoohah has been a go-to buzzword for decades.
There are references to the Microsoft specialist search and some general hand waving. The fact that different search systems must be used for different types of content should raise some questions about the “tuning” required to deliver what the vendor can describe as relevant results. Does anyone remember what Gene Garfield said when he accepted the lifetime achievement award in online? Right, did not think so. The gist was that citation analysis worked. Additional bells and whistles could be helpful. But humans referencing substantive sci-tech antecedents was a very useful indicator of the importance of a paper.
I interpreted Dr. Garfield’s comment as suggesting that semantics could add value if the computational time and costs could be constrained. But in an era of proliferating sci-tech publications, bells and whistles were like chrome trim on a ’59 Oldsmobile 98. Lots of flash. Little substance.
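Garfield’s core signal is easy to sketch: count how often each paper is cited by the others. The toy citation graph below is invented for illustration; real citation analysis layers weighting, time windows, and field normalization on top of this simple count.

```python
from collections import Counter

# Hypothetical corpus: each paper maps to the papers it cites.
citations = {
    "paper_a": ["paper_d", "paper_c"],
    "paper_b": ["paper_d"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}

def rank_by_citations(graph):
    """Rank papers by how often other papers cite them (a Garfield-style count)."""
    counts = Counter(cited for refs in graph.values() for cited in refs)
    return sorted(graph, key=lambda paper: counts[paper], reverse=True)

print(rank_by_citations(citations))  # paper_d, cited three times, comes out on top
```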
My view is that Paul Allen dabbled in semantics with Evri. How did that work out? Ask someone from the Washington Post who was involved with the system.
Worth testing the system in comparative searches against Compendex, ChemAbs, and similar high-value commercial databases.
Stephen E Arnold, November 5, 2015
Google Continues to Improve Voice Search
November 5, 2015
Google’s research arm continues to make progress on voice search. The Google Research Blog updates us in “Google Voice Search: Faster and More Accurate.” The Google Speech Team begins by referring back to 2012, when it announced its Deep Neural Network approach. The team has since built on that concept; it now employs acoustic models built upon recurrent neural networks and trained with connectionist temporal classification and sequence-discriminative techniques, which it notes are fast and accurate. The write-up goes into detail about how speech recognizers work and what makes this latest iteration the best yet. I found the technical explanation fascinating, but it is too lengthy to describe here; please see the post for those details.
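For readers curious about the connectionist temporal classification piece, the decoding side is simple to illustrate: the network emits one symbol (or a special blank) per audio frame, and a post-processing step collapses repeats and strips blanks. The toy greedy decoder below, with an invented frame sequence, sketches the idea only; it is not Google’s implementation.

```python
BLANK = "_"  # CTC's special 'no label' symbol

def ctc_greedy_decode(frame_labels):
    """Collapse repeated frame labels, then drop blanks.

    CTC lets an acoustic model emit one symbol per short audio frame;
    this post-processing recovers the phoneme or character sequence.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Hypothetical per-frame outputs; blanks separate the repeated 'l' sounds.
print(ctc_greedy_decode(list("__hhe_ll_lo__")))  # prints "hello"
```

The blank symbol is what lets the model represent genuinely doubled letters, since a blank between two identical labels prevents them from being collapsed into one.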
I am still struck when I see any article mention that an algorithm has taken the initiative. This time, researchers had to rein in their model’s insightful decision:
“We now had a faster and more accurate acoustic model and were excited to launch it on real voice traffic. However, we had to solve another problem – the model was delaying its phoneme predictions by about 300 milliseconds: it had just learned it could make better predictions by listening further ahead in the speech signal! This was smart, but it would mean extra latency for our users, which was not acceptable. We solved this problem by training the model to output phoneme predictions much closer to the ground-truth timing of the speech.”
At least the AI will take direction. The post concludes:
“We are happy to announce that our new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices. In addition to requiring much lower computational resources, the new models are more accurate, robust to noise, and faster to respond to voice search queries – so give it a try, and happy (voice) searching!”
We always knew natural-language communication with machines would present huge challenges, ones many said could never be overcome. It seems such naysayers were mistaken.
Cynthia Murrell, November 5, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Digging into Google’s Rich Answer Vault
November 4, 2015
Google search has evolved: users have moved from entering precise keywords to typing out whole questions, complete with question mark. Google has gone beyond answering questions and keyword queries. For over a year now, Google has included directly within search results content referred to as “rich answers,” meaning answers to search queries delivered without having to click through to a Web site. Stone Temple Consulting was curious how much people were actually using rich answers, how they worked, and how they could benefit its clients. In December 2014 and July 2015, the firm ran a series of tests, and “Rich Answers Are On The Rise!” discusses the results.
Using the same data sets for both trials, Stone Temple Consulting discovered that use of Google rich answers grew significantly in the first half of 2015, as did the use of titles to label the rich answers and of images alongside them. The data might be skewed in favor of the actual usage of rich answers, because:
“Bear in mind that the selected query set focused on questions that we thought had a strong chance of generating a rich answer. The great majority of questions are not likely to do so. As a result, when we say 31.2 percent of the queries we tested generated a rich answer, the percentage of all search queries that would do so is much lower.”
The write-up continues with a short discussion of the different types of rich answers Google uses and how each type has grown. One conclusion that can be drawn from the types of rich answers is that people are steadily relying more and more on a single tool to find all of their information, from basic research questions to buying a plane ticket.
Whitney Grace, November 4, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
New Scan Video Search Tool
November 2, 2015
Navigate to Scan Video at http://www.scan.video. The service is in development. Results are limited. Documentation, although brief, is on the site’s about page. I ran a query for my video about cyberosint. The system did not locate my YouTube video on that subject. Other queries were more successful; for example, when I searched for “dance”, I received hits, and a new search box appeared inviting me to search for words in the video. My quest for a killer video search system continues.
Stephen E Arnold, November 2, 2015
IRS: Searching for Computers
November 1, 2015
I read “IRS Hasn’t Finished Doing Windows Upgrades Because It Can’t Find a Bunch of Its Computers.” Who knows if it is true. I find the write up darned amusing. The notion that the IRS cannot locate some of its computers is a Jack Benny-type knee slapper.
Here’s the passage I highlighted in tax delinquent red, a delightful hue:
The IRS has spent $128 million in its attempt to upgrade all computers away from Windows XP and all servers away from Windows Server 2003. But when the Treasury Inspector General for Tax Administration (TIGTA) conducted the audit between December 2014 and June 2015, about half of the agency’s servers and more than 1,000 computers still had not been upgraded. “At the conclusion of our fieldwork, the IRS had not accounted for the location or migration status of approximately 1,300 workstations and upgraded only about one-half of its Windows servers,” the report explains. It’s a diplomatic way of saying that a bunch of computers were missing, whether they were hiding in plain sight or in a black market parts exchange somewhere.
At some point, one wonders if the 18f.gov folks can find the time to assist the IRS in its technical quest.
The write up also included this fascinating statement:
Using legacy operating systems is a problem because it makes systems more vulnerable to hacks. And since the IRS stores valuable information about millions of people it’s especially important for the agency. Don’t forget that the agency disclosed a big data breach in May [2015]. All of this is making the Navy’s disastrous upgrade process look a little better. Or maybe it’s just making every agency look worse.
I did not know that using a legacy operating system might be a problem. Insight time.
Stephen E Arnold, November 1, 2015
The PurePower Geared Turbofan: The Little Engine That Could
October 29, 2015
The article on Bloomberg Business titled “The Little Gear That Could Reshape the Jet Engine” conveys the 30-year history of Pratt & Whitney’s new PurePower Geared Turbofan aircraft engines. These are impressive machines: they burn less fuel, pollute less, and produce 75% less noise. But thirty years in the making? The article explains,
“In Pratt’s case, it required the cooperation of hundreds of engineers across the company, a $10 billion investment commitment from management, and, above all, the buy-in of aircraft makers and airlines, which had to be convinced that the engine would be both safe and durable. “It’s the antithesis of a Silicon Valley innovation,” says Alan Epstein, a retired MIT professor who is the company’s vice president for technology and the environment. “The Silicon Valley guys seem to have the attention span of 3-year-olds.”
It is difficult to imagine what, if anything, “Silicon Valley guys” might develop if they spent three decades researching, collaborating, and testing a single project. Even more so given that the planned obsolescence of their typical products seems to speed up every year. In the case of this engine, the article suggests that the time spent had positives and negatives for the company: certain opportunities with big clients were lost along the way, but the dedicated effort also attracted new clients.
Chelsea Kerwin, October 29, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
RAVN Pipeline Coupled with ElasticSearch to Improve Indexing Capabilities
October 28, 2015
The article on PR Newswire titled “RAVN Systems Releases its Enterprise Search Indexing Platform, RAVN Pipeline, to Ingest Enterprise Content Into ElasticSearch” unpacks the decision to complement the Elasticsearch platform with the indexing capabilities of the RAVN Pipeline. RAVN Systems is a UK company, founded by consultants and developers, with expertise in processing unstructured data. Its stated goal is to discover new lands in the world of information technology. The article states,
“RAVN Pipeline delivers a platform approach to all your Extraction, Transformation and Load (ETL) needs. A wide variety of source repositories including, but not limited to, File systems, e-mail systems, DMS platforms, CRM systems and hosted platforms can be connected while maintaining document level security when indexing the content into Elasticsearch. Also, compressed archives and other complex data types are supported out of the box, with the ability to retain nested hierarchical structures.”
The added indexing ability is very important, especially for users trying to index from or into cloud-based repositories. Even a single instance of any type of data can be indexed with the Pipeline, which also enriches data during indexing with auto-tagging and classifications. The article also promises that non-specialists (by which I assume they mean people) will be able to use the new systems due to their being GUI driven and intuitive.
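For context, Elasticsearch’s bulk API, which an ingest pipeline of this sort ultimately feeds, expects newline-delimited JSON: one action line, then one document line, per item. Here is a minimal sketch of building such a payload; the index name and document fields are invented for illustration, and RAVN’s actual connector logic is surely richer.

```python
import json

def to_bulk_ndjson(index_name, docs):
    """Build an Elasticsearch _bulk request body: an action line followed
    by a source line for each document, newline-delimited."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

# Hypothetical documents extracted from an email archive.
docs = {
    "msg-001": {"subject": "Q3 report", "body": "Attached as discussed."},
    "msg-002": {"subject": "Re: Q3 report", "body": "Received, thanks."},
}
payload = to_bulk_ndjson("mail-archive", docs)
print(payload)
```

A real connector would also carry the document-level security metadata the article mentions, typically as extra fields on each source line that query-time filters can match against.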
Chelsea Kerwin, October 28, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
EasyAsk Crowdfunder Campaign
October 27, 2015
In June 2015, EasyAsk kicked off a program to elicit investment funds. The EasyAsk approach is not one I usually see in the search and content processing sector.
“Mobile Commerce Company EasyAsk Seeks $2.5M on Crowdfunder” reported:
With over 400 pre-eminent customers already under their belt, EasyAsk has a proven track record of providing this valuable service to companies including The North Face, JJill and others, and is looking to expand its reach into an even broader base. EasyAsk, ranked just behind Oracle and Adobe in e-commerce search, has committed to dedicating sales and marketing resources on the Magento and IBM WebSphere platforms, to attract retailers and engage partners to ensure a high growth and return on investment for our investors.
A quick check of the EasyAsk News Web page did not turn up any information about the Crowdfunder campaign. I noted that the most recent news post was a June 5, 2015, announcement that Tacoma Screw Products, an EasyAsk customer, was nominated for an Internet Retailer Excellence Award.
With the economic pressures building across the search and content processing sector, we will keep you posted on EasyAsk’s trajectory.
Stephen E Arnold, October 27, 2015
The Lack of Digital Diversity
October 27, 2015
Tech companies and their products run our lives. Companies like Apple, Google, and Microsoft have made it impossible to function in developed nations without them. They have taken over everything from communication to how we entertain ourselves. While these companies offer a variety of different products and services, they are more similar than different. The Verge explains that “Apple, Google, And Microsoft Are All Solving The Same Problem.”
Google, Apple, and Microsoft are offering similar services and products, with little to no diversity among their present options. For example, there are the personal assistants Cortana vs. Google Now vs. Siri, options for entertainment in the car like Apple CarPlay and Android Auto, and seamless accessibility across devices with Chrome browser, Continuity, and Continuum. There are more comparisons between the three tech giants and their business plans for the future, but it is not only them. Social media sites like Facebook and Twitter are starting to resemble each other more too.
Technology companies have borrowed from each other and enjoyed healthy competition for years, spurring more innovation. But these companies are now operating on such similar principles that creativity is being stifled, leaving startups to take the bigger risks:
“Without the dual pressures of both the consumer and the stock market, and without a historic reputation to uphold, small startups are now the best engine for generating truly new and groundbreaking innovations. Uber and Airbnb are fundamentally altering the economics of renting things, while hardware designers like Pebble and Oculus are inventing cool new technology that isn’t bound to any particular company’s ecosystem. Startups can see a broader range of problems to address because they don’t have to wear the same economic blinkers as established, monolithic companies.”
The article ends on a positive note, however. The present is beating along at a consistent pace, but to foster more diversity, companies should stop copying each other on every little item. Tech companies should borrow ideas from the future to create more original ones.
Whitney Grace, October 27, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph