Google Continues to Improve Voice Search

November 5, 2015

Google’s research arm continues to make progress on voice search. The Google Research Blog updates us in “Google Voice Search: Faster and More Accurate.” The Google Speech Team begins by referring back to 2012, when they announced their Deep Neural Network approach. They have since built on that concept; the team now employs acoustic models built upon recurrent neural networks, which they note are fast and accurate, and trains them with connectionist temporal classification and sequence discriminative training techniques. The write-up goes into detail about how speech recognizers work and what makes their latest iteration the best yet. I found the technical explanation fascinating, but it is too lengthy to describe here; please see the post for those details.
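For readers who have not met connectionist temporal classification before, here is a minimal sketch of one small piece of it: turning a per-frame label sequence into a phoneme sequence by merging consecutive repeats and then dropping a "blank" symbol. This is the standard collapsing rule, not Google's code; the blank symbol `_`, the function name, and the sample frames are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # assumed symbol for the CTC "blank" (no phoneme) label


def ctc_collapse(frames):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _run in groupby(frames)]
    return [label for label in merged if label != BLANK]


# Hypothetical per-frame output for a short utterance
frames = ["_", "k", "k", "_", "ae", "ae", "ae", "_", "t", "t"]
print(ctc_collapse(frames))
```

A blank between two identical labels keeps them separate, which is how a repeated sound survives the merge step.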

I am still struck when I see any article mention that an algorithm has taken the initiative. This time, researchers had to rein in their model’s insightful decision:

“We now had a faster and more accurate acoustic model and were excited to launch it on real voice traffic. However, we had to solve another problem – the model was delaying its phoneme predictions by about 300 milliseconds: it had just learned it could make better predictions by listening further ahead in the speech signal! This was smart, but it would mean extra latency for our users, which was not acceptable. We solved this problem by training the model to output phoneme predictions much closer to the ground-truth timing of the speech.”

At least the AI will take direction. The post concludes:

“We are happy to announce that our new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices. In addition to requiring much lower computational resources, the new models are more accurate, robust to noise, and faster to respond to voice search queries – so give it a try, and happy (voice) searching!”

We always knew natural-language communication with machines would present huge challenges, ones many said could never be overcome. It seems such naysayers were mistaken.

Cynthia Murrell, November 5, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Written by Stephen E. Arnold · Filed Under Data, Google, News, Search, Security | Comments Off on Google Continues to Improve Voice Search

Digging into Google’s Rich Answer Vault

November 4, 2015

Google search has evolved from users entering precise keywords to users typing in whole questions, complete with question mark, and Google has gone beyond simply returning results for both. For over a year now, Google has included content referred to as “rich answers” directly within search results, meaning answers to search queries that do not require clicking through to a Web site. Stone Temple Consulting was curious how much rich answers were actually being used, how they work, and how they can benefit its clients. In December 2014 and July 2015, the firm ran a series of tests, and “Rich Answers Are On The Rise!” discusses the results.

Using the same data sets for both trials, Stone Temple Consulting discovered that Google’s use of rich answers grew significantly in the first half of 2015, as did the practice of labeling rich answers with titles and pairing them with images. The data might be skewed toward overstating the actual usage of rich answers, because:

“Bear in mind that the selected query set focused on questions that we thought had a strong chance of generating a rich answer. The great majority of questions are not likely to do so. As a result, when we say 31.2 percent of the queries we tested generated a rich answer, the percentage of all search queries that would do so is much lower.”

The article then briefly discusses the different types of rich answers Google uses and how each type grew. One conclusion that can be drawn from the types of rich answers is that people are steadily relying more and more on one tool to find all of their information, from a basic research question to buying a plane ticket.

Whitney Grace, November 4, 2015

Written by Stephen E. Arnold · Filed Under Business intelligence, Data, Google, News, Search | Comments Off on Digging into Google’s Rich Answer Vault

Journalists Use Dark Web Technology to Protect Source Privacy

November 4, 2015

Canada’s Globe and Mail points those with sensitive information to reveal toward some Dark Web tech: “SecureDrop at the Globe and Mail.” As governments get less squeamish about punishing whistleblowers, those with news the public deserves to know must be increasingly careful about how they share their knowledge. The Web page begins by informing potential SecureDrop users how to securely connect through the Tor network. The visitor is informed:

“The Globe and Mail does not log any of your interactions with the SecureDrop system, including your visit to this page. It installs no tracking cookies or tracking software of any kind on your computer as part of the process. Your identity is not exposed to us during the upload process, and we do not know your unique code phrase. This means that even if a code phrase is compromised, we cannot comply with demands to provide documents that were uploaded by a source with that code phrase. SecureDrop itself is an open-source project that is subject to regular security audits, reducing the risk of bugs that could compromise your information. Information provided through SecureDrop is handled appropriately by our journalists. Journalists working with uploaded files are required to use only computers with encrypted hard drives and follow security best practices. Anonymous sources are a critical element of journalism, and The Globe and Mail has always protected its sources to the best of its abilities.”

The page closes with a warning that no communication can be perfectly secure, but that this system is closer than most. Will more papers take measures to ensure folks can speak up without being tracked down?

Cynthia Murrell, November 4, 2015

Written by Stephen E. Arnold · Filed Under Business intelligence, Data, News, Security, Technology, Web Services | 1 Comment

Latest Global Internet Report Available

October 30, 2015

The Internet Society has made available its “Global Internet Report 2015,” just the second in its series. World-wide champions of a free and open Internet, the society examines mobile Internet usage patterns around the globe. The report’s Introduction explains:

“We focus this year’s report on the mobile Internet for two reasons. First, as with mobile telephony, the mobile Internet does not just liberate us from the constraints of a wired connection, but it offers hundreds of millions around the world their only, or primary, means of accessing the Internet. Second, the mobile Internet does not just extend the reach of the Internet as used on fixed connections, but it offers new functionality in combination with new portable access devices.”

It continues with this important warning:

“The nature of the Internet should remain collaborative and inclusive, regardless of changing means of access. In particular, the mobile Internet should remain open, to enable the permission-less innovation that has driven the continuous growth and evolution of the Internet to date, including the emergence of the mobile Internet itself.”

Through the report’s landing page, you can navigate to the above-cited Introduction, the report’s Executive Summary, and Section 2: Trends and Growth. There is even an interactive mobile Internet timeline. Scroll to the bottom to download the full report, in PDF, Kindle, or ePub formats. The download is free, but those interested can donate to the organization.

Cynthia Murrell, October 30, 2015

Written by Stephen E. Arnold · Filed Under Data, Enterprise, Mobile, News, Search quality, Technology, Web Services | Comments Off on Latest Global Internet Report Available

CSI Search Informatics Are Actually Real

October 29, 2015

CSI might stand for a popular TV franchise, but it also stands for “compound structured identification,” Phys.org explains in “Bioinformaticians Make The Most Efficient Search Engine For Molecular Structures Available Online.” Sebastian Böcker and his team at the Friedrich Schiller University are researching metabolites, chemical compounds that determine an organism’s metabolism. Metabolites are used to gauge information about the condition of living cells.

While this is amazing science, there are some drawbacks:

“This process is highly complex and seldom leads to conclusive results. However, the work of scientists all over the world who are engaged in this kind of fundamental research has now been made much easier: The bioinformatics team led by Prof. Böcker in Jena, together with their collaborators from the Aalto-University in Espoo, Finland, have developed a search engine that significantly simplifies the identification of molecular structures of metabolites.”

The new search works like a regular search engine, but instead of using keywords it searches through molecular structure databases containing information and structural formulae of metabolites. The new search will reduce the time needed to identify compound structures, which also saves on costs. The hope is that the new search will further research into metabolites and help researchers spend more time working on possible breakthroughs.

Whitney Grace, October 29, 2015

Written by Stephen E. Arnold · Filed Under Data, Enterprise, News, Search quality, Security, Technology | Comments Off on CSI Search Informatics Are Actually Real

The PurePower Geared Turbofan Little Engine That Could

October 29, 2015

The article on Bloomberg Business titled “The Little Gear That Could Reshape the Jet Engine” conveys the 30-year history of Pratt & Whitney’s new PurePower Geared Turbofan aircraft engines. These are impressive machines: they burn less fuel, pollute less, and produce 75% less noise. But thirty years in the making? The article explains,

“In Pratt’s case, it required the cooperation of hundreds of engineers across the company, a $10 billion investment commitment from management, and, above all, the buy-in of aircraft makers and airlines, which had to be convinced that the engine would be both safe and durable. ‘It’s the antithesis of a Silicon Valley innovation,’ says Alan Epstein, a retired MIT professor who is the company’s vice president for technology and the environment. ‘The Silicon Valley guys seem to have the attention span of 3-year-olds.’ ”

It is difficult to imagine what, if anything, “Silicon Valley guys” might develop if they spent three decades researching, collaborating, and testing a single project, even more so because the planned obsolescence of their typical products seems to speed up every year. In the case of this engine, the article suggests that the time spent has had positives and negatives for the company: certain opportunities with big clients were lost along the way, but the dedicated effort also attracted new clients.

Chelsea Kerwin, October 29, 2015

Written by Stephen E. Arnold · Filed Under Data, News, Search, Search quality, Security, Technology | Comments Off on The PurePower Geared Turbofan Little Engine That Could

Neglect Exposes Private Medical Files

October 28, 2015

Data such as financial information and medical files are supposed to be protected behind secure firewalls and barriers that ensure people’s information does not fall into the wrong hands. While digital security is the best it has ever been, sometimes a hacker does not need to rely on his or her skills to get sensitive information. Sometimes all a hacker needs to do is wait for an idiotic mistake, such as the one on Amazon Web Services that Gizmodo describes in “Error Exposes 1.5 Million People’s Private Records On Amazon Web Services.”

Tech junkie Chris Vickery heard a rumor that “strange data dumps” could appear on Amazon Web Services, so he decided to go looking for some. He hunted through AWS and found one such dump. It was a huge haul, or it would have been if Vickery were a hacker. Vickery discovered it was medical information that belonged to 1.5 million people and came from these organizations: Kansas’ State Self Insurance Fund, CSAC Excess Insurance Authority, and the Salt Lake County Database.

“The data came from Systema Software, a small company that manages insurance claims. It still isn’t clear how the data ended up on the site, but the company did confirm to Vickery that it happened. Shortly after Vickery made contact with the affected organizations, the database disappeared from the Amazon subdomain.”

The 1.5 million people should be thanking Vickery, because he alerted these organizations and the data was immediately removed from the Amazon cloud. It turns out that Vickery was the only one to access the data, but it raises the question: what would have happened if a malicious hacker had gotten hold of it? You can count on the medical information being sold to the highest bidder.

Vickery’s discovery is not isolated. Other organizations are bound to be negligent with data, and your personal information could be posted in an unsecured area. How can you get organizations to better protect your information? Good question.

Whitney Grace, October 28, 2015

Written by Stephen E. Arnold · Filed Under Cloud computing, Data, Management, News, Security, Web Services | Comments Off on Neglect Exposes Private Medical Files

RAVN Pipeline Coupled with ElasticSearch to Improve Indexing Capabilities

October 28, 2015

The article on PR Newswire titled “RAVN Systems Releases its Enterprise Search Indexing Platform, RAVN Pipeline, to Ingest Enterprise Content Into ElasticSearch” unpacks the decision to improve the ElasticSearch platform by supplying the indexing platform of the RAVN Pipeline. RAVN Systems is a UK company, founded by consultants and developers, with expertise in processing unstructured data. Their stated goal is to discover new lands in the world of information technology. The article states,

“RAVN Pipeline delivers a platform approach to all your Extraction, Transformation and Load (ETL) needs. A wide variety of source repositories including, but not limited to, File systems, e-mail systems, DMS platforms, CRM systems and hosted platforms can be connected while maintaining document level security when indexing the content into Elasticsearch. Also, compressed archives and other complex data types are supported out of the box, with the ability to retain nested hierarchical structures.”

The added indexing ability is very important, especially for users trying to index from or into cloud-based repositories. Even a single instance of any type of data can be indexed with the Pipeline, which also enriches data during indexing with auto-tagging and classifications. The article also promises that non-specialists (by which I assume they mean people) will be able to use the new systems due to their being GUI driven and intuitive.

Chelsea Kerwin, October 28, 2015

Written by Stephen E. Arnold · Filed Under Acquisition, Analytics, Data, Indexing, News, Search | Comments Off on RAVN Pipeline Coupled with ElasticSearch to Improve Indexing Capabilities

The Lack of Digital Diversity

October 27, 2015

Tech companies and their products run our lives. Companies like Apple, Google, and Microsoft have made it impossible to function in developed nations without them. They have taken over everything from communication to how we entertain ourselves. While these companies offer a variety of different products and services, they are more similar than different. The Verge explains that “Apple, Google, And Microsoft Are All Solving The Same Problem.”

Google, Apple, and Microsoft are offering similar services and products in their present lineups, with little to no diversity among them. For example, there are the personal assistants Cortana vs. Google Now vs. Siri, options for entertainment in the car like Apple CarPlay and Android Auto, and seamless accessibility across devices with Chrome browser, Continuity, and Continuum. There are more comparisons between the three tech giants and their business plans for the future, but it is not only them. Social media sites like Facebook and Twitter are starting to resemble each other more too.

Technology companies have borrowed from each other and have had healthy competition for years, spurring more innovation. But these companies are now operating on such similar principles that it is stifling creativity, and it is startups that are taking more risks:

“Without the dual pressures of both the consumer and the stock market, and without a historic reputation to uphold, small startups are now the best engine for generating truly new and groundbreaking innovations. Uber and Airbnb are fundamentally altering the economics of renting things, while hardware designers like Pebble and Oculus are inventing cool new technology that isn’t bound to any particular company’s ecosystem. Startups can see a broader range of problems to address because they don’t have to wear the same economic blinkers as established, monolithic companies.”

The article ends on positive thoughts, however. The present is beating along at a consistent pace, but in order to have more diversity companies should not be copying each other on every little item. Tech companies should borrow ideas from the future to create more original ideas.

Whitney Grace, October 27, 2015

Written by Stephen E. Arnold · Filed Under Business intelligence, Data, Google, Marketing, Microsoft, News, Search | Comments Off on The Lack of Digital Diversity

Braiding Big Data

October 26, 2015

An apt metaphor to explain big data is the act of braiding. Braiding requires a person to take three or more locks of hair and weave them together in alternation. The end result is a clean, pretty hairstyle that keeps a person’s hair in place and off the face. Big data is like braiding, because specially tailored software takes an unruly mess of data, including the combed and uncombed strands, and organizes them into a legible format. Perhaps this is why TopQuadrant named its popular big data software TopBraid. Read more about its software upgrade in “TopQuadrant Launches TopBraid 5.0.”

TopBraid Suite is an enterprise Web-based solution set that simplifies the development and management of standards-based, model-driven solutions focused on taxonomy, ontology, metadata management, reference data governance, and data virtualization. The newest upgrade for TopBraid builds on the current enterprise information management solutions and adds new options:

“ ‘It continues to be our goal to improve ways for users to harness the full potential of their data,’ said Irene Polikoff, CEO and co-founder of TopQuadrant. ‘This latest release of 5.0 includes an exciting new feature, AutoClassifier. While our TopBraid Enterprise Vocabulary Net (EVN) Tagger has let users manually tag content with concepts from their vocabularies for several years, AutoClassifier completely automates that process.’ ”

The AutoClassifier makes it easier to add and edit tags before making them a part of the production tag set. Other new features are for TopBraid Enterprise Vocabulary Net (TopBraid EVN), TopBraid Reference Data Manager (RDM), TopBraid Insight, and the TopBraid platform, including improvements in internationalization and a new component for increasing system availability in enterprise environments, TopBraid DataCache.
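To give a feel for what tagging content with concepts from a vocabulary means in practice, here is a minimal sketch that matches vocabulary labels against a piece of text. It is not TopQuadrant's AutoClassifier, whose method the announcement does not describe; the `auto_tag` function, the vocabulary layout, and the sample data are hypothetical and only illustrate the general idea.

```python
import re


def auto_tag(text, vocabulary):
    """Return sorted concept names whose labels appear as whole words in text.

    vocabulary maps a concept name to a list of labels (synonyms) for it.
    This layout is an assumption for illustration, not a TopBraid format.
    """
    lowered = text.lower()
    found = set()
    for concept, labels in vocabulary.items():
        for label in labels:
            # \b anchors keep "data" from matching inside "database"
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            if re.search(pattern, lowered):
                found.add(concept)
                break  # one matching label is enough for this concept
    return sorted(found)


# Hypothetical vocabulary and document
vocabulary = {
    "metadata": ["metadata", "meta data"],
    "taxonomy": ["taxonomy", "taxonomies"],
    "ontology": ["ontology", "ontologies"],
}
print(auto_tag("Managing taxonomies and meta data", vocabulary))
```

A reviewer could inspect the suggested tags and edit them before they join the production tag set, which is the workflow the paragraph above describes.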

TopBraid might be the solution an enterprise system needs to braid its data into style.

Whitney Grace, October 26, 2015

Written by Stephen E. Arnold · Filed Under Analytics, Big data, Data, Metadata, News | Comments Off on Braiding Big Data

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.