Iran-Russia Ink Pact for Search Engine Services

November 28, 2016

Owing to geopolitical differences, countries such as Iran are turning toward like-minded nations such as Russia for technological development. A Russian diplomat posted in Iran recently announced that the home-grown search engine provider Yandex will offer its services to the people of Iran.

The Financial Tribune, in a news report titled “Yandex to Arrive Soon,” said:

Last October, the Russian and Iranian communications ministers, Nikolay Nikiforov and Mahmoud Vaezi respectively, signed a deal to expand bilateral technological collaboration. During the meeting, Vaezi said: “We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.”

Iran has traditionally been an extremist nation at the center of numerous international controversies, a situation that indirectly bars American corporations from conducting business in this hostile territory. Russia, on the other hand, seen as a foe of the US, stands to gain from these sour relations.

As of now, .com and .com.tr domains owned by Yandex are banned in Iran, but with the MoU signed, that will change soon. There is another interesting point to be observed in this news piece:

According to a domain registration search, an official reportedly working for IRIB purchased the Yandex.ir website. DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.

By internationally accepted practice, no individual or organization can register a domain name belonging to a company that has already carved out a niche for itself online, under any extension, without the necessary permissions. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.
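Such a registration check is easy to reproduce. Here is a minimal sketch in Python, shelling out to the standard whois utility (the utility must be installed, output fields vary by registry, and the “registrant” filter is an assumption, not DomainTools’ method):

    import subprocess

    def registrant_lines(domain):
        """Run a WHOIS lookup and keep only registrant-related lines."""
        out = subprocess.run(["whois", domain], capture_output=True, text=True).stdout
        return [line for line in out.splitlines() if "registrant" in line.lower()]

    # e.g., the domain discussed above
    for line in registrant_lines("yandex.ir"):
        print(line)

Services such as DomainTools layer ownership history and reverse lookups on top of the same underlying records.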

Vishal Ingole, November 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

EasyAsk Guarantees Revenue Boost with Its eCommerce Search System

November 26, 2016

I read “How EasyAsk Will Help You Drive 23 to 121% Higher eCommerce Revenues: Guaranteed.” The headline is quite different from most search vendors’ announcements. Search vendors, in my experience, do not guarantee anything: uptime, fees, performance. EasyAsk, a natural language search technology vendor, is guaranteeing more eCommerce revenue. As with most information available online, I assume the facts are correct.

I highlighted this statement:

Within 90 days of the EasyAsk implementation, 95% of internal searches were returning the right results – nearly eliminating the dreaded no-results pages. The results have been outstanding:

  • Search conversion has increased by 54%
  • Revenue from search has seen a boost of over 71%
  • Transactions are up 81%

Unlike Solr, EasyAsk offers powerful merchandising tools that are intuitive, easy to use, and maintained by business users instead of programmers.

Now the “guarantee” part:

We [EasyAsk] will contractually guarantee that EasyAsk will drive at least 20% more revenue from search.

Here’s how:

  • We will take a baseline benchmark measuring revenue, conversion rate and average transactions on your existing search engine.
  • We will work with you to deploy and implement EasyAsk’s eCommerce suite to provide you with advanced Natural Language semantic search and merchandising.
  • Within 90 days of implementation, we will perform a new benchmark that measures revenue, conversion rate and average transactions and compare them with the original baseline. EasyAsk will contractually guarantee to drive at least 20% more revenue.
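The guarantee boils down to simple percentage arithmetic against the baseline. A minimal Python sketch of the comparison (the metric values are invented for illustration; EasyAsk’s actual benchmark methodology is not public):

    # baseline vs. 90-day benchmark (hypothetical numbers)
    baseline = {"revenue": 100_000.0, "conversion": 0.020, "transactions": 1_000}
    after_90 = {"revenue": 171_000.0, "conversion": 0.031, "transactions": 1_810}

    def lift(before, after):
        """Percent change relative to the baseline benchmark."""
        return 100.0 * (after - before) / before

    for metric in baseline:
        print(f"{metric}: {lift(baseline[metric], after_90[metric]):+.1f}%")

    # the contractual promise: at least 20% more revenue from search
    assert lift(baseline["revenue"], after_90["revenue"]) >= 20.0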

The write up explains that there is no risk to the eCommerce merchant who embraces EasyAsk.

There you go. A New Year’s gift that arrives six weeks early.

Stephen E Arnold, November 26, 2016

Need Data Integration? Think of Cisco. Well, Okay

November 25, 2016

Data integration is more difficult than some of the text analytics wizards state. Software sucks in disparate data, and “real time” analytics systems present actionable results to marketers, sales professionals, and chief strategy officers. Well, that’s not exactly accurate.

Industrial strength data integration demands a company which has bought a company which acquired a technology which performs data integration. Cisco offers a system that appears to combine the functions of Kapow with the capabilities of Palantir Technologies’ Gotham and tosses in the self service business information which Microsoft touts.

Cisco acquired Composite Software in 2013. Cisco now offers the Composite system as the Cisco Information Server. Here’s what the block diagram of the federating behemoth looks like. You can get a PDF version at this link.

[Block diagram: Cisco Information Server]

The system is easy to use. “The graphical development and management environments are easy to learn and intuitive to use,” says the Cisco Teradata information sheet. For some tips about the easy-to-use system, check out the Data Virtualization Cisco Information Server blog. A tutorial, although dated, is at this link. Note that the block diagram has not changed significantly between the 2011 version and the one presented above. I assume there is not much work required to ingest and make sense of the Twitter stream or other social media content. The blog has one post and was last updated in 2011, but there is a YouTube video at this link.

The system includes a remarkable range of features; for example:

  • Modeling, which means importing and transforming data (what Cisco calls “introspection”), creating a model, figuring out how to make it run at an acceptable level of performance, and exposing the data to other services. (Does this sound like iPhrase’s and Teratext’s method? It does to me. A toy sketch of this federation pattern appears after the list.)
  • Search
  • Transformation
  • Version control and governance
  • Data quality control and assurance
  • Outputs
  • Security
  • Administrative controls
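Data virtualization in miniature: the server presents one logical view over sources it does not copy, joining them at query time. A toy Python sketch of that pattern (the source and field names are invented; the Cisco Information Server does this declaratively through SQL views, not hand-written code):

    import sqlite3

    # source 1: a relational table
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

    # source 2: records that might arrive from a REST service or a flat file
    orders = [{"customer_id": 1, "total": 250.0}, {"customer_id": 2, "total": 99.0}]

    def customer_orders():
        """The 'federated view': join both sources on demand, copying nothing."""
        names = dict(db.execute("SELECT id, name FROM customers").fetchall())
        return [{"customer": names[o["customer_id"]], "total": o["total"]}
                for o in orders]

    print(customer_orders())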

The time required to create this system, according to Cisco Teradata, is “over 300 man years.”

The licensee can plug the system into an IBM DB2 instance running on a z/OS “handheld.” You will need a large hand, by the way. No small hands need apply.

Stephen E Arnold, November 25, 2016

The Noble Quest Behind Semantic Search

November 25, 2016

A brief write-up at the Ontotext blog, “The Knowledge Discovery Quest,” presents a noble vision of the search field. Philologist and blogger Teodora Petkova observed that semantic search is the key to bringing together data from different sources and exploring connections. She elaborates:

On a more practical note, semantic search is about efficient enterprise content usage, as one of the biggest losses of knowledge happens due to inefficient management and retrieval of information. The ability to search for meaning, not for keywords, brings us a step closer to efficient information management.

If semantic search had a separate icon from the one traditional search has, it would be a microscope. Why? Because semantic search looks at content as if through the magnifying lens of a microscope. The technology helps us explore large numbers of systems and the connections between them. Sharpening our ability to join the dots, semantic search enhances the way we look for clues and compare correlations on our knowledge discovery quest.
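Ontotext’s platform rests on RDF graph technology, so “searching for meaning, not keywords” amounts to querying typed entities and their relations. A minimal sketch, assuming a hypothetical GraphDB-style SPARQL endpoint and an invented ontology:

    import requests

    ENDPOINT = "http://localhost:7200/repositories/news"  # assumed local instance
    QUERY = """
    PREFIX org: <http://example.com/ontology#>
    SELECT ?company ?ceo WHERE {
      ?company a org:Company ;
               org:hasCEO ?ceo .
    }
    """

    # the standard SPARQL protocol: POST the query, ask for JSON results
    resp = requests.post(ENDPOINT, data={"query": QUERY},
                         headers={"Accept": "application/sparql-results+json"})
    for row in resp.json()["results"]["bindings"]:
        print(row["company"]["value"], "->", row["ceo"]["value"])

A keyword engine would match the string “CEO”; the query above matches the relation itself, whatever the surface wording.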

At the bottom of the post is a slideshow on this “knowledge discovery quest.” Sure, it also serves to illustrate how Ontotext could help, but we can’t blame them for drumming up business through their own blog. We actually appreciate the company’s approach to semantic search, and we’d be curious to see how they manage the intricacies of content conversion and normalization. Founded in 2000, Ontotext is based in Bulgaria.

Cynthia Murrell, November 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Keeping Current with Elastic.co

November 24, 2016

Short honk. If you want to keep up with Elastic and Elasticsearch, the company’s “This Week in Elasticsearch and Apache Lucene” may be of interest. The weekly posting includes information about commits, releases, and training. Unlike the slightly crazed, revenue-challenged open source search vendors, Elastic.co provides factual information about the plumbing for the search and retrieval system. We found the “Ongoing Changes” section useful and interesting. The idea is that one can keep track of certain features, methods, and issues by scanning a list. The short description of an issue, for instance, includes a link to additional information. Highly recommended for those hooked on Elastic.co’s free and open source solution or the for-fee products and services the company offers.
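For readers who want to poke at the plumbing directly, here is a minimal sketch of querying a local Elasticsearch node over its REST API (the index and field names are assumptions):

    import requests

    # a basic full-text match query against the _search endpoint
    body = {"query": {"match": {"title": "ongoing changes"}}}
    resp = requests.get("http://localhost:9200/articles/_search", json=body)

    for hit in resp.json()["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("title"))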

Stephen E Arnold, November 24, 2016

Do Not Forget to Show Your Work

November 24, 2016

Showing work is a messy, necessary step to prove how one arrived at a solution. Most of the time it is never reviewed, but with big data, people wonder how computer algorithms arrive at their conclusions. Engadget explains that computers are being forced to prove their results in “MIT Makes Neural Networks Show Their Work.”

Understanding neural networks is extremely difficult, but MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a way to map the complex systems. CSAIL tackled the task by splitting a network into two smaller modules: one extracts text segments and scores them according to their length and coherence; the second predicts the segments’ subject and attempts to classify them. The mapping modules sound almost as complex as the actual neural networks. To alleviate the stress and add a giggle to their research, CSAIL had the modules analyze beer reviews:

For their test, the team used online reviews from a beer rating website and had their network attempt to rank beers on a 5-star scale based on the brew’s aroma, palate, and appearance, using the site’s written reviews. After training the system, the CSAIL team found that their neural network rated beers based on aroma and appearance the same way that humans did 95 and 96 percent of the time, respectively. On the more subjective field of “palate,” the network agreed with people 80 percent of the time.
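The two-module architecture is easier to see in code than in prose. A minimal PyTorch sketch of the idea (an illustration, not CSAIL’s code; the real system samples selections and trains the extractor with sparsity and coherence penalties):

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Module 1: scores each token; high scorers form the extracted segments."""
        def __init__(self, vocab, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.score = nn.Linear(dim, 1)

        def forward(self, tokens):
            h = self.emb(tokens)                      # (batch, seq, dim)
            probs = torch.sigmoid(self.score(h))      # per-token selection probability
            return (probs > 0.5).float()              # hard mask for illustration

    class Encoder(nn.Module):
        """Module 2: predicts a rating from only the selected text segments."""
        def __init__(self, vocab, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.out = nn.Linear(dim, 1)

        def forward(self, tokens, mask):
            h = self.emb(tokens) * mask               # zero out unselected tokens
            return self.out(h.mean(dim=1))            # pooled score, e.g. an aroma rating

    gen, enc = Generator(1000), Encoder(1000)
    review = torch.randint(0, 1000, (2, 50))          # two toy token sequences
    print(enc(review, gen(review)).shape)             # torch.Size([2, 1])

Because the encoder sees only what the generator selected, the selected segments serve as the network’s “shown work.”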

One set of data is as good as another to test CSAIL’s network mapping tool. CSAIL hopes to fine-tune the machine learning project and use it in breast cancer research to analyze pathology data.

Whitney Grace, November 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Hitachi Digs into Enterprise Search

November 23, 2016

Hitachi Data Systems has embraced “content intelligence.” My recollection is that the “search” underlying the Hitachi Content Platform is Perfect Search, a proprietary system which emphasized its performance features, not its ease of use for system administrators.

“Hitachi Adds Enterprise Search to Object Store” informs me that:

Hitachi Data Systems today debuted Content Intelligence, a new offering that adds a slew of enterprise search and analytic capabilities to its object-based file system.

Slew?

The system supports multi-tenant, cloud-scale deployments. The block diagram for the system looks like this:

[Block diagram: Hitachi Content Intelligence]

According to a Hitachi professional, the new system will be “invaluable.” That is, I presume, a “slew” of value.

Hitachi was the second-best system for object storage according to the big moon, mid-tier consulting firm Gartner Group. The number one system was IBM Watson’s cellmate CleverSafe dsNet. (This is not the IBM Almaden Clever system for relevance determination.)

Other features, in addition to search, are a cloud gateway component, a file synchronization tool, and the ability to share access. For more information about the system, you can read “Better Object Storage with Hitachi Content Platform 2014.”

Stephen E Arnold, November 23, 2016

Hear That Bing Ding: A Warning for Google Web Search

November 23, 2016

Bing. Bing. Bing. The sound reminds me of a broken elevator door in the Block & Kuhl when I was but a wee lad. Bing. Bing. Bing. Annoying? You bet.

I read “Microsoft Corporation Can Defeat Alphabet Inc in Search.” I enjoy these odd, disconnected-from-the-real-world write ups predicting that Microsoft will trounce Google in a particular niche. This particular write up seizes upon the fluff about Microsoft having an “intelligence fabric.” Then, with a spectacular leap that ignores the fact that more than 90 percent of humans use Google Web search, it suggests that Bing will be the next big thing in Web search.

Get real.

Bing, after two decades of floundering, allegedly is profitable. No word on how long it will take to pay back the money Microsoft has invested in Web search over these 4,000 days of stumbling.

I highlighted this passage in the write up:

Rik van der Kooi, corporate vice president of Microsoft Search Advertising, referred to Bing as an “intelligence fabric” that has been embedded into Windows 10, Cortana, Xbox and other products, including Hololens. He went on to say the future Bing will be personal and pervasive, offering so personal an experience that it “might not be obvious users are even interacting with the search engine.”

I think I understand. Microsoft is everywhere. Microsoft Bing is embedded. Therefore, Microsoft beats Google Web search.

Great thinking.

I do like this passage:

This is a bold call considering that Google owned 89.38% of the global desktop search engine market, while Microsoft owned 4.2% as of July 2016, according to data provided by Statista. With MSFT’s endeavors to create an integrated ecosystem, however, the long-term scale is tipping in the favor of Microsoft stock. That’s because Microsoft’s traditional business is entrenched into many people’s lives as well as business operations. For instance, the majority of desktop devices run on Windows.

Yep, there are lots of desktops still. However, there are more mobile devices. If I am not mistaken, Google’s Android runs more than 80 percent of these devices. Add desktop and mobile and what do you get? No dominance of Web search by Bing the way I understand the situation.

Sure, I love the Bing thing. I have some affection for Qwant.com, Yandex.com, and Inxight.com too. But Microsoft has yet to demonstrate that it can deliver a Web search system which is able to change the behaviors of today’s users. Look at Google in the word processing space: Microsoft continues to have an edge, and Google has been trying for more than a decade to make Word an afterthought. That hasn’t happened. Inertia is a big factor.

Search for growing market share on Bing. What’s that answer look like? Less than five percent of the Web search market? Oh, do that query on Google by the way.

Stephen E Arnold, November 23, 2016

Writing That Is Never Read

November 23, 2016

It is inevitable that in college you were forced to write an essay. Writing an essay usually requires citing various sources from scholarly journals. As you perused the academic articles, the thought probably crossed your mind: who ever reads this stuff? Smithsonian Magazine tells us who in the article “Academics Write Papers Arguing Over How Many People Read (And Cite) Their Papers.” In other words: the academics themselves.

Academic articles, the study’s authors write, are read mostly by their authors, journal editors, and students forced to cite them for assignments. In perfect scholarly fashion, many academics do not believe that their work has a limited scope. So what do they do? They write about it, and they have done so for twenty years.

Most academics are not surprised that most written works go unread. The common belief is that it is better to publish something rather than nothing, and publishing may also be a requirement for keeping one’s position. As they are prone to do, academics complain about the numbers and their accuracy:

It seems like this should be an easy question to answer: all you have to do is count the number of citations each paper has. But it’s harder than you might think. There are entire papers themselves dedicated to figuring out how to do this efficiently and accurately. The point of the 2007 paper wasn’t to assert that 50 percent of studies are unread. It was actually about citation analysis and the ways that the internet is letting academics see more accurately who is reading and citing their papers. “Since the turn of the century, dozens of databases such as Scopus and Google Scholar have appeared, which allow the citation patterns of academic papers to be studied with unprecedented speed and ease,” the paper’s authors wrote.

Academics always need something to argue about, no matter how minuscule the topic. This particular article concludes on the note that someone should get the numbers straight so academics can move on to another item to argue about. Going back to the original thought, a student forced to write an essay with citations probably also thought: the reason this stuff does not get read is that it is so boring.

Whitney Grace, November 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Exit Shakespeare, for He Had a Coauthor

November 22, 2016

Shakespeare is regarded as the greatest writer in the English language. Many studies, however, are devoted to the theory that he did not pen all of his plays and poems. Some attribute them to Francis Bacon, Edward de Vere, Christopher Marlowe, and others. Whether Shakespeare was a singular author or one of many, two facts remain: he was a dirty old man, and it could be said he plagiarized his ideas from other writers. Shall he still be regarded as the figurehead for English literature?

Philly.com takes up the Shakespeare authorship question in the article “Penn Engineers Use Big Data To Show Shakespeare Had Coauthor On ‘Henry VI’ Plays.” Editors of a new edition of Shakespeare’s complete works listed Marlowe as a coauthor on the Henry VI plays due to a recent study at the University of Pennsylvania. Alejandro Ribeiro realized that his experience researching networks could be applied to the Shakespeare authorship question using big data.

Ribeiro learned that Henry VI was among the works for which scholars thought Shakespeare might have had a co-author, so he and lab members Santiago Segarra and Mark Eisen tackled the question with the tools of big data.  Working with Shakespeare expert Gabriel Egan of De Montfort University in Leicester, England, they analyzed the proximity of certain target words in the playwright’s works, developing a statistical fingerprint that could be compared with those of other authors from his era.

Two other research groups reached the same conclusion with other analytical techniques. The results from all three studies were enough to convince Gary Taylor, the lead general editor of the New Oxford Shakespeare, who decided to list Marlowe as a coauthor of Henry VI. More research has been conducted to identify other potential Shakespeare coauthors, and six more writers will also be credited in the New Oxford editions.

Ribeiro and his team created “word-adjacency networks” that uncovered patterns in the writing styles of Shakespeare and six other dramatists. They discovered that many scenes in Henry VI were not written in Shakespeare’s style, enough to indicate a coauthor.
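The fingerprinting idea can be sketched compactly: count how often one function word follows another within a short window, normalize the counts, and compare the resulting profiles. A toy Python version (the tiny function word list and the L1 distance are simplifications; the Penn study used roughly a hundred function words and relative entropy between Markov chains):

    from collections import defaultdict

    FUNCTION_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is"}  # toy list

    def adjacency_profile(text, window=5):
        """Relative frequency of ordered function-word pairs within the window."""
        words = [w.strip(".,;:!?'\"").lower() for w in text.split()]
        counts = defaultdict(float)
        for i, w1 in enumerate(words):
            if w1 not in FUNCTION_WORDS:
                continue
            for w2 in words[i + 1 : i + 1 + window]:
                if w2 in FUNCTION_WORDS:
                    counts[(w1, w2)] += 1.0
        total = sum(counts.values()) or 1.0
        return {pair: c / total for pair, c in counts.items()}

    def distance(p, q):
        """L1 distance between profiles; smaller = closer stylistic fingerprint."""
        return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

    known = adjacency_profile("The quality of mercy is not strained ...")
    disputed = adjacency_profile("And now the fearful time is come to pass ...")
    print(distance(known, disputed))

In practice the profiles come from long runs of securely attributed scenes, so a disputed scene can be scored against each candidate author’s fingerprint.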

Some Shakespeare purists remain set against the theory that Shakespeare did not pen all of his plays, but big data analytics lends support to many of the theories that other academics have advanced for generations. The dirty old man was not alone as he wrote his ditties.

Whitney Grace, November 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
