August 25, 2016
The article titled Natural Language Processing: Turning Words Into Data on B2C takes an in-depth look at NLP and why it is such a difficult area to perfect. Anyone who has conversed with an automated customer service system knows that NLP technology is far from ideal. Why is this? The article suggests that while computers are great at learning the basic rules of language, things get far more complex when you throw in context-dependent or ambiguous language, not to mention human error. The article explains,
“This has changed with the advent of machine learning…In the case of NLP, using a real-world data set lets the computer and machine learning expert create algorithms that better capture how language is actually used in the real world, rather than on how the rules of syntax and grammar say it should be used. This allows computers to devise more sophisticated—and more accurate—models than would be possible solely using a static set of instructions from human developers.”
Throw in Big Data and we have a treasure trove of unstructured data to glean value from in the form of text messages, emails, and social media. The article lists several exciting applications such as automatic translation, automatic summarization, Natural Language Generation, and sentiment analysis.
Chelsea Kerwin, August 25, 2016
August 24, 2016
While science fiction portrays artificial intelligence in novel and far-reaching ways, certain products utilizing artificial intelligence are already in existence. WinBeta released a story, Microsoft exec at London conference: AI will “change everything”, which reminds us of this. Digital assistants like Cortana and Siri are one example of how mundane AI can appear. However, during a recent AI conference, Microsoft UK’s chief envisioning officer Dave Coplin projected much more impactful applications. This article summarizes the landscape of concerns,
Of course, many also are suspect about the promise of artificial intelligence and worry about its impact on everyday life or even its misuse by malevolent actors. Stephen Hawking has worried AI could be an existential threat and Tesla CEO Elon Musk has gone on to create an open source AI after worrying about its misuse. In his statements, Coplin also stressed that as more and more companies try to create AI, ‘We’ve got to start to make some decisions about whether the right people are making these algorithms.’
There is much to consider with regard to artificial intelligence. However, such a statement about “the right people” cannot stop there. Coplin goes on to refer to the biases of people creating algorithms and the companies they work for. Because organizational structures must also be considered, so too must their motivator: the economy. Perhaps using machine learning to understand the best way to approach AI would be a good first application.
August 23, 2016
After several tests, the fourth HonkinNews video is available on YouTube. You can view the six minute video at https://youtu.be/AIYdu54p2Mg. The HonkinNews highlights a half dozen stories from the previous week’s Beyond Search stream. The commentary adds a tiny twist to most of the stories. We know that search and content processing are not the core interests of the millennials. We don’t expect to attract much of a following from teens or from “real” search experts. Nevertheless, we will continue with the weekly news program because Google has an appetite for videos. We will continue with the backwoods theme and the 16 mm black and white film. We think it adds a high tech look to endless recycling of search and content jargon which fuels information access today.
Kenny Toth, August 23, 2016
August 22, 2016
I learned about the Ami search system called Albert a decade ago. My notes indicated that at that time the company was Swiss but had strong ties to France. Not surprisingly, when Ami’s market momentum dictated a sale, a French company stepped forward and bought Ami and its happy face identity:
Bertin Technologies has integrated Ami Albert into its market intelligence suite. Search appears to be a utility function. The company says that it is “a publisher and integrator of cutting edge software solutions.” The company offers cyber security, digital intelligence, and speech processing.
According to the deal description on the Bertin Web site:
The ability to offer Market Intelligence and Risk Intelligence sees the creation of a key player in Web Content Mining, whose international outlook is supported by an industrial group with a presence in 15 countries.
Ami, a search vendor, morphed into a market intelligence company. When the deal was announced in mid-2015, Ami had 150 clients. The company operated via two subsidiaries in the UK and Morocco. The unique value of Ami comes from Bertin’s capabilities.
In 2006, Ami counted LexisNexis, Sinequa, Lingway, and itself via the Go Albert unit as “partners.”
The company’s search interface looked like this before Ami pivoted to content scraping and “market intelligence.”
Search results looked like this:
Ami emphasized that it could perform metasearch functions; that is, take a user’s query and send it to different systems with individual search interfaces. Here’s how Ami presented this idea to prospective customers:
Ami also emulated the analytic report methods found in i2 Analyst’s Notebook and Palantir Technologies, among others.
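The metasearch function mentioned above can be pictured with a small sketch. This is not Ami's implementation, which the source does not describe. The backend functions, the example URLs, and the reciprocal-rank merging rule are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical backend type: takes a query string, returns a ranked list of URLs.
Backend = Callable[[str], List[str]]

def metasearch(query: str, backends: Dict[str, Backend]) -> List[str]:
    """Send one query to every backend and merge the ranked lists.

    Merging rule (an assumption, not Ami's method): each result earns
    1/rank from every backend that returns it; higher totals sort first.
    """
    scores: Dict[str, float] = {}
    for search in backends.values():
        for rank, url in enumerate(search(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by score descending, then by URL so ties are deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Stand-in engines with canned results (hypothetical data).
def engine_a(query: str) -> List[str]:
    return ["a.example/1", "shared.example/x"]

def engine_b(query: str) -> List[str]:
    return ["shared.example/x", "b.example/2"]

merged = metasearch("market intelligence", {"a": engine_a, "b": engine_b})
print(merged)
```

A result returned by both engines rises to the top because it collects score from each list. A real metasearch system would also have to handle timeouts, duplicate URLs in different forms, and each system's own query syntax.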
No details about the terms of the deal were announced. I did not include Ami Albert in any of the Enterprise Search Report profiles I created. The company seemed to be focused on building traction in Europe, not the US. In retrospect, Ami’s trajectory is similar to many other search vendors’. The company enters the market, moves forward for ten years, and then sells. A new owner is probably a better fate than locking the doors and turning off the lights.
Stephen E Arnold, August 22, 2016
August 22, 2016
Recent news has made clear that online content from the U.S., or any country foreign to China, faces challenges in China. CNN Money recently published an article titled Microsoft is giving up on its Chinese web portal. This piece informs us that Microsoft will sunset its MSN website in China on June 7. In its company statement, Microsoft says its commitment to China remains and notes that China is home to its largest R&D facility outside the U.S. An antitrust investigation of Microsoft in China has been underway since July 2014. The article shares an overview of the bigger picture,
“The company’s search engine, Bing, also flopped in the country amid tough competition with homegrown rivals. It didn’t help that in Chinese, ‘Bing’ sounds similar to the word for ‘sickness.’
Other Western tech firms have come under scrutiny in China before, including Qualcomm (QCOM, Tech30) and Apple (AAPL, Tech30). Social networks like Facebook (FB, Tech30) and Google (GOOG) remain blocked in the country.”
It looks like Bing will bite the dust soon, in China at least. Does this news mean anything for Microsoft as a company? While regulations in China are notably stringent, the size of its population makes for a sizable market. We will be watching to see how search plays out in China.
Megan Feil, August 22, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden/Dark Web meet up on August 23, 2016. Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/
August 19, 2016
Has the next Ashley Madison incident happened? International Business Times reports on breached information that has surfaced on the Dark Web. The article, Fling.com breach: Passwords and sexual preferences of 40 million users up for sale on dark web, sheds some light on what happened with the alleged 40 million records posted on The Real Deal marketplace. One source claims the leaked data was old information. Another source reports a victim who says they never had an account with Fling.com. The article states,
“The leak is the latest in a long line of dating websites being targeted by hackers and follows similar incidents at Ashley Madison, Mate1, BeautifulPeople and Adult Friend Finder. In each of these cases, hundreds of thousands – if not millions – of sensitive records were compromised. While in the case of Ashley Madison alone, the release of information had severe consequences – including blackmail attempts, high-profile resignations, and even suicide. Despite claims the data is five years old, any users of Fling.com are now advised to change their passwords in order to stay safe from future account exploitation.”
Many are asking about the facts related to this data breach on the Dark Web — when it happened and if the records are accurate. We’re not sure if it’s true, but it is sensational. The interesting aspect of this story is in the terms of service for Fling.com. The article reveals Fling.com is released from any liability related to users’ information.
Megan Feil, August 19, 2016
August 17, 2016
I am quite skeptical about the results a free Web index presents when I look for information. I don’t want to single out any of the towering giants of Web search as spoofers, charlatans, and snake oil vendors but those ads and the quest for money are Job One.
I read “Why Poets Can Make Better Search Engines.” You may be able to access this write up for free or maybe not. A real journalistic outfit created the confection.
The idea is that search engines have to have the ability to form more poetic thoughts about what a user’s query “means.” I learned:
Artificial intelligence and machine intelligence are about decreasing the length of human perception. Google autocomplete is an attempt to shorten the time and path between thought and a response — to decrease the time and path between seeing something and categorizing it or identifying it and moving on.
The statement comes from a person working on Kensho, which is described as:
essentially a search engine for economic events and data.
Dig out your copy of Percy Bysshe Shelley’s collected poems. Use them as models for improving precision and recall in search. How exactly? Not important. As ee cummings wrote:
Unbeing dead isn’t being alive.
Stephen E Arnold, August 17, 2016
August 16, 2016
The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.
Stephen E Arnold, August 16, 2016
August 16, 2016
In an exclusive interview, Yippy’s head of enterprise search reveals that Yippy launched an enterprise search technology that Google Search Appliance users are converting to now that Google is sunsetting its GSA products.
Yippy also has its sights set on the rest of the high-growth market for cloud-based enterprise search. Not familiar with Yippy, its IBM tie up, and its implementation of the Velocity search and clustering technology? Yippy’s Michael Cizmar gives some insight into this company’s search-and-retrieval vision.
Yippy (OTC PINK: YIPI) is a publicly traded company providing search, content processing, and engineering services. The company’s catchphrase is, “Welcome to your data.”
The core technology is the Velocity system, developed by Carnegie Mellon computer scientists. Yippy had obtained rights to the Velocity technology before IBM acquired Vivisimo. I learned from my interview with Mr. Cizmar that IBM is one of the largest shareholders in Yippy. Other facets of the deal included some IBM Watson technology.
This year (2016) Yippy purchased one of the most recognized firms supporting the now-discontinued Google Search Appliance. Yippy has been tallying important accounts and expanding its service array.
Michael Cizmar, Yippy’s senior manager for enterprise search
Beyond Search interviewed Michael Cizmar, the head of Yippy’s enterprise search division. Cizmar founded MC+A and built a thriving business around the Google Search Appliance. Google stepped away from on-premises hardware, and Yippy seized the opportunity to bolster its expanding business.
I spoke with Cizmar on August 15, 2016. The interview revealed a number of little known facts about a company which is gaining success in the enterprise information market.
Cizmar told me that when the Google Search Appliance was discontinued, he realized that the Yippy technology could fill the void and offer more effective enterprise findability. He said, “When Yippy and I began to talk about Google’s abandoning the GSA, I realized that by teaming up with Yippy, we could fill the void left by Google, and in fact, we could surpass Google’s capabilities.”
Cizmar described the advantages of the Yippy approach to enterprise search this way:
We have an enterprise-proven search core. The Vivisimo engineers leapfrogged the technology dating from the 1990s which forms much of Autonomy IDOL, Endeca, and even Google’s search. We have the connector libraries that we acquired from Muse Global. We have used the security experience gained via the Google Search Appliance deployments and integration projects to give Yippy what we call “field level security.” Users see only the part of content they are authorized to view. Also, we have methodologies and processes to allow quick, hassle-free deployments in commercial enterprises to permit public access, private access, and hybrid or mixed system access situations.
With the buzz about open source, I wanted to know where Yippy fit into the world of Lucene, Solr, and the other enterprise software solutions. Cizmar said:
I think the customers are looking for vendors who can meet their needs, particularly with security and smooth deployment. In a couple of years, most search vendors will be using an approach similar to ours. Right now, however, I think we have an advantage because we can perform the work directly….Open source search systems do not have Yippy-like content intake or content ingestion frameworks. Importing text or an Oracle table is easy. Acquiring large volumes of diverse content continues to be an issue for many search and content processing systems…. Most competitors are beginning to offer cloud solutions. We have cloud options for our services. A customer picks an approach, and we have the mechanism in place to deploy in a matter of a day or two.
Connecting to different types of content is a priority at Yippy. Even though the company has a wide array of import filters and content processing components, Cizmar revealed that Yippy has “enhanced the company’s connector framework.”
I remarked that most search vendors do not have a framework, relying instead on expensive components licensed from vendors such as Oracle and Salesforce. He smiled and said, “Yes, a framework, not a widget.”
Cizmar emphasized that the Yippy, IBM, and Google connections were important to many of the company’s customers. Yippy has also acquired the Muse Global connectors and the ability to build connectors on the fly. He observed:
Nobody else has Watson Explorer powering the search, and nobody else has the Google Innovation Partner of the Year deploying the search. Everybody tries to do it. We are actually doing it.
Cizmar made an interesting side observation. He suggested that Internet search needed to be better. Is indexing the entire Internet in Yippy’s future? Cizmar smiled. He told me:
Yippy has a clear blueprint for becoming a leader in cloud computing technology.
For the full text of the interview with Yippy’s head of enterprise search, Michael Cizmar, navigate to the complete Search Wizards Speak interview. Information about Yippy is available at http://yippyinc.com/.
Stephen E Arnold, August 16, 2016
August 12, 2016
I read “The Human Cost of Tech Debt.” The write up picks up the theme about the amount of money needed to remediate engineering mistakes, bugs, and short cuts. The cost of keeping an original system in step with newer market entrants’ products adds another burden.
The write up is interesting and includes some original art. Even though the art is good, the information presented is better; for example:
For a manager, a code base high in technical debt means that feature delivery slows to a crawl, which creates a lot of frustration and awkward moments in conversation about business capability. For a developer, this frustration is even more acute. Nobody likes working with a significant handicap and being unproductive day after day, and that is exactly what this sort of codebase means for developers. Each day they go to the office knowing that it’s going to take the better part of a day to do something simple like add a checkbox to a form. They know that they’re going to have to manufacture endless explanations for why seemingly simple things take them a long time. When new developers are hired or consultants brought in, they know that they’re going to have to face confused looks, followed by those newbies trying to hide mild contempt.
My interest is search and content processing. I asked myself, “Why are search and retrieval systems better than they were in 1975?” When I queried the RECON system, I was able to find specific documents which contained information matching the terms in my query. Four decades ago, I could generate a useful result set. The bummer was that the information appeared on weird thermal printer paper. But I usually found the answer to my question in a fraction of the time required for me to run a query on my Windows machine or my Mac.
My view is that search and retrieval tends to be a recycling business. The same basic systems and methods are used again and again. The innovations are wrappers. But to make search more user friendly, add ons look at a user’s query history and behind the scenes filter the results to match the history.
The shift to mobile has been translated to providing results that other people have found useful. Want a pizza? You can find one, but if you want Cuban food in Washington, DC, you may find that the mapping service does not include a popular restaurant for reasons which may be related to advertising expenditures.
We ran a series of queries across five Dark Web search and retrieval systems. None of the systems delivered high precision and high recall results. In order to find certain large sites, manual review and one-at-a-time clicking and review were needed to locate what we were querying.
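For readers who want the two measures pinned down, here is a minimal sketch of how precision and recall are computed for a single query. The document identifiers are invented for illustration; they are not data from our Dark Web tests.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the system returns four documents,
# and three documents in the collection are actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

The catch in practice is the second argument. Knowing every relevant document requires the kind of manual review described above, which is why recall is hard to measure for Dark Web content.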
Regular Web or Dark Web. Online search has discarded useful AND, OR, NOT functions, date and time stamps, and any concern about revealing editorial or filtering postures to a user.
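The Boolean functions are not exotic. A minimal sketch, assuming a toy three-document collection and whitespace tokenization (both invented for illustration), shows AND, OR, and NOT as plain set operations over an inverted index:

```python
from collections import defaultdict

# Toy collection (hypothetical): document id -> text.
docs = {
    1: "dark web search",
    2: "enterprise search appliance",
    3: "dark chocolate",
}

def build_index(collection):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_index(docs)

def postings(term):
    """Documents containing the term; an empty set if the term is unknown."""
    return set(index.get(term.lower(), set()))

both = postings("dark") & postings("search")         # dark AND search
either = postings("dark") | postings("enterprise")   # dark OR enterprise
without = postings("search") - postings("dark")      # search NOT dark

print(both, either, without)
```

Each operator maps to one set operation, so the user can see exactly why a document was or was not returned. That transparency is what the filtered, history-driven result lists give up.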
Technological debt explains that most search outfits lack the money to deliver a Class A solution. What about the outfits with oodles of dough and plenty of programmers? The desire and need to improve search is not a management priority.
Some vendors’ mobile search operates from the vendor’s copy of the indexed sites. Easy, computationally less expensive, and good enough.
Tech debt is a partial explanation for the sad state of online search at this time.
Stephen E Arnold, August 12, 2016