January 23, 2017
Twitter is making news again. The company sold some tools to the Google. Google, wisely Beyond Search thinks, has not yet built up the gumption to buy the whole Twitter enchilada. And Twitter continues to annoy some professionals who use Twitter data to figure out the who, what, and why of certain illegal activities.
A London, Ont., data mining company has been banned from Twitter and is being reviewed by Facebook for selling surveillance software to North American police services to monitor people at Black Lives Matter events and other public protests.
The company in question is Media Sonar, one of a number of firms which developed tools to make sense of messages and metadata generated by the folks who send information via Twitter “tweets”. Another example of a social media analysis outfit is Geofeedia, which has been given a bloody nose by spasmodic Silicon Valley wizards.
The write up reports:
Media Sonar did not return calls to CBC News but its website states that it works to help clients analyze the sentiment of social media posts and can use location-based data to monitor threats.
Beyond Search believes that some high flying Silicon Valley companies develop systems and do not think about how these systems will be used. Then when the high flying Silicon Valley executives realize that their whizzy new creation has some interesting applications, the Twitter-type outfits take action. The approach is fascinating to watch.
On one hand, Twitter is struggling to develop its user base and get some sizzle back. On the other hand, the company is selling off grandma’s furniture and turning off revenue from licensees of the Twitter content stream.
Interesting stuff. Chaos monkeys in real life? Seems like it.
Stephen E Arnold, January 23, 2017
January 23, 2017
Yep, indexing is back. The cacaphone “ontology” is the next big thing yet again. Folks, an ontology is a form of metadata. There are key words, categories, and classifications. Whipping these puppies into shape has been the thankless task of specialists for hundreds if not thousands of years. “What Is an Ontology and Why Do I Want One?” tries to make indexing more alluring. When an enterprise search system delivers results which miss the user’s information need or are just plain wrong, it is time for indexing. The problem is that machine based indexing requires some well informed humans to keep the system on point. Consider Palantir Gotham. Content finds its way into the system when a human performs certain tasks. Some of these tasks involve riding herd on the indexing of the content object. IBM Analyst’s Notebook and many other next generation information access systems work hand in glove with expensive humans. Why? Smart software is still only sort of smart.
The write up dances around the need for spending money on indexing. The write up prefers to confuse a person who just wants to locate the answer to a business related question without pointing, clicking, and doing high school research paper dog work. I noted this passage:
Think of an ontology as another way to classify content (like a taxonomy) that allows you to identify what the content is about and how it relates to other types of content.
Okay, but enterprise search generally falls short of the mark for 55 to 70 percent of a search system’s users. This is a downer. What makes enterprise search better? An ontology. But without the cost and time metrics, the yap about better indexing ends up with “smart content” companies looking confused when their licenses are not renewed.
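The quoted definition, an ontology as a taxonomy plus a record of how content relates to other content, can be made concrete with a toy example. The Python sketch below is hypothetical: the concepts, the relation names (`is_a`, `issued_to`, `references`), and the helper functions are invented for illustration and do not come from the write up or from any product. It stores subject, relation, object triples; a plain taxonomy would keep only the `is_a` edges.

```python
from collections import defaultdict

# Hypothetical toy ontology stored as (subject, relation, object) triples.
# A plain taxonomy would keep only the "is_a" edges; the other relations
# record how one kind of content relates to another.
TRIPLES = [
    ("purchase order", "is_a", "financial document"),
    ("invoice", "is_a", "financial document"),
    ("purchase order", "issued_to", "client"),
    ("invoice", "references", "purchase order"),
]


def build_graph(triples):
    """Map each subject concept to its outgoing (relation, object) pairs."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def taxonomy_parents(graph, concept):
    """The is_a edges only: what a taxonomy alone would record."""
    return [obj for relation, obj in graph.get(concept, []) if relation == "is_a"]


def related(graph, concept):
    """Every other edge: the extra links an ontology adds."""
    return [(relation, obj) for relation, obj in graph.get(concept, []) if relation != "is_a"]


graph = build_graph(TRIPLES)
print(taxonomy_parents(graph, "invoice"))
print(related(graph, "invoice"))
```

Note that someone still has to decide that an invoice “references” a purchase order. That human judgment is the indexing cost the write up glosses over.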
What I found amusing about the write up is its assertion that using an ontology improves search engine optimization. How about some hard data? The write up presents generalities instead of numbers one can examine and attempt to verify.
SEO means getting found when a user runs a query. That does not work too well for general purpose Web search systems like Google. SEO is struggling to deal with declining traffic to many Web sites and the problem mobile search presents.
But in an organization, SEO is not what the user wants. The user needs the purchase order for a client and easy access to related data. Will an ontology deliver an actionable output? To be fair, different types of metadata are needed. An ontology is one such type, but there are others. Some of these can be extracted without too high an error rate when the content is processed; for example, telephone numbers. Other types of data require different processes, which can require knitting together different systems.
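The telephone number example is the easy case. Here is a minimal sketch, assuming North American numbers written with separators such as “(502) 555-0142” or “502.555.0142”; the pattern and the `extract_phone_numbers` name are hypothetical and are not taken from any indexing system mentioned here.

```python
import re

# Hypothetical pattern for North American numbers with separators:
# "(502) 555-0142", "502-555-0142", "502.555.0142".
# The lookarounds stop matches inside longer digit runs such as order numbers.
PHONE_PATTERN = re.compile(r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-](\d{4})(?!\d)")


def extract_phone_numbers(text: str) -> list[str]:
    """Return phone numbers found in text, normalized to NNN-NNN-NNNN."""
    return ["-".join(match.groups()) for match in PHONE_PATTERN.finditer(text)]


sample = "Call (502) 555-0142 or 502.555.0199 about PO 4410023987."
print(extract_phone_numbers(sample))
```

The error rate shows up at the edges. An unseparated ten digit string or an international number slips past this pattern, and loosening the pattern to catch them starts pulling in purchase order numbers.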
To build a bubble gum card, one needs to parse a range of data, including images and content from many sources. In most organizations, silos of data persist and will continue to persist. Money is tight. Few commercial enterprises can afford to do the computationally intensive content processing under the watchful eye and informed mind of an indexing professional.
Cacaphones like “ontology” exacerbate the confusion about indexing and delivering useful outputs to users who don’t know a Boolean operator from a SQL expression.
Indexing is a useful term. Why not use it?
Stephen E Arnold, January 23, 2017
January 23, 2017
Recently I was speaking with someone and the conversation turned to libraries. I complimented the library’s collection in his hometown and he asked, “You mean they still have a library?” This response told me a couple of things: one, that this person was not a reader, and two, that he did not know the value of a library. The Lucidea blog asked, “Do The Original 5 Laws Of Library Science Hold Up In A Digital World?” and apparently they still do.
In 1931, long before computers dominated information and research, S.R. Ranganathan wrote five principles of library science. The post examines how the laws are still relevant. The first law states that books are meant to be used, meaning that information is meant to be used and shared. The biggest point of this rule is accessibility, which is still extremely relevant. The second law states, “Every reader his/her book,” meaning that libraries serve diverse groups and deliver non-biased services. That still fits, considering the expansion of knowledge dissemination and how many people access it.
The third law is also still important:
Dr. Ranganathan believed that a library system must devise and offer many methods to “ensure that each item finds its appropriate reader”. The third law, “every book his/her reader,” can be interpreted to mean that every knowledge resource is useful to an individual or individuals, no matter how specialized and no matter how small the audience may be. Library science was, and arguably still is, at the forefront of using computers to make information accessible.
The fourth law is “save time for the reader,” and it refers to being able to find and access information quickly and easily. Search engines, anyone? Finally, the fifth law states that “the library is a growing organism.” It is easy to interpret this law. As technology and information access change, the library must constantly evolve to serve people and help them harness the information.
The wording is a little outdated, but the five laws are still important. However, we also need to consider how people’s use of the library has changed.
Whitney Grace, January 23, 2017
January 23, 2017
The article titled Microsoft Launches Researcher and Editor in Word, Zoom in PowerPoint on VentureBeat discusses the pros and cons of the new features coming to Office products. Editor is basically a new and improved version of spellcheck that goes beyond typos to report back on wordiness, passive voice, and cliché usage. This is an exciting tool that might put a few proofreaders out of work, but it is hard to see any issues beyond that. The more controversial introduction by Microsoft is Researcher, and the article explains why:
Researcher… will give users a way to find and incorporate additional information from outside sources. This makes it easy to add a quote and even generate proper academic citations for use in papers. Explicit content won’t appear in search results, so you won’t accidentally import it into your work. And you won’t find yourself in some random Wikipedia rabbit hole, because the search for additional information happens in a panel on the right side of your Word document.
Researcher pulls information from the Bing Knowledge Graph to provide writers with relevant connections to their topics. The question is, will users rely on Researcher to fact-check for them, or will they make sure that the suggested source material is appropriate and substantiated? In spite of the lessons of the Republican National Convention, plagiarism can get you into big trouble (in a college classroom, anyway). It is easy to see student users failing to properly cite or quote the suggested information, unless Researcher also offers help in those activities. Is this a good thing, or is it another way to make our children dumber by enabling shortcuts?
Chelsea Kerwin, January 23, 2017
January 21, 2017
Beyond Search read a short but interesting “news” item titled “Yahoo Japan is Refusing to Stop the Sale of Ivory on Its Website.” As with other Internet news items, we believe everything we read online. Yahoo, according to the write up, is selling ivory. The write up points out:
“Even Marissa Mayer, CEO of Yahoo, has tried to stop the trade — but the business argues that so long as no laws are broken, people should be able to trade whatever it wants on the site.”
We love the “even.”
A Yahoo Japan person, quoted anonymously in the write up, allegedly says:
“We want to provide an internet auction site where people can trade freely, and at this moment we have no intention of banning legal trading without any reason,” a spokesman for Yahoo Japan said. “We don’t believe the ivory sales contribute to a fall in elephant numbers.”
US Yahoo, I learned:
bans the sale of endangered animal products, says it can’t force Yahoo Japan to change. Mayer has not publicly addressed the issue, though she has let it be known that she has raised concerns internally.
The tireless warriorette, Marissa Mayer, “has met up dozens of times with Yahoo Japan on this issue.” Meeting up is easy because US Yahoo owns more than 35 percent of Yahoo Japan.
Well, Yahoo is trying, using the same management methods which may have contributed to the loss of users’ credentials. Trying. Yes, Ms. Mayer is trying.
Stephen E Arnold, January 21, 2017
January 20, 2017
I read “MongoDB Hackers Set Sights on ElasticSearch Servers with Widespread Ransomware Attacks.” According to the write up, more than 2,400 ElasticSearch services were “affected by ransomware in three days.”
“Attackers are finding open servers where there is no authentication at all. This can be done via a number of services and tools. Unfortunately, system admins and developers have been leaving these unauthenticated systems online for a while and attackers are just picking off the low hanging fruit right now.”
The write up explained:
ElasticSearch is a Java-based search engine, commonly used by enterprises for information cataloguing and data analysis.
What’s the remediation? One can pay the ransom. We suggest that Elastic cloud users read the documentation and implement the features appropriate for their use case.
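For administrators, a first check is whether a node combines an address reachable from outside the machine with no authentication, which is the situation the quoted researcher describes. The Python sketch below audits a dictionary of settings as they might be parsed from `elasticsearch.yml`. `xpack.security.enabled` and `network.host` are real Elasticsearch settings, but their defaults and availability vary by release (in the 5.x releases current in early 2017, security came from the separately licensed X-Pack). The function name, the loopback list, and the choice to treat a missing security key as “off” are assumptions for illustration.

```python
# Hypothetical audit of settings parsed from elasticsearch.yml (for example
# with a YAML loader). Setting names are real; defaults differ by release,
# so a missing xpack.security.enabled is treated as "off" to be cautious.
LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost", "_local_"}


def audit_settings(settings: dict) -> list[str]:
    """Return warnings for settings that could expose an unauthenticated node."""
    warnings = []

    # YAML may yield a bool or a string; normalize before comparing.
    security = str(settings.get("xpack.security.enabled", False)).lower()
    if security != "true":
        warnings.append("xpack.security.enabled is not true: requests need no credentials")

    # network.host may be one address or a list; "_local_" (loopback) is assumed as the default here.
    hosts = settings.get("network.host", "_local_")
    if isinstance(hosts, str):
        hosts = [hosts]
    exposed = [host for host in hosts if host not in LOOPBACK_HOSTS]
    if exposed:
        warnings.append("network.host binds non-loopback address(es): " + ", ".join(exposed))

    return warnings


print(audit_settings({"network.host": "0.0.0.0"}))
```

Passing both checks is not a security review; TLS, firewall rules, and backups are outside this sketch.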
Stephen E Arnold, January 20, 2017
January 20, 2017
The Los Angeles Times published “A Look at the 17 Agencies That Make Up the U.S. Intelligence Community.” My hunch is that the “real” journalists thought that the list would be “real” news. I scanned the information and noted:
- No useful URLs were provided
- Where to track funding and new project announcements was not included
- Specific information about the objectives of each entity was omitted
- The sub entities associated with the principal intelligence entity (for example, the Strategic Capabilities Office) were not identified
What is the list? Well, if a small outfit in Orange County wants to sell its products and services to the US government’s “intelligence” entities, the list provides a starting point for research.
The article could have become a useful way to stimulate outfits not participating in these agencies’ projects to get the ball rolling. The write up contains one useful thing: a list of agencies which blurs the role of the Department of Defense and omits some interesting entities:
Air Force Intelligence, Surveillance and Reconnaissance
Army Military Intelligence
Central Intelligence Agency
Coast Guard Intelligence
Defense Intelligence Agency
Drug Enforcement Administration, Office of National Security Intelligence
Energy Department, Office of Intelligence and Counterintelligence
Federal Bureau of Investigation
Homeland Security, Office of Intelligence and Analysis
Marine Corps Intelligence
National Geospatial Intelligence Agency
National Reconnaissance Office
National Security Agency
Office of Naval Intelligence
Office of the Director of National Intelligence
State Department, Bureau of Intelligence and Research
Treasury Department, Office of Intelligence and Analysis
My hunch is that the “real” newspaper is revealing the vapidity of its editorial method. But, hey, I live in rural Kentucky and don’t understand the ways of the big city folks.
Stephen E Arnold, January 20, 2017
January 20, 2017
Reading Search Engine Journal’s “The Evolution Of Semantic Search And Why Content Is Still King” brought to mind how RankBrain is changing the way Google ranks search results for relevance. The article was written in 2014, but it stresses the importance of semantic search and SEO. With RankBrain, semantic search is now a daily occurrence rather than something to strive for.
RankBrain also demonstrates how far search technology has come in three years. When people search, they no longer want to fish out the keywords from their query; instead they enter an entire question and expect the search engine to understand.
This brings up the question: is content still king? Back in 2014, the answer was yes and the answer is a giant YES now. With RankBrain learning the context behind queries, well-written content is what will drive search engine ranking:
What it boils to is search engines and their complex algorithms are trying to recognize quality over fluff. Sure, search engine optimization will make you more visible, but content is what will keep people coming back for more. You can safely say content will become a company asset because a company’s primary goal is to give value to their audience.
The article ends with something about natural language and how people want their content to reflect it. The article does not provide anything new, but does restate the value of content over fluff. What will happen when computers learn how to create semantic content, however?
Whitney Grace, January 20, 2017
January 20, 2017
Just a quick honk about a little Google feature called Popular Times. LifeHacker points out an improvement to the tool in, “Google Will Now Show You How Busy a Business Is in Real Time.” To help users determine the most efficient time to shop or dine, the feature already provided a general assessment of businesses’ busiest times. Now, though, it bases that information on real-time metrics. Writer Thorin Klosowski specifies:
The real time data is rolling out starting today. You’ll see that it’s active if you see a ‘Live’ box next to the popular times when you search for a business. The data is based on location data and search terms, so it’s not perfect, but will at least give you a decent idea of whether or not you’ll easily find a place to sit at a bar or how packed a store might be. Alongside the real-time data comes some other info, including how long people stay at a location on average and hours by department, which is handy when a department like a pharmacy or deli close earlier the rest of a store.
Just one more way Google tries to make life a little easier for its users. That using it provides Google with even more free, valuable data is just a side effect, I’m sure.
Cynthia Murrell, January 20, 2017
January 19, 2017
I believe everything I read on the Internet. I am so superficial. Perhaps I am the most superficial person living in rural Kentucky. The write up “The Google-Facebook Online Ad Cartel is the Biggest Competition Problem” seems to be the work of a person who specializes in future Internet competition. He has worked for presidents and written op-eds for “real” journalistic outfits. I am convinced… almost.
The main point of the write up is that Facebook and Google operate as a cartel. I highlighted this statement:
Google commands ~90% market share of mobile search and search advertising. It protects those monopolies with an anti-competitive moat around Alphabet-Google by cross-subsidizing the global offering over 200 expensive-to-create, products and services for free, i.e. dramatically below Google’s total costs. Those many expensive subsidized products and services make Google’s moat competitively impregnable, because no competitor could afford to recreate them without a highly profitable online ad business, and the Goobook ad cartel forecloses that very competitive possibility.
The statement echoes Chaos Monkeys, the tell-all about the high flying world of Silicon Valley.
I also noted:
In early 2013, Facebook launched its alternative to Google search, called “Facebook Graph Search” in partnership with Microsoft’s Bing search engine. Then in 2014, Google and Facebook obviously, abruptly, and relatively quietly, chose to no longer directly compete with one another. In the first half of 2014, Google reversed course in social, defunding Google+, ending its forced integration, and announcing the shutdown of Orkut, Google’s 300 million user social network. In the second half of 2014, Facebook quietly dropped its Facebook Graph Search alternative to Google search and its search partnership with Microsoft’s Bing.
One consequence is:
Goobook’s customers – advertisers — pay higher ad prices and have less cohesive and effective ad campaigns under the Goobook ad cartel than they would have if Google and Facebook continued to compete. No material competition to keep them honest, also means Google and Facebook can avoid third party accountability for the core advertising activity metrics that they use to charge for their ad services.
The net net is that US laws and policies:
favors free-content models over paid content models, ultimately produces monopolies and monopolies colluding in cartel behaviors that are hostile to property rights. Monopsonies [sic] de facto forcing property owners to offer their property for sale at a wholesale price at zero, is anti-competitive and predatory. Free is not a price, it’s a subsidy or a loss.
No monopoly word. The cartel word is the moniker for these two esteemed outfits grouped under the neologism “Goobook.” WWTD? Oh, that means “What will Trump do?” Perhaps the Trump White House will retain the author as a policy adviser for cartels?
Stephen E Arnold, January 19, 2017