Misinformation and Truth: An Issue in Play

July 6, 2015

Navigate to “Italian Newspaper Creates Fake Restaurant to Prove TripAdvisor Sucks.” The article tells the story of a real journalistic operation which created a nonexistent restaurant. Then the real journalists contributed reviews of the vaporous eatery. TripAdvisor’s algorithms sucked in the content and, according to the write up,

declared La Scaletta the best restaurant in the town, beating out another highly-regarded restaurant with over 300 reviews (most of them positive).

Ah, real journalism, truth, and the manipulation of socially-anchored systems.

Now direct your attention to “Fact Verification As Easy as Spellcheck?” The point of this article is that figuring out what’s accurate and inaccurate is nontrivial. The write up reports:

Researchers at Indiana University decided to try a different approach to the problem.  Instead of trying to build complex logic into a program, researchers proposed something simpler.  Why not try measure the likelihood of a statement being true by analyzing the proximity of its terms and the specificity of its connectors?

The procedure involves a knowledge graph. Is this the same, much loved graph approach built with the most frequently used mathematical methods? No information to answer that question is in my files, gentle reader.
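To make the quoted idea concrete, here is a minimal sketch of what "proximity of terms" and "specificity of connectors" could mean on a graph. It is not the Indiana researchers' code. It assumes an undirected graph held in a plain dictionary, and it scores a statement by the cheapest path between subject and object, where passing through a well-connected (generic) node costs more than passing through a sparsely connected (specific) one. The function names and the toy edges are made up for illustration.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def truth_score(graph, subject, obj):
    """Return a score in [0, 1]; higher means subject and obj are linked
    by a short path through specific (low-degree) intermediate nodes.

    Assumption for this sketch: the cost of passing through an
    intermediate node is log(degree of that node), and the score is
    1 / (1 + total cost of the cheapest path).
    """
    if subject not in graph or obj not in graph:
        return 0.0
    if subject == obj:
        return 1.0
    best = {subject: 0.0}
    heap = [(0.0, subject)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == obj:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            # Reaching the object itself is free; only connectors cost.
            step = 0.0 if nxt == obj else math.log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0  # no path at all


# Toy, invented graph: "Internet" is a generic hub, "Chef" is specific.
toy = build_graph([
    ("Restaurant", "Chef"), ("Chef", "Cookbook"),
    ("Restaurant", "Internet"), ("Internet", "Cookbook"),
    ("Internet", "Pong"), ("Internet", "Search engine"),
])
print(truth_score(toy, "Restaurant", "Cookbook"))  # via the specific "Chef"
print(truth_score(toy, "Restaurant", "Pong"))      # only via the generic hub
```

In this toy graph the restaurant-to-cookbook link scores higher than restaurant-to-Pong, because the first runs through a specific connector and the second only through a hub. A score built this way depends entirely on which edges someone put in the graph.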

My radar is directed at Bloomington, Indiana. Perhaps more information will become available on software’s ability to figure out if the Italian restaurant is real or the confection of real journalists. Note: The GOOG seems to be laboring in this vineyard as well. See this Bezos story.

What if, just hypothetically of course, the “truth” methods can be spoofed by procedures more sophisticated than cooking up some half-cooked tortellini? Those common numerical methods are pliable, based on my team’s research. Really flexible when it comes to what’s “truth.”

Stephen E Arnold, July 6, 2015

Bing Game: Search Has to Be Fun, Fun, Fun

July 6, 2015

Navigate to “Microsoft Put a Pong Game in Its Bing Search Engine.” Yep, when I run a query I definitely want to distract myself with a quick video game session. Doesn’t everyone 70 years old have this compelling need to lose focus and forget why one visited a search engine in the first place? No wonder Bing is just so darned wonderful. Just the other day I was looking for information about the Citadel exploit from 2011, and I ended up playing Pong. Wow, as I recall, the experience was really helpful to my work.

The write up states:

People are discovering that if you search for “pong” on the Bing site, the search results include a playable version of one of the first video games ever made. The game allows the classic digital paddles to be moved up and down with a mouse or keyboard on the PC, or via fingers on touch screen.

Let’s have more distractions to prevent me from experiencing incomplete and irrelevant results to my queries.

Stephen E Arnold, July 6, 2015

Short Honk: Renormalization Group Equation

July 6, 2015

A short honk for the math lovers: I recommend “Why Deep Learning Works 2: The Renormalization Group.” The post presents several important examples of renormalization, which is Fancy Math for figuring out variances and then figuring out what the inputs are “about.” (Math nerds, I am trying to express important concepts in a very, very simple way.) The big ideas explained pretty well are manifolds and renormalization. The methods are applicable to certain types of machine learning. A happy quack to Dr. Charles Martin for this article.

Stephen E Arnold, July 6, 2015

Google Before The SEC

July 6, 2015

Searching the Web, you can find the most amazing and obscure items, such as this little gem from the Securities and Exchange Commission’s Web site: “Schedule 14A” registered by Google Inc. Schedule 14A explains information required for an SEC proxy statement, which is given to stockholders when votes are solicited at stockholders’ meetings. This Schedule 14A lists many of the high-tech projects Google is working on to improve the lives of people. Google founder Sergey Brin supposedly writes the schedule, but more than likely it was written by an assistant and his name was signed at the end.

It opens with this brief passage:

“When Larry and I founded Google in 1998, many elements came together to make our work possible. Like other companies at the time, we benefited from the increasing power and low cost of computation and from the unprecedented shift of information to the internet. We shared a profound belief in the power of technology to make life better for people everywhere and imagined what life could be like 10, 15, 20 years down the road. Nevertheless, now that we are here, I am amazed at the progress and opportunities. For example, I could not have imagined we would be making a computer that fits in a contact lens, with the potential to make life better for millions of people with diabetes.”

It is followed by a description of the contact lens that measures glucose levels in the body. Then it goes into how Google revolutionized search and in turn delivered high-end services like email and Google Photos.

What Google piece would be complete without mentioning the self-driving cars? Autonomous cars came about through increased computation power, but at least the document does mention it will be some time before they are ready for consumers.

Google does have an impressive list of accomplishments, sure to please any stockholder. The question is, will there be anything Google will not experiment with?

Whitney Grace, July 6, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

What Watson Can Do For Your Department

July 6, 2015

The story of Justin Chen, a Finance Manager, is one of many “Stories by Role” now displayed on IBM’s Web site. Each character has a different job, such as Liza Hay from Marketing, Donny Cruz from IT, and Anisa Mirza from HR. Each job comes with a problem for which Watson, IBM’s supercomputer, has just the solution. Justin, the article relates, is having trouble deciding which payments to follow. Watson provides solutions:

“With IBM® Watson™ Analytics, Justin can ask which customers are least likely to pay, who is most likely to pay and why. He can analyze this information… [and] collect more payments more efficiently… With Watson Analytics, Justin can ask which customers are likely to leave and which are likely to stay and why. He can use the answers for analysis of customer attrition and retention, predict the effect on revenue and determine which customer investments will lead to more profitable growth.”

It seems that the now world-famous Watson has been converted from search to a basket containing any number of IBM software solutions. It isn’t stated in the article, but we can probably assume that the revenue from each solution counts toward Watson’s soon-to-be-reported billions in revenue.

Chelsea Kerwin, July 6, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Lexmark: Brainware, ISYS, and Kofax May Not Be Enough

July 5, 2015

Here I am. Sitting in the misty morn contemplating layoffs in the Louisville-Lexington region. At a Fourth of July party, the founder of a large Kentucky-based business reassured his listeners that there would be almost no layoffs as a result of the Aetna-Humana deal. I yawned.

My mind was not attending to the woes of Humana’s soon to be unemployed thousands. I was considering the news item I had just read on my trusty Blackberry Classic (right, no iPhone for me, gentle reader).

The short item was “Insider Selling: Lexmark International CFO David Reeder Sells 7,283 Shares of Stock (LXK).” Who was doing the selling? The person was David Reeder, the Lexmark chief financial officer. Perhaps Mr. Reeder has to send a child to school or must replace a cracking concrete driveway?

Lexmark beat some analyst estimates in its April 2015 quarterly statement. What’s the big deal?

The write up reports:

Several analysts have recently commented on the stock. Analysts at Goldman Sachs initiated coverage on shares of Lexmark International in a research note on Wednesday, June 17th. They set a “sell” rating and a $34.00 price target on the stock. Analysts at Zacks downgraded shares of Lexmark International from a “hold” rating to a “sell” rating in a research note on Wednesday, June 3rd. Analysts at Cross Research upgraded shares of Lexmark International from a “sell” rating to a “hold” rating and raised their price target for the stock from $36.00 to $43.00 in a research note on Thursday, May 14th. Analysts at Brean Capital reiterated a “hold” rating on shares of Lexmark International in a research note on Thursday, April 30th. Finally, analysts at TheStreet upgraded shares of Lexmark International from a “hold” rating to a “buy” rating in a research note on Tuesday, April 28th. Five analysts have rated the stock with a sell rating, four have issued a hold rating and two have assigned a buy rating to the company. The company currently has an average rating of “Hold” and an average target price of $39.29.

My question is, “Will revenues from the content processing acquisitions ignite Lexmark’s revenues and pump up the profits?” My research suggests that Lexmark may find that making big money from content centric software is no picnic on a warm sunny day.

I am rooting for the printer company, but I am a realist. Some Lexmarkians may want to keep their résumés sparkling and bright. When a CFO sells shares, I pay attention.

Stephen E Arnold, July 5, 2015

A Reminder about What Is Available to Search

July 5, 2015

Navigate to “Big Data, Big Problems: 4 Major Link Indexes Compared.” The write up explains why link indexes contain different content. The services referenced in the write up are:

  • Ahrefs. A backlink index updated every 15 minutes.
  • Majestic. A big data solution for marketers and others. The company says, “Majestic-12 has crawled the web again, and again, and again. We have seen 2.7 trillion URLs come and go, and in the last 90 days we have seen, checked, scored and categorized 715 billion URLs.”
  • Moz. Products for inbound marketers.
  • SEMrush. Search engine marketing for digital marketers.

Despite the marketing focus, there were some interesting comments based on the analysis of backlink services (who links to what). Here’s one point I highlighted:

Each organization has to create a crawl prioritization strategy.

The article points out:

The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a deficiency; this is just the nature of the beast.

Yep, editorial choice. Inclusions and exclusions. Takeaway: When you run a query, chances are you are getting biased, incomplete information for the query.
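A small sketch shows how prioritization alone produces different indexes. It assumes a toy link graph, a fixed crawl budget, and two made-up priority rules. None of this reflects how Ahrefs, Majestic, Moz, or SEMrush actually crawl; the URLs and function names are hypothetical.

```python
import heapq


def crawl(links, seeds, budget, priority):
    """Index at most `budget` pages, always taking the highest-priority
    URL from the frontier next. `links` maps a URL to the URLs it links to."""
    # heapq is a min-heap, so priorities are negated.
    frontier = [(-priority(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    indexed = []
    while frontier and len(indexed) < budget:
        _, url = heapq.heappop(frontier)
        indexed.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-priority(target), target))
    return indexed


# Invented toy web: seven pages, but each crawler can afford only three.
toy_web = {
    "home": ["news", "blog", "shop"],
    "news": ["news/a", "news/b"],
    "blog": ["blog/x"],
    "shop": ["shop/item"],
}


def prefer_short_urls(url):
    """Hypothetical rule: shorter URLs are assumed to be more important."""
    return -len(url)


def prefer_blogs(url):
    """Hypothetical rule: anything under 'blog' goes first."""
    return 1 if "blog" in url else 0


index_a = crawl(toy_web, ["home"], 3, prefer_short_urls)
index_b = crawl(toy_web, ["home"], 3, prefer_blogs)
print(index_a)
print(index_b)
print(set(index_a) ^ set(index_b))  # pages only one of the two indexes holds
```

Same web, same seed, same budget. The two indexes disagree only because the priority rule differs, and a query against either one sees only what that rule let in.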

The most important statement in the write up, in my opinion, is this one:

If anything rings true, it is that once again it makes sense to get data from as many sources as possible.

Good advice for search experts and sixth graders. Oh, MBAs may want to heed the statement as well.

But who cares? Probably not too many Internet users. Exciting when these “incomplete” information searchers make decisions.

Stephen E Arnold, July 5, 2015

Quote to Note: Google and Focus

July 5, 2015

Here’s a quote I tucked into “Great Remarks from Sillycon Valley”:

“We have a great many priorities.”—Sergey Brin, Google

You can read more about the future of the GOOG in “Where Is Google Taking Us?”

Does anyone remember Steve Jobs’s statement:

Focusing is about saying No.

I do.

Stephen E Arnold, July 5, 2015

An Oddly Mystical, Whimsical Listicle Combining Big Data and Search

July 4, 2015

Some listicles are clearly the work of college students after a tough beer pong tournament. Others seem as if they emanate from beyond Pluto’s orbit. I am not sure where on this spectrum between the addled and extraterrestrial the listicle in “Top 11 Open Source big Data Enterprise Search Software” falls.

Here’s the list for your contemplation. I have added some comments and questions after each system’s name. Consult the original write up for the explanation of the inclusion of these systems in the list. I found the write ups without much heft or “wood,” to use a Google term.

  1. Apache Solr. Yep, uses Lucene libraries, right. Performance? Exciting sometimes.
  2. Apache Lucene Core. Ah, Lego blocks for the engineer with some aspirations for continuous employment.
  3. Elasticsearch. The leader in search and retrieval. To do big data, there are some other components required. Make sure your programming and engineering expertise are up to the job.
  4. Sphinx. Okay, workable for structured data. Work required to stuff unstructured content into this system.
  5. Constellio. Isn’t this a part time project of a consulting firm focused on Canadian government work?
  6. DataparkSearch Engine. Yikes.
  7. ApexKB. Okay, a script. For enterprise applications. Big Data? Wow.
  8. Searchdaimon ES. Useful, speedier than either Lucene or Elasticsearch. Not a big data engine without some extra work. Come to think of it. A lot of work.
  9. mnoGoSearch. Well, maybe for text.
  10. Nutch. Long in the tooth. Why not use Lucene?
  11. Xapian. Very robust. Make certain that you have programming expertise and engineering knowledge. Often ignored which is too bad. But be prepared for some heavy lifting or paying a wizard with a mental fork lift to do the job.

Now which of these systems can do “big data”? In one sense, if you are exceptionally gifted with engineering and programming skills, I suppose any of these can do tricks. As Samuel Johnson allegedly observed to his biographer:

“Sir, a woman’s preaching is like a dog’s walking on his hind legs. It is not done well; but you are surprised to find it done at all.”

On the other hand, these programs can be used as a utility within a more robust content processing system which has been purpose built to deal with large flows of structured and unstructured content. But even that takes work.

Anyone want to give Constellio a shot at processing real time Facebook posts? Anyone want to use any of these systems to solve that type of search problem? Show of hands, please?

Stephen E Arnold, July 4, 2015

Google: Clogging my Alerts with Negative Semantics

July 4, 2015

I hit the computer for a few moments when I was jarred from slumber by my neighbors setting off heavy ordnance. Ah, rural Kentucky.

Two stories perched at the top of my “Read this” list.

The first is “Google Just Suffered Three new Attacks against Its Already Trouble Core Business.” I am not sure what the GOOG has done to get up the nose of some online publications, but the story is loaded for bear. The headline suggests that the Google has a troubled business and it is going to become more troubled. The word “eroding” is a semantic killer: Google + eroding. Quite a bound phrase. The story asserts:

Google is still a search and ads company — that’s where it makes about 90% of its revenue. And that core business is under attack from a million directions at once.

What ways?

Well, the Google lost Verizon AOL search and ad goodies. Facebook is goosing its advertising options to its billion or so lads and lasses. And Pinterest cooked up “buyable pins.” Sounds painful, but presumably the pain will be stuck into the Google voodoo doll.

The second is the Slashdot story “Google Hangouts and SMS Integration: A Mess, for Now.” Whether true or not, the semantic union of Google + Mess is interesting. The main point of the story, which may not be accurate, is:

I wish there were a good roadmap for all the overlapping and sometimes circular-seeming options for Google’s various flavors of VoiP and messaging. Between Google Voice, Google Plus, Messenger (not Facebook’s Messenger), Gmail, and now Google Fi, it’s hard to tell quite where the there begins.

Maybe the post Fourth of July fireworks will be sparklers, not aerial reports.

Stephen E Arnold, July 4, 2015
