Factoids about Toutiao: Smart News Filtering Service

August 28, 2017

The filtering service Toutiao is operated by Bytedance. The company attracted attention because it is generating money (allegedly) and has lots of users or “daily active users” in the 120 million range. (If you are acronym minded, the daily active user count is a DAU. Holy Dau!)

Forget Google’s “translate this page” for Toutiao. The service is blind to the Toutiao content. A workaround is to cut and paste snippets into FreeTranslations.org or get someone who reads Chinese to explain what’s on Toutiao’s pages.

Other items of interest follow. (Oh, the hyperlinks point to the source of each factoid.)

    • $900 million in revenue (allegedly). Source: Wall Street Journal, August 28, 2017, with a pay wall for your delectation
    • Funding of $3 billion. Source: Crunchbase
    • Valuation of $20 billion or more. Source: Reuters
    • Toutiao means “headlines.” Source: Wikipedia
    • What it does, from Wikipedia:

Toutiao uses algorithms to select different quality content for individual users. It has created algorithmic models that understand information (text, images, videos, comments, etc.) in depth, and developed large-scale machine learning systems for personalized recommendation that surfaces content users have not necessarily signaled preference for yet. Using Natural Language Processing and Computer Vision technologies in A.I, Toutiao extracts hundreds of entities and keywords as features from each piece of content. When a user first open the app, Toutiao makes a preliminary recommendation based on the operation system of his mobile device, his location and other factors. With users’ interactions with the app, Toutiao fine-tunes its models and make better recommendations.

  • Founded in 2012 by Zhang Yiming, age 34. Source: Reuters
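
The Wikipedia description quoted above boils down to three moves: reduce each item to keywords, make a first guess from the device and location, then adjust as the user clicks. Here is a minimal Python sketch of that loop. It assumes a made-up keyword-overlap score, and every name in it (`ARTICLES`, `UserProfile`, `cold_start_profile`, `recommend`) is hypothetical. It illustrates the general idea only and is not Toutiao's actual system.

```python
from collections import Counter

# Hypothetical articles: each is reduced to a set of extracted keywords.
ARTICLES = {
    "a1": {"android", "phones", "reviews"},
    "a2": {"beijing", "traffic", "news"},
    "a3": {"football", "league", "scores"},
}


class UserProfile:
    """Keyword weights learned for one user (illustrative only)."""

    def __init__(self, seed_keywords=()):
        self.weights = Counter({kw: 1.0 for kw in seed_keywords})

    def record_click(self, article_id, boost=2.0):
        # Each interaction raises the weight of the clicked article's keywords.
        for kw in ARTICLES[article_id]:
            self.weights[kw] += boost


def cold_start_profile(operating_system, location):
    # Assumption: with no history, device and location stand in as keywords.
    return UserProfile(seed_keywords=(operating_system.lower(), location.lower()))


def recommend(profile, top_n=2):
    # Score = sum of the user's weights for the keywords in each article.
    scores = {
        aid: sum(profile.weights[kw] for kw in kws)
        for aid, kws in ARTICLES.items()
    }
    # Highest score first; ties are broken by article id for stable output.
    ranked = sorted(scores, key=lambda aid: (-scores[aid], aid))
    return ranked[:top_n]


user = cold_start_profile("Android", "Beijing")
first_guess = recommend(user)   # driven only by device and location
user.record_click("a3")
after_click = recommend(user)   # the clicked topic now outranks the seeds
```

A real service would presumably use learned models rather than keyword counts, but the shape is the same: a rough first guess, then a profile that shifts with every interaction.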

Technode’s “Why Is Toutiao, a News App, Setting Off Alarm Bells for China’s Giants?” suggests that Toutiao may be the next big Chinese online success. The reason is that the service aggregates “news” from disparate content sources; for example, text, video, images, and data.

Toutiao may be the next big thing in algorithmic, mobile centric information access solutions. The company generates revenues from online ads. The company’s secret sauce includes smart software plus some extra ingredients:

  • Social functions
  • Search
  • Video
  • User generated “original” content
  • Global plans.

Net net: Worth watching.

Stephen E Arnold, August 28, 2017

Is China the New Los Angeles Trend Machine?

August 28, 2017

I was last in China in 2007 and then in Hong Kong in 2010. My information is, therefore, out of date. That’s no big whoop for me, since I am ready to tally 74 years in our thrilling world.

I read “In China You Now Have to Provide Your Real Identity If You Want to Comment Online.” The main point of the write up is that the free and open Internet is going the way of the dodo. The goal of “real name registration” is to make it easy for certain officials to track down individuals without the expensive, time consuming, and sometimes messy “traditional” identity investigations.

I noted this passage:

So what exactly constitutes forbidden topics on the Chinese internet? An unnamed CAC official told a journalist the following when asked about the new rules (first translated by The Diplomat):

  1. opposing the principles of the constitution of China
  2. endangering national security, revealing state secrets, subverting state power, and undermining national reunification
  3. damaging national honor and interests
  4. inciting national hatred, ethnic discrimination, and undermining national unity
  5. undermining the state’s policies on religion or promoting cults and feudal superstitions
  6. spreading rumors or disrupting social order
  7. spreading obscenity, pornography, violence, or terror, or abetting a crime
  8. insulting or slandering others and infringing upon the lawful rights and interests of others
  9. violating any other laws and regulations

My reaction to the write up is that censorship, China-style, may be the latest trend to emerge from the Middle Kingdom. Once Los Angeles on the left coast generated the “in” fads which would then roll toward Harrod’s Creek.

My thought is that censorship may be the new black or whatever the hot color is for fall fashion. I am not particularly surprised because similar governmental actions seem to have emerged from the deliberative bodies in Russia, Turkey, and other countries. One African nation state just turned off the Internet, an Iran-style touch.

One idea struck me. Is now the time for individuals to generate an alternative or optional Internet identity? Creating a “legend” or an alternate Internet identity is important. Just ask the person who ran the illegal Dark Web site AlphaBay. The mistake that individual made was to use an identity which was not “clean.”

The procedure for setting up a legend or clean Internet identity is not easy. There are a number of steps. Human mistakes can render a clean identity traceable; that is, dirty. If you are able to verify that you are working for a recognized law enforcement or intelligence entity, you can obtain a legend from the Beyond Search Overflight team. This is our WITSEC Light bundle. More comprehensive legends are also available to qualified LE and intel professionals.

To explore this package which contains an alias, matching email address, and other necessary elements like a Walmart pay as you go phone, just write darkwebnotebook at yandex dot com. Remember. We verify that you have a legitimate LE or intel role prior to providing the legend, a workable biography, and summary of what one has to do to build out the legend.

Those who do not qualify will have to look elsewhere for a way to deal with censorship constraints in countries other than the US. If the China censorship trend moves outward from that country, more than one online identity may be needed for some operations.

Stephen E Arnold, August 28, 2017

Google Drops Instant Search as Mobile Use Rises

August 28, 2017

As more and more Google users turn to mobile devices to access the search giant, the Instant Search feature, first introduced in 2010, becomes irrelevant. Originally, the feature saved time (albeit milliseconds), giving Google users a much-needed edge in search. But that was for desktops. Now that mobile is king, Google is rethinking its strategies.

According to The Verge,

…more than half of Google searches happen on mobile, with the scales continually tipping away from desktop as time goes on. On mobile screens, Instant Search doesn’t make as much sense given we use our fingers and virtual buttons to interact with software, and trying to load a results page on top of the onscreen keyboard isn’t exactly good user experience design.

Internet-based services recognizing the trend toward mobile use is nothing new, but Google eliminating one of its hallmark features shows that mobile use for search is much more than simply a trend. Always leading the way, Google is making a statement about the direction of search, and we expect others to quickly jump on the bandwagon.

Catherine Lamsfuss, August 28, 2017

The Tech Unicorn Ploy

August 28, 2017

This should not come as much of a surprise: Business Insider reports, “Nearly Half of Tech ‘Unicorns’ Rely on Tricky Math to Land Imaginary Valuations.” So dubbed because they were once rare, “unicorn” startups are ones that have achieved valuations of at least a billion dollars. That is “billion” with a “b.” According to a pair of business professors (from the UBC Sauder School of Business and the Stanford Graduate School of Business), there are now more than 200 such “rare” prospects globally. Why the apparent boom in unicorn birth rates? Citing a recent study put out by the above-mentioned professors, reporter Alex Morrell writes:

Many of [these startups] are using creative financing maneuvers to conjure imaginary valuation figures that don’t hold up to scrutiny, according to the UBC/GSB study, which examined 116 unicorns. It turns out, when you adjust the valuations to account for guarantees provided to preferred shareholders that dilute the value of common shares, nearly half of unicorns lose their coveted $1 billion status.

The article links to an interview with Will Gornall, the professor from UBC Sauder, that explains how he and co-researcher Ilya Strebulaev re-evaluated purported unicorns to discount the influence of such preferred-shareholder guarantees. They found nearly half sported fake horns, with 11% having been valued at more than twice their fair values. The article continues:

Here’s how it works: In later funding rounds, startups will negotiate a higher share price, but as part of the bargain they guarantee their investors certain protections — such as earning a minimum return on their money or guaranteeing they’ll be paid out in full before all other shareholders. ‘Specifically, we found that 53 per cent of unicorns gave their most recent investors either a return guarantee in IPO (14%), the ability to block IPOs that did not return most of their investment (20%), seniority over all other investors (31%), or other important terms,’ Gornall said. Even though this sort of thing has become normal, valuations haven’t caught up to the fact that providing additional protections to senior shareholders lessens the value of common shareholders. Treating the shares equally can significantly inflate the overall value of the company.

Overvaluation can, of course, help a startup attract funding, talent, and customers. For employees, however, such tactics can end up devaluing their compensation packages. Both workers and investors should be wary of over-valuation trickery.
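
A bit of arithmetic shows why treating all shares equally inflates the headline number. The Python sketch below is a toy model with invented figures and a hypothetical helper, `split_exit`. It assumes a single class of preferred stock that either takes its guaranteed amount first or converts to common, whichever pays more. It is not the pricing method used in the UBC/GSB study.

```python
def split_exit(exit_value, invested, pref_shares, common_shares, guarantee=1.0):
    """Split an exit between one preferred class and common (toy model)."""
    total_shares = pref_shares + common_shares
    # Preferred holders may take their guaranteed amount first...
    guaranteed = min(exit_value, invested * guarantee)
    # ...or convert to common and take a pro-rata share, whichever is larger.
    as_converted = exit_value * pref_shares / total_shares
    pref_payout = max(guaranteed, as_converted)
    common_payout = exit_value - pref_payout
    return pref_payout / pref_shares, common_payout / common_shares


PRICE = 10.0                  # invented price per share in the latest round
PREF_SHARES = 10_000_000      # shares sold to the latest investors
COMMON_SHARES = 90_000_000    # everyone else
INVESTED = PRICE * PREF_SHARES

# Headline ("post-money") valuation: latest price applied to every share.
headline = PRICE * (PREF_SHARES + COMMON_SHARES)

for exit_value in (250e6, 500e6, 1e9, 2e9):
    pref_ps, common_ps = split_exit(exit_value, INVESTED, PREF_SHARES, COMMON_SHARES)
    print(f"exit ${exit_value / 1e6:,.0f}M: preferred ${pref_ps:.2f}/share, "
          f"common ${common_ps:.2f}/share")
```

With these made-up numbers the headline valuation is $1 billion. Yet at a $500 million exit the preferred shares still collect $10 each while the common shares receive about $4.44. A share with downside protection is worth more than one without it, so pricing every share at the preferred price overstates the company.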

Cynthia Murrell, August 28, 2017

An Automatic Observer for Neural Nets

August 25, 2017

We are making progress in training AI systems through the neural net approach, but exactly how those systems make their decisions remains difficult to discern. Now, TechCrunch reveals, “MIT CSAIL Research Offers a Fully Automated Way to Peer Inside Neural Nets.” Writer Darrell Etherington recalls that, a couple of years ago, the same team of researchers described a way to understand these decisions using human reviewers. A fully automated process will be much more efficient and lead to greater understanding of what works and what doesn’t. Etherington explains:

Current deep learning techniques leave a lot of questions around how systems actually arrive at their results – the networks employ successive layers of signal processing to classify objects, translate text, or perform other functions, but we have very little means of gaining insight into how each layer of the network is doing its actual decision-making. The MIT CSAIL team’s system uses doctored neural nets that report back the strength with which every individual node responds to a given input image, and those images that generate the strongest response are then analyzed. This analysis was originally performed by Mechanical Turk workers, who would catalogue each based on specific visual concepts found in the images, but now that work has been automated, so that the classification is machine-generated. Already, the research is providing interesting insight into how neural nets operate, for example showing that a network trained to add color to black and white images ends up concentrating a significant portion of its nodes to identifying textures in the pictures.

The write-up points us to MIT’s own article on the subject for more information. We’re reminded that, because the human thought process is still largely a mystery to us, AI neural nets are based on hypothetical models that attempt to mimic ourselves. Perhaps, the piece suggests, a better understanding of such systems could inform the field of neuroscience. Sounds fair.
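
The first step Etherington describes, recording how strongly each node responds and keeping the images that excite it most, can be sketched in a few lines of Python. The activation numbers and file names below are invented, and `top_images_per_unit` is a hypothetical helper. The labeling step that follows (once done by Mechanical Turk workers, now automated) is not covered here.

```python
import heapq

# Hypothetical recorded activations: image name -> response of each unit.
ACTIVATIONS = {
    "grass.jpg":  [0.9, 0.1, 0.2],
    "bricks.jpg": [0.2, 0.8, 0.1],
    "sky.jpg":    [0.1, 0.2, 0.7],
    "lawn.jpg":   [0.8, 0.3, 0.1],
}


def top_images_per_unit(activations, k=2):
    """For every unit, return the k images that excite it most."""
    n_units = len(next(iter(activations.values())))
    result = {}
    for unit in range(n_units):
        # Rank image names by this unit's response, strongest first.
        strongest = heapq.nlargest(
            k, activations, key=lambda image: activations[image][unit]
        )
        result[unit] = strongest
    return result


top = top_images_per_unit(ACTIVATIONS)
for unit, images in top.items():
    print(f"unit {unit}: {images}")
```

In this toy data, unit 0 responds most to the two grassy images. That is the kind of pattern a human or automated labeler would then tag with a visual concept such as a texture.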

Cynthia Murrell, August 25, 2017

Google Is Rewiring Internet, Again

August 25, 2017

Google revolutionized the Internet by indexing Web content on its own servers and offering fast search results. The search engine giant plans to do it again by introducing new network infrastructure to make search faster.

The Next Platform, in an article titled “How Google Wants to Rewire the Internet,” says:

Running a fast, efficient, hyperscale network for internal datacenters is not sufficient for a good user experience, and that is why Google has created a software defined networking stack to do routing over the public Internet, called Espresso.

The whole point of creating an extra layer of network infrastructure is to enhance the user experience. As Google today generates 25% of global Internet traffic, it is becoming difficult for the search engine giant to keep the results relevant.

Google used custom-developed routers and switches to implement this program. One hopes that people will now be able to find what they are looking for without getting lost in the maze of sponsored advertisements.

Vishal Ingole, August 25, 2017

Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need in my routine research for my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was quite surprised by Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and most companies from established vendors.

The book’s main idea in my opinion is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.


However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client server search is on the downturn. Okay. I think I understand, but the assertion “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search system was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had emerged as the go-to open source search system. The alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, was a definite step up from his earlier search service called Compass.

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.


A Brilliant List of Open Source Localization Tools

August 24, 2017

Open source projects offer technology developers the ability to access technology usually locked behind pay walls. One trouble with open source technology is language translation and the ability for developers to localize their projects. Language continues to remain a barrier in our technology driven world, but there are tools to overcome it. OpenSource.com curated a list of “18 Open Source Translation Tools To Localize Your Project.”

The curator understands the pains of proprietary software:

The proprietary versions of these tools can be quite expensive. A single license for SDL Trados Studio (the leading CAT tool) can cost thousands of euros, and even then it is only useful for one individual and the customizations are limited (and psst, they cost more, too). Open source projects looking to localize into many languages and streamline their localization processes will want to look at open source tools to save money and get the flexibility they need with customization.

The list includes tools for machine translation, which is a hot commodity. Software that can generate a digestible and accurate translation from one language to another is a must have for many localization projects. The list recommends checking out Apertium and Moses. Computer-assisted translation tools are a must have for all translators and language students, because they can save hours of looking up information in dead tree lexicons. They also work in real time, saving countless more hours, so you should check out OmegaT, Subtitles Translator, and Anaphraseus. If you are working with multiple translators on your project, you will need a translation management system to organize everyone (think SharePoint). Jabylon, Zanata, GlobalSight, and Pootle are some good TMS software to check out. Also included are localization automation tools that can ease your work burden, such as Okapi Framework and Mojito.

Whitney Grace, August 24, 2017

Attack Planes Soon to Be Equipped with Lasers

August 24, 2017

The US Air Force soon will be equipping its attack planes with laser weapons to fight UAVs that terrorist organizations may use for launching attacks.

An op-ed published by Defense One and titled “The Future of the Air Force” says:

We are currently investing in the hardware to ensure space superiority; in the near future we will need to grow the number of space airmen and the accompanying infrastructure much like we did for the combat Air Force 40 years ago.

Wars in the future will be fought on multiple fronts, including space. According to the op-ed, the US Air Force needs to be equipped sufficiently to fight these battles without putting people on the front line.

The op-ed also mentions the acquisition of an Israeli company whose technology enables attack planes to use lasers to fend off drones that are used for dropping bombs and other weapons. The acquisition does not come as a surprise, as the Pentagon has long been researching the use of lasers as tactical weapons. It seems the days of Star Wars are very near.

Vishal Ingole, August 24, 2017

Google and Walmart: More Than a Super Saver Special?

August 23, 2017

I read “Walmart and Google Partner on Voice-Based Shopping.” The main point of the write up is that talking to a device is the way people will buy nylon shirts, dog food, and giant bottles of fizzy drinks. The write up points out the smart Google features and the allure of having a person (or a Googley electric vehicle) putting the packages in front of a house. (Package poacher alert.)

I noted this passage:

Google Express is also today ditching its membership fees, and now promises free delivery across its retailers in one to three days, as long as customer orders are above each store’s minimums… Google believes its fees were limiting adoption and were particularly cumbersome when it came to enabling voice shopping.

Google may not be as much believing as reacting to data which may suggest that the approach was as tasty as off brand cat food to a persnickety feline. Google. Data. Remember?

The notion of the Google bubble providing a boost to Walmart’s mobilization against Amazon is threaded through the write up.

From my vantage point in Harrod’s Creek, I thought about three issues:

  1. Amazon is a far greater threat to Google than just product search. Amazon is winning in this particular category if the data I have collected are accurate. A three to one gap seems to loom for the GOOG. I think of the dropped ball with Froogle, and the rest is Amazon’s history.
  2. Google is thinking less like the bold imitator it was when it needed to generate revenue and the Yahoo, Overture, GoTo approach was so darned juicy and semi-available. Now the teaming is a response to a genuine business threat. Yep, Amazon again. Google is reacting in a way that reminds me of a small business that finds itself watching a larger outfit changing the rules of the game and threatening the small business as collateral damage. “We have to do something big, significant” echoes in my mind.
  3. Neither Google nor Walmart is particularly fast moving. The companies share other similarities: Neither has figured out Act 2 in their corporate dramas. Neither believes that what happened to Endeca or Sears can be allowed to happen to them. Neither has been able to spin gold from acquisitions. Are there other parallels? This is a question worth considering.

Net net: The tie up is less about a leapfrog of Amazon and more about what big companies sensing future distress do to come up with a “significant action.”

Stephen E Arnold, August 23, 2017
