Department of Defense: Learning from Social Media Posts

May 25, 2019

A solicitation request dated May 13, 2019, “A–Global Social Media Archive, 350 billion digital data records,” is an interesting public message. Analysis of social media allegedly has been a task within other projects handled by firms specializing in content analytics. These data mining efforts, based on DarkCyber’s understanding of open source information from specialist vendors, are nothing new. The solicitation offers some interesting insights which may warrant some consideration.

First, the scope of the task is 350 billion digital records. It is not clear what constitutes a “digital record,” but the 350 billion number represents about two or three months of Facebook posts. Nor is it clear whether the content comes from one service like Twitter or is drawn from a range of messaging and content sources.

Second, the content pool must include 60 languages. The most used languages on the public Internet are English, Chinese, and Spanish. The other 57 languages contribute a small volume of content, and this fact may create a challenge for the vendors responding to the solicitation. The document states:

Data includes messages from at least 200 million unique users in at least 100 countries, with no single country accounting for more than 30% of users.

Third, the text content and the metadata must be included in the content bundle.
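The three requirements above can be read as a simple checklist. The sketch below is one rough way to express it in Python. The thresholds come from the figures quoted in this post; the `ArchiveSummary` structure, its field names, and the reading of “60 languages” as a minimum are assumptions made for illustration, not anything the solicitation specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Thresholds taken from the figures quoted in this post.
MIN_RECORDS = 350_000_000_000
MIN_LANGUAGES = 60          # assumption: treated as "at least 60"
MIN_USERS = 200_000_000
MIN_COUNTRIES = 100
MAX_COUNTRY_SHARE = 0.30


@dataclass
class ArchiveSummary:
    """Hypothetical summary of a vendor's archive (not a solicitation format)."""
    record_count: int
    languages: Set[str]
    users_by_country: Dict[str, int]
    has_text: bool
    has_metadata: bool


def unmet_requirements(summary: ArchiveSummary) -> List[str]:
    """Return a description of each quoted requirement the archive misses."""
    problems = []
    total_users = sum(summary.users_by_country.values())
    largest = max(summary.users_by_country.values(), default=0)

    if summary.record_count < MIN_RECORDS:
        problems.append("fewer than 350 billion records")
    if len(summary.languages) < MIN_LANGUAGES:
        problems.append("fewer than 60 languages")
    if total_users < MIN_USERS:
        problems.append("fewer than 200 million unique users")
    if len(summary.users_by_country) < MIN_COUNTRIES:
        problems.append("fewer than 100 countries")
    if total_users > 0 and largest / total_users > MAX_COUNTRY_SHARE:
        problems.append("one country exceeds 30% of users")
    if not (summary.has_text and summary.has_metadata):
        problems.append("text or metadata missing")
    return problems
```

An empty list means the summary clears every threshold quoted here. The 30% cap is the check most likely to bite, because it rules out an archive dominated by a single large national user base.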

The exclusion of photographs and videos is interesting. These are important content mechanisms. Are commercial enterprises without ties to nation states that operate large-scale content aggregation systems likely to be able to comply? Worth watching to find out who lands this project.

Stephen E Arnold, May 25, 2019
