Distributed and scalable monitoring and analysis of foreign language Web sites, broadcast media, and social media
The BBN Multimedia Monitoring System (M3S) is an end-to-end capability for collecting, translating, searching and organizing content from a range of media types in multiple languages. Sources include the World Wide Web, broadcast media, YouTube videos, Twitter and Facebook. M3S integrates and manages the media analysis process from beginning to end: from data collection and processing, to automated triage and retrieval, to machine-assisted translation and support for human translation, to export and dissemination. The system's automatic analysis of content supports effective retrieval and triage for human analysts who must deal with overwhelming volumes of continuously accumulating media.
Multi-lingual data collection and extraction
M3S continuously captures content from user-selected sites, channels and social media users into an archive that can be shared by multiple distributed user groups. The captured media is archived and versioned for later use. Internal links are preserved in harvested Web pages so users can navigate within the archive.
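The archiving-and-versioning idea above can be illustrated with a minimal sketch. This is not M3S's actual storage design, which this overview does not describe; the `Archive` class, its `capture` method, and the choice of a SHA-256 content hash to detect changes are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical in-memory archive: one version history per source URL."""

    # Maps a source URL to its list of (content digest, content) versions.
    versions: dict = field(default_factory=dict)

    def capture(self, url: str, content: str) -> bool:
        """Store content as a new version only if it differs from the latest one.

        Returns True when a new version was recorded, False otherwise.
        """
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][0] == digest:
            return False  # unchanged since the last capture
        history.append((digest, content))
        return True


archive = Archive()
archive.capture("http://example.org/news", "first edition")
archive.capture("http://example.org/news", "first edition")   # no new version
archive.capture("http://example.org/news", "second edition")  # new version
```

Comparing digests rather than full documents keeps repeated captures of an unchanged page cheap, while every distinct revision remains available for later use.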
Text analysis and automatic translation
Using state-of-the-art human language technologies, M3S converts speech into text, identifies and extracts text from Web pages and social media platforms, and automatically tags named entities (people, places, and organizations). The extracted text is then automatically translated into English using machine translation software. English speakers can use the machine translation to get the gist of an article, broadcast or post; linguists and analysts can correct the machine translation and add analytical commentary.
M3S supports more than 40 languages for text-based sources, and 16 languages for speech-based sources.
Support for analysis
M3S includes tools and technologies that enable analysts to quickly discover relevant information and drill down into the data.
- Geolocation: Geographical visualizations pinpoint the areas about which participants are communicating.
- Sentiment: Analysis of the tone of interactions enables users to understand sentiments expressed over time, either individually or as a group by topic or theme.
- Topics and themes: BBN's Unsupervised Topic Discovery component automatically identifies topics, thematically classifying content or correlating it to Twitter hashtags.
- Visualizations: Graphical representations include charts, maps, and pivot views that enable convenient browsing, filtering, sorting and grouping of search results.
- Personalized labeling: Unique tags for sources, regions, or users enable targeted searches of content.
Support for social media
M3S not only harvests the text of social media postings, but also gathers metadata such as geotags, hashtags, user references, URLs, topic threading information and retweet information. Using this information, the system can accurately reconstruct interactions across networks, setting the stage for social media analytics. Relationships between participants are presented as visualizations of explicit connections (such as "follower-followed") or implicit connections indicated by use of hashtags, retweets, or sharing of content.
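As a rough illustration of how interactions might be reconstructed from such metadata, the sketch below builds two edge sets from a handful of posts: directed edges from user references and retweets, and undirected edges between users who share a hashtag. The record fields (`id`, `user`, `retweet_of`, `mentions`, `hashtags`) and the function name are hypothetical and do not reflect M3S's data model; follower-followed links are omitted because they would come from a separate data source.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical post records; the field names are illustrative only.
posts = [
    {"id": 1, "user": "alice", "retweet_of": None, "mentions": ["bob"], "hashtags": ["election"]},
    {"id": 2, "user": "bob", "retweet_of": 1, "mentions": [], "hashtags": ["election"]},
    {"id": 3, "user": "carol", "retweet_of": None, "mentions": [], "hashtags": ["election", "vote"]},
]


def build_interaction_graph(posts):
    """Return (directed, shared_tags) edge maps derived from post metadata."""
    by_id = {p["id"]: p for p in posts}
    directed = defaultdict(int)       # (source user, target user, kind) -> count
    hashtag_users = defaultdict(set)  # hashtag -> users who posted it

    for p in posts:
        for mentioned in p["mentions"]:
            directed[(p["user"], mentioned, "mention")] += 1
        parent = by_id.get(p["retweet_of"])
        if parent is not None:
            directed[(p["user"], parent["user"], "retweet")] += 1
        for tag in p["hashtags"]:
            hashtag_users[tag].add(p["user"])

    # Connect every pair of users who used the same hashtag.
    shared_tags = defaultdict(set)    # (user_a, user_b) -> shared hashtags
    for tag, users in hashtag_users.items():
        for a, b in combinations(sorted(users), 2):
            shared_tags[(a, b)].add(tag)

    return dict(directed), dict(shared_tags)


directed, shared_tags = build_interaction_graph(posts)
```

Both edge maps can be passed to a graph or visualization library; keeping them separate lets a display distinguish direct interactions from connections inferred through shared hashtags.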
- Web media: Processing of streaming Web video such as webcasts.
- YouTube: Searching YouTube for videos of interest, then downloading and processing them.
- Media indexing: Processing and management of user-imported audio and video files.
- Entity summaries: Automatic generation of up-to-date information about a particular person or organization.
Development of the BBN Multimedia Monitoring System has been supported in part by the CTTSO Technical Support Working Group (TSWG), the Defense Advanced Research Projects Agency (DARPA), and other U.S. government agencies.