How advanced and sophisticated is the data collection system of a powerful Social Listening tool?
Social Listening tools collect data from all media platforms that enable multi-dimensional interactions, relying on two main methods: APIs and Sites.
This article is part of a series on how social listening tools actually work, with a focus on Buzzmetrics, a social media monitoring and analysis solution. Buzzmetrics is used to track campaigns and brands for major corporations such as Coca-Cola, Unilever, and Mead Johnson, and for global agencies such as Ogilvy, Maxus, Leo Burnett, and Phibious in Vietnam.
In the context of social listening, social media encompasses not only social networks but also all media platforms that allow for multi-dimensional interactions, including forums, online news (comment sections), blogs, consumer review sites like Foody, and product reviews on e-commerce platforms like Lazada and Tiki.
Social listening is a variant of the market research business model. Like traditional market research, social media research goes through four stages:
1. Developing an analysis plan
2. Collecting data
3. Selecting and analyzing data
4. Visualizing data and generating reports
→ Learn more: What is Social Listening and how does it play a role in Social Media Marketing?
Data collection is where social media research begins in practice, once the analysis plan is in place. Currently, Buzzmetrics’ system gathers data from an extensive range of sources, including 1,141,412 Facebook fan pages, 211,571 Facebook groups, 1,240 forums, 3,067 online news outlets, 138,114 YouTube channels, and 219,691 Instagram accounts, as well as review and e-commerce sites. The system processes millions of discussions daily, enabling market analysis for trend and industry insights. The technology and hardware investments behind Buzzmetrics' Social Listening tool are on par with those of major search engines and continue to grow over time, which keeps its data tracking capabilities current.
Currently, there are two main methods used for data collection: API and Sites.
1. Data Collection via API (Application Programming Interface)
This method applies to major global social networks such as Facebook, Google Plus, YouTube, Twitter, and Instagram. Social listening tools connect to these platforms' APIs and request posts containing specific keywords.
In theory, this method allows data collection from the entire social network, including individual profiles. In practice, data availability is limited by restrictions the social networks themselves set. For example, just as Facebook limits organic reach for fan page owners and advertisers, it also restricts the extent and consistency of the individual post data its API returns to social listening tools. Currently, there are no precise statistics on what percentage of discussions can be captured through API-based data collection.
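To make the idea of a keyword request concrete, here is a minimal sketch of how such a query might be assembled. The endpoint URL and the parameter names (q, since, limit) are hypothetical placeholders, not those of any real platform; each social network defines its own URLs, authentication, and rate limits.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only. Real platforms each define
# their own URLs, parameter names, authentication, and rate limits.
BASE_URL = "https://api.example.com/v1/posts/search"


def build_keyword_query(keywords, since, page_size=100):
    """Build a search URL asking for posts that mention any of the keywords."""
    params = {
        "q": " OR ".join(f'"{k}"' for k in keywords),  # quoted terms joined by OR
        "since": since,      # ISO date; only posts after this date are requested
        "limit": page_size,  # assumed per-request cap on returned posts
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_keyword_query(["coca-cola", "coke"], since="2024-01-01")
print(url)
```

Whatever the real parameters are, the response only contains what the platform chooses to return, which is why coverage through this method cannot be guaranteed.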
2. Data Collection via Sites
Buzzmetrics utilizes a site-based data collection system, gathering data from specific sources such as news websites, forums, Facebook fan pages, YouTube channels, Instagram pages, and more. This method allows for comprehensive data capture across all listed channels. Data collection is conducted in two main ways: automatic crawling and curated site listings (panel).
(A) Data Collection via Site Listing Method:
Creating a social listening platform for a new market starts with assembling a comprehensive list of social media fan pages, news sites, forums, blogs, and other relevant sources in that market, a process that typically takes between six months and a year. Once the list is compiled, the data team develops web crawlers to continuously scan these sites and capture user discussions. These crawlers behave like human users: they automatically navigate the page content, identify threads and post details (such as the lead post, author, date, and any comments or replies), and gather the data accordingly.
Unlike search engines that process an entire page as a single data point, social listening systems treat each comment as a separate line of data. For instance, if a post has 907 responses, the system logs 908 lines of data—908 mentions or buzz points that reflect consumer opinions.
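The one-row-per-comment idea can be sketched in a few lines. The Mention fields and the shape of the thread dictionary below are assumptions made for illustration, not Buzzmetrics' actual schema.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    thread_id: str
    position: int  # 0 = lead post, 1..n = responses
    author: str
    text: str


def flatten_thread(thread):
    """Turn one thread (lead post plus responses) into one row per message."""
    rows = [Mention(thread["id"], 0, thread["author"], thread["text"])]
    for i, reply in enumerate(thread["responses"], start=1):
        rows.append(Mention(thread["id"], i, reply["author"], reply["text"]))
    return rows


# A made-up thread with one lead post and 907 responses.
thread = {
    "id": "t-1",
    "author": "op",
    "text": "Lead post",
    "responses": [{"author": f"user{i}", "text": f"reply {i}"} for i in range(907)],
}
mentions = flatten_thread(thread)
print(len(mentions))  # 908
```

Each row can then be classified and counted on its own, which is what makes a single thread worth 908 mentions rather than one page.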
The crawlers only access publicly visible content, collecting discussions set to public and respecting privacy laws by not gathering private content. However, they can capture discussions within closed Facebook groups, provided they log in with a group member ID and have the group admin’s consent.
The system collects all data from a page, spanning from the past to the present, and continuously updates the latest data every 15 minutes to 1 hour.
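A rough sketch of this "full history first, then frequent updates" pattern follows. It assumes each post carries a posted_at timestamp and keeps a high-water mark between runs; the function names and the 15-minute default are illustrative choices, not a description of the production system.

```python
from datetime import datetime, timedelta


def collect_new(posts, last_seen):
    """Return posts newer than last_seen, plus the updated high-water mark.

    With last_seen=None every post is returned, which models the initial
    pass over a page's full history.
    """
    fresh = [p for p in posts if last_seen is None or p["posted_at"] > last_seen]
    newest = max((p["posted_at"] for p in fresh), default=last_seen)
    return fresh, newest


def next_poll(now, interval_minutes=15):
    """Schedule the next update pass; the interval is an assumed setting."""
    return now + timedelta(minutes=interval_minutes)


page = [
    {"id": 1, "posted_at": datetime(2020, 1, 1)},
    {"id": 2, "posted_at": datetime(2023, 6, 1)},
]
backfill, mark = collect_new(page, None)  # first pass: whole history

page.append({"id": 3, "posted_at": datetime(2024, 1, 1, 12, 0)})
update, mark = collect_new(page, mark)    # later pass: only what is new
print([p["id"] for p in backfill], [p["id"] for p in update])
```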
Site-based data collection depends on four factors: the internet connection, the page's response speed, the crawler's ability to recognize the page structure and content, and the page's measures for blocking crawlers. Large forums often change their structure every year, and a layout that differs from the one the crawler was designed for can disrupt data collection. Publishers typically have mechanisms to detect and block automated collection, since it consumes their bandwidth. Crawlers must therefore frequently update and change their identity to get past these blocking mechanisms.
Because of these challenges, data gaps and interruptions are unavoidable for social listening tools. At Buzzmetrics, a dedicated data team of programmers works continuously to update crawlers and handle unexpected situations, so that clients have complete data, especially during campaigns or crisis management. This is also the main reason why international or free social listening tools such as iSentia, Brandtology, Sysomos, Radian6, and mention.com often fail to operate effectively in Vietnam: they frequently experience data shortages because their list of sites is incomplete or because they lack personnel on hand to resolve issues as they arise.
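One simple way a data team might notice that a site has changed its layout is to check whether the crawler still finds the elements it was built to expect. The sketch below uses Python's standard html.parser; the CSS class name "post" and the sample markup are invented for the example, and a zero count is only a hint (the page could also be empty or blocked), so it would normally trigger a human review rather than an automatic fix.

```python
from html.parser import HTMLParser


class PostCounter(HTMLParser):
    """Count elements carrying the CSS class the crawler was built to expect."""

    def __init__(self, expected_class):
        super().__init__()
        self.expected_class = expected_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.expected_class in classes:
            self.count += 1


def layout_looks_broken(html, expected_class="post"):
    """True when the page yields no posts, a hint that the layout changed."""
    counter = PostCounter(expected_class)
    counter.feed(html)
    return counter.count == 0


old_layout = '<div class="post">Hi</div><div class="post reply">Hello</div>'
new_layout = '<article class="msg">Hi</article><article class="msg">Hello</article>'
print(layout_looks_broken(old_layout))  # False
print(layout_looks_broken(new_layout))  # True
```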
(B) Automated Page Data Collection:
Automated page data collection can be carried out through two smart mechanisms:
- Trend-Based Collection: The system automatically detects and collects pages that discuss the most talked-about topics and trends on social media. For example, when an event gains significant media attention, the system automatically identifies keywords related to the event and scans social media platforms to collect pages discussing those keywords. This includes Facebook pages, forums, and more.
- Propagation-Based Collection: Once pages or groups are collected, the system identifies and collects additional pages, groups, or users mentioned within these initial pages. This mechanism allows for the collection of data from interconnected discussions across multiple platforms.
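Propagation-based collection can be pictured as a breadth-first walk over "who mentions whom". The sketch below assumes the mentions have already been extracted into a dictionary and adds a depth limit so the walk stays bounded; both the data and the max_depth parameter are illustrative assumptions.

```python
from collections import deque


def propagate(seeds, mentions_of, max_depth=2):
    """Breadth-first expansion from seed pages to the pages they mention."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for linked in mentions_of.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return seen


# Made-up mention graph: page_a mentions group_b and page_c, and so on.
graph = {
    "page_a": ["group_b", "page_c"],
    "group_b": ["page_d"],
    "page_d": ["page_e"],
}
found = propagate(["page_a"], graph)
print(sorted(found))  # ['group_b', 'page_a', 'page_c', 'page_d']
```

With max_depth=2, page_e is not collected because it sits three hops from the seed; raising the limit widens coverage at the cost of more crawling.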
Trend-based and propagation-based data collection are two processes carried out simultaneously, ensuring that the most-discussed topics on social media are captured in the system as quickly and comprehensively as possible. Social listening technology, much like search engine technology, is a model for aggregating market data. A social listening tool must store data for at least two years to support research purposes. The pressure to store and process data increases over time, which makes investment in hardware infrastructure substantial and ongoing.
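As for trend-based collection, one plausible starting point is a spike check: compare how many posts mention a word in the current window against an earlier baseline. This is a deliberately simple sketch; the thresholds (min_count, ratio) and the whitespace tokenization are assumptions, and a real system would likely use more robust keyword extraction.

```python
from collections import Counter


def trending_keywords(baseline_posts, current_posts, min_count=3, ratio=2.0):
    """Return words whose post count in the current window spikes over baseline."""
    # Count each word at most once per post, so one long post cannot dominate.
    base = Counter(w for p in baseline_posts for w in set(p.lower().split()))
    now = Counter(w for p in current_posts for w in set(p.lower().split()))
    return sorted(
        w for w, c in now.items()
        if c >= min_count and c >= ratio * max(base[w], 1)
    )


baseline = ["new phone review", "lunch ideas"]
current = [
    "concert tickets sold out",
    "concert tonight",
    "that concert was great",
    "lunch ideas",
]
trending = trending_keywords(baseline, current)
print(trending)  # ['concert']
```

The resulting keywords would then be used to search for pages discussing the event, which in turn become seeds for further collection.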