Why COVID-19 Misinformation in the Philippines Needs Human, Cultural, and Computational Analysis


Article information

Isip Tan, I. T., Cleofas, J., Solano, G., Pillejera, J. G., & Catapang, J. K. (2023). Interdisciplinary approach to identify and characterize COVID-19 misinformation on Twitter: Mixed methods study. JMIR Formative Research, 7, e41134. https://doi.org/10.2196/41134

What this study is about

During the early months of COVID-19, people were not only dealing with a new virus. They were also dealing with an overwhelming flow of information: news updates, rumors, jokes, panic posts, political commentary, “health tips,” conspiracy theories, and marketing claims. This study looked at that problem in the Philippine Twitter space. 

The authors wanted to know two things: how much COVID-19 misinformation appeared in Philippine tweets, and what kinds of topics, formats, and communication styles it used. The study is especially important because many misinformation studies rely heavily on English-language data and high-income country contexts. This paper focused on tweets geolocated to the Philippines, including tweets written in Filipino, English, and combinations of both. 

Why this matters

Misinformation is not always obvious. Some false or misleading posts look like jokes. Some look like sincere warnings. Some imitate expert language. Some use fear. Some use humor. Some are attached to political frustration. Some sell products. During a public health crisis, these forms matter because they can shape how people understand risk, treatment, prevention, government response, and responsibility. 

The paper’s bigger message is methodological: machines can help, but they are not enough. Natural language processing can scan large datasets quickly, but it may misread local language, humor, sarcasm, Taglish, and cultural context. For Philippine misinformation, human coders with cultural and platform knowledge were crucial. 

What the researchers did

The study used a convergent mixed methods design, combining computational and qualitative approaches. The researchers collected tweets geolocated around the National Capital Region from January 1 to March 21, 2020, using terms such as “coronavirus,” “covid,” and “ncov.” This produced a primary corpus of 12,631 tweets.
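As a rough illustration of keyword-based collection, the sketch below filters a handful of tweets by the three search terms named above. It assumes a simple case-insensitive substring match; the study's actual collection tooling and matching rules are not described here, and the function name and sample tweets are invented for the example.

```python
# Search terms come from the article; the matching rule is an assumption.
KEYWORDS = ("coronavirus", "covid", "ncov")


def mentions_covid(text: str) -> bool:
    """Return True if the text contains any search term, ignoring case."""
    lowered = text.lower()
    return any(term in lowered for term in KEYWORDS)


# Invented sample tweets, not taken from the study's corpus.
sample_tweets = [
    "Ingat tayo sa nCoV, mag-mask lagi",
    "Traffic na naman sa EDSA",
    "COVID update: classes suspended",
]

# Keep only the tweets that mention at least one search term.
matched = [tweet for tweet in sample_tweets if mentions_covid(tweet)]
print(len(matched), "of", len(sample_tweets), "sample tweets matched")
```

Substring matching is deliberately loose: it catches "COVID-19" and "nCoV" alike, but it would also catch unrelated words that happen to contain a search term, which is one reason a keyword-built corpus still needs human review.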

The team then used biterm topic modeling to identify major topic clusters in the tweet corpus. They also conducted 10 key informant interviews with people such as health professionals, educators, data specialists, and others who had encountered COVID-19 misinformation online. These interviews helped the team identify keywords and examples of misinformation across Twitter and other platforms such as Facebook, Viber, and Messenger. 
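To make the idea of a "biterm" concrete, the sketch below extracts unordered word pairs from two tiny tokenized texts and counts them across the collection. This covers only the input that biterm topic modeling works from, not the model fitting itself. The tokenized documents and the helper name `extract_biterms` are hypothetical, and real preprocessing (tokenization, stopword removal, handling of Taglish) is assumed to have happened already.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of tokens that co-occur in one short text."""
    # Sorting each pair makes ("mask", "covid") and ("covid", "mask") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Invented, already-tokenized short texts standing in for tweets.
docs = [
    ["mask", "covid", "prevent"],
    ["covid", "mask", "shortage"],
]

# Biterms are pooled across the whole corpus rather than modeled per document.
biterm_counts = Counter(
    biterm for doc in docs for biterm in extract_biterms(doc)
)
print(biterm_counts.most_common(3))
```

Pooling the pairs across the corpus is what helps with short texts: a single tweet has too few words to show reliable patterns, but the same pair recurring across many tweets does.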

The study had two major analysis paths. First, the team manually coded a subcorpus of 5,881 tweets to identify misinformation. Second, they created another subcorpus of 4,634 tweets, used manually labeled misinformation tweets as training data, and applied natural language processing to identify additional misinformation. The NLP-labeled tweets were then manually reviewed. 


What the study found

1) Manual coding found misinformation in 6.8% of the analyzed tweets

The manual coding process identified 398 tweets with COVID-19 misinformation in subcorpus A, equal to 6.8% of the manually coded tweets. The authors note that prevalence estimates depend heavily on how a Twitter dataset is collected and filtered. 
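The 6.8% figure follows directly from the two counts reported above. A minimal check, using only those counts (the variable names are chosen for this example):

```python
# Counts reported in the article for manually coded subcorpus A.
misinformation_tweets = 398
manually_coded_tweets = 5881

# Prevalence is the share of coded tweets that contained misinformation.
prevalence = misinformation_tweets / manually_coded_tweets
prevalence_pct = round(prevalence * 100, 1)
print(f"Prevalence in subcorpus A: {prevalence_pct}%")
```

The denominator is the manually coded subcorpus, not the full corpus of 12,631 tweets, so the percentage describes that subset only.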

2) NLP struggled with the Philippine language context

The NLP approach identified 165 tweets as misinformation. However, manual review showed that 69.7% of those NLP-labeled tweets did not actually contain misinformation. The authors suggest this likely happened because many tweets used Filipino, English, or mixed Filipino-English language, making automated detection more difficult. 

In plain language: the algorithm often misunderstood the tweets. This does not mean NLP is useless. It means that in multilingual and culturally specific settings, automated tools need human verification.
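The same result can be restated as precision, the share of flagged tweets that really were misinformation. The sketch below back-calculates approximate counts from the rounded 69.7% figure, so the counts are an assumption derived from the percentage, not numbers quoted in this summary.

```python
# Reported in the article: 165 tweets flagged, 69.7% of them not misinformation.
flagged_by_nlp = 165
false_positive_share = 0.697

# Assumption: counts are reconstructed from the rounded percentage.
false_positives = round(flagged_by_nlp * false_positive_share)
confirmed = flagged_by_nlp - false_positives

# Precision = confirmed misinformation / everything the model flagged.
precision_pct = round(confirmed / flagged_by_nlp * 100, 1)
print(f"About {confirmed} of {flagged_by_nlp} flagged tweets confirmed "
      f"({precision_pct}% precision)")
```

Under that assumption, roughly three in ten flagged tweets held up under manual review, which is why the article treats human verification as necessary rather than optional.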

3) COVID-19 tweets clustered into four major topic areas

The topic modeling identified 15 clusters, which the researchers grouped into four broad areas:

  1. COVID-19 prevention and management — safety measures, testing, precautions, sanitation, distancing, and health practices.
  2. Nature of COVID-19 — uncertainty, cases, deaths, and attempts to understand the virus.
  3. People and agents of COVID-19 — lawmakers, governments, countries, frontliners, and health facilities.
  4. Contexts and consequences of COVID-19 — loved ones, panic buying, economic concerns, and other crises occurring during the same period. 

This matters because misinformation was not limited to one theme. It appeared across health behavior, disease explanation, politics, economics, and everyday fear.

4) The most common misinformation topic was prevention and management

Among the misinformation tweets, the largest topic was COVID-19 prevention and management, making up 48% of the misinformation tweets. This included misleading claims about how to prevent infection or manage risk. Other topics included the nature of COVID-19, people/agents of COVID-19, and the broader context and consequences of the pandemic. 

This is important for public health because false prevention advice can directly affect behavior. If people believe in ineffective remedies, mistaken claims about transmission, or misleading safety advice, they may make riskier decisions.

5) Misinformation appeared in five formats

The study identified five major formats of misinformation:

  • Misleading content — the most common format, found in 45.5% of misinformation tweets.
  • Satire/parody — jokes or humorous content that may still spread misleading ideas.
  • False connection — mismatch between content and linked material.
  • False context — genuine information reframed in a misleading way.
  • Conspiracy theories — claims that the pandemic was part of a secret plan by powerful actors. 

This finding is useful because fact-checking cannot stop at asking, “Is this true or false?” It also needs to ask, “How is the falsehood packaged?”

6) Misinformation used culturally meaningful discursive strategies

The study’s most interesting contribution is its analysis of discursive strategies: the emotional, rhetorical, and social styles used in misinformation tweets. The study found seven strategies:

  • Humor
  • Fear mongering
  • Political commentary
  • Anger and disgust
  • Performing credibility
  • Overpositivity
  • Marketing

Humor was the most common strategy. This is important in the Philippine context because humor is often used to cope with crisis. But humorous misinformation can still normalize unsafe ideas or make false claims more memorable. Overpositivity also matters because hopeful or religiously toned messages can downplay risk when they are not grounded in evidence. 

7) The study created a matrix of formats and strategies

The paper offers a matrix showing how misinformation formats and discursive strategies intersect. For example, misleading content can be delivered through humor, fear, anger, political commentary, or marketing. This matrix can help public health communicators understand not only what misinformation says, but how it persuades people. 
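A matrix like this can be thought of as a cross-tabulation of two labels per tweet. The sketch below builds one from a few invented coded rows. The category names follow the article, but the rows, counts, and variable names are hypothetical and do not reproduce the paper's actual matrix.

```python
from collections import Counter

# Invented coded tweets as (format, strategy) pairs; not the study's data.
coded = [
    ("misleading content", "humor"),
    ("misleading content", "fear mongering"),
    ("satire/parody", "humor"),
    ("conspiracy theory", "political commentary"),
    ("misleading content", "humor"),
]

# Count how often each format and strategy combination occurs.
matrix = Counter(coded)

formats = sorted({fmt for fmt, _ in coded})
strategies = sorted({strategy for _, strategy in coded})

# Print one row per format, with one count per strategy column.
print("columns:", strategies)
for fmt in formats:
    row = [matrix[(fmt, strategy)] for strategy in strategies]
    print(fmt, row)
```

Empty cells are as informative as full ones: a combination that never appears in the coded data may point to a pairing that is rare, or to one the coding scheme does not capture well.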

8) The study has important limits

The authors are careful about limitations. The Twitter dataset was not representative of all COVID-19 misinformation in the Philippines. Only geolocated tweets around the National Capital Region (NCR) were included, and not all tweets have geolocation data. Retweets, quote tweets, replies, bot activity, images, links, and emojis were not fully analyzed. Human coding also has limits because language can be ambiguous and coders bring their own positionality.

Bottom line

This study shows that COVID-19 misinformation in the Philippines was not just a technical problem of detecting false claims. It was a social, cultural, linguistic, and political problem. To understand it, the researchers needed computational tools, health expertise, social science interpretation, and culturally grounded human coding. 


Policy/practice recommendations

  1. Use human review alongside automated misinformation detection
    NLP tools should not be used alone in multilingual settings. Filipino, English, Taglish, humor, sarcasm, and political context require culturally informed human interpretation. 
  2. Design health messages that respond to style, not only content
    Public health agencies should not only correct false claims. They should understand whether misinformation spreads through fear, humor, overpositivity, political commentary, or marketing. 
  3. Treat humor as a serious misinformation pathway
    Humorous posts can make false ideas feel harmless or memorable. Risk communication should learn how to respond without sounding overly punitive or disconnected from Filipino online culture. 
  4. Build interdisciplinary infodemic teams
    Effective misinformation work needs health professionals, data scientists, social scientists, linguists, communication experts, and people who understand platform culture. 
  5. Create locally grounded misinformation taxonomies
    Imported frameworks are useful starting points, but local categories should emerge from actual Philippine data, including Filipino-language and Taglish posts. 
  6. Monitor misinformation beyond one platform
    The study’s key informants encountered misinformation not only on Twitter but also on Facebook, Viber, and Messenger, suggesting that public health monitoring should consider the wider information ecosystem. 

Glossary of key terms

  • Misinformation — False or misleading information shared without necessarily intending to deceive.
  • Disinformation — False information deliberately shared to deceive or manipulate.
  • Malinformation — True information used in a harmful or misleading way.
  • Infodemic — An overwhelming information environment during a crisis, where accurate information, misinformation, disinformation, and rumor circulate together. 
  • Natural language processing / NLP — Computational techniques used to analyze human language, often used to classify or detect patterns in large text datasets.
  • Biterm topic modeling / BTM — A topic modeling method useful for short texts such as tweets because it examines word co-occurrence patterns across the corpus. 
  • Corpus — A collection of texts used for analysis; in this study, the main corpus included 12,631 tweets. 
  • Subcorpus — A smaller subset of a larger text corpus used for more focused analysis.
  • Manual coding — Human-led classification and interpretation of text data.
  • Constant comparative analysis — A qualitative method that repeatedly compares data, codes, and categories to develop themes.
  • Consensual qualitative analysis — A team-based qualitative approach where researchers discuss and agree on codes and categories.
  • Discursive strategies — Communication styles or persuasive techniques used in posts, such as humor, fear, anger, political commentary, credibility performance, overpositivity, or marketing. 
  • Misleading content — Information that frames an issue inaccurately or in a way that leads readers toward a false understanding.
  • False connection — Content where the headline, caption, link, or framing does not match the actual evidence or linked material. 
  • False context — Genuine content presented in a misleading context.
  • Satire/parody — Humorous or exaggerated content that may still spread inaccurate ideas.
  • Performing credibility — Making a post look authoritative or science-based even when the claim is not supported by evidence. 
  • Taglish — A mixture of Tagalog/Filipino and English, common in Philippine online communication.
