A Graph RAG Approach to Refugee Feedback Analysis in Jordan
1. Introduction
UNHCR in Jordan must analyze large volumes of qualitative feedback from refugees to inform protection strategies and humanitarian assistance. This paper presents an experimental application of the Graph RAG (graph-based Retrieval-Augmented Generation) approach to refugee feedback analysis. Our work adapts the methodology outlined in “From Local to Global: A Graph RAG Approach to Query-Focused Summarization” (Edge et al., 2024) to the specific context of refugee protection in Jordan.
2. Graph RAG Approach & Pipeline
Our experimental pipeline for analyzing refugee feedback in Jordan consists of the following stages:
2.1 Source Documents → Text Chunks
We extracted text from various feedback sources and split it into chunks of 600 tokens with a 100-token overlap. The chunk size C and overlap O are defined as:
C = 600, O = 100
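A minimal sketch of the chunking step, assuming the text is already tokenized. The function name chunk_tokens and the placeholder token list are illustrative assumptions, not the pipeline's actual code, and a real run would use the LLM's own tokenizer.

```python
def chunk_tokens(tokens, chunk_size=600, overlap=100):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # 500 new tokens per chunk for C = 600, O = 100
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Placeholder tokens stand in for a tokenized transcript (an assumption).
tokens = [f"tok{i}" for i in range(1300)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

With C = 600 and O = 100, each chunk starts 500 tokens after the previous one, so the last 100 tokens of one chunk are repeated at the start of the next.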
2.2 Text Chunks → Element Instances
We used a multilingual Large Language Model (LLM) to extract entities and relationships. The extraction function E for a given text chunk t can be represented as:
E(t) = {(e_i, r_ij, e_j) | e_i, e_j ∈ Entities, r_ij ∈ Relationships}
Where e_i and e_j are entities, and r_ij is the relationship between them.
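The set E(t) can be held as a set of typed triples. The sketch below assumes, hypothetically, that the LLM is prompted to return one "entity | relationship | entity" triple per line. The output format, the Triple type, and the sample text are illustrative assumptions rather than the prompt or data used in the experiment.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    source: str    # e_i
    relation: str  # r_ij
    target: str    # e_j


def parse_triples(llm_output: str, delimiter: str = "|") -> Set[Triple]:
    """Parse delimiter-separated lines into triples, skipping malformed lines."""
    triples = set()
    for line in llm_output.splitlines():
        parts = [p.strip() for p in line.split(delimiter)]
        if len(parts) == 3 and all(parts):
            triples.add(Triple(*parts))
    return triples


# Hypothetical LLM output for one chunk (not real feedback data).
sample_output = """
Syrian refugees | report | cash assistance reduction
cash assistance reduction | linked to | negative coping mechanisms
a malformed line without delimiters
"""
triples = parse_triples(sample_output)
print(len(triples))
```

Using a set means a triple repeated within one chunk is stored once. Counting repeated mentions across chunks would need a list or a counter instead.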
2.3 Element Instances → Element Summaries
For each unique entity and relationship, we generated concise summaries. The summary function S for an entity e or relationship r is:
S(e) = LLM(context(e))
S(r) = LLM(context(r))
Where context() provides relevant information about the entity or relationship from all mentions in the data.
2.4 Element Summaries → Graph Communities
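We constructed a knowledge graph G = (V, E), where V is the set of entities and E is the set of weighted edges representing relationships. Here E denotes the edge set, not the extraction function E(t) of Section 2.2, and e denotes an edge rather than an entity. The edge weight w(e) for an edge e is calculated as: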
w(e) = count(e) / sqrt(count(source(e)) * count(target(e)))
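A worked sketch of this weighting follows. The text does not define count() precisely, so the code assumes count(e) is the number of extracted triples linking the two entities and count(entity) is the number of triples in which that entity appears. The function name and the sample triples are hypothetical.

```python
import math
from collections import Counter


def edge_weights(triples):
    """Compute w(e) = count(e) / sqrt(count(source(e)) * count(target(e)))."""
    edge_counts = Counter()
    entity_counts = Counter()
    for source, _relation, target in triples:
        edge_counts[(source, target)] += 1
        entity_counts[source] += 1  # assumption: one mention per triple
        entity_counts[target] += 1
    return {
        (s, t): n / math.sqrt(entity_counts[s] * entity_counts[t])
        for (s, t), n in edge_counts.items()
    }


# Hypothetical triples pooled over all chunks (duplicates are repeated mentions).
all_triples = [
    ("cash assistance", "linked to", "negative coping"),
    ("cash assistance", "linked to", "negative coping"),
    ("cash assistance", "covers", "rent"),
]
weights = edge_weights(all_triples)
for edge, w in weights.items():
    print(edge, round(w, 3))
```

Under this counting assumption, an edge between two distinct entities cannot occur more often than either endpoint, so its weight falls between 0 and 1. The normalization keeps frequently mentioned entities from dominating the graph purely through volume.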
We applied the Leiden algorithm for community detection, generating a hierarchical community structure with four levels (C0, C1, C2, C3), where C0 is the top level.
2.5 Graph Communities → Community Summaries
For each detected community c_i at level l, we generated a summary:
CS(c_i, l) = LLM(aggregate(elements(c_i)))
Where elements(c_i) returns all entities and relationships in community c_i, and aggregate() combines their summaries.
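The following sketch shows one way CS(c_i, l) could be computed for a single level. It assumes a mapping from each element to its community at that level and takes the LLM as a plain callable. The names community_summaries, membership, and fake_llm, and the sample summaries, are hypothetical.

```python
from collections import defaultdict


def aggregate(summaries):
    """Combine element summaries into one bulleted prompt body."""
    return "\n".join(f"- {s}" for s in summaries)


def community_summaries(membership, element_summaries, llm):
    """Return {community id: CS(c_i, l)} for one level of the hierarchy."""
    grouped = defaultdict(list)
    for element, community in membership.items():
        grouped[community].append(element_summaries[element])
    return {c: llm(aggregate(s)) for c, s in grouped.items()}


def fake_llm(prompt):
    # Stand-in for a real LLM call (assumption): reports how many elements it saw.
    return f"[summary of {len(prompt.splitlines())} element(s)]"


membership = {"cash assistance": "C0-1", "rent": "C0-1", "school enrolment": "C0-2"}
element_summaries = {
    "cash assistance": "Monthly cash support for vulnerable households.",
    "rent": "Housing costs reported as the largest expense.",
    "school enrolment": "Barriers to registering children in school.",
}
result = community_summaries(membership, element_summaries, fake_llm)
print(result)
```

Running the same function once per level, with that level's membership mapping, would yield summaries for C0 through C3.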
2.6 Community Summaries → Community Answers → Global Answer
Given a query q, we generate the global answer A(q) as follows:
- Prepare: Divide community summaries into chunks of 8k tokens.
- Map: Generate intermediate answers I_i(q) for each chunk i.
- Reduce: Synthesize the final answer:
A(q) = LLM(aggregate(sort_by_relevance(I_i(q))))
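The three steps above can be sketched as a small map-reduce routine. This is an illustration under stated assumptions, not the experiment's code: the map step is assumed to return an (answer, relevance score) pair, token counts are approximated by whitespace splitting, and map_llm and reduce_llm are hypothetical stand-ins for real LLM calls.

```python
def pack_chunks(summaries, max_tokens=8000, count_tokens=lambda s: len(s.split())):
    """Prepare: greedily pack community summaries into chunks of at most max_tokens."""
    chunks, current, used = [], [], 0
    for summary in summaries:
        n = count_tokens(summary)
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(summary)  # an oversized summary gets a chunk of its own
        used += n
    if current:
        chunks.append(current)
    return chunks


def global_answer(query, summaries, map_llm, reduce_llm, max_tokens=8000):
    # Map: one intermediate answer I_i(q) with a relevance score per chunk.
    intermediate = [
        map_llm(query, "\n".join(chunk)) for chunk in pack_chunks(summaries, max_tokens)
    ]
    # Reduce: sort by relevance (highest first) and synthesize the final answer.
    ranked = sorted(intermediate, key=lambda pair: pair[1], reverse=True)
    return reduce_llm(query, [answer for answer, _score in ranked])


# Toy stand-ins for LLM calls (assumptions): score = occurrences of the query term.
def map_llm(query, context):
    return context, context.count(query)


def reduce_llm(query, answers):
    return " | ".join(answers)


summaries = [
    "cash assistance cuts raise debt",
    "school transport is costly",
    "cash cash cash",
]
answer = global_answer("cash", summaries, map_llm, reduce_llm, max_tokens=5)
print(answer)
```

The small max_tokens value in the usage example only forces one summary per chunk so that the sorting step is visible. The pipeline described above uses 8k-token chunks.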
3. Experimental Setup
3.1 Data Sources
Our experiment used the following data sources from UNHCR Jordan:
- 61 Focus Group Discussion transcripts
- 14 Community Representative Meeting reports
- 32 Community Support Committee logs
- 9 Information Session summaries
- 15 Multimedia feedback entries (social media, WhatsApp)
3.2 Entity Types
We focused on extracting the following entity types:
- Refugee demographics
- Protection concerns
- Geographical locations
- Assistance types
3.3 Evaluation Metrics
We evaluated our approach using four metrics:
- Comprehensiveness (C)
- Diversity (D)
- Empowerment (E)
- Directness (Dr)
Each metric was computed using LLM-based comparisons between our approach and baselines, with scores normalized to a 0–100 scale.
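How the pairwise judgements were normalized is not specified here. One common option, assumed purely for illustration, is a win rate in which ties count as half a win. The function name and the sample judgements below are hypothetical and are not results from the experiment.

```python
def win_rate_score(judgements):
    """Convert 'win' / 'tie' / 'loss' judgements for one system into a 0-100 score."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return 100.0 * sum(points[j] for j in judgements) / len(judgements)


# Hypothetical LLM-judge outcomes for one metric over four questions.
score = win_rate_score(["win", "win", "tie", "loss"])
print(score)
```

Under this scheme a score of 50 means the two systems are judged equal on that metric, and scores above 50 favor the system being scored.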
4. Results
Our Graph RAG approach scored higher than both baselines on Comprehensiveness and Diversity, similarly on Empowerment, and slightly lower on Directness. Figure 1 illustrates the performance comparison across the four metrics:
[Figure 1: Bar chart comparing Graph RAG performance against baselines (Naïve RAG and Global Text Summarization) across four metrics: Comprehensiveness, Diversity, Empowerment, and Directness. The y-axis shows scores from 0–100, with Graph RAG showing higher bars for Comprehensiveness and Diversity, similar heights for Empowerment, and slightly lower for Directness compared to baselines.]
The hierarchical community structure revealed interesting patterns in refugee concerns. Figure 2 shows the distribution of top-level communities:
[Figure 2: Pie chart showing the distribution of top-level (C0) communities in the refugee feedback graph. Slices represent major themes such as “Basic Needs” (30%), “Protection” (25%), “Education” (20%), “Health” (15%), and “Livelihoods” (10%).]
5. Discussion
Our results suggest that the Graph RAG approach is effective at capturing complex, interconnected refugee protection issues in Jordan. The hierarchical community structure proved particularly valuable in analyzing multi-faceted concerns.
One interesting finding was the relationship between cash assistance reductions and other protection issues. We observed a strong positive correlation (r = 0.78) between mentions of cash assistance cuts and reports of negative coping mechanisms.
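For readers unfamiliar with the statistic, the sketch below computes a Pearson correlation coefficient from paired mention counts. The unit of analysis (one pair of counts per document) is an assumption, and the toy numbers are invented for illustration only. They are not the experiment's data and do not reproduce the reported value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy per-document counts (hypothetical): mentions of cash assistance cuts
# and mentions of negative coping mechanisms.
cut_mentions = [0, 1, 2, 3, 4]
coping_mentions = [1, 1, 3, 3, 5]
r = pearson_r(cut_mentions, coping_mentions)
print(round(r, 3))
```

A coefficient near 1 indicates that documents mentioning cuts more often also tend to mention negative coping mechanisms more often. It does not by itself establish that one causes the other.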
6. Conclusion and Future Work
The Graph RAG approach shows promise for enhancing UNHCR’s capacity for data-driven decision-making in refugee protection. Future work will focus on:
- Refining entity extraction for refugee-specific terminology
- Incorporating temporal data to analyze trends in protection concerns
- Exploring multilingual capabilities to handle diverse refugee populations
By leveraging this approach, we aim to provide more comprehensive and nuanced insights into refugee needs and concerns, ultimately leading to more effective humanitarian assistance strategies.
Note: This experiment was conducted on test data mimicking original refugee feedback in Jordan. Application to real data is pending data protection and cybersecurity clearance. Full working code is available upon request.