Research Project: Norwegian + Urdu Multilingual AI Assistant

This is a flagship research-style project that combines two underrepresented languages — Norwegian and Urdu — in a practical AI assistant. The goal is not just to build something that works, but to understand the specific challenges each language presents, measure performance systematically, and document findings the way a researcher would.

Why these languages? Norwegian has strong public NLP resources but is underrepresented in production AI tools relative to English. Urdu is one of the most spoken languages globally but severely underrepresented in NLP benchmarks — Roman Urdu (Urdu written in Latin script, common in informal digital communication) is almost entirely absent from most models' training data.

What you will build:

Multilingual sentiment analysis pipeline
Norwegian-English and English-Urdu translation
Domain-specific text classification (immigrant support queries)
Multilingual chatbot with language detection
Research-quality evaluation report

Setup

Bash

pip install transformers datasets torch langdetect sacrebleu evaluate pandas tqdm

Python

from transformers import pipeline, MarianMTModel, MarianTokenizer
import langdetect
from langdetect import DetectorFactory
import pandas as pd
import torch

# langdetect is probabilistic; fixing the seed makes its results reproducible
DetectorFactory.seed = 0

print(f"PyTorch: {torch.__version__}")
print(f"Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

Phase 1: Baseline Features

Language Detection

Python

def detect_language(text: str) -> str:
    try:
        return langdetect.detect(text)
    except Exception:
        return "unknown"

test_texts = [
    "Hva er reglene for foreldrepermisjon i Norge?",   # Norwegian
    "مجھے اپنا پاسپورٹ تجدید کرنا ہے",                 # Urdu (Nastaliq script)
    "Mujhe apna passport renew karna hai",              # Roman Urdu
    "I need to renew my passport",                      # English
]

for text in test_texts:
    lang = detect_language(text)
    print(f"[{lang}] {text[:50]}")

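Example output (langdetect is probabilistic, so labels for short inputs can differ between runs unless the seed is fixed):

[no] Hva er reglene for foreldrepermisjon i Norge?
[ur] مجھے اپنا پاسپورٹ تجدید کرنا ہے
[en] Mujhe apna passport renew karna hai
[en] I need to renew my passport

Note: Roman Urdu detection is a known challenge. langdetect has no Roman Urdu profile, so the third query above is labelled as English (or as another Latin-script language). Record the misdetection rate: it becomes a research finding.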
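One way to probe this limitation is a lightweight post-check that runs after the detector. The sketch below is an assumption, not part of langdetect: the marker list, the `roman_ur` tag, and the 0.3 threshold are all hypothetical and would need tuning on real data.

```python
import re

# Hypothetical marker list of frequent Roman Urdu function words.
# Illustrative only; this is not a validated lexicon.
ROMAN_URDU_MARKERS = {
    "mujhe", "mujhay", "hai", "hain", "ka", "ki", "ke", "ko", "mein",
    "kya", "nahi", "nahin", "apna", "apni", "karna", "karni", "chahiye",
    "liye", "aur", "kaise",
}

def roman_urdu_score(text: str) -> float:
    """Fraction of Latin-script tokens that appear in the marker list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in ROMAN_URDU_MARKERS)
    return hits / len(tokens)

def refine_language(detected: str, text: str, threshold: float = 0.3) -> str:
    """Override the detector's label with 'roman_ur' when enough markers match."""
    if detected != "ur" and roman_urdu_score(text) >= threshold:
        return "roman_ur"
    return detected

print(refine_language("en", "Mujhe apna passport renew karna hai"))
print(refine_language("en", "I need to renew my passport"))
```

A keyword heuristic like this will miss spelling variants and can fire on other languages that share short words, so its own error rate belongs in the evaluation alongside the detector's.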

Multilingual Sentiment Analysis

Python

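# XLM-RoBERTa is pretrained on about 100 languages, including Norwegian and Urdu.
# This checkpoint was fine-tuned for sentiment on tweets in eight languages that
# include neither, so the results below rely on cross-lingual transfer.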
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    tokenizer="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

test_sentiments = [
    "Dette er et fantastisk system!",          # Norwegian: "This is a fantastic system!"
    "Jeg er veldig frustrert over ventetiden", # Norwegian: "I am very frustrated with the wait time"
    "یہ سروس بہت اچھی ہے",                    # Urdu: "This service is very good"
    "مجھے اس نظام سے مسائل ہیں",             # Urdu: "I have problems with this system"
]

for text in test_sentiments:
    result = sentiment_pipeline(text, truncation=True, max_length=512)
    print(f"[{result[0]['label']} {result[0]['score']:.2f}] {text}")
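Label strings are not standardised across sentiment checkpoints: some return words with varying capitalisation, others return raw indices such as `LABEL_0`. The helper below is a small sketch for normalising them before any comparison. The index order (negative, neutral, positive) is an assumption that should be confirmed against the `id2label` entry in the model's config, and the function names are hypothetical.

```python
# Assumed index order; confirm against the checkpoint's config (id2label).
INDEX_LABELS = {"label_0": "negative", "label_1": "neutral", "label_2": "positive"}

def normalise_label(raw_label: str) -> str:
    """Lower-case a label and translate raw index labels to words."""
    key = raw_label.strip().lower()
    return INDEX_LABELS.get(key, key)

def is_strongly_negative(result: dict, threshold: float = 0.8) -> bool:
    """True when a pipeline result is negative with a score above the threshold."""
    return normalise_label(result["label"]) == "negative" and result["score"] > threshold

# Dictionaries shaped like one element of the pipeline's output list
print(is_strongly_negative({"label": "Negative", "score": 0.91}))
print(is_strongly_negative({"label": "LABEL_2", "score": 0.95}))
```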

Translation Pipeline

Python

class TranslationPipeline:
    def __init__(self):
        # Norwegian → English (confirm this checkpoint ID on the Hugging Face Hub before relying on it)
        self.no_en_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-tc-big-no-en")
        self.no_en_tok   = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-tc-big-no-en")

        # English → Urdu
        self.en_ur_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-ur")
        self.en_ur_tok   = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ur")

        # English → Norwegian is not loaded in this prototype. opus-mt-en-ROMANCE
        # targets Romance languages, and Norwegian is North Germanic, so that
        # checkpoint cannot produce Norwegian. A multi-target Germanic model is
        # needed instead: a research limitation to document.

    def translate(self, text: str, src: str, tgt: str) -> str:
        if src == "no" and tgt == "en":
            model, tok = self.no_en_model, self.no_en_tok
        elif src == "en" and tgt == "ur":
            model, tok = self.en_ur_model, self.en_ur_tok
        else:
            return f"[Translation not supported: {src}→{tgt}]"

        inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
        output = model.generate(**inputs, max_new_tokens=200)
        return tok.decode(output[0], skip_special_tokens=True)

translator = TranslationPipeline()

test_query = "Hva er reglene for å søke om familieinnvandring?"
# "What are the rules for applying for family immigration?"

en_translation = translator.translate(test_query, "no", "en")
ur_translation  = translator.translate(en_translation, "en", "ur")

print(f"Norwegian: {test_query}")
print(f"English:   {en_translation}")
print(f"Urdu:      {ur_translation}")
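The Norwegian to Urdu path above is a pivot translation: two supported pairs chained through English. A minimal sketch of that idea, independent of any model, is to search the list of supported pairs for a route and then apply the hops in order. `find_route`, `translate_via_route`, and the stub translator are hypothetical names introduced here for illustration.

```python
from collections import deque

def find_route(src, tgt, supported_pairs):
    """Breadth-first search for the shortest chain of supported language pairs."""
    if src == tgt:
        return [src]
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for a, b in supported_pairs:
            if a == path[-1] and b not in seen:
                new_path = path + [b]
                if b == tgt:
                    return new_path
                seen.add(b)
                queue.append(new_path)
    return None  # no chain of supported pairs connects src to tgt

def translate_via_route(text, route, translate_fn):
    """Apply translate_fn(text, src, tgt) for each consecutive hop in the route."""
    for a, b in zip(route, route[1:]):
        text = translate_fn(text, a, b)
    return text

pairs = [("no", "en"), ("en", "ur")]
print(find_route("no", "ur", pairs))   # route through the English pivot
print(find_route("ur", "no", pairs))   # None: no reverse models are loaded

# Stub translator that only records the hops, standing in for a real model call
stub = lambda text, a, b: f"{text}|{a}>{b}"
print(translate_via_route("query", find_route("no", "ur", pairs), stub))
```

Each hop compounds translation errors, so the report should evaluate the pivoted output separately from the single-hop Norwegian to English output.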

Phase 2: Domain Adaptation — Immigrant Support Queries

Build a classifier that routes queries to the right support category.

Dataset Construction

Python

# Create a small labelled dataset of immigrant support queries
# In a real project this would come from actual support ticket data
training_data = {
    "text": [
        # Norwegian queries
        "Hva er reglene for foreldrepermisjon?",
        "Jeg trenger hjelp med skattemeldingen",
        "Hvordan søker jeg om familiegjenforening?",
        "Hva koster det å fornye oppholdstillatelse?",
        "Jeg har mistet jobben og trenger hjelp",
        # English queries
        "How do I apply for parental leave?",
        "I need help with my tax return",
        "What documents do I need for family reunification?",
        "How much does it cost to renew residence permit?",
        "I lost my job and need support",
        # Urdu (Nastaliq)
        "میں اپنی اقامت کی اجازت کیسے تجدید کروں؟",
        "مجھے ٹیکس ریٹرن میں مدد چاہیے",
        "خاندانی اتحاد کے لیے کیا کاغذات درکار ہیں؟",
        # Roman Urdu
        "Mujhe residence permit renew karni hai",
        "Tax return mein madad chahiye",
        "Family reunification ke liye kya documents chahiye?",
    ],
    "category": [
        "parental_leave", "tax", "family_reunion", "residence_permit", "employment",
        "parental_leave", "tax", "family_reunion", "residence_permit", "employment",
        "residence_permit", "tax", "family_reunion",
        "residence_permit", "tax", "family_reunion",
    ]
}

df = pd.DataFrame(training_data)
print(df["category"].value_counts())
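`value_counts()` shows the category balance but hides gaps per language. The sketch below cross-tabulates language against category using only the standard library. The language tags (`no`, `en`, `ur`, `roman_ur`) are an assumption added here, assigned in the same order as the queries in the toy dataset.

```python
from collections import Counter

# Language tags follow the order of the toy dataset: 5 Norwegian, 5 English,
# 3 Urdu, 3 Roman Urdu. The tag names themselves are an assumption.
languages = ["no"] * 5 + ["en"] * 5 + ["ur"] * 3 + ["roman_ur"] * 3
categories = [
    "parental_leave", "tax", "family_reunion", "residence_permit", "employment",
    "parental_leave", "tax", "family_reunion", "residence_permit", "employment",
    "residence_permit", "tax", "family_reunion",
    "residence_permit", "tax", "family_reunion",
]

coverage = Counter(zip(languages, categories))

def missing_cells(coverage, languages, categories):
    """Return (language, category) pairs that have no example at all."""
    return sorted(
        (lang, cat)
        for lang in set(languages)
        for cat in set(categories)
        if coverage[(lang, cat)] == 0
    )

for lang, cat in missing_cells(coverage, languages, categories):
    print(f"No examples for {cat} in {lang}")
```

On this dataset the check reports that Urdu and Roman Urdu have no parental leave or employment examples, which limits what any per-language comparison on those categories can show.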

Python

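# Use zero-shot classification for low-resource categorisation
# (we don't have enough labelled data for fine-tuning).
# facebook/bart-large-mnli is English-only, so a multilingual NLI checkpoint
# is used here to score Norwegian and Urdu queries directly.
zero_shot = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
)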

candidate_labels = [
    "parental leave", "tax and finance", "family reunification",
    "residence permit", "employment and unemployment", "healthcare", "education"
]

def classify_query(text: str) -> dict:
    result = zero_shot(text, candidate_labels)
    return {
        "predicted_category": result["labels"][0],
        "confidence": result["scores"][0],
        "all_scores": dict(zip(result["labels"], result["scores"]))
    }

# Test
queries = [
    "Jeg trenger hjelp med foreldrepengeordningen",
    "مجھے میری بچت پر ٹیکس کیسے ادا کرنا ہے؟",
    "Mujhe maternity leave ke baare mein jaan'na hai",
]

for q in queries:
    result = classify_query(q)
    print(f"Query: {q[:60]}")
    print(f"  → {result['predicted_category']} (confidence: {result['confidence']:.2f})")
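Zero-shot scores are often close together on short or out-of-domain queries, so acting on the top label alone can misroute users. A possible guard, sketched below, sends a query to manual review when the top score is low or the margin over the runner-up is small. The function name, the `needs_human_review` label, and the 0.1 margin are assumptions; the 0.5 floor mirrors the low-confidence cut-off used later in the error categories.

```python
def route_with_fallback(result: dict, min_confidence: float = 0.5,
                        min_margin: float = 0.1) -> str:
    """Return the top label, or a review flag when the prediction is uncertain.

    `result` is shaped like zero-shot pipeline output: parallel "labels" and
    "scores" lists sorted by score in descending order.
    """
    labels, scores = result["labels"], result["scores"]
    top = scores[0]
    runner_up = scores[1] if len(scores) > 1 else 0.0
    if top < min_confidence or (top - runner_up) < min_margin:
        return "needs_human_review"
    return labels[0]

# Hand-written dictionaries for illustration, not real model output
confident = {"labels": ["tax and finance", "healthcare"], "scores": [0.72, 0.10]}
ambiguous = {"labels": ["residence permit", "family reunification"], "scores": [0.41, 0.39]}

print(route_with_fallback(confident))
print(route_with_fallback(ambiguous))
```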

Phase 3: Multilingual Chatbot

Python

class MultilingualSupportBot:
    def __init__(self):
        self.translator = TranslationPipeline()
        self.sentiment  = pipeline(
            "sentiment-analysis",
            model="cardiffnlp/twitter-xlm-roberta-base-sentiment"
        )
        self.classifier = zero_shot

        # Knowledge base in English (source of truth).
        # Figures are illustrative and change over time; verify against nav.no and udi.no.
        self.knowledge_base = {
            "parental_leave": (
                "In Norway, parental leave totals 49 weeks at 100% pay or 59 weeks at 80% pay. "
                "Both parents are entitled to leave. The father's quota is 15 weeks."
            ),
            "residence_permit": (
                "To renew your residence permit, apply via UDI.no at least 1 month before expiry. "
                "Required documents: valid passport, documentation of accommodation, proof of income."
            ),
            "tax": (
                "Tax returns in Norway are pre-filled and sent in March. You have until April 30 to submit. "
                "If you have income from abroad, you must add it manually."
            ),
            "family_reunion": (
                "Family reunification requires the sponsor to have a valid residence permit. "
                "Processing time is typically 3-6 months. Fees vary by relationship type."
            ),
            "employment": (
                "Unemployed residents can apply for dagpenger (unemployment benefit) through NAV. "
                "Eligibility depends on your prior income and on having lost at least 50% of your working hours."
            ),
        }

    def respond(self, user_message: str) -> str:
        # 1. Detect language
        lang = detect_language(user_message)

        # 2. Analyse sentiment
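        sentiment = self.sentiment(user_message, truncation=True)[0]
        # Label casing differs between checkpoints, so compare case-insensitively
        is_frustrated = sentiment["label"].lower() == "negative" and sentiment["score"] > 0.8

        # 3. Translate to English for classification if needed
        # (only no→en is loaded; Urdu and Roman Urdu are classified untranslated)
        english_message = user_message
        if lang == "no":
            english_message = self.translator.translate(user_message, "no", "en")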

        # 4. Classify query
        classification = classify_query(english_message)
        # Keep "and" so that "tax and finance" becomes "tax_and_finance" and matches key_map
        category = classification["predicted_category"].replace(" ", "_")

        # Map to knowledge base key
        key_map = {
            "parental_leave": "parental_leave",
            "tax_and_finance": "tax",
            "family_reunification": "family_reunion",
            "residence_permit": "residence_permit",
            "employment_and_unemployment": "employment",
        }
        kb_key = key_map.get(category, None)

        # 5. Retrieve answer from knowledge base
        if kb_key and kb_key in self.knowledge_base:
            answer_en = self.knowledge_base[kb_key]
        else:
            answer_en = "I don't have specific information about that. Please contact NAV or UDI directly."

        # 6. Add empathy if user is frustrated
        if is_frustrated:
            answer_en = "I understand this process can be stressful. " + answer_en

        # 7. Translate answer back to user's language
        if lang == "no":
            # No English→Norwegian model is loaded in this prototype, so the
            # answer is returned in English with a notice. Production would
            # need a Norwegian-target translation model.
            return f"[Norwegian translation unavailable in this prototype]\n{answer_en}"
        elif lang == "ur":
            answer_ur = self.translator.translate(answer_en, "en", "ur")
            return answer_ur
        else:
            return answer_en

bot = MultilingualSupportBot()

# Test conversations
test_queries = [
    "Hva er reglene for foreldrepermisjon?",
    "I need to renew my residence permit urgently",
    "مجھے اپنے ٹیکس کے بارے میں مدد چاہیے",
]

for query in test_queries:
    print(f"\nUser: {query}")
    response = bot.respond(query)
    print(f"Bot:  {response[:200]}")
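Because the bot looks up answers through a chain of string transformations (candidate label, normalised key, knowledge-base key), a silent mismatch at any step sends the user to the fallback answer. The sketch below checks which candidate labels fail to resolve under a given normalisation function. `unmapped_labels` and `spaces_to_underscores` are hypothetical helpers; the labels and map are copied from the code above.

```python
candidate_labels = [
    "parental leave", "tax and finance", "family reunification",
    "residence permit", "employment and unemployment", "healthcare", "education",
]

key_map = {
    "parental_leave": "parental_leave",
    "tax_and_finance": "tax",
    "family_reunification": "family_reunion",
    "residence_permit": "residence_permit",
    "employment_and_unemployment": "employment",
}

def unmapped_labels(labels, key_map, normalise):
    """Return the labels whose normalised form is not a key in key_map."""
    return [label for label in labels if normalise(label) not in key_map]

def spaces_to_underscores(label: str) -> str:
    return label.replace(" ", "_")

print(unmapped_labels(candidate_labels, key_map, spaces_to_underscores))
```

With this normalisation only the two labels that have no knowledge-base entry (healthcare and education) are reported. A normaliser that collapses " and " into a single underscore would turn "tax and finance" into `tax_finance`, which is not a key in `key_map`, and the check would flag it.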

Phase 4: Research-Style Evaluation

Translation Quality — BLEU Score

Python

from evaluate import load

# sacreBLEU applies its own standardised tokenisation, which keeps scores comparable
sacrebleu = load("sacrebleu")

# Reference translations (ground truth from a professional translator)
references_no_en = [
    ["I need help with parental leave application"],
    ["What documents do I need for family reunification?"],
    ["How much does it cost to renew a residence permit?"],
]

hypotheses_no_en = [
    translator.translate("Jeg trenger hjelp med søknad om foreldrepermisjon", "no", "en"),
    translator.translate("Hvilke dokumenter trenger jeg for familiegjenforening?", "no", "en"),
    translator.translate("Hva koster det å fornye oppholdstillatelse?", "no", "en"),
]

result = sacrebleu.compute(
    predictions=hypotheses_no_en,
    references=references_no_en
)
# Three sentences give only a smoke test; a usable estimate needs a larger test set
print(f"Norwegian→English BLEU: {result['score']:.2f}")

# Urdu translation quality (harder to evaluate without native speaker references)
# Document this as a limitation in the report
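The evaluation report calls for BLEU with confidence intervals. One common way to get them is a percentile bootstrap over sentence pairs, sketched below with a pluggable metric function. `bootstrap_ci` is a hypothetical helper, and `exact_match_rate` is a stand-in metric used only so the example runs without model downloads; in the project it would be replaced by a function that returns the sacreBLEU score for the resampled lists.

```python
import random

def bootstrap_ci(predictions, references, metric_fn,
                 n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a corpus-level metric."""
    rng = random.Random(seed)
    n = len(predictions)
    scores = []
    for _ in range(n_resamples):
        # Resample sentence pairs with replacement, keeping pairs aligned
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric_fn([predictions[i] for i in idx],
                                [references[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

def exact_match_rate(preds, refs):
    """Stand-in metric: share of predictions equal to their first reference."""
    return sum(p == r[0] for p, r in zip(preds, refs)) / len(preds)

# Toy data for illustration only
preds = ["a", "b", "c", "d"]
refs = [["a"], ["b"], ["x"], ["d"]]
print(bootstrap_ci(preds, refs, exact_match_rate))
```

With only a handful of sentences the interval will be very wide, which is itself worth reporting.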

Per-Language Accuracy Report

Python

# Test classification accuracy per language
test_cases = pd.DataFrame({
    "text": [
        "Hva er reglene for foreldrepermisjon?",
        "مجھے اقامت کی اجازت تجدید کرنی ہے",
        "Mujhe residence permit renew karni hai",
        "How do I apply for unemployment benefits?",
        "Jeg har mistet jobben",
    ],
    "language": ["no", "ur", "roman_ur", "en", "no"],
    "true_category": [
        "parental_leave", "residence_permit", "residence_permit",
        "employment", "employment"
    ]
})

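# Map zero-shot labels back to the dataset's category keys. An explicit map
# avoids substring matching, which fails for pairs such as
# "family_reunion" vs "family reunification".
label_to_category = {
    "parental leave": "parental_leave",
    "tax and finance": "tax",
    "family reunification": "family_reunion",
    "residence permit": "residence_permit",
    "employment and unemployment": "employment",
}

results = []
for _, row in test_cases.iterrows():
    pred = classify_query(row["text"])
    predicted = label_to_category.get(pred["predicted_category"])
    correct = predicted == row["true_category"]
    results.append({
        "language": row["language"],
        "correct": correct,
        "confidence": pred["confidence"],
    })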

results_df = pd.DataFrame(results)
print("\nAccuracy by language:")
print(results_df.groupby("language")["correct"].mean().round(2))

print("\nConfidence by language:")
print(results_df.groupby("language")["confidence"].mean().round(3))
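With only one or two test cases per language, a raw accuracy of 1.0 or 0.0 says very little. A Wilson score interval makes that uncertainty explicit. The sketch below is a standard formula written out by hand; the function name is an assumption, and `z=1.96` corresponds to a 95% interval.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion; returns (lower, upper)."""
    if total == 0:
        return (0.0, 1.0)  # no data: the accuracy is unconstrained
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Two correct out of two, as could happen for a language with two test cases
low, high = wilson_interval(2, 2)
print(f"accuracy 1.00, 95% interval [{low:.2f}, {high:.2f}]")
```

Two correct answers out of two give an interval from roughly 0.34 to 1.0, which is why the per-language table needs many more cases before any fairness claim can rest on it.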

Hallucination and Error Categories

Python

# Systematically document error types (for the research report)
error_categories = {
    "Roman Urdu misdetected as English": 0,
    "Low-confidence classification (< 0.5)": 0,
    "Translation quality insufficient for classification": 0,
    "Out-of-vocabulary cultural concepts": 0,
    "Sentiment false negative on indirect language": 0,
}

# Fill in from your own test results. The counts below are placeholders, not measurements.
error_categories["Roman Urdu misdetected as English"] = 8   # e.g. 8 out of 10 test cases
error_categories["Low-confidence classification (< 0.5)"] = 3

print("\nError category frequencies:")
for category, count in error_categories.items():
    print(f"  {category}: {count}")
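Hand-entered counts are easy to get out of sync with the test set. As an alternative, the counts can be derived from per-query records. The sketch below assumes each record carries the true language, the detected language, and the classifier confidence; these field names and the sample records are hypothetical and exist only to show the mechanics.

```python
from collections import Counter

def tally_errors(records, low_conf_threshold=0.5):
    """Count error categories from per-query evaluation records."""
    counts = Counter()
    for rec in records:
        if rec["true_language"] == "roman_ur" and rec["detected_language"] == "en":
            counts["Roman Urdu misdetected as English"] += 1
        if rec["confidence"] < low_conf_threshold:
            counts["Low-confidence classification (< 0.5)"] += 1
    return counts

# Invented records for illustration; replace with rows from your own evaluation
sample_records = [
    {"true_language": "roman_ur", "detected_language": "en", "confidence": 0.42},
    {"true_language": "roman_ur", "detected_language": "en", "confidence": 0.71},
    {"true_language": "no", "detected_language": "no", "confidence": 0.88},
    {"true_language": "ur", "detected_language": "ur", "confidence": 0.35},
]

for category, count in tally_errors(sample_records).items():
    print(f"  {category}: {count}")
```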

Deliverables

1. GitHub repo containing:
   [ ] scripts/translate.py — translation pipeline
   [ ] scripts/classify.py  — zero-shot classification
   [ ] scripts/chatbot.py   — multilingual bot
   [ ] notebooks/evaluation.ipynb — all evaluation results
   [ ] data/test_cases.csv  — labelled test set with ground truth

2. Evaluation report (markdown or PDF) containing:
   [ ] BLEU scores for Norwegian→English translation (with confidence intervals)
   [ ] Classification accuracy broken down by language (Norwegian, Urdu, Roman Urdu, English)
   [ ] Error analysis table (error type, frequency, example, proposed fix)
   [ ] Qualitative examples: 3 successful and 3 failed responses with explanation
   [ ] Fairness analysis: does the system perform equally across languages?
   [ ] Recommendations for production deployment

3. Demo:
   [ ] Short video (3-5 minutes) walking through the chatbot in all three languages
   [ ] Or a README with screenshots showing multilingual conversations

Key Research Findings to Document

After running your evaluation, your report should address these questions honestly:

Roman Urdu: How does the system handle queries in Urdu written in Latin script? What is the language detection accuracy? How does this affect downstream classification?

Domain coverage: Which support categories does the system handle well and which does it struggle with? Are there Norwegian-specific concepts (NAV, UDI, dagpenger) that translate poorly?

Fairness: Is the system's accuracy consistent across languages? If Norwegian queries get 90% accuracy and Urdu queries get 60%, what are the implications for deployment to an immigrant support service?

Limitations vs. production readiness: What would need to change before this system could be deployed in a real immigrant support context?

Documenting failures honestly is what separates research from marketing. A system that acknowledges it cannot handle Roman Urdu reliably is more trustworthy than one that claims multilingual support and silently fails.
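The fairness question above can be made concrete with a single number: the gap between the best and worst per-language accuracy. The sketch below uses the hypothetical 90% and 60% figures from the question itself, not measured results, and `accuracy_gap` is a name introduced here for illustration.

```python
def accuracy_gap(accuracy_by_language: dict) -> dict:
    """Report the best and worst language and the accuracy gap between them."""
    best = max(accuracy_by_language, key=accuracy_by_language.get)
    worst = min(accuracy_by_language, key=accuracy_by_language.get)
    return {
        "best": best,
        "worst": worst,
        "gap": accuracy_by_language[best] - accuracy_by_language[worst],
    }

# Hypothetical figures taken from the fairness question, not from an experiment
example = {"no": 0.90, "ur": 0.60}
summary = accuracy_gap(example)
print(f"{summary['best']} vs {summary['worst']}: gap of {summary['gap']:.2f}")
```

A gap of this size would mean Urdu speakers receive a wrong or fallback answer far more often than Norwegian speakers, which is the kind of implication the report should spell out.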
