Contents
Tap any chapter to start reading.
Chapter 1: Large Language Models for Social Data. Tokenization, embeddings, attention — how LLMs read text. Using pretrained models for classification, semantic search, and zero-shot tagging. Prompting patterns, RAG, and evaluating LLM outputs at scale.
Chapter 2: Foundation Models — Adaptation, Alignment, and Deployment. LoRA, QLoRA, full vs. PEFT fine-tuning, RLHF and the closed-form DPO derivation, Constitutional AI / RLAIF, distillation, INT8/INT4 quantization, vLLM / llama.cpp / Ollama / MLX deployment.
Chapter 3: Multimodal Analysis — Text, Image, and Video. Images as tensors, convolution from scratch, CNN intuition, CLIP joint embeddings with InfoNCE, vision-language models (BLIP, LLaVA, GPT-4o), audio spectrograms and MFCCs, multimodal content moderation.
Chapter 4: Temporal and Dynamic Networks. Time-stamped edges, temporal paths and reachability, temporal centralities, motifs (Paranjape–Benson–Leskovec), dynamic link prediction (Adamic–Adar), burstiness vs. Poisson contagion, community life cycles.
Chapter 5: Causal Inference and Peer Effects on Networks. Manski’s reflection problem, Bramoullé–Djebbari–Fortin IV identification, Aronow–Samii exposure mapping with Horvitz–Thompson weights, two-stage randomized saturation designs, homophily vs. influence.
Chapter 6: Knowledge Graphs and Ontologies. RDF/OWL basics, schema vs. ontology, KG embeddings (TransE, DistMult, ComplEx, RotatE) implemented from scratch, link prediction, and applications (Wikidata, biomedical KGs, financial entity graphs).
Chapter 7: Knowledge-Grounded Text Analytics. NER, entity linking, relation extraction, ontology-aware information extraction with type checking and rule-based reasoning, KG-augmented RAG, neuro-symbolic NLP, and a compliance-grade news-triage case study.
Chapter 8: LLM Agents for Content Operations. From chat to agent: planning, tool use, memory. ReAct, Chain-of-Thought, Tree-of-Thoughts, ReWOO. Multi-agent systems, agent benchmarks (GAIA, AgentBench), prompt-injection defenses, and a content-moderation agent.
Chapter 9: Vector Databases and Embedding Infrastructure. Nearest-neighbor search at scale: HNSW, IVF, product quantization, hybrid search with RRF. Pinecone, Weaviate, Qdrant, Milvus, pgvector. Embedding drift, re-indexing, operational realities, and billion-vector deployments.
Chapter 10: Recommenders and Personalized Ranking. User-based CF, matrix factorization, BPR, Wide & Deep, DCN, DLRM. Two-tower retrieval with in-batch negatives, multi-stage ranking, MMR diversity, IPS/doubly-robust evaluation, and a TikTok FYP case study.
Chapter 11: MLOps and LLMOps for Production AI. The full model lifecycle, feature stores, train-serve skew, distributed training, PagedAttention / continuous batching / speculative decoding, a KL drift detector, sequential A/B testing, cost economics, and governance.
Chapter 12: Misinformation and Stance Detection. The Wardle–Derakhshan taxonomy, linguistic-feature classifiers, TF-IDF + logistic regression, FNC-1 stance detection, network propagation signatures, domain reputation, transformer-based detection, and RAG for verification.
Chapter 13: Synthetic Content and Deepfake Detection. A generation taxonomy (diffusion, GANs, voice clones, video), DetectGPT and watermarking, FFT spectral fingerprints, face/audio/video forensics, C2PA provenance, the detection-ceiling result, and multi-modality late fusion.
Chapter 14: Financial Networks and Systemic Risk. The Eisenberg–Noe clearing vector, stress testing under contagion, DebtRank, the Acemoglu–Ozdaglar–Tahbaz-Salehi robust-yet-fragile result, fire sales and common-asset contagion, CoVaR, and a full bank-asset stress test.
How to read this book
Most Python code blocks in this book run live in your browser; the exceptions are noted below. The book has a natural arc — foundation models and representations (Ch 1–3), modern network modeling (Ch 4–5), knowledge and reasoning (Ch 6–8), production systems (Ch 9–11), and trust, safety, and applications (Ch 12–14) — but the chapters are largely independent.
- Companion volume Foundations of Network and Text Data is recommended for readers new to graphs, topic models, or sentiment analysis.
- This book is opinionated about the AI industry as of 2026 — naming names, citing real systems, and listing real cost numbers. Things will change; the chapter-level abstractions should stay durable for 5–10 years.
- The book makes heavy use of a non-executing `python` block pattern for production code that won’t run in your browser (PyTorch, Faiss, vLLM, etc.). Copy these blocks to Colab or your own environment to run them.
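As an illustration of that convention, a non-executing block looks like ordinary Python that simply assumes a heavier environment than the browser provides — for example (the library and model name here are stand-ins, not an excerpt from any chapter):

```python
# Non-executing block: requires sentence-transformers (and PyTorch) to be
# installed, so run it in Colab or locally rather than in the browser.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice
vectors = model.encode(["a social media post", "another post"])
print(vectors.shape)   # (n_posts, embedding_dim)
```

Everything else — blocks without that marking — executes in place as you read.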