Contents
Tap any chapter to start reading.
Chapter 1: Large Language Models for Social Data. Tokenization, embeddings, attention — how LLMs read text. Using pretrained models for classification, semantic search, and zero-shot tagging. Prompting patterns, RAG, and evaluating LLM outputs at scale.
Chapter 2: Foundation Models — Adaptation, Alignment, and Deployment. LoRA, QLoRA, full vs. PEFT fine-tuning, RLHF and the closed-form DPO derivation, Constitutional AI / RLAIF, distillation, INT8/INT4 quantization, vLLM / llama.cpp / Ollama / MLX deployment.
Chapter 3: Multimodal Analysis — Text, Image, and Video. Images as tensors, convolution from scratch, CNN intuition, CLIP joint embeddings with InfoNCE, vision-language models (BLIP, LLaVA, GPT-4o), audio spectrograms and MFCCs, multimodal content moderation.
Chapter 4: Temporal and Dynamic Networks. Time-stamped edges, temporal paths and reachability, temporal centralities, motifs (Paranjape–Benson–Leskovec), dynamic link prediction (Adamic–Adar), burstiness vs. Poisson contagion, community life cycles.
Chapter 5: Causal Inference and Peer Effects on Networks. Manski’s reflection problem, Bramoullé–Djebbari–Fortin IV identification, Aronow–Samii exposure mapping with Horvitz–Thompson weights, two-stage randomized saturation designs, homophily vs. influence.
Chapter 6: Knowledge Graphs and Ontologies. RDF/OWL basics, schema vs. ontology, KG embeddings (TransE, DistMult, ComplEx, RotatE) implemented from scratch, link prediction, and applications (Wikidata, biomedical KGs, financial entity graphs).
Chapter 7: Knowledge-Grounded Text Analytics. NER, entity linking, relation extraction, ontology-aware information extraction with type checking and rule-based reasoning, KG-augmented RAG, neuro-symbolic NLP, and a compliance-grade news-triage case study.
Chapter 8: LLM Agents for Content Operations. From chat to agent: planning, tool use, memory. ReAct, Chain-of-Thought, Tree-of-Thoughts, ReWOO. Multi-agent systems, agent benchmarks (GAIA, AgentBench), prompt-injection defenses, and a content-moderation agent.
Chapter 9: Vector Databases and Embedding Infrastructure. Nearest-neighbor search at scale: HNSW, IVF, product quantization, hybrid search with RRF. Pinecone, Weaviate, Qdrant, Milvus, pgvector. Embedding drift, re-indexing, operational realities, and billion-vector deployments.
Chapter 10: Recommenders and Personalized Ranking. User-based CF, matrix factorization, BPR, Wide & Deep, DCN, DLRM. Two-tower retrieval with in-batch negatives, multi-stage ranking, MMR diversity, IPS/doubly-robust evaluation, and a TikTok FYP case study.
Chapter 11: MLOps and LLMOps for Production AI. The full model lifecycle, feature stores, train-serve skew, distributed training, PagedAttention / continuous batching / speculative decoding, a KL drift detector, sequential A/B testing, cost economics, and governance.
Chapter 12: Misinformation and Stance Detection. The Wardle–Derakhshan taxonomy, linguistic-feature classifiers, TF-IDF + logistic regression, FNC-1 stance detection, network propagation signatures, domain reputation, transformer-based detection, and RAG for verification.
Chapter 13: Synthetic Content and Deepfake Detection. A generation taxonomy (diffusion, GANs, voice clones, video), DetectGPT and watermarking, FFT spectral fingerprints, face/audio/video forensics, C2PA provenance, the detection-ceiling result, and multi-modality late fusion.
Chapter 14: Financial Networks and Systemic Risk. The Eisenberg–Noe clearing vector, stress testing under contagion, DebtRank, the Acemoglu–Ozdaglar–Tahbaz-Salehi robust-yet-fragile result, fire sales and common-asset contagion, CoVaR, and a full bank-asset stress test.
How to read this book
Most Python code blocks in this book run live in your browser; the exceptions are noted below. The book has a natural arc — foundation models and representations (Ch 1–3), modern network modeling (Ch 4–5), knowledge and reasoning (Ch 6–8), production systems (Ch 9–11), and trust, safety, and applications (Ch 12–14) — but the chapters are largely independent.
- Companion volume Foundations of Network and Text Data is recommended for readers new to graphs, topic models, or sentiment analysis.
- This book is opinionated about the AI industry as of 2026 — naming names, citing real systems, and listing real cost numbers. Things will change; the chapter-level abstractions should stay durable for 5–10 years.
- The book makes heavy use of a non-executing `python` block pattern for production code that won’t run in your browser (PyTorch, Faiss, vLLM, etc.). Copy these blocks to Colab or your own environment to run them.
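As an illustration of that convention, a non-executing block looks like ordinary Python that simply assumes a heavier environment than the browser provides — for example (the library and model name here are stand-ins, not an excerpt from any chapter):

```python
# Non-executing block: requires sentence-transformers (and PyTorch) to be
# installed, so run it in Colab or locally rather than in the browser.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice
vectors = model.encode(["a social media post", "another post"])
print(vectors.shape)   # (n_posts, embedding_dim)
```

Everything else — blocks without that marking — executes in place as you read.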