Explored adversarial search and Nash equilibrium to understand how multiple agents plan, compete, and cooperate strategically under partial information. Applied regret minimization to develop adaptive strategies that improve multi-agent decisions over time.
Studied classic game-playing settings such as chess, Go, poker, and real-time strategy games as two-player and multi-player zero-sum Markov games, modeling them as search problems in which agents alternate moves and maximize long-term utility against optimal opponents using minimax and α-β pruning. This included designing evaluation functions, depth-limited search, and heuristics that make adversarial reasoning tractable in very large game trees.
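A minimal sketch of the core recursion, with a toy Nim-style game (take 1-3 stones, last stone wins) standing in for the actual board games and evaluation functions used:

```python
# Depth-limited minimax with alpha-beta pruning on a toy Nim-style game.
# The game rules and the neutral depth-cutoff evaluation are illustrative
# assumptions, not the original coursework's games.

def alphabeta(stones, depth, alpha, beta, maximizing):
    """Value of the position from the maximizing player's perspective."""
    if stones == 0:
        # The player to move has no stones left: the previous player took
        # the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # depth cutoff: neutral heuristic evaluation
    if maximizing:
        value = float("-inf")
        for take in (1, 2, 3):
            if take <= stones:
                value = max(value, alphabeta(stones - take, depth - 1,
                                             alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:      # beta cutoff: MIN avoids this branch
                    break
        return value
    else:
        value = float("inf")
        for take in (1, 2, 3):
            if take <= stones:
                value = min(value, alphabeta(stones - take, depth - 1,
                                             alpha, beta, True))
                beta = min(beta, value)
                if beta <= alpha:      # alpha cutoff: MAX avoids this branch
                    break
        return value

def best_move(stones, depth=10):
    """Choose the number of stones to take with the best alpha-beta value."""
    moves = [t for t in (1, 2, 3) if t <= stones]
    return max(moves, key=lambda t: alphabeta(stones - t, depth - 1,
                                              float("-inf"), float("inf"),
                                              False))

print(best_move(7))  # 3: leaving 4 stones is a losing position for the opponent
```

The α and β bounds let whole subtrees be skipped once one player already has a better alternative elsewhere, which is what makes deeper search affordable under strict time limits.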
Formalized strategic interaction using normal-form and extensive-form games, analyzing Nash and subgame-perfect equilibria in both complete- and incomplete-information games. Worked through mixed-strategy equilibria in which players randomize to remain unpredictable, and reasoned about chance (nature) moves to handle stochastic transitions and random events within the game. These tools were used to characterize when agents should commit to pure exploitation versus randomized policies that stay robust against an intelligent adversary observing their behavior.
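For intuition, a small sketch of the indifference-condition calculation for a 2x2 zero-sum game; Matching Pennies is used here only as an illustrative payoff matrix:

```python
# Mixed-strategy equilibrium of a 2x2 zero-sum game via the indifference
# condition: each player randomizes so the opponent is indifferent between
# their two actions. Assumes no pure saddle point (denominator nonzero).

def mixed_equilibrium_2x2(A):
    """A[i][j] = row player's payoff; returns (p, q), the probabilities that
    row plays its first action and column plays its first action."""
    (a, b), (c, d) = A
    denom = a - b - c + d
    p = (d - c) / denom   # makes the column player indifferent between columns
    q = (d - b) / denom   # makes the row player indifferent between rows
    return p, q

# Matching Pennies: the row player wins (+1) when the coins match.
matching_pennies = [[1, -1],
                    [-1, 1]]
p, q = mixed_equilibrium_2x2(matching_pennies)
print(p, q)  # 0.5 0.5
```

Any deviation from the 50/50 mix can be exploited by an opponent who observes the bias, which is the sense in which the randomized policy is robust.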
To design agents that learn good strategies online, implemented no-regret learning algorithms in multi-armed bandit and expert settings, comparing deterministic greedy policies against randomized algorithms such as Randomized Greedy and Multiplicative Weights Update. Regret served as the benchmark guaranteeing that, over time, the learned policy performs nearly as well as the best fixed action in hindsight, even under adversarial reward sequences.
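A minimal sketch of the Multiplicative Weights (Hedge) update in the experts setting; the loss data and learning-rate choice below are illustrative assumptions, not the original experiments:

```python
# Multiplicative Weights (Hedge): keep a weight per expert, play the weighted
# distribution, and exponentially down-weight experts in proportion to loss.
import math
import random

def hedge(loss_sequences, eta):
    """loss_sequences: T rounds, each a list of losses in [0, 1], one per
    expert. Returns the algorithm's total expected loss."""
    n = len(loss_sequences[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_sequences:
        z = sum(weights)
        probs = [w / z for w in weights]            # randomized play over experts
        total_loss += sum(p * l for p, l in zip(probs, losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss

# Toy data: expert 0 is slightly better on average than the others.
random.seed(0)
T, n = 1000, 5
losses = [[random.random() * (0.8 if i == 0 else 1.0) for i in range(n)]
          for _ in range(T)]
eta = math.sqrt(2 * math.log(n) / T)                # standard learning-rate choice
alg = hedge(losses, eta)
best = min(sum(round_[i] for round_ in losses) for i in range(n))
print(f"algorithm loss {alg:.1f}, best expert {best:.1f}, regret {alg - best:.1f}")
```

With this learning rate the gap to the best fixed expert grows only on the order of sqrt(T log n), i.e. the average regret per round vanishes as T grows.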
Connected these decision-theoretic ideas back to knowledge-based agents by representing domain knowledge in logic, building simple knowledge bases, and using inference (e.g., resolution, forward and backward chaining) so agents can reason about hidden state, constraints, and goals before engaging in strategic interaction. This combination of logical reasoning, game-theoretic analysis, and regret-based learning provided a unified view of how AI systems can plan, predict opponents, and adapt their policies in complex, multi-agent environments.
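As a small illustration of the inference side, a forward-chaining sketch over propositional definite (Horn) clauses; the toy knowledge base is made up for the example:

```python
# Forward chaining: repeatedly fire any rule whose premises are all known,
# adding its conclusion, until nothing new can be derived.

def forward_chaining(rules, facts, query):
    """rules: list of (premises, conclusion) pairs, premises given as a set
    of symbols. Returns True if `query` is derivable from `facts`."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)   # fire the rule
                changed = True
    return query in known

# Illustrative definite-clause knowledge base.
rules = [({"rain"}, "wet_ground"),
         ({"wet_ground", "cold"}, "icy_road"),
         ({"icy_road"}, "drive_slowly")]
print(forward_chaining(rules, {"rain", "cold"}, "drive_slowly"))  # True
```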
- Implemented minimax and α-β pruning for deterministic two-player board games, tuning move ordering and evaluation functions to search deeper under strict time limits.
- Analyzed Bayesian and mixed-strategy Nash equilibria in partial-information games, deriving best-response dynamics and reasoning about how players should randomize their actions when they cannot fully observe each other’s choices.
- Built regret-minimizing policies for expert and bandit settings, comparing greedy, Randomized Greedy, and Multiplicative Weights algorithms, and proving sublinear regret bounds against adversarial reward sequences and logarithmic bounds in stochastic settings.
- Designed logical agents for environments like Wumpus World, encoding safety and reachability constraints in propositional logic and using inference to decide which actions keep the agent safe while still pursuing rewards (see the sketch below).
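A rough sketch of that safety check via model checking on a 2x2 grid; the grid size, predicate names, and percept encoding are assumptions for illustration, not the original assignment code:

```python
# Decide whether a square is provably pit-free: enumerate every pit
# configuration consistent with the observed breeze percepts and check
# that none of them places a pit in the target square.
from itertools import product

SQUARES = [(1, 1), (1, 2), (2, 1), (2, 2)]

def neighbors(sq):
    x, y = sq
    return [n for n in SQUARES if abs(n[0] - x) + abs(n[1] - y) == 1]

def consistent(pits, breeze_obs):
    """A model is consistent if each visited square's breeze percept matches
    whether some neighbor contains a pit."""
    return all((any(n in pits for n in neighbors(sq)) == b)
               for sq, b in breeze_obs.items())

def provably_safe(square, breeze_obs, visited):
    """True if no consistent pit configuration puts a pit in `square`."""
    unknown = [sq for sq in SQUARES if sq not in visited]
    for assignment in product([False, True], repeat=len(unknown)):
        pits = {sq for sq, has_pit in zip(unknown, assignment) if has_pit}
        if consistent(pits, breeze_obs) and square in pits:
            return False
    return True

# Visited (1,1) and felt no breeze: its neighbors cannot hold a pit.
print(provably_safe((1, 2), {(1, 1): False}, visited={(1, 1)}))  # True
print(provably_safe((2, 2), {(1, 1): False}, visited={(1, 1)}))  # False
```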