Study · Wiki · Canonical · Moderate
Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach
Junzhe Zhang · International Conference on Machine Learning · 2020 · 16 citations
Book · Wiki · Canonical · High evidence score
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto · MIT Press · 2018
The standard textbook introduction to reinforcement learning, covering MDPs, value functions, temporal-difference learning, policy gradients, and core algorithms.
Study · Wiki · Canonical · High confidence
A Survey of Constraint Formulations in Safe Reinforcement Learning
Akifumi Wachi, Xun Shen, Yanan Sui · IJCAI · 2024
A survey of safe RL constraint formulations, representative algorithms, and the relationships among common constrained decision-making criteria.
Book · Wiki · Canonical · High evidence score
Bandit Algorithms
Tor Lattimore, Csaba Szepesvari · Cambridge University Press · 2020
A comprehensive reference for stochastic bandits, adversarial bandits, contextual bandits, lower bounds, UCB, Thompson sampling, and structured variants.
Study · Wiki · Canonical · High confidence
A Comprehensive Survey on Safe Reinforcement Learning
Javier Garcia, Fernando Fernandez · Journal of Machine Learning Research · 2015
A classic survey of safe reinforcement learning, including risk-sensitive criteria, constrained exploration, safety during learning, and external guidance.
Study · Wiki · Canonical · High confidence
Off-Policy Policy Evaluation for Sequential Decisions under Unobserved Confounding
Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky +1 more · arXiv · 2020
Studies off-policy evaluation for sequential decisions when hidden confounders may bias logged trajectories.
Study · Wiki · Canonical · High confidence
Markov Decision Processes with Unobserved Confounders: A Causal Approach
Junzhe Zhang, Elias Bareinboim · CausalAI Lab Technical Report R-23 · 2016
Extends causal reasoning to MDPs where hidden variables may affect both actions and outcomes, motivating CRL methods that reason about confounding in sequential settings.
Study · Wiki · Canonical · High confidence
Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes
Junzhe Zhang, Elias Bareinboim · NeurIPS · 2019
Connects causal reinforcement learning with dynamic treatment regimes, focusing on near-optimal sequential treatment policies.
Study · Wiki · Canonical · High confidence
Bandits with Unobserved Confounders: A Causal Approach
Elias Bareinboim, Andrew Forney, Judea Pearl · NeurIPS · 2015
Introduces a causal treatment of bandit problems where observational feedback may be confounded, showing when causal structure can improve intervention selection.
Study · Wiki · Canonical · High confidence
Structural Causal Bandits: Where to Intervene?
Sanghack Lee, Elias Bareinboim · NeurIPS · 2018
Introduces structural causal bandits, where the learner chooses interventions in a causal graph rather than arms with unrelated reward distributions.
Study · Wiki · Canonical · High confidence
An Introduction to Causal Reinforcement Learning
Elias Bareinboim, Junzhe Zhang, Sanghack Lee · CausalAI Lab Technical Report R-65 · 2024
A tutorial survey that organizes causal reinforcement learning around offline-to-online learning, intervention choice, counterfactual decision-making, transportability, causal discovery, imitation, curriculum learning, reward shaping, and causal game theory.
Study · Preprint · Wiki · Canonical · Moderate
Always Valid Inference: Bringing Sequential Analysis to A/B Testing
Ramesh Johari, Leo Pekelis, David J. Walsh · 2015 · 101 citations
A/B tests are typically analyzed via frequentist p-values and confidence intervals; but these inferences are wholly unreliable if users endogenously choose sample sizes by *continuously monitoring* their tests. We define *always valid* p-values and confidence intervals that let users try to take advantage of data as fast as it becomes available, providing valid statistical inference whenever they make their decision. Always valid inference can be interpreted as a natural interface for a sequential hypothesis test, which empowers users to implement a modified test tailored to them. In particular, we show in an appropriate sense that the measures we develop trade off sample size and power efficiently, despite a lack of prior knowledge of the user's relative preference between these two goals. We also use always valid p-values to obtain multiple hypothesis testing control in the sequential context. Our methodology has been implemented in a large scale commercial A/B testing platform to analyze hundreds of thousands of experiments to date.
Study · Preprint · Wiki · Canonical · Moderate
A Tutorial on Thompson Sampling
Daniel Russo, Benjamin Van Roy, Abbas Kazerouni +2 more · 2017 · 1,175 citations
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
Study · Preprint · Wiki · Canonical · Moderate
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine, Aviral Kumar, George Tucker +1 more · 2020
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection. Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines. Effective offline reinforcement learning methods would be able to extract policies with the maximum possible utility out of the available data, thereby allowing automation of a wide range of decision-making domains, from healthcare and education to robotics. However, the limitations of current algorithms make this difficult. We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods, and describe some potential solutions that have been explored in recent work to mitigate these challenges, along with recent applications, and a discussion of perspectives on open problems in the field.
Study · Preprint · Wiki · Canonical · Moderate
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal +2 more · 2017
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
Study · Wiki · High confidence
Sequential Causal Imitation Learning with Unobserved Confounders
Daniel Kumor, Junzhe Zhang, Elias Bareinboim · NeurIPS · 2021
Extends causal imitation learning to sequential settings where confounding can persist across time.
Study · Wiki · High confidence
Characterizing Optimal Mixed Policies: Where to Intervene, What to Observe
Sanghack Lee, Elias Bareinboim · NeurIPS · 2020
Characterizes policies that mix interventions and observations in causal decision problems.
Study · Wiki · High confidence
Structural Causal Bandits with Non-Manipulable Variables
Sanghack Lee, Elias Bareinboim · AAAI · 2019
Extends structural causal bandits to settings where some variables can be observed but not directly manipulated.
Study · Wiki · High confidence
Budgeted Experiment Design for Causal Structure Learning
AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash +1 more · ICML · 2018
Addresses how to allocate a limited intervention budget to learn causal structure efficiently.
Study · Wiki · High confidence
Counterfactual Data-Fusion for Online Reinforcement Learners
Andrew Forney, Judea Pearl, Elias Bareinboim · ICML · 2017
Studies how online learners can combine heterogeneous observational and experimental data sources using counterfactual data-fusion principles.
Meta-analysis · High evidence score
The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective
John K. Kruschke, Torrin M. Liddell · Psychonomic Bulletin & Review · 2017 · 1,326 citations
Meta-analysis · High evidence score
Testing by Betting: A Strategy for Statistical and Scientific Communication
Glenn Shafer · Journal of the Royal Statistical Society Series A (Statistics in Society) · 2021 · 112 citations
The most widely used concept of statistical inference—the p-value—is too complicated for effective communication to a wide audience. This paper introduces a simpler way of reporting statistical evidence: report the outcome of a bet against the null hypothesis. This leads to a new role for likelihood, to alternatives to power and confidence, and to a framework for meta-analysis that accommodates both planned and opportunistic testing of statistical hypotheses and probabilistic forecasts. This framework builds on the foundation for mathematical probability developed in previous work by Vladimir Vovk and myself.
Systematic Review · High evidence score
A survey on causal inference for recommendation
Huishi Luo, Fuzhen Zhuang, Ruobing Xie +4 more · The Innovation · 2024 · 37 citations
Causal inference has recently garnered significant interest among recommender system (RS) researchers due to its ability to dissect cause-and-effect relationships and its broad applicability across multiple fields. It offers a framework to model the causality in RSs such as confounding effects and deal with counterfactual problems such as offline policy evaluation and data augmentation. Although there are already some valuable surveys on causal recommendations, they typically classify approaches based on the practical issues faced in RS, a classification that may disperse and fragment the unified causal theories. Considering RS researchers' unfamiliarity with causality, it is necessary yet challenging to comprehensively review relevant studies from a coherent causal theoretical perspective, thereby facilitating a deeper integration of causal inference in RS. This survey provides a systematic review of up-to-date papers in this area from a causal theory standpoint and traces the evolutionary development of RS methods within the same causal strategy. First, we introduce the fundamental concepts of causal inference as the basis of the following review. Subsequently, we propose a novel theory-driven taxonomy, categorizing existing methods based on the causal theory employed, namely those based on the potential outcome framework, the structural causal model, and general counterfactuals. The review then delves into the technical details of how existing methods apply causal inference to address particular recommender issues. Finally, we highlight some promising directions for future research in this field. Representative papers and open-source resources will be progressively available at https://github.com/Chrissie-Law/Causal-Inference-for-Recommendation.
Study · Moderate
Quantiles via moments
José A. F. Machado, João Santos Silva · Journal of Econometrics · 2019 · 2,156 citations
Study · Moderate
Reinforcement Learning: A Survey
Leslie Pack Kaelbling, Michael L. Littman, Andrew Moore · Journal of Artificial Intelligence Research · 1996 · 8,787 citations
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Study · Moderate
Interpretable machine learning: Fundamental principles and 10 grand challenges
Cynthia Rudin, Chaofan Chen, Zhi Chen +3 more · Statistics Surveys · 2022 · 828 citations
Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the “Rashomon set” of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.
Study · Moderate
Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications
Eric‐Jan Wagenmakers, Maarten Marsman, Tahira Jamil +10 more · Psychonomic Bulletin & Review · 2017 · 1,697 citations
Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).
Study · Top journal · Moderate
Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors)
Jennifer A. Hoeting, David Madigan, Adrian E. Raftery +1 more · Statistical Science · 1999 · 4,164 citations
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of currently available BMA software.
Study · Moderate
An Introduction to Deep Reinforcement Learning
Vincent François-Lavet, Peter Henderson, Riashat Islam +2 more · Foundations and Trends® in Machine Learning · 2018 · 1,241 citations
Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
Study · Top journal · Moderate
The magical number 4 in short-term memory: A reconsideration of mental storage capacity
Nelson Cowan · Behavioral and Brain Sciences · 2001 · 6,745 citations
Miller (1956) summarized evidence that people can remember about seven chunks in short-term memory (STM) tasks. However, that number was meant more as a rough estimate and a rhetorical device than as a real capacity limit. Others have since suggested that there is a more precise capacity limit, but that it is only three to five chunks. The present target article brings together a wide variety of data on capacity limits suggesting that the smaller capacity limit is real. Capacity limits will be useful in analyses of information processing only if the boundary conditions for observing them can be carefully described. Four basic conditions in which chunks can be identified and capacity limits can accordingly be observed are: (1) when information overload limits chunks to individual stimulus items, (2) when other steps are taken specifically to block the recording of stimulus items into larger chunks, (3) in performance discontinuities caused by the capacity limit, and (4) in various indirect effects of the capacity limit. Under these conditions, rehearsal and long-term memory cannot be used to combine stimulus items into chunks of an unknown size; nor can storage mechanisms that are not capacity-limited, such as sensory memory, allow the capacity-limited storage mechanism to be refilled during recall. A single, central capacity limit averaging about four chunks is implicated along with other, noncapacity-limited sources. The pure STM capacity limit expressed in chunks is distinguished from compound STM limits obtained when the number of separately held chunks is unclear. Reasons why pure capacity estimates fall within a narrow range are discussed and a capacity limit for the focus of attention is proposed.
Study · Moderate
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Tom McCoy, Ellie Pavlick, Tal Linzen · 2019 · 914 citations
A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
Study · Moderate
Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
Gabriel Dulac-Arnold, Nir Levine, Daniel J. Mankowitz +4 more · Machine Learning · 2021 · 557 citations
RCT · High evidence score
Selective Trials: A Principal-Agent Approach to Randomized Controlled Experiments
Sylvain Chassang, Gerard Padró i Miquel, Erik Snowberg · American Economic Review · 2012 · 153 citations
We study the design of randomized controlled experiments when outcomes are significantly affected by experimental subjects' unobserved effort expenditure. While standard randomized controlled trials (RCTs) are internally consistent, the unobservability of effort compromises external validity. We approach trial design as a principal-agent problem and show that natural extensions of RCTs—which we call selective trials—can help improve external validity. In particular, selective trials can disentangle the effects of treatment, effort, and the interaction of treatment and effort. Moreover, they can help identify when treatment effects are affected by erroneous beliefs and inappropriate effort expenditure. (JEL C90, D82)
Study · Moderate
Benchmarking and survey of explanation methods for black box models
Francesco Bodria, Fosca Giannotti, Riccardo Guidotti +3 more · Data Mining and Knowledge Discovery · 2023 · 234 citations
The rise of sophisticated black-box machine learning models in Artificial Intelligence systems has prompted the need for explanation methods that reveal how these models work in an understandable way to users and decision makers. Unsurprisingly, the state of the art currently exhibits a plethora of explainers providing many different types of explanations. With the aim of providing a compass for researchers and practitioners, this paper proposes a categorization of explanation methods from the perspective of the type of explanation they return, also considering the different input data formats. The paper accounts for the most representative explainers to date, also discussing similarities and discrepancies of returned explanations through their visual appearance. A companion website to the paper is provided as a continuous update to new explainers as they appear. Moreover, a subset of the most robust and widely adopted explainers are benchmarked with respect to a repertoire of quantitative metrics.
Study · Moderate
Finite-time Analysis of the Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa‐Bianchi, Paul Fischer · Machine Learning · 2002 · 5,786 citations
Meta-analysis · Top journal · High evidence score
The Anytime-Valid Logrank Test: Error Control Under Continuous Monitoring with Unlimited Horizon
Judith ter Schure, Muriel F. Pérez-Ortiz, Alexander Ly +1 more · The New England Journal of Statistics in Data Science · 2024 · 7 citations
We introduce the anytime-valid (AV) logrank test, a version of the logrank test that provides type-I error guarantees under optional stopping and optional continuation. The test is sequential without the need to specify a maximum sample size or stopping rule, and allows for cumulative meta-analysis with type-I error control. The method can be extended to define anytime-valid confidence intervals. The logrank test is an instance of the martingale tests based on E-variables that have been recently developed. We demonstrate type-I error guarantees for the test in a semiparametric setting of proportional hazards, show explicitly how to extend it to ties and confidence sequences and indicate further extensions to the full Cox regression model. Using a Gaussian approximation on the logrank statistic, we show that the AV logrank test (which itself is always exact) has a similar rejection region to O’Brien-Fleming α-spending but with the potential to achieve $100\%$ power by optional continuation. Although our approach to study design requires a larger sample size, the expected sample size is competitive by optional stopping.
Study · Top journal · Moderate
Interval Estimation for a Binomial Proportion
Lawrence D. Brown, Tommaso Cai, Anirban Dasgupta · Statistical Science · 2001 · 3,475 citations
We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
RCT · High evidence score
Group Sequential Tests for Delayed Responses (with discussion)
Lisa V. Hampson, Christopher Jennison · Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2012 · 109 citations
Group sequential methods are used routinely to monitor clinical trials and to provide early stopping when there is evidence of a treatment effect, a lack of an effect or concerns about patient safety. In many studies, the response of clinical interest is measured some time after the start of treatment and there are subjects at each interim analysis who have been treated but are yet to respond. We formulate a new form of group sequential test which gives a proper treatment of these ‘pipeline’ subjects; these tests can be applied even when the continued accrual of data after the decision to stop the trial is unexpected. We illustrate our methods through a series of examples. We define error spending versions of these new designs which handle unpredictable group sizes and provide an information monitoring framework that can accommodate nuisance parameters, such as an unknown response variance. By studying optimal versions of our new designs, we show how the benefits of lower expected sample size that are normally achieved by a group sequential test are reduced when there is a delay in response. The loss of efficiency for larger delays can be ameliorated by incorporating data on a correlated short-term end point, fitting a joint model for the two end points but still making inferences on the original, longer-term end point. We derive p-values and confidence intervals on termination of our new tests.
Study · Moderate
Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities
Carl Orge Retzlaff, Srijita Das, Christabel Wayllace +7 more · Journal of Artificial Intelligence Research · 2024 · 129 citations
Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL as fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating their feedback into the training loop. In HITL RL, human input is integrated during the agent’s learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent’s performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable a better human-agent interaction in HITL RL for all types of users, whether for lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases for human involvement for creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively.
Study · Moderate
Bayes factor design analysis: Planning for compelling evidence
Felix D. Schönbrodt, Eric‐Jan Wagenmakers · Psychonomic Bulletin & Review · 2017 · 764 citations
A sizeable literature exists on the use of frequentist power analysis in the null-hypothesis significance testing (NHST) paradigm to facilitate the design of informative experiments. In contrast, there is almost no literature that discusses the design of experiments when Bayes factors (BFs) are used as a measure of evidence. Here we explore Bayes Factor Design Analysis (BFDA) as a useful tool to design studies for maximum efficiency and informativeness. We elaborate on three possible BF designs, (a) a fixed-n design, (b) an open-ended Sequential Bayes Factor (SBF) design, where researchers can test after each participant and can stop data collection whenever there is strong evidence for either $\mathcal{H}_{1}$ or $\mathcal{H}_{0}$, and (c) a modified SBF design that defines a maximal sample size where data collection is stopped regardless of the current state of evidence. We demonstrate how the properties of each design (i.e., expected strength of evidence, expected sample size, expected probability of misleading evidence, expected probability of weak evidence) can be evaluated using Monte Carlo simulations and equip researchers with the necessary information to compute their own Bayesian design analyses.
StudyModerate
Pair-copula constructions of multiple dependence
Kjersti Aas, Claudia Czado, Arnoldo Frigessi +1 more · Insurance Mathematics and Economics · 2007 · 2,030 citations
RCTHigh evidence score
Choice of futility boundaries for group sequential designs with two endpoints
Svenja Schüler, Meinhard Kieser, Geraldine Rauch · BMC Medical Research Methodology · 2017 · 43 citations
BACKGROUND: In clinical trials, the opportunity for an early stop during an interim analysis (either for efficacy or for futility) may relevantly save time and financial resources. This is especially important, if the planning assumptions required for power calculation are based on a low level of evidence. For example, when including two primary endpoints in the confirmatory analysis, the power of the trial depends on the effects of both endpoints and on their correlation. Assessing the feasibility of such a trial is therefore difficult, as the number of parameter assumptions to be correctly specified is large. For this reason, so-called 'group sequential designs' are of particular importance in this setting. Whereas the choice of adequate boundaries to stop a trial early for efficacy has been broadly discussed in the literature, the choice of optimal futility boundaries has not been investigated so far, although this may have serious consequences with respect to performance characteristics. METHODS: In this work, we propose a general method to construct 'optimal' futility boundaries according to predefined criteria. Further, we present three different group sequential designs for two endpoints applying these futility boundaries. Our methods are illustrated by a real clinical trial example and by Monte-Carlo simulations. RESULTS: By construction, the provided method of choosing futility boundaries maximizes the probability to correctly stop in case of small or opposite effects while limiting the power loss and the probability of stopping the study 'wrongly'. Our results clearly demonstrate the benefit of using such 'optimal' futility boundaries, especially compared to futility boundaries commonly applied in practice. 
CONCLUSIONS: As the properties of futility boundaries are often not considered in practice and unfavorably chosen futility boundaries may imply bad properties of the study design, we recommend assessing the performance of these boundaries according to the criteria proposed here.
ObservationalModerate
Optimal Dynamic Treatment Regimes
Susan A. Murphy · Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2003 · 1,040 citations
Summary: A dynamic treatment regime is a list of decision rules, one per time interval, for how the level of treatment will be tailored through time to an individual’s changing status. The goal of this paper is to use experimental or observational data to estimate decision regimes that result in a maximal mean response. To explicate our objective and to state the assumptions, we use the potential outcomes model. The method proposed makes smooth parametric assumptions only on quantities that are directly relevant to the goal of estimating the optimal rules. We illustrate the methodology proposed via a small simulation.
StudyModerate
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
Robert Kirk, Amy Zhang, Edward Grefenstette +1 more · Journal of Artificial Intelligence Research · 2023 · 158 citations
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to produce RL algorithms whose policies generalise well to novel unseen situations at deployment time, avoiding overfitting to their training environments. Tackling this is vital if we are to deploy reinforcement learning algorithms in real world scenarios, where the environment will be diverse, dynamic and unpredictable. This survey is an overview of this nascent field. We rely on a unifying formalism and terminology for discussing different ZSG problems, building upon previous works. We go on to categorise existing benchmarks for ZSG, as well as current methods for tackling these problems. Finally, we provide a critical discussion of the current state of the field, including recommendations for future work. Among other conclusions, we argue that taking a purely procedural content generation approach to benchmark design is not conducive to progress in ZSG, we suggest fast online adaptation and tackling RL-specific problems as some areas for future work on methods for ZSG, and we recommend building benchmarks in underexplored problem settings such as offline RL ZSG and reward-function variation.
StudyModerate
Meaningful Explanations of Black Box AI Decision Systems
Dino Pedreschi, Fosca Giannotti, Riccardo Guidotti +3 more · Proceedings of the AAAI Conference on Artificial Intelligence · 2019 · 201 citations
Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box.
StudyModerate
A Gentle Introduction to Reinforcement Learning and its Application in Different Fields
Muddasar Naeem, Syed Tahir Hussain Rizvi, Antonio Coronato · IEEE Access · 2020 · 241 citations
Due to the recent progress in Deep Neural Networks, Reinforcement Learning (RL) has become one of the most important and useful technologies. It is a learning method where a software agent interacts with an unknown environment, selects actions, and progressively discovers the environment dynamics. RL has been effectively applied in many important areas of real life. This article intends to provide an in-depth introduction of the Markov Decision Process, RL and its algorithms. Moreover, we present a literature review of the application of RL to a variety of fields, including robotics and autonomous control, communication and networking, natural language processing, games and self-organized systems, scheduling management and configuration of resources, and computer vision.
StudyModerate
Harms from Increasingly Agentic Algorithmic Systems
Alan Chan, Rebecca Salganik, Alva Markelius +19 more · 2023 · 100 citations
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed, typically without strong regulatory barriers, threatening the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms, rather than just responding to them. Anticipation of harms is especially important given the rapid pace of developments in machine learning (ML). Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency – notably, these include systemic and/or long-range impacts, often on marginalized or unconsidered stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems.
Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.
StudyModerate
Filtering via Simulation: Auxiliary Particle Filters
M. Pitt, Neil Shephard · Journal of the American Statistical Association · 1999 · 2,261 citations
This article analyses the recently suggested particle approach to filtering time series. We suggest that the algorithm is not robust to outliers for two reasons: the design of the simulators and the use of the discrete support to represent the sequentially updating prior distribution. Here we tackle the first of these problems.
StudyModerate
Bridging Direct and Indirect Data-Driven Control Formulations via Regularizations and Relaxations
Florian Dörfler, Jeremy Coulson, Ivan Markovsky · IEEE Transactions on Automatic Control · 2022 · 195 citations
In this article, we discuss connections between sequential system identification and control for linear time-invariant systems, often termed indirect data-driven control, as well as a contemporary direct data-driven control approach seeking an optimal decision compatible with recorded data assembled in a Hankel matrix and robustified through suitable regularizations. We formulate these two problems in the language of behavioral systems theory and parametric mathematical programs, and we bridge them through a multicriteria formulation trading off system identification and control objectives. We illustrate our results with two methods from subspace identification and control: namely, subspace predictive control and low-rank approximation, which constrain trajectories to be consistent with a nonparametric predictor derived from (respectively, the column span of) a data Hankel matrix. In both cases, we conclude that direct and regularized data-driven control can be derived as convex relaxation of the indirect approach, and the regularizations account for an implicit identification step. Our analysis further reveals a novel regularizer and a plausible hypothesis explaining the remarkable empirical performance of direct methods on nonlinear systems.
StudyLeading journalModerate
Regimes of Expectations: An Active Inference Model of Social Conformity and Human Decision Making
Axel Constant, Maxwell J. D. Ramstead, Samuel P. L. Veissière +1 more · Frontiers in Psychology · 2019 · 174 citations
How do humans come to acquire shared expectations about how they ought to behave in distinct normalized social settings? This paper offers a normative framework to answer this question. We introduce the computational construct of 'deontic value' - based on active inference and Markov decision processes - to formalize conceptions of social conformity and human decision-making. Deontic value is an attribute of choices, behaviors, or action sequences that inherit directly from deontic cues in our econiche (e.g., red traffic lights); namely, cues that denote an obligatory social rule. Crucially, the prosocial aspect of deontic value rests upon a particular form of circular causality: deontic cues exist in the environment in virtue of the environment being modified by repeated actions, while action itself is contingent upon the deontic value of environmental cues. We argue that this construction of deontic cues enables the epistemic (i.e., information-seeking) and pragmatic (i.e., goal-seeking) values of any behavior to be 'cached' or 'outsourced' to the environment, where the environment effectively 'learns' about the behavior of its denizens. We describe the process whereby this particular aspect of value enables learning of habitual behavior over neurodevelopmental and transgenerational timescales.