Tracking Economic Sentiment: Advanced Time Series Models for Survey Responses

15 May 2025

GAR(1) models reveal trends in consumer inflation beliefs, showing how metric-space time series illuminate economic perceptions over time.

Numerical Experiments Confirm GAR(1) Model Power in Non-Euclidean Time Series

15 May 2025

Simulations show that GAR(1) estimators are consistent and the associated tests are powerful for time series taking values in the real line, Wasserstein space, and the space of SPD matrices.

Adventures in Metric Spaces: Defining Means Where None Exist

15 May 2025

Hadamard spaces provide the structure needed to model time series of random objects: in these spaces, key parameters such as the Fréchet mean are well-defined and identifiable.

Fréchet Means, Concentration, and Serial Dependence in Non-Euclidean Time Series

15 May 2025

Presents a new autoregressive model for time series in Hadamard spaces, enabling estimation, testing, and inference for random objects beyond Euclidean data.

How to Improve Crowdsourced Labels for Dialogue Systems

9 Apr 2025

Explore the supplementary materials for the main paper, "Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems."

How to Improve the Accuracy of Online Ratings for AI Chatbots and Virtual Assistants

9 Apr 2025

This study shows that minimal context and heuristic methods can improve the consistency of crowdsourced relevance and usefulness labels in task-oriented dialogue system (TDS) evaluations.

When Rating AI Chatbots, More Context Isn't Always Better

8 Apr 2025

Expanding context improves label consistency in TDS evaluations, but too much context can confuse annotators.

Can AI-Generated Context Improve the Quality of Crowdsourced Feedback?

8 Apr 2025

Heuristic-generated context boosts crowdsourced label quality and consistency, outperforming LLM-based methods for both relevance and usefulness evaluations.

The Surprising Effects of Minimal Dialogue Context on AI Judgment

8 Apr 2025

Increasing dialogue context boosts agreement on relevance ratings but can reduce consistency in usefulness judgments because of complex user feedback.