Regression analysis is a way of predicting a value on a continuous numerical scale from a given set of inputs. A classic example is modelling house prices based on the number of bedrooms, the number of bathrooms, proximity to a good school, and so on.

Linear Regression

A well-understood and widely used method for regression is Linear Regression. In this model we fit a line of best fit through our data points, typically by minimising the sum of squared vertical distances between the points and the line.
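As a minimal sketch of fitting a line of best fit, the snippet below computes the ordinary least squares slope and intercept for a single input variable. The bedroom counts and prices are made-up toy data, and `fit_line` is a hypothetical helper name used only for illustration.

```python
from statistics import mean

# Hypothetical toy data: number of bedrooms and sale price (in thousands).
bedrooms = [1, 2, 3, 4, 5]
prices = [155, 195, 260, 290, 350]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the shared 1/n factor cancels.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least squares line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(bedrooms, prices)

# Estimated price for a 6-bedroom house, outside the observed range.
predicted = intercept + slope * 6
```

With more than one input (bedrooms, bathrooms, distance to a school) the same idea applies, but the fit is usually done with a linear algebra or statistics library rather than by hand.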

Illustration of a simple regression from my PhD thesis

Bayesian Regression

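Bayesian regression treats the model parameters as uncertain quantities with their own probability distributions. Instead of a single prediction, it gives the probabilities of a range of possible outputs for y, which makes the uncertainty in the estimate explicit.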

See: Bayesian Modelling

Stan's documentation on Bayesian modelling for linear regression: https://mc-stan.org/docs/stan-users-guide/regression.html
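To make the idea of a distribution over outputs concrete, here is a deliberately simplified sketch. It assumes a single input, a line through the origin (y = w * x + noise), a zero-mean Gaussian prior on the slope w, and a known noise variance, so the posterior has a closed form and no sampler is needed. The data, `NOISE_VAR`, `PRIOR_VAR`, and the function names are all illustrative assumptions. This is not how Stan fits models.

```python
import math

# Hypothetical toy data, centred so a line through the origin is reasonable.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.2, 2.1, 3.9]

NOISE_VAR = 0.25  # assumed known variance of the observation noise
PRIOR_VAR = 10.0  # assumed variance of the zero-mean Gaussian prior on w


def slope_posterior(xs, ys, noise_var, prior_var):
    """Posterior mean and variance of w in y = w * x + noise."""
    # Precisions (inverse variances) from the prior and the data add up.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * sum(x * y for x, y in zip(xs, ys)) / noise_var
    return post_mean, post_var


def predictive(x_new, post_mean, post_var, noise_var):
    """Mean and variance of the predictive distribution for y at x_new."""
    # Uncertainty in w is scaled by x_new squared, then noise is added.
    return post_mean * x_new, x_new ** 2 * post_var + noise_var


w_mean, w_var = slope_posterior(xs, ys, NOISE_VAR, PRIOR_VAR)
y_mean, y_var = predictive(3.0, w_mean, w_var, NOISE_VAR)

# Approximate 95% interval for y at x = 3 under the Gaussian predictive.
half_width = 1.96 * math.sqrt(y_var)
low, high = y_mean - half_width, y_mean + half_width
```

The prediction is a mean together with a variance, and that variance grows as `x_new` moves away from zero, where the data says less about the output.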

Annotating Data for Regression

Annotating data for regression is typically more challenging than annotation for Text Classification because humans are pretty bad at placing data along a linear scale.

A common approach is to use a Likert scale (usually 4-5 options, e.g. Strongly Disagree, Disagree, Agree, Strongly Agree), but recent work [1] has shown that this approach is typically less reliable than Best-Worst Scaling, in which annotators are shown a small set of examples and asked to pick the best and the worst, so that examples are judged relative to each other rather than placed on an absolute scale.
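Best-worst annotations are commonly turned into real-valued scores by simple counting: the fraction of times an item was picked as best minus the fraction of times it was picked as worst, giving a score between -1 and 1. The sketch below illustrates that counting step. The annotation records and the `best_worst_scores` helper are hypothetical, and the data layout is an assumption.

```python
from collections import Counter

# Hypothetical annotations: the items shown together, plus the annotator's
# picks for best (most positive) and worst (least positive).
annotations = [
    {"items": ("great", "good", "okay", "awful"), "best": "great", "worst": "awful"},
    {"items": ("good", "okay", "bad", "awful"), "best": "good", "worst": "awful"},
    {"items": ("great", "okay", "bad", "good"), "best": "great", "worst": "bad"},
]


def best_worst_scores(annotations):
    """Score each item as (#best - #worst) / #appearances, in [-1, 1]."""
    best, worst, seen = Counter(), Counter(), Counter()
    for ann in annotations:
        seen.update(ann["items"])  # every shown item counts as an appearance
        best[ann["best"]] += 1
        worst[ann["worst"]] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


scores = best_worst_scores(annotations)

# Items sorted from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

The resulting scores can then be used as regression targets in place of Likert ratings.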

Footnotes

  1. S. Kiritchenko and S. Mohammad, ‘Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation’, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 465–470. doi: 10.18653/v1/P17-2074.