March 20, 2022

Introducing Core Concepts

Thinking about all survey data as tensors

We deal exclusively with discrete variables. This means all variables can be represented as tensors. Tensors have a few properties, the two most relevant here being shape and rank.

Let’s consider a multi-select question with 5 options. The data might look something like [1,0,1,1,0] for a single response, or [[1,0,1,1,0], [0,0,0,1,0], [1,1,1,1,1]] for a set of 3 responses.

The shape of this tensor is $(n_{responses}, n_{options})$, which is $(3, 5)$ for 3 responses to a 5-option question. The rank is 1 more than the rank of the underlying data type, which for selection questions is a rank 1 row-vector, so the full response tensor has rank 2.
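As a minimal sketch of the shape and rank described above, the snippet below stores 3 responses to a 5-option multi-select question as nested Python lists. The `shape` helper and the sample data are hypothetical, written only for illustration; in practice an array library would report the same information.

```python
# Hypothetical example data: 3 responses to a 5-option multi-select question.
responses = [
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]


def shape(tensor):
    """Return the shape of a rectangular nested list as a tuple (illustrative helper)."""
    dims = []
    level = tensor
    while isinstance(level, list) and level:
        dims.append(len(level))
        first = level[0]
        if isinstance(first, list):
            # Every entry at this level must be a list of the same length,
            # otherwise the data is ragged and has no well-defined shape.
            for item in level:
                if not isinstance(item, list) or len(item) != len(first):
                    raise ValueError("ragged data cannot be treated as a tensor")
        level = first
    return tuple(dims)


single = responses[0]

print(shape(single), len(shape(single)))        # one response: rank 1
print(shape(responses), len(shape(responses)))  # all responses: rank 2
```

The rank is simply the length of the shape tuple, so stacking rank 1 responses adds exactly one dimension. A row with the wrong number of options makes `shape` raise a `ValueError` instead of silently returning a misleading shape.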

Generalizing the underlying data format

Formalizing Survey Variables

Consider a survey $S$ with 2 questions:

  1. What is your gender? (M, F, Other)
  2. What is your favourite letter? (A, B, C)

We call the collections of answers to these questions $X_1$ and $X_2$, our first 2 variables. Say $N$ respondents answer this survey:

$$ R_1, R_2, \dots, R_N $$

We know that $X_1 = \{X_1[R_a], X_1[R_b], X_1[R_c], \dots\}$ is a set with at most $N$ elements. In general, for any variable $X_n$ we have $|X_n| \leq |R| = N$, where $R$ is the set of respondents.
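The bound can be illustrated with a small sketch. It assumes one plausible reading of the notation: each variable maps a respondent to their answer, and a respondent who skipped a question is simply absent from that variable. The respondent IDs and answers below are made up for illustration.

```python
# Hypothetical respondents R_1..R_4, so N = 4.
respondents = ["R_1", "R_2", "R_3", "R_4"]

# Each variable maps a respondent to their answer.
# Assumption: a respondent who skipped a question has no entry.
X_1 = {"R_1": "M", "R_2": "F", "R_4": "Other"}          # R_3 skipped question 1
X_2 = {"R_1": "A", "R_2": "C", "R_3": "A", "R_4": "B"}  # everyone answered

for name, variable in {"X_1": X_1, "X_2": X_2}.items():
    # Answers can only come from known respondents, so |X_n| <= |R| = N.
    assert set(variable) <= set(respondents)
    assert len(variable) <= len(respondents)
    print(name, "has", len(variable), "of", len(respondents), "possible answers")
```

Indexing by respondent, as in `X_1["R_1"]`, mirrors the $X_1[R_a]$ notation.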