March 20, 2022
We deal exclusively with discrete variables. This means all variables can be represented as tensors. Tensors have a few properties:
Let’s consider a multi-select question with 5 options. The data might look something like:
[1,0,1,1,0]
for a single response, or [[1,0,1,1,0], [0,0,0,1,0], [1,1,1,1,1]]
for all 3 responses.
The shape of this tensor is $(n_{responses}, n_{options})$, here $(3, 5)$. Its rank is 1 more than the rank of the underlying data type; for selection questions the underlying type is a rank-1 row vector, so the full tensor has rank 2.
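As a quick check of the shape and rank claims, here is a minimal sketch in plain Python. The `shape` helper and the variable names are illustrative assumptions, not part of any library; the data is the three-response example with 5 options per row.

```python
# Three responses to a 5-option multi-select question (1 = selected).
responses = [
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]


def shape(tensor):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper: it only inspects the first element at each
    level, so it assumes every row has the same length.
    """
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


single_shape = shape(responses[0])  # one response: (n_options,)
full_shape = shape(responses)       # all responses: (n_responses, n_options)

# Rank is the number of axes, i.e. the length of the shape tuple.
single_rank = len(single_shape)
full_rank = len(full_shape)

print(single_shape, single_rank)
print(full_shape, full_rank)
```

Stacking the responses adds exactly one axis, which is why the rank of the full tensor is 1 more than the rank of a single response.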
Consider a survey $S$ with 2 questions:
We can call the collections of answers to these questions $X_1$ and $X_2$, our first 2 variables. Suppose $N$ respondents answer this survey:
$$ R_1, R_2, \ldots, R_N $$
We know that $X_1 = \{X_1[R_a], X_1[R_b], X_1[R_c], \ldots\}$ is a set with at most $N$ elements: each respondent contributes at most one answer per question, so $|X_n| \leq |R| = N$ for every variable $X_n$.
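To make the size bound concrete, here is a small sketch that stores $X_1$ as a mapping from respondent to answer. The respondent IDs, the answers, and the assumption that one respondent skipped the question are all hypothetical, chosen only to illustrate why $|X_1|$ can fall below $N$.

```python
# Hypothetical respondents R_1..R_4, so N = 4.
respondents = ["R_1", "R_2", "R_3", "R_4"]

# X_1 maps each respondent who answered question 1 to their answer.
# Assumption for illustration: R_3 skipped the question, so it has no entry.
X_1 = {
    "R_1": [1, 0, 1, 1, 0],
    "R_2": [0, 0, 0, 1, 0],
    "R_4": [1, 1, 1, 1, 1],
}

n_respondents = len(respondents)  # N
n_answers = len(X_1)              # |X_1|

# Each respondent contributes at most one answer, so |X_1| <= N.
bound_holds = n_answers <= n_respondents

print(n_answers, n_respondents, bound_holds)
```

Keying the answers by respondent keeps the indexing $X_1[R_a]$ explicit and makes missing answers visible as absent keys rather than placeholder rows.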