I often struggle to remember the correct way to do `and` type comparisons when working in pandas.

I remember learning long, long ago that `and` and `&` are different: the former is lazy boolean evaluation, whereas the latter is a bitwise operation.

**I learned a lot from this SO post**

## Lists

Python `list` objects can contain unlike elements, ie. `[True, 'foo', 1, '1', [1,2,3]]` is a valid list with booleans, strings, integers, and another list.
Because of this, we can't use `&` to compare two lists since they can't be combined in a consistent and meaningful way.

However, we can use `and` since it doesn't do bitwise operations; it just evaluates the boolean value of the list (basically, if it's non-empty then `bool(my_list)` evaluates to `True`).

Here's an example:

```
❯ my_list = [1, "2", "foo", [True], False]
❯ bool(my_list)
True
```

If we compare `my_list` with `another_list` using `and`, then the comparison will go:

```
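# roughly what `if my_list and another_list: <operation>` does
if bool(my_list):            # checked first
    if bool(another_list):   # only checked when my_list is truthy
        ...                  # <operation>
# if either check is falsy, <operation> is skipped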
```

Let's see another example:

```
❯ another_list = [False, False]
❯ my_list and another_list
[False, False]
```

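`bool(my_list)` evaluated to `True`, and `bool(another_list)` *also* evaluated to `True` even though it's full of `False` values, because the object is non-empty.
Since both are truthy, `and` hands back its second operand, which is why the output is `[False, False]` rather than `True`.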
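Here is a minimal sketch of what `and` returns, reusing the two lists from above. The helper name `lazy_and` is made up for illustration and only mimics the operator for values that already exist (the real operator also skips evaluating the right side when the left is falsy).

```python
def lazy_and(left, right):
    """Hypothetical helper that mimics `left and right` for existing values."""
    # A falsy left operand is returned as-is; otherwise the right operand is returned.
    return right if left else left


my_list = [1, "2", "foo", [True], False]
another_list = [False, False]

result = my_list and another_list
print(result)                  # [False, False]
print(result is another_list)  # True: the object itself comes back, not a bool
print([] and my_list)          # []: an empty (falsy) left operand is returned
print(lazy_and(my_list, another_list) is another_list)  # True
```

So `and` never looks inside the lists; it only asks whether each one is empty.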

```
❯ if my_list and another_list:
...:     print("foo")
foo
```

So using `and` in this case results in a `True` conditional, so the `print` statement is executed.

Feels kind of counter-intuitive at first glance, to me anyways...

However, we can't use `&` because there isn't a meaningful way to do bitwise operations over these two lists:

```
❯ my_list & another_list
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <ipython-input-19-a2a16cebb3da>:1 in <cell line: 1> │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: unsupported operand type(s) for &: 'list' and 'list'
```
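As a rough illustration of why the `TypeError` shows up: `&` is dispatched to an `__and__` method, which `int` and `bool` define and `list` does not. The pairwise list comprehension at the end is just one possible way to get an element-by-element "and" for plain lists, not something from the SO post, and the names `flags_a` and `flags_b` are made up.

```python
# `&` is looked up as the __and__ method on the operand's type.
print(hasattr(int, "__and__"))   # True
print(hasattr(list, "__and__"))  # False

print(6 & 3)         # 2, since 0b110 & 0b011 == 0b010
print(True & False)  # False

# One possible element-by-element "and" for two equal-length lists.
flags_a = [True, True, False]
flags_b = [True, False, False]
pairwise = [a and b for a, b in zip(flags_a, flags_b)]
print(pairwise)      # [True, False, False]
```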

## Numpy

`numpy` arrays are special: they have a lot of fancy vectorization utilities built in, which make them great and fast for mathematical operations, but now our logical comparisons need to be handled with a different kind of care.

First thing though: without some trickery they do not hold mixed data types like a `list` does (necessary, I think, for the vectorized optimization that numpy is built on top of).

With that out of the way, here's the main thing for this post: we can't just evaluate the `bool` of an array with more than one element. numpy says no no no.

```
❯ arr = np.array(["1", 2, True, False])
❯ arr
array(['1', '2', 'True', 'False'], dtype='<U21')
❯ bool(arr)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <ipython-input-25-4e8c5dd85b93>:1 in <cell line: 1> │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

This means that using `and` with `numpy` arrays doesn't really make sense because we probably care about the truth value of each element (element-wise), not the truth value of the array.
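A small sketch of the difference, using two made-up boolean arrays (`left` and `right` are illustrative names, not from the session above):

```python
import numpy as np

left = np.array([True, True, False])
right = np.array([True, False, False])

# `&` is applied element by element and returns a new boolean array.
both = left & right
print(both.tolist())  # [True, False, False]

# For boolean arrays, np.logical_and gives the same result.
print(np.array_equal(both, np.logical_and(left, right)))  # True

# `and` needs one truth value for `left`, so numpy raises instead of guessing.
try:
    left and right
except ValueError as err:
    print("ValueError:", err)
```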

Notice that when I print `arr` all the elements are strings, and the `dtype` is `<U21` for all elements.

This is not how I instantiated the array, so be aware of that behavior with numpy.

`<U21` is a dtype meaning little-endian, Unicode, up to 21 characters. See here for docs
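To see the coercion for yourself, here is a short sketch. I'm assuming that passing `dtype=object` is the kind of "trickery" that keeps the original Python types; the variable names are made up.

```python
import numpy as np

mixed = ["1", 2, True, False]

# Without a dtype, numpy picks one common type for every element: a string type.
coerced = np.array(mixed)
print(coerced.dtype.kind)  # U, ie. fixed-width unicode strings
print(coerced.tolist())    # ['1', '2', 'True', 'False']

# With dtype=object the array just stores references to the Python objects.
as_objects = np.array(mixed, dtype=object)
print([type(x).__name__ for x in as_objects])  # ['str', 'int', 'bool', 'bool']
```

The object array keeps the mixed types, but as far as I know it also gives up most of the vectorized speed that makes numpy attractive in the first place.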

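So for logical comparisons we should look at the error message then...
Our handy error message says to try `any` or `all`.

Because the datatypes in this example are basically strings, using `arr.any()` will result in an error that I do not fully understand, but `any(arr)` and `all(arr)` work...
My best guess from the traceback is that the `logical_or` ufunc behind `arr.any()` has no implementation for string dtypes in this numpy version, while the builtin `any` and `all` just loop over the elements and use each string's normal Python truthiness.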

```
❯ if arr.any():
...:     print("foo")
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <ipython-input-48-25ecac52db96>:1 in <cell line: 1> │
│ /home/u_paynen3/personal/sandbox/.venv/sandbox/lib/python3.8/site-packages/numpy/core/_methods.p │
│ y:57 in _any │
│ │
│ 54 def _any(a, axis=None, dtype=None, out=None, keepdims=False, *, where=True): │
│ 55 │ # Parsing keyword arguments is currently fairly slow, so avoid it for now │
│ 56 │ if where is True: │
│ ❱ 57 │ │ return umr_any(a, axis, dtype, out, keepdims) │
│ 58 │ return umr_any(a, axis, dtype, out, keepdims, where=where) │
│ 59 │
│ 60 def _all(a, axis=None, dtype=None, out=None, keepdims=False, *, where=True): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UFuncTypeError: ufunc 'logical_or' did not contain a loop with signature matching types (None, <class 'numpy.dtype[str_]'>) -> None
❯ if all(arr):
...:     print("foo")
foo
❯ if any(arr):
...:     print("foo")
foo
```
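Worth noting why `all(arr)` printed `foo` even though one element reads `'False'`: the builtins only see strings, and any non-empty string is truthy. A plain-Python sketch (the list `values` stands in for what ended up inside `arr`):

```python
values = ["1", "2", "True", "False"]  # the strings that ended up inside `arr`

print(bool("False"))  # True: any non-empty string is truthy
print(bool(""))       # False: only the empty string is falsy
print(all(values))    # True
print(any(values))    # True

# To test what the strings *say*, compare them explicitly instead.
says_true = [v == "True" for v in values]
print(says_true)       # [False, False, True, False]
print(all(says_true))  # False
```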

Let's change the example to just use integers and see what happens:

```
❯ arr2 = np.array([1, True, False])
❯ arr2
array([1, 1, 0])
❯ if arr2.any():
...:     print("foo")
foo
❯ if arr2.all():
...:     print("foo")
```

Ah, now some sanity...
First, the booleans are stored as integers, which based on this discussion makes sense.
Next we check if `any` values are `True` (this looks at each element), which we see they are, so the conditional evaluates to `True`.
However, if we check that `all` values are `True` we see they aren't: the last value is `False` or `0`, so the conditional fails.

This is a different way to evaluate logical conditions than with lists, and it's because of the special nature of numpy arrays that allows them to be compared element-wise. On the flip side, there isn't a meaningful way to evaluate the truth value of an array.

## Pandas

Now for `pandas`, which under the hood is a lot of `numpy` but not fully.
`pandas.Series` objects can hold mixed data types like lists; however, to logically evaluate truth values we have to treat them like numpy arrays.

```
❯ s = pd.Series([1, "foo", True, False])
❯ s
0        1
1      foo
2     True
3    False
dtype: object
❯ bool(s)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <ipython-input-60-68e48e81da14>:1 in <cell line: 1> │
│ /home/u_paynen3/personal/sandbox/.venv/sandbox/lib/python3.8/site-packages/pandas/core/generic.p │
│ y:1527 in __nonzero__ │
│ │
│ 1524 │ │
│ 1525 │ @final │
│ 1526 │ def __nonzero__(self): │
│ ❱ 1527 │ │ raise ValueError( │
│ 1528 │ │ │ f"The truth value of a {type(self).__name__} is ambiguous. " │
│ 1529 │ │ │ "Use a.empty, a.bool(), a.item(), a.any() or a.all()." │
│ 1530 │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

Just like with numpy, we can't evaluate the truth value of the series in a meaningful way, but element-wise operations make perfect sense...

```
❯ if s.any():
...:     print("foo")
foo
❯ if s.all():
...:     print("foo")
```

**I thought this was about `and` and `&`...**

Right, so recall that `and` is a lazy boolean evaluation (ie. it evaluates the 'truth value' of an object) whereas `&` does bitwise comparison.

What we see then with `pandas` and `numpy` is that if we want to do logical comparisons, we need to do them element-wise, ie. use the bitwise operator `&`.

Keep in mind though that the data types matter a lot: we can't use `&` with strings because the bitwise operation isn't supported; for strings we need to use the boolean evaluation.
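As a small plain-Python sketch of that last point (the variable names are made up): strings themselves don't support `&`, but the booleans you get from comparing them do.

```python
name = "foo"
other = "bar"

# Strings do not define `&`, so this raises instead of returning a result.
try:
    name & other
except TypeError as err:
    print("TypeError:", err)

# Comparing first gives plain booleans, and booleans support both operators.
is_foo = name == "foo"
is_baz = other == "baz"
print(is_foo & is_baz)    # False
print(is_foo and is_baz)  # False
```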

## The Original Point

My main use case for this is finding elements in a dataframe/series based on 2 or more columns aligning row values...

Say I have a dataframe like this:

```
sandbox NO VCS via 3.8.11(sandbox) ipython
❯ df
   s  s2   s3
0  1   0  foo
1  1   a  bar
2  1   b  baz
3  2   a  fee
4  2   0   fi
```

Example use case: I want to get the values in `s3` where `s` is 1 and `s2` is 'a', ie. I'm just after `bar` for now...

Up until now I've always just tried `df.s3[(df.s == 1) and (df.s2 == "a")]` the first time, and every single time I've gotten this error that I just haven't ever fully understood:

```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

But after this deep dive I think I've grasped that `and` doesn't actually do what I want here, and in order to do the element-wise comparison I need to use `&`:

```
❯ df.s3[(df.s == 1) & (df.s2 == "a")]
1    bar
Name: s3, dtype: object
```
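For anyone who wants to run this end to end, here is a self-contained sketch. The dataframe is rebuilt by hand from the printout above, so the exact column types are my assumption. It also shows `|` and `~`, the element-wise "or" and "not" that go with `&`.

```python
import pandas as pd

# Rebuilt by hand from the printed dataframe; dtypes are assumed.
df = pd.DataFrame(
    {
        "s": [1, 1, 1, 2, 2],
        "s2": [0, "a", "b", "a", 0],
        "s3": ["foo", "bar", "baz", "fee", "fi"],
    }
)

# Each comparison yields a boolean Series; `&` combines them row by row.
mask = (df["s"] == 1) & (df["s2"] == "a")
print(mask.tolist())                 # [False, True, False, False, False]
print(df.loc[mask, "s3"].tolist())   # ['bar']

# `|` is the element-wise "or" and `~` the element-wise "not".
either = (df["s"] == 2) | (df["s2"] == "b")
print(df.loc[either, "s3"].tolist())  # ['baz', 'fee', 'fi']
print(df.loc[~mask, "s3"].tolist())   # ['foo', 'baz', 'fee', 'fi']
```

The parentheses around each comparison matter: `&` binds more tightly than `==` in Python, so without them the expression would try to evaluate `1 & df["s2"]` first.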

## End

Hopefully this set of ramblings brings some clarity to `and` and `&` and you can Google one less error in the future in your logical comparison workflows 😄