A common task is to isolate the records at which some condition changes; failure analysis is an obvious example. In a `tidy` environment this isn’t always easy, because the tooling is strongly column-oriented and looking back at previous rows is clunky at best.

The sometimes overlooked `which()` provides one part of the solution. Consider a simple case of two vectors, easily combined into a data frame: one holds a classification and the other indicates whether that classification is present.

```
genres <- c("Action", "Animation", "Comedy", "Documentary", "Romance", "Short")
row_vector <- c(0, 0, 1, 1, 0, 0)  # one flag per genre, same length as genres
indices <- min(which(row_vector == TRUE))  # position of the first flagged genre
indices
```

`## [1] 3`

`genres` and `row_vector` are equal-length character and numeric vectors, and the correspondence between them depends on position, with the `Action 0` pair first and the `Short 0` pair last. Python users may recognize this as resembling the key-value pairs of a `dict`.

Working from the inside out, `which()` returns the positions of the elements of `row_vector` that equal `TRUE` (which compares equal to `1`), and `min()` picks the first of those positions. So `indices` is a single-element vector holding `3`, and `genres[3]` evaluates to **Comedy**.
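To see what `min()` is choosing from, it can help to look at everything `which()` returns. The following is a minimal base R sketch; `flags` and `hits` are names made up for this illustration, and `flags` is assumed to have one entry per genre.

```r
genres <- c("Action", "Animation", "Comedy", "Documentary", "Romance", "Short")
flags  <- c(0, 0, 1, 1, 0, 0)   # hypothetical flags, one per genre

hits <- which(flags == 1)       # every position where the flag is set
hits                            # 3 4
genres[hits]                    # "Comedy" "Documentary"
genres[min(hits)]               # "Comedy", the first flagged genre
```

The positions come from one vector and are used to index the other, which is the same trick the `rle()` approach relies on.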

Using the positions of elements in one vector to identify elements in another provides a way to use `rle()`, the run-length encoding function.

`rle()` collapses a vector into runs of consecutive identical values, recording each run’s value and its length, that is, how many times the value repeats before it changes.

As usual, it helps to run the `help()` example, with some inspection:

```
x <- rev(rep(6:10, 1:5))
x
```

`## [1] 10 10 10 10 10 9 9 9 9 8 8 8 7 7 6`

```
y <- rle(x)
y
```

```
## Run Length Encoding
## lengths: int [1:5] 5 4 3 2 1
## values : int [1:5] 10 9 8 7 6
```

`str(y)`

```
## List of 2
## $ lengths: int [1:5] 5 4 3 2 1
## $ values : int [1:5] 10 9 8 7 6
## - attr(*, "class")= chr "rle"
```

The two pieces of `y`, `y$lengths` (or `y[[1]]`) and `y$values` (or `y[[2]]`), tell us that the vector starts with a run of five 10s, followed by a run of four 9s, and so on.
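One way to check that reading of the encoding is to expand it back into the original vector. This short base R sketch uses `rep()` and the built-in `inverse.rle()`; `rebuilt` is just an illustrative name.

```r
x <- rev(rep(6:10, 1:5))
y <- rle(x)

# Repeat each value by its run length to recover the original vector
rebuilt <- rep(y$values, y$lengths)
identical(rebuilt, x)           # TRUE

# inverse.rle() performs the same expansion on an "rle" object
identical(inverse.rle(y), x)    # TRUE
```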

Let’s create a simple data frame:

```
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tibble))
my_data <- tibble(type=rep(c(1:2), each= 9), hour= rep(1:9, 2), event = c(1,1,1,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0))
my_data
```

```
## # A tibble: 18 x 3
## type hour event
## <int> <int> <dbl>
## 1 1 1 1
## 2 1 2 1
## 3 1 3 1
## 4 1 4 0
## 5 1 5 0
## 6 1 6 0
## 7 1 7 0
## 8 1 8 1
## 9 1 9 1
## 10 2 1 1
## 11 2 2 1
## 12 2 3 0
## 13 2 4 0
## 14 2 5 0
## 15 2 6 1
## 16 2 7 0
## 17 2 8 1
## 18 2 9 0
```

Let `1` in column `event` indicate success and `0` failure. Where does each string of successes turn into failure?

```
runs <- rle(my_data$event)
# Store the run lengths and values as named tibble columns
runs <- tibble(lengths = runs$lengths, values = runs$values)
runs
```

```
## # A tibble: 8 x 2
## lengths values
## <int> <dbl>
## 1 3 1
## 2 4 0
## 3 4 1
## 4 3 0
## 5 1 1
## 6 1 0
## 7 1 1
## 8 1 0
```

```
# The cumulative sum of run lengths is the row number at which each run ends
sequences <- runs %>% mutate(indices = cumsum(lengths))
sequences
```

```
## # A tibble: 8 x 3
## lengths values indices
## <int> <dbl> <int>
## 1 3 1 3
## 2 4 0 7
## 3 4 1 11
## 4 3 0 14
## 5 1 1 15
## 6 1 0 16
## 7 1 1 17
## 8 1 0 18
```
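The `indices` column works because the cumulative sum of the run lengths is the row number on which each run ends. Subtracting the run length and adding one gives the row on which it starts. A minimal base R sketch on the same `event` values, where `run_end` and `run_start` are names chosen for this illustration:

```r
event <- c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
runs  <- rle(event)

run_end   <- cumsum(runs$lengths)          # 3 7 11 14 15 16 17 18
run_start <- run_end - runs$lengths + 1    # 1 4 8 12 15 16 17 18

# One row per run: its value and the rows it covers
data.frame(value = runs$values, start = run_start, end = run_end)
```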

```
post_zero <- sequences %>% filter(values == 0)
post_zero
```

```
## # A tibble: 4 x 3
## lengths values indices
## <int> <dbl> <int>
## 1 4 0 7
## 2 3 0 14
## 3 1 0 16
## 4 1 0 18
```

```
result <- left_join(sequences, post_zero, by = "indices") %>% select(1:3) %>% filter(values.x == 1)
colnames(result) <- c("lengths", "runs", "indices")
result
```

```
## # A tibble: 4 x 3
## lengths runs indices
## <int> <dbl> <int>
## 1 3 1 3
## 2 4 1 11
## 3 1 1 15
## 4 1 1 17
```

`my_data[result$indices,]`

```
## # A tibble: 4 x 3
## type hour event
## <int> <int> <dbl>
## 1 1 3 1
## 2 2 2 1
## 3 2 6 1
## 4 2 8 1
```
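As a cross-check on those row numbers, the same positions can be found without `rle()` by looking for a drop from `1` to `0` between neighbouring elements. This is a sketch of an alternative, not the method used above; `last_success` is an illustrative name.

```r
event <- c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0)

# diff() is -1 exactly where a 1 is followed by a 0
last_success <- which(diff(event) == -1)
last_success                    # 3 11 15 17
```

One difference worth noting: `diff()` only reports successes that are actually followed by a failure, so a run of successes at the very end of the vector would not appear.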

For `type = 1`, one string of successes ended at hour three; for `type = 2`, three strings ended at hours two, six and eight.
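Note that `rle()` on the whole column knows nothing about `type`, so the run of successes in rows 8 to 11 straddles both types. The answer happens to be the same here, but if runs should be kept separate per group, one possible approach is to split the data first. The sketch below uses base R only; `last_success_rows()` is a hypothetical helper written for this example, and it deliberately ignores a trailing run of successes that never turns into failure.

```r
my_data <- data.frame(
  type  = rep(1:2, each = 9),
  hour  = rep(1:9, 2),
  event = c(1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
)

# Hypothetical helper: rows where a run of 1s ends and a run of 0s follows
last_success_rows <- function(event) {
  r    <- rle(event)
  ends <- cumsum(r$lengths)
  keep <- r$values == 1 & seq_along(ends) < length(ends)
  ends[keep]
}

# Apply the helper within each type, then stack the pieces
by_type  <- lapply(split(my_data, my_data$type),
                   function(d) d[last_success_rows(d$event), ])
per_type <- do.call(rbind, by_type)
per_type
```

With these data, `per_type` contains hour 3 for type 1 and hours 2, 6 and 8 for type 2.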

More interesting, of course, is the case where `hour` is a `datetime` object and you can bring date arithmetic into play.
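As a rough illustration of that idea, the sketch below attaches made-up hourly timestamps to the first nine events and measures how long each run lasted. The timestamps, the `stamp` vector and the start date are all assumptions for this example, not part of the data above.

```r
# Hypothetical hourly timestamps, one per observation
stamp <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * (0:8)
event <- c(1, 1, 1, 0, 0, 0, 0, 1, 1)

r         <- rle(event)
run_end   <- cumsum(r$lengths)
run_start <- run_end - r$lengths + 1

# Elapsed time between the first and last observation of each run
elapsed <- difftime(stamp[run_end], stamp[run_start], units = "hours")

data.frame(value = r$values,
           start = stamp[run_start],
           end   = stamp[run_end],
           hours = as.numeric(elapsed))
```

Here the three runs span 2, 3 and 1 hours between their first and last observations.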

The main point is that if you can design a logical test to mutate a numeric column, `rle` provides a straightforward way of subsetting sequences based on the test result.
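To make that concrete, here is a small sketch in base R. The `reading` values, the threshold of 20 and the minimum run length of three are all invented for the example; the pattern is simply a logical test, then `rle()`, then positions.

```r
reading <- c(12, 15, 21, 23, 22, 14, 13, 25, 26, 11)  # hypothetical measurements
above   <- reading > 20                               # the logical test

r      <- rle(above)
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Runs where the test holds for at least three consecutive readings
long_runs <- which(r$values & r$lengths >= 3)
long_runs                                   # 2

# Pull out the readings that make up the first such run
reading[starts[long_runs[1]]:ends[long_runs[1]]]   # 21 23 22
```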