Section 7 Splitting large .wav files
In this section, we split the raw data, which were collected around the clock (24 hours a day) for 7 consecutive days at each site, so that bird species can be annotated manually. The AudioMoths were scheduled to record for 4 min and then switch off for 1 min. For analysis, the data were split into 10-s chunks and annotated manually using Raven Pro.
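The arithmetic behind this schedule can be sketched as follows. This is a minimal illustration that assumes each 4-min recording period is saved as one file; the variable names are ours and do not appear elsewhere in the analysis.

```r
# Recording schedule as described in the text
record_s <- 4 * 60   # seconds of recording per cycle
sleep_s  <- 1 * 60   # seconds switched off per cycle
chunk_s  <- 10       # chunk length (seconds) used for annotation

# Assumption: one file is written per record/sleep cycle
files_per_hour  <- 3600 / (record_s + sleep_s)
files_per_day   <- files_per_hour * 24
chunks_per_file <- record_s / chunk_s

files_per_hour   # 12
files_per_day    # 288
chunks_per_file  # 24
```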
7.1 Load required libraries
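library(warbleR)  # split_wavs()
library(stringr)  # str_extract(), str_detect(), str_replace() and the %>% pipe
# Source any custom/other internal functions necessary for analysis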
7.2 Selecting dawn acoustic data
We will use warbleR::split_wavs() to split large files. To do so, we first load a list of .wav files from the site folders (this has to be done site by site). Next, we select only files that start between 6 am and 10 am (this window can be varied depending on the exercise or the question at hand). For each selected day, we randomly extract a continuous 16 min of recording, that is, four consecutive 4-min files.
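The time-window selection relies on the start time being the last block of digits in each file name. A minimal sketch of that idea is given here; the file names are made up for illustration (we assume a SITE_YYYYMMDD_HHMMSS.WAV pattern), and base R is used so the example runs without extra packages.

```r
# Hypothetical file names following an assumed SITE_YYYYMMDD_HHMMSS.WAV pattern
example_files <- c("SITE01_20200301_054500.WAV",
                   "SITE01_20200301_060000.WAV",
                   "SITE01_20200301_093000.WAV",
                   "SITE01_20200301_100500.WAV")

# Drop the extension, then pull out the trailing digits (the start time)
stems      <- tools::file_path_sans_ext(example_files)
start_time <- regmatches(stems, regexpr("[0-9]+$", stems))

# Zero-padded HHMMSS strings sort in chronological order,
# so a plain string comparison is enough to select a time window
dawn_files <- example_files[start_time >= "060000" & start_time <= "100000"]
dawn_files
```

Only the 06:00:00 and 09:30:00 files fall inside the window. Changing the two time strings changes the window.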
# Path to the directory that contains all site folders with the AudioMoth data
path <- "C:\\data\\2020-winter\\"
# List the folders within which .WAV files are stored
folders <- dir(path, recursive = FALSE, full.names = TRUE)
# Now get only those files that begin between 6 am and 10 am
files <- list()
for (i in seq_along(folders)) {
  # The next two lines need to be run only if the files have to be renamed
  # List the files within each folder and rename them with the prefix SITE_ID
  a <- list.files(paste0(path, basename(folders)[i], "\\"), full.names = TRUE)
  # Keep each file in its own folder; only the file name gets the prefix
  file.rename(from = a, to = file.path(dirname(a), paste0(basename(folders)[i], "_", basename(a))))
  # Extract the time strings for .wav files between 6 am and 10 am
  time_str <- list.files(paste0(path, basename(folders)[i], "\\"), full.names = TRUE) %>%
    tools::file_path_sans_ext() %>% str_extract('\\d+$')
  time_str <- time_str[!is.na(time_str) & time_str >= "060000" & time_str <= "100000"] # vary times here depending on the question at hand
  time_str <- unique(time_str) # the same start time recurs across dates
  for (j in seq_along(time_str)) {
    b <- list.files(paste0(path, basename(folders)[i], "\\"), full.names = TRUE,
                    pattern = time_str[j])
    files <- c(files, b)
  }
}
# This is the list of files we need
files <- unlist(files)
# Now we choose a random consecutive 16 min of data between 6am and 10am
# Get a list of unique site-date combinations (we generate a random 16 min for every date at every site)
site_date <- str_extract(basename(files), '\\w+_\\d+_')
unique(site_date) # Gives the unique sites and dates for which we need to generate 16 min of data
subset_files <- list()
for (i in seq_along(unique(site_date))) {
  a <- files[str_detect(files, unique(site_date)[i])]
  if (length(a) >= 4) { # minimum number of files needed (4 files of 4 min each = 16 min)
    subset_dat <- extractRandWindow(a, 4)
    subset_dat <- na.exclude(subset_dat) # drop NAs in case fewer than 4 files are returned
    subset_files <- c(subset_files, subset_dat)
  }
}
final_subset <- unlist(subset_files)
# Subset those files and copy them to a separate folder
# Please note that these folders & files are locally stored (they are extremely large and cannot be added to GitHub)
file.copy(from = final_subset, to="C:\\data\\subset\\")
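The helper extractRandWindow() is one of the custom functions sourced earlier, and its definition is not shown in this section. As a rough sketch of what such a helper might do, the hypothetical function pick_random_window() below picks n consecutive elements from a sorted vector, starting at a random position. It is an assumption for illustration only and may differ from the actual helper.

```r
# Hypothetical stand-in for a "random consecutive window" helper.
# Returns n consecutive elements of x, or an empty vector if x is too short.
pick_random_window <- function(x, n) {
  if (length(x) < n) {
    return(x[0])
  }
  start <- sample.int(length(x) - n + 1, 1)  # random valid start index
  x[start:(start + n - 1)]
}

set.seed(1)  # for a reproducible example

# Made-up file names: one recording every 5 min between 06:00 and 06:55
recordings <- sort(sprintf("SITE01_20200301_06%02d00.WAV", seq(0, 55, by = 5)))

window <- pick_random_window(recordings, 4)
window
```

Note that consecutive files only correspond to a continuous stretch of the schedule if no recordings are missing in between, so the files should be sorted by name and checked for gaps before the window is drawn.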
7.3 Split the files
Split the files and provide unique names to each file
# Note: the path you choose to store the data is up to the user.
subset_path <- "C:\\data\\subset\\"
# Split the files into n-second chunks
split_wavs(path = subset_path, sgmt.dur = 10, parallel = 4)
# Get files that need to be renamed
split_files <- list.files(subset_path, full.names = TRUE, pattern = "-")
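# Note: the number of chunks varies as a function of segment duration
# 240 seconds = 24 chunks of 10 s each
chunks <- c("01-10","10-20","20-30",
            "30-40","40-50","50-60",
            "60-70","70-80","80-90",
            "90-100","100-110","110-120",
            "120-130","130-140","140-150",
            "150-160","160-170","170-180",
            "180-190","190-200","200-210",
            "210-220","220-230","230-240")
for (i in seq_along(chunks)) {
  # Files whose name ends in "-i.wav" are the i-th chunk of each recording
  old_names <- split_files[endsWith(split_files, paste0("-", i, ".wav"))]
  # Replace only the trailing "-i.wav" so other hyphens in the path are untouched
  new_names <- str_replace(old_names, paste0("-", i, "\\.wav$"), paste0("_", chunks[i], ".wav"))
  file.rename(from = old_names, to = new_names)
}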
# Remove the original files
orig_files <- list.files(subset_path, full.names = TRUE, pattern = "\\.WAV$")
file.remove(orig_files)
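Since the number of chunks depends on the segment duration, the labels can also be generated rather than typed out. The sketch below uses a hypothetical helper, make_chunk_labels(), which is not part of warbleR, and assumes the recording length is an exact multiple of the segment duration. The first label comes out as "0-10" rather than the "01-10" used above, so adjust it if the naming has to match exactly.

```r
# Hypothetical helper: build "start-end" labels (in seconds) for each chunk.
# Assumes file_dur is an exact multiple of sgmt_dur.
make_chunk_labels <- function(file_dur, sgmt_dur) {
  stopifnot(file_dur %% sgmt_dur == 0)
  starts <- seq(0, file_dur - sgmt_dur, by = sgmt_dur)
  paste0(starts, "-", starts + sgmt_dur)
}

labels_10s <- make_chunk_labels(240, 10)  # 24 labels, "0-10" to "230-240"
labels_30s <- make_chunk_labels(240, 30)  # 8 labels, "0-30" to "210-240"

length(labels_10s)
head(labels_10s, 3)
```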
Now, go ahead and begin the process of manual annotation!