Testing Automated Call-recognition Software for Winter Bird Vocalizations
Andrew Wolfgang and Aaron Haines

Northeastern Naturalist, Volume 23, Issue 2 (2016): 249–258


Andrew Wolfgang1,2 and Aaron Haines1,*

Abstract - Automated recording devices and call-recognition software are technologies available to survey vocal fauna. We evaluated the Song Scope® call-recognition software designed by Wildlife Acoustics to automatically identify vocalizations of 3 North American winter bird species. We used a Wildlife Acoustics SM-2 automated recorder to record winter avian vocalizations and then screened these field recordings using the Wildlife Acoustics Song Scope software programmed with recognition models, or recognizers, that we created. Song Scope correctly identified an average of 39% of target vocalizations to species using featured recognizers, with some recognizers performing better than others (accuracy range = 20–59%). Screening a 10-h field recording with Song Scope took an average of 7 minutes per recognizer. Call-recognition software can be used to survey vocal species; however, when biologists use this software to determine species presence or density, they need to be aware of potential bias in survey results because some species-recognizer models perform better than others.

Introduction

The ability to assess bird abundance and diversity by recording and identifying vocalizations is a valuable tool in avian ecology (Blumstein et al. 2011, Kroodsma and Budney 2011). A number of studies have experimented with surveying populations using automated recording devices and recognition software to identify species by vocalizations (Brandes 2008, Buxton and Jones 2012, Holmes et al. 2014, Lopes et al. 2011, Venier et al. 2012, Waddle et al. 2009). Automated identification requires less time than manual scanning of recordings, but has been reported to lack the accuracy of trained surveyors (Swiston and Mennill 2009).
However, automated detection of vocal species can be more efficient and improve detection probability because modern recording devices can be left in the field to survey either continuously or during programmed time-intervals (Acevedo and Villanueva-Rivera 2006, Holmes et al. 2014, Venier et al. 2012). Seasonal changes, time restrictions, and weather events often limit efforts for surveyors, but the limitations can be reduced by using automated recorders (Bas et al. 2008, Bridges and Dorcas 2000, Diefenbach et al. 2007).

Sound-recognition software programs such as Raven® (Duan et al. 2013), XBAT (Brandes 2008, Swiston and Mennill 2009), and Song Scope® (Buxton and Jones 2012, Duan et al. 2013, Holmes et al. 2014) can be programmed to attempt to distinguish wildlife species by their recorded vocalizations. These recordings can provide a permanent, biologically important library, where digital sound files are available for review and re-analysis (Blumstein et al. 2011, Kroodsma and Budney 2011).

1 Department of Biology, Applied Conservation Lab, PO Box 1002, Millersville University, Millersville, PA 17551.
2 Current address - 34001 Mill Dam Road, Wallops Island, VA 23337.
* Corresponding author.
Manuscript Editor: Susan Herrick

Audio signals from field recordings have many characteristics, frequencies, and syllables (Brandes 2008), and there are various methods for isolating and classifying them (Blumstein et al. 2011, Brandes 2008). Song Scope software analyzes an entire audio-signal structure using Hidden Markov Models (HMM) (Duan et al. 2013), where coefficients record changes in patterns within the harmonic structure as the signal is processed (Brandes 2008).
HMMs and other signal-processing techniques have been used to effectively identify various birds in the family Thamnophilidae (antbird species), as well as Setophaga cerulea (Wilson) (Cerulean Warbler), Empidonax virescens (Vieillot) (Acadian Flycatcher), and Passerina cyanea L. (Indigo Bunting) songs (Holmes et al. 2014, Kirschel et al. 2009, Kogan and Margoliash 1998, Trifa et al. 2008). However, the accuracy of sound-recognition software in identifying particular species' calls has varied. Duan et al. (2013) reported accuracies of only 37%, while Buxton and Jones (2012) reported accuracies of greater than 50%. Both of these studies used Song Scope software.

In our study, we used Song Scope (v. 4.1.1) recognition software and Wildlife Acoustics hardware (automated recording device SM-2) (Wildlife Acoustics 2011a). These tools have been used to verify presence of avian woodland species including Cerulean Warbler and Acadian Flycatcher (Holmes et al. 2014). Our objectives for this study were to assess the accuracy of Song Scope software, document presence of 3 winter woodland species, and determine whether Song Scope is a tractable option for field biologists with no experience using signal-processing software. We believe that validating the accuracy of call-recognition software is important in order to avoid bias in species-detection rates.

Field-site Description

We mounted the Song Meter® recording device (SM-2, Wildlife Acoustics, East Lansing, MI) facing northeast at a height of 2 m on a tree in a mixed deciduous forest on the Millersville University Biological Preserve west of the Conestoga River in Millersville, PA (39°99'5''N, 76°34'63''W). The habitat around the device was characterized by large Liriodendron tulipifera L. (Yellow Poplar) and Platanus occidentalis L. (American Sycamore). The mid-story was primarily Acer rubrum L. (Red Maple) with a thick shrub layer of Rosa multiflora Thunb. (Multiflora Rose).
The proximity of the Conestoga River and a nearby road created an edge habitat utilized by birds in the winter months.

Methods

We tested Song Scope using 3 focal winter bird vocalizations: the “jay” calls of Cyanocitta cristata L. (Blue Jay), the basic “tea-kettle” songs of Thryothorus ludovicianus (Latham) (Carolina Wren), and the “chick-a-dee-dee-dee” calls of Poecile carolinensis (Audubon) (Carolina Chickadee) (Fig. 1). We chose these target vocalizations because they are consistently heard during the winter season in eastern temperate forests of the US.

Figure 1. Spectrograms from Song Scope annotations showing the diversity of call structure between target vocalizations of 3 focal species.

We placed the Song Meter at the site to ensure that microphone settings and automated recording times were optimal for recording a large number of bird vocalizations. We then deployed the SM-2 with Wildlife Acoustics default audio settings and SMX-II microphone settings. All channels were stereo, left and right microphones were set at +40.0 dB, sample rate was 16,000 Hz, and all files were stored in .wav format. We used these settings for the duration of the study. We first deployed the SM-2 device on 27 September 2013 and last retrieved it on 23 December 2013. During that interval, we sampled for reference calls and test vocalizations on a total of 33 days.

We collected reference vocalizations to use as training data to build song recognizers. Each species had a minimum of 15 high-quality reference vocalizations. Buxton and Jones (2012) built models using 2–5 high-quality reference vocalizations, while Holmes et al. (2014) used over 100 various vocalizations. Song Scope uses a “recognizer” created from reference vocalizations to isolate and classify signals within a field recording.
Recognizers are recognition models that search field recordings for a match based on the features of the reference vocalizations from which they were created. Success of recognizers is dependent on the purity of prerecorded reference vocalizations and correct model parameters (Wildlife Acoustics 2011b). Reference vocalizations must first be annotated in Song Scope, and then annotations can be grouped into a usable recognizer.

We recorded reference vocalizations with the SM-2 automated recording device from 25 October to 7 November 2013 and 26 November to 4 December 2013. The SM-2 was programmed to record from 0845 to 0900, when the target birds were most vocal during the winter. We obtained other reference vocalizations from Thayer Birding Software (2012) and hand-held recordings of target birds. For hand-held recordings, we used a Tascam DR-05 device (TEAC, Montebello, CA) from 0800 to 1000 on 22 December 2013 at Pinchot State Park (PSP), located ~40 mi west of Millersville in York, PA.

After annotating reference vocalizations, we adjusted Song Scope recognizer parameters to create the best recognizer using a “cross-training score” (see Wildlife Acoustics 2011b) for each recognizer based on its ability to identify vocalizations in trial recordings. We created recognizers with selected model parameters set to match the target structure of a reference vocalization for a specific species (Buxton and Jones 2012). Spectrograms show the diversity of our targets (Fig. 1); therefore, recognizer parameters had to be species-specific for our study. Recognizer-model parameters have values that can be manipulated based on the characteristics of a target vocalization (Table 1); a more in-depth explanation is outlined in the Song Scope manual (Wildlife Acoustics 2011b). We tested recognizer parameters using 15-min field recordings of target vocalizations (trial recordings) from our study site.
We changed parameters by trial and error until the recognizer identified all (100%) of the trial-recording target vocalizations. These targets were then assimilated into the recognizer as additional reference vocalizations to create an “improved recognizer.” We tested the improved recognizer for 100% accuracy on a second 15-min field recording of target vocalizations.

We used 3 sets of reference vocalizations (i.e., SM-2 recordings, Thayer software, hand-held recordings) to build 3 separate models for each target species. Different combinations of calls can be combined into a single recognizer model, allowing the model to capture and correctly identify a greater variation of species vocalizations (Wildlife Acoustics 2011b). Thus, we ran all 3 reference-vocalization models simultaneously for each target species. These combined models performed better than any solitary recognizer for each target species, and we used them for our analysis.

We tested improved recognizers on 30-min field recordings taken at 0800 and 1530 from 12 to 23 December 2013. We obtained a total of 10 hours of field recordings, which we later screened for target vocalizations with the Song Scope program. We considered complete vocalizations that stood out over filtered background noise (registering 20–40 dB) to be quality vocalizations. We considered incomplete vocalizations, or those that lacked the volume (registering 0–20 dB) to stand out over filtered noise at any given point in the recording, to be non-quality vocalizations. Song Scope has the ability to screen many hours of recordings at once by “batching” the recordings from separate recording times together; our 10 usable hours of field recordings were batched. We evaluated the Song Scope recognizer models by manually reviewing all 10 h of audio.
If the software correctly matched a quality target vocalization, we defined this as a true positive; a non-target vocalization that was recognized as a target was a false positive; and if the software missed a quality target vocalization that we had identified, it was labeled a false negative.

Table 1. Song Scope parameters needed to develop a recognition model (Wildlife Acoustics 2011b).

Song Scope parameter          Definition
Maximum song duration         The expected duration of a song or call
Maximum syllable duration     The expected duration of a syllable in a song or call
Maximum syllable interval     The expected duration of the time between syllables in a song or call
Minimum frequency             The lowest frequency used in the song or call
Frequency range               The span of frequency for the song or call
Dynamic range                 This sets a level of decibels to normalize Song Scope with peak song or call levels
Maximum complexity            The maximum number of Hidden Markov Models used by Song Scope in a recognizer
Maximum resolution            Maximum number of feature vectors used by a Song Scope recognizer
Background filter             Puts emphasis on song or call and cuts stationary noise
Sample rate                   The rate at which the song or calls are analyzed by the recognizer
FFT                           Fast Fourier Transform window size

Results

After manual evaluation, we classified about 25% of all recorded vocalizations (317/1280) as quality vocalizations (Carolina Wren [83 of 135], Carolina Chickadee [54 of 661], and Blue Jay [180 of 484]). We used these quality vocalizations to test the Song Scope software. The mean total accuracy for our featured recognizer models using Song Scope was 39%, based on true positives among quality calls. The highest percentages of true positive calls identified were 59% for the Carolina Wren, 39% for the Blue Jay, and 20% for the Carolina Chickadee (Table 2). In addition, Carolina Wren had the lowest number of false positives (13), while Carolina Chickadee had the highest (61).

Manually screening the field recordings for our focal species took 12 h, or an average of 4 h per species. Automated identification for these species using Song Scope took a total of 15 h and 21 min, or an average of 5 h and 7 min per species, including 4.5 h to create a featured recognizer for each species (Table 3). However, after recognizers were created, Song Scope, on average, processed the 10-h field recording in 7 min, and it took us 30 min to evaluate Song Scope’s findings after the program displayed results. Not including the time needed to create a recognizer, identification of audio vocalizations using the Song Scope program took on average 37 min per species for a 10-h recording (Table 3).

Table 2. Accuracy of Song Scope automated recognition software from Wildlife Acoustics identifying target vocalizations from 3 bird species in Millersville, PA. True positives = the number of quality calls of the species correctly identified. False positives = number of misidentifications of sounds that were not calls of the species as calls of the species. False negatives = number of undetected quality calls of the species. Accuracy = the percentage of quality calls of the species correctly identified.

Species                                     Quality calls (>20 dB)   True positives   False positives   False negatives   Accuracy
Cyanocitta cristata (Blue Jay)              180                      71               29                109               39%
Poecile carolinensis (Carolina Chickadee)   54                       11               61                43                20%
Thryothorus ludovicianus (Carolina Wren)    83                       50               13                33                59%

Table 3. Time required to identify 3 winter bird species from 10-h field recordings in Millersville, PA, using manual identification and the Song Scope automated detection recognition software available from Wildlife Acoustics. The amount of time is variable, depending on skill level, number of calls used to make up each recognizer, and density of calls within field recordings.

Method used                                     Total time required (h)   Average time required per species (h)
Manual identification                           12.00                     4.00
Automated identification, using Song Scope with newly created recognition models
  Gather reference calls                        1.50                      0.50
  Prepare/annotate reference calls              3.00                      1.00
  Toggle recognizer settings                    4.50                      1.50
  Verification trials for recognizer            4.50                      1.50
  Song Scope screening                          0.36                      0.12
  Process Song Scope results                    1.50                      0.50
  Total                                         15.36                     5.12
Automated identification, using Song Scope with pre-made recognizer models
  Song Scope screening                          0.36                      0.12
  Process Song Scope results                    1.50                      0.50
  Total                                         1.86                      0.62

Discussion

We used percent of accurate identifications to evaluate Song Scope’s ability to make correct identifications from quality vocalizations. Song Scope was not able to isolate faint or fragmented calls over background noise; therefore, it could not classify a large percentage of targets that a trained listener could both isolate and classify. However, our definition of accuracy was the software’s ability to identify the vocalizations that it was trained to identify, not its ability to identify partial songs, interrupted songs, or faint calls just barely audible over background noise. Identifications of these cryptic vocalizations would require the development of an infinite number of models for partial songs and calls with various sources of background noise.

The software correctly identified our focal species an average of 39% of the time using quality vocalizations, and we found the identification accuracy of Song Scope was species dependent. We assume that this species-dependent accuracy is a function of the different categories of sound used by each species in their vocalizations (Brandes 2008). The Carolina Wren song has spectrogram components similar to frequency-modulated whistles with little harmonic structure (Brandes 2008), and of the 3 species tested, was most successfully identified by Song Scope (Table 2).
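As a rough check on the accuracy figures, the counts in Table 2 can be recomputed directly. The sketch below is illustrative only and is not part of the original analysis: the names TABLE_2, accuracy, and summarize are hypothetical, and the counts are copied from Table 2. Accuracy is taken, as in the table caption, to be true positives as a percentage of quality calls.

```python
# Counts copied from Table 2, in the order:
# (quality calls, true positives, false positives, false negatives).
TABLE_2 = {
    "Blue Jay": (180, 71, 29, 109),
    "Carolina Chickadee": (54, 11, 61, 43),
    "Carolina Wren": (83, 50, 13, 33),
}


def accuracy(quality_calls, true_positives):
    """Percentage of quality calls that the recognizer identified correctly."""
    return 100.0 * true_positives / quality_calls


def summarize(table):
    """Return per-species accuracy and the unweighted mean across species."""
    per_species = {}
    for species, (quality, tp, fp, fn) in table.items():
        # Each quality call was either detected (TP) or missed (FN).
        assert tp + fn == quality, f"counts do not add up for {species}"
        per_species[species] = accuracy(quality, tp)
    mean = sum(per_species.values()) / len(per_species)
    return per_species, mean


per_species, mean_accuracy = summarize(TABLE_2)
for species, pct in per_species.items():
    print(f"{species}: {pct:.1f}%")
print(f"Mean across species: {mean_accuracy:.1f}%")
```

Recomputed this way, the Blue Jay and Carolina Chickadee values round to the reported 39% and 20%. The Carolina Wren value comes out near 60% rather than the reported 59%, and the table alone does not explain that small difference.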
It has been reported that HMMs often fail to identify birds with strong harmonics such as the call of the Carolina Chickadee and especially the Blue Jay (Fig. 1; Brandes 2008). The variable accuracy of species-recognizer models creates the potential to underestimate or overestimate species presence when surveying for different vocal species because one species model may have lower accuracy in correctly identifying target calls than another (e.g., for Blue Jays or Carolina Chickadees vs. for Carolina Wrens in our study) or a higher rate of false positives (e.g., for Carolina Chickadees vs. for either Blue Jays or Carolina Wrens in our study). When comparing presence/absence and/or population density between species, we recommend that field biologists compare the accuracy of the different recognizer models they use to account for potential bias in detection rates.

Song Scope can detect diverse targets, and performs best with lower call volumes. True positives were instances when the software correctly identified targets, and false negatives were instances when the targets were missed or skipped. These false negatives occurred due to background noise and microphone limitations, a known problem with automated identification (Blumstein et al. 2011, Brandes 2008, Buxton and Jones 2012). Call identification is heavily influenced by background noise; thus, it is important to ensure that future studies incorporate multiple survey sites to account for various sources of background noise.

Buxton and Jones (2012) reported higher accuracy (56–69%) in audio-call identification using Song Scope than the 39% accuracy we report here with the same program. Buxton and Jones (2012) surveyed seabirds with simple calls on isolated islands and had a specialist from Wildlife Acoustics help develop their recognizer models.
Duan et al. (2013) recommended that an expert in audio-signal interpretation should manipulate model parameters to improve accuracy and reduce time in developing recognizers using Song Scope. Wildlife Acoustics acknowledges that patience and time are needed to create optimum parameters for recognizers (Wildlife Acoustics 2011b). Therefore, the use of acoustic-identification software is a tractable option for the average field biologist only if they are able to dedicate significant effort to developing a recognizer model. We recommend that field biologists take a training workshop on the use of sound-recognition software if they wish to develop more reliable species-specific recognizer models. The time needed to create recognizers is substantial (5.12 h per species; Table 3); however, when a good recognizer model is developed, it can be used repeatedly for multiple survey efforts (Holmes et al. 2014), making the method very time-efficient, even considering the initial time investment in model development.

Waddle et al. (2009) suggested that biologists develop a database to archive accurate recognizers developed by experts utilizing programs such as Song Scope. Researchers could benefit from a robust online library full of recognizer-like files created by experts using standard software parameters. After creating species-specific recognizer models, we found that running these recognizers using Song Scope’s batch feature can be done quickly. With the use of developed recognizers, Song Scope was able to screen and process results for each species in an average of 37 min (Table 3). Comparatively, manual screenings took 4 h per species. An online database of recognizers would make automated identification simpler and faster. A shared database of recognizer models for sound-recognition software should be a multi-regional collaborative effort and include many hours of recordings.
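The trade-off between the one-time cost of building a recognizer and the recurring cost of screening can be made concrete with the per-species hours in Table 3. The following is a minimal sketch rather than an analysis from the study. It assumes that recognizer development (gathering, annotating, tuning, and verifying) is paid once per species, that each further 10-h recording costs the same screening and processing time as the one reported, and that manual review time scales linearly. The function and constant names are hypothetical.

```python
import math

# Hours per species, taken from Table 3.
MANUAL_PER_RECORDING = 4.00               # manual review of one 10-h recording
SETUP_ONCE = 0.50 + 1.00 + 1.50 + 1.50    # gather, annotate, tune, verify
AUTOMATED_PER_RECORDING = 0.12 + 0.50     # Song Scope screening + result checks


def manual_hours(n_recordings):
    """Total manual effort, assuming time scales linearly with recordings."""
    return MANUAL_PER_RECORDING * n_recordings


def automated_hours(n_recordings):
    """One-time recognizer development plus per-recording screening time."""
    return SETUP_ONCE + AUTOMATED_PER_RECORDING * n_recordings


def break_even_recordings():
    """Smallest whole number of recordings at which automation is faster."""
    saving_per_recording = MANUAL_PER_RECORDING - AUTOMATED_PER_RECORDING
    return math.floor(SETUP_ONCE / saving_per_recording) + 1


for n in (1, 2, 5):
    print(f"{n} recording(s): manual {manual_hours(n):.2f} h, "
          f"automated {automated_hours(n):.2f} h")
print("Break-even at", break_even_recordings(), "recordings per species")
```

Under these assumptions, the first recording costs more with automation (the 5.12 h per species in Table 3 versus 4 h manually), and the recognizer pays for itself from the second 10-h recording onward.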
In addition, the database should cover the needs of users working with various sound-recognition software programs; thus, it should archive different types of “recognizer”-like files and be composed of a diverse library of vocalization files. Once this database is created, researchers intending to survey for a diversity of species with automated software and equipment will save hours of time because they will not have to develop their own identifiers. As the use of remote recording devices and call-identification software increases, the community of researchers who employ the technology will grow. This trend will create a demand for the development of an online database to provide species-specific recognizer models for different sound-recognition software packages.

Song Scope has been successfully used for presence/absence surveys of vocal species (Buxton and Jones 2012, Holmes et al. 2014) and performs best when background noise is minimal. However, biologists must consider the accuracy of recognizer models. When using multiple species-recognizer models, field biologists should be aware that a model with low detection ability may bias survey results and under-represent the presence or density of species in the field in comparison to species for which available models are more accurate.

Acknowledgments

We acknowledge Millersville University and the Millersville University Student Grant for Independent Research for funding this project. We thank B. Horton for reviewing this manuscript.

Literature Cited

Acevedo, M.A., and L.J. Villanueva-Rivera. 2006. Using automated digital-recording systems as effective tools for the monitoring of birds and amphibians. Wildlife Society Bulletin 34:211–214.
Bas, Y., V. Devictor, J.P. Moussus, and F. Jiguet. 2008. Accounting for weather and time-of-day parameters when analyzing count data from monitoring programs. Biodiversity and Conservation 17:3403–3416.
Blumstein, D.T., D.J. Mennill, P. Clemins, L. Girod, K. Yao, G. Patricelli, J.L. Deppe, A.H. Krakauer, C. Clark, K.A. Cortopassi, S.F. Hanser, B. McCowan, A.M. Ali, and A. Kirschel. 2011. Acoustic monitoring in terrestrial environments using microphone arrays: Applications, technological considerations, and prospectus. Journal of Applied Ecology 48:758–767.
Brandes, T.S. 2008. Automated sound recording and analysis techniques for bird surveys and conservation. Bird Conservation International 18:163–173.
Bridges, A., and M. Dorcas. 2000. Temporal variation in anuran calling behavior: Implications for surveys and monitoring programs. Copeia 2000:587–592.
Buxton, R.T., and I.L. Jones. 2012. Measuring nocturnal seabird activity and status using acoustic recording devices: Applications for island restoration. Journal of Field Ornithology 83:47–60.
Diefenbach, D., M. Marshall, and J. Mattice. 2007. Incorporating availability for detection in estimates of bird abundance. Auk 124:96–106.
Duan, S., J. Zhang, P. Roe, J. Wimmer, X. Dong, A. Truskinger, and M. Towsey. 2013. Timed probabilistic automaton: A bridge between Raven and Song Scope for automatic species recognition. Pp. 1519–1524, In H. Muñoz-Avila and D.J. Stracuzzi (Eds.). Proceedings of the 25th Innovative Applications of Artificial Intelligence Conference, Bellevue, WA.
Holmes, S.B., K.A. McIlwrick, and L.A. Venier. 2014. Using automated sound-recording and analysis to detect bird species at risk in southwestern Ontario woodlands. Wildlife Society Bulletin 38:591–598.
Kirschel, A.N., D.A. Earl, Y. Yao, I.A. Escobar, E. Vilches, E.E. Vallejo, and C.E. Taylor. 2009. Using songs to identify individual Mexican Antthrush, Formicarius moniliger: Comparison of four classification methods. Bioacoustics 19:1–20.
Kogan, J., and D. Margoliash. 1998. Automated recognition of bird-song elements from continuous recordings using dynamic time warping and Hidden Markov models: A comparative study. Journal of the Acoustical Society of America 103:2185–2196.
Kroodsma, D., and G.F. Budney. 2011. Sound Recordings: An essential tool for conservation. Conservation Biology 25:851–852.
Lopes, M.T., L.L. Gioppo, T.T. Higushi, C.A. Kaestner, C.N. Silla, and A.L. Koerich. 2011. Automatic bird-species identification for a large number of species. Pp. 117–122, In Proceedings of the 2010 IEEE International Symposium on Multimedia, Taichung, Taiwan.
Swiston, K.A., and D.J. Mennill. 2009. Comparison of manual and automated methods for identifying target sounds in audio recordings of Pileated, Pale-billed, and putative Ivory-billed Woodpeckers. Journal of Field Ornithology 80:42–50.
Thayer Birding Software. 2012. Thayer’s Birds of North America, Version 4.5 (DVD). Thayer Birding Software, Naples, FL. Available online at http://www.thayerbirding.com/. Accessed 15 September 2013.
Trifa, V.M., A.N. Kirschel, C.E. Taylor, and E.E. Vallejo. 2008. Automated species-recognition of antbirds in a Mexican rainforest using Hidden Markov models. The Journal of the Acoustical Society of America 123:2424–2431.
Venier, L.A., S.B. Holmes, G.W. Holborn, K.A. McIlwrick, and G. Brown. 2012. Evaluation of an automated recording-device for monitoring forest birds. Wildlife Society Bulletin 36:30–39.
Waddle, J.H., T.F. Thigpen, and B.M. Glorioso. 2009. Efficacy of automatic vocalization-recognition software for anuran monitoring. Herpetological Conservation and Biology 4:384–388.
Wildlife Acoustics. 2011a. Song Scope: Bioacoustics Software, Version 4.1.1. Wildlife Acoustics, Inc. Maynard, MA. Available online at product/analysis-software. Accessed 15 September 2013.
Wildlife Acoustics. 2011b. Song Scope User Manual: Bioacoustics Software (Version 4.0) Documentation. USA: Wildlife Acoustics, Inc. Maynard, MA. Available online at http:// Accessed 15 September 2013.