Parsing sonar data in Python using NumPy

Sep 6, 2022 5 min read Python

Recreational-grade sonar equipment can collect vast amounts of data. Unfortunately, the data is often hidden in some kind of proprietary binary format. However, efforts in reverse engineering such formats have made it possible to extract of the information. I have spent time tracking down some this information which has resulted in a R-package as well which can read ‘.sl2’ and ‘.sl3’ file formats collected using Lowrance sonar equipment. See also the sllib Python library which fills a similar gap. Nonetheless, the process of parsing the data has been slow - until now…

Update: I wrapped up some of the content in this post with some extras into the Python package sonarlight

Parsing binary data

The binary data in ‘.sl2’ and ‘.sl3’ files are stored in a simple layout. The file starts with a header of 8 bytes, followed by the data, which is stored in frames that are recorded by different sensors, e.g. down-looking or side-scan sonar. Each frame consists of a header whose size depends on the format and contains metadata associated with each reading. Then follows a binary blob with the actual readings or ‘sonar pings’.

A nice way to get an overview is to produce a data frame with all the metadata of interest including the byte positions of the associated sonar data. The challenge is to identify bytes of interest in the frame header. See some of the links in the R-package project description on this matter.

Speeding it up

Binary files can be read and parsed incrementally just fine. If memory is available, the process can be sped up by reading in all the binary data at once and then parsed rapidly. In NumPy, this can be done using custom data types (i.e. numpy.dtype) which proved to be very much faster than my previous attempts parsing GB’s files in a few seconds rather than several minutes. To sum up, the strategy to speed up file parsing is to load all binary data into memory, cut and clip the regions of interest, and finally parse it using NumPy. For convenience in subsequent handling, the parsed data is converted to a Pandas Dataframe.

The Python code is uploaded as a Github Gist and can be broken down into a few parts:

Loading libraries and defining constants:

import numpy as np
import pandas as pd
import math

#Constants
file_head_size_sl2 = 8
frame_head_size_sl2 = 144

file_head_size_sl3 = 8
frame_head_size_sl3 = 168

#NumPy custom dtypes
dtype_sl2 = np.dtype([
    ("first_byte", "<u4"),
    ("frame_size", "<u2"),
    ("survey_type", "<u2"),
    ("min_range", "<f4"),
    ("max_range", "<f4"),
    ("water_depth", "<f4"),
    ("x", "<i4"),
    ("y", "<i4"),
    ("heading", "<f4")
    ])

dtype_sl3 = np.dtype([
    ("first_byte", "<u4"),
    ("frame_size", "<u2"),
    ("survey_type", "<u2"),
    ("min_range", "<f4"),
    ("max_range", "<f4"),
    ("water_depth", "<f4"),
    ("x", "<i4"),
    ("y", "<i4"),
    ("heading", "<f4")
    ])

Define a few helper functions for coordinate conversion and reading binary data:

#Helper functions to convert units etc.
survey_dict = {0: 'primary', 1: 'secondary', 2: 'downscan', 3: 'left_sidescan', 4: 'right_sidescan', 5: 'sidescan'}

def x2lon(x):
    return(x/6356752.3142*(180/math.pi))

def y2lat(y):
    return(((2*np.arctan(np.exp(y/6356752.3142)))-(math.pi/2))*(180/math.pi))

#Funtion for reading binary data into memory
def read_bin(path):

    with open(path, "rb") as f:
        data = f.read()
        
    return(data)

The two functions used for decoding ‘.sl2’ and ‘.sl3’ binary data:

#Function for parsing binary data in '.sl2' format
def sl2_decode(data):
    
    position = file_head_size_sl2
    headers = []

    #Cut binary blob
    while (position < len(data)):
        head = data[position:(position+frame_head_size_sl2)]
        frame_size = int.from_bytes(head[28:30], "little", signed = False)
        head_sub = head[0:4]+head[28:30]+head[32:34]+head[40:48]+head[64:68]+head[108:116]+head[124:128]    
        headers.append(head_sub)
        position += frame_size
    
    #Parse binary blob using NumPy custom dtype
    frame_head_np = np.frombuffer(b''.join(headers), dtype=dtype_sl2)

    #Convert to pandas dataframe
    frame_head_df = pd.DataFrame(frame_head_np)
    
    #Convert x-y coordinates to lat/long 
    frame_head_df["longitude"] = x2lon(frame_head_df["x"])
    frame_head_df["latitude"] = y2lat(frame_head_df["y"])

    #Get survey type label
    frame_head_df["survey_label"] = [survey_dict.get(i, "Other") for i in frame_head_df["survey_type"]]

    #Convert feet to meters
    frame_head_df[["water_depth", "min_range", "max_range"]] /= 3.2808399
    
    return(frame_head_df)

#Function for parsing binary data in '.sl3' format
def sl3_decode(data):
    
    position = file_head_size_sl3
    headers = []

    #Cut binary blob
    while (position < len(data)):
        head = data[position:(position+frame_head_size_sl3)]
        frame_size = int.from_bytes(head[8:10], "little", signed = False)
        head_sub = head[0:4]+head[8:10]+head[12:14]+head[20:28]+head[48:52]+head[92:100]+head[104:108]    
        headers.append(head_sub)
        position += frame_size
    
    #Parse binary blob using NumPy custom dtype
    frame_head_np = np.frombuffer(b''.join(headers), dtype=dtype_sl3)

    #Convert to pandas dataframe
    frame_head_df = pd.DataFrame(frame_head_np)
    
    #Convert x-y coordinates to lat/long 
    frame_head_df["longitude"] = x2lon(frame_head_df["x"])
    frame_head_df["latitude"] = y2lat(frame_head_df["y"])

    #Get survey type label
    frame_head_df["survey_label"] = [survey_dict.get(i, "Other") for i in frame_head_df["survey_type"]]

    #Convert feet to meters
    frame_head_df[["water_depth", "min_range", "max_range"]] /= 3.2808399
    
    return(frame_head_df)

And finally, a function putting it all together:

def read_sl(path):

    format = path.split(".")[-1]

    if format == "sl2":
        data = read_bin(path)
        df = sl2_decode(data)

    elif format == "sl3":
        data = read_bin(path)
        df = sl3_decode(data)

    else:
        print("Only '.sl2' or '.sl3' file formats supported")
        return(-1)

    return(df)

Example

Using this code snippet is simple, e.g. in a script in the same directory as the read_sl.py file:

from read_sl import read_sl
path = 'PATH TO SONAR FILE'
df = read_sl(path) #returns Pandas Dataframe

The above Pandas Dataframe contains all frame headers in the file and some very useful metadata, e.g. it contains the byte positions of the raw sonar data. This enables the reading of specific byte sequences from the positions of interest which can then be parsed using NumPy. Below is an example of how to extract the raw ‘sonar ping’ data from the ‘primary channel’ (down-looking sonar) and plot a subset of it:

from read_sl import read_bin, read_sl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

path = 'PATH TO SONAR FILE'
df = read_sl(path)
sl2_bin_data = read_bin(path)

df_primary = df.query("survey_label == 'primary' & max_range < 60") #Subset Pandas Dataframe

#144 is the frame header size for the '.sl2' format. For '.sl3' it is 168
primary_list = [np.frombuffer(sl2_bin_data[(f+144):(f+(p-144))], dtype="uint8") for f, p in zip(df_primary["first_byte"], df_primary["frame_size"])]
primary_np = np.stack(primary_list)

plt.imshow(primary_np[30000:32000, ].transpose(), cmap="cividis")
plt.show()

Concluding remarks

That is all! Using NumPy just makes the parsing so simple using the custom numpy.dtypes’ and it is fast as well. The approach can readily be adapted for sonar file formats from other manufacturers (I have been looking into data from Humminbird equipment which appear to be straightforward) and binary file formats in general. Further optimization is possible making it a breeze to read and explore sonar data files. These types of equipment collect so much data, and for some of the channels, I still have some issues figuring out how to interpret the byte layout and data types. So in other words, there is plenty of stuff here to dive into!

visualization GIS open source software image analysis

Kenneth Thorø Martinsen

Biologist (PhD)

Research interests in data science and carbon cycling.