Python

Large language models for text translation

In recent years, machine translation has come a long way. Thanks to advances in artificial intelligence and natural language processing (NLP), it’s now possible to translate text from one language to another quickly and accurately. However, traditional approaches to machine translation have their limitations. They often rely on rule-based systems or statistical models that can struggle with complex sentence structures and idiomatic expressions. That’s where generative large language models (LLMs) come in.
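
As a rough sketch of the idea, translation with a generative LLM can be as simple as prompting it through an API. The client library and model name below are placeholders for illustration, not necessarily what the post uses:

```python
# Hedged sketch: prompting a generative LLM for translation via the OpenAI
# Python client; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Translate the user's text from English to Danish."},
        {"role": "user", "content": "The weather is lovely today."},
    ],
)
print(response.choices[0].message.content)
```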

Finetuning GPT-2 for scientific text generation

Suggesting that deep learning-based models are capable of generating realistic text from a prompt would be an understatement. Ever since the advent of Transformer models, natural language processing has been undergoing a revolution. Large language models (LLMs), and generative models in general, have received public attention with the releases of text-to-image models such as Stable Diffusion and, of course, the ChatGPT chatbot. While LLMs have impressive generalized capabilities for text generation, they can be challenging to use due to their size (hundreds of millions or even billions of trainable parameters).
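
As a rough sketch of what fine-tuning GPT-2 can look like with the Hugging Face transformers library (the corpus file and hyperparameters are purely illustrative, not those used in the post):

```python
# Hedged sketch: fine-tune GPT-2 for causal language modelling on a
# (hypothetical) corpus of scientific abstracts, one document per line.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "abstracts.txt"})  # placeholder file

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-scientific", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```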

Deploy machine learning models with R Shiny and ONNX

Python is often the go-to language for machine learning, especially for training deep learning models using the PyTorch or TensorFlow libraries. Python also provides nice tools for deploying such models on the web as REST APIs or GUI web applications. However, models can also be exported to the ONNX format and subsequently used for inference with an ONNX runtime. Compared to doing inference directly with PyTorch, this is beneficial because the ONNX runtime is a much smaller dependency and is very efficient.
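
A minimal sketch of the export-and-infer workflow, assuming a PyTorch image model and the onnxruntime package (model, file names and input shape are illustrative):

```python
# Hedged sketch: export a PyTorch model to ONNX, then run inference with
# onnxruntime only (no PyTorch needed at inference time).
import numpy as np
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)  # placeholder model
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)})
print(outputs[0].shape)
```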

Plant ID app (part 2): REST API

In part 1 of this blog post, we downloaded ~25,000 images of 100 plant species and trained a deep learning classification model. The 100 plant species are included in the Danish stream plant index (DVPI). In part 2, we create a REST API with endpoints/services that can be accessed from a very simple landing page. All code from parts 1 and 2 of this blog post can be found on GitHub.
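
A minimal sketch of what such a prediction endpoint could look like with FastAPI; the framework, model file and input name are assumptions for illustration, not necessarily what the app actually uses:

```python
# Hedged sketch: an image-classification endpoint with FastAPI and onnxruntime.
import io

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
session = ort.InferenceSession("plant_model.onnx")  # hypothetical model file

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    arr = np.asarray(image.resize((224, 224)), dtype=np.float32) / 255.0
    arr = arr.transpose(2, 0, 1)[None, ...]  # HWC -> NCHW with batch dimension
    logits = session.run(None, {"input": arr})[0]  # assumes the input is named "input"
    return {"class_id": int(logits.argmax())}
```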

Plant ID app (part 1): Data and model training

Plant species can be truly difficult to tell apart, and this job often requires expert knowledge. However, when images are available, computer vision methods can be used to guide us in the right direction. Deep learning methods are very useful for image analysis, and training convolutional neural networks has become the standard way to solve a wide range of image tasks, including segmentation and classification. Here, we will train a lightweight image classification model to identify 100 different plant species.
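
A minimal transfer-learning sketch with torchvision; the backbone, folder layout and hyperparameters are illustrative rather than the exact setup used in the post:

```python
# Hedged sketch: fine-tune a lightweight pretrained backbone on 100 classes.
import torch
from torch import nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("plants/train", transform=transform)  # placeholder path
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

model = models.mobilenet_v3_small(weights="DEFAULT")  # lightweight backbone
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 100)  # 100 species

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass over the data for illustration
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```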

Parsing sonar data in Python using NumPy

Recreational-grade sonar equipment can collect vast amounts of data. Unfortunately, the data is often hidden in some kind of proprietary binary format. However, efforts in reverse engineering such formats have made it possible to extract of the information. I have spent time tracking down some this information which has resulted in a R-package as well which can read ‘.sl2’ and ‘.sl3’ file formats collected using Lowrance sonar equipment. See also the sllib Python library which fills a similar gap.
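
To illustrate the general approach (not the actual '.sl2'/'.sl3' layout, which is documented elsewhere), fixed-size binary records can be parsed very efficiently with a NumPy structured dtype:

```python
# Hedged sketch: read fixed-size binary records with a structured dtype.
# The field names, types and file name below are purely illustrative.
import numpy as np

record_dtype = np.dtype([
    ("frame_offset", "<u4"),   # little-endian unsigned 32-bit integer
    ("water_depth", "<f4"),    # little-endian 32-bit float
    ("longitude", "<i4"),
    ("latitude", "<i4"),
])

with open("sonar_log.bin", "rb") as f:       # hypothetical file name
    records = np.fromfile(f, dtype=record_dtype)

print(records["water_depth"][:10])
```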

Semantic segmentation using U-Net with PyTorch

Deep learning is here to stay and has revolutionized the way data is analyzed. Furthermore, it is straightforward to get started. Recently, I played around with the fastai library to classify fish species but wanted to go further behind the scenes and dig deeper into PyTorch. As part of another project, I have used a U-Net to perform semantic segmentation of 'pike' in images. Training has been done on Google Colab and on a local GPU-powered workstation, which is excellent for smaller experiments.
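
To illustrate the architecture, here is a deliberately tiny U-Net-style network showing the encoder/decoder structure with a skip connection; the model used in the post has more levels and filters:

```python
# Hedged sketch: a minimal U-Net-like encoder/decoder with one skip connection.
import torch
from torch import nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(3, 16)
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = double_conv(32, 16)          # 16 skip channels + 16 upsampled
        self.out = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                        # full-resolution features
        e2 = self.enc2(self.pool(e1))            # downsampled features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.out(d1)                      # per-pixel logits

mask_logits = TinyUNet()(torch.randn(1, 3, 128, 128))
print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```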

Fish species classification using deep learning and the fastai library

Deep learning is everywhere. The surge of new methods for analyzing all kinds of data is astonishing. Image analysis in particular has been impacted by deep learning, with new methods and rapid improvements in model performance for many different tasks. Convolutional neural networks (CNNs) can be used to classify images with high accuracy, and new libraries have made it easier than ever to build and train such networks. The best thing is that you do not need large amounts of data or specialized GPU hardware to experiment with techniques such as transfer learning, where we only need to fine-tune the last part of a pre-trained network.
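
A minimal sketch of transfer learning with the fastai library; the image folder is a placeholder path and the exact backbone may differ from the one used in the post:

```python
# Hedged sketch: transfer learning on a folder of labelled fish images.
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder("fish_images", valid_pct=0.2,  # placeholder path
                                   item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3)  # trains the new head first, then unfreezes the backbone
```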

Calculating wind fetch on lakes using Python

Wind traveling across water surfaces creates waves. Wave action depends on several parameters, including fetch: the unobstructed distance that wind can travel across a water surface from a given direction. Areas with high wind fetch are often exposed, although this also depends on the primary wind direction. By calculating wind fetch, we can quantify the exposure of different areas and shorelines in waterbodies.
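
As a rough sketch of one way to compute fetch on a boolean water grid (the grid, cell size and direction handling are illustrative, not necessarily the method used in the post):

```python
# Hedged sketch: for a given wind direction, walk upwind from every water cell
# and count how far contiguous water extends before hitting land or the edge.
import numpy as np

def fetch_length(water, direction_deg, cell_size=10.0, max_steps=1000):
    """Fetch in metres for each water cell, for wind blowing FROM direction_deg."""
    rows, cols = np.indices(water.shape)
    theta = np.deg2rad(direction_deg)
    drow, dcol = -np.cos(theta), np.sin(theta)   # one-cell step towards the wind
    fetch = np.zeros(water.shape)
    for step in range(1, max_steps):
        r = np.round(rows + drow * step).astype(int)
        c = np.round(cols + dcol * step).astype(int)
        inside = (r >= 0) & (r < water.shape[0]) & (c >= 0) & (c < water.shape[1])
        upwind_water = np.zeros(water.shape, dtype=bool)
        upwind_water[inside] = water[r[inside], c[inside]]
        # the path stays open only if the cell is water, the cell `step` cells
        # upwind is water, and every previous step was water too
        open_path = water & upwind_water & (fetch >= (step - 1) * cell_size)
        if not open_path.any():
            break
        fetch[open_path] = step * cell_size
    return np.where(water, fetch, np.nan)

# Toy lake: water everywhere except a strip of land along the western edge
water = np.ones((50, 80), dtype=bool)
water[:, :10] = False
print(np.nanmax(fetch_length(water, direction_deg=270)))  # wind from the west
```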

Height above nearest drainage map for Denmark

Rain, rain and more rain - 2019 was a very wet year in Denmark, and especially September and October were very rainy (DR news). The year ended with 905.2 mm, which tied the previous record from 1999. The normal amount is around 700 mm. The rain continued into 2020, and already on February 23 the previous record for February was surpassed (DR news). The extreme amounts of water caused flooding in several parts of Denmark.