Fine-Tuning an LLM on Purr-Data Source Code Examples

Fine-tuning Gemma 2b for Purr-Data: An Experiment

Recently, I decided to experiment with fine-tuning Google's Gemma 2b instruct model to generate source code for Purr-Data patches.

Purr-Data, a visual programming language for creating multimedia applications, takes a genuinely unique approach: programs are patches built by wiring graphical objects together. But it's a niche tool, so there is very little Purr-Data source code floating around online, and that scarcity makes the language hard for general-purpose language models to pick up.

Building a Dataset of Purr-Data Patch Source Code Examples

I created a dataset with the goal of evaluating how well large language models like Google's Gemma 2B can be fine-tuned for Purr-Data source code generation.
It focuses specifically on patches that output a particular message when a "bang" object is clicked.

Dataset Characteristics:

Content: Each data point consists of two parts: a natural-language prompt describing the desired patch, and the Purr-Data source code of a patch that implements it.

Focus: The dataset is restricted to examples where the patch functionality centers around printing a specific message on a bang click.
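
To make this concrete, a single data point might look roughly like the sketch below (shown as a Python dict; the field names and the bng GUI parameters are illustrative rather than the dataset's exact schema). The patch itself is ordinary Pd/Purr-Data file syntax: a bang ([bng]) wired to a message box wired to [print].

    # Hypothetical data point; field names are illustrative, so check the
    # dataset card for the actual schema.
    example = {
        "prompt": ("Write a Purr-Data patch that prints 'hello world' "
                   "to the console when the bang is clicked."),
        "code": (
            "#N canvas 0 0 450 300 10;\n"
            "#X obj 50 40 bng 15 250 50 0 empty empty empty 17 7 0 10"
            " -262144 -1 -1;\n"
            "#X msg 50 80 hello world;\n"
            "#X obj 50 120 print;\n"
            "#X connect 0 0 1 0;\n"   # bng outlet 0 -> message box inlet 0
            "#X connect 1 0 2 0;\n"   # message box outlet 0 -> print inlet 0
        ),
    }

Clicking the bng sends a bang to the message box, which sends "hello world" to [print], which writes it to the Purr-Data console.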

Link to the dataset: https://huggingface.co/datasets/ParZiVal04/Purr-Data_example_source_codes
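
For readers who want a starting point, a LoRA fine-tune of Gemma 2B on this dataset might look roughly like the sketch below, using the Hugging Face transformers, datasets, and peft libraries. The column names ("prompt" and "code") and the hyperparameters are illustrative rather than the exact settings from my runs, so check the dataset card and adjust.

    # Rough sketch of a LoRA fine-tune of Gemma 2B instruct on the dataset above.
    # Column names and hyperparameters are illustrative, not the exact settings
    # used in this experiment.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_id = "google/gemma-2b-it"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id,
                                                 torch_dtype=torch.bfloat16)

    # LoRA adapters keep the fine-tune affordable on a single GPU.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    dataset = load_dataset("ParZiVal04/Purr-Data_example_source_codes",
                           split="train")

    def tokenize(example):
        # Wrap each (prompt, patch source) pair in Gemma's chat-turn format.
        text = (f"<start_of_turn>user\n{example['prompt']}<end_of_turn>\n"
                f"<start_of_turn>model\n{example['code']}<end_of_turn>"
                f"{tokenizer.eos_token}")
        return tokenizer(text, truncation=True, max_length=1024)

    tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gemma-2b-purr-data",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=4,
                               num_train_epochs=3,
                               learning_rate=2e-4,
                               logging_steps=10),
        train_dataset=tokenized,
        # mlm=False makes the collator emit causal-LM labels (a copy of
        # input_ids with padding masked out).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()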

Video Demo


View on YouTube if the player above doesn't work.

A Proof of Concept for Niche Languages

This experiment showed that fine-tuning a large language model can be a viable approach for working with niche visual languages like Purr-Data. It's a small step, but one that paves the way for further exploration.

The Future

There's still a lot to explore. I'd love to expand the dataset to include more complex Purr-Data patches and see how the model performs. Ideally, I'd also like human programmers to evaluate the quality of the code the model generates.