UW LLM

Some of the projects that I have worked on or am currently working on:

Code DPO: Leading a thesis paper on applying direct preference optimization (DPO) to build a new reward model for code generation tasks. Fine-tuned a pairwise reward model architecture with DeepSpeed and LoRA on a newly curated preference dataset of 250,000+ entries for intent alignment. Ran inference with 50+ pre-trained LLMs on 10+ datasets using Hugging Face Transformers and vLLM.
MFuyu: We explored LLMs’ ability to reason about multiple images. We created a new multi-image benchmark dataset from over 17 existing datasets, then fine-tuned the Fuyu model into a custom model capable of complex reasoning over multiple images.