University of Waterloo
LLM Researcher
Nov 2023 - Ongoing, Waterloo, Canada
Some of the projects I have worked on or am currently working on:
- Code DPO: Leading a thesis paper on applying direct preference optimization (DPO) to
  create a new reward model for code generation tasks. Built the reward model by
  fine-tuning a pairwise reward model architecture with DeepSpeed and LoRA on a newly
  curated preference dataset of 250,000+ entries for intent alignment. Ran inference with
  50+ pre-trained LLMs on 10+ datasets using Hugging Face Transformers and vLLM.
- MFuyu: Explored LLMs’ ability to reason about multiple images. Built a new multi-image
  benchmark dataset from 17+ existing datasets, then fine-tuned the Fuyu model to produce
  a custom model capable of complex reasoning over multiple images.