Phi-4-multimodal-instruct
Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a 128K token context length. Enhanced via SFT, DPO, and RLHF for instruction following and safety.
Context 128K