
llama.cpp is a framework for LLM (large language model) and LMM (large multimodal model) inference on consumer hardware such as laptops. It runs models stored in the GGUF format.

GBNF

GBNF (GGML BNF) is llama.cpp's formal grammar format for constraining a model's generation to a particular shape.
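To give a flavor of the format, here is a small hand-written grammar (an illustrative sketch, not from the generator) that only permits a JSON object whose "answer" field is "yes" or "no":

```
root   ::= "{" ws "\"answer\":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
```

The model can only emit token sequences that match the `root` rule, so free-form text is impossible.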

There is a generator that converts pydantic models to GBNF here.

I abused the enum type to force specific combinations of field values and data structures. For example, BioASQ has different question types whose answers are formatted differently, and we want to constrain the model to generate the correctly shaped response for each question type.

from enum import Enum
from typing import List

from pydantic import BaseModel, Field

# Single-valued enums pin the "type" field to one exact string, which
# ties each type string to a specific answer structure in the grammar.
class YesNoTypeEnum(Enum):
    yesno = "yesno"

class FactoidOrList(Enum):
    factoid = "factoid"
    list = "list"

class SummaryEnum(Enum):
    summary = "summary"

class YesOrNoEnum(Enum):
    yes = "yes"
    no = "no"

class YesNoAnswer(BaseModel):
    type: YesNoTypeEnum = Field(default=YesNoTypeEnum.yesno)
    ideal_answer: List[str] = Field()
    exact_answer: YesOrNoEnum = Field()

class FactoidOrListAnswer(BaseModel):
    type: FactoidOrList = Field(default=FactoidOrList.factoid)
    ideal_answer: List[str] = Field()
    exact_answer: List[List[str]] = Field()

class SummaryAnswer(BaseModel):
    type: SummaryEnum = Field(default=SummaryEnum.summary)
    ideal_answer: List[str] = Field()

# We need no additional parameters other than our list of pydantic models.
# (generate_gbnf_grammar_and_documentation comes from the pydantic-to-GBNF
# converter linked above.)
gbnf_grammar, documentation = generate_gbnf_grammar_and_documentation(
    [YesNoAnswer, FactoidOrListAnswer, SummaryAnswer]
)

This forces the GBNF converter to pair each type string with a particular ideal_answer and exact_answer format.
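Because the grammar guarantees the output matches one of the three shapes, the generated JSON can be validated back into the corresponding pydantic model by dispatching on the type field. A minimal sketch (models redeclared with plain-string type fields so the snippet stays self-contained; the JSON string stands in for actual model output):

```python
import json
from enum import Enum
from typing import List
from pydantic import BaseModel, Field

class YesOrNoEnum(str, Enum):
    yes = "yes"
    no = "no"

class YesNoAnswer(BaseModel):
    type: str = Field(default="yesno")
    ideal_answer: List[str]
    exact_answer: YesOrNoEnum

class FactoidOrListAnswer(BaseModel):
    type: str = Field(default="factoid")
    ideal_answer: List[str]
    exact_answer: List[List[str]]

class SummaryAnswer(BaseModel):
    type: str = Field(default="summary")
    ideal_answer: List[str]

# Map each constrained "type" string to the model it belongs to.
MODEL_BY_TYPE = {
    "yesno": YesNoAnswer,
    "factoid": FactoidOrListAnswer,
    "list": FactoidOrListAnswer,
    "summary": SummaryAnswer,
}

def parse_answer(raw: str) -> BaseModel:
    """Validate grammar-constrained model output into the right pydantic model."""
    data = json.loads(raw)
    return MODEL_BY_TYPE[data["type"]](**data)

out = parse_answer('{"type": "yesno", "ideal_answer": ["Yes, it does."], "exact_answer": "yes"}')
```

Since the grammar rules out any other type value, the dictionary lookup cannot fail on well-formed output.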