Token Categories and Filtering¶
What Are Token Categories?¶
When kernpy parses a **kern file, every musical element becomes a token with a semantic category. This categorization system allows you to selectively export, analyze, or filter specific types of musical information.
For example, a single note might contain information about:
- Its pitch (C4, D#5, etc.)
- Its duration (quarter note, dotted eighth, etc.)
- Articulations and decorations (staccato, accent, etc.)
- Performance instructions (pizzicato, tremolo, etc.)
Token categories help you work with each type of information independently.
The Category Hierarchy¶
Token categories are organized in a strict hierarchy. Here's the main structure:
STRUCTURAL
├── HEADER
└── SPINE_OPERATION
CORE (Musical notes and rests)
├── NOTE_REST
│ ├── NOTE
│ │ ├── PITCH
│ │ ├── DECORATION
│ │ └── ALTERATION
│ ├── REST
│ └── DURATION
├── CHORD
├── EMPTY
└── ERROR
SIGNATURES (Clefs, time/key signatures)
├── CLEF
├── TIME_SIGNATURE
├── METER_SYMBOL
├── KEY_SIGNATURE
└── KEY_TOKEN
ENGRAVED_SYMBOLS
OTHER_CONTEXTUAL
BARLINES
COMMENTS
├── FIELD_COMMENTS
└── LINE_COMMENTS
DYNAMICS (Dynamic markings)
HARMONY (Harmonic analysis)
FINGERING (Fingering instructions)
LYRICS (Text/lyrics)
INSTRUMENTS (Instrument declarations)
IMAGE_ANNOTATIONS
├── BOUNDING_BOXES
└── LINE_BREAK
OTHER
MHXM (Scale degree notation)
ROOT (Root movement analysis)
Common Categories Explained¶
CORE¶
The main musical content:
NOTE_REST— Notes and restsNOTE— Pitches, durations, articulationsPITCH— Pitch information (c, d, e, etc.)DECORATION— Articulations (staccato, accent, etc.)ALTERATION— Accidentals (#, -, ##, --)
REST— SilenceDURATION— Note lengths (4, 8, 16, dotted notes, etc.)CHORD— Multiple simultaneous pitchesEMPTY— Placeholders or null interpretationsERROR— Parsing errors
SIGNATURES¶
Score-level information:
CLEF— Clef declarations (clefG2, clefF4, etc.)TIME_SIGNATURE— Numerator/denominator of time signatureMETER_SYMBOL— Common time (C), cut time (C/), etc.KEY_SIGNATURE— Key signature (*k[f#c#g#])KEY_TOKEN— Key center declarations
DYNAMICS¶
Dynamic markings: p, f, ff, ppp, crescendos, diminuendos, etc.
HARMONY¶
Harmonic analysis symbols: Roman numerals, scale degrees, functional labels.
LYRICS & ANNOTATIONS¶
LYRICS— Text underlay (words, syllables)FINGERING— Fingering numbers or positionsINSTRUMENTS— Instrument names and transpositionsBOUNDING_BOXES— Coordinates for image-based scoresIMAGE_ANNOTATIONS— Metadata from optical music recognition
Using Categories to Filter Exports¶
Include Specific Categories¶
Export only certain types of information:
import kernpy as kp
doc, _ = kp.load('score.krn')
# Export only pitches and durations
kp.dump(doc, 'pitches_only.krn',
include={
kp.TokenCategory.PITCH,
kp.TokenCategory.DURATION,
kp.TokenCategory.BARLINES
})
# Export only with clefs and key signatures
kp.dump(doc, 'signatures_only.krn',
include={
kp.TokenCategory.SIGNATURES,
kp.TokenCategory.BARLINES
})
Exclude Specific Categories¶
Export everything except certain types:
import kernpy as kp
doc, _ = kp.load('score.krn')
# Export without decorations (articulations)
kp.dump(doc, 'no_articulations.krn',
exclude={kp.TokenCategory.DECORATION})
# Export without dynamics or lyrics
kp.dump(doc, 'no_text.krn',
exclude={
kp.TokenCategory.DYNAMICS,
kp.TokenCategory.LYRICS
})
Using Pre-defined Category Sets¶
kernpy provides pre-defined sets for common use cases:
import kernpy as kp
doc, _ = kp.load('score.krn')
# Export with basic kern information only
kp.dump(doc, 'basic.krn',
include=kp.BEKERN_CATEGORIES)
# Equivalent to:
# include={STRUCTURAL, CORE, SIGNATURES, BARLINES, IMAGE_ANNOTATIONS}
Hierarchical Matching¶
The category hierarchy allows flexible filtering. When you include or exclude a parent category, it affects all children:
import kernpy as kp
doc, _ = kp.load('score.krn')
# Include CORE includes all NOTE, REST, DURATION, CHORD, etc.
kp.dump(doc, 'core_only.krn',
include={kp.TokenCategory.CORE})
# Include SIGNATURES includes CLEF, TIME_SIGNATURE, KEY_SIGNATURE, etc.
kp.dump(doc, 'signatures.krn',
include={kp.TokenCategory.SIGNATURES})
# Exclude NOTE_REST excludes PITCH, DECORATION, ALTERATION, REST, DURATION
kp.dump(doc, 'no_notes.krn',
exclude={kp.TokenCategory.NOTE_REST})
Programmatic Access to Hierarchy¶
Inspect the token category hierarchy in your code:
import kernpy as kp
# View the hierarchy as a tree
print(kp.TokenCategory.tree())
# Get all categories
all_categories = kp.TokenCategory.all()
# Get immediate children of a category
children = kp.TokenCategory.children(kp.TokenCategory.CORE)
# Check if one category is a child of another
is_child = kp.TokenCategory.is_child(
child=kp.TokenCategory.PITCH,
parent=kp.TokenCategory.NOTE
)
Best Practices¶
- Start with parent categories — Use
CORE,SIGNATURES,DYNAMICS, etc. instead of individual leaf categories when possible - Combine include and exclude — You can use both at the same time for fine-grained control:
- Understand the hierarchy before filtering — Use
print(kp.TokenCategory.tree())to see what children a category has - Test your filters — Small example files make it easy to verify filtering behavior
Next Steps¶
- See concepts/encodings.md to understand different export formats (kern, ekern, etc.)
- See advanced/custom-export.md for more advanced filtering strategies
- See guides/transform-documents.md for practical filtering examples