TechAnV Blog
Named Entity Recognition with dslim/distilbert-NER
I was exploring the original BERT model from 2018, which is mainly useful as a base that you fine-tune for a specific task.
dslim/distilbert-NER by David S. Lim is a popular example of this: a DistilBERT model (a smaller, distilled variant of BERT) fine-tuned for named entity recognition, with around 20,000 downloads from Hugging Face every month.
I tried the demo from the README, but it didn’t work: it failed with an error about incompatibility with NumPy 2.0.
So I used `uv run --with 'numpy<2.0'` to run it in a temporary virtual environment with NumPy pinned below 2.0. Here’s a Bash one-liner that demonstrates the model:
```bash
# torch provides the PyTorch backend that transformers needs to run the model
uv run --with 'numpy<2.0' --with transformers --with torch python -c '
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import json
model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
text = "This is an example sentence about Simon Willison who lives in Half Moon Bay"
# default=repr converts the NumPy float32 scores, which json cannot encode, to strings
print(json.dumps(nlp(text), indent=2, default=repr))'
```

The first time you run this it will download 250MB of model weights to your `~/.cache/huggingface/hub/models--dslim--distilbert-NER` folder.
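To check whether that download has already happened, and how much disk it is using, here is a minimal stdlib-only sketch. It assumes huggingface_hub's default cache layout: `HF_HUB_CACHE` if set, otherwise `HF_HOME/hub`, otherwise `huggingface/hub` under `XDG_CACHE_HOME` or `~/.cache`, with the repo ID's `/` replaced by `--` in the folder name. The helper names `hf_model_cache_dir()` and `dir_size_bytes()` are hypothetical; `huggingface-cli scan-cache` is the library's own tool for inspecting the cache.

```python
import os
from pathlib import Path


def hf_model_cache_dir(repo_id: str) -> Path:
    """Hypothetical helper: where huggingface_hub would cache a model repo.

    Assumes the default lookup order: HF_HUB_CACHE, then HF_HOME/hub,
    then $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    hub_cache = os.environ.get("HF_HUB_CACHE")
    if hub_cache is None:
        xdg = os.environ.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
        hf_home = os.environ.get("HF_HOME", os.path.join(xdg, "huggingface"))
        hub_cache = os.path.join(hf_home, "hub")
    # Model repo IDs map to folder names like models--owner--name
    folder = "models--" + repo_id.replace("/", "--")
    return Path(hub_cache).expanduser() / folder


def dir_size_bytes(path: Path) -> int:
    """Hypothetical helper: total size of regular files under path.

    Symlinks are skipped so that snapshot links pointing at blobs
    are not counted twice.
    """
    if not path.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


cache_dir = hf_model_cache_dir("dslim/distilbert-NER")
print(cache_dir, f"{dir_size_bytes(cache_dir) / 1e6:.0f} MB")
```

Deleting that folder is enough to force a fresh download on the next run.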
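Example output (the scores are strings because `default=repr` was needed to serialize the NumPy `float32` values, which the `json` module can’t encode on its own):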
```json
[
  {
    "entity": "B-PER",
    "score": "0.9982101",
    "index": 7,
    "word": "Simon",
    "start": 34,
    "end": 39
  },
  {
    "entity": "I-PER",
    "score": "0.99835676",
    "index": 8,
    "word": "Willis",
    "start": 40,
    "end": 46
  },
  {
    "entity": "I-PER",
    "score": "0.9977602",
    "index": 9,
    "word": "##on",
    "start": 46,
    "end": 48
  },
  {
    "entity": "B-LOC",
    "score": "0.99432063",
    "index": 13,
    "word": "Half",
    "start": 62,
    "end": 66
  },
  {
    "entity": "I-LOC",
    "score": "0.99325883",
    "index": 14,
    "word": "Moon",
    "start": 67,
    "end": 71
  },
  {
    "entity": "I-LOC",
    "score": "0.9919292",
    "index": 15,
    "word": "Bay",
    "start": 72,
    "end": 75
  }
]
```
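This output has one entry per token rather than per entity: the `B-` and `I-` prefixes mark the beginning and inside of an entity, and "Willison" comes back as the two word pieces `Willis` and `##on`. Here’s a minimal sketch, using a hypothetical `merge_entities()` helper, that joins consecutive tokens into whole entities using the `index`, `start` and `end` fields and then slices the original text. It assumes the pipeline has already dropped the `O` (non-entity) tokens, as it did above. In practice the pipeline’s own `aggregation_strategy` argument (for example `aggregation_strategy="simple"`) can do this grouping for you.

```python
# Abridged copy of the pipeline output above: scores are dropped because
# merging only needs the tags, token indexes and character offsets.
TOKENS = [
    {"entity": "B-PER", "index": 7, "word": "Simon", "start": 34, "end": 39},
    {"entity": "I-PER", "index": 8, "word": "Willis", "start": 40, "end": 46},
    {"entity": "I-PER", "index": 9, "word": "##on", "start": 46, "end": 48},
    {"entity": "B-LOC", "index": 13, "word": "Half", "start": 62, "end": 66},
    {"entity": "I-LOC", "index": 14, "word": "Moon", "start": 67, "end": 71},
    {"entity": "I-LOC", "index": 15, "word": "Bay", "start": 72, "end": 75},
]
TEXT = "This is an example sentence about Simon Willison who lives in Half Moon Bay"


def merge_entities(tokens: list[dict], text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: group B-/I- tagged tokens into (label, text) pairs."""
    spans: list[dict] = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        current = spans[-1] if spans else None
        # An I- tag extends the previous entity only if it has the same label
        # and its token index directly follows the previous one.
        if (
            prefix == "I"
            and current is not None
            and current["label"] == label
            and tok["index"] == current["last_index"] + 1
        ):
            current["end"] = tok["end"]
            current["last_index"] = tok["index"]
        else:
            spans.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
            })
    # Slicing the original text restores spaces and removes "##" markers.
    return [(s["label"], text[s["start"]:s["end"]]) for s in spans]


print(merge_entities(TOKENS, TEXT))
# [('PER', 'Simon Willison'), ('LOC', 'Half Moon Bay')]
```

Using the character offsets instead of concatenating the `word` values avoids having to strip `##` and guess where spaces belong.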