Named Entity Resolution with dslim/distilbert-NER
I was exploring the original BERT model from 2018, which is mainly useful if you fine-tune a model on top of it for a specific task.
dslim/distilbert-NER by David S. Lim is a popular implementation of this, with around 20,000 downloads from Hugging Face every month.
I tried the demo from the README but it didn’t quite work: it complained about an incompatibility with NumPy 2.0.
So I used `uv run --with 'numpy<2.0'` to run it in a temporary virtual environment. Here’s a Bash one-liner that demonstrates the model:
```bash
uv run --with 'numpy<2.0' --with transformers python -c '
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
import json

model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

text = "This is an example sentence about Simon Willison who lives in Half Moon Bay"

print(json.dumps(nlp(text), indent=2, default=repr))'
```
The first time you run this it will download 250MB to your ~/.cache/huggingface/hub/models--dslim--distilbert-NER folder.
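If you want to fetch those weights ahead of time rather than pausing on first run, the huggingface-cli command that ships with the huggingface_hub package can populate that same cache. A minimal sketch (my addition, not from the model's README), in the same uv style:

```bash
# Pre-download the model into ~/.cache/huggingface/hub
# so the demo one-liner starts immediately
uv run --with huggingface_hub huggingface-cli download dslim/distilbert-NER
```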
Example output:
```json
[
  {
    "entity": "B-PER",
    "score": "0.9982101",
    "index": 7,
    "word": "Simon",
    "start": 34,
    "end": 39
  },
  {
    "entity": "I-PER",
    "score": "0.99835676",
    "index": 8,
    "word": "Willis",
    "start": 40,
    "end": 46
  },
  {
    "entity": "I-PER",
    "score": "0.9977602",
    "index": 9,
    "word": "##on",
    "start": 46,
    "end": 48
  },
  {
    "entity": "B-LOC",
    "score": "0.99432063",
    "index": 13,
    "word": "Half",
    "start": 62,
    "end": 66
  },
  {
    "entity": "I-LOC",
    "score": "0.99325883",
    "index": 14,
    "word": "Moon",
    "start": 67,
    "end": 71
  },
  {
    "entity": "I-LOC",
    "score": "0.9919292",
    "index": 15,
    "word": "Bay",
    "start": 72,
    "end": 75
  }
]
```
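The B- and I- prefixes are BIO tags marking the beginning and inside of an entity span, and the ## prefix marks a wordpiece continuation, which is why "Willison" comes back as "Willis" plus "##on". The transformers pipeline can merge those pieces into whole entities via its aggregation_strategy option. Here's a variation on the one-liner above that does that (my addition, not part of the original demo):

```bash
uv run --with 'numpy<2.0' --with transformers python -c '
from transformers import pipeline

# Loading by name pulls both the model and the tokenizer;
# aggregation_strategy="simple" groups B-/I- wordpieces into whole entities
nlp = pipeline("ner", model="dslim/distilbert-NER", aggregation_strategy="simple")

text = "This is an example sentence about Simon Willison who lives in Half Moon Bay"

for entity in nlp(text):
    # Each grouped result has entity_group, score, word, start and end
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))'
```

With that option the output should collapse to two entities: a PER span for "Simon Willison" and a LOC span for "Half Moon Bay".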