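In short

When building applications on top of large language models (LLMs), you often need a search engine API to retrieve content from the internet, including but not limited to news and academic papers. SearXNG offers a free solution:

  • Simple to deploy locally
  • Supports the mainstream search engines
  • Supports dedicated news and paper searches

Below is a minimal setup walkthrough.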

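Setup

Requirements

  • Operating system: Linux
  • Required software: docker, docker-compose, nginx
  • To expose the instance on the internet, you also need a dedicated public IP address and a domain name

Steps

1. Get the SearXNG project directory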

git clone https://github.com/searxng/searxng-docker.git
cd searxng-docker

# Replace the placeholder secret key with a random one
sed -i "s|ultrasecretkey|$(openssl rand -hex 32)|g" searxng/settings.yml 
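
For readers without sed or openssl at hand, the same key rotation can be sketched in Python. This is only an illustration: replace_secret and rotate_file are hypothetical helper names, and the sketch assumes the placeholder string in searxng/settings.yml is still ultrasecretkey, as in the sed command above.

```python
import secrets
from pathlib import Path
from typing import Optional

PLACEHOLDER = "ultrasecretkey"  # placeholder targeted by the sed command above


def replace_secret(text: str, key: Optional[str] = None) -> str:
    """Return text with every placeholder replaced by a hex key."""
    # 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`
    key = key or secrets.token_hex(32)
    return text.replace(PLACEHOLDER, key)


def rotate_file(path: str = "searxng/settings.yml") -> None:
    """Rewrite the settings file in place (run from inside searxng-docker)."""
    settings = Path(path)
    settings.write_text(
        replace_secret(settings.read_text(encoding="utf-8")), encoding="utf-8"
    )
```

Calling rotate_file() once from the searxng-docker directory should have the same effect as the sed one-liner.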

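2. Configure SearXNG

The configuration below may differ from what other guides describe.

2.1 Edit docker-compose.yaml

Because nginx will act as the reverse proxy, I removed all of the caddy-related configuration.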

services:
  redis:
    container_name: redis
    image: docker.io/valkey/valkey:8-alpine
    command: valkey-server --save 30 1 --loglevel warning
    restart: unless-stopped
    networks:
      - searxng
    volumes:
      - valkey-data2:/data
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

  searxng:
    container_name: searxng
    image: docker.io/searxng/searxng:latest
    restart: unless-stopped
    networks:
      - searxng
    ports:
      - 127.0.0.1:52679:8080 # bound to loopback only; the port is not exposed externally
    volumes:
      - ./searxng:/etc/searxng:rw
      - searxng-data:/var/cache/searxng:rw
    environment:
      # - SEARXNG_BASE_URL=https://${SEARXNG_HOSTNAME:-localhost}/
      - SEARXNG_BASE_URL=https://searxng.grassyiyi.com/ # put your own domain here
    logging:
      driver: "json-file"
      options:
        max-size: "1m"
        max-file: "1"

networks:
  searxng:

volumes:
  # caddy-data:
  # caddy-config:
  valkey-data2:
  searxng-data:

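2.2 Edit searxng/settings.yml

This file needs little or no change. The one thing I added is JSON as an output format alongside HTML.
I have not rigorously tested whether API calls work without this change. Since the default settings list only html under search.formats, a format=json request is likely to be rejected unless json is added, so keep the block at the end if you plan to call the API.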

# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: "4dada664c40aa445d8f3567d914fd077bf43ed224135903618c3ca3655c50b72"  # change this!
  limiter: false  # enable this when running the instance for a public usage on the internet
  image_proxy: true
redis:
  url: redis://redis:6379/0

# Everything below this line is what I added
search:
  formats:
    - html
    - json
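
Once the container is running (step 2.3), a quick way to confirm that the json format is really enabled is to send one format=json request and look at the HTTP status. The sketch below uses only the standard library; explain_status and probe_json are hypothetical helper names, and the mapping of 403 to a missing format and 429 to the limiter reflects my understanding of SearXNG's behaviour rather than anything tested in this post.

```python
import urllib.error
import urllib.parse
import urllib.request


def explain_status(status: int) -> str:
    """Map the HTTP status of a format=json request to a likely cause."""
    if status == 200:
        return "json format is enabled"
    if status == 403:
        return "json is probably missing from search.formats"
    if status == 429:
        return "request was rate limited (is the limiter enabled?)"
    return f"unexpected status {status}"


def probe_json(base_url: str, query: str = "test", timeout: float = 10.0) -> str:
    """Send one format=json search and describe the outcome."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    url = base_url.rstrip("/") + "/search?" + params
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return explain_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still available
        return explain_status(err.code)
```

For the setup in this post, probe_json("http://127.0.0.1:52679") would be the call to try on the server itself.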

2.3 Start the containers

If the configuration is correct, the containers should start normally:

docker-compose up -d
docker-compose logs -f

A healthy startup log looks like this:

searxng  | SearXNG 2025.8.20-41a4a3e
searxng  | [INFO] Starting granian (main PID: 1)
searxng  | [INFO] Listening at: http://:::8080
searxng  | [INFO] Spawning worker-1 with PID: 11
searxng  | /usr/local/searxng/searx/valkeydb.py:42: DeprecationWarning: setting redis.url is deprecated, use valkey.url
searxng  |   warnings.warn("setting redis.url is deprecated, use valkey.url", DeprecationWarning)
searxng  | [INFO] Started worker-1
searxng  | [INFO] Started worker-1 runtime-1
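
Besides reading the log, you can check that the published port actually accepts connections. A minimal sketch, assuming the 127.0.0.1:52679 mapping from docker-compose.yaml above; port_open is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False
```

On the server, port_open("127.0.0.1", 52679) should return True once the searxng container is up. It only proves that something is listening, not that searches work.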

3. Configure nginx

3.1 Edit the configuration file

With a default nginx setup, create a new file named searxng.conf under nginx/conf.d:

server {
    listen 80;
    listen 443 ssl;

    server_name searxng.grassyiyi.com  searxng.m2e.me; 

    ssl_certificate /etc/nginx/ssl/searxng.grassyiyi.com/fullchain.cer; # path to the certificate (full chain)
    ssl_certificate_key /etc/nginx/ssl/searxng.grassyiyi.com/searxng.grassyiyi.com.key; # path to the private key
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    root /usr/share/nginx/html/ ;

    location ~ /.well-known {
        allow all;
    }

    location / {
        proxy_pass http://127.0.0.1:52679; # must match the host port published for SearXNG in docker-compose.yaml
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    access_log /var/log/nginx/searxng-access.log main;
    error_log /var/log/nginx/searxng-error.log warn;
    
}

Then reload the configuration:

nginx -s reload

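Note: until the SSL certificate has been issued, remove the SSL-related directives (the listen 443 ssl line and every ssl_* line) first. Otherwise nginx fails to load the configuration because the certificate files do not exist yet.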
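
One way to produce that temporary HTTP-only configuration without hand-editing is to filter the file. The sketch below is illustrative: strip_ssl_directives is a hypothetical helper, and it assumes each directive sits on its own line, as in the listing above.

```python
def strip_ssl_directives(conf: str) -> str:
    """Drop `listen 443 ...` and `ssl_*` lines from an nginx server block."""
    kept = []
    for line in conf.splitlines():
        stripped = line.strip()
        if stripped.startswith("ssl_") or stripped.startswith("listen 443"):
            continue  # SSL-only directive: not usable before the certificate exists
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Keep the original file around, since the SSL lines have to be restored after the certificate is issued in step 3.2.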

3.2 Request an SSL certificate

I used acme.sh to obtain the certificate. Install acme.sh first:

# See https://github.com/acmesh-official/get.acme.sh
apt-get update
apt-get install -y curl wget cron socat
curl https://get.acme.sh | sh -s email=you@example.com # placeholder address; replace it with your own email

Then request the certificate with acme.sh:

mkdir -p /etc/nginx/ssl/searxng.grassyiyi.com/ # the directory must be created manually, otherwise the command fails

/root/.acme.sh/acme.sh --issue \
    -d searxng.grassyiyi.com \
        --webroot /usr/share/nginx/html \
      --cert-file /etc/nginx/ssl/searxng.grassyiyi.com/searxng.grassyiyi.com.cer \
       --key-file /etc/nginx/ssl/searxng.grassyiyi.com/searxng.grassyiyi.com.key \
 --fullchain-file /etc/nginx/ssl/searxng.grassyiyi.com/fullchain.cer \
      --reloadcmd "nginx -s reload" # reload nginx whenever the certificate is renewed automatically

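Once the certificate has been issued, remember to restore the SSL directives in the nginx configuration, then run nginx -s reload.

If all went well, the search engine is now reachable at https://searxng.grassyiyi.com/.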
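
To confirm that nginx is serving the new certificate, and to keep an eye on renewals, the expiry date can be read over TLS. A minimal standard-library sketch; fetch_not_after and days_remaining are hypothetical helper names, and the hostname in the usage note is the one used throughout this post.

```python
import socket
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter timestamp (e.g. 'Jan  5 09:34:43 2018 GMT')."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_remaining(fetch_not_after("searxng.grassyiyi.com")) gives the number of days left. If the handshake raises a verification error, nginx is probably not serving the full chain.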

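Calling the API

SearXNG accepts both GET and POST requests. The request parameters are described in the API parameter reference of the official documentation.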
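
Since GET is accepted as well, a search is just a URL with the parameters in the query string. A small sketch of how such a URL could be assembled; build_search_url is a hypothetical helper and the parameter values are examples only.

```python
from typing import Dict
from urllib.parse import urlencode


def build_search_url(base_url: str, params: Dict[str, str]) -> str:
    """Return the GET URL for a SearXNG search with the given parameters."""
    # urlencode percent-encodes values and turns spaces into '+'
    return base_url.rstrip("/") + "/search?" + urlencode(params)


url = build_search_url(
    "https://searxng.grassyiyi.com/",
    {"q": "openai", "engines": "duckduckgo news", "format": "json"},
)
print(url)
```

The resulting URL can be pasted into a browser or passed to curl, which is handy when debugging parameters before wiring them into an application.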

Using python3 and requests

This part is straightforward, so here is the code.

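import json

import requests

url = "https://searxng.grassyiyi.com/search"

payload = {
    "q": "openai",                 # search query
    "language": "en",              # result language
    "time_range": "day",           # past day only
    "safesearch": "0",             # 0 = no filtering
    "engines": "duckduckgo news",  # comma-separated engine names
    "categories": "news",          # comma-separated categories (the parameter name is plural)
    "format": "json",              # output format; json must be enabled in search.formats
}

headers = {
    "Content-Type": "application/x-www-form-urlencoded"
}

# A timeout keeps the script from hanging if the instance is unreachable
response = requests.post(url, data=payload, headers=headers, timeout=30)

# Raise an exception on 4xx/5xx responses
response.raise_for_status()

# Save the JSON response
with open("news.json", "w", encoding="utf-8") as f:
    json.dump(response.json(), f, indent=2, ensure_ascii=False)

print("Results saved to news.json")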

The JSON output looks like this:

{
  "query": "openai",
  "number_of_results": 0,
  "results": [
    {
      "url": "https://www.businessinsider.com/how-to-stay-competitive-ai-era-openai-cfo-sarah-friar-2025-8",
      "title": "OpenAI CFO says these 3 things will help your company stay competitive in the AI era",
      "content": "OpenAI CFO Sarah Friar outlined three key strategies companies should focus on to build a \"competitive moat\" in the AI era.",
      "source": "Insider",
      "publishedDate": "2025-08-20T17:52:00",
      "engine": "duckduckgo news",
      "template": "default.html",
      "parsed_url": [
        "https",
        "www.businessinsider.com",
        "/how-to-stay-competitive-ai-era-openai-cfo-sarah-friar-2025-8",
        "",
        "",
        ""
      ],
      "img_src": "",
      "thumbnail": "",
      "priority": "",
      "engines": [
        "duckduckgo news"
      ],
      "positions": [
        1
      ],
      "score": 1.0,
      "category": "news",
      "pubdate": "2025-08-20 17:52:00"
    },
(remainder omitted)
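
For LLM use, most of these fields are noise. Usually the title, URL, snippet, and date are enough. The sketch below shows one possible way to flatten the response into short lines for a prompt; summarize_results is a hypothetical helper, and the field names are the ones visible in the sample output above.

```python
from typing import Any, Dict, List


def summarize_results(data: Dict[str, Any], limit: int = 5) -> List[str]:
    """Turn a SearXNG JSON response into one compact line per result."""
    lines = []
    for item in data.get("results", [])[:limit]:
        # publishedDate may be absent or null; keep only the YYYY-MM-DD part
        date = (item.get("publishedDate") or "")[:10]
        title = item.get("title", "")
        url = item.get("url", "")
        snippet = item.get("content", "")
        lines.append(f"[{date}] {title} ({url}): {snippet}")
    return lines
```

With the news.json file written by the script above, summarize_results(json.load(open("news.json", encoding="utf-8"))) yields lines that can be joined and placed in the model's context.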

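Afterword

  1. SearXNG has plenty of shortcomings. The most obvious is that result quality is noticeably worse than that of the mainstream search engines. Still, it is usable.
  2. The API parameter reference in the official documentation says time_range supports only three values: day, month, and year. I read the relevant code, and judging from the time_range_map variable, time_range should also accept week. (Not rigorously tested.)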
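
The week claim is easy to check empirically by sending the same query once per time_range value and comparing result counts. The sketch below is a starting point only: time_range_payloads and count_results are hypothetical helpers, week is the unverified value, and differing counts would be suggestive rather than conclusive, because engines may silently ignore a value they do not understand.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

DOCUMENTED = ("day", "month", "year")  # values listed in the official documentation
CANDIDATES = DOCUMENTED + ("week",)    # "week" is the unverified value


def time_range_payloads(query: str) -> List[Dict[str, str]]:
    """Build one request payload per candidate time_range value."""
    return [{"q": query, "time_range": tr, "format": "json"} for tr in CANDIDATES]


def count_results(search_url: str, payload: Dict[str, str], timeout: float = 30.0) -> int:
    """POST one search and return the number of results in the JSON response."""
    body = urllib.parse.urlencode(payload).encode()
    with urllib.request.urlopen(search_url, data=body, timeout=timeout) as resp:
        return len(json.load(resp)["results"])
```

Looping over time_range_payloads("openai") and calling count_results("https://searxng.grassyiyi.com/search", payload) for each gives four counts to compare. A week count that falls between the day and month counts would support the reading of time_range_map.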
