# Troubleshooting Guide
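Common issues and solutions for the Nextcloud Stack deployment. Unless a step says otherwise, run `docker compose` commands from `/opt/nextcloud-stack`, the directory that holds the stack's `docker-compose.yml`.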
## Table of Contents
- [DNS Issues](#dns-issues)
- [SSL Certificate Problems](#ssl-certificate-problems)
- [Docker Issues](#docker-issues)
- [LXC Container Issues](#lxc-container-issues)
- [Nextcloud Issues](#nextcloud-issues)
- [Database Connection Issues](#database-connection-issues)
- [Tailscale Issues](#tailscale-issues)
- [Port Conflicts](#port-conflicts)
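- [Permission Issues](#permission-issues)
- [Emergency Commands](#emergency-commands)
- [Getting Help](#getting-help)
- [Recovery Procedures](#recovery-procedures)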
---
## DNS Issues
### Problem: DNS records not resolving
**Symptoms:**
- Let's Encrypt fails to issue certificates
- Caddy shows certificate errors
- Services inaccessible via domain

**Diagnosis:**
```bash
dig +short cloud.yourdomain.com @8.8.8.8
```
**Solution:**
1. Ensure all required A records point to your server IP
2. Wait for DNS propagation (up to 48 hours, usually minutes)
3. Use [DNSChecker.org](https://dnschecker.org) to verify global propagation

**Required DNS Records:**
```
cloud.yourdomain.com → YOUR_SERVER_IP
office.yourdomain.com → YOUR_SERVER_IP
draw.yourdomain.com → YOUR_SERVER_IP
notes.yourdomain.com → YOUR_SERVER_IP
home.yourdomain.com → YOUR_SERVER_IP
manage.yourdomain.com → YOUR_SERVER_IP
uptime.yourdomain.com → YOUR_SERVER_IP
```
**Temporary Workaround:**
Edit `/etc/hosts` on your local machine:
```
YOUR_SERVER_IP cloud.yourdomain.com
```
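The entry above only covers `cloud`. One `/etc/hosts` line can list several names, so a small helper (hypothetical name, subdomains taken from the DNS records above) can print a single line for every service. Remove the line again once public DNS works, since the hosts file overrides it.

```bash
# Print one /etc/hosts line mapping SERVER_IP to every service subdomain.
# Usage: hosts_line YOUR_SERVER_IP yourdomain.com | sudo tee -a /etc/hosts
hosts_line() {
    local server_ip=$1 domain=$2 sub line
    line=$server_ip
    for sub in cloud office draw notes home manage uptime; do
        line+=" ${sub}.${domain}"
    done
    printf '%s\n' "$line"
}
```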
---
## SSL Certificate Problems
### Problem: Let's Encrypt rate limit exceeded
**Symptoms:**
- Error: "too many certificates already issued"

**Solution:**
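1. Use the Let's Encrypt staging server while testing (browsers show certificate warnings for staging certificates).
2. Add the staging CA to the Caddyfile's global options block, which must be the first block in the file:
   ```caddy
   {
       email {{ user_email }}
       acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
   }
   ```
   `{{ user_email }}` is the Ansible template variable; in the deployed Caddyfile it is your actual email address.
3. Reload Caddy. Assuming the Caddyfile is mounted at the official image's default path `/etc/caddy/Caddyfile`, run the reload from that directory: `docker exec -w /etc/caddy caddy caddy reload`
4. After testing, remove the `acme_ca` line and reload Caddy again.

**Rate Limits:**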
- 50 certificates per domain per week
- 5 duplicate certificates per week
### Problem: Certificate validation failed
**Symptoms:**
- "Failed to verify" errors in Caddy logs

**Diagnosis:**
```bash
docker logs caddy
```
**Common Causes:**
1. DNS not pointing to server
2. Firewall blocking port 80/443
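3. Another service using port 80/443

**Solution:** check each cause in turn:
```bash
# Check the firewall: ACME challenges need inbound 80 (HTTP-01) and 443 (TLS-ALPN-01)
sudo ufw status
# List listeners on exactly ports 80 and 443 (grepping for ':80' would also match :8080)
sudo ss -tlnp '( sport = :80 or sport = :443 )'
# Check DNS: the answer must be your server IP
dig +short cloud.yourdomain.com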
```
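To see which certificate Caddy is actually serving, including whether it still comes from the staging CA used while testing rate limits, you can ask the server directly with `openssl`. A sketch with hypothetical function names; it relies on Let's Encrypt staging issuers carrying a `(STAGING)` prefix in their names.

```bash
# Print the issuer line of the certificate HOST serves on port 443 (SNI set to HOST).
served_issuer() {
    local host=$1
    openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -issuer
}

# Succeed if an issuer string belongs to Let's Encrypt's staging CA.
is_staging_issuer() {
    case "$1" in
        *"(STAGING)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   issuer=$(served_issuer cloud.yourdomain.com)
#   echo "$issuer"
#   is_staging_issuer "$issuer" && echo "still on staging: remove acme_ca and reload Caddy"
```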
---
## Docker Issues
### Problem: Docker daemon won't start
**Symptoms:**
- `docker ps` fails
- Error: "Cannot connect to the Docker daemon"

**Diagnosis:**
```bash
sudo systemctl status docker
sudo journalctl -xu docker
```
**Solution:**
```bash
sudo systemctl restart docker
```
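`systemctl restart` can return before the daemon accepts connections. A small retry helper (hypothetical name `wait_for`, not part of the stack) polls a command until it succeeds or a timeout passes; `docker info` keeps failing until the daemon is reachable.

```bash
# Retry COMMAND once per second until it succeeds or TIMEOUT seconds have passed.
# Usage: wait_for TIMEOUT COMMAND [ARGS...]
wait_for() {
    local timeout=$1 elapsed=0
    shift
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For example, `sudo systemctl restart docker && wait_for 30 docker info` returns as soon as the daemon answers, or reports a timeout after 30 seconds, in which case go back to the `journalctl` output from the diagnosis step.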
### Problem: Containers keep restarting
**Diagnosis:**
```bash
cd /opt/nextcloud-stack
docker compose logs [service-name]
```
**Common Causes:**
1. Configuration errors
2. Port conflicts
3. Missing or unhealthy dependencies (for example, the database container is not ready yet)

**Solution:**
```bash
# Check specific container
docker logs next-db
docker logs next
docker logs caddy
# Restart specific service
docker compose restart next
```
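To find which containers are actually in a restart loop before reading their logs, note that Docker keeps a per-container restart count (the `RestartCount` field of `docker inspect`); `docker ps --filter status=restarting` shows only the ones restarting right now. A sketch with a hypothetical function name that lists every container restarted at least once:

```bash
# List containers Docker has restarted at least once, as "name count".
# Usage: restart_counts
restart_counts() {
    local ids
    ids=$(docker ps -aq)
    [ -n "$ids" ] || return 0
    # $ids is deliberately unquoted so each container ID becomes its own argument
    docker inspect --format '{{.Name}} {{.RestartCount}}' $ids \
        | awk '$2 > 0 { sub(/^\//, "", $1); print $1, $2 }'
}
```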
---
## LXC Container Issues
### Problem: Docker fails to start in LXC
**Symptoms:**
- Error: "cgroups: cgroup mountpoint does not exist"
- Docker daemon fails to start

**Diagnosis:**
```bash
systemd-detect-virt # Should show "lxc"
```
**Solution on the LXC host (LXD `lxc` client):**
```bash
# Set security nesting
lxc config set CONTAINER_NAME security.nesting true
# If nesting alone is not enough, privileged mode may also be needed
# (privileged containers are less isolated from the host)
lxc config set CONTAINER_NAME security.privileged true
# Restart container
lxc restart CONTAINER_NAME
```
**Inside LXC Container:**
```bash
# Verify cgroups
mount | grep cgroup
# Check Docker status
sudo systemctl status docker
```
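The `mount | grep cgroup` output is easier to read once you know which cgroup layout the container sees: on cgroup v2, `/sys/fs/cgroup` is a single `cgroup2` filesystem; on v1 it is a `tmpfs` holding one mount per controller. A small check (hypothetical helper name) based on the filesystem type:

```bash
# Report which cgroup layout is mounted at /sys/fs/cgroup, based on its filesystem type.
cgroup_version() {
    case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 (legacy or hybrid)" ;;
        *)         echo "not mounted or unknown"; return 1 ;;
    esac
}
```

If it reports that nothing is mounted, revisit the host-side settings above before debugging Docker itself.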
### Problem: AppArmor denials in LXC
**Solution on the LXC host (LXD `lxc` client):**
```bash
lxc config set CONTAINER_NAME raw.lxc "lxc.apparmor.profile=unconfined"
lxc restart CONTAINER_NAME
```
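Before switching the container to `unconfined`, it can help to see what is actually being denied. Each AppArmor denial is logged with `apparmor="DENIED"`, the operation and the profile name; on hosts without `auditd` these lines land in the kernel log (with `auditd` running, look in `/var/log/audit/audit.log` instead). A sketch with a hypothetical helper name that summarizes them on the host:

```bash
# Count AppArmor denials in the kernel log, grouped as "COUNT PROFILE OPERATION".
# The sed pattern assumes operation= appears before profile=, the usual field order.
# Usage (on the LXC host): apparmor_denials
apparmor_denials() {
    sudo journalctl -k --no-pager 2>/dev/null \
        | grep 'apparmor="DENIED"' \
        | sed -n 's/.*operation="\([^"]*\)".*profile="\([^"]*\)".*/\2 \1/p' \
        | sort | uniq -c | sort -rn
}
```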
---
## Nextcloud Issues
### Problem: Nextcloud stuck in maintenance mode
**Symptoms:**
- Web interface shows "System in maintenance mode"

**Solution:**
```bash
docker exec -u www-data next php occ maintenance:mode --off
```
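Every Nextcloud fix in this section runs `occ` the same way: as `www-data` inside the `next` container. A shell function (hypothetical, not shipped with the stack) saves retyping that prefix:

```bash
# Run Nextcloud's occ as www-data inside the "next" container.
# Usage: occ maintenance:mode --off
occ() {
    docker exec -u www-data next php occ "$@"
}
```

With it defined, `occ status` shows whether maintenance mode is still on. Commands that prompt for input additionally need `-it` on the `docker exec` call.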
### Problem: Trusted domain error
**Symptoms:**
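- "Access through untrusted domain" error

**Solution:**
```bash
# List the current entries first; each index holds one domain
docker exec -u www-data next php occ config:system:get trusted_domains
# Add the domain at the next free index (here 1); reusing an occupied index overwrites that entry
docker exec -u www-data next php occ config:system:set trusted_domains 1 --value=cloud.yourdomain.com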
```
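To avoid overwriting an entry by picking the wrong index, a helper can read the current list and append at the next free index. This sketch is hypothetical and assumes the entries occupy indices `0` to `n-1` without gaps and that `config:system:get` prints one domain per line:

```bash
# Append DOMAIN to trusted_domains at the next free index, unless it is already listed.
# Assumes existing entries use indices 0..n-1 with no gaps.
# Usage: add_trusted_domain cloud.yourdomain.com
add_trusted_domain() {
    local domain=$1 current next_index
    current=$(docker exec -u www-data next php occ config:system:get trusted_domains)
    if printf '%s\n' "$current" | grep -qxF -- "$domain"; then
        echo "already trusted: $domain"
        return 0
    fi
    # One domain per output line, so the number of non-empty lines is the next free index
    next_index=$(printf '%s\n' "$current" | grep -c . || true)
    docker exec -u www-data next php occ config:system:set trusted_domains "$next_index" --value="$domain"
}
```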
### Problem: Redis connection failed
**Diagnosis:**
```bash
docker logs next-redis
docker exec next-redis redis-cli ping
```
**Solution:**
```bash
# Reconfigure Redis in Nextcloud
docker exec -u www-data next php occ config:system:set redis host --value=next-redis
docker exec -u www-data next php occ config:system:set redis port --value=6379
```
### Problem: File uploads fail
**Symptoms:**
- Large files won't upload
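- Error 413 (Payload Too Large)

**Solution:**
The Caddyfile already allows request bodies up to 10 GB, so if uploads still fail, check PHP's limits inside the Nextcloud container. The official Nextcloud image sets both from the `PHP_UPLOAD_LIMIT` environment variable (default `512M`):
```bash
docker exec next php -r 'echo ini_get("upload_max_filesize"), " / ", ini_get("post_max_size"), PHP_EOL;'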
```
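PHP reports these limits in shorthand such as `512M` or `10G`. To compare them with the 10 GB that Caddy accepts, a sketch with hypothetical helper names converts the shorthand to bytes and flags any limit below a target size:

```bash
# Convert a PHP ini size shorthand (e.g. 512M, 10G, 1048576) to bytes.
php_size_to_bytes() {
    local value=${1^^}            # PHP accepts k/m/g in either case
    local number=${value%[KMG]}   # strip a single trailing unit letter, if any
    case "$value" in
        *K) echo $(( number * 1024 )) ;;
        *M) echo $(( number * 1024 * 1024 )) ;;
        *G) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *)  echo $(( number )) ;;
    esac
}

# Print each PHP upload limit in the "next" container that is below TARGET (default 10G).
# Usage: check_upload_limits 10G
check_upload_limits() {
    local target=${1:-10G} wanted key value
    wanted=$(php_size_to_bytes "$target")
    for key in upload_max_filesize post_max_size; do
        value=$(docker exec next php -r "echo ini_get('$key');")
        if [ "$(php_size_to_bytes "$value")" -lt "$wanted" ]; then
            echo "$key=$value is below $target"
        fi
    done
}
```

No output means both limits are at least the target. Otherwise raise the limit (in the official image, through `PHP_UPLOAD_LIMIT`) and recreate the container.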
### Problem: OnlyOffice integration not working
**Solution:**
```bash
# Install OnlyOffice app
docker exec -u www-data next php occ app:install onlyoffice
# Configure document server URL
docker exec -u www-data next php occ config:app:set onlyoffice DocumentServerUrl --value="https://office.yourdomain.com/"
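# JWT: Document Server 7.2+ enables it by default. Either set the same secret here and on the
# Document Server (JWT_SECRET), or disable it on both sides (JWT_ENABLED=false there, empty secret here)
docker exec -u www-data next php occ config:app:set onlyoffice jwt_secret --value=""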
```
---
## Database Connection Issues
### Problem: Nextcloud can't connect to database
**Symptoms:**
- Error: "SQLSTATE[08006]"
- Nextcloud shows database error

**Diagnosis:**
```bash
# Check if PostgreSQL is running
docker ps | grep next-db
# Check PostgreSQL logs
docker logs next-db
# Test connection
docker exec next-db pg_isready -U nextcloud
```
**Solution:**
```bash
# Restart database
docker compose restart next-db
# Wait until it accepts connections (pg_isready exits 0 once the server is ready)
until docker exec next-db pg_isready -U nextcloud; do sleep 2; done
# Restart Nextcloud
docker compose restart next
```
### Problem: Database initialization failed
**Symptoms:**
- PostgreSQL container keeps restarting
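- Empty database

**⚠️ WARNING:** `docker compose down -v` deletes the stack's named volumes and all data stored in them. Only use it on a fresh installation.

**Solution:**
```bash
# Remove volumes and recreate
cd /opt/nextcloud-stack
docker compose down -v
docker compose up -d
```

If PostgreSQL's data directory is a bind mount rather than a named volume, `-v` leaves it in place. The official `postgres` image only initializes an empty data directory, so that host directory must be emptied as well.

---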
## Tailscale Issues
### Problem: Can't access Tailscale-only services
**Symptoms:**
- Homarr, Dockhand, Uptime Kuma return 403 Forbidden

**Diagnosis:**
```bash
# Check if Tailscale is running
sudo tailscale status
# Get Tailscale IP
tailscale ip -4
```
**Solution:**
```bash
# Activate Tailscale (if not done)
sudo tailscale up
# Verify connection
tailscale status
```
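Tailscale assigns node addresses from `100.64.0.0/10` (100.64.0.0 to 100.127.255.255). Assuming the Caddyfile only allows these services from that range (check your Caddyfile), a 403 usually means the request reached Caddy from an address outside it, for example over the public domain instead of the tailnet. A sketch with a hypothetical helper name to confirm that a machine has a tailnet address:

```bash
# Succeed if an IPv4 address lies in 100.64.0.0/10, the range Tailscale assigns from.
# Usage (on the client): in_tailscale_range "$(tailscale ip -4)" && echo "tailnet address"
in_tailscale_range() {
    local a b _rest
    IFS=. read -r a b _rest <<< "$1"
    # 10# forces base 10 so a zero-padded octet is not read as octal
    [[ $a == 100 && $b =~ ^[0-9]+$ ]] && (( 10#$b >= 64 && 10#$b <= 127 ))
}
```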
**Access via:**
- Tailscale IP: `https://TAILSCALE_IP:PORT`, where `TAILSCALE_IP` is the address printed by `tailscale ip -4`
- MagicDNS: `https://hostname.tailnet-name.ts.net`
### Problem: Tailscale not installed
**Solution:**
```bash
# Re-run Tailscale playbook
ansible-playbook playbooks/04-tailscale-setup.yml --ask-vault-pass
```
---
## Port Conflicts
### Problem: Port 80 or 443 already in use
**Symptoms:**
- Error: "bind: address already in use"
- Caddy won't start

**Diagnosis:**
```bash
sudo ss -tlnp '( sport = :80 or sport = :443 )'
```
**Common Culprits:**
- Apache2
- Nginx
- Another Caddy instance

**Solution:**
```bash
# Stop conflicting service
sudo systemctl stop apache2
sudo systemctl disable apache2
# OR
sudo systemctl stop nginx
sudo systemctl disable nginx
# Restart Caddy
docker compose restart caddy
```
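To see which program holds a port without reading the whole `ss` table, the process name can be pulled out of the `users:((...))` field that `ss -p` prints. A sketch with a hypothetical helper name (`-H` drops the header line and needs a reasonably recent iproute2). When Docker publishes a port, the owner usually shows up as `docker-proxy` rather than the program inside the container.

```bash
# Print the name(s) of the process(es) listening on TCP PORT; prints nothing if the port is free.
# Usage: port_owner 80
port_owner() {
    local port=$1
    sudo ss -Htlnp "sport = :${port}" \
        | grep -o 'users:(("[^"]*"' \
        | cut -d'"' -f2 \
        | sort -u
}
```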
---
## Permission Issues
### Problem: Permission denied errors in Nextcloud
**Symptoms:**
- Can't upload files
- Can't install apps

**Diagnosis:**
```bash
# Check file permissions
docker exec next ls -la /var/www/html
```
**Solution:**
```bash
# Fix ownership (docker exec runs this as root inside the container; slow on large data directories)
docker exec next chown -R www-data:www-data /var/www/html
```
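Because a recursive `chown` can take a long time on a large data directory, it may be worth checking first whether anything is actually mis-owned. A sketch with a hypothetical helper name that uses `find` to list paths under `/var/www/html` whose owner is not `www-data`:

```bash
# List up to LIMIT (default 20) paths under /var/www/html not owned by www-data.
# Usage: wrong_owner_paths 50
wrong_owner_paths() {
    local limit=${1:-20}
    docker exec next find /var/www/html ! -user www-data -print 2>/dev/null \
        | head -n "$limit"
}
```

Empty output means ownership is not the cause of the errors.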
### Problem: Docker socket permission denied
**Symptoms:**
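- Homarr or Dockhand can't see containers

**Solution:**
The Docker socket is mounted read-only (`:ro`) in this stack. That flag only stops the container from modifying the socket file itself; it does not restrict which Docker API calls the container can make, so it does not explain missing containers. A "permission denied" on the socket usually means the container's process runs as a user that is neither root nor a member of the group that owns `/var/run/docker.sock` (normally `docker`, mode `660`). Check the owning group ID with `ls -ln /var/run/docker.sock` and grant it to the service, for example with Compose's `group_add`.

---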
## Emergency Commands
### Completely restart the stack
```bash
cd /opt/nextcloud-stack
docker compose down
docker compose up -d
```
### View all logs in real-time
```bash
cd /opt/nextcloud-stack
docker compose logs -f
```
### Check container health
```bash
docker compose ps
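# Prints "no healthcheck" instead of failing when the container defines none
docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}no healthcheck{{end}}' next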
```
### Rebuild a specific container
```bash
docker compose up -d --force-recreate --no-deps next
```
### Emergency backup
```bash
/opt/nextcloud-stack/backup.sh
```
### Reset Nextcloud admin password
```bash
# -it is required because the command prompts for the new password
docker exec -it -u www-data next php occ user:resetpassword admin
```
---
## Getting Help
If none of these solutions work:
1. **Check logs:**
   ```bash
   docker compose logs [service-name]
   ```
2. **Check system logs:**
   ```bash
   sudo journalctl -xe
   ```
3. **Verify configuration** (`.env` may contain secrets; redact it before sharing):
   ```bash
   cat /opt/nextcloud-stack/docker-compose.yml
   cat /opt/nextcloud-stack/.env
   ```
4. **Test connectivity and the Caddy configuration:**
   ```bash
   curl -I https://cloud.yourdomain.com
   docker exec caddy caddy validate --config /etc/caddy/Caddyfile
   ```
5. **Read the deployment report:**
   ```bash
   cat /opt/nextcloud-stack/DEPLOYMENT.txt
   ```
---
## Recovery Procedures
### Restore from backup
See [BACKUP_RESTORE.md](BACKUP_RESTORE.md)
### Complete reinstallation
```bash
# 1. Backup first!
/opt/nextcloud-stack/backup.sh
# 2. Remove deployment
ansible-playbook playbooks/99-rollback.yml --ask-vault-pass
# 3. Redeploy
ansible-playbook playbooks/site.yml --ask-vault-pass
```
---
**Last Updated:** 2026-02-16