AI Engineering: Building Multi-Modal Intelligent Systems with Vision, Language, and Audio
From LLM Fine-Tuning to Voice Agents, AR Interfaces, and Real-World Deployment
Unlock the future of artificial intelligence with practical, production-ready multi-modal engineering.
This hands-on guide is built for developers, researchers, and AI professionals who want to go beyond chatbots and dive into building intelligent systems that understand text, images, audio, and human intent - all in one pipeline.
Whether you're fine-tuning large language models (LLMs) or creating voice-driven AR interfaces, this book walks you through the real engineering decisions, tools, and architectures needed to bring multi-modal AI to life.
Topics covered:
Fine-tuning Large Language Models (LLMs): Train and adapt models like GPT-2, LLaMA, and Mistral for custom tasks using Hugging Face Transformers and PEFT, with LoRA and QLoRA.
Voice Interfaces: Combine Whisper, LLMs, and Bark/Tortoise TTS to build interactive speech-driven assistants.
Computer Vision + Language: Use models like BLIP, CLIP, and DETR to connect what systems see to what they say and understand.
Instruction Tuning & Hyperparameter Optimization: Build smarter, domain-specific models with efficient training workflows.
Multi-Modal Pipelines: Chain audio, image, and text inputs for question answering, summarization, tutoring, and AR/robotic control.
Real-Time Interfaces: Deploy intelligent agents using FastAPI, Streamlit, Gradio, Docker, and Hugging Face Spaces.
Edge & Offline Deployment: Optimize models with ONNX, quantization (4-bit, 8-bit), and TensorRT for low-latency inference on CPU/GPU.
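As a rough illustration of the 8-bit quantization mentioned in the last item, here is a minimal sketch in plain Python. It assumes symmetric per-tensor scaling, and the function names are hypothetical. Toolchains such as ONNX Runtime and TensorRT use more elaborate schemes, so treat this as the underlying arithmetic only, not as any library's API.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Hypothetical example weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 0.75]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one byte plus a shared scale, which is where the memory and latency savings come from. The cost is a rounding error of at most half a step per weight.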
Example projects:
Smart document summarizers with OCR + TTS
Voice-enabled image assistants
Emotion-aware agents
Virtual tutors
AR-enhanced AI interfaces
Robotic perception + control from voice/image input
Secure, multilingual, and privacy-conscious AI systems
Tools and frameworks:
Python, PyTorch, Hugging Face Transformers
LangChain, OpenCV, Whisper, TTS, BLIP
ROS, Unity (AR/VR), Gradio, Streamlit
Docker, FastAPI, gRPC, TorchServe
Built for engineers. Written with depth. Designed for real-world impact.
If you're ready to build intelligent multi-modal agents that understand the world like humans do - across speech, vision, and language - this book gives you the complete roadmap.
Perfect for:
Machine learning engineers, data scientists, AI product developers, researchers, robotics engineers, and anyone building cutting-edge AI systems.