Modern consumers carry many electronic devices, like a mobile phone, digital camera, GPS, PDA and an MP3 player. The functionality of each of these devices has gone through an important evolution over recent years, with a steep increase in both the number of features as in the quality of the services that they provide. However, providing the required compute power to support (an uncompromised combination of) all this functionality is highly non-trivial. Designing processors that meet the demanding requirements of future mobile devices requires the optimization of the embedded system in general and of the embedded processors in particular, as they should strike the correct balance between flexibility, energy efficiency and performance. In general, a designer will try to minimize the energy consumption (as far as needed) for a given performance, with a sufficient flexibility. However, achieving this goal is already complex when looking at the processor in isolation, but, in reality, the processor is a single component in a more complex system. In order to design such complex system successfully, critical decisions during the design of each individual component should take into account effect on the other parts, with a clear goal to move to a global Pareto optimum in the complete multi-dimensional exploration space.In the complex, global design of battery-operated embedded systems, the focus of Ultra-Low Energy Domain-Specific Instruction-Set Processors is on the energy-aware architecture exploration of domain-specific instruction-set processors and the co-optimization of the datapath architecture, foreground memory, and instruction memory organisation with a link to the required mapping techniques or compiler steps at the early stages of the design. By performing an extensive energy breakdown experiment for a complete embedded platform, both energy and performance bottlenecks have been identified, together with the important relations between thedifferent components. Based on this knowledge, architecture extensions are proposed for all the bottlenecks.