Big data ingestion is about moving data, especially unstructured data, from where it originates into a system, such as Hadoop, where it can be stored and analyzed.
Data ingestion may be continuous or asynchronous, real-time or batched, or both (as in a lambda architecture), depending on the characteristics of the source and the destination. In many scenarios, the source and the destination may not share the same data timing, format, or protocol; when they differ, some type of transformation or conversion is required before the data is usable by the destination system.
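To make the transformation step concrete, here is a minimal Python sketch, not a definitive implementation. It assumes two hypothetical sources, one emitting JSON and one emitting CSV lines, that both report a device ID, a timestamp in epoch seconds, and a reading. The field names (`device`, `ts`, `value`), the destination schema, and the helper names `normalize` and `batches` are all assumptions made for illustration; a real pipeline would use whatever its sources and destination actually define.

```python
import csv
import io
import json
from datetime import datetime, timezone


def normalize(raw: str, fmt: str) -> dict:
    """Convert one source record into the (hypothetical) destination schema."""
    if fmt == "json":
        rec = json.loads(raw)
        device, ts, value = rec["device"], rec["ts"], rec["value"]
    elif fmt == "csv":
        # Assumed column order for the CSV source: device, ts, value
        device, ts, value = next(csv.reader(io.StringIO(raw)))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Assumption: sources send epoch seconds, the destination wants ISO 8601 UTC.
    when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    return {"device": device, "time": when.isoformat(), "value": float(value)}


def batches(records, size):
    """Group a continuous stream of records into fixed-size batches."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch
```

Calling `normalize` on each incoming record and passing the results through `batches` gives the destination uniformly shaped records in groups of a chosen size. This is one simple way to bridge a continuous source and a batch-oriented store.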
As the number of IoT devices grows, both the volume and the variety of data sources are expanding rapidly, and these sources now need to be accommodated, often in real time. Yet extracting the data in a form that the destination system can use is a significant challenge in terms of time and resources. Making data ingestion as efficient as possible helps focus resources on big data streaming and analysis rather than on the mundane work of data preparation and transformation.