Proper Focus of Big Data
We find ourselves in the era of big data, where vast, continuous streams of
heterogeneous human-related data are collected by digital means and simplified
for consumption according to the 5V characterization: volume (the size of the
data), variety (the diversity of the content), velocity (the rate at which it
is produced), veracity (the quality of the content), and value (its business
impact). These huge data sets are collected through many different means,
including computer networks, social media profiles, web browsing histories,
mobile phone sensors, Internet of Things (IoT) devices, video data from
(self-)driving and robotic applications, our commercial transactions, and more.
The complex task of processing and analyzing big data has pushed computer
engineering and computer science on several fronts, such as distributed
parallel processing (e.g., map-reduce and streaming architectures) and machine
learning (e.g., deep learning). It has challenged our technological limits in
data center design, computer processing, storage capacity, and communication
bandwidth. Although big data still has many significant open research problems,
sometimes the solution is just more data. However, for most institutions,
having and using big data is impossible, impractical, too costly to justify,
or difficult to outsource because qualified specialists are in short supply.
But what exactly is small data? This question has been asked and answered so
often that it has multiple definitions: “data that is small enough size for
human comprehension,” “data that fits in a laptop,” “data in a volume and
format that makes it accessible, informative and actionable,” and “the digital
trace that each person generates.” For our purposes, all of these definitions
are appropriate since they exclude big data applications.
The most important reason to worry about small data is that most companies in
the world will never have big data. In the plot below, with companies on one
axis and the amount of data they can gather on the other, we see that the data
generated by most companies forms the torso and the diverse long tail of data
at large.
Big data has the benefit of malleability, meaning we can use big data to
generate small data. One of the most common purposes of big data is to produce
myriad coherent, specialized small data sets, often created just by the
transformation process itself. Some key benefits of small data include:
- Most data that people consume is small data
- In most cases, small data is the right data for the problem at hand
- Small data is more available, precise, and complete
- Small data is driving the internet of things
- Small data is about people, small groups, and communities
- Small data describes every person in each context
- Small data can be understood and interpreted by humans
- Most innovations are triggered by small data
For the reasons above, there appears to be greater value in small data. Some
people claim that “small data is the new big data,” that “small data is the
real revolution,” or that “small data is where the money lies.” In fact,
extremely small data feeds the yes-or-no decision behind any important choice,
so how much data we need to reach a given decision becomes a primary concern.
However, having less data does not imply that the problem is simpler or that we
know exactly what to do with it.
At best, we currently have partial answers to these questions. Many research
problems call for better answers or have no answers at all. Among them are
privacy preservation, resource-efficient machine learning, adaptive and dynamic
model selection, overfitting analysis, feature selection, specialized
smoothing, error analysis, bias detection and calibration, better
interpretability, and causation analysis.
In addition, while processing small data should be faster, in most cases there
isn’t enough data to apply deep learning, which raises new problems such as
bias, noise detection and correction, quantifying error and uncertainty,
constrained modeling, and smoothing. To make matters worse, these problems are
interdependent, with trade-offs that in most cases are not well studied. If we
factor in that most target data is personal and lives in a tiny, portable
device, then we must preserve privacy and/or solve the problem on a device
that has limited computing power, memory, communication, and energy.
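One way to approach the "quantifying error and uncertainty" problem on a small sample is the percentile bootstrap. The following is only a minimal sketch under stated assumptions: the function name `bootstrap_mean_ci`, the ten readings, and the choice of 2000 resamples are all hypothetical and chosen for illustration, not taken from any particular study.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the sample with replacement and record their mean.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical data: ten measurements from a single person.
readings = [61, 64, 58, 70, 66, 63, 59, 72, 65, 62]
low, high = bootstrap_mean_ci(readings)
print(f"mean = {statistics.fmean(readings):.1f}, interval = [{low:.1f}, {high:.1f}]")
```

With so few points the interval is wide, which is the useful part: it makes the uncertainty explicit instead of hiding it behind a single number. It uses only the standard library, so it could plausibly run on a constrained device.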
Therefore, given small data’s ubiquitous presence and large impact in the world
of SMEs and individuals, it is crucial to understand it well. In addition to
the 5V characterization of volume, velocity, variety, veracity, and value, we
should mention some other notable aspects:
- Scope - how exhaustive is the data related to the problem at hand?
- Resolution and identity - how fine-grained is the data and how identifiable is each item?
- Relational - how easy is it to conjoin different datasets through common fields or encodings that are part of the data?
- Flexibility - how easy is it to extend the data (e.g., adding new fields) and scale in size?
- Privacy - how does the data relate to people?
Small data has been compared with big data along 12 dimensions. Additional
aspects of data might be important for most applications, including how data is
collected, the technology and software used, the data ontology employed, and
the context in which the data is generated.
Lately, the use of small data has awakened interest in the scientific e-health
community. Deborah Estrin defines small data as “the picture of your personal
health.” She spearheads initiatives that liberate data to the consumer, arguing
that digital behavior leads to valuable knowledge about an individual’s
personal health (e.g., http://smalldata.io/). Patients may opt to share certain
data with researchers and analysts while keeping other information solely for
their doctor, or choose not to disclose any data at all. Researchers and
practitioners should follow a framework of informed consent to guarantee that
the only data exchanged is the data authorized by the patient. In some cases,
the patient may be reluctant to share sensitive personal information. In those
cases (maybe the most interesting ones), digital devices should be able to
analyze the data locally, triggering an alarm only in case of emergency.
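The idea of analyzing data locally and raising an alarm only in an emergency can be sketched as follows. This is an illustration under assumptions, not a clinical method: the class name `LocalMonitor`, the z-score threshold of 3, the warm-up of five readings, and the sample values are all hypothetical. The monitor keeps running statistics with Welford's online algorithm, so it never stores the raw history and only a boolean flag would need to leave the device.

```python
import math


class LocalMonitor:
    """Keeps running statistics on-device and reports only an alarm flag."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed z-score limit, not a medical value
        self.warmup = warmup        # readings needed before alarms are possible
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, value):
        """Return True if `value` is far from the personal baseline so far."""
        alarm = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                alarm = True
        if not alarm:
            # Welford's online update; outliers are kept out of the baseline.
            self.n += 1
            delta = value - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (value - self.mean)
        return alarm


# Hypothetical sensor readings; only the alarm flags would be transmitted.
monitor = LocalMonitor()
readings = [62, 64, 61, 63, 65, 62, 64, 118, 63]
alarms = [monitor.observe(r) for r in readings]
print(alarms)
```

In this run only the reading of 118 raises the flag. The memory footprint is three numbers regardless of how long the device runs, which suits hardware with limited memory and energy.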
In this and other contexts, small personal data refers to information related
to a living individual. Such data is normally associated with an identifiable
person and might include first name, middle name, last name, address, telephone
number, passport number, specific health or cognitive conditions, etc. In
general, this data is private, and hence the regulatory environment with
respect to privacy, data protection, and security is pivotal.
As a side effect, two facts make the use of personal data challenging for
machine-learning-based applications. First, it is difficult to gather: personal
data is considered private or sensitive, and most people are not comfortable
sharing it. Second, personal data related to mental, health, and educational
conditions is scarce because such conditions are infrequent and often “hidden,”
meaning they are not identified before symptoms appear. Furthermore, most of
these conditions (e.g., physical and mental illnesses, education disorders) are
highly subject dependent. Their manifestations vary greatly from person to
person, making generalization very difficult. This means that data coming from
some subjects might not be suitable training data for other individuals. As a
result, the most suitable data to train an algorithm is the data belonging to
the same subject, making the target data even smaller.
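A toy comparison can make the subject-dependence argument concrete. The sketch below is an assumption-laden illustration, not a real learning algorithm: the "model" is just an average, and the subject names and step counts are invented. It compares a pooled baseline built from everyone's data with a personal baseline built from each subject's own history.

```python
import statistics

# Hypothetical daily step counts per subject (illustrative values only).
history = {
    "subject_a": [3000, 3200, 2900, 3100],
    "subject_b": [9000, 9400, 8800, 9200],
    "subject_c": [12000, 12500, 11800, 12100],
}
new_reading = {"subject_a": 3150, "subject_b": 9000, "subject_c": 12300}

# Pooled "model": one average over all subjects' data.
all_values = [v for values in history.values() for v in values]
pooled = statistics.fmean(all_values)

errors = {}
for subject, values in history.items():
    personal = statistics.fmean(values)  # "model" fitted to this subject only
    actual = new_reading[subject]
    errors[subject] = (abs(pooled - actual), abs(personal - actual))

for subject, (pooled_err, personal_err) in errors.items():
    print(f"{subject}: pooled error {pooled_err:.0f}, personal error {personal_err:.0f}")
```

For these made-up numbers the personal baseline is closer for every subject, even though it is built from only four points. That is the trade-off described above: the most relevant data is also the smallest.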
Clearly, more research is needed in addition to our current efforts to advance
big data, such as the two main trends of distributed/parallel processing and
deep learning. We also need to explore the limits of small data and pin down
its uses. Then we can learn whether the right data needs to be small or big.
However, if 2013 was considered the year of small data, we are already late.
Don’t wait another minute. Now is the perfect time to perfect it.