
What is Data Science? (from Coursera)

metamong 2022. 4. 13.

1) Defining Data Science & What Data Scientists Do

* What is Data Science?

 

= the field of exploring, manipulating, and analyzing data, and using data to answer questions or make recommendations.

→ what is NEW? the vast quantity of data available from massively varied sources

 

* the process of data science

 

- many organizations use data science to focus on a specific problem, and so it's essential to clarify the question that the organization wants answered

- ask questions to clarify the business need

- "what data do we need, and where will that data come from?"

- analyze the data in different ways

- after the data has revealed its insights, communicate the results to the project stakeholders (can use powerful data visualization tools)

- changing the way organizations understand the world

 

* Advice

 

- be curious, argumentative (start from a hypothesis), and judgmental (have preconceived notions to begin from)

- some comfort & flexibility with analytic platforms

- the ability to tell a story

- do you want to be a data scientist in any field, or in a specific field? (figure out your competitive advantage and know your set of skills)

cf. competitive advantage) understanding of some aspect of life (where you excel beyond others)

 

* Old & New Problems, Data Science Solutions

 

- all organizations ultimately use data science for the same reason—to discover optimum solutions to existing problems

- innovative solutions for old problems

cf. Uber uses data to put the right number of drivers in the right place, at the right time, for a cost the rider is willing to pay

- MUST: Identify the problem and establish a clear understanding of it. Gather the data for analysis. Identify the right tools to use, and develop a data strategy.

- structured data (something which fits nicely into tables, columns, and rows) & unstructured data (e.g. a weblog)

- regression: a taxi fare starts from a constant (the fixed amount on the meter when you get in) and then rises with distance; regression describes this relationship between the fare and the distance
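
The taxi example can be sketched as a small least-squares fit. This is only an illustration: the trip data, the 2.50 base fare, the 1.20 per-km rate, and the helper name `fit_line` are all made up for the example and do not come from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical trips: distance in km and the fare paid.
# The fares are generated from a 2.50 base fare plus 1.20 per km,
# so the fit should recover roughly those two numbers.
distances_km = [1.0, 2.0, 3.5, 5.0, 8.0]
fares = [2.50 + 1.20 * d for d in distances_km]

base_fare, per_km = fit_line(distances_km, fares)
print(f"constant (base fare): {base_fare:.2f}")
print(f"slope (fare per km):  {per_km:.2f}")
```

Here the intercept plays the role of the constant and the slope is the extra fare per unit of distance. Real fares would be noisy, so the fit would only approximate them.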

 

* Cloud for Data Science

 

- the cloud lets you bypass the physical limitations of the computers and systems you're using, and deploy the analytics and storage capacities of advanced machines that do not necessarily have to be your machine or your company's machine

- allows you not just to store large amounts of data on servers somewhere in California or in Nevada, but also to deploy very advanced computing algorithms and do high-performance computing using machines that are not yours

- it allows multiple entities to work with the same data at the same time

- Multiple collaborators or teams can access the data simultaneously, working together on producing a solution

2) Data Science Topics Introduction

* Big Data

 

 

5V's> velocity, volume, variety, veracity, value

 

* Digital Transformation

 

- 

 

* Data Mining

 

- Establishing Data Mining Goals: the cost-benefit trade-offs for the desired level of accuracy are important considerations for data mining goals.

- Selecting Data: identifying the right kind of data needed for data mining that could answer the questions at reasonable costs is critical.

- Preprocessing Data: identify the irrelevant attributes of data and expunge such attributes from further consideration. At the same time, identifying the erroneous aspects of the data set and flagging them as such is necessary. Data should be subject to checks to ensure integrity. Lastly, you must develop a formal method of dealing with missing data and determine whether the data are missing randomly or systematically.

- Transforming Data: the next step is to determine the appropriate format in which data must be stored. An important consideration in data mining is to reduce the number of attributes needed to explain the phenomena. Data reduction algorithms, such as Principal Component Analysis, can reduce the number of attributes without a significant loss in information.

- Storing Data: The transformed data must be stored in a format that is conducive to data mining. Data safety & privacy should be a prime concern when storing data.

- Mining Data: This step covers data analysis methods, including parametric and non-parametric methods, and machine-learning algorithms. A good starting point for data mining is data visualization.

- Evaluating Mining Results: Formal evaluation could include testing the predictive capabilities of the models on observed data to see how effective and efficient the algorithms have been in reproducing data (this is known as an in-sample forecast).
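
To make the "Transforming Data" step a bit more concrete, here is a minimal sketch of Principal Component Analysis that reduces two attributes to one. It is limited to the two-attribute case so the eigenvalue can be written in closed form. The function name `pca_2d` and the sample rows are assumptions for illustration, not something from the course.

```python
import math


def pca_2d(rows):
    """First principal component of two-attribute data.

    Returns (direction, scores, explained), where `explained` is the
    share of total variance kept by the single new attribute.
    """
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(r[0] - mean_x, r[1] - mean_y) for r in rows]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    top = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

    # Matching eigenvector. If b == 0 the original axes are already
    # the principal axes, so pick the one with the larger variance.
    if b != 0:
        vx, vy = top - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Project every centered row onto the principal direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained = top / (a + c)
    return direction, scores, explained


# Hypothetical data: two strongly correlated attributes per row.
rows = [(2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]

direction, scores, explained = pca_2d(rows)
print("principal direction:", direction)
print("share of variance kept:", round(explained, 4))
```

Because the two attributes move almost together, one component keeps nearly all of the variance, which is the "without a significant loss in information" idea from the notes. For more than two attributes a library routine would be the usual choice.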

 

* Deep Learning & Machine Learning & AI (with Data Science)

 

- Machine Learning: 

* The Final Deliverable

 

- includes a powerful narrative

- The initial planning and conceptualizing of the final deliverable is extremely important for producing a compelling document

 

* How Can Someone become a Data Scientist & Recruiting

 

- a

- relational databases, computer science theory & statistics, probability

- 

 

 

 

* The Report Structure

 

- the length of the reports varied depending largely on the purpose of the report. Brief reports were drafted as commentaries on current trends and developments that attracted public or media attention. Detailed and comprehensive reports offered a critical review of the subject matter with extensive data analysis and commentary

 

- cover page) at a minimum, the cover page should include the title of the report, names of authors, their affiliations, and contacts, name of the institutional publisher (if any), and the date of publication

- table of contents) followed by an abstract or an executive summary (recommended even for a short document)

- introductory section) helpful in setting up the problem for the reader who might be new to the topic and who might need to be gently introduced to the subject matter before being immersed in intricate details. You might use literature review to highlight gaps in the existing knowledge, which your analysis will try to fill

- methodology section) introduce the research methods and data sources you used for the analysis

- results section) present empirical findings

- discussion section) where you rely on the power of narrative to enable numbers to communicate your thesis to your readers

- conclusion section) generalize your specific findings and take a rather marketing-oriented approach to promote your findings so that the reader does not remain stuck in the caveats that you have voluntarily outlined earlier.

- the list of references

- acknowledgement section) acknowledging the support of those who have enabled your work is always good

- appendices (if needed)


* Source> 'What is Data Science?' by Coursera
